Robotics

Showing new listings for Friday, 6 June 2025

Total of 58 entries

[1] arXiv:2506.04308: Title: RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics

Enshen Zhou, Jingkun An, Cheng Chi, Yi Han, Shanyu Rong, Chi Zhang, Pengwei Wang, Zhongyuan Wang, Tiejun Huang, Lu Sheng, Shanghang Zhang

Comments: Project page: this https URL

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Spatial referring is a fundamental capability of embodied robots to interact with the 3D physical world. However, even with powerful pretrained vision language models (VLMs), recent approaches still cannot accurately understand complex 3D scenes or dynamically reason about the instruction-indicated locations for interaction. To this end, we propose RoboRefer, a 3D-aware VLM that can first achieve precise spatial understanding by integrating a disentangled but dedicated depth encoder via supervised fine-tuning (SFT). Moreover, RoboRefer advances generalized multi-step spatial reasoning via reinforcement fine-tuning (RFT), with metric-sensitive process reward functions tailored for spatial referring tasks. To support SFT and RFT training, we introduce RefSpatial, a large-scale dataset of 20M QA pairs (2x prior), covering 31 spatial relations (vs. 15 prior) and supporting complex reasoning processes (up to 5 steps). In addition, we introduce RefSpatial-Bench, a challenging benchmark filling the gap in evaluating spatial referring with multi-step reasoning. Experiments show that SFT-trained RoboRefer achieves state-of-the-art spatial understanding, with an average success rate of 89.6%. RFT-trained RoboRefer further outperforms all other baselines by a large margin, even surpassing Gemini-2.5-Pro by 17.4% in average accuracy on RefSpatial-Bench. Notably, RoboRefer can be integrated with various control policies to execute long-horizon, dynamic tasks across diverse robots (e.g., UR5, G1 humanoid) in cluttered real-world scenes.
[2] arXiv:2506.04359: Title: cuVSLAM: CUDA accelerated visual odometry

Alexander Korovko, Dmitry Slepichev, Alexander Efitorov, Aigul Dzhumamuratova, Viktor Kuznetsov, Hesam Rabeti, Joydeep Biswas

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Accurate and robust pose estimation is a key requirement for any autonomous robot. We present cuVSLAM, a state-of-the-art solution for visual simultaneous localization and mapping, which can operate with a variety of visual-inertial sensor suites, including multiple RGB and depth cameras, and inertial measurement units. cuVSLAM supports operation with as few as one RGB camera to as many as 32 cameras, in arbitrary geometric configurations, thus supporting a wide range of robotic setups. cuVSLAM is specifically optimized using CUDA to deploy in real-time applications with minimal computational overhead on edge-computing devices such as the NVIDIA Jetson. We present the design and implementation of cuVSLAM, example use cases, and empirical results on several state-of-the-art benchmarks demonstrating the best-in-class performance of cuVSLAM.
[3] arXiv:2506.04362: Title: Learning Smooth State-Dependent Traversability from Dense Point Clouds

Zihao Dong, Alan Papalia, Leonard Jung, Alenna Spiro, Philip R. Osteen, Christa S. Robison, Michael Everett

Comments: 16 pages, 13 figures

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

A key open challenge in off-road autonomy is that the traversability of terrain often depends on the vehicle's state. In particular, some obstacles are only traversable from some orientations. However, learning this interaction by encoding the angle of approach as a model input demands a large and diverse training dataset and is computationally inefficient during planning due to repeated model inference. To address these challenges, we present SPARTA, a method for estimating approach-angle-conditioned traversability from point clouds. Specifically, we impose geometric structure into our network by outputting a smooth analytical function over the 1-Sphere that predicts risk distribution for any angle of approach with minimal overhead and can be reused for subsequent queries. The function is composed of Fourier basis functions, which have important advantages for generalization due to their periodic nature and smoothness. We demonstrate SPARTA both in a high-fidelity simulation platform, where our model achieves a 91% success rate crossing a 40 m boulder field (compared to 73% for the baseline), and on hardware, illustrating the generalization ability of the model to real-world settings.
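
As a rough illustration of the idea in the abstract above, the sketch below evaluates a truncated Fourier series over the approach angle. It is not the SPARTA implementation: the coefficient layout, the function name `risk_at_angle`, and the numeric values are all assumptions made for this example. The point it shows is that once the coefficients exist (in the paper, they come from a network), querying the risk for any angle is a cheap closed-form evaluation that needs no further inference.

```python
import math

def risk_at_angle(coeffs, theta):
    """Evaluate a truncated Fourier series at approach angle theta (radians).

    coeffs = (a0, [(a1, b1), (a2, b2), ...]) is a hypothetical layout:
    a0 is the constant term, (a_k, b_k) weight cos(k*theta) and sin(k*theta).
    """
    a0, harmonics = coeffs
    value = a0
    for k, (a_k, b_k) in enumerate(harmonics, start=1):
        value += a_k * math.cos(k * theta) + b_k * math.sin(k * theta)
    return value

# Hypothetical coefficients, standing in for one terrain cell's network output.
coeffs = (0.5, [(0.3, 0.0), (0.0, 0.1)])

# The same coefficients are reused for every queried approach angle.
risks = [risk_at_angle(coeffs, math.radians(deg)) for deg in (0, 90, 180, 270)]
```

Because every basis function has period 2*pi, the evaluated function is automatically continuous across the 0/360 degree wrap-around, which is the periodicity property the abstract refers to.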
[4] arXiv:2506.04484: Title: Online Adaptation of Terrain-Aware Dynamics for Planning in Unstructured Environments

William Ward, Sarah Etter, Tyler Ingebrand, Christian Ellis, Adam J. Thorpe, Ufuk Topcu

Comments: Accepted to RSS-ROAR 2025

Subjects: Robotics (cs.RO)

Autonomous mobile robots operating in remote, unstructured environments must adapt to new, unpredictable terrains that can change rapidly during operation. In such scenarios, a critical challenge becomes estimating the robot's dynamics on changing terrain in order to enable reliable, accurate navigation and planning. We present a novel online adaptation approach for terrain-aware dynamics modeling and planning using function encoders. Our approach efficiently adapts to new terrains at runtime using limited online data without retraining or fine-tuning. By learning a set of neural network basis functions that span the robot dynamics on diverse terrains, we enable rapid online adaptation to new, unseen terrains and environments as a simple least-squares calculation. We demonstrate our approach for terrain adaptation in a Unity-based robotics simulator and show that the downstream controller has better empirical performance due to higher accuracy of the learned model. This leads to fewer collisions with obstacles while navigating in cluttered environments as compared to a neural ODE baseline.
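
The "simple least-squares calculation" mentioned in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the basis functions here are fixed hand-written scalar functions standing in for learned neural network basis functions, and `fit_coefficients` and the sample data are hypothetical. Given a few online samples, the coefficients that combine the basis functions are found by solving the normal equations.

```python
def fit_coefficients(basis, xs, ys):
    """Least-squares coefficients c minimizing sum_j (y_j - sum_i c_i * g_i(x_j))^2."""
    n = len(basis)
    # Design matrix: one row per sample, one column per basis function.
    G = [[g(x) for g in basis] for x in xs]
    # Normal equations (G^T G) c = G^T y, stored as an augmented matrix.
    A = [[sum(row[i] * row[j] for row in G) for j in range(n)]
         + [sum(row[i] * y for row, y in zip(G, ys))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

# Stand-in basis functions (assumption: the real ones would be neural networks).
basis = [lambda x: 1.0, lambda x: x, lambda x: x * x]

# A handful of "online" samples from an unseen function 2x - 0.5x^2.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * x - 0.5 * x * x for x in xs]
coeffs = fit_coefficients(basis, xs, ys)
```

Only the small coefficient vector changes when the terrain changes; the basis functions stay fixed, which is why no retraining or fine-tuning is needed at runtime.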
[5] arXiv:2506.04505: Title: SGN-CIRL: Scene Graph-based Navigation with Curriculum, Imitation, and Reinforcement Learning

Nikita Oskolkov, Huzhenyu Zhang, Dmitry Makarov, Dmitry Yudin, Aleksandr Panov

Comments: 7 pages, 11 figures

Subjects: Robotics (cs.RO); Machine Learning (cs.LG)

The 3D scene graph models spatial relationships between objects, enabling the agent to efficiently navigate in a partially observable environment and predict the location of the target. This paper proposes an original framework named SGN-CIRL (3D Scene Graph-Based Reinforcement Learning Navigation) for mapless reinforcement learning-based robot navigation with a learnable representation of an open-vocabulary 3D scene graph. To accelerate and stabilize the training of reinforcement learning-based algorithms, the framework also employs imitation learning and curriculum learning. The first one enables the agent to learn from demonstrations, while the second one structures the training process by gradually increasing task complexity from simple to more advanced scenarios. Numerical experiments conducted in the Isaac Sim environment showed that using a 3D scene graph for reinforcement learning significantly increased the success rate in difficult navigation cases. The code is open-sourced and available at: this https URL_graph.
[6] arXiv:2506.04539: Title: Olfactory Inertial Odometry: Sensor Calibration and Drift Compensation

Kordel K. France, Ovidiu Daescu, Anirban Paul, Shalini Prasad

Comments: Published as a full conference paper at the 2025 IEEE International Symposium on Inertial Sensors & Systems

Subjects: Robotics (cs.RO); Emerging Technologies (cs.ET); Machine Learning (cs.LG); Systems and Control (eess.SY)

Visual inertial odometry (VIO) is a process for fusing visual and kinematic data to understand a machine's state in a navigation task. Olfactory inertial odometry (OIO) is an analog to VIO that fuses signals from gas sensors with inertial data to help a robot navigate by scent. Gas dynamics and environmental factors introduce disturbances into olfactory navigation tasks that can make OIO difficult to facilitate. With our work here, we define a process for calibrating a robot for OIO that generalizes to several olfaction sensor types. Our focus is specifically on calibrating OIO for centimeter-level accuracy in localizing an odor source on a slow-moving robot platform to demonstrate use cases in robotic surgery and touchless security screening. We demonstrate our process for OIO calibration on a real robotic arm and show how this calibration improves performance over a cold-start olfactory navigation task.
[7] arXiv:2506.04540: Title: Chronoamperometry with Room-Temperature Ionic Liquids: Sub-Second Inference Techniques

Kordel K. France

Comments: Published at IEEE BioSensors 2025

Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Chemical Physics (physics.chem-ph); Instrumentation and Detectors (physics.ins-det)

Chronoamperometry (CA) is a fundamental electrochemical technique used for quantifying redox-active species. However, in room-temperature ionic liquids (RTILs), the high viscosity and slow mass transport often lead to extended measurement durations. This paper presents a novel mathematical regression approach that reduces CA measurement windows to under 1 second, significantly faster than previously reported methods, which typically require 1-4 seconds or longer. By applying an inference algorithm to the initial transient current response, this method accurately predicts steady-state electrochemical parameters without requiring additional hardware modifications. The approach is validated through comparison with standard chronoamperometric techniques and is demonstrated to maintain reasonable accuracy while dramatically reducing data acquisition time. The implications of this technique are explored in analytical chemistry, sensor technology, and battery science, where rapid electrochemical quantification is critical. Our technique is focused on enabling faster multiplexing of chronoamperometric measurements for rapid olfactory and electrochemical analysis.
[8] arXiv:2506.04547: Title: Multimodal Limbless Crawling Soft Robot with a Kirigami Skin

Jonathan Tirado, Aida Parvaresh, Burcu Seyidoğlu, Darryl A. Bedford, Jonas Jørgensen, Ahmad Rafsanjani

Comments: Cyborg and Bionic Systems (2025)

Subjects: Robotics (cs.RO)

Limbless creatures can crawl on flat surfaces by deforming their bodies and interacting with asperities on the ground, offering a biological blueprint for designing efficient limbless robots. Inspired by this natural locomotion, we present a soft robot capable of navigating complex terrains using a combination of rectilinear motion and asymmetric steering gaits. The robot is made of a pair of antagonistic inflatable soft actuators covered with a flexible kirigami skin with asymmetric frictional properties. The robot's rectilinear locomotion is achieved through cyclic inflation of internal chambers with precise phase shifts, enabling forward progression. Steering is accomplished using an asymmetric gait, allowing for both in-place rotation and wide turns. To validate its mobility in obstacle-rich environments, we tested the robot in an arena with coarse substrates and multiple obstacles. Real-time feedback from onboard proximity sensors, integrated with a human-machine interface (HMI), allowed adaptive control to avoid collisions. This study highlights the potential of bioinspired soft robots for applications in confined or unstructured environments, such as search-and-rescue operations, environmental monitoring, and industrial inspections.
[9] arXiv:2506.04577: Title: A Novel Transformer-Based Method for Full Lower-Limb Joint Angles and Moments Prediction in Gait Using sEMG and IMU data

Farshad Haghgoo Daryakenari, Tara Farizeh

Comments: 10 pages, 4 figures

Subjects: Robotics (cs.RO)

This study presents a transformer-based deep learning framework for the long-horizon prediction of full lower-limb joint angles and joint moments using surface electromyography (sEMG) and inertial measurement unit (IMU) signals. Two separate Transformer Neural Networks (TNNs) were designed: one for kinematic prediction and one for kinetic prediction. The model was developed with real-time application in mind, using only wearable sensors suitable for outside-laboratory use. Two prediction horizons were considered to evaluate short- and long-term performance. The network achieved high accuracy in both tasks, with Spearman correlation coefficients exceeding 0.96 and R-squared scores above 0.92 across all joints. Notably, the model consistently outperformed a recent benchmark method in joint angle prediction, reducing RMSE errors by an order of magnitude. The results confirmed the complementary role of sEMG and IMU signals in capturing both kinematic and kinetic information. This work demonstrates the potential of transformer-based models for real-time, full-limb biomechanical prediction in wearable and robotic applications, with future directions including input minimization and modality-specific weighting strategies to enhance model efficiency and accuracy.
[10] arXiv:2506.04627: Title: Enhancing Efficiency and Propulsion in Bio-mimetic Robotic Fish through End-to-End Deep Reinforcement Learning

Xinyu Cui, Boai Sun, Yi Zhu, Ning Yang, Haifeng Zhang, Weicheng Cui, Dixia Fan, Jun Wang

Journal-ref: Physics of Fluids 36 (2024) 031910

Subjects: Robotics (cs.RO)

Aquatic organisms are known for their ability to generate efficient propulsion with low energy expenditure. While existing research has sought to leverage bio-inspired structures to reduce energy costs in underwater robotics, the crucial role of control policies in enhancing efficiency has often been overlooked. In this study, we optimize the motion of a bio-mimetic robotic fish using deep reinforcement learning (DRL) to maximize propulsion efficiency and minimize energy consumption. Our novel DRL approach incorporates extended pressure perception, a transformer model processing sequences of observations, and a policy transfer scheme. Notably, significantly improved training stability and speed within our approach allow for end-to-end training of the robotic fish. This enables more agile responses to hydrodynamic environments and offers greater optimization potential compared to pre-defined motion pattern controls. Our experiments are conducted on a serially connected rigid robotic fish in a free stream with a Reynolds number of 6000 using computational fluid dynamics (CFD) simulations. The DRL-trained policies yield impressive results, demonstrating both high efficiency and propulsion. The policies also showcase the agent's embodiment, skillfully utilizing its body structure and engaging with surrounding fluid dynamics, as revealed through flow analysis. This study provides valuable insights into the optimization of bio-mimetic underwater robots through DRL training, capitalizing on their structural advantages, and ultimately contributing to more efficient underwater propulsion systems.
[11] arXiv:2506.04646: Title: ActivePusher: Active Learning and Planning with Residual Physics for Nonprehensile Manipulation

Zhuoyun Zhong, Seyedali Golestaneh, Constantinos Chamzas

Subjects: Robotics (cs.RO)

Planning with learned dynamics models offers a promising approach toward real-world, long-horizon manipulation, particularly in nonprehensile settings such as pushing or rolling, where accurate analytical models are difficult to obtain. Although learning-based methods hold promise, collecting training data can be costly and inefficient, as it often relies on randomly sampled interactions that are not necessarily the most informative. To address this challenge, we propose ActivePusher, a novel framework that combines residual-physics modeling with kernel-based uncertainty-driven active learning to focus data acquisition on the most informative skill parameters. Additionally, ActivePusher seamlessly integrates with model-based kinodynamic planners, leveraging uncertainty estimates to bias control sampling toward more reliable actions. We evaluate our approach in both simulation and real-world environments and demonstrate that it improves data efficiency and planning success rates compared to baseline methods.
[12] arXiv:2506.04680: Title: Application of SDRE to Achieve Gait Control in a Bipedal Robot for Knee-Type Exoskeleton Testing

Ping-Kong Huang, Chien-Wu Lan, Chin-Tien Wu

Comments: 8 pages, 6 figures. Preliminary version submitted for documentation purposes on arXiv. This version records results presented at a conference and is not peer-reviewed

Subjects: Robotics (cs.RO); Optimization and Control (math.OC)

Exoskeletons are widely used in rehabilitation and industrial applications to assist human motion. However, direct human testing poses risks due to possible exoskeleton malfunctions and inconsistent movement replication. To provide a safer and more repeatable testing environment, this study employs a bipedal robot platform to reproduce human gait, allowing for controlled exoskeleton evaluations. A control strategy based on the State-Dependent Riccati Equation (SDRE) is formulated to achieve optimal torque control for accurate gait replication. The bipedal robot dynamics are represented using a double pendulum model, where SDRE-optimized control inputs minimize deviations from human motion trajectories. To align with motor behavior constraints, a parameterized control method is introduced to simplify the control process while effectively replicating human gait. The proposed approach initially adopts a ramping trapezoidal velocity model, which is then adapted into a piecewise linear velocity-time representation through motor command overwriting. This modification enables finer control over gait phase transitions while ensuring compatibility with motor dynamics. The corresponding cost function optimizes the control parameters to minimize errors in joint angles, velocities, and torques relative to the SDRE control result. By structuring velocity transitions in accordance with motor limitations, the method reduces the computational load associated with real-time control. Experimental results verify the feasibility of the proposed parameterized control method in reproducing human gait. The bipedal robot platform provides a reliable and repeatable testing mechanism for knee-type exoskeletons, offering insights into exoskeleton performance under controlled conditions.
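
The relation between a ramping trapezoidal velocity model and a piecewise linear velocity-time representation, both named in the abstract above, can be sketched as below. This is only an illustration: the function names, the symmetric ramp, and the numeric breakpoints are assumptions for this example and are not taken from the paper. It shows that a trapezoid is one particular set of breakpoints, and that a general breakpoint list allows extra segments (for example, at gait phase transitions).

```python
def trapezoidal_velocity(t, v_max, t_ramp, t_total):
    """Symmetric trapezoidal profile: ramp up, cruise at v_max, ramp down.

    Assumes 2 * t_ramp <= t_total (hypothetical parameterization).
    """
    if t <= 0.0 or t >= t_total:
        return 0.0
    if t < t_ramp:
        return v_max * t / t_ramp
    if t > t_total - t_ramp:
        return v_max * (t_total - t) / t_ramp
    return v_max

def piecewise_linear(t, breakpoints):
    """Linear interpolation over sorted (time, velocity) breakpoints."""
    if t <= breakpoints[0][0]:
        return breakpoints[0][1]
    for (t0, v0), (t1, v1) in zip(breakpoints, breakpoints[1:]):
        if t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return breakpoints[-1][1]

# The trapezoid expressed as four breakpoints (illustrative values).
trapezoid_points = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]

# A richer profile: one extra breakpoint changes the velocity mid-motion.
overwritten_points = [(0.0, 0.0), (1.0, 2.0), (2.0, 1.0), (3.0, 2.0), (4.0, 0.0)]
```

With this representation, the tunable control parameters are just the breakpoint times and velocities, which keeps the search space small.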
[13] arXiv:2506.04684: Title: Real-Time LPV-Based Non-Linear Model Predictive Control for Robust Trajectory Tracking in Autonomous Vehicles

Nitish Kumar, Rajalakshmi Pachamuthu

Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

This paper presents the development and implementation of a Model Predictive Control (MPC) framework for trajectory tracking in autonomous vehicles under diverse driving conditions. The proposed approach incorporates a modular architecture that integrates state estimation, vehicle dynamics modeling, and optimization to ensure real-time performance. The state-space equations are formulated in a Linear Parameter Varying (LPV) form, and a curvature-based tuning method is introduced to optimize weight matrices for varying trajectories. The MPC framework is implemented using the Robot Operating System (ROS) for parallel execution of state estimation and control optimization, ensuring scalability and minimal latency. Extensive simulations and real-time experiments were conducted on multiple predefined trajectories, demonstrating high accuracy with minimal cross-track and orientation errors, even under aggressive maneuvers and high-speed conditions. The results highlight the robustness and adaptability of the proposed system, achieving seamless alignment between simulated and real-world performance. This work lays the foundation for dynamic weight tuning and integration into cooperative autonomous navigation systems, paving the way for enhanced safety and efficiency in autonomous driving applications.
[14] arXiv:2506.04752: Title: Tire Wear Aware Trajectory Tracking Control for Multi-axle Swerve-drive Autonomous Mobile Robots

Tianxin Hu, Xinhang Xu, Thien-Minh Nguyen, Fen Liu, Shenghai Yuan, Lihua Xie

Comments: Accepted in Journal of Automation and Intelligence

Subjects: Robotics (cs.RO)

Multi-axle Swerve-drive Autonomous Mobile Robots (MS-AGVs) equipped with independently steerable wheels are commonly used for high-payload transportation. In this work, we present a novel model predictive control (MPC) method for MS-AGV trajectory tracking that takes tire wear minimization into consideration in the objective function. To speed up the problem-solving process, we propose a hierarchical controller design and simplify the dynamic model by integrating the magic formula tire model and a simplified tire wear model. In the experiment, the proposed method can be solved by simulated annealing in real time on a normal personal computer. By incorporating tire wear into the objective function, tire wear is reduced by 19.19% while maintaining the tracking accuracy in curve-tracking experiments. In a more challenging scene, where the desired trajectory is offset by 60 degrees from the vehicle's heading, the reduction in tire wear increases to 65.20% compared to the kinematic model without tire wear optimization.
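
For readers unfamiliar with the magic formula tire model named in the abstract above, the sketch below evaluates its standard form, F = D sin(C arctan(Bx - E(Bx - arctan(Bx)))). The coefficient values are illustrative assumptions, not parameters from the paper, and the paper's simplified tire wear model and MPC objective are not reproduced here.

```python
import math

def magic_formula(slip, B=10.0, C=1.9, D=1.0, E=0.97):
    """Normalized tire force for a given slip value (Pacejka magic formula).

    B (stiffness), C (shape), D (peak), and E (curvature) are illustrative
    defaults chosen for this example only.
    """
    bx = B * slip
    return D * math.sin(C * math.atan(bx - E * (bx - math.atan(bx))))

# Force rises steeply at small slip and then saturates near the peak factor D.
curve = [(s / 100.0, magic_formula(s / 100.0)) for s in range(0, 31, 5)]
```

The curve is odd in the slip variable and bounded by D, so a controller that penalizes large slip stays in the region where force responds almost linearly.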
[15] arXiv:2506.04842: Title: MineInsight: A Multi-sensor Dataset for Humanitarian Demining Robotics in Off-Road Environments

Mario Malizia, Charles Hamesse, Ken Hasselmann, Geert De Cubber, Nikolaos Tsiogkas, Eric Demeester, Rob Haelterman

Comments: This work has been submitted to the IEEE for possible publication

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

The use of robotics in humanitarian demining increasingly involves computer vision techniques to improve landmine detection capabilities. However, in the absence of diverse and realistic datasets, the reliable validation of algorithms remains a challenge for the research community. In this paper, we introduce MineInsight, a publicly available multi-sensor, multi-spectral dataset designed for off-road landmine detection. The dataset features 35 different targets (15 landmines and 20 commonly found objects) distributed along three distinct tracks, providing a diverse and realistic testing environment. MineInsight is, to the best of our knowledge, the first dataset to integrate dual-view sensor scans from both an Unmanned Ground Vehicle and its robotic arm, offering multiple viewpoints to mitigate occlusions and improve spatial awareness. It features two LiDARs, as well as images captured at diverse spectral ranges, including visible (RGB, monochrome), visible short-wave infrared (VIS-SWIR), and long-wave infrared (LWIR). Additionally, the dataset comes with an estimation of the location of the targets, offering a benchmark for evaluating detection algorithms. We recorded approximately one hour of data in both daylight and nighttime conditions, resulting in around 38,000 RGB frames, 53,000 VIS-SWIR frames, and 108,000 LWIR frames. MineInsight serves as a benchmark for developing and evaluating landmine detection algorithms. Our dataset is available at this https URL.
[16] arXiv:2506.04881: Title: Efficient Path Planning and Task Allocation Algorithm for Boolean Specifications

Ioana Hustiu, Roozbeh Abolpour, Cristian Mahulea, Marius Kloetzer

Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

This paper presents a novel path-planning and task assignment algorithm for multi-robot systems that should fulfill a global Boolean specification. The proposed method is based on Integer Linear Programming (ILP) formulations, which are combined with structural insights from Petri nets to improve scalability and computational efficiency. By proving that the constraint matrix is totally unimodular (TU) for certain classes of problems, the ILP formulation can be relaxed into a Linear Programming (LP) problem without losing the integrality of the solution. This relaxation eliminates complex combinatorial techniques, significantly reducing computational overhead and thus ensuring scalability for large-scale systems. Using the approach proposed in this paper, we can solve path-planning problems for teams of up to 500 robots. The method guarantees computational tractability, handles collision avoidance and reduces computational demands through iterative LP optimization techniques. Case studies demonstrate the efficiency of the algorithm in generating scalable, collision-free paths for large robot teams navigating in complex environments. While the conservative nature of collision avoidance introduces additional constraints, and thus, computational requirements, the solution remains practical and impactful for diverse applications. The algorithm is particularly applicable to real-world scenarios, including warehouse logistics where autonomous robots must efficiently coordinate tasks or search-and-rescue operations in various environments. This work contributes both theoretically and practically to scalable multi-robot path planning and task allocation, offering an efficient framework for coordinating autonomous agents in shared environments.
[17] arXiv:2506.04941: Title: ArtVIP: Articulated Digital Assets of Visual Realism, Modular Interaction, and Physical Fidelity for Robot Learning

Zhao Jin, Zhengping Che, Zhen Zhao, Kun Wu, Yuheng Zhang, Yinuo Zhao, Zehui Liu, Qiang Zhang, Xiaozhu Ju, Jing Tian, Yousong Xue, Jian Tang

Subjects: Robotics (cs.RO)

Robot learning increasingly relies on simulation to advance complex abilities such as dexterous manipulation and precise interaction, necessitating high-quality digital assets to bridge the sim-to-real gap. However, existing open-source articulated-object datasets for simulation are limited by insufficient visual realism and low physical fidelity, which hinder their utility for training models to master robotic tasks in the real world. To address these challenges, we introduce ArtVIP, a comprehensive open-source dataset comprising high-quality digital-twin articulated objects, accompanied by indoor-scene assets. Crafted by professional 3D modelers adhering to unified standards, ArtVIP ensures visual realism through precise geometric meshes and high-resolution textures, while physical fidelity is achieved via fine-tuned dynamic parameters. Meanwhile, the dataset pioneers embedded modular interaction behaviors within assets and pixel-level affordance annotations. Feature-map visualization and optical motion capture are employed to quantitatively demonstrate ArtVIP's visual and physical fidelity, with its applicability validated across imitation learning and reinforcement learning experiments. Provided in USD format with detailed production guidelines, ArtVIP is fully open-source, benefiting the research community and advancing robot learning research. Our project is at this https URL
[18] arXiv:2506.04942: Title: A Pillbug-Inspired Morphing Mechanism Covered with Sliding Shells

Jieyu Wang, Yingzhong Tian, Fengfeng Xi, Damien Chablat (LS2N, LS2N - équipe RoMas), Jianing Lin, Gaoke Ren, Yinjun Zhao

Journal-ref: Advances in Mechanism and Machine Science and Engineering in China, Springer Nature Singapore, pp.423-435, 2025, Lecture Notes in Mechanical Engineering

Subjects: Robotics (cs.RO)

This research proposes a novel morphing structure with shells inspired by the movement of pillbugs. Instead of the pillbug body, a loop-coupled mechanism based on slider-crank mechanisms is utilized to achieve the rolling up and spreading motion. This mechanism precisely imitates three distinct curves that mimic the shape morphing of a pillbug. To decrease the degrees of freedom (DOF) of the mechanism to one, scissor mechanisms are added. 3D curved shells are then attached to the tracer points of the morphing mechanism to safeguard it from attacks while allowing it to roll. Through type and dimensional synthesis, a complete system that includes shells and an underlying morphing mechanism is developed. A 3D model is created and tested to demonstrate the proposed system's shape-changing capability. Lastly, a robot with two modes is developed based on the proposed mechanism, which can curl up to roll down hills and can spread to move in a straight line via wheels.
[19] arXiv:2506.04982: Title: GEX: Democratizing Dexterity with Fully-Actuated Dexterous Hand and Exoskeleton Glove

Yunlong Dong, Xing Liu, Jun Wan, Zelin Deng

Subjects: Robotics (cs.RO)

This paper introduces GEX, an innovative low-cost dexterous manipulation system that combines the GX11 tri-finger anthropomorphic hand (11 DoF) with the EX12 tri-finger exoskeleton glove (12 DoF), forming a closed-loop teleoperation framework through kinematic retargeting for high-fidelity control. Both components employ modular 3D-printed finger designs, achieving ultra-low manufacturing costs while maintaining full actuation capabilities. Departing from conventional tendon-driven or underactuated approaches, our electromechanical system integrates independent joint motors across all 23 DoF, ensuring complete state observability and accurate kinematic modeling. This full-actuation architecture enables precise bidirectional kinematic calculations, substantially enhancing kinematic retargeting fidelity between the exoskeleton and robotic hand. The proposed system bridges the cost-performance gap in dexterous manipulation research, providing an accessible platform for acquiring high-quality demonstration data to advance embodied AI and dexterous robotic skill transfer learning.
[20] arXiv:2506.05012: Title: A Unified Framework for Simulating Strongly-Coupled Fluid-Robot Multiphysics

Jeong Hun Lee, Junzhe Hu, Sofia Kwok, Carmel Majidi, Zachary Manchester

Subjects: Robotics (cs.RO)

We present a framework for simulating fluid-robot multiphysics as a single, unified optimization problem. The coupled manipulator and incompressible Navier-Stokes equations governing the robot and fluid dynamics are derived together from a single Lagrangian using the principle of least action. We then employ discrete variational mechanics to derive a stable, implicit time-integration scheme for jointly simulating both the fluid and robot dynamics, which are tightly coupled by a constraint that enforces the no-slip boundary condition at the fluid-robot interface. Extending the classical immersed boundary method, we derive a new formulation of the no-slip constraint that is numerically well-conditioned and physically accurate for multibody systems commonly found in robotics. We demonstrate our approach's physical accuracy on benchmark computational fluid-dynamics problems, including Poiseuille flow and a disc in free stream. We then design a locomotion policy for a novel swimming robot in simulation and validate results on real-world hardware, showcasing our framework's sim-to-real capability for robotics tasks.
[21] arXiv:2506.05020: Title: Hierarchical Language Models for Semantic Navigation and Manipulation in an Aerial-Ground Robotic System

Haokun Liu, Zhaoqi Ma, Yunong Li, Junichiro Sugihara, Yicheng Chen, Jinjie Li, Moju Zhao

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Heterogeneous multi-robot systems show great potential in complex tasks requiring coordinated hybrid cooperation. However, traditional approaches relying on static models often struggle with task diversity and dynamic environments. This highlights the need for generalizable intelligence that can bridge high-level reasoning with low-level execution across heterogeneous agents. To address this, we propose a hierarchical framework integrating a prompted Large Language Model (LLM) and a GridMask-enhanced fine-tuned Vision Language Model (VLM). The LLM performs task decomposition and global semantic map construction, while the VLM extracts task-specified semantic labels and 2D spatial information from aerial images to support local planning. Within this framework, the aerial robot follows a globally optimized semantic path and continuously provides bird-view images, guiding the ground robot's local semantic navigation and manipulation, including target-absent scenarios where implicit alignment is maintained. Experiments on a real-world letter-cubes arrangement task demonstrate the framework's adaptability and robustness in dynamic environments. To the best of our knowledge, this is the first demonstration of an aerial-ground heterogeneous system integrating VLM-based perception with LLM-driven task reasoning and motion planning.
[22] arXiv:2506.05056 [pdf, html, other]: Title: PulseRide: A Robotic Wheelchair for Personalized Exertion Control with Human-in-the-Loop Reinforcement Learning

Azizul Zahid, Bibek Poudel, Danny Scott, Jason Scott, Scott Crouter, Weizi Li, Sai Swaminathan

Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC)

Maintaining an active lifestyle is vital for quality of life, yet challenging for wheelchair users. For instance, powered wheelchair users face increasing risks of obesity and deconditioning due to inactivity. Conversely, manual wheelchair users, who propel the wheelchair by pushing its handrims, often face upper extremity injuries from repetitive motions. These challenges underscore the need for a mobility system that promotes activity while minimizing injury risk. Maintaining optimal exertion during wheelchair use enhances health benefits and engagement, yet the variations in individual physiological responses complicate exertion optimization. To address this, we introduce PulseRide, a novel wheelchair system that provides personalized assistance based on each user's physiological responses, helping them maintain their physical exertion goals. Unlike conventional assistive systems focused on obstacle avoidance and navigation, PulseRide integrates real-time physiological data, such as heart rate and ECG, with wheelchair speed to deliver adaptive assistance. Using a human-in-the-loop reinforcement learning approach with the Deep Q-Network (DQN) algorithm, the system adjusts push assistance to keep users within a moderate activity range without under- or over-exertion. We conducted preliminary tests with 10 users on various terrains, including carpet and slate, to assess PulseRide's effectiveness. Our findings show that, for individual users, PulseRide maintains heart rates within the moderate activity zone as much as 71.7 percent longer than manual wheelchairs. Among all users, we observed an average reduction in muscle contractions of 41.86 percent, delaying fatigue onset and enhancing overall comfort and engagement. These results indicate that PulseRide offers a healthier, adaptive mobility solution, bridging the gap between passive and physically taxing mobility options.
[23] arXiv:2506.05064 [pdf, html, other]: Title: DemoSpeedup: Accelerating Visuomotor Policies via Entropy-Guided Demonstration Acceleration

Lingxiao Guo, Zhengrong Xue, Zijing Xu, Huazhe Xu

Subjects: Robotics (cs.RO)

Imitation learning has shown great promise in robotic manipulation, but the policy's execution is often unsatisfactorily slow because demonstrations collected by human operators are typically slow. In this work, we present DemoSpeedup, a self-supervised method to accelerate visuomotor policy execution via entropy-guided demonstration acceleration. DemoSpeedup starts by training an arbitrary generative policy (e.g., ACT or Diffusion Policy) on normal-speed demonstrations, which serves as a per-frame action entropy estimator. The key insight is that frames with lower action entropy estimates call for more consistent policy behaviors, which often indicates a demand for higher-precision operations. In contrast, frames with higher entropy estimates correspond to more casual sections, and therefore can be more safely accelerated. Thus, we segment the original demonstrations according to the estimated entropy, and accelerate them by down-sampling at rates that increase with the entropy values. Trained with the sped-up demonstrations, the resulting policies execute up to 3 times faster while maintaining the task completion performance. Interestingly, these policies could even achieve higher success rates than those trained with normal-speed demonstrations, due to the benefits of reduced decision-making horizons.
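The entropy-guided down-sampling idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `downsample_by_entropy`, the single `threshold`, and the two fixed rates are assumptions made for illustration, whereas the abstract only states that the down-sampling rate increases with the estimated entropy.

```python
from typing import List, Sequence


def downsample_by_entropy(
    frames: Sequence,
    entropy: Sequence[float],
    threshold: float,
    low_rate: int = 1,
    high_rate: int = 3,
) -> List:
    """Keep frames densely where entropy is low and sparsely where it is high.

    Hypothetical two-level scheme: a frame whose entropy estimate exceeds
    `threshold` is treated as a "casual" section and followed by a larger
    step (`high_rate`); otherwise the smaller step (`low_rate`) is used.
    """
    if len(frames) != len(entropy):
        raise ValueError("frames and entropy must have the same length")
    if low_rate < 1 or high_rate < 1:
        raise ValueError("rates must be positive integers")

    kept = []
    i = 0
    while i < len(frames):
        kept.append(frames[i])
        # The step size grows with entropy, so uncertain sections are shortened more.
        i += high_rate if entropy[i] > threshold else low_rate
    return kept
```

For example, with ten frames whose first four have low entropy and whose last six have high entropy, the first part is kept intact while the second part is thinned by a factor of three.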
[24] arXiv:2506.05092 [pdf, html, other]: Title: Synthetic Dataset Generation for Autonomous Mobile Robots Using 3D Gaussian Splatting for Vision Training

Aneesh Deogan, Wout Beks, Peter Teurlings, Koen de Vos, Mark van den Brand, Rene van de Molengraft

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Annotated datasets are critical for training neural networks for object detection, yet their manual creation is time- and labour-intensive, subject to human error, and often limited in diversity. This challenge is particularly pronounced in the domain of robotics, where diverse and dynamic scenarios further complicate the creation of representative datasets. To address this, we propose a novel method for automatically generating annotated synthetic data in Unreal Engine. Our approach leverages photorealistic 3D Gaussian splats for rapid synthetic data generation. We demonstrate that synthetic datasets can achieve performance comparable to that of real-world datasets while significantly reducing the time required to generate and annotate data. Additionally, combining real-world and synthetic data significantly increases object detection performance by leveraging the quality of real-world images with the easier scalability of synthetic data. To our knowledge, this is the first application of synthetic data for training object detection algorithms in the highly dynamic and varied environment of robot soccer. Validation experiments reveal that a detector trained on synthetic images performs on par with one trained on manually annotated real-world images when tested on robot soccer match scenarios. Our method offers a scalable and comprehensive alternative to traditional dataset creation, eliminating the labour-intensive error-prone manual annotation process. By generating datasets in a simulator where all elements are intrinsically known, we ensure accurate annotations while significantly reducing manual effort, which makes it particularly valuable for robotics applications requiring diverse and scalable training data.
[25] arXiv:2506.05106 [pdf, html, other]: Title: EDEN: Efficient Dual-Layer Exploration Planning for Fast UAV Autonomous Exploration in Large 3-D Environments

Qianli Dong, Xuebo Zhang, Shiyong Zhang, Ziyu Wang, Zhe Ma, Haobo Xi

Subjects: Robotics (cs.RO)

Efficient autonomous exploration in large-scale environments remains challenging due to the high planning computational cost and low-speed maneuvers. In this paper, we propose a fast and computationally efficient dual-layer exploration planning method. The insight of our dual-layer method is efficiently finding an acceptable long-term region routing and greedily exploring the target in the region of the first routing area with high speed. Specifically, the proposed method finds the long-term area routing through an approximate algorithm to ensure real-time planning in large-scale environments. Then, the viewpoint in the first routing region with the lowest curvature-penalized cost, which can effectively reduce decelerations caused by sharp turn motions, will be chosen as the next exploration target. To further speed up the exploration, we adopt an aggressive and safe exploration-oriented trajectory to enhance exploration continuity. The proposed method is compared to state-of-the-art methods in challenging simulation environments. The results show that the proposed method outperforms other methods in terms of exploration efficiency, computational cost, and trajectory speed. We also conduct real-world experiments to validate the effectiveness of the proposed method. The code will be open-sourced.
[26] arXiv:2506.05115 [pdf, html, other]: Title: Whole-Body Constrained Learning for Legged Locomotion via Hierarchical Optimization

Haoyu Wang, Ruyi Zhou, Liang Ding, Tie Liu, Zhelin Zhang, Peng Xu, Haibo Gao, Zongquan Deng

Subjects: Robotics (cs.RO)

Reinforcement learning (RL) has demonstrated impressive performance in legged locomotion over various challenging environments. However, due to the sim-to-real gap and lack of explainability, unconstrained RL policies deployed in the real world still suffer from inevitable safety issues, such as joint collisions, excessive torque, or foot slippage in low-friction environments. These problems limit the use of RL in missions with strict safety requirements, such as planetary exploration, nuclear facility inspection, and deep-sea operations. In this paper, we design a hierarchical optimization-based whole-body follower, which integrates both hard and soft constraints into the RL framework to make the robot move with better safety guarantees. Leveraging the advantages of model-based control, our approach allows for the definition of various types of hard and soft constraints during training or deployment, which allows for policy fine-tuning and mitigates the challenges of sim-to-real transfer. Meanwhile, it preserves the robustness of RL when dealing with locomotion in complex unstructured environments. The trained policy with introduced constraints was deployed on a hexapod robot and tested in various outdoor environments, including snow-covered slopes and stairs, demonstrating the great traversability and safety of our approach.
[27] arXiv:2506.05117 [pdf, html, other]: Title: Realizing Text-Driven Motion Generation on NAO Robot: A Reinforcement Learning-Optimized Control Pipeline

Zihan Xu, Mengxian Hu, Kaiyan Xiao, Qin Fang, Chengju Liu, Qijun Chen

Subjects: Robotics (cs.RO)

Human motion retargeting for humanoid robots, transferring human motion data to robots for imitation, presents significant challenges but offers considerable potential for real-world applications. Traditionally, this process relies on human demonstrations captured through pose estimation or motion capture systems. In this paper, we explore a text-driven approach to mapping human motion to humanoids. To address the inherent discrepancies between the generated motion representations and the kinematic constraints of humanoid robots, we propose an angle signal network based on norm-position and rotation loss (NPR Loss). It generates joint angles, which serve as inputs to a reinforcement learning-based whole-body joint motion control policy. The policy ensures tracking of the generated motions while maintaining the robot's stability during execution. Our experimental results demonstrate the efficacy of this approach, successfully transferring text-driven human motion to a real NAO humanoid robot.
[28] arXiv:2506.05165 [pdf, html, other]: Title: LiPo: A Lightweight Post-optimization Framework for Smoothing Action Chunks Generated by Learned Policies

Dongwoo Son, Suhan Park

Comments: 6 pages, 7 figures, 1 table

Subjects: Robotics (cs.RO)

Recent advances in imitation learning have enabled robots to perform increasingly complex manipulation tasks in unstructured environments. However, most learned policies rely on discrete action chunking, which introduces discontinuities at chunk boundaries. These discontinuities degrade motion quality and are particularly problematic in dynamic tasks such as throwing or lifting heavy objects, where smooth trajectories are critical for momentum transfer and system stability. In this work, we present a lightweight post-optimization framework for smoothing chunked action sequences. Our method combines three key components: (1) inference-aware chunk scheduling to proactively generate overlapping chunks and avoid pauses from inference delays; (2) linear blending in the overlap region to reduce abrupt transitions; and (3) jerk-minimizing trajectory optimization constrained within a bounded perturbation space. The proposed method was validated on a position-controlled robotic arm performing dynamic manipulation tasks. Experimental results demonstrate that our approach significantly reduces vibration and motion jitter, leading to smoother execution and improved mechanical robustness.
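Component (2) of the abstract above, linear blending in the overlap region between consecutive action chunks, can be sketched as follows. This is an illustrative assumption rather than the paper's code: the function name `blend_chunks`, the one-dimensional action representation, and the exact weight schedule are all hypothetical.

```python
from typing import List, Sequence


def blend_chunks(
    prev_chunk: Sequence[float],
    next_chunk: Sequence[float],
    overlap: int,
) -> List[float]:
    """Cross-fade two action chunks that overlap in time by `overlap` steps.

    The last `overlap` actions of `prev_chunk` are assumed to cover the same
    time steps as the first `overlap` actions of `next_chunk`.
    """
    if not 0 < overlap <= min(len(prev_chunk), len(next_chunk)):
        raise ValueError("overlap must be in 1..min(len(prev), len(next))")

    head = list(prev_chunk[:-overlap])   # part executed from the old chunk only
    tail = list(next_chunk[overlap:])    # part executed from the new chunk only

    blended = []
    for k in range(overlap):
        # Weight on the new chunk rises linearly across the overlap region.
        w = (k + 1) / (overlap + 1)
        blended.append((1.0 - w) * prev_chunk[-overlap + k] + w * next_chunk[k])
    return head + blended + tail
```

Blending a chunk that holds 0.0 with one that holds 1.0 over three overlapping steps yields a ramp through 0.25, 0.5, and 0.75 instead of a step change at the chunk boundary.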
[29] arXiv:2506.05168 [pdf, other]: Title: Fabrica: Dual-Arm Assembly of General Multi-Part Objects via Integrated Planning and Learning

Yunsheng Tian, Joshua Jacob, Yijiang Huang, Jialiang Zhao, Edward Gu, Pingchuan Ma, Annan Zhang, Farhad Javid, Branden Romero, Sachin Chitta, Shinjiro Sueda, Hui Li, Wojciech Matusik

Subjects: Robotics (cs.RO)

Multi-part assembly poses significant challenges for robots to execute long-horizon, contact-rich manipulation with generalization across complex geometries. We present Fabrica, a dual-arm robotic system capable of end-to-end planning and control for autonomous assembly of general multi-part objects. For planning over long horizons, we develop hierarchies of precedence, sequence, grasp, and motion planning with automated fixture generation, enabling general multi-step assembly on any dual-arm robot. The planner is made efficient through a parallelizable design and is optimized for downstream control stability. For contact-rich assembly steps, we propose a lightweight reinforcement learning framework that trains generalist policies across object geometries, assembly directions, and grasp poses, guided by equivariance and residual actions obtained from the plan. These policies transfer zero-shot to the real world and achieve 80% successful steps. For systematic evaluation, we propose a benchmark suite of multi-part assemblies resembling industrial and daily objects across diverse categories and geometries. By integrating efficient global planning and robust local control, we showcase the first system to achieve complete and generalizable real-world multi-part assembly without domain knowledge or human demonstrations. Project website: this http URL

[30] arXiv:2506.04399 (cross-list from cs.LG) [pdf, html, other]: Title: Unsupervised Meta-Testing with Conditional Neural Processes for Hybrid Meta-Reinforcement Learning

Suzan Ece Ada, Emre Ugur

Comments: Published in IEEE Robotics and Automation Letters Volume: 9, Issue: 10, 8427 - 8434, October 2024. 8 pages, 7 figures

Journal-ref: IEEE Robotics and Automation Letters Volume: 9, Issue: 10, 8427 - 8434, October 2024

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)

We introduce Unsupervised Meta-Testing with Conditional Neural Processes (UMCNP), a novel hybrid few-shot meta-reinforcement learning (meta-RL) method that uniquely combines, yet distinctly separates, parameterized policy gradient-based (PPG) and task inference-based few-shot meta-RL. Tailored for settings where the reward signal is missing during meta-testing, our method increases sample efficiency without requiring additional samples in meta-training. UMCNP leverages the efficiency and scalability of Conditional Neural Processes (CNPs) to reduce the number of online interactions required in meta-testing. During meta-training, samples previously collected through PPG meta-RL are efficiently reused for learning task inference in an offline manner. UMCNP infers the latent representation of the transition dynamics model from a single test task rollout with unknown parameters. This approach allows us to generate rollouts for self-adaptation by interacting with the learned dynamics model. We demonstrate our method can adapt to an unseen test task using significantly fewer samples during meta-testing than the baselines in 2D-Point Agent and continuous control meta-RL benchmarks, namely, cartpole with unknown angle sensor bias, walker agent with randomized dynamics parameters.
[31] arXiv:2506.04404 (cross-list from cs.NI) [pdf, html, other]: Title: A Framework Leveraging Large Language Models for Autonomous UAV Control in Flying Networks

Diana Nunes, Ricardo Amorim, Pedro Ribeiro, André Coelho, Rui Campos

Comments: 6 pages, 3 figures, 6 tables

Subjects: Networking and Internet Architecture (cs.NI); Robotics (cs.RO)

This paper proposes FLUC, a modular framework that integrates open-source Large Language Models (LLMs) with Unmanned Aerial Vehicle (UAV) autopilot systems to enable autonomous control in Flying Networks (FNs). FLUC translates high-level natural language commands into executable UAV mission code, bridging the gap between operator intent and UAV behaviour.
FLUC is evaluated using three open-source LLMs - Qwen 2.5, Gemma 2, and LLaMA 3.2 - across scenarios involving code generation and mission planning. Results show that Qwen 2.5 excels in multi-step reasoning, Gemma 2 balances accuracy and latency, and LLaMA 3.2 offers faster responses with lower logical coherence. A case study on energy-aware UAV positioning confirms FLUC's ability to interpret structured prompts and autonomously execute domain-specific logic, showing its effectiveness in real-time, mission-driven control.
[32] arXiv:2506.04500 (cross-list from cs.AI) [pdf, other]: Title: "Don't Do That!": Guiding Embodied Systems through Large Language Model-based Constraint Generation

Aladin Djuhera, Amin Seffo, Masataro Asai, Holger Boche

Comments: Preprint; under review

Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO)

Recent advancements in large language models (LLMs) have spurred interest in robotic navigation that incorporates complex spatial, mathematical, and conditional constraints from natural language into the planning problem. Such constraints can be informal yet highly complex, making them challenging to translate into a formal description that can be passed on to a planning algorithm. In this paper, we propose STPR, a constraint generation framework that uses LLMs to translate constraints (expressed as instructions on "what not to do") into executable Python functions. STPR leverages the LLM's strong coding capabilities to shift the problem description from language into structured and transparent code, thus circumventing complex reasoning and avoiding potential hallucinations. We show that these LLM-generated functions accurately describe even complex mathematical constraints, and apply them to point cloud representations with traditional search algorithms. Experiments in a simulated Gazebo environment show that STPR ensures full compliance across several constraints and scenarios, while having short runtimes. We also verify that STPR can be used with smaller, code-specific LLMs, making it applicable to a wide range of compact models at low inference cost.
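To make the idea of a "what not to do" instruction expressed as an executable Python function concrete, here is a minimal hand-written sketch. It is purely illustrative: the function names, the keep-out sphere, and its parameters are hypothetical and do not come from the paper, which does not specify the form of the generated functions in its abstract.

```python
import math
from typing import Callable, Iterable, List, Tuple

Point = Tuple[float, float, float]


def violates_keep_out(
    point: Point,
    center: Point = (2.0, 0.0, 0.0),
    radius: float = 1.5,
) -> bool:
    """Hypothetical constraint for "do not come within 1.5 m of the object at (2, 0, 0)"."""
    return math.dist(point, center) < radius


def allowed_points(
    cloud: Iterable[Point],
    constraint: Callable[[Point], bool],
) -> List[Point]:
    """Drop every point that violates the constraint before handing the cloud to a planner."""
    return [p for p in cloud if not constraint(p)]
```

A search-based planner could then restrict its expansion to `allowed_points(cloud, violates_keep_out)`, which keeps the constraint logic inspectable as ordinary code.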
[33] arXiv:2506.04867 (cross-list from cs.AI) [pdf, html, other]: Title: LLMs for sensory-motor control: Combining in-context and iterative learning

Jônata Tyska Carvalho, Stefano Nolfi

Comments: 24 pages (13 pages are from appendix), 6 figures, code for experiments replication and supplementary material provided at this https URL

Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Robotics (cs.RO)

We propose a method that enables large language models (LLMs) to control embodied agents by directly mapping continuous observation vectors to continuous action vectors. Initially, the LLMs generate a control strategy based on a textual description of the agent, its environment, and the intended goal. This strategy is then iteratively refined through a learning process in which the LLMs are repeatedly prompted to improve the current strategy, using performance feedback and sensory-motor data collected during its evaluation. The method is validated on classic control tasks from the Gymnasium library and the inverted pendulum task from the MuJoCo library. In most cases, it successfully identifies optimal or high-performing solutions by integrating symbolic knowledge derived through reasoning with sub-symbolic sensory-motor data gathered as the agent interacts with its environment.
[34] arXiv:2506.05250 (cross-list from cs.CV) [pdf, html, other]: Title: Spatiotemporal Contrastive Learning for Cross-View Video Localization in Unstructured Off-road Terrains

Zhiyun Deng, Dongmyeong Lee, Amanda Adkins, Jesse Quattrociocchi, Christian Ellis, Joydeep Biswas

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Robust cross-view 3-DoF localization in GPS-denied, off-road environments remains challenging due to (1) perceptual ambiguities from repetitive vegetation and unstructured terrain, and (2) seasonal shifts that significantly alter scene appearance, hindering alignment with outdated satellite imagery. To address this, we introduce MoViX, a self-supervised cross-view video localization framework that learns viewpoint- and season-invariant representations while preserving directional awareness essential for accurate localization. MoViX employs a pose-dependent positive sampling strategy to enhance directional discrimination and temporally aligned hard negative mining to discourage shortcut learning from seasonal cues. A motion-informed frame sampler selects spatially diverse frames, and a lightweight temporal aggregator emphasizes geometrically aligned observations while downweighting ambiguous ones. At inference, MoViX runs within a Monte Carlo Localization framework, using a learned cross-view matching module in place of handcrafted models. Entropy-guided temperature scaling enables robust multi-hypothesis tracking and confident convergence under visual ambiguity. We evaluate MoViX on the TartanDrive 2.0 dataset, training on under 30 minutes of data and testing over 12.29 km. Despite outdated satellite imagery, MoViX localizes within 25 meters of ground truth 93% of the time, and within 50 meters 100% of the time in unseen regions, outperforming state-of-the-art baselines without environment-specific tuning. We further demonstrate generalization on a real-world off-road dataset from a geographically distinct site with a different robot platform.
[35] arXiv:2506.05282 (cross-list from cs.CV) [pdf, html, other]: Title: Rectified Point Flow: Generic Point Cloud Pose Estimation

Tao Sun, Liyuan Zhu, Shengyu Huang, Shuran Song, Iro Armeni

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)

We introduce Rectified Point Flow, a unified parameterization that formulates pairwise point cloud registration and multi-part shape assembly as a single conditional generative problem. Given unposed point clouds, our method learns a continuous point-wise velocity field that transports noisy points toward their target positions, from which part poses are recovered. In contrast to prior work that regresses part-wise poses with ad-hoc symmetry handling, our method intrinsically learns assembly symmetries without symmetry labels. Together with a self-supervised encoder focused on overlapping points, our method achieves a new state-of-the-art performance on six benchmarks spanning pairwise registration and shape assembly. Notably, our unified formulation enables effective joint training on diverse datasets, facilitating the learning of shared geometric priors and consequently boosting accuracy. Project page: this https URL.

[36] arXiv:2401.04003 (replaced) [pdf, html, other]: Title: Simultaneous Task Allocation and Planning for Multi-Robots under Hierarchical Temporal Logic Specifications

Xusheng Luo, Changliu Liu

Comments: 20 pages, 11 figures. Accepted to appear in IEEE Transactions on Robotics 2025. Video this https URL

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Formal Languages and Automata Theory (cs.FL)

Research in robotic planning with temporal logic specifications, such as Linear Temporal Logic (LTL), has relied on single formulas. However, as task complexity increases, LTL formulas become lengthy, making them difficult to interpret and generate, and straining the computational capacities of planners. To address this, we introduce a hierarchical structure for a widely used specification type -- LTL on finite traces (LTL$_f$). The resulting language, termed H-LTL$_f$, is defined with both its syntax and semantics. We further prove that H-LTL$_f$ is more expressive than its standard "flat" counterparts. Moreover, we conducted a user study that compared the standard LTL$_f$ with our hierarchical version and found that users could more easily comprehend complex tasks using the hierarchical structure. We develop a search-based approach to synthesize plans for multi-robot systems, achieving simultaneous task allocation and planning. This method approximates the search space by loosely interconnected sub-spaces, each corresponding to an LTL$_f$ specification. The search primarily focuses on a single sub-space, transitioning to another under conditions determined by the decomposition of automata. We develop multiple heuristics to significantly expedite the search. Our theoretical analysis, conducted under mild assumptions, addresses completeness and optimality. Compared to existing methods used in various simulators for service tasks, our approach improves planning times while maintaining comparable solution quality.
[37] arXiv:2408.11978 (replaced) [pdf, html, other]: Title: Optimized Kalman Filter based State Estimation and Height Control in Hopping Robots

Samuel Burns, Matthew Woodward

Comments: 14 pages, 8 figures, 7 tables

Subjects: Robotics (cs.RO)

Rotor-based hopping locomotion significantly improves efficiency and operation time compared to purely flying systems. Most hopping robots use the liftoff states and an assumed ballistic trajectory to determine the hopping height. However, significant aerial phase forces (e.g., thrust and drag) can invalidate this assumption and lead to poor estimation performance. To combat this issue, a group has implemented multiple sensors (active and passive optical, inertial, and contact) and significant computational power to achieve full state estimation. This, however, poses a significant challenge to the development of light-weight, high-performance, low-observable, jamming and electronic interference resistant hopping systems, especially in perceptually degraded environments (e.g., dust, smoke). Here we show a training procedure for a coupled hopping phase and Kalman filter-based vertical state estimator, requiring only inertial measurements, which is able to learn the characteristics of the target system, sensors, locomotion behaviors, environment, and acceleration measurement aliasing conditions. The resulting estimator, given hop heights up to 4 m and velocities up to $\pm7$ m/s, achieves a mean absolute percent error in the hop apex height of 12.5% with an aerial trajectory average normalized mean absolute error in position and velocity of 19% and 16.5%, respectively, while operating at 840 Hz, on a dual-core 240 MHz processor, with a total robot mass of 672 g. Due to the low mass and computational power, the presented estimator could also be used as a degraded operational mode in cases of sensor damage, malfunction, or occlusion in more complex robots.
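The ballistic assumption mentioned in the abstract above can be written down in a few lines. This sketch shows only the baseline that the abstract says breaks down under thrust and drag, not the proposed Kalman filter-based estimator; the function name and the default gravity value are assumptions for illustration.

```python
def ballistic_apex_height(
    liftoff_height: float,
    liftoff_velocity: float,
    gravity: float = 9.81,
) -> float:
    """Apex height (m) under a purely ballistic model: h = z0 + v0^2 / (2 g).

    Only gravity acts during the aerial phase in this model, which is the
    assumption that significant thrust or drag invalidates.
    """
    if gravity <= 0.0:
        raise ValueError("gravity must be positive")
    if liftoff_velocity <= 0.0:
        # Not moving upward at liftoff: the liftoff height is already the apex.
        return liftoff_height
    return liftoff_height + liftoff_velocity ** 2 / (2.0 * gravity)
```

Under this model a liftoff velocity of 7 m/s from ground level gives an apex of roughly 2.5 m; any rotor thrust or aerodynamic drag during flight shifts the true apex away from that value.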
[38] arXiv:2409.08704 (replaced) [pdf, html, other]: Title: QueryCAD: Grounded Question Answering for CAD Models

Claudius Kienle, Benjamin Alt, Darko Katic, Rainer Jäkel, Jan Peters

Subjects: Robotics (cs.RO)

CAD models are widely used in industry and are essential for robotic automation processes. However, these models are rarely considered in novel AI-based approaches, such as the automatic synthesis of robot programs, as there are no readily available methods that would allow CAD models to be incorporated for the analysis, interpretation, or extraction of information. To address these limitations, we propose QueryCAD, the first system designed for CAD question answering, enabling the extraction of precise information from CAD models using natural language queries. QueryCAD incorporates SegCAD, an open-vocabulary instance segmentation model we developed to identify and select specific parts of the CAD model based on part descriptions. We further propose a CAD question answering benchmark to evaluate QueryCAD and establish a foundation for future research. Lastly, we integrate QueryCAD within an automatic robot program synthesis framework, validating its ability to enhance deep-learning solutions for robotics by enabling them to process CAD models (this https URL).
[39] arXiv:2409.11962 (replaced) [pdf, html, other]: Title: Reactive Collision Avoidance for Safe Agile Navigation

Alessandro Saviolo, Niko Picello, Jeffrey Mao, Rishabh Verma, Giuseppe Loianno

Subjects: Robotics (cs.RO)

Reactive collision avoidance is essential for agile robots navigating complex and dynamic environments, enabling real-time obstacle response. However, this task is inherently challenging because it requires a tight integration of perception, planning, and control, which traditional methods often handle separately, resulting in compounded errors and delays. This paper introduces a novel approach that unifies these tasks into a single reactive framework using solely onboard sensing and computing. Our method combines nonlinear model predictive control with adaptive control barrier functions, directly linking perception-driven constraints to real-time planning and control. Constraints are determined by using a neural network to refine noisy RGB-D data, enhancing depth accuracy, and selecting points with the minimum time-to-collision to prioritize the most immediate threats. To maintain a balance between safety and agility, a heuristic dynamically adjusts the optimization process, preventing overconstraints in real time. Extensive experiments with an agile quadrotor demonstrate effective collision avoidance across diverse indoor and outdoor environments, without requiring environment-specific tuning or explicit mapping.
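The step of selecting points with the minimum time-to-collision can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' pipeline: obstacles are treated as static, points are expressed relative to the robot, and the function names `time_to_collision` and `most_urgent_points` are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def time_to_collision(point: Vec3, velocity: Vec3) -> float:
    """Distance to the point divided by the speed at which the robot closes on it.

    `point` is the obstacle position relative to the robot and `velocity` is the
    robot velocity in the same frame. Returns infinity if the robot is not
    approaching the point.
    """
    dist = math.sqrt(sum(c * c for c in point))
    if dist == 0.0:
        return 0.0
    closing_speed = sum(p * v for p, v in zip(point, velocity)) / dist
    return dist / closing_speed if closing_speed > 0.0 else math.inf


def most_urgent_points(points: Sequence[Vec3], velocity: Vec3, k: int) -> List[Vec3]:
    """Return the k points with the smallest time-to-collision, most urgent first."""
    return heapq.nsmallest(k, points, key=lambda p: time_to_collision(p, velocity))
```

The selected points could then serve as the obstacles for which safety constraints are built, so that the optimization focuses on the most immediate threats rather than on the whole depth image.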
[40] arXiv:2409.17469 (replaced) [pdf, html, other]: Title: VertiSelector: Automatic Curriculum Learning for Wheeled Mobility on Vertically Challenging Terrain

Tong Xu, Chenhui Pan, Xuesu Xiao

Subjects: Robotics (cs.RO)

Reinforcement Learning (RL) has the potential to enable extreme off-road mobility by circumventing complex kinodynamic modeling, planning, and control through simulated end-to-end trial-and-error learning experiences. However, most RL methods are sample-inefficient when training in a large number of manually designed simulation environments and struggle to generalize to the real world. To address these issues, we introduce VertiSelector (VS), an automatic curriculum learning framework designed to enhance learning efficiency and generalization by selectively sampling training terrain. VS prioritizes vertically challenging terrain with higher Temporal Difference (TD) errors when revisited, thereby allowing robots to learn at the edge of their evolving capabilities. By dynamically adjusting the sampling focus, VS significantly boosts sample efficiency and generalization within the VW-Chrono simulator built on the Chrono multi-physics engine. Furthermore, we provide simulation and physical results using VS on a Verti-4-Wheeler platform. These results demonstrate that VS can achieve 23.08% improvement in terms of success rate by efficiently sampling during training and robustly generalizing to the real world.
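The idea of prioritizing terrains with higher TD errors can be illustrated with a simple proportional sampling scheme. This is a sketch under assumptions, not the VertiSelector algorithm itself: the function names, the proportional weighting, and the small `eps` floor are hypothetical choices.

```python
import random
from typing import List, Optional, Sequence


def sampling_probabilities(td_errors: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Turn per-terrain TD errors into sampling probabilities.

    Each terrain is weighted by |TD error| + eps, so terrains the policy
    currently predicts poorly are revisited more often. The eps floor keeps
    every terrain reachable when it is positive.
    """
    weights = [abs(e) + eps for e in td_errors]
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("at least one weight must be positive")
    return [w / total for w in weights]


def sample_terrain(
    terrains: Sequence,
    td_errors: Sequence[float],
    rng: Optional[random.Random] = None,
    eps: float = 1e-6,
):
    """Draw one training terrain with probability proportional to its TD error."""
    if len(terrains) != len(td_errors):
        raise ValueError("terrains and td_errors must have the same length")
    rng = rng or random.Random()
    probs = sampling_probabilities(td_errors, eps)
    return rng.choices(list(terrains), weights=probs, k=1)[0]
```

After each training episode the stored TD error for the visited terrain would be refreshed, so the sampling focus shifts as the policy improves.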
[41] arXiv:2502.06466 (replaced) [pdf, other]: Title: Inflatable Kirigami Crawlers

Burcu Seyidoğlu, Aida Parvaresh, Bahman Taherkhani, Ahmad Rafsanjani

Subjects: Robotics (cs.RO)

Kirigami offers unique opportunities for guided morphing by leveraging the geometry of the cuts. This work presents inflatable kirigami crawlers created by introducing cut patterns into heat-sealable textiles to achieve locomotion upon cyclic pneumatic actuation. Inflating traditional air pouches results in symmetric bulging and contraction. In inflated kirigami actuators, the accumulated compressive forces uniformly break the symmetry, enhance contraction twofold compared to simple air pouches, and trigger local rotation of the sealed edges that overlap and self-assemble into an architected surface with emerging scale-like features. As a result, the inflatable kirigami actuators exhibit a uniform, controlled contraction with asymmetric localized out-of-plane deformations. This process allows us to harness the geometric and material nonlinearities to imbue inflatable textile-based kirigami actuators with predictable locomotive functionalities. We thoroughly characterized the programmed deformations of these actuators and their impact on friction. We found that the kirigami actuators exhibit directional anisotropic friction properties when inflated, having higher friction coefficients against the direction of the movement, enabling them to move across surfaces with varying roughness. We further enhanced the functionality of inflatable kirigami actuators by introducing multiple channels and segments to create functional soft robotic prototypes with versatile locomotion capabilities.
[42] arXiv:2504.14170 (replaced) [pdf, html, other]: Title: Collision Induced Binding and Transport of Shape Changing Robot Pairs

Akash Vardhan, Ram Avinery, Hosain Bagheri, Velin Kojohourav, Shengkai Li, Hridesh Kedia, Tianyu Wang, Daniel Soto, Kurt Wiesenfeld, Daniel I. Goldman

Comments: 7 pages, 6 figures, submitted to PRL

Subjects: Robotics (cs.RO); Adaptation and Self-Organizing Systems (nlin.AO)

We report in experiment and simulation the spontaneous formation of dynamically bound pairs of shape changing robots undergoing locally repulsive collisions. These physical 'gliders' robustly emerge from an ensemble of individually undulating three-link two-motor robots and can remain bound for hundreds of undulations and travel for multiple robot dimensions. Gliders occur in two distinct binding symmetries and form over a wide range of angular oscillation extent. This parameter sets the maximal concavity which influences formation probability and translation characteristics. Analysis of dynamics in simulation reveals the mechanism of effective dynamical attraction -- a result of the emergent interplay of appropriately oriented and timed repulsive interactions. Tactile sensing stabilizes the short-lived conformation via concavity modulation.
[43] arXiv:2505.03448 (replaced) [pdf, html, other]: Title: AquaticVision: Benchmarking Visual SLAM in Underwater Environment with Events and Frames

Yifan Peng, Yuze Hong, Ziyang Hong, Apple Pui-Yi Chui, Junfeng Wu

Subjects: Robotics (cs.RO)

Many underwater applications, such as offshore asset inspections, rely on visual inspection and detailed 3D reconstruction. Recent advancements in underwater visual SLAM systems have garnered significant attention in marine robotics research. However, existing underwater visual SLAM datasets often lack ground truth trajectory data, making it difficult to objectively compare the performance of different SLAM algorithms based solely on qualitative results or COLMAP reconstruction. In this paper, we present a novel underwater dataset that includes ground truth trajectory data obtained using a motion capture system. Additionally, for the first time, we release visual data that includes both events and frames for benchmarking underwater visual positioning. By providing event camera data, we aim to facilitate the development of more robust and advanced underwater visual SLAM algorithms. The use of event cameras can help mitigate challenges posed by extremely low light or hazy underwater conditions. The webpage of our dataset is this https URL.
[44] arXiv:2505.09430 (replaced) [pdf, html, other]: Title: Mini Diffuser: Fast Multi-task Diffusion Policy Training Using Two-level Mini-batches

Yutong Hu, Pinhao Song, Kehan Wen, Renaud Detry

Subjects: Robotics (cs.RO); Machine Learning (cs.LG)

We present a method that reduces, by an order of magnitude, the time and memory needed to train multi-task vision-language robotic diffusion policies. This improvement arises from a previously underexplored distinction between action diffusion and the image diffusion techniques that inspired it: in image generation, the target is high-dimensional. By contrast, in action generation, the dimensionality of the target is comparatively small, and only the image condition is high-dimensional. Our approach, Mini Diffuser, exploits this asymmetry by introducing two-level minibatching, which pairs multiple noised action samples with each vision-language condition, instead of the conventional one-to-one sampling strategy. To support this batching scheme, we introduce architectural adaptations to the diffusion transformer that prevent information leakage across samples while maintaining full conditioning access. In RLBench simulations, Mini Diffuser achieves 95% of the performance of state-of-the-art multi-task diffusion policies, while using only 5% of the training time and 7% of the memory. Real-world experiments further validate that Mini Diffuser preserves the key strengths of diffusion-based policies, including the ability to model multimodal action distributions and produce behavior conditioned on diverse perceptual inputs. Code available at this http URL.
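The two-level minibatching idea described above can be pictured as an outer batch over conditions and an inner batch of noised action samples per condition. The sketch below is illustrative only: the variance-preserving noising formula, the uniform noise-level sampling, the dictionary layout, and all names are assumptions, and the architectural changes that prevent leakage across samples are not shown.

```python
import math
import random


def noised_action(action, t, rng):
    """Forward-diffuse a low-dimensional action at noise level t in [0, 1).

    Assumed schedule: x_t = sqrt(1 - t) * x_0 + sqrt(t) * eps.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in action]
    x_t = [math.sqrt(1.0 - t) * a + math.sqrt(t) * n
           for a, n in zip(action, noise)]
    return x_t, noise


def two_level_batch(conditions, actions, k, rng):
    """Pair each (expensive) condition with k cheap noised action samples."""
    batch = []
    for cond_id, (cond, action) in enumerate(zip(conditions, actions)):
        samples = []
        for _ in range(k):
            t = rng.random()  # noise level for this inner sample
            x_t, noise = noised_action(action, t, rng)
            samples.append({"t": t, "x_t": x_t, "target_noise": noise})
        # The condition appears once here, so an encoder would run once
        # per condition rather than once per noised sample.
        batch.append({"cond_id": cond_id, "condition": cond,
                      "samples": samples})
    return batch


# Hypothetical stand-ins for vision-language conditions and 2-D actions.
rng = random.Random(0)
conditions = ["obs_0", "obs_1", "obs_2"]
actions = [[0.1, -0.2], [0.0, 0.5], [0.3, 0.3]]
batch = two_level_batch(conditions, actions, k=4, rng=rng)
```

With 3 conditions and `k=4`, the batch holds 12 training targets while each high-dimensional condition is stored only once, which is the asymmetry the abstract refers to.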
[45] arXiv:2505.09833 (replaced) [pdf, html, other]: Title: Learning Rock Pushability on Rough Planetary Terrain

Tuba Girgin, Emre Girgin, Cagri Kilic

Comments: Paper presented at the Workshop on Field Robotics, ICRA 2025, Atlanta, GA, United States

Subjects: Robotics (cs.RO); Machine Learning (cs.LG)

In the context of mobile navigation in unstructured environments, the predominant approach entails the avoidance of obstacles. The prevailing path planning algorithms are contingent upon deviating from the intended path for an indefinite duration and returning to the closest point on the route after the obstacle is left behind spatially. However, avoiding an obstacle on a path that will be used repeatedly by multiple agents can hinder long-term efficiency and lead to a lasting reliance on an active path planning system. In this study, we propose an alternative approach to mobile navigation in unstructured environments by leveraging the manipulation capabilities of a robotic manipulator mounted on top of a mobile robot. Our proposed framework integrates exteroceptive and proprioceptive feedback to assess the push affordance of obstacles, facilitating their repositioning rather than avoidance. While our preliminary visual estimation takes into account the characteristics of both the obstacle and the surface it relies on, the push affordance estimation module exploits the force feedback obtained by interacting with the obstacle via a robotic manipulator as the guidance signal. The objective of our navigation approach is to enhance the efficiency of routes utilized by multiple agents over extended periods by reducing the overall time spent by a fleet in environments where autonomous infrastructure development is imperative, such as lunar or Martian surfaces.
[46] arXiv:2505.10033 (replaced) [pdf, html, other]: Title: Evaluating Robustness of Deep Reinforcement Learning for Autonomous Surface Vehicle Control in Field Tests

Luis F. W. Batista, Stéphanie Aravecchia, Seth Hutchinson, Cédric Pradalier

Comments: Presented at the 2025 IEEE ICRA Workshop on Field Robotics

Subjects: Robotics (cs.RO); Machine Learning (cs.LG)

Despite significant advancements in Deep Reinforcement Learning (DRL) for Autonomous Surface Vehicles (ASVs), their robustness in real-world conditions, particularly under external disturbances, remains insufficiently explored. In this paper, we evaluate the resilience of a DRL-based agent designed to capture floating waste under various perturbations. We train the agent using domain randomization and evaluate its performance in real-world field tests, assessing its ability to handle unexpected disturbances such as asymmetric drag and an off-center payload. We assess the agent's performance under these perturbations in both simulation and real-world experiments, quantifying performance degradation and benchmarking it against an MPC baseline. Results indicate that the DRL agent performs reliably despite significant disturbances. Along with the open-source release of our implementation, we provide insights into effective training strategies, real-world challenges, and practical considerations for deploying DRL-based ASV controllers.
[47] arXiv:2505.24266 (replaced) [pdf, html, other]: Title: SignBot: Learning Human-to-Humanoid Sign Language Interaction

Guanren Qiao, Sixu Lin, Ronglai Zuo, Zhizheng Wu, Kui Jia, Guiliang Liu

Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC)

Sign language is a natural and visual form of language that uses movements and expressions to convey meaning, serving as a crucial means of communication for individuals who are deaf or hard-of-hearing (DHH). However, the number of people proficient in sign language remains limited, highlighting the need for technological advancements to bridge communication gaps and foster interactions with minorities. Based on recent advancements in embodied humanoid robots, we propose SignBot, a novel framework for human-robot sign language interaction. SignBot integrates a cerebellum-inspired motion control component and a cerebral-oriented module for comprehension and interaction. Specifically, SignBot consists of: 1) Motion Retargeting, which converts human sign language datasets into robot-compatible kinematics; 2) Motion Control, which leverages a learning-based paradigm to develop a robust humanoid control policy for tracking sign language gestures; and 3) Generative Interaction, which incorporates translator, responser, and generator of sign language, thereby enabling natural and effective communication between robots and humans. Simulation and real-world experimental results demonstrate that SignBot can effectively facilitate human-robot interaction and perform sign language motions with diverse robots and datasets. SignBot represents a significant advancement in automatic sign language interaction on embodied humanoid robot platforms, providing a promising solution to improve communication accessibility for the DHH community.
[48] arXiv:2505.24305 (replaced) [pdf, html, other]: Title: SR3D: Unleashing Single-view 3D Reconstruction for Transparent and Specular Object Grasping

Mingxu Zhang, Xiaoqi Li, Jiahui Xu, Kaichen Zhou, Hojin Bae, Yan Shen, Chuyan Xiong, Jiaming Liu, Hao Dong

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Recent advancements in 3D robotic manipulation have improved grasping of everyday objects, but transparent and specular materials remain challenging due to depth sensing limitations. While several 3D reconstruction and depth completion approaches address these challenges, they suffer from setup complexity or limited use of the observed information. To address this, we leverage single-view 3D object reconstruction and propose SR3D, a training-free framework that enables robotic grasping of transparent and specular objects from a single-view observation. Specifically, given single-view RGB and depth images, SR3D first uses external visual models to generate a reconstructed 3D object mesh from the RGB image. Then, the key idea is to determine the 3D object's pose and scale to accurately localize the reconstructed object back into its original depth-corrupted 3D scene. Therefore, we propose view matching and keypoint matching mechanisms, which leverage the inherent 2D and 3D semantic and geometric information in the observation to determine the object's 3D state within the scene, thereby reconstructing an accurate 3D depth map for effective grasp detection. Experiments in both simulation and the real world show the reconstruction effectiveness of SR3D.
[49] arXiv:2506.01135 (replaced) [pdf, html, other]: Title: Understanding and Mitigating Network Latency Effect on Teleoperated-Robot with Extended Reality

Ziliang Zhang, Cong Liu, Hyoseung Kim

Comments: This document is a 5-page technical report version. Removed watermark from ACM for copyright purposes

Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC); Networking and Internet Architecture (cs.NI)

Robot teleoperation with extended reality (XR teleoperation) enables intuitive interaction by allowing remote robots to mimic user motions with real-time 3D feedback. However, existing systems face significant motion-to-motion (M2M) latency--the delay between the user's latest motion and the corresponding robot feedback--leading to high teleoperation error and mission completion time. This issue stems from the system's exclusive reliance on network communication, making it highly vulnerable to network degradation.
To address these challenges, we introduce TeleXR, the first end-to-end, fully open-sourced XR teleoperation framework that decouples robot control and XR visualization from network dependencies. TeleXR leverages local sensing data to reconstruct delayed or missing information of the counterpart, thereby significantly reducing network-induced issues. This approach allows both the XR and robot to run concurrently with network transmission while maintaining high robot planning accuracy. TeleXR also features contention-aware scheduling to mitigate GPU contention and bandwidth-adaptive point cloud scaling to cope with limited bandwidth.
[50] arXiv:2506.01759 (replaced) [pdf, html, other]: Title: ADEPT: Adaptive Diffusion Environment for Policy Transfer Sim-to-Real

Youwei Yu, Junhong Xu, Lantao Liu

Comments: arXiv admin note: substantial text overlap with arXiv:2410.10766

Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Model-free reinforcement learning has emerged as a powerful method for developing robust robot control policies capable of navigating through complex and unstructured environments. The effectiveness of these methods hinges on two essential elements: (1) the use of massively parallel physics simulations to expedite policy training, and (2) an environment generator tasked with crafting sufficiently challenging yet attainable environments to facilitate continuous policy improvement. Existing methods of outdoor environment generation often rely on heuristics constrained by a set of parameters, limiting the diversity and realism. In this work, we introduce ADEPT, a novel Adaptive Diffusion Environment for Policy Transfer in a zero-shot sim-to-real fashion that leverages Denoising Diffusion Probabilistic Models to dynamically expand existing training environments by adding more diverse and complex environments adaptive to the current policy. ADEPT guides the diffusion model's generation process through initial noise optimization, blending noise-corrupted environments from existing training environments weighted by the policy's performance in each corresponding environment. By manipulating the noise corruption level, ADEPT seamlessly transitions between generating similar environments for policy fine-tuning and novel ones to expand training diversity. To benchmark ADEPT in off-road navigation, we propose a fast and effective multi-layer map representation for wild environment generation. Our experiments show that the policy trained by ADEPT outperforms both procedurally generated and natural environments, along with popular navigation methods.
[51] arXiv:2506.03568 (replaced) [pdf, html, other]: Title: Confidence-Guided Human-AI Collaboration: Reinforcement Learning with Distributional Proxy Value Propagation for Autonomous Driving

Li Zeqiao, Wang Yijing, Wang Haoyu, Li Zheng, Li Peng, Zuo zhiqiang, Hu Chuan

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Autonomous driving promises significant advancements in mobility, road safety and traffic efficiency, yet reinforcement learning and imitation learning face safe-exploration and distribution-shift challenges. Although human-AI collaboration alleviates these issues, it often relies heavily on extensive human intervention, which increases costs and reduces efficiency. This paper develops a confidence-guided human-AI collaboration (C-HAC) strategy to overcome these limitations. First, C-HAC employs a distributional proxy value propagation method within the distributional soft actor-critic (DSAC) framework. By leveraging return distributions to represent human intentions, C-HAC achieves rapid and stable learning of human-guided policies with minimal human interaction. Subsequently, a shared control mechanism is activated to integrate the learned human-guided policy with a self-learning policy that maximizes cumulative rewards. This enables the agent to explore independently and continuously enhance its performance beyond human guidance. Finally, a policy confidence evaluation algorithm capitalizes on DSAC's return distribution networks to facilitate dynamic switching between human-guided and self-learning policies via a confidence-based intervention function. This ensures the agent can pursue optimal policies while maintaining safety and performance guarantees. Extensive experiments across diverse driving scenarios reveal that C-HAC significantly outperforms conventional methods in terms of safety, efficiency, and overall performance, achieving state-of-the-art results. The effectiveness of the proposed method is further validated through real-world road tests in complex traffic conditions. The videos and code are available at: this https URL.
[52] arXiv:2506.03896 (replaced) [pdf, html, other]: Title: FLIP: Flowability-Informed Powder Weighing

Nikola Radulov, Alex Wright, Thomas Little, Andrew I. Cooper, Gabriella Pizzuto

Comments: Paper video can be found at this https URL

Subjects: Robotics (cs.RO)

Autonomous manipulation of powders remains a significant challenge for robotic automation in scientific laboratories. The inherent variability and complex physical interactions of powders in flow, coupled with variability in laboratory conditions, necessitate adaptive automation. This work introduces FLIP, a flowability-informed powder weighing framework designed to enhance robotic policy learning for granular material handling. Our key contribution lies in using material flowability, quantified by the angle of repose, to optimise physics-based simulations through Bayesian inference. This yields material-specific simulation environments capable of generating accurate training data, which reflects diverse powder behaviours, for training "robot chemists". Building on this, FLIP integrates quantified flowability into a curriculum learning strategy, fostering efficient acquisition of robust robotic policies by gradually introducing more challenging, less flowable powders. We validate the efficacy of our method on a robotic powder weighing task under real-world laboratory conditions. Experimental results show that FLIP with a curriculum strategy achieves a low dispensing error of 2.12 +/- 1.53 mg, outperforming methods that do not leverage flowability data, such as domain randomisation (6.11 +/- 3.92 mg). These results demonstrate FLIP's improved ability to generalise to previously unseen, more cohesive powders and to new target masses.
[53] arXiv:2307.01916 (replaced) [pdf, html, other]: Title: Maximizing Seaweed Growth on Autonomous Farms: A Dynamic Programming Approach for Underactuated Systems Navigating on Uncertain Ocean Currents

Matthias Killer, Marius Wiggert, Hanna Krasowski, Manan Doshi, Pierre F.J. Lermusiaux, Claire J. Tomlin

Comments: 8 pages, submitted to IEEE Robotics and Automation Letters (RA-L) Matthias Killer and Marius Wiggert contributed equally to this work

Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Robotics (cs.RO)

Seaweed biomass presents a substantial opportunity for climate mitigation, yet to realize its potential, farming must be expanded to the vast open oceans. However, in the open ocean neither anchored farming nor floating farms with powerful engines are economically viable. Thus, a potential solution is farms that operate by going with the flow, utilizing minimal propulsion to strategically leverage beneficial ocean currents. In this work, we focus on low-power autonomous seaweed farms and design controllers that maximize seaweed growth by taking advantage of ocean currents. We first introduce a Dynamic Programming (DP) formulation to solve for the growth-optimal value function when the true currents are known. However, in reality only short-term imperfect forecasts with increasing uncertainty are available. Hence, we present three additional extensions. First, we use frequent replanning to mitigate forecast errors. Second, to optimize for long-term growth, we extend the value function beyond the forecast horizon by estimating the expected future growth based on seasonal average currents. Lastly, we introduce a discounted finite-time DP formulation to account for the increasing uncertainty in future ocean current estimates. We empirically evaluate our approach with 30-day simulations of farms in realistic ocean conditions. Our method achieves 95.8% of the best possible growth using only 5-day forecasts. This demonstrates that low-power propulsion is a promising method to operate autonomous seaweed farms in real-world conditions.
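To make the discounted finite-time DP idea concrete, here is a toy sketch of backward induction on a one-dimensional grid. Everything in it is an assumption for illustration: the five-cell world, the integer `drift` standing in for ocean currents, the per-cell `growth` rates, and the three-action thrust set. The paper's formulation concerns continuous dynamics on real current fields and is not reproduced here.

```python
def backward_induction(growth, drift, horizon, gamma):
    """Finite-horizon discounted DP: V_t(s) = max_a [g(s') + gamma * V_{t+1}(s')].

    s' = clamp(s + drift[s] + a), with weak thrust a in {-1, 0, +1}.
    Returns the value at t = 0 and the time-indexed policy.
    """
    n = len(growth)
    value = [0.0] * n  # terminal value V_T = 0
    policy = []
    for _ in range(horizon):
        new_value = [0.0] * n
        step_policy = [0] * n
        for s in range(n):
            best = None
            for a in (-1, 0, 1):
                nxt = min(max(s + drift[s] + a, 0), n - 1)
                q = growth[nxt] + gamma * value[nxt]
                if best is None or q > best:
                    best, step_policy[s] = q, a
            new_value[s] = best
        value = new_value
        policy.insert(0, step_policy)  # earliest time step first
    return value, policy


# Hypothetical growth rate per cell and current-induced drift per cell.
growth = [0.1, 0.2, 1.0, 0.3, 0.1]
drift = [0, -1, 0, 1, 0]
value, policy = backward_induction(growth, drift, horizon=5, gamma=0.9)
```

In this toy world the drift at cells 1 and 3 is as strong as the thrust, so a farm starting outside the high-growth cell 2 can hold position but never reach it. This mirrors, in a very simplified way, why an underactuated farm has to plan around the currents rather than fight them.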
[54] arXiv:2405.01114 (replaced) [pdf, html, other]: Title: Continual Learning from Simulated Interactions via Multitask Prospective Rehearsal for Bionic Limb Behavior Modeling

Sharmita Dey, Benjamin Paassen, Sarath Ravindran Nair, Sabri Boughorbel, Arndt F. Schilling

Comments: Accepted at Transactions on Machine Learning Research (TMLR) 2025

Subjects: Machine Learning (cs.LG); Robotics (cs.RO)

Lower limb amputations and neuromuscular impairments severely restrict mobility, necessitating advancements beyond conventional prosthetics. While motorized bionic limbs show promise, their effectiveness depends on replicating the dynamic coordination of human movement across diverse environments. In this paper, we introduce a model for human behavior in the context of bionic prosthesis control. Our approach leverages human locomotion demonstrations to learn the synergistic coupling of the lower limbs, enabling the prediction of the kinematic behavior of a missing limb during tasks such as walking, climbing inclines, and stairs. We propose a multitasking, continually adaptive model that anticipates and refines movements over time. At the core of our method is a technique called multitask prospective rehearsal that anticipates and synthesizes future movements based on the previous prediction and employs a corrective mechanism for subsequent predictions. Our evolving architecture merges lightweight, task-specific modules on a shared backbone, ensuring both specificity and scalability. We validate our model through experiments on real-world human gait datasets, including transtibial amputees, across a wide range of locomotion tasks. Results demonstrate that our approach consistently outperforms baseline models, particularly in scenarios with distributional shifts, adversarial perturbations, and noise.
[55] arXiv:2411.13983 (replaced) [pdf, html, other]: Title: Learning Two-agent Motion Planning Strategies from Generalized Nash Equilibrium for Model Predictive Control

Hansung Kim, Edward L. Zhu, Chang Seok Lim, Francesco Borrelli

Comments: Accepted Proceeding at 2025 Learning for Dynamics and Control Conference (L4DC)

Subjects: Multiagent Systems (cs.MA); Robotics (cs.RO); Systems and Control (eess.SY)

We introduce an Implicit Game-Theoretic MPC (IGT-MPC), a decentralized algorithm for two-agent motion planning that uses a learned value function that predicts the game-theoretic interaction outcomes as the terminal cost-to-go function in a model predictive control (MPC) framework, guiding agents to implicitly account for interactions with other agents and maximize their reward. This approach applies to competitive and cooperative multi-agent motion planning problems which we formulate as constrained dynamic games. Given a constrained dynamic game, we randomly sample initial conditions and solve for the generalized Nash equilibrium (GNE) to generate a dataset of GNE solutions, computing the reward outcome of each game-theoretic interaction from the GNE. The data is used to train a simple neural network to predict the reward outcome, which we use as the terminal cost-to-go function in an MPC scheme. We showcase emerging competitive and coordinated behaviors using IGT-MPC in scenarios such as two-vehicle head-to-head racing and un-signalized intersection navigation. IGT-MPC offers a novel method integrating machine learning and game-theoretic reasoning into model-based decentralized multi-agent motion planning.
[56] arXiv:2503.07323 (replaced) [pdf, other]: Title: Navigating Motion Agents in Dynamic and Cluttered Environments through LLM Reasoning

Yubo Zhao, Qi Wu, Yifan Wang, Yu-Wing Tai, Chi-Keung Tang

Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

This paper advances motion agents empowered by large language models (LLMs) toward autonomous navigation in dynamic and cluttered environments, significantly surpassing the first and more recent seminal but limited studies on LLMs' spatial reasoning, in which movement is restricted to four directions in simple, static environments containing only a single agent, let alone multiple agents. Specifically, we investigate LLMs as spatial reasoners to overcome these limitations by uniformly encoding environments (e.g., real indoor floorplans), agents (which can be dynamic obstacles), and their paths as discrete tokens akin to language tokens. Our training-free framework supports multi-agent coordination, closed-loop replanning, and dynamic obstacle avoidance without retraining or fine-tuning. We show that LLMs can generalize across agents, tasks, and environments using only text-based interactions, opening new possibilities for semantically grounded, interactive navigation in both simulation and embodied systems.
[57] arXiv:2506.01199 (replaced) [pdf, html, other]: Title: Test Automation for Interactive Scenarios via Promptable Traffic Simulation

Augusto Mondelli, Yueshan Li, Alessandro Zanardi, Emilio Frazzoli

Comments: Accepted by CVPR 2025 Workshop Data-Driven Autonomous Driving Simulation (track 1)

Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO)

Autonomous vehicle (AV) planners must undergo rigorous evaluation before widespread deployment on public roads, particularly to assess their robustness against the uncertainty of human behaviors. While recent advancements in data-driven scenario generation enable the simulation of realistic human behaviors in interactive settings, leveraging these models to construct comprehensive tests for AV planners remains an open challenge. In this work, we introduce an automated method to efficiently generate realistic and safety-critical human behaviors for AV planner evaluation in interactive scenarios. We parameterize complex human behaviors using low-dimensional goal positions, which are then fed into a promptable traffic simulator, ProSim, to guide the behaviors of simulated agents. To automate test generation, we introduce a prompt generation module that explores the goal domain and efficiently identifies safety-critical behaviors using Bayesian optimization. We apply our method to the evaluation of an optimization-based planner and demonstrate its effectiveness and efficiency in automatically generating diverse and realistic driving behaviors across scenarios with varying initial conditions.
[58] arXiv:2506.04086 (replaced) [pdf, html, other]: Title: Optimizing Mesh to Improve the Triangular Expansion Algorithm for Computing Visibility Regions

Jan Mikula (1 and 2), Miroslav Kulich (1) ((1) Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, (2) Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University in Prague)

Comments: 30 pages, 43 figures (including subfigures)

Journal-ref: SN Computer Science, Volume 5, article number 262, 2024

Subjects: Computational Geometry (cs.CG); Robotics (cs.RO)

This paper addresses the problem of improving the query performance of the triangular expansion algorithm (TEA) for computing visibility regions by finding the most advantageous instance of the triangular mesh, the preprocessing structure. The TEA recursively traverses the mesh while keeping track of the visible region, the set of all points visible from a query point in a polygonal world. We show that the measured query time is approximately proportional to the number of triangle edge expansions during the mesh traversal. We propose a new type of triangular mesh that minimizes the expected number of expansions assuming the query points are drawn from a known probability distribution. We design a heuristic method to approximate the mesh and evaluate the approach on many challenging instances that resemble real-world environments. The proposed mesh improves the mean query times by 12-16% compared to the reference constrained Delaunay triangulation. The approach is suitable to boost offline applications that require computing millions of queries without addressing the preprocessing time. The implementation is publicly available to replicate our experiments and serve the community.


New submissions (showing 29 of 29 entries)

Cross submissions (showing 6 of 6 entries)

Replacement submissions (showing 23 of 23 entries)