Publications
2024
- Ziping Xu, Zifan Xu, Runxuan Jiang, Peter Stone, and Ambuj Tewari. In Proceedings of the International Conference on Learning Representations (ICLR), May 2024
Multitask Reinforcement Learning (MTRL) approaches have gained increasing attention for their wide applications in many important Reinforcement Learning (RL) tasks. However, while recent advancements in MTRL theory have focused on improved statistical efficiency by assuming a shared structure across tasks, exploration, a crucial aspect of RL, has been largely overlooked. This paper addresses this gap by showing that when an agent is trained on a sufficiently diverse set of tasks, a generic policy-sharing algorithm with a myopic exploration design such as ε-greedy, which is inefficient in general, can be sample-efficient for MTRL. To the best of our knowledge, this is the first theoretical demonstration of the "exploration benefits" of MTRL. It may also shed light on the enigmatic success of the wide application of myopic exploration in practice. To validate the role of diversity, we conduct experiments on synthetic robotic control environments, where the diverse task set aligns with the task selection by automatic curriculum learning, which is empirically shown to improve sample efficiency.
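A minimal sketch of the kind of setup the abstract describes: a single table of value estimates shared across a set of related tabular tasks, each explored with plain ε-greedy. The task set, reward structure, and hyperparameters below are hypothetical placeholders, not the paper's benchmark or algorithm.

```python
# Minimal sketch (not the paper's algorithm): one Q-table shared across
# several related tabular tasks, each explored with plain epsilon-greedy.
import numpy as np

N_STATES, N_ACTIONS = 10, 4
rng = np.random.default_rng(0)

# Hypothetical task set: each task is a random deterministic transition/reward table.
tasks = [
    {"P": rng.integers(0, N_STATES, size=(N_STATES, N_ACTIONS)),
     "R": rng.random((N_STATES, N_ACTIONS))}
    for _ in range(5)
]

Q = np.zeros((N_STATES, N_ACTIONS))        # shared across all tasks (policy sharing)
eps, alpha, gamma = 0.1, 0.5, 0.95

for episode in range(200):
    task = tasks[episode % len(tasks)]     # round-robin over the diverse task set
    s = 0
    for _ in range(50):
        # Myopic exploration: epsilon-greedy on the shared Q-values.
        a = rng.integers(N_ACTIONS) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next, r = int(task["P"][s, a]), float(task["R"][s, a])
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print("Greedy action per state:", np.argmax(Q, axis=1))
```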
- Dexterous Legged Locomotion in Confined 3D Spaces with Reinforcement Learning. Zifan Xu, Amir Hossain Raj, Xuesu Xiao, and Peter Stone. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), May 2024
Recent advances in locomotion controllers utilizing deep reinforcement learning (RL) have yielded impressive results in terms of achieving rapid and robust locomotion across challenging terrain, such as rugged rocks, non-rigid ground, and slippery surfaces. However, while these controllers primarily address challenges underneath the robot, relatively little research has investigated legged mobility through confined 3D spaces, such as narrow tunnels or irregular voids, which impose all-around constraints. The cyclic gait patterns produced by existing RL-based methods, which learn parameterized locomotion skills characterized by motion parameters such as velocity and body height, may not be adequate for navigating robots through challenging confined 3D spaces, which require both agile 3D obstacle avoidance and robust legged locomotion. Instead, we propose to learn locomotion skills end-to-end from goal-oriented navigation in confined 3D spaces. To address the inefficiency of tracking distant navigation goals, we introduce a hierarchical locomotion controller that combines a classical planner tasked with planning waypoints to reach a faraway global goal location, and an RL-based policy trained to follow these waypoints by generating low-level motion commands. This approach allows the policy to explore its own locomotion skills within the entire solution space and facilitates smooth transitions between local goals, enabling long-term navigation towards distant goals. In simulation, our hierarchical approach succeeds at navigating through demanding confined 3D environments, outperforming both pure end-to-end learning approaches and parameterized locomotion skills. We further demonstrate the successful real-world deployment of our simulation-trained controller on a real robot.
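The hierarchical structure described above can be illustrated with a small sketch: a classical planner produces waypoints toward a distant goal, and a low-level policy (here a trivial proportional controller standing in for the trained RL policy) generates motion commands to track the current waypoint. The planner, controller, and geometry are simplified assumptions, not the paper's implementation.

```python
# Minimal sketch of a planner-plus-low-level-policy hierarchy on a 2D point robot.
import numpy as np

def plan_waypoints(start, goal, spacing=0.5):
    """Hypothetical 'classical planner': straight-line waypoints at fixed spacing."""
    diff = goal - start
    n = max(1, int(np.linalg.norm(diff) / spacing))
    return [start + diff * (i + 1) / n for i in range(n)]

def low_level_policy(state, waypoint, k=1.0):
    """Placeholder for the learned waypoint-following policy."""
    return k * (waypoint - state)           # velocity command toward the waypoint

state, goal = np.zeros(2), np.array([5.0, 3.0])
waypoints = plan_waypoints(state, goal)

dt = 0.1
for wp in waypoints:                         # hand local goals to the low-level policy
    while np.linalg.norm(wp - state) > 0.1:
        state = state + dt * low_level_policy(state, wp)

print("Reached:", np.round(state, 2))
```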
- LaRS: Latent Reasoning Skills for Chain-of-Thought Reasoning. Zifan Xu, Haozhu Wang, Dmitriy Bespalov, Xian Wu, Peter Stone, and 1 more author. In Findings of Empirical Methods in Natural Language Processing, Nov 2024
Chain-of-thought (CoT) prompting is a popular in-context learning (ICL) approach for large language models (LLMs), especially when tackling complex reasoning tasks. Traditional ICL approaches construct prompts using examples that contain questions similar to the input question. However, CoT prompting, which includes crucial intermediate reasoning steps (rationales) within its examples, necessitates selecting examples based on these rationales rather than the questions themselves. Existing methods require human experts or pre-trained LLMs to describe the skill, a high-level abstraction of rationales, to guide the selection. These methods, however, are often costly and difficult to scale. Instead, this paper introduces a new approach named Latent Reasoning Skills (LaRS) that employs unsupervised learning to create a latent space representation of rationales, with a latent variable called a reasoning skill. Concurrently, LaRS learns a reasoning policy to determine the required reasoning skill for a given question. Then the ICL examples are selected by aligning the reasoning skills between past examples and the question. This approach is theoretically grounded and compute-efficient, eliminating the need for auxiliary LLM inference or manual prompt design. Empirical results demonstrate that LaRS consistently outperforms SOTA skill-based selection methods, processing example banks four times faster, reducing LLM inferences during the selection stage by half, and showing greater robustness to sub-optimal example banks.
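A minimal sketch of skill-aligned example selection in the spirit of LaRS, not its training procedure: rationales in an example bank are embedded into a latent space, the new question is embedded into the same space by a stand-in for the reasoning policy, and the nearest examples are retrieved as CoT demonstrations. The bag-of-words random-projection encoder and the tiny example bank are assumptions for illustration only; in LaRS both mappings are learned with unsupervised objectives.

```python
# Minimal sketch: select CoT examples whose rationale "skill" embedding is
# closest to the skill predicted for the new question.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM = 5000, 16
W = rng.normal(size=(VOCAB, DIM))            # placeholder encoder weights

def embed(text):
    v = np.zeros(VOCAB)
    for tok in text.lower().split():
        v[hash(tok) % VOCAB] += 1.0
    z = v @ W
    return z / (np.linalg.norm(z) + 1e-8)    # latent "reasoning skill" vector

example_bank = [
    {"question": "What is 17 * 24?", "rationale": "Multiply 17 by 20 then add 17 * 4."},
    {"question": "If all cats are mammals...", "rationale": "Apply a syllogism over the two premises."},
    {"question": "How many apples remain?", "rationale": "Subtract the apples eaten from the total."},
]
bank_skills = np.stack([embed(ex["rationale"]) for ex in example_bank])

def select_examples(question, k=2):
    q_skill = embed(question)                 # stand-in for the learned reasoning policy
    scores = bank_skills @ q_skill            # similarity in skill space
    return [example_bank[i] for i in np.argsort(-scores)[:k]]

for ex in select_examples("What is 13 * 31?"):
    print(ex["question"])
```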
2023
- A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems. Megan M. Baker, Alexander New, Mario Aguilar-Simon, Ziad Al-Halah, Sébastien M. R. Arnold, and 40 more authors. Neural Networks, Mar 2023
Despite the advancement of machine learning techniques in recent years, state-of-the-art systems lack robustness to "real world" events, where the input distributions and tasks encountered by the deployed systems will not be limited to the original training context, and systems will instead need to adapt to novel distributions and tasks while deployed. This critical gap may be addressed through the development of "Lifelong Learning" systems that are capable of 1) Continuous Learning, 2) Transfer and Adaptation, and 3) Scalability. Unfortunately, efforts to improve these capabilities are typically treated as distinct areas of research that are assessed independently, without regard to the impact of each separate capability on other aspects of the system. We instead propose a holistic approach, using a suite of metrics and an evaluation framework to assess Lifelong Learning in a principled way that is agnostic to specific domains or system techniques. Through five case studies, we show that this suite of metrics can inform the development of varied and complex Lifelong Learning systems. We highlight how the proposed suite of metrics quantifies performance trade-offs present during Lifelong Learning system development - both the widely discussed Stability-Plasticity dilemma and the newly proposed relationship between Sample Efficient and Robust Learning. Further, we make recommendations for the formulation and use of metrics to guide the continuing development of Lifelong Learning systems and assess their progress in the future.
- Learning Real-world Autonomous Navigation by Self-Supervised Environment Synthesis. Zifan Xu, Anirudh Nair, Xuesu Xiao, and Peter Stone. In First Workshop on Photorealistic Image and Environment Synthesis for Robotics (PIES-Rob) at IROS 2023, Oct 2023
Machine learning approaches have recently enabled autonomous navigation for mobile robots in a data-driven manner. Since most existing learning-based navigation systems are trained with data generated in artificially created training environments, during real-world deployment at scale, it is inevitable that robots will encounter unseen scenarios, which are out of the training distribution and therefore lead to poor real-world performance. On the other hand, directly training in the real world is generally unsafe and inefficient. To address this issue, we introduce Self-supervised Environment Synthesis (SES), in which, after real-world deployment with safety and efficiency requirements, autonomous mobile robots can utilize experience from the real-world deployment, reconstruct navigation scenarios, and synthesize representative training environments in simulation. Training in these synthesized environments leads to improved future performance in the real world. The effectiveness of SES at synthesizing representative simulation environments and improving real-world navigation performance is evaluated via a large-scale deployment in a high-fidelity, realistic simulator and a small-scale deployment on a physical robot.
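A minimal sketch of the environment-synthesis idea under strong assumptions: occupancy grids standing in for scenarios reconstructed from deployment logs are clustered, and new training environments are sampled as perturbations of the cluster centers. The random grids, k-means clustering, and flip-based perturbation are illustrative placeholders, not the SES pipeline.

```python
# Minimal sketch: cluster "reconstructed" deployment scenarios, then sample
# new training environments around the representative cluster centers.
import numpy as np

rng = np.random.default_rng(0)
GRID = 20

# Hypothetical occupancy grids reconstructed from real-world deployment logs.
deployment_grids = (rng.random((50, GRID, GRID)) < 0.2).astype(float)

def kmeans(x, k=4, iters=20):
    centers = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        d = ((x[:, None] - centers[None]) ** 2).sum(axis=(2, 3))
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = x[labels == j].mean(axis=0)
    return centers

def synthesize(center, flip_prob=0.05):
    """Sample a new training environment near a representative scenario."""
    flips = rng.random(center.shape) < flip_prob
    return np.where(flips, 1.0 - np.round(center), np.round(center))

centers = kmeans(deployment_grids)
training_envs = [synthesize(centers[i % len(centers)]) for i in range(8)]
print(len(training_envs), "synthesized environments of shape", training_envs[0].shape)
```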
- Benchmarking Reinforcement Learning Techniques for Autonomous Navigation. Zifan Xu, Bo Liu, Anirudh Nair, Xuesu Xiao, and Peter Stone. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), May 2023
Deep reinforcement learning (RL) has brought many successes for autonomous robot navigation. However, there still exist important limitations that prevent real-world use of RL-based navigation systems. For example, most learning approaches lack safety guarantees, and learned navigation systems may not generalize well to unseen environments. Despite a variety of recent learning techniques to tackle these challenges in general, the lack of an open-source benchmark and reproducible learning methods specifically for autonomous navigation makes it difficult for roboticists to choose what learning methods to use for their mobile robots and for learning researchers to identify current shortcomings of general learning methods for autonomous navigation. In this paper, we identify four major desiderata of applying deep RL approaches for autonomous navigation: (D1) reasoning under uncertainty, (D2) safety, (D3) learning from limited trial-and-error data, and (D4) generalization to diverse and novel environments. Then, we explore four major classes of learning techniques with the purpose of achieving one or more of the four desiderata: memory-based neural network architectures (D1), safe RL (D2), model-based RL (D2, D3), and domain randomization (D4). By deploying these learning techniques in a new open-source large-scale navigation benchmark and real-world environments, we perform a comprehensive study aimed at establishing to what extent these techniques can achieve these desiderata for RL-based navigation systems.
- Model-Based Meta Automatic Curriculum Learning. Zifan Xu, Yulin Zhang, Shahaf S. Shperberg, Reuth Mirsky, Yuqian Jiang, and 2 more authors. In The Second Conference on Lifelong Learning Agents (CoLLAs), Aug 2023
Curriculum learning (CL) has been widely explored to facilitate the learning of hard-exploration tasks in reinforcement learning (RL) by training on a sequence of easier tasks, often called a curriculum. While most curricula are built either manually or automatically based on heuristics, e.g., choosing a training task that is barely beyond the current abilities of the learner, the fact that similar tasks might benefit from similar curricula motivates us to explore meta-learning as a technique for curriculum generation, or teaching, for a distribution of similar tasks. This paper formulates the meta CL problem, which requires a meta-teacher to generate a curriculum that assists the student in training toward any given target task from a task distribution, based on the similarity of these tasks to one another. We propose a model-based meta automatic curriculum learning algorithm (MM-ACL) that learns to predict the performance improvement on one task when the student is trained on another, given the current status of the student. This predictor can then be used to generate curricula for different target tasks. Our empirical results demonstrate that MM-ACL outperforms state-of-the-art CL algorithms in a grid-world domain and a more complex vision-based navigation domain in terms of sample efficiency.
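A minimal sketch of how a learned improvement predictor could drive curriculum generation, in the spirit of MM-ACL but not its implementation: `predict_improvement` is a hypothetical stand-in for the learned model, and the student's competence update is a toy assumption.

```python
# Minimal sketch: greedily build a curriculum using a (placeholder) learned
# predictor of how much training on a source task helps a target task.
import numpy as np

rng = np.random.default_rng(0)
N_TASKS = 6
affinity = rng.random((N_TASKS, N_TASKS))    # placeholder task-to-task transfer structure

def predict_improvement(source, target, student_status):
    """Hypothetical learned predictor of improvement on `target` from training on `source`."""
    headroom = 1.0 - student_status[target]   # less room to improve as competence grows
    return affinity[source, target] * headroom

def generate_curriculum(target, horizon=5):
    status = np.zeros(N_TASKS)                # student's competence per task
    curriculum = []
    for _ in range(horizon):
        # Greedily pick the training task predicted to help most on the target.
        scores = [predict_improvement(s, target, status) for s in range(N_TASKS)]
        chosen = int(np.argmax(scores))
        curriculum.append(chosen)
        status[chosen] += 0.3                  # pretend the student trained on it
        status[target] += scores[chosen] * 0.5
        status = np.clip(status, 0.0, 1.0)
    return curriculum

print("Curriculum toward task 3:", generate_curriculum(target=3))
```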
2022
- Autonomous Ground Navigation in Highly Constrained Spaces: Lessons Learned From the Benchmark Autonomous Robot Navigation Challenge at ICRA 2022. Xuesu Xiao, Zifan Xu, Zizhao Wang, Yunlong Song, Garrett Warnell, and 12 more authors. IEEE Robotics & Automation Magazine, Dec 2022
The BARN (Benchmark Autonomous Robot Navigation) Challenge took place at the 2022 IEEE International Conference on Robotics and Automation (ICRA 2022) in Philadelphia, PA. The aim of the challenge was to evaluate state-of-the-art autonomous ground navigation systems for moving robots through highly constrained environments in a safe and efficient manner. Specifically, the task was to navigate a standardized, differential-drive ground robot from a predefined start location to a goal location as quickly as possible without colliding with any obstacles, both in simulation and in the real world. Five teams from all over the world participated in the qualifying simulation competition, three of which were invited to compete with each other at a set of physical obstacle courses at the conference center in Philadelphia. The competition results suggest that autonomous ground navigation in highly constrained spaces, despite its ostensible simplicity even to experienced roboticists, is actually far from a solved problem. In this article, we discuss the challenge, the approaches used by the top three winning teams, and lessons learned to direct future research.
- APPL: Adaptive Planner Parameter Learning. Xuesu Xiao, Zizhao Wang, Zifan Xu, Bo Liu, Gauraang Dhamankar, and 3 more authors. Robotics and Autonomous Systems, May 2022
While current autonomous navigation systems allow robots to successfully drive themselves from one point to another in specific environments, they typically require extensive manual parameter re-tuning by human robotics experts in order to function in new environments. Furthermore, even for just one complex environment, a single set of fine-tuned parameters may not work well in different regions of that environment. These problems prohibit reliable mobile robot deployment by non-expert users. As a remedy, we propose Adaptive Planner Parameter Learning (APPL), a machine learning framework that can leverage non-expert human interaction via several modalities, including teleoperated demonstrations, corrective interventions, and evaluative feedback, as well as unsupervised reinforcement learning, to learn a parameter policy that can dynamically adjust the parameters of classical navigation systems in response to changes in the environment. APPL inherits safety and explainability from classical navigation systems while also enjoying the benefits of machine learning, i.e., the ability to adapt and improve from experience. We present a suite of individual APPL methods and also a unifying cycle-of-learning scheme that combines all the proposed methods in a framework that can improve navigation performance through continual, iterative human interaction and simulation training.
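A minimal sketch of the adaptive-parameter idea, assuming a toy linear policy and made-up parameter names: a small learned policy maps a local context observation to parameters of a classical planner, which continues to do the actual planning. None of the feature names, parameter ranges, or the policy itself come from the APPL papers.

```python
# Minimal sketch: map context features to classical-planner parameters.
import numpy as np

rng = np.random.default_rng(0)

PARAM_BOUNDS = {                 # hypothetical planner parameters and their ranges
    "max_vel_x":        (0.2, 2.0),
    "inflation_radius": (0.1, 0.6),
    "sampling_rate":    (5.0, 40.0),
}

W = rng.normal(scale=0.1, size=(len(PARAM_BOUNDS), 3))   # stands in for a trained policy

def parameter_policy(obs):
    """Map context features (e.g., obstacle density, clearance, speed) to planner params."""
    raw = 1.0 / (1.0 + np.exp(-(W @ obs)))                # squash to [0, 1]
    return {name: lo + r * (hi - lo)
            for (name, (lo, hi)), r in zip(PARAM_BOUNDS.items(), raw)}

def reconfigure_planner(params):
    """Placeholder for pushing parameters to a classical navigation stack."""
    print({k: round(v, 2) for k, v in params.items()})

# In a tight corridor (high obstacle density, low clearance) vs. open space:
reconfigure_planner(parameter_policy(np.array([0.9, 0.1, 0.3])))
reconfigure_planner(parameter_policy(np.array([0.1, 0.9, 0.8])))
```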
- Task Factorization in Curriculum Learning. Reuth Mirsky, Shahaf S. Shperberg, Yulin Zhang, Zifan Xu, Yuqian Jiang, and 2 more authors. In Decision Awareness in Reinforcement Learning (DARL) Workshop at the 39th International Conference on Machine Learning (ICML), Jul 2022
A common challenge when applying learning to a complex “target” task is that learning the task all at once can be too difficult due to inefficient exploration under a sparse reward signal. Curriculum Learning addresses this challenge by sequencing training tasks for a learner to facilitate gradual learning. One of the crucial steps in finding a suitable curriculum learning approach is to understand the dimensions along which the domain can be factorized. In this paper, we identify different types of factorizations common in the literature of curriculum learning for reinforcement learning tasks: factorizations that involve the agent, the environment, or the mission. For each factorization category, we identify the relevant algorithms and techniques that leverage that factorization and present several case studies to showcase how leveraging an appropriate factorization can boost learning using a simple curriculum.
- Model-Based Meta Automatic Curriculum Learning. Zifan Xu, Yulin Zhang, Shahaf S. Shperberg, Reuth Mirsky, Yulin Zhan, and 3 more authors. In Decision Awareness in Reinforcement Learning (DARL) Workshop at the 39th International Conference on Machine Learning (ICML), Jul 2022
When an agent trains for one target task, its experience is expected to be useful for training on another target task. This paper formulates the meta curriculum learning problem of building a sequence of intermediate training tasks, called a curriculum, that assists the learner in training toward any given target task. We propose a model-based meta automatic curriculum learning algorithm (MM-ACL) that learns to predict the performance improvement on one task when the policy is trained on another, given contextual information such as the history of training tasks, loss functions, and rollout state-action trajectories from the policy. This predictor facilitates the generation of curricula that optimize the performance of the learner on different target tasks. Our empirical results demonstrate that MM-ACL outperforms a random curriculum, a manually created curriculum, and a commonly used non-stationary bandit algorithm in a GridWorld domain.
- Causal Dynamics Learning for Task-Independent State Abstraction. Zizhao Wang, Xuesu Xiao, Zifan Xu, Yuke Zhu, and Peter Stone. In Proceedings of the 39th International Conference on Machine Learning (ICML 2022), Jul 2022
Learning dynamics models accurately is an important goal for Model-Based Reinforcement Learning (MBRL), but most MBRL methods learn a dense dynamics model that is vulnerable to spurious correlations and therefore generalizes poorly to unseen states. In this paper, we introduce Causal Dynamics Learning for Task-Independent State Abstraction (CDL), which first learns a theoretically grounded causal dynamics model that removes unnecessary dependencies between state variables and the action, thus generalizing well to unseen states. A state abstraction can then be derived from the learned dynamics, which not only improves sample efficiency but also applies to a wider range of tasks than existing state abstraction methods. Evaluated on two simulated environments and downstream tasks, both the dynamics model and the policies learned by the proposed method generalize well to unseen states, and the derived state abstraction improves sample efficiency compared to learning without it.
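A minimal sketch of deriving a state abstraction from a causal dependency mask, assuming the mask has already been learned. The variable names and dependencies below are invented for illustration; this is not CDL's learning procedure.

```python
# Minimal sketch: keep only state variables that can causally influence the
# reward-relevant variables, given a learned binary dependency mask where
# mask[i][j] = 1 means next-step variable j depends on current variable i.
import numpy as np

state_vars = ["robot_pos", "robot_vel", "object_pos", "distractor", "light_level"]
n = len(state_vars)

# Hypothetical learned causal mask over (current variable -> next variable).
mask = np.zeros((n, n), dtype=int)
mask[0, 0] = mask[1, 0] = 1      # robot_pos' depends on robot_pos, robot_vel
mask[1, 1] = 1                   # robot_vel' depends on robot_vel (and the action)
mask[0, 2] = mask[2, 2] = 1      # object_pos' depends on robot_pos, object_pos
mask[3, 3] = 1                   # distractor evolves on its own
mask[4, 4] = 1                   # light_level evolves on its own

def abstraction(reward_vars):
    """Keep every variable that can (transitively) influence a reward-relevant variable."""
    keep = set(reward_vars)
    changed = True
    while changed:
        changed = False
        for i in range(n):
            if i not in keep and any(mask[i, j] for j in keep):
                keep.add(i)
                changed = True
    return sorted(state_vars[i] for i in keep)

# For a task whose reward depends only on object_pos (index 2):
print(abstraction(reward_vars={2}))   # distractor and light_level are abstracted away
```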
2021
- Machine Learning Methods for Local Motion Planning: A Study of End-to-End vs. Parameter Learning. Zifan Xu, Xuesu Xiao, Garrett Warnell, Anirudh Nair, and Peter Stone. In Proceedings of the 2021 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR 2021), Oct 2021
While decades of research efforts have been devoted to developing classical autonomous navigation systems to move robots from one point to another in a collision-free manner, machine learning approaches to navigation have been recently proposed to learn navigation behaviors from data. Two representative paradigms are end-to-end learning (directly from perception to motion) and parameter learning (from perception to parameters used by a classical underlying planner). These two types of methods are believed to have complementary pros and cons: parameter learning is expected to be robust to different scenarios, have provable guarantees, and exhibit explainable behaviors; end-to-end learning does not require extensive engineering and has the potential to outperform approaches that rely on classical systems. However, these beliefs have not been verified through real-world experiments in a comprehensive way. In this paper, we report on an extensive study to compare end-to-end and parameter learning for local motion planners in a large suite of simulated and physical experiments. In particular, we test the performance of end-to-end motion policies, which directly compute raw motor commands, and parameter policies, which compute parameters to be used by classical planners, with different inputs (e.g., raw sensor data, costmaps), and provide an analysis of the results.
- APPLR: Adaptive Planner Parameter Learning from Reinforcement. Zifan Xu, Gauraang Dhamankar, Anirudh Nair, Xuesu Xiao, Garrett Warnell, and 3 more authors. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Jun 2021
Classical navigation systems typically operate using a fixed set of hand-picked parameters (e.g., maximum speed, sampling rate, inflation radius) and require heavy expert re-tuning in order to work in new environments. To mitigate this requirement, it has been proposed to learn parameters for different contexts in a new environment using human demonstrations collected via teleoperation. However, learning from human demonstration limits deployment to the training environment, and limits overall performance to that of a potentially suboptimal demonstrator. In this paper, we introduce APPLR, Adaptive Planner Parameter Learning from Reinforcement, which allows existing navigation systems to adapt to new scenarios by using a parameter selection scheme discovered via reinforcement learning (RL) in a wide variety of simulation environments. We evaluate APPLR on a robot in both simulated and physical experiments, and show that it can outperform both a fixed set of hand-tuned parameters and also a dynamic parameter tuning scheme learned from human demonstration.
- A Scavenger Hunt for Service Robots. Harel Yedidsion, Jennifer Suriadinata, Zifan Xu, Stefan Debruyn, and Peter Stone. In Proceedings of the 2021 International Conference on Robotics and Automation (ICRA), May 2021
Creating robots that can perform general-purpose service tasks in a human-populated environment has been a longstanding grand challenge for AI and Robotics research. One particularly valuable skill that is relevant to a wide variety of tasks is the ability to locate and retrieve objects upon request. This paper models this skill as a Scavenger Hunt (SH) game, which we formulate as a variation of the NP-hard stochastic traveling purchaser problem. In this problem, the goal is to find a set of objects as quickly as possible, given probability distributions of where they may be found. We investigate the performance of several solution algorithms for the SH problem, both in simulation and on a real mobile robot. We use Reinforcement Learning (RL) to train an agent to plan a minimal cost path, and show that the RL agent can outperform a range of heuristic algorithms, achieving near optimal performance. In order to stimulate research on this problem, we introduce a publicly available software stack and associated website that enable users to upload scavenger hunts which robots can download, perform, and learn from to continually improve their performance on future hunts.
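A minimal sketch of a greedy baseline for this kind of scavenger hunt, not the paper's RL agent or software stack: given a probability that each object is at each location and pairwise travel costs, repeatedly visit the location with the highest expected number of still-missing objects per unit travel cost. The probabilities and costs are random placeholders.

```python
# Minimal sketch: greedy expected-objects-per-cost heuristic for a scavenger hunt.
import numpy as np

rng = np.random.default_rng(0)
n_locations, n_objects = 6, 4
prob = rng.dirichlet(np.ones(n_locations), size=n_objects)   # P(object o is at location l)
cost = rng.random((n_locations, n_locations)) + 0.1          # hypothetical travel costs
np.fill_diagonal(cost, 0.0)

def greedy_hunt(start=0):
    current, remaining = start, set(range(n_objects))
    unvisited = set(range(n_locations)) - {start}
    route = [start]
    while remaining and unvisited:
        def score(loc):
            expected_found = sum(prob[o, loc] for o in remaining)
            return expected_found / cost[current, loc]
        nxt = max(unvisited, key=score)
        route.append(nxt)
        unvisited.discard(nxt)
        # Simulate which remaining objects turn out to be at this location.
        remaining = {o for o in remaining if rng.random() > prob[o, nxt]}
        current = nxt
    return route

print("Visit order:", greedy_hunt())
```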