Zifan Xu

I am a Ph.D. student in Computer Science at the University of Texas at Austin. I am honored to be advised by Prof. Peter Stone. I have general interests in reinforcement learning, multitask learning, curriculum learning, and their applications in robotics. My primary focus is applying deep reinforcement learning to learn agile and diverse motor skills for legged locomotion and autonomous navigation.

Contact: Email / GitHub / Google Scholar / Twitter

News

Oct 15, 2024	Our paper LaRS was accepted to Findings of EMNLP 2024!
May 11, 2024	We presented one paper on legged locomotion in confined 3D space at ICRA 2024.
May 04, 2024	We presented one paper on multitask RL at ICLR 2024.

Selected Publications

🤖 RL for Robotics

Dexterous Legged Locomotion in Confined 3D Spaces with Reinforcement Learning

Zifan Xu, Amir Hossain Raj, Xuesu Xiao, and Peter Stone

In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), May 2024

Abs Bib PDF

Recent advances in locomotion controllers utilizing deep reinforcement learning (RL) have yielded impressive results in terms of achieving rapid and robust locomotion across challenging terrain, such as rugged rocks, non-rigid ground, and slippery surfaces. However, while these controllers primarily address challenges underneath the robot, relatively little research has investigated legged mobility through confined 3D spaces, such as narrow tunnels or irregular voids, which impose all-around constraints. The cyclic gait patterns resulting from existing RL-based methods, which learn parameterized locomotion skills characterized by motion parameters such as velocity and body height, may not be adequate to navigate robots through challenging confined 3D spaces, which require both agile 3D obstacle avoidance and robust legged locomotion. Instead, we propose to learn locomotion skills end-to-end from goal-oriented navigation in confined 3D spaces. To address the inefficiency of tracking distant navigation goals, we introduce a hierarchical locomotion controller that combines a classical planner tasked with planning waypoints to reach a faraway global goal location, and an RL-based policy trained to follow these waypoints by generating low-level motion commands. This approach allows the policy to explore its own locomotion skills within the entire solution space and facilitates smooth transitions between local goals, enabling long-term navigation towards distant goals. In simulation, our hierarchical approach succeeds at navigating through demanding confined 3D environments, outperforming both pure end-to-end learning approaches and parameterized locomotion skills. We further demonstrate the successful real-world deployment of our simulation-trained controller on a real robot.
@inproceedings{ICRA2024-Xu, author = {Xu, Zifan and Raj, Amir Hossain and Xiao, Xuesu and Stone, Peter}, title = {Dexterous Legged Locomotion in Confined 3D Spaces with Reinforcement Learning}, booktitle = {Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA)}, year = {2024}, month = may, location = {Vienna, Austria}, }
Benchmarking Reinforcement Learning Techniques for Autonomous Navigation

Zifan Xu, Bo Liu, Anirudh Nair, Xuesu Xiao, and Peter Stone

In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), May 2023

Abs Bib PDF

Deep reinforcement learning (RL) has brought many successes for autonomous robot navigation. However, there still exist important limitations that prevent real-world use of RL-based navigation systems. For example, most learning approaches lack safety guarantees, and learned navigation systems may not generalize well to unseen environments. Despite a variety of recent learning techniques to tackle these challenges in general, a lack of an open-source benchmark and reproducible learning methods specifically for autonomous navigation makes it difficult for roboticists to choose what learning methods to use for their mobile robots and for learning researchers to identify current shortcomings of general learning methods for autonomous navigation. In this paper, we identify four major desiderata of applying deep RL approaches for autonomous navigation: (D1) reasoning under uncertainty, (D2) safety, (D3) learning from limited trial-and-error data, and (D4) generalization to diverse and novel environments. Then, we explore four major classes of learning techniques with the purpose of achieving one or more of the four desiderata: memory-based neural network architectures (D1), safe RL (D2), model-based RL (D2, D3), and domain randomization (D4). By deploying these learning techniques in a new open-source large-scale navigation benchmark and real-world environments, we perform a comprehensive study aimed at establishing to what extent these techniques can achieve these desiderata for RL-based navigation systems.
@inproceedings{2023ICRA-Xu, title = {Benchmarking Reinforcement Learning Techniques for Autonomous Navigation}, author = {Xu, Zifan and Liu, Bo and Nair, Anirudh and Xiao, Xuesu and Stone, Peter}, booktitle = {Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA)}, location = {London, England}, month = may, year = {2023}, }
APPLR: Adaptive Planner Parameter Learning from Reinforcement

Zifan Xu, Gauraang Dhamankar, Anirudh Nair, Xuesu Xiao, Garrett Warnell, and 3 more authors

In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Jun 2021

Abs Bib PDF Video Code Slides Website

Classical navigation systems typically operate using a fixed set of hand-picked parameters (e.g. maximum speed, sampling rate, inflation radius, etc.) and require heavy expert re-tuning in order to work in new environments. To mitigate this requirement, it has been proposed to learn parameters for different contexts in a new environment using human demonstrations collected via teleoperation. However, learning from human demonstration limits deployment to the training environment, and limits overall performance to that of a potentially-suboptimal demonstrator. In this paper, we introduce APPLR, Adaptive Planner Parameter Learning from Reinforcement, which allows existing navigation systems to adapt to new scenarios by using a parameter selection scheme discovered via reinforcement learning (RL) in a wide variety of simulation environments. We evaluate APPLR on a robot in both simulated and physical experiments, and show that it can outperform both a fixed set of hand-tuned parameters and also a dynamic parameter tuning scheme learned from human demonstration.
@inproceedings{ICRA2021-Xu, author = {Xu, Zifan and Dhamankar, Gauraang and Nair, Anirudh and Xiao, Xuesu and Warnell, Garrett and Liu, Bo and Wang, Zizhao and Stone, Peter}, title = {APPLR: Adaptive Planner Parameter Learning from Reinforcement}, booktitle = {Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA)}, location = {Xi'an, China}, month = jun, year = {2021}, }

📘 Curriculum Learning for RL

Sample Efficient Myopic Exploration Through Multitask Reinforcement Learning with Diverse Tasks

Ziping Xu, Zifan Xu, Runxuan Jiang, Peter Stone, and Ambuj Tewari

In Proceedings of the International Conference on Learning Representations (ICLR), May 2024

Abs Bib PDF

Multitask Reinforcement Learning (MTRL) approaches have gained increasing attention for their wide applications in many important Reinforcement Learning (RL) tasks. However, while recent advancements in MTRL theory have focused on the improved statistical efficiency by assuming a shared structure across tasks, exploration, a crucial aspect of RL, has been largely overlooked. This paper addresses this gap by showing that when an agent is trained on a sufficiently diverse set of tasks, a generic policy-sharing algorithm with a myopic exploration design such as ε-greedy, which is inefficient in general, can be sample-efficient for MTRL. To the best of our knowledge, this is the first theoretical demonstration of the "exploration benefits" of MTRL. It may also shed light on the enigmatic success of the wide applications of myopic exploration in practice. To validate the role of diversity, we conduct experiments on synthetic robotic control environments, where the diverse task set aligns with the task selection by automatic curriculum learning, which is empirically shown to improve sample-efficiency.
@inproceedings{ICLR2024-Xu, author = {Xu, Ziping and Xu, Zifan and Jiang, Runxuan and Stone, Peter and Tewari, Ambuj}, title = {Sample Efficient Myopic Exploration Through Multitask Reinforcement Learning with Diverse Tasks}, booktitle = {Proceedings of the International Conference on Learning Representations (ICLR)}, year = {2024}, month = may, location = {Vienna, Austria}, url = {https://openreview.net/forum?id=YZrg56G0JV}, }
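For readers unfamiliar with the myopic exploration named in the abstract above, ε-greedy action selection can be sketched in a few lines. This is a generic textbook illustration, not code from the paper; the function name `epsilon_greedy` and the plain list of value estimates are assumptions made for the example.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index given per-action value estimates.

    With probability `epsilon` a uniformly random action is chosen (explore);
    otherwise the action with the highest estimate is chosen (exploit).
    `q_values` is a hypothetical list of value estimates, one per action.
    """
    if rng.random() < epsilon:
        # Explore: ignore the estimates and pick any action uniformly.
        return rng.randrange(len(q_values))
    # Exploit: index of the largest estimate (first index wins on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

The exploration is "myopic" in the sense that the random choice looks only at the current step and does not plan toward unexplored states.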
Model-Based Meta Automatic Curriculum Learning

Zifan Xu, Yulin Zhang, Shahaf S. Shperberg, Reuth Mirsky, Yuqian Jiang, and 2 more authors

In The Second Conference on Lifelong Learning Agents (CoLLAs), Aug 2023

Abs Bib PDF Video

Curriculum learning (CL) has been widely explored to facilitate the learning of hard-exploration tasks in reinforcement learning (RL) by training on a sequence of easier tasks, often called a curriculum. While most curricula are built either manually or automatically based on heuristics, e.g. choosing a training task which is barely beyond the current abilities of the learner, the fact that similar tasks might benefit from similar curricula motivates us to explore meta-learning as a technique for curriculum generation or teaching for a distribution of similar tasks. This paper formulates the meta CL problem that requires a meta-teacher to generate the curriculum which will assist the student to train toward any given target task from a task distribution based on the similarity of these tasks to one another. We propose a model-based meta automatic curriculum learning algorithm (MM-ACL) that learns to predict the performance improvement on one task when the student is trained on another, given the current status of the student. This predictor can then be used to generate the curricula for different target tasks. Our empirical results demonstrate that MM-ACL outperforms the state-of-the-art CL algorithms in a grid-world domain and a more complex visual-based navigation domain in terms of sample efficiency.
@inproceedings{CollAs2023-Xu, author = {Xu, Zifan and Zhang, Yulin and Shperberg, Shahaf S. and Mirsky, Reuth and Jiang, Yuqian and Liu, Bo and Stone, Peter}, title = {Model-Based Meta Automatic Curriculum Learning}, booktitle = {The Second Conference on Lifelong Learning Agents (CoLLAs)}, location = {Montreal, Canada}, month = aug, year = {2023}, }
Task Factorization in Curriculum Learning

Reuth Mirsky, Shahaf S. Shperberg, Yulin Zhang, Zifan Xu, Yuqian Jiang, and 2 more authors

In Decision Awareness in Reinforcement Learning (DARL) workshop at the 39th International Conference on Machine Learning (ICML), Jul 2022

Abs Bib PDF

A common challenge for learning when applied to a complex “target” task is that learning that task all at once can be too difficult due to inefficient exploration given a sparse reward signal. Curriculum Learning addresses this challenge by sequencing training tasks for a learner to facilitate gradual learning. One of the crucial steps in finding a suitable curriculum learning approach is to understand the dimensions along which the domain can be factorized. In this paper, we identify different types of factorizations common in the literature of curriculum learning for reinforcement learning tasks: factorizations that involve the agent, the environment, or the mission. For each factorization category, we identify the relevant algorithms and techniques that leverage that factorization and present several case studies to showcase how leveraging an appropriate factorization can boost learning using a simple curriculum.
@inproceedings{DARL22-REUTH, author = {Mirsky, Reuth and Shperberg, Shahaf S. and Zhang, Yulin and Xu, Zifan and Jiang, Yuqian and Cui, Jiaxun and Stone, Peter}, title = {Task Factorization in Curriculum Learning}, booktitle = {Decision Awareness in Reinforcement Learning (DARL) workshop at the 39th International Conference on Machine Learning (ICML)}, location = {Baltimore, Maryland, USA}, month = jul, year = {2022}, }

💭 LLM Reasoning

LaRS: Latent Reasoning Skills for Chain-of-Thought Reasoning

Zifan Xu, Haozhu Wang, Dmitriy Bespalov, Xian Wu, Peter Stone, and 1 more author

In Findings of Empirical Methods in Natural Language Processing, Nov 2024

Abs arXiv Bib

Chain-of-thought (CoT) prompting is a popular in-context learning (ICL) approach for large language models (LLMs), especially when tackling complex reasoning tasks. Traditional ICL approaches construct prompts using examples that contain questions similar to the input question. However, CoT prompting, which includes crucial intermediate reasoning steps (rationales) within its examples, necessitates selecting examples based on these rationales rather than the questions themselves. Existing methods require human experts or pre-trained LLMs to describe the skill, a high-level abstraction of rationales, to guide the selection. These methods, however, are often costly and difficult to scale. Instead, this paper introduces a new approach named Latent Reasoning Skills (LaRS) that employs unsupervised learning to create a latent space representation of rationales, with a latent variable called a reasoning skill. Concurrently, LaRS learns a reasoning policy to determine the required reasoning skill for a given question. Then the ICL examples are selected by aligning the reasoning skills between past examples and the question. This approach is theoretically grounded and compute-efficient, eliminating the need for auxiliary LLM inference or manual prompt design. Empirical results demonstrate that LaRS consistently outperforms SOTA skill-based selection methods, processing example banks four times faster, reducing LLM inferences during the selection stage by half, and showing greater robustness to sub-optimal example banks.
@inproceedings{Latent2023-Xu, title = {LaRS: Latent Reasoning Skills for Chain-of-Thought Reasoning}, author = {Xu, Zifan and Wang, Haozhu and Bespalov, Dmitriy and Wu, Xian and Stone, Peter and Qi, Yanjun}, booktitle = {Findings of Empirical Methods in Natural Language Processing}, location = {Miami, Florida}, month = nov, year = {2024}, }
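The selection step described in the LaRS abstract, choosing in-context examples whose reasoning skill matches the skill required by the question, can be illustrated with a small nearest-neighbor sketch. This is only an illustration under stated assumptions: the skill vectors are taken as given (the paper learns them), cosine similarity is an assumed alignment measure, and the names `cosine` and `select_examples` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_examples(question_skill, bank, k):
    """Return the ids of the k bank examples whose skill vectors best match.

    `question_skill` is a hypothetical latent skill vector predicted for the
    question; `bank` is a list of (example_id, skill_vector) pairs.
    """
    ranked = sorted(
        bank,
        key=lambda item: cosine(question_skill, item[1]),
        reverse=True,  # most similar first
    )
    return [example_id for example_id, _ in ranked[:k]]
```

For instance, with a bank of three examples whose skill vectors are `[1, 0]`, `[0, 1]`, and `[0.9, 0.1]`, a question with skill `[1, 0]` would be paired with the first and third examples when `k = 2`.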