Embodied AI

Embodied AI, Reinforcement Learning, Foundation Models

Embodied AI focuses on creating robots that can perceive, reason, and act in the physical world. It brings together dexterous hand manipulation, humanoid whole-body control, and foundation models that link perception, language, and action. By combining these elements, embodied AI aims to enable robots that can generalize and perform complex real-world tasks with human-like adaptability.

Dexhand Manipulation

Dexterous hand manipulation focuses on enabling robots to interact with objects with the precision, adaptability, and coordination of human hands. This research area explores how to control high-degree-of-freedom robotic hands to grasp, reorient, and manipulate objects of varying shapes, sizes, and materials in dynamic environments. It involves advances in tactile sensing, motor control, and learning-based methods that allow robots to perform complex in-hand manipulations, such as turning, sliding, or regrasping objects. By combining physical modeling with data-driven approaches, dexterous hand manipulation aims to achieve fine motor skills essential for tasks in manufacturing, service robotics, and daily human environments.

Humanoid Whole-Body Control

Humanoid whole-body control focuses on enabling humnanoid robots to move, balance, and interact dynamically with their environments. This research area addresses the challenge of coordinating multiple joints and limbs to achieve stable, agile, and adaptive behaviors such as walking, running, lifting, or manipulating objects while maintaining balance. It integrates principles from control theory, optimization, and machine learning to manage the complex coupling between locomotion and manipulation. By developing controllers that can reason about contact forces, dynamics, and motion planning in real time, humanoid whole-body control aims to create robots capable of performing versatile, coordinated actions in unstructured and human-centered environments.

Foundation Models

Foundation models for embodied AI aim to create large, general-purpose models that unify perception, action, and reasoning for robots and embodied agents. A key research focus is leveraging human data—including videos, motion capture, demonstrations, and language annotations—to teach these models how humans interact with the physical world. By learning from large-scale, diverse human behaviors, these models can acquire priors about object manipulation, body coordination, and goal-directed actions, enabling better generalization to new tasks and environments. Integrating human data helps bridge the gap between human intelligence and robotic capability, providing a scalable pathway for training embodied agents that act naturally, safely, and efficiently in real-world settings.

Publications

[ECCV'24] UniCode: Learning a Unified Codebook for Multimodal Large Language Models

Sipeng Zheng, Bohan Zhou, Yicheng Feng, Ye Wang, and Zongqing Lu

European Conference on Computer Vision (ECCV), Sep. 29- Oct. 4, 2024.

[ICLR'25] Discrete Latent Plans via Semantic Skill Abstractions

Haobin Jiang, Jiangxing Wang, and Zongqing Lu