
✉️ thj23 [at] mails.tsinghua.edu.cn
✉️ bernard.hengk.tan [at] gmail.com
📍 Beijing, China
GitHub: https://github.com/thkkk/
🎓 Google Scholar: Hengkai Tan
LinkedIn: Hengkai Tan
🏸 Various ball sports and fitness
🎹 Piano and music
📚 Reading
⛰️ Traveling
🎥 Movie
📈 Value investing
🇨🇳 Chinese
🇬🇧 English
🇩🇪 German (almost forgotten)
<aside> <img src="/icons/sun_red.svg" alt="/icons/sun_red.svg" width="40px" /> I’m Hengkai Tan, the head of world models and embodied AI at Shengshu Technology and a PhD Candidate advised by Professor Jun Zhu and Associate Professor Hang Su in the TSAIL, Department of Computer Science and Technology, Tsinghua University.
</aside>
<aside> <img src="/icons/bullseye_blue.svg" alt="/icons/bullseye_blue.svg" width="40px" /> Vision: Build (embodied) agents that interact with real-world vision space, advancing toward AGI!
</aside>
<aside> <img src="/icons/search_orange.svg" alt="/icons/search_orange.svg" width="40px" /> Recent Focused Research Areas: Unification of Embodied Foundation Models and Multi-modal Foundation Models, as well as Video World Models.
</aside>
Feel free to reach out if you believe in AGI and want to collaborate on building a general-purpose embodied agent with vision space at its core!

Motus: A Unified Latent Action World Model Hongzhe Bi*†, Hengkai Tan*†, Shenghao Xie*, Zeyuan Wang*, Shuhe Huang*, Haitian Liu*, Ruowen Zhao, Yao Feng, Chendong Xiang, Yinze Rong, Hongyan Zhao, Hanyu Liu, Zhizhong Su, Lei Ma, Hang Su, Jun Zhu. Project Page, Paper, Code
Unifies world models, VLA, and video models.


RDT2: Enabling Zero-Shot Cross-Embodiment Generalization by Scaling Up UMI Data
RDT2 Team. Project Page, Code

Vidarc: Embodied Video Diffusion Model for Closed-loop Control Yao Feng*, Chendong Xiang*, Xinyi Mao, Hengkai Tan, Zuyue Zhang, Shuhe Huang, Kaiwen Zheng, Haitian Liu, Hang Su, Jun Zhu. Paper, Code
Combines autoregression (AR), diffusion, and an embodiment-aware loss.

Vidar: Embodied Video Diffusion Model for Generalist Manipulation Yao Feng, Hengkai Tan*, Xinyi Mao, Guodong Liu, Shuhe Huang, Chendong Xiang, Hang Su, Jun Zhu*. Project Page, Paper, WeChat Article, Code
Embodied video foundation model: Vidar

AnyPos: Automated Task-Agnostic Actions for Bimanual Manipulation Hengkai Tan, Yao Feng*, Xinyi Mao*, Shuhe Huang, Guodong Liu, Zhongkai Hao, Hang Su, Jun Zhu*. Project Page, Paper, WeChat Article
Automated task-agnostic data collection and an inverse dynamics model (IDM) with a near-100% replay success rate.

(AAAI 2026) H-RDT: Human Manipulation Enhanced Bimanual Robotic Manipulation Hongzhe Bi, Lingxuan Wu, Tianwei Lin, Hengkai Tan, Zhizhong Su, Hang Su, Jun Zhu. Project Page, Paper, Code

ManiBox: Enhancing Embodied Spatial Generalization via Scalable Simulation Data Generation Hengkai Tan, Xuezhou Xu*, Chengyang Ying, Xinyi Mao, Songming Liu, Xingxing Zhang, Hang Su, Jun Zhu*. Project Page, WeChat Article
Scaling laws of spatial generalization and robust manipulation using bounding boxes.
See more on my Google Scholar.

