About Me

I am a final-year PhD student in Electrical and Computer Engineering at Duke University, advised by Prof. Guillermo Sapiro, and currently a visiting student at Princeton University. I expect to graduate in Spring 2026.

My research lies at the intersection of computer vision and generative AI, with a focus on multi-modal understanding and generation (video, image, text). I have hands-on experience building large-scale training pipelines for diffusion and autoregressive models, and have published at top venues including ICCV (Oral), NeurIPS, ICLR, and CVPR.

I have interned at ByteDance and Microsoft Research, where I led projects on text-to-video generation and multi-modal learning. I am passionate about bridging research and real-world impact.

Selected Publications

Experience

ByteDance May 2025 — Dec. 2025
Research Intern, Generative AI
San Jose, CA, USA
  • Developed a video generation framework that decouples high-level semantic planning from low-level video diffusion.
  • Built training and evaluation pipelines for text-to-video, image-to-video, and video continuation.
  • Demonstrated that semantics-driven conditioning improves prompt following and reduces visual hallucinations.
Microsoft Research Asia Jun. 2020 — May 2021
Research Intern, Multi-modal Learning
Beijing, China
  • Developed GODIVA, an open-domain text-to-video model using 3D sparse attention and large-scale multimodal data.
  • Implemented distributed data pipelines and benchmarks.
Tencent AI Lab Apr. 2019 — Sep. 2019
Research Intern, Reinforcement Learning & AutoML
Shenzhen, China
  • Developed a combined reinforcement learning (RL) and AutoML approach to optimize game-AI agent populations.

Education

Duke University Aug. 2020 — Mar. 2026 (expected)
PhD Student, Electrical and Computer Engineering
Durham, NC, USA
  • Supervisor: Prof. Guillermo Sapiro
Princeton University Aug. 2024 — Mar. 2026 (expected)
Visiting Student, Electrical and Computer Engineering
Princeton, NJ, USA
  • Supervisor: Prof. Guillermo Sapiro
Peking University Sep. 2017 — Jul. 2020
M.S., Computer Applied Technology
Beijing, China
  • Thesis: Towards Accurate Attention Mechanisms for Image Captioning
  • Supervisors: Prof. Wen Gao and Prof. Wenmin Wang
Huazhong University of Science and Technology Sep. 2013 — Jun. 2017
B.E., Electronic and Information Engineering
Wuhan, China
  • National Key Class (top students in CS/EE fields)

Honors & Awards

Outstanding Graduate, Beijing & Peking University Jun. 2020
National Scholarship, Ministry of Education of P.R. China Oct. 2019
Exceptional Award for Academic Innovation, PKU Oct. 2019
Merit Student, Peking University Oct. 2019
Outstanding Graduate, HUST Jun. 2017

Academic Service

Reviewer: ICML, ICLR, NeurIPS, CVPR, ECCV, AISTATS, AAAI, IEEE TMM

Teaching

Image and Video Processing Fall 2022 & Fall 2023
Teaching Assistant, Duke University
  • Instructor: Matias Di Martino
Artificial Intelligence Spring 2018
Teaching Assistant, Peking University
  • Instructor: Wenmin Wang

Skills

Programming

Python, C/C++, Java

Frameworks & Tools

PyTorch (Distributed), Linux, Git, LaTeX, SLURM

Research Expertise

Multimodal GenAI, Post-training (SFT/RL), Diffusion Models, Autoregressive Models, Self-Supervised Learning

Languages

English (Fluent), Chinese (Native)