About Me
I am a final-year PhD student in Electrical and Computer Engineering at Duke University, advised by Prof. Guillermo Sapiro, and currently a visiting student researcher at Princeton University. I expect to graduate in Spring 2026.
My research lies at the intersection of computer vision and generative AI, with a focus on multi-modal understanding and generation across video, image, and text. I have hands-on experience building large-scale training pipelines for diffusion and autoregressive models, and I have published at top venues including ICCV (Oral), NeurIPS, ICLR, and CVPR.
I have interned at ByteDance and Microsoft Research, where I led projects on text-to-video generation and multi-modal learning. I am passionate about bridging research and real-world impact.
Selected Publications
- arXiv:2511.17986, 2025
- ICLR, 2025
- Transactions on Machine Learning Research (TMLR), 2025
- CVPR Workshops, 2024
- arXiv:2104.14806, 2021
- NeurIPS, 2019
- ICCV (Oral), 2019
Experience
- Developed a planning-driven framework for video generation that decouples high-level semantic planning from low-level video diffusion.
- Built training and evaluation pipelines for text-to-video, image-to-video, and video continuation.
- Demonstrated that semantics-driven conditioning improves prompt following and reduces visual hallucinations.
- Developed GODIVA, an open-domain text-to-video model using 3D sparse attention and large-scale multimodal data.
- Implemented distributed data pipelines and benchmarks.
- Developed a reinforcement learning (RL) and AutoML approach to optimizing game-AI agent populations.
Education
- Supervisor: Prof. Guillermo Sapiro
- Supervisor: Prof. Guillermo Sapiro
- Thesis: Towards Accurate Attention Mechanisms for Image Captioning
- Supervisors: Prof. Wen Gao and Prof. Wenmin Wang
- National Key Class (top students in CS/EE fields)
Honors & Awards
Academic Service
Reviewer: ICML, ICLR, NeurIPS, CVPR, ECCV, AISTATS, AAAI, IEEE TMM
Teaching
- Instructor: Matias Di Martino
- Instructor: Wenmin Wang