I am a research scientist at ByteDance/TikTok working on generative AI, multi-modal learning, and video/image generation. My work focuses on developing advanced AI systems that seamlessly connect language, vision, and creative intelligence. I obtained my Ph.D. from the APG lab, advised by Y.Z. Yang.
I work on generative models, diffusion-based video and image generation, and vision-language models. My recent focus includes controllable generation, large-scale training of diffusion models, and representation learning.
We are not hiring full-time employees at the moment, as the headcount is filled, but we are open to long-term intern collaborations. Please drop me an email.
We’re Hiring (Interns)!
Who we are: Join the Intelligence Creation Group at TikTok, where we focus on cutting-edge video generation technologies.
Our team develops advanced video generative models to power next-generation TikTok content.
We build video foundation models that bring creativity to life, like this example on TikTok (a single effect
that triggered more than 30M posts globally, the best AI effect on TikTok since 2023):
🎬 Click ME. We also run large-scale video generation pre-training to create powerful video foundation
models, and we build state-of-the-art VideoGen techniques such as video editing,
customized video generation, and motion-controlled video generation.
Who we are looking for: ✅ Highly motivated Ph.D. candidates; ✅ Experience training video generation/diffusion/AR models; ✅ Top-tier publications.
Research Areas
Generative modeling (image/video diffusion models)
Vision-language models (video-language, VLM)
Representation learning & efficient pretraining
News
[2025 Sep] Our new video generative foundation model (Seedance-1.0, TikTok version) is online. Check out its first effect: AI Flower.
[2025 March] The AI Mermaid effect is online, attracting 30M+ posts on TikTok, the best TikTok AI effect since 2023!
Controllable video generation · Any-reference video generation · Foundation video generation pre-training
We develop AI effects from video generative models. I also work closely with
Bytedance SEED
to build state-of-the-art video generative models for TikTok production.
Large-scale Video Foundation Model Pre-training: technical owner for Seedance (TikTok version) development and pre-training; large-scale model pre-training on billions of videos over ~2K H100 GPUs.
MAGREF: Masked Guidance for Any-Reference Video Generation
Zhiyuan Fang, et al. · arXiv 2025
@article{fang2025magref,
title={MAGREF: Masked Guidance for Any-Reference Video Generation},
author={Fang, Zhiyuan and others},
journal={arXiv},
year={2025}
}
ATI: Any Trajectory Instruction for Controllable Video Generation
Angtian Wang, Haibin Huang, Jacob Zhiyuan Fang, Yiding Yang, Chongyang Ma · arXiv 2025
Video Generation · Motion Controlled Video Generation
Zero-Shot Controllable Image-to-Video Animation via Motion Decomposition
Shoubin Yu, Jacob Zhiyuan Fang, Skyler Zheng, Gunnar A. Sigurdsson, Vicente Ordonez, Robinson Piramuthu, Mohit Bansal · ACM Multimedia 2024
@inproceedings{yu2024zeroshot,
title={Zero-Shot Controllable Image-to-Video Animation via Motion Decomposition},
author={Yu, Shoubin and Fang, Jacob Zhiyuan and Zheng, Skyler and Sigurdsson, Gunnar A and Ordonez, Vicente and Piramuthu, Robinson and Bansal, Mohit},
booktitle={ACM Multimedia},
year={2024}
}
FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation
Xuehai He, Jian Zheng, Jacob Zhiyuan Fang, Robinson Piramuthu, Mohit Bansal, Vicente Ordonez, Gunnar A. Sigurdsson, Nanyun Peng, Xin Eric Wang · TMLR 2024
@article{he2024flexecontrol,
title={FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation},
author={He, Xuehai and Zheng, Jian and Fang, Jacob Zhiyuan and Piramuthu, Robinson and Bansal, Mohit and Ordonez, Vicente and Sigurdsson, Gunnar A and Peng, Nanyun and Wang, Xin Eric},
journal={TMLR},
year={2024}
}
Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation
Yingshan Chang, Yasi Zhang, Zhiyuan Fang, Yingnian Wu, Yonatan Bisk, Feng Gao · ECCV 2024
@inproceedings{chang2024skews,
title={Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation},
author={Chang, Yingshan and Zhang, Yasi and Fang, Zhiyuan and Wu, Yingnian and Bisk, Yonatan and Gao, Feng},
booktitle={ECCV},
year={2024}
}
SEED: Self-supervised Distillation For Visual Representation
Zhiyuan Fang, Jianfeng Wang, Lijuan Wang, Lei Zhang, Yezhou Yang, Zicheng Liu · ICLR 2021
@inproceedings{fang2021seed,
title={SEED: Self-supervised Distillation For Visual Representation},
author={Fang, Zhiyuan and Wang, Jianfeng and Wang, Lijuan and Zhang, Lei and Yang, Yezhou and Liu, Zicheng},
booktitle={ICLR},
year={2021}
}
Injecting Semantic Concepts into End-to-End Image Captioning
Zhiyuan Fang, Jianfeng Wang, Xiaowei Hu, Lin Liang, Zhe Gan, Lijuan Wang, Yezhou Yang, Zicheng Liu · CVPR 2022
@inproceedings{fang2022injecting,
title={Injecting Semantic Concepts into End-to-End Image Captioning},
author={Fang, Zhiyuan and Wang, Jianfeng and Hu, Xiaowei and Liang, Lin and Gan, Zhe and Wang, Lijuan and Yang, Yezhou and Liu, Zicheng},
booktitle={CVPR},
year={2022}
}
Compressing Visual-linguistic Model via Knowledge Distillation
Zhiyuan Fang, Jianfeng Wang, Xiaowei Hu, Lijuan Wang, Yezhou Yang, Zicheng Liu · ICCV 2021
@inproceedings{fang2021compressing,
title={Compressing Visual-linguistic Model via Knowledge Distillation},
author={Fang, Zhiyuan and Wang, Jianfeng and Hu, Xiaowei and Wang, Lijuan and Yang, Yezhou and Liu, Zicheng},
booktitle={ICCV},
year={2021}
}
ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language
Zhe Wang, Zhiyuan Fang, Jun Wang, Yezhou Yang · ECCV 2020
@inproceedings{wang2020vitaa,
title={ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language},
author={Wang, Zhe and Fang, Zhiyuan and Wang, Jun and Yang, Yezhou},
booktitle={ECCV},
year={2020}
}