
Jacob Zhiyuan Fang

Research Scientist, ByteDance · Ph.D. · Visual Gen AI · Vision & Language

About

I am a research scientist at ByteDance/TikTok working on generative AI, multi-modal learning, and video/image generation. My work focuses on developing advanced AI systems that seamlessly connect language, vision, and creative intelligence. I obtained my Ph.D. from the APG lab under Y.Z. Yang.

I work on generative models, video and image generation with diffusion models, and vision-language modeling. My recent focus includes controllable generation, large-scale training of diffusion models, and representation learning.

We’re Hiring (Intern and FTE)!

Who we are:
Join the Intelligence Creation Group at TikTok, where we focus on cutting-edge video generation technologies. Our team develops advanced video generative models to power next-generation TikTok content. We build video foundation models that bring creativity to life — like this example on TikTok (a single effect that has triggered more than 30M posts globally, the best-performing AI effect on TikTok since 2023):
🎬 Click ME.
We also run large-scale video generation pre-training to build powerful video foundation models, and develop state-of-the-art VideoGen techniques such as video editing, customized video generation, and motion-controlled video generation.

Who we're looking for:
✅ Strongly motivated Ph.D. candidates;
✅ Experience in video generation/diffusion/AR model training;
✅ Top-tier publications.


Research Areas
  • Generative modeling (image/video diffusion models)
  • Vision-Language models (video-language, VLM)
  • Representation learning & efficient pretraining

News
  • [2025 Sep] A new video generative foundation model trained by our team is coming to TikTok users. Stay tuned.

  • [2025 March] The AI Mermaid effect is online, attracting 30M+ posts on TikTok - the best TikTok AI effect since 2023!

  • [2025 Jan] AI Alive is online; check out Shou's demo video of our product.

  • [2024 Oct] ACM MM'24: Zero-Shot Controllable Image-to-Video Animation.

  • [2024 July] Joined the ByteDance Global GenAI - Intelligent Creation team as a Research Scientist.

Experience

Global GenAI, ByteDance / TikTok
Senior Research Scientist · 2024 — Present
San Jose, USA
Video Generative Model
Controllable video generation · Any-reference video generation · Foundation video generation pre-training
We develop AI effects from video generative models. I also work closely with ByteDance SEED to build state-of-the-art video generative models for TikTok production.
Amazon AGI
Applied Scientist · 2022 — 2024
Sunnyvale, USA
Image/Video Generation; Large-scale Diffusion Pre-training & Post-training
Image/Video Diffusion Model
Part of the Amazon AGI org, working on the image generation model Titan (post-training) and the video generation model Emerald (pre-training and SFT).

Product Demos

AI Mermaid
AI Mermaid effect demo. Over 30M posts since launch. Best AI effect on TikTok since 2023.
AI Alive - Tiktok
AI Alive is online! Demo video by Shou.
AI SwayDance
AI Sway Dance effect demo. 3M+ posts in 3 weeks. Let's hop hop hop!
AI Hug
AI Hug effect demo. Share a hug with your loved one.

Selected Preprints & Publications

MAGREF: Masked Guidance for Any-Reference Video Generation Yufan Deng, Xun Guo, Yuanyang Yin, Jacob Zhiyuan Fang, Yiding Yang, Yizhi Wang, Shenghai Yuan, Angtian Wang, Bo Liu, Haibin Huang, Chongyang Ma · arXiv 2025
Video Generation · ID/IP Reference Video Generation
ATI: Any Trajectory Instruction for Controllable Video Generation Angtian Wang, Haibin Huang, Jacob Zhiyuan Fang, Yiding Yang, Chongyang Ma · arXiv 2025
Video Generation · Motion-Controlled Video Generation
Zero-Shot Controllable Image-to-Video Animation via Motion Decomposition Shoubin Yu, Jacob Zhiyuan Fang, Skyler Zheng, Gunnar A. Sigurdsson, Vicente Ordonez, Robinson Piramuthu, Mohit Bansal · ACM MM 2024
Video Generation · Control Generation · Diffusion
FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation Xuehai He, Jian Zheng, Jacob Zhiyuan Fang, Robinson Piramuthu, Mohit Bansal, Vicente Ordonez, Gunnar A. Sigurdsson, Nanyun Peng, Xin Eric Wang · TMLR 2024
Image Generation · Diffusion Model · Efficiency
Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation Yingshan Chang, Yasi Zhang, Zhiyuan Fang, Yingnian Wu, Yonatan Bisk, Feng Gao · ECCV 2024
Image Generation · Diffusion
SEED: Self-supervised Distillation For Visual Representation Zhiyuan Fang, Jianfeng Wang, Lijuan Wang, Lei Zhang, Yezhou Yang, Zicheng Liu · ICLR 2021
Self-supervised Learning · Knowledge Distillation
Injecting Semantic Concepts into End-to-End Image Captioning Zhiyuan Fang, Jianfeng Wang, Xiaowei Hu, Lin Liang, Zhe Gan, Lijuan Wang, Yezhou Yang, Zicheng Liu · CVPR 2022
Image Captioning · Vision & Language
Compressing Visual-linguistic Model via Knowledge Distillation Zhiyuan Fang, Jianfeng Wang, Xiaowei Hu, Lijuan Wang, Yezhou Yang, Zicheng Liu · ICCV 2021
Knowledge Distillation · Vision & Language
ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language Zhe Wang, Zhiyuan Fang, Jun Wang, Yezhou Yang · ECCV 2020
Person Search
More in Google Scholar.

Service

  • Reviewer: ICCV, CVPR, ECCV, NeurIPS, ICLR, ICML, ACL, EMNLP, SIGGRAPH, SIGGRAPH Asia, TMLR, etc.

Contact

Email: zfang29@asu.edu · Open to collaborations and intern inquiries.