I am a research scientist at ByteDance/TikTok working on generative AI, multi-modal learning, and video/image generation. My work focuses on developing advanced AI systems that seamlessly connect language, vision, and creative intelligence. I obtained my Ph.D. from the APG lab, advised by Y.Z. Yang.
I work on generative models, diffusion-based video and image generation, and vision-language models. My recent focus includes controllable generation, large-scale training of diffusion models, and representation learning.
We are not hiring full-time employees at the moment, as the headcount is filled, but we are open to long-term intern collaborations. Please drop me an email.
We’re Hiring (Interns)!
Who we are: Join the Intelligence Creation Group at TikTok, where we focus on cutting-edge video generation technologies.
Our team develops advanced video generative models to power next-generation TikTok content.
We build video foundation models that bring creativity to life, like this example on TikTok (a single effect
that triggered more than 30M posts globally, the best AI effect on TikTok since 2023):
🎬 Click ME. We also run large-scale video generation pre-training to create powerful video foundation
models, and we build state-of-the-art VideoGen techniques such as video editing,
customized video generation, and motion-controlled video generation.
Who we are looking for: ✅ Highly motivated Ph.D. candidates; ✅ Experience training video generation/diffusion/AR models; ✅ Top-tier publications.
Research Areas
Generative modeling (image/video diffusion models)
Vision-language models (video-language, VLM)
Representation learning & efficient pretraining
News
[2025 Sep] Our new video generative foundation model (Seedance-1.0, TikTok version) is online. Check out its first effect: AI Flower.
[2025 March] The AI Mermaid effect is online, attracting 30M+ posts on TikTok, the best TikTok AI effect since 2023!
Controllable video generation · Any-reference video generation · Foundation video generation pre-training
We develop AI effects from video generative models. I also work closely with
Bytedance SEED
to build state-of-the-art video generative models for TikTok production.
Large-scale Video Foundation Model Pre-training: technical owner for Seedance (TikTok version) development and pre-training; large-scale model pre-training on billions of videos over ~2K H100 GPUs.
MAGREF: Masked Guidance for Any-Reference Video Generation
Zhiyuan Fang, et al. · arXiv 2025
@article{fang2025magref,
title={MAGREF: Masked Guidance for Any-Reference Video Generation},
author={Fang, Zhiyuan and others},
journal={arXiv},
year={2025}
}
ATI: Any Trajectory Instruction for Controllable Video Generation
Angtian Wang, Haibin Huang, Jacob Zhiyuan Fang, Yiding Yang, Chongyang Ma · arXiv 2025
Video Generation · Motion Controlled Video Generation
Zero-Shot Controllable Image-to-Video Animation via Motion Decomposition
Shoubin Yu, Jacob Zhiyuan Fang, Skyler Zheng, Gunnar A. Sigurdsson, Vicente Ordonez, Robinson Piramuthu, Mohit Bansal · ACM Multimedia 2024
@inproceedings{yu2024zeroshot,
title={Zero-Shot Controllable Image-to-Video Animation via Motion Decomposition},
author={Yu, Shoubin and Fang, Jacob Zhiyuan and Zheng, Skyler and Sigurdsson, Gunnar A and Ordonez, Vicente and Piramuthu, Robinson and Bansal, Mohit},
booktitle={ACM Multimedia},
year={2024}
}
FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation
Xuehai He, Jian Zheng, Jacob Zhiyuan Fang, Robinson Piramuthu, Mohit Bansal, Vicente Ordonez, Gunnar A. Sigurdsson, Nanyun Peng, Xin Eric Wang · TMLR 2024
@article{he2024flexecontrol,
title={FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation},
author={He, Xuehai and Zheng, Jian and Fang, Jacob Zhiyuan and Piramuthu, Robinson and Bansal, Mohit and Ordonez, Vicente and Sigurdsson, Gunnar A and Peng, Nanyun and Wang, Xin Eric},
journal={TMLR},
year={2024}
}
Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation
Yingshan Chang, Yasi Zhang, Zhiyuan Fang, Yingnian Wu, Yonatan Bisk, Feng Gao · ECCV 2024
@inproceedings{chang2024skews,
title={Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation},
author={Chang, Yingshan and Zhang, Yasi and Fang, Zhiyuan and Wu, Yingnian and Bisk, Yonatan and Gao, Feng},
booktitle={ECCV},
year={2024}
}
SEED: Self-supervised Distillation For Visual Representation
Zhiyuan Fang, Jianfeng Wang, Lijuan Wang, Lei Zhang, Yezhou Yang, Zicheng Liu · ICLR 2021
@inproceedings{fang2021seed,
title={SEED: Self-supervised Distillation For Visual Representation},
author={Fang, Zhiyuan and Wang, Jianfeng and Wang, Lijuan and Zhang, Lei and Yang, Yezhou and Liu, Zicheng},
booktitle={ICLR},
year={2021}
}
Injecting Semantic Concepts into End-to-End Image Captioning
Zhiyuan Fang, Jianfeng Wang, Xiaowei Hu, Lin Liang, Zhe Gan, Lijuan Wang, Yezhou Yang, Zicheng Liu · CVPR 2022
@inproceedings{fang2022injecting,
title={Injecting Semantic Concepts into End-to-End Image Captioning},
author={Fang, Zhiyuan and Wang, Jianfeng and Hu, Xiaowei and Liang, Lin and Gan, Zhe and Wang, Lijuan and Yang, Yezhou and Liu, Zicheng},
booktitle={CVPR},
year={2022}
}
Compressing Visual-linguistic Model via Knowledge Distillation
Zhiyuan Fang, Jianfeng Wang, Xiaowei Hu, Lijuan Wang, Yezhou Yang, Zicheng Liu · ICCV 2021
@inproceedings{fang2021compressing,
title={Compressing Visual-linguistic Model via Knowledge Distillation},
author={Fang, Zhiyuan and Wang, Jianfeng and Hu, Xiaowei and Wang, Lijuan and Yang, Yezhou and Liu, Zicheng},
booktitle={ICCV},
year={2021}
}
ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language
Zhe Wang, Zhiyuan Fang, Jun Wang, Yezhou Yang · ECCV 2020
@inproceedings{wang2020vitaa,
title={ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language},
author={Wang, Zhe and Fang, Zhiyuan and Wang, Jun and Yang, Yezhou},
booktitle={ECCV},
year={2020}
}