You may wonder why "Prime", such a strange nickname, so here is the story: I have had a deep voice since middle school, and that is when I earned the nickname "Prime" (obviously because of the famous character who is also known for a deep voice). So no, this nickname is not something that randomly popped into my head; it is earned. ;)
Before Stanford, I was a CS master's student at ETH Zürich,
supervised by Prof. Luc Van Gool.
I obtained my Bachelor's degree in Computer Science with first-class honours from King's College London in the United Kingdom, where I spent some time working on information theory.
I work on long context, video generation, and (all forms of) world models, hoping they can one day forward simulate the future, and reverse engineer the past.
I also work a bit on large-scale training infrastructures.
I believe they are among the most principled and beautiful parts of computer science, setting it apart from other engineering disciplines, and they help me train my models.
I am also an extremely amateur bodybuilder and car racer (with my Corvette C7 Z51 and its 6.2L V8 engine).
You might spot me at one of the Equinox gyms, or in a stylish red sports car on the mountain roads behind Stanford.
Decouples local realism and long-range coherence for fast long-video generation by combining mode-seeking teacher distillation with mean-seeking long-video supervision.
Frame context packing for next-frame prediction enables longer contexts within fixed sequence budgets, with drift prevention to reduce error accumulation.
Top-down keyframe planning with bottom-up long-context video synthesis, using interleaved training to adapt MM-DiT for stable and efficient multi-scene generation.
Large-scale benchmark and baseline for instruction-guided image editing with complex non-rigid motions such as viewpoint changes, articulations, and deformations.
Renders low-fidelity animated meshes directly into animation using pre-trained 2D diffusion models, without the need for any further training or distillation.