Ego-centric Predictive Model
Two-stage egocentric video prediction conditioned on hand trajectories.
The pipeline has two stages:
- Predict future hand trajectories from past frames and recent motion.
- Use the predicted trajectories to condition a Latent Diffusion Model that generates future video.
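
The two stages above can be sketched as a small data-flow example. This is a minimal illustration only, not the project's implementation. Every name in it (`predict_hand_trajectory`, `toy_generator`, `predict_future`) is hypothetical. Stage 1 is replaced by a constant-velocity extrapolation that uses only recent motion, and stage 2 is replaced by a toy renderer that stands in for the trajectory-conditioned Latent Diffusion Model.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]   # (x, y) hand position in pixel coordinates
Frame = List[List[float]]     # toy stand-in for an image or latent grid


def predict_hand_trajectory(past: Sequence[Point], horizon: int) -> List[Point]:
    """Stage 1 stand-in (assumption): constant-velocity extrapolation.

    A learned predictor would also consume past frames; this placeholder
    uses only the last two hand positions.
    """
    if len(past) < 2:
        raise ValueError("need at least two past hand positions")
    (x0, y0), (x1, y1) = past[-2], past[-1]
    vx, vy = x1 - x0, y1 - y0
    return [(x1 + vx * k, y1 + vy * k) for k in range(1, horizon + 1)]


def toy_generator(prev_frame: Frame, hand: Point) -> Frame:
    """Stage 2 stand-in (assumption): draw a dot at the conditioned position.

    A real system would run a conditional diffusion sampler here instead.
    """
    height, width = len(prev_frame), len(prev_frame[0])
    frame = [[0.0] * width for _ in range(height)]
    # Clamp the hand position to the frame bounds before drawing.
    col = min(max(int(round(hand[0])), 0), width - 1)
    row = min(max(int(round(hand[1])), 0), height - 1)
    frame[row][col] = 1.0
    return frame


def predict_future(
    last_frame: Frame,
    past_hands: Sequence[Point],
    horizon: int,
    generator: Callable[[Frame, Point], Frame] = toy_generator,
) -> Tuple[List[Point], List[Frame]]:
    """Run stage 1, then generate one frame per predicted hand position."""
    trajectory = predict_hand_trajectory(past_hands, horizon)
    frames: List[Frame] = []
    prev = last_frame
    for hand in trajectory:
        prev = generator(prev, hand)  # each frame is conditioned on its hand point
        frames.append(prev)
    return trajectory, frames


if __name__ == "__main__":
    blank = [[0.0] * 8 for _ in range(8)]
    trajectory, frames = predict_future(blank, [(0.0, 0.0), (1.0, 1.0)], horizon=3)
    print(trajectory)
    print(len(frames))
```

Passing the generator as a parameter keeps the two stages decoupled, so either stand-in can be swapped for a learned component without changing the surrounding loop.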
The model is trained and evaluated on Ego4D, BridgeData, and RLBench. It achieves state-of-the-art egocentric video prediction quality and produces trajectory-consistent futures that can serve as a world model for downstream planning. Currently under review at ICML 2026 (top 15%).