
Modeling how worlds evolve over time is an important part of interacting with them. Video world models have become an exciting area of robotics research over the past year, in part for this reason. What if there were a better way to represent changes over time, though?
Trace Anything represents a video as a trajectory field: a dense mapping that assigns each pixel in every frame a continuous trajectory through 3D space. This provides a unique foundation for downstream tasks such as goal-conditioned manipulation and motion forecasting.
We talked to Xinhang Liu to learn more.
Watch Episode 55 of RoboPapers with Michael Cho and Chris Paxton now!
Abstract:
Effective spatio-temporal representation is fundamental to modeling, understanding, and predicting dynamics in videos. The atomic unit of a video, the pixel, traces a continuous 3D trajectory over time, serving as the primitive element of dynamics. Based on this principle, we propose representing any video as a Trajectory Field: a dense mapping that assigns a continuous 3D trajectory function of time to each pixel in every frame. With this representation, we introduce Trace Anything, a neural network that predicts the entire trajectory field in a single feed-forward pass. Specifically, for each pixel in each frame, our model predicts a set of control points that parameterizes a trajectory (i.e., a B-spline), yielding its 3D position at arbitrary query time instants. We trained the Trace Anything model on large-scale 4D data, including data from our new platform, and our experiments demonstrate that: (i) Trace Anything achieves state-of-the-art performance on our new benchmark for trajectory field estimation and performs competitively on established point-tracking benchmarks; (ii) it offers significant efficiency gains thanks to its one-pass paradigm, without requiring iterative optimization or auxiliary estimators; and (iii) it exhibits emergent abilities, including goal-conditioned manipulation, motion forecasting, and spatio-temporal fusion.
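The trajectory-field idea is concrete enough to sketch: each pixel gets a set of B-spline control points, and its 3D position at any query time comes from evaluating that spline. Below is a minimal Python sketch of the evaluation step only; the control-point count, the clamped-uniform knot vector, and the example values are illustrative assumptions, not the paper's exact parameterization.

```python
# Minimal sketch: evaluate one pixel's 3D trajectory from predicted
# B-spline control points. Control-point count, knot construction, and
# example values are assumptions for illustration.
import numpy as np
from scipy.interpolate import BSpline

def make_trajectory(control_points: np.ndarray, degree: int = 3) -> BSpline:
    """Build a trajectory t in [0, 1] -> R^3 from (N, 3) control points."""
    n = len(control_points)
    # Clamped-uniform knot vector so the curve passes through the endpoints.
    interior = np.linspace(0.0, 1.0, n - degree + 1)
    knots = np.concatenate([np.zeros(degree), interior, np.ones(degree)])
    return BSpline(knots, control_points, degree)

# Hypothetical control points for a single pixel.
ctrl = np.array([[0.0, 0.0, 1.0],
                 [0.1, 0.0, 1.0],
                 [0.2, 0.1, 0.9],
                 [0.3, 0.1, 0.8],
                 [0.4, 0.2, 0.8]])
traj = make_trajectory(ctrl)
query_times = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
positions = traj(query_times)  # (5, 3) 3D positions at the query instants
print(positions)
```

In the full model this evaluation would run densely, with a set of control points predicted per pixel per frame in a single feed-forward pass; the sketch only shows how a continuous trajectory yields positions at arbitrary query times.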
Project Page: https://trace-anything.github.io/
arXiv: https://arxiv.org/abs/2510.13802
This Post on X
By Chris Paxton and Michael Cho