@gm8xx8
FAST is a robot action tokenizer that simplifies and speeds up robot training. It enables:
> 5x faster training compared to diffusion models.
> Compatibility with all tested robot datasets.
> Zero-shot performance in new environments: a policy trained on the DROID dataset successfully controls robots in a variety of unseen settings.
> Simple autoregressive VLAs that match diffusion VLA performance.
> Mixed-data VLA training, allowing integration of non-robot data such as web data, subgoals, and video prediction.
FAST compresses robot actions using the discrete cosine transform (DCT), reducing redundancy between consecutive actions and enabling efficient VLA training on high-frequency tasks. It scales to complex robot tasks with simple next-token prediction, converging in days instead of weeks.
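To make the DCT idea concrete, here is a minimal pure-Python sketch of DCT-based action compression. It is not the FAST implementation: the function names, the `scale` value, and the low-frequency-first flattening are illustrative assumptions, and the published tokenizer adds further steps (for example, byte-pair encoding of the quantized coefficients) that are omitted here. Each action dimension is transformed along time, the coefficients are scaled and rounded to integers, and the chunk is reconstructed with the inverse transform.

```python
import math
from typing import List


def dct_ii(x: List[float]) -> List[float]:
    """Orthonormal DCT-II of a 1-D signal (O(n^2), fine for short chunks)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        out.append(s * math.sqrt((1.0 if k == 0 else 2.0) / n))
    return out


def idct_ii(c: List[float]) -> List[float]:
    """Inverse of dct_ii (an orthonormal DCT-III)."""
    n = len(c)
    return [
        sum(
            c[k] * math.sqrt((1.0 if k == 0 else 2.0) / n)
            * math.cos(math.pi * (t + 0.5) * k / n)
            for k in range(n)
        )
        for t in range(n)
    ]


def tokenize_chunk(chunk: List[List[float]], scale: float = 10.0) -> List[int]:
    """chunk[t][d] holds T timesteps x D action dims, assumed normalized to ~[-1, 1].

    Hypothetical helper: applies a DCT along time per dimension, scales and
    rounds the coefficients to integers, then flattens them low-frequency
    first, interleaving the action dimensions.
    """
    T, D = len(chunk), len(chunk[0])
    coeffs = [dct_ii([chunk[t][d] for t in range(T)]) for d in range(D)]
    return [round(coeffs[d][k] * scale) for k in range(T) for d in range(D)]


def detokenize_chunk(tokens: List[int], T: int, D: int, scale: float = 10.0) -> List[List[float]]:
    """Hypothetical inverse: undo flattening and scaling, then invert the DCT per dimension."""
    coeffs = [[tokens[k * D + d] / scale for k in range(T)] for d in range(D)]
    per_dim = [idct_ii(c) for c in coeffs]
    return [[per_dim[d][t] for d in range(D)] for t in range(T)]


# A smooth 50-step, 2-D action chunk: a linear ramp and a half sine wave.
T, D = 50, 2
chunk = [[-0.8 + 1.6 * t / (T - 1), 0.5 * math.sin(math.pi * t / (T - 1))] for t in range(T)]

tokens = tokenize_chunk(chunk)
recon = detokenize_chunk(tokens, T, D)
nonzero = sum(1 for tok in tokens if tok != 0)
max_err = max(abs(a - b) for row, rrow in zip(chunk, recon) for a, b in zip(row, rrow))
print(f"{nonzero} of {len(tokens)} tokens are nonzero; max reconstruction error {max_err:.3f}")
```

Most of the integers come out as zero because a smooth trajectory concentrates its energy in a few low-frequency coefficients; that redundancy is what a subsequent compression step can exploit. Raising `scale` preserves more precision at the cost of more nonzero tokens.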
A FAST tokenizer pre-trained on 1M robot action sequences is available on Hugging Face; it works across a wide range of robots and supports mixed-data VLA training.
https://huggingface.co/physical-intelligence/fast