Standard Intelligence trained FDM-1, a general-purpose foundation model for computer actions, on 11 million hours of screen recordings. Rather than building on an LLM, FDM-1 operates directly on video and action tokens. Its custom encoder compresses video 50-100x more efficiently than existing VLMs.