Llama 3.1 405B and the License That Mattered
In July 2024 Meta released Llama 3.1 in 8B, 70B, and 405B sizes. The 405B model is the headline. Meta says it was trained on more than 15 trillion tokens using over 16,000 H100 GPUs, has a 128K context window, and is competitive with GPT-4, GPT-4o, and Claude 3.5 Sonnet across a range of tasks. That is a strong claim, and because Meta asserts parity rather than publishing a full benchmark table in the post, it is worth verifying on your own workloads.

The quieter change may matter more than the model. Meta updated the license so developers can use the model's outputs, including the 405B's, to train and improve other models. For a system at this capability tier, that is new, and it directly enables large-scale synthetic data generation and distillation into smaller models. Meta frames the release around access, writing that open source will help more people share in the benefits of AI.
Why it matters
If you build models, the license change is the part to read first. It made the 405B a legal teacher model for synthetic data and distillation pipelines, which is how many of the strong small models released since have actually been built. The weights being open is useful; the permission to train on outputs is what reshaped practice.
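To make "distillation" concrete: the common approach is to train a small student model to match the large teacher's output distribution, not just its hard labels. Below is a minimal sketch of the standard soft-label distillation loss (KL divergence on temperature-softened logits, in the style of Hinton et al.). This is the generic technique, not Meta's published recipe; the array values are illustrative.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    A higher temperature exposes more of the teacher's 'dark knowledge'
    (its relative preferences among wrong answers); the T^2 factor keeps
    gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student's current distribution
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return (temperature ** 2) * kl.mean()

# Toy example: two "token positions" over a 4-word vocabulary.
teacher = np.array([[4.0, 1.0, 0.5, 0.1], [0.2, 3.5, 0.3, 0.0]])
student = np.array([[3.0, 1.2, 0.4, 0.2], [0.1, 2.9, 0.5, 0.1]])
print(distillation_loss(student, teacher))
```

In a real pipeline this loss is typically mixed with ordinary cross-entropy on ground-truth or teacher-generated tokens; the license change is what makes generating that teacher data from the 405B permissible in the first place.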