Appendix
Benchmark Saturation & Lifespan
As AI models improve, benchmarks saturate: scores approach their ceiling, and the benchmark can no longer distinguish between frontier models. This appendix visualizes that process across every benchmark in our dataset.
Each curve starts at its first recorded score. Hover to isolate individual benchmarks. Bold dashed line = average across all benchmarks.
Filled bubbles have reached 80% saturation. Dashed outlines have not. Bubble size reflects current best score.
Select individual benchmarks to compare saturation trajectories. Toggle "All scores" to see individual model data points behind the frontier.
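The captions above rely on two ideas: the frontier (the running best score on a benchmark) and an 80% saturation threshold. The source does not state how saturation is computed, so the following is only a minimal sketch under one assumption: saturation is the frontier score divided by the benchmark's maximum possible score. The `Score` type, the function names, and the sample data are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Score:
    model: str       # hypothetical model name
    released: date   # date the score was recorded
    value: float     # score in the benchmark's own units


def frontier(scores):
    """Return the scores that set a new best, in chronological order."""
    best = float("-inf")
    record_setters = []
    for s in sorted(scores, key=lambda s: s.released):
        if s.value > best:
            best = s.value
            record_setters.append(s)
    return record_setters


def saturation(best_score, ceiling):
    """ASSUMED definition: fraction of the maximum possible score reached."""
    if ceiling <= 0:
        raise ValueError("ceiling must be positive")
    return best_score / ceiling


def is_saturated(scores, ceiling, threshold=0.80):
    """True if the current frontier score meets the saturation threshold."""
    front = frontier(scores)
    if not front:
        return False
    return saturation(front[-1].value, ceiling) >= threshold


# Invented data for illustration only; not taken from the dataset.
example = [
    Score("model-a", date(2021, 1, 1), 40.0),
    Score("model-b", date(2022, 1, 1), 35.0),  # below the frontier
    Score("model-c", date(2023, 1, 1), 85.0),
]

frontier_models = [s.model for s in frontier(example)]
saturated = is_saturated(example, ceiling=100.0)
```

In this sketch, `model-b` would appear only when "All scores" is toggled on, because it never set a new best. A different saturation definition (for example, one normalized against the first recorded score or a human baseline) would change which benchmarks count as filled bubbles.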