Benchmark Ceiling

Logic

4.7 Months: The Half-Life of Cyber Safety in the Age of Mythos

UK AISI found that Claude Mythos Preview exceeded both GPT-5.5 and its own prior scores, while outgrowing the benchmark itself. LSI examines what happens when AI capability doubles faster than the tests designed to measure it can keep up.