amelius on March 14, 2023 | on: GPT-4

The problem with using real exams as benchmarks is that they are often quite similar from one year to the next. So they only make sense as benchmarks if you don't also train on previous editions.