My benchmark for large language models | Hacker News


		My benchmark for large language models (carlini.com)
		4 points by cheviethai123 on Feb 21, 2024 | 2 comments

cheviethai123 on Feb 21, 2024

Notice how low Gemini scores here compared with other LLM tests. I'm also impressed that the evaluation method can assess performance without relying on tailored prompts.

hoamatcuoi on Feb 21, 2024

But the benchmark only scores Gemini-Pro 1. I'm curious how Gemini Ultra would perform here, but I guess we can't know yet.
