Full-Leaderboard@1.0 Leaderboard
# The full leaderboard for UEval, covering all domains and tasks.

| Rank | Model Name | Avg | Art | Diagram | Exercise | Life | Paper | Space | Tech | Textbook | Date |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | GPT 5 Thinking | 66.4% | 67.8% | 67.8% | 61.4% | 63.8% | 51.9% | 84.0% | 57.0% | 78.0% | 1/14/2025 |
| 2 | Gemini 2.5 Flash | 66.0% | 66.6% | 66.4% | 50.0% | 63.0% | 71.6% | 78.0% | 58.2% | 74.0% | 1/14/2025 |
| 3 | GPT 5 Instant | 65.2% | 71.2% | 62.3% | 57.6% | 69.7% | 55.1% | 72.3% | 50.7% | 77.9% | 1/14/2025 |
| 4 | Gemini 2.0 Flash | 55.1% | 70.4% | 47.6% | 48.0% | 58.0% | 45.8% | 65.2% | 50.2% | 55.2% | 1/14/2025 |
| 5 | Emu 3.5 | 49.1% | 59.3% | 41.1% | 45.4% | 62.0% | 31.6% | 59.1% | 37.0% | 57.4% | 1/14/2025 |
| 6 | Bagel | 31.0% | 39.0% | 37.2% | 21.4% | 33.6% | 20.0% | 29.8% | 24.8% | 42.5% | 1/14/2025 |
| 7 | Janus Pro | 22.9% | 26.4% | 37.4% | 11.5% | 23.0% | 15.2% | 21.0% | 17.6% | 31.0% | 1/14/2025 |
| 8 | Show o2 | 22.6% | 25.6% | 33.2% | 13.1% | 15.6% | 17.4% | 25.4% | 17.4% | 33.1% | 1/14/2025 |
| 9 | MMaDA | 14.4% | 15.7% | 14.2% | 12.6% | 15.8% | 13.3% | 10.8% | 12.4% | 20.0% | 1/14/2025 |

Submit your results by opening an issue on our GitHub repository.