Setting | Task Type | #Examples | Databases | Cost |
---|---|---|---|---|
Spider 2.0 | Code agent task | 632 | BigQuery(214), Snowflake(198), Postgres(10), ClickHouse(7), SQLite(135), DuckDB (DBT)(68) | Some cost incurred |
Spider 2.0-Snow | Text-to-SQL task | 547 | Snowflake(547) | NO COST!😊 |
Spider 2.0-Lite | Text-to-SQL task | 547 | BigQuery(214), Snowflake(198), SQLite(135) | Some cost incurred |
To align with research interest in the traditional text-to-SQL setting, we also release Spider 2.0-Lite, a more self-contained subset of Spider 2.0, to support faster development and evaluation.
Spider 2.0-Snow includes 547 examples, all hosted on Snowflake, which offers participants free quotas. If you want to test performance on a single SQL dialect, don't hesitate to use Spider 2.0-Snow.
Refer to the Quick Start to run your experiments on Spider 2.0, Spider 2.0-Snow, or Spider 2.0-Lite. For submission, provide a clear README, compressed code that passes your dev evaluation, any additional API keys required, and a report of prompt token counts for cost estimation. Follow the Submission Guideline for evaluation on the full dataset. We will usually return your results within 10 days!
We thank Snowflake for their generous support in hosting the Spider 2.0 Challenge. We also thank Tianbao Xie, Yiheng Xu, Fan Zhou, Yuting Lan, Per Jacobsson, Yiming Huang, Canwen Xu, Zhewei Yao, and Binyuan Hui for their helpful feedback on this work. The leaderboard submission guidelines are greatly inspired by BIRD-SQL, and we thank them for their contributions.
Rank | Date | Method | Score |
---|---|---|---|
1 | Nov 2, 2024 | Spider-Agent + o1-preview | 17.01 |
2 | Nov 2, 2024 | Spider-Agent + GPT-4o | 10.13 |
3 | Nov 2, 2024 | Spider-Agent + Claude-3.5-Sonnet | 9.02 |
4 | Nov 2, 2024 | Spider-Agent + GPT-4 | 8.86 |
5 | Nov 2, 2024 | Spider-Agent + Qwen2.5-72B | 6.17 |
6 | Nov 2, 2024 | Spider-Agent + DeepSeek-V2.5 | 5.22 |
7 | Nov 2, 2024 | Spider-Agent + Gemini-Pro-1.5 | 2.53 |
8 | Nov 2, 2024 | Spider-Agent + Llama-3.1-405B | 2.21 |
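The scores above are percentages of examples solved. A minimal sketch of how such a success-rate metric could be computed from per-example pass/fail results (the `success_rate` helper and `results` list are illustrative, not the official evaluation script, which defines the actual per-example checks):

```python
def success_rate(results):
    """Percentage of examples marked as solved, rounded to two decimals.

    `results` is a list of booleans: True means the agent's answer
    passed the per-example correctness check.
    """
    if not results:
        return 0.0
    return round(100.0 * sum(results) / len(results), 2)

# Hypothetical illustration: 107 solved out of the 632 Spider 2.0 examples.
print(success_rate([True] * 107 + [False] * 525))  # -> 16.93
```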