
Researcher Jindong Wang and Associate Professor Steven Euijong Whang explore the NeurIPS 2024 work ERBench. ERBench leverages relational databases to create LLM benchmarks that can verify model rationale via keywords in addition to checking answer correctness.
Read the paper
Get the datasets and code