


Researcher Jindong Wang and Associate Professor Steven Euijong Whang explore the NeurIPS 2024 work ERBench. ERBench leverages relational databases to create LLM benchmarks that can verify model rationale via keywords in addition to checking answer correctness.
Read the paper
Get the datasets and code
By Researchers across the Microsoft research community
