AI Engineering Now

#1: Building a Domain-Specific Evaluation Dataset from Chatbot Arena Data



In this episode we discussed the paper "Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena," on the theme of using Chatbot Arena data to build a domain-specific evaluation dataset.


The podcast transcription service LISTEN is available here.

Shownotes:

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

Chat with Open Large Language Models

From Live Data to High-Quality Benchmarks: The Arena-Hard Pipeline | LMSYS Org

Benchmarks 201: Why Leaderboards > Arenas >> LLM-as-Judge

https://x.com/karpathy/status/1737544497016578453

https://github.com/lm-sys/arena-hard-auto/tree/main/BenchBuilder


Speakers:

seya(@sekikazu01)

kagaya(@ry0_kaga)


By AI Engineering Now