September 08, 2024

#1: Building a domain-specific evaluation dataset from Chatbot Arena data

32 minutes

We discussed Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena, a paper about using Chatbot Arena data to build domain-specific evaluation datasets.

See also: LISTEN, a podcast transcription service.

Shownotes:

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

Chat with Open Large Language Models

From Live Data to High-Quality Benchmarks: The Arena-Hard Pipeline | LMSYS Org

Benchmarks 201: Why Leaderboards > Arenas >> LLM-as-Judge

https://x.com/karpathy/status/1737544497016578453

https://github.com/lm-sys/arena-hard-auto/tree/main/BenchBuilder

Speakers:

seya(@sekikazu01)

kagaya(@ry0_kaga)

AI Engineering Now

By AI Engineering Now
