November 18, 2024

#10: Agent-as-a-Judge: Agents That Evaluate Agents

29 minutes

We discussed the paper "Agent-as-a-Judge: Evaluate Agents with Agents", which takes its inspiration from LLM-as-a-Judge and proposes using agentic systems to evaluate agentic systems.

Transcripts are available via the podcast transcription service LISTEN.

Shownotes:

https://arxiv.org/abs/2410.10934v1

https://huggingface.co/DEVAI-benchmark

https://github.com/metauto-ai/agent-as-a-judge/tree/main

https://blog.langchain.dev/scipe-systematic-chain-improvement-and-problem-evaluation/

Speakers:

seya (@sekikazu01)

kagaya (@ry0_kaga)

AI Engineering Now

By AI Engineering Now
