Into AI Safety

HACKATHON: Evals November 2023 (2)

Join our hackathon group for the second episode in the Evals November 2023 Hackathon subseries. This time, we solidify our goals for the hackathon after some preliminary experimentation and ideation.

Check out Stellaric's website or follow them on Twitter.

01:53 - Meeting starts
05:05 - Pitch: extension of locked models
23:23 - Pitch: retroactive holdout datasets
34:04 - Preliminary results
37:44 - Next steps
42:55 - Recap

Links to all articles and papers mentioned throughout the episode are listed below, in order of appearance.

  • Evalugator library
  • Password Locked Model blogpost
  • TruthfulQA: Measuring How Models Mimic Human Falsehoods
  • BLEU: a Method for Automatic Evaluation of Machine Translation
  • BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
  • Detecting Pretraining Data from Large Language Models

Into AI Safety, by Jacob Haimes