Rabbit Hole Research

On Data, feat. Shayne Longpre | TRACES Appendix 38



In this conversation, Cristian and Shayne discuss the foundational role of data in AI and the challenges associated with data provenance and curation. They explore the organization and sourcing of data sets, the complexities of filtering and balancing data, and the legal and ethical implications of data usage.
They also touch on the importance of transparency, accountability, and independent evaluation in the development of AI models, highlighting the need for responsible data practices and the potential impact of AI on society. The conversation then turns to the protocols and challenges surrounding AI research and the need for infrastructure in the field.
They discuss the concept of a safe harbor for good-faith research and the importance of distinguishing between good and bad researchers, as well as the changing landscape of the web and its impact on data access and consent.
Finally, they cover the enforceability of consent mechanisms and the complexities of copyright in the digital age.
PRE-ORDER TRACES: A PSY-FI NOVEL NOW (https://ccblife.gumroad.com/l/traces)
Also, who are you? Get a draft of TRACES if you fill out this form (https://forms.gle/rFnVFrCNUAJz7Fvn7)
About the Guest:
Shayne Longpre is a PhD Candidate at MIT, where he works on training language models and understanding their broader social challenges. In particular, he investigates their risks, access, and transparency, with an emphasis on training data. He leads the Data Provenance Initiative and co-organized the AI safe harbor open letter (co-signed by 350+ researchers and journalists), which advocates for better independent research access to closed models. His work has been covered by the New York Times, the Washington Post, and VentureBeat.
Set-Up:
- Camera: https://amzn.to/3PZVscb (don't laugh)
- Microphone: https://amzn.to/46f3pB5
- Teleprompter Stand: https://amzn.to/3tgS98y
- Teleprompter App: https://amzn.to/46jdH31
- Teleprompter Screen: https://amzn.to/3PNfKFI (yup)
- Headphones: https://amzn.to/46gMSwo
Timestamps
00:00 Introduction and Background
02:25 The Foundational Role of Data in AI
08:57 Challenges in Data Provenance and Curation
15:36 Transparency and Accountability in AI Development
21:49 Legal and Ethical Implications of Data Usage
29:56 The Potential of Foundation Models and Best Practices
41:59 Protocols and Infrastructure for AI Research
44:11 Distinguishing Good and Bad Researchers in AI
48:25 The Changing Landscape of the Web and Data Access
01:10:55 Enforceability of Consent Mechanisms and Copyright in the Digital Age
Hashtags
#DataProvenance #DataCuration #AIEthics #AITransparency #DataSets #AIChallenges #DataBalance #LegalImplications #AIResearch #DataUsage #ResponsibleAI #AIModels #DataOrganization #AIRegulations #SafeHarbor #GoodFaithResearch #AIResponsibility #WebEvolution #DataAccess #UserConsent #CopyrightLaws #DigitalEthics #AIImpact #AIAccountability #IndependentEvaluation

Rabbit Hole Research, by Cristian Cibils Bernardes