July 12, 2024

On Data, feat. Shayne Longpre | TRACES Appendix 38

50 minutes

In this conversation, Cristian and Shayne discuss the foundational role of data in AI and the challenges associated with data provenance and curation. They explore the organization and sourcing of data sets, the complexities of filtering and balancing data, and the legal and ethical implications of data usage.

They also touch on the importance of transparency, accountability, and independent evaluation in the development of AI models, highlighting the need for responsible data practices and the potential impact of AI on society. From there, they explore the protocols and challenges surrounding AI research and the need for infrastructure in the field.

The discussion delves into the concept of a safe harbor for good-faith research and the importance of distinguishing between good and bad researchers. It also touches on the changing landscape of the web and its impact on data access and consent.

The enforceability of consent mechanisms and the complexities of copyright in the digital age are also discussed.

Find me at [email protected]

PRE-ORDER TRACES: A PSY-FI NOVEL NOW (https://ccblife.gumroad.com/l/traces)

Also, who are you? Get a draft of TRACES if you fill out this form (https://forms.gle/rFnVFrCNUAJz7Fvn7)

About the Guest:

Shayne Longpre is a PhD Candidate at MIT, where he works on training language models and understanding their broader social challenges. In particular, he investigates their risks, access, and transparency, with an emphasis on training data. He leads the Data Provenance Initiative and co-organized the AI safe harbor open letter (co-signed by 350+ researchers and journalists), advocating for better independent research access to closed models. His work has been covered by the New York Times, the Washington Post, and VentureBeat.

Set-Up:

- Camera: https://amzn.to/3PZVscb (don't laugh)

- Microphone: https://amzn.to/46f3pB5

- Teleprompter Stand: https://amzn.to/3tgS98y

- Teleprompter App: https://amzn.to/46jdH31

- Teleprompter Screen: https://amzn.to/3PNfKFI (yup)

- Headphones: https://amzn.to/46gMSwo

Timestamps

00:00 Introduction and Background

02:25 The Foundational Role of Data in AI

08:57 Challenges in Data Provenance and Curation

15:36 Transparency and Accountability in AI Development

21:49 Legal and Ethical Implications of Data Usage

29:56 The Potential of Foundation Models and Best Practices

41:59 Protocols and Infrastructure for AI Research

44:11 Distinguishing Good and Bad Researchers in AI

48:25 The Changing Landscape of the Web and Data Access

01:10:55 Enforceability of Consent Mechanisms and Copyright in the Digital Age

Hashtags

#DataProvenance #DataCuration #AIEthics #AITransparency #DataSets #AIChallenges #DataBalance #LegalImplications #AIResearch #DataUsage #ResponsibleAI #AIModels #DataOrganization #AIRegulations #SafeHarbor #GoodFaithResearch #AIResponsibility #WebEvolution #DataAccess #UserConsent #CopyrightLaws #DigitalEthics #AIImpact #AIAccountability #IndependentEvaluation
