Diaries of Social Data Research
By Katherine A. Keith, Naitian Zhou, & Lucy Li
The podcast currently has 20 episodes available.
In this episode, we speak to Christian Baden, Christian Pipal, and Mariken van der Velden about their 2022 paper in Communication Methods and Measures, “Three Gaps in Computational Text Analysis Methods for Social Sciences: A Research Agenda.” They co-authored this paper with Martijn Schoonvelde, and the authors span several disciplines, from communication to political science.
We discuss the challenges and joys of writing for a cross-disciplinary audience, how their frustrations with the validity of computational methods are shared across fields with different methodological conventions, and how this paper laid the groundwork for a larger project on European political text analysis.
Our guests on this episode are Diyi Yang, assistant professor at the School of Interactive Computing, and David Muchlinski, assistant professor in the Sam Nunn School of International Affairs, both at Georgia Tech. We discuss their EMNLP 2021 paper, "Latent Hatred: A Benchmark for Understanding Implicit Hate Speech." This paper is co-authored with Mai ElSherief, Caleb Ziems, Vaishnavi Anupindi, Jordyn Seybolt, and Munmun De Choudhury.
Diyi and David reveal that the annotation process behind this paper took two years and incorporated domain expertise on the broader context around hateful language. That is, an understanding of the social groups who produce this language allowed for better categorization and interpretation of implicit hate. We also discuss the cross-discipline connections they’ve forged in the past and present, and the ongoing challenges this type of work poses for computational methods.
This episode features Ted Underwood, a professor in the School of Information Sciences and Department of English at the University of Illinois Urbana-Champaign, and David Bamman, an associate professor at UC Berkeley’s School of Information. We discuss their 2018 Cultural Analytics paper co-authored with literary studies PhD student Sabrina Lee, titled “The Transformation of Gender in English-Language Fiction.”
We trace how Twitter brought Ted and David together as collaborators, and the email that sparked the beginnings of this project. They describe how this paper uses predictive modeling for an unconventional purpose, and various “means of interrogating data.” They also provide tips for establishing collaborative relationships, and advocate using substantive research questions to motivate learning technical skills.
Our guests in this episode are Ryan Gallagher, a PhD Candidate in Network Science at Northeastern University, and Brooke Foucault Welles, an Associate Professor in Communication Studies and the Network Science Institute at Northeastern University. We discuss their 2019 CSCW paper, "Reclaiming Stigmatized Narratives: The Networked Disclosure Landscape of #MeToo" with co-authors Elizabeth Stowell and Andrea G. Parker.
This episode features Dora Demszky, a PhD student in Linguistics at Stanford University. Dora works at the intersection of natural language processing and education. We discuss her ACL 2021 paper titled "Measuring Conversational Uptake: A Case Study on Student-Teacher Interactions", co-authored with Jing Liu, Zid Mancenido, Julie Cohen, Heather Hill, Dan Jurafsky, and Tatsunori Hashimoto.
Dora's work is motivated by creating tools that are useful for educators, so her research is not only descriptive or predictive, but also applicable to classrooms. She talks about managing large interdisciplinary teams, approaching research with care, and working with actual teachers to annotate data.
Our guest in this episode is Deen Freelon, Associate Professor at the University of North Carolina in the School of Journalism and Media. We chat about his 2020 Social Science Computer Review paper, "Black Trolls Matter: Racial and Ideological Asymmetries in Social Media Disinformation," with co-authors Michael Bossetta, Chris Wells, Josephine Lukito, Yiping Xia, and Kirsten Adams.
Deen also talks about writing a "behind the scenes" book chapter about the process of making this paper, being one of the first movers in the discipline of computational methods for communication studies, and how he learns programming best when it is connected to the goals of his project. He emphasizes that many of his great research ideas come from reading deeply and recommends devoting at least half a day a week solely to reading.
In this episode, we talk with David Lazer, the University Distinguished Professor of Political Science and Computer Sciences at Northeastern University and the Co-Director of the NULab for Texts, Maps, and Networks. We discuss two seminal papers in computational social science he co-authored a decade apart: "Life in the network: the coming age of computational social science" (Science 2009) and "Computational social science: Obstacles and opportunities" (Science 2020).
David shares stories from his long and distinguished CSS research career. He recounts how, in the early 2000s, he helped gather a small group of people working on new "data streams," and how they intentionally coined the term computational social science. He also talks about his own struggles on the academic job market, advice for aspiring CSS researchers, and a wish for better data availability structures.
Our guests on this episode are Kenneth Joseph, an assistant professor in Computer Science and Engineering at the University at Buffalo, and Sarah Shugars, a Faculty Fellow at New York University’s Center for Data Science. We discuss the process behind their EMNLP 2021 paper, “(Mis)alignment Between Stance Expressed in Social Media Data and Public Opinion Surveys,” co-authored with Ryan Gallagher, Jon Green, Alexi Quintana Mathé, Zijian An, and David Lazer.
Kenneth and Sarah offer tips around communication, collaboration, and project management, especially for papers written during a pandemic. Kenneth talks about “privileging ethics” when making decisions around data privacy and experimental replicability, and Sarah reflects on navigating differences in terminology use in interdisciplinary environments.
Our guests on this episode are Vinodkumar Prabhakaran, who was a computer science postdoc at Stanford and is now a senior research scientist at Google, and Camilla Griffiths, who is a postdoc at Stanford SPARQ (Social Psychological Answers to Real-world Questions). With Hang Su, Prateek Verma, Nelson Morgan, Jennifer Eberhardt, and Dan Jurafsky, they are co-authors of a TACL 2018 paper, "Detecting Institutional Dialog Acts in Police Traffic Stops."
Vinod and Camilla share with us how this collaboration formed over a common goal and a deep respect for each other’s disciplines. We discuss the considerations that went into forming community partnerships, handling sensitive police body-camera data, and recognizing the implications of their findings.
This episode features Aaron Schein, a computer scientist and postdoctoral fellow at Columbia University. We discuss his WWW 2021 paper "Assessing the Effects of Friend-to-Friend Texting on Turnout in the 2018 US Midterm Elections", co-authored with Keyon Vafa, Dhanya Sridhar, Victor Veitch, Jeffery Quinn, James Moffet, David Blei, and Donald Green.
Aaron shares with us how he collaborated with industry partners, overcame the discovery of a confounder that challenged the experiment’s original design, and responded to public feedback. He also maps his interdisciplinary journey through linguistics, political science, and computer science, and shares his twist on imposter syndrome.