Show Notes
- (02:06) Fabiana talked about her Bachelor’s degree in Applied Mathematics from the University of Lisbon in the early 2010s.
- (04:18) Fabiana shared lessons learned from her first job out of college as a Siebel and BI Developer at Novabase.
- (05:13) Fabiana discussed unique challenges while working as an IoT Solutions Architect at Vodafone.
- (09:56) Fabiana mentioned projects she contributed to as a Data Scientist at startups such as ODYSAI and Habit Analytics.
- (12:44) Fabiana talked about the two Master’s degrees she got while working in the industry (Applied Econometrics from Lisbon School of Economics and Management and Business Intelligence from NOVA IMS Information Management School).
- (14:41) Fabiana explained the difference between data science and business intelligence.
- (18:01) Fabiana shared the founding story of YData, the first data-centric platform with synthetic data, where she is currently the Chief Data Officer.
- (21:32) Fabiana discussed different techniques to generate synthetic data, including oversampling, Bayesian Networks, and generative models.
- (24:01) Fabiana unpacked the key insights in her blog series on generating synthetic tabular data.
- (29:40) Fabiana summarized novel design and optimization techniques to cope with the challenges of training GAN models.
- (33:44) Fabiana brought up the benefits of using Differential Privacy as a complement to synthetic data generation.
- (38:07) Fabiana unpacked her post “The Cost of Poor Data Quality,” where she defined data quality as a measure of data based on factors such as accuracy, completeness, consistency, reliability, and above all, whether it is up to date.
- (42:11) Fabiana explained the important role that data quality plays in ensuring model explainability.
- (44:57) Fabiana reasoned about YData’s decision to pursue the open-source strategy.
- (47:47) Fabiana discussed her podcast called “When Machine Learning Meets Privacy” in collaboration with the MLOps Slack community.
- (49:14) Fabiana briefly shared the challenges she encountered in getting the first cohort of customers for YData.
- (50:12) Fabiana went over valuable lessons on attracting the right people who are excited about YData’s mission.
- (51:52) Fabiana shared her take on the data community in Lisbon and her effort to inspire more women to join the tech industry.
- (53:47) Closing segment.
Fabiana’s Contact Info
YData’s Resources
- Website
- GitHub
- LinkedIn
- Twitter
- AngelList
- Synthetic Data Community
Mentioned Content
Blog Posts
- Synthetic Data: The Future Standard for Data Science Development (April 2020)
- Generating Synthetic Tabular Data with GANs — Part 1 (May 2020)
- Generating Synthetic Tabular Data with GANs — Part 2 (May 2020)
- What Is Differential Privacy? (May 2020)
- What Is Going On With My GAN? (July 2020)
- How To Generate Synthetic Tabular Data? Wasserstein Loss for GANs (Sep 2020)
- The Cost of Poor Data Quality (Sep 2020)
- How Can I Explain My ML Models To The Business? (Oct 2020)
- Synthetic Time-Series Data: A GAN Approach (Jan 2021)
Podcast
- “When Machine Learning Meets Privacy”
People
- Jean-Francois Rajotte (Resident Data Scientist at the University of British Columbia)
- Sumit Mukherjee (Associate Professor of Statistics at Columbia University)
- Andrew Trask (Leader at OpenMined, Research Scientist at DeepMind, Ph.D. Student at the University of Oxford)
- Théo Ryffel (Co-Founder of Arkhn, Ph.D. Student at ENS and INRIA, Leader at OpenMined)
Recent Announcements/Articles
- Partnerships with UbiOps and Algorithmia
- The rise of DataPrepOps (March 2021)
- From model-centric to data-centric (March 2021)