Datacast

Episode 60: Algorithms and Data Structures for Massive Datasets with Dzejla Medjedovic



Show Notes
  • (01:58) Dzejla described her undergraduate experience studying Computer Science at the Sarajevo School of Science and Technology back in the mid-2000s.
  • (07:59) Dzejla recapped her overall experience getting a Ph.D. in Computer Science at Stony Brook University.
  • (14:38) Dzejla unpacked the key research problem in her Ph.D. thesis titled “Upper and Lower Bounds on Sorting and Searching in External Memory.”
  • (19:13) Dzejla went over the details of her paper “Don’t Thrash: How to Cache Your Hash on Flash,” which describes the Cascade Filter, an approximate-membership-query data structure that scales beyond main memory and serves as an alternative to the well-known Bloom filter.
  • (24:41) Dzejla elaborated on her work “The batched predecessor problem in external memory,” which studies lower bounds in three external-memory models: the I/O comparison model, the I/O pointer-machine model, and the indexability model.
  • (29:56) Dzejla shared her learnings from being a Teaching Assistant for the Introduction to Algorithms course at Stony Brook (at both the undergraduate and graduate levels).
  • (35:08) Dzejla went over her summer internships at Microsoft’s Server and Tools Division during her Ph.D.
  • (41:06) Dzejla reasoned about her decision to return to Sarajevo School of Science and Technology as an Assistant Professor of Computer Science.
  • (47:22) Dzejla dissected the essential concepts and methods covered in her Data Structures, Introductory Algorithms, Advanced Algorithms, and Algorithms for Big Data courses taught at SSST.
  • (48:42) Dzejla provided a brief overview of the Computer Science/Software Engineering department at the International University of Sarajevo (where she has been a professor since 2017).
  • (50:57) Dzejla briefly talked about the courses that she taught at IUS, including Intro to Programming, Human-Computer Interaction, and Algorithms/Data Structures.
  • (52:49) Dzejla shared the challenges of writing Algorithms and Data Structures for Massive Datasets, which introduces data processing and analytics techniques specifically designed for large distributed datasets.
  • (56:14) Dzejla explained concepts in Part 1 of the book, including Hash Tables, Approximate Membership, Bloom Filters, Frequency/Cardinality Estimation, Count-Min Sketch, and HyperLogLog.
  • (58:38) Dzejla provided a brief overview of techniques to handle streaming data in Part 2 of the book.
  • (01:00:14) Dzejla mentioned the data structures for large databases and external-memory algorithms in Part 3 of the book.
  • (01:02:15) Dzejla shared her thoughts about the tech community in Sarajevo.
  • (01:04:16) Closing segment.
Dzejla’s Contact Info
  • LinkedIn
  • Twitter
  • Google Scholar
Mentioned Content

Papers

  • “Upper and Lower Bounds on Sorting and Searching in External Memory” (Dzejla’s Ph.D. Thesis, 2014)
  • “Don’t Thrash: How to Cache Your Hash on Flash” (2012)
  • “The batched predecessor problem in external memory” (2014)

People

  • Erik Demaine (Computer Science Professor at MIT)
  • Michael Bender (Computer Science Professor at Stony Brook, Dzejla’s Ph.D. Advisor)
  • Joseph Mitchell (Computational Geometry Professor at Stony Brook)
  • Steven Skiena (Computer Science Professor at Stony Brook)
  • Jeff Erickson (Computer Science Professor at UIUC)

Books

  • “Algorithms and Data Structures for Massive Datasets” (by Dzejla Medjedovic, Emin Tahirovic, and Ines Dedovic)
  • “The Algorithm Design Manual” (by Steven Skiena)

Here is a permanent 40% discount code (good for all Manning products in all formats) for Datacast listeners: poddcast19. Link at http://mng.bz/4MAR.

Here is one free eBook code good for a copy of Algorithms and Data Structures for Massive Datasets for a lucky listener: algdcsr-7135. Link at http://mng.bz/Q2y6



This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit datacast.substack.com/subscribe

Datacast by James Le