Disseminate: The Computer Science Research Podcast

Thomas Hütter | JEDI: These aren’t the JSON documents you’re looking for | #4



Summary:


The JavaScript Object Notation (JSON) is a popular data format used in document stores to natively support semi-structured data.

In this interview, Thomas talks about how he addressed the problem of JSON similarity lookup queries: given a query document and a distance threshold, retrieve all documents that are within the threshold from the query document. In other words, get me all similar documents! Unlike other hierarchical formats such as XML, JSON supports both ordered and unordered sibling collections within a single document, which poses a new challenge to the tree model and distance computation. Thomas talks about his proposal, the JSON tree, a lossless tree representation of JSON documents, and defines the JSON Edit Distance (JEDI), the first edit-based distance measure for JSON. He talks about the development of QuickJEDI, an algorithm that computes JEDI by leveraging a new technique to prune expensive sibling matchings. It outperforms a baseline algorithm by an order of magnitude in runtime. The experimental evaluation shows that the solution scales to databases with millions of documents and JSON trees with tens of thousands of nodes.
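To make the query shape concrete, here is a minimal Python sketch of a similarity lookup over JSON documents. It is an illustration under stated assumptions, not the method from the episode: the `Node` layout, `to_tree`, `label_bag_distance`, and `similarity_lookup` are hypothetical names, and the distance used is a crude placeholder (it compares bags of node labels and ignores structure and order), standing in where JEDI would be plugged in.

```python
import json
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                     # "object", "array", "key", or "literal" (assumed node kinds)
    label: str = ""               # key name or literal value; empty for containers
    children: list = field(default_factory=list)


def to_tree(value):
    """Convert parsed JSON into a tree that records each node's kind."""
    if isinstance(value, dict):
        # Object members form an unordered sibling collection.
        return Node("object", children=[Node("key", k, [to_tree(v)]) for k, v in value.items()])
    if isinstance(value, list):
        # Array elements form an ordered sibling collection.
        return Node("array", children=[to_tree(v) for v in value])
    return Node("literal", json.dumps(value))


def labels(node):
    """Yield (kind, label) for every node in the tree."""
    yield (node.kind, node.label)
    for child in node.children:
        yield from labels(child)


def label_bag_distance(a, b):
    """Placeholder distance, NOT JEDI: size of the symmetric difference of the label bags."""
    ca, cb = Counter(labels(a)), Counter(labels(b))
    return sum(((ca - cb) + (cb - ca)).values())


def similarity_lookup(query, documents, threshold, distance=label_bag_distance):
    """Return every document whose distance to the query is within the threshold."""
    query_tree = to_tree(query)
    return [doc for doc in documents if distance(query_tree, to_tree(doc)) <= threshold]
```

The `distance` parameter is kept pluggable on purpose: a real edit-based measure over these trees would replace the placeholder without changing the lookup itself. A linear scan like this computes one distance per stored document, which is why the cost of each distance computation matters at scale.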


Questions:

0:47: Can you explain to the listeners what JSON is?

1:14: What is the problem you're trying to solve in your research?

1:48: What was the reason JSON was under-researched?

2:13: What is the motivation for this research? Why do we need it?

2:52: What was the solution you developed to solve this problem?

4:35: How does tree edit distance work?

5:18: How do we go from tree edit distance to JEDI?

6:29: How did you evaluate JEDI?

8:31: Do other database systems provide similar functionality?

9:33: Can you tell the listeners more about AsterixDB?

10:20: What was the most challenging aspect of working on this topic?

10:59: What are the future plans for this research?

11:56: What attracted you to working on similarity queries?


Links:


Hosted on Acast. See acast.com/privacy for more information.

Disseminate: The Computer Science Research Podcast, by Jack Waudby
