
Angelo and Manos first connected through the 265 course on Big Data Systems at Harvard University, the course that inspired Angelo's thesis. The two discuss Manos' papers and how the future of Big Data lies at the boundaries of Moore's Law. Think about LSM trees (Log-Structured Merge Trees) and data compaction: what counts as an acceptable deletion when users ask for their data to be removed? Is it good enough to dissociate the data from the user it identifies? Analysis of Big Data systems always leans toward performance, and a large sequence of deletes can significantly disrupt a system. Most people would let current execution cycles finish, flag the data that is no longer valid, and perhaps do the work during non-peak hours. But what happens when your data starts to become dirty? How do you solve issues like privacy and requests under the "Right to be forgotten" or the "Right to erasure"?
Manos talks about the papers he has written, which you can read in the links below. He addresses the delete question and its boundaries with privacy in mind. Performance is a crucial factor, and looking at the issue holistically is just as important as encryption when protecting privacy.
Manos' Research Papers
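To make the delete question concrete, here is a minimal sketch of how an LSM tree typically handles a delete: it writes a tombstone marker instead of removing anything in place, so the old value stays on storage until compaction merges the runs. This is a toy illustration and is not taken from Manos' papers; the names TinyLSM, TOMBSTONE, and physically_present are hypothetical, and the "runs" are plain in-memory dictionaries standing in for on-disk files.

```python
TOMBSTONE = object()  # hypothetical sentinel marking a logically deleted key


class TinyLSM:
    """Toy LSM tree: an in-memory buffer plus immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.runs = []  # newest run first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        # A delete is just another write: nothing is removed in place.
        self.put(key, TOMBSTONE)

    def flush(self):
        # Freeze the buffer into a new sorted run.
        if self.memtable:
            self.runs.insert(0, dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        # Search newest to oldest; the first hit wins.
        for table in [self.memtable, *self.runs]:
            if key in table:
                value = table[key]
                return None if value is TOMBSTONE else value
        return None

    def physically_present(self, key):
        # True if any real (non-tombstone) value for key is still stored.
        return any(
            key in table and table[key] is not TOMBSTONE
            for table in [self.memtable, *self.runs]
        )

    def compact(self):
        # Full compaction: merge every run, keep the newest version of each
        # key, and drop tombstones. Only now is deleted data really gone.
        self.flush()
        merged = {}
        for run in reversed(self.runs):  # oldest first, newer overwrites
            merged.update(run)
        live = {k: v for k, v in sorted(merged.items()) if v is not TOMBSTONE}
        self.runs = [live]


db = TinyLSM(memtable_limit=2)
db.put("alice", "a@example.com")
db.put("bob", "b@example.com")   # buffer is full, so it is flushed to a run
db.delete("alice")               # writes a tombstone into the buffer

print(db.get("alice"))                 # None: logically deleted
print(db.physically_present("alice"))  # True: old value still in a run
db.compact()
print(db.physically_present("alice"))  # False: purged by compaction
```

The gap between the second and third print is where the privacy question lives: the user's data is invisible to reads but still stored until compaction happens to reach it, and forcing that compaction early is exactly the kind of work that costs performance.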
Further Reading
Host: Angelo Kastroulis
Executive Producer: Náture Kastroulis
Producer: Albert Perrotta
Communications Strategist: Albert Perrotta
Video/Audio Engineer: Ryan Thompson
Music: All Things Grow by Oliver Worth