Explore regulatory‑grade multimodal data de‑identification and tokenisation with Youssef Mellah, PhD, Senior Data Scientist at John Snow Labs, and Srikanth Kumar Rana, Solutions Architect at Databricks.
Learn how to remove, mask or transform protected health information (PHI) across clinical notes, tables, PDFs and DICOM files at scale while meeting HIPAA, GDPR and CCPA requirements, all without sacrificing analytical value.
Timestamps:
00:00 – Welcome & Episode Overview
02:43 – How Databricks supports secure de‑identification workflows
03:50 – Built-in techniques: masking, encryption, hashing
05:26 – Introduction to Multimodal Data De-Identification
07:15 – OCR + NLP pipeline for PHI extraction from visual & text data
08:35 – Notebook demo: PHI identification in clinical notes
12:00 – PDF de-identification
12:56 – DICOM file de-identification
14:18 – Output: consistent masking across all modalities
Listen on your favourite platform:
• YouTube: https://www.youtube.com/playlist?list=PL5zieHHAlvApZKkwtu746ivthRc5zyTiU
• Apple Podcasts: https://podcasts.apple.com/us/podcast/the-healthcare-ai-podcast/id1827098175
• Spotify: https://open.spotify.com/show/2XNrQBeCY7OGql2jVhcP7t
• Amazon Music: https://music.amazon.com/podcasts/5b1f49a6-dba8-479e-acdf-9deac2f8f60e/the-healthcare-ai-podcast
Resources:
• John Snow Labs Models Hub: https://nlp.johnsnowlabs.com/models
• Spark NLP Workshop Repo: https://github.com/JohnSnowLabs/spark-nlp-workshop
• Visual NLP Workshop Repo: https://github.com/JohnSnowLabs/visual-nlp-workshop
• JSL Docs: https://nlp.johnsnowlabs.com/docs
• JSL Live Demos: https://nlp.johnsnowlabs.com/demos
• JSL Learning Hub: https://nlp.johnsnowlabs.com/learn
Connect with us:
• Website: https://www.johnsnowlabs.com/
• LinkedIn: https://www.linkedin.com/company/johnsnowlabs/
• Facebook: https://www.facebook.com/JohnSnowLabsInc/
• X: https://x.com/JohnSnowLabs
#HealthcareAI #DataPrivacy #HIPAA #PHI #DeIdentification #MedicalAI #GDPR #HealthTech #MultimodalAI