Codex Mentis: Science and tech to study cognition

Beyond the cloud: Reclaiming data sovereignty in speech transcription



🪄 Created using Google Gemini and NotebookLM.
In this episode of Codex Mentis, we explore the intersection of generative AI and research methodology, focusing on a production-ready, open-source workflow for secure speech transcription developed by Dr Pablo Bernabeu. OpenAI’s Whisper models have set a new standard for speech-to-text accuracy, but consumer-grade cloud interfaces such as ChatGPT or Google Gemini are often incompatible with the demands of academic and clinical research. We dissect three primary limitations of these cloud-based tools: restrictive file size caps, a lack of methodological reproducibility, and the privacy and GDPR risks of transmitting sensitive human data to third-party servers. The discussion highlights an alternative that uses high-performance computing environments to achieve complete data sovereignty by running transcription entirely offline within a secure institutional perimeter. We break down the engineering behind this transition, including SLURM job scheduling to scale transcription across GPU nodes and quality controls that fix common AI hallucinations such as spurious repetitions and accidental language switching. We also examine the system's multi-tiered approach to personal name masking and speaker diarisation, which protects participant anonymity and produces structured dialogue without compromising the semantic integrity of the research data. The episode shows how researchers can balance the power of modern AI with the non-negotiable requirements of ethical compliance and long-term scientific sustainability.
Sources and related content can be consulted at https://pablobernabeu.github.io/2025/speech-transcription-python
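For listeners who want a concrete picture of one of the quality controls mentioned above, here is a minimal sketch of how spurious repetitions in a transcript might be collapsed. It is an illustration only, not the code from the workflow discussed in the episode: the function name `collapse_repetitions` and its parameters `max_repeats` and `max_phrase_words` are hypothetical.

```python
def collapse_repetitions(text: str, max_repeats: int = 2,
                         max_phrase_words: int = 5) -> str:
    """Trim runs of an immediately repeated word or short phrase.

    Hypothetical helper for illustration; names and defaults are assumptions.
    A run longer than `max_repeats` is cut down to `max_repeats` copies.
    """
    words = text.split()
    # Compare case-insensitively and ignore trailing punctuation.
    keys = [w.lower().strip(".,!?;:") for w in words]
    out = []
    i = 0
    while i < len(words):
        collapsed = False
        # Try the shortest phrase first so "thank you thank you ..."
        # is detected as repeats of "thank you", not of a longer unit.
        for n in range(1, max_phrase_words + 1):
            phrase = keys[i:i + n]
            if len(phrase) < n:
                break  # not enough words left for this phrase length
            count = 1
            j = i + n
            while keys[j:j + n] == phrase:
                count += 1
                j += n
            if count > max_repeats:
                # Keep the original wording of the first occurrence.
                out.extend(words[i:i + n] * max_repeats)
                i = j
                collapsed = True
                break
        if not collapsed:
            out.append(words[i])
            i += 1
    return " ".join(out)


if __name__ == "__main__":
    looped = "thank you thank you thank you thank you for coming"
    print(collapse_repetitions(looped, max_repeats=1))
```

Keeping up to two copies by default is a deliberate choice in this sketch, since natural speech contains genuine repetitions ("really really") that a research transcript should preserve.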

Codex Mentis: Science and tech to study cognition. By Pablo Bernabeu