December 03, 2020

HPR3219: Linux Inlaws S01E18: Voice Recognition and Text to Speech

1 hour 19 minutes

In this episode, Chris is harassed by quite a few artificial nuisance callers, among them drug lords, Irish nurses and some random Linux Inlaws Chief Financial Officer. Based on these examples, our two heroes discuss the history and current state of text-to-speech (TTS) and voice recognition. We attempted to use voice recognition software to produce a transcript of the show.

Shownotes:

WaveNet: https://deepmind.com/blog/article/wavenet-generative-model-raw-audio

Tacotron: https://ai.googleblog.com/2017/12/tacotron-2-generating-human-like-speech.html

DeepSpeech: https://github.com/mozilla/DeepSpeech

Lyrebird / Welcome.AI: https://www.welcome.ai/lyrebird

Nvidia Tacotron 2: https://github.com/NVIDIA/tacotron2

TensorFlow: https://www.tensorflow.org

PyTorch: https://pytorch.org

Mel spectrograms: https://medium.com/analytics-vidhya/understanding-the-mel-spectrogram-fca2afa2ce53

Graphcore: https://www.graphcore.ai

FPGA: https://en.wikipedia.org/wiki/Field-programmable_gate_array

IBM ROMP: https://en.wikipedia.org/wiki/IBM_ROMP

Google's TTS: https://cloud.google.com/text-to-speech

Apple M1: https://www.gsmarena.com/the_apple_m1_is_the_first_armbased_chipset_for_macs_with_the_fastest_cpu_cores_and_top_igpu-news-46222.php

Secure Enclaves: https://support.apple.com/guide/security/secure-enclave-overview-sec59b0b31ff/web

OSDU: https://www.opengroup.org/osdu/forum-homepage

Jack Kerouac's On the Road: https://en.wikipedia.org/wiki/On_the_Road

By Linux Inlaws
