
Our last AI PhD grad student feature was Shunyu Yao, who happened to focus on Language Agents for his thesis and immediately went to work on them for OpenAI. Our pick this year is Jack Morris, who bucks the "hot" trends by *not* working on agents, benchmarks, or VS Code forks, and is instead known for his work on the information-theoretic understanding of LLMs, starting from embedding models and latent space representations (always close to our heart).
Jack is an unusual combination: he does underrated research but somehow still explains it well to a mass audience, so we felt this was a good opportunity to do a different kind of episode, going through the greatest hits of a high-profile AI PhD and relating them to questions from AI Engineering.
Papers and References mentioned
AI grad school: https://x.com/jxmnop/status/1933884519557353716
A new type of information theory: https://x.com/jxmnop/status/1904238408899101014
Embeddings
Text Embeddings Reveal (Almost) As Much As Text: https://arxiv.org/abs/2310.06816
Contextual Document Embeddings: https://arxiv.org/abs/2410.02525
Harnessing the Universal Geometry of Embeddings: https://arxiv.org/abs/2505.12540
Language models
GPT-style language models memorize 3.6 bits per param: https://x.com/jxmnop/status/1929903028372459909 (a quick back-of-envelope sketch of what this figure implies appears at the end of these notes)
Approximating Language Model Training Data from Weights: https://arxiv.org/abs/2506.15553
https://x.com/jxmnop/status/1936044666371146076
LLM Inversion
"There Are No New Ideas In AI.... Only New Datasets"
https://x.com/jxmnop/status/1910087098570338756
https://blog.jxmo.io/p/there-are-no-new-ideas-in-ai-only
Misc reference (CycleGAN): https://junyanz.github.io/CycleGAN/
—
For those hiring AI PhDs, Jack also wanted to shout out Zach Nussbaum, his coauthor on Nomic Embed: Training a Reproducible Long Context Text Embedder.
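As a companion to the "3.6 bits per param" result linked above, here is a minimal back-of-envelope sketch of what that figure implies for total memorization capacity. The per-parameter constant is the one cited in Jack's thread; the model sizes and the bits-to-gigabytes conversion below are purely illustrative assumptions, not anything from the episode.

```python
# Back-of-envelope for the "~3.6 bits per parameter" memorization figure.
# The 3.6 constant comes from the result referenced above; the model sizes
# and the bits -> GB conversion here are illustrative assumptions only.
BITS_PER_PARAM = 3.6

def memorization_capacity_gb(n_params: float) -> float:
    """Rough memorization budget (in GB) implied by the per-parameter figure."""
    total_bits = n_params * BITS_PER_PARAM
    return total_bits / 8 / 1e9  # bits -> bytes -> gigabytes

for n_params in (125e6, 1.3e9, 7e9, 70e9):
    print(f"{n_params / 1e9:>5.1f}B params ~= "
          f"{memorization_capacity_gb(n_params):7.2f} GB memorized")
```

On this rough estimate, even a 70B-parameter model caps out around ~30 GB of memorized content, far smaller than typical multi-terabyte pretraining corpora, which is the kind of intuition the memorization result is getting at.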