AI Papers Podcast

By PocketPod

A daily update on the latest AI research papers. We provide a high-level overview of a handful of papers each day and link all papers in the description for further reading. This podcast is creat...

· Education

FAQs about AI Papers Podcast:

How many episodes does AI Papers Podcast have?

The podcast currently has 145 episodes available.

AI Papers Podcast episodes:

May 26, 2024 Revolution in Image Generation, Thermodynamic Gradient Descent, DMD2 for Fast Synthesis, Distributed Speculative Inference
11min
May 24, 2024 Language Model Mysteries, Personalized Image Generation, Audio-Visual Transformer Innovations, DeepSeek-Prover, Dense Connector: MLLM Potential
ReVideo: Remake a Video with Motion and Content Control
Not All Language Model Features Are Linear
RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
Dense Connector for MLLMs
11min
May 23, 2024 Transformer Linearity, Face-Adapter Diffusion Models, Cross-Layer Attention Shrinks LLMs, Image Generation Breakthrough
Your Transformer is Secretly Linear
Diffusion for World Modeling: Visual Details Matter in Atari
Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
OmniGlue: Generalizable Feature Matching with Foundation Model Guidance
Personalized Residuals for Concept-Driven Text-to-Image Generation
11min
May 22, 2024 Infinite Video Generation, High-Rank Fine-Tuning, Modular LLMs with LoRA Libraries
FIFO-Diffusion: Generating Infinite Videos from Text without Training
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Imp: Highly Capable Large Multimodal Models for Mobile Devices
Octo: An Open-Source Generalist Robot Policy
Towards Modular LLMs by Building and Reusing a Library of LoRAs
10min
May 21, 2024 Tailoring Language Models for Science, Scaling Laws in NLP, Grounded 3D-LLM Innovations, Efficient Large Model Inference
INDUS: Effective and Efficient Language Models for Scientific Applications
Observational Scaling Laws and the Predictability of Language Model Performance
Grounded 3D-LLM with Referent Tokens
Layer-Condensed KV Cache for Efficient Inference of Large Language Models
Dynamic data sampler for cross-language transfer learning in large language models
10min
May 18, 2024 Chameleon's Multimodal Breakthrough, LoRA's Learning Efficiency, Many-Shot In-Context Learning, Object Detection Innovation, Text-to-3D Generation
Chameleon: Mixed-Modal Early-Fusion Foundation Models
LoRA Learns Less and Forgets Less
Many-Shot In-Context Learning in Multimodal Foundation Models
CAT3D: Create Anything in 3D with Multi-View Diffusion Models
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode
11min
May 17, 2024 Efficient Multimodality, Vision Suite's Custom Data, EEG Music Decoding Advances, Mobile Video Breakthrough
ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models
Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model
BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation
Naturalistic Music Decoding from EEG Data via Latent Diffusion Models
No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding
9min
May 16, 2024 Transformer Models Beyond Scaling, Multilingual Image Synthesis, Advanced Text-to-Image Control
VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Compositional Text-to-Image Generation with Dense Blob Representations
10min
May 15, 2024 Vision-Language Model Design, Online RLHF Workflow, Multilingual AI, AI Memory Solution
What matters when building vision-language models?
RLHF Workflow: From Reward Modeling to Online RLHF
SUTRA: Scalable Multilingual Language Model Architecture
SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots
10min
May 14, 2024 BlenderAlchemy Revolution, Stylus Adapter Magic, DressCode Digital Fashion
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models
Stylus: Automatic Adapter Selection for Diffusion Models
Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations
DressCode: Autoregressively Sewing and Generating Garments from Text Guidance
PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
11min