AI Papers Podcast

By PocketPod

A daily update on the latest AI research papers. We provide a high-level overview of a handful of papers each day and will link all papers in the description for further reading. This podcast is creat...

· Education

FAQs about AI Papers Podcast:

How many episodes does AI Papers Podcast have?

The podcast currently has 145 episodes available.

AI Papers Podcast episodes:

December 16, 2024 AI Gets Human-Like Memory, Microsoft's New Math Whiz, and Teaching Robots to See Shapes
Today's advances in artificial intelligence showcase how researchers are tackling fundamental human capabilities - from continuous learning and memory to mathematical reasoning and visual understanding. These breakthroughs could transform everything from how we interact with AI assistants to enabling robots to better navigate our world, though questions remain about how closely machines can truly mimic human cognition. Links to all the papers we discussed: InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions; Phi-4 Technical Report; Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions
11min
December 13, 2024 AI Video Generation Breakthrough, Enhanced Image Understanding, and Bilingual Vision Models
Today's tech advances signal a dramatic shift in how computers understand and create visual content, with new systems that can generate synchronized multi-camera videos, understand complex scene relationships, and bridge language barriers in visual recognition. These developments could revolutionize everything from virtual film production to global communication, while raising important questions about the future of human creativity and cross-cultural understanding in an AI-powered world.
Links to all the papers we discussed: SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints; LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations; POINTS1.5: Building a Vision-Language Model towards Real World Applications
11min
December 12, 2024 AI Video Generation Improvements, Code Models Learn Human Preferences, and Manga Gets an AI Makeover
Today's tech frontiers showcase how artificial intelligence is becoming more attuned to human creativity and preferences across multiple domains. From a new system that can turn text and images into fluid videos, to programming models that write code the way humans actually want it, to AI that can generate custom manga stories, we explore how machines are learning to create content that feels more natural and personalized than ever before. Links to all the papers we discussed: STIV: Scalable Text and Image Conditioned Video Generation; Evaluating and Aligning CodeLLMs on Human Preference; DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
10min
December 11, 2024 AI Memory Breakthrough, Math Error Detection, and New Ways of Machine Thinking
Today we explore how artificial intelligence is evolving to think more like humans, from developing different types of memory to catching mathematical mistakes. As researchers unveil new approaches to machine reasoning that go beyond traditional language-based thinking, these advances raise fascinating questions about the future relationship between human and artificial intelligence, and whether machines might someday outpace human cognitive capabilities in unexpected ways.
Links to all the papers we discussed: Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation; ProcessBench: Identifying Process Errors in Mathematical Reasoning; Training Large Language Models to Reason in a Continuous Latent Space
11min
December 09, 2024 AI Models Break New Ground, Human Feedback Shapes Video Generation, and Open-Source Projects Challenge Tech Giants
Today's tech landscape sees a dramatic shift as artificial intelligence reaches new milestones in understanding and creating content, with open-source projects increasingly rivaling commercial giants. At the heart of these developments is a growing focus on human preferences and feedback, suggesting a future where AI systems become more attuned to human needs while remaining accessible to the broader research community. Links to all the papers we discussed: Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling; LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment; MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
11min
August 21, 2024 Improving Agent Design, JPEG-LM's Visual Breakthrough, TurboEdit's Real-Time Image Edits, Video Segmentation Advances, LLMs Learning Like Humans, RL Benchmarks
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
JPEG-LM: LLMs as Image Generators with Canonical Codec Representations
Automated Design of Agentic Systems
TurboEdit: Instant text-based image editing
Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning
Fine-tuning Large Language Models with Human-inspired Learning Strategies in Medical Question Answering
D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning
17min
August 16, 2024 Science & Clinical LLMs Leaps, Enhancing Small Model Reasoning, New Frontiers in Controlled Media Generation
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
Med42-v2: A Suite of Clinical LLMs
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
ControlNeXt: Powerful and Efficient Control for Image and Video Generation
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
FruitNeRF: A Unified Neural Radiance Field based Fruit Counting Framework
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
15min
August 08, 2024 Multimodal Benchmarks, Visual Task Transfer, and 3D Object Generation
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
LLaVA-OneVision: Easy Visual Task Transfer
An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion
MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine
IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Diffusion Models as Data Mining Tools
15min
August 05, 2024 Image and Video Segmentation with SAM 2, Gemma 2 for Efficient Language Models, Boosting Small Models with Contrastive Fine-Tuning, and MM-Vet v2 Challenges Large Multimodal Models
SAM 2: Segment Anything in Images and Videos
Gemma 2: Improving Open Language Models at a Practical Size
Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model
Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning
OmniParser for Pure Vision Based GUI Agent
SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities
14min
July 30, 2024 Text-Guided Image Inpainting, AMEX for Mobile GUI Agents, AgentScope's Multi-Agent Simulation
Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model
LAMBDA: A Large Model Based Data Agent
AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents
BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation
Very Large-Scale Multi-Agent Simulation in AgentScope
Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?
Course-Correction: Safety Alignment Using Synthetic Preferences
15min
