
The 20 papers in this episode:
[00:25] 🎙 Soundwave: Less is More for Speech-Text Alignment in LLMs
[01:05] 🔍 Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity
[01:48] 🌊 Continuous Diffusion Model for Language Modeling
[02:30] 🎥 Phantom: Subject-consistent video generation via cross-modal alignment
[03:12] 🧠 Rethinking Diverse Human Preference Learning through Principal Component Analysis
[04:00] 🤖 SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
[04:36] 🛡 SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models
[05:25] 🐍 Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation
[06:08] 📚 You Do Not Fully Utilize Transformer's Representation Capacity
[06:50] 🤖 Magma: A Foundation Model for Multimodal AI Agents
[07:23] 💹 FLAG-Trader: Fusion LLM-Agent with Gradient-based Reinforcement Learning for Financial Trading
[08:08] 📄 RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm
[08:49] 🧠 PAFT: Prompt-Agnostic Fine-Tuning
[09:27] 🛠 OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning
[10:13] 📊 Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?
[11:00] 🔄 MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
[11:37] 🩺 HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation
[12:12] 🧠 HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading
[12:51] 🌍 Text2World: Benchmarking Large Language Models for Symbolic World Model Generation
[13:32] 🧠 Atom of Thoughts for Markov LLM Test-Time Scaling
【Follow Us】
You can also find us on the following platforms for more content beyond the podcast:
Xiaohongshu: AI速递