
The 21 papers in this episode are as follows:
[00:22] 🎥 VideoRoPE: What Makes for Good Video Rotary Position Embedding?
[01:07] 🎥 Fast Video Generation with Sliding Tile Attention
[01:54] 🎥 Goku: Flow Based Video Generative Foundation Models
[02:35] 🌍 AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting
[03:19] 🔢 QuEST: Stable Training of LLMs with 1-Bit Weights and Activations
[03:57] 🛡 DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails
[04:40] 🧠 Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
[05:28] 🎯 Agency Is Frame-Dependent
[06:04] 🎥 FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation
[06:46] 📊 Linear Correlation in LM's Compositional Generalization and Hallucination
[07:32] 🧠 Generating Symbolic World Models via Test-time Scaling of Large Language Models
[08:09] 📱 On-device Sora: Enabling Diffusion-Based Text-to-Video Generation for Mobile Devices
[08:51] ⚡ CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference
[09:32] 🧩 Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More
[10:20] 🔄 Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models
[11:06] 🧠 CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance
[11:50] 🧩 No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces
[12:39] 🌓 YINYANG-ALIGN: Benchmarking Contradictory Objectives and Proposing Multi-Objective Optimization based DPO for Text-to-Image Alignment
[13:20] 🌐 QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation
[14:02] 🧠 ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning
[14:48] 🤖 MEETING DELEGATE: Benchmarking LLMs on Attending Meetings on Our Behalf
【Follow Us】
You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu: AI速递