January 16, 2025

2025.01.16 | MMDocIR advances the standardization of multimodal retrieval; CityDreamer4D introduces a new 4D city generation model.

6 minutes

The 9 papers in this episode are:

[00:25] 📊 MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents

[01:06] 🏙 CityDreamer4D: Compositional Generative Model of Unbounded 4D Cities

[01:49] 🎥 RepVideo: Rethinking Cross-Layer Representation for Video Generation

[02:30] 📚 Towards Best Practices for Open Datasets for LLM Training

[03:11] 🎵 XMusic: Towards a Generalized and Controllable Symbolic Music Generation Framework

[03:46] 🔒 Trusted Machine Learning Models Unlock Private Inference for Problems Currently Infeasible with Cryptography

[04:23] 🔍 Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding

[05:03] 🎨 Multimodal LLMs Can Reason about Aesthetics in Zero-Shot

[05:39] 🎥 Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion

[Follow Us]

You can also find us on the following platform for more than just the podcast content:

Xiaohongshu: AI速递

HuggingFace 每日AI论文速递

By duan

Rating: 5 (22 ratings)
