
ArXiv Computer Vision research for Wednesday, June 05, 2024.
00:20: A Self-Supervised Denoising Strategy for Underwater Acoustic Camera Imageries
01:26: Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models
02:40: U-KAN Makes Strong Backbone for Medical Image Segmentation and Generation
04:14: Exploring Data Efficiency in Zero-Shot Learning with Diffusion Models
06:09: P2PFormer: A Primitive-to-polygon Method for Regular Building Contour Extraction from Remote Sensing Images
08:03: Radiomics-guided Multimodal Self-attention Network for Predicting Pathological Complete Response in Breast MRI
09:08: AVFF: Audio-Visual Feature Fusion for Video Deepfake Detection
10:49: Understanding the Impact of Negative Prompts: When and How Do They Take Effect?
12:03: Adversarial Generation of Hierarchical Gaussians for 3D Generative Model
13:45: Event3DGS: Event-based 3D Gaussian Splatting for Fast Egomotion
15:19: DA-Flow: Dual Attention Normalizing Flow for Skeleton-based Video Anomaly Detection
16:56: Sparse Color-Code Net: Real-Time RGB-Based 6D Object Pose Estimation on Edge Devices
18:07: Self-Supervised Skeleton Action Representation Learning: A Benchmark and Beyond
20:01: Enhancing Multimodal Large Language Models with Multi-instance Visual Prompt Generator for Visual Representation Enrichment
21:05: Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification
22:52: A Human-Annotated Video Dataset for Training and Evaluation of 360-Degree Video Summarization Methods
23:53: EdgeSync: Faster Edge-model Updating via Adaptive Continuous Learning for Video Data Drift
25:13: Phy-Diff: Physics-guided Hourglass Diffusion Model for Diffusion MRI Synthesis
26:34: DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences
27:58: DifAttack++: Query-Efficient Black-Box Adversarial Attack via Hierarchical Disentangled Feature Space in Cross Domain
29:35: Puzzle Pieces Picker: Deciphering Ancient Chinese Characters with Radical Reconstruction
31:02: Instructing Prompt-to-Prompt Generation for Zero-Shot Learning
32:31: Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation for Stable Pose Control
34:20: Giving each task what it needs -- leveraging structured sparsity for tailored multi-task learning