
ArXiv Computer Vision research for Monday, June 03, 2024.
00:20: Mixture of Rationale: Multi-Modal Reasoning Mixture for Visual Question Answering
01:35: An expert-driven data generation pipeline for histological images
02:26: Sensitivity-Informed Augmentation for Robust Segmentation
04:10: EAGLE: Efficient Adaptive Geometry-based Learning in Cross-view Understanding
05:44: ED-SAM: An Efficient Diffusion Sampling Approach to Domain Generalization in Vision-Language Foundation Models
07:12: SLANT: Spurious Logo ANalysis Toolkit
09:09: SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation
10:48: Automatic Fused Multimodal Deep Learning for Plant Identification
12:19: MLIP: Efficient Multi-Perspective Language-Image Pretraining with Exhaustive Data Utilization
13:53: DreamPhysics: Learning Physical Properties of Dynamic 3D Gaussians with Video Diffusion Priors
15:27: Towards Automating the Retrospective Generation of BIM Models: A Unified Framework for 3D Semantic Reconstruction of the Built Environment
16:07: Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos
17:42: DA-HFNet: Progressive Fine-Grained Forgery Image Detection and Localization Based on Dual Attention
18:58: Learning Temporally Consistent Video Depth from Video Diffusion Priors
20:42: Robust Classification by Coupling Data Mollification with Label Smoothing
21:27: ELSA: Evaluating Localization of Social Activities in Urban Streets
23:17: Towards Flexible Interactive Reflection Removal with Human Guidance
25:08: Prototypical Transformer as Unified Motion Learners
26:18: Long and Short Guidance in Score identity Distillation for One-Step Text-to-Image Generation
27:52: Tetrahedron Splatting for 3D Generation
29:40: Decomposing and Interpreting Image Representations via Text in ViTs Beyond CLIP
31:03: SpatialRGPT: Grounded Spatial Reasoning in Vision Language Model
32:44: DeNVeR: Deformable Neural Vessel Representations for Unsupervised Video Vessel Segmentation
33:53: Text-guided Controllable Mesh Refinement for Interactive 3D Modeling
35:01: Reconstructing and Simulating Dynamic 3D Objects with Mesh-adsorbed Gaussian Splatting
36:19: DiffUHaul: A Training-Free Method for Object Dragging in Images
37:58: MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild