June 04, 2024

Ep. 237 - Part 3 - June 3, 2024

39 minutes

ArXiv Computer Vision research for Monday, June 03, 2024.

00:20: Mixture of Rationale: Multi-Modal Reasoning Mixture for Visual Question Answering

01:35: An expert-driven data generation pipeline for histological images

02:26: Sensitivity-Informed Augmentation for Robust Segmentation

04:10: EAGLE: Efficient Adaptive Geometry-based Learning in Cross-view Understanding

05:44: ED-SAM: An Efficient Diffusion Sampling Approach to Domain Generalization in Vision-Language Foundation Models

07:12: SLANT: Spurious Logo ANalysis Toolkit

09:09: SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation

10:48: Automatic Fused Multimodal Deep Learning for Plant Identification

12:19: MLIP: Efficient Multi-Perspective Language-Image Pretraining with Exhaustive Data Utilization

13:53: DreamPhysics: Learning Physical Properties of Dynamic 3D Gaussians with Video Diffusion Priors

15:27: Towards Automating the Retrospective Generation of BIM Models: A Unified Framework for 3D Semantic Reconstruction of the Built Environment

16:07: Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos

17:42: DA-HFNet: Progressive Fine-Grained Forgery Image Detection and Localization Based on Dual Attention

18:58: Learning Temporally Consistent Video Depth from Video Diffusion Priors

20:42: Robust Classification by Coupling Data Mollification with Label Smoothing

21:27: ELSA: Evaluating Localization of Social Activities in Urban Streets

23:17: Towards Flexible Interactive Reflection Removal with Human Guidance

25:08: Prototypical Transformer as Unified Motion Learners

26:18: Long and Short Guidance in Score identity Distillation for One-Step Text-to-Image Generation

27:52: Tetrahedron Splatting for 3D Generation

29:40: Decomposing and Interpreting Image Representations via Text in ViTs Beyond CLIP

31:03: SpatialRGPT: Grounded Spatial Reasoning in Vision Language Model

32:44: DeNVeR: Deformable Neural Vessel Representations for Unsupervised Video Vessel Segmentation

33:53: Text-guided Controllable Mesh Refinement for Interactive 3D Modeling

35:01: Reconstructing and Simulating Dynamic 3D Objects with Mesh-adsorbed Gaussian Splatting

36:19: DiffUHaul: A Training-Free Method for Object Dragging in Images

37:58: MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild

By Brad Edwards