
ArXiv Computer Vision research for Sunday, June 09, 2024.
00:20: ControlLoc: Physical-World Hijacking Attack on Visual Perception in Autonomous Driving
02:23: Unified Text-to-Image Generation and Retrieval
03:51: F-LMM: Grounding Frozen Large Multimodal Models
05:34: Multi-Stain Multi-Level Convolutional Network for Multi-Tissue Breast Cancer Image Segmentation
07:43: BOSC: A toolbox for aerial imagery mapping
08:27: Mamba YOLO: SSMs-Based YOLO For Object Detection
10:12: Solution for CVPR 2024 UG2+ Challenge Track on All Weather Semantic Segmentation
11:02: Scaling Graph Convolutions for Mobile Vision
12:59: RefGaussian: Disentangling Reflections from 3D Gaussian Splatting for Realistic Rendering
14:28: Self-supervised Adversarial Training of Monocular Depth Estimation against Physical-World Attacks
15:45: Procrastination Is All You Need: Exponent Indexed Accumulators for Floating Point, Posits and Logarithmic Numbers
16:40: OmniControlNet: Dual-stage Integration for Conditional Image Generation
17:51: GCtx-UNet: Efficient Network for Medical Image Segmentation
19:14: InfoGaussian: Structure-Aware Dynamic Gaussians through Lightweight Information Shaping
20:40: BD-SAT: High-resolution Land Use Land Cover Dataset & Benchmark Results for Developing Division: Dhaka, BD
22:19: Bits-to-Photon: End-to-End Learned Scalable Point Cloud Compression for Direct Rendering
23:28: MeanSparse: Post-Training Robustness Enhancement Through Mean-Centered Feature Sparsification
24:38: Solution for SMART-101 Challenge of CVPR Multi-modal Algorithmic Reasoning Task 2024
26:12: CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark
29:32: Inter-slice Super-resolution of Magnetic Resonance Images by Pre-training and Self-supervised Fine-tuning
31:04: Causality-inspired Latent Feature Augmentation for Single Domain Generalization
32:41: MHS-VM: Multi-Head Scanning in Parallel Subspaces for Vision Mamba
34:13: FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model