June 11, 2024

Ep. 242 - June 8, 2024

36 minutes

arXiv Computer Vision research for Saturday, June 8, 2024.

00:20: Blurry-Consistency Segmentation Framework with Selective Stacking on Differential Interference Contrast 3D Breast Cancer Spheroid

01:31: 1st Place Winner of the 2024 Pixel-level Video Understanding in the Wild (CVPR'24 PVUW) Challenge in Video Panoptic Segmentation and Best Long Video Consistency of Video Semantic Segmentation

03:01: Metric Convolutions: A Unifying Theory to Adaptive Convolutions

04:13: Layered Image Vectorization via Semantic Simplification

05:18: Select-Mosaic: Data Augmentation Method for Dense Small Object Scenes

06:31: 3D MRI Synthesis with Slice-Based Latent Diffusion Models: Improving Tumor Segmentation Tasks in Data-Scarce Regimes

07:51: Regularized Training with Generated Datasets for Name-Only Transfer of Vision-Language Models

09:42: Unsupervised learning of Data-driven Facial Expression Coding System (DFECS) using keypoint tracking

11:36: HDRT: Infrared Capture for HDR Imaging

13:14: Attri-Net: A Globally and Locally Inherently Interpretable Model for Multi-Label Classification Using Class-Specific Counterfactuals

14:49: Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis

16:18: Training-Free Robust Interactive Video Object Segmentation

17:49: One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models

19:50: A Two-Stage Adverse Weather Semantic Segmentation Method for WeatherProof Challenge CVPR 2024 Workshop UG2+

21:04: PAPR in Motion: Seamless Point-level 3D Scene Interpolation

22:25: VP-LLM: Text-Driven 3D Volume Completion with Large Language Models through Patchification

23:38: Medical Vision Generalist: Unifying Medical Imaging Tasks in Context

25:24: Aligning Human Knowledge with Visual Concepts Towards Explainable Medical Image Classification

26:50: Understanding Inhibition Through Maximally Tense Images

27:52: Can Prompt Modifiers Control Bias? A Comparative Analysis of Text-to-Image Generative Models

29:19: Deep Learning to Predict Glaucoma Progression using Structural Changes in the Eye

30:58: Which Backbone to Use: A Resource-efficient Domain Specific Comparison for Computer Vision

32:32: BEAT: Bi-directional One-to-Many Embedding Alignment for Text-based Person Retrieval

34:11: Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language

35:35: Ctrl-V: Higher Fidelity Video Generation with Bounding-Box Controlled Object Motion

By Brad Edwards