June 05, 2024

Ep. 238 - Part 2 - June 4, 2024

44 minutes

ArXiv Computer Vision research for Tuesday, June 04, 2024.

00:20: FedDr+: Stabilizing Dot-regression with Global Feature Distillation for Federated Learning

02:06: EUFCC-340K: A Faceted Hierarchical Dataset for Metadata Annotation in GLAM Collections

03:14: Learning to Edit Visual Programs with Self-Supervision

04:15: Low-Rank Adaption on Transformer-based Oriented Object Detector for Satellite Onboard Processing of Remote Sensing Images

06:12: WE-GS: An In-the-wild Efficient 3D Gaussian Representation for Unconstrained Photo Collections

07:39: Decoupling of neural network calibration measures

08:48: IterMask2: Iterative Unsupervised Anomaly Segmentation via Spatial and Frequency Masking for Brain Lesions in MRI

10:29: CoNav: A Benchmark for Human-Centered Collaborative Navigation

12:05: Generative Active Learning for Long-tailed Instance Segmentation

13:17: RoomTex: Texturing Compositional Indoor Scenes via Iterative Inpainting

14:51: Learning Image Priors through Patch-based Diffusion Models for Solving Inverse Problems

16:23: DL-KDD: Dual-Light Knowledge Distillation for Action Recognition in the Dark

18:13: Inpainting Pathology in Lumbar Spine MRI with Latent Diffusion

19:59: Stable-Pose: Leveraging Transformers for Pose-Guided Text-to-Image Generation

21:43: GenS: Generalizable Neural Surface Reconstruction from Multi-View Images

23:20: An Open-Source Tool for Mapping War Destruction at Scale in Ukraine using Sentinel-1 Time Series

24:48: Guiding a Diffusion Model with a Bad Version of Itself

25:59: CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation

27:17: V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation

28:50: DDGS-CT: Direction-Disentangled Gaussian Splatting for Realistic Volume Rendering

30:17: SatSplatYOLO: 3D Gaussian Splatting-based Virtual Object Detection Ensembles for Satellite Feature Recognition

31:24: Enhancing 2D Representation Learning with a 3D Prior

32:32: Parrot: Multilingual Visual Instruction Tuning

34:32: ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation

36:20: Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting

38:20: Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning

39:41: Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation

41:52: Dreamguider: Improved Training free Diffusion-based Conditional Generation

43:10: VHS: High-Resolution Iterative Stereo Matching with Visual Hull Priors

By Brad Edwards