
ArXiv Computer Vision research for Monday, June 03, 2024.
00:20: Dragonfly: Multi-Resolution Zoom Supercharges Large Visual-Language Model
02:26: MultiEdits: Simultaneous Multi-Aspect Editing with Text-to-Image Diffusion Models
03:57: Uni-ISP: Unifying the Learning of ISPs from Multiple Cameras
05:36: CLIP-Guided Attribute Aware Pretraining for Generalizable Image Quality Assessment
07:01: Khayyam Offline Persian Handwriting Dataset
08:04: LLEMamba: Low-Light Enhancement via Relighting-Guided Mamba with Deep Unfolding Network
09:31: CYCLO: Cyclic Graph Transformer Approach to Multi-Object Relationship Modeling in Aerial Videos
11:00: Generalized Jersey Number Recognition Using Multi-task Learning With Orientation-guided Weight Refinement
12:41: Synthetic Data Generation for 3D Myocardium Deformation Analysis
14:00: Self-Calibrating 4D Novel View Synthesis from Monocular Videos Using Gaussian Splatting
15:51: Virtual avatar generation models as world navigators
16:26: VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model
18:04: SceneTextGen: Layout-Agnostic Scene Text Image Synthesis with Diffusion Models
19:32: DANCE: Dual-View Distribution Alignment for Dataset Condensation
21:10: UniQA: Unified Vision-Language Pre-training for Image Quality and Aesthetic Assessment
22:47: Visual Car Brand Classification by Implementing a Synthetic Image Dataset Creation Pipeline
23:50: Understanding the Cross-Domain Capabilities of Video-Based Few-Shot Action Recognition Models
25:42: Estimating Canopy Height at Scale
26:50: CUT: A Controllable, Universal, and Training-Free Visual Anomaly Generation Framework
27:59: Object Aware Egocentric Online Action Detection
29:06: BACON: Bayesian Optimal Condensation Framework for Dataset Distillation
30:49: Δ-DiT: A Training-Free Acceleration Method Tailored for Diffusion Transformers
32:29: Learning Adaptive Fusion Bank for Multi-modal Salient Object Detection
34:18: Towards Practical Single-shot Motion Synthesis
35:37: DeepUniUSTransformer: Towards A Universal UltraSound Model with Prompted Guidance
37:12: Dimba: Transformer-Mamba Diffusion Models
38:35: Zero-Shot Out-of-Distribution Detection with Outlier Label Exposure