
ArXiv Computer Vision research for Monday, June 10, 2024.
00:20: ReCon1M: A Large-scale Benchmark Dataset for Relation Comprehension in Remote Sensing Imagery
01:59: Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset
03:44: Vript: A Video Is Worth Thousands of Words
05:38: FRAG: Frequency Adapting Group for Diffusion Video Editing
06:50: Synthesizing Efficient Data with Diffusion Models for Person Re-Identification Pre-Training
08:38: Robust Latent Representation Tuning for Image-text Classification
09:46: Generalizable Human Gaussians from Single-View Image
11:05: ProcessPainter: Learn Painting Process from Sequence Data
12:29: PointABM: Integrating Bidirectional State Space Model with Multi-Head Self-Attention for Point Cloud Analysis
13:41: Adapting Pretrained ViTs with Convolution Injector for Visuo-Motor Control
15:00: Latent Representation Matters: Human-like Sketches in One-shot Drawing Tasks
16:14: GAIA: Rethinking Action Quality Assessment for AI-Generated Videos
17:54: Texture Re-scalable Universal Adversarial Perturbation
19:44: W-Net: One-Shot Arbitrary-Style Chinese Character Generation with Deep Neural Networks
20:46: ExtraNeRF: Visibility-Aware View Extrapolation of Neural Radiance Fields with Diffusion Models
22:04: DiffInject: Revisiting Debias via Synthetic Data Generation using Diffusion-based Style Injection
23:13: A Comparative Survey of Vision Transformers for Feature Extraction in Texture Analysis
25:15: Extending Segment Anything Model into Auditory and Temporal Dimensions for Audio-Visual Segmentation
26:36: Generalized Nested Latent Variable Models for Lossy Coding applied to Wind Turbine Scenarios
27:48: Black carbon plumes from gas flaring in North Africa identified from multi-spectral imagery with deep learning
28:58: An Effective-Efficient Approach for Dense Multi-Label Action Detection
30:42: 2DP-2MRC: 2-Dimensional Pointer-based Machine Reading Comprehension Method for Multimodal Moment Retrieval
31:49: iMotion-LLM: Motion Prediction Instruction Tuning
33:05: Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis
34:57: Data Augmentation in Earth Observation: A Diffusion Model Approach
36:22: UEMM-Air: A Synthetic Multi-modal Dataset for Unmanned Aerial Vehicle Object Detection
37:49: UnSupDLA: Towards Unsupervised Document Layout Analysis
39:11: I-MPN: Inductive Message Passing Network for Effective and Efficient Human-in-the-Loop Annotation of Mobile Eye Tracking Data
40:46: Tuning-Free Visual Customization via View Iterative Self-Attention Control