
ArXiv Computer Vision research for Friday, June 07, 2024.
00:21: RU-AI: A Large Multimodal Dataset for Machine Generated Content Detection
01:52: AGBD: A Global-scale Biomass Dataset
03:30: MA-AVT: Modality Alignment for Parameter-Efficient Audio-Visual Transformers
04:52: Faster Than Lies: Real-time Deepfake Detection using Binary Neural Networks
06:03: Leveraging Activations for Superpixel Explanations
07:02: Joint Spatial-Temporal Modeling and Contrastive Learning for Self-supervised Heart Rate Measurement
08:28: Nacala-Roof-Material: Drone Imagery for Roof Detection, Classification, and Segmentation to Support Mosquito-borne Disease Risk Assessment
10:10: Multi-style Neural Radiance Field with AdaIN
10:52: Multiplane Prior Guided Few-Shot Aerial Scene Rendering
12:15: Semantic Segmentation on VSPW Dataset through Masked Video Consistency
13:24: CityCraft: A Real Crafter for 3D City Generation
15:21: ProMotion: Prototypes As Motion Learners
16:57: AttnDreamBooth: Towards Text-Aligned Personalized Text-to-Image Generation
18:00: Clarifying Myths About the Relationship Between Shape Bias, Accuracy, and Robustness
19:50: GANetic Loss for Generative Adversarial Networks with a Focus on Medical Applications
21:35: Efficient 3D Shape Generation via Diffusion Mamba with Bidirectional SSMs
23:28: Bootstrapping Referring Multi-Object Tracking
24:50: Prototype Correlation Matching and Class-Relation Reasoning for Few-Shot Medical Image Segmentation
26:48: GenHeld: Generating and Editing Handheld Objects
27:57: Classification Metrics for Image Explanations: Towards Building Reliable XAI-Evaluations
29:11: Hibou: A Family of Foundational Vision Transformers for Pathology
30:41: Diving Deep into the Motion Representation of Video-Text Models
31:46: CoNo: Consistency Noise Injection for Tuning-free Long Video Diffusion
33:18: A Novel Time Series-to-Image Encoding Approach for Weather Phenomena Classification
34:48: LLavaGuard: VLM-based Safeguards for Vision Dataset Curation and Safety Assessment
36:06: Contextual fusion enhances robustness to image blurring
37:01: Energy Propagation in Scattering Convolution Networks Can Be Arbitrarily Slow
38:12: Towards Semantic Equivalence of Tokenization in Multimodal LLM
39:33: PatchSVD: A Non-uniform SVD-based Image Compression Algorithm
40:29: DVOS: Self-Supervised Dense-Pattern Video Object Segmentation
42:16: 3D-GRAND: Towards Better Grounding and Less Hallucination for 3D-LLMs