
ArXiv Computer Vision research for Sunday, June 02, 2024.
00:20: Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering
02:12: SAM-LAD: Segment Anything Model Meets Zero-Shot Logic Anomaly Detection
04:12: Correlation Matching Transformation Transformers for UHD Image Restoration
06:06: MGI: Multimodal Contrastive pre-training of Genomic and Medical Imaging
07:23: Diff-Mosaic: Augmenting Realistic Representations in Infrared Small Target Detection via Diffusion Prior
09:19: T2LM: Long-Term 3D Human Motion Generation from Multiple Sentences
11:00: Representing Animatable Avatar via Factorized Neural Fields
12:24: An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition
14:01: Ultrasound Report Generation with Cross-Modality Feature Alignment via Unsupervised Guidance
15:18: SimSAM: Zero-shot Medical Image Segmentation via Simulated Interaction
16:57: Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation
18:16: Task-oriented Embedding Counts: Heuristic Clustering-driven Feature Fine-tuning for Whole Slide Image Classification
19:49: W-Net: A Facial Feature-Guided Face Super-Resolution Network
21:25: Exploiting Frequency Correlation for Hyperspectral Image Reconstruction
22:46: Deciphering Oracle Bone Language with Diffusion Models
24:07: Improving Accuracy-robustness Trade-off via Pixel Reweighted Adversarial Training
25:29: Lay-A-Scene: Personalized 3D Object Arrangement Using Text-to-Image Priors
26:35: Bilinear-Convolutional Neural Network Using a Matrix Similarity-based Joint Loss Function for Skin Disease Classification
27:54: Towards General Robustness Verification of MaxPool-based Convolutional Neural Networks via Tightening Linear Approximation
29:22: An Optimized Toolbox for Advanced Image Processing with Tsetlin Machine Composites
30:47: A Survey of Deep Learning Based Radar and Vision Fusion for 3D Object Detection in Autonomous Driving
32:07: Explore Internal and External Similarity for Single Image Deraining with Graph Neural Networks
33:48: CCF: Cross Correcting Framework for Pedestrian Trajectory Prediction
35:38: Freeplane: Unlocking Free Lunch in Triplane-Based Sparse-View Reconstruction Models
37:09: Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaption
38:53: Unsupervised Contrastive Analysis for Salient Pattern Detection using Conditional Diffusion Models
40:42: Diffusion Features to Bridge Domain Gap for Semantic Segmentation
42:26: AI-Face: A Million-Scale Demographically Annotated AI-Generated Face Dataset and Fairness Benchmark
43:46: Towards Point Cloud Compression for Machine Perception: A Simple and Strong Baseline by Learning the Octree Depth Level Predictor
45:19: PruNeRF: Segment-Centric Dataset Pruning via 3D Spatial Consistency
46:36: EchoNet-Synthetic: Privacy-preserving Video Generation for Safe Medical Data Sharing
48:07: Stealing Image-to-Image Translation Models With a Single Query
49:21: Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection
51:08: Eating Smart: Advancing Health Informatics with the Grounding DINO based Dietary Assistant App
52:21: DistilDIRE: A Small, Fast, Cheap and Lightweight Diffusion Synthesized Deepfake Detection
53:40: Streaming quanta sensors for online, high-performance imaging and vision
55:14: OLIVE: Object Level In-Context Visual Embeddings
56:31: Visual place recognition for aerial imagery: A survey
57:54: Global High Categorical Resolution Land Cover Mapping via Weak Supervision
59:49: DDA: Dimensionality Driven Augmentation Search