June 04, 2024

Ep. 236 - June 2, 2024

1 hour 15 minutes

ArXiv Computer Vision research for Sunday, June 02, 2024.

00:20: Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering

02:12: SAM-LAD: Segment Anything Model Meets Zero-Shot Logic Anomaly Detection

04:12: Correlation Matching Transformation Transformers for UHD Image Restoration

06:06: MGI: Multimodal Contrastive pre-training of Genomic and Medical Imaging

07:23: Diff-Mosaic: Augmenting Realistic Representations in Infrared Small Target Detection via Diffusion Prior

09:19: T2LM: Long-Term 3D Human Motion Generation from Multiple Sentences

11:00: Representing Animatable Avatar via Factorized Neural Fields

12:24: An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition

14:01: Ultrasound Report Generation with Cross-Modality Feature Alignment via Unsupervised Guidance

15:18: SimSAM: Zero-shot Medical Image Segmentation via Simulated Interaction

16:57: Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation

18:16: Task-oriented Embedding Counts: Heuristic Clustering-driven Feature Fine-tuning for Whole Slide Image Classification

19:49: W-Net: A Facial Feature-Guided Face Super-Resolution Network

21:25: Exploiting Frequency Correlation for Hyperspectral Image Reconstruction

22:46: Deciphering Oracle Bone Language with Diffusion Models

24:07: Improving Accuracy-robustness Trade-off via Pixel Reweighted Adversarial Training

25:29: Lay-A-Scene: Personalized 3D Object Arrangement Using Text-to-Image Priors

26:35: Bilinear-Convolutional Neural Network Using a Matrix Similarity-based Joint Loss Function for Skin Disease Classification

27:54: Towards General Robustness Verification of MaxPool-based Convolutional Neural Networks via Tightening Linear Approximation

29:22: An Optimized Toolbox for Advanced Image Processing with Tsetlin Machine Composites

30:47: A Survey of Deep Learning Based Radar and Vision Fusion for 3D Object Detection in Autonomous Driving

32:07: Explore Internal and External Similarity for Single Image Deraining with Graph Neural Networks

33:48: CCF: Cross Correcting Framework for Pedestrian Trajectory Prediction

35:38: Freeplane: Unlocking Free Lunch in Triplane-Based Sparse-View Reconstruction Models

37:09: Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaption

38:53: Unsupervised Contrastive Analysis for Salient Pattern Detection using Conditional Diffusion Models

40:42: Diffusion Features to Bridge Domain Gap for Semantic Segmentation

42:26: AI-Face: A Million-Scale Demographically Annotated AI-Generated Face Dataset and Fairness Benchmark

43:46: Towards Point Cloud Compression for Machine Perception: A Simple and Strong Baseline by Learning the Octree Depth Level Predictor

45:19: PruNeRF: Segment-Centric Dataset Pruning via 3D Spatial Consistency

46:36: EchoNet-Synthetic: Privacy-preserving Video Generation for Safe Medical Data Sharing

48:07: Stealing Image-to-Image Translation Models With a Single Query

49:21: Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection

51:08: Eating Smart: Advancing Health Informatics with the Grounding DINO based Dietary Assistant App

52:21: DistilDIRE: A Small, Fast, Cheap and Lightweight Diffusion Synthesized Deepfake Detection

53:40: Streaming quanta sensors for online, high-performance imaging and vision

55:14: OLIVE: Object Level In-Context Visual Embeddings

56:31: Visual place recognition for aerial imagery: A survey

57:54: Global High Categorical Resolution Land Cover Mapping via Weak Supervision

59:49: DDA: Dimensionality Driven Augmentation Search

By Brad Edwards