June 06, 2024

Ep. 239 - Part 1 - June 5, 2024

36 minutes

ArXiv Computer Vision research for Wednesday, June 05, 2024.

00:20: A Self-Supervised Denoising Strategy for Underwater Acoustic Camera Imageries

01:26: Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models

02:40: U-KAN Makes Strong Backbone for Medical Image Segmentation and Generation

04:14: Exploring Data Efficiency in Zero-Shot Learning with Diffusion Models

06:09: P2PFormer: A Primitive-to-polygon Method for Regular Building Contour Extraction from Remote Sensing Images

08:03: Radiomics-guided Multimodal Self-attention Network for Predicting Pathological Complete Response in Breast MRI

09:08: AVFF: Audio-Visual Feature Fusion for Video Deepfake Detection

10:49: Understanding the Impact of Negative Prompts: When and How Do They Take Effect?

12:03: Adversarial Generation of Hierarchical Gaussians for 3D Generative Model

13:45: Event3DGS: Event-based 3D Gaussian Splatting for Fast Egomotion

15:19: DA-Flow: Dual Attention Normalizing Flow for Skeleton-based Video Anomaly Detection

16:56: Sparse Color-Code Net: Real-Time RGB-Based 6D Object Pose Estimation on Edge Devices

18:07: Self-Supervised Skeleton Action Representation Learning: A Benchmark and Beyond

20:01: Enhancing Multimodal Large Language Models with Multi-instance Visual Prompt Generator for Visual Representation Enrichment

21:05: Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification

22:52: A Human-Annotated Video Dataset for Training and Evaluation of 360-Degree Video Summarization Methods

23:53: EdgeSync: Faster Edge-model Updating via Adaptive Continuous Learning for Video Data Drift

25:13: Phy-Diff: Physics-guided Hourglass Diffusion Model for Diffusion MRI Synthesis

26:34: DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences

27:58: DifAttack++: Query-Efficient Black-Box Adversarial Attack via Hierarchical Disentangled Feature Space in Cross Domain

29:35: Puzzle Pieces Picker: Deciphering Ancient Chinese Characters with Radical Reconstruction

31:02: Instructing Prompt-to-Prompt Generation for Zero-Shot Learning

32:31: Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation for Stable Pose Control

34:20: Giving each task what it needs -- leveraging structured sparsity for tailored multi-task learning

By Brad Edwards