June 10, 2024

Ep. 241 - Part 1 - June 7, 2024

47 minutes

ArXiv Computer Vision research for Friday, June 07, 2024.

00:20: Image Processing Based Forest Fire Detection

01:08: STAR: Skeleton-aware Text-based 4D Avatar Generation with In-Network Motion Retargeting

03:05: UVCPNet: A UAV-Vehicle Collaborative Perception Network for 3D Object Detection

04:47: UCDNet: Multi-UAV Collaborative 3D Object Detection Network by Reliable Feature Mapping

06:14: SMART: Scene-motion-aware human action recognition framework for mental disorder group

08:12: LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model

09:34: Evaluating and Mitigating IP Infringement in Visual Generative AI

11:01: MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models

12:20: OVMR: Open-Vocabulary Recognition with Multi-Modal References

13:57: ACE Metric: Advection and Convection Evaluation for Accurate Weather Forecasting

15:11: XctDiff: Reconstruction of CT Images with Consistent Anatomical Structures from a Single Radiographic Projection Image

16:22: MTS-Net: Dual-Enhanced Positional Multi-Head Self-Attention for 3D CT Diagnosis of May-Thurner Syndrome

17:58: CDeFuse: Continuous Decomposition for Infrared and Visible Image Fusion

19:41: MGIMM: Multi-Granularity Instruction Multimodal Model for Attribute-Guided Remote Sensing Image Detailed Description

21:24: PQPP: A Joint Benchmark for Text-to-Image Prompt and Query Performance Prediction

22:58: Interpretable Multimodal Out-of-context Detection with Soft Logic Regularization

24:24: SMC++: Masked Learning of Unsupervised Video Semantic Compression

26:19: Diffusion-based Generative Image Outpainting for Recovery of FOV-Truncated CT Images

27:09: MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks

28:35: Predictive Dynamic Fusion

29:43: Online Continual Learning of Video Diffusion Models From a Single Video Stream

30:40: A short review on graphonometric evaluation tools in children

31:49: Navigating Efficiency in MobileViT through Gaussian Process on Global Architecture Factors

33:04: EGOR: Efficient Generated Objects Replay for incremental object detection

34:37: 3rd Place Solution for MeViS Track in CVPR 2024 PVUW workshop: Motion Expression guided Video Segmentation

36:02: Multi-Granularity Language-Guided Multi-Object Tracking

37:56: Normal-guided Detail-Preserving Neural Implicit Functions for High-Fidelity 3D Surface Reconstruction

39:52: Ada-VE: Training-Free Consistent Video Editing Using Adaptive Motion Prior

41:48: 3DRealCar: An In-the-wild RGB-D Car Dataset with 360-degree Views

43:54: Seeing the Unseen: Visual Metaphor Captioning for Videos

45:09: Zero-Shot Video Editing through Adaptive Sliding Score Distillation

46:28: Labeled Data Selection for Category Discovery

By Brad Edwards