arXiv Computer Vision research for Saturday, June 01, 2024.
00:20: Complex Style Image Transformations for Domain Generalization in Medical Images
01:36: HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model
03:03: From Seedling to Harvest: The GrowingSoy Dataset for Weed Detection in Soy Crops via Instance Segmentation
04:30: Quality Sentinel: Estimating Label Quality and Errors in Medical Segmentation Datasets
06:22: Whole Heart 3D+T Representation Learning Through Sparse 2D Cardiac MR Images
08:08: Image Captioning via Dynamic Path Customization
09:29: DSCA: A Digital Subtraction Angiography Sequence Dataset and Spatio-Temporal Model for Cerebral Artery Segmentation
11:29: DeCoOp: Robust Prompt Tuning with Out-of-Distribution Detection
12:55: Details Enhancement in Unsigned Distance Field Learning for High-fidelity 3D Surface Reconstruction
14:21: E³-Net: Efficient E(3)-Equivariant Normal Estimation Network
15:30: An Effective Weight Initialization Method for Deep Learning: Application to Satellite Image Classification
16:52: SynthBA: Reliable Brain Age Estimation Across Multiple MRI Sequences and Resolutions
18:40: SpikeMM: Flexi-Magnification of High-Speed Micro-Motions
20:21: CapeX: Category-Agnostic Pose Estimation from Textual Point Explanation
21:51: DS@BioMed at ImageCLEFmedical Caption 2024: Enhanced Attention Mechanisms in Medical Caption Generation through Concept Detection Integration
23:17: Arabic Handwritten Text for Person Biometric Identification: A Deep Learning Approach
24:32: Multimodal Metadata Assignment for Cultural Heritage Artifacts
25:33: You Only Need Less Attention at Each Stage in Vision Transformers
27:02: Towards Generalizable Multi-Object Tracking
28:16: Localize, Understand, Collaborate: Semantic-Aware Dragging via Intention Reasoner
29:45: MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos
30:52: Topo4D: Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture
32:28: GLCAN: Global-Local Collaborative Auxiliary Network for Local Learning
33:41: DroneVis: Versatile Computer Vision Library for Drones
34:28: Bilateral Guided Radiance Field Processing
36:11: Dual Hyperspectral Mamba for Efficient Spectral Compressive Imaging
37:59: The Curious Case of End Token: A Zero-Shot Disentangled Image Editing using CLIP
38:53: Pedestrian intention prediction in Adverse Weather Conditions with Spiking Neural Networks and Dynamic Vision Sensors
40:28: Adapting Fine-Grained Cross-View Localization to Areas without Fine Ground Truth
41:59: End-to-End Model-based Deep Learning for Dual-Energy Computed Tomography Material Decomposition
43:00: AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning
44:38: Effectiveness of Vision Language Models for Open-world Single Image Test Time Adaptation
45:54: Research on the Application of Computer Vision Based on Deep Learning in Autonomous Driving Technology
47:12: SAM-VMNet: Deep Neural Networks For Coronary Angiography Vessel Segmentation
48:58: Audio-Visual Talker Localization in Video for Spatial Sound Reproduction
50:27: 2nd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation
51:36: Diffusion-based Image Generation for In-distribution Data Augmentation in Surface Defect Detection
53:22: Improving Text Generation on Images with Synthetic Captions
54:24: FlowIE: Efficient Image Enhancement via Rectified Flow
56:00: Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection
57:29: On the use of first and second derivative approximations for biometric online signature recognition
58:07: Memory-guided Network with Uncertainty-based Feature Augmentation for Few-shot Semantic Segmentation