
ArXiv Computer Vision research for Monday, June 03, 2024.
00:20: Patch-Based Encoder-Decoder Architecture for Automatic Transmitted Light to Fluorescence Imaging Transition: Contribution to the LightMyCells Challenge
01:26: UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation
03:18: S-CycleGAN: Semantic Segmentation Enhanced CT-Ultrasound Image-to-Image Translation for Robotic Ultrasonography
04:26: AFF-ttention! Affordances and Attention models for Short-Term Object Interaction Anticipation
06:04: 3D WholeBody Pose Estimation based on Semantic Graph Attention Network and Distance Information
07:21: Scaling Up Deep Clustering Methods Beyond ImageNet-1K
08:36: GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer
10:11: Augmented Commonsense Knowledge for Remote Object Grounding
11:47: FreeTumor: Advance Tumor Segmentation via Large-Scale Tumor Synthesis
13:44: fruit-SALAD: A Style Aligned Artwork Dataset to reveal similarity perception in image embeddings
15:06: Capsule Enhanced Variational AutoEncoder for Underwater Image Reconstruction
16:51: Enhancing Dynamic CT Image Reconstruction with Neural Fields Through Explicit Motion Regularizers
18:01: pOps: Photo-Inspired Diffusion Operators
19:25: Pulmonary Embolism Mortality Prediction Using Multimodal Learning Based on Computed Tomography Angiography and Clinical Data
21:52: Compute-Efficient Medical Image Classification with Softmax-Free Transformers and Sequence Normalization
23:21: Scale-Free Image Keypoints Using Differentiable Persistent Homology
24:14: Enhancing Inertial Hand based HAR through Joint Representation of Language, Pose and Synthetic IMUs
25:32: TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy
27:29: HHMR: Holistic Hand Mesh Recovery by Enhancing the Multimodal Controllability of Graph Diffusion Models
29:13: ARCH2S: Dataset, Benchmark and Challenges for Learning Exterior Architectural Structures from Point Clouds
30:04: Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation
31:46: Differentially Private Fine-Tuning of Diffusion Models
33:40: MP-PolarMask: A Faster and Finer Instance Segmentation for Concave Images
34:57: From Feature Visualization to Visual Circuits: Effect of Adversarial Model Manipulation
36:12: Convolutional Unscented Kalman Filter for Multi-Object Tracking with Outliers
37:41: AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation
39:41: TE-NeXt: A LiDAR-Based 3D Sparse Convolutional Network for Traversability Estimation