TechcraftingAI Computer Vision

Ep. 237 - Part 2 - June 3, 2024


ArXiv Computer Vision research for Monday, June 03, 2024.


00:20: Patch-Based Encoder-Decoder Architecture for Automatic Transmitted Light to Fluorescence Imaging Transition: Contribution to the LightMyCells Challenge

01:26: UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation

03:18: S-CycleGAN: Semantic Segmentation Enhanced CT-Ultrasound Image-to-Image Translation for Robotic Ultrasonography

04:26: AFF-ttention! Affordances and Attention models for Short-Term Object Interaction Anticipation

06:04: 3D WholeBody Pose Estimation based on Semantic Graph Attention Network and Distance Information

07:21: Scaling Up Deep Clustering Methods Beyond ImageNet-1K

08:36: GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer

10:11: Augmented Commonsense Knowledge for Remote Object Grounding

11:47: FreeTumor: Advance Tumor Segmentation via Large-Scale Tumor Synthesis

13:44: fruit-SALAD: A Style Aligned Artwork Dataset to reveal similarity perception in image embeddings

15:06: Capsule Enhanced Variational AutoEncoder for Underwater Image Reconstruction

16:51: Enhancing Dynamic CT Image Reconstruction with Neural Fields Through Explicit Motion Regularizers

18:01: pOps: Photo-Inspired Diffusion Operators

19:25: Pulmonary Embolism Mortality Prediction Using Multimodal Learning Based on Computed Tomography Angiography and Clinical Data

21:52: Compute-Efficient Medical Image Classification with Softmax-Free Transformers and Sequence Normalization

23:21: Scale-Free Image Keypoints Using Differentiable Persistent Homology

24:14: Enhancing Inertial Hand based HAR through Joint Representation of Language, Pose and Synthetic IMUs

25:32: TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy

27:29: HHMR: Holistic Hand Mesh Recovery by Enhancing the Multimodal Controllability of Graph Diffusion Models

29:13: ARCH2S: Dataset, Benchmark and Challenges for Learning Exterior Architectural Structures from Point Clouds

30:04: Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation

31:46: Differentially Private Fine-Tuning of Diffusion Models

33:40: MP-PolarMask: A Faster and Finer Instance Segmentation for Concave Images

34:57: From Feature Visualization to Visual Circuits: Effect of Adversarial Model Manipulation

36:12: Convolutional Unscented Kalman Filter for Multi-Object Tracking with Outliers

37:41: AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation

39:41: TE-NeXt: A LiDAR-Based 3D Sparse Convolutional Network for Traversability Estimation

TechcraftingAI Computer Vision, by Brad Edwards