TechcraftingAI Computer Vision

Ep. 246 - Part 3 - June 12, 2024



ArXiv Computer Vision research for Wednesday, June 12, 2024.


00:20: From a Social Cognitive Perspective: Context-aware Visual Social Relationship Recognition

02:09: APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation

03:57: 2.5D Multi-view Averaging Diffusion Model for 3D Medical Image Translation: Application to Low-count PET Reconstruction with CT-less Attenuation Correction

05:47: DDR: Exploiting Deep Degradation Response as Flexible Image Descriptor

06:58: Eyes Wide Unshut: Unsupervised Mistake Detection in Egocentric Video by Detecting Unpredictable Gaze

08:02: LaneCPP: Continuous 3D Lane Detection using Physical Priors

09:23: FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation

11:10: VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks

12:46: MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

14:39: OmniCorpus: An Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

16:49: AWGUNET: Attention-Aided Wavelet Guided U-Net for Nuclei Segmentation in Histopathology Images

18:15: Diffusion Soup: Model Merging for Text-to-Image Diffusion Models

19:58: Coherent Optical Modems for Full-Wavefield Lidar

21:32: Transformation-Dependent Adversarial Attacks

22:45: PixMamba: Leveraging State Space Models in a Dual-Level Architecture for Underwater Image Enhancement

24:10: GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices

25:57: ConceptHash: Interpretable Fine-Grained Hashing via Concept Discovery

27:26: Self-supervised Learning of Neural Implicit Feature Fields for Camera Pose Refinement

28:51: Real2Code: Reconstruct Articulated Objects via Code Generation

30:02: Human 3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models

31:42: RMem: Restricted Memory Banks Improve Video Object Segmentation

33:12: What If We Recaption Billions of Web Images with LLaMA-3?

34:42: Real3D: Scaling Up Large Reconstruction Models with Real-World Images

36:07: Enhancing End-to-End Autonomous Driving with Latent World Model

37:12: Words Worth a Thousand Pictures: Measuring and Understanding Perceptual Variability in Text-to-Image Generation

38:43: On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models

40:16: Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models

42:15: ICE-G: Image Conditional Editing of 3D Gaussian Splats


By Brad Edwards