June 07, 2024

Ep. 256 - Part 1 - June 6, 2024

35 minutes

ArXiv NLP research for Thursday, June 06, 2024.

00:20: Efficient Knowledge Infusion via KG-LLM Alignment

01:25: NAP^2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human

02:34: Character-Level Chinese Dependency Parsing via Modeling Latent Intra-Word Structure

03:30: XL-HeadTags: Leveraging Multimodal Retrieval Augmentation for the Multilingual Generation of News Headlines and Tags

04:59: End-to-End Trainable Soft Retriever for Low-resource Relation Extraction

06:07: Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning

07:37: Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores

08:52: ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search

10:29: Chaos with Keywords: Exposing Large Language Models Sycophancy to Misleading Keywords and Evaluating Defense Strategies

11:39: Lean Workbook: A large-scale Lean problem set formalized from natural language math problems

12:56: Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism

14:18: Performance of large language models in numerical vs. semantic medical knowledge: Benchmarking on evidence-based Q&As

16:24: Recovering document annotations for sentence-level bitext

17:40: BLSP-Emo: Towards Empathetic Large Speech-Language Models

19:01: Decoder-only Streaming Transformer for Simultaneous Translation

20:28: Evaluating the IWSLT2023 Speech Translation Tasks: Human Annotations, Automatic Metrics, and Segmentation

21:53: Spontaneous Speech-Based Suicide Risk Detection Using Whisper and Large Language Models

23:06: How Good is Zero-Shot MT Evaluation for Low Resource Indian Languages?

24:13: HeSum: a Novel Dataset for Abstractive Text Summarization in Hebrew

25:19: ArMeme: Propagandistic Content in Arabic Memes

26:26: Culturally Aware and Adapted NLP: A Taxonomy and a Survey of the State of the Art

27:11: UltraMedical: Building Specialized Generalists in Biomedicine

28:43: Tox-BART: Leveraging Toxicity Attributes for Explanation Generation of Implicit Hate Speech

30:02: A + B: A General Generator-Reader Framework for Optimizing LLMs to Unleash Synergy Potential

31:29: On The Persona-based Summarization of Domain-Specific Documents

33:14: Assessing LLMs for Zero-shot Abstractive Summarization Through the Lens of Relevance Paraphrasing

34:28: American Sign Language Handshapes Reflect Pressures for Communicative Efficiency

By Brad Edwards