June 11, 2024

Ep. 259 - June 9, 2024

37 minutes

ArXiv NLP research for Sunday, June 09, 2024.

00:19: How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States

01:40: DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation

03:25: Do LLMs Exhibit Human-Like Reasoning? Evaluating Theory of Mind in LLMs for Open-Ended Responses

05:08: MS-HuBERT: Mitigating Pre-training and Inference Mismatch in Masked Language Modelling methods for learning Speech Representations

06:17: SinkLoRA: Enhanced Efficiency and Chat Capabilities for Long-Context Large Language Models

08:11: Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions

09:54: MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation

11:20: QGEval: A Benchmark for Question Generation Evaluation

12:44: MrRank: Improving Question Answering Retrieval System through Multi-Result Ranking Model

13:43: Arabic Diacritics in the Wild: Exploiting Opportunities for Improved Diacritization

14:46: The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

16:30: RE-RAG: Improving Open-Domain QA Performance and Interpretability with Relevance Estimator in Retrieval-Augmented Generation

18:14: Hidden Holes: topological aspects of language models

19:46: Do Prompts Really Prompt? Exploring the Prompt Understanding Capability of Whisper

20:40: Seventeenth-Century Spanish American Notary Records for Fine-Tuning Spanish Large Language Models

22:02: MedREQAL: Examining Medical Knowledge Recall of Large Language Models via Question Answering

23:12: II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

25:17: Zero-Shot End-To-End Spoken Question Answering In Medical Domain

26:27: Are Large Language Models Actually Good at Text Style Transfer?

27:32: Feriji: A French-Zarma Parallel Corpus, Glossary & Translator

28:56: TTM-RE: Memory-Augmented Document-Level Relation Extraction

30:12: Why Don't Prompt-Based Fairness Metrics Correlate?

31:27: Hello Again! LLM-powered Personalized Agent for Long-term Dialogue

33:12: Semisupervised Neural Proto-Language Reconstruction

34:12: Prompting Large Language Models with Audio for General-Purpose Speech Summarization

35:14: A Dual-View Approach to Classifying Radiology Reports by Co-Training

36:07: ThaiCoref: Thai Coreference Resolution Dataset

By Brad Edwards