
ArXiv NLP research for Sunday, June 09, 2024.
00:19: How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States
01:40: DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation
03:25: Do LLMs Exhibit Human-Like Reasoning? Evaluating Theory of Mind in LLMs for Open-Ended Responses
05:08: MS-HuBERT: Mitigating Pre-training and Inference Mismatch in Masked Language Modelling methods for learning Speech Representations
06:17: SinkLoRA: Enhanced Efficiency and Chat Capabilities for Long-Context Large Language Models
08:11: Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions
09:54: MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation
11:20: QGEval: A Benchmark for Question Generation Evaluation
12:44: MrRank: Improving Question Answering Retrieval System through Multi-Result Ranking Model
13:43: Arabic Diacritics in the Wild: Exploiting Opportunities for Improved Diacritization
14:46: The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
16:30: RE-RAG: Improving Open-Domain QA Performance and Interpretability with Relevance Estimator in Retrieval-Augmented Generation
18:14: Hidden Holes: topological aspects of language models
19:46: Do Prompts Really Prompt? Exploring the Prompt Understanding Capability of Whisper
20:40: Seventeenth-Century Spanish American Notary Records for Fine-Tuning Spanish Large Language Models
22:02: MedREQAL: Examining Medical Knowledge Recall of Large Language Models via Question Answering
23:12: II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models
25:17: Zero-Shot End-To-End Spoken Question Answering In Medical Domain
26:27: Are Large Language Models Actually Good at Text Style Transfer?
27:32: Feriji: A French-Zarma Parallel Corpus, Glossary & Translator
28:56: TTM-RE: Memory-Augmented Document-Level Relation Extraction
30:12: Why Don't Prompt-Based Fairness Metrics Correlate?
31:27: Hello Again! LLM-powered Personalized Agent for Long-term Dialogue
33:12: Semisupervised Neural Proto-Language Reconstruction
34:12: Prompting Large Language Models with Audio for General-Purpose Speech Summarization
35:14: A Dual-View Approach to Classifying Radiology Reports by Co-Training
36:07: ThaiCoref: Thai Coreference Resolution Dataset