This technical report introduces Mobile-MMLU, a new benchmark designed to evaluate large language models (LLMs) specifically for mobile devices, addressing the limitations of existing benchmarks which focus on desktop or server environments. Mobile-MMLU and its challenging subset, Mobile-MMLU-Pro, consist of thousands of multiple-choice questions across 80 mobile-relevant domains, emphasizing practical daily tasks and on-device AI constraints like efficiency and privacy. The creation process involved AI and human collaboration to generate and refine questions, ensuring relevance and mitigating biases. Evaluation results show that Mobile-MMLU effectively differentiates the performance of LLMs in mobile contexts, revealing that strong performance on traditional benchmarks doesn't guarantee success on mobile tasks.