CD Voice

English News | Guideline on AI-Powered Construction of Chinese Language Databases



China is accelerating the digitalization of ancient texts and boosting access to oracle bone script data, aiming to integrate cultural heritage with digital Chinese, officials said on Monday.

中国正加速推进古籍数字化进程并扩大甲骨文数据开放,旨在将文化遗产保护与数字中文建设相结合。

The Ministry of Education, the National Language Commission and the Cyberspace Administration of China issued a guideline to promote the digitalization of the Chinese language and characters. The focus is on developing national language resources and large-scale Chinese language models to support artificial intelligence.

有关部门周一表示,教育部、国家语言文字工作委员会及中央网信办已联合发布《关于推进语言文字数字化的指导意见》,重点开发国家语言资源和大规模中文语言模型,为人工智能发展提供支持。

The guideline aims to establish a national corpus and strategic language resources information database by 2027. By 2035, the country hopes it will have significantly expanded the presence of the Chinese language in global digital and generative AI scenarios.

该指南提出,到2027年将建成国家语料库和战略语言资源信息库;至2035年,中文在全球数字化场景及生成式人工智能领域的应用影响力将显著提升。

Liu Peijun, head of the Department of Language Information Management at the Ministry of Education, said the guideline calls for the digitalization of linguistic and cultural heritage, while promoting the construction of a national digital language and script museum.

教育部语言文字信息管理司司长刘培俊表示,该指南要求推进语言文化遗产数字化,同时推动建设国家数字语言文字博物馆。

It emphasizes advancing key technologies for ancient text digitalization, enhancing the accessibility of oracle bone script data and launching a multilingual digital education program to facilitate Chinese language learning globally, Liu said at a news conference.

刘培俊在新闻发布会上强调,需重点突破古籍数字化关键技术,增强甲骨文数据的可获取性,并启动多语种数字教育计划,助力中文教育的全球化发展。

A key aspect of this initiative is the development of large-scale linguistic data resources. The guideline outlines a plan to build a national corpus with extensive Chinese language datasets to support AI applications.

该计划聚焦大规模语言数据资源建设。根据指南要求,将系统性构建国家语料库,整合海量中文数据集,为人工智能应用提供支撑。

Among the pilot projects, Beijing Normal University has launched a large-scale Classical Chinese language model, an AI-driven initiative that sets a new benchmark in the field, Liu said.

在试点项目中,北京师范大学已推出大规模文言文语言模型。刘培俊指出,这一人工智能驱动的举措为该领域树立了新的标杆。

Kang Zhen, vice-president of BNU, said the university has developed a range of digital language databases, including a comprehensive holographic Chinese character database, a digital resource of the ancient Chinese dictionary Shuowen Jiezi, and repositories for ancient inscriptions and handwritten texts.

北师大副校长康震表示,该校已构建包括全息汉字数据库、《说文解字》数字资源库、古代铭文及手写文本库在内的系列数字化语言数据库体系。

These resources have played a crucial role in linguistic research and cultural preservation, Kang added.

康震补充称,这些资源对语言研究和文化保护发挥了关键作用。

The university's AI Taiyan, a Classical Chinese large language model with 1.8 billion parameters, is designed for high-accuracy interpretation of ancient texts and supports tasks such as word and phrase explanation and classical-to-modern Chinese translation.

该校研发的文言文大语言模型“AI太炎”拥有18亿个参数,专为高精度古籍解读而设计,可支持字词释义、文言文译现代汉语等任务。

China is also spearheading the construction of a new national corpus to strengthen linguistic infrastructure in the AI era, said Wang Hui, deputy head of the Ministry of Education's Department of Language Application and Administration.

教育部语言文字应用管理司副司长王晖表示,中国正带头建设新型国家语料库,以强化人工智能时代的语言基础设施。

"Currently, most linguistic datasets remain limited to single-text formats and specific academic domains, lacking the scale and diversity required for AI applications," Wang said.

王晖指出,当前语言数据资源仍主要集中于纯文本形态与特定学术研究领域,在数据规模与类型多样性方面存在明显不足,难以满足人工智能技术发展的多维需求。

The department began planning the corpus this year and aims to launch two flagship databases: the Chinese civilization corpus, for AI-assisted teaching and research, and the Chinese grand reading system corpus, Wang said.

王晖表示,该司今年已启动语料库规划,计划推出两大核心数据库:一是支撑人工智能辅助教学研究的中华文明语料库,二是中华经典诵读系统语料库。

oracle bone script: 甲骨文
national corpus: 国家语料库
the National Language Commission: 国家语言文字工作委员会
strategic language resources information database: 战略语言资源信息库
cultural heritage: 文化遗产
ancient text digitalization: 古籍数字化
benchmark (n.): 标杆
spearhead (v.): 带头;先锋




CD Voice, by China Daily
