From Oracle Bones to the Intelligent Cloud: Charting a New Chapter in the Development of Chinese with Digital Intelligence

  Recently, the Ministry of Education, the National Language Commission, and the Office of the Central Cyberspace Affairs Commission jointly issued the "Opinions on Strengthening the Construction of Digital Chinese and Promoting the Development of Language and Character Informatization" (hereinafter the "Opinions"). The document sets out comprehensive arrangements for using informatization to accelerate the high-quality development of language and writing work, and for using digitalization to help language and writing better serve the country's modernization.

  When the thousand-year-old manuscripts of the Dunhuang Sutra Cave awaken in the digital world, and the inscriptions on oracle bones leap into data and take up residence in the cloud... Digital Chinese takes code as its brush and algorithms as its ink, connecting past and future where the virtual and the real interweave.

  Digital intelligence empowers high-quality development of language and writing

  Language and writing, "used every day without being noticed," are present in every aspect of social production.

  Today, China has built the world's largest language resource library and a knowledge graph of Chinese language resources, bringing together resources for more than 120 languages and dialects. This year, a national survey of language and script use will be carried out for the first time, with an integrated survey platform covering data collection, transmission, storage, and processing. It will provide big-data support for deepening comprehensive education reform and for analyzing comprehensive national strength.

  To accelerate the informatization of language and writing, the "Opinions" propose treating Digital Chinese as an important task in serving the construction of Digital China and as a key focus for advancing language and writing informatization across the board. They call for concentrating on the digitalization of Chinese and on bringing culture into data, and for improving both a new Chinese-language service system and the governance system for language and writing.

  Liu Peijun, Director of the Language and Character Information Management Department of the Ministry of Education, said that China has issued more than 100 national standards for the informatization of the common language and ethnic languages, laying a standardized foundation for innovative applications of natural language processing in artificial intelligence, digital products, and the information industry.

  The broad rollout of intelligent language learning has effectively served educational reform and innovation. For example, Mandarin proficiency testing has been conducted to a high standard, the shift from manual to intelligent testing methods is complete, and more than 90 million electronic certificates have been issued. Guangdong has built the country's first smart examination room for Mandarin proficiency testing, which pioneered a "test-on-arrival" model that has greatly improved testing efficiency.

  The intelligent dissemination of language and civilization connects China with the world and effectively serves international exchange and mutual learning. Through digital empowerment, the words written in ancient books have been brought back to life. A database of Chinese ideological and cultural terms has been built, more than 1,200 terms that reflect the core essence of the Chinese nation's discourse system have been shared with the international community, and multilingual digital copyright cooperation has been carried out with more than 40 countries and regions.

  "China has built an integrated, intelligent, and international global Chinese learning platform with more than 16 million users in over 190 countries and regions, and has formed an alliance through in-depth cooperation. The Chinese Learning Alliance cloud service platform offers 30,000 online courses and works with more than 1,600 institutions in China and abroad, so that Chinese can be learned and used by anyone, at any time, and with ease," Liu Peijun said.

  Building a new national corpus

  This year, the Ministry of Education launched the construction of a new national corpus. The "Opinions" state that by 2027, a national key corpus and a national strategic language resource information database will be preliminarily established.

  Why is the new national corpus so important? What role will it play in the informatization of language and characters?

  "At present, artificial intelligence innovation, represented by DeepSeek and others, is making continuous breakthroughs. Against this background, the country has made the strategic decision to build a new national corpus, which underscores its importance and necessity," said Wang Hui, deputy director of the Language and Character Application Management Department of the Ministry of Education.

  At this stage, multiple corpora exist in the fields of language education, teaching, and research, but many remain confined to a single text modality and to domain-specific applications. These corpora still fall short in construction concepts, technology and methods, and scale, as well as in data diversity, timeliness, and large-scale application in combination with artificial intelligence, and they struggle to meet demands for language data that is diverse, dynamic, and above all intelligent.

  To address this difficulty, Wang Hui explained, the new national corpus is being built for the era of artificial intelligence. It breaks through the single-text-modality and domain-application barriers of traditional corpora, takes large scale and intelligent computing as its core, and is distinguished by being new-quality, multimodal, multilingual, large-scale, and global. It is meant to provide standardized, credible, and high-quality language and cultural corpus resources for applications and innovation across many scenarios, in both general and specialized fields.

  "It mainly includes two aspects. The first is leadership through standards: strengthening the supply of institutional rules, developing corpus construction standards, emphasizing value orientation, application orientation, and innovation orientation, balancing quality and safety, and providing basic principles and methodological guidance for corpus construction. The second is leadership through demonstration: starting where conditions are most mature by developing the 'Chinese cultural context new corpus' and the 'Chinese reading system corpus.' Put simply, of these two demonstration corpora, the 'Chinese cultural context new corpus' targets smart teachers, and the 'Chinese reading system corpus' targets smart study companions," Wang Hui said.

  Digital Chinese promotes industrial upgrading

  In the 1980s, Wang Xuan's team at Peking University invented laser phototypesetting technology for Chinese characters. Combined with Chinese character coding standards, it broke through the spatial limitations on the digitalization of Chinese and allowed the language that carries Chinese culture to be reborn in the global Internet space. That was a transformation from "lead and fire" to "light and electricity." Now, large language model technology is placing unprecedented demands on large-scale, high-quality corpora, giving new historical meaning and a new mission to the work of bringing culture into data.

  The historical stages are different, but opportunities and challenges are similar.

  Tang Zhi, director of the Wangxuan Institute of Computer Technology at Peking University, believes that Chinese information processing technology has moved on from solving the basic problems of Chinese character input and output to a more advanced stage: unlocking the value of language and writing as data elements.

  The "Opinions" propose using Digital Chinese to promote industrial upgrading. They call for supporting new products, new occupations, and new business forms in language information technology, encouraging the digital transformation and upgrading of traditional language industries, and cultivating a new language industry based on Digital Chinese. They also call for promoting the research, development, and application of software and hardware products such as language resources, language translation, intelligent robots, and Chinese content services, supporting the formation of industrial clusters around the speech, corpus, and language application ecosystem, and encouraging the creation of demonstration brands for language industry applications.

  "Under the new circumstances, language and writing will be transformed from 'static symbols' into 'dynamic digital assets' and from 'information carriers' into 'factors of production.' We must focus on developing standards for corpora, data annotation, and evaluation, and on supporting tasks such as text generation and understanding, language translation, and sentiment analysis." Tang Zhi said that artificial intelligence is developing rapidly, and that the innovative application of language information processing technology is undergoing a paradigm shift from the "GB2312 character set" to the "trillion-parameter large language model." In the future, language and writing will be deeply integrated with information technology, forming a virtuous cycle of "technical breakthrough - scenario implementation - ecological prosperity." (Reporter Sun Yahui)

[Editor in charge: Zhao Wenhan]
