A new breakthrough in AI simultaneous interpreting: Sogou Simultaneous Interpreting 3.0 debuts a context engine, improving PPT content translation accuracy by 40%

2022-05-07 19:55:56 Qingdeng ancient temple

Almost Human reports

Machine center editorial department

This is the first multimodal AI voice simultaneous interpreting product. Sogou Simultaneous Interpreting 3.0 takes the accuracy of intelligent simultaneous interpreting to a new level.

Last Saturday, Sogou released the industry's first multimodal simultaneous interpreting product, Sogou Simultaneous Interpreting 3.0. Built on Sogou's original "context engine", version 3.0 adds visual and reasoning abilities: the machine interpreter can now not only listen but, for the first time, also see, understand, and reason. The technology drew wide attention on site at its first public demonstration.

Sogou Simultaneous Interpreting 3.0 made its debut last Saturday.

Recently, Chen Wei, general manager of Sogou's AI Interactive Technology Department, Zhang Jingjing, product director of Sogou Simultaneous Interpreting, and Zhao Chao, project leader, explained to us the technology behind Sogou Simultaneous Interpreting.

Pioneering the "context engine": a new breakthrough for Sogou AI simultaneous interpreting

Since its release in 2016, Sogou's simultaneous interpreting technology has been used in thousands of real meetings. In practice, the developers found that the mainstream voice simultaneous interpreting systems in the industry cannot deliver stable, high-quality results across the full variety of speaking occasions, and that recognition and translation of the specialized terms in a speech are often poor.

To address these problems, Sogou added a "context engine" to version 3.0, hoping to solve them through a deeper understanding of language. "The context engine can use a camera to recognize the PPT content on the screen in real time," Chen Wei said. "Previously, machine simultaneous interpreting could only obtain voice information. With OCR technology, Sogou Simultaneous Interpreting can now obtain voice information plus PPT information, and the context engine can then build personalized knowledge, which greatly improves the translation quality."

The following figure shows some results of version 3.0 in use. The second column is the original content of the guest's speech, and the third column is the output of the old version's speech recognition. In the past, when a speaker used a rare word such as the Go term for resigning a game ("投子"), the AI would usually recognize it as the similar-sounding word for "investment" ("投资"). But when the PPT mentions the man-versus-machine match between AlphaGo and Lee Sedol, the 3.0 system expands its vocabulary to include Go terms like this one (which means that one side admits defeat). With the help of the knowledge graph, the AI can make many corrections to the translation.
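
The correction described above can be pictured as a rescoring step. The sketch below is only an illustration under stated assumptions, not Sogou's implementation: the function name `rescore`, the scores, the bonus value, and the term list are all hypothetical. It shows how context terms taken from a slide could tip the choice between two similar-sounding recognition candidates.

```python
from typing import List, Set, Tuple


def rescore(hypotheses: List[Tuple[str, float]],
            context_terms: Set[str],
            bonus: float = 2.0) -> str:
    """Pick the best candidate after rewarding words found in the context.

    hypotheses: (text, score) pairs, higher score = more likely (made-up values).
    context_terms: lowercase terms assumed to come from the slide context.
    bonus: hypothetical reward added per matching context term.
    """
    def biased_score(item: Tuple[str, float]) -> float:
        text, score = item
        words = set(text.lower().split())
        return score + bonus * len(words & context_terms)

    return max(hypotheses, key=biased_score)[0]


# Two acoustically similar candidates; the wrong one scores slightly higher.
hypotheses = [
    ("lee sedol chose investment", -1.0),
    ("lee sedol chose to resign", -1.5),
]

# Terms a context engine might add after seeing a slide about AlphaGo
# (hypothetical list for illustration).
go_terms = {"resign", "komi", "joseki"}

without_context = rescore(hypotheses, set())
with_context = rescore(hypotheses, go_terms)
print(without_context)  # lee sedol chose investment
print(with_context)     # lee sedol chose to resign
```

Without context the higher-scoring but wrong candidate wins; once the Go terms are available, the bonus moves the correct candidate ahead.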

Beyond proper nouns, how much has the new technology improved performance? Sogou said it deliberately chose a difficult professional conference speech and ran a comparative test between versions 2.0 and 3.0 of its simultaneous interpreting system and a professional human interpreter. The human scored 4.08 points, Sogou Simultaneous Interpreting 2.0 scored 3.41 points, and version 3.0 scored 3.82 points. The result marks a new breakthrough in the field and brings AI one step closer to the level of professional human interpreters.
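
To put the three scores in perspective, the short calculation below works out how much of the gap between version 2.0 and the human interpreter version 3.0 closes. It uses only the figures quoted above. The scoring scale itself is not described in the article, so the "share of the gap closed" is an illustrative reading, not a metric Sogou reported.

```python
# Scores quoted in the article (scoring scale not specified there).
human, v2, v3 = 4.08, 3.41, 3.82

gain = v3 - v2                    # improvement from 2.0 to 3.0
gap_before = human - v2           # distance from 2.0 to the human score
gap_after = human - v3            # distance that remains for 3.0
share_closed = gain / gap_before  # fraction of the earlier gap closed

print(f"gain {gain:.2f}, remaining gap {gap_after:.2f}, "
      f"share closed {share_closed:.0%}")
# gain 0.41, remaining gap 0.26, share closed 61%
```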

The multimodal ability to see and listen is not the only highlight of Sogou Simultaneous Interpreting 3.0. According to Sogou, version 3.0 brings improvements in three main directions:
  • Closer to nature: moving from speech recognition alone to speech plus images, the new method imitates how human interpreters work, adding vision and a "brain" that expands on knowledge points, which gives it a more complex perception system.
  • More professional: previous AI simultaneous interpreting models relied on general-purpose data. The new model strengthens its capability by customizing knowledge in real time. It captures the on-site PPT content, supplements it with knowledge of the professional field related to the speech, and customizes the model for each speech to improve interpreting quality.
  • More intelligent: in the past, model training was a passive learning process. Now the system learns the PPT content automatically and captures a large vocabulary on its own to keep interpreting quality high.

Chen Wei summed up: "Sogou Simultaneous Interpreting 3.0 is a large-scale update from front end to back end. First, it introduces multimodality by adding visual processing. Second, its processing is upgraded from the perceptual level to the cognitive level: with the help of the 'context engine', the system can use the knowledge graph to expand on the content being interpreted and form contextual information related to the speech. The new system can also improve interpreting and translation quality in real time, with lower latency."

"Seeing and thinking" along with the speaker

Compared with earlier systems, multimodal AI simultaneous interpreting works more like a human. "Able to see" means that machine simultaneous interpreting has visual ability for the first time. According to Sogou, version 3.0 can obtain real-time image information through screen capture or an ordinary camera, with no special equipment required. "Able to understand and reason" comes from the Sogou context engine, which includes Sogou's knowledge graph and encyclopedia-based reasoning. The system links the text obtained through OCR to the core knowledge related to the speech, and reasons in real time over the Sogou "Zhilifang" (Knowledge Cube) knowledge graph to acquire background knowledge. In addition, the system can obtain Chinese-English term pairs from the Chinese-English corpus of Sogou Encyclopedia, optimizing recognition and translation in real time.
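
The flow described above (slide text from OCR, matching against a knowledge graph, then collecting bilingual term pairs) can be sketched as follows. This is a toy illustration under stated assumptions rather than Sogou's system: the graph, the glossary, and the function names `expand_context` and `bilingual_pairs` are all hypothetical, and the OCR step is replaced by a plain string.

```python
from typing import Dict, List, Set

# Toy knowledge graph: entity -> related domain terms (hypothetical data).
KNOWLEDGE_GRAPH: Dict[str, List[str]] = {
    "alphago": ["go", "lee sedol", "resign", "deepmind"],
    "transformer": ["attention", "encoder", "decoder"],
}

# Toy English -> Chinese glossary standing in for an encyclopedia corpus
# (hypothetical data).
GLOSSARY: Dict[str, str] = {
    "go": "围棋",
    "resign": "投子",
    "attention": "注意力",
}


def expand_context(ocr_text: str) -> Set[str]:
    """Find known entities in the slide text and add their related terms."""
    text = ocr_text.lower()
    terms: Set[str] = set()
    for entity, related in KNOWLEDGE_GRAPH.items():
        if entity in text:
            terms.add(entity)
            terms.update(related)
    return terms


def bilingual_pairs(terms: Set[str]) -> Dict[str, str]:
    """Keep only the terms for which the glossary has a translation."""
    return {term: GLOSSARY[term] for term in terms if term in GLOSSARY}


# Stand-in for text that OCR might read off a slide.
slide_text = "AlphaGo vs. Lee Sedol"
terms = expand_context(slide_text)
pairs = bilingual_pairs(terms)
print(sorted(terms))
print(pairs)
```

A real system would need entity linking that is far more robust than substring matching, but the sketch shows why a single entity on a slide can bring a whole set of domain terms into play.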

According to Sogou, by obtaining information multimodally and introducing the knowledge graph at the same time, Sogou Simultaneous Interpreting 3.0 improves recognition accuracy on PPT content by 21.7% and translation accuracy by 40.3%.

Beyond conference speeches, Sogou plans to bring its simultaneous interpreting technology to more scenarios. Teleconferencing, press interviews, live video, travel, and even court trial records are all directions for future work.

Since version 1.0 was released in 2016, Sogou's simultaneous interpreting technology has been upgraded continuously. "In the translation module behind the simultaneous interpreting system, version 1.0 used an RNN model. In version 2.0 we introduced the Transformer model, which solved the gradient explosion problem and can remember a longer history. In version 3.0, in addition to the Transformer, we use context-based streaming decoding and introduce a knowledge graph based on Sogou Encyclopedia," Zhao Chao said.

At the same time, the industry's common problems should not be overlooked: the accuracy of AI simultaneous interpreting is still well short of the level of human experts. This is partly due to the limits of existing algorithms and partly because people place "higher requirements" on AI. "After talking with many professional interpreters, we found that in the normal process, human interpreters need the partner to provide background materials in advance and have one or two days to prepare," Chen Wei explained. "Machine simultaneous interpreting gets no preparation time. And once interpreting begins, humans can also see the PPT content on site. So for machine simultaneous interpreting, visual information is very important in addition to handling speech well."

Behind Sogou Simultaneous Interpreting 3.0 is the deepening of the company's "natural interaction + knowledge computing" strategy. Sogou CEO Wang Xiaochuan said recently that the core of Sogou's AI technology is to give machines perception through deep learning, so that they can interact naturally with humans, and at the same time to further extract the relationships within language so that machines develop human-like "cognitive" ability.

From its early voice interaction to lip-reading recognition, then machine translation and Sogou's avatar technology (synthetic news anchors), and now multimodal interaction, Sogou is relying on voice, images, gestures, and other channels to make communication between AI and humans more "natural".

Copyright notice
Author: Qingdeng ancient temple. Please include the original link when reprinting, thank you.
https://en.fheadline.com/2022/127/202205071917121272.html
