OpenAI has just released GPT-4, and Baidu’s long-awaited AI dialogue product has finally appeared. Yesterday afternoon, Wenxin Yiyan (ERNIE Bot), Baidu’s new-generation knowledge-enhanced large language model and the newest member of the Wenxin large model family, was officially released in the “Waving the World” conference room at Baidu’s headquarters.

As soon as the press conference opened, Baidu CEO Robin Li tempered everyone’s expectations:

In a sense, Baidu has been preparing for this (the release of Wenxin Yiyan) for many years. We started investing in AI research more than ten years ago, and launched the Wenxin large language model in 2019. Today’s Wenxin Yiyan is a continuation of those many years of effort.

But we cannot say that we are completely ready. Wenxin Yiyan is benchmarked against ChatGPT, and even GPT-4, and that bar is very high. Among the world’s major technology companies, none has produced such a product yet, and Baidu is the first. In my own testing, I still find many imperfections.

Robin Li pointed out: “No matter which company it is, it is impossible to build such a large language model in a few months. Deep learning and natural language processing require years of persistence and accumulation, and there is no way to speed that up.”

What can Wenxin Yiyan do?

As the first generative AI product from one of China’s major technology companies, what can Wenxin Yiyan actually do? Robin Li did not keep the audience waiting. At the start of the press conference, he presented five usage scenarios for Wenxin Yiyan and demonstrated each of them in turn.

  • Creative writing;

  • Creation of business copywriting;

  • Mathematical and logical calculations;

  • Chinese understanding;

  • Multimodal generation.

Users of generative AI such as ChatGPT will have noticed a problem: even when the facts are readily available, the AI will still talk nonsense with a straight face (as in the mistake Google Bard made some time ago). Users who trust the generated content without verifying it can end up making serious mistakes. So can Wenxin Yiyan handle this problem with ease?

Scenario 1: In the first dialogue, Wenxin Yiyan gave accurate information about the author of “Three-Body”, its core content, the cast and crew of the TV series, and even comparisons between the actors themselves. Beyond factual accuracy, it also demonstrated the creative ability to write a continuation of the story.

Scenario 2: For commercial copywriting, three rounds of dialogue were demonstrated on site: naming a company, writing a slogan, and writing a press release. Judging from the demonstration, Wenxin Yiyan has a good grasp of Chinese preferences and of the deeper meanings carried by Chinese words. Taking the naming of a technology service company as an example, the suggested names matched what Chinese people expect of such companies: customers can tell the type of company, and even its line of business, at a glance.

Scenario 3: The mathematical reasoning demonstration used the well-known problem of chickens and rabbits in the same cage. To show off Wenxin Yiyan’s ability, Baidu quietly set a “trap” by posing a version of the problem that has no solution. This did not trip up Wenxin Yiyan: it immediately pointed out that the question was wrong. Once the question was corrected, it answered accurately and gave a simple approach to solving the problem.
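To make the “trap” concrete, here is a minimal sketch of the chickens-and-rabbits problem, including the check for an unsolvable question. The function name and the sample numbers are illustrative assumptions; the article does not give the figures used at the press conference.

```python
def solve_chickens_rabbits(heads, legs):
    """Return (chickens, rabbits), or None if the problem has no valid solution.

    Illustrative helper, not taken from the press conference.
    Equations: chickens + rabbits = heads
               2 * chickens + 4 * rabbits = legs
    Subtracting twice the first from the second gives
               2 * rabbits = legs - 2 * heads
    """
    surplus = legs - 2 * heads  # legs left over after giving every animal two
    # No solution if the leftover legs are negative or cannot be split in pairs.
    if heads < 0 or surplus < 0 or surplus % 2 != 0:
        return None
    rabbits = surplus // 2
    chickens = heads - rabbits
    # A negative chicken count means there are too many legs for the heads.
    if chickens < 0:
        return None
    return chickens, rabbits


print(solve_chickens_rabbits(35, 94))  # a solvable instance
print(solve_chickens_rabbits(10, 25))  # an odd leg count: unsolvable
```

The second call models the kind of trap described above: an odd number of legs can never come from animals with two or four legs each, so the right response is to reject the question rather than return numbers.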

Scenario 4: During the press conference, Robin Li candidly admitted that although Wenxin Yiyan supports English Q&A, its English ability is still limited. Its main strength is understanding Chinese. Whether it was explaining the meaning of the idiom “Luoyang Zhigui” (literally, “paper is expensive in Luoyang”), explaining the economic principle behind the idiom, explaining what an acrostic poem is, or finally composing verses of its own, this round showed that strength vividly.

Scenario 5: Multimodal capability was a major feature highlighted when GPT-4 was released a few days ago; at that time, OpenAI demonstrated generating code from a sketch. Wenxin Yiyan also showed multimodal generation in this segment. In addition to the text dialogue shown in the previous scenarios, it demonstrated image, video, and voice (dialect) generation. ChatGPT, popular as it currently is, still cannot perform these functions.

Comparison with Bing Chat and ChatGPT

Compared with ChatGPT and Bing Chat, the biggest difference of Wenxin Yiyan is multimodal generation: posters, voice, and even video can be generated from a text prompt. At the press conference, Robin Li demonstrated using Wenxin Yiyan to generate an event poster, speech in a dialect, and an event-related video based on the content of the question. However, generating video is relatively costly, and the feature is not yet open to all users at this stage.

The ability to generate pictures and videos genuinely impressed us. Robin Li also said that multimodal generative AI is a clear development trend.

During the demonstration, Robin Li repeatedly emphasized that Baidu is in a unique position when it comes to processing the Chinese language.

For comparison, we put the questions demonstrated at the conference to ChatGPT (version 3.5) and Bing Chat.

The first question is about “The Three-Body Problem”. Both Bing Chat and Wenxin Yiyan correctly answered who the author is and where he is from, while ChatGPT wrongly gave Liu Cixin’s hometown as Shandong.

Bing Chat’s answer also shows that its source of information is Baidu Encyclopedia.

On the question about the actors in the “Three-Body” TV series, which aired in early 2023, ChatGPT, whose knowledge stops in 2021, made another mistake and said that the series had not yet started filming. Bing Chat found the answer on Douban.

In business copywriting, all three offered suggestions, and ChatGPT thoughtfully added English names as well.

However, Bing Chat misunderstood the question when first asked: instead of proposing specific company names, it offered advice on how to choose one.

In our previous use, neither ChatGPT nor Bing Chat has given us complete peace of mind on math problems. However, the chickens-and-rabbits problem from Baidu’s press conference did not trouble either of them, and both answered it accurately.

Bing Chat’s explanation reads like a patient teacher guiding a student step by step, while Wenxin Yiyan’s answer is more like the reference answer at the back of a workbook.

When it comes to Chinese comprehension, the advantages of Wenxin Yiyan become apparent.

When asked “How expensive was paper in Luoyang at that time?”, ChatGPT mistakenly assumed the question was about prices in the Tang Dynasty, and replied that paper in Luoyang was not expensive at all. Bing Chat understood the question correctly, but did not give a precise figure.

Wenxin Yiyan’s answer of two to three thousand wen is at least consistent with the figures found through search.

You will also have noticed that, quite apart from the quality of the verse, neither ChatGPT nor Bing Chat understands what an acrostic poem is. By comparison, Wenxin Yiyan’s performance here is indeed outstanding.

Wenxin Yiyan’s performance in Chinese is therefore indeed better than that of ChatGPT and Bing Chat. However, Robin Li also mentioned at the press conference that, despite its clear advantages in Chinese, Wenxin Yiyan has not been trained enough on English and on code scenarios, and its performance there is not yet good enough. I believe that Baidu will improve rapidly in the future.

Technical Architecture & Features

Baidu Chief Technology Officer Wang Haifeng explained in detail the Wenxin model and technical features behind Wenxin Yiyan at the press conference.

Baidu has a full-stack presence across the four layers of the AI architecture: chips at the bottom, then the deep learning framework, then large models, with applications such as search at the top. Wenxin Yiyan sits in the model layer.

Wang Haifeng said that the rapid launch of Wenxin Yiyan rests mainly on Baidu’s accumulation over the past 11 years, and on the layer-to-layer feedback and end-to-end optimization that have formed across the four layers. In particular, the joint optimization of the PaddlePaddle framework at the framework layer and the Wenxin large model at the model layer played a vital role in the development of Wenxin Yiyan.

According to the presentation, Wenxin Yiyan is a new-generation knowledge-enhanced large language model, developed on the basis of the ERNIE and PLATO series of models. It uses six core technologies: supervised fine-tuning, reinforcement learning from human feedback, prompting, knowledge enhancement, retrieval enhancement, and dialogue enhancement. The first three are capabilities common to large language models of this kind; they were already applied and built up in ERNIE and PLATO, and have been further strengthened and refined in Wenxin Yiyan. The last three are new innovations built on Baidu’s existing proprietary technologies.
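As a rough illustration of what “retrieval enhancement” generally means, the sketch below looks up passages relevant to a question and places them in the prompt before the model answers. It is a toy example built on assumptions: the word-overlap scoring, the function names `retrieve` and `build_prompt`, and the sample passages are all hypothetical, and nothing here describes Baidu’s actual implementation.

```python
from typing import List


def retrieve(query: str, documents: List[str], top_k: int = 2) -> List[str]:
    """Rank documents by how many lowercase words they share with the query.

    Toy scoring for illustration only; real systems use a search index
    or learned embeddings instead of raw word overlap.
    """
    query_terms = set(query.lower().split())
    scored = []
    for doc in documents:
        overlap = len(query_terms & set(doc.lower().split()))
        if overlap > 0:
            scored.append((overlap, doc))
    # Highest overlap first; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def build_prompt(query: str, documents: List[str], top_k: int = 2) -> str:
    """Prepend the retrieved passages to the question (hypothetical format)."""
    passages = retrieve(query, documents, top_k)
    context = "\n".join("- " + passage for passage in passages)
    return "Reference passages:\n" + context + "\n\nQuestion: " + query + "\nAnswer:"


# Hypothetical knowledge snippets, not Baidu data.
docs = [
    "Liu Cixin wrote The Three-Body Problem",
    "PaddlePaddle is a deep learning framework",
    "Luoyang was an ancient capital",
]
print(build_prompt("who wrote The Three-Body Problem", docs, top_k=1))
```

Grounding the prompt in retrieved text is one common way to reduce confidently wrong answers, because the model can draw on passages that are current and checkable rather than relying only on what it memorized during training.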

The training data for the Wenxin Yiyan model includes trillions of web pages, billions of search and image records, tens of billions of voice calls per day, and a knowledge graph of 550 billion facts. But Wang Haifeng also admitted that the model’s training is not yet sufficient. As more and more feedback arrives from real users, Wenxin Yiyan’s quality and capabilities will gradually improve.

How to Get Access

Baidu has announced an invitation-only testing plan for Wenxin Yiyan.

From March 16th, the first batch of users can try the product on the official Wenxin Yiyan website using an invitation code, and access will be opened to more users in stages.

Enterprise customers can use the Wenxin Yiyan API offered through Baidu Smart Cloud; enterprises that have not yet obtained API access can sign up on the Baidu Smart Cloud platform.

