If AI learns to browse the Internet, it gains an essentially unlimited way to acquire knowledge, and what happens next is hard to predict. So the well-known AI research lab OpenAI has taught GPT-3, the massive model often credited with opening the door to general artificial intelligence, to browse the web.
In May 2020, OpenAI released GPT-3, a model with 175 billion parameters. The model is powerful: its largest training dataset was 45 TB before processing, and it can not only answer questions, translate, and write articles well, but also do some arithmetic. A deep learning model this capable invites the question: is real AI about to arrive?
After GPT-3, large language models became a major research trend at technology companies. Some combine large models with knowledge graphs; others simply keep pushing scale as far as it will go. In December this year, Google's GLaM pushed the parameter count to 1.2 trillion.
Language models like GPT-3 are useful for many different tasks, but on tasks that require real-world knowledge they often produce "hallucinated" information. They also share a weakness: a lack of common sense. Asked "How many eyes does my foot have?", for example, the model answers "Two". The industry calls this defect "GPT-3's Achilles' heel". In practice, it makes the model perform poorly on tasks that involve logical reasoning and cognition.
To address this problem, OpenAI taught GPT-3 to use a text-based web browser.
Now the model handles some tricky questions correctly. For example, someone asks a question built on a false premise: "When did Shakespeare write the Harry Potter series of novels?"
The model answers: Shakespeare did not write the Harry Potter novels. They were written by J.K. Rowling...
It appears that the newly released WebGPT no longer gives a direct answer to an obviously flawed question such as "How many eyes does my foot have?", and instead corrects the premise for you.
Judging by its content, the answer is entirely correct. The model also gives the reader references, shown as blue numbers. The corresponding links are listed at the end of the answer, and each one leads to the source web page.
In another example, someone asks: "Are there connections in the hippocampus?" The model's answer reads as more expert than a professional's. Again, the model provides reference links.
WebGPT also copes with more specialized questions, such as "What is a sparse transformer in machine learning?" Researchers who have only just entered the AI field may not be able to answer this one, but the model gives an accurate answer, complete with a formula.
[Figure: the model's search process]
How are these capabilities implemented? Specifically, OpenAI fine-tuned GPT-3 to answer open-ended questions more accurately using a text-based web browser, which lets the model search and browse the web. The prototype mimics how humans research answers online: submitting search queries, following links, and scrolling up and down pages. The model is trained to cite its sources, which makes it easier to give feedback on its answers and thereby improve factual accuracy.
In addition, the model is given an open-ended question and a summary of the browser state, and must issue commands such as "Search ...", "Find in page: ..." or "Quote: ...".
In this way, the model collects passages from web pages and then uses those passages to compose an answer.
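The command loop described above can be sketched in a few lines of Python. This is only an illustration, not OpenAI's implementation: the `Session` class, the helper names, and the exact command syntax (a prefix followed by free text) are assumptions based on the three example commands quoted in this article.

```python
from dataclasses import dataclass, field

# Command prefixes taken from the article's examples. The exact format
# ("Search <query>", "Find in page: <text>", "Quote: <text>") is an assumption.
PREFIXES = {
    "Find in page:": "find",
    "Quote:": "quote",
    "Search": "search",
}


@dataclass
class Session:
    """Hypothetical container for one question-answering episode."""
    question: str
    quotes: list = field(default_factory=list)  # passages collected so far
    log: list = field(default_factory=list)     # (action, argument) history


def parse_command(line):
    """Split one model-issued line into an (action, argument) pair."""
    text = line.strip()
    for prefix, action in PREFIXES.items():
        if text.startswith(prefix):
            return action, text[len(prefix):].strip()
    raise ValueError(f"unrecognized command: {line!r}")


def step(session, line):
    """Record a command; quoted passages are kept for the final answer."""
    action, arg = parse_command(line)
    session.log.append((action, arg))
    if action == "quote":
        session.quotes.append(arg)
    return action, arg


def build_context(session):
    """Assemble the question and numbered quotes used to write the answer."""
    lines = [f"Question: {session.question}"]
    lines += [f"[{i}] {q}" for i, q in enumerate(session.quotes, start=1)]
    return "\n".join(lines)


session = Session("Who wrote the Harry Potter novels?")
for command in [
    "Search Harry Potter author",
    "Find in page: written by",
    "Quote: The novels were written by J.K. Rowling.",
]:
    step(session, command)

print(build_context(session))
```

Only `Quote` changes what the model can later draw on. In this sketch the other commands are merely logged, since a real browser environment is out of scope.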
With the task set up this way, OpenAI can train models on it using imitation learning and then optimize answer quality with human feedback. OpenAI trained and evaluated the model on ELI5, a dataset of questions asked by Reddit users.
Paper: https://cdn.openai.com/WebGPT.pdf
How is such an intelligent model built?
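Overall, OpenAI fine-tuned models from the GPT-3 family, focusing on the 760M, 13B, and 175B parameter models. Starting from these models, OpenAI used four main training methods: behavior cloning (BC), reward modeling (RM), reinforcement learning (RL), and rejection sampling (best-of-n), each of which is described below.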
For BC, RM, and RL, OpenAI used disjoint sets of questions. In summary: for BC, OpenAI held out about 4% of the demonstrations as a validation set. For RM, OpenAI sampled answers for the comparison dataset using models of various sizes (mainly the 175B model), trained with different combinations of methods and hyperparameters, and combined them into a single dataset. The final reward model was trained on about 16,000 comparisons, with the remaining 5,500 used for evaluation. For RL, OpenAI used a mixture in which 90% of the questions came from ELI5 and 10% from TriviaQA.
The model is trained to answer questions from ELI5. OpenAI trained three different models (760M, 13B, and 175B), corresponding to three different inference-time compute budgets. Answers from OpenAI's best model (175B best-of-64) were preferred 56% of the time over answers written by human demonstrators. Although these were the same kind of demonstrations used to train the model, human feedback made it possible to improve the model's answers beyond them.
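For readers unfamiliar with the notation, "best-of-64" means drawing 64 candidate answers and keeping the one the reward model scores highest. A minimal sketch follows, assuming hypothetical stand-ins for the answer sampler and the reward model (here a toy rule that counts citation markers). Neither reflects OpenAI's actual models.

```python
from itertools import cycle


def best_of_n(question, sample_answer, reward, n):
    """Draw n candidate answers and return the one with the highest reward."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [sample_answer(question) for _ in range(n)]
    return max(candidates, key=lambda answer: reward(question, answer))


def make_toy_sampler(pool):
    """Hypothetical sampler: cycles through a fixed pool of answers.

    A real system would sample from the BC or RL policy instead.
    """
    answers = cycle(pool)
    return lambda question: next(answers)


def toy_reward(question, answer):
    """Hypothetical reward: one point per bracketed citation marker."""
    return answer.count("[")


POOL = [
    "Shakespeare wrote them.",
    "J.K. Rowling wrote them [1].",
    "J.K. Rowling wrote them [1][2].",
    "Nobody knows.",
]

best = best_of_n(
    "Who wrote the Harry Potter novels?",
    make_toy_sampler(POOL),
    toy_reward,
    n=4,
)
print(best)
```

Larger values of n cost more sampling at inference time, but they give the reward model more candidates to choose from.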
Behavior cloning (BC): OpenAI fine-tuned on the demonstrations using supervised learning, with the commands issued by the human demonstrators as labels;
Reward modeling (RM): starting from the BC model with the final unembedding layer removed, OpenAI trained a model that takes in a question and an answer with references and outputs a scalar reward. The reward model is trained with a cross-entropy loss;
Reinforcement learning (RL): OpenAI fine-tuned the BC model using PPO (Schulman et al.). For the environment reward, OpenAI took the reward model score at the end of each episode and added it to a per-token KL penalty from the BC model, to mitigate over-optimization of the reward model;
Rejection sampling (best-of-n): OpenAI sampled a fixed number of answers (4, 16, or 64) from either the BC model or the RL model (the BC model, if not specified) and selected the one ranked highest by the reward model.
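To make the reward modeling and RL items more concrete, here is a rough sketch. It assumes the common pairwise formulation in which the difference between two answers' scores is treated as the logit that the first answer is preferred, and a per-token KL estimate given by the log-probability ratio between the policy and the BC model. The coefficient `beta` and all function names are illustrative assumptions, not details reported in this article.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def comparison_loss(score_a, score_b, label_a):
    """Cross-entropy loss on one human comparison between answers A and B.

    label_a is 1.0 if A was preferred and 0.0 if B was preferred.
    Assumption: P(A preferred) = sigmoid(score_a - score_b).
    """
    logit = score_a - score_b
    # -log(sigmoid(logit)) = softplus(-logit)
    # -log(1 - sigmoid(logit)) = softplus(logit)
    return label_a * softplus(-logit) + (1.0 - label_a) * softplus(logit)


def shaped_rewards(rm_score, logp_policy, logp_bc, beta=0.1):
    """Per-token rewards for one episode.

    Every token gets a KL penalty, -beta * (log pi(token) - log pi_BC(token)),
    and the reward model score is added at the final token only.
    """
    if not logp_policy or len(logp_policy) != len(logp_bc):
        raise ValueError("log-probability lists must be non-empty and equal length")
    rewards = [-beta * (lp - lb) for lp, lb in zip(logp_policy, logp_bc)]
    rewards[-1] += rm_score
    return rewards


# Equal scores mean the reward model is undecided, so the loss is log(2).
print(comparison_loss(1.0, 1.0, label_a=1.0))

# The policy drifts from the BC model on the second token and is penalized there.
print(shaped_rewards(2.0, [-1.0, -1.0], [-1.0, -2.0], beta=0.5))
```

The KL term discourages the policy from moving far from the BC model just to exploit quirks of the reward model, which is the over-optimization the list above refers to.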
On the ELI5 test set, OpenAI's models are compared with human demonstrators.
For questions drawn from the training distribution, the answers of OpenAI's best model are on average about as factually accurate as those written by human demonstrators. For out-of-distribution questions, however, robustness is a challenge. To probe this, OpenAI evaluated its models on the TruthfulQA dataset. They outperform GPT-3 on TruthfulQA and show more favorable scaling properties. However, they still lag behind human performance, partly because they sometimes quote from unreliable sources. The researchers hope to reduce these problems with techniques such as adversarial training.
Evaluating factual accuracy
To provide feedback that improves factual accuracy, humans must be able to evaluate the answers the model produces. This can be a very challenging task, because answers may be technical, subjective, or vague. For this reason, the developers had the model cite the sources for its answers.
After testing, OpenAI believes WebGPT still fails to pick up on many subtleties. It expects such decisions to matter more as AI systems improve, and interdisciplinary research will be needed to develop criteria that are both practical and epistemically sound. Approaches such as debate may help mitigate these problems.
Risks of deployment and training
Because it generates false statements less often, WebGPT is clearly better than GPT-3, but risks remain. Answers that cite sources are usually perceived as authoritative, which can obscure the fact that OpenAI's new model still makes basic errors. The model also tends to reinforce users' existing beliefs, and researchers are studying how best to address these problems.
Beyond errors and misleading answers, training an AI model with access to the web introduces new risks. OpenAI notes that the browsing environment does not give the model full web access: the model sends queries to the Microsoft Bing Web Search API and follows links that already exist on the web, which could still have side effects.
OpenAI says that, based on its experience with GPT-3, the model does not appear capable enough to dangerously exploit these ways of connecting to the outside world. However, the risk grows with model capability, and researchers are working on internal safeguards against it.
OpenAI believes that human feedback and tools such as web browsers offer a promising path toward robust, reliable, truly general-purpose AI systems. Current large language models still face many unknowns and challenges, but significant progress has been made in this direction.
Author: Heart of machine. Please include a link to the original when reprinting. Thank you.