Why does Tesla stick to the pure visual route?
2022-02-02 20:51:48 【TechWeb】
【TechWeb】 Tesla China recently briefed the media at an offline event on its thinking behind the pure-vision approach and on its research progress.
Sticking with visual perception: using AI neural networks to improve assisted driving
As shown in Figure 1, Andrej said: "We hope to build neural network connections similar to the animal visual cortex, simulating how the brain takes in and outputs information. Just as light enters the retina, we hope to simulate this process with cameras."
Figure 1: Schematic of the human image-processing pipeline simulated with cameras
HydraNets is a multi-task learning neural network architecture. A shared backbone network takes the raw data from 8 cameras and processes it in a unified way with a RegNet residual network and a BiFPN model, producing image features of various types at different levels of precision. These features are then supplied to neural network tasks with different requirements.
Figure 2: The HydraNets multi-task learning neural network architecture
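The shared-backbone idea can be illustrated with a minimal sketch. It is not Tesla's implementation: `HydraNet`, `toy_backbone`, and the three head names are hypothetical, and the "backbone" is a few summary statistics standing in for RegNet and BiFPN. The only point it shows is that the expensive feature extraction runs once per image while each task head reuses the result.

```python
from typing import Callable, Dict, List

Features = List[float]


class HydraNet:
    """Toy multi-task model: one shared backbone, several task heads."""

    def __init__(self,
                 backbone: Callable[[List[float]], Features],
                 heads: Dict[str, Callable[[Features], float]]) -> None:
        self.backbone = backbone
        self.heads = heads
        self.backbone_calls = 0  # counts how often shared features are computed

    def forward(self, image: List[float]) -> Dict[str, float]:
        features = self.backbone(image)  # shared work, done once per image
        self.backbone_calls += 1
        # Every head reads the same features and produces its own output.
        return {name: head(features) for name, head in self.heads.items()}


def toy_backbone(image: List[float]) -> Features:
    # Hypothetical stand-in for RegNet + BiFPN: three summary statistics.
    return [sum(image) / len(image), max(image), min(image)]


net = HydraNet(
    backbone=toy_backbone,
    heads={
        "lane_lines": lambda f: f[0],       # placeholder head
        "traffic_lights": lambda f: f[1],   # placeholder head
        "objects": lambda f: f[1] - f[2],   # placeholder head
    },
)
outputs = net.forward([0.25, 0.5, 0.75])
```

Adding a new task in this layout means adding one more entry to `heads`; the backbone and the other heads are untouched, which is the practical appeal of the design.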
However, because this structure processes single frames from a single camera, it runs into many bottlenecks in practice. A Transformer neural network structure was therefore added to it, turning the originally extracted two-dimensional image features into features in a three-dimensional vector space that combines multiple cameras. This greatly improves recognition rate and accuracy.
That is still not all. Because the input is still a single frame, the time and space dimensions are also needed, so that the vehicle has a feature "memory" for dealing with scenarios such as occlusions and road signs. In the end, features of the driving environment are extracted from video streams to form a vector space, allowing the vehicle to judge its surroundings accurately and with low latency. The result is a 4D vector space, and the database of these video-derived features is used to train autonomous driving.
Figure 3: Video neural network architecture in 4D vector space
However, because urban autonomous driving differs from highway autonomous driving, the vehicle planning module faces two major problems. First, the driving problem does not necessarily have a single optimal solution; there are many local optima, meaning that in the same driving environment the system can choose among many possible solutions, all of them good. Second, the problem is high-dimensional: the vehicle must not only react now but also plan for the period ahead, estimating position, velocity, acceleration, and more.
Tesla therefore chose two methods to address these two problems in the planning module. One is discrete search, which finds an "answer" among the local optima and runs very efficiently, performing 2,500 searches every 1.5 milliseconds. The other is continuous function optimization, which handles the high-dimensional problem. Discrete search first produces a global optimum, and continuous optimization then balances demands along multiple dimensions, such as comfort and smoothness, to obtain the final planned path.
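A toy version of this two-stage idea, as a sketch under stated assumptions rather than Tesla's planner: `discrete_search` picks a lateral offset from a handful of hypothetical candidates, and `smooth_path` then runs plain gradient descent on a cost that trades closeness to the coarse path against step-to-step changes (a crude proxy for comfort). The candidate list, obstacle position, clearance, and weights are all made up for illustration.

```python
from typing import List, Sequence


def discrete_search(candidates: Sequence[float],
                    obstacle: float,
                    clearance: float) -> float:
    """Stage 1: pick the feasible lateral offset closest to the lane centre (0.0)."""
    feasible = [c for c in candidates if abs(c - obstacle) >= clearance]
    if not feasible:
        raise ValueError("no candidate keeps the required clearance")
    return min(feasible, key=abs)


def smooth_path(reference: List[float],
                weight: float = 1.0,
                step: float = 0.1,
                iters: int = 200) -> List[float]:
    """Stage 2: gradient descent on
    sum (x_i - ref_i)^2 + weight * sum (x_{i+1} - x_i)^2, endpoints held fixed."""
    x = list(reference)
    for _ in range(iters):
        new_x = list(x)
        for i in range(1, len(x) - 1):
            grad = (2.0 * (x[i] - reference[i])
                    + 2.0 * weight * (2.0 * x[i] - x[i - 1] - x[i + 1]))
            new_x[i] = x[i] - step * grad
        x = new_x
    return x


def max_step(path: List[float]) -> float:
    """Largest change between consecutive waypoints."""
    return max(abs(b - a) for a, b in zip(path, path[1:]))


# Hypothetical scene: an obstacle slightly left of centre, 0.6 m clearance.
target = discrete_search([-1.0, -0.5, 0.0, 0.5, 1.0], obstacle=0.2, clearance=0.6)
coarse = [0.0, 0.0, 0.0, target, target, target]  # abrupt lane shift
smoothed = smooth_path(coarse)
```

The discrete stage settles which option to take; the continuous stage only reshapes that choice, which is why the endpoints of `smoothed` match those of `coarse` while the abrupt jump in the middle is spread over several waypoints.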
In addition to planning for itself, the vehicle must also estimate and guess the plans of other objects. In the same way, based on its recognition of other objects and basic parameters such as their velocity and acceleration, it plans routes for the other vehicles and responds accordingly.
But road conditions around the world vary widely and are very complex. Relying on discrete search alone would consume a lot of resources and make decisions take too long, so Tesla chose to combine a deep neural network with Monte Carlo tree search, which improves decision-making efficiency by almost an order of magnitude.
Figure 5: Efficiency of different approaches
The overall architecture of the final planning module is shown in Figure 5. First, the pure-vision architecture processes the data into a 4D vector space. Then, using the previously obtained object recognition and shared feature data, a deep neural network finds the global optimum, and the final planning result is handed to the actuators for execution.
Figure 6: Overall architecture of visual recognition, planning, and execution
Of course, even the best neural network architecture and processing methods depend on an effective, very large database. As the data moved from 2D to 3D and 4D, a manual annotation team of more than 1,000 people kept pace and began labeling in 4D space. A label only needs to be made once in vector space; it is then automatically mapped onto the individual images from the different cameras, which greatly increases the volume of annotated data. But that is still not enough: manually labeled data falls far short of the training volume that autonomous driving requires.
Figure 7: Demonstration of manual annotation in 4D vector space
Because people are better at semantic recognition, while computers are better at geometry, triangulation, tracking, and reconstruction, Tesla wants to create a mode of joint annotation in which humans and computers "divide the labor harmoniously".
Tesla built a large automatic labeling pipeline. Video clips of 45 seconds to 1 minute, together with a large amount of sensor data, are handed to neural networks for offline learning. A large number of machines and AI algorithms then generate annotated datasets that can be used to train the networks.
Figure 8: Automatic annotation pipeline for video clips
For drivable areas such as roads, lane lines, and intersections, Tesla used NeRF (Neural Radiance Fields), an image-processing algorithm that converts 2D into 3D. Given XY coordinate points, the neural network predicts the height of the ground, generating countless XYZ coordinates along with semantics such as curbs, lane lines, and road surface. This forms a large number of information points, which are projected back into the camera images. The road data is then compared with the image-segmentation results recognized by the neural network, and the images from all cameras are optimized jointly. Combining the time and space dimensions at the same time produces a thorough reconstruction of the scene.
Figure 9: Demonstration of road reconstruction
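The "predict a height for each XY point, then project back into the camera" step can be sketched with a simple pinhole camera model. This is not NeRF and not Tesla's pipeline: `flat_ground` is a hypothetical stand-in for the network's height prediction, and the focal length, image centre, and camera height are made-up values. The vehicle frame is assumed to be x forward, y left, z up.

```python
from typing import Callable, List, Tuple

Point2D = Tuple[float, float]


def project_ground_points(xy_points: List[Point2D],
                          height_fn: Callable[[float, float], float],
                          focal_px: float = 1000.0,
                          cx: float = 640.0,
                          cy: float = 360.0,
                          cam_height: float = 1.5) -> List[Point2D]:
    """Lift XY points to XYZ with height_fn, then project into a forward camera.

    Returns pixel coordinates (u, v); points at or behind the camera are skipped.
    """
    pixels = []
    for x, y in xy_points:
        if x <= 0.0:
            continue  # not in front of the camera
        z = height_fn(x, y)  # predicted ground height at (x, y)
        u = cx - focal_px * y / x                  # left of centre -> smaller u
        v = cy + focal_px * (cam_height - z) / x   # below camera -> larger v
        pixels.append((u, v))
    return pixels


def flat_ground(x: float, y: float) -> float:
    # Hypothetical height model: perfectly flat road at z = 0.
    return 0.0


pixels = project_ground_points([(10.0, 0.0), (20.0, 0.0), (10.0, 1.0)], flat_ground)
```

In a real system the projected points would be compared against the per-camera segmentation output, and the height model adjusted until the two agree; here the projection alone is shown.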
With this technique, the road information reconstructed by different vehicles passing through the same place is cross-compared. The reconstructions must agree at every location for the prediction to count as correct. Taken together, this yields an effective method for labeling road surfaces.
Figure 10: Labels from multiple video clips overlap and check one another
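A minimal sketch of such a cross-check, under the assumption that each trip has already been reduced to a map from grid cell to semantic label (the grid, the label names, and `fuse_trips` are all hypothetical): a cell's label is kept only when enough trips observed it and every one of them agrees.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def fuse_trips(trips: List[Dict[Cell, str]], min_trips: int = 2) -> Dict[Cell, str]:
    """Keep a cell's label only if at least min_trips trips saw it and all agree."""
    votes: Dict[Cell, List[str]] = {}
    for trip in trips:
        for cell, label in trip.items():
            votes.setdefault(cell, []).append(label)

    fused: Dict[Cell, str] = {}
    for cell, labels in votes.items():
        if len(labels) >= min_trips and len(set(labels)) == 1:
            fused[cell] = labels[0]
    return fused


trips = [
    {(0, 0): "lane_line", (0, 1): "curb"},
    {(0, 0): "lane_line", (0, 1): "road"},   # disagrees with trip 1 at (0, 1)
    {(0, 0): "lane_line", (1, 1): "road"},   # (1, 1) seen only once
]
fused = fuse_trips(trips)
```

Cells that fail the check are not necessarily wrong; in this sketch they are simply left unlabeled until more trips cover them.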
This is entirely different from high-precision maps. As long as the annotations generated from all the video clips become more and more accurate, and the labels match the actual road conditions in the video, this data does not need to be maintained afterwards.
The same techniques can also recognize and reconstruct static objects, and both textured and textureless objects can be labeled based on these 3D information points. The labeled points are very useful for helping the cameras recognize arbitrary obstacles.
Figure 11: Reconstruction of 3D information points for static objects
Another benefit of processing and labeling this data offline is the following. The network on a single vehicle can only predict the motion of other objects from the current moment, whereas offline the data sequence is fixed, so both the past and the future are known. Based on the known data, and regardless of whether objects are occluded, the velocity and acceleration of all objects can be predicted, calibrated, optimized, and labeled. Networks trained on these labels then judge the motion of other objects more accurately, which makes the planning module's job easier.
Figure 12: Offline calibration and labeling of vehicle and pedestrian velocity and acceleration
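One simple way to see why knowing the future helps, as an illustration and not Tesla's method: an online estimator can only difference the current position against the previous one, while an offline estimator can use a centered difference across the sample before and the sample after. The function names and the example track are hypothetical.

```python
from typing import List, Optional


def online_velocity(positions: List[float], dt: float) -> List[Optional[float]]:
    """Causal estimate: only the current and previous samples are available."""
    result: List[Optional[float]] = [None]  # no estimate for the first sample
    for i in range(1, len(positions)):
        result.append((positions[i] - positions[i - 1]) / dt)
    return result


def offline_velocity(positions: List[float], dt: float) -> List[float]:
    """Offline estimate: centered difference, one-sided at the two ends."""
    n = len(positions)
    if n < 2:
        raise ValueError("need at least two samples")
    result = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        result.append((positions[hi] - positions[lo]) / ((hi - lo) * dt))
    return result


# Object accelerating steadily: position = t^2, so the true velocity is 2t.
track = [0.0, 1.0, 4.0, 9.0, 16.0]   # samples at t = 0, 1, 2, 3, 4 seconds
online = online_velocity(track, dt=1.0)
offline = offline_velocity(track, dt=1.0)
```

For this track the interior offline estimates match the true velocities 2, 4, and 6 m/s, while the online estimates lag behind at 1, 3, and 5 m/s. Labels produced this way can then serve as training targets for the online network.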
Combining all of this yields, for each video clip, the recognition, prediction, and reconstruction of everything road-related, static and dynamic objects alike, together with labels for the dynamic data.
Figure 13: Reconstruction and annotation of the surrounding environment from video clips
This kind of video annotation will become the core of training autonomous-driving neural networks. In one project, within 3 months, networks trained on this data reproduced every function of the millimeter-wave radar, and did so more accurately, so the millimeter-wave radar was removed.
Figure 14: Speed and distance estimates remain accurate even when the camera can barely see
With the method verified as highly effective, the next requirement is massive video data for training. So in parallel, Tesla developed "simulation scene technology", which can simulate less common "edge scenes" for autonomous-driving training. As shown in Figure 4, in a simulated scenario Tesla engineers can supply different environments and other parameters (obstacles, collisions, comfort, and so on), which greatly improves training efficiency.
Figure 15: Simulation scenario
Tesla uses simulation to train its networks and has already used 3 billion images and 50 billion labels for this purpose. It will continue to use this approach to solve more problems.
Figure 16: Improvements expected from simulation in the coming months
In summary, improving the autonomous-driving network faster requires processing a large number of video clips and a great deal of computation. A simple example: removing the millimeter-wave radar alone required processing 2.5 million video clips and generating more than 100 billion labels. As a result, hardware is increasingly becoming the bottleneck for development speed.
Previously, Tesla used training hardware consisting of about 3,000 GPUs and slightly fewer than 20,000 CPUs, and added more than 2,000 FSD computers for simulation. This later grew into a 10,000-GPU machine, the world's fifth-largest supercomputer, but even that is not enough.
Figure 17: Specifications and evolution of the supercomputers currently in use
So Tesla decided to develop its own supercomputer .
"A pioneering feat of engineering": the D1 chip and the Dojo supercomputer
Today, as the data to be processed begins to grow exponentially, Tesla is also increasing the compute available for training neural networks; hence the Tesla Dojo supercomputer.
Tesla's goal is to achieve ultra-high compute for AI training and to handle large, complex neural network models while expanding bandwidth, reducing latency, and saving cost. This requires the layout of the Dojo supercomputer to strike the best balance between space and time.
As shown in the figure, the key building block of the Dojo supercomputer is the D1 chip, a neural network training chip developed in-house by Tesla. The D1 uses a distributed architecture and a 7 nm process, carries 50 billion transistors and 354 training nodes, and its internal wiring alone is 17.7 km long, delivering very high compute and bandwidth.
Figure 18: D1 chip specifications
Figure 19: The D1 chip shown live
As shown in the figure, a single Dojo training module consists of 25 D1 chips. Because the D1 chips are connected seamlessly, latency between adjacent chips is very low, and the training module preserves bandwidth to the greatest extent possible. Together with the high-bandwidth, low-latency connector Tesla created, the module delivers up to 9 PFLOPS (9 quadrillion operations per second) of compute and up to 36 TB/s of I/O bandwidth in less than 1 cubic foot.
Figure 20: Training module built from D1 chips
Figure 21: The training module shown live
Thanks to the training module's ability to operate independently and to be linked without limit, the performance of a Dojo supercomputer built from these modules can in theory scale without limit, making it a true "performance beast". As shown in Figure 9, in practice Tesla will assemble 120 training modules into an ExaPOD, a world-leading AI training computer. Compared with other products in the industry, it offers 4 times the performance at the same cost, 1.3 times the performance at the same energy consumption, and a 5-fold saving in space.
Figure 9: Training modules combined into an ExaPOD
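The figures quoted above can be combined with some quick arithmetic. The sketch below uses only the numbers in the article (25 chips per module, 9 PFLOPS per module, 120 modules per ExaPOD) and assumes peak throughput adds up linearly across modules, which is an idealization; the variable names are illustrative.

```python
# Figures taken from the article.
CHIPS_PER_MODULE = 25
MODULE_PFLOPS = 9            # peak, per training module
MODULES_PER_EXAPOD = 120

# Total D1 chips in one ExaPOD.
total_chips = CHIPS_PER_MODULE * MODULES_PER_EXAPOD

# Implied peak per chip, in TFLOPS (1 PFLOPS = 1,000 TFLOPS).
per_chip_tflops = MODULE_PFLOPS * 1000 / CHIPS_PER_MODULE

# Peak of the whole ExaPOD, assuming linear scaling across modules.
total_pflops = MODULE_PFLOPS * MODULES_PER_EXAPOD
total_eflops = total_pflops / 1000   # 1 EFLOPS = 1,000 PFLOPS

print(f"{total_chips} chips, about {per_chip_tflops:.0f} TFLOPS each, "
      f"{total_eflops:.2f} EFLOPS peak")
```

Under these assumptions an ExaPOD holds 3,000 D1 chips and reaches slightly more than 1 EFLOPS of peak compute, which is consistent with the "Exa" in its name.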
Matching the powerful hardware is a distributed system developed by Tesla, the DPU (Dojo Processing Unit). The DPU is visual, interactive software whose scale can be adjusted on demand at any time. It efficiently handles processing and computation as well as tasks such as data modeling, storage allocation, layout optimization, and partition expansion.
Tesla will soon begin the first assembly of the Dojo supercomputer and will make further improvements across the whole stack, from the supercomputer down to the chips and systems. When it comes to AI technology, Musk clearly has bigger ambitions. They show in his half-joking opening remark, "We have a technical problem, and I hope I can use AI to solve it", and even more in his closing promise that "we will further explore the whole human world".
Author: TechWeb. Please include the original link when reprinting. Thank you.