The World's Largest Chip Unlocks "Human-Brain-Scale" AI Models, with Clusters of Up to 163 Million Cores
2021-08-31 20:59:32 [TechWeb]
This morning, Cerebras Systems announced the world's first human-brain-scale AI solution: a single CS-2 AI computer can now support training models with more than 120 trillion parameters. For comparison, the human brain has roughly 100 trillion synapses.
In addition, Cerebras has achieved near-linear scaling across up to 192 CS-2 AI computers, creating compute clusters with as many as 163 million cores.
Founded in 2016, Cerebras now employs more than 350 engineers across 14 countries. The company previously stunned the industry with WSE and WSE-2, the world's largest compute chips.
Built on a 7nm process, WSE-2 is a single wafer-scale chip measuring 46,225 square millimeters, with 2.6 trillion transistors and 850,000 AI-optimized cores. Both its core count and on-chip memory capacity far exceed those of the most powerful GPUs to date.
WSE-2 is integrated into the Cerebras CS-2 AI computer. As the industry's large AI models have broken through the 1-trillion-parameter mark in recent years, small clusters have struggled to support high-speed training of a single model.
According to Cerebras's latest results, a single CS-2 machine can now support neural networks 100 times larger than the biggest existing models, reaching 120 trillion parameters.
At Hot Chips, the premier international chip-architecture conference, Cerebras co-founder and chief hardware architect Sean Lie presented the new technology portfolio in detail, comprising four innovations:
(1) Cerebras Weight Streaming: a new software execution architecture that, for the first time, stores model parameters off chip while delivering training and inference performance comparable to on-chip storage. This execution model decouples compute from parameter storage, making cluster size and speed more independent and flexible to scale; it also eliminates the latency and memory-bandwidth problems that large clusters typically face and greatly simplifies the workload-distribution model, so users can scale from 1 CS-2 to 192 CS-2s without changing their software.
(2) Cerebras MemoryX: a memory-extension technology that provides WSE-2 with up to 2.4 PB of off-chip high-performance storage while sustaining performance comparable to on-chip memory. With MemoryX, a CS-2 can support models of up to 120 trillion parameters.
(3) Cerebras SwarmX: a high-performance, AI-optimized communication fabric that extends the on-chip fabric off chip, letting Cerebras connect up to 192 CS-2s, totaling 163 million AI-optimized cores, to train a single neural network together.
(4) Selectable Sparsity: a dynamic sparsity-selection technique that lets users choose the degree of weight sparsity in a model, directly reducing FLOPs and time-to-solution. Weight sparsity has long been a challenge in machine learning because it runs extremely inefficiently on GPUs. This technology lets the CS-2 exploit every available type of sparsity, including unstructured and dynamic weight sparsity, to produce answers in less time.
Cerebras CEO and co-founder Andrew Feldman said the announcement advances the industry. Rick Stevens, associate laboratory director at Argonne National Laboratory, also endorsed the work, saying it will for the first time let researchers explore brain-scale models, opening up broad new avenues for research and insight.
1. Weight Streaming: Decoupling Compute from Storage to Hold Model Parameters Off Chip
One of the biggest challenges of using large clusters to solve AI problems is the complexity and time required to set up, configure, and optimize them for a specific neural network. The Cerebras Weight Streaming execution architecture reduces exactly this difficulty of programming cluster systems.
Weight Streaming builds on the enormous size of the WSE, completely separating compute from parameter storage. Combined with a MemoryX storage device configured up to 2.4 PB, a single CS-2 can support running 120 trillion parameters.
The 120-trillion-parameter neural network used in the test was developed internally by Cerebras and is not a published network.
In Weight Streaming, model weights are stored outside the chip and streamed onto the wafer, where they are used to compute each layer of the neural network. On the delta pass of training, gradients flow from the wafer back to the central MemoryX store, where the weights are updated.
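A minimal sketch of this execution model, with hypothetical names (Cerebras's actual software stack is not shown in the article): weights for one layer at a time stream from an off-chip store to the wafer, activations stay on the wafer, and gradients stream back so the weight update happens off chip.

```python
import numpy as np

class OffChipStore:
    """Stands in for MemoryX: holds all layer weights off chip (hypothetical)."""
    def __init__(self, layer_shapes, lr=0.01):
        rng = np.random.default_rng(0)
        self.weights = [rng.standard_normal(s) * 0.1 for s in layer_shapes]
        self.lr = lr

    def stream_out(self, i):
        # Weights for layer i flow from storage onto the "wafer".
        return self.weights[i]

    def apply_gradient(self, i, grad):
        # Gradients flow back; the update itself happens off chip.
        self.weights[i] -= self.lr * grad

def forward(store, x):
    # Only one layer's weights are resident at a time.
    acts = [x]
    for i in range(len(store.weights)):
        w = store.stream_out(i)
        x = np.maximum(x @ w, 0.0)    # compute the layer on the wafer (ReLU MLP)
        acts.append(x)
    return acts

def backward(store, acts, grad_out):
    # The "delta pass": gradients stream back layer by layer.
    for i in reversed(range(len(store.weights))):
        w = store.stream_out(i)
        grad_out = grad_out * (acts[i + 1] > 0)   # ReLU gradient
        grad_w = acts[i].T @ grad_out             # weight gradient for layer i
        grad_out = grad_out @ w.T                 # propagate before the update
        store.apply_gradient(i, grad_w)           # weight update happens off chip
```

Because the wafer only ever holds one layer's weights plus activations, the same mapping works unchanged whether one machine or many run the model.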
Unlike GPUs, which have very little on-chip memory and must partition large models across many chips, WSE-2 is large enough to fit and execute extremely large layers without conventional blocking or partitioning.
Because every model layer fits in on-chip memory without partitioning, each CS-2 in a cluster can be given the same workload mapping and perform the same per-layer computation independently of all the others.
The benefit is that users can take a model running on a single CS-2 and scale it to a cluster of any size without any software changes; running an AI model on a large cluster of CS-2 systems is programmed exactly like running it on a single CS-2.
Karl Freund, founder and principal analyst at Cambrian AI, commented: "The Weight Streaming execution model is clean and elegant, allowing work to be distributed far more easily across the incredible compute resources of a CS-2 cluster. With Weight Streaming, Cerebras removes all the complexity we face today in building and efficiently using large clusters, pushing the industry forward in what I think will be a transformational journey."
2. MemoryX: Enabling Hundred-Trillion-Parameter Models
A human-brain-scale AI model with 100 trillion parameters needs roughly 2 PB of memory to store.
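A back-of-envelope check of the 2 PB figure. The bytes-per-parameter count is an assumption (the article gives only the total); roughly 20 bytes per trained parameter, covering the weight plus gradient and Adam-style optimizer state, is a common estimate.

```python
# Back-of-envelope memory estimate for a 100-trillion-parameter model.
params = 100e12            # 100 trillion parameters
bytes_per_param = 20       # weight + gradient + optimizer state (assumed)
total_bytes = params * bytes_per_param
print(total_bytes / 1e15)  # -> 2.0 petabytes
```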
The key facility that, as described above, stores the neural network's parameter weights off chip and streams them efficiently to the CS-2 at near on-chip performance is Cerebras MemoryX.
MemoryX combines DRAM and flash, is designed to support the operation of large neural networks, and contains the intelligence for precise scheduling and weight updates.
Its architecture is scalable, supporting configurations from 4 TB to 2.4 PB, corresponding to model scales from 200 billion to 120 trillion parameters.
3. SwarmX: Near-Linear Scaling Across Up to 192 Interconnected CS-2s
Although a single CS-2 machine can store all the parameters of a given layer, Cerebras also proposes SwarmX, a high-performance interconnect fabric technology, to achieve data parallelism.
This technology extends the Cerebras on-chip fabric off chip, expanding the boundaries of AI clusters.
Historically, larger AI clusters have brought significant performance and power penalties: compute performance scales sub-linearly while power and cost grow super-linearly, and as more GPUs are added to a cluster, each processor contributes less and less to solving the problem.
The SwarmX fabric performs both communication and computation, enabling clusters to achieve near-linear performance scaling: scaling to 16 systems brings close to a 16x speedup in neural-network training. The fabric scales independently of MemoryX, and each MemoryX unit can serve any number of CS-2s.
In this fully decoupled mode, the SwarmX fabric supports scaling from 2 CS-2s up to 192. Since each CS-2 provides 850,000 AI-optimized cores, this supports clusters of up to 163 million AI-optimized cores.
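The headline numbers follow directly from this arithmetic:

```python
# Cluster arithmetic behind the headline figures.
cores_per_cs2 = 850_000                  # AI-optimized cores per WSE-2
max_systems = 192                        # CS-2 machines SwarmX can connect
total_cores = cores_per_cs2 * max_systems
print(total_cores)                       # -> 163200000, i.e. ~163 million cores

# Near-linear data-parallel scaling: N systems -> roughly N x training speed.
systems = 16
print(f"~{systems}x speedup at {systems} systems under ideal linear scaling")
```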
Feldman says CS-2 utilization is far higher: other approaches achieve utilization between 10% and 20%, while Cerebras achieves between 70% and 80% on the largest networks. "Today each CS-2 replaces hundreds of GPUs; with the cluster approach we can now replace thousands of GPUs."
4. Selectable Sparsity: Dynamic Sparsity Improves Compute Efficiency
Sparsity is key to improving compute efficiency. As the cost of the AI community's efforts to train large models grows exponentially, sparsity and other algorithmic techniques that reduce the FLOPs required to train a model to state-of-the-art accuracy become increasingly important.
Existing sparsity research has already delivered 10x speedups.
To accelerate training, Cerebras proposes a new sparsity method, Selectable Sparsity, which reduces the computation required to reach a solution and thereby shortens time-to-answer.
The Cerebras WSE is built on a fine-grained dataflow architecture designed for sparse computation: its 850,000 AI-optimized cores can skip zeros and compute only on non-zero data, something other architectures cannot do.
Neural networks exhibit many types of sparsity: it can exist in activations and in parameters, and it can be structured or unstructured.
The dataflow scheduling and enormous memory bandwidth unique to the Cerebras architecture enable this fine-grained processing to accelerate dynamic sparsity, unstructured sparsity, and other forms of sparsity. As a result, users can dial the CS-2 to a chosen degree of FLOP reduction, shortening time-to-answer.
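A toy illustration of the FLOP-counting argument behind this (the hardware mechanism itself is not shown; this only demonstrates why skipping zeros pays off): with roughly 90% unstructured weight sparsity, a core that multiplies only non-zero weights does about 10% of the dense work.

```python
import numpy as np

# Build a weight matrix with ~90% of entries zeroed (unstructured sparsity).
rng = np.random.default_rng(0)
w = rng.standard_normal((1000, 1000))
w[rng.random(w.shape) < 0.9] = 0.0

# A dense engine multiplies every weight; a zero-skipping engine
# touches only the non-zero ones (2 FLOPs per multiply-accumulate).
dense_flops = 2 * w.size
sparse_flops = 2 * np.count_nonzero(w)
print(sparse_flops / dense_flops)   # close to 0.1, i.e. ~10x fewer FLOPs
```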
Conclusion: The New Technology Portfolio Makes Cluster Scaling Less Complex
Large clusters have always been plagued by setup and configuration challenges, and preparing and optimizing a neural network to run on a large GPU cluster takes additional time. To achieve reasonable utilization on a GPU cluster, researchers must often manually partition the model, manage memory sizes and bandwidth limits, and perform complex, repetitive work such as extra hyperparameter and optimizer tuning.
With Weight Streaming, MemoryX, SwarmX, and related technologies, Cerebras simplifies the process of building large clusters. It has developed a fundamentally different architecture that eliminates the complexity of scaling: because WSE-2 is large enough, neural-network layers never need to be split across multiple CS-2 machines; even the largest layers of today's networks can be mapped onto a single CS-2.
Every CS-2 in a Cerebras cluster has the same software configuration, and adding another CS-2 barely changes how any job executes. Running a neural network on dozens of CS-2s is therefore the same as running it on a single system: setting up the cluster is as simple as compiling the workload for a single machine and applying the same mapping to all machines at the desired cluster size.
Overall, the new Cerebras technology portfolio is designed to accelerate very-large-scale AI models, but at the current stage of AI development, the number of organizations worldwide that can use such a cluster system is likely to be very limited.