
The World's Largest Chip Unlocks "Human-Brain-Scale" AI Models, with Clusters Packing 163 Million Cores

2021-08-31 20:59:32 TechWeb

 

This morning, Cerebras Systems announced the launch of the world's first brain-scale AI solution: a single CS-2 AI computer can support training at a scale of more than 120 trillion parameters. By comparison, the human brain has roughly 100 trillion synapses.

In addition, Cerebras has achieved near-linear scaling across up to 192 CS-2 AI computers, creating compute clusters with as many as 163 million cores.

Cerebras was founded in 2016 and today has more than 350 engineers across 14 countries. Its WSE and WSE-2, the world's largest compute chips, have already stunned the industry.

The WSE-2 is built on a 7nm process as a single wafer-scale chip with an area of 46,225 square millimeters, packing 2.6 trillion transistors and 850,000 AI-optimized cores. Both its core count and its on-chip memory capacity far exceed those of the most powerful GPUs to date.

The WSE-2 is integrated into the Cerebras CS-2 AI computer. As large industry AI models have broken through 1 trillion parameters in recent years, small clusters have struggled to support high-speed training of a single model.

According to Cerebras's latest published results, the neural network parameter scale a single CS-2 machine can support has been expanded to 100 times that of the largest existing models, reaching 120 trillion parameters.

At Hot Chips, the premier international chip architecture conference, Cerebras co-founder and chief hardware architect Sean Lie presented the new technology portfolio in detail. It comprises four innovations:

(1) Cerebras Weight Streaming: a new software execution architecture that, for the first time, stores model parameters off-chip while delivering the same training and inference performance as on-chip storage. This execution model disaggregates compute from parameter storage, making cluster size and speed scale more independently and flexibly; it also eliminates the latency and memory-bandwidth problems large clusters often face and greatly simplifies the workload distribution model, so users can scale from 1 CS-2 to 192 CS-2s without changing their software.

(2) Cerebras MemoryX: a memory extension technology that gives the WSE-2 up to 2.4 PB of off-chip high-performance storage while maintaining performance comparable to on-chip memory. With MemoryX, a CS-2 can support models with up to 120 trillion parameters.

(3) Cerebras SwarmX: a high-performance, AI-optimized communication fabric that extends the on-chip fabric off-chip, letting Cerebras connect the 163 million AI-optimized cores of up to 192 CS-2s to work together on training a single neural network.

(4) Selectable Sparsity: a dynamic sparsity selection technique that lets users choose the degree of weight sparsity in a model, directly reducing FLOPs and time-to-solution. Weight sparsity has long been a challenge in machine learning because it runs extremely inefficiently on GPUs. This technique lets the CS-2 accelerate work by exploiting every available type of sparsity, including unstructured and dynamic weight sparsity, to produce answers in less time.

Cerebras CEO and co-founder Andrew Feldman said the announcement pushes the industry forward. Rick Stevens, associate laboratory director at Argonne National Laboratory, also endorsed the work, saying it will let researchers explore brain-scale models for the first time, opening broad new avenues for research and insight.

1. Weight Streaming: Separating Compute from Storage to Keep Model Parameters Off-Chip

One of the biggest challenges in using large clusters to solve AI problems is the complexity and time required to set up, configure, and optimize them for a specific neural network. The Weight Streaming software execution architecture is designed precisely to reduce the difficulty of programming cluster systems.

Weight Streaming builds on the enormous size of the WSE, completely separating compute from parameter storage. Combined with a MemoryX storage appliance configured at up to 2.4 PB, a single CS-2 can support running 120 trillion parameters.

The 120-trillion-parameter neural network used in the test was developed internally by Cerebras; it is not a published neural network.

In Weight Streaming, the model weights are stored off the chip in the central MemoryX appliance and streamed onto the wafer to compute each layer of the neural network. On the backward (delta) pass of training, gradients flow from the wafer back to MemoryX, where they are used to update the weights.
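To make the data path concrete, here is a minimal, runnable Python sketch of the idea on a toy two-layer network. The MemoryXStore class and its fetch/apply_gradient methods are hypothetical stand-ins for illustration, not Cerebras's actual software:

```python
# Toy sketch of Weight Streaming: weights live in an off-chip store,
# stream in layer by layer for compute, and gradients stream back out
# so the weight update happens next to the stored parameters.
import numpy as np

class MemoryXStore:
    """Hypothetical stand-in for the off-chip parameter store."""
    def __init__(self, layer_shapes, lr=0.01):
        self.weights = [np.random.randn(*s) * 0.01 for s in layer_shapes]
        self.lr = lr

    def fetch(self, i):
        return self.weights[i]             # stream weights onto the "wafer"

    def apply_gradient(self, i, grad):
        self.weights[i] -= self.lr * grad  # weight update happens off-chip

def train_step(store, x, y):
    # Forward pass: only one layer's weights are on the wafer at a time.
    acts = [x]
    for i in range(len(store.weights)):
        acts.append(np.maximum(acts[-1] @ store.fetch(i), 0.0))  # linear+ReLU
    # Backward ("delta") pass: gradients flow back to the store.
    delta = acts[-1] - y                    # dLoss/dOutput for a squared loss
    for i in reversed(range(len(store.weights))):
        w = store.fetch(i)
        delta = delta * (acts[i + 1] > 0)   # ReLU derivative
        store.apply_gradient(i, acts[i].T @ delta)
        delta = delta @ w.T                 # propagate to the previous layer
    return float(np.mean((acts[-1] - y) ** 2))

store = MemoryXStore([(8, 16), (16, 4)])
print(train_step(store, np.random.randn(32, 8), np.random.randn(32, 4)))
```

The property the sketch illustrates is that on-wafer state is only the activations plus one layer's weights, which is why capacity scales with MemoryX rather than with on-chip memory.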

This differs from GPUs, whose on-chip memory is very small, forcing large models to be partitioned across multiple chips. The WSE-2 is large enough to fit and execute even very large layers, without the traditional blocking or partitioning used to decompose them.

Because every model layer fits in on-chip memory without partitioning, each CS-2 in the cluster can be given the same neural network workload mapping and perform the same computation for each layer, independently of all the others.

The benefit is that users can take a model running on a single CS-2 and scale it to a cluster of any size without any software changes. In other words, programming an AI model that runs on a large cluster of CS-2 systems is just like running the model on a single CS-2.

Karl Freund, founder and principal analyst at Cambrian AI, commented: "The Weight Streaming execution model is clean and elegant, allowing a much simpler distribution of work across the incredible compute resources of a CS-2 cluster. With Weight Streaming, Cerebras removes all the complexity we face today in building and efficiently using large clusters, pushing the industry forward in what I think will be a transformational journey."

2. MemoryX: Enabling Hundred-Trillion-Parameter Models

A brain-scale AI model with 100 trillion parameters needs roughly 2 PB of memory to store.

As described above, model parameters can be stored off-chip and efficiently streamed to the CS-2 at near on-chip performance. The key appliance that stores the neural network's parameter weights is Cerebras MemoryX.

MemoryX is a combination of DRAM and flash, designed to support the operation of large neural networks; it also contains the intelligence for precise scheduling and weight updates.

Its architecture is scalable, supporting configurations from 4 TB to 2.4 PB, covering parameter scales from 200 billion to 120 trillion.
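These endpoints are consistent with a simple back-of-the-envelope estimate. A rough figure of 20 bytes of storage per parameter (for example, half-precision weights plus full-precision optimizer state) is our assumption here, not a published Cerebras number:

```python
# Back-of-the-envelope check of MemoryX sizing (20 bytes/param is assumed).
BYTES_PER_PARAM = 20
TB, PB = 10**12, 10**15

def max_params(capacity_bytes):
    return capacity_bytes / BYTES_PER_PARAM

print(f"{max_params(4 * TB):.1e}")     # 2.0e+11 -> 200 billion parameters
print(f"{max_params(2.4 * PB):.1e}")   # 1.2e+14 -> 120 trillion parameters
print(100e12 * BYTES_PER_PARAM / PB)   # 2.0 PB for a 100-trillion-param model
```

The same 20-byte figure also reproduces the roughly 2 PB quoted above for a brain-scale, 100-trillion-parameter model.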

3. SwarmX: Near-Linear Scaling Performance, Interconnecting up to 192 CS-2s

Although a single CS-2 machine can store all the parameters of a given layer, Cerebras also introduced SwarmX, a high-performance interconnect fabric technology, to enable data parallelism.

This technology extends the Cerebras on-chip fabric off-chip, expanding the boundaries of an AI cluster.

Historically, larger AI clusters have brought significant performance and power penalties: compute performance grows linearly while power and cost grow super-linearly, and as more graphics processors are added to a cluster, each processor contributes less and less to solving the problem.

The SwarmX fabric both communicates and computes, enabling near-linear performance scaling for the cluster. This means that scaling to 16 systems makes neural network training close to 16 times faster. The fabric also scales independently of MemoryX: each MemoryX unit can serve any number of CS-2s.

In this fully disaggregated mode, the SwarmX fabric supports scaling from 2 CS-2s up to 192. Since each CS-2 provides 850,000 AI-optimized cores, clusters of up to 163 million AI-optimized cores become possible.
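The headline core count is simple arithmetic on those figures, and near-linear scaling can be expressed the same way; the efficiency factor below is purely illustrative, not a Cerebras specification:

```python
# Cluster core count and idealized near-linear scaling.
CORES_PER_CS2 = 850_000
MAX_SYSTEMS = 192
print(MAX_SYSTEMS * CORES_PER_CS2)   # 163,200,000 -> "163 million cores"

def cluster_speedup(n_systems, efficiency=0.95):
    # Near-linear scaling: speedup grows almost proportionally with systems.
    return n_systems * efficiency

print(cluster_speedup(16))           # ~15.2x, i.e. close to 16 times
```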

Feldman said CS-2 utilization is far higher than alternatives: other approaches sit between 10% and 20% utilization, while Cerebras reaches between 70% and 80% on the largest networks. "Today, every CS-2 has replaced hundreds of GPUs; with the cluster approach, we can now take on thousands of GPUs."

4. Selectable Sparsity: Dynamic Sparsity Improves Computational Efficiency

Sparsity is key to improving computational efficiency. As the cost of the AI community's push to train large models grows exponentially, sparsity and other algorithmic techniques that reduce the FLOPs required to train a model to state-of-the-art accuracy are becoming increasingly important.

Existing sparsity research has already delivered 10x speedups.

To accelerate training, Cerebras proposed a new sparsity approach, Selectable Sparsity, which reduces the amount of computation needed to reach a solution and thereby shortens time-to-answer.

The Cerebras WSE is built on a fine-grained dataflow architecture designed for sparse computation. Its 850,000 AI-optimized cores can skip zeros and compute only on non-zero data, something other architectures cannot do.

Neural networks contain many types of sparsity: sparsity can exist in activations and in parameters, and it can be structured or unstructured.

The Cerebras architecture's dataflow scheduling and huge memory bandwidth enable this fine-grained processing to accelerate dynamic sparsity, unstructured sparsity, and all other forms of sparsity. As a result, the CS-2 can select and dial in a degree of FLOP reduction, shortening time-to-answer.
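As a rough software analogy, not a model of the actual hardware, zero-skipping replaces dense multiply-accumulates with work proportional to the number of non-zero weights:

```python
# Software analogy for zero-skipping in a sparse matrix-vector product.
import numpy as np

def dense_matvec(w, x):
    return w @ x                       # dense: every product is computed

def zero_skipping_matvec(w, x):
    out = np.zeros(w.shape[0])
    rows, cols = np.nonzero(w)         # only non-zero weights trigger work
    for r, c in zip(rows, cols):
        out[r] += w[r, c] * x[c]
    return out

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256))
w[rng.random(w.shape) < 0.9] = 0.0     # ~90% unstructured weight sparsity
x = rng.standard_normal(256)

assert np.allclose(dense_matvec(w, x), zero_skipping_matvec(w, x))
print(f"~{w.size / np.count_nonzero(w):.0f}x fewer multiply-accumulates")
```

At roughly 90% unstructured sparsity, the multiply-accumulate count drops by about 10x, the same order as the speedup the sparsity research cited above reports.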

Conclusion: A Combination of New Technologies Removes the Complexity of Cluster Scaling

Large clusters have always been plagued by setup and configuration challenges, and preparing and optimizing neural networks to run on large GPU clusters takes additional time. To achieve reasonable utilization on a GPU cluster, researchers often must manually partition the model, manage memory sizes and bandwidth limits, and carry out complex, repetitive work such as tuning extra hyperparameters and optimizers.

Through Weight Streaming, MemoryX, SwarmX, and related technologies, Cerebras simplifies the process of building large clusters. It has developed a fundamentally different architecture that eliminates the complexity of scaling: because the WSE-2 is so large, there is no need to split a neural network's layers across multiple CS-2 machines; even the largest layers of today's networks map onto a single CS-2.

Every CS-2 in a Cerebras cluster carries the same software configuration, and adding another CS-2 barely changes the execution of any job. Running a neural network on dozens of CS-2s is therefore the same as running it on a single system: setting up a cluster is as simple as compiling a workload for one machine and applying the same mapping to all machines at the desired cluster size.

Overall, Cerebras's new technology portfolio is designed to accelerate very large-scale AI models, but given the current state of AI development, the number of organizations worldwide able to use such a cluster system is expected to be very limited.

 

Copyright notice
Author: TechWeb. Please include a link to the original article when reprinting.
https://en.fheadline.com/2021/08/20210831205920327m.html
