World's super-sized chip unlocks "brain-scale" AI models, with clusters of up to 163 million cores

2021-08-31 20:59:32 TechWeb


This morning, Cerebras Systems announced the world's first brain-scale AI solution: a single CS-2 AI computer can support training of models with more than 120 trillion parameters. By comparison, the human brain has about 100 trillion synapses.

In addition, Cerebras has achieved near-linear scaling across 192 CS-2 AI computers, making it possible to build a computing cluster with up to 163 million cores.

Cerebras was founded in 2016 and now has more than 350 engineers in 14 countries. Its earlier WSE and WSE-2, the world's largest computing chips, both shook the industry when they were released.

WSE-2 is built on a 7nm process. It is a single-wafer chip with an area of 46,225 square millimeters, 2.6 trillion transistors, and 850,000 AI-optimized cores. Both its core count and its on-chip memory capacity far exceed those of the highest-performing GPU to date.

WSE-2 is integrated into the Cerebras CS-2 AI computer. As the industry's large AI models have passed 1 trillion parameters in recent years, small clusters have struggled to support high-speed training of a single model.

With the results Cerebras has now published, the size of neural network that a single CS-2 can support grows to 100 times that of the largest existing models, reaching 120 trillion parameters.

At Hot Chips, the leading international conference on chip architecture, Cerebras co-founder and chief hardware architect Sean Lie presented the new technology portfolio in detail. It comprises four innovations:

(1) Cerebras Weight Streaming: a new software execution architecture that, for the first time, makes it possible to store model parameters off chip while delivering the same training and inference performance as if they were on chip. This execution model disaggregates compute from parameter storage, so cluster size and speed can be scaled more independently and flexibly. It also eliminates the latency and memory bandwidth problems that large clusters often face and greatly simplifies the workload distribution model, so users can scale from 1 CS-2 to 192 CS-2s without changing their software.

(2) Cerebras MemoryX: a memory extension technology that gives WSE-2 up to 2.4 PB of high-performance off-chip storage while maintaining performance comparable to on-chip memory. With MemoryX, a CS-2 can support models with up to 120 trillion parameters.

(3) Cerebras SwarmX: a high-performance, AI-optimized communication fabric that extends the on-chip fabric off chip. It lets Cerebras connect up to 192 CS-2s, with a combined 163 million AI-optimized cores, to work together on training a single neural network.

(4) Selectable Sparsity: a dynamic sparsity selection technique that lets users choose the degree of weight sparsity in a model and directly reduces FLOPs and time to solution. Weight sparsity has long been a challenge in machine learning because it is extremely inefficient on GPUs. This technology lets the CS-2 speed up work by using every available type of sparsity, including unstructured and dynamic weight sparsity, to produce answers in less time.

Andrew Feldman, CEO and co-founder of Cerebras, said the announcement moves the industry forward. Rick Stevens, associate director of Argonne National Laboratory, also endorsed the invention, saying it will be the first time researchers can explore brain-scale models, opening up broad new avenues for research and insight.

1. Weight Streaming: separating storage from compute, with model parameters stored off chip

One of the biggest challenges in using large clusters to solve AI problems is the complexity and time required to set up, configure, and optimize them for a specific neural network. The Cerebras Weight Streaming software execution architecture is meant to reduce exactly this difficulty of programming a cluster.

Weight Streaming builds on the very large size of the WSE and fully separates compute from parameter storage. Combined with MemoryX in its largest 2.4 PB configuration, a single CS-2 can support a model with 120 trillion parameters.

The 120-trillion-parameter neural network used in testing was developed internally by Cerebras and is not a published network.

In Weight Streaming, the model weights are stored outside the central chip and streamed onto the wafer, where they are used to compute each layer of the neural network. On the delta (backward) pass of training, the gradients are streamed from the wafer back to the central store, MemoryX, where they are used to update the weights.
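The flow described above can be sketched in a few lines of plain Python. This is a toy illustration, not Cerebras software: the `ParameterStore` class, `train_step` function, and the scalar "layers" are all hypothetical names chosen for the example. It only shows the division of labor, in which weights live in an external store, are streamed in one layer at a time, and gradients are sent back so the store can apply the update.

```python
# Toy weight-streaming loop. All names here are illustrative assumptions,
# not a Cerebras API. Each "layer" is a single scalar weight: y = w * x.

class ParameterStore:
    """Stands in for the off-chip store: holds weights and applies updates."""

    def __init__(self, weights, lr):
        self.weights = list(weights)  # one scalar weight per layer
        self.lr = lr

    def stream_out(self, layer):
        # Send one layer's weight to the compute device.
        return self.weights[layer]

    def apply_gradient(self, layer, grad):
        # The weight update happens in the store, not on the compute device.
        self.weights[layer] -= self.lr * grad


def train_step(store, x, target):
    n = len(store.weights)

    # Forward pass: stream each layer's weight in; keep only activations.
    acts = [x]
    for layer in range(n):
        w = store.stream_out(layer)
        acts.append(w * acts[-1])
    loss = 0.5 * (acts[-1] - target) ** 2

    # Backward ("delta") pass: stream weights in again, send gradients back.
    delta = acts[-1] - target
    for layer in reversed(range(n)):
        w = store.stream_out(layer)       # weight before this step's update
        grad = delta * acts[layer]        # dLoss/dw for this layer
        delta = delta * w                 # dLoss/d(input of this layer)
        store.apply_gradient(layer, grad)
    return loss


store = ParameterStore([0.5, 0.5], lr=0.1)
losses = [train_step(store, x=1.0, target=1.0) for _ in range(2)]
print(losses)
```

The compute side never holds more than one layer's weight at a time, which is the property the article attributes to the execution model; everything else about the real system (scheduling, precision, overlap of streaming and compute) is outside what this sketch covers.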

GPUs have very little on-chip memory, so large models must be partitioned across multiple chips. WSE-2, by contrast, is large enough to fit and execute very large layers without the traditional blocking or partitioning.

Because every model layer fits in on-chip memory without partitioning, each CS-2 in a cluster can be given the same workload mapping for a neural network and can perform the same computation for each layer independently of all the other CS-2s.

The benefit is that users can take a model that runs on a single CS-2 and scale it to a cluster of any size without making any software changes. In other words, running an AI model on a large cluster of CS-2 systems is programmed the same way as running it on a single CS-2.

Karl Freund, founder and chief analyst of Cambrian AI, commented: "The Weight Streaming execution model is simple and elegant, and it allows a much more straightforward distribution of work across the incredible compute resources of a CS-2 cluster. With Weight Streaming, Cerebras removes all the complexity we face today in building and efficiently using large clusters, moving the industry forward in what I think will be a transformational journey."

2. MemoryX: enabling models with over a hundred trillion parameters

A brain-scale AI model with 100 trillion parameters needs about 2 PB of memory to store.

As noted earlier, model parameters can be stored off chip and streamed efficiently to the CS-2 with near on-chip performance. The key component that stores the neural network's weights is Cerebras MemoryX.

MemoryX is a combination of DRAM and flash, designed to support running large neural networks. It also contains the intelligence to schedule and perform weight updates precisely.

Its architecture is scalable, supporting configurations from 4 TB to 2.4 PB and model sizes from 200 billion to 120 trillion parameters.
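The capacity figures quoted in this section are consistent with one another, which a short back-of-envelope check makes visible. The sketch below uses only numbers from the article, reads the lower bound as 200 billion parameters, and assumes decimal units (1 TB = 10^12 bytes); the function name is made up for the example. The article does not say how the bytes per parameter are spent, so no breakdown is implied.

```python
# Back-of-envelope check of the MemoryX figures quoted in the article.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 PB = 1000 TB).
TB = 10**12
PB = 1000 * TB


def bytes_per_parameter(capacity_bytes, parameters):
    return capacity_bytes / parameters


smallest = bytes_per_parameter(4 * TB, 200 * 10**9)       # 4 TB, 200 billion
largest = bytes_per_parameter(2400 * TB, 120 * 10**12)    # 2.4 PB, 120 trillion
brain_scale = bytes_per_parameter(2 * PB, 100 * 10**12)   # ~2 PB, 100 trillion

print(smallest, largest, brain_scale)  # 20.0 20.0 20.0
```

All three cases work out to roughly 20 bytes of storage per parameter.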

3. SwarmX: near-linear performance scaling, interconnecting up to 192 CS-2s

Although a single CS-2 can hold all the parameters of a given layer, Cerebras also proposes SwarmX, a high-performance interconnect fabric, to achieve data parallelism.

This technology extends the Cerebras on-chip fabric off chip, pushing out the boundary of AI clusters.

Historically, bigger AI clusters have come with significant performance and power penalties. In compute terms, performance has scaled sub-linearly while power and cost have scaled super-linearly. As more and more graphics processors are added to a cluster, each one contributes less and less to solving the problem.

The SwarmX fabric does both communication and computation, which lets a cluster achieve near-linear performance scaling. This means that scaling to 16 systems trains a neural network close to 16 times faster. The fabric scales independently of MemoryX, and each MemoryX unit can serve any number of CS-2s.
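The data-parallel pattern behind this can be illustrated with a toy example: the same weight is broadcast to every worker, each worker computes a gradient on its own shard of the batch, and the gradients are reduced on the way back. The sketch below is a generic illustration of that pattern, with hypothetical function names, and makes no claim about how SwarmX is implemented. With equal-sized shards, the reduced gradient matches the gradient computed on the whole batch by one worker.

```python
# Toy data-parallel gradient step for a one-weight model y = w * x.
# Function names are illustrative; this is not a SwarmX interface.

def local_gradient(w, shard):
    # Gradient of the mean of 0.5 * (w * x - y)**2 over one shard.
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def data_parallel_gradient(w, shards):
    grads = [local_gradient(w, shard) for shard in shards]  # one per worker
    return sum(grads) / len(grads)                          # reduction step


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
shards = [data[:2], data[2:]]  # two workers, equal-sized shards

print(data_parallel_gradient(1.0, shards), local_gradient(1.0, data))
```

Because each worker handles only its own shard, adding workers divides the per-step work; how close the real speedup comes to the ideal depends on how cheaply the broadcast and reduction can be done, which is the part the article says the fabric handles.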

In this fully disaggregated mode, the SwarmX fabric supports scaling from 2 CS-2s up to 192. Since each CS-2 provides 850,000 AI-optimized cores, it can support a cluster of up to 163 million AI-optimized cores.
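The headline core count follows directly from the two figures given here, as a quick check shows (constant names are chosen for the example):

```python
# Cluster core count from the per-system figure quoted in the article.
CORES_PER_CS2 = 850_000
MAX_SYSTEMS = 192

total_cores = CORES_PER_CS2 * MAX_SYSTEMS
print(f"{total_cores:,}")  # 163,200,000
```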

Feldman said the CS-2 achieves much higher utilization. Other approaches reach 10% to 20% utilization, while Cerebras reaches 70% to 80% on the largest networks. "Today each CS-2 replaces hundreds of GPUs, and with the cluster approach we can now replace thousands of GPUs."

4. Selectable Sparsity: dynamic sparsity improves computational efficiency

Sparsity is key to improving computational efficiency. As the AI community grapples with the exponentially growing cost of training large models, sparsity and other algorithmic techniques that reduce the FLOPs needed to train a model to state-of-the-art accuracy are becoming more and more important.

Existing sparsity research has already delivered speedups of 10 times.

To speed up training, Cerebras has proposed a new sparsity method, Selectable Sparsity, which reduces the computation required to find a solution and so shortens the time to answer.

The Cerebras WSE is based on a fine-grained dataflow architecture designed for sparse computation. Its 850,000 AI-optimized cores can skip zeros and compute only on non-zero data, something the article says other architectures cannot do.
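The idea of computing only on non-zero data can be shown with a small software analogy. The sketch below is purely illustrative: the article describes this happening in hardware at the core level, whereas this is an ordinary Python loop with a hypothetical `sparse_dot` helper. It counts how many multiply-accumulate operations are actually performed when zero weights are skipped.

```python
# Software analogy for zero-skipping: only non-zero weights cost a
# multiply-accumulate (MAC). Not a model of the actual hardware.

def sparse_dot(weights, activations):
    """Dot product that performs MACs only for non-zero weights."""
    total = 0.0
    macs = 0
    for w, a in zip(weights, activations):
        if w == 0.0:
            continue  # skipped: no work done for a zero weight
        total += w * a
        macs += 1
    return total, macs


weights = [0.0, 1.5, 0.0, 0.0, -2.0, 0.0, 0.0, 0.5]
activations = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

result, macs = sparse_dot(weights, activations)
print(result, macs, len(weights))  # -3.0 3 8
```

Here 3 of 8 weights are non-zero, so the sparse version does 3 MACs instead of 8 and returns the same result a dense dot product would. The saving in this analogy comes from skipping work entirely, which only pays off in practice when the hardware can skip at this granularity without overhead.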

Neural networks exhibit many types of sparsity. Sparsity can exist in activations and in parameters, and it can be structured or unstructured.

The dataflow scheduling unique to the Cerebras architecture, together with its huge memory bandwidth, enables this kind of fine-grained processing to accelerate dynamic sparsity, unstructured sparsity, and other forms of sparsity. As a result, CS-2 users can select and dial in the level of sparsity to produce a given degree of FLOP reduction, and with it a shorter time to answer.

Conclusion: the new technology portfolio makes cluster scaling less complex

Large clusters have always been plagued by setup and configuration challenges, and preparing and optimizing a neural network to run on a large GPU cluster takes even more time. To achieve reasonable utilization on a GPU cluster, researchers often have to partition the model by hand, manage memory size and bandwidth limits, and carry out complex, repetitive work such as additional hyperparameter and optimizer tuning.

With Weight Streaming, MemoryX, SwarmX, and related technologies, Cerebras simplifies the process of building large clusters. It has developed a completely different architecture intended to remove the complexity of scaling. Because WSE-2 is large enough, there is no need to split a neural network layer across multiple CS-2s: even today's largest network layers can be mapped onto a single CS-2.

Every CS-2 in a Cerebras cluster has the same software configuration, and adding another CS-2 changes almost nothing about how work is executed. Running a neural network on dozens of CS-2s therefore looks the same as running it on a single system. Setting up a cluster is as simple as compiling the workload for one machine and applying the same mapping to every machine in a cluster of the desired size.

Overall, the new Cerebras technology portfolio is designed to accelerate very large-scale AI models. At the current stage of AI development, however, the number of institutions worldwide able to use such a cluster system is expected to be very limited.


