2021-08-31 06:08:35 TechWeb

【TechWeb】August 25 news: Today, at the annual Hot Chips conference, Suiyuan Technology chief architect Liu Yan and senior chip design director Feng Chuang presented the architecture details of the company's first-generation cloud training chip, “Think Deeply 1.0”. Hot Chips is one of the world's major conferences on high-performance microprocessors and integrated circuits. Chip industry giants use it every year to show their latest work, including processor architectures, infrastructure computing platforms, memory processing, and other technologies.

Package diagram of “Think Deeply 1.0”, Suiyuan Technology's first-generation general-purpose AI training chip

Think Deeply 1.0 is the first-generation cloud AI training chip that Suiyuan Technology released in December 2019. It uses a multi-core structure, and its compute cores are built on Suiyuan Technology's self-developed GCU-CARE compute engine. The whole SoC has 32 GCU-CARE compute engines organized into 4 compute groups, and it supports the common AI tensor data formats (FP32/FP16/BF16, INT8/INT16/INT32), which broadens its coverage of customer workloads. CARE also reuses the tensor core in an innovative way, delivering scalar, vector, and tensor compute across the various data precisions with more efficient use of transistors.

The GCU-DARE data architecture is optimized for data flow and processes data as it moves. It pairs 512GB/s of HBM bandwidth with a 200GB/s GCU-LARE interconnect, several times what traditional GPUs and CPUs offer. A robust distributed on-chip shared cache provides a very large 10TB/s of bandwidth. The shared cache is programmable: data residency and sharing can be controlled within a thread and between threads, which eliminates unnecessary I/O accesses, reducing data access latency and saving valuable I/O bandwidth. The DARE architecture also provides an asynchronous data loading interface that supports pipelined execution of data movement and computation, improving the parallelism of operations.
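The idea of asynchronous loading pipelined with computation can be illustrated with a minimal, generic Python sketch. It is not Suiyuan Technology's interface: `pipelined_run`, `load`, `compute`, and `depth` are hypothetical names, and a thread plus a bounded queue stands in for the hardware data path. While one tile is being computed, the next one is already being loaded.

```python
import queue
import threading


def pipelined_run(tiles, load, compute, depth=2):
    """Overlap loading of the next tile with computation on the current one.

    `load` and `compute` are hypothetical callables standing in for a
    data-movement step and a compute step. `depth` bounds how many loaded
    tiles may wait in the buffer (2 corresponds to double buffering).
    """
    buffer = queue.Queue(maxsize=depth)
    sentinel = object()  # marks the end of the stream

    def loader():
        try:
            for tile in tiles:
                buffer.put(load(tile))  # blocks while the buffer is full
        finally:
            buffer.put(sentinel)  # always unblock the consumer

    threading.Thread(target=loader, daemon=True).start()

    results = []
    while True:
        item = buffer.get()  # blocks until the loader has produced data
        if item is sentinel:
            break
        results.append(compute(item))
    return results
```

With a single loader and a first-in, first-out buffer, results come back in input order. The bounded buffer is the design point: it lets loading run ahead of compute by at most `depth` tiles, so memory use stays fixed.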

The four-way GCU-LARE intelligent interconnect offers a 200GB/s high-speed, low-latency chip-to-chip interface. It flexibly supports computing needs at different scales, up to clusters of a thousand cards, and provides AI training product portfolios matched to the needs of large, medium, and small data centers.

“Think Deeply 1.0” SoC

The Think Deeply 1.0 AI accelerator chip is designed for cloud training scenarios. It supports CNN, RNN, LSTM, BERT, and other models, and can be used for image, streaming data, and speech training scenarios. It uses a standard PCIe 4.0 interface, is widely compatible with mainstream AI servers, meets the needs of large-scale data center deployment, and offers a leading energy efficiency ratio.

In the last part of the talk, Liu Yan also introduced the “Think Deeply 2.0” training chip, which had just been released at the World Artificial Intelligence Conference the previous month. After this upgrade, Think Deeply 2.0 greatly improves on the first-generation training product in compute, memory capacity and bandwidth, and interconnect capability, and its ability to support very large models is significantly enhanced. With it, Suiyuan Technology became the first company in China to release a second-generation AI training product portfolio.

Think Deeply 2.0 is a large-scale architecture upgrade, deeply optimized for the characteristics of AI computing while strengthening the foundation for general-purpose heterogeneous computing. It supports a full range of compute precisions, from FP32, TF32, FP16, and BF16 to INT8. Peak single-precision FP32 performance reaches 40 TFLOPS, and peak single-precision tensor TF32 performance reaches 160 TFLOPS. It also carries 4 HBM2E memory chips, supporting up to 64GB of memory and up to 1.8TB/s of bandwidth. GCU-LARE has been fully upgraded as well: it provides 300GB/s of bidirectional interconnect bandwidth and supports interconnecting thousands of Yunsui CloudBlazer accelerator cards with excellent linear speedup.
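A few ratios follow directly from the figures quoted above. The short Python sketch below works them out under two assumptions that the article does not state: capacity and bandwidth are split evenly across the 4 HBM2E chips, and units are decimal (1 TB = 1000 GB). The function name `summarize` is illustrative only.

```python
# Figures quoted in the article for Think Deeply 2.0.
FP32_PEAK_TFLOPS = 40
TF32_PEAK_TFLOPS = 160
HBM_CHIPS = 4
HBM_CAPACITY_GB = 64
HBM_BANDWIDTH_TBPS = 1.8


def summarize():
    """Derive simple ratios from the quoted peak figures."""
    bandwidth_gbps = HBM_BANDWIDTH_TBPS * 1000  # assumption: 1 TB = 1000 GB
    return {
        # How much faster the TF32 tensor peak is than the FP32 peak.
        "tf32_over_fp32": TF32_PEAK_TFLOPS / FP32_PEAK_TFLOPS,
        # Assumption: memory and bandwidth are divided evenly per chip.
        "gb_per_chip": HBM_CAPACITY_GB / HBM_CHIPS,
        "gbps_per_chip": bandwidth_gbps / HBM_CHIPS,
        # Time to read the full 64GB once at peak bandwidth, in milliseconds.
        "full_sweep_ms": HBM_CAPACITY_GB / bandwidth_gbps * 1000,
        # FP32 operations per byte of HBM traffic needed to reach peak compute.
        "fp32_flops_per_byte": FP32_PEAK_TFLOPS / HBM_BANDWIDTH_TBPS,
    }
```

Under these assumptions the TF32 peak is 4 times the FP32 peak, each chip holds 16GB at about 450GB/s, and a kernel needs roughly 22 FP32 operations per byte read from HBM before compute, rather than memory bandwidth, becomes the limit.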

“Think Deeply 2.0”, Suiyuan Technology's second-generation general-purpose AI training chip

The TopsRider software platform, upgraded in step with the chip, is the cornerstone on which Suiyuan Technology is building its original, innovative software ecosystem. Hardware-software co-design brings out the full performance of Think Deeply 2.0. Operator generalization technology and graph optimization strategies support training all kinds of models under the mainstream deep learning frameworks. The Horovod distributed training framework works together with GCU-LARE interconnect technology to provide a solution for running large-scale clusters efficiently. An open, upgraded programming model and an extensible operator interface give customers custom development capability for optimizing their models.
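Horovod is known for ring allreduce, a gradient-averaging scheme in which each worker talks only to its neighbor on a ring. The article does not say how TopsRider maps Horovod onto GCU-LARE, so the following is only a pure-Python simulation of the general technique, with the hypothetical name `ring_allreduce`; it calls no Horovod or Suiyuan Technology API.

```python
def ring_allreduce(worker_grads):
    """Simulate a sum ring allreduce over equally sized gradient lists.

    worker_grads[r] is the gradient held by worker r. Returns the list each
    worker holds afterwards; every entry equals the element-wise sum.
    """
    n = len(worker_grads)
    size = len(worker_grads[0])
    # Chunk c covers indices bounds[c] .. bounds[c + 1] - 1.
    bounds = [size * c // n for c in range(n + 1)]
    data = [list(g) for g in worker_grads]

    def chunk(c):
        return range(bounds[c], bounds[c + 1])

    # Phase 1, reduce-scatter: in step s, worker r sends chunk (r - s) % n
    # to worker (r + 1) % n, which adds it to its own copy.
    for s in range(n - 1):
        sent = [[data[r][i] for i in chunk((r - s) % n)] for r in range(n)]
        for r in range(n):
            dst = (r + 1) % n
            for k, i in enumerate(chunk((r - s) % n)):
                data[dst][i] += sent[r][k]

    # Worker r now holds the fully reduced chunk (r + 1) % n.
    # Phase 2, allgather: in step s, worker r forwards chunk (r + 1 - s) % n
    # to worker (r + 1) % n, which overwrites its copy.
    for s in range(n - 1):
        sent = [[data[r][i] for i in chunk((r + 1 - s) % n)] for r in range(n)]
        for r in range(n):
            dst = (r + 1) % n
            for k, i in enumerate(chunk((r + 1 - s) % n)):
                data[dst][i] = sent[r][k]

    return data
```

Each worker sends about 2 × (n − 1) / n of its gradient size in total, which stays nearly constant as workers are added. That property is why this pattern benefits from fast chip-to-chip links when scaling to many cards.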

Suiyuan Technology reportedly focuses on cloud computing platforms for artificial intelligence. It is committed to providing inclusive infrastructure solutions for the AI industry, offering general-purpose AI training and inference products that are programmable, high in compute and energy efficiency, and backed by independent intellectual property rights. Its innovative architecture, interconnect scheme, and distributed computing and programming platform can be applied widely in AI scenarios such as cloud data centers, supercomputing centers, the Internet, finance, and smart cities.

Company registry records show that Suiyuan Technology has previously completed several rounds of financing. On January 5, 2021, Suiyuan Technology announced the completion of a Series C round of RMB 1.8 billion, led by CITIC Industrial Fund, a fund under CICC Capital, and Chunhua Capital, with Tencent, Wu Yuefeng Capital, Redpoint Ventures China fund, and other new and existing shareholders participating.

