A very simple explanation of artificial neural networks
2021-08-31 06:08:31 [TechWeb]
Introduction
I don't know machine learning, but last month I came across a minimalist, entry-level neural network tutorial on GitHub, with sample code written in Go. It is simple and easy to understand, explains each idea with a single formula, and wastes no words. I really enjoyed reading it.
Something this good deserves a wider audience. The original is in English and could not simply be shared, so I first contacted the author for permission to translate it. Little Bear then translated the project, and the result is the article you are reading now. The process was arduous and took a month. If you find it useful, please give it a thumbs-up and share it with others.
The content is divided into two parts:
Part One: The simplest artificial neural network
Part Two: The most basic backpropagation algorithm
Artificial neural networks are the foundation of artificial intelligence. Only with a solid foundation can you work AI magic!
A reminder: there are many formulas, but they only look intimidating. With a little patience they are not hard to follow. Let's begin!
Part One: The simplest artificial neural network
The simplest artificial neural network, explained and demonstrated through theory and code.
Sample code: https://github.com/gokadin/ai-simplest-network
Theory
Simulating neurons
Artificial neural networks are inspired by how the human brain works. They consist of interconnected artificial neurons that store patterns and communicate with each other. In its simplest form, an artificial neuron has one or more inputs $x_i$, each with its own weight $w_i$, and one output $y$.
At the simplest level, the output is the sum of the inputs multiplied by their weights:
$$y = \sum_{i} w_i\,x_i$$
A simple example
The purpose of a network is to approximate a complex function through many parameters, so that a given set of inputs produces a specific output. These parameters are usually hard for us to work out by hand.
Suppose we have a network with two inputs, $x_1 = 0.2$ and $x_2 = 0.4$, with corresponding weights $w_1$ and $w_2$.
We now need to adjust the weights so that these inputs produce our desired output.
We can't know the optimal values in advance, so weights are usually initialized randomly. Here, for simplicity, we initialize both to 1.
In this case, we get
$$y = x_1 w_1 + x_2 w_2 = 0.2 \times 1 + 0.4 \times 1 = 0.6$$
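Below is a minimal Go sketch of the weighted-sum neuron and the calculation above. Go is the language of the tutorial's sample code. The function name `neuronOutput` is our own illustration, not taken from the original repository.

```go
package main

import "fmt"

// neuronOutput returns the weighted sum y = Σ w_i x_i of a simple linear neuron.
// It assumes inputs and weights have the same length.
func neuronOutput(inputs, weights []float64) float64 {
	sum := 0.0
	for i, x := range inputs {
		sum += weights[i] * x
	}
	return sum
}

func main() {
	inputs := []float64{0.2, 0.4}  // x1, x2 from the example
	weights := []float64{1.0, 1.0} // w1, w2, both initialized to 1
	// Approximately 0.6; float64 rounding may print a few extra digits.
	fmt.Println(neuronOutput(inputs, weights))
}
```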
The error
If the output does not match the expected target value, there is an error.
For example, if our target value is $t = 0.5$, the difference is
$$y - t = 0.6 - 0.5 = 0.1$$
A common way to measure the error (also called the cost function) is the squared error. The factor $\frac{1}{2}$ only serves to cancel the 2 that appears when differentiating:
$$E = \frac{1}{2}(y - t)^2$$
If there are multiple associations of input and target values, the error is the average over all $n$ of them:
$$E = \frac{1}{2n}\sum_{i=1}^{n}(y_i - t_i)^2$$
We use this squared error to measure how far the output is from our target. Squaring removes the sign of negative deviations and gives more weight to large deviations, whether positive or negative.
To correct the error, we need to adjust the weights so the output moves toward the target. In our example, lowering $w_1$ from 1.0 to 0.5 does the trick, because
$$y = 0.2 \times 0.5 + 0.4 \times 1 = 0.5$$
However, neural networks usually involve many different inputs and outputs. In that case we need a learning algorithm to perform this step automatically.
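Both cost functions can be sketched in Go as follows. The names `squaredError` and `meanSquaredError` are hypothetical helpers. The second association in the example (an output of 0.2 against a target of 0) is made-up data, used only to show the averaging.

```go
package main

import "fmt"

// squaredError is the single-association cost E = 1/2 (y - t)^2.
func squaredError(y, t float64) float64 {
	d := y - t
	return 0.5 * d * d
}

// meanSquaredError averages over n associations: E = 1/(2n) Σ (y_i - t_i)^2.
// It assumes len(ys) == len(ts) > 0.
func meanSquaredError(ys, ts []float64) float64 {
	sum := 0.0
	for i := range ys {
		d := ys[i] - ts[i]
		sum += d * d
	}
	return sum / (2 * float64(len(ys)))
}

func main() {
	fmt.Println(squaredError(0.6, 0.5))                                    // about 0.005
	fmt.Println(meanSquaredError([]float64{0.6, 0.2}, []float64{0.5, 0.0})) // about 0.0125
}
```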
Gradient descent
Now we want to use the error to find out how the weights should be adjusted so that the error is minimized. Before that, let's look at the concept of a gradient.
What is a gradient?
A gradient is essentially a vector pointing in the direction of steepest ascent of a function. We denote it with $\nabla$. Simply put, it is the vector of the function's partial derivatives with respect to each of its variables.
For a function of two variables, it looks like this:
$$\nabla f(x, y) = \left[\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right]$$
Let's plug in some numbers with a simple example. Suppose we have the function $f(x, y) = 2x^2 + 3y^3$. Then the gradient is
$$\nabla f(x, y) = \left[4x,\ 9y^2\right]$$
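The sketch below checks the gradient above numerically, assuming the example function $f(x, y) = 2x^2 + 3y^3$. It compares the analytic gradient with a central finite-difference approximation. The finite-difference check is our own addition, not part of the tutorial, and `gradF` and `numericGrad` are hypothetical names.

```go
package main

import "fmt"

// f is the example function f(x, y) = 2x^2 + 3y^3.
func f(x, y float64) float64 { return 2*x*x + 3*y*y*y }

// gradF is its analytic gradient [∂f/∂x, ∂f/∂y] = [4x, 9y^2].
func gradF(x, y float64) [2]float64 { return [2]float64{4 * x, 9 * y * y} }

// numericGrad approximates the gradient with central differences of step h,
// a common way to sanity-check hand-derived partial derivatives.
func numericGrad(x, y, h float64) [2]float64 {
	return [2]float64{
		(f(x+h, y) - f(x-h, y)) / (2 * h),
		(f(x, y+h) - f(x, y-h)) / (2 * h),
	}
}

func main() {
	fmt.Println(gradF(1, 2))             // [4 36]
	fmt.Println(numericGrad(1, 2, 1e-5)) // very close to [4 36]
}
```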
What is gradient descent?
The descent part means using the gradient to find the direction of steepest ascent of our function. We then repeatedly take small steps in the opposite direction, looking for the weights at which the error reaches its global (or sometimes only a local) minimum.
We use a constant called the learning rate, denoted $\epsilon$ in the formulas, to set the size of each step in the opposite direction.
If $\epsilon$ is too large, we risk overshooting the minimum. If it is too small, the network takes longer to learn and may get stuck in a shallow local minimum.
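The effect of the learning rate is easy to see on a toy error function. This sketch assumes $E(w) = w^2$, which is our own choice. It has a single minimum at $w = 0$, so it cannot show local minima. The code runs 100 gradient-descent steps from $w = 1$ with three illustrative values of $\epsilon$.

```go
package main

import "fmt"

// descend minimizes the toy error function E(w) = w^2 (gradient 2w)
// by taking `steps` gradient-descent steps of size epsilon, starting from w0.
func descend(w0, epsilon float64, steps int) float64 {
	w := w0
	for i := 0; i < steps; i++ {
		grad := 2 * w
		w -= epsilon * grad
	}
	return w
}

func main() {
	for _, eps := range []float64{0.001, 0.1, 1.1} {
		fmt.Printf("epsilon=%.3f -> w after 100 steps = %g\n", eps, descend(1.0, eps, 100))
	}
}
```

With $\epsilon = 0.001$ the weight is still about 0.82 after 100 steps, so learning is too slow. With $\epsilon = 0.1$ the weight is essentially 0. With $\epsilon = 1.1$ every step overshoots the minimum and the weight grows without bound.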
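For the two weights $w_1$ and $w_2$ in our example, we need the gradient of the error function with respect to these weights:
$$\nabla_w E = \left[\frac{\partial E}{\partial w_1}, \frac{\partial E}{\partial w_2}\right]$$
Remember $E = \frac{1}{2}(y - t)^2$ and $y = x_1 w_1 + x_2 w_2$ from above? Substituting them and applying the chain rule from calculus, we can compute the gradient for $w_1$ and $w_2$ separately:
$$\frac{\partial E}{\partial w_i} = \frac{\partial E}{\partial y}\,\frac{\partial y}{\partial w_i} = (y - t)\,x_i$$
For brevity, from now on we use $\delta$ to denote $\frac{\partial E}{\partial y} = y - t$.
Once we have the gradients, we plug in our chosen learning rate and update each weight as follows:
$$w_i \leftarrow w_i - \epsilon\,\frac{\partial E}{\partial w_i} = w_i - \epsilon\,\delta\,x_i$$
We then repeat this process until the error is minimal, close to zero.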
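Here is a minimal Go sketch of this update rule on the example above: $x_1 = 0.2$, $x_2 = 0.4$, $t = 0.5$, with both weights starting at 1. The learning rate $\epsilon = 1.0$ and the helper name `gradientStep` are illustrative assumptions, not values from the tutorial.

```go
package main

import "fmt"

// gradientStep performs one gradient-descent update for the linear neuron
// y = Σ w_i x_i with cost E = 1/2 (y - t)^2. It uses δ = y - t and
// w_i <- w_i - ε δ x_i, and returns the new weights plus the error before the update.
func gradientStep(x, w []float64, t, epsilon float64) ([]float64, float64) {
	y := 0.0
	for i := range x {
		y += w[i] * x[i]
	}
	delta := y - t // δ = ∂E/∂y
	next := make([]float64, len(w))
	for i := range w {
		next[i] = w[i] - epsilon*delta*x[i] // ∂E/∂w_i = δ x_i
	}
	return next, 0.5 * delta * delta
}

func main() {
	x := []float64{0.2, 0.4}
	w := []float64{1.0, 1.0}
	const t, epsilon = 0.5, 1.0 // ε = 1.0 is an illustrative choice

	w, e := gradientStep(x, w, t, epsilon)
	fmt.Println("after one step:", w, "error before it:", e)

	// Repeat until the error is (almost) zero.
	for e > 1e-12 {
		w, e = gradientStep(x, w, t, epsilon)
	}
	fmt.Println("converged weights:", w)
}
```

Every update is proportional to the input vector $(x_1, x_2)$, so gradient descent settles near $w_1 = 0.9$, $w_2 = 0.8$ rather than the hand-picked $w_1 = 0.5$, $w_2 = 1.0$. Both give $y = 0.5$.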
Code example
The accompanying example uses gradient descent to train a neural network with two inputs and one output on the following dataset:
x1   x2   |   t
1    1    |   0
1    0    |   1
Once trained, the network outputs ~0 when given two 1s and ~1 when given a 1 and a 0.
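The following is our own minimal re-creation of this setup, not the code from the original repository. It trains a linear neuron with two weights and no bias, using batch gradient descent on the mean squared error. The starting weights, learning rate (0.5) and stopping threshold ($10^{-6}$) are assumptions chosen for illustration, so the iteration count will not match the repository's output.

```go
package main

import "fmt"

// The two associations of the dataset: inputs and their targets.
var (
	inputs  = [][]float64{{1, 1}, {1, 0}}
	targets = []float64{0, 1}
)

// predict is the linear neuron y = w1*x1 + w2*x2 (no bias).
func predict(w, x []float64) float64 {
	return w[0]*x[0] + w[1]*x[1]
}

// train runs batch gradient descent until the mean squared error drops
// below tol (or maxIter is reached). It returns the weights and iterations used.
func train(epsilon, tol float64, maxIter int) ([]float64, int) {
	w := []float64{1, 1} // fixed start for reproducibility; real code would randomize
	n := float64(len(inputs))
	for iter := 1; iter <= maxIter; iter++ {
		grad := []float64{0, 0}
		errSum := 0.0
		for a, x := range inputs {
			delta := predict(w, x) - targets[a] // δ = y - t
			errSum += delta * delta
			for i := range w {
				grad[i] += delta * x[i] / n // average of δ x_i over associations
			}
		}
		if errSum/(2*n) < tol {
			return w, iter
		}
		for i := range w {
			w[i] -= epsilon * grad[i]
		}
	}
	return w, maxIter
}

func main() {
	w, iters := train(0.5, 1e-6, 10000)
	fmt.Println("iterations:", iters)
	for _, x := range inputs {
		fmt.Println(x, "=>", predict(w, x))
	}
}
```

This dataset has an exact solution, $w_1 = 1$ and $w_2 = -1$, and that is where the weights end up.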
How to run?
Go
    PS D:\github\ai-simplest-network-master\src> go build -o bin/test.exe
    PS D:\github\ai-simplest-network-master\bin> ./test.exe
    err: 1.7930306267024234
    err: 1.1763080417089242
    ……
    err: 0.00011642621631266815
    err: 0.00010770190838306002
    err: 9.963134967988221e-05
    Finished after 111 iterations
    Results
    [1 1] => [0.007421243532258703]
    [1 0] => [0.9879921757260246]
Docker
    docker build -t simplestnetwork .
    docker run --rm simplestnetwork
Part Two: The most basic backpropagation algorithm
Backpropagation (short for "backward propagation of errors", abbreviated BP) is a common method for training artificial neural networks. It is used together with an optimization method such as gradient descent.
Backpropagation can be used to train neural networks with at least one hidden layer. Let's work through the algorithm, starting from the theory and then looking at the code.
Sample code: https://github.com/gokadin/ai-backpropagation
Theory
Introducing the perceptron
A perceptron is a processing unit that takes an input $x$, transforms it with an activation function $f$, and outputs the result $y$.
In a neural network, a node's input is the weighted sum of the outputs $y_i$ of the previous layer's nodes, plus the previous layer's bias:
$$x_j = \sum_{i=1}^{I} w_{ij}\,y_i + b_j$$
We can treat the bias as an extra node in the previous layer whose output is always 1, with weight $w_{(I+1)j} = b_j$. The formula then simplifies to
$$x_j = \sum_{i=1}^{I+1} w_{ij}\,y_i$$
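The bias trick can be checked with a few lines of Go. This sketch uses made-up values and hypothetical function names. It computes $x_j$ once with an explicit bias $b_j$, and once by appending a constant-1 node whose weight is $b_j$.

```go
package main

import "fmt"

// withBias computes x_j = Σ w_ij y_i + b_j directly.
func withBias(y, w []float64, b float64) float64 {
	sum := b
	for i := range y {
		sum += w[i] * y[i]
	}
	return sum
}

// asExtraNode treats the bias as an extra node whose output is always 1 and
// whose weight is b_j, so x_j becomes a plain weighted sum over I+1 nodes.
func asExtraNode(y, w []float64, b float64) float64 {
	yExt := append(append([]float64{}, y...), 1) // y_{I+1} = 1
	wExt := append(append([]float64{}, w...), b) // w_{(I+1)j} = b_j
	sum := 0.0
	for i := range yExt {
		sum += wExt[i] * yExt[i]
	}
	return sum
}

func main() {
	y := []float64{0.3, -1.2}
	w := []float64{0.8, 0.5}
	// Both print about -0.11 (equal up to floating-point rounding).
	fmt.Println(withBias(y, w, 0.25), asExtraNode(y, w, 0.25))
}
```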
Activation functions
Why do we need activation functions? Without them, the output of each node would be linear, so the whole network would compute a linear function of its inputs. A composition of linear functions is still linear. We must therefore introduce a nonlinear function to make a neural network different from a linear regression model.
For an input $x$, typical activation functions include:
Sigmoid function: $f(x) = \frac{1}{1 + e^{-x}}$
Rectified linear unit (ReLU): $f(x) = \max(0, x)$
tanh function: $f(x) = \tanh(x)$
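The three activation functions above translate directly into Go with the standard `math` package (`math.Exp`, `math.Max`, `math.Tanh`). The wrapper names are our own.

```go
package main

import (
	"fmt"
	"math"
)

// sigmoid squashes x into (0, 1): f(x) = 1 / (1 + e^-x).
func sigmoid(x float64) float64 { return 1 / (1 + math.Exp(-x)) }

// relu is the rectified linear unit: f(x) = max(0, x).
func relu(x float64) float64 { return math.Max(0, x) }

// tanhAct wraps math.Tanh; its outputs lie in (-1, 1).
func tanhAct(x float64) float64 { return math.Tanh(x) }

func main() {
	for _, x := range []float64{-2, 0, 2} {
		fmt.Printf("x=%5.1f  sigmoid=%.4f  relu=%.1f  tanh=%.4f\n", x, sigmoid(x), relu(x), tanhAct(x))
	}
}
```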
Backpropagation
The backpropagation algorithm is used to train artificial neural networks, especially networks with more than two layers.
The idea is to use a forward pass to compute the network's output and error. We then work backward toward the input layer, updating each weight according to the error gradient.
Notation
$x_i$, $x_j$, $x_k$ are the inputs of a node in layers I, J and K, respectively.
$y_i$, $y_j$, $y_k$ are the outputs of a node in layers I, J and K, respectively.
$y'_k$ is the expected output of node k in the output layer K.
$w_{ij}$ and $w_{jk}$ are the weights of the connections from layer I to layer J and from layer J to layer K, respectively.
$t$ denotes the current association out of $T$ associations.
In the following examples, we use these activation functions for each layer's nodes:
Input layer -> identity function
Hidden layer -> sigmoid function
Output layer -> identity function
The forward pass
In the forward pass, we feed the inputs into the input layer and obtain the results at the output layer.
The input of each hidden-layer node is the weighted sum of the outputs of the input-layer nodes:
$$x_j = \sum_{i=1}^{I} w_{ij}\,y_i$$
Since the hidden layer's activation function is the sigmoid, its output is:
$$y_j = f_j(x_j) = \frac{1}{1 + e^{-x_j}}$$
Likewise, the input of each output-layer node is
$$x_k = \sum_{j=1}^{J} w_{jk}\,y_j$$
Since we chose the identity function as the output layer's activation, its output equals its input:
$$y_k = f_k(x_k) = x_k$$
Once the inputs have propagated through the network, we can compute the error. Remember the squared error from Part One? For a single association it is $E = \frac{1}{2}\sum_{k=1}^{K}(y_k - y'_k)^2$. With multiple associations we take the mean over all $T$ of them:
$$E = \frac{1}{2T}\sum_{t=1}^{T}\sum_{k=1}^{K}\left(y_{k,t} - y'_{k,t}\right)^2$$
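Below is a minimal Go sketch of this forward pass. It assumes no bias nodes, matching the notation above, and stores weights as `wIJ[i][j]` $= w_{ij}$ and `wJK[j][k]` $= w_{jk}$. The layout and function names are our own choices, not the original repository's. The `main` function feeds inputs [1.0, 1.0] through a 2x2x1 network with all weights set to 0.5 and a target of 0.5.

```go
package main

import (
	"fmt"
	"math"
)

func sigmoid(x float64) float64 { return 1 / (1 + math.Exp(-x)) }

// forward runs one association through an I-J-K network with identity input,
// sigmoid hidden and identity output activations. It returns y_j and y_k.
func forward(in []float64, wIJ, wJK [][]float64) (yJ, yK []float64) {
	yJ = make([]float64, len(wJK))
	for j := range yJ {
		xj := 0.0
		for i, yi := range in { // identity input layer: y_i = x_i
			xj += wIJ[i][j] * yi
		}
		yJ[j] = sigmoid(xj)
	}
	yK = make([]float64, len(wJK[0]))
	for k := range yK {
		for j, yj := range yJ {
			yK[k] += wJK[j][k] * yj // identity output layer: y_k = x_k
		}
	}
	return yJ, yK
}

// errorFor returns E = 1/2 Σ_k (y_k - y'_k)^2 for a single association.
func errorFor(yK, target []float64) float64 {
	e := 0.0
	for k := range yK {
		d := yK[k] - target[k]
		e += 0.5 * d * d
	}
	return e
}

func main() {
	wIJ := [][]float64{{0.5, 0.5}, {0.5, 0.5}}
	wJK := [][]float64{{0.5}, {0.5}}
	_, yK := forward([]float64{1, 1}, wIJ, wJK)
	fmt.Println(yK, errorFor(yK, []float64{0.5}))
}
```

The output is about 0.731, which is the sigmoid evaluated at $x_j = 1.0$.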
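The backward pass
Now that we have the error, we can propagate it backward to correct the network's weights.
From Part One, we know that a weight is adjusted by the negative partial derivative of the error with respect to that weight, multiplied by the learning rate:
$$\Delta w_{jk} = -\epsilon\,\frac{\partial E}{\partial w_{jk}}$$
For a single association, we compute the error gradient with the chain rule:
$$\frac{\partial E}{\partial w_{jk}} = \frac{\partial E}{\partial x_k}\,\frac{\partial x_k}{\partial w_{jk}} = \delta_k\,y_j$$
Here $\delta_k = \frac{\partial E}{\partial x_k}$. Because the output layer uses the identity function, $\delta_k = y_k - y'_k$.
Therefore, the weight adjustment is
$$\Delta w_{jk} = -\epsilon\,\delta_k\,y_j$$
For multiple associations, the weight adjustment is the sum of the adjustments for each association. If the error is averaged over the $T$ associations, the sum is also scaled by $\frac{1}{T}$, a constant that can be folded into $\epsilon$:
$$\Delta w_{jk} = -\epsilon\sum_{t=1}^{T}\delta_{k,t}\,y_{j,t}$$
The same reasoning applies to the weights feeding a hidden layer. Continuing the example above, the weights between the input layer and the first hidden layer are adjusted by
$$\Delta w_{ij} = -\epsilon\,\frac{\partial E}{\partial w_{ij}} = -\epsilon\,\frac{\partial E}{\partial x_j}\,\frac{\partial x_j}{\partial w_{ij}} = -\epsilon\,\delta_j\,y_i$$
where $\delta_j = \frac{\partial E}{\partial x_j}$. The adjustment over all associations is again the sum of the adjustments computed for each association:
$$\Delta w_{ij} = -\epsilon\sum_{t=1}^{T}\delta_{j,t}\,y_{i,t}$$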
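Calculating $\delta_j$
Here we can go a step further. Above we defined $\delta_j = \frac{\partial E}{\partial x_j}$, which the chain rule splits into two factors:
$$\delta_j = \frac{\partial E}{\partial y_j}\,\frac{\partial y_j}{\partial x_j}$$
For the first factor, since $y_j$ affects the error through every output node, we have
$$\frac{\partial E}{\partial y_j} = \sum_{k=1}^{K}\frac{\partial E}{\partial x_k}\,\frac{\partial x_k}{\partial y_j} = \sum_{k=1}^{K}\delta_k\,w_{jk}$$
For the second factor, we use the sigmoid function, whose derivative is $f'(x) = f(x)\,(1 - f(x))$. Therefore
$$\frac{\partial y_j}{\partial x_j} = f'(x_j) = y_j\,(1 - y_j)$$
Putting it together, we get
$$\delta_j = y_j\,(1 - y_j)\sum_{k=1}^{K}\delta_k\,w_{jk}$$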
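A gradient check is a common way to verify a derivation like this. The sketch below is our own addition and uses arbitrary example weights. It takes a small 2x2x1 network with no bias and one output, so the sum over $k$ has a single term. It computes $\partial E/\partial w_{ij} = \delta_j\,y_i$ from the formula above and compares it with a central finite difference. The two columns should agree to many decimal places.

```go
package main

import (
	"fmt"
	"math"
)

func sigmoid(x float64) float64 { return 1 / (1 + math.Exp(-x)) }

// lossAndGrad computes E = 1/2 (y_k - target)^2 for a 2-2-1 network
// (identity input/output, sigmoid hidden, no bias) and the analytic gradient
// ∂E/∂w_ij = δ_j y_i, with δ_j = y_j (1 - y_j) δ_k w_jk.
func lossAndGrad(in []float64, target float64, wIJ [2][2]float64, wJK [2]float64) (float64, [2][2]float64) {
	var yJ [2]float64
	yK := 0.0
	for j := 0; j < 2; j++ {
		yJ[j] = sigmoid(wIJ[0][j]*in[0] + wIJ[1][j]*in[1])
		yK += wJK[j] * yJ[j]
	}
	deltaK := yK - target // δ_k = y_k - y'_k (identity output)
	var grad [2][2]float64
	for j := 0; j < 2; j++ {
		deltaJ := yJ[j] * (1 - yJ[j]) * deltaK * wJK[j] // single output: one term in Σ_k
		for i := 0; i < 2; i++ {
			grad[i][j] = deltaJ * in[i]
		}
	}
	return 0.5 * deltaK * deltaK, grad
}

func main() {
	in := []float64{1.0, 0.5}
	wIJ := [2][2]float64{{0.3, -0.4}, {0.7, 0.2}}
	wJK := [2]float64{0.6, -0.1}
	_, grad := lossAndGrad(in, 0.5, wIJ, wJK)
	const h = 1e-6
	for i := 0; i < 2; i++ {
		for j := 0; j < 2; j++ {
			plus, minus := wIJ, wIJ // Go arrays are copied by value
			plus[i][j] += h
			minus[i][j] -= h
			ep, _ := lossAndGrad(in, 0.5, plus, wJK)
			em, _ := lossAndGrad(in, 0.5, minus, wJK)
			fmt.Printf("w_%d%d analytic=%.8f numeric=%.8f\n", i+1, j+1, grad[i][j], (ep-em)/(2*h))
		}
	}
}
```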
Algorithm summary
First, initialize the network's weights with small random values.
Repeat the following steps until the error is close to 0:
  For each association, propagate forward through the network to get the outputs, then:
    compute the $\delta$ term of each output node ($\delta_k = y_k - y'_k$)
    accumulate the gradient of each output weight ($\nabla_{w_{jk}}E \mathrel{+}= \delta_k\,y_j$)
    compute the $\delta$ term of each hidden-layer node ($\delta_j = y_j(1 - y_j)\sum_k \delta_k\,w_{jk}$)
    accumulate the gradient of each hidden-layer weight ($\nabla_{w_{ij}}E \mathrel{+}= \delta_j\,y_i$)
  Update all weights and reset the accumulated gradients ($w \leftarrow w - \epsilon\,\nabla_w E$)
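The summarized algorithm can be sketched in Go as below. This is our own illustration, not the original repository's code. It assumes a 2x2x1 network with bias nodes, handled as constant-1 nodes as in the perceptron section. It also assumes the XOR associations, a learning rate of 0.1, and fixed, arbitrary starting weights in place of random ones.

```go
package main

import (
	"fmt"
	"math"
)

func sigmoid(x float64) float64 { return 1 / (1 + math.Exp(-x)) }

// XOR associations: inputs and targets.
var (
	xs = [][2]float64{{0, 0}, {0, 1}, {1, 0}, {1, 1}}
	ts = []float64{0, 1, 1, 0}
)

// net is a 2-2-1 network; index 2 in each layer is a bias node whose output is always 1.
type net struct {
	wIJ [3][2]float64 // w_ij: 2 inputs + bias -> 2 hidden nodes
	wJK [3]float64    // w_jk: 2 hidden + bias -> 1 output node
}

// forward returns the hidden outputs y_j (including the bias node) and the output y_k.
func (n *net) forward(x [2]float64) ([3]float64, float64) {
	yI := [3]float64{x[0], x[1], 1} // identity input layer + bias
	yJ := [3]float64{0, 0, 1}
	for j := 0; j < 2; j++ {
		xj := 0.0
		for i := 0; i < 3; i++ {
			xj += n.wIJ[i][j] * yI[i]
		}
		yJ[j] = sigmoid(xj)
	}
	yK := 0.0
	for j := 0; j < 3; j++ {
		yK += n.wJK[j] * yJ[j] // identity output layer
	}
	return yJ, yK
}

// epoch accumulates gradients over all associations, then updates every weight once.
// It returns the mean error measured before the update.
func (n *net) epoch(epsilon float64) float64 {
	var gIJ [3][2]float64
	var gJK [3]float64
	errSum := 0.0
	for t, x := range xs {
		yI := [3]float64{x[0], x[1], 1}
		yJ, yK := n.forward(x)
		deltaK := yK - ts[t] // δ_k = y_k - y'_k
		errSum += 0.5 * deltaK * deltaK
		for j := 0; j < 3; j++ {
			gJK[j] += deltaK * yJ[j] // ∇w_jk E += δ_k y_j
		}
		for j := 0; j < 2; j++ { // the hidden bias node has no incoming weights
			deltaJ := yJ[j] * (1 - yJ[j]) * deltaK * n.wJK[j]
			for i := 0; i < 3; i++ {
				gIJ[i][j] += deltaJ * yI[i] // ∇w_ij E += δ_j y_i
			}
		}
	}
	for j := 0; j < 3; j++ {
		n.wJK[j] -= epsilon * gJK[j]
	}
	for i := 0; i < 3; i++ {
		for j := 0; j < 2; j++ {
			n.wIJ[i][j] -= epsilon * gIJ[i][j]
		}
	}
	return errSum / float64(len(xs))
}

func main() {
	n := &net{ // arbitrary asymmetric starting weights
		wIJ: [3][2]float64{{0.5, -0.4}, {0.3, 0.6}, {-0.1, 0.2}},
		wJK: [3]float64{0.4, -0.3, 0.1},
	}
	first := n.epoch(0.1)
	last := first
	for e := 0; e < 20000; e++ {
		last = n.epoch(0.1)
	}
	fmt.Printf("error: first epoch %.4f, last epoch %.6f\n", first, last)
	for _, x := range xs {
		_, y := n.forward(x)
		fmt.Printf("%v => %.3f\n", x, y)
	}
}
```

Each call to `epoch` accumulates gradients over all four associations before updating, as in the summary. Whether the printed outputs come close to 0 and 1 depends on the starting weights. A 2x2x1 network can stall in a local minimum, so if it does, try other initial values or more epochs.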
Visualizing backpropagation
In this example, we use actual numbers to follow each step through a network with two inputs, two hidden nodes and one output. The input is [1.0, 1.0] and the expected output is [0.5]. To keep things simple, we initialize all weights to 0.5; in practice, weights are usually initialized with random values. For the input, hidden and output layers we use the identity, sigmoid and identity functions, respectively, as activation functions. The learning rate is $\epsilon = 0.01$.
Forward pass
We start by setting the input-layer nodes to the input values: $x_1 = 1.0$, $x_2 = 1.0$.
Since the input layer uses the identity function as its activation, $y_i = x_i = 1.0$.
Next, we propagate forward to layer J. Each J-layer node's input is set to the weighted sum of the previous layer's outputs:
$$x_j = \sum_{i=1}^{2} w_{ij}\,y_i = 1.0 \times 0.5 + 1.0 \times 0.5 = 1.0$$
Then we activate the J-layer nodes by passing their inputs through the sigmoid function $f(x) = \frac{1}{1 + e^{-x}}$. Substituting $x_j = 1.0$ gives
$$y_j = f(1.0) = \frac{1}{1 + e^{-1.0}} \approx 0.731$$
Finally, we pass this result forward to the output layer:
$$x_k = \sum_{j=1}^{2} w_{jk}\,y_j = 0.731 \times 0.5 + 0.731 \times 0.5 = 0.731$$
Since our output layer's activation is also the identity function,
$$y_k = x_k = 0.731$$
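Backward pass
The first step of backpropagation is to compute the $\delta$ term of the output node:
$$\delta_k = y_k - y'_k = 0.731 - 0.5 = 0.231$$
Using it, we compute the gradient of each weight between the J and K layer nodes:
$$\nabla_{w_{jk}}E = \delta_k\,y_j = 0.231 \times 0.731 = 0.168861$$
Next, we compute the $\delta$ term of each hidden-layer node in the same way (in this example there is only one hidden layer):
$$\delta_j = y_j\,(1 - y_j)\sum_k \delta_k\,w_{jk} = 0.731 \times (1 - 0.731) \times (0.5 \times 0.231) \approx 0.0227$$
The gradient of each weight between the I and J layer nodes is then:
$$\nabla_{w_{ij}}E = \delta_j\,y_i = 0.0227 \times 1.0 = 0.0227$$
The last step is to update all weights with the computed gradients. If we had more than one association, we would first accumulate the gradients over all associations and then update the weights:
$$w_{ij} \leftarrow w_{ij} - \epsilon\,\nabla_{w_{ij}}E = 0.5 - 0.01 \times 0.0227 = 0.499773$$
$$w_{jk} \leftarrow w_{jk} - \epsilon\,\nabla_{w_{jk}}E = 0.5 - 0.01 \times 0.168861 = 0.49831139$$
As you can see, the weights change only very slightly. Even so, if we run the forward pass again with the updated weights, we should normally get a smaller error than before. Let's check...
On the first pass we got $y_1 = 0.731$. With the new weights, the output is $y_2 \approx 0.7285$.
Thus $y_1 - y'_1 = 0.231$ and $y_2 - y'_2 \approx 0.2285$.
The error has decreased! Although the reduction is small, it is quite representative of a real scenario. Repeating the algorithm many times normally brings the error down to nearly 0, at which point the network is trained.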
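A short Go sketch can reproduce the hand calculation above. The inputs are equal and all weights start at 0.5, so every $w_{ij}$ stays equal to every other, and likewise every $w_{jk}$. Two scalars, `a` and `b` (our own names), therefore describe the whole network. This symmetry shortcut is specific to this example and is not a general implementation.

```go
package main

import (
	"fmt"
	"math"
)

func sigmoid(x float64) float64 { return 1 / (1 + math.Exp(-x)) }

// step runs the worked example once. It does a forward pass on inputs [1, 1]
// through a 2-2-1 network (no bias) where every w_ij equals a and every w_jk
// equals b. It then applies one backpropagation update with learning rate
// epsilon toward target, and returns the output y_k and the new a and b.
func step(a, b, target, epsilon float64) (yK, newA, newB float64) {
	yI := 1.0                            // identity input layer: y_i = x_i = 1.0
	yJ := sigmoid(2 * a * yI)            // x_j = Σ_i w_ij y_i over two inputs
	yK = 2 * b * yJ                      // x_k = Σ_j w_jk y_j over two hidden nodes; identity output
	deltaK := yK - target                // δ_k = y_k - y'_k
	gradJK := deltaK * yJ                // ∇w_jk E = δ_k y_j
	deltaJ := yJ * (1 - yJ) * deltaK * b // δ_j = y_j (1 - y_j) Σ_k δ_k w_jk
	gradIJ := deltaJ * yI                // ∇w_ij E = δ_j y_i
	return yK, a - epsilon*gradIJ, b - epsilon*gradJK
}

func main() {
	y1, a, b := step(0.5, 0.5, 0.5, 0.01)
	fmt.Printf("first pass: y = %.4f, new w_ij = %.6f, new w_jk = %.6f\n", y1, a, b)
	y2, _, _ := step(a, b, 0.5, 0.01)
	fmt.Printf("second pass: y = %.4f\n", y2)
}
```

Without intermediate rounding, it gives $y \approx 0.731$ before the update and $y \approx 0.7285$ after it, in line with the rounded figures above.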
Code example
In this example, a 2x2x1 network is trained to emulate the XOR operator:
x1   x2   |   t
0    0    |   0
0    1    |   1
1    0    |   1
1    1    |   0
Here, $f$ is the sigmoid activation function of the hidden-layer nodes.
Note that XOR cannot be modeled by the linear network from Part One, because the dataset is not linearly separable: no straight line can divide the four XOR inputs into the correct two classes. If we replaced the sigmoid function with the identity function, this network would not work either.
That's enough talk; now it's your turn! Try different activation functions, learning rates and network topologies, and see how they perform.
Thanks to the original author for authorizing this translation.
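The following sketch makes this concrete. It is our own and not from the original repositories. It trains the Part One linear network, $y = w_1 x_1 + w_2 x_2$ with no bias, on XOR using batch gradient descent. The learning rate and iteration count are illustrative assumptions.

```go
package main

import "fmt"

// trainLinearXOR fits the linear neuron y = w1*x1 + w2*x2 (no hidden layer, no bias)
// to XOR with batch gradient descent, starting from zero weights.
// It returns the final weights and the error E = 1/(2n) Σ (y - t)^2 at those weights.
func trainLinearXOR(epsilon float64, iters int) (w [2]float64, mse float64) {
	xs := [][2]float64{{0, 0}, {0, 1}, {1, 0}, {1, 1}}
	ts := []float64{0, 1, 1, 0}
	n := float64(len(xs))
	for it := 0; it <= iters; it++ {
		var grad [2]float64
		mse = 0
		for k, x := range xs {
			delta := w[0]*x[0] + w[1]*x[1] - ts[k] // δ = y - t
			mse += delta * delta / (2 * n)
			grad[0] += delta * x[0] / n
			grad[1] += delta * x[1] / n
		}
		if it < iters { // the last pass only measures the error
			w[0] -= epsilon * grad[0]
			w[1] -= epsilon * grad[1]
		}
	}
	return w, mse
}

func main() {
	w, mse := trainLinearXOR(0.5, 1000)
	fmt.Printf("weights %.4f, error %.4f\n", w, mse)
}
```

The weights settle at $w_1 = w_2 = 1/3$, the least-squares optimum, and the error stays at $1/6 \approx 0.167$ instead of approaching 0.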
Copyright notice
Author: TechWeb. Please include the original link when reprinting. Thank you.
https://en.fheadline.com/2021/08/20210831060829017B.html