Feedforward backpropagation

From Eyewire
Revision as of 15:50, 24 June 2014

Feedforward backpropagation is an error-driven learning technique popularized in 1986 by David Rumelhart (1942–2011), an American psychologist, Geoffrey Hinton (b. 1947), a British computer scientist, and Ronald Williams, an American professor of computer science.[1] It is a supervised learning technique, meaning that the desired outputs are known beforehand, and the task of the network is to learn to generate the desired outputs from the inputs.

Model

[Figure: model of a neuron. j is the index of the neuron when there is more than one neuron. The activation function for feedforward backpropagation is sigmoidal.]


Given a set of k-dimensional inputs with values between 0 and 1 represented as a column vector:

<math>\vec{x} = [x_1, x_2, \cdots, x_k]^T</math>

and a nonlinear neuron with (initially random, uniformly distributed between -1 and 1) synaptic weights from the inputs:

<math>\vec{w} = [w_1, w_2, \cdots, w_k]^T</math>

then the output of the neuron is defined as follows:

<math>y = \varphi \left ( \vec{w}^T \vec{x} \right ) = \varphi \left ( \sum_{i=1}^k w_i x_i \right )</math>

where <math>\varphi \left ( \cdot \right )</math> is a sigmoidal function. We will assume that the sigmoidal function is the simple logistic function:

<math>\varphi \left ( \nu \right ) = \frac{1}{1+e^{-\nu}}</math>

This function has the useful property that

<math>\frac{\mathrm{d} \varphi }{\mathrm{d} \nu} = \varphi \left ( \nu \right ) \left ( 1 - \varphi \left ( \nu \right ) \right )</math>
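The logistic function and its derivative identity can be checked numerically; the following is a minimal sketch (the function name `logistic` is our own):

```python
import math

def logistic(v):
    """Simple logistic sigmoid: 1 / (1 + e^(-v))."""
    return 1.0 / (1.0 + math.exp(-v))

# Verify d(phi)/dv = phi(v) * (1 - phi(v)) against a central difference.
v, h = 0.7, 1e-6
numeric = (logistic(v + h) - logistic(v - h)) / (2 * h)
analytic = logistic(v) * (1 - logistic(v))
assert abs(numeric - analytic) < 1e-8
```

This identity is what makes the weight-update rules below cheap to compute: the derivative of the activation is obtained from the activation itself, with no extra evaluation of the exponential.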


Feedforward backpropagation is typically applied to multiple layers of neurons, where the inputs are called the input layer, the layer of neurons taking the inputs is called the hidden layer, and the next layer of neurons taking their inputs from the outputs of the hidden layer is called the output layer. There is no direct connectivity between the output layer and the input layer.

If there are <math>N_I</math> inputs, <math>N_H</math> hidden neurons, and <math>N_O</math> output neurons, and the weights from inputs to hidden neurons are <math>w_{Hij}</math> (<math>i</math> being the input index and <math>j</math> being the hidden neuron index), and the weights from hidden neurons to output neurons are <math>w_{Oij}</math> (<math>i</math> being the hidden neuron index and <math>j</math> being the output neuron index), then the equations for the network are as follows:

<math>\begin{align}
n_{Hj} &= \sum_{i=1}^{N_I} w_{Hij} x_i, \quad j \in \left \{ 1, 2, \cdots, N_H \right \} \\
y_{Hj} &= \varphi \left ( n_{Hj} \right ) \\
n_{Oj} &= \sum_{i=1}^{N_H} w_{Oij} y_{Hi}, \quad j \in \left \{ 1, 2, \cdots, N_O \right \} \\
y_{Oj} &= \varphi \left ( n_{Oj} \right )
\end{align}</math>
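The forward pass is just two weighted sums, each followed by the sigmoid; a minimal sketch (the layer sizes, names, and nested-list weight layout are our own choices):

```python
import math
import random

def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))

def forward(x, w_h, w_o):
    """Forward pass: x is the input vector, w_h[i][j] the input->hidden
    weights, w_o[i][j] the hidden->output weights."""
    y_h = [logistic(sum(w_h[i][j] * x[i] for i in range(len(x))))
           for j in range(len(w_h[0]))]
    y_o = [logistic(sum(w_o[i][j] * y_h[i] for i in range(len(y_h))))
           for j in range(len(w_o[0]))]
    return y_h, y_o

# Example: 2 inputs, 3 hidden neurons, 1 output neuron,
# weights initially random and uniform in [-1, 1] as described above.
random.seed(0)
w_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_o = [[random.uniform(-1, 1) for _ in range(1)] for _ in range(3)]
y_h, y_o = forward([0.2, 0.9], w_h, w_o)
assert len(y_h) == 3 and len(y_o) == 1
assert all(0.0 < v < 1.0 for v in y_h + y_o)
```

Because every activation passes through the logistic function, all hidden and output values lie strictly between 0 and 1, matching the range assumed for the inputs.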


If the desired outputs for a given input vector are <math>t_j, j \in \left \{ 1, 2, \cdots, N_O \right \}</math>, then the update rules for the weights are as follows:

<math>\begin{align}
\delta_{Oj} &= \left ( t_j-y_{Oj} \right ) \\
\Delta w_{Oij} &= \eta \delta_{Oj} y_{Hi} \\
\delta_{Hj} &= \left ( \sum_{k=1}^{N_O} \delta_{Ok} w_{Ojk} \right ) y_{Hj} \left ( 1-y_{Hj} \right ) \\
\Delta w_{Hij} &= \eta \delta_{Hj} x_i
\end{align}</math>

where <math>\eta</math> is some small learning rate, <math>\delta_{Oj}</math> is an error term for output neuron <math>j</math>, and <math>\delta_{Hj}</math> is a backpropagated error term for hidden neuron <math>j</math>.
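Applied repeatedly over a training set, these update rules drive the outputs toward the targets. The following sketch trains a tiny network on a two-pattern toy task; the layer sizes, seed, learning rate, and data are our own illustrative choices:

```python
import math
import random

def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))

def predict(x, w_h, w_o):
    """Forward pass through the hidden and output layers."""
    y_h = [logistic(sum(w_h[i][j] * x[i] for i in range(len(x))))
           for j in range(len(w_h[0]))]
    y_o = [logistic(sum(w_o[i][j] * y_h[i] for i in range(len(y_h))))
           for j in range(len(w_o[0]))]
    return y_h, y_o

def train_step(x, t, w_h, w_o, eta=0.5):
    """One feedforward backpropagation update for a single pattern."""
    y_h, y_o = predict(x, w_h, w_o)
    # Output error terms: delta_Oj = t_j - y_Oj.
    d_o = [t[j] - y_o[j] for j in range(len(y_o))]
    # Backpropagated hidden error terms, computed before w_o changes.
    d_h = [sum(d_o[k] * w_o[j][k] for k in range(len(y_o)))
           * y_h[j] * (1 - y_h[j]) for j in range(len(y_h))]
    for i in range(len(y_h)):
        for j in range(len(y_o)):
            w_o[i][j] += eta * d_o[j] * y_h[i]
    for i in range(len(x)):
        for j in range(len(y_h)):
            w_h[i][j] += eta * d_h[j] * x[i]

# Toy task: output high for one input pattern, low for the other.
random.seed(1)
w_h = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
w_o = [[random.uniform(-1, 1) for _ in range(1)] for _ in range(4)]
data = [([0.9, 0.1], [1.0]), ([0.1, 0.9], [0.0])]
for _ in range(2000):
    for x, t in data:
        train_step(x, t, w_h, w_o)
_, y_a = predict([0.9, 0.1], w_h, w_o)
_, y_b = predict([0.1, 0.9], w_h, w_o)
assert y_a[0] > 0.7 and y_b[0] < 0.3
```

Note that the hidden error terms must be computed with the output weights as they were before their update, since the backpropagated sum uses the same weights that produced the forward pass.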

Derivation

We first define an error term which is the cross-entropy of the output and target. We use cross-entropy because, in a sense, each output neuron represents a hypothesis about what the input represents, and the activation of the neuron represents a probability that the hypothesis is correct.

<math>E = -\sum_{j=1}^{N_O} \left [ t_j \ln y_{Oj} + \left ( 1-t_j \right ) \ln \left ( 1 - y_{Oj} \right ) \right ]</math>

The lower the cross entropy, the more accurately the network represents what needs to be learned.
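This can be seen directly by evaluating the error for outputs at varying distances from the targets; a minimal sketch with hypothetical toy values:

```python
import math

def cross_entropy(t, y):
    """E = -sum_j [ t_j ln y_j + (1 - t_j) ln(1 - y_j) ]."""
    return -sum(tj * math.log(yj) + (1 - tj) * math.log(1 - yj)
                for tj, yj in zip(t, y))

# The closer the outputs are to the targets, the lower the error.
t = [1.0, 0.0]
assert cross_entropy(t, [0.9, 0.1]) < cross_entropy(t, [0.6, 0.4])
```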

Next, we determine how the error changes based on changes to an individual weight from hidden neuron to output neuron:

<math>\begin{align}

\frac{\partial E }{\partial w_{Oij}} &= \frac{\partial E }{\partial y_{Oj}} \frac{\mathrm{d} y_{Oj} }{\mathrm{d} n_{Oj}} \frac{\partial n_{Oj}}{\partial w_{Oij}} \\ &= - \left [ \frac{t_j}{y_{Oj}} - \frac{1-t_j}{1-y_{Oj}} \right ] \frac{\mathrm{d} \varphi }{\mathrm{d} n_{Oj}} y_{Hi} \\ &= - \left [ \frac{t_j}{y_{Oj}} - \frac{1-t_j}{1-y_{Oj}} \right ] \varphi \left ( n_{Oj} \right ) \left ( 1 - \varphi \left ( n_{Oj} \right ) \right ) y_{Hi} \\ &= - \left [ \frac{t_j}{y_{Oj}} - \frac{1-t_j}{1-y_{Oj}} \right ] y_{Oj} \left ( 1-y_{Oj} \right ) y_{Hi} \\ &= \left ( y_{Oj} - t_j \right ) y_{Hi}

\end{align}</math>

We then want to change <math>w_{Oij}</math> slightly in the direction which reduces <math>E</math>, that is, <math>\Delta w_{Oij} \propto - \partial E / \partial w_{Oij}</math>. This is called gradient descent.

<math>\begin{align}

\Delta w_{Oij} &= - \eta \left ( y_{Oj} - t_j \right ) y_{Hi} \\ &= \eta \left ( t_j - y_{Oj} \right ) y_{Hi} \\ &= \eta \delta_{Oj} y_{Hi}

\end{align}</math>

We do the same thing to find the update rule for the weights between input and hidden neurons:

<math>\begin{align}

\frac{\partial E }{\partial w_{Hij}} &= \frac{\partial E }{\partial y_{Hj}} \frac{\mathrm{d} y_{Hj} }{\mathrm{d} n_{Hj}} \frac{\partial n_{Hj}}{\partial w_{Hij}} \\ &= \left ( \sum_{k=1}^{N_O} \frac{\partial E }{\partial y_{Ok}} \frac{\mathrm{d} y_{Ok} }{\mathrm{d} n_{Ok}} \frac{\partial n_{Ok}}{\partial y_{Hj}} \right ) \frac{\mathrm{d} y_{Hj} }{\mathrm{d} n_{Hj}} \frac{\partial n_{Hj}}{\partial w_{Hij}} \\ &= \left ( \sum_{k=1}^{N_O} \left ( y_{Ok} - t_k \right ) w_{Ojk} \right ) \frac{\mathrm{d} \varphi }{\mathrm{d} n_{Hj}} x_i \\ &= \left ( \sum_{k=1}^{N_O} \left ( y_{Ok} - t_k \right ) w_{Ojk} \right ) y_{Hj} \left ( 1 - y_{Hj} \right ) x_i \\ &= \left ( \sum_{k=1}^{N_O} - \delta_{Ok} w_{Ojk} \right ) y_{Hj} \left ( 1 - y_{Hj} \right ) x_i

\end{align}</math>

We then want to change <math>w_{Hij}</math> slightly in the direction which reduces <math>E</math>, that is, <math>\Delta w_{Hij} \propto - \partial E / \partial w_{Hij}</math>:

<math>\begin{align}

\Delta w_{Hij} &= - \eta \left ( \sum_{k=1}^{N_O} - \delta_{Ok} w_{Ojk} \right ) y_{Hj} \left ( 1 - y_{Hj} \right ) x_i \\ &= \eta \left ( \sum_{k=1}^{N_O} \delta_{Ok} w_{Ojk} \right ) y_{Hj} \left ( 1 - y_{Hj} \right ) x_i \\ &= \eta \delta_{Hj} x_i

\end{align}</math>
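As a sanity check on the derivation, the closed-form gradient <math>\partial E / \partial w_{Oij} = \left ( y_{Oj} - t_j \right ) y_{Hi}</math> can be compared against a finite-difference approximation on a tiny network; the sizes, seed, and input values below are our own illustrative choices:

```python
import math
import random

def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))

def net_error(x, t, w_h, w_o):
    """Forward pass plus cross-entropy error for one pattern."""
    y_h = [logistic(sum(w_h[i][j] * x[i] for i in range(len(x))))
           for j in range(len(w_h[0]))]
    y_o = [logistic(sum(w_o[i][j] * y_h[i] for i in range(len(y_h))))
           for j in range(len(w_o[0]))]
    e = -sum(t[j] * math.log(y_o[j]) + (1 - t[j]) * math.log(1 - y_o[j])
             for j in range(len(y_o)))
    return e, y_h, y_o

random.seed(2)
x, t = [0.3, 0.8], [1.0]
w_h = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
w_o = [[random.uniform(-1, 1) for _ in range(1)] for _ in range(2)]

e0, y_h, y_o = net_error(x, t, w_h, w_o)
closed_form = (y_o[0] - t[0]) * y_h[1]   # dE/dw_Oij for i = 1, j = 0

# Nudge that one weight and measure the change in E.
h = 1e-6
w_o[1][0] += h
e1, _, _ = net_error(x, t, w_h, w_o)
numeric = (e1 - e0) / h
assert abs(numeric - closed_form) < 1e-5
```

The same check can be repeated for the input-to-hidden weights to confirm the backpropagated form of <math>\delta_{Hj}</math>.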

Objections

While mathematically sound, the feedforward backpropagation algorithm has been called biologically implausible due to its requirements for neural connections to communicate backwards.[2]

References

  1. Rumelhart, David E.; Hinton, Geoffrey E.; Williams, Ronald J. (October 8, 1986). "Learning representations by back-propagating errors". Nature 323 (6088): 533–536. doi:10.1038/323533a0.
  2. Chauvin, Yves; Rumelhart, David E. (eds.) (1995). Backpropagation: Theory, Architectures, and Applications. Lawrence Erlbaum Associates, Inc. ISBN 0805812598.