Difference between revisions of "Restricted Boltzmann machine"

From Eyewire
(Model)
 
(8 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 +
<translate>
 +
 
A '''restricted Boltzmann machine''', commonly abbreviated as '''RBM''', is a neural network where neurons beyond the visible have probabilistic outputs. The machine is restricted because connections are restricted to be from one layer to the next, that is, having no intra-layer connections.
 
A '''restricted Boltzmann machine''', commonly abbreviated as '''RBM''', is a neural network where neurons beyond the visible have probabilistic outputs. The machine is restricted because connections are restricted to be from one layer to the next, that is, having no intra-layer connections.
 +
 +
As with [[contrastive Hebbian learning]], there are two phases to the model, a ''positive'' phase, or ''wake'' phase, and a ''negative'' phase, or ''sleep'' phase.
  
 
==Model==
 
==Model==
Line 5: Line 9:
 
[[File:ArtificialNeuronModel english.png|thumb|right|400px|Model of a neuron. <i>j</i> is the index of the neuron when there is more than one neuron. For the RBM, the activation function is logistic, and the activation is actually the probability that the neuron will fire.]]
 
[[File:ArtificialNeuronModel english.png|thumb|right|400px|Model of a neuron. <i>j</i> is the index of the neuron when there is more than one neuron. For the RBM, the activation function is logistic, and the activation is actually the probability that the neuron will fire.]]
  
We use a set of binary-valued neurons. Given a set of k-dimensional inputs represented as a column vector <math>\vec{x} = [x_1, x_2, \cdots, x_k]^T</math>, and a set of <i>m</i> neurons with (initially random, between -0.01 and 0.01) synaptic weights from the inputs, represented as a matrix formed by <i>m</i> weight column vectors (i.e. a <i>k</i> row x <i>m</i> column matrix):
+
We use a set of binary-valued neurons. Given a set of k-dimensional inputs represented as a column vector [[File:Hebb1.png]], and a set of <i>m</i> neurons with (initially random, between -0.01 and 0.01) synaptic weights from the inputs, represented as a matrix formed by <i>m</i> weight column vectors (i.e. a <i>k</i> row x <i>m</i> column matrix):
  
<center><math>\mathbf{W} = \begin{bmatrix}
+
[[File:Sanger1.png|center]]
w_{11} & w_{12} & \cdots & w_{1m}\\
+
w_{21} & w_{22} & \cdots & w_{2m}\\
+
\vdots & & & \vdots \\
+
w_{k1} & w_{k2} & \cdots & w_{km}
+
\end{bmatrix}</math></center>
+
  
where <math>w_{ij}</math> is the weight between input <i>i</i> and neuron <i>j</i>, the output of the set of neurons is defined as follows:
+
where [[File:Sanger2.png]] is the weight between input <i>i</i> and neuron <i>j</i>.
  
<center><math>\vec{p_y} = \varphi \left ( \mathbf{W}^\mathsf{T} \vec{x} \right )</math></center>
+
During the positive phase, the output of the set of neurons is defined as follows:
  
where <math>\vec{p_y}</math> is a column vector of probabilities, where element <math>i</math> indicates the probability that neuron <math>i</math> will output a 1. <math>\varphi \left ( \cdot \right )</math> is the logistic sigmoidal function:
+
[[File:RBM1.png|center]]
  
<center><math> \varphi \left ( \nu \right ) = \frac{1}{1+e^{-\nu}}</math></center>
+
where [[File:RBM2.png]] is a column vector of probabilities, where element <em>i</em> indicates the probability that neuron <em>i</em> will output a 1. [[File:RBM3.png]] is the logistic sigmoidal function:
  
 +
[[File:RBM4.png]]
  
From this output, a binary-valued ''reconstruction'' of the input <math>\vec{x'}</math> is formed as follows. First, choose the binary outputs of the output neurons <math>\vec{y}</math> based on the probabilities <math>\vec{p_y}</math>. Then:
+
During the negative phase, from this output, a binary-valued ''reconstruction'' of the input [[File:RBM5.png]] is formed as follows. First, choose the binary outputs of the output neurons [[File:RBM6.png]] based on the probabilities [[File:RBM2.png]]. Then:
  
<center><math> \vec{p_{x'}} = \varphi \left ( \mathbf{W} \vec{y} \right )</math></center>
+
[[File:RBM7.png]]
  
Then the reconstructed binary inputs <math>\vec{x'}</math> are chosen based on the probabilities <math>\vec{p_{x'}}</math>.
+
Then the reconstructed binary inputs [[File:RBM5.png]] are chosen based on the probabilities [[File:RBM8.png]]. Next, the binary outputs [[File:RBM9.png]] are computed again based on the probabilities [[File:RBM10.png]], but this time from the reconstructed input:
  
To update the weights, a set of inputs are presented, the outputs generated, and the inputs reconstructed. In practice, the reconstruction is then fed back to the input layer and another cycle is run, for several cycles, which is known as ''Gibbs sampling''. Then an average is taken over the results, and the weights are updated as follows:
+
[[File:RBM11.png|center]]
 +
 
 +
This completes one ''wake-sleep'' cycle.
 +
 
 +
To update the weights, a wake-sleep cycle is completed, and weights updated as follows:
 +
 
 +
[[File:RBM12.png|center]]
 +
 
 +
where η is some learning rate. In practice, several wake-sleep cycles can be run before doing the weight update. This is known as ''Gibbs sampling''.
 +
 
 +
A batch update can also be used, where some number of patterns less than the full input set (a ''mini-batch'') are uniformly randomly presented, the wake and sleep results recorded, and then the updates done as follows:
 +
 
 +
[[File:RBM13.png|center]]
 +
 
 +
where [[File:RBM14.png]] is an average over the input presentations. This method is called ''contrastive divergence''.
  
<center><math>\Delta w_{ij} = \eta \left ( \langle x_i y_j \rangle_N - \langle x'_i y_j \rangle_N \right )</math></center>
 
  
where <math>\eta</math> is some learning rate and the subscript <math>N</math> indicates that the average is taken over <math>N</math> input presentations.
 
  
 
==References==
 
==References==
  
* {{cite web|url=http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf|title=A practical guide to training restricted Boltzmann machines|first=Geoffrey|last=Hinton|date=2 August 2010|publisher=University of Toronto Department of Computer Science}}
+
* Hinton, Geoffrey (August 2, 2010). [http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf "A practical guide to training restricted Boltzmann machines"]. University of Toronto Department of Computer Science.
 +
 
 +
* Cho, KyungHyun (March 14, 2011) [http://lib.tkk.fi/Dipl/2011/urn100427.pdf "Improved Learning Algorithms for Restricted Boltzmann Machines"]. Aalto University.
 +
 
 +
[[Category: Neural computational models]]
 +
 
 +
</translate>

Latest revision as of 03:24, 24 June 2016

A restricted Boltzmann machine, commonly abbreviated as RBM, is a neural network in which the neurons beyond the visible layer have probabilistic outputs. The machine is restricted because connections run only from one layer to the next; there are no intra-layer connections.

As with contrastive Hebbian learning, there are two phases to the model, a positive phase, or wake phase, and a negative phase, or sleep phase.

Model

Model of a neuron. j is the index of the neuron when there is more than one neuron. For the RBM, the activation function is logistic, and the activation is actually the probability that the neuron will fire.
We use a set of binary-valued neurons. Given a set of k-dimensional inputs represented as a column vector <math>\vec{x} = [x_1, x_2, \cdots, x_k]^T</math>, and a set of m neurons with (initially random, between -0.01 and 0.01) synaptic weights from the inputs, represented as a matrix formed by m weight column vectors (i.e. a k row x m column matrix):

<center><math>\mathbf{W} = \begin{bmatrix}
w_{11} & w_{12} & \cdots & w_{1m}\\
w_{21} & w_{22} & \cdots & w_{2m}\\
\vdots & & & \vdots \\
w_{k1} & w_{k2} & \cdots & w_{km}
\end{bmatrix}</math></center>

where <math>w_{ij}</math> is the weight between input i and neuron j.

During the positive phase, the output of the set of neurons is defined as follows:

<center><math>\vec{p_y} = \varphi \left ( \mathbf{W}^\mathsf{T} \vec{x} \right )</math></center>

where <math>\vec{p_y}</math> is a column vector of probabilities, where element i indicates the probability that neuron i will output a 1. <math>\varphi \left ( \cdot \right )</math> is the logistic sigmoidal function:

<center><math> \varphi \left ( \nu \right ) = \frac{1}{1+e^{-\nu}}</math></center>
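
The positive phase can be sketched in a few lines of plain Python. This is only an illustration: the function names (`logistic`, `hidden_probabilities`) and the example weights are assumptions made here, not part of the original article, and the weights are deliberately larger than the small initial range so that the result is easy to check by hand.

```python
import math

def logistic(v):
    """Logistic sigmoid 1 / (1 + e^(-v)), written to avoid overflow for large |v|."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def hidden_probabilities(W, x):
    """Return p_y = logistic(W^T x) for a k-by-m weight matrix W stored as a list of rows."""
    k, m = len(W), len(W[0])
    return [logistic(sum(W[i][j] * x[i] for i in range(k))) for j in range(m)]

# Hypothetical example: k = 3 inputs, m = 2 neurons.
W = [[0.0, 1.0],
     [0.0, -1.0],
     [0.0, 2.0]]
x = [1, 0, 1]
p_y = hidden_probabilities(W, x)  # neuron 0 sees a net input of 0, neuron 1 a net input of 3
```

Each entry of `p_y` is a firing probability, not a binary output; the binary output is drawn from it in the negative phase.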
During the negative phase, a binary-valued reconstruction of the input <math>\vec{x'}</math> is formed from this output as follows. First, choose the binary outputs of the output neurons <math>\vec{y}</math> based on the probabilities <math>\vec{p_y}</math>. Then:

<center><math> \vec{p_{x'}} = \varphi \left ( \mathbf{W} \vec{y} \right )</math></center>

Then the reconstructed binary inputs <math>\vec{x'}</math> are chosen based on the probabilities <math>\vec{p_{x'}}</math>. Next, the binary outputs <math>\vec{y'}</math> are computed again based on the probabilities <math>\vec{p_{y'}}</math>, but this time from the reconstructed input:

<center><math>\vec{p_{y'}} = \varphi \left ( \mathbf{W}^\mathsf{T} \vec{x'} \right )</math></center>

This completes one wake-sleep cycle.
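
One wake-sleep cycle, as described above, might look like the following sketch. The helper names, the seeded random generator and the example sizes (k = 4, m = 3) are assumptions for illustration; binary values are drawn by comparing each probability with a uniform random number.

```python
import math
import random

def logistic(v):
    """Logistic sigmoid, written to avoid overflow for large |v|."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def sample_binary(probs, rng):
    """Draw a 0/1 value for each probability."""
    return [1 if rng.random() < p else 0 for p in probs]

def hidden_probs(W, x):
    """p_y = logistic(W^T x), with W a k-by-m list of rows."""
    return [logistic(sum(W[i][j] * x[i] for i in range(len(W))))
            for j in range(len(W[0]))]

def visible_probs(W, y):
    """p_x' = logistic(W y)."""
    return [logistic(sum(W[i][j] * y[j] for j in range(len(y))))
            for i in range(len(W))]

def wake_sleep_cycle(W, x, rng):
    """Run one positive (wake) and one negative (sleep) phase."""
    y = sample_binary(hidden_probs(W, x), rng)          # positive phase
    x_rec = sample_binary(visible_probs(W, y), rng)     # reconstruct the input
    y_rec = sample_binary(hidden_probs(W, x_rec), rng)  # outputs from the reconstruction
    return y, x_rec, y_rec

rng = random.Random(0)
k, m = 4, 3
# Initial weights drawn uniformly from [-0.01, 0.01], as in the text.
W = [[rng.uniform(-0.01, 0.01) for _ in range(m)] for _ in range(k)]
x = [1, 0, 1, 1]
y, x_rec, y_rec = wake_sleep_cycle(W, x, rng)
```

With weights this small, every probability is close to 0.5, so the first reconstructions are essentially random; they only become informative as the weights are trained.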

To update the weights, a wake-sleep cycle is completed, and the weights are updated as follows:

<center><math>\Delta w_{ij} = \eta \left ( x_i y_j - x'_i y'_j \right )</math></center>

where η is some learning rate. In practice, several wake-sleep cycles can be run before doing the weight update. This is known as Gibbs sampling.
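
As a small worked example, the sketch below applies a single-cycle update, assuming it takes the contrastive form Δw_ij = η (x_i y_j − x'_i y'_j), where x and y come from the positive phase and x' and y' from the negative phase. That form, the function name `weight_update` and the sample values are assumptions made here for illustration.

```python
def weight_update(W, x, y, x_rec, y_rec, eta):
    """Return a new k-by-m weight matrix after one contrastive update.

    Assumed rule: w_ij += eta * (x_i * y_j - x'_i * y'_j).
    """
    k, m = len(W), len(W[0])
    return [[W[i][j] + eta * (x[i] * y[j] - x_rec[i] * y_rec[j])
             for j in range(m)]
            for i in range(k)]

# Hypothetical values for k = 2 inputs and m = 2 neurons.
W = [[0.0, 0.0],
     [0.0, 0.0]]
x, y = [1, 0], [1, 1]           # positive (wake) phase
x_rec, y_rec = [0, 1], [1, 0]   # negative (sleep) phase
W_new = weight_update(W, x, y, x_rec, y_rec, eta=0.1)
```

Weights between units that were co-active on the data but not on the reconstruction go up, and weights between units that were co-active only on the reconstruction go down.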

A batch update can also be used, where some number of patterns fewer than the full input set (a mini-batch) are presented uniformly at random, the wake and sleep results are recorded, and the updates are then done as follows:

<center><math>\Delta w_{ij} = \eta \left ( \langle x_i y_j \rangle - \langle x'_i y'_j \rangle \right )</math></center>

where <math>\langle \cdot \rangle</math> is an average over the input presentations. This method is called contrastive divergence.
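
The mini-batch procedure can be sketched as follows. This is a hedged illustration rather than a reference implementation: it assumes one wake-sleep cycle per pattern, draws the mini-batch without replacement, and uses helper names, a toy data set and a learning rate that are not from the original article.

```python
import math
import random

def logistic(v):
    """Logistic sigmoid, written to avoid overflow for large |v|."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def sample(probs, rng):
    """Draw a 0/1 value for each probability."""
    return [1 if rng.random() < p else 0 for p in probs]

def up(W, x):
    """p_y = logistic(W^T x), with W a k-by-m list of rows."""
    return [logistic(sum(W[i][j] * x[i] for i in range(len(W))))
            for j in range(len(W[0]))]

def down(W, y):
    """p_x' = logistic(W y)."""
    return [logistic(sum(W[i][j] * y[j] for j in range(len(y))))
            for i in range(len(W))]

def minibatch_update(W, patterns, batch_size, eta, rng):
    """One update: w_ij += eta * (<x_i y_j> - <x'_i y'_j>), averaged over a mini-batch."""
    k, m = len(W), len(W[0])
    batch = rng.sample(patterns, batch_size)  # uniform draw without replacement (assumption)
    pos = [[0.0] * m for _ in range(k)]       # running sum of x_i * y_j
    neg = [[0.0] * m for _ in range(k)]       # running sum of x'_i * y'_j
    for x in batch:
        y = sample(up(W, x), rng)
        x_rec = sample(down(W, y), rng)
        y_rec = sample(up(W, x_rec), rng)
        for i in range(k):
            for j in range(m):
                pos[i][j] += x[i] * y[j]
                neg[i][j] += x_rec[i] * y_rec[j]
    n = float(batch_size)
    return [[W[i][j] + eta * (pos[i][j] - neg[i][j]) / n for j in range(m)]
            for i in range(k)]

rng = random.Random(0)
# Toy data set of six 4-dimensional binary patterns (hypothetical).
patterns = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1],
            [0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
k, m, eta, steps = 4, 2, 0.1, 50
W = [[rng.uniform(-0.01, 0.01) for _ in range(m)] for _ in range(k)]
for _ in range(steps):
    W = minibatch_update(W, patterns, batch_size=3, eta=eta, rng=rng)
```

Averaging over the mini-batch keeps the size of each step independent of the batch size, so the same learning rate can be reused when the batch size changes.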


References