# Difference between revisions of "Restricted Boltzmann machine"

(→Model) |
|||

Line 3: | Line 3: | ||

==Model== | ==Model== | ||

− | [[File:ArtificialNeuronModel english.png|thumb|right|400px|Model of a neuron. <i>j</i> is the index of the neuron when there is more than one neuron. For the RBM, the activation function is logistic.]] | + | [[File:ArtificialNeuronModel english.png|thumb|right|400px|Model of a neuron. <i>j</i> is the index of the neuron when there is more than one neuron. For the RBM, the activation function is logistic, and the activation is actually the probability that the neuron will fire.]] |

We use a set of binary-valued neurons. Given a set of k-dimensional inputs represented as a column vector <math>\vec{x} = [x_1, x_2, \cdots, x_k]^T</math>, and a set of <i>m</i> neurons with (initially random, between -0.01 and 0.01) synaptic weights from the inputs, represented as a matrix formed by <i>m</i> weight column vectors (i.e. a <i>k</i> row x <i>m</i> column matrix): | We use a set of binary-valued neurons. Given a set of k-dimensional inputs represented as a column vector <math>\vec{x} = [x_1, x_2, \cdots, x_k]^T</math>, and a set of <i>m</i> neurons with (initially random, between -0.01 and 0.01) synaptic weights from the inputs, represented as a matrix formed by <i>m</i> weight column vectors (i.e. a <i>k</i> row x <i>m</i> column matrix): | ||

Line 16: | Line 16: | ||

where <math>w_{ij}</math> is the weight between input <i>i</i> and neuron <i>j</i>, the output of the set of neurons is defined as follows: | where <math>w_{ij}</math> is the weight between input <i>i</i> and neuron <i>j</i>, the output of the set of neurons is defined as follows: | ||

− | <center><math> | + | <center><math>\vec{p_y} = \varphi \left ( \mathbf{W}^\mathsf{T} \vec{x} \right )</math></center> |

− | where <math>\varphi \left ( \cdot \right )</math> is the logistic sigmoidal function: | + | where <math>\vec{p_y}</math> is a column vector of probabilities, where element <math>i</math> indicates the probability that neuron <math>i</math> will output a 1. <math>\varphi \left ( \cdot \right )</math> is the logistic sigmoidal function: |

<center><math> \varphi \left ( \nu \right ) = \frac{1}{1+e^{-\nu}}</math></center> | <center><math> \varphi \left ( \nu \right ) = \frac{1}{1+e^{-\nu}}</math></center> | ||

− | |||

− | <center><math> | + | From this output, a binary-valued ''reconstruction'' of the input <math>\vec{x'}</math> is formed as follows. First, choose the binary outputs of the output neurons <math>\vec{y}</math> based on the probabilities <math>\vec{p_y}</math>. Then: |

+ | |||

+ | <center><math> \vec{p_{x'}} = \varphi \left ( \mathbf{W} \vec{y} \right )</math></center> | ||

+ | |||

+ | Then the reconstructed binary inputs <math>\vec{x'}</math> based on the probabilities <math>\vec{p_{x'}}</math>. | ||

To update the weights, a set of inputs are presented, the outputs generated, and the inputs reconstructed. In practice, the reconstruction is then fed back to the input layer and another cycle is run, for several cycles, which is known as ''Gibbs sampling''. Then an average is taken over the results, and the weights are updated as follows: | To update the weights, a set of inputs are presented, the outputs generated, and the inputs reconstructed. In practice, the reconstruction is then fed back to the input layer and another cycle is run, for several cycles, which is known as ''Gibbs sampling''. Then an average is taken over the results, and the weights are updated as follows: |

## Revision as of 17:11, 26 April 2012

A **restricted Boltzmann machine**, commonly abbreviated as **RBM**, is a neural network in which the neurons beyond the visible layer have probabilistic outputs. The machine is restricted because connections run only from one layer to the next; there are no intra-layer connections.

## Model

We use a set of binary-valued neurons. Given a set of k-dimensional inputs represented as a column vector <math>\vec{x} = [x_1, x_2, \cdots, x_k]^T</math>, and a set of *m* neurons with (initially random, between -0.01 and 0.01) synaptic weights from the inputs, represented as a matrix formed by *m* weight column vectors (i.e. a *k* row x *m* column matrix):

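<center><math>\mathbf{W} = \begin{bmatrix}
w_{11} & w_{12} & \cdots & w_{1m}\\ w_{21} & w_{22} & \cdots & w_{2m}\\ \vdots & & & \vdots \\ w_{k1} & w_{k2} & \cdots & w_{km}
\end{bmatrix}</math></center>

where <math>w_{ij}</math> is the weight between input *i* and neuron *j*, the output of the set of neurons is defined as follows:

<center><math>\vec{p_y} = \varphi \left ( \mathbf{W}^\mathsf{T} \vec{x} \right )</math></center>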

where <math>\vec{p_y}</math> is a column vector of probabilities, in which element <math>j</math> is the probability that neuron <math>j</math> will output a 1, and <math>\varphi \left ( \cdot \right )</math> is the logistic sigmoidal function:

<center><math> \varphi \left ( \nu \right ) = \frac{1}{1+e^{-\nu}}</math></center>
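The forward pass described above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names (`logistic`, `init_weights`, `hidden_probabilities`), the sizes `k=4` and `m=3`, the fixed seed, and the example input are all assumptions made for the example. The weight range of -0.01 to 0.01 follows the text.

```python
import math
import random

def logistic(v):
    """Logistic sigmoid: 1 / (1 + e^(-v))."""
    return 1.0 / (1.0 + math.exp(-v))

def init_weights(k, m, rng, scale=0.01):
    """Build a k-row by m-column weight matrix with entries in [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(m)] for _ in range(k)]

def hidden_probabilities(W, x):
    """Compute p_y = logistic(W^T x): one firing probability per neuron."""
    k, m = len(W), len(W[0])
    return [logistic(sum(W[i][j] * x[i] for i in range(k))) for j in range(m)]

rng = random.Random(0)             # fixed seed only so the example is repeatable
W = init_weights(k=4, m=3, rng=rng)
x = [1, 0, 1, 1]                   # hypothetical binary input vector
p_y = hidden_probabilities(W, x)   # list of m probabilities
```

Because the initial weights are so small, every entry of `p_y` starts out close to 0.5, so each neuron is initially about as likely to fire as not.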

From this output, a binary-valued *reconstruction* of the input <math>\vec{x'}</math> is formed as follows. First, choose the binary outputs of the output neurons <math>\vec{y}</math> based on the probabilities <math>\vec{p_y}</math>. Then:

<center><math> \vec{p_{x'}} = \varphi \left ( \mathbf{W} \vec{y} \right )</math></center>

Then choose the reconstructed binary inputs <math>\vec{x'}</math> based on the probabilities <math>\vec{p_{x'}}</math>.
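The reconstruction step can be sketched as follows. The sketch assumes that "choosing" a binary value from a probability means drawing 1 with that probability and 0 otherwise. The helper names (`sample_binary`, `reconstruct`), the 3 x 2 weight matrix, and the seed are hypothetical and chosen only for illustration.

```python
import math
import random

def logistic(v):
    """Logistic sigmoid: 1 / (1 + e^(-v))."""
    return 1.0 / (1.0 + math.exp(-v))

def sample_binary(probs, rng):
    """Draw each unit as 1 with its given probability, else 0."""
    return [1 if rng.random() < p else 0 for p in probs]

def reconstruct(W, x, rng):
    """One up-down pass: x -> p_y -> y -> p_x' -> x'."""
    k, m = len(W), len(W[0])
    # Upward pass: p_y = logistic(W^T x), then sample binary outputs y.
    p_y = [logistic(sum(W[i][j] * x[i] for i in range(k))) for j in range(m)]
    y = sample_binary(p_y, rng)
    # Downward pass: p_x' = logistic(W y), then sample the reconstruction x'.
    p_x = [logistic(sum(W[i][j] * y[j] for j in range(m))) for i in range(k)]
    x_rec = sample_binary(p_x, rng)
    return y, x_rec

rng = random.Random(1)
W = [[2.0, -2.0], [-2.0, 2.0], [2.0, 2.0]]  # hypothetical 3 x 2 weights
y, x_rec = reconstruct(W, [1, 0, 1], rng)
```

The same matrix is used in both directions: transposed on the way up and untransposed on the way down, matching the two equations above.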

To update the weights, a set of inputs is presented, the outputs are generated, and the inputs are reconstructed. In practice, the reconstruction is then fed back to the input layer and the cycle is repeated several times, which is known as *Gibbs sampling*. Then an average is taken over the results, and the weights are updated as follows:

where <math>\eta</math> is some learning rate and the subscript <math>N</math> indicates that the average is taken over <math>N</math> input presentations.
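The training step can be sketched in code under a stated assumption. The sketch assumes the commonly used contrastive-divergence form <math>\mathbf{W} \leftarrow \mathbf{W} + \eta \left ( \langle \vec{x} \vec{y}^\mathsf{T} \rangle_N - \langle \vec{x'} \vec{y'}^\mathsf{T} \rangle_N \right )</math>, where <math>\vec{y'}</math> denotes the outputs produced from the final reconstruction; this specific form is an assumption, not something taken from the text. The function names, the `cycles` parameter, and the toy data are likewise hypothetical.

```python
import math
import random

def logistic(v):
    """Logistic sigmoid: 1 / (1 + e^(-v))."""
    return 1.0 / (1.0 + math.exp(-v))

def sample(probs, rng):
    """Draw each unit as 1 with its given probability, else 0."""
    return [1 if rng.random() < p else 0 for p in probs]

def up(W, x, rng):
    """Sample neuron outputs y from p_y = logistic(W^T x)."""
    k, m = len(W), len(W[0])
    return sample([logistic(sum(W[i][j] * x[i] for i in range(k)))
                   for j in range(m)], rng)

def down(W, y, rng):
    """Sample reconstructed inputs x' from p_x' = logistic(W y)."""
    k, m = len(W), len(W[0])
    return sample([logistic(sum(W[i][j] * y[j] for j in range(m)))
                   for i in range(k)], rng)

def update_weights(W, inputs, eta, cycles, rng):
    """Return new weights after one update averaged over all inputs (assumed rule)."""
    k, m, n = len(W), len(W[0]), len(inputs)
    delta = [[0.0] * m for _ in range(k)]
    for x in inputs:
        y = up(W, x, rng)
        x_rec, y_rec = x, y
        for _ in range(cycles):        # Gibbs sampling: feed the reconstruction back
            x_rec = down(W, y_rec, rng)
            y_rec = up(W, x_rec, rng)
        for i in range(k):
            for j in range(m):
                # Data term minus reconstruction term, averaged over N inputs.
                delta[i][j] += (x[i] * y[j] - x_rec[i] * y_rec[j]) / n
    return [[W[i][j] + eta * delta[i][j] for j in range(m)] for i in range(k)]

rng = random.Random(0)
W = [[rng.uniform(-0.01, 0.01) for _ in range(2)] for _ in range(3)]
data = [[1, 0, 1], [1, 1, 0]]          # hypothetical training inputs (N = 2)
W_new = update_weights(W, data, eta=0.1, cycles=1, rng=rng)
```

Since every product of binary values lies between 0 and 1, each averaged difference lies between -1 and 1, so under this rule no weight moves by more than <math>\eta</math> in a single update.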