# Restricted Boltzmann machine


## Revision as of 17:40, 26 April 2012

A **restricted Boltzmann machine**, commonly abbreviated as **RBM**, is a neural network in which the neurons beyond the visible layer have probabilistic outputs. The machine is "restricted" because connections may only run from one layer to the next; there are no intra-layer connections.

As with contrastive Hebbian learning, there are two phases to the model: a *positive* phase, or *wake* phase, and a *negative* phase, or *sleep* phase.

## Model

We use a set of binary-valued neurons. Given a set of *k*-dimensional inputs represented as a column vector <math>\vec{x} = [x_1, x_2, \cdots, x_k]^\mathsf{T}</math>, and a set of *m* neurons with (initially random, between -0.01 and 0.01) synaptic weights from the inputs, represented as a matrix formed by *m* weight column vectors (i.e. a *k*-row by *m*-column matrix):

<center><math>\mathbf{W} = \begin{bmatrix}
w_{11} & w_{12} & \cdots & w_{1m}\\
w_{21} & w_{22} & \cdots & w_{2m}\\
\vdots & & & \vdots \\
w_{k1} & w_{k2} & \cdots & w_{km}
\end{bmatrix}</math></center>

where <math>w_{ij}</math> is the weight between input *i* and neuron *j*.
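As a concrete sketch, the weight matrix described above might be initialized in NumPy as follows; the dimensions *k* and *m* here are arbitrary example values, not from the source:

```python
import numpy as np

rng = np.random.default_rng(0)

k, m = 6, 4  # k visible inputs, m hidden neurons (illustrative sizes)

# k x m weight matrix, entries drawn uniformly from (-0.01, 0.01),
# matching the initialization range described in the text
W = rng.uniform(-0.01, 0.01, size=(k, m))
```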

During the positive phase, the output of the set of neurons is defined as follows:

<center><math>\vec{p_y} = \varphi \left ( \mathbf{W}^\mathsf{T} \vec{x} \right )</math></center>

where <math>\vec{p_y}</math> is a column vector of probabilities, in which element <math>i</math> indicates the probability that neuron <math>i</math> will output a 1, and <math>\varphi \left ( \cdot \right )</math> is the logistic sigmoid function:

<center><math>\varphi(v) = \frac{1}{1 + e^{-v}}</math></center>
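The logistic sigmoid squashes any real-valued activation into the interval (0, 1), which is what lets its output be read as a probability. A minimal sketch:

```python
import numpy as np

def sigmoid(v):
    # logistic sigmoid: maps real-valued activations to (0, 1)
    return 1.0 / (1.0 + np.exp(-v))

print(sigmoid(0.0))  # 0.5: zero activation gives even odds of outputting 1
```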

During the negative phase, from this output, a binary-valued *reconstruction* of the input <math>\vec{x'}</math> is formed as follows. First, choose the binary outputs of the output neurons <math>\vec{y}</math> based on the probabilities <math>\vec{p_y}</math>. Then:

<center><math>\vec{p_{x'}} = \varphi \left ( \mathbf{W} \vec{y} \right )</math></center>

Then the reconstructed binary inputs <math>\vec{x'}</math> are chosen based on the probabilities <math>\vec{p_{x'}}</math>. Next, the binary outputs <math>\vec{y'}</math> are computed again based on the probabilities <math>\vec{p_{y'}}</math>, but this time from the reconstructed input:

<center><math>\vec{p_{y'}} = \varphi \left ( \mathbf{W}^\mathsf{T} \vec{x'} \right )</math></center>

This completes one *wake-sleep* cycle.
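The full wake-sleep cycle above can be sketched in NumPy; the function and variable names are illustrative choices, not from the source:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def wake_sleep_cycle(W, x):
    """One wake-sleep cycle: x -> y -> x' -> y' (binary column vectors)."""
    # Positive (wake) phase: sample binary hidden outputs y from p_y
    p_y = sigmoid(W.T @ x)
    y = (rng.random(p_y.shape) < p_y).astype(float)
    # Negative (sleep) phase: sample a binary reconstruction x' from p_x'
    p_x1 = sigmoid(W @ y)
    x1 = (rng.random(p_x1.shape) < p_x1).astype(float)
    # Recompute binary hidden outputs y' from the reconstruction
    p_y1 = sigmoid(W.T @ x1)
    y1 = (rng.random(p_y1.shape) < p_y1).astype(float)
    return y, x1, y1

k, m = 6, 4  # illustrative sizes
W = rng.uniform(-0.01, 0.01, size=(k, m))
x = rng.integers(0, 2, size=(k, 1)).astype(float)
y, x1, y1 = wake_sleep_cycle(W, x)
```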

To update the weights, a wake-sleep cycle is completed, and the weights are updated as follows:

<center><math>\Delta \mathbf{W} = \eta \left ( \vec{x} \vec{y}^\mathsf{T} - \vec{x'} \vec{y'}^\mathsf{T} \right )</math></center>

where <math>\eta</math> is some learning rate. This method is called *contrastive divergence*.
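The contrastive-divergence update is just a difference of outer products. A minimal sketch, assuming the four vectors from one wake-sleep cycle are available (the values below are arbitrary illustrations):

```python
import numpy as np

def cd_update(W, x, y, x1, y1, eta=0.1):
    """Contrastive divergence: W + eta * (x y^T - x' y'^T)."""
    return W + eta * (x @ y.T - x1 @ y1.T)

# Tiny worked example with fixed binary vectors (illustrative values)
x  = np.array([[1.0], [0.0]])   # original input
y  = np.array([[1.0]])          # wake-phase hidden output
x1 = np.array([[0.0], [1.0]])   # reconstruction
y1 = np.array([[1.0]])          # sleep-phase hidden output
W = np.zeros((2, 1))
W_new = cd_update(W, x, y, x1, y1, eta=0.5)
# x y^T - x' y'^T = [[1],[0]] - [[0],[1]] = [[1],[-1]], then scaled by eta
```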

A batch update can also be used, where all the patterns are presented, the wake and sleep results recorded, and then the updates done as follows:

<center><math>\Delta \mathbf{W} = \eta \left ( \left \langle \vec{x} \vec{y}^\mathsf{T} \right \rangle - \left \langle \vec{x'} \vec{y'}^\mathsf{T} \right \rangle \right )</math></center>

where <math>\left \langle \cdot \right \rangle</math> is an average over the input presentations.
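The batch form averages the outer products over all presented patterns before updating. A sketch, assuming the per-pattern wake and sleep results have been collected into lists (names and values are illustrative):

```python
import numpy as np

def batch_cd_update(W, wakes, sleeps, eta=0.1):
    """wakes: list of (x, y) pairs; sleeps: list of (x', y') pairs.
    W + eta * (<x y^T> - <x' y'^T>), averaged over the patterns."""
    pos = np.mean([x @ y.T for x, y in wakes], axis=0)
    neg = np.mean([x1 @ y1.T for x1, y1 in sleeps], axis=0)
    return W + eta * (pos - neg)

# Two patterns, illustrative binary values
wakes  = [(np.array([[1.], [0.]]), np.array([[1.]])),
          (np.array([[0.], [1.]]), np.array([[1.]]))]
sleeps = [(np.array([[0.], [0.]]), np.array([[1.]])),
          (np.array([[1.], [1.]]), np.array([[0.]]))]
W = np.zeros((2, 1))
W_new = batch_cd_update(W, wakes, sleeps, eta=1.0)
```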

In practice, several wake-sleep cycles can be run before doing the weight update; this repeated alternating sampling is known as *Gibbs sampling*.
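Running several cycles before an update can be sketched by chaining the reconstruction step, each cycle starting from the previous reconstruction; the function name and step count here are illustrative, not from the source:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def sample(p):
    # draw binary values with the given per-element probabilities
    return (rng.random(p.shape) < p).astype(float)

def multi_step_update(W, x, eta=0.1, n_steps=3):
    """Run n_steps wake-sleep cycles (Gibbs sampling) before one update."""
    y = sample(sigmoid(W.T @ x))      # wake-phase sample on the data
    xk = x
    for _ in range(n_steps):          # alternate x -> y -> x' (Gibbs chain)
        yk = sample(sigmoid(W.T @ xk))
        xk = sample(sigmoid(W @ yk))
    yk = sample(sigmoid(W.T @ xk))    # final hidden sample for the update
    return W + eta * (x @ y.T - xk @ yk.T)

k, m = 6, 4  # illustrative sizes
W = rng.uniform(-0.01, 0.01, size=(k, m))
x = rng.integers(0, 2, size=(k, 1)).astype(float)
W_new = multi_step_update(W, x)
```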