Conditional principal components analysis

Conditional principal components analysis (CPCA) seeks to restrict neurons to performing principal components analysis only when they are activated.[1]

Model

Figure: Model of a neuron. j is the index of the neuron when there is more than one neuron. For a linear neuron, the activation function is not present (or simply the identity function).

We use a set of linear neurons with binary inputs. A k-dimensional binary input pattern is represented as a column vector <math>\vec{x} = [x_1, x_2, \cdots, x_k]^T</math>, and the (initially random) synaptic weights of a set of m linear neurons are represented as a matrix formed by m weight column vectors (i.e. a k × m matrix):

<math>\mathbf{W} = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1m}\\ w_{21} & w_{22} & \cdots & w_{2m}\\ \vdots & & & \vdots \\ w_{k1} & w_{k2} & \cdots & w_{km} \end{bmatrix}</math>

where <math>w_{ij}</math> is the weight between input i and neuron j, the output of the set of neurons is defined as follows:

<math>\vec{y} = \mathbf{W}^T \vec{x}</math>
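As a concrete illustration of the model above, here is a minimal sketch in Python with NumPy; the dimensions (5 inputs, 3 neurons) and the random seed are arbitrary choices for the example, not part of the original formulation.

<pre>
import numpy as np

k, m = 5, 3                        # k binary inputs, m linear neurons
rng = np.random.default_rng(0)

W = rng.random((k, m))             # k x m matrix of initially random weights
x = rng.integers(0, 2, size=k)     # one k-dimensional binary input pattern

y = W.T @ x                        # outputs of the m linear neurons: y = W^T x
print(y)                           # m values, one per neuron
</pre>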

The CPCA learning rule specifies the weight update applied after each input pattern is presented:

<math>\Delta w_{ij} = \eta y_j(x_i - w_{ij})</math>

With a set of such neurons, a k-Winners-Take-All pass is typically run before the update: all neurons are evaluated, and the <math>k</math> neurons with the highest outputs (where this <math>k</math> is the number of winners, a separate parameter from the input dimension) have their outputs set to 1, while the rest have their outputs set to 0.
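A hedged sketch of one full training step under these definitions follows; the function name cpca_step and the parameter n_winners (the number of winners in the winner-takes-all pass, kept distinct from the input dimension <math>k</math>) are illustrative choices, not established names.

<pre>
import numpy as np

def cpca_step(W, x, n_winners=1, eta=0.1):
    """Present one binary pattern x (length k) to the weights W (k x m)."""
    y = W.T @ x                               # evaluate all neurons
    winners = np.argsort(y)[-n_winners:]      # neurons with the highest outputs
    y_bin = np.zeros(W.shape[1])
    y_bin[winners] = 1.0                      # winners output 1, the rest 0
    # CPCA rule: delta w_ij = eta * y_j * (x_i - w_ij)
    W = W + eta * (np.outer(x, y_bin) - W * y_bin)
    return W, y_bin
</pre>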

Derivation

We want the weight from input <math>i</math> to neuron <math>j</math> to eventually settle at the probability that input <math>i</math> is active given that neuron <math>j</math> is activated. That is, when the weight is at equilibrium, we have:

<math>w_{ij} = P(x_i=1 \mid y_j=1)</math>

By the definition of conditional probability:

<math>w_{ij} = P(x_i=1 \mid y_j=1) = \frac{P(y_j=1 \wedge x_i=1)}{P(y_j=1)}</math>

Using the law of total probability, we can condition the numerator and the denominator on the input patterns. If <math>t</math> ranges over the input patterns, then we have:

<math>\begin{align} P(y_j=1) &= \sum_t P(y_j=1 \mid t)P(t),\\ P(y_j=1 \wedge x_i=1) &= \sum_t P(y_j=1 \wedge x_i=1 \mid t)P(t) \end{align}</math>

Substituting back into the equation for <math>w_{ij}</math>, multiplying both sides by the denominator <math>\sum_t P(y_j=1 \mid t)P(t)</math>, and moving everything to one side, we get:

<math>0 = \sum_t P(y_j=1 \wedge x_i=1 \mid t)P(t) - w_{ij} \sum_t P(y_j=1 \mid t)P(t)</math>

A reasonable assumption is that all input patterns in the set are equally likely to appear, so that <math>P(t)</math> is a constant that can be cancelled from both sums:

<math>0 = \sum_t P(y_j=1 \wedge x_i=1 \mid t) - w_{ij} \sum_t P(y_j=1 \mid t)</math>

Since inputs and outputs are either 0 or 1 and are determined by the pattern, the average over all patterns of an input, an output, or their product is equal to the probability of that quantity being 1; dividing both sums by the (constant) number of patterns therefore turns them into such averages. Thus:

<math> 0 = \left \langle y_j x_i \right \rangle_t - w_{ij} \left \langle y_j \right \rangle_t</math>

We can easily turn this into an update rule which will drive the weights to the above equilibrium condition:

<math>\Delta w_{ij} = \eta (y_j x_i - w_{ij} y_j) = \eta y_j(x_i - w_{ij})</math>
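A quick numerical check of this equilibrium can be sketched as follows; the statistics (the output clamped to be active on 40% of patterns, and the input active 70% of the time when the output is active) are fabricated purely for illustration. After many presentations, the weight should settle near <math>P(x_i=1 \mid y_j=1) = 0.7</math>.

<pre>
import numpy as np

rng = np.random.default_rng(1)
eta, w = 0.01, rng.random()          # learning rate and a random initial weight

n = 200_000
y = rng.random(n) < 0.4                            # clamped binary output
x = np.where(y, rng.random(n) < 0.7,               # input when output is active
                rng.random(n) < 0.2)               # input when output is inactive

for x_i, y_j in zip(x.astype(float), y.astype(float)):
    w += eta * y_j * (x_i - w)       # the CPCA update rule

print(w)                             # close to 0.7
</pre>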

Interpretation

Since the inputs and outputs are binary, the update rule can be interpreted as follows:

  • If the output is not active, do not alter any weight.
  • If the output is active and an input is not active, subtract the weight (times a learning rate).
  • If the output is active and an input is also active, add 1 minus the weight (times a learning rate).

The second rule drives the weight towards zero (asymptotically), and the third rule drives it towards one (asymptotically). Overall, the rules combine to equilibrate the weight at the probability that the input is active given that the output is active.
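A toy numerical illustration of the last two cases, with the output held active and an exaggerated learning rate of 0.5 (both arbitrary choices) so the movement is easy to see:

<pre>
eta = 0.5

w = 0.5
for _ in range(5):
    w += eta * 1.0 * (0.0 - w)   # output active, input inactive
print(w)                          # about 0.016, approaching 0

w = 0.5
for _ in range(5):
    w += eta * 1.0 * (1.0 - w)   # output active, input active
print(w)                          # about 0.984, approaching 1
</pre>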

References

  1. Template:Cite book