# Hebb's rule

Hebb's rule, also called Hebb's postulate, attempts to explain "associative learning", in which the simultaneous activation of cells leads to a pronounced increase in the synaptic strength between those cells. Hebb stated:

> Let us assume that the persistence or repetition of a reverberatory activity (or "trace") tends to induce lasting cellular changes that add to its stability.… When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased.[1]

## Model

Figure: Model of a neuron, where $j$ is the index of the neuron when there is more than one. For a linear neuron, the activation function is absent (equivalently, it is the identity function).

Given a $k$-dimensional input represented as a column vector:

$\vec{x} = [x_1, x_2, \cdots, x_k]^T$

and a linear neuron whose synaptic weights from the inputs are initialized randomly, uniformly distributed between $-1$ and $1$:

$\vec{w} = [w_1, w_2, \cdots, w_k]^T$

then the output of the neuron is defined as follows:

$y = \vec{w}^T \vec{x} = \sum_{i=1}^k w_i x_i$

Hebb's rule gives the update rule which is applied after an input pattern is presented:

$\Delta \vec{w} = \eta \vec{x} y$

where $\eta$ is some small fixed learning rate.
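As an illustration, the output and a single Hebbian update can be sketched in NumPy. This is a minimal sketch, not a reference implementation: the function name `hebb_update`, the example input, the seed, and the value `eta=0.01` are assumptions chosen for the example.

```python
import numpy as np

def hebb_update(w, x, eta=0.01):
    """Apply one Hebbian step to a linear neuron; return (new weights, output)."""
    y = w @ x                   # y = w^T x
    return w + eta * x * y, y   # delta w = eta * x * y

rng = np.random.default_rng(0)        # fixed seed, assumed for reproducibility
k = 4                                 # input dimension (illustrative)
w = rng.uniform(-1.0, 1.0, size=k)    # initial weights, uniform in [-1, 1]
x = np.array([1.0, 0.5, -0.5, 0.0])   # one hypothetical input pattern

w_new, y = hebb_update(w, x)
```

The weight attached to an input that is zero (here the last element of `x`) is left unchanged, since its update is proportional to that input.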

If the same input is applied over and over, the weights grow without bound (unless the output is exactly zero): each update increases the magnitude of $y$, which in turn enlarges the next update. One solution is to limit the size of the weights. Another solution is to normalize the weights after every presentation:

$\vec{w} \leftarrow \vec{w} / \left \| \vec{w} \right \|$

Combining the Hebbian update with this normalization, and expanding the result to first order in $\eta$, leads to Oja's rule.
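The difference between the plain and the normalized update can be checked numerically. The following is a rough sketch under assumed values: the helper `train`, the input `x`, the starting weights `w0`, the learning rate, and the number of steps are all illustrative choices.

```python
import numpy as np

def train(x, w0, eta=0.1, steps=200, normalize=False):
    """Present the same input repeatedly, optionally renormalizing the weights."""
    w = w0.copy()
    for _ in range(steps):
        y = w @ x                  # output of the linear neuron
        w = w + eta * x * y        # Hebbian update
        if normalize:
            w = w / np.linalg.norm(w)   # rescale to unit length
    return w

x = np.array([1.0, 2.0])    # the single input pattern (assumed)
w0 = np.array([0.5, 0.1])   # initial weights (assumed)

w_plain = train(x, w0)                  # norm grows geometrically
w_norm = train(x, w0, normalize=True)   # norm stays at 1

print(np.linalg.norm(w_plain), np.linalg.norm(w_norm))
```

With normalization the weight vector keeps unit length and, for this single repeated input, aligns with the direction of `x`.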

## Hebb's rule and correlation

Instead of updating the weights after each input pattern, we can also update the weights once, after all input patterns have been presented, keeping $\vec{w}$ fixed in the meantime. Suppose that there are $N$ input patterns. If we set the learning rate $\eta$ equal to $1/N$, then the update rule becomes

$\Delta \vec{w} = \frac{1}{N} \sum_{n=1}^N \vec{x}_n y_n = \left \langle \vec{x}_n y_n \right \rangle_N$

where $n$ is the pattern number, and $\left \langle \cdot \right \rangle_N$ is the average over $N$ input patterns. This is convenient, because we can now substitute $y_n$:

$\Delta \vec{w} = \left \langle \vec{x}_n y_n \right \rangle_N = \left \langle \vec{x}_n \vec{w}^T \vec{x}_n \right \rangle_N = \left \langle \vec{x}_n \vec{x}_n^T \vec{w} \right \rangle_N = \left \langle \vec{x}_n \vec{x}_n^T \right \rangle_N \vec{w} = \mathbf{C} \vec{w}$

$\mathbf{C}$ is the correlation matrix for $\vec{x}$, provided that $\vec{x}$ has mean zero and variance one. This means that strong correlation between elements of $\vec{x}$ will result in a large increase in the weights from those elements, which is what Hebb's rule is all about.
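The identity $\Delta \vec{w} = \mathbf{C} \vec{w}$ can be verified numerically. This is a minimal sketch on synthetic data: the random inputs, the mixing coefficients used to correlate two of them, and the sizes `N` and `k` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, assumed for reproducibility
N, k = 1000, 3                   # number of patterns and input dimension

# Synthetic inputs, one pattern per row; make input 1 correlate with input 0.
X = rng.normal(size=(N, k))
X[:, 1] = 0.8 * X[:, 0] + 0.6 * X[:, 1]
X = (X - X.mean(axis=0)) / X.std(axis=0)   # mean zero, variance one

w = rng.uniform(-1.0, 1.0, size=k)   # weights held fixed over the batch
y = X @ w                            # outputs y_n for all patterns

delta_w = (X * y[:, None]).mean(axis=0)   # <x_n y_n>_N
C = X.T @ X / N                           # <x_n x_n^T>_N

print(np.allclose(delta_w, C @ w))
```

Because the inputs are standardized, `C` here coincides with the matrix returned by `np.corrcoef(X, rowvar=False)`.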

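Note that if $\vec{x}$ does not have mean zero and variance one, then $\mathbf{C}$ is the matrix of uncentered second moments $\left \langle x_i x_j \right \rangle_N$ rather than the correlation matrix, so it also reflects the means and scales of the inputs. Similarly, if the learning rate is not equal to $1/N$, then the relationship still holds up to the scalar factor $\eta N$.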