After being linked to this article about in neural networks using Python, I decided to follow along in Scheme. Here is a little article on writing a very basic neural network from scratch using MIT/GNU Scheme.

If you’re interested, I recommend reading at least the first part of the article to get a more thorough understanding of the mathematics and terms.

Keeping with the theme of “making it from scratch”, here are the the necessary matrix operations.

```
(define (matrix-multiply x y)
(map (lambda (row) (apply map
(lambda column (apply + (map * row column))) y)) x))
(define (matrix-add x y) (map (lambda (x y) (map + x y)) x y))
(define (transpose m)
(apply map list m))
```

In the first section, we’ll just do some basic addition of two bits. Two inputs, a sum bit output, and a carry bit output. Pretty simple.

The basic unit of decision making is the “neuron”. Given any number of inputs it will return the result of a singular output based on a bias and per-input weights and fed into an “activation function”.

```
(define (neuron activation-function input weight bias)
(activation-function (+ (* input weight) bias)))
```

The simplest “neuron”, the perceptron, uses the step function as its activation function. If the input is less than zero, it outputs a `1`

. Otherwise, it outputs a `0`

.

```
(define (step-function z)
(if (< 0 z) 1 0))
```

Later, we’ll use a common type of activation function, the sigmoid function. This function is a smooth curve instead of a sharp step, allowing for a “learning” routine where weights and balances are tuned to slightly vary a decimal output as opposed to a strictly binary output.

Thus a perceptron can be implemented as

```
(lambda (input weight bias)
(step-function (+ (* input weight) bias)))
```

We can implement a NAND gate in a perceptron by using two inputs, each with weight `-2`

and the perceptron having a bias of `3`

.

To do this, we need to first adapt the genereralized `neuron`

function to accept multiple inputs and weights. To do this, we’ll use a `1x2`

matrix for the weights and `2x1`

matrix for the input. That way, they’ll multiply nicely.

```
(define neuron (activation-function inputs weights bias)
(activation-function (+
(caar (matrix-multiply weights inputs))
bias)))
```

The `caar`

is to turn the `1x1`

matrix into a number for the step function.

Now that we have a NAND gate implementation, we can form a 1-bit binary adder! Well, almost. We need to adapt the single-neuron function into a procedure that returns the result of a lot of linked neurons. It will be extremely helpful to refer again to the article on neural networking in Python, as I prefer to not duplicate the entire explanation.

The main idea is that we will have multiple layers of neurons, each receiving input from the layer before it (unless it is the input layer) and outputting a value to the layer after it (unless it is the output layer). Each input to a neuron must come from the layer before it (again, unless it is the input layer). This will greatly simplify the implementation, allowing us to code the network as a set of vectors and to calculate the result using little more than simple matrix operations.

In examining the design of a 1-bit binary adder, the inputs feed into neuron `A`

, `B`

, and `C`

. However, `A`

also feeds into `B`

and `C`

, which means that we need three layers, one for inputs, one for `A`

, and one for `B`

and `C`

. How, then, will be pass the input across two layers? We just need to insert a neuron into the second layer that passes along whatever input it gets to the next neuron. This means a weight of `1`

for all inputs and a bias of `0`

.

Notice how the weight of `A`

to the carry bit output (techincally to the “pass” neuron and then to the output) has a weight of `-4`

. This is because the output of `A`

feeds into both inputs of the NAND gate.

```
-2 * i1 + -2 * i1 = -4 * i1
```

Now that the layering problem is solved, we can begin to encode the network as a bunch of matricies. Again, taking from the original article, we will encode weights as a list of `MxN`

matricies where `M`

is the number of receiving neurons and `N`

is the number of sending neurons. For example, the weight of the connection between input neuron `x1`

and the `A`

would be row `2`

, column `1`

. If there is no connection between neurons, the weight is simply `0`

. Going from input `x2`

to “pass neuron” `p2`

would correspond to row `3`

, column `2`

and have a weight of `0`

.

Biases are encoded as a list of `Nx1`

matricies, one for each layer, where `N`

is the number of neurons in each layer. We exclude the input biases, as they should be `0`

. In our matrix calculations, we skip the input layer anyway.

Finally, the inputs. This is the easy part. Simply a `Nx1`

matrix where `N`

is the number of inputs.

For the 1-bit binary adder, the weights and biases are encoded as such:

```
(define weights '(
((1 0) (-2 -2) (0 1))
((-2 -2 0) (0 -2 -2) (0 1 0))
((-2 -2 0) (0 0 -4))))
(define biases '(
((0) (3) (0))
((3) (3) (0))
((3) (3))))
```

From a high level, we need to multiply the input matrix `A`

by weight matrix `W[0]`

and add bias matrix `B[0]`

, yielding `A'`

. Repeat again with `A'`

as the input for the next layer of neurons (weight matrix `W[1]`

and bias matrix `B[1]`

), until the network is done “feeding the results forward”.

```
feed-forward (A, W, B) {
if (w is null) return a
A' = map(activation-function, car(W) * transpose(A) + car(B))
feed-forward(A', cdr(W), cdr(B))
}
```

Because the list of weight and bias matricies are the same length, they’ll run out at the same time. translating into Scheme we get:

```
(define (feed-forward a-f a w b)
(if (null? w) a
(feed-forward
a-f
(map a-f (map car
(matrix-add
(matrix-multiply
(car w) (transpose (list a)))
(car b))))
(cdr w)
(cdr b))))
```

We need to `(map car ... )`

and `(transpose (list a))`

to provide a numerical input to the activation function and to get everything in the proper form for matrix multiplication respectively.

Now that we have a `feed-forward`

procedure, activation function, and some weights and biases, let’s put it all together:

```
;matrix & vector functions
(define (matrix-multiply m n)
(map (lambda (row) (apply map
(lambda column (apply + (map * row column))) n)) m))
(define (matrix-add x y) (map (lambda (x y) (map + x y)) x y))
(define (transpose m)
(apply map list m))
;activation functions
(define (step-function z)
(if (< 0 z) 1 0))
;get result of neural net computation
(define (feed-forward a-f a w b)
(if (null? w) a
(feed-forward
a-f
(map a-f (map car
(matrix-add
(matrix-multiply
(car w) (transpose (list a)))
(car b))))
(cdr w)
(cdr b))))
;example: add two bits X and Y
;output is list (SUM CARRY-BIT)
(define (add-two-bits x y)
(define weights '(
((1 0) (-2 -2) (0 1))
((-2 -2 0) (0 -2 -2) (0 1 0))
((-2 -2 0) (0 0 -4))))
(define biases '(
((0) (3) (0))
((3) (3) (0))
((3) (3))))
(feed-forward step-function (list x y) weights biases))
```

```
1 ]=> (add-two-bits 0 1)
;Value 1: (1 0)
1 ]=> (add-two-bits 1 1)
;Value 2: (0 1)
```

Since I initially published this article, I’ve turned these foundations into an actual machine learning library called Neural.NET (however in F# instead of Scheme). Included in the library project is NeuralTeach, a command line program that trains a feed-forward net to recognize handwritten numbers.

I’ve also written a simple demo app for Windows Phone, NeuralAppTest that incorporates the trained net from NeuralTeach and the Neural.NET routines to recognize a hand-drawn digit.

Please send any suggestions and corrections via e-mail!