AI Deep Dive

The neurons that a neural network uses are based on simple models of real neurons, which work like this:

The dendrites of a neuron collect incoming signals from other neurons. Those signals travel down the branches of the dendrite to the cell body, where the signals all accumulate. If enough signal is present, the neuron will fire, and send its own signal traveling down its axon. The signal will travel through the branches of the axon to all its synapses, where it connects with other neurons. Some synapses are big and others small, so the signal that gets passed along will be strong or weak depending on the size of that synapse. And the process continues in each connected neuron.

Modern neural networks use a mathematical model that mimics this process. This drawing helps describe the model:


The current state-of-the-art artificial intelligence methods are based on simple neuroscience models from the 1970s. These models, called neural networks, use very simple models of neurons and build connections between them using a powerful mathematical technique that many researchers don’t believe exists in the brain. Using these neural networks, computer scientists can automatically or semi-automatically perform many previously difficult tasks, like image and speech recognition.

Each incoming signal is a number, and that number is multiplied by the strength of the connection (what’s known as the connection “weight”). You can think of the weight as the importance of the incoming signal - the larger the weight, the more important the signal. The neuron adds all those weighted signals up (that’s the big Sigma box). Depending on the amount of signal the neuron receives, it will send out its own signal as a new number that will be received by all its downstream neurons. How the neuron determines how much signal to send along can be modeled in many ways. The simplest is to say that the neuron sends out the same signal, but only if enough signal has arrived from its upstream neighbors. (For those who want a little more realism, a common interpretation of this model neuron’s output is a firing rate.)
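The weighted-sum-and-threshold behavior just described can be sketched in a few lines of Python. This is a minimal illustration, not any standard library’s API; the function name, weights, and threshold are invented for the example:

```python
# A minimal sketch of the model neuron described above.
# Each input is multiplied by its connection weight, the products are
# summed (the big Sigma box), and a threshold decides whether the neuron
# passes the signal along.

def model_neuron(inputs, weights, threshold=1.0):
    """Return the neuron's output signal for a list of input signals."""
    total = sum(x * w for x, w in zip(inputs, weights))  # weighted sum
    return total if total >= threshold else 0.0          # fire only if enough signal arrives

# Three upstream neurons send signals; the middle connection is strongest,
# so its signal matters most.
print(model_neuron([0.5, 0.9, 0.2], [0.1, 1.5, 0.3]))
```

With these made-up numbers the weighted sum exceeds the threshold, so the neuron fires and passes its signal along; with all-small inputs it would output 0.0 instead.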


How do neural networks work?

Neural networks connect these model neurons together in all types of ways. How the network is laid out is known as the architecture. Here’s a common architecture:


The circles represent the model neurons, and the lines represent the connections between them. The lines are arrows, because information passes in one direction, from one neuron to the next. This architecture has its neurons organized into layers. Layers are a useful way to conceptually keep track of what a set of neurons is supposed to be doing -- more on that in a little bit. As an architecture gets bigger and bigger, it quickly gets complicated.
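One layer-to-layer step of this architecture can be sketched as follows. This is our own toy illustration, with made-up weights and a simple threshold activation, not a fragment of any real framework:

```python
# A sketch of information flowing from one layer to the next.
# Every neuron in a layer receives every output of the previous layer,
# weights it, sums it, and applies a simple thresholded activation.

def layer_forward(signals, weight_rows, threshold=0.0):
    """weight_rows[i] holds the incoming weights for neuron i of this layer."""
    outputs = []
    for weights in weight_rows:
        total = sum(s * w for s, w in zip(signals, weights))
        outputs.append(total if total >= threshold else 0.0)
    return outputs

# Three input neurons feeding a two-neuron hidden layer (weights invented).
hidden = layer_forward([1.0, 0.5, -0.5], [[0.2, 0.4, 0.1], [-0.3, 0.8, 0.5]])
print(hidden)
```

Stacking calls to a function like this, one per layer, is all a feedforward pass is: numbers go in one end, flow through the arrows, and come out the other end.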


The input layer takes in a set of numbers, one for each neuron in the layer. Those numbers represent some object, like an image. It doesn’t have to be an image, but images are common because trying to get computers to understand images like a human is a perfect target problem for neural networks.


For a computer, images are a 2D grid of dots, and the dots are called pixels. Each pixel has a specific color, and we represent that color to the computer as a set of numbers. If the image is black and white, one pixel has just one number. For a colored image, one pixel has three values: one number for the amount of red, one number for the amount of green, and one number for the amount of blue that, when combined, make pretty much every color we can see.
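The pixel encoding just described can be shown concretely. The particular values below are invented; 0-255 per channel is one common convention:

```python
# Sketch of how a computer stores pixel colors (one common convention uses
# values from 0 to 255 per channel; these particular values are made up).

grayscale_pixel = 128          # one number: a medium gray
color_pixel = (255, 165, 0)    # three numbers: lots of red, some green,
                               # no blue -- together they make orange

red, green, blue = color_pixel
print(red, green, blue)
```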


(The above network with three input numbers is too small for nearly any image that we care about, but you can imagine that if the input layer were bigger we could input our image by flattening it out: you take each row of pixels and tack them together into one super long column of numbers. If it’s a colored image, you just have three columns: one with all the reds, one with all the greens, and one with all the blues. The reds, greens, and blues get combined together in the hidden layers that follow. There is a special type of neural network called a convolutional network which allows you to skip this flattening process -- see below.)
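The flattening step is easy to show directly. Here is a sketch with a tiny made-up grayscale image:

```python
# Sketch of the flattening step: rows of pixels tacked together into one
# long column of numbers (a tiny 2x3 grayscale image; values invented).

image = [
    [0.0, 0.5, 1.0],   # first row of pixels
    [0.2, 0.7, 0.9],   # second row of pixels
]

flattened = [pixel for row in image for pixel in row]
print(flattened)  # one long list: [0.0, 0.5, 1.0, 0.2, 0.7, 0.9]
```

Each number in the flattened list would then feed one input neuron of the network.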


The output layer is designed so that each output neuron carries some meaning. From our dog/cat example above, you could have two output neurons: one for dog, one for cat. If the network thinks there’s a dog in the input image, then the output neuron for dog will output a strong signal.


The layers in between the output and the input layer are called “hidden” layers. The neurons in these layers are taking the input, and performing operations on it to produce the output.


Remember that the signal that each neuron receives comes with a weight. The big mathematical trick is figuring out how to set the weight of each connection so that for a given image, we get the right output.


When the network is first built, we don’t know what the weights should be, so we just set them to random numbers. So right out of the gate, the network is really unlikely to get the right answer when we show it an image. So we train it! We show the network a bunch of examples, the network gives us an answer, and we give the network feedback on whether its answer is right or wrong. If its answer is wrong, the network will start adjusting the weights of the connections, making the largest adjustments to the weights that contributed most to its wrong decision. It makes the adjustments starting at the output layer and working backwards towards the inputs (so this process is called backpropagation).
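For a single model neuron, this feedback loop can be sketched in a few lines. This toy version is not full backpropagation through many layers, and the learning rate, inputs, and target are all invented for illustration:

```python
# A toy sketch of the training feedback loop for one model neuron:
# compare the output to the right answer, then nudge each weight in
# proportion to how much its input contributed to the error.

def train_step(weights, inputs, target, learning_rate=0.1):
    output = sum(x * w for x, w in zip(inputs, weights))
    error = target - output   # feedback: how wrong was the answer?
    return [w + learning_rate * error * x for x, w in zip(inputs, weights)]

weights = [0.0, 0.0]          # start with arbitrary (here, zero) weights
for _ in range(100):          # show the same example many times
    weights = train_step(weights, inputs=[1.0, 2.0], target=1.0)
print(weights)                # the neuron's output for [1.0, 2.0] is now ~1.0
```

Each individual adjustment is tiny; only after many repetitions does the output settle near the target, which mirrors why real networks need to see examples so many times.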


The adjustments it makes are really small, and it can take a long time before a network can start to approach human accuracy at the task -- many times the network needs to see the examples hundreds of thousands of times before it’s considered trained.


There’s a lot of work that goes into making sure that the training is effective. The examples that are used need to be representative of the inputs that the network will see when it’s in use. We want networks to generalize about the inputs, so that when they see an input not shown in training, they can still get the right answer. If that doesn’t happen, we say that the network overfits. For example, if all the images we use to train our network have only black dogs and only white cats, the network may incorrectly learn that dogs are always black and cats are always white. If after training we showed the network a white dog, the network would likely mark it as an image with a cat.


What’s a convolutional neural network?

One of the useful things we’ve picked up from neuroscience is that real neurons in the visual system only respond to inputs that come from a very restricted region of the visual field. So rather than have a neuron that connects to every pixel in the input image, it need only connect to the pixels in some local area. Neuroscience has also led us to believe that groups of neurons in the same layer of the visual system are all looking for the same types of “feature,” but just in their local region of space. For example, in the first layer, there may be a set of neurons that are all looking for a horizontal line in their small rectangle of the image.

This is great! These insights led researchers to adjust their network architectures to take advantage of this local spatial importance (and save computation to boot). It’s an added constraint to the earlier versions of neural networks (convolutional networks are a subset of neural networks), but it was an important change that allowed the networks to learn more reliably and robustly.

It also allows us to introduce some new graphics.


Here are two layers in a convolutional neural network. Each neuron in the top layer only gets input from a local region of the bottom layer. And for each neuron in the top layer, all their input weights are the same. So the neurons are all learning to detect the same pattern, but each neuron is responsible for detecting that pattern in a different area of the input image. The pattern that each neuron is looking for is known as a feature. Mathematically, this procedure is called a convolution, and the feature is convolved with the input image to produce the output image.
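The sliding weighted sum just described can be sketched directly. The feature below is a made-up vertical-edge detector, and the image is a toy example:

```python
# Sketch of convolving one small feature with a grayscale image: the SAME
# weights are applied at every local region, producing a heatmap (the
# "feature map") of how well each region matches the feature.

def convolve(image, feature):
    fh, fw = len(feature), len(feature[0])
    out = []
    for i in range(len(image) - fh + 1):
        row = []
        for j in range(len(image[0]) - fw + 1):
            # Weighted sum over one local patch -- one output neuron's job.
            row.append(sum(image[i + di][j + dj] * feature[di][dj]
                           for di in range(fh) for dj in range(fw)))
        out.append(row)
    return out

# A made-up vertical-edge feature applied to an image with a vertical line.
image = [[0, 1, 0],
         [0, 1, 0],
         [0, 1, 0]]
feature = [[-1, 1],
           [-1, 1]]
print(convolve(image, feature))  # strong positive values where dark-to-bright edges sit
```

Note that every output value reuses the same four weights; only the patch of the image changes, which is exactly the weight sharing described above.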


(If you think hard, you can see how a convolutional layer is a subset of a fully connected layer. If this were actually a fully connected layer, each neuron in the top image would be connected to every neuron in the bottom image. In a convolution, most of the connections are zeroed out, and the connections that remain for one neuron are matched up to be identical across all neurons.)


So the input to a convolution is an image, with its 2D shape still intact (no need for the flattening mentioned earlier), and the output is also an image with a 2D shape. The output represents a heatmap of where in the input image the network sees areas that match well with the feature that’s being used, so we call the output a “feature map”.


This whole hierarchy of features is how neuroscientists believe the visual system is structured. One neuron detects a vertical line on the left, another neuron detects a vertical line on the right, and another neuron detects a horizontal line on the top. All those neurons are just single pixels in their own heatmap. The next layer might have a neuron that’s looking for those three neurons to fire for it to register a pattern and say, “that’s a table!”.


In the early stages of a convolutional network, you can look at the features to see what pattern they’re trying to detect, but beyond the first few layers, the features stop making sense, because they’re not detecting features in the original image, they’re detecting features on a heatmap generated from features made on a heatmap of earlier features. But if we keep creating layers of features, we eventually give the network enough neurons to learn the relevant hierarchy of patterns to help it solve the task.


Oftentimes, we stop using convolutions in the final layers of the network, and let those layers be fully connected to our final set of output neurons that indicate the meaning we hope to get out of the network. Constraining the early layers to find local features has proven to work, because spatial information means a lot at the beginning of understanding an image. But after enough convolutions, we let go of the spatial information, so that the neurons in higher layers can learn more global patterns about the image, like whether there’s a dog present anywhere in the image, not just in the lower lefthand corner.


Shortcomings of neural networks

While neural networks have helped programmers achieve human-level performance in a number of tasks, they still face a number of obstacles that limit their usefulness:

  • Networks require a very large, well-sampled set of training examples (poor one-shot learning)
  • Adjusting all the parameters in the training process is borderline a dark art
  • Structuring problems so that machines can understand them remains an open challenge


The hunt for the ultimate feature detector

One of the more seductive hypotheses in neuroscience is that there is a basic circuit of intelligence that’s just replicated all over the brain. If researchers could only figure out what that basic circuit is, how it’s wired, and how it wires together with other basic circuits, then we’ve cracked intelligence.


It’s unlikely that it will be that simple, but there are a few tantalizing observations:


  • Mammals all share a structure in the brain called the neocortex. It’s the outer sheet of neurons commonly called gray matter that from an evolutionary standpoint is a relatively recent development.
  • As mammals develop greater signs of intelligence, the ratio of neocortex to body mass increases. And of course, humans have the largest neocortex for our body size.
  • Under a microscope, the neocortex looks very similar across all mammals: (1) it has a column-like organization to it, with vertical lines of neurons, and (2) it’s organized into multiple layers, with certain layers hosting certain types of neurons.
  • When researchers measure the electrical responses of neurons to particular stimuli, neurons again seem to organize their responses along a column.


Could it be that these columns, known as cortical columns, are a basic computational unit of intelligence? It’s possible. Regardless, cortical columns are definitely an important object to study. Cortical columns have attracted a lot of research in neuroscience, including the famous Blue Brain Project in Europe, but no one has been able to definitively map all the connections between neurons in one column. That’s the target of the teams in IARPA’s MICrONS project.


Will this lead to human-level intelligence?

We don’t know. It’s science!


Zoe Gilette

These models can be very complicated, though some aspects are also very easy to understand. Here, we’ve displayed a popular pattern for connecting model neurons together. Generally, these are called model architectures, and this is one we use a lot. The input, usually an EM image, is fed into the left half of the diagram, and information flows through the connections to the output on the other side. Each circle in the diagram represents many different model neurons wired together into a module. You can think of each module as one step of processing on the way from input to output. In addition to these modules, there are connections between them which go up, down, and to the right. The “up” connections are similar to zooming out, so that the model can “see” more context at a lower level of detail. The “down” connections are similar to zooming back in, and the rightwards connections stay at the same level of detail. By having all three types of connections within our models, we allow the model to combine information from different levels of detail to make decisions about parts of an image.
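The three connection types can be sketched very roughly with one-dimensional “images.” These function names and numbers are our own invention, and a real architecture would chain many such modules with learned weights rather than these fixed stand-ins:

```python
# A very rough sketch of the three connection types described above,
# using 1-D "images" for brevity.

def zoom_out(signal):    # the "up" connection: average neighbors, losing detail
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]

def zoom_in(signal):     # the "down" connection: expand back to full size
    return [v for v in signal for _ in range(2)]

def sideways(signal):    # the "rightward" connection: same level of detail
    return [v * 1.0 for v in signal]   # stand-in for a learned processing step

x = [1.0, 3.0, 2.0, 4.0]
coarse = zoom_out(x)     # [2.0, 3.0] -- more context, less detail
# Combine fine detail with coarse context, as the architecture's wiring allows.
combined = [a + b for a, b in zip(sideways(x), zoom_in(coarse))]
print(combined)
```

The point of the sketch is only the wiring: a decision about any one position can draw on both the fine-grained signal and the zoomed-out context.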

Now artificial intelligence is helping neuroscientists come up with even more advanced models. Most EM labs use convolutional neural networks to help reconstruct the branches of the neurons in images, identify synapses between the cells, and assign labels to what is seen in the image, like marking an object as an axon, dendrite, cell body, glia, or blood vessel, or pointing out where there are mitochondria.

The hope is that neuroscience can help artificial intelligence in areas where it’s still struggling. The simple neural networks that people have developed are still far from human-level intelligence in many ways. Most artificial intelligence systems are more like savants than intelligent in the sense we usually mean: they excel at a specific task, but anything far outside their expected parameters does not compute. Some believe that neuroscience will help guide changes to AI neural networks so that they can learn from only a few examples instead of the thousands to millions they currently require, and so that they can easily identify similarities and differences between objects they have never seen before.

That’s the goal of the IARPA MICrONS project: to refine our simple artificial neural networks by studying the real neural networks of a brain.