_**With recent advances in machine learning techniques, vendors like [Nvidia][1], [Intel][2], [AMD][3] and [IBM][3] are announcing hardware offerings specifically tailored around machine learning. In this post we examine the key differences between “traditional” software and machine learning software and why those differences necessitate a new type of hardware stack.**_
Most readers would certainly be forgiven for wondering why NVidia (NVDA on the stock market), a company that rose to prominence for manufacturing and distributing graphics processing chips to video game enthusiasts, is suddenly being mentioned in tandem with machine learning and AI products. You would also be forgiven for wondering why machine learning needs its own hardware at all. Surely a program is a program, right? To understand how these things are connected, we need to talk a little bit about how software runs and the key differences between a procedural application that you’d run on your smartphone and a deep neural network.
<figure><img loading="lazy" src="https://i1.wp.com/openclipart.org/image/2400px/svg_to_png/28411/freephile-Cake.png?resize=293%2C210&ssl=1" alt="Cake by freephile" width="293" height="210" data-recalc-dims="1" /><figcaption class="wp-caption-text">An algorithm is a lot like a cake recipe</figcaption></figure>
You can think of software as a series of instructions. In fact, that’s all an algorithm is. A cooking recipe that tells you how to make a cake step-by-step is a real world example of an algorithm that you carry out by hand every day.
Traditional software is very similar to a food recipe in principle.
1. First you define your variables (a recipe tells you what ingredients you need and how much you’ll need for each).
2. Then you follow a series of instructions. (Measure out the flour, add it to the mixing bowl, measure out the sugar, add that to the bowl).
3. Somewhere along the way you’re going to encounter conditions (mix in the butter until the mixture is smooth or whip the cream until it is stiff).
4. At the end you produce a result (i.e. you present the cake to the birthday girl or boy).
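
The four steps above map quite directly onto code. Here is a minimal sketch in Python, assuming made-up ingredient quantities and a toy model of “mixing until smooth”; every name in it (`bake_cake`, `lumps`, `stirs`) is hypothetical and only there for illustration.

```python
def bake_cake():
    # 1. Define the variables (the ingredients and how much of each).
    #    The quantities are illustrative, not a real recipe.
    flour_g = 200
    sugar_g = 150
    bowl = []

    # 2. Follow a series of instructions, strictly in order.
    bowl.append(("flour", flour_g))
    bowl.append(("sugar", sugar_g))

    # 3. Handle a condition: keep mixing until the mixture is smooth.
    lumps = 5   # toy assumption: the mixture starts with 5 lumps
    stirs = 0
    while lumps > 0:
        lumps -= 1  # each stir removes one lump in this toy model
        stirs += 1

    # 4. Produce a result.
    return {"ingredients": bowl, "stirs": stirs, "smooth": lumps == 0}


cake = bake_cake()
print(cake)
```

Notice that each line only makes sense once the line before it has finished, which is exactly the kind of work a one-instruction-at-a-time processor handles well.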
A traditional Central Processing Unit (CPU) that you’d find in your laptop, mobile phone or server is designed to process one instruction at a time. When you are baking a cake that’s fine because often the steps are dependent upon each other. You wouldn’t want to beat the eggs, put them in the oven and start pouring the flour all at the same time because that would make a huge mess. In the same way, it makes no sense to send each character in an email at the same time unless you want the recipient’s message to be garbled.
## Parallel Processing and “Dual Core”
<figure style="width: 273px" class="wp-caption alignleft"><img loading="lazy" src="https://i0.wp.com/openclipart.org/image/2400px/svg_to_png/25734/markroth8-Conveyor-Belt.png?resize=273%2C114&ssl=1" alt="Conveyor Belt by markroth8" width="273" height="114" data-recalc-dims="1" /><figcaption class="wp-caption-text">CPUs have been getting faster at processing, like more and more efficient cake-making production lines</figcaption></figure>
Over the last two decades, CPUs have got faster and faster, which effectively means that they can get through more and more instructions per second, still one at a time. Imagine moving from one person making a cake to a machine that makes cakes on a conveyor belt. However, consumer computing has also become more and more demanding and, with many homes globally connected to high-speed internet, multitasking (running more than one application on your laptop at the same time, or looking at multiple tabs in your browser) is becoming more and more common.
Before parallel processing (machines that advertise being “dual core”, and more recently “quad core” and even “octo-core”), computers appeared to run multiple applications at the same time by doing a little bit of each application and then switching between them. Continuing our cake analogy, this would be like putting a chocolate cake in the oven and then proceeding to mix the flour and eggs for a vanilla sponge, all the while periodically checking that the chocolate cake isn’t burning.
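
To make that switching concrete, here is a rough sketch of time-slicing on a single core, written in Python. It is a toy model rather than how a real operating system scheduler works: the `bake` and `time_slice` names are hypothetical, and each “task” is simply a generator that pauses after every step so that another task can have a turn.

```python
def bake(name, steps):
    """A task that pauses (yields) after every step so another task can run."""
    for step in range(1, steps + 1):
        yield f"{name}: step {step}"


def time_slice(tasks):
    """Run one step of each unfinished task in turn on a single 'core'."""
    log = []
    queue = list(tasks)
    while queue:
        task = queue.pop(0)
        try:
            log.append(next(task))
            queue.append(task)  # not finished yet, so back of the queue
        except StopIteration:
            pass  # task finished, so drop it
    return log


log = time_slice([bake("chocolate", 2), bake("vanilla", 3)])
print(log)
```

The log alternates between the chocolate and vanilla cakes until the chocolate one is done, then finishes the vanilla. Only one step ever runs at a time; the two cakes just appear to progress together.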
Multi-processing (dual/quad/octo core) allows your computer to really run multiple programs at the same time, rather than appearing to. This is because each chip has 2 (dual), 4 (quad) or 8 (octo) CPU cores all working on the data at the same time. The cake analogy is that we now have 2 chefs or even 2 conveyor belt factory machines.
## How [Deep] Neural Networks Work
Neural Networks are modelled around how the human brain processes and understands information. Like a brain, they consist of neurons which get excited under certain circumstances like observing a particular word or picture and synapses which pass messages between neurons. Training a neural network is about strengthening and weakening the synapses that connect the neurons to manipulate which neurons get excited based on particular inputs. This is more or less how humans learn too!
The thing about human thinking is that we don’t tend to process the things we see and hear in small chunks, one at a time, like a traditional processor would. We process a whole image in one go, or at least it feels that way, right? Our brains do a huge amount of parallel processing. Each neuron in our retinas receives a small part of the light coming in through our eyes and, through communication via the synapses connecting our brain cells, we assemble a single coherent image.
Simulated neural networks work in the same way. In a model learning to recognise faces in an image, each neuron receives a small part of the picture – usually a single pixel – carries out some operation and passes the message along a synapse to the next neuron, which carries out an operation of its own. The calculations that each neuron makes are largely independent, unless it is waiting for the output from a neuron in the next layer up. That means that while it is possible to simulate a neural network on a single CPU, it is very inefficient because the CPU has to calculate each neuron’s verdict about its pixel one after another. It’s a bit like the end of the Eurovision Song Contest, where each country is asked for its own vote over the course of about an hour. Or, if you’re unfamiliar with our wonderful but [obscure European talent contest][4], you could say it’s a bit like a government vote where each representative has to say “Yea” or “Nay” one after another. Even with a dual, quad or octo core machine, you can still only simulate a small number of neurons at a time. If only there was a way to simulate many more of them at once…
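
The independence of the neurons within a layer can be shown with a small sketch in plain Python. The 4-pixel “image”, the weights and the `neuron` function below are all made up for illustration, and for simplicity each neuron here looks at every pixel rather than a single one.

```python
import math


def neuron(inputs, weights, bias):
    """One simulated neuron: a weighted sum of its inputs squashed by a sigmoid."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))


# Hypothetical 4-pixel image and a layer of 3 neurons (made-up weights and biases).
pixels = [0.0, 0.5, 1.0, 0.25]
layer = [
    ([0.2, -0.1, 0.4, 0.0], 0.1),
    ([-0.3, 0.8, 0.1, 0.5], 0.0),
    ([0.05, 0.05, 0.05, 0.05], -0.2),
]

# On a single CPU core the neurons are evaluated one after another...
in_order = [neuron(pixels, w, b) for w, b in layer]

# ...but no neuron in the layer needs another's result, so evaluating them
# in a different order (or all at once on parallel hardware) changes nothing.
reversed_order = [neuron(pixels, w, b) for w, b in reversed(layer)]
reversed_order.reverse()

print(in_order == reversed_order)
```

Because the order makes no difference, the only thing stopping us from computing all three outputs simultaneously is the number of cores available.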
<figure><img loading="lazy" src="https://i1.wp.com/openclipart.org/image/2400px/svg_to_png/213387/Video-card.png?resize=273%2C198&ssl=1" alt="Video card by jhnri4" width="273" height="198" data-recalc-dims="1" /><figcaption class="wp-caption-text">GPUs with sporty go-faster stripes are quite common in the video gaming market.</figcaption></figure>
GPUs, or Graphics Processing Units, are microprocessors that were historically designed for running graphics-based workloads such as rendering 3D models in video games or animated movies like Toy Story or Shrek. Graphics workloads are also massively parallel in nature.
An image on a computer is made up of a series of pixels. In order to generate a coherent image, a traditional single-core CPU has to calculate what colour each pixel should be, one by one. A modern (1280×1024) laptop screen is made up of 1,310,720 pixels – that’s 1.3 million pixels. If we’re watching a video, which usually runs at 30 frames per second, we’re looking at nearly 40 million pixels per second that have to be processed. That is a LOT of processing. If we’re playing a video game, then on top of this your CPU has to deal with all the maths that comes with running around a virtual environment and the behaviours and actions of the in-game characters. You can see how things could quickly add up and your machine grind to a halt.
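
The arithmetic behind those figures is easy to check. The snippet below simply multiplies out the screen resolution and frame rate quoted above; the variable names are ours.

```python
width, height = 1280, 1024
frames_per_second = 30

pixels_per_frame = width * height
pixels_per_second = pixels_per_frame * frames_per_second

print(pixels_per_frame)   # 1310720
print(pixels_per_second)  # 39321600
```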
GPUs, unlike CPUs, are made up of thousands of processing cores (that’s right, not two or eight but thousands) so that they can do a lot of that pixel rendering in parallel. The video below, which is also hosted on the [NVidia website][5], gives an amusing example of the differences here.
GPUs trade speed at handling sequential functions for their massively parallel nature. Back to the cake analogy, a GPU is more like having 10 thousand human chefs, whereas a CPU is like having 2 to 8 cake-factory conveyor machines. This is why traditional CPUs remain relevant for running traditional workloads today.
## GPUs and Neural Networks
In the same way that thousands of cores in a GPU can be leveraged to render an image by rendering all of the pixels at the same time, a GPU can also be used to simulate a very large number of neurons in a neural network at the same time. This is why NVidia et al., formerly famous for rendering cars and tracks in your favourite racing simulation, have moved on to steering real self-driving cars via a simulated deep neural network.
You don’t always need a GPU to run a Neural Network. When building a model, the training is the computationally expensive bit. This is where we expose the network to thousands of images and change the synapse weights according to whether the network provided the correct answer (e.g. is this a picture of a face? Yes or no?). Once the network has been trained, the weights are frozen and typically the throughput of images is a lot lower. Therefore, it can sometimes be feasible to train your neural network on more expensive GPU hardware and then query or run it on cheaper commodity CPUs. Again, this all depends on the amount of usage that your model is going to be getting.
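
The split between expensive training and cheap querying can be sketched with the simplest possible “network”: a single simulated neuron (a perceptron). This is only an illustration under stated assumptions. The `train` and `predict` functions are hypothetical names, and instead of thousands of images the neuron learns a four-row yes/no table (the logical AND of two inputs).

```python
def predict(weights, bias, x):
    """Querying the model: one cheap weighted sum and a threshold."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def train(samples, epochs=20):
    """Training: many passes over the data, nudging the weights after each mistake."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)  # 0 when the answer was right
            weights = [w + error * xi for w, xi in zip(weights, x)]
            bias += error
    return weights, bias


# Toy stand-in for "is this a picture of a face? Yes or no?"
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train(samples)  # the expensive part, done once
predictions = [predict(weights, bias, x) for x, _ in samples]  # weights now frozen
print(predictions)
```

Training loops over the whole data set again and again, while each later query is a single call to `predict` with the frozen weights. Scale that difference up to millions of neurons and you can see why the training step is the one that benefits most from a GPU.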
## Final Thoughts
In a world where machine learning and artificial intelligence software are transforming the way we use computers, the underlying hardware is also shifting. In order to stay relevant, organisations must understand the difference between CPU and GPU workloads and, as they integrate machine learning and AI into their businesses, make sure that they have the right hardware available to run these tasks effectively.