The role of novel ideas in neural network research & applications

Public opinion has it that the current success of deep learning is built upon ideas from the 1970s and 1980s, enhanced by the availability of (a) increased computational power and (b) “big” data. Some add that a few minor algorithmic improvements have also been involved (e.g., batch normalization, weight initialization [see also explanation here], ReLU [rectified linear unit] nonlinearities, dropout regularization, or the Adam optimizer). Building deep neural networks, the legend goes on, then boils down to clever engineering of the knobs and faders of these “black boxes”; nobody really understands why they work or how they produce their results: pure black magic at worst, empiricism instead of science at best.

A prime example, at first glance, seems to be Google’s famous Inception architecture: a very wide & deep CNN that won the ImageNet ILSVRC’14 competition using an ensemble of 7(!) 22-layer(!) networks. Sounds like raw computational power (as only Google can muster), thrown at a huge pile of data, and thus winning by brute force.

But public opinion is not always right.

The original Inception paper* from Google has this sentence: “[M]ost of this progress is not just the result of more powerful hardware, larger datasets and bigger models, but mainly a consequence of new ideas, algorithms and improved network architectures”. It goes on to show how the merger of two beautiful ideas resulted in a network that is very deep & wide, yet sparsely connected and explicitly designed to run on small-scale hardware (mobile phones), using considerably fewer parameters than comparable standard CNNs (a sketch of the resulting module follows the list below). These ideas are as follows:

  • A convolutional kernel in a CNN layer can be viewed as a small “model within the larger neural network model” that essentially resembles a generalized linear model (GLM). Being linear, it is limited in expressiveness. This paper explored the idea of replacing the typical GLM with a small embedded neural network of its own, with positive results (see the first code sketch after this list).
  • Quite a few myths surround the challenge of finding the right network architecture for a given problem. This purely theoretical work (remarkably, a collaboration between academia and industry – so much for the boundaries of “applied science”) showed a way to construct a good architecture iteratively (layer by layer) by exploiting the local correlation structure of neurons that fire together; only those neurons that typically fire together are then wired together (this goes back to Donald Hebb’s research in neuroscience).
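
To make the first idea concrete, here is a minimal sketch of how the “network in network” idea is typically realized: the single linear kernel is followed by a micro-MLP, implemented as 1×1 convolutions that mix channels at every spatial position. The sketch uses PyTorch purely for illustration; the channel sizes are my own choices, not taken from the paper.

```python
import torch
import torch.nn as nn

# A standard convolutional layer: each kernel is a linear map (a "GLM")
# over its receptive field, followed by a nonlinearity.
plain_conv = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=5, padding=2),
    nn.ReLU(),
)

# The "network in network" idea: replace the single linear kernel with a
# tiny multilayer perceptron, implemented as 1x1 convolutions that mix
# channels at every spatial position.
mlpconv = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=1),  # first "hidden layer" of the micro-MLP
    nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=1),  # second "hidden layer"
    nn.ReLU(),
)

x = torch.randn(1, 3, 32, 32)                  # a dummy batch of RGB images
print(plain_conv(x).shape, mlpconv(x).shape)   # both: torch.Size([1, 64, 32, 32])
```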

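And here is a minimal sketch of how the two ideas combine into an Inception-style module: several filter sizes run in parallel (making the module wide), while 1×1 convolutions reduce the channel dimension first, which keeps the parameter count far below that of a single dense convolution of comparable width. Again this is PyTorch with illustrative channel sizes, not the exact GoogLeNet configuration.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """A simplified Inception-style module (channel sizes are illustrative)."""

    def __init__(self, in_ch):
        super().__init__()
        # Parallel branches with different filter sizes ("wide"); the 1x1
        # convolutions shrink the channel dimension first, which keeps the
        # module cheap in parameters despite its width.
        self.branch1 = nn.Conv2d(in_ch, 64, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, 32, kernel_size=1),            # 1x1 reduction
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=1),            # 1x1 reduction
            nn.Conv2d(16, 32, kernel_size=5, padding=2),
        )
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 32, kernel_size=1),
        )

    def forward(self, x):
        # Concatenate all branches along the channel dimension.
        return torch.cat(
            [self.branch1(x), self.branch3(x), self.branch5(x), self.branch_pool(x)],
            dim=1,
        )

module = InceptionModule(in_ch=192)
x = torch.randn(1, 192, 28, 28)
print(module(x).shape)                              # torch.Size([1, 192, 28, 28])
print(sum(p.numel() for p in module.parameters()))  # ~59k parameters, vs. ~922k for one
                                                    # dense 5x5 conv mapping 192->192 channels
```
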
This shows beautifully that at the core of recent deep learning successes there are indeed fundamentally new ideas, well backed up by theory and even biologically plausible. I conclude that we should let go of the following 3 urban legends:

  • neural networks being black boxes
  • their success being a surprise, brought about by mere clever engineering and brute force
  • “making them work” being alchemy (Lukas Tuggener’s term) or, again, brute force

Or, as Ismail Elezi put it: “We are getting closer to ‘we don’t understand how deep learning works’ becoming a total false statement”.

*) Great additional explanations of the inner workings of the Inception module can be found here; if you wonder about the 1×1 convolution, check here.

Written on December 14, 2016 (last modified: December 14, 2016)