The ups and downs of Neural Networks

Interestingly enough, the current hype topic in Machine Learning (rightly so!) is Deep Learning, which is largely the current term for ongoing research in Neural Networks.

Why is this interesting?

Neural Networks have a colourful history. First researched in the 1950s as a biologically inspired approach to machine learning (though that term wasn’t used back then), they repeatedly

  • excited the fantasies of users and laypeople,
  • stirred up and subsequently disappointed exaggerated expectations,
  • fell out of favour with an attitude of “we tried that; it didn’t work”,
  • only to rise again as new results showed their superiority over other approaches on previously unseen tasks.
[Image: NN-History]

Three famous NN publications: Minsky’s “Perceptrons”, Rumelhart’s “Parallel Distributed Processing” and LeCun’s “Gradient-based Learning Applied to Document Recognition”.

This up and down is readily observed in the literature (see the picture): inflated expectations in the 1950s and 1960s were shattered by (largely a misunderstanding of) Minsky et al.’s “Perceptrons” of 1969, which led to the first so-called “AI winter”. Rumelhart et al.’s “Parallel Distributed Processing” from 1987 swung the pendulum to the other side again with new results and ideas. These were again not directly transferable to all application domains, leading to another AI winter in the 1990s.

In 2006, Hinton et al. finally showed how to train deeper architectures of Neural Networks and thus realize their full potential. This kicked off a whole wave of new and continued dedicated “Deep Learning” (a.k.a. Representation Learning) research, which culminated in most pattern recognition benchmarks being won by a large margin by deep learning approaches within the last 4 years; see Schmidhuber’s excellent survey of the history of Deep Learning research and achievements here.

[Image: lecun-hinton-bengio-schmidhuber]

Big Names in Deep Learning (left to right): Yann LeCun, Geoffrey Hinton, Yoshua Bengio and Juergen Schmidhuber.

As an interesting side note, the heads of what are, in my perception, the top four research groups in Deep Learning worldwide are the ones who have helped shape the field since the 1980s (compare the second picture): Geoffrey Hinton, for example, was already a co-author of Rumelhart et al.’s work on Parallel Distributed Processing. Yann LeCun and Yoshua Bengio wrote an article in 1998 that is still foundational to the mastery of Convolutional Neural Nets (a.k.a. ConvNets), a recently very fashionable approach even beyond image processing. According to Jonathan Masci’s keynote at last year’s “Intelligent Systems and Applications” conference, most of what has changed since then consists of small algorithmic adjustments to facilitate the training of deeper architectures, plus huge improvements in hardware power (mainly via GPUs).
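
To give a flavour of what such a ConvNet looks like in practice, here is a minimal sketch using the current Keras API (purely illustrative; this is not the architecture from the 1998 paper, just a small stack of convolution and pooling layers followed by a dense classifier, e.g. for 28x28 grayscale digit images with 10 classes):

    # Minimal ConvNet sketch (illustrative example in today's Keras API)
    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        layers.MaxPooling2D((2, 2)),             # downsample feature maps
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),                        # feature maps -> feature vector
        layers.Dense(128, activation='relu'),
        layers.Dense(10, activation='softmax'),  # class probabilities
    ])

    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.summary()

Conceptually, this is not far from the LeNet-style networks of the late 1990s; what has changed most, as noted above, are the training refinements and the GPU hardware that make fitting much larger versions of such models practical.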

What are we going to do about it?

The Datalab invested considerably in parallel computing hardware last year in order to kick off Deep Learning research at ZHAW. A group of about 10 researchers has formed, reading up on the foundations and gaining momentum by implementing prototypes for various applications, including face recognition and OCR. The first projects have already been completed, and more are to come.

Written on April 9, 2015 (last modified: April 9, 2015)