The increased interest of the wider public in artificial intelligence (AI) and machine learning (ML) has, in my opinion, been driven by progress in two subfields: supervised learning and reinforcement learning. From the perspective of industry, it’s really the improvement in supervised learning that’s drawing attention. You can be fairly sure that when a product “uses AI” it’s going to involve supervised learning (if it uses any machine learning at all).
In this blog, I’m going to discuss some shortcomings of supervised learning and research addressing these shortcomings. Many have written similar articles and blogs before, so what I want to explore is the implications of this research for industry.
Consider training a classifier to distinguish between pictures of cats and pictures of dogs. The classifier takes an input image and maps it to either the label “cat” or the label “dog”. Supervised learning is all about learning mappings from input spaces to output spaces. In this example, the input space is the set of all possible images and the output space contains the two labels “cat” and “dog”.
Training this mapping requires lots of labeled data. For the example above, you need not only lots of pictures of cats and dogs but also access to the corresponding labels. This labeling is normally done by hand. Someone (or more commonly many people) must painstakingly go through every image you have of cats and dogs and hand label the image with either “cat” or “dog”. This is the current paradigm of supervised learning; training requires lots of labeled data.
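To make the paradigm concrete, here is a minimal sketch of supervised training in Python. The 2-D feature vectors and logistic-regression model are stand-ins of my own choosing for real images and deep networks; the point is simply that every training example must arrive paired with a hand-made label.

```python
import numpy as np

# Toy stand-in for the cat/dog task: each "image" is a 2-D feature
# vector and each label is 0 ("cat") or 1 ("dog"). Real systems use
# raw pixels and deep networks, but the supervised recipe is the same.
rng = np.random.default_rng(0)
cats = rng.normal(loc=[-1.0, -1.0], scale=0.5, size=(50, 2))
dogs = rng.normal(loc=[1.0, 1.0], scale=0.5, size=(50, 2))
X = np.vstack([cats, dogs])
y = np.array([0] * 50 + [1] * 50)  # the painstakingly hand-collected labels

# Logistic regression trained by gradient descent on (input, label) pairs.
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P("dog")
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

preds = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(int)
accuracy = np.mean(preds == y)
```

Even on this toy problem, the mapping is only learnable because the labels `y` exist; scale the idea up to millions of images and the labeling burden scales with it.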
Sample efficiency describes how much labeled data an algorithm needs to reach a given level of performance: the less data required, the more sample efficient the algorithm. Because supervised learning systems need lots of labeled data, they are very sample inefficient. This poor sample efficiency leads to the following issues:
The reasons outlined above show the need for algorithms that are far more sample efficient than supervised learning. Self-supervised learning is a broad class of techniques that can achieve this sample efficiency.
With supervised learning, a model takes some data and predicts a label. With self-supervised learning, the model takes some data and predicts an attribute of the same data. This definition is quite cryptic and is best understood through example. Some examples of self-supervised tasks include:
Although these examples are disparate, they all follow a common pattern: take some data, potentially augment the data, predict a property of the original data. Note that no labels are needed at any stage. The resulting representations can be used to train classifiers with much higher sample efficiency than standard supervised learning.
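As a sketch of that pattern, here is one common pretext task, rotation prediction, in Python. The 16x16 random arrays are hypothetical stand-ins for real unlabeled images: we augment each image with a random 90-degree rotation and use the rotation itself as the prediction target, so the supervision signal is manufactured from the data with no human labeling.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_pretext_batch(images):
    """Augment each image and return (augmented image, pseudo-label) pairs."""
    ks = rng.integers(0, 4, size=len(images))  # rotation class: 0, 1, 2 or 3
    rotated = np.stack([np.rot90(img, k) for img, k in zip(images, ks)])
    return rotated, ks  # the rotations play the role of labels, for free

unlabeled = rng.normal(size=(8, 16, 16))  # pretend 16x16 grayscale images
inputs, pseudo_labels = make_pretext_batch(unlabeled)
# A network trained to predict pseudo_labels from inputs must learn what
# "upright" looks like; the representations it builds can then be reused
# for downstream tasks such as cat vs dog with far fewer real labels.
```

The key design point is that `pseudo_labels` cost nothing to produce: the same unlabeled pool can generate fresh training pairs indefinitely.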
In the preceding section, I listed four shortcomings of supervised learning. By severely reducing the dependence on labeled data, self-supervised learning addresses the first three shortcomings that I listed.
It also moves us closer to the lower bound on potential sample efficiency mentioned in the fourth point. It is widely thought that most of the learning infants do is through self-supervision. In that sense, the current self-supervised methods being explored bring us slightly closer to recreating the algorithm used by the human neocortex.
It is worth stressing that self-supervised learning is still in its infancy and is just one set of techniques for improving sample efficiency. Some of the research we have been doing at Speechmatics investigates how meta-learning can improve sample efficiency. Below you can watch a video summarizing a paper we presented at the MetaLearn workshop at NeurIPS last year.
Although I have very low confidence about whether it will be achieved through self-supervision, meta-learning or some other technique, I am confident that machine learning will only become more sample efficient as the years pass.
From my admittedly narrow experience of how machine learning is being used in wider industry, it appears that the “ML ecosystem” is developing to account for poor sample efficiency. For instance:
All of these are valuable and logical phenomena in the current ML paradigm. However, I don’t think it is unreasonable to expect this to significantly change if very sample efficient algorithms are developed.
On the demand side, the need for large-scale data-labeling services will shrink as algorithms require less labeled data. “Cleaning” a dataset will also look very different. You don’t need labels to be consistent and correct if there are no labels in the first place.
On the supply side, there is far, far more unlabeled data available for free (think the entire Internet) than there is labeled data altogether. There is less of a business case for selling labeled data when algorithms can make use of the abundance of unlabeled data available for free.
It’s likely that the most dramatic changes will be due to second-order effects which I am absolutely not going to speculate about.
That all said, what is far more important than any of these individual effects is realizing that machine learning algorithms are going to change, hopefully significantly, over the next 10 years. The current shortcomings will be ameliorated, and new problems will arise. This will be a threat to some and an opportunity for others.
For businesses building tooling around ML, there should at least be an awareness that deep learning is only a paradigm of machine learning and a different paradigm may dominate in the future. When shooting a moving target, those who aim at the target will inevitably miss. The best marksmen aim at where the target will be in the future.
Sam Ringer, Speechmatics