How to Create Synthetic Data to Train Deep Learning Algorithms?
First, let’s bust a few myths.
If you Google ‘what’s needed for deep learning,’ you’ll find plenty of advice that says vast swathes of labeled data (say, millions of images with annotated sections) are an absolute must.
You’ll probably also read that it takes a lot of computer power. Then, there’s the small ask of having a Ph.D. in Maths, or a background in computer science, at least. You may well come away thinking that deep learning is for ‘superhumans only’: superhumans with supercomputers.
But the reality is much less stark. Deep learning can be for everyone. If you think we’re crazy, give us a few minutes of your time, and we will attempt to show you otherwise.
In this article, we’ll discuss why deep learning needs little beyond everyday tools and skills.
In fact, we’ll show you how it really can be for everyone. So if you, like many of us, don’t have a maths Ph.D. or access to a super-fast computer, and you would still like to explore the field, then this article is for you.
Yes, deep learning requires some understanding of programming (you’ll have to use Python). However, we’ll give you a step-by-step explanation of what’s going on in the code, while keeping things understandable, so there’s no need to worry if you haven’t used Python before.
To get you started, we’ll show you how to train a computer to classify pictures of cats and dogs (this is a classic example of how one might use deep learning and computer vision). And we’ll even take things a level further.
Our model won’t only recognize cats versus dogs, but it will also know the difference between different breeds of cats and dogs. Sounds interesting?
…Let’s get to it.
For this task, we’ll use cutting-edge software that will soon be readily available. At the time of writing, the ‘fastai library v2’ is in pre-release, and the accompanying online course (MOOC) is due to become publicly available around July 2020.
(Note: If you’re reading this article before the release date, use ‘fastai2’ — in place of ‘fastai’ — on import, as described in the code below.)
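As a minimal sketch of that note, the snippet below checks which of the two package names is installed before you write the import. The helper name `pick_fastai_package` is hypothetical and not part of either library; it relies only on Python's standard `importlib`.

```python
import importlib.util


def pick_fastai_package(candidates=("fastai", "fastai2")):
    """Return the first installed package name from candidates, or None.

    Hypothetical helper for illustration; it is not part of fastai.
    """
    for name in candidates:
        # find_spec looks up a top-level package without importing it.
        if importlib.util.find_spec(name) is not None:
            return name
    return None


package = pick_fastai_package()
if package is None:
    print("Neither fastai nor fastai2 is installed.")
else:
    # The import in the article's code would then use this package name.
    print(f"Use: from {package}.vision.all import *")
```

If the helper reports `fastai2`, swap that name into the first line of the training code.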
‘Fastai’ is an open-source, modern deep learning library. It sits on top of PyTorch. Alongside ‘fastai,’ we will use a computer with a graphics processing unit (GPU) so that we can get results as quickly as possible.
Now, let’s be clear. This task is not easy. Just ten years ago, few computers could have completed it. However, it is now possible thanks to the hard work of the many people who have advanced the hardware and software we’ll use today.
These are the kinds of toolkits that make deep learning accessible to all.
Right, it’s time to get stuck in.
Let’s train an image classifier to recognize breeds of cats and dogs. This model should get close to the best achievable accuracy within a few minutes of training on a single GPU.
A quick heads up: we will not be using the PEP8 coding style. Throughout this article, we decided to follow the ‘fastai’ style, which you can read more about here. Why not use PEP8? Let’s let Jeremy Howard, the library’s creator, answer that question:
“I don’t think it’s ideal for the style of programming that we use, or for math-heavy code. If you’ve never used anything except PEP 8, here’s a chance to experiment and learn something new!”
The following code is everything you need to perform the task:
from fastai.vision.all import *
path = untar_data(URLs.PETS)
dls = ImageDataLoaders.from_name_re(path=path,
learn = cnn_learner(dls, resnet34, metrics=error_rate)
Going one step further, these are the three tasks performed, once the code executes:
Now, let’s look at what each line of code actually does.
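One way to read such a pipeline line by line is the annotated sketch below. It is not the article's original listing: the argument values (image size, validation split, seed, number of epochs) and the helper names `breed_from_filename` and `train_pets` are assumptions made for illustration. It relies on the fact that files in the pets dataset are named after their breed, for example `great_pyrenees_173.jpg`, so a regular expression can recover the label.

```python
import re

# Pet image files are named "<breed>_<index>.jpg", e.g. "great_pyrenees_173.jpg".
PAT = r"^(.+)_\d+\.jpg$"


def breed_from_filename(name, pat=PAT):
    """Return the breed encoded in a file name (hypothetical helper)."""
    match = re.search(pat, name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    return match.group(1)


def train_pets(epochs=4):
    """Download the data, build the data loaders, and fine-tune a model."""
    # Imported here so the labelling helper above works without fastai installed.
    from fastai.vision.all import (ImageDataLoaders, Resize, URLs, cnn_learner,
                                   error_rate, get_image_files, resnet34,
                                   untar_data)

    # 1. Download and extract the dataset; returns the local folder.
    path = untar_data(URLs.PETS)
    # 2. Build training and validation loaders, labelling each image by
    #    applying the regex to its file name. Argument values are assumed.
    dls = ImageDataLoaders.from_name_re(
        path, get_image_files(path/"images"), pat=PAT,
        valid_pct=0.2, seed=42, item_tfms=Resize(224))
    # 3. Start from a pretrained ResNet-34 and track the error rate.
    #    (Recent fastai releases name this function vision_learner.)
    learn = cnn_learner(dls, resnet34, metrics=error_rate)
    # 4. Fine-tune the pretrained weights on the pet images.
    learn.fine_tune(epochs)
    return learn


print(breed_from_filename("great_pyrenees_173.jpg"))  # great_pyrenees
```

Calling `train_pets()` needs fastai installed and, realistically, a GPU; the labelling helper runs anywhere.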
Below is the training stats output.
As you can see, we can train a model pretty quickly (in less than 5 minutes) on a single computer with a GPU. GPU access is often available for free, for example through Google Colab or Kaggle Notebooks.
Once a Learner has been trained, we can see the classification error rate on the validation dataset alongside the training and validation losses. We can also visualize the results, using the following helper function:
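To make the error rate concrete: it is the share of validation images the model labels incorrectly, or one minus the accuracy. The plain-Python sketch below illustrates the calculation on made-up labels; the function name `classification_error_rate` is hypothetical, and fastai's own metric works on tensors, not lists.

```python
def classification_error_rate(predicted, actual):
    """Fraction of predictions that differ from the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one example")
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)


# Made-up validation labels, for illustration only.
actual = ["pug", "beagle", "Bengal", "pug", "Siamese"]
predicted = ["pug", "beagle", "Siamese", "pug", "Siamese"]

print(classification_error_rate(predicted, actual))  # 0.2
```

One wrong answer out of five gives an error rate of 0.2, which is the same as 80% accuracy.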
Once the model is trained, we can deploy it as a web application and make it available for others to use. Although fastai is mostly focused on model training, you can easily export the PyTorch model to use it in production with the command Learner.export.
The library provides two methods to evaluate the model on user-uploaded images:
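Two calls commonly used for this in fastai are `Learner.predict` for a single item and `Learner.get_preds` for a batch; whether these are the two the article had in mind is an assumption. The sketch below shows an export, reload, and single-image prediction round trip. The helper names `top_k` and `export_and_predict`, and the file name `export.pkl`, are illustrative choices.

```python
def top_k(probs, vocab, k=3):
    """Pair class names with probabilities and return the k most likely."""
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def export_and_predict(learn, image_path, export_name="export.pkl"):
    """Export a trained Learner, reload it, and classify one image."""
    from fastai.vision.all import load_learner  # requires fastai

    learn.export(export_name)  # written under learn.path
    inference = load_learner(learn.path/export_name)
    # predict returns the decoded label, its index, and per-class probabilities.
    label, _, probs = inference.predict(image_path)
    return label, top_k(probs.tolist(), inference.dls.vocab)


# The ranking helper can be tried without a trained model (made-up numbers).
print(top_k([0.1, 0.7, 0.2], ["beagle", "pug", "Bengal"], k=2))
# [('pug', 0.7), ('Bengal', 0.2)]
```

Returning the top few classes with their probabilities, and not only the single best label, lets a web app show how confident the model is.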
Jupyter Notebook, Voila, and ipywidgets offer an easy way to create web applications for machine learning projects. Using Binder (mybinder.org), we can also deploy an app to share with a team, for example. Click here for an example of a very simple app created to classify images into one of the 37 pet breeds (note: it takes a while to launch the app as it first creates the environment, then serves the app).
Now you know how to apply deep learning to image classification. But how much do you know about deep learning in general? If you have one more minute to spare, let’s take a quick look at its history and theory.
Deep learning stems from research into neural networks, which were inspired by how the human brain works.
This dates back to the 1940s.
Back then, researchers were starting to investigate the theory and practice behind neural networks. However, it wasn’t until the 1980s that researchers started adding layers to neural networks, thus making them ‘deep.’
And this is the field we’ve come to call deep learning.
Deep learning is a subset of machine learning. Machine learning is like regular programming: a way to get a computer to perform a specific task — say, ‘to recognize cats and dogs,’ as we covered in this article.
But it would be extremely difficult to use standard computer code to complete a task like that, which is why machine learning doesn’t tell a computer what steps to take.
Instead, we train an algorithm on examples so that it can figure out how to solve the problem itself. Most crucially of all, neural networks are flexible enough to be applied to a very wide range of problems using this method.
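A toy illustration of learning from examples, far simpler than a neural network: the sketch below recovers the rule y = 2x + 1 purely from input/output pairs using gradient descent. The function name `fit_line`, the learning rate, and the step count are assumptions chosen for this example.

```python
def fit_line(examples, lr=0.05, steps=2000):
    """Learn w and b in y = w*x + b from (x, y) examples by gradient descent."""
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in examples) / n
        # Step both parameters a little in the direction that lowers the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# The rule y = 2x + 1 is never written into fit_line; it only lives in the data.
examples = [(x, 2 * x + 1) for x in range(5)]
w, b = fit_line(examples)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

Deep learning follows the same loop of predicting, measuring the error, and adjusting parameters, only with millions of parameters in place of two.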
Neural networks are used in all sorts of modern-day applications, and deep learning is one of the best-known approaches. In many areas, it matches or even exceeds human performance.
Deep learning is here, it’s real, and it’s widely used across the following fields:
All said, deep learning can’t solve every problem.
The technology may seem magical at times. But there are limitations to what even state-of-the-art machine learning algorithms can do.
First, a model can only learn to recognize patterns that are present in the training dataset we share. In our example, we trained our model to identify photos of cats and dogs, so our model can classify photos of cats and dogs. But it wouldn’t be able to recognize drawings, for example, or anything other than photos, for that matter.
Second, training an algorithm needs correctly labeled data, also known as the ‘ground truth.’ In our case, that means labeling each picture with the correct breed of cat or dog.
The example used in this article was possible thanks to the fastai library, and its associated book and Deep Learning course, which will be publicly available around July 2020: these resources include examples of how to build and deploy state-of-the-art deep learning image classifiers.
Fastai is a project led by the Fast.ai team (Howard et al.). It is built on top of PyTorch, and it’s primarily designed for interactive development with notebook-oriented development systems, like Jupyter Notebook.
The mission of the Fast.ai team is to provide:
Additionally, we used a GPU deep learning server, run on Linux.
There are plenty of options to access a computer that already has everything set up and ready to use: some options are free, like Google Colab, Paperspace, or Kaggle Notebooks; others are paid, but cost as little as US$0.25 per hour.
If you use a paid option, remember to shut down your instance to avoid paying for it when you don’t need it (note: shutting down your browser or local computer is not enough!).
And there you have it — do you still think we’re crazy? We hope not 🙂