In Part 1 of our series, we looked at machine learning, including supervised learning, unsupervised learning, and reinforcement learning. Now we’re going to dive a little deeper into how supervised learning works.
The output of a machine learning algorithm is a “model”: a procedure for processing data to generate outputs. Different types of machine learning algorithms are used to create different types of models. One of them is the supervised classifier.
What is a Supervised Classifier?
The goal of a classifier is to take a series of inputs and classify them into different categories (this process is also known as labeling). For instance, we might have a public discussion forum for which we want to automate some of the moderation. We want the algorithm to classify the posts into groups such as likely spam, threatening messages, or conventional conversation. Or we might want to do something entirely different, such as writing medical software which will scan images and determine whether the tissue they show is cancerous or non-cancerous.
In both cases, because we are the ones defining the categories, we must use a supervised learning model. In other words, we must find some way to tell the machine learning algorithm about our categories. In unsupervised learning, by contrast, we would just give our algorithm a pile of posts and let it come up with its own categorization system, which may or may not fit our needs. The machine simply breaks the data into the most obvious and straightforward groupings, without regard to the actual needs of the system.
Instead, with supervised learning, we tell the computer both the questions (the input data) and the answers (the classification) for a set of data. The computer then develops an internal model which relates the questions and the answers. This model can then be applied to future data sets which have not been classified so that the labels can be applied in the same way.
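As a concrete sketch, a supervised training set is nothing more than a list of input/answer pairs. The forum posts below are invented purely for illustration:

```python
# Each training example pairs an input (the post text) with its
# pre-assigned answer (the classification). These posts are made up
# for illustration only.
training_data = [
    ("Click here for FREE prizes!!!", "likely spam"),
    ("I know where you live.", "threatening"),
    ("Does anyone have tips for growing tomatoes?", "conventional"),
]
```

The learning algorithm's job is to turn a list like this into a model that can assign the same kinds of labels to posts it has never seen.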
Algorithms Commonly Used for Classification
While new algorithms are always being developed, here we will present a few that are commonly used today. They can be broken down into several categories.
A binary classifier algorithm only works when there are two categories and everything belongs to one of the two. Logistic regression is one of the oldest techniques used for this task. It fits a logistic (S-shaped) curve to the data, yielding a probability that an input belongs to one of the two categories; a cutoff (typically 0.5) then determines the classification.
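As a rough sketch of the idea (not a production implementation), a one-feature logistic regression can be trained with simple gradient descent. The feature here, a made-up “suspicious link count” for the spam example, is purely illustrative:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=1000):
    """Fit the weight and bias of a single-feature logistic regression."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)   # predicted probability of class 1
            w -= lr * (p - y) * x    # gradient step on the weight
            b -= lr * (p - y)        # gradient step on the bias
    return w, b

# made-up feature: number of suspicious links in a post
# label: 0 = not spam, 1 = spam
xs = [0, 1, 1, 4, 5, 5]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)

def predict(x):
    # apply the 0.5 cutoff to the fitted logistic curve
    return 1 if sigmoid(w * x + b) >= 0.5 else 0
```

After training, posts with few suspicious links fall on the low side of the curve and link-heavy posts fall on the high side.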
A multiclass classifier, as you may have guessed, works with any number of categories for classification. The simplest such algorithm is k-Nearest Neighbor. In this algorithm, the training data is itself the model. Then, when a new instance is presented, the algorithm searches the training data for the examples nearest to that instance. You decide ahead of time how many nearest examples to consider (this is the “k”). These training examples then “vote,” and the most common classification among them wins.
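A minimal k-Nearest Neighbor classifier can be written in a few lines. The two-feature points below are invented for illustration (imagine something like links per post and exclamation marks per post):

```python
from collections import Counter

def knn_classify(training, point, k=3):
    """training: list of (feature_tuple, label) pairs."""
    # sort the training data by squared Euclidean distance to the new point
    by_distance = sorted(
        training,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], point)),
    )
    # let the k nearest examples vote; the most common label wins
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# invented two-feature points, e.g. (links per post, exclamation marks)
training = [
    ((1, 1), "ok"), ((1, 2), "ok"), ((2, 1), "ok"),
    ((8, 8), "spam"), ((8, 9), "spam"), ((9, 8), "spam"),
]

knn_classify(training, (2, 2))  # nearest neighbors are all "ok"
```

Note that no training step happens at all: the data itself is consulted at prediction time, which is why k-Nearest Neighbor is sometimes called a “lazy” learner.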
Another popular system for this purpose is a Bayesian network. Bayesian networks operate by modeling conditional dependence. For each feature of a data set, the Bayesian network asks, “What is the probability of a given classification, given this particular feature?” While Bayesian networks can also model dependencies among features, oftentimes a naive Bayes classifier is used, which simply assumes no dependence among the features.
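A toy naive Bayes text classifier shows the idea: count how often each word appears under each label, then combine per-word probabilities (with add-one smoothing so unseen words don't zero everything out). The messages are invented for illustration:

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs):
    """docs: list of (word_list, label) pairs."""
    word_counts = defaultdict(Counter)  # per-label word frequencies
    label_counts = Counter()            # number of documents per label
    vocab = set()
    for words, label in docs:
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return word_counts, label_counts, vocab

def classify(model, words):
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # log prior plus the log likelihood of each word,
        # with add-one (Laplace) smoothing for unseen words
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for w in words:
            score += math.log((word_counts[label][w] + 1) /
                              (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

docs = [
    ("buy cheap pills now".split(), "spam"),
    ("cheap offer buy now".split(), "spam"),
    ("meeting agenda for tomorrow".split(), "ok"),
    ("lunch tomorrow with the team".split(), "ok"),
]
model = train_naive_bayes(docs)
classify(model, "cheap pills".split())  # -> "spam" on this toy data
```

The “naive” part is that each word's probability is multiplied in independently, ignoring any dependence between words.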
Other types of multiclass classifiers include:
• decision tree: a system where the data is separated, one variable at a time, according to individual features of the data
• support vector machine: an approach which finds the boundary that separates the categories with the widest possible margin
• neural network: a system which aims to mimic the neurons in your brain
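A decision tree, for instance, amounts to a cascade of single-variable tests. A hand-built sketch for the forum example, using made-up features (a real tree would be learned from the training data rather than written by hand):

```python
def classify_post(word_count, link_count, has_threat_words):
    # hand-built decision tree: each branch tests one variable at a time
    if has_threat_words:
        return "threatening"
    if link_count > 3:
        return "likely spam"   # link-heavy posts
    if word_count < 5 and link_count > 0:
        return "likely spam"   # very short posts that still carry a link
    return "conventional"
```

Tree-learning algorithms pick which variable to test at each node, and with what threshold, so as to split the training data as cleanly as possible.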
Neural networks, in particular, are widely used enough (and complex enough) to deserve their own article, so I won’t go into detail about them here.
A multilabel classifier is a multiclass classifier used in cases where a given example may belong to more than one classification. The problem is usually broken down into multiple subproblems. For instance, if you have a set of labels, you can create a binary classifier for each label, where the two classifications will be “has this label” or “doesn’t have this label.” You then train one such binary classifier per label, yielding a separate model for each. When unknown data is examined, each binary classifier is used to determine whether or not its associated label should be applied.
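Here is a sketch of this “one binary classifier per label” approach (often called binary relevance). The keyword-overlap “classifier” is a deliberately simple stand-in for any real binary classifier, and the posts are invented:

```python
# toy dataset: each post may carry zero, one, or several labels
posts = [
    ("free money click here".split(), {"spam"}),
    ("I will find you".split(), {"threat"}),
    ("free money I will find you".split(), {"spam", "threat"}),
    ("see you at the meeting".split(), set()),
]

def train_keyword_model(posts, label):
    # stand-in binary classifier: remember words that appear in
    # positive examples for this label but never in negative ones
    pos, neg = set(), set()
    for words, labels in posts:
        (pos if label in labels else neg).update(words)
    return pos - neg

def predict_labels(models, words):
    # binary relevance: ask every per-label model independently
    # and collect each label whose model fires
    return {label for label, cues in models.items() if set(words) & cues}

models = {label: train_keyword_model(posts, label)
          for label in ("spam", "threat")}
```

Because each label gets its own independent yes/no decision, a single post can come back with no labels, one label, or several at once.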
In short, supervised classification works by providing training data where each input example is pre-classified into one of two or more categories. The system then uses this data to build a model that can be used to classify examples that have not been seen before. Many algorithms with different theoretical underpinnings are available for data scientists to use.
In any event, breaking the components of machine learning down into smaller, simpler concepts, as I am doing here, shows that it is not as hard a field to grasp as some may have feared.
Jonathan Bartlett is the Research and Education Director of the Blyth Institute.
See also: Part 1: Navigating the Machine Learning Landscape. To choose the right type of machine learning model for your project, you need to answer a few specific questions.