A Brief Introduction to Machine Learning

In addition to reading about reinforcement learning, I have also been following Andrew Ng’s course on machine learning (hosted on Coursera), to understand some of the fundamentals of machine learning as a whole. In this post, I’m going to discuss some of the basic principles of machine learning and hopefully follow on with more posts as I keep working through the course.

First of all, what is machine learning? It is a way of using computers to complete tasks without explicitly programming them to do those tasks [1]. Machine learning is considered a subset of artificial intelligence and is used where writing an explicit program to do something would be very difficult [1]. Ng offers classifying spam e-mails as an example of a task well suited to machine learning. Writing a program to determine whether e-mails are spam is difficult, because a human programmer would have to exhaustively list all the keywords and clues that suggest an e-mail is spam. Instead, using a machine learning algorithm to train on a large dataset and learn a mathematical model that can predict whether an e-mail is spam is a much faster and more effective solution [2]. Machine learning also benefits from modern hardware in the same way that reinforcement learning does - powerful GPUs make it possible for machine learning algorithms to train on more data than a human could see in a lifetime.

There are two main categories of machine learning in use today, and they describe how much help a human designer gives the algorithm in learning the right mathematical model. The first category is called supervised learning, and it gives the algorithm a lot of help, in the form of a labeled set of training data [2]. The data is labeled to indicate how each example should be classified. In the context of our e-mail spam example above, we can give the computer a set of e-mails that are already labeled as spam or not spam. The algorithm then only needs to build a model that returns the correct label for new e-mails it hasn't seen before. We use a supervised learning approach when we, the human designers, already know what the correct output should be for a given input [2]. This is the case for spam e-mails because we know that an e-mail with a subject line reading "free vacation to the Bahamas" is probably spam.
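To make this concrete, here is a minimal sketch of supervised learning in Python. It uses scikit-learn, which is my own choice for illustration (Ng's course itself does not use this library), and the handful of labeled subject lines are invented; a real spam filter would train on many thousands of messages.

```python
# A toy supervised learning example: classify e-mail subject lines as spam or not spam.
# The labeled examples below are made up for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

subjects = [
    "free vacation to the Bahamas",   # spam
    "you have won a cash prize",      # spam
    "meeting agenda for Tuesday",     # not spam
    "quarterly report attached",      # not spam
]
labels = ["spam", "spam", "not spam", "not spam"]

# Turn each subject line into a vector of word counts (the features).
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(subjects)

# Fit a simple probabilistic model to the labeled training data.
model = MultinomialNB()
model.fit(X, labels)

# Ask for the label of a subject line the model has never seen before.
new_subject = vectorizer.transform(["claim your free prize now"])
print(model.predict(new_subject))  # expected to print ['spam']
```

The important point is not the particular model; it is that the algorithm only learns because every training example comes with the answer attached.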

Unsupervised learning gives the computer less information about how to build a proper mathematical model to correctly classify a dataset [2]. In unsupervised learning, the human designer will not know a priori the relationship between the input and output; in fact, the human may not even be sure if there is such a relationship [2]. In this situation, we are using the computer algorithm to look for patterns or relationships in the data. For example, we may have data on a set of patients who have tumors - this data might include details on the type of tumors the patients have, and their age, weight and height. We can feed this data to an unsupervised learning algorithm and ask it to look for relationships between all these parameters, with the hope of finding a relationship between one of the patients’ characteristics (like height) and whether the tumor is benign or cancerous [2]. This could help us predict what kinds of tumors patients are likely to have based on data we already have for them.
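By contrast, here is a small sketch of unsupervised learning, again in Python with scikit-learn (my own choice of library). The patient measurements are invented for illustration, and notice that no labels are handed to the algorithm - it only groups patients whose measurements look similar.

```python
# A toy unsupervised learning example: cluster patients by (height in cm, tumor size in mm)
# without telling the algorithm anything about which tumors are benign or cancerous.
# All numbers are made up for illustration.
import numpy as np
from sklearn.cluster import KMeans

patients = np.array([
    [150.0,  5.0],
    [152.0,  6.0],
    [155.0,  4.5],
    [185.0, 22.0],
    [188.0, 25.0],
    [190.0, 21.0],
])

# Ask for two groups; the algorithm decides on its own where to draw the boundary.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(patients)

print(cluster_ids)              # e.g. [0 0 0 1 1 1]: two groups of patients
print(kmeans.cluster_centers_)  # the "average" patient in each group
```

Whether those groups mean anything medically is up to the human looking at the results; the algorithm only reports the structure it finds.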

Let’s return to supervised learning for a moment and discuss some of the most popular ways of solving a supervised learning problem. Recall that data can be either discrete or continuous. Discrete data can only take certain values - e-mails are spam or not spam, tumors are cancerous or benign. In contrast, continuous data can take a range of values - a patient’s height or weight is continuous, for example. Depending on what kind of value we are trying to predict, we can use different kinds of mathematical models. For discrete outputs, we can use classification to assign each input to one of the possible categories [2]. For example, a supervised learning approach to the e-mail problem is to use classification to map new inputs (e-mails) onto outputs (spam or not spam). If the output is continuous, then we can use regression to fit a curve to the dataset and find a function that approximates the relationship [2]. Many basic laws in science were discovered using regression, like Hooke’s law, which relates the force on a spring to how far it stretches. There is a function that describes the spring extension (F = kx), and when you perform an experiment to validate this model, you will find that this function produces a line that closely fits your data points. Finding the line that fits the data is regression.
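As a quick illustration of regression, the sketch below fits a straight line to some made-up spring measurements using NumPy; the recovered slope plays the role of the spring constant k in F = kx.

```python
# Fit Hooke's law F = kx to noisy (extension, force) measurements.
# The data points are invented for illustration.
import numpy as np

x = np.array([0.00, 0.05, 0.10, 0.15, 0.20])   # spring extension in meters
F = np.array([0.02, 1.05, 1.98, 3.02, 4.01])   # measured force in newtons

# Least-squares fit of a degree-1 polynomial: F ≈ k * x + b.
k, b = np.polyfit(x, F, 1)
print(f"estimated spring constant k = {k:.2f} N/m, intercept b = {b:.2f} N")
```

The fitted slope k is the "model" in this case: given a new extension, it predicts the force, even though that exact measurement never appeared in the data.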

Unsupervised learning also has some popular methods for finding structure in unlabeled datasets. For example, clustering groups points in a dataset based on characteristics they have in common, and then looks for relationships within those clusters [2]. A clustering algorithm might group patients with tumors by their height and find that tall patients tend to have cancerous tumors while short patients have benign ones. Ng also explained that there are non-clustering algorithms for finding patterns in chaotic data, but I did not find that description very helpful [2]. He presented the “cocktail party algorithm” as an example, which uses singular value decomposition to find patterns in data without clustering [2]. (Singular value decomposition is a very useful tool from linear algebra and probably deserves its own post someday.)
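I don't want to misrepresent the cocktail party algorithm itself, so here is only a minimal sketch of what a singular value decomposition looks like in NumPy: given a matrix of mixed measurements, SVD factors it into components, and the singular values indicate how much of the variation each component accounts for. The mixed-signal matrix below is invented for illustration and is not Ng's actual example.

```python
# A minimal look at singular value decomposition (SVD) on a made-up data matrix.
# Each row is one "microphone" recording a mixture of two underlying signals.
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
source_a = np.sin(2 * np.pi * 5 * t)            # first underlying signal
source_b = np.sign(np.sin(2 * np.pi * 3 * t))   # second underlying signal

# Two recordings, each a different mixture of the two sources plus a little noise.
mixtures = np.vstack([
    0.7 * source_a + 0.3 * source_b,
    0.4 * source_a + 0.6 * source_b,
]) + 0.01 * rng.standard_normal((2, 200))

# Decompose the mixed recordings; the singular values show how much of the
# variation each component explains.
U, s, Vt = np.linalg.svd(mixtures, full_matrices=False)
print(s)  # two singular values, one per component found in the recordings
```

This is only the decomposition step, not a full source-separation method, but it gives a feel for how structure can be pulled out of data without any clustering or labels.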

In my next post I will discuss the basic mathematics behind using a model to represent patterns in a dataset.

References:

[1] Wikipedia. “Machine learning.” https://en.wikipedia.org/wiki/Machine_learning Visited 12/10/2019.

[2] Ng, Andrew. Machine Learning course, week 1 lecture series. https://www.coursera.org/learn/machine-learning/home/week/1 Visited 12/10/2019.

Written on December 11, 2019