A few weeks ago I was asked to give an introductory talk about machine learning. This got me thinking about what my own first encounters with machine learning were like, and what was lacking from them. When first starting to learn about machine learning, I think it is easy to get overwhelmed by all the different methods without seeing how it all ties together, which leaves people confused and stupefied. To avoid this I tried to think of a better way to introduce machine learning than just listing methods or categories of methods. This was harder than expected, since I had to surface a lot of internalized knowledge about machine learning which I hadn’t explicitly thought about before, but it also created a great opportunity for me to identify some of the things I had acquired a deeper understanding of in the last couple of years.
When I was studying machine learning at university, there was a heavy focus on learning the mathematical details of lots of different methods and algorithms. But starting out as a machine learning practitioner, you quickly realize that the hard part is not understanding the math or the specific details of the methods, but rather formulating a problem in terms of data and knowing which method, or type of method, to try on it. I think this knowledge is something most universities fail to teach.
To achieve this it is crucial to build a fundamental framework, a holistic understanding of how machine learning works and what it can be used for. It is my belief that this is how humans normally develop intuition and deeper understanding about things.
In this article I therefore want to introduce a viewpoint and philosophy about statistics and machine learning which has helped me a lot so far, both in my career and in my spare-time projects.
To work out how to explain the fundamentals of my view of machine learning, I started reasoning about learning in general and how it relates to data. So to start off, let’s ask some fundamental questions about learning in general.
How do we learn?
This question is harder to answer than it seems, and there can be multiple opinions about it, mainly depending on how you define knowledge (which is another rabbit-hole kind of topic). But I believe that knowledge is based on generalization and categorization of data, which are in turn built on rules about how to generalize and categorize data. If this is true, then learning is the creation of those rules. To understand how those rules are built, we can think about how we learnt the basic things about life, the things we learnt when we were children. So how did we learn those things? I think the answer is either through actually experiencing them, thus collecting data, or through instructions from our parents.
Instructions or data
We can then categorize learning into two types: learning from data and learning from instructions. We can also apply this rule to how we learn things as adults: if we read an article and learn from it, that is learning by instructions. If we learn by trial and error, we learn by collecting and analyzing data.
So how does this relate to computers and machine learning? From the computer’s point of view, I consider programming the same as learning by instructions, and machine learning the same as learning from data.
Then there is also the question of how "deep" the knowledge is, of course. If the rules are about generalizations or higher-level categories of things, that, in my view, is the same as what we call understanding.
So then machine learning is just like programming but instead of explicit instructions we use data to describe the behavior we want the machine to learn.
Or as Arthur Samuel, who coined the term machine learning in 1959 and created a checkers-playing algorithm which learnt from data, described machine learning:
"Field of study that gives the computers the ability to learn without being explicitly programmed."
I believe that with machine learning we can create any kind of program, defined by data instead of explicit instructions (programming). But often this is not practical with current technology, because it is easier to write the program than to define and collect the data needed. You can check out Francois
A thought experiment
Let’s try this idea with a simple example. It might be a bit nonsensical, but I hope it conveys the general idea.
Problem: Define a program that returns the sine of its input times two.
By explicit instructions:
import math

def sinextwo(x):
    return math.sin(x) * 2
By data:

input = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
output = [0.0, 0.95885, 1.68294, 1.99499, 1.81859, 1.19694]
That was easy enough, so why don't we use machine learning all the time? Then we don't have to program!
Well let's take a closer look at the example. What did we actually define with that data?
What is actually defined outside of these points? The points alone are obviously not enough to infer that this is a sine curve.
We need more data.
There we go, we can all see what this is supposed to be now.
But what would happen if we asked for the output at input 10, if this is all the information the computer can learn from?
We can clearly see that it is supposed to end up close to -1.
But if we stop for a moment and think about it, how can we see that? There is actually no data suggesting a behaviour outside of the defined range.
We are obviously using our knowledge about how functions normally behave when we draw this conclusion. But this knowledge is not available to the computer unless we make it available.
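To make this concrete, here is a minimal sketch of my own (not part of the original example): I arbitrarily pick a cubic polynomial as the model and fit it to the six data points with NumPy’s polyfit. The fit is close inside the range of the data, but its answer at 10 has nothing to do with sine, because nothing in the data constrains it there.

```python
import numpy as np

# The six (input, output) pairs from the thought experiment
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
y = np.array([0.0, 0.95885, 1.68294, 1.99499, 1.81859, 1.19694])

# One arbitrary model choice among many: a cubic polynomial
coeffs = np.polyfit(x, y, deg=3)
model = np.poly1d(coeffs)

# Inside the range of the data the fit is close to sin(x) * 2 ...
print(model(1.0))   # close to 1.68294

# ... but at 10 the cubic extrapolates to something nowhere near
# sin(10) * 2 ≈ -1.09
print(model(10.0))
```

Any other model choice (a line, a neural network, a spline) would extrapolate differently; the data itself says nothing about what happens at 10.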
Model vs Data
"You have to make structural assumptions about the world. If you assume anything, then you can't learn."
Ryan Adams, co-host of the podcast Talking Machines
So even though we had some data about the problem, we couldn’t really learn from it. Obviously there must be something missing. How can humans make inferences from these data points when a computer can’t?
As humans we always use a framework/model to make sense of the things we see, it seems like we can’t even avoid doing this. But for a machine any kind of model not defined by data has to be explicitly programmed into it.
Therefore, in machine learning we always use a model. Even when we use deep learning or any of the more data-heavy methods, there are always some instructions passed to the computer that allow it to make sense of the data.
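As an illustration of this point (my own sketch, with the structural assumption chosen by hand): if we program in the assumption that the output has the form a * sin(input), then the six data points are enough to pin down the single parameter a by least squares, and extrapolating to 10 suddenly gives a sensible answer.

```python
import math

# Same six (input, output) pairs as before
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
ys = [0.0, 0.95885, 1.68294, 1.99499, 1.81859, 1.19694]

# Structural assumption programmed in: output = a * sin(input).
# With that model, least squares reduces to solving for one number.
num = sum(y * math.sin(x) for x, y in zip(xs, ys))
den = sum(math.sin(x) ** 2 for x in xs)
a = num / den

print(round(a, 3))         # ≈ 2.0: the data pins down the model
print(a * math.sin(10.0))  # ≈ -1.09, a sensible answer at 10
```

The assumption here does almost all of the work; the data only fills in the one detail the model leaves open. That is the model-vs-data trade-off in miniature.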
So with this in mind, we can create a high-level framework for machine learning, based on a dichotomy we can call model vs data, instructions vs data, or programming vs data.
With this framework we can now make sense about a lot of things related to machine learning and statistics.
We can, for example, explain how predictive modeling has evolved during the last century. We can describe this evolution as a shift from model-heavy methods to data-heavy methods. A very simple yet powerful explanation connecting a huge series of events, made possible by the view of machine learning I introduced above.
A more detailed explanation would be that in the past we used statistical models that made a lot of assumptions about the structure of the data and the problem: rigid models requiring less data and computational power, and also offering a lot of explainability about the predictions we made. With the rise of machine learning we moved towards more data-driven methods, and with the recent successes of deep learning we have moved even further in that direction.
The point of this article is to communicate my (current) view of statistics and machine learning and to highlight the power of having good mental models to help you and your computer to make sense of things.
I believe that humans’ ability to build good holistic models is the number one reason we are better at a lot of things than current machine learning technologies, and that the main difficulty facing AI research in the coming decades will lie in that area.
I realize that, as with all things people use to make sense of the world in a holistic way, different people can have different intuitions about this. So if you have another opinion or want to discuss this, please send me an email; I’m always happy to find someone to discuss these things with.