Machine learning - what it is and is not for investors
I have been seeing more managers use machine learning tools to help with the investment process. This is an important advancement that should help build better return-generating engines. However, there are many investors who do not know what machine learning is or what it should be able to do. Simply put, machine learning is having an algorithm learn without explicit programming, a process of improvement with experience from new data. It is the training of a model on data so that it can generalize to new decisions, judged against some performance measure.
I have listed some key ideas of what machine learning is and is not for an investor who is just being exposed to the concept.
Machine learning is not artificial intelligence.
Many think that machine learning is a sub-discipline within AI; however, there has been a large divergence between machine learning, which is grounded in data and statistics, and AI, which is focused on logical systems. AI, for many, never realized its potential in the investment area. We do not believe that machine learning will fail in the same way, given its different goals and objectives.
Machine learning is not data mining.
Data mining focuses on finding relationships in large data sets that are not easily discovered or readily apparent. Machine learning potentially uses large data sets to train systems to make predictions. The elements of machine learning include data mining, statistical inference, and prediction.
Machine learning is not a black box.
Many AI models have relied on neural networks, which can seem like a black box that cannot be understood by the manager or investor. There is talk of a “hidden” layer, which adds mystery. Machine learning can be closely directed or supervised in order to apply what has been learned to new data, or it can be unsupervised in order to draw inferences from data.
Machine learning is not new.
Machine learning was first developed decades ago, with many of its statistical techniques coming from older approaches to inference and prediction. The more recent development is that cheap computing and large data sets have allowed for more and quicker learning.
Machine learning is not testing endless alternatives.
Machine learning does not mean that a computer and a manager are going on an endless hunting expedition for an over-optimized result. In fact, the value of machine learning is the ability to blend and weigh many alternatives, which could not be done in the past.
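To make "blend and weigh many alternatives" concrete, here is a minimal sketch, not a description of any particular manager's method. It assumes three hypothetical signals ("momentum", "value", "carry") with made-up past forecasts, and it weights each one by the inverse of its past mean squared error. The weighting rule and all names and numbers are illustrative assumptions.

```python
# Hypothetical illustration: blend several candidate forecasts using
# weights inversely proportional to each model's past mean squared error.

def mean_squared_error(forecasts, realized):
    """Average squared gap between forecasts and realized values."""
    return sum((f - r) ** 2 for f, r in zip(forecasts, realized)) / len(realized)


def inverse_error_weights(model_forecasts, realized):
    """Give each model a weight proportional to 1 / MSE, normalized to sum to 1."""
    inverse_errors = {
        name: 1.0 / (mean_squared_error(forecasts, realized) + 1e-12)
        for name, forecasts in model_forecasts.items()
    }
    total = sum(inverse_errors.values())
    return {name: value / total for name, value in inverse_errors.items()}


def blend(weights, new_forecasts):
    """Weighted average of each model's newest forecast."""
    return sum(weights[name] * new_forecasts[name] for name in weights)


# Made-up past forecasts from three hypothetical signals, and what actually happened.
realized = [0.010, -0.020, 0.015, 0.005]
model_forecasts = {
    "momentum": [0.012, -0.015, 0.010, 0.004],
    "value": [0.000, 0.010, 0.000, 0.020],
    "carry": [0.008, -0.025, 0.020, 0.000],
}

weights = inverse_error_weights(model_forecasts, realized)
blended = blend(weights, {"momentum": 0.010, "value": -0.005, "carry": 0.012})
```

In this toy data the "value" signal has the largest past error, so it receives the smallest weight rather than being searched over or discarded by hand. In practice the weights would need to be estimated on data separate from the data used to judge the blend, or the result drifts back toward the over-optimization described above.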
Machine learning requires statistical/programming/market skill.
Fundamentally, the manager who uses machine learning combines a strong statistical foundation with programming skills to look at a wide set of alternatives which can be weighted or excluded. Nevertheless, there still needs to be a sense of the market to guide learning and understanding.
Machine learning will find relationships you did not expect.
Machine learning is not just data mining which is explicitly looking for relationships within a large data set; nevertheless, machine learning may be able to weigh different model alternatives and find combinations which would not be thought of through the normal process of simple hypothesis testing. There is learning without explicit programming.
Machine learning does require a lot of computing power.
Machine learning by its very nature will look at many different combinations of data and learn how to update models when new information is introduced and patterns are found. The process of stepping forward and determining the impact of new information is computationally intensive.
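The stepping-forward process can be sketched as a walk-forward loop: refit the model on everything seen so far, predict the next observation, then repeat. The example below is a minimal illustration under stated assumptions. It uses a single made-up signal, a simple least-squares line as the model, and hypothetical function names (`fit_line`, `walk_forward`).

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x if var_x > 0 else 0.0
    return mean_y - slope * mean_x, slope


def walk_forward(xs, ys, min_history=3):
    """Refit on all data seen so far, then predict the next observation."""
    predictions = []
    for t in range(min_history, len(xs)):
        intercept, slope = fit_line(xs[:t], ys[:t])    # uses only the past
        predictions.append(intercept + slope * xs[t])  # out-of-sample forecast
    return predictions


# Made-up data: a hypothetical signal and the return that followed it.
signal = [0.1, 0.4, 0.2, 0.8, 0.5, 0.9, 0.3]
next_return = [0.002, 0.008, 0.004, 0.016, 0.010, 0.018, 0.006]

predictions = walk_forward(signal, next_return)
```

Every step requires a fresh fit, so the work grows with the length of the history and with the number of inputs and candidate models being compared. That repeated refitting is where much of the computing cost comes from.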
Machine learning can be either supervised or unsupervised.
While machine learning can be unsupervised and free form when looking at model alternatives, it can also be supervised, whereby the learning is very specific and directed in order to apply what has been learned to new data.
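A small sketch may help show the difference. It assumes made-up volatility readings and hypothetical "calm" and "stressed" labels. In the supervised case the labels are supplied and the model learns to assign them to new data. In the unsupervised case no labels are given and the algorithm (here a one-dimensional k-means with two groups, chosen only for illustration) finds the grouping on its own.

```python
def train_supervised(features, labels):
    """Supervised: labels are given, so learn the average feature per label."""
    sums, counts = {}, {}
    for x, label in zip(features, labels):
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, x):
    """Assign a new observation to the label with the closest learned average."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def cluster_unsupervised(features, iterations=10):
    """Unsupervised: no labels, so split the data into two groups (1-D k-means, k=2)."""
    low, high = min(features), max(features)
    for _ in range(iterations):
        low_group = [x for x in features if abs(x - low) <= abs(x - high)]
        high_group = [x for x in features if abs(x - low) > abs(x - high)]
        low = sum(low_group) / len(low_group)
        if high_group:
            high = sum(high_group) / len(high_group)
    return low, high


# Made-up volatility readings with hypothetical regime labels.
volatility = [0.08, 0.10, 0.09, 0.30, 0.35, 0.28]
regime = ["calm", "calm", "calm", "stressed", "stressed", "stressed"]

centroids = train_supervised(volatility, regime)
new_label = predict(centroids, 0.27)               # directed: apply learned labels to new data
low_center, high_center = cluster_unsupervised(volatility)  # free form: discover groups
```

On this toy data both approaches find the same two groups, but only the supervised model knows what to call them. The unsupervised result still needs a person with a sense of the market to interpret what each group means.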
Machine learning will make you a smarter investor.
When machines learn, the modeler learns. One of the key advancements of machine learning is the ability to find dynamic relationships in data that are not immediately obvious and to use them to make predictions.