Big Data and Progress

We are living in an age of Big Data. In 2012 the Harvard Business Review reported that “… about 2.5 exabytes of data are created each day, and that number is doubling every 40 months or so. More data cross the internet every second than were stored in the entire internet just 20 years ago.” During Fortune’s 2016 annual Brainstorm Tech conference, Shivon Zilis (Bloomberg Beta) said: “Data is the new oil.” Machines can now offer valuable services such as Google Translate, Apple’s Siri, facial recognition, and self-driving cars, but to do this they must learn from vast data sets. Zilis argued that this demand for data is turning it into a commodity like oil.

In some specific areas, and with enough data, deep learning machines can outperform human experts. This performance is a tremendous achievement and an example of technical progress, but it is not a panacea or silver bullet. Machine learning as a whole has many aspects in common with the drivers of progress I described in my earlier article “The formula for progress.”

In that article I outlined the formula:

  • Goal: Setting a goal defines progress: what you are journeying towards and how far you have travelled.
  • Ideas: You need to know how you will get to your destination.
  • Prediction: Progress is about the future. The objective is in the future, and you work towards it. For an idea to be useful, it must make a prediction. It must tell you something about the future you are interested in and, like a weather forecast, it must be accurate and reliable.
  • Testing/Results: Progress needs truth. Put the other way round, if you apply false ideas you are very unlikely to make progress and much more likely to regress. The truth needs to be tested.
  • Belief: The test results should decide the truth and what you believe - seeing is believing.
  • Application: Once you believe an idea you need to apply it.
  • Progress: Progress is how far that application takes you towards your goal.

At the core of much of machine learning are mathematical algorithms that update belief as new information is presented (learning). In these algorithms, belief is calculated as a probability. The original example of this pattern is Bayes’ Theorem.

Although the language is different and it uses mathematical notation, Bayes’ Theorem and the symbol for progress have the same shape or form:

  • Idea = Hypothesis
  • Results = Observation
  • Belief: Prior Probability = Belief before the new test results, Posterior Probability = updated belief after including the new results
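As a minimal sketch of this correspondence, the Python snippet below applies Bayes’ Theorem to update belief in an idea after each new test result. The function name update_belief and all of the probabilities are hypothetical, chosen only for illustration; they do not come from any real experiment.

```python
def update_belief(prior, p_result_if_true, p_result_if_false):
    """Return the posterior probability that an idea is true after one test result.

    prior             -- belief in the idea before the new result (0 to 1)
    p_result_if_true  -- probability of seeing this result if the idea is true
    p_result_if_false -- probability of seeing this result if the idea is false
    """
    # Total probability of seeing this result, whether or not the idea is true
    evidence = p_result_if_true * prior + p_result_if_false * (1.0 - prior)
    # Bayes' Theorem: posterior = likelihood * prior / evidence
    return p_result_if_true * prior / evidence


# Hypothetical numbers, for illustration only
belief = 0.5  # prior: undecided about the idea
for test in range(1, 4):
    # Each result is assumed 80% likely if the idea is true, 30% likely if it is false
    belief = update_belief(belief, 0.8, 0.3)
    print(f"belief after test {test}: {belief:.3f}")
```

With these assumed numbers, belief rises from 0.5 to roughly 0.727, 0.877, and 0.950 over three supporting results. Each posterior becomes the prior for the next test, which is the sense in which results update belief.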

Testing is the gold standard. Executing a well-designed and rigorous experiment provides the most informative data. Bayesian statistics complements this by providing a mathematical method that uses the data to assess truth and update belief. Machine learning and Bayesian statistics are powerful techniques for handling data, and when they are harnessed to the right goals they can drive progress.