Breaking the Status Quo, Bayesian Models and Better Decisions
Based in Westminster, focaldata is a tech startup founded in 2017. We use Bayesian modelling to help companies communicate with their customers more effectively.
Classification Models and the Status Quo
For as long as commercial data analysis has existed, classification models have typically been fitted with a maximum-likelihood algorithm. Having chosen a model structure, the algorithm optimises the model’s parameters: it finds the single most likely value for each parameter, given the input data.
For linear regression, this can be achieved with a fast analytical solution based on matrix algebra. However, for classification problems, a numerical algorithm must iteratively try plausible values and assess their likelihood (given the selected model structure, such as logistic regression).
Usually, this includes a gradient calculation that allows the algorithm to ‘climb the peak’ to the maximum likelihood and return that single value as the solution. This process has often shaped even the way we think about and interpret results: as if they were somehow exact and final in themselves, and as if a wrong or misleading result must mean the model was poor.
Still, maximum-likelihood algorithms are at least easy to implement and interpret – given the wide availability of open-source libraries – and are quite fast.
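To make the contrast above concrete, here is a minimal sketch in Python with NumPy. The synthetic data, the coefficient values, the step size and the iteration count are all assumptions chosen for illustration, not anything taken from a real project. It fits a linear regression in one step by solving the normal equations, then fits a logistic regression by repeatedly stepping up the gradient of the log-likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (hypothetical): an intercept column plus one feature.
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])

# Linear regression: the maximum-likelihood estimate has a closed form,
# found by solving the normal equations (X'X) beta = X'y.
true_beta = np.array([1.0, 2.0])
y_linear = X @ true_beta + rng.normal(scale=0.5, size=n)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y_linear)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Logistic regression: no closed form, so we climb the log-likelihood.
true_w = np.array([-0.5, 1.5])
y_binary = (rng.random(n) < sigmoid(X @ true_w)).astype(float)


def log_likelihood(w):
    z = X @ w
    # Sum of y*z - log(1 + exp(z)), in a numerically stable form.
    return np.sum(y_binary * z - np.logaddexp(0.0, z))


w = np.zeros(2)          # starting guess
step = 0.5               # assumed step size
history = [log_likelihood(w)]
for _ in range(500):
    # Mean gradient of the log-likelihood with respect to w.
    gradient = X.T @ (y_binary - sigmoid(X @ w)) / n
    w = w + step * gradient
    history.append(log_likelihood(w))

print("Closed-form linear estimate:", beta_ols)
print("Iterative logistic estimate:", w)
```

Either way, the output is a single number per parameter. Nothing in it says how far the true value might plausibly lie from that number.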
Bayesian Breaks Tradition to Unearth Richer Data
In Bayesian analysis, we aim to return not just a single value, but the entire posterior probability distribution for each parameter. To do this, we use algorithms that rely on a sampling approach. Here, random draws are made from each parameter’s posterior, and whilst each individual draw is random, over a large enough number, the draws represent an increasingly accurate view of that posterior. That is to say, the most likely points will be the ones drawn most often.
Crucially, this gives not just a single parameter estimate, but as many estimates as there were draws. This leads to richer information, so we can assess the uncertainty involved in each estimate. What’s more, it makes for a substantially more robust algorithm, one that copes with more parameters than a maximum-likelihood approach can.
There, estimators often fail when there’s insufficient data for the number of parameters; the hoped-for complexity must be scaled back, and insight is limited. The problem is seen most clearly for asymmetric or distorted likelihoods with local maxima, where a gradient-based approach can mistake a local maximum for the global one (amongst other difficulties).
Figure: on the left, the maximum likelihood algorithm climbs the peak to find the single highest value – the most likely parameter estimate – and returns that one value. On the right, the Bayesian result shows the full distribution, revealing both the uncertainty and more complex shapes of the posterior, such as asymmetries (not shown).
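The sampling idea described in this section can be sketched with a random-walk Metropolis sampler, one of the simplest algorithms of this kind. It is not necessarily what any particular library uses. The data (27 conversions from 100 customers), the flat prior, the proposal width and the number of draws are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 27 conversions out of 100 customers contacted.
successes, trials = 27, 100


def log_posterior(p):
    # Flat prior on (0, 1) plus the binomial log-likelihood,
    # up to an additive constant.
    if p <= 0.0 or p >= 1.0:
        return -np.inf
    return successes * np.log(p) + (trials - successes) * np.log(1.0 - p)


n_draws, burn_in, proposal_sd = 20_000, 2_000, 0.1  # assumed settings
draws = np.empty(n_draws)
current = 0.5
current_lp = log_posterior(current)

for i in range(n_draws):
    proposal = current + rng.normal(scale=proposal_sd)
    proposal_lp = log_posterior(proposal)
    # Accept with probability min(1, posterior ratio); otherwise stay put.
    if np.log(rng.random()) < proposal_lp - current_lp:
        current, current_lp = proposal, proposal_lp
    draws[i] = current

posterior = draws[burn_in:]  # discard the warm-up draws
lower, upper = np.percentile(posterior, [2.5, 97.5])
print(f"posterior mean {posterior.mean():.3f}, "
      f"95% interval ({lower:.3f}, {upper:.3f})")
```

Each draw is random, but values near the peak are visited most often, so the retained draws approximate the full posterior. The interval printed at the end is the kind of uncertainty statement a single maximum-likelihood value cannot provide.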
Preventing Data From Going to Waste
Furthermore, in Bayesian analysis, we can apply past information in a straightforward and intuitive way. That is, we use data from previous years as a starting point for this year’s analysis.
And this aligns with experience-based decision-making in a business context. If I would reject results of an analysis that don’t match my industry knowledge, why wouldn’t I also account for that information in the model structure itself? Bayesian analysis offers a straightforward way to do this.
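As a small worked example of using past data as a starting point, consider a response rate modelled with a Beta prior and binomial data, where the Bayesian update has a simple closed form. All the figures below are hypothetical, and the `weight` that discounts last year’s data is an assumption an analyst would have to justify, not a standard value.

```python
# Hypothetical figures: last year's campaign and this year's early results.
last_year_successes, last_year_trials = 120, 400  # 30% response rate
this_year_successes, this_year_trials = 9, 20     # 45% in a small sample

# Encode last year's data as a Beta(a, b) prior. The `weight` factor is an
# assumption: it discounts old data so it informs without dominating.
weight = 0.25
prior_a = 1.0 + weight * last_year_successes
prior_b = 1.0 + weight * (last_year_trials - last_year_successes)

# Conjugate update: add this year's successes and failures to the prior.
post_a = prior_a + this_year_successes
post_b = prior_b + (this_year_trials - this_year_successes)

prior_mean = prior_a / (prior_a + prior_b)
posterior_mean = post_a / (post_a + post_b)
raw_rate = this_year_successes / this_year_trials

print(f"prior mean          {prior_mean:.3f}")
print(f"this year's raw rate {raw_rate:.3f}")
print(f"posterior mean      {posterior_mean:.3f}")
```

The posterior mean sits between last year’s rate and this year’s small-sample figure, and it moves towards the new data as more of it arrives.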
Once Impractical, Now Practical
Until now, though, the costs have been too great to justify the benefits. The computing power required is higher, because many more estimates are made (one for each draw). Until recently, you couldn’t get this at a reasonable, practical cost.
Add to this the increased complexity of implementing models and interpreting their outputs, and you end up requiring a higher level of training for analysts. Not to mention software that needs lots of manual tuning and fitting times that are unavoidably longer.
All this is why Bayesian analysis – till now – has largely been the preserve of academic research, next to a few commercial applications with deep pockets, like drug discovery.
But this is changing, driven by a confluence of factors that alters the cost-benefit equation greatly. Most obviously, computing power gets cheaper every year. Combine this with cloud computing platforms, and you also gain the ease of powering high-powered machines up and down on demand, and you pay only for the time they are in use.
Life is easier for analysts too. To reduce the burden, we have user-friendlier open-source software implementations – Stan, PyMC3, Edward – alongside sophisticated Bayesian algorithms that auto-tune parameters.
The final driver of change is the relationship between PhD graduate numbers and academic positions available. While the former continues to rise annually, the latter stays stagnant. This ever-increasing overflow of highly-trained scientists and mathematicians then flows into the data science profession (indeed, the writer of this article took this path). There, training, ideas, ambition – alongside familiarity with and interest in the latest academic research – shape the radical changes happening in commercial data analysis.
Empowered Executives and Meaningful Decisions
Indeed, Bayesian analysis may prove even more powerful in a commercial setting than it has in an academic one.
For, in a commercial environment, we’re not just focussing on the relationships between different inputs to a model and their relative importance. We’re also interested in a decision-making process rarely found in academic research (outside of decision theory itself).
Consider a business with a choice of several possible strategies. The Bayesian method empowers executives to make decisions with the maximum of information available from all known sources, both past and current. Beyond a mere improvement on the status quo, it is an entirely new way of interpreting and utilising models. Enabling all this is the Bayesian approach’s fuller treatment of uncertainty, its use of past data and its ability to specify more complex models.
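One way to picture the choice between strategies is to push posterior draws through a simple profit calculation, as sketched below. Every number here is hypothetical: the three strategies, their Beta posteriors for the response rate, the costs, the audience size and the value per response are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical posteriors for the response rate under three strategies,
# written as Beta(a, b) distributions, plus an assumed cost for each.
strategies = {
    "email":  {"a": 31, "b": 171, "cost": 1_000.0},
    "phone":  {"a": 13, "b": 39,  "cost": 4_000.0},
    "letter": {"a": 22, "b": 80,  "cost": 2_500.0},
}
audience, value_per_response = 10_000, 5.0  # assumed business figures
n_draws = 50_000

names = list(strategies)
# One column of simulated profit per strategy, one row per posterior draw.
profit_matrix = np.column_stack([
    rng.beta(s["a"], s["b"], size=n_draws) * audience * value_per_response
    - s["cost"]
    for s in strategies.values()
])

expected = profit_matrix.mean(axis=0)
# Share of draws in which each strategy yields the highest profit.
prob_best = np.bincount(profit_matrix.argmax(axis=1),
                        minlength=len(names)) / n_draws
# Share of draws in which each strategy loses money.
prob_loss = (profit_matrix < 0).mean(axis=0)

for name, e, pb, pl in zip(names, expected, prob_best, prob_loss):
    print(f"{name:>6}: expected profit {e:,.0f}, "
          f"P(best) {pb:.2f}, P(loss) {pl:.3f}")
```

Because every draw is a plausible version of reality, an executive can weigh expected profit against the chance that a strategy is actually the best one, or the chance that it loses money, instead of comparing single point estimates.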
And at the heart of this development is focaldata: we’re so excited about new commercial applications of Bayesian methods. What’s more, we’d love to meet businesses who share our passion for ensuring the best-informed decision-making in today’s competitive marketplace.