Future Data 2020
Many winter moons ago, I (virtually) attended Future Data 2020, a conference about the next generation of data systems. During the conference, I watched an interesting talk given by Tristan Handy, founder and CEO of Fishtown Analytics, called The Modern Data Stack: Past, Present, and Future. During the talk, Tristan discussed a so-called Cambrian explosion of data products built upon data warehouses, such as Amazon Redshift, between 2012 and 2016, as well as his opinion that we are on the precipice of a similar paradigm shift, which he referred to as “the second Cambrian explosion.”
Tristan’s perspective on the modern data stack provided much food for thought; however, the topic I want to explore here stems from a brief comment made about the future of self-service in data-driven decision-making: How those who are not necessarily on the cutting-edge of data science can best leverage their data to make informed decisions.
To start this exploration, I will first give a simplified version of a past during which I was not of working age and with which I therefore have no direct experience: Prior to the aforementioned (first) Cambrian explosion, data analysis was primarily carried out using spreadsheets, such as (of course) Microsoft Excel. In many theoretical offices in the 90s and 00s, countless nameless and faceless theoretical analyst/decision-makers spent their Mondays through Fridays bouncing among tens of tens of Excel spreadsheets, adding calculated fields in two-lettered columns and introducing errors for which there would be no record; it was a laugh riot, the analyst/decision-makers earned decent theoretical wages for their time spent, and everyone watched Friends in the evenings without feeling obligated to discuss how problematic it was.
In more recent times, with the advent of modern data warehouses, data storage was able to be better separated from data analysis, and many, many SaaS companies profited off this division on scales not easily understood by humans. So rather than the happy-go-lucky Friends’ era paradigm, with data tabulated in one program with nice little cells and able to be analyzed in that same program by analyst/decision-makers, a number of new business intelligence platforms began to make their way into offices, raining on everyone’s parade, and just because the new guy attended a “conference” about the “future” in “Des Moines.”
The Modern Data Racket
Let me take a step back: In or around his talk (source), Tristan made the following comments:
“How do you democratize self-service? Controversial, but I believe the Modern Data Stack disempowered many decision-makers. Those comfortable w/ Excel feel cut off from the source of truth. What if the spreadsheet interface is actually the correct way?”
Tristan prefaces his claim that the modern data stack has disempowered decision-makers with the warning that his opinion may be controversial, but I would argue that his statement is not controversial at all, mostly because it is unarguably true. With the shift of data storage and analysis away from the flexible and easy-to-use spreadsheet and toward ecosystems such as data lakes, data rivers, data abysses, and data Charybdises, end-users (i.e., the analyst/decision-makers of yore), many of whom primarily use data tools as a means to an end, have likely lost their way.
Put simply, it is not as simple to navigate the modern data stack as it was to navigate acres of spreadsheets. Spreadsheets, with all of their flaws, have almost no learning curve: if you can turn on a computer and open a file, you can navigate a spreadsheet. Furthermore, from the start, you are only a few clicks, keystrokes, and neural connections away from mastering formulas, pivot tables, and visualizations. I hate spreadsheets! — but they are a near-perfect balance of usability and flexibility.
In contrast, while modern business intelligence tools may be built with end-users with varying levels of technical expertise in mind, they tend to have a steeper learning curve. For example, while today’s decision-maker, now stripped of his or her ‘analyst’ status, can likely navigate a dashboard and make decisions based on the information presented, he or she has lost the almost-tactile experience of sifting through the data with his or her own hands.
The Second Law
I know what you are thinking—literal metric tons of decisions were made based on little more than a pie chart from an hours-long presentation that was not put together by the person who had final say in the decision-making process—but please allow me to employ the above generalization to support my next point: Every new data technology moves the decision-maker further downstream from the data source.
Today, the data required by a decision-maker may be located in a neatly designed dashboard, on physical servers, somewhere in the cloud, and/or on the backs of napkins, they may have underwent various transformations and exist in several slightly different forms of varying accuracy and transparency, and most likely, that decision-maker does not know which of these sources contains the data he or she needs to make an optimal decision, nor the processing those data underwent; it is complete chaos, and not the fun and festive Bacchian kind (and I haven’t even spoken of the inherent fuzziness of data).
The more technologies a company implements in its data stack, the more points there are for potential misunderstandings, and the more training individual decision-makers have to undergo to become fluent in the data stack on which they rely. In other words, decision-makers are being disempowered by the increasing complexity of the modern data stack.
So… What now?
I see three possible solutions to the problem of disempowerment:
- Stop trying to reinvent the wheel and keep spreadsheets around for the long haul;
- Hire decision-makers who are prepared to use and keep up with the modern data stack; or
- Promote close collaboration between data experts and decision-makers to support decision-making.
All three options have pros and cons, but I am personally a fan of the third. In the last decade or so, there has been a rapid increase in our ability to store and manipulate data, and spreadsheets alone cannot be expected to fulfill all modern data needs. Similarly, in many industries, decision-makers alone cannot be expected to stay on the cutting-edge of data science. Therefore, it follows that close collaboration between data experts and decision-makers is becoming increasingly necessary in the modern office.
Alternatively, perhaps in time an easy-to-use tool will come along that can be used to both store and analyze data… Oh… wait… that’s the spreadsheet.