This decade has seen the rise of Big Data informing the important things in our lives. Whether “Moneyballing” winning sports teams, accelerating the pursuit of a cure for cancer, or understanding voter sentiment, Big Data has a role to play. It has become common parlance in most large commercial enterprises, some of which have undertaken significant transformative efforts using Data Science to reshape their mission and business functions. Other businesses, not wanting the appearance of being left behind, have put a marketing spin on “us too, we have lots of data”.
For executives who wish to embrace Big Data opportunities, and not just put a spin on the amount of data they already oversee, figuring out a course of action means navigating an onslaught of new software technologies and using advanced statistics to drive new decision-making approaches. Making this specialised knowledge accessible to the C-Suite and its wider stakeholder audiences is a critical first step in opening up Big Data opportunities.
The Big Data Challenge For Executives
Most executives understand that there may be high potential upside in pursuing a “Big Data initiative”; nonetheless, many hesitate. The narrative of “not at this time” revolves around three core themes:
One, they rightfully claim to already possess an abundance of data but a paucity of data-driven insights. The idea of undertaking an initiative that will invite and unleash even more data in this context is considerably unappealing. Two, they assume that it must be an IT initiative, and that function may already be overrun and behind schedule on existing projects. No matter where you place a Big Data initiative in the organisational structure, executives know it is in large measure a technology-based solution set, irrespective of how it is dressed up in new vernacular.
These two challenges interact to create the third, most pressing obstacle. To keep on schedule and to be responsive to market changes, companies have turned to point solution software, typically SaaS-based subscription models or sometimes licensed versions of applications. These multi-tenant software applications build out large tables with linked forms and workflows. Every client who subscribes to or rents the software insists on the ability to configure it to their liking and to uniquely name the data inside it. Multiple point solutions are brought in to cover the enterprise’s needs. The result is numerous unconnected data sources sitting within various functional silos, most of the data “dark”, untouched by analysts. It is organised and (occasionally) stored exactly how data would never be arranged if Analytics were the goal. Massive amounts of dark data make the task of bringing in Big Data Science daunting, and hence the idea gets dismissed.
Data Science was born to address these very challenges.
Thinking Through Data Science Solutions
Big Data initiatives almost always begin on an exciting note. The engines of new technology are revved, the cosmetics of dashboards are lit up, and new discoveries are showcased. In the discussions that follow, though, an unfortunate scenario often unfolds where much of the executive audience gets left behind. Bring in the big vendors too quickly and you hear fantastic solutions absent a proper vetting of the actual problems you may wish to solve. As with many challenges in life, it can be helpful to have everyone take a deep breath and step back.
A metaphor that I have found useful is the notion that “data is the new oil” driving the digital economy. It’s a phrase now into its 10th year, originally coined by Clive Humby of DunnHumby fame. It can frame a discussion that allows for decision making across the executive suite, inviting significant contributions from accomplished general executives spanning all corporate functions, each with their domain expertise but none necessarily steeped in technology or statistics.
“Data is the New Oil”: A 10 Step Framework
If we accept that data is the new oil, then we have a way of working out some important decisions based on a common understanding. There is oil underground. You need refined oil in your vehicle to get where you need to go. Now let’s think through the logical steps in between. When contemplating a Big Data initiative, get some clarity by thinking through the following ten steps for extracting oil and using it to get to the place you believe is important.
1. Find the sources of data
To begin the journey, consider the numbers, text, pictures, geolocations, workflow activity, etc. that will be needed to serve up answers, and then think about where you may find them. The data may be underfoot in a beautifully architected well of your company’s making. Unfortunately, it probably exists in several of these, all of them built separately with different original purposes in mind. It may also exist in third-party applications like social media streams or public databases. With many Big Data initiatives of merit, you will need to mine from several sources.
2. Extract the data
You may want to grab all of the data or, pragmatically, just some of it. In many cases, you will want to parse, classify, and aggregate it while extracting. New Data Science methods exist for drilling into your company’s servers efficiently, versus old methods that would either stall or clog traditional “data warehouses”. Don’t forget that your methods must respect data governance standards that meet privacy and regulatory requirements.
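As a minimal sketch of this idea, extraction can parse records, drop governance-restricted fields, and aggregate in a single pass rather than dumping raw tables wholesale. The field names and the restricted-field rule below are purely illustrative assumptions, not a reference to any real system:

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw export from a point solution; field names are illustrative.
RAW_EXPORT = """region,customer_email,order_total
EMEA,a@example.com,120.50
EMEA,b@example.com,80.00
APAC,c@example.com,200.00
"""

# Drop PII at extraction time to respect governance requirements.
RESTRICTED_FIELDS = {"customer_email"}

def extract_and_aggregate(raw_csv):
    """Parse records, strip restricted fields, and aggregate while extracting."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(raw_csv)):
        clean = {k: v for k, v in row.items() if k not in RESTRICTED_FIELDS}
        totals[clean["region"]] += float(clean["order_total"])
    return dict(totals)

print(extract_and_aggregate(RAW_EXPORT))  # prints {'EMEA': 200.5, 'APAC': 200.0}
```

The point is not the specific code but the shape of the decision: aggregating and filtering at the point of extraction keeps both the privacy exposure and the downstream data volume small.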
3. Pipe it to a common place
A piping conversation will be overshadowed (pun intended) by all the wonders of “the cloud”. Still, you need to figure out how all your required data gets efficiently streamed from where it is now to a common place where you can use it, including a cloud-based system. It is no fun to learn later that sometimes it gets loaded on disks or tapes and couriered to the place that services the cloud. Or that one of your options is to wait an eternity while the data loads. Be careful when considering a cloud service and ensure you think through how it is going to meet your needs.
4. Clean, blend and treat it
For most, this stage is the worst part. The term you need to learn is “data wrangling”. You will be frustrated to find out how every application your company uses stores the same (or similar) data in different ways for different purposes, and how this is stalling your initiative. Ask up front to see all the ways a single customer is recorded across all your systems. For B2B companies, it can be disheartening. Go deeper and get dizzy trying to align units of time. If your Big Data initiative has merit, there is no clerical pool big enough to do all the wrangling that needs to get done. That’s when Algorithms and Machine Learning can be a tremendous help.
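To make the customer-record problem concrete, here is a toy sketch of one wrangling task: collapsing variant spellings of the same B2B customer into a single key. The names, suffix list, and matching rule are hypothetical, and real entity resolution is far more involved, but the principle of programmatic normalisation over clerical effort is the same:

```python
import re

# Hypothetical variants of one customer as recorded by three point solutions.
RAW_NAMES = ["ACME Corp.", "Acme Corporation", "acme corp"]

# Illustrative list of legal suffixes to strip during matching.
SUFFIXES = {"corp", "corporation", "inc", "ltd", "llc"}

def canonical_name(name):
    """Lower-case, strip punctuation, and drop legal suffixes so variant
    spellings of one customer collapse to a single canonical key."""
    tokens = re.sub(r"[^\w\s]", "", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)

# All three variants reduce to the same key, "acme".
print({canonical_name(n) for n in RAW_NAMES})  # prints {'acme'}
```

A rule-based pass like this handles the easy cases; Machine Learning earns its keep on the ambiguous ones no simple rule can resolve.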
5. Build your big data lake
Let’s not talk about a big lake with a lot of oil in it. That despoils our beautiful metaphor and is not cool. How about just a big, safe lake of treated data (oil) waiting to be safely piped to responsible users? Does new, clean, blended and processed data stream in continuously? Have you contemplated reasonable future use cases in building it? Is it secure?
6. Pipe it out to users
Is there one big pipe to an internal user group? Or are there some small pipes to internal and external users, where each is served up just what they need? Do you let others take what they need from your lake or do you pipe it out to them as they need it? Get a visual on the various users and ask how they are getting access to the data for the analysis that they are doing.
On trend are “slice and dice” descriptive statistics alongside dashboards. These typically are core to self-serve analytics, and they are a good place to start. Research analysts take it one step further, using advanced multivariate statistical analyses on batches of data to create predictive analytics. One more notch up and we have Machine Learning and Algorithms: the process of continuously and dynamically improving the previously formed predictive equations using streaming data. Prescriptive analytics provides one final level above, building scenarios that interact with a domain expert for decision making. Every time you open Google Maps in your city and decide on the fastest route versus the shortest route to your known destination, informed by streaming data on current traffic slowdowns, you are an Analyst using prescriptive analytics in a Big Data initiative.
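The jump from batch prediction to Machine Learning on streaming data can be sketched with a toy online estimator that keeps a prediction current as each new observation arrives, rather than refitting on a static batch. This is an illustrative assumption-laden sketch in the spirit of the travel-time example, not a description of how Google Maps actually works:

```python
class StreamingEstimator:
    """Toy online model: an exponentially weighted estimate that updates
    with every streamed observation instead of refitting on a batch."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha      # how quickly new data overrides the old estimate
        self.estimate = None

    def update(self, observation):
        if self.estimate is None:
            self.estimate = observation
        else:
            # Move the current estimate a fraction of the way toward the new data.
            self.estimate += self.alpha * (observation - self.estimate)
        return self.estimate

# Hypothetical streamed travel times (minutes) for one route.
eta = StreamingEstimator()
for minutes in [12.0, 15.0, 14.0, 20.0]:
    eta.update(minutes)
print(round(eta.estimate, 2))  # prints 15.26
```

The descriptive view would report the batch average after the fact; the streaming view always has a current best answer, which is what prescriptive tools present to the decision maker.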
Data visualisation is to statistics what UX design is to software. Tools of this trade improve on a daily basis. The challenge is not what you can do here but rather what you should do. Be very clear on whether you want your audience not to have to think at all, think a little, or think a lot. Each has its place and purpose. We too often assume our audience needs the last because we are enamoured with our own knowledge, discoveries and technology. Be wary of asking a meteorologist what the weather is like outside. The correct answer is “hot” or “cold”, with an occasional judicious addition of “wet” or “windy”. Ironically, in business, there is often an inverse correlation between how much we have to think about something and how much action it drives.
Find ways that Data Science can increase speed, accuracy, or optimisation and then communicate these improvements to users. The final result of a successful Big Data initiative is not a PowerPoint presentation with insightful recommendations, although this may be a step in the process. In most instances, it is Algorithms and Machine Learning buried deeply in the software ecosystem of the company that it services.
This is the business chasm. The goal of any data science initiative should be to cause someone or something to do things differently today than they did yesterday as part of a process or outcome improvement for your organisation.
There are experts and tools used at every step I have outlined above. Unicorns do not exist. Several professors collaborate to present one short course on Big Data at MIT because they each have different sets of expertise. Software unicorns also do not exist. Be wary of audacious claims that all ten steps can be made easy with the click of a button and an easy payment plan.
Bring old-school discipline to new-school technology. Before making large-scale capital expenditures in tools and technology, know that there is an abundance of open source technology available for free or close to free. The people investment is far more critical than the technology one. In many cases, consider building out a pilot project that includes a deep-dive data, talent and resource audit, alongside a Predictive Analytics exercise, all with minimal technology infrastructure expenditures.
After a successful Proof of Concept, you will have a much clearer understanding of what technology and team requirements you need for your Big Data initiative. One of the key outcomes of this exercise is to go back and think through the data “oil pipeline” to determine your requirements. You can now make clear decisions about investments in technology and people, in order to build out a Big Data initiative in your business.
You are now ready to join the Big Data discussion!