I have been trying to subtly shift people from saying “big data” to saying “complex events” for some time now. But perhaps I was being too subtle.
Big Data, like master data management (MDM) and customer relationship management (CRM) before it, has a problem. The problem is that it's a true innovation – so nobody understands it.
Also, it is an innovation that is technology-driven, in the sense that it wasn't possible without advances in technology. However, again like MDM and CRM, the distinction between the enabling technology and the true innovation is hard to maintain. For these types of innovations, eventually (or rather almost instantly in the case of big data) the phrase is adopted by vendors and then becomes just “a set of technologies” rather than the innovation that those technologies have enabled.
The simplest solution to this problem is to use two names: one name for the problem and one name for the solution. In this case the problem is “complex events” and the solution is “big data”. More specifically, the problem is complex events over mixed granularity and time, where some data is a proxy for other data.
Firstly, let's clarify what we mean by a simple event. Take a look at the following row of data:
This data represents a simple event. The way we understand this data as an event is also simple. We just have to put some clear labels on the data, and resolve any codes used, and we can quickly see that the event was the purchase of 2 blue ballpoint pens.
As this is a simple event it lacks the features of a complex event. This purchase occurred at a single moment – in this case some time on the 12th of May 2012. We can debate how accurately we can see the time of the event but this is the type of event that is described entirely at one particular moment.
The second significant feature that the simple event lacks is mixed granularity. Note that the price shown is the price of a single pen. There is also no complexity beyond simple multiplication in determining the total price involved in the event. There is no reference to things like “the total value of pens that the particular person has bought in their lifetime” that is fundamental to understanding the event.
Thirdly, all of the information provided to describe the event is directly related to the transaction. Even the numerical code used to identify that the product was a pen is just a code to simplify some uses of the information. The event wasn't a transaction to acquire part of the means to produce a letter to the purchaser's wife's mother. While this might be true, it is not part of the event.
These characteristics are what make the event simple. Complex events have the opposite characteristics. Managing these events takes data of mixed granularity and mixed time periods, including some data that is a proxy for other data.
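As a rough illustration, the simple event described above could be sketched as a single flat record. This is only a sketch: the field names, the product code 1001, the code table and the unit price of 1.50 are all hypothetical, chosen for illustration rather than taken from the original row of data.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal

# Hypothetical code table: the article does not give the actual codes.
PRODUCT_CODES = {1001: "ballpoint pen"}


@dataclass(frozen=True)
class SimpleEvent:
    event_date: date      # the event happens at one particular moment
    product_code: int     # a code that merely stands in for the product name
    colour: str
    quantity: int
    unit_price: Decimal   # price of ONE item: a single level of granularity

    def describe(self) -> str:
        # "Put clear labels on the data and resolve any codes used."
        product = PRODUCT_CODES[self.product_code]
        return f"purchase of {self.quantity} {self.colour} {product}s"

    def total_price(self) -> Decimal:
        # Nothing beyond simple multiplication is needed.
        return self.quantity * self.unit_price


# Illustrative values only; the unit price is an assumption.
purchase = SimpleEvent(date(2012, 5, 12), 1001, "blue", 2, Decimal("1.50"))
print(purchase.describe())
print(purchase.total_price())
```

Everything needed to understand the event lives in the one record: no history, no aggregates and no stand-ins for information held elsewhere.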
If the purchase of the pen itself is a simple event then the decision to buy a pen is a complex event. You might sketch this decision as follows:
In fact, you could quickly add even more complexity to this sketch. This is a genuinely complex event. It crosses time periods because, if you are going to be home in 20 minutes and you know you have pens there, you don't have to buy one now.
If you are trying to encourage this person to buy a pen you will need to mix granularity and use proxies for the information in the buyer's head. You may know that pens displayed at the front counter sell better than pens at the back of the counter – and you can use this aggregated information – which is a proxy and at a different level of granularity to the rest of the event – when managing the event.
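To make the contrast with a simple event concrete, here is a minimal sketch of the data that the pen-buying decision draws on. The class names, field names, the 20-minute threshold parameter and the sales figures are hypothetical and only illustrate the idea of mixing individual-level data, future time periods and aggregated proxies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ShopperContext:
    # Individual-level data that looks ahead in time. In practice it sits
    # in the buyer's head and is rarely observable directly.
    minutes_until_home: int
    has_pens_at_home: bool


@dataclass
class PlacementStats:
    # Aggregated over many past purchases: a different granularity from
    # the single shopper, and only a proxy for what that shopper will do.
    position: str
    sales_per_week: float


def needs_pen_now(ctx: ShopperContext, threshold_minutes: int = 20) -> bool:
    # If the shopper will soon be home and has pens there, no purchase is
    # needed now. The threshold is an illustrative assumption.
    return not (ctx.has_pens_at_home and ctx.minutes_until_home <= threshold_minutes)


def best_placement(stats: List[PlacementStats]) -> str:
    # Use the aggregate as a proxy for the individual's behaviour.
    return max(stats, key=lambda s: s.sales_per_week).position


# Hypothetical figures for illustration only.
stats = [PlacementStats("front", 40.0), PlacementStats("back", 12.0)]
print(needs_pen_now(ShopperContext(20, True)))
print(best_placement(stats))
```

Note that the seller can usually observe only the aggregated placement figures, while the shopper's own context has to be inferred, which is exactly where proxies come in.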
Big Data then is the technology that you can use if you want to understand, influence the outcomes of, or otherwise manage complex events. When big data is presented as Volume, Variety, Variability (etc.) it could be argued that these are features of the data required to support complex events rather than features of big data.
You will often hear “that's not a real big data problem!”, which is true, but equally you could say “that's not a complex event!”. Importantly, while you could apply the technology of big data to problems that are not real big data problems, a complex event is a complex event.
Some may suggest that this distinction between simple and complex events has always been true – and they would be right. The concepts are not new. However, it shouldn't be suggested that we had the ability to manage complex events before the technology allowed us to do so.
It is true that we have been able to create computerised models for some time which have allowed us to, for example, generate a list of customers likely to churn, or pre-process a reasonably accurate next best offer for a customer. However, there is a difference between being able to provide business intelligence and analytical insights across domains where complex events exist, and actually being able to work with the complex events themselves, in real time, across broad domains, cost-effectively, and with minimum engineering investment.
So, it's the “complex event” revolution we should be exalting, not the “big data” revolution.