Interpreting Big Data in the Lambda Architecture

Why business owners should care about this thing called the Lambda Architecture

In the past 25 years I have seen four things that really made me step back and say, “This changes everything.” The first was the browser (before that we got data from the Internet using newsgroups and anonymous FTP). The second was open source distribution (we could get whole architectures up in hours, not weeks or months). The third was App Stores (Amazon and Apple allowed us to distribute software with zero marginal cost). The most recent was the Lambda Architecture.

Yep, it is that big.

If you are a business owner or product manager who is into Big Data, data-driven decision-making, iterative A/B testing, machine learning-driven recommendation, or any similar analytics application, you have probably heard a passing reference to this thing called the Lambda Architecture. However, anyone digging deeper immediately finds a menagerie of arcane terms that could only appeal to a developer: Kafka, Storm, Spark, Cassandra, ElephantDB, Impala, Speed Layer, Batch Layer, Immutable Data Store, etc. This is unfortunate, because it obscures how disruptive a change the Lambda Architecture represents. As a result, many people with decision-making authority to fund technology changes are missing out on something really big.

Life in the traditional architecture world

Traditional architectures are based on transactions. They force collection of data into the formats required to complete a given transaction (e.g., I need to collect N fields of information to process the sale of an item). In addition, traditional architectures enable data to be changed: I can update my profile, update my shopping cart, update my order status, etc. This makes perfect sense if your objective is to complete a transaction.

But what if I want to understand more about who buys what, who is doing what, or, often more importantly, what leads something to happen (or not happen)? I cannot get this from the transaction data. Instead I have to perform “data archaeology,” stitching multiple sources of data together to reconstruct what happened just before and after the transaction. If I am lucky, I have all this data. However, more often than not I need to engage in development efforts to collect more data at the time of the transaction, log more info, pull it into a data warehouse, change my reports, and then dig in to see if I can figure things out. This not only takes a lot of time and effort; it is also a ripe source of errors.

Lambda flips how we view data on its head

The Lambda Architecture starts with an entirely different premise: that it is impossible to understand today all the future uses and interpretations we will need from our data.

This is not just a platitude. It is the underlying philosophy that the value of data comes from its ability to answer as many questions as you would ever want to ask. This drives entirely different approaches to how data is captured, stored, interpreted, and, most importantly of all, continuously reinterpreted as you learn and discover more about your company, customers and operations:

  • First, data is preserved in its original form and never changed or destroyed. This lets you look at any piece of data at any point in time and factor in changes over time. For example, you could re-segment your customers every year, quarter, or even day as you learn new patterns.
  • Second, data is not forced into arbitrary formats (i.e., schemas) but is preserved raw, as you may want to go back and glean different elements. For example, you could later realize that a variable such as the source IP address of a customer visit to your site may entirely change how you measure, interpret and react to customers from this address.
  • Third, data is engineered to allow it to be easily reinterpreted as you learn more. This does not just focus on making reinterpretation fast; it also makes reinterpretation fault-tolerant (i.e., easy to correct in the event of a bug, without any loss of information).
  • Finally, it allows all of this in real time with two points of view: a just-in-time view and a deep cross-sectional view (both of which are always current). This lets you make decisions quickly without sacrificing the 100% lossless accuracy needed for important business areas (such as finance, medicine, or mission-critical operations).
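
For readers who like to see ideas in code, here is a minimal Python sketch of the four points above. It is only an illustration under simplifying assumptions, not how Kafka, Storm, or any real Lambda stack is built. Every name in it (Event, EventLog, count_by, query, batch_cutoff) is hypothetical and made up for this example, and a small in-memory list stands in for what would be a distributed data store.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: an event cannot be modified once created
class Event:
    timestamp: int   # illustrative integer clock, not a real time format
    customer: str
    action: str
    source_ip: str   # kept even though no report needs it yet


class EventLog:
    """Hypothetical append-only store: events are added, never updated or deleted."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def up_to(self, timestamp):
        return [e for e in self._events if e.timestamp <= timestamp]

    def since(self, timestamp):
        return [e for e in self._events if e.timestamp > timestamp]


def count_by(events, key):
    """One possible interpretation of the data: count events per key."""
    return Counter(key(e) for e in events)


def query(log, batch_cutoff, key):
    # Deep view: in a real system this would be precomputed periodically
    # over the whole history; here it is simply recomputed on demand.
    batch_view = count_by(log.up_to(batch_cutoff), key)
    # Just-in-time view: only the events that arrived after the last batch run.
    speed_view = count_by(log.since(batch_cutoff), key)
    # Merging the two gives an answer that covers all the data.
    return batch_view + speed_view


log = EventLog()
log.append(Event(1, "alice", "purchase", "10.0.0.1"))
log.append(Event(2, "bob", "visit", "10.0.0.2"))
log.append(Event(3, "alice", "visit", "10.0.0.1"))
log.append(Event(4, "carol", "purchase", "10.0.0.2"))

# The question we knew about on day one.
by_customer = query(log, batch_cutoff=2, key=lambda e: e.customer)

# A question we only thought of later: same raw data, new interpretation.
by_ip = query(log, batch_cutoff=2, key=lambda e: e.source_ip)
```

The point of the sketch is the last two lines. Because the raw events were kept unchanged, asking a new question (activity by source IP instead of by customer) needs no new data collection, only a new function applied to the same log. If count_by had a bug, fixing it and rerunning would repair the answers without losing anything.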

Once you have these capabilities, the things you can do with data—quickly and at scale—are pretty amazing. I will share some of these in future posts, as I want to keep this post short.

However, I will close this post out with a simple analogy…

“Think Like a Chef” vs. the Fast Food Menu

Traditional architectures are like fast food menus. You have these options. If you want to change the menu, we can do some market research, see what works and roll out a new menu. If you want to change again (or explore “what if we had done this?”) we can repeat this process.

The Lambda Architecture is like the pantry of a great chef. You have all these ingredients. If you feel like duck à l’orange, we can make this. If you want a duck confit salad, we can re-purpose the ingredients. If you want really rich potatoes, we can render the fat and cook the potatoes in it. If you want vegan, we can pull other items out of the pantry and make something else. There are so many more options.
