Analysis

Analyzing tech trends, products, and developments

Why business owners should care about this thing called the Lambda Architecture

Updated on April 19, adding “Mapping this back to…” final section

In the past 25 years I have seen four things that really made me step back and say, “This changes everything.” The first was the browser (before that we got data from the Internet using news groups and anonymous FTP). The second was open source distribution (we could get whole architectures up in hours, not weeks or months). The third was App Stores (Amazon and Apple allowed us to distribute software with zero marginal cost). The most recent was the Lambda Architecture.

Yep, it is that big.

If you are a business owner or product manager who is into Big Data, data-driven decision-making, iterative A/B testing, machine learning-driven recommendation or any similar analytics application, you have probably heard a passing reference to this thing called the Lambda Architecture. However, anyone digging in deeper immediately finds a menagerie of arcane terms that could only appeal to a developer: Kafka, Storm, Spark, Cassandra, Elephant DB, Impala, Speed Layer, Batch Layer, Immutable Data Store, etc. This is unfortunate, because it obscures how disruptive a change the Lambda Architecture represents. As a result, many people with decision-making authority to fund technology changes are missing out on something really big.

Life in the traditional architecture world

Traditional architectures are based on transactions. They force collection of data into formats required to complete a given transaction (e.g., I need to collect N fields of information to process the sale of an item). In addition, traditional architectures enable data to be changed: I can update my profile, update my shopping cart, update my order status, etc. This makes perfect sense if your objective is to complete a transaction.

But what if I want to understand more about who buys what, who is doing what, or often more importantly what leads something to happen (or not happen)? I cannot get this from the transaction data but instead have to perform “data archaeology,” stitching multiple sources of data together to reconstruct what happened just before and after the transaction. If I am lucky, I have all this data. However, more often than not I need to engage in development efforts to: collect more data at the time of transaction, log more info, pull it into a data warehouse, change my reports, then dig in to see if I can figure things out. This not only takes much time and effort; it is also a ripe source of errors.

Lambda flips how we view data on its head

The Lambda Architecture starts with an entirely different premise: that it is impossible to understand today all the future uses and interpretations we will need from our data.

This is not just a platitude. It is an underlying philosophy: the value of data comes from its ability to answer as many of the questions you would ever want to ask as possible. This drives entirely different approaches to how data is captured, stored, interpreted, and, most importantly of all, continuously reinterpreted as you learn and discover more about your company, customers and operations:

  • First, data is preserved in its original form and never changed or destroyed. This lets you look at any piece of data at any point in time and factor in changes over time. For example, you could re-segment your customers every year, quarter, or even day as you learn new patterns.
  • Second, data is not forced into arbitrary formats (i.e., schemas) but is preserved raw, as you may want to go back and glean different elements. For example, you could later realize that a variable such as the source IP address of a customer visit to your site may entirely change how you measure, interpret and react to customers from this address.
  • Third, data is engineered to allow it to be easily reinterpreted as you learn more. This does not just focus on making reinterpretation fast; it also makes reinterpretation fault-tolerant (i.e., easy to correct in the event of a bug, without any loss of information).
  • Finally, it allows all of this in real-time with two points of view: a just-in-time view and the deep cross-sectional view (both of which are always current). This lets you make decisions quickly without sacrificing the 100% lossless accuracy needed for important business areas (such as finance, medicine, or mission-critical operations).
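
To make the first three points concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not code from any real Lambda deployment: the in-memory log, the made-up customer events, and the names Event, record, and build_view are all invented for this example. Raw events are appended and never altered, and each "view" is just a function applied to the full log, so asking a new question only requires writing a new function.

```python
from collections import Counter
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class Event:
    """One raw fact, stored as it arrived and never modified afterwards."""
    timestamp: str
    payload: MappingProxyType  # read-only view of the raw fields


def record(log, timestamp, **fields):
    """Return a new log with one more fact; existing entries are untouched."""
    return log + (Event(timestamp, MappingProxyType(dict(fields))),)


def build_view(log, interpret):
    """Recompute a view from scratch by applying `interpret` to every event."""
    return Counter(interpret(event) for event in log)


# Hypothetical events, invented for illustration only.
log = ()
log = record(log, "2014-04-01T09:00", customer="ann", action="visit", source_ip="10.0.0.5")
log = record(log, "2014-04-01T09:05", customer="bob", action="visit", source_ip="10.0.0.5")
log = record(log, "2014-04-02T14:30", customer="ann", action="purchase", source_ip="192.168.1.9")

# First interpretation: activity per customer.
by_customer = build_view(log, lambda e: e.payload["customer"])

# Later reinterpretation of the same untouched events: activity per source IP.
by_ip = build_view(log, lambda e: e.payload["source_ip"])
```

In this toy run, by_customer counts two events for "ann" and one for "bob", while by_ip counts two events from 10.0.0.5. The second view needed no new data capture, only a new function over the raw log. If a view function turns out to have a bug, fixing it and rebuilding is enough, because the raw events were never changed.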

Once you have these capabilities, the things you can do with data—quickly and at scale—are pretty amazing. I will share some of these in future posts, as I want to keep this post short.

However, I will close this post out with a simple analogy…

“Think Like a Chef” vs. the Fast Food Menu

Traditional architectures are like fast food menus. You have these options. If you want to change the menu, we can do some market research, see what works and roll out a new menu. If you want to change again (or explore “what if we had done this?”) we can repeat this process.

Lambda architecture is like the pantry of a great chef. You have all these ingredients. If you feel like duck à l’orange, we can make this. If you want a duck confit salad, we can re-purpose the ingredients. If you want really rich potatoes, we can render the fat and cook the potatoes in it. If you want vegan, we can pull other items out of the pantry and make something else. There are so many more options.

Mapping This Back to Things Business People Care About

So what does this mean for your business? Do you remember the last time you heard these comments:

  • “You’ll see that report. It will be in our Data Warehouse–tomorrow around 10am.”
  • “Oh, that’s in our warehouse. We can build a program to convert and load the data into production. It will only take 3 weeks. Can you submit your TPS form to the Steering Committee so we can prioritize this?”
  • “Gee, it’s too bad we did not capture that data. We can start to capture it now. In a few months we can start analyzing it.”

With Lambda, all of these comments–and many more–go away. Data is never thrown away. It is always in production, ready to be used–for analysis or real-time transactions. There is no delay between transactional use and analysis–data flows down both paths at once.
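
The idea of data flowing down both paths at once can be sketched in a few lines of Python. This is a single-process toy under stated assumptions, not how the distributed tools named earlier actually work: the names master_log, batch_view, speed_view, ingest, run_batch, and units_sold are hypothetical, and so are the sales events.

```python
from collections import Counter

master_log = []         # append-only record of every raw event
batch_view = Counter()  # deep view, rebuilt periodically from the full log
speed_view = Counter()  # just-in-time view of events since the last batch run


def ingest(event):
    """Send one event down both paths at once."""
    master_log.append(event)                       # batch path: keep the raw event
    speed_view[event["product"]] += event["qty"]   # speed path: update immediately


def run_batch():
    """Recompute the batch view from the entire log, then reset the speed view."""
    global batch_view
    batch_view = Counter()
    for event in master_log:
        batch_view[event["product"]] += event["qty"]
    speed_view.clear()


def units_sold(product):
    """Answer a query by merging the batch view with the recent speed view."""
    return batch_view[product] + speed_view[product]


# Hypothetical events, invented for illustration only.
ingest({"product": "widget", "qty": 2})
ingest({"product": "gadget", "qty": 1})
run_batch()                              # e.g. a nightly recomputation
ingest({"product": "widget", "qty": 3})  # arrives after the batch run

print(units_sold("widget"))  # 5
```

The query is current even though the batch view has not been rebuilt since the last event arrived, and nothing was discarded along the way: the next batch run simply recomputes everything from the raw log.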

Just imagine what problems you can solve when these limitations go away.

Minority Report meets the NFL: Which College Football Program Is Most Likely to Lead to Future NFL Arrests?

A few weeks ago I published a deep-dive analysis of USA Today’s NFL Arrest Database.  While I received many comments (mostly via email or Twitter DM), two rose to the top:

  1. College is a formative experience. Did the college the player attended affect the likelihood of arrest (and criminal charge)?
  2. Many towns have very active football programs. Did the town or high school the player attended drive specific outcomes?

Are we getting into Minority Report territory?

The more we look at variables that could be used to classify future criminal behavior (e.g., does going to college X now indicate you will be arrested for Y seven years later?), the more we get into a world like that depicted in the movie Minority Report. As such, we need to be really careful to ensure we compare “apples-to-apples” for all analysis.

This post will answer the first question. I am still processing the data on high schools before writing up the second.

The college that led to the most arrests: WVU

It is a bit tricky analyzing which college led to the most arrests. You cannot simply count arrests by NFL players and group them by college program. This would penalize colleges with great programs (a college with 200+ alumni in the NFL should have more alumni with arrests than a college with only 5 alumni). Similarly, you cannot simply look at the ratio of arrested NFL alumni to total NFL alumni (as this would penalize a college with few alumni).

So how did I answer this question? I combined two factors. I overlaid the following:

  • Top 5% college programs with most alumni in the NFL
  • Top 5% college programs with most alumni arrested in the NFL

Here is a visualization of the result:

Spiral chart shows how many NFL arrests were from players from each respective college program. The seven schools highlighted were schools in BOTH the top 5% for NFL placement AND top 5% for NFL arrests.

West Virginia University not only had the most arrested NFL alumni. It also had the most arrested NFL alumni in comparison to all other top college programs. 
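
The overlay of the two “Top 5%” lists described above could be sketched as a set intersection. The following Python is a minimal illustration, not the code used for this analysis: the function name top_percent is hypothetical, and the 40 programs and their counts are made-up placeholder data rather than the real dataset.

```python
def top_percent(counts, percent=5):
    """Return the names ranked in the top `percent` of `counts` (at least one).

    Ties at the cutoff are broken by the existing order of `counts`.
    """
    k = max(1, len(counts) * percent // 100)
    ranked = sorted(counts, key=counts.get, reverse=True)
    return set(ranked[:k])


# Made-up placeholder data for 40 hypothetical programs (not the real dataset).
alumni = {f"Program {i:02d}": 5 * i for i in range(1, 41)}
arrests = {f"Program {i:02d}": i // 4 + (3 if i % 10 == 0 else 0) for i in range(1, 41)}

# A program is highlighted only if it is in BOTH top-5% lists.
in_both = top_percent(alumni) & top_percent(arrests)

print(sorted(in_both))  # ['Program 40']
```

Requiring membership in both lists keeps small programs with a single arrest from dominating the result, and keeps very large programs from being flagged on size alone.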

I took a look at the ratio of each of these schools’ arrested NFL alumni per 125 total NFL alumni to get an “arrest per squad that made it to the NFL.” The results were interesting:

  • The average “Top 5%” team had 4.53 arrests per NFL squad
  • WVU had 18.06, nearly 4x the average arrest rate
  • WVU also had almost double the next highest arrest rate: University of Miami (FL) who had 9.87 arrests per NFL squad

Other Schools with Many Arrested NFL Alumni

As the spiral diagram above shows, WVU was not the only school in the “Top 5%” for both alumni who made it to the NFL and alumni arrested in the NFL. There were seven schools that made this list:

College/University    NFL Alumni    Arrests of NFL Alumni    Arrests per Squad
West Virginia                180                       26                18.06
Miami (FL)                   304                       24                 9.87
USC                          470                       23                 6.12
Ohio State                   409                       22                 6.72
Florida                      278                       19                 8.54
Michigan                     346                       17                 6.14
Georgia                      281                       16                 7.12
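
As a quick check of the “Arrests per Squad” column, here is a short Python sketch that recomputes it from the alumni and arrest counts in the table, using the 125-alumni squad size described earlier. The function name arrests_per_squad is just a label chosen for this example.

```python
SQUAD_SIZE = 125  # the post normalizes to 125 NFL alumni per "squad"

# (NFL alumni, arrests of NFL alumni), copied from the table above
schools = {
    "West Virginia": (180, 26),
    "Miami (FL)": (304, 24),
    "USC": (470, 23),
    "Ohio State": (409, 22),
    "Florida": (278, 19),
    "Michigan": (346, 17),
    "Georgia": (281, 16),
}


def arrests_per_squad(alumni, arrests, squad_size=SQUAD_SIZE):
    """Arrests scaled to a group of `squad_size` NFL alumni."""
    return arrests / alumni * squad_size


rates = {
    name: round(arrests_per_squad(alumni, arrests), 2)
    for name, (alumni, arrests) in schools.items()
}

for name, rate in rates.items():
    print(f"{name:<15}{rate:>6.2f}")
```

The recomputed values match the table, from 18.06 for West Virginia down to 6.12 for USC.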

Are these numbers really bad?  The answer is a definitive “Yes”.  Let’s take a look:

  • While these colleges represent 2% of all colleges that have placed players in the NFL, they represent 20% of all future NFL arrests.
  • The average college team’s NFL alumni have an “arrest per squad” rate of 4.89
  • The average “arrest per squad” rate of the teams with the highest placement of NFL players is even better: 4.53
  • These seven colleges have a much higher rate: 8.10. This is 118% higher than the average arrest rate of all the other schools with the most success placing alumni in the NFL
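
The 8.10 figure appears consistent with pooling the seven schools’ arrests and alumni from the table, rather than averaging their seven individual rates. That reading is an inference from the published numbers, not something stated in the analysis. A small Python sketch shows the difference:

```python
SQUAD_SIZE = 125

# (NFL alumni, arrests) for the seven schools, copied from the table above
seven = [(180, 26), (304, 24), (470, 23), (409, 22), (278, 19), (346, 17), (281, 16)]

total_alumni = sum(alumni for alumni, _ in seven)     # 2268
total_arrests = sum(arrests for _, arrests in seven)  # 147

# Pooled rate: treat the seven schools as one large group.
pooled_rate = total_arrests / total_alumni * SQUAD_SIZE

# Mean of rates: average the seven per-school rates, weighting each school equally.
mean_of_rates = sum(arrests / alumni * SQUAD_SIZE for alumni, arrests in seven) / len(seven)

print(round(pooled_rate, 2))    # 8.1
print(round(mean_of_rates, 2))  # 8.94
```

The pooled rate gives larger programs more weight, which is why it comes out lower than the simple mean: West Virginia’s high rate rests on a comparatively small number of alumni.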

What is the path from College to NFL team to type of arrest?

After looking at my Sankey diagram of NFL team to criminal charge to legal outcome, many people asked me if I could show a similar diagram leading from college to NFL team to criminal charge. Doing this for all 158 colleges with arrested NFL alumni would be unreadable. However, here is a Sankey diagram of the flow from the seven universities with the most NFL arrests:

Sankey diagram representing the "flow" from university to NFL team to arrest (for the "Top 7" programs highlighted above).

So, do specific college programs have a higher tendency for a type of crime?

My prior analysis showed a strong correlation of criminal charge type by NFL team. This led people to ask me the following “smoking gun” question: does any college stand out as the “leader” in arrests for a particular criminal charge? The answer is No.

Yes, there is a college whose NFL alumni had the most arrests for criminal charge X. However, the number of arrests by criminal charge by college is so small that there are no statistically valid indicators that a college team predicts any future criminal pattern. We should all be happy for that.

There might be a wider dimension than college that we could assess (e.g., conference, geographic area). However, college–in and of itself–is not a valid dimension to predict future criminal charge.

Some other interesting (and positive) insights

With all the attention on NFL arrests, it is easy to overlook the positive. My analysis of colleges showed some strong positives as well.

Very successful college programs–in general–do not equate to high arrest rates:

  • The colleges with the highest success rate placing players have a 19% lower arrest rate than the average college program. Notre Dame, UCLA, UL Monroe, Wisconsin, Syracuse, Minnesota, Boston College, Mississippi, Baylor, Indiana, Northwestern, Northwestern State (LA), and Arizona stand out as schools with the lowest alumni arrest rates.
  • The most successful program, Notre Dame (with a whopping 536 alumni who made it to the NFL), had only three alumni arrested. This corresponds to an arrest rate that is 86% lower than the average school.
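
The Notre Dame comparison can be reproduced with the same “arrest per squad” arithmetic. This short Python sketch assumes the 86% figure is measured against the 4.89 average rate quoted earlier for all college teams; that pairing is an assumption made for the example.

```python
SQUAD_SIZE = 125
AVERAGE_RATE = 4.89  # average college "arrest per squad" rate quoted earlier

# Notre Dame: 3 arrests among 536 NFL alumni
notre_dame_rate = 3 / 536 * SQUAD_SIZE
percent_lower = (1 - notre_dame_rate / AVERAGE_RATE) * 100

print(round(notre_dame_rate, 2))  # 0.7
print(round(percent_lower))       # 86
```

Under that assumption, Notre Dame comes out at about 0.70 arrests per squad, roughly 86% below the average.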

Also, NFL players are arrested 1/3 less often than the average US population. Clearly, emulating the examples set at Notre Dame, UCLA, UL Monroe, Wisconsin, Syracuse, Minnesota, Boston College, Mississippi, Baylor, Indiana, Northwestern, Northwestern State (LA), and Arizona would lead to better outcomes for all.

More to follow later…