
If a tree falls in the woods and no one hears it, did it happen? Not in Streaming Analytics

Interest in “Streaming Analytics” has exploded over the past few years. The reasons are two-fold. First, the rise of the Internet of Things has made it possible for the first time ever to get data directly (and automatically) from infrastructure, cars, homes, factories and more—all without a human ever having to do anything. To put this in perspective, last quarter more new automobiles were connected to mobile networks than new cellphones were. Second, the technology is now readily available to implement streaming analytics at massive scale without needing to invent your own frameworks. Not one but three technology projects (Storm, Spark, and Flink) are available to choose from. One of them, Apache Spark, is now the second-fastest growing open source project in history.

Streaming Analytics is a very fun field to be in (I have been in it for 22 years—in the national security arena, eCommerce, med-tech, and now Industrial IoT). Taking in data faster than any human being could examine it and analyzing it in near real-time to make split-second decisions provides omnipresent knowledge and enormous business value. However, Streaming Analytics presents a new challenge that does not exist in traditional After-the-fact Analytics:

You need to figure out how to make decisions on data that you do not know about yet—and may not ever find out about in time for it to matter.

Three real-world examples

To put it in philosophical parlance, how does anyone know if a tree in the forest fell down if no one ever sees (or hears) that it fell? As philosophical as this sounds, it can have multi-million-dollar impacts in the real world. Here are three examples:

Example 1: eCommerce Chatbot

My chatbot is engaged with a new prospective customer who may be eligible—based on her mobile number—for our bank’s highest value credit card. Unfortunately, that data is delayed in getting to my bot. As a result, at this point in time, I do not know whether the customer is: very valuable, average, or a credit risk. What does my chatbot do?


Example 2: Guaranteed Shipping

I have a booking to deliver high-value cargo to a customer site by end of business today. It is now 15 minutes after the day is over. I might be inclined to escalate to my carrier that the container has not arrived. However, at this point in time, I cannot tell if the container arrived but the signal from the carrier is delayed getting to me, or if the container did not arrive. What do I do?


Example 3: Infrastructure Security Monitoring

I run a cattle farm that is hundreds of thousands of acres. I have equipped all gates in my Smart Ranch with sensors to alert me if any are open (so I can prevent the cattle from getting away). The sensors send updates every 15 minutes. However, one of the gate sensors is a few minutes late. At this point in time, I do not know if the gate is open or closed. Does my system trigger an alert?


What makes Streaming Analytics different

All of these challenges are based on lack of information. Lack of information is typical in analytics (as well as messy data, data gaps, corrupt data, duplicate data and many other issues). However, in streaming analytics there is one critical difference: you will eventually receive the data you need right now to answer your question. By the time it arrives, however, it will be too late to make your decision: the eCommerce customer will be gone; your freight contract will be honored or broken; the cattle will be safe or have gotten away.

What makes this especially different is that all the parties involved with your business will know the answer as well. If my chatbot fails to offer a valuable customer the best credit card, the line of business GM will ask why “it was so stupid.” If I call the customer up to tell them the freight has not arrived and they respond with “but it got here 10 minutes before closing”, I will look stupid. It all boils down to this:

People may not know when After-the-fact Analytics miss a point; however, everyone will know that your Streaming Analytics made a mistake.

 

That can be stressful 😉

What’s a person to do?

The essential thing to remember when designing your Streaming Analytics solution is this:

Close enough and on time is much more valuable than perfect and too late

This means you need to build your solution to make a decision based on the information available (rather than waiting until the critical moment has passed). The trick is determining what is “close enough”. The answer to that question depends on your business context. Specifically, given your context, is it better to accidentally do something you should not have (a Type I error), or is it better to not do something you should have done (a Type II error)?

Let’s look at how this works in each of the three examples:

Example 1: eCommerce Chatbot

Our business context determines it is far worse to get a prospective customer excited about an offer that we cannot deliver than to offer a less valuable package (i.e., we are Type II biased, something typical in ad-tech and eCommerce). We do not make the highest-value offer.

Depending on our Risk Policies, we either make the normal offer (one for which a majority of customers qualify) or shunt the customer to a slower process (email vs. chat) to wait for the data to catch up (essentially shifting to batch). Most commerce companies have created default packages that allow the former action, allowing them to make more money in the “80% most likely case”. We could also apply a machine learning algorithm to guess the best alternative offer, maximizing revenue and minimizing the risk of an angry customer (or wasted time).
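As a minimal sketch (in Python, purely for illustration), the chatbot’s decision logic could look like the following. The offer tiers, the choose_offer function and the risk-policy flag are hypothetical names I am assuming here, not part of any real chatbot framework:

```python
# A minimal sketch of a Type II-biased offer decision on possibly missing data.
# The offer tiers, function name and risk-policy flag are illustrative assumptions.
from enum import Enum

class Offer(Enum):
    PREMIUM_CARD = "highest-value card"   # requires the eligibility data
    DEFAULT_CARD = "standard package"     # the "80% most likely case"
    DEFER_TO_EMAIL = "slower channel"     # shift to batch and wait for the data

def choose_offer(eligibility, allow_default_offer=True):
    """Decide what the chatbot offers *right now*.

    eligibility: True, False, or None (None means the data has not arrived yet).
    allow_default_offer: a risk-policy switch between the two fallback options.
    """
    if eligibility is True:
        return Offer.PREMIUM_CARD
    if eligibility is None:
        # Type II bias: never promise what we might not be able to deliver.
        return Offer.DEFAULT_CARD if allow_default_offer else Offer.DEFER_TO_EMAIL
    return Offer.DEFAULT_CARD  # known average customer or credit risk

print(choose_offer(None))  # data delayed -> Offer.DEFAULT_CARD
```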

Example 2: Guaranteed Shipping

Our business context indicates that it does not make sense to alert that we are late if we do not know it (yet)—especially given the likelihood that this could result in some “egg on our face” when the customer asks why we did not know the container arrived 20 minutes ago. As a result, we do not alert that we are late at 5:00pm. We make the call when we know for sure that the container was on time vs. late (i.e., when the delivery message actually arrives). This scenario is also Type II biased.

However, we do not want to expose ourselves to a completely irate customer in high-value circumstances. As such, we place a secondary streaming analytic in place: if we do not receive confirmation within 60 minutes of the scheduled delivery, we trigger an alert to reach out to our delivery carrier and find out the real status (i.e., by taking the expensive step of talking to a person vs. a sensor). We determined the “magic number” of 60 minutes by doing After-the-fact Analytics, which showed that waiting this long automatically resolves 80% of false positives while still giving us enough of a heads-up to detect the true issues. If we are even smarter, we can have our After-the-fact Analytics system automatically calculate the magic number to delay alerts based on location, time-of-day, day-of-week and other features.
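Here is a minimal Python sketch of that secondary rule; the should_escalate helper and its field names are assumptions for illustration, and the 60-minute grace period is simply the “magic number” described above:

```python
# A minimal sketch of the "escalate only after 60 minutes of silence" rule.
from datetime import datetime, timedelta
from typing import Optional

GRACE_PERIOD = timedelta(minutes=60)  # the "magic number" from After-the-fact Analytics

def should_escalate(scheduled_delivery: datetime,
                    confirmation_received_at: Optional[datetime],
                    now: datetime) -> bool:
    """Escalate to a human at the carrier only when the silence has lasted long
    enough that it is unlikely to be just a delayed delivery message."""
    if confirmation_received_at is not None:
        return False  # we know the real status; nothing to escalate
    return now > scheduled_delivery + GRACE_PERIOD

# 15 minutes after the 5:00pm deadline we stay quiet; 75 minutes after, we escalate.
sched = datetime(2017, 6, 1, 17, 0)
print(should_escalate(sched, None, sched + timedelta(minutes=15)))  # False
print(should_escalate(sched, None, sched + timedelta(minutes=75)))  # True
```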

Example 3: Infrastructure Security Monitoring

Our business context indicates that it is not good to close the farm gate after all the cattle have gotten away. As such, we have programmed our Streaming Analytics system to alert us if the gate is opened (before a human has sent an “I am opening the gate” message) OR if we have not received confirmation that the gate is closed for a period longer than 15 minutes. Essentially, we are Type I biased (not uncommon in safety and security situations).

Unfortunately, this bias will result in lots of alerts. Essentially, any time the sensor message is delayed in the cell network, our alarm will go off. Luckily, we have some more advanced analytic techniques to help with this. Namely, we can use a Lambda Architecture model that provides self-healing: the initial lack of confirmation that the gate is closed triggers an alert; the arrival of the delayed message that the gate WAS closed then cancels this alert (with a resolution message). This is still a bit chatty. However, it short-circuits false positives and prevents the need to send a worker (or a drone) all the way out to the gate to check if it is open.
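A minimal sketch of that self-healing pattern, again in Python for illustration (the GateMonitor class and its method names are hypothetical, not from Storm, Spark or Flink):

```python
# A minimal sketch of a Type I-biased, self-healing gate alert: raise an alert
# when the 15-minute heartbeat is missed, then auto-resolve when the delayed
# "gate closed" reading finally arrives.
from datetime import datetime, timedelta
from typing import Optional

HEARTBEAT = timedelta(minutes=15)

class GateMonitor:
    def __init__(self) -> None:
        self.last_closed_report: Optional[datetime] = None
        self.alert_open = False

    def on_sensor_reading(self, gate_closed: bool, reported_at: datetime) -> None:
        if gate_closed:
            self.last_closed_report = reported_at
            if self.alert_open:
                self.alert_open = False
                print(f"RESOLVED: late reading shows the gate was closed at {reported_at}")
        else:
            self.alert_open = True
            print(f"ALERT: gate reported open at {reported_at}")

    def on_timer_tick(self, now: datetime) -> None:
        # Type I bias: missing confirmation is treated as a possibly open gate.
        if self.last_closed_report is None or now - self.last_closed_report > HEARTBEAT:
            if not self.alert_open:
                self.alert_open = True
                print(f"ALERT: no 'gate closed' confirmation as of {now}")

# A missed heartbeat raises the alert; the late "closed" message resolves it.
monitor = GateMonitor()
monitor.on_timer_tick(datetime(2017, 6, 1, 9, 20))
monitor.on_sensor_reading(True, datetime(2017, 6, 1, 9, 10))
```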

Conclusion

Yes, Streaming Analytics is harder than After-the-fact Analytics. However, its near real-time omnipresence (not omniscience) offers tremendous benefits. You just need to think in philosophical terms when designing your analytic rules.


On the 45th Anniversary of the Moon Landing: 5 Lessons Apollo’s Program Manager taught me at MIT

I originally posted a version of this five years ago, on the 40th Anniversary of the Apollo Moon landing. At that time, social media and smartphones were just starting to explode. Today, as social sharing and mobile are giving rise to IoT, these lessons from 1969 are perhaps even more important.

Putting things in perspective

It is easy to feel really proud of our accomplishments, whether we are scaling a consumer application 1,000-fold in one year, rolling out a huge ERP program or even creating a new technology. However, these accomplishments pale in comparison to what the Apollo, Gemini, and Mercury Missions achieved 45 years ago. Imagine this scenario:

You are listening to the radio and the President announces that the country is going to put a man on the Moon by the end of the decade. Keep in mind that no one has ever even escaped low Earth orbit–let alone escaped Earth’s gravity, executed Hohmann transfers AND navigated to another body. Now you have to implement the largest engineering project in history, while inventing not only technologies, but also whole fields of study. All under the watch of the press—and all completed within one decade.

This is inconceivable to most of us in our work today. It is inspirational.

Success: One small step for man, one giant leap for mankind. (Credit: NASA)

My lucky exposure to the people of Apollo

When I studied aerospace engineering at MIT, we were lucky enough to have several veterans of the Apollo Program on staff as our instructors. Not only were they great instructors; they could also recount first-hand experiences of events that the rest of us could only read about in the history books.

One of these professors was Joe Shea, the original Program Manager of NASA’s Apollo Program (portrayed by Kevin Pollak on HBO’s excellent series, “From the Earth to the Moon”). Contrary to what that series depicted, it was Joe who came up with the concept of splitting the Apollo Program into missions that achieved never-before-achieved technology marvels.

Joe is also considered by some a founder of the Systems Engineering profession (many consider him the greatest systems engineer who ever lived). This made him the perfect person to teach the capstone class of the aerospace curriculum: Systems Engineering (Fred Wilson of USV has written a great post on how fun Systems Engineering is and how important it is for engineering leadership). Every year, he would get a project from NASA and guide his students through all aspects of design, simulation, planning and even cost analysis. Our midterms and finals were real-life presentations to the Administrator of NASA.

Under Joe, I got to work on something called “Project Phoenix,” returning to the moon—but now with a re-usable capsule, landing four astronauts at the pole and keeping them there for 30 days (a much harder prospect). In this project I learned about everything from active risk management to critical path costing to lifting bodies to Class-E solar flares. (How cool was that for a 20-year-old?)

Life lessons I learned from Joe

The technical things I learned from Joe got me my first job at Lockheed Martin (then GE Aerospace). It was great to be able to say that I had worked on a NASA program, helped create both a PDR (Preliminary Design Review) and CDR (Critical Design Review) and present elements of them to the Administrator of NASA in Washington.

However, I learned five much more important lessons, independent of aerospace or any other technology, that I have used in the twenty-three years since:

  1. Break Big Challenges into Small Parts. Any obstacle can be overcome if you break it down into smaller items. If these are too large, break them down again. Eventually you will get to things that have clear, straightforward paths to success. Essentially this is the engineer’s version of “a journey of a thousand miles begins with a single step.”
  2. Know Your Stuff Inside and Out. You cannot be a technology leader who only manages from above. You must understand how the components work. This is the only way you will see problems before they happen. Remember, you are the leader who is the only one positioned to connect the “Big Picture” to the execution details.
  3. S#!% Happens. Things break. Schedules are late. People leave the project. Plan for this. Ask yourself every week what can go wrong. Put contingency plans together to address the biggest or most likely of these. Today, this is done in everything from Risk Management to DevOps.
  4. There is No Such Thing as Partial Credit. Yes, unlike a rocket, you can “back out” (essentially un-launch) software. However, the costs of this type of failure are enormous: not only does it cost 3-5x more to back out, fix and regression test changes, it also frequently results in lost revenue and customers. Get things right in development – then certify them in testing (not the other way around). Don’t count on being able to “back out” after a failed launch–this will become more and more true as we push software to the millions of “things” comprising IoT. Joe hammered this lesson into our heads with a chilling story: when people forgot this and rushed, three astronauts died during a basic systems test on Apollo 1.
  5. Take Ownership. If you are the leader, you are responsible for the team’s or product’s success. If you are a line manager, you are not only responsible for your area but are being relied upon by your peers for success. If you are a hands-on analyst or engineer you are actually delivering the work that leads to success. In all cases, ensure you do your job right, ask for help when you need it and never lie or hide anything.

Five really important lessons. I am grateful I had the opportunity to learn them before I entered the full-time career work force. I try to “pay this back” by teaching these lessons and concepts everywhere I go.

Before I forget…

Thank you to the men and women of Apollo. Thank you also to the men and women of Gemini and Mercury (it is easy to forget them on this day). You achieved miracles on a daily basis and inspired whole generations of scientists and engineers.