L2: In the Shadows

Lagrange Point 2 (L2): Potential surprises and developments under the radar

If a tree falls in the woods and no one hears it, did it happen? Not in Streaming Analytics

Interest in “Streaming Analytics” has exploded over the past few years. The reasons are twofold. First, the rise of the Internet of Things has made it possible for the first time to get data directly (and automatically) from infrastructure, cars, homes, factories and more, all without a human ever having to do anything. To put this in perspective, last quarter more new automobiles were connected to mobile networks than new cellphones were. Second, the technology is now readily available to implement streaming analytics at massive scale without needing to invent your own frameworks. Not one, but three technology projects (Storm, Spark, and Flink) are available to choose from. One of them, Apache Spark, is now the second-fastest growing open source project in history.

Streaming Analytics is a very fun field to be in (I have been in it for 22 years: in the national security arena, eCommerce, med-tech, and now Industrial IoT). Taking in data faster than any human being could examine it, and analyzing it in near real-time to make split-second decisions, provides omnipresent knowledge and enormous business value. However, Streaming Analytics presents a new challenge that does not exist in traditional After-the-fact Analytics:

You need to figure out how to make decisions on data that you do not know about yet, and may not find out about in time for it to be useful.

Three real-world examples

To put it in philosophical parlance, how does anyone know if a tree in the forest fell down if no one ever sees (or hears) that it fell? As philosophical as this sounds, it can have multi-million-dollar impacts in the real world. Here are three examples:

Example 1: eCommerce Chatbot

My chatbot is engaged with a new prospective customer who may be eligible (based on her mobile number) for our bank’s highest-value credit card. Unfortunately, that data is delayed in getting to my bot. As a result, at this point in time, I do not know whether the customer is very valuable, average, or a credit risk. What does my chatbot do?


Example 2: Guaranteed Shipping

I have a booking to deliver high-value cargo to a customer site by end of business today. It is now 15 minutes after the day is over. I might be inclined to escalate to my carrier that the container has not arrived. However, at this point in time, I cannot tell whether the container arrived but the signal from the carrier is delayed in getting to me, or whether the container did not arrive. What do I do?


Example 3: Infrastructure Security Monitoring

I run a cattle farm that is hundreds of thousands of acres. I have equipped all gates in my Smart Ranch with sensors to alert me if any are open (so I can prevent the cattle from getting away). The sensors send updates every 15 minutes. However, one of the gate sensors is a few minutes late. At this point in time, I do not know if the gate is open or closed. Does my system trigger an alert?


What makes Streaming Analytics different

All of these challenges are based on lack of information. Lack of information is typical in analytics (as are messy data, data gaps, corrupt data, duplicate data and many other issues). However, in streaming analytics there is one critical difference: you will eventually have the data you need right now to answer your question. By the time you receive it, though, it will be too late to make your decision: the eCommerce customer will be gone; your freight contract will be honored or broken; the cattle will be safe or have gotten away.

What makes this especially different is that all the parties involved with your business will know the answer as well. If my chatbot fails to offer a valuable customer the best credit card, the line of business GM will ask why “it was so stupid.” If I call the customer up to tell them the freight has not arrived and they respond with “but it got here 10 minutes before closing”, I will look stupid. It all boils down to this:

People may not know when After-the-fact Analytics miss a point; however, everyone will know that your Streaming Analytics made a mistake.

 

That can be stressful 😉

What’s a person to do?

The essential thing to remember when designing your Streaming Analytics solution is this:

Close enough and in-time is much more valuable than perfect and too late

This means you need to build your solution to make a decision based on the information available (rather than waiting until the critical moment has passed). The trick is determining what is “close enough”. The answer to that question depends on your business context. Specifically, given your context, is it better to accidentally do something you should not have done (a Type I error), or is it better to not do something you should have done (a Type II error)?
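One way to make that trade-off concrete is to compare expected costs. The sketch below is only an illustration: the function name, the probability estimate and the two cost figures are all assumptions that you would have to supply from your own business context.

```python
def should_act(p_condition: float, cost_false_alarm: float, cost_miss: float) -> bool:
    """Decide whether to act now on incomplete data.

    p_condition      -- estimated probability that the condition is really true (assumed input)
    cost_false_alarm -- cost of acting when we should not have (Type I error)
    cost_miss        -- cost of not acting when we should have (Type II error)
    """
    expected_cost_if_we_act = (1.0 - p_condition) * cost_false_alarm
    expected_cost_if_we_wait = p_condition * cost_miss
    # Act only when staying silent is expected to cost more than acting.
    return expected_cost_if_we_wait > expected_cost_if_we_act
```

A context where misses are very expensive (an open gate on a cattle ranch) will act even at low probabilities; a context where false alarms are embarrassing (calling a customer about freight that already arrived) will wait.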

Let’s look at how this works in each of the three examples:

Example 1: eCommerce Chatbot

Our business context determines that it is far worse to get a prospective customer excited about an offer that we cannot deliver than to offer a less valuable package (i.e., we are Type II biased, something typical in ad-tech and eCommerce). We do not make the highest-value offer.

Depending on our Risk Policies we make the normal offer (one for which a majority of customers qualify) or shunt the customer to a slower process (email vs. chat) to allow time for the data to catch up (essentially shifting to batch). Most commerce companies have created default packages that allow the former action, allowing them to make more money in the “80% most likely case”. We could also apply a machine learning algorithm to guess the best alternative offer, maximizing revenue and minimizing the risk of an angry customer (or wasted time).
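A minimal sketch of that fallback rule follows. The offer names, the tier labels and the `allow_default_offer` policy flag are hypothetical, and what to do with a known credit risk is my assumption rather than something stated above.

```python
from typing import Optional

PREMIUM_OFFER = "premium_card"        # hypothetical offer names
DEFAULT_OFFER = "standard_card"
SLOW_CHANNEL = "follow_up_by_email"
NO_OFFER = "no_offer"                 # assumption: known credit risks get no offer


def choose_offer(credit_tier: Optional[str], allow_default_offer: bool = True) -> str:
    """Pick an offer using only the data that has arrived so far.

    credit_tier is "high_value", "average", "credit_risk",
    or None when the eligibility data is still in flight.
    """
    if credit_tier is None:
        # Data is late: never promise the premium card (Type II bias).
        return DEFAULT_OFFER if allow_default_offer else SLOW_CHANNEL
    if credit_tier == "high_value":
        return PREMIUM_OFFER
    if credit_tier == "credit_risk":
        return NO_OFFER
    return DEFAULT_OFFER
```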

Example 2: Guaranteed Shipping

Our business context indicates that it does not make sense to alert that we are late if we do not know it (yet), especially given the likelihood that this could result in some “egg on our face” when the customer asks why we did not know the container arrived 20 minutes ago. As a result, we do not alert that we are late at 5:00pm. We make the call when we know for sure whether the container was on time or late (i.e., when the delivery message actually arrives). This scenario is also Type II biased.

However, we do not want to expose ourselves to a completely irate customer in high-value circumstances. As such, we put a secondary streaming analytic in place: if we do not receive confirmation within 60 minutes of scheduled delivery, we trigger an alert to reach out to our delivery carrier and find out the real status (i.e., by taking the expensive step of talking to a person vs. a sensor). We determined the “magic number” of 60 minutes by doing After-the-fact Analytics that determined waiting this long will automatically resolve 80% of false positives while still giving us enough heads-up to detect the true issues. If we are even smarter we can have our After-the-fact Analytics system automatically calculate the magic number to delay alerts based on location, time-of-day, day-of-week and other features.
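Here is a rough sketch of both halves of that idea: deriving the “magic number” from historical confirmation delays, and using it to hold back the escalation. The function names and the nearest-rank percentile method are my assumptions; a real system would likely compute separate numbers per location, time-of-day and so on.

```python
import math
from datetime import datetime, timedelta
from typing import List, Optional


def grace_minutes(past_delays_min: List[float], resolve_percent: int = 80) -> float:
    """Shortest wait that would have covered `resolve_percent` percent of
    historical late-arriving confirmations (nearest-rank percentile)."""
    if not past_delays_min:
        raise ValueError("need at least one historical delay")
    ordered = sorted(past_delays_min)
    rank = math.ceil(resolve_percent * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def should_escalate(now: datetime, scheduled: datetime,
                    confirmed_at: Optional[datetime], grace_min: float) -> bool:
    """Escalate to the carrier only when no confirmation has arrived
    and the grace period after the scheduled delivery has run out."""
    if confirmed_at is not None:
        return False  # the message arrived; judge on-time vs. late from its timestamp
    return now > scheduled + timedelta(minutes=grace_min)
```

Running `grace_minutes` offline (in batch) and feeding the result to `should_escalate` in the stream keeps the expensive analysis out of the real-time path.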

Example 3: Infrastructure Security Monitoring

Our business context indicates that it is not good to close the farm door after all the cattle have gotten away. As such, we have programmed our Streaming Analytic system to alert us if the gate is opened (before a human has sent an “I am opening the gate” message) OR if we have not received confirmation that the gate is closed for a period longer than 15 minutes. Essentially we are Type I biased (not uncommon in safety and security situations).

Unfortunately this bias will result in lots of alerts. Essentially any time the sensor message is delayed in the cell network our alarm will go off. Luckily, we have some more advanced analytic techniques to help with this. Namely, we can use a Lambda Architecture model that provides self-healing: the initial lack of confirmation that the gate is closed triggers an alert; the arrival of the delayed message that the gate WAS closed then cancels this alert (with a resolution message). This is still a bit chatty. However, it short-circuits false positives and prevents the need to send a worker (or a drone) all the way out to the gate to check if it is open.
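The alert-then-resolve behaviour can be sketched in a few lines. This is not a full Lambda Architecture, just the self-healing rule in isolation; the class name, the method names and the use of plain minute counters instead of real timestamps are all simplifying assumptions.

```python
from typing import Dict, List, Set, Tuple


class GateMonitor:
    """Raise an alert when a gate has been silent too long; cancel it when a
    late 'closed' reading finally arrives."""

    def __init__(self, max_silence_min: float = 15) -> None:
        self.max_silence_min = max_silence_min
        self.last_closed_at: Dict[str, float] = {}   # gate id -> sensor time of last "closed"
        self.open_alerts: Set[str] = set()
        self.events: List[Tuple[str, str]] = []      # (event type, gate id), in order

    def on_closed_message(self, gate: str, sensed_at_min: float) -> None:
        """Handle a 'gate closed' reading, which may be delivered late."""
        previous = self.last_closed_at.get(gate, sensed_at_min)
        self.last_closed_at[gate] = max(previous, sensed_at_min)
        if gate in self.open_alerts:
            # The delayed message shows the gate WAS closed: send a resolution.
            self.open_alerts.discard(gate)
            self.events.append(("RESOLVED", gate))

    def on_tick(self, now_min: float) -> None:
        """Periodic check: alert on any gate that has been silent too long."""
        for gate, last in self.last_closed_at.items():
            if now_min - last > self.max_silence_min and gate not in self.open_alerts:
                self.open_alerts.add(gate)
                self.events.append(("ALERT", gate))
```

If a resolved gate is still stale at the next tick, the alert is simply raised again, which keeps the Type I bias intact.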

Conclusion

Yes, Streaming Analytics is harder than After-the-fact Analytics. However, its near real-time omnipresence (not omniscience) offers tremendous benefits. You just need to think in philosophical terms when designing your analytic rules.


Four Common IoT Security Holes

If you follow the Internet of Things space, not a day passes where you do not see an analyst report or news article talking about IoT security vulnerabilities across every sector: consumer, enterprise, industrial and government/Smart City.

I’ve been working with Internet-connected devices (medical devices, industrial actuators, sensors for environmental and security monitoring, even military systems) for many years. In my job, I am lucky enough to be able to work with industrial and enterprise devices daily. At home, I play with them both as a consumer and developer. Time and again, I see the following IoT security holes with alarming frequency:

Security Hole #1: Not Using Strong Encryption

It is amazing that in 2016 people are still not using strong encryption to protect important data. Yet I frequently see IoT devices that use no encryption at all: they store and transmit data in the clear. Other devices use homegrown encryption techniques that are unproven by peer review and relatively easy to hack.

Most of the arguments I have seen against encryption fall into three camps: 1) it is too computationally expensive for low-powered devices, 2) it is too hard to use for IoT protocols, and 3) the device data is too obscure to understand. Let’s look at each:

  1. Yes, encryption is computationally expensive. However, ongoing investments in the space are providing more efficient RSA, AES, and ECC algorithms that work on smaller devices. In addition, Moore’s Law is even allowing penny-sized devices to have enough power to use these.
  2. IoT protocols are also getting better and better at providing strong encryption and secure connections (see Security Hole #2).
  3. Finally, the old “Our-data-is-too-obscure-for-hackers-to-understand Argument” was proven a fallacy years ago, first by the credit card industry’s Cardholder Information Security Program, and later by its replacement: PCI DSS. Any disgruntled employee (or hacker masquerading as a contractor) can bypass the “obscurity protection.”

Not using strong encryption is probably the most egregious security vulnerability. Any 14-year-old can use downloadable packet sniffing programs to capture your data. Solutions that mitigate this risk are readily available. There is no excuse for not encrypting your data.

Security Hole #2: Not Using Secured Sessions

A common error in information/cyber security is forgetting that secure communication consists of two components:

  1. Encryption of data and
  2. Establishment of secured sessions

Secured sessions use protocols to establish mutual authentication and to exchange a shared secret that only the transmitter and receiver have. If you do not establish a secured session you are blindly guessing that the recipient of your data is the correct one. When you do not use a secured session you invite a Man-In-The-Middle (MITM) attack where the attacker can intercept and redirect your transmissions.

Many people think they are not likely targets of a MITM attack. Here is a simple scenario.

  • A disgruntled employee or hacker posing as a contractor first intercepts and copies traffic from your devices.
  • From this data, he learns what devices are attached to items of interest (a patient, your house, etc.). He can then also learn the normal pattern of communication from the device.
  • Next he replaces the data from your device with his own. This can give the appearance that a patient who is sick is now healthy (or vice versa) or that your house is not being broken into (allowing his partners to break in). The hacker can even intercept your over-the-air commands and download programmable software or send commands to shut down devices.

This work is technically hard, but doable with software downloadable on the Internet. If communication between your IoT devices and your servers is secured (and encrypted), the hacker would have to gain enough permissions to get hold of your SSL certificates and hijack DNS (if he has this, you are in a lot of trouble already). However, if the communication between your IoT devices and servers is not secured, a hacker can conduct this MITM attack from anywhere. By the time you learn about it, the damage will be long done.

Thankfully, there are many solutions available in the IoT domain that provide both strong encryption and secured sessions (plugging Security Holes #1 and #2):

  • If you are using standard “Internet of Servers” protocols, simply installing a full complement of certificates will enable you to use TLS (the successor to SSL) for HTTPS and FTPS (but not SFTP, which runs over SSH instead).
  • If you are using MQTT (one of my favorites), there are many brokers available that also provide TLS.
  • If you are using CoAP (which rides over UDP), you can use DTLS.
  • If your devices have edge constellations, you can turn on Bluetooth Security Mode 4 and get encrypted links that use the same Elliptic Curve Diffie-Hellman secret key exchange used by the NSA.
  • You can even download and borrow the wonderful MTProto protocol designed by the folks over at Telegram (it is designed for low-powered, lossy, distributed communication).

None of these solutions are perfect. However, all reduce security risks significantly. Furthermore, all are evolving in the open source community as people find new vulnerabilities. Why more people do not use them is puzzling.
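For the TLS-based options, here is a minimal sketch of what a secured session looks like on the device side, using Python’s standard `ssl` module. The function name and its parameters are hypothetical; the certificate paths would come from your own provisioning process.

```python
import ssl
from typing import Optional


def make_client_tls_context(ca_file: Optional[str] = None,
                            cert_file: Optional[str] = None,
                            key_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a TLS context that authenticates the server and, when a device
    certificate is supplied, lets the server authenticate the device too."""
    # Defaults: certificate verification required, hostname checking enabled.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    # Refuse legacy SSL and early TLS versions.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file is not None:
        # Client certificate for mutual authentication (paths are assumptions).
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

The resulting context can wrap an ordinary socket with `ctx.wrap_socket(sock, server_hostname=...)`, at which point the handshake fails if the server cannot prove who it is.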

Security Hole #3: Not Protecting Against Buffer Overflow

When a hacker triggers a Buffer Overflow vulnerability, she typically causes a program to do two things: dump critical data and crash.

The first documented descriptions of the Buffer Overflow technique date back to 1972. As more and more computers were connected to the Internet, these attacks became more pervasive. Fifteen years ago, Code Red highlighted to much of the general public what a Buffer Overflow exploit can do.

Over the past few years, application framework libraries and higher-level languages have added many defensive programming protections to make these vulnerabilities less prevalent than they were in the past. (As anyone who has encountered an awful error page showing a stack trace knows, these defenses are still far from perfect.) Nevertheless, they have plugged many holes.

However, IoT devices are bringing this vulnerability back into the mainstream. As most IoT devices operate with far less memory and CPU than expensive devices like your laptop or smartphone, their firmware and applications are primarily written in lower-level programming languages. It is much easier to trigger buffer overflows in these languages than in more forgiving higher-level languages. Exception handling libraries are less robust. More often than not, memory management is handled using good old-fashioned C/C++ programming (there is no Garbage Collector to save you). This significantly raises the risk of buffer overflows in devices.

When buffer overflow crashes occur in the data center there is at least someone around to fix things. When they happen to a remote IoT device in the field, they can literally shut down a security or medical sensor. There is no IT or Ops department nearby to fix it. The device is shut down at best, or bricked at worst. Essentially the device is dead to the world. Depending on what it was responsible for, a lot of real-world physical damage can ensue.

Devices that maintain continuously open Internet connections (like all those connected baby monitors) are especially prone to buffer overflow attacks, as remote hackers can discover them using port-scanning software. However, even industrial IoT devices that only pull commands and programs down over-the-air are vulnerable to MITM attacks that can shut them down by flooding data to the device (this reinforces the need to plug Security Holes #1 and #2 discussed above).

The fix to this problem is fairly clear: implement defensive programming and test it aggressively. Today’s automation technologies for continuous integration and delivery make this a much easier and more trustworthy process than it was even a decade ago.
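To show the kind of check I mean, here is a small sketch of a length-prefixed message parser that refuses to trust the length the sender declares. Python is memory-safe, so this cannot overflow the way C firmware can; it only illustrates the validation that the firmware would need before copying anything into a fixed buffer. The frame layout, `MAX_PAYLOAD` and the names are assumptions for the example.

```python
import struct

MAX_PAYLOAD = 256  # hypothetical size of the device's receive buffer, in bytes


class FrameError(ValueError):
    """Raised when an incoming frame fails validation."""


def parse_frame(data: bytes) -> bytes:
    """Parse a frame made of a 2-byte big-endian length followed by the payload."""
    if len(data) < 2:
        raise FrameError("frame too short for header")
    (declared,) = struct.unpack(">H", data[:2])
    if declared > MAX_PAYLOAD:
        # Never size a copy from an attacker-controlled length field.
        raise FrameError("declared length exceeds buffer size")
    if declared != len(data) - 2:
        raise FrameError("declared length does not match bytes received")
    return data[2:2 + declared]
```

Tests for a parser like this should deliberately feed it oversized, truncated and mismatched frames, which is exactly the sort of case a continuous integration pipeline can run on every build.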

Security Hole #4: Weak Systems Engineering

The fourth big security hole I commonly see spans the intersection of technical design, system processes, and human behavior. It essentially boils down to this: if you use flawless technology in ways for which it was not intended, you can create big vulnerabilities. If I design a perfectly secure medical device but put it on the wrong patient (accidentally or maliciously), I will fail to capture data about the intended patient. If someone who installs the security sensors in my house sets my account up to call their cell phone (and not mine), they can break in while I am gone and trick the company into thinking it is a false alarm.

The way around this is to design IoT devices that work when things (humans, the network, servers, etc.) fail.

  • Build in redundancy (devices, network paths and servers) to mitigate technical failures
  • Build in positive and negative feedback loops to mitigate human failures. For example, I should not just be notified if my home security sensor goes off. I should also be notified if my smartphone and my security company’s servers both cannot communicate with my home security IoT devices.
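The home security example can be written as a tiny rule that treats silence as a signal in its own right. The function name, the inputs and the message text are hypothetical.

```python
from typing import List


def home_notifications(sensor_triggered: bool,
                       phone_can_reach_devices: bool,
                       server_can_reach_devices: bool) -> List[str]:
    """Return the notifications the homeowner should receive right now."""
    notes: List[str] = []
    if sensor_triggered:
        # Positive feedback: the sensor reported an event.
        notes.append("ALARM: security sensor triggered")
    if not phone_can_reach_devices and not server_can_reach_devices:
        # Negative feedback: nobody can hear the devices any more.
        notes.append("WARNING: home devices unreachable from phone and server")
    return notes
```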

Plugging this systems engineering IoT security hole takes a combination of technology engineering and business process design. This is a natural fit for the enterprise, where IoT can be used as a component of business transformation. In the consumer segment the answer is usually an ecosystem solution. Amazon’s and Google’s solutions stand out regarding robustness and security.

***

The Internet of Things offers great potential to transform how we work and live by removing many tedious tasks from our day-to-day activities. Making this a reality requires a secure Internet of Things. We will never make security perfect. However, we have the tools to make it trustworthy. What is needed is just the discipline to include them as we build new IoT devices, systems and processes.