
If a tree falls in the woods and no one heard it, did it happen? Not in Streaming Analytics

Interest in “Streaming Analytics” has exploded over the past few years. The reasons are twofold. First, the rise of the Internet of Things has made it possible for the first time to get data directly (and automatically) from infrastructure, cars, homes, factories and more, all without a human ever having to do anything. To put this in perspective, last quarter more new automobiles were connected to mobile networks than new cellphones were. Second, the technology is now readily available to implement streaming analytics at massive scale without needing to invent your own frameworks. Not one, but three technology projects (Storm, Spark, and Flink) are available to choose from. One of them, Apache Spark, is now the second-fastest growing open source project in history.

Streaming Analytics is a very fun field to be in (I have been in it for 22 years: in the national security arena, eCommerce, med-tech, and now Industrial IoT). Taking in data faster than any human being could examine it, and analyzing it in near real-time to make split-second decisions, provides omnipresent knowledge and enormous business value. However, Streaming Analytics presents a new challenge that does not exist in traditional After-the-fact Analytics:

You need to figure out how to make decisions on data that you do not know about yet, and that you may not find out about in time for it to be worth your while.

Three real-world examples

To put it in philosophical parlance, how does anyone know if a tree in the forest fell down if no one ever sees (or hears) that it fell? As philosophical as this sounds, it can have multi-million-dollar impacts in the real world. Here are three examples:

Example 1: eCommerce Chatbot

My chatbot is engaged with a new prospective customer who may be eligible (based on her mobile number) for our bank’s highest-value credit card. Unfortunately, that data is delayed in getting to my bot. As a result, at this point in time, I do not know whether the customer is very valuable, average, or a credit risk. What does my chatbot do?


Example 2: Guaranteed Shipping

I have a booking to deliver high-value cargo to a customer site by end of business today. It is now 15 minutes after the day is over. I might be inclined to escalate to my carrier that the container has not arrived. However, at this point in time, I cannot tell whether the container arrived but the signal from the carrier is delayed in getting to me, or whether the container did not arrive. What do I do?


Example 3: Infrastructure Security Monitoring

I run a cattle farm that is hundreds of thousands of acres. I have equipped all gates in my Smart Ranch with sensors to alert me if any are open (so I can prevent the cattle from getting away). The sensors send updates every 15 minutes. However, one of the gate sensors is a few minutes late. At this point in time, I do not know if the gate is open or closed. Does my system trigger an alert?


What makes Streaming Analytics different

All of these challenges are based on lack of information. Lack of information is typical in analytics (as are messy data, data gaps, corrupt data, duplicate data and many other issues). However, in streaming analytics there is one critical difference: you will eventually have the data you need right now to answer your question. By the time you receive it, though, it will be too late to make your decision: the eCommerce customer will be gone; your freight contract will be honored or broken; the cattle will be safe or have gotten away.

What makes this especially different is that all the parties involved with your business will know the answer as well. If my chatbot fails to offer a valuable customer the best credit card, the line of business GM will ask why “it was so stupid.” If I call the customer up to tell them the freight has not arrived and they respond with “but it got here 10 minutes before closing”, I will look stupid. It all boils down to this:

People may not know when After-the-fact Analytics miss a point; however, everyone will know that your Streaming Analytics made a mistake.


That can be stressful 😉

What’s a person to do?

The essential thing to remember when designing your Streaming Analytics solution is this:

Close enough and in-time is much more valuable than perfect and too late

This means you need to build your solution to make a decision based on the information available (rather than waiting until the critical moment has passed). The trick is determining what is “close enough”. The answer to that question depends on your business context. Specifically, given your context, is it better to accidentally do something you should not have (a Type I error), or is it better to not do something you should have done (a Type II error)?

Let’s look at how this works in each of the three examples:

Example 1: eCommerce Chatbot

Our business context determines that it is far worse to get a prospective customer excited about an offer that we cannot deliver than to offer a less valuable package (i.e., we are Type II biased, something typical in ad-tech and eCommerce). We do not make the highest-value offer.

Depending on our Risk Policies, we make the normal offer (one for which a majority of customers qualify) or shunt the customer to a slower process (email vs. chat) to allow time for the data to catch up (essentially shifting to batch). Most commerce companies have created default packages that allow the former action, allowing them to make more money in the “80% most likely case”. We could also apply a machine learning algorithm to guess the best alternative offer, maximizing revenue and minimizing the risk of an angry customer (or wasted time).
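To make the Type II bias concrete, here is a minimal sketch of the chatbot decision in Python. The offer names, the `credit_tier` values and the `risk_policy_allows_default` flag are all hypothetical; the only thing taken from the example is the rule itself: never make the highest-value offer on data you do not have yet.

```python
from typing import Optional

# Hypothetical offer names; real packages would come from your Risk Policies.
PREMIUM_OFFER = "premium_card"
DEFAULT_OFFER = "standard_card"
SLOW_PATH = "follow_up_by_email"


def choose_offer(credit_tier: Optional[str],
                 risk_policy_allows_default: bool = True) -> str:
    """Pick an offer using only what is known at decision time.

    credit_tier is None when the eligibility data has not arrived yet.
    """
    if credit_tier == "high_value":
        # We know the customer qualifies, so the best offer is safe.
        return PREMIUM_OFFER
    if credit_tier is None:
        # Type II biased: with no data, never promise the premium card.
        # Either make the normal offer or shift to the slower channel.
        return DEFAULT_OFFER if risk_policy_allows_default else SLOW_PATH
    # Assumption for this sketch: every other known tier gets the default.
    return DEFAULT_OFFER
```

Calling `choose_offer(None)` returns the default package right away, which is the “80% most likely case” behavior described above; passing `risk_policy_allows_default=False` shunts the customer to email instead.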

Example 2: Guaranteed Shipping

Our business context indicates that it does not make sense to alert that we are late if we do not know it (yet), especially given the likelihood that this could result in some “egg on our face” when the customer asks why we did not know the container arrived 20 minutes ago. As a result, we do not raise a late alert at 5:00pm. We make the call when we know for sure whether the container was on time or late (i.e., when the delivery message actually arrives). This scenario is also Type II biased.

However, we do not want to expose ourselves to a completely irate customer in high-value circumstances. As such, we put a secondary streaming analytic in place: if we do not receive confirmation within 60 minutes of the scheduled delivery time, we trigger an alert to reach out to our delivery carrier and find out the real status (i.e., by taking the expensive step of talking to a person vs. a sensor). We determined the “magic number” of 60 minutes by doing After-the-fact Analytics, which showed that waiting this long automatically resolves 80% of false positives while still giving us enough of a heads-up to detect the true issues. If we are even smarter, we can have our After-the-fact Analytics system automatically calculate the magic number to delay alerts based on location, time-of-day, day-of-week and other features.
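The two rules above can be sketched in a few lines of Python. This is an illustration under assumptions, not a production design: the function names, the status strings and the idea of deriving the grace period as a simple percentile of historical confirmation delays are mine, chosen to mirror the “resolve 80% of false positives” reasoning.

```python
import math
from datetime import datetime, timedelta
from typing import List, Optional


def grace_period_minutes(historical_delays_min: List[float],
                         resolve_fraction: float = 0.8) -> float:
    """Smallest wait that would have covered resolve_fraction of the
    past confirmation delays (the After-the-fact 'magic number')."""
    ordered = sorted(historical_delays_min)
    index = math.ceil(resolve_fraction * len(ordered)) - 1
    return ordered[max(index, 0)]


def delivery_status(now: datetime,
                    scheduled: datetime,
                    delivered_at: Optional[datetime],
                    grace: timedelta) -> str:
    """Decide what to do with the information available right now.

    delivered_at is the delivery time reported by the carrier's message,
    or None if that message has not reached us yet.
    """
    if delivered_at is not None:
        # We know for sure, so we can make the call.
        return "on_time" if delivered_at <= scheduled else "late"
    if now - scheduled > grace:
        # Silence has lasted too long: pay for a human phone call.
        return "escalate_to_carrier"
    # Type II biased: do not cry "late" on missing data alone.
    return "wait"
```

To vary the magic number by location or day-of-week, you would call `grace_period_minutes` on the matching slice of history instead of on all of it.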

Example 3: Infrastructure Security Monitoring

Our business context indicates that it is not good to close the farm door after all the cattle have gotten away. As such, we have programmed our Streaming Analytic system to alert us if the gate is opened (before a human has sent an “I am opening the gate” message) OR if we have not received confirmation that the gate is closed for a period longer than 15 minutes. Essentially we are Type I biased (not uncommon in safety and security situations).

Unfortunately this bias will result in lots of alerts. Essentially, any time the sensor message is delayed in the cell network our alarm will go off. Luckily, we have some more advanced analytic techniques to help with this. Namely, we can use a Lambda Architecture model that provides self-healing: the initial lack of confirmation that the gate is closed triggers an alert; the arrival of the delayed message that the gate WAS closed then cancels this alert (with a resolution message). This is still a bit chatty. However, it short-circuits false positives and prevents the need to send a worker (or a drone) all the way out to the gate to check if it is open.
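Here is a small, deliberately simplified sketch of that alert-then-resolve behavior in Python. It is not a Lambda Architecture implementation; it only shows the self-healing rule for one gate. The class name, method names and message strings are hypothetical, time is plain minutes, and the authorized “I am opening the gate” message is left out to keep it short.

```python
from typing import Optional

REPORT_INTERVAL_MIN = 15  # sensors send updates every 15 minutes


class GateMonitor:
    """Type I biased monitor for a single gate (illustrative only)."""

    def __init__(self, gate_id: str, start_min: float = 0.0) -> None:
        self.gate_id = gate_id
        # Assumption: the gate is known to be closed when monitoring starts.
        self.last_closed_at = start_min
        self.alert_active = False

    def on_tick(self, now_min: float) -> Optional[str]:
        """Periodic check: alert once if the 'closed' confirmation is overdue."""
        overdue = now_min - self.last_closed_at > REPORT_INTERVAL_MIN
        if overdue and not self.alert_active:
            self.alert_active = True
            return f"ALERT {self.gate_id}: no 'closed' confirmation"
        return None

    def on_message(self, sent_at_min: float, is_closed: bool) -> Optional[str]:
        """Handle a sensor message, which may arrive late."""
        if not is_closed:
            self.alert_active = True
            return f"ALERT {self.gate_id}: gate open"
        self.last_closed_at = max(self.last_closed_at, sent_at_min)
        if self.alert_active:
            # The delayed message shows the gate WAS closed: self-heal.
            self.alert_active = False
            return f"RESOLVED {self.gate_id}: gate was closed at minute {sent_at_min}"
        return None
```

If the minute-15 report is stuck in the cell network, a tick at minute 18 raises the alert, and the late message cancels it with a resolution as soon as it shows up. Chatty, but nobody has to drive out to the gate.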


Yes, Streaming Analytics is harder than After-the-fact Analytics. However, its near real-time omnipresence (not omniscience) offers tremendous benefits. You just need to think in philosophical terms when designing your analytic rules.


Drone Commerce, Part 2: Global Internet Access

In Part 1 of this series, I looked at Amazon’s use of drones for same-day delivery. In this post, I will examine Google’s proposed use of drones for ubiquitous Internet access and near-Earth monitoring from the point of view of someone who has built things that fly, the software that controls them and large-scale Internet platforms.

The Drones of Titan

The drones created by Titan (now Google) Aerospace are quite different from the quadcopters you can buy online or the military UAVs featured so prominently in the news since 9-11. They are high-endurance drones intended to stay continuously aloft at 65,000’ (20 km) for 3 to 5 years. Running on solar-rechargeable batteries, they are designed to function as in-atmosphere satellites, providing communications (like COMSATs) or sensor-based observation (like weather and surveillance satellites).


Packets of energy, not goods

Amazon is exploring the use of drones to deliver physical goods. This brings on a host of complex aeronautic and air traffic challenges: the ability to carry payload while staying small enough to navigate inside cities; efficiently taking off and landing several times per day in the midst of wind gusts and other weather conditions; and the need to avoid trees, birds, power lines, buildings and a host of other obstacles. Google’s drones avoid all of these challenges:

  • Flying at 65,000’ places them above all weather events and a majority of atmospheric turbulence. It also places them above birds, buildings, mountains and even commercial airline traffic.
  • Staying aloft for years (or even just a few months) eliminates exposure to the highest-risk operations any non-military aircraft can perform: taking off and landing. It also reduces equipment replacement costs and virtually eliminates re-fueling costs.
  • By transmitting and receiving photons (light and other electromagnetic waves) the drones do not need to be engineered to carry high payloads. They also do not need to be engineered for repeated loading and unloading of packages.

These changes significantly reduce operational risk and cost. From an engineer’s point of view, the technology is a great fit to its intended function. However…

Is this just an engineer’s fantasy?

Yes, the Google Drones appear to be great candidates for in-atmosphere satellites. However, keeping hundreds or thousands of drones aloft is a pricey enterprise with complexity akin to that of operating a mid-sized airport. Aren’t there technologies already available that already meet the needs these drones are intended to satisfy? Let’s look at the two commonly considered alternatives to help answer this:

Cellular (GSM/GPRS/3G/LTE/4G):

Cellular technology already exists in many, many parts of the world (even 95% of the people in Africa who live in areas with electrical power live within coverage of cell towers). At first examination, using drones to give coverage to everyone outside cell tower coverage seems to be a display of “First World Hi-Tech Hubris”. If these drones were just intended to provide Internet (as Facebook was exploring), I would agree 100%.

However, these drones can have cameras and other sensors to provide monitoring of the environment, climate change, and natural disasters that cell towers cannot. Given the benefits already provided by using Google Earth data for analysis of climate, population, infrastructure and more, one can easily see the doors that would be opened by feeding camera and sensor data from these drones to developers and researchers via Google’s Maps APIs (including weather and traffic layers and ‘satellite’ views).

Finally, as these drones are powered by sunlight, they would continue to function and provide monitoring and Internet access even if a natural disaster took out power grids and energy pipelines for an area.


One could easily argue that satellites (between Iridium, SPOT, INMARSAT, COMSAT, and all those government programs I cannot mention) cover all the gaps cellular technology misses. At 65,000’ of altitude, these drones would only be able to cover a 300-mile radius: satellites (depending on orbital parameters) can cover up to 160x this coverage area.
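The 300-mile figure is roughly the line-of-sight distance to the horizon from that altitude. A quick sanity check in Python, assuming a smooth spherical Earth with a mean radius of 6,371 km, a receiver at ground level and no atmospheric refraction (all simplifications of mine, not details from Titan):

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius
FEET_PER_KM = 3280.84
KM_PER_MILE = 1.609344


def horizon_distance_miles(altitude_ft: float) -> float:
    """Line-of-sight distance to the horizon from a given altitude.

    Uses d = sqrt(2*R*h + h^2) for a smooth spherical Earth.
    """
    h_km = altitude_ft / FEET_PER_KM
    d_km = math.sqrt(2 * EARTH_RADIUS_KM * h_km + h_km ** 2)
    return d_km / KM_PER_MILE


print(round(horizon_distance_miles(65_000)))  # a little over 300 miles
```

Terrain, antenna elevation angles and link budgets would shrink the usable radius in practice, so treat this as an upper bound.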

However, satellite communication is expensive, as we learned with the disappearance of flight MH370: about $0.14-$0.18 per small 1-kilobyte message. The reason for this high cost is twofold: the high cost of launching a satellite and the distance satellites are above the Earth (it takes over 1500x the power to transmit a signal to an Iridium satellite as it does to transmit a signal to a drone overhead at 65,000’).

This opens the door to communication with a whole new class of technologies, ones far less expensive than satphones. This includes everything from low-cost mobile phones to OLPC (One Laptop Per Child) laptops to sensors used to track endangered species and protect them against poaching.
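The 1500x figure follows from the inverse-square law: for the same received signal strength, the required transmit power grows with the square of the distance. A back-of-the-envelope check in Python, assuming free-space propagation, identical antennas, both targets directly overhead, and an Iridium orbital altitude of about 780 km (my assumption for this sketch):

```python
FEET_PER_KM = 3280.84

IRIDIUM_ALTITUDE_KM = 780.0            # assumed orbital altitude
DRONE_ALTITUDE_KM = 65_000 / FEET_PER_KM  # about 19.8 km


def relative_transmit_power(far_km: float, near_km: float) -> float:
    """How many times more power the far link needs than the near link,
    under the inverse-square law with everything else held equal."""
    return (far_km / near_km) ** 2


ratio = relative_transmit_power(IRIDIUM_ALTITUDE_KM, DRONE_ALTITUDE_KM)
print(f"{ratio:.0f}x")  # on the order of 1500x
```

A satellite near the horizon is even farther away than its altitude suggests, so the real-world gap can be larger than this overhead case.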

This distance factor goes beyond power consumption to image resolution (Ground Sample Distance or GSD). Quite simply, a drone at 65,000’ can get photos with 6x the resolution of a satellite in Low Earth Orbit (LEO) and 40x the resolution of satellites like SPOT.
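Unlike power, GSD scales linearly with distance: with the same camera, halving the altitude halves the ground area each pixel covers. A rough Python check, assuming identical optics looking straight down and a SPOT altitude of about 830 km (an assumed value for illustration):

```python
FEET_PER_KM = 3280.84

DRONE_ALTITUDE_KM = 65_000 / FEET_PER_KM  # about 19.8 km
SPOT_ALTITUDE_KM = 830.0                  # assumed orbital altitude


def resolution_gain(satellite_km: float, drone_km: float) -> float:
    """Linear GSD improvement from flying the same camera lower."""
    return satellite_km / drone_km


def altitude_for_gain(gain: float, drone_km: float) -> float:
    """Satellite altitude (km) at which the drone would be `gain` times sharper."""
    return gain * drone_km


print(f"{resolution_gain(SPOT_ALTITUDE_KM, DRONE_ALTITUDE_KM):.0f}x vs SPOT")
print(f"6x corresponds to about {altitude_for_gain(6, DRONE_ALTITUDE_KM):.0f} km")
```

Under these assumptions the SPOT comparison lands close to 40x, and the 6x figure corresponds to a satellite at roughly 120 km, i.e. a very low orbit. Real satellites carry much larger optics than a drone could, so same-camera ratios like these overstate the practical gap.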

A great addition, but not the only answer

The Google Drone concept is not a one-size-fits-all answer. It would take thousands of drones to cover the Earth, a very costly operation. While providing more coverage than cell towers, they would often be farther away and more costly to operate. While providing better bandwidth and GSD than satellite, they would have less coverage area. As such, the answer, like all things in Internet access (and sensor technology), is a blended combination of fixed-line Internet, multiple terrestrial wireless technologies (from ZigBee to 4G), satellites and drones.

This begs an important question…

One question that has plagued me from the day I first saw Facebook’s interest in Titan was why communications companies like Vodafone (which is rather well known for its 21-country mobile SIM network) were not interested in companies like Titan. Overall, using drones for ubiquitous Internet would appear to be a much better strategic fit to a company that already charges customers for Internet access. Perhaps Google can make more money from higher-resolution image and sensor data than it would initially appear. Or perhaps these drones could serve as a potential grid network that could bypass carriers if the Net Neutrality wars go in a bad direction (just like Netflix is exploring with its peer-to-peer research).


Only time will tell.