
Wine Clones, Drones, and Behavioral Cloning: Heretical IoT Thoughts on Winemaking

In addition to technology, I love cooking and craftsmanship. That naturally leads to an interest in wine and, especially, the winemaking process. Over the past few years, I have been lucky to meet a few great winemakers and discuss how they work their craft. This gave me an opportunity to learn not only how skilled they are, but also the incredible amount of work they and their teams do. They routinely begin their work at 3am and work past sunset. While much of this work is repetitive, the majority of it requires continuous application of expert judgment. The more I spoke with them, the more the engineer in me started to think, “What could make their work easier without sacrificing their expertise?” That led me to this post.

Caveat: This post explores use of automation and IoT to augment (not replace) work by humans. The idea is not to replace people, but rather to take some of the backbreaking labor out of their work and give them back more time with their families and friends. It is not intended to be as “heretical” as the idea may initially appear 😉

* * *

If you visit a winery and get to speak with the winemaker, he or she will talk about the process of routinely going out and doing things such as:

  • Inspecting the vines for damage, disease, and general health
  • Pruning grapes to concentrate resources to the very best
  • Pruning leaves to control the amount of sun grape bunches get

Watching someone do this on one vine is amazing (you can see the expertise in action). Watching them repeat it along a single row of vines starts to give an idea of how much work it takes to make great wine. Staying around all day to see this done across acres of grapes (and considering this is done throughout the season) drives home how we should respect everyone who works in a vineyard. It also gives an appreciation for the effect of this labor on their backs, knees, fingers, and eyesight. A core goal of IoT is to reduce danger, dirtiness, and drudgery. This is where my idea started.

Imagine doing this 10,000 times, year-in, year-out

Imagine this

The team goes out in the morning, before sunrise, to tend their grapes. For this day, let’s assume they are pruning bunches of grapes to concentrate resources on the very best. Each person takes a section of the vineyard and starts his or her work. However, each is partnered with two drones. The first drone watches what the person does, recording which grapes are pruned. The other drone picks up the discarded grapes for retrieval and composting.

After the expert finishes a few rows of vines, the drones fly back, plug in, and upload their video. They wait for a machine learning program to finish building a new model, which is then loaded onto the drones. Then the drones go out and finish the work, based on what they have learned from that wine expert, for that section of grapes, for that location, for that day’s weather and solar conditions. Much backbreaking (and eye-straining) work is saved, allowing the team to do the umpteen other activities that require their expertise and attention. However, today they may work “only” ten hours instead of sixteen.

This is not replacing people

No jobs are lost. People do an enormous amount of work at wineries. With this technology they might now work “only” ten hours instead of sixteen. Imagine the benefits to their health.

Furthermore, human knowledge and expertise are not replaced. Every plot of land and every day bring new variables. Every workday, the human shares expertise that is used to reduce repetitive work for that day. If you remove the human, you remove the expert; the drones alone do not have sufficient expertise, and the winemaking would get worse and worse. The vineyard would suffer and lose out to others who apply expert knowledge every day. What is reduced is drudgery.

This is not far-fetched

A decade ago this would have been more Star Trek than reality. However, the advances of the last few years in drone, autonomous vehicle, and machine learning technologies have made this achievable. Let’s look at a few.

A photo taken on September 9, 2014 shows a drone flying over the vineyards of the Pape Clement castle, belonging to Bordeaux winemaker Bernard Magrez, in the southwestern French town of Pessac. Last February, Magrez became the first winemaker to buy a drone equipped with an infrared camera to determine the optimal maturity of the domain’s grapes and thus harvest them at different times. AFP PHOTO JEAN PIERRE MULLER.
Drone and Autonomous Tech

Over the last few years, growth in drone technology and its supporting infrastructure has exploded. It is projected to hit $12 billion per year within the next four years (for comparison: US wineries sold $34 billion of product last year). Major “Blue Chip” companies now have active programs to embed drones in their supply chains. Major consultancies are now using drones in numerous aspects of farming. Furthermore, regulations are now clear. Two years ago, it cost over $2,000 to get certified as a drone pilot; today you can do it for $200.

Autonomous technology is growing just as fast, if not faster. There will be 10+ million autonomous vehicles on the road by 2020. Autonomous vehicles are no longer just an idea in Silicon Valley. All “Big Three” US automotive manufacturers have autonomous vehicle programs. Even major shipping companies are now exploring combined drone and autonomous technology for cargo ships.

Computing and Machine Learning (ML) Infrastructure

Thanks to computer gaming, GPU costs have dropped dramatically. You can now get resources to build models on-demand and pay by the minute. The libraries to process imagery (e.g., OpenCV) and to build neural networks (e.g., TensorFlow, Keras) are releasing major new versions every 3-4 months. Some have even written that the “Object Detection Problem” is now solved (there is a great, though quite technical, write-up here). As one of my friends said to me this week—over a 2012 Tempranillo—“It’s a great time to be in tech.”
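
To give a sense of how low the barrier now is, here is a minimal sketch of running a state-of-the-art pretrained image classifier. It assumes TensorFlow 2.x; the photo filename is hypothetical:

```python
# A minimal sketch of how accessible pretrained vision models have become.
# Assumes TensorFlow 2.x; "vine_photo.jpg" is a hypothetical local image.
import numpy as np
from tensorflow.keras.applications.resnet50 import (
    ResNet50, preprocess_input, decode_predictions)
from tensorflow.keras.preprocessing import image

model = ResNet50(weights="imagenet")  # downloads pretrained weights on first use

img = image.load_img("vine_photo.jpg", target_size=(224, 224))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
print(decode_predictions(model.predict(x), top=3)[0])  # top-3 guesses with scores
```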

However, having the ML infrastructure is only the start. Building models takes lots of time, trial and error, and compute. Luckily, we now have models that recognize images with greater accuracy than humans. Using “Knowledge Transfer” we can start with these basic models and extend them to add new knowledge (here is an example of what Gilt did to extend Microsoft’s ResNet vision recognition model to detect clothing styles). Combine this with “Behavioral Cloning” (an approach widely used to teach autonomous vehicles how to drive; here is one example I have used) and we can clone and graft winemaking knowledge onto these existing models—just as a winemaker grafts a wine clone onto his or her vines.
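
As a rough sketch of what that grafting can look like in Keras (the two pruning classes and the training data are hypothetical; this is the standard transfer-learning recipe, not anyone's production pipeline):

```python
# A sketch of knowledge transfer: start from a pretrained ResNet50 base and
# graft on a new head that learns a winemaker-specific decision such as
# "prune this bunch" vs. "keep it". The classes and data are hypothetical.
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

base = ResNet50(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # keep the general vision knowledge frozen

x = GlobalAveragePooling2D()(base.output)
x = Dense(128, activation="relu")(x)
out = Dense(2, activation="softmax")(x)  # two classes: prune / keep

model = Model(inputs=base.input, outputs=out)
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(expert_frames, expert_labels, epochs=5)  # frames labeled by
# watching the expert work -- the "behavioral cloning" step
```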

 

The foundations are in place and costs are coming down drastically. Making this cost-feasible for small businesses is just a matter of time (and perhaps a startup to focus on it).

Finally, this does not eliminate nuance and individuality

One of the most enjoyable aspects of exploring wine and vineyards is seeing how each winemaker executes his or her craft (just as going to new restaurants lets you explore how each chef interprets his or her craft). This technology does not remove the nuance and individuality of winemaking.

Even if every vineyard used this technology and started with the same baseline drones and machine learning models, they would each evolve differently. Thanks to behavioral cloning and knowledge transfer, each vineyard’s model would evolve weekly as it learns how the winemaker and his or her team apply their expertise day-by-day, wine block-by-wine block, year-by-year. These models really would not even be trade secrets that could be stolen, as they would evolve to literally fit specific terroirs—just as best practices in winemaking do.

If a tree falls in the woods and no one hears it, did it happen? Not in Streaming Analytics

Interest in “Streaming Analytics” has exploded over the past few years. The reasons are two-fold. First, the rise of the Internet of Things has made it possible, for the first time ever, to get data directly (and automatically) from infrastructure, cars, homes, factories, and more—all without a human ever having to do anything. To put this in perspective, last quarter more new automobiles were connected to mobile networks than new cellphones. Second, the technology is now readily available to implement streaming analytics at massive scale without needing to invent your own frameworks. Not one but three open source projects (Storm, Spark, and Flink) are available to choose from. One of them, Apache Spark, is now the second-fastest growing open source project in history.
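
To give a sense of how approachable these frameworks are, here is a minimal sketch using Spark Structured Streaming; the socket source and the 15-minute window are illustrative assumptions, not a production design:

```python
# A minimal sketch of streaming analytics with Apache Spark's Structured
# Streaming API. The socket source and window size are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql.functions import window

spark = SparkSession.builder.appName("SensorEventCounts").getOrCreate()

# Read raw events as they arrive (one line per event), stamping each
# with its arrival time.
events = (spark.readStream
          .format("socket")
          .option("host", "localhost")
          .option("port", 9999)
          .option("includeTimestamp", True)
          .load())

# Count events in 15-minute windows, updating results as data streams in.
counts = events.groupBy(window(events.timestamp, "15 minutes")).count()

query = (counts.writeStream
         .outputMode("update")
         .format("console")
         .start())
query.awaitTermination()
```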

Streaming Analytics is a very fun field to be in (I have been in it for 22 years—in the national security arena, eCommerce, med-tech, and now Industrial IoT). Taking in data faster than any human being could examine it and analyzing it in near real-time to make split-second decisions provides omnipresent knowledge and enormous business value. However, Streaming Analytics presents a new challenge that does not exist in traditional After-the-fact Analytics:

You need to figure out how to make decisions on data that you do not know about yet—and may not ever find out about in time for it to be useful.

Three real-world examples

To put it in philosophical parlance, how does anyone know if a tree in the forest fell down if no one ever sees (or hears) that it fell? As philosophical as this sounds, it can have multi-million-dollar impacts in the real world. Here are three examples:

Example 1: eCommerce Chatbot

My chatbot is engaged with a new prospective customer who may be eligible—based on her mobile number—for our bank’s highest-value credit card. Unfortunately, that data is delayed in getting to my bot. As a result, at this point in time, I do not know whether the customer is very valuable, average, or a credit risk. What does my chatbot do?


Example 2: Guaranteed Shipping

I have a booking to deliver high-value cargo to a customer site by end of business today. It is now 15 minutes after the day is over. I might be inclined to escalate to my carrier that the container has not arrived. However, at this point in time, I cannot tell whether the container arrived and the signal from the carrier is delayed in getting to me, or the container simply did not arrive. What do I do?


Example 3: Infrastructure Security Monitoring

I run a cattle farm that is hundreds of thousands of acres. I have equipped all gates in my Smart Ranch with sensors to alert me if any are open (so I can prevent the cattle from getting away). The sensors send updates every 15 minutes. However, one of the gate sensors is a few minutes late. At this point in time, I do not know if the gate is open or closed. Does my system trigger an alert?


What makes Streaming Analytics different

All of these challenges stem from a lack of information. Lack of information is typical in analytics (as are messy data, data gaps, corrupt data, duplicate data, and many other issues). However, in streaming analytics there is one critical difference: you will eventually have the data you need right now to answer your question. By the time you receive it, though, it will be too late to make your decision: the eCommerce customer will be gone; your freight contract will be honored or broken; the cattle will be safe or have gotten away.

What makes this especially different is that all the parties involved with your business will know the answer as well. If my chatbot fails to offer a valuable customer the best credit card, the line of business GM will ask why “it was so stupid.” If I call the customer up to tell them the freight has not arrived and they respond with “but it got here 10 minutes before closing”, I will look stupid. It all boils down to this:

People may not know when After-the-fact Analytics miss a point; however, everyone will know when your Streaming Analytics makes a mistake.

 

That can be stressful 😉

What’s a person to do?

The essential thing to remember when designing your Streaming Analytics solution is this:

Close enough and in-time is much more valuable than perfect and too late

This means you need to build your solution to make a decision based on the information available (rather than waiting until the critical moment has passed). The trick is determining what is “close enough”. The answer to that question depends on your business context. Specifically, given your context, is it better to accidentally do something you should not have (a Type I error), or is it better to not do something you should have done (a Type II error)?

Let’s look at how this works in each of the three examples:

Example 1: eCommerce Chatbot

Our business context determines that it is far worse to get a prospective customer excited about an offer that we cannot deliver than it is to offer a less valuable package (i.e., we are Type II biased, something typical in ad-tech and eCommerce). We do not make the highest-value offer.

Depending on our Risk Policies, we make the normal offer (one for which a majority of customers qualify) or shunt the customer to a slower process (email vs. chat) to wait for the data to catch up (essentially shifting to batch). Most commerce companies have created default packages that allow the former action, allowing them to make more money in the “80% most likely case”. We could also apply a machine learning algorithm to guess the best alternative offer, maximizing revenue and minimizing the risk of an angry customer (or wasted time). A minimal sketch of this timeout-and-fallback logic follows.
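
Here is that decision in miniature; the segment names, offers, and two-second deadline are all hypothetical:

```python
# A sketch of a Type II-biased decision: wait briefly for the delayed
# customer-value score; if it has not arrived by the deadline, fall back
# to the default offer rather than promise something we cannot deliver.
# Segment names, offers, and the deadline are hypothetical.
import queue

OFFER_BY_SEGMENT = {"high_value": "platinum_card", "average": "standard_card"}
DEFAULT_OFFER = "standard_card"  # the "80% most likely case"

def choose_offer(score_queue: queue.Queue, timeout_s: float = 2.0) -> str:
    try:
        segment = score_queue.get(timeout=timeout_s)  # wait for the delayed score
    except queue.Empty:
        return DEFAULT_OFFER  # close enough and in time beats perfect and too late
    if segment == "credit_risk":
        return "secured_card"  # do not extend the premium offer to a risk
    return OFFER_BY_SEGMENT.get(segment, DEFAULT_OFFER)
```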

Example 2: Guaranteed Shipping

Our business context indicates that it does not make sense to alert that we are late if we do not yet know it—especially given the likelihood that this could result in some “egg on our face” when the customer asks why we did not know the container arrived 20 minutes ago. As a result, we do not send a “late” alert at 5:00pm. We make the call when we know for sure that the container was on time vs. late (i.e., when the delivery message actually arrives). This scenario is also Type II biased.

However, we do not want to expose ourselves to a completely irate customer in high-value circumstances. As such, we put a secondary streaming analytic in place: if we do not receive confirmation within 60 minutes of the scheduled delivery, we trigger an alert to reach out to our delivery carrier and find out the real status (i.e., by taking the expensive step of talking to a person vs. a sensor). We determined the “magic number” of 60 minutes through After-the-fact Analytics, which showed that waiting this long automatically resolves 80% of false positives while still giving us enough of a heads-up to detect the true issues. If we are even smarter, we can have our After-the-fact Analytics system automatically calculate the magic number to delay alerts based on location, time-of-day, day-of-week, and other features. A sketch of this watchdog follows.
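
Here is the secondary analytic in miniature; the status strings and escalation path are hypothetical:

```python
# A sketch of the secondary watchdog analytic: stay silent at the scheduled
# time, but escalate if no delivery confirmation arrives within the
# empirically derived 60-minute grace period. The status strings and the
# escalation path are hypothetical.
from datetime import datetime, timedelta
from typing import Optional

GRACE_PERIOD = timedelta(minutes=60)  # learned from After-the-fact Analytics

def check_delivery(scheduled: datetime,
                   confirmed_at: Optional[datetime],
                   now: datetime) -> str:
    if confirmed_at is not None:
        return "on_time" if confirmed_at <= scheduled else "late"
    if now > scheduled + GRACE_PERIOD:
        return "escalate_to_carrier"  # the expensive step: talk to a human
    return "wait"  # inside the grace period: no alert yet
```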

Example 3: Infrastructure Security Monitoring

Our business context indicates that it is no good closing the gate after all the cattle have gotten away. As such, we have programmed our Streaming Analytics system to alert us if the gate is opened (before a human has sent an “I am opening the gate” message) OR if we have not received confirmation that the gate is closed for a period longer than 15 minutes. Essentially, we are Type I biased (not uncommon in safety and security situations).

Unfortunately, this bias will result in lots of alerts: essentially, any time a sensor message is delayed in the cell network, our alarm will go off. Luckily, we have some more advanced analytic techniques to help with this. Namely, we can use a Lambda Architecture model that provides self-healing: the initial lack of confirmation that the gate is closed triggers an alert; the arrival of the delayed message that the gate WAS closed then cancels that alert (with a resolution message). This is still a bit chatty. However, it short-circuits false positives and prevents the need to send a worker (or a drone) all the way out to the gate to check whether it is open. The sketch below illustrates the pattern.
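
Here is a minimal sketch of that self-healing alert, assuming the 15-minute sensor cadence and using print statements as stand-ins for real alerting hooks:

```python
# A sketch of a self-healing alert: a missed heartbeat raises an alert
# immediately (Type I bias), and a late-arriving "gate closed" message
# cancels it with a resolution, short-circuiting the false positive.
# The print calls are hypothetical stand-ins for real alerting hooks.
from datetime import datetime, timedelta

HEARTBEAT = timedelta(minutes=15)  # expected sensor reporting interval

class GateMonitor:
    def __init__(self, start: datetime):
        self.last_closed_report = start
        self.alert_open = False

    def on_tick(self, now: datetime) -> None:
        # Called periodically; raises an alert if the sensor has gone quiet.
        if not self.alert_open and now - self.last_closed_report > HEARTBEAT:
            self.alert_open = True
            print("ALERT: no 'gate closed' confirmation in over 15 minutes")

    def on_message(self, gate_closed: bool, reported_at: datetime) -> None:
        # Called when a (possibly delayed) sensor message finally arrives.
        if not gate_closed:
            print("ALERT: gate reported OPEN")
            return
        self.last_closed_report = reported_at
        if self.alert_open:  # the delayed message heals the earlier alert
            self.alert_open = False
            print("RESOLVED: delayed message confirms the gate was closed")
```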

Conclusion

Yes, Streaming Analytics is harder than After-the-fact Analytics. However, the near real-time omnipresence (not omniscience) it provides offers tremendous benefits. You just need to think in philosophical terms when designing your analytic rules.
