L3: What Ifs

Lagrange Point 3 (L3): Exploring “What If” scenarios and flights of fancy

Wine Clones, Drones, and Behavioral Cloning: Heretical IoT Thoughts on Winemaking

In addition to technology, I love cooking and craftsmanship. That naturally leads to an interest in wine and, especially, the winemaking process. Over the past few years, I have been lucky to meet a few great winemakers and discuss how they work their craft. This gave me an opportunity to learn not only how skilled they are, but also the incredible amount of work they and their teams do. They routinely begin their work at 3am and work past sunset. While much of this work is repetitive, the majority of it requires continuous application of expert judgment. The more I spoke with them, the more the engineer in me started to think, “What could make their work easier without sacrificing their expertise?” That led me to this post.

Caveat: This post explores the use of automation and IoT to augment (not replace) work by humans. The idea is not to replace people, but rather to take some of the backbreaking labor out of their work and give them back more time with their families and friends. It is not intended to be as “heretical” as the idea may initially appear 😉

* * *

If you visit a winery and get to speak with the winemaker, he or she will talk about the process of routinely going out and doing things such as:

  • Inspecting the vines for damage, disease, and general health
  • Pruning grapes to concentrate resources to the very best
  • Pruning leaves to control the amount of sun grape bunches get

Watching someone do this on one vine is amazing (you can see the expertise in action). Watching them repeat this along a single row of vines starts to give an idea of how much work it takes to make great wine. Staying around all day to see this done across acres of grapes (and considering this is done throughout the season) drives home how much we should respect everyone who works in a vineyard. It also gives an appreciation for the effect of this labor on their backs, knees, fingers, and eyesight. The goal of IoT is to reduce danger, dirtiness, and drudgery. This is where my idea started.

Imagine doing this 10,000 times, year-in, year-out

Imagine this

The team goes out in the morning, before sunrise to tend their grapes. For this day, let’s assume they are pruning bunches of grapes to concentrate resources on the very best. Each person grabs their section of the vineyard and starts his or her work. However, they are each partnered with two drones. The first drone watches what each person does, recording which grapes are pruned. The other drone picks up discarded grapes for retrieval and composting.

After the expert finishes a few rows of grapes, the drones fly back, plug in, and upload their video images. They wait for a machine learning program to finish building a new model, which is then loaded onto the drones. Then the drones go out and finish the work, based on what they have learned from that wine expert, for that section of grapes, for that location, for that day’s weather and solar conditions. Much backbreaking (and eye-straining) work is saved, freeing the team to do the umpteen other activities that require their expertise and attention. Still, today they may work “only” ten hours instead of sixteen.
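
The daily loop described above (observe the expert in the morning, train on the uploaded footage, then let the drones imitate the expert for the rest of the section) can be sketched in miniature. Everything here, from the function names to the toy learned “policy,” is a hypothetical illustration rather than a real drone or ML API:

```python
# Hypothetical sketch of the daily record -> train -> deploy loop.
# None of these names correspond to a real drone or ML library.

def record_expert_actions(rows):
    """Observer drone: log which bunches the expert pruned by hand."""
    # Each observation pairs what the drone saw with what the expert did.
    return [{"features": f"bunch-{i}", "pruned": i % 3 == 0}
            for i in range(rows * 10)]

def train_model(observations):
    """Stand-in for the behavioral-cloning step: learn the pruning policy."""
    pruned = {o["features"] for o in observations if o["pruned"]}
    return lambda bunch: bunch in pruned  # toy policy: imitate observed decisions

def finish_remaining_rows(policy, bunches):
    """Worker drones apply today's learned policy to the rest of the section."""
    return [b for b in bunches if policy(b)]

observations = record_expert_actions(rows=3)   # morning: expert prunes a few rows
policy = train_model(observations)             # midday: footage uploaded, model trained
# Toy data reuses the same bunches; real drones would generalize to new ones.
to_prune = finish_remaining_rows(policy, [o["features"] for o in observations])
print(len(to_prune))  # 10 bunches flagged for pruning, imitating the expert
```

The key property of the loop is that the model is rebuilt every day from that day’s expert demonstrations, which is why the human stays essential.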

This is not replacing people

No jobs are lost. People do an enormous amount of work at wineries; with this technology, they might work “only” ten hours instead of sixteen. Imagine the benefits to their health.

Furthermore, human knowledge and expertise are not replaced. Every plot of land and every day bring new variables. Each workday, the human shares expertise that is used to reduce the repetitive work for that day. If you remove the human, you remove the expert: the drones alone would not have sufficient expertise, the winemaking would get worse and worse, and the vineyard would lose out to others who apply expert knowledge every day. What is reduced is drudgery.

This is not far-fetched

A decade ago, this would have been more Star Trek than reality. However, the advances of the last few years in drone, autonomous vehicle, and machine learning technologies have made it achievable. Let’s look at a few.

A photo taken on September 9, 2014 shows a drone flying over the vineyards of the Pape Clement castle, belonging to Bordeaux winemaker Bernard Magrez, in the southwestern French town of Pessac. Magrez was the first winemaker to buy (last February) a drone equipped with an infrared camera to determine the optimal maturity of the domain’s grapes and thus harvest them at different times. AFP PHOTO JEAN PIERRE MULLER.
Drone and Autonomous Tech

Over the last few years, growth in drone technology and its supporting infrastructure has exploded. The market is projected to hit $12 billion per year within the next four years (for comparison: US wineries sold $34 billion of product last year). Major “Blue Chip” companies now have active programs to embed drones in their supply chains. Major consultancies are using drones in numerous aspects of farming. Furthermore, regulations are now clear. Two years ago, it cost over $2,000 to get certified as a drone pilot; today you can do it for $200.

Autonomous technology is growing just as fast, or faster. Projections put 10+ million autonomous vehicles on the road by 2020. Autonomous vehicles are no longer just an idea in Silicon Valley: all “Big Three” US automotive manufacturers have autonomous vehicle programs, and even major shipping companies are exploring combined drone and autonomous technology for cargo ships.

Computing and Machine Learning (ML) Infrastructure

Thanks to computer gaming, GPU costs have dropped dramatically. You can now get resources to build models on demand and pay by the minute. The libraries to process imagery (e.g., OpenCV) and to build neural networks (e.g., TensorFlow, Keras) are releasing major new versions every 3-4 months. Some have even written that the “Object Detection Problem” is now solved (a great, but quite technical, write-up here). As one of my friends said to me this week—over a 2012 Tempranillo—“It’s a great time to be in tech.”

However, having the ML infrastructure is only the start. Building models takes lots of time, trial and error, and compute. Luckily, we now have models that recognize images with greater accuracy than humans. Using “Knowledge Transfer,” we can start with these base models and extend them with new knowledge (here is an example of what Gilt did to extend Microsoft’s ResNet vision recognition model to detect clothing styles). Combining this with “Behavioral Cloning” (an approach widely used to teach autonomous vehicles how to drive; here is one example I have used), we can clone and graft winemaking knowledge onto these existing models—just as a winemaker grafts a wine clone onto his or her vines.
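
As a rough sketch of what that grafting could look like in code: the standard transfer-learning pattern in Keras freezes a pretrained base network and trains only a small new head on top. The “prune vs. keep” head is my hypothetical example; in practice you would load pretrained weights with weights="imagenet" rather than the lightweight weights=None used here to keep the sketch self-contained:

```python
import tensorflow as tf

# Pretrained vision base. In practice: weights="imagenet" (downloads
# pretrained weights); weights=None here just keeps the sketch light.
base = tf.keras.applications.ResNet50(
    weights=None, include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the general vision knowledge

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    # Hypothetical head: classify each bunch as "prune" vs. "keep".
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(vineyard_images, expert_labels)  # would train only the new head
```

Only the small new head is trained, so a day’s worth of expert demonstrations can be enough to specialize the model.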


The foundations are in place and costs are coming down drastically. Making this cost-feasible for small businesses is just a matter of time (and perhaps a startup to focus on it).

Finally, this does not eliminate nuance and individuality

One of the most enjoyable aspects of exploring wine and vineyards is seeing how each winemaker executes his or her craft (just as going to new restaurants lets you explore how each chef interprets his or her craft). This technology does not remove the nuance and individuality of winemaking.

Even if every vineyard used this technology and started with the same baseline drones and machine learning models, they would each evolve differently. Thanks to behavioral cloning and knowledge transfer, each vineyard’s model would evolve weekly as it learns how the winemaker and his or her team apply their expertise day by day, wine block by wine block, year by year. These models would not even be trade secrets that could be stolen, as they would evolve to fit specific terroirs—just as best practices in winemaking do.

Bringing Machine Vision to Olympic Judging

If you’re like me, your favorite part of the Olympics is watching athletes from all over the world come together and compete to see who is the best. For many situations it is easy to clearly determine who is the best. The team that scores the most goals wins at Football (a.k.a. Soccer). The person who crosses the finish line first wins the 100-meter Dash. The swimmer who touches the wall first wins the 200-meter Breaststroke.

Victims of Human Error (and Bias)

However, in some cases, determining what happened is less clear. These cases involve subjective human judgment. I am not just talking about judgment of stylistic components; I am talking about judgment on absolute principles of scoring and penalties. As a result, athletes (who have trained for tens of thousands of hours over years of their lives) are often at the mercy of human judgment of motions that are almost too fast to observe. A few examples:

  1. A sprinter can be disqualified if she or he kicks off the starting blocks before the sound of the starting gun could potentially reach him or her
  2. A boxer may miss a point because he punches and connects too quickly
  3. A diver or gymnast can receive unwarranted penalties (or conversely, not receive warranted ones) because human judges misperceive the smallest of angles during a movement that takes just a few seconds
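
The first example is partly simple physics: sound travels at roughly 343 m/s in air, so the gun’s report takes tens of milliseconds to reach an athlete, and the delay differs by lane. A quick back-of-the-envelope check (the 10 m speaker-to-athlete distance is illustrative, not an official figure):

```python
# How long the starting gun's sound takes to reach a sprinter.
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C

def sound_delay_ms(distance_m):
    """Milliseconds for sound to travel distance_m through air."""
    return distance_m / SPEED_OF_SOUND * 1000.0

print(round(sound_delay_ms(10.0), 1))  # 29.2 ms before the athlete can hear the gun
```

A system that measures the actual kick-off time against the actual sound arrival removes this ambiguity entirely.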

Even worse, athletes in these situations are not only subject to human error, they are often subject to human bias as well. We have all seen countless questionable judgment calls based on national or political bias in too many events. As upsetting as these are to spectators, they are utterly heartbreaking for the athletes involved.

Bringing Machine Intelligence to the Rescue

We already use technology to aid in places where events happen too quickly for humans to accurately perceive them. In racing (from humans to horses, on land or water), we use photo-finish cameras to resolve which athlete has actually won when a finish is too close (or, as happened this year, when there is actually a tie for the Gold Medal). In Gymnastics and Skating, we allow judges to review slow-motion footage as part of their judging. In Fencing, we go one step further and equip athletes with electronic sensors to measure when a blade has touched a target area (or which touched first, to resolve simultaneous touches).

It is time to go a few steps further and actually bring machine intelligence (machine vision + machine learning) to the stage to provide the same absolute scoring that photo-finish cameras bring. I am not advocating using machines to replace people for stylistic judging. However, it makes absolutely no sense not to use machines to detect and score absolutes such as:

  • A gymnast’s bent arms, separated knees or mixed tempo
  • The deviation of a diver’s twist from 90°
  • The actual time a sprinter kicks off the blocks, based on a microphone’s detection of when the gun’s sound arrived
  • Detection of a skater’s under-rotated jump

Not only would this significantly reduce bias and error; it would also be a great training tool. Just as advanced athletes today use sensors to measure performance and conditioning, they could use a similar approach to detect small errors and work to eliminate them earlier in training.

This is Now Possible

Just a few years ago, this was the stuff of science fiction. Today it is feasible. Half a dozen companies have developed self-driving cars equipped with sensors and machine learning programs that handle conditions with much higher levels of variability than judging a 10-meter dive or Balance Beam program. However, one does not even need to equip arenas with multiple cameras and LIDAR arrays: researchers at DARPA have moved down the direction of teaching robots to cook by having them review two-dimensional YouTube videos.

Similar approaches could be used for “Scoring Computers.” If we wanted to let computers see exactly (and only) what humans can see, we could go down the machine-learning route: first, program the rules for scores and penalties; then create training sets with identified scores and infractions to train a computer to detect penalties and score them as a judge would—but with the aid of slow-motion review in a laboratory, without the pressure of on-the-spot judging on live TV. This would not remove the human; it would let the human teach a computer to do something with higher accuracy and speed than a person could manage in real time.
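
The recipe above (encode the absolute rules, build judge-labeled training sets, then fit a detector) can be sketched with a deliberately tiny example. The feature, the rule, and the data here are all made up; a real system would learn from labeled video frames, not a single angle:

```python
# Toy version of: program the rule, label examples, train a detector.

# Step 1: an absolute rule, e.g. "knee separation beyond ~10° is a deduction".
RULES = {"knee_separation_deg": 10.0}

# Step 2: a training set labeled by human judges during slow-motion review.
training_frames = [
    {"knee_separation_deg": 2.0,  "judge_called_penalty": False},
    {"knee_separation_deg": 14.0, "judge_called_penalty": True},
    {"knee_separation_deg": 8.0,  "judge_called_penalty": False},
    {"knee_separation_deg": 11.5, "judge_called_penalty": True},
]

# Step 3: "train" a detector. Here that just means calibrating the decision
# boundary to the midpoint between the judges' labeled examples.
no_penalty = max(f["knee_separation_deg"]
                 for f in training_frames if not f["judge_called_penalty"])
penalty = min(f["knee_separation_deg"]
              for f in training_frames if f["judge_called_penalty"])
learned_threshold = (no_penalty + penalty) / 2  # 9.75 on this toy data

def detect_penalty(frame):
    return frame["knee_separation_deg"] > learned_threshold

print(detect_penalty({"knee_separation_deg": 12.0}))  # True
```

The point is that the humans supply both the rules and the labels; the computer only makes their judgment fast, consistent, and tireless.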

If we wanted to go a step further, just as Fencing has done, we could add sensors to the mix. A LIDAR array could measure exact motion (actually measuring that bent knee or over-rotation). Motion-capture (mo-cap) would make this accuracy even better. Both would also create amazing advanced sports-training technology.
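
As a hint of what “actually measuring that bent knee” involves: given 3D marker positions from LIDAR or mo-cap, a joint angle is just the angle between two limb segments. The marker coordinates below are made-up example data:

```python
# Computing a joint angle from three 3D marker positions (hip, knee, ankle).
import math

def joint_angle_deg(a, b, c):
    """Angle at b, in degrees, between segments b->a and b->c."""
    ba = [a[i] - b[i] for i in range(3)]
    bc = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(ba, bc))
    return math.degrees(math.acos(dot / (math.dist(a, b) * math.dist(c, b))))

# A perfectly straight leg measures 180 degrees; a judge must spot anything less.
hip, knee, ankle = (0.0, 1.0, 0.0), (0.0, 0.5, 0.0), (0.0, 0.0, 0.1)
print(round(joint_angle_deg(hip, knee, ankle), 1))  # 168.7 -> a slightly bent knee
```

A sensor that resolves positions to 1 cm or better makes this angle an objective measurement rather than a split-second perception.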

It’s More Feasible Than You May Think

All of this technology sounds pretty expensive: computers, sensors, data capture, programming, testing, verification, deployment, etc. However, it is not nearly as expensive and “sci-fi-ish” as one might think (or fear).

Tens of thousands of hours of video already exist to train computers to judge events (the same videos that judges, athletes, and coaches review in training libraries—libraries even better than robo.watch). Computing time gets cheaper every year thanks to Moore’s Law and public cloud computing. An abundance of open-source machine learning libraries is available (some companies have opened proprietary libraries; others offer Machine Learning-as-a-Service). There are now even low-cost LIDAR sensors available for less than $500 that can resolve distances of 1 cm or less (for $2,000, college programs and Tier I competitive venues can get sensors that resolve to 1 mm or less).

Given the millions of dollars poured into these sports (and the billions poured into transmission rights), it would not require an Apollo Program to build a pilot in time for the 2020 Olympics (or even the 2018 Winter Olympics). Companies like Google and IBM would likely donate some R&D to show off their capabilities. Universities like MIT, Carnegie Mellon, and Stanford are already putting millions of dollars into biomimetics, computer vision, and more. Even companies like ILM and Weta Digital might offer their mo-cap expertise, as they would benefit from the joint R&D. Thousands of engineers would likely jump in to help via Kaggle competitions and hackathons, as this would be really fun to create.

Some Interesting Side Benefits

There are benefits to this technology beyond “just” providing more accurate judging and better training tools. The same technology could create amazing television that would enable spectators to better appreciate and understand these amazing sports. You could even add an Oculus Rift or similar VR technology to create some amazing immersive games (creating new sources of funding for organizations like the US Olympic Team or USA Gymnastics to help pay for athlete training).