This week, at the Washington DC Spark Interactive, Savi Engineering shared some of our work on using Spark Streaming and Expert Systems technology (Drools) to analyze the Industrial IoT in near-real time.
At Savi, we use a hybrid Lambda Architecture (see my post on why Lambda is so important). By “hybrid” we mean that, unlike in a pure Lambda Architecture, we cannot fully restate the past, because we have already notified humans of critical IoT events (e.g., theft, safety risk). We can only enrich and auto-resolve those events as more data becomes available. You can find tips on how to do this, both in general with streaming technologies and specifically with Spark, in the following presentation. You can also learn more about tackling real-world IoT challenges:
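To make the “hybrid” constraint concrete, here is a minimal sketch, assuming a simple in-memory model. The speed layer raises an alert the moment it sees a critical event. The batch layer may later add context or mark the alert auto-resolved, but it never deletes or rewrites the fact that a human was notified. `Alert`, `AlertLedger`, and all method and field names are hypothetical illustrations, not Savi's actual code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Alert:
    """An alert that has already been sent to a human (hypothetical model)."""
    alert_id: str
    device_id: str
    kind: str                                   # e.g. "theft", "safety_risk"
    notes: List[str] = field(default_factory=list)
    resolved: bool = False


class AlertLedger:
    """Append-only alert store: later data may enrich or resolve alerts, never erase them."""

    def __init__(self) -> None:
        self._alerts: Dict[str, Alert] = {}

    def notify(self, alert: Alert) -> None:
        # Speed layer: record the alert at the moment a human is notified.
        self._alerts[alert.alert_id] = alert

    def enrich(self, alert_id: str, note: str) -> None:
        # Batch layer: late-arriving data adds context to an existing alert.
        self._alerts[alert_id].notes.append(note)

    def auto_resolve(self, alert_id: str, reason: str) -> None:
        # Batch layer: if recomputation shows the event was benign, mark it
        # resolved with an explanation instead of silently removing it.
        alert = self._alerts[alert_id]
        alert.resolved = True
        alert.notes.append(f"auto-resolved: {reason}")

    def get(self, alert_id: str) -> Alert:
        return self._alerts[alert_id]

    def open_alerts(self) -> List[Alert]:
        return [a for a in self._alerts.values() if not a.resolved]


# Usage: a theft alert goes out immediately, then fuller data explains it away.
ledger = AlertLedger()
ledger.notify(Alert("a1", "tag-42", "theft"))
ledger.enrich("a1", "door sensor reported open after hours")
ledger.auto_resolve("a1", "unload matched the shipping manifest")
```

The design choice is that resolution is itself a new fact appended to the record. That keeps the history consistent with what humans were actually told, which is exactly what a pure Lambda restatement cannot guarantee.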
In addition, at Savi we combine fully explicit rules with real-time machine learning algorithms to perform risk and performance analytics in near-real time (see my post on the differences in focus areas between our Data Engineers and Data Scientists). James Nowell of our Engineering team gave a great presentation on how we run Drools inside Spark RDDs (yes, Drools, and we do this without performance penalties) to create linear-scale expert systems that analyze all of that IoT data as if we were an omniscient human. You can find his presentation here:
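Drools itself is a Java rule engine, so the sketch below does not use it. It only illustrates one common way to keep a rule engine cheap inside Spark: build the (expensive) rule session once per partition rather than once per record, then stream every record in that partition through it. In PySpark, a function of this shape could be passed to `RDD.mapPartitions`; whether the approach in James's talk is structured this way is an assumption. `RuleSession`, `RULES`, `evaluate_partition`, and the reading field names are all hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Reading = Dict[str, Any]
Rule = Tuple[str, Callable[[Reading], bool]]


class RuleSession:
    """Stand-in for an expensive-to-build rule session (hypothetical; not the Drools API)."""

    def __init__(self, rules: List[Rule]) -> None:
        # In a real engine, compiling the rule base is the costly step to amortize.
        self.rules = rules

    def fire(self, reading: Reading) -> List[str]:
        # Return the names of every rule whose condition matches this reading.
        return [name for name, condition in self.rules if condition(reading)]


# Hypothetical explicit rules over hypothetical sensor fields.
RULES: List[Rule] = [
    ("over_temperature", lambda r: r.get("temp_c", 0.0) > 8.0),
    ("door_open_in_transit",
     lambda r: r.get("door_open", 0) == 1 and r.get("speed_kph", 0.0) > 5.0),
]


def evaluate_partition(readings: Iterable[Reading]) -> Iterator[Tuple[str, List[str]]]:
    """Build one session for the whole partition, then evaluate each reading."""
    session = RuleSession(RULES)
    for reading in readings:
        fired = session.fire(reading)
        if fired:
            yield (str(reading.get("device_id")), fired)


# Usage without Spark: treat a plain list as one partition.
sample = [
    {"device_id": "tag-1", "temp_c": 9.5, "door_open": 0, "speed_kph": 60},
    {"device_id": "tag-2", "temp_c": 4.0, "door_open": 1, "speed_kph": 40},
    {"device_id": "tag-3", "temp_c": 4.0, "door_open": 0, "speed_kph": 0},
]
alerts = list(evaluate_partition(sample))
```

Because each partition gets its own session and partitions are processed independently, adding executors adds rule-evaluation capacity roughly in proportion, which is the intuition behind calling such a system linear-scale.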
In future presentations, we will expand on areas such as:
- The differences in use of Spark (using the same data) between Data Scientists and Engineers
- How we scale machine learning algorithms for real-time, sub-second execution (thousands of times per second)
- Creating a DAG that combines hardware device edge intelligence with cloud-based intelligence
If you like what you see here, Savi is hiring. Take a look here.
Last week I had the pleasure of doing a podcast with Forbes contributor Mike Kavis on how to architect for the Internet of Things (“IoT”). We originally connected on Twitter during a discussion on whether the IoT and sensors are Big Data. That discussion led to a podcast on the architecture challenges, from device to data to data consumer, created by the onset of millions (or billions) of connected sensors and smart things.
Here is an excerpt of what we discussed:
- Connected devices bring some classic engineering challenges back to the forefront. How do you transmit data securely and with low power consumption? How do you handle lossy networks and cut-off transmissions?
- Not everything is a smartphone app transmitting JSON over HTTP (that would be cost-prohibitive from both a hardware and a bandwidth perspective). How do you handle a myriad of communication protocols, each of which could be using a near-infinite variety of data encoding formats?
- IoT data is messy. Devices get cut off mid-transmission (or repeat over and over until they get an acknowledgement). How do you detect this, and clean it up, as data arrives?
- IoT data is of incredibly high volume. By 2020, we will have 4x more sensor and IoT data than enterprise data. We already get more data today from sensors than we do from PCs. How do we scale to consume and use all of it? In addition, connected devices are not always smart or fault-tolerant. How do you ensure you are always ready to catch all that data (i.e., how do you build a zero-downtime IoT utility)?
- IoT and sensor data in and of itself is not terribly useful. It is rarely in a format that a (business or consumer) analyst would even be able to read. It would be incredibly wasteful to store all of it as-is in a business warehouse, Dropbox repo, etc.
- IoT and sensor data needs context. Knowing that device FE80:0000:0000:0000:0202:B3FF:FE1E:8329 is at GPS location X,Y is of no use on its own. You need to marry it to data about the “things” to get useful insights.
- IoT data simultaneously “lives” in two points of view: what it means right now, and what it implies for the big picture. The Lambda Architecture is an ideal tool to handle this.
- Finally, while all the attention is on the consumer stories, the real money is in the Industrial and Enterprise Internet of Things. It’s also where smart things are far less creepy.
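Several of the points above (cut-off transmissions, devices that repeat until acknowledged, and raw device IDs that mean nothing without context) can be illustrated with one small cleaning step. The sketch below is a minimal illustration under stated assumptions: it assumes each message carries a per-device sequence number and the payload length announced in its frame header, and that a device-to-asset registry exists. `RawMessage`, `EnrichedEvent`, `clean_and_enrich`, and every field name are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional, Set, Tuple


@dataclass(frozen=True)
class RawMessage:
    device_id: str
    seq: int            # hypothetical per-device sequence number
    payload: bytes
    expected_len: int   # payload length announced in the (hypothetical) frame header


@dataclass(frozen=True)
class EnrichedEvent:
    device_id: str
    seq: int
    payload: bytes
    asset: Optional[str]  # the "thing" the device is attached to, if known


def clean_and_enrich(
    messages: Iterable[RawMessage],
    registry: Dict[str, str],
) -> Iterator[EnrichedEvent]:
    """Drop truncated and repeated messages, then attach asset context."""
    seen: Set[Tuple[str, int]] = set()
    for msg in messages:
        # Cut-off transmission: fewer bytes arrived than the header promised.
        if len(msg.payload) < msg.expected_len:
            continue
        # Retransmission: the device resent a message we already accepted.
        key = (msg.device_id, msg.seq)
        if key in seen:
            continue
        seen.add(key)
        # Context: marry the raw device ID to what we know about the thing it is on.
        yield EnrichedEvent(msg.device_id, msg.seq, msg.payload, registry.get(msg.device_id))
```

Truncated frames are rejected before their key is recorded, so a later complete retransmission with the same sequence number is still accepted. The `seen` set grows without bound here; a streaming job would typically keep it only for a bounded time window.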
Listen to the podcast to hear more of the details.
You can find the full podcast on Cloud Technology Partners’ website and SoundCloud:
I also want to take a moment to extend a big thank you to the folks at Cloud Technology Partners, SYS-CON Media, and Cloud Computing Journal for sharing this podcast. I also want to thank all of you on Twitter who retweeted it. I was happily overwhelmed by the sharing and interest!