Tag Archives: sensors

My MIT EF Podcast: IoT as Internet 3.0

Last week, I had the pleasure of doing a podcast with Randall Cronk of the MIT Enterprise Forum (my alma mater) on the practicalities and challenges of using the Internet of Things (a.k.a. IoT) to solve real-world problems.

Here in an excerpt of some of the things (no pun intended) we discussed:

IoT is not just about talking toasters (or creepy monitoring), it can be use to solve many high-cost, real-world problems. We already have some clear analogies for this:

  • Commercialization of the World Wide Web (Internet 1.0) radically changed how we get information. Instead of waiting to get it physically (via mail or newspapers) we could get it instantly from our desktops
  • The mobile Internet, smartphones, mobile web and app stores (Web 2.0 or Internet 2.0). Let us take the convenience of this instantaneous access virtually anywhere. We no longer had to go back to our desks and could now look up info on street corner at a restaurant, etc.
  • The Internet of Things (Internet 3.0) takes this convenience to the next level. We no longer have to go look at things to see where they are, what state they are in, etc. We can find out without manual effort. This lets us focus on things we really care about (instead of the drudgery of getting information)

Of course, this is not a simple prospect. We have many challenges to solve. The most obvious are the ones around data connectivity and protocols (these challenges, however, are pretty straightforward). The next is privacy and security (we have models for these from regulated industries like banking, healthcare, and medicine). The next is how to handle all that information. If we do not solve this problem, connected things will swarm us with so much useless data that it will make our email inboxes look simple.

Listen to the podcast to hear more of the details

You can find it at the MIT Enterprise Forum:

Internet_3_0_podcast

or on iTunes:

Get_it_on_iTunes_Badge_US_135

 

Twitter traffic jams in Washington, created by… John Oliver

Summary: In the first week of June, 20% of the Tweets about traffic, delays and congestion by people around the Washington Beltway were caused by John Oliver’s “Last Week Tonight” segment about Net Neutrality.

At work, we are always exploring a wide range of sensors to obtain useful insights that can used to make work and routine activities faster, more efficient and less risky. One of our Alpha Tests is examining use of “arrays” of high-targeted Twitter sensors to detect early indications of traffic congestion, accidents and other sources of delays. Specifically we are training our system how to use Twitter is a good traffic sensor (by good, in “data science speak” we are determining whether we can train a model for traffic detection that has a good balance of precision and recall, and hence a good F1 Score). To do this, I setup a test bed around the nation’s second-worst commuter corridor: the Washington DC Beltway (our my backyard).

Earlier this month our array of geographic Twitter sensors picked up an interesting surge in highly localized tweets about traffic-related congestion and delays. This was not an expected “bad commuter-day”-like surge. The number of topic- and geographically-related tweets seen on June 4th was more than double the expected number for a Tuesday in June around the Beltway; the number seen during lunchtime was almost 5x normal.

So what was the cause? Before answering, it is worth taking a step back.

The folks at Twitter have done a wonderful job at not only allowing you to fetch tweets based on topics, hash tags and geographies. They have also added some great machine learning-driven processing to screen out likely spammers and suspect accounts. Nevertheless Twitter data, like all sensor data, is messy. It is common to see Tweets with words spelled wrong, words used out of context, or simply nonsensical Tweets. In addition, people frequently repeat the same tweets throughout the day (a tactic to raise social media exposure) and do lots of other things that you must train the machine to account for.

That’s why we use a Lambda Architecture to process our streaming sensor data (I’ll write about why everyone–from marketers to DevOps staff should be excited about Lambda architectures in a future post). As such, not only do use Complex Event Processing (via Apache Storm) to detect patterns as they happen; we also keep a permanent copy of all raw data that we can explore to discover new patterns and improve our machine learning models).

That is exactly what we did as soon as we detected the surge. Here is what we found: the cause of the traffic- and congestion-related Twitter surge around the Beltway was… John Oliver:

  1. In the back half of June 1st’s episode of “Last Week Tonight” (HBO, 11pm ET), John Oliver had an interesting 13-minute segment on Net Neutrality. In this segment he encouraged people to visit the FCC website and comment on this topic.
  2. Seventeen hours later, the FCC tweeted that “[they were] experiencing technical difficulties with [their] comment system due to heavy traffic.” They tweeted a similar message 74-minutes later.
  3. This triggered a wave of re-tweets and comments about the outage in many places. Interestingly this wave was delayed in the Beltway. It surged the next day, just before lunchtime in DC, continuing throughout the afternoon. The two spikes were at lunchtime and just after work . Evidently, people are not re-tweeting while working. The timing of the spikes also reveals some interesting behavior patterns on Twitter use in DC.
  4. By 4am on Wednesday the surge was over. People around the Beltway were back to their normal tweeting about traffic, construction, delays, lights, outages and other items confounding their commute.

Of course, as soon as we saw the new pattern, we adjusted our model to account for this pattern. However, we thought it would be interesting to show in a simple graph how much “traffic on traffic, delays and congestion” Mr. Oliver induced in the geography around the Beltway for a 36-hour period. Over the first week of June, one out of every five Tweets about traffic, delays and congestion by people around the Beltway were not about commuter traffic, but instead around FCC website traffic caused by John Oliver:

Tweets from people geographically Tweeting around the Washington Beltway on traffic, congestion, delays and related frustration for first week of June. (Click to enlarge.)
Tweets from people geographically Tweeting around the Washington Beltway on traffic, congestion, delays and related frustration for first week of June. (Click to enlarge.)

Obviously, a simple count of tweets is a gross measure. To really use Twitter as a sensor, one needs to factor in many other variables: use text vs. hash-tags, tweets vs. mentions and re-tweets, the software client used to send the tweet (e.g., HootSuite is less likely to be a good source for accurate commuter traffic data); the number of followers the tweeter has (not a simple linear weighting) and much more. However, the simple count is simple first-order visualization. It also makes interesting “water-cooler conversation.”