Tag Archives: Lambda Architecture

Twitter traffic jams in Washington, created by… John Oliver

Summary: In the first week of June, 20% of the Tweets about traffic, delays and congestion by people around the Washington Beltway were caused by John Oliver’s “Last Week Tonight” segment about Net Neutrality.

At work, we are always exploring a wide range of sensors to obtain useful insights that can be used to make work and routine activities faster, more efficient and less risky. One of our Alpha Tests is examining the use of “arrays” of highly targeted Twitter sensors to detect early indications of traffic congestion, accidents and other sources of delays. Specifically, we are testing whether Twitter is a good traffic sensor (by “good,” in “data science speak,” we mean that we can train a model for traffic detection that has a good balance of precision and recall, and hence a good F1 Score). To do this, I set up a test bed around the nation’s second-worst commuter corridor: the Washington DC Beltway (my backyard).
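For readers less familiar with the terms, precision, recall and F1 can all be computed from counts of correct and incorrect detections. The sketch below is only an illustration: the function name and the counts are made up and are not figures from the Beltway test bed.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from true-positive, false-positive
    and false-negative counts. Empty denominators yield 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts: 80 tweets correctly flagged as traffic reports,
# 20 flagged wrongly, and 20 real traffic reports that were missed.
precision, recall, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

A model that flags everything gets high recall but poor precision; F1 only rises when both are reasonable, which is why it is a handy single number for judging a sensor.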

Earlier this month our array of geographic Twitter sensors picked up an interesting surge in highly localized tweets about traffic-related congestion and delays. This was not an expected “bad commuter-day”-like surge. The number of topic- and geographically-related tweets seen on June 4th was more than double the expected number for a Tuesday in June around the Beltway; the number seen during lunchtime was almost 5x normal.
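A surge like this can be expressed as a ratio of the observed count to a baseline for the same kind of day. The snippet below is a minimal sketch with invented counts and a hypothetical threshold of 2.0; it is not the detection logic actually running in the test bed.

```python
from statistics import mean


def surge_ratio(observed: int, history: list[int]) -> float:
    """Ratio of today's tweet count to the average of comparable past days."""
    if not history:
        raise ValueError("need at least one historical count")
    baseline = mean(history)
    if baseline == 0:
        raise ValueError("baseline is zero; ratio is undefined")
    return observed / baseline


def is_surge(observed: int, history: list[int], threshold: float = 2.0) -> bool:
    """Flag the day when the count reaches `threshold` times the baseline."""
    return surge_ratio(observed, history) >= threshold


# Hypothetical counts for four earlier comparable days, then the day in question.
earlier_days = [100, 110, 90, 100]
print(surge_ratio(210, earlier_days), is_surge(210, earlier_days))
```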

So what was the cause? Before answering, it is worth taking a step back.

The folks at Twitter have done a wonderful job of allowing you to fetch tweets based on topics, hashtags and geographies. They have also added some great machine learning-driven processing to screen out likely spammers and suspect accounts. Nevertheless, Twitter data, like all sensor data, is messy. It is common to see Tweets with misspelled words, words used out of context, or simply nonsensical Tweets. In addition, people frequently repeat the same tweets throughout the day (a tactic to raise social media exposure) and do lots of other things that you must train the machine to account for.
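As one small example of this kind of cleanup, repeated tweets from the same account can be collapsed before counting. This is a sketch under simple assumptions (tweets arrive as `(user, text)` pairs, and two tweets are "the same" if they match after lowercasing and stripping links and extra whitespace); the function names are illustrative only.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, drop URLs and collapse whitespace so near-identical
    repeats compare equal."""
    text = text.lower()
    text = re.sub(r"https?://\S+", "", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def drop_repeats(tweets: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep only the first occurrence of each (user, normalized text) pair."""
    seen = set()
    kept = []
    for user, text in tweets:
        key = (user, normalize(text))
        if key not in seen:
            seen.add(key)
            kept.append((user, text))
    return kept


# Hypothetical sample: the same user repeats a tweet with a different short link.
sample = [
    ("a", "Stuck on I-495  again http://t.co/x"),
    ("a", "stuck on i-495 again http://t.co/y"),
    ("b", "Stuck on I-495 again"),
]
print(drop_repeats(sample))
```

The same text from a different user is kept on purpose: two people reporting the same jam is a signal, one person repeating it is not.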

That’s why we use a Lambda Architecture to process our streaming sensor data (I’ll write about why everyone, from marketers to DevOps staff, should be excited about Lambda architectures in a future post). As such, not only do we use Complex Event Processing (via Apache Storm) to detect patterns as they happen; we also keep a permanent copy of all raw data that we can explore to discover new patterns and improve our machine learning models.
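The core idea can be shown in a few lines: every event is appended to an immutable raw log and also counted immediately, and a periodic batch job rebuilds the counts from the full log, optionally with an improved filter. The toy class below is a plain-Python stand-in written for illustration; it does not use Apache Storm, and the class, method and field names are assumptions.

```python
from collections import Counter


class LambdaSketch:
    """Toy Lambda Architecture: an append-only raw log (batch layer)
    plus incremental counts (speed layer), merged at query time."""

    def __init__(self):
        self.master_log = []            # permanent copy of all raw events
        self.batch_view = Counter()     # rebuilt from the full log
        self.realtime_view = Counter()  # events seen since the last batch run

    def ingest(self, event: dict) -> None:
        """Store the raw event and update the real-time count right away."""
        self.master_log.append(event)
        self.realtime_view[event["topic"]] += 1

    def run_batch(self, keep=lambda event: True) -> None:
        """Recompute the batch view from ALL raw data, applying the latest
        filter, then reset the speed layer it supersedes."""
        self.batch_view = Counter(e["topic"] for e in self.master_log if keep(e))
        self.realtime_view.clear()

    def query(self, topic: str) -> int:
        """Answer from the batch view plus whatever arrived since."""
        return self.batch_view[topic] + self.realtime_view[topic]


pipeline = LambdaSketch()
pipeline.ingest({"topic": "traffic", "text": "Beltway backed up at exit 27"})
pipeline.ingest({"topic": "traffic", "text": "FCC site down due to heavy traffic"})
print(pipeline.query("traffic"))  # both events counted in real time

# Later, an improved rule is replayed over the raw log.
pipeline.run_batch(keep=lambda e: "fcc" not in e["text"].lower())
print(pipeline.query("traffic"))  # the FCC tweet no longer counts
```

Because the raw log is never modified, a mistake in the real-time rules can be corrected after the fact by rerunning the batch step with a better filter.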

That is exactly what we did as soon as we detected the surge. Here is what we found: the cause of the traffic- and congestion-related Twitter surge around the Beltway was… John Oliver:

  1. In the back half of June 1st’s episode of “Last Week Tonight” (HBO, 11pm ET), John Oliver had an interesting 13-minute segment on Net Neutrality. In this segment he encouraged people to visit the FCC website and comment on this topic.
  2. Seventeen hours later, the FCC tweeted that “[they were] experiencing technical difficulties with [their] comment system due to heavy traffic.” They tweeted a similar message 74 minutes later.
  3. This triggered a wave of re-tweets and comments about the outage in many places. Interestingly, this wave was delayed in the Beltway. It surged the next day, just before lunchtime in DC, continuing throughout the afternoon. The two spikes were at lunchtime and just after work. Evidently, people are not re-tweeting while working. The timing of the spikes also reveals some interesting behavior patterns on Twitter use in DC.
  4. By 4am on Wednesday the surge was over. People around the Beltway were back to their normal tweeting about traffic, construction, delays, lights, outages and other items confounding their commute.

Of course, as soon as we saw the new pattern, we adjusted our model to account for it. However, we thought it would be interesting to show in a simple graph how much “traffic on traffic, delays and congestion” Mr. Oliver induced in the geography around the Beltway for a 36-hour period. Over the first week of June, one out of every five Tweets about traffic, delays and congestion by people around the Beltway was not about commuter traffic, but instead about FCC website traffic caused by John Oliver:

Tweets from people geographically Tweeting around the Washington Beltway on traffic, congestion, delays and related frustration for first week of June.

Obviously, a simple count of tweets is a gross measure. To really use Twitter as a sensor, one needs to factor in many other variables: use of text vs. hashtags; tweets vs. mentions and re-tweets; the software client used to send the tweet (e.g., HootSuite is less likely to be a good source for accurate commuter traffic data); the number of followers the tweeter has (not a simple linear weighting); and much more. However, the simple count is a simple first-order visualization. It also makes interesting “water-cooler conversation.”
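To make the idea of weighting concrete, here is one way such factors could be combined into a per-tweet weight instead of a raw count. Everything in it is an assumption made for illustration: the weight tables, the client categories, and the logarithmic follower factor (chosen only because it is one simple non-linear option) are not the values used in our model.

```python
import math

# Hypothetical weights; real values would have to be learned from data.
KIND_WEIGHTS = {"tweet": 1.0, "mention": 0.7, "retweet": 0.4}
CLIENT_WEIGHTS = {"mobile": 1.0, "web": 0.8, "scheduler": 0.2}
DEFAULT_WEIGHT = 0.5


def tweet_weight(kind: str, client: str, followers: int) -> float:
    """Combine tweet type, sending client and follower count into one weight.
    The follower factor grows with log10, so it is deliberately non-linear."""
    follower_factor = 1.0 + math.log10(1 + followers)
    return (
        KIND_WEIGHTS.get(kind, DEFAULT_WEIGHT)
        * CLIENT_WEIGHTS.get(client, DEFAULT_WEIGHT)
        * follower_factor
    )


def weighted_count(tweets: list[dict]) -> float:
    """Sum of weights, a drop-in replacement for len(tweets)."""
    return sum(tweet_weight(t["kind"], t["client"], t["followers"]) for t in tweets)


print(tweet_weight("tweet", "mobile", followers=0))
print(tweet_weight("retweet", "scheduler", followers=9))
```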

RE: Can Twitter be saved? (What I would do if I worked at Twitter)

Mark Gimein (columnist of Slate’s “The Big Money”) recently posited that Twitter is “collapsing under its own weight” due to the sheer number of Tweets we all have to wade through. I agree. I recommend three steps to the leaders of Twitter to make the service more useful. These would not only make Twitter more enterprise-friendly; they would also give Twitter a new revenue-generating service.

Who says Twitter is in danger?

Mark Gimein recently posited that Twitter is in danger of “collapsing under its own weight” by virtue of the following:

  1. Twitter’s hyper-growth has added millions of people and millions of posts to follow
  2. As a result, all of us now have an exponentially growing stream of information to follow
  3. Not only does this take too much time…
  4. It also flows by so quickly that we may miss key items of interest

The basic question for us is whether Twitter is in danger of outgrowing its micro-blog adaptation of the old-fashioned party-line mode of communication.

Why should we care?

Twitter is one of those products that Guy Kawasaki calls “Highly Emotive,” i.e., you either love it or hate it. (I got this first-hand from him at Wharton last spring; see his most current book and lecture series on his DICEE model.)

He indicated that Twitter is the only thing out there that provides the ability to 1) see in real-time what people are talking about online and 2) immediately reach out and directly message them to converse or develop business around these topics of shared interest. It enables you to do this more quickly and cost-effectively than using “traditional” market research and direct marketing tools. However, once you lose the ability to easily see when people are talking about the things you care about (in a way that is compelling and valuable to you), Twitter loses its value.

Better Twitter Search does not provide the answer

Everyone focuses on Twitter’s Search capabilities (or lack thereof). I do not believe Search provides the answer. Why? Because Search requires you to act (i.e., you need to Search for someone discussing something of interest, then see if you can act on the results). Unless you are a professional Social Media Manager, you likely have lots of other things occupying your day and do not have time to do this (especially as Twitter gets busier).

My answer to make Twitter more valuable (and useful)

If I led Product Development or Technology at Twitter, I would do three things:

  1. Enable Filtered Subscriptions (a.k.a. Filtered Following)
  2. Integrate Meritocracy into Feeds
  3. Enable Scheduled Delivery of Matches (in Addition to Real-time Feeds)

The First Part of the Answer: Filtered Subscriptions (a.k.a. Filtered Following)

The first part of the answer lies in Subscriptions (actually Filtered Subscriptions). Rather than going onto Twitter and looking for things of interest, you instead register your topics and let Twitter inform you when people discuss things you care about.

Useless Tweets

Twitter does let you subscribe to (“Follow”) people today (the Following function), but it does not let you Follow topics (Search is NOT the same as Subscribing). As a result, when I find someone who is of enough interest to me that I want to follow him or her, I now subject myself to everything he or she Tweets, including hearing when he or she “took off” from SFO and “touched down” at JFK. Uggh.

Let me—

  1. Follow Tweets with topics of interest (e.g., keywords)
  2. Specify if I want to Follow everything a person says or only when they Tweet something on a list of keywords (this would cut my Twitter stream by 80-90%)
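The two requests above amount to a small matching rule. The sketch below shows one possible shape for it; the `Subscription` class, its field names and the naive word-splitting are all hypothetical and do not correspond to any real Twitter API.

```python
from dataclasses import dataclass, field


@dataclass
class Subscription:
    topics: set[str] = field(default_factory=set)       # keywords followed from anyone
    follow_all: set[str] = field(default_factory=set)   # people whose every tweet I want
    follow_filtered: dict[str, set[str]] = field(default_factory=dict)  # person -> keywords


def words(text: str) -> set[str]:
    """Lowercased words with surrounding punctuation and '#' removed."""
    return {w.strip(".,!?:;\"'()#").lower() for w in text.split()}


def matches(sub: Subscription, author: str, text: str) -> bool:
    """True if the tweet should appear in this subscriber's stream."""
    if author in sub.follow_all:
        return True
    # Topic keywords apply to everyone; per-person keywords only to that person.
    wanted = sub.topics | sub.follow_filtered.get(author, set())
    return bool(words(text) & wanted)


me = Subscription(
    topics={"beltway"},
    follow_all={"alice"},
    follow_filtered={"bob": {"storm"}},
)
print(matches(me, "bob", "Touched down at JFK"))              # filtered out
print(matches(me, "bob", "Tuning Apache Storm topologies"))   # kept
print(matches(me, "carol", "The #Beltway is jammed!"))        # kept via topic
```

Keywords are stored in lowercase here; a real implementation would also need stemming, phrase matching and hashtag handling.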

The Second Part of the Answer: Integrate Meritocracy into My Feeds

I apologize for using the Social Networking Buzzword (Meritocracy). What I am saying here is that we need the ability to provide feedback on Tweets and Tweeters. Why? Because this will leverage social networking behavior to incentivize people to Tweet items of value to others.

I do not recommend anything complicated (Twitter’s beauty is in its simplicity). I also do not recommend anything that allows you to punish people (this creates a harsh environment). Instead, “piggyback” on the Favorites feature and use it to let me further filter my feed and subscriptions:

  1. Let me prioritize my Followers List and Feed based on people with the highest percentage of Favorited Tweets
  2. Let me toggle my Feed to only show Tweets Favorited by others or Tweeters who have ever been Favorited
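Both options reduce to simple operations on a timeline once each tweet carries a Favorite count. The following is a rough sketch under that assumption; the dictionary keys (`author`, `favorites`) and function names are invented for the example.

```python
def favorited_share(tweets: list[dict]) -> float:
    """Fraction of a person's tweets that were Favorited at least once."""
    if not tweets:
        return 0.0
    return sum(1 for t in tweets if t["favorites"] > 0) / len(tweets)


def rank_people(timeline: list[dict]) -> list[str]:
    """Order authors by their percentage of Favorited tweets, highest first."""
    by_author: dict[str, list[dict]] = {}
    for tweet in timeline:
        by_author.setdefault(tweet["author"], []).append(tweet)
    return sorted(by_author, key=lambda a: favorited_share(by_author[a]), reverse=True)


def favorited_only(timeline: list[dict]) -> list[dict]:
    """Toggle: show only tweets that someone has Favorited."""
    return [t for t in timeline if t["favorites"] > 0]


# Hypothetical timeline.
timeline = [
    {"author": "alice", "text": "Lambda architecture notes", "favorites": 3},
    {"author": "alice", "text": "Just bought a latte", "favorites": 0},
    {"author": "bob", "text": "Storm topology tips", "favorites": 5},
    {"author": "bob", "text": "Beltway accident at exit 27", "favorites": 1},
    {"author": "carol", "text": "Touched down at JFK", "favorites": 0},
]
print(rank_people(timeline))
print(len(favorited_only(timeline)))
```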

You might need to change the name of this from “Favorite this Tweet” to “Like this Tweet” (and make the Star “button” visible even if you do not “mouse over” it). However, it would let me turn down the flood of “Just bought a latte at #Sbux” tweets and instead let me see Tweets that others recommend as valuable.

The Third (and Final) Part of the Answer: Enable Scheduled Delivery of Matching Feeds

If I really want to harness the power of Twitter to create value, I should be able to feed off its network without having to actively visit the site. Why? Because otherwise I have to take time out to go to (or ping) Twitter myself (I don’t have the time to do this daily, let alone every hour, on the hour).

Let me schedule when I want Twitter to share information with me (e.g., daily, hourly) and specify what format I want (e.g., email, XML file). When combined with filtering and meritocracy, this enables Twitter to become a Business Service for Marketing, Business Development, Customer Service, Loyalty Management, Competitive Intelligence, etc. that I could integrate into my enterprise, i.e., I WOULD PAY FOR THIS SERVICE.
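A scheduled delivery is little more than "collect matches, check whether the interval has elapsed, then format and send." The sketch below covers the checking and the XML formatting using only the Python standard library; the element names (`digest`, `tweet`) and function names are assumptions, and the actual sending step (email, file drop) is left out.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta


def is_due(last_sent: datetime, now: datetime, every: timedelta) -> bool:
    """True once at least one full delivery interval has passed."""
    return now - last_sent >= every


def build_digest_xml(matches: list[dict]) -> str:
    """Serialize matched tweets into a small XML document."""
    root = ET.Element("digest", count=str(len(matches)))
    for match in matches:
        item = ET.SubElement(root, "tweet", author=match["author"])
        item.text = match["text"]
    return ET.tostring(root, encoding="unicode")


# Hypothetical hourly schedule and one matched tweet.
last_sent = datetime(2014, 6, 3, 11, 0)
now = datetime(2014, 6, 3, 12, 5)
if is_due(last_sent, now, every=timedelta(hours=1)):
    print(build_digest_xml([{"author": "alice", "text": "Beltway jammed at exit 27"}]))
```

Swapping `build_digest_xml` for an email body or a JSON payload would not change the scheduling logic, which is what makes the delivery format a user preference rather than a separate service.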

Why this needs to be done INSIDE Twitter

Many would argue that these recommendations are add-on services that can be done through external applications like Twitterhawk. Theoretically this is true. However, (practically) it requires external parties to make continuous RESTful GET calls to Twitter, inflating traffic, causing Fail Whales and generally making performance miserable for everyone. Twitter can do this much more efficiently from the inside (and create revenue-driving services at the same time).
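A back-of-the-envelope calculation shows why external polling scales badly compared with delivery from the inside. The numbers below are purely hypothetical and are not measurements of Twitter's traffic.

```python
SECONDS_PER_DAY = 86_400


def polling_requests_per_day(clients: int, interval_seconds: int) -> int:
    """GET calls per day when every external client polls on a fixed
    interval, whether or not anything new has been posted."""
    return clients * (SECONDS_PER_DAY // interval_seconds)


def push_deliveries_per_day(clients: int, matches_per_client: int) -> int:
    """Messages per day when the service only sends actual matches."""
    return clients * matches_per_client


# Hypothetical: 10,000 subscribers polling once a minute,
# versus 50 matching tweets pushed to each of them per day.
print(polling_requests_per_day(10_000, 60))
print(push_deliveries_per_day(10_000, 50))
```

With polling, the request volume depends on how often clients ask; with internal delivery, it depends only on how often something relevant actually happens.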

I state this emphatically because it has already been done elsewhere with great success. I have personally created systems that have monitored thousands of transactions per second and automated responses based on rules. Every Stock Market Brokerage from NYC to Tokyo has been doing this for years at even higher transaction rates (imagine using a Twitter model to follow every stock trade).

Five Years Later

Looking back five years later, much has changed, and much is still the same. Twitter (via BackType) pioneered the Lambda Architecture model, enabling them to do much more with data, and to (mostly) kill the Fail Whale. However, even five years later, with all of Twitter’s developer APIs, business users are still complaining about how hard it is to get the most important content from Twitter, just as it is happening. Right now (as of 2014), the only way to do this is to hire a social media analytics provider or build your own Lambda architecture using Twitter mentions and streaming feeds.