
Data Scientists vs. Data Engineers: Facts vs. Interpretation

Some of the things we build at work are closed-loop, Internet-scale machine learning micro-services. We have created algorithms that run in milliseconds and that we can invoke via REST calls, thousands of times per second. We have also created data pipelines that process new (mostly sensor) data and build and publish new models when critical thresholds are reached. This work requires the collaboration of two very in-demand specialists: Data Scientists and Data Engineers.

Contrary to the classic Math vs. Coding vs. Domain Expertise Venn diagram, Data Scientists and Data Engineers share many similarities. Both love data. Both have domain expertise. Both are great functional programmers. Both are good at solving complicated mathematical problems, discrete and continuous alike. Both use many similar tools and languages (in our case, Spark, Hadoop, Python and Scala).

However, over the past two years, as we have improved the collaboration between the two roles to build better machine learning services, we have found some key differences between them. These differences are not just based on skill set or disposition. They also include different areas of responsibility that are essential to creating fast, scalable, and accurate machine learning services.

It is easy to muddle raw data with deterministically derived data and algorithmically derived data. Raw data never changes. Rules may change but are easy to manage with clean version control. However, even the same deterministic algorithm can produce different results (one example: whenever you refit or rebuild a model using new data, your results can change). If you are building algorithmic services, you need to keep everything clean and separate. If not, you cannot cleanly “learn” from new data and continuously improve your services.
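One way to picture this separation is to give each kind of data its own type and its own version tag. The sketch below is only an illustration under stated assumptions: the class names, the unit-conversion rule, and the "model" (a plain mean) are hypothetical stand-ins, not the services described here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class RawReading:
    """Raw sensor data: stored once, never modified."""
    sensor_id: str
    timestamp: int   # seconds since epoch
    value: float     # degrees Celsius (assumed unit)


@dataclass(frozen=True)
class DerivedFact:
    """Deterministic output: same raw input + same rule version => same value."""
    sensor_id: str
    rule_version: str
    value: float


@dataclass(frozen=True)
class ModelEstimate:
    """Algorithmic output: depends on the data the model was fitted on."""
    sensor_id: str
    model_version: str
    value: float


RULE_VERSION = "celsius-to-fahrenheit-v1"  # hypothetical rule name


def derive(reading: RawReading) -> DerivedFact:
    # A deterministic rule that lives under version control.
    return DerivedFact(reading.sensor_id, RULE_VERSION, reading.value * 9 / 5 + 32)


def fit_baseline(facts: list[DerivedFact], model_version: str) -> ModelEstimate:
    # Stand-in for a real model: the "fit" is just the mean of the training data.
    return ModelEstimate(facts[0].sensor_id, model_version, mean(f.value for f in facts))


raw = [RawReading("s1", t, v) for t, v in [(0, 20.0), (60, 22.0), (120, 30.0)]]
facts = [derive(r) for r in raw]

first_fit = fit_baseline(facts[:2], "m1")  # fitted before the third reading arrived
refit = fit_baseline(facts, "m2")          # same algorithm, more data

print(first_fit.value, refit.value)        # the two estimates differ
```

The raw readings and derived facts are identical in both runs; only the model estimate moves, and its `model_version` records which fit produced it.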

We have found a very nice separation of responsibility that prevents muddling things:

  • Our Data Engineers are responsible for deterministic facts
  • Our Data Scientists are responsible for interpretation of these

This boils down to this: deterministic rules are the purview of engineers while algorithmic guesses come from scientists. This is a gross simplification (as both roles deal in many, many complexities). However, this separation keeps things very clear, not only in determining “who does what” but also in preventing errors, guesses, and other unintended consequences that pollute data-driven decision-making.

Let’s take Google Now’s “Where you parked” service as an example. Data Engineers are responsible for processing the streaming sensor updates from your phone, combining this with past data, determining motion vs. at rest, factoring out duplicate transmission, geospatial drift, etc. Data Scientists are responsible for coming up with the algorithm to determine whether your detected stop state is a place where you parked (vs. simply being at work, at home, or at a really bad stop light). Essentially, Data Engineers capture and process the data to extract required model features while Data Scientists come up with the algorithm to interpret these features and provide an answer.
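The split in this example can be sketched as two functions with a clear hand-off between them. Everything below is hypothetical: the `Ping` and `StopFeatures` types, the thresholds, and the parking heuristic are illustrative assumptions and say nothing about how Google Now actually works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ping:
    timestamp: int     # seconds since epoch
    speed_mps: float   # reported speed in metres per second


@dataclass(frozen=True)
class StopFeatures:
    start: int
    duration_s: int
    speed_before_mps: float  # last moving speed seen before the stop


AT_REST_MPS = 0.5  # assumed threshold below which the phone counts as "at rest"


def extract_stops(pings: list[Ping]) -> list[StopFeatures]:
    """Engineering side: deterministic clean-up and feature extraction."""
    # Factor out duplicate transmissions (same timestamp) and order by time.
    unique = sorted({p.timestamp: p for p in pings}.values(), key=lambda p: p.timestamp)

    stops: list[StopFeatures] = []
    stop_start = last_rest = None
    last_moving_speed = speed_before = 0.0
    for p in unique:
        if p.speed_mps <= AT_REST_MPS:
            if stop_start is None:          # a new stop begins
                stop_start, speed_before = p.timestamp, last_moving_speed
            last_rest = p.timestamp
        else:
            if stop_start is not None:      # motion resumed: close the stop
                stops.append(StopFeatures(stop_start, last_rest - stop_start, speed_before))
                stop_start = None
            last_moving_speed = p.speed_mps
    if stop_start is not None:              # stream ended while still at rest
        stops.append(StopFeatures(stop_start, last_rest - stop_start, speed_before))
    return stops


def looks_like_parking(stop: StopFeatures) -> bool:
    """Science side: a placeholder heuristic, not a trained model."""
    arrived_by_vehicle = stop.speed_before_mps > 5.0  # faster than walking (assumed)
    long_enough = stop.duration_s >= 300              # longer than a stop light (assumed)
    return arrived_by_vehicle and long_enough


pings = [Ping(0, 15.0), Ping(60, 12.0), Ping(120, 0.0), Ping(120, 0.0),
         Ping(180, 0.0), Ping(480, 0.1), Ping(540, 1.2)]
for stop in extract_stops(pings):
    print(stop, looks_like_parking(stop))
```

Because `extract_stops` is deterministic, it can be tested and versioned like any other rule, while `looks_like_parking` can be swapped for a fitted model without touching the pipeline that feeds it.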

Once you have this separation down, both teams can collaborate cleanly. Data Scientists experiment with and test algorithms while Data Engineers design how to apply them at scale, with sub-second execution. Data Scientists determine what approach is used to build models (and what triggers model optimization, build and re-fitting). Data Engineers build a seamless implementation of this. Data Scientists build algorithm prototypes and MVPs; Data Engineers scale these into fast, reliable services. Data Scientists worry about (and define rules to exclude) outliers that would wreak havoc on F-tests; Data Engineers implement defensive programming and automated test coverage to ensure unplanned data does not wreak havoc on production operation.
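The last pairing above can be sketched in a few lines. This is a minimal illustration under assumptions: the record layout (a dict with a `value` key) is made up, and Tukey's IQR rule is just one common outlier rule, chosen here as a stand-in for whatever rule the scientists actually define.

```python
import math
from statistics import quantiles


def is_valid_reading(record: dict) -> bool:
    """Engineering side: reject malformed input before it reaches a model."""
    value = record.get("value")
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return math.isfinite(value)  # rejects NaN and infinities


def exclude_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Science side: Tukey's IQR rule (an assumed choice of outlier rule)."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if low <= v <= high]


incoming = [
    {"value": 9.5}, {"value": 10.0}, {"value": "n/a"}, {"value": 10.2},
    {"value": float("nan")}, {"value": 10.5}, {"value": 10.8}, {},
    {"value": 11.0}, {"value": 11.2}, {"value": 250.0},
]

clean = [r["value"] for r in incoming if is_valid_reading(r)]  # defensive filter
kept = exclude_outliers(clean)                                 # statistical filter
print(len(incoming), len(clean), len(kept))
```

The two filters answer different questions: the first asks whether a record is usable at all, the second asks whether a usable value should influence the model.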

On the 45th Anniversary of the Moon Landing: 5 Lessons the Apollo Program Manager taught me at MIT

I originally posted a version of this five years ago, on the 40th Anniversary of the Apollo Moon landing. At that time, social media and smartphones were just starting to explode. Today, as social sharing and mobile are giving rise to IoT, these lessons from 1969 are perhaps even more important.

Putting things in perspective

It is easy to feel really proud of our accomplishments, whether we are scaling a consumer application 1,000-fold in one year, rolling out a huge ERP program or even creating a new technology. However, these accomplishments pale in comparison to what the Apollo, Gemini, and Mercury Missions achieved 45 years ago. Imagine this scenario:

You are listening to the radio and the President announces that the country is going to put a man on the Moon by the end of the decade. Keep in mind that no one has ever even escaped low Earth orbit, let alone escaped Earth’s gravity, executed Hohmann transfers AND navigated to another body. Now you have to implement the largest engineering project in history, while inventing not only technologies, but also whole fields of study. All under the watch of the press, and all completed within one decade.

This is inconceivable to most of us in our work today. It is inspirational.

Success: One small step for man, one giant leap for mankind. (Credit: NASA)

My lucky exposure to the people of Apollo

At the time I studied aerospace engineering at MIT, we were lucky enough to have several veterans of the Apollo Program on staff as our instructors. Not only were they great instructors; they also could recount first-hand experiences of events that the rest of us could only read about in the history books.

One of these professors was Joe Shea, the original Program Manager of NASA’s Apollo Program (portrayed by Kevin Pollak on HBO’s excellent series, “From the Earth to the Moon”). Contrary to what that series depicted, it was Joe who came up with the concept of splitting the Apollo Program into missions that each achieved a never-before-achieved technological marvel.

Joe is also considered by some a founder of the Systems Engineering profession (many consider him the greatest systems engineer who ever lived). This made him the perfect person to teach the capstone class of the aerospace curriculum: Systems Engineering (Fred Wilson of USV has written a great post on how fun Systems Engineering is and how important it is for engineering leadership). Every year, he would get a project from NASA and guide his students through all aspects of design, simulation, planning and even cost analysis. Our midterms and finals were real-life presentations to the Administrator of NASA.

Under Joe, I got to work on something called “Project Phoenix”: returning to the Moon, but now with a re-usable capsule, landing four astronauts at the pole and keeping them there for 30 days (a much harder prospect). In this project I learned about everything from active risk management to critical path costing to lifting bodies to Class-E solar flares. (How cool was that for a 20-year-old?)

Life lessons I learned from Joe

The technical things I learned from Joe got me my first job at Lockheed Martin (then GE Aerospace). It was great to be able to say that I had worked on a NASA program, helped create both a PDR (Preliminary Design Review) and CDR (Critical Design Review) and present elements of them to the Administrator of NASA in Washington.

However, I learned five much more important lessons, independent of aerospace or any other technology, that I have used in the twenty-three years since:

  1. Break Big Challenges into Small Parts. Any obstacle can be overcome if you break it down into smaller items. If these are too large, break them down again. Eventually you will get to things that have clear, straightforward paths to success. Essentially this is the engineer’s version of “a journey of a thousand miles begins with a single step.”
  2. Know Your Stuff Inside and Out. You cannot be a technology leader who only manages from above. You must understand how the components work. This is the only way you will see problems before they happen. Remember, you are the leader who is the only one positioned to connect the “Big Picture” to the execution details.
  3. S#!% Happens. Things break. Schedules are late. People leave the project. Plan for this. Ask yourself every week what can go wrong. Put contingency plans together to address the biggest or most likely of these. Today, this is done in everything from Risk Management to DevOps.
  4. There is No Such Thing as Partial Credit. Yes, unlike a rocket, you can “back out” (essentially un-launch) software. However, the costs of this type of failure are enormous: not only does it cost 3-5x more to back out, fix and regression test changes, it also frequently results in lost revenue and customers. Get things right in development, then certify them in testing (not the other way around). Don’t count on being able to “back out” after a failed launch; this will become more and more true as we push software to the millions of “things” comprising IoT. Joe hammered this lesson into our heads with a chilling story: when people forgot this and rushed, three astronauts died during a basic systems test on Apollo 1.
  5. Take Ownership. If you are the leader, you are responsible for the team’s or product’s success. If you are a line manager, you are not only responsible for your area but are being relied upon by your peers for success. If you are a hands-on analyst or engineer you are actually delivering the work that leads to success. In all cases, ensure you do your job right, ask for help when you need it and never lie or hide anything.

Five really important lessons. I am grateful I had the opportunity to learn them before I entered the full-time workforce. I try to “pay this back” by teaching these lessons and concepts everywhere I go.

Before I forget…

Thank you to the men and women of Apollo. Thank you also to the men and women of Gemini and Mercury (it is easy to forget them on this day). You achieved miracles on a daily basis and inspired whole generations of scientists and engineers.