What the GOP Debate Taught Us About Machine Learning

While large swaths of Twitterdom were playing drinking games during the GOP Debate two nights ago, we had some fun of our own playing with our machine learning classifier. Here’s what we learned:

What worked out well

Our classifier really excelled in two important categories: candidate identification and sentiment. We expected it to distinguish candidates correctly, since that’s one of the easier tasks for a machine learning classifier. In other words, if someone says “Carly,” you’d hope to see “Carly Fiorina” come back as a confident prediction. And so it was. The cool part was seeing common misspellings and tangential information help our model. “Turmp,” for example, is a great predictor for Donald Trump. “Ron” (as in Ron Paul) came up often enough to help our classifier assign those tweets to Rand Paul, since tweets mentioning Ron Paul commonly compared father to son. In the end, our classifier sat at 84% confidence identifying candidates, and with about 20 minutes of tuning, we were able to get it to 87%.

Now, let’s move to sentiment. For starters, candidates’ official hashtags (things like #standwithrand) were strong positive predictors. Those hashtags are essentially hunky-dory echo chambers where supporters and like-minded folks trumpet their candidates’ bona fides. The negative predictors are a bit more interesting: past the usual invective, many of our strongest predictors here concern issues on which the Republican party isn’t necessarily known for visionary leadership. “Climate,” for example, reflects the fact that most of the candidates in these debates pooh-poohed doing much of anything about the environment and focused instead on issues like terrorism or immigration.
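The post doesn’t say what kind of model sits behind our classifier, so treat the following as an assumption: a minimal multinomial Naive Bayes sketch in Python, trained on a handful of made-up tweets. It shows how a misspelling like “Turmp” or a tangential name like “Ron” becomes a predictor simply because it co-occurs with a label in the training data. The `NaiveBayes` class, the tokenizer, and the training tweets are all hypothetical, and “confidence” is taken here to mean the top label’s probability, which is one common reading rather than a definition from the post. The same setup would work for sentiment if the labels were swapped for positive/negative, with hashtags like #standwithrand kept as tokens.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on anything that isn't a letter, digit, or '#'."""
    return re.findall(r"[a-z0-9#]+", text.lower())


class NaiveBayes:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict_proba(self, text):
        """Return {label: probability}; words never seen in training are ignored."""
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.label_counts.values())
        log_scores = {}
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)  # smoothed log likelihood
            log_scores[label] = score
        # Normalize log scores into probabilities (subtract the max for stability).
        top = max(log_scores.values())
        exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
        z = sum(exp_scores.values())
        return {k: v / z for k, v in exp_scores.items()}

    def predict(self, text):
        """Return (best_label, confidence), where confidence is the top probability."""
        probs = self.predict_proba(text)
        best = max(probs, key=probs.get)
        return best, probs[best]


# Made-up training tweets, for illustration only.
train = [
    ("Carly crushed that answer", "Carly Fiorina"),
    ("Fiorina on foreign policy tonight", "Carly Fiorina"),
    ("Turmp says he will build a wall", "Donald Trump"),
    ("Trump interrupting again", "Donald Trump"),
    ("#standwithrand great point on surveillance", "Rand Paul"),
    ("Rand is no Ron, but still solid", "Rand Paul"),
]
model = NaiveBayes().fit([t for t, _ in train], [l for _, l in train])

label, confidence = model.predict("Turmp just said what?")
print(label, round(confidence, 2))  # label is 'Donald Trump'
print(model.predict("Ron Paul would be proud")[0])  # 'Rand Paul'
```

Nothing in the model knows that “Turmp” is a typo or that Ron is Rand’s father; the tokens simply appeared alongside those labels often enough to tip the probabilities.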

What didn’t work out quite as well, and how we can fix it

Subject matter proved far more difficult to classify accurately. Our model hovered around 60% confidence, but the reasons why are actually quite interesting. So why doesn’t the subject matter classifier work as well as the others? Well, this might shock you, but a lot of tweets are just pronouncements of context-free opinion. You know, stuff like this:

Donald Trump: “People respect what I say.” Yeah… About that… #GOPDebate

— Jordan Peeples (@jopeeps31) December 16, 2015

That’s not really about anything past the fact that our intrepid tweeter is expressing a normal human emotion: Trump-induced nausea. Similarly, the following:

Best opening speeches: Marco Rubio & Carly Fiorina #GOPDebate

— Justin Russ (@coachjustinruss) December 16, 2015

Our classifier rightly thinks this is a positive statement (“best” happens to be a fairly good predictor here), but there’s no real meat on the bone. It’s just a statement of opinion. Our classifier bucketed these tweets into “none of the above” (correctly, we might add), but since so much of our data falls into this grouping, the model doesn’t have enough data to confidently categorize some of the other major issues. Unlike candidates’ names, the issues themselves have a large associated lexicon. Words like “ISIS” give the model confidence that a statement is about terrorism because ISIS has been a constant in all the debates we built our training data from. But what about “San Bernardino”? Since that attack was so recent, and since it happened after we last trained this model, our classifier has no way of knowing what to do with it.

This is why data scientists put so much time into retraining their algorithms. Machine learning models don’t read the news or keep current on world events: they are trained by people who do. To reiterate a common refrain, this is human-in-the-loop machine learning, and this is a great, if fairly basic, example of how it works. As of this writing, we’re running another Appen job in which our contributors will classify the same tweets our model did, except they’ll only be looking at the tweets our model isn’t confident about. By using those tougher rows to further tune our algorithm, we’ll be adding training data that’s brand new or particularly tricky, which is actually some of the best training data you can use. In short: our model already knows that “Ben” means “Ben Carson,” but it doesn’t know that “San Bernardino” was a terrorist attack. After a bit of retraining, however, it will.
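Here’s a rough Python sketch of that retraining loop, built on assumptions of our own rather than our actual pipeline: each prediction carries a top-label confidence, rows below a hypothetical 0.7 threshold are routed to contributors (a simple form of uncertainty sampling), and their labels are appended to the training set. The `unseen_terms` helper shows why “San Bernardino” stumps the model: none of its tokens appeared in training, so the model has nothing to go on. All function names, the threshold, the tweets, and the confidence values are illustrative, not real results.

```python
import re


def tokenize(text):
    """Lowercase and split on anything that isn't a letter, digit, or '#'."""
    return re.findall(r"[a-z0-9#]+", text.lower())


def select_for_review(predictions, threshold=0.7):
    """Uncertainty sampling: keep rows whose top-label confidence is below threshold.

    The 0.7 default is an arbitrary, hypothetical cutoff.
    """
    return [p for p in predictions if p["confidence"] < threshold]


def unseen_terms(text, vocab):
    """Tokens absent from the training vocabulary; the model has learned nothing about them."""
    return [t for t in tokenize(text) if t not in vocab]


def add_human_labels(train, reviewed, human_labels):
    """Append contributor-labeled rows as (text, label) pairs to the training data."""
    return train + [(p["text"], human_labels[p["text"]]) for p in reviewed]


# Hypothetical data: a tiny training set from "earlier debates" plus two new predictions.
train = [("ISIS is the real threat", "terrorism"), ("build the wall", "immigration")]
vocab = {t for text, _ in train for t in tokenize(text)}
predictions = [
    {"text": "ISIS came up again", "label": "terrorism", "confidence": 0.91},
    {"text": "San Bernardino changed this race", "label": "none of the above", "confidence": 0.41},
]

reviewed = select_for_review(predictions)
print([p["text"] for p in reviewed])  # only the San Bernardino tweet
print(unseen_terms("San Bernardino changed this race", vocab))  # every token is new

# Contributors label the uncertain rows; the enlarged set is what the model retrains on.
human_labels = {"San Bernardino changed this race": "terrorism"}
train = add_human_labels(train, reviewed, human_labels)
```

Sending only the low-confidence rows to people keeps the labeling job small while targeting exactly the brand-new or tricky examples that make the best training data.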

What’s next

Since there’s a GOP debate seemingly every other day, and since Twitter basically explodes each time, we’re going to keep honing and tweaking our model. We’ll let you know how much better it gets each time, which bits it struggles with, and any particularly interesting insights we pick up along the way. Till next time, when maybe someone will go with a yellow tie or something.