Let's Make A Bet: Three Stanford Students De-Anonymize Buzzfeed’s Tennis Exposé


THE SCANDAL FIRST SURFACED when Buzzfeed released an article alleging that 16 unnamed professional tennis players had accepted money to manipulate their match outcomes. According to the story, more than half of the players, whose names Buzzfeed declined to release, would be competing in the Australian Open in just a few days.

Instead of playing the guessing game, undergraduates Christina Wadsworth, Russell Kaplan, and Jason Teplitz decided to de-anonymize the players Buzzfeed had found. In genuine Stanford fashion, these curious and talented individuals did the dirty work themselves and found data linking pro tennis players to questionable betting odds. On Jan. 20, they released the names of 16 professional tennis players as potential match fixers.

Using Buzzfeed’s open-sourced data, they were able to uncover the names of Grand Slam champions and top-10 ATP competitors, including Lleyton Hewitt, Igor Andreev, and Janko Tipsarević. Buzzfeed had collected betting data from seven bookmakers, all drawn from one website, covering various matches over an entire calendar year. Buzzfeed published its code and analysis, but masked the names of the players it flagged for highly suspicious match play.

Wadsworth, Kaplan, and Teplitz downloaded a subset of the same data and cleaned it so that each distinct set of odds appeared only once, replicating how they believe Buzzfeed found the names. Their methodical approach left them highly confident in the results.
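For readers curious what "distinct odds without duplicates" can look like in practice, here is a minimal Python sketch. It is not the team's code: the record layout (match_id, bookmaker, player, odds) and the sample rows are made up for illustration, and the only technique shown is dropping exact repeats while keeping first-seen order.

```python
from typing import NamedTuple


class OddsRow(NamedTuple):
    # Hypothetical record layout, assumed for illustration only.
    match_id: str
    bookmaker: str
    player: str
    odds: float


def distinct_odds(rows):
    """Return rows with exact repeats removed, keeping first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


# Made-up sample data, not real betting figures.
rows = [
    OddsRow("m1", "book_a", "Player X", 1.80),
    OddsRow("m1", "book_a", "Player X", 1.80),  # same quote collected twice
    OddsRow("m1", "book_a", "Player X", 2.40),  # odds moved: a distinct value, kept
    OddsRow("m1", "book_b", "Player X", 1.85),
]
cleaned = distinct_odds(rows)
print(len(cleaned))
```

Only the row that repeats an earlier one exactly is dropped. A change in odds from the same bookmaker is kept, since movement in the odds is the kind of signal an analysis like this would presumably look at.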

The biggest challenge they faced was acquiring data. It wasn’t as simple as clicking a ‘Download Tennis Data’ button. They had to write a program that pulled the odds off the web pages automatically, a process known as “scraping a website,” and then match everything up precisely to get the same numbers as Buzzfeed. Automating this and letting it run across tens of thousands of data points was no small feat.

By the end of it, the trio was extremely confident that the names they found were in fact the names that Buzzfeed refused to publish. Most of their time was spent not doing data collection or analysis, but verifying their results with multiple parties to ensure correctness. Picture three laptops running at 100% capacity for two days straight downloading and processing data, professors and faculty reviewing the code, and the team of three emerging from the process, sleep-deprived and successful.

Computer science has the potential to ignite a lot of positive change in the world. The team believes that looking into corruption at that level was worth their time, and they even received a response from a former pro on the circuit who had witnessed match-fixing firsthand. But unfortunately, the media often prefers reporting match results and interesting tournament anecdotes, not statistical analysis that demonstrates corruption. That’s just too heavy, right? Our lawyers would never approve.

But luckily, the subject is not being ignored. During Lleyton Hewitt’s press conference after announcing his retirement following the Aussie Open, people asked questions regarding allegations made against him. He vehemently denied any association with the matter, declaring the thought of it absurd. But at what point do the numbers speak louder than words? These technological discoveries and intellectual curiosity do tend to catalyze social, political, and organizational improvements. Tennis may need to take a lesson or two from baseball, which historically has banned multiple players for accepting money to throw games.

An end to tennis match-fixing would be a fantastic end result to this story. Historically, corrupt institutions (and there are strong indications that this may be one) don’t fix themselves. Change happens when there is outrage from the community they serve. Whether you care about the sport and its integrity like Kaplan and Wadsworth or the end to corruption in industry like Teplitz, the hope is that the story doesn’t get buried and that people continue to show their frustration to ignite change. These three Stanford undergrads would like to deliver a message: try to maintain the competitive edge of sports and be an ethical society, because “it’s something we all care about, and if you don’t, you probably should.”


MEET THE TEAM

Russell

Russell Kaplan, who co-founded TreeHacks with Teplitz, is also a Junior Computer Science major at Stanford University. Last year he interned at Dropbox, and this year he will be working on a personal project. He is a lifelong tennis fan, and his favorite player is Novak Djokovic.

Kaplan’s favorite thing about Stanford is the access to an incredible community of students and professors doing impactful work. His dream job is to work for himself doing interesting things in machine learning and artificial intelligence.

Jason

Jason Teplitz is a Junior Computer Science major at Stanford University. He worked at Google last summer and plans to work at a systems startup this summer, making data center software fast enough to analyze massive sets of real-time data. He got involved with the team’s project because of his experience gathering large data sets from websites.

Teplitz co-founded TreeHacks, Stanford’s annual major hackathon, with team member Russell Kaplan. In 10 years, he sees himself either working on a project he deeply cares about, or living by himself in the woods. He hasn’t decided yet.

Christina

Christina Wadsworth is a Sophomore at Stanford University majoring in Computer Science, focusing on artificial intelligence. This summer she will be interning at Google to work on the Android run-time team. She was inspired to work on the Buzzfeed project because of her passion for sports, her talent for coding, and her love for two of her best friends, Jason and Russell.

Having struggled with being a woman in computer science, and as this year’s TreeHacks coordinator, Wadsworth makes being a positive role model her main focus. She cares deeply about diversity. At TreeHacks this year, the organizers accepted participants at a 50:50 gender ratio. This is the first time that has happened at a major college hackathon.


To see the complete list of names, see the team’s blog post at:

HTTPS://MEDIUM.COM/@RKAPLAN/FINDING-THE-TENNIS-SUSPECTS-C2D9F198C33D#.VOYFOP204

