Meet the American Running a Soccer Analytics Research Group in Belgium
And let’s learn something about Bayesian approaches to in-game win probabilities while we’re at it.
Here are a few unlikely things:
An influential research center on soccer analytics based at a university, which isn’t the kind of setting where you tend to find the cutting-edge work in the data science of sports.
An influential research center on soccer analytics based at a university in Belgium, a country with a weak domestic club game in spite of its abundantly talented national team.
An influential research center on soccer analytics based at a university in Belgium run by an American, a Wisconsinite professor in machine learning and artificial intelligence.
Yet Jesse Davis is part of the Declarative Languages and Artificial Intelligence Lab and runs its subsidiary, the Sports Analytics Lab, at KU Leuven. This laureled, almost-600-year-old institution half an hour east of Brussels has been the home for some of the most advanced research in a sport that has taken a hard turn towards analytics in the last decade—after trying very hard to ignore it at first.
Davis’s group consists of post-doctoral researchers, PhD and master’s students, who work closely with companies like SciSports, a Dutch soccer data intelligence company that supports clubs and players directly. Of course, the Sports Analytics Lab also talks to Oud-Heverlee Leuven, the local top-tier professional team. It has had interactions with the data analysts at Liverpool FC, FC Barcelona and elsewhere, although Davis can’t say too much about the work they have done with clubs. And the lab is about to publish a paper co-authored by a data analyst at the Belgian federation about the risk and reward of pressing. Sometimes, the federation will give Davis an area that it would like researched, which students can then take on as a project or thesis.
But what is remarkable is that an awful lot of the research done by the Sports Analytics Lab winds up in the mainstream media, crossing over from academic publishing—a notoriously difficult trick to pull off. Its work has been picked up by the likes of FiveThirtyEight and all of the major newspapers in Flanders, Belgium’s Dutch-speaking half wherein Leuven sits.
Davis grew up in Madison, another small city built around a major college, like Leuven. All his life he was besotted by sports. His family had basketball and football season tickets for UW-Madison and he swam competitively until his freshman year at Williams College. But he didn’t discover soccer until the 2002 World Cup. Several of his friends in grad school were soccer fans, and he watched Bayern Munich games with a German professor in Seattle during his post-doc.
When, in 2010, he and his Irish wife finished up their respective PhDs, they wound up in Belgium so that he could research machine learning, data mining and AI. But the pull from sports was strong, and in Europe, sports means soccer.
“There was a core group of us extremely interested in sports,” Davis recalls. “Because we found it fun and interesting. The data that you have on sports is extremely rich and complicated. And you can do lots of interesting things in AI. There’s a lot artificial intelligence can gain from analyzing sports data because innovation in these technologies is often driven by practical problems.”
The first PhD student Davis hired for his lab went on to become the data scientist for Club Brugge, one of the powerhouses of the Belgian league. By 2013, a pair of master’s students managed to get access to some data for a few top-tier teams in Belgium and from there things snowballed. The Sports Analytics Lab was founded in 2014 and got its hands on ever more data as it burrowed into the soccer community.
It began simply, by researching things like “total shot ratio,” a team’s share of all the shots taken in its games, and working out whether it related to winning. Then Davis and his students moved on to GPS tracking data on player movement on the field, looking for patterns, figuring out which kinds of plays led to shots. All the while, they were publishing papers and establishing their name.
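For the curious, total shot ratio as it is commonly defined in the analytics community is a team's shots divided by all shots in the match, which is a slightly different number from the raw gap between shots taken and shots conceded. The sketch below computes both. The function names and the choice to return 0.5 for a match with no shots are illustrative assumptions, not anything taken from the lab's code.

```python
def total_shot_ratio(shots_for, shots_against):
    """Share of all shots in a match taken by one team (0.0 to 1.0)."""
    total = shots_for + shots_against
    if total == 0:
        # Assumption: treat a match without any shots as an even split.
        return 0.5
    return shots_for / total


def shot_difference(shots_for, shots_against):
    """Raw gap between shots taken and shots conceded."""
    return shots_for - shots_against


# A team that takes 15 shots and concedes 5:
print(total_shot_ratio(15, 5))   # 0.75
print(shot_difference(15, 5))    # 10
```

A ratio above 0.5 means the team out-shot its opponent. Whether that translates into winning is the empirical question the early research set out to answer.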
Since then, the lab has worked on predicting match outcomes, preventing injuries by tracking workloads, automatically generating match reports and building a model that identifies playing styles. It has published papers on things like the ideal areas for taking, or passing up, long-distance shots, in an effort to identify optimal behaviors. Other paper titles include “A Bayesian Approach to In-Game Win Probability,” “Analyzing Learned Markov Decision Processes Using Model Checking for Providing Tactical Advice in Professional Soccer” and “Valuing On-the-Ball Actions in Soccer: A Critical Comparison of xT and VAEP.” (Helpfully, the lab runs an excellent blog that explains things in layman’s terms, in case you really want to understand that Bayesian approach.)
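To give a flavor of what a Bayesian in-game win probability can look like, here is a deliberately simple sketch. It is not the lab's model. It assumes each team's scoring rate has a Gamma prior, updates that prior with the goals scored so far in the match, and uses the resulting negative binomial distribution to predict the goals still to come. The prior values (1.4 goals per 90 minutes, with the weight of one match) and all function names are hypothetical choices made for illustration.

```python
import math


def remaining_goals_pmf(k, goals_so_far, minutes_played, minutes_left,
                        prior_shape=1.4, prior_rate=1.0):
    """Probability of exactly k more goals under a Gamma-Poisson model.

    The prior (hypothetical values) says a team scores about
    prior_shape / prior_rate goals per 90 minutes.
    """
    shape = prior_shape + goals_so_far          # posterior shape
    rate = prior_rate + minutes_played / 90.0   # posterior rate, in 90-minute units
    exposure = minutes_left / 90.0              # time still to play
    p = rate / (rate + exposure)
    # Negative binomial pmf, which is the posterior predictive distribution.
    log_coeff = (math.lgamma(shape + k) - math.lgamma(k + 1)
                 - math.lgamma(shape))
    return math.exp(log_coeff) * p ** shape * (1.0 - p) ** k


def win_draw_loss(home_goals, away_goals, minutes_played,
                  total_minutes=90, max_extra=12):
    """Return (home win, draw, away win) probabilities at this moment."""
    minutes_left = max(total_minutes - minutes_played, 0)
    win = draw = loss = 0.0
    for h in range(max_extra + 1):
        p_h = remaining_goals_pmf(h, home_goals, minutes_played, minutes_left)
        for a in range(max_extra + 1):
            p_a = remaining_goals_pmf(a, away_goals, minutes_played, minutes_left)
            diff = (home_goals + h) - (away_goals + a)
            if diff > 0:
                win += p_h * p_a
            elif diff == 0:
                draw += p_h * p_a
            else:
                loss += p_h * p_a
    total = win + draw + loss   # renormalize after truncating at max_extra
    return win / total, draw / total, loss / total


# Home side leads 1-0 at halftime.
print(win_draw_loss(1, 0, 45))
```

The appeal of the Bayesian framing is that the estimate comes with uncertainty built in: early in a match the prior dominates, and as the minutes tick by the score on the board takes over. A serious model would also account for team strength, red cards and other match events, all of which this sketch ignores.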
What intrigues about soccer analytics is that, on the one hand, it seems to be advancing at warp speed. But, on the other, there is still no real consensus in its tight-knit community on what data matters and what doesn’t. Soccer is so fluid that it remains difficult to parse what factors combine to produce high-value scoring chances, given the endless variables in any opportunity to score a goal.
“I’m not sure soccer lends itself to the same level of statistical detail as basketball or baseball,” Davis concedes. “Baseball has these one-off interactions between the batter and pitcher and it’s very natural to track these events. But in soccer it’s a little bit unclear what you should be keeping track of, what is meaningful. Because a lot of the things that are of interest are off the ball and it’s difficult to quantify those. So it’s not natural for someone to sit there and come up with the vocabulary to describe in a detailed way what’s going on on the pitch.”
Tracking data has attracted a lot of attention from analysts for its obvious potential. But it remains a problematic area because it doesn’t capture which way a player faces, Davis points out. That information is crucial in a sport that offers 360 degrees of possibilities in almost every scenario.
But the point of Davis’s lab is to work through these problems in a transparent way. Most of what it does is open source. Almost all of its big projects are published with accompanying code bases so that others might piggyback off the work and move it forward. The Sports Analytics Lab offers a Soccer Action Package that provides tools for anybody to create their own expected goals model, or to quickly distill massive amounts of raw data from soccer’s various data purveyors. It has been downloaded more than 10,000 times.
“We try to be extremely open so that others can use it,” Davis says. “That probably comes from us being academics where you’re supposed to have things that are replicable.”
This collaborative spirit is likely what makes the lab a trusted and respected entity in the analytics community. After all, everybody else is competing. The overwhelming majority of the sport’s data analysts work for clubs, federations or the data companies. They’re all trying to beat someone, whether on the field or in business. The lack of a vested interest in any outcome other than advancing the science has made the Sports Analytics Lab a kind of neutral hub. Nobody else is doing this: consistently researching soccer data with as many researchers and then publishing the findings.
Davis isn’t interested in ranking where his lab falls in the hierarchy of its field. He quickly deflects credit for its progress and publications to his students and coworkers. He’s onto the next frontier in computer science: verification in AI, helping the software to identify things more accurately by verifying its findings. Here, too, he sees soccer as a useful venue to advance computer science, because it offers reams of scenarios that can help to determine whether models value things correctly.
“One of the reasons that I find the sports stuff appealing is that in data science and artificial intelligence, all of the core problems occur in sports,” he says. “How do you represent the data? How do you make predictions about the future when things are uncertain? How do you understand interactions between players? These are general, reoccurring problems that people are interested in in a variety of different domains.”