An Inside-Out Attempt to Classify Playing Styles

Are Paula Badosa and Emma Navarro actually the same player?

Every so often, an analyst introduces a new way to classify playing styles. The approach usually involves taking a bunch of different stats and identifying clusters of more or less similar players. One group might be aggressive, flat hitters; another might be clay-court experts; a third might be serve-plus-forehand specialists.

Two problems. First, tennis stats tend to be highly correlated. If you’re good at one thing, you’re probably good at most other things. Second, the stats we have aren’t that great. Sometimes we get goodies like spin rate and shot speed from broadcasts, but that’s the exception. Instead, we have to build classifiers from pedestrian metrics like second-serve win rate or–at best–charting-based stats like Backhand Potency and Aggression Score. All of these things are tied to points won, which brings us back to the correlation problem.

I don’t have a solution. I do, however, have a zany idea that might just yield some insights. Instead of classifying players by the most granular metrics we have, what about identifying styles from results?

If two players have the same unexpected head-to-head against, say, Iga Swiatek, they might just have something in common. If both players have similar unexpected head-to-head records against many opponents–not just Iga–it’s probably not just a coincidence, right? We might be able to look at their playing styles and see that they are troubling opponents in similar ways, but even if we didn’t know the first thing about their skills or tactics, we could spot the parallels in their results.

The concept is simple enough. The math is not, and more importantly, the results offer more questions than answers. It’s possible this is a dead end, but I can’t see far enough around the next corner to be sure.

Welcome to the matrix

We’re going to dive into the weeds in a moment. Surely this network graph will entice you to come along?

Those are the 20 most “unique” players in the dataset. The degree of similarity between each pair of players is represented by the thickness of the line that connects them. (I know, you can’t really tell most of the lines apart.) Clara Tauson is a yellow dot because she’s by far the most unique of all.

We’ll come back to that, maybe.

Here’s how this works. I took the 60 players with the most tour-level wins since 2021 and found all the meetings among them. For each pair of players, I used pre-match Elo ratings to determine how “unexpected” the results were, and in which direction. For example, Elo ratings say Jelena Ostapenko usually has a ~20% chance of beating Iga, yet she has done so every time. Ostapenko’s score vs Swiatek, then, is +0.8, and Iga’s score for the same matchup is -0.8. Very few scores are so extreme. Most head-to-heads go roughly as expected, so they hover around zero.
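The scoring step can be sketched in a few lines. This is a minimal illustration, not the exact code behind the numbers above: it assumes the standard Elo win-probability formula, and it assumes the score is a player's actual win rate minus the average pre-match forecast. The function names and the four-meeting example are made up.

```python
def elo_win_prob(rating_a, rating_b):
    """Standard Elo forecast: probability that player A beats player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def unexpected_score(meetings):
    """Actual win rate minus average forecast win probability.

    `meetings` is a list of (pre_match_win_prob, won) pairs, one per match,
    from a single player's point of view. Averaging the per-match forecasts
    is an assumption, not a detail confirmed in the text.
    """
    n = len(meetings)
    actual = sum(1.0 for _, won in meetings if won) / n
    expected = sum(prob for prob, _ in meetings) / n
    return actual - expected


# Hypothetical head-to-head: an underdog forecast at 20% wins all four meetings.
underdog = [(0.2, True)] * 4
# The same matches seen from the favorite's side of the net.
favorite = [(1.0 - prob, not won) for prob, won in underdog]

print(round(unexpected_score(underdog), 2))
print(round(unexpected_score(favorite), 2))
```

The two printed values mirror each other (0.8 and -0.8), which is why every head-to-head contributes a positive score to one player's profile and the matching negative score to the other's.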

Each player, then, has a score against every other player. Next, I use a method called matrix factorization to analyze and compare those sets of scores. Matrix factorization is commonly used in recommendation systems–if you and I give similar ratings to a bunch of movies and I like a new movie, you’ll probably like it, too. In tennis terms, say that Players A and B have unexpected results against many of the same players. If Player A upsets Aryna Sabalenka, Player B might have a better shot than we think to knock out Sabalenka as well.

In theory, this approach should capture some things about playing style, but it doesn’t actually know anything beyond the Elo-adjusted head-to-heads. Matrix factorization looks for efficient ways to characterize the relationships between players. Those might correspond to real-world attributes like “heavy topspin” or “attackable second serve,” but they might be incomprehensible to us lowly humans.
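To make the idea concrete, here is a rough sketch of a recommender-style factorization fitted to a handful of made-up scores. It is an illustration under heavy assumptions, not the method used for this post: the text doesn't say which factorization algorithm, loss, or weighting scheme was used, so the gradient-descent loop, the `factorize` name, the hyperparameters, and the toy data are all hypothetical.

```python
import random


def predict(P, Q, i, j):
    """Reconstructed score of player i against opponent j."""
    return sum(a * b for a, b in zip(P[i], Q[j]))


def factorize(scores, n_players, k=2, epochs=2000, lr=0.05, reg=0.01, seed=0):
    """Fit scores[(i, j)] ~ P[i] . Q[j] using only the head-to-heads that exist.

    `scores` maps (player, opponent) to (score, weight). Unplayed pairs are
    simply absent, so they never enter the loss.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_players)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_players)]
    observed = list(scores.items())
    for _ in range(epochs):
        rng.shuffle(observed)
        for (i, j), (score, weight) in observed:
            err = score - predict(P, Q, i, j)
            for f in range(k):
                p, q = P[i][f], Q[j][f]
                # Weighted squared-error gradient step with a small L2 penalty.
                P[i][f] += lr * (weight * err * q - reg * p)
                Q[j][f] += lr * (weight * err * p - reg * q)
    return P, Q


def weighted_sq_error(scores, P, Q):
    return sum(w * (s - predict(P, Q, i, j)) ** 2 for (i, j), (s, w) in scores.items())


# Toy data: players 0 and 1 over- and under-perform against the same opponents.
# The 0.5 weight stands in for a one-meeting head-to-head (a made-up value).
half = {(0, 2): (0.5, 1.0), (1, 2): (0.5, 1.0), (0, 3): (-0.4, 1.0), (1, 3): (-0.4, 0.5)}
scores = dict(half)
for (i, j), (s, w) in half.items():
    scores[(j, i)] = (-s, w)  # the opponent's score is the mirror image

P, Q = factorize(scores, n_players=4)
baseline = sum(w * s ** 2 for s, w in scores.values())
fitted = weighted_sq_error(scores, P, Q)
print(f"error predicting zero everywhere: {baseline:.3f}")
print(f"error after factorization:        {fitted:.3f}")
```

Each row of `P` is a player's position on the latent factors. Nothing in the fitting loop knows what a forehand is, which is exactly why the factors may or may not line up with anything we would call a style.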

Mostly incomprehensible

The algorithm decided that the cleanest solution was to divide the 60 players into ten categories. I’ve numbered them, but the order doesn’t matter. Here’s one:

1: Azarenka, Bencic, Kalinskaya, Kasatkina, Kostyuk, Mertens, Parry, Rybakina

    Ok… some flat hitters (except Kasatkina), nobody who likes taking a lot of risk… you can sort of see what’s behind this one. Next:

    2: Alexandrova, Anisimova, Frech, Kontaveit, Muchova, Pegula, Schmiedlova, Vekic

    Flat hitters who swing big, though I wouldn’t have put Pegula in this group. Muchova isn’t a great fit either. Another one:

    3: Begu, Krejcikova, Kvitova, Linette, Maria, Siniakova, Svitolina, Swiatek, Tomljanovic, Vondrousova, Qinwen Zheng

    Ah yes, those noted twinsies, Iga Swiatek and Petra Kvitova. Matrix factorization works in mysterious ways, I guess. Next is my favorite:

    4: Bogdan, Tauson

    Tauson, as noted above, is the most unique player in the dataset. Bogdan is not far behind. That’s the only thing they have in common, right? Onward:

    5: Badosa, Garcia, Haddad Maia, Navarro, Pliskova, Potapova, Sherif

    Almost as head-scratching as the Iga group. Are there any players you’d be less likely to group together than Caroline Garcia and Mayar Sherif? For what it’s worth, the algorithm thinks Badosa and Navarro are the two most similar players in the dataset.

    We don’t need to comment on them all, but here are the rest:

    6: Bouzkova, Kudermetova, Ostapenko, Samsonova

    7: Putintseva, Shnaider, Sorribes Tormo

    8: Blinkova, Bronzetti, Collins, Fernandez, Kalinina, Parrizas Diaz, Sabalenka

    9: Cirstea, Paolini

    10: Cocciaretto, Cornet, Gauff, Gracheva, Jabeur, Keys, Osorio, Sakkari

    If these groupings were based on traditional or charting-based stats, I’d assume there was a coding error. As it is, the clusters do not inspire confidence in this alternative method.

    Style-ish

    Those groups were determined by how players rated on three “style factors” that the algorithm extracted from all those head-to-head scores. Again, we don’t know what they correspond to in the real world, but each one is associated with how players over- and under-perform their ratings.
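As a rough picture of how factor ratings become groups, here is a sketch using plain k-means. That is a stand-in: the text doesn't name the clustering procedure or say how the number of groups was chosen, so the algorithm, the fixed `k`, and the player labels and factor values below are all assumptions for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two factor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain k-means with the first k points as starting centroids.

    The fixed start keeps this sketch deterministic; a real run would
    try several random starts and compare the results.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each player to the nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical players with made-up ratings on three style factors.
factors = {
    "Player A": (0.9, 0.8, 0.1),
    "Player B": (-0.8, -0.9, 0.0),
    "Player C": (1.0, 0.7, 0.2),
    "Player D": (-0.9, -0.7, -0.1),
}
names = list(factors)
labels = kmeans([factors[n] for n in names], k=2)

groups = {}
for name, lab in zip(names, labels):
    groups.setdefault(lab, []).append(name)
print(groups)
```

Players A and C land in one group and B and D in the other, purely because their factor values sit close together. Whether that closeness means anything on court is a separate question.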

    This plot shows how players measure up on the first two style factors:

    Ostapenko and Samsonova in one corner, Sorribes Tormo (and Putintseva, and Bogdan) in the other? This might actually make some sense! From left to right, we have a very approximate ranking of most aggressive to least aggressive, though with curveballs like Marie Bouzkova on the left side (hidden just to the left of Schmiedlova) and Garcia on the right.

    Top to bottom is harder to parse. There’s some correlation between these two style factors, so there’s a whisper of aggression level there. But Paolini on top? I wouldn’t have thought there was any attribute, positive or negative, where she would stand out so much from the crowd.

    Here are scatterplots showing the first and third factors:

    And the second and third:

    I don’t know, man. Iga is hidden in the middle graph because she’s so close to Tatjana Maria. That pretty much says it all.

    A game of matchups

    Here’s the thing–this should work, right? Players have strengths and weaknesses that don’t change too much over time. They are susceptible to certain types of opponents, and they feast on others. People talk like this all the time: It’s why they say tennis is a game of matchups. Coco Gauff struggles against this sort of player, her next opponent is this sort of player. Upset watch!

    It’s possible that my zany idea does work. We could use these clusters and similarity metrics to tweak the Elo prediction for each match and see whether the results improve. Incorporating head-to-heads in pre-match forecasts barely moves the needle, but that’s mostly because there are so few meetings between most pairs of players. Looking at clusters of players increases the sample size, even if there’s a cost in precision. While we might never figure out what these style factors mean, the proof would be in the forecasts. Maybe.
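For what it's worth, one hypothetical way to set up that test is sketched below. None of this has been run on real matches: the additive nudge, the `shrink` knob, and the toy numbers are all assumptions, and the Brier score is just one reasonable way to compare two sets of forecasts.

```python
def adjusted_prob(elo_prob, cluster_scores, shrink=0.5):
    """Nudge an Elo forecast by how a player's cluster has fared against this opponent.

    `cluster_scores` holds the unexpected head-to-head scores (actual minus
    expected win rate) of the player's cluster-mates against the opponent.
    `shrink` is a made-up tuning knob; 0 recovers plain Elo.
    """
    if not cluster_scores:
        return elo_prob
    nudge = shrink * sum(cluster_scores) / len(cluster_scores)
    # Keep the result a usable probability.
    return min(0.99, max(0.01, elo_prob + nudge))


def brier(forecasts, outcomes):
    """Mean squared error of win probabilities; lower is better."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


# Made-up test set: Elo forecasts, cluster-mates' scores vs the opponent, results.
elo = [0.30, 0.70, 0.55]
mates = [[0.4, 0.2], [-0.3], []]
won = [1, 0, 1]

tweaked = [adjusted_prob(p, s) for p, s in zip(elo, mates)]
print("Elo only:", round(brier(elo, won), 3))
print("Tweaked: ", round(brier(tweaked, won), 3))
```

The toy numbers are rigged so the tweak helps, which proves nothing. A real test would need to build the clusters only from matches played before the ones being forecast, or the comparison would be contaminated by hindsight.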

    I don’t have it in me to run that test, at least not this week. Let’s imagine that we did, and that we discovered this was all a worthless exercise. Here are some possible reasons why:

    • Players change too fast. This might be why Paolini is such an outlier: She’s barely the same player she was a few years ago. Any attempt to characterize her will struggle to reconcile 2021-Jasmine with 2024-Jasmine. And she’s hardly the only one to have made noteworthy changes. What’s more, pros are plenty aware of their weaknesses. The type of opponent that bedevils Mirra Andreeva this year might be the focus of an offseason training block.
    • Players are streaky. I suspect that Tauson registers as so unique because she has won and lost in bunches. She reeled off seven in a row in January, but not because she lucked into the right types of opponents. On the flip side, she lost six straight last summer. In both cases, the streak was more about her form than about the styles on the other side of the net.
    • There’s not enough data. In theory, each player’s “profile” is a set of 59 head-to-head scores. But in the last four years, fewer than three-quarters of the possible head-to-heads have been played. Of those, about 40% consist of just one meeting. I’ve given the one-meeting scores less weight, but that’s still a lot of room for noise to take over the profiles.
    • Tennis isn’t really a game of matchups. I’m not willing to go this far, but I do believe that the “game of matchups” business is overstated. For every lopsided Ostapenko-Swiatek head-to-head, there are dozens more boring ones, 2-1 in favor of the better player. The “matchups” line is invoked more often after a match is over, as part of a post-hoc narrative. Betting lines hew far closer to style-neutral Elo ratings than to any kind of matchup/style profile.

    Like I said, more questions than answers. I started this project with ATP data, and believe it or not, the results for men were even more puzzling.

    If you think you know why Sabalenka is grouped with Anna Blinkova and Nuria Parrizas Diaz, the comments are open.
