Making Sense of the Ever-Changing USL and its Relationship with MLS

As the largest league in US Soccer, and one that functions partially as a farm system for MLS, the United Soccer League occupies an intriguing space in the domestic game. MLS clubs either set up their own developmental club or loan some of their players to a USL affiliate, making the lower league a preview of the top tier’s younger players. It also serves as an organizational preview, since the 29-club USL is facing scheduling challenges that MLS will likely face if it continues its current pace of expansion. Whether or not MLS is guiding USL on scheduling, it is likely gauging fan reactions to better anticipate its own course when it gets that large.

The USL has nearly tripled in size during the last few years, and it can be hard to keep track of league-wide trends, even for fans of a specific USL club. The league can seem even more daunting for outsiders, like MLS supporters who want to keep tabs on their club’s USL affiliate. I’ve made the following visual guide to USL Conferences and clubs’ MLS affiliates.

The above project started as my attempt to improve upon a map USL released yesterday to illustrate their new conference alignment. It was quickly criticized on Twitter for being confusing, and from my perspective it was undone by arbitrary coloration of states. For my version I started with a USL wiki page that lists city, MLS affiliate, etc. for every USL club, then added the names of every other US state and Canadian province to the dataset before loading it into Tableau. To define a conference for every state and province (even those without a team) I used rectangular selection on the map. Then I colored each one by conference and set lighter hues for those that don’t have a club right now.

I hope you find this useful. If you have questions about the dashboard, or how I developed it in Tableau, please leave them in the comments section. If you run a soccer-themed site or blog where this visual guide to the USL would be helpful to your readers, click Share in the bottom right of the dashboard, copy the embed code, and paste it into the HTML input for your own post. Then add a comment with a link to the location of the post.

Steph Curry’s Unbelievably Effective Shooting

Steph Curry is currently having the best shooting season the NBA has seen in 30 years. Throughout NBA history, the only players who ever had seasons with Effective Field Goal Percentages in his current range were Wilt Chamberlain and Artis Gilmore, 7-footers who played in eras in which they were rarely defended by someone their size. His shot volume is unprecedented, too. The record for 3-point attempts per game is held by Baron Davis, who took 8.7 in 2004-05, but Curry is launching 10.6 per game, and while Davis and most others in the top 10 weren’t making enough to justify that rate, there are arguments to be made that Curry should shoot even more often.

I wanted to see what this actually looked like in terms of Curry’s shot locations and how far into the shot clock he’s shooting. So, I plugged his stats into Tableau and came up with a hex plot that bins his attempts into areas on the floor, as well as other charts around general shot ranges and seconds left on the shot clock. By default, none of the charts will show a data point in which fewer than 4 shots have been taken, but you can change that in the bottom left:

Note that you can click on a shot range or shot clock time to filter the hex plot of shot locations. For example, if you hold down shift and select all shots taken within the first half of the shot clock (66% of his shots), you get the following result:

[Image: Curry’s shot chart for attempts within the first 12 seconds of the shot clock]

If you get too specific in those selections, you may want to change the Minimum Shots threshold set in the bottom left, as you’ll find few areas of the court with 4+ shots taken if you’re narrowing the focus to only 3-pointers taken with 0-2 seconds left on the shot clock.

While there are a few areas on the floor in which Curry’s Effective Field Goal Percentage falls below 50% (roughly the overall league average, and the midpoint of the color scheme on all the charts), the overall picture is almost unbelievable, as 75% of the data points represent an Effective FG% above 50, many of them well beyond 60% or even 70%. That’s absurd.

Effective Field Goal Percentage is more useful than standard Field Goal Percentage in conveying Curry’s accomplishments because it weights shots by how many points they are worth. A made three-pointer contributes 150% as much as any other made shot in the calculation of Effective FG%, just as it does on the scoreboard. Without that adjustment, field goal percentage looks markedly worse for players who routinely attempt threes, even though a good distance shooter is an immensely valuable commodity.
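To make the weighting concrete, here is a minimal Python sketch of the standard Effective FG% formula, (FGM + 0.5 × 3PM) / FGA. The function names and the sample shooter are hypothetical and are not taken from the dashboard’s data.

```python
def fg_pct(fgm: int, fga: int) -> float:
    """Standard field goal percentage: makes divided by attempts."""
    if fga == 0:
        raise ValueError("no field goal attempts")
    return 100.0 * fgm / fga


def effective_fg_pct(fgm: int, three_pm: int, fga: int) -> float:
    """Effective FG%: each made three counts as 1.5 made shots."""
    if fga == 0:
        raise ValueError("no field goal attempts")
    return 100.0 * (fgm + 0.5 * three_pm) / fga


# Hypothetical shooter: 10 attempts, 4 makes, every make a three-pointer.
print(fg_pct(4, 10))               # 40.0
print(effective_fg_pct(4, 4, 10))  # 60.0
```

A shooter who goes 4-for-10 entirely from three scores as many points as one who goes 6-for-10 on twos, and Effective FG% puts both at 60%.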

The other interesting aspect I alluded to before is the layout of Curry’s shots within the shot clock. As the primary shooter in an offense designed to get good shots off quickly, it is remarkable that 77% of his attempts came before the shot clock reached single digits. I do wish I had league-wide shot clock/location stats as points of comparison, but even without those figures it’s obvious that we are witnessing a truly remarkable season from Steph Curry right now.

2015 Cell Phone Buyers Guide (and how to turn Tableau custom shapes into custom buttons)

Like many people right now, I’m nearing the end of my final cell phone contract. However, it’s becoming far less common to opt for a heavily discounted handset upfront only to pay a great deal extra over the 24-month commitment. As I looked over potential phone purchases, I was overwhelmed by the diversity of options, many of which get very good word of mouth and reviews. I figured that many others are also looking to spend some of their Christmas money on a new cell phone, so I made a visualization to aid in surveying options quickly.

If you hover over any items in the dashboard (bars above, phone icon scatterplot below, or the processor, RAM, or external memory categories in between) you will highlight corresponding marks in every other dashboard element. I would recommend using this feature to select price points and other features that you would consider prerequisites for your cell phone before using the check marks to the left of the bar chart to eliminate any phones you aren’t interested in.

The drop down menus above the scatter plot then allow you to chart your chosen contenders based on the measures that matter most to you, such as average review across popular internet sites (as compiled by Engadget), megapixels by camera, screen sharpness (PPI), etc. Of course, your decision on a handset will probably come down to other features, as the data in this dashboard is hardly all-inclusive. I found it useful to use these data to narrow the field before diving into issues like speaker quality, waterproofing, and just the general feel of the phone in my hand. I hope you find value in it, too.

I’d be remiss if I didn’t mention that there are other cell phones worthy of consideration as well. LG’s $650 V10 seems to be replacing the G4 as that company’s flagship, but I couldn’t find an aggregation of its reviews on Engadget (it was probably released too recently). I didn’t give separate line items to Samsung’s S6 Edge and/or Note variants either, but the specs are close enough that you can gauge general S6 interest from the charts, then choose for yourself if those extra features are worth the money to you. I limited the budget offerings above to the critical consensus pick in that space, the 3rd Generation Moto G, but if you’re looking to only spend around $200, there are offerings from Alcatel, Asus, and Blu that you may want to look into.

Personally, after looking at this data and then researching my finalists in more detail, I couldn’t resist the Moto G. Yes, price is the only objective category in which it tops the list, but it’s remarkable that Motorola put out a phone that’s less than half the price of its competitors here which is somewhat competitive in everything but pixel density and storage space. I’ll use a 32 GB microSD card (the newest Android OS promises to integrate SD storage much better than before) to mostly fix that last issue, and I can live without the sharpest screen. Actually, a good-enough screen might be smarter than going full 4K, because the more impressive screens naturally drain batteries much faster, and the Moto G’s battery life is supposedly second to none. Also, it’s certified as waterproof, which may well come in handy as my 2 pre-school sons have been known to put small electronics into bad situations.

Of course, there are personal preferences at play here, and each of the 10 Android devices in this dashboard is probably the best fit for someone reading this (and Apple fans are generally quite happy with their iPhones regardless). I hope this guide is useful for some of you in at least narrowing down your choices to 2 or 3 handsets before taking a more subjective approach in your final choice.


As usual, I’d also like to give you a little guidance on some of the Tableau techniques I used in building this dashboard. This dashboard came from very small data, as I simply copied a table from Wikipedia’s Comparison of Smartphones page, then looked up each phone’s off-contract price on its official site and its average review as compiled by Engadget. The format of the tables was a bit awkward, but consistently so. For example, the Nexus 6P’s display was listed as “5.7″ 2560 x 1440”, and the other phones followed the same format. This is a perfect place for Tableau’s split capability. You just right-click on a dimension, choose Transform then Split, and see what Tableau does. If the standard split doesn’t get what you need, choose Custom Split to define it in a better way, and if the output is numeric, edit the field that resulted from the split and wrap it in the FLOAT function. As with any of my work (and just about anything else on Tableau Public), feel free to click download in the bottom corner of the dashboard to dive into it as a developer.
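For readers who want to see the split-then-convert idea outside of Tableau, here is a rough Python sketch of the same logic. It is only an illustration: the function names are hypothetical, and the PPI helper assumes the usual diagonal-pixels-over-diagonal-inches definition rather than anything taken from the workbook.

```python
import math
import re


def split_display(spec: str) -> tuple[float, float, float]:
    """Split a spec such as '5.7″ 2560 x 1440' into (inches, width, height)."""
    numbers = re.findall(r"\d+(?:\.\d+)?", spec)
    if len(numbers) != 3:
        raise ValueError(f"unexpected display format: {spec!r}")
    # Equivalent of wrapping each split result in FLOAT().
    size, width, height = (float(n) for n in numbers)
    return size, width, height


def pixels_per_inch(spec: str) -> float:
    """Assumed PPI definition: diagonal resolution divided by diagonal size."""
    size, width, height = split_display(spec)
    return math.hypot(width, height) / size


print(split_display("5.7″ 2560 x 1440"))  # (5.7, 2560.0, 1440.0)
```

Because every row in the source table followed the same pattern, a single rule like this covers all of the phones, which is the same reason the one-click split works in Tableau.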

So, that’s the data shaping work that went into this viz. The formatting trick that probably stands out at first is the visual cell phone theme. I achieved it by floating 1) a cell phone shape over the entire dashboard, 2) a horizontal container as a header holding text, a screenshot of icons from the top of my own Android, and the time, generated by the following calculated field:

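// 12-hour clock text; minutes are zero-padded so 3:05 does not display as 3:5
IIF(DATEPART('hour',NOW())>12,
STR(DATEPART('hour',NOW())-12)+":"+RIGHT("0"+STR(DATEPART('minute',NOW())),2),
STR(DATEPART('hour',NOW()))+":"+RIGHT("0"+STR(DATEPART('minute',NOW())),2))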

3) a vertical container with all of the other elements that occupies the blank space inside the cell phone shape.

However, the biggest formatting trick (and one that’s highly applicable to other dashboards) is central to the scatter plot at the bottom. I started with custom shapes for each company that makes cell phones (as detailed in Tableau’s own article on custom shapes). However, to my eye custom shapes can look pretty messy in a scatter plot. Instead I wanted little buttons with custom shapes in the middle. Thankfully, this is pretty easy to do, as you just have to set up a dual, synchronized axis with one’s Marks card set to custom shapes and the other on circles. Some might prefer putting the circle in the back with whatever color they want, but usually I like to park the circle in front, then set it to fully transparent and put a black border around it (which isn’t affected by the transparency). To further illustrate this, here’s another simple scatter using the same concept with fictional Super Store data:

The Planets of Star Wars

Tableau Software created a Star Wars Web Data Connector, and in the release week of The Force Awakens I couldn’t resist the opportunity to dig in and make a visualization. I was drawn to the Planets subset of the data, and came up with the above dashboard. Two visual approaches for the available data seemed obvious: diameter driving planet size (I know, area would be more accurate, but if I put pi*r^2 on the Size shelf everything but Bespin becomes absolutely minuscule), and a figure on each planet sized to represent population.
I wanted a Star Wars crawl for the instructions, and I made one through a feature on starwars.com, but I couldn’t get it to work well as a web object in the dashboard. You can click here to see the scroll I created, or make your own (it will even play the Star Wars theme). In the dashboard itself I just played with font sizes to create a faux scroll.
The Tableau tip I can pass along from my creation of this dashboard (outside of formatting to remove almost all lines) is that I used a parameter to allow users to change the measure driving the axis (as described by Tableau here) and as an added bonus I made a reference line for Earth’s value in that measure, through the following CASE statement:

CASE [Choose axis]
WHEN "Rotation Period" THEN 24
WHEN "Earth Similarity" THEN NULL
WHEN "Surface Water%" THEN 71
WHEN "Diameter (already driving size)" THEN 12475
WHEN "Orbital Period (already driving color)" THEN 365
WHEN "Population (already driving size of man on planet)" THEN NULL
END

Earth’s population, even at 4.2 billion in 1977, was much larger than that of any of the listed planets, so I left it out of the reference line field, and its Earth similarity score would be pretty much infinite, so that was out, too.
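If it helps to see the parameter-driven reference line as plain logic, the CASE statement amounts to a lookup keyed on the parameter’s current value. The Python sketch below mirrors it under that assumption; the dictionary and function names are hypothetical, while the labels and values are copied from the calculation above.

```python
from typing import Optional

# Earth's value for each axis choice; choices without a sensible
# reference value are simply left out of the table.
EARTH_REFERENCE = {
    "Rotation Period": 24,
    "Surface Water%": 71,
    "Diameter (already driving size)": 12475,
    "Orbital Period (already driving color)": 365,
}


def earth_reference(chosen_axis: str) -> Optional[int]:
    """Return Earth's value for the chosen measure, or None (Tableau's NULL)."""
    return EARTH_REFERENCE.get(chosen_axis)


print(earth_reference("Surface Water%"))    # 71
print(earth_reference("Earth Similarity"))  # None
```

Returning nothing for the two excluded measures is what makes the reference line disappear when a user picks them.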

The Driving Offensive Forces of the 2015 MLS Cup Finalists

Sunday’s MLS Cup Final should be an entertaining affair, as the potent attacks of Portland and Columbus will hopefully overcome the tendency for Cup Finals to be cagey, cautious affairs. Much of that expectation comes thanks to the five players profiled above. The shape of their contributions can be very different, as evidenced by the very different statistical profiles of their major playmaking forces, Federico Higuain (for the unfamiliar, yes he’s Gonzalo’s older brother) and Darlington Nagbe. Higuain’s role in an offense designed around overwhelming, quick attacks often has him very directly involved in the actual act of scoring. Meanwhile, the evidence of Nagbe’s influence is more subtle. His ability to retain possession forces opponents to foul him and allows the Timbers to more methodically cut up the opposing defense.

The strikers, on the other hand, have had somewhat similarly dominant scoring contributions. As detailed in the Strikers page above, Ethan Finlay’s assists and lower number of shots make him distinct from the big African strikers, but the numbers alone don’t draw much difference between those two. Watching them may unearth distinctions in their playing style, with Adi’s dominant, sometimes methodical hold-up play contrasting to some extent with Kamara’s quicker transitions.

Of course, none of this happens in isolation, and others will certainly have big impacts on the match, both positive and negative, but it’s fair to expect that one of the above players will be enormously influential in the moments that decide who will lift MLS Cup on Sunday.

On the Tableau development side of this, the table calculations required to graph a calculated metric across a rolling half season were a little on the tricky side, and I also used LODs to generate the reference lines with season averages for goals + assists per 90 minutes. The main issue I’d like to make sure you’re aware of is that it is not a good idea to run a table calculation of a calculated field (or an inherent percentage field). You will only get accurate results if you run the same table calculation function for the individual elements of that calculated field.
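To show why that matters, here is a small Python sketch comparing the two approaches for a rolling goals + assists per 90 figure. It is an illustration with made-up match data and hypothetical function names, not a reproduction of the workbook’s table calculations.

```python
def rolling_sum(values, window):
    """Sum of the current value and up to window - 1 values before it."""
    return [sum(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]


def per90_from_rolling_sums(contributions, minutes, window):
    """Recommended: run the rolling function on each element, then divide."""
    totals = rolling_sum(contributions, window)
    played = rolling_sum(minutes, window)
    return [90 * c / m if m else None for c, m in zip(totals, played)]


def rolling_mean_of_per90(contributions, minutes, window):
    """To avoid: run the rolling function on the already-calculated ratio."""
    ratios = [90 * c / m for c, m in zip(contributions, minutes)]
    sums = rolling_sum(ratios, window)
    counts = rolling_sum([1] * len(ratios), window)
    return [s / n for s, n in zip(sums, counts)]


# Made-up matches: a 10-minute cameo with 2 goals, then two full games.
contributions = [2, 0, 1]
minutes = [10, 90, 90]

print(per90_from_rolling_sums(contributions, minutes, 3)[-1])  # about 1.42
print(rolling_mean_of_per90(contributions, minutes, 3)[-1])    # about 6.33
```

The short cameo gets the same weight as a full match when the ratio is averaged directly, which is how the second figure ends up more than four times the first.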

MLS and Serie A Wages: Closer Than You Might Think, but Structured Very Differently

The four pages in the above dashboard present a few perspectives on the recent MLS and Serie A salary releases (big thanks to Flavio Fusi for gathering the Italian data for his own Serie A visualization, then letting me use them and acting as a sounding board for my viz design). I designed these visualizations to pretty much speak for themselves, but I do want to take just a few hundred words to talk through a few other insights that aren’t obvious in the charts as well as some significant caveats.

As the fourth page above lays out, the differences in roster building in these two leagues go well beyond total spending. Because of MLS’ salary cap and designated player rule, teams must have unbalanced rosters if they want to spend big, while those with smaller budgets, like league-leading New York Red Bulls and FC Dallas, have less variation in pay across their rosters than the only Serie A club that spends anywhere close to their level, Frosinone.

The differences between big spenders in MLS and Serie A are far more dramatic. Between the two leagues, MLS employs 10 of the 14 highest-paid players, but only 25 of the top 200. That top 200 is dominated by Serie A’s richest clubs, which gets to the other big difference between the two leagues. Those seven clubs, Juventus, Milan, Inter, Roma, Lazio, Napoli, and Fiorentina, outspend the dregs of their league by a wider margin than the big spenders in MLS outspend their more frugal table-mates. Big spending doesn’t buy points nearly as effectively in MLS either, but that’s better discussed with data for wages, points, and goal differential that spans multiple years for both leagues (hopefully coming soon).

Beware of diving into either the MLSPU salary release or La Gazzetta’s listing of Serie A wages on a terribly granular level, though (I’m looking at you, myriad “Most Underpaid/Overpaid Players in MLS” articles). The accuracy of the MLSPU release has been derided by MLS coaches, owners, etc. over the years, and it’s hard to think of anything that comes from a European paper prone to publishing erroneous transfer rumors as God’s honest truth. That’s why I don’t make it terribly easy to look up a specific player’s wage in any of the wage visualizations. Of course, mixing data from two completely separate sources is risky, too. Sure, there’s still some garbage-in-garbage-out risk in using these data at all, but I’m hopeful that rolling it up to clubs and binning the salaries into histograms points us toward useful insights.

However, that doesn’t mean that this data isn’t worth analyzing. MLS’ stated goal is to become a top league in relatively short order. Depending on your definition of “top league,” this probably means surpassing Serie A, the Eredivisie, or Ligue 1 (let’s just assume the Premier League, Bundesliga, and La Liga are totally out of reach). The impression I’m left with in viewing this data is that MLS is an enormously long way from surpassing Serie A’s big spenders, and would need aggressive improvements to wages on the low end of MLS rosters (filled by players worthy of those salaries) to have rosters that are of a similar quality, top to bottom, to those of clubs in the lower half of Serie A. That last bit seems attainable, but the length of time and means required to get there are hard to decipher.

2015 MLS Salaries Visualized (& You Can Add in Hypothetical Signings)

Whenever the MLS Players Union releases player salaries, I use Tableau Public to visualize them, augmenting each one with new Tableau tricks that I’ve picked up in the months which separate these releases. Another release came out a couple weeks ago, but various obligations prevented me from visualizing it right away. In the meantime, Didier Drogba was signed by Montreal, Shaun Wright-Phillips joined his brother in New York, Nelson Valdez landed in Seattle, among numerous other signings around the league.

This sparked an idea: I added in the ability for readers to insert these signings or others of their own invention. At the bottom of the dashboard you will find a place to type in first name, last name, guaranteed compensation, and choose an MLS club for Player 1, with Drogba at $4.5m (figure I found from some rumor somewhere) by default, Player 2, and Player 3.

As you’ll notice, the players you insert into the chart will be in black, while the rest are in shades of green based on their base salaries, with the darkest hue reserved for those above the current Designated Player threshold, $436,250. Every time you insert or remove a created player from a club, the chart will automatically re-sort the rosters based on wages.

There’s also a filter allowing you to flip back and forth between the 2014 and 2015 seasons. Clubs re-sort themselves based on wages when you enact this filter, too. If a club, or even players, are selected at the time you change years, the selections will carry over and show you where they were in the other season. Your created players won’t appear in 2014 though, since there isn’t much point in inserting hypotheticals into a year that has already passed.

Feel free to use all of this interactivity, then take screenshots to illustrate points on Twitter, mentioning me @StatHunting if you’d like me to see your particular usage of the dashboard.

How To Do Something Like This in Tableau

A core truth in Tableau is that you can only work with the data available to you. I consider this a strong feature for the most part, because it makes it harder for someone to manipulate data in order to support their particular viewpoint. What about when you want to ask hypotheticals like we did above, though? That is one of the many use cases for parameters in Tableau. Parameters are a very powerful way to give users a portal which they can use to input values or at least choose from a list. In this case, I created 4 parameters, then triplicated them, so that users could add three players to MLS clubs of their choosing and assign them a salary. The parameters look like this (I didn’t include Last Name in these images because it’s basically identical to First Name):

[Image: player parameters]

Then, as with all parameters, I needed a way for them to relate to my data source. In this case I needed to replace particular cells in my data source with these parameter values. I could have found an existing row in my data that was otherwise irrelevant (such as a “pool” player that isn’t assigned to a particular club), but in this case I chose to create some extra rows in my Excel data source:

Club      Last Name  First Name  Pos       Base Salary  Compensation  season
Unknown1  Unknown1   Unknown1    Unknown1  -0.001       0             2015
Unknown2  Unknown2   Unknown2    Unknown2  -0.001       0             2015
Unknown3  Unknown3   Unknown3    Unknown3  -0.001       0             2015

Every time you use a parameter you have to plug it into your data via a calculated field. In my case I made 4 different calculated fields with the following format:

[Image: club calc]

First Name*, Last Name*, and Compensation* were all organized the same way, simply changing the parameters on the right and the last use of the data field after “ELSE.” In English, we’re telling Tableau that when Club is “Unknown1,” it should be replaced with Player 1 Club, the same for “Unknown2” and “Unknown3,” and when that’s not the case, it should be left as the value of Club in that row.
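The same substitution can be sketched in Python for anyone who finds the logic easier to follow outside Tableau. This is an illustration under assumptions: rows are plain dictionaries, the function name, the “MTL” club code, and the sample player are hypothetical, and only the Drogba default ($4.5m to Montreal) comes from the post.

```python
# Placeholder club values map to the three sets of player parameters.
PLACEHOLDER_SLOTS = {"Unknown1": 0, "Unknown2": 1, "Unknown3": 2}


def apply_player_parameters(rows, players):
    """Swap placeholder rows for parameter values; leave real rows untouched.

    Every field keys on the ORIGINAL Club value, mirroring how each starred
    calculation begins with CASE [Club] rather than CASE [Club*].
    """
    result = []
    for row in rows:
        slot = PLACEHOLDER_SLOTS.get(row["Club"])
        if slot is None:
            result.append(dict(row))                 # the ELSE branch
        else:
            result.append({**row, **players[slot]})  # a WHEN branch
    return result


rows = [
    {"Club": "POR", "First Name": "Sample", "Last Name": "Player", "Compensation": 100000},
    {"Club": "Unknown1", "First Name": "Unknown1", "Last Name": "Unknown1", "Compensation": 0},
]
players = [
    {"Club": "MTL", "First Name": "Didier", "Last Name": "Drogba", "Compensation": 4500000},
    {"Club": "Unknown2", "First Name": "Unknown2", "Last Name": "Unknown2", "Compensation": 0},
    {"Club": "Unknown3", "First Name": "Unknown3", "Last Name": "Unknown3", "Compensation": 0},
]

for row in apply_player_parameters(rows, players):
    print(row["Club"], row["Last Name"], row["Compensation"])
```

Only the placeholder row changes, and it takes on whatever the user typed into the Player 1 parameters.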

The last part is simply using Club*, First Name*, Last Name*, and Compensation* where you would have used their original fields. If you’ve already built something, use Tableau’s Replace References feature, a great shortcut for situations like this. For example, you right-click on Club in the Dimensions pane, select “Replace References…”, click “Club*” and OK, then every example of [Club] in calculated fields, shelves, etc. will be replaced with [Club*]. Go back to your calculated fields after doing so, as you don’t want First Name*, Last Name*, and Compensation* to start with “CASE [Club*]” instead of “CASE [Club].”

To make the parameter players stand out a little more, I set base salary to drive color in a custom diverging, black and green scheme, the mid-point set at $0 and using the full color range. Because the base salaries in our Unknown1-3 rows were all -0.001, they will be the only data points that show up as black, while the others’ hues are driven by how close they are to the 2015 cap maximum (aka DP level), $436,250.

 

I hope MLS fans enjoy the visual above (please tweet fun uses of it to @StatHunting), and that some other Tableau developers out there can find other use cases for this particular Tableau hack.

The Medal Count of USA Women’s Soccer Across Major Competitions is Nothing Short of Dominant

The Women’s World Cup, which ended last night, has a sister competition in the Summer Olympics, which will take place in Brazil roughly 13 months from now. These two tournaments pull competitors from the same pool of countries, began at roughly the same time (the inaugural Women’s World Cup was in 1991, while women joined Olympic soccer in 1996), and have very similar structures, so here’s a visualization of all-time medal totals:

As you can clearly see, the US national team has as many gold medals across the two tournaments as any other country has total medals.

Tableau Hack: Avoid “*” in Tooltips & Other Interactive Text, Using LOD Calculations

When Tableau 9.0 was released, I looked at Level of Detail calculations and thought, “I’m sure this adds a lot of capabilities to Tableau, but I just can’t wrap my mind around them, yet.” Then I discovered that this new functionality allowed me to avoid what I feel to be one of the least intuitive parts of Tableau for end-users: the tooltip asterisk. When you create interactive text in Tableau, like a tooltip, and the Dimension driving it has more than one response, Tableau simply displays an * in that space.

In the past, a smart dashboard designer would either 1) organize their dashboards or data sources so that they were sure these *s would not pop up, 2) include text explaining what the * meant, or 3) work through very complicated calculations that mimicked LOD functionality that didn’t exist yet. Now in two simple calculated fields you can get Tableau to either display the single value for a field or give you a count of unique values present in the data.

If you’re confused, here are screen capture examples from a dashboard I published earlier this week (full dashboard at the bottom of this post):

[Image: tooltip no asterisk]

If I had simply put the [Club] dimension into that spot in the Tooltip editor, Nagbe’s would have read “POR,” while Amarikwa’s would have simply displayed an *. Since 160 of the 366 players displayed on this graph played for multiple clubs during the timeframe of my data source (2011-present), this would be far from ideal. Instead I wrote the following calculated fields:

[Image: calc fields no asterisk]

Let’s dive into the first one, since this is the new LOD calculation type. Its syntax is markedly different from many Tableau calculations, but a simple way to think about the FIXED LOD calculation is that if you want the sum of [A] per [B], then your calculation will read: { FIXED [B] : SUM([A]) }. In the case above we were looking for the distinct count of clubs per player and so my calculation was: { FIXED [Name.Href] : COUNTD([Club]) }.

True, I could have built this as one longer calculated field, but I thought clubs per player might come in handy outside of tooltips, so I left it on its own then used it in my “no * club field” calculation. This one used three functions in its own right. The uses of “+” are Tableau’s way of requesting concatenation (stringing together different text elements, for those who don’t speak Excel). Tableau concatenation doesn’t allow the mixing of strings and numbers, which is the reason that the second instance of [clubs per player] is wrapped in the STR function, which changes its data type to string, to match “ Different Clubs” within the IIF statement here.
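As a rough analogue of those two calculated fields, the Python sketch below counts distinct clubs per player and then builds the tooltip text. It only illustrates the logic; the function name and the multi-club player with clubs AAA, BBB, and CCC are hypothetical, while the Nagbe/POR pairing comes from the example above.

```python
from collections import defaultdict


def club_tooltip_labels(rows):
    """rows: iterable of (player, club) pairs, one per player-season."""
    clubs_by_player = defaultdict(set)
    for player, club in rows:
        # Like { FIXED [player] : COUNTD([Club]) }, the set keeps distinct clubs.
        clubs_by_player[player].add(club)

    labels = {}
    for player, clubs in clubs_by_player.items():
        if len(clubs) == 1:
            labels[player] = next(iter(clubs))  # single value: show it
        else:
            # str() plays the role of STR(): numbers must become text
            # before they can be concatenated with "+".
            labels[player] = str(len(clubs)) + " Different Clubs"
    return labels


rows = [
    ("Nagbe", "POR"), ("Nagbe", "POR"),
    ("Journeyman", "AAA"), ("Journeyman", "BBB"), ("Journeyman", "CCC"),
]
print(club_tooltip_labels(rows))
# {'Nagbe': 'POR', 'Journeyman': '3 Different Clubs'}
```

The one-club player keeps a readable club name, and the multi-club player gets a count where the asterisk would have appeared.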

This may seem like a lot of trouble for a tooltip element, but keep in mind that you could use this hack for any interactive text element on a dashboard. Since you would rarely want “*” in a text field, this approach opens up some important possibilities. If your data is structured in a certain way, this approach could very well be the most expeditious way to summarize an important dimension in an interactive way, which can be very valuable.

Here’s the full dashboard from the example above:

MLS Foul Magnets on Both Sides of the Ball

 

Whistles are among the more common occurrences in soccer, but it’s easy for fans to lose track of which players tend to most often be on either end of physical play. Who tends to commit the fouls and who tends to suffer them can be important to understanding the dynamics of a match though, especially if the referee is willing to enforce persistent infringement.

I took data going back to the 2011 season from MLS’ own statistics pages using an API service called Kimono. In the resultant dashboards it’s clear that different positions have very different foul patterns. Unsurprisingly, the keepers grouping is the most distinct, as they suffer almost no fouls whatsoever, and they commit only a few. Of course, a keeper committing a foul is more costly than anyone else doing so, as location dictates that many of the rare infractions that they do get called for are likely to result in a penalty kick and/or red card.

All other positions have a wider spread, but also much more varied tactical assignments. Lumping all defenders together is certainly suboptimal, as many of them rarely enter the attacking third outside of set pieces, while others regularly overlap midfielders far upfield. Imperfect as these standard, basic positional designations are, there are clearly different foul tendencies between defenders, midfielders, and forwards. Beware drawing conclusions on the in-between positions, as F-M and M-D are the smallest sample size groups. Filtering them in alongside other positions is probably more helpful than isolating those intermediate positions. I also made a small multiples version to make it easy to compare positions side by side:

None of this is to say that foul data absolutely indicates overall player quality, but it must be beneficial for an attack when someone who routinely enters the attacking third is also regularly drawing fouls, as has been the case for Mauro Diaz, Darlington Nagbe, Benny Feilhaber, Javier Morales, and (in smaller samples) Matias Perez Garcia, Sebastian Giovinco, and Tommy Thompson. Outliers among defenders include DaMarcus Beasley (who draws fouls more often than any other MLS defender while committing fouls at one of the lowest rates), Abdoulie Mansally (2nd to Beasley in fouls suffered rate for defenders, but also second in fouls committed per 90), and Greg Cochrane (easily the least foul-involved outfield player in MLS).

I’ve included a lot of instruction inside the dashboards, so I’ll keep my writing here brief. Later this week I plan to write a how-to guide on a few different Tableau techniques I employed when putting this together. I’m most excited to tell you how I banished the asterisk from tooltips using a Level of Detail calculation. For now, enjoy these fouls dashboards, and please let me know if you have a particular question on how I did certain things in Tableau.