Salary Disparities Between and Within 2016 MLS Clubs

As in most sports leagues, MLS club payrolls vary greatly and the stars on any one roster make many times more than their youngest, least-proven teammates. You see similar dynamics in baseball, basketball, football, and hockey, but the scale of the issue is very different in those sports’ major American leagues, as their minimum salaries all hover around half a million dollars. Major League Soccer, on the other hand, has 52 of its 555 players making less than $60,000, and 124 making between $60k and $70k. While this is a marked improvement over David Beckham having teammates earning less than $20,000 (in Los Angeles!) when he entered the league in 2007, it’s still odd to see millionaires sharing locker rooms with guys making not a whole lot more than the United States’ national average wage.

There is a lot of interactivity built into the above chart, most of which I’ve tried to make intuitive, but I want to spell it all out here. Hover over a player for a summary of his wages, and when you click on him the league logo in the bottom right will become his club’s logo and list that club’s total base salary and guaranteed compensation.
There’s a parameter under the chart’s title allowing you to switch between base salary and guaranteed compensation driving size, sorting, and coloration of the visualization. The color scheme, by the way, is there to reinforce the
Hover over the MLS Players Union logo and you’ll see their description of the dataset that feeds these charts. Click on it and you’ll be presented with a hyperlink to the data source.
Finally, and most subtly, you can right-click on any player and choose to exclude him. The effects of these changes will flow into the sorting of the chart and the league/club totals displayed at the bottom.

Beyond simply visualizing these disparities, salary data from the MLS Players Union reflected above are noteworthy for how little they drive results in this league. Most soccer leagues are oligarchical, dominated by and large by huge spenders, with downmarket usurpers like Leicester City in the 2015-16 Premier League being rare exceptions to the rule. In MLS, meanwhile, smart team-building is generally a more important factor than big spending, a trend which appears to be continuing in this young season, as FC Dallas, Real Salt Lake, and the San Jose Earthquakes are vying for the top spot in the league with below-average wages. I’ll likely revisit wages vs. results studies later in the season (it’s far too early to say anything definitive based on 2016 results), and maybe look at impacts on attendance, too.

Role of Shot Location in Premier League Keepers’ Shot Stopping Ability. Interactive Viz of the Day

Basic goalkeeper statistics are too simple to be repeatable or useful. It is easy to look up a keeper’s save percentage or goals against average, but in the end those stats are so heavily reliant on quality of shots on goal faced that they are not a good indicator of keeping skill. Instead, let’s look at Premier League keepers in sample sizes that go beyond single seasons and focus on the role of shot locations, as well as those of the resultant saves and goals. The charts below default to all shots faced by keepers 2010-present in the English top tier. All hexagons are sized based on the volume of shots, saves, and goals. The visualization really comes alive once you click on either a) the name of a keeper to focus both charts on his shots faced and their outcomes and/or b) a zone in the top chart to see only save and goal locations below that came from shots in that area.

What do we make of all this? First, while I’ll note some of my takeaways below, this dashboard is a representation of Opta data (mined by Paul Riley, @footballfactman on Twitter) containing over 150,000 shots on goal, and there are over 500 combinations of keeper and shot zone that can be selected here. I invite readers, especially those who have focused on a particular keeper’s club in the 2010s, to explore perspectives relevant to their interests and offer their own interpretations in the comments below, on Twitter, on reddit, or on their own blogs. Please mention me on Twitter for the last 3, as I really look forward to seeing others’ interpretations of this data. That said, here are my thoughts after sifting through these data while creating the dashboard:

1) A Keeper’s chart is an invitation for further investigation, not a final judgement.
While some keepers come across as more impressive (Adrian, with only one significantly below-average zone) than others (Wayne Hennessey, with 8 bad areas), it is probably more productive to view keepers in terms of opportunity for improvement and apparent relative strengths. For example, let’s compare some keepers in the middle range of the metric driving the bar chart on the left, Pepe Reina and Hugo Lloris:
Here we have keepers with “bad” zones that never overlap. What do you make of these zones in which the keeper’s save percentage falls below average? Their own coaches might focus film study on those areas to see if the keeper has exhibited bad positioning tendencies that cost goals when facing shots from there (Reina in particular seems to have one angle to his left that was consistently problematic). If so, they can orient training around correction of the issue. Opponents might gameplan to specifically target the keeper’s “weaker” positions (Lloris is so strong near goal that Spurs opponents might not want to stress so much about working hard to create prime opportunities and instead let fly as soon as they get an opening inside the box), though this could well be easier said than done.

2) When not shooting from straight on, far post shots seem to produce more goals.
While overall we see goals being scored in a largely symmetrical fashion, if you focus on shots to the left or right, keepers seem to have a harder time protecting the far post than the near post.
(Images: shots to the keeper’s left; shots to the keeper’s right)
Note here that the data driving this presentation are limited to on-target attempts, so further study may be needed to discover whether strikers miss the goal mouth more often on far post shots than near post ones.

3) Major differences from the norm are particularly important.
On this topic I can’t help but pick on Wayne Hennessey, and by extension Crystal Palace.
The highlighted hex differences in the top chart are common among all keepers, as no one aligns very well with shot volumes or save percentages from all locations, but the differences in goal locations are quite unique to Hennessey and his club. Why is he allowing so many goals towards the center of the goal mouth? This data doesn’t include keeper positioning, speed of buildup, etc., so it’s hard to say if this trend is more a reflection of Hennessey’s errors or of his defense forcing him into extremely disadvantageous situations. As I mentioned earlier, sometimes exploration of these data leads more to asking a smarter, more focused question than to providing irrefutable answers.

Again, do any patterns stick out to you while exploring these keeper trends? Insights from those who have followed a few seasons of a particular keeper’s club(s) or who want to delve into spinoff video analysis would be particularly interesting, as I have been looking at this from a much broader perspective. I’d love to read your thoughts in the comments below, Twitter, reddit, or your own site.

2016 Major League Baseball Team and Player Salaries, Visualized

Baseball has the best numbers. This is rather obvious when you look at the nature of the game in comparison to other team sports, as America’s Pastime is at its core a succession of individual pitcher-batter confrontations that are always carried out under extremely similar conditions. It’s a little more surprising that its salary data is better than other sports’, though. USA Today has been compiling Major League Baseball salary data since 1988! In the last few years they have even begun to include the start and end date of a player’s current contract, with average salary over the life of that contract. In other sports leagues, salary data either gets leaked out by a players union then disputed by league officials, or an enterprising journalist has to grill his contacts behind closed doors to get figures.

I took the 2016 MLB salary data from USA Today and charted them below. The boxes are all sized by current salary, while they are colored by average salary during the player’s current contract. When you see a small box with a dark shade of green, it means either that this player will be paid much more in later years of his contract (e.g. Giancarlo Stanton of the Marlins) or that he is in the closing years of his contract with a salary that’s already falling a bit (the Yankees’ Alex Rodriguez is a good example). Hovering over any player will display a card with all of their contract details. The AL and NL tabs will give you a closer look within those leagues, and the Current vs. Average tab is a deeper dive into the relationship between current salaries and those laid out over the rest of the current contract. On any page, click on a team name or logo to go to a tighter focus on their spending.

I tried to build interactivity into this so that baseball fans can dig into it however they see fit. I won’t expound on it all that much myself because as much as I enjoy baseball, I don’t dig into MLB anywhere near enough to feel like I can say anything terribly valuable about roster dynamics. Hopefully those that do have that knowledge will find the above tool useful for illustrating their thoughts on 2016 MLB salaries.

Fool’s Gold: But What If Klinsmann & The USMNT Had Lost?

After the US Men’s embarrassment in Guatemala last week, calls for the firing of Jurgen Klinsmann got, well, just as fevered as they have been a few times during the last few years:

Sources have told me that US Soccer Federation held precautionary discussions around appropriate Klinsi replacements, and were ready to act had the team not rebounded on home soil. While that win on Tuesday saved Jurgen’s job for the time being, should the USMNT not show real improvement at Copa America, they will put the reins in the hands of a manager currently in charge of a pro club here in the USA.
So, now we’re left with the hypothetical of what would have happened after a Tuesday loss or draw, and what might happen should the US lay an egg while hosting a significant tournament. Looking at the full US Soccer Pyramid, who would be most qualified? Opta data can provide us with a clear answer. There are a few simple statistics which we can collate into a single metric measuring coaching ability – this is an assumption merely for the sake of this piece.
First, the USMNT needs a coach with motivational abilities, represented by distance covered, “r” in the equation.
Next, Opta passing statistics easily show us passing percentage, allowing us to quantify ability to foster team chemistry, which “a” signifies.
To measure managers’ ability to inspire discipline, an inverse of fouls committed is signified by “d.”
We also need to account for defense, so blocked shots and saves are added up and signified by the single metric “o.”
Finally, the factor which pundits mention most often: possession, represented in our equation below as “m.”

The following dashboard visualizes the results. The word cloud and the bubble chart are the primary views, with a bar chart and pie graph illustrating the top 3 managers and each league’s total pool of coaching talent, respectively.

As you can see from the word cloud, Bruce Arena is clearly still the best manager in US Soccer.
Let’s just call him the Once & Future Boss. Colin Clarke and Oscar Pareja are the only ones that come close.
Most of the best managers are in MLS, as you would expect, but there are a few men in charge of lower-division sides, such as Clarke, who are more skilled than their top-tier comparables.
In fact, every tier has some coaches decidedly worse than those with the best BAPS scores in the tier just below them, and even some in the barely-professional leagues, PDL and NPSL, are better than Pablo Mastroeni. If MLS doesn’t start getting better managers soon, these superior lower-division managers, like Clarke, Sidd Finch, and Loof Lirpa, will upgrade their sides so much that the US Soccer cartel will no longer be able to avoid a structure of unlimited clubs in a promotion and relegation system.

Interactive Schedules for Every Club in the Top 3 US Soccer Leagues

When a sports league releases printed schedules to its fans, they are generally condensed into a visual calendar that clearly and quickly conveys the full season at a glance. Unfortunately, online these same schedules tend to be organized as a long list that requires fans to scroll through or use control-F to find particular games they are interested in.

Below are the schedules for clubs within the top 3 soccer leagues in the US, inspired by traditional schedule calendar handouts but with the added interactive benefit that hovering over an opponent will 1) highlight all other fixtures against that club and 2) pop up details about that specific fixture. Major League Soccer kicks off this Sunday, and the lower US soccer leagues, NASL and USL, begin their seasons about a month from now.

Numbers to the left of every fixture represent day of the month.

Each league defaults to its 2015 champion. Use the navigator at the top to switch leagues and the dropdown above the league champion’s crest to switch the focus to a club of your choice. This format should make it easy to see periods that are particularly heavy or light on home fixtures. For example, don’t expect Toronto FC to start the MLS season strong, as their ninth match is the first one they will play at home:
That TFC roadtrip is pretty extreme, but it matches a general trend in MLS scheduling, as they generally frontload early matches to be hosted mostly by clubs in the south, with those teams going on the road more often during the summer. It’s awkward, and it tends to make those clubs run hot and cold (pun intended) on predictable cycles. At the same time, this arrangement does have friendlier climates for matches as much as possible, which hopefully leads to more attractive play and higher ticket sales.

Feel free to take a screenshot, or generate an image using the Download link in the bottom right, then save it for later reference or park it on your own blog. If you would really like an interactive version that defaults to a club of your choice, reach out to me on Twitter, @StatHunting. I could even quickly include a link to an MLS club’s USL affiliate or vice versa.

On the fourth page above there’s also a chart tracking fixtures per month for all 3 leagues (regular seasons only). The general order of fixture congestion makes sense because of the size of these leagues, as USL has 31 clubs, MLS has 20, and NASL will start its season with 10 and end with 11. You can also see that MLS and NASL have opted to avoid overlapping their fixtures with the high-profile Copa America and Euro tournaments in June, while USL will barrel through, scheduling 65 matches during that month.

I hope you find these useful. As much as I travel for business, I will definitely be using them to quickly figure out if there’s a match happening while I’m visiting a particular city. If you have any questions about the calendars, particularly how I built them in Tableau, please reach out to me in the comments or (for a quicker response) on Twitter.

Making Sense of the Ever-Changing USL and its Relationship with MLS

As the largest league in US Soccer and functioning partially as a farm system for MLS, United Soccer League takes up an intriguing space in the domestic game. MLS clubs either set up their own developmental club or loan some of their players to a USL affiliate, making the lower league a preview of the top tier’s younger players. It also serves as an organizational preview, since the 29-club USL is facing scheduling challenges that MLS will likely face if they continue their current pace of expansion. Whether or not MLS is guiding USL on scheduling, they are likely gauging fan reactions to better anticipate their own course when they get that large.

The USL has nearly tripled in size during the last few years, and it can be hard to keep track of league-wide trends, even for fans of a specific USL club. The league can seem even more daunting for outsiders, like MLS supporters who want to keep tabs on their club’s USL affiliate. I’ve made the following visual guide to USL Conferences and clubs’ MLS affiliates.

The above project started as my attempt to improve upon a map USL released yesterday to illustrate their new conference alignment. It was quickly criticized on Twitter for being confusing, and from my perspective it was undone by arbitrary coloration of states. For my version I started with a USL wiki page that lists city, MLS affiliate, etc. for every USL club, then added the names of every other US state and Canadian province to the dataset before loading it into Tableau. To define conference for all states and provinces (even those without a team) I used rectangular selection on the map. Then I drove coloration of them based on conference and set lighter hues for those that don’t have a club right now.

I hope you find this useful. If you have questions about the dashboard, or how I developed it in Tableau, please leave them in the comments section. If you run a soccer-themed site or blog where this visual guide to the USL would be helpful to your readers, click Share in the bottom right of the dashboard, copy the embed code, and paste it into the HTML input for your own post. Then add a comment with a link to the location of the post.

Steph Curry’s Unbelievably Effective Shooting

Steph Curry is currently having the best shooting season the NBA has seen in 30 years. Throughout NBA history, the only players who ever had seasons with Effective Field Goal Percentages in his current range were Wilt Chamberlain and Artis Gilmore, 7-footers who played in eras in which they were rarely defended by someone their size. His shot volume is unprecedented, too. The record for 3-pointers per game is Baron Davis in 2004-05 with 8.7, but Curry is launching 10.6 per game, and while Davis and most others in the top 10 weren’t making enough to justify that rate, there are arguments to be made that Curry should shoot even more often.

I wanted to see what this actually looked like in terms of Curry’s shot locations and how far into the shot clock he’s shooting. So, I plugged his stats into Tableau and came up with a hex plot that bins his attempts into areas on the floor, as well as other charts around general shot ranges and seconds left on the shot clock. By default, none of the charts will show a data point in which fewer than 4 shots have been taken, but you can change that in the bottom left:

Note that you can click on a shot range or shot clock times to filter the hex plot of shot locations. For example, if you hold down shift and select all shots taken within the first half of the shot clock (66% of his shots), you get the following result:
(Image: Curry’s shots taken within the first 12 seconds of the shot clock)
If you get too specific in those selections, you may want to change the Minimum Shots threshold set in the bottom left, as you’ll find few areas of the court with 4+ shots taken if you’re narrowing the focus to only 3-pointers taken with 0-2 seconds left on the shot clock.

While there are a few areas on the floor in which Curry’s Effective Field Goal Percentage falls below 50% (roughly the overall league average, and the midpoint of the color scheme on all the charts), the overall picture is almost unbelievable, as 75% of the data points represent an Effective FG% above 50, many of them well beyond 60% or even 70%. That’s absurd.

Effective Field Goal Percentage is more useful than standard Field Goal Percentage in conveying Curry’s accomplishments because it normalizes shots based on how many points they are worth. Shots from three-point range have 150% the contribution of other shots in the calculation of Effective FG%, just as they do in the game. Without that adjustment, field goal percentage looks markedly worse for players who routinely attempt threes, even though a good distance shooter is an immensely valuable commodity.
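To make that weighting concrete, here is a minimal Python sketch of the standard Effective FG% formula, (FGM + 0.5 * 3PM) / FGA. The function names and the shooter in the example are hypothetical and are not taken from the dashboard’s data.

```python
def fg_pct(fgm: int, fga: int) -> float:
    """Standard field goal percentage: made shots over attempted shots."""
    if fga == 0:
        raise ValueError("no attempts")
    return fgm / fga


def effective_fg_pct(fgm: int, three_pm: int, fga: int) -> float:
    """Effective FG%: each made three counts as 1.5 made shots."""
    if fga == 0:
        raise ValueError("no attempts")
    return (fgm + 0.5 * three_pm) / fga


# Hypothetical shooter: 10 attempts, 4 makes, all of them threes.
print(fg_pct(4, 10))               # 0.4
print(effective_fg_pct(4, 4, 10))  # 0.6
```

A player who makes 4 of 10 threes scores the same 12 points as one who makes 6 of 10 twos, and the adjusted figure reflects that.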

The other interesting aspect I alluded to before is the layout of Curry’s shots within the shot clock. As the primary shooter in an offense designed to get good shots off quickly, it is remarkable that 77% of his attempts came before the shot clock reached single digits. I do wish I had league-wide shot clock/location stats as points of comparison, but even without those figures it’s obvious that we are witnessing a truly remarkable season from Steph Curry right now.

2015 Cell Phone Buyers Guide (and how to turn Tableau custom shapes into custom buttons)

Like many people right now, I’m nearing the end of my final cell phone contract. However, it’s becoming far less common to opt for a heavily discounted handset upfront only to pay a great deal extra over the 24-month commitment. As I looked over potential phone purchases, I was overwhelmed by the diversity of options, many of which get very good word of mouth and reviews. I figured that many others are also looking to spend some of their Christmas money on a new cell phone, so I made a visualization to aid in surveying options quickly.

If you hover over any items in the dashboard (bars above, phone icon scatterplot below, or the processor, RAM, or external memory categories in between) you will highlight corresponding marks in every other dashboard element. I would recommend using this feature to select price points and other features that you would consider prerequisites for your cell phone before using the check marks to the left of the bar chart to eliminate any phones you aren’t interested in.

The drop down menus above the scatter plot then allow you to chart your chosen contenders based on the measures that matter most to you, such as average review across popular internet sites (as compiled by Engadget), megapixels by camera, screen sharpness (PPI), etc. Of course, your decision on a handset will probably come down to other features, as the data in this dashboard is hardly all-inclusive. I found it useful to use these data to narrow the field before diving into issues like speaker quality, waterproofing, and just the general feel of the phone in my hand. I hope you find value in it, too.

I’d be remiss if I didn’t mention that there are other cell phones worthy of consideration as well. LG’s $650 V10 seems to be replacing the G4 as that company’s flagship, but I couldn’t find an aggregation of its reviews on Engadget (it was probably released too recently). I didn’t give separate line items to Samsung’s S6 Edge-and-or-Note variants either, but the specs are close enough that you can gauge general S6 interest from the charts, then choose for yourself if those extra features are worth the money to you. I limited the budget offerings above to the critical-consensus pick in that space, the 3rd Generation Moto G, but if you’re looking to only spend around $200, there are offerings from Alcatel, Asus, and Blu that you may want to look into.

Personally, after looking at this data and then researching my finalists in more detail, I couldn’t resist the Moto G. Yes, price is the only objective category in which it tops the list, but it’s remarkable that Motorola put out a phone that’s less than half the price of its competitors here which is somewhat competitive in everything but pixel density and storage space. I’ll use a 32 GB microSD card (the newest Android OS promises to integrate SD storage much better than before) to mostly fix that last issue, and I can live without the sharpest screen. Actually, a good-enough screen might be smarter than going full 4K, because the more impressive screens naturally drain batteries much faster, and the Moto G’s battery life is supposedly second to none. Also, it’s certified as waterproof, which may well come in handy as my 2 pre-school sons have been known to put small electronics into bad situations.

Of course, there are personal preferences at play here, and each of the 10 Android devices in this dashboard is probably the best fit for someone reading this (and Apple fans are generally quite happy with their iPhones regardless). I hope this guide is useful for some of you in at least narrowing down your choices to 2 or 3 handsets before taking a more subjective approach in your final choice.

As usual, I’d also like to give you a little guidance on some of the Tableau techniques I used in building this dashboard. This dashboard came from very small data, as I simply copied a table from Wikipedia’s Comparison of Smartphones page, looked up off-contract price on the phone’s official site and average review as compiled by Engadget. The format of the tables was a bit awkward, but consistently so. For example the Nexus 6P’s display was listed as “5.7″ 2560 x 1440”, and the other phones followed the same format. This is a perfect place for Tableau’s split capability. You just right-click on a dimension, choose Transform then Split, and see what Tableau does. If the standard split doesn’t get what you need, choose Custom Split to define it in a better way, and if the output is numeric, edit the field that resulted from the split and wrap it in the FLOAT function. As with any of my work (and just about anything else on Tableau Public), feel free to click download in the bottom corner of the dashboard to dive into it as a developer.
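For readers who shape data outside Tableau, the same split-then-convert idea can be sketched in Python. This is only an illustration of the concept, not how Tableau’s Split works internally; the function names and the regular expression are my own assumptions, and the pixels-per-inch helper is included just to show why numeric output matters.

```python
import math
import re
from typing import Tuple

# Assumed layout: '<diagonal>″ <width> x <height>', e.g. '5.7″ 2560 x 1440'.
# \u2033 is the double-prime inch mark; a plain double quote is accepted too.
DISPLAY_PATTERN = re.compile(r'\s*([\d.]+)\s*[\u2033"]\s*(\d+)\s*[x\u00d7]\s*(\d+)\s*')


def parse_display(spec: str) -> Tuple[float, int, int]:
    """Split a display string into (diagonal_inches, width_px, height_px)."""
    match = DISPLAY_PATTERN.fullmatch(spec)
    if match is None:
        raise ValueError(f"unexpected display format: {spec!r}")
    diagonal, width, height = match.groups()
    # The pieces come back as text, so convert them, much like wrapping
    # a split field in FLOAT.
    return float(diagonal), int(width), int(height)


def pixels_per_inch(diagonal: float, width: int, height: int) -> float:
    """Screen sharpness: diagonal resolution in pixels over diagonal inches."""
    return math.hypot(width, height) / diagonal


diagonal, width, height = parse_display("5.7\u2033 2560 x 1440")
print(diagonal, width, height)                          # 5.7 2560 1440
print(round(pixels_per_inch(diagonal, width, height)))  # 515
```

Failing loudly on an unexpected format is a deliberate choice here, since a table that is "awkward, but consistently so" should match one pattern for every row.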

So, that’s the data shaping work that went into this viz. The formatting trick that probably stands out at first is the visual cell phone theme. I achieved it by floating 1) a cell phone shape over the entire dashboard, 2) a horizontal container as header with text, a screenshot of icons from the top of my own Android, and the time, generated by the following calculated field:


3) a vertical container with all of the other elements that occupies the blank space inside the cell phone shape.

However, the biggest formatting trick (and one that’s highly applicable to other dashboards) is central to the scatter plot at the bottom. I started with custom shapes for each company that makes cell phones (as detailed in Tableau’s own article on custom shapes). However, to my eye custom shapes can look pretty messy in a scatter plot. Instead I wanted little buttons with custom shapes in the middle. Thankfully, this is pretty easy to do, as you just have to set up a dual, synchronized axis with one’s Marks card set to custom shapes and the other on circles. Some might prefer putting the circle in the back with whatever color they want, but usually I like to park the circle in front, then set it to fully transparent and put a black border around it (which isn’t affected by the transparency). To further illustrate this, here’s another simple scatter using the same concept with fictional Super Store data:

The Planets of Star Wars

Tableau Software created a Star Wars Web Data Connector, and in the release week of The Force Awakens I couldn’t resist the opportunity to dig in and make a visualization. I was drawn to the Planets subset of the data, and came up with the above dashboard. Two visual approaches for the available data seemed obvious: diameter driving planet size (I know, area would be more accurate, but if I put pi*r^2 on the Size shelf everything but Bespin becomes absolutely minuscule), and a figure on each planet sized to represent population.
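The diameter-versus-area trade-off is easy to see with a little arithmetic. The Python sketch below compares each planet’s value to the largest one under both choices; the diameters are round illustrative numbers rather than values from the dataset, and the function name is hypothetical. It says nothing about how Tableau itself maps a field to mark size, only how the field values compare.

```python
import math


def relative_values(diameters: dict) -> dict:
    """Return each body's (diameter ratio, area ratio) versus the largest body."""
    largest = max(diameters.values())
    ratios = {}
    for name, diameter in diameters.items():
        by_diameter = diameter / largest
        # Area is pi * r^2, so pi cancels and the ratio is simply squared.
        by_area = (math.pi * (diameter / 2) ** 2) / (math.pi * (largest / 2) ** 2)
        ratios[name] = (by_diameter, by_area)
    return ratios


# Illustrative diameters in km: one gas giant and one rocky planet.
example = {"gas giant": 120_000, "rocky planet": 10_000}
for name, (by_diameter, by_area) in relative_values(example).items():
    print(f"{name}: {by_diameter:.3f} by diameter, {by_area:.4f} by area")
```

A planet one-twelfth the diameter of the largest ends up at under one percent of its value once area is used, which is why everything else shrinks so dramatically.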
I wanted a Star Wars crawl for the instructions, and I made one through a feature on, but I couldn’t get it to work well as a web object in the dashboard, so instead I just played with font sizes to put a faux scroll in the dashboard. You can still click here to see the scroll I created, or make your own (it will even play the Star Wars theme).
The Tableau tip I can pass along from my creation of this dashboard (outside of formatting to remove almost all lines) is that I used a parameter to allow users to change the measure driving the axis (as described by Tableau here) and as an added bonus I made a reference line for Earth’s value in that measure, through the following CASE statement:

CASE [Choose axis]
WHEN "Rotation Period" THEN 24
WHEN "Earth Similarity" THEN NULL
WHEN "Surface Water%" THEN 71
WHEN "Diameter (already driving size)" THEN 12475
WHEN "Orbital Period (already driving color)" THEN 365
WHEN "Population (already driving size of man on planet)" THEN NULL
END

Earth’s population, even 4.2 billion in 1977, was much larger than any of the listed planets, so I left it out of the reference line field, and its Earth similarity score would be pretty much infinite, so that was out, too.
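The same lookup can be sketched outside Tableau as a plain mapping from the parameter value to Earth’s reference value. This Python version is only an analogy for the CASE statement above: the keys and numbers are copied from it, while the names EARTH_REFERENCE and earth_reference are hypothetical.

```python
from typing import Optional

# Parameter choice -> Earth's value for the reference line.
# None plays the role of NULL, meaning no reference line is drawn.
EARTH_REFERENCE = {
    "Rotation Period": 24,
    "Earth Similarity": None,
    "Surface Water%": 71,
    "Diameter (already driving size)": 12475,
    "Orbital Period (already driving color)": 365,
    "Population (already driving size of man on planet)": None,
}


def earth_reference(choice: str) -> Optional[int]:
    """Return Earth's value for the chosen measure, or None if there is none."""
    # .get() returns None for unknown choices, similar to a CASE with no ELSE.
    return EARTH_REFERENCE.get(choice)


print(earth_reference("Surface Water%"))    # 71
print(earth_reference("Earth Similarity"))  # None
```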

The Driving Offensive Forces of the 2015 MLS Cup Finalists

Sunday’s MLS Cup Final should be an entertaining affair, as the potent attacks of Portland and Columbus will hopefully overcome the tendency for Cup Finals to be cagey, cautious affairs. Much of that expectation comes thanks to the five players profiled above. The shape of their contributions can be very different, as evidenced by the very different statistical profiles of their major playmaking forces, Federico Higuain (for the unfamiliar, yes he’s Gonzalo’s older brother) and Darlington Nagbe. Higuain’s role in an offense designed around overwhelming, quick attacks often has him very directly involved in the actual act of scoring. Meanwhile, the evidence of Nagbe’s influence is more subtle. His ability to retain possession forces opponents to foul him and allows the Timbers to more methodically cut up the opposing defense.

The strikers, on the other hand, have had somewhat similarly dominant scoring contributions. As detailed in the Strikers page above, Ethan Finlay’s assists and lower number of shots make him distinct from the big African strikers, but the numbers alone don’t draw much difference between those two. Watching them may unearth distinctions in their playing styles, with Adi’s dominant, sometimes methodical hold-up play contrasting to some extent with Kamara’s quicker transitions.

Of course, none of this happens in isolation, and others will certainly have big impacts on the match, both positive and negative, but it’s fair to expect that one of the above players will be enormously influential in the moments that decide who will lift MLS Cup on Sunday.

On the Tableau development side of this, the table calculations required to graph a calculated metric across a rolling half season were a little on the tricky side, and I also used LODs to generate the reference lines with season averages for goals + assists per 90 minutes. The main issue I’d like to make sure you’re aware of is that it is not a good idea to run a table calculation of a calculated field (or an inherent percentage field). You will only get accurate results if you run the same table calculation function for the individual elements of that calculated field.
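To show why that matters, here is a small Python sketch comparing the two approaches for a rolling goals + assists per 90 minutes figure. It is not the workbook’s actual calculation; the function names and the three-match sample are hypothetical, and it assumes every match has more than zero minutes played.

```python
def rolling_sums(values, window):
    """Trailing sum over the last `window` items (fewer at the start)."""
    return [sum(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]


def rolling_per90(contributions, minutes, window):
    """Roll the numerator and denominator separately, then divide."""
    numerators = rolling_sums(contributions, window)
    denominators = rolling_sums(minutes, window)
    return [90 * n / d for n, d in zip(numerators, denominators)]


def rolling_mean_of_rates(contributions, minutes, window):
    """The misleading version: a rolling average of per-match rates."""
    rates = [90 * c / m for c, m in zip(contributions, minutes)]
    sums = rolling_sums(rates, window)
    counts = [min(i + 1, window) for i in range(len(rates))]
    return [s / c for s, c in zip(sums, counts)]


# Hypothetical player: a goal in a 10-minute cameo, then two full matches.
goals_plus_assists = [1, 0, 1]
minutes_played = [10, 90, 90]

print(round(rolling_per90(goals_plus_assists, minutes_played, 3)[-1], 2))          # 0.95
print(round(rolling_mean_of_rates(goals_plus_assists, minutes_played, 3)[-1], 2))  # 3.33
```

Averaging the per-match rates lets the 10-minute cameo count as much as a full match, while summing the pieces first weights every minute equally.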