The Online, Interactive World Atlas

Welcome to my Wikipedia-fueled world atlas. I’ve taken ratings of nations from many Wikipedia pages and provided a portal through which you can normalize them by dividing by population, GDP, or physical size, and display each metric’s geographic trends. On the second page you can choose two different metrics to generate a scatterplot illustrating the relationship (or lack thereof) between them.
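The atlas itself is built in Tableau, but the adjustment it offers is a simple ratio. Here is a minimal Python sketch, assuming each metric and each denominator (population, GDP, or land area) is a dictionary keyed by country. The function name and the `scale` parameter are illustrative assumptions, not how the viz is actually wired.

```python
def per_unit(metric, denominator, scale=1.0):
    """Divide each country's metric by a denominator (population, GDP, or area).

    Countries with a missing or zero denominator are skipped rather than
    guessed. `scale` turns raw ratios into, e.g., per-million figures.
    """
    out = {}
    for country, value in metric.items():
        d = denominator.get(country)
        if d:  # skips None and 0
            out[country] = value / d * scale
    return out
```

Skipping countries instead of filling in a denominator keeps the map honest about where data is missing.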

My aim here is to enable viewers to investigate the data themselves, without having to build their own dashboards. To that end, I’d like to invite anyone to write up an analysis of a map and/or scatterplot they generate using this atlas. If you do, please send it to me. Any submissions that I like (I value brevity, so aim for 500-1,000 words) will get published on this site, and I’ll give constructive criticism on any that don’t make the cut. If you want to post on your own site, please send me an email or a message on Twitter (@StatHunting) and link to this site in your post, so I’ll know to point my followers in your direction.

I’m going to try to make some enhancements to this viz throughout the week as well. I have my eye on an alternate option to the filled map for tomorrow morning, and would be happy to incorporate more data too. If you can find a Wiki page that you think would fit this project and gives ratings for at least 120 countries, please let me know in the comments or through social media.

Custom Colors, Interactivity, and a Couple Happy Little Charts

Last week some of my Tableau work was featured on, and since I think the charts speak for themselves, I figure I should use this space to explain how I used Tableau to make the dashboard, and then tack on a couple enhancements.

If you want to follow along with this how-to guide, simply download Tableau Public (it’s free), install it on your PC or Mac, then click the Download link in the bottom right of the viz.

A little while ago the Galaxy’s digital media manager, Chris Thomas, reached out and asked me to create something in Tableau that charted LA’s record against other MLS clubs, 2009-2014 (all of Bruce Arena’s full seasons in charge). My initial thoughts were 1) I can grab all the scorelines in that period from MLS’ schedule page, 2) I’d like to in some way use an isolated image of the Galaxy crest’s quasar, simple yet distinctive, 3) I’d need to customize the colors to match LA Galaxy branding, and 4) I wanted a second chart on the dashboard offering a different perspective on this era, and also functioning as a filter.

The data in 1) was pretty straightforward. There were a couple of odd null scorelines listed in 2010, but nothing that some quick internet research and Excel adjustments couldn’t fix. After loading the resulting csv into Tableau, I simply used calculated fields to list results in terms of the Galaxy and Opponents instead of Home and Away, then a couple of other calculated fields to get figures for wins, losses, draws, points, and goal differential.
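Outside Tableau, those calculated fields boil down to restating each scoreline from one club’s perspective. Here is a minimal Python sketch of that step. It assumes each fixture carries home/away team names, goals, and a competition label. The `Fixture` class, its field names, and the exact team string are illustrative assumptions, not the fields in the downloadable workbook.

```python
from dataclasses import dataclass

# Hypothetical: the club's name as it appears in the scraped schedule.
TEAM = "LA Galaxy"

@dataclass
class Fixture:
    date: str
    home: str
    away: str
    home_goals: int
    away_goals: int
    competition: str

def galaxy_view(f: Fixture, team: str = TEAM) -> dict:
    """Restate a Home/Away scoreline as Team/Opponent, plus derived figures."""
    if f.home == team:
        gf, ga, opponent = f.home_goals, f.away_goals, f.away
    elif f.away == team:
        gf, ga, opponent = f.away_goals, f.home_goals, f.home
    else:
        raise ValueError(f"{team} did not play in this fixture")
    win, draw = gf > ga, gf == ga
    return {
        "date": f.date,
        "opponent": opponent,
        "competition": f.competition,
        "goals_for": gf,
        "goals_against": ga,
        "win": int(win),
        "draw": int(draw),
        "loss": int(gf < ga),
        "points": 3 if win else (1 if draw else 0),  # standard 3-1-0 scoring
        "goal_diff": gf - ga,
    }
```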

Chris quickly gave me the image for 2) and Pantone color codes for 3). I used this site to convert the Pantone codes to RGB, and then popped them into Tableau by choosing More Colors when setting the color of any element and typing the RGB numbers in the Red, Green, and Blue spaces. After I input them once, Tableau held onto the Galaxy’s midnight blue, lighter blue and gold for easy selection on other chart elements.

Now that I had my data and palette set, it was time to get charting. Lollipop charts are a nice option for visualizing figures that can land above and below zero, so I made some quasar-tipped lollipops (using this technique, but with custom shapes for good measure) to display the Galaxy’s average goal differential against all of their opponents. Chris asked me to filter out results from the CONCACAF Champions League, US Open Cup, and World Football Challenge because they were a bit off topic (and probably also because the records vs Real Madrid and the Carolina Railhawks are embarrassing sans context). Turns out that over these years and within only an MLS set, the Galaxy haven’t been outscored by anyone! Impressive, even if it negated the lollipops’ strength in charting negatives. I still liked the shooting star aesthetic, though.
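The lollipop values are an average goal differential per opponent after the competition filter. Here is a sketch under two assumptions: each row has already been restated from the Galaxy’s side (with hypothetical `opponent`, `competition`, and `goal_diff` keys), and competitions are labeled exactly as below.

```python
from collections import defaultdict

# Competitions filtered out of the first version of the dashboard.
EXCLUDED = {"CONCACAF Champions League", "US Open Cup", "World Football Challenge"}

def avg_goal_diff_by_opponent(rows, excluded=EXCLUDED):
    """Average goal differential per opponent, skipping excluded competitions."""
    totals = defaultdict(lambda: [0, 0])  # opponent -> [sum of goal diff, matches]
    for row in rows:
        if row["competition"] in excluded:
            continue
        t = totals[row["opponent"]]
        t[0] += row["goal_diff"]
        t[1] += 1
    averages = {opp: s / n for opp, (s, n) in totals.items()}
    # Highest to lowest, the order the lollipops are drawn in.
    return dict(sorted(averages.items(), key=lambda kv: kv[1], reverse=True))
```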

For the second chart, I figured that since I had a date dimension for all the Galaxy fixtures, I’d use a line graph to display points over time. Obviously, individual matches’ points would be far too erratic to be useful, so I right-clicked on my calculated field for points, chose a quick table calculation for moving average, then selected “Edit Table Calculation” and set “Previous Values” to 34, the current length of an MLS season. This chart got funky in early 2009 because it didn’t have 34 matches to look back on, so I went back to my data source, added 2008 fixtures to the csv file, and refreshed my data in Tableau. I then had to filter out 2008 in the goal difference table and force my date axis in the line graph to start on Jan 1, 2009, so that the 2008 results only showed up in the background calculation for the moving average of points.
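Tableau’s quick table calculation handles this for you, but the logic is a trailing moving average. The sketch below takes the window as a parameter, because whether “Previous Values” of 34 means a 34- or 35-match window depends on whether the current match is counted too. The function names and the `(date, points)` input format are assumptions. It also mirrors the 2008 workaround: compute over the full history first, then hide the early rows.

```python
def trailing_moving_average(values, window=34):
    """Average of each value and the (window - 1) values before it.

    Early rows have fewer than `window` predecessors, so they are averaged
    over whatever is available, which is the funky start described above.
    """
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the match that left the window
        out.append(running / min(i + 1, window))
    return out

def points_moving_average(matches, window=34, start="2009-01-01"):
    """matches: (iso_date, points) pairs sorted by date.

    Earlier fixtures feed the calculation but are hidden from the output,
    like starting the date axis on Jan 1, 2009.
    """
    avgs = trailing_moving_average([p for _, p in matches], window)
    return [(d, a) for (d, _), a in zip(matches, avgs) if d >= start]
```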

After some discussion with Chris and others on the Galaxy web team about design elements (the original had a gold background and a different font) and about glamming up the tooltips (the text you see when hovering over a data point), I ended up with the dashboard you see above.

I’m always tinkering with these projects though, and after some Twitter feedback I decided to make an alternate version. First I undid the context filter on Competition, making it clearer that LA hasn’t been without their issues in the CONCACAF Champions League (aka CCL), the World Football Challenge, and the US Open Cup. However, the latter two can be partially chalked up to Arena often placing less priority on those tournaments to save his best players’ legs for more important competitions. Second, I created parameters and calculated fields (similar to this technique) a couple of times to allow readers to switch between charts based on points or goal differential. Here’s the result:

By the way if any other club would like to look back on their records (full history or a limited window) in a similar manner, it would be very easy for me to replicate this approach, tweaking the color palette, images, etc. to fit the project.

Thierry Henry in MLS: The Grass is Always Greener

The king is hanging up his cleats. Thierry Henry officially retired today, and will be a (presumably blunt, witty, and insightful) analyst for Sky Sports. Many remember him exclusively for his years with Arsenal and Barcelona, but those in the United States feel privileged to have seen him casually dominate for the last five years in a New York Red Bulls kit. Henry joined the MLS club at 32, having already scored 232 goals in 455 professional matches, so for many European fans his stateside accomplishments were always going to be little more than footnotes on a remarkable career. However, despite his nonchalant bearing, Henry never played MLS matches as though they were anything but serious business.

What was truly remarkable about Thierry Henry’s time in MLS was that he aged his game in a smart way that few athletes even attempt. Oftentimes an aging star either tries to recreate his old magic by sacrificing other areas¹ or just takes a walking paid retirement². Thierry Henry took the road less traveled, recognizing the limitations his body had thrust on him and adapting to remain enormously effective as a very different player³.

The blazing footspeed of Henry’s heyday was gone, but without it he more clearly exhibited his beautiful mind which could read the circumstances of the match faster than anyone else, then elegantly position himself and execute a sly pass, run, or shot that turned everything on its ear. His goal and assist numbers below reflect that he remained enormously productive throughout this period.

Some MLS supporters didn’t get to see the show in person though, because Henry selectively sat out matches played on certain artificial surfaces. Every time he was scheduled to play on turf that was tailored for American football, there was always a reason for Henry to take the day off. I’m not sure if “turf distaste” was ever the explicit reason for Henry’s absence, but the last four stadiums in the below graphic lay out the trend pretty clearly.

Portland is the exception: Henry played every available minute there, and it is noteworthy that their artificial turf is the only such surface in MLS that doesn’t occasionally host professional tackle football games. Turf can be set up for soccer or football, and whether that makes a difference can be debated, but Henry clearly acted as though Portland’s approach to artificial turf was the only acceptable non-grass solution.

Montreal has played the vast majority of their MLS matches on grass at Stade Saputo, but only two of the Red Bulls’ five visits to Montreal were played on grass. One of those, on July 28, 2012, was Henry’s only regular season visit to Seattle, Vancouver, New England, or Montreal that involved kicking a ball. If Montreal did purposefully dodge Henry visits by disproportionately staging them on Olympic Stadium turf, it was a smart strategy, as he was positively merciless when he did play against the Impact, to the tune of 1.8 goals per 90 minutes and 0.8 assists per 90, both the highest marks Henry posted against MLS opponents.

In the above graphic you can explore Henry’s output across 10 different categories, organized by MLS club, the city in which the match was played, or even month. In all cases, the results are sorted highest to lowest⁴. All three of the dashboards above are driven simply by the game log on Henry’s MLS profile page, and I should reiterate that this resource only covers the regular season. Ironically, his final professional appearance came in a playoff match on New England’s turf.
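The per-90 figures quoted above, along with the highest-to-lowest ordering of the dashboards, come from summing a stat and the minutes played within each group. Here is a minimal sketch. It assumes the game log is a list of dicts with a hypothetical `minutes` key, a stat key such as `goals`, and a grouping column such as opponent, city, or month.

```python
from collections import defaultdict

def per90_table(game_log, key, stat):
    """Group a game log by `key` and return `stat` per 90 minutes, highest first.

    Rate = total stat * 90 / total minutes within each group, so a group
    with few minutes is not averaged match by match.
    """
    minutes = defaultdict(float)
    totals = defaultdict(float)
    for game in game_log:
        minutes[game[key]] += game["minutes"]
        totals[game[key]] += game[stat]
    rates = {k: totals[k] * 90 / minutes[k] for k in minutes if minutes[k] > 0}
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)
```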

In no way is this meant to imply that Thierry Henry’s contributions can be fully rendered in graphs. His smooth brilliance was best witnessed by keeping your eyes on him for 90 minutes, so you could catch as many as possible of the great moments, subtle and/or sublime, that he churned out regularly while flowing through the beautiful game. The charts just reflect that all of that elegance was still enormously productive.


1. NBA fans are probably thinking of Kobe Bryant hurting the chemistry and defense of the Lakers right now, distilling his game to little more than inefficient scoring and lots of it.

2. Ironically, New York also paid huge sums to two of the most unashamedly lazy aging stars in MLS history, Lothar Matthäus and Rafa Marquez.

3. Another NBA reference: one of the best corollaries to Henry’s evolving approach was the last few years of Jason Kidd’s career, in which his speed was gone, but his brain helped him manipulate circumstances, getting many easy points and steals for his team.

4. Admittedly, this order takes some getting used to when looking at monthly stats, but I figured that some would enjoy the feature nonetheless. 

Create Your Own Home & Home, Away Goals Chart

After the first leg of a home-&-home series, some supporters are always left scratching their heads as to what their club needs to accomplish in order to win the aggregate series. Since most home-&-home series use away goals as their primary tiebreaker, I decided to create a tool in which readers can enter team names and the first leg scoreline, then see what each side needs in the second leg.

In short, you just have to replace “Team A” and “Team B” with club or national team names, abbreviations, or even nicknames (complimentary or not), then use the slider to select the first leg scoreline. You can use this for matchups in the Champions League, MLS, Liga MX, certain stages of some confederations’ national team qualifying for major tournaments, or even a competition between tiny clubs that very few care about.

I used the current state of the MLS conference finals for tabs illustrating the use of this tool. The New England Revolution obviously has a substantial advantage heading into the second leg, having struck twice in Red Bull Arena and conceded only once. Meanwhile, LA Galaxy have to be pretty happy with a 1-0 win at home, though the Seattle Sounders seem quite capable of doing enough to take the series in the second leg.
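Behind each cell of the chart is one rule: compare aggregate goals, then away goals, and otherwise go to extra time. Here is a minimal sketch of that rule and of the full grid. The function names and argument order are my own, and the grid limits of five home goals and four away goals match the charts. The function stops at “extra time” and does not model what happens after it.

```python
def second_leg_outcome(leg1_a: int, leg1_b: int, leg2_b: int, leg2_a: int) -> str:
    """Result of a two-legged tie with away goals as the first tiebreaker.

    Team A hosts the first leg (leg1_a - leg1_b); Team B hosts the second
    (leg2_b - leg2_a). Returns "A", "B", or "extra time".
    """
    agg_a, agg_b = leg1_a + leg2_a, leg1_b + leg2_b
    if agg_a != agg_b:
        return "A" if agg_a > agg_b else "B"
    away_a, away_b = leg2_a, leg1_b  # goals each side scored on the road
    if away_a != away_b:
        return "A" if away_a > away_b else "B"
    return "extra time"

def outcome_grid(leg1_a, leg1_b, max_home=5, max_away=4):
    """Map every second-leg scoreline (home goals, away goals) to its outcome."""
    return {(h, a): second_leg_outcome(leg1_a, leg1_b, h, a)
            for h in range(max_home + 1) for a in range(max_away + 1)}
```

For the Eastern final, New York is Team A after losing 1-2 at Red Bull Arena, so the New England chart corresponds to `outcome_grid(1, 2)`.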

In all of this, keep in mind that certain scorelines are more likely than others, so simply counting cells in the chart isn’t very useful. Hover over any cell and a pop-up will note the percentage of soccer matches that generally end in that scoreline, based on this piece from the Soccer By the Numbers blog, which cataloged outcomes from the EPL, La Liga, Bundesliga, and Serie A seasons from 2005/06 through 2009/10. I simply averaged the distributions for those four leagues, so don’t consider this the be-all and end-all of scoreline distribution. The figures are also problematic predictors of particular home & home fixtures, especially since objectives are so different within a playoff structure, but they do help us set expectations on a basic level.

Half of those matches ended with the home side scoring between zero and two goals and the visitors either getting shut out or scoring once. That makes the six cells in the top left of each chart far more important than less-likely scorelines on the periphery. Expanding to three home goals and two away goals brings the total above 80%, and by five home goals and four away goals almost 99% of outcomes are covered, which is why those are the limits of the home-&-home away goals charts.
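The percentages behind those pop-ups take two steps: average the four leagues’ scoreline tables cell by cell, then sum the cells inside a given corner of the chart. Here is a sketch, assuming each league’s table is a dict mapping `(home_goals, away_goals)` to its share of matches. The function names are illustrative, and the real shares come from the Soccer By the Numbers tables, not from this code.

```python
def average_distribution(leagues):
    """Average several {(home_goals, away_goals): share} tables cell by cell.

    A scoreline missing from one league's table counts as 0 for that league.
    """
    cells = set().union(*leagues)
    return {c: sum(t.get(c, 0.0) for t in leagues) / len(leagues) for c in cells}

def coverage(dist, max_home, max_away):
    """Share of matches with home goals <= max_home and away goals <= max_away."""
    return sum(p for (h, a), p in dist.items() if h <= max_home and a <= max_away)
```

With the averaged table, `coverage(dist, 2, 1)` is the six-cell corner described above, and `coverage(dist, 5, 4)` covers the whole chart.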

Speaking of those home-&-home maps, I just hope that they help some folks understand the away goals tiebreaker (regardless of my opinion that they are a poor fit for MLS playoffs). They are pretty easy to understand when you look at them from the right perspective, even if away goals are a thoroughly arbitrary way to lessen the likelihood of extra time and of cumulative home field advantage within a series.

Effect of Away Goals on 2nd Legs of MLS Conference Semifinals

For the first time this year, MLS is using away goals as the first tiebreaker in their playoff home and home matchups. Before we get to the pros and (mostly) cons of this rule, here’s a guide to the 2nd legs of the conference semifinals, taking away goals into account:

Cells with white bars above and below represent aggregate ties, and are the scenarios in which the rule matters. In previous MLS home-and-home matchups, the 2nd leg would have gone to extra time, but now you only get OT if the two legs’ scorelines are palindromic (one leg mirrors the other from either team’s perspective, like 2-1 and then 1-2). It is important to note that away goals will only count in regulation, so in the rare instances where a matchup reaches overtime, the home side will get a glimmer of an edge. But if they can’t take advantage in 30 minutes, the edge evaporates, as studies have shown that penalty kick shootouts are home field neutral.
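The “palindromic” condition follows from two equalities. If Team A scores a1 and a2 in the two legs and Team B scores b1 and b2, a level aggregate (a1 + a2 = b1 + b2) plus level away goals (a2 = b1) forces a1 = b2. A quick brute-force check in Python confirms it for every pair of scorelines up to five goals a side. Here legs are written as (Team A goals, Team B goals), with Team A hosting the first leg. The function name is illustrative.

```python
from itertools import product

def needs_extra_time(leg1, leg2):
    """True when a two-legged tie is level on aggregate and on away goals."""
    agg_level = leg1[0] + leg2[0] == leg1[1] + leg2[1]
    away_level = leg2[0] == leg1[1]  # A's away goals come in leg 2, B's in leg 1
    return agg_level and away_level

# Extra time happens exactly when leg 2 mirrors leg 1 from Team A's perspective.
for leg1, leg2 in product(product(range(6), repeat=2), repeat=2):
    assert needs_extra_time(leg1, leg2) == (leg2 == (leg1[1], leg1[0]))
```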

In this year’s matchups, this means that Los Angeles & Seattle will be incentivized to clamp down the match and keep it low scoring, so that even an aggregate tie would land in their favor, or at least give them 30 minutes of home overtime. Same for New England, but all they have to do is hold Columbus to a couple of goals or less. The other top seed, D.C. United, has the trickiest path, as they need two goals, but if New York scores once, that number doubles.

Which brings us to the major problem with this setup. Home-and-home is designed to negate home field advantage with 90 minutes played in both settings, and the only way for there to be an advantage for the higher seed is for them to get an extra 30 minutes of home field in that second leg. An away goals tiebreaker makes that outcome even less likely than it was before. I’m not the only one who sees the issue this way, and you can read Brian Straus’ smart critique upon the rule’s introduction here.

The issue goes beyond the basic logic of the setup, though. A study performed by the University of Munich’s Department of Statistics concluded in 2010 that

It is shown that the observed differences in frequencies of winning between teams first playing away and those which are first playing at home can be completely explained by their performances on the group stage and – more importantly – by the teams’ general strength.

That’s what you want in a champions league competition where seeding is far from straightforward or trustworthy, but in a season-culminating playoff system? If MLS wants the MLS Cup to feel like the legitimate ultimate trophy for each season, they need a playoff system in which regular season excellence is rewarded, not neutered. They had that, though it was cumbersome, before 2003 in their best of three format, but they’ve been adrift ever since in home and home murkiness, and away goals is taking them even further from shore.

Champions of away goals in MLS, such as MLS’ Technical Director of Competition Jeff Agoos, point to various rationales like the rule promoting attacking play, drama, or being an “authentic” Europe-bred standard. Some very smart people in Europe have problems with it, though, and I have yet to see proof of any of these defenses of away goals.

Thankfully, MLS only has to peer south of the border for a clear, simple upgrade. Mexico’s Liga MX has playoffs in which the home-and-home tiebreaker is regular season record. Underdogs have to win outright, which makes far more sense. As things stand in MLS, there is effectively no difference between the 2nd and 3rd seeds in each conference, who meet each other to start their playoff runs, and the only advantage for the top seeds is the hope that their opponent is wounded from their play-in wild card match. Shouldn’t 34 matches carry much more weight than that?

The New MLS Salary Release and Its Implications for Looming CBA Negotiations

Yesterday the MLS Players Union updated their MLS salary release, incorporating the wages of new signings and extensions granted recently to the likes of Kaka, Jermaine Jones, Graham Zusi, and Matt Besler. This release paints a useful picture of overall spending trends in the league right now, and we can take a couple extra steps with these figures to see what they imply for spending after the next Collective Bargaining Agreement (CBA), whose negotiations loom over the coming MLS offseason.

First, let’s look at the salary data as it is. Here’s my visualization of all 568 players in the release that are assigned to one of the 19 current MLS clubs or next year’s expansion teams, New York City FC and Orlando City SC.

Yes, despite only listing five players here, Orlando City already has the fifth-highest payroll, driven almost entirely by Kaka, whose $6,660,000 base salary and $7,167,500 guaranteed compensation are the highest in MLS history. Orlando’s expansion partner, NYCFC, represents the biggest oddity in this data. One of their star signings, Frank Lampard, is not mentioned, and the other, David Villa, shows only $60,000. Considering both players’ histories and the high-spending reputation of their new employer, Mansour bin Zayed Al Nahyan, it is likely that both will be pulling in figures that would place them among the richest in the league.

These oddities, especially Villa’s farcical $60k, call to mind the history of managers, owners, and others within MLS downplaying the accuracy of particular players’ salaries listed in previous MLSPU releases. This is why these figures are most useful when viewed from a wide angle, and we should resist the urge to use them to label specific players “underpaid” or “overpaid.” They also don’t take into account MLS’ myriad salary cap mechanisms, like Designated Player rules, allocation money, retention funds, pro-rated transfer fees, homegrown status, Generation Adidas status, trades in which a player’s former club continues to pay some of his wages, and the fact that only 20 of a team’s 30 players hit the cap at all, to say nothing of the general accuracy of the MLSPU release. Confused? You can go read MLS Rules and Regulations if you’d like, but go in understanding that not all of the rules are publicly stated, and commissioner Don Garber has admitted that the league sometimes alters the rules when it is convenient to do so.

For this reason, I am not estimating clubs’ cap numbers armed with only the MLSPU release. Instead, what we have above is a simple ranking of clubs’ total wages, and a visual reminder of the disparities between the league’s stars and its rank and file. It is alarming that Kaka will make over 135 times the veteran’s minimum of $48,500, but this kind of ratio is nearly assured by the Designated Player (DP) rule. Since only three players per club can make more than the DP threshold of $387,500, it is 100% guaranteed that at least 90% of the players will make less than that. In actual practice this year, here is the breakdown of players within certain wage ranges:

The middle class in MLS is so barren that the above histogram has to skip every salary bin containing zero players. Without this filter, the number of rows would render the chart on the left absurd, displaying 204 rows, 171 of them blank. On the right, we can see the shift in this salary distribution based on potential post-CBA salary cap levels. This figure defaults to $4,500,000, but you can use the arrows or the slider to adjust it, $100,000 at a time. There still is not much of a middle class, but at the very least the lower class will make enough to stop fretting over their monthly expenses. Feeding this view is a calculation for hypothetical wages which increases every non-DP player’s guaranteed compensation proportionally to the salary cap increase. For a player making $50,000, if the cap rose from $3.1 million to $6.2 million, their hypothetical wage would be $100,000.
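The hypothetical wage is a single ratio, and hiding empty bins just means counting only the bins that occur. Here is a minimal sketch under two assumptions. First, DP wages are left unchanged, which is my reading of “every non-DP player.” Second, the bin width is a placeholder, since the post doesn’t state the one the histogram uses. The function names are illustrative.

```python
from collections import Counter

def hypothetical_wage(guaranteed_comp, old_cap=3_100_000, new_cap=4_500_000, is_dp=False):
    """Scale a non-DP player's guaranteed compensation by new_cap / old_cap.

    DPs are returned unchanged (an assumption, not a league rule).
    """
    if is_dp:
        return guaranteed_comp
    return guaranteed_comp * new_cap / old_cap

def nonempty_bins(wages, width=50_000):
    """(bin lower edge, player count) for salary bins holding at least one player.

    Bins are [lower, lower + width); empty bins never appear, so the
    histogram skips them automatically.
    """
    counts = Counter(int(w // width) * width for w in wages)
    return sorted(counts.items())
```

The worked example above checks out: `hypothetical_wage(50_000, new_cap=6_200_000)` doubles a $50,000 salary to $100,000 when the cap doubles from $3.1 million.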

Hypothetical wages here are by no means meant to imply that every player will receive such a raise himself. The theory at play in the hypothetical visuals is that wage dynamics will stay roughly proportional across the board, and it is actually quite likely that some of the lowest paid (145 players, 25.5% of the player pool, currently make less than $50,000) will lose their jobs when/if the salary cap and minimum wage increase. Undoubtedly, many of these players deserve to make more money, but a fair number will get squeezed out in favor of new competitors who may have previously dodged MLS because the wages on offer were too low.

The dynamic there portends an oddity in the pending CBA negotiations, where up to a quarter of the current players may want to avoid pushing salaries very far beyond the status quo for fear of losing employment. Meanwhile, the more secure players will surely want an increased cap so they can reap the benefits not only for themselves, but for the sake of getting better teammates. Consider that MLS sponsors and broadcasters may also be calling for increased spending, and you see that CBA negotiations look to be complex even before considering topics that go beyond wages.

In any event, despite some conservative parties on both sides, it seems very likely that we will see a substantial increase in the MLS salary cap. Television ratings are increasing (though the numbers are still far below other leagues), attendance continues to rise, and broadcast rights fees and sponsorships are bringing in more money. My hope is that they set the salary cap as a percentage of league revenues, so it can self-correct over time. With that in mind, here’s a version of the first chart, but using hypothetical wages and also containing an interactive salary cap selector.

Again, the hypothetical wages here run off the assumption that the spread of sub-DP salaries on club rosters will rise roughly in proportion to the salary cap increase. This prediction won’t be 100% accurate, but given the league’s insistence that the cap and single entity aren’t going anywhere, it’s a safe bet that it will be close. The key dynamic I will look for in CBA negotiations is whether each senior designated player will still account for 12.5% of his club’s cap. If that percentage lowers, it will be an enormous victory for the richer clubs, as they would then have more freedom to spend elsewhere on the roster in support of their high-price stars. That could portend an enormous rise in the importance of spending, a factor whose relevance has been quite small thus far in MLS.
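The 12.5% figure matches the numbers used above, since $387,500 is one eighth of a $3.1 million cap. A small sketch shows why a lower share matters to a club carrying three DPs. The helper name and its defaults are illustrative assumptions.

```python
CAP_2014 = 3_100_000      # cap figure used in the example above
DP_CHARGE_2014 = 387_500  # DP threshold quoted above

# Each senior DP counts 12.5% of the cap: 387,500 / 3,100,000 = 0.125.
assert DP_CHARGE_2014 / CAP_2014 == 0.125

def room_after_dps(cap, dp_count=3, dp_share=0.125):
    """Cap space left for the rest of the roster once `dp_count` DPs
    each count `dp_share` of the cap (hypothetical helper)."""
    return cap * (1 - dp_count * dp_share)
```

Dropping the share from 12.5% to 10% frees 3 × 2.5 = 7.5% of the cap for the rest of a three-DP roster, whatever the cap level ends up being.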

The Chivas USA roster (the club will go on hiatus for two years) is stacked onto the hypothetical Orlando squad as a way of seeing how the wages of a full team with Kaka would look. NYCFC’s reported squad was so thin that I didn’t alter their player list, but that does make for a notable under-reporting of potential league wages, especially with Sheikh Mansour expected to spend as aggressively as possible, as he has at Manchester City.

Commissioner Garber has stated many times that he wants MLS to become a top league in the world. If this is his true objective, the coming CBA is a prime opportunity for the league to aggressively increase wages, thus attracting and retaining better players with an aim to improve the standard of play and the marketability of the league and its clubs.

The MLS Playoff, Top 3, and Supporters Shield Races

League tables lie, and MLS’ is among the least honest. With 19 clubs, one has to sit out every weekend, and even without those bye weeks the league is generally averse to scheduling the season such that clubs’ games played stay roughly even, so ranking by raw points is usually quite misleading. Instead, I’m ordering clubs by their points per game (PPG), which has the added benefit of being able to chart the PPG pace they’ll need over the rest of the season on the same axis.

Hover or click on a club’s crest or bar for a writeup of their place in each race.

The colored boxes represent the PPG that club will need over their remaining matches to compete for that particular objective. Each box spans 2 points in the final standings. For example, today (I aim to update this table at least weekly), the left side of the purple Supporters Shield box represents the pace needed to reach 64 points, the midpoint 65, and the right side 66. I’m roughly trying to place the boxes such that the left side represents a 50/50 chance for most clubs, and the right side at least a 75% chance, all based on SportsClubStats’ Monte Carlo simulations. Note that the East and West have different point targets for the playoffs (blue) and top three (green), since the two conferences’ races are basically independent.
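Both the PPG ordering and the box edges come from simple arithmetic: points per game played, and points still needed divided by matches remaining. Here is a minimal sketch, with a hypothetical `Club` record and the 34-match MLS season. A required pace above 3.0 PPG means the target is out of reach.

```python
from dataclasses import dataclass

SEASON_GAMES = 34  # current MLS regular season length

@dataclass
class Club:
    name: str
    points: int
    played: int

    @property
    def ppg(self) -> float:
        return self.points / self.played if self.played else 0.0

def table_by_ppg(clubs):
    """Order clubs by points per game instead of raw points."""
    return sorted(clubs, key=lambda c: c.ppg, reverse=True)

def required_ppg(club: Club, target_points: int, season_games: int = SEASON_GAMES) -> float:
    """PPG needed over the remaining matches to finish on target_points."""
    remaining = season_games - club.played
    needed = target_points - club.points
    if remaining <= 0:
        return 0.0 if needed <= 0 else float("inf")
    return max(needed, 0) / remaining

def race_box(club: Club, low_target: int):
    """Left edge, midpoint, and right edge of a 2-point-wide box."""
    return tuple(required_ppg(club, low_target + k) for k in (0, 1, 2))
```

For the Supporters Shield box described above, `race_box(club, 64)` gives the paces for 64, 65, and 66 points.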

The three races outlined are the most important when considering playoff implications. The Supporters Shield holder gets a CONCACAF Champions League (CCL) spot next year, the top three in each conference get byes to the conference semifinals, and fourth place hosts fifth in one wildcard match, with the winner meeting the top seed in the next round. None of the top three really get an advantage over the others, since the conference semifinals and finals are in home and home format, which neuters home field advantage. Sure, fourth is preferable to fifth because of wildcard hosting, but that’s a small factor in comparison to a bye or a CCL spot.

What we’re left with is a clear indication that the Supporters Shield race is between Seattle and LA, with DC or RSL only able to enter it with a huge rally alongside a stumble from the leaders. Essentially, the West has paired off with a Sounders/Galaxy battle for first, Salt Lake/Dallas grappling for third, and Vancouver/Colorado desperately trying to join those four in the playoffs, at least as the visitors in that wildcard match. Meanwhile, the East is one big jumble, with each club seemingly capable of falling or rising by a couple spots when all is said and done. Except Montreal, who can safely set up shop in the cellar.

The admitted blind spots in all this are schedule strength and tiebreakers. I note wins and goal differential in the writeups that pop up when you select a club in the table, but the specific point targets are conference-wide. A team with three more ties than clubs they are close to (prime example, Chicago, who may well set the MLS single season record for draws) would need to aim one point higher. Meanwhile, I’m not including fixture difficulty, but the PPG pace needed is still highly applicable whether the road to get there is rocky or smooth.

Overall, this should be markedly more helpful than the standard league table, but for other advanced views of MLS races, I highly recommend simulations on Sounder at Heart in their series, “State of the MLS Run In,” as well as on American Soccer Analysis. These sites take a more nuanced approach while projecting for each club, while I can update mine quickly and allow for assessment of all the races at a glance.

Google Trends in USA Sports Part 2: Regionality

A while back¹, I looked into American Google Trends scores across 16 sports, both overall and in terms of seasonality. Inspired by FiveThirtyEight’s riffs on Trends, I found that Trends is a gauge of online interest or intrigue, not popularity per se, but the data is quite useful for comparing sports in some contexts. One of Trends’ most interesting features is the way it parses results geographically. While some sports’ fanbases seem likely to use Google to differing degrees (as illustrated by Part 1’s hypothetical baseball and soccer fans), conceptually I felt that Googling bias when comparing states should be smaller², making intra-sport regionality quite instructive. The map in the following Tableau dashboard compares each sport only against itself, state-to-state. Click the dropdown list in the top left to switch sports.

Note that a state’s Trends scores are all relative to the state in which that sport created the most interest, which always has a score of 100. For example, every other state’s score for tackle football is essentially a percentage of the Google searches made per Google use for that sport in Alabama. Soccer is a pretty steady presence from sea to shining sea, though its geographic consistency is not as pronounced as the year-round consistency its no-offseason nature lent it in Part 1’s seasonality study. Virginia’s top spot is interesting, as the state’s direct soccer legacy can be traced mainly to college soccer, which gets almost zero media coverage, and D.C. United, which plays its matches on the other side of the Potomac. The nation’s capital does seem to drive the results, though, as all of the most soccer-Trending Virginia cities reside relatively close to it, as illustrated in this map taken straight from Google Trends:
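Google doesn’t publish every detail of its scaling, but the behavior described above can be approximated by dividing each state’s search share by the top state’s share and rounding to a 0-100 score. Here is a rough sketch. The input is hypothetical, since Trends only exposes the scaled scores, not the underlying shares.

```python
def trends_scale(search_share):
    """Rescale per-state search shares so the top state scores 100.

    search_share: {state: searches for the sport / all searches}, a
    hypothetical input used only to illustrate the relative scale.
    """
    top = max(search_share.values())
    return {state: round(100 * share / top) for state, share in search_share.items()}
```

Under this approximation, a state scoring 50 searched for the sport at half the rate of the top state, not half as many times in total.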

This map does not tell us whether it is driven by D.C. United, University of Virginia soccer, general demographics, or foreign-born ambassadors/lobbyists/etc living in the suburbs and searching for the beautiful game there, but it is notable that UVA is in Charlottesville, a couple hours away from the state’s soccer Trends epicenter.

Generally, soccer seems to be at its strongest in high-population states, as the top seven states by population all land 65 or above on Trends’ 100-point scale. The sport struggles most in sparsely-populated northern states, like North Dakota, Montana, Wyoming, and Alaska, with the 49th state setting the beautiful game’s floor (a relatively high floor at that) for American interest.

In some other sports we start to see a bias toward hotbeds of that particular sport in NCAA competitions. Alabama leads tackle football in Trends, with many of the NCAA’s most storied football programs (and very few states that house NFL teams) scattered through the top 12. Basketball’s top 10 states contain only three NBA franchises, the Indiana Pacers, Charlotte Bobcats, and the Cleveland Cavaliers, and all three of those states have hugely popular college teams as well. The most striking example is Louisiana topping baseball scores. LSU baseball is legendary in the niche world of college baseball, having been to 16 College World Series with six championships in the last 28 years. The Tigers have had the NCAA’s #1 baseball attendance for 19 straight years, averaging 10,754 per game when no other college drew more than 7,700. Similarly, other non-MLB states like Mississippi, Nebraska, South Carolina, Alabama, and Arkansas are in the baseball top 10.

Meanwhile, ice hockey unsurprisingly has a decidedly northern thrust. Only eight states had Trends scores above 40, and the furthest south of them was Massachusetts. Outside of Alaska (scoring 47), the westernmost of them is North Dakota. At the other end of the spectrum, 23 states had hockey Trends scores below 15. This means that in all of these states (mostly in the South and West), hockey searches, as a share of all searches, ran at less than 15% of the rate in the sport’s standard-bearing state, Minnesota. Of course, the severity of this geographic trend could very well be exaggerated by college hockey programs, which are far more regional than the NHL.
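To make the 0-100 scale concrete, here is a minimal sketch of the rescaling as this post describes it, assuming each state’s raw value is its share of all Google searches going to the sport, and that the top state is pinned at 100 with everything rounded to whole numbers. The state values below are invented for illustration and are not real Trends data, and the function name `trends_scale` is hypothetical; Google’s own pipeline involves more steps than this.

```python
# Invented per-state values, NOT real Google Trends data:
# each number is hockey searches divided by all Google searches in that state.
raw_share = {
    "Minnesota": 0.0040,
    "Massachusetts": 0.0024,
    "North Dakota": 0.0032,
    "Texas": 0.0004,
    "Arizona": 0.0002,
}

def trends_scale(shares):
    """Rescale so the highest-interest state scores 100 and every other
    state's score is its share as a (rounded) percentage of that peak."""
    peak = max(shares.values())
    return {state: round(100 * share / peak) for state, share in shares.items()}

scores = trends_scale(raw_share)

# A score below 15 means the state's search share is under 15% of the top state's.
low_interest = sorted(state for state, score in scores.items() if score < 15)

print(scores)
print(low_interest)
```

With these made-up inputs, Minnesota lands at 100, while Texas (10) and Arizona (5) fall below the 15-point line, which is the same "less than 15% as often" reading applied to the 23 low-scoring states above.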

Based on standard deviations of state Trends scores, cricket and lacrosse join hockey in drawing the most uneven interest across states in this country. Cricket’s Trends scores seem to somewhat mirror the map of Indian immigration in this country, while lacrosse, for reasons unknown to me, seems to have elicited the heaviest interest in Wisconsin, Maryland, Connecticut, and some other Northeastern states. If that list makes sense to you, please clue me in on lacrosse patterns in the comments. (Note: Jason Kuenle commented below that Wisconsin has a town named “La Crosse,” which unfortunately means that the lacrosse Trends scores for that state and some of its neighbors are decidedly suspect.)
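The standard-deviation comparison works roughly like the sketch below: take each sport’s list of state scores, compute its spread, and rank sports from most to least uneven. The scores here are invented five-state examples, not the real 50-state data, and `spread_ranking` is a hypothetical helper name. I use the population standard deviation (`statistics.pstdev`); the post does not say which variant was used, but when every sport is scored over the same set of states, the sample version (`statistics.stdev`) differs only by a constant factor and produces the same ranking.

```python
from statistics import pstdev

# Invented Trends-style scores (0-100) for the same five states; NOT real data.
scores_by_sport = {
    "soccer":  [100, 85, 78, 70, 66],
    "hockey":  [100, 47, 12, 8, 5],
    "cricket": [100, 30, 9, 4, 2],
}

def spread_ranking(table):
    """Rank sports from most to least geographically uneven interest,
    using the population standard deviation of their state scores."""
    spreads = {sport: pstdev(values) for sport, values in table.items()}
    return sorted(spreads, key=spreads.get, reverse=True)

print(spread_ranking(scores_by_sport))
```

A steady sport like soccer clusters near the top of the scale and gets a small spread, while sports that are huge in one or two states and nearly absent elsewhere get a large one.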

Hawaii sticks out as the state with the most distinctive sports profile, popping up as a top three state for niche sports like rugby, ultimate fighting, volleyball, and swimming. If you are one of the few Americans with a passion for those competitions, good news: you may be best off living on a tropical island. Meanwhile, Hawaii has scores below 50 for each of the top four Trends sports: tackle football, soccer, basketball, and baseball.

If someone is deeply convinced that a certain sport will never be big in America, it could well be that they simply haven’t visited one of its domestic hotbeds³. While the apparent college sports bias does raise questions about the actual meaning of Google Trends scores, I feel that they are still a nice way to map online interest. You just have to realize that those driving online interest are likely far younger than the national average and their googling tendencies may well be skewed by which schools they attend. All of the above is a bit fragmented, but the biggest lesson here is that so are American sports preferences.


¹ OK, a lot earlier. This article got delayed because I have been quite busy personally and professionally of late. The delay is kind of a blessing in disguise, though, as it gave me time to mull the results of this study a bit more and come out the other side with what I think is an improved analysis.
² Especially given that each state’s Trends score is scaled based on overall Google searches in that state.
³ It is also worth noting that there are fluctuations within states, though Trends’ publicly available data only details the most popular cities for a particular search, meaning it would have been more difficult for me to map them below the state level.

Just For Fun, an Interactive Tableau Doodle

Ever since I first charted a cosine curve on my Casio graphing calculator in high school pre-calc, I’ve been a fan of data visualization. The vast majority of the time, I employ it to learn things, but sometimes I just doodle for the aesthetic appeal. When this is the objective, I often use that trusty cosine curve as my basis.

Tableau has a contest this month for using their product artistically, and I decided to make a cosine graph, its vertical mirror image, and their trend lines, with parameters that users can play with to change the amplitude, wavelength, color, etc. of the waves. I’m not defining what each of the four parameters controls at this point (if anyone cares, I’ll gladly post the formula that drives it); I just hope that someone enjoys tinkering with it like I do.
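The formula behind the doodle isn’t given here, so the following is only a guess at the general idea, not the actual viz: sample y = A·cos(2πx/λ) over a range of x, then flip it across the x-axis to get the mirror image, with amplitude A and wavelength λ as the user-controlled parameters. The function name `cosine_pair` and its arguments are hypothetical, and color and the trend lines are left out.

```python
import math

def cosine_pair(amplitude, wavelength, n_points=200, x_max=10.0):
    """Sample y = amplitude * cos(2*pi*x / wavelength) on [0, x_max],
    plus its mirror image reflected across the x-axis (y -> -y)."""
    xs = [x_max * i / (n_points - 1) for i in range(n_points)]
    wave = [amplitude * math.cos(2 * math.pi * x / wavelength) for x in xs]
    mirror = [-y for y in wave]
    return xs, wave, mirror

# Changing these two arguments plays the role of the viz's parameter sliders.
xs, wave, mirror = cosine_pair(amplitude=2.0, wavelength=4.0)
print(wave[0], mirror[0])
```

Plotting `wave` and `mirror` against `xs` gives two curves that meet wherever the cosine crosses zero, which is most of the visual appeal of the pairing.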

Update: There are some absolutely brilliant finalists for Tableau’s Viz as Art contest, and mine is justifiably not one of them. Highlights include Matthew Bennett’s Pure Data Harmonographs, Robert Mundigi’s adaptations of Curtis Steiner’s 1000 Blocks sculptures, and George Gorczynski’s “Tendrils”. Mine is just a silly little tool; those pieces are gorgeous.

Charting the Career of a Young, Exciting, Inconsistent Player

Yesterday I wrote about Fabian Castillo’s first 100 MLS matches on Big D Soccer. Click that link for a breakdown of the early career of a speedy young player (whom I’ve written about before) who can be frustrating and exciting, often at the same time. Here’s the chart I made chronicling his averages over 10-match periods thus far:

This graph was only supposed to be a little exercise in descriptive stats, but after I made it, I realized that I could drum up something similar in Tableau that could easily include some predictive analytics and be repeated for other MLS players, whose player pages all include a game log dating back to 2010. I’m just not sure there’s a market for that, so for now I’ll just post the graphic and ask for your feedback.
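For anyone who wants to build something similar from an MLS game log, the core step is averaging a per-match stat over consecutive 10-match blocks (matches 1-10, 11-20, and so on). The sketch below assumes non-overlapping blocks rather than a rolling window, which is how I read “10-match periods”; the numbers are an invented 23-match log, not Castillo’s real stats, and `block_averages` is a hypothetical helper name.

```python
from statistics import mean

def block_averages(per_match, block_size=10):
    """Average a per-match stat over consecutive, non-overlapping blocks
    (matches 1-10, 11-20, ...). A trailing partial block is averaged over
    however many matches it contains."""
    return [
        mean(per_match[start:start + block_size])
        for start in range(0, len(per_match), block_size)
    ]

# Invented game log: shots per match over 23 matches (NOT real data).
shots = [2, 3, 1, 0, 4, 2, 2, 3, 1, 2,
         5, 4, 3, 2, 6, 1, 0, 3, 4, 2,
         1, 2, 3]

print(block_averages(shots))
```

For a player with exactly 100 matches, this yields ten averages, one per 10-match period. A rolling average would be the natural alternative if you wanted a smoother line with a point for every match.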