Google Trends in Sports Part 1: Seasonality and “Popularity”

Nate Silver’s refurbished FiveThirtyEight has run some interesting columns recently, centered on North American sports teams’ and leagues’ results in Google Trends. All of them are recommended reading, though I do have some reservations, particularly about describing Trends data in terms of popularity. We’ll get to that later. Mostly I came away curious about some issues on the periphery of FiveThirtyEight’s analysis. Silver and his colleagues used worldwide search data, but what would this look like when confined to the United States? It stands to reason that in this vast country, there should be some interesting seasonal and regional issues at play. Also, the FiveThirtyEight focus on teams and leagues was interesting, but since most sports have multiple college and pro options for people to follow, I wanted to investigate at the level of the sports themselves.

Thankfully, Google Trends makes it very easy to gain insight on all these levels. I narrowed searches to April 2006 through March 2014, in order to get a large sample without over-representing particular months or even the four-year cycles that are key to some sports’ biggest events. Searching for what are commonly thought of as the big five sports in the USA yielded the following graph:

“Football” and “American Football” are Trends’ official categories for soccer and tackle football (and looking at Google’s export data in detail, I’m confident that the results’ term confusion is minimal). I’ve chosen to make the terminology more USA-centric within the rest of the article and my own visualizations.

Online interest in these sports certainly ebbs and flows over the course of every year, and it appears to follow annual patterns, with the biggest divergences coming from soccer and hockey, in association with World Cups and Olympics, respectively. Tackle football and basketball have the highest peaks, while ice hockey only very rarely even threatens to be the fourth most-searched sport in the country. But the graph is a bit jumbled, and would be even more so if I added in other sports.

Nicely, Google Trends allows users to download the data so they can take a deeper dive themselves. I pulled reports for every sport I could think of (always including tackle football in the query, as Trends scales results relative to the most popular single instance) and put them into one master list. Trends did not offer mixed martial arts and golf as sports (if there isn’t a Trends category for a sport, then the data would also reflect searches for the Volkswagen Golf, for example), so I searched for their dominant competitions, the UFC and PGA, which have Trends categories as leagues, instead. Sports with overall ratings of zero (sorry, bicycle racing and rugby union) were excluded. With that data set in hand, Tableau Public made it easy to compile a graph of online interest in each of these sports, averaging across each individual week of the year. The minor sports are hard to see, but using the column graph to filter to those smaller competitions makes it easy to compare them.
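For anyone who would rather script the week-of-year averaging than build it in Tableau, here is a minimal Python sketch of the idea. It is not the workflow I actually used: the row layout, the `average_by_week_of_year` helper, and the scores are all invented for illustration, and it assumes the weekly scores from every export have already been combined into one list on a common scale.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical rows: (week start date, sport, Trends score on a 0-100 scale).
# These values are made up for illustration; they are not the article's data.
rows = [
    (date(2012, 9, 2), "Tackle Football", 100),
    (date(2013, 9, 1), "Tackle Football", 90),
    (date(2012, 9, 2), "Soccer", 40),
    (date(2013, 9, 1), "Soccer", 44),
    (date(2013, 3, 17), "Basketball", 80),
]


def average_by_week_of_year(rows):
    """Average scores for each (sport, ISO week number) pair across all years."""
    buckets = defaultdict(list)
    for week_start, sport, score in rows:
        # isocalendar() returns (ISO year, ISO week number, ISO weekday).
        week_number = week_start.isocalendar()[1]
        buckets[(sport, week_number)].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}


weekly_profile = average_by_week_of_year(rows)
```

Grouping by ISO week number is only one way to line up the same week across years. Because calendar dates drift from year to year, a given week's bucket will not always hold exactly the same dates.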

Unsurprisingly, the patterns that were looped eight times in the first graph are easier to spot in this view. The two consistent high points are tackle football cresting as August turns to September (just prior to the NFL season), and basketball doing so in mid-March. It’s probably not a coincidence that both fantasy football drafts and March Madness brackets are commonly researched online.

Soccer is steadily in second or third place throughout the year, even popping into first during most of June and July. Hockey just doesn’t see nearly as much interest as the other major sports (but does feature prominently in some states, which we’ll get into when I write about regionality during part two of this series), and even during the NHL season it is sometimes out-searched by auto racing, tennis, swimming, MMA, and boxing. Online interest in baseball certainly exceeds that of hockey, but not many others, and its scores are highest in the first couple months of the MLB season.

Counter-intuitively, online interest seems to rarely peak in association with the end of a sports season, when trophies get handed out and TV ratings are at their highest. This gets at the heart of some issues around “popularity,” the word FiveThirtyEight used most often when presenting Google Trends data. Trends is literally a record of how often Google users search regarding topics, and while that points in the direction of popularity, labeling it as online interest or online intrigue would be more accurate, and reflects verbiage from Google’s own descriptions of the service. To illustrate, let’s look at a couple hypothetical fans:

1) A baseball fan in his sixties, who has season tickets to his local MLB park, and new information is regularly available to him through TV, newspapers, magazines, etc., which he takes advantage of religiously. Maybe he bookmarks his favorite baseball websites, but unless he’s big on sabermetrics, he probably doesn’t need to make baseball-related Google searches very often. Even then he might just thumb through his well-worn Baseball Almanac.

2) A twenty-something soccer supporter, who follows multiple teams and leagues around the world. Even if she has a great TV package that carries all the matches she wants to see, she still hits Google regularly to make sure she knows which channel to turn to for a particular match. If her cable package doesn’t include the ESPN, Fox Sports, beIN, GolTV, etc. channel that she needs to watch that fixture, a mass of Googling will probably be necessary to find a reliable-enough, illicit stream of the match.

Is one of them more interested in their favorite sport than the other? They could both be labeled fanatics, but they would have enormously different impacts on Google Trends. Obviously, these are two extremes, but it seems to me that sports that attract younger fans and provide them a natural need to search for information pertinent to that sport online are generally going to have inflated Trends scores, relative to their overall popularity.

Driving the earlier graphs by median provides another useful angle to explore this data, albeit one that would seem to reflect overall popularity to an even lesser extent.

Soccer is number one! Well, at least it is when isolating the most middle-of-the-road week (layman’s definition of the median) and judging by online intrigue. There isn’t an offseason, since European leagues, MLS, Liga MX, and others all have different schedules, and as I implied in describing the hypothetical fan earlier, soccer fans in this country generally tend to be young, open to technology, and have a real need to use Google in order to follow their sport.
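To see why a sport with no offseason can lead by median while trailing by average, here is a small Python sketch. The two series are invented numbers for a hypothetical “spiky” sport and a hypothetical “steady” one, not actual Trends scores.

```python
from statistics import mean, median

# Invented weekly scores for two hypothetical sports (not real Trends data).
spiky = [5, 5, 5, 5, 5, 5, 5, 5, 80, 100]           # quiet most weeks, huge peaks
steady = [18, 20, 19, 21, 20, 19, 20, 21, 20, 22]   # no real offseason

summary = {
    name: {"mean": mean(scores), "median": median(scores)}
    for name, scores in {"spiky": spiky, "steady": steady}.items()
}

for name, stats in summary.items():
    print(name, stats)
```

In this toy example the spiky sport has the higher mean, because two enormous weeks pull the average up, but the steady sport has the far higher median, since its typical week sits well above the spiky sport’s typical week.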

It’s also worth noting here that while Trends enhancements claim to specialize in sorting out tricky terms, “football” is absolutely one of the trickiest, and is very much central to this study. I’m not as concerned with this as some, because I would hope that Google is using outcomes as well as search terms in its criteria. It seems unlikely that the average American would click a link concerning David Beckham when they were trying to search regarding NFL football.

Based on media exposure, advertising revenue, and other factors, we know that soccer isn’t one of the top three of the States’ sports hierarchy, let alone at the head of it. But when factoring in searches for every professional and amateur competition, as well as equipment and miscellaneous other sport-related queries, the beautiful game is clearly the steadiest presence in this Trends data.

While it would be reckless to claim that Google Trends illustrates an indisputable hierarchy of sports in the United States, beware any claims that it is utterly irrelevant. Every year, more and more of American life takes place online, and online interest should at the very least be of great interest to anyone writing about sports online, or who is tasked with marketing to the valuable demographics that skew toward Google Trends.

Since the Tableau visualizations in this post have implications well beyond my usually soccer-centric audience, it’s a good time to note that anyone can feel free to embed any of my Tableau Public work on their own website. Just click the “Share” link in the bottom left of any viz and copy the embed code into your site’s editor that is labeled HTML or text. Mentioning me in the article and on Twitter (@StatHunting) would of course be appreciated as well.

  1. You’re really confusing the issue by labeling the US Google Trends as “American football” and “Football”… Almost no one in the US is googling “Football” when they look for soccer info.

    I can only assume you mean “Football” and “Soccer” and have mislabeled these (I’m guessing that’s the case based on “Football” peaking once every four years during the World Cup), otherwise this research is totally useless.
    You really should fix this to avoid confusion and people questioning your data.

    • Good point, but that’s the way Google Trends labels these things. As an American, when I type in Soccer, then “Football” is the sports category offered, and when I type Football, it spits out “American Football.” I’d share your concern if Trends wasn’t specifically organized to fix it.

      It’s looking at the context of searches and categorizing them. In the larger downloadable summary data from Trends, the top search word in their “Football” category is soccer, and none of the search terms listed seem to imply the NFL and its ilk in any way.

      That said, confusion-avoidance was my reason for using Soccer and Tackle Football for these sports in the visualizations I created (I hope you got that far into the article). I guess I could have just left the direct embed of the Google Trends chart out, but I felt it added something to the discussion and gave appropriate credit to my data source, despite the odd verbiage.

      Thanks for the note, though. I’ve added some text in italics below the chart from Trends as a means of explanation.

