Monday, June 24, 2024

Using Big Data to Win Your March Madness Pool


The rising interest in big data — mining reams of data to help predict outcomes — is affecting many areas of modern life. Could this data-driven approach even help you win your March Madness pool?

Consider my story. One glorious year back in 2001, my March Madness office bracket was a pristine sheet of victory. Leading up to the National Championship game, one side of my bracket was perfect. Yep, perfect. I had correctly picked every single game in the South and Midwest regions. Not one single wrong choice, every upset making me look like a psychic. The other side had a few blemishes, but not many.

Unfortunately, the person chasing me in the standings, despite being far behind in the number of correct picks, still had a chance to win. That individual picked Duke to win it all. I had Arizona. The way this pool was weighted, as so many are, if you pick the National Championship winner correctly, only other people who also picked that winner have a chance of beating you.

Thus, the National Championship matchup of Duke vs. Arizona made this a winner-take-all contest between me and my office nemesis. I lost and promptly joined the ranks of rabid Duke haters.

That loss stung more than others because it was the first year that I really considered data when filling out my bracket. It was the first year that I considered things like the relatively high probability of a five seed losing to a twelve seed. It’s probably a good thing I lost that year. Otherwise, I’d probably be frittering away a month of productivity each March, chasing bracket perfection.

People get crazy when it comes time to fill out the brackets. “In the past, at my old investment fund, I used to organize a huge binder for our CEO every year that analyzed each conference, the teams in that conference, and how they performed during their conference tournaments preceding March Madness,” said Andrew Schrage, co-founder of Money Crashers, a personal finance website. “It took one or two days of work, but those binders gave the CEO a substantial leg up heading into March Madness.”

These days, Schrage believes his old strategy would be ineffective. As the big data experts tell us, we are drowning in an ocean of data. There is so much data available now, and you can access numerous sites that let you run simulations, calculate probabilities and analyze all sorts of data points. Today, you’d be foolish if you were to try to do it all yourself.

However, most people want to actively pick their own brackets. What fun is winning if it was really the Madness-bot 3000 that made all the picks? Most of us want to feel like we made prescient choices. Here, then, are five data-driven ways you can improve your chances in the office pool:

1. Consider The Size of Your Pool

Anyone who’s spent even a little bit of time studying probabilities knows that you’re much better off playing against other players than against the house. When a game favors the house, you can expect to lose over the long run.

However, if you’re going to play players, you have to factor their probable behaviors actively into your strategy. Few do this when it comes time to fill out brackets.

“More than anything else, the number of brackets you’re competing against in your 2013 NCAA bracket contest should dictate your strategy for making picks,” said Tom Federico, CEO of

Small pools (the 2001 pool I was in had only about 20 people in it) favor the powerhouses. “[In a small pool], it’s stupid to make a bunch of big upset picks. Favorites usually win. The cardinal sin of competing in small bracket pools is shooting yourself in the foot by getting too risky, and losing out to someone who picked a bunch of likely winners and got most of them right,” Federico said.

Conversely, in big pools with a thousand or more participants, the favorites will all be overrepresented. Thus, picking the favorite makes your chances of winning a long shot. “If you picked Kentucky in a 1,000-person pool last year, for instance, you had almost no chance to win your pool, even though you got your champion pick right,” Federico added.

In a big pool, it’s likely that roughly forty percent of your opponents will pick Kentucky (or this year, Louisville or Indiana — although neither is as big of a favorite as Kentucky was last year). “Kentucky was a dumb pick in very large pools, whereas a team like two-seed Ohio State made a lot more sense,” he said.
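Federico's large-pool argument can be put into rough numbers. The sketch below is a back-of-the-envelope model, not his method: it assumes that in a very large pool only entrants with the correct champion can win, and that those entrants are equally likely to finish first. The title probabilities and pick shares are hypothetical placeholders, and the model deliberately ignores early-round points, so it should not be applied to small pools.

```python
def crowded_pool_win_chance(p_title: float, pick_share: float, pool_size: int) -> float:
    """Rough chance of winning a large pool with a given champion pick.

    Simplifying assumptions (mine, not the article's): you can only win if
    your champion wins, and you then split the pool evenly with everyone
    else who made the same champion pick.
    """
    # Expected number of opponents who chose the same champion.
    expected_rivals = pick_share * (pool_size - 1)
    # Win only if the pick is right, then share evenly with those rivals.
    return p_title / (1 + expected_rivals)


POOL_SIZE = 1000  # the large-pool case described above

# Hypothetical inputs, not real forecasts or real pick data.
favorite = crowded_pool_win_chance(p_title=0.25, pick_share=0.40, pool_size=POOL_SIZE)
contender = crowded_pool_win_chance(p_title=0.10, pick_share=0.05, pool_size=POOL_SIZE)

print(f"favorite:  {favorite:.4%}")
print(f"contender: {contender:.4%}")
print(f"contender is {contender / favorite:.1f}x more likely to win the pool")
```

Under these made-up inputs the less popular contender comes out ahead even though it is much less likely to win the title, because far fewer opponents share the pick.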

Size matters, but if you really want to Moneyball your way to the top, you’ll need to refine your strategy further.

2. Discern Between the Wisdom of Crowds and Herd Mentality

Now that you’ve considered the size of your pool, also consider who you are playing against. In today’s hyper-connected world, you could, say, monitor social media to see who the trendy upset picks are. You can check out ESPN or Vegas odds to see how crowds are voting.

Again, in large pools, you want to run counter to the crowd, at least as far as the champion is concerned. However, remember that you aren’t playing in a vacuum. If you live in the state of Michigan, are in a local pool and pick three-seed Michigan State and four-seed Michigan to go deep into the tournament, you aren’t picking for value. Rather, you are still following the herd. Sure, ESPN might not value these teams highly, but your neighbors with Spartans and Wolverines flags on their porches will probably overvalue them.

However, if you live in the Research Triangle Park and pick Duke to get knocked off early, you may well get value out of that strategy, especially if you think the three-seed in their region, Michigan State, is a bad matchup for them and will likely knock them off in the Sweet Sixteen anyway. Why not take a flyer on Creighton to knock off Duke, then? It’ll only cost you one game if you’re wrong about Creighton but right about Michigan State.

3. Choose Upsets Wisely

When picking upsets, don’t focus on just the game itself. Also factor in the ramifications for the rest of the bracket. If you think Louisville could lose in the Round of 32, great. But what if anything past that round plays to their strengths? If any one particular upset could really blow up your bracket, stay away from it.

Upsets also should be partially determined by the payouts of your pool. I’ve seen pools that pay out good money for upsets. I’ve also seen pools that give you the same number of points as the team’s seed. That’s where those twelve seeds can really come in handy. You could conceivably have a strategy of picking nothing but upsets in order to almost guarantee yourself some money at the end. However, remember, pools weighted this way influence the herd.

In any pool I’ve ever been in that favors upsets, the real badge of honor (even if the money isn’t as good) is picking the most upsets. You just seem smarter and more daring, especially when any knucklehead could have picked Duke.

Those are exactly the kinds of pools you should pick Duke in, though. Let everyone else fight over the upset money.
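For pools that award points equal to the winning team's seed, a quick expected-value check shows why twelve seeds look attractive on points alone, before you account for how the rest of the pool is picking. This is a minimal sketch: the 35 percent upset probability is a hypothetical input, and the function names are my own.

```python
def expected_points(seed: int, win_prob: float) -> float:
    """Expected points for one pick when a correct pick pays the team's seed."""
    return seed * win_prob


def break_even_upset_prob(favorite_seed: int, underdog_seed: int) -> float:
    """Upset probability above which the underdog is the better pick on points.

    Derived from p * underdog_seed > (1 - p) * favorite_seed.
    """
    return favorite_seed / (favorite_seed + underdog_seed)


P_UPSET = 0.35  # hypothetical chance that a twelve seed beats a five seed

favorite_points = expected_points(5, 1 - P_UPSET)
underdog_points = expected_points(12, P_UPSET)
threshold = break_even_upset_prob(5, 12)

print(f"pick the 5 seed:  {favorite_points:.2f} expected points")
print(f"pick the 12 seed: {underdog_points:.2f} expected points")
print(f"break-even upset probability: {threshold:.3f}")
```

The comparison covers a single game only. It says nothing about later rounds, or about how many of your opponents are chasing the same upset points.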

If you are going to focus heavily on upsets, though, consider probabilities. Since 1985, no sixteen seed has ever knocked off a number one; only six fifteen seeds have ever knocked off number twos (although two of these were last year), and only fourteen teams seeded fourteenth have beaten a number three.

However, a thirteen seed has knocked off a four seed twenty-four times since 1985, and exactly once a year since 2001. And if you choose your eleven- and twelve-seed upsets wisely, since each occurs about once per year on average, you could be in really good shape.
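The historical counts quoted above can be turned into rough per-game upset rates. The sketch below assumes 28 tournaments (1985 through 2012) with four games per seed pairing each year. That span is my assumption, not a figure from the text. It also treats the four games in a year as independent, which is a simplification.

```python
# Upset counts quoted in the article (first-round games since 1985).
UPSETS = {"16 over 1": 0, "15 over 2": 6, "14 over 3": 14, "13 over 4": 24}

# Assumption: 28 tournaments (1985 through 2012), four games per pairing per year.
TOURNAMENTS = 28
GAMES_PER_YEAR = 4


def per_game_rate(upsets: int, tournaments: int = TOURNAMENTS,
                  games_per_year: int = GAMES_PER_YEAR) -> float:
    """Historical share of games in a seed pairing that ended in an upset."""
    return upsets / (tournaments * games_per_year)


def at_least_one(rate: float, games: int = GAMES_PER_YEAR) -> float:
    """Chance of one or more upsets in a year, treating games as independent."""
    return 1 - (1 - rate) ** games


for pairing, count in UPSETS.items():
    rate = per_game_rate(count)
    print(f"{pairing}: {rate:.1%} per game, "
          f"{at_least_one(rate):.1%} chance of at least one per tournament")
```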

4. Favor Sample Size over the Illusion of “Being Hot”

Now, I’m not saying teams don’t get hot. Maybe the hot team had a leaky defense and has finally buckled down. Maybe it stopped settling for low-percentage contested jump shots and has its guards driving to the hoop more often. Maybe a key injured player came back. There are a million and one ways teams can get better in the postseason than the regular season. Last year’s Stanley Cup playoffs are one of the best recent examples of this. The Los Angeles Kings barely made it into the playoffs, but once in, they dominated.

Look a little more closely, though, and it’s not such a big surprise. Los Angeles changed coaches, added a scorer (Jeff Carter) at the trade deadline, and was already one of the top defensive teams in the league. Given the parity in the NHL, not all that much had to improve to vault them from average to Cup contender.

Teams can and do get hot. But hot teams cool off, and in any statistical sample, there is a tendency to regress to the mean.

“Favor teams that won their regular season conference championship; they win more consistently in the Big Dance. Obviously, it’s not the only factor to consider, but we’ve found it to be statistically significant even in the presence of other stuff. On the flip side, teams that got their league’s automatic bid by winning their conference tournament don’t win more frequently, so don’t worry about that,” said Jay Coleman, Professor of Operations Management & Quantitative Methods at the University of North Florida’s Coggin College of Business.

“There’s a satisfying and unsurprising statistics/analytics story there: the bigger sample we get from the regular season is more predictive/representative of teams’ strengths than the relatively tiny sample of games played in conference tournaments. No duh . . . right?” he added.

I’m not sure it’s a no-duh observation. The human mind isn’t particularly good at calculating probabilities, so even obvious statistical insights can run counter to evidence-free conventional wisdom (i.e., go with who’s hot now).
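Coleman's sample-size point can be illustrated with the standard error of a winning percentage, which shrinks with the square root of the number of games. This is a minimal sketch: the game counts and the 70 percent "true" win rate are hypothetical, and it treats each game as an independent coin flip with the same odds.

```python
import math


def win_rate_std_error(true_win_rate: float, games: int) -> float:
    """Standard error of an observed win rate over `games` independent games."""
    return math.sqrt(true_win_rate * (1 - true_win_rate) / games)


TRUE_WIN_RATE = 0.70  # hypothetical underlying team strength

# Hypothetical sample sizes: a full regular season vs. a short conference tournament.
regular_season = win_rate_std_error(TRUE_WIN_RATE, games=30)
conference_tourney = win_rate_std_error(TRUE_WIN_RATE, games=3)

print(f"regular season (30 games): +/- {regular_season:.3f}")
print(f"conference tourney (3 games): +/- {conference_tourney:.3f}")
print(f"the short sample is {conference_tourney / regular_season:.1f}x noisier")
```

With ten times as many games, the regular-season estimate is about three times tighter, which is one way to see why a hot three-game run tells you relatively little.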

5. Don’t Try to Reinvent the Wheel (Trust Nate Silver)

Before you go trying to build a perfect bracket-picking model, why not look at the other models out there, study them for any obvious deficiencies and then simply improve on those?

Nate Silver doesn’t have the reputation for picking March Madness winners that he does for zeroing in on Electoral College votes. Nevertheless, his models are proven to work, and whenever they’re based on weak data, Silver will let you know.

The FiveThirtyEight forecast builds on existing computer models (Sagarin, Pomeroy, etc.) and adds in other factors, such as injury and geography.

This year, judging from FiveThirtyEight projections (and a range of similar forecasting tools), the number one seeds are weaker than in previous years. Moreover, when you consider statistical factors that usually come down to luck more than anything else, such as won-loss record in close games, Florida stands out as the value pick in this tournament. (Well, unless you live in Gainesville.)

However, statistically speaking, there is better than a 30 percent chance that the one, two or three seed from the Midwest region (Louisville, Duke or Michigan State) will win the National Championship.

With any statistical forecasting tool, it’s important to remember that it captures a snapshot in time. If a key factor has changed, such as a serious injury to a top player, you’re no longer comparing apples to apples, even with a large data sample.

“For instance, last year Syracuse was doing really well and most of the models said they were 1 or 2. However, an important player was injured, and it was fairly clear they weren’t going to go that far. In such cases, I’d rather run the model and then override its results,” said Davidson College Math Professor Tim Chartier.

Chartier and his students have developed forecasting software that in past years helped them place in the 97th percentile of the 4.6 million brackets submitted to ESPN.

Chartier’s new free course March MATHness on Udemy, an online learning marketplace, shows how to use three popular sports ranking methods — two of which are used by the Bowl Championship Series — to create your own mathematically-produced brackets for March Madness and pick which teams will prevail in the NCAA Finals.

Another of Chartier’s key pieces of advice is to not rate all games as equal. If you are assigning weight to a victory (or loss), some victories should count more than others. You should, for instance, be suspicious of early season wins. Those could have been compiled by a team that looks a lot different from the one on the floor today.

“If a team is beating big teams at this point in the season and you are treating them as, say, two wins, then, they get elevated in the ratings. This is how we find teams that can otherwise be overlooked,” Chartier said.

However, sometimes his software is used to weight heavily those things that can sink a bracket, and sometimes it pays off. “One year I had a student reward the ability to have winning streaks,” he said. “She had the only bracket that we produced with math that recognized that Baylor was going to be in the Final Four.”
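The idea of counting some wins more than others can be sketched as a weighted winning percentage. This is only an illustration of the weighting concept, not Chartier's software or the ranking methods in his course. The `Game` record, the half-season cutoff, and the sample results are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Game:
    day: int    # day of the season on which the game was played (hypothetical field)
    won: bool   # True if the team being rated won


def weighted_win_pct(games: list[Game], season_length: int,
                     late_weight: float = 2.0) -> float:
    """Win percentage where second-half games count `late_weight` times."""
    total = 0.0
    wins = 0.0
    for game in games:
        # Games after the season's midpoint get the heavier weight.
        weight = late_weight if game.day > season_length / 2 else 1.0
        total += weight
        if game.won:
            wins += weight
    return wins / total if total else 0.0


# Hypothetical team: 2-3 in the first half of the season, 4-1 in the second.
results = [True, False, False, True, False, True, True, True, False, True]
season = [Game(day=10 * (i + 1), won=w) for i, w in enumerate(results)]

plain = weighted_win_pct(season, season_length=100, late_weight=1.0)
weighted = weighted_win_pct(season, season_length=100, late_weight=2.0)

print(f"unweighted: {plain:.3f}")
print(f"late games doubled: {weighted:.3f}")
```

A team that finishes strong rates higher once late games count double, which is the effect Chartier describes when he talks about finding teams that would otherwise be overlooked.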
