Baseball and data have been absorbing passions of mine since i was very young. One of the first things i did when i first got my hands on a computer in the late 1970s was to develop a way of tracking the Major League standings and developing new statistics from the data i got from the newspaper.
While i still love baseball, i've lost a lot of interest in the Major Leagues with the outrageously greedy owners and downright evil usurper of a commissioner. But i still love data and for the last several years i've been tracking the standings in an interesting, beautiful way.
I update the daily win-loss data once a week; i have figured out some tricks, but it still takes quite a while. I'm sure there is a way to have the computer automatically search out the scores every day and insert them into my database, but i haven't figured that out yet.
The graphs are by division. There are 6 divisions, so 6 pennant races in baseball these days (they call it the crown; the pennant is for the league). The reason for such a large number of divisions is to create those painfully long and boring playoffs that take up most of the fall.
I plot the daily standing (winning percentage) of each team in the division. It's a great way to see how the season unfolds. You can SEE pennant races. Teams will tangle for a period, the lines will tie in a knot, then one will move away. You can see slumps and streaks. You can even see doubleheaders if you look closely enough. You can instantly see if a division is a tight race or a blowout. It's also interesting to me how the games seem to have different value as the season progresses. The emotional reaction is that games late in the season mean more, but looking at the graphs, you can see how one game causes a much bigger jump in the standings early in the season than it does late in the season. This is standard mathematics, of course: every game counts exactly the same amount, but it LOOKS different, and it looks different from what we normally assume.
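Here is a rough sketch of that arithmetic in Python. The function names and the sample records are hypothetical, chosen only for illustration; this is not the code behind the graphs.

```python
def daily_winning_pct(results):
    """Winning percentage after each game, given a string of 'W' and 'L'."""
    wins = 0
    pcts = []
    for games_played, result in enumerate(results, start=1):
        if result == "W":
            wins += 1
        pcts.append(wins / games_played)
    return pcts


def jump_from_one_win(wins, games):
    """How far the winning percentage moves if the next game is a win."""
    return (wins + 1) / (games + 1) - wins / games


# Hypothetical .500 teams: one 10 games in, one 150 games in.
early = jump_from_one_win(5, 10)
late = jump_from_one_win(75, 150)

print(f"win after 10 games moves the line by {early:.4f}")
print(f"win after 150 games moves the line by {late:.4f}")
```

For a .500 team, the same single win moves the line by about .045 after 10 games but only about .003 after 150, which is why the lines jump around in April and flatten out by September.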
The dotted line is the league wild card -- the highest-placed second-place team. It's strange to blog about dynamic data. It's going to be frozen in time here. I had been intending to post this at the end of the season, but with a week still to go i find it fascinating to see that gap to the finish line. I've got graphs now for four years, but i'll just post this year's, i think. Every year is different, but they are all beautiful.
American League East
The Yankees and Red Sox tangle it up all season and then the Red Sox slump big at the end. Otherwise no real position changes in the second half of the season, but FOUR teams above .500! The rest of the divisions have only 2 (or 1).
American League Central
The Tigers and Indians tangled mid-season for first, then the Indians dropped down into a battle for second with the White Sox. Twins have a long slow but deep slump after the All Star break. Tigers are the only team above .500.
American League West
Mariners collapse in July but are fairly stable otherwise. Two races, far apart. Two teams above .500, two below.
National League East
Phillies start to finish. Braves have a good season, no one close, but can't touch the Phillies.
National League Central
This wonderful tight knot in July with the Pirates, Cardinals and Brewers ended with a huge split.
National League West





