So being a passionate "soda" guy, I've occasionally gotten into the classic old "soda vs pop" argument. A few times when this has happened, someone has sent me a survey map. This is that map:
At first glance, damn that's a lot of blue. Pop is blue, maybe I've been wrong this whole time. Also damn, that's a lot of red, maybe I've been really wrong. I'm not looking forward to admitting this to my friends who say Pop. But then I started to take a closer look.
Half of those blue states are pretty sparsely populated. I'm not sure North Dakota even has people in it; I think it's only a state because Canada didn't want it. Hell, there are a lot of counties in that blue sea that are purple, otherwise stated as "no data".
I can instantly see Soda rules both NYC and LA, the two biggest cities in the country. But Pop holds Chicago, #3 on that list. So in the interest of science, I decided to dedicate an hour or so of my time this Friday to analyzing this single piece of data to figure out which saying rules: Soda, Coke or Pop.
Here are my very unprofessional findings:
In terms of the map, land mass is easily won by Pop; nothing is touching that amount of blue. But acres without people in them don't count, so let's compare two other categories: cities and states by population. Neither stat is perfect. Cities don't represent entire states population-wise, so using only cities misses out on lots of people. But going by states isn't terribly accurate either, since cities hold large portions of a state's population and are concentrated in small areas. For example, look above: Colorado has about 2 counties that aren't blue, but one of those counties is the location of Colorado Springs, the 41st most populated city in the nation. So that tiny yellow dot means more than it appears to. That's why I'll look at both cities and states and make a judgement after that, instead of just using one or the other.
States:
Several states appear to be split geographically. Let's go with the states that are all or 95% one color:
Now let's look at split states:
For the record, I used Wikipedia's "Projected Population in 2011" numbers.
http://en.wikipedia.org/wiki/List_of_U.S._states_by_population
Soda easily wins the solid states by over 10 million people. Coke and Pop are pretty close. Pop easily has the most states under its influence, but most of those states have low populations (NYC has 8 times Montana's population, and more than 14 times Wyoming's), so while they have the area, they don't have the people. However, Soda has the entirety of California, and Coke the entirety of Texas. Pop has no such big-population state to itself; the biggest states Pop is in are split, like New York and Pennsylvania. And the states split between Soda and Pop have nearly as many people as Pop's entire solid-state population. So what to do about that?
If we divide the split states evenly and give each side its share, what happens to the numbers?
Soda:
Half of 56,926,166 is 28,463,083
Half of 36,810,547 is 18,405,274
One third of Alaska is 240,906
Add those three shares up plus Soda's solid states and Soda's total is 121,342,338
Pop:
Half of 56,926,166 is 28,463,083
Half of 10,886,278 is 5,443,139
One third of Alaska is 240,906
Add those up plus Pop's solid states and Pop's total is 93,539,923
Coke:
Half of 10,886,278 is 5,443,139
Half of 36,810,547 is 18,405,274
One third of Alaska is 240,906
Add those up plus Coke's solid states and Coke's total is 85,655,860
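For anyone who wants to check the even-split arithmetic, here is a rough Python sketch. The three split totals and the Alaska third are the figures quoted above; the names `SPLITS`, `ALASKA_THIRD` and `split_share` are made up for this example. It only computes the split-state share each word picks up, since the solid-state totals are not listed individually here.

```python
# Split-state population totals quoted in the post, keyed by the two
# words that share them. The pairing is read off the sums above.
SPLITS = {
    ("soda", "pop"): 56_926_166,
    ("soda", "coke"): 36_810_547,
    ("pop", "coke"): 10_886_278,
}

# Alaska is split three ways; this is the one-third figure from the post.
ALASKA_THIRD = 240_906


def split_share(term):
    """Population credited to `term` from split states under an even split."""
    # (pop + 1) // 2 halves the total and rounds .5 up, matching the post's
    # rounding of 36,810,547 / 2 to 18,405,274.
    halves = sum((pop + 1) // 2 for pair, pop in SPLITS.items() if term in pair)
    return halves + ALASKA_THIRD


for term in ("soda", "pop", "coke"):
    print(f"{term}: {split_share(term):,}")
```

Adding each word's solid-state population to its share gives the combined total for that word.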
So the standings switch slightly. Soda still dominates, but Pop is now edging out Coke for second place. But most of you can probably pinpoint the obvious flaw in that second computation: splitting the split states right down the middle by population is not accurate whatsoever. Look at Pennsylvania. On the map it appears to be split directly down the middle. But Eastern PA is far, far more populated than Western PA. Philly is the 5th biggest city in the nation, and Eastern PA also has Harrisburg and Scranton. Western PA's biggest city is Pittsburgh, and Pitt is all the way down at #61. So splitting Pennsylvania in half is actually very flawed. Basically, we need more information: judging by solid states alone is flawed, but so is judging by evenly divided split states. Let's move on to the other big category: cities.
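To get a feel for how much the even-split assumption matters, here is a small Python sketch. The 56,926,166 figure is the Soda/Pop split total from above; the 60/40 weighting is purely hypothetical and is not taken from the survey or any census data, and `weighted_split` is a helper invented for this example.

```python
def weighted_split(total, first_fraction):
    """Divide `total` between two sides, giving `first_fraction` to the first."""
    first = round(total * first_fraction)
    return first, total - first


SODA_POP_SPLIT = 56_926_166  # Soda/Pop split-state total quoted in the post

even = weighted_split(SODA_POP_SPLIT, 0.5)
# Hypothetical: suppose 60% of the people in these states live on the Soda side.
tilted = weighted_split(SODA_POP_SPLIT, 0.6)

# How many people move from Pop to Soda when the split tilts from 50% to 60%.
swing = tilted[0] - even[0]
print(f"even: {even}, tilted: {tilted}, swing: {swing:,}")
```

Even a modest tilt like that moves millions of people from one column to the other, which is why an even split can't settle the split states.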
For this list, I used the top 100 cities by 2011 estimate:
http://en.wikipedia.org/wiki/List_of_United_States_cities_by_population
Woof. This is where it's pretty obvious Soda is our winner. I don't even need to add it up. NYC and LA at #1 and #2 combine for about 12 million already, which is more than a big chunk of the other cities put together. Considering Pop goes from #3 to #15, Pop stands no chance. #50 to #100 is just a range of about 150 thousand, so late in the list there are no real discrepancies between cities: the top 10 decides this list. Soda has 6 of the top 10, Coke has 3, and Pop just 1.
Soda's got the stats no matter which way you look at it. The only category Soda does not win is land area. What surprised me about all this is that Coke is far more prevalent than I thought. I knew it was a southern thing, but Coke has a lot of people saying it (which is weird, because Coke is a brand of Soda).
I've always been a Soda guy, so today I am triumphant.
Of course, this is all based off one single survey map that may not have asked enough people, and I am laughably unqualified as a statistician. The only real way to figure this out would be to put "soda, pop, coke or other" on a national ballot. But take that, Pop. If you say Pop, you are wrong. The numbers say so. So go drown your sorrows in some Soda.




I found the original source for the image:
http://www.popvssoda.com/
According to the compiled statistics (which do have county-wide breakdowns for every state), "pop" actually significantly edges out "coke" and even gives "soda" a run for its money. However, there are some serious problems with the data. Never mind that it hasn't been updated since 2002 (just over 10 years, in fact); the author mentions that the sampling is probably biased but neglects to mention it's probably incredibly biased and even borderline wrong.
I'm not a statistician either, but I do use statistics on a regular basis, and can say with near certainty (95% confidence band, as we'd say) that this data isn't nearly rigorous enough to justify the numbers he gets.
I commend you on your analysis; it's a very nice job. But as we say in academia (and probably industry): crap in, crap out. The dataset is ultimately shit, so in spite of your heroic efforts to extract meaning, there just isn't meaning to be pulled from it.
But people who say "pop" are still wrong.