Posts Tagged ‘basic_facts’

Earthquakes

November 30, 2011

> data(quakes)
> head(quakes)

     lat   long depth mag stations
1 -20.42 181.62   562 4.8       41
2 -20.62 181.03   650 4.2       15
3 -26.00 184.10    42 5.4       43
4 -17.97 181.66   626 4.1       19
5 -20.42 181.96   649 4.0       11
6 -19.68 184.31   195 4.0       12

> summary(quakes)

      lat              long           depth            mag      
 Min.   :-38.59   Min.   :165.7   Min.   : 40.0   Min.   :4.00  
 1st Qu.:-23.47   1st Qu.:179.6   1st Qu.: 99.0   1st Qu.:4.30  
 Median :-20.30   Median :181.4   Median :247.0   Median :4.60  
 Mean   :-20.64   Mean   :179.5   Mean   :311.4   Mean   :4.62  
 3rd Qu.:-17.64   3rd Qu.:183.2   3rd Qu.:543.0   3rd Qu.:4.90  
 Max.   :-10.72   Max.   :188.1   Max.   :680.0   Max.   :6.40  
    stations     
 Min.   : 10.00  
 1st Qu.: 18.00  
 Median : 27.00  
 Mean   : 33.42  
 3rd Qu.: 42.00  
 Max.   :132.00  

> plot(quakes, pch = 20, col = rgb(0, 0, 0, .1), lwd = .6)

> require(ggplot2)
> qplot(data = quakes, x = lat, y = long, size = exp(mag), color = mag, alpha = I(.8))
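If ggplot2 isn't installed, roughly the same picture can be drawn with base graphics. This is only a sketch: the variable names, the rescaling of point size so the largest quake gets cex = 4, the three colour bands and the choice to put longitude on the horizontal axis are all my own assumptions, not part of the call above.

```r
# Base-graphics sketch of a magnitude-scaled quake map (no ggplot2 needed).
data(quakes)

# Point size grows with exp(mag), rescaled so the largest quake has cex = 4.
pt_size <- 4 * exp(quakes$mag) / max(exp(quakes$mag))

# Three magnitude bands: [4,5], (5,6], (6,7].
mag_band <- cut(quakes$mag, breaks = 4:7, include.lowest = TRUE)
band_col <- c("grey40", "orange", "red")[as.integer(mag_band)]

# Longitude on the horizontal axis so the plot reads like a map.
plot(quakes$long, quakes$lat,
     pch = 20, cex = pt_size,
     col = adjustcolor(band_col, alpha.f = 0.8),
     xlab = "long", ylab = "lat")
legend("topright", legend = levels(mag_band),
       col = c("grey40", "orange", "red"), pch = 20, title = "mag")
```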

UPDATE: In the comments, Sean Mulcahy shared his much better post on earthquakes: http://seanmulcahy.blogspot.com/2011/11/global-earthquakes-desktop.html. He shows how to grab up-to-date earthquake data from the U.S. Geological Survey and display it with R’s maps package. Hooray!

October 27, 2011

Fair Trade cocoa price, 1996-2006

You can see from the above graph that fair trade certifiers aim not just to raise, but to raise and stabilise the price a farmer or cooperative receives for produce.

Fact: There are many fair trade certifying bodies; this data comes from Flo-CERT GmbH, a non-profit based in Bonn. Flo-CERT pays X employees to verify that cocoa, coffee, and other popular consumption products are farmed and sold according to Fair Trade standards.

How much extra are you willing to pay these people (their efforts are part of the extra cost of fair trade goods) so that the farmers are guaranteed predictable revenues?

Fact 2: The loathèd corporation Starbucks has been paying stable above-market rates for their coffee for years.

Fact 3: The charts above depict a one-dimensional price. Of course each coffee/cocoa bean is unique; so is every farm and every farmer. For a commodity to be traded from hand to hand to hand, it needs to be standardised. But a big buyer like Starbucks which deals through its own channels with farmers might pay a higher price simply because it’s also requiring a higher grade of beans — thus leaving the middle quality ones to be sold at the regular market rate.

Conclusion: Nothing is as simple or clear-cut as it at first seems.

Average Wages by US County, 1Q2011

October 16, 2011

When people pontificate about national politics, I find the dialogue too generalistic.

These discussions ignore most of the interesting variation and lose touch with real places. And, certain facts that are obvious if you’re familiar with the more specific numbers seem “miraculous” when you just hear one nation-level statistic. (Tax statistics are one such.)

Consider the US unemployment rate, for example. Not only does that figure make it sound like the same 9.5% are unemployed — not true, it’s just an aggregate of all hirings & firings and business openings & business closings — but the unemployment rate in Dane, WI, doesn’t really affect me, because I live in Monroe, IN. If I see some really, really, really compelling place — like Travis, TX — I might uproot my entire life and thenceforth be affected by the data in Travis, TX. And a nearby, culturally good place like Louisville is relevant. I moved to Louisville for a while for a job. But mostly, I need to focus on improving the economy in Monroe, IN.

I remember very well, when I was running my first business, reading grim economic news about the rest of the country. Mall-dwelling retard businesses, national franchises leveraged on the assumption that all of their new franchisees will face good economic conditions … they were affected by the national statistics, but not me. The newspapers kept shouting about how bad things were and I didn’t see it at all.

 

I think if people were primed by reading a table like this before engaging in debates, a lot fewer overly-generalistic ideas would be floated. Looking at regional variation puts me in a frame of mind that’s more specific, more sub_i sub_j, in touch with data and out of touch with theory.

N America is too big for anyone’s imagination. Europe is too big for anyone’s imagination. Africa is too big for anyone’s imagination. China is too big for anyone’s imagination. India is too big for anyone’s imagination. Theory makes the world seem small, which is necessary to be able to comprehend huge topics. But Theory can make you overconfident. Data humble you.

The question

  • How will policy X create green jobs in Monroe County? in Travis County? in Lancaster County?

gets my gears running very differently than the question

  • “How will policy X create green jobs?”

Importantly, the first question is more bullsh~t-proof, even though logically a “Create green jobs” type of claim should be evaluated as the sum total of all green jobs created in every county.

The number after each county name is the average weekly wage, in dollars.

Table 1. Covered(1) establishments, employment, and wages in the 323 largest counties,
first quarter 2011(2)
                                                                                                       
                                                                                                       
County	                        Average weekly wage
United States(6).........	935
	
San Juan, PR.............	598
Peoria, IL...............	944
Santa Clara, CA..........	1863
Macomb, MI...............	941
Clayton, GA..............	844
Wayne, MI................	1021
Brazoria, TX.............	922
Saginaw, MI..............	760
Stark, OH................	703
Butler, PA...............	799
New York, NY.............	2634
Hartford, CT.............	1260
Fulton, GA...............	1370
Washington, PA...........	867
Snohomish, WA............	968
Genesee, MI..............	742
Fort Bend, TX............	979
Jefferson, TX............	920
Forsyth, NC..............	891
Montgomery, TX...........	886
Hennepin, MN.............	1197
Harris, TX...............	1258
Weld, CO.................	776
Winnebago, IL............	769
Oakland, MI..............	1019
Catawba, NC..............	692
Cuyahoga, OH.............	953
Middlesex, MA............	1370
Mecklenburg, NC..........	1231
Marin, CA................	1103
San Diego, CA............	1003
Worcester, MA............	908
Anoka, MN................	829
Milwaukee, WI............	929
Douglas, CO..............	1069
San Francisco, CA........	1723
Lorain, OH...............	750
Sedgwick, KS.............	816
Caddo, LA................	736
Washington, OR...........	1120
Erie, PA.................	695
Cass, ND.................	765
Whatcom, WA..............	745
Los Angeles, CA..........	1046
Hamilton, IN.............	924
Benton, AR...............	1110
Howard, MD...............	1141
Somerset, NJ.............	1867
Bexar, TX................	838
Contra Costa, CA.........	1210
Nueces, TX...............	748
New Castle, DE...........	1194
Bristol, MA..............	791
Essex, MA................	955
Henrico, VA..............	1027
Ramsey, MN...............	1093
Dane, WI.................	878
Scott, IA................	725
Ottawa, MI...............	714
Westmoreland, PA.........	716
De Kalb, GA..............	992
Fayette, KY..............	811
Ingham, MI...............	879
Travis, TX...............	1002
Tuscaloosa, AL...........	778
Muscogee, GA.............	749
Frederick, MD............	904
Hillsborough, NH.........	975
Lucas, OH................	793
Charleston, SC...........	774
Cook, IL.................	1145
Collin, TX...............	1075
Virginia Beach City, VA..	717
Fairfield, CT............	1888
Vanderburgh, IN..........	729
Rockingham, NH...........	857
Camden, NJ...............	903
Lake, IN.................	791
St. Louis, MN............	722
King, WA.................	1185
Pulaski, AR..............	819
Oklahoma, OK.............	837
Elkhart, IN..............	698
Larimer, CO..............	795
Mercer, NJ...............	1283
Multnomah, OR............	918
Allegheny, PA............	997
Greenville, SC...........	770
Dallas, TX...............	1156
Maricopa, AZ.............	889
Sacramento, CA...........	1025
Santa Barbara, CA........	869
Tulsa, OK................	825
Kanawha, WV..............	797
Denver, CO...............	1212
Will, IL.................	793
Plymouth, MA.............	815
Suffolk, MA..............	1625
Kalamazoo, MI............	816
Jefferson, AL............	919
Ada, ID..................	773
Polk, IA.................	940
Minnehaha, SD............	748
Shelby, TN...............	915
Richmond City, VA........	1071
Calcasieu, LA............	768
Cumberland, ME...........	835
Buncombe, NC.............	676
Guilford, NC.............	802
Webb, TX.................	590
Benton, WA...............	959
Mobile, AL...............	741
New Haven, CT............	956
New London, CT...........	960
Lafayette, LA............	847
Lancaster, PA............	734
Washington, AR...........	726
Greene, MO...............	661
Yellowstone, MT..........	721
Middlesex, NJ............	1191
Erie, NY.................	794
Mahoning, OH.............	632
Dauphin, PA..............	889
Northampton, PA..........	791
Spokane, WA..............	751
Placer, CA...............	876
Hillsborough, FL.........	880
McHenry, IL..............	727
Harford, MD..............	844
Barnstable, MA...........	759
Norfolk, MA..............	1066
Essex, NJ................	1229
Broome, NY...............	703
Philadelphia, PA.........	1079
Madison, AL..............	978
Ventura, CA..............	964
Orange, FL...............	805
Palm Beach, FL...........	886
Wyandotte, KS............	826
Franklin, OH.............	920
Williamson, TN...........	1054
Galveston, TX............	827
Fairfax, VA..............	1479
Lee, FL..................	711
Shawnee, KS..............	751
Onondaga, NY.............	831
Newport News City, VA....	826
Clark, WA................	800
Pima, AZ.................	768
Kern, CA.................	790
Escambia, FL.............	690
Queens, NY...............	844
Suffolk, NY..............	972
Cumberland, NC...........	695
New Hanover, NC..........	741
Chesapeake City, VA......	724
Brown, WI................	803
Montgomery, AL...........	764
Adams, CO................	806
Collier, FL..............	767
Oneida, NY...............	708
Hamilton, OH.............	992
Luzerne, PA..............	684
Bell, TX.................	736
Chesterfield, VA.........	830
Alameda, CA..............	1183
Cobb, GA.................	962
Allen, IN................	747
Berks, PA................	780
Lexington, SC............	650
Boulder, CO..............	1050
Polk, FL.................	668
Chatham, GA..............	752
Richmond, GA.............	743
Linn, IA.................	847
Montgomery, MD...........	1311
Hinds, MS................	778
Denton, TX...............	780
Outagamie, WI............	747
Waukesha, WI.............	902
Lehigh, PA...............	879
Smith, TX................	739
Salt Lake, UT............	856
Jefferson, CO............	929
Baltimore City, MD.......	1081
Cumberland, PA...........	815
Delaware, PA.............	1003
Utah, UT.................	681
Manatee, FL..............	668
Marion, IN...............	987
Jefferson, LA............	831
Dakota, MN...............	895
St. Louis, MO............	973
Lancaster, NE............	711
Richmond, NY.............	758
Lake, OH.................	774
Norfolk City, VA.........	861
Alachua, FL..............	730
Burlington, NJ...........	957
York, PA.................	789
Fresno, CA...............	709
Sonoma, CA...............	846
Miami-Dade, FL...........	874
Gwinnett, GA.............	879
Du Page, IL..............	1076
Sangamon, IL.............	907
Jefferson, KY............	873
Kent, MI.................	792
Olmsted, MN..............	968
Washoe, NV...............	789
Monroe, NY...............	847
Clackamas, OR............	798
Lane, OR.................	672
Orange, CA...............	1035
San Bernardino, CA.......	754
Nassau, NY...............	1015
Montgomery, OH...........	782
El Paso, TX..............	626
Tarrant, TX..............	900
Riverside, CA............	748
San Joaquin, CA..........	752
Broward, FL..............	834
Ocean, NJ................	746
Bronx, NY................	818
Davidson, TN.............	927
Hidalgo, TX..............	556
Duval, FL................	891
Seminole, FL.............	735
Honolulu, HI.............	821
St. Joseph, IN...........	723
Boone, MO................	692
Douglas, NE..............	853
Passaic, NJ..............	921
Bucks, PA................	855
Richland, SC.............	794
Chittenden, VT...........	878
Orleans, LA..............	983
Knox, TN.................	750
Brazos, TX...............	659
Cameron, TX..............	546
McLennan, TX.............	727
Pierce, WA...............	821
El Paso, CO..............	812
Champaign, IL............	750
Albany, NY...............	937
Chester, PA..............	1164
Lackawanna, PA...........	665
Horry, SC................	534
Tulare, CA...............	622
Lake, FL.................	586
Marion, FL...............	614
Pasco, FL................	596
Pinellas, FL.............	765
Volusia, FL..............	629
Kane, IL.................	777
East Baton Rouge, LA.....	831
St. Louis City, MO.......	1037
Atlantic, NJ.............	772
Bergen, NJ...............	1152
Lubbock, TX..............	653
Solano, CA...............	921
Arapahoe, CO.............	1130
Monmouth, NJ.............	945
Jackson, OR..............	644
Anchorage Borough, AK....	958
Bernalillo, NM...........	781
Rockland, NY.............	991
Spartanburg, SC..........	761
Stanislaus, CA...........	748
Bibb, GA.................	699
Johnson, KS..............	955
Morris, NJ...............	1462
Washington, DC...........	1540
Sarasota, FL.............	722
Clay, MO.................	850
Weber, UT................	642
Baltimore, MD............	920
Providence, RI...........	895
Davis, UT................	704
Brevard, FL..............	801
Stearns, MN..............	700
Orange, NY...............	755
Summit, OH...............	841
Yakima, WA...............	606
Winnebago, WI............	831
San Luis Obispo, CA......	742
Santa Cruz, CA...........	814
McLean, IL...............	904
Madison, IL..............	738
Prince Georges, MD.......	933
Montgomery, PA...........	1198
Rutherford, TN...........	771
Loudoun, VA..............	1093
St. Clair, IL............	709
Union, NJ................	1199
Wake, NC.................	917
Marion, OR...............	699
Clark, NV................	790
Dutchess, NY.............	917
Kitsap, WA...............	798
Harrison, MS.............	668
Monterey, CA.............	808
San Mateo, CA............	1485
Jackson, MO..............	894
St. Charles, MO..........	744
Westchester, NY..........	1332
Prince William, VA.......	808
Washtenaw, MI............	925
Gloucester, NJ...........	766
Kings, NY................	725
Leon, FL.................	722
Hampden, MA..............	812
Thurston, WA.............	800
Arlington, VA............	1549
Butler, OH...............	781
Hamilton, TN.............	785
Durham, NC...............	1276
Hudson, NJ...............	1509
Williamson, TX...........	953
Yolo, CA.................	892
Lake, IL.................	1230
Anne Arundel, MD.........	958
Alexandria City, VA......	1226 

Data notes:

  • There’s a lot of variation in the number of counties per American state. For example, Indiana (36k sq mi) has 92 counties whilst Massachusetts (10k sq mi) has 14.
  • Also, this is only private employers, which skews some of the Maryland and Virginia numbers.
  • Also, this is a look at employed people, and it doesn’t count benefits.

Some raw-data observations:

  • Average income in New York County is about $2,600/week but only about $800/week in the Bronx.
  • San Francisco and Arlington, VA are about $1000/week less than New York County.
  • Incomes in Indianapolis (Marion County) are a joke on a national scale. Even if you include people in Carmel (Hamilton County) it’s still less than $1000/week. I thought all of those Lilly people made a tidy bundle; I guess they’re too few to bring up the average.
  • I should ddply this data.
  • There seem to be a lot of $600s, $700s and $800s. That basically checks out with a median household income of $51k, although households can comprise two individual incomes.
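Here is roughly what "ddply-ing" the table could look like, done in base R so it doesn't need plyr. It's a sketch on a hand-copied subset (the seven Indiana counties and five New York City boroughs from the table above); the data frame and column names are made up for illustration, and the state mean is a plain mean of county averages, not weighted by employment.

```r
# A small subset of the county table, typed in by hand.
wages <- data.frame(
  county = c("Hamilton, IN", "Vanderburgh, IN", "Lake, IN", "Elkhart, IN",
             "Allen, IN", "Marion, IN", "St. Joseph, IN",
             "New York, NY", "Bronx, NY", "Queens, NY", "Kings, NY",
             "Richmond, NY"),
  weekly_wage = c(924, 729, 791, 698, 747, 987, 723,
                  2634, 818, 844, 725, 758),
  stringsAsFactors = FALSE
)

# The state is whatever follows the last ", " in the county label.
wages$state <- sub(".*, ", "", wages$county)

# Split-apply-combine: one summary row per state.
by_state <- do.call(rbind, lapply(
  split(wages$weekly_wage, wages$state),
  function(w) data.frame(counties = length(w),
                         mean = mean(w),   # unweighted mean of county averages
                         min = min(w),
                         max = max(w))
))
print(by_state)
```

Even this tiny slice shows the point about variation: the spread inside New York's rows is far larger than the gap between the two state means.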

October 11, 2011

[T]he firms that leaned most heavily on lobbyists have outperformed the S&P 500 by a whopping 11 percent per year since 2002.

Brad Plumer

report by Strategas; chart appears both in wapo.st and econ.st

September 11, 2011

Look at that vol! What is going on, world?

July 4, 2011

Rat hippocampus, photographed by Thomas Deerinck. via billydalto

June 23, 2011

June 7, 2011

People just don’t google for luxury items like they used to. But 2010-11 Christmas holiday saw a pop.

Polyhedra

May 18, 2011

Polygons (2), polyhedra (3), polychora (4-D), and polytopes (∀) can be represented as a graph — the same network-skeletal structure that models

For example, this is a skeletal graph of a cube:

So are these:

And here’s a skeletal graph of a 4-cube (tesseract).
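One way to make these skeletal graphs concrete: label each vertex of an n-cube with an n-bit number and join two vertices whenever their labels differ in exactly one bit. A minimal R sketch (the function name hypercube_edges is made up for illustration):

```r
# Edge list of the skeletal graph of an n-cube.
# Vertices are the integers 0 .. 2^n - 1, read as n-bit strings.
hypercube_edges <- function(n) {
  v <- 0:(2^n - 1)
  pairs <- expand.grid(from = v, to = v)
  pairs <- pairs[pairs$from < pairs$to, ]      # each unordered pair once
  d <- bitwXor(pairs$from, pairs$to)           # bits where the labels differ
  keep <- bitwAnd(d, d - 1L) == 0L             # true when exactly one bit differs
  pairs[keep, ]
}

cube      <- hypercube_edges(3)   # ordinary cube
tesseract <- hypercube_edges(4)   # 4-cube

nrow(cube)        # number of edges of the cube
nrow(tesseract)   # number of edges of the tesseract
table(table(c(tesseract$from, tesseract$to)))  # degree of every vertex
```

An n-cube has 2^n vertices, each touching n edges, so the edge count is n * 2^(n-1): 12 for the cube and 32 for the tesseract.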

Triangle = Blood.

And now for the news. A triangular face of a polytope has the exact same topology as the blood types.

That’s weird, right?

EDIT: Oops, the bottom arrows should have been 1 → 3. Followed by 2 → 1 and 3 → 1. Also the blood-types observation is not weird, it’s just a statement of the power set topology. ‘Scuse me, while I kiss this guy.
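To spell out the power-set remark, here is a small sketch. It assumes the usual eight ABO/Rh blood types, encoded as subsets of the three antigens {A, B, Rh}, and the textbook red-cell rule that a donor is compatible when the donor's antigens are a subset of the recipient's. The variable names are mine. The same eight subsets, read as subsets of a triangle's three corners, are the empty face, three vertices, three edges and the triangle itself.

```r
antigens <- c("A", "B", "Rh")
bits     <- c(1L, 2L, 4L)
subsets  <- 0:7                      # each integer encodes one subset

# Which antigens (or which triangle corners) belong to each subset.
members <- lapply(subsets, function(s) antigens[bitwAnd(s, bits) > 0])

# Subsets grouped by size: empty face, vertices, edges, whole triangle.
face_counts <- table(lengths(members))
print(face_counts)

# Donor d can give to recipient r when d's antigens are a subset of r's.
can_give <- outer(subsets, subsets, function(d, r) bitwAnd(d, r) == d)
n_pairs  <- sum(can_give)
print(n_pairs)
```

Row 1 of can_give (no antigens, i.e. O negative) is all TRUE and column 8 (all three antigens, i.e. AB positive) is all TRUE, which is the universal donor / universal recipient pattern.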

May 10, 2011

Null hypothesis testing is voodoo.

Changes in the mental state of the experimenter should not affect the objective inference of the experiment. An argument for using Bayesian data analysis instead of H0 vs Ha.

Imagine you have a scintillating hypothesis about the effect of some different treatments on a metric dependent variable. You collect some data (carefully insulated from your hopes about differences between groups) and compute a t statistic for two of the groups. The computer program, that tells you the value of t, also tells you the value of p, which is the probability of getting that t by chance from the null hypothesis.

You want the p value to be less than 5%, so that you can reject the null hypothesis and declare that your observed effect is significant.

What is wrong with that procedure? Notice the seemingly innocuous step from t to p. The p value, on which your entire claim to significance rests, is conjured by the computer program with an assumption about your intentions when you ran the experiment. The computer assumes you intended, in advance, to fix the sample sizes in the groups.

In a little more detail, and this is important to understand, the computer figures out the probability that your t value could have occurred from the null hypothesis if the intended experiment was replicated many, many times. The null hypothesis sets the two underlying populations as normal populations with identical means and variances. If your data happen to have six scores per group, then, in every simulated replication of the experiment, the computer randomly samples exactly six data values from each underlying population, and computes the t value for that random sample. Usually t is nearly zero, because the sample comes from a null hypothesis population in which there is zero difference between groups. By chance, however, sometimes the sample t value will be fairly far above or below zero. The computer does a bizillion simulated replications of the experiment. The top panel of Figure 1 shows a histogram of the bizillion t values. According to the decision policy of NHST, we decide that the null hypothesis is rejectable by an actually observed t_obs value if the probability that the null hypothesis generates a value as extreme or more is very small, say p < 0.05. The arrow in Figure 1 marks the critical value t_crit at which the probability of getting a t value more extreme is 5%. We reject the null hypothesis if t_obs > t_crit. In this case, when N = 6 is fixed for both groups, t_crit = 2.23. This is the critical value shown in standard textbook t tables, for a two-tailed t-test with 10 degrees of freedom.
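The replication story above is easy to try out. The following is a sketch, not Kruschke's own code: it assumes standard normal populations, uses 20,000 replications in place of a "bizillion", and computes the pooled-variance t by hand. All names are made up for illustration.

```r
set.seed(1)
n_per_group <- 6
n_reps      <- 20000

# Pooled-variance two-sample t for equal group sizes.
pooled_t <- function(x, y) {
  n  <- length(x)
  sp <- sqrt((var(x) + var(y)) / 2)
  (mean(x) - mean(y)) / (sp * sqrt(2 / n))
}

# Replicate the experiment under the null: both groups from the same normal.
t_null <- replicate(n_reps, pooled_t(rnorm(n_per_group), rnorm(n_per_group)))

# Two-tailed 5% cut-off, from the simulation and from the t distribution.
t_crit_sim    <- unname(quantile(abs(t_null), 0.95))
t_crit_theory <- qt(0.975, df = 2 * n_per_group - 2)

c(simulated = t_crit_sim, theory = t_crit_theory)
hist(t_null, breaks = 100, main = "t under the null, N = 6 per group")
abline(v = c(-1, 1) * t_crit_theory, lty = 2)
```

The simulated cut-off should land close to the textbook value for 10 degrees of freedom, but only because every replication drew exactly six scores per group.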

In computing p, the computer assumes that you did not intend to collect data for some time period and then stop; you did not intend to collect more or less data based on an analysis of the early results; you did not intend to have any lost data replaced by additional collection. Moreover, you did not intend to run any other conditions ever again, or compare your data with any other conditions. If you had any of these other intentions, or if the analyst believes you had any of these other intentions, the p value can change dramatically.
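A quick way to see one of these intentions matter: simulate an experimenter who looks at the first six scores per group and, if the result isn't significant, collects six more per group and tests again. This is my own illustrative scenario, not one taken from the article. Both populations are identical, so every "significant" result is a false alarm.

```r
set.seed(2)

# Ordinary fixed-N p value from a pooled-variance t-test.
p_value <- function(x, y) t.test(x, y, var.equal = TRUE)$p.value

one_experiment <- function(n_first = 6, n_extra = 6) {
  g1 <- rnorm(n_first)
  g2 <- rnorm(n_first)
  p_first <- p_value(g1, g2)

  # Second look after collecting more data in both groups.
  g1 <- c(g1, rnorm(n_extra))
  g2 <- c(g2, rnorm(n_extra))
  p_second <- p_value(g1, g2)

  c(fixed = p_first < 0.05,                      # stop at N = 6 no matter what
    peek  = p_first < 0.05 || p_second < 0.05)   # stop early or keep going
}

results <- replicate(10000, one_experiment())
rates   <- rowMeans(results)   # false-alarm rate under each intention
print(rates)
```

The "fixed" rate should sit near the nominal 5%, while the "peek" rate comes out noticeably higher, even though each individual p value was computed the standard way.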

AUTHOR: John Kruschke. The Road to Null Hypothesis Testing is Paved with Good Intentions.