Archive for November, 2011


November 30, 2011

> data(quakes)
> head(quakes)

     lat   long depth mag stations
1 -20.42 181.62   562 4.8       41
2 -20.62 181.03   650 4.2       15
3 -26.00 184.10    42 5.4       43
4 -17.97 181.66   626 4.1       19
5 -20.42 181.96   649 4.0       11
6 -19.68 184.31   195 4.0       12

> summary(quakes)

      lat              long           depth            mag      
 Min.   :-38.59   Min.   :165.7   Min.   : 40.0   Min.   :4.00  
 1st Qu.:-23.47   1st Qu.:179.6   1st Qu.: 99.0   1st Qu.:4.30  
 Median :-20.30   Median :181.4   Median :247.0   Median :4.60  
 Mean   :-20.64   Mean   :179.5   Mean   :311.4   Mean   :4.62  
 3rd Qu.:-17.64   3rd Qu.:183.2   3rd Qu.:543.0   3rd Qu.:4.90  
 Max.   :-10.72   Max.   :188.1   Max.   :680.0   Max.   :6.40  
    stations     
 Min.   : 10.00  
 1st Qu.: 18.00  
 Median : 27.00  
 Mean   : 33.42  
 3rd Qu.: 42.00  
 Max.   :132.00  

> plot(quakes, pch=20, col=rgb(0,0,0,.1), lwd=.6)

> require(ggplot2)
> qplot(data = quakes, x = lat, y = long, size = exp(mag), color = mag, alpha = I(.8))

UPDATE: In the comments, Sean Mulcahy shared his much better post on earthquakes. He shows how to grab up-to-date earthquake data from the U.S. Geological Survey and display it with R’s maps package. Hooray!


November 29, 2011

deCordova by Boston « Threading In The Choirs, via nullscapes

A One-Line Viewpoint on Trading

November 28, 2011

Given a time-series of one security’s price-train P[t], a low-frequency trader’s job (forgetting trading costs) is to find a step function S[t] to convolve against price changes P[t]

with the proviso that the other side to the trade exists.

S[t] represents the bet size long or short the security in question. The trader’s profit at any point in time τ is then given by the above definite integral.
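One way to write that integral, assuming the position is opened at time 0 and that “price changes” means the differential dP[t] (both are assumptions about the notation, not something the text confirms):

```latex
\text{profit}(\tau) \;=\; \int_{0}^{\tau} S[t] \, \mathrm{d}P[t]
```

Where P is differentiable this reads as the integral of S[t] · P′[t] dt: bet size multiplied pointwise by the price change and accumulated up to time τ.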

  • I haven’t seen anyone talk this way about the problem, perhaps because I don’t read enough or because it’s not a useful idea. But … it was a cool thought, representing a >0 amount of cogitation.
  • This came to mind while reading a discussion of “Monkey Style Trading” on NuclearPhynance. My guess is that monkey style is a Brownian ratchet and as such should do no useful work.
  • If I were doing a paper investigating the public-welfare consequences of trading, this is how I’d think about the problem.

    Each hedge fund / central bank / significant player is reduced to a conditional response strategy, chosen from the set of all step functions uniformly less than a liquidity constraint. This endogenously coughs up the trading volume which really should be fed back into the conditional strategies.

  • Does this viewpoint lead to new risk metrics?
  • Should be mechanical to expand to multiple securities. Would anything interesting come from that?
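In discrete time the profit integral becomes a running sum of bet size times price change. Here is a minimal sketch in R; the prices, the bet sizes, and the liquidity cap `L` are all made up for illustration and none of them come from real data.

```r
# Hypothetical closing prices for one security (made-up numbers).
P <- c(100, 101, 99, 102, 104, 103)

# Bet size held over each interval: +1 = long one unit, -1 = short, 0 = flat.
# S[t] is the position held while the price moves from P[t] to P[t + 1].
S <- c(1, -1, 1, 1, 0)

# Assumed liquidity constraint: the position may never exceed L units.
L <- 1
stopifnot(length(S) == length(P) - 1, all(abs(S) <= L))

# Profit up to each time: cumulative sum of bet size times price change.
profit <- cumsum(S * diff(P))
print(profit)
```

The only thing the trader chooses here is the vector `S`; the vector `P` is whatever the market hands over.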

I wouldn’t usually think that multiplication of functions has anything to do with trading. Maybe some theorems can do a bit of heavy lifting here; maybe not.

It at least feels like an antidote to two wrongful axiomatic habits. For economists who look for real value, logic, and Information Transmission, it says: the market does whatever it wants, and the best response is a response to whatever that is. For financial engineering graduates who spent too long chanting the mantra “μ dt + σ dB_t”, this is just another way of emphasising: you can’t control anything except your bet size.

UPDATE: Thanks to an anonymous commenter for a correction.

November 27, 2011

Land Erodes, Weather Systems Shift, Volcanoes Erupt, Fires Sweep Through Forests, and Storms Leave Behind Drastic Change to the Earth’s Surface by Hollis Brown Thornton

November 26, 2011


November 25, 2011

data from the US Drug Enforcement Administration’s System To Retrieve Information from Drug Evidence (STRIDE)

A few points about these pictures which I’ll be elaborating on in future posts:

  • sub i, sub j: There is significant variation from city to city, and presumably from dealer to dealer or customer to customer, since the pictures plot an interquartile range.
  • 3-D data: Since both purity and quantity affect the price, we’re really talking about a “price surface” — just like a volatility surface or the yield curve on Treasurys. And in fact there are even more dimensions to the data since it could be cut differently, and … well, I won’t say what makes for good coke.
  • data collection: Do you really believe these numbers? Some undercover cop probably solicited drugs (I didn’t read the methodology section; I’m just guessing). Does that seem like an error-free data collection process? But the same goes for macroeconomic data, financial data from companies, and so on. It comes from somewhere; it’s not necessarily “the truth”.

The Dirty Projectors – Dark Eyes

November 24, 2011

Dark Eyes covered by The Dirty Projectors

November 23, 2011

Sacer by Gang Gang Dance

What’s Wrong with OKCupid’s Matching Algorithm

November 22, 2011

OKCupid is using the wrong mathematics to match potential dates together. But before I critique them, let me compliment them on what they’re doing right:

  • “Our” mutual score is the geometric average of your score of me, and my score of you.
  • They low-ball the match % until they have enough statistical confidence in the number of questions we’ve both answered.
  • Questions come from users as well as staff. So they avoid some potential blind spots. (crowdsourcing)
  • OKCupid prompts you with questions that have the greatest chance of distinguishing you as quickly as possible. (maximally separating hyperplanes) If OKC already knows you want your date to shower at least once a day, keep a clean room, and that picking food from the trashcan is unacceptable, it won’t ask if you prefer crustpunks or gutterpunks.
  • You don’t have to be the same as me for us to match. I get to specify what answers I want from you.
  • They use a logarithmic scale of importance. Logs are the natural way we perceive levels or categories of importance. (For example “categories” of how big a war was, emerge naturally when you take the log of number of deaths.)
  • It’s simple. At least they’re not using a non-linear Bayesian splitting tree didactogram or some other hunky machine-learning jiu jitsu.

But there’s still room for improvement, particularly in light of the following critique, originally made by Becky Russoniello. Currently, OKCupid is set up to award high scores just for being not-a-terrible match. That’s bad.


To show why, I first need to detail how your score of me is calculated:

  1. You answer questions like, “Is homosexuality a sin?” Your answer consists of: (a) what you think, (b) what answer/s are acceptable for me to give, and (c) how important it is for me to get this question “right” per your definition.
  2. The question’s importance draws from {Mandatory, Very Important, Somewhat Important, A Little Important, Irrelevant} which biject to the numbers {250, 50, 10, 1, 0}.
  3. If I get a Very Important question “right”, I get 50/50 points, and if I get a Very Important question “wrong”, I get 0/50 points. If I haven’t answered the Very Important question, I get 0/0 points — neither penalised nor rewarded.

For more details, see their FAAAQ.
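The scoring rules above can be sketched in a few lines of R. This is an illustration of the description in this post, not OKCupid’s actual code: the function names are hypothetical, and the sketch ignores the statistical low-balling of match percentages mentioned earlier.

```r
# Importance weights as listed above.
weights <- c(Mandatory = 250, "Very Important" = 50, "Somewhat Important" = 10,
             "A Little Important" = 1, Irrelevant = 0)

# One-directional score: the share of available points the other person earned.
# `importance` is the importance I assigned to each question;
# `acceptable` is TRUE/FALSE for answered questions and NA for unanswered ones.
one_way_score <- function(importance, acceptable) {
  answered <- !is.na(acceptable)
  possible <- sum(weights[importance[answered]])
  earned   <- sum(weights[importance[answered & acceptable]])
  if (possible == 0) return(NA_real_)  # no shared questions: 0/0
  earned / possible
}

# Mutual score: geometric average of the two one-directional scores.
mutual_score <- function(a, b) sqrt(a * b)

# Hypothetical example: three questions, the third one unanswered.
you_of_me <- one_way_score(
  c("Mandatory", "Very Important", "Somewhat Important"),
  c(TRUE, FALSE, NA)
)
me_of_you <- 0.9  # assumed value for the other direction
print(you_of_me)
print(mutual_score(you_of_me, me_of_you))
```

In the example the unanswered question contributes 0/0, so the one-directional score is 250/300.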


Here’s the important flaw: the denominator grows as long as we’ve answered the same question. In practice, the Mandatory questions both

  1. crowd out more interesting differentiators, and
  2. inflate the scores of people who merely have tolerable political views.

To demonstrate this, I’ll share some of the Mandatory questions from my own OKCupid profile.

  • Do you think homosexuality is a sin?
  • How often are you open with your feelings? (can’t be Rarely or Never)
  • Would it bother you if your boss was minority, female, or gay?
  • Would you write your child’s college entry essay?
  • What volume level do you prefer when listening to music? (can’t be “I prefer not to listen to music”)
  • Would you try to control your mate with threats of suicide?
  • Gay marriage — should it be legal?
  • Are you married, engaged to be married, or in a relationship that you believe will lead to marriage?
  • How important to you is a match’s sense of humor? (can’t be Not Important)
  • Would the world be a better place if people with low IQ’s were not allowed to reproduce?

Some other doozies which I might wrongly make Mandatory include:

  • Which is bigger? The Earth, or the Sun?
  • How many continents are there?
  • Do you consider astrology to be a legitimate science?

The problem with all of these filters is that I mean them to act only in a negative direction. (Could I call them “quasi-filters”?)


In other words, someone doesn’t become a great potential match simply because they’re not

  • a bigot,
  • a cheat,
  • a eugenicist,
  • or a depressive manipulative.

You need to receive those check-marks just to get to zero with me. You also need to be not-married-to-someone-else. That doesn’t win you plus points, it’s just a requirement. But under the current OKCupid schema, you do win 250/250 from me for simply being available. Oops.

Likewise, knowing basic facts from grade-school seems, like, uh, necessary. But, even if somebody thinks there are 6 or 8 continents, do you really think you won’t be able to tell once they message you?

Few people will be culled by the Continents question, and if you make 10 such easy questions Mandatory, then everybody else will start with 2500/2500 points, so the rest of your match questions will barely distinguish one from the other. Even the Very Important questions (50 points apiece) will only budge the score a little below a default of 100%. And the Somewhat Important questions, which tend to be the more discriminative ones, are mowed down by the juggernaut of Easy Questions.

EDIT (23 NOV): According to the comments, the number of continents is not a universal fact, but rather varies from culture to culture (and within cultures). So that’s a really terrible question to make Mandatory! I should have said above: “Few people will be culled by asking whether the Earth is bigger than the Sun, and if you make 10 such easy questions Mandatory, then everybody else will start with 2500/2500 points.”
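To put numbers on the dilution, here is the arithmetic for one hypothetical candidate, using the point values given earlier. The two scenarios are assumptions chosen for illustration.

```r
mandatory      <- 250
very_important <- 50

# Scenario 1: ten easy Mandatory questions answered "right",
# plus one Very Important question answered "wrong".
with_easy <- (10 * mandatory) / (10 * mandatory + very_important)

# Scenario 2: no easy Mandatory questions; one Very Important question
# answered "right" and one answered "wrong".
without_easy <- very_important / (2 * very_important)

print(round(100 * with_easy, 1))
print(round(100 * without_easy, 1))
```

The first score is 2500/2550, about 98%; the second is 50/100, or 50%. The same Very Important miss costs about 2 percentage points in one case and 50 in the other.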

OKCupid asks other, more useful questions, like:

  • Are you annoyed by people who are super logical?
  • Do you like abstract art?
  • Do you spend more money on clothes, or food?
  • Could you tolerate a ___________________ [my political / religious views] ?
  • Do you like dogs?

which would actually distinguish among potential dates for me. Let’s face it: I write a blog about mathematics, so someone who is annoyed by super logical people is probably going to dislike me. And, I like abstract art. Maybe we could go to a gala for our first date.

Although everyone knows the Sun is bigger than the Earth, not everyone is bothered by “logical” personalities. So those questions better sort the available dates.

want to go on a cruise on us stevenf?


The worst side effect of the current scoring system is that a spammer could easily answer only the questions with obvious answers (basic facts and display of non-bigotry) and get a decently high match percentage with a lot of people. At which point, the spammer uploads a picture of an attractive guy/girl, writes some generic profile text, and scams away.



I think a better model of how people evaluate potential dates can be found within economics. Specifically, Kahneman & Tversky’s Prospect Theory:

The main lessons I draw from prospect theory, as a theory of psychology, are:

  1. We evaluate things based on a reference point (“zero”).
  2. Small perceived negatives feel about twice as bad as small perceived positives feel good (“local kink at zero”).
  3. When something is really bad or really good, we lose our ability to coherently measure how far from zero it is (“log-like at great distances”).
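Those three lessons can be drawn as a single curve. The sketch below is only an illustration: the factor of 2 and the use of `log1p()` are assumptions chosen to match the list above, not the functional form Kahneman and Tversky fitted, and the name `value` is hypothetical.

```r
# Illustrative value function around a reference point of zero.
#  - x is the perceived gain (positive) or loss (negative)
#  - losses are scaled by `loss_aversion`, which creates the kink at zero
#  - log1p(abs(x)) flattens the curve far from zero
value <- function(x, loss_aversion = 2) {
  scale <- ifelse(x >= 0, 1, -loss_aversion)
  scale * log1p(abs(x))
}

print(value(c(-100, -1, 0, 1, 100)))
```

Near zero the curve is twice as steep on the loss side as on the gain side, and moving from 99 to 100 changes the value far less than moving from 0 to 1.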

How does P.T. apply to dating and OKCupid?

Amos Tversky

Bigots, cheats, eugenicists, and depressive manipulatives are way off in negative land. I’m not even interested in meeting them. I don’t care whether OKC gives them a 0% or a 10%, because those are effectively the same to me: ignore. I only need OKCupid to accurately score people who are somewhere north of my reference point.

  • What if the scoring system simply binned everyone below 50%? They could all be labelled “non-match” and then twice as many numbers would be available to grade the remaining candidates.

    That’s a mathematically good idea, but doesn’t address the issue of dilution. And, it seems to ignore an aspect of “numbers psychology”: people like using only the upper half of the scale. Think about how people use the hotness scale: they would never be comfortable dating a 4.

  • What if OKCupid revamped their whole framework along the lines of Prospect Theory? Try to establish a reference point, do some research into psychology papers that bear on the topic, and so on.

    Well, it might be cool. But that’s a lot of work, and OKC is already successful. Big changes alienate users.

Here’s the simplest solution I can think of — which requires no UI changes and no research. In fact an OKC developer should only need to amend one line of code.

  • Mandatory questions can only give out negative points for answering wrong. No plus points for right answers to Mandatory.

Mathematically this is ugly because you introduce a discontinuity — but, so what? I think this is what the broad majority of people mean when they say something is mandatory. If you have a mandatory employee meeting, do people get a bonus for showing up? Does HM Revenue pat you on the back for paying tax?

In the eloquent phrasing of Chris Rock:

If OKC ends up giving some negative (or I guess imaginary, under the square root from the geometric average) scores, so what? I was ignoring everybody under 60% anyway.
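Here is one possible reading of that rule as code. It is a sketch under stated assumptions, not OKCupid’s implementation: the function name is hypothetical, and the choice to leave Mandatory questions out of the denominator entirely is just one way to interpret “no plus points”.

```r
weights <- c(Mandatory = 250, "Very Important" = 50, "Somewhat Important" = 10,
             "A Little Important" = 1, Irrelevant = 0)

# Penalty-only variant: Mandatory questions never add to the numerator or
# the denominator; a "wrong" Mandatory answer subtracts its 250 points.
# `acceptable` is TRUE/FALSE for answered questions and NA for unanswered ones.
penalty_only_score <- function(importance, acceptable) {
  answered  <- !is.na(acceptable)
  mandatory <- importance == "Mandatory"
  graded    <- answered & !mandatory
  possible  <- sum(weights[importance[graded]])
  earned    <- sum(weights[importance[graded & acceptable]])
  penalty   <- sum(weights[importance[answered & mandatory & !acceptable]])
  if (possible == 0) return(NA_real_)  # nothing left to grade
  (earned - penalty) / possible
}

importance <- c("Mandatory", "Mandatory", "Very Important", "Somewhat Important")
print(penalty_only_score(importance, c(TRUE, TRUE, TRUE, FALSE)))   # passes both filters
print(penalty_only_score(importance, c(TRUE, FALSE, TRUE, FALSE)))  # fails one filter
```

With both filters passed, the score is 50/60 instead of the 550/560 the current scheme would give. With one filter failed, the score goes negative. In R, `sqrt()` of a negative product returns `NaN` with a warning, so the mutual score would need an explicit rule for such pairs.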


If you use OKCupid, there is a way to improve your matches even if they never change their matching algorithm:

  • Lower the importance of questions with obvious answers. I bet you won’t start matching with people who believe the Earth is larger than the Sun. And you will pick up extra precision in matches with other people.
  • Even if something is mandatory for you to date someone, don’t use the Mandatory category like that. Maybe you can have a few mandatory questions, but overall it just dilutes the scoring.

November 20, 2011


Found in a gas station parking lot Pt. 2

Part one is here.