As I’m embarking on my first research project ever, I thought it would be sort of fun to discuss an awesome paper with a fatal flaw. The idea is cool, the data is cool, and at first glance it seems like the perfect RDD (regression discontinuity design). BUT, lo and behold, there’s a wrench in the works!
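This is a pretty cool paper by Almond, Doyle, Kowalski, and Williams (“Estimating Marginal Returns to Medical Care: Evidence from At-Risk Newborns”), and one of those authors, Heidi Williams, is a Dartmouth ’03!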
The idea is that infants classified as VLBW (very low birth weight) receive extra medical care and attention because of that classification. The cutoff for VLBW is 1500 grams, and, importantly, you must be BELOW 1500 grams to be classified as VLBW. So, in what appears to be a perfect regression discontinuity design, one could compare infants weighing 1499 grams and 1500 grams, nearly identical in all other respects, and isolate the marginal benefit of extra medical care.
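To make the comparison concrete, here is a crude “compare means in a narrow window” RD on simulated data. This is not Almond et al.’s estimator, and everything in it is a hypothetical assumption for illustration: weights drawn uniformly between 1300 and 1700 g, a 10 g window on each side, and a made-up mortality-risk score that VLBW care lowers by 2 points.

```python
import random

CUTOFF = 1500  # grams; VLBW means strictly below 1500

def is_vlbw(weight_g):
    """Sharp assignment rule: extra care only if recorded weight < 1500 g."""
    return weight_g < CUTOFF

def window_gap(babies, half_width=10):
    """Crude RD: mean outcome just below the cutoff minus mean just above."""
    below = [y for w, y in babies if CUTOFF - half_width <= w < CUTOFF]
    above = [y for w, y in babies if CUTOFF <= w < CUTOFF + half_width]
    return sum(below) / len(below) - sum(above) / len(above)

# Hypothetical clean data: risk falls smoothly with weight, and VLBW care
# lowers a made-up mortality-risk score by 2 points. No heaping here.
rng = random.Random(0)
babies = []
for _ in range(100_000):
    w = rng.randint(1300, 1700)
    risk = 10 - 0.01 * (w - CUTOFF) - 2.0 * is_vlbw(w) + rng.gauss(0, 1)
    babies.append((w, risk))

gap = window_gap(babies)
print(f"treated minus untreated near 1500 g: {gap:.2f}")
```

Under these assumptions the gap should land close to -2, slightly less negative because risk still drifts a little with weight inside each 10 g window; fitting a line on each side of the cutoff, as most RD work does, soaks up that drift.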
Sounds too good to be true, and it is! The running variable here, birth weight in grams, doesn’t “run” quite as smoothly as we’d like it to. And this choppiness in the running variable is (gasp!) concentrated among minority and low-income babies. Let’s think about why…
Say you’ve got a high-income baby at a high-income hospital. That hospital plausibly has better scales, so it can accurately record that a baby weighs 1499 grams. Now consider a low-income baby at a low-income hospital with coarser scales. There, a baby weighing 1499 grams (or 1498, or 1497, or something close) might be recorded as weighing exactly 1500 grams. In the aggregate, the data start to pile up right at the 1500-gram cutoff: low-income babies heap at exactly 1500 grams, while their high-income counterparts have a smooth weight distribution.
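A toy simulation of that mechanism, under made-up assumptions: true weights are uniform between 1300 and 1700 g, a precise scale reports the exact gram, and a coarse scale (the hypothetical `record_weight` model below) reports the nearest 100 g half the time. The same true weights go through both scales, so any difference in the counts comes purely from measurement.

```python
import random
from collections import Counter

def record_weight(true_w, coarse_scale, rng):
    """Hypothetical scale model. A precise scale reports grams exactly; a
    coarse scale reports the nearest 100 g half the time (ties round up)."""
    if coarse_scale and rng.random() < 0.5:
        return 100 * ((true_w + 50) // 100)
    return true_w

rng = random.Random(1)
true_weights = [rng.randint(1300, 1700) for _ in range(40_000)]
precise = Counter(record_weight(w, False, rng) for w in true_weights)
coarse = Counter(record_weight(w, True, rng) for w in true_weights)

# Smooth counts on the precise scale; a spike at 1500 g on the coarse one.
for w in (1498, 1499, 1500, 1501, 1502):
    print(f"{w} g: precise={precise[w]:5d}  coarse={coarse[w]:5d}")
```

The precise column should hover around the same count at every gram, while the coarse column should show a large spike at 1500 g (and, outside the printed range, at every other multiple of 100).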
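The implication for the RDD is, well, that it doesn’t work. Because a baby weighing exactly 1500 grams does NOT receive VLBW treatment, you’ve got a pileup of low-income babies just above the cutoff that DON’T receive treatment, and you’re comparing them to a just-below group with relatively more high-income babies, all of whom DO receive treatment. Since low-income babies tend to have worse outcomes to begin with, the untreated side looks worse than it should, and the benefit of extra care looks bigger than it really is. You’re no longer comparing nearly identical babies… the RDD falls apart!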
In a response, Barreca et al. propose a cool way of checking that you haven’t fallen into this trap because of an unfortunate pileup in your running variable: drop the observations sitting right at the discontinuity (i.e., all babies recorded at exactly 1500 grams) and re-estimate, an approach often called a “donut” RD. A 1499-gram baby is still basically the same as a 1501-gram baby, so an RDD should work just as well with the cutoff itself omitted. As we’d expect, the Almond et al. estimates don’t hold up under this test.
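Here is a rough sketch of the donut check on simulated data. It is not Barreca et al.’s exact specification, and every number is a hypothetical assumption: weights uniform between 1300 and 1700 g, low-income hospitals rounding half their readings to the nearest 100 g, low-income babies carrying 3 extra points of baseline mortality risk, and a true effect of VLBW care of -2 points. The estimator is a plain local linear RD with a 90 g bandwidth, chosen so the other round-number heaps at 1400 and 1600 g fall outside the window.

```python
import random

CUTOFF = 1500       # grams; VLBW means recorded weight strictly below this
TRUE_EFFECT = -2.0  # hypothetical: VLBW care lowers mortality risk by 2 points

def simulate(n=40_000, seed=0):
    """Hypothetical data: low-income hospitals round half their readings to
    the nearest 100 g, and low-income babies carry extra baseline risk."""
    rng = random.Random(seed)
    recorded, outcome = [], []
    for _ in range(n):
        true_w = rng.randint(1300, 1700)
        low_income = rng.random() < 0.5
        w = true_w
        if low_income and rng.random() < 0.5:
            w = 100 * ((true_w + 50) // 100)  # coarse scale: nearest 100 g
        treated = w < CUTOFF  # care follows the *recorded* weight
        y = (10 + 3 * low_income - 0.01 * (true_w - CUTOFF)
             + TRUE_EFFECT * treated + rng.gauss(0, 1))
        recorded.append(w)
        outcome.append(y)
    return recorded, outcome

def intercept_at_cutoff(points):
    """Least-squares line through (weight - cutoff, outcome); value at 0."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in points)
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx

def rd_estimate(recorded, outcome, bandwidth=90, drop=()):
    """Local linear RD: treated-side minus untreated-side intercept.
    Recorded weights listed in `drop` are excluded (the donut hole)."""
    left, right = [], []
    for w, y in zip(recorded, outcome):
        if w in drop or abs(w - CUTOFF) > bandwidth:
            continue
        (left if w < CUTOFF else right).append((w - CUTOFF, y))
    return intercept_at_cutoff(left) - intercept_at_cutoff(right)

recorded, outcome = simulate()
naive = rd_estimate(recorded, outcome)
donut = rd_estimate(recorded, outcome, drop=(CUTOFF,))
print(f"naive RD: {naive:.2f}  donut RD: {donut:.2f}  truth: {TRUE_EFFECT}")
```

With these made-up numbers the naive estimate should come out noticeably more negative than the truth, because the untreated, mostly low-income heap at 1500 g drags the right-hand line upward. The donut estimate should land close to -2, since away from the heap the mix of low- and high-income babies is the same on both sides.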
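The direction of the bias depends on which side of the cutoff the pileup lands. If the 1500-gram babies were classified as VLBW, the heap of low-income babies would sit on the treated side instead, and you’d probably have an underestimation problem rather than the overestimation problem this paper faces.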
Just thought this was a cool example of how even the most perfect research design can go awry, and yet we can still learn something useful from it!