Of all the damage the Stephen Harper Conservative government has done to Canada during its disastrous reign, perhaps its most bizarre and capricious act was the scrapping of the long-form census in 2010. For 2011, the mandatory long form was replaced with a voluntary survey, and response rates plummeted to the point where the data is useless to researchers, making the Conservatives’ experiment in enforced ignorance an expensive waste of time.
In an article on FiveThirtyEight a couple of weeks ago, titled “What We Don’t Know About Canada Might Hurt Us”, Ben Casselman reignites the discussion. What inspired him to bring it up again is that American Republicans are looking at the disasters Harper has wrought and thinking they’re a good idea; those guys never met a flavour of stupid they didn’t like. Using the American Community Survey (ACS) – the American analogue of our former long-form census – Casselman explains why good census data is important:
But government surveys are good for a lot more than data journalism and Twitter bots. They are used to evaluate the effectiveness of government programs, guide decisions on school construction and infrastructure upgrades, and make business investment decisions. When researchers in San Diego wanted to study the impact of the city’s proposed minimum-wage hike, they turned to ACS data. So did Boston when it wanted to assess the progress of its programs to encourage cycling, and Cincinnati when it applied for a grant to build a new streetcar system.
Our government simply can’t make informed decisions if it isn’t informed. Saying that seems almost tautological, but somehow the Conservative government didn’t get the memo.
Statistics Canada is, unsurprisingly, insisting that the data collected is good enough. Bear in mind that the current staff of StatCan is what’s left over after the protest resignations following the Conservatives’ decision to scrap the long-form census. Bear in mind also that this assertion that the data is good enough comes in the same blog post that:
- Spends over 40% of its word count explaining all the otherwise unnecessary steps StatCan had to take to verify that the 2011 data wasn’t abysmally bad.
- Points out that data had to be straight-up discarded from 1,100 communities (compared to fewer than 160 in 2006).
- Cautions that data for some groups – like low-income Canadians – cannot be compared to previous data.
- Openly admits that it doesn’t really know what all the problems with the data really are.
But of course, StatCan is quite confident of the results. Unfortunately, researchers who are actually trying to use the data are not.
One other thing worth noting is the shell game StatCan is playing with the results. They are putting a wonderful spin on them, but dig a little deeper and you’ll find some clever tricks being played.
For example, StatCan’s Wayne R. Smith says this:
The number of households responding to the voluntary 2011 National Household Survey was 2,657,461, containing 6,719,688 people. This was 9% higher than the number of households responding to the mandatory 2006 Census long form (2,443,507 households, containing 6,136,517 people).
Well, that sounds good, doesn’t it? They got 9% more data than they got before! What’s the problem with that?
Well, let’s just see what he said a few sentences earlier:
We knew that if the initial sample size for the 2011 National Household Survey remained the same as in 2006, an increase in sampling error would result. So, to mitigate this risk, Statistics Canada increased the sampling rate in the 2011 National Household Survey from one in five households, for the Census long form in 2006, to one in three households.
Aha. “We knew the food would be worse than the last time we ordered, so we ordered more of it to compensate.” That’ll fix it!
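The reason Smith’s logic fails is that a voluntary survey’s problem isn’t sample size, it’s non-response bias: the households that skip the survey differ systematically from the ones that answer, and sampling more households just reproduces the same skew more precisely. A toy simulation makes the point – every number here (the population mix, the response rates) is invented purely for illustration:

```python
import random

random.seed(42)

# Invented toy population: 20% low-income households, 80% other.
population = ["low"] * 20_000 + ["other"] * 80_000

# Assumed response behaviour: low-income households answer a
# voluntary survey far less often (rates are illustrative only).
response_rate = {"low": 0.30, "other": 0.60}

def voluntary_survey(sampling_rate):
    """Sample households, then keep only those that choose to respond."""
    sampled = [h for h in population if random.random() < sampling_rate]
    responded = [h for h in sampled if random.random() < response_rate[h]]
    return sum(1 for h in responded if h == "low") / len(responded)

# The true low-income share is 0.20, but both estimates land near 0.11:
# tripling the sample repeats the same bias, just with tighter error bars.
print(f"1-in-5 sample: {voluntary_survey(1/5):.3f}")
print(f"1-in-3 sample: {voluntary_survey(1/3):.3f}")
```

Going from a 1-in-5 to a 1-in-3 sample shrinks the *sampling* error, which is the only risk Smith mentions – it does nothing at all about the systematic undercount of the people who don’t respond.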
But that’s not even my favourite of the shell games they’re playing. Brace yourself. This will be awesome.
Someone named Dwight Follick put together some maps comparing the data collected in 2006 and 2011. I got them from an article by Armine Yalnizyan on the blog Behind the Numbers – an excellent post, with more images like the ones I’m about to show.
The maps show which areas had satisfactory response rates (in green) to the 2006 long-form census and the 2011 National Household Survey (NHS), and which didn’t (in black). Here is the response rate in 2006:
And here is the response rate in 2011:
Well, that’s pretty depressing, isn’t it? Clearly the 2011 data is much spottier than the 2006 data. Wayne Smith can spin it any way he likes, but there’s the visual evidence that we have problems.
Oh, but wait, you haven’t heard the punch line yet.
You see, what I’ve just shown you is which subdivisions had satisfactory versus unsatisfactory response rates. Sounds legit, right?
Turns out that StatCan has redefined “satisfactory”! In 2006, a 75% response rate was required to be “satisfactory”. In 2011? 50%.
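Lowering the bar reclassifies failures as passes without changing a single response. A trivial sketch – the response rates below are invented, not real subdivisions – shows how much greener a map gets when “satisfactory” drops from 75% to 50%:

```python
# Invented response rates for six hypothetical subdivisions.
rates = [0.82, 0.71, 0.64, 0.55, 0.48, 0.35]

passes_2006_bar = [r for r in rates if r >= 0.75]  # old bar: 75%
passes_2011_bar = [r for r in rates if r >= 0.50]  # new bar: 50%

# Same data, very different map colours.
print(len(passes_2006_bar))  # 1 subdivision would be green at 75%
print(len(passes_2011_bar))  # 4 would be green at 50%
```

Same data, same response rates – the only thing that changed is the definition of success.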
Here is the 2011 map again… but this time, using the same standard of “satisfactory” as was used in 2006:
Compare that to the 2006 results, and then tell me again that the voluntary NHS survey is a satisfactory replacement for the long-form census.
But wait! Before you make that assertion, let me make one final point of comparison between the two.
As Wayne Smith mentioned, in order to “fix” the abysmal NHS response rates, they simply sent out more surveys: up from 1 in 5 households sampled in 2006 to 1 in 3 sampled in 2011. Not only did this clearly fail to fix the problem – as the images above illustrate – it also cost roughly $22 million more.
Let that sink in. We paid $22 million extra… for shittier data. And to hide the fact of just how much shittier it is, StatCan basically just redefined “shitty”.
So, now try to argue that scrapping the long-form census was a good idea.