The aftermath of scrapping the long-form census

Of all the damage the Stephen Harper Conservative government has done to Canada during its disastrous reign, perhaps its most bizarre and capricious act was the scrapping of the long-form census in 2010. The 2011 census was instead paired with an optional survey, and response rates plummeted to the point where the data is useless to researchers, making the Conservatives’ experiment in enforced ignorance an expensive waste of time.

In an article on FiveThirtyEight a couple of weeks ago, titled “What We Don’t Know About Canada Might Hurt Us”, Ben Casselman reignites the discussion. What has inspired him to bring it up again is the fact that American Republicans are looking at the disaster Harper has wrought, and thinking that it’s a good idea; those guys never met a flavour of stupid they didn’t like. Using the American Community Survey (ACS) – the American analogue of our former long-form census – Casselman explains why good census data is important:

But government surveys are good for a lot more than data journalism and Twitter bots. They are used to evaluate the effectiveness of government programs, guide decisions on school construction and infrastructure upgrades, and make business investment decisions. When researchers in San Diego wanted to study the impact of the city’s proposed minimum-wage hike, they turned to ACS data. So did Boston when it wanted to assess the progress of its programs to encourage cycling, and Cincinnati when it applied for a grant to build a new streetcar system.

Our government simply can’t make informed decisions if it isn’t informed. Saying that seems almost tautological, but somehow the Conservative government didn’t get the memo.

Statistics Canada is, unsurprisingly, insisting that the data collected is good enough. Bear in mind that the current staff of StatCan is what’s left over after the protest resignations following the Conservatives’ decision to scrap the long-form census. Bear in mind also that this assertion that the data is good enough comes in the same blog post that:

  1. Spends over 40% of its wordage explaining all the otherwise unnecessary steps StatCan had to take to verify that the 2011 data wasn’t abysmally bad.
  2. Points out that data had to be straight-up discarded from 1,100 communities (compared to fewer than 160 in 2006).
  3. Cautions that data for some groups – like low-income Canadians – cannot be compared to previous data.
  4. Openly admits that it doesn’t really know what all the problems with the data really are.

But of course, StatCan is quite confident of the results. Unfortunately, researchers who are actually trying to use the data are not.

One other thing worth noting is the shell games StatCan is playing with the results. They are putting a wonderful spin on them, but dig a little deeper and you’ll find some clever tricks being played.

For example, StatCan’s Wayne R. Smith says this:

The number of households responding to the voluntary 2011 National Household Survey was 2,657,461, containing 6,719,688 people. This was 9% higher than the number of households responding to the mandatory 2006 Census long form (2,443,507 households, containing 6,136,517 people).

Well, that sounds good, doesn’t it? They got 9% more data than they got before! What’s the problem with that?

Well, let’s just see what he said a few sentences earlier:

We knew that if the initial sample size for the 2011 National Household Survey remained the same as in 2006, an increase in sampling error would result. So, to mitigate this risk, Statistics Canada increased the sampling rate in the 2011 National Household Survey from one in five households, for the Census long form in 2006, to one in three households.

Aha. “We knew the food quality would be lower than last time we ordered, so we ordered more to compensate”. That’ll fix it!
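To put rough numbers on that, here is a back-of-the-envelope sketch using only the figures Smith quotes. The one thing I'm adding is an assumption: that the total number of households was about the same in 2006 and 2011. It wasn't (the country grew), so if anything this flatters the 2011 numbers. The variable and function names are just mine.

```python
from fractions import Fraction

# Figures quoted from StatCan above.
responding_2006 = 2_443_507   # households, mandatory 2006 long form
responding_2011 = 2_657_461   # households, voluntary 2011 NHS
rate_2006 = Fraction(1, 5)    # one in five households sampled
rate_2011 = Fraction(1, 3)    # one in three households sampled


def relative_change(new, old):
    """Fractional change from old to new (0.10 means 10% more)."""
    return float(new) / float(old) - 1.0


# How many more households responded in 2011 than in 2006.
growth_in_responses = relative_change(responding_2011, responding_2006)

# How many more households were asked in 2011 than in 2006.
# ASSUMPTION (mine, not StatCan's): the total number of households is
# treated as unchanged, so surveys sent scale with the sampling rate.
growth_in_sample = relative_change(rate_2011, rate_2006)

# Responses obtained per household sampled, 2011 relative to 2006.
yield_ratio = (1 + growth_in_responses) / (1 + growth_in_sample)

print(f"More responding households: {growth_in_responses:.1%}")
print(f"More households sampled:    {growth_in_sample:.1%}")
print(f"Responses per household sampled, 2011 vs 2006: {yield_ratio:.1%}")
```

Under that assumption, roughly two-thirds more households were asked, and only about 9% more answered. Each survey sent out in 2011 was bringing back something like two-thirds of what it did in 2006.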

But that’s not even my favourite of the shell games they’re playing. Brace yourself. This will be awesome.

Someone named Dwight Follick put together some maps showing the data collected in 2006 and 2011. I got them from an article by Armine Yalnizyan on the blog Behind the Numbers – which is an excellent post, with more of these images like the ones I’m about to show.

The maps show which areas have satisfactory response rates (in green) to the 2006 long-form census and the 2011 National Household Survey (NHS), and which don’t (in black). Here is the response rate in 2006:

The census subdivisions which had “satisfactory” response rates for the 2006 long-form census (shown in green).

And here is the response rate in 2011:

The census subdivisions which had “satisfactory” response rates for the 2011 National Household Survey.

Well, that’s pretty depressing, isn’t it? Clearly the data in 2011 is much spottier than the 2006 data. Wayne Smith can spin it any way he likes, but there’s the visual evidence that we have problems.

Oh, but wait, you haven’t heard the punch line yet.

You see, what I’ve just shown you is which subdivisions had satisfactory versus non-satisfactory response rates. Sounds legit, right?

Turns out that StatCan has redefined “satisfactory”! In 2006, a 75% response rate was required to be “satisfactory”. In 2011? 50%.
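If it isn't obvious how much a moved goalpost matters, here is a tiny sketch. The four subdivisions and their response rates are completely made up for illustration; only the two thresholds (75% and 50%) come from the paragraph above, and I'm assuming "satisfactory" means "at or above the threshold".

```python
# HYPOTHETICAL response rates for four invented subdivisions.
# These are NOT real StatCan figures; they only illustrate the effect.
response_rates = {"A": 0.82, "B": 0.68, "C": 0.55, "D": 0.41}

THRESHOLD_2006 = 0.75  # standard for "satisfactory" in 2006
THRESHOLD_2011 = 0.50  # standard for "satisfactory" in 2011


def satisfactory(rates, threshold):
    """Names of subdivisions whose response rate meets the threshold.

    ASSUMPTION: "meets" is taken to mean greater than or equal to.
    """
    return sorted(name for name, rate in rates.items() if rate >= threshold)


by_2011_standard = satisfactory(response_rates, THRESHOLD_2011)
by_2006_standard = satisfactory(response_rates, THRESHOLD_2006)

print("Satisfactory by the 2011 standard:", by_2011_standard)
print("Satisfactory by the 2006 standard:", by_2006_standard)
```

Same data, same subdivisions, and three of the four made-up areas pass under the 2011 standard while only one passes under the 2006 standard. Nothing about the responses changed; only the label did.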

Here is the 2011 map again… but this time, using the same standard of “satisfactory” as was used in 2006:

The census subdivisions which had “satisfactory” response rates for the 2011 National Household Survey… according to the standard of “satisfactory” used in 2006 and before.

Compare that to the 2006 results, and then tell me again that the voluntary NHS is a satisfactory replacement for the long-form census.

But wait! Before you make that assertion, let me make one final point of comparison between the two.

As Wayne Smith mentioned, in order to “fix” the fact that the response rates for the NHS were abysmal, they simply sent out more surveys: up from ~1 in 5 sampled in 2006 to ~1 in 3 sampled in 2011. Not only did this clearly not fix the problem, as the above images illustrate… it also cost ~$22 million more.

Let that sink in. We paid $22 million extra… for shittier data. And to hide the fact of just how much shittier it is, StatCan basically just redefined “shitty”.

So, now try to argue that scrapping the long-form census was a good idea.

CC BY-SA 4.0
The aftermath of scrapping the long-form census by Indi in the Wired is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

2 Responses to The aftermath of scrapping the long-form census

  1. You could almost buy the civil liberties argument if it didn’t come from the same government that’s stomped all over them in the name of fighting terrorism.

    • Honestly, I find the civil liberties argument very weak. (I did think about bringing it up – I hate writing things that are one-sided – but I wanted the post to stay on point, and the massive damage to the data quality is a strong enough point on its own.)

      I would agree that government shouldn’t be demanding information from citizens… but I qualify that by adding “unless it’s reasonably necessary”. Context matters; requiring ID when voting is entirely different from requiring ID when walking down the street.

      Census data is not collected just for shits and giggles – that data is used to facilitate the administration of the country/province/whatever in a *huge* number of ways. It is necessary for determining how to allocate resources, like whether or not an area needs a hospital or a larger fire department. It is the core of evidence-based governance. Government cannot function as a rational entity without the census data to guide it.

      And all of that is without even mentioning the *enormous* value census data has to research.

      On balance, considering the massive benefits of the census – including the “benefit” that government can’t work rationally without it – I don’t think privacy or civil liberties arguments hold much weight. And this is coming from someone who takes privacy and civil liberties *very* seriously, to the point where I’m urging everyone who will listen to consider setting up encryption.

      I agree that citizens should have the option to refuse… but I don’t agree it should be as easy as “meh, this’ll take 10 minutes and my show’s coming on *toss in the bin*”. If a citizen has a legitimate objection, they can raise it in court, and it can be considered on a case-by-case basis. Otherwise, they just have to accept the burden of filling out a 50 question form once every 5 years (if that) as part of their duty as citizens, just like paying taxes.
