Queensland state projections from Federal Senate voting data

With the resurgence of One Nation, there's some need to model how they'd perform running all over Queensland in a state election.

This being a resurgence, we don't have much by way of useful data from the last state election. Even the more recent Federal election isn't too much help, because One Nation only ran in 12 of 30 federal divisions. It's certainly of some use, because they ran mostly in their stronger areas and are likely to do so again at the state election... but when we're trying to estimate seat-level results from state-wide vote share, we need a ground truth covering the entire state.

Enter Senate data, which is most definitely state-wide.

The Australian Electoral Commission (AEC) has published a spreadsheet ('Formal Preferences' for Queensland, 2016) which lists every formal Senate ballot's preference sequence and the polling place it was lodged at. ('Postal' etc counts as a polling place, broken up by federal division of the voter.)

The AEC has also published a spreadsheet ('Votes by SA1') which, for each polling place, lists the number of House of Representatives votes from each Statistical Area level 1 (SA1, usually contains about 200-400 voters).

The Electoral Commission of Queensland (ECQ) has published a spreadsheet ('Current and Projected SA1 Enrolment') detailing, for each SA1 in Queensland, the state electoral district in which it now resides as of the final determination for the state redistribution made in May 2017. Some SA1s are (or were?) split between districts; each part of an SA1 has its own line in the spreadsheet.

Plan of attack: take polling place data, project it down onto the fine grain of the SA1s (or parts thereof), then aggregate into state electoral districts.

There will, of course, be a number of processing steps.

The first issue is that there are many more parties on the Senate ballot than there will be on the State ballot papers. We solve this problem by only considering the Senate preferences for a subset of the parties: {Greens, Labor, Liberal National, One Nation, and None of those four}. People might interleave preferences for the four parties; we will consider only the earliest preference for each. There are 65 (potentially partial) orderings, detailed below:

1 no-preferences:
()

4 one-preferences:
(Grn), (Lab), (Lnp), (Phn)

12 two-preferences:
(Grn, Lab), (Grn, Lnp), (Grn, Phn), (Lab, Grn), (Lab, Lnp), (Lab, Phn), (Lnp, Grn), (Lnp, Lab), (Lnp, Phn), (Phn, Grn), (Phn, Lab), (Phn, Lnp)

24 three-preferences:
(Grn, Lab, Lnp), (Grn, Lab, Phn), (Grn, Lnp, Lab), (Grn, Lnp, Phn), (Grn, Phn, Lab), (Grn, Phn, Lnp), (Lab, Grn, Lnp), (Lab, Grn, Phn), (Lab, Lnp, Grn), (Lab, Lnp, Phn), (Lab, Phn, Grn), (Lab, Phn, Lnp), (Lnp, Grn, Lab), (Lnp, Grn, Phn), (Lnp, Lab, Grn), (Lnp, Lab, Phn), (Lnp, Phn, Grn), (Lnp, Phn, Lab), (Phn, Grn, Lab), (Phn, Grn, Lnp), (Phn, Lab, Grn), (Phn, Lab, Lnp), (Phn, Lnp, Grn), (Phn, Lnp, Lab)

24 four-preferences:
(Grn, Lab, Lnp, Phn), (Grn, Lab, Phn, Lnp), (Grn, Lnp, Lab, Phn), (Grn, Lnp, Phn, Lab), (Grn, Phn, Lab, Lnp), (Grn, Phn, Lnp, Lab), (Lab, Grn, Lnp, Phn), (Lab, Grn, Phn, Lnp), (Lab, Lnp, Grn, Phn), (Lab, Lnp, Phn, Grn), (Lab, Phn, Grn, Lnp), (Lab, Phn, Lnp, Grn), (Lnp, Grn, Lab, Phn), (Lnp, Grn, Phn, Lab), (Lnp, Lab, Grn, Phn), (Lnp, Lab, Phn, Grn), (Lnp, Phn, Grn, Lab), (Lnp, Phn, Lab, Grn), (Phn, Grn, Lab, Lnp), (Phn, Grn, Lnp, Lab), (Phn, Lab, Grn, Lnp), (Phn, Lab, Lnp, Grn), (Phn, Lnp, Grn, Lab), (Phn, Lnp, Lab, Grn)

The difference between the latter two sets is that some people chose to exhaust their vote rather than bothering to rank their least-preferred party of the four.

A Python script handily classifies every ballot into one of those 65 orderings and then summarises by polling place. I wouldn't want to do this analysis for additional parties: for five, there would be 326 possible partial orderings! At that point, and especially given the fifth issue discussed further down, it would be better to just use primaries and set statewide preference flows.
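As a rough illustration of the classification step (not the actual script), here is a minimal sketch. It assumes each ballot has already been reduced to a list of party labels in the order the voter ranked them; the `PARTIES` codes, `classify` and `count_orderings` are names made up for this example. It also checks the 65 and 326 figures.

```python
from collections import Counter
from math import perm

# Assumed short codes, matching the lists above.
PARTIES = ("Grn", "Lab", "Lnp", "Phn")

def classify(ranked_parties):
    """Reduce a ballot to the order in which the four parties first appear.

    `ranked_parties` is assumed to be the ballot's party labels sorted by
    preference number; labels for any other party are skipped.
    """
    seen = []
    for party in ranked_parties:
        if party in PARTIES and party not in seen:
            seen.append(party)
    return tuple(seen)

def count_orderings(n):
    """Number of partial orderings of n parties: sum of n!/(n-k)! for k = 0..n."""
    return sum(perm(n, k) for k in range(n + 1))

# Hypothetical ballots: interleaved preferences collapse to the earliest mention.
ballots = [
    ["Phn", "Other", "Lnp", "Phn", "Lab"],
    ["Other", "Other"],
    ["Grn", "Lab", "Lnp", "Phn"],
]
tally = Counter(classify(b) for b in ballots)

print(count_orderings(4), count_orderings(5))
```

Summarising by polling place would then be a matter of keeping one such `Counter` per polling place.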

At this stage we could also arguably include informal Senate ballots (which either have no identifiable first preference marking at all, or are otherwise disqualified) as votes for None. We can't, however, include non-voters as None, because we don't know where they'd vote.

At this stage we have a fairly usable spreadsheet, viewable here: https://docs.google.com/spreadsheets/d/16ZD62akSNBXx2s3tXcAZQNZOxSgLLBqTGRfeZuWvXvo/edit?usp=sharing

The second issue to deal with is that Senate turnout is slightly higher than House turnout (and, more worryingly, the House turnout numbers from the spreadsheet don't actually match the published total). We resolve that issue by working with proportions rather than raw counts: each House vote that an SA1 cast at a polling booth is credited with that booth's proportion of each vote-order (so if a polling booth had equal quantities of each vote-order, and precisely one person from a certain SA1 voted there, that SA1 would be credited with 1/65th of a vote in each category).
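The proportional allocation can be sketched roughly as follows. The function name `apportion` and the booth numbers are invented for illustration, and the real spreadsheets would need parsing into this shape first.

```python
def apportion(booth_tally, house_votes_from_sa1):
    """Credit an SA1 with a booth's vote-orders, in proportion to the booth's
    Senate tally, for the House votes that SA1 cast at that booth."""
    total = sum(booth_tally.values())
    if total == 0:
        return {}  # no Senate ballots at this booth; nothing to allocate
    return {order: house_votes_from_sa1 * count / total
            for order, count in booth_tally.items()}

# Made-up booth: 100 Senate ballots across three vote-orders.
booth = {("Lab",): 30, ("Lnp", "Phn"): 50, (): 20}

# Suppose four House votes at this booth came from one SA1.
credit = apportion(booth, 4)
```

An SA1's voters are usually spread over several booths, so its overall figures would be the sum of these credits across every booth where it has votes.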

The third issue is that the ECQ has different electors-per-SA1 numbers than the AEC does. There are two reasons for this: the AEC figures count turnout at the July election, whereas the ECQ figures count enrolment about six months later, so the ECQ numbers should be higher. We solve this issue by scaling the federal votes on an SA1-by-SA1 basis.

The fourth issue lies in dealing with SA1s being split between state districts. This is actually quite simple to deal with: the ECQ publishes how many electors will be in each part of the SA1, so just split the votes accordingly.

Actually, issues 3 and 4 can be combined in the one step: simply scale the federal votes by ECQ SA1 [part] population / AEC SA1 total.
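A minimal sketch of that combined scale-and-split step, followed by the aggregation into districts. The data layout (`sa1_votes`, `aec_totals`, `ecq_parts`) and all identifiers and numbers are assumptions for illustration, not the format of the published spreadsheets.

```python
from collections import defaultdict

def project_to_districts(sa1_votes, aec_totals, ecq_parts):
    """Scale each SA1's federal votes by (ECQ part electors / AEC SA1 total)
    and accumulate the result into state districts.

    sa1_votes:  {sa1: {vote_order: votes}}  federal votes credited to each SA1
    aec_totals: {sa1: total AEC votes for that SA1}
    ecq_parts:  iterable of (sa1, district, electors), one row per SA1 part
    """
    districts = defaultdict(lambda: defaultdict(float))
    for sa1, district, electors in ecq_parts:
        aec_total = aec_totals.get(sa1, 0)
        if aec_total == 0:
            continue  # no federal votes recorded for this SA1; nothing to scale
        scale = electors / aec_total
        for order, votes in sa1_votes.get(sa1, {}).items():
            districts[district][order] += votes * scale
    return districts

# Made-up SA1 "A", split between hypothetical districts "X" and "Y".
sa1_votes = {"A": {("Lab",): 120.0, ("Lnp",): 80.0}}
aec_totals = {"A": 200}
ecq_parts = [("A", "X", 150), ("A", "Y", 100)]

result = project_to_districts(sa1_votes, aec_totals, ecq_parts)
```

SA1s with no recorded federal votes are simply skipped in this sketch; how best to handle them is a separate modelling choice.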

At this stage, having performed the aggregation by district, we have another spreadsheet, viewable here: https://docs.google.com/spreadsheets/d/1N0fH5nvKwsnuSmjEPWQpKFt7fjIrtSvw0kVBuWw0tWc/edit?usp=sharing

The fifth issue comes when simulating elections. While knowledge of further preferences is helpful, Queensland at a state level has recently reverted to full-preferential ballots, whereas the Senate has just switched to partial-preferential. We must decide how voters who exhausted their Senate ballot would preference when forced, and the answer doesn't necessarily match the ratio among voters who did preference. As an example, in my experience of lower-house elections, about 20% of Greens voters will preference Labor if forced, but exhaust their vote if they can (the Greens who preference Liberal usually do so regardless).
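To see why the choice matters, here is a small sketch under strong simplifying assumptions: the contest is taken to come down to two named parties, and `forced_share_to_a` is a hypothetical modelling input giving, for each first-listed party, the fraction of exhausted ballots that would go to party `a` if forced. The tally is invented to mirror the Greens example above.

```python
def two_candidate_count(tally, a, b, forced_share_to_a):
    """Count ballots for a versus b. Ballots ranking neither are split using an
    assumed share, keyed by the ballot's first-listed party (None if empty)."""
    totals = {a: 0.0, b: 0.0}
    for order, votes in tally.items():
        ranked = [p for p in order if p in (a, b)]
        if ranked:
            totals[ranked[0]] += votes  # ballot expresses a choice between a and b
        else:
            first = order[0] if order else None
            # 0.5 is an arbitrary placeholder when no share has been supplied.
            share = forced_share_to_a.get(first, 0.5)
            totals[a] += votes * share
            totals[b] += votes * (1 - share)
    return totals

# Invented tally of 100 Greens-first ballots: 60 go on to Labor,
# 20 go on to the LNP, and 20 exhaust.
tally = {("Grn", "Lab"): 60, ("Grn", "Lnp"): 20, ("Grn",): 20}

# Exhausted ballots split in the same ratio as those who did preference (60:20).
proportional = two_candidate_count(tally, "Lab", "Lnp", {"Grn": 0.75})

# Exhausted ballots all go to Labor when forced.
forced = two_candidate_count(tally, "Lab", "Lnp", {"Grn": 1.0})
```

The two assumptions differ by five votes in every hundred Greens ballots here, which is enough to matter in a close seat.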

The sixth issue is actually applying swings! For example, if polling shows a 5% swing away from One Nation since the election, but One Nation only got 4% in some city seats... what to do? This is a question which I'm still thinking on.
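The arithmetic of the problem can be shown with a small sketch. It compares a uniform swing (in percentage points) with a proportional swing, which is one possible alternative and not something settled on here. The function names and the state-wide figures are invented for illustration.

```python
def uniform_swing(seat_share, swing_points):
    """Add the same percentage-point swing to every seat."""
    return seat_share + swing_points

def proportional_swing(seat_share, state_before, state_after):
    """Scale the seat's share by the ratio of state-wide shares."""
    return seat_share * state_after / state_before

# Hypothetical numbers: state-wide share falls from 10% to 5%,
# in a city seat where the party polled 4%.
seat = 4.0
uniform = uniform_swing(seat, -5.0)                 # goes below zero
proportional = proportional_swing(seat, 10.0, 5.0)  # stays non-negative
```

A proportional swing never goes negative, but it has its own drawbacks (for instance, it moves strong seats by more points than weak ones), so this only illustrates the trade-off.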

What isn't so much of an issue is people voting differently at federal and state levels. The difference is masked by any swings. Ideally, we'd be able to calibrate our seat model against a state poll held at roughly the same time as the federal election.

The other, Queensland-specific factor is Katter's Australian Party. KAP and PHON voters are fairly similar, but the successor districts to the two currently held by KAP MPs should be analysed separately.

Update: Thanks to Kevin Bonham, we now have some more detailed analysis of seats which might have a non-classic result.