Monday, 21 December 2015

On Sub-Samples: Inferring and Weighting

As I write, across social media numerous academics and pollsters are criticising certain individuals for selecting sub-samples of commercial polls and citing them as fully representative polls of their respective sub-sections of the polled population. For example, one such individual was claiming that the Scottish sub-sample of respondents from a national poll was reflective of the overall voting intention of the Scottish population.

This is of course entirely wrong. Such sub-samples are not representative of the sub-populations they are drawn from, simply because they are not large enough to have captured an accurate (within acceptable margins of error) distribution of voting intention within that population. Sampling theory and distribution theory combine in this regard to tell us this. It is for this same reason that we set a benchmark of at least 1,000 respondents for commercial polls in order for them to even begin to be considered a representative sample of an electorate.

Put simply, polling 100 people in Scotland would not in any way, shape or form give you an accurate representation of the voting intention of the population of Scotland. Thus, if a sub-sample within a commercial poll only contains 100 Scottish respondents, we cannot take those Scottish respondents as a standalone sample and suggest that the reported voting intention as recorded by said poll is an accurate reflection of current Scottish voting intention. It simply doesn't work like that.
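To put rough numbers on this, here is a minimal sketch of the textbook margin of error for a reported proportion. It assumes simple random sampling, a 95% confidence level and the worst case of a 50% share; commercial polls use quota or panel samples with their own design effects, so these figures are indicative at best. The function name `margin_of_error` is purely illustrative.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p from n respondents.

    Assumes simple random sampling (an idealisation of real polling).
    """
    return z * math.sqrt(p * (1 - p) / n)

# Compare a 100-person sub-sample with a 1,000-person poll.
for n in (100, 1000):
    print(f"n={n}: +/- {100 * margin_of_error(n):.1f} points")
# n=100: +/- 9.8 points
# n=1000: +/- 3.1 points
```

Under these assumptions a party reported at 40% in a 100-person sub-sample could plausibly sit anywhere from roughly 30% to 50%, which is far too wide to say anything useful about voting intention.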

However, what is of great interest is how we are (rightly) so quick to condemn treating sub-samples within commercial polls as representative of their respective populations, and yet so at ease with using equally small (and sometimes smaller) sub-samples in weighting models.

Weighting is applied to representative samples in order to improve their representativeness, and to try to better reflect things like likelihood to turn out to vote. Every commercial polling company uses weights, and all survey databases come with weights ready to apply.

While the benefits of (and indeed need for) weighting are not confusing, what is puzzling is that the very assumptions which we unanimously agree are false when reading Scottish voting intention from sub-samples of commercial polls are seemingly considered true when we apply weighting models to those same sub-samples.

For instance, in order to confidently apply a weight which might up-weight Scottish respondents (because they were perhaps underrepresented in the original sample), are we not actually assuming the exact same thing as those who are reading results from the sub-samples: that the individuals that we do have within those samples reflect an accurate representation of the distribution of voting intention within said group?
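One way to make this question concrete is Kish's approximate effective sample size, (sum of weights) squared divided by (sum of squared weights), which estimates how much precision a weighted sample retains. The sketch below uses entirely hypothetical numbers: a poll of 1,000 with 50 Scottish respondents, up-weighted to an assumed target share of 8.6%. It illustrates the arithmetic only, and is not any polling company's actual weighting scheme.

```python
def kish_effective_n(weights):
    """Kish's approximate effective sample size for a set of weights."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

# Hypothetical poll: 1,000 respondents, of whom only 50 are Scottish.
n_total, n_scot = 1000, 50
target_share = 0.086  # assumed target share for Scotland (illustrative)

# Simple cell weights: target count divided by achieved count.
w_scot = target_share * n_total / n_scot
w_rest = (1 - target_share) * n_total / (n_total - n_scot)

weights = [w_scot] * n_scot + [w_rest] * (n_total - n_scot)
n_eff = kish_effective_n(weights)

print(f"Scottish weight: {w_scot:.2f}")
print(f"Effective sample size of the whole poll: {n_eff:.0f}")
```

Under these assumptions the poll as a whole loses only a little precision, but each of the 50 Scottish respondents now counts for more than one and a half people. The weight scales their answers up; it adds no new information about Scottish voters.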

This same logic would apply to weights on any sub-sample, from age to newspaper readership. Why are we so quick to disregard any inferring of results from sub-samples, but so quick to weight on them?

At the theoretical level, distribution theory (in the statistical sense: how estimates drawn from a sample are distributed around the true population value) would suggest that both reading results from sub-samples and applying weighting models based on sub-samples are in effect assuming the same thing: that such sub-samples have captured a representative distribution of voting intention of the population which that sub-sample represents.

Why are we so confident in one regard (that we can robustly apply weighting models and functions to small sub-samples) but so firmly unconfident in the other (that we can infer sub-population results from sub-samples)? Why is representativeness a worry for inferring results from sub-samples, but not for applying weights to them? Does the small 'n' problem simply not apply to weighting?

Friday, 4 December 2015

Oldham West and Royton

Last night, Labour held Oldham West and Royton in a fashion and by a margin which no one predicted. The overriding feeling coming into this by-election was that Labour would struggle to achieve anything other than an unconvincing victory by slim margins, and that UKIP would be tugging furiously on their coat-tails in a northern seat in which they had already finished second earlier this year, one with a higher-than-average share of working-class voters (40% compared to a national average of 26%), with whom they do so well.

Counting against them, the Labour Party had the usual by-election effects which align against incumbent parties in these situations: lower turnout and increased protest voting usually favour second-placed parties, with 'soft' voters for the majority party more likely to stay at home.

But they also had to contend with additional factors such as an apparently organised and motivated UKIP gunning for their working-class northern safe seats, and of course a new leader in Jeremy Corbyn widely viewed as unpopular - particularly with groups identified as likely to vote UKIP - and unelectable. All of this pointed to a night which would be considered as nothing other than disappointing for the party. Some even suggested that they were in serious danger of losing the seat.

In reality, by this morning it has become clear that on a much higher turnout than expected Labour cruised to victory on an increased vote share (62%), with UKIP languishing in a distant, distant second (23%). The result suggests that voters turned out in far greater numbers than anticipated to back Labour.

It is of course important to note that UKIP were never in a serious position to take this seat. Rumours of their chances of taking Oldham West and Royton from Labour were greatly exaggerated. Nonetheless, the usual rules of engagement which surround by-elections and other factors mentioned above pointed strongly to a large increase in their vote share and a dramatic slashing of Labour's majority which was not forthcoming.

Indeed, even those who were playing down UKIP's chances of winning were still all but guaranteeing a strong second place finish, which again simply didn't happen. There was even talk of a private poll putting the race to within just 1,000 votes. We should look with great interest at why predictions in this regard were so far out, and what this means for our estimates of the current standings of the respective parties.

In terms of the Labour Party, after this result some current collective wisdom is in need of readdressing. The idea that Corbyn and a left-wing Labour Party are by nature incapable of winning elections is one which is not challenged anywhere near enough. Prior to this result, articles by the author have pointed out a wide range of left-wing policies which are popular with the British electorate, and further how voters often select on the basis of perceived competence and valence rather than ideological position.

And here in Oldham West and Royton we have some striking empirical evidence attesting to the overblown and overstated impact of the reported unpopularity of Corbyn as Labour leader: while experts and commentators strongly expected Labour to be punished for their 'poor' choice of leader, this quite clearly has not happened. Looking at this result, it now seems tenuous at best to say with conviction that Corbyn is dragging Labour into an electoral oblivion.

Also of interest is the depressed nature of both the Conservative (-10) and Liberal Democrat (no change) vote shares. While it was widely argued that Labour's move to the left would cost them dearly in terms of voters leaving to back these two parties now angling for position on the centre ground, evidence from Oldham West and Royton again suggests that this has not happened. On this evidence, who can still say with such certainty that Labour will be punished and suffer at the hands of the electorate with Corbyn at the helm?

Part of the picture may well be that Corbyn and his Labour Party simply are not as unelectable as is currently widely believed, and that at the ballot box a large majority of their traditional voters will remain loyal. But another part of it surely has to be the clear failure of UKIP to capitalise on any of the factors aligning before them in a by-election which appeared on paper to present a perfect opportunity for advancement.

Some commentators have criticised them for having a poor ground game (failing to properly organise and get voters out on the day); others have suggested that the party may now be hitting a 'ceiling' in terms of their support. The post-mortem on their performance will likely go on for a few days yet, and we should follow it with great interest for its implications for 'Plan 2020' (and indeed the 'Leave' vote in 2017). If they cannot make any serious and significant inroads here, in this by-election context, it is not a good sign for either.

This is of course only one result. Just one vote still fairly close to both the previous election and the election of Corbyn as a new leader. Some may therefore argue that not enough time has passed for the true effects of both Corbyn's election as leader and UKIP's 2020 game to get into gear. Others might argue that this result may be isolated, that this is not the best context in which to judge these effects. But you suspect that they would not have said this had the result gone as predicted just the day before.

In the short term this result will provide a welcome and much needed boost to the Labour Party leadership, and rightly so. This result, not by the victory itself but by the nature and margin of it, is a clear and stinging challenge to the 'Corbyn oblivion' hypothesis. Equally, it must surely worry UKIP and their 2020 plan.