Saturday, 17 June 2017

On #GE2017 Forecasting and Diagnosing the 'Polls Plus' Model

The 2017 General Election campaign, result and fallout are each undeniably remarkable.

How a governing party was able to go from a 20-point lead to a lead of just over 2 points (constituting a swing towards their opposition) over the campaign period defies all previous knowledge about the impact of campaigns, and the ability of parties to turn their fortunes around so dramatically in just six weeks.

Similarly, the dramatic improvement in the personal approval rating of Jeremy Corbyn is beyond anything we have seen before - previous wisdom dictated that first impressions were everything.

Also, for only the second time in modern polling history, polling undershot Labour Party voting intention by a significant margin - around 5% on average. Rather than there being an 8-point Conservative lead as suggested by the final polling average, the Conservatives were just 2.5% ahead on the day.

The resulting hung parliament, with the Conservatives short of an absolute majority by seven seats, was an outcome forecast by only one forecaster - Ben Lauderdale for YouGov.

Many are suggesting that the collective failure of the political academic and wider data-driven community to predict the oncoming hung parliament (save for Lauderdale/YouGov) is indicative of some chronic problem: that there is now something structurally or fundamentally wrong with political science when it comes to election analysis and forecasting. In short: we simply don't get it any more.

I disagree.

Firstly, it is right that we acknowledge where we went wrong, where we can improve, and admit that currently we are still some way away from where we would like to be in terms of our reading of the current political climate and prediction of outcomes.

That said, some forecasts were simply not as far off as others, and while the data which we all rely on contains systematic error, some level of systematic error in our forecasting work is to be expected. Rather than abandon ship and proclaim forecasting as a doomed and dead enterprise, we should continue to analyse and seek to improve our methods and give time for polling to re-calibrate and readjust.

As a summary of this rather long piece:
  1. Rumours of the death of academic forecasting are much exaggerated: some polling/swing based models were much more accurate than others, and academic work continues to be very informative and useful. 
  2. Rather than academic methodology or knowledge being fundamentally flawed, the large underestimation of Labour Party voting intention in GE2017 polling was the leading cause in forecasting error - particularly for those closer to the mark. 
  3. The PME Politics uniform swing model was actually closer to the overall total Conservative seats than even the Lauderdale/YouGov model - which is not a good thing. 
  4. The PME Politics 'Polls Plus' model was thrown off by including a polling-error parameter set in the wrong direction (anticipating an undershoot in Tory voting intention), and by under-estimating the transfer of UKIP vote share to Labour. 
  5. Switching the polling-error parameter directly around in the 'Polls Plus' model produces a near perfect estimation of the Conservative seat total - 317.
  6. The PME Politics 'Polls Plus' model was the second most accurate GE2017 forecast, captured the upward drift of the Labour Party much quicker than most, and its regional modelling approach performed very well - an approach I would urge future forecasters to adopt.

Differential Forecasting Error - Some Models Performed Much Better than Others

Firstly, the idea that all forecasters were equal in their miss, that we should all be thrown into the same scrap bucket, is simply not true. Most obviously, the Lauderdale/YouGov forecast correctly projected the hung parliament, with a central forecast of 302 seats for the Conservatives.

The method employed in that model was multilevel regression analysis, used to predict outcomes constituency by constituency. This differs from the polling average/uniform swing methods which the vast majority of forecasters - including myself - currently use. It also only included YouGov polls, which turned out to be among the most accurate this time around.

The much lauded (and rightly so) Lauderdale/YouGov forecasting model did predict a hung parliament, but was itself fairly off with its total Conservative seat projection (and thus the party's proximity to a Commons majority). Their final forecast of 302 seats for the Tories (not to be confused with an on-the-day poll, which made a last-minute change in methodology that boosted the Conservative Party lead to 7 points) was 16 seats away from the party's final figure.

16 away is by far the closest anyone managed to get, and combined with the correctly predicted result (hung parliament) makes Lauderdale/YouGov's forecast by far and away the GE2017 winner.

But were others really that much further off?

Separate forecasts produced by myself, Steven Fisher and Rosie Sharrocks, and Michael Thrasher were the next closest to the overall result, and projected a total Conservative Party seat count of 348, 349 and 350 (if you include the Speaker in the Conservative column, as the previous two did and as on-the-night results flows do) respectively. These forecasts thus overshot the actual Conservative seat total by 30, 31 and 32 seats - roughly twice as far from the final total as Lauderdale/YouGov.

While half as good as the Lauderdale/YouGov model, these forecasts were not wildly wrong. The models used by each were much informed by ongoing, longstanding research into election forecasting and analysis, combining contemporary academic knowledge and research with polling estimates.

Other forecasts were indeed far wider of the mark. Some forecasted majorities for the Conservatives of around 100 seats (such as by Financial Times Election Analyst Matt Singh) and others close to 125 (such as by Ian Warren at Election Data). This is an overshoot of around 60 and 70 seats respectively.
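The distances quoted in this section are simple arithmetic, and can be checked with a short Python sketch. The seat figures come from the text above; the 650-seat House of Commons and the conversion from a majority to a seat count are standard, while the variable and function names are my own and purely illustrative.

```python
ACTUAL_CON_SEATS = 318  # final Conservative total quoted in this post
TOTAL_SEATS = 650       # size of the House of Commons


def seats_from_majority(majority, total=TOTAL_SEATS):
    """Seats held by a party with the given overall majority."""
    # majority = seats - (total - seats)  =>  seats = (total + majority) / 2
    return (total + majority) / 2


# Central Conservative seat forecasts as described in the text.
forecasts = {
    "Lauderdale/YouGov": 302,
    "PME Politics 'Polls Plus'": 348,
    "Fisher/Sharrocks": 349,
    "Thrasher": 350,
    "Singh (majority ~100)": seats_from_majority(100),
    "Warren (majority ~125)": seats_from_majority(125),
}

# Positive error = forecast too high for the Conservatives.
errors = {name: seats - ACTUAL_CON_SEATS for name, seats in forecasts.items()}

for name, error in errors.items():
    print(f"{name}: {error:+}")
```

On these figures the four closest forecasts miss by 16, 30, 31 and 32 seats, and the two landslide forecasts by roughly 57 and 70.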

In short, there were degrees of error within our GE2017 forecasts, and I would argue that polling/swing based models have not yet had their day. That said, there is clearly an awful lot that we can learn from the kinds of models used successfully by both Lauderdale/YouGov and Chris Hanretty (EU Ref) before.

Polling Errors as Root Cause of Forecasting Error

What is also worth remembering is this: we are indeed in uncharted territory, and errors in the principal data which we all rely on to forecast with - polls - are still present. As the above introduction highlighted, this General Election broke many molds and conventions: the ability of opposition parties and their leader to recover from terrible polling ratings, campaigns (and particularly manifestos) making a difference, and polling significantly overestimating a Conservative Party lead.

This final point has much to do with why average polling/swing based estimates were as far off as they were (to varying degrees, as highlighted above).

Many of us, including myself, were correcting in our forecast models for an underestimate of the Conservative Party vote share, in line with what happened in 2015. In truth, while many of us were sceptical that the pollsters had fixed the issues raised by the Sturgis report into the 2015 polling miss, it appears that many of them actually substantially 'over-corrected', and were thus producing estimates of Labour Party voting intention that were far too low - almost 10 points too low in the case of BMG.

Simply put: we should expect average polling/swing based models to be inaccurate if polling continues to be inaccurate. Forecasts in 2015 were off largely because polling was not adequately capturing Conservative Party voting intention. Forecasts in 2017 were off largely because polling was not adequately reporting Labour Party voting intention.

It was a mistake for us to assume that the same error would continue from 2015 into 2017. Perhaps we could have taken a closer look at the changes made by pollsters since the Sturgis report, but I do not believe that we could have reasonably forecasted that the error would have flipped right around.

Diagnosing the PME Politics Forecasting Model

Turning to my own forecast, the worst fear of any modelling diagnostic was indeed realised in my case: the standard, flat uniform swing (UNS) model outperformed it (at least on the Conservative and Labour seat counts). That is to say, my attempts to make uniform swing more accurate actually made the forecast less accurate.

The final UNS model, which takes the average swing to/from each party implied by the average of all polling and applies it to every 2015 constituency result in the country, projected a result of: Conservatives (332), Labour (237), Lib Dems (5), UKIP (0), SNP (54), Plaid (3) and Greens (1).
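For readers unfamiliar with uniform swing, the mechanics can be sketched in a few lines of Python. This is a minimal illustration only, assuming an additive change in percentage points applied identically to every seat; the two constituencies and the national changes are made up, and the function names are hypothetical rather than taken from the actual PME Politics model.

```python
def uniform_swing(results_2015, national_change):
    """Apply the same change in vote share (in points) to every constituency."""
    projected = {}
    for seat, shares in results_2015.items():
        projected[seat] = {
            # A share cannot fall below zero.
            party: max(0.0, share + national_change.get(party, 0.0))
            for party, share in shares.items()
        }
    return projected


def seat_totals(projected):
    """Count how many seats each party wins on the projected shares."""
    totals = {}
    for shares in projected.values():
        winner = max(shares, key=shares.get)
        totals[winner] = totals.get(winner, 0) + 1
    return totals


# Two made-up constituencies (vote shares in %), NOT real 2015 results.
results_2015 = {
    "Seat A": {"Con": 40.0, "Lab": 38.0, "LD": 10.0, "UKIP": 12.0},
    "Seat B": {"Con": 45.0, "Lab": 35.0, "LD": 8.0, "UKIP": 12.0},
}
# Made-up national changes since 2015, in percentage points.
national_change = {"Con": 6.0, "Lab": 9.0, "LD": 0.0, "UKIP": -10.0}

projection = uniform_swing(results_2015, national_change)
print(seat_totals(projection))
```

In this toy case Labour overturns the two-point Conservative lead in Seat A but falls short in Seat B, which is all that uniform swing does, seat by seat, across the country.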

The UNS model was actually more accurate even than the Lauderdale/YouGov model; 332 is 14 seats away from the final Conservative total of 318, whereas the Lauderdale/YouGov model was 16 seats off. 

The final PME Politics 'Polls Plus' forecast by comparison projected: Conservatives (348), Labour (224), Lib Dems (10), UKIP (0), SNP (46), Plaid (3) and Greens (1).

So the UNS model predicted the Conservative and Labour Party seat totals much better, but the 'Polls Plus' model much more accurately predicted the SNP and Liberal Democrat seat totals.

Two quick fixes to the 'Polls Plus' model can provide much more accurate seat totals: removing the polling error parameter (which moved swing away from Labour and towards the Conservatives), and increasing the projected transfer of UKIP voters to Labour (initially set at 0.2, now set at 0.3).

These changes produce the following seat totals: Conservatives (330), Labour (239), Lib Dems (12), UKIP (0), SNP (47), Plaid (3) and Greens (1).

These two fixes could have been realistically anticipated and applied before the night, and would have greatly improved the forecast's accuracy (and indeed make it outperform the UNS model). 
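To make the UKIP transfer parameter concrete, here is a minimal Python sketch. It assumes the parameter is the fraction of UKIP's lost vote share that is credited to Labour in a seat; how the parameter actually enters the 'Polls Plus' model is not spelled out here, so the function name and the 10-point fall in UKIP share are purely hypothetical.

```python
def labour_gain_from_ukip(ukip_drop_pts, transfer_rate):
    """Points added to Labour when UKIP's share falls by `ukip_drop_pts`."""
    if not 0.0 <= transfer_rate <= 1.0:
        raise ValueError("transfer_rate must be between 0 and 1")
    return ukip_drop_pts * transfer_rate


ukip_drop = 10.0  # hypothetical fall in UKIP vote share, in points

old_gain = labour_gain_from_ukip(ukip_drop, 0.2)  # original setting
new_gain = labour_gain_from_ukip(ukip_drop, 0.3)  # revised setting

print(f"Labour gain at 0.2: {old_gain:.1f} points")
print(f"Labour gain at 0.3: {new_gain:.1f} points")
```

Under these assumptions the change from 0.2 to 0.3 is worth one extra point to Labour for every ten points UKIP lose, which is easily enough to matter in a tight marginal.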

A single and simple fix however, which could not realistically have been foreseen, produces a near perfect result. Directly flipping the polling-error parameter around, so that it anticipates the impact of polls underestimating Labour (rather than the other way around as in 2015), moves the projected seat totals to the following: Conservatives (317), Labour (253), Lib Dems (11), UKIP (0), SNP (47), Plaid (3) and Greens (1).

To be clear, this is the exact same model but rather than average, pre-modeled swing being increased slightly in the Conservative direction, it is instead increased in the Labour direction, and by the same factor.
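The flip described above can be sketched as follows. This is an illustration under stated assumptions, not the model's actual code: I assume the polling-error parameter is an additive shift in points taken from one party and given to the other, and the poll-implied changes and the one-point error are invented for the example.

```python
def apply_polling_error(con_change, lab_change, error_pts, polls_understate):
    """Shift poll-implied changes to allow for a suspected polling error.

    polls_understate="Con" moves swing towards the Conservatives (the
    2015-style correction); "Lab" moves it towards Labour by the same amount.
    """
    if polls_understate == "Con":
        sign = 1
    elif polls_understate == "Lab":
        sign = -1
    else:
        raise ValueError("polls_understate must be 'Con' or 'Lab'")
    return con_change + sign * error_pts, lab_change - sign * error_pts


# Hypothetical poll-implied changes since 2015, in percentage points.
original = apply_polling_error(6.0, 5.0, 1.0, polls_understate="Con")
flipped = apply_polling_error(6.0, 5.0, 1.0, polls_understate="Lab")

print("2015-style correction:", original)
print("Flipped correction:   ", flipped)
```

The rest of the model is untouched in both cases; only the sign of the correction changes.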

This is evidence that we should not abandon polling average or swing based models in favour of others: if polling averages become more accurate again in future, then these types of models will not miss in such a fashion as they have done this time around.

Where the 'Polls Plus' Model Performed Well

Despite calling the result incorrectly, and remaining 30 seats adrift of the final Conservative Party seat total, there were some elements of the model's performance which I am pleased with.

Firstly, the model responded quickly (and as it turned out accurately) to the narrowing of the Conservative Party lead over the campaign period. While it started out in similar territory as the aforementioned forecasts by Matt Singh and Ian Warren, the projected Conservative seat lead narrowed sharply as polling indicated a resurgent Labour Party was on its way to closing the gap. The final 'prediction tracker' graph below demonstrates the model's responsiveness to the campaign.


Secondly, the 'Polls Plus' model calculated and applied separate swing estimates for each of Britain's major regions: Scotland, Wales and London. These were informed by regional polling, mostly produced by YouGov. Without the separate calculations, the projected Conservative majority increases by around 10 seats. Thus, a regional approach to election modelling is certainly a recommendation I would make to future forecasting.
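As a rough sketch of what a regional approach means in practice, the Python below uses a region's own poll-implied changes where regional polling exists and falls back to the national figures otherwise. All of the numbers are invented for illustration and the names are hypothetical; only the choice of Scotland, Wales and London follows the text.

```python
# Hypothetical poll-implied changes since 2015, in percentage points.
NATIONAL_CHANGE = {"Con": 6.0, "Lab": 5.0}

REGIONAL_CHANGE = {
    "Scotland": {"Con": 12.0, "Lab": 2.0},
    "Wales": {"Con": 7.0, "Lab": 9.0},
    "London": {"Con": 1.0, "Lab": 8.0},
}


def change_for(region):
    """Use regional polling where it exists, otherwise the national swing."""
    return REGIONAL_CHANGE.get(region, NATIONAL_CHANGE)


for region in ("Scotland", "London", "North West"):
    print(region, change_for(region))
```

Seats in regions without their own polling simply inherit the national swing, so the regional layer only changes the projection where separate evidence is available.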

And in Conclusion...

There is no one approach to forecasting which has failed, or one which prevails - save of course for the Exit Poll managed by Professor John Curtice. Beyond this, there are no forecasting methods that we can be absolutely certain of. Beyond this, there are no election gurus.

As a political science community, we all have a lot to take away from this last General Election, and indeed still the one before, when it comes to our forecasting. Clearly, the success of the Lauderdale/YouGov multilevel regression method speaks for itself, but let's not write off average polling/swing based models quite yet - and certainly not while polling itself remains inaccurate.

As demonstrated above, a simple adjustment to counterbalance the underestimation of Labour voting intention in itself flipped the polling/swing based PME Politics 'Polls Plus' forecast from a Conservative majority to a near perfect projection of the result.

Thursday, 8 June 2017

#GE2017 Night - Oh What a Night!

As the dust settles on the election, it's time to write up my thoughts.

Quite frankly, who but a few could have predicted yesterday that the Conservatives would be losing their majority?

The exit poll did. And as part of the exit poll team, I have to say a huge congratulations to everyone on the team. From Psepho-in-Chief John Curtice, to fellow Curtice-minions Steve Fisher, Rob Ford, Jon Mellon, and Jouni Kuha. To our producer Tim and typist Tracey. To everyone involved in the operation at GfK and Ipsos -- and to all those who answered the exit poll!

We were delighted that everyone's immensely hard work paid off, and that we were able to successfully predict the outcome and guide the coverage on the night.

As for the result: only YouGov (Ben Lauderdale) had a hung parliament as their central forecast (before the exit poll). My own forecast suggested a Conservative seat total of 348 - a full 30 seats too high. It was however the closest model other than YouGov's that I know of (please do correct me if I am wrong there). I will run a full diagnostic on the model and write a report over the coming days.

In the meantime, full kudos to YouGov and Ben Lauderdale for their success in predicting the hung parliament.

To be where we are now given where we started when the snap election was called is quite incredible.

Since the forecasts of a 100-seat Conservative majority in May, three things have happened:
1) the Conservatives put together a dismal campaign,
2) the Labour Party put together an impressive campaign and manifesto which confounded their critics, and
3) the majority of polls were highly erroneous by assuming and weighting to a far too low turnout of young people. This was, we all think, a real central part to the story: young voters coming out and winning Labour those key seats.

In truth, Labour were up mostly everywhere. It was a good performance (compared to the 2015 baseline). We can however draw out some interesting stories which begin to unravel a bit the complex and dynamic picture we now see before us on the British electoral map.

As well as some clear and well established stories about age and voting (swing to Labour was around 5% in seats where more than 10% of the population were young people (aged 18-24), compared to around 2% in seats where fewer than 5% were), there were also two other clear dynamics: regional performances and the Brexit vote.

On the first, the North East of England was the only region where the Conservatives out-performed Labour outside of Scotland. Here they made a serious impact in terms of winning over UKIP voters, and flipping seats (Middlesbrough South and East Cleveland and Southport, for example).

In Scotland, the Scottish Conservatives continued their strong advances under Ruth Davidson and have undoubtedly successfully established themselves as the second party there. Indeed, the Conservatives' ability to govern from this point is entirely thanks to their wins north of the border, 12 in all, without which they would be much too far adrift.

Elsewhere however, Labour were very much on top and achieved large swings from both the Conservatives and UKIP. Remarkably, the assumption that the vast majority of UKIP voters would go to the Conservatives simply didn't hold up outside Northern seats. The swings in the South and East in particular were mightily impressive (3.5% in the South East region for instance), knocking over seats which never should have been on the cards such as Ipswich and Canterbury.

The Conservatives also did well in high leave voting areas, particularly those in the North West and Midlands. In Midlands seats where the Brexit vote was higher than 60%, the Conservatives rose on average by 10.5 points. This again did win them some contests, such as Walsall North and Mansfield.

Conversely, Labour were up on average 12 points in seats across the nation where the remain vote share was greater than 65%. They picked up wins across London, a high remain voting area, but also in Wales and the East where leave was in the majority.

The Liberal Democrats will be pleased to have increased their overall seat numbers, but their losses in England and Wales must be of great concern. They appeared to do well in England in high remain voting areas, against the Conservatives, and with well known candidates. Elsewhere, and particularly against Labour incumbents, they did not do well. In Scotland, the Lib Dems simply held up in many places while the SNP crashed all around them (which is the only real story there).

Young voters, the regions, and Brexit. What a night.

Final PME Politics #GE2017 Forecast - Conservative Overall Majority of 46 (348 Total Seats)

As of this morning, updated polling both nationally and regionally has been included into the model to produce the final PME Politics forecast.

The 'polls plus' model projects a result of:

Conservatives: 348
Labour: 224
Lib Dems: 10
UKIP: 0
Greens: 1
SNP: 46
Plaid: 3

This is based on an estimated swing of around +9 for the Conservatives on their 2015 results. The Butler swing at the aggregate level would therefore be -1 (a swing towards Labour).
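For reference, the conventional Butler (two-party) swing is half the difference between the change in the Conservative share and the change in the Labour share. The Python sketch below assumes that this conventional definition is the one meant here, and takes the two figures in the paragraph above at face value; the function names are my own.

```python
def butler_swing(con_change, lab_change):
    """Conventional two-party swing; positive means a swing to the Conservatives."""
    return (con_change - lab_change) / 2


def implied_lab_change(con_change, swing):
    """Labour change implied by a Conservative change and a Butler swing."""
    return con_change - 2 * swing


# Figures quoted above: Conservatives up about 9 points, Butler swing of -1.
lab_change = implied_lab_change(9.0, -1.0)
print(f"Implied Labour change: {lab_change:+.1f} points")
print(f"Check: Butler swing = {butler_swing(9.0, lab_change):+.1f}")
```

On that reading, a Butler swing of -1 alongside a Conservative rise of about 9 points corresponds to Labour rising by about 11 points.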

Accounting for forecasting errors, the model is projecting a Conservative majority of somewhere between 30 and 60 seats, with the central projection being about 45.

After the results come in, I will run a full diagnostic on the model and write a piece discussing its accuracy, what worked well and what didn't, and what the next PME Politics forecast might look like.

Wednesday, 7 June 2017

#GE2017 - What Will Happen to Student Seats?

This is a quick note on something I expect to see come out in election results analysis tomorrow - the dispersal of student voters out from 'student seats' into the wider electorate.

A good number of universities have now finished for the summer, or are about to at the end of this week. This means that instead of voting in their term-time 'student seats' (such as Sheffield Central, Manchester Central, and early declarers such as Newcastle Central), a whole bunch of student voters may be dispersing back into the wider population.

This effect will be quite dramatic at least on the 'student seats' themselves. Take Sheffield Central for example - here, according to the 2011 Census, students made up 38% of the population (the highest in the country). But the population of students usually resident in the constituency is just 16% according to the same Census data. This suggests that there could well be fewer than half the number of students voting in Sheffield Central tomorrow than there otherwise might be in a May General Election.

This effect will be repeated across all student towns and cities. In Manchester Central the figures stand at 30% and 15% respectively, 31% and 14% in Liverpool Riverside, 28% and 12% in Cambridge, the list goes on and on.
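The "under half" claim follows directly from the Census percentages quoted above, as this small Python sketch shows. It treats the first figure as the term-time student share and the second as the usually resident share, which is my reading of the two numbers, and it uses the ratio only as a rough indicator of how many student voters might remain.

```python
# (term-time student share %, usually resident student share %), from the text.
student_share = {
    "Sheffield Central": (38, 16),
    "Manchester Central": (30, 15),
    "Liverpool Riverside": (31, 14),
    "Cambridge": (28, 12),
}

# Rough share of the term-time student population still resident out of term.
remaining = {
    seat: round(resident / term_time, 2)
    for seat, (term_time, resident) in student_share.items()
}

for seat, ratio in remaining.items():
    print(f"{seat}: about {ratio:.0%} of term-time students remain")
```

In every one of these seats the ratio is at or below one half.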

This has two important consequences, both of which may well emerge as the election results come in over the course of Thursday night and Friday morning.

Firstly, we may see a substantial drop in turnout in student seats. Newcastle Central will declare early (probably 1st, I am told). Here, around 20% of the population are students. If even half of them are not voting in the seat (term time officially finishes on Friday), then we could see turnout decrease there dramatically. We should not however interpret this as indicative of a huge fall in turnout across the nation - it will only be student voters voting elsewhere.

Secondly, and perhaps less likely and definitely harder to detect, this dispersal of student voters could mean that instead of Labour 'racking up' votes in safe, student seats, these extra votes could well diffuse into the wider electorate. It is true, though, that the highest concentration of this diffusion is likely to be into seats with younger, more educated populations (which naturally provide more students), which would also be more likely to be Labour strongholds.

That said, if a few hundred students are returning to vote in constituencies such as Derby North and Ealing Central and Acton, then if the result is tight as some polls are suggesting, they could well be the difference in these key battleground seats.

Much of this also depends on whether or not the students themselves will stick around to vote in student seats, or had the foresight to organise voting at home. It is certainly something to look out for!

The Other Side of the Exit Coin - Green Party Saving (Veggie) Bacon

Much has (quite rightly) been made of the potential impact that UKIP's exit in over 250 seats could have on Labour's ability to hang on in marginals up and down the country. What is being discussed much less, and might be of crucial importance if the result does end up becoming as close as some pollsters and forecasters are projecting, is the impact that Green Party exits might generate.

The Greens are standing down in around 100 seats across the country (which they had contested in 2015). In 12 of these seats, the party posted a vote share of over 5%, from 5.1% in Cambridgeshire South East to 10% in York Central.

45 of the total seats in which the Green Party are exiting are tight marginal contests (where the swing required for them to change hands is less than 5%).

4 of the top 12 seats (where the Green vote share is larger than 5%) are marginals, including Brighton Kemptown and Lewes, and they are standing down in other crucial marginals where Labour are currently just about holding on - including the London seats of Brentford and Isleworth and Ealing Central and Acton.

Each of these seats typifies a contest where Green exits could make all the difference:

In Brighton Kemptown, the Conservatives lead Labour by 1.5 points. The Green vote share in 2015 was 7%. If Labour are successfully able to mobilise even a third of those Green voters to turn out and put a cross by Lloyd Russell-Moyle's box tomorrow, then Labour would take the seat. Even in the face of a significant Tory swing, Green voters would still be enough to take Labour over the line and knock off a seat from the Conservative column.

In Lewes, the Liberal Democrats are campaigning hard to take back the seat occupied by Norman Baker until the 2015 election. Then, the Conservatives took it by 2.1%, while the Greens achieved a vote share of 6.3%. So once again, if the Lib Dems are able to pick up those Green voters - whom the local party has specifically directed to vote for Kelly-Marie Blundell - then they may well be celebrating come Friday. The same is true of St Ives, another Con-Lib Dem marginal seat where the Greens achieved 5%+ in 2015 but are now standing down.

Finally, though in Brentford and Isleworth and Ealing Central and Acton the Greens racked up only around 3.5% of the popular vote, the Labour Party won these London battlegrounds in 2015 by the slimmest of margins - 1% and 0.5% respectively. As the Conservatives attempt to flip these seats - and others in the Midlands such as Newcastle-under-Lyme and Northern seats such as Halifax - they may well find that tactical Green voters provide an extra barrier which their candidates fail to breach.
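For the two challenger seats above, Brighton Kemptown and Lewes, the arithmetic can be sketched in Python. The margins and Green shares are those quoted in the text; the calculation assumes everything else stays as in 2015 and that transferred Green votes go only to the runner-up, which is a deliberate simplification, and the function name is hypothetical.

```python
def green_share_needed(margin_pts, green_share_pts):
    """Fraction of the 2015 Green vote the runner-up needs to close the gap."""
    return margin_pts / green_share_pts


# (2015 margin in points, 2015 Green vote share in points), from the text.
seats = {
    "Brighton Kemptown": (1.5, 7.0),
    "Lewes": (2.1, 6.3),
}

needed = {
    seat: round(green_share_needed(margin, green), 3)
    for seat, (margin, green) in seats.items()
}

for seat, fraction in needed.items():
    print(f"{seat}: needs about {fraction:.0%} of the 2015 Green vote to draw level")
```

On these numbers Labour need a little over a fifth of the 2015 Green vote to draw level in Brighton Kemptown, so a third would be enough to win, while the Liberal Democrats need about a third just to draw level in Lewes.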

Of course, in the face of a Conservative landslide, Green exits may well stop very little. But if the vote is indeed closer than the whitewash forecasted at the beginning of the campaign, tactical Green voters in marginal seats where their party has exited - and indeed where they have not - could make all the difference to the Conservative seat total.






Tuesday, 6 June 2017

Updated #GE2017 Forecast - Conservative Majority Increased to 46 (348 Seats)

Today's updates include adjustments to Scottish swings based on a new YouGov poll north of the border.

For the penultimate PME Politics forecast, a 3-day average of polls was taken and combined with the contextual model to produce the first increase in the predicted Conservative majority (compared with the last forecast) since early May.

The Uniform Swing (UNS) model calculated the following distribution of seats: Conservative (332), Labour (237), Lib Dems (5), UKIP (0), SNP (54), Plaid (3) and Greens (1). Again, this would constitute only a very modest increase in the 2015 Conservative majority, of just 2 seats.

The implied swing from the last three days of polling has both Labour and the Conservatives up by around 6.5 points (from 2015). In practice, this would be a swing of around about 3% towards Labour.

The 'polls plus' model produced the following projected result:


This moves the Conservatives up by 3 seats from yesterday's forecast, reflecting a slight increase in their polling average, and more favourable Scottish swings (namely the SNP moving down a touch).

In fact, the SNP's current projected decline is now steep enough to suggest that Labour will take back Renfrewshire East from the nationalist party, according to the 'polls plus' model. 

The first expansion of the Tory lead in just under a month can be seen on the prediction tracking graph below.


The 'polls plus' model moves the projected marginals up one seat into the Labour defence list, with the Conservatives predicted to take Wakefield by just a thread, based on the current polling and contextual factors applied in the model.



Once again though - health warning - this function of the forecast is very much exploratory, and indicates the kind of seats that we might expect to currently be the battlegrounds, rather than constituting a firm prediction of the outcomes there.


Monday, 5 June 2017

Updated #GE2017 Forecast - Conservative Majority Cut Again to Just 40 (345 Seats)

Today's forecast contains an additional regional swing differential for London, which polling suggests is moving in a much different direction to that of the rest of the country. 

Today's #GE2017 Forecast is the last to use a 5-day rolling uniform swing, with future daily forecasts using a 3-day rolling average as the campaign and polling picks up speed moving towards the final day. There will also be a PME Politics Forecast on 'election-day-eve' which will use polls from June 7th only.
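The difference between a 5-day and a 3-day rolling average can be sketched in Python as below. The poll figures are invented for illustration, and the simple unweighted mean over polls published within the window is an assumption on my part rather than a description of the exact averaging used in the forecast.

```python
from datetime import date, timedelta


def rolling_average(polls, as_of, window_days):
    """Mean Con and Lab share over polls published in the last `window_days` days."""
    start = as_of - timedelta(days=window_days - 1)
    recent = [p for p in polls if start <= p["date"] <= as_of]
    if not recent:
        raise ValueError("no polls in the window")
    n = len(recent)
    return (
        sum(p["con"] for p in recent) / n,
        sum(p["lab"] for p in recent) / n,
    )


# Hypothetical polls (vote shares in %), NOT real published figures.
polls = [
    {"date": date(2017, 6, 1), "con": 44.0, "lab": 36.0},
    {"date": date(2017, 6, 2), "con": 45.0, "lab": 35.0},
    {"date": date(2017, 6, 4), "con": 42.0, "lab": 38.0},
    {"date": date(2017, 6, 5), "con": 44.0, "lab": 36.0},
]

five_day = rolling_average(polls, date(2017, 6, 5), window_days=5)
three_day = rolling_average(polls, date(2017, 6, 5), window_days=3)

print("5-day average (Con, Lab):", five_day)
print("3-day average (Con, Lab):", three_day)
```

The shorter window drops the older polls, so it reacts faster to late movement at the cost of resting on fewer data points.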


A simple uniform swing model using polls published in the last 5 days produces the following result: Conservatives (332), Labour (238), Lib Dems (4), UKIP (0), SNP (54), Plaid (3), and Greens (1).

The averaging function of the most recent polls implies a Conservative swing of around +8.5, and a Labour swing of about +5%. This moves the result much closer together than that which the 6-point gap between the two parties produced in 2015.

An increase of just one seat and a majority of 14 for Theresa May would undoubtedly be seen as a very bad night.

The 'polls plus' model adds more seats to the Conservatives' tally, but today predicts the smallest Conservative majority to date of just 40 seats.


Even a majority of 40 (compared to 14) would not be considered a good night for the Tories, given a) their position in the polls at the start of the campaign, and b) that this snap election was 'called' with the precise aim of giving the Prime Minister a large and commanding majority as she headed into the Brexit negotiations.

The projection tracking graph shows the predicted gap between Britain's largest political parties continuing to narrow, and also highlights the model's early pick-up of this trend (in the first week of May).



Finally, the predicted marginals continue to move back towards the 2015 result, with the model projecting that the fault line between Labour and the Conservatives when the dust settles on June 9th will be between Wakefield and Birmingham Northfield.

Croydon Central, a tight Tory-Lab marginal, is predicted to move into the Labour column - Labour are doing well in London according to recent polling. The 'progressive alliance' is also expected to tip Brighton Kemptown into Labour hands, with the Greens standing down and backing the Labour candidate. Although the reality with these forecasts is that the result in the seats below is currently too close to call.