1. Final Prediction Post

Mena Solomon

2024/10/27

How to predict an election outcome?

Over the past seven weeks, I have been working to build a model which can effectively predict the outcome of the 2024 presidential election. Now, with a little over a week until election day, it is time to utilize my knowledge to produce a comprehensive election forecast.

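My final model will include five predictive variables: the latest poll averages, Q2 GDP growth, incumbency, and two-party vote share lagged one and two election cycles.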

Significantly, this model does not include the campaign variables covered in week 6 and week 7. There are three primary reasons for this choice. First, as political scholars point out, the election can often be predicted on fundamentals alone due to a tug-of-war effect wherein each candidate, campaigning at a similar volume, cancels out the effect of their opponent’s campaign. Second, due to the ever-changing and increasingly dynamic nature of campaigning, there is little historical data to incorporate into the model. Limited data tends to make estimates unreliable, which would inhibit my understanding of the predictive power of the variables listed above. Third, over the past month, Kamala Harris raised over 1 billion dollars in donations (Wall Street Journal, 2024). Campaign spending and mobilization have reached unprecedented levels, calling into question the predictive power of campaign variables.

This model is also trained on data beginning in 1972 so as to include the maximum number of election cycles after the Civil Rights Act, when each party’s ideology became more consistent.

Training a regression model to predict Democratic two-party vote share

## 
## =================================================================================
##                                                         Dependent variable:      
##                                                   -------------------------------
##                                                   Democratic Two-Party Vote Share
## ---------------------------------------------------------------------------------
## Latest Democratic Poll Averages                          0.695*** (0.027)        
## Q2 GDP Growth                                            0.138*** (0.017)        
## Incumbency                                               -3.112*** (0.410)       
## Democratic Two-Party Vote Share Lagged One Cycle         0.526*** (0.033)        
## Democratic Two-Party Vote Share Lagged Two Cycles        -0.190*** (0.025)       
## Constant                                                 3.690*** (0.939)        
## ---------------------------------------------------------------------------------
## Observations                                                    559              
## R2                                                             0.836             
## Adjusted R2                                                    0.834             
## Residual Std. Error                                      3.741 (df = 553)        
## F Statistic                                          563.175*** (df = 5; 553)    
## =================================================================================
## Note:                                                 *p<0.1; **p<0.05; ***p<0.01

This model, with an adjusted R^2 of 0.834, explains roughly 83% of the variance in Democratic two-party vote share across state-level elections since 1972. The asterisks in the table indicate that each of the five variables is statistically significant at the 0.01 level. Each coefficient is also of meaningful magnitude: it represents the change, in percentage points, in predicted Democratic vote share associated with a one-unit increase in that variable, holding the others fixed.
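To make the coefficient interpretation concrete, the sketch below plugs the Democratic coefficients from the table above into the linear equation. The input values are hypothetical placeholders chosen only for illustration (they are not the 2024 data), and the 0/1 coding of incumbency is an assumption, since the post does not state how that variable is encoded.

```python
# Coefficients copied from the Democratic regression table above.
DEM_COEFS = {
    "constant": 3.690,
    "poll_avg": 0.695,
    "q2_gdp_growth": 0.138,
    "incumbency": -3.112,
    "vote_share_lag1": 0.526,
    "vote_share_lag2": -0.190,
}


def predict_vote_share(coefs, inputs):
    """Return constant + sum(coefficient * value) for a linear model."""
    total = coefs["constant"]
    for name, value in inputs.items():
        total += coefs[name] * value
    return total


# Hypothetical inputs for one state (illustrative only, not real 2024 data).
example_state = {
    "poll_avg": 48.0,
    "q2_gdp_growth": 3.0,
    "incumbency": 1,  # assumed 0/1 coding
    "vote_share_lag1": 50.0,
    "vote_share_lag2": 49.0,
}

baseline = predict_vote_share(DEM_COEFS, example_state)

# Raise the poll average by one point, holding everything else fixed.
bumped = predict_vote_share(DEM_COEFS, {**example_state, "poll_avg": 49.0})

print(round(baseline, 3))
print(round(bumped - baseline, 3))  # the change equals the poll coefficient
```

The difference between the two predictions is exactly the poll-average coefficient, which is what "a one-unit increase" means in a linear model.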

Training a regression model to predict Republican two-party vote share

## 
## =================================================================================
##                                                         Dependent variable:      
##                                                   -------------------------------
##                                                   Republican Two-Party Vote Share
## ---------------------------------------------------------------------------------
## Latest Republican Poll Averages                          0.584*** (0.023)        
## Q2 GDP Growth                                            -0.040** (0.017)        
## Incumbency                                               -3.725*** (0.407)       
## Republican Two-Party Vote Share Lagged One Cycle         0.444*** (0.035)        
## Republican Two-Party Vote Share Lagged Two Cycles        0.076*** (0.026)        
## Constant                                                   0.578 (1.093)         
## ---------------------------------------------------------------------------------
## Observations                                                    559              
## R2                                                             0.833             
## Adjusted R2                                                    0.832             
## Residual Std. Error                                      3.770 (df = 553)        
## F Statistic                                          553.132*** (df = 5; 553)    
## =================================================================================
## Note:                                                 *p<0.1; **p<0.05; ***p<0.01

This model, with an adjusted R^2 of 0.832, explains roughly 83% of the variance in Republican two-party vote share across state-level elections since 1972. The asterisks in the table indicate that each of the five variables is statistically significant at the 0.01 level, except Q2 GDP growth, which is significant at the 0.05 level. Each coefficient represents the change, in percentage points, in predicted Republican vote share associated with a one-unit increase in that variable, holding the others fixed.
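As a quick check on the fit statistics, the sketch below recomputes adjusted R^2 from the R^2, observation count, and number of predictors reported in the two tables. Because the tables round R^2 to three decimals, the recomputed values should only be expected to match the reported ones to within about 0.001.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Values read from the two regression tables above (559 observations, 5 predictors).
dem_adj = adjusted_r2(0.836, 559, 5)
rep_adj = adjusted_r2(0.833, 559, 5)

# Share of variance each model leaves unexplained.
print(round(1 - dem_adj, 3), round(1 - rep_adj, 3))
```

Both models leave roughly 17% of the variance unexplained.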

The similarity between the two regression models suggests that the chosen variables have consistent predictive power for both parties.

Utilizing a regularized regression to eliminate collinearity

Because these variables are often highly correlated not only with my chosen outcome variable, two-party vote share, but also with one another, the model carries a high risk of collinearity. Collinearity can make coefficient estimates unstable, so I chose to use an elastic-net regularized regression, which shrinks the coefficients toward zero and thereby reduces the model’s sensitivity to correlated predictors. Cross-validation was used to determine the model’s best lambda value.
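The post does not show its fitting code, so the following is only a minimal sketch of how an elastic net can be fit by coordinate descent. The penalty form, the function names, and the toy data are all assumptions for illustration. The penalty follows the common parameterization lam * (alpha * |b|_1 + (1 - alpha) / 2 * |b|_2^2).

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0 if it lands inside."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X, y, lam, alpha=0.5, n_iter=500):
    """Coordinate descent for
    (1/2n) * sum((y - b0 - X b)^2) + lam * (alpha * |b|_1 + (1 - alpha)/2 * |b|_2^2).
    Returns (intercept, coefficients)."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Center predictors and outcome so the intercept can be recovered afterwards.
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of predictor j with the residual that ignores predictor j.
            rho = 0.0
            for i in range(n):
                partial = sum(Xc[i][k] * beta[k] for k in range(p) if k != j)
                rho += Xc[i][j] * (yc[i] - partial)
            rho /= n
            scale = sum(Xc[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam * alpha) / (scale + lam * (1 - alpha))
    intercept = y_mean - sum(beta[j] * x_means[j] for j in range(p))
    return intercept, beta


# Toy data (made up): two correlated predictors and an exactly linear outcome.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]]
y = [1.0 + 2.0 * a + 0.5 * b for a, b in X]

_, unpenalized = fit_elastic_net(X, y, lam=0.0)
_, penalized = fit_elastic_net(X, y, lam=1.0)
print(unpenalized, penalized)
```

With lam set to zero the fit reduces to ordinary least squares; with a positive lam the coefficients have a smaller overall magnitude, and a large enough lam sets them all to zero. In practice lam would be chosen by cross-validation, as described above.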

To test the out-of-sample predictive power of the elastic-net models, I ran a k-fold cross-validation, the results of which are included here:

Table 1: Out-of-Sample Error Summary for Democratic and Republican Predictions

| Party | Mean Error | Standard Deviation |
|---|---|---|
| Democratic | 0.0520180 | 5.800855 |
| Republican | -0.1684959 | 7.165806 |

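The mean error is close to zero for both parties, which indicates that neither model systematically over- or under-predicts out of sample and increases my confidence in both. The standard deviations of roughly 6 to 7 points, however, mean that individual state predictions can still miss by a wide margin.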
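The cross-validation procedure summarized in Table 1 can be sketched as follows. This is an illustration under stated assumptions, not the post's actual code: it uses a one-variable least-squares fit as a stand-in for the elastic net, synthetic data in place of the real polls and vote shares, and defines error as predicted minus actual, since the post does not state its sign convention.

```python
import random
import statistics


def fit_simple_ols(xs, ys):
    """Least-squares slope and intercept for a single predictor."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def k_fold_errors(xs, ys, k=5, seed=0):
    """Return out-of-sample errors (predicted minus actual) across k folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        slope, intercept = fit_simple_ols(
            [xs[i] for i in train], [ys[i] for i in train]
        )
        # Score only the observations the model did not see during fitting.
        errors.extend(intercept + slope * xs[i] - ys[i] for i in held_out)
    return errors


# Synthetic data (made up): the outcome tracks the predictor plus random noise.
rng = random.Random(1)
polls = [rng.uniform(35, 65) for _ in range(100)]
votes = [5 + 0.9 * p + rng.gauss(0, 3) for p in polls]

errors = k_fold_errors(polls, votes)
print(statistics.fmean(errors), statistics.stdev(errors))
```

The mean of the held-out errors measures systematic bias, while their standard deviation measures how far a typical prediction lands from the actual result.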

Predicting the 2024 election

As I have done in the previous three weeks, I will predict the seven states that expert forecasters like Cook and Sabato rate as toss-ups in the upcoming election: Arizona, Nevada, Michigan, Wisconsin, North Carolina, Georgia, and Pennsylvania. Using the elastic-net regularized regression described above, with its five predictive variables, I calculated both Democratic and Republican two-party vote share for each state.

2024 Election Predictions:

| State | Democratic Two-Party Vote Share | Winner |
|---|---|---|
| Arizona | 51.86191 | Harris |
| Georgia | 52.10763 | Harris |
| Michigan | 52.79416 | Harris |
| Nevada | 52.38088 | Harris |
| North Carolina | 51.74808 | Harris |
| Pennsylvania | 52.51349 | Harris |
| Wisconsin | 52.56357 | Harris |

| State | Republican Two-Party Vote Share | Winner |
|---|---|---|
| Arizona | 53.26769 | Trump |
| Georgia | 53.37372 | Trump |
| Michigan | 51.74763 | Trump |
| Nevada | 51.63499 | Trump |
| North Carolina | 53.46686 | Trump |
| Pennsylvania | 52.28012 | Trump |
| Wisconsin | 52.39460 | Trump |

As the two tables show, there is an apparent error: because the Democratic and Republican models are fit separately, nothing forces their predictions to sum to 100%, and here each state's two-party vote shares sum to around 105%. This discrepancy does not shift when any single variable is removed, which suggests that it stems from an anomaly in the data rather than from any one predictor. To account for this error, my final result normalizes the results above so that the two shares in each state sum to 100%.
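The post does not spell out the normalization rule. Dividing each party's prediction by the sum of the two reproduces the figures in the normalized table, so the sketch below assumes that rule; the function and variable names are illustrative.

```python
# Raw predictions copied from the two tables above:
# (Democratic, Republican) two-party vote share, in percent.
raw = {
    "Arizona": (51.86191, 53.26769),
    "Georgia": (52.10763, 53.37372),
    "Michigan": (52.79416, 51.74763),
    "Nevada": (52.38088, 51.63499),
    "North Carolina": (51.74808, 53.46686),
    "Pennsylvania": (52.51349, 52.28012),
    "Wisconsin": (52.56357, 52.39460),
}


def normalize(dem, rep):
    """Rescale two shares so that they sum to exactly 100."""
    total = dem + rep
    return 100 * dem / total, 100 * rep / total


normalized = {state: normalize(dem, rep) for state, (dem, rep) in raw.items()}

for state, (dem, rep) in normalized.items():
    winner = "Harris" if dem > rep else "Trump"
    print(f"{state}: {dem:.5f} {rep:.5f} {winner}")
```

Normalizing preserves the ratio between the two raw predictions, so the predicted winner in each state is whichever party had the larger raw share.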

Normalized 2024 election prediction:

| State | Democratic Prediction | Republican Prediction | Winner |
|---|---|---|---|
| Arizona | 49.33141 | 50.66859 | Trump |
| Georgia | 49.39985 | 50.60015 | Trump |
| Michigan | 50.50053 | 49.49947 | Harris |
| Nevada | 50.35854 | 49.64146 | Harris |
| North Carolina | 49.18320 | 50.81680 | Trump |
| Pennsylvania | 50.11135 | 49.88865 | Harris |
| Wisconsin | 50.08049 | 49.91951 | Harris |

After normalizing the results, the model predicts a split among the swing states: Harris narrowly carries Michigan, Nevada, Pennsylvania, and Wisconsin, while Trump carries Arizona, Georgia, and North Carolina. If every other state votes as expected, this corresponds to 276 electoral votes for Harris and 262 for Trump. Every predicted margin is under two percentage points, however, and the confidence intervals (not shown above) include both outcomes, re-emphasizing that this year’s election will be decided within an incredibly slim margin.

Notes

All code used to produce this post is accessible via GitHub.

Data Sources

US Presidential Election Popular Vote Data from 1948-2020 provided by the course. Economic data from the U.S. Bureau of Economic Analysis, also provided by the course. Polling data sourced from FiveThirtyEight.