Chapter 9 1 1) Modeling the Number of Claims
9.1 a. (5 points) Create a table to present the distribution of the number of claims by computing the proportion of each possibility. What percent of the policies have 0 claims?
Plot_DF
## # A tibble: 4 x 4
## ClaimNb count etotal proportion
## <dbl> <int> <int> <dbl>
## 1 0 42685 44999 0.949
## 2 1 2224 44999 0.0494
## 3 2 84 44999 0.00187
## 4 3 6 44999 0.000133
The result shows that about 95% of the policies have 0 claims.
9.2 b. (6 points) Create a Poisson regression to predict the number of claims in the year using only the variables for vehicle age, driver age, bonusMalus, and density. What is the deviance? Does is differ significantly from the null deviance?
<- glm(ClaimNb ~ VehAge + DrivAge + BonusMalus + Density, family=poisson, Insurance)
Poisson_Mod summary(Poisson_Mod)
##
## Call:
## glm(formula = ClaimNb ~ VehAge + DrivAge + BonusMalus + Density,
## family = poisson, data = Insurance)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.1306 -0.3375 -0.3039 -0.2816 4.3804
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.9772852 0.1330830 -37.400 < 2e-16 ***
## VehAge 0.0005833 0.0035970 0.162 0.871
## DrivAge 0.0110918 0.0014819 7.485 7.17e-14 ***
## BonusMalus 0.0196450 0.0010601 18.532 < 2e-16 ***
## Density 0.0529343 0.0111655 4.741 2.13e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 14381 on 44998 degrees of freedom
## Residual deviance: 14049 on 44994 degrees of freedom
## AIC: 18744
##
## Number of Fisher Scoring iterations: 6
# S14
pchisq(summary(Poisson_Mod)$deviance,Poisson_Mod$df.residual, lower.tail = FALSE)
## [1] 1
The deviance is 14049. It is not statistically different from the null model.
9.3 c. (4 points) Generate predictions on the response scale µ and round them to the nearest count. Create a table as in part a and comment on how similar or dissimilar this result is.
<- predict(Poisson_Mod,type = "response")
Predicted_Means #X2 <- sum((Insurance$ClaimNb - Predicted_Means)ˆ2/Predicted_Means)
$Predicted_Class <- round(Predicted_Means,0) Insurance
#count(Insurance, Predicted_Class)
The result is not very similar. There are way more 0 then the actual data.
9.4 d. (3 points) Fit a negative binomial model to the data and generate predictions as in part c. Did this solve the problem?
#S26
<- MASS::glm.nb(ClaimNb ~ VehAge + DrivAge + BonusMalus + Density,Insurance)
NegBinom_Mod # Convert back to glm() style object
<- MASS::glm.convert(NegBinom_Mod)
NegBinom_Mod_glm
summary(NegBinom_Mod_glm, dispersion=1)
##
## Call:
## stats::glm(formula = ClaimNb ~ VehAge + DrivAge + BonusMalus +
## Density, data = Insurance, family = negative.binomial(theta = 2.89746020004541,
## link = log))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.0799 -0.3360 -0.3026 -0.2805 4.1113
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.9864644 0.1358338 -36.710 < 2e-16 ***
## VehAge 0.0005578 0.0036355 0.153 0.878
## DrivAge 0.0111508 0.0015060 7.404 1.32e-13 ***
## BonusMalus 0.0197329 0.0010930 18.053 < 2e-16 ***
## Density 0.0531295 0.0112863 4.707 2.51e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(2.8975) family taken to be 1)
##
## Null deviance: 13623 on 44998 degrees of freedom
## Residual deviance: 13298 on 44994 degrees of freedom
## AIC: 18738
##
## Number of Fisher Scoring iterations: 1
<- predict(NegBinom_Mod,type = "response")
Predicted_Means_NB
$Predicted_Class_NB <- round(Predicted_Means_NB,0)
Insurance
#count(Insurance, Predicted_Class_NB)
The negative Binomial result is quite similar to the Possion. Thus, it still has the issue of over-predicting 0s.
9.5 e. (5 points) Now fit a zero inflation Poisson model (ZIP) and compute the fitted proportion of zero counts as on slide 38 of the Poisson Notes. Does this help?
#S34
#install.packages(pscl)
library(pscl)
## Classes and Methods for R developed in the
## Political Science Computational Laboratory
## Department of Political Science
## Stanford University
## Simon Jackman
## hurdle and zeroinfl functions by Achim Zeileis
<- zeroinfl(ClaimNb ~ VehAge + DrivAge + BonusMalus + Density,Insurance)
ZIP_Mod summary(ZIP_Mod)
##
## Call:
## zeroinfl(formula = ClaimNb ~ VehAge + DrivAge + BonusMalus + Density,
## data = Insurance)
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -0.7772 -0.2379 -0.2141 -0.1965 14.0577
##
## Count model coefficients (poisson with log link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.350206 0.244515 -17.791 < 2e-16 ***
## VehAge 0.055968 0.009113 6.141 8.17e-10 ***
## DrivAge 0.006835 0.002815 2.428 0.0152 *
## BonusMalus 0.015519 0.001764 8.796 < 2e-16 ***
## Density 0.011674 0.019009 0.614 0.5391
##
## Zero-inflation model coefficients (binomial with logit link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.24320 0.90679 0.268 0.78855
## VehAge 0.20267 0.02203 9.198 < 2e-16 ***
## DrivAge -0.02088 0.01168 -1.789 0.07368 .
## BonusMalus -0.01744 0.00691 -2.524 0.01160 *
## Density -0.17969 0.06359 -2.826 0.00471 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of iterations in BFGS optimization: 33
## Log-likelihood: -9333 on 10 Df
#S38
<- predict(ZIP_Mod,type="zero")
Pred_zero <- predict(ZIP_Mod,type="count")
Pred_Count <- Pred_zero + (1-Pred_zero)*exp(-Pred_Count)
Total_Prob0 mean(Total_Prob0)
## [1] 0.9487571
#count(Pred_zero)
Now it helps. The proportion of 0 is close to the actual data. Moreover, there are more other count (i.e., >0) as well.
9.6 f. (8 points) Interpret the signs of the coefficient estimates out of the ZIP model, both the count portion as well as the zero inflated portion!
data.frame(exp.coef = exp(coef(ZIP_Mod)))
## exp.coef
## count_(Intercept) 0.01290416
## count_VehAge 1.05756403
## count_DrivAge 1.00685853
## count_BonusMalus 1.01563970
## count_Density 1.01174223
## zero_(Intercept) 1.27532174
## zero_VehAge 1.22467152
## zero_DrivAge 0.97933313
## zero_BonusMalus 0.98271136
## zero_Density 0.83552783
#??? why it cannot run as S36
The higher the driver age and bonus miles and density reduce the likelihood of fire a claim. The increase in vehicle age increases the likelihood to fire a claim.
The higher the driver age and bonus miles and density increases the number of claims being fired once they fire for a claim. The vehicle age also increases the number of claim to be fired.