Chapter 10 STAT 4520/7520 - Homework 4 Spring 2022 Due: April 25, 2022
10.1 1)
The BodyWeight dataset in the nlme package contains data on the body weights of rats measured over 64 days. The body weights of the rats (in grams) are measured on day 1 and every seven days thereafter until day 64, with an extra measurement on day 44. The experiment started several weeks before “day 1.” There are three groups of rats, each on a different diet. It contains the variables: • weight: The body weight of the rat in grams. • Time: The time at which the measurement is made in number of days. • Rat: The rat whose weight is measured. • Diet: A factor with levels 1 to 3 indicating the diet that the rat receives. The data can be loaded with the following command.
data(BodyWeight,package = "nlme")
10.1.1 a. Make a plot of weight vs Time for the data with rat specific lines to show the subject specific trajectory. Describe your findings. (7510 only) Further, color the lines differently for the different diets. Do you notice anything further?
#S7
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
ggplot(BodyWeight, aes(x=Time,y=weight))+
geom_line()+ facet_wrap(~Rat)
# color the lines differently for different diet.
library(dplyr)
library(ggplot2)
ggplot(BodyWeight, aes(x=Time,y=weight, color = Diet))+
geom_line()+ facet_wrap(~Rat)
10.1.2 b. Use lm() to regress the weight onto the Time. What is the slope and intercept of this model? Is the slope significant?
<- lm(weight ~ Time, BodyWeight)
lm_model summary(lm_model)
##
## Call:
## lm(formula = weight ~ Time, data = BodyWeight)
##
## Residuals:
## Min 1Q Median 3Q Max
## -157.32 -119.54 -25.42 125.75 225.68
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 364.8359 19.0841 19.12 <2e-16 ***
## Time 0.5857 0.4921 1.19 0.236
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 127 on 174 degrees of freedom
## Multiple R-squared: 0.008074, Adjusted R-squared: 0.002373
## F-statistic: 1.416 on 1 and 174 DF, p-value: 0.2356
The slope is 0.59, the intercept is 365. It means the average weight at the begining is 365.Every unit of increase in time increases the weight by 0.59. However, the slope is not significant.
10.1.3 c. Use the lmList function to fit a linear regression model that describes how weight varies with days for each individual rat. How do the slopes and intercepts vary? Do you notice any patterns with respect to diets?
#how is c diff from d??? random intercept model vs. fixed effect here?
# how is lmList (fit a series simple regression within the group) vs. lmer diff
library(lme4)
## Loading required package: Matrix
<- lmList(weight ~ I(Time-1) | Rat, BodyWeight)
ml sapply(ml,coef)[,1:16] |> as.matrix()
## 2 3 4 1 8
## (Intercept) 226.884778 247.2276043 255.8208779 245.3256606 252.1534833
## I(Time - 1) 0.330356 0.3980345 0.3295261 0.4844071 0.4086919
## 5 6 7 11 9
## (Intercept) 256.2473029 264.3630487 268.0675693 443.0196549 407.925879
## I(Time - 1) 0.4058091 0.3184538 0.2018345 0.3625246 1.010657
## 10 12 13 15 14
## (Intercept) 409.080542 553.10439 463.0154619 524.1315571 526.0279100
## I(Time - 1) 1.341101 1.14763 0.9185193 0.4931644 0.3147849
## 16
## (Intercept) 504.3502948
## I(Time - 1) 0.9054379
The intercepts and slopes vary a lot across the rats. Only Diet 2 (Rat 9-12) seems to have a positive slope.
10.1.4 d. Fit a random intercept model to account for the different weights of each individual rat and continue to use Time as a fixed effect. Answer the following:
• What proportion of the variation is explained by Rat? • Interpret the fixed effect intercept and comment on it’s value in the context of the variance component for rat. • Test for significance of the Time effect. What do you find? Is the conclusion different from part b?
library(lmerTest)
##
## Attaching package: 'lmerTest'
## The following object is masked from 'package:lme4':
##
## lmer
## The following object is masked from 'package:stats':
##
## step
$Time_s <- BodyWeight$Time-1
BodyWeight<- lmer(weight ~ Time_s + (1| Rat), data = BodyWeight)
ml_intercept
summary(ml_intercept)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: weight ~ Time_s + (1 | Rat)
## Data: BodyWeight
##
## REML criterion at convergence: 1360.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.5029 -0.5458 -0.0394 0.5608 3.1139
##
## Random effects:
## Groups Name Variance Std.Dev.
## Rat (Intercept) 16940.80 130.157
## Residual 66.85 8.176
## Number of obs: 176, groups: Rat, 16
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 365.42163 32.56138 15.03013 11.22 1.05e-08 ***
## Time_s 0.58568 0.03168 159.00000 18.49 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Time_s -0.032
#??? why it says null for the Rat?
19640/ (19640 + 66.85)
## [1] 0.9966078
sqrt(19640)
## [1] 140.1428
confint(ml_intercept, method="boot", oldNames = FALSE)
## Computing bootstrap confidence intervals ...
## 2.5 % 97.5 %
## sd_(Intercept)|Rat 79.6436591 178.6157691
## sigma 7.3076618 9.2588507
## (Intercept) 292.8748923 429.2448963
## Time_s 0.5197526 0.6519881
The % of variance explained by the rats is 99% The fixed effect is 365 or the average weight of the rat is 365 in the sample. However, the random effect standard deviation is 140 or almost half of the body weight. The Time effect is significant at 99% confidence level. The boostrap tests show no 0 is contained in the interval.
10.1.5 e. Obtain the BLUPs for the rats and list who is the most heavy.
# Get the blup
predict(ml_intercept)
## 1 2 3 4 5 6 7 8
## 242.0738 246.1736 250.2734 254.3732 258.4730 262.5727 266.6725 267.2582
## 9 10 11 12 13 14 15 16
## 270.7723 274.8721 278.9719 218.6277 222.7275 226.8273 230.9270 235.0268
## 17 18 19 20 21 22 23 24
## 239.1266 243.2264 243.8121 247.3262 251.4260 255.5257 241.1651 245.2649
## 25 26 27 28 29 30 31 32
## 249.3646 253.4644 257.5642 261.6640 265.7638 266.3494 269.8635 273.9633
## 33 34 35 36 37 38 39 40
## 278.0631 247.5264 251.6262 255.7260 259.8258 263.9256 268.0253 272.1251
## 41 42 43 44 45 46 47 48
## 272.7108 276.2249 280.3247 284.4245 250.4345 254.5343 258.6340 262.7338
## 49 50 51 52 53 54 55 56
## 266.8336 270.9334 275.0332 275.6189 279.1330 283.2327 287.3325 255.7053
## 57 58 59 60 61 62 63 64
## 259.8051 263.9049 268.0047 272.1044 276.2042 280.3040 280.8897 284.4038
## 65 66 67 68 69 70 71 72
## 288.5036 292.6034 255.6144 259.7142 263.8140 267.9138 272.0136 276.1133
## 73 74 75 76 77 78 79 80
## 280.2131 280.7988 284.3129 288.4127 292.5125 246.4359 250.5357 254.6355
## 81 82 83 84 85 86 87 88
## 258.7353 262.8350 266.9348 271.0346 271.6203 275.1344 279.2342 283.3340
## 89 90 91 92 93 94 95 96
## 421.7366 425.8364 429.9362 434.0360 438.1358 442.2356 446.3353 446.9210
## 97 98 99 100 101 102 103 104
## 450.4351 454.5349 458.6347 433.6415 437.7413 441.8410 445.9408 450.0406
## 105 106 107 108 109 110 111 112
## 454.1404 458.2402 458.8259 462.3400 466.4397 470.5395 435.7316 439.8314
## 113 114 115 116 117 118 119 120
## 443.9312 448.0310 452.1308 456.2305 460.3303 460.9160 464.4301 468.5299
## 121 122 123 124 125 126 127 128
## 472.6297 571.3193 575.4191 579.5189 583.6187 587.7185 591.8183 595.9180
## 129 130 131 132 133 134 135 136
## 596.5037 600.0178 604.1176 608.2174 473.8089 477.9087 482.0084 486.1082
## 137 138 139 140 141 142 143 144
## 490.2080 494.3078 498.4076 498.9933 502.5074 506.6071 510.7069 517.1570
## 145 146 147 148 149 150 151 152
## 521.2567 525.3565 529.4563 533.5561 537.6559 541.7557 542.3413 545.8554
## 153 154 155 156 157 158 159 160
## 549.9552 554.0550 521.0647 525.1644 529.2642 533.3640 537.4638 541.5636
## 161 162 163 164 165 166 167 168
## 545.6633 546.2490 549.7631 553.8629 557.9627 514.7033 518.8031 522.9029
## 169 170 171 172 173 174 175 176
## 527.0026 531.1024 535.2022 539.3020 539.8877 543.4018 547.5016 551.6013
max(predict(ml_intercept))
## [1] 608.2174
The rat 132 is heavist with a weight of 608.
10.1.6 f. Fit a mixed effects model that describes how the weight varies linearly with time and allows for random variation in the intercepts and slopes of the rats. Then do the following:
Note: Convergence of this model may be an issue. You can use the argument control=lmerControl(optimizer=“Nelder_Mead”) in lmer() to use a different optimizer.
#??? why need a diff optimizer to converge? what's diff?
<- lmer(weight ~ Time_s + (Time_s| Rat), data = BodyWeight, control=lmerControl(optimizer="Nelder_Mead"))
ml_mix summary(ml_mix)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: weight ~ Time_s + (Time_s | Rat)
## Data: BodyWeight
## Control: lmerControl(optimizer = "Nelder_Mead")
##
## REML criterion at convergence: 1208.4
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2658 -0.4256 0.0711 0.5871 2.7485
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Rat (Intercept) 15246.84 123.4781
## Time_s 0.12 0.3463 0.56
## Residual 19.75 4.4436
## Number of obs: 176, groups: Rat, 16
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 365.42163 30.87642 15.00001 11.835 5.22e-09 ***
## Time_s 0.58568 0.08828 15.00001 6.634 7.95e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Time_s 0.550
confint(ml_mix, method="boot", oldNames = FALSE)
## Computing bootstrap confidence intervals ...
##
## 2 warning(s): Model failed to converge with max|grad| = 0.00376307 (tol = 0.002, component 1) (and others)
## 2.5 % 97.5 %
## sd_(Intercept)|Rat 82.8559068 171.3197209
## cor_Time_s.(Intercept)|Rat 0.1279669 0.8435561
## sd_Time_s|Rat 0.2129623 0.4749755
## sigma 3.9224008 4.9165953
## (Intercept) 311.7805567 417.4600562
## Time_s 0.3832744 0.7455524
• Compute confidence intervals for the random effects, do the slopes and intercepts seem to vary between subjects?
The slopes and intercepts are both significant at 95% confidence level. Thus they both vary between subjects.
• Interpret the standard deviation of the slope random effect in the context of the fixed effect slope estimate. Does it seem large or small?
0.3463 / 0.58568
## [1] 0.5912785
The random effect of the slope is about 60% of the fixed effect, thus it seems large.
• Test for significance of the Time fixed effect again, and compare the result to parts b and d.
The time fixed effect is significant with an estimate of 0.59 and the t value of 6.63.
In b is: the estimate is the same of 0.59 but it was not significant with the time value of 1.19.
In d is: the estimate is the same of 0.59 but it was more significant at the t value of 18.49
In summary, the (f) result fits the data better. In b without considering the rat’s random effect, it contains too much noise. In d, without the random slope, too much variations were attributed to the intercept.
10.1.7 g. Obtain the BLUPs for the model in part f. Which rat has the fastest rate of increase? The slowest?
Compute the values of their specific slopes using the fixed effect estimate for time and their BLUP value. Print out the data for these two rats and comment on their minimum and maximum weights over the course of the study.
#predict(ml_mix)
ranef(ml_mix)
## $Rat
## (Intercept) Time_s
## 2 -138.54609 -0.2545799
## 3 -118.14461 -0.1887413
## 4 -109.71085 -0.2524615
## 1 -119.87610 -0.1075102
## 8 -113.21541 -0.1782011
## 5 -109.14111 -0.1805082
## 6 -101.21900 -0.2620429
## 7 -97.74852 -0.3716064
## 11 76.90824 -0.2025888
## 9 43.16497 0.4048760
## 10 44.94282 0.7164946
## 12 188.10531 0.5485514
## 13 97.89066 0.3235416
## 15 157.98977 -0.0713060
## 14 159.54107 -0.2392710
## 16 139.05884 0.3153536
##
## with conditional variances for "Rat"
summary(ml_mix) #need to combine with the gloabl slope of 0.58 to get the actual slope.
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: weight ~ Time_s + (Time_s | Rat)
## Data: BodyWeight
## Control: lmerControl(optimizer = "Nelder_Mead")
##
## REML criterion at convergence: 1208.4
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2658 -0.4256 0.0711 0.5871 2.7485
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Rat (Intercept) 15246.84 123.4781
## Time_s 0.12 0.3463 0.56
## Residual 19.75 4.4436
## Number of obs: 176, groups: Rat, 16
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 365.42163 30.87642 15.00001 11.835 5.22e-09 ***
## Time_s 0.58568 0.08828 15.00001 6.634 7.95e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Time_s 0.550
#!!! how to seperate the effects for time growth?
10.1.8 h. Add the (fixed) diet variable to the model in the form of an interaction with time. Does the diet effect weight? Is this effect different at different times?
<- lmer(weight ~ Time_s*Diet + (Time_s| Rat), data = BodyWeight, control=lmerControl(optimizer="Nelder_Mead"))
ml_mix_int #allow rat specific slope to vary. if I combine with another, no need to do both. The question only asks for two fixed effect interaction.
#Time_s and Rat are interacted. depends on the rat.
summary(ml_mix_int)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: weight ~ Time_s * Diet + (Time_s | Rat)
## Data: BodyWeight
## Control: lmerControl(optimizer = "Nelder_Mead")
##
## REML criterion at convergence: 1151.7
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2556 -0.4220 0.0823 0.5993 2.7899
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Rat (Intercept) 1.362e+03 36.9028
## Time_s 6.171e-02 0.2484 -0.14
## Residual 1.975e+01 4.4436
## Number of obs: 176, groups: Rat, 16
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 252.01129 13.07975 13.00001 19.267 6.08e-11 ***
## Time_s 0.35964 0.09114 12.99997 3.946 0.00167 **
## Diet2 201.27133 22.65479 13.00001 8.884 6.97e-07 ***
## Diet3 252.37002 22.65479 13.00001 11.140 5.07e-08 ***
## Time_s:Diet2 0.60584 0.15786 12.99997 3.838 0.00205 **
## Time_s:Diet3 0.29834 0.15786 12.99997 1.890 0.08127 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) Time_s Diet2 Diet3 Tm_:D2
## Time_s -0.153
## Diet2 -0.577 0.088
## Diet3 -0.577 0.088 0.333
## Time_s:Dit2 0.088 -0.577 -0.153 -0.051
## Time_s:Dit3 0.088 -0.577 -0.051 -0.153 0.333
#!!!do I need to interact when the Rat is the predictors for random slopes?
The diet affects the wieght. Both Diet2 and Diet3 have positive and significant fixed efect coefficients. Thus, they are more effective than Diet1 in gaining weight. The Diet2 interact with the Time. As the time increases, the Diet2 is more effective in gaining the weight.
10.2 2) Time Series
Recall the Johnson & Johnson data from course notes, which contains the quarterly earnings per Johnson & Johnson share from 1960 to 1980. We can read this in and convert it to a more familiar dataframe format with the following code.
#Why read as vector in the dataframe -> final step add the trend back. Make the format as.vector -> strip out the meta data.
data(JohnsonJohnson)
<- data.frame(EPS= as.vector(JohnsonJohnson),
JJ Time= as.vector(time(JohnsonJohnson))) #matrix of year and quarter if without the vector. 1960 will be q1 -> 1960.25 vs. four columns for differetn period of the year
10.2.1 a. Create a variable containing the log (base 10) of the EPS, then use lm() to regress it onto Time. Store the residuals of this regression in another variable in JJ. Make a plot of the residuals vs Time and describe your findings.
$EPS_lg = log(JJ$EPS) JJ
<- lm(EPS_lg ~ Time, JJ)
lm_model_ts summary(lm_model_ts)
##
## Call:
## lm(formula = EPS_lg ~ Time, data = JJ)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.38309 -0.08569 0.00297 0.09984 0.38016
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.275e+02 5.623e+00 -58.25 <2e-16 ***
## Time 1.668e-01 2.854e-03 58.45 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1585 on 82 degrees of freedom
## Multiple R-squared: 0.9766, Adjusted R-squared: 0.9763
## F-statistic: 3416 on 1 and 82 DF, p-value: < 2.2e-16
<- resid(lm(EPS_lg ~ Time, JJ)) lm_ts_resids
acf(lm_ts_resids)
#weakly dependent process: The correlation is with which period? why does it keep decreasing? since it is further away?
There seems to be a seasonal cycle of every 4th periods.
10.2.2 b. Run the detrended values through auto.arima() to find a good ARMA model. Note that you may want to use the additional argument stepwise = FALSE to allow auto.arima() to conduct a full grid search. What model was found?
#arima(lm_ts_resids, order = c(1, 0, 1))
library(forecast)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
auto.arima(lm_ts_resids,stepwise = FALSE)
## Series: lm_ts_resids
## ARIMA(4,0,0) with zero mean
##
## Coefficients:
## ar1 ar2 ar3 ar4
## 0.0454 0.0386 -0.0826 0.8525
## s.e. 0.0601 0.0555 0.0613 0.0582
##
## sigma^2 = 0.008767: log likelihood = 79.07
## AIC=-148.13 AICc=-147.36 BIC=-135.98
The model is ARIMA(4,0,0)
10.2.3 c. Use the best model to forecast the detrended relationship 2 years (8 quarters) ahead. Make a plot of the result and comment on how reasonable your predictions look.
#??? why is the JJ not need time() function?
<- diff(JJ$Time)[1] # diff func gives the difference of adjacnet value. diff 1,2,3 -> 1,1. take the first element.
dt <- max(JJ$Time) + (1:8)*dt # Time points we will predict at
time_steps <- arima(lm_ts_resids, order = c(4,0,0)) #!!! need to be on the residual. NOT in the JJ.
ARMA_p1q1 <- predict( ARMA_p1q1, n.ahead=8 )$pred
ARMA_p1q1_pred plot(JJ$Time,lm_ts_resids,xlim = c(1970, 1990),type = 'l') #How to specify exactly how many periods here? the Time unit is off??? #no plot.ts stripe out the time function -> only if passing time-seris object. generic func may recognize it is time series.
lines(x = time_steps, y = ARMA_p1q1_pred, col="red")
ARMA_p1q1_pred
## Time Series:
## Start = 85
## End = 92
## Frequency = 1
## [1] 0.05160930 -0.06661295 0.01437544 -0.33250908 0.03520765 -0.06894246
## [7] 0.03825592 -0.28704069
#time_steps
The prediction seems to reflect the seasonal fluctuation and a reasonable average, thus the prediction looks good.
10.2.4 d. Use the linear model from part a to predict the trend for the next 8 quarters, add the result to that of part c, transform back to the original scale, and plot the forecast along with the original data. Does the forecast seem reasonable?
<- diff(JJ$Time)[1] # differences between time points ??? What's the diff?
dt <- max(JJ$Time) + (1:8)*dt # Time points we will predict at
time_steps
#plot.ts(JJ[,3],xlim = c(1970, 1990)) #How to specify exactly how many periods here? the Time unit is off???
#lines(x = time_steps, y = ARMA_lm_pred, col="red")
<- lm(lm_ts_resids ~ Time, JJ)
lm_model_ts_resid <- predict(lm_model_ts_resid,n.ahead=8) #??? cannot predict the time series?
ARMA_lm_pred
<- data.frame(ARMA_lm_pred)
ARMA_lm_pred_df c(1:8),] #change it to the first 8 periods of prediction ARMA_lm_pred_df[
## [1] -2.416911e-17 -2.329483e-17 -2.242055e-17 -2.154627e-17 -2.067199e-17
## [6] -1.979771e-17 -1.892343e-17 -1.804915e-17
plot(JJ$Time,lm_ts_resids,xlim = c(1970, 1990),type = 'l')
lines(x = time_steps, y = ARMA_p1q1_pred, col="red")
lines(x = time_steps, y = ARMA_lm_pred_df[c(1:8),], col="blue")
The forcast doesn’t seem reasonable from the linear model. it is essentially a flat line that doesn’t reflect the seasonality.
#S25 !!!
set.seed(1); n <- 2000; eps_t <- rnorm(n, sd = 2); par(mfrow=c(1,1))
<- MA <- ARMA <- double(n)
AR 1:2] <- MA[1:2] <- ARMA[1:2] <- 0
AR[for(i in 3:n) AR[i] <- sum(c(.6, .2) * AR[(i-1):(i-2)]) + eps_t[i]
for(i in 3:n) MA[i] <- eps_t[i] + .8*eps_t[i-1]
for(i in 3:n) ARMA[i] <- sum(c(.6, .2) * AR[(i-1):(i-2)]) + eps_t[i] + .8*eps_t[i-1]
plot.ts(AR,ylim = c(-15,15)); lines(MA, col="red"); lines(ARMA, col="blue")
abline(h=0,lty=2)
10.3 3) Spatial Point Process Analysis
The mucosa datset in spatstat.data gives the locations of the centers of two types of cells in a cross-section of the gastric mucosa of a rat. For the purpose of this analysis, we will ignore the type of cell and focus on the locations of the cells only. The data can be loaded with the following command:
library(spatstat)
## Loading required package: spatstat.data
## Loading required package: spatstat.geom
## Registered S3 method overwritten by 'spatstat.geom':
## method from
## print.boxx cli
## spatstat.geom 2.4-0
## Loading required package: spatstat.random
## spatstat.random 2.2-0
## Loading required package: spatstat.core
## Loading required package: nlme
##
## Attaching package: 'nlme'
## The following object is masked _by_ '.GlobalEnv':
##
## BodyWeight
## The following object is masked from 'package:forecast':
##
## getResponse
## The following object is masked from 'package:lme4':
##
## lmList
## The following object is masked from 'package:dplyr':
##
## collapse
## Loading required package: rpart
## spatstat.core 2.4-2
## Loading required package: spatstat.linnet
## spatstat.linnet 2.3-2
##
## spatstat 2.3-4 (nickname: 'Watch this space')
## For an introduction to spatstat, type 'beginner'
data(mucosa, package = "spatstat.data")
10.3.1 a. Make a plot of the point pattern. Does the distribution of the points in space appear to be homogeneous Poisson process?
#S43
plot(mucosa, main=('mucosa'))
It doesn’t appear to be homogeneous poisson since the distribution is more dense in the bottom of the plot in comparison with the top.
10.3.2 b. Make plots of G(r) and F(r) and comment on what you observe.
<- envelope(mucosa, Gest, verbose=FALSE) #??? how to judge if poisson? or all count poisson? just not homo?
GCSR <- envelope(mucosa, Fest, verbose=FALSE)
FCSR plot(GCSR); plot(FCSR)
According to the plot, it is not a homogeneous poisson process as the black line is outside the boundary
10.3.3 c. Fit a point process model with linear terms in both x and y. Are there any significant spatial relationships?
#S62 - what's the z, why is it a func of x and y??? what's the weight on the distance here?
#ppm function to look for gradience -> intensitivyt of point process. how many in diff grids. lower value of y has more.
<- data.frame(mucosa)
dat library(ggplot2)
qplot(x, y, data=dat)
ppm(mucosa ~ x + y) #homogeneous poisson process , must have a point dataframe.note there is no DV. the first item is the dataset. not set as data =
## Nonstationary multitype Poisson process
##
## Possible marks: 'ECL' and 'other'
##
## Log intensity: ~x + y
##
## Fitted trend coefficients:
## (Intercept) x y
## 6.825371255 -0.007512035 -1.156055790
##
## Estimate S.E. CI95.lo CI95.hi Ztest Zval
## (Intercept) 6.825371255 0.08038734 6.6678150 6.9829275 *** 84.9060501
## x -0.007512035 0.11152004 -0.2260873 0.2110632 -0.0673604
## y -1.156055790 0.14068205 -1.4317875 -0.8803240 *** -8.2175075
#??? why the y coefficient is negative. how to interpret?
#pp.dists <- cbind(dat$x, dat$y) |> dist() |> as.matrix()
#pp.dists[1:3,1:3] #what's the distance here1:3!!! first three elements.
#pp.dists.inv <- 1/pp.dists
#diag(pp.dists.inv) <- 0
#ape::Moran.I(dat$x, pp.dists.inv) #!!! what weight should be here? -> ONLY if there are three dimensions for moran.
There is significant spatial relationship here. The y is significant. the lower the y the denser the units per grid.
10.4 4) Loading Keras
Below you will find a block of code that can be used to simulate some simple noisy nonlinear data, fit an ANN to that data, and plot the fit.
# Then, use the install_tensorflow() function to install TensorFlow. Note that on Windows you need a working installation of Anaconda.
# already install the anaconda version of tensorflow https://tensorflow.rstudio.com/installation/
#library(tensorflow)
#install_tensorflow()
library(tensorflow)
library(keras)
<- 100
N <- runif(N, -2, 3) # Build a Gompertz function with some noise
x <- 10 # ceiling parameter
a <- 0 # horizontal shift parameter
b <- 2 # growth rate parameter
c <- a*exp(-exp(b - c*x)) + rnorm(N, mean=0, sd= a/10)
y plot(x,y) # format data in a way model.fit() would like it
<- as.matrix(x)
x <- as.matrix(y) # sequence along x for later prediction
y <- as.matrix(seq(from=min(x), to=max(x), length.out=1000)) # Build and compile simple model
xnew <- keras_model_sequential() |>
model layer_dense(units=32, activation="relu", input_shape=1) |>
layer_dense(units=16, activation = "relu") |>
layer_dense(units=1, activation="linear")
## Loaded Tensorflow version 2.8.0
|> compile(
model loss = "mse",
optimizer = "adam",
metrics = list("mean_absolute_error")
)# Fit model, get predictions, and display result.
|> summary() model
## Model: "sequential"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## dense_2 (Dense) (None, 32) 64
##
## dense_1 (Dense) (None, 16) 528
##
## dense (Dense) (None, 1) 17
##
## ================================================================================
## Total params: 609
## Trainable params: 609
## Non-trainable params: 0
## ________________________________________________________________________________
# Can change verbose > 0 to see plots and text output as model trains
|> fit(x, y, epochs = 20, verbose = 0)
model = model |> predict(xnew)
y_pred plot(x, y)
lines(xnew, y_pred, col="red")
Complete the following:
10.4.1 a. Mostly as an exercise to ensure you have Keras installed correctly, run the above code and comment onthe quality of the fit you observe. NOTE: I was unable to run the code within an RMarkdown cell, but this may be due to a system configuration issue.
The fit is not well. It fits the first part of linear relationship well but not the second part of non-linear relationship.
10.4.2 b. Modify the code in an attempt to improve the fit. Specifically, study the effects (as well as interactions) of:
• Modifying how much data (N) we have. • The complexity of the model. • For how many epochs the model is trained.
library(tensorflow)
library(keras)
<- 10000
N <- runif(N, -2, 3) # Build a Gompertz function with some noise
x <- 10 # ceiling parameter
a <- 0 # horizontal shift parameter
b <- 2 # growth rate parameter
c <- a*exp(-exp(b - c*x)) + rnorm(N, mean=0, sd= a/10)
y plot(x,y) # format data in a way model.fit() would like it
<- as.matrix(x)
x <- as.matrix(y) # sequence along x for later prediction
y <- as.matrix(seq(from=min(x), to=max(x), length.out=1000)) # Build and compile simple model
xnew <- keras_model_sequential() |>
model layer_dense(units=32, activation="relu", input_shape=1) |>
layer_dense(units=16, activation = "relu") |>
layer_dense(units=1, activation="linear")
|> compile(
model loss = "mse",
optimizer = "adam",
metrics = list("mean_absolute_error")
)# Fit model, get predictions, and display result.
|> summary() model
## Model: "sequential_1"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## dense_5 (Dense) (None, 32) 64
##
## dense_4 (Dense) (None, 16) 528
##
## dense_3 (Dense) (None, 1) 17
##
## ================================================================================
## Total params: 609
## Trainable params: 609
## Non-trainable params: 0
## ________________________________________________________________________________
# Can change verbose > 0 to see plots and text output as model trains
|> fit(x, y, epochs = 100, verbose = 0)
model = model |> predict(xnew)
y_pred plot(x, y)
lines(xnew, y_pred, col="red")
plot(x, y)
lines(xnew, y_pred, col="red")
The model fits much better after I added more data for training and let it train for more iterations.