Title: | Functional Data Analysis in a Mixed Model Framework |
---|---|
Description: | Likelihood based analysis of 1-dimension functional data in a mixed-effects model framework. Matrix computation are approximated by semi-explicit operator equivalents with linear computational complexity. Markussen (2013) <doi:10.3150/11-BEJ389>. |
Authors: | Bo Markussen [aut, cre] |
Maintainer: | Bo Markussen <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.6.1 |
Built: | 2024-11-14 03:58:31 UTC |
Source: | https://github.com/bomarkussen/fdamixed |
Likelihood based analysis of 1-dimension functional data in a mixed-effets model framework. The methodology is designed for equidistantly sampled high frequency data, where the needed matrix computation may be approximated by semi-explicit operator equivalents with linear computational complexity. Extensions exist for non-equidistantly sampled data, but these have not been implemented.
Bo Markussen <[email protected]>
Bo Markussen (2013), "Functional data analysis in an operator based mixed model framework", Bernoulli, vol. 19, pp. 1-17.
Conrad Sanderson (2010), "Armadillo: An open source C++ linear algebra library for fast prototyping and computationally intensive experiments", NICTA technical report.
Dirk Eddelbuettel, "Rcpp: Seamless R and C++ Integration with Rcpp", UseR!, Springer, 2013.
Implementation done using the package RcppArmadillo
. For penalized likelihood analysis of functional data see the packages fda
and fda.usc
.
x <- seq(0,2*pi,length.out=200) y.true <- sin(x)+x y.obs <- y.true + rnorm(200) est0 <- fdaLm(y.obs~0,Fright="open",right=2*pi) est1 <- fdaLm(y.obs~0+x,Fright="open",right=2*pi) plot(x,y.obs,main="Estimating the sum of a line and a curve") lines(x,y.true,lty=2) lines(x,est0$xBLUP[,1,1],col=2) lines(x,est1$betaHat*x+est1$xBLUP[,1,1],col=3) legend("topleft",c("True curve","Smooth","Line + smooth"),col=1:3,lty=c(2,1,1))
x <- seq(0,2*pi,length.out=200) y.true <- sin(x)+x y.obs <- y.true + rnorm(200) est0 <- fdaLm(y.obs~0,Fright="open",right=2*pi) est1 <- fdaLm(y.obs~0+x,Fright="open",right=2*pi) plot(x,y.obs,main="Estimating the sum of a line and a curve") lines(x,y.true,lty=2) lines(x,est0$xBLUP[,1,1],col=2) lines(x,est1$betaHat*x+est1$xBLUP[,1,1],col=3) legend("topleft",c("True curve","Smooth","Line + smooth"),col=1:3,lty=c(2,1,1))
Performs forward and backward Box-Cox power transformation including the invariance scaling based on the geometric mean.
dataTrans(y, mu, direction = "backward", geoMean = NULL)
dataTrans(y, mu, direction = "backward", geoMean = NULL)
y |
The numeric variable object to be transformed. |
mu |
The power parameter, where zero corresponds to the logarithmic transformation. |
direction |
A character variable. If the lower case of the first
letter equals |
geoMean |
If a numeric is stated, then this is taken as the
geometric mean of the untransformed observations. If |
The transformed variable.
This function is intended to be used in conjunction with
fdaLm
to achieve estimates on the orginal scale. Thus,
the geometric mean of the original observations should be kept in
order to have the correct backtransformation.
Bo Markussen <[email protected]>
# ---------------------------------------------------- # Make 3 samples with the following characteristics: # 1) length N=500 # 2) sinusoid form + linear fixed effect + noise # 3) exponential transformed # ---------------------------------------------------- N <- 500 sample.time <- seq(0,2*pi,length.out=N) z <- c("a","b","c") x0 <- c(0,10,20) x1 <- rep(x0,each=N) y <- c(sin(sample.time),sin(sample.time),sin(sample.time))+x1+rnorm(3*N) # Make exponential-Box-Cox-backtransformation # Scaling with geometric mean requires that we solve the Whiteker function geoMean <- mean(y) geoMean <- uniroot(function(x){x*log(x)-geoMean},c(exp(-1),(1+geoMean)^2))$root y <- dataTrans(y,0,"b",geoMean) # ---------------------------------------------------- # Do fda's with global and marginal fixed effects # Also seek to find Box-Cox transformation with mu=0 # ---------------------------------------------------- est0 <- fdaLm(y|z~x0,boxcox=1) est1 <- fdaLm(y|z~x1,boxcox=1) # ----------------------------------------------------- # Display results # ----------------------------------------------------- # Panel 1 plot(sample.time,dataTrans(est0$betaHat[,"(Intercept)"]+est0$betaHat[,"x0"], est0$boxcoxHat,"b",geoMean)/ dataTrans(est0$betaHat[,"(Intercept)"],est0$boxcoxHat,"b",geoMean), main="Effect of x (true=1.2)",xlab="time", ylab="response ratio") abline(h=dataTrans(est1$betaHat["(Intercept)"]+est1$betaHat["x1"], est1$boxcoxHat,"b",geoMean)/ dataTrans(est1$betaHat["(Intercept)"],est1$boxcoxHat,"b",geoMean),col=2) legend("topleft",c("marginal","global"),pch=c(1,NA),lty=c(NA,1),col=1:2) # Panel 2 plot(sample.time,dataTrans(est0$betaHat[,"(Intercept)"]+est0$betaHat[,"x0"], est0$boxcoxHat,"b",geoMean)- dataTrans(est0$betaHat[,"(Intercept)"],est0$boxcoxHat,"b",geoMean), main="Effect of x (true=1)",xlab="time", ylab="response difference") abline(h=dataTrans(est1$betaHat["(Intercept)"]+est1$betaHat["x1"], est1$boxcoxHat,"b",geoMean)- dataTrans(est1$betaHat["(Intercept)"],est1$boxcoxHat,"b",geoMean),col=2) legend("bottomleft",c("marginal","global"),pch=c(1,NA),lty=c(NA,1),col=1:2) # Panel 3 plot(sample.time,est0$xBLUP[,1,1],type="l", main="Marginal ANOVA",xlab="time",ylab="x BLUP") # Panel 4 plot(sample.time,est1$xBLUP[,1,1],type="l", main="Global ANOVA",xlab="time",ylab="x BLUP")
# ---------------------------------------------------- # Make 3 samples with the following characteristics: # 1) length N=500 # 2) sinusoid form + linear fixed effect + noise # 3) exponential transformed # ---------------------------------------------------- N <- 500 sample.time <- seq(0,2*pi,length.out=N) z <- c("a","b","c") x0 <- c(0,10,20) x1 <- rep(x0,each=N) y <- c(sin(sample.time),sin(sample.time),sin(sample.time))+x1+rnorm(3*N) # Make exponential-Box-Cox-backtransformation # Scaling with geometric mean requires that we solve the Whiteker function geoMean <- mean(y) geoMean <- uniroot(function(x){x*log(x)-geoMean},c(exp(-1),(1+geoMean)^2))$root y <- dataTrans(y,0,"b",geoMean) # ---------------------------------------------------- # Do fda's with global and marginal fixed effects # Also seek to find Box-Cox transformation with mu=0 # ---------------------------------------------------- est0 <- fdaLm(y|z~x0,boxcox=1) est1 <- fdaLm(y|z~x1,boxcox=1) # ----------------------------------------------------- # Display results # ----------------------------------------------------- # Panel 1 plot(sample.time,dataTrans(est0$betaHat[,"(Intercept)"]+est0$betaHat[,"x0"], est0$boxcoxHat,"b",geoMean)/ dataTrans(est0$betaHat[,"(Intercept)"],est0$boxcoxHat,"b",geoMean), main="Effect of x (true=1.2)",xlab="time", ylab="response ratio") abline(h=dataTrans(est1$betaHat["(Intercept)"]+est1$betaHat["x1"], est1$boxcoxHat,"b",geoMean)/ dataTrans(est1$betaHat["(Intercept)"],est1$boxcoxHat,"b",geoMean),col=2) legend("topleft",c("marginal","global"),pch=c(1,NA),lty=c(NA,1),col=1:2) # Panel 2 plot(sample.time,dataTrans(est0$betaHat[,"(Intercept)"]+est0$betaHat[,"x0"], est0$boxcoxHat,"b",geoMean)- dataTrans(est0$betaHat[,"(Intercept)"],est0$boxcoxHat,"b",geoMean), main="Effect of x (true=1)",xlab="time", ylab="response difference") abline(h=dataTrans(est1$betaHat["(Intercept)"]+est1$betaHat["x1"], est1$boxcoxHat,"b",geoMean)- dataTrans(est1$betaHat["(Intercept)"],est1$boxcoxHat,"b",geoMean),col=2) legend("bottomleft",c("marginal","global"),pch=c(1,NA),lty=c(NA,1),col=1:2) # Panel 3 plot(sample.time,est0$xBLUP[,1,1],type="l", main="Marginal ANOVA",xlab="time",ylab="x BLUP") # Panel 4 plot(sample.time,est1$xBLUP[,1,1],type="l", main="Global ANOVA",xlab="time",ylab="x BLUP")
Fits variance and smoothing parameters, and possibly also Box-Cox transformation, by maximum restricted likelihood. Estimate fixed parameters, predict random effects, and predict serial correlated effect at point of maximum restricted likelihood. Linear models for fixed and random effects may be global or marginal over sample times.
fdaLm(formula, data, design, boxcox = NULL, G = 1, lambda = 1, nlSearch = TRUE, K.order = 1, D.order = NULL, Fleft = "tied", Fright = "tied", left = NULL, right = NULL)
fdaLm(formula, data, design, boxcox = NULL, G = 1, lambda = 1, nlSearch = TRUE, K.order = 1, D.order = NULL, Fleft = "tied", Fright = "tied", left = NULL, right = NULL)
formula |
A multiple formula of the type |
data |
An optional data frame containing the variables. See details below. |
design |
An optional data frame containing the design variables in the specification of the fixed and the random effects. See details below. |
boxcox |
The power parameter in the scale invariant Box-Cox transformation.
If |
G |
Variance of the random effects. Present implementation only allows
for independent random effects, i.e. |
lambda |
Start value for the |
nlSearch |
If |
K.order |
The order of the K-operator. |
D.order |
The requested order of derivatives of the prediction of the serial
correlated effect |
Fleft |
Specification of the |
Fright |
Similarly for the |
left |
Left limit of the sampling interval. If |
right |
Right limit of the sampling interval. If |
The response variable Y
is taken from the data frame
data
(subsidiary the parent environment). If there is more
than one sample, then the responses must be stacked sample-wise on top
of each other. The sample identifier id
is sought for in both
data frames data
and design
(subsidiary the parent
environment). The primarily function of the identifier is to decide
the number of samples. But if id
is present in both data
frames, and if there is more that one sample, then this variable is
also used to match the reponse vector to the design variables
(i.e. these need not appear in the same order).
The design variables fixed
and random
for the fixed and
the random effects are taken from the data frame design
(subsidiary the parent environment), subsidiary from the data frame
data
(subsidiary the parent environment).
If the number of observations in the design variables equal the total number of response observations, then a global ANOVA is performed. If the number of observations in the design variables equal the number of sample points, then a marginal ANOVA is performed.
A list with components
logLik |
Minus twice the log restricted likelihood taken at the estimates. |
ANOVA |
Specifies whether fixed and random effects were estimated
globally ( |
nlSearch |
Specifies whether non-linear optimization was
performed ( |
counts |
Number of computations of the negative log likelihood. |
boxcoxHat |
Maximum restricted likelihood estimate for the power
parameter in the scale invariant Box-Cox transformation. Equal to
|
Ghat |
Maximum restricted likelihood estimate for the variance matrix of the random effects. |
lambdaHat |
Maximum restricted likelihood estimate for the lambda parameter describing the L-operator. |
sigma2hat |
Maximum restricted likelihood estimate for the noise variance. |
betaHat |
For global ANOVA a vector with estimate for the fixed effect. For marginal ANOVA a matrix with estimate for the fixed effects. |
uBLUP |
For global ANOVA a vector with prediction of the random effect. For marginal ANOVA a matrix with predictions of the random effects. |
xBLUP |
Array with predictions of serial correlated effects. The
dimension is (sample length,sample numbers,1+ |
condRes |
Matrix of conditional residuals. The dimension is (sample length,sample numbers). |
betaVar |
Variance matrix of fixed effect estimate. |
If the real value of the left most eigenvalues are non-positive, and if the real value of the right most eigenvalues are non-negative, then the underlying algorithm is numerical stable. This will always be the situation for the present restriction of the L-operator.
If lambda
has length=1, then it may also be interpreted as the
smoothing parameter in the penalized likelihood framework.
If D.order
is chosen larger than K.order
, this number of
derivaties are also computed during the non-linear optimization. This
might slow down the computation speed a little bit.
Bo Markussen <[email protected]>
See also findRoots
and dataTrans
.
# --------------------- # Using a fixed effect # --------------------- x <- seq(0,2*pi,length.out=200) y.true <- sin(x)+x y.obs <- y.true + rnorm(200) est0 <- fdaLm(y.obs~0,Fright="open",right=2*pi) est1 <- fdaLm(y.obs~0+x,Fright="open",right=2*pi) plot(x,y.obs,main="Estimating the sum of a line and a curve") lines(x,y.true,lty=2) lines(x,est0$xBLUP[,1,1],col=2) lines(x,est1$betaHat*x+est1$xBLUP[,1,1],col=3) legend("topleft",c("True curve","Smooth","Line + smooth"),col=1:3,lty=c(2,1,1)) # -------------------------- # Including a random effect # -------------------------- # Build data frame test.frame <- data.frame(y=rnorm(50),sample=factor(rep(1:5,each=10)), x=rep(0:9,times=5), f=factor(rnorm(50) < 0,labels=c("a","b")), j=factor(rnorm(50) < 0,labels=c("A","B"))) test.frame$y <- test.frame$y + 2 + 3*(test.frame$f=="a")*test.frame$x + 5*(test.frame$f=="b")*test.frame$x + (-10)*(test.frame$j=="A") + 10*(test.frame$j=="B") # This is the model 'y|sample ~ f:x|j' with intercept=2, slopes (3,5), # and random effects (-10,10) est <- fdaLm(y|sample ~ f:x|0+j,data=test.frame) print(est)
# --------------------- # Using a fixed effect # --------------------- x <- seq(0,2*pi,length.out=200) y.true <- sin(x)+x y.obs <- y.true + rnorm(200) est0 <- fdaLm(y.obs~0,Fright="open",right=2*pi) est1 <- fdaLm(y.obs~0+x,Fright="open",right=2*pi) plot(x,y.obs,main="Estimating the sum of a line and a curve") lines(x,y.true,lty=2) lines(x,est0$xBLUP[,1,1],col=2) lines(x,est1$betaHat*x+est1$xBLUP[,1,1],col=3) legend("topleft",c("True curve","Smooth","Line + smooth"),col=1:3,lty=c(2,1,1)) # -------------------------- # Including a random effect # -------------------------- # Build data frame test.frame <- data.frame(y=rnorm(50),sample=factor(rep(1:5,each=10)), x=rep(0:9,times=5), f=factor(rnorm(50) < 0,labels=c("a","b")), j=factor(rnorm(50) < 0,labels=c("A","B"))) test.frame$y <- test.frame$y + 2 + 3*(test.frame$f=="a")*test.frame$x + 5*(test.frame$f=="b")*test.frame$x + (-10)*(test.frame$j=="A") + 10*(test.frame$j=="B") # This is the model 'y|sample ~ f:x|j' with intercept=2, slopes (3,5), # and random effects (-10,10) est <- fdaLm(y|sample ~ f:x|0+j,data=test.frame) print(est)
Find complex roots of polynomials in x that are quadratic polynomials in x^k
findRoots(coefs, k = 1)
findRoots(coefs, k = 1)
coefs |
Coefficients |
k |
Order of x^k |
It is assumed that c_2k
is non-zero, and that at least one of
c_0
and c_k
are non-zero (otherwise, we have a double
root, which is not treated by fdaLm
in the present
implementation). An error is issued if these assumptions are violated.
A list with components
left |
The k roots with left most real components |
right |
The k roots with right most real components |
This function is intended for internal usage in fdaLm
to
find eigenvalues. If a robust and stable method of finding all the
complex roots is a polynomial were available, then this could be used
in fdaLm
instead enhancing the scope of this function.
Bo Markussen <[email protected]>
Solved using Section 5.6 in Press et al, "Numerical Recipies in C", second edition.
findRoots(c(-1,0,1),1) findRoots(c(1,-1,1),2)
findRoots(c(-1,0,1),1) findRoots(c(1,-1,1),2)