Title: | Miscellaneous Scripts from the Data Science Laboratory (UCPH) |
---|---|
Description: | Miscellaneous scripts, e.g. functionality to make and plot factor diagrams for the statistical design. |
Authors: | Bo Markussen [aut, cre] |
Maintainer: | Bo Markussen <[email protected]> |
License: | GPL-3 |
Version: | 1.4.4 |
Built: | 2024-11-22 04:07:59 UTC |
Source: | https://github.com/bomarkussen/labapplstat |
chisq.test.simulate
simulates the chi-squared test for a 2-way contingency tabel.
chisq.test.simulate(x, conditioning = "total", x0 = NULL, B = 10000)
chisq.test.simulate(x, conditioning = "total", x0 = NULL, B = 10000)
x |
matrix with the contingency table |
conditioning |
character string specifying the simulation scenario. Defaults to |
x0 |
matrix specifying the null distribution. Defaults to |
B |
integer specifying the number of replicates used in the Monte Carlo test. Defaults to 10000. |
Using conditioning="both"
corresponds to selecting simulate.p.value=TRUE
in chisq.test
. However, conditioning on both row and column marginals appears to be rarely justified in real data. Instead conditioning="total"
is the correct choice for testing independence. Similarly, conditioning="row"
is recommended when the row marginals e.g. are fixed by experimental design.
The option x0
has no effect when conditioning on both row and column marginals.
An object of class "htest"
.
The code has not been optimized for speed, and might be slow.
Bo Markussen
# The Avadex dataset Xobs <- matrix(c(2,3,6,40),2,2) rownames(Xobs) <- c("Avadex +","Avadex -") colnames(Xobs) <- c("Tumor +","Tumor -") # In this example only the rows appear to be fixed by experimental design. # As is seen below, conditioning also on the columns is misleading conservative. chisq.test.simulate(Xobs,"both") chisq.test.simulate(Xobs,"row") chisq.test.simulate(Xobs,"total") # Conditioning both on row and column marginals is simular to chisq.test(). chisq.test(Xobs,simulate.p.value=TRUE)
# The Avadex dataset Xobs <- matrix(c(2,3,6,40),2,2) rownames(Xobs) <- c("Avadex +","Avadex -") colnames(Xobs) <- c("Tumor +","Tumor -") # In this example only the rows appear to be fixed by experimental design. # As is seen below, conditioning also on the columns is misleading conservative. chisq.test.simulate(Xobs,"both") chisq.test.simulate(Xobs,"row") chisq.test.simulate(Xobs,"total") # Conditioning both on row and column marginals is simular to chisq.test(). chisq.test(Xobs,simulate.p.value=TRUE)
DD
computes the Design Diagram for a linear model.
DD(fixed, random = NULL, data, keep = ~1, center = FALSE, eps = 1e-12)
DD(fixed, random = NULL, data, keep = ~1, center = FALSE, eps = 1e-12)
fixed |
formula with fixed effects. A response may the specified, but this optional. |
random |
formula with random effects. Defaults to |
data |
data frame with the explanatory variables and the response (if specified). |
keep |
formula which effects that will not be removed in the collinarity analysis. Defaults to |
center |
boolean deciding whether to centralize numerical predictors when an intercept is present. Defaults to |
eps |
threshold for deeming singular values to be "zero". Defaults to 1e-12. |
An object of class designDiagram-class
Bo Markussen
# 3-way ANOVA x <- factor(rep(rep(1:4,times=4),times=4)) y <- factor(rep(rep(1:4,times=4),each=4)) z <- factor(rep(rep(1:4,each=4),each=4)) myDD <- DD(~x*y*z,data=data.frame(x=x,y=y,z=z)) summary(myDD) #Making the factor diagram closed under minima mydata <- data.frame(age=rep(c("boy","girl","adult","adult"),4), gender=rep(c("child","child","man","woman"),4)) myDD <- DD(~0+age+gender,data=mydata) plot(myDD) # Example of collinearity mydata <- data.frame(age=rnorm(102),edu=rnorm(102),sex=factor(rep(c(1,2),51))) mydata <- transform(mydata,exper=age-edu+0.1*rnorm(102)) mydata <- transform(mydata,wage=2*edu+2*exper+rnorm(102)) summary(myDD <- DD(wage~sex*(age+exper+edu),data=mydata)) # growth of rats antibiotica <- factor(rep(c(0,40),each=6)) vitamin <- factor(rep(rep(c(0,5),each=3),2)) growth <- c(1.30,1.19,1.08,1.26,1.21,1.19,1.05,1.00,1.05,1.52,1.56,1.55) mydata <- data.frame(antibiotica=antibiotica,vitamin=vitamin,growth=growth) myDD <- DD(growth~antibiotica*vitamin,data=mydata) plot(myDD,"MSS") plot(myDD,"I2") # ANCOVA: Non-orthogonal design library(isdals) data(birthweight) plot(DD(weight~sex*I(age-42),data=birthweight),"MSS") plot(DD(weight~I(age-42)+sex:I(age-42)+sex,data=birthweight),"MSS")
# 3-way ANOVA x <- factor(rep(rep(1:4,times=4),times=4)) y <- factor(rep(rep(1:4,times=4),each=4)) z <- factor(rep(rep(1:4,each=4),each=4)) myDD <- DD(~x*y*z,data=data.frame(x=x,y=y,z=z)) summary(myDD) #Making the factor diagram closed under minima mydata <- data.frame(age=rep(c("boy","girl","adult","adult"),4), gender=rep(c("child","child","man","woman"),4)) myDD <- DD(~0+age+gender,data=mydata) plot(myDD) # Example of collinearity mydata <- data.frame(age=rnorm(102),edu=rnorm(102),sex=factor(rep(c(1,2),51))) mydata <- transform(mydata,exper=age-edu+0.1*rnorm(102)) mydata <- transform(mydata,wage=2*edu+2*exper+rnorm(102)) summary(myDD <- DD(wage~sex*(age+exper+edu),data=mydata)) # growth of rats antibiotica <- factor(rep(c(0,40),each=6)) vitamin <- factor(rep(rep(c(0,5),each=3),2)) growth <- c(1.30,1.19,1.08,1.26,1.21,1.19,1.05,1.00,1.05,1.52,1.56,1.55) mydata <- data.frame(antibiotica=antibiotica,vitamin=vitamin,growth=growth) myDD <- DD(growth~antibiotica*vitamin,data=mydata) plot(myDD,"MSS") plot(myDD,"I2") # ANCOVA: Non-orthogonal design library(isdals) data(birthweight) plot(DD(weight~sex*I(age-42),data=birthweight),"MSS") plot(DD(weight~I(age-42)+sex:I(age-42)+sex,data=birthweight),"MSS")
designDiagram
class and some basic methodsObjects of class designDiagram
as generated by DD
is a list with entries as specified below.
response
Logical stating whether a response variable was present.
terms
Named vector with all terms in the design.
random.terms
Vector with the random terms in the design.
relations
Named matrix with relations between variables with the following interpretation: "0"=linear indepent, "<"=row term is a subspace of column, "<-"=row term is a subspace of column term and no other terms are inbetween, ">" and "->" the similar interpretatioin between columns and rows, name=name of minimum between row and column term.
inner
Named matrix of squared inner products of subspaces with nesting subspaces removed. Rounded at order of eps
in the call to link{DD}
. Used to decide orthogonality of the design.
Nparm
Named vector with the number of parameters for the terms.
df
Named vector with the degrees of freedom for the terms.
SS
Named matrix with Sum-of-Squares if a response variable was specified.
MSS
Named matrix with Mean-Sum-of-Squares if a response variable was specified.
pvalue
Named matrix with p-values for Type-I F-tests. p-values are stated at the collapsed nesting, but F-test are done against the most coarse nested random effect.
sigma2
Named vector of random effects variance estimates.
varcov
Named list of variance-covariance matrix for fixed effects relative to each of the random effects. Rounded at order of eps
.
coordinates
Data frame with node coordinates of the terms. Initialized in Sugiyama layout.
## S3 method for class 'designDiagram' print(x, ...) ## S3 method for class 'designDiagram' summary(object, ...) ## S3 method for class 'designDiagram' update(object, ...) ## S3 method for class 'designDiagram' plot( x, circle = "none", pvalue = (circle == "MSS"), sigma2 = NULL, kill = ~1, ca = FALSE, max.area = NULL, relative = 0.01, color = NULL, circle.scaling = 1, arrow.type = arrow(angle = 20, length = unit(4, "mm")), xlim = c(0, 1), ylim = c(0, 1), horizontal = TRUE, ... )
## S3 method for class 'designDiagram' print(x, ...) ## S3 method for class 'designDiagram' summary(object, ...) ## S3 method for class 'designDiagram' update(object, ...) ## S3 method for class 'designDiagram' plot( x, circle = "none", pvalue = (circle == "MSS"), sigma2 = NULL, kill = ~1, ca = FALSE, max.area = NULL, relative = 0.01, color = NULL, circle.scaling = 1, arrow.type = arrow(angle = 20, length = unit(4, "mm")), xlim = c(0, 1), ylim = c(0, 1), horizontal = TRUE, ... )
x |
object of class |
... |
not used. |
object |
object of class |
circle |
character specifying which circles to draw at the terms: |
pvalue |
boolean specifying whether p-values should be inserted on the graphs. This is only possible if a response variable was specified. Defaults to |
sigma2 |
vector of random effects variances. Defaults to |
kill |
formula specifying which cirlces not to plot. Defaults to |
ca |
boolean deciding whether collinearity analysis is visualized. If |
max.area |
numeric specifying the used maximal area of circles. If |
relative |
positive numeric, which specifies needed relative increase for an area to be visualized in the collinearity analysis. Defaults to |
color |
color of circles when |
circle.scaling |
numeric specifying size scaling of circles. Defaults to |
arrow.type |
specifying arrow heads via |
xlim |
x-range of diagram plot. Defaults to |
ylim |
y-range of diagram plot. Defaults to |
horizontal |
boolean specifying if the design diagram should be drawn horizontally or vertically. Defaults to |
For plot.designDiagram
the options circle="SS"
and circle="MSS"
are only available if a response variable was specified for the design.
For circle="I"
and circle="I2"
the color of the circles visualize the coefficient of variation of the informations. For the computation of the informations the variances of the random effects are either estimated (if a response variable is present), all set to 1 (otherwise), or given via the option sigma2
.
If color=NULL
and ca=FALSE
, then the defaults colors are "lightgreen"
for Sum-of-Squares, "lightblue"
for Mean-Sum-of-Squares, and a gradient from "limegreen"
to "orange"
for information spread. To specify a different color gradient in the latter case, then give a vector of two colors.
For update.designDiagram
the second argument should be a data frame with new coordinates
. This can be usefull for manually setting the coordinates for plotting.
Solves linear equations in continuous explanatory variables in order to find the expected dose. A typical application could be to find LD50, i.e. the lethal dose killing 50 percent of the population, from a probit analysis fitted by glm
. The associated variance-covariance matrix is found using the Delta method.
emmeans_ED( object, specs = ~0, left = NULL, right = NULL, tran = NULL, p = 0.5, p.name = "probability" )
emmeans_ED( object, specs = ~0, left = NULL, right = NULL, tran = NULL, p = 0.5, p.name = "probability" )
object |
An object that can be given to |
specs |
As for |
left |
A list specifying the left end point of the linear span of continuous variables in which to measure the ED values. Defaults to |
right |
A list specifying the right end point of the linear span of continuous variables in which to measure the ED values. Defaults to |
tran |
Possible transformation of the scale of the ED values. If given then backtransformation can be done using the technology of the |
p |
Numeric vector given the targeted predictions. Typically probabilities, where the default value |
p.name |
The name of the variable containing |
Find the 'expected dose' along a gradient in the space of numeric predictor variables. The options 'left' and 'right' specify the endpoints of this gradient. Typically these endpoints should be chosen as 0 and 1 for the numeric predictor of interest. If both endpoints are chosen as NULL then these choices are taken for all numeric predictors.
An object of class emmGrid-class
.
Bo Markussen
# Data from: C.I. Bliss, "The calculation of the dose-mortality curve", # Annals of Applied Biology, 134–167, 1935. # import data from dobson package library(dobson) data(beetle) m0 <- glm(cbind(y,n-y)~x,data=beetle,family=binomial(link="cloglog")) # ED50 computation summary(emmeans_ED(m0,tran="log10"),type="response") # Visualization using the tidyverse library(tidyverse) LCL <- Vectorize(function(y,n) binom.test(y,n)$conf.int[1]) UCL <- Vectorize(function(y,n) binom.test(y,n)$conf.int[2]) beetle <- mutate(beetle,LCL=LCL(y,n),UCL=UCL(y,n)) emmeans_ED(m0,p=seq(0.001,0.999,length.out=100),tran="log10") %>% summary(type="response") %>% as.data.frame() %>% mutate(probability=as.numeric(as.character(probability))) %>% ggplot(aes(x=probability,y=response,ymin=asymp.LCL,ymax=asymp.UCL)) + geom_ribbon(alpha=0.2,fill="blue") + geom_line() + xlab("Death probability") + ylab(expression(expected~dose~CS[2]~mg/l)) + geom_errorbarh(aes(xmin=LCL,xmax=UCL,y=10^x),beetle,inherit.aes=FALSE) + geom_point(aes(x=y/n,y=10^x),beetle,inherit.aes=FALSE)
# Data from: C.I. Bliss, "The calculation of the dose-mortality curve", # Annals of Applied Biology, 134–167, 1935. # import data from dobson package library(dobson) data(beetle) m0 <- glm(cbind(y,n-y)~x,data=beetle,family=binomial(link="cloglog")) # ED50 computation summary(emmeans_ED(m0,tran="log10"),type="response") # Visualization using the tidyverse library(tidyverse) LCL <- Vectorize(function(y,n) binom.test(y,n)$conf.int[1]) UCL <- Vectorize(function(y,n) binom.test(y,n)$conf.int[2]) beetle <- mutate(beetle,LCL=LCL(y,n),UCL=UCL(y,n)) emmeans_ED(m0,p=seq(0.001,0.999,length.out=100),tran="log10") %>% summary(type="response") %>% as.data.frame() %>% mutate(probability=as.numeric(as.character(probability))) %>% ggplot(aes(x=probability,y=response,ymin=asymp.LCL,ymax=asymp.UCL)) + geom_ribbon(alpha=0.2,fill="blue") + geom_line() + xlab("Death probability") + ylab(expression(expected~dose~CS[2]~mg/l)) + geom_errorbarh(aes(xmin=LCL,xmax=UCL,y=10^x),beetle,inherit.aes=FALSE) + geom_point(aes(x=y/n,y=10^x),beetle,inherit.aes=FALSE)
minimum
finds the minimum of two factors, i.e. the finest factors that is coarser than both of the factors.
minimum(x, y, concatenate.names = TRUE)
minimum(x, y, concatenate.names = TRUE)
x |
vector that will be interpreted as a factor. |
y |
vector that will be interpreted as a factor. |
concatenate.names |
boolean. If |
A factor with the minimum.
Bo Markussen
x <- rep(c("boy","girl","adult","adult"),4) y <- rep(c("child","child","man","woman"),4) minimum(x,y) minimum(x,y,FALSE)
x <- rep(c("boy","girl","adult","adult"),4) y <- rep(c("child","child","man","woman"),4) minimum(x,y) minimum(x,y,FALSE)
power.chisq.test.simulate
simulates power for tests for 2-way contingency tables based on the Pearson Chi-squared test statistics by simulation under 4 different conditioning scenarios.
power.chisq.test.simulate( x, conditioning = "total", x0 = NULL, sig.level = 0.05, B = 10000 )
power.chisq.test.simulate( x, conditioning = "total", x0 = NULL, sig.level = 0.05, B = 10000 )
x |
matrix specifying the alternative distribution of the contingency table. |
conditioning |
character string specifying the simulation scenario. Defaults to |
x0 |
matrix specifying the null distribution. Defaults to |
sig.level |
significance level used in test. Defaults to 0.05. |
B |
integer specifying the number of replicates used in the Monte Carlo test. Defaults to 10000. |
Using conditioning="both"
corresponds to selecting simulate.p.value=TRUE
in chisq.test
. However, conditioning on both row and column marginals appears to be rarely justified in real data. Instead conditioning="total"
is the correct choice for testing independence. Similarly, conditioning="row"
is recommended when the row marginals e.g. are fixed by experimental design.
Both the alternative and the null are simulated under the parametric scenario estimated from the data matrix x
. This possibly induces a discrepancy with chisq.test.simulate
, where the null also is simulated from the specific data instance. Thus, the problem is that the null distribution depends on the model parameters.
An object of class "power.htest"
.
The code has not been optimized for speed, and might be slow.
Bo Markussen
# The Avadex dataset Xobs <- matrix(c(2,3,6,40),2,2) rownames(Xobs) <- c("Avadex +","Avadex -") colnames(Xobs) <- c("Tumor +","Tumor -") # In this example only the rows appear to be fixed by experimental design. power.chisq.test.simulate(Xobs,"row") power.chisq.test.simulate(Xobs,"total")
# The Avadex dataset Xobs <- matrix(c(2,3,6,40),2,2) rownames(Xobs) <- c("Avadex +","Avadex -") colnames(Xobs) <- c("Tumor +","Tumor -") # In this example only the rows appear to be fixed by experimental design. power.chisq.test.simulate(Xobs,"row") power.chisq.test.simulate(Xobs,"total")