Package 'LabApplStat' reference manual

Title:	Miscellaneous Scripts from the Data Science Laboratory (UCPH)
Description:	Miscellaneous scripts, e.g. functionality to make and plot factor diagrams for the statistical design.
Authors:	Bo Markussen [aut, cre]
Maintainer:	Bo Markussen <[email protected]>
License:	GPL-3
Version:	1.4.4
Built:	2025-01-21 04:12:14 UTC
Source:	https://github.com/bomarkussen/labapplstat

Simulate Chi-squared tests with conditioning

Description

chisq.test.simulate simulates the chi-squared test for a 2-way contingency tabel.

Usage

chisq.test.simulate(x, conditioning = "total", x0 = NULL, B = 10000)
chisq.test.simulate(x, conditioning = "total", x0 = NULL, B = 10000)

Arguments

`x`	matrix with the contingency table
`conditioning`	character string specifying the simulation scenario. Defaults to `"total"`. Other possible scenarios are `"row"`, `"col"`, and `"both"`.
`x0`	matrix specifying the null distribution. Defaults to `NULL`, in which case the null is estimated from the observed data `x`.
`B`	integer specifying the number of replicates used in the Monte Carlo test. Defaults to 10000.

Details

Value

An object of class "htest".

Note

The code has not been optimized for speed, and might be slow.

Author(s)

Bo Markussen

Examples

# The Avadex dataset
Xobs <- matrix(c(2,3,6,40),2,2)
rownames(Xobs) <- c("Avadex +","Avadex -")
colnames(Xobs) <- c("Tumor +","Tumor -")

# In this example only the rows appear to be fixed by experimental design.
# As is seen below, conditioning also on the columns is misleading conservative.
chisq.test.simulate(Xobs,"both")
chisq.test.simulate(Xobs,"row")
chisq.test.simulate(Xobs,"total")

# Conditioning both on row and column marginals is simular to chisq.test().
chisq.test(Xobs,simulate.p.value=TRUE)

# The Avadex dataset
Xobs <- matrix(c(2,3,6,40),2,2)
rownames(Xobs) <- c("Avadex +","Avadex -")
colnames(Xobs) <- c("Tumor +","Tumor -")

# In this example only the rows appear to be fixed by experimental design.
# As is seen below, conditioning also on the columns is misleading conservative.
chisq.test.simulate(Xobs,"both")
chisq.test.simulate(Xobs,"row")
chisq.test.simulate(Xobs,"total")

# Conditioning both on row and column marginals is simular to chisq.test().
chisq.test(Xobs,simulate.p.value=TRUE)

Design diagram for a linear model

Description

DD computes the Design Diagram for a linear model.

Usage

DD(fixed, random = NULL, data, keep = ~1, center = FALSE, eps = 1e-12)
DD(fixed, random = NULL, data, keep = ~1, center = FALSE, eps = 1e-12)

Arguments

`fixed`	formula with fixed effects. A response may the specified, but this optional.
`random`	formula with random effects. Defaults to `NULL` meaning that there are no other random effects than the residual, which is added to all designs.
`data`	data frame with the explanatory variables and the response (if specified).
`keep`	formula which effects that will not be removed in the collinarity analysis. Defaults to `~1` meaning that the intercept will be kept if it is present.
`center`	boolean deciding whether to centralize numerical predictors when an intercept is present. Defaults to `FALSE`.
`eps`	threshold for deeming singular values to be "zero". Defaults to 1e-12.

Value

An object of class designDiagram-class

Author(s)

Bo Markussen

Examples

# 3-way ANOVA
x <- factor(rep(rep(1:4,times=4),times=4))
y <- factor(rep(rep(1:4,times=4),each=4))
z <- factor(rep(rep(1:4,each=4),each=4))
myDD <- DD(~x*y*z,data=data.frame(x=x,y=y,z=z))
summary(myDD)

#Making the factor diagram closed under minima
mydata <- data.frame(age=rep(c("boy","girl","adult","adult"),4),
                     gender=rep(c("child","child","man","woman"),4))
myDD <- DD(~0+age+gender,data=mydata)
plot(myDD)

# Example of collinearity
mydata <- data.frame(age=rnorm(102),edu=rnorm(102),sex=factor(rep(c(1,2),51)))
mydata <- transform(mydata,exper=age-edu+0.1*rnorm(102))
mydata <- transform(mydata,wage=2*edu+2*exper+rnorm(102))
summary(myDD <- DD(wage~sex*(age+exper+edu),data=mydata))

# growth of rats
antibiotica <- factor(rep(c(0,40),each=6))
vitamin <- factor(rep(rep(c(0,5),each=3),2))
growth <- c(1.30,1.19,1.08,1.26,1.21,1.19,1.05,1.00,1.05,1.52,1.56,1.55)
mydata <- data.frame(antibiotica=antibiotica,vitamin=vitamin,growth=growth)
myDD <- DD(growth~antibiotica*vitamin,data=mydata)
plot(myDD,"MSS")
plot(myDD,"I2")


  # ANCOVA: Non-orthogonal design
  library(isdals)
  data(birthweight)
  plot(DD(weight~sex*I(age-42),data=birthweight),"MSS")
  plot(DD(weight~I(age-42)+sex:I(age-42)+sex,data=birthweight),"MSS")


# 3-way ANOVA
x <- factor(rep(rep(1:4,times=4),times=4))
y <- factor(rep(rep(1:4,times=4),each=4))
z <- factor(rep(rep(1:4,each=4),each=4))
myDD <- DD(~x*y*z,data=data.frame(x=x,y=y,z=z))
summary(myDD)

#Making the factor diagram closed under minima
mydata <- data.frame(age=rep(c("boy","girl","adult","adult"),4),
                     gender=rep(c("child","child","man","woman"),4))
myDD <- DD(~0+age+gender,data=mydata)
plot(myDD)

# Example of collinearity
mydata <- data.frame(age=rnorm(102),edu=rnorm(102),sex=factor(rep(c(1,2),51)))
mydata <- transform(mydata,exper=age-edu+0.1*rnorm(102))
mydata <- transform(mydata,wage=2*edu+2*exper+rnorm(102))
summary(myDD <- DD(wage~sex*(age+exper+edu),data=mydata))

# growth of rats
antibiotica <- factor(rep(c(0,40),each=6))
vitamin <- factor(rep(rep(c(0,5),each=3),2))
growth <- c(1.30,1.19,1.08,1.26,1.21,1.19,1.05,1.00,1.05,1.52,1.56,1.55)
mydata <- data.frame(antibiotica=antibiotica,vitamin=vitamin,growth=growth)
myDD <- DD(growth~antibiotica*vitamin,data=mydata)
plot(myDD,"MSS")
plot(myDD,"I2")


  # ANCOVA: Non-orthogonal design
  library(isdals)
  data(birthweight)
  plot(DD(weight~sex*I(age-42),data=birthweight),"MSS")
  plot(DD(weight~I(age-42)+sex:I(age-42)+sex,data=birthweight),"MSS")

The `designDiagram` class and some basic methods

Description

Objects of class designDiagram as generated by DD is a list with entries as specified below.

response: Logical stating whether a response variable was present.
terms: Named vector with all terms in the design.
random.terms: Vector with the random terms in the design.
relations: Named matrix with relations between variables with the following interpretation: "0"=linear indepent, "<"=row term is a subspace of column, "<-"=row term is a subspace of column term and no other terms are inbetween, ">" and "->" the similar interpretatioin between columns and rows, name=name of minimum between row and column term.
inner: Named matrix of squared inner products of subspaces with nesting subspaces removed. Rounded at order of eps in the call to link{DD}. Used to decide orthogonality of the design.
Nparm: Named vector with the number of parameters for the terms.
df: Named vector with the degrees of freedom for the terms.
SS: Named matrix with Sum-of-Squares if a response variable was specified.
MSS: Named matrix with Mean-Sum-of-Squares if a response variable was specified.
pvalue: Named matrix with p-values for Type-I F-tests. p-values are stated at the collapsed nesting, but F-test are done against the most coarse nested random effect.
sigma2: Named vector of random effects variance estimates.
varcov: Named list of variance-covariance matrix for fixed effects relative to each of the random effects. Rounded at order of eps.
coordinates: Data frame with node coordinates of the terms. Initialized in Sugiyama layout.

Usage

## S3 method for class 'designDiagram'
print(x, ...)

## S3 method for class 'designDiagram'
summary(object, ...)

## S3 method for class 'designDiagram'
update(object, ...)

## S3 method for class 'designDiagram'
plot(
  x,
  circle = "none",
  pvalue = (circle == "MSS"),
  sigma2 = NULL,
  kill = ~1,
  ca = FALSE,
  max.area = NULL,
  relative = 0.01,
  color = NULL,
  circle.scaling = 1,
  arrow.type = arrow(angle = 20, length = unit(4, "mm")),
  xlim = c(0, 1),
  ylim = c(0, 1),
  horizontal = TRUE,
  ...
)
## S3 method for class 'designDiagram'
print(x, ...)

## S3 method for class 'designDiagram'
summary(object, ...)

## S3 method for class 'designDiagram'
update(object, ...)

## S3 method for class 'designDiagram'
plot(
  x,
  circle = "none",
  pvalue = (circle == "MSS"),
  sigma2 = NULL,
  kill = ~1,
  ca = FALSE,
  max.area = NULL,
  relative = 0.01,
  color = NULL,
  circle.scaling = 1,
  arrow.type = arrow(angle = 20, length = unit(4, "mm")),
  xlim = c(0, 1),
  ylim = c(0, 1),
  horizontal = TRUE,
  ...
)

Arguments

`x`	object of class `designDiagram`
`...`	not used.
`object`	object of class `designDiagram`
`circle`	character specifying which circles to draw at the terms: `"none"`=no circles, `"SS"`=a circle with area proportional to the associated Sum-of-Squares, `"MSS"`=a circle with area proportional to the associated Mean-Sum-of-Squares, `"I"`=a circle with area proportional to average information, `"I2"`=a circle with area proportional to average information of the parameter contrasts. Defaults to `"none"`.
`pvalue`	boolean specifying whether p-values should be inserted on the graphs. This is only possible if a response variable was specified. Defaults to `TRUE` is `circle="MSS"` and `FALSE` otherwise.
`sigma2`	vector of random effects variances. Defaults to `NULL`, in which case the estimates are used (if present), otherwise all variances are set to 1.
`kill`	formula specifying which cirlces not to plot. Defaults to `~1` corresponding to not plotting the intercept term (that otherwise may overweight the remaining terms).
`ca`	boolean deciding whether collinearity analysis is visualized. If `NULL` then set `TRUE` for non-orthogonal designs, and to `FALSE` for orthogonal designs. Defaults to `FALSE`.
`max.area`	numeric specifying the used maximal area of circles. If `NULL` then `max.area` is derived from `SS`, `MSS` or `I` according to value of `circle`. Defaults to `NULL`.
`relative`	positive numeric, which specifies needed relative increase for an area to be visualized in the collinearity analysis. Defaults to `0.01`.
`color`	color of circles when `ca=FALSE`. Defaults to `NULL` corresponding to preassigned choice of colors (see details below).
`circle.scaling`	numeric specifying size scaling of circles. Defaults to `1`, which corresponds to the largest circle having a radius that is half of the shortest distance between two nodes.
`arrow.type`	specifying arrow heads via `arrow`. Defaults to `arrow(angle=20,length=unit(4,"mm"))`.
`xlim`	x-range of diagram plot. Defaults to `c(0,1)`.
`ylim`	y-range of diagram plot. Defaults to `c(0,1)`.
`horizontal`	boolean specifying if the design diagram should be drawn horizontally or vertically. Defaults to `TRUE`.

Details

For plot.designDiagram the options circle="SS" and circle="MSS" are only available if a response variable was specified for the design. For circle="I" and circle="I2" the color of the circles visualize the coefficient of variation of the informations. For the computation of the informations the variances of the random effects are either estimated (if a response variable is present), all set to 1 (otherwise), or given via the option sigma2.

If color=NULL and ca=FALSE, then the defaults colors are "lightgreen" for Sum-of-Squares, "lightblue" for Mean-Sum-of-Squares, and a gradient from "limegreen" to "orange" for information spread. To specify a different color gradient in the latter case, then give a vector of two colors.

For update.designDiagram the second argument should be a data frame with new coordinates. This can be usefull for manually setting the coordinates for plotting.

Make emmeans object for an expected dose

Description

Solves linear equations in continuous explanatory variables in order to find the expected dose. A typical application could be to find LD50, i.e. the lethal dose killing 50 percent of the population, from a probit analysis fitted by glm. The associated variance-covariance matrix is found using the Delta method.

Usage

emmeans_ED(
  object,
  specs = ~0,
  left = NULL,
  right = NULL,
  tran = NULL,
  p = 0.5,
  p.name = "probability"
)
emmeans_ED(
  object,
  specs = ~0,
  left = NULL,
  right = NULL,
  tran = NULL,
  p = 0.5,
  p.name = "probability"
)

Arguments

`object`	An object that can be given to `emmeans`. Typically a model fitted by `glm`.
`specs`	As for `emmeans`. Typically as one-sided `formula`. Defaults to `~0`.
`left`	A list specifying the left end point of the linear span of continuous variables in which to measure the ED values. Defaults to `NULL`.
`right`	A list specifying the right end point of the linear span of continuous variables in which to measure the ED values. Defaults to `NULL`.
`tran`	Possible transformation of the scale of the ED values. If given then backtransformation can be done using the technology of the `emmeans`. The default value `tran=NULL` corresponds to no transformation.
`p`	Numeric vector given the targeted predictions. Typically probabilities, where the default value `p=0.5` corresponds to ED50.
`p.name`	The name of the variable containing `p`. If `p` contains more than one value, then this will also appear in `@misc$by.vars` in the `emmGrid` object. Defaults to `p.name="probability"`.

Details

Find the 'expected dose' along a gradient in the space of numeric predictor variables. The options 'left' and 'right' specify the endpoints of this gradient. Typically these endpoints should be chosen as 0 and 1 for the numeric predictor of interest. If both endpoints are chosen as NULL then these choices are taken for all numeric predictors.

Value

An object of class emmGrid-class.

Author(s)

Bo Markussen

Examples

# Data from: C.I. Bliss, "The calculation of the dose-mortality curve", 
# Annals of Applied Biology, 134–167, 1935.

# import data from dobson package
library(dobson)
data(beetle)
m0 <- glm(cbind(y,n-y)~x,data=beetle,family=binomial(link="cloglog"))
# ED50 computation
summary(emmeans_ED(m0,tran="log10"),type="response")
# Visualization using the tidyverse
library(tidyverse)
LCL <- Vectorize(function(y,n) binom.test(y,n)$conf.int[1])
UCL <- Vectorize(function(y,n) binom.test(y,n)$conf.int[2])
beetle <- mutate(beetle,LCL=LCL(y,n),UCL=UCL(y,n))
emmeans_ED(m0,p=seq(0.001,0.999,length.out=100),tran="log10") %>% 
  summary(type="response") %>% as.data.frame() %>% 
  mutate(probability=as.numeric(as.character(probability))) %>%
  ggplot(aes(x=probability,y=response,ymin=asymp.LCL,ymax=asymp.UCL)) + 
  geom_ribbon(alpha=0.2,fill="blue") + geom_line() +
  xlab("Death probability") +
  ylab(expression(expected~dose~CS[2]~mg/l)) +
  geom_errorbarh(aes(xmin=LCL,xmax=UCL,y=10^x),beetle,inherit.aes=FALSE) +
  geom_point(aes(x=y/n,y=10^x),beetle,inherit.aes=FALSE)


# Data from: C.I. Bliss, "The calculation of the dose-mortality curve", 
# Annals of Applied Biology, 134–167, 1935.

# import data from dobson package
library(dobson)
data(beetle)
m0 <- glm(cbind(y,n-y)~x,data=beetle,family=binomial(link="cloglog"))
# ED50 computation
summary(emmeans_ED(m0,tran="log10"),type="response")
# Visualization using the tidyverse
library(tidyverse)
LCL <- Vectorize(function(y,n) binom.test(y,n)$conf.int[1])
UCL <- Vectorize(function(y,n) binom.test(y,n)$conf.int[2])
beetle <- mutate(beetle,LCL=LCL(y,n),UCL=UCL(y,n))
emmeans_ED(m0,p=seq(0.001,0.999,length.out=100),tran="log10") %>% 
  summary(type="response") %>% as.data.frame() %>% 
  mutate(probability=as.numeric(as.character(probability))) %>%
  ggplot(aes(x=probability,y=response,ymin=asymp.LCL,ymax=asymp.UCL)) + 
  geom_ribbon(alpha=0.2,fill="blue") + geom_line() +
  xlab("Death probability") +
  ylab(expression(expected~dose~CS[2]~mg/l)) +
  geom_errorbarh(aes(xmin=LCL,xmax=UCL,y=10^x),beetle,inherit.aes=FALSE) +
  geom_point(aes(x=y/n,y=10^x),beetle,inherit.aes=FALSE)

Minimum between factors

Description

minimum finds the minimum of two factors, i.e. the finest factors that is coarser than both of the factors.

Usage

minimum(x, y, concatenate.names = TRUE)
minimum(x, y, concatenate.names = TRUE)

Arguments

`x`	vector that will be interpreted as a factor.
`y`	vector that will be interpreted as a factor.
`concatenate.names`	boolean. If `TRUE` then the levels of the minimum are constructed as the concatenation of the levels for `x` and `y`. If `FALSE` then the levels of the minimum are given as numbers. Defaults to `TRUE`.

Value

A factor with the minimum.

Author(s)

Bo Markussen

Examples

x <- rep(c("boy","girl","adult","adult"),4)
y <- rep(c("child","child","man","woman"),4)
minimum(x,y)
minimum(x,y,FALSE)

x <- rep(c("boy","girl","adult","adult"),4)
y <- rep(c("child","child","man","woman"),4)
minimum(x,y)
minimum(x,y,FALSE)

Simulate power of Chi-squared tests with conditioning

Description

power.chisq.test.simulate simulates power for tests for 2-way contingency tables based on the Pearson Chi-squared test statistics by simulation under 4 different conditioning scenarios.

Usage

power.chisq.test.simulate(
  x,
  conditioning = "total",
  x0 = NULL,
  sig.level = 0.05,
  B = 10000
)
power.chisq.test.simulate(
  x,
  conditioning = "total",
  x0 = NULL,
  sig.level = 0.05,
  B = 10000
)

Arguments

`x`	matrix specifying the alternative distribution of the contingency table.
`conditioning`	character string specifying the simulation scenario. Defaults to `"total"`. Other possible scenarios are `"row"`, `"col"`, and `"both"`.
`x0`	matrix specifying the null distribution. Defaults to `NULL`, in which case the null is estimated from the alternative `x`.
`sig.level`	significance level used in test. Defaults to 0.05.
`B`	integer specifying the number of replicates used in the Monte Carlo test. Defaults to 10000.

Details

Using conditioning="both" corresponds to selecting simulate.p.value=TRUE in chisq.test. However, conditioning on both row and column marginals appears to be rarely justified in real data. Instead conditioning="total" is the correct choice for testing independence. Similarly, conditioning="row" is recommended when the row marginals e.g. are fixed by experimental design. Both the alternative and the null are simulated under the parametric scenario estimated from the data matrix x. This possibly induces a discrepancy with chisq.test.simulate, where the null also is simulated from the specific data instance. Thus, the problem is that the null distribution depends on the model parameters.

Value

An object of class "power.htest".

Note

The code has not been optimized for speed, and might be slow.

Author(s)

Bo Markussen

Examples

# The Avadex dataset
Xobs <- matrix(c(2,3,6,40),2,2)
rownames(Xobs) <- c("Avadex +","Avadex -")
colnames(Xobs) <- c("Tumor +","Tumor -")

# In this example only the rows appear to be fixed by experimental design.
power.chisq.test.simulate(Xobs,"row")
power.chisq.test.simulate(Xobs,"total")

# The Avadex dataset
Xobs <- matrix(c(2,3,6,40),2,2)
rownames(Xobs) <- c("Avadex +","Avadex -")
colnames(Xobs) <- c("Tumor +","Tumor -")

# In this example only the rows appear to be fixed by experimental design.
power.chisq.test.simulate(Xobs,"row")
power.chisq.test.simulate(Xobs,"total")

Package 'LabApplStat'

Help Index

Simulate Chi-squared tests with conditioning

Description

Usage

Arguments

Details

Value

Note

Author(s)

See Also

Examples

Design diagram for a linear model

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

The designDiagram class and some basic methods

Description

Usage

Arguments

Details

See Also

Make emmeans object for an expected dose

Description

Usage

Arguments

Details

Value

Author(s)

Examples

Minimum between factors

Description

Usage

Arguments

Value

Author(s)

Examples

Simulate power of Chi-squared tests with conditioning

Description

Usage

Arguments

Details

Value

Note

Author(s)

See Also

Examples

The `designDiagram` class and some basic methods