Generate an artificial longitudinal data set

simulate_data(
  N,
  t_data,
  covariates = c(),
  names = NULL,
  relevances = c(1, 1, rep(1, length(covariates))),
  n_categs = rep(2, sum(covariates %in% c(2, 3))),
  t_jitter = 0,
  lengthscales = rep(12, 2 + sum(covariates %in% c(0, 1, 2))),
  f_var = 1,
  noise_type = "gaussian",
  snr = 3,
  phi = 1,
  gamma = 0.2,
  N_affected = round(N/2),
  t_effect_range = "auto",
  t_observed = "after_0",
  c_hat = 0,
  dis_fun = "gp_warp_vm",
  bin_kernel = FALSE,
  steepness = 0.5,
  vm_params = c(0.025, 1),
  continuous_info = list(mu = c(pi/8, pi, -0.5), lambda = c(pi/8, pi, 1)),
  N_trials = 1,
  force_zeromean = TRUE
)

Arguments

N

Number of individuals.

t_data

Measurement times (the same for each individual, unless t_jitter > 0, in which case they are perturbed).

covariates

Integer vector that defines the types of covariates (other than id and age). If not given, only the id and age covariates are created. Different integers correspond to the following covariate types:

0 = disease-related age
1 = other continuous covariate
2 = a categorical covariate that interacts with age
3 = a categorical covariate that acts as a group offset
4 = a categorical covariate that acts as a group offset AND is restricted to have value 0 for controls and 1 for cases

names

Covariate names.

relevances

Relative relevance of each component. Must be a vector such that
length(relevances) = 2 + length(covariates).
First two values define the relevance of the individual-specific age and shared age component, respectively.

n_categs

An integer vector defining the number of categories for each categorical covariate, so that length(n_categs) equals the number of 2's and 3's in the covariates vector.

t_jitter

Standard deviation of the jitter added to the given measurement times.
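
As a rough illustration of what a jitter standard deviation means, the sketch below perturbs a shared measurement grid separately for each individual. It assumes the jitter is zero-mean Gaussian, which the text above does not state, and the name t_perturbed is hypothetical.

```r
set.seed(1)

N <- 4                           # number of individuals
t_data <- c(6, 12, 24, 36, 48)   # shared measurement times
t_jitter <- 0.5                  # standard deviation of the jitter

# ASSUMPTION: the jitter is zero-mean Gaussian noise added independently
# to every measurement time of every individual.
# Result is an N x length(t_data) matrix, one row per individual.
t_perturbed <- t(replicate(
  N,
  t_data + stats::rnorm(length(t_data), mean = 0, sd = t_jitter)
))

print(round(t_perturbed, 2))
```

With t_jitter = 0 every row would equal t_data.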

lengthscales

A vector such that
length(lengthscales) = 2 + sum(covariates %in% c(0,1,2)).
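
The length constraints on relevances, n_categs, and lengthscales can be checked in plain R before calling the function. The sketch below uses a hypothetical covariates vector and the default expressions from the usage section; it does not call simulate_data itself.

```r
# Hypothetical covariate specification: one disease-related age (0),
# one other continuous covariate (1), and two categorical covariates (2, 3).
covariates <- c(0, 1, 2, 3)

# Default expressions copied from the usage signature.
relevances <- c(1, 1, rep(1, length(covariates)))
n_categs <- rep(2, sum(covariates %in% c(2, 3)))
lengthscales <- rep(12, 2 + sum(covariates %in% c(0, 1, 2)))

# Verify the documented length constraints.
stopifnot(
  length(relevances) == 2 + length(covariates),
  length(n_categs) == sum(covariates %in% c(2, 3)),
  length(lengthscales) == 2 + sum(covariates %in% c(0, 1, 2))
)

cat("relevances:", length(relevances),
    "n_categs:", length(n_categs),
    "lengthscales:", length(lengthscales), "\n")
```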

f_var

Variance of f.

noise_type

Either "gaussian", "poisson", "nb" (negative binomial), "binomial", or "bb" (beta-binomial).

snr

The desired signal-to-noise ratio. This argument is valid only when noise_type is "gaussian".
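
One common way to turn a signal-to-noise ratio into a noise level is sketched below. It assumes snr is the ratio of signal variance to noise variance; the text above does not give the exact definition the function uses, so treat this as an illustration only. The signal f here is a hypothetical stand-in.

```r
set.seed(2)

# Hypothetical noiseless signal evaluated at 200 points.
f <- sin(seq(0, 2 * pi, length.out = 200))
snr <- 3

# ASSUMPTION: snr = var(signal) / var(noise), so the noise standard
# deviation is sqrt(var(f) / snr).
sigma_n <- sqrt(stats::var(f) / snr)

# Noisy observations under Gaussian noise.
y <- f + stats::rnorm(length(f), mean = 0, sd = sigma_n)

cat("noise sd:", round(sigma_n, 3), "\n")
```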

phi

The inverse overdispersion parameter for negative binomial data. The variance is g + g^2/phi, where g is the mean.
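
The variance formula can be checked numerically with base R, whose stats::rnbinom uses the same mean/size parametrization (variance mu + mu^2/size). The values of g and phi below are arbitrary choices for illustration, with g standing for the mean.

```r
set.seed(123)

g <- 5     # mean of the negative binomial distribution
phi <- 2   # inverse overdispersion (the "size" argument of rnbinom)

# Draw a large sample using the mean/size parametrization.
y <- stats::rnbinom(1e5, size = phi, mu = g)

theoretical_var <- g + g^2 / phi   # 5 + 25 / 2 = 17.5
empirical_var <- stats::var(y)

cat("theoretical:", theoretical_var,
    "empirical:", round(empirical_var, 2), "\n")
```

Larger phi brings the variance closer to the Poisson value g.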

gamma

The dispersion parameter for beta-binomial data.

N_affected

Number of diseased individuals that are affected by the disease. This defaults to the number of diseased individuals. This argument can only be given if covariates contains a zero.

t_effect_range

Time interval from which the disease effect times are sampled uniformly. Alternatively, this can be any function that returns the (possibly randomly generated) real disease effect time for one individual.

t_observed

Determines how the disease effect time is observed. This can be any function that takes the real disease effect time as an argument and returns the (possibly randomly generated) observed onset/initiation time. Alternatively, this can be a string of the form "after_n" or "random_p" or "exact".
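
The function forms of t_effect_range and t_observed might look like the sketch below. Both function names, the time window, and the delay distribution are hypothetical, and whether the effect-time function is called without arguments is an assumption; only the "takes the real effect time, returns the observed time" contract comes from the text above.

```r
set.seed(42)

# Hypothetical sampler for the real disease effect time of one individual.
# ASSUMPTION: it is called with no arguments.
sample_effect_time <- function() {
  stats::runif(1, min = 12, max = 36)
}

# Hypothetical observation model: the onset is observed after a random,
# exponentially distributed delay (mean 3 time units).
observe_with_delay <- function(t_real) {
  t_real + stats::rexp(1, rate = 1 / 3)
}

t_real <- sample_effect_time()
t_obs <- observe_with_delay(t_real)
cat("real:", round(t_real, 2), "observed:", round(t_obs, 2), "\n")
```

Functions like these would be passed as t_effect_range = sample_effect_time and t_observed = observe_with_delay.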

c_hat

A constant added to f.

dis_fun

A function or a string that defines the disease effect. If this is a function, that function is used to generate the effect. If dis_fun is "gp_vm" or "gp_ns", the disease component is drawn from a nonstationary GP prior ("vm" is the variance-masked version of it).

bin_kernel

Should the binary kernel be used for categorical covariates? If this is TRUE, the effect will exist only for group 1.

steepness

Steepness of the input warping function. This is only used if the disease component is in the model.

vm_params

Parameters of the variance mask function. This is only needed if useMaskedVarianceKernel = TRUE.

continuous_info

Info for generating continuous covariates. Must be a list containing fields lambda and mu, which have length 3. The continuous covariates are generated so that x <- sin(a*t + b) + c, where

t <- seq(0, 2*pi, length.out = k)
a <- mu[1] + lambda[1]*stats::runif(1)
b <- mu[2] + lambda[2]*stats::runif(1)
c <- mu[3] + lambda[3]*stats::runif(1)
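
The recipe above can be run as a standalone sketch for a single covariate. It uses the default continuous_info from the usage section and assumes that k is the number of measurement points; the variables are renamed (t_grid, offset) only to avoid masking the base R functions t and c.

```r
set.seed(7)

# Default continuous_info from the usage signature.
continuous_info <- list(mu = c(pi / 8, pi, -0.5), lambda = c(pi / 8, pi, 1))
mu <- continuous_info$mu
lambda <- continuous_info$lambda

# ASSUMPTION: k is the number of measurement points.
k <- 5

t_grid <- seq(0, 2 * pi, length.out = k)
a <- mu[1] + lambda[1] * stats::runif(1)        # frequency
b <- mu[2] + lambda[2] * stats::runif(1)        # phase shift
offset <- mu[3] + lambda[3] * stats::runif(1)   # vertical offset ("c" above)

# One simulated continuous covariate trajectory.
x <- sin(a * t_grid + b) + offset
print(round(x, 3))
```

With the default settings, each value of x lies between -1.5 and 1.5, since the offset falls in [-0.5, 0.5] and the sine term in [-1, 1].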

N_trials

The number of trials parameter for binomial data.

force_zeromean

Should each component (excluding the disease age component) be forced to have a zero mean?

Value

An object of class lgpsim.

Examples

# Generate Gaussian data
dat <- simulate_data(N = 4, t_data = c(6, 12, 24, 36, 48), snr = 3)

# Generate negative binomially (NB) distributed count data
dat <- simulate_data(
  N = 6, t_data = seq(2, 10, by = 2), noise_type = "nb",
  phi = 2
)