See the "Mathematical description of lgpr models" vignette for more information about how the different options relate to the resulting statistical model.

create_model(
  formula,
  data,
  likelihood = "gaussian",
  prior = NULL,
  c_hat = NULL,
  num_trials = NULL,
  options = NULL,
  prior_only = FALSE,
  verbose = FALSE,
  sample_f = !(likelihood == "gaussian")
)

Arguments

formula

The model formula, where

  • it must contain exactly one tilde (~), with the response variable on the left-hand side and the model terms on the right-hand side

  • terms are separated by a plus (+) sign

  • all variables appearing in formula must be found in data

See the "Model formula syntax" section in the lgp() documentation for instructions on how to specify the model terms.
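The formula rules above can be checked in base R before calling create_model(). The sketch below is illustrative only: check_formula_rules() is a hypothetical helper, not part of lgpr, it does not validate the lgpr term syntax itself, and the terms age and id are placeholder variables.

```r
# Hypothetical helper (not part of lgpr): checks the formula rules listed
# above using base R only. It does not validate the lgpr term syntax.
check_formula_rules <- function(formula, data) {
  txt <- paste(deparse(formula), collapse = " ")
  n_tilde <- lengths(regmatches(txt, gregexpr("~", txt, fixed = TRUE)))
  # Exactly one tilde, and a response variable on its left-hand side
  if (n_tilde != 1 || length(formula) != 3) {
    stop("formula must contain exactly one tilde, with a response on the left")
  }
  # all.vars() returns variable names, not function names used in terms
  missing_vars <- setdiff(all.vars(formula), names(data))
  if (length(missing_vars) > 0) {
    stop("variables not found in data: ", paste(missing_vars, collapse = ", "))
  }
  invisible(TRUE)
}

dat <- data.frame(y = c(1.2, 0.8, 1.5), age = c(10, 20, 30),
                  id = factor(c("a", "a", "b")))
check_formula_rules(y ~ age + id, dat)      # passes silently
# check_formula_rules(y ~ age + sex, dat)   # error: sex is not in dat
# check_formula_rules(~ age + id, dat)      # error: no response variable
```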

data

A data.frame where each column corresponds to one variable, and each row is one observation. Continuous covariates and the response variable must have type "numeric" and categorical covariates must have type "factor". Missing values should be indicated with NaN or NA. The response variable cannot contain missing values. Column names should not contain trailing or leading underscores.
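A data.frame satisfying the requirements above might be built and checked as follows. This is a sketch: check_data_rules() is a hypothetical helper written for illustration, not an lgpr function, and the column names are placeholders.

```r
# Hypothetical helper (not part of lgpr): checks the data requirements
# described above using base R only.
check_data_rules <- function(data, response) {
  stopifnot(is.data.frame(data), response %in% names(data))
  for (nm in names(data)) {
    col <- data[[nm]]
    if (!is.numeric(col) && !is.factor(col)) {
      stop("column '", nm, "' must be numeric or factor")
    }
  }
  y <- data[[response]]
  if (!is.numeric(y)) stop("the response must be numeric")
  if (anyNA(y)) stop("the response cannot contain missing values")  # NA or NaN
  bad <- grepl("^_|_$", names(data))
  if (any(bad)) {
    stop("leading or trailing underscores in: ",
         paste(names(data)[bad], collapse = ", "))
  }
  invisible(TRUE)
}

dat <- data.frame(
  y   = c(1.2, 0.8, 1.5, 1.1),         # response: numeric, no missing values
  age = c(10, 20, NA, 40),             # continuous covariate: NA is allowed
  id  = factor(c("a", "a", "b", "b"))  # categorical covariate: a factor
)
check_data_rules(dat, response = "y")  # passes silently
```

Since R 4.0.0, data.frame() no longer converts character columns to factors by default, so categorical covariates usually need an explicit factor() call.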

likelihood

Determines the observation model. Must be either "gaussian" (default), "poisson", "nb" (negative binomial), "binomial" or "bb" (beta binomial).

prior

A named list defining the prior distribution of the model (hyper)parameters. See the "Defining priors" section in the lgp() documentation.

c_hat

The GP mean. This should be given only if sample_f is TRUE; otherwise the GP always has zero mean. If sample_f is TRUE, c_hat can be either a vector of length dim(data)[1] (one value per observation) or a single real number defining a constant GP mean. If it is not specified and sample_f is TRUE, c_hat is set to

  • c_hat = mean(y), if likelihood is "gaussian",

  • c_hat = log(mean(y)), if likelihood is "poisson" or "nb",

  • c_hat = log(p/(1-p)), where p = mean(y/num_trials), if likelihood is "binomial" or "bb",

where y denotes the response variable measurements.
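The default c_hat rules above can be written out as a small function. This is a sketch that mirrors the listed formulas; default_c_hat() is a hypothetical name and not lgpr's internal code.

```r
# Hypothetical helper (not lgpr's internal code): default GP mean c_hat
# following the rules listed above.
default_c_hat <- function(y, likelihood = "gaussian", num_trials = NULL) {
  switch(likelihood,
    gaussian = mean(y),
    poisson  = ,                  # falls through to the "nb" branch
    nb       = log(mean(y)),
    binomial = ,                  # falls through to the "bb" branch
    bb       = {
      if (is.null(num_trials)) stop("num_trials is required")
      p <- mean(y / num_trials)   # average success proportion
      log(p / (1 - p))            # logit of p
    },
    stop("unknown likelihood: ", likelihood)
  )
}

default_c_hat(c(2, 4, 6))                            # 4
default_c_hat(c(2, 4, 6), "poisson")                 # log(4)
default_c_hat(c(3, 5), "binomial", num_trials = 8)   # p = 0.5, so 0
```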

num_trials

This argument (number of trials) is needed only when likelihood is "binomial" or "bb". It must have length one or length equal to the number of data points. Setting num_trials=1 and likelihood="binomial" corresponds to a Bernoulli observation model.
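The Bernoulli special case can be verified numerically with base R's dbinom(). The length handling at the end is an illustrative sketch of the "length one or one per data point" rule, not lgpr's own code.

```r
# With num_trials = 1 the binomial probability mass function reduces to
# the Bernoulli one: P(y) = p^y * (1 - p)^(1 - y) for y in {0, 1}.
p <- 0.3
y <- c(0, 1, 1, 0)
binom_pmf     <- dbinom(y, size = 1, prob = p)
bernoulli_pmf <- p^y * (1 - p)^(1 - y)
all.equal(binom_pmf, bernoulli_pmf)  # TRUE

# Sketch of the length rule: a single value applies to every observation,
# otherwise one value per observation is required.
num_trials <- 1
n <- length(y)
stopifnot(length(num_trials) %in% c(1, n))
trials_per_obs <- rep_len(num_trials, n)
```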

options

A named list with the following possible fields:

  • delta Amount of jitter added to ensure that covariance matrices are positive definite.

  • vm_params Variance mask function parameters (numeric vector of length 2).

If options is NULL, default options are used. The defaults are equivalent to options = list(delta = 1e-8, vm_params = c(0.025, 1)).
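To see why a small delta helps, consider a kernel matrix with duplicated inputs: two identical rows make it singular, so a Cholesky factorization fails. The sketch below assumes, as is standard practice, that the jitter is added to the diagonal; the exponentiated quadratic kernel eq_kernel() is a hypothetical example, not lgpr's implementation.

```r
# Hypothetical exponentiated quadratic kernel (for illustration only)
eq_kernel <- function(x, alpha = 1, ell = 1) {
  d2 <- outer(x, x, "-")^2
  alpha^2 * exp(-d2 / (2 * ell^2))
}

x <- c(1, 1, 2)   # duplicated input -> two identical rows -> singular K
K <- eq_kernel(x)
delta <- 1e-8     # the default jitter listed above

# Without jitter, chol() signals an error because K is not positive definite
chol_raw <- tryCatch(chol(K), error = function(e) NULL)
# Adding delta to the diagonal makes the factorization succeed
chol_jitter <- chol(K + delta * diag(length(x)))

is.null(chol_raw)  # TRUE
```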

prior_only

Should the likelihood be ignored, so that the model only samples from the prior? See also sample_param_prior(), which can be used with any lgpmodel and whose runtime is independent of the number of observations.

verbose

Should some informative messages be printed?

sample_f

Determines if the latent function values are sampled (must be TRUE if likelihood is not "gaussian"). If this is TRUE, the response variable will be normalized to have zero mean and unit variance.
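The default value of sample_f in the usage above depends only on likelihood, and the normalization mentioned here is ordinary standardization. The sketch below evaluates the default expression for each likelihood and standardizes a toy response; it assumes scaling by the sample standard deviation, which may differ in detail from what lgpr does internally.

```r
# Default value of sample_f from the usage above, for each likelihood
likelihoods <- c("gaussian", "poisson", "nb", "binomial", "bb")
default_sample_f <- setNames(!(likelihoods == "gaussian"), likelihoods)
default_sample_f  # FALSE only for "gaussian"

# What "zero mean and unit variance" means for a response vector
# (sketch; assumes scaling by the sample standard deviation)
y <- c(2, 4, 6, 8)
y_norm <- (y - mean(y)) / sd(y)
c(mean = mean(y_norm), var = var(y_norm))  # approximately 0 and 1
```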

Value

An object of class lgpmodel, containing the Stan input created by parsing the specified formula, prior, and other options.

See also

Other main functions: draw_pred(), get_draws(), lgp(), pred(), prior_pred(), sample_model()