Creates an additive Gaussian process model using
create_model
and fits it using sample_model
.
See the
Mathematical description of lgpr models
vignette for more information about the connection between different options
and the created statistical model.
lgp(
formula,
data,
likelihood = "gaussian",
prior = NULL,
c_hat = NULL,
num_trials = NULL,
options = NULL,
prior_only = FALSE,
verbose = FALSE,
sample_f = !(likelihood == "gaussian"),
quiet = FALSE,
skip_postproc = sample_f,
...
)
The model formula, where
it must contain exatly one tilde (~
), with response
variable on the left-hand side and model terms on the right-hand side
terms are be separated by a plus (+
) sign
all variables appearing in formula
must be
found in data
See the "Model formula syntax" section below (lgp
) for
instructions on how to specify the model terms.
A data.frame
where each column corresponds to one
variable, and each row is one observation. Continuous covariates and the
response variable must have type "numeric"
and categorical covariates
must have type "factor"
. Missing values should be indicated with
NaN
or NA
. The response variable cannot contain missing
values. Column names should not contain trailing or leading underscores.
Determines the observation model. Must be either
"gaussian"
(default), "poisson"
, "nb"
(negative
binomial), "binomial"
or "bb"
(beta binomial).
A named list, defining the prior distribution of model
(hyper)parameters. See the "Defining priors" section below
(lgp
).
The GP mean. This should only be given if sample_f
is
TRUE
, otherwise the GP will always have zero mean. If sample_f
is TRUE
, the given c_hat
can be a vector of length
dim(data)[1]
, or a real number defining a constant GP mean. If not
specified and sample_f
is TRUE
, c_hat
is set to
c_hat = mean(y)
, if likelihood
is "gaussian"
,
c_hat =
log(mean(y))
if likelihood
is
"poisson"
or "nb"
,
c_hat =
log(p/(1-p))
, where
p = mean(y/num_trials)
if likelihood
is "binomial"
or "bb"
,
where y
denotes the response variable measurements.
This argument (number of trials) is only needed when
likelihood is "binomial"
or "bb"
. Must have length one or
equal to the number of data points. Setting num_trials=1
and
likelihood="binomial"
corresponds to Bernoulli observation model.
A named list with the following possible fields:
delta
Amount of added jitter to ensure positive definite
covariance matrices.
vm_params
Variance mask function parameters (numeric
vector of length 2).
If options
is NULL
, default options are used. The defaults
are equivalent to
options = list(delta = 1e-8, vm_params = c(0.025, 1))
.
Should likelihood be ignored? See also
sample_param_prior
which can be used for any
lgpmodel, and whose runtime is independent of the number of
observations.
Can messages be printed during model creation? Has no
effect if quiet=TRUE
.
Determines if the latent function values are sampled
(must be TRUE
if likelihood is not "gaussian"
). If this is
TRUE
, the response variable will be normalized to have zero mean
and unit variance.
Should all output messages be suppressed? You need to set
also refresh=0
if you want to suppress also the progress update
messages from sampling
.
Should all postprocessing be skipped? If this is
TRUE
, the returned lgpfit object will likely be
much smaller (if sample_f=FALSE
).
Optional arguments passed to
sampling
or optimizing
.
Returns an object of the S4 class lgpfit.
There are two ways to define the model formula:
Using a common formula
-like syntax, like in
y ~ age +
age|id
+ sex
. Terms can consist of a
single variable, such as age
, or an interaction of two variables,
such as age|id
. In single-variable terms, the variable can be either
continuous (numeric) or categorical (factor), whereas in interaction terms
the variable on the left-hand side of the vertical bar (|
) has to
be continuous and the one on the right-hand side has to be categorical.
Formulae specified using this syntax are translated to the advanced format
so that
single-variable terms become gp(x)
if
variable x
is numeric and zs(x)
if x
is a factor
interaction terms x|z
become gp(x)*zs(z)
Using the advanced syntax, like in y ~ gp(age) +
gp(age)*zs(id) +
het(id)*gp_vm(disAge)
.
This creates lgprhs objects, which consist of
lgpterms, which consist of lgpexprs.
This approach must be used if creating nonstationary, heterogeneous or
temporally uncertain components.
Either one of the approaches should be used and they should not be mixed.
The prior
argument must be a named list, like
list(alpha=student_t(4), wrp=igam(30,10))
. See examples in tutorials.
Possible allowed names are
"alpha"
= component magnitude parameters
"ell"
= component lengthscale parameters
"wrp"
= input warping steepness parameters
"sigma"
= noise magnitude (Gaussian obs. model)
"phi"
= inv. overdispersion (negative binomial obs. model)
"gamma"
= overdispersion (beta-binomial obs. model)
"beta"
= heterogeneity parameters
"effect_time"
= uncertain effect time parameters
"effect_time_info"
= additional options for the above
See priors
for functions that can be
used to define the list elements. If a parameter of a model is not given
in this list, a default prior will be used for it.
It is not recommended to use default priors blindly. Rather, priors should
be specified according to the knowledge about the problem at hand, as in any
Bayesian analysis. In lgpr
this is especially important when
Using a non-Gaussian likelihood or otherwise setting
sample_f = TRUE
. In this case the response variable is not
normalized, so the scale on which the data varies must be taken into
account when defining priors of the signal magnitude parameters
alpha
and possible noise parameters (sigma
, phi
,
gamma
). Also it should be checked if c_hat
is set in a
sensible way.
Using a model that contains a gp_ns(x)
or gp_vm(x)
expression in its formula. In this case the corresponding covariate
x
is not normalized, and the prior for the input warping steepness
parameter wrp
must be set according to the expected width of the
window in which the nonstationary effect of x
occurs. By default,
the width of this window is about 36, which has been set assuming that
the unit of x
is months.
Other main functions:
create_model()
,
draw_pred()
,
get_draws()
,
pred()
,
prior_pred()
,
sample_model()