Title: | Fit Vector Fields and Potential Landscapes from Intensive Longitudinal Data |
---|---|
Description: | A toolbox for estimating vector fields from intensive longitudinal data and for constructing potential landscapes from them. The vector fields can be estimated with two nonparametric methods: the Multivariate Vector Field Kernel Estimator (MVKE) by Bandi & Moloche (2018) <doi:10.1017/S0266466617000305> and the Sparse Vector Field Consensus (SparseVFC) algorithm by Ma et al. (2013) <doi:10.1016/j.patcog.2013.05.017>. The potential landscapes can be constructed with a simulation-based approach using the 'simlandr' package (Cui et al., 2021) <doi:10.31234/osf.io/pzva3>, or with the Bhattacharya et al. (2011) method for path integration <doi:10.1186/1752-0509-5-85>. |
Authors: | Jingmeng Cui [aut, cre] |
Maintainer: | Jingmeng Cui <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1.0.9000 |
Built: | 2025-02-20 05:32:09 UTC |
Source: | https://github.com/Sciurus365/fitlandr |
Add a grid to a vectorfield object to enable linear interpolation
add_interp_grid(vf, lims = vf$lims, n = vf$n)
vf |
A vectorfield object. |
lims |
The limits of the range for the vector field estimation as c(xmin, xmax, ymin, ymax). |
n |
The number of equally spaced points in each axis, at which the vectors are to be estimated. |
A vectorfield object with an interp_grid field.
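Conceptually, an interpolation grid lets later evaluations of the field be computed from precomputed values at the grid nodes instead of re-running the estimator. The base-R sketch below illustrates bilinear interpolation of a 2D field on a regular grid. make_grid_values() and bilinear_interp() are hypothetical helpers written for this illustration; the package's own grid layout and its handling of points outside lims may differ.

```r
# Hypothetical helper: evaluate a 2D vector field f on an n x n grid over
# lims = c(xmin, xmax, ymin, ymax)
make_grid_values <- function(f, lims, n) {
  xs <- seq(lims[1], lims[2], length.out = n)
  ys <- seq(lims[3], lims[4], length.out = n)
  vx <- matrix(NA_real_, n, n)  # vx[i, j]: x component at (xs[i], ys[j])
  vy <- matrix(NA_real_, n, n)  # vy[i, j]: y component at (xs[i], ys[j])
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      v <- f(c(xs[i], ys[j]))
      vx[i, j] <- v[1]
      vy[i, j] <- v[2]
    }
  }
  list(xs = xs, ys = ys, vx = vx, vy = vy)
}

# Hypothetical helper: bilinear interpolation of grid values z at p = c(x, y)
bilinear_interp <- function(xs, ys, z, p) {
  # lower-left corner of the enclosing cell; all.inside keeps i in 1..(n - 1),
  # so points outside the grid are extrapolated from the nearest edge cell
  i <- findInterval(p[1], xs, all.inside = TRUE)
  j <- findInterval(p[2], ys, all.inside = TRUE)
  tx <- (p[1] - xs[i]) / (xs[i + 1] - xs[i])
  ty <- (p[2] - ys[j]) / (ys[j + 1] - ys[j])
  (1 - tx) * (1 - ty) * z[i, j] + tx * (1 - ty) * z[i + 1, j] +
    (1 - tx) * ty * z[i, j + 1] + tx * ty * z[i + 1, j + 1]
}

# Toy field that is linear in x and y
field <- function(p) c(2 * p[1] + p[2], p[1] - 3 * p[2])
g <- make_grid_values(field, lims = c(-1, 1, -1, 1), n = 11)
p <- c(0.33, -0.47)
v_interp <- c(bilinear_interp(g$xs, g$ys, g$vx, p),
              bilinear_interp(g$xs, g$ys, g$vy, p))
print(v_interp)
```

Bilinear interpolation is exact for fields of the form a + bx + cy + dxy, so the toy field is reproduced up to rounding; for curved fields the interpolation error shrinks as n grows.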
Find equilibrium points for a vector field
find_eqs(vf, starts, jacobian_params = list(), ...)
vf |
A vectorfield object. |
starts |
A vector indicating the starting value for solving the equilibrium point, or a list of vectors providing multiple starting values together. |
jacobian_params |
Parameters passed to |
... |
Parameters passed to |
A list of equilibrium points and their details. Use print.vectorfield_eqs() to inspect it.
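Equilibrium points are positions where the drift vanishes, and their stability can be read from the eigenvalues of the Jacobian at those points. The sketch below runs a Newton-Raphson search from several starting values, using a forward-difference Jacobian, on a toy bistable field. It is a base-R illustration only: the solver and Jacobian routines to which find_eqs() passes ... and jacobian_params are not named in this section, and num_jacobian() and newton_eq() are hypothetical.

```r
# Hypothetical helper: forward-difference Jacobian of f at x
num_jacobian <- function(f, x, eps = 1e-6) {
  fx <- f(x)
  J <- matrix(0, length(fx), length(x))
  for (k in seq_along(x)) {
    xk <- x
    xk[k] <- xk[k] + eps
    J[, k] <- (f(xk) - fx) / eps
  }
  J
}

# Hypothetical helper: Newton-Raphson iteration for f(x) = 0 from one start
newton_eq <- function(f, start, tol = 1e-10, maxit = 100) {
  x <- start
  for (it in seq_len(maxit)) {
    step <- solve(num_jacobian(f, x), f(x))
    x <- x - step
    if (sqrt(sum(step^2)) < tol) break
  }
  # stable if all Jacobian eigenvalues have negative real parts
  ev <- eigen(num_jacobian(f, x), only.values = TRUE)$values
  list(x = x, eigenvalues = ev, stable = all(Re(ev) < 0))
}

# Toy bistable field with equilibria at (-1, 0), (0, 0) and (1, 0)
f <- function(p) c(p[1] - p[1]^3, -p[2])
starts <- list(c(-1.5, 0.3), c(0.2, -0.1), c(1.4, 0.5))
eqs <- lapply(starts, function(s) newton_eq(f, s))
print(t(sapply(eqs, function(e) e$x)))
print(sapply(eqs, function(e) e$stable))
```

Multiple starting values are needed because each Newton run only finds the equilibrium in whose basin it starts; here the outer points converge to the stable wells and the middle one to the saddle at the origin.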
This function is a wrapper of the MVKE method (see MVKE()) that produces a 2D potential landscape from 1D data. MVKE is a nonparametric method that estimates the gradient of the data with a kernel estimator; the potential landscape is then constructed by integrating this gradient.
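The integration step can be sketched for a known one-dimensional drift mu: the potential is U(x) = -∫ mu(s) ds, taken from the lower limit to x. The argument list of fit_2d_ld() (subdivisions, rel.tol, abs.tol, stop.on.error, keep.xy, aux) mirrors stats::integrate(), which suggests that function is used for this step, but that is an inference. Here mu is a fixed toy function standing in for the MVKE estimate, and potential_1d() is a hypothetical helper.

```r
# Toy drift dx/dt = mu(x) with wells at x = -1 and x = 1 (stand-in for an MVKE estimate)
mu <- function(x) x - x^3

# Hypothetical helper: U(x) = -integral of mu from lims[1] to x on an n-point grid
potential_1d <- function(mu, lims, n = 200L, ...) {
  xs <- seq(lims[1], lims[2], length.out = n)
  # integrate mu over each grid interval; extra arguments go to stats::integrate()
  pieces <- vapply(seq_len(n - 1), function(k) {
    stats::integrate(mu, lower = xs[k], upper = xs[k + 1], ...)$value
  }, numeric(1))
  data.frame(x = xs, U = -c(0, cumsum(pieces)))  # U = 0 at the left limit
}

ld <- potential_1d(mu, lims = c(-1.5, 1.5), n = 301L)
# local minima: grid points where the slope of U changes from negative to positive
is_min <- c(FALSE, diff(sign(diff(ld$U))) > 0, FALSE)
print(ld$x[is_min])
```

The last two lines locate local minima on the grid, similar in spirit to what summary() is described as doing for a fitted landscape; their precision is limited by the grid spacing.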
fit_2d_ld( data, x, lims, n = 200L, vector_position = "start", na_action = "omit_data_points", method = c("MVKE"), subdivisions = 100L, rel.tol = .Machine$double.eps^0.25, abs.tol = rel.tol, stop.on.error = TRUE, keep.xy = FALSE, aux = NULL, ... )
## S3 method for class '2d_MVKE_landscape'
summary(object, ...)
data |
A data frame or matrix containing the data. The data frame should contain at least one column, with the column name indicated by x. |
x |
The column name of the data frame that represents the dimension for landscape construction. |
lims |
The limits of the range for the landscape calculation as |
n |
The number of equally spaced points in the axis, at which the landscape is to be estimated. |
vector_position |
One of "start", "middle", or "end", representing the position of the vectors. If "start", for example, the starting point of a vector is regarded as the position of the vector. |
na_action |
One of "omit_data_points" or "omit_vectors". If using "omit_data_points", then only the |
method |
The method used to estimate the gradient. Currently only "MVKE" is supported. |
subdivisions |
the maximum number of subintervals. |
rel.tol |
relative accuracy requested. |
abs.tol |
absolute accuracy requested. |
stop.on.error |
logical. If true (the default) an error stops the function. If false some errors will give a result with a warning in the message component. |
keep.xy |
unused. For compatibility with S. |
aux |
unused. For compatibility with S. |
... |
Not used. |
object |
An object of class 2d_MVKE_landscape. |
A 2d_MVKE_landscape object, which contains the following components:
dist: A data frame containing the estimated potential landscape. The data frame has two columns, x and U, where x is the position and U is the potential.
p: A ggplot object containing the plot of the potential landscape.
summary(`2d_MVKE_landscape`): Find the local minima of the 2D potential landscape.
library(fitlandr)
# generate data
single_output_grad <- simlandr::sim_fun_grad(length = 200, seed = 1614)
# fit the landscape
l <- fit_2d_ld(single_output_grad, "x")
summary(l)
plot(l)
# different behaviors for different `na_action` choices
l1 <- fit_2d_ld(data.frame(x = c(1,2,1,2,NA,NA,NA,10,11,10,11)), "x")
plot(l1)
l2 <- fit_2d_ld(data.frame(x = c(1,2,1,2,NA,NA,NA,10,11,10,11)), "x", na_action = "omit_vectors")
plot(l2)
Estimate a 2D vector field from intensive longitudinal data. Two methods can be used: Multivariate Vector Field Kernel Estimator (MVKE, using MVKE()) or Sparse Vector Field Consensus (SparseVFC, using SparseVFC::SparseVFC()). Note that the input data are automatically normalized before being passed to the estimation engines, so that the default parameter settings are close to optimal. Therefore, you do not need to rescale the parameters of MVKE() or SparseVFC::SparseVFC(). We suggest using the MVKE method for psychological data because it has more realistic assumptions and produces more reasonable output.
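The sketch below makes two data-preparation steps concrete: standardizing the variables, and turning consecutive observations into vectors placed according to vector_position ("start", "middle", or "end"). The z-score normalization is an assumption, since the exact normalization used by fit_2d_vf() is not stated here, and make_vectors() is a hypothetical helper; in particular, its handling of missing values is not necessarily what either na_action option does.

```r
# Hypothetical helper: displacement vectors between consecutive rows of a
# 2-column matrix; vector_position decides which point each vector is attached to
make_vectors <- function(m, vector_position = c("start", "middle", "end")) {
  vector_position <- match.arg(vector_position)
  from <- m[-nrow(m), , drop = FALSE]
  to <- m[-1, , drop = FALSE]
  pos <- switch(vector_position,
    start = from,
    middle = (from + to) / 2,
    end = to
  )
  v <- to - from
  # drop any vector that involves a missing value (a simplification)
  ok <- stats::complete.cases(pos, v)
  list(pos = pos[ok, , drop = FALSE], v = v[ok, , drop = FALSE])
}

set.seed(1)
raw <- cbind(x = cumsum(stats::rnorm(50, sd = 5)) + 100,
             y = cumsum(stats::rnorm(50)))
z <- scale(raw)                      # assumed normalization: z-scores per column
vec <- make_vectors(z, "middle")
```

Standardizing first puts both variables on a comparable scale, which is why a fixed default such as a kernel bandwidth can be reasonable across data sets.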
fit_2d_vf( data, x, y, lims, n = 20, vector_position = "start", na_action = "omit_data_points", method = c("MVKE", "VFC"), ... )
data |
The data set used for estimating the vector field. Should be a data frame or a matrix. |
x, y |
Character strings giving the names of the two variables. |
lims |
The limits of the range for the vector field estimation as c(xmin, xmax, ymin, ymax). |
n |
The number of equally spaced points in each axis, at which the vectors are to be estimated. |
vector_position |
One of "start", "middle", or "end", representing the position of the vectors. If "start", for example, the starting point of a vector is regarded as the position of the vector. |
na_action |
One of "omit_data_points" or "omit_vectors". If using "omit_data_points", then only the |
method |
One of "MVKE" or "VFC". |
... |
Other parameters to be passed to |
A vectorfield object.
library(fitlandr)
# generate data
single_output_grad <- simlandr::sim_fun_grad(length = 200, seed = 1614)
# fit the vector field
v2 <- fit_2d_vf(single_output_grad, x = "x", y = "y", method = "MVKE")
plot(v2)
Two methods are available: method = "pathB" and method = "simlandr". See the Details section.
fit_3d_vfld( vf, method = c("simlandr", "pathB"), .pathB_options = pathB_options(vf), .sim_vf_options = sim_vf_options(vf), .simlandr_options = simlandr_options(vf), linear_interp = FALSE )
vf |
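A vectorfield object. |
method |
The method used for landscape construction. Can be "simlandr" or "pathB". |
.pathB_options |
Only for method = "pathB"; see pathB_options(). |
.sim_vf_options |
Only for method = "simlandr"; see sim_vf_options(). |
.simlandr_options |
Only for method = "simlandr"; see simlandr_options(). |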
linear_interp |
Use linear interpolation method to estimate the drift vector (and the diffusion matrix). This can speed up the calculation. If |
For method = "simlandr", the landscape is constructed based on the generalized potential landscape by Wang et al. (2008), implemented by the simlandr package. This function is a wrapper of sim_vf() and simlandr::make_3d_static(). Use those two functions separately for more customization.
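The generalized potential of Wang et al. (2008) is obtained from the steady-state density P as U = -ln P. The sketch below applies this relation to a sample: it bins the points into a 2D histogram, converts the bin frequencies to U, and caps U at Umax. The sample is drawn from a two-cluster Gaussian mixture as a stand-in for sim_vf() output, and the histogram replaces the kernel density estimators ("ks" or "MASS") selectable in simlandr_options(), so this illustrates the relation only, not simlandr's implementation. Shifting U so that its minimum is 0 is a choice made here for display.

```r
set.seed(42)
# Stand-in for simulated output: two clusters, as produced by a bistable system
n_each <- 5000
samp <- rbind(
  cbind(stats::rnorm(n_each, -1, 0.3), stats::rnorm(n_each, 0, 0.3)),
  cbind(stats::rnorm(n_each, 1, 0.3), stats::rnorm(n_each, 0, 0.3))
)

lims <- c(-2, 2, -1.5, 1.5)          # c(xmin, xmax, ymin, ymax)
n_bins <- 40
xb <- seq(lims[1], lims[2], length.out = n_bins + 1)
yb <- seq(lims[3], lims[4], length.out = n_bins + 1)
xc <- (xb[-1] + xb[-(n_bins + 1)]) / 2   # bin centres
yc <- (yb[-1] + yb[-(n_bins + 1)]) / 2

# 2D histogram as a crude density estimate; points outside lims become NA and are dropped
bin_x <- factor(findInterval(samp[, 1], xb, rightmost.closed = TRUE), levels = seq_len(n_bins))
bin_y <- factor(findInterval(samp[, 2], yb, rightmost.closed = TRUE), levels = seq_len(n_bins))
counts <- table(bin_x, bin_y)
P <- unclass(counts) / sum(counts)

# Generalized potential U = -ln P, shifted to a minimum of 0; empty cells (Inf) are capped
Umax <- 5
U <- pmin(-log(P) - min(-log(P)), Umax)
idx <- which(U == 0, arr.ind = TRUE)[1, ]
print(c(x = xc[idx[1]], y = yc[idx[2]]))   # location of the deepest well
```

Low-density regions such as the barrier between the clusters receive high U, which is why a cap like Umax is needed for plotting.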
For method = "pathB", the landscape is constructed based on the deterministic path-integral quasi-potential defined by Bhattacharya et al. (2011).
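In the path-integral approach, the potential decreases along a deterministic trajectory dx/dt = f(x) by dV = -(f · dx), so the potential of a starting point relative to the attractor it flows into can be accumulated step by step. The sketch below does this with Euler steps, borrowing the parameter names stepsize, tol, and numTimeSteps from pathB_options(). path_integral_sketch() is hypothetical, and the package's alignment of potentials across different attractors (see align_pot_B()) is not reproduced.

```r
# Hypothetical helper: follow dx/dt = f(x) from `start` with Euler steps and
# accumulate dV = -(f . dx) until the drift norm falls below `tol`
path_integral_sketch <- function(f, start, stepsize = 0.01, tol = 0.01,
                                 numTimeSteps = 1400) {
  x <- start
  dV_total <- 0
  for (k in seq_len(numTimeSteps)) {
    v <- f(x)
    dx <- v * stepsize
    dV_total <- dV_total - sum(v * dx)   # potential decreases along the flow
    x <- x + dx
    if (sqrt(sum(v^2)) < tol) break      # close enough to an attractor
  }
  # potential of `start` relative to the attractor reached (taken as 0)
  list(V_start = -dV_total, end = x, steps = k)
}

# Gradient system f = -grad U with U = (x^2 + y^2) / 2, so U(1, 0.5) - U(0, 0) = 0.625
f <- function(p) -p
res <- path_integral_sketch(f, start = c(1, 0.5))
print(res$V_start)
```

For a gradient system the accumulated value approaches the true potential difference as stepsize shrinks, which gives a simple check on the step size; for non-gradient fields it yields the quasi-potential instead.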
We recommend the simlandr method for psychological data because it is more stable.
Parallel computing based on future is supported for both methods. Use future::plan("multisession") to enable this and speed up computation.
A landscape object as described in simlandr::make_3d_static(), or a 3d_static_landscape_B object, which inherits from the landscape class and contains the following elements: dist, the distribution estimation for landscapes; plot, a 3D plot using plotly; plot_2, a 2D plot using ggplot2; x and y, from vf.
library(fitlandr)
# generate data
single_output_grad <- simlandr::sim_fun_grad(length = 200, seed = 1614)
# fit the vector field
v2 <- fit_2d_vf(single_output_grad, x = "x", y = "y", method = "MVKE")
plot(v2)
# fit the landscape
future::plan("multisession")
set.seed(1614)
l2 <- fit_3d_vfld(v2,
  .sim_vf_options = sim_vf_options(chains = 16, stepsize = 1, forbid_overflow = TRUE),
  .simlandr_options = simlandr_options(adjust = 5, Umax = 4))
plot(l2, 2)
future::plan("sequential")
See references for details.
MVKE(d, v, h = 0.2, kernel = c("exp", "Gaussian"))
d |
The dataset. Should be a matrix or a data frame, with each row representing a random vector. |
v |
The vectors corresponding to the dataset. Should be a matrix or a data frame with the same shape as d. |
h |
The bandwidth for the kernel estimator. |
kernel |
The type of kernel estimator used. "exp" by default ( |
A function(x), which returns the drift and diffusion estimators at the position x.
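To show the general form of such a kernel estimator, the sketch below returns a function of x that computes kernel-weighted averages of the observed vectors (drift) and of their outer products (diffusion), in the spirit of Bandi & Moloche (2018). The exact kernel formulas used for "exp" and "Gaussian", the unit time step between observations, and the name mvke_sketch() are assumptions; consult the reference or the package source for the definitions MVKE() actually uses.

```r
# Hypothetical kernel estimator in the style of MVKE(d, v, h, kernel)
mvke_sketch <- function(d, v, h = 0.2, kernel = c("exp", "Gaussian")) {
  kernel <- match.arg(kernel)
  d <- as.matrix(d)
  v <- as.matrix(v)
  stopifnot(identical(dim(d), dim(v)))
  function(x) {
    r <- sqrt(rowSums(sweep(d, 2, x)^2))   # distance from x to each data point
    # assumed kernel forms; weights underflow to 0 far away from all data
    w <- if (kernel == "exp") exp(-r / h) else exp(-r^2 / (2 * h^2))
    w <- w / sum(w)                        # normalized kernel weights
    list(
      mu = colSums(w * v),                 # weighted mean of vectors (drift)
      a = crossprod(v * sqrt(w))           # weighted sum of v v^T (diffusion)
    )
  }
}

set.seed(7)
d <- matrix(stats::rnorm(400), ncol = 2)
v <- -0.5 * d + matrix(stats::rnorm(400, sd = 0.1), ncol = 2)
est <- mvke_sketch(d, v, h = 0.5, kernel = "Gaussian")
print(est(c(0.5, 0)))
```

Returning a closure mirrors the documented return value (a function of x), so the data and bandwidth are fixed once and the estimator can then be evaluated at any position.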
Bandi, F. M., & Moloche, G. (2018). On the functional estimation of multivariate diffusion processes. Econometric Theory, 34(4), 896-946. https://doi.org/10.1017/S0266466617000305
Return a normalized prediction function
normalize_predict_f(vf)
vf |
A |
A function that takes a vector x and returns a list of v, the drift part, and a, the diffusion part.
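If a vector field is estimated on standardized data z = (x - center) / scale, a predictor in the normalized space can be mapped back to the original units by rescaling: the drift is multiplied by scale and the diffusion matrix by outer(scale, scale). Whether normalize_predict_f() performs exactly this mapping, or its inverse, is not stated here; the sketch below, built around the hypothetical wrap_normalized(), only illustrates the change of variables.

```r
# Hypothetical wrapper: turn a predictor in normalized units into one in original units
wrap_normalized <- function(f_z, center, scale) {
  function(x) {
    z <- (x - center) / scale             # original units -> normalized units
    out <- f_z(z)
    list(
      v = out$v * scale,                  # dx/dt = scale * dz/dt
      a = out$a * outer(scale, scale)     # diag(scale) %*% a_z %*% diag(scale)
    )
  }
}

# Stand-in normalized model: dz/dt = -z with identity diffusion
f_z <- function(z) list(v = -z, a = diag(length(z)))
f_x <- wrap_normalized(f_z, center = c(10, 0), scale = c(2, 5))
res <- f_x(c(12, 5))
print(res)
```

Elementwise multiplication by outer(scale, scale) is equivalent to diag(scale) %*% a %*% diag(scale) but avoids building the diagonal matrices.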
See path_integral_B() and align_pot_B() for details.
pathB_options( vf, lims = rlang::expr(vf$lims), n_path_int = 20, stepsize = 0.01, tol = 0.01, numTimeSteps = 1400, n = 200, digits = 2, linear = TRUE, ... )
vf |
A |
lims |
The limits of the range for the estimation as c(xmin, xmax, ymin, ymax). |
n_path_int |
The number of equally spaced points in each axis, at which the path integrals are to be calculated. |
stepsize |
The stepsize for Euler–Maruyama simulation of the system. |
tol |
The tolerance to test convergence. |
numTimeSteps |
Number of time steps for integrating along each path (to ensure uniform arrays). Choose a number high enough for convergence with the given stepsize. |
n |
The number of equally spaced points in each axis, at which the landscape is to be estimated. |
digits |
Currently, the raw sample points in some regions can be so dense that they crash the interpolation. To avoid this problem, only one point is kept among all points that share the same first several digits. Use this parameter to indicate how many digits are considered. Note that this is a temporary solution and might change in the near future. |
linear |
logical – indicating whether linear or spline interpolation should be used. |
... |
Not in use. |
A list containing the parameters of the corresponding function. Only intended to be used within fit_3d_vfld().
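The digits workaround described for pathB_options() can be illustrated as follows: the coordinates are rounded, and only the first point in each group that agrees after rounding is kept. Rounding to digits decimal places is an assumption about what "the same first several digits" means, and the points below are hypothetical data.

```r
set.seed(3)
# Hypothetical sample points: 50 scattered points plus 5 almost identical ones
pts <- data.frame(
  x = c(stats::runif(50), 0.123456 + 1e-7 * (1:5)),
  y = c(stats::runif(50), rep(0.654321, 5))
)
digits <- 2
# keep the first point among those that agree after rounding to `digits` decimals
thinned <- pts[!duplicated(round(pts, digits)), ]
print(nrow(thinned))
```

Coarser rounding (smaller digits) removes more points and makes interpolation safer, at the cost of discarding detail in dense regions.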
Plot a 2D vector field estimated by fit_2d_vf(). Powered by ggplot2::ggplot().
## S3 method for class 'vectorfield' plot( x, arrow = grid::arrow(length = grid::unit(0.1, "cm")), show_estimated_vector = TRUE, estimated_vector_enlarge = 1, estimated_vector_options = list(), show_point = TRUE, point_options = list(size = 0.5), show_original_vector = FALSE, original_vector_enlarge = 1, original_vector_options = list(), show_used_vector = FALSE, used_vector_options = list(color = "red"), show_v_norm = FALSE, v_norm_options = list(), ... )
x |
A vectorfield object. |
arrow |
The description of the arrow heads of the vectors on the plot (representing the vector field). Generated by grid::arrow(). |
show_estimated_vector |
Show the vectors from the estimated model? |
estimated_vector_enlarge |
A number. How many times should the vectors (representing the estimated vector field) be enlarged on the plot? This can be useful when the estimated vector field is too strong or too weak. |
estimated_vector_options |
A list passing other customized parameters to |
show_point |
Show the original data points? |
point_options |
A list passing other customized parameters to |
show_original_vector |
Show the original vectors (i.e., the vectors between data points)? |
original_vector_enlarge |
A number. How many times should the original vectors be enlarged on the plot? |
original_vector_options |
A list passing other customized parameters to |
show_used_vector |
Only for vector fields estimated by the "VFC" method. Should the vectors from the original data that are considered inliers be specially marked? |
used_vector_options |
Only for vector fields estimated by the "VFC" method. A list passing other customized parameters to |
show_v_norm |
Show the norm of the estimated vectors (the strength of the vector field)? |
v_norm_options |
A list passing other customized parameters to |
... |
Not in use. |
A ggplot2 plot.
Calculate the vector value at a given position
## S3 method for class 'vectorfield' predict(object, pos, linear_interp = FALSE, calculate_a = TRUE, ...)
object |
A vectorfield object. |
pos |
A numeric vector giving the position at which the vector is to be calculated. |
linear_interp |
Use linear interpolation method to estimate the drift vector (and the diffusion matrix). This can speed up the calculation. If |
calculate_a |
Effective when |
... |
Not in use. |
A list of v, the drift part that is used for vector fields, and a (when calculate_a == TRUE), the diffusion part at a given position.
Then simlandr::check_conv() can be used meaningfully.
reorder_output(s, chains)
s |
A simulation output, possibly generated by sim_vf(). |
chains |
How many simulation chains should be run? |
A reordered matrix of the simulation output.
Parallel computing based on future is supported. Use future::plan("multisession") to enable this.
sim_vf( vf, noise = 1, noise_warmup = noise, chains = 10, length = 10000, discard = 0.3, stepsize = 0.01, sparse = 1, forbid_overflow = FALSE, linear_interp = FALSE, inits = matrix(c(stats::runif(chains, min = vf$lims[1], max = vf$lims[2]), stats::runif(chains, min = vf$lims[3], max = vf$lims[4])), ncol = 2) )
vf |
A vectorfield object. |
noise |
Relative noise of the simulation. Set this smaller when the simulation is unstable (e.g., when the elements in the diffusion matrix are not finite), and set this larger when the simulation converges too slowly. |
noise_warmup |
The noise used for the warming-up period. |
chains |
How many simulation chains should be run? |
length |
The simulation length for each chain. |
discard |
How much of the starting part of each chain should be discarded? (Warming-up period.) |
stepsize |
The stepsize for Euler–Maruyama simulation of the system. |
sparse |
A number. How much should the output be thinned? When the noise is small, thinning the output may make the density estimation more efficient. |
forbid_overflow |
If |
linear_interp |
Use linear interpolation method to estimate the drift vector (and the diffusion matrix). This can speed up the calculation. If |
inits |
The initial values of each chain. |
A matrix of the simulated data.
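The sketch below shows the Euler–Maruyama scheme with several chains, a warm-up period governed by discard and noise_warmup, and thinning via sparse, using argument names taken from sim_vf(). It assumes a predictor that returns list(v, a), as described for normalize_predict_f(), and takes the diffusion matrix as a = B B^T, so the noise term is B times a standard normal draw. That convention, the extra chain column, and em_simulate() itself are assumptions for illustration.

```r
# Hypothetical Euler-Maruyama simulator with the same argument names as sim_vf()
em_simulate <- function(pred, inits, length = 10000, stepsize = 0.01,
                        noise = 1, noise_warmup = noise, discard = 0.3,
                        sparse = 1) {
  n_warm <- floor(length * discard)
  keep <- seq(from = n_warm + 1, to = length, by = sparse)  # thinned post-warm-up steps
  dim_x <- ncol(inits)
  out <- vector("list", nrow(inits))
  for (ch in seq_len(nrow(inits))) {
    x <- inits[ch, ]
    traj <- matrix(NA_real_, length, dim_x)
    for (k in seq_len(length)) {
      p <- pred(x)
      s <- if (k <= n_warm) noise_warmup else noise
      B <- t(chol(p$a))                  # a = B %*% t(B)
      # x_{k+1} = x_k + v dt + s * sqrt(dt) * B %*% N(0, I)
      x <- x + p$v * stepsize + s * sqrt(stepsize) * drop(B %*% stats::rnorm(dim_x))
      traj[k, ] <- x
    }
    out[[ch]] <- cbind(traj[keep, , drop = FALSE], chain = ch)
  }
  do.call(rbind, out)
}

set.seed(1614)
# Toy Ornstein-Uhlenbeck field: drift -x, identity diffusion
pred <- function(x) list(v = -x, a = diag(2))
inits <- matrix(c(-2, 2, 2, -2), ncol = 2)   # one row per chain
sim <- em_simulate(pred, inits, length = 2000, sparse = 2)
```

Discarding the warm-up part removes the dependence on the initial values, and running several chains from spread-out starts makes it easier to check convergence.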
See sim_vf() for details.
sim_vf_options( vf, noise = 1, noise_warmup = noise, chains = 10, length = 10000, discard = 0.3, stepsize = 0.01, sparse = 1, forbid_overflow = FALSE, linear_interp = FALSE, inits = rlang::expr(matrix(c(stats::runif(chains, min = vf$lims[1], max = vf$lims[2]), stats::runif(chains, min = vf$lims[3], max = vf$lims[4])), ncol = 2)) )
vf |
A vectorfield object. |
noise |
Relative noise of the simulation. Set this smaller when the simulation is unstable (e.g., when the elements in the diffusion matrix are not finite), and set this larger when the simulation converges too slowly. |
noise_warmup |
The noise used for the warming-up period. |
chains |
How many simulation chains should be run? |
length |
The simulation length for each chain. |
discard |
How much of the starting part of each chain should be discarded? (Warming-up period.) |
stepsize |
The stepsize for Euler–Maruyama simulation of the system. |
sparse |
A number. How much should the output be thinned? When the noise is small, thinning the output may make the density estimation more efficient. |
forbid_overflow |
If |
linear_interp |
Use linear interpolation method to estimate the drift vector (and the diffusion matrix). This can speed up the calculation. If |
inits |
The initial values of each chain. |
A list containing the parameters of the corresponding function. Only intended to be used within fit_3d_vfld().
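The option helpers store some defaults (for example inits in sim_vf_options(), and lims in pathB_options() and simlandr_options()) as unevaluated expressions, so that they can refer to vf and chains before those are known. The base-R sketch below mimics that pattern with quote() in place of rlang::expr(), which behaves the same when no unquoting is involved. make_options() and resolve_options() are hypothetical and do not reproduce how fit_3d_vfld() evaluates the options.

```r
# Hypothetical options helper: `inits` defaults to an unevaluated expression
make_options <- function(chains = 10,
                         inits = quote(matrix(c(
                           stats::runif(chains, min = vf$lims[1], max = vf$lims[2]),
                           stats::runif(chains, min = vf$lims[3], max = vf$lims[4])
                         ), ncol = 2))) {
  list(chains = chains, inits = inits)
}

# Hypothetical resolver: evaluate stored expressions once vf is available,
# with the other options (e.g. chains) visible to them
resolve_options <- function(opts, vf) {
  env <- list2env(c(opts, list(vf = vf)), parent = baseenv())
  lapply(opts, function(o) if (is.language(o)) eval(o, env) else o)
}

set.seed(1)
vf <- list(lims = c(-1, 1, 0, 5))   # stand-in holding only the field used here
opts <- resolve_options(make_options(chains = 4), vf)
print(opts$inits)
```

Delaying evaluation this way is what lets a call such as sim_vf_options(chains = 16) be written before the vector field it refers to is passed in.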
Control the behavior of simlandr::make_3d_static(), with default values adapted for fitlandr. See simlandr::make_3d_static() for details.
simlandr_options( vf, x = rlang::expr(vf$x), y = rlang::expr(vf$y), lims = rlang::expr(vf$lims), kde_fun = c("ks", "MASS"), n = 200, adjust = 1, h, Umax = 5 )
vf |
A vectorfield object. |
x, y |
The names of the target variables. |
lims |
The limits of the range for the density estimator as c(xmin, xmax, ymin, ymax). |
kde_fun |
Which kernel estimator to use? Choices: "ks" |
n |
The number of equally spaced points in each axis, at which the density is to be estimated. |
adjust |
The multiplier to the bandwidth. The bandwidth used is actually |
h |
A number, or possibly a vector for 3D and 4D landscapes, specifying the smoothing bandwidth to be used. If missing, the default value of the kernel estimator will be used (but |
Umax |
The maximum displayed value of potential. |
A list containing the parameters of the corresponding function. Only intended to be used within fit_3d_vfld().