Title: | Generate Synthetic Data from Statistical Models |
---|---|
Description: | Generate synthetic time series from commonly used statistical models, including linear, nonlinear and chaotic systems. Applications to testing methods can be found in Jiang, Z., Sharma, A., & Johnson, F. (2019) <doi:10.1016/j.advwatres.2019.103430> and Jiang, Z., Sharma, A., & Johnson, F. (2020) <doi:10.1029/2019WR026962> associated with an open-source tool by Jiang, Z., Rashid, M. M., Johnson, F., & Sharma, A. (2020) <doi:10.1016/j.envsoft.2020.104907>. |
Authors: | Ze Jiang [aut, cre] |
Maintainer: | Ze Jiang <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.2.5 |
Built: | 2025-01-20 04:42:18 UTC |
Source: | https://github.com/zejiang-unsw/synthesis |
Generate an affine error model.
data.gen.affine(nobs, a = 0, b = 1, ndim = 3, mu = 0, sd = 1)
nobs |
The data length to be generated. |
a |
intercept |
b |
slope |
ndim |
The number of potential predictors (default is 3). |
mu |
mean of error term |
sd |
standard deviation of error term |
A list of 2 elements: a vector of response (x), and a matrix of potential predictors (dp) with each column containing one potential predictor.
McColl, K. A., Vogelzang, J., Konings, A. G., Entekhabi, D., Piles, M., & Stoffelen, A. (2014). Extended triple collocation: Estimating errors and correlation coefficients with respect to an unknown target. Geophysical Research Letters, 41(17), 6229-6236. doi:10.1002/2014gl061322
# Affine error model from paper with 3 dummy variables
data.affine<-data.gen.affine(500)
plot.ts(cbind(data.affine$x,data.affine$dp))
Generate predictor and response data from AR1 model.
data.gen.ar1(nobs, ndim = 9)
nobs |
The data length to be generated. |
ndim |
The number of potential predictors (default is 9). |
A list of 2 elements: a vector of response (x), and a matrix of potential predictors (dp) with each column containing one potential predictor.
# AR1 model from paper with 9 dummy variables
data.ar1<-data.gen.ar1(500)
plot.ts(cbind(data.ar1$x,data.ar1$dp))
Generate predictor and response data from AR4 model.
data.gen.ar4(nobs, ndim = 9)
nobs |
The data length to be generated. |
ndim |
The number of potential predictors (default is 9). |
A list of 2 elements: a vector of response (x), and a matrix of potential predictors (dp) with each column containing one potential predictor.
# AR4 model from paper with total 9 dimensions
data.ar4<-data.gen.ar4(500)
plot.ts(cbind(data.ar4$x,data.ar4$dp))
Generate predictor and response data from AR9 model.
data.gen.ar9(nobs, ndim = 9)
nobs |
The data length to be generated. |
ndim |
The number of potential predictors (default is 9). |
A list of 2 elements: a vector of response (x), and a matrix of potential predictors (dp) with each column containing one potential predictor.
# AR9 model from paper with total 9 dimensions
data.ar9<-data.gen.ar9(500)
plot.ts(cbind(data.ar9$x,data.ar9$dp))
Gaussian Blobs
data.gen.blobs( nobs = 100, features = 2, centers = 3, sd = 1, bbox = c(-10, 10), do.plot = TRUE )
nobs |
The data length to be generated. |
features |
The number of features (columns) in the dataset. |
centers |
Either the number of centers, or a matrix of the chosen centers. |
sd |
The level of Gaussian noise, default 1. |
bbox |
The bounding box of the dataset. |
do.plot |
Logical value. If TRUE (default value), a plot of the generated Blobs is shown. |
This function generates a feature matrix for multiclass datasets by allocating each class one or more normally distributed clusters of points. Both the center and the standard deviation of each cluster can be controlled. For example, suppose we want a dataset of the weight and height (two features) of 500 people (the data length) drawn from three groups: babies, children, and adults. The centers are the average weight and height of each group, assuming both are normally distributed (i.e. follow a Gaussian distribution). The standard deviation (sd) is that of the Gaussian distribution, while the bounding box (bbox) is the range within which each cluster center is generated when only the number of centers is given.
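The cluster construction described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helper name `make_blobs_sketch` is hypothetical, and assigning points to centers with equal probability is an assumption.

```r
# Hypothetical helper: draw `nobs` points around Gaussian cluster centers.
make_blobs_sketch <- function(nobs = 100, features = 2, centers = 3,
                              sd = 1, bbox = c(-10, 10)) {
  # If only a count is given, draw the centers uniformly inside bbox.
  if (!is.matrix(centers)) {
    centers <- matrix(runif(centers * features, min = bbox[1], max = bbox[2]),
                      ncol = features)
  }
  # Assign each observation to one cluster (assumed: equal probability).
  classes <- sample(nrow(centers), nobs, replace = TRUE)
  # Add isotropic Gaussian noise around the chosen center.
  x <- centers[classes, , drop = FALSE] +
    matrix(rnorm(nobs * features, sd = sd), ncol = features)
  list(x = x, classes = classes)
}

set.seed(1)
blobs <- make_blobs_sketch(nobs = 500, features = 2, centers = 3)
str(blobs)
```

Passing a matrix as `centers` (one row per cluster) fixes the group means explicitly, which matches the weight and height example in the text.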
A list of two variables, x and classes.
Amos Elberg (2018). clusteringdatasets: Datasets useful for testing clustering algorithms. R package version 0.1.1. https://github.com/elbamos/clusteringdatasets
Blobs=data.gen.blobs(nobs=1000, features=2, centers=3, sd=1, bbox=c(-10,10), do.plot=TRUE)
This function generates a time series of one-dimensional Brownian motion.
data.gen.bm( x0 = 0, w0 = 0, time = seq(0, by = 0.01, length.out = 101), do.plot = TRUE )
x0 |
the start value of x, with the default value 0 |
w0 |
the start value of w, with the default value 0 |
time |
the temporal interval at which the system will be generated. Default seq(0,by=0.01,len=101). |
do.plot |
a logical value. If TRUE (default value), a plot of the generated system is shown. |
Yanping Chen, http://cos.name/wp-content/uploads/2008/12/stochastic-differential-equation-with-r.pdf
set.seed(123)
x <- data.gen.bm()
Generate build-up and wash-off model for water quality modeling
data.gen.BUWO(nobs, k = 0.5, a = 1, m0 = 10, q = 0)
nobs |
The data length to be generated. |
k |
build-up coefficient (kg*t-1) |
a |
wash-off rate constant (m-3) |
m0 |
threshold at which additional mass does not accumulate on the surface (kg) |
q |
runoff (m3*t-1) |
A list of 2 elements: a vector of build-up mass (x), and a vector of wash-off mass (y) per unit time.
Wu, X., Marshall, L., & Sharma, A. (2019). The influence of data transformations in simulating Total Suspended Solids using Bayesian inference. Environmental modelling & software, 121, 104493. doi:https://doi.org/10.1016/j.envsoft.2019.104493
Shaw, S. B., Stedinger, J. R., & Walter, M. T. (2010). Evaluating Urban Pollutant Buildup/Wash-Off Models Using a Madison, Wisconsin Catchment. Journal of Environmental Engineering, 136(2), 194-203. https://doi.org/10.1061/(ASCE)EE.1943-7870.0000142
# Build up model
set.seed(101)
sample = 500
#create a gamma shape storm event
q<- seq(0,20, length.out=sample)
p <- pgamma(q, shape=9, rate =2, lower.tail = TRUE)
p <- c(p[1],p[2:sample]-p[1:(sample-1)])
data.tss<-data.gen.BUWO(sample, k=0.5, a=5, m0=10, q=p)
plot.ts(cbind(p, data.tss$x, data.tss$y), ylab=c("Q","Build-up","Wash-off"))
Circles
data.gen.circles( n, r_vec = c(1, 2), start = runif(1, -1, 1), s, do.plot = TRUE )
n |
The data length to be generated. |
r_vec |
The radius of circles. |
start |
The center of circles. |
s |
The level of Gaussian noise, default 0. |
do.plot |
Logical value. If TRUE (default value), a plot of the generated Circles is shown. |
A list of two variables, x and classes.
Circles=data.gen.circles(n = 1000, r_vec=c(1,2), start=runif(1,-1,1), s=0.1, do.plot=TRUE)
Generates a 2-dimensional time series using the Duffing map.
data.gen.Duffing( nobs = 5000, a = 2.75, b = 0.2, start = runif(n = 2, min = -0.5, max = 0.5), s, do.plot = TRUE )
nobs |
Length of the generated time series. Default: 5000 samples. |
a |
The a parameter. Default: 2.75. |
b |
The b parameter. Default: 0.2. |
start |
A 2-dimensional vector indicating the starting values for the x and y Duffing coordinates. Default: If the starting point is not specified, it is generated randomly. |
s |
The level of noise, default 0. |
do.plot |
Logical value. If TRUE (default value), a plot of the generated Duffing system is shown. |
The Duffing map is defined as follows:
$$x_n = y_{n-1}$$
$$y_n = -b x_{n-1} + a y_{n-1} - y_{n-1}^3$$
The default selection for both a and b parameters (a=2.75 and b=0.2) is known to produce a deterministic chaotic time series.
A list with two vectors named x and y containing the x-components and the y-components of the Duffing map, respectively.
Some initial values may lead to an unstable system that will tend to infinity.
Constantino A. Garcia (2019). nonlinearTseries: Nonlinear Time Series Analysis. R package version 0.2.7. https://CRAN.R-project.org/package=nonlinearTseries
Duffing.map=data.gen.Duffing(nobs = 1000, do.plot=TRUE)
This function generates a time series of one-dimensional fractional Brownian motion.
data.gen.fbm( hurst = 0.95, time = seq(0, by = 0.01, length.out = 1000), do.plot = TRUE )
hurst |
the Hurst index, in the range [0,1], with the default value 0.95. |
time |
the temporal interval at which the system will be generated. Default seq(0,by=0.01,len=1000). |
do.plot |
a logical value. If TRUE (default value), a plot of the generated system is shown. |
Zdravko Botev (2020). Fractional Brownian motion generator (https://www.mathworks.com/matlabcentral/fileexchange/38935-fractional-brownian-motion-generator), MATLAB Central File Exchange. Retrieved August 17, 2020.
Kroese, D. P., & Botev, Z. I. (2015). Spatial Process Simulation. In Stochastic Geometry, Spatial Statistics and Random Fields(pp. 369-404) Springer International Publishing, DOI: 10.1007/978-3-319-10064-7_12
set.seed(123)
x <- data.gen.fbm()
Friedman with independent uniform variates
data.gen.fm1(nobs, ndim = 9, noise = 1)
nobs |
The data length to be generated. |
ndim |
The number of potential predictors (default is 9). |
noise |
The noise level in the time series. |
A list of 3 elements: a vector of response (x), a matrix of potential predictors (dp) with each column containing one potential predictor, and a vector of true predictor numbers.
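For orientation, a generator of this kind can be sketched with the classic Friedman benchmark function, in which only the first five uniform predictors drive the response. That formula is an assumption here, since this page does not state it; `friedman_sketch` and its `true_predictors` element are hypothetical names, not part of the package.

```r
# Hypothetical sketch: classic Friedman benchmark with independent U(0, 1) predictors.
friedman_sketch <- function(nobs, ndim = 9, noise = 1) {
  stopifnot(ndim >= 5)
  # Potential predictors: independent uniform variates, one per column.
  dp <- matrix(runif(nobs * ndim), nrow = nobs, ncol = ndim)
  # Assumed response: only columns 1 to 5 matter, the rest are dummies.
  x <- 10 * sin(pi * dp[, 1] * dp[, 2]) + 20 * (dp[, 3] - 0.5)^2 +
    10 * dp[, 4] + 5 * dp[, 5] + rnorm(nobs, sd = noise)
  list(x = x, dp = dp, true_predictors = 1:5)
}

set.seed(1)
fm <- friedman_sketch(nobs = 200, ndim = 9, noise = 0)
range(fm$x)
```

With `noise = 0` the response is a deterministic function of the first five columns, which makes it convenient for checking whether a predictor-selection method ignores the dummy columns.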
###synthetic example - Friedman
#Friedman with independent uniform variates
data.fm1 <- data.gen.fm1(nobs=1000, ndim = 9, noise = 0)
#Friedman with correlated uniform variates
data.fm2 <- data.gen.fm2(nobs=1000, ndim = 9, r = 0.6, noise = 0)
plot.ts(cbind(data.fm1$x,data.fm2$x), col=c('red','blue'), main=NA, xlab=NA,
        ylab=c('Friedman with \n independent uniform variates',
               'Friedman with \n correlated uniform variates'))
Friedman with correlated uniform variates
data.gen.fm2(nobs, ndim = 9, r = 0.6, noise = 0)
nobs |
The data length to be generated. |
ndim |
The number of potential predictors (default is 9). |
r |
Target Spearman correlation. |
noise |
The noise level in the time series. |
A list of 3 elements: a vector of response (x), a matrix of potential predictors (dp) with each column containing one potential predictor, and a vector of true predictor numbers.
###synthetic example - Friedman
#Friedman with independent uniform variates
data.fm1 <- data.gen.fm1(nobs=1000, ndim = 9, noise = 0)
#Friedman with correlated uniform variates
data.fm2 <- data.gen.fm2(nobs=1000, ndim = 9, r = 0.6, noise = 0)
plot.ts(cbind(data.fm1$x,data.fm2$x), col=c('red','blue'), main=NA, xlab=NA,
        ylab=c('Friedman with \n independent uniform variates',
               'Friedman with \n correlated uniform variates'))
This function generates a time series of one-dimensional geometric Brownian motion.
data.gen.gbm( x0 = 10, w0 = 0, mu = 1, sigma = 0.5, time = seq(0, by = 0.01, length.out = 101), do.plot = TRUE )
x0 |
the start value of x, with the default value 10 |
w0 |
the start value of w, with the default value 0 |
mu |
the interest/drifting rate, with the default value 1. |
sigma |
the diffusion coefficient, with the default value 0.5. |
time |
the temporal interval at which the system will be generated. Default seq(0,by=0.01,len=101). |
do.plot |
a logical value. If TRUE (default value), a plot of the generated system is shown. |
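How these arguments interact can be sketched in base R. The sketch below uses the closed-form solution of the geometric Brownian motion equation dX = mu X dt + sigma X dW; whether the package uses this formula or a discretisation scheme is not stated on this page, so treat it as an assumption. The name `gbm_sketch` is hypothetical, and treating `w0` as the starting value of the driving Brownian path is also an assumption.

```r
# Hypothetical sketch: geometric Brownian motion via its closed-form solution.
gbm_sketch <- function(x0 = 10, w0 = 0, mu = 1, sigma = 0.5,
                       time = seq(0, by = 0.01, length.out = 101)) {
  dt <- diff(time)
  # Brownian path: w0 plus cumulative independent N(0, dt) increments.
  w <- w0 + c(0, cumsum(rnorm(length(dt), sd = sqrt(dt))))
  # X(t) = x0 * exp((mu - sigma^2 / 2) * t + sigma * (W(t) - W(0))).
  x <- x0 * exp((mu - sigma^2 / 2) * (time - time[1]) + sigma * (w - w0))
  data.frame(time = time, w = w, x = x)
}

set.seed(123)
path <- gbm_sketch()
head(path)
```

The column `w` is itself a plain Brownian motion, so the same sketch illustrates the non-geometric case as well.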
Yanping Chen, http://cos.name/wp-content/uploads/2008/12/stochastic-differential-equation-with-r.pdf
set.seed(123)
x <- data.gen.gbm()
Generates a 2-dimensional time series using the Henon map.
data.gen.Henon( nobs = 5000, a = 1.4, b = 0.3, start = runif(n = 2, min = -0.5, max = 0.5), s, do.plot = TRUE )
nobs |
Length of the generated time series. Default: 5000 samples. |
a |
The a parameter. Default: 1.4. |
b |
The b parameter. Default: 0.3. |
start |
A 2-dimensional vector indicating the starting values for the x and y Henon coordinates. Default: If the starting point is not specified, it is generated randomly. |
s |
The level of noise, default 0. |
do.plot |
Logical value. If TRUE (default value), a plot of the generated Henon system is shown. |
The Henon map is defined as follows:
$$x_n = 1 - a x_{n-1}^2 + y_{n-1}$$
$$y_n = b x_{n-1}$$
The default selection for both a and b parameters (a=1.4 and b=0.3) is known to produce a deterministic chaotic time series.
A list with two vectors named x and y containing the x-components and the y-components of the Henon map, respectively.
Some initial values may lead to an unstable system that will tend to infinity.
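A minimal base-R sketch of the standard Henon recursion, x[n] = 1 - a*x[n-1]^2 + y[n-1] and y[n] = b*x[n-1], is given below. It illustrates the iteration only and is not the package's code; the name `henon_sketch` is hypothetical, the fixed start at the origin is an assumption chosen because it lies in the basin of the attractor, and no noise term is included.

```r
# Hypothetical sketch: iterate the Henon map from a fixed starting point.
henon_sketch <- function(nobs = 1000, a = 1.4, b = 0.3, start = c(0, 0)) {
  x <- numeric(nobs)
  y <- numeric(nobs)
  x[1] <- start[1]
  y[1] <- start[2]
  for (n in seq_len(nobs)[-1]) {
    x[n] <- 1 - a * x[n - 1]^2 + y[n - 1]  # quadratic fold
    y[n] <- b * x[n - 1]                   # contraction by b
  }
  list(x = x, y = y)
}

henon <- henon_sketch(nobs = 1000)
plot(henon$x, henon$y, pch = ".", xlab = "x", ylab = "y")
```

Starting far from the attractor, for example `start = c(5, 5)`, makes the iterates diverge, which is the instability the note above warns about.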
Constantino A. Garcia (2019). nonlinearTseries: Nonlinear Time Series Analysis. R package version 0.2.7. https://CRAN.R-project.org/package=nonlinearTseries
Henon.map=data.gen.Henon(nobs = 1000, do.plot=TRUE)
Generate predictor and response data: Hysteresis Loop
data.gen.HL( nobs = 512, a = 0.8, b = 0.6, c = 0.2, m = 3, n = 5, fp = 25, fd, sd.x = 0.1, sd.y = 0.1 )
nobs |
The data length to be generated. |
a |
The a parameter. Default: 0.8. |
b |
The b parameter. Default: 0.6. |
c |
The c parameter. Default: 0.2. |
m |
Positive integer for the split line parameter. If m=1, split line is linear; If m is even, split line has a u shape; If m is odd and higher than 1, split line has a chair or classical shape. |
n |
Positive odd integer for the bulging parameter, indicates degree of outward curving (1=highest level of bulging). |
fp |
The frequency in the generated response. fp = 25 used in the WRR paper. |
fd |
A vector of frequencies for potential predictors. fd = c(3,5,10,15,25,30,55,70,95) used in the WRR paper. |
sd.x |
The noise level in the predictor. |
sd.y |
The noise level in the response. |
Hysteresis is a common nonlinear phenomenon in natural systems, and it can be numerically simulated by the following formulas:
The default selection for the system parameters (a = 0.8, b = 0.6, c = 0.2, m = 3, n = 5) is known to generate a classical hysteresis loop.
A list of 3 elements: a vector of response (x), a matrix of potential predictors (dp) with each column containing one potential predictor, and a vector of true predictor numbers.
LAPSHIN, R. V. 1995. Analytical model for the approximation of hysteresis loop and its application to the scanning tunneling microscope. Review of Scientific Instruments, 66, 4718-4730.
###synthetic example - Hysteresis loop
#frequency, sampled from a given range
fd <- c(3,5,10,15,25,30,55,70,95)
data.HL <- data.gen.HL(m=3,n=5,nobs=512,fp=25,fd=fd)
plot.ts(cbind(data.HL$x,data.HL$dp))
Generates data from a specific linear Gaussian state space model of the form $x_t = \phi x_{t-1} + \sigma_v v_t$ and $y_t = x_t + \sigma_e e_t$, where $v_t$ and $e_t$ denote independent standard Gaussian random variables, i.e. $\mathcal{N}(0, 1)$.
data.gen.LGSS( theta, nobs, start = runif(n = 1, min = -1, max = 1), do.plot = TRUE )
theta |
The parameters theta = c(phi, sigma_v, sigma_e) of the LGSS model. |
nobs |
The data length to be generated. |
start |
A numeric value indicating the starting value for the time series. If the starting point is not specified, it is generated randomly. |
do.plot |
Logical value. If TRUE (default value), a plot of the generated LGSS system is shown. |
A list of two variables, state and response.
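A linear Gaussian state space model of this kind, with an AR(1) latent state observed in additive Gaussian noise, can be simulated in a few lines of base R. The sketch assumes `theta` is ordered as c(phi, sigma_v, sigma_e), following the cited tutorial by Dahlin and Schon; the name `lgss_sketch` is hypothetical and the handling of the first observation is an assumption.

```r
# Hypothetical sketch: x[t] = phi * x[t-1] + sigma_v * v[t], y[t] = x[t] + sigma_e * e[t].
lgss_sketch <- function(theta, nobs, start = 0) {
  phi <- theta[1]
  sigma_v <- theta[2]
  sigma_e <- theta[3]
  state <- numeric(nobs)
  state[1] <- start
  for (t in seq_len(nobs)[-1]) {
    # Latent AR(1) state driven by standard Gaussian noise.
    state[t] <- phi * state[t - 1] + sigma_v * rnorm(1)
  }
  # Observations: state plus independent measurement noise.
  response <- state + sigma_e * rnorm(nobs)
  list(state = state, response = response)
}

set.seed(10)
lgss <- lgss_sketch(theta = c(0.75, 1.00, 0.10), nobs = 500, start = 0)
plot.ts(cbind(state = lgss$state, response = lgss$response))
```

With a small `sigma_e`, as in this call, the response tracks the state closely; increasing it makes the state harder to recover, which is the usual setting for testing filtering methods.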
Dahlin, J. & Schon, T. B. (2019). Getting Started with Particle Metropolis-Hastings for Inference in Nonlinear Dynamical Models. Journal of Statistical Software, Code Snippets, 88(2), 1–41.
data.LGSS <- data.gen.LGSS(theta=c(0.75,1.00,0.10), nobs=500, start=0)
Generates a time series using the logistic map.
data.gen.Logistic( nobs = 5000, r = 4, start = runif(n = 1, min = 0, max = 1), s, do.plot = TRUE )
nobs |
Length of the generated time series. Default: 5000 samples. |
r |
The r parameter. Default: 4 |
start |
A numeric value indicating the starting value for the time series. If the starting point is not specified, it is generated randomly. |
s |
The level of noise, default 0. |
do.plot |
Logical value. If TRUE (default value), a plot of the generated Logistic system is shown. |
The logistic map is defined as follows:
$$x_n = r \, x_{n-1} (1 - x_{n-1})$$
A vector of time series.
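The standard logistic recursion, x[n] = r * x[n-1] * (1 - x[n-1]), can be iterated directly in base R. The sketch below is illustrative only: `logistic_sketch` is a hypothetical name, the fixed start value is an assumption, and no noise term is added.

```r
# Hypothetical sketch: iterate the logistic map from a given start value.
logistic_sketch <- function(nobs = 1000, r = 4, start = 0.2) {
  x <- numeric(nobs)
  x[1] <- start
  for (n in seq_len(nobs)[-1]) {
    x[n] <- r * x[n - 1] * (1 - x[n - 1])
  }
  x
}

chaotic <- logistic_sketch(nobs = 1000, r = 4)
plot.ts(chaotic[1:200])
```

For `r = 4` the series stays in [0, 1] and behaves chaotically, whereas smaller values such as `r = 2` settle on a fixed point.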
Constantino A. Garcia (2019). nonlinearTseries: Nonlinear Time Series Analysis. R package version 0.2.7. https://CRAN.R-project.org/package=nonlinearTseries
Logistic.map=data.gen.Logistic(nobs = 1000, do.plot=TRUE)
Generates a 3-dimensional time series using the Lorenz equations.
data.gen.Lorenz( sigma = 10, beta = 8/3, rho = 28, start = c(-13, -14, 47), time = seq(0, 50, length.out = 1000), s )
sigma |
The sigma parameter. Default: 10. |
beta |
The beta parameter. Default: 8/3. |
rho |
The rho parameter. Default: 28. |
start |
A 3-dimensional numeric vector indicating the starting point for the time series. Default: c(-13, -14, 47). |
time |
The temporal interval at which the system will be generated. Default: time = seq(0, 50, length.out = 1000). |
s |
The level of noise, default 0. |
The Lorenz system is a system of ordinary differential equations defined as:
$$\dot{x} = \sigma (y - x)$$
$$\dot{y} = \rho x - y - x z$$
$$\dot{z} = -\beta z + x y$$
The default selection for the system parameters (sigma = 10, beta = 8/3, rho = 28) is known to produce a deterministic chaotic time series.
A list with four vectors named time, x, y and z containing the time, the x-components, the y-components and the z-components of the Lorenz system, respectively.
Some initial values may lead to an unstable system that will tend to infinity.
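The standard Lorenz equations, dx/dt = sigma (y - x), dy/dt = rho x - y - x z and dz/dt = -beta z + x y, can be integrated with a fixed-step fourth-order Runge-Kutta scheme in base R. This sketch swaps in that simple integrator for illustration; the solver the package actually uses is not stated on this page. The names `lorenz_sketch` and `max_step` are hypothetical, and no noise term is included.

```r
# Hypothetical sketch: integrate the Lorenz system with classical RK4.
lorenz_sketch <- function(sigma = 10, beta = 8/3, rho = 28,
                          start = c(-13, -14, 47),
                          time = seq(0, 50, length.out = 1000),
                          max_step = 0.005) {
  # Right-hand side of the Lorenz equations for state s = c(x, y, z).
  deriv <- function(s) {
    c(sigma * (s[2] - s[1]),
      rho * s[1] - s[2] - s[1] * s[3],
      -beta * s[3] + s[1] * s[2])
  }
  out <- matrix(NA_real_, nrow = length(time), ncol = 3)
  out[1, ] <- start
  for (i in seq_along(time)[-1]) {
    s <- out[i - 1, ]
    # Split each output interval into sub-steps no longer than max_step.
    nsub <- max(1, ceiling((time[i] - time[i - 1]) / max_step))
    h <- (time[i] - time[i - 1]) / nsub
    for (j in seq_len(nsub)) {
      k1 <- deriv(s)
      k2 <- deriv(s + h / 2 * k1)
      k3 <- deriv(s + h / 2 * k2)
      k4 <- deriv(s + h * k3)
      s <- s + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    }
    out[i, ] <- s
  }
  list(time = time, x = out[, 1], y = out[, 2], z = out[, 3])
}

lz <- lorenz_sketch()
plot(lz$x, lz$z, type = "l", xlab = "x", ylab = "z")
```

Because the system is chaotic, trajectories from different solvers or step sizes diverge after a short time even though they trace the same attractor, so sample paths should not be expected to match across implementations.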
Constantino A. Garcia (2019). nonlinearTseries: Nonlinear Time Series Analysis. R package version 0.2.7. https://CRAN.R-project.org/package=nonlinearTseries
###Synthetic example - Lorenz
ts.l <- data.gen.Lorenz(sigma = 10, beta = 8/3, rho = 28, start = c(-13, -14, 47),
                        time = seq(0, by=0.05, length.out = 2000))
ts.plot(cbind(ts.l$x,ts.l$y,ts.l$z), col=c('black','red','blue'))
Nonlinear system with independent/correlated covariates
data.gen.nl1(nobs, ndim = 15, r = 0.6, noise = 1)
nobs |
The data length to be generated. |
ndim |
The number of potential predictors (default is 15). |
r |
Target Spearman correlation among covariates. |
noise |
The noise level in the time series. |
A list of 3 elements: a vector of response (x), a matrix of potential predictors (dp) with each column containing one potential predictor, and a vector of true predictor numbers.
###synthetic example - Nonlinear systems
#Nonlinear system with independent/correlated covariates
data.nl1 <- data.gen.nl1(nobs=1000)
#Nonlinear system with exogenous covariates
data.nl2 <- data.gen.nl2(nobs=1000)
plot.ts(cbind(data.nl1$x,data.nl2$x), col=c('red','blue'), main=NA, xlab=NA,
        ylab=c('Nonlinear system with \n independent uniform variates',
               'Nonlinear system with \n correlated uniform variates'))
Nonlinear system with Exogenous covariates
data.gen.nl2(nobs, ndim = 7, noise = 1)
nobs |
The data length to be generated. |
ndim |
The number of potential predictors (default is 7). |
noise |
The noise level in the time series. |
A list of 3 elements: a vector of response (x), a matrix of potential predictors (dp) with each column containing one potential predictor, and a vector of true predictor numbers.
Sharma, A., & Mehrotra, R. (2014). An information theoretic alternative to model a natural system using observational information alone. Water Resources Research, 50(1), 650-660.
###synthetic example - Nonlinear systems
#Nonlinear system with independent/correlated covariates
data.nl1 <- data.gen.nl1(nobs=1000)
#Nonlinear system with exogenous covariates
data.nl2 <- data.gen.nl2(nobs=1000)
plot.ts(cbind(data.nl1$x,data.nl2$x), col=c('red','blue'), main=NA, xlab=NA,
        ylab=c('Nonlinear system with \n independent uniform variates',
               'Nonlinear system with \n correlated uniform variates'))
Generate correlated normal variates
data.gen.norm(n, mu = rep(0, 2), sd = rep(1, 2), r = 0.6, sigma)
n |
The data length to be generated. |
mu |
A vector giving the means of the variables. |
sd |
A vector giving the standard deviation of the variables. |
r |
The target Pearson correlation, default is 0.6. |
sigma |
A positive-definite symmetric matrix specifying the covariance matrix of the variables. |
A matrix of correlated normal variates
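One common way to produce such variates is to multiply independent standard normals by the Cholesky factor of the target covariance matrix. The sketch below shows that technique in base R; it is an illustration, not necessarily the method the package uses, and the name `norm_sketch` is hypothetical. When `sigma` is missing, it is assumed to be built from `sd` and a common correlation `r`.

```r
# Hypothetical sketch: correlated normal variates via a Cholesky factor.
norm_sketch <- function(n, mu = rep(0, 2), sd = rep(1, 2), r = 0.6, sigma) {
  k <- length(mu)
  if (missing(sigma)) {
    # Assumed: equal correlation r between every pair of variables.
    corr <- matrix(r, nrow = k, ncol = k)
    diag(corr) <- 1
    scale <- diag(sd, nrow = k)
    sigma <- scale %*% corr %*% scale
  }
  # chol() returns upper-triangular R with t(R) %*% R == sigma,
  # so the rows of z %*% R have covariance sigma.
  z <- matrix(rnorm(n * k), nrow = n)
  sweep(z %*% chol(sigma), 2, mu, "+")
}

set.seed(1)
xn <- norm_sketch(5000, mu = c(0, 10), sd = c(1, 2), r = 0.6)
cor(xn)
```

The sample correlation approaches `r` as `n` grows; for short series it can differ noticeably from the target.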
Generates a 3-dimensional time series using the Rossler equations.
data.gen.Rossler( a = 0.2, b = 0.2, w = 5.7, start = c(-2, -10, 0.2), time = seq(0, by = 0.05, length.out = 1000), s )
a |
The a parameter. Default: 0.2. |
b |
The b parameter. Default: 0.2. |
w |
The w parameter. Default: 5.7. |
start |
A 3-dimensional numeric vector indicating the starting point for the time series. Default: c(-2, -10, 0.2). |
time |
The temporal interval at which the system will be generated. Default: time = seq(0, by = 0.05, length.out = 1000). |
s |
The level of noise, default 0. |
The Rössler system is a system of ordinary differential equations defined as:
$$\dot{x} = -(y + z)$$
$$\dot{y} = x + a y$$
$$\dot{z} = b + z (x - w)$$
The default selection for the system parameters (a = 0.2, b = 0.2, w = 5.7) is known to produce a deterministic chaotic time series. However, the values a = 0.1, b = 0.1, and c = 14 are more commonly used, where c denotes the parameter called w in this function. The Rössler equations are simpler than the Lorenz equations since only one nonlinear term appears (the product xz in the third equation).
With a = b = 0.1 fixed and c varied, the bifurcation diagram shows that low values of c give periodic orbits, which quickly become chaotic as c increases. This pattern repeats as c increases: sections of periodicity are interspersed with periods of chaos, and the trend is towards higher-period orbits. For example, the period-one orbit only appears for values of c around 4 and is never found again in the bifurcation diagram. The same phenomenon is seen with period three: period-three orbits can be found until c = 12, but they do not appear thereafter.
A list with four vectors named time, x, y and z containing the time, the x-components, the y-components and the z-components of the Rössler system, respectively.
Some initial values may lead to an unstable system that will tend to infinity.
Rössler, O. E. 1976. An equation for continuous chaos. Physics Letters A, 57, 397-398.
Constantino A. Garcia (2019). nonlinearTseries: Nonlinear Time Series Analysis. R package version 0.2.7. https://CRAN.R-project.org/package=nonlinearTseries
wikipedia https://en.wikipedia.org/wiki/R
###synthetic example - Rössler
ts.r <- data.gen.Rossler(a = 0.1, b = 0.1, w = 8.7, start = c(-2, -10, 0.2),
                         time = seq(0, by=0.05, length.out = 10000))
oldpar <- par(no.readonly = TRUE)
par(mfrow=c(1,1), ps=12, cex.lab=1.5)
plot.ts(cbind(ts.r$x,ts.r$y,ts.r$z), col=c('black','red','blue'))
par(mfrow=c(1,2), ps=12, cex.lab=1.5)
plot(ts.r$x,ts.r$y, xlab='x',ylab = 'y', type = 'l')
plot(ts.r$x,ts.r$z, xlab='x',ylab = 'z', type = 'l')
par(oldpar)
Generate Random walk time series.
data.gen.rw(nobs, drift = 0.2, sd = 1)
nobs |
the data length to be generated |
drift |
the drift added at each time step (default is 0.2) |
sd |
the standard deviation of the white noise in the data (default is 1) |
A list of 2 elements: a random walk (x) and a random walk with drift (xd).
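Both series can be sketched as cumulative sums of white noise, following the construction in the Shumway and Stoffer reference. This is a minimal illustration rather than the package's code: `rw_sketch` is a hypothetical name, and sharing one noise sequence between the two series is an assumption taken from that textbook example.

```r
# Hypothetical sketch: random walk with and without drift from one noise series.
rw_sketch <- function(nobs, drift = 0.2, sd = 1) {
  w <- rnorm(nobs, sd = sd)       # white noise
  list(x = cumsum(w),             # random walk
       xd = cumsum(w + drift))    # random walk with drift
}

set.seed(154)
rw <- rw_sketch(200)
plot.ts(rw$xd, ylab = "", main = "random walk")
lines(rw$x, col = 4)
```

Under this construction the two series differ by exactly `drift * t`, which is why the drifted walk follows the straight line with slope 0.2 in the plot.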
Shumway, R. H. and D. S. Stoffer (2011). Time series regression and exploratory data analysis. Time series analysis and its applications, Springer: 47-82.
set.seed(154)
data.rw <- data.gen.rw(200)
plot.ts(data.rw$xd, ylim=c(-5,55), main='random walk', ylab='')
lines(data.rw$x, col=4); abline(h=0, col=4, lty=2); abline(a=0, b=.2, lty=2)
Spirals
data.gen.spirals(n, cycles = 1, s = 0, do.plot = TRUE)
n |
The data length to be generated. |
cycles |
The number of cycles of spirals. |
s |
The level of Gaussian noise, default 0. |
do.plot |
Logical value. If TRUE (default value), a plot of the generated Spirals is shown. |
A list of two variables, x and classes.
Friedrich Leisch & Evgenia Dimitriadou (2010). mlbench: Machine Learning Benchmark Problems. R package version 2.1-1.
Spirals=data.gen.spirals(n = 2000, cycles=2, s=0.01, do.plot=TRUE)
Generate predictor and response data: Sinusoidal model
data.gen.SW(nobs = 500, freq = 50, A = 2, phi = pi, mu = 0, sd = 1)
nobs |
The data length to be generated. |
freq |
The frequencies in the generated response. Default freq=50. |
A |
The amplitude of the sinusoidal series |
phi |
The phase of the sinusoidal series |
mu |
The mean of Gaussian noise in the variable. |
sd |
The standard deviation of Gaussian noise in the variable. |
A list of time and x.
Shumway, R. H., & Stoffer, D. S. (2011). Characteristics of Time Series. In D. S. Stoffer (Ed.), Time series analysis and its applications (pp. 8-14). New York : Springer.
### Sinusoidal model
delta <- 1/12 # sampling rate, assuming monthly
period.max<- 2^5
N = 6*period.max/delta
scales<- 2^(0:5)[c(2,6)] #pick two scales
scales
### scale, period, and frequency
# freq=1/T; T=s/delta so freq = delta/s
tmp <- NULL
for(s in scales){
  tmp <- cbind(tmp, data.gen.SW(nobs=N, freq = delta/s, A = 1, phi = 0, mu=0, sd = 0)$x)
}
x <- rowSums(data.frame(tmp))
plot.ts(cbind(tmp,x), type = 'l', main=NA)
Generate a two-regime threshold autoregressive (TAR) process.
data.gen.tar( nobs, ndim = 9, phi1 = c(0.6, -0.1), phi2 = c(-1.1, 0), theta = 0, d = 2, p = 2, noise = 0.1 )
nobs |
the data length to be generated |
ndim |
The number of potential predictors (default is 9) |
phi1 |
the coefficient vector of the lower-regime model |
phi2 |
the coefficient vector of the upper-regime model |
theta |
threshold |
d |
delay |
p |
maximum autoregressive order |
noise |
the white noise in the data |
The two-regime Threshold Autoregressive (TAR) model is given by the following formula:
where r is the threshold (the theta argument) and d the delay.
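The regime-switching logic can be sketched in base R: at each step the value d steps back decides which coefficient vector applies. The sketch rests on several assumptions that this page does not confirm: both regimes have zero intercept, `phi1` and `phi2` hold the coefficients for lags 1 to p, the lower regime applies when the delayed value is at or below the threshold, and `noise` is the standard deviation of Gaussian innovations. The names `tar_sketch` and `burn` are hypothetical.

```r
# Hypothetical sketch: two-regime threshold autoregressive (TAR) simulation.
tar_sketch <- function(nobs, phi1 = c(0.6, -0.1), phi2 = c(-1.1, 0),
                       theta = 0, d = 2, p = 2, noise = 0.1, burn = 100) {
  stopifnot(length(phi1) == p, length(phi2) == p)
  k <- max(p, d)
  n <- nobs + burn
  x <- numeric(n + k)  # the first k values are zero initial conditions
  for (t in (k + 1):(n + k)) {
    lags <- x[t - seq_len(p)]                         # x[t-1], ..., x[t-p]
    phi <- if (x[t - d] <= theta) phi1 else phi2      # pick the regime
    x[t] <- sum(phi * lags) + rnorm(1, sd = noise)
  }
  # Drop the initial conditions and the burn-in period.
  x[(length(x) - nobs + 1):length(x)]
}

set.seed(1)
tar_x <- tar_sketch(500)
plot.ts(tar_x)
```

Discarding a burn-in period reduces the influence of the arbitrary zero starting values on the returned series.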
A list of 2 elements: a vector of response (x), and a matrix of potential predictors (dp) with each column containing one potential predictor.
Cryer, J. D. and K.-S. Chan (2008). Time Series Analysis With Applications in R Second Edition Springer Science+ Business Media, LLC.
# TAR2 model from paper with total 9 dimensions
data.tar<-data.gen.tar(500)
plot.ts(cbind(data.tar$x,data.tar$dp))
Generate predictor and response data from TAR1 model.
data.gen.tar1(nobs, ndim = 9, noise = 0.1)
nobs |
The data length to be generated. |
ndim |
The number of potential predictors (default is 9). |
noise |
The white noise in the data |
A list of 2 elements: a vector of response (x), and a matrix of potential predictors (dp) with each column containing one potential predictor.
Sharma, A. (2000). Seasonal to interannual rainfall probabilistic forecasts for improved water supply management: Part 1 - A strategy for system predictor identification. Journal of Hydrology, 239(1-4), 232-239.
# TAR1 model from paper with total 9 dimensions
data.tar1<-data.gen.tar1(500)
plot.ts(cbind(data.tar1$x,data.tar1$dp))
Generate predictor and response data from TAR2 model.
data.gen.tar2(nobs, ndim = 9, noise = 0.1)
nobs |
The data length to be generated. |
ndim |
The number of potential predictors (default is 9). |
noise |
The white noise in the data |
A list of 2 elements: a vector of response (x), and a matrix of potential predictors (dp) with each column containing one potential predictor.
Sharma, A. (2000). Seasonal to interannual rainfall probabilistic forecasts for improved water supply management: Part 1 - A strategy for system predictor identification. Journal of Hydrology, 239(1-4), 232-239.
# TAR2 model from paper with total 9 dimensions
data.tar2<-data.gen.tar2(500)
plot.ts(cbind(data.tar2$x,data.tar2$dp))
Generate correlated uniform variates
data.gen.unif(n, ndim = 9, r = 0.6, sigma, method = c("pearson", "spearman"))
n |
The data length to be generated. |
ndim |
The number of potential predictors (default is 9). |
r |
The target correlation, default is 0.6. |
sigma |
A symmetric Pearson correlation matrix; its dimension should equal ndim. |
method |
The target correlation type, either Pearson or Spearman correlation. |
A matrix of correlated uniform variates
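A standard way to obtain correlated uniforms is to generate correlated normal variates and map them through the normal distribution function. The sketch below follows that approach, in the spirit of the Schumann reference; it is an illustration, not the package's implementation. The name `unif_sketch` is hypothetical, a common correlation between all pairs is assumed, and the adjustment 2 * sin(pi * r / 6) is the known relation that makes the rank correlation of the resulting uniforms equal to r.

```r
# Hypothetical sketch: correlated uniform variates through a Gaussian copula.
unif_sketch <- function(n, ndim = 9, r = 0.6) {
  # Correlation needed between the normals so the uniforms correlate at r.
  r_norm <- 2 * sin(pi * r / 6)
  corr <- matrix(r_norm, nrow = ndim, ncol = ndim)
  diag(corr) <- 1
  # Correlated standard normals via the Cholesky factor of corr.
  z <- matrix(rnorm(n * ndim), nrow = n) %*% chol(corr)
  # pnorm maps each standard normal margin to U(0, 1).
  pnorm(z)
}

set.seed(42)
u <- unif_sketch(5000, ndim = 3, r = 0.6)
cor(u, method = "spearman")
```

For uniform margins the Pearson and Spearman correlations coincide in the limit, so the same construction serves both target types.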
Schumann, E. (2009). Generating correlated uniform variates. COMISEF. http://comisef.wikidot.com/tutorial:correlateduniformvariates