Package 'PlotNormTest'

Title: Graphical Univariate/Multivariate Assessments for Normality Assumption
Description: Graphical methods testing multivariate normality assumption. Methods including assessing score function, and cumulant generating functions, independent transformations and linear transformations.
Authors: Huong Tran [aut, cre], Ravindra Khattree [aut]
Maintainer: Huong Tran <[email protected]>
License: MIT + file LICENSE
Version: 1.0.1
Built: 2025-03-26 06:50:08 UTC
Source: https://github.com/huongtran53/plotnormtest

Help Index


Linear combinations of distinct derivatives of empirical cumulant generating function (CGF).

Description

Linear combination of third/fourth derivatives of CGF gives an asymptotically univariate Gaussian process with mean 0 and covariance between two points tRpt \in \mathbb{R}^p and sRps \in \mathbb{R}^p is defined. We consider vector tt and ss as the form t=t1pt = t^*1_p and s=s1ps = s^*1_p.

Usage

mt3_covLtLs(l, p, bigt = seq(-1, 1, 0.05)/sqrt(p), sTtTs = NULL, seed = 1)

mt4_covLtLs(l, p, bigt = seq(-1, 1, 0.05)/sqrt(p), sTtTs = NULL, seed = 1)

Arguments

l

vector of linear combination of size equal to the number of distinct derivatives, see l_dhCGF().

p

dimension of multivariate random vector which data are collected.

bigt

array of value tt^* and ss^*.

sTtTs

Covariance matrix of derivatives vector, see covTtTs(). Default is NULL, when the algorithm will call mt3_covTtTs() or mt4_covTtTs().

seed

Random seed to get the estimate of the supremum of the univariate Gaussian process obtained from the linear combination.

Value

  • sLtLs covariance matrix of the linear combination of distinct derivatives, which is a zero-mean Gaussian process.

  • m.supLt Monte-Carlo estimates of supremum of this Gaussian process

mt3_covLtLs returns values related to the use of third derivatives. mt4_covLtLs returns values related to the use of fourth derivatives.

Examples

bigt <- seq(-1, 1, .4)
p <- 3
# Third derivatives
lT3 <- l_dhCGF(p)[[1]]
l3 <- rep(1/sqrt(lT3), lT3)
mt3_covLtLs(l = l3, p = p, bigt = bigt/sqrt(p), seed = 1)
#fourth derivatives
lT4 <- l_dhCGF(p)[[2]]
l4 <- rep(1/sqrt(lT4), lT4)
mt4_covLtLs(l = l4, p = p, bigt = bigt/sqrt(p), seed = 1)

Covariance matrix of derivatives of sample cumulant generating function (CGF).

Description

Stacking third/fourth derivatives of sample CGF together to obtain a vector, which (under normality assumption on data) approaches a normally distributed vector with zero mean and a covariance matrix. More specifically, covTsTs computes covariance between any two points as the form t=t1pt = t^*1_p and s=s1ps = s^*1_p.

Usage

mt3_covTtTs(bigt, p = 1, pos.matrix = NULL)

mt4_covTtTs(bigt, p = 1, pos.matrix = NULL)

Arguments

bigt

array contains value of tt^*.

p

dimension of multivariate random vector which data are collected.

pos.matrix

matrix containing information of position of any derivatives. Default is NULL, the function will call mt3_pos() or mt4_pos().

Details

Number of distinct third derivatives is lT3=p+2×(p2)+(p3)l_{T_3}= p + 2 \times \begin{pmatrix} p\\2 \end{pmatrix} + \begin{pmatrix} p \\ 3 \end{pmatrix} Number of distinct fourth derivatives is lT4=p+3×(p2)+3×(p3)+(p4)l_{T_4} = p + 3 \times \begin{pmatrix} p\\2 \end{pmatrix} + 3 \times \begin{pmatrix} p \\ 3 \end{pmatrix} + \begin{pmatrix} p \\ 4 \end{pmatrix} For each pairs of (t,s)(t^*, s^*), covTsTt results a covariance matrix of size lT3×lT3l_{T_3} \times l_{T_3} or lT4×lT4l_{T_4} \times l_{T_4}.

Value

A 2 dimensional upper triangular array, with size equals to length of bigt. Each element contains a covariance matrix of derivatives sequences between any two points t=t1pt = t^* 1_p and s=s1ps = s^*1_p. mt3_covTsTt returns the resulting third derivatives.

mt4_covTsTt returns the resulting forth derivatives.

Examples

bigt <- seq(-1, 1, .4)
p <- 2
# Third derivatives
mt3_pos.matrix <- mt3_pos(p)
sTsTt3 <- mt3_covTtTs(bigt = bigt, p = p, pos.matrix = mt3_pos.matrix)
dim(sTsTt3)
sTsTt3[1:5, 1:5]
# Fourth derivatives
mt4_pos.matrix <- mt4_pos(p)
sTsTt4 <- mt4_covTtTs(bigt = bigt, p = p, pos.matrix = mt4_pos.matrix)
dim(sTsTt4)
sTsTt4[1:5, 1:5]

Covariance matrix of derivatives of sample moment generating function (MGF).

Description

Stacking derivatives upto the third/fourth orders of sample MGF together to obtain a vector, which (under normality assumption) approaches a multivariate normally distributed vector with zero mean and a covariance matrix. covZtZs calculates covariance between any two points tt and ss in Rp\mathbb{R}^p.

Usage

mt3_covZtZs(t, s, pos.matrix = NULL)

mt4_covZtZs(t, s, pos.matrix = NULL)

Arguments

t, s

a vector of length pp.

pos.matrix

matrix contains information of positions of derivatives. Default is NULL, where the function will call mt3_pos() or mt4_pos().

Value

mt3_covZtZs Covariance matrix relating to the use of third derivatives.

mt4_covZtZs Covariance matrix relating to the use of fourth derivatives. This also contains information on the third third derivatives mt3_covZtZs.

Examples

set.seed(1)
p <- 3
x <- MASS::mvrnorm(100, rep(0, p), diag(p))
t <- rep(0.2, p)
s <- rep(-.3, p)
# Using third derivatives
pos.matrix3 <- mt3_pos(p)
sZtZs3 <- mt3_covZtZs(t, s, pos.matrix = pos.matrix3)
dim(sZtZs3)
sZtZs3[1:5, 1:5]
# Using fourth derivatives
sZtZs4 <- mt4_covZtZs(t, s)
dim(sZtZs4)
sZtZs4[1:5, 1:5]

Calculation of derivatives of empirical cumulant generating function (CGF).

Description

Get the third/fortth derivatives of sample CGF at a given point.

Usage

d3hCGF(myt, x)

d4hCGF(myt, x)

l_dhCGF(p)

dhCGF1D(t, x)

Arguments

myt, t

numeric vector of length p.

x

data matrix.

p

Dimension.

Details

Estimator of standardized cumulant function is

logM^X(t)=log(1ni=1nexp(tS12(XiXˉ)))\log\hat{M}_X(t) = \log \left(\dfrac{1}{n} \sum_{i = 1}^n \exp(t'S^{\frac{-1}{2}}(X_i - \bar{X})) \right)

and its

kthk^{th}

order derivatives is defined as

Tk(t)=ktj1tj2tjklog(M^X(t)),tRpT_k(t) = \dfrac{\partial^k}{ \partial t_{j_1}t_{j_2} \dots t_{j_k}} \log(\hat{M}_X(t)), t \in \mathbb{R}^p

where tj1tj2tjkt_{j_1}t_{j_2} \dots t_{j_k} are the corresponding components of vector tRpt \in \mathbb{R}^p.

Value

d3hCGF returns the sequence of third derivatives of empirical CGF, ordered by index of j1j2j3pj_1 \leq j_2 \leq j_3 \leq p.

d4hCGF returns the sequence of fourth derivatives of empirical CGF ordered by index of j1j2j3j4pj_1 \leq j_2 \leq j_3 \leq j_4 \leq p.

l_dhCGF returns number of distinct third and fourth derivatives.

dhCGF1D returns third/fourth derivatives of univariate empirical CGF, which are d3hCGF and d4hCGF when p=1p = 1.

Examples

p <- 3
# Number of distinct derivatives
l_dhCGF(p)
set.seed(1)
x <- MASS::mvrnorm(100, rep(0, p), diag(p))
myt <- rep(.2, p)
d3hCGF(myt = myt, x = x)
d4hCGF(myt = myt, x = x)
#Univariate data
set.seed(1)
x <- rnorm(100)
t <- .3
dhCGF1D(t, x)

Moment generating functions (MGF) of standard normal distribution.

Description

Get the polynomial term in the expression of derivatives of moment generating function of Np(0,Ip)N_p(0, I_p), with respect to a given component and its exponent. Up to eighth order.

Usage

dMGF(tab, t, coef = TRUE)

Arguments

tab

a dataframe with the first column contain indices of components of a multivariate random vector X\bold{X}, and the second column is the order derivatives with respect to that components.

t

vector in Rp\mathbb{R}^p.

coef

take TRUE or FALSE value to obtain only polynomial or whole expression by multiplying the polynomial term with the exponent term exp(.5tt)\exp(.5 t't).

Details

For a standard multivariate normal random variables YNp(0,Ip)Y \sim N_p(0, I_p)

E(Y1k1...Ypkpexp(tX))=k1kpt1k1tpkpexp(tt/2)=μ(k1)(t1)...μ(kp)(tp)exp(tt/2)\mathbb{E}\left(Y_1^{k_1} ... Y_p^{k_p} \exp(t'X)\right) = \dfrac{\partial^{k_1}\dots \partial^{k_p}}{t_1^{k_1} \dots t_p^{k_p}} \exp(t't/2) = \mu^{(k_1)} (t_1) ... \mu^{(k_p)}(t_p) \exp(t't/2)

For example, EY24exp(tY)=4t24exp(tt/2)=μ(4)(t2)exp(tt/2).\mathbb{E}Y_2^4 \exp(t'Y) = \dfrac{\partial^4}{\partial t_2^4} \exp(t't/2) = \mu^{(4)}(t_2) \exp(t't/2).

Value

Value of derivatives.

Examples

#Calculation of above example
t <- rep(.2, 7)
tab <- data.frame(j = 2, exponent = 4)
dMGF(tab, t = t)
dMGF(tab, t = t, coef = FALSE)

Get parameters for plots derivatives of multivariate CGF to assess normality assumption.

Description

Obtain necessary parameters to build a graphical test using the third/fourth derivatives of cumulant generating function.

Usage

mt3_get_param(p, bigt = seq(-1, 1, by = 0.05)/sqrt(p), l = NULL)

mt4_get_param(p, bigt = seq(-1, 1, by = 0.05)/sqrt(p), l = NULL)

Arguments

p

Dimension.

bigt

Array containing value of tt^*.

l

Linear transformation of vector of third/fourth distinct derivatives, default is their average.

Value

  • p Dimension.

  • lT Number of distinct third/fourth order derivatives.

  • sTtTs Two dimensional array, each element contains covariance matrix of vector of derivatives, the function called mt3_covTtTs(), or mt4_covTtTs().

  • l.sTtTs Covariance matrix of linear combination of distinct derivatives, the function called mt3_covLtLs(), or mt4_covLtLs().

  • m.supLT The Monte Carlo estimate of expected value supremum of the Gaussian process, see covLtLs().

mt3_get_param returns necessary parameters for the 2D plot relying on third derivatives. mt4_get_param returns necessary parameters for the 2D plot relying on fourth derivatives.

See Also

covZtZs(), covLtLs(), covTtTs()

Examples

p <- 2
mt3 <- mt3_get_param(p, bigt = seq(-1, 1, .4)/sqrt(p))
names(mt3)
mt4 <- mt4_get_param(p, bigt = seq(-1, 1, .4)/sqrt(p))
names(mt4)

Transformation to Independent Univariate Sample

Description

Leave-one-out method gives approximately independent sample of standard multivariate normal distribution, which then produces sample of standard univariate normal distribution.

Usage

Multi.to.Uni(x)

Arguments

x

multivariate data matrix

Details

Let Xˉk\bar{X}_{-k} and SkS_{-k} are the sample mean sample variance covariance matrix obtained by using all but kthk^{th} data point. Then Sk1/2(XkXˉk),k=1,...nS_{-k}^{-1/2} (X_k - \bar{X}_{-k}) , k = 1,... n are approximately independently distributed as Np(0,I)N_p(0, I). Thus all n×pn \times p entries in the data matrix so constructed can be treated as univariate samples of size n\timepn \time p from N(0,1)N(0, 1).

Value

Data frame contains univariate data and the index from multivariate data.

Examples

set.seed(1)
x <- MASS::mvrnorm(100, mu = rep(0, 5), diag(5))
df <- Multi.to.Uni(x)
qqnorm(df$x.new); abline(0, 1)

Best Linear Transformations

Description

The algorithm uses gradient descent algorithm to obtain the maximum of the square of sample skewness, of the kurtosis or of their average under any univariate linear transformation of the multivariate data.

Usage

linear_transform(
  x,
  l0 = rep(1, ncol(x)),
  method = "both",
  epsilon = 1e-10,
  iter = 5000,
  stepsize = 0.001
)

Arguments

x

multivariate data matrix.

l0

starting point for projection algorithm, default is rep(1, ncol(x)).

method

character strings, one of c("skewness", "kurtosis", "both").

epsilon

bounds on error of optimal solution, default is 1e-10.

iter

number of iteration of projection algorithm, default is 5000.

stepsize

gradient descent stepsize, default is .001.

Value

  • max_result: The maximum value after linear transformation.

  • x_uni: Univariate data after transformation.

  • vector_k: Vector of the "best" linear transformation.

  • error: Error of projection algorithm.

  • iteration: Number of iteration.

See Also

skewness(), kurtosis()

Examples

set.seed(1)
x <- MASS::mvrnorm(100, mu = rep(0, 2), diag(2))
linear_transform(x, method = "skewness")$max_result
linear_transform(x, method = "kurtosis")$max_result
linear_transform(x, method = "both")$max_result

From derivatives of MGF to derivatives of CGF.

Description

Taylor expansion implies that vectors of derivatives of log(M^X(t))\log(\hat{M}_X(t)) can be approximated by a linear combination of vectors of derivatives of M^X(t)\hat{M}_X(t). matrix_A results the corresponding linear combinations.

Usage

mt3_matrix_A(t)

mt4_matrix_A(t)

Arguments

t

vector of Rp\mathbb{R}^p

Value

mt3_matrix_A returns coefficient matrix relating to the use of third derivatives.

mt4_matrix_A returns coefficient matrix relating to the use of fourth derivatives.

Examples

p <- 3
t <- rep(.2, p)
A3 <- mt3_matrix_A(t)
dim(A3)
A3[1:5, 1:5]
A4 <- mt4_matrix_A(t)
dim(A4)
A4[1:5, 1:5]

Derivatives of empirical moment generating function (MGF).

Description

Given dimension pp, returns a dataframe containing the position of all derivatives of estimator of moment generating function M^X(t)\hat{M}_X(t), upto third/fourth order.

Usage

mt3_rev_pos(j1, j2, j3, p)

mt3_pos(p)

mt4_pos(p)

Arguments

j1

Index of the first variables

j2

Index of the first variables, should be at least j1

j3

Index of the first variables, should be at least j2

p

Dimension

Details

The estimator of multivariate moment generating function is M^X(t)=1ni=1nexp(tXi)\hat{M}_X(t) = \dfrac{1}{n} \sum_{i = 1}^n \exp(t'X_i) The chain containing all derivatives up to the third order is

Z=(M^,M^001,M^00p,M^011,M^012,M^0pp,M^111,M^112,M^ppp)Z = \bigg(\hat{M}, \hat{M}^{001}, \dots \hat{M}^{00p}, \hat{M}^{011}, \hat{M}^{012}, \dots \hat{M}^{0pp}, \hat{M}^{111}, \hat{M}^{112}, \dots \hat{M}^{ppp}\bigg)'

and

M^=M^000(t)=M^X(t)\hat{M} = \hat{M}^{000}(t)= \hat{M}_X(t)

M^j1j2j3(t)=ktj1tj2tj3M^(t)\hat{M}^{j_1j_2j_3}(t) = \dfrac{\partial^k}{\partial t_{j_1} t_{j_2} t_{j_3}} \hat{M}(t)

where kk is the number of j1,j2,j3j_1, j_2, j_3 different from 0. Similar notation is applied when fourth derivatives is used.

Value

mt3_rev_pos returns the position of this particular derivative in the chain of all derivatives, up to third order.

mt3_pos an array contaning all position with respect to index of j1,j2,j3j_1, j_2, j_3.

mt4_pos an array contaning all position with respect to the index of j1,j2,j3,j4j_1, j_2, j_3, j_4.

Examples

mt3_rev_pos(1, 2, 2, p = 3)
p <- 3
mt3_pos(p)
mt4_pos(p)

Graphical plots to assess multivariate normality assumption.

Description

Cumulant generating functions of normally distributed random variables has derivatives of order higher than 3 are all 0. Hence, plots of empirical third/fourth order derivatives with large value or high slope gives indication of non-normality. Multivariate_CGF_PLot estimates and provides confidence region for average (or any linear combination) of third/fourth derivatives of empirical cumulant function at the points t=t1pt = t^*1_p. Plots for p=2,3,,10p = 2, 3, \dots, 10 will be faster to obtain, as confidence regions and other necessary parameters are available in mt3_lst_param.rda and mt4_lst_param.rda. Higher dimension requires expensive computational cost.

Usage

d3hCGF_plot(x, alpha = 0.05)

d4hCGF_plot(x, alpha = 0.05)

Arguments

x

Data matrix of size n×pn \times p

alpha

Significant level (default is .05.05)

Value

d3hCGF_plot returns plot relying in third derivatives.

d4hCGF_plot returns plot relying in forth derivatives.

See Also

dhCGF_plot1D()

Examples

set.seed(1234)
p <- 3
x <- MASS::mvrnorm(500, rep(0, p), diag(p))
d3hCGF_plot(x)
d4hCGF_plot(x)

Sample skewness and Sample Kurtosis.

Description

Sample skewness and Sample Kurtosis.

Usage

kurtosis(x)

skewness(x)

Arguments

x

univariate data sample

Details

Sample kurtosis is

κ^4=1n1i=1n(XiXˉS)4.\hat{\kappa}_4 = \dfrac{1}{n-1} \sum_{i = 1}^n \left(\dfrac{X_i - \bar{X}}{S}\right)^4.

Sample skewness is

κ^3=1n1i=1n(XiXˉS)3.\hat{\kappa}_3 = \dfrac{1}{n-1} \sum_{i = 1}^n \left(\dfrac{X_i - \bar{X}}{S}\right)^3.

Value

kurtosis returns sample kurtosis.

skewness returns sample skewness.

Examples

set.seed(123)
y <- rnorm(100)
kurtosis(y)
set.seed(123)
x <- rnorm(100)
skewness(x)

Graphical plots to assess multivariate univarite assumption of data.

Description

Plots the empirical third/fourth derivatives of cumulant generating function together with confidence probability region. Indication of non-normality is either violation of probability bands or curves with high slope.

Usage

dhCGF_plot1D(x, alpha = 0.05, method)

Arguments

x

Univariate data

alpha

Significant level (default is .05.05)

method

string, "T3" used the third derivatives, and "T4" uses the fourth derivatives.

Value

Plots

References

Ghosh S (1996). “A New Graphical Tool to Detect Non-Normality.” Journal of the Royal Statistical Society: Series B (Methodological), 58(4), 691-702. doi:10.1111/j.2517-6161.1996.tb02108.x.

Examples

set.seed(123)
x <- rnorm(100)
dhCGF_plot1D(x, method = "T3")
dhCGF_plot1D(x, method = "T4")

Graphical plots to assess the univarite noramality assumption of data.

Description

Score function of a univariate normal distribution is a straight line. A non-linear graph of score function estimator shows evidence of non-normality.

Outliers are detected using the 2-sigma bands method.

Usage

cox(x, P = NULL, lambda = 0.5, x.dist = NULL)

score_plot1D(x, P = NULL, lambda = 0.5, x.dist = NULL, ori.index = NULL)

Arguments

x

univariate data.

P

vector of weight.

lambda

smoothing parameter, default is 0.50.5.

x.dist

the minimum distance between two data points in vector x.

ori.index

original index of vector x, default is NULL when index is just the order.

Details

To avoid the singularity of coefficient matrices in spline method, points with distance less than x.dist are merged and weight of the representative points is updated by the summation of weight of discarded points.

Under null hypothesis, a unbiased estimator score function of a given data point xkx_k is

ψ^(xk)=n4n2xkXˉkSk2\hat{\psi}(x_k) = \dfrac{n - 4}{n - 2} \dfrac{x_k - \bar{X}_{-k}}{S_{-k}^2}

and if aka_{k} is the estimate score from function cox at the point xkx_k, then

akψ^(xk)±2Var^(ψ^(xk)).a_k\in \hat{\psi}(x_k) \pm 2 \sqrt{\hat{\text{Var}}(\hat{\psi}(x_k))}.

Hence points outside the 2-sigma bands are outliers.

Value

cox returns the estimate of score function.

  • x: The updated univariate data if merging happens.

  • a: Score value estimated at x.

  • P: Updated weight (if merging happens).

  • slt: Index of merged data point (is NULL if x.dist = NULL).

score_plot1D returns score functions together with 2-sigma bands for outlier detection.

  • plot: plot of estimate score function and its band.

  • outlier: index of outliers.

References

Ng PT (1994). “Smoothing Spline Score Estimation.” SIAM Journal on Scientific Computing, 15(5), 1003-1025. doi:10.1137/0915061, https://doi.org/10.1137/0915061.

Examples

set.seed(1)
x <- rnorm(100, 2, 4)
re <- cox(sort(x))
plot(re$x, re$a, xlab = "x", ylab = "Estimated Score",
 main = "Estimator of score function")
abline(0, 1)

set.seed(1)
x <- rnorm(100, 2, 4)
score_plot1D(sort(x))