Title: | Continuous Associated Kernel Estimation |
---|---|
Description: | Continuous smoothing of probability density function on a compact or semi-infinite support is performed using four continuous associated kernels: extended beta, gamma, lognormal and reciprocal inverse Gaussian. The cross-validation technique is also implemented for bandwidth selection. |
Authors: | W. E. Wansouwé, F. G. Libengué and C. C. Kokonendji |
Maintainer: | W. E. Wansouwé <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0.1 |
Built: | 2025-02-14 03:40:01 UTC |
Source: | https://github.com/cran/Conake |
Continuous smoothing of a probability density function defined on a compact or semi-infinite support is performed using four continuous associated kernels: extended beta, gamma, lognormal and reciprocal inverse Gaussian. Cross-validation is also implemented to select the smoothing parameter.
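For a sample $X_1, \ldots, X_n$ and a bandwidth $h > 0$, the kernel estimator $\widehat{f}_n$ of $f$ is defined as

$$\widehat{f}_n(x) = \frac{1}{n}\sum_{i=1}^{n} K_{x,h}(X_i),$$

where $K_{x,h}$ is one of the kernels defined below.
In practice, we first calculate the normalizing constant

$$C_n = \int_{x \in T} \widehat{f}_n(x)\,dx,$$

where $T$ is the support of the density function. This normalizing constant is not generally equal to 1. The estimated density is then $\tilde{f}_n = \widehat{f}_n / C_n$.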
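The two steps above (raw estimate, then normalization) can be sketched in base R without the package. This is only an illustration under stated assumptions: the gamma kernel is written through `dgamma` with shape 1 + x/h and scale h, the support [0, ∞) is truncated to [0, 10], and the helper names `gamma_kernel`, `f_hat` and `trapz` are hypothetical, not part of Conake.

```r
# Assumed gamma associated kernel K_{x,h}(y): gamma density, shape 1 + x/h, scale h.
gamma_kernel <- function(x, y, h) dgamma(y, shape = 1 + x / h, scale = h)

# Raw estimator: f_hat(x) = (1/n) * sum_i K_{x,h}(X_i), evaluated at each x.
f_hat <- function(x, data, h) {
  sapply(x, function(x0) mean(gamma_kernel(x0, data, h)))
}

# Trapezoidal rule on a grid (stand-in for any numerical integrator).
trapz <- function(x, fx) sum(diff(x) * (head(fx, -1) + tail(fx, -1)) / 2)

set.seed(1)
X <- rgamma(100, shape = 1.5, rate = 2.6)   # simulated sample
h <- 0.05                                   # illustrative bandwidth, not tuned

grid <- seq(0, 10, length.out = 2001)       # truncated support (assumption)
fh <- f_hat(grid, X, h)

C_n <- trapz(grid, fh)                      # normalizing constant
f_tilde <- fh / C_n                         # normalized density estimate
```

Here `C_n` is close to, but not exactly, 1, which is why the division by `C_n` is needed before the estimate is read as a density.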
Given a data sample, the Conake package computes the density estimate with dke, using one of the four kernel functions: extended beta, gamma, lognormal and reciprocal inverse Gaussian. The bandwidth parameter is selected by cross-validation with cvbw. The kernel functions, computed by kef, are defined below.
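The extended beta kernel is defined on $S_{x,h,a,b} = [a,b] = T$ with $a < b < \infty$, $x \in T$ and $h > 0$:

$$BE_{x,h,a,b}(y) = \frac{(y-a)^{(x-a)/\{(b-a)h\}}\,(b-y)^{(b-x)/\{(b-a)h\}}}{(b-a)^{1+h^{-1}}\,B\left(1+\frac{x-a}{(b-a)h},\; 1+\frac{b-x}{(b-a)h}\right)}\,\mathbf{1}_{S_{x,h,a,b}}(y),$$

where $B(r,s) = \int_0^1 t^{r-1}(1-t)^{s-1}\,dt$ is the usual beta function with $r > 0$, $s > 0$ and $\mathbf{1}_A$ denotes the indicator function of $A$. For $a = 0$ and $b = 1$, the extended beta kernel corresponds to the beta kernel, which is the probability density function of the beta distribution with shape parameters $1 + x/h$ and $1 + (1-x)/h$; see Libengué (2013).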
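As a sanity check on the extended beta kernel, it can be written as a rescaled beta density. The sketch below assumes shape parameters 1 + (x - a)/((b - a)h) and 1 + (b - x)/((b - a)h); `ext_beta_kernel` is a hypothetical helper, not the package's kef.

```r
# Extended beta kernel on [a, b], expressed through dbeta on the rescaled
# variable u = (y - a) / (b - a). The 1 / (b - a) factor is the Jacobian.
ext_beta_kernel <- function(x, y, h, a = 0, b = 1) {
  stopifnot(a < b, x >= a, x <= b, h > 0)
  p <- 1 + (x - a) / ((b - a) * h)
  q <- 1 + (b - x) / ((b - a) * h)
  out <- numeric(length(y))            # zero outside [a, b] (indicator function)
  inside <- y >= a & y <= b
  out[inside] <- dbeta((y[inside] - a) / (b - a), p, q) / (b - a)
  out
}

# The kernel should integrate to 1 over its support [a, b].
total <- integrate(function(y) ext_beta_kernel(2.5, y, h = 0.1, a = 1, b = 4),
                   lower = 1, upper = 4)$value

# With a = 0 and b = 1 it reduces to the ordinary beta kernel.
y <- c(0.1, 0.3, 0.8)
beta_case <- ext_beta_kernel(0.3, y, h = 0.2)
```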
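The gamma kernel is defined on $S_{x,h} = [0,\infty) = T$ with $x \in T$ and $h > 0$:

$$GA_{x,h}(y) = \frac{y^{x/h}}{\Gamma(1+x/h)\,h^{1+x/h}}\exp\left(-\frac{y}{h}\right)\mathbf{1}_{S_{x,h}}(y),$$

where $\Gamma(z) = \int_0^\infty t^{z-1}e^{-t}\,dt$ is the classical gamma function. It is the probability density function of the gamma distribution with scale parameter $h$ and shape parameter $1 + x/h$; see Chen (2000) and also Libengué (2013).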
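The link between the gamma kernel and the gamma distribution can be checked numerically. The sketch below assumes the kernel has shape 1 + x/h and scale h. It compares an explicit formula with base R's `dgamma`; `gamma_kernel_explicit` is a hypothetical name.

```r
# Explicit gamma kernel: y^(x/h) * exp(-y/h) / (Gamma(1 + x/h) * h^(1 + x/h)).
# For large x/h this direct form can overflow; dgamma (or lgamma in log space)
# is numerically safer.
gamma_kernel_explicit <- function(x, y, h) {
  ifelse(y >= 0,
         y^(x / h) * exp(-y / h) / (gamma(1 + x / h) * h^(1 + x / h)),
         0)
}

x <- 4
h <- 0.5
y <- c(0.5, 2, 4, 6)

explicit  <- gamma_kernel_explicit(x, y, h)
via_dgamma <- dgamma(y, shape = 1 + x / h, scale = h)
```

Both vectors agree up to floating-point error, so `dgamma` can stand in for the explicit formula in quick experiments.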
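The lognormal kernel is defined on $S_{x,h} = [0,\infty) = T$ with $x \in T$ and $h > 0$:

$$LN_{x,h}(y) = \frac{1}{y h \sqrt{2\pi}}\exp\left\{-\frac{1}{2}\left(\frac{1}{h}\log\left(\frac{y}{x}\right) - h\right)^{2}\right\}\mathbf{1}_{S_{x,h}}(y).$$
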
It is the probability density function of the classical lognormal distribution with mean $\log(x) + h^2$ and standard deviation $h$ on the log scale; see Igarashi and Kakizawa (2015) and also Libengué (2013).
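A short base R check of the lognormal kernel, assuming log-scale mean log(x) + h^2 and log-scale standard deviation h. With these parameters the mode of the kernel sits at the target x. `lognormal_kernel` is a hypothetical helper written for this sketch.

```r
# Explicit lognormal kernel, zero for y <= 0.
lognormal_kernel <- function(x, y, h) {
  out <- numeric(length(y))
  pos <- y > 0
  out[pos] <- exp(-0.5 * (log(y[pos] / x) / h - h)^2) /
    (y[pos] * h * sqrt(2 * pi))
  out
}

x <- 1.5
h <- 0.3
y <- c(0.2, 1, 1.5, 3)

explicit  <- lognormal_kernel(x, y, h)
via_dlnorm <- dlnorm(y, meanlog = log(x) + h^2, sdlog = h)

# Locate the mode on a fine grid: it should be at (or next to) x.
y_grid <- seq(0.01, 5, by = 0.001)
mode_at <- y_grid[which.max(lognormal_kernel(x, y_grid, h))]
```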
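The reciprocal inverse Gaussian kernel is defined on $S_{x,h} = (0,\infty) = T$ with $x \in T$ and $h > 0$:

$$RIG_{x,h}(y) = \frac{1}{\sqrt{2\pi h y}}\exp\left\{-\frac{\zeta(x,h)}{2h}\left(\frac{y}{\zeta(x,h)} - 2 + \frac{\zeta(x,h)}{y}\right)\right\}\mathbf{1}_{S_{x,h}}(y),$$

where $\zeta(x,h) = (x^2 + xh)^{1/2}$.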
It is the probability density function of the classical reciprocal inverse Gaussian distribution with mean $1/\sqrt{x^2 + xh}$ and standard deviation $1/h$; see Igarashi and Kakizawa (2015) and also Libengué (2013).
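Base R has no built-in reciprocal inverse Gaussian density, so a quick way to check a hand-written kernel is to confirm that it integrates to 1. The sketch below assumes ζ(x, h) = sqrt(x^2 + xh) and the density form (2πhy)^(-1/2) exp{-(ζ/(2h))(y/ζ - 2 + ζ/y)}; `rig_kernel` is a hypothetical helper, not the package's kef.

```r
# Reciprocal inverse Gaussian kernel, zero for y <= 0.
rig_kernel <- function(x, y, h) {
  zeta <- sqrt(x^2 + x * h)
  out <- numeric(length(y))
  pos <- y > 0
  yp <- y[pos]
  out[pos] <- exp(-zeta / (2 * h) * (yp / zeta - 2 + zeta / yp)) /
    sqrt(2 * pi * h * yp)
  out
}

# Numerical check that the kernel is a density on (0, Inf).
total <- integrate(function(y) rig_kernel(2, y, h = 0.1),
                   lower = 0, upper = Inf)$value
```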
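The cross-validation technique implemented in cvbw is used for bandwidth selection. The optimal bandwidth is the one that minimizes the cross-validation function defined by

$$CV(h) = \int_{x \in T}\left\{\widehat{f}_n(x)\right\}^2 dx - \frac{2}{n}\sum_{i=1}^{n}\widehat{f}_{n,h,-i}(X_i),$$

where $\widehat{f}_{n,h,-i}(X_i) = \frac{1}{n-1}\sum_{j \neq i} K_{X_i,h}(X_j)$ is the density estimator computed without the observation $X_i$.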
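The cross-validation criterion can be sketched in base R for the gamma kernel. This is an illustration only, not the code behind cvbw: the kernel is assumed to be `dgamma` with shape 1 + x/h and scale h, the integral is approximated by the trapezoidal rule on a truncated grid, and the names `cv_score`, `bw` and `h_cv` are hypothetical.

```r
gamma_kernel <- function(x, y, h) dgamma(y, shape = 1 + x / h, scale = h)

cv_score <- function(h, data, grid) {
  n <- length(data)
  # First term: integral of f_hat^2 over the truncated support.
  fh <- sapply(grid, function(x0) mean(gamma_kernel(x0, data, h)))
  f2 <- fh^2
  int_f2 <- sum(diff(grid) * (head(f2, -1) + tail(f2, -1)) / 2)
  # Second term: leave-one-out estimates at each observation,
  # f_{-i}(X_i) = (1 / (n - 1)) * sum_{j != i} K_{X_i,h}(X_j).
  loo <- sapply(seq_len(n), function(i) mean(gamma_kernel(data[i], data[-i], h)))
  int_f2 - 2 * mean(loo)
}

set.seed(1)
X <- rgamma(100, shape = 1.5, rate = 2.6)
grid <- seq(0, 10, length.out = 1001)    # truncation of [0, Inf) (assumption)
bw <- seq(0.01, 0.5, length.out = 25)    # candidate bandwidths

cv <- sapply(bw, cv_score, data = X, grid = grid)
h_cv <- bw[which.min(cv)]                # bandwidth minimizing CV(h)
```

A grid search is used here for transparency; a one-dimensional optimizer such as `optimize` would be an alternative when the criterion is smooth in h.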
Chen, S. X. (1999). Beta kernel estimators for density functions, Computational Statistics and Data Analysis 31, 131-145.
Chen, S. X. (2000). Probability density function estimation using gamma kernels, Annals of the Institute of Statistical Mathematics 52, 471-480.
Libengué, F. G. (2013). Méthode Non-Paramétrique par Noyaux Associés Mixtes et Applications, Ph.D. Thesis Manuscript (in French) to Université de Franche-Comté, Besançon, France and Université de Ouagadougou, Burkina Faso, June 2013, LMB no. 14334, Besançon.
Igarashi, G. and Kakizawa, Y. (2015). Bias correction for some asymmetric kernel estimators, Journal of Statistical Planning and Inference 159, 37-63.
For a given sample, the function Conakereport automatically computes the normalizing constant and the smoothing parameter. The histogram can then be plotted.
Conakereport(Vec, ker, h = NULL, a = 0, b = 1)
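Argument | Description |
---|---|
Vec | The data sample. |
ker | The kernel function: "BE" extended beta, "GA" gamma, "LN" lognormal and "RIG" reciprocal inverse Gaussian. |
h | The bandwidth or smoothing parameter. |
a | The left bound of the support used for the extended beta kernel. The default value 0 corresponds to the beta kernel. |
b | The right bound of the support used for the extended beta kernel. The default value 1 corresponds to the beta kernel. |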
Returns a list containing:
Component | Description |
---|---|
h_n | The bandwidth parameter used to compute f_n. |
C_n | The normalizing constant. |
    ## Data can be simulated or real.
    ## Here we use simulated data.
    Vec <- rgamma(100, 1.5, 2.6)
    ## Not run:
    Conakereport(Vec, ker = "GA")
    ## End(Not run)
The function cvbw computes the optimal bandwidth by cross-validation. Four kernels are available: extended beta, gamma, lognormal and reciprocal inverse Gaussian.
cvbw(Vec, bw = NULL, ker,a=0,b=1)
Argument | Description |
---|---|
Vec | The data sample. |
bw | The sequence of bandwidths at which the cross-validation function is computed. If NULL, the procedure defines a sequence of bandwidths. |
ker | The associated kernel: "BE" extended beta, "GA" gamma, "LN" lognormal and "RIG" reciprocal inverse Gaussian. |
a | The left bound of the support used for the extended beta kernel. The default value 0 corresponds to the beta kernel. |
b | The right bound of the support used for the extended beta kernel. The default value 1 corresponds to the beta kernel. |
The selection of the bandwidth parameter is crucial. If the bandwidth is too small, the estimator is undersmoothed and highly variable. Conversely, if the bandwidth is too large, the estimator is oversmoothed and far from the function being estimated. See Libengué (2013).
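The effect of the bandwidth can be illustrated with a crude roughness measure. The sketch below is not part of the package. It assumes the gamma kernel written via `dgamma` (shape 1 + x/h, scale h) and uses the total variation of the estimate on a grid as a hypothetical proxy for how wiggly the curve is.

```r
gamma_kernel <- function(x, y, h) dgamma(y, shape = 1 + x / h, scale = h)
f_hat <- function(x, data, h) {
  sapply(x, function(x0) mean(gamma_kernel(x0, data, h)))
}

set.seed(1)
X <- rgamma(100, shape = 1.5, rate = 2.6)
grid <- seq(0, 3, length.out = 301)

# Total variation of the estimated curve on the grid.
roughness <- function(h) sum(abs(diff(f_hat(grid, X, h))))

rough_small <- roughness(0.002)  # very small bandwidth: spiky estimate
rough_large <- roughness(1)      # very large bandwidth: nearly flat estimate
```

Plotting `f_hat(grid, X, h)` for both bandwidths shows the same contrast visually.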
Returns a list containing:
Component | Description |
---|---|
hcv | The optimal bandwidth obtained by cross-validation. |
CV | The values of the cross-validation function over the sequence of bandwidths. |
bw | The sequence of bandwidths used. |
    ## Data can be simulated or real.
    ## Here we use simulated data
    ## and then compute the cross-validation.
    Vec <- rgamma(100, 1.5, 2.6)
    ## Not run:
    CV <- cvbw(Vec, ker = "GA")
    CV$hcv
    ## End(Not run)
The function dke estimates the density at a single value or on a grid using continuous associated kernels. Four associated kernels are available: extended beta, gamma, lognormal and reciprocal inverse Gaussian.
dke(vec_data, ker, bw, x = NULL,a=0,b=1)
Argument | Description |
---|---|
vec_data | The data sample. |
ker | The associated kernel: "BE" extended beta, "GA" gamma, "LN" lognormal and "RIG" reciprocal inverse Gaussian. |
bw | The bandwidth or smoothing parameter. |
x | The single value or grid at which the estimate is computed. |
a | The left bound of the support used for the extended beta kernel. The default value 0 corresponds to the beta kernel. |
b | The right bound of the support used for the extended beta kernel. The default value 1 corresponds to the beta kernel. |
The kernel estimator $\widehat{f}_n$ of $f$ is defined in the sections above. Recall that, in general, the integral of the estimate over the support is not equal to 1. In practice, we calculate the normalizing constant $C_n$ before computing the estimated density $\tilde{f}_n = \widehat{f}_n / C_n$; see Libengué (2013).
The bandwidth bw passed to the function can be obtained by cross-validation with cvbw for any of the four kernels.
Returns a list containing:
Component | Description |
---|---|
C_n | The normalizing constant. |
f_n | The values of the estimated function. |
    ## A data sample with n = 100.
    V <- rgamma(100, 1.5, 2.6)
    ## The bandwidth can be the one obtained by cross-validation.
    h <- 0.052
    ## We choose the gamma kernel.
    est <- dke(V, "GA", h)
    est$f_n
The function kef computes the continuous associated kernel function; see Chen (1999) and also Chen (2000).
kef(x, t, h, ker, a = 0, b = 1)
Argument | Description |
---|---|
x | The target. |
t | A single value or the grid where the continuous associated kernel function is computed. |
h | The bandwidth or smoothing parameter. |
ker | The associated kernel: "BE" extended beta, "GA" gamma, "LN" lognormal and "RIG" reciprocal inverse Gaussian. |
a | The left bound of the support used for the extended beta kernel. The default value 0 corresponds to the beta kernel. |
b | The right bound of the support used for the extended beta kernel. The default value 1 corresponds to the beta kernel. |
The associated kernel is one of the four defined in the sections above: extended beta, gamma, lognormal and reciprocal inverse Gaussian; see Igarashi and Kakizawa (2015) and also Libengué (2013).
Returns the value of the continuous associated kernel function at t according to the target and the bandwidth.
    x <- 4
    h <- 0.1
    t <- 0:10
    kef(x, t, h, "GA")
The function simp_int approximates an integral using Simpson's rule.
simp_int(x, fx, n.pts = 256, ret = FALSE)
Argument | Description |
---|---|
x | The vector of points over which the integral is computed. |
fx | The values of the function to integrate, evaluated at x. |
n.pts | The number of points used to compute the integral by Simpson's rule. |
ret | A boolean control parameter. Default value is FALSE. |
Returns the value of the integral.
    Vec <- rgamma(100, 1.5, 2.6)
    x <- seq(min(Vec), max(Vec), length.out = 100)
    simp_int(x, dgamma(x, 1.5, 2.6))
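For readers unfamiliar with Simpson's rule, a generic composite version can be sketched in a few lines of base R. The internals of simp_int are not shown in this document, so this is not its implementation. `simpson_sketch` is a hypothetical helper that takes a function and an interval rather than precomputed values.

```r
# Composite Simpson's rule on [a, b] with n equal subintervals (n must be even).
simpson_sketch <- function(f, a, b, n = 256) {
  stopifnot(n %% 2 == 0, n >= 2)
  x <- seq(a, b, length.out = n + 1)
  fx <- f(x)
  # Weights 1, 4, 2, 4, ..., 2, 4, 1 across the n + 1 nodes.
  w <- c(1, rep(c(4, 2), length.out = n - 1), 1)
  (b - a) / (3 * n) * sum(w * fx)
}

# Integral of sin over [0, pi]; the exact value is 2.
approx_sin <- simpson_sketch(sin, 0, pi)

# Simpson's rule is exact for cubics: integral of x^3 over [0, 2] is 4.
approx_cubic <- simpson_sketch(function(x) x^3, 0, 2, n = 2)
```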