Vignette for R package robRatio

About this package

The functions contained in this package are originally prepared for ratio imputation for official statistics. The ratio model is one of the popular tools for imputation, as the model can accomodate heteroscedastic datasets without any data transformation. On the other hand, the heteroscedastic error term of the ratio model prevents robustification by means of M-estimation.

The functions for ratio models obtain robustness by extracting the homoscedastic part from the heteroscedastic residuals. Furthermore, the conventional ratio model is generalized by parameterizing the relationship between the error term and the explanatory variable. For details, refer to the next section.

Generalization and robustification of the ratio model

Obtain homoscedastic quasi-residuals for M-estimation

The conventional ratio model is \(y_i = \beta x_i + \epsilon_i, \; (i=1, \ldots, n)\), where \(x_i\) is an exlanatory variable and \(y_i\) is a dependent variable. The error term \(\epsilon_i\) is heteroscedastic.

For robustification, the heteroscedastic error term \(\epsilon \sim N(0, x_i\sigma^2)\) was reformulated with the homoscedastic error \(\varepsilon \sim N(0, \sigma^2)\) as in the linear regression model.

Since these error terms have the relation, \(\varepsilon_i = \epsilon_i / \sqrt(x_i)\) (Cochran, 1977), the ratio model can be expressed as \(y_i = \beta x_i + \varepsilon_i \sqrt(x_i)\). This re-formulation yields homoscedastic quasi-residuals of the ratio model, which is obtained by dividing the residuals by the square root of \(x_i\).

Generalization

By replacing the error term \(\varepsilon_i \sqrt(x_i)\) with \(\varepsilon_i x_i^{\gamma}\), the ratio model was generalized. The new parameter \(\gamma\) controls the relationship between the error term and the explanatory variables. The generalized ratio model and its estimation equation is as follows:

\[ y_i = \beta x_i + x_i^\gamma \varepsilon_i. \]

\[ \hat{\beta} = \frac{\sum^{n}_{i=1} y_i x_i^{1-2 \gamma}}{\sum^n_{i=1} x_i^{2(1-\gamma)}}, \]

The quasi-residual \(\check{r}_i\) is,

\[ \check{r}_i = \frac{y_i - \hat{\beta} x_i}{x_i^\gamma}. \]

Robustification

The robustified estimator for the generalized ratio model is derived by means of M-estimation as follows:

\[ \hat{\beta}_{rob} = \frac{\sum{w_i y_i x_i^{1-2\gamma}}}{\sum{w_i x_i^{2 (1-\gamma)}}}, \]

where \(w_i\) is obtained using a weight function for quasi residuals \(\check{r}\). The role of the weight function is to alleviate influence of the observations with large residuals. The following two are the most popular functions among them. One is Tukey’s biweight function,

\[ w_i = w \left(\frac{\check{r}_i}{\hat{\sigma}} \right) = w(e_i) = \left\{ \begin{array}{ll} \left[1-\left( e_i / c\right)^2\right]^2 & |e_i| \le c \\ 0 & |e_i| > c, \end{array} \right. \\ \]

and the other is Huber’s weight function,

\[ w_i = w \left(\frac{\check{r}_i}{\hat{\sigma}} \right) = w(e_i) = \left\{ \begin{array}{ll} 1 & |e_i| \le k \\ k/|e_i| & |e_i| > k. \end{array} \right. \\ \]

The standardised residuals \(e_i\) are quasi-residuals \(\check{r}_i\) divided by an estimated scale parameter \(\hat{\sigma}\). Average absolute deviation (AAD) or median absolute deviation (MAD) is used as the scale parameter, and the tuning constants are shown in the section “About scales of residuals” below. Quasi-residuals \(\check{r}_i\) based on the homoscedastic quasi-error term \(\varepsilon_i\) are obtained by

\[ \check{r}_i = \frac{y_i - \hat{\beta}_{rob} x_i}{x_i^\gamma}. \]

Estimation

The iterative re-weighted least squares (IRLS) algorithm is used for estimation regarding both models (i.e. regression model and ratio models). See Bienias et al.(1997) and other paper in the reference section.

Functions included in this package

This package contains two parent functions and an experimental function without any child functions.
The parent functions are robRatio() for ratio models and robReg() for multivarilate linear regression models.

Both robRatio() and robreg() have two child functions. One uses Tukey’s biweight function and the other, Huber’s weight function to calculating robust weight.

The experimental one is robGR(). It estimates two parameters of the generalized ratio model simultaneously.

The following table shows the relationship of each function with their models, weight functions, and scales of residuals.

Model	Parent function	Child function	Weight function	Scale for residuals	Gamma value
Generalized Ratio	robRatio	RrT.aad	Tukey’s biweight	AAD	Specified by user
Generalized Ratio	robRatio	RrT.mad	Tukey’s biweight	MAD	Specified by user
Generalized Ratio	robRatio	RrH.aad	Huber weight	AAD	Specified by user
Generalized Ratio	robRatio	RrH.mad	Huber weight	MAD	Specified by user
Regression	robReg	Tirls.aad	Tukey’s biweight	AAD	Not applicable
Regression	robReg	Tirls.mad	Tukey’s biweight	MAD	Not applicable
Regression	robReg	Hirls.aad	Huber weight	AAD	Not applicable
Regression	robReg	Hirls.mad	Huber weight	MAD	Not applicable
Generalized Ratio	robGR	-	Tukey’s biweight	AAD	Estimate simultaneously

About scales of residuals

The parent functions, robRatio() and robReg() has a tuning parameter tp. It accepts only 4, 6, or 8; however, actual figures of tuning costant differs depending on the weight function and scale of residuals.

The following table shows the relation of the tuning parameter tp of robRatio(), robReg(), and their child functions’ c1. The parameter tp of the parent functions is standardized based on Tukey’s biweight function with AAD scale so that users are not bothered by differences in weight functions and scales. If users would like to try tuning constants other than the given values (tp=4, 6, or 8), use the child functions (RrT(), RrH(), Tirls(), and Hirls()) directly and set the actual value of the tuning constant for c1. Please note that the return value of R’s mad() is standardized according to the SD(standard deviation), so choose the value for SD scale for c1 when using the MAD scale.

weight	scale	tp=4	tp=6	tp=8
Tukey	SD	c1=5.01	c1=7.52	c1=10.03
Tukey	AAD	c1=4	c1=6	c1=8
Tukey	MAD	c1=3.38	c1=5.07	c1=6.76
Huber	SD	c1=1.44	c1=2.16	c1=2.88
Huber	AAD	c1=1.15	c1=1.72	c1=2.30
Huber	MAD	c1=0.97	c1=1.46	c1=1.94

The IRLS algorithm is based on Bienias et al. (1997). The tuning parameter ‘tp’ should be set between tp=4(more robust) and tp=8(less robust). For the child functions’ ‘c1’, values within the range shown in the table above are expected but not limited.

Reference

Bienias, J. L., Lassman, D. M. Scheleur, S. A. and Hogan H. (1997) Improving Outlier Detection in Two Establishment Surveys. Statistical Data Editing 2 - Methods and Techniques. (UNSC and UNECE eds.), pp.76-83. URL: https://digitallibrary.un.org/record/240780?v=pdf

Cochran, W. G. (1977) Sampling Techniques, 3rd ed., John Wiley & Sons.

Wada, K (2012). Detection of Multivariate Outliers — Regression Imputation by the Iteratively Reweighted Least Squares | (in Japanese). Research Memoir of Official Statistics,Vol.69, pp.23–52. URL https://www.stat.go.jp/training/2kenkyu/ihou/69/pdf/2-2-692.pdf.

Wada, K, Noro, T (2019). Consideration on the influence of weight functions and the scale for robust regression estimator (in Japanese).” Research Memoir of Official Statistics, Vol.76, pp.101–114. URL https://www.stat.go.jp/training/2kenkyu/ihou/76/pdf/2-2-767.pdf.

Wada,K, Sakashita, K. and Tsubaki, H. (2021). Robust Estimation for a Generalised Ratio Model. Austrian Journal of Statistics, Vol.50, pp.74-–87. URL https://doi.org/10.17713/ajs.v50i1.994

Vignette for R package `robRatio`

WADA Kazumi