Error in the previous version of FA_residuals()

May 03, 2016

In the spirit of reproducible science, this is a post about an error I fixed in a function that affects all prior analyses with that.

When factor analyzing data, the goal is to reveal a latent structure in the dataset. Given various assumptions, factor analysis will find a structure if there is one. It is possible that a given case does not fit the structure well, however. This case is said to be a structural outlier or mixed. I devised a few methods for measuring this phenomenon. One of them, mean absolute residual (MAR), consists simply of regressing the indicator scores on the factor scores. For each such regression, one then saves the residuals for each case. Then one calculates the mean of the absolute residuals by case. This is thus a measure of how well one can predict the indicator scores of a given case from its factor scores. Cases that are hard to predict like this do not fit the factor structure very well.

When I did this, I assumed that all the residuals have the same scale. This, however, was not the case. Their scale depends on the scale of the indicators, so if these vary, some indicators will be given more weight in the MAR score than others. This is probably unintended, altho perhaps one should weigh the residuals by the indicators' factor loadings.

When I discovered the error, I corrected it. The residuals are now standardized by default, but one can turn it off if desired (if one wants to weigh the indicators by their SDs).

Here's a reproducible example showing the error. First we load some useful data into the object d. Then:

> d_resids = FA_residuals(d)
> d_resids_unstd = FA_residuals(d, standardize = F)
> #examine SDs
> sapply(d_resids_unstd, sd)
 reason.4 reason.16 reason.17 reason.19  letter.7 letter.33 letter.34 letter.58 matrix.45 matrix.46 matrix.47 matrix.55 
0.7936387 0.8767075 0.8088255 0.8550311 0.8233802 0.8581720 0.8098030 0.8116359 0.9001512 0.8774535 0.8641818 0.9333102 
 rotate.3  rotate.4  rotate.6  rotate.8 
0.8355776 0.8047172 0.8189634 0.8591117 
> sapply(d_resids, sd)
 reason.4 reason.16 reason.17 reason.19  letter.7 letter.33 letter.34 letter.58 matrix.45 matrix.46 matrix.47 matrix.55 
        1         1         1         1         1         1         1         1         1         1         1         1 
 rotate.3  rotate.4  rotate.6  rotate.8 
        1         1         1         1

We see the problem. The residuals do not have the same SDs. The differences in this case are fairly small, but they need not be. Then we calculate the MAR scores and also the other mixedness scores. And finally we correlate them:

> v_MAR = apply(d_resids, MARGIN = 1, FUN = function(row) mean(abs(row)))
> v_MAR_unstd = apply(d_resids_unstd, MARGIN = 1, FUN = function(row) mean(abs(row)))
> #compare with the other mixedness measures
> d_mixed = FA_mixedness(d)
> #add unstd MAR
> d_mixed$MAR_unstd = v_MAR_unstd
> #correlate
> wtd.cors(d_mixed) %>% round(2)
            MAR   CFS  ACFS MeanALC MaxALC MAR_unstd
MAR        1.00  0.89 -0.31   -0.52  -0.32      1.00
CFS        0.89  1.00 -0.61   -0.70  -0.50      0.89
ACFS      -0.31 -0.61  1.00    0.23   0.20     -0.32
MeanALC   -0.52 -0.70  0.23    1.00   0.83     -0.52
MaxALC    -0.32 -0.50  0.20    0.83   1.00     -0.32
MAR_unstd  1.00  0.89 -0.32   -0.52  -0.32      1.00

In fact the correlations round to 1.00 in this case. Hopefully, they will be equally strongly correlated in other cases.

Just Emil Kirkegaard Things

Discussion about this post