In summary, the sensitivity of a test measures how accurately it identifies who does and who does not have a given level of ability (IQ). The more evenly spaced the ordinal rankings are, and the longer the slope stretches between its minimum and maximum, the more evenly the test measures and discriminates ability. One group's curve can be shifted up, down, left or right relative to the other groups' curves, indicating a lower or higher floor and ceiling, i.e., a lower or higher threshold at which correct responses are attained. A penalty weighting factor or scheme can be applied to each group to correct for the heteroskedasticity of the curve. An integral can give the cumulative probability distribution. Equalizing results across the whole test, by assigning penalty transformation functions to some items or a general weighting scheme to all items, is only necessary if the cumulative bias across all test items produces a heavily skewed final distribution. A gangster might be highly accurate on some specific items where an educated person might not be, and a woman's diet vocabulary might be more advanced than a man's because of earlier and greater exposure. Such shifts in the curve, and the parametrization of the latent variables with respect to the shape of the functions, must be considered individually (i.e., exposure to items, consideration of the concept space, implied meaning, and differential preferences affecting skewness, shifts, centrality and kurtosis).
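
One way to make the "shifted curve" idea concrete, assuming a two-parameter logistic (2PL) item model rather than anything the comment itself specifies, is to draw each group's item characteristic curve and integrate the gap between them. This is the idea behind Raju's signed and unsigned area measures. The item parameters below are hypothetical. For 2PL items the signed area over the whole ability range equals the difficulty difference b_foc - b_ref, so a uniform shift shows up in both measures, while crossing curves (non-uniform DIF) largely cancel out in the signed area but not in the unsigned one.

```python
import math


def icc_2pl(theta: float, a: float, b: float) -> float:
    """2PL item characteristic curve: P(correct | theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def area_between_iccs(ref, foc, lo=-6.0, hi=6.0, n=2000):
    """Signed and unsigned area between two groups' ICCs (trapezoid rule).

    ref and foc are (a, b) tuples for the reference and focal group.
    A positive signed area means the item is easier for the reference group.
    The integral is truncated to [lo, hi], so results are approximate.
    """
    step = (hi - lo) / n
    diffs = [
        icc_2pl(lo + i * step, *ref) - icc_2pl(lo + i * step, *foc)
        for i in range(n + 1)
    ]
    pairs = list(zip(diffs, diffs[1:]))
    signed = sum((d0 + d1) / 2 * step for d0, d1 in pairs)
    unsigned = sum((abs(d0) + abs(d1)) / 2 * step for d0, d1 in pairs)
    return signed, unsigned


# Uniform shift: the focal group's curve sits 0.5 logits to the right.
shift_signed, shift_unsigned = area_between_iccs(ref=(1.2, 0.0), foc=(1.2, 0.5))
# Crossing curves: same difficulty, different slopes.
cross_signed, cross_unsigned = area_between_iccs(ref=(1.5, 0.0), foc=(0.8, 0.0))

print(f"shifted:  signed={shift_signed:.3f}, unsigned={shift_unsigned:.3f}")
print(f"crossing: signed={cross_signed:.3f}, unsigned={cross_unsigned:.3f}")
```

The truncation to [-6, 6] is a convenience choice; it loses a small amount of area in the tails, which is why the shifted case comes out slightly under 0.5.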

But the factorization of those outcome variables must be reducible to latent ability factors (e.g., verbal, spatial, mathematical) plus some specificity factors (e.g., culture, sexual dimorphism, group preferences), with the former being nontrivial in measuring the final variable and the latter requiring judgement.
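
A minimal numeric illustration of that decomposition, assuming a linear factor model with standardized, mutually orthogonal factors (an assumption for the sketch; real IQ batteries usually have correlated or hierarchical factors). Under that assumption a standardized item's variance splits into a common part (sum of squared loadings on the latent abilities), a specific part (squared loadings on group-specific factors) and residual error. The loadings are made up for illustration.

```python
def decompose_item_variance(common_loadings, specific_loadings):
    """Split a standardized item's variance into common, specific and error parts.

    Assumes orthogonal, unit-variance factors, so each part is a sum of
    squared loadings and the three parts add up to 1.
    """
    common = sum(loading ** 2 for loading in common_loadings.values())
    specific = sum(loading ** 2 for loading in specific_loadings.values())
    error = 1.0 - common - specific
    if error < 0:
        raise ValueError("loadings imply more than 100% explained variance")
    return {"common": common, "specific": specific, "error": error}


# Hypothetical loadings for one vocabulary-type item.
parts = decompose_item_variance(
    common_loadings={"verbal": 0.6, "spatial": 0.2},
    specific_loadings={"culture": 0.3},
)
print({k: round(v, 2) for k, v in parts.items()})
# → {'common': 0.4, 'specific': 0.09, 'error': 0.51}
```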

I would also add that an item displaying DIF is not, in itself, necessarily biased against a group. The distinction is subtle, though. Bias means that the source, i.e. the cause, of the DIF is the group variable (e.g., gender or race). Bias is always bad; DIF is not necessarily. If a test is designed to be multidimensional (i.e., measuring two or more abilities) and your DIF method did not account for multidimensionality, then DIF is expected, but group membership has nothing to do with the source of the DIF.
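
The multidimensionality point can be illustrated with a small simulation. The setup is an assumption for the sketch, not anything from the comment: a compensatory two-dimensional logistic model, 20 hypothetical anchor items that measure only the primary ability, and the Mantel-Haenszel common odds ratio (stratified on anchor score) as a unidimensional DIF statistic. Item parameters are identical in every group; only the mean of the secondary ability differs. The studied item is then flagged even though it functions the same way for everyone at equal levels of both abilities.

```python
import math
import random

# Hypothetical anchor-item difficulties: 20 items measuring only the primary ability.
ANCHOR_B = [-2.0 + 4.0 * k / 19 for k in range(20)]


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def simulate_group(n: int, theta2_mean: float, rng: random.Random):
    """Return (anchor_score, studied_correct) pairs for one group.

    Item parameters are the same for every group; only the mean of the
    secondary ability theta2 is allowed to differ.
    """
    rows = []
    for _ in range(n):
        theta1 = rng.gauss(0.0, 1.0)          # primary ability, same in all groups
        theta2 = rng.gauss(theta2_mean, 1.0)  # secondary ability
        score = sum(rng.random() < logistic(theta1 - b) for b in ANCHOR_B)
        studied = rng.random() < logistic(theta1 + theta2)  # loads on both abilities
        rows.append((score, studied))
    return rows


def mantel_haenszel_or(ref_rows, foc_rows):
    """Mantel-Haenszel common odds ratio, stratified on anchor score.

    Values above 1 mean the item favours the reference group at equal anchor score.
    """
    strata = {}
    for group, rows in (("ref", ref_rows), ("foc", foc_rows)):
        for score, correct in rows:
            cell = strata.setdefault(score, {"A": 0, "B": 0, "C": 0, "D": 0})
            if group == "ref":
                cell["A" if correct else "B"] += 1
            else:
                cell["C" if correct else "D"] += 1
    num = den = 0.0
    for c in strata.values():
        total = c["A"] + c["B"] + c["C"] + c["D"]
        num += c["A"] * c["D"] / total
        den += c["B"] * c["C"] / total
    return num / den


rng = random.Random(2023)
reference = simulate_group(4000, 0.0, rng)
focal_equal = simulate_group(4000, 0.0, rng)   # same secondary-ability mean
focal_lower = simulate_group(4000, -1.0, rng)  # lower secondary-ability mean
or_null = mantel_haenszel_or(reference, focal_equal)
or_dif = mantel_haenszel_or(reference, focal_lower)
print(f"MH odds ratio, equal secondary ability: {or_null:.2f}")
print(f"MH odds ratio, lower secondary ability: {or_dif:.2f}")
```

In the second comparison the odds ratio moves well away from 1 purely because the matching variable ignores the second dimension; adding a secondary-ability measure to the matching (or fitting a multidimensional model) is the usual remedy.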

I didn't know DeMars had a book. She published one of the most important papers on DIF. Each of these introduces critical points about, and problems with, DIF methods that are not widely known.

Most books on IRT seem to be overly autistic and essentially about presenting a lot of difficult math for models that mostly aren't used (e.g., 3PL and further variations), but this book is actually what it says it is: an accessible introduction.

Edited Feb 27, 2023. A relevant post on cultural bias on IQ tests: https://thealternativehypothesis.org/index.php/2016/04/15/cultural-bias-on-iq-tests/

Good presentation.

Very cool. You might want to try just diving right in and writing the whole document in R Markdown, so you get nicely formatted code, output and figures in an HTML document.

It's not compatible with WordPress or Substack, though. But the R Markdown source is of course public for the few nerds who want to look at the code.