Oral
Asamoah, N.A., Turner, R.C., Lo, W., Crawford, B.L., & Jozkowski, K.N. (2023, April 12-15). Evaluating Rasch tree purification to improve DIF detection in unbalanced item conditions. Presentation at the National Council on Measurement in Education Annual Meeting, Chicago, IL.
Introduction
Rasch trees for differential item functioning (DIF) provide a strong advantage over more common pairwise comparison methods because multiple covariates can be investigated without the pre-identification of subgroups and the interaction of the covariates can be investigated (Strobl et al., 2015; Vaughn & Wang, 2010).
Initially, the procedure was used as a global DIF method in which specific items were not flagged as being responsible for violations of measurement invariance. In 2022, Henninger et al. recommended adding a Mantel-Haenszel (MH) effect size criterion to provide a mechanism for identifying specific DIF items within Rasch tree splits. A follow-up study (author, 2022) applied MH effect size heuristics with Rasch trees under varying conditions of DIF item imbalance (e.g., completely unbalanced, slightly unbalanced, balanced). That study also used the complete tree solution as the outcome rather than the number of DIF items correctly flagged. Results indicated that for most conditions the percentage of Rasch tree splits was high but the percentage of correct DIF item identification was low. In this study, we build on those results by investigating whether purification increases the rate of solutions that identify only the true DIF items. We address the following research questions:
1. How effective is an iterative purification procedure in increasing the proportion of solutions that flag only the true DIF items?
2. Does the effectiveness of the purification procedure differ by level of DIF contamination balance/imbalance, percentage of DIF contamination, DIF magnitude, sample size, and number of items?
Methods
A simulation study was conducted to investigate the research questions of interest. Five factors were manipulated: number of items (10, 20), total sample size (400, 800, 1200), magnitude of DIF (0, .21, .43, .64, .85; Henninger et al., 2022), percentage of DIF items (10%, 20%, 30%), and level of DIF balance (completely unbalanced – 100% vs 0%; slightly unbalanced – 67% vs 33% or 75% vs 25%; balanced – 50% vs 50%). Fixed conditions included dichotomous outcomes, dichotomous covariates, and equal group sample sizes. Item difficulty parameters and participant abilities were both drawn from N(0, 1).
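As an illustration of the data-generating design described above, the following is a minimal Python sketch of one simulated replication. It is not the authors' simulation code. The function name `simulate_rasch_dif`, its parameters, and the choice to apply the full DIF shift to the second (focal) group's item difficulty are assumptions made for illustration; the number of DIF items favoring each group is passed in explicitly so that any balance condition can be expressed.

```python
import math
import random


def simulate_rasch_dif(n_items, n_total, dif_size, n_favor_ref, n_favor_focal, seed=0):
    """Simulate dichotomous Rasch responses for two equal-sized groups.

    Hypothetical helper: the first n_favor_ref items are made harder for the
    focal group by dif_size logits, and the next n_favor_focal items are made
    easier for the focal group by the same amount (an assumed parameterization).
    """
    rng = random.Random(seed)

    # Item difficulties drawn from N(0, 1), as in the described design.
    difficulties = [rng.gauss(0.0, 1.0) for _ in range(n_items)]

    # Per-item difficulty shift that applies to the focal group only.
    shifts = [0.0] * n_items
    for j in range(n_favor_ref):
        shifts[j] = dif_size
    for j in range(n_favor_ref, n_favor_ref + n_favor_focal):
        shifts[j] = -dif_size

    # Dichotomous covariate with equal group sizes (0 = reference, 1 = focal).
    half = n_total // 2
    groups = [0] * half + [1] * (n_total - half)

    responses = []
    for g in groups:
        theta = rng.gauss(0.0, 1.0)  # person ability from N(0, 1)
        row = []
        for b, s in zip(difficulties, shifts):
            p_correct = 1.0 / (1.0 + math.exp(-(theta - (b + s * g))))
            row.append(1 if rng.random() < p_correct else 0)
        responses.append(row)
    return responses, groups, shifts


# Example: 20 items, N = 400, 30% DIF items (6), completely unbalanced, DIF = .64
responses, groups, shifts = simulate_rasch_dif(20, 400, 0.64, n_favor_ref=6, n_favor_focal=0)
```

In this sketch a slightly unbalanced condition would be written, for example, as `n_favor_ref=4, n_favor_focal=2` (67% vs 33%) and a balanced condition as `n_favor_ref=3, n_favor_focal=3`.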
An iterative purification procedure was used to select an uncontaminated anchor set and then retest previously DIF-identified items with the updated anchor set until a solution was reached or no solution could be identified. During the purification process, three outcomes were recorded to determine whether purification increased the total correct solution rate: correct solution, incorrect solution, and inconclusive. The inconclusive classification was recorded when a final solution could not be obtained using the purification process.
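The general shape of an iterative purification loop, and of the three-way outcome classification, can be sketched as follows. This is a generic illustration under stated assumptions, not the procedure used in the study: the names `purify`, `classify`, `toy_flag_dif`, and `TRUE_SHIFT` are hypothetical, each pass retests every item against the current anchor set, the iteration cap is arbitrary, and the toy flagging rule (item shift minus mean anchor shift, compared with a threshold) merely stands in for a real DIF test such as a Rasch tree with an MH effect size criterion.

```python
from typing import Callable, Iterable, Optional, Sequence, Set


def purify(items: Sequence[int],
           flag_dif: Callable[[Sequence[int], Sequence[int]], Iterable[int]],
           max_iter: int = 10) -> Optional[Set[int]]:
    """Iterate until the flagged set stops changing; None means inconclusive."""
    flagged = set(flag_dif(items, items))  # single-step result: all items act as anchors
    for _ in range(max_iter):
        anchor = [i for i in items if i not in flagged]
        if not anchor:
            return None  # no uncontaminated anchor set is left
        retested = set(flag_dif(anchor, items))
        if retested == flagged:
            return flagged  # stable solution reached
        flagged = retested
    return None  # did not stabilize within max_iter passes


def classify(solution: Optional[Set[int]], true_dif: Set[int]) -> str:
    """Map a purification result onto the three recorded outcomes."""
    if solution is None:
        return "inconclusive"
    return "correct" if solution == set(true_dif) else "incorrect"


# Toy stand-in for a DIF test: items 0-2 carry DIF of .85, the rest only small noise.
TRUE_SHIFT = [0.85, 0.85, 0.85, 0.0, 0.05, -0.05, 0.1, -0.1, 0.02, -0.02]


def toy_flag_dif(anchor, candidates, threshold=0.3):
    # A contaminated anchor biases the baseline, which can cause false flags.
    baseline = sum(TRUE_SHIFT[i] for i in anchor) / len(anchor)
    return [i for i in candidates if abs(TRUE_SHIFT[i] - baseline) > threshold]


items = list(range(10))
single_step = set(toy_flag_dif(items, items))
purified = purify(items, toy_flag_dif)
single_step_outcome = classify(single_step, {0, 1, 2})
purified_outcome = classify(purified, {0, 1, 2})
```

In this toy example the single-step pass also flags two DIF-free items because the unbalanced DIF items shift the baseline, whereas the purified anchor set recovers exactly the three DIF items. The example is constructed to show the mechanism and says nothing about how often this occurs with real Rasch tree solutions.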
Abbreviated Findings
For the abbreviated findings, we focus on items simulated with DIF effect sizes of .64 and .85, which approximate large DIF magnitudes under the 1.5 and 2.0 effect size criteria. In the pre-purification stage, applying Rasch trees with the 1.5 criterion along with statistical testing resulted in high DIF detection rates in the large DIF conditions. The proportion of Rasch tree solutions identifying a significant split was at least .69 for 10 items and .88 or higher for 20 items. However, when the solutions were checked to determine whether only the correct items were flagged for DIF in the large DIF conditions, the true detection rates dropped to a range of .05 to .98 across the 10-item conditions and .001 to .96 across the 20-item conditions. The results indicated that the percentage of DIF contamination, the proportional balance of DIF items favoring different groups, and sample size played a large role in correct DIF detection, with the lowest accuracy rates occurring with small samples, 30% DIF contamination, and completely unbalanced DIF contamination. Figures summarizing results are provided.
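One way to see how the simulated DIF magnitudes relate to the 1.5 and 2.0 criteria is to assume that the criteria are on the ETS delta scale, where the MH effect size is -2.35 times the natural log of the MH common odds ratio. Under the Rasch model the log odds ratio corresponds to the difference in item difficulty between groups, so the absolute delta is about 2.35 times the logit difference. The sketch below checks that arithmetic; the assumption that the study's criteria are ETS delta values, and the function name `ets_delta_from_logit`, are not taken from the source.

```python
ETS_SCALING = 2.35  # constant in the ETS delta transformation of the MH log odds ratio


def ets_delta_from_logit(logit_difference: float) -> float:
    """Absolute ETS delta implied by a between-group difficulty difference in logits."""
    return ETS_SCALING * abs(logit_difference)


# DIF magnitudes listed in the simulation design, converted to the delta scale.
implied_delta = {d: ets_delta_from_logit(d) for d in (0.21, 0.43, 0.64, 0.85)}
for logit, delta in implied_delta.items():
    print(f"logit difference {logit:.2f} -> |delta| = {delta:.2f}")
```

Under this assumption, .64 and .85 logits fall almost exactly on 1.5 and 2.0, and .21 and .43 fall near 0.5 and 1.0.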
We are now applying the purification process to the analyses. We began with the 20-item condition with completely unbalanced DIF because that subset of results had the lowest correct solution rates when using the Rasch tree. Our findings indicate that, for the examined conditions, the purification method yields higher true detection rates than the original single-step application of the Rasch tree. However, all conditions will have to be examined before definitive conclusions can be drawn.
Practical Implications
The improvement in correct DIF detection using purification with Rasch trees is encouraging, and it is expected given prior results showing that iterative purification improves accuracy with other DIF procedures. However, most simulation studies of Rasch trees focus on percent splits, percent DIF items, or true and false positives at the item level and do not evaluate whether the final solution is the correct solution, which is important given the global approach of this method. Therefore, this study highlights a caution about single-step Rasch trees that is not widely publicized and offers one possible option for minimizing inaccurate outcomes.