Sample
A cluster sampling method was used in this study. Data were collected from children in nine public schools and child development centers in two states (California and South Carolina) involved in a funded grant project investigating universal screening. Parents provided consent for participation in the project; however, the method of consent varied by site. In California, parents were asked to give informed consent, whereas South Carolina sites used passive consent procedures. Both sites sent hard copies of the consent forms home with children along with their classroom work.
The children were nested within 91 classrooms. Hard copies of the PSC-17 forms were sent home for parents to complete. Forms were distributed in Spanish or English, depending on the home language. Parents were not compensated for participation; however, small incentives (e.g., pencils and stickers) were given to children. Families were not given individual feedback but were encouraged to contact the project investigators with any questions or for individualized information. Bilingual project staff were available to assist Hispanic families with questions and concerns.
PSC-17 parent ratings were combined across three academic years (2016–17, 2017–18, 2018–19). The sample consists of 1,305 ratings of children aged 3 to 6 years. This sample size is adequate relative to the recommended range of 300 to 460 cases, which takes into account the number of indicators and factors, the magnitude of factor loadings and path coefficients, and the amount of missing data (Wolf et al., 2013). Institutional Review Board approval and informed consent were obtained before data collection, and ethical standards for the treatment of participants were followed during data collection and analysis.
Female children (n = 591, 47.1%) and male children (n = 664, 52.9%) were approximately evenly distributed in the sample. The children rated by parents were predominantly Hispanic (53.6%), followed by White (30.6%), African American (12.0%), and other racial/ethnic groups, including Asian American, Pacific Islander/Native Hawaiian, American Indian/Alaska Native, and multiracial backgrounds (3.8%). The sample included children from different grade levels: Pre-kindergarten (50.1%), 5-year-old Kindergarten (40.4%), and Grade 1 (9.5%). Grade level was coded as a dichotomous variable: 50.1% of sampled children (n = 618) were from Pre-kindergarten, and 49.9% (n = 615) were from Kindergarten to Grade 1.
Statistical Analysis
All analyses were conducted using Mplus 8.4 software (Muthén & Muthén, 1998–2015). The weighted least squares with mean and variance adjusted (WLSMV) estimation method was chosen to accommodate the categorical nature of the PSC-17 data and to address non-normality (Finney & DiStefano, 2013). This method is the Mplus default for categorical data (Muthén & Muthén, 1998–2015). Missing data ranged from 0.3% to 2.9% across the 17 items. Given the low proportion of missing data, pairwise deletion was used, meaning that cases were excluded only if they had missing data on the variables involved in a given analysis. This approach is recommended for categorical data analysis because it maximizes case inclusion (DiStefano et al., 2017). Additionally, the nesting structure of the data (i.e., students nested within classrooms) was accounted for using a design effect to obtain more accurate standard errors for parameter estimates (Stapleton, 2013).
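For readers unfamiliar with these settings, a minimal sketch of how they might be specified in an Mplus input file is shown below. The variable names (e.g., i1-i17 for the items, class for the classroom identifier) and the missing-value code are placeholders rather than the study's actual file layout, and TYPE = COMPLEX is shown as one common way of obtaining cluster-robust standard errors when the design effect indicates non-negligible nesting.

VARIABLE:
  NAMES = id class female grade aa hisp other i1-i17;   ! placeholder variable list
  USEVARIABLES = i1-i17;
  CATEGORICAL = i1-i17;          ! items treated as ordered categorical
  CLUSTER = class;               ! children nested within classrooms
  MISSING = ALL (-999);          ! placeholder missing-data code
ANALYSIS:
  TYPE = COMPLEX;                ! standard errors adjusted for clustering
  ESTIMATOR = WLSMV;             ! Mplus default estimator for categorical data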
As the three-factor structure of the PSC-17 used in school settings had been identified previously, we did not perform an exploratory factor analysis. Confirmatory factor analysis (CFA) was conducted to test whether the three-factor solution was appropriate in the current study, consistent with previously established CFA models for teacher ratings of the PSC-17 in school settings (DiStefano et al., 2017; Liu et al., 2020b; Gao et al., 2022) and for the parent-rated PSC-17 in clinical settings (Chaffin et al., 2017; Murphy et al., 2016; Stoppelbein et al., 2012). The proposed model comprises three intercorrelated subscales: Internalizing Problems, Attention Problems, and Externalizing Problems.
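A minimal sketch of the corresponding three-factor measurement model in Mplus syntax follows; the item-to-factor assignments shown here are illustrative placeholders rather than the exact PSC-17 item numbering.

MODEL:
  INT BY i1-i5;          ! Internalizing Problems (illustrative item set)
  ATT BY i6-i10;         ! Attention Problems (illustrative item set)
  EXT BY i11-i17;        ! Externalizing Problems (illustrative item set)
  ! Factor covariances (INT WITH ATT, etc.) are estimated by default,
  ! yielding the three intercorrelated subscales.
OUTPUT:
  STDYX RESIDUAL;        ! standardized estimates and residual information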
To identify items exhibiting uniform differential item functioning (DIF), a Multiple Indicators Multiple Causes (MIMIC) model, defined as a CFA model with covariates (Brown, 2015), was used. This model comprises a measurement model defining the relationships between indicators and latent variables (established at the CFA stage) and a structural model specifying the direct effects of covariates on item responses and latent factors (Jöreskog & Sörbom, 1996). MIMIC modeling assesses measurement invariance by allowing direct paths from grouping variables to observed variables (Kim et al., 2012). Uniform DIF occurs when the focal group consistently performs differently from the reference group after controlling for the level of the scale score (Scott et al., 2009). Potential uniform DIF items in the parent-rated PSC-17 were identified using children's gender (0 = male, 1 = female), grade level (3K to 4K = 0, 5K to Grade 1 = 1), African American race/ethnicity (0 = no, 1 = yes), Hispanic ethnicity (0 = no, 1 = yes), and other racial/ethnic groups (0 = no, 1 = yes) as covariates. Male children, children in 3K to 4K, and White children served as the reference categories. Item responses were regressed onto the grouping variables to determine whether members of different groups varied in the probability of endorsing an item response option after controlling for their level on the latent variables (Finch, 2005). MIMIC models were also used to detect group differences in the latent traits by regressing each latent trait onto the covariates while assuming that the hypothesized structure is invariant across groups (Green & Thompson, 2012). If the measurement invariance test identified items exhibiting uniform DIF, MIMIC modeling was used to examine whether latent mean comparisons across groups might be biased.
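Continuing the illustrative Mplus sketch, the structural part of a MIMIC model regresses the latent factors on the dummy-coded covariates; a freed item-level path of the kind used to probe uniform DIF is shown as a comment. Covariate and item names are placeholders.

MODEL:
  INT BY i1-i5;
  ATT BY i6-i10;
  EXT BY i11-i17;
  INT ATT EXT ON female grade aa hisp other;   ! covariate effects on the latent factors
  ! A direct effect freed to test uniform DIF for one item, e.g.:
  ! i4 ON female;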
Following the approach used by Kim et al. (2012), a baseline MIMIC model was first constructed. In this baseline model, all latent variables identified in the CFA were simultaneously regressed on all covariates, with no direct effects of covariates on the PSC-17 items (i.e., the direct effects of covariates on all items' difficulty were constrained to zero). Next, the baseline model was compared with relaxed MIMIC models in which one direct effect of a covariate on a PSC-17 item was added at a time, while retaining the direct effects of all covariates on each latent variable (i.e., the direct effect of one covariate on one item's difficulty was freely estimated). To compare these nested models, WLSMV model chi-square difference tests were conducted using the Mplus DIFFTEST feature. A significant chi-square difference with one degree of freedom between the baseline model and the less constrained model would indicate uniform DIF for the given item.
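Because WLSMV chi-square values cannot be differenced by hand, the DIFFTEST procedure in Mplus is run in two steps: the less restrictive (relaxed) model is estimated first and its derivatives are saved, and the constrained baseline model is then re-estimated against that file. The sketch below uses hypothetical file and path names.

! Step 1: relaxed model with one direct effect freed (e.g., i4 ON female)
SAVEDATA:
  DIFFTEST = deriv.dat;

! Step 2: baseline model with that direct effect constrained to zero
ANALYSIS:
  ESTIMATOR = WLSMV;
  DIFFTEST = deriv.dat;

The resulting chi-square difference with one degree of freedom is then compared against the (Oort-adjusted) critical value described in the following paragraph.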
Because high Type I error rates have been reported when MIMIC modeling is used to identify noninvariant items (Kim et al., 2012), the Oort adjustment to the chi-square difference test was used to control Type I error inflation (i.e., the false identification of uniform DIF for invariant items in the model comparison; Kim et al., 2012; Oort, 1998). Oort's formula, stated as \({K}^{\prime}=[{\chi}_{0}^{2}/(K+{df}_{0}-1)]\times K\), adjusts the critical chi-square value to account for potential model misspecification in the baseline MIMIC model when analyzing categorical items. In the formula, \({K}^{\prime}\) is the adjusted critical value for the chi-square difference test, and K is the nominal critical value for the chi-square difference test (e.g., 3.84 for 1 df at the 0.05 level of significance); \({\chi}_{0}^{2}\) is the chi-square value for the baseline model, and \({df}_{0}\) is the degrees of freedom for the baseline model. This method not only helps to control Type I error rates at or below the nominal level but also maintains high power across different study conditions (Kim et al., 2012).
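As a purely hypothetical numerical illustration (the values below are not taken from this study), suppose the baseline MIMIC model yielded \({\chi}_{0}^{2}=600\) with \({df}_{0}=200\). With K = 3.84 for 1 df at the 0.05 level, the adjusted critical value would be

\[{K}^{\prime}=[600/(3.84+200-1)]\times 3.84=(600/202.84)\times 3.84\approx 11.36,\]

so a chi-square difference would need to exceed approximately 11.36, rather than 3.84, before an item was flagged for uniform DIF.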
If uniform DIF items were identified, the effects of gender, grade level, and race/ethnicity on the latent factor means were examined using MIMIC models to determine whether the latent mean comparisons were biased. The first MIMIC model was the baseline MIMIC model, which included only direct paths from the covariates to the latent factors. The second MIMIC model additionally included the direct effects of the covariates on the identified DIF items, along with their effects on the latent factors.
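In the illustrative Mplus notation used above, this comparison amounts to estimating the structural part of the MIMIC model twice, with and without the direct paths to the flagged items (item and covariate names remain placeholders):

! Model 1: covariates predict the latent factors only
MODEL:
  INT ATT EXT ON female grade aa hisp other;

! Model 2: direct effects on the identified DIF items are added
MODEL:
  INT ATT EXT ON female grade aa hisp other;
  i4 ON female;      ! direct effect for a hypothetically flagged item

Meaningful shifts in the factor-level regression coefficients between the two models would suggest that comparisons of the latent factor means are affected by the DIF items.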
The CFA and MIMIC models were evaluated using the following indices commonly used with categorical data (Finney & DiStefano, 2013): (a) the chi-square statistic, (b) the comparative fit index (CFI), (c) the root mean square error of approximation (RMSEA), and (d) the standardized root mean square residual (SRMR). Because chi-square statistics are sensitive to sample size, these values were reported for model comparison purposes only. CFI ≥ 0.90, RMSEA ≤ 0.08, and SRMR ≤ 0.10 indicated acceptable model fit; CFI ≥ 0.95, RMSEA ≤ 0.05, and SRMR ≤ 0.08 suggested good fit (Hu & Bentler, 1999). In addition, the 90% confidence interval (CI) around the RMSEA point estimate should contain 0.05 to suggest a possible close fit (Browne & Cudeck, 1993).
In addition to global fit, local fit was examined because global model fit can be affected by the magnitude of the factor loadings and the number of items (Greiff & Heene, 2017; McNeish et al., 2018). Residual values, which reflect the differences between the observed and model-estimated covariances, were evaluated. Large standardized residuals (e.g., absolute values greater than 3.0) suggest possible local model misfit (Raykov & Marcoulides, 2012). The interpretability of parameter estimates was also examined.
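In the Mplus sketches above, the residual information used for this local fit check can be requested with the RESIDUAL option of the OUTPUT command, which prints observed and model-estimated correlations along with their residuals for inspection against the |3.0| guideline:

OUTPUT:
  STDYX RESIDUAL;    ! standardized solution plus residual information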