28-02-2025 | Review

Examining differential item functioning in self-reported health survey data: via multilevel modeling

Authors: Dandan Chen Kaptur, Yiqing Liu, Bradley Kaptur, Nicholas Peterman, Jinming Zhang, Justin L. Kern, Carolyn Anderson

Published in: Quality of Life Research

Abstract

Few health-related constructs or measures, including self-reported health survey data, have been critically evaluated for measurement equivalence. Differential item functioning (DIF) analysis is crucial for evaluating measurement equivalence in self-reported health surveys, whose data are often hierarchical in structure. Traditional single-level DIF methods fall short in this setting, making multilevel models a better alternative. We highlight the benefits of multilevel modeling for DIF analysis by applying multilevel binary logistic regression (for binary response data) and multilevel multinomial logistic regression (for polytomous response data) to a health survey data set and comparing them with their single-level counterparts. Our findings show that the multilevel models fit better and explain more variance than the single-level models. This article is intended to raise awareness of multilevel modeling and to help healthcare researchers and practitioners understand its use for DIF analysis.
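
To make the contrast in the abstract concrete, the following is a minimal sketch of the binary case. It is not taken from the article: it assumes the standard logistic regression DIF formulation with a random intercept added for the grouping unit. The notation is illustrative: Y_ij is the response of person i in cluster j to one item, X_ij is the matching variable (for example a total score), G_ij is the group indicator, and u_0j is the cluster-level random intercept.

```latex
% Single-level logistic regression DIF model for one binary item
% (illustrative notation, not the article's own)
\operatorname{logit} P(Y_{ij} = 1)
  = \beta_0 + \beta_1 X_{ij} + \beta_2 G_{ij} + \beta_3 X_{ij} G_{ij}

% Multilevel (random-intercept) counterpart: same fixed effects,
% plus a cluster-specific deviation u_{0j}
\operatorname{logit} P(Y_{ij} = 1 \mid u_{0j})
  = \beta_0 + \beta_1 X_{ij} + \beta_2 G_{ij} + \beta_3 X_{ij} G_{ij} + u_{0j},
  \qquad u_{0j} \sim N(0, \tau^2)
```

Under this formulation, a nonzero \beta_2 indicates uniform DIF and a nonzero \beta_3 indicates nonuniform DIF. Setting \tau^2 = 0 recovers the single-level model, so the two models are nested and differ only in whether clustering is accounted for.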

Metadata
Title
Examining differential item functioning in self-reported health survey data: via multilevel modeling
Authors
Dandan Chen Kaptur
Yiqing Liu
Bradley Kaptur
Nicholas Peterman
Jinming Zhang
Justin L. Kern
Carolyn Anderson
Publication date
28-02-2025
Publisher
Springer New York
Published in
Quality of Life Research
Print ISSN: 0962-9343
Electronic ISSN: 1573-2649
DOI
https://doi.org/10.1007/s11136-025-03936-9