Background
Patient-reported outcome measures (PROMs) are increasingly used in empirical studies and clinical practice to assess, among other things, the effectiveness of healthcare interventions and to monitor patients’ quality of life (QOL) over time. However, longitudinal measurements of patient-reported outcomes can be affected by response shift. Schwartz & Sprangers [1, 2] defined response shift as a change in the meaning of one’s self-evaluation of a target construct as a result of a change in one’s internal standards of measurement (recalibration), a change in the importance of component domains constituting the target construct (reprioritization), or a redefinition of the target construct (reconceptualization). When response shift occurs, PROM results will not have the same meaning at different points in time. Consequently, the change in observed PROM scores will not accurately reflect change in the construct that the PROM intends to measure (i.e., change in the “target construct”). This difference between “observed” and “target” change has been operationalized as a response shift effect. Although response shift may invalidate comparisons of PROM results over time when it is not taken into account, it is also viewed as meaningful information that provides insight into how patients accommodate health changes [1, 2].
Over the past decades, many studies have been conducted to investigate occurrences and magnitudes of response shift effects. Two systematic reviews on the detection of response shift have been published, encompassing 101 [3] and 107 [4] studies, with 51 overlapping. Previous systematic reviews on the magnitudes of response shift effects include meta-analyses of: (a) studies published up to 2005 that examined response shift based on the then-test (one of the most commonly used response shift detection methods, in which respondents are asked at posttest to retrospectively re-evaluate their baseline functioning; comparing scores on the baseline measure and the then-test provides an indication of response shift and its magnitude) [5], (b) studies published up to 2016 on people with an orthopedic condition [6], and (c) studies published up to 2018 on people with cancer [7]. Hence, these reviews were restricted in their outcome (i.e., detection or magnitude of response shift), method, or target population. We previously conducted a descriptive systematic review of all quantitative studies published before 2021 that investigated response shift using PROMs, and described distributions of response shift detection and, where possible, effect sizes [8]. The results of this descriptive review provided insight into how the number and magnitude of response shift effects vary across diverse studies employing different response shift methods, populations, research designs, and PROMs.
The next important aim is to gain insight into why response shift results differ, by investigating relevant variables that are associated with, and explain variability in, response shift results. The current meta-regression analysis builds on the previous descriptive review and aims to identify response shift methods, population characteristics, design characteristics, PROMs, and study quality characteristics that explain variability in (1) the detection of response shift effects and (2) the magnitude of response shift effects (i.e., standardized mean differences). The latter objective was investigated only for studies using the then-test and/or structural equation modeling (SEM) methods, which enable the calculation of a standardized effect size for the difference between means (Cohen’s d). This work is part of the Response Shift – in Sync Working Group initiative that aims to synthesize the work on response shift to date [9–13].
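For concreteness, one common operationalization of such a then-test effect size can be written as follows (a sketch in our own notation, assuming a pooled standard deviation as the standardizer; individual studies may standardize differently):

$$
d = \frac{\bar{X}_{\mathrm{pre}} - \bar{X}_{\mathrm{then}}}{SD_{\mathrm{pooled}}},
\qquad
SD_{\mathrm{pooled}} = \sqrt{\frac{SD_{\mathrm{pre}}^{2} + SD_{\mathrm{then}}^{2}}{2}},
$$

where $\bar{X}_{\mathrm{pre}}$ is the mean baseline score and $\bar{X}_{\mathrm{then}}$ the mean retrospective then-test score; the standardized difference between them indicates the magnitude of (recalibration) response shift.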
Discussion
Our meta-regression analysis, involving 171 response shift studies, indicates that, on average, one in five longitudinal PROM effects investigated for response shift results in the detection of response shift, when adjusting for sampling dependencies and controlling for sample- and effect-level variables. This result is consistent with our previous systematic review, which was based on 150 overlapping studies [8]. The results of our current analysis further indicate that two-fifths of the effect-level variance in response shift detection was explained by the variables we extracted from the included studies. Of these, the type of response shift method accounted for almost half of the explained variance. Other notable variables influencing response shift detection included response shift type and the study quality control variables. Contrary to expectation, variation in response shift detection was not explained by population characteristics (sex, age, medical condition, and intervention).
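As an illustration of how such sampling dependencies can be accommodated, the sketch below fits a logistic model with a random intercept per study, so that effects from the same study are not treated as independent. The data are simulated and the specification is our own simplified illustration, not the model actually used in this analysis:

```python
import numpy as np
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

rng = np.random.default_rng(0)

# Hypothetical data: one row per investigated effect, nested within studies.
n_effects = 400
df = pd.DataFrame({
    "study": rng.integers(0, 60, n_effects).astype(str),
    "method": rng.choice(["then_test", "sem", "regression"], n_effects),
    "detected": rng.binomial(1, 0.2, n_effects),  # binary detection outcome
})

# Logistic meta-regression with a study-level variance component (random
# intercept per study) to account for dependencies among effects that come
# from the same study.
model = BinomialBayesMixedGLM.from_formula(
    "detected ~ C(method)", {"study": "0 + C(study)"}, df
)
result = model.fit_vb()
print(result.summary())
```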
The results highlight that variation in response shift detection is predominantly attributable to methodological differences across studies. Most importantly, the question arises as to why different response shift methods have varying probabilities of detecting response shift, even after controlling for the other effect-level variables. Most model-based methods, such as latent variable methods based on Oort’s procedure [35], allow multiple effects (e.g., effects for different response shift types and domains) to be tested simultaneously within the same model, thereby protecting against false positives. For instance, an overall test for response shift is inherently included in SEMs based on Oort’s procedure, which can help prevent false positives even when multiple testing corrections are not performed after this overall test. Conversely, in design-based methods, each response shift effect (e.g., effects for different domains) is typically tested separately, in which case multiple testing can produce false positives (unless a method to control the familywise error rate, such as the Bonferroni method, is used). This difference may, in part, explain why the probability of detecting response shift is lower for latent variable methods than for the then-test. Of course, there is also a trade-off between controlling the familywise error rate and statistical power to detect response shift effects; it might be that latent variable method studies are generally underpowered compared to studies that use design-based methods. This argument seems less plausible, however, as application of latent variable methods usually requires larger sample sizes (with more power to detect effects). Another possible explanation is that model-based methods have been more commonly applied in secondary data analyses, leading to a relatively smaller probability of detecting response shift. Conversely, the then-test can only be used in studies where it was included in the original design (i.e., there is no room for secondary analyses, as response shift detection is most likely the focus of the primary analyses). Still, these arguments do not explain why the probability of detecting response shift is larger for regression methods and other methods. Therefore, we echo previous recommendations that response shift studies should evaluate and report on sample size requirements for the chosen statistical analyses [36, 37] (e.g., see Verdam [38] for a tutorial on power calculations for response shift investigations with SEM).
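To make the multiple-testing point concrete, the sketch below applies a Bonferroni adjustment across separate domain-level tests; the domain names and p-values are hypothetical and purely illustrative:

```python
# Minimal sketch: Bonferroni control of the familywise error rate when
# several PROM domains are each tested separately for response shift.
# All p-values below are hypothetical.
alpha = 0.05
p_values = {
    "physical_functioning": 0.012,
    "emotional_functioning": 0.034,
    "social_functioning": 0.200,
    "fatigue": 0.004,
    "pain": 0.047,
}

m = len(p_values)            # number of separate tests
adjusted_alpha = alpha / m   # Bonferroni-corrected threshold (0.01 here)

for domain, p in p_values.items():
    naive = p < alpha               # per-test decision; familywise error inflated
    corrected = p < adjusted_alpha  # familywise error controlled at alpha
    print(f"{domain:22s} p={p:.3f}  naive={naive}  bonferroni={corrected}")
```

With five domains, the per-test threshold drops to 0.01, so only one of the four naively flagged domains survives correction, illustrating how uncorrected separate testing can inflate apparent response shift detection.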
The magnitude of response shift was only available for effects investigated with the then-test and SEM. One-quarter of the variability of response shift effect sizes across samples was explained by population characteristics (sample-level variables), and only a tenth of the variability across response shift effects was explained by effect-level variables (including study design, the PROM, response shift method (then-test vs. SEM), and study quality control variables). However, effect sizes could be determined for only 132 (3.2%) of the 4176 effects investigated with SEM, whereas effect-size information was available for 90.0% (637 of 708) of the effects investigated with the then-test. Consequently, the effect size results predominantly represent studies using the then-test method and may not be representative of studies based on SEM or other methods for examining response shift. This points to a limitation of the response shift literature, as the preponderance of response shift studies do not report information about the magnitudes of effects, and some methods do not enable effect size computation.
The average effect size of the response shift effects based on the then-test or SEM methods (including only those for which an effect size could be determined) was 0.30 (95% CI: 0.26–0.34), and all but two marginal effect sizes were above 0.20, indicating that they were far from negligible and predominantly ranged from small to moderate in size (based on Cohen’s guidelines [34]). To contextualize, most effects in PROM research are also of small to moderate magnitude, as discussed previously [8], and in some instances small effect sizes may be clinically relevant [39]. The largest marginal effect size (0.45) was found for analyses with a time period > 12 months, and samples of mostly children/adolescents had the smallest marginal effect size (0.10). It is important to note that the differences in marginal effect sizes across the explanatory and control variables are relatively small and nearly all of the CIs overlap, precluding firm conclusions about their relative importance.
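For readers less familiar with how a pooled effect size and its confidence interval arise, the following minimal sketch applies DerSimonian–Laird random-effects pooling to a handful of hypothetical Cohen’s d values; the actual analysis here is a multilevel meta-regression, which this simple sketch does not reproduce:

```python
import numpy as np

# Hypothetical standardized mean differences and their within-study variances.
d = np.array([0.45, 0.22, 0.31, 0.10, 0.38])
v = np.array([0.02, 0.01, 0.03, 0.02, 0.01])

# Fixed-effect weights and heterogeneity statistic Q.
w = 1.0 / v
d_fe = np.sum(w * d) / np.sum(w)
Q = np.sum(w * (d - d_fe) ** 2)
k = len(d)

# DerSimonian-Laird estimate of the between-study variance tau^2.
c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (Q - (k - 1)) / c)

# Random-effects weights, pooled estimate, and 95% confidence interval.
w_re = 1.0 / (v + tau2)
d_re = np.sum(w_re * d) / np.sum(w_re)
se_re = np.sqrt(1.0 / np.sum(w_re))
ci = (d_re - 1.96 * se_re, d_re + 1.96 * se_re)

print(f"pooled d = {d_re:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), tau^2 = {tau2:.3f}")
```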
The current meta-regression analysis adds to our previous descriptive systematic review by disentangling the variability induced by differences in population characteristics, study design, PROMs, response shift methods, and study quality. The results allow for direct comparisons of effects across different methods and other characteristics. A more specific strength is our extensive analysis of dependencies, which guards against finding spurious effects. Finally, the current meta-regression analysis of response shift effects and effect sizes in quantitative response shift studies is the most comprehensive to date, covering all investigated populations and methods.
Several limitations that were listed in our previously published descriptive systematic review [8] apply equally to the current meta-regression analysis, including: the omission of studies not reported in English; the inclusion of studies that adopted different operationalizations of response shift and/or reported study results and/or methodology incompletely; the consideration of all detected effects as response shift effects, although their substantiation may be questioned; and the inclusion of a limited number of explanatory variables, which in the current meta-regression analysis explained less than half of the calculated variances. Another limitation is the lack of a formal assessment of study quality, as a meaningful assessment of study quality is hindered by the heterogeneity of the included studies (see section on risk of bias). Although we included four control variables as indicators of study quality in all analyses, it is important to keep in mind that, in the absence of a direct assessment of study quality, its influence on response shift results remains unknown. Relatedly, the extent to which response shift results are affected by the psychometric quality of PROMs was not investigated. As information about the reliability and validity of PROMs is rarely consistently reported, we did not consider it feasible to include this information in the current analysis; its influence might be more easily investigated using simulation studies (as illustrated in the sketch below). Finally, potential interaction effects between the explanatory/control variables were not explored, as this was deemed too complex given the limited number of observations and the relatively large number of explanatory variables. Nevertheless, insight into the dynamics between the different explanatory variables might be clinically meaningful and is thus a relevant topic for future research.
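To illustrate what such a simulation study might look like, the following sketch assumes classical test theory and a hypothetical true standardized change of 0.30 that is attenuated by measurement error; all parameter values are our own illustrative choices, not estimates from the reviewed studies:

```python
import numpy as np

rng = np.random.default_rng(42)

n = 5_000       # respondents per measurement occasion (hypothetical)
true_d = 0.30   # true standardized change on the latent construct (hypothetical)

for reliability in (0.6, 0.7, 0.8, 0.9):
    # Classical test theory: observed = true + error, with the error variance
    # chosen so that var(true) / var(observed) equals the target reliability.
    error_sd = np.sqrt((1 - reliability) / reliability)
    pre = rng.normal(0.0, 1.0, n) + rng.normal(0.0, error_sd, n)
    post = rng.normal(true_d, 1.0, n) + rng.normal(0.0, error_sd, n)

    sd_pooled = np.sqrt((pre.var(ddof=1) + post.var(ddof=1)) / 2)
    observed_d = (post.mean() - pre.mean()) / sd_pooled
    print(f"reliability={reliability:.1f}  observed d = {observed_d:.2f} "
          f"(theoretical {true_d * np.sqrt(reliability):.2f})")
```

Under these assumptions, the observed effect size shrinks toward zero as reliability decreases (by a factor of the square root of the reliability), showing how PROM measurement quality could systematically dampen observed response shift effect sizes.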
The current meta-regression analysis, combined with our previous descriptive systematic review [8], provides insight into the variability of response shift results, i.e., how the detection of response shift and response shift effect sizes vary across populations, study designs, PROMs, response shift methods, and study quality criteria. Rather than focusing only on overall response shift effects, future research should aim to identify and understand the conditions under which response shift is more or less likely to occur. This may include person- versus variable-centered quantitative methods [40, 41], qualitative research, and examination of theoretical and philosophical perspectives [42]. The marginal probabilities and effect sizes reported here may also be taken into account when interpreting the results of other comparable studies using PROMs, by considering the potential occurrence and impact of response shift effects. Additionally, future studies on response shift detection should be informed by the current insights into response shift effects and effect sizes, which may help in designing studies and interpreting their results [13]. Well-designed studies and contextualized interpretation of results are needed to improve our understanding of response shift.