Open Access 28-03-2025 | Review

Psychometric properties, and cultural appropriateness, of patient reported outcome measures for use in primary healthcare: a scoping review

Authors: Christopher M. Doran, Jamie Bryant, Erika Langham, Roxanne Bainbridge, Anthony Shakeshaft, Breanne Hobden, Sara Farnbach, Megan Freund

Published in: Quality of Life Research

Abstract

Purpose

To critically appraise the psychometric properties and cultural appropriateness of self‐reported generic patient-reported outcome measures (PROMs) applicable for use in the primary healthcare setting using the Consensus Based Standards for the Selection of Health Measurement Instruments (COSMIN) guidelines.

Methods

PROMs were identified via a published systematic review and searches of relevant websites. PROMs were included if they were generic (i.e., outcome measures that assessed general aspects of health); had a maximum of 30 items; were applicable for use by all adult primary care patients; and were validated in English. Data were extracted regarding the characteristics of each PROM and the characteristics of included validation studies. The COSMIN risk of bias checklist was used to assess methodological quality and the revised COSMIN criteria were used to assess measurement properties. An evidence synthesis was conducted across studies using the guidelines from the modified Grading of Recommendations Assessment, Development and Evaluation approach for systematic reviews of clinical trials.

Results

399 PROMs were identified and 19 met inclusion criteria. The included PROMs measured general health related quality of life (n = 8), outcomes or impact of care (n = 3), patient enablement, activation, and empowerment (n = 3), quality of care (n = 3), health and disability (n = 1), and functional status (n = 1). Six PROMs met the recommended COSMIN threshold for implementation.

Conclusion

Although six PROMs can be recommended for use in primary care, further psychometric testing is still required to strengthen evidence related to internal consistency, responsiveness and cross-cultural validity/measurement invariance. Selection of a PROM for routine clinical use in primary care also needs to be guided by the patient population.

Plain English summary

Patient-reported outcome measures (PROMs) are instruments used to report the status of a patient’s health condition from the patient’s perspective, without interpretation by a healthcare provider or others. This study aimed to identify the most appropriate PROMs available for use in a primary health care setting. The availability of numerous PROMs creates challenges for clinicians and researchers in selecting the most appropriate one for their specific needs. Since primary care is the cornerstone of healthcare systems, selecting the most appropriate PROM is crucial to ensuring positive patient outcomes. This study found that only a limited number of PROMs can be recommended for use in primary care, with further work required to strengthen methodological quality and cultural appropriateness.

Background

Patient-reported outcomes are defined as any report of the status of a patient’s health condition from the patient’s perspective, without interpretation by a healthcare provider or others [1, 2]. Patient-reported outcome measures (PROMs) are instruments used to capture subjective elements of patient views of their symptoms, functional status, and health related quality of life in a structured, standardised, and efficient way. PROMs can be generic (measure outcomes common across most patients accessing healthcare), disease-specific (applicable only to patients who have a certain disease) or population-specific (applicable only to patients with defined characteristics, for example children).

The fundamental importance of patient perspectives in ensuring the delivery of high-quality healthcare has increasingly been recognised by health‐care systems around the world, with the use of PROMs being touted as having the potential to change the way healthcare is organised and delivered [3]. PROMs can: contribute to the provision of care that is patient centered [4, 5]; assist the identification of health issues that may require further investigation or management; monitor changes in conditions and response to treatments over time; and promote shared decision making [6]. PROMs can also contribute to healthcare policy and decision making by providing information about the comparative effectiveness of treatments and for benchmarking and quality improvement activities, and can provide information about variations in care, costs, and outcomes among healthcare providers.

As a result of healthcare systems around the world increasingly adopting PROM use, there has been a considerable increase in the development of PROMs [7] covering a range of diseases, conditions, and contexts. A 2021 review that critically synthesised information on generic and selected condition‐specific PROMs, describing trends in their development and application over the period 1989–2019, identified 315 PROMs [8]. The number of available PROMs creates challenges for clinicians and researchers in selecting the most appropriate high-quality PROM for their specific needs. Selection of a PROM for use in clinical practice should consider whether the PROM has been psychometrically validated to ensure it measures the intended latent construct, provides reliable responses, and can measure meaningful change. The feasibility of implementing the PROM, how the resultant data will be used, and suitability for the patient population are also important.

Primary care is the cornerstone of healthcare systems, often representing the first point of contact for individuals with the health system. The use of PROMs in primary care presents unique challenges. Patients attending primary care span the entire range of population age, and present with a wide range of diverse health and wellbeing conditions of varied complexity. In this setting, PROMs can have multiple uses from monitoring and informing the ongoing management of an individual’s health condition, through to informing ongoing quality improvement activities at both meso and macro levels [9]. In Australia, there are 31 Primary Health Networks funded by the Federal Government as local independent organisations tasked with ensuring that primary care in their defined regions is accessible, efficient and effective [10]. There is currently no recommended PROM for use in the primary care setting in Australia and no information relating to the potential cultural appropriateness of existing PROMs for Indigenous populations.

The aim of this research was to critically appraise the psychometric properties and cultural appropriateness of self‐reported generic PROMs applicable for use in the primary healthcare setting in Australia using the Consensus Based Standards for the Selection of Health Measurement Instruments (COSMIN) guidelines. This will provide essential evidence to guide the selection of the most appropriate high-quality PROM for use in primary care clinical practice.

Methods

Review advisory group

A Review Advisory Group was established to determine appropriate parameters for the review and ensure the evidence produced was useful to end users. The Review Advisory Group included 16 members, incorporating researchers with expertise in conducting systematic literature reviews and psychometric assessment, and end users with relevant knowledge and expertise. The Group met regularly to establish the scope and methodological approach for the review, refine inclusion and exclusion criteria, and provide advice and feedback about findings.

Identification of potential PROMs for inclusion

A 2021 comprehensive review identified and critically synthesized information on generic and selected condition-specific PROMs [8], identifying 315 PROMs. This review searched the academic and grey literature to identify validated PROMs, including searches of peer-reviewed databases, websites, Google Scholar, and Google. Given its exhaustiveness, the PROMs identified in the 2021 review were used as the starting point for identification of PROMs for the current study, rather than repeating the literature search. To supplement the 315 PROMs previously identified and capture PROMs relevant to Australia, we also reviewed the generic PROMs available on the Australian Commission on Safety and Quality in Healthcare website (n = 40) [11], the PROMs available on the NSW Agency for Clinical Innovation website (n = 16) [12], the short-listed PROMs identified in an earlier systematic review of generic PROMs for primary care [13] (n = 20), and lists of PROMs currently in use at two organisations that collaborated on this review: Western Queensland Primary Health Network (https://www.wqphn.com.au/) (n = 7) and Check-Up (https://checkup.org.au/) (n = 1).

Inclusion and exclusion criteria

PROMs were included if they assessed general aspects of patient health, were patient reported, were applicable for use with adults aged 18 years and over, and had content that was relevant for implementation in the primary care setting. Because selection of PROMs for use in primary care (the focus of this review) requires consideration of the feasibility of implementation, PROMs also needed to have < 30 items OR evidence that the PROM could be completed in < 10 min. PROMs were excluded if they were disease, condition, or symptom specific; clinician or proxy reported; only relevant for sub-groups of patients with specific demographic characteristics (e.g., the elderly, veterans, children), or other characteristics (e.g., those taking over the counter medicine); or applicable only in non-primary care settings (e.g., in-patient or acute care settings).

Studies that used an eligible PROM needed to report on development and/or evaluation of one or more psychometric properties, be implemented in English, and the validation sample had to include primary care patients aged 18 years and older, patients with long term conditions, patients with chronic illness, or a general community sample. Studies that used an eligible PROM as an outcome measure, or studies that used the PROM as a comparator instrument in the validation study of another instrument, were excluded.

Data extraction

For each PROM/study meeting the inclusion criteria, the following information was extracted by JB: purpose, target population, initial year of publication, country(ies) of development, mode(s) of administration, recall period, time to complete, readability, number of items, response options, domains, scoring, language and available translations, and licensing restrictions and costs. For each study reporting on psychometric properties, the following information was also extracted: description of participants (including relevant disease characteristics), participant details (number of participants, gender, and age), instrument administration (including setting and method of recruitment, country, and language) and overall response rate.

Assessment of methodological quality and psychometric characteristics

Two reviewers independently undertook all data coding, with discrepancies discussed to achieve consensus about ratings.

Appraisal of study methodological quality

The methodological quality of included studies was assessed using the Consensus-based Standards for the selection of health status Measurement Instruments (COSMIN) risk of bias checklist [14, 15], a tool developed specifically for use in systematic reviews of PROMs. Ten properties of quality were assessed regarding the development or validation of a PROM: development, content validity, structural validity, internal consistency, cross-cultural validity/measurement invariance, reliability, measurement error, criterion validity, hypotheses testing for construct validity, and responsiveness. Each property was rated on a five-point scale: “very good”, “adequate”, “doubtful”, “inadequate”, or “not assessed”. The overall score for each measurement property was decided using a ‘worst score counts’ approach (e.g., if four out of the five criteria assessed for structural validity were rated as ‘very good’, but one was rated ‘inadequate’, the score given for the entire quality property was ‘inadequate’).
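The ‘worst score counts’ rule can be illustrated with a short sketch. This is not the authors' tooling: the function name `worst_score_counts` is hypothetical, and ignoring items marked as not assessed when taking the minimum is an assumption made here for illustration.

```python
# Rating labels from the text, ordered from best to worst.
RATING_ORDER = ["very good", "adequate", "doubtful", "inadequate"]


def worst_score_counts(item_ratings):
    """Return the lowest rating given to any checklist item of one property.

    Items that were not assessed are skipped (an assumption of this sketch);
    if no item was rated at all, the property is reported as "not assessed".
    """
    rated = [r for r in item_ratings if r in RATING_ORDER]
    if not rated:
        return "not assessed"
    # The rating with the highest index in RATING_ORDER is the worst one.
    return max(rated, key=RATING_ORDER.index)


# The example from the text: four "very good" items and one "inadequate" item.
example = ["very good"] * 4 + ["inadequate"]
print(worst_score_counts(example))  # inadequate
```

A single poorly rated checklist item therefore determines the rating for the whole property, which makes the approach deliberately conservative.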

Modified criteria were used to assess the methodological quality of PROM development and content validity due to the stringent COSMIN checklist quality threshold requirements. These modifications were required due to time and resource constraints. PROM development was restricted to assessment of ‘general design requirements’. Assessment of content validity was restricted to assessment of whether an appropriate method was used to ask patients about the relevance of the items to their experience of the condition and whether an appropriate method was used to ask professionals whether each item is relevant for the construct of interest. Content validity was assessed globally (i.e., one rating per PROM rather than per study), and included assessment of multiple documents in addition to cited studies (e.g., PROM user manuals, other papers reporting solely on PROM development). As per COSMIN guidelines, instruments that had been modified (e.g., shortened) were evaluated as new instruments. However, for these PROMs, previously conducted development and content validity studies were considered relevant for the rating of PROM development and content validity.

Appraisal of PROM measurement properties

The measurement properties of each PROM were assessed using a revised version of the COSMIN criteria for good measurement properties that had been developed with reference to the criteria included in the COSMIN user manual [15] and used in two previous reviews [16, 17]. The revised checklist simplifies some of the stringent assessments required in the original COSMIN checklist, particularly related to assessment of content validity. Nine measurement properties were assessed for each included study: content validity, structural validity, internal consistency, cross-cultural validity and measurement invariance, reliability, measurement error, criterion validity, hypotheses testing for construct validity, and responsiveness. The criteria outlined in Table 2 were applied, with ratings of sufficient (+), insufficient (−) or indeterminate (?) provided for each psychometric property within each study.

Evidence synthesis

Following the appraisal of methodological quality and measurement properties for each study, an evidence synthesis was conducted for each PROM. First, we determined the overall evidence for each measurement property, categorising it as sufficient (+), insufficient (−), inconsistent (±), or indeterminate (?). A PROM was deemed to have sufficient or insufficient overall quality if more than 50% of individual studies were rated the same way; otherwise, it was rated as inconsistent. Next, we graded the quality of evidence for each measurement property based on the modified Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach. Each property was initially rated as high quality, with downgrades applied for risk of bias (inadequate or doubtful methodological quality), inconsistency (unexplained variability in results across studies), and indirectness (evidence derived from populations different from the target population). Imprecision was not considered a downgrade criterion since the total sample size across all studies exceeded 200. Additionally, no quality rating was assigned when a measurement property was classified as indeterminate.
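The two synthesis steps described above can be sketched as follows. This is an illustrative reading of the paragraph, not the review's actual procedure. The function names, the ASCII `"-"` standing in for the minus sign, the choice to count indeterminate studies in the denominator, and the number of levels deducted per concern are all assumptions; the text does not specify them.

```python
GRADE_LEVELS = ["high", "moderate", "low", "very low"]


def pooled_rating(study_ratings):
    """Pool per-study ratings ("+", "-", "?") for one measurement property.

    Assumptions of this sketch: every rated study counts in the denominator,
    and a property is indeterminate only when all of its studies are "?".
    """
    rated = [r for r in study_ratings if r in {"+", "-", "?"}]
    if not rated:
        return None  # property not assessed in any study
    if all(r == "?" for r in rated):
        return "?"
    for symbol in ("+", "-"):
        # More than 50% of studies must agree for a sufficient/insufficient call.
        if rated.count(symbol) / len(rated) > 0.5:
            return symbol
    return "±"  # inconsistent


def grade_quality(risk_of_bias=0, inconsistency=0, indirectness=0):
    """Start at "high" and move down one level per deducted step.

    The number of steps to deduct for each concern is supplied by the caller;
    imprecision is omitted because the review did not apply it.
    """
    steps = risk_of_bias + inconsistency + indirectness
    return GRADE_LEVELS[min(steps, len(GRADE_LEVELS) - 1)]


print(pooled_rating(["+", "+", "-"]))                   # +
print(pooled_rating(["+", "-"]))                        # ±
print(grade_quality(risk_of_bias=1, inconsistency=1))   # low
```

Under these assumptions an even split between sufficient and insufficient studies is always reported as inconsistent, since neither rating exceeds 50%.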

As per the COSMIN guidelines [18], PROMs were considered Class A PROMs and able to be recommended for use if they had evidence for sufficient content validity (any level) AND at least low quality evidence for sufficient internal consistency. A PROM was considered a Class C PROM and not able to be recommended for use if it had high quality evidence for an insufficient measurement property. Remaining PROMs were classified as Class B PROMs, which have the potential to be recommended for use but require further research to assess quality.
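The classification rule in this paragraph can be written as a small decision function. This sketch encodes only the rule as worded above. The function name, the input layout, the use of ASCII `"-"` for the minus sign, and the choice to test Class A before Class C are assumptions (the paragraph does not say which rule wins when both apply), and the inputs shown are hypothetical rather than taken from the review's tables.

```python
GRADE_RANK = {"very low": 0, "low": 1, "moderate": 2, "high": 3}


def classify_prom(content_validity, properties):
    """Assign Class A, B or C to one PROM.

    content_validity: overall rating, one of "+", "-", "?".
    properties: dict mapping a property name to (quality, rating), for example
        {"internal consistency": ("high", "+")}.
    """
    internal = properties.get("internal consistency")
    if content_validity == "+" and internal is not None:
        quality, rating = internal
        # Class A: sufficient content validity plus at least low quality
        # evidence for sufficient internal consistency.
        if rating == "+" and GRADE_RANK[quality] >= GRADE_RANK["low"]:
            return "A"
    # Class C: high quality evidence for any insufficient property.
    if any(q == "high" and r == "-" for q, r in properties.values()):
        return "C"
    return "B"


# Hypothetical inputs for illustration only.
print(classify_prom("+", {"internal consistency": ("moderate", "+")}))  # A
print(classify_prom("+", {"structural validity": ("high", "-")}))       # C
print(classify_prom("+", {"structural validity": ("high", "+")}))       # B
```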

Results

PROM identification

A total of 399 PROMs were identified and, following removal of duplicates, 338 PROMs were evaluated against inclusion criteria, with 319 excluded for the following reasons: disease or condition specific (n = 301), not self-reported (n = 2), not relevant for primary care setting (n = 15), or population specific (n = 1). Nineteen PROMs met inclusion criteria and were included in the review (see Table 1). Three PROMs had multiple versions available: the EQ-5D has both a 3 Level (EQ-5D-3L) and 5 Level (EQ-5D-5L) version; there were two versions of the Patient Activation Measure (a 13 item and 22 item version); and there were three versions of Patient Assessment of Chronic Illness Care (a 20 item, 13 item and 14 item version). Detailed characteristics of the PROMs and characteristics of the validation papers (study population and implementation) are outlined in the supplementary material.

Table 1

Summary of included PROMs by category (n = 19)

Category	PROM name	Abbreviation(s)
Health-related quality of life	EuroQol-5 Dimension	EQ-5D-3L, EQ-5D-5L
Health-related quality of life	HowRU	HowRU
Health-related quality of life	Patient reported outcome measurement system-29	PROMIS-29
Health-related quality of life	Patient reported outcome measurement system-Global Health	PROMIS-10
Health-related quality of life	Short Form health survey-12	SF-12
Health-related quality of life	World Health Organisation quality of life	WHOQoL-Brief
Health-related quality of life	Measure Yourself Medical Outcomes Profile	MYMOP
Enablement, activation, empowerment	Patient Activation Measure	PAM-13, PAM-22
Enablement, activation, empowerment	Patient Enablement Instrument	PEI
Outcomes or impact of care	Long term conditions questionnaire	LTCQ
Outcomes or impact of care	Outcomes related to Impact on Daily Living	ORIDL
Outcomes or impact of care	Primary care outcomes questionnaire	PCOQ
Functional status	Dartmouth COOP/COOP WONCA Charts	COOP
Health and disability	World Health Organisation Disability Assessment Schedule	WHODAS
Quality of care	Patient Assessment of Chronic Illness Care	PACIC-20, PACIC-13, PACIC-14

Key characteristics of eligible PROMs

Table 1 provides a summary of the key concepts measured by each PROM. Eight PROMs measured general health related quality of life, three measured enablement, activation, and/or empowerment, three measured outcomes or impact of care, and three measured quality of care. One PROM each examined functional status and health and disability.

Countries of development

Six PROMs were developed in the UK (HowRU, LTCQ, MYMOP, ORIDL, PEI, PCOQ), nine were developed in the USA (COOP, PAM-13/22, PACIC-13/14/20, PROMIS-29, PROMIS-10, SF-12) and four were developed across multiple countries (WHOQoL-Brief, WHODAS, EQ-5D-3L, EQ-5D-5L).

Target populations

Six PROMs were specifically developed for use in primary care (COOP, HowRU, MYMOP, ORIDL, PEI, PCOQ). Seven generic PROMs were developed for application across settings (EQ-5D-3L, EQ-5D-5L, PROMIS-29, PROMIS-10, SF-12, WHODAS, WHOQoL-Brief) and six were developed to assess patient reported outcomes in people with long term or chronic conditions (LTCQ, PAM-13/22, PACIC-13/14/20).

Mode of administration, length and completion time

Most PROMs are administered as pen and paper surveys (n = 13). Six PROMs can be administered as a pen-and-paper survey or through interviewer administration (face-to-face or telephone). The PROMIS-29 and PROMIS-10 Global Health can be administered via pen and paper or electronically, while the PAM-13, PAM-22, EQ-5D-3L, EQ-5D-5L and SF-12 can be administered via pen and paper, interviewer or electronically. The shortest measures were the HowRU and MYMOP, which each contained only 4 questions, and both used pictorial response scales.

Readability

Reading age was reported for three PROMs. The HowRU, which was the shortest included PROM, had the highest readability level with a Flesch-Kincaid Grade Level of 1.9 and a reading ease of 89 [19]. The EQ-5D-3L was reported to have “very easy” to “fairly easy” readability and a Flesch-Kincaid grade level of 4.2 [20]. The PAM-13 had the lowest readability level of the PROMs that reported this, with a Flesch-Kincaid grade level of 8.2, and a reading age of 13 in a general population sample.
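For readers unfamiliar with these readability scores, the sketch below shows the standard Flesch Reading Ease and Flesch-Kincaid Grade Level formulas. The review does not describe how the cited studies computed their values, so this is background only; the function names and the example counts are hypothetical, and the word, sentence and syllable counts are assumed to be supplied by the caller.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: approximate US school grade required."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for a short questionnaire: 100 words, 10 sentences,
# 150 syllables.
print(round(flesch_reading_ease(100, 10, 150), 3))   # 69.785
print(round(flesch_kincaid_grade(100, 10, 150), 2))  # 6.01
```

Both scores depend only on average sentence length and average syllables per word, which is why very short instruments with simple wording score as highly readable.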

Methodological quality

Table 2 presents COSMIN risk of bias ratings for the methodological quality of the included studies reporting the psychometric properties of the 19 PROMs (very good, adequate, doubtful and inadequate). Most PROMs were rated as very good or adequate for development, with clear descriptions provided of the construct to be measured, the theoretical or conceptual origin of the construct, and the target population for which the PROM was developed. The MYMOP was rated as inadequate as the construct being measured was described only as a patient generated measure that evaluated symptoms and activity of daily living, and the sample used for development was not described. Content validity was rated as very good or adequate for nine PROMs, but inadequate for four PROMs (COOP Charts, HowRU, Measure Yourself Outcomes Profile and Outcomes related to Impact on Daily Living) because of lack of clear involvement of patients and/or professionals in item development. Of the remaining properties, the most frequently evaluated were construct validity (27 studies), structural validity (20 studies) and internal consistency (17 studies). Criterion validity was not rated for most studies due to lack of a ‘gold standard’ comparison for the included measures, according to the COSMIN definition [15]. The only exceptions were comparisons of the SF-12 with the SF-36, and the EQ-5D-5L with the EQ-5D-3L. Four studies reported cross cultural validity/measurement invariance, and two reported measurement error. The three included studies reporting on the psychometric properties of the PAM-13/22 provided the most evidence of psychometric testing, with eight of the ten properties able to be rated.

Table 2

Summary of methodological quality^a (very good/adequate/doubtful/inadequate) and measurement properties^b (+, −, ?) for each paper reporting on a shortlisted PROM (n = 19 PROMs reported in n = 32 studies)

PROM name	Prom development	Content validity	Structural validity	Internal consistency	Cross cultural/measurement invariance	Reliability	Measurement error	Criterion validity	Construct validity	Responsiveness
COOP charts Jenkinson [32]-9 item Kinnersley [30]-6 item Nelson [33]-9 item	Very good	Inadequate/?	0 0 0	0 0 0	0 0 0	0 Inadequate/? Doubtful/−	0 0 0	0 0 0	Adequate/− Doubtful/+ Very good/+	0 Doubtful/+ 0
EQ-5D Johnson [34]-3L Jannsen [35]-3L Zakershrak [36]-3L Santiago [37]-5L Jannsen [35]-5L	Very good	Adequate/+	0 0 Very good/+ Very good/+ 0	0 0 0 0 0	0 0 0 0 0	0 0 0 0 0	0 0 0 0 0	0 0 Very good/+ 0	Adequate/+ Adequate/− 0 Adequate/+ Adequate/−	0 0 0 0 0
HowRU Benson [19]	Adequate	Inadequate/?	Adequate/+	Very good/+	0	0	0	0	Very good/+	0
Long-term conditions questionnaire Potter [27] Batchelder [25]	Very good	Very good/+	Adequate/+ Very good/+	Very good/+ Very good/+	0 Very good/−	Very good/+ 0	0 0	0 0	Very good/+ Very good/+	0 0
Measure yourself outcomes profile Paterson [38]	Inadequate	Inadequate/?	0	0	0	0	0	0	Very good/−	Doubtful/?
Outcomes related to impact on daily living Reilly [39]	Doubtful	Inadequate/?	0	0	0	0	0	0	0	Very good/−
Patient activation measure Hibbard 2004 20-item [28] Hibbard 2005 13-item [40] Hung 13 item [26]	Very good	Very good/+	Very good/+ Very good/+ Very good/+	Very good/+ 0 Very good/+	0 0 Adequate/−	Adequate/+ 0 0	Adequate/+ Adequate/+ 0	0 0 0	Very good/+ Very good/+ Inadequate/−	Very good/? 0 0
Patient assessment of chronic illness care Glasgow [41]-20 item Taggart [42]-20 item Rick [43]-20 item Lambert [21]-14 item Gibbons [22]-13 item	Very good	Doubtful/−	Very good/+ Very good/+ Very good/− Very good/+ Very good/+	Very good/+ Very good/+ Very good/− Very good/− 0	0 0 0 Inadequate/− 0	Inadequate/− 0 0 0 Very good/+	0 0 0 0 0	0 0 0 0 0	Inadequate/+ Very good/? Inadequate/− 0 0	0 0 0 0 0
Patient enablement instrument Howie 1997 [44] Howie 1998 [45] Mead [46]	Very good	Doubtful/−	0 0 0	Very good/+ Very good/+ 0	0 0 0	0 0 0	0 0 0	0 0 0	Inadequate/− Very good/+ Adequate/?	0 0 0
PROMIS-29 Hays [47] Fischer [48] Cella	Very good	Very good/+	Very good/+ Very good/+ 0	Very good/+ Very good/+ 0	0 Very good/+ 0	0 0 0	0 0 0	0 0 Very good/+	Very good/+ Very good/+ Adequate/+	0 0 0
PROMIS-10 global health Hays[49]	Very good	Very good/+	Very good/−	0	0	0	0	0	Very good/+	0
Primary care outcomes questionnaire Murphy [31]	Very good	Very good/+	Adequate/+	Very good/+	0	0	0	0	Adequate/+	Inadequate/+
SF-12 Ware [29] Cheak-Zamora [50]	Very good	Adequate/+	0 0	0 Very good/+	0 0	Doubtful/+ Very good/−	0 0	Inadequate/+ 0	Very good/− Doubtful/−	0 0
WHODAS Andrews [51] Rhem [52]	Very good	Very good/+	Very good/+ Very good/+	0 Very good/+	0 0	0 0	0 0	0	0 Inadequate/?	0 0
WHOQoL-brief WHOQoL group [23] Skevington [24]	Very good	Very good/+	Very good/? Adequate/+	Very good/− Very good/−	0 Very good/?	Doubtful/− 0	0 0	0 0	Adequate/+ Adequate/?	0 0

0 = property not assessed

^aMethodological quality was rated for each study as very good, adequate, doubtful or inadequate in line with definitions provided by the Consensus-based Standards for the selection of health status Measurement Instruments (COSMIN) risk of bias checklist

^bMeasurement properties were rated using a revised version of the COSMIN criteria for good measurement properties using ratings of sufficient (+), insufficient (−) or indeterminate (?)

Measurement properties

Table 2 provides the COSMIN ratings for measurement properties for the included studies reporting the psychometric properties of the 15 PROMs (sufficient (+), insufficient (−) or indeterminate (?)). Nine PROMs had sufficient content validity (EQ-5D-3L/EQ-5D-5L, Long-term Conditions Questionnaire, PAM-13/22, PROMIS-29, PROMIS-10, Primary Care Outcomes Questionnaire, SF-12, WHODAS and WHOQoL-Brief). Four PROMs had indeterminate ratings as not enough information was provided to rate content validity (COOP Charts, HowRU, Measure Yourself Outcomes Profile, Outcomes related to Impact on Daily Living), and two PROMs had insufficient content validity as there was no clear involvement of patients and/or professionals in development of the tool (Patient Assessment of Chronic Illness Care, Patient Enablement Instrument). Structural validity was sufficient in 17 of the 20 studies that examined it; the exceptions included one study of the 20-item Patient Assessment of Chronic Illness Care and the study of the PROMIS-10. Twelve of the sixteen studies that reported internal consistency were rated as sufficient. Two studies reporting on the internal consistency of the PACIC-14 and PACIC-20 and two studies reporting on the internal consistency of the WHOQoL-Brief were rated as insufficient as Cronbach alphas did not meet threshold scores [21‐24]. Cross cultural validity/measurement invariance was rated as sufficient for the PROMIS-29, indeterminate for the WHOQoL-Brief, and insufficient in the remaining three studies measuring this [21, 25, 26] as a result of differential item functioning being found for items. Evidence for sufficient test–retest reliability was found in four studies [22, 27‐29]. Construct validity was evaluated in 27 studies and was rated as sufficient in 15 of these. Only two studies, one reporting on the COOP charts [30] and one on the Primary Care Outcomes Questionnaire [31], demonstrated evidence of responsiveness.
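As background to the internal consistency ratings, the sketch below computes Cronbach's alpha from item-level responses using the standard formula. It is illustrative only: the function name and the toy data are hypothetical, and the cut-off is an assumption, since the text does not state the threshold applied (COSMIN criteria commonly use an alpha of at least 0.70).

```python
from statistics import variance

ASSUMED_THRESHOLD = 0.70  # assumption: not stated in the review text


def cronbach_alpha(responses):
    """Cronbach's alpha for a unidimensional scale.

    responses: list of respondents, each a list of k item scores (k >= 2).
    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(responses[0])
    items = list(zip(*responses))  # transpose to one tuple per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Toy data: three respondents answering two items.
alpha = cronbach_alpha([[1, 2], [2, 1], [3, 3]])
print(round(alpha, 3))             # 0.667
print(alpha >= ASSUMED_THRESHOLD)  # False
```

In this toy case the scale would fall short of the assumed cut-off, which is the kind of result that led to an insufficient rating in the studies cited above.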

Evidence synthesis

GRADE outcomes for PROMs overall, together with the overall ratings for measurement properties, are provided in Table 3. Six PROMs had evidence for sufficient content validity and at least low-quality evidence for sufficient internal consistency, meeting the COSMIN threshold as being able to be recommended for implementation. These were: the Patient Activation Measure (measuring patient activation and empowerment), PROMIS-29 and SF-12 (measuring health related quality of life), the Primary Care Outcomes Questionnaire and Long-term Conditions Questionnaire (measuring outcomes or impact of care), and the WHODAS (measuring health and disability). Seven PROMs (COOP Charts, HowRU, Measure Yourself Outcomes Profile, Outcomes related to Impact on Daily Living, Patient Assessment of Chronic Illness Care, Patient Enablement Instrument, WHOQoL-Brief) had evidence for insufficient or indeterminate content validity or insufficient internal consistency and were classified as Class C PROMs. The remaining two PROMs, the EQ-5D and PROMIS-10, require further validity testing before they can be recommended for use.

Table 3

Quality of evidence overall for PROMs (n = 19)

PROM name	Structural validity	Internal consistency	Cross cultural/measurement invariance	Reliability	Measurement error	Criterion	Construct	Responsiveness	Overall rating
COOP charts	0	0	0	Very low/±	0	0	Low/+	Low/+	C
EQ-5D-3L	High/+	0	0	0	0	High/+	Moderate/±	0	B
EQ-5D-5L	High/+	0	0	0	0	0	Moderate/±	0	B
HowRU	Moderate/+	Moderate/+	0	0	0	0	High/+	0	C
Long-term conditions questionnaire	High/+	High/+	High/−	High/+	0	0	High/+	0	A
Measure yourself outcomes profile	0	0	0	0	0	0	High/−	?	C
Outcomes related to impact on daily living	0	0	0	0	0	0	0	High/−	C
Patient activation measure (PAM-22)	High/+	High/+	0	Moderate/+	Moderate/+	0	High/+	?	A
Patient activation measure (PAM-13)	High/+	High/+	Moderate/−	0	Moderate/+	0	Moderate/±	0	A
Patient assessment of chronic illness care (PACIC-20)	High/±	High/±	0	Very low/−	0	0	Very low/±	0	C
Patient assessment of chronic illness care (PACIC-13)	High/+	0	0	High/+	0	0	0	0	C
Patient assessment of chronic illness care (PACIC-14)	High/+	High/−	Very low/−	0	0	0	0	0	C
Patient enablement instrument	0	0	0	0	0	0	Moderate/±	0	C
PROMIS-29	High/+	High/+	High/+	0	0	High/+	High/+	0	A
PROMIS-10 global health	High/−	0	0	0	0	0	High/+	0	B
Primary care outcomes questionnaire	Moderate/+	Moderate/+	0	0	0	0	Moderate/+	Very low/+	A
SF-12	0	0	0	Moderate/±	0	Very low/+	Moderate/−	0	B
WHODAS	High/+	High/+	0	0	0	0	?	0	A
WHOQoL-brief	Moderate/+	Moderate/−	?	Low/−	0	0	Moderate/+	0	C

Each measurement property for a PROM was rated as having overall sufficient (+), insufficient (−), inconsistent (±) or indeterminate (?) evidence. Quality of evidence for each measurement property of each PROM was rated as high, moderate, low, or very low based on the guidelines from the modified Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach [18]. In line with COSMIN guidelines, internal consistency could not be rated higher in quality than structural validity. Internal consistency was only rated where structural validity was established.

Discussion

This review critically appraised the psychometric properties of self‐reported generic PROMs applicable for routine clinical use in the primary healthcare setting in Australia. Of nineteen PROMs included, six had evidence of sufficient content validity and internal consistency per the COSMIN criteria and can be recommended for use in primary care. However, our findings highlight significant gaps in evidence about the psychometric properties of PROMs relevant for the primary care setting. No PROMs had evidence of psychometric validity on all ten criteria examined. For five PROMs, there was evidence of the psychometric properties from only a single validation study and for four PROMs there was evidence from only two validation studies. Cross-cultural validity/measurement invariance, responsiveness, and criterion validity have been assessed for very few PROMs, and only 6 PROMs had evidence of reliability. Failure to undertake robust psychometric evaluation may be unsurprising given the cost and time-consuming nature of scale development and the large sample sizes that are required. Nonetheless, given the increasing implementation of PROMs in healthcare as a means of achieving patient-centred care, it is critical that psychometrically sound tools are available.

Although six PROMs met COSMIN criteria as Class A PROMs (recommended for use), further psychometric testing of these PROMs is still required. Responsiveness over time of the PAM-22 and PAM-13 needs to be established given some evidence of ceiling effects in other contexts [53–55], as does differential item functioning for the PAM-22 [53]. Additional work is also needed to explore whether refinement would resolve differential item functioning for the PAM-13 [26]. The Long-term Conditions Questionnaire requires evidence for cross-cultural validity/measurement invariance, and the minimally important clinical difference for Primary Care Outcomes Questionnaire change scores is not yet established. Four PROMs, the EQ-5D-3L, EQ-5D-5L, PROMIS-10 and SF-12, were designated Class B PROMs that require additional psychometric testing before they can be recommended for use. While widely used, these four PROMs did not have any evidence for internal consistency, which needs to be established in the primary care setting before they can be recommended for use. Healthcare providers selecting PROMs for use in primary care should consider these findings when choosing the most appropriate tools for their patient populations. Recommendations for healthcare providers include prioritising PROMs with strong psychometric evidence, such as those classified as Class A, while acknowledging the need for further validation of these tools in the primary care context. Caution should be exercised in using the EQ-5D-3L, EQ-5D-5L, PROMIS-10 and SF-12 in primary care until internal consistency is confirmed in primary care populations.

Challenges in implementing PROMs in primary care include the feasibility of use with the patient population. Other research supports the feasibility of PROMs in the Australian healthcare context. In 2014/15, a pilot study evaluated implementation of the PAM-13 [56] with a convenience sample of 1490 people. It found that 97% of participants completed the PAM-13 within 5–10 min and 93% indicated no difficulty in answering questions. The PROMIS-29 is currently used in the public health sector in NSW to measure patient reported outcomes in hospitals. Additionally, how the captured data will be used should be carefully considered when deciding which PROM to implement. A significant advantage of the SF-12 is that it is one of the few PROMs that can be used to measure both health-related quality of life and health state utility value.

Despite different ways of conceptualising health among Indigenous populations, we identified no PROMs developed specifically for use with Indigenous populations, and only one study included in the review examined the psychometric properties of a PROM (the EQ-5D) with an Aboriginal and/or Torres Strait Islander sample. Perceptions of health, wellbeing and what is an important outcome of healthcare vary across cultures [57]. While work is currently underway by the New South Wales (NSW) Agency of Clinical Innovation to examine the cultural appropriateness and validity of the PROMIS-29 for assessing health outcomes amongst Aboriginal and Torres Strait Islander people with diabetes in NSW [58], the use of existing general PROMs may not be suitable given most are underpinned by a western cultural perspective and traditional biomedical perceptions of health and wellbeing [57, 59]. There is a need for further work to establish a culturally appropriate and meaningful PROM for use with Aboriginal and Torres Strait Islander people in Australia. A 2019 systematic review identified domains of wellbeing relevant to Aboriginal and Torres Strait Islander Australians for measuring quality of life [60]. These included autonomy, empowerment and recognition; family and community; culture, spirituality and identity; Country; basic needs; work, roles, and responsibilities; education; physical health; and mental health. These identified domains are important to consider in any future work developing an Aboriginal and/or Torres Strait Islander specific PROM. PROMs should be developed with Indigenous knowledges, methodologies and methods [59].

Strengths and limitations

This review provides a detailed summary of the methodological and measurement properties of PROMs available to assess patient reported outcomes in primary health care. While rigorous COSMIN methodology was utilised to undertake the review and make recommendations, the findings should be considered with regard to several limitations. Recommendations about suitable PROMs needed to be provided in a short time frame, which limited the feasibility of undertaking a primary review of the literature. As a result, potentially eligible PROMs were identified from an existing recently published review that used a comprehensive search of the academic and grey literature, supplemented by our own searches. This pragmatic approach may have led to the exclusion of some recently published PROMs that were not identified in the existing review. Additionally, the reliance on secondary data introduces the possibility of overlooking relevant studies or variations in PROM performance missed in the first review, which could have been captured through a more exhaustive primary review process. Finally, studies were restricted to those that implemented English language versions of PROMs. Because psychometric performance may differ by language and across cultures [61], this restriction limits the generalisability of our findings to non-English speaking populations. Additional research is needed to assess the psychometric properties of PROMs published in languages other than English, which would provide a more comprehensive understanding of their cross-cultural applicability and performance.

Conclusions

The use of psychometrically sound and culturally appropriate PROMs in primary care is essential for achieving patient-centred care. Six PROMs measuring patient activation and empowerment, health related quality of life, outcomes of care, and health and disability met COSMIN criteria and can be recommended for use in primary care, although all require additional psychometric testing. These findings emphasise the need for future work to develop high quality validated PROMs that can be integrated into routine clinical practice across primary care settings. Selecting a PROM for implementation in primary care requires consideration of psychometric properties as well as feasibility and patient relevance. Further work is needed to develop a culturally appropriate and meaningful PROM for use with Aboriginal and Torres Strait Islander people in Australia that resonates with their unique health experiences, cultural values, and worldviews. Policymakers and healthcare providers should prioritise supporting the development, validation, and integration of PROMs to improve patient outcomes and care delivery.

Acknowledgements

The authors would like to acknowledge Georgia Spanner for research assistance in completing the review. We would also like to acknowledge the work of the Review Advisory Group that guided this work: Kylie Armstrong, Sandy Gillies, Rhonda Fleming, Leanne Mullan, Alistair MacDonald, Trish Leddington-Hill, Karen Hale-Robertson, Edie Stevens, Philippa Hawke, Candice Crawford and Geoff Clarke.

Declarations

Competing interests

The authors declare that they have no competing interests.

Ethical approval

Not applicable due to use of secondary data in public domain.

Consent for publication

Not applicable.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Higgins, J. P. T., & Greene, S. (2021). Cochrane handbook for systematic reviews of interventions. The Cochrane Collaboration.

2. U.S. Department of Health and Human Service. Food and Drug Administration. (2019). Guidance for industry: Patient-reported outcome – Use in medical product development to support labeling claims. US Department of Health and Human Service: Food and Drug Administration.

3. Black, N. (2013). Patient reported outcome measures could help transform healthcare. British Medical Journal, 346, 167.

4. Ishaque, S., Karnon, J., Chen, G., Nair, R., & Salter, A. B. (2019). A systematic review of randomised controlled trials evaluating the use of patient-reported outcome measures (PROMs). Quality of Life Research, 28, 567–592.

5. Kotronoulas, G., Kearney, N., Maguire, R., Harrow, A., Di Domenico, D., & Croy, S. (2014). What is the value of the routine use of patient-reported outcome measures toward improvement of patient outcomes, processes of care, and health service outcomes in cancer care? A systematic review of controlled trials. Journal of Clinical Oncology, 32, 1480–1501.

6. Nelson, E. C., Eftimovska, E., Lind, C., Hager, A., Wasson, J. H., & Lindblad, S. (2015). Patient reported outcome measures in practice. British Medical Journal Open, 350, 7818.

7. Hawkins, M., Elsworth, G. R., & Osborne, R. H. (2018). Application of validity theory and methodology to patient-reported outcome measures (PROMs): building an argument for validity. Quality of Life Research, 27, 1695–1710.

8. Churruca, K., Pomare, C., Ellis, L. A., Long, J. C., Henderson, S. B., Murphy, L., Leahy, C. J., & Braithwaite, J. (2021). Patient-reported outcome measures (PROMs): A review of generic and condition-specific measures and a discussion of trends and issues. Health Expectations, 24(4), 1015–1024.

9. Brower, K., et al. (2021). The use of patient-reported outcome measures in primary care: Applications, benefits and challenges. Journal of Patient-Reported Outcomes, 5(2), 84.

10. Department of Health (2024). Primary Health Network background. Available from: https://www.health.gov.au/our-work/phn.

11. Australian Commission on Safety and Quality in Healthcare (2024). List of Generic PROMs. Available from: https://www.safetyandquality.gov.au/our-work/indicators-measurement-and-reporting/patient-reported-outcomes/proms-lists/list-generic-proms

12. Agency for Clinical Innovation (2024). Resources for clinicians and patients. Available from: https://aci.health.nsw.gov.au/statewide-programs/prms/resources

13. Murphy, M., Hollinghurst, S., & Salisbury, C. (2019). Identification, description and appraisal of generic PROMs for primary care: A systematic review. BMC Family Practice, 19, 41.

14. Mokkink, L. B., et al. (2019). COSMIN risk of bias checklist for systematic reviews of patient-reported outcome measures. Quality of Life Research, 27(5), 1171–1179.

15. Mokkink, L. B., et al. (2018). COSMIN methodology for systematic reviews of patient-reported outcome measures (PROMs). Available from: https://cosmin.nl/wp-content/uploads/COSMIN-syst-review-for-PROMs-manual_version-1_feb-2018.pdf.

16. Gartner, F. R., et al. (2018). The quality of instruments to assess the process of shared decision making: A systematic review. PLoS ONE, 13(2), e0191747.

17. Bull, C., et al. (2019). A systematic review of the validity and reliability of patient-reported experience measures. Health Services Research, 54(5), 1023–1035.

18. Prinsen, C. A. C., et al. (2018). COSMIN guideline for systematic reviews of patient-reported outcome measures. Quality of Life Research, 12(27), 1147–1157.

19. Benson, T., et al. (2010). Evaluation of a new short generic measure of health status: HowRu. Informatics in Primary Care, 18(2), 89–101.

20. Paz, S. H., et al. (2009). Readability estimates for commonly used health-related quality of life surveys. Quality of Life Research, 18(7), 889–900.

21. Lambert, S., et al. (2021). Using confirmatory factor analysis and Rasch analysis to examine the dimensionality of the patient assessment of care for chronic illness care (PACIC). Quality of Life Research, 30(5), 1503–1512.

22. Gibbons, C. J., et al. (2017). The patient assessment of chronic illness care produces measurements along a single dimension: Results from a Mokken analysis. Health and Quality of Life Outcomes, 15(1), 61.

23. The WHOQOL Group. (1998). Development of the World Health Organization WHOQOL-BREF quality of life assessment. Psychological Medicine, 28(3), 551–558.

24. Skevington, S. M., et al. (2004). The World Health Organization’s WHOQOL-BREF quality of life assessment: Psychometric properties and results of the international field trial. A report from the WHOQOL group. Quality of Life Research, 13(2), 299–310.

25. Batchelder, L., et al. (2020). Rasch analysis of the long-term conditions questionnaire (LTCQ) and development of a short-form (LTCQ-8). Health and Quality of Life Outcomes. https://doi.org/10.1186/s12955-020-01626-3

26. Hung, M., et al. (2013). Psychometric assessment of the patient activation measure short form (PAM-13) in rural settings. Quality of Life Research, 22(3), 521–529.

27. Potter, C. M., et al. (2017). Long-term conditions questionnaire (LTCQ): Initial validation survey among primary care patients and social care recipients in England. British Medical Journal Open, 3(7), e019235.

28. Hibbard, J. H., et al. (2004). Development of the patient activation measure (PAM): Conceptualizing and measuring activation in patients and consumers. Health Services Research, 39(4 Pt 1), 1005–1026.

29. Ware, J. E., Jr., & Sherbourne, C. D. (1992). The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Medical Care, 30, 473–483.

30. Kinnersley, P., Peter, T., & Stott, N. (1994). Measuring functional health status in primary care using the COOP-WONCA charts: Acceptability, range of scores, construct validity, reliability, and sensitivity to change. British Journal of General Practice, 44, 545–549.

31. Murphy, M., et al. (2018). Primary care outcomes questionnaire: Psychometric testing of a new instrument. British Journal of General Practice, 68(671), e433–e440.

32. Jenkinson, C., et al. (2002). Evaluation of the Dartmouth COOP charts in a large-scale community survey in the United Kingdom. J Pub Health Med, 24(2), 106–111.

33. Nelson, E. C., et al. (1990). The functional status of patients. How can it be measured in physicians’ offices? Medical Care, 28(12), 1111–1126.

34. Johnson, J. A., & Coons, S. J. (1998). Comparison of the EQ-5D and SF-12 in an adult US sample. Quality of Life Research, 7(2), 155–166.

35. Janssen, M. F., et al. (2013). Measurement properties of the EQ-5D-5L compared to the EQ-5D-3L across eight patient groups: A multi-country study. Quality of Life Research, 22(7), 1717–1727.

36. Zakershahrak, M., et al. (2022). Psychometric properties of the EQ-5D-3L in South Australia: A multi-method non-preference-based validation study. Current Medical Research and Opinion, 38(5), 673–685.

37. Ribeiro Santiago, P. H., et al. (2021). Psychometric properties of the EQ-5D-5L for Aboriginal Australians: A multi-method study. Health and Quality of Life Outcomes, 19(1), 81.

38. Paterson, C. (1996). Measuring outcomes in primary care: A patient generated measure, MYMOP, compared with the SF-36 health survey. British Medical Journal Open, 312(7037), 1016–1020.

39. Reilly, D., et al. (2007). Outcome related to impact on daily living: Preliminary validation of the ORIDL instrument. BMC Health Services Research, 7, 139.

40. Hibbard, J. H., et al. (2005). Development and testing of a short form of the patient activation measure. Health Services Research, 40(6 Pt 1), 1918–1930.

41. Glasgow, R. E., et al. (2005). Development and validation of the patient assessment of chronic illness care (PACIC). Medical Care, 43, 436–444.

42. Taggart, J., Chan, B., et al. (2011). Patients assessment of chronic illness care (PACIC) in two Australian studies: Structure and utility. Journal of Evaluation in Clinical Practice, 17(2), 215–221.

43. Rick, J., et al. (2012). Psychometric properties of the patient assessment of chronic illness care measure: Acceptability, reliability and validity in United Kingdom patients with long-term conditions. BMC Health Services Research, 12, 293.

44. Howie, J. G., Heaney, D. J., & Maxwell, M. (1997). Measuring quality in general practice. Pilot study of a needs, process and outcome measure. Occasional Paper (Royal College of General Practitioners), 75, 1–32.

45. Howie, J., et al. (1998). A comparison of a patient enablement instrument (PEI) against two established satisfaction scales as an outcome measure of primary care consultations. Family Practice, 15(2), 165–171.

46. Mead, N., Bower, P., & Roland, M. (2008). Factors associated with enablement in general practice: Cross-sectional study using routinely-collected data. British Journal of General Practice, 58(550), 346–352.

47. Hays, R. D., et al. (2018). PROMIS®-29 v2.0 profile physical and mental health summary scores. Quality of Life Research, 27(7), 1885–1891.

48. Fischer, F., et al. (2018). Measurement invariance and general population reference values of the PROMIS profile 29 in the UK, France, and Germany. Quality of Life Research, 27(4), 999–1014.

49. Hays, R. D., et al. (2009). Development of physical and mental health summary scores from the patient-reported outcomes measurement information system (PROMIS) global items. Quality of Life Research, 18(7), 873–880.

50. Cheak-Zamora, N. C., Wyrwich, K. W., & McBride, T. D. (2019). Reliability and validity of the SF-12v2 in the medical expenditure panel survey. Quality of Life Research, 18(6), 727–735.

51. Andrews, G., et al. (2009). Normative data for the 12 item WHO disability assessment schedule 2.0. PLoS ONE, 4(12), e8343.

52. Rehm, J., & Ustun, T. B. (1999). On the development and psychometric testing of the WHO screening instrument to assess disablement in the general population. International Journal of Methods in Psychiatric Research, 8(2), 110.

53. Lightfoot, C. J., Wilkinson, T. J., Memory, K. E., Palmer, J., & Smith, A. C. (2021). Reliability and validity of the patient activation measure in kidney disease: Results of Rasch analysis. Clinical Journal of the American Society of Nephrology, 16(6), 880–888.

54. Moljord, I. E. O., Lara-Cabrera, M. L., Perestelo-Pérez, L., Rivero-Santana, A., Eriksen, L., & Linaker, O. M. (2015). Psychometric properties of the patient activation measure-13 among out-patients waiting for mental health treatment: A validation study in Norway. Patient Education and Counseling, 98(11), 1410–1417.

55. Soejima, T., & Kitao, M. (2023). Adaptation and measurement invariance of the 13-item version of patient activation measure across Japanese young adult cancer survivors during and after treatment: A cross-sectional observational study. PLoS ONE, 18(9), e0291821.

56. South Eastern Sydney Medicare Local. (2015). PAM: Measuring patient activation in the South Eastern. South Eastern Sydney Medicare Local.

57. Kite, E., & Davy, C. (2015). Using indigenist and Indigenous methodologies to connect to deeper understanding of Aboriginal and Torres Strait Islander peoples’ quality of life. Health Promotion Journal of Australia, 26, 191–194.

58. Burgess, A., Hawkins, J., Kostovski, C., & Duncanson, K. (2022). Assessing cultural appropriateness of patient-reported outcome measures for Aboriginal people with diabetes: Study protocol. Public Health Res Pract, 32(1), e31122105.

59. Ryder, C., et al. (2022). Community engagement and psychometric methods in Aboriginal and Torres Strait Islander patient-reported outcome measures and surveys – A scoping review and critical analysis. International Journal of Environmental Research and Public Health, 19(16), 10354.

60. Butler, T. L., et al. (2019). Aboriginal and Torres Strait Islander people’s domains of wellbeing: A comprehensive literature review. Social Science & Medicine, 233, 138–157.

61. Saxena, S., Carlson, D., & Billington, R. (2001). The WHO quality of life assessment instrument (WHOQOL-Bref): The importance of its items for cross-cultural research. Quality of Life Research, 10(8), 711–721.

Title: Psychometric properties, and cultural appropriateness, of patient reported outcome measures for use in primary healthcare: a scoping review
Authors: Christopher M. Doran, Jamie Bryant, Erika Langham, Roxanne Bainbridge, Anthony Shakeshaft, Breanne Hobden, Sara Farnbach, Megan Freund
Publication date: 28-03-2025
Publisher: Springer New York
Published in: Quality of Life Research
Print ISSN: 0962-9343
Electronic ISSN: 1573-2649
DOI: https://doi.org/10.1007/s11136-025-03956-5
