Introduction
The introduction of antiretroviral therapies (ARTs) and highly active ARTs has greatly reduced morbidity and mortality for people with HIV (PWH), with marked improvements in life expectancy seen since 1996 [1]. ARTs, however, are currently unable to cure PWH. They suppress the viral load (VL) without completely eliminating the virus [2]. Consequently, an increasing proportion of PWH now live with chronic HIV infection, compared with those experiencing severe morbidity and a significantly reduced life expectancy [3].
The World Health Organization’s strategy on HIV is to end acquired immunodeficiency syndrome (AIDS) and achieve universal health coverage and healthy lives and well-being for all ages by 2030 [4]. Additionally, it has been advocated that at least 90% of PWH with VL suppression should maintain a good health-related quality of life (HRQoL) [5]. HRQoL in PWH is therefore an important outcome when assessing the overall benefits of ARTs. HRQoL can be operationalised within modelling-based economic evaluations by quantifying health state utility values (HSUVs). HSUVs quantify preferences for different health states on a cardinal scale and are often described as a quantification of HRQoL. HSUVs therefore make it possible to indicate the extent to which one health state is ‘preferred’ to another. HSUVs are used to inform cost-effectiveness analyses (CEAs) and are a key component of economic decision making. HSUVs can also be used to inform and guide clinical priorities [6] and policy development. Such use, however, needs to be weighed against a range of other relevant information. This includes life expectancy and equality and equity considerations, alongside political, socioeconomic, and cultural requirements and expectations [6].
HSUVs can be obtained via direct elicitation of an individual’s preferences for different health states. This uses techniques such as the standard gamble (SG) or time trade-off (TTO). Alternatively, they can be obtained indirectly via preference-based measures (PBMs), i.e. health-related patient-reported outcome measures such as EuroQol’s EQ instruments (e.g. EQ-5D) or the Short-Form 6 Dimensions (SF-6D) [7]. In either case, these preferences represent HRQoL on a cardinal scale, commonly anchored at 0 (dead) and 1 (perfect health) [8]. Negative values are possible for states valued as worse than dead. Some PBMs may be less responsive than HIV-specific measures to HRQoL changes and adverse events (AEs) during ART [9]. Even so, generic PBMs are shorter and easier to administer, which makes them a valuable method for assessing HRQoL, and tools such as EQ-5D have shown validity in PWH [10].
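The sketch below illustrates how direct elicitation yields values on this scale. It applies the conventional scoring for states considered better than dead:

- In TTO, a respondent may be indifferent between t years in the health state and x years in full health. The state then receives a utility of x/t.
- In SG, the utility equals the probability p of full health (versus immediate death) at which the respondent is indifferent to living in the health state for certain.

The function names are hypothetical. States valued as worse than dead require a modified procedure and are not covered.

```python
def tto_utility(years_full_health: float, years_in_state: float) -> float:
    """Time trade-off: utility = x / t for a state valued as better than dead.

    years_full_health (x): years in full health the respondent considers
    equivalent to years_in_state (t) in the health state being valued.
    """
    if years_in_state <= 0:
        raise ValueError("years_in_state must be positive")
    if not 0 <= years_full_health <= years_in_state:
        raise ValueError("expected 0 <= x <= t for a better-than-dead state")
    return years_full_health / years_in_state


def sg_utility(indifference_probability: float) -> float:
    """Standard gamble: utility equals the probability p of full health
    (versus immediate death) at which the respondent is indifferent between
    the gamble and living in the health state for certain."""
    if not 0 <= indifference_probability <= 1:
        raise ValueError("probability must lie in [0, 1]")
    return indifference_probability


# Hypothetical responses: indifferent between 10 years in the state and
# 8 years in full health (TTO), or a gamble with p = 0.85 (SG).
print(tto_utility(8, 10))   # 0.8
print(sg_utility(0.85))     # 0.85
```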
HSUVs can be combined with survival time to estimate quality-adjusted life-years (QALYs) [7]. QALYs capture morbidity and mortality in a single metric and are a key health-related outcome for CEAs. Health Technology Assessment (HTA) agencies commonly recommend CEA to provide ‘value-for-money’ evidence on new healthcare technologies and interventions compared with relevant alternatives. This evidence guides the efficient allocation of finite healthcare resources. Both direct and indirect preference-based methods are potentially suitable for obtaining HSUVs. However, HTA agencies internationally, such as the National Institute for Health and Care Excellence [NICE] for England and Wales, often prefer indirect methods. These use PBMs whose preferences are based on a representative sample of society, as opposed to only individuals with the health condition of interest [11].
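Combining HSUVs with survival time can be sketched as a sum, over health states, of utility multiplied by time spent in each state. The minimal Python sketch below rests on several assumptions:

- The pathway is hypothetical, with consecutive states and a constant utility within each whole year.
- A constant annual discount rate is optionally applied. The 3.5% shown is the rate in NICE's reference case, used here only for illustration.
- All utilities and durations are invented. They are not estimates from this review.

```python
from dataclasses import dataclass


@dataclass
class HealthState:
    name: str
    utility: float   # HSUV on the 0 (dead) to 1 (perfect health) scale
    years: int       # whole years spent in this state


def qalys(pathway: list[HealthState], discount_rate: float = 0.0) -> float:
    """Sum utility x time over consecutive states, discounting each year.

    Year k (k = 0, 1, ...) is discounted by 1 / (1 + r) ** k, so the first
    year is undiscounted (a simplifying convention assumed for this sketch).
    """
    total = 0.0
    year = 0
    for state in pathway:
        for _ in range(state.years):
            total += state.utility / (1 + discount_rate) ** year
            year += 1
    return total


# Hypothetical pathway: 3 years viraemic, then 5 years suppressed on ART.
pathway = [HealthState("viraemic", 0.70, 3), HealthState("suppressed", 0.85, 5)]
print(round(qalys(pathway), 2))  # 6.35
print(round(qalys(pathway, discount_rate=0.035), 2))
```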
Previous systematic reviews have provided an overview of methods for estimating HRQoL or HSUVs for specific health states in PWH [12–14]. There is, however, still limited evidence on preference-based estimates for PWH. Additionally, HSUVs for all relevant health states required to represent the disease or care pathway of PWH in cost-effectiveness models have not yet been comprehensively identified and critiqued, particularly for newer treatments (e.g. atazanavir). Therefore, there is an opportunity to identify and collate recent evidence on HSUVs to improve understanding and quantification of HRQoL across different health states for PWH that can help inform economic modelling and development of future studies assessing new treatments in PWH.
This de novo systematic literature review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidance [15]. It also followed recommendations from The Professional Society for Health Economics and Outcomes Research (ISPOR) [16], the Centre for Reviews and Dissemination (CRD) [17], and the NICE Decision Support Unit [18]. Our aim was to systematically consolidate the wealth of information on health utilities in PWH, particularly for new ARTs, by identifying, appraising, and collating up-to-date evidence. This review also presents information to help readers understand the context and nature of each HSUV. This includes how these values represent preference-based HRQoL in relation to the underlying study population and setting.
Methods
The protocol was pre-registered on PROSPERO (CRD42022346286) [19].
Eligibility criteria
We included interventional (randomised controlled trials [RCTs] and non-randomised trials) and observational (cohort, cross-sectional, case–control) studies with full-text articles published since 2000, with no geographic restrictions. Systematic reviews, case series, case reports, editorials, and cost-effectiveness studies were excluded.
Studies needed a well-defined population or subgroup of adolescents (14–17 years) or adults (≥ 18 years) with a reported HIV diagnosis. Interventions of interest were ARTs (any formulation) administered as first- or second-line therapy, or as a treatment switch. Additionally, treatment-naïve PWH were included where studies also had a comparator arm of relevant ART interventions. Studies reporting HSUVs for PWH with co-infections or comorbidities receiving ARTs were included if adjustments were made for co-existing conditions or concomitant treatments. Treatments for HIV-related co-infections, complications or adverse effects, non-pharmacological treatments, and complementary or alternative management were excluded.
The primary outcome of interest was HSUVs obtained either directly or indirectly. These were reported as point estimates alongside distributional statistics and p-values (when reported and relevant). Included data related to the entire study population or to subsets defined by health state. Such health states included stage of infection, treatment status, or pre-specified clinical parameters (i.e. CD4 count and VL). Additional outcomes of interest were disutilities reported as coefficients from regression models in which the outcome was a utility value. These are also useful for exploring the relationship between population characteristics and HSUVs. Relevant estimates reported by mapping studies were to be considered if the search produced limited relevant data. In practice, given the volume of available data from directly elicited estimates, estimates derived from mapping (i.e. studies reporting on non-preference-based methods of HRQoL) were not included. HRQoL obtained from proxies (e.g. clinicians, carers) or vignettes, and measures of person satisfaction, were also excluded.
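Disutilities of this kind are typically the coefficients on health-state indicators in a regression with utility as the outcome. The plain-Python sketch below fits ordinary least squares by solving the normal equations (XᵀX)β = Xᵀy. Everything in it is hypothetical and chosen only so that the fit can be checked:

- the covariates, which are viraemic and AIDS indicators relative to suppressed VL, plus age centred at 40 years;
- the coefficient values used to generate the noise-free data.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows: list[list[float]], y: list[float]) -> list[float]:
    """OLS coefficients via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y_i for r, y_i in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical respondents: (viraemic, aids, age). Suppressed VL is the
# reference category (viraemic = aids = 0).
people = [(0, 0, 35), (0, 0, 50), (1, 0, 42), (1, 0, 30), (0, 1, 45), (0, 1, 60)]
# Utilities generated without noise from hypothetical coefficients:
# 0.90 at age 40, -0.05 viraemic, -0.20 AIDS, -0.003 per year of age over 40.
X = [[1.0, v, a, age - 40] for v, a, age in people]
y = [0.90 - 0.05 * v - 0.20 * a - 0.003 * (age - 40) for v, a, age in people]

intercept, d_viraemic, d_aids, d_age = ols(X, y)
print(round(d_viraemic, 3), round(d_aids, 3), round(d_age, 4))  # -0.05 -0.2 -0.003
```

A negative coefficient is read as the utility decrement (disutility) associated with that state, holding the other covariates constant.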
Search strategy
Bibliographic database searches were initially conducted on 27 June 2022 in the following databases:

- MEDLINE
- EMBASE
- Cochrane Library
- NHS Economic Evaluation Database (NHS EED)
- International Network of Agencies for Health Technology Assessment (INAHTA)
- Epistemonikos
- ClinicalTrials.gov

A publication year limit from 2000 onwards was applied to capture studies reflecting current clinical management of HIV in PWH. Search terms included subject headings and words that represented HIV infection, antiretroviral treatment, HRQoL, and HSUVs. The MEDLINE search strategy (Supplementary File 1) was adapted for other databases. Additionally, a Google Scholar alert with the search terms ‘health utilities’ in ‘people living with HIV’ was set up from June 2022 to February 2023 to identify publications following the initial search. No additional relevant publications were identified in a Google Scholar search on 26 March 2024.
Supplementary searches included checking reference lists of potentially relevant systematic reviews, cost-effectiveness studies, and included papers. Authors of potentially relevant conference abstracts were contacted when feasible.
Selection process
Based on pre-specified eligibility criteria, two reviewers undertook study selection in a three-stage process. Firstly, both reviewers examined the titles of retrieved articles and excluded duplicates and records that did not meet the agreed criteria. Secondly, the reviewers independently examined the titles and abstracts of mutually exclusive sets of the remaining records. Early in this stage, both reviewers checked each other’s selection decisions to ensure consistency. Differences were discussed and resolved to guide the examination of subsequent records. If agreement could not be reached, the reviewers consulted a third researcher with expertise in utility measurement.
Full-text articles were obtained and divided equally between the two reviewers, each of whom examined their half in detail for relevance. Both reviewers then discussed and validated each other’s decisions. Uncertainties and discrepancies were resolved in consultation with the team’s utility measurement and health economics expert.
Data extraction and assessment
One reviewer extracted data from included studies using a bespoke Microsoft Excel data extraction form. The second reviewer checked all extracted data items. Differences and inconsistencies were resolved by discussion between the reviewers. Extracted data items included:

- study design, study period, and follow-up period;
- inclusion and exclusion criteria of the study population;
- classification of HIV stage, CD4 count, and HIV-RNA VL;
- antiretroviral treatment (previous and ongoing);
- HSUV measurement information.
There is no standardised approach for assessing the methodological quality of the health utilities literature. An 8-item review-specific quality assessment tool was therefore developed and applied, in line with the ISPOR Task Force quality assessment criteria for HSUVs in cost-effectiveness models. The items assessed were:

1. recruitment and selection of participants;
2. sample size;
3. response rate;
4. length of follow-up;
5. HSU elicitation methods;
6. source of preference weights;
7. loss to follow-up;
8. reported variance of HSUs (as a proxy for the precision of reported estimates).

The tool is given in Supplementary File 1 [16]. Studies were rated as ‘high’, ‘moderate’, or ‘low’ quality if they had ≥ 6, 4–5, or ≤ 3 ‘yes’ responses, respectively. In some cases, a single item could be rated both ‘yes’ and ‘no’ for the same study. One example is the item ‘acceptable response rate ≥ 60% for HSU measurement?’. A study might report HSU data for the entire study population (response rate 100%) and for a subgroup relating to a defined health state (response rate < 60%). Such a study was rated both ‘yes’ and ‘no’. Because ratings were based on cumulative counts of ‘yes’ responses, some studies were assigned dual ratings.
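The counting rule above can be expressed as a short function. The sketch works as follows:

- Each of the eight items is recorded as 'yes', 'no', or 'yes/no'. The 'yes/no' value marks an item rated both ways, using a hypothetical encoding.
- The number of 'yes' responses is counted for the least and most favourable reading of the 'yes/no' items.
- A study receives a dual rating when the two counts fall into different quality bands.
- Any other response (e.g. 'unclear') is treated as not 'yes', which is an assumption of this sketch.

```python
def quality_band(yes_count: int) -> str:
    """Map a count of 'yes' responses to the review's quality bands."""
    if yes_count >= 6:
        return "high"
    if yes_count >= 4:
        return "moderate"
    return "low"


def quality_rating(responses: list[str]) -> str:
    """Rate a study from its 8 item responses.

    'yes/no' marks an item rated both ways (hypothetical encoding); any other
    value, e.g. 'no' or 'unclear', is treated as not 'yes'.
    """
    if len(responses) != 8:
        raise ValueError("expected 8 item responses")
    min_yes = responses.count("yes")
    max_yes = min_yes + responses.count("yes/no")
    low_band, high_band = quality_band(min_yes), quality_band(max_yes)
    return high_band if low_band == high_band else f"{high_band}/{low_band}"


# Hypothetical study: 5 clear 'yes' responses, with the response-rate item
# rated 'yes' for the full sample but 'no' for a health-state subgroup.
study = ["yes", "yes", "yes/no", "yes", "no", "yes", "no", "yes"]
print(quality_rating(study))  # high/moderate
```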
The review sought to identify HSUVs in PWH to inform cost-effectiveness models of treatments in various reimbursement settings. For pragmatic reasons, the authors agreed to use recommendations from NICE [20] and the ISPOR Task Force [16] for the grading of studies. The grading method reflected the approach reported by Cooper 2020 [21] (Supplementary File 1). Items assessed included:

- the methodological quality of the study;
- the representativeness of the study population and/or health states;
- the appropriateness of the HSU data for cost-effectiveness modelling.

Studies were classified according to whether they met all NICE criteria with no concerns (Grade 1), met most but not all criteria with some concerns (Grade 2), or did not meet the criteria (Grade 3) [20]. NICE’s perspective is for England and Wales, with an acceptable broader remit of the UK. NICE’s perspective was used to represent a single HTA agency’s perspective for grading HSUVs for use in decision-analytic models. Representing the perspectives of all HTA agencies internationally would be a substantial task [22, 23]. Preliminary independent quality assessment and grading were completed by two members of the review team. Revised criteria were agreed following discussion and input from an expert health economist. Subsequently, quality assessment and grading were undertaken by one reviewer and checked by a second reviewer.
Data synthesis
Due to the methodological and clinical heterogeneity of included studies, it was not appropriate to undertake a meta-analysis. Available data are presented in narrative and tabular summaries.
Discussion
This systematic literature review highlights that there is an extensive catalogue of HSUVs derived from studies in PWH. Identified studies had a wide geographic scope and included cross-sectional and longitudinal data. The EQ-5D-3L and EQ-5D-5L were reported more frequently than direct measures. This aligns with the frequent preference of HTA agencies for indirect measures [11]. The suitability of HSUV estimates for use in any given decision-analytic economic model depends on a range of factors. These include what the model is intended to represent and whom the evidence is intended to inform for a given decision problem [16]. Considerations include, for example:

- the intended population/study sample;
- the specific health conditions and states across a pre-specified disease and care pathway;
- the extent to which a given modeller and decision-making body is willing to trade off bias and validity for more information/evidence to inform the decision problem.

Most included studies were regarded as high quality based on ISPOR Task Force guidance [16]. However, the NICE-based grading criteria suggested that most were not suitable for NICE [20]. In many cases, this was because the HSUV did not represent the UK population, the predominant jurisdiction of NICE. The same HSUVs could nonetheless be appropriate for other HTA agencies. It was not possible to provide such grading for all HTA agencies internationally, given their differing jurisdictions, scopes, and preferences [22, 23].
The outputs from any decision-analytic model depend on its inputs and imposed model structure. No one-size-fits-all conclusion can therefore be drawn that any identified HSUV is suitable for every decision-analytic model. The range of HSUVs and complementary information in this compendium is intended to support a well-informed choice of HSUV based on its origins, strengths, and limitations.
Cross-sectional studies predominated. This may present challenges, because cross-sectional data provide only a snapshot of an individual’s HRQoL at a specific timepoint and thus a static perspective. This limits inferences about the dynamics or causality of HSUVs. Across most measures, PWH subgroups with suppressed VL reported higher HSUVs than those in the viraemic or AIDS health states. This finding is consistent with an earlier meta-analysis. After adjusting for differences in study characteristics, it showed utility decrements versus PWH with suppressed VL of 0.017 for PWH who are viraemic and 0.173 for PWH with AIDS [14]. Additionally, we found higher HSUV estimates across all measures among PWH who had received fewer ART lines, had no comorbidities, or had higher CD4 counts or lower VL. However, HSUV estimates varied across studies reporting on PWH with similar health states. This may be explained by differences in the participants enrolled and/or cultural and societal settings, among other measured (and unmeasured) factors reported and explored within the relevant studies.
Overall, the longitudinal analyses indicated an improving trend in HRQoL over time for PWH who received ART, whether they were ART-naïve or ART-experienced. ART-experienced PWH, however, reported lower baseline HSUV estimates. Notably, heavily treatment-experienced individuals who received active treatment demonstrated significant improvement in HRQoL versus placebo-treated individuals. Comparable findings were seen in both primary studies and secondary analyses from randomised and non-randomised studies. RCTs may be a good source of causal evidence, but this applies specifically to differences between randomised treatment arms. It also assumes that other biases (e.g. information bias due to missing data) have been appropriately controlled. Likewise, any causal estimates from non-randomised studies depend on the hypothesised causal pathways (e.g. as depicted within directed acyclic graphs). They also depend on appropriate analyses to account for pertinent forms of bias (e.g. confounding and selection bias), aligned with the data-generating mechanism (e.g. study design/data collection). The causality of such estimates should therefore be judged on the nature, conduct, and analysis of each study, which was not fully explored or reported in this systematic review. In other disease areas, registries linked with electronic health records are a good source of longitudinal data. However, the electronic health records of PWH require greater caution in handling, because of potential concerns around the confidentiality of patients’ HIV status [78]. This approach may therefore not be suitable for HIV registries. As most identified studies were cross-sectional, future research should undertake further longitudinal analyses. These would help elucidate the impact of ART and HIV on individuals’ quality of life and how HSUVs in PWH change over time.
As already stated, few studies provided HSUVs in line with NICE’s recommendations for informing cost-effectiveness models [16]. One major reason was that the study populations of PWH in this review were largely unrepresentative of the UK population. A considerable number of studies were conducted in countries that could be considered culturally, and possibly economically, different from the UK. Regional cultural and economic considerations have been shown to potentially influence country-specific value sets [79]. This helps explain why HTA agencies prefer their own country-specific value set. It also explains why many utility values were rated Grade 3 in this analysis, as the NICE-preferred value set was not used. This emphasises the need to select value sets appropriate to specific HTA agencies. Future research should examine two questions. The first is the extent to which regional and cultural factors are fully represented within country-specific value sets. The second is how they influence estimated HSUVs, through factors either internal or external to the preference-based measure.
Additionally, utility measures and/or preference weights were not always those endorsed by NICE [20]. In some instances, dual ratings were assigned for study quality and appropriateness of HSUV data. Compared with a method of upgrading or downgrading decisions, this approach provides transparency and demonstrates rigour in the assessment process. Analysts can undertake subsequent grading using adapted criteria relevant to a chosen reimbursement agency to identify the most appropriate HSUVs for their purposes.
Extensive heterogeneity was noted in the cross-sectional data. Clearly recognisable sources of heterogeneity in the available evidence include the choice of measure and value set used [80]. The utilities of health states have been shown to vary considerably between countries. An analysis of six EQ-5D-5L value sets demonstrated a median difference of 0.315 in health state values between the countries with the highest and lowest index values [81]. The median difference of 0.315 corresponds to approximately one-third of the distance between 0 (dead) and 1 (perfect health). Differences were also seen when analysing changes from one health state to another, with some countries valuing the change in opposite directions. Consider, for example, a change from level 4 (severe problems) in all five dimensions of the EQ-5D-5L to level 5 (extreme problems) in three dimensions and level 1 (no problems) in two dimensions. This change was valued as an improvement in the Netherlands but as a deterioration in Uruguay [81]. Such divergent interpretations of health state changes highlight the importance of choosing an appropriate value set to avoid inappropriate HTA decisions. Less recognisable heterogeneity may have been due to participant selection and confounding biases. As a result, reported estimates, which are primarily descriptive, may have limited comparability. It also remains unclear whether calibrating available estimates to approximate improvements or decrements in HRQoL would be a viable approach.
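The choice of value set alone can reverse the direction of a health state change. To illustrate this, consider a simplified additive EQ-5D-5L scoring model in which the index is 1 minus the sum of level-specific decrements across the five dimensions. The two value sets below are hypothetical and are not the published Dutch or Uruguayan value sets. They use the same decrements for every dimension, whereas real value sets differ in functional form and typically use dimension-specific coefficients. The sketch shows only the mechanism by which the same pair of profiles can be ranked differently.

```python
DIMENSIONS = ("mobility", "self-care", "usual activities",
              "pain/discomfort", "anxiety/depression")

# Hypothetical decrements by level (index 0 = level 1, ..., index 4 = level 5),
# applied identically to all five dimensions for simplicity.
VALUE_SET_A = [0.00, 0.03, 0.06, 0.15, 0.18]  # small step from level 4 to 5
VALUE_SET_B = [0.00, 0.05, 0.10, 0.12, 0.35]  # large step from level 4 to 5


def index_value(profile: str, decrements: list[float]) -> float:
    """Additive index: 1 minus the decrement for each dimension's level (1-5)."""
    if len(profile) != len(DIMENSIONS) or any(ch not in "12345" for ch in profile):
        raise ValueError("profile must be five digits between 1 and 5, e.g. '44444'")
    return 1.0 - sum(decrements[int(level) - 1] for level in profile)


before, after = "44444", "55511"  # three dimensions worsen to 5, two improve to 1
for name, value_set in (("A", VALUE_SET_A), ("B", VALUE_SET_B)):
    change = index_value(after, value_set) - index_value(before, value_set)
    direction = "improvement" if change > 0 else "deterioration"
    print(name, round(index_value(before, value_set), 2),
          round(index_value(after, value_set), 2), direction)
# A 0.25 0.46 improvement
# B 0.4 -0.05 deterioration
```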
Managing heterogeneity in economic models can be challenging. In standard cohort Markov models, heterogeneity is difficult to account for other than through stratification. Patient-level models are better able to account for heterogeneity. However, they are better served by predictive functions that estimate HSUs from baseline patient characteristics than by the sample-average HSUVs identified by our review. Such predictive functions (for example, those associated with ‘utility mapping functions’) have been developed and integrated within cost-effectiveness models. This nonetheless remains an area for further research.
Current evidence also suggests that women, ageing populations, and people with comorbidities are frequently under-represented in clinical trials of ARTs in PWH [82, 83]. This is likely to be the case for studies of newer ARTs as well. Capturing the quality-of-life experiences of these subgroups therefore needs careful consideration in CEAs to address equity concerns. The development of standardised datasets for these population groups would benefit future research.
As indicated by our findings, the landscape of available estimates is broad and diverse and provides a spectrum of HSUVs for potential use. Future studies could report treatment regimens, preference-weighting methods, and the clinical stage of HIV more clearly. This would help researchers and decision makers interpret and select the best HSUVs for cost-effectiveness models. Future research should also focus on how HSUVs may be adapted or adjusted to fit a given HTA agency’s criteria. This would ensure that studies can provide relevant information even when the most appropriate PBM or value set has not been used.
Limitations
The scope of this review was extensive, allowing a broad range of evidence to be assessed. However, the substantial heterogeneity identified meant that a meta-analysis of the data, even using a random-effects model, was not appropriate. This restricted further analysis of reported HSUVs. Furthermore, we did not explore which value set is best to use in the absence of country-specific preferences.

Vignette and proxy studies, which could provide different or alternative HSUV estimates, were not included in the search terms, so some studies may have been overlooked. However, HTA agencies internationally that require HSUVs for use in cost-effectiveness models generally do not prefer vignette and proxy approaches [20, 23]. This exclusion is therefore unlikely to have had a substantial impact on our results.

Findings indicated variations in methodologies and populations, so caution should be exercised when using these findings for decision-analytic economic modelling. Many studies reported health states, such as treatment-naïve status or HIV stage, in a limited and unclear way, so authors’ descriptions were accepted. Studies reporting HSUVs should therefore provide more objective classification evidence to ensure that reported utilities reflect the relevant health state. Additionally, few studies reported on specific HIV populations, so we did not conduct subgroup analyses by these PWH populations.

The date limit for searching was applied to capture newer ARTs and to observe changes in HRQoL and mortality outcomes over time. It may, however, have missed studies with relevant health states not related to a specific treatment regimen. Included papers may also not have captured all available evidence, especially for new treatments that may not yet be widely used in clinical settings. Results from studies published nearer the start of the search period should be interpreted with caution, as these datasets may no longer represent the current state of HIV management. Lastly, the extent and nature of any publication bias remain unknown, as a formal assessment of this type of bias was not feasible.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.