Risk Prediction Models for Mortality in Ambulatory Patients With Heart Failure
A Systematic Review
Abstract
Background—Optimal management of heart failure requires accurate assessment of prognosis. Many prognostic models are available. Our objective was to identify studies that evaluate the use of risk prediction models for mortality in ambulatory patients with heart failure and describe their performance and clinical applicability.
Methods and Results—We searched for studies in Medline, Embase, and CINAHL in May 2012. Two reviewers selected citations including patients with heart failure and reporting on model performance in derivation or validation cohorts. We abstracted data related to population, outcomes, study quality, model discrimination, and calibration. Of the 9952 studies reviewed, we included 34 studies testing 20 models. Only 5 models were validated in independent cohorts: the Heart Failure Survival Score, the Seattle Heart Failure Model, the PACE (incorporating peripheral vascular disease, age, creatinine, and ejection fraction) risk score, a model by Frankenstein et al, and the SHOCKED predictors. The Heart Failure Survival Score was validated in 8 cohorts (2240 patients), showing poor-to-modest discrimination (c-statistic, 0.56–0.79) that was lower in more recent cohorts. The Seattle Heart Failure Model was validated in 14 cohorts (16 057 patients), showing poor-to-acceptable discrimination (0.63–0.81) that remained relatively stable over time. Both models reported adequate calibration, although they overestimated survival in specific populations. The other 3 models were each validated in a single cohort, reporting poor-to-modest discrimination (0.66–0.74). Among the remaining 15 models, 6 were validated by bootstrapping (c-statistic, 0.74–0.85); the rest were not validated.
Conclusions—Externally validated heart failure models showed inconsistent performance. The Heart Failure Survival Score and Seattle Heart Failure Model demonstrated modest discrimination and questionable calibration. A new model derived from contemporary patient cohorts may be required for improved prognostic performance.
Introduction
Heart failure (HF) is a frequent health problem with high morbidity and mortality, increasing prevalence and escalating healthcare costs.1,2 Older patient age, multiple comorbidities, and different patterns of disease progression create important challenges in patient management. Because the impact of these factors and their interactions remain incompletely understood, predicting patients’ clinical course is difficult.
Editorial see p 877
Clinical Perspective on p 889
Accurate estimation of prognosis is important for many reasons. Patients are concerned about their probability of future events. Physicians may use prognosis estimates to decide the appropriate type and timing of additional tests or therapies, including heart transplantation and mechanical circulatory support. Accurate prognostic assessment may prevent delays in appropriate treatment of high-risk patients or overtreatment of low-risk patients. Knowledge of prognosis also facilitates research, for instance in the design of randomized trials and the exploration of subgroup effects.
To be usefully applied, prognostic models must be accurate and generalizable. Models may be inaccurate because of omission of important predictors, derivation from unrepresentative cohorts, overfitting, or violation of model assumptions.
In the past 3 decades, investigators have developed many models to predict adverse outcomes in patients with HF.3,4 Clinicians and researchers wishing to use prognostic models would benefit from knowledge of their characteristics and performance. Therefore, we performed a systematic review to identify studies evaluating the use of risk prediction models for mortality in ambulatory patients with HF and to describe their performance and their clinical applicability.
Methods
Data Sources and Searches
In May 2012, with the assistance of an experienced research librarian, we performed a systematic search of electronic databases, including Medline, Embase, and CINAHL. We used several related terms: (internal cardiac defibrillator [ICD]), (heart or cardiac), (mortality or survival), and (multivariate analysis or regression analysis or risk factor or prediction or prognostic factor). The full search strategy is outlined in Appendix A in the online-only Data Supplement (Methods in the online-only Data Supplement). We identified additional studies by searching bibliographic references of included publications.
Study Selection
Eligible articles enrolled adults (>19 years) who were ambulatory patients with HF; used multivariable analysis (≥2 independent variables) to predict mortality or a composite outcome including mortality; reported >30 deaths; reported results as a score, a prediction rule, or a set of regression coefficients sufficient to make predictions for individual patients; and reported a measure of discrimination or calibration. We also included studies that evaluated the performance of an existing score in a population different from the one in which it was developed and that reported model discrimination and calibration. There were no restrictions on study design, left ventricular ejection fraction (LVEF), language, or date of publication. We excluded studies that enrolled patients during hospital admission and duplicate studies providing no new relevant data.
Two reviewers independently screened titles and abstracts, and then evaluated full-text versions of all articles deemed potentially relevant by either reviewer. During full-text screening, in cases of disagreement, consensus was reached through discussion. If consensus could not be reached, a third reviewer resolved the issue. Agreement between reviewers was assessed using weighted κ (0.92). Appendix B in the online-only Data Supplement (Methods in the online-only Data Supplement) shows the eligibility form.
Data Extraction
From each study, we abstracted data related to eligibility criteria, data source, time frame of recruitment, and characteristics of the population, including age, sex, ischemic cardiomyopathy, LVEF, use of β-blockers and ICDs, and the definition and number of events. We also identified variables included in the prediction models.
Assessment of Study Quality, Model Adequacy, and Performance
The assessment of study quality and model performance was based on what authors reported in their published articles. The selection of items for the assessment of study quality, model adequacy, and performance was based on the criteria proposed by Concato et al5 and Moons et al.6 Items included whether patient selection was consecutive, whether the data were collected prospectively, whether the percentage of missing data was small (<5%) and missing data were correctly managed (ie, using data imputation), whether losses to follow-up were infrequent (<1%), and whether predictors were coded clearly.
To assess model adequacy, we abstracted information related to model derivation, including selection of the variables, coding, linearity of the response for continuous variables, overfitting,7 and model assumptions. To assess model performance, we abstracted data related to discrimination and calibration. Discrimination expresses the extent to which the model is capable of differentiating patients who had events from those who did not. It is commonly assessed using the c-statistic, which is equivalent to the area under the receiver-operating characteristic curve.8 Model discrimination was deemed poor if the c-statistic was between 0.50 and 0.70, modest if between 0.70 and 0.80, and acceptable if >0.80.9 To assess how changes in HF treatment might modify model performance, we graphically evaluated the impact of β-blocker use, ICD use, and study recruitment date on the discrimination of models tested in >1 external cohort.
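The c-statistic just described can be computed directly as the proportion of concordant event/non-event pairs. The following is an illustrative pure-Python sketch, not code from the review; the example data are hypothetical.

```python
from itertools import combinations

def c_statistic(risks, events):
    """Concordance (c-statistic): among all pairs of one patient who had
    the event and one who did not, the fraction in which the patient with
    the event received the higher predicted risk (ties count 0.5)."""
    concordant = ties = pairs = 0
    for i, j in combinations(range(len(risks)), 2):
        if events[i] == events[j]:
            continue  # only event/non-event pairs are informative
        pairs += 1
        hi, lo = (i, j) if events[i] else (j, i)  # hi had the event
        if risks[hi] > risks[lo]:
            concordant += 1
        elif risks[hi] == risks[lo]:
            ties += 1
    return (concordant + 0.5 * ties) / pairs

# A model that ranks every decedent above every survivor gives c = 1.0;
# uninformative (all-equal) predictions give c = 0.5.
print(c_statistic([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))  # → 1.0
```

This pairwise definition makes clear why the c-statistic measures ranking only: rescaling all predicted risks leaves it unchanged, which is why the review also examines calibration.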
The calibration and goodness-of-fit of a model describe how close the values predicted by the model are to the observed values. We identified the method used to assess model calibration (ie, Hosmer–Lemeshow test or deviance, Cox–Snell analysis, or correlation between observed and predicted events) and the estimate of performance.
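A Hosmer–Lemeshow-style check of the kind referenced above compares observed and expected event counts within strata of predicted risk. A minimal illustrative sketch (the grouping scheme and data are assumptions, not taken from any of the reviewed studies):

```python
def hosmer_lemeshow(pred, obs, groups=10):
    """Hosmer–Lemeshow-style statistic: sort patients by predicted risk,
    split them into strata, and compare observed vs expected event counts.
    The result is compared against a chi-square with (groups - 2) df."""
    data = sorted(zip(pred, obs))
    n = len(data)
    chi2 = 0.0
    for g in range(groups):
        stratum = data[g * n // groups:(g + 1) * n // groups]
        if not stratum:
            continue
        expected = sum(p for p, _ in stratum)  # sum of predicted risks
        observed = sum(o for _, o in stratum)  # count of actual events
        size = len(stratum)
        if 0 < expected < size:  # guard against degenerate strata
            chi2 += (observed - expected) ** 2 / expected
            chi2 += ((size - observed) - (size - expected)) ** 2 / (size - expected)
    return chi2

# Predictions that match observed event rates within each stratum give 0.
print(hosmer_lemeshow([0.25] * 8 + [0.75] * 8,
                      [0] * 6 + [1] * 2 + [1] * 6 + [0] * 2,
                      groups=2))  # → 0.0
```

A large statistic signals miscalibration even when discrimination is good, which is the point made later in the review about reporting both measures.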
Table I in the online-only Data Supplement explains the criteria used to assess model adequacy and performance in more detail. Items that were not relevant (eg, in studies validating a preexisting model) were coded as nonapplicable.
Data Synthesis
We summarized the data, focusing on the characteristics of the populations from which models were derived and validated, and on the models’ performance. We report findings in 2 sections according to external validation (models that were or were not validated in an independent cohort were summarized separately).
Results
After duplicate citations were removed, we screened 6917 citations and ultimately selected 34 studies evaluating 20 prediction models (Figure 1). Only 5 of these models10–14 were validated in an independent cohort. Among the remaining 15 models, 6 were internally validated by bootstrapping; the rest were not validated.
Figure 1. Study selection process, showing the number of studies at each stage of selection.
Prediction Models Validated in an Independent Cohort
The Heart Failure Survival Score (HFSS),10 the Seattle Heart Failure Model (SHFM),11 the model proposed by Frankenstein et al,12 the PACE risk score,13 and the SHOCKED predictors14 were validated in cohorts of patients with HF different from their derivation cohorts. Tables II and III in the online-only Data Supplement and the Table summarize the characteristics of the included studies, the assessment of study quality, and the model characteristics, respectively.
Model Derivation and Performance
Heart Failure Survival Score
The HFSS includes 7 variables to predict a composite outcome of death, urgent (UNOS [United Network for Organ Sharing] status 1) heart transplantation, and ventricular assist device implantation. Two predictors are binary, ischemic cardiomyopathy and presence of intraventricular conduction delay (QRS >120 ms), and 5 are continuous: LVEF, resting heart rate, mean blood pressure, peak oxygen consumption, and serum sodium. Scores are then divided into 3 categories, high risk, medium risk, and low risk, according to prespecified thresholds.10 The HFSS was derived from a single-center cohort including 268 patients with HF and has been validated in 8 independent single-center cohorts including a total of 2240 patients with HF.10,14–19
The validation cohorts involve a broad variety of patient populations (Table II in the online-only Data Supplement), with a mean age from 51 to 70 years, mostly males (65%–82%) with a mean LVEF between 20% and 30%. In 3 cohorts, the frequency of use of β-blockers was <30% and in the remaining 4 cohorts was 64% to 80%. In 4 studies reporting ICD status, the frequency of ICD use was 11%, 19%, 49%, and 78%.
Model discrimination (assessed by the c-statistic at 1 year) in validation cohorts ranged from poor to modest (0.56–0.79), being modest (between 0.70 and 0.79) in 6 (75%) of the 8 validation cohorts. As shown in Figure 2, model discrimination was worse in cohorts with more frequent use of β-blockers or ICDs and in more recent studies. Discrimination was poor (c-statistic, <0.70) in validation cohorts in which the rate of ICD use was >40%, in studies with a contemporary recruitment date, and in 3 of 4 cohorts in which the use of β-blockers was >60%. The study by Zugck et al15 reported substantially higher discrimination (c-statistic, 0.84 at 1 year) when peak oxygen consumption was replaced by the 6-minute walk test. However, this HFSS variant has not been further validated. Only 1 study18 assessed HFSS model calibration, reporting that the model overestimated event-free survival by ≈20% in low-risk patients.
Figure 2. Model discrimination according to the use of β-blockers (A), internal cardiac defibrillator (ICD; B), and study recruitment date (C). HFSS indicates Heart Failure Survival Score; SHFM, Seattle Heart Failure Model.
Seattle Heart Failure Model
The SHFM includes 10 continuous variables (age, LVEF, New York Heart Association class, systolic blood pressure, diuretic dose adjusted by weight, lymphocyte count, hemoglobin, serum sodium, total cholesterol, and uric acid) and 10 categorical variables (sex, ischemic cardiomyopathy, QRS >120 ms, use of β-blockers, angiotensin-converting enzyme inhibitors, angiotensin receptor blockers, potassium-sparing diuretics, statins, allopurinol, and ICD/cardiac resynchronization therapy [CRT] status) in an equation that provides a continuous risk score for each patient, which can be expressed as predicted mean life expectancy or event-free survival at 1, 2, and 5 years.11 The model was developed to predict a composite outcome of death, urgent heart transplantation, or ventricular assist device implantation in 1125 patients with HF enrolled in the Prospective Randomized Amlodipine Survival Evaluation randomized controlled trial. The SHFM has been validated in 14 independent cohorts including 16 057 patients with HF (4 cohorts including 8983 patients with HF were selected from randomized controlled trials [Table II in the online-only Data Supplement]).11,18,22–28 The validation cohorts involve diverse populations with a mean age from 52 to 77 years, a higher proportion of males (61%–82%), and a mean LVEF between 17% and 45%. In 4 cohorts, the use of β-blockers was 20% to 35%; in the remaining cohorts it was >60% (maximum, 92%). Among the 10 studies reporting ICD status, the use of ICDs was <25% in 5 cohorts and >65% in 3 cohorts.
Model discrimination varied from poor to acceptable (0.63–0.81), being at least modest (>0.70) in 7 (50%) of the 14 validation cohorts. There was a slight trend toward poorer discrimination in cohorts with higher use of ICD devices, but discrimination was only weakly related to β-blocker use and recruitment date (Figure 2). Some studies18,22,25 have analyzed variations of the SHFM including other predictors, such as renal function, diabetes mellitus, peak oxygen consumption, and brain natriuretic peptide, and reported that discrimination did not improve significantly. However, May et al22 reported that discrimination improved significantly, from 0.72 to 0.78, when brain natriuretic peptide was added to the model. Model calibration was evaluated in most of the cohorts (Table) and showed a high correlation (r-coefficient >0.97) between observed and predicted survival. In 3 cohorts, calibration was assessed graphically by comparing observed and predicted event-free survival17,22,24; the model overestimated event-free survival by ≈2% at 1 year and 10% at 5 years, more markedly in black patients and in patients with ICD/CRT.22 The study by Kalogeropoulos et al24 reported inadequate model goodness-of-fit as assessed by the Hosmer–Lemeshow test.
Frankenstein et al’s Model
This model includes 2 binary variables, brain natriuretic peptide and the 6-minute walk test, with different cutoffs depending on sex and use of β-blockers.12 Patients are then categorized into 3 groups (scores 0, 1, or 2). The model was derived from 636 patients with HF to predict all-cause mortality and validated in an independent cohort of 676 patients with HF (mean age, 74 years; 76% male; 63% ischemic cardiomyopathy; 54% treated with β-blockers). Model discrimination in the validation cohort was poor, varying from 0.66 to 0.68 (Table). Model calibration was not reported.
PACE Risk Score
This model includes 4 binary variables, the presence of peripheral vascular disease, age >70 years, creatinine >2 mg/dL, and LVEF <20%, and provides a risk score for an individual patient from 0 to 5.13 The model was derived from 905 secondary and primary prevention patients with an ICD to predict all-cause mortality and validated in an independent cohort of 1812 patients with HF and an ICD (mean age, 64 years; 77% male; mean LVEF of 31%; and 58% with ischemic cardiomyopathy [Table II in the online-only Data Supplement]). Model discrimination in the validation cohort was poor, with a c-statistic of 0.69 at 1 year (Table). Model calibration was not reported.
SHOCKED Predictors
This model includes 7 binary variables: age >75 years, New York Heart Association class >II, atrial fibrillation, chronic obstructive pulmonary disease, chronic kidney disease, LVEF <20%, and diabetes mellitus.14 It provides a risk score from 0 to 400 and estimates 1-, 2-, 3-, and 4-year survival using a nomogram. The model was derived and validated in a cohort of Medicare beneficiaries receiving primary prevention ICDs. The validation cohort included 27 893 patients (39% of patients were >75 years, 75% male, 31% had LVEF <20%, and 63% had ischemic cardiomyopathy [Table II in the online-only Data Supplement]). Model discrimination in the validation cohort was modest, with a c-statistic of 0.74 at 1 year (Table). The overall correlation between observed and predicted survival was high (r-coefficient >0.89). However, model calibration, assessed by the Hosmer–Lemeshow test, showed inadequate goodness-of-fit at 2 and 3 years.
Prediction Models Not Validated in an Independent Cohort
We identified 15 prediction models that were not validated in an external cohort. Tables IV, V, and VI in the online-only Data Supplement summarize the characteristics of the included studies, the assessment of study quality, and the model characteristics, respectively. These models include a wide variety of predictors tested in diverse HF populations. The number of predictors included ranged from 2 to 21. Seven models were derived from patients with reduced LVEF and 1 from patients with preserved LVEF. The remaining studies included patients with clinically diagnosed HF without considering a specific LVEF cutoff as an inclusion criterion. In the 6 studies internally validated by bootstrapping, model discrimination ranged from 0.74 to 0.85. The best discrimination (c-statistic, 0.85) was observed for the DSC (Dyssynchrony, posterolateral Scar location, and Creatinine) index, a model derived from a selective cohort of patients with HF undergoing CRT implantation, which included some variables that are not routinely available: 1 binary variable, posterolateral scar location evaluated by cardiovascular magnetic resonance, and 2 continuous variables, tissue synchronization index measured by cardiovascular magnetic resonance and serum creatinine. The 5 studies that evaluated model calibration reported adequate performance.
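Internal validation by bootstrapping, as used in 6 of these models, typically estimates the "optimism" of apparent performance by refitting the model on resampled data. The sketch below illustrates the general idea with a deliberately trivial stand-in model (ranking on a single predictor); the reviewed studies fit multivariable regressions, so this is an assumption-laden illustration, not any study's actual procedure.

```python
import random

def c_index(risks, events):
    """Pairwise concordance between predicted risk and observed events."""
    pairs = conc = ties = 0
    for i in range(len(risks)):
        for j in range(i + 1, len(risks)):
            if events[i] == events[j]:
                continue
            pairs += 1
            hi, lo = (i, j) if events[i] else (j, i)
            conc += risks[hi] > risks[lo]
            ties += risks[hi] == risks[lo]
    return (conc + 0.5 * ties) / pairs

def fit(xs, ys):
    """Trivial stand-in 'model': score by the predictor, oriented so that
    the event group has the higher mean. Real studies fit regressions."""
    mean_event = sum(x for x, y in zip(xs, ys) if y) / max(sum(ys), 1)
    mean_non = sum(x for x, y in zip(xs, ys) if not y) / max(len(ys) - sum(ys), 1)
    sign = 1 if mean_event >= mean_non else -1
    return lambda x: sign * x

def bootstrap_corrected_c(xs, ys, reps=200, seed=1):
    """Harrell-style optimism correction: refit on each bootstrap sample,
    average the drop from bootstrap-sample to original-sample performance,
    and subtract that optimism from the apparent c-statistic."""
    rng = random.Random(seed)
    apparent = c_index([fit(xs, ys)(x) for x in xs], ys)
    optimism = 0.0
    for _ in range(reps):
        idx = [rng.randrange(len(xs)) for _ in xs]
        bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        if len(set(by)) < 2:
            continue  # a resample with one outcome class is uninformative
        model = fit(bx, by)
        optimism += (c_index([model(x) for x in bx], by)
                     - c_index([model(x) for x in xs], ys)) / reps
    return apparent - optimism
```

With a perfectly separating predictor the corrected value stays at 1.0; with noisy predictors the correction pulls the apparent c-statistic down, which is one reason bootstrap-validated estimates still tend to exceed external-validation results.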
Discussion
In this systematic review, we identified 20 event-free survival prediction models in ambulatory patients with HF. Only 25% (5 of 20 models) have been validated in external cohorts, and only 2 models, the HFSS and the SHFM, have been validated in >2 independent cohorts, mostly reporting poor (<0.70)-to-modest (0.70–0.80) discrimination. Studies using the HFSS more frequently reported modest (>0.70) discrimination than cohorts evaluating the SHFM. However, HFSS performance declined over time, whereas SHFM performance remained relatively stable. Nonetheless, only 2 studies18,20 have directly compared the models within the same population, reporting similar discrimination (c-statistics at 1 year of 0.73 and 0.72 for the SHFM and 0.68 and 0.63 for the HFSS).
Model discrimination represents the capacity of the model to differentiate patients who had the event from those who did not. The study by Goda et al20 reported that discrimination was significantly higher (from 0.72–0.73 to 0.77 at 1 year) when the HFSS and SHFM were combined within the same model. May et al22 reported that the discrimination of the SHFM improved significantly, from 0.72 to 0.78, when brain natriuretic peptide was added to the model. As proposed by D’Agostino and Nam,9 a model with discriminative capacity >0.70 has acceptable discrimination, and a discriminative capacity >0.80 provides strong support to guide medical decision-making. By these standards, the HFSS and SHFM have consistently demonstrated only modest discriminative capacity.
One potential reason for suboptimal performance is that the management and treatment of patients with HF has changed substantially in the past 2 decades. These models were derived from cohorts of patients recruited ≈20 years ago (1986–1991 for the HFSS and 1992–1994 for the SHFM).
As proposed by Moons et al,6 a good model should include variables that are believed to be associated with the outcome of interest. Koelling et al16 evaluated the association of the 7 predictors included in the HFSS model in patients treated with β-blockers and reported that only peak oxygen consumption and LVEF were factors independently associated with event-free survival. In addition, the directions of association of some predictors are opposite in the validation and derivation cohorts. For instance, the HFSS derivation study reported that the hazard ratio for 1 beat per minute increase in heart rate was 1.02 (95% confidence interval of 1.01–1.04), while in 2 validation cohorts16,20 including a high proportion of patients treated with β-blockers (>70%), the hazard ratio was 0.98 (95% confidence interval, 0.97–1.01). This may partially explain the decline observed in the HFSS discriminatory capacity in more recent validation cohorts.
A similar situation is found with potassium-sparing diuretic use in the SHFM. Levy et al11 incorporated into the calculation of the score a hazard ratio of 0.74 for patients on potassium-sparing diuretics. Goda et al20 reported a nonsignificant reverse effect of spironolactone in a contemporary cohort (hazard ratio, 1.20; 95% confidence interval, 0.86–1.48). Importantly, this indicates that predictors that were believed or found to be associated with mortality in patients with HF 20 years ago may not act similarly in contemporary patients with HF. This supports the need to develop and test an up-to-date prediction model.
Discrimination should not be reported in isolation because a poorly calibrated model can have the same discriminative capacity as a perfectly calibrated model.29 One limitation of calibration is that assessment techniques do not allow for comparisons between models. In the validation cohorts, both the SHFM and the HFSS showed inadequate calibration attributable to the model overestimating survival in some groups of patients, including low-risk patients, blacks, and patients with ICD/CRT therapy.
Model ability to predict survival has not been compared with intuitive predictions of physicians. A study by Muntwyler et al30 showed that primary care physicians overestimated mortality risk in patients with HF (1-year observed mortality of 13% versus physician estimate of 26%); this was more pronounced in stable New York Heart Association class II patients (1-year observed mortality of 6% versus physician estimate of 18%).
Whether these models may be used to guide or improve clinical practice remains underexplored. Vickers et al29 have proposed the use of simple decision analytic techniques to compare prediction models in terms of their consequences. These techniques weight true and false-positive errors differently, to reflect the impact of decision consequences (ie, risks associated with heart transplantation or ventricular assist device versus risks associated with continuing medical therapy). Such decision analytic techniques may assist in determining whether clinical implementation of prediction models would do more good or more harm relative to current practice (physicians’ predictions).
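The decision analytic approach of Vickers et al described above is often operationalized as the net benefit statistic, which credits true positives and penalizes false positives by the odds of the chosen risk threshold. A hedged sketch follows; the data and threshold are hypothetical, not drawn from the reviewed studies.

```python
def net_benefit(risks, events, threshold):
    """Net benefit at a risk threshold (decision curve analysis): true
    positives are credited and false positives are penalized by the odds
    of the threshold, so the weighting reflects how a clinician trades
    the harm of overtreatment against the benefit of treating a true case."""
    n = len(risks)
    tp = sum(1 for r, e in zip(risks, events) if r >= threshold and e)
    fp = sum(1 for r, e in zip(risks, events) if r >= threshold and not e)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(events, threshold):
    """Comparator strategy that treats everyone regardless of risk."""
    prevalence = sum(events) / len(events)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# "Treat none" has net benefit 0 at every threshold; a useful model should
# exceed both comparators across clinically relevant thresholds.
risks, events = [0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0]
print(net_benefit(risks, events, 0.5))     # → 0.5
print(net_benefit_treat_all(events, 0.5))  # → 0.0
```

Plotting net benefit across a range of thresholds (a decision curve) is what allows one to judge whether acting on a model would do more good than harm relative to treating all or none.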
Should use and validation of these models continue? Or should we seek better models? There is no consensus on this issue among commentators. Researchers are pursuing both avenues, validating and supporting the use of the SHFM and HFSS as well as developing new models.
The performance of more recently developed models, however, provides no evidence that they will perform substantially better than older models. The 3 externally validated, recently published models12–14 demonstrated poor-to-modest discrimination (0.66–0.74). Similarly, the 6 models validated by bootstrapping showed, in general, poor-to-modest discrimination. One of these 6 models provided high discriminatory capacity, but it was developed in a selected group of patients with HF undergoing CRT implantation and included 2 variables that are not easily measured (myocardial tissue synchronization index and scar location by cardiovascular magnetic resonance). The lack of external validation makes it difficult to assess how these models’ performance would generalize to other populations, which clearly limits their clinical use. Discrimination estimated in a first sample is often higher than in subsequent samples.31
Other reasons potentially explaining the suboptimal performance of existing models pertain to missing data and variable selection. For example, in cohorts validating the SHFM, missing data reached 100% for percentage of lymphocytes26 and 65% for uric acid.22 Whether frequently missing or not easily available variables should be used to develop a score, or should be incorporated into standard clinical practice, depends on the strength of the association between the predictors and the outcome, the degree to which model performance is compromised when the variables are excluded from the final score, and clinical resources. Nonetheless, adequate methods to deal with missing data, such as multiple imputation techniques, are important when evaluating model performance. The exclusion of cases because of missing information may lead to biased results.32
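In contrast to excluding cases, imputation retains them by filling in plausible values. The sketch below shows single mean imputation only to illustrate the mechanics; the multiple imputation techniques cited in the text are preferable in practice because they also propagate the uncertainty of the imputed values. The variable name is a hypothetical example.

```python
def mean_impute(rows, col):
    """Single mean imputation: replace missing values (None) in one column
    with the mean of the observed values, so that cases with missing data
    are retained rather than excluded from the analysis."""
    observed = [r[col] for r in rows if r[col] is not None]
    mean = sum(observed) / len(observed)
    return [dict(r, **{col: mean if r[col] is None else r[col]}) for r in rows]

# Hypothetical example: a predictor (uric acid) missing in one patient.
patients = [{"uric_acid": 5.0}, {"uric_acid": None}, {"uric_acid": 7.0}]
print(mean_impute(patients, "uric_acid"))  # missing value becomes 6.0
```

Multiple imputation repeats a step like this several times with randomness added, analyzes each completed dataset, and pools the results, which keeps standard errors honest.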
Variable selection based on statistical significance may lead to suboptimal models. Other techniques, such as stability selection and subsampling, have been shown to yield more stable models based on a consistent selection of variables, decreasing the chance of type I error.33
As noted in this review, the performance of predictive models has traditionally been evaluated by the c-statistic, which has been criticized as insensitive for comparing models and as having limited direct clinical use. Reclassification tables, the reclassification calibration statistic, and the net reclassification and integrated discrimination improvements are more recently developed methods to assess discrimination, calibration, and overall model accuracy. These methods can better guide clinical decision-making by offering prognostic information at different risk strata, and their use is highly recommended during validation of existing or new models.
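The categorical net reclassification improvement mentioned here can be computed by counting patients who move between risk categories under an old and a new model. An illustrative sketch (the risk cutoffs and data are arbitrary placeholders, not validated strata):

```python
def categorical_nri(old_risks, new_risks, events, cutoffs=(0.1, 0.3)):
    """Categorical net reclassification improvement: events moving up a
    risk category and non-events moving down count as improvements;
    movements in the opposite direction count against the new model."""
    def category(r):
        return sum(r >= c for c in cutoffs)
    up_e = down_e = up_n = down_n = 0
    n_events = sum(events)
    n_non = len(events) - n_events
    for old, new, e in zip(old_risks, new_risks, events):
        move = category(new) - category(old)
        if e:
            up_e += move > 0
            down_e += move < 0
        else:
            up_n += move > 0
            down_n += move < 0
    return (up_e - down_e) / n_events + (down_n - up_n) / n_non

# One event correctly moved up and one non-event correctly moved down:
print(categorical_nri([0.05, 0.2, 0.2, 0.05],
                      [0.35, 0.2, 0.05, 0.05],
                      [1, 0, 0, 0]))  # → 1.333…
```

Unlike the c-statistic, the NRI is anchored to clinically meaningful risk strata, which is why it can show an improvement (as with adding brain natriuretic peptide) that a global rank statistic understates.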
Conclusions
Optimal management of patients with HF requires accurate assessment of prognosis; however, making accurate assessment remains challenging. Among 5 externally validated prediction models, the HFSS and SHFM models demonstrated modest discriminative capacity and questionable calibration. The clinical impact of medical decision-making guided by the use of these models has not been explored. Given the limitation of current HF models, the development of a new model derived from contemporary patient cohorts is an appealing option. However, the development and reporting of new models should be optimized by adhering to guidelines to guarantee model adequacy. In addition, new models should seek external validation of their generalizability and performance. Evaluation of the clinical impact of decisions based on models relative to current clinical practice would be enormously informative in determining their use in real-world clinical practice.
Acknowledgements
The authors thank Ani Orchanian-Cheff for her expert assistance in conducting the systematic literature search.
Sources of Funding
Dr Alba was awarded a Vanier Canada Graduate Scholarship, administered by the Canadian Institutes of Health Research, Ottawa, ON, Canada.
Disclosures
None.
Footnotes
The online-only Data Supplement is available at http://circheartfailure.ahajournals.org/lookup/suppl/doi:10.1161/CIRCHEARTFAILURE.112.000043/-/DC1.
- Received December 5, 2012.
- Accepted July 15, 2013.
- © 2013 American Heart Association, Inc.
References
1. Rosamond W, Flegal K, Furie K, Go A, Greenlund K, Haase N, Hailpern SM, Ho M, Howard V, Kissela B, Kittner S, Lloyd-Jones D, McDermott M, Meigs J, Moy C, Nichol G, O’Donnell C, Roger V, Sorlie P, Steinberger J, Thom T, Wilson M, Hong Y.
2. Bleumink GS, Knetsch AM, Sturkenboom MC, Straus SM, Hofman A, Deckers JW, Witteman JC, Stricker BH.
3.
4.
5.
6. Moons KG, Kengne AP, Woodward M, Royston P, Vergouwe Y, Altman DG, Grobbee DE.
7.
8.
9. D’Agostino RB, Nam BH. In: Balakrishnan N, Rao CR, eds.
10. Aaronson KD, Schwartz JS, Chen TM, Wong KL, Goin JE, Mancini DM.
11. Levy WC, Mozaffarian D, Linker DT, Sutradhar SC, Anker SD, Cropp AB, Anand I, Maggioni A, Burton P, Sullivan MD, Pitt B, Poole-Wilson PA, Mann DL, Packer M.
12. Frankenstein L, Goode K, Ingle L, Remppis A, Schellberg D, Nelles M, Katus HA, Clark AL, Cleland JG, Zugck C.
13.
14.
15. Zugck C, Krüger C, Kell R, Körber S, Schellberg D, Kübler W, Haass M.
16.
17.
18. Gorodeski EZ, Chu EC, Chow CH, Levy WC, Hsich E, Starling RC.
19.
20.
21.
22. May HT, Horne BD, Levy WC, Kfoury AG, Rasmusson KD, Linker DT, Mozaffarian D, Anderson JL, Renlund DG.
23.
24.
25.
26.
27. Perrotta L, Ricciardi G, Pieragnoli P, Chiostri M, Pontecorboli G, De Santo T, Bellocci F, Vitulano N, Emdin M, Mascioli G, Ricceri I, Porciani MC, Michelucci A, Padeletti L.
28. Haga K, Murray S, Reid J, Ness A, O’Donnell M, Yellowlees D, Denvir MA.
29.
30. Muntwyler J, Abetel G, Gruner C, Follath F.
31.
32.
33.
Clinical Perspective
Many models are available to predict adverse outcomes in patients with heart failure. Clinicians and researchers wishing to use prognostic models would benefit from knowledge of their characteristics and performance. Therefore, we performed a systematic review to identify studies evaluating risk prediction models for mortality in ambulatory patients with heart failure (HF) and to describe their performance and clinical applicability. This systematic review included 34 studies testing 20 models. Only 5 models were validated in an independent cohort: the Heart Failure Survival Score, the Seattle Heart Failure Model, the PACE risk score, a model by Frankenstein et al,12 and the SHOCKED predictors. The Heart Failure Survival Score, validated in 8 cohorts, showed poor-to-modest discrimination (c-statistic, 0.56–0.79), which was lower in the more recent validation studies, possibly because of greater use of β-blockers and implantable cardiac defibrillators. The Seattle Heart Failure Model was validated in 14 cohorts, showing poor-to-acceptable discrimination (0.63–0.81) that remained relatively stable over time. Both models reported adequate calibration, although they overestimated survival in some specific populations. The other 3 models were each validated in a single cohort, with poor-to-modest discrimination (0.66–0.74). No studies reported the clinical impact of medical decision-making guided by the use of these models. In conclusion, externally validated HF models showed inconsistent performance. The Heart Failure Survival Score and Seattle Heart Failure Model demonstrated modest discrimination and questionable calibration. A new model derived from contemporary patient cohorts may be required for improved prognostic performance.
Ana C. Alba, Thomas Agoritsas, Milosz Jankowski, Delphine Courvoisier, Stephen D. Walter, Gordon H. Guyatt, and Heather J. Ross. Risk Prediction Models for Mortality in Ambulatory Patients With Heart Failure: A Systematic Review. Circulation: Heart Failure. 2013;6:881–889. Originally published September 17, 2013. https://doi.org/10.1161/CIRCHEARTFAILURE.112.000043