Predictions of Hospital Mortality Rates
The Editors welcome submissions for possible publication in the Letters section. Authors of letters should:
•Include no more than 300 words of text, three authors, and five references
•Type with double-spacing
•Send three copies of the letter, an authors' form signed by all authors, and a cover letter describing any conflicts of interest related to the contents of the letter.
Letters commenting on an Annals article will be considered if they are received within 6 weeks of the time the article was published. Only some of the letters received can be published. Published letters are edited and may be shortened; tables and figures are included only selectively. Authors will be notified that the letter has been received. If the letter is selected for publication, the author will be notified about 3 weeks before the publication date. Unpublished letters cannot be returned.
Annals welcomes electronically submitted letters.
TO THE EDITOR:
Pine and colleagues [1] compared hospital mortality rates calculated by using administrative data alone; administrative plus laboratory data; and administrative, laboratory, and clinical data. The authors controlled for other risk factors and adjusted for disease severity. Although the statistical methods used are interesting, we have a few concerns about the validity and generalizability of the findings.
All of the model building and validation (predictions) were done by using the same set of data. This can substantially bias the predictions from the models (that is, make them overly optimistic) [2]. The reason is that the same data are used both to search for the best model through stepwise logistic regression and to evaluate the predictions of the model that was just built. Much work has shown that not only are the measures of predictive performance biased (here, areas under the receiver-operating characteristic [ROC] curves) but differences between model predictions can also be biased. One way around this problem is to set aside some of the data for validation only (a test data set of, for example, 10% to 20% of the original data set) and to build the models on the remaining portion (training data set). One could then simply average predictions over many random splits of the data set into test and training partitions. The data set used is large enough to accommodate this. An alternative would be to collect another data set for validation purposes only [2].
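The repeated random-split validation described above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure used by Pine and colleagues: the function names (`auc`, `fit_best_single_predictor`, `repeated_holdout_auc`) are hypothetical, and choosing the single column with the highest training AUC is a deliberately simple stand-in for stepwise logistic regression.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney pair-counting identity."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both outcome classes")
    # Each (death, survivor) pair counts 1 if ranked correctly, 0.5 if tied.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_best_single_predictor(train_rows, train_labels):
    """Hypothetical stand-in for stepwise selection: keep the single column
    with the highest training AUC and use its raw value as the risk score."""
    n_cols = len(train_rows[0])
    best = max(range(n_cols),
               key=lambda j: auc([r[j] for r in train_rows], train_labels))
    return lambda row: row[best]

def repeated_holdout_auc(rows, labels, fit, test_fraction=0.2,
                         n_splits=50, seed=0):
    """Average held-out AUC over many random training/test partitions.

    fit(train_rows, train_labels) must return a function row -> score.
    Model selection happens inside each split, on training rows only.
    """
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    n_test = max(1, round(test_fraction * len(rows)))
    aucs = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        score = fit([rows[i] for i in train], [labels[i] for i in train])
        test_labels = [labels[i] for i in test]
        if len(set(test_labels)) < 2:
            continue  # skip splits whose test set lacks one outcome class
        aucs.append(auc([score(rows[i]) for i in test], test_labels))
    return sum(aucs) / len(aucs)
```

The apparent AUC, by contrast, would come from fitting and scoring on all rows at once; comparing it with the value returned by `repeated_holdout_auc` gives a rough sense of how optimistic the apparent figure is.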
The large amount of missing data (upward of 50% for some variables) was handled by model-based imputation for continuous variables. For dichotomous variables, however, the missing cases were simply allocated to “absent” if the variable was associated with increased risk. Model-based imputation would also be more reasonable here. No adjustments were made to the estimates and their SEs to account for the heavily imputed data (although the authors did mention this in the Discussion section) [3]. With such large amounts of missing data, one must be cautious in drawing definitive conclusions because the conclusions depend on the nature of the missing data. It would have been interesting to see the results obtained by using only the complete data cases, as well as to see how the imputation of missing values affected the results.
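A sensitivity check of the kind suggested above can be illustrated with a toy calculation. The sketch below compares a complete-case analysis with the "missing means absent" rule for one dichotomous risk factor; the function name `risk_ratio` and the twelve-patient data set are invented for illustration and are not taken from the study.

```python
def risk_ratio(risk_factor, died, missing="drop"):
    """Relative risk of death, risk factor present versus absent.

    risk_factor: 1 (present), 0 (absent), or None (not recorded).
    missing="drop" keeps complete cases only;
    missing="absent" recodes unrecorded values as 0.
    """
    counts = {1: [0, 0], 0: [0, 0]}  # value -> [deaths, patients]
    for x, d in zip(risk_factor, died):
        if x is None:
            if missing == "drop":
                continue
            x = 0
        counts[x][0] += d
        counts[x][1] += 1
    risk_present = counts[1][0] / counts[1][1]
    risk_absent = counts[0][0] / counts[0][1]
    return risk_present / risk_absent

# Hypothetical data: four patients with the factor, four without,
# four with the factor not recorded (two of whom died).
factor = [1, 1, 1, 1, 0, 0, 0, 0, None, None, None, None]
died = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0]

complete_case = risk_ratio(factor, died, missing="drop")
missing_as_absent = risk_ratio(factor, died, missing="absent")
```

In this toy example the complete-case relative risk is 2.0, whereas recoding the unrecorded values as absent lowers it to about 1.33, because deaths among patients with unrecorded values are attributed to the "absent" group.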
The overall c-statistic was computed as a straight unweighted average of the areas for the four different disease outcomes for models based on different sets of predictors. Some form of weighted averaging of areas should be used instead, with the weights derived from the prevalence of the disease under consideration. Other, more complicated, weighting schemes have also been proposed for stratified ROC analyses [4].
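As a minimal sketch of prevalence-based weighting, the function below averages condition-specific areas with weights proportional to supplied values such as case counts. The name `weighted_mean_auc` and the example numbers are hypothetical, and this simple scheme is not one of the more elaborate stratified approaches cited in [4].

```python
def weighted_mean_auc(aucs, weights):
    """Weighted average of per-condition ROC areas.

    aucs:    one area per condition.
    weights: nonnegative weights, e.g. the number of cases of each
             condition; they are normalized to sum to 1 here.
    """
    if len(aucs) != len(weights):
        raise ValueError("one weight is needed per area")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(a * w for a, w in zip(aucs, weights)) / total

# Hypothetical areas and case counts for two conditions.
unweighted = weighted_mean_auc([0.80, 0.90], [1, 1])
weighted = weighted_mean_auc([0.80, 0.90], [100, 300])
```

With equal weights the function reduces to the unweighted mean, so the two summaries differ only to the extent that the conditions differ in both prevalence and area.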
Finally, the determination of whether an administrative variable is restricted or unrestricted seems rather arbitrary. No references to other similar splits of study variables are reported. As a result, it is difficult to defend the authors' conclusions that are based on their split.
Michael H. Kutner, PhD
J. Sunil Rao, PhD
The Cleveland Clinic Foundation; Cleveland, OH 44195
- Copyright ©2004 by the American College of Physicians