Impact of ASA score misclassification on NSQIP predicted mortality: a retrospective analysis

Background: The ASA physical classification score has a major impact on the observed/expected (O/E) mortality ratio in the NSQIP General Vascular Mortality Model. The difference in predicted mortality is greatest between ASAs 3 and 4. We hypothesized that under-classified ASA scores significantly affect the O/E mortality.
Methods: We conducted a retrospective review of NSQIP essential surgery cases from January 2014 to December 2014 (n = 1264) with mortality sub-analysis (n = 33) at our institution. We recorded transfer and emergency status and independently calculated the ASA score for mortalities using published definitions. A random sample of 50 survivors and 10 emergency survivors was reviewed and the ASA recalculated. We performed statistical modeling to simulate the effects of ASA misclassifications. Statistical analysis was performed using JMP 10 and SAS 9.4.
Results: ASA was under-classified in 18.2% of mortalities, most commonly ASAs 3 and 4. Sixteen percent of ASA 3 survivors were misclassified, including 60% in the emergency subgroup (p < 0.05 vs. elective cases). Patients transferred from other institutions were more likely to be emergency cases than non-transferred patients (43.5 vs. 7.84%, p < 0.05). Transferred patients had a higher proportion of ASAs 3–5 vs. ASAs 1–2 compared with non-transfers (84.38 vs. 49.76%, p < 0.05). Simulation data showed ASA misclassification underestimated predicted mortality by 2.5 deaths on average.
Conclusion: ASA misclassification significantly impacts O/E mortality. With accurate ASA classification, observed mortality would not have exceeded expected mortality in our institution. Education regarding the impact of ASA scoring is critical to ensure accurate O/E mortality data at hospitals using NSQIP to assess surgical quality.


Background
A current focus of health care systems is to improve the cost and quality of patient care. The American College of Surgeons National Surgical Quality Improvement Program (NSQIP®) is commonly used to collect and report data on institution-specific, risk-adjusted surgical outcomes. A systematic sampling approach is used to determine which surgical cases are selected for abstraction (Shiloach et al. 2010) based on the hospital's specific program: Essentials, Procedure-Targeted, Small & Rural, or Pediatric. Participating institutions are provided with quarterly reports on risk-adjusted complication rates for a variety of postoperative occurrences including surgical site infections, renal failure, thromboembolic complications, cardiac events, readmission, and observed-to-expected (O/E) mortality. Institutional performance for specific complications is compared with that of other hospitals and assigned a decile rank. The ranking is reported as "Needs Improvement," "As Expected," or "Exemplary," when compared with expected complication rates using standardized models, such as the NSQIP General Vascular Mortality Model (GVMM) for O/E mortality. Institutions can then use NSQIP's institution-specific benchmarking data to focus their quality improvement initiatives. NSQIP participation is effective in helping institutions identify potential problems in surgical care (Steinberg et al. 2008; Fink et al. 2002); however, in most instances, additional analyses by the participating hospitals are required to develop a better understanding of how best to prevent or decrease complications (Schilling et al. 2008).
In 2014, our institution received an NSQIP report indicating a higher than expected observed-to-expected 30-day mortality ratio and was subsequently assigned a "Needs Improvement" status. We routinely review all major surgical complications and mortalities through our surgical Morbidity and Mortality conference and were surprised to learn we ranked in the lowest decile for this category. Focusing on mortality, we first reviewed the surgical literature to aid in identifying factors that might impact predicted mortality (Fink et al. 2002) and then did chart reviews of all surgical mortalities in 2014. We initially abstracted data to get a better understanding of patient-specific risk factors and process-of-care variables that might affect mortality, including transfer status, need for emergency surgery, use and timing of "do not resuscitate" (DNR) status, "procedure risk" (low, medium, or high), NSQIP and University Hospital Consortium predicted mortality, and finally the American Society of Anesthesiologists (ASA) physical status classification system.
Based on our initial review, we turned our focus to the ASA score (Table 1) (Durham et al. 2006) as a potential contributor to our higher than expected O/E mortality rate. The ASA score is assigned by the anesthesia team and provides a baseline metric for the fitness of a patient prior to undergoing surgery. The ASA score is an important predictor of mortality in surgical patients (Davenport et al. 2006; Davenport et al. 2005) and has been specifically validated for use in the NSQIP GVMM. The NSQIP rules for data entry require the surgical clinical reviewer (SCR) to use the ASA score recorded by the anesthesia team but allow the SCR to add the suffix E in cases where the surgical team documents the emergent nature of the surgical procedure, if not already documented in the "anesthesia assigned" ASA score. During our chart review of the 2014 mortalities, we identified several misclassified ASA scores, which greatly altered predicted mortality according to the NSQIP online preoperative risk calculator. Based on this finding, we hypothesized that misclassified ASA scores falsely decreased the expected mortality and contributed to the increased 2014 NSQIP O/E mortality at our institution.

Methods
The study was approved by our institutional review board for exemption from review because it used retrospective, de-identified data. At our institution, 1264 general and vascular surgical cases were reported to NSQIP in 2014, which included 33 mortalities. The medical records of these 33 patients were reviewed independently by two surgery residents (AH and SJ) who did not participate in any of the cases. The ASA score was calculated independently by each reviewer and then discussed together to reach consensus based on published guidelines on the ASA website (asahq.org). In addition to ASA score, the following data were abstracted: transfer from another institution, the need for emergency surgery, DNR status and timing relative to death, and procedure risk. To determine factors significantly affecting  (Durham et al. 2006). This analysis determined that ASA was the major factor in NSQIP modeling and that discrepancies in classification led to substantially different outcomes, specifically changes from ASA 3 to ASA 4. Changes from ASA 2 to ASA 3 or ASA 4 to ASA 5 did not impact mortality predictions as expected.
To understand the impact of ASA misclassification, we needed to develop a global estimate of ASA misclassification incidence for the entire NSQIP population, not just the mortalities. As the differential impact between ASAs 2 and 3, and between 4 and 5, was negligible compared to that between ASAs 3 and 4, our study focused on ASA 3 cases only. To objectively estimate the incidence of misclassified ASA 3 patients, a random sample of 50 patients was selected from the 2014 elective NSQIP surgical cases with charted ASA 3 classifications. Additionally, 10 patients of the total 74 emergency cases who were initially charted ASA 3 were randomly selected. These samples were used to estimate the frequency of ASA 3 over- and under-classification. Patient selection was performed by random number generating software (SAS 9.4, Cary, NC).
After correcting the misclassified ASA scores of the random samples, SAS 9.4 was used to simulate the number of expected deaths using the odds ratios for mortality in the published 2014 GVMM, adjusted for the new rates of each ASA class. Both over- and under-classifications were included in the model. The simulation was run 1000 times. As the entered data represented the probabilities of a particular outcome, each run of the simulation generated a number of predicted deaths.
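The simulation described above can be illustrated with a minimal Python sketch. This is not the SAS program used in the study, and every number in it (the baseline odds, the odds ratios, the cohort size, and the reclassification rates) is a hypothetical placeholder rather than a value from the GVMM or from our data. It only shows the general idea: convert odds ratios into per-patient death probabilities, optionally reassign a share of charted ASA 3 patients to ASA 4 or ASA 2, and count simulated deaths over repeated runs.

```python
import random
from statistics import mean

# Hypothetical inputs: none of these values come from the GVMM or the study.
BASE_ODDS = 0.002                        # assumed odds of death, reference class
ODDS_RATIO = {2: 1.0, 3: 4.0, 4: 20.0}   # assumed odds ratios by ASA class
N_ASA3 = 500                             # assumed number of cases charted ASA 3
P_UNDER = 0.10                           # assumed share that are truly ASA 4
P_OVER = 0.04                            # assumed share that are truly ASA 2


def death_probability(asa):
    """Convert the assumed odds for an ASA class into a probability."""
    odds = BASE_ODDS * ODDS_RATIO[asa]
    return odds / (1.0 + odds)


def simulate_deaths(rng, correct_asa):
    """One simulation run: number of predicted deaths among charted ASA 3 cases."""
    deaths = 0
    for _ in range(N_ASA3):
        asa = 3
        if correct_asa:
            u = rng.random()
            if u < P_UNDER:
                asa = 4          # under-classified case moved up
            elif u < P_UNDER + P_OVER:
                asa = 2          # over-classified case moved down
        if rng.random() < death_probability(asa):
            deaths += 1
    return deaths


def run(n_runs=1000, seed=0):
    """Average predicted deaths with charted vs. corrected ASA scores."""
    rng = random.Random(seed)
    charted = [simulate_deaths(rng, False) for _ in range(n_runs)]
    corrected = [simulate_deaths(rng, True) for _ in range(n_runs)]
    return mean(charted), mean(corrected)


if __name__ == "__main__":
    charted_mean, corrected_mean = run()
    print(f"charted: {charted_mean:.2f}, corrected: {corrected_mean:.2f}")
```

Under these assumed inputs, the corrected scores yield more predicted deaths than the charted scores because under-classification is more frequent than over-classification and carries a larger odds ratio.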
Statistical analyses were performed using JMP 10 and SAS 9.4 (Cary, NC). Contingency analysis using the chi-square test was performed for categorical variables.
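For readers unfamiliar with contingency analysis, the following sketch shows a Pearson chi-square test on a 2 x 2 table in plain Python. It is an illustration only, not the JMP or SAS procedure used in the study; the function name `chi_square_2x2` and the example counts are hypothetical, and no continuity correction is applied.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts (not study data):
# rows = transferred / not transferred, columns = emergency / non-emergency.
example = [[30, 70], [10, 90]]
chi2, p = chi_square_2x2(example)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

When any expected cell count is small (commonly taken as below 5), the chi-square approximation is less reliable and an exact test is usually preferred.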

Results
The patient characteristics of our study populations (all cases and mortalities) are shown in Table 2. Patients who died were older and more likely to be outside transfers and emergencies (p < 0.05). In addition, certain surgical populations, namely, breast and endocrine surgery patients, had no observed mortality. Characteristics such as gender, race, and Hispanic ethnicity were not different between groups. When evaluating NSQIP variables, the mortality group also had a greater incidence of partially dependent functional status, disseminated cancer, diabetes, hypertension, tobacco use, chronic obstructive pulmonary disease, acute renal failure, dirty wounds, ascites, and ventilator use at the time of surgery (Table 3, p < 0.05). A total of 1264 NSQIP essential cases were performed during 2014. Patients transferred from other institutions were more likely to be emergency cases compared with patients who were not transfers (43.5 vs. 7.84%, p < 0.05), and transferred patients had a higher proportion of ASAs 3-5 vs. ASAs 1-2 compared with non-transfers (84.38 vs. 49.76%, p < 0.05). When comparing our study population to the NSQIP reported total population (n = 768,612), our study population had a higher proportion of transferred patients for mortalities (57.6 vs. 29.9%, p < 0.05) and survivors (11 vs. 3.9%, p < 0.05). Additionally, our study population had a higher number of patients with 3+ risk factors in both mortalities (87.9 vs. 60.1%, p < 0.05) and survivors (23 vs. 11.6%, p < 0.05) than the NSQIP comparison population.
Our initial medical record review of 33 mortalities showed 18.2% of ASA scores were misclassified, mostly in patients originally scored ASA 3 or ASA 4. 12.1% were under-classified (initially received a lower ASA), and 6.1% were over-classified. Discrepancies between recorded and reclassified ASA scores appeared to be the greatest contributor to the NSQIP predicted mortality (compared to all other factors on the online NSQIP model) particularly when ASA 4 and 5 cases were under-classified as ASA 3. Emergency cases were more likely to have a higher ASA score (p < 0.05) and were more likely to be misclassified (p < 0.05). Table 4 lists the reclassified ASA scores and the medical rationale for reclassifying the ASA score.
To determine if ASA scores were also systematically misclassified in the all cases population, we reviewed medical records of 50 randomly chosen survivors initially assigned an ASA score of 3. In this random sample, the ASA was misclassified in 16% of patients with five under-classified and two over-classified. A random sample of 10 patients who underwent emergency surgery was also analyzed. Six of these patients had ASA scores that were misclassified (p < 0.05) including one over-classification. Table 4 summarizes the factors that led to ASA reclassification.
Predicted mortality simulations were then performed using the misclassification rates discovered above. Figure 1 shows the distribution of the results of the simulation model for the ASA 3 misclassifications.

Discussion
Quality improvement initiatives such as the NSQIP and the University Hospital Consortium (UHC) databases are important benchmarking resources, which allow participating hospital systems to assess the quality of care provided at their institutions (Fink et al. 2002). However, when specific quality scoring systems are used to evaluate patient care, it is assumed that the data used to assess patient acuity and outcomes are accurate. The UHC quality benchmarking process utilizes administrative data to "risk-stratify" patient outcomes. Administrative databases require accurate documentation of patients' medical diagnoses in the medical record to accurately "risk-stratify" patient outcomes. Abstracted databases like NSQIP are assumed to be more accurate in patient risk factor stratification (Steinberg et al. 2008). However, as with any "scoring system," the users must understand the strengths and weaknesses of the system to use it properly. NSQIP is designed to predict the risk of complications and mortality based on information about the patients and their medical conditions that is available prior to performing surgery. Consequently, misrepresentation of these variables, such as failure to recognize sepsis or misclassifying the ASA, can significantly underestimate predicted mortality and thereby inaccurately categorize hospitals as being poor performers.
We discovered that at our institution, an academic medical center with many trainees, ASA misclassifications were relatively common (16%), especially in emergencies (60%). Under-classification of ASA scores 4 and 5 as ASA 3 was unexpected. The statistical model we created suggests that with proper ASA classification, our institution's predicted mortality would have matched our observed mortality. In addition, emergency surgical procedures were most commonly misclassified and transferred patients were most often emergency cases. Thus, the predicted mortality of institutions with a high volume of transferred emergency surgical cases, such as ours, may be artificially reduced from underestimated ASA classifications. This is consistent with data suggesting that emergency cases are prone to high O/E ratios and risk under-classification in general (Hyder et al. 2015).
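The effect on the O/E ratio can be made concrete with a short arithmetic sketch. The observed count (33 deaths), the most likely simulated count after ASA correction (33.5 deaths), and the average underestimate (2.5 deaths) are taken from this report. Subtracting the average underestimate from the corrected figure to approximate the expected deaths under the charted scores is an assumption made here for illustration; it is not the expected value NSQIP reported, which is not given in the text.

```python
def oe_ratio(observed_deaths, expected_deaths):
    """Observed-to-expected mortality ratio."""
    if expected_deaths <= 0:
        raise ValueError("expected_deaths must be positive")
    return observed_deaths / expected_deaths


OBSERVED = 33               # mortalities among the 2014 NSQIP cases
EXPECTED_CORRECTED = 33.5   # most likely simulated deaths after ASA correction
AVERAGE_SHORTFALL = 2.5     # average underestimate from misclassification

# Assumption for illustration only: expected deaths under the charted ASA
# scores are approximated as the corrected figure minus the average shortfall.
EXPECTED_CHARTED = EXPECTED_CORRECTED - AVERAGE_SHORTFALL

charted_ratio = oe_ratio(OBSERVED, EXPECTED_CHARTED)
corrected_ratio = oe_ratio(OBSERVED, EXPECTED_CORRECTED)
print(f"O/E with charted ASA:   {charted_ratio:.3f}")
print(f"O/E with corrected ASA: {corrected_ratio:.3f}")
```

Under this assumption, the ratio sits above 1 with the charted scores and just below 1 once the scores are corrected, which is the pattern described above.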
The reasons why so many ASA scores were misclassified are unclear. As a teaching hospital, it is tempting to assume that junior anesthesia trainees were not as familiar with the ASA score calculation as they should be, or that emergency cases performed on nights and weekends were prone to "erroneous ASA classification" (Gawande et al. 2003). However, Tables 3 and 4 offer some additional insights. First, many patients who were misclassified as ASA 3, when they should have been ASA 4, presented with sepsis or perforated viscus. In reviewing their charts, some of these patients appeared quite well on initial examination; however, their vital signs, laboratory values, and imaging met the Systemic Inflammatory Response Syndrome criteria for sepsis. In addition, all of the ASA 5 patients misclassified as ASA 3 presented with ruptured abdominal aortic aneurysm. On presentation and initial exam, because they were "contained ruptures," these patients also appeared quite well with minimal abnormalities in their vitals or laboratory values. However, while these patients were clinically well appearing, ruptured aortic aneurysms are by definition granted an ASA 5 on the ASA guidelines, as the natural history for these cases is quick propagation to overt rupture and death. A few under-classified patients had medical histories consistent with ASA 4 as well, such as myocardial infarction within 3 months or ongoing transient ischemic attacks prior to surgery. As such, there appeared to be an overreliance on the subjective physical appearance of the patient at the time of examination, as opposed to factors such as the natural history of their disease process and previous medical history. When adjusting for reclassified ASA scores in the sample populations, simulations using the odds ratios for mortality from the GVMM reports predicted an increased mortality that matched our institution's observed mortality.
Figure 1. Histogram bars depict the percentage of simulations that resulted in the mortality rates shown on the x-axis. The number of deaths with the greatest likelihood based on the simulation model was 33.5. The simulation was run 1000 times. Both over-classification and under-classification rates were included in the model.
The ASA score is a subjective measure of baseline patient illness; however, it is critical that all providers who assign ASAs are doing so in a consistent manner. Our study provides evidence ASA misclassification can significantly impact predicted mortality and supports continued education regarding the potential impact of ASA scoring on O/E mortality in surgical patients. To address this concern, we communicated our findings on ASA misclassification to both the surgery and anesthesia departments and provided education regarding the ASA score and its importance in calculating predicted mortality. We also modified our institutional procedure verification and time out policy by incorporating the ASA score into the surgical time out. Our revised policy requires the attending anesthesiologist to communicate the ASA score to the surgical team as part of the time out. The attending surgeon is required to acknowledge the assigned ASA score prior to starting the operation and initiate a discussion about it if there are concerns about the assigned score. In addition, surgical residents are required to use the online NSQIP calculator for all cases presented in the weekly morbidity and mortality conference. A future study will reevaluate the incidence of misclassified ASAs after such education has been instituted.
There are several limitations to this study. First, it is a single-center retrospective review of NSQIP mortalities in a large academic medical center. Consequently, the results may not apply to participating NSQIP institutions of varying size and type. Future studies being considered include expansion of the current study to multiple institutions, as well as revisiting our ASA misclassification rate after education. Second, the actual model used to calculate NSQIP predicted mortality is proprietary, so we are unable to report the exact statistical change in predicted mortality. In the future, the NSQIP models are expected to become increasingly robust. For example, they will improve as specific variables are collected for complex hepatobiliary cases.

Conclusion
ASA misclassification significantly impacts the observed/expected mortality ratio and thus how a particular institution's safety is viewed. In our review, misclassification, particularly in emergency cases, underestimated the number of predicted deaths by up to 9%. With accurate ASA classification, observed mortality would not have exceeded expected mortality in our institution. Continued education regarding the impact of ASA scoring is essential to ensure accurate O/E mortality data are being used to assess surgical quality at participating NSQIP institutions. Our institution has since instituted a policy that the ASA must be announced during the pre-procedure time out and agreed upon or discussed prior to incision.

Funding
This research was not funded by any funding body. The design, data collection, analysis, and interpretation were all done by the authors alone.

Availability of data and materials
The datasets used and analyzed during this current study are available from the corresponding author on request.

Authors' contributions
AH, SVJ, and RNC were responsible for the study conception and design. AH, SVJ, and MF were responsible for the data acquisition. All authors contributed to the analysis and interpretation of the data. The manuscript was drafted by AH, SVJ, AG, LK, MC, and RNC. Critical revision was performed by all authors. All authors read and approved the final manuscript.

Ethics approval and consent to participate
The SUNY Upstate Institutional Review Board provided an exemption for use of our de-identified patient care data for the purposes of this study (project number 943810-1).

Consent for publication
Not applicable.

Competing interests
SVJ has been invited to speak by Draeger Medical at conferences without expenses paid. LK has received travel expenses for consultations with Carefusion and the American Cancer Society. She also has received a grant for an after-market analysis of a tunneled pleural catheter. RNC has received less than $5000 for expert testimony provided for a malpractice case, as well as a stipend from the NIH for review panel service. All other authors have no competing interests to disclose.
