This study shows that the POMS is both a reliable and valid measure of short-term postoperative morbidity in patients undergoing major abdominal surgery. The main strength of this paper is that it studies a homogenous group of subjects all of whom underwent major abdominal surgery, and hence could reliably be expected to have a similar pattern of postoperative morbidity. The study also examined the reliability of the survey when administered by novice researchers unfamiliar with the POMS, allowing an assessment of its potential use as an outcome measure in both research and clinical practice, where it would be required to be administered by a spectrum of health professionals of varying experience.
Overall, the inter-rater reliability of POMS-defined morbidity showed high percentage agreement and near-perfect correlation. Reliability within POMS domains varied, with the lowest percentage agreement seen in the gastrointestinal domain; however, the κ coefficient still showed substantial agreement. A similar result was reported in the only other validation of the POMS, in which the gastrointestinal domain showed the weakest correlation between observers, suggesting that the criteria for this domain may need refining to give it greater clinical utility. In the study by Grocott et al., the majority of the items within the POMS domains showed perfect correlation; however, this was not the case in the present study. The lower correlation may be because experienced research nurses administered the survey in the previous work, whereas we specifically chose to assess the reliability of the POMS when administered by those new to its use. Despite this, the POMS, as it currently stands, could be used as an outcome measure without specific training in its use.
There are a number of apparent discrepancies in the reliability of individual domains: only fair correlation was seen in the hematological domain, and no correlation was seen in the wound domain, despite good percentage agreement between observers. The κ coefficient understates the reliability of the POMS in this setting because κ is affected by the prevalence of the finding it describes. When the reported prevalence is low, as was the case in the hematological and wound domains, κ may not be a reliable measure of agreement, as illustrated by the high percentage agreement but poor correlation.
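The prevalence effect described above can be illustrated with a minimal sketch. The counts below are hypothetical and are not data from this study; the function name and table layout are assumptions made for illustration only. Both tables have the same percentage agreement between two observers, but the finding is rare in the first and common in the second.

```python
def agreement_and_kappa(both_pos, a_only, b_only, both_neg):
    """Return (observed agreement, Cohen's kappa) for a 2x2 two-observer table."""
    n = both_pos + a_only + b_only + both_neg
    p_observed = (both_pos + both_neg) / n
    # Proportion of patients each observer recorded as having the finding.
    a_pos = (both_pos + a_only) / n
    b_pos = (both_pos + b_only) / n
    # Agreement expected by chance, derived from the marginal proportions alone.
    p_expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    # Note: undefined (division by zero) if chance agreement is exactly 1.
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return p_observed, kappa


# Hypothetical counts for 50 paired observations (illustrative only).
rare = agreement_and_kappa(both_pos=0, a_only=2, b_only=1, both_neg=47)
common = agreement_and_kappa(both_pos=24, a_only=2, b_only=1, both_neg=23)

print(f"rare finding:   agreement={rare[0]:.2f}, kappa={rare[1]:.2f}")
print(f"common finding: agreement={common[0]:.2f}, kappa={common[1]:.2f}")
```

In this sketch both tables show 94% agreement, yet κ is close to zero when the finding is rare and high when it is common, because chance-expected agreement is already very high in the low-prevalence table.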
As the POMS cannot be used as a scale due to its low internal consistency, the presence of any morbidity recorded by the POMS must be associated with an increased hospital LOS in order for it to be valid. The presence of morbidity registered by the POMS at both postoperative days 3 and 5 was associated with a longer hospital LOS, and morbidity at day 5 was associated with an eightfold increase in prolonged hospital LOS (greater than the third quartile for the cohort). These observations support the predictive ability of the POMS, and also suggest additional utility for the survey in the context of resource utilization, given its ability to predict which patients will remain in hospital for longer.
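As a minimal sketch of this kind of check, the example below defines prolonged LOS as a stay longer than the cohort's third quartile and compares its frequency in patients with and without day 5 morbidity. The patient records are hypothetical, and expressing the association as a risk ratio is an assumption made here for illustration; it is not necessarily the measure used in the analysis.

```python
from statistics import quantiles

# Hypothetical records: (hospital LOS in days, POMS morbidity present on day 5).
patients = [
    (5, False), (6, False), (6, False), (7, False), (7, False), (8, False),
    (8, True), (9, True), (10, True), (12, False), (15, True), (21, True),
]

los = [days for days, _ in patients]
# Third quartile of LOS; "inclusive" treats the sample as the whole cohort.
third_quartile = quantiles(los, n=4, method="inclusive")[2]


def risk_of_prolonged_stay(with_morbidity):
    """Proportion of a group whose LOS exceeds the cohort's third quartile."""
    group = [days for days, flag in patients if flag == with_morbidity]
    return sum(days > third_quartile for days in group) / len(group)


risk_ratio = risk_of_prolonged_stay(True) / risk_of_prolonged_stay(False)
print(f"threshold={third_quartile} days, risk ratio={risk_ratio:.1f}")
```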
Grocott et al. found that on postoperative day 5 the presence of morbidity within 5 of the 9 domains was associated with a longer hospital LOS; however, we observed this in an additional 3 domains. This may reflect the significant differences in the two populations studied, with the previous cohort consisting of almost two thirds orthopedic patients undergoing peripheral surgery, whilst this study contained a more homogenous cohort all of whom underwent major abdominal surgery. In addition, it also appears that a number of domains are associated with a longer hospital LOS when the POMS is administered earlier at postoperative day 3, allowing a more prompt assessment of significant morbidity that affects hospital LOS and strengthening its validity as an outcome measure.
The pattern of morbidity seen in this study is similar to that previously described [5, 9], although higher incidences of pain- and pulmonary-related morbidity, but lower incidences of gastrointestinal morbidity, were seen. This is surprising given the nature of the cohort, but a possible explanation for the lower gastrointestinal morbidity in a series that has undergone major abdominal surgery may be differences in postoperative analgesia (epidural versus intravenous opiates) or the adoption of enhanced recovery protocols aimed at minimizing morbidity.
There are a number of limitations to this study, primarily the small numbers in the reliability assessment of the POMS; however, the overall number of observations was acceptable. The cohort is not truly homogenous, and although all subjects underwent major abdominal surgery, the inclusion of vascular and urology patients, who may have a different prevalence and pattern of morbidity to those undergoing colorectal surgery, may limit the application of results to this population. In addition we did not assess patients with no POMS defined criteria for potentially important morbidity not captured by the survey. It is possible that this group remained hospitalized due to significant postoperative morbidity rather than social issues or delays in the discharge process. However, it has previously been observed that patients remaining in hospital with no POMS defined morbidity showed no evidence of any other morbidity measured by different means [5, 9].
There are also a number of limitations of the POMS itself. Many of the domains describe what is considered routine therapy following major surgery (postoperative oxygen, antibiotics, catheterization); however, by postoperative days 3 and 5 these routine therapies should have ceased, particularly in the context of enhanced recovery programs. The lack of internal consistency of the POMS means that it cannot be treated as a scale: it can identify the presence of morbidity but cannot measure its severity, a capability that would be inherently useful when measuring postoperative outcomes. Further modification of the POMS may allow the development of a survey that measures a single construct, and hence can be used to measure the severity of morbidity. The validity of the POMS in different and specific surgical cohorts, for example vascular and urology, also needs to be explored, as the patterns of morbidity are often different. However, it appears to be both valid and reliable in this group.
Accurate and meaningful reporting of outcomes in clinical trials is essential; however, it has been acknowledged that reporting of surgical adverse events in the health system is inconsistent. The underlying problem remains the lack of standardization of reported outcome measures in terms of timescale, the specific measurement, and the definition of that same measurement. The standardization of a core set of outcome measures would allow for more effective comparison of interventions, allow appropriate outcome measures to be collected and reported in the correct way, aid meta-analysis and sample size calculations, and simplify the design of trials for researchers. An additional benefit of reporting a set of core outcome measures would be the reduction of outcome reporting bias. Outcome reporting bias can be defined as the selection for publication of a subset of the originally reported outcome variables based on the results.
The POMS is essentially a composite outcome measure, consisting of various domains in which those with multiple items can be collapsed so that each domain becomes a binary outcome. The primary outcome measure can then be considered a composite of the binary outcomes for each domain. A composite outcome measure such as the POMS has a number of inherent advantages over a single clinical endpoint, particularly in clinical trials. Major complications used as a core outcome measure have a relatively low event rate; a composite measure, by combining outcomes, increases the overall incidence of events, which improves the power of a study for a set number of participants or reduces the number of participants required to achieve a preset power. The multi-component nature of composite measures allows researchers to describe the disease, or its process, more effectively, particularly in complex states that affect multiple systems, such as the inflammatory response and the associated morbidity following major abdominal surgery. The components of a composite measure should describe the outcome of interest, and the POMS was designed by clinicians to reflect clinically important issues that caused patients to remain in hospital after major surgery. Ideally the components should have similar frequency, treatment effects and severity [13, 14]; however, this is difficult to achieve in practice in complex disease processes and can be compensated for by modern statistical methodology, which allows detection not only of an overall treatment effect from the composite but also of the effect of the treatment on the individual components.
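The effect of event rate on study size can be illustrated with a minimal sketch using the standard normal-approximation formula for comparing two proportions. The event rates, the 25% relative reduction, and the function name are hypothetical choices for illustration, and the sketch assumes the intervention has the same relative effect on the composite as on the single endpoint, which will not always hold.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p_control, relative_reduction, alpha=0.05, power=0.80):
    """Approximate participants per arm to compare two proportions (two-sided test)."""
    p_treated = p_control * (1 - relative_reduction)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Sum of the binomial variances in the two arms.
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2)


# Hypothetical control-arm event rates (illustrative only).
n_single = n_per_group(p_control=0.05, relative_reduction=0.25)     # rare major complication
n_composite = n_per_group(p_control=0.40, relative_reduction=0.25)  # composite morbidity

print(f"single endpoint: {n_single} per arm; composite: {n_composite} per arm")
```

Under these assumptions the composite endpoint requires more than tenfold fewer participants per arm than the rare single endpoint for the same power.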
Various statistical methodologies are available for using the POMS as a core outcome measure following major abdominal surgery. The composite can be collapsed to a single outcome; however, high-frequency items are then over-weighted at the expense of less frequent items that may be more clinically important. Alternatively, a count of events may be employed; however, this can again be difficult to interpret unless all events are of equal severity. Multivariate generalized estimating equation (GEE) methods [16, 17] are less affected by issues of unequal severity and frequency and can be described in terms of a common effect GEE, in which a treatment effect is estimated across all components of the composite, or an average relative effect GEE, in which a treatment effect is estimated for each component. Common effect GEE methods are still biased by events that occur with a higher frequency, a situation that arises with the POMS, in which the gastrointestinal and pain domains have a high frequency whereas morbidity in the wound domain rarely occurs. The average relative effect GEE compensates for this and is therefore the more appropriate statistical tool. In addition, clinically derived weights for the importance of the various domains can be applied to this model, increasing its clinical utility, although these must be assigned prior to data analysis.
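The difference between a frequency-weighted common effect and an average of per-domain effects can be illustrated with a simplified sketch. This is not a GEE fit: it ignores the within-patient correlation between domains that GEE methods account for, uses a Mantel-Haenszel pooled odds ratio as a stand-in for the common effect, and assumes the average relative effect can be approximated by the mean of the per-domain log odds ratios. All counts and function names are hypothetical.

```python
from math import exp, log

# Hypothetical counts per domain:
# (events in treated, n treated, events in control, n control).
domains = {
    "pain": (40, 100, 50, 100),
    "gastrointestinal": (30, 100, 40, 100),
    "wound": (2, 100, 8, 100),
}


def odds_ratio(ev_t, n_t, ev_c, n_c):
    """Odds ratio for one domain (undefined if any cell is zero)."""
    return (ev_t * (n_c - ev_c)) / ((n_t - ev_t) * ev_c)


def average_relative_effect(tables):
    """Equal-weight mean of per-domain log odds ratios, back-transformed."""
    logs = [log(odds_ratio(*counts)) for counts in tables.values()]
    return exp(sum(logs) / len(logs))


def mantel_haenszel(tables):
    """Pooled odds ratio across domains; frequent domains carry more weight."""
    numerator = denominator = 0.0
    for ev_t, n_t, ev_c, n_c in tables.values():
        total = n_t + n_c
        numerator += ev_t * (n_c - ev_c) / total
        denominator += (n_t - ev_t) * ev_c / total
    return numerator / denominator


pooled = mantel_haenszel(domains)
averaged = average_relative_effect(domains)
print(f"pooled OR={pooled:.2f}, averaged OR={averaged:.2f}")
```

In this sketch the pooled estimate stays close to the odds ratios of the frequent pain and gastrointestinal domains, whereas the equal-weight average gives the rare wound domain the same influence as the others.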
Further work is required to simplify the POMS and to determine which domains and items have poor predictive ability. This would allow modification of the survey, including the addition of items, with a view to constructing a survey that has acceptable internal consistency and hence can be used as a scale. Additional consideration is also required of the potential for weighting individual domains based on perceived clinical importance.