Review of:
“The Quality in Australian Health Care Study”
(Wilson et al., Medical Journal of Australia, Vol. 163, 6 November 1995).
Reviewer: (Dr.) Bryan Hall November 2002.
This article reviews a paper on the Quality in Australian Health Care Study (QAHCS) as published in the Medical Journal of Australia in 1995. The MJA paper follows on from the Harvard analysis of the American Health Care System and the “1994 QAHCS”. There are a number of parallels between the Quality in Australian Health Care Study and the Harvard Medical Practice Study, which also drew attention to medical errors.
Both the Harvard study and the Quality in Australian Health Care study examined medical records to detect evidence of adverse events. By extrapolating from the adverse event analysis, both studies drew extraordinary conclusions concerning the frequency and totality of injuries and fatalities arising from hospital treatment. Both of these studies have promoted the view that there are an alarming number of adverse events arising from treatment provided to patients in hospitals. … The purpose of this review is to analyse the references to “mathematical” and “statistical” techniques in the context of the MJA article. Advanced numerical techniques are frequently used to present the results of studies in a mathematically elaborate formulation. However, such a formulation does not of itself establish the correctness of the results presented, the methods used, or the validity of any conclusions drawn concerning the system under evaluation.
Contents
- Background
- QAHCS Article Structure
- Headline Summary
- Background
- Abstract
- Main Body
- Target Population
- AE Indicator Criteria
- Definitions Adopted
- The Review Process
- Results
- Discussion
- Analysis of the QAHCS article
- Validity of Analytical Methods
- Objectivity of Conclusions
- Operational Definition Preventability of an AE
- Causation, Preventability and AEs
- Generality of the Conclusions
- Abstraction and Generalisation
- Evidence from Multiple Observations
- Extrapolating to admissions related to AEs on a national basis
- Inadmissible Regression Model
- Ill-conditioned regression systems
- Logistic Regression Model (Error Analysis)
- Summary of Conclusions
- References & Notes
Background
This MJA paper follows on from the Harvard analysis of the American Health Care System1 and the “1994 Quality in Australian Health Care Study (QAHCS)”2. The MJA paper states that the QAHCS was modelled on the Harvard Medical Practice Study and that the same methods were used3, with some modifications. The Harvard Medical Practice Study was first published in 1991 and was based on 1984 case records. The Harvard researchers have subsequently written a number of articles and a book. Anderson states that popular discussion of “the Harvard study” has come to refer to the Harvard Medical Practice Study collective works numbered 1 to 6 listed below4. Both the Harvard study and the Quality in Australian Health Care study examined medical records to detect evidence of adverse events. Both of these studies have promoted the view that there are an alarming number of adverse events arising from treatment provided to patients in hospitals.
It is the present author's intention to stick fairly closely to the MJA paper cited above (subsequently referred to as “the article”). The larger QAHCS on which the article is based is not being directly reviewed here. However, the article appears essentially to represent a synthesis of the QAHCS. In presenting this synthesis many results are quoted throughout the text and additional material is presented in some 13 distinct tables having a variety of headings.
The differences between the Harvard and QAHCS studies and their consequently varying conclusions will not be analysed here. There have been others who have dealt with the Harvard Study in significant detail. For example, Anderson4a has written a number of articles on the Harvard study and has stated:
“Even granting the authors (of the Harvard Study) all their own assumptions, the data are simply not reliable, and should not be extrapolated to the real world of malpractice litigation. Moreover, there is no reason to grant the authors their own assumptions. The study lumped together adverse events both grave and minor, whether caused by doctors or simply occurring anywhere in a hospital. A slip and fall in a hospital corridor, for example, was grouped indistinguishably with surgical error and misdiagnosis.”
The matters presently being considered are inherently concerned with numbers and facts and not an interpretation of those facts – regardless of the medical qualifications of those providing the interpretation. Diverging to consider the Harvard study and the extended QAHCS is not necessary for logical consistency, and it invites an essentially endless dialogue which can reach no satisfactory conclusion. The number of points being considered could become so large that we are never clear exactly which points we are considering, what the essential definitions are and on what assumptions we are relying.
QAHCS Article Structure
The MJA article provides a long (14 pages, small font), densely written presentation of a review of medical records, as described in its headline summary. The article consists of the following structural sections:
- Headline Summary
- Background
- Abstract
- Main Body
- Target Population
- AE Indicator Criteria
- Definitions Adopted
- The Review Process
- Results
- Discussion
A summary of each of these document sections is presented below.
Headline Summary
The headline summary outlines the conclusion of the research described in the article as follows:
“A review of medical records of over 14,000 admissions to 28 hospitals in New South Wales and South Australia revealed that 16.6% of these admissions were associated with an “adverse event”, which resulted in disability or a longer hospital stay for the patient and was caused by health care management; 51% of the adverse events were considered preventable. In 77.1% this disability had resolved within 12 months, but in 13.7% the disability was permanent and in 4.9% the patient died”. (Med J Aust 1995; 163: 458-471)
Background
The background contains introductory material describing historical points in the study of iatrogenic injuries or adverse patient events (AEs) in hospitalised patients. The introduction states that the article reports on the adequacy of methods used to study AEs, the major diagnostic categories and specific specialties associated with AEs, and measures of disability and preventability. Human and system-based factors identified as contributing to AEs are discussed with a view to preventing further AEs.
Abstract
The Abstract outlines the article's objectives, methodology, measurement outcomes, results and conclusions.
The abstract states that the article's objective is to estimate patient injury (and its direct consequences) caused by health care in Australian hospitals.
The methodology involved analysing patient records pertaining to 14179 hospital admissions from 28 distinct hospitals in NSW and South Australia. These records were screened by registered nurses to detect evidence of Adverse Events, based on 18 explicit selection criteria (reproduced below), arising from medical practice. Those records in which screening “suggested” adverse medical practice outcomes were further analysed by at least two medical officers to decide whether or not an adverse event had occurred. A third medical officer was used to resolve those cases in which there was disagreement as to whether or not an adverse event had occurred.
An Adverse Event (AE) was defined as an unintentional injury or complication which results in disability, death or prolonged hospital stay and is caused by health care management.
The article's abstract states that the main outcome measures of the QAHCS included:
- Adequacy of the medical record and reliability of the method of medical record review;
- Proportion of the admissions associated with AEs;
- Clinical categories of AEs;
- Characteristics of patients with AEs;
- Extra bed-days attributable to AEs;
- Disability attributable to AEs;
- Preventability of AEs;
The abstract summarises the article's reported results as follows:
- 6200 (43.7%) of the total medical records were screened positive against the above mentioned 18 selection criteria.
- 13 of the above 18 selection criteria were statistically significant predictor variables for AEs (P<0.01)
- The proportion of admissions associated with AEs was highest in those with complete medical records
- Agreement between duplicate RN screenings was 84% and the sensitivity and specificity of the RN screening process were 97.6% and 67.3% respectively.
- AEs were identified in 2353 of the medical records selected on the basis of the initial screening (this represents 2353/14210 = 16.6% of the total records screened)
- 73% of the medical records were judged to be of sufficient quality to complete all aspects of the MO review, and the remainder were adequate to determine whether an AE had occurred.
- There was 80% agreement on the presence of an AE, 58% agreement on the preventability of an AE and 87% agreement for disability or prolonged hospital stay resulting from the AE.
- …
- 51% of AEs were judged to have high preventability
- Disability and preventability varied between specialties, diagnostic categories and according to the location in which the AE occurred
- Errors of omission (52% of AEs) were almost twice as common as errors of commission (27% of AEs)
The article's abstract concludes by stating that:
“A retrospective review of hospital medical records was a reliable method of estimating patient injury caused by patient health care. Extrapolating the data on the proportion of admissions and the additional bed-days associated with AEs to all hospitals in Australia in 1992 indicated that about 470,000 (95% CI, 430,000-510,000) admissions and 3.3 million bed days (95% CI, 3.0-3.6 million) were attributable to AEs. These national figures provide empirical data for further studies on quality of care in Australian hospitals. The outcomes for patients and the use of health resources are substantial.”
Main Body
The main body of the article consists of a dense presentation describing the study methodology, results of the study, evaluation of the review process, and a summary of the main findings of the QAHCS.
Target Population
The target population was all patients admitted to public and private acute-care hospitals in Australia in 1992, estimated to be 2.82 million. The sample size was calculated on the assumption that the proportion of all admissions associated with AE would be 4.5% and that for individual hospitals the proportion would range from 2.8% to 4.2% (this was based on the Harvard Study).
Thirty-one hospitals in NSW and SA were selected to participate in the study. However, for varying reasons 3 hospitals did not participate in the QAHCS, so records from 28 hospitals were analysed. For each participating hospital a minimum of 520 hospital records were “randomly” selected for analysis from computer inpatient databases. Estimates of the proportion of admissions associated with AEs for specific categories of patients or admissions were determined by age, sex, insurance status and Australian National Diagnosis Related Groups (AN-DRG).
AE Indicator Criteria
The process of determining the occurrence of AEs from medical records involved an initial screening of medical records by registered nurses for the presence or absence of 18 selection criteria. The 18 selection criteria used to detect evidence of Adverse Events are listed in Table 1 of the article as:
- Unplanned admission before index admission
- Unplanned readmission after discharge from index admission
- Hospital incurred patient injury
- Adverse drug reaction
- Unplanned transfer from general care to intensive care
- Unplanned transfer from another acute care hospital
- Unplanned return to the operating theater
- Unplanned removal, injury or repair of organ during surgery
- Other patient complications (AMI, CVI, PE etc.)
- Development of neurological deficit not present on admission
- Unexpected death
- Inappropriate discharge to home
- Cardiac/respiratory arrest, low Apgar score
- Injury related to abortion or delivery
- Hospital acquired infection/sepsis
- Dissatisfaction with care documented in the medical record
- Documentation or correspondence indicating litigation
- Any other undesirable outcomes not covered above.
Those records which screened positive for a selection criterion were passed over to experienced medical officers for evaluation to determine whether or not the record indicated an AE. For each of the selected records at least two medical officers undertook a detailed analysis of the medical record. For those records in which the medical officers identified that an AE had occurred, the AE was further classified in terms of the specialty involved, extra bed-days attributable to the AE, the extent of disability arising from the AE, and when the AE occurred with respect to the index admission. AE preventability was then scored on a scale of 1-6. In the event of the 2 medical officers disagreeing as to whether or not an AE had occurred, the determination was made by a third medical officer.
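The two-stage review logic described above can be summarised schematically. The following is a minimal sketch, not code from the QAHCS itself; the record structure, field names and boolean judgements are invented purely for illustration, since the real study used structured review forms.

```python
from dataclasses import dataclass

@dataclass
class Record:
    """Hypothetical representation of one medical record in the review."""
    criteria: list[bool]        # outcomes of the 18 RN screening criteria
    mo_judgements: list[bool]   # AE yes/no judgement of each reviewing medical officer

def screened_positive(record: Record) -> bool:
    """Stage 1: registered-nurse screen; any positive criterion flags the record."""
    return any(record.criteria)

def adverse_event(record: Record) -> bool:
    """Stage 2: at least two medical officers review the flagged record.
    If the first two agree, their view stands; otherwise a third officer decides."""
    if not screened_positive(record):
        return False
    first, second = record.mo_judgements[0], record.mo_judgements[1]
    if first == second:
        return first
    return record.mo_judgements[2]   # disagreement resolved by a third medical officer

# Illustrative use only (invented data): one flagged record with a 2-1 split.
example = Record(criteria=[False] * 17 + [True], mo_judgements=[True, False, True])
print(adverse_event(example))   # True
```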
Definitions Adopted
The article relies on a number of expanded definitions which are presented in the body of the article as follows:
Adverse Event
- an unintended injury or complication which
- results in disability, death or prolongation of hospital stay, and is
- caused by health care management rather than the patient's disease.
Disability
Disability was temporary or permanent impairment of physical function (including disfigurement) or mental function or prolonged hospital stay (even in the absence of such impairment). Temporary disability included AEs from which complete recovery occurred within 12 months; and permanent disability included AEs which caused permanent impairment or which resulted in permanent institutional or nursing care or death.
Causation & Preventability
The article defines causation and preventability separately but the definitions of each are substantially similar. These definitions are reproduced from the article.
Causation
Causation was present if the AE was caused by health care management rather than the disease process. It included acts of omission (failure to diagnose or treat) and acts of commission (incorrect treatment or management). A scale of 1-6 was used to determine whether an AE was caused by health care management or the disease process.
1 Virtually no evidence for management causation
2 Slight to modest evidence for management causation
3 Management causation not likely; less than 50-50 but close call
4 Management causation more likely than not; more than 50-50 but close call
5 Moderate/strong evidence for management causation
6 Virtually certain evidence for management causation
Preventability
Preventability of an AE was assessed as “an error in management due to failure to follow accepted practice at an individual or system level”; accepted practice was taken to be “the current level of expected performance for the average practitioner or system that manages the condition in question.”
The degree of preventability was scored on a 1-6 scale, grouped into three categories.
No preventability
1 Virtually no evidence for preventability
Low preventability
2 Slight to moderate evidence for preventability
3 Preventability not likely, less than 50-50 but close call
High Preventability
4 Preventability more likely than not, more than 50-50 but close call
5 Strong evidence for preventability
6 Virtually certain evidence for preventability
The preventability scale was applied uniformly to all hospitals regardless of size or available resources.
The article provides indicative descriptive examples of adverse events classified in accordance with the above criteria.
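Purely as an aid to reading the tables that follow, the 1-6 preventability scores and their grouping into the no/low/high categories used later in the article can be written out as a simple mapping. This is a restatement of the definitions above, not part of the study's own tooling.

```python
# Grouping of the 1-6 preventability scores into the categories used in the article's Table 3.
PREVENTABILITY = {
    1: ("No preventability",   "Virtually no evidence for preventability"),
    2: ("Low preventability",  "Slight to moderate evidence for preventability"),
    3: ("Low preventability",  "Preventability not likely, less than 50-50 but close call"),
    4: ("High preventability", "Preventability more likely than not, more than 50-50 but close call"),
    5: ("High preventability", "Strong evidence for preventability"),
    6: ("High preventability", "Virtually certain evidence for preventability"),
}

def preventability_group(score: int) -> str:
    """Map a 1-6 preventability score to the no/low/high grouping."""
    return PREVENTABILITY[score][0]

print(preventability_group(3))   # "Low preventability"
```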
The Review Process
Figure 1 is reproduced from the article; it shows schematically the order in which the medical records were sorted and screened. Of the 14655 medical records originally selected, 14210 were screened for the presence of the 18 selection criteria; 6210 of these records screened positive against the selection criteria, and of these 2353 (16.6% of all records screened) were determined to be associated with adverse events.
Figure 1 Schematic presentation of the review process (reproduced from the article)
Various tests were undertaken to determine the reliability with which the screening criteria were interpreted and the reliability of the determinations of the medical officers. The completeness of the medical records for the purposes of the screening was also assessed.
Results
Proportion of admissions associated with an AE
The article states that the proportion of admissions associated with an AE was 16.6% with a 95% confidence interval (95% CI) of 1.3%.
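For orientation only, a simple binomial (Wald) confidence interval for the 16.6% proportion can be computed as below. This naive calculation ignores the two-stage cluster design; the article states that SUDAAN software was used to adjust for that design (see note 5), which presumably accounts for its wider reported interval. The figures here are illustrative only.

```python
import math

# Figures quoted elsewhere in this review: 2353 AE-associated admissions out of 14210 screened.
ae, n = 2353, 14210
p = ae / n
se_srs = math.sqrt(p * (1 - p) / n)    # standard error assuming simple random sampling
half_width = 1.96 * se_srs             # naive 95% half-width

print(f"proportion   = {p:.3%}")            # about 16.6%
print(f"naive 95% CI = ±{half_width:.2%}")  # about ±0.6% under simple random sampling
```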
Proportion of admissions associated with measures of disability
Table 3 in the article reports the percentage of adverse events rated no preventability, low preventability, and high preventability for each level of disability and total adverse events by disability. Table 3 and its associated notes from the article are reproduced below:
Disability | No preventability (%) | Low preventability (%) | High preventability (%) | Total Adverse Events n (%) |
---|---|---|---|---|
Less than 1 month | 23.3 | 29.7 | 47 | 1073 (46.6%) |
1-12 months | 16 | 30.1 | 54 | 702 (30.5%) |
Permanent (<50%)* | 20.9 | 32.5 | 46.6 | 206 (8.9%) |
Permanent (>50%)* | 16.5 | 25.7 | 57.8 | 109 (4.7%) |
Death | 4.5 | 25.9 | 69.6 | 112 (4.9%) |
Unable to determine/unknown+ | 10 | 31 | 59 | 100 (4.3%) |
Total | 19 | 29.8 | 51.2 | 2302 (100%) |
After this table the article states the following:
* Assessed qualitatively from the medical records by the reviewing medical officers
+ This excludes the 51 cases with no response to these questions
- 46.6% of AEs caused minimal disability;
- 77.1% caused disability that was resolved within one year and
- 18.5% caused varying levels of permanent disability, including death (4.9%)
There was a statistically significant relationship between disability and preventability, with high preventability being associated with greater disability.
High preventability was found in:
- 51.2% of all AEs
- 57.8% of all AEs resulting in >50% permanent disability and
- 69.6% of AEs resulting in death.
Each of these figures was reported with an associated 95% confidence interval.
Discussion
The article's discussion covers:
- AE disability vs preventability,
- bed days attributable to AEs,
- breakdown of AEs by age, sex, diagnostic category, specialties, adverse drug reaction, location and other factors
- nature of AEs in terms of complexity, temporary or permanent disability, or death.
The article also provides a brief comparison between the QAHCS and the Harvard Medical Practice Study.
Analysis of the QAHCS article
- Validity of Analytical Methods
- Objectivity of Conclusions
- Operational Definition Preventability of an AE
- Causation, Preventability and AEs
- Generality of the Conclusions
- Abstraction and Generalisation
- Evidence from Multiple Observations
- Extrapolating to admissions related to AEs on a national basis
- Inadmissible Regression Model
- Ill-conditioned regression systems
- Logistic Regression Model (Error Analysis)
Validity of Analytical Methods
Taking the article on its own, it is not possible to confirm the validity of the analytical methods which led to the conclusions presented in the article. The reason for this is simply that the article presents material in a compressed manner, and in so far as analytical methods are outlined it is in the form of statements that “this particular technique” or “this particular software” was “used to analyse the data”.5 In the present context these techniques simply have to be taken at face value. Consequently the references in the article to “mathematical” and “statistical” techniques can not serve to establish the correctness of the results presented, the methods used or the validity of any conclusions drawn concerning the system under evaluation.
The present author distinguishes between the analytical techniques applied and result reproducibility, or the accuracy with which AE classifications were assigned and AE indicator variables were detected. From the various references in the article there is nothing to dispute the accuracy of the data – which substantially depends on the way the data was compiled. There are, however, distinctions between precision, accuracy and structural validity. In particular, precision and accuracy do not ensure the validity of conclusions which may subsequently be derived.6
Objectivity of Conclusions
Many of the premises or definitions on which the analysis is conducted are inherently subjective and as a consequence of this subjectivity, the conclusions can not represent objectively established fact. To take a specific example from the article:
“Preventability of an AE was assessed as “an error in management due to failure to follow accepted practice at an individual or system level”; accepted practice was taken to be “the current level of expected performance for the average practitioner or system that manages the condition in question.” (An AE was defined in the article's abstract – see above)
It is a matter of fact that in order to take an average the mathematical scheme describing the quantity in question must conform to the Interval Scale. Unlike temperature or incident response time, “the current level of expected performance” can not be seen to be described by a scale whose underlying structure is linear. To be more precise, scales of measurement based on elements of the Real Number Line are possible only because there exists an exact correspondence between what can be done with measurable properties of objects and what can be done with numbers (Stevens, 1946)7.
When measuring characteristics of objects, experimental operations are performed for classifying (determining equality), for rank-ordering, and for determining when differences and when ratios between the aspects of objects are equal. The empirical operations performed and the characteristics of the property being measured determine the type of measuring scale attained (Stevens, 1946)7a. The mathematical group structure of a scale is determined by the collection of algebraic functions which leave the scale form invariant. For a statistic to have any meaning when applying a particular scale, the statistic must be invariant under all the transformations permitted by that scale's mathematical group structure.
To put this technical analysis into plainer English, Stevens' paper identified the following four scales of measurement:
- Nominal (example: numbers on the back of football players)
- Ordinal (example: Moh’s scale of mineral hardness)
- Interval (example: Temperature in degrees Celsius or Fahrenheit – no absolute zero)
- Ratio (example: incident response time in seconds, minutes or other appropriate unit of time)
The nominal scale, being the most basic, allows for the use of numbers in the same way in which we may put numbers on the backs of football players or on racing cars. We can not draw any meaningful conclusions by evaluating the average of the numbers on the backs of a team of football players, nor can we validly conclude that race car number 05 averaged fifth place in the famous Mount Panorama Bathurst motor races. The only statistics that have any physical meaning in terms of the nominal scale of measurement are:
- the number of cases
- the mode (the most popular class)
- a contingency correlation
As an example of a contingency correlation, people may be classified according to mutually exclusive classes of hair colour such as (1) white / light, (2) dark-blonde / brown, (3) red, and (4) black. In this case the number of classes is 4 and the modal class would depend on the characteristics of the population from which the sample was being taken. A contingency correlation conducted over an appropriate population would be expected to show, for example, that black hair colour is associated with native Africans or indigenous Australians while those of English extraction tend not to have black hair.
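As a concrete illustration of a contingency correlation on nominal classes, the sketch below applies a chi-squared test of association (and Cramér's V) to an entirely invented cross-tabulation of hair-colour class against population group. The counts are made up for illustration only and carry no empirical weight; the sketch assumes scipy is available.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Invented counts: rows = population group, columns = hair-colour class
# (1) white/light, (2) dark-blonde/brown, (3) red, (4) black.
table = np.array([
    [40, 45,  8,  7],   # hypothetical sample of English extraction
    [ 2, 10,  1, 87],   # hypothetical native African / Indigenous Australian sample
])

chi2, p_value, dof, expected = chi2_contingency(table)
n = table.sum()
cramers_v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))   # contingency (Cramér's V) correlation

print(f"chi-squared = {chi2:.1f}, p = {p_value:.2g}, Cramér's V = {cramers_v:.2f}")
```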
The Ordinal Scale allows for the ordering or ranking of objects against some predefined scale such as Moh's Scale of mineral hardness. Under this scale hardness is ranked using ten solids arranged in such an order that a substance can scratch all substances below it in the scale, but can not scratch those above it. The Penguin Dictionary of Physics8 advises that Moh's hardness scale is not quantitative and states the reference substances in order of increasing hardness as (1) talc, (2) rock salt, (3) calcspar, (4) fluorspar, (5) apatite, (6) felspar, (7) quartz, (8) topaz, (9) corundum, (10) diamond. In addition to the statistics which have physical meaning when using the Nominal scale of measurement, it is possible to determine medians and percentiles when adopting the Ordinal scale of measurement. In particular it is NOT possible to determine or take an average in any meaningful manner when using a scale of measurement that conforms to only the Ordinal Scale.
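The restriction to medians and percentiles for ordinal data can be demonstrated numerically: any order-preserving relabelling of the categories leaves the median category unchanged, but it changes the "average". The sketch below uses hardness ranks in the style of Moh's scale, with an invented sample and an invented relabelling, purely as an example.

```python
import statistics

# Ordinal codes for a hypothetical sample of minerals on the 1-10 hardness scale.
ranks = [2, 3, 3, 5, 7, 7, 9]

# An order-preserving relabelling (e.g. stretching the top of the scale).
relabel = {2: 2, 3: 3, 5: 5, 7: 20, 9: 90}
relabelled = [relabel[r] for r in ranks]

print(statistics.median(ranks), statistics.median(relabelled))  # 5 and 5: the median is preserved
print(statistics.mean(ranks), statistics.mean(relabelled))      # ~5.14 vs ~20.43: the "mean" is not
```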
In order to determine an average it is necessary that the scale of measurement conform at least to the interval scale, which requires that it be possible to exactly determine equality of intervals. As an example, we can determine that the temperature difference between 25°C and 30°C is identical to the temperature difference between 45°C and 50°C since the Celsius temperature scale is linear9. Both temperatures differ by 5°C, so in this precise sense the concept of equality of intervals has a well defined meaning. Because equality of intervals may be precisely formulated, it is possible to determine averages, standard deviations, rank-order correlations and product-moment correlations in a self consistent manner.
It is however not possible to determine that a temperature of 20°C is half a temperature of 40°C. The problem is that we can not double -10°C in any self consistent manner, since 2 × (-10°C) would then need to correspond to -20°C! The problem arises because measurements of temperature in degrees Celsius do not have an absolute zero – the zero, being equivalent to the freezing point of water, has been chosen arbitrarily. By contrast, incident response time may be measured in seconds increasing from an “absolute zero” interval of time. Because the zero of time is not chosen arbitrarily, an incident response time of 2 minutes is exactly half that of an incident response time of 4 minutes. It is interesting to note that the units of the time measurement do not affect the equality of the ratios; for example 120 seconds (2 minutes) is exactly half of 240 seconds (4 minutes). The existence of an absolute zero makes it possible to consistently determine a coefficient of variation.
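The interval-versus-ratio distinction can also be checked numerically: ratios of Celsius temperatures are not preserved when the same temperatures are re-expressed in Fahrenheit, whereas ratios of durations are preserved under any change of time unit. A minimal sketch of the point made above:

```python
def c_to_f(c: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

# Interval scale: the ratio 40°C / 20°C = 2 is an artefact of the arbitrary zero.
print(40 / 20, c_to_f(40) / c_to_f(20))    # 2.0 vs ~1.53: the ratio is not invariant

# Ratio scale: 240 s / 120 s = 2 regardless of the unit used for time.
print(240 / 120, (240 / 60) / (120 / 60))  # 2.0 vs 2.0: the ratio is invariant
```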
The “expected level of performance” is compatible with only the Nominal scale and at best the Ordinal scale. “The expected level of performance” is not a quantitative scale and is most certainly not compatible with the Interval Scale, which is required to define the mean, standard deviation, rank-order correlation and product-moment correlation statistics. Consequently, any attempt to define a preventability measure based on the expected level of performance in terms of an “average practitioner” is subjective, counterproductive and wrong. It follows immediately from the subjectivity of the definitions adopted that the article's conclusions can not represent quantitatively established fact. One of the paper's foundation definitions is inherently subjective, and consequently the entire analysis presented in the article, from the point of the application of this definition onward, can be nothing other than qualitative. The statistics and mathematical techniques subsequently applied to the data have no bearing on the validity of this observation, since the observation derives from the very structure of the information presented in the article as interpreted in terms of an “average practitioner”.
Operational Definition Preventability of an AE
The present author suggests that the operational definition of preventability applied in the QAHCS was in fact:
Preventability of an AE was assessed as an error in management due to failure to follow accepted practice at an individual or system level. Accepted practice was taken to be the level of performance expected by the reviewing medical officer for the management of the condition in question.
Significantly, where medical practitioners do not share a common standard of expected performance there is no prospect of reaching agreement on the outcomes of the QAHCS. The subjective definition of an “average practitioner” like that of the “average driver” carries with it subjective and implicit references to particular performance criteria which will serve to conceal rather than reveal substantial points at issue.
Causation, Preventability and AEs
It is here of direct relevance that the Causation and Preventability scales stated above in the summary of the article are also not quantitative scales. They are purely qualitative and can be said to conform to an ordinal scale of measurement only in so far as medical records can be consistently and objectively placed into each of their respective categories. Moh's scale of hardness is objective in the sense that either an object will or will not scratch quartz in a well-defined procedural test. However, for the causality and preventability scales of measurement we are dealing with “measurements” that can not be taken with the same level of objectivity. There is clearly no way that the Causation or Preventability scales can be characterised in terms of a linear model.
There appears to be an inherent conflict between the definitions of adverse events, causation and preventability. According to the definition stated in the article an adverse event must be “caused by health care management rather than the patient's disease.”
A scale of 1-6 was used to determine whether the AE was caused by health care management or the disease process. An AE was assigned to causation category 1 if there was “virtually no evidence for management causation”. The present author would argue that any AE assigned to category 1 is, within the context of the definition of an AE, not an adverse event. Indeed the authors of the QAHCS seem to share this view:
“If either of the first two elements of the adverse event definition was not satisfied, or there was no causation (causation score 1), the review ceased (“no AE”).” (p 463 col 3)
Causation categories 1, 2 and 3 read:
1 Virtually no evidence for management causation
2 Slight to modest evidence for management causation
3 Management causation not likely; less than 50-50 but close call
The first three preventability categories read:
No preventability
1 Virtually no evidence for preventability
Low preventability
2 Slight to moderate evidence for preventability
3 Preventability not likely, less than 50-50 but close call
In addition to the fact that these dual definitions seem to present an inefficient and inconsistent use of concepts, the present author draws attention to Table 3 from the article (reproduced above). It seems to the present author that this table would have to be modified as follows to reflect actual adverse events (events associated with probable management failure). The underlying issue requiring attention is the number of AEs which were both preventable and substantially increased the baseline health risk to the patient. No procedure or care can be provided with 100% safety, and to the present author it seems attention should be focussed only on those AEs which are deemed preventable. The following table has been constructed by back-calculating from the figures in Table 3 after removing the AEs which were found not to be preventable in terms of management practices.
Table: Adverse Events and Patient Management
Disability | Low preventability | High preventability | Total Adverse Events |
---|---|---|---|
Less than 1 month | 319 (39%) | 504 (61%) | 823 |
1-12 months | 211 (36%) | 379 (64%) | 590 |
Permanent (<50%)* | 67 (41%) | 96 (59%) | 163 |
Permanent (>50%)* | 28 (31%) | 63 (69%) | 91 |
Death | 29 (27%) | 78 (73%) | 107 |
Unable to determine/unknown+ | 31 (34%) | 59 (66%) | 90 |
Total | 685 (37%) | 1179 (63%) | 1864 (100%) |
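The modified table above can be reproduced directly from the article's Table 3 by dropping the "no preventability" column and renormalising each row, as sketched below. The Table 3 figures used are those quoted earlier in this review; the rounding convention is the obvious one.

```python
# (disability, no %, low %, high %, total AEs) as quoted from Table 3 of the article.
table3 = [
    ("Less than 1 month",           23.3, 29.7, 47.0, 1073),
    ("1-12 months",                 16.0, 30.1, 54.0,  702),
    ("Permanent (<50%)",            20.9, 32.5, 46.6,  206),
    ("Permanent (>50%)",            16.5, 25.7, 57.8,  109),
    ("Death",                        4.5, 25.9, 69.6,  112),
    ("Unable to determine/unknown", 10.0, 31.0, 59.0,  100),
]

for disability, no_pct, low_pct, high_pct, total in table3:
    low = round(low_pct / 100 * total)     # AEs with low preventability
    high = round(high_pct / 100 * total)   # AEs with high preventability
    kept = low + high                      # AEs remaining once "no preventability" is excluded
    print(f"{disability:30s} {low:4d} ({low/kept:4.0%})  {high:4d} ({high/kept:4.0%})  {kept:4d}")
```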
In light of these considerations, the present author would argue that, based on the study's own data and inferences, and to more accurately reflect the conclusions of the article, the headline summary should be rewritten to state:
“A review of medical records of over 14,000 admissions to 28 hospitals in New South Wales and South Australia revealed that, in the opinion of an absolute majority of 2 out of a maximum of 3 reviewing medical officers, 13.1% of these admissions were associated with an “adverse event”, which resulted in disability or a longer hospital stay for the patient. Of these AEs 63% were deemed to have been highly preventable in terms of patient treatment or management. …”. (Med J Aust 1995; 163: 458-471)
Generality of the Conclusions
Abstraction and Generalisation
“An Adverse Event (AE) was defined as an unintended injury or complication which results in disability, death or prolonged hospital stay and is caused by health care management” (p 459 col 1)
The article advises that:
“The preventability scale was applied uniformly to all hospitals regardless of size or available resources” (Definitions box p 461 final sentence)
The theory of Mathematical Statistics is concerned with obtaining all and only those conclusions for which multiple observations are evidence. Mathematical statistics is not merely the handling of facts stated in numerical terms (Kaplan10; 1961). The procedures of abstraction and generalisation significantly affect the utility of data for analytic purposes. It is important to establish the extent and characteristics of the detail lost in the process of generalisation as this affects the nature of the thematic content of the information (Sinton11; 1978).
While medical malpractice is never excusable, this definition of an adverse event and the blanket application of the preventability scale fail to take any account of the underlying risk to the patient and the surrounding circumstances.
The Collins Dictionary of Mathematics12 explains that the logical proposition known as Buridan's ass, dating back to the days of Aristotle, takes a modern form in terms of a fireman who ends up losing two “equivalent” burning buildings since, on the basis of logic alone, he is unable to decide which to save first. This is not to be confused with Nero's fiddling: Nero didn't care, while the fireman was simply indecisive. To ensure that conclusions concerning adverse events are not exaggerated, and are properly considered, AEs must be considered with regard to the baseline health risk, the risks of other procedures and/or the risks of not acting at all.
Evidence from Multiple Observations
More than semantics is involved here. Statistical analysis is not simply the ability to manipulate numbers, but rather the ability to derive valid conclusions for an extended data set based on an analysis of an appropriately selected subset. This requires encapsulating all the relevant sources of variability and then being able to either “control” or “appropriately randomise” across them. Using the definition of an “Adverse Event” adopted in the article and uniformly applying the preventability scale, this is simply not possible.
Extrapolating to admissions related to AEs on a national basis
In order to extrapolate from the sample data set to estimate the number of hospital admissions related to AEs on a national basis it is necessary to assume both that:
- The standard of expected performance applied by the classifying medical officers is representative of accepted professional medical opinion
- The sample of hospitals and patient records were representative of the national data set
The article has picked up on only the second of these two points and gives no consideration to the first. Professional medical opinion in the respective specialties that are the subject of each of the reviews in question may diverge significantly. Consequently, the conclusions of the QAHCS can be stated only in terms of the conclusions of the medical officers who reviewed the medical records. The article's conclusions can not be generalised beyond the assigned classifications of the medical officers who conducted the review.
The discussion in the article claims that the analysis was applied to a “representative sample of Australian hospitals.” However, this claim can not be reconciled with the facts presented in the article, which show that the hospitals were chosen for logistical and not statistical reasons. The hospitals chosen to participate in the survey were in only two states (NSW 23 and SA 8) and, significantly, hospitals having fewer than 3000 admissions per annum were excluded from the study. Of the 31 hospitals chosen to participate, one declined the invitation and another two were ruled out because their records were on microfiche.
For each of the 28 hospitals that participated in the survey a minimum of 520 eligible admissions were “randomly” selected from inpatient databases. The total number of records sampled was 14655; dividing this total by 28 (the number of hospitals participating in the survey) yields 523.4. Consequently, results from this survey can not be directly applied to make predictions concerning characteristics of the patient population for the sampled hospitals – let alone on a national basis. The problem here appears to be that 520 records were extracted from each hospital regardless of its annual patient intake. To infer characteristics of the entire inpatient group, as a population, it is necessary to weight the number of sampled records to reflect the size of each hospital's intake (a numerical sketch of this weighting issue is given below). In short, a patient visiting a hospital with say 100,000 inpatient records, from which 500 were sampled, would be half as likely to have their record sampled as one who attended a hospital with only 50,000 inpatient records. This issue becomes particularly serious in light of the observation that the hospitals themselves may be further categorised as (p 460 col 1):
- Teaching or principal referral hospitals (number of sampled hospitals n=10)
- Major referral hospitals (n=4)
- Major rural base hospitals (n=2)
- District high activity level hospitals (n=3)
- District medium activity level hospitals (n=6)
- Private hospitals (n=6)
Only 28 of these 31 hospitals were sampled. The article does not appear to state to which categories the three unsampled hospitals belonged. An intermediate resolution of this difficulty may be obtained by comparing the breakdown of AEs on a hospital-by-hospital basis to see whether there were any differences in their respective rates and categories of AE.
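To illustrate the weighting point, the sketch below contrasts an unweighted pooled estimate with an admission-weighted estimate for two entirely hypothetical hospitals of different sizes, each contributing roughly 520 sampled records. All figures are invented; the point is only that the two estimators diverge whenever AE rates differ between hospitals of different sizes.

```python
# Hypothetical hospitals: (annual admissions, sampled records, AEs found in the sample).
hospitals = [
    (100_000, 520, 52),    # large hospital, 10% sample AE rate (invented)
    ( 20_000, 520, 104),   # small hospital, 20% sample AE rate (invented)
]

sampled = sum(n for _, n, _ in hospitals)
aes = sum(a for _, _, a in hospitals)
unweighted = aes / sampled                       # treats every sampled record equally

total_admissions = sum(adm for adm, _, _ in hospitals)
weighted = sum(adm * (a / n) for adm, n, a in hospitals) / total_admissions

print(f"unweighted estimate: {unweighted:.1%}")  # 15.0%
print(f"weighted estimate:   {weighted:.1%}")    # 11.7%, dominated by the larger hospital
```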
A significant issue here is that the article extrapolates the estimated proportion of admissions associated with an AE from the survey (16.6%) to all Australian hospitals and concludes that about 470,000 admissions (16.6% of 2.82 million admissions nationally) are associated with AEs annually in Australian hospitals. But by the article's own calculation 19% of these AEs were deemed to have NO preventability and a further 29.8% of them were deemed to have low preventability. These adjustments to the raw (non-extrapolated) figures were undertaken in the modified version of Table 3 shown above.
Simply by excluding from consideration those AEs from Table 3 which, in the opinion of the medical officers undertaking the evaluations, were not preventable, the predicted number of admissions extrapolated on a national basis falls by 19%, or 89,300 admissions. Further removing from consideration those 29.8% having low preventability (Table 3) removes another 140,060 from the national prediction. This reduces the extrapolated figure of about 470,000 admissions associated with AEs to 240,640.
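The arithmetic behind these reduced national figures is reproduced below, using the proportions from Table 3 and the article's extrapolated total of about 470,000 admissions.

```python
national = 470_000               # article's extrapolated AE-associated admissions (approximate)
no_prev, low_prev = 0.19, 0.298  # proportions with no and low preventability, from Table 3

removed_no = national * no_prev            # 89,300 admissions with no preventability
removed_low = national * low_prev          # 140,060 admissions with low preventability
remaining = national - removed_no - removed_low

print(f"{removed_no:,.0f} {removed_low:,.0f} {remaining:,.0f}")  # 89,300 140,060 240,640
```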
The article concludes by advising:
“Our results can be used in the policy debates on patient education, litigation in health care, … including the development of safer protocols for patients. The implications in terms of preventable adverse outcomes for patients and use of health care resources are substantial”
To avoid significantly unfounded conclusions, it is essential that objectively measurable and consistently interpreted concepts be used to guide procedural and legal matters.
Inadmissible Regression Model
The article explains that 18 criteria were used to identify circumstances where Adverse Events were possible. These criteria are listed in Table 1 of the article and were reproduced above. On page 463 (column 3) of the article it is stated that:
“A logistic regression model with AE as an outcome and all 18 criteria as predictor variables found that five criteria (5, 10, 13, 16, and 17) were not statistically significant at the 0.01 level.”
Exactly what is meant here is not at all clear. The article is not sufficiently descriptive to permit the level of analysis required to determine exactly which, if any, conclusions may be drawn in this regard. However, logistic regression (as opposed to correlation analysis) is a technique generally used to investigate the association between a dependent variable and one or more independent variables. For example (Cooper13; 1969), we may seek to describe a mature man's weight (W) as a function of height (H), waist measurement (S), and back length (B). Under such circumstances a model of the following type could be proposed:
W (kilograms) = A (a constant) + a·H + b·S + c·B    (*)
where a, b and c are multiplicative coefficients and A is a base “mass constant”. In order to undertake such an analysis it is necessary that the variables being considered be independent and describable at the very least in terms of the Interval Scale. A comparison between the 18 predictor variables for adverse events and the inherent structure of the Interval Scale outlined above reveals that these variables can not possibly satisfy the criteria. Consequently an analysis of these types of variables in terms of a logistic regression, as described here, is inadmissible. The 18 predictor variables, cited above, conform to only the nominal scale of measurement since they can not be validly ordered and the concept of equality of intervals has no meaning for them. The only statistical methods which can be validly applied when using the 18 predictor variables are:
- the number of cases
- the mode (the most popular class)
- a contingency correlation
Consequently, the only possible statistical relationship of relevance to the 18 predictor variables seems to be the contingency correlation. The present author can see no way in which a contingency correlation may be determined by applying a logistic regression model to a system consisting of 18 predictor variables which conform to only the nominal scale.
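For contrast with the nominal AE criteria, the kind of model Cooper describes, with genuinely interval/ratio variables such as height, waist and back length, can be fitted by ordinary least squares. The sketch below uses synthetic data; the coefficients and noise levels are invented purely for illustration and are not drawn from Cooper's book.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic interval/ratio data: height H (cm), waist S (cm), back length B (cm).
n = 200
H = rng.normal(178, 7, n)
S = rng.normal(92, 10, n)
B = rng.normal(45, 3, n)
W = 5 + 0.45 * H + 0.25 * S + 0.10 * B + rng.normal(0, 2.5, n)   # invented "true" relation

# Fit W = A + a·H + b·S + c·B by least squares.
X = np.column_stack([np.ones(n), H, S, B])
coef, *_ = np.linalg.lstsq(X, W, rcond=None)
print("A, a, b, c ≈", np.round(coef, 2))
```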
Ill-conditioned regression systems
Putting the significant and basic problems with the regression analysis aside, it is very well known that systems of equations such as those represented by the regression equation (*) are generally “ill-conditioned.” A second-year university text in algebra (Hill14; 1986) describes this as a computational tragedy and explicitly states:
“Ill conditioned problems are very difficult to handle because if we wish a prescribed number of significant figures in the solution, we must determine accurately many more significant figures in the constants we start with. This is undesirable at best, and may even be impossible if the constants are obtained from physical data.
It is surprising and unfortunate how many approaches to real-world problems lead to ill conditioned systems. When this happens, alternative approaches that lead to less ill-conditioned systems must be found”
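The sensitivity Hill describes is easy to exhibit. In the sketch below a nearly collinear pair of equations is solved, then solved again after a tiny perturbation of one constant; the solution changes grossly. The particular numbers are invented for illustration only.

```python
import numpy as np

# A nearly singular (ill-conditioned) 2x2 system A x = b.
A = np.array([[1.000, 1.000],
              [1.000, 1.001]])
b = np.array([2.000, 2.001])

x_original = np.linalg.solve(A, b)                    # [1, 1]
x_perturbed = np.linalg.solve(A, b + [0.0, 0.0005])   # tiny change in the constants...

print("condition number ≈", round(np.linalg.cond(A)))  # ~4000: badly conditioned
print(x_original, x_perturbed)                         # ...moves the solution to about [0.5, 1.5]
```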
Cooper's13a book, written in 1969, states that a large number of computer programs exist to perform multiple regression and provides warnings concerning the practical application of regression analysis to the sort of problems outlined. Cooper goes to some length to develop an analytical program which exploits generally accepted methods deriving from orthogonal polynomials. But again the techniques of orthogonal polynomials require, at the very least, that it be possible to define equality of intervals, which can not be done in the current circumstances when using the 18 predictor variables.
Logistic Regression Model (Error Analysis)
The classic university text Advanced Engineering Mathematics (Kreyszig15; 1988) states the sum of independent normal random variables theorem as follows:
Theorem (Sum of independent normal random variables)
Suppose that X1, X2, …, Xn are independent normal random variables with means μ1, μ2, …, μn and variances σ1², σ2², …, σn², respectively. Then the random variable
X = X1 + X2 + … + Xn
is normal with
mean μ = μ1 + μ2 + … + μn
and variance σ² = σ1² + σ2² + … + σn²
This theorem is of considerable significance in the current circumstances, where 18 criteria have been applied as predictor variables. In circumstances where a mean can be defined (Interval Scale) for independent variables, the total variance is given by the Pythagorean-style addition formula shown. Where the variables are not independent, the total variance may be greater or less than that given by this formula, depending on the sign of the covariances between the variables. However, under circumstances where the variables are not independent, regression methods are prone to multicollinearity and can lead to significant computational errors.
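The variance-addition theorem, and its failure when the variables are correlated, can be checked by simulation. The sketch below is illustrative only, with arbitrary invented means and variances.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Independent case: Var(X1 + X2) = s1^2 + s2^2.
x1 = rng.normal(1.0, 2.0, n)           # variance 4
x2 = rng.normal(3.0, 3.0, n)           # variance 9
print(np.var(x1 + x2))                 # ≈ 13

# Dependent case: a shared component induces positive correlation,
# so the variance of the sum exceeds the simple Pythagorean total.
shared = rng.normal(0.0, 2.0, n)
y1 = shared + rng.normal(0.0, 1.0, n)  # variance 5
y2 = shared + rng.normal(0.0, 1.0, n)  # variance 5
print(np.var(y1 + y2))                 # ≈ 18, not 10
```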
This suggests that, even if it were possible to apply a logistic regression analysis to the predictor variables under consideration (which, for the reasons given above, it is not), an analysis of the errors involved in the process and of the mutual dependency between some of the variables would be likely to show that the statistical errors inherent in performing the analysis render conclusions based on it unsafe.
Summary of Conclusions
The dual definitions of an AE and the application of the preventability and causality scales seem to present an inefficient and inconsistent use of concepts. According to the definition presented in the article, AEs must be caused by health care management rather than the patient's disease. However, AEs were assigned to preventability category 1 if there was “virtually no evidence for preventability”.
It seems to the present author that the underlying issue requiring attention is the number of AEs which were both preventable and substantially increased the baseline health risk to the patient. These are the events which, in the opinion of the classifying medical officers, can be associated with probable health care management failure.
Simply by excluding from consideration those AEs which, in the opinion of the classifying medical officers, were not preventable, the predicted number of admissions extrapolated on a national basis falls by 19%, or 89,300 admissions. Further removing from consideration those 29.8% having low preventability (Table 3) removes another 140,060 from the national prediction. This reduces the nationally extrapolated figure of about 470,000 admissions associated with AEs to 240,640. However, these figures are also subject to the general conclusions stated in this review and in the immediately following summary of conclusions. The present author therefore strongly advises against adopting any of these figures as indicative of adverse patient events on a national basis.
This review has been primarily concerned with the mathematical foundations of the QAHCS as presented in the article. Based on the inherent structure of the information presented in the article and the facts presented in this review the present author argues that:
- The selection criteria adopted by the reviewing medical officers in determining the expected level of performance of an “average practitioner” are not stated.
- The QAHCS article presents no evidence to suggest that the classifying decisions of the medical officers were (or are) representative of accepted professional medical opinion in the respective specialties that are the subject of each of the reviews in question.
- The hospitals selected for medical record review were not selected so as to permit valid inferences concerning adverse patient outcomes on a national basis and the sampling error associated with their selection as nationally indicative centres can not be directly estimated.
- The extrapolation in the article from the sample data set to estimate the number of hospital admissions related to AEs on a national basis has no valid mathematical foundation.
- The results reported in the QAHCS article do not and can not represent quantitatively established fact.
- The QAHCS, as presented in the article, is essentially a numerical analysis of the classification assignments of the reviewing medical officers.
- As presented in the article, the QAHCS does not deal with the quality of Australian health care on a national basis and therefore it should not be cited as such in discussions concerning national health care quality.
References & Notes
1Brennan TA, Leape LL, Laird N et al. Incidence of adverse events and negligence in hospitalised patients: results of the Harvard Medical Practice Study. N Engl J Med 1991; 324: 370-376. Cited from: Wilson et al “The Quality in Australian Health Care Study” (Medical Journal of Australia. Vol. 163. 6 November 1995).
2 1994 Quality in Australian Health Care Study (QAHCS). Commissioned by the Commonwealth Department of Human Services and Health. Cited from: Wilson et al “The Quality in Australian Health Care Study” (Medical Journal of Australia. Vol. 163. 6 November 1995).
3The article states that in the QAHCS an index of preventability was used instead of a determination of negligence as in the Harvard study
4, 4aNumbered references for the Harvard Medical Practice Study. From: Harvard Study Continues to Distort Health Care Quality Debate, by Richard E. Anderson, M.D., F.A.C.P. http://www.thedoctors.com/Resources/Articles/RAPIAA598.htm
- Brennan TA, Leape LL, Laird NM, Hebert L, Localio AR, Lawthers AG, et al. Incidence of Adverse Events and Negligence in Hospitalized Patients: Results of the Harvard Medical Practice Study. New England Journal of Medicine. 1991; 324: 370-6.
- Localio AR, Lawthers AG, Brennan TA, Laird NM, Hebert LE, Peterson LM, et al. Relation Between Malpractice Claims and Adverse Events Due to Negligence. New England Journal of Medicine. 1991; 325: 245-51.
- Leape LL, Brennan TA, Laird N, Lawthers AG, Localio AR, Barnes BA, et al. The Nature of Adverse Events in Hospitalized Patients. New England Journal of Medicine. 1991; 324: 377-84.
- Weiler PC, Hiatt HH, Newhouse JP, Johnson WG, Brennan TA, Leape LL. A Measure of Malpractice. Cambridge: Harvard University Press; 1993: 175.
- Weiler PC, Newhouse JP, Hiatt HH. Proposal for Medical Liability Reform. Journal of the American Medical Association. 1992; 267: 2355-8.
- Weiler PC, Brennan TA, Newhouse JP, Leape LL, Lawthers AG, Hiatt HH, et al. The Economic Consequences of Medical Injuries. Journal of the American Medical Association. 1992; 267: 2487-92.
(these references cited from the articles attributed to Richard E. Anderson referenced in the text)
5For example, the article says: “The QAHCS used a stratified two-stage cluster sample to choose eligible admissions for review, … SUDAAN software was used to obtain estimates of proportions and their SEs and to perform logistic regression analyses, as it adjusts for the sampling design.”
6 (i) precisioninfo.com: Structure of Information & Constraints to Analysis (ii) Riversinfo Australia: Australian Map Accuracy Standards & Correlation Analysis
7 7a Stevens SS. On the Theory of Scales of Measurement. Science; Volume 103; Number 2684; June 1946. (See also: (i) precisioninfo.com: Structure of Information & Constraints to Analysis)
8 Penguin Dictionary of Physics. An abridgement of Longman's A new dictionary of physics, first published 1958. ISBN 0 14 051.071 0
9 Observant readers may recall that there exist direct linear transformations between temperatures stated in Fahrenheit, Kelvin and Celsius. The “absolute zero of temperature”, defined as zero kelvin, where particles are believed to occupy their lowest energy states, is not a matter it is instructive to digress to consider at this time.
10 Kaplan A. Sociology Learns the Language of Mathematics. In The World of Mathematics. Edited by JR Newman. Published by Allen and Unwin; Britain; 1961.
11 Sinton D. The Inherent Structure of Information as a Constraint to Analysis: Mapped Thematic Data as a Case Study. Harvard Papers on GIS. First International Advanced Study Symposium on topological data structures for Geographic Information Systems. Edited by G Dutton. Volume 7; 1978.
12 Collins Dictionary of Mathematics. Borowski EJ & Borwein JM. Published by Harper and Collins; Great Britain; 1989.
13 13a Cooper BE. Statistics for Experimentalists. Atlas Computer Laboratory, Chilton, Didcot, Berkshire. Pergamon Press. 1969 (Page 233)
14 Hill R.O. Elementary Linear Algebra. Published by Michigan State University 1986.
15 Kreyszig E. Advanced Engineering Mathematics. 6th Edition Published by John Wiley and Sons, New York 1988. (Page 1253)