A meta-analysis uses statistical techniques to combine multiple “real” studies into a new “almost as if” study: it is “almost as if” a single study had actually been conducted in the same way as the included studies. The purpose of a meta-analysis is to contrast and combine results, creating a composite study more powerful than any of its component studies. This is what Nissen and Wolski performed for their controversial New England Journal of Medicine publication, which suggested that rosiglitazone (Avandia, Avandamet, and Avandaryl; GlaxoSmithKline) was responsible for an increase in myocardial infarctions (“heart attacks”) and deaths due to cardiovascular (heart or blood vessel) causes.1 A meta-analysis can never be better than a real study designed the same way, but it can be worse. There are a number of pitfalls in performing a meta-analysis.
Both of Nissen and Wolski’s meta-analyses1,3 were seriously flawed. They based them upon a biased (non-systematic) selection of component studies, they included studies that arguably should not have been combined with the others, they made no attempt to validate the data, and they used unsuitable statistical techniques.2 All of these errors in their first meta-analysis1 were described in my December 18, 2013 post, “Steven Nissen and The Not-So-Great Avandia Controversy.” In addition, more patients in the control groups dropped out (because of elevated blood glucose levels), resulting in less follow-up for them than for the rosiglitazone groups.4 The longer follow-up for subjects treated with rosiglitazone obviously gave them a longer period in which to experience a cardiovascular event, biasing the results against rosiglitazone.
Is that all that they did wrong? No.
A problem known as “multiple comparisons” was a serious defect in both of Nissen and Wolski’s meta-analyses. As far as I can tell, no one else has pointed out this critical error, at least in print. [Note: Since this posting, I have found a possible reference to the problem: “The initial concern with rosiglitazone arose from observational and case–control epidemiologic studies that generated a legitimate signal of possible cardiovascular harm, but every study had substantial methodologic shortcomings, including multiplicity, which meant that a statistically positive finding might be a false positive result.”5 –Myron Shank, M.D., Ph.D., December 21, 2013.]
Assuming that the data represent truly random (chance) samplings, statistics can be used to estimate how frequently differences as large as, or larger than, those observed would occur in samples drawn from the same population. It is usually assumed that the data represent different populations, rather than the same population, if random samples from the same population would, on average, differ as much as, or more than, those actually observed less than one time out of twenty. Stated more conventionally, if there is less than a 5% probability (P<0.05) of seeing results as different as, or more different than, what was observed, the difference is usually said to be “statistically significant.” (In reality, the choice of probability should depend upon the relative costs of erroneously concluding that the samples come from the same population or that they come from different populations, but this is seldom considered in medicine.)
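What “P<0.05” means in practice can be illustrated with a small simulation (a sketch of the concept only, not anything taken from the papers under discussion): draw two samples from the same population, over and over, and count how often their means differ by more than the conventional cutoff. The function name and parameters below are my own, chosen for illustration.

```python
import random
import statistics

random.seed(1)  # reproducible illustration

def same_population_false_positives(trials=10_000, n=50, z_cut=1.96):
    """Draw two samples from the SAME normal population and count how
    often their means differ by more than the conventional 5% cutoff."""
    hits = 0
    for _ in range(trials):
        a = [random.gauss(0, 1) for _ in range(n)]
        b = [random.gauss(0, 1) for _ in range(n)]
        # Standard error of the difference between the two sample means
        se = (statistics.variance(a) / n + statistics.variance(b) / n) ** 0.5
        z = (statistics.mean(a) - statistics.mean(b)) / se
        if abs(z) > z_cut:
            hits += 1
    return hits / trials

print(same_population_false_positives())  # close to 0.05
```

Even though every sample comes from the same population, roughly one comparison in twenty crosses the “significant” threshold by chance alone.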
The more times that you make comparisons, however, the more likely you are to see unusual differences. If you sample enough times, you will eventually see even differences that occur very rarely. The probability (P) is for a single comparison, not for multiple comparisons.
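The arithmetic is simple enough to show directly. If each comparison is tested at P<0.05, the chance of at least one false positive among n independent comparisons is 1 − (1 − 0.05)^n. (This formula assumes independent comparisons, which Nissen and Wolski’s overlapping analyses were not, so the figures below illustrate the principle rather than provide an exact correction.)

```python
# Familywise false-positive probability for n independent comparisons,
# each tested at alpha = 0.05. Illustration only: overlapping
# comparisons are not independent, so these figures are a rough guide.

def familywise_error(n, alpha=0.05):
    """Probability of at least one false positive in n independent tests."""
    return 1 - (1 - alpha) ** n

for n in (1, 18, 34):
    print(f"{n:2d} comparisons: P(at least one false positive) = {familywise_error(n):.2f}")
```

At eighteen comparisons the chance of at least one spurious “significant” result is about 60%; at thirty-four, about 83%.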
So, what did Nissen and Wolski do?
In their original analysis,1 Nissen and Wolski performed not one but eighteen comparisons. Their “overall” analyses overlapped not only with their analyses of “small trials,” but with the DREAM and ADOPT trials, as well. Cardiovascular deaths overlapped with myocardial infarctions. “Combined comparator drugs” overlapped with the individual drugs (metformin, sulfonylurea, insulin, and placebo).
Worse, in their revised meta-analysis,3 Nissen and Wolski performed not one but thirty-four comparisons. Moreover, cardiovascular deaths still overlapped with myocardial infarctions, short-term and long-term studies overlapped with “overall” analyses, “overall” analyses overlapped with their component analyses, analyses without RECORD overlapped with those with RECORD, and their “Peto” analyses overlapped with their analyses that included studies without any cardiovascular events or deaths. It is utterly impossible to fully correct for the errors introduced by these interrelated multiple comparisons, but, with their marginally “significant” results, even the most timid attempt would have completely eliminated any appearance of “significant” differences.
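For a sense of scale, consider the simplest (and crudest) adjustment, the Bonferroni correction, which divides the significance threshold by the number of comparisons. The p-value below is hypothetical, chosen only to stand in for a “marginally significant” result:

```python
# Bonferroni correction: divide the per-comparison threshold by the
# number of comparisons made. The observed p-value is hypothetical,
# chosen only to illustrate a "marginally significant" finding.
alpha = 0.05
n_comparisons = 34
adjusted_alpha = alpha / n_comparisons  # 0.05 / 34, about 0.0015

p_observed = 0.03  # hypothetical marginal result
print(f"Adjusted threshold: {adjusted_alpha:.5f}")
print(f"P = {p_observed} significant after correction? {p_observed < adjusted_alpha}")
```

Against a threshold of roughly 0.0015, a marginal result such as P = 0.03 does not survive even this most elementary attempt at correction.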
Fortunately, there is an easy solution. If, and only if, their “overall” analysis for all cardiovascular outcomes and deaths were statistically “significant,” Nissen and Wolski would have been justified in performing sub-analyses. In other words, if they knew that there was a difference somewhere in the data, Nissen and Wolski were entitled to search for it without fear of falsely creating the appearance of a difference. This is sometimes known as a “protected” statistical test, because the sub-analysis is “protected” by the knowledge that the overall results were unlikely to have occurred by chance. In their original meta-analysis, Nissen and Wolski could properly have performed their sub-analyses, if a correct analysis of the data had shown overall “significant” differences between rosiglitazone and other drugs or placebo. However, as I discussed in my December 18, 2013 post, “Steven Nissen and The Not-So-Great Avandia Controversy,” that was not the case. In the absence of a true “overall” difference, comparing rosiglitazone with all non-rosiglitazone treatments, their sub-analyses were invalid.
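The “protected” logic amounts to a simple gate: sub-analyses may be examined only when the overall comparison is itself significant. Here is a minimal sketch, with hypothetical p-values and a function name of my own choosing:

```python
# A minimal sketch of a "protected" (gatekeeping) testing procedure:
# subgroup comparisons are examined only if the overall comparison is
# itself significant. All p-values here are hypothetical.

def protected_tests(overall_p, subgroup_ps, alpha=0.05):
    """Return subgroup results only when the overall test passes the gate."""
    if overall_p >= alpha:
        # The gate fails: no subgroup may be declared significant.
        return {name: False for name in subgroup_ps}
    return {name: p < alpha for name, p in subgroup_ps.items()}

# Hypothetical example: a non-significant overall result blocks every
# subgroup claim, no matter how small a subgroup p-value looks.
subgroups = {"myocardial infarction": 0.03, "cardiovascular death": 0.06}
print(protected_tests(overall_p=0.26, subgroup_ps=subgroups))
# every subgroup is reported as not significant
```

The gate is the whole point: without a significant overall result, a small subgroup p-value is exactly the kind of chance finding that multiple comparisons are expected to produce.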
While someone might attempt to excuse this statistical breach on the basis that Nissen and Wolski (incorrectly) believed that their “overall” results were “significantly” different,1 no such rationalization is possible for their revised meta-analysis. In that study, their “overall” results unambiguously showed no difference between treatment with rosiglitazone and treatment without rosiglitazone.3 Since their “overall” results were not “significantly” different, Nissen and Wolski were not entitled to test for the source of the non-existent difference. They proceeded to do exactly that anyway, claiming to find “significant” differences3 that, by definition, did not exist in the data. Worse, Dr. Nissen has adamantly insisted that these phoney results trump everything else.
The bottom line is that Nissen and Wolski’s own revised meta-analysis failed to show any evidence of a difference between treatments that included rosiglitazone and those that did not. Period. All of the hubris in the world cannot overcome that simple fact.
©2013 Myron Shank, M.D., Ph.D.
4 Home Philip D., Pocock Stuart J., Beck-Nielsen Henning, Gomis Ramón, Hanefeld Markolf, Jones Nigel P., Komajda Michel, McMurray John J.V., for the RECORD Study Group. Rosiglitazone evaluated for cardiovascular outcomes—an interim analysis. New England Journal of Medicine 2007; 357:28-38.