Circling the Waves

We have had another triumph from LIGO while I was away: a second gravitational wave detection (called GW151226) was claimed, corresponding to a December 26 event I alluded to in an earlier post. In that post, I expressed hope that the limitations of the first observation (GW150914) would soon be remedied by additional observations.

But since the claimed second detection, I’ve felt the opposite effect. Rather than having my initial qualms about these results allayed, I’m getting increasingly uncomfortable discussing or even thinking about this subject, because what I see continues to fall short of what so many experts and so much of the press have been claiming for it. Much of what follows is my lengthy, wandering attempt to get a grip on what it is that makes this result so uncomfortable–to clarify to myself on the one hand the reasoning behind claiming a detection and on the other, the potential weaknesses of that reasoning.

As with the earlier event, the announcement of GW151226 was greeted by adulatory reports with no trace of questioning or reservation; a case of big science conquering all, with no ifs, ands, or buts. But after actually looking at the data on which this second detection is being claimed so triumphantly, it becomes obvious that the signal is extremely weak; so weak, in fact, that it is completely impossible to pick anything resembling a “ringdown” signal out of the normal detector and other background noise. In the words of the new LIGO paper’s authors,

“Matched filtering was essential to the detection of GW151226 since the signal has a smaller strain amplitude and the detectable signal energy is spread over a longer time interval than GW150914”.

Effectively, the signal in GW151226 is so weak and so buried in detector noise that there is no way to directly derive it from the data–say by some straightforward filtering or cleaning-up process or, as was the case for GW150914, just from the naked-eye-obvious shape and temporal overlap of the two interferometers’ data and their resemblance to the predicted “ringdown” signal of a black hole merger. (You can take a look at Figure 1 of the paper announcing the detection to get an idea of just how icky it looks.)

The words of the LIGO scientists hardly feed my enthusiasm for this detection. “Because of the signal’s smaller strain amplitude and time-frequency morphology,” the paper’s authors write, “the generic transient searches that initially identified GW150914 did not detect GW151226.” On the other hand, in the MIT Tech Review article on the detection, we hear one of the LIGO scientists say of the signal, “It’s totally buried inside the noise.”

To get around the issue of this “totally buried” signal, the LIGO authors had to employ something called “matched filtering”. This is a statistical technique whereby one can probe noisy data for chosen signals of interest. In the words of the authors, matched filtering “correlates a waveform model with the data over the detectors’ sensitive band, which enabled GW151226 to be extracted from the detector noise.”

Provided you have a good enough understanding of the statistics and behavior of the instrumental noise itself (which I am sure the LIGO team does, as they state that “Detection and parameter estimation rely on understanding the sources of detector noise”), matched filtering lets you statistically compare signals derived from different hypotheses with a set of noisy data. For each time point in the data, it tells you the best-fit scaling of each hypothesis-filter and gives you a statistical measure telling you how (un)likely it is that the best-fit of each hypothesis matches the data as well as it does by chance alone. The less likely that the match is by chance, the more significant your hypothesis, i.e. the more likely that the signal you’re looking for is really “there”.
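For the curious, the core of the technique is simple enough to play with. Below is a minimal toy sketch in Python/NumPy (entirely my own illustration, with made-up waveforms and numbers; nothing like LIGO’s actual pipeline): a waveform that is invisible sample-by-sample beneath Gaussian noise is nonetheless located by sliding correlation against a known template.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "template": a short wiggle with ramping frequency. This is a stand-in
# for a waveform model of interest, not a relativistic waveform.
t = np.linspace(0, 1, 200)
template = np.sin(2 * np.pi * (5 + 10 * t) * t)
template /= np.linalg.norm(template)  # unit norm, so scores behave like an SNR

# Bury the template in noise: per sample, the signal sits well below the noise floor.
data = rng.normal(0.0, 1.0, 2000)
true_start = 700
data[true_start:true_start + template.size] += 8.0 * template

# Matched filter: correlate the template against every possible offset in the data.
scores = np.correlate(data, template, mode="valid")
best = int(np.argmax(np.abs(scores)))

print("best-fit offset:", best, "(injected at", true_start, ") score:",
      round(float(scores[best]), 1))
```

The correlation peak lands at the injection point even though no individual sample stands out from the noise; that is the extraction matched filtering promises, provided you already know what template to slide.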

I don’t have any problem with the mathematics of matched filtering, or with its application to dig for patterns that aren’t manifest to the naked eye, but I do feel some trepidation about its use here.

To try and get at the reason for that trepidation, I would argue that for matched filtering to be really valid, it needs at least some of the following conditions to be met:

1) A high prior confidence that a signal really exists in the data to be found;

2) A high prior confidence what kind of signal is there to be found, so that for instance you already know the general form of the equations/laws that model it, and only need to nail down certain parameters or details within those equations;

3) In the absence of 2), using a panel of hypothesis-filters sufficiently diverse that it can give an idea of how easily unrelated hypotheses (I’ll call them “adversarial examples”) could fit the noisy signal as well or better than your hypothesis of interest;

4) In the absence of 2) and 3), employing a hypothesis sufficiently detailed and information-rich that a match virtually guarantees detection.

5) In the absence of 4), a large number of high-significance matches.

Conversely, we should be wary of making detection claims using matched filtering when:

1) We have limited or no prior evidence that the signal is really “there”;

2) We have limited or no certainty about what kind of signal it is, or what form it takes;

3) We conduct our search using filters that all assume the hypothesis we are looking for, and use a huge number of these filters (thus inflating the chance that the hypothesis will be confirmed by at least one of them);

4) We are searching for a hypothesis-signal that is not very distinctive or is low in information;

5) We only have very few matches.

Consider an example of how this works: you’re post-docing in a space station near the radiation belts of Jupiter. Tuning in to Earth Classics TV (ECTV), you find a signal so full of static that you can only see and hear what looks for all the world like the static and noise of an empty channel; you definitely cannot see for yourself that it is a specific episode of a specific show, nor can you filter out enough static to start to see a signal.

However, you know that ECTV always does a broadcast at this exact time and frequency, that they have done so many times before, etc. And because of the distance and the static from Jupiter, it’s not too surprising that the signal is weak and messy. So you have very good reason to think that the broadcast really is happening, under the noise as it were. Let’s assume you also know (from the interplanetary TV Guide) that the signal you are looking for is a Game of Thrones episode, and you know a lot about the static coming from Jupiter. That gives you a lot of information about what kind of signal you are looking for, what kind of structure it will have.

Going ahead with matched filtering then, you take the data files of all the Game of Thrones episodes you can think of and statistically compare them to a sliding, scaling, hour-long window of this seemingly empty channel around the expected broadcast time. Sure enough, your analysis shows that of all the episodes you tried, one episode–the season 6 finale–gives by far the best statistical match. Triumphant, you announce to the rest of the space station that you have found the Game of Thrones season 6 finale was broadcast from ECTV. The question now is: are you justified in making that announcement?

In this case, probably yes. Most of the needed conditions are met: there’s good warrant to believe that the signal was really there, and a good idea what kind of signal it was. Though you didn’t test other hypotheses in this case (other shows, movies, etc.) and you got only one match, you’re still OK because the hypothesis you are looking for is extremely distinctive and informationally rich: an hour-long TV episode contains gigabytes of data and loads of higher-level features that can be used to identify the show and episode–the sequence of scenes, the characters’ faces, even the movement of lips which could be used to derive the dialogue.

By contrast, the LIGO claim for GW151226 looks kind of iffy on most of these counts. Let’s go through them one by one.

1. Prior confidence in signal’s existence/openness to null hypothesis.

For LIGO, we have only one decent experimental data point suggesting that the gravitational wave signal is there, and that is of course the much stronger GW150914. On the other hand, detections have so far been much rarer than was originally predicted, and as I wrote earlier, GW150914 itself was observed under odd circumstances (while still in training mode, and with no one present at the facility) and wasn’t accompanied by any other observations in other modes that would buttress the interpretation of a black hole merger. (Prior to the first claimed detection, the LIGO detectors had run at strain sensitivities of 1 in 10^21, exactly the sensitivity claimed for GW150914, since November 2005, and went several years without sensing any gravitational waves.)

Furthermore, LIGO is not the first experiment set up to search for evidence of gravitational waves, nor is interferometry the only method by which these waves should be detectable. In particular, gravitational waves from supermassive black hole mergers should alter the path length for signals coming from pulsars, causing subtle variations in the apparent timing of those pulsars. From Earth, pulsars from the same region of the sky should show the same variations at the same time. Since the pulsars are so far from each other that they could not possibly coordinate, there must be something between us and them altering the time of travel. It’s a bit like seeing a mirage over a distant hot road: the road and horizon seem to wiggle together, but the “movement” is really refraction by intervening air currents.

Arrays of radio telescopes have been set up to track cohorts of pulsars in search of these directionally dependent timing discrepancies. Less than a year ago, a team at the Parkes Pulsar Timing Array announced that they had failed to detect any sign of gravitational waves after a 12-year search using 24 pulsars.

It’s important to note that the two types of experiment are not interchangeable: pulsar timing is best suited to gravitational waves of much lower frequency than those interferometry can detect. Still, considering the remarkable dearth of signals from LIGO compared to what was expected, combined with the null result from pulsar timing, our experimental confidence in the existence of gravitational waves at this point should still be considered at best quite tentative. Instead, as negative or doubtful results have piled up, the search has only intensified.

This raises one of the more uncomfortable gray areas of real-world scientific practice; namely, at what point does perseverance cross into willfully discounting a negative result? One of the most striking examples of this occurred in 2007, when a powerful gamma-ray burst named GRB 070201, of the kind thought to be produced by neutron star mergers, was detected coming from the direction of the Andromeda galaxy. Such a merger from so relatively close by would be a prime candidate for gravitational wave production, but it produced no signal at LIGO. According to a press release covering this non-detection,

“Such a monumental cosmic event occurring in a nearby galaxy should have generated gravitational waves that would be easily measured by the ultrasensitive LIGO detectors. The absence of a gravitational-wave signal meant GRB070201 could not have originated in this way in Andromeda.”

There were two possible interpretations of the negative results here. Either the neutron-star merger only appeared to come from Andromeda but was actually too far away for its gravitational waves to be detectable, or this was evidence that gravitational waves, which had not been detected before, really might not exist as expected. Yet the first interpretation was automatically taken to be the correct one. In a hair-raising bit of favoritism, the absence of a phenomenon that had yet to be shown to exist at all–gravitational radiation–was invoked to settle the matter.

This is where lack of corroboration by alternative observational modes really bites down. How do we know the signal found by our matched filter search “really exists” if we have no prior detections in the gravitational wave mode besides GW150914, and no coincident detections whatsoever in other modes, whether from pulsars, gamma rays, X-rays or anything else, to support the search’s conclusions or confirm that anything, black hole merger or otherwise, really happened? Despite the strong confidence and supreme status that general relativity (GR) has earned, the situation is problematic.

Finally, just as was the case with GW150914, the lack of corroboration in GW151226 is compounded by the lack of directional detail–that is, we don’t know what part of the sky the signal came from. As the paper explains, “Coarse sky localization is due to the limited information afforded by only two sensitive detectors in observing mode.” This means that it would be more difficult, even if there were a coinciding signal in another mode, to be sure that it came from the same source and hence to corroborate the detection.

2. Certainty about the nature of signal.

Leaving aside the fraught hypothetical of a signal’s cryptic “existence” in data, we need to be sure that the signal really is most likely to be both a gravitational wave and the product of a black hole merger. On this score the grounds for the LIGO team’s matched filter approach certainly seem stronger at first. If we are willing to grant in the first place that there really is a signal in GW151226, we have various kinds of warrant for believing the signal is more likely to be a gravitational wave than anything else.

As alluded to before, first and most important is the warrant of general relativity’s past predictive successes (the precession of Mercury’s perihelion, gravitational redshift, the slowing of clocks lower in a gravity well, and lensing). These strongly suggest that general relativity is, for the scales and energies where it applies, broadly correct. Gravitational radiation, as yet another one of the theory’s predictions, should thus indeed ripple away across space-time at light-speed whenever massive bodies orbit each other. Moreover, the theory tells us specifically what these waves ought to “look” like when emitted by certain types of sources. Experiments like LIGO and pulsar timing are constructed with these properties in mind, in order to search for them as exactingly as possible.

Coming from the opposite direction of this wealth of theoretical prediction from general relativity, we have a kind of argument from parsimony, which asks, “what better ideas do you have? What else but a gravitational wave could cause a synchronized oscillation in detected path length near-simultaneously at two interferometers nearly 2,000 miles apart? They must both be feeling the same thing, and the only thing we know of in any of our theories that could readily do that is a gravitational wave.”

At first blush, there should be no problem here. You build the detector for the specific properties of gravitational waves from battle-tested GR. You see something very close to those properties when you turn the detector on–well, technically 13 years after you turn the detector on, and only once very clearly, but still, an observation is an observation. Conversely, you have no clue what other kind of thing could cause the minuscule wiggle your detectors both picked up. So you claim what you have seen must be gravitational waves… full stop. So what’s the problem? Should the LIGO team hold off on claiming the thing they were looking for, simply because some other unknown “kind of thing” could perfectly match GR’s prediction by chance?

But here we stumble over another one of science’s fraught philosophical gray areas, which is that between argument from parsimony and argument from ignorance. Parsimony says: “We take this explanation because it is the simplest and most readily described in terms of things we already know, and therefore it is the most likely to be true.” So much to the good. Yet ignorance says: “We take this explanation because we have no better guesses at this time as to where we should look, and this one is good enough and we have spent billions in funding on it, so we will declare it true and settled as soon as we see anything that resembles it.”

There is no question that GW150914, with its (relatively) strong waveform, closely matched theoretical predictions for gravitational waves produced by an inspiral, a match which was tested in the first detection paper. Yet, as long as GW150914 remains the only direct experimental result that matches the expected GW “fingerprint” without first feeding in that fingerprint via the matched-filter methodology, the grounds for denoting subsequent “buried in noise” wiggles in an interferometer as new beneficiaries of that certainty will remain shaky.

Another philosophical question, which goes back to our trust in general relativity, is: how does a collection of equations, models and formalisms, by dint of having made a certain number of successful predictions, assure us that its remaining predictions will be successful as well? Predictions are often guided by theory, and this is an excellent way to narrow down the daunting space of possible hypotheses to a well-justified focus for an experiment. However, it’s also nothing new for theories to turn out to be flawed, and rarely in an all-or-nothing way–most theories in the history of science have been far more helpful than nothing, but still incomplete. (It’s been remarked ad nauseam that general relativity itself is already an example of such incompleteness, in that it fails at singularities and resists reformulation into quantum-mechanical language.)

Much as with the neutron star example from 2007, there instead seems to be a powerful bias not to consider the negative or ambiguous result, or the imperfection of theory; any failure to observe gravitational waves where and when current models would expect them is waved away as a problem with the models calling for “perseverance”, but a single positive observation is taken as utterly certain and decisive.

I suppose if it walks like a gravitational wave and quacks like a gravitational wave, so to speak, that is a fair basis for calling it such. But I still think there is a near-cringeworthy chance that we are as yet fooling ourselves. (In my area in the life sciences, we are accustomed to far noisier, messier and less reproducible results than in physics; yet I suspect that if anything like Figure 1 of the new LIGO paper were submitted in a biological manuscript, it would be laughed out of court by any self-respecting journal reviewer.)

3. Diversity of hypotheses tested.

If we are not able to reach a high enough certainty about the nature of the signal to begin with, say because we want to be skeptical about the warrant offered by general relativity’s other successes, or because we feel that existing observations are not yet numerous enough or unambiguous enough for us to be sure of the nature of the signal, we can go ahead with the matched filter anyway, but with the requirement that we scan a sufficiently diverse panel of hypothesis-filters.

The term “sufficiently diverse” is a difficult one to quantify, and I am not in the position to do so here. But we do not need an exact figure; two obvious principles will do: it must mean more than one hypothesis, and the more numerous and varied the hypotheses, the closer we come to it.

In the case of LIGO, we find in the new paper that “False alarms more significant than GW151226 would, in principle, be produced by the online search at a rate of approximately 1 per 1000 yr”, which certainly sounds impressive, well above chance for an observational period of a few months. However, I don’t see any measure of how many other hypotheses, having nothing to do with general relativity or gravitational waves or inspirals, perhaps even producing completely different looking waveforms, were tested to see if they would produce similar calculated “false-alarm” rates in a matched-filter search on the same data.

Since it’s of course impractical to try matching all imaginable signal patterns of all imaginable phenomena, matched filtering depends on having already established certain types of pattern that the researcher has some cause to believe ought to be in the data. In short, unless the signal to be found is already directly given, like a radar echo being compared with the original pulse or, in the example above, the episode listed in the TV Guide for a specific time and channel, matched filtering is value-laden.

The only hope of getting around this is by impartially testing many hypotheses. If, for example, we generated a bank of 500 waveforms completely unrelated (but of similar complexity) to that expected from a black-hole inspiral, and found that none of these hypotheses got anywhere near the relativistic inspiral hypothesis, we would have powerful reassurance about our use of the matched filter. On the other hand, if we found that 200 or even 50 out of the 500 produced matches about as good as the inspiral-gravitational wave hypothesis–i.e., that a tenth or more of basically randomly-chosen hypotheses produced “1 per 1000 yr” matches–we would know from the weird abundance of such adversarial examples that something had to be amiss with our test.
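As a toy sketch of what such a check might look like (my own construction, with made-up waveforms and arbitrary numbers, not anything from the LIGO analysis): inject a weak “inspiral-like” signal into noise, then see how often unrelated templates of similar complexity match the data as well as the target template does.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 300)

def inspiral_like(t):
    # Toy stand-in for the hypothesis of interest: frequency and amplitude ramp up.
    w = t * np.sin(2 * np.pi * (5 + 25 * t) * t)
    return w / np.linalg.norm(w)

def unrelated(rng, n):
    # "Adversarial example": a smoothed random curve of similar length and
    # roughness, with no connection whatsoever to inspirals.
    w = np.convolve(rng.normal(size=n), np.ones(15) / 15, mode="same")
    return w / np.linalg.norm(w)

# Data: noise with a weak injected inspiral-like signal.
data = rng.normal(0.0, 1.0, 3000)
data[1000:1300] += 8.0 * inspiral_like(t)

def best_score(data, template):
    # Best matched-filter score over all offsets.
    return float(np.max(np.abs(np.correlate(data, template, mode="valid"))))

target = best_score(data, inspiral_like(t))
adversaries = [best_score(data, unrelated(rng, t.size)) for _ in range(200)]
fraction = float(np.mean([s >= target for s in adversaries]))

print("inspiral-template score:", round(target, 2))
print("fraction of 200 unrelated templates matching as well:", fraction)
```

When almost no unrelated template scores as well as the target, that is the kind of reassurance argued for above; if a large fraction did, the match would mean little.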

Conversely, using too many matched filters that derive from the same basic hypothesis can skew the results in favor of that hypothesis. If you test 500 different filters based on variations on black-hole inspiral models, we would expect the odds of finding a match for at least one of them to go up. We would then begin to cross yet another fuzzy grey line, this time between confirmation and dredging. (In data dredging, we make the range of possible matches so big and the range of potential hypotheses so narrow that odds are, by sheer statistical force, at least something in that range will appear to be a match, with high significance.)
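A toy demonstration of that trials effect (again my own construction, with arbitrary numbers): on pure noise, containing nothing at all, scanning a bank of near-identical templates from the same family yields a better best match than any single template does.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 300)
noise = rng.normal(0.0, 1.0, 3000)  # pure noise: there is nothing to find

def inspiral_like(t, rate):
    # A family of near-identical templates, differing only in chirp rate.
    w = t * np.sin(2 * np.pi * (5 + rate * t) * t)
    return w / np.linalg.norm(w)

def best_score(data, template):
    # Best matched-filter score over all offsets.
    return float(np.max(np.abs(np.correlate(data, template, mode="valid"))))

rates = np.linspace(5.0, 50.0, 200)
single = best_score(noise, inspiral_like(t, rates[0]))
bank = max(best_score(noise, inspiral_like(t, r)) for r in rates)

print("best match on pure noise, 1 template:  ", round(single, 2))
print("best match on pure noise, 200 templates:", round(bank, 2))
```

The bank’s best match is never worse and typically better, which is why any significance estimate has to account for the size and homogeneity of the template bank.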

I can’t at this point claim the LIGO match is the result of dredging, but at the moment I don’t see anything reassuring me that something like it could not be at work. From what I have read, it seems that with the announcement of GW151226, the LIGO team left out diverse hypotheses while employing an abundance of theoretically homogeneous ones; in their words, the matched filter selection for GW151226 appears to have been limited to “…a discrete bank of waveform templates which target gravitational waves from binary black hole systems”. Elsewhere they note that “a coherent Bayesian analysis of the data was performed using two families of waveform models”, but still it turns out that “…Both models are calibrated to numerical simulations of binary black holes in general relativity.”

Note that this was not the case for GW150914. Referring back to the first LIGO detection paper, the match of the waveform was verified not only by its obvious appearance, and by “matched filtering with waveforms predicted by general relativity”, but also by comparison with “a broad range of generic transient signals, with minimal assumptions about waveforms”. The latter offers crucial assurance that alternative hypotheses were considered, and found wanting. That assurance doesn’t seem to be there in the second paper, again giving the uneasy feeling that the LIGO team risks getting ahead of themselves.

So with GW151226, there seems to be a danger of question-begging, in that the only hypothesis employed to construct the search filters–viz., “waveform templates which target gravitational waves from binary black hole systems”–is the very same thing that is supposed to be confirmed by the discovery. I don’t know if there’s some logical subtlety whereby this is actually okay, but without any consideration of the likelihood of other imaginable hypothesis types in the data, I continue to find it really awkward to think that a filter match based on a set of filters that only look for gravitational waves from inspirals, no matter how good, is itself evidence that that’s really what’s “there”.

A natural reaction to all this needling about alternative hypotheses would be to fling one’s hands in the air and say, in the manner of the “parsimony” argument, “well what else would you expect a machine designed to look for gravitational waves to look for?” This is not good enough however: just because a machine is designed to respond to Z, does not guarantee it responds only to Z, and it’s nothing new in science for discoveries to be guided by a theory later seen to be incorrect, or by no theory at all–often using instruments originally intended to demonstrate an entirely different effect. This is the essence of serendipity, which has long played a (perhaps embarrassingly) large role in scientific progress.

In The Structure of Scientific Revolutions, Thomas Kuhn refers to “…discovery through accident, a type that occurs more frequently than the impersonal standards of scientific reporting allow us easily to realize [57]”. For examples of this kind of discovery, I am reminded of the discovery of X-rays by Roentgen, or Fleming’s discovery of penicillin: in both cases, the phenomenon in question had been encountered and recorded by numerous predecessors, but simply passed over as insignificant. In particular, with regard to X-rays, Kuhn asks:

“At what point in Roentgen’s investigation, for example, ought we to say that X-rays had actually been discovered? Not, in any case, at the first instant, when all that had been noted was a glowing screen. At least one other investigator had seen that glow and, to his subsequent chagrin, discovered nothing at all [57]”.

This comes at the LIGO problem from a different direction: if exclusive attention is being given to mining out inspiral-gravitational wave signatures from noisy data, what other possibilities might be missed? While conceding the validity of such a concern, Kuhn judiciously warns us not to go overboard with it:

“Ought we conclude from that frequency with which such instrumental commitments prove misleading that science should abandon standard tests and standard instruments? That would result in an inconceivable method of research [60]”.

At least one form of alternative hypothesis that the LIGO team has thoroughly checked for is instrumental and environmental flukes. “Investigations similar to the detection validation procedures for GW150914 found no evidence that instrumental or environmental disturbances contributed to GW151226”, although they then slightly contradict themselves by allowing that such disturbances may have contributed, but “were too small to account for more than 6%” of the signal.

Also, by way of at least considering hypotheses outside of canonical general relativity models, the LIGO authors do try deliberately perturbing the parameters of said models: “To test whether GW151226 is consistent with general relativity, we allow the coefficients that describe the waveform (which are derived as functions of the source parameters from the post-Newtonian approximation and from fits to numerical relativity simulations) to deviate from their nominal values, and check whether the resulting waveforms are consistent with the data. The posterior probability densities of the coefficients are found to center on their general relativity values.” This at least goes some way towards considering other hypotheses, though deformed versions of GR don’t really count as a “random” or “diverse” way of choosing them.

4. Information-richness of the hypothesis.

I briefly mentioned that the diverse hypotheses tested should be of similar complexity. “Complexity” can be quantified in many ways, but for these purposes let’s say it has something to do with the number of parameters that have to be set in a model, or the number of really basic features needed to describe it.

We already went over this for the TV episode on Jupiter. Because the episode itself consists of gigabytes of data, it is extremely high-dimensional, hence highly characteristic, even if a number of its components deviate a bit due to noise. This gives us an enormously stronger confidence that we have found what we think we have found, even if we don’t have a big diverse library of hypotheses to look at.

Another example comes from the life sciences: DNA sequencing. Say we use a DNA synthesizer to create a 1000 base-pair strand to our liking, take the DNA to the sequencer down the street in another building, and find that the sequence it reads out to us is base-for-base identical to the one we typed into the synthesizer. Are we surprised? No, because the sequence is high enough in complexity/information/distinctiveness that we can say there is about a 1 in 4^1000 chance (unbelievably low) that the sequencer guessed randomly and stumbled on the right sequence. We can also apply quite a lot of static or mutations to our TV show and DNA strand, respectively, and still be able to make out a face or a strong phylogenetic relation.
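A back-of-envelope comparison makes the gulf in information content concrete. The figures below are my own rough assumptions (in particular, “10 distinguishable levels per parameter” for a simple waveform is a made-up, generous number):

```python
import math

# Rough information content of each "signature". The "10 distinguishable
# levels per parameter" figure below is a pure assumption for illustration.
dna_bits = 1000 * math.log2(4)     # 1000 bases, 4 possibilities each
waveform_bits = 4 * math.log2(10)  # a few parameters, ~10 resolvable levels each

print(f"1000-bp DNA strand:   {dna_bits:.0f} bits (chance match ~ 1 in 4^1000)")
print(f"4-parameter waveform: {waveform_bits:.1f} bits (chance match ~ 1 in 10^4)")
```

Two thousand bits versus a dozen or so: on these assumptions, a chance match to the DNA strand is effectively impossible, while a chance match to a simple parameterized wiggle is merely unlikely.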

For a signal so basically simple in structure as that predicted for an inspiral, on the other hand, the number of features seems far lower. Look at it this way: what are the ingredients of the expected black hole merger GW signal? Take a look at Fig. 5 of the new detection paper, which shows the optimal matched filter waveform the LIGO team found for the GW151226 data. We can see it as having three main features:

1) It begins as a simple sinusoidal pattern, which gradually increases in frequency;

2) As the frequency increases, the amplitude of the signal also begins to increase;

3) A maximum amplitude is reached, whereupon the signal rapidly dampens down (ringdown).

Altogether, this amounts to a quite vague signal type–our “signature” for a black hole merger is simply a wiggle that gets tighter and stronger and then abruptly fades out. Taking initial amplitude and frequency to be 1, we might estimate 4 variables: one for the rate of the frequency increase, one for the rate of the amplitude increase, one for the duration of the amplitude increase, and one for the rate of signal decay once the increase stops.

Indeed, there are innumerable ways to get this kind of a shape. Just one dumb example that produces a pretty good likeness of Figure 5, which I got from a few minutes of playing around with a graphing calculator and which has nothing whatsoever to do with general relativity or the theory of black holes, is

[equation image not reproduced]

This equation may be ugly and random, but it’s also not terribly complicated, in the sense that there are only five parameters or so.
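In the same spirit, here is a toy function of my own (a sketch built only from the handful of knobs listed above, owing nothing to general relativity) that produces the same qualitative chirp-and-ringdown shape:

```python
import numpy as np

def toy_chirp(t, f_rate=8.0, a_rate=3.0, t_peak=0.8, decay=40.0):
    # Four knobs: frequency ramp, amplitude ramp, time of peak, ringdown rate.
    f = 1.0 + f_rate * t                                # frequency ramps up
    amp = np.where(
        t < t_peak,
        (1.0 + t) ** a_rate,                            # amplitude grows...
        (1.0 + t_peak) ** a_rate * np.exp(-decay * (t - t_peak)),  # ...then dies away
    )
    return amp * np.sin(2 * np.pi * f * t)

t = np.linspace(0, 1, 1000)
w = toy_chirp(t)
print("peak |amplitude| occurs near t =", round(float(t[np.argmax(np.abs(w))]), 2))
```

A tightening, strengthening wiggle that abruptly fades out, in four arbitrary parameters: this is the sense in which the expected signature seems low in information.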

How can we be confident that such non-distinctive (hence, I presume, more common) waveforms can be impartially extracted from an extremely complex, large, noisy dataset by the matched filter, with such certainty that we say they can only be due to a pair of merging black holes? (Or do I have this somehow backwards, so that complicated patterns are more the norm and simple ones extremely unlikely?)

5. Number of matches.

Finally, if signal informativeness doesn’t close the deal for us, our last hope is getting a lot of matches; even a relatively non-distinctive pattern, if it happens enough times, can mushroom in significance. But as I’ve said a million times by now, we have only one observation that all by itself looks like a gravitational wave ought to, and that’s GW150914.

If we want to be charitable, since there are two detectors, Hanford and Livingston, we may also wish to count GW151226 and GW150914 as four results instead of two. In that case, at least one of these looks very dicey (the Livingston frequency-energy map, bottom of Fig. 1). That leaves us with three decent-looking detections out of four observations. Okay, but not great.



As you can probably tell, this has been driving me nuts and I am still not sure where I stand. First off, I definitely don’t have a problem with the idea of gravitational waves itself. Disruptions in the very fabric of space time that travel through everything at light speed, actually change the dimensions of objects, and carry clues to insanely distant, ancient and violent events? Super cool!

I must note that my opinions here are based solely on reading the detection papers: I haven’t dug into the raw data itself. The obvious next step would be to peruse and re-analyze that raw data on my own (available here), which will involve a lot of computer time and how-to learning, but I plan on chipping away at it.

But when I put the current picture together, including the pulsar timing null results and the paucity of LIGO detections (if they are such), something… feels funny here. It could be pointing to a surprising and interesting fact about the universe and gravitational waves. Statistically, GW151226’s match with GR’s predictions gives an impressive p-value, 10^-7. But the weird failure to imagine other explanations, plus the sheer tininess of the effect (a weak signal even in terms of thousandths of a proton diameter), plus the amount of digital finagling needed to “find” such a jejune pattern in the noise, all nudge my skepticism and sap my zeal. So the best word for GW151226 is not “wrong” but “uncompelling”. Wait and see.
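
As a back-of-envelope check on the quoted significance figures (my own conversion, assuming a one-sided Gaussian tail, which is the usual convention for "sigma" claims), the p-value of 10^-7 and LIGO's ">5 sigma" language are at least consistent with each other:

```python
from statistics import NormalDist

norm = NormalDist()

# One-sided Gaussian-tail conversion between a p-value and "sigma".
# A p-value of 1e-7 corresponds to roughly 5.2 sigma, consistent
# with the ">5 sigma" claims made for the two detections.
p = 1e-7
sigma = norm.inv_cdf(1 - p)
print(f"p = {p:g}  ->  {sigma:.1f} sigma")

# In the other direction: a 2-sigma result corresponds to a tail
# probability of about 2 percent, which is far weaker evidence.
p_2sigma = 1 - norm.cdf(2)
print(f"2 sigma  ->  p = {p_2sigma:.3f}")
```

Of course, this only checks internal consistency; it says nothing about whether the noise model and trials-factor accounting behind the quoted p-value are themselves sound, which is where my real doubts lie.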

The situation also shows how results that are presented as straightforward triumphs often prove on philosophical grounds to contain traps, assumptions and gray areas, generally far more of them than scientists would like to think about. Some of my continuing discomfort, for instance, comes from a sense of how perilously close, in the case of GW151226 especially, this process of filtering and directed attention can come to two age-old bugaboos of thought: confirmation bias and the streetlight effect. The former means we attach great significance to events that support what we already believe while disregarding events that don’t; the latter is based on the anecdote of a man who, losing his keys one night, looks for them only under a streetlight, not because he thinks that’s where he lost them but because “that’s where the light is”.

On the question of confirmation bias, there can be little doubt that LIGO scientists, as well as cosmologists the world over, would far prefer the discovery of gravitational waves to a null result that casts a pall over general relativity, the otherwise well-substantiated keystone of their field. I don’t find it credible that they would fabricate a result, but they may have over-interpreted (or dredged) a basically shaky result in order to confirm a prediction that they want, and already believe, to be true–not to mention to vindicate a project that has cost billions. Then again, a definitive non-detection would in many ways have been more revolutionary than a detection.

On the question of the streetlight effect, we may want to say that general relativity is like the streetlight–that the LIGO team designed their matched-filter template bank from general-relativistic models of gravitational waves merely because that, too, is “where the light is”. Because general relativity has already notched many other successes, the LIGO scientists are not quite like the man in the story: those prior successes, to mix metaphors a bit, already give them reason to believe the keys are somewhere under the streetlight. But what makes us so sure that there is not a better explanation–a better-fitting key–if we do not look beyond our current sight?

Let me reiterate what I said in my first LIGO post would count as “compelling”: we’d need 1) at least two more signals, of a strength and clarity similar to GW150914 (or at least ones not requiring intricate and value-laden statistical methods to detect/select/dredge them out of overwhelming noise) to make a triplicate; 2) more accurate sky localization of the signal, say by a third interferometer; and 3) concomitant observations in another observational mode, in the same area of the sky, consistent with a black hole merger. GW151226, with its awkward provenance, does not meet those criteria or dispel the doubts–and so the wait continues.

Regarding corroboration from other observational modes, there has been one development, though in the wrong direction. In my previous post, I struck a more conciliatory tone about the first LIGO detection, noting that it appeared to have been closely accompanied by an X-ray event that would nicely corroborate the expectations of a black hole merger in the presence of infalling debris from GW150914. However, I must now point out that that event has since been discredited as the result of an error in signal processing. The case of this artifactual X-ray “observation” that seemingly heralded GW150914 shows how easy it is to err in suspecting “corroboration” of what we already feel certain of.

Another event I mentioned in that last post, GW151012, was also tipped as a possible detection along with GW151226. I had hoped it would offer some further clarity, but it turns out it has been discounted by the LIGO team, as it achieved only 2-sigma significance. It is mentioned in the latest paper instead as LVT151012, “the third most significant binary black hole candidate” after GW150914 and GW151226. Furthermore, the authors state, “No other significant binary black hole candidates in the total mass range 4–100 M⊙ were found during Advanced LIGO’s first observing period”. We therefore have to wait until the next observing period for more solid detections along the lines of GW150914.

Maybe my difficulty in finding the LIGO results fully convincing thus far is just a matter of the sciences having reached such a level of specialization and complexity that it becomes impossible for even the most determined non-initiates to arrive at any certainty about the results’ correctness. The data for Advanced LIGO’s first run, for instance, runs into the terabytes. The extent to which we depend on others’ interpretations and analyses has become staggering.

My most optimistic guess is that over time all this will come into focus as just another lesson about how real science works: initial results are often clouded in uncertainty or questions of interpretation, which are only settled as more data and analysis from other experimenters flows in, sometimes over decades.

In that case, the only real fault here lies in the news media and press offices’ thirst for splashy headlines and sharp-hewn storylines. This thirst demands presenting the waves’ detection as a total and immediate fait accompli rather than a cumulative process, so that the natural scientific period of ambiguity and re-checking has been short-circuited. (Even general relativity was not immediately accepted, but for decades after its publication in 1916 sat in a sort of limbo reserved for brilliant but not-totally-certain ideas, till enough experiments and theoretical connections gave it the necessary support.)

To return to Kuhn: “We can only say that X-rays emerged in Würzburg between November 8 and December 28, 1895 [57]”. In a similar way, we might do best to say something like “gravitational waves began to emerge at Livingston and Hanford on September 14, 2015, and will continue to emerge until that full triplicate is completed”.


As a final note, I’d like to suggest that my (and others’) lingering skepticism about LIGO’s results is not purely about the results, but finds its place in a much bigger context: multiplying examples of non-reproducibility in published research, high-profile retractions or reversals of many highly publicized “discoveries”, and a paucity of fundamental new results or compelling new theoretical frameworks. Over the last decade or so, this situation has more and more assumed the shape of a full-blown though hushed crisis, afflicting a huge swath of the sciences, from string theory to dentistry. But seeing how enormous and unwieldy this piece has already gotten, and yet how much more there is to discuss, I will mercifully postpone these matters for another post.
