4 Testing Techniques
4.1 Coalescence analysis
Gravitational waves emitted during the inspiral, merger and ringdown of compact binaries are the most studied in the context of data analysis and parameter estimation. In this section, we will review some of the main data analysis techniques employed in the context of parameter estimation and tests of GR. We begin with a discussion of matched filtering and Fisher theory (for a detailed review, see [173, 103, 125*, 174*, 248]). We then continue with a discussion of Bayesian parameter estimation and hypothesis testing (for a detailed review, see [387, 205, 123, 294*]).
4.1.1 Matched filtering and Fisher’s analysis
When the detector noise is Gaussian and stationary, and when the signal
is known very well,
the optimal detection strategy is matched filtering. For any given realization, such noise can be
characterized by its power spectral density
, defined via
The detectability of a signal is determined by its signal-to-noise ratio or SNR, which is defined via
where



For the noise assumptions made here, the probability of measuring in the detector output, given a
template
, is given by










The Fisher method to estimate projected constraints on modified gravity theory parameters is as
follows. First, one constructs a waveform model in the particular modified gravity theory one wishes to
constrain. Usually, this waveform will be similar to the GR one, but it will contain an additional parameter,
, such that the template parameters are now
plus
. Let us assume that as
, the modified
gravity waveform reduces to the GR expectation. Then, the accuracy to which
can be measured, or
the accuracy to which we can say
is zero, is approximately
, where the Fisher
matrix associated with this variance-covariance matrix must be computed with the non-GR
model evaluated at the GR limit (
). Such a method for estimating how well modified
gravity theories can be constrained was pioneered by Will in [436*, 353], and since then, it has
been widely employed as a first-cut estimate of the accuracy to which different theories can be
constrained.
The Fisher method described above can dangerously lead to incorrect results if abused [414, 415]. One
must understand that this method is suitable only if the noise is stationary and Gaussian and if the SNR is
sufficiently large. How large an SNR is required for Fisher methods to work depends somewhat on the
signals considered, but usually for applications concerning tests of GR, one would be safe with or
so. In real data analysis, the first two conditions are almost never satisfied. Moreover, the first
detections that will be made will probably be of low SNR, i.e.,
, for which again the
Fisher method fails. In such cases, more sophisticated parameter estimation methods need to be
employed.
4.1.2 Bayesian theory and model testing
Bayesian theory is ideal for parameter estimation and model selection. Let us then assume that a detection
has been made and that the gravitational wave signal in the data can be described by some model ,
parameterized by the vector
. Using Bayes’ theorem, the posterior distribution function (PDF) or
the probability density function for the model parameters, given data
and model
, is












Let us now switch gears to model selection. In hypothesis testing, one wishes to determine whether the
data is more consistent with hypothesis A (e.g., that a GR waveform correctly models the signal) or with
hypothesis B (e.g., that a non-GR waveform correctly models the signal). Using Bayes’ theorem, the PDF
for model given the data is






When hypothesis A and B refer to fundamental theories of nature we can take different viewpoints
regarding the priors. If we argue that we know nothing about whether hypothesis A or B better describes
nature, then we would assign equal priors to both hypotheses. If, on the other hand, we believe GR is the
correct theory of nature, based on all previous experiments performed in the solar system and with binary
pulsars, then we would assign . This assigning of priors necessarily biases the inferences
derived from the calculated posteriors, which is sometimes heavily debated when comparing Bayesian theory
to a frequentist approach. However, this “biasing” is really unavoidable and merely a reflection of
our state of knowledge of nature (for a more detailed discussion on such issues, please refer
to [294]).
The integral over all models in Eq. (88*) can never be calculated in practice, simply because
we do not know all models. Thus, one is forced to investigate relative probabilities between
models, such that the normalization constant cancels out. The odds ratio is defined by








The main difficulty in Bayesian inference (both in parameter estimation and model selection) is sampling the PDF sufficiently accurately. Several methods have been developed for this purpose, but currently the two main workhorses in gravitational-wave data analysis are Markov Chain Monte Carlo and Nested Sampling. In the former, one samples the likelihood through the Metropolis–Hastings algorithm [314, 221, 122, 367]. This is computationally expensive in high-dimensional cases, and thus, there are several techniques to improve the efficiency of the method, e.g., parallel tempering [402]. Once the PDF has been sampled, one can then calculate the evidence integral, for example via thermodynamic integration [420, 167, 419]. In Nested Sampling, the evidence is calculated directly by laying out a fixed number of points in the prior volume, which are then allowed to move and coalesce toward regions of high posterior probability. With the evidence in hand, one can then infer the PDF. As in the previous case, Nested Sampling can be computationally expensive in high-dimensional cases.
Del Pozzo et al. [142*] were the first to carry out a Bayesian implementation of model selection in the context of tests of GR. Their analysis focused on tests of a particular massive graviton theory, using the gravitational wave signal from quasi-circular inspiral of non-spinning black holes. Cornish et al. [124*, 376*] extended this analysis by considering model-independent deviations from GR, using the parameterized post-Einsteinian (ppE) approach (Section 5.3.4) [467*]. Recently, this was continued by Li et al. [290*, 291], who carried out a similar analysis on a large statistical sample of Advanced LIGO (aLIGO) detections using simulated data and a restricted ppE model. All of these studies suggest that Bayesian tests of GR are possible, given sufficiently-high SNR events. Of course, whether deviations from GR are observable will depend on the strong-field character and strength of the deviation, as well as the availability of sufficiently-accurate GR waveforms.
4.1.3 Systematics in model selection
The model selection techniques described above are affected by other systematics present in data analysis. In general, we can classify these into the following [417*]:
- Mismodeling Systematic, caused by inaccurate models of the gravitational-wave template.
- Instrumental Systematic, caused by inaccurate models of the gravitational-wave response.
- Astrophysical Systematic, caused by inaccurate models of the astrophysical environment.
Mismodeling systematics are introduced due to the lack of an exact solution to the Einstein equations from which to extract an exact template, given a particular astrophysical scenario. Inspiral templates, for example, are approximated through post-Newtonian theory and become increasingly less accurate as the binary components approach each other. Cutler and Vallisneri [127] were the first to carry out a formal and practical investigation of such a systematic in the context of parameter estimation from a frequentist approach.
Mismodeling systematics will prevent us from testing GR effectively with signals that we do not understand sufficiently well. For example, when considering signals from black hole coalescences, if the the total mass of the binary is sufficiently high, the binary will merge in band. The higher the total mass, the fewer the inspiral cycles that will be in band, until eventually only the merger is in band. Since the merger phase is the least understood phase, it stands to reason that our ability to test GR will deteriorate as the total mass increases. Of course, we do understand the ringdown phase very well, and tests of the no-hair theorem would be allowed during this phase, provided a sufficiently-high SNR [65*]. On the other hand, for neutron star binaries or very–low-mass black-hole binaries, the merger phase is expected to be essentially out of band for aLIGO (above 1 kHz), and thus, the noise spectrum itself may shield us from our ignorance.
Instrumental systematics are introduced by our ignorance of the transfer function, which connects the detector output to the incoming gravitational waves. Through sophisticated calibration studies with real data, one can approximate the transfer function very well [4*, 1]. However, this function is not time-independent, because the noise in the instrument is not stationary or Gaussian. Thus, un-modeled drifts in the transfer function can introduce systematics in parameter estimation that are as large as 10% in the amplitude and the phase [4].
Instrumental systematics can affect tests of GR, if these are performed with a single instrument. However, one expects multiple detectors to be online in the future and for gravitational-wave detections to be made in several of them simultaneously. Instrumental systematics should be present in all such detections, but since the noise will be mostly uncorrelated between different instruments, one should be able to ameliorate its effects through cross-correlating outputs from several instruments.
Astrophysical systematics are induced by our lack of a priori knowledge of the gravitational wave source. As explained above, matched filtering requires knowledge of a waveform template with which to filter the data. Usually, we assume the sources are in a perfect vacuum and isolated. For example, when considering inspiral signals, we ignore any third bodies, electric or magnetic fields, neutron star hydrodynamics, the expansion of the universe, etc. Fortunately, however, most of these effects are expected to be small: the probability of finding third bodies sufficiently close to a binary system is very small [463*]; for low redshift events, the expansion of the universe induces an acceleration of the center of mass, which is also very small [468*]; electromagnetic fields and neutron-star hydrodynamic effects may affect the inspiral of black holes and neutron stars, but not until the very last stages, when most signals will be out of band anyways. For example, tidal deformation effects enter a neutron-star–binary inspiral waveform at 5 post-Newtonian order, which therefore affects the signal outside of the most sensitive part of the aLIGO sensitivity bucket.
Perhaps the most dangerous source of astrophysical systematics is due to the assumptions made about the astrophysical systems we expect to observe. For example, when considering neutron-star–binary inspirals, one usually assumes the orbit will have circularized by the time it enters the sensitivity band. Moreover, one assumes that any residual spin angular momentum that the neutron stars may possess is very small and aligned or counter-aligned with the orbital angular momentum. These assumptions certainly simplify the construction of waveform templates, but if they happen to be wrong, they would introduce mismodeling systematics that could also affect parameter estimation and tests of GR.
4.2 Burst analyses
In alternative theories of gravity, gravitational-wave sources such as core collapse supernovae may result in the production of gravitational waves in more than just the plus and cross-polarizations [384, 380, 216, 334, 333, 369]. Indeed, the near-spherical geometry of the collapse can be a source of scalar breathing-mode gravitational waves. However, the precise form of the waveform is unknown because it is sensitive to the initial conditions.
When searching for un-modeled bursts in alternative theories of gravity, a general approach involves
optimized linear combinations of data streams from all available detectors to form maximum likelihood
estimates of the waveforms in the various polarizations, and the use of null streams. In the context of
ground-based detectors and GR, these ideas were first explored by Gürsel and Tinto [212*] and later by
Chatterji et al. [101*] with the aim of separating false-alarm events from real detections. The main idea was
to construct a linear combination of data streams received by a network of detectors, so that the
combination contained only noise. Of course, in GR one need only include and
polarizations, and
thus a network of three detectors suffices. This concept can be extended to develop null tests of GR, as
originally proposed by Chatziioannou et al. [102*] and recently implemented by Hayama et
al. [228*].
Let us consider a network of detectors with uncorrelated noise and a detection by all
detectors. For a source that emits gravitational waves in the direction
, a single data point (either in the
time-domain, or a time-frequency pixel) from an array of
detectors (either pulsars or interferometers)
can be written as



If we do not know the form of the signal present in our data, we can obtain maximum likelihood
estimators for it. For simplicity, let us assume the data are Gaussian and of unit variance (the latter can be
achieved by whitening the data). Just as we did in Eq. (80*), we can write the probability of obtaining
datum , in the presence of a gravitational wave
as



With just three independent detectors, we can choose to eliminate the two tensor modes (the plus and cross-polarizations) and construct a GR null stream: a linear combination of data streams that contains no signal consistent within GR, but could contain a signal in another gravitational theory, as illustrated in Figure 4*. With such a GR null stream, one can carry out null tests of GR and study whether such a stream contains any statistically-significant deviations from noise. Notice that this approach does not require a template; if one were parametrically constructed, such as in [102*], more powerful null tests could be applied. In the future, we expect several gravitational wave detectors to be online: the two aLIGO ones in the United States, Advanced VIRGO (adVirgo) in Italy, LIGO-India in India, and KAGRA in Japan. Given a gravitational-wave observation that is detected by all five detectors, one can then construct three GR null streams, each with power in a signal direction.
For pulsar timing experiments where one is dealing with data streams of about a few tens of pulsars, waveform reconstruction for all polarization states, as well as numerous null streams, can be constructed.
4.3 Stochastic background searches
Much work has been done on the response of ground-based interferometers to non-tensorial polarization modes, stochastic background detection prospects, and data analysis techniques [299, 323, 191, 329*, 121]. In the context of pulsar timing, the first work to deal with the detection of such backgrounds in the context of alternative theories of gravity is due to Lee et al. [284*], who used a coherence statistic approach to the detection of non-Einsteinian polarizations. They studied the number of pulsars required to detect the various extra polarization modes, and found that pulsar timing arrays are especially sensitive to the longitudinal mode. Alves and Tinto [22*] also found enhanced sensitivity to longitudinal and vector modes. Here we follow the work in [329*, 99*] that deals with the LIGO and pulsar timing cases using the optimal statistic, a cross-correlation that maximizes the SNR.
In the context of the optimal statistic, the derivations of the effect of extra polarization states for ground-based instruments and pulsar timing are very similar. We begin with the metric perturbation written in terms of a plane wave expansion, as in Eq. (50*). If we assume that the background is unpolarized, isotropic, and stationary, we have that
where




For both ground-based interferometers and pulsar-timing experiments, an isotropic stochastic
background of gravitational waves appears in the data as correlated noise between measurements from
different instruments. The data set from the instrument is of the form



Since the gravitational-wave signal is correlated, we can use cross-correlations to search for it. The cross-correlation statistic is defined as
where






The expressions for the mean and variance of the cross-correlation statistic, and
respectively,
take the same form for both pulsar timing and ground-based instruments. In the frequency domain,
Eq. (109*) becomes





The mean and variance can be rewritten more compactly if we define a positive-definite inner product using the noise power spectra of the two data streams
again in analogy to the inner product in Eq. (78*), when considering inspirals. Using this definition where we recall that the capital Latin indices

The differences in the optimal filter between interferometers and pulsars arise only from
differences in the overlap reduction functions, . For ground-based instruments, the signal
data
are the strains given by Eq. (57*). The overlap reduction functions are then given by


For pulsar timing arrays, the signal data are the redshifts
, given by Eq. (63*). The overlap
reduction functions are then given by





The result for the combination of cross-correlation pairs to form an optimal network statistic is also the same in both ground-based interferometer and pulsar timing cases: a sum of the cross-correlations of all detector pairs weighted by their variances. The detector network optimal statistic is,
where
In order to perform a search for a given polarization mode one first needs to compute the overlap
reduction functions (using either Eq. (120*) or (121*)) for that mode. With that in hand and a form for the
stochastic background spectrum , one can construct optimal filters for all pairs in the detector
network using Eq. (119*), and perform the cross-correlations using either Eq. (109*) (or equivalently
Eq. (111*)). Finally, we can calculate the overall network statistic Eq. (124*), by first finding the variances
using Eq. (114*).
It is important to point out that the procedure outlined above is straightforward for ground-based interferometers. However, pulsar timing data are irregularly sampled, and have a pulsar-timing model subtracted out. This needs to be accounted for, and generally, a time-domain approach is more appropriate for these data sets. The procedure is similar to what we have outlined above, but power spectra and gravitational-wave spectra in the frequency domain need to be replaced by auto-covariance and cross-covariance matrices in the time domain that account for the model fitting (for an example of how to do this see [162]).
Interestingly, Nishizawa et al. [329*] show that with three spatially-separated detectors the tensor,
vector, and scalar contributions to the energy density in gravitational waves can be measured
independently. Lee et al. [284] and Alves and Tinto [22] showed that pulsar timing experiments are
especially sensitive to the longitudinal mode, and to a lesser extent the vector modes. Chamberlin and
Siemens [99] showed that the sensitivity of the cross-correlation to the longitudinal mode using
nearby pulsar pairs can be enhanced significantly compared to that of the transverse modes. For
example, for the NANOGrav pulsar timing array, two pulsar pairs separated by result in
an enhancement of 4 orders of magnitude in sensitivity to the longitudinal mode relative to
the transverse modes. The main contribution to this effect is due to gravitational waves that
are coming from roughly the same direction as the pulses from the pulsars. In this case, the
induced redshift for any gravitational-wave polarization mode is proportional to
, the
product of the gravitational-wave frequency and the distance to the pulsar, which can be large.
When the gravitational waves and the pulse direction are exactly parallel, the redshift for the
transverse and vector modes vanishes, but it is proportional to
for the scalar-longitudinal
mode.
Lee et al. [285] studied the detectability of massive gravitons in pulsar timing arrays through stochastic
background searches. They introduced a modification to Eq. (59*) to account for graviton dispersion, and
found the modified overlap reduction functions (i.e., modifications to the Hellings–Downs curves Eq. (122*))
for various values of the graviton mass. They conclude that a large number of stable pulsars () are
required to distinguish between the massive and massless cases, and that future pulsar timing experiments
could be sensitive to graviton masses of about
(
). This method is competitive with
some of the compact binary tests described later in Section 5.3.1 (see Table 2). In addition, since the
method of Lee et al. [285] only depends on the form of the overlap reduction functions, it is
advantageous in that it does not involve matched filtering (and therefore prior knowledge of
the waveforms), and generally makes few assumptions about the gravitational-wave source
properties.