Supplementary information for “Quantum supremacy using a programmable superconducting processor” [1]
Google AI Quantum and collaborators† (Dated: January 1, 2020)

arXiv:1910.11333v2 [quant-ph] 28 Dec 2019

CONTENTS

I. Device design and architecture
II. Fabrication and layout
III. Qubit control and readout
   A. Control
   B. Readout
IV. XEB theory
   A. XEB of a small number of qubits
   B. XEB of a large number of qubits
   C. Two limiting cases
   D. Measurement errors
V. Quantifying errors
VI. Metrology and calibration
   A. Calibration overview
      1. Device registry
      2. Scheduling calibrations: “Optimus”
   B. Calibration procedure
      1. Device configuration
      2. Root config: procedure
      3. Single-qubit config: procedure
      4. Optimizing qubit operating frequencies
      5. Grid config: procedure
   C. Two-qubit gate metrology
      1. The natural two-qubit gate for transmon qubits
      2. Using cross entropy to learn a unitary model
      3. Comparison with randomized benchmarking
      4. Speckle purity benchmarking (SPB)
      5. “Per-layer” parallel XEB
   D. Grid readout calibration
      1. Choosing qubit frequencies for readout
      2. Single qubit calibration
      3. Characterizing multi-qubit readout
   E. Summary of system parameters
VII. Quantum circuits
   A. Background
   B. Overview and technical requirements
   C. Circuit structure
   D. Randomness
   E. Quantum gates
   F. Programmability and universality
      1. Decomposition of CZ into fSim gates
      2. Universality for SU(2)
   G. Circuit variants
      1. Gate elision
      2. Wedge formation
VIII. Large scale XEB results
   A. Limitations of full circuits
   B. Patch circuits: a quick performance indicator for large systems
   C. Elided circuits: a more rigorous performance estimator for large systems
   D. Choice of unitary model for two-qubit entangling gates
   E. Understanding system performance: error model prediction
   F. Distribution of bitstring probabilities
   G. Statistical uncertainties of XEB measurements
   H. System stability and systematic uncertainties
   I. The fidelity result and the null hypothesis on quantum supremacy
IX. Sensitivity of XEB to errors
X. Classical simulations
   A. Local Schrödinger and Schrödinger-Feynman simulators
   B. Feynman simulator
   C. Supercomputer Schrödinger simulator
   D. Simulation of random circuit sampling with a target fidelity
      1. Optimality of the Schmidt decomposition for gates embedded in a random circuit
      2. Classical speedup for imbalanced gates
      3. Verifiable and supremacy circuits
   E. Treewidth upper bounds and variable elimination algorithms
   F. Computational cost estimation for the sampling task
   G. Understanding the scaling with width and depth of the computational cost of verification
      1. Runtime scaling formulas
      2. Assumptions and corrections
      3. Fitting constants
      4. Memory usage scaling
   H. Energy advantage for quantum computing
XI. Complexity-theoretic foundation of the experiment
   A. Error model
   B. Definition of computational problem
   C. Computational hardness of unbiased-noise sampling
   D. Proof of Theorem 1
Acknowledgments
Erratum
References

I. DEVICE DESIGN AND ARCHITECTURE
The Sycamore device was designed with both the quantum supremacy experiment [1] and small noisy intermediate scale quantum (NISQ) applications in mind. The architecture is also suitable for initial experiments with quantum error correction based on the surface code. While we are targeting 0.1% error two-qubit gates for error correction, a quantum supremacy demonstration can be achieved with 0.3-0.6% error rates.
For decoherence-dominated errors, a 0.1% error means a factor of about 1000 between coherence and gate times. For example, a 25 µs coherence time implies a 25 ns gate. A key design objective in our architecture is achieving short two-qubit gate time, leading to the choice of tunable transmon qubits with direct, tunable coupling.
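The arithmetic above can be sketched as follows. This is a minimal illustration that assumes the simple decoherence-limited estimate error ≈ (gate time)/(coherence time); the function name `max_gate_time_ns` is hypothetical and not part of any experiment software.

```python
def max_gate_time_ns(coherence_time_us: float, target_error: float) -> float:
    """Longest gate time (in ns) compatible with a target error.

    Assumes error ~ gate_time / coherence_time, i.e. the
    decoherence-dominated regime described in the text.
    """
    coherence_time_ns = coherence_time_us * 1e3  # convert microseconds to ns
    return coherence_time_ns * target_error


# 25 us coherence time and a 0.1% error target, as in the text.
print(max_gate_time_ns(25.0, 1e-3))
```

Under this assumption, halving the target error halves the allowed gate time, which is why short two-qubit gates are a key design objective.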
A difficult challenge for achieving a high-performance two-qubit gate is designing a sufficiently strong coupling when the gate is active, which is needed for fast gates, while minimizing the coupling otherwise for low residual control errors. These two competing requirements are difficult to satisfy with a fixed-coupling architecture: our prior processors [2] used large qubit-qubit detuning (∼1 GHz) to turn off the effective interaction, requiring relatively high-amplitude precise flux pulses to tune the qubit frequencies to implement a CZ gate. In the Sycamore device, we use adjustable couplers [3] as a natural solution to this control problem, albeit at the cost of more wiring and control signals. This means that the qubits can idle at much smaller relative detuning. We chose a capacitor-coupled design [3, 4], which is simpler to lay out and scale, over the inductor-based coupler of previous gmon devices [5, 6]. In Sycamore, the coupling g is tunable from 5 MHz to −40 MHz. The experiment uses ‘on’ coupling of about −20 MHz.
By needing only small frequency excursions to perform a two-qubit gate, the tunable qubit can be operated much closer to its maximum frequency, thus greatly reducing flux sensitivity and dephasing from 1/f flux noise. Additionally, the coupling can be turned off during measurement, reducing the effect of measurement crosstalk, a phenomenon that has been shown to be somewhat difficult to understand and minimize [7].
The interaction Hamiltonian of a system of on-resonance transmons with adjustable coupling (truncated to the qubit levels) has the following approximate form,

\[ H_{\rm int}(t) \approx \sum_{\langle i,j \rangle} \left[ g_{ij}(t) \left( \sigma_i^+ \sigma_j^- + \sigma_i^- \sigma_j^+ \right) + \frac{g_{ij}^2(t)}{|\eta|}\, \sigma_i^z \sigma_j^z \right] , \tag{1} \]

where $g_{ij}$ is the nearest neighbor coupling, $\eta$ is the nonlinearity of the qubits (roughly constant), $i$ and $j$ index nearest-neighbor qubit pairs, and $\sigma^\pm = (\sigma^x \pm i\sigma^y)/2$. We pulse the coupling in time to create coupling gates.
Our two-qubit gate can be understood using Cartan decomposition [8], which enables an arbitrary two-qubit gate to be decomposed into four single-qubit gates around a central two-qubit gate that can be described by a unitary matrix describing only XX, YY and ZZ interactions, with 3 parameters indicating their strengths. For the physical interaction describing our hardware, we see a swapping interaction between the $|01\rangle$ and $|10\rangle$ qubit states, corresponding to an XX+YY interaction. Interaction of the qubit $|11\rangle$ state with the $|2\rangle$ states of the data transmons produces a phase shift of that state, corresponding to a ZZ interaction. By changing the qubit frequencies and coupling strength we can vary the magnitude of these interactions, giving net control of 2 out of the 3 possible parameters for an arbitrary gate.

II. FABRICATION AND LAYOUT
Our Sycamore quantum processor is conﬁgured as a diagonal array of qubits as seen in the schematic of Fig. 1 in the main text. The processor contains 142 transmon qubits, of which 54 qubits have individual microwave and frequency controls and are individually read out (referred to as qubits). The remaining 88 transmons are operated as adjustable couplers remaining in their ground state during the algorithms (referred to as couplers).
The qubits consist of a DC SQUID sandwiched between two metal islands, operating in the transmon regime. An on-chip bias line is inductively coupled to the DC SQUID, which allows us to tune qubit frequency by applying control ﬂuxes into the SQUID loop. For regular operations, we tune qubits through a small frequency range (< 100 MHz). This corresponds to a relatively small control signal and makes qubit operation less sensitive to ﬂux crosstalk.
Each pair of nearest-neighbor qubits is coupled through two parallel channels: direct capacitive coupling and indirect coupling mediated by the coupler [3, 4, 9]. Both channels result in qubit-qubit coupling of the form $\sigma_i^x \sigma_j^x + \sigma_i^y \sigma_j^y$ in the rotating frame, although with different signs. The indirect coupling is negative, given it is a second-order virtual process. The strength of the indirect coupling is adjusted by changing the coupler frequency with an additional on-chip bias line, giving a net zero qubit-qubit coupling at a specific flux bias.
The Sycamore processor consists of two dies that we fabricated on separate high resistivity silicon wafers. The fabrication process, using aluminum on silicon, requires a total of 14 lithography layers utilizing both optical and electron beam lithography. Crosstalk and dissipation are mitigated through ground plane shielding [10]. After fabrication and die singulation, we use indium bump bonding [11, 12] of the two separate dies to form the Sycamore processor.

FIG. S1. A photograph of a packaged Sycamore processor. The processor is shielded from the electromagnetic environment by a mu-metal shield (middle) and a superconducting aluminum cap, inside the mu-metal shield. The processor control wires are routed, through a PCB circuit board, to coaxial connectors shown around the edge.

The Sycamore processor is connected to a 3-layer Al-plated circuit board with Al wirebonds [13]. Each line is routed through a microwave connector to an individual coax cable. We shield the processor from stray light using a superconducting Al lid with black coating, and from magnetic fields using a mu-metal shield as shown in Fig. S1.

III. QUBIT CONTROL AND READOUT
A. Control
Operating the device requires simultaneous synchronized control waveforms for each of the qubits and couplers. We use 54 coherent microwave control signals for qubit XY rotations, 54 fast flux bias lines for qubit frequency tuning, and 88 fast flux biases for the adjustable couplers. Dispersive readout requires an additional 9 microwave signals and phase sensitive receivers. A schematic of the room temperature electronics is shown in Fig. S2, and the cryogenic wiring is shown in Fig. S3.
Waveform generation is based on a custom-built multichannel digital to analog converter (DAC) module. Each DAC module provides 8 DACs with 14-bit resolution and 1 GS/s sample rate. Each DAC sample clock is synchronized to a global 10 MHz reference oscillator, and their trigger is connected by a daisy chain to synchronize all modules used in the experiment. This set of DAC modules forms a >250-channel, phase-synchronous waveform generator. We have measured 20 ps of jitter between channels. The modules are mounted in 14-slot 6U rackmount chassis. A single chassis, shown in Fig. S4, can control approximately 15 qubits including their associated couplers and readout signals. A total of 4 chassis are used to control the entire Sycamore chip.
The DAC outputs are used directly for fast flux biasing the qubits and couplers required for two-qubit gates. Microwave control for single-qubit XY rotations and dispersive readout combine two DAC channels and a mixer module to form a microwave arbitrary waveform generator (Microwave AWG) via single-sideband upconversion in an IQ mixer as shown in Fig. S2a. The microwave AWG provides signals with arbitrary spectral content within ±350 MHz of the local oscillator (LO). A single LO signal is distributed to all IQ mixers so that all qubits’ XY controls are phase coherent. The mixer modules are mounted in the same chassis as the DAC modules. Each mixer’s I and Q port DC offsets are calibrated for minimum carrier leakage and the I and Q amplitudes and phases are calibrated to maximize image rejection.
Each DAC module contains an FPGA that provides a gigabit ethernet interface, SRAM to store waveform patterns, and sends the waveform data to the DAC module’s 8 DACs. To optimize the use of SRAM, the FPGA implements a simple jump table to allow reusing or repeating waveform segments. A computer loads the desired waveforms and jump table onto each FPGA using a UDP-based protocol and then requests the first (master) FPGA to start. The start pulse is passed down the daisy chain causing the remainder (slave) DACs and ADCs to start.

B. Readout
Qubit state measurement and readout (hereafter “readout”) are done via the dispersive interaction between the qubit and a far-detuned harmonic resonator [14–16]. A change in the qubit state from $|0\rangle$ to $|1\rangle$ causes a frequency shift of the resonator from $\omega_{|0\rangle}$ to $\omega_{|1\rangle}$. A readout probe signal applied to the resonator at a frequency in between $\omega_{|0\rangle}$ and $\omega_{|1\rangle}$ reflects with a phase shift $\phi_{|0\rangle}$ or $\phi_{|1\rangle}$ that depends on the resonator frequency and therefore on the qubit state. By detecting the phase of the reflected probe signal we infer the qubit state. The readout probe signal is generated with the same microwave AWG as the XY control signals, but with a separate local oscillator, and is received and demodulated by the circuit shown in Fig. S2c.
FIG. S2. Control electronics. a, The custom DAC module provides 8 DAC channels (4 shown). DACs are used individually for flux pulses or in pairs combined with a mixer module to comprise a microwave AWG channel (dashed box). b, A single DAC channel and a microwave source are used to bias and pump the parametric amplifier for readout. c, Readout pulses are generated by a microwave AWG. The reflected signal is amplified, mixed down to IF, and then digitized in a pair of ADCs. The digital samples are analyzed in the FPGA.

FIG. S3. Cryogenic wiring. Control and readout signals are carried to and from the Sycamore chip with a set of cables, filters, attenuators, and amplifiers.

FIG. S4. Electronics chassis. Each chassis supports 14 DAC and/or mixer modules. Local oscillators are connected at the top of each mixer module. A set of daisy-chain cables connects from each ADC module to the next. Control signals exit the chassis through coaxial cables.

The readout probe intensity is typically set to populate the readout resonator with only a few photons to avoid readout-induced transitions in the qubit [17]. Detecting this weak signal at room temperature with conventional electronics requires 100 dB of amplification. To limit the integration time to a small fraction of the qubit coherence time, the amplification chain must operate near the quantum noise limit [18, 19]. Inside the cryostat the signal is amplified by an impedance matched lumped element Josephson parametric amplifier (IMPA) [20] on the mixing chamber stage followed by a Low Noise Factory cryogenic HEMT amplifier at 3 K. At room temperature the signal is further amplified before it is mixed down with an IQ mixer producing a pair of intermediate frequency (IF) signals $I(t)$ and $Q(t)$. The IF signals are amplified by a pair of variable gain amplifiers to fine-tune their level, and then digitized by a pair of custom 1 GS/s, 8-bit analog to digital converters (ADC). The digitized samples $I_n$ and $Q_n$ are processed in an FPGA which combines them into a complex phasor

\[ z_n = I_n + i Q_n = E_n \exp\left( i(\omega n\, dt + \phi) \right) , \]

where $dt$ is the sample spacing, $\omega$ is the IF frequency, $\phi$ is the phase that depends on the qubit state, and $E_n$ is the envelope of the reflected readout signal. The envelope is measured experimentally once and then used by the FPGA in subsequent experiments as the optimal demodulation window $w_n$ to extract the phase of the reflected readout signal [21, 22]. The FPGA multiplies $z_n$ by $w_n \exp(-i\omega n\, dt)$, and then sums over time to produce a final complex value proportional to $\exp(i\phi)$:

\[ \sum_{n=0}^{N-1} z_n w_n \exp(-i\omega n\, dt) \propto \exp(i\phi) . \]

In the absence of noise, the final complex value would always be one of two possible values corresponding to the qubit states $|0\rangle$ and $|1\rangle$. However, the noise leads to Gaussian distributions centered at those two points. The size of the clouds is determined mostly by the noise of the IMPA and cryogenic HEMT amplifier, while the separation between the clouds’ centers is determined by the resonator probe power and duration. The signal to noise ratio of the measurement is determined by the clouds’ separation and width [22, 23].
The 54 qubits are divided into nine frequency multiplexed readout groups of six qubits each. Within a group, each qubit is coupled to its own readout resonator, but all six resonators are coupled to a shared bandpass Purcell filter [22, 24, 25]. All qubits in a group can be read out simultaneously by frequency-domain multiplexing [2, 26] in which the total probe signal is a superposition of probe signals at each of the readout resonator frequencies. The phase shifts of these superposed signals are independently recovered in the FPGA by demodulating the complex IQ phasor with each intermediate frequency. In other words, we know what frequencies are in the superposed readout signal and we compute the Fourier coefficients at those frequencies to find the phase of each reflected frequency component.
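The demodulation of a frequency-multiplexed record can be sketched as follows. This is a minimal NumPy illustration, not the FPGA implementation: the IF tones, phases, noise level, record length, and the flat demodulation window are all hypothetical choices, and frequencies are given in Hz (so the angular frequency in the text corresponds to 2πf).

```python
import numpy as np


def demodulate(z, window, if_freqs_hz, dt):
    """Return sum_n z_n w_n exp(-i 2 pi f n dt) for each IF frequency f."""
    n = np.arange(len(z))
    return np.array([
        np.sum(z * window * np.exp(-2j * np.pi * f * n * dt))
        for f in if_freqs_hz
    ])


dt = 1e-9                       # 1 GS/s sample spacing
N = 1000                        # 1 us record (illustrative)
n = np.arange(N)
if_freqs = [20e6, 50e6, 80e6]   # hypothetical IF tones (integer cycles per record)
phases = [0.3, -1.1, 2.0]       # hypothetical state-dependent phases

rng = np.random.default_rng(0)
# Superposition of the reflected tones plus complex Gaussian noise.
z = sum(np.exp(1j * (2 * np.pi * f * n * dt + p))
        for f, p in zip(if_freqs, phases))
z = z + 0.5 * (rng.normal(size=N) + 1j * rng.normal(size=N))

window = np.ones(N)             # flat window; the text uses the measured envelope
recovered = np.angle(demodulate(z, window, if_freqs, dt))
print(recovered)
```

Because each tone completes an integer number of cycles in the record, the tones are mutually orthogonal under the sum and each phase is recovered independently, up to the noise.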
IV. XEB THEORY
We use cross entropy benchmarking (XEB) [6, 27] to calibrate general single- and two-qubit gates, and also to estimate the ﬁdelity of random quantum circuits with a large number of qubits. XEB is based on the observation that the measurement probabilities of a random quantum state have a similar pattern to laser “speckles”, with some bitstrings more probable than others [28, 29]. The same holds for the output state of random quantum circuits. As errors destroy the speckle pattern, this is enough to estimate the rate of errors and ﬁdelity in an experiment. Crucially, XEB does not require the reconstruction of experimental output probabilities, which would need an exponential number of measurements for increasing number of qubits. Rather, we use numerical simulations to calculate the likelihood of a set of bitstrings obtained in an experiment according to the ideal expected probabilities. Below we describe the theory behind this technique in more detail.
A. XEB of a small number of qubits
We ﬁrst consider the use of XEB to obtain the error rate for single- and two-qubit gates. As explained above, for a two-qubit XEB estimation we use sequences of cycles, each cycle consisting of two suﬃciently random single-qubit gates followed by the same two-qubit gate.
The density operator of the system after application of a random circuit $U$ with $m$ cycles can be written as a sum of two parts

\[ \rho_U = \varepsilon_m |\psi_U\rangle\langle\psi_U| + (1 - \varepsilon_m)\, \chi_U , \qquad D = 2^n . \tag{2} \]

Here $|\psi_U\rangle = U|\psi_0\rangle$ is the ideal output state and $\chi_U$ is an operator with unit trace that along with $\varepsilon_m$ describes the effect of errors. For a depolarizing channel model $\chi_U = I/D$ and $\varepsilon_m$ has the meaning of the depolarization fidelity after $m$ cycles. For a small number of qubits, however, part of the operator $\chi_U$ has nonzero matrix elements between the states with no error and the states with the error. Nevertheless, if we undo the evolution of each random circuit and average over an ensemble of circuits, such cross-terms are averaged out and we expect

\[ \overline{U^\dagger \chi_U U} = \frac{I}{D} . \tag{3} \]

Here and below we use the horizontal bar on the top to denote averaging over the ensemble of random circuits. Because of this property it is possible to establish the connection between the quantity $\varepsilon_m$ and the depolarization fidelity after $m$ cycles.
From Eqs. (2) and (3) we get

\[ \overline{U^\dagger \rho_U U} = \varepsilon_m |\psi_0\rangle\langle\psi_0| + (1 - \varepsilon_m) \frac{I}{D} . \tag{4} \]

This is a depolarizing channel. From this and the exponential decay of fidelity we get

\[ \varepsilon_m = p_c^m , \tag{5} \]

connecting $\varepsilon_m$ to the depolarization fidelity $p_c$ per cycle.
The noise model (2) is very general in the context of random circuits. To provide some insight about the origin of this model we consider a specific case with pure systematic error in the two-qubit gate. In this case the resulting pure state after the application of the random circuit $\tilde U$ with the error can be expanded into the direction of the ideal state vector and the orthogonal direction

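\[ \tilde U |\psi_0\rangle = \xi_m |\psi_U\rangle + \sqrt{1 - |\xi_m|^2}\, |\varphi_{\tilde U}\rangle , \tag{6} \]

where

\[ \langle\psi_U|\varphi_{\tilde U}\rangle = 0, \qquad \langle\varphi_{\tilde U}|\varphi_{\tilde U}\rangle = 1 . \tag{7} \]

For the ensemble of random circuits $U$ the error vector is distributed completely randomly in the plane orthogonal to the ideal vector $U|\psi_0\rangle$ (see Fig. S5). This condition of orthogonality is the only constraint on the vector $|\varphi_{\tilde U}\rangle$ that involves $|\psi_U\rangle$. Therefore we expect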

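\[ \overline{U^\dagger |\varphi_{\tilde U}\rangle\langle\varphi_{\tilde U}| U} = \frac{1}{D - 1} \left( I - |\psi_0\rangle\langle\psi_0| \right) . \tag{8} \]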

Also

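\[ \overline{U^\dagger \left( \xi_m \sqrt{1 - |\xi_m|^2}\, |\psi_U\rangle\langle\varphi_{\tilde U}| + \mathrm{h.c.} \right) U} = 0 . \tag{9} \]

This gives the connection between the error vector $|\varphi_{\tilde U}\rangle$ and the operator $\chi_U$

\[ (1 - \varepsilon_m)\chi_U - \frac{1 - \varepsilon_m}{D} |\psi_U\rangle\langle\psi_U| = (1 - |\xi_m|^2)\, |\varphi_{\tilde U}\rangle\langle\varphi_{\tilde U}| + \left( \xi_m \sqrt{1 - |\xi_m|^2}\, |\psi_U\rangle\langle\varphi_{\tilde U}| + \mathrm{h.c.} \right) . \tag{10} \]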

FIG. S5. Cartoon: decomposition of the quantum state into the vector aligned with the ideal quantum state and its orthogonal complement.

The resulting equation

\[ |\xi_m|^2 = \varepsilon_m + \frac{1 - \varepsilon_m}{D} \tag{11} \]

is to be expected, because $|\xi_m|^2$ is the average state fidelity while $\varepsilon_m$ is the depolarization fidelity (see Sec. V). Note that Eqs. (8)–(11) lead to Eq. (4). This result can also be derived assuming that single qubit gates form a 2-design in the Hilbert space of each qubit.
We demonstrate the above findings by numerically simulating random circuits for 2 qubits that contain single-qubit gates randomly sampled from the Haar measure and the iSWAP-like gate

\[ V(\theta) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & -i\sin\theta & 0 \\ 0 & -i\sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} . \tag{12} \]

The systematic error $\Delta\theta = \theta - \pi/2$ corresponds to the deviation of the swap angle from $\pi/2$. Then, assuming that the single-qubit gates are error free, the depolarizing channel model gives the prediction for the depolarizing fidelity per cycle

\[ p_c = \frac{|\mathrm{tr}(V(\theta) V^\dagger(\pi/2))|^2 - 1}{D^2 - 1} = \frac{1}{15} \left( 8\cos(\Delta\theta) + 2\cos(2\Delta\theta) + 5 \right) . \tag{13} \]

As shown in Fig. S6, the depolarizing fidelity $p_c^m$ for the circuit of depth $m$ based on Eq. (13) closely matches the corresponding quantity obtained by averaging the squared overlap over the ensemble of random circuits [cf. Eq. (11)],

\[ \varepsilon_m = \frac{D\, \overline{|\xi_m|^2} - 1}{D - 1} . \tag{14} \]

FIG. S6. Plots of the circuit depolarizing fidelity vs the circuit depth. Solid lines correspond to the predictions from the depolarizing channel model (13) and points correspond to $\varepsilon_m$ (14) obtained by averaging the squared overlap over the ensemble of random circuits. Different colored plots correspond to different values of the swap error $\Delta\theta$ = 0.01 (red), 0.02 (blue), 0.03 (green), 0.04 (pink), 0.05 (black).
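The comparison between Eq. (13) and Eq. (14) can be reproduced in a few lines. The sketch below is an illustrative reimplementation, not the code used for Fig. S6: the helper names, the random seed, the number of circuits, and the depth are hypothetical, and the cycle layout (two Haar-random single-qubit gates followed by the two-qubit gate) follows the description in the text.

```python
import numpy as np


def haar_1q(rng):
    """Haar-random 2x2 unitary via QR of a complex Gaussian matrix."""
    z = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))  # fix column phases so the measure is Haar


def V(theta):
    """iSWAP-like gate of Eq. (12)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0, 0, 0],
                     [0, c, -1j * s, 0],
                     [0, -1j * s, c, 0],
                     [0, 0, 0, 1]], dtype=complex)


def p_cycle(dtheta):
    """Closed form of Eq. (13) for the depolarizing fidelity per cycle."""
    return (8 * np.cos(dtheta) + 2 * np.cos(2 * dtheta) + 5) / 15


def eps_m(dtheta, m, n_circuits, rng):
    """Estimate eps_m from Eq. (14) by averaging the squared overlap."""
    D = 4
    ideal_gate, noisy_gate = V(np.pi / 2), V(np.pi / 2 + dtheta)
    overlaps = []
    for _ in range(n_circuits):
        psi_ideal = np.zeros(D, dtype=complex)
        psi_ideal[0] = 1.0
        psi_noisy = psi_ideal.copy()
        for _ in range(m):
            layer = np.kron(haar_1q(rng), haar_1q(rng))
            psi_ideal = ideal_gate @ (layer @ psi_ideal)
            psi_noisy = noisy_gate @ (layer @ psi_noisy)
        overlaps.append(abs(np.vdot(psi_ideal, psi_noisy)) ** 2)
    return (D * np.mean(overlaps) - 1) / (D - 1)


estimate = eps_m(0.05, 20, 400, np.random.default_rng(1))
print(estimate, p_cycle(0.05) ** 20)
```

The two printed numbers should agree only approximately: the estimate carries sampling noise from the finite number of circuits, and the depolarizing model is itself an approximation for a purely systematic error.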

Returning to the generic case, property (3) can be extended so that for any smooth function f (u) the following relation holds

\[ \overline{\sum_{q \in \{0,1\}^n} f(p_s(q))\, \langle q|\chi_U|q\rangle} = \frac{1}{D}\, \overline{\sum_{q \in \{0,1\}^n} f(p_s(q))} + \epsilon , \tag{15} \]

where $|q\rangle$ is a computational basis state corresponding to bitstring $q$, and $p_s(q) = \langle q|U \rho_0 U^\dagger|q\rangle$ is the simulated (computed) ideal probability of $q$. If the average is performed over a sample of random circuits of size $S$ then the correction is $\epsilon \in O(1/\sqrt{S})$. We tested numerically for the case of $n = 2$ that relation (15) holds even for purely systematic errors in the case of a sufficiently random set of single-qubit gates.
We now make the critical step of estimating the parameter $p_c^m$ from a set of experimental realizations of random circuits with $m$ cycles. We map each measured bitstring $q$ with a function $f(p_s(q))$ and then average this function over the measured bitstrings. The standard XEB [6, 27] uses the natural logarithm, $f(p_s(q)) = \log(p_s(q))$. In the main text we use the linear version of XEB, for which $f(p_s(q)) = D p_s(q) - 1$. Both these functions give higher values to bitstrings with higher simulated probabilities. Another closely related choice is the Heavy Output Generation test [30], for which $f$ is a step-function.
Under the model (2), in an experiment with ideal state preparation and measurement, we obtain the bitstring $q$ with probability

\[ p_c^m\, p_s(q) + (1 - p_c^m)\, \langle q|\chi_U|q\rangle . \tag{16} \]

For the linear XEB, the average value of $D p_s(q) - 1$ when sampling with probabilities given by Eq. (16) is

\[ \overline{\langle D p_s(q) - 1 \rangle} = p_c^m \left( D\, \overline{\sum_q p_s(q)^2} - 1 \right) . \tag{17} \]

Similarly to Eq. (15), the horizontal bar denotes averaging over the random circuits.
The sum on the right hand side of (17) goes over all bitstrings in the computational basis, and can be obtained with numerical simulations. It can also be found analytically assuming that the random circuit ensemble approximates the Haar measure, where for a given $q$ the quantity $p_s(q)$ is distributed with the beta distribution function $(D - 1)(1 - p_s)^{D-2}$. In this case the right hand side in (17) equals $p_c^m\, (2D/(D + 1) - 1)$.
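The Haar-measure value of the sum in Eq. (17) can be checked numerically. The sketch below is illustrative only: it draws Haar-random states directly (normalized complex Gaussian vectors) rather than simulating random circuits, and the dimension, seed, and number of states are hypothetical choices.

```python
import numpy as np


def haar_probabilities(D, rng):
    """Output probabilities of a Haar-random state in dimension D."""
    amp = rng.normal(size=D) + 1j * rng.normal(size=D)
    p = np.abs(amp) ** 2
    return p / p.sum()


rng = np.random.default_rng(0)
D = 2 ** 10

# Average of D * sum_q p_s(q)^2 - 1 over 200 Haar-random states.
numeric = np.mean([D * np.sum(haar_probabilities(D, rng) ** 2) - 1
                   for _ in range(200)])
analytic = 2 * D / (D + 1) - 1
print(numeric, analytic)
```

For large $D$ the analytic value approaches 1, which is why the linear XEB average can be read directly as a fidelity in that limit.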
The experimental average on the left hand side of (17) can be estimated with accuracy $1/\sqrt{S N_s}$ using $S$ random circuit realizations with $N_s$ samples each

\[ \frac{1}{S N_s} \sum_{j=1}^{S} \sum_{i=1}^{N_s} \left( D p_s^j(q_{i,j}) - 1 \right) = \overline{\langle D p_s(q) - 1 \rangle} + O\!\left( \frac{1}{\sqrt{S N_s}} \right) . \tag{18} \]

This gives an estimate of $p_c^m$.
This estimate can be justified using Bayes rule. The log-likelihood for a set of experimental measurements $\{q_{i,j}\}$, assuming that the experimental probabilities are given by Eq. (16) with $\langle q|\chi_U|q\rangle = 1/D$, is equal, up to an additive constant, to

\[ \sum_{j=1}^{S} \sum_{i=1}^{N_s} \log\left[ 1 + p_c^m \left( D p_s^j(q_{i,j}) - 1 \right) \right] , \tag{19} \]

where $p_s^j(q)$ is a simulated probability corresponding to the $j$-th circuit realization. We want to maximize the log-likelihood as a function of $p_c^m$. Taking the derivative with respect to $p_c^m$ and equating to 0 we obtain

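\[ \sum_{j=1}^{S} \sum_{i=1}^{N_s} \frac{D p_s^j(q_{i,j}) - 1}{1 + p_c^m \left( D p_s^j(q_{i,j}) - 1 \right)} = 0 . \tag{20} \]

For $p_c^m \ll 1$ it is easy to solve this equation and obtain the estimate

\[ p_c^m \approx \frac{\sum_{j=1}^{S} \sum_{i=1}^{N_s} \left( D p_s^j(q_{i,j}) - 1 \right)}{\sum_{j=1}^{S} \sum_{i=1}^{N_s} \left( D p_s^j(q_{i,j}) - 1 \right)^2} \approx \frac{\overline{\langle D p_s(q) - 1 \rangle}}{D\, \overline{\sum_q p_s(q)^2} - 1} . \tag{21} \]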
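Equation (20) can also be solved numerically without the small-polarization approximation. The following sketch uses bisection on synthetic data; the dimension, true polarization, sample count, and function names are hypothetical, and the ideal probabilities come from a single Haar-random state rather than a simulated circuit.

```python
import numpy as np


def mle_polarization(x, tol=1e-10):
    """Solve Eq. (20), sum_i x_i / (1 + p x_i) = 0, for p in [0, 1].

    x_i = D * p_s(q_i) - 1 for each measured bitstring. The left-hand
    side decreases monotonically in p, so bisection is sufficient.
    """
    def score(p):
        return np.sum(x / (1.0 + p * x))

    lo, hi = 0.0, 1.0
    if score(lo) <= 0:
        return 0.0
    if score(hi) >= 0:
        return 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def small_p_estimate(x):
    """First approximation in Eq. (21): sum(x) / sum(x^2)."""
    return np.sum(x) / np.sum(x ** 2)


# Synthetic data: bitstrings drawn from p * p_s(q) + (1 - p) / D.
rng = np.random.default_rng(0)
D, p_true, n_samples = 2 ** 10, 0.05, 200_000
amp = rng.normal(size=D) + 1j * rng.normal(size=D)
p_s = np.abs(amp) ** 2 / np.sum(np.abs(amp) ** 2)
q = rng.choice(D, size=n_samples, p=p_true * p_s + (1 - p_true) / D)
x = D * p_s[q] - 1

p_mle, p_approx = mle_polarization(x), small_p_estimate(x)
print(p_mle, p_approx)
```

The closed-form estimate is biased slightly low at finite polarization because it drops higher-order terms in $p_c^m$; the two estimates converge as the polarization goes to zero.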

In the spirit of the XEB method, we can use other functions $f(p_s(q))$ to estimate $p_c^m$. One alternative is derived from the log-likelihood of a sample $\{q_{i,j}\}$ with respect to the simulated (computed) ideal probabilities

\[ \log \prod_{j=1}^{S} \prod_{i=1}^{N_s} p_s^j(q_{i,j}) = \sum_{j=1}^{S} \sum_{i=1}^{N_s} \log p_s^j(q_{i,j}) , \tag{22} \]

which, after normalization by the number of samples, converges (up to sign) to the cross entropy between experimental probabilities and simulated probabilities. The experimental average of the function $f(p_s(q)) = \log p_s(q)$ under the probabilities from Eq. (16) with additional averaging over random circuits is

\[ \overline{\langle \log p_s(q) \rangle} = p_c^m\, \overline{\sum_q \left( p_s(q) - 1/D \right) \log p_s(q)} + \frac{1}{D}\, \overline{\sum_q \log p_s(q)} . \tag{23} \]

As before, the sums on the right hand side can be obtained with numerical simulations and the average value on the left hand side can be estimated experimentally. This also gives an estimate of $p_c^m$.
Both Eq. (17) and Eq. (23) give a linear equation, from which we can obtain an estimate of the total polarization $p_c^m$ for an experimental implementation of one quantum circuit with $m$ cycles. We normally use multiple circuits with the same number of cycles $m$ to estimate $p_c^m$, which we can do using the least squares method. Finally, we obtain an estimate of $p_c$ from a fit of the estimates $p_c^m$ as an exponential decay in $m$. This is standard in randomized benchmarking [31, 32]. One advantage of this method is that it allows us to estimate the cycle polarization $p_c$ independently of the state preparation and measurement errors (SPAM). See also below.
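The final fitting step can be sketched as a log-linear least-squares fit. The model below, in which the measured polarization is $A\, p_c^m$ with a depth-independent prefactor $A$ absorbing SPAM, is an assumption made for illustration (it is the usual randomized-benchmarking form); the function name and the numerical values are hypothetical.

```python
import numpy as np


def fit_cycle_polarization(depths, polarizations):
    """Least-squares fit of log(polarization) = log(A) + m * log(p_c).

    Returns (p_c, A), where A is a depth-independent prefactor that
    absorbs state preparation and measurement errors.
    """
    slope, intercept = np.polyfit(depths, np.log(polarizations), 1)
    return np.exp(slope), np.exp(intercept)


depths = np.array([5, 10, 20, 40, 80])
true_pc, spam = 0.99, 0.93          # hypothetical values
data = spam * true_pc ** depths     # noise-free synthetic polarizations

pc_fit, a_fit = fit_cycle_polarization(depths, data)
print(pc_fit, a_fit)
```

Because SPAM only shifts the intercept of the fit, the slope, and hence $p_c$, is unaffected by it. With noisy data a weighted or nonlinear fit may be preferable, since taking the logarithm distorts the error bars at low polarization.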

B. XEB of a large number of qubits

We now consider the case of a large number of qubits $n \gg 1$. We are typically interested in estimating the fidelity $F$ of each of a set of circuits with a given number of qubits and depth. As above, we write the output of an approximate implementation of the random quantum circuit $U$ as

\[ \rho_U = F |\psi_U\rangle\langle\psi_U| + (1 - F)\, \chi_U , \tag{24} \]

where $|\psi_U\rangle$ is the ideal output and $F = \langle\psi_U|\rho_U|\psi_U\rangle$ is the fidelity. We do not necessarily assume $\chi_U = I/D$, and we will ignore the small difference, of order $2^{-n}$ for $n \gg 1$, between the fidelity $F$ and the depolarization fidelity $p$.
As for the case of small number of qubits $n$, we map each output bitstring $q$ with a function $f(p_s(q))$. Given that the values $\langle q|\chi_U|q\rangle$ resulting from errors are typically uncorrelated with the chaotic “speckles” of $p_s(q)$, we make our main assumption

\[ \sum_q \langle q|\chi_U|q\rangle\, f(p_s(q)) = \frac{1}{D} \sum_q f(p_s(q)) + \epsilon . \tag{25} \]

FIG. S7. Absolute value of the XEB fidelity between a random quantum circuit and the same circuit with a single Pauli error. Markers show the median over all possible positions in the circuit for both bit-flip and phase-flip errors. Error bars correspond to the first and third quartile. The dashed lines are the $1/\sqrt{D}$ theory prediction. [Plot: XEB fidelity versus circuit depth for 16, 20, and 24 qubits.]

This equation is trivial if we assume a depolarizing model, $\chi_U = I/D$. More generally, it can be understood in the geometric context of concentration of measure [33–36] for high dimensional spaces, and from Levy’s lemma [37] we expect a typical statistical fluctuation $\epsilon \in O(1/\sqrt{D})$ with $D = 2^n$. We will only require $\epsilon \ll F$. We check Eq. (25) numerically for the output $\rho_e = |\psi_e\rangle\langle\psi_e|$ where $|\psi_e\rangle$ is the wave function obtained after a single phase-flip or bit-flip error is added somewhere in the circuit, see Fig. S7 and Ref. [27]. We have also tested this assumption numerically comparing the fidelity with the XEB estimate for a pure state $\sqrt{F}\, |\psi_U\rangle + \sqrt{1 - F}\, |\psi_\perp\rangle$, see also Ref. [38] and Section X.
From Eqs. (24) and (25) we obtain Eq. (17) for linear XEB, $f(p_s(q)) = D p_s(q) - 1$ ($F_{\rm XEB}$ in the main text). We also obtain Eq. (23) for XEB, $f(p_s(q)) = \log p_s(q)$, with $p_c^m$ replaced by fidelity $F$. As before, the sums on the right hand side can be obtained with numerical simulations and the average value on the left hand side can be estimated experimentally with accuracy $1/\sqrt{N_s}$ using $N_s$ samples. This gives an estimate of $F$.
In practice, circuits of enough depth (as in the experiments reported here) exhibit the Porter-Thomas distribution for the measurement probabilities $p = \{p_s(q)\}$, that is

\[ \Pr(p) = D e^{-Dp} . \tag{26} \]

In this case the linear cross entropy Eq. (17) gives

\[ F = \langle D p_s(q) - 1 \rangle . \tag{27} \]

The standard deviation of the estimate of $F$ with $N_s$ samples from the central limit theorem is $\sqrt{(1 + 2F - F^2)/N_s}$. The cross entropy Eq. (23) gives

\[ F = \langle \log D p_s(q) \rangle + \gamma , \tag{28} \]

where $\gamma$ is the Euler-Mascheroni constant ≈ 0.577. The standard deviation of the estimate of $F$ with $N_s$ samples is $\sqrt{(\pi^2/6 - F^2)/N_s}$. The logarithmic XEB has a smaller standard deviation for $F > 0.32$ (it is the best estimate when $F \approx 1$), while for $F < 0.32$ the linear XEB has a smaller standard deviation (it is the best estimate for $F \ll 1$, where it relates to the maximum likelihood estimator). See Fig. S8 for comparison of the fidelity estimates produced by the linear and logarithmic XEB.

FIG. S8. Comparison of fidelity estimates obtained using linear XEB, Eq. (27) and logarithmic XEB, Eq. (28) from bitstrings observed in our experiment using elided circuits (see Section VII G 1). Standard deviation smaller than markers. [Plot: fidelity estimate versus number of qubits, n.]
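The two estimators and their standard deviations can be compared on synthetic data. The sketch below assumes the large-$D$ Porter-Thomas limit, in which $D p_s(q)$ for a bitstring drawn from the ideal distribution follows a Gamma(2) law and for a uniformly drawn bitstring follows an exponential law; this sampling shortcut, the function names, and the numerical values are illustrative assumptions rather than the analysis code of the experiment.

```python
import numpy as np


def sample_scaled_probs(fidelity, n_samples, rng):
    """Draw x = D * p_s(q) for bitstrings from F * ideal + (1 - F) * uniform."""
    from_ideal = rng.random(n_samples) < fidelity
    return np.where(from_ideal,
                    rng.gamma(2.0, 1.0, n_samples),      # ideal sampling
                    rng.exponential(1.0, n_samples))     # uniform sampling


def linear_xeb(x):
    """Eq. (27): F = <D p_s(q) - 1>."""
    return np.mean(x) - 1.0


def log_xeb(x):
    """Eq. (28): F = <log D p_s(q)> + gamma."""
    return np.mean(np.log(x)) + np.euler_gamma


def std_linear(F, n_samples):
    return np.sqrt((1 + 2 * F - F ** 2) / n_samples)


def std_log(F, n_samples):
    return np.sqrt((np.pi ** 2 / 6 - F ** 2) / n_samples)


rng = np.random.default_rng(0)
F_true, n_samples = 0.5, 200_000
x = sample_scaled_probs(F_true, n_samples, rng)
f_lin, f_log = linear_xeb(x), log_xeb(x)

# Fidelity at which the two standard deviations coincide.
crossover = (np.pi ** 2 / 6 - 1) / 2
print(f_lin, f_log, crossover)
```

Setting the two variances equal gives $F = (\pi^2/6 - 1)/2 \approx 0.32$, consistent with the crossover quoted in the text.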
We note in passing another example for an estimator of $F$ related to the HOG test [30], which counts the number of measured bitstrings with probabilities $p_s(q)$ greater than the median of the probabilities. The function $f(p_s(q))$ in this case returns 1 for $D p_s(q) \ge \log(2)$, and 0 in the other case. The fidelity estimator uses the following normalization

$$F = \frac{1}{\log(2)} \left\langle 2 n_s(q) - 1 \right\rangle, \tag{29}$$

where $n_s(q)$ is defined to be 1 if $D p_s(q) \ge \log(2)$, and 0 otherwise. The standard deviation of this estimator is $\sqrt{[\log^{-2}(2) - F^2]/N_s}$, which is always larger than for the logarithmic XEB, Eq. (28). See Fig. S9 for comparison of the fidelity estimates produced by linear XEB and the HOG-based fidelity estimator. The HOG test is also related to a definition of quantum volume [39].
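The three estimators of Eqs. (27)-(29) can be compared on synthetic data. The following is a minimal sketch, not the analysis code used for the experiment. It assumes the large-$D$ Porter-Thomas limit, in which the scaled probability $u = D p_s(q)$ of a bitstring sampled from the ideal distribution has density $u e^{-u}$ and that of a uniformly random bitstring has density $e^{-u}$. A state of fidelity $F$ is modeled as a mixture of the two. All function names are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def sample_scaled_probabilities(fidelity, n_samples, rng):
    """Draw u = D * p_s(q) for each sampled bitstring (large-D limit).

    With probability `fidelity` the bitstring is drawn from the ideal
    distribution (u ~ Gamma(2, 1)); otherwise it is uniformly random
    (u ~ Exp(1)).
    """
    samples = []
    for _ in range(n_samples):
        if rng.random() < fidelity:
            samples.append(rng.gammavariate(2.0, 1.0))
        else:
            samples.append(rng.expovariate(1.0))
    return samples


def linear_xeb(u):
    """Eq. (27): F = <D p> - 1."""
    return sum(u) / len(u) - 1.0


def log_xeb(u):
    """Eq. (28): F = <log(D p)> + gamma."""
    return sum(math.log(x) for x in u) / len(u) + EULER_GAMMA


def hog_fidelity(u):
    """Eq. (29): F = <2 n - 1> / log(2), with n = 1 if D p >= log(2)."""
    n_heavy = sum(1 for x in u if x >= math.log(2))
    return (2.0 * n_heavy / len(u) - 1.0) / math.log(2)


def predicted_std(kind, fidelity, n_samples):
    """Standard deviations quoted in the text for each estimator."""
    variance = {
        "linear": 1 + 2 * fidelity - fidelity**2,
        "log": math.pi**2 / 6 - fidelity**2,
        "hog": 1 / math.log(2) ** 2 - fidelity**2,
    }[kind]
    return math.sqrt(variance / n_samples)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
u = sample_scaled_probabilities(0.3, 100_000, rng)
estimates = {
    "linear": linear_xeb(u),
    "log": log_xeb(u),
    "hog": hog_fidelity(u),
}
```

Equating the linear and logarithmic variances, $1 + 2F = \pi^2/6$, gives the crossover $F = (\pi^2/6 - 1)/2 \approx 0.32$ quoted above.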

C. Two limiting cases
Here, we consider two special cases of Eq. (27) and the formula (1) in the main paper [1]. First, suppose bitstrings $q_i$ are sampled from the uniform distribution. In this case the sampling probability is $1/D$ for every bitstring and $F_{\rm XEB} = 0$. Therefore, if the qubits are in the maximally mixed state, the estimator yields zero fidelity, as expected.
Second, suppose that bitstrings are sampled from the theoretical output distribution of a random quantum circuit. Assume that the distribution has Porter-Thomas shape. By Eq. (26), the fraction of bitstrings with theoretical probability in $[p, p + dp]$ is

$$\Pr(p)\, dp = D e^{-Dp}\, dp \tag{30}$$

and the total number of such bitstrings is

$$N(p)\, dp = D^2 e^{-Dp}\, dp. \tag{31}$$

Therefore, the probability that a bitstring with probability in $[p, p + dp]$ is sampled equals

$$p \cdot N(p)\, dp = p D^2 e^{-Dp}\, dp = f(p)\, dp \tag{32}$$

where $f(p)$ is the probability density function of the random variable defined as the ideal probability of a sampled bitstring, i.e. the random variable which is being averaged in the formula (1) of the main paper. Thus, the average probability of a sampled bitstring is

$$\langle p_s(q) \rangle = \int_0^1 p f(p)\, dp = \int_0^1 p^2 D^2 e^{-Dp}\, dp = \frac{2}{D}\left[1 - e^{-D}\left(\frac{D^2}{2} + D + 1\right)\right] \approx \frac{2}{D}. \tag{33}$$

Substituting into equation (1) in the main paper yields $F_{\rm XEB} = 1$. The general case of a depolarizing error can be obtained from the two limiting cases by convex combination.

FIG. S9. Comparison of fidelity estimates obtained using linear XEB, Eq. (27) and normalized HOG score, Eq. (29) from bitstrings observed in our experiment using elided circuits (see Section VII G 1). Standard deviation smaller than markers. [Legend: linear XEB, normalized HOG score. Axes: fidelity estimate vs. number of qubits, n.]

D. Measurement errors
We now consider how measurement errors affect the estimation of fidelity. Let us assume uncorrelated classical measurement errors, so that if the “actual” measurement result of a qubit is 0, we can get 1 with probability $e_{m0}$, and similarly with probability $e_{m1}$ we get 0 for actual result 1, i.e., $p(1|0) = e_{m0}$, $p(0|0) = 1 - e_{m0}$, $p(0|1) = e_{m1}$, $p(1|1) = 1 - e_{m1}$. In this case the probability to get measurement result $q = k_1 k_2 .. k_n$ for actual result $q' = k'_1 k'_2 .. k'_n$ is the product of the corresponding factors. The probability of correct measurement result is then

$$p_m(q') = (1 - e_{m0})^{n - |q'|} (1 - e_{m1})^{|q'|} \approx (1 - e_{m0})^{n/2} (1 - e_{m1})^{n/2}, \tag{34}$$

where $|q'|$ is the number of 1s (Hamming distance from 00..0) in the initial bitstring $q'$, and in the second expression we approximated $|q'|$ with $n/2$ for large $n$.
Now let us make a natural assumption that if there was one or more measurement errors, $q' \to q$, then the resulting ideal probability $p_s(q)$ is uncorrelated with the actual ideal probability $p_s(q')$. Using this assumption we can write

$$F = F_U\, p_m \tag{35}$$

where $F_U$ is the circuit fidelity and $F$ is the complete (effective) fidelity. The complete fidelity $F$ is estimated as before. The measurement fidelity $p_m$ can be obtained independently. For instance, we can prepare a bitstring $q$ and measure immediately to obtain the probability of a correct measurement result for $q$. We obtain $p_m$ by repeating this for a set of random bitstrings. We can therefore obtain $F_U$ from Eq. (35). As explained above, fitting the depolarization fidelity per cycle $p_c$ for different circuit depths $m$ is also a method to separate measurement errors.
The state preparation errors can be treated similarly, assuming that a single error leads to uncorrelated resulting distribution $p_s(q)$, so that the measurement fidelity $p_m$ in Eq. (35) is combined with a similar factor describing the state preparation fidelity.

V. QUANTIFYING ERRORS

An important test for this experiment is predicting XEB fidelity $F_{\rm XEB}$ based on simpler measurements of single- and two-qubit errors. Here we review how this is calculated, illustrating important principles with the example of a single qubit. The general theory is described at the end of this section.
First, we assume Pauli errors describe decoherence using a depolarizing model. This model is used, for example, to compute thresholds and logical error rates for error correction. The parameter describing decoherence in a single qubit is the Pauli error eP , giving a probability eP /3 for applying an erroneous X, Y, or Z gate to the qubit after the gate, corresponding to a bit and/or phase ﬂip.
Second, the depolarization model is assumed to describe the system state using simple classical probability. The probability of no error for many qubits and many operations, corresponding to no change to the system state, is then found by simply multiplying the probability of no error for each qubit gate. This is a good assumption for RB and XEB since a bit- or phase-flip error effectively decorrelates the state. The depolarization model assumes that when there is an error with probability $e_d$, the system state is randomized over all qubit states, which span a Hilbert space of dimension $D = 2^n$. This is described by a change in density matrix $\rho \to (1 - e_d)\rho + e_d \times \mathbb{1}/D$. Note the depolarization term has a small possibility of the state resetting back to its original state. For a single qubit where $D = 2$, this can be described using a Pauli-error type model as a probability $e_d/4$ of applying an I, X, Y, or Z gate. Comparing to the Pauli model, the error probability thus needs to be rescaled by $e_d = e_P/(1 - 1/D^2)$. This gives a net polarization $p$ of the qubit state due to many Pauli errors as

$$p = \prod_i \left[ 1 - \frac{e_P(i)}{1 - 1/D^2} \right]. \tag{36}$$

Third, the effect of this depolarization has to be accounted for considering the measured signal. The measured signal for randomized benchmarking is given by RB $= p(1 - 1/D) + 1/D$, which can be understood by the physical argument that a complete randomization of the state has a $1/D$ chance to give the correct final state. A cross-entropy benchmarking measurement gives $F_{\rm XEB} = p$. A measurement of $p$, which can have offsets and prefactors in these formulas, also includes other scaling factors coming from state preparation and measurement errors. All of these scaling issues are circumvented by applying gates in a repeated number of cycles $m$ such that $p = p_c^m$. A measurement of the signal versus $m$ can then directly pull out the fractional polarization change per cycle, $p_c$, independent of these scale factors.
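Extracting $p_c$ from a signal that decays as a prefactor times $p_c^m$ can be sketched as a straight-line fit to the logarithm of the signal. The example below is an illustration under stated assumptions, not the fitting procedure used in the experiment. It assumes an XEB-like signal with no additive offset, noise-free synthetic data, and made-up values for the prefactor and $p_c$. The function name `fit_decay` is hypothetical.

```python
import math


def fit_decay(depths, signals):
    """Least-squares line through (m, log signal).

    Returns (prefactor, p_c) for the model signal = prefactor * p_c**m.
    """
    n = len(depths)
    logs = [math.log(s) for s in signals]
    mean_m = sum(depths) / n
    mean_y = sum(logs) / n
    covariance = sum((m - mean_m) * (y - mean_y) for m, y in zip(depths, logs))
    variance = sum((m - mean_m) ** 2 for m in depths)
    slope = covariance / variance
    intercept = mean_y - slope * mean_m
    return math.exp(intercept), math.exp(slope)


# Synthetic signal: a state-preparation-and-measurement prefactor times p_c**m.
# Both numbers are illustrative placeholders, not measured values.
true_prefactor, true_pc = 0.95, 0.99
depths = [2, 4, 8, 16, 32, 64]
signals = [true_prefactor * true_pc**m for m in depths]

prefactor, p_c = fit_decay(depths, signals)
```

The fitted slope depends only on $p_c$, so the prefactor (which absorbs state preparation and measurement errors) drops out of the per-cycle estimate.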
Fourth, from this polarization change we can then compute the Pauli error, which is the metric that should be reported since it is the fundamental error rate that is independent of $D$. Unfortunately, a fidelity $1 - e_P/(1 + 1/D)$ for RB is commonly reported, which has a $D$-dependent correction. We recommend this practice be changed, but note that removing the $1/(1 + 1/D)$ factor decreases the reported fidelity value. We also recommend reporting Pauli error $e_P$ instead of entanglement fidelity $(1 - e_P)$, since it is more intuitive to understand how close some quantity is to 0 than to 1. Table I summarizes the different error metrics and their relations.
This general model can also account for nondepolarizing errors such as energy decay, since quantum states in an algorithm typically average over the entire Bloch sphere (as in XEB), or for example when the algorithm purposely inserts spin-echoes. Thus the average eﬀect of energy decay eﬀectively randomizes the state in a way compatible with Pauli errors. For a gate of length tg with a qubit decay time T1, averaging over the Bloch sphere (2 poles and 4 equator positions) gives (to ﬁrst order) an average error probability ea = tg/3T1. Using Table I, this converts to a Pauli error eP = tg/2T1.
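The conversions between error metrics used in this section can be written out as a short sketch. The relations are those of Table I and the $T_1$ estimate above; the function names are hypothetical, and the gate time and $T_1$ in the last line are arbitrary illustrative numbers in the same units.

```python
def hilbert_dimension(n_qubits):
    return 2**n_qubits


def pauli_error_from_decay(p, n_qubits):
    """e_P = (1 - p)(1 - 1/D^2)."""
    d = hilbert_dimension(n_qubits)
    return (1 - p) * (1 - 1 / d**2)


def depolarization_error(pauli_error, n_qubits):
    """e_d = 1 - p = e_P / (1 - 1/D^2)."""
    d = hilbert_dimension(n_qubits)
    return pauli_error / (1 - 1 / d**2)


def average_error(pauli_error, n_qubits):
    """e_a = (1 - p)(1 - 1/D) = e_P / (1 + 1/D)."""
    d = hilbert_dimension(n_qubits)
    return pauli_error / (1 + 1 / d)


def t1_pauli_error(gate_time, t1):
    """Single-qubit Pauli error from energy decay, to first order.

    Average error e_a = t_g / (3 T1); for D = 2, e_P = e_a (1 + 1/2).
    """
    e_a = gate_time / (3 * t1)
    return e_a * (1 + 1 / 2)


# Reproduce the 0.1% Pauli error columns of Table I.
table = {
    n: (average_error(1e-3, n), depolarization_error(1e-3, n)) for n in (1, 2)
}
# Arbitrary illustrative ratio t_g / T1 = 1 / 100.
example_t1_error = t1_pauli_error(gate_time=1.0, t1=100.0)
```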
A detailed theory of the D scaling factor is as follows. In order to arrive at a ﬁrst order estimate on how error rates accumulate on random quantum circuits, the errors can be modeled via the set of Kraus operators. The density matrix of the system ρ after application of a gate is connected to the density matrix ρ0 before the gate as follows:

$$\rho = \Lambda(\rho_0) = \sum_{k=0}^{K} A_k \rho_0 A_k^\dagger, \qquad \sum_k A_k^\dagger A_k = \mathbb{1}. \tag{37}$$

For the closed-system quantum evolution with unitary U (no dephasing nor decay) the sum on the right hand side contains only one term with k=0 and A0 = U . In general, Kraus operators describe the physical eﬀects of many types of errors (control error, decoherence, etc.) that can explicitly depend on the gate. Knowing the Kraus operators allows us to calculate the total error budget as well as its individual components.
Conventionally, circuit ﬁdelities are reported as a metric of its quality. To make a connection to physically observable quantities, the average ﬁdelity can be expressed in terms of Kraus operators. In the absence of leakage errors and cross-talk the average ﬁdelity equals

$$F = 1 - \frac{e_P}{1 + 1/D}, \qquad e_P = 1 - \frac{1}{D^2} \sum_{k=0}^{K} \left| \mathrm{tr}(U A_k^\dagger) \right|^2 \tag{38}$$

where $D = 2^n$ is the dimension of the Hilbert space and the quantity $e_P$ plays a role of a Pauli error probability in the depolarizing channel model (see below).
For random circuits the eﬀects of errors can be described by a depolarizing channel model, with Kraus operators of the form

$$A_k = \sqrt{\frac{e_P}{D^2 - 1}}\, P_k U, \quad k \neq 0, \qquad A_0 = \sqrt{1 - e_P}\, P_0 U, \qquad P_k = \sigma_{k_1} \otimes \sigma_{k_2} \otimes \ldots \otimes \sigma_{k_n} \tag{39}$$

where $P_k$ are strings of Pauli operators $\sigma_{k_j}$ for individual qubits for $k_j = 1, 2, 3$ and also identity matrices $\sigma_0$ in the qubit subspace for $k_j = 0$. This form assumes that individual Pauli errors all happen with the same probability $e_P$.

TABLE I. A Rosetta stone translation between error metrics. In single- and two-qubit RB or XEB experiments, we measure the per-gate (or per-cycle) depolarization decay constant p. The second column shows conversions from this rate to the various error metrics. The last two columns are representative comparisons for 0.1% Pauli error.

Error metric                 Relation to depolarization decay constant p   n=1 (D=2)   n=2 (D=4)
Pauli error (eP, rP) [a]     (1 − p)(1 − 1/D^2)                            0.1%        0.1%
Average error (ea, r)        (1 − p)(1 − 1/D)                              0.067%      0.08%
Depolarization error (ed)    1 − p                                         0.133%      0.107%

[a] 1 − process fidelity, or 1 − entanglement fidelity

To make a connection to experimental measurements of the cross-entropy we substitute (39) into (37) and obtain

$$\Lambda(\rho_0) = (1 - e_P)\, U \rho_0 U^{-1} + \frac{e_P}{D - 1/D} \left( \mathbb{1} - \frac{U \rho_0 U^{-1}}{D} \right). \tag{40}$$

We compare this expression with the standard form of the depolarizing channel model

$$\Lambda(\rho_0) = p\, U \rho_0 U^{-1} + (1 - p)\, \frac{\mathbb{1}}{D}, \tag{41}$$

expressed in terms of the depolarization ﬁdelity parameter p. Note the diﬀerence between the expressions. On the one hand, in (41) the second term corresponds to full depolarization in all directions. On the other hand, in (40) the second term describes full depolarization in all directions except for the direction corresponding to the ideal quantum state.
From (40), (41) one can establish the connection between the Pauli error rate and depolarizing ﬁdelity parameter p

$$e_P = (1 - p)(1 - 1/D^2). \tag{42}$$
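The step from (40) and (41) to (42) can be spelled out by matching coefficients. The following worked check is a reader's aid rather than part of the original derivation. It writes the identity as $\mathbb{1}$ and equates the coefficient of $\mathbb{1}$ in the two channels, then verifies the coefficient of $U \rho_0 U^{-1}$ for consistency:

$$
\begin{aligned}
\frac{e_P}{D - 1/D} &= \frac{1 - p}{D}
\;\Longrightarrow\;
e_P = (1 - p)\,\frac{D - 1/D}{D} = (1 - p)\left(1 - \frac{1}{D^2}\right), \\
(1 - e_P) - \frac{e_P}{D^2 - 1} &= 1 - \frac{e_P}{1 - 1/D^2} = 1 - (1 - p) = p .
\end{aligned}
$$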

We note that the explicit assumption of connecting Pauli errors to depolarization is needed for the small D case, typically for single- and two-qubit error measurements. Once we have measured the Pauli errors, then only a simple probabilistic calculation is needed to compute FXEB in the large D case.
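The "simple probabilistic calculation" amounts to multiplying the no-error probabilities of all operations. A minimal sketch follows, assuming independent errors; the gate counts and error rates are illustrative placeholders, not values from the experiment, and `predicted_fidelity` is a hypothetical name.

```python
import math


def predicted_fidelity(pauli_errors):
    """Probability that none of the listed operations suffered an error."""
    return math.prod(1.0 - e for e in pauli_errors)


# Illustrative placeholder circuit: counts and error rates are assumptions.
single_qubit_errors = [0.001] * 100  # 100 single-qubit gates
two_qubit_errors = [0.005] * 40      # 40 two-qubit gates
readout_errors = [0.03] * 10         # 10 measured qubits

fidelity = predicted_fidelity(
    single_qubit_errors + two_qubit_errors + readout_errors
)
```

For small errors the product is close to $\exp(-\sum_i e_i)$, which makes it clear why the total fidelity falls off exponentially with the number of operations.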

VI. METROLOGY AND CALIBRATION
A. Calibration overview
Quantum computations are physically realized through the time-evolution of quantum systems steered by analog control signals. As quantum information is stored in continuous amplitudes and phases, these control signals must be carefully chosen to achieve the desired result. Calibration is the process of performing a series of experiments on the quantum system to learn optimal control parameters.

Calibration is challenging for a number of reasons. Analog control requires careful control-pulse shaping as any deviation from the ideal will introduce error. Qubits require individual calibration as variations in the control system and qubits necessitate diﬀerent control parameters to hit target ﬁdelities. Optimal control parameters can also drift in time, requiring calibrations to be revisited to maintain performance. Additionally, the full calibration procedure requires bootstrapping: using a series of control sequences with increasing complexity to determine circuit and control parameters to increasingly higher degrees of precision. Lastly, each qubit needs to perform a number of independent operations which are independently calibrated: single-qubit gates, two-qubit gates, and readout.
Our Sycamore processor oﬀers a high degree of programmability: we can dynamically change the frequency of each qubit, as well as the eﬀective qubit-qubit coupling between nearest neighbor qubits. This tunability gives us the freedom to enact many diﬀerent control strategies, as well as account for non-uniformities in the processor’s parameters. However, these extra degrees of freedom are a double-edged sword. Additional control knobs always introduce a source of decoherence and control errors as well as an added burden on calibration.
Our approach is to systematize and automate our calibration procedure as much as possible, thus abstracting complexity away. This automation allows us to turn calibration into a science, where we can compare calibration procedures to determine optimal strategies for time, performance, and reliability. By employing calibration science to study full-system performance with diﬀerent control strategies, we have been able to improve full-system ﬁdelities by over an order of magnitude from initial attempts while decreasing the calibration time and improving reliability. Lastly, we design our calibration to be done almost entirely at the single- or two-qubit level, rather than at the system level, in order to be as scalable as possible.
1. Device registry
The device registry is a database of control variables and configuration information we use to control our quantum processors. The registry stores information such as operating frequencies, control biases, and gate parameters such as duration, amplitude, parameterization of circuit models, etc. The goal of calibration is to experimentally determine and populate the registry with optimal control parameters. We typically store >100 parameters per qubit to achieve high fidelity across all of the various qubit operations. The large number of parameters and subtle interdependencies between them highlight the need for automated calibration.

FIG. S10. Optimus calibration graph for Sycamore. Calibration of physical qubits is a bootstrapping procedure between different pulse sequences or “experiments” to extract control and system parameters. Initial experiments are coarse and have interplay between fundamental operations and elements such as single-qubit gates, readout, and the coupler. Final experiments involve precise metrology for each of the qubit operations: single-qubit gates, two-qubit gates, and readout. [Figure labels: first; last two-qubit gate; last single-qubit gate; last readout. Legend: single-qubit control, coupler control, single-qubit gates, two-qubit gates, readout.]

2. Scheduling calibrations: “Optimus”
We seek a strategy for identifying and maintaining optimal control parameters for a system of physical qubits given incomplete system information. To perform these tasks, we use the “Optimus” formulation as in Ref. [40], where each calibration is a node in a directed acyclic graph that updates one or more registry parameters, and the bootstrapping nature of calibration sequences is represented as directed edges between nodes. Now, calibrating a system of physical qubits becomes a well-defined graph traversal problem. The calibration graph used for the Sycamore device can be seen in Figure S10. This strategy is particularly useful for maintaining calibrations in the presence of drift, where we want to do the minimal amount of work to bring the system back in spec, and when extending the calibration procedure, as interdependencies are explicit. Typical timescales for bringup of a new Sycamore processor are approximately 36 hours upon first cooldown, and 4 hours per day thereafter for maintaining calibrations. These times are specific to currently available technology, and can be significantly improved.
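The graph-traversal view can be illustrated with a toy dependency graph. This is only a sketch of the general idea, not the Optimus algorithm of Ref. [40]. The node names are hypothetical stand-ins loosely inspired by the calibration steps in this section, and the drift policy shown (recalibrate a drifted node and everything downstream of it) is an assumption made for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical calibration nodes; each maps to the set of nodes it depends on.
CALIBRATION_DEPENDENCIES = {
    "resonator_spectroscopy": set(),
    "qubit_spectroscopy": {"resonator_spectroscopy"},
    "rabi": {"qubit_spectroscopy"},
    "readout_optimization": {"rabi"},
    "pi_pulse_fine_tune": {"readout_optimization"},
    "coupler_bias": {"resonator_spectroscopy"},
    "two_qubit_gate": {"pi_pulse_fine_tune", "coupler_bias"},
}


def calibration_order(dependencies):
    """One valid bring-up order: every node appears after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def nodes_to_recalibrate(dependencies, drifted):
    """The drifted node plus every node downstream of it, in a valid order."""
    stale = {drifted}
    order = calibration_order(dependencies)
    # Walking in topological order guarantees dependencies are seen first.
    for node in order:
        if dependencies[node] & stale:
            stale.add(node)
    return [node for node in order if node in stale]


full_order = calibration_order(CALIBRATION_DEPENDENCIES)
after_coupler_drift = nodes_to_recalibrate(CALIBRATION_DEPENDENCIES, "coupler_bias")
```

Making the dependencies explicit is what allows the minimal set of stale calibrations to be computed, rather than rerunning the full bring-up sequence.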

FIG. S11. Configurations of the device over the course of calibration. a, In the root configuration, we start with no knowledge of the system and measure basic device parameters. b, We create a single qubit configuration for each qubit, where all qubits except the qubit of interest are biased to near zero frequency. c, Using knowledge learned in the single qubit configurations, we build a grid of qubits. [Panel titles: Root; Single qubits; Grid.]

B. Calibration procedure
1. Device conﬁguration
Throughout the calibration procedure, the device registry may be conﬁgured in diﬀerent states in order to calibrate certain parameters. We call these diﬀerent states “device conﬁgurations”, and diﬀerent kinds of conﬁgurations reﬂect our knowledge of the system at diﬀerent points in the full calibration procedure. As illustrated in Figure S11, the primary diﬀerence between the diﬀerent conﬁgurations is the set of “active” qubits, where active qubits are biased to an operating frequency between 5-7 GHz, and “inactive” qubits are biased near zero frequency. Following the outline above, we have three device conﬁgurations of interest:
a. Root conﬁg. The root conﬁguration is the starting state of the system immediately after cool down and basic system veriﬁcation. In this conﬁguration, we calibrate coarse frequency vs bias curves for each readout resonator, qubit, and coupler.
b. Single qubit conﬁg. After completing root calibrations, we now know how to bias each qubit to its minimum and maximum frequencies. We create one conﬁguration of the device registry for each qubit, where the qubit of interest is biased in a useful region (5-7 GHz) and the remaining qubits are biased to their minimum frequencies in order to isolate the qubit of interest. In each of these conﬁgurations, we ﬁne tune the bias vs frequency curves for the qubit and its associated couplers and resonators, and also measure T1 as a function of frequency, necessary due to background TLS defects and modes.
c. Grid config. After completing calibrations in each isolated qubit configuration, we feed the information we learned into a frequency optimization procedure. The optimizer places the biases for each qubit and coupler in a user-defined grid of any desired size up to the entire chip. We then proceed to calibrate high fidelity single qubit gates, two qubit gates, and readout.
2. Root conﬁg: procedure
We begin calibration with simple frequency-domain experiments to understand how each qubit and coupler responds to its ﬂux bias line.
• Calibrate each parametric ampliﬁer (ﬂux bias, pump frequency, pump power).
• For each qubit, identify its readout resonator and measure the readout signal versus qubit bias (“Resonator Spectroscopy”) [41]. Estimate the resonator and qubit frequency as a function of qubit bias.
• For each coupler, place one of its qubits near maximum frequency and the other near minimum frequency, then measure the readout signal of the ﬁrst qubit as a function of coupler bias. The readout signal changes signiﬁcantly as the coupler frequency passes near the qubit frequency. Identify where the coupler is near its maximum frequency, so the qubit-qubit coupling is small (a few MHz) and relatively insensitive to coupler bias.
3. Single-qubit conﬁg: procedure
After setting the biases to isolate a single qubit, we follow the procedure outlined in [42] which we will summarize here:
• Perform ﬁxed microwave drive qubit spectroscopy while sweeping the qubit bias and detecting shifts in the resonator response, to ﬁnd the bias that places the qubit at the desired resonant frequency.
• Using the avoided level crossing identiﬁed in the root conﬁg, determine the operating bias to bring the qubit on resonance with its readout resonator to perform active ground state preparation. We use a 10 µs pulse consistent with the readout resonator ringdown time.
• Perform power Rabi oscillations to find the drive power that gives a π pulse to populate the |1⟩ state.
• Optimize the readout frequency and power to maximize readout ﬁdelity.
• Fine tune parameters (qubit resonant frequency, drive power, drive detuning [43]) for π and π/2 pulses.
• Calibrate the timing between the qubit microwave drive, qubit bias, and coupler bias.
• Perform qubit spectroscopy as a function of qubit bias to ﬁne tune the qubit bias vs frequency curves.

• Measure T1 vs. frequency by preparing the qubit in |1⟩, then biasing the qubit to a variable frequency for a variable amount of time, and measuring the final population [44].
• Measure the response of a qubit to a detuning pulse to calibrate the frequency-control transfer function [6, 42, 45].
With the single qubits calibrated in isolation, we have a wealth of information on circuit parameters and coherence for each qubit. We use this information as input to a frequency placement algorithm to identify optimal operating frequencies for when the full processor is in operation.
4. Optimizing qubit operating frequencies
In our quantum processor architecture, we can independently tune each qubit’s operating frequency. Since qubit performance varies strongly with frequency, selecting good operating frequencies is necessary to achieve high ﬁdelity gates. In arbitrary quantum algorithms, each qubit operates at three distinct types of frequencies: idle, interaction, and readout frequencies. Qubits idle and execute single-qubit gates at their respective idle frequencies. Qubit pairs execute two-qubit gates near their respective interaction frequencies. Finally, qubits are measured at their respective readout frequencies. In selecting operating frequencies, it is necessary to mitigate and make nontrivial tradeoﬀs between energy-relaxation, dephasing, leakage, and control imperfections. We solve and automate the frequency selection problem by abstracting it into an optimization problem.
We construct a quantum-algorithm-dependent and gate-dependent optimization objective that maps operating frequencies onto a metric correlated with system error. The error mechanisms embedded within the objective function are parasitic coupling between nearest-neighbor and next-nearest-neighbor qubits, spectrally-diffusing two-level-system (TLS) defects [44], spurious microwave modes, coupling to control lines and the readout resonator, frequency-control electronics noise, frequency-control pulse distortions, microwave-control pulse distortions, and microwave-carrier bleedthrough. Additional considerations in selecting readout frequencies are covered in Section VI D. The objective is constructed from experimental data and numerics, and the individual error mechanisms are weighted by coefficients determined either heuristically or through statistical learning.
FIG. S12. Idle frequency solutions found by our Snake optimizer with different error mechanisms enabled. The optimizer makes increasingly complex tradeoffs as more error mechanisms are enabled. These tradeoffs manifest as a transition from a structured frequency configuration into an unstructured one. Similar tradeoffs are simultaneously made in optimizing interaction and readout frequencies. Optimized idle and interaction operating frequencies are shown in Figure S13 and optimized readout frequencies are shown in Figure S20. Color scales are chosen to maximize contrast. Grey indicates that there is no preference for any frequency. [Panels: Ideal; + Control Noise; + Pulse Distortions; + NN Parasitic Coupling; + NNN Parasitic Coupling; + TLS, Modes, Control Lines, ... Color scale: Idle Freq. (a.u.).]

Minimizing the objective function is a complex combinatorial optimization problem. We characterize the complexity of the problem by the optimization dimension and search space. For a processor with $N$ qubits on a square lattice with nearest-neighbor coupling, there are $N$ idle, $N$ readout, and $\sim 2N$ interaction frequencies to optimize. In an arbitrary quantum algorithm, all frequencies are potentially intertwined due to coupling between qubits. Therefore, the optimization dimension is $\sim 4N$. The optimization search space is constrained by qubits’ circuit parameters and control-hardware specifications. Discretizing each qubit’s operational range to 100 frequencies results in an optimization search space of $\sim 100^{4N}$. This is much larger than the dimension of the Hilbert space of an $N$ qubit processor, which is $2^N$.
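To make the comparison concrete, the two sizes can be evaluated on a logarithmic scale for the 53 qubit processor. The sketch below only restates the counting argument above; the function names are hypothetical.

```python
import math


def search_space_log10(n_qubits, frequencies_per_parameter=100):
    """log10 of the number of frequency configurations, ~100**(4N)."""
    dimension = 4 * n_qubits  # N idle + N readout + ~2N interaction frequencies
    return dimension * math.log10(frequencies_per_parameter)


def hilbert_space_log10(n_qubits):
    """log10 of the Hilbert space dimension 2**N."""
    return n_qubits * math.log10(2)


n = 53
log10_search = search_space_log10(n)    # 4 * 53 * 2 = 424
log10_hilbert = hilbert_space_log10(n)  # about 16
```

For $N = 53$ this gives a search space of roughly $10^{424}$ configurations against a Hilbert space dimension of roughly $10^{16}$.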
Given the problem complexity, it is assumed that finding globally optimal operating frequencies is intractable. However, we have empirically verified that locally optimal solutions are sufficient for state-of-the-art system performance. To find local optima, we developed the “Snake” homebrew optimizer that combines quantum algorithm structure with physics intuition to exponentially reduce optimization complexity and take intelligent optimization steps. For the circuits used here, the optimizer exploits the time-interleaved structure of single-qubit gates, two-qubit gates, and readout. For our 53 qubit processor, it returns local optima in ∼ 10 seconds on a desktop. Because of its favorable scaling in runtime versus number of qubits, we believe the Snake optimizer is a viable long-term solution to the frequency selection problem.
To illustrate how the Snake optimizer makes tradeoffs between error mechanisms, we plot idle frequency solutions with different error mechanisms enabled (Figure S12). Starting with an ideal processor with no error mechanisms enabled, there is no preference for any frequency configuration. Enabling frequency-control electronics noise, the optimizer pushes qubits towards their respective maximum frequencies, to minimize flux-noise susceptibility. Note that each qubit has a different maximum frequency due to fabrication variability. Enabling frequency-control pulse distortions forces a gradual transition between qubit frequencies to minimize two-qubit-gate frequency-sweep amplitudes. Enabling nearest-neighbor (NN) and next-nearest neighbor (NNN) parasitic coupling further lowers the degeneracy between qubit frequencies into a structure that resembles a multi-tiered checkerboard. Finally, enabling errors from TLS defects, spurious microwave modes, and all other known error mechanisms removes any obvious structure. A set of optimized idle and interaction frequencies is shown in Figure S13, and readout frequencies are shown in Figure S20.
5. Grid conﬁg: procedure
Calibrating a grid of qubits follows the same procedure as calibrating an isolated qubit with additional calibrations to turn oﬀ the qubit-qubit coupling.
• Achieve basic state discrimination for each qubit at its desired frequency.
• For each coupler, minimize the qubit-qubit coupling (note changing coupler biases aﬀects qubit frequencies). For each case below, we choose the coupler bias minimizing the interaction.
– For qubit pairs idling within 60 MHz of each other, use a resonant swapping experiment. We excite one qubit and apply ﬂux pulses to nominally put the qubits on resonance and let the qubits interact over time [9].
– For qubit pairs idling further apart, use a conditional phase experiment. We perform two Ramsey experiments on one qubit, once with the other qubit in the ground state and once with it in the excited state, to identify the state-dependent frequency shift of the first qubit.
• Adjust the qubit biases to restore the desired qubit frequencies and proceed with qubit calibration as in the single-qubit conﬁgurations.
• Calibrate the entangling gate.
– Estimate the qubit pulse amplitudes to reach the desired interaction frequency with their frequency versus bias calibration.

– Fine-tune the qubit pulse amplitudes to reach resonance, compensating for pulse undershoot.
– Tune the coupler pulse amplitude to achieve a complete photon exchange.
In the next two sections, we describe in more detail the ﬁne tuning required to achieve high ﬁdelity two qubit gates and multiqubit readout.

C. Two-qubit gate metrology
High-ﬁdelity two-qubit gates are very hard to achieve. In an eﬀort to make this easier, we design qubits with tunable frequencies and tunable interactions. This added control allows for immense ﬂexibility when implementing gates. In the following subsections, we discuss a simple high-ﬁdelity control and metrology strategy for two-qubit gates in our system.

1. The natural two-qubit gate for transmon qubits

FIG. S13. Optimized idle and interaction frequencies found by our Snake optimizer. a, Idle frequencies (GHz), b, interaction frequencies (GHz). Readout frequencies are shown in Figure S20. These solutions are sufficient for state-of-the-art system performance. See Figure S12 to understand some of the tradeoffs that are made during optimization. Color scales are chosen to maximize contrast.

Consider two transmon qubits at different frequencies (say 6.0 and 6.1 GHz). Here are two potential ways of generating a multi-qubit gate in this system. If the qubits are tuned into resonance, then excitations swap back-and-forth and this interaction can be modeled as a partial-iSWAP gate [46]. If the qubits are detuned by an amount close to their nonlinearity, then the |11⟩ state undergoes an evolution that can be modeled as a controlled-phase gate (assuming the population does not leak) [47, 48]. In fact, any two-qubit control sequence that does not leak can be modeled as a partial-iSWAP followed by a controlled-phase gate.
A typical control sequence is shown in Fig. S14a. Gate times of 12 ns are chosen to trade off decoherence (too slow) and leakage to higher states of the qubit (too fast). Figure S14b depicts how this operation can be decomposed as a quantum circuit. This circuit contains Z-rotations that result from the frequency excursions of the qubits, and can be expressed by the unitary:

1

0

0

0

0 ei(∆++∆−) cos θ −iei(∆+−∆−,off ) sin θ 0 −iei(∆++∆−,off ) sin θ ei(∆+−∆−) cos θ

0 0

 

.



0

0

0

ei(2∆+ −φ)

(43)

These gates have an efficient mapping to interacting fermions and have been coined ‘fSim’ gates, short for fermionic simulation [49]. The long-term goal is to implement the entire space of gates (shown in Fig. S14c). For quantum supremacy, the two-qubit gate of choice is the iSWAP gate. For example, CZ is less computationally expensive to simulate on a classical computer by a factor of two [38, 50]. A dominant error mechanism when trying to implement an iSWAP is a small conditional phase that is generated by an interaction of the |11⟩ state with higher states of the transmons (|02⟩ and |20⟩). For this reason, the fSim gate with swap angle θ ≈ 90° and conditional phase φ ≈ 30° has become the gate of choice in our supremacy experiment. Note that small deviations from these angles are also viable quantum supremacy gates. These gates result from the natural evolution of two qubits, making them easy-to-calibrate, high-intrinsic-fidelity gates for quantum supremacy.

[Figure S14, image not reproduced. Panel a: control flux vs. time for two qubits and the coupler, 12 ns gate. Panel b: circuit elements Z1, Z2, Z3, Z4, cphase(ϕ), and iSWAP(θ)†, labeled fSim(θ, ϕ). Panel c: swap angle θ (degs., 0 to 90) vs. conditional phase ϕ (degs., 0 to 180), marking iSWAP, Sycamore, SWAP, sqrt(iSWAP), and CZ.]

FIG. S14. Two-qubit gate strategy. a, Control waveforms for two qubits and a coupler. Each curve represents the control flux applied to the qubit’s and coupler’s SQUID loops as a function of time. b, Generic circuit representation for an arbitrary two-qubit gate using flux pulses. This family of gates has been named “fSim” gates, short for fermionic-simulation gates. Our definition of the fSim gate uses θ with the sign opposite to the common convention for the iSWAP gate. c, Control landscape for fSim gates as a function of the swap angle and conditional phase, up to single-qubit rotations. The coordinates of common entangling gates are marked along with the Sycamore gate fSim(θ = 90°, φ = 30°).
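As a minimal numerical sketch (not the authors' calibration code), the unitary of Eq. (43) can be written out and checked for unitarity. The names `two_qubit_unitary`, `d_plus`, `d_minus`, and `d_minus_off` are illustrative stand-ins for Δ+, Δ−, and Δ−,off, and the basis ordering |00⟩, |01⟩, |10⟩, |11⟩ is an assumption.

```python
import cmath
import math


def two_qubit_unitary(theta, phi, d_plus=0.0, d_minus=0.0, d_minus_off=0.0):
    """Five-parameter gate of Eq. (43) in the assumed basis |00>, |01>, |10>, |11>."""
    c, s = math.cos(theta), math.sin(theta)

    def e(x):
        return cmath.exp(1j * x)

    return [
        [1, 0, 0, 0],
        [0, e(d_plus + d_minus) * c, -1j * e(d_plus - d_minus_off) * s, 0],
        [0, -1j * e(d_plus + d_minus_off) * s, e(d_plus - d_minus) * c, 0],
        [0, 0, 0, e(2 * d_plus - phi)],
    ]


def is_unitary(u, tol=1e-12):
    """Check U U^dagger = identity entry by entry."""
    n = len(u)
    for i in range(n):
        for j in range(n):
            acc = sum(u[i][k] * u[j][k].conjugate() for k in range(n))
            if abs(acc - (1 if i == j else 0)) > tol:
                return False
    return True


# Nominal Sycamore angles from the text: theta = 90 degrees, phi = 30 degrees.
sycamore = two_qubit_unitary(math.pi / 2, math.pi / 6)
```

With all three Z-rotation parameters set to zero, the matrix depends only on the swap angle and the conditional phase, which is the two-parameter fSim family shown in Fig. S14c.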

2. Using cross entropy to learn a unitary model
We have recently introduced cross-entropy as a fidelity metric for quantum supremacy experiments. Cross-entropy benchmarking (XEB) was introduced as an analog to randomized benchmarking (RB) that can be used with any number of qubits and is independent of state-preparation and measurement errors [6, 27].
A distinct advantage of XEB is that the resulting data can be analyzed to ﬁnd an optimal representation of a unitary; this process is outlined in Fig. S15. The gate sequence for a two-qubit XEB experiment is shown in Fig. S15a. The sequence alternates between single-qubit gates on both qubits and a two-qubit gate between them. At the end of the sequence, both qubits are measured and the probabilities of bitstrings (00, 01, 10, 11) are estimated. This procedure is repeated for ∼10-20 instances of randomly selected single-qubit gates. The measured probabilities can then be compared to the ideal probabilities using the expression for ﬁdelity Eq. (3) in Ref. [6].
The data from a two-qubit XEB experiment is shown in Fig. S15b (green dots). By performing additional sequences with tomography rotations prior to measurement, we can infer the decay of purity with increasing circuit depth (blue dots). For two qubits, the decay of fidelity tells us the total error of our gates, while the purity decay tells us the contribution from decoherence; the difference is control error. Based on the data in green and blue, it appears that the total error is about half control and half decoherence.
So far, we have established a generic unitary model (Fig. S14b), a training dataset (Fig. S15a), and a cost function (Fig. S15b). These three ingredients form the foundation for using optimization techniques to improve fidelity. Using a simple Nelder-Mead optimization protocol, we can maximize the XEB fidelity by varying the parameters of the unitary model. The fidelity decay curve for the optimal unitary model is shown in Fig. S15b (orange dots). The optimized results are nearly coherence limited.
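The error budget described above can be illustrated with the per-cycle errors annotated in Fig. S15b for the pair shown there. This is a sketch that assumes the errors simply subtract, as the text states; the function name `control_error` is hypothetical.

```python
def control_error(total_error, purity_error):
    """Control error = total (XEB) error minus decoherence (purity) error."""
    return total_error - purity_error


# Per-cycle Pauli errors in percent, as annotated in Fig. S15b.
xeb_error = 1.06             # before optimizing the unitary model
xeb_error_optimized = 0.62   # after optimizing the unitary model
purity_error = 0.541         # coherence limit from tomographic purity

control_before = control_error(xeb_error, purity_error)
control_after = control_error(xeb_error_optimized, purity_error)
control_fraction_before = control_before / xeb_error
```

Before optimization, the control share of the total error is close to one half, matching the "half control and half decoherence" reading of the green and blue data; after optimization the residual control error is small compared with the purity error.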
The optimal control-model parameters for all pairs are shown as integrated histograms in Fig. S16a,b. Panel (a) shows the histograms for partial-iSWAP angles (∼90 degrees) and conditional phases (∼30 degrees). Panel (b) shows histograms for the various ﬂavors of Z-rotations. While conceptually there are four possible Z-rotations (see Fig. S14b), only three of these rotations are needed to uniquely deﬁne the operation. These three rotations can be thought of as the detuning of the qubits before the iSWAP, the detuning after the iSWAP, and an overall frequency shift of both qubits which commutes with the iSWAP.
3. Comparison with randomized benchmarking
In Fig. S17 we show that two-qubit gate ﬁdelity extracted using XEB agrees well with the ﬁdelity as measured with RB, an important sanity check in validating XEB as a gate metrology tool. In two-qubit XEB, we extract the error per cycle which consists of a single-qubit gate on each qubit and a two-qubit gate between them.

[Figure S15, image not reproduced. Panel a: process flow from basic calibrations to an XEB circuit on qubits q1 and q2 (cycles 1, 2, 3, ..., m, each with U(θ, ϕ)), run on both a classical computer and the quantum computer, with an optimization loop over θ, ϕ driven by cross entropy and error, yielding U(θ', ϕ'). Panel b: XEB fidelity (0.0 to 1.0) vs. number of cycles (0 to 1000); annotated values: purity error = 0.541(3)%, XEB error (optimized) = 0.62(2)%, XEB error = 1.06(5)%.]

[Figure S16, image not reproduced. Panel a: integrated histogram vs. control angles (degs.) for conditional phase ϕ (near 30°) and swap angle θ (near 90°). Panel b: integrated histogram vs. control angles (degs.) for frequency shift (2Δ+), detuning before (Δ− + Δ−,off), and detuning after (Δ− − Δ−,off).]

FIG. S15. Using XEB to learn a unitary model. a, Process flow diagram for using XEB to learn a unitary model. After running basic calibrations, we have an approximate model for our two-qubit gate. Using this gate, we construct a random circuit that is fed into both the quantum computer and a classical computer. The results of both outputs can be compared using cross-entropy. Optimizing over the parameters in the two-qubit model provides a high-fidelity representation of the two-qubit unitary. b, Data from a two-qubit XEB experiment. The two-qubit purity (blue) was measured tomographically and provides the coherence limit of the operations. The decay of the XEB fidelity is shown in green and orange. In orange, the parameters of a generic unitary model were optimized to determine a higher-fidelity representation of the unitary. All errors are quoted as Pauli errors.

FIG. S16. Parameters of the control model. A generic model for two-qubit gates using flux control has five free parameters. Using XEB we can measure these parameters with high fidelity. a, Integrated histogram (cumulative distribution) of the control parameters that determine the interaction between the qubits. b, An integrated histogram of the remaining three parameters that represent different flavors of single-qubit Z-rotations. While the first two parameters (panel a) define the entangling gate, the final three parameters (panel b) are simply measured and then kept track of during an algorithm. Intuitively, these three angles correspond to a detuning before the swap, a detuning after the swap, and an overall frequency shift which commutes through the swap; these correspond to Δ− + Δ−,off, Δ− − Δ−,off, and 2Δ+, respectively, in Eq. (43). Note that the θ and φ angles are 360-degree periodic and the Z-rotation angles are 720-degree periodic.
In Fig. S17a we show the individual RB decay curves for single-qubit gates. In panel b, we show the RB decay curve for benchmarking a CZ gate. Adding up the three errors from RB, we would expect an XEB cycle error of 0.57%. In panel c, we show the measured XEB decay curve, which indicates a cycle error of 0.59%, nearly identical to the value predicted by RB.
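The 0.57% prediction is the sum of the three RB errors annotated in Fig. S17. A short sketch of that arithmetic, assuming the per-gate errors add to first order (variable names are illustrative):

```python
# Pauli errors in percent, as annotated in Fig. S17a and S17b.
single_qubit_errors = [0.09, 0.07]  # one single-qubit gate on each qubit
cz_error = 0.41                     # interleaved two-qubit RB

predicted_cycle_error = sum(single_qubit_errors) + cz_error
measured_xeb_cycle_error = 0.59     # from Fig. S17c

difference = measured_xeb_cycle_error - predicted_cycle_error
```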
For single-qubit gate benchmarking on the Sycamore device used in this work (see Table II), we find that π pulse fidelities are somewhat worse than π/2 pulse fidelities, which we attribute to reflections from the imperfect microwave environment. Because the XEB gateset we have used consists only of π/2 pulses, we find that the single-qubit gate errors extracted from conventional RB, which contains π pulses, are somewhat higher than those extracted from single-qubit XEB. Using only π/2 pulses instead of π pulses in single-qubit RB brings the extracted error close to that measured via XEB.

[Figure S17, panel a, image not reproduced: single-qubit RB, fidelity vs. number of Cliffords; 1Q gate error = 0.09% and 0.07%.]

4. Speckle purity benchmarking (SPB)

It is experimentally useful to be able to extract state purity from XEB experiments in order to error-budget the contribution of decoherence. Conventionally, purity estimation can be done with state tomography, where the full density matrix ρ is reconstructed and used to quantify the state purity. This involves expanding a single sequence into a collection of sequences each appended with single-qubit gates. Unfortunately, full tomographic reconstruction scales exponentially in the number of qubits, both for the number of sequences needed as well as the number of measurements needed per sequence. Here, we introduce an exponentially more eﬃcient method to extract the state purity without additional sequences.
We use a re-scaled purity definition such that a fully decohered state has a purity of 0, and a pure state has a purity of 1. We define

$$
\mathrm{Purity} = \frac{D}{D-1}\left(\mathrm{Tr}(\rho^2) - \frac{1}{D}\right), \tag{44}
$$

which is consistent with what is defined in Ref. [51]. This can be understood as the squared length of the generalized Bloch vector in D dimensions (for a qubit, D = 2, this definition gives ⟨X⟩² + ⟨Y⟩² + ⟨Z⟩²).
Speckle Purity Benchmarking (SPB) is the method of measuring the state purity from raw XEB data. Assuming the depolarizing-channel model with polarization parameter p, we can model the quantum state as

$$
\rho = p\,|\psi\rangle\langle\psi| + (1-p)\,\frac{\mathbb{1}}{D}. \tag{45}
$$

Here, p is the probability of a pure state |ψ⟩ (which in this case is not necessarily known to us), while 1 − p is the probability of being in the fully decohered state (𝟙 is the identity operator). For the state (45), from the definition (44) it is easy to find the relation

$$
\mathrm{Purity} = p^2. \tag{46}
$$
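The step from Eqs. (44) and (45) to Eq. (46) can be spelled out as follows, using only Tr(|ψ⟩⟨ψ|) = 1 and Tr(𝟙) = D:

```latex
\begin{align}
\mathrm{Tr}(\rho^2)
  &= p^2 + \frac{2p(1-p)}{D} + \frac{(1-p)^2}{D}
   = p^2 + \frac{1-p^2}{D}, \\
\mathrm{Purity}
  &= \frac{D}{D-1}\left(p^2 + \frac{1-p^2}{D} - \frac{1}{D}\right)
   = \frac{D}{D-1}\cdot p^2\,\frac{D-1}{D}
   = p^2 .
\end{align}
```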

We will now work out how to obtain p² from a distribution of measured probabilities $P_m$ of various bitstrings for a sequence, collected over many XEB sequences (Figs. S18a and S18b).
[Figure S17, panels b and c, image not reproduced. Panel b: two-qubit RB, reference and interleaved CZ, fidelity vs. number of Cliffords; CZ error = 0.41%. Panel c: two-qubit XEB, XEB fidelity vs. number of cycles; XEB error = 0.59%.]

FIG. S17. Sanity check: XEB agrees with RB. a, Single-qubit randomized benchmarking (RB) data taken separately on two qubits. b, Two-qubit randomized benchmarking data for a CZ on the same pair of qubits. c, Two-qubit cross-entropy benchmarking (XEB) on the same pair of qubits. The measured XEB error (0.59% / cycle) agrees well with the prediction from single- and two-qubit RB (0.57%). All errors are quoted as Pauli errors.

First, we note that for p = 0 the probabilities of all bitstrings are 1/D, and the distribution is the δ-function located at 1/D (the integrated histogram is then the step function; see Fig. S18b). In contrast, if p = 1, then the measured probabilities $P_m$ follow the D-dimensional Porter-Thomas distribution [27]

$$
P_{\mathrm{PT}}(P_m) = (D-1)(1-P_m)^{D-2}, \tag{47}
$$

which has the same average 1/D and variance

$$
\mathrm{Var}_{\mathrm{PT}}(P_m) = \frac{D-1}{D^2(D+1)}. \tag{48}
$$

For the fully decohered state all bitstrings have the same probability 1/D, so in this case the variance of the distribution of probabilities is zero. For the state (45) with an arbitrary p, the histogram of probabilities $P_m$ will be described by the distribution (47) shrunk towards the average 1/D by the factor p. Consequently, the variance of the experimental probabilities will be p² times the Porter-Thomas variance (48).
Thus, we can find p² by dividing the variance of experimentally measured probabilities $P_m$ by the Porter-Thomas variance (48). Finally, using the relation (46) for the depolarization model (45), we can relate the variance of the experimental probabilities $P_m$ to the average state purity

$$
\mathrm{Purity} = \mathrm{Var}(P_m)\,\frac{D^2(D+1)}{D-1}. \tag{49}
$$

With these convenient relations, we can directly compare the XEB fidelity $F_{\mathrm{XEB}} = p$ to $\sqrt{\mathrm{Purity}}$ from SPB on the same scale, and check their dependence $p = p_c^m$ on the number of cycles m. Without systematic control errors, the XEB and SPB results should coincide. Experimentally, we always have control errors which lead us to incorrectly predict |ψ⟩, so control errors give XEB a higher error than SPB. Thus, with a single XEB dataset we can extract the XEB error per cycle, and the purity loss per cycle with SPB. By subtracting these, we are left with the control error per cycle. In this way, a single experiment lets us budget the total error into control error and decoherence error.
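Equation (49) can be checked numerically on synthetic data. The sketch below is not the experimental analysis: it draws Haar-random pure states as a stand-in for deep random circuits (an assumption), applies the depolarizing model of Eq. (45) exactly with no sampling noise, and recovers the purity from the variance of the bitstring probabilities. All function names are hypothetical.

```python
import random


def haar_random_probs(dim, rng):
    """Bitstring probabilities |<x|psi>|^2 of a Haar-random pure state."""
    amps = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
    norm = sum(abs(a) ** 2 for a in amps)
    return [abs(a) ** 2 / norm for a in amps]


def speckle_purity(probs, dim):
    """Eq. (49): variance of the probabilities divided by the Porter-Thomas variance."""
    mean = sum(probs) / len(probs)
    var = sum((x - mean) ** 2 for x in probs) / len(probs)
    return var * dim ** 2 * (dim + 1) / (dim - 1)


def simulate_speckle_purity(p, dim=4, n_states=20000, seed=0):
    """Pool depolarized probabilities over many random states and apply Eq. (49)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_states):
        ideal = haar_random_probs(dim, rng)
        # Depolarizing model, Eq. (45): P_m = p * P_ideal + (1 - p) / D.
        samples.extend(p * q + (1 - p) / dim for q in ideal)
    return speckle_purity(samples, dim)
```

For a given polarization p the returned value should land close to p², up to statistical fluctuations from the finite number of random states, in line with Eq. (46).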
These relationships can be seen experimentally in Figure S18. Amazingly, computing the speckle purity can be done with no knowledge of the specific gate sequence performed; as long as the experiment introduces sufficient randomization of the Hilbert space, Porter-Thomas statistics apply. Practically, SPB allows us to measure the state purity from raw XEB data with exponentially fewer pulse sequences than full state tomography. This favorable scaling allows one to extend purity measurements to larger numbers of qubits. It is important to note that an exponential number of measurements is still required to fully characterize the probability distribution for a given sequence, as in tomography, so purity measurements of the full processor are impractical.

5. “Per-layer” parallel XEB
To execute quantum circuits efficiently, it is helpful to run as many gates as possible in parallel. We wish to benchmark our entangling gates operating simultaneously. Resulting fidelities and optimized unitaries may differ from the isolated case, where we benchmark each pair individually, due to imperfections such as control crosstalk and stray qubit-qubit interactions. In the quantum supremacy algorithm, we partition the set of two-qubit gates into four layers, each of which can be executed in parallel. We then cycle through these layers interleaved with randomly chosen single-qubit gates (see Fig. 3a). However, it is intractable to directly use full-system XEB to benchmark our entangling gates for two reasons: we would simultaneously optimize over the unitary model parameters of every entangling gate, and the classical simulation would be exponentially expensive in system size.

[Figure S18, image not reproduced. Panel a: probabilities P00, P01, P10, P11. Panels b and c as described in the caption.]

FIG. S18. “Speckle” purity extracted from XEB. a, Measured probabilities from XEB for a two-qubit system and 30 random circuits. Raw probabilities show a speckle pattern at low cycles (orange dashed) over circuit instance and probabilities (|00⟩, |01⟩, |10⟩, |11⟩). The speckle contrast decreases with cycles and thus decoherence (green dashed). b, Integrated histogram (cumulative distribution) of probabilities. The x-axis is scaled by the dimension D = 2², so the uniform distribution is a step function at 1.0. At low cycles, the distribution is well described by Porter-Thomas, and at high cycles, the distribution approaches the uniform distribution. c, We can directly relate the variance of the distribution to the average state purity. We fit an exponential to the square root of Purity. We compare this purity-derived number (0.00276 per cycle) to a similar number (0.00282 per cycle) derived from a tomographic measure of purity, and see good agreement. The error of XEB, which also includes control errors, is slightly higher, at 0.00349 per cycle.
We solve this problem with “per-layer” parallel XEB (see Ref. [52] for a related technique in the context of RB). Instead of alternating among the four layers of entanglers, where each qubit becomes entangled with each of its neighbors, we perform four separate experiments, one for each layer. The experiment sequences are illustrated in Fig. S19a. For each layer, we construct parallel sequences where the layer is repeated with interleaved single-qubit gates; nominally, each qubit only interacts with one other. Following each parallel XEB sequence, we measure all the qubits and extract the equivalent XEB data for each pair. Every two-qubit gate can be characterized in these four experiments, regardless of system size. The optimization and classical simulation are also eﬃcient, as each pair can be analyzed individually.
We present experimental results of “per-layer” parallel XEB in Fig. S19b-c. In Fig. S19b, we compare the performance in the isolated and simultaneous (parallel) experiments. In both cases, the optimized XEB error is close to purity-limited. Simultaneous operation modestly increases the error, by roughly 0.003. This increase is primarily from purity error, which would arise from unintended interactions with other qubits, where coherent errors at the system scale manifest as incoherent errors when we focus on individual pairs. The unitaries we obtain in the simultaneous case diﬀer slightly from the isolated case, which would arise from control crosstalk and unintended interactions. To quantify how these diﬀerences aﬀect the gate error, we recalculate the error with the unitaries from the isolated optimization and the data from the simultaneous experiment, which increases the error. We also plot the distributions of the diﬀerences in unitary model parameters in Fig. S19c. The dominant change is in ∆+, a single-qubit phase.
D. Grid readout calibration
1. Choosing qubit frequencies for readout
The algorithm described in Section VI B 4 generally chooses qubit idling frequencies which are far detuned from the resonator to optimize for dephasing. However, these idling frequencies are not optimal for performing readout. To address this problem, we dynamically bias each qubit to a different frequency during the readout phase of the experiment. The qubit frequencies during readout are shown in Fig. S20 (compare to Fig. S13).
To choose the qubit frequencies for readout, we ﬁrst measure readout ﬁdelity as a function of qubit frequency and resonator drive frequency at a ﬁxed resonator drive power, in each of the isolated single qubit conﬁgurations. This scan captures errors due to both non-optimal detuning between the qubit and resonator, as well as regions with low T1 values due to TLSs. We then use the data for each qubit and a few constraints to optimize the placement of the qubit frequencies during readout, using the same optimization technique that was described in Section VI B 4. We describe two of the important constraints and related error reduction techniques below.
First, because the coupling between qubits relies on a dispersive interaction with the coupler, the coupling would no longer be oﬀ when the qubits were detuned by a signiﬁcant amount from their idling positions. Thus, we impose a constraint that qubits should not be placed near resonance during readout. Nevertheless, we found that for some pairs of qubits, we had to dynamically bias the coupler during readout to avoid any swapping transitions between the qubits during readout. This readout coupler bias is found by sweeping the coupler bias and maximizing the two-qubit readout ﬁdelity.
Second, the pattern of the bare resonator frequencies on the chip as shown in Fig. S20 led to an unexpected problem. Pairs of readout resonators which were coupled to neighboring qubits and were also within a few MHz in frequency space were found to have non-negligible coupling. This coupling was strong enough to mediate swapping of photons from one resonator to the other. The pairs of qubits with similar resonator frequencies were all located in a diagonal chain bisecting the qubit grid, as shown by the red outline in Fig. S20. To mitigate this problem, we arrange the qubit frequencies for these qubits so that the resonator eigenfrequencies are as far apart as possible. The resulting spectral separation is not quite enough to eliminate all deleterious effects, so in addition, we use correlated discrimination on the eight qubits in this chain. In other words, we use the results of all eight detector values to determine which one of 2^8 = 256 states the eight qubits were in. All other qubits in the grid are discriminated as isolated qubits.
2. Single qubit calibration
After placing the qubit frequencies for readout, we calibrate and ﬁne tune the readout parameters for each qubit. For each qubit, we use a 1 µs drive pulse and a 1 µs demodulation window. We summarize the procedure for choosing the remaining parameters as follows:
• Choose the resonator drive frequency to maximize the separation between measurements performed with the qubit in either |0⟩ or |1⟩ [16].
• Choose the resonator drive power to hit a target separation between |0⟩ and |1⟩, so that the error due to this separation is below a 0.3% threshold. We do not choose the readout power to maximize the separation, as doing so would saturate our amplifiers and cause unwanted transitions of the qubit state [17, 22, 53, 54].
• Find the optimal demodulation weight function by measuring the average detector voltage as a function of time during the course of the readout pulse [16, 21].
• Finally, choose the discrimination line between the measurement results for |0⟩ and |1⟩, except as noted in the previous section where we need to apply correlated discrimination.

[Figure S19, image not reproduced. Panel a: four device-wide sequences, Layer 0 through Layer 3, each repeated m times. Panels b and c as described in the caption.]

FIG. S19. Parallel XEB. a, Schematics of four device-wide sequences, one for each entangler layer. Black points are active qubits, colored circles are single-qubit gates, and colored lines are two-qubit gates. We cycle between single- and two-qubit gates m times. Compare to Fig. 3a, main text, where the layers are interleaved. b, Integrated histograms of Pauli error e2c (see Fig. 2a, main text). These include isolated results, where each entangler is measured in its own experiment, and simultaneous (parallel) results. Purity is speckle purity. c, Difference, δ, in unitary model parameters (Eq. 43) between the unitaries obtained in the isolated and simultaneous experiments. δΔ− is not plotted because it has a negligible effect on the unitary when θ ≈ 90 degrees.
After completing these calibrations, we check each qubit’s readout fidelity by preparing either |0⟩ or |1⟩ and reading the qubit out. We define the identification error to be the probability that the qubit was not measured in the state we intended to prepare. We achieve 0.97% median identification error for the |0⟩ state, and 4.5% for |1⟩, when each qubit is measured in isolation. The full distribution is shown in dashed lines in Fig. S21a. We conjecture that the error in |0⟩ is due to thermal excitation during preparation or measurement, and that the error in |1⟩ is due to energy relaxation during readout.
3. Characterizing multi-qubit readout
To assess the fidelity of multi-qubit readout, we prepare and measure 150 random classical bitstring states with 53 qubits, with 3000 trials per state. We find that 13.6% of all trials successfully identified the prepared state. We can decompose this overall fidelity in two ways. First, we plot in solid lines in Fig. S21 the errors for each qubit during simultaneous readout, averaged over the 150 random bitstrings. We find that the median errors increase from 0.97% for |0⟩ and 4.5% for |1⟩ in isolation, to 1.8% and 5.1% for simultaneous readout. We do not yet understand the root causes of this increase in error. In addition, we show in Fig. S21 the distribution of errors among the multiqubit results. We see that the most likely error is one lost excitation in the measured state.
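As a rough plausibility check on the 13.6% figure, one can ask what all-correct probability would follow if every qubit erred independently with a single representative error rate. Both the independence and the use of one aggregate value for all 53 qubits are simplifying assumptions of this sketch, not claims from the text; the function name is hypothetical.

```python
def all_correct_probability(error_0, error_1, n_qubits=53):
    """P(every qubit identified) for a uniformly random bitstring.

    Assumes independent errors and the same error rates on every qubit.
    """
    mean_error = 0.5 * (error_0 + error_1)
    return (1.0 - mean_error) ** n_qubits


# Simultaneous-readout errors for |0> and |1> (Table II), as fractions.
from_medians = all_correct_probability(0.018, 0.051)
from_means = all_correct_probability(0.023, 0.055)
measured = 0.136
```

Under these assumptions the measured success rate falls between the estimate built from the median errors and the one built from the mean errors, so the single-qubit numbers and the 53-qubit result are at least mutually consistent in order of magnitude.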


TABLE II. Aggregate system parameters

Parameter                                   Median        Mean          Stdev.        Units  Figure
Qubit maximum frequency                     6.924         6.933         0.114         GHz    S22
Qubit idle frequency                        6.661         6.660         0.057         GHz    S13
Qubit frequency at readout                  5.750         5.766         0.360         GHz    S20
Readout drive frequency                     4.618         4.588         0.076         GHz    S20
Qubit anharmonicity                         -208.0        -208.0        4.7           MHz    S22
Resonator linewidth κ/2π                    0.64          0.69          0.23          MHz    S22
Qubit-resonator coupling g/2π               72.3          72.1          2.8           MHz    S22
T1 at idle frequency                        15.54         16.04         4.00          µs     S22
Readout error |0⟩ isolated / simultaneous   0.97 / 1.8    1.2 / 2.3     0.8 / 2.1     %      S21
Readout error |1⟩ isolated / simultaneous   4.5 / 5.1     5.0 / 5.5     1.8 / 2.2     %      S21
1Q RB^a e1                                  0.19          0.22          0.10          %      S23
1Q RB^a e1 (π/2 gateset)                    0.15          0.16          0.06          %      S23
1Q RB^a tomographic e1 purity               0.14          0.15          0.04          %      S23
1Q XEB e1 isolated / simultaneous           0.13 / 0.14   0.15 / 0.16   0.05 / 0.05   %      3a (main), S23
1Q XEB e1 purity isolated / simultaneous    0.11 / 0.11   0.11 / 0.12   0.03 / 0.03   %      S23
2Q XEB e2 isolated / simultaneous           0.30 / 0.60   0.36 / 0.62   0.17 / 0.24   %      3a (main)
2Q XEB e2c isolated / simultaneous          0.64 / 0.89   0.65 / 0.93   0.20 / 0.26   %      3a (main)
2Q XEB e2c purity isolated / simultaneous   0.59 / 0.86   0.62 / 0.89   0.20 / 0.24   %      S19
Measurement em isolated / simultaneous      2.83 / 3.50   3.05 / 3.77   1.09 / 1.61   %      3a (main)

a RB data taken at a later date

E. Summary of system parameters
Table II reports aggregate values for qubit and pair parameters in our processor. A complete table of single-qubit parameter values by qubit is available in supporting online materials, Ref. [55], and illustrated in Figs. S22 through S24. Single-qubit metrics represent a sample size of 53. Two-qubit metrics represent 86 pairs.

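[Figure S20, image not reproduced. Panel a: readout drive frequency (GHz) by row and column. Panel b: readout qubit f10 (GHz) by row and column.]

[Figure S21, image not reproduced. Panel a: integrated histogram of identification error, isolated and simultaneous. Panel b: (Meas − Prep) excitations vs. Hamming distance.]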
FIG. S21. Readout errors. a, Histogram of readout errors for each qubit when prepared in |0⟩ or |1⟩, and read out in isolation or simultaneously. b, Distribution of errors in multiqubit readout. The x-axis Hamming distance is the number of bits that are different between measured and prepared states, while the y-axis is the difference in the number of 1s in the states. For example, if we prepare |011⟩ and measure |101⟩, the Hamming distance is 2 and the difference in the number of excitations is 0.

FIG. S20. a, Drive frequencies for the readout resonators for each qubit. The red outline shows the area where we had to perform correlated discrimination because of unwanted cross-couplings between the resonators. b, Qubit frequencies during readout, found using a frequency optimization procedure.
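The two axes of Fig. S21b can be computed from a prepared and a measured bitstring as sketched below. Representing states as Python strings of '0' and '1' is an assumption made for illustration, as are the function names.

```python
def hamming_distance(prepared, measured):
    """Number of bit positions where the two bitstrings differ."""
    if len(prepared) != len(measured):
        raise ValueError("bitstrings must have equal length")
    return sum(a != b for a, b in zip(prepared, measured))


def excitation_difference(prepared, measured):
    """(Meas - Prep) excitations: change in the number of 1s."""
    return measured.count("1") - prepared.count("1")


# Example from the Fig. S21 caption: prepare |011>, measure |101>.
distance = hamming_distance("011", "101")
delta = excitation_difference("011", "101")
```

In this convention, a single lost excitation (for instance preparing "11" and measuring "10") gives a Hamming distance of 1 and an excitation difference of −1.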

FIG. S22. Typical distribution of single-qubit parameters over the Sycamore processor.

FIG. S23. Typical distribution of single-qubit gate benchmarking errors over the Sycamore processor, for both isolated and simultaneous operation.

FIG. S24. Typical distribution of readout errors over the Sycamore processor, for both isolated and simultaneous operation.

VII. QUANTUM CIRCUITS
A. Background
We sample the output of random quantum circuits (RQCs) with two use cases in mind: performing a computational task beyond the reach of state-of-the-art supercomputers (quantum supremacy); and estimating the experimental ﬁdelity (performance evaluation).
In order for the RQCs to cover both use cases, we define a circuit family with a varying number of qubits n and cycles m. Our quantum supremacy demonstration uses RQCs with a large number of qubits n = 53 and high depth m = 20. A large number of qubits hinders wave function (Schrödinger) simulation, and high depth impedes tensor network (Feynman) simulation (see Sec. X B). We find that the most competitive classical simulator for our hardest RQCs is the Schrödinger-Feynman algorithm (SFA, see Sec. X A), which copes well with high-depth circuits on many qubits.
SFA takes as input an n-qubit quantum circuit and a cut which divides $n = n_1 + n_2$ qubits into two contiguous partitions with $n_1$ and $n_2$ qubits. The algorithm computes the output state as the sum over simulation paths formed as the product of the terms of the Schmidt decomposition of all cross-partition gates. By the distributive law there are $r^g$ such simulation paths for a circuit with g cross-partition gates of Schmidt rank r. Consequently, the algorithm achieves runtime proportional to $(2^{n_1} + 2^{n_2})\, r^g$. Circuit cuts with $n_1$, $n_2$ and g that make the simulation task tractable are called promising cuts. The most promising cut for our largest RQCs runs parallel to the shorter axis of the device starting in the vicinity of the broken qubit. The sum over the simulation paths can be interpreted as tensor contraction. In this view, the $r^g$ factor can be thought of as the bond dimension associated with the circuit partitioning, i.e. the cardinality of the index set ranged over in the contraction corresponding to all cross-partition gates. SFA is described in more detail in [38] and section X.
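The scaling stated above can be written as a small cost model. This is only the proportionality from the text, with constant factors dropped; the function names and the numbers in the example are illustrative and are not parameters of the experiment.

```python
def sfa_paths(schmidt_rank, n_cross_gates):
    """Number of simulation paths: r^g."""
    return schmidt_rank ** n_cross_gates


def sfa_cost(n1, n2, schmidt_rank, n_cross_gates):
    """Runtime estimate proportional to (2^n1 + 2^n2) * r^g."""
    return (2 ** n1 + 2 ** n2) * sfa_paths(schmidt_rank, n_cross_gates)


# Toy example with made-up sizes: a 5-qubit circuit cut into 2 + 3 qubits,
# with two cross-partition gates of Schmidt rank 4.
toy_cost = sfa_cost(2, 3, 4, 2)

# Each additional cross-partition gate multiplies the cost by r.
growth_per_gate = sfa_cost(2, 3, 4, 3) // toy_cost
```

The exponential dependence on g is why the number of gates across the cut, rather than the total gate count, dominates the simulation cost.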
B. Overview and technical requirements
The two use cases for our RQCs give rise to a tension in technical requirements at the heart of quantum supremacy. On the one hand, supremacy RQC sampling should by definition be prohibitively hard to simulate classically. On the other hand, performance evaluation entails classical simulation of the RQCs. To resolve the conflict, we note that the fidelity of a RQC experiment depends primarily on the number and quality of the gates. By contrast, the simulation cost is highly sensitive to minor perturbations in the circuit. Consequently, experiment fidelity for RQCs that cannot be simulated directly may be approximated from the experiment fidelity of similar RQCs obtained as the result of transformations that reduce simulation cost without significantly affecting experiment fidelity (see Section VII G).
Performance evaluation using XEB provides another design consideration. The procedure requires knowledge of the cross-entropy of the theoretical output distribution of the circuit. An analytical expression for this quantity has been derived in [27] for circuits whose measurement probabilities approach the Porter-Thomas distribution. We find that our RQCs satisfy this assumption when the circuit depth is larger than 12, see Fig. S35a. Note that high circuit depth also increases the cost of classical simulation.
C. Circuit structure
A RQC with n qubits generally utilizes qubits 1 through n in the qubit order shown in Fig. S27, with small deviations from this default qubit ordering in some circuits. The qubit order has been chosen to ensure that for most RQCs with fewer than 51 qubits, there is a partitioning of the qubits into two similarly sized blocks connected by only five couplers. The next larger RQC, with 51 qubits, has seven couplers along the most promising circuit cut. Since the cost of SFA grows exponentially in the number of gates across the partitions, our circuit geometry leads to a steep increase in the simulation cost of 51-qubit RQCs relative to the circuits with fewer qubits. This creates a sizeable gap in the computational hardness between most of our evaluation circuits and the quantum supremacy circuits (n = 53).
In the time dimension, each RQC is a series of m full cycles and one half cycle followed by measurement of all qubits. Every full cycle consists of two steps. In the ﬁrst step, a single-qubit gate is applied to every qubit. In the second step, two-qubit gates are applied to pairs of qubits. Diﬀerent qubit pairs are allowed to interact in diﬀerent cycles. Speciﬁcally, in the supremacy RQCs we loop through the direct neighbors of every qubit over the eight-cycle sequence ABCDCDAB and in the evaluation RQCs we use the four-cycle sequence EFGH where A, B, ..., H are coupler activation patterns shown in Fig. S25. The sequence is repeated in subsequent cycles. The cost of SFA simulation is highly sensitive to the speciﬁc sequence employed in a circuit, see VII G 2. Border qubits have fewer than four neighbors and no gate is applied to them in some cycles. The half cycle preceding the measurement consists of the single-qubit gates only. The overall structure of our RQCs is shown in Fig. 3 of the main paper [1].
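The repeating coupler-activation sequences can be expressed as a lookup by cycle index. The 0-based cycle numbering and the names below are assumptions of this sketch.

```python
SUPREMACY_SEQUENCE = "ABCDCDAB"   # eight-cycle sequence for supremacy RQCs
EVALUATION_SEQUENCE = "EFGH"      # four-cycle sequence for evaluation RQCs


def coupler_pattern(cycle, sequence):
    """Coupler activation pattern used in a given full cycle (0-based)."""
    return sequence[cycle % len(sequence)]


# Patterns for the m = 20 full cycles of a supremacy circuit.
supremacy_patterns = "".join(
    coupler_pattern(c, SUPREMACY_SEQUENCE) for c in range(20)
)
```

The final half cycle carries only single-qubit gates, so it has no coupler pattern and is not part of this lookup.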
D. Randomness
Single-qubit gates in every cycle are chosen randomly using a pseudo-random number generator (PRNG). The generator is initialized with a seed s which is the third parameter for our family of RQCs. The single-qubit gate applied to a particular qubit in a given cycle depends only on s. Consequently, two RQCs with the same s apply the same single-qubit gate to a given qubit in a given cycle as long as the qubit and the cycle belong in both RQCs as determined by their size n and depth m parameters.

[Figure S25, image not reproduced. Top row: Patterns A, B, C, D. Bottom row: Patterns E, F, G, H.]

FIG. S25. Coupler activation patterns. A coupler activation pattern determines which qubits are allowed to interact simultaneously in a cycle. Quantum supremacy RQCs utilize the staggered patterns shown in the top row in the sequence ABCDCDAB, repeated in subsequent cycles. Performance evaluation RQCs employ the patterns shown in the bottom row in the sequence EFGH, likewise repeated in subsequent cycles. The former sequence makes SFA simulation harder by facilitating prompt transfer of entanglement created at promising circuit cuts into the bulk of each circuit partition.
Conversely, the choice of single-qubit gates is the sole property of our RQCs that depends on s. In particular, the same two-qubit gate is applied to a given qubit pair in a given cycle by all RQCs that contain the pair and the cycle.

E. Quantum gates
In our experiment, we configure three single-qubit gates. Each one is a π/2-rotation around an axis lying on the equator of the Bloch sphere. Up to global phase, the gates are

\[ X^{1/2} \equiv R_X(\pi/2) = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -i \\ -i & 1 \end{pmatrix}, \tag{50} \]

\[ Y^{1/2} \equiv R_Y(\pi/2) = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}, \tag{51} \]

\[ W^{1/2} \equiv R_{X+Y}(\pi/2) = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -\sqrt{i} \\ \sqrt{-i} & 1 \end{pmatrix}, \tag{52} \]

where $W = (X + Y)/\sqrt{2}$ and $\sqrt{\pm i}$ denotes the principal value of the square root. The first two belong to the single-qubit Clifford group, while $W^{1/2}$ is a non-Clifford gate. Single-qubit gates in the first cycle are chosen independently and uniformly at random from the set of the three gates above. In subsequent cycles, each single-qubit gate is chosen independently and uniformly at random from among the gates above except the gate applied to the qubit in the preceding cycle. This prevents simplifications of some simulation paths in SFA. Consequently, there are $3^n 2^{nm}$ possible random choices for a RQC with n qubits and m cycles.
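The selection rule above can be illustrated with a short NumPy sketch. The matrices follow Eqs. (50)-(52); the seeding scheme (one generator per qubit, seeded by the pair (seed, qubit)) and the function names are assumptions made for illustration only, since the text does not specify how the PRNG is actually used. The scheme is chosen so that the gate on a given qubit in a given cycle depends only on the seed.

```python
import numpy as np

# Pauli matrices and the axis W = (X + Y)/sqrt(2) from the text.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
W = (X + Y) / np.sqrt(2)

# Eqs. (50)-(52); np.sqrt returns the principal square root of a complex number.
SQRT_X = np.array([[1, -1j], [-1j, 1]]) / np.sqrt(2)
SQRT_Y = np.array([[1, -1], [1, 1]], dtype=complex) / np.sqrt(2)
SQRT_W = np.array([[1, -np.sqrt(1j)], [np.sqrt(-1j), 1]]) / np.sqrt(2)
GATE_NAMES = ("X^1/2", "Y^1/2", "W^1/2")


def choose_single_qubit_gates(n_qubits, n_cycles, seed):
    """Return an (n_cycles + 1, n_qubits) array of gate indices 0, 1, 2.

    Hypothetical scheme: one NumPy generator per qubit, seeded by (seed, qubit),
    so the choice for a given qubit and cycle depends only on the seed.
    The extra row accounts for the half cycle before measurement.
    """
    layers = np.empty((n_cycles + 1, n_qubits), dtype=int)
    for q in range(n_qubits):
        rng = np.random.default_rng([seed, q])
        layers[0, q] = rng.integers(0, 3)  # first cycle: uniform over three gates
        for c in range(1, n_cycles + 1):
            # Adding 1 or 2 modulo 3 picks uniformly among the two other gates.
            layers[c, q] = (layers[c - 1, q] + rng.integers(1, 3)) % 3
    return layers


def count_random_choices(n_qubits, n_cycles):
    """Number of distinct single-qubit gate assignments, 3^n * 2^(n m)."""
    return 3**n_qubits * 2 ** (n_qubits * n_cycles)
```

Because each qubit draws from its own stream one cycle at a time, enlarging n or m in this sketch leaves the gates of the smaller circuit unchanged, which mirrors the property stated in Section VII D.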
Two-qubit gates in our RQCs are not randomized, but are determined by qubit pair and cycle number. The gates preserve the number of ground and excited states of the qubits which gives their matrices block diagonal structure with 1×1, 2×2 and 1×1 blocks. Therefore, up to global phase they belong to U(1) ⊕ U(2) ⊕ U(1)/U(1) and thus can be described by five real parameters (see Fig. S16, and Eq. 43). Each gate in this family can be decomposed into four Z-rotations described by three free parameters and the two-parameter fermionic simulation gate

\[ \mathrm{fSim}(\theta, \phi) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & -i\sin\theta & 0 \\ 0 & -i\sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 & e^{-i\phi} \end{pmatrix} \tag{53} \]

which is the product of a fractional iSWAP and controlled phase gate (see Fig. S14b).
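As a numerical companion to Eq. (53), the following sketch builds the fSim matrix and wraps it in Z-rotations. The helper names and the particular way the four Z-rotation angles are arranged (two before, two after) are assumptions for illustration; the text only states that four Z-rotations with three free parameters are involved.

```python
import numpy as np


def fsim(theta, phi):
    """Two-parameter fSim gate of Eq. (53) in the basis |00>, |01>, |10>, |11>."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array(
        [[1, 0, 0, 0],
         [0, c, -1j * s, 0],
         [0, -1j * s, c, 0],
         [0, 0, 0, np.exp(-1j * phi)]],
        dtype=complex,
    )


def z_rotations(a, b):
    """exp(-i (a Z(x)I + b I(x)Z) / 2): a Z-rotation on each of the two qubits."""
    def rz(t):
        return np.diag([np.exp(-0.5j * t), np.exp(0.5j * t)])
    return np.kron(rz(a), rz(b))


def five_parameter_gate(theta, phi, z_before, z_after):
    """Hypothetical parametrization: Z-rotations before and after fSim.

    z_before and z_after are (a, b) angle pairs. An equal rotation of both
    qubits commutes with fSim, so only three of the four angles are independent.
    """
    return z_rotations(*z_after) @ fsim(theta, phi) @ z_rotations(*z_before)
```

For example, `fsim(0.3, 0) @ fsim(0, 0.7)` reproduces `fsim(0.3, 0.7)`, which is the factorization into a fractional iSWAP and a controlled-phase gate mentioned above.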
In our experiment, we tune up the two-qubit gates close to θ ≈ π/2 and φ ≈ π/6 radians and then infer more accurate values of all five parameters for each qubit pair using XEB. Consequently, all five parameters of the two-qubit gate depend on the qubit pair. While inferred unitaries are suitable for RQC sampling, future applications of the Sycamore processor, for example, in quantum chemistry, will require precise targeting of the entangling parameters [49, 56]. The three parameters which control the Z-rotations implicit in the two-qubit gates can be canceled out with active Z-rotations turning an arbitrary five-parameter gate into pure fSim(θ, φ). In our RQCs, we have decided not to apply such correction gates. This choice affords us a greater number of interactions within the available circuit depth budget and introduces additional implicit non-Clifford single-qubit gates into the RQCs.
The Z-rotations have two origins. First, they capture the phase shifts due to qubit frequency excursions during the two-qubit gate. Second, they account for phase changes due to diﬀerent idle frequencies of the interacting qubits. The latter introduces dependency of the three parameters deﬁning the Z-rotations on the time at which the gate is applied. By contrast, for a given qubit pair θ and φ do not depend on the cycle.
The fSim(π/2, π/6) gate is the product of a non-Clifford controlled phase gate and an iSWAP which is a two-qubit Clifford gate.
F. Programmability and universality
Programmability of Sycamore rests on our ability to tune up a variety of gate sets including sets that are universal for quantum computation. For example, the set of gates employed in our quantum supremacy demonstration is universal, as we show in this section.
The proof consists of two parts. First, we show that the CZ gate can be obtained as a composition of two fSim gates and single-qubit rotations. Second, we outline how the well-known proof that the H and T gates are universal for SU(2) [57] can be adapted for $X^{1/2}$ and $W^{1/2}$. The conclusion follows from the fact that the gate set consisting of the CZ gate and SU(2) is universal [58].

1. Decomposition of CZ into fSim gates

Here, we show how to decompose a controlled-phase gate into two fSim gates and several single-qubit gates. The fSim gate is native to our hardware and can be decomposed into

\[ \mathrm{fSim}(\theta, \phi) = e^{-i\theta (X\otimes X + Y\otimes Y)/2}\, e^{-i\phi (I - Z)\otimes(I - Z)/4}, \tag{54} \]

where the iSWAP angle θ ≈ π/2 and the controlled-phase angle φ ≈ π/6. The controlled-phase part can be further decomposed into

\[ e^{-i\phi (I - Z)\otimes(I - Z)/4} = e^{-i\phi/4}\, e^{i\phi (Z\otimes I + I\otimes Z)/4}\, e^{-i\phi Z\otimes Z/4}. \tag{55} \]

To simplify notations, we introduce the two-qubit gate

\[ \Upsilon(\theta, \phi) = e^{-i\theta (X\otimes X + Y\otimes Y)/2}\, e^{-i\phi Z\otimes Z/4} = e^{i\phi/4}\, e^{-i\phi (Z\otimes I + I\otimes Z)/4}\, \mathrm{fSim}(\theta, \phi), \tag{56} \]

which is equivalent to the fSim gate up to single-qubit Z rotations. The sign of θ in Υ(θ, φ) can be changed by the single-qubit transformation,

\[ Z_1\, \Upsilon(\theta, \phi)\, Z_1 = \Upsilon(-\theta, \phi), \tag{57} \]

where $Z_1 = Z \otimes I$ ($Z_2 = I \otimes Z$ works equally well). Multiplying two Υ gates with opposite values of θ on both sides of the operator $X_1 = X \otimes I$, we have

\[ \Upsilon(-\theta, \phi)\, X_1\, \Upsilon(\theta, \phi) = e^{i\theta Y\otimes Y/2}\, X_1\, e^{-i\theta Y\otimes Y/2} = \cos\theta\, X_1 + \sin\theta\, Z\otimes Y. \tag{58} \]

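With the identity (58), we have

\[
\begin{aligned}
\Upsilon(-\theta, \phi)\, e^{i\alpha X_1}\, \Upsilon(\theta, \phi)
&= \cos\alpha \left( \cos\frac{\phi}{2}\, I\otimes I - i\sin\frac{\phi}{2}\, Z\otimes Z \right) + i\sin\alpha \left( \cos\theta\, X\otimes I + \sin\theta\, Z\otimes Y \right) \\
&= \left( \cos\alpha\cos\frac{\phi}{2}\, I + i\sin\alpha\cos\theta\, X \right)\otimes I - i Z\otimes \left( \cos\alpha\sin\frac{\phi}{2}\, Z - \sin\alpha\sin\theta\, Y \right),
\end{aligned}
\tag{59}
\]

where 0 ≤ α ≤ π/2 is to be determined. We introduce the Schmidt operators

\[ \Gamma_1(\alpha) = \cos\alpha\cos(\phi/2)\, I + i\sin\alpha\cos\theta\, X, \tag{60} \]

\[ \Gamma_2(\alpha) = \cos\alpha\sin(\phi/2)\, Z - \sin\alpha\sin\theta\, Y, \tag{61} \]

and the unitary (59) takes the simple form

\[ \Upsilon(-\theta, \phi)\, e^{i\alpha X_1}\, \Upsilon(\theta, \phi) = \Gamma_1\otimes I - i Z\otimes\Gamma_2. \tag{62} \]

The Schmidt rank of this unitary is two. Therefore, it is equivalent to a controlled-phase gate (also with Schmidt rank two) up to some single-qubit unitaries. The two non-zero Schmidt coefficients of the unitary (59) are equal to the operator norms of $\Gamma_{1,2}$.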
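The identities in Eqs. (54)-(62) can be checked numerically. The sketch below is a minimal NumPy verification under the conventions used above (first tensor factor is qubit 1); all function names are hypothetical and chosen for this illustration. The operator-Schmidt singular values are computed from a reshaped 4×4 matrix and, with this normalization, come out as twice the operator norms of the Schmidt operators.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.diag([1, -1]).astype(complex)
XX, YY, ZZ = np.kron(X, X), np.kron(Y, Y), np.kron(Z, Z)
X1, Z1, Z2 = np.kron(X, I2), np.kron(Z, I2), np.kron(I2, Z)


def exp_pauli(angle, pauli):
    """exp(-i * angle * P) for an operator P with P @ P = identity."""
    return np.cos(angle) * np.eye(len(pauli)) - 1j * np.sin(angle) * pauli


def fsim(theta, phi):
    """Right-hand side of Eq. (54); XX and YY commute, so the exponential factorizes."""
    cphase = np.diag([1, 1, 1, np.exp(-1j * phi)])
    return exp_pauli(theta / 2, XX) @ exp_pauli(theta / 2, YY) @ cphase


def upsilon(theta, phi):
    """Eq. (56): fSim with the single-qubit Z-rotations stripped off."""
    return exp_pauli(theta / 2, XX) @ exp_pauli(theta / 2, YY) @ exp_pauli(phi / 4, ZZ)


def sandwich(theta, phi, alpha):
    """Left-hand side of Eq. (62)."""
    return upsilon(-theta, phi) @ exp_pauli(-alpha, X1) @ upsilon(theta, phi)


def gammas(theta, phi, alpha):
    """Schmidt operators of Eqs. (60) and (61)."""
    g1 = np.cos(alpha) * np.cos(phi / 2) * I2 + 1j * np.sin(alpha) * np.cos(theta) * X
    g2 = np.cos(alpha) * np.sin(phi / 2) * Z - np.sin(alpha) * np.sin(theta) * Y
    return g1, g2


def operator_schmidt_values(u):
    """Singular values of a two-qubit operator across the qubit-qubit cut."""
    m = u.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3).reshape(4, 4)
    return np.linalg.svd(m, compute_uv=False)
```

Calling `operator_schmidt_values(sandwich(theta, phi, alpha))` for generic angles returns two non-zero values and two values at the level of rounding error, in line with the Schmidt rank of two stated above.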
The target controlled-phase gate that we want to decompose into the fSim gate is

\[ \mathrm{diag}\left(1, 1, 1, e^{-i\delta}\right) = e^{-i\delta (I - Z)\otimes(I - Z)/4}, \tag{63} \]

where 0 ≤ δ ≤ 2π. It has two non-zero Schmidt coefficients cos(δ/4) and sin(δ/4). For example, we set the operator norm of $\Gamma_2$ to be equal to the second Schmidt coefficient of the target unitary

\[ \|\Gamma_2(\alpha)\| = \sqrt{\left(\cos\alpha\sin(\phi/2)\right)^2 + \left(\sin\alpha\sin\theta\right)^2} = \sin(\delta/4), \tag{64} \]

and the parameter α can be determined

\[ \sin\alpha = \sqrt{\frac{\sin(\delta/4)^2 - \sin(\phi/2)^2}{\sin(\theta)^2 - \sin(\phi/2)^2}}. \tag{65} \]

This equation has a solution if and only if one of the following two conditions is satisfied

\[ |\sin\theta| \le \sin(\delta/4) \le |\sin(\phi/2)|, \tag{66} \]

\[ |\sin(\phi/2)| \le \sin(\delta/4) \le |\sin\theta|. \tag{67} \]

A large set of controlled-phase gates can be implemented with the typical values of θ and φ of the fSim gate, except for those that are very close to the identity.
To fix the local basis of the first qubit in Eq. (59), we introduce two X rotations of the same angle

\[ e^{-i\xi X/2}\, \Gamma_1(\alpha)\, e^{-i\xi X/2} = \cos(\delta/4)\, I, \tag{68} \]

\[ e^{-i\xi X/2}\, Z\, e^{-i\xi X/2} = Z, \tag{69} \]

where the angle ξ is

\[ \xi = \arctan\left( \frac{\tan\alpha\cos\theta}{\cos(\phi/2)} \right) + \frac{\pi}{2}\left( 1 - \mathrm{sgn}\left(\cos(\phi/2)\right) \right). \tag{70} \]

To fix the local basis of the second qubit in Eq. (59), we introduce two X rotations of opposite angles

\[ e^{i\eta X/2}\, \Gamma_2(\alpha)\, e^{-i\eta X/2} = \sin(\delta/4)\, Z, \tag{71} \]

where the angle η is

\[ \eta = \arctan\left( \frac{\tan\alpha\sin\theta}{\sin(\phi/2)} \right) + \frac{\pi}{2}\left( 1 - \mathrm{sgn}\left(\sin(\phi/2)\right) \right). \tag{72} \]

Applying these local X rotations before and after the gate sequence in Eq. (59), we have

\[ e^{-i(\xi X_1 - \eta X_2)/2}\, \Upsilon(-\theta, \phi)\, e^{i\alpha X_1}\, \Upsilon(\theta, \phi)\, e^{-i(\xi X_1 + \eta X_2)/2} = \cos(\delta/4)\, I\otimes I - i\sin(\delta/4)\, Z\otimes Z, \tag{73} \]

which is the desired controlled-phase gate up to some single-qubit Z rotations.
The target controlled-phase gate equals the CZ gate for δ = π. We numerically checked that the decomposition (73) yields the CZ gate for all 86 fSim gates (with different values of θ and φ) in our device.

2. Universality for SU(2)

Here, we show how the argument for the well-known result that the H and T gates are universal for SU(2) [57] can be adapted for the $X^{1/2}$ and $W^{1/2}$ gates. At the core of the argument lies the observation that $T \equiv R_Z(\pi/4)$ followed by $HTH \equiv R_X(\pi/4)$ is a single-qubit rotation by angle α which is an irrational multiple of π. Specifically, α is such that

\[ \cos\frac{\alpha}{2} = \cos^2\frac{\pi}{8} = \frac{1}{2}\left( 1 + \frac{1}{\sqrt{2}} \right). \tag{74} \]

By Theorem B.1 in Appendix B of [57], α/π is irrational because the monic minimal polynomial with rational coefficients of $e^{i\alpha}$

\[ x^4 + x^3 + \frac{1}{4}x^2 + x + 1 \tag{75} \]

is not cyclotomic (since not all its coefficients are integers). Similarly, $W^{1/2} \equiv R_{X+Y}(\pi/2)$ followed by $X^{1/2} \equiv R_X(\pi/2)$ is a single-qubit rotation by angle β such that

\[ \cos\frac{\beta}{2} = \cos^2\frac{\pi}{4} - \frac{1}{\sqrt{2}}\sin^2\frac{\pi}{4} = \frac{1}{2}\left( 1 - \frac{1}{\sqrt{2}} \right). \tag{76} \]

The monic minimal polynomial with rational coefficients of $e^{i\beta}$ is (75), the same as that of $e^{i\alpha}$. Therefore, β is also an irrational multiple of π. The rest of the universality argument for H and T also applies in the case of $X^{1/2}$ and $W^{1/2}$.
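Equations (74)-(76) lend themselves to a quick numerical sanity check. The sketch below composes the two pairs of rotations, reads off the composite angle from the trace (for an SU(2) rotation by angle γ the trace equals 2 cos(γ/2)), and evaluates the polynomial (75) at the corresponding point on the unit circle. The helper names are hypothetical; the check does not replace the algebraic argument of [57], it only confirms the stated values.

```python
import numpy as np


def rotation(axis, angle):
    """SU(2) rotation exp(-i * angle * (n . sigma) / 2) about the unit vector n."""
    x, y, z = np.asarray(axis, dtype=float) / np.linalg.norm(axis)
    n_sigma = np.array([[z, x - 1j * y], [x + 1j * y, -z]])
    return np.cos(angle / 2) * np.eye(2) - 1j * np.sin(angle / 2) * n_sigma


def composite_angle(first, second):
    """Rotation angle of `first` followed by `second` (both in SU(2))."""
    half_trace = np.trace(second @ first).real / 2  # equals cos(angle / 2)
    return 2 * np.arccos(half_trace)


def min_poly(x):
    """The polynomial of Eq. (75)."""
    return x**4 + x**3 + 0.25 * x**2 + x + 1


# T followed by HTH, i.e. R_Z(pi/4) followed by R_X(pi/4).
alpha = composite_angle(rotation([0, 0, 1], np.pi / 4), rotation([1, 0, 0], np.pi / 4))
# W^(1/2) followed by X^(1/2), i.e. R_{X+Y}(pi/2) followed by R_X(pi/2).
beta = composite_angle(rotation([1, 1, 0], np.pi / 2), rotation([1, 0, 0], np.pi / 2))
```

Both `min_poly(np.exp(1j * alpha))` and `min_poly(np.exp(1j * beta))` vanish to within floating-point rounding, consistent with the two angles sharing the same minimal polynomial.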

G. Circuit variants
Since XEB entails classical simulation, it is hard or impossible to use it to estimate experimental ﬁdelity of circuits which are hard or impossible to simulate classically. As described above, we designed our RQCs to ensure that an eﬀective partitioning for SFA exists for circuits with fewer than 51 qubits. This gives rise to a signiﬁcant gap in the cost of classical simulation between quantum supremacy circuits and most of our performance evaluation circuits. This gap facilitates performance evaluation of the Sycamore processor near the quantum supremacy frontier. In practice, however, we would like greater control over the simulation hardness, for two reasons. First, performance evaluation is still very costly for large n approaching the supremacy frontier. Second, we would like to be able to estimate the ﬁdelity of supremacy RQCs more directly, even though classical simulation of this case is unfeasible by deﬁnition.
In order to achieve more fine-grained control over the cost of classical simulation of our RQCs, we exploit the fact that the experimental fidelity depends primarily on the number and quality of the gates while the simulation cost is highly sensitive to the structure of the quantum circuit. Therefore, we approximate the experimental fidelity of RQCs which are hard or impossible to simulate from the fidelity of similar RQCs obtained as the result of transformations that reduce simulation cost without significantly affecting experimental fidelity.
We employ two such transformations. Each decreases simulation cost by reducing the bond dimension of promising circuit cuts. The first one removes some or all cross-partition gates. We say that the removed gates have been elided and term the transformation gate elision. The second transformation changes the sequence of coupler activation patterns shown in Fig. S25 to enable the formation of wedges which reduce the bond dimension by slowing the spread of entanglement generated at the circuit cut.
The two transformations complete the description of RQCs used in our experiment. Consequently, each RQC is uniquely determined by five parameters: number of qubits n, number of cycles m, PRNG seed s, number of elided gates and the sequence of coupler activation patterns.

Circuit variant            Gates elided   Sequence of patterns
non-simplifiable full      none           ABCDCDAB
non-simplifiable elided    some           ABCDCDAB
non-simplifiable patch     all            ABCDCDAB
simplifiable full          none           EFGH
simplifiable elided        some           EFGH
simplifiable patch         all            EFGH

TABLE III. Circuit variants. Six variants of RQCs employed in quantum supremacy demonstration (non-simplifiable full) and performance evaluation (remaining five variants) classified by transformations applied in order to control the cost of classical simulation. The eight coupler activation patterns A, B, ..., H are shown in Fig. S25.

1. Gate elision
The most straightforward way to reduce the cost of classical simulation of a RQC is to remove a number of cross-partition gates across the most promising circuit cut. In order to enable independent propagation by the SFA of the wave function of each circuit partition for the first few cycles, the gates are elided beginning with the initial cycle. Each elided gate reduces the bond dimension of the partitioning by a factor of two or four, see Section X.
We refer to RQCs with a small number of elided gates as elided circuits. A particularly dramatic speedup is possible when all two-qubit gates across the partitions are elided leading to two disconnected circuits running in parallel. We refer to such disconnected RQCs as patch circuits. Base RQCs in which no gates have been elided are referred to as full circuits.
If the error probability of the elided two-qubit gate is similar to the error probability of the two-qubit identity gate which it is replaced with, the circuit resulting from gate elision exhibits fidelity that is similar to the fidelity of the original circuit. This assumption holds when the two-qubit gate errors are dominated by the same decoherence processes that govern the single-qubit gate errors such as finite T1 and T2. Indeed, for circuit sizes where XEB on full circuits is possible, we have observed good agreement between fidelity estimates produced for patch, elided and full circuits. For harder circuits, we have observed good agreement between fidelity estimates for patch and elided circuits. See Section VIII for detailed discussion of these results.

2. Wedge formation
The most competitive algorithm for our hardest circuits, SFA (see Sec. X A) scales proportionally to the bond dimension of the circuit partitioning which is equal to the product of Schmidt rank of all cross-partition gates (see Sec. X D). The Schmidt decomposition of most two-qubit gates in our RQCs consists of four terms (a few gates can be replaced with simpler gates with Schmidt rank of two, see Section X). Therefore most cross-partition gates contribute a factor of four to the bond dimension of the partitioning. However, when two consecutive cross-partition gates share a qubit forming a wedge as shown in Fig. S26, the Schmidt decomposition of the resulting three-qubit unitary also has only four terms. In other words, the second cross-partition gate does not generally produce substantial new entanglement (as quantiﬁed by the Schmidt rank) among the partitions in excess of the entanglement produced by the ﬁrst gate. Consequently, every wedge reduces the bond dimension of the partitioning by a factor of four.
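The Schmidt-rank statements above can be reproduced on small examples. The following NumPy sketch computes the operator-Schmidt rank of a unitary across a bipartition of its qubits, and applies it to a single fSim gate, a CZ gate, a three-qubit wedge, and two cross-partition gates that do not share a qubit. The helpers `embed` and `schmidt_rank` are hypothetical names for this illustration, and fSim(π/2, π/6) is used as a stand-in for the per-pair gates of the device.

```python
import numpy as np


def fsim(theta, phi):
    """fSim gate of Eq. (53)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array(
        [[1, 0, 0, 0], [0, c, -1j * s, 0], [0, -1j * s, c, 0], [0, 0, 0, np.exp(-1j * phi)]],
        dtype=complex,
    )


def embed(gate, i, j, n):
    """Embed a two-qubit gate acting on qubits (i, j) into an n-qubit unitary."""
    state = np.eye(2**n, dtype=complex).reshape((2,) * n + (2**n,))
    g = gate.reshape(2, 2, 2, 2)  # axes: out_i, out_j, in_i, in_j
    state = np.tensordot(g, state, axes=([2, 3], [i, j]))
    state = np.moveaxis(state, [0, 1], [i, j])  # put the output axes back in place
    return state.reshape(2**n, 2**n)


def schmidt_rank(unitary, n, left):
    """Operator-Schmidt rank of an n-qubit unitary across the cut left | rest."""
    right = [q for q in range(n) if q not in left]
    t = unitary.reshape((2,) * (2 * n))  # axes 0..n-1: outputs, n..2n-1: inputs
    order = list(left) + [n + q for q in left] + right + [n + q for q in right]
    m = t.transpose(order).reshape(4 ** len(left), 4 ** len(right))
    return int(np.linalg.matrix_rank(m, tol=1e-9))


g = fsim(np.pi / 2, np.pi / 6)
# Wedge: qubit 0 sits on one side of the cut and interacts with qubits 1 and 2 in turn.
wedge = embed(g, 0, 2, 3) @ embed(g, 0, 1, 3)
# No wedge: two cross-partition gates on disjoint pairs, cut {0, 1} | {2, 3}.
separate = embed(g, 1, 3, 4) @ embed(g, 0, 2, 4)
```

In this toy setting `schmidt_rank(wedge, 3, [0])` is 4 while `schmidt_rank(separate, 4, [0, 1])` is 16, which is the factor-of-four saving per wedge described above. The wedge bound is structural: the operator space of the single shared qubit has dimension four.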
The eight-cycle sequence ABCDCDAB and the four constituent coupler activation patterns A, B, C and D shown in Fig. S25 have been designed to prevent formation of wedges across promising circuit cuts. In other words, the sequence ensures that entanglement created in a given cycle by cross-partition gates is transferred into the bulk of each partition in the following cycle.
On the other hand, the four-cycle sequence EFGH enables formation of wedges and thus eﬃcient simulation of RQCs using SFA. We employ the latter sequence in most evaluation circuits and use the former eight-cycle sequence for the quantum supremacy circuits and largest evaluation circuits, see Table III.
VIII. LARGE SCALE XEB RESULTS
In Section VI, we have detailed the device calibration processes used for individual components such as qubits, couplers, and coupled pairs of qubits. We have also introduced cross-entropy benchmarking (XEB) as a method that allows us to evaluate the performance of a quantum system. In this section, we describe how we use a few circuit variations to benchmark our Sycamore processor at a larger scale. In particular, we present a modular version of XEB with “patch circuits” that does not require exponential classical computation resources for estimating XEB fidelities $F_{\rm XEB}$ of larger systems. We also describe the effect of choice of unitary model on large-scale $F_{\rm XEB}$, as well as how we use patch circuits to monitor the stability of the full system.

FIG. S26. Cross-partition wedge. Two consecutive cross-partition gates which share a qubit form a wedge, as illustrated here with gates highlighted in turquoise and magenta. Schmidt rank of a single two-qubit gate is at most four. Schmidt rank of a wedge is also at most four. Therefore, generally wedges are not efficient at increasing entanglement across partitions and can be simulated efficiently by the SFA.

Circuit variant            n    m    Single-qubit gates   All two-qubit gates   Cross-partition two-qubit gates
non-simplifiable full      53   20   1113                 430                   35
non-simplifiable elided    53   20   1113                 408                   13
non-simplifiable patch     53   20   1113                 395                   0
simplifiable full          38   14   570                  210                   18
simplifiable elided        38   14   570                  204                   12
simplifiable patch         38   14   570                  192                   0

TABLE IV. Gate counts. Number of gates in selected random quantum circuits employed for quantum supremacy demonstration and performance evaluation of the Sycamore processor.
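The entries of Table IV can be cross-checked against the circuit structure of Section VII. The sketch below encodes the table and tests two relations: the number of single-qubit gates equals n(m + 1) (m full cycles plus one half cycle), and removing the remaining cross-partition gates from a full or elided circuit gives the patch count. The class and function names are hypothetical, and the bond-dimension figure is only a crude upper bound under the assumption that every cross-partition gate has Schmidt rank four, ignoring wedges and rank-two simplifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateCounts:
    n: int          # number of qubits
    m: int          # number of full cycles
    single: int     # single-qubit gates
    two_qubit: int  # all two-qubit gates
    cross: int      # cross-partition two-qubit gates


TABLE_IV = {
    "non-simplifiable full": GateCounts(53, 20, 1113, 430, 35),
    "non-simplifiable elided": GateCounts(53, 20, 1113, 408, 13),
    "non-simplifiable patch": GateCounts(53, 20, 1113, 395, 0),
    "simplifiable full": GateCounts(38, 14, 570, 210, 18),
    "simplifiable elided": GateCounts(38, 14, 570, 204, 12),
    "simplifiable patch": GateCounts(38, 14, 570, 192, 0),
}


def expected_single_qubit_gates(n, m):
    """m full cycles plus one half cycle, one single-qubit gate per qubit in each."""
    return n * (m + 1)


def elided_gates(full, variant):
    """Number of two-qubit gates removed relative to the full circuit."""
    return full.two_qubit - variant.two_qubit


def log2_bond_dimension_bound(counts):
    """Crude bound: log2 of 4^(cross-partition gates), ignoring wedges."""
    return 2 * counts.cross
```

With these numbers, the elided variants differ from the corresponding full circuits by 22 gates (non-simplifiable) and 6 gates (simplifiable).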
A. Limitations of full circuits
We first discuss what we refer to as “full circuits”, where for a given set of qubits, all possible two-qubit gates participate in the circuit. With full circuits, we benchmarked the system as a function of size, where, as discussed below, the classical resources and techniques used to compute $F_{\rm XEB}$ depend on the number of qubits. The order in which each qubit was added is labeled in Fig. S27. The rationale behind this ordering is explained in Section VII. At each system size, we executed 10 randomly generated circuit instances on the processor and sampled output bitstrings 500k times for each circuit (unless otherwise specified). To minimize potential instance-to-instance fluctuations, we chose the gate sequences in a persistent, “stable” manner: using a known seed for a random number generator, for each circuit, each time a new qubit is added, we maintain the same gateset for all the “existing” qubits and new gates are only introduced to qubits and pairs associated with the added qubit (see Section VII for details).

FIG. S27. Qubit ordering for large-scale XEB experiments. Illustration of the order in which qubits are added for large-scale experiments. The partition between left (black) and right (blue) qubits along the boundary (dashed red lines) is used in patch and elided circuits, as explained below.
Once a sufficient number of bitstrings are collected, $F_{\rm XEB}$ can be calculated for each system size, following the method described in Section IV. As the system size increases, the computational complexity of XEB analysis grows exponentially, which can be qualitatively divided into three regimes. For system size from 12 to 37 qubits, XEB analysis was carried out by evolving the full quantum state (Schrödinger method) on a high-performance server (88 hyper-threads, 1.5TB memory in our case) using the “qsim” program. At 38 qubits we used a n1-ultramem-160 VM in Google’s cloud (160 hyperthreads, 3.8TB memory). Above 38 qubits, Google’s large-scale cluster computing became necessary, and in addition a hybrid Schrödinger-Feynman approach, the “qsimh” program, was used to improve the efficiency: in this case, we break the system up into two patches, where each patch can be efficiently computed via the Schrödinger method and then connected by a Feynman path-integral approach (see Section X for more details). Finally we used a Schrödinger algorithm in the Jülich supercomputer for some circuits up to 43 qubits.

[Figure S28: XEB fidelity versus number of qubits; series: full circuits, patch circuits.]
FIG. S28. Comparison between XEB with patch circuits and full circuits. Full vs. patch circuit benchmarking up to 38 qubits with 14 cycles, showing close agreement to within the intrinsic fluctuations of the system. We plot the results for patch circuits out to 53 qubits.
In order to reduce the computational cost, we introduce two modiﬁed circuit types in the following sections. By using slightly simpliﬁed gate sequences, these two methods can provide good approximate predictions of system performance all the way out to the “quantum supremacy” regime.
B. Patch circuits: a quick performance indicator for large systems
The simplest approach to large-scale performance estimation is referred to as “patch circuits,” which predicts the performance of the full system by multiplying together the fidelities of non-interacting subsystems, or “patches”. In this work, we use two such subsystems, where each patch is roughly half the size of the full system. The two subsystems are run simultaneously, so that effects such as gate and measurement crosstalk between patches are included, but the two patches are analyzed separately when computing the fidelity. The two patches are defined by the gates removed along their boundary, as illustrated in Fig. S27. For sufficiently large systems, these removed two-qubit gates represent a small portion of the whole circuit. As a consequence, $F_{\rm XEB}$ of the full system can be estimated as the product of the fidelities of the two subsystems; compared with full circuits, the main missing factor is the absence of entanglement between the two patches.
We evaluate the efficacy of using patch circuits by comparing them against full circuits with the same set of qubits. The experimental results can be seen in Fig. 4a (main text), where we show fidelities measured by these two methods for systems from 12 qubits to 53 qubits, in an interleaved fashion. We re-plot this data here in Fig. S28 as well. As expected, the fidelities obtained via patch XEB show a consistent exponential decay (up to fluctuations arising from qubit-dependent gate fidelities and a small amount of system fluctuations) as a function of system size. For every system size investigated, we found that patch and full XEB provide fidelities that are in good agreement with each other, with a typical deviation of ∼5% of the fidelity itself (we attribute the worst-case disagreement of 10% at 34 qubits to a temporary system fluctuation in between the two datasets, which was also seen in interleaved measurement fidelity data). Theoretically, one would expect patch circuits to result in ∼10% higher fidelity than full circuits due to the slightly reduced gate count. We find that patch circuits perform slightly worse than expected, which we believe is due to the fact that the two-qubit gate unitaries are optimized for full operation and not patch operation. In any case, agreement between patch and full circuits shows that patch circuits can be a good estimator for full circuits, which is quite remarkable given the drastic difference in entanglement generated by the two methods. These results give us a good preview of the system performance in all three regimes discussed earlier.
The advantage of using patch circuits lies in its exponentially reduced computational cost, as it only requires calculating FXEB of subsystems at half the full size (or less if a larger number of smaller patches is used). This allows for quick estimates of large-scale system performance on a day-to-day basis, including for system and circuit sizes in the “quantum supremacy” regime. As a consequence, we typically use patch circuits as a quick system performance indicator, which we use for rapid turnarounds between system calibration and performance evaluation, as well as for monitoring full system stability (see Section VIII H). We also note that patch circuits can be used well beyond 50 qubits, and in fact can be extended to arbitrary numbers of qubits while keeping the analysis time at most linear in the number of qubits (or even constant if the patches can be analyzed in parallel), assuming that the patch size stays roughly constant and more non-interacting patches are added as the number of qubits grows.
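A toy version of the patch analysis can be sketched as follows. Several ingredients are assumptions made for illustration and are not taken from this section: the linear XEB estimator $F = 2^n \langle P(x_i) \rangle - 1$ (in the spirit of Section IV), Haar-random states standing in for the ideal output of each patch, a global depolarizing noise model, and the per-patch fidelities 0.6 and 0.5. Each patch is analyzed on its own, and the estimates are multiplied at the end.

```python
import numpy as np


def linear_xeb(ideal_probs, samples):
    """Linear XEB estimate D * <P(x_i)> - 1 over the sampled bitstrings x_i."""
    return ideal_probs.size * ideal_probs[samples].mean() - 1.0


def simulate_patch(rng, n_qubits, fidelity, shots):
    """Return (ideal probabilities, noisy samples) for one hypothetical patch."""
    dim = 2**n_qubits
    # Random complex Gaussian amplitudes, normalized: a stand-in for an RQC output state.
    amplitudes = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    ideal = np.abs(amplitudes) ** 2
    ideal /= ideal.sum()
    # Depolarizing model: ideal distribution with weight F, uniform otherwise.
    noisy = fidelity * ideal + (1 - fidelity) / dim
    samples = rng.choice(dim, size=shots, p=noisy)
    return ideal, samples


def patch_fidelity_estimate(patches):
    """Product of per-patch XEB fidelities; each patch is analyzed separately."""
    return float(np.prod([linear_xeb(p, s) for p, s in patches]))


rng = np.random.default_rng(2019)
patches = [simulate_patch(rng, 16, f, 200_000) for f in (0.6, 0.5)]
estimate = patch_fidelity_estimate(patches)  # close to 0.6 * 0.5, up to statistical noise
```

The cost of this analysis is set by the patch size (here two 16-qubit patches) rather than by the combined 32-qubit system, which is the point of the patch-circuit approach.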
C. Elided circuits: a more rigorous performance estimator for large systems
For a more rigorous prediction of full $F_{\rm XEB}$, we introduce a more sophisticated approach referred to as “elided circuits”. Similar to patch circuits, we partition a given set of qubits into two subsets separated by a boundary, but elide (remove) only a fraction of the two-qubit gates along this boundary during a few early cycles of the sequence (more specifically, we elide the earliest gates in time, meaning early layers will have none of their gates along the boundary while later layers will have all of their usual gates across the boundary). Accordingly, the two subsets of qubits are no longer isolated from each other and we cannot simply compute their fidelities separately and multiply. Rather, we must still compute the evolution of the full system. Given that a sufficient number of gates are elided, we can take advantage of the “weak link” between patches with a hybrid analysis technique: we compute each patch via the Schrödinger method and then connect them with a Feynman path-integral approach (see Section X for more details on this “qsimh” program).

[Figure S29: XEB fidelity versus number of qubits; series: full circuits, elided circuits (6 elided gates).]
FIG. S29. Comparison between XEB with elided circuits and full circuits. Full vs. elided circuit benchmarking up to 38 qubits at 14 cycles, showing close agreement to within the intrinsic fluctuations of the system.
Compared with patch circuits, elided circuits more closely approach a description of the full system performance under a full circuit: in addition to capturing issues such as control and readout crosstalk, elided circuits allow entanglement to form between the two weakly connected subsystems. It covers essentially all the possible processes that occur in the full circuit, and therefore can be used to predict system performance at a dramatically reduced computational cost, albeit signiﬁcantly costlier than patch circuits.
In order to validate the use of elided circuits as a system performance estimator, we evaluated its accuracy via a direct comparison with full circuits. In Fig. S29 we show two sets of fidelities from interleaved full and elided circuit experiments. For every system size investigated, using elided circuits yields a fidelity value that is in good agreement with the one obtained with the corresponding full circuits. The average ratio of elided circuit fidelity to full circuit fidelity over all verification circuits was found to be 1.01, with a standard deviation of 5%, dominated by system fluctuations. It is this agreement that certifies elided circuits as a precise predictor for full circuits (within a systematic relative uncertainty of 5%), which we rely on to extrapolate the system performance in the regimes where full circuit analysis is too expensive to perform (i.e., Fig. 4b of the main text).
Compared with full circuits, elided circuits can result in a reduced amount of quantum entanglement in the system. The reduction in entanglement can be bounded from above by counting the number of iSWAP gates across the boundary: one iSWAP gate generates at most two units of bipartite entanglement (ebits). This upper bound translates directly into the exponential cost of a Schrödinger-Feynman simulation. For elided circuits with 50 qubits and 14 cycles, the full circuit has approximately 25 ebits of entanglement, while with 6 elisions the elided circuit has at most 12 ebits of entanglement between the two patches. For the 53-qubit elided circuits used in the main paper [1], there were enough iSWAPs across the boundary that the amount of entanglement between patches for full vs. elided circuits should be close, giving us even more confidence in using elided circuits to predict the fidelity of the circuit used to claim quantum supremacy.
D. Choice of unitary model for two-qubit entangling gates
In Section VI, we discussed how the two-qubit gate unitaries can be measured by two different approaches: isolated two-qubit XEB and per-layer simultaneous two-qubit XEB. These two methods resulted in two different unitary models when deducing the best-fit unitary. Since we must specify the two-qubit gate unitary matrices in order to compute $F_{\rm XEB}$ of the larger system, a natural question is which unitary model should be used. To address this question, we point out that full XEB on the large system occurs in repeated cycles, where during each two-qubit gate layer, all the two-qubit gates in the same orientation take place at the same time (see Fig. 3 in the main text). As a consequence, the two-qubit gate layers during simultaneous pair XEB in Fig. S19 emulate the corresponding layer when running full XEB on a large system. Accordingly, learning the unitaries in parallel operation captures any small coherent modifications induced by the simultaneous application of the other two-qubit gates, such as flux control crosstalk and dispersive shifts from stray interactions. This is evident from the fact that by re-learning the two-qubit unitary parameters, the errors extracted from simultaneous pair XEB become purity-limited (see Fig. S19). This correspondence assures us that unitary parameters extracted from simultaneous pair XEB provide a more accurate description of the full system when full XEB is performed.

In Fig. S30, we show patch circuit fidelities at different system sizes, where the fidelity is evaluated using three different unitary models: the best-fit unitaries from isolated pair XEB, the best-fit unitaries from simultaneous pair XEB, and the best-fit “Sycamore” unitaries from simultaneous pair XEB. The Sycamore unitaries are the unitaries obtained when keeping the swap angle fixed at θ = π/2 and conditional phase fixed at φ = π/6 for all qubits, and then fitting only for two single-qubit phase terms. For the purpose of benchmarking the system fidelity for the operations we performed, we have focused on using unitaries learned from simultaneous pair XEB, which provide the most accurate description of the system. The validity of this approach is experimentally verified: for the same gate sequences, using the simultaneous pair XEB unitaries leads to the best full-system fidelity values at every system size. This is direct evidence that the unitaries learned from simultaneous pair XEB form a more accurate description of the system than those from isolated pair XEB.
On the other hand, in order to be useful for generic quantum algorithms, it will be desirable to use calibrated gatesets that are independent of the specific gate sequences used. For this purpose, it is important to check the circuit fidelity under the other two unitary models, where the two-qubit gate unitaries were calibrated in more generic settings. One can see that fidelities calculated from these two unitary models still demonstrate nearly as good performance despite the addition of small coherent control errors. They differ from the fidelities using the simultaneous pair XEB unitaries by less than a factor of 2 at 50 qubits (fidelity goes from $9 \times 10^{-3}$ to $5 \times 10^{-3}$ at 50 qubits). This is remarkable since it suggests going from a 2-qubit setting to 50-qubit setting, our full system calibration precision degrades only by a factor of < 2 despite the system size increasing by a factor of 25. This high precision in gate calibration gives us confidence to use our processors in NISQ algorithms.

[Figure S30: Effect of unitary model choice on circuit fidelity. Panel a: m = 14 (simplifiable circuit pattern), patch circuit XEB fidelity versus number of qubits, n. Panel b: n = 53 (non-simplifiable circuit pattern), patch circuit XEB fidelity versus number of cycles, m. Series in both panels: simultaneous, w/ arbitrary unitaries; simultaneous, w/ "Sycamore" unitaries; isolated, w/ arbitrary unitaries.]
FIG. S30. Effect of unitary model on full system fidelity. a, Patch circuit fidelity versus number of qubits and choice of unitary model. b, Same but versus number of cycles and for the non-simplifiable supremacy circuits. Blue: patch XEB fidelities using the unitaries deduced from the best-fit fSim unitary from isolated pair XEB. Green: patch XEB fidelities using the unitaries deduced from the best-fit fSim unitary from per-layer simultaneous pair XEB. Orange: patch XEB fidelities using the unitaries deduced from the best-fit “Sycamore unitary” (θ = π/2, φ = π/6) from per-layer simultaneous pair XEB. As expected, the best fidelities arise from fitting to the most general unitary in parallel operation, although the fidelities are high enough to achieve quantum supremacy with the Sycamore unitary model as well.

E. Understanding system performance: error model prediction
In this section, we perform additional analysis to compare the measured ﬁdelities to that predicted from the constituent gate and measurement errors.
The most commonly used error model in quantum computing theory is the digital error model. Analogous to the independent noise model in classical information theory, the digital error model is based on the assumption that there are no space and time correlations between errors of quantum gates [27, 59, 60]. If this assumption is valid, it should be possible to construct the fidelity of a large quantum system from the fidelities of its constituent parts: single- and two-qubit gates, and measurement. It is important to point out that the gate fidelity metric that should be used here is the entanglement fidelity, $1 - e_P$ (see Section V for more details). This is the correct quantity to describe the fidelity of quantum operations since, in contrast to other metrics such as the commonly used average fidelity, it is independent of the dimension of the Hilbert space.
In Fig. S31, we show ﬁdelities as a function of both system size and number of cycles (circuit depth), measured with patch circuits. In each plot, we compare the measured ﬁdelities to the predicted ﬁdelities, which

[Figure S31 panels: a, XEB fidelity vs. number of qubits, n (patch circuits at m = 14 cycles); b, XEB fidelity vs. number of cycles, m (patch circuits at n = 51 qubits). Legend: measured; predicted (gate error only); predicted (gate and readout error).]
FIG. S31. Predicted vs. measured large-scale XEB ﬁdelity. a, Data and two predictions for 14-cycle patch circuits vs. number of qubits. Predictions are based on the product of single- and two-qubit gate entanglement ﬁdelities under simultaneous operation. Blue curve contains measured ﬁdelities. Orange is the prediction based only on gate errors during parallel operation, but without taking measurement error into account. Green is the same but multiplied by the measured readout ﬁdelities. b, Same as the ﬁrst panel, but vs. number of cycles at a ﬁxed number of qubits n = 51. Again, the prediction from simultaneous gate ﬁdelities and measurement ﬁdelity is a good prediction of the actual system performance.
are calculated from a simple multiplication of individual gate entanglement fidelities as measured during simultaneous operation, along with the measurement fidelities obtained during simultaneous measurement. We note that the measured readout fidelities automatically include the effect of state preparation errors as well. More explicitly, if a circuit contains the set of single-qubit gates $G_1$, the set of two-qubit gates $G_2$, and the set of qubits $Q$, then we approximate the fidelity $F$ as

$$F = \prod_{g \in G_1} (1 - e_g) \prod_{g \in G_2} (1 - e_g) \prod_{q \in Q} (1 - e_q), \tag{77}$$

where $e_g$ are the individual gate Pauli errors and $e_q$ are the state preparation and measurement errors of individual qubits. It is evident that there is good agreement between the measured and predicted fidelities, with deviations of at most 10-20%. Given that the sequence here involves tens of qubits and ∼1000 quantum gates, this level of agreement provides strong evidence for the validity of the digital error model.
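As a minimal sketch of Eq. (77), the following Python snippet multiplies per-gate and per-qubit fidelities into a predicted circuit fidelity. The function name `predicted_fidelity`, the uniform error rates, and the gate counts are illustrative assumptions; they are not the measured per-gate values used for Fig. S31.

```python
import math

def predicted_fidelity(single_qubit_errors, two_qubit_errors, spam_errors):
    """Digital-error-model prediction: product of (1 - e) over all gates and qubits."""
    factors = [*single_qubit_errors, *two_qubit_errors, *spam_errors]
    return math.prod(1.0 - e for e in factors)

# Hypothetical inputs, assumed for illustration only.
e1 = [0.002] * 1000   # Pauli errors of 1000 single-qubit gates
e2 = [0.006] * 400    # Pauli errors of 400 two-qubit gates
eq = [0.04] * 53      # state preparation and measurement errors of 53 qubits

fidelity = predicted_fidelity(e1, e2, eq)
print(f"predicted fidelity: {fidelity:.2e}")
```

Because the prediction is a plain product, each additional gate or qubit lowers the predicted fidelity by its own factor, independent of where it sits in the circuit.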
This conclusion can be further strengthened by the close agreement between the ﬁdelities of full circuits, patch circuits, and elided circuits. Even though these three methods diﬀer only slightly in the gate sequence, they can result in systems with drastically diﬀerent levels of computational complexity and entanglement between subsystems. The agreement between the ﬁdelities measured by these diﬀerent methods, as well as the agreement with the predicted ﬁdelity from individual gates, gives compelling evidence conﬁrming the assumptions made by the digital error model. Moreover, these assumptions remain valid even in the presence of quantum entanglement.
The validation of the digital error model has crucial consequences, in particular for quantum error correction. The absence of space or time correlations in quantum noise has been a commonly assumed property in quantum error correction since the very ﬁrst paper on the topic [59]. Our data is evidence that such a property is achievable with existing quantum processors.

F. Distribution of bitstring probabilities
In Section IV, we motivate two different estimates for fidelity F, one based on the cross entropy, Eq. (28), and the other based on linear cross entropy, Eq. (27). In this section, we examine the probabilities of sampled bitstrings and compare them against theoretical distributions. We use bitstring samples from the non-supremacy region to demonstrate the analysis methodology, then apply it to the samples in the supremacy region.
The theoretical PDF for the bitstring probability p with linear XEB is
$$P_l(x|F) = (F x + (1 - F))\, e^{-x},$$
where $x \equiv Dp$ is the probability p scaled by the Hilbert space dimension D, and F is the linear cross entropy fidelity. The PDF for log p is
$$P_c(x|F) = (1 + F(e^{x} - 1))\, e^{x - e^{x}},$$
where $x \equiv \log(Dp)$ and F is the cross entropy fidelity.
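The two densities can be checked numerically. The sketch below, which assumes NumPy and SciPy are available, verifies that both PDFs integrate to one and that their means are 1 + F and F − γ, which is what the fidelity estimators rely on. The function names and the finite integration limits are assumptions made for this illustration; F = 0.218 is the value that appears in the Fig. S32 legend.

```python
import numpy as np
from scipy import integrate

def pdf_linear(x, fidelity):
    """P_l(x|F) for x = D p, defined on x >= 0."""
    return (fidelity * x + (1.0 - fidelity)) * np.exp(-x)

def pdf_log(x, fidelity):
    """P_c(x|F) for x = log(D p), defined on the whole real line."""
    return (1.0 + fidelity * (np.exp(x) - 1.0)) * np.exp(x - np.exp(x))

F = 0.218

# Finite limits are an assumption; the neglected tails are far below the tolerance.
norm_l, _ = integrate.quad(pdf_linear, 0.0, 60.0, args=(F,))
mean_l, _ = integrate.quad(lambda x: x * pdf_linear(x, F), 0.0, 60.0)
norm_c, _ = integrate.quad(pdf_log, -40.0, 5.0, args=(F,))
mean_c, _ = integrate.quad(lambda x: x * pdf_log(x, F), -40.0, 5.0)

print(norm_l, mean_l - 1.0)             # normalization and recovered F (linear)
print(norm_c, mean_c + np.euler_gamma)  # normalization and recovered F (log)
```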

[Figure S32 panels, titled "Distribution of bitstring probability, 20 qubits": a, Pr(p) vs. x = Dp with theoretical curves Pl(x|F = 0), Pl(x|F = 0.218), and Pl(x|F = 1); b, Pr(p) vs. x = log(Dp) with theoretical curves Pc(x|F = 0), Pc(x|F = 0.218), and Pc(x|F = 1). Legend: uniformly random bitstrings; experimental bitstrings; ideal circuit output.]

FIG. S32. Histograms of ideal probabilities. The ideal probability p is calculated from the final state amplitudes of a (20-qubit 14-cycle) random circuit. The blue, orange, and green histograms show the ideal probabilities of bitstrings sampled uniformly at random, from the experiment, and from the ideal output, respectively. a, The distribution of Dp and theoretical curves $P_l(x|F_l)$ normalized to histogram counts for $F_l = 0, \hat{F}_l, 1$, respectively. b, The distribution of log(Dp) and theoretical curves $P_c(x|F_c)$ for $F_c = 0, \hat{F}_c, 1$, respectively.

From a set of bitstrings $\{q_i\}$, the fidelity is estimated from the ideal probabilities $\{p_i = p_s(q_i)\}$ as

$$\hat{F}_l = \langle Dp \rangle - 1, \tag{78}$$

$$\hat{F}_c = \langle \log(Dp) \rangle + \gamma, \tag{79}$$

where $\langle \cdot \rangle$ denotes the average over the sampled bitstrings and γ is the Euler-Mascheroni constant, see Sec. IV B. Figure S32 shows the distribution of $\{p_i\}$ from 0.5 million bitstrings obtained in an experiment with a 20-qubit 14-cycle random quantum circuit. For comparison, we produce 0.5 million bitstrings sampled uniformly at random and 0.5 million bitstrings sampled from the output distribution of the ideal circuit and show them in the same figure. The theoretical distribution curves are also shown, where the fidelity estimated from data is fed into

[Figure S33: p-value (log scale, down to 10^-20) of the Kolmogorov distribution vs. √Ns · DKS, from 0 to 5.]

FIG. S33. The Kolmogorov distribution function. This function is used to compute p-value from a given DKS and number of samples Ns.

the curves $P_l(x|\hat{F})$ and $P_c(x|\hat{F})$. We see good agreement between experiment and theory. To quantify the agreement, we use the Kolmogorov-Smirnov test [61] to characterize the goodness of fit of data $\{p_i\}$ to theoretical PDFs. First we compute the Kolmogorov-Smirnov statistic $D_{KS}$, that is, the distance between data and theory as the supremum of point-wise distances between the empirical cumulative distribution function of data ECDF(p) and the theoretical cumulative distribution function CDF(p):
$$D_{KS} = \sup_i |\mathrm{ECDF}(p_i) - \mathrm{CDF}(p_i)|.$$
We then convert the distance $D_{KS}$ to a p-value using the Kolmogorov distribution shown in Fig. S33. The p-value is used for rejecting the null hypothesis that the data $\{p_i\}$ is consistent with the theoretical distribution. The whole Kolmogorov-Smirnov test is done using the scipy package [62] and checked against the R function ks.test [63]. Both implementations produce consistent results.
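The procedure can be sketched with scipy on synthetic data. This is an illustration under stated assumptions, not the analysis code used for the figures: the sample is drawn from the large-D model $P_l(x|F)$ itself (a mixture of Exp(1) and Gamma(2, 1) values) rather than from experiment, and the names `cdf_linear` and `sample_scaled_probabilities` are hypothetical. The CDF follows from integrating $P_l$: $1 - (1 + Fx)e^{-x}$.

```python
import numpy as np
from scipy import special, stats

def cdf_linear(x, fidelity):
    """CDF of P_l(x|F) = (F x + 1 - F) exp(-x), obtained by direct integration."""
    return 1.0 - (1.0 + fidelity * x) * np.exp(-x)

def sample_scaled_probabilities(rng, n_samples, fidelity):
    """Synthetic x = D p values: Exp(1) with weight 1 - F, Gamma(2, 1) with weight F."""
    x = rng.exponential(1.0, n_samples)
    from_ideal = rng.random(n_samples) < fidelity
    # The sum of two independent Exp(1) variables is Gamma(2, 1), with density x exp(-x).
    x[from_ideal] += rng.exponential(1.0, from_ideal.sum())
    return x

rng = np.random.default_rng(seed=0)
x = sample_scaled_probabilities(rng, 500_000, 0.218)
f_hat = x.mean() - 1.0  # linear XEB estimate, Eq. (78)

fit = stats.kstest(x, cdf_linear, args=(f_hat,))   # null hypothesis: F = f_hat
null = stats.kstest(x, cdf_linear, args=(0.0,))    # null hypothesis: F = 0

# Asymptotic p-value from the Kolmogorov distribution, as read off Fig. S33.
p_asymptotic = special.kolmogorov(np.sqrt(x.size) * fit.statistic)
print(f_hat, fit.statistic, fit.pvalue, p_asymptotic, null.statistic)
```

In this model the distance between the F = 0 and F = 0.218 CDFs is $\max_x F x e^{-x} = F/e \approx 0.08$, which is the same scale as the $D_{KS} \approx 0.07$ reported for the 20-qubit data.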
We test the ideal probabilities of bitstrings observed in the experiment $\{p_i\}$ against two theoretical distributions, one with estimated fidelity $F = \hat{F}$ and one with fidelity F = 0. The Kolmogorov-Smirnov statistic $D_{KS}$ and the p-value of every circuit are shown in Fig. S34. Note that the p-values for F = 0 are not shown because they are below $10^{-20}$ due to the large $D_{KS} \approx 0.07$ with $N_s = 5 \times 10^5$ points in the sample. That is evident from reading off Fig. S33.
We reject the null hypothesis that the experimental bitstrings are consistent with the uniform random distribution with very high confidence for this (20-qubit 14-cycle) random circuit.
[Figure S34 panels, titled "Kolmogorov Smirnov test, 20 qubits": upper panel, linear XEB; lower panel, log XEB. Each shows the p-value and DKS for F = F̂ and for F = 0 vs. circuit index (0 to 9), on a log scale.]

FIG. S34. The Kolmogorov-Smirnov test results for each of 10 circuits for a (20-qubit 14-cycle) random circuit. See text for the definition of DKS and p-value. The upper plot is for linear XEB, and the lower one is for log XEB.

[Figure S35 panels, titled "Distribution of bitstring probability, 53 qubits": a, counts vs. x = Dp; b, counts vs. x = log(Dp). Legend: theoretical curve; data histogram.]

FIG. S35. Distribution of bitstring probabilities from a 53-qubit 20-cycle circuit. We calculate the theoretical probabilities of experimentally-observed bitstrings. a, The distribution of Dp and the theoretical curve $P_l(x|\hat{F}_l)$ normalized to histogram counts. b, The distribution of log(Dp) with theoretical curve $P_c(x|\hat{F}_c)$.

Now we turn our attention to the supremacy circuits. We use random circuits with gate elisions for checking the distributions because it is exponentially expensive to calculate the ideal theoretical probability of a bitstring without gate elisions. The effect on fidelity from gate elisions is well understood, see Sec. VIII C. The gate elisions are chosen to minimize the effect while making the classical estimation feasible, see Sec. VII G 1. We sample $N_s = 3 \times 10^6$ bitstrings $\{q_i | i = 1...N_s\}$ from each of 10 (53-qubit 20-cycle) random circuits, and compute the theoretical ideal probabilities of each bitstring $\{p_i | i = 1...N_s\}$.
The distributions of Dp and log(Dp) from one such circuit along with the corresponding theoretical curves are shown in Fig. S35.
We again use the Kolmogorov-Smirnov test to characterize the goodness of fit of data $\{p_i\}$ to theoretical PDFs with estimated fidelity $F = \hat{F}$ and zero fidelity F = 0. The Kolmogorov-Smirnov statistic $D_{KS}$ and the p-value of every circuit are shown in Fig. S36.
The p-value for the null hypothesis of zero fidelity is generally small for every circuit, with a maximum of 0.045 for circuit number 1. The null hypothesis of zero fidelity is therefore rejected at better than the 95% confidence level for each circuit. On the other hand, the p-value for the null hypothesis of estimated fidelity $\hat{F}$ is generally large. The p-value is between 0.18 and 0.98 for linear XEB, and between 0.33 and 0.98 for log XEB. That indicates that the empirical cumulative distribution functions ECDF($p_i$) from data are consistent with the theoretical CDF($p_i|\hat{F}$).
As will be seen in Fig. S38 in section VIII G below, the fidelities of individual circuits are consistent with each other within the statistical uncertainties. Therefore it makes sense to do a Kolmogorov-Smirnov test on all samples combined, containing 30 million bitstrings. The estimated fidelities from the combined sample are $\hat{F}_l = 2.24 \times 10^{-3}$ and $\hat{F}_c = 2.34 \times 10^{-3}$, respectively. The

[Figure S36 panels, titled "Kolmogorov Smirnov test, 53 qubits": upper panel, linear XEB; lower panel, log XEB. Each shows the p-value and DKS for F = F̂ and for F = 0 vs. circuit index (0 to 9), on a log scale.]

FIG. S36. The Kolmogorov-Smirnov test results for random circuits with 53 qubits. The upper plot is for linear XEB, and the lower one is for log XEB.

|            | DKS, F = F̂  | DKS, F = 0  | p-value, F = F̂ | p-value, F = 0 |
| Linear XEB | 1.3 × 10^-4 | 9.6 × 10^-4 | 0.66           | < 2.2 × 10^-16 |
| Log XEB    | 9.5 × 10^-5 | 9.6 × 10^-4 | 0.95           | < 2.2 × 10^-16 |

TABLE V. The Kolmogorov-Smirnov test results on combined samples.

$D_{KS}$ and p-values are listed in Table V. The p-value for the null hypothesis of F = 0 is very small: p-value = $3 \times 10^{-24}$ from scipy, and p-value < $2.2 \times 10^{-16}$ from R. We list the more conservative value in the table. The null hypothesis of F = 0 is rejected with a much higher confidence level than for individual circuits.

G. Statistical uncertainties of XEB measurements
In this section we check the statistical uncertainties of our ﬁdelity estimates against theoretical predictions.
The statistical uncertainties of $\hat{F}_l$ and $\hat{F}_c$ are estimated from data using the standard error-on-mean formula as
$$\hat{\sigma}_{F_l} = D \sqrt{\mathrm{Var}(p)/N_s},$$
$$\hat{\sigma}_{F_c} = \sqrt{\mathrm{Var}(\log p)/N_s},$$
where Var(x) is the variance estimator of sample $\{x_i\}$. Because the distributions of p and log p have finite variances both experimentally and theoretically, we can use the bootstrap procedure [64] to verify the estimate of statistical uncertainties.
The fidelity distributions from 4000 bootstrap samples are shown in Fig. S37. The distributions of $\hat{F}_l$ and $\hat{F}_c$ are each fit to a Gaussian distribution function using maximum likelihood.
The Kolmogorov-Smirnov test on the Gaussian fit produces p-values of 0.99 and 0.41 for the $\hat{F}_l$ and $\hat{F}_c$ bootstrap distributions, respectively. This indicates that the central limit theorem is at work and the distributions are consistent with Gaussian distributions.
The estimated statistical uncertainty, the standard deviation of the bootstrap distribution, and the σ parameter of the Gaussian fit are compared against each other to verify that the statistical uncertainty estimate is minimally biased. For the example circuit used in the figures, the three parameters are 5.78, 5.78, 5.78 ($\times 10^{-4}$) for $\hat{\sigma}_{F_l}$, respectively. The same parameters for $\hat{\sigma}_{F_c}$ are 7.40, 7.46, 7.46 ($\times 10^{-4}$). The relative differences are less than 1%, consistent with the expected agreement of parameters for 4000 bootstrap samples.
We repeat the bootstrap procedure on all ten 53-qubit 20-cycle circuits with 2500 bootstrap resamples. The statistical uncertainty estimates are all within 3.1% of the bootstrap standard deviation.
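A minimal bootstrap sketch for the linear XEB estimator is shown below. It is an illustration on synthetic data, not the analysis code: the sample is a stand-in for Dp at zero fidelity (exponentially distributed values), the sample size and number of resamples are reduced to keep it fast, and the function name `bootstrap_std_of_fidelity` is an assumption.

```python
import numpy as np

def bootstrap_std_of_fidelity(x, n_resamples, rng):
    """Standard deviation of the linear XEB estimate over bootstrap resamples of x = D p."""
    estimates = np.empty(n_resamples)
    for b in range(n_resamples):
        resample = rng.choice(x, size=x.size, replace=True)  # draw with replacement
        estimates[b] = resample.mean() - 1.0
    return estimates.std(ddof=1)

rng = np.random.default_rng(seed=1)
x = rng.exponential(1.0, 20_000)  # synthetic stand-in for D p (assumed, not measured)

sigma_formula = np.sqrt(x.var(ddof=1) / x.size)        # standard error on the mean
sigma_bootstrap = bootstrap_std_of_fidelity(x, 400, rng)
print(sigma_formula, sigma_bootstrap)
```

The two numbers should agree up to the sampling noise of the bootstrap itself, which shrinks as the number of resamples grows.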
The combined linear cross entropy fidelity and statistical uncertainty of 10 random circuits is calculated using inverse-variance weighting to be $\hat{F}_l = (2.24 \pm 0.18) \times 10^{-3}$. The theoretical prediction of the statistical uncertainty, $\sqrt{(1 + 2F - F^2)/N_s}$, is $1.8 \times 10^{-4}$, which agrees with the experimental estimate. As a comparison, the combined cross entropy fidelity is $\hat{F}_c = (2.34 \pm 0.23) \times 10^{-3}$. The theoretical prediction of statistical uncertainty, $\sqrt{(\pi^2/6 - F^2)/N_s}$, is $2.3 \times 10^{-4}$, which agrees with the experimental estimate as well. Thus, the cross entropy fidelity and linear cross entropy fidelity estimators produce consistent results. Furthermore, the statistical uncertainty of the linear cross entropy estimator is smaller, as expected from its theoretical formula.
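The two theoretical predictions can be reproduced with a few lines of arithmetic. The sketch below assumes that the combined sample size is the 30 million bitstrings quoted earlier (ten circuits with 3 million bitstrings each); the variable names are chosen for this illustration.

```python
import math

n_samples = 10 * 3_000_000          # ten circuits, 3 million bitstrings each
f_linear, f_log = 2.24e-3, 2.34e-3  # combined fidelity estimates quoted in the text

# Square root of (variance of one sample under the theoretical PDF) / N_s.
sigma_linear = math.sqrt((1.0 + 2.0 * f_linear - f_linear**2) / n_samples)
sigma_log = math.sqrt((math.pi**2 / 6.0 - f_log**2) / n_samples)
print(f"{sigma_linear:.2e} {sigma_log:.2e}")
```

Both values round to the $1.8 \times 10^{-4}$ and $2.3 \times 10^{-4}$ quoted above. At these small fidelities the ratio of the two is close to $\sqrt{\pi^2/6} \approx 1.28$, which is why the linear estimator has the smaller uncertainty.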
In Fig. S38, we also show the linear XEB ﬁdelities and 5σ statistical uncertainties of all 10 elided circuit instances for each circuit depth from Fig. 4b of the main text. Variations between the ﬁdelities of diﬀerent circuit instances are consistent with the expected statistical noise due to the ﬁnite number of samples. In the last

[Figure S37 panels, titled "Distribution of fidelities from bootstrapping": a, counts vs. linear cross entropy fidelity Fl (0.000 to 0.004); b, counts vs. cross entropy fidelity Fc (0.000 to 0.004). Legend: Gaussian fit; data histogram.]
FIG. S37. Distribution of ﬁdelity from 4000 bootstrap samples. a, The distribution of bootstrap Fˆl. The theoretical curve is a Gaussian ﬁt normalized to histogram counts. b, The distribution of bootstrap Fˆc, with Gaussian ﬁt.
panel, we also show the smaller statistical uncertainties of the ﬁdelity averaged over the 10 circuit instances for each depth.
H. System stability and systematic uncertainties
In addition to statistical errors, XEB ﬁdelity is also subject to systematic drift as the system performance may ﬂuctuate and/or degrade over time. To quantify these mechanisms, we performed a patch circuits time stability measurement on 53 qubits using a circuit of 16 cycles and 1 million bitstrings for 17.4 hours after calibration. In between these measurements, we measured the ﬁdelity of other 53-qubit circuits with 16 to 20 cycles. The analyzed results are shown in Fig. S39. The statistical uncertainties of the ﬁdelities are estimated to be 1.29 × 10−4, as indicated by the error bars.

We repeated the stability measurements twice, with different circuits and on different days. Fig. S39 shows the one that exhibits greater degradation as a conservative estimate of the effect. The measurement indicates a degradation of fidelity over the measured time range. A linear fit with $F = p_0 + p_1 t$ results in estimated parameters $\hat{p}_0 = (5.51 \pm 0.055) \times 10^{-3}$ and $\hat{p}_1 = (-6.87 \pm 0.64) \times 10^{-5}$, with a correlation coefficient ρ between $\hat{p}_0$ and $\hat{p}_1$ of −0.76. The χ² per degree of freedom is 26.3/11.
The p-value for the χ2 for 11 degrees of freedom is 0.0058, indicating that it is not a very good ﬁt. Because the correctness of the estimates of statistical uncertainties has been veriﬁed in Section VIII G, this is attributed to systematic ﬂuctuation in addition to degradation. It is supported by the larger variance of ﬁdelity than the 1σ band in Fig. S39.
The 1σ band depends on the statistical uncertainties of ﬁdelities and the variance of time on the x-axis, but is independent of the variance of ﬁdelity. To take the variance of ﬁdelity into account, we use the variance of the residuals of the linear ﬁt as an estimator of the variance of ﬁdelity. The standard deviation of residuals is estimated to be 1.84 × 10−4, which is added to σp0 in quadrature to be the total σp0 . The estimate is total σp0 = 1.92 × 10−4, 3.5 times larger than the statistical-only σp0 of 5.5×10−5.
The uncertainty on a ﬁdelity measured at time t can be estimated by the standard error propagation, assuming that t is uncorrelated with either p0 or p1.

$$\sigma_F = \left(\sigma_{p_0}^2 + 2 t\, \sigma_{p_0} \sigma_{p_1} \rho + \sigma_{p_1}^2 t^2\right)^{1/2} \tag{80}$$

The value of σF as well as the ratio σF /F in the range of measured ﬁdelities monotonically decreases. We take max(σF /F ) as the estimate of relative systematic uncertainty for ﬁdelities measured in the same run. The value is found to be 4.4% and is used in subsequent analysis.
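The quadrature sum for the total $\sigma_{p_0}$ and the propagation formula of Eq. (80) can be sketched as follows. The numerical inputs are the values quoted in the text; the function name `sigma_fidelity` is an assumption, and the sketch is not meant to reproduce the quoted 4.4%, which depends on fit details not given here.

```python
import math

def sigma_fidelity(t, sigma_p0, sigma_p1, rho):
    """Eq. (80): uncertainty of F = p0 + p1 * t for correlated fit parameters."""
    variance = sigma_p0**2 + 2.0 * t * sigma_p0 * sigma_p1 * rho + (sigma_p1 * t)**2
    return math.sqrt(variance)

sigma_p0_stat = 5.5e-5     # statistical-only uncertainty of p0 (from the text)
sigma_residuals = 1.84e-4  # standard deviation of the fit residuals (from the text)

# Addition in quadrature gives the total uncertainty of p0.
sigma_p0_total = math.hypot(sigma_p0_stat, sigma_residuals)
ratio = sigma_p0_total / sigma_p0_stat
print(f"{sigma_p0_total:.2e} {ratio:.1f}")

# At t = 0 the propagated uncertainty reduces to the uncertainty of p0 alone.
sigma_at_start = sigma_fidelity(0.0, sigma_p0_total, 0.64e-5, -0.76)
```

The quadrature sum matches the total $\sigma_{p_0} = 1.92 \times 10^{-4}$ and the factor of 3.5 quoted above.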
The physical origin of the observed system ﬂuctuations can be attributed to many possible channels: 1/f ﬂux noise, qubit T1 ﬂuctuations, control signal drift, etc. We speculate that the dominant mechanism is the moderate interaction between a small number of TLSs and a few qubits at their idling and/or readout biases. In Fig. S40a, we show the result of measuring per-layer simultaneous pair XEB at a ﬁxed depth of 14 cycles repeatedly over time. The quantity plotted is the ratio of the worst pair ﬁdelity to best ﬁdelity observed over the course of 30 minutes. This type of repetitive measurement allows us to pinpoint which pairs dominate the ﬂuctuations in full system ﬁdelity. Note that because we used ﬁdelity at a ﬁxed cycle depth rather than the one extracted from the exponential decay, these numbers contain the eﬀect of ﬂuctuating measurement ﬁdelity as well.
As shown in Fig. S40a, the fidelity of most pairs fluctuates downward by only ∼1% at depth 14, which translates to either a ∼1% fluctuation in measurement fidelity for a pair, or a ∼0.08% fluctuation in the two-qubit gate fidelity for a pair. Before we found the unstable TLS defect in Fig. S40b, a single qubit dominated the fluctuations in full system fidelity seen

[Figure S38 panels a to f: a to e, XEB fidelity vs. circuit instance, one panel per circuit depth; f, XEB fidelity averaged over instances vs. number of cycles, m.]

FIG. S38. Per-instance elided circuit fidelities and statistical uncertainties. XEB fidelities of all 10 elided circuit instances for each circuit depth from Fig. 4b of the main text. a to e, Here, each panel corresponds to a single circuit depth m. In these panels, ±5σ statistical error bars, where $\sigma = 1/\sqrt{N_s}$, are shown for each of the individual circuit instance fidelities. Also shown is a band corresponding to ±σ for a single instance, but about the mean fidelity of the 10 instances, showing that the variations between circuits can be explained by statistical fluctuations from the finite number of samples. f, Fidelity averaged over all 10 circuits along with ±5σ error bars are shown (the same quantity is plotted in Fig. 4b of the main text but on a log scale), where in this case $\sigma = 1/\sqrt{10 N_s}$. Here, for all circuit depths, the mean fidelity is more than 5σ above 0.001.

in Fig. S40c. After we moved this problematic qubit far from the ﬂuctuating TLS, the ﬂuctuations in ﬁdelity during the actual quantum supremacy experiment (Fig. S39) were dominated by a handful of pairs containing qubits in the “degenerate” readout region (described in section VI). For these qubits, due to constraints from readout crosstalk we had little freedom in what readout detunings we could choose, and so the best we could do was to put some qubits near defects or transmon-resonator transition modes during readout. We speculate that this is where the remaining dominant ﬂuctuations originate.
I. The ﬁdelity result and the null hypothesis on quantum supremacy
We use the mean fidelity of ten 53-qubit 20-cycle circuits as the final benchmark of the system. In section VIII G we estimated the fidelity and statistical uncertainty to be $(2.24 \pm 0.18) \times 10^{-3}$ using the linear cross entropy. In section VIII H we estimated the relative systematic uncertainty due to drift to be 4.4%. Combining these two estimates we arrive at the final fidelity of $(2.24 \pm 0.10(\mathrm{syst.}) \pm 0.18(\mathrm{stat.})) \times 10^{-3}$. Fidelity estimates with statistical and systematic uncertainty for other quantum circuits are shown in Fig. S41.
As we show in section X, a noisy sampling of a random quantum circuit at fidelity $F = 10^{-3}$ requires 5000 years with a classical computer with CPU power equivalent to 1 million cores, and the cost scales linearly with fidelity F. It takes a quantum computer less than an hour to complete the same noisy sampling. Therefore we form the null hypothesis that the fidelity of the quantum computer is $F \le 10^{-3}$, and the alternative hypothesis that $F > 10^{-3}$. If the alternative hypothesis is true, we can say that a classical computer cannot perform the same noisy sampling task as the quantum computer.
The total uncertainty on ﬁdelity is estimated with addition in quadrature of systematic uncertainty and statistical uncertainty. The mean ﬁdelity of 10 random circuits with 53 qubits and 20 cycles is (2.24 ± 0.21) × 10−3. The null hypothesis is therefore rejected with a signiﬁcance of 6σ.
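The combination of uncertainties and the resulting significance can be checked with a short calculation. This is a sketch using only numbers quoted in this section; the variable names are assumptions.

```python
import math

fidelity = 2.24e-3
sigma_stat = 0.18e-3
sigma_syst = 0.044 * fidelity   # 4.4% relative systematic uncertainty
threshold = 1.0e-3              # null hypothesis: F <= 1e-3

sigma_total = math.hypot(sigma_stat, sigma_syst)   # addition in quadrature
significance = (fidelity - threshold) / sigma_total
print(f"total uncertainty {sigma_total:.2e}, significance {significance:.1f} sigma")
```

The total uncertainty rounds to $0.21 \times 10^{-3}$, and the distance of the measured fidelity from the threshold is about six standard deviations, in line with the 6σ quoted above.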
While our analysis of the uncertainty in FXEB was computed from both statistical and systematic errors, some care should be taken in the consideration of systematic errors as they pertain to the claim of quantum supremacy. Systematic errors should be included if we wish to use the XEB ﬁdelity value, for example comparing ﬁdelities of patch, elided and full circuits. However

[Figure S39, titled "53-qubit 16-cycle patch XEB fidelity": fidelity (0.0042 to 0.0058) vs. time after calibration (0 to 20 hours). Legend: measured fidelities; linear max likelihood fit; 1σ band.]

FIG. S39. Stability of repeated 53-qubit 16-cycle patch circuit benchmarking over 17.4 hours, without any system recalibration. Statistical error bars from the ﬁnite bitstring sample number are included. The intrinsic system ﬂuctuations are likely dominated by a small number of TLSs moderately coupled to a few qubits at their idling and/or readout biases.

for quantum supremacy, a false claim would arise if $F_{XEB}$ was zero, but we obtained a non-zero value because of a fluctuation. Systematic fluctuations produce a change in magnitude of XEB, as seen in the data in this section, which is thus a multiplicative-type error that does not change the XEB fidelity value when it is zero. A false positive is only produced by an additive-type statistical fluctuation and thus it is the only mechanism that should be considered when computing the uncertainty. Therefore, the 6σ significance of our claim should be considered as conservative.
Some skeptics have warned that a quantum computer may not be possible [65, 66], for example due to the fragility of quantum information at large qubit number and exponentially large Hilbert space. The demonstration here of quantum behavior in a Hilbert space of dimension $10^{16}$ is strong confirmation that nothing unusual or unexpected happens to our current understanding of quantum mechanics at this scale.

IX. SENSITIVITY OF XEB TO ERRORS
An important requirement for a procedure used to evaluate quantum processors, such as XEB, is sensitivity to errors. Qubit amplitudes are complex variables and therefore quantum errors are inherently continuous. Nevertheless, they can be given a discrete description, for example in the form of a ﬁnite set of Pauli operators. The digital error model is used for instance in quantum error correction where errors are discretized by syndrome extraction. In this section we examine the impact of both discrete and continuous errors on the ﬁdelity estimate

obtained from the XEB algorithm.
The XEB procedure uses a set of random quantum circuits $U = \{U_1, \ldots, U_S\}$ with n qubits and m cycles. Every circuit is executed $N_s$ times on the quantum processor under test. Each execution of the circuit $U_j$ applies the quantum operation $\Lambda_j$, which is an imperfect realization of $U_j$, to the input state $|0\rangle\langle 0|$. The result of the experiment is a set B of $S N_s$ bitstrings $q_{i,j}$ sampled from the distributions $p_e(q_{i,j}) = \langle q_{i,j}| \rho_j |q_{i,j}\rangle$ where $\rho_j = \Lambda_j(|0\rangle\langle 0|)$ is the output state in the experiments with circuit $U_j$. For each bitstring $q_{i,j}$, a simulator computes the ideal probability $p_s(q_{i,j}) = |\langle q_{i,j}|\psi_j\rangle|^2$ where $|\psi_j\rangle = U_j |0\rangle$ is the ideal output state of the circuit $U_j$. Finally, XEB uses Eq. (27) or (28) to compute an estimate $F_{XEB}(B, U)$ of fidelity $F(|\psi_j\rangle\langle\psi_j|, \rho_j) = \langle\psi_j| \rho_j |\psi_j\rangle$ averaged over circuits U. The result quantifies how well the quantum processor is able to realize quantum circuits of size n and depth m. See section IV for more details on XEB.
The estimate $F_{XEB}(B, U)$ is a function of bitstrings B obtained in experiment and of the set of quantum circuits U used to compute ideal probabilities. This enables a test of the sensitivity of the method to errors by replacing the error-free reference circuits $U = \{U_1, \ldots, U_S\}$ with circuits $U_E = \{U_{1,E}, \ldots, U_{S,E}\}$ where $U_{j,E}$ is the quantum circuit obtained from $U_j$ by the insertion at a particular location in the circuit of a gate E representing the error. We identify errors inserted at different circuit locations that lead to the same output distribution since XEB cannot differentiate between them.
We ﬁrst consider the impact of a discrete single-qubit Pauli error E placed in a random location in the circuit. In Fig. S42 we plot FXEB(B, UE) where B are bitstrings observed in our experiment and UE are quantum circuits modiﬁed by the insertion of an additional X or Z gate following an existing single-qubit gate. Each ﬁdelity estimate corresponds to a diﬀerent circuit location where the error gate has been inserted. For every n, the highest ﬁdelity values correspond to the insertion of the Z gate in the ﬁnal cycle of the circuit. They have no impact on measurements and thus are equivalent to absence of error. The corresponding ﬁdelity estimates match the estimates for the unmodiﬁed circuits.
The probability of only seeing the error E is approximately q = ep where e is the probability of E arising at the particular circuit location and p is the probability that no other error occurs. The fraction q of executions realize circuit $U_{j,E} \in U_E$ yielding bitstrings $B_E$ while the remaining fraction 1 − q yield bitstrings $B_*$. XEB averages over circuit executions, so
$$F_{XEB}(B, U_E) = q\, F_{XEB}(B_E, U_E) + (1 - q)\, F_{XEB}(B_*, U_E). \tag{81}$$
Since bitstrings $B_E$ originated in a perfect realization of $U_E$ we have $F_{XEB}(B_E, U_E) \approx 1$ with high probability. Also, assuming the circuits randomize the output quantum state sufficiently, we have $F_{XEB}(B_*, U_E) \sim 1/\sqrt{D}$,

[Figure S40 panels: a, ratio of worst to best pair fidelity from 14-cycle parallel XEB repeated over 30 minutes (before moving Q1,7 away from the TLS), shown on the row/column qubit grid; b, Q1,7 with an unstable TLS; c, 51-qubit 14-cycle patch XEB stability before tracking down the unstable TLS, fidelity vs. time (hours).]

FIG. S40. Identifying sources of fluctuations with repetitive per-layer simultaneous pair XEB. a, Per-pair ratio of worst fidelity to best fidelity measured via per-layer simultaneous pair XEB at a depth of 14 cycles over the course of 30 minutes. During this time, fluctuations were dominated by a single TLS. b, Measured qubit $T_1$ vs. $f_{10}$ for Q1,7 at two different times a few minutes apart (red vs. blue points), showing an unstable TLS that was dominating the fluctuations in full system fidelity seen in c. Moving Q1,7 far from this TLS led to the stability seen in Fig. S39.

where $D = 2^n$, see Eq. (25) and Fig. S7. Therefore, for large n

$$F_{XEB}(B, U_E) \approx q + \frac{1 - q}{\sqrt{D}} \approx q \tag{82}$$

with high probability.
Now, the probability p that no error other than E occurs is approximately equal to the experimental fidelity F which is approximated by $F_{XEB}(B, U)$, so

$$F_{XEB}(B, U_E) \approx e\, F_{XEB}(B, U), \tag{83}$$

which means that the XEB result obtained using circuits modified to include E is approximately proportional to the XEB result obtained using the error-free reference circuits. Moreover, the ratio of the two XEB results is approximately equal to the probability of E.
The data in Fig. S42 agree with the approximate proportionality in Eq. (83) and allow us to estimate the median probability of a Pauli error. Based on the drop in XEB fidelity estimate by a factor of almost 100 due to the insertion of one single-qubit Pauli error into the circuit, the probability is on the order of 1%. While more work on the gate failure model needs to be done to correctly relate Sycamore gate error rates to the probability of specific Pauli errors, we already see that e has the same order of magnitude as our per-cycle and per-qubit error given by $e_{2c}/2 \approx 0.5\%$, see Table II. A possible resolution of the factor of two discrepancy may lie in the fact that more than one gate failure can manifest itself as a particular Pauli error E in a particular circuit location.
Lastly, we consider the impact of continuous errors on XEB result. Fig. S43 shows the ﬁdelity estimate obtained from XEB using bitstrings observed in our experiment and quantum circuits modiﬁed to include a single rotation RZ(θ). The middle point of the plot is equal to the ﬁdelity estimate obtained for one of the discrete errors in Fig. S42 whereas the leftmost and rightmost points correspond to the ﬁdelity estimate obtained from XEB using the error-free reference circuit.
The analysis above illustrates how questions about the behavior and performance of quantum processors can be formulated in terms of modiﬁcations to the reference quantum circuits and how XEB can help investigate these questions. While XEB has proven itself a powerful tool for calibration and performance evaluation (see sections VI and VIII), more work is required to assess its eﬃcacy as a diagnostic tool for quantum processors.

| qubits, n | run time in seconds |
| 32        | 111                 |
| 34        | 473                 |
| 36        | 1954                |
| 38        | 8213                |

TABLE VI. Circuit simulation run times using qsim on a single Google cloud node (n1-ultramem-160).

[Figure S41, titled "Statistical and total uncertainty": XEB fidelity (log scale, 10^-3 to 10^-2) vs. number of cycles, m (12 to 20). Legend: linear XEB fidelity; log XEB fidelity.]

FIG. S41. Statistical and total uncertainty of ﬁdelity estimates produced by the linear XEB and logarithmic XEB from ten random quantum circuits with 53 qubits and m cycles. The inner error bars represent the statistical uncertainty discussed in section VIII G. The outer error bars represent the total uncertainty discussed in section VIII H.
X. CLASSICAL SIMULATIONS
A. Local Schrödinger and Schrödinger-Feynman simulators
We have developed two quantum circuit simulators: qsim and qsimh. The first simulator, qsim, is a Schrödinger full state vector simulator. It computes all $2^n$ amplitudes, where n is the number of qubits. Essentially, the simulator performs matrix-vector multiplications repeatedly. One matrix-vector multiplication corresponds to applying one gate. For a 2-qubit gate acting on qubits q1 and q2 (q1 < q2), it can be depicted schematically by the following pseudocode.
#iterate over all values of qubits q > q2
for (int i = 0; i < 2^n; i += 2 * 2^q2) {
  #iterate values for q1 < q < q2
  for (int j = 0; j < 2^q2; j += 2 * 2^q1) {
    #iterate values for q < q1
    for (int k = 0; k < 2^q1; k += 1) {
      #apply gate for fixed values
      #for all q not in [q1,q2]
      int l = i + j + k;

      float v0[4]; #gate input
      float v1[4]; #gate output

      #copy input
      v0[0] = v[l];
      v0[1] = v[l + 2^q1];
      v0[2] = v[l + 2^q2];
      v0[3] = v[l + 2^q1 + 2^q2];

      #apply gate
      for (r = 0; r < 4; r += 1) {
        v1[r] = 0;
        for (s = 0; s < 4; s += 1) {
          v1[r] += U[r][s] * v0[s];
        }
      }

      #copy output
      v[l] = v1[0];
      v[l + 2^q1] = v1[1];
      v[l + 2^q2] = v1[2];
      v[l + 2^q1 + 2^q2] = v1[3];
    }
  }
}
Here U is a 4x4 gate matrix and v is the full state vector. To make the simulator faster, we use gate fusion [67], single precision arithmetic, AVX/FMA instructions for vectorization, and OpenMP for multi-threading. We are able to simulate 38-qubit circuits on a single Google cloud node that has 3844 GB memory and four CPUs with 20 cores each (n1-ultramem-160). The run times for diﬀerent circuit sizes at depth 14 are listed in Table VI.
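The loop structure above can be sketched in runnable form as follows. This is a minimal NumPy illustration, not the qsim implementation: the function names are hypothetical, amplitudes are stored as complex numbers, and no gate fusion, vectorization intrinsics, or multi-threading is attempted. It assumes that qubit q corresponds to bit q of the amplitude index, as implied by the index arithmetic in the pseudocode, and it cross-checks the triple loop against an equivalent tensor contraction.

```python
import numpy as np


def apply_two_qubit_gate_loops(state, gate, q1, q2):
    """Direct transcription of the triple loop; qubit q is bit q of the index."""
    assert q1 < q2
    s1, s2 = 1 << q1, 1 << q2
    out = state.copy()
    for i in range(0, state.size, 2 * s2):      # values of qubits q > q2
        for j in range(0, s2, 2 * s1):          # values of qubits q1 < q < q2
            for k in range(s1):                 # values of qubits q < q1
                l = i + j + k
                idx = [l, l + s1, l + s2, l + s1 + s2]
                out[idx] = gate @ state[idx]    # 4x4 matrix times 4 amplitudes
    return out


def apply_two_qubit_gate_vectorized(state, gate, q1, q2):
    """Same operation written as one contraction over the two qubit axes."""
    n = state.size.bit_length() - 1
    psi = state.reshape((2,) * n)               # axis n-1-q holds qubit q
    a1, a2 = n - 1 - q1, n - 1 - q2
    g = gate.reshape(2, 2, 2, 2)                # [out_q2, out_q1, in_q2, in_q1]
    psi = np.tensordot(g, psi, axes=([2, 3], [a2, a1]))
    psi = np.moveaxis(psi, [0, 1], [a2, a1])    # put output axes back in place
    return psi.reshape(-1)


rng = np.random.default_rng(0)
n = 6
state = rng.normal(size=2**n) + 1j * rng.normal(size=2**n)
state /= np.linalg.norm(state)
# Random unitary from the QR decomposition of a complex Gaussian matrix.
gate, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))

a = apply_two_qubit_gate_loops(state, gate, 1, 4)
b = apply_two_qubit_gate_vectorized(state, gate, 1, 4)
print(np.allclose(a, b), np.isclose(np.linalg.norm(a), 1.0))
```

The loop version mirrors the memory access pattern of the pseudocode, while the contraction version is usually the faster choice in NumPy.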
FIG. S42. Impact of one single-qubit Pauli error on fidelity estimate from XEB. a, Distributions of fidelity estimates from XEB using measured bitstrings and quantum circuits with one bit-flip or one phase-flip error. For each n, shades of blue represent the normalized histogram of the estimates obtained for the error gate placed at different circuit locations. The highest fidelity estimates correspond to phase-flip errors immediately preceding measurement and are equal to the fidelity estimates from XEB using error-free circuits. b, Quartiles of the distributions shown in a (blue) compared to the fidelity estimates from XEB using measured bitstrings and unmodified quantum circuits (red). Both plots use linear scale between $10^{-4}$ and $-10^{-4}$ and logarithmic scale everywhere else. [Panel a: XEB fidelity versus number of qubits, n, with a color scale for the fraction of simulations. Panel b: XEB fidelity versus number of qubits, n; series: reference circuits, circuits with X or Z error (median), circuits with X or Z error (quartiles).]

The second simulator, qsimh, is a hybrid Schrödinger-Feynman algorithm (SFA) simulator [38]. We cut the lattice into two parts and use the Schmidt decomposition for the 2-qubit gates on the cut. If the Schmidt rank of each gate is r and the number of gates on the cut is g then there are $r^g$ paths, corresponding to all the possible choices of Schmidt terms for each 2-qubit gate across the cut. To obtain fidelity equal to unity, we need to simulate all the $r^g$ paths and sum the results. The total run time is proportional to $(2^{n_1} + 2^{n_2}) r^g$, where $n_1$ and $n_2$ are the qubit numbers in the first and second parts. Each part is simulated by qsim using the Schrödinger algorithm. Path simulations are independent of each other and can be trivially parallelized to run on supercomputers or in data centers. Note that one can run simulations with fidelity F < 1 just by summing over a fraction F of all the paths (see Ref. [38] and Sec. X D). In order to speed up the computation, we save a copy of the state after the first p 2-qubit gates across the cut, so the remaining $r^{g-p}$ paths can be computed without re-starting the simulation from the beginning. We call the specific choice of Schmidt terms for the first p gates in the cut a prefix.
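The path counting described for qsimh can be made concrete with a few lines of arithmetic. The sketch below is illustrative only: `sfa_path_counts` is a hypothetical helper, the cost is expressed in arbitrary units proportional to the run-time scaling quoted in the text, and the numbers in the example call are made up rather than taken from the experiment.

```python
def sfa_path_counts(n1, n2, r, g, p=0):
    """Path bookkeeping for a Schrödinger-Feynman simulation.

    n1, n2: qubits in the two parts; r: Schmidt rank per cut gate;
    g: number of gates on the cut; p: number of cut gates fixed by a prefix.
    """
    total_paths = r ** g                    # all choices of Schmidt terms
    num_prefixes = r ** p                   # choices for the first p cut gates
    paths_per_prefix = r ** (g - p)         # paths sharing one saved state
    relative_cost = (2 ** n1 + 2 ** n2) * total_paths
    return total_paths, num_prefixes, paths_per_prefix, relative_cost


# Hypothetical example: two 10-qubit halves, rank-4 gates, 3 gates on the cut.
total, prefixes, per_prefix, cost = sfa_path_counts(10, 10, r=4, g=3, p=1)
print(total, prefixes, per_prefix, cost)

# Keeping only a fraction F of the paths scales the cost by F.
F = 0.25
print(int(F * total), F * cost)
```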
B. Feynman simulator
qFlex was introduced in Ref. [50] and later adapted to GPU architectures in Ref. [68] to allow eﬃcient computation on Summit, currently the world’s Top-1 supercomputer. qFlex is open source and available at https://github.com/ngnrsaa/qflex. Given a random quantum circuit, qFlex computes output bitstring amplitudes by adding all the Feynman path contributions via tensor network (TN) contractions [69, 70], and so it follows what we call a Feynman approach (FA) to circuit sampling. TN simulators are known to outperform all other methods for circuits with low depth or a large number of qubits (e.g., Ref. [68] successfully simulates 121 qubits at low depth using this technique), as well as for small sample sizes (Ns), since simulation cost scales linearly with Ns.
TN simulators compute one amplitude (or a few amplitudes; see below) per contraction of the entire network. In order to sample bitstrings for a given circuit, a set of random output bitstrings is chosen before the computation starts. Then, the amplitudes for these bitstrings are computed and either accepted or rejected using frugal rejection sampling [38]. This ensures that the selected subset of bitstrings is indistinguishable from bitstrings sampled from a quantum computer. The cost of the TN simulation is therefore linear in the number of output bitstrings. This makes TN methods more competitive for small sets of output bitstrings.
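As an illustration of the accept/reject step, the following sketch implements one common formulation of frugal rejection sampling. It rests on assumptions not spelled out in the text above: a uniformly drawn candidate x is accepted with probability min(1, D p(x) / M), where D is the Hilbert space dimension and M is a constant (M = 10 here), and the circuit output probabilities are replaced by a synthetic Porter-Thomas-like vector. In a real simulation only the probabilities of the candidate bitstrings would be computed; the full array `probs` is a stand-in.

```python
import numpy as np


def frugal_rejection_sample(probs, num_candidates, M=10.0, rng=None):
    """Accept uniformly drawn candidates with probability min(1, D*p/M)."""
    rng = np.random.default_rng() if rng is None else rng
    D = probs.size
    candidates = rng.integers(0, D, size=num_candidates)   # uniform bitstrings
    accept_prob = np.minimum(1.0, probs[candidates] * D / M)
    keep = rng.random(num_candidates) < accept_prob
    return candidates[keep]


rng = np.random.default_rng(0)
D = 2 ** 14
probs = rng.exponential(size=D)       # synthetic Porter-Thomas-like weights
probs /= probs.sum()

num_candidates = 200_000
samples = frugal_rejection_sample(probs, num_candidates, rng=rng)

acceptance_rate = samples.size / num_candidates   # close to 1/M
mean_Dp = D * probs[samples].mean()               # close to 2 for exact sampling
print(acceptance_rate, mean_Dp)
```

With this choice of M, roughly ten candidate probabilities are evaluated per accepted bitstring, which is the overhead that the fast sampling technique described later in this section is designed to remove.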
The optimization of qFlex considers a large number of factors to achieve the best time-to-solution on current supercomputers, an approach that often diverges from purely theoretical considerations on the complexity of TN contractions. More precisely, qFlex implements several features such as:
• Avoidance of distributed tensor contractions: by “cutting” the TN (slicing some indexes), the contraction of the TN is decomposed into many paths that can be contracted locally and independently, therefore avoiding internode communication, which is the main cause for the slowdown of distributed tensor contractions.
• Contraction orderings for high arithmetic intensity: TN contraction orderings are chosen so that the expensive part of the computation consists of a small number of tensor contractions with high arithmetic intensity. This lowers the time-to-solution.
• Highly efficient tensor contractions on GPU: the back-end TAL-SH library [71] provides fully asynchronous execution of tensor operations on GPU and fast tensor transposition, allowing out-of-core tensor contractions for instances that exceed GPU memory. This achieves very high efficiency (see Table VII) on high arithmetic intensity contractions.

TABLE VII. Runtimes, efficiency and energy consumption for the simulation of random circuit sampling of $N_s$ bitstrings from Sycamore with fidelity F using qFlex on Summit. Simulations used 4550 nodes out of 4608, which represents about 99% of Summit. Single batches of 64 amplitudes were computed on each MPI task using a socket with three GPUs (two sockets per node); given that one of the 9100 MPI tasks acts as master, 9099 batches of amplitudes were computed. For the circuit with 12 cycles, 144/256 paths for these batches were computed in 1.29 hours, which leads to the sampling of about 1M bitstrings with fidelity F ≈ 0.5% (see Ref. [50] for details on the sampling procedure); runtimes and energy consumption for other sample sizes and fidelities are extrapolated linearly in $N_s$ and F from this run. At 14 cycles, 128/524288 paths were computed in 0.72 hours, which leads to the sampling of about 1M bitstrings with fidelity $2.22 \times 10^{-6}$. In this case, one would need to consider 288101 paths on all 9099 batches in order to sample about 1M (0.5M) bitstrings with fidelity F ≈ 0.5% (1.0%). By extrapolation, we estimate that such computations would take 1625 hours (68 days). For $N_s$ = 3M bitstrings and F ≈ 1.0%, extrapolation gives us an estimated runtime of 1.1 years. Performance is higher for the simulation with 14 cycles, due to higher arithmetic intensity tensor contractions. Power consumption is also larger in this case. Job, MPI, and TAL-SH library initialization and shutdown times, as well as initial and final IO times are not considered in the runtime, but they are in the total energy consumption. *Single precision. **Extrapolated from the simulation with a fractional fidelity.

                                                                    PFlop/s*        efficiency (%)
qubits  cycles  F_XEB (%)      N_s   nodes  runtime         peak   sust.   peak  sust.  power (MW)  energy (MWh)
53      12      0.5            1M    4550   1.29 hours      235.2  111.7   57.4  27.3   5.73        8.21
53      12      1.4            0.5M  4550   1.81 hours**    235.2  111.7   57.4  27.3   5.73        11.2**
53      12      1.4            3M    4550   10.8 hours**    235.2  111.7   57.4  27.3   5.73        62.7**
53      14      2.22 × 10^−6   1M    4550   0.72 hours      347.5  252.3   84.8  61.6   7.25        6.11
53      14      0.5            1M    4550   67.7 days**     347.5  252.3   84.8  61.6   7.25        1.18 × 10^4**
53      14      1.0            0.5M  4550   67.7 days**     347.5  252.3   84.8  61.6   7.25        1.18 × 10^4**
53      14      1.0            3M    4550   1.11 years**    347.5  252.3   84.8  61.6   7.25        7.07 × 10^4**

FIG. S43. Impact of the Rz(θ) error on XEB. Fidelity estimates computed by XEB from measured bitstrings and circuits with n = 20 qubits and m = 14 cycles modified to include Rz(θ) error applied in 10th cycle to one of the qubits as a function of θ (orange dots). Also shown is XEB fidelity computed using the same bitstrings and unmodified circuits (blue solid line) and a simple model which predicts the effect of the error (green dashed line). [Plot: XEB fidelity versus θ from 0 to 2π; legend: reference circuits, circuits with Rz(θ) error, cos²(θ/2).]
In addition, qFlex implements two techniques in order to lower the cost of the simulation:
• Noisy simulation: the cost of a simulation of fidelity F < 1 (F ≈ $5 \times 10^{-3}$ in practice) is lowered by a factor 1/F, i.e., is linear in F [38, 50].
• Fast sampling technique: the overhead in applying the frugal rejection sampling mentioned above is removed by this technique, giving an order of magnitude speedup [50]. This involves the computation of the amplitudes of a few correlated bitstrings (batch) per circuit TN contraction.
As shown in Table VII, qFlex is successful in simulating Sycamore with 12 cycles on Summit, sampling 1M bitstrings with fidelity close to 0.5% in 1.29 hours. At 14 cycles, we perform a partial simulation and extrapolate the simulation time for the sampling of 1M bitstrings with fidelity close to 0.5% using Summit, giving an estimated 68 days to complete the task. Sampling 3M bitstrings at 14 cycles with fidelity close to 1.0% (average experimentally realized fidelity) would take an estimated 1.1 years to complete. Other estimates for different sample sizes and fidelities can be found in Table VII. At 16 cycles and beyond, however, the enormous number of Feynman paths required so that the computation does not exceed the 512 GB of RAM of each Summit node makes the computation impractical.
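The extrapolated figures quoted above can be approximately reproduced with a short calculation. The sketch below assumes that the sampling fidelity is the product of the fraction of paths computed and the ratio of batches to requested bitstrings, and that runtime scales linearly in both the number of samples and the fidelity; the first assumption is an inference that happens to match the numbers in the caption of Table VII, not a formula stated there. Function and variable names are hypothetical.

```python
def sampling_fidelity(paths_done, paths_total, batches, n_samples):
    """Assumed model: (fraction of paths) x (batches per requested bitstring)."""
    return (paths_done / paths_total) * (batches / n_samples)


batches = 9099

# 12 cycles: 144 of 256 paths, about 1M bitstrings.
f12 = sampling_fidelity(144, 256, batches, 1_000_000)

# 14 cycles: 128 of 524288 paths computed in 0.72 hours.
f14 = sampling_fidelity(128, 524288, batches, 1_000_000)

# Linear extrapolation in fidelity to F = 0.5% for 1M bitstrings.
hours_1M_half_percent = 0.72 * 0.005 / f14
days_1M_half_percent = hours_1M_half_percent / 24

# 3M bitstrings at F = 1.0%: three times the samples, twice the fidelity.
years_3M_one_percent = hours_1M_half_percent * 3 * 2 / (24 * 365.25)

print(f"{f12:.4f} {f14:.3e}")
print(f"{days_1M_half_percent:.1f} days, {years_3M_one_percent:.2f} years")
```

The small difference from the 1625 hours quoted in the caption is consistent with the measured runtime being rounded to 0.72 hours.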
FIG. S44. Logarithm base 2 of the bond (index) dimensions of the tensor network to contract for the simulation of sampling from Sycamore with 12 cycles (top) and 14 cycles (bottom) using qFlex. The left plots represent the tensor network given by the circuit. The middle plots represent the tensor network obtained from a circuit where fSim gates have been transformed, when possible (see main text). The right plots represent the tensor network after the gate transformations and cuts (gray bonds) have been applied; the log2 of the bond dimensions of the indexes cut are written explicitly. For 12 cycles, there are $2^5 \times 2^1 \times 2^2 = 2^8 = 256$ cut instances (paths); for 14 cycles, there are $2^7 \times 2^7 \times 2^5 = 2^{19} = 524288$ cut instances. [Panels, left to right: raw circuit; SWAP-cphase transformation; SWAP-cphase transformation & cuts.]

The contraction of the TNs involved in the computation of amplitudes from Sycamore using qFlex is preceded by a simplification of the circuits, which allows us to decrease the bond (index) dimension of some of the indexes of the TN. This comes from the realization that fSim(θ = π/2, φ) = −i · [Rz(−π/2) ⊗ Rz(−π/2)] · cphase(π + φ) · SWAP (see Sections VI and VII E); note that the SWAP gate can be applied either at the beginning or at the end of the sequence. We apply this transformation to all fSim gates at the beginning (end) of the circuit that affect qubits that are not affected by any other two-qubit gate before (after) in the circuit. The SWAP is then applied to the input (output) qubits and their respective one-qubit gates trivially, and the bond dimension remaining from this gate is 2, corresponding to the cphase gate, as opposed to the bond dimension 4 of the original fSim gate. Note that in practice this identity is only approximate, since θ ≈ π/2; we find that transforming all gates described above causes a drop in fidelity to about 95%.
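The gate identity used in this simplification can be checked numerically. The sketch below assumes the following matrix conventions, which are not restated in this section: fSim(θ, φ) has entries cos θ and −i sin θ in the {|01⟩, |10⟩} block and phase e^{−iφ} on |11⟩; Rz(θ) = exp(−iθZ/2); and cphase(x) = diag(1, 1, 1, e^{−ix}). With a different sign convention for cphase the identity would need a correspondingly different angle.

```python
import numpy as np


def fsim(theta, phi):
    """fSim gate under the assumed convention (see lead-in)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0, 0, 0],
                     [0, c, -1j * s, 0],
                     [0, -1j * s, c, 0],
                     [0, 0, 0, np.exp(-1j * phi)]])


def rz(angle):
    """Rz(angle) = exp(-i * angle * Z / 2)."""
    return np.diag([np.exp(-1j * angle / 2), np.exp(1j * angle / 2)])


def cphase(angle):
    """Controlled phase, assumed convention diag(1, 1, 1, exp(-i * angle))."""
    return np.diag([1, 1, 1, np.exp(-1j * angle)])


SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)

phi = np.pi / 6
lhs = fsim(np.pi / 2, phi)
local = np.kron(rz(-np.pi / 2), rz(-np.pi / 2))

# SWAP applied first (rightmost factor) or last (leftmost factor).
rhs_swap_first = -1j * local @ cphase(np.pi + phi) @ SWAP
rhs_swap_last = -1j * SWAP @ local @ cphase(np.pi + phi)

print(np.allclose(lhs, rhs_swap_first), np.allclose(lhs, rhs_swap_last))
```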
After the above simplification is applied, we proceed to cut (slice) some of the indexes of the TN (see Ref. [50] for details). The size of the slice of the index involved in each cut (the effective bond dimension of the index) is variable, and is chosen differently for different numbers of cycles in the circuit. Cutting indexes decomposes the contraction of the TN into several simpler contractions, whose results are summed after computing them independently on different nodes of the supercomputer. Fig. S44 shows the bond dimensions of the TN corresponding to the circuits with 12 and 14 cycles simulated. We can see the decrease in bond dimension after the fSim simplification is applied, as well as the remaining bond dimension on the indexes cut for each case.
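The effect of cutting an index can be seen on a toy network. The sketch below is not qFlex code: it contracts a hypothetical ring of three small random tensors with `numpy.einsum`, once in full and once by fixing the shared index b to each of its values and summing the independent results, which is what slicing amounts to.

```python
import numpy as np

rng = np.random.default_rng(1)
T1 = rng.normal(size=(3, 4))   # indexes (a, b)
T2 = rng.normal(size=(4, 5))   # indexes (b, c)
T3 = rng.normal(size=(5, 3))   # indexes (c, a)

# Full contraction of the closed ring a-b-c-a down to a scalar.
full = np.einsum("ab,bc,ca->", T1, T2, T3)

# Cut index b: each fixed value of b is an independent, smaller contraction
# (a "path") that could run on a different node without communication.
path_values = [
    np.einsum("a,c,ca->", T1[:, b], T2[b, :], T3)
    for b in range(T1.shape[1])
]
sliced = sum(path_values)

print(np.isclose(full, sliced), len(path_values))
```

Summing only some of the entries of `path_values` gives an approximate result, which is the basis of the reduced-fidelity simulations discussed in this section.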
Finally, we contract the tensor network corresponding to the computation of a set of amplitudes (for fast sampling) for a particular batch of output bitstrings. The contraction ordering is shown in Fig. S45. It is chosen, together with the size and position of the cuts, to minimize the time-to-solution of the computation, which involves a careful consideration of the memory resources used and the efficiency achieved on the GPUs. The computation can be summarized in the following pseudo-code, where α, β, and γ are variables that denote the different instances of the cuts:
# Qubits on C are used for fast sampling.
# size_of_batch amps. per circuit contraction.
size_of_batch = 2^num_qubits(C)

# Placeholder for all amplitudes in the batch.
batch_of_amplitudes = zeros(size_of_batch)

# Start contracting...
contract(A) # Panel 2
contract(B) # Panel 2
contract(C) # Panel 2

# alpha labels instances of 1st cut
for each alpha {
  # beta labels instances of 2nd cut
  for each beta {
    contract(D) # Panels 3 & 4
    contract(E) # Panels 5 & 6

    # gamma labels instances of 3rd cut
    for each gamma {
      contract(F) # Panels 7 & 8

      # Add contribution from this
      # path (alpha, beta, gamma).
      batch_of_amplitudes += F
    }
  }
}

FIG. S45. TN contraction ordering for the computation of a batch of amplitudes for the simulation of Sycamore with 12 and 14 cycles. Dotted qubits are used for fast sampling; the output index is left open. Three indexes are cut, with remaining bond dimensions given in Fig. S44, and all possible cut instances are labelled by variables α, β, and γ (panel 1). Tensors A, B, and C are independent of cut instances, and so are contracted only once (panels 2 and 3) and reused several times. Given a particular instance of α and β, tensors D (panels 3 and 4) and subsequently E (panels 5 and 6) are contracted; tensor E will be reused in the inner loop. For each instance of γ (inner loop), tensor F is contracted (panels 7 and 8), which gives the contribution to the batch of amplitudes (open indexes on C and specified output bits otherwise) from a particular (α, β, γ) instance (path). The sequence of tensor contractions leading to building a tensor are enumerated, where each tensor is contracted to the one formed previously. For simplicity, the contraction of two single-qubit tensors onto a pair before being contracted with others (e.g., tensor 10 in the yellow sequence of panel 5) is not shown on a separate panel; these pairs of tensors are computed first and are reused for all cut instances.
Dotted qubits on Fig. S45 denote the region used for fast sampling, where output indexes are left open. The circuit TN contraction leads to the computation of 64 amplitudes of correlated bitstrings (tensor F). Note that computing only a fraction F of the paths results in amplitudes with a fidelity roughly equal to F. Computing a set of perfect fidelity batches of amplitudes, where the number of batches is smaller than the number of bitstrings to sample, also provides a similar fidelity F in the sampling task, where F is equal to the ratio of the number of batches to the number of bitstrings in the sample. A hybrid approach (fraction of batches, each only with a fraction of paths), which we use in practice, also provides a similar sampling fidelity. See Refs. [38, 50] and Section X A for more details.
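The statement that a fraction F of the paths yields a fidelity of roughly F can be illustrated with a toy model. The sketch below assumes that each path contributes an independent random complex vector, so that different contributions are nearly orthogonal in a large space; this is a stand-in for the orthogonality assumption discussed in Sec. X D, not a simulation of an actual circuit, and all sizes are arbitrary.

```python
import numpy as np


def state_fidelity(a, b):
    """Squared overlap |<a|b>|^2 between two unnormalized vectors."""
    overlap = np.vdot(a, b)
    return abs(overlap) ** 2 / (np.vdot(a, a).real * np.vdot(b, b).real)


rng = np.random.default_rng(2)
num_paths, dim = 64, 2 ** 14

# One random complex vector per path (toy path contributions).
paths = rng.normal(size=(num_paths, dim)) + 1j * rng.normal(size=(num_paths, dim))

full_state = paths.sum(axis=0)          # all paths: fidelity 1 by definition
kept = 16                               # keep a fraction 16/64 = 0.25 of paths
partial_state = paths[:kept].sum(axis=0)

fidelity = state_fidelity(full_state, partial_state)
print(fidelity, kept / num_paths)
```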
A new feature of qFlex, implemented for this work, is the possibility to perform out-of-core tensor contractions (of tensors that exceed GPU memory) over more than one GPU on the same node. Although the arithmetic intensity requirements to achieve high efficiency are now higher (about an arithmetic intensity of 3000 for an efficiency close to 90% over three GPUs, as opposed to 1000 for a similar efficiency using a single GPU), the fact that a large part of a node is performing a single TN contraction lets us work with larger tensors, which implies reducing the number of cuts, as well as increasing the bond dimension of each cut; this, in turn, achieves better overall time-to-solution for sampling than simulations based on TNs with smaller tensors and with a lower memory footprint during their contraction (which could perhaps show a higher GPU efficiency due to the simultaneous use of each GPU for independent TNs). It is worth noting that the TN contraction ordering presented in Fig. S45 provides us with the best time-to-solution after considering several possibilities for the simulation of sampling from Sycamore using qFlex for both 12 and 14 cycles. This is generally not the case, since different numbers of cycles generate different TNs, which generally have different contraction schemes for best simulation time-to-solution.
Sampling of random circuits on Sycamore is difficult to simulate with TN simulators at 16 cycles and beyond. Indeed, FA simulators suffer from an exponential scaling of runtime with circuit depth. For qFlex, this is manifested in the large size of the tensors involved in the circuit TN contraction (this size grows exponentially with the number of cycles of the circuit), which requires a large number of cuts in order not to exceed the RAM of a computation node, which in turn generates an impractical number of Feynman paths. For other simulators, such as the one presented in Ref. [72], the number of projected variables is expected to be so large that the computation time (which increases exponentially with the number of projected variables) on a state-of-the-art supercomputer makes the computation impractical; see Section X E for a detailed analysis. For TN-based simulators that attempt the circuit contraction distributed over several nodes (without cuts) [73], we expect the size of the largest tensor encountered during the TN contraction (which grows exponentially with depth) to exceed the RAM available on any current supercomputer. Not having enough memory for a simulation is the problem that led to developing FA simulators in the first place, for circuits of close to 50 qubits and beyond, for which the Schrödinger simulator (see Section X C) requires more memory to store the wave function than is available. FA simulators give the best performance compared to other methods in situations with a large number of qubits and low depth.
For circuits where both the number of qubits and the number of cycles are considered large enough to make the computation expensive, and contribute equally in doing so (formally, each linear dimension of the qubit grid is comparable to the time dimension), like the supremacy circuits considered in this work, we expect SFA of Section X A to be the leading approach for sampling from a random circuit, given a large enough sample size (∼ 1M in this work); note the linear dependence of the runtime of FA with sample size, which is absent for SFA.
C. Supercomputer Schrödinger simulator
We also performed supercomputer Schrödinger simulations in the Jülich Supercomputing Centre. For a comprehensive description of the universal quantum computer simulators JUQCS-E and JUQCS-A, see Refs. [74] and [75].
For a given quantum circuit U designed to generate a random state, JUQCS-E [75] executes U and computes (in double precision floating point) the probability distribution $p_U(j)$ for each output or bitstring $j \in \{0, \ldots, D-1\}$, where $D = 2^n$, n denoting the number of qubits. JUQCS-E can also compute (in double precision floating point) the corresponding distribution function $P_U(k) = \sum_{j=0}^{k} p_U(j)$ and sample bitstrings from it. We denote by $\mathcal{U}$ the set of M states generated by executing the circuit U. A new feature of JUQCS-E, not documented in Ref. [75], allows the user to specify a set $\mathcal{Q}$ of M bitstrings for which JUQCS-E calculates $p_U(j)$ for all $j \in \mathcal{Q}$ and saves them in a file.

Similarly, for the same circuit U, JUQCS-A [75] computes (with adaptive two-byte encoding) the probability distribution $p_A(j)$ for each bitstring $j \in \{0, \ldots, D-1\}$. Although numerical experiments with Shor's algorithm for up to 48 qubits indicate that the results produced by JUQCS-A are sufficiently accurate, there is, in general, no guarantee that $p_A(j) \approx p_U(j)$. In this sense, JUQCS-A can be viewed as an approximate simulator of a quantum computing device.

In principle, sampling states with probabilities $p_A(j)$ requires the knowledge of the distribution function $P_A(k) = \sum_{j=0}^{k} p_A(j)$. If D is large, and $p_A(j) \approx O(1/D)$, as in the case of random states, computing $P_A(k)$ requires the sum over j to be performed with sufficiently high precision. For instance, if $D = 2^{39}$, $p_A(j) \approx O(10^{-12})$ and even with double precision arithmetic (≈ 16 digits), adding $D = 2^{39}$ small numbers requires some care. Note that in practice, each MPI process only calculates a partial sum, which helps to reduce the loss of significant digits. JUQCS-A can compute $P_A(k)$ in double precision and sample bitstrings from it. We denote by $\mathcal{A}$ the set of M bitstrings generated by JUQCS-A after executing the circuit U. Activating this feature requires additional memory, effectively reducing the maximum number of qubits that can be simulated by three. This reduction of the maximum number of qubits might be avoided as follows. In the case at hand, we know that all $p_A(j) \approx O(1/D)$. Then, since $p_A(j)$ is known, one might as well sample the states from a uniform distribution, list the weight $w_A(j) = N p_A(j)$ for each generated state j and use these weights to compute averages. We do not pursue this possibility here because for the present purpose, it is essential to be able to compute $p_U(j)$ and therefore, the maximum number of qubits that can be studied is limited by the amount of memory that JUQCS-E, not JUQCS-A, needs to perform the simulation.
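Sampling from a cumulative distribution function built from partial sums can be sketched as follows. This is an illustration in NumPy and not JUQCS code: the function name and the `num_chunks` parameter are hypothetical, and the chunks stand in for the per-process partial sums mentioned above. A uniform random number u is mapped to the first index k whose cumulative probability exceeds u.

```python
import numpy as np


def sample_from_probabilities(probs, num_samples, rng, num_chunks=8):
    """Inverse-CDF sampling with the CDF assembled from chunked partial sums."""
    chunks = np.array_split(probs, num_chunks)
    # Each chunk is summed separately, then offsets are accumulated, which
    # keeps the individual running sums short.
    chunk_totals = np.array([c.sum() for c in chunks])
    offsets = np.concatenate(([0.0], np.cumsum(chunk_totals)[:-1]))
    cdf = np.concatenate([off + np.cumsum(c) for off, c in zip(offsets, chunks)])

    u = rng.random(num_samples) * cdf[-1]         # uniform in [0, total)
    idx = np.searchsorted(cdf, u, side="right")   # first k with cdf[k] > u
    return np.minimum(idx, probs.size - 1)        # guard against rounding


rng = np.random.default_rng(0)
probs = np.array([0.1, 0.2, 0.3, 0.4])
samples = sample_from_probabilities(probs, 100_000, rng, num_chunks=2)
freq = np.bincount(samples, minlength=probs.size) / samples.size
print(freq)
```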


For an XEB comparison, the quantities of interest are

$\alpha_{U,U} \equiv \log D + \gamma + \sum_{j=0}^{D-1} p_U(j) \log p_U(j)$, (84)

$\alpha_{A,U} \equiv \log D + \gamma + \sum_{j=0}^{D-1} p_A(j) \log p_U(j)$, (85)

$\alpha_{A,A} \equiv \log D + \gamma + \sum_{j=0}^{D-1} p_A(j) \log p_A(j)$, (86)

$\alpha_{\mathcal{X},U} \equiv \log D + \gamma + \frac{1}{M} \sum_{j \in \mathcal{X}} \log p_U(j)$, (87)

where $\mathcal{X}$ is one of the four sets $\mathcal{U}$, $\mathcal{A}$, $\mathcal{M}$ (a collection of bitstrings generated by the experiment), or $\mathcal{C}$ (obtained by generating bitstrings distributed uniformly). If M is sufficiently large (M = 500000 in the case at hand), we may expect that $\alpha_{\mathcal{U},U} \approx \alpha_{U,U}$ and $\alpha_{\mathcal{A},U} \approx \alpha_{A,U}$.
In addition to the cross entropies Eqs. (84)–(87), we also compute the linear cross entropies

$\overline{\alpha}_{U,U} \equiv \sum_{j=0}^{D-1} p_U(j) (D p_U(j) - 1)$, (88)

$\overline{\alpha}_{A,U} \equiv \sum_{j=0}^{D-1} p_A(j) (D p_U(j) - 1)$, (89)

$\overline{\alpha}_{A,A} \equiv \sum_{j=0}^{D-1} p_A(j) (D p_A(j) - 1)$, (90)

$\overline{\alpha}_{\mathcal{X},U} \equiv \frac{1}{M} \sum_{j \in \mathcal{X}} (D p_U(j) - 1)$. (91)

Table VIII presents simulation results for the $\alpha$'s defined by Eqs. (84)–(87) and for the $\overline{\alpha}$'s defined by Eqs. (88)–(91), obtained by running JUQCS-E and JUQCS-A on the supercomputers at the Jülich Supercomputing Centre. For testing quantum supremacy using these machines, the maximum number of qubits that a universal quantum computer simulator can handle is 43 (45 on the Sunway TaihuLight at Wuxi China [75]).
The fact that in all cases, $\alpha_{U,U} \approx \alpha_{A,A} \approx 1$ supports the hypothesis that the circuit U, executed by either JUQCS-E or JUQCS-A, produces a Porter-Thomas distribution. The fact that in all cases, $\alpha_{\mathcal{U},U} \approx 1$ supports the theoretical result that replacing the sum over all states by the sum over M = 500000 states yields an accurate estimate of the former (see Section IV). Although $\alpha_{A,A} \approx 1$ in all cases, using the sample $\mathcal{A}$ generated by JUQCS-A to compute $\alpha_{\mathcal{A},U}$ shows an increasing deviation from one, the deviation becoming larger as the number of qubits increases. In combination with the observation that $\alpha_{A,A} \approx 1$, this suggests that JUQCS-A produces a random state, albeit not the same state as JUQCS-E. Taking into account that JUQCS-A stores the coefficients of each of the basis states as two single-byte numbers and not as two double precision floating point numbers (as JUQCS-E does), this is hardly a surprise.

From Table VIII it is clear that the simulation results for $\alpha_{\mathcal{X},U}$ and $\overline{\alpha}_{\mathcal{X},U}$ where $\mathcal{X} = \mathcal{A}, \mathcal{M}, \mathcal{C}$ are consistent. The full XEB fidelity estimates $\alpha_{\mathcal{M},U}$ and $\overline{\alpha}_{\mathcal{M},U}$, that is, the values computed with the bitstrings produced by the experiment, are close to the fidelity estimates of the probabilistic model, patch XEB, and elided XEB, as seen in Fig. 4(a) of the main text.
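The logarithmic and linear cross entropies of Eqs. (84)–(91) are straightforward to evaluate once the reference probabilities are available. The sketch below is a toy check under stated assumptions: the reference distribution is a synthetic Porter-Thomas-like vector rather than the output of a simulated circuit, γ is taken to be Euler's constant, and the function names are hypothetical. For such a distribution one expects values near 1 when bitstrings are drawn from the distribution itself and near 0 when they are drawn uniformly.

```python
import numpy as np


def alpha_log_full(p_from, p_ref):
    """Logarithmic cross entropy with a sum over all D bitstrings."""
    D = p_ref.size
    return np.log(D) + np.euler_gamma + np.sum(p_from * np.log(p_ref))


def alpha_lin_full(p_from, p_ref):
    """Linear cross entropy with a sum over all D bitstrings."""
    D = p_ref.size
    return np.sum(p_from * (D * p_ref - 1.0))


def alpha_log_sampled(samples, p_ref):
    """Logarithmic cross entropy estimated from a set of sampled bitstrings."""
    D = p_ref.size
    return np.log(D) + np.euler_gamma + np.mean(np.log(p_ref[samples]))


def alpha_lin_sampled(samples, p_ref):
    """Linear cross entropy estimated from a set of sampled bitstrings."""
    D = p_ref.size
    return np.mean(D * p_ref[samples] - 1.0)


rng = np.random.default_rng(3)
D = 2 ** 16
p_U = rng.exponential(size=D)        # synthetic Porter-Thomas-like weights
p_U /= p_U.sum()

M = 200_000
from_circuit = rng.choice(D, size=M, p=p_U)   # bitstrings drawn from p_U
uniform = rng.integers(0, D, size=M)          # uniformly random bitstrings

print(alpha_log_full(p_U, p_U), alpha_lin_full(p_U, p_U))
print(alpha_log_sampled(from_circuit, p_U), alpha_lin_sampled(from_circuit, p_U))
print(alpha_log_sampled(uniform, p_U), alpha_lin_sampled(uniform, p_U))
```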
For reference, in Tables IX and X we present some technical information about the supercomputer systems used to perform the simulations reported in this appendix and give some indication of the computer resources used.
D. Simulation of random circuit sampling with a target fidelity
A classical simulator can leverage the fact that experimental sampling from random circuits occurs at low fidelity $F_{\mathrm{XEB}}$ by considering only a small fraction of the Feynman paths (see Secs. X A and X B) involved in the simulation [38], which provides speedups of at least a factor of $1/F_{\mathrm{XEB}}$. This is done by Schmidt decomposing a few two-qubit gates in the circuit and counting only a fraction of their contributing terms (paths). A key assumption here is that the different paths result in orthogonal output states, as was studied in Ref. [38] and later in Ref. [50]. In what follows, we argue that, given that paths are generated by decomposing gates, the Schmidt decomposition is indeed the optimal approach to achieving the largest speedup, i.e., that the fidelity kept by considering only a fraction of paths is largest when keeping the paths with the largest Schmidt coefficient. This is different from proving the optimality of the Schmidt decomposition of a single gate, since here we refer to the fidelity of the entire output state, and decomposed gates are embedded in a much larger circuit. In addition, we show that, for the two-qubit gates used in this work, the speedup is very close to linear in $F_{\mathrm{XEB}}$ (and not much larger), since their Schmidt spectrum is close to flat. We close this section by relating the present discussion to Section VII G 2, where the formation of simplifiable gate patterns in some two-qubit gate tilings of the circuit is introduced.
In summary, this section provides a method to simulate approximate sampling with a classical computational cost proportional to $F_{\mathrm{XEB}}$. Sec. XI argues, based on complexity theory, that this scaling is optimal. We note that Refs. [78–80] propose an alternative method to approximately sample the output distribution at low fidelity. In essence, this method relies on the observation that, for some noise models, the high weight Fourier components of the noisy output distribution decay exponentially to 0. Then this method proposes to estimate low weight Fourier components with an additive error which is polynomial in the computational cost. Nevertheless, Ref. [81] shows that all Fourier components of the output distribution of random circuits are exponentially small, and therefore they cannot be estimated in polynomial time with this method. The conclusion is then that the noisy output distribution can be approximated by sampling bitstrings uniformly at random, the distribution for which all Fourier components are 0. This is consistent with Ref. [27] and Secs. IV and VIII E, but it will produce a sample with $F_{\mathrm{XEB}} = 0$, while the output of the experimental samples at 53 qubits and m = 20 still has $F_{\mathrm{XEB}} \geq 0.1\%$.

TABLE VIII. Simulation results for various $\alpha$'s as defined by Eqs. (84)–(87), obtained by JUQCS-E and JUQCS-A. The results for the linear cross entropies $\overline{\alpha}$ defined by Eqs. (88)–(91) are given in parentheses. The set of bitstrings $\mathcal{M}$ has been obtained from experiments. In the first column, the number in parentheses is the circuit identification number. Horizontal lines (---) indicate that data is not available (and would require additional simulation runs to obtain it).

qubits  $\alpha_{U,U}$  $\alpha_{A,A}$  $\alpha_{\mathcal{U},U}$  $\alpha_{\mathcal{A},U}$ ($\overline{\alpha}_{\mathcal{A},U}$)  $\alpha_{\mathcal{M},U}$ ($\overline{\alpha}_{\mathcal{M},U}$)  $\alpha_{\mathcal{C},U}$ ($\overline{\alpha}_{\mathcal{C},U}$)
30      1.0000  1.0000  0.9997  0.8824 (0.8826)  0.0708 (0.0711)  +0.0026 (+0.0017)
39(0)   1.0000  1.0000  0.9992  0.4746 (0.4762)  0.0281 (0.0261)  −0.0003 (−0.0011)
39(1)   1.0000  1.0000  1.0002  --- (---)        0.0350 (0.0362)  --- (---)
39(2)   1.0000  1.0000  0.9996  --- (---)        0.0351 (0.0332)  --- (---)
39(3)   1.0000  1.0000  0.9999  --- (---)        0.0375 (0.0355)  --- (---)
42(0)   1.0000  1.0001  0.9998  0.4264 (0.4268)  0.0287 (0.0258)  −0.0024 (−0.0001)
42(1)   1.0000  1.0000  1.0027  --- (---)        0.0254 (0.0273)  --- (---)
43(0)   1.0000  1.0001  1.0013  0.3807 (0.3784)  0.0182 (0.0177)  −0.0010 (−0.0003)
43(1)   1.0000  1.0000  ---     --- (---)        0.0217 (0.0204)  --- (---)

TABLE IX. Specification of the computer systems at the Jülich Supercomputing Centre used to perform all simulations reported in this appendix. The row “maximum # qubits” gives the maximum number of qubits n that JUQCS-E (JUQCS-A) can simulate on a specific computer.

Supercomputer                 JURECA-CLUSTER [76]            JURECA-BOOSTER [76]                    JUWELS [77]
CPU                           Intel Xeon E5-2680 v3 Haswell  Intel Xeon Phi 7250-F Knights Landing  Dual Intel Xeon Platinum 8168
Peak performance              1.8 PFlop/s                    5 PFlop/s                              10.4 PFlop/s
Clock frequency               2.5 GHz                        1.4 GHz                                2.7 GHz
Memory/node                   128 GB                         96 GB + 16 GB (MCDRAM)                 96 GB
# cores/node                  2 × 12                         64                                     2 × 24
# threads/core used           1                              1                                      3
maximum # nodes used          256                            512                                    2048
maximum # MPI processes used  4096                           32768                                  32768
maximum # qubits              40 (43)                        41 (44)                                43 (46)

1. Optimality of the Schmidt decomposition for gates embedded in a random circuit

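Consider a two-qubit gate Vab acting on qubits a and b. We would like to replace it by a tensor product operator Ma ⊗ Nb. The final state of the ideal circuit is

$$ |\psi\rangle := U_2 V_{ab} U_1 |0^n\rangle, \tag{92} $$

where U1 (U2) is a unitary composed of all the gates applied before (after) Vab. The final normalized state of the circuit with the replacement by Ma ⊗ Nb is

$$ |\phi_{M,N}\rangle := \frac{U_2 (M_a \otimes N_b) U_1 |0^n\rangle}{\left\| U_2 (M_a \otimes N_b) U_1 |0^n\rangle \right\|}. \tag{93} $$

We would like to find M, N which maximize the fidelity of the two states, given by

$$ \langle\psi|\phi_{M,N}\rangle = \frac{\langle 0^n| U_1^\dagger V_{ab}^\dagger |\beta\rangle}{\sqrt{\langle\beta|\beta\rangle}}, \tag{94} $$

where

$$ |\beta\rangle \equiv (M_a \otimes N_b) U_1 |0^n\rangle. \tag{95} $$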

As the overlap is invariant if we multiply (Ma ⊗ Nb) by a constant, we ﬁx the normalization tr[(Ma ⊗ Nb)†(Ma ⊗ Nb)] = 1.
We now make the assumption that the circuit is random (or sufficiently scrambling) and that Vab is a gate placed sufficiently in the middle of the computation that the reduced density matrix of qubits a and b of $U_1|0^n\rangle$ shows maximal mixing between the two. In more detail, let

$$ \varepsilon := \left\| \mathrm{tr}_{\setminus(a,b)}\!\left( U_1 |0^n\rangle\langle 0^n| U_1^\dagger \right) - \frac{I}{4} \right\|_2, \tag{96} $$

with $\|X\|_2 := \mathrm{tr}(X^\dagger X)^{1/2}$ the Hilbert-Schmidt norm and $\mathrm{tr}_{\setminus(a,b)}$ the partial trace over all qubits except a and b.
TABLE X. Representative elapsed times and number of MPI processes used to perform simulations with JUQCS-E and JUQCS-A on the supercomputer indicated. Note that the elapsed times may fluctuate significantly depending on the load of the machine/network.

                 JUQCS-E                                        JUQCS-A
qubits   gates   Supercomputer   MPI processes   Elapsed time   Supercomputer   MPI processes   Elapsed time
30       614     BOOSTER         128             0:02:28        CLUSTER         128             0:05:23
39       802     CLUSTER         4096            0:42:51        CLUSTER         4096            1:38:42
42       864     JUWELS          16384           0:51:16        JUWELS          8192            2:15:48
43       886     JUWELS          32768           1:01:53        JUWELS          32768           1:32:19

Using Eq. (96) and Eq. (94), we find

$$ \begin{aligned} \langle\psi|\phi_{M,N}\rangle &= \mathrm{tr}\!\left( \mathrm{tr}_{\setminus(a,b)}\!\left( U_1 |0^n\rangle\langle 0^n| U_1^\dagger \right) V_{ab}^\dagger (M_a \otimes N_b) \right) \\ &= \frac{1}{4}\, \mathrm{tr}\!\left[ V_{ab}^\dagger (M_a \otimes N_b) \right] \pm \|M_a \otimes N_b\|_2 \, \|V_{ab}\|_2 \, \varepsilon. \end{aligned} \tag{97} $$

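As $\|M_a \otimes N_b\|_2 = 1$ and $\|V_{ab}\|_2 = 2$, we find

$$ \langle\psi|\phi_{M,N}\rangle = \frac{1}{4}\, \mathrm{tr}\!\left[ V_{ab}^\dagger (M_a \otimes N_b) \right] \pm 2\varepsilon. \tag{98} $$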

Refs. [82, 83] proved that for a random circuit U1 of depth D in one dimension, $\varepsilon \le (4/5)^D$. In two dimensions we expect ε to go to zero even faster with depth, so we can ignore the second term of Eq. (98) for sufficiently large depth.
We now want to find Ma, Nb which are optimal for

$$ \max_{M_a, N_b \,:\, \|M_a\|_2 = \|N_b\|_2 = 1} \mathrm{tr}\!\left[ V_{ab}^\dagger (M_a \otimes N_b) \right]. \tag{99} $$

At this point, we have reduced the problem to ﬁnding the optimal decomposition of the gate as a standalone operator.
Consider the operator Schmidt decomposition of Vab:

$$ V_{ab} = \sum_i \lambda_i R_{a,i} \otimes S_{b,i}, \tag{100} $$

where $R_{a,i}$ ($S_{b,i}$) are orthonormal sets of operators in the Hilbert-Schmidt inner product, i.e., $\mathrm{tr}(R_{a,i}^\dagger R_{a,j}) = \mathrm{tr}(S_{b,i}^\dagger S_{b,j}) = \delta_{ij}$. The Schmidt singular values λ1 ≥ λ2 ≥ . . . are in decreasing order. Then it follows that the solution of Eq. (99) is λ1, with optimal solution $M_a = R_{a,1}$ and $N_b = S_{b,1}$. Indeed we can write Eq. (99) as

$$ \max_{|x\rangle, |y\rangle} \langle x|V|y\rangle, \tag{101} $$

where the maximum is over all unit vectors $|x\rangle$, $|y\rangle$ in $(\mathbb{C}^2)^{\otimes 2}$ and V is the matrix

$$ V := \sum_i \lambda_i (R_{a,i} \otimes I) |\Phi\rangle\langle\Phi| (S_{b,i}^\dagger \otimes I) \tag{102} $$

with $|\Phi\rangle = \sum_i |i\rangle \otimes |i\rangle$. This can be verified using the fact that any unit vector $|x\rangle$ in $(\mathbb{C}^d)^{\otimes 2}$ can be written as $|x\rangle = (L \otimes I)|\Phi\rangle$ for a matrix L acting on $\mathbb{C}^d$ s.t. $\|L\|_2 = 1$. The result follows by noting that λi are the singular values of V.

The argument above easily generalizes to the problem of finding the optimal operator of Schmidt rank k for replacing the unitary gate. In that case the optimal choice is $\sum_{i=1}^{k} \lambda_i R_{a,i} \otimes S_{b,i}$.

[Figure S46: histogram; horizontal axis |δθ| (0.00 to 0.14), vertical axis p(|δθ|) (0 to 14).]
FIG. S46. Probability distribution of the deviations |δθ| from θ ≈ π/2 for fSim gates. The magnitude of δθ is directly related to the runtime speedup low fidelity classical sampling can take from exploiting the existence of paths with large Schmidt coefficients. In practice, |δθ| ≈ 0.05 radians on average, which imposes a bound of less than an order of magnitude on this potential speedup for the circuits, gates, and simulation techniques considered in this work.

[Figure S47: two panels showing the speedup (logarithmic axis, 10^0 to 10^3) as a function of |δθ| (0.00 to 0.20). Left panel: g = 35, curves for F = 0.001, 0.002, 0.004, 0.006, 0.01, 0.014. Right panel: F = 0.001, curves for g = 10, 15, 20, 25, 30, 35.]
FIG. S47. Classical speedup given by the imbalance in the Schmidt coefficients of the gates decomposed. The speedup is computed by comparison with the case where θ = π/2 exactly. The classical simulation has a target fidelity F, and g fSim gates are decomposed. For simplicity, we assume θ = π/2 + δθ is the same for all gates, as well as φ = π/6. Left: speedup at different target fidelities for fixed g = 35. Note that the speedup decreases with F; this is due to the fact that at very low fidelity, considering a few paths with very high weight might be enough to achieve the target fidelity, while for larger values of F, paths with a smaller weight have to be considered, and so a larger number of them is needed per fractional fidelity increase. Right: speedup for fixed fidelity F = 0.001 for different values of g. As expected, the speedup is greater as g increases, since the weight of the highest contributing paths increases exponentially with g. The largest speedup is achieved at large g and small F. For g = 35 and F ≥ 0.001, we find speedups well below an order of magnitude, given that |δθ| ≈ 0.05 radians in practice (shaded area); this case is representative of our simulation of Sycamore with m = 20 (see Section X A) targeting the fidelity measured experimentally.

2. Classical speedup for imbalanced gates
We now want to analyze the Schmidt spectrum of the two-qubit gates used in this work. The fSim(θ, φ) gate is introduced in Section VII E. This gate, which is presented in matrix form in Eq. (53), has the following Schmidt singular values:

$$ \lambda_1 = \sqrt{1 + 2\,|\cos(\phi/2)\cos\theta| + \cos^2\theta}, \tag{103} $$
$$ \lambda_2 = \lambda_3 = \sin\theta, \tag{104} $$
$$ \lambda_4 = \sqrt{1 - 2\,|\cos(\phi/2)\cos\theta| + \cos^2\theta}, \tag{105} $$

where normalization is chosen so that $\sum_{i=1}^{4} \lambda_i^2 = 4$. In practice, we have θ ≈ π/2 and φ ≈ π/6, and so we obtain λi ≈ 1, ∀i ∈ {1, 2, 3, 4}, which gives a flat spectrum.
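As a quick numerical illustration of Eqs. (103)–(105), the following minimal sketch evaluates the Schmidt singular values for a flat gate (θ = π/2) and for a gate with δθ = 0.05. The function name and the sample angles are illustrative choices, not part of the simulation codes used in this work.

```python
import math

def fsim_schmidt_values(theta, phi):
    """Schmidt singular values of fSim(theta, phi), following Eqs. (103)-(105)."""
    x = abs(math.cos(phi / 2) * math.cos(theta))
    c2 = math.cos(theta) ** 2
    lam1 = math.sqrt(1 + 2 * x + c2)
    # Eq. (104) gives sin(theta); abs() only guards angles outside (0, pi).
    lam23 = abs(math.sin(theta))
    # max() guards against a tiny negative argument caused by rounding.
    lam4 = math.sqrt(max(0.0, 1 - 2 * x + c2))
    return [lam1, lam23, lam23, lam4]

flat = fsim_schmidt_values(math.pi / 2, math.pi / 6)
tilted = fsim_schmidt_values(math.pi / 2 + 0.05, math.pi / 6)
print("theta = pi/2       :", [round(v, 4) for v in flat])
print("theta = pi/2 + 0.05:", [round(v, 4) for v in tilted])
print("sum of squares     :", sum(v * v for v in tilted))
```

The sum of squares stays equal to 4 for any angles, which is the normalization stated above; only the balance between the four values changes with δθ.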

In the case that θ = π/2 ± δθ, the spectrum becomes imbalanced, as expected. When considering the decomposition of a number g of fSim(π/2 ± δθ, φ ≈ π/6) gates, the set of weights of all paths is equal to the outer product of all sets of Schmidt coefficients (one per gate). Achieving a fidelity FXEB > 0 implies (in the optimal case) including the largest contributing paths, and so the advantage one can get from this is upper bounded by the magnitude of the largest weight, which is equal to $\prod_{\alpha=1}^{g} \lambda_{\alpha,\max}^2$, where α labels the gates decomposed and $\lambda_{\alpha,\max}$ is the largest Schmidt coefficient for gate α. In practice, |δθ| has values of around 0.05 radians (see Fig. S46). The geometric mean of $\lambda_{\max}$ is about 1.047, which gives an upper bound of $1.047^{2g}$ to the speedup discussed here. For the largest value of g considered in this work, i.e., the decomposition of g = 35 gates using the SFA simulator (Section X A) on a circuit of m = 20 cycles, we obtain a value of $1.047^{2 \times 35} = 25.4$. Note that the speedup obtained in practice (as compared to runtimes over circuits with perfectly flat gate Schmidt decompositions) for fidelities of the order of 0.1% and larger is expected to be far smaller than this value, given that one has to consider a large number of paths, from which only an exponentially small number will have a weight close to 25.4.

We can get a better estimate for the speedup achieved in practice, beyond the upper bound of about a factor of 25 that decomposing g = 35 gates with typical parameters would give. For simplicity, let us assume that all g gates have the same values of θ and φ. Then the weight of each path arising from this decomposition can be written as $W_i = W_{(a,b,c)} = \lambda_1^{2a} \lambda_2^{2b} \lambda_4^{2c}$, where a + b + c = g and a, b, and c count the gates that contribute λ1, λ2 = λ3, and λ4, respectively. The number of paths for each choice of (a, b, c) is equal to $\#(a, b, c) = \sum_{k=0}^{b} \mathrm{multinomial}(a, b-k, k, c) = 2^b \times \mathrm{multinomial}(a, b, c)$. After sorting all $4^g$ weights (and paths) by decreasing value, given a target fidelity, F, one now has to consider the first S paths (i.e., those with the largest weight), up to the point where the sum of their weights $\sum_{i=1}^{S} W_i / 4^g$ matches the target fidelity. The normalization factor $4^g$ guarantees that if one were to consider all paths, the fidelity would be unity, as expected. Compared to the case where we consider a number $F \times 4^g$ of paths, as for a flat Schmidt spectrum, this provides a speedup equal to $(F \times 4^g)/S$. We show the speedup achieved this way in Fig. S47. For the case where we would achieve the largest speedup in the simulations considered in this work, namely the simulation of Sycamore at m = 20 cycles and a fidelity F ≈ 0.2% with g = 35 gates decomposed (see Section X F), we estimate that the speedup obtained this way would be well below an order of magnitude, since |δθ| typically takes values of about 0.05 radians.
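The counting argument above can be sketched in a few lines. This is an illustrative reimplementation under the stated simplification (all g gates share the same θ and φ), not the code used to produce Fig. S47; the function names are hypothetical. Paths are grouped into classes (a, b, c), sorted by weight, and accumulated until the normalized weight reaches the target fidelity F.

```python
import math

def schmidt_weights(theta, phi):
    """Squared Schmidt values (lambda_1^2, lambda_2^2 = lambda_3^2, lambda_4^2)."""
    x = abs(math.cos(phi / 2) * math.cos(theta))
    c2 = math.cos(theta) ** 2
    return 1 + 2 * x + c2, math.sin(theta) ** 2, 1 - 2 * x + c2

def path_speedup(g, delta_theta, fidelity, phi=math.pi / 6):
    """Speedup F * 4^g / S obtained by keeping only the S heaviest paths."""
    w1, w2, w4 = schmidt_weights(math.pi / 2 + delta_theta, phi)
    classes = []
    for a in range(g + 1):
        for b in range(g - a + 1):
            c = g - a - b
            weight = w1 ** a * w2 ** b * w4 ** c
            # 2^b * multinomial(a, b, c): each of the b gates picks lambda_2 or lambda_3.
            count = 2 ** b * math.factorial(g) // (
                math.factorial(a) * math.factorial(b) * math.factorial(c))
            classes.append((weight, count))
    classes.sort(reverse=True)  # heaviest paths first

    target = fidelity * 4.0 ** g  # un-normalized weight to accumulate
    total, n_paths = 0.0, 0
    for weight, count in classes:
        needed = math.ceil((target - total) / weight)
        take = min(count, needed)
        total += take * weight
        n_paths += take
        if take == needed:  # target reached inside this class
            break
    return target / n_paths

for f in (0.001, 0.01):
    print(f"g = 35, delta_theta = 0.05, F = {f}: speedup = {path_speedup(35, 0.05, f):.2f}")
```

By construction the result never exceeds the largest single-path weight, $\lambda_1^{2g}$, which is the upper bound discussed in the preceding paragraph.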

3. Veriﬁable and supremacy circuits
So far we have considered the decomposition of gates one by one, i.e., where the total number of paths is equal to the product of the Schmidt rank of all gates decomposed. However, by fusing gates together in a larger unitary, one can provide some speedup to the classical simulation of the sampling task.
The rationale here comes from the realization that a unitary that involves a number of qubits q cannot have a rank larger than $4^{\min(q_l, q_r)}$ when Schmidt decomposed over two subsets of qubits of size $q_l$ and $q_r$, with $q_l + q_r = q$. Therefore one might reduce exponentially the number of paths by fusing gates such that the bound $4^{\min(q_l, q_r)}$ for the resulting unitary is smaller than the product of the ranks of the fused gates to be decomposed. This is at the heart of the formation of wedges of Section VII G 2. These wedges denote particular sequences of consecutive two-qubit gates that only act upon three qubits. Fusing these two-qubit gates together generates 4 paths, as opposed to a naive count of $4^2$ paths if one decomposes each gate separately. Each wedge identified across a circuit cut provides a speedup by a factor of 4.
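The path counting for a wedge can be written out explicitly. The sketch below is only an upper-bound calculation based on the rank argument above; the helper names are hypothetical.

```python
def max_schmidt_rank(q_left, q_right):
    """Upper bound 4^min(q_l, q_r) on the operator Schmidt rank across a cut."""
    return 4 ** min(q_left, q_right)

def fused_path_count(gate_ranks, q_left, q_right):
    """Upper bound on the paths after fusing: naive product capped by the rank bound."""
    naive = 1
    for rank in gate_ranks:
        naive *= rank
    return min(naive, max_schmidt_rank(q_left, q_right))

# A wedge: two rank-4 gates acting on three qubits, with one qubit on one side of the cut.
naive_paths = 4 * 4
wedge_paths = fused_path_count([4, 4], q_left=1, q_right=2)
print(naive_paths, wedge_paths, naive_paths // wedge_paths)
```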
In this work, we deﬁne two classes of circuits: veriﬁable and supremacy circuits. Veriﬁable circuits present a large number of wedges across the partition used with the SFA simulator (Section X A) and are therefore classically simulatable in a reasonable amount of time. These circuits were used to perform full XEB over the entire device up to depth m = 14 (see Fig. 4a of the main article and Sections VII and VIII), which involves perfect ﬁdelity computations. On the other hand, supremacy circuits are designed so that the presence of wedges and similar sequences is mitigated, therefore avoiding the possibility of exploiting this classical speedup.
It is natural to apply the ideas presented here beyond wedges. It is also easy to look for similar structures in the circuits algorithmically. This way, we find that for the supremacy circuits there is a small number of such sequences. On the sequence of cycles DCD (see Fig. S25), three two-qubit gates are applied on qubits 16, 47, and 51 (see Fig. S27 for numbering). These three gates can be fused in one. Then, if the two gates between qubits 47 and 51 are decomposed (as is done with the SFA simulations of Section X A used in Fig. 4 of the main article), this technique provides a speedup of a factor of 4. The sequence of layouts DCD appears twice for circuits of m = 20, which provides a total speedup of $4^2 = 16$ in the simulation of the supremacy circuits. This particular decomposition is currently not implemented, and the estimated timings of Section X A and Fig. 4 of the main article do not take it into account.
Beyond this, one has to go to groups of several cycles of the circuit (more than two) in order to identify regions where the fusion of several gates provides any advantage of this kind. In our circuits, the resulting unitaries act upon a large number of qubits, which makes explicitly building the unitary impractical.
E. Treewidth upper bounds and variable elimination algorithms
We explained in Section X B that the Feynman method to compute individual amplitudes of the output of a quantum circuit can be implemented as a tensor network when quantum gates are interpreted as tensors. All indexes of the tensor network have dimension two because indexes correspond to qubits. Similarly, Ref. [70] showed that a quantum circuit can be mapped directly to an undirected graphical model. In the undirected graphical model, vertices or variables correspond to tensor indexes, and cliques correspond to tensors. Individual amplitudes can be computed using a variable elimination algorithm on the undirected graphical model, which is similar to a tensor contraction on a tensor network. The variable elimination algorithm depends on the ordering in which variables are eliminated or contracted. If we define the contraction width of an ordering to be the rank of the largest tensor formed along the contraction, the treewidth of the undirected graph is equal to the minimum contraction width over all orderings. Therefore, the complexity of a tensor network contraction grows in the optimal case exponentially with the treewidth, and the treewidth can be used to study the complexity of Feynman methods for simulating quantum circuits [69]. Ref. [70] showed that for diagonal gates the undirected graphical model is simpler, potentially lowering its treewidth, and hence improving the complexity. This simplification is not achievable in the tensor network view without including hyperedges, i.e., edges attached to more than two tensors. Ref. [70] also introduced the use of QuickBB to find a heuristic contraction ordering [84]. If allowed to run for long enough, QuickBB finds the optimal ordering, together with the treewidth of the graph. However, note that obtaining the treewidth of a graph is an NP-hard problem, and so in practice a suboptimal solution is considered for the simulations described here.
Once the width of a contraction is large enough, the largest tensor it generates is beyond the memory resources available. This constraint was overcome in Ref. [72] by projecting a subset of p variables or vertices in the undirected graphical model into each possible bitstring of 0 and 1 values. This generates $2^p$ similar subgraphs, each of which can be contracted with lower complexity and independently from each other, making the computation embarrassingly parallelizable. Choosing the subset of variables that, after projection, optimally decreases the treewidth of the resulting subgraph is also NP-hard. However, Ref. [72] developed a heuristic approach that works well in practice. The algorithm proceeds as follows:
1. Run QuickBB for S seconds on the initial graph. This gives a heuristic contraction ordering, as well as an upper bound for the treewidth.
2. For each variable, estimate the cost of contracting the subgraph after projection. The estimate is done with the ordering inherited from the previous step.
3. Choose to project the variable which results in the minimum contraction cost.
4. Repeat steps 2 and 3 until the cost is within reasonable resources.
5. Once all variables have been chosen and projected, run QuickBB for S seconds on the resulting subgraph to try to improve the contraction ordering inherited from step 1 and lower the contraction cost.
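The five steps above can be sketched on a toy graph. This is an illustration only, under two loudly stated assumptions: a greedy min-degree ordering stands in for QuickBB (which is not used here), and the test graph is a small square grid rather than the graphical model of a Sycamore circuit. All function names are hypothetical.

```python
def eliminate(adj, order):
    """Return (width, cost) of eliminating the variables in `order`.

    width: largest number of neighbors a variable has when it is eliminated.
    cost:  sum over eliminations of 2**(number of variables involved).
    """
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    width, cost = 0, 0
    for v in order:
        nbrs = adj.pop(v)
        width = max(width, len(nbrs))
        cost += 2 ** (len(nbrs) + 1)
        for u in nbrs:
            adj[u].discard(v)
            adj[u] |= nbrs - {u}  # the neighbors of v become a clique
    return width, cost

def min_degree_order(adj):
    """Greedy min-degree elimination ordering (a simple stand-in for QuickBB)."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    order = []
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))
        nbrs = adj.pop(v)
        for u in nbrs:
            adj[u].discard(v)
            adj[u] |= nbrs - {u}
        order.append(v)
    return order

def project(adj, v):
    """Fix variable v to a value: remove it and its edges from the graph."""
    return {u: nbrs - {v} for u, nbrs in adj.items() if u != v}

def choose_projections(adj, max_width):
    order = min_degree_order(adj)                    # step 1
    projected = []
    while eliminate(adj, order)[0] > max_width:      # step 4
        def cost_if_projected(v):                    # step 2, inherited ordering
            return eliminate(project(adj, v), [u for u in order if u != v])[1]
        best = min(adj, key=cost_if_projected)       # step 3
        adj = project(adj, best)
        order = [u for u in order if u != best]
        projected.append(best)
    new_order = min_degree_order(adj)                # step 5
    new_width, new_cost = eliminate(adj, new_order)
    if new_width <= max_width and new_cost < eliminate(adj, order)[1]:
        order = new_order                            # keep it only if it helps
    return projected, adj, order

def grid_graph(rows, cols):
    adj = {(r, c): set() for r in range(rows) for c in range(cols)}
    for (r, c) in list(adj):
        for dr, dc in ((0, 1), (1, 0)):
            nb = (r + dr, c + dc)
            if nb in adj:
                adj[(r, c)].add(nb)
                adj[nb].add((r, c))
    return adj

graph = grid_graph(4, 4)
projected, reduced, order = choose_projections(graph, max_width=2)
print("projected variables:", len(projected))
print("(width, cost) after projection:", eliminate(reduced, order))
```

Each projected variable doubles the number of subgraphs to contract, so the total work is $2^p$ times the cost printed for a single subgraph.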

[Figure S48: two panels. Top: contraction width (30 to 80) versus number of variables projected (0 to 60) for Sycamore supremacy circuits at m = 10, 12, 14, 16, 18, 20 cycles and Bristlecone circuits at depths d = (1+32+1) and d = (1+40+1), with the QuickBB results marked. Bottom: "Sampling 1M bitstrings with fidelity 0.5%", runtime (s) on 1M cores (logarithmic axis) versus m; point annotations: 34 seconds, 110 hours, 71 days, 5K years, 4M years, 111B years, and 2K years (verifiable circuit).]
FIG. S48. Contraction widths and estimated runtimes for classical sampling using the variable elimination algorithm with projected variables of Ref. [72] for Sycamore supremacy circuits. Top: contraction width as a function of the number of variables projected using the algorithm of Ref. [72]. We project enough variables in order to decrease the width to 28 or lower. Note that often the second QuickBB run does not decrease the treewidth (and might even increase it), in which case the resulting contraction ordering is ignored. Bottom: estimated runtimes for the classical sampling of 1M bitstrings from the supremacy circuits with fidelity 0.5% using the contraction ordering found by QuickBB at the end of the projection procedure shown in the top panel. The red data point shows the estimated runtime for a verifiable circuit; note that the heuristic algorithm analyzed here provides some speedup in this case. Our time estimates assume the use of fast sampling, although it is so far unclear whether this technique can be adapted to the algorithm described here. Failure to do so would result in a slowdown of about an order of magnitude.

In the top panel of Fig. S48 we show the contraction width as a function of the number of variables that are projected for the supremacy circuits used in this paper. In order to decrease the contraction width to 28 or below (a tensor with 28 binary indexes consumes 2 GB of memory using single precision complex numbers), we need to project between 8 and 63 variables, depending on the depth of the circuits. In addition, we report the result of the projection procedure on the Bristlecone circuits considered in Refs. [50, 85] and available at https://github.com/sboixo/GRCS for depths (1+32+1) and (1+40+1), since these cases were benchmarked in Ref. [85]. We obtain a contraction width equal to 28 after 10 projections for Bristlecone at depth (1+32+1), and width 26 after 22 projections for Bristlecone at depth (1+40+1), consistent with the results in Ref. [85]. Even though Ref. [72] uses S = 60, we run QuickBB for 1800 seconds (30 minutes) every time, in order to decrease the contraction width of the Bristlecone simulations to values that match the memory requirements reported in Ref. [85]. Note that Ref. [85] neither reports the value of S used nor the contraction widths found; however, with S = 1800 we are able to match the scaling of time complexity reported, as is explained below.
To estimate the runtime of the computation of a single amplitude using this algorithm on the circuits presented in this work, we use the following scaling formula:
$$ T_{\mathrm{VE}} = C_{\mathrm{VE}}^{-1} \cdot 2^{p} \cdot (\text{cost after } p \text{ projections}) / n_{\mathrm{cores}}, \tag{106} $$
where VE refers to the variable elimination algorithm with projections described in this section, $C_{\mathrm{VE}}$ is a constant factor, p is the number of variables projected, and $n_{\mathrm{cores}}$ is the number of cores used in the computation. The cost of the full contraction of each subgraph is estimated as the sum of $2^{\mathrm{rank}}$, where the rank refers to the number of variables involved in each individual contraction along the full contraction of the subgraph. We obtain the value of $C_{\mathrm{VE}}$ from the runtimes reported in Ref. [72], which shows that a single amplitude of Bristlecone at depth (1+32+1) takes 0.43 seconds to compute on 127,512 CPU cores with 10 projected variables, and at depth (1+40+1) it takes 580.7 seconds with 22 projected variables using the same number of cores. We use the benchmark at depth (1+32+1) because it provides the largest value for $C_{\mathrm{VE}}$ (lowest time estimates), which is equal to 52.7 MHz; the benchmark at depth (1+40+1) gives $C_{\mathrm{VE}}$ = 51.6 MHz. In order to sample 1M bitstrings from a random circuit with fidelity 0.5%, we need to compute 5000 amplitudes.
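Eq. (106) and the amplitude count can be expressed as a short sketch. The contraction cost passed in below is a made-up placeholder, since the actual costs found for the Bristlecone and Sycamore graphs are not listed in the text; the function names are likewise hypothetical.

```python
def amplitudes_needed(n_samples, fidelity):
    """Number of amplitudes to compute when sampling at a given fidelity."""
    return round(n_samples * fidelity)

def ve_runtime_seconds(cost_after_projection, n_projected, c_ve_hz, n_cores):
    """Eq. (106): T_VE = 2^p * cost / (C_VE * n_cores)."""
    return 2 ** n_projected * cost_after_projection / (c_ve_hz * n_cores)

def fit_c_ve(runtime_s, cost_after_projection, n_projected, n_cores):
    """Invert Eq. (106) to obtain C_VE from a benchmarked runtime."""
    return 2 ** n_projected * cost_after_projection / (runtime_s * n_cores)

print(amplitudes_needed(1_000_000, 0.005))

# Round trip with a HYPOTHETICAL cost value (not taken from the paper).
example_cost = 1.0e12
c_ve = fit_c_ve(0.43, example_cost, n_projected=10, n_cores=127_512)
print(ve_runtime_seconds(example_cost, 10, c_ve, 127_512))
```

The per-amplitude runtime from Eq. (106) is then multiplied by the number of amplitudes to estimate the total sampling time.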
We present our estimates for Sycamore supremacy circuits in the bottom panel of Fig. S48. Note that depth (1+40+1) in Refs. [72, 85] is equivalent to m=20 cycles here because of the denser layout of two-qubit gates. Furthermore, computation times reported previously are for circuit variations less complex than for Sycamore, arising from changes in complexity such as CZ vs. fSim gates and differing patterns; with this change of gates, depth (1+40+1) in Refs. [72, 85] is actually equivalent to m=10 cycles here. Finally, note that we present optimistic estimates, since we are assuming that the fast sampling technique discussed in Section X B is applicable here. To the best of our knowledge, it is not known how to apply this technique for the heuristic variable elimination algorithm discussed here; in the absence of an implementation of this technique, in order to successfully apply rejection sampling we would instead need to compute a few independent amplitudes per sampled bitstring, which would increase the estimated times by about an order of magnitude (see Section X B and Refs. [38, 86] for more details). According to our estimates, sampling from supremacy circuits at m = 16 and beyond is out of reach for this algorithm. Interestingly, we find some speedup for the simulation of verifiable circuits, as is shown in Fig. S48 for m = 16 (red data point).

[Figure S49: circuit diagram with single-qubit gates X^1/2, Y^1/2, and W^1/2 and two-qubit Syc gates (top), and its undirected graphical model (bottom).]
FIG. S49. Circuit with Sycamore gates (top) and its corresponding undirected graphical model (bottom). Each non-diagonal single-qubit gate introduces a new vertex or variable. Note that, even though two-qubit gates are generally represented by a clique with four vertices or variables, Sycamore gates can be simplified as a cphase followed by a SWAP. The cphase is represented as an edge between two existing variables. The SWAP, however, provides more complexity to the graph as it swaps the corresponding variables.

[Figure S50: histogram of counts (0 to 80) versus job times in seconds (180 to 260).]
FIG. S50. Qsimh execution time for a 53 qubit circuit with 20 cycles for the first 1000 prefix values. The average job time tprefix is calculated to be 246 seconds.
Finally, note that the undirected graphical model derived from the supremacy circuits can take advantage of the structure of the Sycamore gates (fSim plus single-qubit Rz rotations). Due to the fact that fSim(θ ≈ π/2, φ) ≈ −i · [Rz(−π/2) ⊗ Rz(−π/2)] · cphase(π + φ) · SWAP, the Sycamore gate corresponds to a subgraph of only two variables, which explicitly represents the diagonal cphase and the logical SWAP. This simplification, used in our estimates, results in an undirected graphical model that is simpler than the one generated by arbitrary two-qubit gates. See Fig. S49 for an example.
F. Computational cost estimation for the sampling task
We find that the most efficient simulator for our hardest circuits is the SFA simulator (see Sec. X A). In order to estimate the computational cost associated with simulating a 53 qubit circuit with 20 cycles, where no gates are elided on the cut, we use a Google cloud cluster composed of 1000 machines with 2 vCPUs and 7.5 GB of RAM each (n1-standard-2). We use n1-standard-2 because this is the smallest non-custom machine with sufficient RAM for simulating the two halves of the circuit. In 20 cycles, the circuit contains 35 gates across the cut. All cross gates have a Schmidt rank of 4 except for the last four gates which can be simplified to cphase with a Schmidt rank of 2. To obtain a perfect fidelity simulation we would need to simulate all $4^{31} \times 2^{4}$ paths. We configure qsimh according to Ref. [38] to have a prefix of 30 cross gates, thus requiring $4^{30}$ separate qsimh runs. The first 1000 paths of the required $4^{30}$ were used for timing purposes. In Figure S50 we plot the distribution of simulation times with qsimh consuming two hyperthreads. The average job time is 246 seconds resulting in a calculated $1.6 \times 10^{14}$ core hours for a simulation of the circuit with 0.002 fidelity [87]. Extrapolated run times for other circuits with 53 qubits are shown in Table XI. To calculate a total cost for the largest circuit we multiply the Google Cloud preemptible n1-standard-2 price in zone us-central-1 of $0.02 per hour, 246 seconds average run time, 0.002 target fidelity, and $4^{30}$ qsimh runs. This results in an estimated cost of 3.1 trillion USD. For perfect fidelity simulations (necessary for XEB), an extrapolation to a fidelity value of 100% gives a good estimate of the run time. We believe these estimates are a lower bound on costs and simulation time due to the fact that these calculations are likely to compete with each other if they are run on the same nodes.
As a ﬁnal remark, note that a hypothetical implementation of the decomposition discussed at the end of Section X D 3 could decrease the computation time presented here by a factor of 16.
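The core-hour and cost figures quoted in this section follow from a short multiplication, sketched below. The function name and defaults are illustrative; as in the text, one qsimh job-hour on an n1-standard-2 machine (two hyperthreads) is counted as one core hour and billed at the quoted hourly price.

```python
SECONDS_PER_HOUR = 3600

def qsimh_cost(n_prefix_gates=30, avg_job_seconds=246.0,
               fidelity=0.002, price_per_hour=0.02):
    """Return (core_hours, cost_usd) for a low-fidelity SFA simulation."""
    # Only a fraction `fidelity` of the 4^prefix runs is needed.
    runs = 4 ** n_prefix_gates * fidelity
    core_hours = runs * avg_job_seconds / SECONDS_PER_HOUR
    return core_hours, core_hours * price_per_hour

core_hours, usd = qsimh_cost()
print(f"{core_hours:.2e} core hours, {usd:.2e} USD")
```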

qubits, n   cycles, m   total #paths   fidelity   run time
53          12          4^17 × 2^4     1.4%       2 hours
53          14          4^21 × 2^4     0.9%       2 weeks
53          16          4^25 × 2^3     0.6%       4 years
53          18          4^28 × 2^3     0.4%       175 years
53          20          4^31 × 2^4     0.2%       10000 years

TABLE XI. Approximate qsimh run times using one million CPU cores extrapolated from the average simulation run time for 1000 simulation paths on one CPU core.

G. Understanding the scaling with width and depth of the computational cost of veriﬁcation

1. Runtime scaling formulas

Here we study the scaling of the runtime of the classical computation of exact amplitudes from the output wave function of a circuit with m cycles and n qubits on Sycamore, assuming a supercomputer with 1M cores. This computation is needed in order to perform XEB on the circuits run. We consider two algorithms: a distributed Schrödinger algorithm (SA) [74, 75] (see Section X C) and a hybrid Schrödinger-Feynman algorithm (SFA) [38] that splits the circuit in two patches and time evolves each of them for all Feynman paths connecting both patches (see Section X A). The latter is embarrassingly parallelizable. Note that these scaling formulas provide rough estimates presented with the intent of building intuition on the scaling of runtimes with the width and depth of the circuits, and that the finite size effects of the circuits can give discrepancies of an order of magnitude or more for the circuit sizes considered in this work.

For SA, the runtime is directly proportional to the size of the wave function on n qubits. This is equal to $2^n$. In addition, the runtime is proportional to the number of gates applied, which scales linearly with n and m. For this reason, we propose the scaling:

$$ T_{\mathrm{SA}} = C_{\mathrm{SA}}^{-1} \cdot m n \cdot 2^{n}, \tag{107} $$

where the constant $C_{\mathrm{SA}}$ is fit to runtimes observed experimentally when running on a supercomputer, and scaled to 1M cores.

For SFA the runtime is proportional to the number of paths connecting both patches, as well as to the time taken to simulate each pair of patches. When using the supremacy two-qubit gate layouts (ABCDCDAB. . . ), each fSim gate bridging between the two patches (cross-gates) generates a factor of 4 in the number of paths. The number of cross-gates scales with $\sqrt{n}$ (we assume a two-dimensional grid) and with m. The time taken to simulate each patch is proportional to $2^{n/2}$, where n/2 estimates the number of qubits per patch, and the exponential dependence comes from a linear scaling of the runtime with the size of the wave function over that patch. The runtime therefore scales as:

$$ T_{\mathrm{SFA,\ supremacy}} = C_{\mathrm{SFA}}^{-1} \cdot 2 \cdot 2^{n/2} \cdot 4^{B \cdot m \sqrt{n}}, \tag{108} $$

where the extra factor of two accounts for the fact that, for every path, two patches have to be simulated. The constant $C_{\mathrm{SFA}}$, with units of frequency, is the effective frequency with which 1M cores simulate paths and is fit from experimentally observed runtime. The constant B accounts for the average number of cross-gates observed per cycle, which depends on the two-dimensional grid considered and on the two-qubit gate layouts used. For Sycamore, with the supremacy layouts, we find 35 cross-gates for n = 53 and m = 20, which gives B = 0.24 ≈ 1/4.

For SFA, using the verifiable two-qubit gate layouts (EFGHEFGH. . . ), the main difference with the supremacy circuits case is the fact that most of the cross-gates can be fused in pairs, forming three-qubit gates we refer to as wedges (see Sec. VII G 2 and X D 3). Each cross-wedge generates only 4 paths, as opposed to the $4^2$ paths the two independent fSim gates would have generated. Since every 4 cycles provide 7 cross-gates, and from those 7 gates, 6 are converted into 3 wedges, we count only $4^4$ paths, as opposed to a naive count of $4^7$ for those 4 cycles. In turn, the exponent in the last factor of Eq. (108) is corrected by the fraction 4/7. This results in:

$$ T_{\mathrm{SFA,\ verifiable}} = C_{\mathrm{SFA}}^{-1} \cdot 2 \cdot 2^{n/2} \cdot 4^{\frac{4}{7} B \cdot m \sqrt{n}}. \tag{109} $$
2. Assumptions and corrections
There are several assumptions considered in Section X G 1 and other details that can either (1) contribute to a somewhat large discrepancy between the runtimes predicted by the scaling formulas and the actual runtimes potentially measured experimentally, or (2) be ignored with no signiﬁcant impact on the accuracy of the predictions. Here we discuss the ones we consider most relevant.
Concerning SA, the algorithm is benchmarked in practice on up to 100K cores. Since this is a distributed algorithm, the scaling with number of cores is not ideal and therefore the constant CSA can only be estimated roughly. We assume perfect scaling in our estimates for runtime on 1M cores, i.e., the runtime on 1M cores is the one on 100K cores divided by 10; this is of course an optimistic estimate, and runtimes should be expected to be larger.
For memory requirement estimates, we assume a 2 byte encoding of complex numbers. Beyond about 49 qubits there is not enough RAM on any existing supercomputer to store the wave function. In those cases, runtimes are given for the unrealistic, hypothetical case that one can store the wave function.

FIG. S51. Scaling of the computational cost of XEB using SA and SFA. a, For a Schrödinger algorithm, the limitation is RAM size, shown as vertical dashed line for the Summit supercomputer. Circles indicate full circuits with n = 12 to 43 qubits that are benchmarked in Fig. 4a of the main paper [1]. 53 qubits would exceed the RAM of any current supercomputer, and is shown as a star. b, For the hybrid Schrödinger-Feynman algorithm, which is more memory efficient, the computation time scales exponentially in depth. XEB on full verifiable circuits was done at depth m = 14 (circle). c, XEB on full supremacy circuits is out of reach within reasonable time resources for m = 12, 14, 16 (stars), and beyond. XEB on patch and elided supremacy circuits was done at m = 14, 16, 18, and 20.

SFA is embarrassingly parallelizable, and so it does not suﬀer from non-ideal scaling. However, there are other factors to take into account. First, we have written no explicit dependence of the time to simulate patches of the circuit with m; the number of cycles m only plays a role when counting the number of paths to be considered. SFA stores several copies of the state of a patch after its evolution at diﬀerent depths, iterating over paths over several nested loops. For this reason, most of the time is spent iterating over the inner-most loop, which accounts for the last few gates of the circuit and is similar in cost for all depths. This implies that the amortized time per path is considered approximately equal for all depths and the direct m dependence was correctly ignored.
A factor contributing to the discrepancy between the predicted runtimes of the scaling formulas of Section X G 1 and those expected in practice is due to finite size effects. While these scaling formulas consider the average number of cross-gates encountered per cycle, different cycles have layouts that contribute a few more (or fewer) gates than others. Since the runtime dependency is exponential in the number of gates, this might cause discrepancies of around an order of magnitude. Furthermore, for verifiable circuits, wedges form over groups of two cycles; this coarse graining exacerbates finite size effects. For the sake of simplicity in the scaling formulas, we do not perform any corrections to include these factors. However, in order to mitigate the propagation of finite size effect errors, we consider different constants $C_{\mathrm{SFA,\ supremacy}}$ and $C_{\mathrm{SFA,\ verifiable}}$, that we fit independently.
Finally, we refer to runtimes of our simulations on a hypothetical supercomputer with 1M cores. While this is a realistic size for a Top-5 supercomputer currently, a core-hour can vary signiﬁcantly between diﬀerent CPU types. Again, we only intend to provide rough estimates in order to build intuition on the dependence of runtimes with circuit width and depth.
3. Fitting constants
In the case of SA, we fit the constant $C_{\mathrm{SA}}$ with a runtime of 0.1 hours for the simulation with n = 43 and m = 14. This runtime is obtained by assuming ideal scaling when extrapolating a runtime of 1 hour on nearly 100K nodes ($2^{15}$ MPI processes, 3 cores per process), as reported in Sec. X C. This gives a value of

$$C_{\mathrm{SA}} = 0.015 \times 10^{6}\ \mathrm{GHz}. \tag{110}$$
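The extrapolation behind the 0.1 hour figure can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes perfectly linear ("ideal") scaling, takes the parenthetical process and core counts at face value, and uses a hypothetical helper name. It does not reproduce the scaling formulas of Section X G 1.

```python
# Ideal strong-scaling extrapolation of a measured runtime.
# Assumption: runtime is inversely proportional to the number of cores.
# The helper name is hypothetical; the numbers are those quoted in the text.

def extrapolate_runtime_hours(measured_hours, measured_cores, target_cores):
    """Runtime on target_cores under perfectly linear scaling."""
    return measured_hours * measured_cores / target_cores

mpi_processes = 2 ** 15                              # MPI processes in the reported run
cores_per_process = 3
measured_cores = mpi_processes * cores_per_process   # 98304, i.e. nearly 100K

runtime_1m = extrapolate_runtime_hours(1.0, measured_cores, 10 ** 6)
print(f"{measured_cores} cores, 1 h  ->  {runtime_1m:.3f} h on 1M cores")
```

The result rounds to the 0.1 hours used in the fit.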

For SFA, we consider B = 1/4 for simplicity. In order to ﬁt CSFA, we consider a runtime of 5 hours and 4 years for the case with n = 53 and m = 14 for veriﬁable and supremacy circuits, respectively (see Fig. 4 of the main text). This gives:

$$C_{\mathrm{SFA,\,verifiable}} = 0.0062 \times 10^{6}\ \mathrm{GHz}, \qquad C_{\mathrm{SFA,\,supremacy}} = 3.3 \times 10^{6}\ \mathrm{GHz}. \tag{111}$$

As discussed above, these ﬁts provide times estimated for a supercomputer with 1M cores. Contour plots showing the dependency of runtime with n and m are presented in Fig. S51.

4. Memory usage scaling

Let us conclude with a discussion of the memory footprint of both algorithms. For these estimates, we assume a 2-byte encoding of complex numbers, as opposed to 8 bytes (single precision) or 16 bytes (double precision). This results in a lower bound for the memory usage of these two algorithms. These estimates need an extra factor of 4 (8) when using single (double) precision. SA stores the wave function of the state on all qubits. For this reason, it needs $2^{n} \times 2 = 2^{n+1}$ bytes. SFA simulates the wave function of both halves of the system (n/2 qubits) per path, one at a time. This requires $2^{n/2} \cdot 2$ bytes per path. In practice, the use of checkpoints implies the need to store more than one wave function per path; for simplicity, and in the same optimistic spirit of other assumptions, we ignore this fact. If 1M cores are used and each path is simulated using a single core, the total memory footprint is estimated to be $10^{6} \times 2^{n/2+1}$ bytes. State-of-the-art supercomputers have less than 3 PB of memory.
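To make these estimates concrete, the following sketch evaluates both footprints for a few qubit counts. It is a minimal illustration under the assumptions stated in the text (2-byte amplitudes, one path per core, 1M cores); the function names are hypothetical, and the petabyte is taken as $10^{15}$ bytes, which is an assumption of this sketch.

```python
# Memory footprint estimates for the Schrodinger (SA) and
# Schrodinger-Feynman (SFA) algorithms, following the formulas in the text.
# Function names are hypothetical; PB = 10**15 bytes is an assumption.

PB = 10 ** 15

def sa_memory_bytes(n, bytes_per_amplitude=2):
    """SA stores all 2^n amplitudes: 2^n * 2 = 2^(n+1) bytes by default."""
    return bytes_per_amplitude * 2 ** n

def sfa_memory_bytes(n, paths_in_flight=10 ** 6, bytes_per_amplitude=2):
    """SFA stores 2^(n/2) amplitudes per path, one path per core."""
    return paths_in_flight * bytes_per_amplitude * 2 ** (n / 2)

for n in (43, 53):
    print(f"n = {n}: SA {sa_memory_bytes(n) / PB:.4f} PB, "
          f"SFA {sfa_memory_bytes(n) / PB:.4f} PB")
```

With these assumptions, SA at n = 53 needs $2^{54}$ bytes, well above the 3 PB quoted for state-of-the-art machines, while the SFA estimate stays below it. Multiplying by 4 (8) gives the single (double) precision figures.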

H. Energy advantage for quantum computing
With the end of Dennard scaling for CMOS circuits, gains in computing energy efficiency have slowed significantly [88]. As a result, today’s high performance computing centers are usually constrained by available energy supplies rather than hardware costs. For example, the Summit supercomputer at Oak Ridge National Laboratory has a total power capacity of 14 MW available to achieve a design specification of 200 Pflop/s double-precision performance. We took detailed energy measurements with qFlex running on Summit. The energy consumption grows exponentially with the circuit depth, as illustrated in Table VII.
For a superconducting quantum computer, the two primary sources of energy consumption are:
1. A dilution refrigerator: our refrigerator has a direct power consumption of ∼10 kW, dominated by the mechanical compressor driving the 3 K cooling stage. The power required to provide chilled water cooling for the compressor and pumps associated with the refrigerator can be an additional 10 kW or more.
2. Supporting electronics: these include microwave electronics, ADCs, DACs, clocks, classical computers, and oscilloscopes that are directly associated with a quantum processor in the refrigerator. The average power consumption of supporting electronics was nearly 3 kW for the experiments in this paper.
We estimate the total average power consumption of our apparatus under worst-case conditions for chilled water production to be 26 kW. This power does not change appreciably between idle and running states of the quantum processor, and it is also independent of the circuit depth. This means that the energy consumed during the 200 s required to acquire 1M samples in our experiment is ∼ $5 \times 10^{6}$ J (∼ 1 kWh). As compared to the qFlex classical simulation on Summit, we require roughly 7 orders of magnitude less energy to perform the same computation (see Table VII). Furthermore, the data acquisition time is currently dominated by control hardware communications, leading to a quantum processor duty cycle as low as 2%. This means there is significant potential to increase our energy efficiency further.
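The energy figure follows directly from the quoted power and acquisition time. The short calculation below is only a unit-conversion sketch using the numbers in the text; the variable names are illustrative, and the "active time" line simply applies the 2% duty cycle to the 200 s window.

```python
# Energy used by the quantum experiment: power times acquisition time.
# All numbers are taken from the text; variable names are illustrative.

POWER_KW = 26.0          # worst-case total average power
ACQUISITION_S = 200.0    # time to acquire 1M samples
DUTY_CYCLE = 0.02        # fraction of time the processor is actually running

energy_joule = POWER_KW * 1e3 * ACQUISITION_S   # W * s = J
energy_kwh = energy_joule / 3.6e6               # 1 kWh = 3.6e6 J
active_seconds = DUTY_CYCLE * ACQUISITION_S

print(f"energy: {energy_joule:.2e} J = {energy_kwh:.2f} kWh")
print(f"processor active for about {active_seconds:.0f} s of {ACQUISITION_S:.0f} s")
```

The product is $5.2 \times 10^{6}$ J, consistent with the rounded values of ∼ $5 \times 10^{6}$ J and ∼ 1 kWh.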
XI. COMPLEXITY-THEORETIC FOUNDATION OF THE EXPERIMENT
The notion of quantum supremacy was originally introduced by John Preskill [89]. He conceived of it as “the day when well controlled quantum systems can perform tasks surpassing what can be done in the classical world”. For the purpose of an experimental demonstration we would like to reﬁne the deﬁnition.
Demonstrating quantum supremacy requires:
1. A well deﬁned computational task, i.e. a mathematical speciﬁcation of a computational problem with a well deﬁned solution.
Comment: This requirement, standard in computer science, excludes tasks such as “simulate a glass of water”. However, it would include finding the ground state energy of an H$_2$O molecule to a given precision governed by a specific Hamiltonian. Note that a mathematical specification of a computational problem calls for highly accurate control resulting in measurable system fidelity.
2. Programmable computational device
Comment: Many physics experiments estimate the values of observables to a precision which can not be obtained numerically. But those do not involve a freely programmable computational device and the computational task is often not well deﬁned as required above. Ideally, we would even restrict ourselves to devices that are computationally universal. However, this would exclude proposals to demonstrate quantum supremacy with BosonSampling [90] or IQP circuits [91].
3. A scaling runtime diﬀerence between the quantum and classical computational processes that can be made large enough as a function of problem size so that it becomes impractical for a supercomputer to solve the task using any known classical algorithm.
Comment: What is impractical for classical computers today may become tractable in ten years. So the quantum supremacy frontier will be moving towards larger and larger problems. But if a task is chosen such that the scaling for the quantum processors is polynomial while for the classical computer it is exponential then this shift will be small. Establishing an exponential separation requires substantial eﬀorts designing and benchmarking classical algorithms [27, 50, 67–70, 72, 74, 75, 85], and support from complexity theory arguments [27, 30, 92]. Sampling the output of random quantum circuits is likely to exhibit this scaling separation as a function of the number of qubits for large enough depth. In this context, we note that quantum analog simulations that estimate an observable in the thermodynamic limit typically do not deﬁne a problem size parameter.
The requirements above are satisﬁed by proposals of quantum supremacy emerging from computer science, such as BosonSampling [90], IQP circuits [91], and random circuit sampling [6, 27, 30, 92, 93]. They are also implicit in the Extended Church-Turing Thesis: any reasonable model of computation can be eﬃciently simulated, as a function of problem size, by a Turing machine.
We note that formal complexity proofs are asymptotic, and therefore assume an arbitrarily large number of qubits. This is only possible with a fault tolerant quantum computer and therefore near term practical demonstrations of quantum supremacy must rely on a careful comparison with highly optimized classical algorithms on state-of-the-art supercomputers.
So far we have argued for quantum supremacy by comparing the running time of the quantum experiment with the time required for the same task using the best known classical algorithms, running on the most powerful supercomputers currently available. The fastest known algorithm for exact sampling (or for computing transition probabilities) runs in time exponential in the treewidth of the quantum circuit [69, 70]; for a depth D circuit on a rectangular lattice of sizes $l_x$ and $l_y$, the treewidth is given by $\min(\min(l_x, l_y)\,D,\ l_x l_y)$. For approximate simulation in which one only requires a given global fidelity F, the classical cost is reduced linearly in F [38]. As classical algorithms and compute power can be improved in the future, the classical cost benchmark is a moving target.
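The treewidth expression above is easy to evaluate, and doing so shows where the depth stops mattering. The sketch below is illustrative: the function name and the 6 × 9 lattice are hypothetical choices, not parameters taken from the experiment.

```python
# Treewidth estimate for a depth-D circuit on an lx-by-ly rectangular lattice,
# as given in the text: min(min(lx, ly) * D, lx * ly).
# The function name and the example lattice are hypothetical.

def treewidth_estimate(lx, ly, depth):
    """Exact-simulation cost is exponential in this quantity."""
    return min(min(lx, ly) * depth, lx * ly)

lx, ly = 6, 9
for depth in (2, 5, 9, 20):
    print(f"D = {depth:2d}: treewidth ~ {treewidth_estimate(lx, ly, depth)}")
```

For shallow circuits the estimate grows linearly with depth; once $\min(l_x, l_y)\,D$ exceeds the number of qubits $l_x l_y$, it saturates at the qubit count.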
A complementary approach to back up supremacy claims consists of giving complexity-theoretic arguments for the classical hardness of the problem solved (in our case sampling from the output distribution of a random circuit of a given number of qubits, depth and output ﬁdelity). Previous work gave hardness results for sampling exactly from the output distribution of diﬀerent classes of circuits [27, 90, 94–96]. Most relevant to us are Refs. [92, 93, 97], which proved that it is classically intractable (unless the polynomial hierarchy collapses to its third level, which is considered extremely unlikely [98]) to sample from the exact probability distribution of outcomes of measurements in random circuits. We note the distribution of circuits considered in [92, 93, 97] is diﬀerent from ours.
An important clariﬁcation is that such results are asymptotic, i.e. they show that, unless the polynomial hierarchy collapses, there are no polynomial-time classical algorithms for sampling from output measurements of certain quantum circuits. But they cannot be used directly to give concrete lower bounds for quantum computations of a ﬁxed number of qubits and depth. Refs. [99– 101] tackled this question using tools from ﬁne-grained complexity, giving several ﬁnite size bounds.
There are also results arguing for the hardness of approximate sampling (see e.g. [27, 90, 91, 95]), where the task is only to sample from a distribution which is close to the ideal one. As the quantum experiment will never be perfect, this is an important consideration. However those results are weaker than the ones for exact sampling, as the hardness assumptions required have been much less studied (and in fact were introduced with the exact purpose of arguing for quantum supremacy). Another drawback is that the results only apply to the situation where the samples come from a distribution very close to the ideal one (i.e. with high ﬁdelity with the ideal one). This is not the regime in which our experiment operates.
With these challenges in mind, we consider an alternative hardness argument in this section, which will allow us to lower bound the classical simulation cost of noisy quantum circuits by the cost of the ideal one. On one hand, our argument will be more restrictive than previous results in that we will assume a particular noise model for the quantum computer (one, however, which models the experiment well). On the other hand, it will be stronger in two ways: (1) it will apply even to the setting in which the output fidelity of the experimental state with the ideal one can be very small, but still the product of total fidelity with exact computational cost is large; and (2) it will be based on more mainstream complexity assumptions in contrast to the tailor-made conjectures required in e.g. [90, 91, 95] to handle the case of small adversarial noise.

A. Error model

Our error model is the following. We assume that the quantum computer samples from the following output distribution:

$$r_{U,F}(x) := F\,|\langle x|U|0\rangle|^{2} + (1-F)/2^{n}, \tag{112}$$

with U the circuit implemented. In words, we assume global depolarizing noise. Ref. [27] argues that Eq. (112) is a good approximation for the output state of random circuits (see Sec. IV and Section III of [27]); this form has also been verified experimentally on a small number of qubits. In the experiment, F is in the range $10^{-2}$ to $10^{-3}$.
We note that while we assume a global white noise model in this section, we do not assume it in the rest of the paper, neither for validating the cross entropy test nor in the comparison with state-of-the-art classical algorithms (and indeed the algorithm considered in Section X samples from an approximate distribution diﬀerent from the one in Eq. (112)).
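As an illustration of the error model in Eq. (112), the sketch below builds the noisy distribution from an ideal one and draws samples from it. It is a toy under explicit assumptions: the "ideal" distribution is a stand-in made of normalized exponential weights rather than the output of an actual circuit, and all function names are hypothetical.

```python
import random

# Toy illustration of Eq. (112): r(x) = F * p(x) + (1 - F) / 2^n.
# Assumption: p is a stand-in built from normalized exponential weights,
# not the output distribution of a real circuit. Names are hypothetical.

def toy_ideal_distribution(n, rng):
    """Normalized exponential weights over the 2^n bitstrings."""
    weights = [rng.expovariate(1.0) for _ in range(2 ** n)]
    total = sum(weights)
    return [w / total for w in weights]

def noisy_distribution(p_ideal, fidelity):
    """Mix the ideal distribution with the uniform one, as in Eq. (112)."""
    dim = len(p_ideal)
    return [fidelity * p + (1.0 - fidelity) / dim for p in p_ideal]

def sample_noisy(p_ideal, fidelity, rng):
    """With probability F draw from p, otherwise draw a uniform bitstring."""
    dim = len(p_ideal)
    if rng.random() < fidelity:
        return rng.choices(range(dim), weights=p_ideal, k=1)[0]
    return rng.randrange(dim)

rng = random.Random(1)
p = toy_ideal_distribution(n=8, rng=rng)
r = noisy_distribution(p, fidelity=0.01)
draws = [sample_noisy(p, 0.01, rng) for _ in range(5)]
print("sum of r:", sum(r), "example draws:", draws)
```

Sampling by mixture (ideal with probability F, uniform otherwise) yields exactly the distribution $r_{U,F}$, which is why the classical cost can drop linearly in F when the ideal sampler is the expensive part.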

B. Deﬁnition of computational problem
Before stating our result, let us deﬁne precisely the computational problem we consider. We start with the ideal version of the problem with no noise:
Circuit Sampling: The input is a description of an n-qubit quantum circuit U, described by a sequence of one- and two-qubit gates. The task is to sample from the probability distribution of outcomes $p_U(x) := |\langle x|U|0\rangle|^{2}$.
Circuit sampling is an example of a sampling problem [102]. A classical algorithm for circuit sampling can be thought of, without loss of generality, as a function A mapping m ∈ poly(n) bits $r = (r_1, \ldots, r_m)$ to n bits such that

$$\frac{1}{2^{m}}\,\bigl|\{(r_1, \ldots, r_m)\ \text{s.t.}\ A(r_1, \ldots, r_m) = x\}\bigr| = \tilde{p}_U(x), \tag{113}$$

with $\tilde{p}_U(x)$ an approximation of $p_U(x)$ to l ∈ poly(n) bits of precision. So when r is chosen uniformly at random, the outputs of A are samples from $p_U$ (up to rounding errors which can be made super-exponentially small).
Assuming the polynomial hierarchy does not collapse, it is known that Circuit Sampling cannot be solved classically efficiently in n, meaning any algorithm A satisfying Eq. (113) must have superpolynomial circuit complexity, for several classes of circuits (such as short depth circuits [96], IQP [94] and Boson Sampling [90]). We might also be interested in the average case of circuit sampling (for a restricted class of circuits).
Random Circuit Sampling: The input is a set of quantum circuits $\mathcal{U}$ on n qubits. The task is to sample from $p_U(x) := |\langle x|U|0\rangle|^{2}$ for most circuits $U \in \mathcal{U}$.
Ref. [92] proved that an eﬃcient (in terms of n) classical algorithm for this task for random circuits would also collapse the polynomial hierarchy. As every realistic quantum experiment will be somewhat noisy, it is relevant to consider a variant of this task allowing for small deviations from ideal. One possible formulation is the following:
ε-Approximate Random Circuit Sampling: The input is a set of quantum circuits $\mathcal{U}$ on n qubits. The task is to sample, for most circuits $U \in \mathcal{U}$, from any distribution $q_U$ s.t. $d_{\mathrm{VD}}(q_U, p_U) \le \varepsilon$, where $d_{\mathrm{VD}}(p, q)$ is the variational distance between the distributions p, q [103] and $p_U(x) := |\langle x|U|0\rangle|^{2}$.
Refs. [27, 90, 91] put forward new complexity-theoretic assumptions about the #P-hardness of certain problems and proved they imply that several restricted classes of circuits are hard to approximately sample for ε suﬃciently close to zero. However, we cannot use these results here as the ε we achieve is far from zero. We will resort to the following diﬀerent variant of approximate circuit sampling.
Unbiased-Noise F-Approximate Random Circuit Sampling: The input is a set of quantum circuits $\mathcal{U}$ on n qubits. The task is to sample from the distribution $r_{U,F}$ given by Eq. (112), for most circuits $U \in \mathcal{U}$.
We note that there are alternatives for defining the computational problem for which supremacy is achieved without having to use sampling problems. These have the advantage that it is possible to verify, for each problem instance, that the task was achieved (whereas, while it is in principle possible to verify that one is sampling from the correct distribution by estimating the frequencies of outcomes, this is infeasible in practice for high-entropy distributions with more than $2^{50}$ outcomes, such as the one we consider here).
One such problem (considered in Refs. [27, 30]) is the following:
b-Heavy Output Generation: Given as input a number b > 1 and a random circuit U on n qubits (drawn at random from a set of circuits $\mathcal{U}$), generate output strings $x_1, \ldots, x_k$ s.t.

$$\frac{1}{k}\sum_{j=1}^{k} |\langle x_j|U|0\rangle|^{2} \ge \frac{b}{2^{n}}. \tag{114}$$

Ref. [30] argues for the hardness of this task for every b > 1, although here again one has to resort to rather bold complexity-theoretic conjectures. Cross entropy benchmarking allows us to estimate b for a reasonable value of k (though the classical time needed to compute $|\langle x_j|U|0\rangle|^{2}$ still grows very fast), see Sec. IV. In terms of known algorithms, the complexity of solving Heavy Output Generation is equivalent to the complexity of sampling k samples from a noisy distribution corresponding to the same b value.
The experiment we report in this paper can be interpreted as showing quantum supremacy in solving b-Heavy Output Generation with b = 1 + F, where F is the fidelity of the output quantum state.
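The relation b = 1 + F can be made plausible with a short calculation. If samples follow Eq. (112), the expected value of $2^{n}\,p_U(x)$ is $1 + F\,(2^{n}\sum_x p_U(x)^{2} - 1)$, and for a Porter-Thomas-like ideal distribution the second moment $2^{n}\sum_x p_U(x)^{2}$ is close to 2. The sketch below checks this on a toy distribution; the exponential-weight stand-in for $p_U$ and all function names are assumptions of this example.

```python
import random

# Heavy-output score of Eq. (114) and its expectation under Eq. (112).
# Assumption: the ideal distribution is a toy stand-in with exponential
# (Porter-Thomas-like) weights. Function names are hypothetical.

def heavy_output_score(p_ideal, samples):
    """Estimate of b: 2^n times the mean ideal probability of the samples."""
    dim = len(p_ideal)
    return dim * sum(p_ideal[x] for x in samples) / len(samples)

def expected_score(p_ideal, fidelity):
    """Exact expectation of the score for samples drawn from Eq. (112)."""
    dim = len(p_ideal)
    second_moment = dim * sum(p * p for p in p_ideal)   # about 2 for Porter-Thomas
    return 1.0 + fidelity * (second_moment - 1.0)

rng = random.Random(7)
weights = [rng.expovariate(1.0) for _ in range(2 ** 12)]
total = sum(weights)
p = [w / total for w in weights]

for fidelity in (0.0, 0.01, 1.0):
    print(f"F = {fidelity}: expected b = {expected_score(p, fidelity):.4f}")
```

A uniform sampler (F = 0) scores exactly 1, so any b > 1 certifies some correlation with the ideal distribution; the excess over 1 is what cross entropy benchmarking estimates.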

C. Computational hardness of unbiased-noise sampling

To state our result, we use the complexity class Arthur-Merlin, which is a variant of the class NP and is denoted by AM[T]. It is defined as the class of problems for which there is an Arthur-Merlin one-round protocol of the following form: given an instance of a problem in AM[T] (which Arthur would like to decide if it is a YES or NO instance), Arthur first sends random bits to Merlin. Merlin (who is computationally unbounded) then sends back a proof to Arthur. Finally Arthur uses the proof and decides in time T if he accepts. In the YES case, Arthur accepts with probability larger than 2/3. In the NO case, he accepts with probability no larger than 1/3.

Theorem 1 Assume there is a classical algorithm running in time T and using m bits of randomness that samples from the distribution $r_{U,F}(x)$ given by Eq. (112), for a given quantum circuit U on n qubits and F ≥ 0. Then for every integer L, there is an AM[LT + 2Lm] protocol for deciding, given λ > 0, whether

$$|\langle 0|U|0\rangle|^{2} \ge \lambda\left(1 + \frac{2}{L}\right) + \frac{2(1-F)}{F L\, 2^{n}} \tag{115}$$

or

$$|\langle 0|U|0\rangle|^{2} \le \lambda\left(1 - \frac{2}{L}\right) - \frac{2(1-F)}{F L\, 2^{n}}. \tag{116}$$

Before giving the proof, let us discuss the significance of the result. We are interested in the theorem mostly when L = c/F with c a small constant (say 10). Noting that for a random circuit, with high probability, $|\langle 0|U|0\rangle|^{2} \ge 2^{-n}/5$ [97], the theorem states that if we can sample classically in time T from the distribution given in Eq. (112), then we can calculate a good estimate for $|\langle 0|U|0\rangle|^{2}$ in time 10T/F (with the help of an all-powerful but untrustworthy Merlin). It is unlikely that Merlin can be of any help for this task for random circuits, as estimating $|\langle 0|U|0\rangle|^{2}$ for random circuits is a #P-hard problem [92], and it is believed #P is vastly more complex than AM (which is contained in the third level of the polynomial hierarchy [98]). Therefore we conclude that global white noise leads to no more than a linear decrease in fidelity in classical simulation time (which is in fact optimal as it is achieved by the method presented in Ref. [38]).
Ref. [104] proposed a similar, but more demanding, conjecture about the non-existence of certain AM protocols for estimating transition probabilities of random circuits. This conjecture was applied to show that the output bits of our supremacy experiment can be used to produce certiﬁable random bits.
We note Theorem 1 does not establish a lower bound on the classical computation cost of calculating a transition amplitude with additive error $\delta/2^{n}$, for small constant δ > 0. What it does is to show that the sampling problem with unbiased noise is as hard as this task, up to a linear reduction in F in complexity.
Concerning the hardness of computing $|\langle 0|U|0\rangle|^{2}$, it is known that this problem is #P-hard for random circuits to additive error $2^{-\mathrm{poly}(n)}$ [92]. This implies that there are no subexponential-time algorithms for this task (unless #P collapses to P). For finite size bounds, which are more relevant to our experiment, the result of Ref. [99] is the most relevant. It shows that under the Strong Exponential Time Hypothesis (SETH) [105], there are quantum circuits on n qubits which require $2^{(1-o(1))n}$ time for estimating $|\langle 0|U|0\rangle|^{2}$ to additive error $2^{-(n+1)}$ [106]. Together with Theorem 1, we find there is a quantum circuit U on n qubits for which the distribution $r_{U,F}$ (given by Eq. (112)) cannot be sampled in time $F\,2^{(1-o(1))n}$, unless SETH is false.
It is an open question to show a similar lower bound to the one proved in Ref. [99] for estimating the transition probability of random circuits. Even more relevant for this work, it would be interesting to study if one can show a lower bound of the form $2^{(1-o(1))\,\mathrm{treewidth}}$ for a random quantum circuit, under a suitable complexity-theoretic assumption, as the depth of the construction in [99] is relatively high.
D. Proof of Theorem 1
The proof will follow along similar lines to previous work [90, 91, 95]. We will use approximate counting (which can be done in AM) to show that a sampling algorithm for $r_{U,F}$ running in time T implies an AM protocol to compute $r_{U,F}(0)(1 \pm 1/L)$, with classical verification of order LT. Since the noise is unbiased, i.e. $r_{U,F}(0) = F\,|\langle 0|U|0\rangle|^{2} + (1-F)/2^{n}$, we can subtract it and find an AM protocol for estimating $|\langle 0|U|0\rangle|^{2}$ as stated in the theorem.
In more detail, suppose there is a classical algorithm for sampling from $r_{U,F}$ given by a function A mapping m ∈ poly(n) bits $r = (r_1, \ldots, r_m)$ to n bits such that

$$\frac{1}{2^{m}}\,\bigl|\{(r_1, \ldots, r_m)\ \text{s.t.}\ A(r_1, \ldots, r_m) = x\}\bigr| = r_{U,F}(x). \tag{117}$$

Let $a(r_1, \ldots, r_m)$ be a function which is 1 if $A(r_1, \ldots, r_m) = 0^{n}$ and zero otherwise.
We start with the following lemma, showing the existence of A implies an AM[LT + 2Lm] protocol for estimating $r_{U,F}(0)$:
Lemma 1 Assume there is an algorithm A given by Eq. (117). Then for every θ and L there is an AM[LT + 2Lm] protocol which determines if (i) $r_{U,F}(0) \ge \theta(1 + 2/L)$ (YES instance) or (ii) $r_{U,F}(0) \le \theta(1 - 2/L)$ (NO instance).

Proof: The protocol is the following:

1. For every t ∈ [Lm], Arthur chooses a function $h_t \in H_{Lm,t}$ at random from a family $H_{Lm,t}$ of 2-universal linear hash functions from $\{0,1\}^{Lm}$ to $\{0,1\}^{t}$ [98]. Then he communicates his choice of $(h_1, \ldots, h_{Lm})$ to Merlin.
2. Merlin sends an Lm-bit string w to Arthur and an integer s ∈ [Lm].

3. Arthur verifies that $h_s(w) = 0$ and
$a(w_{1,1}, \ldots, w_{1,m}) \wedge \ldots \wedge a(w_{L,1}, \ldots, w_{L,m}) = 0$.
He rejects if any of the three equations is not satisfied. Then he checks if $\theta \le 2^{-m}\,20^{1/L}\,2^{s/L}\,(1 + 2/L)^{-1}$, accepting if it is the case and rejecting otherwise.

The cost to compute $a(w_{1,1}, \ldots, w_{1,m})$ is T, and the cost to compute $h_s(w)$ is less than 2Lm, so the total verification time of the AM protocol is LT + 2Lm.
Let us analyze the completeness and soundness of the protocol.

Completeness: Suppose we have a YES instance, $r_{U,F}(0) \ge \theta(1 + 2/L)$. Let us show that Merlin can send w and s which make Arthur accept with high probability.
Let M be the number of solutions of $a(r_1, \ldots, r_m) = 0$ (i.e. $M = 2^{m} r_{U,F}(0)$). Then $a(r_{1,1}, \ldots, r_{1,m}) \wedge \ldots \wedge a(r_{L,1}, \ldots, r_{L,m})$ has $M^{L}$ solutions, M for each copy of the function a. As part of the proof Merlin sends s satisfying $20 \ge M^{L}/2^{s} \ge 10$ (such a value always exists as s can be an arbitrary integer less than or equal to Lm).
Let us apply Lemma 2 (stated below) with q = Lm, t = s, δ = 1/2, and S the set of solutions, so $|S| = M^{L}$. Then indeed $|S|/2^{s} > 10 > 1/\delta^{3}$. Therefore, with high probability, the number of solutions of

$$a(x_{1,1}, \ldots, x_{1,m}) \wedge \ldots \wedge a(x_{L,1}, \ldots, x_{L,m}) \wedge h_s(x) \tag{118}$$

is in the interval $[(1/2)M^{L}/2^{s},\ 2M^{L}/2^{s}]$. Since $(1/2)M^{L}/2^{s} \ge 1$, there is a string w s.t.
$a(w_{1,1}, \ldots, w_{1,m}) \wedge \ldots \wedge a(w_{L,1}, \ldots, w_{L,m}) \wedge h_s(w) = 0$, which Merlin also sends to Arthur as part of the proof.
Since $M = 2^{m} r_{U,F}(0) \ge 2^{m}\theta(1 + 2/L)$ and $M^{L}/2^{s} \le 20$,

$$20 \ge \frac{M^{L}}{2^{s}} \ge \frac{2^{Lm}}{2^{s}}\,\theta^{L}\left(1 + \frac{2}{L}\right)^{L}, \tag{119}$$

so indeed $\theta \le 2^{-m}\,20^{1/L}\,2^{s/L}\,(1 + 2/L)^{-1}$ and Arthur will accept with high probability.

Soundness: Suppose we have a NO instance, $r_{U,F}(0) \le \theta(1 - 2/L)$. Let us show that no matter which witnesses w, s Merlin sends, Arthur will only accept with a small probability. Merlin must send s such that

$$\theta^{L} \le (20)\,2^{-Lm}\,2^{s}\,(1 + 2/L)^{-L}, \tag{120}$$

otherwise Arthur rejects. By Lemma 2 (stated below), the number of solutions of

$$a(x_{1,1}, \ldots, x_{1,m}) \wedge \ldots \wedge a(x_{L,1}, \ldots, x_{L,m}) \wedge h_s(x) \tag{121}$$

will be in the interval $[(1/2)M^{L}/2^{s},\ 2M^{L}/2^{s}]$, with $M = 2^{m} r_{U,F}(0) \le 2^{m}\theta(1 - 2/L)$. Since

$$2M^{L}/2^{s} \le 2\,(2^{-s})\,2^{Lm}\,\theta^{L}(1 - 2/L)^{L} \le 40\,(1 - 2/L)^{L}(1 + 2/L)^{-L} \le 40e^{-4} < 1, \tag{122}$$

there is no solution to Eq. (121) and thus there is no w which will make Arthur accept. This finishes the proof of Lemma 1.
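The numerical inequalities used in the soundness argument can be checked directly. The sketch below evaluates the middle expression of Eq. (122) for a range of L and confirms that it never exceeds $40e^{-4} \approx 0.73$, and it also checks that the acceptance threshold in step 3 of the protocol is the L-th root of the right-hand side of Eq. (120). The function names and the sample values of m, L, s are illustrative choices, not part of the proof.

```python
import math

# Numerical sanity check of Eqs. (120) and (122).
# Function names and the sample parameters are hypothetical.

def soundness_bound(L):
    """Middle expression of Eq. (122): 40 * (1 - 2/L)^L * (1 + 2/L)^(-L)."""
    return 40.0 * ((1.0 - 2.0 / L) / (1.0 + 2.0 / L)) ** L

def arthur_threshold(m, L, s):
    """Largest theta accepted in step 3: 2^-m * 20^(1/L) * 2^(s/L) / (1 + 2/L)."""
    return 2.0 ** (-m) * 20.0 ** (1.0 / L) * 2.0 ** (s / L) / (1.0 + 2.0 / L)

limit = 40.0 * math.exp(-4.0)
worst = max(soundness_bound(L) for L in range(2, 2000))
print(f"max over L of Eq. (122) middle term: {worst:.6f} (limit {limit:.6f})")

m, L, s = 5, 10, 30
rhs_120 = 20.0 * 2.0 ** (-L * m) * 2.0 ** s * (1.0 + 2.0 / L) ** (-L)
print("threshold^L / rhs of Eq. (120):", arthur_threshold(m, L, s) ** L / rhs_120)
```

The bound approaches $40e^{-4}$ from below as L grows, since $\ln\frac{1-x}{1+x} \le -2x$ for $0 \le x < 1$ with $x = 2/L$.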
Reduction to AM protocol for $|\langle 0|U|0\rangle|^{2}$: Finally let us show how to use Lemma 1 to build the AM protocol stated in Theorem 1. Since $r_{U,F}(0) = F\,|\langle 0|U|0\rangle|^{2} + (1-F)/2^{n}$, on one hand:

$$|\langle 0|U|0\rangle|^{2} \ge \lambda\left(1 + \frac{2}{L}\right) + \frac{2(1-F)}{F L\, 2^{n}} \tag{123}$$

implies that

$$r_{U,F}(0) \ge \left(F\lambda + (1-F)/2^{n}\right)\left(1 + \frac{2}{L}\right). \tag{124}$$

On the other hand:

$$|\langle 0|U|0\rangle|^{2} \le \lambda\left(1 - \frac{2}{L}\right) - \frac{2(1-F)}{F L\, 2^{n}} \tag{125}$$

implies that

$$r_{U,F}(0) \le \left(F\lambda + (1-F)/2^{n}\right)\left(1 - \frac{2}{L}\right). \tag{126}$$

Setting $\theta = F\lambda + (1-F)/2^{n}$ we see that the AM protocol from before can also be used to decide if Eq. (123) or Eq. (125) holds true. This ends the proof of the theorem.
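The algebra linking Eqs. (123) and (125) to Eqs. (124) and (126) can be verified exactly with rational arithmetic. The following sketch does so for one illustrative parameter choice (F = 1/100, L = 10/F, n = 10, λ = $2^{-n}$); these values and the function names are assumptions made for the example only.

```python
from fractions import Fraction

# Exact check that the thresholds of Theorem 1 map onto those of Lemma 1
# with theta = F*lambda + (1 - F)/2^n. Parameter values are illustrative.

def noisy_zero_probability(p0, F, n):
    """r_{U,F}(0) = F * p0 + (1 - F) / 2^n, as in Eq. (112)."""
    return F * p0 + (1 - F) / Fraction(2) ** n

def theorem_thresholds(lam, L, F, n):
    """Right-hand sides of Eqs. (123) and (125)."""
    gap = 2 * (1 - F) / (F * L * Fraction(2) ** n)
    yes = lam * (1 + Fraction(2, L)) + gap
    no = lam * (1 - Fraction(2, L)) - gap
    return yes, no

F = Fraction(1, 100)
L = 1000                      # L = c / F with c = 10
n = 10
lam = Fraction(1, 2 ** n)

theta = F * lam + (1 - F) / Fraction(2) ** n
yes, no = theorem_thresholds(lam, L, F, n)

print(noisy_zero_probability(yes, F, n) == theta * (1 + Fraction(2, L)))
print(noisy_zero_probability(no, F, n) == theta * (1 - Fraction(2, L)))
```

Both comparisons hold with equality, so a transition probability exactly at the threshold of Eq. (123) or Eq. (125) produces a noisy probability exactly at the YES or NO boundary of Lemma 1.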

Lemma 2 [98] For t ≤ q, let $H_{q,t}$ be a family of pairwise-independent linear hash functions mapping $\{0,1\}^{q}$ to $\{0,1\}^{t}$, and let δ > 0. Let $S \subseteq \{0,1\}^{q}$ be arbitrary with $|S| \ge \delta^{-3}2^{t}$. Then with probability larger than 9/10 over the choice of $h \in H_{q,t}$,

$$(1 - \delta)\,\frac{|S|}{2^{t}} \le \bigl|\{x \in S \mid h(x) = 0^{t}\}\bigr| \le (1 + \delta)\,\frac{|S|}{2^{t}}. \tag{127}$$

Moreover h(x) can be evaluated in time 2q, for every $h \in H_{q,t}$.
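The concentration statement of Lemma 2 can be observed empirically on small instances. The sketch below draws random hash functions of the form h(x) = Ax + b over GF(2), which is one standard pairwise-independent family and is an assumption of this example (the text does not specify the family beyond "linear"), and measures how often the number of elements of S hashed to $0^{t}$ falls within the bounds of Eq. (127). The set S, the parameters, and all function names are illustrative.

```python
import random

# Empirical illustration of Lemma 2 with random affine maps over GF(2).
# Assumption: h(x) = A x + b with uniformly random A and b; this family is
# pairwise independent. Parameters and names are hypothetical.

def parity(x):
    """Parity of the number of set bits of a non-negative integer."""
    return bin(x).count("1") & 1

def random_affine_hash(q, t, rng):
    """Return h mapping q-bit integers to t-bit integers, h(x) = A x + b."""
    rows = [rng.getrandbits(q) for _ in range(t)]       # rows of A
    offsets = [rng.getrandbits(1) for _ in range(t)]    # bits of b
    def h(x):
        out = 0
        for row, b in zip(rows, offsets):
            out = (out << 1) | (parity(row & x) ^ b)
        return out
    return h

def fraction_within_bounds(q, t, set_size, delta, trials, seed=0):
    """Fraction of random hashes whose preimage count of 0^t obeys Eq. (127)."""
    rng = random.Random(seed)
    S = rng.sample(range(2 ** q), set_size)
    expected = set_size / 2 ** t
    hits = 0
    for _ in range(trials):
        h = random_affine_hash(q, t, rng)
        count = sum(1 for x in S if h(x) == 0)
        if (1 - delta) * expected <= count <= (1 + delta) * expected:
            hits += 1
    return hits / trials

frac = fraction_within_bounds(q=16, t=3, set_size=1000, delta=0.5, trials=200)
print(f"fraction of hashes within (1 +/- delta)|S|/2^t: {frac:.3f}")
```

Here $|S| = 1000 \ge \delta^{-3}2^{t} = 64$, so the lemma guarantees a success probability above 9/10; evaluating h costs t parities of q-bit words, in line with the linear evaluation time stated in the lemma.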


ACKNOWLEDGMENTS
We acknowledge Georg Goerg for consultation on statistical analyses. This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725.
Correspondence and requests for materials should be addressed to John M. Martinis (jmartinis@google.com).
† Frank Arute1, Kunal Arya1, Ryan Babbush1, Dave Bacon1, Joseph C. Bardin1,2, Rami Barends1, Rupak Biswas3, Sergio Boixo1, Fernando G.S.L. Brandao1,4, David A. Buell1, Brian Burkett1, Yu Chen1, Zijun Chen1, Ben Chiaro5, Roberto Collins1, William Courtney1, Andrew Dunsworth1, Edward Farhi1, Brooks Foxen1,5, Austin Fowler1, Craig Gidney1, Marissa Giustina1, Rob Graff1, Keith Guerin1, Steve Habegger1, Matthew P. Harrigan1, Michael J. Hartmann1,6, Alan Ho1, Markus Hoffmann1, Trent Huang1, Travis S. Humble7, Sergei V. Isakov1, Evan Jeffrey1, Zhang Jiang1, Dvir Kafri1, Kostyantyn Kechedzhi1, Julian Kelly1, Paul V. Klimov1, Sergey Knysh1, Alexander Korotkov1,8, Fedor Kostritsa1, David Landhuis1, Mike Lindmark1, Erik Lucero1, Dmitry Lyakh9, Salvatore Mandrà3,10, Jarrod R. McClean1, Matthew McEwen5, Anthony Megrant1, Xiao Mi1, Kristel Michielsen11,12, Masoud Mohseni1, Josh Mutus1, Ofer Naaman1, Matthew Neeley1, Charles Neill1, Murphy Yuezhen Niu1, Eric Ostby1, Andre Petukhov1, John C. Platt1, Chris Quintana1, Eleanor G. Rieffel3, Pedram Roushan1, Nicholas C. Rubin1, Daniel Sank1, Kevin J. Satzinger1, Vadim Smelyanskiy1, Kevin J. Sung1,13, Matthew D. Trevithick1, Amit Vainsencher1, Benjamin Villalonga1,14, Theodore White1, Z. Jamie Yao1, Ping Yeh1, Adam Zalcman1, Hartmut Neven1, John M. Martinis1,5
1. Google AI Quantum, Mountain View, CA, USA, 2. Department of Electrical and Computer Engineering, University of Massachusetts Amherst, Amherst, MA, USA, 3. Quantum Artificial Intelligence Lab. (QuAIL), NASA Ames Research Center, Moffett Field, USA, 4. Institute for Quantum Information and Matter, Caltech, Pasadena, CA, USA, 5. Department of Physics, University of California, Santa Barbara, CA, USA, 6. Friedrich-Alexander University Erlangen-Nürnberg (FAU), Department of Physics, Erlangen, Germany, 7. Quantum Computing Institute, Oak Ridge National Laboratory, Oak Ridge, TN, USA, 8. Department of Electrical and Computer Engineering, University of California, Riverside, CA, USA, 9. Scientific Computing, Oak Ridge Leadership Computing, Oak Ridge National Laboratory, Oak Ridge, TN, USA, 10. Stinger Ghaffarian Technologies Inc., Greenbelt, MD, USA, 11. Institute for Advanced Simulation, Jülich Supercomputing Centre, Forschungszentrum Jülich, Jülich, Germany, 12. RWTH Aachen University, Aachen, Germany, 13. Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA, 14. Department of Physics, University of Illinois at Urbana-Champaign, Urbana, IL, USA
ERRATUM
The caption of Figure 4 in the main paper [1] incorrectly states that the error bars in the ﬁgure represent both statistical and systematic uncertainty. They represent the statistical uncertainty. See Figure S41 for comparison of both types of uncertainty and discussion in Section VIII for details. Note that both types of uncertainty were accounted for in the analysis and all conclusions remain intact.

[1] Arute, F. et al. Quantum supremacy using a programmable superconducting processor. Nature 574, 505 (2019). URL https://doi.org/10.1038/s41586-019-1666-5
[2] Barends, R. et al. Superconducting quantum circuits at the surface code threshold for fault tolerance. Nature 508, 500 (2014).
[3] Neill, C. A path towards quantum supremacy with superconducting qubits. Ph.D. thesis, University of California, Santa Barbara (2017).
[4] Yan, F. et al. Tunable coupling scheme for implementing high-ﬁdelity two-qubit gates. Phys. Rev. Applied 10, 054062 (2018).
[5] Chen, Y. et al. Qubit architecture with high coherence and fast tunable coupling. Phys. Rev. Lett. 113, 220502 (2014).
[6] Neill, C. et al. A blueprint for demonstrating quantum supremacy with superconducting qubits. Science 360, 195–199 (2018).
[7] Khezri, M., Dressel, J. & Korotkov, A. N. Qubit measurement error from coupling with a detuned neighbor in circuit QED. Phys. Rev. A 92, 052306 (2015).
[8] Tucci, R. R. An introduction to Cartan's KAK decomposition for QC programmers. Preprint at https://arxiv.org/abs/quant-ph/0507171 (2005).

[9] Dunsworth, A. High ﬁdelity entangling gates in superconducting qubits. Ph.D. thesis, University of California, Santa Barbara (2018).
[10] Dunsworth, A. et al. A method for building low loss multi-layer wiring for superconducting microwave devices. Appl. Phys. Lett. 112, 063502 (2018).
[11] Rosenberg, D. et al. 3D integrated superconducting qubits. npj Quantum Inf. 3, 42 (2017).
[12] Foxen, B. et al. Qubit compatible superconducting interconnects. Quantum Sci. Tech. 3, 014005 (2017).
[13] Foxen, B. et al. High speed ﬂux sampling for tunable superconducting qubits with an embedded cryogenic transducer. Supercond. Sci. Technol. 32, 015012 (2018).
[14] Blais, A., Huang, R.-S., Wallraﬀ, A., Girvin, S. M. & Schoelkopf, R. J. Cavity quantum electrodynamics for superconducting electrical circuits: An architecture for quantum computation. Phys. Rev. A 69, 062320 (2004).
[15] Gambetta, J. et al. Qubit-photon interactions in a cavity: Measurement-induced dephasing and number splitting. Phys. Rev. A 74, 042318 (2006).
[16] Bultink, C. C. et al. General method for extracting the quantum eﬃciency of dispersive qubit readout in circuit QED. Appl. Phys. Lett. 112, 092601 (2018).

[17] Sank, D. et al. Measurement-induced state transitions in a superconducting qubit: Beyond the rotating wave approximation. Phys. Rev. Lett. 117, 190503 (2016).
[18] Clerk, A. A., Devoret, M. H., Girvin, S. M., Marquardt, F. & Schoelkopf, R. J. Introduction to quantum noise, measurement, and ampliﬁcation. Rev. Mod. Phys. 82, 1155–1208 (2010).
[19] Caves, C. M. Quantum limits on noise in linear ampliﬁers. Phys. Rev. D 26, 1817–1839 (1982).
[20] Mutus, J. Y. et al. Strong environmental coupling in a Josephson parametric ampliﬁer. Appl. Phys. Lett. 104, 263513 (2014).
[21] Ryan, C. A. et al. Tomography via correlation of noisy measurement records. Phys. Rev. A 91, 022118 (2015).
[22] Jeﬀrey, E. et al. Fast accurate state measurement with superconducting qubits. Phys. Rev. Lett. 112, 190504 (2014).
[23] Sank, D. What is the connection between analog signal to noise ratio and signal to noise ratio in the IQ plane in a quadrature demodulation system? Signal Processing Stack Exchange. URL https://dsp.stackexchange.com/questions/24372 (2015).
[24] Reed, M. D. et al. Fast reset and suppressing spontaneous emission of a superconducting qubit. Appl. Phys. Lett. 96, 203110 (2010).
[25] Sete, E. A., Martinis, J. M. & Korotkov, A. N. Quantum theory of a bandpass Purcell filter for qubit readout. Phys. Rev. A 92, 012325 (2015).
[26] Chen, Y. et al. Multiplexed dispersive readout of superconducting phase qubits. Appl. Phys. Lett. 101, 182601 (2012).
[27] Boixo, S. et al. Characterizing quantum supremacy in near-term devices. Nat. Phys. 14, 595 (2018).
[28] Wootters, W. K. Random quantum states. Found. Phys. 20, 1365–1378 (1990).
[29] Emerson, J., Livine, E. & Lloyd, S. Convergence conditions for random quantum circuits. Phys. Rev. A 72, 060302 (2005).
[30] Aaronson, S. & Chen, L. Complexity-theoretic foundations of quantum supremacy experiments. In 32nd Computational Complexity Conference (CCC 2017) (2017).
[31] Magesan, E., Gambetta, J. M. & Emerson, J. Characterizing quantum gates via randomized benchmarking. Phys. Rev. A 85 (2012).
[32] Magesan, E., Gambetta, J. M. & Emerson, J. Robust randomized benchmarking of quantum processes. Phys. Rev. Lett. 106 (2011).
[33] Popescu, S., Short, A. J. & Winter, A. Entanglement and the foundations of statistical mechanics. Nat. Phys. 2, 754 (2006).
[34] Bremner, M. J., Mora, C. & Winter, A. Are random pure states useful for quantum computation? Phys. Rev. Lett. 102, 190502 (2009).
[35] Gross, D., Flammia, S. T. & Eisert, J. Most quantum states are too entangled to be useful as computational resources. Phys. Rev. Lett. 102, 190501 (2009).
[36] McClean, J. R., Boixo, S., Smelyanskiy, V. N., Babbush, R. & Neven, H. Barren plateaus in quantum neural network training landscapes. Nat. Comm. 9, 4812 (2018).
[37] Ledoux, M. The concentration of measure phenomenon. 89 (American Mathematical Society, 2005).
[38] Markov, I. L., Fatima, A., Isakov, S. V. & Boixo, S. Quantum supremacy is both closer and farther than it appears. Preprint at https://arxiv.org/pdf/1807.10749 (2018).
[39] Cross, A. W., Bishop, L. S., Sheldon, S., Nation, P. D. & Gambetta, J. M. Validating quantum computers using randomized model circuits. Phys. Rev. A 100, 032328 (2019).
[40] Kelly, J., O’Malley, P., Neeley, M., Neven, H. & Martinis, J. M. Physical qubit calibration on a directed acyclic graph. Preprint at https://arXiv.org/abs/1803.03226 (2019).
[41] Wallraff, A. et al. Strong coupling of a single photon to a superconducting qubit using circuit quantum electrodynamics. Nature 431, 162 (2004).
[42] Chen, Z. Metrology of quantum control and measurement in superconducting qubits. Ph.D. thesis, University of California, Santa Barbara (2018).
[43] Chen, Z. et al. Measuring and suppressing quantum state leakage in a superconducting qubit. Phys. Rev. Lett. 116, 020501 (2016).
[44] Klimov, P. V. et al. Fluctuations of energy-relaxation times in superconducting qubits. Phys. Rev. Lett. 121, 090502 (2018).
[45] Kelly, J. et al. Optimal quantum control using randomized benchmarking. Phys. Rev. Lett. 112, 240504 (2014).
[46] Bialczak, R. C. et al. Quantum process tomography of a universal entangling gate implemented with Josephson phase qubits. Nat. Phys. 6, 409 (2010).
[47] Martinis, J. M. & Geller, M. R. Fast adiabatic qubit gates using only σz control. Phys. Rev. A 90, 022307 (2014).
[48] DiCarlo, L. et al. Demonstration of two-qubit algorithms with a superconducting quantum processor. Nature 460, 240 (2009).
[49] Kivlichan, I. D. et al. Quantum simulation of electronic structure with linear depth and connectivity. Phys. Rev. Lett. 120, 110501 (2018).
[50] Villalonga, B. et al. A flexible high-performance simulator for the verification and benchmarking of quantum circuits implemented on real hardware. npj Quantum Information 5, 1 (2019).
[51] Wallman, J., Granade, C., Harper, R. & Flammia, S. T. Estimating the coherence of noise. New J. Phys. 17, 113020 (2015).
[52] Erhard, A. et al. Characterizing large-scale quantum computers via cycle benchmarking. Preprint at https://arxiv.org/pdf/1902.08543 (2019).
[53] Johnson, J. E. et al. Heralded state preparation in a superconducting qubit. Phys. Rev. Lett. 109, 050506 (2012).
[54] Sank, D. T. Fast, accurate state measurement in superconducting qubits. Ph.D. thesis, University of California, Santa Barbara (2014).
[55] Experimental data repository URL https://doi.org/10.5061/dryad.k6t1rj8
[56] Babbush, R. et al. Low-depth quantum simulation of materials. Phys. Rev. X 8, 011044 (2018).
[57] Boykin, P., Mor, T., Pulver, M., Roychowdhury, V. & Vatan, F. On universal and fault-tolerant quantum computing. Proc. 40th Annual Symposium on Foundations of Computer Science, IEEE Computer Society Press (1999).
[58] DiVincenzo, D. P. Two-bit gates are universal for quantum computation. Phys. Rev. A 51, 1015–1022 (1995).

[59] Shor, P. W. Scheme for reducing decoherence in quantum computer memory. Phys. Rev. A 52, R2493(R) (1995).
[60] Knill, E. et al. Randomized benchmarking of quantum gates. Phys. Rev. A 77, 012307 (2008).
[61] Lehmann, E. L. & Romano, J. P. Testing Statistical Hypotheses (Springer-Verlag New York, 2005).
[62] Jones, E., Oliphant, T., Peterson, P. et al. SciPy: Open source scientific tools for Python (2001). URL http://www.scipy.org/ (2016).
[63] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2017). URL https://www.R-project.org/
[64] Efron, B. Bootstrap methods: Another look at the jackknife. Ann. Stat. 7, 1 (1979).
[65] Dyakonov, M. The case against quantum computing. IEEE Spectrum 56, 3, 24–29 (2019).
[66] Kalai, G. The argument against quantum computers. Preprint at https://arxiv.org/pdf/1908.02499 (2019).
[67] Smelyanskiy, M., Sawaya, N. P. & Aspuru-Guzik, A. qHiPSTER: the quantum high performance software testing environment. Preprint at https://arxiv.org/pdf/1601.07195 (2016).
[68] Villalonga, B. et al. Establishing the quantum supremacy frontier with a 281 Pflop/s simulation. Preprint at https://arxiv.org/pdf/1905.00444 (2019).
[69] Markov, I. L. & Shi, Y. Simulating quantum computation by contracting tensor networks. SIAM J. Comput. 38, 963–981 (2008).
[70] Boixo, S., Isakov, S. V., Smelyanskiy, V. N. & Neven, H. Simulation of low-depth quantum circuits as complex undirected graphical models. Preprint at https://arxiv.org/pdf/1712.05384 (2017).
[71] Lyakh, D. Tensor algebra library routines for shared memory systems. URL https://github.com/DmitryLyakh/TAL_SH (2019).
[72] Chen, J. et al. Classical simulation of intermediate-size quantum circuits. Preprint at https://arxiv.org/pdf/1805.01450 (2018).
[73] Guo, C. et al. General-purpose quantum circuit simulator with projected entangled-pair states and the quantum supremacy frontier. Preprint at https://arxiv.org/pdf/1905.08394 (2019).
[74] De Raedt, K. et al. Massively parallel quantum computer simulator. Comput. Phys. Commun. 176, 121–136 (2007).
[75] De Raedt, H. et al. Massively parallel quantum computer simulator, eleven years later. Comput. Phys. Commun. 237, 47–61 (2019).
[76] Krause, D. & Thörnig, P. Jureca: Modular supercomputer at Jülich supercomputing centre. Journal of large-scale research facilities JLSRF 4, 132 (2018).
[77] Krause, D. Juwels: Modular tier-0/1 supercomputer at the Jülich supercomputing centre. Journal of large-scale research facilities JLSRF 5, 135 (2019).
[78] Kalai, G. & Kindler, G. Gaussian noise sensitivity and bosonsampling. arXiv:1409.3093 (2014).
[79] Bremner, M. J., Montanaro, A. & Shepherd, D. J. Achieving quantum supremacy with sparse and noisy commuting quantum computations. Quantum 1, 8 (2017).
[80] Yung, M.-H. & Gao, X. Can chaotic quantum circuits maintain quantum supremacy under noise? arXiv:1706.08913 (2017).
[81] Boixo, S., Smelyanskiy, V. N. & Neven, H. Fourier analysis of sampling from noisy chaotic quantum circuits. arXiv:1708.01875 (2017).
[82] Nahum, A., Vijay, S. & Haah, J. Operator spreading in random unitary circuits. Phys. Rev. X 8, 021014 (2018).
[83] Von Keyserlingk, C., Rakovszky, T., Pollmann, F. & Sondhi, S. L. Operator hydrodynamics, OTOCs, and entanglement growth in systems without conservation laws. Phys. Rev. X 8, 021013 (2018).
[84] Gogate, V. & Dechter, R. A complete anytime algorithm for treewidth. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, 201–208 (2004).
[85] Zhang, F. et al. Alibaba cloud quantum development kit: Large-scale classical simulation of quantum circuits. Preprint at https://arxiv.org/pdf/1907.11217 (2019).
[86] Chen, M.-C. et al. Quantum teleportation-inspired algorithm for sampling large random quantum circuits. Preprint at https://arxiv.org/pdf/1901.05003 (2019).
[87] We assume 2 vCPUs per core.
[88] Koomey, J. & Naffziger, S. Energy efficiency of computing: What’s next? Electronic Design. URL https://www.electronicdesign.com/microprocessors/energy-efficiency-computing-what-s-next (2016).
[89] Preskill, J. Quantum computing and the entanglement frontier. Rapporteur talk at the 25th Solvay Conference on Physics, Brussels (2012).
[90] Aaronson, S. & Arkhipov, A. The computational complexity of linear optics. In Proceedings of the Forty-third Annual ACM Symposium on Theory of Computing, 333–342 (2011).
[91] Bremner, M. J., Montanaro, A. & Shepherd, D. J. Average-case complexity versus approximate simulation of commuting quantum computations. Phys. Rev. Lett. 117, 080501 (2016).
[92] Bouland, A., Fefferman, B., Nirkhe, C. & Vazirani, U. On the complexity and verification of quantum random circuit sampling. Nat. Phys. 15, 159 (2019).
[93] Movassagh, R. Cayley path and quantum computational supremacy: A proof of average-case #P-hardness of random circuit sampling with quantified robustness. Preprint at https://arxiv.org/pdf/1909.06210 (2019).
[94] Bremner, M. J., Jozsa, R. & Shepherd, D. J. Classical simulation of commuting quantum computations implies collapse of the polynomial hierarchy. Proc. Royal Soc. A 467, 459–472 (2010).
[95] Harrow, A. W. & Montanaro, A. Quantum computational supremacy. Nature 549, 203 (2017).
[96] Terhal, B. M. & DiVincenzo, D. P. Adaptive quantum computation, constant depth quantum circuits and Arthur-Merlin games. Quant. Inf. Comp. 4, 134–145 (2004).
[97] Harrow, A. W. & Mehraban, S. Approximate unitary t-designs by short random quantum circuits using nearest-neighbor and long-range gates. Preprint at https://arxiv.org/pdf/1809.06957 (2018).
[98] Arora, S. & Barak, B. Computational complexity: a modern approach (Cambridge University Press, 2009).
[99] Huang, C., Newman, M. & Szegedy, M. Explicit lower bounds on strong quantum simulation. Preprint at https://arxiv.org/pdf/1804.10368 (2018).
[100] Dalzell, A. M., Harrow, A. W., Koh, D. E. & La Placa, R. L. How many qubits are needed for quantum computational supremacy? Preprint at https://arxiv.org/pdf/1805.05224.pdf (2018).
[101] Morimae, T. & Tamaki, S. Fine-grained quantum supremacy of the one-clean-qubit model. Preprint at https://arxiv.org/pdf/1901.01637 (2019).
[102] Aaronson, S. The equivalence of sampling and searching. Theory of Computing Systems 55, 281–298 (2014).
[103] The variational distance is defined as $d_{\mathrm{VD}}(p, q) := \frac{1}{2} \sum_x |p(x) - q(x)|$, where the sum is over all output bitstrings $x$.
[104] Aaronson, S. Certifiable randomness from supremacy. Manuscript in preparation (2019).
[105] Calabro, C., Impagliazzo, R. & Paturi, R. The complexity of satisfiability of small depth circuits. In International Workshop on Parameterized and Exact Computation, 75–85 (Springer, 2009).
[106] In terms of depth, the current construction gives a circuit $U$ on $n$ qubits of depth $n^{3k/2+5/2}$ for which it takes time $2^{(1-2/k)n}$ to estimate the transition probability to additive error $2^{-(n+1)}$, assuming a form of SETH stating that it takes time no less than $2^{(1-2/k)n}$ to solve $k$-SAT.