Supplementary information for “Quantum supremacy using a programmable superconducting processor” [1]
Google AI Quantum and collaborators† (Dated: January 1, 2020)
arXiv:1910.11333v2 [quant-ph] 28 Dec 2019
CONTENTS
I. Device design and architecture
II. Fabrication and layout
III. Qubit control and readout
  A. Control
  B. Readout
IV. XEB theory
  A. XEB of a small number of qubits
  B. XEB of a large number of qubits
  C. Two limiting cases
  D. Measurement errors
V. Quantifying errors
VI. Metrology and calibration
  A. Calibration overview
    1. Device registry
    2. Scheduling calibrations: “Optimus”
  B. Calibration procedure
    1. Device configuration
    2. Root config: procedure
    3. Single-qubit config: procedure
    4. Optimizing qubit operating frequencies
    5. Grid config: procedure
  C. Two-qubit gate metrology
    1. The natural two-qubit gate for transmon qubits
    2. Using cross entropy to learn a unitary model
    3. Comparison with randomized benchmarking
    4. Speckle purity benchmarking (SPB)
    5. “Per-layer” parallel XEB
  D. Grid readout calibration
    1. Choosing qubit frequencies for readout
    2. Single qubit calibration
    3. Characterizing multi-qubit readout
  E. Summary of system parameters
VII. Quantum circuits
  A. Background
  B. Overview and technical requirements
  C. Circuit structure
  D. Randomness
  E. Quantum gates
  F. Programmability and universality
    1. Decomposition of CZ into fSim gates
    2. Universality for SU(2)
  G. Circuit variants
    1. Gate elision
    2. Wedge formation
VIII. Large scale XEB results
  A. Limitations of full circuits
  B. Patch circuits: a quick performance indicator for large systems
  C. Elided circuits: a more rigorous performance estimator for large systems
  D. Choice of unitary model for two-qubit entangling gates
  E. Understanding system performance: error model prediction
  F. Distribution of bitstring probabilities
  G. Statistical uncertainties of XEB measurements
  H. System stability and systematic uncertainties
  I. The fidelity result and the null hypothesis on quantum supremacy
IX. Sensitivity of XEB to errors
X. Classical simulations
  A. Local Schrödinger and Schrödinger-Feynman simulators
  B. Feynman simulator
  C. Supercomputer Schrödinger simulator
  D. Simulation of random circuit sampling with a target fidelity
    1. Optimality of the Schmidt decomposition for gates embedded in a random circuit
    2. Classical speedup for imbalanced gates
    3. Verifiable and supremacy circuits
  E. Treewidth upper bounds and variable elimination algorithms
  F. Computational cost estimation for the sampling task
  G. Understanding the scaling with width and depth of the computational cost of verification
    1. Runtime scaling formulas
    2. Assumptions and corrections
    3. Fitting constants
    4. Memory usage scaling
  H. Energy advantage for quantum computing
XI. Complexity-theoretic foundation of the experiment
  A. Error model
  B. Definition of computational problem
  C. Computational hardness of unbiased-noise sampling
  D. Proof of Theorem 1
Acknowledgments
Erratum
References
I. DEVICE DESIGN AND ARCHITECTURE
The Sycamore device was designed with both the quantum supremacy experiment [1] and small noisy intermediate scale quantum (NISQ) applications in mind. The architecture is also suitable for initial experiments with quantum error correction based on the surface code. While we are targeting 0.1% error two-qubit gates for error correction, a quantum supremacy demonstration can be achieved with 0.3-0.6% error rates.
For decoherence-dominated errors, a 0.1% error means a factor of about 1000 between coherence and gate times. For example, a 25 µs coherence time implies a 25 ns gate. A key design objective in our architecture is achieving short two-qubit gate time, leading to the choice of tunable transmon qubits with direct, tunable coupling.
A difficult challenge for achieving a high-performance two-qubit gate is designing a sufficiently strong coupling when the gate is active, which is needed for fast gates, while minimizing the coupling otherwise for low residual control errors. These two competing requirements are difficult to satisfy with a fixed-coupling architecture: our prior processors [2] used large qubit-qubit detuning (1 GHz) to turn off the effective interaction, requiring relatively high-amplitude precise flux pulses to tune the qubit frequencies to implement a CZ gate. In the Sycamore device, we use adjustable couplers [3] as a natural solution to this control problem, albeit at the cost of more wiring and control signals. This means that the qubits can idle at much smaller relative detuning. We chose a capacitor-coupled design [3, 4], which is simpler to lay out and scale, over the inductor-based coupler of previous gmon devices [5, 6]. In Sycamore, the coupling g is tunable from 5 MHz to 40 MHz. The experiment uses an “on” coupling of about 20 MHz.
By needing only small frequency excursions to perform a two-qubit gate, the tunable qubit can be operated much closer to its maximum frequency, thus greatly reducing flux sensitivity and dephasing from 1/f flux noise. Additionally, the coupling can be turned off during measurement, reducing the effect of measurement crosstalk, a phenomenon that has proven somewhat difficult to understand and minimize [7].
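The interaction Hamiltonian of a system of on-resonance transmons with adjustable coupling (truncated to the qubit levels) has the following approximate form,
$$ H_{\rm int}(t) \approx \sum_{\langle i,j \rangle} g_{ij}(t) \left( \sigma_i^+ \sigma_j^- + \sigma_i^- \sigma_j^+ \right) + \frac{g_{ij}^2(t)}{|\eta|}\, \sigma_i^z \sigma_j^z , \tag{1} $$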
where gij is the nearest neighbor coupling, η is the nonlinearity of the qubits (roughly constant), i and j index nearest-neighbor qubit pairs, and σ± = (σx ±iσy)/2. We pulse the coupling in time to create coupling gates.
Our two-qubit gate can be understood using Cartan decomposition [8], which enables an arbitrary two-qubit gate to be decomposed into four single-qubit gates around a central two-qubit gate that can be described by a unitary matrix describing only XX, YY and ZZ interactions, with 3 parameters indicating their strengths. For the physical interaction describing our hardware, we see a swapping interaction between the $|01\rangle$ and $|10\rangle$ qubit states, corresponding to an XX+YY interaction. Interaction of the qubit $|11\rangle$ state with the $|2\rangle$ states of the data transmons produces a phase shift of that state, corresponding to a ZZ interaction. By changing the qubit frequencies and coupling strength we can vary the magnitude of these interactions, giving net control of 2 out of the 3 possible parameters for an arbitrary gate.
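The swapping part of the interaction can be illustrated numerically. The following is a minimal sketch, not the calibration code used in the experiment: it assumes the quoted coupling is g/2π in Hz, keeps only the XX+YY term of Eq. (1) for a single pair (the ZZ term is neglected), and uses a hypothetical helper name `swap_unitary`.

```python
import numpy as np

def swap_unitary(g_hz: float, t_s: float) -> np.ndarray:
    """Evolve two qubits under H = 2*pi*g*(s+ s- + s- s+) for time t (hbar = 1).

    Assumption: g_hz is the coupling expressed as a frequency in Hz.
    Basis order is |00>, |01>, |10>, |11>.
    """
    raising = np.array([[0, 0], [1, 0]], dtype=complex)   # maps |0> to |1>
    lowering = raising.conj().T                            # maps |1> to |0>
    h = 2 * np.pi * g_hz * (np.kron(raising, lowering) + np.kron(lowering, raising))
    # H is Hermitian, so exp(-iHt) follows from its eigendecomposition.
    evals, evecs = np.linalg.eigh(h)
    return evecs @ np.diag(np.exp(-1j * evals * t_s)) @ evecs.conj().T

# With a 20 MHz coupling, the swap angle 2*pi*g*t reaches pi/2 at t = 1/(4g).
g = 20e6
t_full_swap = 1 / (4 * g)          # 12.5 ns under the stated assumption
u = swap_unitary(g, t_full_swap)
```

In this sketch the $|01\rangle$ amplitude moves entirely to $|10\rangle$ with a factor of $-i$, while $|00\rangle$ and $|11\rangle$ are untouched, which is the iSWAP-like behavior described above.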
II. FABRICATION AND LAYOUT
Our Sycamore quantum processor is configured as a diagonal array of qubits as seen in the schematic of Fig. 1 in the main text. The processor contains 142 transmon qubits, of which 54 qubits have individual microwave and frequency controls and are individually read out (referred to as qubits). The remaining 88 transmons are operated as adjustable couplers remaining in their ground state during the algorithms (referred to as couplers).
The qubits consist of a DC SQUID sandwiched between two metal islands, operating in the transmon regime. An on-chip bias line is inductively coupled to the DC SQUID, which allows us to tune qubit frequency by applying control fluxes into the SQUID loop. For regular operations, we tune qubits through a small frequency range (< 100 MHz). This corresponds to a relatively small control signal and makes qubit operation less sensitive to flux crosstalk.
Each pair of nearest-neighbor qubits is coupled through two parallel channels: direct capacitive coupling and indirect coupling mediated by the coupler [3, 4, 9]. Both channels result in qubit-qubit coupling in the form of $\sigma_i^x\sigma_j^x + \sigma_i^y\sigma_j^y$ in the rotating frame, although with different signs. The indirect coupling is negative, given it is a second-order virtual process. The strength of the indirect coupling is adjusted by changing the coupler frequency with an additional on-chip bias line, giving a net zero qubit-qubit coupling at a specific flux bias.
The Sycamore processor consists of two dies that we fabricated on separate high resistivity silicon wafers. The fabrication process, using aluminum on silicon, requires a total of 14 lithography layers utilizing both optical and electron beam lithography. Crosstalk and dissipation are mitigated through ground plane shielding [10]. After fabrication and die singulation, we use indium bump bonding [11, 12] of the two separate dies to form the Sycamore processor.
FIG. S1. A photograph of a packaged Sycamore processor. The processor is shielded from the electromagnetic environment by a mu-metal shield (middle) and a superconducting aluminum cap, inside the mu-metal shield. The processor control wires are routed, through a PCB circuit board, to coaxial connectors shown around the edge.
The Sycamore processor is connected to a 3-layer Al-plated circuit board with Al wirebonds [13]. Each line is routed through a microwave connector to an individual coax cable. We shield the processor from stray light using a superconducting Al lid with black coating, and from magnetic fields using a mu-metal shield as shown in Fig. S1.
III. QUBIT CONTROL AND READOUT
A. Control
Operating the device requires simultaneous synchronized control waveforms for each of the qubits and couplers. We use 54 coherent microwave control signals for qubit XY rotations, 54 fast flux bias lines for qubit frequency tuning, and 88 fast flux biases for the adjustable couplers. Dispersive readout requires an additional 9 microwave signals and phase sensitive receivers. A schematic of the room temperature electronics is shown in Fig. S2, and the cryogenic wiring is shown in Fig. S3.
Waveform generation is based on a custom-built multichannel digital to analog converter (DAC) module. Each DAC module provides 8 DACs with 14-bit resolution and 1 GS/s sample rate. Each DAC sample clock is synchronized to a global 10 MHz reference oscillator, and their trigger is connected by a daisy chain to synchronize all modules used in the experiment. This set of DAC modules forms a >250-channel, phase-synchronous waveform generator. We have measured 20 ps of jitter between channels. The modules are mounted in 14-slot 6U rackmount chassis. A single chassis, shown in Fig. S4, can control approximately 15 qubits including their associated couplers and readout signals. A total of 4 chassis are used to control the entire Sycamore chip.
The DAC outputs are used directly for fast flux biasing the qubits and couplers required for two-qubit gates. Microwave control for single-qubit XY rotations and dispersive readout combine two DAC channels and a mixer module to form a microwave arbitrary waveform generator (microwave AWG) via single-sideband upconversion in an IQ mixer as shown in Fig. S2a. The microwave AWG provides signals with arbitrary spectral content within ±350 MHz of the local oscillator (LO). A single LO signal is distributed to all IQ mixers so that all qubit XY controls are phase coherent. The mixer modules are mounted in the same chassis as the DAC modules. Each mixer's I and Q port DC offsets are calibrated for minimum carrier leakage, and the I and Q amplitudes and phases are calibrated to maximize image rejection.
Each DAC module contains an FPGA that provides a gigabit ethernet interface and SRAM to store waveform patterns, and sends the waveform data to the module's 8 DACs. To optimize the use of SRAM, the FPGA implements a simple jump table to allow reusing or repeating waveform segments. A computer loads the desired waveforms and jump table onto each FPGA using a UDP-based protocol and then requests the first (master) FPGA to start. The start pulse is passed down the daisy chain, causing the remaining (slave) DACs and ADCs to start.
B. Readout
Qubit state measurement and readout (hereafter “readout”) are done via the dispersive interaction between the qubit and a far-detuned harmonic resonator [14–16]. A change in the qubit state from $|0\rangle$ to $|1\rangle$ causes a frequency shift of the resonator from $\omega_{|0\rangle}$ to $\omega_{|1\rangle}$. A readout probe signal applied to the resonator at a frequency in between $\omega_{|0\rangle}$ and $\omega_{|1\rangle}$ reflects with a phase shift $\phi_{|0\rangle}$ or $\phi_{|1\rangle}$ that depends on the resonator frequency and therefore on the qubit state. By detecting the phase of the reflected probe signal we infer the qubit state. The readout probe signal is generated with the same microwave AWG as the XY control signals, but with a separate local oscillator, and is received and demodulated by the circuit shown in Fig. S2c.
The readout probe intensity is typically set to populate the readout resonator with only a few photons to avoid readout-induced transitions in the qubit [17]. Detecting this weak signal at room temperature with conventional electronics requires 100 dB of amplification. To limit the integration time to a small fraction of the qubit coherence time, the amplification chain must operate near the quantum noise limit [18, 19].
FIG. S2. Control electronics. a, The custom DAC module provides 8 DAC channels (4 shown). DACs are used individually for flux pulses or in pairs combined with a mixer module to comprise a microwave AWG channel (dashed box). b, A single DAC channel and a microwave source are used to bias and pump the parametric amplifier for readout. c, Readout pulses are generated by a microwave AWG. The reflected signal is amplified, mixed down to IF, and then digitized in a pair of ADCs. The digital samples are analyzed in the FPGA.
FIG. S3. Cryogenic wiring. Control and readout signals are carried to and from the Sycamore chip with a set of cables, filters, attenuators, and amplifiers.
FIG. S4. Electronics chassis. Each chassis supports 14 DAC and/or mixer modules. Local oscillators are connected at the top of each mixer module. A set of daisy-chain cables connects from each ADC module to the next. Control signals exit the chassis through coaxial cables.
Inside the cryostat the signal is amplified by an impedance matched lumped element Josephson parametric amplifier (IMPA) [20] on the mixing chamber stage followed by a Low Noise Factory cryogenic HEMT amplifier at 3 K. At room temperature the signal is further amplified before it is mixed down with an IQ mixer producing a pair of intermediate frequency (IF) signals I(t) and Q(t). The IF signals are amplified by a pair of variable gain amplifiers to fine-tune their level, and then digitized by a pair of custom 1 GS/s, 8-bit analog to digital converters (ADCs). The digitized samples $I_n$ and $Q_n$ are processed in an FPGA which combines them into a complex phasor
$$ z_n = I_n + iQ_n = E_n \exp\left( i(\omega n\,dt + \phi) \right) $$
where dt is the sample spacing, ω is the IF frequency, φ is the phase that depends on the qubit state, and $E_n$ is the envelope of the reflected readout signal. The envelope is measured experimentally once and then used by the FPGA in subsequent experiments as the optimal demodulation window $w_n$ to extract the phase of the reflected readout signal [21, 22]. The FPGA multiplies $z_n$ by $w_n \exp(-i\omega n\,dt)$, and then sums over time to produce a final complex value proportional to $\exp(i\phi)$:
$$ \sum_{n=0}^{N-1} z_n w_n \exp(-i\omega n\,dt) \propto \exp(i\phi) $$
In the absence of noise, the final complex value would always be one of two possible values corresponding to the qubit states $|0\rangle$ and $|1\rangle$. However, the noise leads to Gaussian distributions centered at those two points. The size of the clouds is determined mostly by the noise of the IMPA and cryogenic HEMT amplifier, while the separation between the cloud centers is determined by the resonator probe power and duration. The signal to noise ratio of the measurement is determined by the cloud separation and width [22, 23].
The 54 qubits are divided into nine frequency multiplexed readout groups of six qubits each. Within a group, each qubit is coupled to its own readout resonator, but all six resonators are coupled to a shared bandpass Purcell filter [22, 24, 25]. All qubits in a group can be read out simultaneously by frequency-domain multiplexing [2, 26], in which the total probe signal is a superposition of probe signals at each of the readout resonator frequencies. The phase shifts of these superposed signals are independently recovered in the FPGA by demodulating the complex IQ phasor with each intermediate frequency. In other words, we know what frequencies are in the superposed readout signal and we compute the Fourier coefficients at those frequencies to find the phase of each reflected frequency component.
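The multiplexed demodulation described above can be sketched in a few lines. This is an illustrative, noise-free example rather than the FPGA implementation: the tone frequencies, phases, record length, and the uniform window are all assumptions chosen for the demonstration (the text uses the measured envelope as the window), and `demodulate` is a hypothetical helper name.

```python
import numpy as np

def demodulate(z, if_freqs_hz, dt_s, window=None):
    """Return sum_n z_n w_n exp(-i 2 pi f n dt) for each IF frequency f."""
    n = np.arange(len(z))
    w = np.ones(len(z)) if window is None else window   # assumed uniform window
    return np.array(
        [np.sum(z * w * np.exp(-2j * np.pi * f * n * dt_s)) for f in if_freqs_hz]
    )

# Hypothetical record: three superposed tones sampled at 1 GS/s for 1 microsecond.
dt = 1e-9
n_samples = 1000
freqs = np.array([20e6, 50e6, 90e6])    # illustrative IF frequencies in Hz
phases = np.array([0.3, -1.1, 2.0])     # illustrative state-dependent phases
n = np.arange(n_samples)
z = sum(np.exp(1j * (2 * np.pi * f * n * dt + p)) for f, p in zip(freqs, phases))

# Each Fourier coefficient isolates one tone, so its angle is that tone's phase.
recovered = np.angle(demodulate(z, freqs, dt))
```

The tones here are chosen as integer multiples of the inverse record length, so they are exactly orthogonal over the record and each phase is recovered independently of the others.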
IV. XEB THEORY
We use cross entropy benchmarking (XEB) [6, 27] to calibrate general single- and two-qubit gates, and also to estimate the fidelity of random quantum circuits with a large number of qubits. XEB is based on the observation that the measurement probabilities of a random quantum state have a similar pattern to laser “speckles”, with some bitstrings more probable than others [28, 29]. The same holds for the output state of random quantum circuits. As errors destroy the speckle pattern, this is enough to estimate the rate of errors and fidelity in an experiment. Crucially, XEB does not require the reconstruction of experimental output probabilities, which would need an exponential number of measurements for increasing number of qubits. Rather, we use numerical simulations to calculate the likelihood of a set of bitstrings obtained in an experiment according to the ideal expected probabilities. Below we describe the theory behind this technique in more detail.
A. XEB of a small number of qubits
We first consider the use of XEB to obtain the error rate for single- and two-qubit gates. As explained above, for a two-qubit XEB estimation we use sequences of cycles, each cycle consisting of two sufficiently random single-qubit gates followed by the same two-qubit gate.
The density operator of the system after application of a random circuit U with m cycles can be written as a sum of two parts
$$ \rho_U = \varepsilon_m |\psi_U\rangle\langle\psi_U| + (1-\varepsilon_m)\,\chi_U , \qquad D = 2^n . \tag{2} $$
Here $|\psi_U\rangle = U|\psi_0\rangle$ is the ideal output state and $\chi_U$ is an operator with unit trace that along with $\varepsilon_m$ describes the effect of errors. For a depolarizing channel model $\chi_U = I/D$ and $\varepsilon_m$ has the meaning of the depolarization fidelity after m cycles. Nevertheless, in the case of a small number of qubits, the operator $\chi_U$ has nonzero matrix elements between the states with no error and the states with the error. However, if we undo the evolution of each random circuit and average over an ensemble of circuits, such cross-terms are averaged out and we expect
$$ \overline{U^\dagger \chi_U U} = \frac{I}{D} . \tag{3} $$
Here and below we use the horizontal bar on the top to denote averaging over the ensemble of random circuits. Because of this property it is possible to establish the connection between the quantity εm and the depolarization fidelity after m cycles.
From Eqs. (2) and (3) we get
$$ \overline{U^\dagger \rho_U U} = \varepsilon_m |\psi_0\rangle\langle\psi_0| + (1-\varepsilon_m)\frac{I}{D} . \tag{4} $$
This is a depolarizing channel. From this and the exponential decay of fidelity we get
$$ \varepsilon_m = p_c^m , \tag{5} $$
connecting $\varepsilon_m$ to the depolarization fidelity $p_c$ per cycle.
The noise model (2) is very general in the context of random circuits. To provide some insight about the origin of this model we consider a specific case with pure systematic error in the two-qubit gate. In this case the resulting pure state after the application of the random circuit $\tilde U$ with the error can be expanded into the direction of the ideal state vector and the orthogonal direction
$$ \tilde U |\psi_0\rangle = \xi_m |\psi_U\rangle + \sqrt{1-|\xi_m|^2}\, |\varphi_{\tilde U}\rangle , \tag{6} $$
where
$$ \langle\psi_U|\varphi_{\tilde U}\rangle = 0 , \qquad \langle\varphi_{\tilde U}|\varphi_{\tilde U}\rangle = 1 . \tag{7} $$
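For the ensemble of random circuits U the error vector is distributed completely randomly in the plane orthogonal to the ideal vector $U|\psi_0\rangle$ (see Fig. S5). This condition of orthogonality is the only constraint on the vector $|\varphi_{\tilde U}\rangle$ that involves $|\psi_U\rangle$. Therefore we expect
$$ \overline{U^\dagger |\varphi_{\tilde U}\rangle\langle\varphi_{\tilde U}| U} = \frac{1}{D-1}\left( I - |\psi_0\rangle\langle\psi_0| \right) . \tag{8} $$
Also
$$ \overline{U^\dagger \left( \xi_m \sqrt{1-|\xi_m|^2}\, |\psi_U\rangle\langle\varphi_{\tilde U}| + \mathrm{h.c.} \right) U} = 0 . \tag{9} $$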
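This gives the connection between the error vector $|\varphi_{\tilde U}\rangle$ and the operator $\chi_U$
$$ (1-\varepsilon_m)\,\chi_U - \frac{1-\varepsilon_m}{D}\, |\psi_U\rangle\langle\psi_U| = (1-|\xi_m|^2)\, |\varphi_{\tilde U}\rangle\langle\varphi_{\tilde U}| + \xi_m \sqrt{1-|\xi_m|^2}\, |\psi_U\rangle\langle\varphi_{\tilde U}| + \mathrm{h.c.} \tag{10} $$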
FIG. S5. Cartoon: decomposition of the quantum state into the vector aligned with the ideal quantum state and its orthogonal complement.
The resulting equation
$$ \overline{|\xi_m|^2} = \varepsilon_m + \frac{1-\varepsilon_m}{D} \tag{11} $$
is to be expected, because $\overline{|\xi_m|^2}$ is the average state fidelity while $\varepsilon_m$ is the depolarization fidelity (see Sec. V). Note that Eqs. (8)–(11) lead to Eq. (4). This result can also be derived assuming that single-qubit gates form a 2-design in the Hilbert space of each qubit.
We demonstrate the above findings by numerically simulating random circuits for 2 qubits that contain single-qubit gates randomly sampled from the Haar measure and the ISWAP-like gate
$$ V(\theta) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & -i\sin\theta & 0 \\ 0 & -i\sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} . \tag{12} $$
The systematic error $\Delta\theta = \theta - \pi/2$ corresponds to the deviation of the swap angle from $\pi/2$. Then, assuming that the single-qubit gates are error free, the depolarizing channel model gives the prediction for the depolarizing fidelity per cycle
$$ p_c = \frac{\left| \mathrm{tr}\left( V(\theta) V^\dagger(\pi/2) \right) \right|^2 - 1}{D^2 - 1} = \frac{1}{15}\left( 8\cos(\Delta\theta) + 2\cos(2\Delta\theta) + 5 \right) . \tag{13} $$
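The closed form in Eq. (13) can be checked directly against the trace expression. The sketch below is only a numerical sanity check with hypothetical function names; it assumes the off-diagonal entries of V(θ) are $-i\sin\theta$, although the result is the same for either sign because only $|\mathrm{tr}(\cdot)|^2$ enters.

```python
import numpy as np

def v_gate(theta: float) -> np.ndarray:
    """ISWAP-like gate of Eq. (12) in the basis |00>, |01>, |10>, |11>."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array(
        [[1, 0, 0, 0],
         [0, c, -1j * s, 0],
         [0, -1j * s, c, 0],
         [0, 0, 0, 1]],
        dtype=complex,
    )

def pc_from_trace(dtheta: float, dim: int = 4) -> float:
    """Depolarizing fidelity per cycle from the trace overlap with V(pi/2)."""
    overlap = np.trace(v_gate(np.pi / 2 + dtheta) @ v_gate(np.pi / 2).conj().T)
    return (abs(overlap) ** 2 - 1) / (dim**2 - 1)

def pc_closed_form(dtheta: float) -> float:
    """Right-hand side of Eq. (13) for two qubits (D = 4)."""
    return (8 * np.cos(dtheta) + 2 * np.cos(2 * dtheta) + 5) / 15

# Swap errors used in Fig. S6.
errors = [0.01, 0.02, 0.03, 0.04, 0.05]
table = [(d, pc_from_trace(d), pc_closed_form(d)) for d in errors]
```

The trace evaluates to $2 + 2\cos\Delta\theta$, and expanding its square gives the three terms of the closed form.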
As shown in Fig. S6, the depolarizing fidelity $p_c^m$ for the circuit of depth m based on Eq. (13) closely matches the corresponding quantity obtained by averaging the squared overlap over the ensemble of random circuits (cf. Eq. (11)),
$$ \varepsilon_m = \frac{D\,\overline{|\xi_m|^2} - 1}{D - 1} . \tag{14} $$
FIG. S6. Plots of the circuit depolarizing fidelity vs the circuit depth. Solid lines correspond to the predictions from the depolarizing channel model (13) and points correspond to $\varepsilon_m$ (14) obtained by averaging the squared overlap over the ensemble of random circuits. Different colored points correspond to different values of the swap error Δθ = 0.01 (red), 0.02 (blue), 0.03 (green), 0.04 (pink), 0.05 (black).
Returning to the generic case, property (3) can be extended so that for any smooth function f(u) the following relation holds
$$ \overline{\sum_{q\in\{0,1\}^n} f(p_s(q))\, \langle q|\chi_U|q\rangle} = \frac{1}{D}\,\overline{\sum_{q\in\{0,1\}^n} f(p_s(q))} + \epsilon , \tag{15} $$
where $|q\rangle$ is a computational basis state corresponding to bitstring q, and $p_s(q) = \langle q|U\rho_0 U^\dagger|q\rangle$ is the simulated (computed) ideal probability of q. If the average is performed over a sample of random circuits of size S then the correction is $\epsilon \in O(1/\sqrt{S})$. We tested numerically for the case of n = 2 that relation (15) holds even for purely systematic errors in the case of a sufficiently random set of single-qubit gates.
We now make the critical step of estimating the parameter $p_c^m$ from a set of experimental realizations of random circuits with m cycles. We map each measured bitstring q with a function $f(p_s(q))$ and then average this function over the measured bitstrings. The standard XEB [6, 27] uses the natural logarithm, $f(p_s(q)) = \log(p_s(q))$. In the main text we use the linear version of XEB, for which $f(p_s(q)) = Dp_s(q) - 1$. Both these functions give higher values to bitstrings with higher simulated probabilities. Another closely related choice is the Heavy Output Generation test [30], for which f is a step-function.
Under the model (2), in an experiment with ideal state preparation and measurement, we obtain the bitstring q with probability
$$ p_c^m\, p_s(q) + (1 - p_c^m)\, \langle q|\chi_U|q\rangle . \tag{16} $$
For the linear XEB, the average value of $Dp_s(q) - 1$ when sampling with probabilities given by Eq. (16) is
$$ \overline{\langle Dp_s(q) - 1 \rangle} = p_c^m \left( D\,\overline{\sum_q p_s(q)^2} - 1 \right) . \tag{17} $$
Similarly to Eq. (15), the horizontal bar denotes averaging over the random circuits.
The sum on the right hand side of (17) goes over all bitstrings in the computational basis, and can be obtained with numerical simulations. It can also be found analytically assuming that the random circuit ensemble approximates the Haar measure, where for a given q the quantity $p_s(q)$ is distributed with the beta distribution function $(D-1)(1-p_s)^{D-2}$. In this case the right hand side in (17) equals $p_c^m\left( 2D/(D+1) - 1 \right)$.
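The Haar-measure value $D\sum_q p_s(q)^2 = 2D/(D+1)$ can be checked with a short simulation. This is a sketch under stated assumptions: instead of simulating random circuits, it draws Haar-random states as normalized complex Gaussian vectors, and the dimension, sample count, and seed are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_probabilities(dim: int, rng: np.random.Generator) -> np.ndarray:
    """Bitstring probabilities of a Haar-random pure state of dimension dim."""
    amplitudes = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    probs = np.abs(amplitudes) ** 2
    return probs / probs.sum()

dim = 2**10                      # D = 2^n with n = 10 (illustrative)
n_states = 200                   # number of random states to average over
values = [dim * np.sum(haar_probabilities(dim, rng) ** 2) for _ in range(n_states)]

mean_value = float(np.mean(values))   # ensemble average of D * sum_q p_s(q)^2
expected = 2 * dim / (dim + 1)        # second moment of the beta distribution
```

For large D the expected value approaches 2, so the right hand side of (17) approaches $p_c^m$ itself.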
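The experimental average on the left hand side of (17) can be estimated with accuracy $1/\sqrt{SN_s}$ using S random circuit realizations with $N_s$ samples each
$$ \frac{1}{SN_s} \sum_{j=1}^{S} \sum_{i=1}^{N_s} \left( Dp_s^j(q_{i,j}) - 1 \right) = \overline{\langle Dp_s(q) - 1 \rangle} + O\!\left( \frac{1}{\sqrt{SN_s}} \right) . \tag{18} $$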
This gives an estimate of $p_c^m$.
This estimate can be justified using Bayes' rule. The log-likelihood for a set of experimental measurements $\{q_{i,j}\}$ assuming that the experimental probabilities are given by Eq. (16) is proportional to
$$ \sum_{j=1}^{S} \sum_{i=1}^{N_s} \log\left( 1 + p_c^m \left( Dp_s^j(q_{i,j}) - 1 \right) \right) , \tag{19} $$
where $p_s^j(q)$ is a simulated probability corresponding to the j-th circuit realization. We want to maximize the log-likelihood as a function of $p_c^m$. Taking the derivative with respect to $p_c^m$ and equating to 0 we obtain
$$ \sum_{j=1}^{S} \sum_{i=1}^{N_s} \frac{Dp_s^j(q_{i,j}) - 1}{1 + p_c^m \left( Dp_s^j(q_{i,j}) - 1 \right)} = 0 . \tag{20} $$
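For $p_c^m \ll 1$ it is easy to solve this equation and obtain the estimate
$$ p_c^m \approx \frac{\sum_{j=1}^{S} \sum_{i=1}^{N_s} \left( Dp_s^j(q_{i,j}) - 1 \right)}{\sum_{j=1}^{S} \sum_{i=1}^{N_s} \left( Dp_s^j(q_{i,j}) - 1 \right)^2} \approx \frac{\overline{\langle Dp_s(q) - 1 \rangle}}{D\,\overline{\sum_q p_s(q)^2} - 1} . \tag{21} $$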
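The linear XEB estimate can be tried on synthetic data. The sketch below is an illustration under several assumptions, not the analysis pipeline of the experiment: ideal probabilities come from Haar-random states rather than simulated circuits, the noise is purely depolarizing ($\chi_U = I/D$ in Eq. (16)), and the polarization, qubit number, and sample counts are hypothetical. It uses the ratio form on the right of Eq. (21), which is also what Eq. (17) gives without the small-polarization approximation.

```python
import numpy as np

rng = np.random.default_rng(1)

n_qubits = 10
dim = 2**n_qubits
true_polarization = 0.2          # hypothetical value of p_c^m
n_circuits, n_shots = 20, 2000   # S and N_s (illustrative)

xeb_sum = 0.0    # sum over all shots of D * p_s(q) - 1
norm_sum = 0.0   # sum over circuits of D * sum_q p_s(q)^2 - 1
for _ in range(n_circuits):
    # Stand-in for a simulated circuit: ideal probabilities of a random state.
    amplitudes = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    p_ideal = np.abs(amplitudes) ** 2
    p_ideal /= p_ideal.sum()
    # Eq. (16) with chi_U = I/D: mix the ideal and uniform distributions.
    p_noisy = true_polarization * p_ideal + (1 - true_polarization) / dim
    shots = rng.choice(dim, size=n_shots, p=p_noisy)
    xeb_sum += np.sum(dim * p_ideal[shots] - 1)
    norm_sum += dim * np.sum(p_ideal**2) - 1

# Experimental average divided by the simulated normalization.
estimate = (xeb_sum / (n_circuits * n_shots)) / (norm_sum / n_circuits)
```

With 40,000 samples in total the statistical error is well below the polarization being estimated, consistent with the $1/\sqrt{SN_s}$ scaling of Eq. (18).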
In the spirit of the XEB method, we can use other functions $f(p_s(q))$ to estimate $p_c^m$. One alternative is derived from the log-likelihood of a sample $\{q_{i,j}\}$ with respect to the simulated (computed) ideal probabilities
$$ \log \prod_{j=1}^{S} \prod_{i=1}^{N_s} p_s^j(q_{i,j}) = \sum_{j=1}^{S} \sum_{i=1}^{N_s} \log p_s^j(q_{i,j}) , \tag{22} $$
which converges to the cross entropy between experimental probabilities and simulated probabilities. The experimental average of the function $f(p_s(q)) = \log p_s(q)$ under the probabilities from Eq. (16) with additional averaging over random circuits is
$$ \overline{\langle \log p_s(q) \rangle} = p_c^m\, \overline{\sum_q \left( p_s(q) - 1/D \right) \log p_s(q)} + \frac{1}{D}\, \overline{\sum_q \log p_s(q)} . \tag{23} $$
As before, the sums on the right hand side can be obtained with numerical simulations and the average value on the left hand side can be estimated experimentally. This also gives an estimate of $p_c^m$.
Both Eq. (17) and Eq. (23) give a linear equation, from which we can obtain an estimate of the total polarization $p_c^m$ for an experimental implementation of one quantum circuit with m cycles. We normally use multiple circuits with the same number of cycles m to estimate $p_c^m$, which we can do using the least squares method. Finally, we obtain an estimate of $p_c$ from a fit of the estimates $p_c^m$ as an exponential decay in m. This is standard in randomized benchmarking [31, 32]. One advantage of this method is that it allows us to estimate the cycle polarization $p_c$ independently of the state preparation and measurement errors (SPAM). See also below.
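The final fitting step can be sketched as follows. This is a minimal illustration with made-up, noise-free numbers: it assumes the measured polarizations follow $A\,p_c^m$, where the prefactor A (a hypothetical symbol) absorbs SPAM errors, and it fits a straight line to the logarithm, which is one simple choice among several possible fitting methods and requires all estimates to be positive.

```python
import numpy as np

def fit_cycle_polarization(depths, polarizations):
    """Fit polarizations ~ A * p_c**m by linear regression on the logarithm.

    Returns (p_c, A). Assumes every polarization estimate is positive.
    """
    slope, intercept = np.polyfit(depths, np.log(polarizations), 1)
    return float(np.exp(slope)), float(np.exp(intercept))

# Synthetic example with hypothetical values.
depths = np.arange(5, 55, 5)             # numbers of cycles m
true_pc, spam_prefactor = 0.99, 0.95     # illustrative p_c and A
measured = spam_prefactor * true_pc**depths

pc_fit, prefactor_fit = fit_cycle_polarization(depths, measured)
```

Because the SPAM contribution enters only through the intercept, the fitted slope, and hence $p_c$, is unaffected by it.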
B. XEB of a large number of qubits
We now consider the case of a large number of qubits $n \gg 1$. We are typically interested in estimating the fidelity F of each of a set of circuits with a given number of qubits and depth. As above, we write the output of an approximate implementation of the random quantum circuit U as
$$ \rho_U = F\, |\psi_U\rangle\langle\psi_U| + (1-F)\,\chi_U , \tag{24} $$
where $|\psi_U\rangle$ is the ideal output and $F = \langle\psi_U|\rho_U|\psi_U\rangle$ is the fidelity. We do not necessarily assume $\chi_U = I/D$, and we will ignore the small difference, of order $2^{-n}$, $n \gg 1$, between the fidelity F and the depolarization fidelity p.
As for the case of a small number of qubits n, we map each output bitstring q with a function $f(p_s(q))$. Given that the values $\langle q|\chi_U|q\rangle$ resulting from errors are typically uncorrelated with the chaotic “speckles” of $p_s(q)$, we make our main assumption
$$ \sum_q \langle q|\chi_U|q\rangle\, f(p_s(q)) = \frac{1}{D} \sum_q f(p_s(q)) + \epsilon . \tag{25} $$
FIG. S7. Absolute value of the XEB fidelity between a random quantum circuit and the same circuit with a single Pauli error. Markers show the median over all possible positions in the circuit for both bit-flip and phase-flip errors. Error bars correspond to the first and third quartile. The dashed lines are the $1/\sqrt{D}$ theory prediction.
This equation is trivial if we assume a depolarizing model, $\chi_U = I/D$. More generally, it can be understood in the geometric context of concentration of measure [33–36] for high dimensional spaces, and from Levy's lemma [37] we expect a typical statistical fluctuation $\epsilon \in O(1/\sqrt{D})$ with $D = 2^n$. We will only require $\epsilon \ll F$. We check Eq. (25) numerically for the output $\rho_e = |\psi_e\rangle\langle\psi_e|$ where $|\psi_e\rangle$ is the wave function obtained after a single phase-flip or bit-flip error is added somewhere in the circuit, see Fig. S7 and Ref. [27]. We have also tested this assumption numerically, comparing the fidelity with the XEB estimate for a pure state $\sqrt{F}\,|\psi_U\rangle + \sqrt{1-F}\,|\psi_\perp\rangle$, see also Ref. [38] and Section X.
From Eqs. (24) and (25) we obtain Eq. (17) for linear XEB, $f(p_s(q)) = Dp_s(q) - 1$ ($F_{\rm XEB}$ in the main text). We also obtain Eq. (23) for XEB, $f(p_s(q)) = \log p_s(q)$, with $p_c^m$ replaced by fidelity F. As before, the sums on the right hand side can be obtained with numerical simulations and the average value on the left hand side can be estimated experimentally with accuracy $1/\sqrt{N_s}$ using $N_s$ samples. This gives an estimate of F.
In practice, circuits of enough depth (as in the experiments reported here) exhibit the Porter-Thomas distribution for the measurement probabilities $p = \{p_s(q)\}$, that is
$$ \Pr(p) = De^{-Dp} . \tag{26} $$
In this case the linear cross entropy Eq. (17) gives
$$ F = \langle Dp_s(q) - 1 \rangle . \tag{27} $$
FIG. S8. Comparison of fidelity estimates obtained using linear XEB, Eq. (27) and logarithmic XEB, Eq. (28) from bitstrings observed in our experiment using elided circuits (see Section VII G 1). Standard deviation smaller than markers.
The standard deviation of the estimate of F with $N_s$ samples from the central limit theorem is $\sqrt{(1 + 2F - F^2)/N_s}$. The cross entropy Eq. (23) gives
$$ F = \langle \log Dp_s(q) \rangle + \gamma , \tag{28} $$
where γ is the Euler-Mascheroni constant ≈ 0.577. The standard deviation of the estimate of F with $N_s$ samples is $\sqrt{(\pi^2/6 - F^2)/N_s}$. The logarithmic XEB has a smaller standard deviation for F > 0.32 (it is the best estimate when F ≈ 1), while for F < 0.32 the linear XEB has a smaller standard deviation (it is the best estimate for $F \ll 1$, where it relates to the maximum likelihood estimator). See Fig. S8 for comparison of the fidelity estimates produced by the linear and logarithmic XEB.
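The quoted crossover near F ≈ 0.32 follows from equating the two expressions. The short sketch below reads them as $\sqrt{(1 + 2F - F^2)/N_s}$ for the linear XEB and $\sqrt{(\pi^2/6 - F^2)/N_s}$ for the logarithmic XEB, which is an assumption about the intended square roots; the function names are hypothetical.

```python
import numpy as np

def std_linear_xeb(fidelity: float, n_samples: int) -> float:
    """Standard deviation of the linear XEB estimate of F."""
    return float(np.sqrt((1 + 2 * fidelity - fidelity**2) / n_samples))

def std_log_xeb(fidelity: float, n_samples: int) -> float:
    """Standard deviation of the logarithmic XEB estimate of F."""
    return float(np.sqrt((np.pi**2 / 6 - fidelity**2) / n_samples))

# Setting 1 + 2F - F^2 = pi^2/6 - F^2 gives F = (pi^2/6 - 1) / 2.
crossover = (np.pi**2 / 6 - 1) / 2
```

The $F^2$ terms cancel, so the crossover does not depend on the number of samples.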
We note in passing another example of an estimator of $F$, related to the HOG test [30], which counts the number of measured bitstrings with probabilities $p_s(q)$ greater than the median of the probabilities. The function $f(p_s(q))$ in this case returns 1 for $D p_s(q) \ge \log(2)$, and 0 otherwise. The fidelity estimator uses the following normalization

$$F = \frac{1}{\log(2)} \langle 2 n_s(q) - 1 \rangle, \tag{29}$$

where $n_s(q)$ is defined to be 1 if $D p_s(q) \ge \log(2)$, and 0 otherwise. The standard deviation of this estimator is $\sqrt{[\log^{-2}(2) - F^2]/N_s}$, which is always larger than for the XEB. See Fig. S9 for comparison of the fidelity estimates produced by linear XEB and the HOG-based fidelity estimator. The HOG test is also related to a definition of quantum volume [39].
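The HOG-based estimator can be sketched in the same spirit. The example below is a minimal illustration on synthetic data, assuming the same depolarized Porter-Thomas sampling model as for the XEB estimators (ideal samples have $D p_s(q)$ distributed as $x e^{-x}$, noise samples as $e^{-x}$). The function names are hypothetical.

```python
import math
import random

LOG2 = math.log(2.0)


def sample_scaled_probs(fidelity, num_samples, rng):
    """Synthetic x = D * p_s(q): Gamma(2, 1) for ideal samples, Exp(1) for noise."""
    return [
        rng.gammavariate(2.0, 1.0) if rng.random() < fidelity else rng.expovariate(1.0)
        for _ in range(num_samples)
    ]


def hog_fidelity(scaled_probs):
    """Eq. (29): F = <2 n_s(q) - 1> / log(2), with n_s = 1 iff D p_s(q) >= log(2)."""
    heavy = sum(1 for x in scaled_probs if x >= LOG2)
    return (2.0 * heavy / len(scaled_probs) - 1.0) / LOG2


def hog_std(fidelity, num_samples):
    """Standard deviation sqrt((log(2)^-2 - F^2) / N_s) of the HOG-based estimate."""
    return math.sqrt((LOG2 ** -2 - fidelity ** 2) / num_samples)


if __name__ == "__main__":
    rng = random.Random(0)
    xs = sample_scaled_probs(0.3, 100_000, rng)
    print("HOG estimate:", hog_fidelity(xs), "+/-", hog_std(0.3, len(xs)))
```

Under this model an ideal sample is heavy with probability $(1 + \log 2)/2$ and a noise sample with probability $1/2$, so $\langle 2 n_s(q) - 1 \rangle = F \log 2$, which is why the estimator is normalized by $\log(2)$.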
C. Two limiting cases
Here, we consider two special cases of Eq. (27) and the formula (1) in the main paper [1]. First, suppose bitstrings $q_i$ are sampled from the uniform distribution. In this case the sampling probability is $1/D$ for every bitstring and $F_{\rm XEB} = 0$. Therefore, if the qubits are in the maximally mixed state, the estimator yields zero fidelity, as expected.
Second, suppose that bitstrings are sampled from the theoretical output distribution of a random quantum circuit. Assume that the distribution has Porter-Thomas shape. By Eq. (26), the fraction of bitstrings with theoretical probability in $[p, p + dp]$ is

$$\Pr(p)\,dp = D e^{-Dp}\,dp \tag{30}$$

and the total number of such bitstrings is

$$N(p)\,dp = D^2 e^{-Dp}\,dp. \tag{31}$$

Therefore, the probability that a bitstring with probability in $[p, p + dp]$ is sampled equals

$$p \cdot N(p)\,dp = p D^2 e^{-Dp}\,dp = f(p)\,dp \tag{32}$$

where $f(p)$ is the probability density function of the random variable defined as the ideal probability of a sampled bitstring, i.e. the random variable which is being averaged in the formula (1) of the main paper. Thus, the average probability of a sampled bitstring is

$$\langle p_s(q) \rangle = \int_0^1 p f(p)\,dp = \int_0^1 p^2 D^2 e^{-Dp}\,dp = \frac{2}{D}\left[1 - e^{-D}\left(\frac{D^2}{2} + D + 1\right)\right] \approx \frac{2}{D}. \tag{33}$$

Substituting into equation (1) in the main paper yields $F_{\rm XEB} = 1$. The general case of a depolarizing error can be obtained from the two limiting cases by convex combination.

FIG. S9. Comparison of fidelity estimates obtained using linear XEB, Eq. (27) and normalized HOG score, Eq. (29) from bitstrings observed in our experiment using elided circuits (see Section VII G 1). Standard deviation smaller than markers. (Axes: fidelity estimate versus number of qubits, n.)

D. Measurement errors
We now consider how measurement errors affect the estimation of fidelity. Let us assume uncorrelated classical measurement errors, so that if the “actual” measurement result of a qubit is 0, we can get 1 with probability $e_{m0}$, and similarly with probability $e_{m1}$ we get 0 for actual result 1, i.e., $p(1|0) = e_{m0}$, $p(0|0) = 1 - e_{m0}$, $p(0|1) = e_{m1}$, $p(1|1) = 1 - e_{m1}$. In this case the probability to get measurement result $q = k_1 k_2 \ldots k_n$ for actual result $q' = k'_1 k'_2 \ldots k'_n$ is the product of the corresponding factors. The probability of correct measurement result is then

$$p_m(q') = (1 - e_{m0})^{n - |q'|}(1 - e_{m1})^{|q'|} \approx (1 - e_{m0})^{n/2}(1 - e_{m1})^{n/2}, \tag{34}$$

where $|q'|$ is the number of 1s (Hamming distance from 00..0) in the initial bitstring $q'$, and in the second expression we approximated $|q'|$ with $n/2$ for large n.
Now let us make a natural assumption that if there was one or more measurement errors, $q' \to q$, then the resulting ideal probability $p_s(q)$ is uncorrelated with the actual ideal probability $p_s(q')$. Using this assumption we can write

$$F = F_U\, p_m, \tag{35}$$

where $F_U$ is the circuit fidelity and $F$ is the complete (effective) fidelity. The complete fidelity $F$ is estimated as before. The measurement fidelity $p_m$ can be obtained independently. For instance, we can prepare a bitstring $q$ and measure immediately to obtain the probability of a correct measurement result for $q$. We obtain $p_m$ by repeating this for a set of random bitstrings. We can therefore obtain $F_U$ from Eq. (35). As explained above, fitting the depolarization fidelity per cycle $p_c$ for different circuit depths m is also a method to separate measurement errors.
The state preparation errors can be treated similarly, assuming that a single error leads to uncorrelated resulting distribution $p_s(q)$, so that the measurement fidelity $p_m$ in Eq. (35) is combined with a similar factor describing the state preparation fidelity.

V. QUANTIFYING ERRORS
An important test for this experiment is predicting XEB fidelity $F_{\rm XEB}$ based on simpler measurements of single- and two-qubit errors. Here we review how this is calculated, illustrating important principles with the example of a single qubit. The general theory is described at the end of this section.
First, we assume Pauli errors describe decoherence using a depolarizing model. This model is used, for example, to compute thresholds and logical error rates for error correction. The parameter describing decoherence in a single qubit is the Pauli error eP , giving a probability eP /3 for applying an erroneous X, Y, or Z gate to the qubit after the gate, corresponding to a bit and/or phase flip.
Second, the depolarization model is assumed to describe the system state using simple classical probability. The probability of no error for many qubits and many operations, corresponding to no change to the system state, is then found by simply multiplying the probability of no error for each qubit gate. This is a good assumption for RB and XEB since a bit- or phase-flip error effectively decorrelates the state. The depolarization model assumes that when there is an error with probability $e_d$, the system state is randomized over all qubit states, which span a Hilbert space of dimension $D = 2^n$. This is described by a change in density matrix $\rho \to (1 - e_d)\rho + e_d \times \mathbb{1}/D$. Note the depolarization term has a small possibility of the state resetting back to its original state. For a single qubit where $D = 2$, this can be described using a Pauli-error type model as a probability $e_d/4$ of applying an I, X, Y, or Z gate. Comparing to the Pauli model, the error probability thus needs to be rescaled by $e_d = e_P/(1 - 1/D^2)$. This gives a net polarization p of the qubit state due to many Pauli errors as

$$p = \prod_i \left[1 - e_P(i)/(1 - 1/D^2)\right]. \tag{36}$$
Third, the effect of this depolarization has to be accounted for when considering the measured signal. The measured signal for randomized benchmarking is given by RB $= p(1 - 1/D) + 1/D$, which can be understood through the physical argument that a complete randomization of the state has a $1/D$ chance to give the correct final state. A cross-entropy benchmarking measurement gives $F_{\rm XEB} = p$. A measurement of p, which can have offsets and prefactors in these formulas, also includes other scaling factors coming from state preparation and measurement errors. All of these scaling issues are circumvented by applying gates in a repeated number of cycles m such that $p = p_c^m$. A measurement of the signal versus m can then directly pull out the fractional polarization change per cycle, $p_c$, independent of these scale factors.
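The extraction of the per-cycle polarization from a decay curve can be sketched as follows. This is a minimal illustration on noise-free synthetic data, not the fitting procedure used in the experiment: it assumes the signal has the form $A\,p_c^m$ plus a known offset (0 for XEB, $1/D$ for an RB survival probability) and performs an unweighted linear least-squares fit to the logarithm. The function name and parameters are hypothetical.

```python
import math


def fit_polarization_per_cycle(depths, signals, offset=0.0):
    """Fit signal(m) = offset + A * p_c**m and return (p_c, A).

    Uses an unweighted least-squares line through log(signal - offset)
    versus m. The prefactor A absorbs state-preparation and measurement
    scale factors, so p_c does not depend on them.
    """
    ys = [math.log(s - offset) for s in signals]
    n = len(depths)
    mean_m = sum(depths) / n
    mean_y = sum(ys) / n
    sxx = sum((m - mean_m) ** 2 for m in depths)
    sxy = sum((m - mean_m) * (y - mean_y) for m, y in zip(depths, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_m
    return math.exp(slope), math.exp(intercept)


if __name__ == "__main__":
    depths = [2, 4, 6, 8, 10, 12, 14]
    # Synthetic XEB decay with a SPAM prefactor of 0.9 and p_c = 0.99.
    xeb_signal = [0.9 * 0.99 ** m for m in depths]
    p_c, prefactor = fit_polarization_per_cycle(depths, xeb_signal)
    print("p_c =", p_c, "prefactor =", prefactor)
```

For real data with shot noise, a weighted or nonlinear fit would normally be preferable to the simple log-linear fit shown here.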
Fourth, from this polarization change we can then compute the Pauli error, which is the metric that should be reported since it is the fundamental error rate that is independent of D. Unfortunately, a fidelity $1 - e_P/(1 + 1/D)$ for RB is commonly reported, which has a D-dependent correction. We recommend this practice be changed, but note that removing the $1/(1 + 1/D)$ factor decreases the reported fidelity value. We also recommend reporting Pauli error $e_P$ instead of entanglement fidelity $(1 - e_P)$, since it is more intuitive to understand how close some quantity is to 0 than to 1. Table I summarizes the different error metrics and their relations.
This general model can also account for non-depolarizing errors such as energy decay, since quantum states in an algorithm typically average over the entire Bloch sphere (as in XEB), or for example when the algorithm purposely inserts spin-echoes. Thus the average effect of energy decay effectively randomizes the state in a way compatible with Pauli errors. For a gate of length $t_g$ with a qubit decay time $T_1$, averaging over the Bloch sphere (2 poles and 4 equator positions) gives (to first order) an average error probability $e_a = t_g/(3T_1)$. Using Table I, this converts to a Pauli error $e_P = t_g/(2T_1)$.
A detailed theory of the D scaling factor is as follows. In order to arrive at a first-order estimate of how error rates accumulate in random quantum circuits, the errors can be modeled via a set of Kraus operators. The density matrix of the system $\rho$ after application of a gate is connected to the density matrix $\rho_0$ before the gate as follows:

$$\rho = \Lambda(\rho_0) = \sum_{k=0}^{K} A_k \rho_0 A_k^\dagger, \qquad \sum_k A_k^\dagger A_k = \mathbb{1}. \tag{37}$$
For the closed-system quantum evolution with unitary U (no dephasing nor decay) the sum on the right hand side contains only one term with k=0 and A0 = U . In general, Kraus operators describe the physical effects of many types of errors (control error, decoherence, etc.) that can explicitly depend on the gate. Knowing the Kraus operators allows us to calculate the total error budget as well as its individual components.
Conventionally, the fidelity of a circuit is reported as a metric of its quality. To make a connection to physically observable quantities, the average fidelity can be expressed in terms of Kraus operators. In the absence of leakage errors and cross-talk the average fidelity equals

$$F = 1 - \frac{e_P}{1 + 1/D}, \qquad 1 - e_P = \frac{1}{D^2} \sum_{k=0}^{K} |\mathrm{tr}(U A_k^\dagger)|^2 \tag{38}$$
where D = 2n is the dimension of the Hilbert space and the quantity eP plays a role of a Pauli error probability in the depolarizing channel model (see below).
For random circuits the effects of errors can be described by a depolarizing channel model, with Kraus operators of the form

$$A_k = \sqrt{\frac{e_P}{D^2 - 1}}\, P_k U, \quad k \neq 0, \qquad A_0 = \sqrt{1 - e_P}\, P_0 U, \qquad P_k = \sigma_{k_1} \otimes \sigma_{k_2} \otimes \ldots \otimes \sigma_{k_n}, \tag{39}$$

where $P_k$ are strings of Pauli operators $\sigma_{k_j}$ for individual qubits for $k_j = 1, 2, 3$ and also identity matrices $\sigma_0$ in the qubit subspace for $k_j = 0$. This form assumes that individual Pauli errors all happen with the same probability $e_P$.
TABLE I. A Rosetta stone translation between error metrics. In single- and two-qubit RB or XEB experiments, we measure the per-gate (or per-cycle) depolarization decay constant p. The second column shows conversions from this rate to the various error metrics. The last two columns are representative comparisons for 0.1% Pauli error.

Error metric                     Relation to depolarization decay constant p    n=1 (D=2)    n=2 (D=4)
Pauli error ($e_P$, $r_P$) [a]   $(1 - p)(1 - 1/D^2)$                           0.1%         0.1%
Average error ($e_a$, $r$)       $(1 - p)(1 - 1/D)$                             0.067%       0.08%
Depolarization error ($e_d$)     $1 - p$                                        0.133%       0.107%

[a] 1 − process fidelity, or 1 − entanglement fidelity
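The conversions in Table I are simple enough to express as a few helper functions. The sketch below is illustrative only (the function names are hypothetical); it reproduces the representative values in the last two columns of the table for a 0.1% Pauli error.

```python
def depolarization_error(p):
    """Depolarization error e_d = 1 - p."""
    return 1.0 - p


def pauli_error(p, num_qubits):
    """Pauli error e_P = (1 - p)(1 - 1/D^2), with D = 2**num_qubits."""
    dim = 2 ** num_qubits
    return (1.0 - p) * (1.0 - 1.0 / dim ** 2)


def average_error(p, num_qubits):
    """Average error e_a = (1 - p)(1 - 1/D), with D = 2**num_qubits."""
    dim = 2 ** num_qubits
    return (1.0 - p) * (1.0 - 1.0 / dim)


def decay_constant_from_pauli(e_pauli, num_qubits):
    """Invert the Pauli-error relation: p = 1 - e_P / (1 - 1/D^2)."""
    dim = 2 ** num_qubits
    return 1.0 - e_pauli / (1.0 - 1.0 / dim ** 2)


if __name__ == "__main__":
    for n in (1, 2):
        p = decay_constant_from_pauli(1e-3, n)
        print(
            f"n={n}: average error = {100 * average_error(p, n):.3f}%, "
            f"depolarization error = {100 * depolarization_error(p):.3f}%"
        )
```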
To make a connection to experimental measurements of the cross-entropy we substitute (39) into (37) and obtain

$$\Lambda(\rho_0) = (1 - e_P)\, U \rho_0 U^{-1} + \frac{e_P}{D - 1/D} \left[ \mathbb{1} - \frac{U \rho_0 U^{-1}}{D} \right]. \tag{40}$$

We compare this expression with the standard form of the depolarizing channel model

$$\Lambda(\rho_0) = p\, U \rho_0 U^{-1} + (1 - p) \frac{\mathbb{1}}{D}, \tag{41}$$
expressed in terms of the depolarization fidelity parameter p. Note the difference between the expressions. On the one hand, in (41) the second term corresponds to full depolarization in all directions. On the other hand, in (40) the second term describes full depolarization in all directions except for the direction corresponding to the ideal quantum state.
From (40), (41) one can establish the connection between the Pauli error rate and depolarizing fidelity parameter p
$$e_P = (1 - p)(1 - 1/D^2). \tag{42}$$
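For completeness, the step from (40) and (41) to (42) can be written out explicitly. The block below is a short worked derivation, assuming Eq. (40) has the form $(1 - e_P)\,U\rho_0U^{-1} + \frac{e_P}{D - 1/D}\,[\mathbb{1} - U\rho_0U^{-1}/D]$ and Eq. (41) the form $p\,U\rho_0U^{-1} + (1 - p)\,\mathbb{1}/D$.

```latex
% Collect the U \rho_0 U^{-1} and identity terms of Eq. (40):
\Lambda(\rho_0) = \left[ 1 - e_P - \frac{e_P}{D^2 - 1} \right] U \rho_0 U^{-1}
                + \frac{e_P}{1 - 1/D^2} \, \frac{\mathbb{1}}{D}

% Match the coefficient of \mathbb{1}/D with Eq. (41):
1 - p = \frac{e_P}{1 - 1/D^2}
\quad \Longrightarrow \quad
e_P = (1 - p)\left(1 - 1/D^2\right)

% Consistency check on the coefficient of U \rho_0 U^{-1}:
1 - e_P \left( 1 + \frac{1}{D^2 - 1} \right)
  = 1 - \frac{e_P}{1 - 1/D^2} = p
```

Both coefficients give the same relation, as they must, since each channel is trace preserving.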
We note that the explicit assumption of connecting Pauli errors to depolarization is needed for the small D case, typically for single- and two-qubit error measurements. Once we have measured the Pauli errors, then only a simple probabilistic calculation is needed to compute FXEB in the large D case.
VI. METROLOGY AND CALIBRATION
A. Calibration overview
Quantum computations are physically realized through the time-evolution of quantum systems steered by analog control signals. As quantum information is stored in continuous amplitudes and phases, these control signals must be carefully chosen to achieve the desired result. Calibration is the process of performing a series of experiments on the quantum system to learn optimal control parameters.
Calibration is challenging for a number of reasons. Analog control requires careful control-pulse shaping as any deviation from the ideal will introduce error. Qubits require individual calibration as variations in the control system and qubits necessitate different control parameters to hit target fidelities. Optimal control parameters can also drift in time, requiring calibrations to be revisited to maintain performance. Additionally, the full calibration procedure requires bootstrapping: using a series of control sequences with increasing complexity to determine circuit and control parameters to increasingly higher degrees of precision. Lastly, each qubit needs to perform a number of independent operations which are independently calibrated: single-qubit gates, two-qubit gates, and readout.
Our Sycamore processor offers a high degree of programmability: we can dynamically change the frequency of each qubit, as well as the effective qubit-qubit coupling between nearest-neighbor qubits. This tunability gives us the freedom to enact many different control strategies, as well as account for non-uniformities in the processor's parameters. However, these extra degrees of freedom are a double-edged sword. Additional control knobs always introduce a source of decoherence and control errors as well as an added burden on calibration.
Our approach is to systematize and automate our calibration procedure as much as possible, thus abstracting complexity away. This automation allows us to turn calibration into a science, where we can compare calibration procedures to determine optimal strategies for time, performance, and reliability. By employing calibration science to study full-system performance with different control strategies, we have been able to improve full-system fidelities by over an order of magnitude from initial attempts while decreasing the calibration time and improving reliability. Lastly, we design our calibration to be done almost entirely at the single- or two-qubit level, rather than at the system level, in order to be as scalable as possible.
1. Device registry
The device registry is a database of control variables and configuration information we use to control our quantum processors. The registry stores information such as operating frequencies, control biases, gate parameters (such as duration and amplitude), parameterization of circuit models, etc. The goal of calibration is to experimentally determine and populate the registry with optimal control parameters. We typically store >100 parameters per qubit to achieve high fidelity across all of the various qubit operations. The large number of parameters and subtle interdependencies between them highlight the need for automated calibration.

FIG. S10. Optimus calibration graph for Sycamore. Calibration of physical qubits is a bootstrapping procedure between different pulse sequences or “experiments” to extract control and system parameters. Initial experiments are coarse and have interplay between fundamental operations and elements such as single-qubit gates, readout, and the coupler. Final experiments involve precise metrology for each of the qubit operations: single-qubit gates, two-qubit gates, and readout.
2. Scheduling calibrations: “Optimus”
We seek a strategy for identifying and maintaining optimal control parameters for a system of physical qubits given incomplete system information. To perform these tasks, we use the “Optimus” formulation as in Ref. [40], where each calibration is a node in a directed acyclic graph that updates one or more registry parameters, and the bootstrapping nature of calibration sequences is represented as directed edges between nodes. Calibrating a system of physical qubits then becomes a well-defined graph traversal problem. The calibration graph used for the Sycamore device can be seen in Figure S10. This strategy is particularly useful for maintaining calibrations in the presence of drift, where we want to do the minimal amount of work to bring the system back in spec, and when extending the calibration procedure, as interdependencies are explicit. Typical timescales for bringup of a new Sycamore processor are approximately 36 hours upon first cooldown, and 4 hours per day thereafter for maintaining calibrations. These times are specific to currently available technology, and can be significantly improved.
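The graph-traversal view of calibration can be illustrated with a small dependency graph. The sketch below is not the Optimus implementation of Ref. [40]; it is a simplified illustration using Python's standard `graphlib` module, and the node names and dependencies are hypothetical. It shows a full bring-up order and the minimal set of calibrations to repeat when one node drifts.

```python
from graphlib import TopologicalSorter

# Hypothetical calibration nodes; each maps to the set of nodes it depends on.
CALIBRATION_GRAPH = {
    "resonator_spectroscopy": set(),
    "qubit_spectroscopy": {"resonator_spectroscopy"},
    "power_rabi": {"qubit_spectroscopy"},
    "readout_optimization": {"power_rabi"},
    "pi_pulse_fine_tune": {"power_rabi"},
    "t1_vs_frequency": {"pi_pulse_fine_tune", "readout_optimization"},
}


def calibration_order(graph):
    """Return an order in which every node runs after its dependencies."""
    return list(TopologicalSorter(graph).static_order())


def nodes_to_recalibrate(graph, drifted):
    """Return the drifted node and all nodes downstream of it, in run order."""
    stale = {drifted}
    order = calibration_order(graph)
    for node in order:
        # A node is stale if any of its dependencies is stale.
        if graph[node] & stale:
            stale.add(node)
    return [node for node in order if node in stale]


if __name__ == "__main__":
    print("Full bring-up order:", calibration_order(CALIBRATION_GRAPH))
    print("After drift in power_rabi:",
          nodes_to_recalibrate(CALIBRATION_GRAPH, "power_rabi"))
```

Because dependencies are explicit, a drift in one node only triggers work on that node and its descendants, leaving upstream calibrations untouched.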
FIG. S11. Configurations of the device over the course of calibration. a, In the root configuration, we start with no knowledge of the system and measure basic device parameters. b, We create a single qubit configuration for each qubit, where all qubits except the qubit of interest are biased to near zero frequency. c, Using knowledge learned in the single qubit configurations, we build a grid of qubits.
B. Calibration procedure
1. Device configuration
Throughout the calibration procedure, the device registry may be configured in different states in order to calibrate certain parameters. We call these different states “device configurations”, and different kinds of configurations reflect our knowledge of the system at different points in the full calibration procedure. As illustrated in Figure S11, the primary difference between the different configurations is the set of “active” qubits, where active qubits are biased to an operating frequency between 5-7 GHz, and “inactive” qubits are biased near zero frequency. Following the outline above, we have three device configurations of interest:
a. Root config. The root configuration is the starting state of the system immediately after cool down and basic system verification. In this configuration, we calibrate coarse frequency vs bias curves for each readout resonator, qubit, and coupler.
b. Single qubit config. After completing root calibrations, we now know how to bias each qubit to its minimum and maximum frequencies. We create one configuration of the device registry for each qubit, where the qubit of interest is biased in a useful region (5-7 GHz) and the remaining qubits are biased to their minimum frequencies in order to isolate the qubit of interest. In each of these configurations, we fine tune the bias vs frequency curves for the qubit and its associated couplers and resonators, and also measure T1 as a function of frequency, necessary due to background TLS defects and modes.
c. Grid config. After completing calibrations in each isolated qubit configuration, we feed the information we learned into a frequency optimization procedure. The optimizer places the biases for each qubit and coupler in a user-defined grid of any desired size up to the entire chip. We then proceed to calibrate high-fidelity single-qubit gates, two-qubit gates, and readout.
2. Root config: procedure
We begin calibration with simple frequency-domain experiments to understand how each qubit and coupler responds to its flux bias line.
• Calibrate each parametric amplifier (flux bias, pump frequency, pump power).
• For each qubit, identify its readout resonator and measure the readout signal versus qubit bias (“Resonator Spectroscopy”) [41]. Estimate the resonator and qubit frequency as a function of qubit bias.
• For each coupler, place one of its qubits near maximum frequency and the other near minimum frequency, then measure the readout signal of the first qubit as a function of coupler bias. The readout signal changes significantly as the coupler frequency passes near the qubit frequency. Identify where the coupler is near its maximum frequency, so the qubit-qubit coupling is small (a few MHz) and relatively insensitive to coupler bias.
3. Single-qubit config: procedure
After setting the biases to isolate a single qubit, we follow the procedure outlined in [42] which we will summarize here:
• Perform fixed microwave drive qubit spectroscopy while sweeping the qubit bias and detecting shifts in the resonator response, to find the bias that places the qubit at the desired resonant frequency.
• Using the avoided level crossing identified in the root config, determine the operating bias to bring the qubit on resonance with its readout resonator to perform active ground state preparation. We use a 10 µs pulse consistent with the readout resonator ringdown time.
• Perform power Rabi oscillations to find the drive power that gives a π pulse to populate the $|1\rangle$ state.
• Optimize the readout frequency and power to maximize readout fidelity.
• Fine tune parameters (qubit resonant frequency, drive power, drive detuning [43]) for π and π/2 pulses.
• Calibrate the timing between the qubit microwave drive, qubit bias, and coupler bias.
• Perform qubit spectroscopy as a function of qubit bias to fine tune the qubit bias vs frequency curves.
• Measure $T_1$ vs. frequency by preparing the qubit in $|1\rangle$, then biasing the qubit to a variable frequency for a variable amount of time, and measuring the final population [44].
• Measure the response of a qubit to a detuning pulse to calibrate the frequency-control transfer function [6, 42, 45].
With the single qubits calibrated in isolation, we have a wealth of information on circuit parameters and coherence for each qubit. We use this information as input to a frequency placement algorithm to identify optimal operating frequencies for when the full processor is in operation.
4. Optimizing qubit operating frequencies
In our quantum processor architecture, we can independently tune each qubit's operating frequency. Since qubit performance varies strongly with frequency, selecting good operating frequencies is necessary to achieve high-fidelity gates. In arbitrary quantum algorithms, each qubit operates at three distinct types of frequencies: idle, interaction, and readout frequencies. Qubits idle and execute single-qubit gates at their respective idle frequencies. Qubit pairs execute two-qubit gates near their respective interaction frequencies. Finally, qubits are measured at their respective readout frequencies. In selecting operating frequencies, it is necessary to mitigate and make nontrivial tradeoffs between energy relaxation, dephasing, leakage, and control imperfections. We solve and automate the frequency selection problem by abstracting it into an optimization problem.
We construct a quantum-algorithm-dependent and gate-dependent optimization objective that maps operating frequencies onto a metric correlated with system error. The error mechanisms embedded within the objective function are parasitic coupling between nearest-neighbor and next-nearest-neighbor qubits, spectrally-diffusing two-level-system (TLS) defects [44], spurious microwave modes, coupling to control lines and the readout resonator, frequency-control electronics noise, frequency-control pulse distortions, microwave-control pulse distortions, and microwave-carrier bleedthrough. Additional considerations in selecting readout frequencies are covered in Section VI D. The objective is constructed from experimental data and numerics, and the individual error mechanisms are weighted by coefficients determined either heuristically or through statistical learning.
FIG. S12. Idle frequency solutions found by our Snake optimizer with different error mechanisms enabled. The optimizer makes increasingly complex tradeoffs as more error mechanisms are enabled. These tradeoffs manifest as a transition from a structured frequency configuration into an unstructured one. Similar tradeoffs are simultaneously made in optimizing interaction and readout frequencies. Optimized idle and interaction operating frequencies are shown in Figure S13 and optimized readout frequencies are shown in Figure S20. Color scales are chosen to maximize contrast. Grey indicates that there is no preference for any frequency. (Panels: Ideal; + Control Noise; + Pulse Distortions; + NN Parasitic Coupling; + NNN Parasitic Coupling; + TLS, Modes, Control Lines, ... Color scale: idle frequency, a.u.)

Minimizing the objective function is a complex combinatorial optimization problem. We characterize the complexity of the problem by the optimization dimension and search space. For a processor with N qubits on a square lattice with nearest-neighbor coupling, there are N idle, N readout, and 2N interaction frequencies to optimize. In an arbitrary quantum algorithm, all frequencies are potentially intertwined due to coupling between qubits. Therefore, the optimization dimension is 4N. The optimization search space is constrained by the qubits' circuit parameters and control-hardware specifications. Discretizing each qubit's operational range to 100 frequencies results in an optimization search space of $100^{4N}$. This is much larger than the dimension of the Hilbert space of an N-qubit processor, which is $2^N$.
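To give a sense of scale, the two quantities compared above can be evaluated for the 53-qubit processor. The sketch below is a simple arithmetic illustration; the function names are hypothetical, and the counts follow the estimates in the text (4N frequencies, 100 candidate values each).

```python
import math


def search_space_log10(num_qubits, freqs_per_parameter=100):
    """log10 of the number of frequency configurations, i.e. of 100**(4N)."""
    dimension = 4 * num_qubits  # N idle + N readout + 2N interaction frequencies
    return dimension * math.log10(freqs_per_parameter)


def hilbert_space_log10(num_qubits):
    """log10 of the Hilbert-space dimension 2**N."""
    return num_qubits * math.log10(2)


if __name__ == "__main__":
    n = 53
    print(f"search space   ~ 10^{search_space_log10(n):.0f}")
    print(f"Hilbert space  ~ 10^{hilbert_space_log10(n):.1f}")
```

For N = 53 the search space is of order $10^{424}$, compared with a Hilbert-space dimension of about $10^{16}$.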
Given the problem complexity, it is assumed that finding globally optimal operating frequencies is intractable. However, we have empirically verified that locally optimal solutions are sufficient for state-of-the-art system performance. To find local optima, we developed the “Snake” homebrew optimizer that combines quantum algorithm structure with physics intuition to exponentially reduce optimization complexity and take intelligent optimization steps. For the circuits used here, the optimizer exploits the time-interleaved structure of single-qubit gates, two-qubit gates, and readout. For our 53-qubit processor, it returns local optima in 10 seconds on a desktop. Because of its favorable scaling in runtime versus number of qubits, we believe the Snake optimizer is a viable long-term solution to the frequency selection problem.
To illustrate how the Snake optimizer makes tradeoffs between error mechanisms, we plot idle frequency solutions with different error mechanisms enabled (Figure S12). Starting with an ideal processor with no error mechanisms enabled, there is no preference for any frequency configuration. Enabling frequency-control electronics noise, the optimizer pushes qubits towards their respective maximum frequencies, to minimize flux-noise susceptibility. Note that each qubit has a different maximum frequency due to fabrication variability. Enabling frequency-control pulse distortions forces a gradual transition between qubit frequencies to minimize two-qubit-gate frequency-sweep amplitudes. Enabling nearest-neighbor (NN) and next-nearest-neighbor (NNN) parasitic coupling further lowers the degeneracy between qubit frequencies into a structure that resembles a multi-tiered checkerboard. Finally, enabling errors from TLS defects, spurious microwave modes, and all other known error mechanisms removes any obvious structure. A set of optimized idle and interaction frequencies is shown in Figure S13, and readout frequencies are shown in Figure S20.
5. Grid config: procedure
Calibrating a grid of qubits follows the same procedure as calibrating an isolated qubit with additional calibrations to turn off the qubit-qubit coupling.
• Achieve basic state discrimination for each qubit at its desired frequency.
• For each coupler, minimize the qubit-qubit coupling (note that changing coupler biases affects qubit frequencies). For each case below, we choose the coupler bias minimizing the interaction.
  - For qubit pairs idling within 60 MHz of each other, use a resonant swapping experiment. We excite one qubit and apply flux pulses to nominally put the qubits on resonance and let the qubits interact over time [9].
  - For qubit pairs idling further apart, use a conditional phase experiment. We perform two Ramsey experiments on one qubit, with the other qubit in the ground state and in the excited state, to identify the state-dependent frequency shift of the first qubit.
• Adjust the qubit biases to restore the desired qubit frequencies and proceed with qubit calibration as in the single-qubit configurations.
• Calibrate the entangling gate.
  - Estimate the qubit pulse amplitudes to reach the desired interaction frequency with their frequency versus bias calibration.
  - Fine-tune the qubit pulse amplitudes to reach resonance, compensating for pulse undershoot.
  - Tune the coupler pulse amplitude to achieve a complete photon exchange.
In the next two sections, we describe in more detail the fine tuning required to achieve high fidelity two qubit gates and multiqubit readout.
C. Two-qubit gate metrology
High-fidelity two-qubit gates are very hard to achieve. In an effort to make this easier, we design qubits with tunable frequencies and tunable interactions. This added control allows for immense flexibility when implementing gates. In the following subsections, we discuss a simple high-fidelity control and metrology strategy for two-qubit gates in our system.
1. The natural two-qubit gate for transmon qubits
FIG. S13. Optimized idle and interaction frequencies found by our Snake optimizer. a, Idle frequencies (GHz), b, interaction frequencies (GHz). Readout frequencies are shown in Figure S20. These solutions are sufficient for state-of-the-art system performance. See Figure S12 to understand some of the tradeoffs that are made during optimization. Color scales are chosen to maximize contrast.
Consider two transmon qubits at different frequencies (say 6.0 and 6.1 GHz). Here are two potential ways of generating a multi-qubit gate in this system. If the qubits are tuned into resonance, then excitations swap back and forth and this interaction can be modeled as a partial-iSWAP gate [46]. If the qubits are detuned by an amount close to their nonlinearity, then the $|11\rangle$ state undergoes an evolution that can be modeled as a controlled-phase gate (assuming the population does not leak) [47, 48]. In fact, any two-qubit control sequence that does not leak can be modeled as a partial-iSWAP followed by a controlled-phase gate.
A typical control sequence is shown in Fig. S14a. Gate times of 12 ns are chosen to trade off decoherence (too slow) and leakage to higher states of the qubit (too fast). Figure S14b depicts how this operation can be decomposed as a quantum circuit. This circuit contains Z-rotations that result from the frequency excursions of the qubits, and can be expressed by the unitary:

$$\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & e^{i(\Delta_+ + \Delta_-)} \cos\theta & -i e^{i(\Delta_+ - \Delta_{-,\rm off})} \sin\theta & 0 \\
0 & -i e^{i(\Delta_+ + \Delta_{-,\rm off})} \sin\theta & e^{i(\Delta_+ - \Delta_-)} \cos\theta & 0 \\
0 & 0 & 0 & e^{i(2\Delta_+ - \phi)}
\end{pmatrix}. \tag{43}$$
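As a concrete check of this parameterization, the sketch below builds the $4 \times 4$ matrix of Eq. (43) and verifies that it is unitary. It is an illustration only: the function names are hypothetical, and the $-i$ prefactors on the off-diagonal terms are an assumption, chosen to agree with the remark in Fig. S14 that the fSim swap angle has the opposite sign to the common iSWAP convention.

```python
import cmath
import math


def fsim_unitary(theta, phi, delta_plus=0.0, delta_minus=0.0, delta_minus_off=0.0):
    """Five-parameter two-qubit unitary of Eq. (43) in the basis 00, 01, 10, 11.

    The -1j factors on the swap terms are an assumed sign convention.
    """
    c, s = math.cos(theta), math.sin(theta)

    def phase(x):
        return cmath.exp(1j * x)

    return [
        [1, 0, 0, 0],
        [0, phase(delta_plus + delta_minus) * c,
         -1j * phase(delta_plus - delta_minus_off) * s, 0],
        [0, -1j * phase(delta_plus + delta_minus_off) * s,
         phase(delta_plus - delta_minus) * c, 0],
        [0, 0, 0, phase(2 * delta_plus - phi)],
    ]


def is_unitary(u, tol=1e-12):
    """Check that the rows of u are orthonormal, i.e. u times u-dagger is the identity."""
    n = len(u)
    for i in range(n):
        for j in range(n):
            inner = sum(u[i][k] * complex(u[j][k]).conjugate() for k in range(n))
            if abs(inner - (1 if i == j else 0)) > tol:
                return False
    return True


if __name__ == "__main__":
    # Sycamore-like gate: swap angle 90 degrees, conditional phase 30 degrees.
    sycamore = fsim_unitary(math.pi / 2, math.pi / 6)
    print("unitary:", is_unitary(sycamore))
    print("01 -> 10 amplitude:", sycamore[2][1])
```

With all single-qubit phases set to zero, the matrix reduces to the two-parameter fSim(θ, φ) form, with a full swap at θ = 90° and a phase of $e^{-i\phi}$ on the 11 state.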
These gates have an efficient mapping to interacting fermions and have been coined fSim gates, short for fermionic simulation [49]. The long-term goal is to implement the entire space of gates (shown in Fig. S14c). For quantum supremacy, the two-qubit gate of choice is the iSWAP gate. For example, CZ is less computationally expensive to simulate on a classical computer by a factor of two [38, 50]. A dominant error mechanism when trying to implement an iSWAP is a small conditional phase that is generated by an interaction of the $|11\rangle$ state with higher states of the transmons ($|02\rangle$ and $|20\rangle$). For this reason, the fSim gate with swap angle $\theta \approx 90^\circ$ and conditional phase $\phi \approx 30^\circ$ has become the gate of choice in our supremacy experiment. Note that small deviations from these angles are also viable quantum supremacy gates. These gates result from the natural evolution of two qubits, making them easy-to-calibrate, high-intrinsic-fidelity gates for quantum supremacy.

FIG. S14. Two-qubit gate strategy. a, Control waveforms for two qubits and a coupler. Each curve represents the control flux applied to the qubits' and coupler's SQUID loops as a function of time. b, Generic circuit representation for an arbitrary two-qubit gate using flux pulses. This family of gates has been named “fSim” gates, short for fermionic-simulation gates. Our definition of the fSim gate uses θ with the sign opposite to the common convention for the iSWAP gate. c, Control landscape for fSim gates as a function of the swap angle and conditional phase, up to single qubit rotations. The coordinates of common entangling gates are marked along with the Sycamore gate fSim($\theta = 90^\circ$, $\phi = 30^\circ$).
2. Using cross entropy to learn a unitary model
We have recently introduced cross-entropy as a fidelity metric for quantum supremacy experiments. Cross-entropy benchmarking (XEB) was introduced as an analog to randomized benchmarking (RB) that can be used with any number of qubits and is independent of state-preparation and measurement errors [6, 27].
A distinct advantage of XEB is that the resulting data can be analyzed to find an optimal representation of a unitary; this process is outlined in Fig. S15. The gate sequence for a two-qubit XEB experiment is shown in Fig. S15a. The sequence alternates between single-qubit gates on both qubits and a two-qubit gate between them. At the end of the sequence, both qubits are measured and the probabilities of bitstrings (00, 01, 10, 11) are estimated. This procedure is repeated for 10-20 instances of randomly selected single-qubit gates. The measured probabilities can then be compared to the ideal probabilities using the expression for fidelity Eq. (3) in Ref. [6].
The data from a two-qubit XEB experiment is shown in Fig. S15b (green dots). By performing additional sequences with tomography rotations prior to measurement, we can infer the decay of purity with increasing circuit depth (blue dots). For two qubits, the decay of fidelity tells us the total error of our gates while the purity decay tells us the contribution from decoherence; the difference is control error. Based on the data in green and blue, it appears that the total error is about half control and half decoherence.
So far, we have established a generic unitary model (Fig. S14b), a training dataset (Fig. S15a), and a cost function (Fig. S15b). These three ingredients form the foundation for using optimization techniques to improve fidelity. Using a simple Nelder-Mead optimization protocol, we can maximize the XEB fidelity by varying the parameters of the unitary model. The fidelity decay curve for the optimal unitary model is shown in Fig. S15b (orange dots). The optimized results are nearly coherence limited.
The optimal control-model parameters for all pairs are shown as integrated histograms in Fig. S16a,b. Panel (a) shows the histograms for partial-iSWAP angles (90 degrees) and conditional phases (30 degrees). Panel (b) shows histograms for the various flavors of Z-rotations. While conceptually there are four possible Z-rotations (see Fig. S14b), only three of these rotations are needed to uniquely define the operation. These three rotations can be thought of as the detuning of the qubits before the iSWAP, the detuning after the iSWAP, and an overall frequency shift of both qubits which commutes with the iSWAP.
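The counting of Z-rotations can be checked numerically. The sketch below is illustrative only: it assumes the basis ordering |00⟩, |01⟩, |10⟩, |11⟩, takes the off-diagonal entries of Eq. (43) as −i sin θ (following the sign convention noted in the caption of Fig. S14), and uses a hypothetical helper `z_phases`. Setting the phase on qubit 1 before the gate to zero is an arbitrary gauge choice made here, not one stated in the text.

```python
import numpy as np

def fsim(theta, phi):
    """Two-parameter fSim gate in the basis |00>, |01>, |10>, |11>."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0, 0, 0],
                     [0, c, -1j * s, 0],
                     [0, -1j * s, c, 0],
                     [0, 0, 0, np.exp(-1j * phi)]])

def five_param_unitary(theta, phi, d_plus, d_minus, d_minus_off):
    """Five-parameter model of Eq. (43), with the sign convention assumed above."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([
        [1, 0, 0, 0],
        [0, np.exp(1j * (d_plus + d_minus)) * c,
         -1j * np.exp(1j * (d_plus - d_minus_off)) * s, 0],
        [0, -1j * np.exp(1j * (d_plus + d_minus_off)) * s,
         np.exp(1j * (d_plus - d_minus)) * c, 0],
        [0, 0, 0, np.exp(1j * (2 * d_plus - phi))]])

def z_phases(a1, a2):
    """Hypothetical helper: phase a1 on |1> of qubit 1, phase a2 on |1> of qubit 2."""
    return np.diag(np.exp(1j * np.array([0.0, a2, a1, a1 + a2])))

# Arbitrary illustrative parameter values (radians).
theta, phi = 1.52, 0.49
d_plus, d_minus, d_minus_off = 0.3, -0.7, 1.1

# Gauge choice: no rotation on qubit 1 before the gate, leaving three angles.
before = z_phases(0.0, d_minus + d_minus_off)
after = z_phases(d_plus - d_minus, d_plus - d_minus_off)

decomposed = after @ fsim(theta, phi) @ before
target = five_param_unitary(theta, phi, d_plus, d_minus, d_minus_off)
matches = np.allclose(decomposed, target)
print(matches)
```

In this gauge the phase applied before the gate is ∆− + ∆−,off and the difference between the two phases applied afterwards is ∆− − ∆−,off, which are the "detuning before" and "detuning after" combinations listed in the caption of Fig. S16.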
3. Comparison with randomized benchmarking
In Fig. S17 we show that two-qubit gate fidelity extracted using XEB agrees well with the fidelity as measured with RB, an important sanity check in validating XEB as a gate metrology tool. In two-qubit XEB, we extract the error per cycle which consists of a single-qubit gate on each qubit and a two-qubit gate between them.
[Fig. S15b: XEB fidelity versus number of cycles (0 to 1000). Annotated values: purity error = 0.541(3)%, XEB error (optimized) = 0.62(2)%, XEB error = 1.06(5)%. Fig. S16: integrated histograms versus control angles (degs.); panel a, conditional phase ϕ (near 30°) and swap angle θ (near 90°); panel b, frequency shift (2∆+), detuning before (∆− + ∆−,off), detuning after (∆− − ∆−,off).]
FIG. S15. Using XEB to learn a unitary model. a, Process flow diagram for using XEB to learn a unitary model. After running basic calibrations, we have an approximate model for our two-qubit gate. Using this gate, we construct a random circuit that is fed into both the quantum computer and a classical computer. The results of both outputs can be compared using cross-entropy. Optimizing over the parameters in the two-qubit model provides a high-fidelity representation of the two-qubit unitary. b, Data from a two-qubit XEB experiment. The two-qubit purity (blue) was measured tomographically and provides the coherence limit of the operations. The decay of the XEB fidelity is shown in green and orange. In orange, the parameters of a generic unitary model were optimized to determine a higher-fidelity representation of the unitary. All errors are quoted as Pauli errors.
FIG. S16. Parameters of the control model. A generic model for two-qubit gates using flux-control has five free parameters. Using XEB we can measure these parameters with high fidelity. a, Integrated histogram (cumulative distribution) of the control parameters that determine the interaction between the qubits. b, An integrated histogram of the remaining three parameters that represent different flavors of single-qubit Z-rotations. While the first two parameters (panel a) define the entangling gate, the final three parameters (panel b) are simply measured and then kept track of during an algorithm. Intuitively, these three angles correspond to a detuning before the swap, a detuning after the swap, and an overall frequency shift which commutes through the swap; these correspond to ∆− +∆−,off , ∆− −∆−,off , and 2∆+ respectively in Eq. (43). Note that θ and φ angles are 360 degrees periodic and Z-rotation angles are 720 degree periodic.
In Fig. S17a we show the individual RB decay curves for single-qubit gates. In panel b, we show the RB decay curve for benchmarking a CZ gate. Adding up the three errors from RB, we would expect an XEB cycle error of 0.57%. In panel c, we show the measured XEB decay curve, which indicates a cycle error of 0.59%, nearly identical to the value predicted by RB.
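The arithmetic behind the RB prediction can be written out using the values annotated in Fig. S17 (single-qubit errors of 0.09% and 0.07%, CZ error of 0.41%). The symbol e_cycle is introduced here only for illustration, and the sum assumes that small errors simply add within a cycle:

```latex
\[
e_{\mathrm{cycle}} \approx 0.09\% + 0.07\% + 0.41\% = 0.57\%
\]
```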
For single-qubit gate benchmarking on the Sycamore device used in this work (see Table II), we find that π pulse fidelities are somewhat worse than π/2 pulse fidelities, which we attribute to reflections from the imperfect microwave environment. Because the XEB gateset we have used consists only of π/2 pulses, we find that the single-qubit gate errors extracted from conventional RB, which contains π pulses, are somewhat higher than those extracted from single-qubit XEB. Using only π/2 pulses instead of π pulses in single-qubit RB brings the extracted error close to that measured via XEB.
[Fig. S17a: single-qubit RB, fidelity versus number of Cliffords; annotated 1Q gate errors = 0.09% and 0.07%.]
4. Speckle purity benchmarking (SPB)
It is experimentally useful to be able to extract state purity from XEB experiments in order to error-budget the contribution of decoherence. Conventionally, purity estimation can be done with state tomography, where the full density matrix ρ is reconstructed and used to quantify the state purity. This involves expanding a single sequence into a collection of sequences each appended with single-qubit gates. Unfortunately, full tomographic reconstruction scales exponentially in the number of qubits, both for the number of sequences needed as well as the number of measurements needed per sequence. Here, we introduce an exponentially more efficient method to extract the state purity without additional sequences.
We use a re-scaled purity definition such that a fully-decohered state has a purity of 0, and a pure state has a purity of 1. We define
\[
\mathrm{Purity} = \frac{D}{D-1}\left(\mathrm{Tr}(\rho^2) - \frac{1}{D}\right),
\tag{44}
\]
which is consistent with what is defined in Ref. [51]. This can be understood as the squared length of the generalized Bloch vector in D dimensions (for a qubit, D = 2, this definition gives ⟨X⟩² + ⟨Y⟩² + ⟨Z⟩²).
Speckle Purity Benchmarking (SPB) is the method of measuring the state purity from raw XEB data. Assuming the depolarizing-channel model with polarization parameter p, we can model the quantum state as
\[
\rho = p\,|\psi\rangle\langle\psi| + (1-p)\,\frac{\mathbb{1}}{D}.
\tag{45}
\]
Here, p is the probability of a pure state |ψ⟩ (which in this case is not necessarily known to us), while 1 − p is the probability of being in the fully-decohered state (𝟙 is the identity operator). For the state (45), from the definition (44) it is easy to find the relation
\[
\mathrm{Purity} = p^2.
\tag{46}
\]
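For completeness, the intermediate step can be written out. Using Tr(|ψ⟩⟨ψ|) = 1 and Tr(𝟙) = D in the state (45), and then inserting the result into the definition (44):

```latex
\begin{align}
\mathrm{Tr}(\rho^2) &= p^2 + \frac{2p(1-p)}{D} + \frac{(1-p)^2}{D}
                     = p^2 + \frac{1-p^2}{D}, \\
\mathrm{Purity} &= \frac{D}{D-1}\left(p^2 + \frac{1-p^2}{D} - \frac{1}{D}\right)
                 = \frac{D}{D-1}\cdot\frac{(D-1)\,p^2}{D} = p^2 .
\end{align}
```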
We will now work out how to obtain p² from a distribution of measured probabilities P_m of various bitstrings for a sequence, collected over many XEB sequences (Figs. S18a and S18b).
[Fig. S17b: two-qubit RB (reference and interleaved CZ), fidelity versus number of Cliffords; annotated CZ error = 0.41%. Fig. S17c: two-qubit XEB, XEB fidelity versus number of cycles; annotated XEB error = 0.59%.]
FIG. S17. Sanity check: XEB agrees with RB. a, Single-qubit randomized benchmarking (RB) data taken separately on two qubits. b, Two-qubit randomized benchmarking data for a CZ on the same pair of qubits. c, Two-qubit cross-entropy benchmarking (XEB) on the same pair of qubits. The measured XEB error (0.59% / cycle) agrees well with the prediction from single- and two-qubit RB (0.57%). All errors are quoted as Pauli errors.
First, we note that for p = 0 the probabilities of all bitstrings are 1/D, and the distribution is the δ-function located at 1/D (the integrated histogram is then the step function; see Fig. S18b). In contrast, if p = 1, then the measured probabilities P_m follow the D-dimensional Porter-Thomas distribution [27]
\[
P_{\mathrm{PT}}(P_m) = (D-1)(1-P_m)^{D-2},
\tag{47}
\]
which has the same average 1/D and variance
\[
\mathrm{Var}_{\mathrm{PT}}(P_m) = \frac{D-1}{D^2(D+1)}.
\tag{48}
\]
For the fully-decohered state all bitstrings have the same probability 1/D, so in this case the variance of the distribution of probabilities is zero. For the state (45) with an arbitrary p, the histogram of probabilities P_m will be described by the distribution (47) shrunk towards the average 1/D by the factor p. Consequently, the variance of the experimental probabilities will be p² times the Porter-Thomas variance (48).
Thus, we can find p² by dividing the variance of experimentally measured probabilities P_m by the Porter-Thomas variance (48). Finally, using the relation (46) for the depolarization model (45), we can relate the variance of the experimental probabilities P_m to the average state purity
\[
\mathrm{Purity} = \mathrm{Var}(P_m)\,\frac{D^2(D+1)}{D-1}.
\tag{49}
\]
With these convenient relations, we can directly compare the XEB fidelity F_XEB = p to √Purity from SPB on the same scale, and check their dependence p = p_c^m on the number of cycles m. Without systematic control errors, the XEB and SPB results should coincide. Experimentally, we always have control errors which lead us to incorrectly predict |ψ⟩, so control errors give XEB a higher error than SPB. Thus, with a single XEB dataset we can extract the XEB error per cycle and, with SPB, the purity loss per cycle. Subtracting these leaves the control error per cycle, so a single experiment suffices to budget the total error into control error and decoherence error.
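The variance relation (49) can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the analysis code used for the experiment: Haar-random two-qubit states stand in for the outputs of random circuits, probabilities are taken as exact (no finite-sampling noise), and the values of `p` and `num_states` as well as the function name `speckle_purity` are chosen here for illustration.

```python
import numpy as np

def speckle_purity(probs, dim):
    """Purity estimate from the variance of bitstring probabilities, Eq. (49)."""
    return np.var(probs) * dim**2 * (dim + 1) / (dim - 1)

rng = np.random.default_rng(seed=0)
dim = 4             # two qubits, D = 2^2
num_states = 20000  # stand-in for many random circuit instances
p = 0.7             # depolarization parameter, chosen for illustration

# Haar-random pure states: normalized complex Gaussian vectors.
amps = rng.normal(size=(num_states, dim)) + 1j * rng.normal(size=(num_states, dim))
amps /= np.linalg.norm(amps, axis=1, keepdims=True)
pure_probs = np.abs(amps) ** 2

# Depolarizing model of Eq. (45): probabilities shrink toward 1/D by the factor p.
mixed_probs = p * pure_probs + (1 - p) / dim

purity_pure = speckle_purity(pure_probs, dim)    # close to 1
purity_mixed = speckle_purity(mixed_probs, dim)  # close to p**2
print(purity_pure, purity_mixed, p**2)
```

Because the mixed probabilities are an affine rescaling of the pure ones, `purity_mixed` equals `p**2 * purity_pure` up to rounding; the remaining deviation from p² reflects only the finite number of sampled states.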
These relationships can be seen experimentally in Figure S18. Amazingly, computing the speckle purity can be done with no knowledge of the specific gate sequence performed; as long as the experiment introduces sufficient randomization of the Hilbert space, Porter-Thomas statistics apply. Practically, SPB allows us to measure the state purity from raw XEB data with exponentially fewer pulse sequences than full state tomography. This favorable scaling allows one to extend purity measurements to larger numbers of qubits. It is important to note that an exponential number of measurements is still required to fully characterize the probability distribution for a given sequence, as in tomography, so purity measurements of the full processor are impractical.
5. “Per-layer” parallel XEB
To execute quantum circuits efficiently, it is helpful to run as many gates as possible in parallel. We wish to benchmark our entangling gates operating simultaneously. Resulting fidelities and optimized unitaries may differ from the isolated case, where we benchmark each pair individually, due to imperfections such as control crosstalk and stray qubit-qubit interactions. In the quantum supremacy algorithm, we partition the set of two-qubit gates into four layers, each of which can be executed in parallel. We then cycle through these layers interleaved with randomly chosen single-qubit gates (see Fig. 3a). However, it is intractable to directly use full-system XEB to benchmark our entangling gates for two reasons: we would have to simultaneously optimize over the unitary model parameters of every entangling gate, and the classical simulation would be exponentially expensive in system size.
FIG. S18. “Speckle” purity extracted from XEB. a, Measured probabilities from XEB for a two-qubit system and 30 random circuits. Raw probabilities show a speckle pattern at low cycles (orange dashed) over circuit instance and probabilities (|00⟩, |01⟩, |10⟩, |11⟩). The speckle contrast decreases with cycles and thus decoherence (green dashed). b, Integrated histogram (cumulative distribution) of probabilities. The x-axis is scaled by the dimension D = 2², so the uniform distribution is a step function at 1.0. At low cycles, the distribution is well-described by Porter-Thomas, and at high cycles, the distribution approaches the uniform distribution. c, We can directly relate the variance of the distribution to the average state purity. We fit an exponential to the square root of Purity. We compare this purity-derived number per cycle (0.00276) to a similar number per cycle (0.00282) derived from a tomographic measure of purity, and see good agreement. The error of XEB, which also includes control errors, is slightly higher at 0.00349 per cycle.
We solve this problem with “per-layer” parallel XEB (see Ref. [52] for a related technique in the context of RB). Instead of alternating among the four layers of entanglers, where each qubit becomes entangled with each of its neighbors, we perform four separate experiments, one for each layer. The experiment sequences are illustrated in Fig. S19a. For each layer, we construct parallel sequences where the layer is repeated with interleaved single-qubit gates; nominally, each qubit only interacts with one other. Following each parallel XEB sequence, we measure all the qubits and extract the equivalent XEB data for each pair. Every two-qubit gate can be characterized in these four experiments, regardless of system size. The optimization and classical simulation are also efficient, as each pair can be analyzed individually.
We present experimental results of “per-layer” parallel XEB in Fig. S19b-c. In Fig. S19b, we compare the performance in the isolated and simultaneous (parallel) experiments. In both cases, the optimized XEB error is close to purity-limited. Simultaneous operation modestly increases the error, by roughly 0.003. This increase is primarily from purity error, which would arise from unintended interactions with other qubits, where coherent errors at the system scale manifest as incoherent errors when we focus on individual pairs. The unitaries we obtain in the simultaneous case differ slightly from the isolated case, which would arise from control crosstalk and unintended interactions. To quantify how these differences affect the gate error, we recalculate the error with the unitaries from the isolated optimization and the data from the simultaneous experiment, which increases the error. We also plot the distributions of the differences in unitary model parameters in Fig. S19c. The dominant change is in ∆+, a single-qubit phase.
D. Grid readout calibration
1. Choosing qubit frequencies for readout
The algorithm described in Section VI B 4 generally chooses qubit idling frequencies which are far detuned from the resonator to optimize for dephasing. However, these idling frequencies are not optimal for performing readout. To address this problem, we dynamically bias each qubit to a different frequency during the readout phase of the experiment. The qubit frequencies during readout are shown in Fig. S20 (compare to Fig. S13).
To choose the qubit frequencies for readout, we first measure readout fidelity as a function of qubit frequency and resonator drive frequency at a fixed resonator drive power, in each of the isolated single qubit configurations. This scan captures errors due to both non-optimal detuning between the qubit and resonator, as well as regions with low T1 values due to TLSs. We then use the data for each qubit and a few constraints to optimize the placement of the qubit frequencies during readout, using the same optimization technique that was described in Section VI B 4. We describe two of the important constraints and related error reduction techniques below.
First, because the coupling between qubits relies on a dispersive interaction with the coupler, the coupling would no longer be off when the qubits were detuned by a significant amount from their idling positions. Thus, we impose a constraint that qubits should not be placed near resonance during readout. Nevertheless, we found that for some pairs of qubits, we had to dynamically bias the coupler during readout to avoid any swapping transitions between the qubits during readout. This readout coupler bias is found by sweeping the coupler bias and maximizing the two-qubit readout fidelity.
Second, the pattern of the bare resonator frequencies on the chip as shown in Fig. S20 led to an unexpected problem. Pairs of readout resonators which were coupled to neighboring qubits and were also within a few MHz in frequency space were found to have non-negligible coupling. This coupling was strong enough to mediate swapping of photons from one resonator to the other. The pairs of qubits with similar resonator frequencies were all located in a diagonal chain bisecting the qubit grid, as shown by the red outline in Fig. S20. To mitigate this problem, we arrange the qubit frequencies for these qubits so that the resonator eigenfrequencies are as far apart as possible. The resulting spectral separation is not quite enough to eliminate all deleterious effects, so in addition, we use correlated discrimination on the eight qubits in this chain. In other words, we use the results of all eight detector values to determine which one of 2^8 = 256 states the eight qubits were in. All other qubits in the grid are discriminated as isolated qubits.
2. Single qubit calibration
After placing the qubit frequencies for readout, we calibrate and fine tune the readout parameters for each qubit. For each qubit, we use a 1 µs drive pulse and a 1 µs demodulation window. We summarize the procedure for choosing the remaining parameters as follows:
• Choose the resonator drive frequency to maximize the separation between measurements performed with the qubit in either |0⟩ or |1⟩ [16].
• Choose the resonator drive power to hit a target separation between |0⟩ and |1⟩, so that the error due to this separation is below a 0.3% threshold. We do not choose the readout power to maximize the separation, as doing so would saturate our amplifiers and cause unwanted transitions of the qubit state [17, 22, 53, 54].
• Find the optimal demodulation weight function by measuring the average detector voltage as a function of time during the course of the readout pulse [16, 21].
• Finally, choose the discrimination line between the measurement results for |0⟩ and |1⟩, except as noted in the previous section where we need to apply correlated discrimination.
FIG. S19. Parallel XEB. a, Schematics of four device-wide sequences, one for each entangler layer (Layer 0 to Layer 3). Black points are active qubits, colored circles are single-qubit gates, and colored lines are two-qubit gates. We cycle between single- and two-qubit gates m times. Compare to Fig. 3a, main text, where the layers are interleaved. b, Integrated histograms of Pauli error e2c (see Fig. 2a, main text). These include isolated results, where each entangler is measured in its own experiment, and simultaneous (parallel) results. Purity is speckle purity. c, Difference, δ, in unitary model parameters (Eq. 43) between the unitaries obtained in the isolated and simultaneous experiments. δ∆− is not plotted because it has a negligible effect on the unitary when θ ≈ 90 degrees.
After completing these calibrations, we check each qubit's readout fidelity by preparing either |0⟩ or |1⟩ and reading the qubit out. We define the identification error to be the probability that the qubit was not measured in the state we intended to prepare. We achieve 0.97% median identification error for the |0⟩ state, and 4.5% for |1⟩, when each qubit is measured in isolation. The full distribution is shown in dashed lines in Fig. S21a. We conjecture that the error in |0⟩ is due to thermal excitation during preparation or measurement, and that the error in |1⟩ is due to energy relaxation during readout.
3. Characterizing multi-qubit readout
To assess the fidelity of multi-qubit readout, we prepare and measure 150 random classical bitstring states with 53 qubits, with 3000 trials per state. We find that 13.6% of all trials successfully identified the prepared state. We can decompose this overall fidelity in two ways. First, we plot in solid lines in Fig. S21 the errors for each qubit during simultaneous readout, averaged over the 150 random bitstrings. We find that the median errors increase from 0.97% for |0⟩ and 4.5% for |1⟩ in isolation, to 1.8% and 5.1% for simultaneous readout. We do not yet understand the root causes of this increase in error. In addition, we show in Fig. S21 the distribution of errors among the multi-qubit results. We see that the most likely error is one lost excitation in the measured state.
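As a rough consistency check that is not part of the original analysis, one can ask what success probability independent single-qubit errors would imply. The sketch below assumes uncorrelated errors and takes the mean simultaneous measurement error e_m = 3.77% from Table II; the function name is hypothetical.

```python
def all_correct_probability(error_per_qubit, num_qubits):
    """Probability that every qubit is read out correctly, assuming independent errors."""
    return (1.0 - error_per_qubit) ** num_qubits

# Mean simultaneous measurement error from Table II, applied to all 53 qubits.
estimate = all_correct_probability(0.0377, 53)
print(f"{estimate:.3f}")  # about 0.13, to be compared with the measured 13.6%
```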
TABLE II. Aggregate system parameters
Parameter                                   Median        Mean          Stdev.        Units  Figure
Qubit maximum frequency                     6.924         6.933         0.114         GHz    S22
Qubit idle frequency                        6.661         6.660         0.057         GHz    S13
Qubit frequency at readout                  5.750         5.766         0.360         GHz    S20
Readout drive frequency                     4.618         4.588         0.076         GHz    S20
Qubit anharmonicity                         -208.0        -208.0        4.7           MHz    S22
Resonator linewidth κ/2π                    0.64          0.69          0.23          MHz    S22
Qubit-resonator coupling g/2π               72.3          72.1          2.8           MHz    S22
T1 at Idle Frequency                        15.54         16.04         4.00          µs     S22
Readout error |0⟩ isolated / simultaneous   0.97 / 1.8    1.2 / 2.3     0.8 / 2.1     %      S21
Readout error |1⟩ isolated / simultaneous   4.5 / 5.1     5.0 / 5.5     1.8 / 2.2     %      S21
1Q RB(a) e1                                 0.19          0.22          0.10          %      S23
1Q RB(a) e1 (π/2 gateset)                   0.15          0.16          0.06          %      S23
1Q RB(a) tomographic e1 purity              0.14          0.15          0.04          %      S23
1Q XEB e1 isolated / simultaneous           0.13 / 0.14   0.15 / 0.16   0.05 / 0.05   %      3a (main), S23
1Q XEB e1 purity isolated / simultaneous    0.11 / 0.11   0.11 / 0.12   0.03 / 0.03   %      S23
2Q XEB e2 isolated / simultaneous           0.30 / 0.60   0.36 / 0.62   0.17 / 0.24   %      3a (main)
2Q XEB e2c isolated / simultaneous          0.64 / 0.89   0.65 / 0.93   0.20 / 0.26   %      3a (main)
2Q XEB e2c purity isolated / simultaneous   0.59 / 0.86   0.62 / 0.89   0.20 / 0.24   %      S19
Measurement em isolated / simultaneous      2.83 / 3.50   3.05 / 3.77   1.09 / 1.61   %      3a (main)
(a) RB data taken at a later date
E. Summary of system parameters
Table II reports aggregate values for qubit and pair parameters in our processor. A complete table of single-qubit parameter values by qubit is available in supporting online materials, Ref. [55], and illustrated in Figs. S22 through S24. Single-qubit metrics represent a sample size of 53. Two-qubit metrics represent 86 pairs.
[Fig. S20 panels: a, readout drive frequency (GHz) and b, readout qubit f10 (GHz), each plotted by row and column. Fig. S21 panels: a, integrated histogram versus identification error (isolated and simultaneous); b, (Meas − Prep) excitations versus Hamming distance.]
FIG. S21. Readout errors. a, Histogram of readout errors for each qubit when prepared in |0⟩ or |1⟩, and read out in isolation or simultaneously. b, Distribution of errors in multi-qubit readout. The x-axis Hamming distance is the number of bits that are different between measured and prepared states, while the y-axis is the difference in the number of 1s in the states. For example, if we prepare |011⟩ and measure |101⟩, the Hamming distance is 2 and the difference in the number of excitations is 0.
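The two quantities on the axes of panel b can be computed directly from a prepared and a measured bitstring. The following is a minimal sketch with hypothetical function names, reproducing the example given in the caption:

```python
def hamming_distance(prepared, measured):
    """Number of bit positions that differ between two equal-length bitstrings."""
    if len(prepared) != len(measured):
        raise ValueError("bitstrings must have equal length")
    return sum(a != b for a, b in zip(prepared, measured))

def excitation_difference(prepared, measured):
    """Number of 1s in the measured state minus the number in the prepared state."""
    return measured.count("1") - prepared.count("1")

print(hamming_distance("011", "101"))       # 2
print(excitation_difference("011", "101"))  # 0
```

A single lost excitation, such as preparing 011 and measuring 001, gives a Hamming distance of 1 and an excitation difference of −1.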
FIG. S20. a, Drive frequencies for the readout resonators for each qubit. The red outline shows the area where we had to perform correlated discrimination because of unwanted crosscouplings between the resonators. b, Qubit frequencies during readout, found using a frequency optimization procedure.
FIG. S22. Typical distribution of single-qubit parameters over the Sycamore processor.
FIG. S23. Typical distribution of single-qubit gate benchmarking errors over the Sycamore processor, for both isolated and simultaneous operation.
FIG. S24. Typical distribution of readout errors over the Sycamore processor, for both isolated and simultaneous operation.
VII. QUANTUM CIRCUITS
A. Background
We sample the output of random quantum circuits (RQCs) with two use cases in mind: performing a computational task beyond the reach of state-of-the-art supercomputers (quantum supremacy); and estimating the experimental fidelity (performance evaluation).
In order for the RQCs to cover both use cases, we define a circuit family with a varying number of qubits n and cycles m. Our quantum supremacy demonstration uses RQCs with a large number of qubits n = 53 and high depth m = 20. A large number of qubits hinders wave function (Schrödinger) simulation and high depth impedes tensor network (Feynman) simulation (see Sec. X B). We find that the most competitive classical simulator for our hardest RQCs is the Schrödinger-Feynman algorithm (SFA, see Sec. X A) which copes well with high depth circuits on many qubits.
SFA takes as input an n-qubit quantum circuit and a cut which divides n = n1 + n2 qubits into two contiguous partitions with n1 and n2 qubits. The algorithm computes the output state as the sum over simulation paths formed as the product of the terms of the Schmidt decomposition of all cross-partition gates. By the distributive law there are r^g such simulation paths for a circuit with g cross-partition gates of Schmidt rank r. Consequently, the algorithm achieves runtime proportional to (2^{n1} + 2^{n2}) r^g. Circuit cuts with n1, n2 and g that make the simulation task tractable are called promising cuts. The most promising cut for our largest RQCs runs parallel to the shorter axis of the device starting in the vicinity of the broken qubit. The sum over the simulation paths can be interpreted as tensor contraction. In this view, the r^g factor can be thought of as the bond dimension associated with the circuit partitioning, i.e. the cardinality of the index set ranged over in the contraction corresponding to all cross-partition gates. SFA is described in more detail in [38] and Section X.
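The scaling of the SFA runtime with the cut can be made concrete with a few lines of code. This is only a sketch of the proportionality stated above; the function name and the numbers in the example (a 26/27 split, g = 10 or 11, r = 4) are illustrative assumptions, not values taken from the experiment.

```python
def sfa_cost(n1, n2, g, r):
    """Relative SFA runtime (2**n1 + 2**n2) * r**g, up to a constant factor."""
    return (2 ** n1 + 2 ** n2) * r ** g

# Illustrative numbers only: a 53-qubit circuit split into 26 and 27 qubits.
base = sfa_cost(26, 27, g=10, r=4)
one_more_gate = sfa_cost(26, 27, g=11, r=4)
print(one_more_gate // base)  # 4: each extra cross-partition gate multiplies the cost by r
```

The example shows why the number of gates crossing the cut dominates the cost: the partition sizes enter only through the prefactor, while g enters through the exponent.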
B. Overview and technical requirements
The two use cases for our RQCs give rise to a tension in technical requirements at the heart of quantum supremacy. On the one hand, supremacy RQC sampling should by definition be prohibitively hard to simulate classically. On the other hand, performance evaluation entails classical simulation of the RQCs. To resolve the conflict, we note that the fidelity of a RQC experiment depends primarily on the number and quality of the gates. By contrast, the simulation cost is highly sensitive to minor perturbations in the circuit. Consequently, experiment fidelity for RQCs that cannot be simulated directly may be approximated from the experiment fidelity of similar RQCs obtained as the result of transformations that reduce simulation cost without significantly affecting experiment fidelity (see Section VII G).
Performance evaluation using XEB provides another design consideration. The procedure requires knowledge of the cross-entropy of the theoretical output distribution of the circuit. An analytical expression for this quantity has been derived in [27] for circuits whose measurement probabilities approach the Porter-Thomas distribution. We find that our RQCs satisfy this assumption when the circuit depth is larger than 12, see Fig. S35a. Note that high circuit depth also increases the cost of classical simulation.
C. Circuit structure
A RQC with n qubits generally utilizes qubits 1 through n in the qubit order shown in Fig. S27, with small deviations from this default qubit ordering in some circuits. The qubit order has been chosen to ensure that for most RQCs with fewer than 51 qubits, there is a partitioning of the qubits into two similarly sized blocks connected by only five couplers. The next larger RQC, with 51 qubits, has seven couplers along the most promising circuit cut. Since the cost of SFA grows exponentially in the number of gates across the partitions, our circuit geometry leads to a steep increase in the simulation cost of 51-qubit RQCs relative to the circuits with fewer qubits. This creates a sizeable gap in the computational hardness between most of our evaluation circuits and the quantum supremacy circuits (n = 53).
In the time dimension, each RQC is a series of m full cycles and one half cycle followed by measurement of all qubits. Every full cycle consists of two steps. In the first step, a single-qubit gate is applied to every qubit. In the second step, two-qubit gates are applied to pairs of qubits. Different qubit pairs are allowed to interact in different cycles. Specifically, in the supremacy RQCs we loop through the direct neighbors of every qubit over the eight-cycle sequence ABCDCDAB and in the evaluation RQCs we use the four-cycle sequence EFGH where A, B, ..., H are coupler activation patterns shown in Fig. S25. The sequence is repeated in subsequent cycles. The cost of SFA simulation is highly sensitive to the specific sequence employed in a circuit, see VII G 2. Border qubits have fewer than four neighbors and no gate is applied to them in some cycles. The half cycle preceding the measurement consists of the single-qubit gates only. The overall structure of our RQCs is shown in Fig. 3 of the main paper [1].
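The repetition of the coupler activation sequences can be sketched as follows. The zero-based cycle index and the function name are conventions chosen here for illustration; only the two sequences themselves come from the text.

```python
SUPREMACY_SEQUENCE = "ABCDCDAB"  # eight-cycle sequence used in supremacy RQCs
EVALUATION_SEQUENCE = "EFGH"     # four-cycle sequence used in evaluation RQCs

def coupler_pattern(cycle, sequence):
    """Pattern label for a zero-based full-cycle index; the sequence repeats."""
    return sequence[cycle % len(sequence)]

print("".join(coupler_pattern(k, SUPREMACY_SEQUENCE) for k in range(20)))
# ABCDCDABABCDCDABABCD
print("".join(coupler_pattern(k, EVALUATION_SEQUENCE) for k in range(6)))
# EFGHEF
```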
D. Randomness
Single-qubit gates in every cycle are chosen randomly using a pseudo-random number generator (PRNG). The generator is initialized with a seed s which is the third parameter for our family of RQCs. The single-qubit gate applied to a particular qubit in a given cycle depends only on s. Consequently, two RQCs with the same s apply the same single-qubit gate to a given qubit in a given cycle as long as the qubit and the cycle belong to both RQCs as determined by their size n and depth m parameters.
FIG. S25. Coupler activation patterns. A coupler activation pattern determines which qubits are allowed to interact simultaneously in a cycle. Quantum supremacy RQCs utilize the staggered patterns shown in the top row (patterns A to D) in the sequence ABCDCDAB, repeated in subsequent cycles. Performance evaluation RQCs employ the patterns shown in the bottom row (patterns E to H) in the sequence EFGH, likewise repeated in subsequent cycles. The former sequence makes SFA simulation harder by facilitating prompt transfer of entanglement created at promising circuit cuts into the bulk of each circuit partition.
Conversely, the choice of single-qubit gates is the sole property of our RQCs that depends on s. In particular, the same two-qubit gate is applied to a given qubit pair in a given cycle by all RQCs that contain the pair and the cycle.
E. Quantum gates
In our experiment, we configure three single-qubit gates. Each one is a π/2-rotation around an axis lying on the equator of the Bloch sphere. Up to global phase, the gates are
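\[
X^{1/2} \equiv R_X(\pi/2) = \frac{1}{\sqrt{2}}
\begin{pmatrix} 1 & -i \\ -i & 1 \end{pmatrix},
\tag{50}
\]
\[
Y^{1/2} \equiv R_Y(\pi/2) = \frac{1}{\sqrt{2}}
\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix},
\tag{51}
\]
\[
W^{1/2} \equiv R_{X+Y}(\pi/2) = \frac{1}{\sqrt{2}}
\begin{pmatrix} 1 & -\sqrt{i} \\ \sqrt{-i} & 1 \end{pmatrix},
\tag{52}
\]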
where W = (X + Y)/√2 and √(±i) denotes the principal value of the square root. The first two belong to the single-qubit Clifford group, while W^{1/2} is a non-Clifford gate. Single-qubit gates in the first cycle are chosen independently and uniformly at random from the set of the three gates above. In subsequent cycles, each single-qubit gate is chosen independently and uniformly at random from among the gates above except the gate applied to the qubit in the preceding cycle. This prevents simplifications of some simulation paths in SFA. Consequently, there are 3^n 2^{nm} possible random choices for a RQC with n qubits and m cycles.
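The selection rule can be sketched in a few lines. This is a hypothetical illustration, not the generator used in the experiment: the per-qubit seeding scheme, the gate labels and the function names are assumptions made here so that the gate for a given qubit and layer depends only on the seed, as described in Section VII D. An RQC with m full cycles and one half cycle has m + 1 layers of single-qubit gates.

```python
import random

GATES = ("X^1/2", "Y^1/2", "W^1/2")

def single_qubit_gates(seed, num_qubits, num_layers):
    """Return gates[q][k]: the gate on qubit q in single-qubit layer k.

    Each qubit gets its own generator seeded from (seed, q), so the choice for a
    given qubit and layer does not depend on num_qubits or num_layers.
    """
    gates = []
    for q in range(num_qubits):
        rng = random.Random(seed * 1_000_003 + q)  # hypothetical seeding scheme
        row = [rng.choice(GATES)]                  # first layer: three options
        for _ in range(num_layers - 1):
            options = [g for g in GATES if g != row[-1]]
            row.append(rng.choice(options))        # later layers: two options
        gates.append(row)
    return gates

def num_random_choices(n, m):
    """Number of distinct gate assignments for n qubits and m full cycles."""
    return 3 ** n * 2 ** (n * m)

small = single_qubit_gates(seed=7, num_qubits=3, num_layers=5)
large = single_qubit_gates(seed=7, num_qubits=5, num_layers=9)
shared_agree = all(large[q][:5] == small[q] for q in range(3))
print(shared_agree)
```

Drawing the gates one layer at a time from a per-qubit generator is what makes the smaller circuit a prefix of the larger one; a single shared generator traversed in a size-dependent order would not have this property.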
Two-qubit gates in our RQCs are not randomized, but are determined by qubit pair and cycle number. The gates preserve the number of ground and excited states of the qubits, which gives their matrices block diagonal structure with 1×1, 2×2 and 1×1 blocks. Therefore, up to global phase they belong to U(1) ⊕ U(2) ⊕ U(1)/U(1) and thus can be described by five real parameters (see Fig. S16, and Eq. 43). Each gate in this family can be decomposed into four Z-rotations described by three free parameters and the two-parameter fermionic simulation gate

$$ \mathrm{fSim}(\theta, \phi) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & -i\sin\theta & 0 \\ 0 & -i\sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 & e^{-i\phi} \end{pmatrix} \tag{53} $$
which is the product of a fractional iSWAP and controlled phase gate (see Fig. S14b).
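As a small numerical illustration, the sketch below builds the matrix of Eq. (53) and checks the two properties just stated: it factors into a fractional iSWAP-type gate times a controlled-phase gate, and it conserves the number of excitations. The helper names (`fsim`, `fractional_iswap`, `cphase`) are chosen here for illustration, and the sign convention (off-diagonal entries $-i\sin\theta$, phase $e^{-i\phi}$) is assumed to be that of Eq. (53).

```python
import numpy as np


def fsim(theta, phi):
    """fSim(theta, phi) in the basis |00>, |01>, |10>, |11>."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([
        [1, 0, 0, 0],
        [0, c, -1j * s, 0],
        [0, -1j * s, c, 0],
        [0, 0, 0, np.exp(-1j * phi)],
    ])


def fractional_iswap(theta):
    """The swap-like factor: fSim with no conditional phase."""
    return fsim(theta, 0.0)


def cphase(phi):
    """The controlled-phase factor: a phase on |11> only."""
    return np.diag([1, 1, 1, np.exp(-1j * phi)])


theta, phi = np.pi / 2, np.pi / 6
U = fsim(theta, phi)
N = np.diag([0, 1, 1, 2])  # number of excited qubits in each basis state

is_product = np.allclose(U, fractional_iswap(theta) @ cphase(phi))
is_unitary = np.allclose(U.conj().T @ U, np.eye(4))
conserves_excitations = np.allclose(U @ N, N @ U)
```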
In our experiment, we tune up the two-qubit gates close to θ ≈ π/2 and φ ≈ π/6 radians and then infer more accurate values of all five parameters for each qubit pair using XEB. Consequently, all five parameters of the two-qubit gate depend on the qubit pair. While inferred unitaries are suitable for RQC sampling, future applications of the Sycamore processor, for example, in
quantum chemistry, will require precise targeting of the entangling parameters [49, 56]. The three parameters which control the Z-rotations implicit in the two-qubit gates can be canceled out with active Z-rotations, turning an arbitrary five-parameter gate into pure fSim(θ, φ). In our RQCs, we have decided not to apply such correction gates. This choice affords us a greater number of interactions within the available circuit depth budget and introduces additional implicit non-Clifford single-qubit gates into the RQCs.
The Z-rotations have two origins. First, they capture the phase shifts due to qubit frequency excursions during the two-qubit gate. Second, they account for phase changes due to different idle frequencies of the interacting qubits. The latter introduces dependency of the three parameters defining the Z-rotations on the time at which the gate is applied. By contrast, for a given qubit pair θ and φ do not depend on the cycle.
The fSim(π/2, π/6) gate is the product of a non-Clifford controlled phase gate and an iSWAP which is a two-qubit Clifford gate.
F. Programmability and universality
Programmability of Sycamore rests on our ability to tune up a variety of gate sets including sets that are universal for quantum computation. For example, the set of gates employed in our quantum supremacy demonstration is universal, as we show in this section.
The proof consists of two parts. First, we show that the CZ gate can be obtained as a composition of two fSim gates and single-qubit rotations. Second, we outline how the well-known proof that the H and T gates are universal for SU(2) [57] can be adapted for $X^{1/2}$ and $W^{1/2}$. The conclusion follows from the fact that the gate set consisting of the CZ gate and SU(2) is universal [58].
1. Decomposition of CZ into fSim gates
Here, we show how to decompose a controlled-phase gate into two fSim gates and several single-qubit gates. The fSim gate is native to our hardware and can be decomposed into
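$$ \mathrm{fSim}(\theta, \phi) = e^{-i\theta (X\otimes X + Y\otimes Y)/2}\, e^{-i\phi (I-Z)\otimes(I-Z)/4}, \tag{54} $$

where the iSWAP angle θ ≈ π/2 and the controlled-phase angle φ ≈ π/6. The controlled-phase part can be further decomposed into

$$ e^{-i\phi (I-Z)\otimes(I-Z)/4} = e^{-i\phi/4}\, e^{i\phi (Z\otimes I + I\otimes Z)/4}\, e^{-i\phi Z\otimes Z/4}. \tag{55} $$

To simplify notation, we introduce the two-qubit gate

$$ \Upsilon(\theta, \phi) = e^{-i\theta (X\otimes X + Y\otimes Y)/2}\, e^{-i\phi Z\otimes Z/4} = e^{i\phi/4}\, e^{-i\phi (Z\otimes I + I\otimes Z)/4}\, \mathrm{fSim}(\theta, \phi), \tag{56} $$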
which is equivalent to the fSim gate up to single-qubit Z rotations. The sign of θ in Υ(θ, φ) can be changed by the single-qubit transformation,
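$$ Z_1\, \Upsilon(\theta, \phi)\, Z_1 = \Upsilon(-\theta, \phi), \tag{57} $$

where $Z_1 = Z \otimes I$ ($Z_2 = I \otimes Z$ works equally well). Multiplying two Υ gates with opposite values of θ on both sides of the operator $X_1 = X \otimes I$, we have

$$ \Upsilon(-\theta, \phi)\, X_1\, \Upsilon(\theta, \phi) = e^{i\theta Y\otimes Y/2}\, X_1\, e^{-i\theta Y\otimes Y/2} = \cos\theta\, X_1 + \sin\theta\, Z\otimes Y. \tag{58} $$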
With the identity (58), we have
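$$ \begin{aligned} \Upsilon(-\theta, \phi)\, e^{i\alpha X_1}\, \Upsilon(\theta, \phi) &= \cos\alpha \left( \cos\tfrac{\phi}{2}\, I\otimes I - i \sin\tfrac{\phi}{2}\, Z\otimes Z \right) + i \sin\alpha \left( \cos\theta\, X\otimes I + \sin\theta\, Z\otimes Y \right) \\ &= \left( \cos\alpha \cos\tfrac{\phi}{2}\, I + i \sin\alpha \cos\theta\, X \right) \otimes I - i Z \otimes \left( \cos\alpha \sin\tfrac{\phi}{2}\, Z - \sin\alpha \sin\theta\, Y \right), \end{aligned} \tag{59} $$

where 0 ≤ α ≤ π/2 is to be determined. We introduce the Schmidt operators

$$ \Gamma_1(\alpha) = \cos\alpha \cos(\phi/2)\, I + i \sin\alpha \cos\theta\, X, \tag{60} $$

$$ \Gamma_2(\alpha) = \cos\alpha \sin(\phi/2)\, Z - \sin\alpha \sin\theta\, Y, \tag{61} $$

and the unitary (59) takes the simple form

$$ \Upsilon(-\theta, \phi)\, e^{i\alpha X_1}\, \Upsilon(\theta, \phi) = \Gamma_1 \otimes I - i Z \otimes \Gamma_2. \tag{62} $$

The Schmidt rank of this unitary is two. Therefore, it is equivalent to a controlled-phase gate (also with Schmidt rank two) up to some single-qubit unitaries. The two non-zero Schmidt coefficients of the unitary (59) are equal to the operator norms of $\Gamma_{1,2}$.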
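The statements about Eqs. (58) to (62) can be spot-checked numerically. The sketch below is one way to do so under the sign conventions $\Upsilon(\theta, \phi) = e^{-i\theta(X\otimes X + Y\otimes Y)/2} e^{-i\phi Z\otimes Z/4}$ and $\Gamma_{1,2}$ as in Eqs. (60) and (61). The helper names and the sample angles are arbitrary choices made here; the operator-Schmidt coefficients are obtained from a singular value decomposition of the reshuffled matrix and normalized so that their squares sum to one.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1, -1]).astype(complex)
XX, YY, ZZ = np.kron(X, X), np.kron(Y, Y), np.kron(Z, Z)
X1 = np.kron(X, I2)


def pauli_exp(a, P):
    """exp(i a P) for any operator P with P @ P = identity."""
    return np.cos(a) * np.eye(len(P)) + 1j * np.sin(a) * P


def upsilon(theta, phi):
    """Upsilon gate; XX, YY and ZZ commute, so the exponential factorizes."""
    return pauli_exp(-theta / 2, XX) @ pauli_exp(-theta / 2, YY) @ pauli_exp(-phi / 4, ZZ)


def operator_schmidt(U):
    """Schmidt coefficients of a two-qubit operator across the qubit 1 | qubit 2 cut."""
    # Regroup U[(i1 i2), (j1 j2)] into M[(i1 j1), (i2 j2)] and take singular values.
    M = U.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3).reshape(4, 4)
    return np.linalg.svd(M, compute_uv=False) / np.linalg.norm(U)


theta, phi, alpha = 1.52, 0.51, 0.7  # arbitrary sample angles

lhs_58 = upsilon(-theta, phi) @ X1 @ upsilon(theta, phi)
rhs_58 = np.cos(theta) * X1 + np.sin(theta) * np.kron(Z, Y)

U = upsilon(-theta, phi) @ pauli_exp(alpha, X1) @ upsilon(theta, phi)
gamma1 = np.cos(alpha) * np.cos(phi / 2) * I2 + 1j * np.sin(alpha) * np.cos(theta) * X
gamma2 = np.cos(alpha) * np.sin(phi / 2) * Z - np.sin(alpha) * np.sin(theta) * Y
rhs_62 = np.kron(gamma1, I2) - 1j * np.kron(Z, gamma2)

coeffs = operator_schmidt(U)  # sorted in decreasing order by np.linalg.svd
norms = sorted([np.linalg.norm(gamma1, 2), np.linalg.norm(gamma2, 2)], reverse=True)
```

Here `np.linalg.norm(..., 2)` is the operator (spectral) norm, matching the wording of the text; the two smallest entries of `coeffs` vanish, which is the numerical signature of Schmidt rank two.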
The target controlled-phase gate that we want to decompose into the fSim gate is

$$ \mathrm{diag}\left(1, 1, 1, e^{-i\delta}\right) = e^{-i\delta (I-Z)\otimes(I-Z)/4}, \tag{63} $$

where 0 ≤ δ ≤ 2π. It has two non-zero Schmidt coefficients cos(δ/4) and sin(δ/4). For example, we set the operator norm of $\Gamma_2$ to be equal to the second Schmidt coefficient of the target unitary

$$ \|\Gamma_2(\alpha)\| = \sqrt{\left(\cos\alpha \sin(\phi/2)\right)^2 + \left(\sin\alpha \sin\theta\right)^2} = \sin(\delta/4), \tag{64} $$

and the parameter α can be determined

$$ \sin\alpha = \sqrt{\frac{\sin(\delta/4)^2 - \sin(\phi/2)^2}{\sin(\theta)^2 - \sin(\phi/2)^2}}. \tag{65} $$

This equation has a solution if and only if one of the following two conditions is satisfied

$$ |\sin\theta| \le \sin(\delta/4) \le |\sin(\phi/2)|, \tag{66} $$

$$ |\sin(\phi/2)| \le \sin(\delta/4) \le |\sin\theta|. \tag{67} $$

A large set of controlled-phase gates can be implemented with the typical values of θ and φ of the fSim gate, except for those that are very close to the identity.

To fix the local basis of the first qubit in Eq. (59), we introduce two X rotations of the same angle

$$ e^{-i\xi X/2}\, \Gamma_1(\alpha)\, e^{-i\xi X/2} = \cos(\delta/4)\, I, \tag{68} $$

$$ e^{-i\xi X/2}\, Z\, e^{-i\xi X/2} = Z, \tag{69} $$

where the angle ξ is

$$ \xi = \arctan\left( \frac{\tan\alpha \cos\theta}{\cos(\phi/2)} \right) + \frac{\pi}{2} \left( 1 - \mathrm{sgn}\left(\cos(\phi/2)\right) \right). \tag{70} $$

To fix the local basis of the second qubit in Eq. (59), we introduce two X rotations of opposite angles

$$ e^{i\eta X/2}\, \Gamma_2(\alpha)\, e^{-i\eta X/2} = \sin(\delta/4)\, Z, \tag{71} $$

where the angle η is

$$ \eta = \arctan\left( \frac{\tan\alpha \sin\theta}{\sin(\phi/2)} \right) + \frac{\pi}{2} \left( 1 - \mathrm{sgn}\left(\sin(\phi/2)\right) \right). \tag{72} $$

Applying these local X rotations before and after the gate sequence in Eq. (59), we have

$$ e^{-i(\xi X_1 - \eta X_2)/2}\, \Upsilon(-\theta, \phi)\, e^{i\alpha X_1}\, \Upsilon(\theta, \phi)\, e^{-i(\xi X_1 + \eta X_2)/2} = \cos(\delta/4)\, I\otimes I - i \sin(\delta/4)\, Z\otimes Z, \tag{73} $$

which is the desired controlled-phase gate up to some single-qubit Z rotations.

The target controlled-phase gate equals the CZ gate for δ = π. We numerically checked that the decomposition (73) yields the CZ gate for all 86 fSim gates (with different values of θ and φ) in our device.

2. Universality for SU(2)

Here, we show how the argument for the well-known result that the H and T gates are universal for SU(2) [57] can be adapted for the $X^{1/2}$ and $W^{1/2}$ gates. At the core of the argument lies the observation that T ≡ R_Z(π/4) followed by HTH ≡ R_X(π/4) is a single-qubit rotation by angle α which is an irrational multiple of π. Specifically, α is such that
$$ \cos\frac{\alpha}{2} = \cos^2\frac{\pi}{8} = \frac{1}{2} \left( 1 + \frac{1}{\sqrt{2}} \right). \tag{74} $$

By Theorem B.1 in Appendix B of [57], α/π is irrational because the monic minimal polynomial with rational coefficients of $e^{i\alpha}$

$$ x^4 + x^3 + \frac{1}{4} x^2 + x + 1 \tag{75} $$

is not cyclotomic (since not all its coefficients are integers). Similarly, $W^{1/2} \equiv R_{X+Y}(\pi/2)$ followed by $X^{1/2} \equiv R_X(\pi/2)$ is a single-qubit rotation by angle β such that

$$ \cos\frac{\beta}{2} = \cos^2\frac{\pi}{4} - \frac{1}{\sqrt{2}} \sin^2\frac{\pi}{4} = \frac{1}{2} \left( 1 - \frac{1}{\sqrt{2}} \right). \tag{76} $$

The monic minimal polynomial with rational coefficients of $e^{i\beta}$ is (75), the same as that of $e^{i\alpha}$. Therefore, β is also an irrational multiple of π. The rest of the universality argument for H and T also applies in the case of $X^{1/2}$ and $W^{1/2}$.
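Both parts of the argument in this section lend themselves to a quick numerical check. The sketch below is illustrative rather than a reproduction of the authors' check over the 86 calibrated gates: the sample (θ, φ) values are made up, the helper names are chosen here, and the signs assumed for the local rotations are $e^{-i\xi X/2}$ on both sides of the first qubit and $e^{+i\eta X/2}$, $e^{-i\eta X/2}$ before and after on the second. Part 1 evaluates the sequence of Eq. (73) with α, ξ, η from Eqs. (65), (70), (72) and dresses it with the Z rotations and global phase of Eq. (55) to compare against CZ. Part 2 evaluates the half-angle cosines of Eqs. (74) and (76) from matrix traces and tests that $e^{i\alpha}$ and $e^{i\beta}$ are roots of the polynomial (75).

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1, -1]).astype(complex)
XX, YY, ZZ = np.kron(X, X), np.kron(Y, Y), np.kron(Z, Z)
X1, X2 = np.kron(X, I2), np.kron(I2, X)
Z1, Z2 = np.kron(Z, I2), np.kron(I2, Z)


def pauli_exp(a, P):
    """exp(i a P) for any operator P with P @ P = identity."""
    return np.cos(a) * np.eye(len(P)) + 1j * np.sin(a) * P


def upsilon(theta, phi):
    """Eq. (56); XX, YY and ZZ commute, so the exponential factorizes."""
    return pauli_exp(-theta / 2, XX) @ pauli_exp(-theta / 2, YY) @ pauli_exp(-phi / 4, ZZ)


def cphase_from_upsilon(theta, phi, delta=np.pi):
    """Sequence of Eq. (73), times the Z rotations and phase that give diag(1,1,1,e^{-i delta})."""
    ratio = (np.sin(delta / 4) ** 2 - np.sin(phi / 2) ** 2) / (
        np.sin(theta) ** 2 - np.sin(phi / 2) ** 2)
    alpha = np.arcsin(np.sqrt(ratio))                                   # Eq. (65)
    xi = (np.arctan(np.tan(alpha) * np.cos(theta) / np.cos(phi / 2))
          + np.pi / 2 * (1 - np.sign(np.cos(phi / 2))))                 # Eq. (70)
    eta = (np.arctan(np.tan(alpha) * np.sin(theta) / np.sin(phi / 2))
           + np.pi / 2 * (1 - np.sign(np.sin(phi / 2))))                # Eq. (72)
    before = pauli_exp(-xi / 2, X1) @ pauli_exp(-eta / 2, X2)  # applied first (rightmost)
    after = pauli_exp(-xi / 2, X1) @ pauli_exp(eta / 2, X2)    # applied last (leftmost)
    core = after @ upsilon(-theta, phi) @ pauli_exp(alpha, X1) @ upsilon(theta, phi) @ before
    z_fix = np.exp(-1j * delta / 4) * pauli_exp(delta / 4, Z1) @ pauli_exp(delta / 4, Z2)
    return z_fix @ core


# Part 1: CZ from two Upsilon gates, for a few made-up angles near (pi/2, pi/6).
samples = [(np.pi / 2, np.pi / 6), (1.50, 0.48), (1.62, 0.60)]
cz_ok = all(np.allclose(cphase_from_upsilon(t, p), np.diag([1, 1, 1, -1])) for t, p in samples)

# Part 2: half-angle cosines of the composite rotations, read off from traces.
W = (X + Y) / np.sqrt(2)
cos_half_alpha = np.trace(pauli_exp(-np.pi / 8, X) @ pauli_exp(-np.pi / 8, Z)).real / 2
cos_half_beta = np.trace(pauli_exp(-np.pi / 4, X) @ pauli_exp(-np.pi / 4, W)).real / 2


def min_poly(x):
    """Polynomial of Eq. (75)."""
    return x**4 + x**3 + x**2 / 4 + x + 1


residuals = [abs(min_poly(np.exp(2j * np.arccos(c)))) for c in (cos_half_alpha, cos_half_beta)]
```

A floating-point check of this kind cannot establish irrationality of α/π or β/π; it only confirms the closed forms and the polynomial on which the cited theorem is applied.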
G. Circuit variants
Since XEB entails classical simulation, it is hard or impossible to use it to estimate experimental fidelity of circuits which are hard or impossible to simulate classically. As described above, we designed our RQCs to ensure that an effective partitioning for SFA exists for circuits with fewer than 51 qubits. This gives rise to a significant gap in the cost of classical simulation between quantum supremacy circuits and most of our performance evaluation circuits. This gap facilitates performance evaluation of the Sycamore processor near the quantum supremacy frontier. In practice, however, we would like greater control over the simulation hardness, for two reasons. First, performance evaluation is still very costly for large n approaching the supremacy frontier. Second, we would like to be able to estimate the fidelity of supremacy RQCs more directly, even though classical simulation of this case is unfeasible by definition.
In order to achieve more fine-grained control over the cost of classical simulation of our RQCs, we exploit the fact that the experimental fidelity depends primarily on the number and quality of the gates while the simulation cost is highly sensitive to the structure of the quantum circuit. Therefore, we approximate the experimental fidelity of RQCs which are hard or impossible to simulate from the fidelity of similar RQCs obtained as the result of transformations that reduce simulation cost without significantly affecting experimental fidelity.
We employ two such transformations. Each decreases simulation cost by reducing the bond dimension of promising circuit cuts. The first one removes some or all cross-partition gates. We say that the removed gates have been elided and term the transformation gate elision. The second transformation changes the sequence of coupler activation patterns shown in Fig. S25 to enable the formation of wedges which reduce the bond dimension by slowing the spread of entanglement generated at the circuit cut.
The two transformations complete the description of RQCs used in our experiment. Consequently, each RQC is uniquely determined by five parameters: number of qubits n, number of cycles m, PRNG seed s, number of elided gates and the sequence of coupler activation patterns.

Circuit variant            Gates elided   Sequence of patterns
non-simplifiable full      none           ABCDCDAB
non-simplifiable elided    some           ABCDCDAB
non-simplifiable patch     all            ABCDCDAB
simplifiable full          none           EFGH
simplifiable elided        some           EFGH
simplifiable patch         all            EFGH

TABLE III. Circuit variants. Six variants of RQCs employed in quantum supremacy demonstration (non-simplifiable full) and performance evaluation (remaining five variants) classified by transformations applied in order to control the cost of classical simulation. The eight coupler activation patterns A, B, ..., H are shown in Fig. S25.

1. Gate elision
The most straightforward way to reduce the cost of classical simulation of a RQC is to remove a number of cross-partition gates across the most promising circuit cut. In order to enable independent propagation by the SFA of the wave function of each circuit partition for the first few cycles, the gates are elided beginning with the initial cycle. Each elided gate reduces the bond dimension of the partitioning by a factor of two or four, see Section X.
We refer to RQCs with a small number of elided gates as elided circuits. A particularly dramatic speedup is possible when all two-qubit gates across the partitions are elided leading to two disconnected circuits running in parallel. We refer to such disconnected RQCs as patch circuits. Base RQCs in which no gates have been elided are referred to as full circuits.
If the error probability of the elided two-qubit gate is similar to the error probability of the two-qubit identity gate which it is replaced with, the circuit resulting from gate elision exhibits fidelity that is similar to the fidelity of the original circuit. This assumption holds when the two-qubit gate errors are dominated by the same decoherence processes that govern the single-qubit gate errors such as finite T1 and T2. Indeed, for circuit sizes where XEB on full circuits is possible, we have observed good agreement between fidelity estimates produced for patch, elided and full circuits. For harder circuits, we have observed good agreement between fidelity estimates for patch and elided circuits. See Section VIII for detailed discussion of these results.
2. Wedge formation
The most competitive algorithm for our hardest circuits, SFA (see Sec. X A) scales proportionally to the bond dimension of the circuit partitioning which is equal to the product of Schmidt rank of all cross-partition gates (see Sec. X D). The Schmidt decomposition of most two-qubit gates in our RQCs consists of four terms (a few gates can be replaced with simpler gates with Schmidt rank of two, see Section X). Therefore most cross-partition gates contribute a factor of four to the bond dimension of the partitioning. However, when two consecutive cross-partition gates share a qubit forming a wedge as shown in Fig. S26, the Schmidt decomposition of the resulting three-qubit unitary also has only four terms. In other words, the second cross-partition gate does not generally produce substantial new entanglement (as quantified by the Schmidt rank) among the partitions in excess of the entanglement produced by the first gate. Consequently, every wedge reduces the bond dimension of the partitioning by a factor of four.
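The rank counting in this paragraph can be illustrated numerically. The sketch below is a toy check under stated assumptions, not part of the SFA simulator: it uses an ideal fSim(π/2, π/6) for every gate, and the helpers `embed` and `schmidt_rank` are written here for illustration. It compares a single cross-partition gate, a wedge (two gates sharing the qubit on one side of the cut), and two cross-partition gates that share no qubit.

```python
import numpy as np


def fsim(theta, phi):
    """Ideal fSim gate in the basis |00>, |01>, |10>, |11>."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0, 0, 0],
                     [0, c, -1j * s, 0],
                     [0, -1j * s, c, 0],
                     [0, 0, 0, np.exp(-1j * phi)]])


def embed(gate, q0, q1, n):
    """Embed a two-qubit gate acting on qubits (q0, q1) into an n-qubit operator."""
    full = np.kron(gate, np.eye(2 ** (n - 2))).reshape([2] * (2 * n))
    # After the Kronecker product the gate sits on qubits (0, 1); move its
    # row and column axes to the requested qubits.
    full = np.moveaxis(full, [0, 1, n, n + 1], [q0, q1, n + q0, n + q1])
    return full.reshape(2 ** n, 2 ** n)


def schmidt_rank(U, k, n, tol=1e-9):
    """Operator-Schmidt rank of an n-qubit operator across qubits [0, k) | [k, n)."""
    dA, dB = 2 ** k, 2 ** (n - k)
    M = U.reshape(dA, dB, dA, dB).transpose(0, 2, 1, 3).reshape(dA * dA, dB * dB)
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > tol * s[0]))


G = fsim(np.pi / 2, np.pi / 6)

# One cross-partition gate: qubit 0 | qubit 1.
single_rank = schmidt_rank(G, 1, 2)

# Wedge: gates on (0, 1) then (0, 2), with qubit 0 alone on one side of the cut.
wedge = embed(G, 0, 2, 3) @ embed(G, 0, 1, 3)
wedge_rank = schmidt_rank(wedge, 1, 3)

# No wedge: gates on (0, 2) and (1, 3), cut between qubits {0, 1} and {2, 3}.
separate = embed(G, 1, 3, 4) @ embed(G, 0, 2, 4)
separate_rank = schmidt_rank(separate, 2, 4)
```

The wedge cannot exceed rank four because only one qubit sits on its side of the cut, whereas the two disjoint gates multiply their ranks; the ratio between the two cases is the factor of four mentioned above.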
The eight-cycle sequence ABCDCDAB and the four constituent coupler activation patterns A, B, C and D shown in Fig. S25 have been designed to prevent formation of wedges across promising circuit cuts. In other words, the sequence ensures that entanglement created in a given cycle by cross-partition gates is transferred into the bulk of each partition in the following cycle.
On the other hand, the four-cycle sequence EFGH enables formation of wedges and thus efficient simulation of RQCs using SFA. We employ the latter sequence in most evaluation circuits and use the former eight-cycle sequence for the quantum supremacy circuits and largest evaluation circuits, see Table III.
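As a small illustration of how the two sequences assign a coupler activation pattern to each cycle, the sketch below repeats the sequence cyclically over m cycles. The function name `patterns_for` and the dictionary keys are naming choices made here, following the variant names of Table III.

```python
from itertools import cycle, islice

# Pattern sequences named in the text; each is repeated in subsequent cycles.
SEQUENCES = {
    "non-simplifiable": "ABCDCDAB",
    "simplifiable": "EFGH",
}


def patterns_for(kind, m):
    """Coupler activation pattern used in each of the m cycles."""
    return list(islice(cycle(SEQUENCES[kind]), m))


supremacy_patterns = patterns_for("non-simplifiable", 20)
evaluation_patterns = patterns_for("simplifiable", 14)
```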
VIII. LARGE SCALE XEB RESULTS
In Section VI, we have detailed the device calibration processes used for individual components such as qubits, couplers, and coupled pairs of qubits. We have also introduced cross-entropy benchmarking (XEB) as a method that allows us to evaluate the performance of a quantum system. In this section, we describe how we use a few circuit variations to benchmark our Sycamore processor at a larger scale. In particular, we present a modular version of XEB with “patch circuits” that does not require exponential classical computation resources for estimating XEB fidelities FXEB of larger systems. We also describe the effect of choice of unitary model on large-scale FXEB, as well as how we use patch circuits to monitor the stability of the full system.

FIG. S26. Cross-partition wedge. Two consecutive cross-partition gates which share a qubit form a wedge, as illustrated here with gates highlighted in turquoise and magenta. Schmidt rank of a single two-qubit gate is at most four. Schmidt rank of a wedge is also at most four. Therefore, generally wedges are not efficient at increasing entanglement across partitions and can be simulated efficiently by the SFA.

Circuit variant            n    m    Single-qubit gates   All two-qubit gates   Cross-partition two-qubit gates
non-simplifiable full      53   20   1113                 430                   35
non-simplifiable elided    53   20   1113                 408                   13
non-simplifiable patch     53   20   1113                 395                   0
simplifiable full          38   14   570                  210                   18
simplifiable elided        38   14   570                  204                   12
simplifiable patch         38   14   570                  192                   0

TABLE IV. Gate counts. Number of gates in selected random quantum circuits employed for quantum supremacy demonstration and performance evaluation of the Sycamore processor.
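The gate counts in Table IV satisfy two simple consistency relations, which the sketch below checks: the single-qubit gate count equals n(m + 1), and subtracting the cross-partition gates from the total leaves the same number of within-partition two-qubit gates for the full, elided and patch variants of a given size. The rows are transcribed from Table IV; the reading of n(m + 1) as m cycles plus one final layer of single-qubit gates is an interpretation made here.

```python
# (variant, n, m, single-qubit gates, all two-qubit gates, cross-partition two-qubit gates)
TABLE_IV = [
    ("non-simplifiable full", 53, 20, 1113, 430, 35),
    ("non-simplifiable elided", 53, 20, 1113, 408, 13),
    ("non-simplifiable patch", 53, 20, 1113, 395, 0),
    ("simplifiable full", 38, 14, 570, 210, 18),
    ("simplifiable elided", 38, 14, 570, 204, 12),
    ("simplifiable patch", 38, 14, 570, 192, 0),
]


def check_row(row):
    """Return the derived quantities for one row of the table."""
    variant, n, m, single, two, cross = row
    return {
        "variant": variant,
        "single_equals_n_times_m_plus_1": single == n * (m + 1),
        "within_partition_two_qubit_gates": two - cross,
    }


report = [check_row(row) for row in TABLE_IV]
for entry in report:
    print(entry)
```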
A. Limitations of full circuits
We first discuss what we refer to as “full circuits”, where for a given set of qubits, all possible two-qubit gates participate in the circuit. With full circuits, we benchmarked the system as a function of size, where as discussed below the classical resources and techniques used to compute the FXEB is a function of the number of qubits. The order in which each qubit was added is labeled in Fig. S27. The rationale behind this ordering is explained in Section VII. At each system size, we executed 10 randomly generated circuit instances on the processor and sampled output bitstrings 500k times for each circuit (unless otherwise specified). To minimize potential instance-to-instance fluctuations, we chose the gate sequences in a persistent, “stable” manner: using a known seed for a random number generator, for each circuit, each time a new qubit is added, we maintain the same gateset for all the “existing” qubits and new gates are only introduced to qubits and pairs associated with the added qubit (see Section VII for details).

FIG. S27. Qubit ordering for large-scale XEB experiments. Illustration of the order in which qubits are added for large-scale experiments. The partition between left (black) and right (blue) qubits along the boundary (dashed red lines) is used in patch and elided circuits, as explained below.
Once a sufficient number of bitstrings are collected, FXEB can be calculated for each system size, following the method described in Section IV. As the system size increases, the computational complexity of XEB analysis grows exponentially, which can be qualitatively divided into three regimes. For system size from 12 to 37 qubits, XEB analysis was carried out by evolving the full quantum state (Schrödinger method) on a high-performance server (88 hyper-threads, 1.5TB memory in our case) using the “qsim” program. At 38 qubits we used a n1-ultramem-160 VM in Google's cloud (160 hyperthreads, 3.8TB memory). Above 38 qubits, Google's large-scale cluster computing became necessary, and in addition a hybrid Schrödinger-Feynman approach, the “qsimh” program, was used to improve the efficiency: in this case, we break the system up into two patches, where each patch can be efficiently computed via the Schrödinger method and then connected by a Feynman path-integral approach (see Section X for more details). Finally we used a Schrödinger algorithm on the Jülich supercomputer for some circuits up to 43 qubits.

[Plot: XEB fidelity versus number of qubits; series: full circuits, patch circuits.]
FIG. S28. Comparison between XEB with patch circuits and full circuits. Full vs. patch circuit benchmarking up to 38 qubits with 14 cycles, showing close agreement to within the intrinsic fluctuations of the system. We plot the results for patch circuits out to 53 qubits.
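The qubit thresholds quoted above line up with a back-of-the-envelope estimate of the memory needed to hold a full state vector. The sketch below assumes 8 bytes per amplitude (single-precision complex numbers) and that the state vector dominates memory use; both are assumptions made here, not statements from the text, and the function names are illustrative.

```python
def state_vector_bytes(n_qubits, bytes_per_amplitude=8):
    """Memory for 2**n complex amplitudes (8 bytes each assumes single precision)."""
    return (2 ** n_qubits) * bytes_per_amplitude


def max_qubits(memory_bytes, bytes_per_amplitude=8):
    """Largest n whose state vector fits in the given memory."""
    n = 0
    while state_vector_bytes(n + 1, bytes_per_amplitude) <= memory_bytes:
        n += 1
    return n


TB = 10 ** 12
server_limit = max_qubits(1.5 * TB)   # the 1.5TB server mentioned in the text
cloud_limit = max_qubits(3.8 * TB)    # the 3.8TB VM mentioned in the text

for n in (37, 38, 39):
    print(n, "qubits:", round(state_vector_bytes(n) / TB, 2), "TB")
```

Under these assumptions each added qubit doubles the requirement, which is why a single extra qubit beyond 37 already calls for a larger machine.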
In order to reduce the computational cost, we introduce two modified circuit types in the following sections. By using slightly simplified gate sequences, these two methods can provide good approximate predictions of system performance all the way out to the “quantum supremacy” regime.
B. Patch circuits: a quick performance indicator for large systems
The simplest approach to large-scale performance estimation is referred to as “patch circuits,” which predicts the performance of the full system by multiplying together the fidelities of non-interacting subsystems, or “patches”. In this work, we use two such subsystems, where each patch is roughly half the size of the full system. The two subsystems are run simultaneously, so that effects such as gate and measurement crosstalk between patches are included, but the two patches are analyzed separately when computing the fidelity. The two patches are defined by the gates removed along their boundary, as illustrated in Fig. S27. For sufficiently large systems, these removed two-qubit gates represent a small portion of the whole circuit. As a consequence, FXEB of the full system can be estimated as the product of the fidelities of the two subsystems; compared with full circuits, the main missing factor is the absence of entanglement between the two patches.
We evaluate the efficacy of patch circuits by comparing them against full circuits with the same set of qubits. The experimental results can be seen in Fig. 4a (main text), where we show fidelities measured by these two methods for systems from 12 qubits to 53 qubits, in an interleaved fashion. We re-plot this data here in Fig. S28 as well. As expected, the fidelities obtained via patch XEB show a consistent exponential decay (up to fluctuations arising from qubit-dependent gate fidelities and a small amount of system fluctuations) as a function of system size. For every system size investigated, we found that patch and full XEB provide fidelities that are in good agreement with each other, with a typical deviation of 5% of the fidelity itself (we attribute the worst-case disagreement of 10% at 34 qubits to a temporary system fluctuation between the two datasets, which was also seen in interleaved measurement fidelity data). Theoretically, one would expect patch circuits to result in 10% higher fidelity than full circuits due to the slightly reduced gate count. We find that patch circuits perform slightly worse than expected, which we believe is due to the fact that the two-qubit gate unitaries are optimized for full operation and not patch operation. In any case, agreement between patch and full circuits shows that patch circuits can be a good estimator for full circuits, which is quite remarkable given the drastic difference in entanglement generated by the two methods. These results give us a good preview of the system performance in all three regimes discussed earlier.
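The two pieces of arithmetic in this discussion can be made explicit. The sketch below multiplies patch fidelities to form the full-system estimate, and computes the fidelity ratio one would expect from the removed gates alone if each removed two-qubit gate carried an independent error e2. All numerical inputs are placeholders chosen here for illustration (the patch fidelities and e2 = 0.005 are not measured values); the count of 18 removed gates is the difference between the simplifiable full and patch circuits at 38 qubits and 14 cycles in Table IV.

```python
def patch_estimate(patch_fidelities):
    """Full-system estimate: product of the separately analyzed patch fidelities."""
    result = 1.0
    for fidelity in patch_fidelities:
        result *= fidelity
    return result


def expected_patch_to_full_ratio(removed_gates, e2):
    """Expected F_patch / F_full if each removed gate has independent error e2."""
    return (1.0 - e2) ** (-removed_gates)


# Hypothetical fidelities of the left and right patches.
estimate = patch_estimate([0.12, 0.10])

# Hypothetical two-qubit error; 18 gates are removed along the boundary.
ratio = expected_patch_to_full_ratio(removed_gates=18, e2=0.005)
print(f"estimate = {estimate:.4f}, expected patch/full ratio = {ratio:.3f}")
```

With a per-gate error of this order the expected ratio is of the same magnitude as the roughly 10% figure quoted above.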
The advantage of using patch circuits lies in its exponentially reduced computational cost, as it only requires calculating FXEB of subsystems at half the full size (or less if a larger number of smaller patches is used). This allows for quick estimates of large-scale system performance on a day-to-day basis, including for system and circuit sizes in the “quantum supremacy” regime. As a consequence, we typically use patch circuits as a quick system performance indicator, which we use for rapid turnarounds between system calibration and performance evaluation, as well as for monitoring full system stability (see Section VIII H). We also note that patch circuits can be used well beyond 50 qubits, and in fact can be extended to arbitrary numbers of qubits while keeping the analysis time at most linear in the number of qubits (or even constant if the patches can be analyzed in parallel), assuming that the patch size stays roughly constant and more non-interacting patches are added as the number of qubits grows.
C. Elided circuits: a more rigorous performance estimator for large systems
For a more rigorous prediction of full FXEB, we introduce a more sophisticated approach referred to as “elided circuits”. Similar to patch circuits, we partition a given set of qubits into two subsets separated by a boundary, but elide (remove) only a fraction of the two-qubit gates along this boundary during a few early cycles of the sequence (more specifically, we elide the earliest gates in time, meaning early layers will have none of their gates along the boundary while later layers will have all of their usual gates across the boundary). Accordingly, the two subsets of qubits are no longer isolated from each other and we cannot simply compute their fidelities separately and multiply. Rather, we must still compute the evolution of the full system. Given that a sufficient number of gates are elided, we can take advantage of the “weak link” between patches with a hybrid analysis technique: we compute each patch via the Schrödinger method and then connect them with a Feynman path-integral approach (see Section X for more details on this “qsimh” program).

[Plot: XEB fidelity versus number of qubits; series: full circuits, elided circuits (6 elided gates).]
FIG. S29. Comparison between XEB with elided circuits and full circuits. Full vs. elided circuit benchmarking up to 38 qubits at 14 cycles, showing close agreement to within the intrinsic fluctuations of the system.
Compared with patch circuits, elided circuits more closely approach a description of the full system performance under a full circuit: in addition to capturing issues such as control and readout crosstalk, elided circuits allow entanglement to form between the two weakly connected subsystems. It covers essentially all the possible processes that occur in the full circuit, and therefore can be used to predict system performance at a dramatically reduced computational cost, albeit significantly costlier than patch circuits.
In order to validate the use of elided circuits as a system performance estimator, we evaluated its accuracy via a direct comparison with full circuits. In Fig. S29 we show two sets of fidelities from interleaved full and elided circuit experiments. For every system size investigated, using elided circuits yields a fidelity value that is in good agreement with the one obtained with the corresponding full circuits. The average ratio of elided circuit fidelity to full circuit fidelity over all verification circuits was found to be 1.01, with a standard deviation of 5%, dominated by system fluctuations. It is this agreement that certifies elided circuits as a precise predictor for full circuits (within a systematic relative uncertainty of 5%), which we rely on to extrapolate the system performance in the regimes where full circuit analysis is too expensive to perform (i.e., Fig. 4b of the main text).
Compared with full circuits, elided circuits can result in a reduced amount of quantum entanglement in the system. The amount of reduced entanglement can be bounded from above by counting the number of iSWAP gates across the boundary: one iSWAP gate generates at most two units of bipartite entanglements (ebits). This upper bound translates directly into the exponential cost of a Schr¨odinger-Feynman simulation. For elided circuits with 50 qubits and 14 cycles, the full circuit has approximately 25 ebits of entanglement, while with 6 elisions the elided circuit has at most 12 ebits entanglement between the two patches. For the 53-qubit elided circuits used in the main paper [1], there were enough iSWAPs across the boundary that the amount of entanglement between patches for full vs. elided circuits should be close, giving us even more confidence in using elided circuits to predict the fidelity of the circuit used to claim quantum supremacy.
D. Choice of unitary model for two-qubit entangling gates
In Section VI, we discussed how the two-qubit gate unitaries can be measured by two different approaches: isolated two-qubit XEB and per-layer simultaneous two-qubit XEB. These two methods resulted in two different unitary models when deducing the best-fit unitary. Since we must specify the two-qubit gate unitary matrices in order to compute FXEB of the larger system, a natural question is which unitary model should be used. To address this question, we point out that full XEB on the large system occurs in repeated cycles, where during each two-qubit gate layer, all the two-qubit gates in the same orientation take place at the same time (see Fig. 3 in the main text). As a consequence, the two-qubit gate layers during simultaneous pair XEB in Fig. S19 emulate the corresponding layer when running full XEB on a large system. Accordingly, learning the unitaries in parallel operation captures any small coherent modifications induced by the simultaneous application of the other two-qubit gates, such as flux control crosstalk and dispersive shifts from stray interactions. This is evident from the fact that by re-learning the two-qubit unitary parameters, the errors extracted from simultaneous pair XEB become purity-limited (see Fig. S19). This correspondence assures us that unitary parameters extracted from simultaneous pair XEB provide a more accurate description of the full system when full XEB is performed.
In Fig. S30, we show patch circuit fidelities at different system sizes, where the fidelity is evaluated using three different unitary models: the best-fit unitaries from isolated pair XEB, the best-fit unitaries from simultaneous pair XEB, and the best-fit “Sycamore” unitaries from simultaneous pair XEB. The Sycamore unitaries are the unitaries obtained when keeping the swap angle fixed at θ = π/2 and conditional phase fixed at φ = π/6 for all qubits, and then fitting only for two single-qubit phase terms. For the purpose of benchmarking the system fidelity for the operations we performed, we have focused on using unitaries learned from simultaneous pair XEB, which provide the most accurate description of the system. The validity of this approach is experimentally verified: for the same gate sequences, using the simultaneous pair XEB unitaries leads to the best full-system fidelity values at every system size. This is direct evidence that the unitaries learned from simultaneous pair XEB form a more accurate description of the system than those from isolated pair XEB.
On the other hand, in order to be useful for generic quantum algorithms, it will be desirable to use calibrated gatesets that are independent of the specific gate sequences used. For this purpose, it is important to check the circuit fidelity under the other two unitary models, where the two-qubit gate unitaries were calibrated in more generic settings. One can see that fidelities calculated from these two unitary models still demonstrate nearly as good performance despite the addition of small coherent control errors. They differ from the fidelities using the simultaneous pair XEB unitaries by less than a factor of 2 at 50 qubits (fidelity goes from $9 \times 10^{-3}$ to $5 \times 10^{-3}$ at 50 qubits). This is remarkable since it suggests going from a 2-qubit setting to 50-qubit setting, our full system calibration precision degrades only by a factor of < 2 despite the system size increasing by a factor of 25. This high precision in gate calibration gives us confidence to use our processors in NISQ algorithms.

[Plot, panel a: patch circuit XEB fidelity versus number of qubits n, at m = 14 (simplifiable circuit pattern). Panel b: patch circuit XEB fidelity versus number of cycles m, at n = 53 (non-simplifiable circuit pattern). Series in both panels: simultaneous with arbitrary unitaries; simultaneous with “Sycamore” unitaries; isolated with arbitrary unitaries.]
FIG. S30. Effect of unitary model on full system fidelity. a, Patch circuit fidelity versus number of qubits and choice of unitary model. b, Same but versus number of cycles and for the non-simplifiable supremacy circuits. Blue: patch XEB fidelities using the unitaries deduced from the best-fit fSim unitary from isolated pair XEB. Green: patch XEB fidelities using the unitaries deduced from the best-fit fSim unitary from per-layer simultaneous pair XEB. Orange: patch XEB fidelities using the unitaries deduced from the best-fit “Sycamore unitary” (θ = π/2, φ = π/6) from per-layer simultaneous pair XEB. As expected, the best fidelities arise from fitting to the most general unitary in parallel operation, although the fidelities are high enough to achieve quantum supremacy with the Sycamore unitary model as well.
E. Understanding system performance: error model prediction
In this section, we perform additional analysis to compare the measured fidelities to that predicted from the constituent gate and measurement errors.
The most commonly used error model in quantum computing theory is the digital error model. Analogous to the independent noise model in classical information theory, the digital error model is based on the assumption that there are no space and time correlations between errors of quantum gates [27, 59, 60]. If this assumption is valid, it should be possible to construct the fidelity of a large quantum system from the fidelities of its constituent parts: single- and two-qubit gates, and measurement. It is important to point out that the gate fidelity metric that should be used here is the entanglement fidelity, $1 - e_P$ (see Section V for more details). This is the correct quantity to describe the fidelity of quantum operations since, in contrast to other metrics such as the commonly used average fidelity, it is independent of the dimension of the Hilbert space.
[Plot, panel a: “Prediction vs. n (patch circuits @ m=14 cycles)”, XEB fidelity versus number of qubits n. Panel b: “Prediction vs. m (patch circuits @ n=51 qubits)”, XEB fidelity versus number of cycles m. Series in both panels: measured; predicted (gate error only); predicted (gate and readout error).]
FIG. S31. Predicted vs. measured large-scale XEB fidelity. a, Data and two predictions for 14-cycle patch circuits vs. number of qubits. Predictions are based on the product of single- and two-qubit gate entanglement fidelities under simultaneous operation. Blue curve contains measured fidelities. Orange is the prediction based only on gate errors during parallel operation, but without taking measurement error into account. Green is the same but multiplied by the measured readout fidelities. b, Same as the first panel, but vs. number of cycles at a fixed number of qubits n = 51. Again, the prediction from simultaneous gate fidelities and measurement fidelity is a good prediction of the actual system performance.

In Fig. S31, we show fidelities as a function of both system size and number of cycles (circuit depth), measured with patch circuits. In each plot, we compare the measured fidelities to the predicted fidelities, which are calculated from a simple multiplication of individual gate entanglement fidelities as measured during simultaneous operation, along with the measurement fidelities obtained during simultaneous measurement. We note that the measured readout fidelities automatically include the effect of state preparation errors as well. More explicitly, if a circuit contains the set of single-qubit gates $G_1$, the set of two-qubit gates $G_2$, and the set of qubits Q, then we approximate the fidelity F as

$$ F = \prod_{g \in G_1} (1 - e_g) \prod_{g \in G_2} (1 - e_g) \prod_{q \in Q} (1 - e_q), \tag{77} $$
where $e_g$ are the individual gate Pauli errors and $e_q$ are the state preparation and measurement errors of individual qubits. The measured and predicted fidelities agree well, with deviations of at most 10-20%. Given that the sequence here involves tens of qubits and on the order of 1000 quantum gates, this level of agreement provides strong evidence for the validity of the digital error model.
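The product in Eq. (77) can be sketched in a few lines of Python. This is only an illustration of the digital error model: the function name, the gate counts, and the error rates below are assumed placeholder values, not the calibrated numbers of the device.

```python
def predicted_fidelity(single_qubit_errors, two_qubit_errors, spam_errors):
    """Digital error model, Eq. (77): product of (1 - e) over gates and qubits."""
    fidelity = 1.0
    for e in list(single_qubit_errors) + list(two_qubit_errors) + list(spam_errors):
        fidelity *= 1.0 - e
    return fidelity

# Assumed, illustrative inputs (hypothetical circuit size and error rates).
n_qubits, n_single, n_two = 20, 300, 130
e_single, e_two, e_spam = 0.0016, 0.006, 0.04

f_gates = predicted_fidelity([e_single] * n_single, [e_two] * n_two, [])
f_total = predicted_fidelity([e_single] * n_single, [e_two] * n_two,
                             [e_spam] * n_qubits)
print(f"gate error only:        {f_gates:.4f}")
print(f"gate and readout error: {f_total:.4f}")
```

Because the model is a plain product, the prediction with readout error is the gate-only prediction multiplied by the per-qubit measurement fidelities, which is how the two predicted curves in Fig. S31 are related.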
This conclusion can be further strengthened by the close agreement between the fidelities of full circuits, patch circuits, and elided circuits. Even though these three methods differ only slightly in the gate sequence, they can result in systems with drastically different levels of computational complexity and entanglement between subsystems. The agreement between the fidelities measured by these different methods, as well as the agreement with the predicted fidelity from individual gates, gives compelling evidence confirming the assumptions made by the digital error model. Moreover, these assumptions remain valid even in the presence of quantum entanglement.
The validation of the digital error model has crucial consequences, in particular for quantum error correction. The absence of space or time correlations in quantum noise has been a commonly assumed property in quantum error correction since the very first paper on the topic [59]. Our data is evidence that such a property is achievable with existing quantum processors.
F. Distribution of bitstring probabilities
In Section IV, we motivate two different estimates for fidelity $F$, one based on the cross entropy, Eq. (28), and the other based on linear cross entropy, Eq. (27). In this section, we examine the probabilities of sampled bitstrings and compare them against theoretical distributions. We use bitstring samples from the non-supremacy region to demonstrate the analysis methodology, then apply it to the samples in the supremacy region.
The theoretical PDF for the bitstring probability $p$ with linear XEB is

$$P_l(x|F) = (F x + (1 - F))\, e^{-x},$$

where $x \equiv Dp$ is the probability $p$ scaled by the Hilbert space dimension $D$, and $F$ is the linear cross entropy fidelity. The PDF for $\log p$ is

$$P_c(x|F) = (1 + F (e^{x} - 1))\, e^{x} e^{-e^{x}},$$

where $x \equiv \log(Dp)$ and $F$ is the cross entropy fidelity.
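As an illustrative check of these densities (a sketch, not the analysis code used for the experiment), note that the linear-XEB PDF is a mixture: with probability F the scaled probability x = Dp has density x exp(-x), and with probability 1 - F it has density exp(-x). Under this mixture the mean of x is 1 + F and the mean of log x is F - γ, which is what the estimators of Eqs. (78) and (79) invert. The function names, sample size, and the fidelity value 0.2 below are assumptions chosen for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def sample_scaled_probabilities(fidelity, n_samples, rng):
    """Draw x = D*p from the density (F*x + 1 - F) * exp(-x)."""
    samples = []
    for _ in range(n_samples):
        if rng.random() < fidelity:
            samples.append(rng.gammavariate(2.0, 1.0))  # density x * exp(-x)
        else:
            samples.append(rng.expovariate(1.0))        # density exp(-x)
    return samples

def linear_xeb(xs):
    """Linear estimator: mean(D*p) - 1."""
    return sum(xs) / len(xs) - 1.0

def log_xeb(xs):
    """Logarithmic estimator: mean(log(D*p)) + Euler-Mascheroni constant."""
    return sum(math.log(x) for x in xs) / len(xs) + EULER_GAMMA

rng = random.Random(1234)
xs = sample_scaled_probabilities(0.2, 200_000, rng)
f_lin, f_log = linear_xeb(xs), log_xeb(xs)
print(f"linear estimate: {f_lin:.3f}, log estimate: {f_log:.3f}")
```

Both estimates should scatter around the input fidelity, with the logarithmic estimator showing the somewhat larger spread.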
[Figure S32: Distribution of bitstring probability, 20 qubits; Pr(p) on a logarithmic axis. Panel a: x = Dp, with curves Pl(x|F = 0), Pl(x|F = 0.218), Pl(x|F = 1). Panel b: x = log(Dp), with curves Pc(x|F = 0), Pc(x|F = 0.218), Pc(x|F = 1). Histograms in both panels: uniformly random bitstrings, experimental bitstrings, ideal circuit output.]
FIG. S32. Histograms of ideal probabilities. The ideal probability $p$ is calculated from the final state amplitudes of a (20-qubit 14-cycle) random circuit. The blue, orange, and green histograms show the ideal probabilities of bitstrings sampled uniformly at random, from the experiment, and from the ideal output distribution, respectively. a, The distribution of $Dp$ and theoretical curves $P_l(x|F_l)$ normalized to histogram counts for $F_l = 0, \hat{F}_l, 1$, respectively. b, The distribution of $\log(Dp)$ and theoretical curves $P_c(x|F_c)$ for $F_c = 0, \hat{F}_c, 1$, respectively.
From a set of bitstrings $\{q_i\}$, the fidelity is estimated from the ideal probabilities $\{p_i = p_s(q_i)\}$ as

$$\hat{F}_l = \langle Dp \rangle - 1, \tag{78}$$

$$\hat{F}_c = \langle \log(Dp) \rangle + \gamma, \tag{79}$$
where $\gamma$ is the Euler-Mascheroni constant, see Sec. IV B.

Figure S32 shows the distribution of $\{p_i\}$ from 0.5 million bitstrings obtained in an experiment with a 20-qubit 14-cycle random quantum circuit. For comparison, we produce 0.5 million bitstrings sampled uniformly at random and 0.5 million bitstrings sampled from the output distribution of the ideal circuit and show them in the same figure. The theoretical distribution curves are also shown, where the fidelity estimated from data is fed into the curves $P_l(x|\hat{F})$ and $P_c(x|\hat{F})$.

[Figure S33: Kolmogorov distribution; p-value (logarithmic axis, $10^{1}$ down to $10^{-20}$) vs. $\sqrt{N_s}\, D_{KS}$ (0 to 5).]

FIG. S33. The Kolmogorov distribution function. This function is used to compute the p-value from a given $D_{KS}$ and number of samples $N_s$.

We see good agreement between experiment and theory. To quantify the agreement, we use the Kolmogorov-Smirnov test [61] to characterize the goodness of fit of data $\{p_i\}$ to theoretical PDFs. First we compute the Kolmogorov-Smirnov statistic $D_{KS}$, that is, the distance between data and theory as the supremum of point-wise distances between the empirical cumulative distribution function of data ECDF(p) and the theoretical cumulative distribution function CDF(p):
$$D_{KS} = \sup_i \left| \mathrm{ECDF}(p_i) - \mathrm{CDF}(p_i) \right|.$$
We then convert the distance $D_{KS}$ to a p-value using the Kolmogorov distribution shown in Fig. S33. The p-value is used for rejecting the null hypothesis that the data $\{p_i\}$ is consistent with the theoretical distribution. The whole Kolmogorov-Smirnov test is done using the scipy package [62] and checked against the R function ks.test [63]. Both produce consistent results.
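The test can be sketched with the Python standard library alone. The block below is not the scipy or R routine used for the analysis: it uses the closed-form CDF obtained by integrating the linear-XEB PDF, 1 - exp(-x)(1 + F x), and the asymptotic Kolmogorov series for the p-value. The function names, the synthetic data, and the fidelity value 0.2 are assumptions made for illustration.

```python
import math
import random

def cdf_linear(x, fidelity):
    """CDF of the density (F*x + 1 - F) * exp(-x) for x >= 0."""
    return 1.0 - math.exp(-x) * (1.0 + fidelity * x)

def ks_statistic(samples, cdf):
    """Largest distance between the empirical CDF and the model CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        c = cdf(x)
        d = max(d, (i + 1) / n - c, c - i / n)
    return d

def kolmogorov_p_value(d, n, terms=100):
    """Asymptotic p-value: 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)."""
    lam = math.sqrt(n) * d
    if lam < 0.2:  # the p-value is 1 to high accuracy in this range
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))

# Synthetic data drawn from the model with an assumed fidelity of 0.2.
rng = random.Random(42)
true_f, n = 0.2, 20_000
data = [rng.gammavariate(2.0, 1.0) if rng.random() < true_f
        else rng.expovariate(1.0) for _ in range(n)]

d_fit = ks_statistic(data, lambda x: cdf_linear(x, true_f))
d_zero = ks_statistic(data, lambda x: cdf_linear(x, 0.0))
p_fit, p_zero = kolmogorov_p_value(d_fit, n), kolmogorov_p_value(d_zero, n)
print(f"F = 0.2: D_KS = {d_fit:.4f}, p = {p_fit:.3g}")
print(f"F = 0:   D_KS = {d_zero:.4f}, p = {p_zero:.3g}")
```

For this model the two CDFs differ by F x exp(-x), which peaks at x = 1, so the distance to the zero-fidelity hypothesis is close to F/e.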
We test the ideal probabilities of bitstrings observed in the experiment $\{p_i\}$ against two theoretical distributions, one with estimated fidelity $F = \hat{F}$ and one with fidelity $F = 0$. The Kolmogorov-Smirnov statistic $D_{KS}$ and the p-value of every circuit are shown in Fig. S34. Note that the p-values for $F = 0$ are not shown because they are smaller than $10^{-20}$, due to the large $D_{KS} \approx 0.07$ with $N_s = 5 \times 10^5$ points in the sample. That is evident from reading off Fig. S33.
We reject the null hypothesis that the experimental bitstrings are consistent with the uniform random distribution with very high confidence for this (20-qubit 14-cycle) random circuit.
[Figure S34: Kolmogorov-Smirnov test, 20 qubits. p-values and $D_{KS}$ (logarithmic axis) vs. circuit index (0 to 9), each for $F = \hat{F}$ and $F = 0$. Upper panel: linear XEB. Lower panel: log XEB.]

FIG. S34. The Kolmogorov-Smirnov test results for each of 10 (20-qubit 14-cycle) random circuits. See text for the definition of $D_{KS}$ and p-value. The upper plot is for linear XEB, and the lower one is for log XEB.

[Figure S35: Distribution of bitstring probability, 53 qubits; counts (logarithmic axis) with data histogram and theoretical curve. Panel a: x = Dp with $P_l(x|F_l)$. Panel b: x = log(Dp) with $P_c(x|F_c)$.]

FIG. S35. Distribution of bitstring probabilities from a 53-qubit 20-cycle circuit. We calculate the theoretical probabilities of experimentally-observed bitstrings. a, The distribution of $Dp$ and the theoretical curve $P_l(x|\hat{F}_l)$ normalized to histogram counts. b, The distribution of $\log(Dp)$ with theoretical curve $P_c(x|\hat{F}_c)$.

Now we turn our attention to the supremacy circuits. We use random circuits with gate elisions for checking the distributions because it is exponentially expensive to calculate the ideal theoretical probability of a bitstring without gate elisions. The effect on fidelity from gate elisions is well understood, see Sec. VIII C. The gate elisions are chosen to minimize the effect while making the classical estimation feasible, see Sec. VII G 1. We sample $N_s = 3 \times 10^6$ bitstrings $\{q_i | i = 1 \ldots N_s\}$ from each of 10 (53-qubit 20-cycle) random circuits, and compute the theoretical ideal probabilities of each bitstring $\{p_i | i = 1 \ldots N_s\}$.
The distributions of Dp and log(Dp) from one such circuit along with the corresponding theoretical curves are shown in Fig. S35.
We again use the Kolmogorov-Smirnov test to characterize the goodness of fit of data $\{p_i\}$ to theoretical PDFs with estimated fidelity $F = \hat{F}$ and zero fidelity $F = 0$. The Kolmogorov-Smirnov statistic $D_{KS}$ and the p-value of every circuit are shown in Fig. S36.
The p-value for the null hypothesis of zero fidelity is generally small for every circuit, with a maximum of 0.045 for circuit number 1. The null hypothesis of zero fidelity is thus rejected at better than the 95% confidence level for each circuit. On the other hand, the p-value for the null hypothesis of the estimated fidelity $\hat{F}$ is generally large: it is between 0.18 and 0.98 for linear XEB, and between 0.33 and 0.98 for log XEB. This indicates that the empirical cumulative distribution function ECDF($p_i$) of the data is consistent with the theoretical CDF($p_i|\hat{F}$).
As will be seen in Fig. S38 in section VIII G below, the fidelities of individual circuits are consistent with each other within the statistical uncertainties. Therefore it makes sense to do a Kolmogorov-Smirnov test on all samples combined, containing 30 million bitstrings. The estimated fidelities from the combined sample are $\hat{F}_l = 2.24 \times 10^{-3}$ and $\hat{F}_c = 2.34 \times 10^{-3}$, respectively. The $D_{KS}$ and p-values are listed in Table V. The p-value for the null hypothesis of $F = 0$ is very small: p-value $= 3 \times 10^{-24}$ from scipy, and p-value $< 2.2 \times 10^{-16}$ from R. We note the more conservative value in the table. The null hypothesis of $F = 0$ is rejected with much higher confidence than for individual circuits.

[Figure S36: Kolmogorov-Smirnov test, 53 qubits. p-values and $D_{KS}$ (logarithmic axis, $10^{0}$ down to $10^{-6}$) vs. circuit index (0 to 9), each for $F = \hat{F}$ and $F = 0$. Upper panel: linear XEB. Lower panel: log XEB.]

FIG. S36. The Kolmogorov-Smirnov test results for random circuits with 53 qubits. The upper plot is for linear XEB, and the lower one is for log XEB.

              D_KS                           p-value
              F = F̂          F = 0           F = F̂     F = 0
Linear XEB    1.3 × 10^-4    9.6 × 10^-4     0.66      < 2.2 × 10^-16
Log XEB       9.5 × 10^-5    9.6 × 10^-4     0.95      < 2.2 × 10^-16

TABLE V. The Kolmogorov-Smirnov test results on combined samples.
G. Statistical uncertainties of XEB measurements
In this section we check the statistical uncertainties of our fidelity estimates against theoretical predictions.
The statistical uncertainties of $\hat{F}_l$ and $\hat{F}_c$ are estimated from data using the standard error-on-mean formula as

$$\hat{\sigma}_{F_l} = D \sqrt{\mathrm{Var}(p)/N_s},$$

$$\hat{\sigma}_{F_c} = \sqrt{\mathrm{Var}(\log p)/N_s},$$
where Var(x) is the variance estimator of sample $\{x_i\}$. Because the distributions of $p$ and $\log p$ have finite variances both experimentally and theoretically, we can use the bootstrap procedure [64] to verify the estimate of statistical uncertainties.

The fidelity distributions from 4000 bootstrap samples are shown in Fig. S37. The distributions of $\hat{F}_l$ and $\hat{F}_c$ are each fit to a Gaussian distribution function using maximum likelihood.

The Kolmogorov-Smirnov test on the Gaussian fit produces p-values of 0.99 and 0.41 for the $\hat{F}_l$ and $\hat{F}_c$ bootstrap distributions, respectively. This indicates that the central limit theorem is at work and the distributions are consistent with Gaussian distributions.
The estimated statistical uncertainty, the standard deviation of the bootstrap distribution, and the σ parameter of the Gaussian fit are compared against each other to verify that the statistical uncertainty estimate is minimally biased. For the example circuit used in the figures, the three parameters for $\hat{\sigma}_{F_l}$ are 5.78, 5.78, 5.78 ($\times 10^{-4}$), respectively. The same parameters for $\hat{\sigma}_{F_c}$ are 7.40, 7.46, 7.46 ($\times 10^{-4}$). The relative differences are less than 1%, consistent with the expected agreement of parameters for 4000 bootstrap samples.
We repeat the bootstrap procedure on all ten 53-qubit 20-cycle circuits with 2500 bootstrap resamples. The statistical uncertainty estimates are all within 3.1% of the bootstrap standard deviation.
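The comparison between the error-on-mean formula and the bootstrap can be sketched as follows. This is a minimal illustration on synthetic data drawn from the linear-XEB model, with an assumed fidelity, sample size, and number of resamples that are much smaller than those of the experiment; the function names are hypothetical.

```python
import math
import random
import statistics

def linear_xeb(xs):
    """Linear estimator: mean(D*p) - 1, with xs holding the values D*p."""
    return sum(xs) / len(xs) - 1.0

def standard_error(xs):
    """Error-on-mean formula applied to x = D*p."""
    return math.sqrt(statistics.variance(xs) / len(xs))

def bootstrap_std(xs, n_resamples, rng):
    """Standard deviation of the estimator over resamples with replacement."""
    estimates = [linear_xeb(rng.choices(xs, k=len(xs)))
                 for _ in range(n_resamples)]
    return statistics.stdev(estimates)

rng = random.Random(7)
fidelity, n_samples = 0.1, 5000          # assumed illustrative values
xs = [rng.gammavariate(2.0, 1.0) if rng.random() < fidelity
      else rng.expovariate(1.0) for _ in range(n_samples)]

sigma_formula = standard_error(xs)
sigma_bootstrap = bootstrap_std(xs, 400, rng)
sigma_theory = math.sqrt((1 + 2 * fidelity - fidelity ** 2) / n_samples)
print(f"formula {sigma_formula:.5f}, bootstrap {sigma_bootstrap:.5f}, "
      f"theory {sigma_theory:.5f}")
```

With a few hundred resamples the bootstrap standard deviation itself carries a relative uncertainty of a few percent, so agreement at that level is what one should expect from this sketch.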
The combined linear cross entropy fidelity and statistical uncertainty of 10 random circuits is calculated using inverse-variance weighting to be $\hat{F}_l = (2.24 \pm 0.18) \times 10^{-3}$. The theoretical prediction of the statistical uncertainty, $\sqrt{(1 + 2F - F^2)/N_s}$, is $1.8 \times 10^{-4}$, which agrees with the experimental estimate. As a comparison, the combined cross entropy fidelity is $\hat{F}_c = (2.34 \pm 0.23) \times 10^{-3}$. The theoretical prediction of statistical uncertainty, $\sqrt{(\pi^2/6 - F^2)/N_s}$, is $2.3 \times 10^{-4}$, which agrees with the experimental estimate as well. Thus, the cross entropy fidelity and linear cross entropy fidelity estimators produce consistent results. Furthermore, the statistical uncertainty of the linear cross entropy estimator is smaller, as expected from its theoretical formula.
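The two theoretical uncertainties quoted above follow directly from the formulas once the total number of samples, 10 circuits times 3 million bitstrings, is inserted. The sketch below evaluates them and also shows a generic inverse-variance weighted mean; the function names are assumptions, and the per-circuit inputs of the actual combination are not reproduced here.

```python
import math

def inverse_variance_mean(values, sigmas):
    """Inverse-variance weighted mean and its standard uncertainty."""
    weights = [1.0 / s ** 2 for s in sigmas]
    mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return mean, math.sqrt(1.0 / sum(weights))

def sigma_linear(fidelity, n_samples):
    """Theoretical uncertainty of the linear XEB estimator."""
    return math.sqrt((1 + 2 * fidelity - fidelity ** 2) / n_samples)

def sigma_log(fidelity, n_samples):
    """Theoretical uncertainty of the logarithmic XEB estimator."""
    return math.sqrt((math.pi ** 2 / 6 - fidelity ** 2) / n_samples)

n_total = 10 * 3_000_000
print(f"linear: {sigma_linear(2.24e-3, n_total):.2e}")  # about 1.8e-4
print(f"log:    {sigma_log(2.34e-3, n_total):.2e}")     # about 2.3e-4
```

At these small fidelities the ratio of the two uncertainties is close to the square root of π²/6, about 1.28, which is why the linear estimator is the more precise of the two.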
In Fig. S38, we also show the linear XEB fidelities and 5σ statistical uncertainties of all 10 elided circuit instances for each circuit depth from Fig. 4b of the main text. Variations between the fidelities of different circuit instances are consistent with the expected statistical noise due to the finite number of samples. In the last panel, we also show the smaller statistical uncertainties of the fidelity averaged over the 10 circuit instances for each depth.

[Figure S37: Distribution of fidelities from bootstrapping; counts (logarithmic axis) with data histogram and Gaussian fit. Panel a: linear cross entropy fidelity $F_l$ (0.000 to 0.004). Panel b: cross entropy fidelity $F_c$ (0.000 to 0.004).]

FIG. S37. Distribution of fidelity from 4000 bootstrap samples. a, The distribution of bootstrap $\hat{F}_l$. The theoretical curve is a Gaussian fit normalized to histogram counts. b, The distribution of bootstrap $\hat{F}_c$, with Gaussian fit.
H. System stability and systematic uncertainties
In addition to statistical errors, XEB fidelity is also subject to systematic drift as the system performance may fluctuate and/or degrade over time. To quantify these mechanisms, we performed a time stability measurement with patch circuits on 53 qubits, using a circuit of 16 cycles and 1 million bitstrings, for 17.4 hours after calibration. In between these measurements, we measured the fidelity of other 53-qubit circuits with 16 to 20 cycles. The analyzed results are shown in Fig. S39. The statistical uncertainties of the fidelities are estimated to be $1.29 \times 10^{-4}$, as indicated by the error bars.
We repeated the stability measurements twice, with different circuits and on different days. Fig. S39 shows the one that exhibits greater degradation as a conservative estimate of the effect. The measurement indicates a degradation of fidelity within the range of time. A linear fit with $F = p_0 + p_1 t$ results in estimated parameters $\hat{p}_0 = (5.51 \pm 0.055) \times 10^{-3}$ and $\hat{p}_1 = (-6.87 \pm 0.64) \times 10^{-5}$ (with $t$ in hours), and a correlation coefficient between $\hat{p}_0$ and $\hat{p}_1$ of $\rho = -0.76$. The $\chi^2$ per degree of freedom is 26.3/11.
The p-value for the χ2 for 11 degrees of freedom is 0.0058, indicating that it is not a very good fit. Because the correctness of the estimates of statistical uncertainties has been verified in Section VIII G, this is attributed to systematic fluctuation in addition to degradation. It is supported by the larger variance of fidelity than the 1σ band in Fig. S39.
The 1σ band depends on the statistical uncertainties of fidelities and the variance of time on the x-axis, but is independent of the variance of fidelity. To take the variance of fidelity into account, we use the variance of the residuals of the linear fit as an estimator of the variance of fidelity. The standard deviation of residuals is estimated to be $1.84 \times 10^{-4}$, which is added to $\sigma_{p_0}$ in quadrature to give the total $\sigma_{p_0}$. The estimate is total $\sigma_{p_0} = 1.92 \times 10^{-4}$, 3.5 times larger than the statistical-only $\sigma_{p_0}$ of $5.5 \times 10^{-5}$.
The uncertainty on a fidelity measured at time $t$ can be estimated by standard error propagation, assuming that $t$ is uncorrelated with either $p_0$ or $p_1$:

$$\sigma_F = \left( \sigma_{p_0}^2 + 2 t\, \sigma_{p_0} \sigma_{p_1} \rho + \sigma_{p_1}^2 t^2 \right)^{1/2}. \tag{80}$$
The value of σF as well as the ratio σF /F in the range of measured fidelities monotonically decreases. We take max(σF /F ) as the estimate of relative systematic uncertainty for fidelities measured in the same run. The value is found to be 4.4% and is used in subsequent analysis.
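Eq. (80) is the usual propagation of correlated parameter uncertainties through a straight-line model. A minimal sketch is given below; the function names and the numerical inputs in the example call are assumed for illustration and are not the fitted values of the stability run.

```python
import math

def fidelity_uncertainty(t, sigma_p0, sigma_p1, rho):
    """Eq. (80): uncertainty of F = p0 + p1 * t with correlated p0 and p1."""
    variance = (sigma_p0 ** 2
                + 2.0 * t * sigma_p0 * sigma_p1 * rho
                + (sigma_p1 * t) ** 2)
    return math.sqrt(variance)

def relative_uncertainty(t, p0, p1, sigma_p0, sigma_p1, rho):
    """Ratio sigma_F / F at time t for the linear model."""
    return fidelity_uncertainty(t, sigma_p0, sigma_p1, rho) / (p0 + p1 * t)

# Hypothetical inputs: intercept 5e-3, slope -5e-5 per hour.
for t in (0.0, 5.0, 10.0):
    print(t, relative_uncertainty(t, 5e-3, -5e-5, 2e-4, 1e-5, -0.5))
```

A negative correlation between intercept and slope reduces the propagated uncertainty at moderate times, which is the cross term in Eq. (80).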
The physical origin of the observed system fluctuations can be attributed to many possible channels: 1/f flux noise, qubit T1 fluctuations, control signal drift, etc. We speculate that the dominant mechanism is the moderate interaction between a small number of TLSs and a few qubits at their idling and/or readout biases. In Fig. S40a, we show the result of measuring per-layer simultaneous pair XEB at a fixed depth of 14 cycles repeatedly over time. The quantity plotted is the ratio of the worst pair fidelity to best fidelity observed over the course of 30 minutes. This type of repetitive measurement allows us to pinpoint which pairs dominate the fluctuations in full system fidelity. Note that because we used fidelity at a fixed cycle depth rather than the one extracted from the exponential decay, these numbers contain the effect of fluctuating measurement fidelity as well.
As shown in Fig. S40a, the fidelity of most pairs fluctuates downward by only 1% at depth 14, which translates to either a 1% fluctuation in measurement fidelity for a pair, or a 0.08% fluctuation in the two-qubit gate fidelity for a pair. Before finding the unstable TLS defect in Fig. S40b, a single qubit dominated the fluctuations in full system fidelity seen in Fig. S40c. After we moved this problematic qubit far from the fluctuating TLS, the fluctuations in fidelity during the actual quantum supremacy experiment (Fig. S39) were dominated by a handful of pairs containing qubits in the “degenerate” readout region (described in section VI). For these qubits, due to constraints from readout crosstalk we had little freedom in what readout detunings we could choose, and so the best we could do was to put some qubits near defects or transmon-resonator transition modes during readout. We speculate that this is where the remaining dominant fluctuations originate.

[Figure S38: XEB fidelity $\mathcal{F}_{XEB}$ vs. circuit instance (panels a to e, one panel per circuit depth), and average over instances vs. number of cycles m (panel f).]

FIG. S38. Per-instance elided circuit fidelities and statistical uncertainties. XEB fidelities of all 10 elided circuit instances for each circuit depth from Fig. 4b of the main text. a to e, Each panel corresponds to a single circuit depth m. In these panels, ±5σ statistical error bars, where $\sigma = 1/\sqrt{N_s}$, are shown for each of the individual circuit instance fidelities. Also shown is a band corresponding to ±σ for a single instance, but about the mean fidelity of the 10 instances, showing that the variations between circuits can be explained by statistical fluctuations from the finite number of samples. f, Fidelity averaged over all 10 circuits along with ±5σ error bars (the same quantity is plotted in Fig. 4b of the main text but on a log scale), where in this case $\sigma = 1/\sqrt{10 N_s}$. Here, for all circuit depths, the mean fidelity is more than 5σ above 0.001.
I. The fidelity result and the null hypothesis on quantum supremacy
We use the mean fidelity of ten 53-qubit 20-cycle circuits as the final benchmark of the system. In section VIII G we estimated the fidelity and statistical uncertainty to be $(2.24 \pm 0.18) \times 10^{-3}$ using the linear cross entropy. In section VIII H we estimated the relative systematic uncertainty due to drift to be 4.4%. Combining these two estimates we arrive at the final fidelity of $(2.24 \pm 0.10(\mathrm{syst.}) \pm 0.18(\mathrm{stat.})) \times 10^{-3}$. Fidelity estimates with statistical and systematic uncertainty for other quantum circuits are shown in Fig. S41.
As we show in section X, a noisy sampling of a random quantum circuit at fidelity $F = 10^{-3}$ requires 5000 years with a classical computer with CPU power equivalent to 1 million cores, and it scales linearly with fidelity $F$. It takes a quantum computer less than an hour to complete the same noisy sampling. Therefore we form the null hypothesis that the fidelity of the quantum computer is $F \le 10^{-3}$, and the alternative hypothesis that $F > 10^{-3}$. If the alternative hypothesis is true, we can say that a classical computer cannot perform the same noisy sampling task as the quantum computer.
The total uncertainty on fidelity is estimated with addition in quadrature of systematic uncertainty and statistical uncertainty. The mean fidelity of 10 random circuits with 53 qubits and 20 cycles is $(2.24 \pm 0.21) \times 10^{-3}$. The null hypothesis is therefore rejected with a significance of 6σ.
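The arithmetic behind the combined uncertainty and the quoted significance can be reproduced in a few lines. The sketch below uses only numbers stated in this section; the variable names are ours.

```python
import math

f_hat = 2.24e-3               # mean linear XEB fidelity
sigma_stat = 0.18e-3          # statistical uncertainty (section VIII G)
sigma_syst = 0.044 * f_hat    # 4.4% relative systematic uncertainty (section VIII H)
f_null = 1.0e-3               # boundary of the null hypothesis F <= 1e-3

sigma_total = math.hypot(sigma_stat, sigma_syst)   # addition in quadrature
significance = (f_hat - f_null) / sigma_total

print(f"systematic: {sigma_syst:.2e}")
print(f"total:      {sigma_total:.2e}")
print(f"significance: {significance:.1f} sigma")
```

The statistical term dominates the quadrature sum, so the significance is only mildly sensitive to the size of the systematic term.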
While our analysis of the uncertainty in $F_{XEB}$ was computed from both statistical and systematic errors, some care should be taken in the consideration of systematic errors as they pertain to the claim of quantum supremacy. Systematic errors should be included if we wish to use the XEB fidelity value, for example when comparing fidelities of patch, elided and full circuits. However, for quantum supremacy, a false claim would arise if $F_{XEB}$ were zero but we obtained a non-zero value because of a fluctuation. Systematic fluctuations produce a change in the magnitude of XEB, as seen in the data in this section; this is a multiplicative-type error that does not change the XEB fidelity value when it is zero. A false positive is only produced by additive-type statistical fluctuations, and thus this is the only mechanism that should be considered when computing the uncertainty. Therefore, the 6σ significance of our claim should be considered conservative.

[Figure S39: 53-qubit 16-cycle patch XEB fidelity (0.0042 to 0.0058) vs. time after calibration (0 to 20 hours), showing measured fidelities, the linear maximum-likelihood fit, and the 1σ band.]

FIG. S39. Stability of repeated 53-qubit 16-cycle patch circuit benchmarking over 17.4 hours, without any system recalibration. Statistical error bars from the finite bitstring sample number are included. The intrinsic system fluctuations are likely dominated by a small number of TLSs moderately coupled to a few qubits at their idling and/or readout biases.
Some skeptics have warned that a quantum computer may not be possible [65, 66], for example due to the fragility of quantum information at large qubit number and exponentially large Hilbert space. The demonstration here of quantum behavior in a Hilbert space of dimension $10^{16}$ is strong confirmation that nothing unusual or unexpected happens to our current understanding of quantum mechanics at this scale.
IX. SENSITIVITY OF XEB TO ERRORS
An important requirement for a procedure used to evaluate quantum processors, such as XEB, is sensitivity to errors. Qubit amplitudes are complex variables and therefore quantum errors are inherently continuous. Nevertheless, they can be given a discrete description, for example in the form of a finite set of Pauli operators. The digital error model is used for instance in quantum error correction where errors are discretized by syndrome extraction. In this section we examine the impact of both discrete and continuous errors on the fidelity estimate obtained from the XEB algorithm.

The XEB procedure uses a set of random quantum circuits $U = \{U_1, \ldots, U_S\}$ with $n$ qubits and $m$ cycles. Every circuit is executed $N_s$ times on the quantum processor under test. Each execution of the circuit $U_j$ applies the quantum operation $\Lambda_j$, which is an imperfect realization of $U_j$, to the input state $|0\rangle\langle 0|$. The result of the experiment is a set $B$ of $S N_s$ bitstrings $q_{i,j}$ sampled from the distributions $p_e(q_{i,j}) = \langle q_{i,j}| \rho_j |q_{i,j}\rangle$ where $\rho_j = \Lambda_j(|0\rangle\langle 0|)$ is the output state in the experiments with circuit $U_j$. For each bitstring $q_{i,j}$, a simulator computes the ideal probability $p_s(q_{i,j}) = |\langle q_{i,j}|\psi_j\rangle|^2$ where $|\psi_j\rangle = U_j |0\rangle$ is the ideal output state of the circuit $U_j$. Finally, XEB uses Eq. (27) or (28) to compute an estimate $F_{XEB}(B, U)$ of the fidelity $F(|\psi_j\rangle\langle\psi_j|, \rho_j) = \langle\psi_j| \rho_j |\psi_j\rangle$ averaged over circuits $U$. The result quantifies how well the quantum processor is able to realize quantum circuits of size $n$ and depth $m$. See section IV for more details on XEB.
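The procedure can be illustrated with a toy model in Python. The sketch below makes two simplifying assumptions that are not part of the experiment: the ideal output state is replaced by a random state with Porter-Thomas-like probabilities instead of a simulated circuit, and the noisy operation is replaced by a global depolarizing model in which an ideal sample is drawn with probability F and a uniformly random bitstring otherwise. All names and parameter values are hypothetical.

```python
import random

def random_state_probabilities(n_qubits, rng):
    """Ideal probabilities of a random state (stand-in for a simulated circuit)."""
    dim = 2 ** n_qubits
    weights = [rng.gauss(0, 1) ** 2 + rng.gauss(0, 1) ** 2 for _ in range(dim)]
    norm = sum(weights)
    return [w / norm for w in weights]

def sample_noisy_bitstrings(probs, fidelity, n_samples, rng):
    """Depolarizing toy model: ideal sample with probability F, else uniform."""
    dim = len(probs)
    n_ideal = sum(1 for _ in range(n_samples) if rng.random() < fidelity)
    ideal = rng.choices(range(dim), weights=probs, k=n_ideal)
    uniform = [rng.randrange(dim) for _ in range(n_samples - n_ideal)]
    return ideal + uniform

def linear_xeb(probs, bitstrings):
    """Linear XEB estimate D * mean(p_s(q)) - 1 over the observed bitstrings."""
    dim = len(probs)
    return dim * sum(probs[q] for q in bitstrings) / len(bitstrings) - 1.0

rng = random.Random(2019)
probs = random_state_probabilities(14, rng)
bitstrings = sample_noisy_bitstrings(probs, 0.5, 50_000, rng)
f_est = linear_xeb(probs, bitstrings)
print(f"estimated fidelity: {f_est:.3f} (input 0.5)")
```

In this toy model the estimate deviates from the input fidelity by a few percent even for many samples, because a single random state of finite dimension does not reproduce the Porter-Thomas moments exactly; averaging over several random states reduces this effect.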
The estimate $F_{XEB}(B, U)$ is a function of the bitstrings $B$ obtained in experiment and of the set of quantum circuits $U$ used to compute ideal probabilities. This enables a test of the sensitivity of the method to errors by replacing the error-free reference circuits $U = \{U_1, \ldots, U_S\}$ with circuits $U_E = \{U_{1,E}, \ldots, U_{S,E}\}$ where $U_{j,E}$ is the quantum circuit obtained from $U_j$ by the insertion at a particular location in the circuit of a gate $E$ representing the error. We identify errors inserted at different circuit locations that lead to the same output distribution since XEB cannot differentiate between them.
We first consider the impact of a discrete single-qubit Pauli error E placed in a random location in the circuit. In Fig. S42 we plot FXEB(B, UE) where B are bitstrings observed in our experiment and UE are quantum circuits modified by the insertion of an additional X or Z gate following an existing single-qubit gate. Each fidelity estimate corresponds to a different circuit location where the error gate has been inserted. For every n, the highest fidelity values correspond to the insertion of the Z gate in the final cycle of the circuit. They have no impact on measurements and thus are equivalent to absence of error. The corresponding fidelity estimates match the estimates for the unmodified circuits.
[Figure S40: Panel a: 14-cycle parallel XEB repeated over 30 minutes; ratio of worst to best pair fidelity by qubit row and column (before moving $Q_{1,7}$ away from TLS). Panel b: $Q_{1,7}$ unstable TLS. Panel c: 51-qubit 14-cycle patch XEB stability before tracking down the unstable TLS; fidelity vs. time (hours).]

FIG. S40. Identifying sources of fluctuations with repetitive per-layer simultaneous pair XEB. a, Per-pair ratio of worst fidelity to best fidelity measured via per-layer simultaneous pair XEB at a depth of 14 cycles over the course of 30 minutes. During this time, fluctuations were dominated by a single TLS. b, Measured qubit $T_1$ vs. $f_{10}$ for $Q_{1,7}$ at two different times a few minutes apart (red vs. blue points), showing an unstable TLS that was dominating the fluctuations in full system fidelity seen in c. Moving $Q_{1,7}$ far from this TLS led to the stability seen in Fig. S39.

The probability of only seeing the error $E$ is approximately $q = ep$, where $e$ is the probability of $E$ arising at the particular circuit location and $p$ is the probability that no other error occurs. The fraction $q$ of executions realize circuit $U_{j,E} \in U_E$, yielding bitstrings $B_E$, while the remaining fraction $1 - q$ yield the remaining bitstrings $B \setminus B_E$. XEB averages over circuit executions, so

$$F_{XEB}(B, U_E) = q\, F_{XEB}(B_E, U_E) + (1 - q)\, F_{XEB}(B \setminus B_E, U_E). \tag{81}$$

Since the bitstrings $B_E$ originated in a perfect realization of $U_E$ we have $F_{XEB}(B_E, U_E) \approx 1$ with high probability. Also, assuming the circuits randomize the output quantum state sufficiently, we have $F_{XEB}(B \setminus B_E, U_E) \approx 1/\sqrt{D}$, where $D = 2^n$, see Eq. (25) and Fig. S7. Therefore, for large $n$
$$F_{XEB}(B, U_E) \approx q + \frac{1 - q}{\sqrt{D}} \approx q \tag{82}$$
with high probability.

Now, the probability $p$ that no error other than $E$ occurs is approximately equal to the experimental fidelity $F$, which is approximated by $F_{XEB}(B, U)$, so
$$F_{XEB}(B, U_E) \approx e\, F_{XEB}(B, U) \tag{83}$$
which means that the XEB result obtained using circuits modified to include $E$ is approximately proportional to the XEB result obtained using the error-free reference circuits. Moreover, the ratio of the two XEB results is approximately equal to the probability of $E$.
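The size of the two terms in Eq. (82) can be checked numerically. The sketch below treats the $1/\sqrt{D}$ contribution as a typical magnitude rather than an exact value, and uses assumed illustrative inputs (an error probability of 1% and a reference fidelity of 0.002); the function name is hypothetical.

```python
def xeb_with_inserted_error(e, f_reference, n_qubits):
    """Eq. (82) with q = e * p and p approximated by the reference XEB fidelity."""
    q = e * f_reference
    dim = 2 ** n_qubits
    return q + (1.0 - q) / dim ** 0.5

f_reference = 0.002   # assumed reference XEB fidelity
e = 0.01              # assumed probability of the inserted Pauli error
for n in (12, 30, 53):
    f_error = xeb_with_inserted_error(e, f_reference, n)
    print(n, f_error, f_error / f_reference)
```

For small n the $1/\sqrt{D}$ term is comparable to or larger than q, so the ratio to the reference fidelity approaches e only as the number of qubits grows, in line with the "for large n" condition above.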
The data in Fig. S42 agrees with the approximate proportionality in Eq. (83) and allows us to estimate the median probability of a Pauli error. Based on the drop in XEB fidelity estimate by a factor of almost 100 due to the insertion of one single-qubit Pauli error into the circuit, the probability is on the order of 1%. While more work on the gate failure model needs to be done to correctly relate Sycamore gate error rates to the probability of specific Pauli errors, we already see that $e$ has the same order of magnitude as our per-cycle and per-qubit error given by $e_{2c}/2 \approx 0.5\%$, see Table II. A possible resolution of the factor of two discrepancy may lie in the fact that more than one gate failure can manifest itself as a particular Pauli error $E$ in a particular circuit location.
Lastly, we consider the impact of continuous errors on the XEB result. Fig. S43 shows the fidelity estimate obtained from XEB using bitstrings observed in our experiment and quantum circuits modified to include a single rotation $R_Z(\theta)$. The middle point of the plot is equal to the fidelity estimate obtained for one of the discrete errors in Fig. S42, whereas the leftmost and rightmost points correspond to the fidelity estimate obtained from XEB using the error-free reference circuit.
The analysis above illustrates how questions about the behavior and performance of quantum processors can be formulated in terms of modifications to the reference quantum circuits and how XEB can help investigate these questions. While XEB has proven itself a powerful tool for calibration and performance evaluation (see sections VI and VIII), more work is required to assess its efficacy as a diagnostic tool for quantum processors.
[Figure S41: Statistical and total uncertainty; linear XEB fidelity and log XEB fidelity (logarithmic axis, $10^{-3}$ to $10^{-2}$) vs. number of cycles m (12 to 20).]

FIG. S41. Statistical and total uncertainty of fidelity estimates produced by the linear XEB and logarithmic XEB from ten random quantum circuits with 53 qubits and m cycles. The inner error bars represent the statistical uncertainty discussed in section VIII G. The outer error bars represent the total uncertainty discussed in section VIII H.

qubits, n    run time in seconds
32           111
34           473
36           1954
38           8213

TABLE VI. Circuit simulation run times using qsim on a single Google cloud node (n1-ultramem-160).
X. CLASSICAL SIMULATIONS
A. Local Schrödinger and Schrödinger-Feynman simulators
We have developed two quantum circuit simulators: qsim and qsimh. The first simulator, qsim, is a Schrödinger full state vector simulator. It computes all $2^n$ amplitudes, where $n$ is the number of qubits. Essentially, the simulator performs matrix-vector multiplications repeatedly. One matrix-vector multiplication corresponds to applying one gate. For a 2-qubit gate acting on qubits q1 and q2 (q1 < q2), it can be depicted schematically by the following code (written here as C, with powers of two expressed as bit shifts).

    #include <complex.h>
    #include <stddef.h>

    // Apply the 4x4 gate U to qubits q1 < q2 of the n-qubit state vector v.
    void apply_gate(float complex *v, int n, int q1, int q2,
                    float complex U[4][4]) {
      const size_t s1 = (size_t)1 << q1;   // 2^q1
      const size_t s2 = (size_t)1 << q2;   // 2^q2
      const size_t dim = (size_t)1 << n;   // 2^n

      // iterate over all values of qubits q > q2
      for (size_t i = 0; i < dim; i += 2 * s2) {
        // iterate over values of qubits q1 < q < q2
        for (size_t j = 0; j < s2; j += 2 * s1) {
          // iterate over values of qubits q < q1
          for (size_t k = 0; k < s1; k += 1) {
            // apply gate for fixed values of all q not in {q1, q2}
            const size_t l = i + j + k;

            // copy input
            const float complex v0[4] = {v[l], v[l + s1], v[l + s2],
                                         v[l + s1 + s2]};
            float complex v1[4];  // gate output

            // apply gate
            for (int r = 0; r < 4; r += 1) {
              v1[r] = 0;
              for (int s = 0; s < 4; s += 1) {
                v1[r] += U[r][s] * v0[s];
              }
            }

            // copy output
            v[l] = v1[0];
            v[l + s1] = v1[1];
            v[l + s2] = v1[2];
            v[l + s1 + s2] = v1[3];
          }
        }
      }
    }
Here U is a 4x4 gate matrix and v is the full state vector. To make the simulator faster, we use gate fusion [67], single precision arithmetic, AVX/FMA instructions for vectorization, and OpenMP for multi-threading. We are able to simulate 38-qubit circuits on a single Google cloud node that has 3844 GB memory and four CPUs with 20 cores each (n1-ultramem-160). The run times for different circuit sizes at depth 14 are listed in Table VI.
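The 38-qubit limit on this node follows from the size of the state vector. The sketch below assumes 8 bytes per amplitude (two single-precision floats) and ignores any working memory beyond the state vector itself; it also reads off the growth of the run times listed in Table VI. The function name is ours.

```python
def state_vector_bytes(n_qubits, bytes_per_amplitude=8):
    """Memory for 2**n single-precision complex amplitudes."""
    return (2 ** n_qubits) * bytes_per_amplitude

node_memory_bytes = 3844e9   # memory of the node quoted in the text

for n in (36, 38, 39):
    size = state_vector_bytes(n)
    print(f"n = {n}: {size / 1e9:.0f} GB, fits: {size < node_memory_bytes}")

# Run times from Table VI (seconds): growth per two additional qubits.
run_times = {32: 111, 34: 473, 36: 1954, 38: 8213}
ratios = [run_times[n + 2] / run_times[n] for n in (32, 34, 36)]
print([round(r, 2) for r in ratios])
```

The run time grows by a factor slightly above 4 for every two added qubits, consistent with a cost proportional to the $2^n$ amplitudes that every gate application has to touch.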
The second simulator, qsimh, is a hybrid Schrödinger-Feynman algorithm (SFA) simulator [38]. We cut the lattice into two parts and use the Schmidt decomposition for the 2-qubit gates on the cut. If the Schmidt rank of each gate is r and the number of gates on the cut is g then there are r^g paths, corresponding to all the possible choices of Schmidt terms for each 2-qubit gate across the cut. To obtain fidelity equal to unity, we need to simulate all the r^g paths and sum the results. The total run time is proportional to (2^n1 + 2^n2) r^g, where n1 and n2 are the qubit numbers in the first and second parts. Each part is simulated by qsim using the Schrödinger algorithm. Path simulations are independent of each other and can be trivially parallelized to run on supercomputers or in data centers. Note that one can run simulations with fidelity F < 1 just by summing over a fraction F of all the paths (see Ref. [38] and Sec. X D). In order to speed up the computation, we save a copy of the state after the first p
[Figure S42: panels a and b. Vertical axis: XEB fidelity; horizontal axis: number of qubits, n (12 to 30); color scale in panel a: fraction of simulations. Legend in panel b: reference circuits; circuits with X or Z error (median); circuits with X or Z error (quartiles).]
FIG. S42. Impact of one single-qubit Pauli error on fidelity estimate from XEB. a, Distributions of fidelity estimates from XEB using measured bitstrings and quantum circuits with one bit-flip or one phase-flip error. For each n, shades of blue represent the normalized histogram of the estimates obtained for the error gate placed at different circuit locations. The highest fidelity estimates correspond to phase-flip errors immediately preceding measurement and are equal to the fidelity estimates from XEB using error-free circuits. b, Quartiles of the distributions shown in a (blue) compared to the fidelity estimates from XEB using measured bitstrings and unmodified quantum circuits (red). Both plots use linear scale between −10^-4 and 10^-4 and logarithmic scale everywhere else.
2-qubit gates across the cut, so the remaining r^(g−p) paths can be computed without re-starting the simulation from the beginning. We call the specific choice of Schmidt terms for the first p gates in the cut a prefix.
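The path picture can be illustrated on the smallest possible case: two single-qubit parts joined by one gate on the cut. The sketch below is a toy built on assumptions, not qsimh itself: it uses a CZ gate (Schmidt rank r = 2, so r^g = 2 paths for g = 1) written as P0 ⊗ I + P1 ⊗ Z, and all helper names are hypothetical. Each path is simulated independently on the two parts and the results are summed.

```python
import math

S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]
ID = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]
P0 = [[1, 0], [0, 0]]
P1 = [[0, 0], [0, 1]]
CZ_TERMS = [(P0, ID), (P1, Z)]    # CZ = P0 (x) I + P1 (x) Z


def matvec2(m, x):
    return [m[0][0] * x[0] + m[0][1] * x[1], m[1][0] * x[0] + m[1][1] * x[1]]


def kron(a, b):
    # Joint amplitude index is 2 * (bit of part a) + (bit of part b).
    return [ai * bj for ai in a for bj in b]


def simulate_path(term, a0, b0):
    """Simulate both parts separately for one choice of Schmidt term."""
    left, right = term
    a = matvec2(H, matvec2(left, matvec2(H, a0)))
    b = matvec2(H, matvec2(right, matvec2(H, b0)))
    return kron(a, b)


def apply_local(m, psi, on_a):
    out = list(psi)
    for other in (0, 1):
        i0, i1 = (other, 2 + other) if on_a else (2 * other, 2 * other + 1)
        out[i0], out[i1] = matvec2(m, [psi[i0], psi[i1]])
    return out


def simulate_directly(a0, b0):
    """Reference: full state vector, H on both qubits, CZ, H on both qubits."""
    psi = apply_local(H, apply_local(H, kron(a0, b0), True), False)
    psi[3] = -psi[3]
    return apply_local(H, apply_local(H, psi, True), False)


def fidelity(psi, phi):
    overlap = sum(complex(x).conjugate() * y for x, y in zip(psi, phi))
    norms = sum(abs(x) ** 2 for x in psi) * sum(abs(y) ** 2 for y in phi)
    return abs(overlap) ** 2 / norms


a0, b0 = [1, 0], [1, 0]
paths = [simulate_path(term, a0, b0) for term in CZ_TERMS]
all_paths = [sum(column) for column in zip(*paths)]
exact = simulate_directly(a0, b0)
half_fidelity = fidelity(exact, paths[0])   # keep only one of the two paths
```

In this toy case, keeping one of the two paths gives a state with fidelity 1/2 to the exact output, in line with the statement that summing a fraction F of the paths yields fidelity close to F.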
B. Feynman simulator
qFlex was introduced in Ref. [50] and later adapted to GPU architectures in Ref. [68] to allow efficient computation on Summit, currently the world's Top-1 supercomputer. qFlex is open source and available at https://github.com/ngnrsaa/qflex. Given a random quantum circuit, qFlex computes output bitstring amplitudes by adding all the Feynman path contributions via tensor network (TN) contractions [69, 70], and so it follows what we call a Feynman approach (FA) to circuit sampling. TN simulators are known to outperform all other methods for circuits with low depth or a large number of qubits (e.g., Ref. [68] successfully simulates 121 qubits at low depth using this technique), as well as for small sample sizes (Ns), since simulation cost scales linearly with Ns.
TN simulators compute one amplitude (or a few amplitudes; see below) per contraction of the entire network. In order to sample bitstrings for a given circuit, a set of
random output bitstrings is chosen before the computation starts. Then, the amplitudes for these bitstrings are computed and either accepted or rejected using frugal rejection sampling [38]. This ensures that the selected subset of bitstrings is indistinguishable from bitstrings sampled from a quantum computer. The cost of the TN simulation is therefore linear in the number of output bitstrings. This makes TN methods more competitive for small sets of output bitstrings.
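The accept/reject step can be sketched as follows. This is a minimal illustration and not the qFlex code: the acceptance rule min(1, p·D/cap) with a cap of 10 is an assumed form of frugal rejection sampling, and `prob_of` is a hypothetical callback standing in for one amplitude computation by the TN simulator.

```python
import math
import random


def frugal_accept_probability(p, dim, cap):
    """Assumed acceptance rule: min(1, p * dim / cap) for a candidate bitstring."""
    return min(1.0, p * dim / cap)


def frugal_rejection_sample(prob_of, n_qubits, n_samples, cap=10.0, rng=None):
    """Draw n_samples bitstrings, computing one probability per candidate."""
    rng = rng or random.Random(0)
    dim = 1 << n_qubits
    accepted = []
    while len(accepted) < n_samples:
        candidate = rng.randrange(dim)     # uniformly random output bitstring
        p = prob_of(candidate)             # stands in for one TN contraction
        if rng.random() < frugal_accept_probability(p, dim, cap):
            accepted.append(candidate)
    return accepted


# Toy target distribution with exponentially distributed weights (hypothetical data).
toy_rng = random.Random(1)
weights = [toy_rng.expovariate(1.0) for _ in range(1 << 10)]
norm = math.fsum(weights)
toy_probs = [w / norm for w in weights]

samples = frugal_rejection_sample(lambda j: toy_probs[j], 10, 200)
```

With this rule roughly one candidate in `cap` is accepted, so the number of amplitudes to compute is a small constant multiple of the number of samples, which is the linear cost in the number of output bitstrings referred to above.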
The optimization of qFlex considers a large number of factors to achieve the best time-to-solution on current supercomputers, an approach that often diverges from purely theoretical considerations on the complexity of TN contractions. More precisely, qFlex implements several features such as:
• Avoidance of distributed tensor contractions: by “cutting” the TN (slicing some indexes), the contraction of the TN is decomposed into many paths that can be contracted locally and independently, therefore avoiding internode communication, which is the main cause for the slowdown of distributed tensor contractions.
• Contraction orderings for high arithmetic intensity: TN contraction orderings are chosen so that the expensive part of the computation consists of a small number of tensor contractions with high arithmetic intensity. This lowers the time-to-solution.
• Highly efficient tensor contractions on GPU: the back-end TAL-SH library [71] provides fully asynchronous execution of tensor operations on GPU and fast tensor transposition, allowing out-of-core tensor contractions for instances that exceed GPU memory. This achieves very high efficiency (see Table VII) on high arithmetic intensity contractions.

qubits | cycles | FXEB (%)     | Ns   | nodes | runtime      | PFlop/s* (peak / sust.) | efficiency (%) (peak / sust.) | power (MW) | energy (MWh)
53     | 12     | 0.5          | 1M   | 4550  | 1.29 hours   | 235.2 / 111.7           | 57.4 / 27.3                   | 5.73       | 8.21
53     | 12     | 1.4          | 0.5M | 4550  | 1.81 hours** | 235.2 / 111.7           | 57.4 / 27.3                   | 5.73       | 11.2**
53     | 12     | 1.4          | 3M   | 4550  | 10.8 hours** | 235.2 / 111.7           | 57.4 / 27.3                   | 5.73       | 62.7**
53     | 14     | 2.22 × 10^-6 | 1M   | 4550  | 0.72 hours   | 347.5 / 252.3           | 84.8 / 61.6                   | 7.25       | 6.11
53     | 14     | 0.5          | 1M   | 4550  | 67.7 days**  | 347.5 / 252.3           | 84.8 / 61.6                   | 7.25       | 1.18 × 10^4**
53     | 14     | 1.0          | 0.5M | 4550  | 67.7 days**  | 347.5 / 252.3           | 84.8 / 61.6                   | 7.25       | 1.18 × 10^4**
53     | 14     | 1.0          | 3M   | 4550  | 1.11 years** | 347.5 / 252.3           | 84.8 / 61.6                   | 7.25       | 7.07 × 10^4**

TABLE VII. Runtimes, efficiency and energy consumption for the simulation of random circuit sampling of Ns bitstrings from Sycamore with fidelity F using qFlex on Summit. Simulations used 4550 nodes out of 4608, which represents about 99% of Summit. Single batches of 64 amplitudes were computed on each MPI task using a socket with three GPUs (two sockets per node); given that one of the 9100 MPI tasks acts as master, 9099 batches of amplitudes were computed. For the circuit with 12 cycles, 144/256 paths for these batches were computed in 1.29 hours, which leads to the sampling of about 1M bitstrings with fidelity F ≈ 0.5% (see Ref. [50] for details on the sampling procedure); runtimes and energy consumption for other sample sizes and fidelities are extrapolated linearly in Ns and F from this run. At 14 cycles, 128/524288 paths were computed in 0.72 hours, which leads to the sampling of about 1M bitstrings with fidelity 2.22 × 10^-6. In this case, one would need to consider 288101 paths on all 9099 batches in order to sample about 1M (0.5M) bitstrings with fidelity F ≈ 0.5% (1.0%). By extrapolation, we estimate that such computations would take 1625 hours (68 days). For Ns = 3M bitstrings and F ≈ 1.0%, extrapolation gives us an estimated runtime of 1.1 years. Performance is higher for the simulation with 14 cycles, due to higher arithmetic intensity tensor contractions. Power consumption is also larger in this case. Job, MPI, and TAL-SH library initialization and shutdown times, as well as initial and final IO times are not considered in the runtime, but they are in the total energy consumption. *Single precision. **Extrapolated from the simulation with a fractional fidelity.

[Figure S43: XEB fidelity (vertical axis, 0.0 to 0.2) versus θ (horizontal axis, 0 to 2π). Legend: reference circuits; circuits with Rz(θ) error; cos^2(θ/2).]
FIG. S43. Impact of the Rz(θ) error on XEB. Fidelity estimates computed by XEB from measured bitstrings and circuits with n = 20 qubits and m = 14 cycles modified to include Rz(θ) error applied in 10th cycle to one of the qubits as a function of θ (orange dots). Also shown is XEB fidelity computed using the same bitstrings and unmodified circuits (blue solid line) and a simple model which predicts the effect of the error (green dashed line).
In addition, qFlex implements two techniques in order to lower the cost of the simulation:
• Noisy simulation: the cost of a simulation of fidelity F < 1 (F ≈ 5 × 10^-3 in practice) is lower than that of a perfect-fidelity simulation by a factor 1/F, i.e., the cost is linear in F [38, 50].
• Fast sampling technique: the overhead in applying the frugal rejection sampling mentioned above is removed by this technique, giving an order of magnitude speedup [50]. This involves the computation of the amplitudes of a few correlated bitstrings (batch) per circuit TN contraction.
As shown in Table VII, qFlex is successful in simulating Sycamore with 12 cycles on Summit, sampling 1M bitstrings with fidelity close to 0.5% in 1.29 hours. At 14 cycles, we perform a partial simulation and extrapolate the simulation time for the sampling of 1M bitstrings with fidelity close to 0.5% using Summit, giving an estimated 68 days to complete the task. Sampling 3M bitstrings at 14 cycles with fidelity close to 1.0% (average experimentally realized fidelity) would take an estimated 1.1 years to complete. Other estimates for different sample sizes and fidelities can be found in Table VII. At 16 cycles and beyond, however, the enormous number of Feynman paths required so that the computation does not exceed the 512 GB of RAM of each Summit node makes the computation impractical.
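The extrapolated entries of Table VII follow from scaling the measured runs linearly in fidelity and in sample size. A short sketch of that arithmetic is given below; the function name is hypothetical, the baseline numbers are the measured runs quoted in Table VII, and the rounded baseline fidelities (0.5% and 2.22 × 10^-6) are taken at face value.

```python
def extrapolate_runtime_hours(base_hours, base_fidelity, base_samples,
                              fidelity, samples):
    """Scale a measured runtime linearly in target fidelity and sample size."""
    return base_hours * (fidelity / base_fidelity) * (samples / base_samples)


# 12 cycles: measured 1.29 hours for 1M bitstrings at F = 0.5%.
t12_hours = extrapolate_runtime_hours(1.29, 0.005, 1e6, 0.014, 0.5e6)

# 14 cycles: measured 0.72 hours for 1M bitstrings at F = 2.22e-6.
t14_days = extrapolate_runtime_hours(0.72, 2.22e-6, 1e6, 0.005, 1e6) / 24
t14_3m_years = extrapolate_runtime_hours(0.72, 2.22e-6, 1e6, 0.01, 3e6) / (24 * 365)
```

These reproduce, to within rounding of the baselines, the 1.81 hours, roughly 68 days, and 1.1 years quoted above.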
The contraction of the TNs involved in the computation of amplitudes from Sycamore using qFlex is preceded by a simplification of the circuits, which allows
[Figure S44: tensor-network diagrams on the 53-qubit grid (qubits labelled 0 to 52) for 12 cycles (top row) and 14 cycles (bottom row). Panels, left to right: raw circuit; SWAP-cphase transformation; SWAP-cphase transformation & cuts.]
FIG. S44. Logarithm base 2 of the bond (index) dimensions of the tensor network to contract for the simulation of sampling from Sycamore with 12 cycles (top) and 14 cycles (bottom) using qFlex. The left plots represent the tensor network given by the circuit. The middle plots represent the tensor network obtained from a circuit where fSim gates have been transformed, when possible (see main text). The right plots represent the tensor network after the gate transformations and cuts (gray bonds) have been applied; the log2 of the bond dimensions of the indexes cut are written explicitly. For 12 cycles, there are 2^5 × 2^1 × 2^2 = 2^8 = 256 cut instances (paths); for 14 cycles, there are 2^7 × 2^7 × 2^5 = 2^19 = 524288 cut instances.
us to decrease the bond (index) dimension of some of the indexes of the TN. This comes from the realization that fSim(θ = π/2, φ) = i · [Rz(−π/2) ⊗ Rz(−π/2)] · cphase(π + φ) · SWAP (see Sections VI and VII E); note that the SWAP gate can be applied either at the beginning or at the end of the sequence. We apply this transformation to all fSim gates at the beginning (end) of the circuit that affect qubits that are not affected by any other two-qubit gate before (after) in the circuit. The SWAP is then applied to the input (output) qubits and their respective one-qubit gates trivially, and the bond dimension remaining from this gate is 2, corresponding to the cphase gate, as opposed to the bond dimension 4 of the original fSim gate. Note that in practice this identity is only approximate, since θ ≈ π/2; we find that transforming all gates described above causes a drop in fidelity to about 95%.
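The gate identity quoted above can be checked numerically. The sketch below relies on conventions that are assumptions here, since they are fixed elsewhere in the document: fSim(θ, φ) with entries cos θ and −i sin θ in the middle block and e^(−iφ) in the corner, cphase(x) = diag(1, 1, 1, e^(−ix)), and Rz(x) = diag(e^(−ix/2), e^(ix/2)). Under these assumed conventions the two sides agree up to a global phase, which does not affect amplitudes' magnitudes.

```python
import cmath
import math


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def diag(entries):
    n = len(entries)
    return [[entries[i] if i == j else 0 for j in range(n)] for i in range(n)]


def fsim(theta, phi):
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0, 0, 0],
            [0, c, -1j * s, 0],
            [0, -1j * s, c, 0],
            [0, 0, 0, cmath.exp(-1j * phi)]]


SWAP = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]


def decomposition(phi):
    """i * [Rz(-pi/2) (x) Rz(-pi/2)] * cphase(pi + phi) * SWAP, assumed conventions."""
    rz = [cmath.exp(0.25j * math.pi), cmath.exp(-0.25j * math.pi)]  # Rz(-pi/2)
    rz_rz = diag([rz[a] * rz[b] for a in (0, 1) for b in (0, 1)])
    cphase = diag([1, 1, 1, cmath.exp(-1j * (math.pi + phi))])
    product = matmul(rz_rz, matmul(cphase, SWAP))
    return [[1j * x for x in row] for row in product]


def global_phase_between(a, b, tol=1e-12):
    """Return the phase z with a = z * b, or None if no such phase exists."""
    z = None
    for i in range(4):
        for j in range(4):
            if abs(b[i][j]) > tol:
                z = a[i][j] / b[i][j]
                break
        if z is not None:
            break
    if z is None or abs(abs(z) - 1) > tol:
        return None
    same = all(abs(a[i][j] - z * b[i][j]) < tol
               for i in range(4) for j in range(4))
    return z if same else None


phi = math.pi / 6
phase = global_phase_between(fsim(math.pi / 2, phi), decomposition(phi))
```

The check also makes the approximation visible: for θ slightly different from π/2 the function returns `None`, consistent with the identity being only approximate for the calibrated gates.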
After the above simplification is applied, we proceed to cut (slice) some of the indexes of the TN (see Ref. [50] for details). The size of the slice of the index involved in each cut (the effective bond dimension of the index) is variable, and is chosen differently for different number of cycles on the circuit. Cutting indexes decomposes the contraction of the TN into several simpler contractions, whose results are summed after computing them independently on different nodes of the supercomputer.
Fig. S44 shows the bond dimensions of the TN corresponding to the circuits with 12 and 14 cycles simulated. We can see the decrease in bond dimension after the fSim simplification is applied, as well as the remaining bond dimension on the indexes cut for each case.
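Cutting an index can be illustrated with the simplest tensor network, a product of two matrices sharing one index. The sketch below is a toy with hypothetical function names: fixing the shared index to one value gives an independent contraction (a path), and summing over all values recovers the full result. The last helper counts cut instances from the log2 bond dimensions quoted in the caption of Fig. S44.

```python
def full_contraction(A, B):
    """Contract the shared index j of A[i][j] and B[j][k] in one go."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]


def slice_contribution(A, B, j):
    """Contribution of one cut instance: the shared index is fixed to j."""
    return [[A[i][j] * B[j][k] for k in range(len(B[0]))]
            for i in range(len(A))]


def sliced_contraction(A, B):
    """Sum independent slice contributions; each could run on a different node."""
    rows, cols = len(A), len(B[0])
    total = [[0] * cols for _ in range(rows)]
    for j in range(len(B)):
        part = slice_contribution(A, B, j)
        for i in range(rows):
            for k in range(cols):
                total[i][k] += part[i][k]
    return total


def number_of_cut_instances(log2_bond_dimensions):
    return 2 ** sum(log2_bond_dimensions)


A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
paths_12_cycles = number_of_cut_instances([5, 1, 2])
paths_14_cycles = number_of_cut_instances([7, 7, 5])
```

The trade-off is the one described in the text: each slice needs less memory than the full contraction, at the price of a number of independent contractions that grows with the product of the cut bond dimensions.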
Finally, we contract the tensor network corresponding to the computation of a set of amplitudes (for fast sampling) for a particular batch of output bitstrings. The contraction ordering, which is chosen (together with the size and position of the cuts) in order to minimize the time-to-solution of the computation (which involves a careful consideration of the memory resources used and the efficiency achieved on the GPUs) is shown in Fig. S45. The computation can be summarized in the following pseudo-code, where α, β, and γ are variables that denote the different instances of the cuts:
    # Qubits on C are used for fast sampling.
    # size_of_batch amps. per circuit contraction.
    size_of_batch = 2^num_qubits(C)

    # Placeholder for all amplitudes in the batch.
    batch_of_amplitudes = zeros(size_of_batch)

    # Start contracting...
    contract(A) # Panel 2
    contract(B) # Panel 2
    contract(C) # Panel 2

    # alpha labels instances of 1st cut
    for each alpha {
      # beta labels instances of 2nd cut
      for each beta {
        contract(D) # Panels 3 & 4
        contract(E) # Panels 5 & 6

        # gamma labels instances of 3rd cut
        for each gamma {
          contract(F) # Panels 7 & 8

          # Add contribution from this
          # path (alpha, beta, gamma).
          batch_of_amplitudes += F
        }
      }
    }

[Figure S45: eight panels (1 to 8) showing the contraction sequence on the 53-qubit grid, with the cut indexes and the intermediate tensors A to F labelled.]
FIG. S45. TN contraction ordering for the computation of a batch of amplitudes for the simulation of Sycamore with 12 and 14 cycles. Dotted qubits are used for fast sampling; the output index is left open. Three indexes are cut, with remaining bond dimensions given in Fig. S44, and all possible cut instances are labelled by variables α, β, and γ (panel 1). Tensors A, B, and C are independent of cut instances, and so are contracted only once (panels 2 and 3) and reused several times. Given a particular instance of α and β, tensors D (panels 3 and 4) and subsequently E (panels 5 and 6) are contracted; tensor E will be reused in the inner loop. For each instance of γ (inner loop), tensor F is contracted (panels 7 and 8), which gives the contribution to the batch of amplitudes (open indexes on C and specified output bits otherwise) from a particular (α, β, γ) instance (path). The sequence of tensor contractions leading to building a tensor are enumerated, where each tensor is contracted to the one formed previously. For simplicity, the contraction of two single-qubit tensors onto a pair before being contracted with others (e.g., tensor 10 in the yellow sequence of panel 5) is not shown on a separate panel; these pairs of tensors are computed first and are reused for all cut instances.
Dotted qubits in Fig. S45 denote the region used for fast sampling, where output indexes are left open. The circuit TN contraction leads to the computation of 64 amplitudes of correlated bitstrings (tensor F). Note that computing only a fraction F of the paths results in amplitudes with a fidelity roughly equal to F. Computing a set of perfect fidelity batches of amplitudes, where the number of batches is smaller than the number of bitstrings to sample, also provides a similar fidelity F in the sampling task, where F is equal to the ratio of the number of batches to the number of bitstrings in the sample. A hybrid approach (fraction of batches, each only with a fraction of paths), which we use in practice, also provides a similar sampling fidelity. See Refs. [38, 50] and Section X A for more details.
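As a worked example of the hybrid approach, the fidelities quoted in Table VII can be recovered by multiplying the fraction of paths computed by the ratio of batches to bitstrings. Treating the two factors as simply multiplicative is an assumption of this sketch, and the function name is hypothetical; the input numbers are those given in the caption of Table VII.

```python
from fractions import Fraction


def hybrid_sampling_fidelity(paths_computed, paths_total, batches, bitstrings):
    """Approximate fidelity: (fraction of paths) x (batches per sampled bitstring)."""
    return Fraction(paths_computed, paths_total) * Fraction(batches, bitstrings)


# 12 cycles: 144 of 256 paths, 9099 batches, about 1M bitstrings.
f12 = float(hybrid_sampling_fidelity(144, 256, 9099, 1_000_000))

# 14 cycles: 128 of 524288 paths, 9099 batches, about 1M bitstrings.
f14 = float(hybrid_sampling_fidelity(128, 524288, 9099, 1_000_000))
```

The results are about 0.5% and 2.22 × 10^-6 respectively, matching the measured rows of Table VII.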
A new feature of qFlex, implemented for this work, is the possibility to perform out-of-core tensor contractions (of tensors that exceed GPU memory) over more than one GPU on the same node. Although the arithmetic intensity requirements to achieve high efficiency are now higher (about an arithmetic intensity of 3000 for an efficiency close to 90% over three GPUs, as opposed to 1000 for a similar efficiency using a single GPU), the fact that a large part of a node is performing a single TN contraction lets us work with larger tensors, which implies reducing the number of cuts, as well as increasing the bond dimension of each cut; this, in turn, achieves better overall time-to-solution for sampling than simulations based on TNs with smaller tensors and with a lower memory footprint during their contraction (which could perhaps show a higher GPU efficiency due to the simultaneous use of each GPU for independent
TNs). It is worth noting that the TN contraction ordering presented in Fig. S45 provides us with the best time-to-solution after considering several possibilities for the simulation of sampling from Sycamore using qFlex for both 12 and 14 cycles. This is generally not the case, since different numbers of cycles generate different TNs, which generally have different contraction schemes for best simulation time-to-solution.
Sampling of random circuits on Sycamore is difficult to simulate with TN simulators at 16 cycles and beyond. Indeed, FA simulators suffer from an exponential scaling of runtime with circuit depth. For qFlex, this is manifested in the large size of the tensors involved in the circuit TN contraction (this size grows exponentially with the number of cycles of the circuit), which require a large number of cuts in order not to exceed the RAM of a computation node, and which in turn generates an impractical number of Feynman paths. For other simulators, such as the one presented in Ref. [72], the number of projected variables is expected to be so large that the computation time (which increases exponentially with the number of projected variables) on a state-of-the-art supercomputer makes the computation impractical; see Section X E for a detailed analysis. For TN-based simulators that attempt the circuit contraction distributed over several nodes (without cuts) [73], we expect the size of the largest tensor encountered during the TN contraction (which grows exponentially with depth) to exceed the RAM available on any current supercomputer. Not having enough memory for a simulation is the problem that led to developing FA simulators in the first place, for circuits of close to 50 qubits and beyond, for which the Schrödinger simulator (see Section X C) requires more memory to store the wave function than available. FA simulators give best performance as compared to other methods in situations with a large number of qubits and low depth.
For circuits where both the number of qubits and the number of cycles are considered large enough to make the computation expensive, and contribute equally in doing so (formally, each linear dimension of the qubit grid is comparable to the time dimension), like the supremacy circuits considered in this work, we expect SFA of Section X A to be the leading approach for sampling from a random circuit, given a large enough sample size (∼1M in this work); note the linear dependence of the runtime of FA with sample size, which is absent for SFA.
C. Supercomputer Schrödinger simulator
We also performed supercomputer Schrödinger simulations in the Jülich Supercomputing Centre. For a comprehensive description of the universal quantum computer simulators JUQCS-E and JUQCS-A, see Refs. [74] and [75].
For a given quantum circuit U designed to generate a random state, JUQCS-E [75] executes U and computes (in double precision floating point) the probability distribution $p_U(j)$ for each output or bitstring $j \in \{0, \ldots, D-1\}$, where $D = 2^n$, n denoting the number of qubits. JUQCS-E can also compute (in double precision floating point) the corresponding distribution function $P_U(k) = \sum_{j=0}^{k} p_U(j)$ and sample bitstrings from it. We denote by $\mathcal{U}$ the set of M states generated by executing the circuit U. A new feature of JUQCS-E, not documented in Ref. [75], allows the user to specify a set Q of M bitstrings for which JUQCS-E calculates $p_U(j)$ for all $j \in Q$ and saves them in a file.
Similarly, for the same circuit U, JUQCS-A [75] computes (with adaptive two-byte encoding) the probability distribution $p_A(j)$ for each bitstring $j \in \{0, \ldots, D-1\}$. Although numerical experiments with Shor's algorithm for up to 48 qubits indicate that the results produced by JUQCS-A are sufficiently accurate, there is, in general, no guarantee that $p_A(j) \approx p_U(j)$. In this sense, JUQCS-A can be viewed as an approximate simulator of a quantum computing device.
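The memory consequence of the two encodings can be made concrete with a short calculation. The sketch assumes 16 bytes per amplitude for a double precision complex number and 2 bytes per amplitude for the two-byte encoding, and it ignores any working buffers; the function name is hypothetical.

```python
def state_vector_bytes(n_qubits, bytes_per_amplitude):
    """Memory needed to store all 2^n amplitudes of an n-qubit state."""
    return (1 << n_qubits) * bytes_per_amplitude


DOUBLE_COMPLEX_BYTES = 16   # assumed: two 8-byte floats per amplitude
TWO_BYTE_ENCODING = 2       # assumed: two 1-byte numbers per amplitude

exact_43 = state_vector_bytes(43, DOUBLE_COMPLEX_BYTES)
adaptive_46 = state_vector_bytes(46, TWO_BYTE_ENCODING)
tebibytes_43 = exact_43 / 2 ** 40
```

Because the encodings differ by a factor of 8 = 2^3 in size, the approximate simulator fits three more qubits in the same memory, and a 43-qubit double precision state occupies 128 TiB.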
In principle, sampling states with probabilities $p_A(j)$ requires the knowledge of the distribution function $P_A(k) = \sum_{j=0}^{k} p_A(j)$. If D is large, and $p_A(j) \approx O(1/D)$, as in the case of random states, computing $P_A(k)$ requires the sum over j to be performed with sufficiently high precision. For instance, if $D = 2^{39}$, $p_A(j) \approx O(10^{-12})$ and even with double precision arithmetic (≈ 16 digits), adding $D = 2^{39}$ small numbers requires some care. Note that in practice, each MPI process only calculates a partial sum, which helps to reduce the loss of significant digits. JUQCS-A can compute $P_A(k)$ in double precision and sample bitstrings from it. We denote by $\mathcal{A}$ the set of M bitstrings generated by JUQCS-A after executing the circuit U. Activating this feature requires additional memory, effectively reducing the maximum number of qubits that can be simulated by three. This reduction of the maximum number of qubits might be avoided as follows. In the case at hand, we know that all $p_A(j) \approx O(1/D)$. Then, since $p_A(j)$ is known, one might as well sample the states from a uniform distribution, list the weight $w_A(j) = N p_A(j)$ for each generated state j and use these weights to compute averages. We do not pursue this possibility here because for the present purpose, it is essential to be able to compute $p_U(j)$ and therefore, the maximum number of qubits that can be studied is limited by the amount of memory that JUQCS-E, not JUQCS-A, needs to perform the simulation.
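The two steps discussed above, building the distribution function from partial sums and sampling from it, can be sketched on a small example. This is an illustration under assumptions, not JUQCS code: blocks of the probability vector play the role of MPI processes, `math.fsum` stands in for a carefully accumulated block total, and the toy probabilities are generated from exponentially distributed weights.

```python
import bisect
import math
import random


def blocked_cumulative(probs, block):
    """Distribution function built block by block from per-block partial sums."""
    cumulative = []
    offset = 0.0
    for start in range(0, len(probs), block):
        chunk = probs[start:start + block]
        running = 0.0
        for p in chunk:
            running += p                  # short local sum within the block
            cumulative.append(offset + running)
        offset += math.fsum(chunk)        # accurately rounded block total
    return cumulative


def sample_bitstrings(cumulative, m, rng):
    """Inverse-transform sampling: smallest k whose cumulative value exceeds u."""
    top = cumulative[-1]
    last = len(cumulative) - 1
    return [min(bisect.bisect_right(cumulative, rng.random() * top), last)
            for _ in range(m)]


rng = random.Random(7)
weights = [rng.expovariate(1.0) for _ in range(1 << 12)]
total = math.fsum(weights)
probs = [w / total for w in weights]

cumulative = blocked_cumulative(probs, block=256)
samples = sample_bitstrings(cumulative, 1000, rng)
```

Keeping each running sum short limits the accumulation of rounding error, which is the benefit attributed above to per-process partial sums.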
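For an XEB comparison, the quantities of interest are

$$\alpha_{U,U} \equiv \log D + \gamma + \sum_{j=0}^{D-1} p_U(j) \log p_U(j), \tag{84}$$

$$\alpha_{A,U} \equiv \log D + \gamma + \sum_{j=0}^{D-1} p_A(j) \log p_U(j), \tag{85}$$

$$\alpha_{A,A} \equiv \log D + \gamma + \sum_{j=0}^{D-1} p_A(j) \log p_A(j), \tag{86}$$

$$\alpha_{\mathcal{X},U} \equiv \log D + \gamma + \frac{1}{M} \sum_{j \in \mathcal{X}} \log p_U(j), \tag{87}$$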
where $\mathcal{X}$ is one of the four sets $\mathcal{U}$, $\mathcal{A}$, $\mathcal{M}$ (a collection of bitstrings generated by the experiment), or $\mathcal{C}$ (obtained by generating bitstrings distributed uniformly). If M is sufficiently large (M = 500000 in the case at hand), we may expect that $\alpha_{\mathcal{U},U} \approx \alpha_{U,U}$ and $\alpha_{\mathcal{A},U} \approx \alpha_{A,U}$.
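In addition to the cross entropies Eqs. (84)–(87), we also compute the linear cross entropies

$$\hat{\alpha}_{U,U} \equiv \sum_{j=0}^{D-1} p_U(j) \left( D p_U(j) - 1 \right), \tag{88}$$

$$\hat{\alpha}_{A,U} \equiv \sum_{j=0}^{D-1} p_A(j) \left( D p_U(j) - 1 \right), \tag{89}$$

$$\hat{\alpha}_{A,A} \equiv \sum_{j=0}^{D-1} p_A(j) \left( D p_A(j) - 1 \right), \tag{90}$$

$$\hat{\alpha}_{\mathcal{X},U} \equiv \frac{1}{M} \sum_{j \in \mathcal{X}} \left( D p_U(j) - 1 \right). \tag{91}$$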
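The logarithmic and linear cross entropies defined above can be evaluated on a toy distribution to see the expected limiting values. The sketch below is illustrative only: the ideal probabilities are drawn from normalized exponential weights as a stand-in for a Porter-Thomas distribution, the function names are hypothetical, and the sample size is far smaller than the M = 500000 used in the text.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def log_xeb_full(p_sampler, p_ideal):
    """Full-sum logarithmic cross entropy: log D + gamma + sum_j p_sampler(j) log p_ideal(j)."""
    d = len(p_ideal)
    return math.log(d) + EULER_GAMMA + math.fsum(
        ps * math.log(pi) for ps, pi in zip(p_sampler, p_ideal))


def linear_xeb_full(p_sampler, p_ideal):
    """Full-sum linear cross entropy: sum_j p_sampler(j) (D p_ideal(j) - 1)."""
    d = len(p_ideal)
    return math.fsum(ps * (d * pi - 1.0) for ps, pi in zip(p_sampler, p_ideal))


def log_xeb_sampled(samples, p_ideal):
    """Sample estimate of the logarithmic cross entropy over a set of bitstrings."""
    d = len(p_ideal)
    mean_log = math.fsum(math.log(p_ideal[j]) for j in samples) / len(samples)
    return math.log(d) + EULER_GAMMA + mean_log


def linear_xeb_sampled(samples, p_ideal):
    """Sample estimate of the linear cross entropy over a set of bitstrings."""
    d = len(p_ideal)
    return math.fsum(d * p_ideal[j] - 1.0 for j in samples) / len(samples)


rng = random.Random(3)
dim = 1 << 14
weights = [rng.expovariate(1.0) for _ in range(dim)]
total = math.fsum(weights)
p_ideal = [w / total for w in weights]

ideal_samples = rng.choices(range(dim), weights=p_ideal, k=20000)
uniform_samples = [rng.randrange(dim) for _ in range(20000)]

full_log = log_xeb_full(p_ideal, p_ideal)
full_lin = linear_xeb_full(p_ideal, p_ideal)
ideal_log = log_xeb_sampled(ideal_samples, p_ideal)
ideal_lin = linear_xeb_sampled(ideal_samples, p_ideal)
uniform_log = log_xeb_sampled(uniform_samples, p_ideal)
uniform_lin = linear_xeb_sampled(uniform_samples, p_ideal)
```

For bitstrings drawn from the ideal distribution both estimators come out close to one, and for uniformly drawn bitstrings both come out close to zero, which is the behavior the table of simulation results relies on.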
Table VIII presents simulation results for the logarithmic quantities α defined by Eqs. (84)–(87) and for the linear quantities $\hat{\alpha}$ defined by Eqs. (88)–(91), obtained by running JUQCS-E and JUQCS-A on the supercomputers at the Jülich Supercomputing Centre. For testing quantum supremacy using these machines, the maximum number of qubits that a universal quantum computer simulator can handle is 43 (45 on the Sunway TaihuLight at Wuxi, China [75]).
The fact that in all cases, $\alpha_{U,U} \approx \alpha_{A,A} \approx 1$ supports the hypothesis that the circuit U, executed by either JUQCS-E or JUQCS-A, produces a Porter-Thomas distribution. The fact that in all cases, $\alpha_{\mathcal{U},U} \approx 1$ supports the theoretical result that replacing the sum over all states by the sum over M = 500000 states yields an accurate estimate of the former (see Section IV). Although $\alpha_{A,A} \approx 1$ in all cases, using the sample $\mathcal{A}$ generated by JUQCS-A to compute $\alpha_{\mathcal{A},U}$ shows an increasing deviation from one, the deviation becoming larger as the number of qubits increases. In combination with the observation that $\alpha_{A,A} \approx 1$, this suggests that JUQCS-A produces a random state, albeit not the same state as JUQCS-E. Taking into account that JUQCS-A stores the coefficients of each of the basis states as two single-byte numbers and not as two double precision floating point numbers (as JUQCS-E does), this is hardly a surprise.
From Table VIII it is clear that the simulation results for $\alpha_{\mathcal{X},U}$ and its linear counterpart $\hat{\alpha}_{\mathcal{X},U}$, where $\mathcal{X} = \mathcal{A}, \mathcal{M}, \mathcal{C}$, are consistent. The full XEB fidelity estimates $\alpha_{\mathcal{M},U}$ and $\hat{\alpha}_{\mathcal{M},U}$, that is, the values computed with the bitstrings produced by the experiment, are close to the fidelity estimates of the probabilistic model, patch XEB, and elided XEB, as seen in Fig. 4(a) of the main text.
For reference, in Tables IX and X we present some technical information about the supercomputer systems used to perform the simulations reported in this appendix and give some indication of the computer resources used.
D. Simulation of random circuit sampling with a target fidelity
A classical simulator can leverage the fact that experimental sampling from random circuits occurs at low fidelity FXEB by considering only a small fraction of the Feynman paths (see Secs. X A and X B) involved in the simulation [38], which provides speedups of at least a factor of 1/FXEB. This is done by Schmidt decomposing a few two-qubit gates in the circuit and counting only a fraction of their contributing terms (paths). A key assumption here is that the different paths result in orthogonal output states, as was studied in Ref. [38] and later in Ref. [50]. In what follows, we argue that, given that paths are generated by decomposing gates, the Schmidt decomposition is indeed the optimal approach to achieving the largest speedup, i.e., that the fidelity kept by considering only a fraction of paths is largest when keeping the paths with the largest Schmidt coefficient. This is different from proving the optimality of the Schmidt decomposition of a single gate, since here we refer to the fidelity of the entire output state, and decomposed gates are embedded in a much larger circuit. In addition, we show that, for the two-qubit gates used in this work, the speedup is very close to linear in FXEB (and not much larger), since their Schmidt spectrum is close to flat. We close this section by relating the present discussion to Section VII G 2, where the formation of simplifiable gate patterns in some two-qubit gate tilings of the circuit is introduced.
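The claim that the Schmidt spectrum is close to flat can be explored with a short calculation of the operator Schmidt coefficients of an fSim gate. The sketch below rests on several assumptions: the fSim matrix convention (cos θ and −i sin θ in the middle block, e^(−iφ) in the corner), an expansion in the Pauli basis normalized so that tr(R†R) = 1, and a simplified fidelity estimate in which keeping the k largest terms retains the fraction of the squared coefficients they carry (valid only if paths are orthogonal). The function names are hypothetical.

```python
import cmath
import math


def fsim_schmidt_values(theta, phi):
    """Operator Schmidt coefficients of fSim(theta, phi), in decreasing order.

    The XX and YY components each contribute |sin(theta)|; the remaining two
    values are the singular values of the 2x2 block spanned by I and Z.
    """
    c, s = math.cos(theta), abs(math.sin(theta))
    e = cmath.exp(-1j * phi)
    b00, b01, b11 = (1 + 2 * c + e) / 2, (1 - e) / 2, (1 - 2 * c + e) / 2
    frob = abs(b00) ** 2 + 2 * abs(b01) ** 2 + abs(b11) ** 2
    det_sq = abs(b00 * b11 - b01 * b01) ** 2
    disc = math.sqrt(max(frob * frob - 4 * det_sq, 0.0))
    block = [math.sqrt(max((frob + disc) / 2, 0.0)),
             math.sqrt(max((frob - disc) / 2, 0.0))]
    return sorted(block + [s, s], reverse=True)


def kept_weight(values, k):
    """Fraction of squared Schmidt coefficients carried by the k largest terms."""
    return sum(v * v for v in values[:k]) / sum(v * v for v in values)


flat = fsim_schmidt_values(math.pi / 2, math.pi / 6)      # ideal theta = pi/2
cz_like = fsim_schmidt_values(0.0, math.pi)               # a CZ gate
shifted = fsim_schmidt_values(math.pi / 2 - 0.05, math.pi / 6)
```

Under these assumptions the ideal gate has four equal coefficients, so keeping one path of four retains a weight of 1/4 and the speedup is linear in fidelity; a small deviation of θ from π/2 makes the spectrum only slightly uneven.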
In summary, this section provides a method to simulate approximate sampling with a classical computational cost proportional to FXEB. Sec. XI argues, based on complexity theory, that this scaling is optimal. We note that Refs. [78–80] propose an alternative method to approximately sample the output distribution at low fidelity. In essence, this method relies on the observation that, for some noise models, the high weight Fourier components of the noisy output distribution decay exponentially to 0. Then this method proposes to estimate low weight Fourier components with an additive error which is polynomial in the computational cost. Nevertheless, Ref. [81] shows that all Fourier components of the output distribution of random circuits are exponentially small, and therefore they cannot be estimated in polynomial
TABLE VIII. Simulation results for the logarithmic quantities α defined by Eqs. (84)–(87), obtained by JUQCS-E and JUQCS-A. The results for the linear quantities $\hat{\alpha}$ defined by Eqs. (88)–(91) are given in parentheses. The set of bitstrings $\mathcal{M}$ has been obtained from experiments. In the first column, the number in parentheses is the circuit identification number. Horizontal lines indicate that data is not available (and would require additional simulation runs to obtain it).

qubits | α_{U,U} | α_{A,A} | α_{𝒰,U} | α_{𝒜,U} (linear) | α_{ℳ,U} (linear) | α_{𝒞,U} (linear)
30     | 1.0000  | 1.0000  | 0.9997  | 0.8824 (0.8826)  | 0.0708 (0.0711)  | +0.0026 (+0.0017)
39(0)  | 1.0000  | 1.0000  | 0.9992  | 0.4746 (0.4762)  | 0.0281 (0.0261)  | −0.0003 (−0.0011)
39(1)  | 1.0000  | 1.0000  | 1.0002  | --- (---)        | 0.0350 (0.0362)  | --- (---)
39(2)  | 1.0000  | 1.0000  | 0.9996  | --- (---)        | 0.0351 (0.0332)  | --- (---)
39(3)  | 1.0000  | 1.0000  | 0.9999  | --- (---)        | 0.0375 (0.0355)  | --- (---)
42(0)  | 1.0000  | 1.0001  | 0.9998  | 0.4264 (0.4268)  | 0.0287 (0.0258)  | −0.0024 (−0.0001)
42(1)  | 1.0000  | 1.0000  | 1.0027  | --- (---)        | 0.0254 (0.0273)  | --- (---)
43(0)  | 1.0000  | 1.0001  | 1.0013  | 0.3807 (0.3784)  | 0.0182 (0.0177)  | −0.0010 (−0.0003)
43(1)  | 1.0000  | 1.0000  | ---     | --- (---)        | 0.0217 (0.0204)  | --- (---)
TABLE IX. Specification of the computer systems at the Jülich Supercomputing Centre used to perform all simulations reported in this appendix. The row “maximum # qubits” gives the maximum number of qubits n that JUQCS-E (JUQCS-A) can simulate on a specific computer.

Supercomputer                | JURECA-CLUSTER [76]           | JURECA-BOOSTER [76]                   | JUWELS [77]
CPU                          | Intel Xeon E5-2680 v3 Haswell | Intel Xeon Phi 7250-F Knights Landing | Dual Intel Xeon Platinum 8168
Peak performance             | 1.8 PFlop/s                   | 5 PFlop/s                             | 10.4 PFlop/s
Clock frequency              | 2.5 GHz                       | 1.4 GHz                               | 2.7 GHz
Memory/node                  | 128 GB                        | 96 GB + 16 GB (MCDRAM)                | 96 GB
# cores/node                 | 2 × 12                        | 64                                    | 2 × 24
# threads/core used          | 1                             | 1                                     | 3
maximum # nodes used         | 256                           | 512                                   | 2048
maximum # MPI processes used | 4096                          | 32768                                 | 32768
maximum # qubits             | 40 (43)                       | 41 (44)                               | 43 (46)
time with this method. The conclusion is then that the noisy output distribution can be approximated by sampling bitstrings uniformly at random, the distribution for which all Fourier components are 0. This is consistent with Ref. [27] and Secs. IV and VIII E, but it will produce a sample with FXEB = 0, while the output of the experimental samples at 53 qubits and m = 20 still has FXEB ≥ 0.1%.
1. Optimality of the Schmidt decomposition for gates embedded in a random circuit
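Consider a two-qubit gate $V_{ab}$ acting on qubits a and b. We would like to replace it by a tensor product operator $M_a \otimes N_b$. The final state of the ideal circuit is

$$|\psi\rangle := U_2 V_{ab} U_1 |0^n\rangle, \tag{92}$$

where $U_1$ ($U_2$) is a unitary composed of all the gates applied before (after) $V_{ab}$. The final normalized state of the circuit with the replacement by $M_a \otimes N_b$ is

$$|\phi_{M,N}\rangle := \frac{U_2 (M_a \otimes N_b) U_1 |0^n\rangle}{\left\| U_2 (M_a \otimes N_b) U_1 |0^n\rangle \right\|}. \tag{93}$$

We would like to find M, N which maximize the fidelity of the two states, given by

$$\langle\psi|\phi_{M,N}\rangle = \frac{\langle 0^n| U_1^\dagger V_{ab}^\dagger |\beta\rangle}{\sqrt{\langle\beta|\beta\rangle}}, \tag{94}$$

where

$$|\beta\rangle \equiv (M_a \otimes N_b) U_1 |0^n\rangle. \tag{95}$$

As the overlap is invariant if we multiply $(M_a \otimes N_b)$ by a constant, we fix the normalization $\mathrm{tr}[(M_a \otimes N_b)^\dagger (M_a \otimes N_b)] = 1$.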
We now make the assumption that the circuit is random (or sufficiently scrambling) and that $V_{ab}$ is a gate placed sufficiently in the middle of the computation that the reduced density matrix of qubits a and b of $U_1|0^n\rangle$ shows maximal mixing between the two. In more detail, let

$$\varepsilon := \left\| \mathrm{tr}_{\backslash(a,b)}\left( U_1 |0^n\rangle\langle 0^n| U_1^\dagger \right) - \frac{I}{4} \right\|_2, \tag{96}$$

with $\|X\|_2 := \mathrm{tr}(X^\dagger X)^{1/2}$ the Hilbert-Schmidt norm and $\mathrm{tr}_{\backslash(a,b)}$ the partial trace of all qubits except a and b.
Using Eq. (96) and Eq. (94), we find
TABLE X. Representative elapsed times and number of MPI processes used to perform simulations with JUQCS-E and JUQCS-A on the supercomputer indicated. Note that the elapsed times may fluctuate significantly depending on the load of the machine/network.

                  JUQCS-E                                         JUQCS-A
qubits   gates    Supercomputer   MPI processes   Elapsed time    Supercomputer   MPI processes   Elapsed time
30       614      BOOSTER         128             0:02:28         CLUSTER         128             0:05:23
39       802      CLUSTER         4096            0:42:51         CLUSTER         4096            1:38:42
42       864      JUWELS          16384           0:51:16         JUWELS          8192            2:15:48
43       886      JUWELS          32768           1:01:53         JUWELS          32768           1:32:19
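$$ \langle\psi|\phi_{M,N}\rangle = \mathrm{tr}\left( \mathrm{tr}_{\setminus(a,b)}\left(U_1|0^n\rangle\langle 0^n|U_1^\dagger\right) V_{ab}^\dagger (M_a \otimes N_b) \right) = \frac{1}{4}\,\mathrm{tr}[V_{ab}^\dagger (M_a \otimes N_b)] \pm \|M_a \otimes N_b\|_2\, \|V_{ab}\|_2\, \varepsilon. \tag{97} $$
As $\|M_a \otimes N_b\|_2 = 1$ and $\|V_{ab}\|_2 = 2$, we find
$$ \langle\psi|\phi_{M,N}\rangle = \frac{1}{4}\,\mathrm{tr}[V_{ab}^\dagger (M_a \otimes N_b)] \pm 2\varepsilon. \tag{98} $$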
Refs. [82, 83] proved that for a random circuit $U_1$ of depth D in one dimension, $\varepsilon \le (4/5)^D$. In two dimensions we expect $\varepsilon$ to go to zero even faster with depth, so we can ignore the second term of Eq. (98) for sufficiently large depth.
We now want to find $M_a$, $N_b$ which are optimal for
$$ \max_{M_a, N_b \,:\, \|M_a\|_2 = \|N_b\|_2} \mathrm{tr}[V_{ab}^\dagger (M_a \otimes N_b)]. \tag{99} $$
At this point, we have reduced the problem to finding the optimal decomposition of the gate as a standalone operator.
Consider the operator Schmidt decomposition of $V_{ab}$:
$$ V_{ab} = \sum_i \lambda_i R_{a,i} \otimes S_{b,i}, \tag{100} $$
where $R_{a,i}$ ($S_{b,i}$) are orthonormal sets of operators in the Hilbert-Schmidt inner product, i.e. $\mathrm{tr}(R_{a,i}^\dagger R_{a,j}) = \mathrm{tr}(S_{b,i}^\dagger S_{b,j}) = \delta_{ij}$. The Schmidt singular values $\lambda_1 \ge \lambda_2 \ge \ldots$ are in decreasing order. Then it follows that the solution of Eq. (99) is $\lambda_1$, with optimal solution $M_a = R_{a,1}$ and $N_b = S_{b,1}$. Indeed we can write Eq. (99) as
$$ \max_{|x\rangle, |y\rangle} \langle x|V|y\rangle, \tag{101} $$
where the maximum is over all unit vectors $|x\rangle$, $|y\rangle$ in $(\mathbb{C}^2)^{\otimes 2}$ and V is the matrix
$$ V := \sum_i \lambda_i (R_{a,i} \otimes I)|\Phi\rangle\langle\Phi|(S_{b,i}^\dagger \otimes I), \tag{102} $$
with $|\Phi\rangle = \sum_i |i\rangle \otimes |i\rangle$. This can be verified using the fact that any unit vector $|x\rangle$ in $(\mathbb{C}^d)^{\otimes 2}$ can be written as $|x\rangle = (L \otimes I)|\Phi\rangle$ for a matrix L acting on $\mathbb{C}^d$ s.t. $\|L\|_2 = 1$. The result follows by noting that $\lambda_i$ are the singular values of V.

[Figure S46: histogram with $|\delta\theta|$ on the horizontal axis and $p(|\delta\theta|)$ on the vertical axis.]
FIG. S46. Probability distribution of the deviations $|\delta\theta|$ from θ ≈ π/2 for fSim gates. The magnitude of δθ is directly related to the runtime speedup low fidelity classical sampling can take from exploiting the existence of paths with large Schmidt coefficients. In practice, $|\delta\theta| \approx 0.05$ radians on average, which imposes a bound of less than an order of magnitude on this potential speedup for the circuits, gates, and simulation techniques considered in this work.
The argument above easily generalizes to the problem of finding the optimal operator of Schmidt rank k for replacing the unitary gate. In that case the optimal choice is $\sum_{i=1}^{k} \lambda_i R_{a,i} \otimes S_{b,i}$.
2. Classical speedup for imbalanced gates
We now want to analyze the Schmidt spectrum of the two-qubit gates used in this work. The fSim(θ, φ) gate is introduced in Section VII E. This gate, which is presented in matrix form in Eq. (53), has the following Schmidt singular values:
$$ \lambda_1 = \sqrt{1 + 2\,|\cos(\phi/2)\cos\theta| + \cos^2\theta}, \tag{103} $$
$$ \lambda_2 = \lambda_3 = \sin\theta, \tag{104} $$
$$ \lambda_4 = \sqrt{1 - 2\,|\cos(\phi/2)\cos\theta| + \cos^2\theta}, \tag{105} $$
where normalization is chosen so that $\sum_{i=1}^{4} \lambda_i^2 = 4$. In practice, we have θ ≈ π/2 and φ ≈ π/6, and so we obtain $\lambda_i \approx 1$, ∀i ∈ {1, 2, 3, 4}, which gives a flat spectrum.

In the case that θ = π/2 ± δθ, the spectrum becomes imbalanced, as expected. When considering the decomposition of a number g of fSim(π/2 ± δθ, φ ≈ π/6) gates, the set of weights of all paths is equal to the outer product of all sets of Schmidt coefficients (one per gate). Achieving a fidelity $F_{\mathrm{XEB}} > 0$ implies (in the optimal case) including the largest contributing paths, and so the advantage one can get from this is upper bounded by the magnitude of the largest weight, which is equal to $\prod_{\alpha=1}^{g} \lambda_{\alpha,\max}^2$, where α labels the gates decomposed and $\lambda_{\alpha,\max}$ is the largest Schmidt coefficient for gate α. In practice, $|\delta\theta|$ has values of around 0.05 radians (see Fig. S46). The geometric mean of $\lambda_{\max}$ is about 1.047, which gives an upper bound of $1.047^{2g}$ to the speedup discussed here. For the largest value of g considered in this work, i.e., the decomposition of g = 35 gates using the SFA simulator (Section X A) on a circuit of m = 20 cycles, we obtain a value of $1.047^{2 \times 35} = 25.4$. Note that the speedup obtained in practice (as compared to runtimes over circuits with perfectly flat gate Schmidt decompositions) for fidelities of the order of 0.1% and larger is expected to be far smaller than this value, given that one has to consider a large number of paths, from which only an exponentially small number will have a weight close to 25.4.

[Figure S47: two panels, both with $|\delta\theta|$ on the horizontal axis and speedup (logarithmic scale) on the vertical axis. Left: g = 35 with F = 0.001, 0.002, 0.004, 0.006, 0.01, 0.014. Right: F = 0.001 with g = 10, 15, 20, 25, 30, 35.]
FIG. S47. Classical speedup given by the imbalance in the Schmidt coefficients of the gates decomposed. The speedup is computed by comparison with the case where θ = π/2 exactly. The classical simulation has a target fidelity F, and g fSim gates are decomposed. For simplicity, we assume θ = π/2 + δθ is the same for all gates, as well as φ = π/6. Left: speedup at different target fidelities for fixed g = 35. Note that the speedup decreases with F; this is due to the fact that at very low fidelity, considering a few paths with very high weight might be enough to achieve the target fidelity, while for larger values of F, paths with a smaller weight have to be considered, and so a larger number of them is needed per fractional fidelity increase. Right: speedup for fixed fidelity F = 0.001 for different values of g. As expected, the speedup is greater as g increases, since the weight of the highest contributing paths increases exponentially with g. The largest speedup is achieved at large g and small F. For g = 35 and F ≳ 0.001, we find speedups well below an order of magnitude, given that $|\delta\theta| \approx 0.05$ radians in practice (shaded area); this case is representative of our simulation of Sycamore with m = 20 (see Section X A) targeting the fidelity measured experimentally.
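The closed-form singular values of Eqs. (103)-(105) can be checked numerically. The following is a minimal sketch, not code from this work: it assumes the standard fSim matrix (taken here to match Eq. (53)), reshuffles it so that rows index operators on qubit a and columns index operators on qubit b, and compares the power sums tr[(R R†)^k] for k = 1..4 against the same sums computed from the formulas. Four matching power sums fix the four singular values. All function names are illustrative.

```python
import cmath
import math


def fsim(theta, phi):
    """4x4 fSim matrix in the basis |00>, |01>, |10>, |11> (assumed convention)."""
    c, s = math.cos(theta), math.sin(theta)
    return [
        [1, 0, 0, 0],
        [0, c, -1j * s, 0],
        [0, -1j * s, c, 0],
        [0, 0, 0, cmath.exp(-1j * phi)],
    ]


def realign(gate):
    """Reshuffle V[(i,j),(k,l)] into R[(i,k),(j,l)].

    The singular values of R are the operator Schmidt coefficients of V.
    """
    r = [[0j] * 4 for _ in range(4)]
    for i in range(2):
        for j in range(2):
            for k in range(2):
                for l in range(2):
                    r[2 * i + k][2 * j + l] = gate[2 * i + j][2 * k + l]
    return r


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def dagger(x):
    return [[complex(x[j][i]).conjugate() for j in range(4)] for i in range(4)]


def power_sums_numeric(theta, phi):
    """tr[(R R^dagger)^k] for k = 1..4, i.e. the sums of lambda_i^(2k)."""
    r = realign(fsim(theta, phi))
    gram = matmul(r, dagger(r))
    sums, acc = [], gram
    for _ in range(4):
        sums.append(sum(acc[i][i] for i in range(4)).real)
        acc = matmul(acc, gram)
    return sums


def schmidt_values(theta, phi):
    """Singular values from Eqs. (103)-(105), normalized so that sum(l^2) = 4."""
    cross = abs(math.cos(phi / 2) * math.cos(theta))
    cos2 = math.cos(theta) ** 2
    lam1 = math.sqrt(1 + 2 * cross + cos2)
    lam23 = abs(math.sin(theta))
    lam4 = math.sqrt(max(0.0, 1 - 2 * cross + cos2))
    return [lam1, lam23, lam23, lam4]


def power_sums_formula(theta, phi):
    lams = schmidt_values(theta, phi)
    return [sum(lam ** (2 * k) for lam in lams) for k in range(1, 5)]
```

For θ = π/2 + 0.05 and φ = π/6 the largest value returned by `schmidt_values` is approximately 1.048, close to the geometric mean of about 1.047 quoted above.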
We can get a better estimate for the speedup achieved in practice, beyond the upper bound of about a factor of 25 that decomposing g = 35 gates with typical parameters would give. For simplicity, let us assume that all g gates have the same values of θ and φ. Then the weight of each path arising from this decomposition can be written as $W_i = W_{(a,b,c)} = \lambda_1^{2a} \lambda_2^{2b} \lambda_3^{2c}$, where a + b + c = g, and the number of paths for each choice of (a, b, c) is equal to $\#(a, b, c) = \sum_{k=0}^{b} \mathrm{multinomial}(a, b-k, k, c) = 2^b \times \mathrm{multinomial}(a, b, c)$. After sorting all $4^g$ weights (and paths) by decreasing value, given a target fidelity, F, one now has to consider the first S paths (i.e., those with the largest weight), up to the point where the sum of their weights $\sum_{i=1}^{S} W_i / 4^g$ matches the target fidelity. The $4^g$ normalization factor guarantees that if one were to consider all paths, the fidelity would be unity, as expected. Compared to the case where we consider a number $F \times 4^g$ of paths, as for a flat Schmidt spectrum, this provides a speedup equal to $F \times 4^g / S$. We show the speedup achieved this way in Fig. S47. For the case where we would achieve the largest speedup in the simulations considered in this work, namely the simulation of Sycamore at m = 20 cycles and a fidelity F ≈ 0.2% with g = 35 gates decomposed (see Section X F), we estimate that the speedup obtained this way would be well below an order of magnitude, since $|\delta\theta|$ typically takes values of about 0.05 radians.
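The path-counting estimate above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code used for Fig. S47: the three factors of the path weight are taken to be the three distinct singular values (the largest, the degenerate pair counted by b, and the smallest), paths are grouped into classes (a, b, c), and the last class needed to reach the target fidelity is counted fractionally. Function names and defaults are hypothetical.

```python
import math


def fsim_schmidt_values(theta, phi):
    """Distinct singular values of fSim: (largest, degenerate pair, smallest)."""
    cross = abs(math.cos(phi / 2) * math.cos(theta))
    cos2 = math.cos(theta) ** 2
    lam_max = math.sqrt(1 + 2 * cross + cos2)
    lam_mid = abs(math.sin(theta))
    lam_min = math.sqrt(max(0.0, 1 - 2 * cross + cos2))
    return lam_max, lam_mid, lam_min


def path_speedup(delta_theta, g, fidelity, phi=math.pi / 6):
    """Speedup F * 4^g / S relative to a flat Schmidt spectrum."""
    lam_max, lam_mid, lam_min = fsim_schmidt_values(math.pi / 2 + delta_theta, phi)
    classes = []
    for a in range(g + 1):
        for b in range(g + 1 - a):
            c = g - a - b
            weight = lam_max ** (2 * a) * lam_mid ** (2 * b) * lam_min ** (2 * c)
            # 2^b * multinomial(a, b, c): each "middle" gate picks lambda_2 or lambda_3.
            count = (2 ** b) * math.comb(g, a) * math.comb(g - a, b)
            classes.append((weight, count))
    classes.sort(reverse=True)           # heaviest paths first

    target = fidelity * 4.0 ** g         # required sum of path weights
    accumulated, paths = 0.0, 0.0
    for weight, count in classes:
        block = weight * count
        if accumulated + block >= target:
            paths += (target - accumulated) / weight   # partial use of this class
            break
        accumulated += block
        paths += count
    return target / paths
```

Calling `path_speedup(0.05, 35, 0.001)` gives the estimate for the typical deviation discussed above. By construction the result lies between 1 (flat spectrum) and the upper bound given by the largest path weight.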
3. Verifiable and supremacy circuits
So far we have considered the decomposition of gates one by one, i.e., where the total number of paths is equal to the product of the Schmidt rank of all gates decomposed. However, by fusing gates together in a larger unitary, one can provide some speedup to the classical simulation of the sampling task.
The rationale here comes from the realization that a unitary that involves a number of qubits q cannot have a rank larger than $4^{\min(q_l, q_r)}$ when Schmidt decomposed over two subsets of qubits of size $q_l$ and $q_r$, with $q_l + q_r = q$. Therefore one might reduce exponentially the number of paths by fusing gates such that the resulting unitary reaches on either side (l or r) a number of qubits that is smaller than the product of the ranks of the fused gates to be decomposed. This is at the heart of the formation of wedges of Section VII G 2. These wedges denote particular sequences of consecutive two-qubit gates that only act upon three qubits. Fusing these two-qubit gates together generates 4 paths, as opposed to a naive count of $4^2$ paths if one decomposes each gate separately. Each wedge identified across a circuit cut provides a speedup by a factor of 4.
In this work, we define two classes of circuits: verifiable and supremacy circuits. Verifiable circuits present a large number of wedges across the partition used with the SFA simulator (Section X A) and are therefore classically simulatable in a reasonable amount of time. These circuits were used to perform full XEB over the entire device up to depth m = 14 (see Fig. 4a of the main article and Sections VII and VIII), which involves perfect fidelity computations. On the other hand, supremacy circuits are designed so that the presence of wedges and similar sequences is mitigated, therefore avoiding the possibility of exploiting this classical speedup.
It is natural to apply the ideas presented here beyond wedges. It is also easy to look for similar structures in the circuits algorithmically. This way, we find that for the supremacy circuits there is a small number of such sequences. On the sequence of cycles DCD (see Fig. S25), three two-qubit gates are applied on qubits 16, 47, and 51 (see Fig. S27 for numbering). These three gates can be fused in one. Then, if the two gates between qubits 47 and 51 are decomposed (as is done with the SFA simulations of Section X A used in Fig. 4 of the main article), this technique provides a speedup of a factor of 4. The sequence of layouts DCD appears twice for circuits of m = 20, which provides a total speedup of $4^2 = 16$ in the simulation of the supremacy circuits. This particular decomposition is currently not implemented, and the estimated timings of Section X A and Fig. 4 of the main article do not take it into account.
Beyond this, one has to go to groups of several cycles of the circuit (more than two) in order to identify regions where the fusion of several gates provides any advantage of this kind. In our circuits, the resulting unitaries act upon a large number of qubits, which makes explicitly building the unitary impractical.
E. Treewidth upper bounds and variable elimination algorithms
We explained in Section X B that the Feynman method to compute individual amplitudes of the output of a quantum circuit can be implemented as a tensor network when quantum gates are interpreted as tensors. All indexes of the tensor network have dimension two because indexes correspond to qubits. Similarly, Ref. [70] showed that a quantum circuit can be mapped directly to an undirected graphical model. In the undirected graphical model, vertices or variables correspond to tensor indexes, and cliques correspond to tensors. Individual amplitudes can be computed using a variable elimination algorithm on the undirected graphical model, which is similar to a tensor contraction on a tensor network. The variable elimination algorithm depends on the ordering in which variables are eliminated or contracted. If we define the contraction width of an ordering to be the rank of the largest tensor formed along the contraction, the treewidth of the undirected graph is equal to the minimum contraction width over all orderings. Therefore, the complexity of a tensor network contraction grows in the optimal case exponentially with the treewidth, and the treewidth can be used to study the complexity of Feynman methods for simulating quantum circuits [69]. Ref. [70] showed that for diagonal gates the undirected graphical model is simpler, potentially lowering its treewidth, and hence improving the complexity. This simplification is not achievable in the tensor network view without including hyperedges, i.e., edges attached to more than two tensors. Ref. [70] also introduced the use of QuickBB to find a heuristic contraction ordering [84]. If allowed to run for long enough, QuickBB finds the optimal ordering, together with the treewidth of the graph. However, note that obtaining the treewidth of a graph is an NP-hard problem, and so in practice a suboptimal solution is considered for the simulations described here.
Once the width of a contraction is large enough, the largest tensor it generates is beyond the memory resources available. This constraint was overcome in Ref. [72] by projecting a subset of p variables or vertices in the undirected graphical model into each possible bitstring of 0 and 1 values. This generates $2^p$ similar subgraphs, each of which can be contracted with lower complexity and independently from each other, making the computation embarrassingly parallelizable. Choosing the subset of variables that, after projection, optimally decreases the treewidth of the resulting subgraph is also NP-hard. However, Ref. [72] developed a heuristic approach that works well in practice. The algorithm proceeds as follows:
1. Run QuickBB for S seconds on the initial graph. This gives a heuristic contraction ordering, as well as an upper bound for the treewidth.
2. For each variable, estimate the cost of contracting the subgraph after projection. The estimate is done with the ordering inherited from the previous step.
3. Choose to project the variable which results in the minimum contraction cost.
4. Repeat steps 2 and 3 until the cost is within reasonable resources.
5. Once all variables have been chosen and projected, run QuickBB for S seconds on the resulting subgraph to try to improve the contraction ordering inherited from step 1 and lower the contraction cost.
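The five steps above can be sketched on a small graph. This is a toy illustration under explicit assumptions, not the implementation of Ref. [72]: a greedy minimum-degree ordering stands in for QuickBB, the contraction width of an ordering is taken as the largest number of neighbors a variable has when it is eliminated, and the cost is the sum of 2^rank over eliminations with rank counting the eliminated variable and its neighbors. All names are hypothetical.

```python
from itertools import combinations


def eliminate(graph, order):
    """Eliminate variables in `order`; return (contraction width, cost)."""
    adj = {v: set(nbrs) for v, nbrs in graph.items()}
    width, cost = 0, 0
    for v in order:
        nbrs = adj.pop(v)
        width = max(width, len(nbrs))        # rank of the tensor left behind
        cost += 2 ** (len(nbrs) + 1)         # variables involved: v and its neighbors
        for u in nbrs:
            adj[u].discard(v)
        for u, w in combinations(nbrs, 2):   # the new tensor couples all neighbors
            adj[u].add(w)
            adj[w].add(u)
    return width, cost


def min_degree_order(graph):
    """Greedy minimum-degree ordering (a simple stand-in for QuickBB)."""
    adj = {v: set(nbrs) for v, nbrs in graph.items()}
    order = []
    while adj:
        v = min(adj, key=lambda x: (len(adj[x]), x))
        nbrs = adj.pop(v)
        for u in nbrs:
            adj[u].discard(v)
        for u, w in combinations(nbrs, 2):
            adj[u].add(w)
            adj[w].add(u)
        order.append(v)
    return order


def project(graph, v):
    """Fix variable v to a definite value, which removes it from the graph."""
    return {u: nbrs - {v} for u, nbrs in graph.items() if u != v}


def greedy_projection(graph, max_width):
    order = min_degree_order(graph)                   # step 1
    width, _ = eliminate(graph, order)
    projected = []
    while width > max_width:                          # step 4
        best = None
        for v in order:                               # step 2: inherited ordering
            rest = [u for u in order if u != v]
            w, c = eliminate(project(graph, v), rest)
            if best is None or c < best[0]:
                best = (c, w, v)
        _, width, v = best                            # step 3: cheapest projection
        graph = project(graph, v)
        order = [u for u in order if u != v]
        projected.append(v)
    retry = min_degree_order(graph)                   # step 5
    if eliminate(graph, retry) < eliminate(graph, order):
        order = retry                                 # keep it only if it helps
    return projected, graph, order


def grid_graph(rows, cols):
    graph = {(r, c): set() for r in range(rows) for c in range(cols)}
    for r, c in graph:
        for nb in ((r + 1, c), (r, c + 1)):
            if nb in graph:
                graph[(r, c)].add(nb)
                graph[nb].add((r, c))
    return graph
```

For example, `greedy_projection(grid_graph(4, 4), 2)` returns the projected variables, the remaining subgraph, and an ordering of width at most 2; with p projected variables, 2^p such subgraphs would have to be contracted.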
[Figure S48. Top panel: contraction width versus number of variables projected, for Sycamore supremacy circuits at m cycles (m = 10, 12, 14, 16, 18, 20) and Bristlecone circuits at depth d = (1 + 32 + 1) and d = (1 + 40 + 1), using QuickBB. Bottom panel, "Sampling 1M bitstrings with fidelity 0.5%": runtime (s) on 1M cores versus m.]
FIG. S48. Contraction widths and estimated runtimes for classical sampling using the variable elimination algorithm with projected variables of Ref. [72] for Sycamore supremacy circuits. Top: contraction width as a function of the number of variables projected using the algorithm of Ref. [72]. We project enough variables in order to decrease the width to 28 or lower. Note that often the second QuickBB run does not decrease the treewidth (and might even increase it), in which case the resulting contraction ordering is ignored. Bottom: estimated runtimes for the classical sampling of 1M bitstrings from the supremacy circuits with fidelity 0.5% using the contraction ordering found by QuickBB at the end of the projection procedure shown in the top panel. The red data point shows the estimated runtime for a verifiable circuit; note that the heuristic algorithm analyzed here provides some speedup in this case. Our time estimates assume the use of fast sampling, although it is so far unclear whether this technique can be adapted to the algorithm described here. Failure to do so would result in a slowdown of about an order of magnitude.
In the top panel of Fig. S48 we show the contraction width as a function of the number of variables that are projected for the supremacy circuits used in this paper. In order to decrease the contraction width to 28 or below (a tensor with 28 binary indexes consumes 2 GB of memory using single precision complex numbers), we need to project between 8 and 63 variables, depending on the depth of the circuits. In addition, we report the result of the projection procedure on the Bristlecone circuits considered in Refs. [50, 85] and available at https://github.com/sboixo/GRCS for depths (1+32+1) and (1+40+1), since these cases were benchmarked in Ref. [85]. We obtain a contraction width equal to 28 after 10 projections for Bristlecone at depth (1+32+1), and width 26 after 22 projections for Bristlecone at depth (1+40+1), consistent with the results in Ref. [85]. Even though Ref. [72] uses S = 60, we run QuickBB for 1800 seconds (30 minutes) every time, in order to decrease the contraction width of the Bristlecone simulations to values that match the memory requirements reported in Ref. [85]. Note that Ref. [85] neither reports the value of S used nor the contraction widths found; however, with S = 1800 we are able to match the scaling of time complexity reported, as is explained below.
To estimate the runtime of the computation of a single amplitude using this algorithm on the circuits presented in this work, we use the following scaling formula:
$$ T_{\mathrm{VE}} = C_{\mathrm{VE}}^{-1} \cdot 2^p \cdot (\text{cost after } p \text{ projections}) / n_{\mathrm{cores}}, \tag{106} $$
where VE refers to the variable elimination algorithm with projections described in this section, $C_{\mathrm{VE}}$ is a constant factor, p is the number of variables projected, and $n_{\mathrm{cores}}$ is the number of cores used in the computation. The cost of the full contraction of each subgraph is estimated as the sum of $2^{\mathrm{rank}}$, where the rank refers to the number of variables involved in each individual contraction along the full contraction of the subgraph. We obtain the value of $C_{\mathrm{VE}}$ from the runtimes reported in Ref. [72], which shows that a single amplitude of Bristlecone at depth (1+32+1) takes 0.43 seconds to compute on 127,512 CPU cores with 10 projected variables, and at depth (1+40+1) it takes 580.7 seconds with 22 projected variables using the same number of cores. We use the benchmark at depth (1+32+1) because it provides the largest value for $C_{\mathrm{VE}}$ (lowest time estimates), which is equal to 52.7 MHz; the benchmark at depth (1+40+1) gives $C_{\mathrm{VE}}$ = 51.6 MHz. In order to sample 1M bitstrings from a random circuit with fidelity 0.5%, we need to compute 5000 amplitudes.
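The arithmetic behind these estimates can be written out as a short sketch. The function names are illustrative, the contraction cost must be supplied by the caller (it is not tabulated here), and multiplying the single-amplitude time of Eq. (106) by the number of amplitudes assumes that fast sampling applies, as in the optimistic estimates of this section.

```python
def tensor_bytes(width, bytes_per_amplitude=8):
    """Memory of a tensor with `width` binary indexes (8 bytes = single precision complex)."""
    return 2 ** width * bytes_per_amplitude


def amplitudes_needed(num_bitstrings, fidelity):
    """Amplitudes to compute when fast sampling is applicable."""
    return round(num_bitstrings * fidelity)


def ve_runtime_seconds(cost_after_projection, num_projected, c_ve_hz, num_cores,
                       num_amplitudes=1):
    """Eq. (106), multiplied by the number of amplitudes to be computed."""
    single = 2 ** num_projected * cost_after_projection / (c_ve_hz * num_cores)
    return num_amplitudes * single
```

For instance, `tensor_bytes(28)` reproduces the 2 GB quoted for width 28, and `amplitudes_needed(1_000_000, 0.005)` gives the 5000 amplitudes stated above.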
We present our estimates for Sycamore supremacy circuits in the bottom panel of Fig. S48. Note that depth (1+40+1) in Refs. [72, 85] is equivalent to m=20 cycles here because of the denser layout of two-qubit gates. Furthermore, computation times reported previously are for circuit variations less complex than for Sycamore, arising from changes in complexity such as CZ vs. fSim gates and differing patterns; with this change of gates, depth (1+40+1) in Refs. [72, 85] is actually equivalent to m=10 cycles here. Finally, note that we present optimistic estimates, since we are assuming that the fast sampling technique discussed in Section X B is applicable here. To the best of our knowledge, it is not known how to apply this technique for the heuristic variable elimination algorithm discussed here; in the absence of an implementation of this technique, in order to successfully apply rejection sampling we would instead need to compute a few independent amplitudes per sampled bitstring, which would increase the estimated times by about an order of magnitude (see Section X B and Refs. [38, 86] for more details). According to our estimates, sampling from supremacy circuits at m = 16 and beyond is out of reach for this algorithm. Interestingly, we find some speedup for the simulation of verifiable circuits, as is shown in Fig. S48 for m = 16 (red data point).

[Figure S49: circuit diagram built from single-qubit gates X^{1/2}, Y^{1/2}, W^{1/2} and two-qubit Syc gates (top), and the corresponding graph (bottom).]
FIG. S49. Circuit with Sycamore gates (top) and its corresponding undirected graphical model (bottom). Each non-diagonal single-qubit gate introduces a new vertex or variable. Note that, even though two-qubit gates are generally represented by a clique with four vertices or variables, Sycamore gates can be simplified as a cphase followed by a SWAP. The cphase is represented as an edge between two existing variables. The SWAP, however, provides more complexity to the graph as it swaps the corresponding variables.

[Figure S50: histogram with job times (seconds) on the horizontal axis and counts on the vertical axis.]
FIG. S50. Qsimh execution time for a 53 qubit circuit with 20 cycles for the first 1000 prefix values. The average job time $t_{\mathrm{prefix}}$ is calculated to be 246 seconds.
Finally, note that the undirected graphical model derived from the supremacy circuits can take advantage of the structure of the Sycamore gates (fSim plus single-qubit Rz rotations). Due to the fact that fSim(θ ≈ π/2, φ) ≈ i · [Rz(−π/2) ⊗ Rz(−π/2)] · cphase(π + φ) · SWAP, the Sycamore gate corresponds to a subgraph of only two variables, which explicitly represents the diagonal cphase and the logical SWAP. This simplification, used in our estimates, results in an undirected graphical model that is simpler than the one generated by arbitrary two-qubit gates. See Fig. S49 for an example.
F. Computational cost estimation for the sampling task
We find that the most efficient simulator for our hardest circuits is the SFA simulator (see Sec. X A). In order to estimate the computational cost associated with simulating a 53 qubit circuit with 20 cycles, where no gates are elided on the cut, we use a Google cloud cluster composed of 1000 machines with 2 vCPUs and 7.5 GB of RAM each (n1-standard-2). We use n1-standard-2 because this is the smallest non-custom machine with sufficient RAM for simulating the two halves of the circuit. In 20 cycles, the circuit contains 35 gates across the cut. All cross gates have a Schmidt rank of 4 except for the last four gates which can be simplified to cphase with a Schmidt rank of 2. To obtain a perfect fidelity simulation we would need to simulate all $4^{31} \times 2^4$ paths. We configure qsimh according to Ref. [38] to have a prefix of 30 cross gates, thus requiring $4^{30}$ separate qsimh runs. The first 1000 paths of the required $4^{30}$ were used for timing purposes. In Figure S50 we plot the distribution of simulation times with qsimh consuming two hyperthreads. The average job time is 246 seconds resulting in a calculated $1.6 \times 10^{14}$ core hours for a simulation of the circuit with 0.002 fidelity [87]. Extrapolated run times for other circuits with 53 qubits are shown in Table XI. To calculate a total cost for the largest circuit we multiply the Google Cloud preemptible n1-standard-2 price in zone us-central-1 of $0.02 per hour, 246 seconds average run time, 0.002 target fidelity, and $4^{30}$ qsimh runs. This results in an estimated cost of 3.1 trillion USD. For perfect fidelity simulations (necessary for XEB), an extrapolation to a fidelity value of 100% gives a good estimate of the run time. We believe these estimates are a lower bound on costs and simulation time due to the fact that these calculations are likely to compete with each other if they are run on the same nodes.
As a final remark, note that a hypothetical implementation of the decomposition discussed at the end of Section X D 3 could decrease the computation time presented here by a factor of 16.
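The cost arithmetic of this section can be reproduced with a few lines. This is only a sketch of the multiplication described above, with the quoted figures as default arguments; the function name is illustrative.

```python
SECONDS_PER_HOUR = 3600


def qsimh_cost(prefix_gates=30, schmidt_rank=4, seconds_per_run=246.0,
               fidelity=0.002, usd_per_hour=0.02):
    """Return (number of prefix runs, machine hours, cost in USD)."""
    runs = schmidt_rank ** prefix_gates            # 4^30 prefix values
    # Only a fraction `fidelity` of the runs is needed for the target fidelity.
    hours = runs * fidelity * seconds_per_run / SECONDS_PER_HOUR
    return runs, hours, hours * usd_per_hour
```

With the default values, `qsimh_cost()` yields roughly 1.6 × 10^14 hours and about 3.1 trillion USD, matching the figures quoted above to the stated precision.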
qubits, n   cycles, m   total #paths    fidelity   run time
53          12          4^17 × 2^4      1.4%       2 hours
53          14          4^21 × 2^4      0.9%       2 weeks
53          16          4^25 × 2^3      0.6%       4 years
53          18          4^28 × 2^3      0.4%       175 years
53          20          4^31 × 2^4      0.2%       10000 years

TABLE XI. Approximate qsimh run times using one million CPU cores extrapolated from the average simulation run time for 1000 simulation paths on one CPU core.
G. Understanding the scaling with width and depth of the computational cost of verification
1. Runtime scaling formulas
Here we study the scaling of the runtime of the classical computation of exact amplitudes from the output wave function of a circuit with m cycles and n qubits on Sycamore, assuming a supercomputer with 1M cores. This computation is needed in order to perform XEB on the circuits run. We consider two algorithms: a distributed Schrödinger algorithm (SA) [74, 75] (see Section X C) and a hybrid Schrödinger-Feynman algorithm (SFA) [38] that splits the circuit in two patches and time evolves each of them for all Feynman paths connecting both patches (see Section X A). The latter is embarrassingly parallelizable. Note that these scaling formulas provide rough estimates presented with the intent of building intuition on the scaling of runtimes with the width and depth of the circuits, and that the finite size effects of the circuits can give discrepancies of an order of magnitude or more for the circuit sizes considered in this work.
For SA, the runtime is directly proportional to the size of the wave function on n qubits. This is equal to $2^n$. In addition, the runtime is proportional to the number of gates applied, which scales linearly with n and m. For this reason, we propose the scaling:
$$ T_{\mathrm{SA}} = C_{\mathrm{SA}}^{-1} \cdot m n \cdot 2^n, \tag{107} $$
where the constant CSA is fit to runtimes observed experimentally when running on a supercomputer, and scaled to 1M cores.
For SFA the runtime is proportional to the number of paths connecting both patches, as well as to the time taken to simulate each pair of patches. When using the supremacy two-qubit gate layouts (ABCDCDAB. . . ), each fSim gate bridging between the two patches (cross-gates) generates a factor of 4 in the number of paths. The number of cross-gates scales with $\sqrt{n}$ (we assume a two-dimensional grid) and with m. The time taken to simulate each patch is proportional to $2^{n/2}$, where n/2 estimates the number of qubits per patch, and the exponential dependence comes from a linear scaling of the runtime with the size of the wave function over that patch. The runtime therefore scales as:
$$ T_{\mathrm{SFA,\,supremacy}} = C_{\mathrm{SFA}}^{-1} \cdot 2 \cdot 2^{n/2} \cdot 4^{B \cdot m \sqrt{n}}, \tag{108} $$
where the extra factor of two accounts for the fact that, for every path, two patches have to be simulated. The constant $C_{\mathrm{SFA}}$, with units of frequency, is the effective frequency with which 1M cores simulate paths and is fit from experimentally observed runtime. The constant B accounts for the average number of cross-gates observed per cycle, which depends on the two-dimensional grid considered and on the two-qubit gate layouts used. For Sycamore, with the supremacy layouts, we find 35 cross-gates for n = 53 and m = 20, which gives B = 0.24 ≈ 1/4.
For SFA, using the verifiable two-qubit gate layouts (EFGHEFGH. . . ), the main difference with the supremacy circuits case is the fact that most of the cross-gates can be fused in pairs, forming three-qubit gates we refer to as wedges (see Sec. VII G 2 and X D 3). Each cross-wedge generates only 4 paths, as opposed to the $4^2$ paths the two independent fSim gates would have generated. Since every 4 cycles provide 7 cross-gates, and from those 7 gates, 6 are converted into 3 wedges, we count only $4^4$ paths, as opposed to a naive count of $4^7$ for those 4 cycles. In turn, the exponent in the last factor of Eq. (108) is corrected by the fraction 4/7. This results in:
$$ T_{\mathrm{SFA,\,verifiable}} = C_{\mathrm{SFA}}^{-1} \cdot 2 \cdot 2^{n/2} \cdot 4^{\frac{4}{7} B \cdot m \sqrt{n}}. \tag{109} $$
2. Assumptions and corrections
There are several assumptions considered in Section X G 1 and other details that can either (1) contribute to a somewhat large discrepancy between the runtimes predicted by the scaling formulas and the actual runtimes potentially measured experimentally, or (2) be ignored with no significant impact on the accuracy of the predictions. Here we discuss the ones we consider most relevant.
Concerning SA, the algorithm is benchmarked in practice on up to 100K cores. Since this is a distributed algorithm, the scaling with number of cores is not ideal and therefore the constant CSA can only be estimated roughly. We assume perfect scaling in our estimates for runtime on 1M cores, i.e., the runtime on 1M cores is the one on 100K cores divided by 10; this is of course an optimistic estimate, and runtimes should be expected to be larger.
For memory requirement estimates, we assume a 2 byte encoding of complex numbers. Beyond about 49 qubits there is not enough RAM on any existing supercomputer to store the wave function. In those cases, runtimes are given for the unrealistic, hypothetical case that one can store the wave function.
FIG. S51. Scaling of the computational cost of XEB using SA and SFA. a, For a Schrödinger algorithm, the limitation is RAM size, shown as a vertical dashed line for the Summit supercomputer. Circles indicate full circuits with n = 12 to 43 qubits that are benchmarked in Fig. 4a of the main paper [1]. 53 qubits would exceed the RAM of any current supercomputer, and is shown as a star. b, For the hybrid Schrödinger-Feynman algorithm, which is more memory efficient, the computation time scales exponentially in depth. XEB on full verifiable circuits was done at depth m = 14 (circle). c, XEB on full supremacy circuits is out of reach within reasonable time resources for m = 12, 14, 16 (stars), and beyond. XEB on patch and elided supremacy circuits was done at m = 14, 16, 18, and 20.
SFA is embarrassingly parallelizable, and so it does not suffer from non-ideal scaling. However, there are other factors to take into account. First, we have written no explicit dependence of the time to simulate patches of the circuit with m; the number of cycles m only plays a role when counting the number of paths to be considered. SFA stores several copies of the state of a patch after its evolution at different depths, iterating over paths over several nested loops. For this reason, most of the time is spent iterating over the inner-most loop, which accounts for the last few gates of the circuit and is similar in cost for all depths. This implies that the amortized time per path is considered approximately equal for all depths and the direct m dependence was correctly ignored.
A factor contributing to the discrepancy between the predicted runtimes of the scaling formulas of Section X G 1 and those expected in practice is due to finite size effects. While these scaling formulas consider the average number of cross-gates encountered per cycle, different cycles have layouts that contribute a few more (or fewer) gates than others. Since the runtime dependency is exponential in the number of gates, this might cause discrepancies of around an order of magnitude. Furthermore, for verifiable circuits, wedges form over groups of two cycles; this coarse graining exacerbates finite size effects. For the sake of simplicity in the scaling formulas, we do not perform any corrections to include these factors. However, in order to mitigate the propagation of finite size effect errors, we consider different constants $C_{\mathrm{SFA,\,supremacy}}$ and $C_{\mathrm{SFA,\,verifiable}}$, that we fit independently.
Finally, we refer to runtimes of our simulations on a hypothetical supercomputer with 1M cores. While this is a realistic size for a Top-5 supercomputer currently, a core-hour can vary significantly between different CPU types. Again, we only intend to provide rough estimates in order to build intuition on the dependence of runtimes with circuit width and depth.
3. Fitting constants
In the case of SA, we fit the constant $C_{\mathrm{SA}}$ with a runtime of 0.1 hours for the simulation with n = 43 and m = 14. This runtime is obtained by assuming ideal scaling when extrapolating a runtime of 1 hour on nearly 100K cores ($2^{15}$ MPI processes, 3 cores per process), as reported in Sec. X C. This gives a value of
$$ C_{\mathrm{SA}} = 0.015 \times 10^6 \ \mathrm{GHz}. \tag{110} $$
For SFA, we consider B = 1/4 for simplicity. In order to fit CSFA, we consider a runtime of 5 hours and 4 years for the case with n = 53 and m = 14 for verifiable and supremacy circuits, respectively (see Fig. 4 of the main text). This gives:
$$ C_{\mathrm{SFA,\,verifiable}} = 0.0062 \times 10^6 \ \mathrm{GHz}, \qquad C_{\mathrm{SFA,\,supremacy}} = 3.3 \times 10^6 \ \mathrm{GHz}. \tag{111} $$
As discussed above, these fits provide times estimated for a supercomputer with 1M cores. Contour plots showing the dependency of runtime with n and m are presented in Fig. S51.
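As a consistency check, the scaling formulas of Eqs. (107)-(109) can be evaluated with the fitted constants of Eqs. (110) and (111). The sketch below uses illustrative function names and B = 1/4; it is meant only to reproduce the fit points (about 0.1 hours for SA at n = 43, m = 14, and about 5 hours and 4 years for SFA at n = 53, m = 14), not to provide accurate runtimes.

```python
import math

GHZ = 1e9
C_SA = 0.015e6 * GHZ               # Eq. (110)
C_SFA_VERIFIABLE = 0.0062e6 * GHZ  # Eq. (111)
C_SFA_SUPREMACY = 3.3e6 * GHZ      # Eq. (111)


def t_sa(n, m):
    """Eq. (107): Schrodinger algorithm runtime in seconds on 1M cores."""
    return m * n * 2.0 ** n / C_SA


def t_sfa(n, m, verifiable=False, b=0.25):
    """Eqs. (108) and (109): Schrodinger-Feynman runtime in seconds on 1M cores."""
    exponent = b * m * math.sqrt(n)
    constant = C_SFA_SUPREMACY
    if verifiable:
        exponent *= 4.0 / 7.0      # wedges: 4^4 paths instead of 4^7 per 4 cycles
        constant = C_SFA_VERIFIABLE
    return 2.0 * 2.0 ** (n / 2.0) * 4.0 ** exponent / constant


HOUR = 3600.0
YEAR = 365.25 * 24 * HOUR
```

For example, `t_sa(43, 14) / HOUR`, `t_sfa(53, 14, verifiable=True) / HOUR`, and `t_sfa(53, 14) / YEAR` recover the three fit points to within a few percent.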
4. Memory usage scaling
Let us conclude with a discussion of the memory footprint of both algorithms. For these estimates, we assume a 2-byte encoding of complex numbers, as opposed to 8 bytes (single precision) or 16 bytes (double precision). This results in a lower bound for the memory usage of these two algorithms. These estimates need an extra factor of 4 (8) when using single (double) precision. SA stores the wave function of the state on all qubits. For this reason, it needs $2^{n} \times 2 = 2^{n+1}$ bytes. SFA simulates the wave function of both halves of the system ($n/2$ qubits) per path, one at a time. This requires $2^{n/2} \cdot 2$ bytes per path. In practice, the use of checkpoints implies the need to store more than one wave function per path; for simplicity, and in the same optimistic spirit of other assumptions, we ignore this fact. If 1M cores are used and each path is simulated using a single core, the total memory footprint is estimated to be $10^{6} \times 2^{n/2+1}$ bytes. State-of-the-art supercomputers have less than 3 PB of memory.
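The two memory estimates above can be evaluated directly. The following is a minimal sketch, assuming a decimal petabyte ($10^{15}$ bytes) and floating-point arithmetic so that odd $n$ can be handled; the function names are illustrative and do not come from the paper.

```python
PETABYTE = 10**15  # assumption: decimal petabyte, in bytes


def sa_memory_bytes(n: int, bytes_per_amplitude: int = 2) -> float:
    """Memory for the full 2**n-amplitude wave function stored by SA."""
    return bytes_per_amplitude * 2.0**n


def sfa_memory_bytes(n: int, paths_in_flight: int = 10**6,
                     bytes_per_amplitude: int = 2) -> float:
    """Memory when each concurrently simulated path holds one n/2-qubit wave function."""
    return paths_in_flight * bytes_per_amplitude * 2.0 ** (n / 2)


if __name__ == "__main__":
    for n in (43, 53):
        print(n, sa_memory_bytes(n) / PETABYTE, sfa_memory_bytes(n) / PETABYTE)
```

Under these assumptions, the SA estimate for $n = 53$ exceeds the 3 PB quoted above, while the SFA estimate stays below it.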
H. Energy advantage for quantum computing
With the end of Dennard scaling for CMOS circuits, gains in computing energy efficiency have slowed significantly [88]. As a result, today's high performance computing centers are usually constrained by available energy supplies rather than hardware costs. For example, the Summit supercomputer at Oak Ridge National Laboratory has a total power capacity of 14 MW available to achieve a design specification of 200 Pflop/s double-precision performance. We took detailed energy measurements with qFlex running on Summit. The energy consumption grows exponentially with the circuit depth, as illustrated in Table VII.
For a superconducting quantum computer, the two primary sources of energy consumption are:
1. A dilution refrigerator: our refrigerator has a direct power consumption of 10 kW, dominated by the mechanical compressor driving the 3 K cooling stage. The power required to provide chilled water cooling for the compressor and pumps associated with the refrigerator can be an additional 10 kW or more.
2. Supporting electronics: these include microwave electronics, ADCs, DACs, clocks, classical computers, and oscilloscopes that are directly associated with a quantum processor in the refrigerator. The average power consumption of supporting electronics was nearly 3 kW for the experiments in this paper.
We estimate the total average power consumption of our apparatus under worst-case conditions for chilled water production to be 26 kW. This power does not change appreciably between idle and running states of the quantum processor, and it is also independent of the circuit depth. This means that the energy consumed during the 200 s required to acquire 1M samples in our experiment is $5 \times 10^{6}$ J ($\sim 1$ kWh). As compared to the qFlex classical simulation on Summit, we require roughly 7 orders of magnitude less energy to perform the same computation (see Table VII). Furthermore, the data acquisition time is currently dominated by control hardware communications, leading to a quantum processor duty cycle as low as 2%. This means there is significant potential to increase our energy efficiency further.
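The energy figure follows from multiplying the average power by the acquisition time. A short sketch of that arithmetic, using only the 26 kW and 200 s values quoted above plus the standard conversion 1 kWh = 3.6 × 10^6 J (variable names are illustrative):

```python
POWER_W = 26e3          # total average power of the apparatus, from the text
ACQUISITION_S = 200.0   # time to acquire 1M samples, from the text
JOULES_PER_KWH = 3.6e6  # unit conversion

energy_j = POWER_W * ACQUISITION_S
energy_kwh = energy_j / JOULES_PER_KWH

if __name__ == "__main__":
    print(f"{energy_j:.2e} J = {energy_kwh:.2f} kWh")
```

The result is of the same order as the rounded values given in the text.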
XI. COMPLEXITY-THEORETIC FOUNDATION OF THE EXPERIMENT
The notion of quantum supremacy was originally introduced by John Preskill [89]. He conceived of it as “the day when well controlled quantum systems can perform tasks surpassing what can be done in the classical world”. For the purpose of an experimental demonstration we would like to refine the definition.
Demonstrating quantum supremacy requires:
1. A well defined computational task, i.e. a mathematical specification of a computational problem with a well defined solution.
Comment: This requirement, standard in computer science, excludes tasks such as “simulate a glass of water”. However, it would include finding the ground state energy of an H2O molecule to a given precision governed by a specific Hamiltonian. Note
that a mathematical specification of a computational problem calls for highly accurate control resulting in measurable system fidelity.
2. Programmable computational device
Comment: Many physics experiments estimate the values of observables to a precision which cannot be obtained numerically. But those do not involve a freely programmable computational device and the computational task is often not well defined as required above. Ideally, we would even restrict ourselves to devices that are computationally universal. However, this would exclude proposals to demonstrate quantum supremacy with BosonSampling [90] or IQP circuits [91].
3. A scaling runtime difference between the quantum and classical computational processes that can be made large enough as a function of problem size so that it becomes impractical for a supercomputer to solve the task using any known classical algorithm.
Comment: What is impractical for classical computers today may become tractable in ten years. So the quantum supremacy frontier will be moving towards larger and larger problems. But if a task is chosen such that the scaling for the quantum processors is polynomial while for the classical computer it is exponential then this shift will be small. Establishing an exponential separation requires substantial efforts designing and benchmarking classical algorithms [27, 50, 67–70, 72, 74, 75, 85], and support from complexity theory arguments [27, 30, 92]. Sampling the output of random quantum circuits is likely to exhibit this scaling separation as a function of the number of qubits for large enough depth. In this context, we note that quantum analog simulations that estimate an observable in the thermodynamic limit typically do not define a problem size parameter.
The requirements above are satisfied by proposals of quantum supremacy emerging from computer science, such as BosonSampling [90], IQP circuits [91], and random circuit sampling [6, 27, 30, 92, 93]. They are also implicit in the Extended Church-Turing Thesis: any reasonable model of computation can be efficiently simulated, as a function of problem size, by a Turing machine.
We note that formal complexity proofs are asymptotic, and therefore assume an arbitrarily large number of qubits. This is only possible with a fault tolerant quantum computer and therefore near term practical demonstrations of quantum supremacy must rely on a careful comparison with highly optimized classical algorithms on state-of-the-art supercomputers.
So far we have argued for quantum supremacy by comparing the running time of the quantum experiment with the time required for the same task using the best known classical algorithms, running on the most powerful supercomputers currently available. The fastest known algorithm for exact sampling (or for computing transition probabilities) runs in time exponential in the treewidth of the quantum circuit [69, 70]; for a depth $D$ circuit on a rectangular lattice of sizes $l_x$ and $l_y$, the treewidth is given by $\min(\min(l_x, l_y)\,D,\ l_x l_y)$. For approximate simulation in which one only requires a given global fidelity $F$, the classical cost is reduced linearly in $F$ [38]. As classical algorithms and compute power can be improved in the future, the classical cost benchmark is a moving target.
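The treewidth expression and the linear-in-$F$ cost reduction can be turned into a rough exponent estimate. The sketch below assumes all constant prefactors are 1 and a cost of exactly $2^{\mathrm{treewidth}} \cdot F$, which is a simplification for illustration; the function names and the example lattice are hypothetical.

```python
import math


def treewidth_estimate(lx: int, ly: int, depth: int) -> int:
    """min(min(lx, ly) * D, lx * ly), the expression quoted in the text."""
    return min(min(lx, ly) * depth, lx * ly)


def log2_cost_estimate(lx: int, ly: int, depth: int, fidelity: float = 1.0) -> float:
    """log2 of 2**treewidth * F, assuming unit prefactors (an assumption)."""
    return treewidth_estimate(lx, ly, depth) + math.log2(fidelity)


if __name__ == "__main__":
    # Hypothetical 6 x 9 lattice at two depths.
    print(treewidth_estimate(6, 9, 4), treewidth_estimate(6, 9, 20))
```

For shallow circuits the depth term is the smaller one; once $\min(l_x, l_y)\,D$ exceeds $l_x l_y$, the estimate saturates at the number of qubits.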
A complementary approach to back up supremacy claims consists of giving complexity-theoretic arguments for the classical hardness of the problem solved (in our case sampling from the output distribution of a random circuit of a given number of qubits, depth and output fidelity). Previous work gave hardness results for sampling exactly from the output distribution of different classes of circuits [27, 90, 94–96]. Most relevant to us are Refs. [92, 93, 97], which proved that it is classically intractable (unless the polynomial hierarchy collapses to its third level, which is considered extremely unlikely [98]) to sample from the exact probability distribution of outcomes of measurements in random circuits. We note the distribution of circuits considered in [92, 93, 97] is different from ours.
An important clarification is that such results are asymptotic, i.e. they show that, unless the polynomial hierarchy collapses, there are no polynomial-time classical algorithms for sampling from output measurements of certain quantum circuits. But they cannot be used directly to give concrete lower bounds for quantum computations of a fixed number of qubits and depth. Refs. [99–101] tackled this question using tools from fine-grained complexity, giving several finite size bounds.
There are also results arguing for the hardness of approximate sampling (see e.g. [27, 90, 91, 95]), where the task is only to sample from a distribution which is close to the ideal one. As the quantum experiment will never be perfect, this is an important consideration. However those results are weaker than the ones for exact sampling, as the hardness assumptions required have been much less studied (and in fact were introduced with the exact purpose of arguing for quantum supremacy). Another drawback is that the results only apply to the situation where the samples come from a distribution very close to the ideal one (i.e. with high fidelity with the ideal one). This is not the regime in which our experiment operates.
With these challenges in mind, we consider an alternative hardness argument in this section, which will allow us to lower bound the classical simulation cost of noisy quantum circuits by the cost of the ideal one. On one hand, our argument will be more restrictive than previous results in that we will assume a particular noise model for the quantum computer (one, however, which models well the experiment). On the other hand, it will be stronger in two ways: (1) it will apply even to the setting in which the output fidelity of the experimental state with the ideal one can be very small, but still the
product of total fidelity with exact computational cost is large; and (2) it will be based on more mainstream complexity assumptions in contrast to the tailor-made conjectures required in e.g. [90, 91, 95] to handle the case of small adversarial noise.
A. Error model
Our error model is the following. We assume that the quantum computer samples from the following output distribution:
$$r_{U,F}(x) := F\,|\langle x|U|0\rangle|^2 + (1 - F)/2^n, \tag{112}$$
with $U$ the circuit implemented. In words, we assume global depolarizing noise. Ref. [27] argues that Eq. (112) is a good approximation for the output state of random circuits (see Sec. IV and Section III of [27]); this form has also been verified experimentally on a small number of qubits. In the experiment, $F$ is in the range $10^{-2}$ to $10^{-3}$.
We note that while we assume a global white noise model in this section, we do not assume it in the rest of the paper, neither for validating the cross entropy test nor in the comparison with state-of-the-art classical algorithms (and indeed the algorithm considered in Section X samples from an approximate distribution different from the one in Eq. (112)).
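As an illustration of Eq. (112), the sketch below mixes a toy ideal distribution with the uniform distribution. The four-outcome distribution and the value of $F$ are made-up inputs chosen only to keep the arithmetic exact; they are not data from the experiment.

```python
from fractions import Fraction


def depolarized_distribution(ideal, fidelity):
    """Return r(x) = F * p(x) + (1 - F) / 2**n for every outcome x."""
    dim = len(ideal)  # dim = 2**n
    return [fidelity * p + (1 - fidelity) / dim for p in ideal]


# Toy ideal distribution on n = 2 qubits (hypothetical values).
ideal = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
noisy = depolarized_distribution(ideal, Fraction(1, 100))

if __name__ == "__main__":
    print(noisy, sum(noisy))
```

The mixture stays normalized for any $F$ between 0 and 1, and for small $F$ it is close to uniform.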
B. Definition of computational problem
Before stating our result, let us define precisely the computational problem we consider. We start with the ideal version of the problem with no noise:
Circuit Sampling: The input is a description of an $n$-qubit quantum circuit $U$, described by a sequence of one- and two-qubit gates. The task is to sample from the probability distribution of outcomes $p_U(x) := |\langle x|U|0\rangle|^2$.
Circuit sampling is an example of a sampling problem [102]. A classical algorithm for circuit sampling can be thought of, without loss of generality, as a function $A$ mapping $m \in \mathrm{poly}(n)$ bits $r = (r_1, \ldots, r_m)$ to $n$ bits such that
$$\frac{1}{2^m}\,\bigl|\{(r_1, \ldots, r_m)\ \text{s.t.}\ A(r_1, \ldots, r_m) = x\}\bigr| = \tilde{p}_U(x), \tag{113}$$
with $\tilde{p}_U(x)$ an approximation of $p_U(x)$ to $l \in \mathrm{poly}(n)$ bits of precision. So when $r$ is chosen uniformly at random, the outputs of $A$ are samples from $p_U$ (up to rounding errors which can be made super-exponentially small).
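Eq. (113) treats a randomized sampler as a deterministic function of its random bits, so its output distribution is obtained by counting preimages. A minimal sketch of that counting, with a hypothetical toy sampler standing in for $A$ ($m = 3$ random bits, $n = 1$ output bit):

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def induced_distribution(sampler, m):
    """Exact output distribution of `sampler` when its m input bits are uniform."""
    counts = Counter(sampler(bits) for bits in product((0, 1), repeat=m))
    return {x: Fraction(c, 2**m) for x, c in counts.items()}


def toy_sampler(bits):
    """Hypothetical A: outputs 1 only when the first two random bits are both 1."""
    return (bits[0] & bits[1],)


dist = induced_distribution(toy_sampler, m=3)

if __name__ == "__main__":
    print(dist)
```

Exhaustive enumeration is only feasible for toy values of $m$; it is used here purely to make the definition concrete.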
Assuming the polynomial hierarchy does not collapse, it is known that Circuit Sampling cannot be solved classically efficiently in n, meaning any algorithm A satisfying Eq. (113) must have superpolynomial circuit complexity, for several classes of circuits (such as short depth circuits
[96], IQP [94] and Boson Sampling [90]). We might also be interested in the average case of circuit sampling (for a restricted class of circuits).
Random Circuit Sampling: The input is a set of quantum circuits $\mathcal{U}$ on $n$ qubits. The task is to sample from $p_U(x) := |\langle x|U|0\rangle|^2$ for most circuits $U \in \mathcal{U}$.
Ref. [92] proved that an efficient (in terms of n) classical algorithm for this task for random circuits would also collapse the polynomial hierarchy. As every realistic quantum experiment will be somewhat noisy, it is relevant to consider a variant of this task allowing for small deviations from ideal. One possible formulation is the following:
ε-Approximate Random Circuit Sampling: The input is a set of quantum circuits $\mathcal{U}$ on $n$ qubits. The task is to sample, for most circuits $U \in \mathcal{U}$, from any distribution $q_U$ s.t. $d_{\mathrm{VD}}(q_U, p_U) \le \varepsilon$, where $d_{\mathrm{VD}}(p, q)$ is the variational distance between the distributions $p$, $q$ [103] and $p_U(x) := |\langle x|U|0\rangle|^2$.
Refs. [27, 90, 91] put forward new complexity-theoretic assumptions about the #P-hardness of certain problems and proved they imply that several restricted classes of circuits are hard to approximately sample for ε sufficiently close to zero. However, we cannot use these results here as the ε we achieve is far from zero. We will resort to the following different variant of approximate circuit sampling.
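To see why the achieved ε is far from zero, one can compute the variational distance between an ideal distribution and its globally depolarized version from Eq. (112): it equals $(1 - F)$ times the distance from the ideal distribution to uniform. The sketch below checks this on a made-up four-outcome distribution, assuming the usual convention $d_{\mathrm{VD}}(p, q) = \frac{1}{2}\sum_x |p(x) - q(x)|$.

```python
from fractions import Fraction


def total_variation_distance(p, q):
    """d(p, q) = (1/2) * sum_x |p(x) - q(x)| (assumed convention)."""
    return sum(abs(a - b) for a, b in zip(p, q)) / 2


# Toy ideal distribution on n = 2 qubits and a small fidelity (hypothetical values).
ideal = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
uniform = [Fraction(1, 4)] * 4
F = Fraction(1, 500)
noisy = [F * p + (1 - F) * u for p, u in zip(ideal, uniform)]

distance = total_variation_distance(ideal, noisy)

if __name__ == "__main__":
    print(distance, total_variation_distance(ideal, uniform))
```

With $F$ of order $10^{-2}$ to $10^{-3}$, the distance to the ideal distribution is nearly as large as that of the uniform distribution.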
Unbiased-Noise $F$-Approximate Random Circuit Sampling: The input is a set of quantum circuits $\mathcal{U}$ on $n$ qubits. The task is to sample from the distribution $r_{U,F}$ given by Eq. (112), for most circuits $U \in \mathcal{U}$.
We note that there are alternatives for defining the computational problem for which supremacy is achieved without having to use sampling problems. These have the advantage that it is possible to verify, for each problem instance, that the task was achieved (whereas, while it is in principle possible to verify that one is sampling from the correct distribution by estimating the frequencies of outcomes, this is infeasible in practice for high entropy distributions with $> 2^{50}$ outcomes such as the one we consider here).
One such problem (considered in Refs. [27, 30]) is the following:
b-Heavy Output Generation: Given as input a number $b > 1$ and a random circuit $U$ on $n$ qubits (drawn at random from a set of circuits $\mathcal{U}$), generate output strings $x_1, \ldots, x_k$ s.t.
$$\frac{1}{k}\sum_{j=1}^{k} |\langle x_j|U|0\rangle|^2 \ge \frac{b}{2^n}. \tag{114}$$
Ref. [30] argues for the hardness of this task for every b > 1, although here again one has to resort to
rather bold complexity-theoretic conjectures. Cross entropy benchmarking allows us to estimate $b$ for a reasonable value of $k$ (though the classical time needed to compute $|\langle x_j|U|0\rangle|^2$ still grows very fast), see Sec. IV. In terms of known algorithms, the complexity of solving Heavy Output Generation is equivalent to the complexity of sampling $k$ samples from a noisy distribution corresponding to the same $b$ value.
The experiment we report in this paper can be interpreted as showing quantum supremacy in solving $b$-Heavy Output Generation with $b = 1 + F$ and $F$ the fidelity of the output quantum state.
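The relation $b = 1 + F$ can be checked numerically under two assumptions that are not part of this section's argument: the samples follow the global depolarizing model of Eq. (112), and the ideal probabilities resemble those of a random state (generated here from normalized squared Gaussian amplitudes). The sketch computes the expected value of $2^n\,|\langle x|U|0\rangle|^2$ exactly over the noisy distribution rather than by drawing samples.

```python
import random


def random_state_probabilities(n, seed=0):
    """Probabilities of a random n-qubit state built from Gaussian amplitudes (assumed model)."""
    rng = random.Random(seed)
    weights = [rng.gauss(0, 1) ** 2 + rng.gauss(0, 1) ** 2 for _ in range(2**n)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_heavy_output_score(ideal, fidelity):
    """E[2**n * p(x)] for x drawn from r = F * p + (1 - F) / 2**n, i.e. the b of Eq. (114)."""
    dim = len(ideal)
    noisy = [fidelity * p + (1 - fidelity) / dim for p in ideal]
    return dim * sum(r * p for r, p in zip(noisy, ideal))


p = random_state_probabilities(n=12)

if __name__ == "__main__":
    for F in (0.0, 0.01, 0.5, 1.0):
        print(F, expected_heavy_output_score(p, F))
```

For a toy register of 12 qubits the score only approximates $1 + F$, because the sum of squared probabilities fluctuates around $2/2^n$ at finite size.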
C. Computational hardness of unbiased-noise sampling
To state our result, we use the complexity class Arthur-Merlin, which is a variant of the class NP and is denoted by AM[$T$]. It is defined as the class of problems for which there is an Arthur-Merlin one-round protocol of the following form: given an instance of a problem in AM[$T$] (which Arthur would like to classify as a YES or NO instance), Arthur first sends random bits to Merlin. Merlin (who is computationally unbounded) then sends back a proof to Arthur. Finally Arthur uses the proof and decides in time $T$ whether he accepts. In the YES case, Arthur accepts with probability larger than 2/3. In the NO case, he accepts with probability no larger than 1/3.
Theorem 1 Assume there is a classical algorithm running in time $T$ and using $m$ bits of randomness that samples from the distribution $r_{U,F}(x)$ given by Eq. (112), for a given quantum circuit $U$ on $n$ qubits and $F \ge 0$. Then for every integer $L$, there is an AM[$LT + 2Lm$] protocol for deciding, given $\lambda > 0$, whether
$$|\langle 0|U|0\rangle|^2 \ge \lambda\left(1 + \frac{2}{L}\right) + \frac{2(1 - F)}{F L 2^n} \tag{115}$$
or
$$|\langle 0|U|0\rangle|^2 \le \lambda\left(1 - \frac{2}{L}\right) - \frac{2(1 - F)}{F L 2^n}. \tag{116}$$
Before giving the proof, let us discuss the significance of the result. We are interested in the theorem mostly when $L = c/F$ with $c$ a small constant (say 10). Noting that for a random circuit, with high probability, $|\langle 0|U|0\rangle|^2 \ge 2^{-n}/5$ [97], the theorem states that if we can sample classically in time $T$ from the distribution given in Eq. (112), then we can calculate a good estimate for $|\langle 0|U|0\rangle|^2$ in time $10T/F$ (with the help of an all-powerful but untrustworthy Merlin). It is unlikely that Merlin can be of any help for this task for random circuits, as estimating $|\langle 0|U|0\rangle|^2$ for random circuits is a #P-hard problem [92], and it is believed #P is vastly more complex than AM (which is contained in the third level of the polynomial hierarchy [98]). Therefore we conclude that global white noise leads to a decrease in classical simulation time that is no more than linear in the fidelity (which is in fact optimal as it is achieved by the method presented in Ref. [38]).
Ref. [104] proposed a similar, but more demanding, conjecture about the non-existence of certain AM protocols for estimating transition probabilities of random circuits. This conjecture was applied to show that the output bits of our supremacy experiment can be used to produce certifiable random bits.
We note Theorem 1 does not establish a lower bound on the classical computation cost of calculating a transition amplitude with additive error $\delta/2^n$, for small constant $\delta > 0$. What it does show is that the sampling problem with unbiased noise is as hard as this task, up to a reduction in complexity that is linear in $F$.
Concerning the hardness of computing $|\langle 0|U|0\rangle|^2$, it is known that this problem is #P-hard for random circuits to additive error $2^{-\mathrm{poly}(n)}$ [92]. This implies that there are no subexponential-time algorithms for this task (unless #P collapses to P). For finite size bounds, which are more relevant to our experiment, the result of Ref. [99] is the most relevant. It shows that under the Strong Exponential Time Hypothesis (SETH) [105], there are quantum circuits on $n$ qubits which require $2^{(1 - o(1))n}$ time for estimating $|\langle 0|U|0\rangle|^2$ to additive error $2^{-(n+1)}$ [106]. Together with Theorem 1, we find there is a quantum circuit $U$ on $n$ qubits for which the distribution $r_{U,F}$ (given by Eq. (112)) cannot be sampled in time $F\,2^{(1 - o(1))n}$, unless SETH is false.
It is an open question to show a similar lower bound to the one proved in Ref. [99] for estimating the transition probability of random circuits. Even more relevant for this work, it would be interesting to study if one can show a lower bound of the form $2^{(1 - o(1))\,\mathrm{treewidth}}$ for a random quantum circuit, under a suitable complexity-theoretic assumption, as the depth of the construction in [99] is relatively high.
D. Proof of Theorem 1
The proof will follow along similar lines to previous work [90, 91, 95]. We will use approximate counting (which can be done in AM) to show that a sampling algorithm for $r_{U,F}$ running in time $T$ implies an AM protocol to compute $r_{U,F}(0)(1 \pm 1/L)$, with classical verification of order $LT$. Since the noise is unbiased, i.e. $r_{U,F}(0) = F\,|\langle 0|U|0\rangle|^2 + (1 - F)/2^n$, we can subtract it and find an AM protocol for estimating $|\langle 0|U|0\rangle|^2$ as stated in the theorem.
In more detail, suppose there is a classical algorithm for sampling from $r_{U,F}$ given by a function $A$ mapping $m \in \mathrm{poly}(n)$ bits $r = (r_1, \ldots, r_m)$ to $n$ bits such that
$$\frac{1}{2^m}\,\bigl|\{(r_1, \ldots, r_m)\ \text{s.t.}\ A(r_1, \ldots, r_m) = x\}\bigr| = r_{U,F}(x). \tag{117}$$
Let $a(r_1, \ldots, r_m)$ be a function which is 1 if $A(r_1, \ldots, r_m) = 0^n$ and zero otherwise.
We start with the following lemma, showing the existence of $A$ implies an AM[$LT + 2Lm$] protocol for estimating $r_{U,F}(0)$:
Lemma 1 Assume there is an algorithm $A$ given by Eq. (117). Then for every $\theta$ and $L$ there is an AM[$LT + 2Lm$] protocol which determines if (i) $r_{U,F}(0) \ge \theta(1 + 2/L)$ (YES instance) or (ii) $r_{U,F}(0) \le \theta(1 - 2/L)$ (NO instance).
Proof: The protocol is the following:
1. For every $t \in [Lm]$, Arthur chooses a function $h_t \in H_{Lm,t}$ at random from a family $H_{Lm,t}$ of 2-universal linear hash functions from $\{0,1\}^{Lm}$ to $\{0,1\}^{t}$ [98]. Then he communicates his choice of $(h_1, \ldots, h_{Lm})$ to Merlin.
2. Merlin sends an $Lm$-bit string $w$ and an integer $s \in [Lm]$ to Arthur.
3. Arthur verifies that $h_s(w) = 0$ and
$$a(w_{1,1}, \ldots, w_{1,m}) \wedge \ldots \wedge a(w_{L,1}, \ldots, w_{L,m}) = 1.$$
He rejects if either condition is not satisfied. Then he checks whether $\theta \le 2^{-m}\,20^{1/L}\,2^{s/L}\,(1 + 2/L)^{-1}$, accepting if this is the case and rejecting otherwise.
The cost to compute $a(w_{1,1}, \ldots, w_{1,m})$ is $T$, and the cost to compute $h_s(w)$ is less than $2Lm$, so the total verification time of the AM protocol is $LT + 2Lm$.
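Arthur's side of the protocol above can be sketched in a few lines. This is an illustration under stated assumptions, not the construction of Ref. [98]: the hash family is taken to be random affine maps $h(x) = Ax + b$ over GF(2), bit strings are packed into Python integers, the indicator $a$ is passed in as a callable that returns 1 when the sampler outputs $0^n$, and the threshold test is rewritten as $(\theta(1 + 2/L))^L\,2^{Lm} \le 20 \cdot 2^s$ to avoid fractional powers. All names are hypothetical.

```python
import random


def random_affine_hash(q, t, rng):
    """Draw h(x) = A x + b over GF(2): A is t rows given as q-bit masks, b is t bits."""
    rows = [rng.getrandbits(q) for _ in range(t)]
    offset = [rng.getrandbits(1) for _ in range(t)]
    return rows, offset


def apply_hash(h, x):
    """Evaluate h on the q-bit integer x, returning a tuple of t output bits."""
    rows, offset = h
    return tuple((bin(row & x).count("1") + b) % 2 for row, b in zip(rows, offset))


def arthur_accepts(theta, m, L, s, w_blocks, h_s, a):
    """Arthur's checks from step 3; w_blocks holds L integers of m bits each."""
    w = 0
    for block in w_blocks:          # concatenate the L blocks into one Lm-bit string
        w = (w << m) | block
    if any(apply_hash(h_s, w)):     # require h_s(w) = 0
        return False
    if not all(a(block) == 1 for block in w_blocks):  # every copy must map to 0^n
        return False
    # Threshold test, equivalent to theta <= 2^-m 20^(1/L) 2^(s/L) (1 + 2/L)^-1.
    return (theta * (1 + 2 / L)) ** L * 2 ** (L * m) <= 20 * 2**s
```

A caller would draw one hash per output length with `random_affine_hash` and hand Arthur the one indexed by Merlin's $s$; evaluating it touches each of the $Lm$ input bits once per output bit.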
Let us analyze the completeness and soundness of the protocol.
Completeness: Suppose we have a YES instance, $r_{U,F}(0) \ge \theta(1 + 2/L)$. Let us show that Merlin can send $w$ and $s$ which make Arthur accept with high probability.
Let $M$ be the number of solutions of $a(r_1, \ldots, r_m) = 1$ (i.e. $M = 2^m r_{U,F}(0)$). Then $a(r_{1,1}, \ldots, r_{1,m}) \wedge \ldots \wedge a(r_{L,1}, \ldots, r_{L,m})$ has $M^L$ solutions, $M$ for each copy of the function $a$. As part of the proof Merlin sends $s$ satisfying $20 \ge M^L/2^s \ge 10$ (such a value always exists as $s$ can be an arbitrary integer less than or equal to $Lm$).
Let us apply Lemma 2 (stated below) with $q = Lm$, $t = s$, $\delta = 1/2$, and $S$ the set of solutions, so $|S| = M^L$. Then indeed $|S|/2^s \ge 10 > 1/\delta^3$. Therefore, with high probability, the number of solutions of
$$a(x_{1,1}, \ldots, x_{1,m}) \wedge \ldots \wedge a(x_{L,1}, \ldots, x_{L,m}) \wedge \bigl(h_s(x) = 0\bigr) \tag{118}$$
is in the interval $[(1/2)M^L/2^s,\ 2M^L/2^s]$. Since $(1/2)M^L/2^s \ge 1$, there is a string $w$ s.t. $a(w_{1,1}, \ldots, w_{1,m}) \wedge \ldots \wedge a(w_{L,1}, \ldots, w_{L,m}) = 1$ and $h_s(w) = 0$, which Merlin also sends to Arthur as part of the proof.
Since $M = 2^m r_{U,F}(0) \ge 2^m\theta(1 + 2/L)$ and $M^L/2^s \le 20$,
$$20 \ge \frac{M^L}{2^s} \ge \frac{2^{Lm}}{2^s}\,\theta^L\left(1 + \frac{2}{L}\right)^{L}, \tag{119}$$
so indeed $\theta \le 2^{-m}\,20^{1/L}\,2^{s/L}\,(1 + 2/L)^{-1}$ and Arthur will accept with high probability.
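Merlin's choice of $s$ in the completeness argument is a search over the $Lm$ admissible values. A minimal sketch in exact integer arithmetic, with a hypothetical function name and toy parameters:

```python
def merlin_choose_s(M, L, m):
    """Return an s in [1, L*m] with 10 <= M**L / 2**s <= 20, or None if there is none."""
    target = M**L
    for s in range(1, L * m + 1):
        if 10 * 2**s <= target <= 20 * 2**s:
            return s
    return None


if __name__ == "__main__":
    # Toy case: m = 2 random bits, L = 5 copies, M = 3 accepting strings per copy.
    print(merlin_choose_s(3, 5, 2))
```

In this sketch the search succeeds whenever $M^L \ge 20$, since the intervals $[10 \cdot 2^s, 20 \cdot 2^s]$ for consecutive $s$ share endpoints; very small solution sets fall outside the range and return `None`.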
Soundness: Suppose we have a NO instance, $r_{U,F}(0) \le \theta(1 - 2/L)$. Let us show that no matter which witnesses $w$, $s$ Merlin sends, Arthur will only accept with a small probability. Merlin must send $s$ such that
$$\theta^L \le (20)\,2^{-Lm}\,2^{s}\,(1 + 2/L)^{-L}, \tag{120}$$
otherwise Arthur rejects. By Lemma 2 (stated below), the number of solutions of
$$a(x_{1,1}, \ldots, x_{1,m}) \wedge \ldots \wedge a(x_{L,1}, \ldots, x_{L,m}) \wedge \bigl(h_s(x) = 0\bigr) \tag{121}$$
will be in the interval $[(1/2)M^L/2^s,\ 2M^L/2^s]$, with $M = 2^m r_{U,F}(0) \le 2^m\theta(1 - 2/L)$. Since
$$\frac{2M^L}{2^s} \le 2\,(2^{-s})\,2^{Lm}\,\theta^L\,(1 - 2/L)^L \le 40\,(1 - 2/L)^L (1 + 2/L)^{-L} \le 40e^{-4} < 1, \tag{122}$$
there is no solution to Eq. (121) and thus there is no $w$ which will make Arthur accept. This finishes the proof of Lemma 1.
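The last two inequalities of Eq. (122) can be spot-checked numerically. The sketch below evaluates $40\,(1 - 2/L)^L (1 + 2/L)^{-L}$ for a range of $L$ and compares it with $40e^{-4}$; the range of $L$ is an arbitrary choice for illustration, and a finite scan is of course not a proof.

```python
import math


def soundness_bound(L):
    """40 * (1 - 2/L)**L * (1 + 2/L)**(-L), the middle expression of Eq. (122)."""
    return 40 * ((1 - 2 / L) / (1 + 2 / L)) ** L


worst = max(soundness_bound(L) for L in range(3, 1001))

if __name__ == "__main__":
    print(worst, 40 * math.exp(-4))
```

Analytically, $\ln\frac{1 - x}{1 + x} \le -2x$ for $0 \le x < 1$, so with $x = 2/L$ the $L$-th power is at most $e^{-4}$.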
Reduction to AM protocol for $|\langle 0|U|0\rangle|^2$: Finally let us show how to use Lemma 1 to build the AM protocol stated in Theorem 1. Since $r_{U,F}(0) = F\,|\langle 0|U|0\rangle|^2 + (1 - F)/2^n$, on one hand:
$$|\langle 0|U|0\rangle|^2 \ge \lambda\left(1 + \frac{2}{L}\right) + \frac{2(1 - F)}{F L 2^n} \tag{123}$$
implies that
$$r_{U,F}(0) \ge \bigl(F\lambda + (1 - F)/2^n\bigr)\left(1 + \frac{2}{L}\right). \tag{124}$$
On the other hand:
$$|\langle 0|U|0\rangle|^2 \le \lambda\left(1 - \frac{2}{L}\right) - \frac{2(1 - F)}{F L 2^n} \tag{125}$$
implies that
$$r_{U,F}(0) \le \bigl(F\lambda + (1 - F)/2^n\bigr)\left(1 - \frac{2}{L}\right). \tag{126}$$
Setting $\theta = F\lambda + (1 - F)/2^n$ we see that the AM protocol from before can also be used to decide if Eq. (123) or Eq. (125) holds true. This ends the proof of the theorem.
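The step from Eqs. (123) and (125) to Eqs. (124) and (126) is a one-line substitution, which can be confirmed in exact rational arithmetic. In the sketch below the values of $\lambda$, $F$, $L$ and $n$ are arbitrary illustrative choices and the function names are hypothetical.

```python
from fractions import Fraction


def yes_threshold(lam, F, L, n):
    """Right-hand side of Eq. (123)."""
    return lam * (1 + Fraction(2, L)) + 2 * (1 - F) / (F * L * 2**n)


def no_threshold(lam, F, L, n):
    """Right-hand side of Eq. (125)."""
    return lam * (1 - Fraction(2, L)) - 2 * (1 - F) / (F * L * 2**n)


def noisy_zero_probability(p0, F, n):
    """r(0) = F * p0 + (1 - F) / 2**n, from Eq. (112)."""
    return F * p0 + (1 - F) / 2**n


# Illustrative values only; L = c / F with c = 10.
lam, F, L, n = Fraction(1, 64), Fraction(1, 500), 5000, 5
theta = F * lam + (1 - F) / 2**n

if __name__ == "__main__":
    print(noisy_zero_probability(yes_threshold(lam, F, L, n), F, n) == theta * (1 + Fraction(2, L)))
```

Feeding the boundary value of Eq. (123) or (125) into Eq. (112) lands exactly on $\theta(1 \pm 2/L)$, which is why Lemma 1 applies with this $\theta$.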
Lemma 2 [98] For $t \le q$, let $H_{q,t}$ be a family of pairwise-independent linear hash functions mapping $\{0,1\}^q$ to $\{0,1\}^t$, and let $\delta > 0$. Let $S \subseteq \{0,1\}^q$ be arbitrary with $|S| \ge \delta^{-3}\,2^t$. Then with probability larger than 9/10 over the choice of $h \in H_{q,t}$,
$$(1 - \delta)\,\frac{|S|}{2^t} \le \bigl|\{x \in S \mid h(x) = 0^t\}\bigr| \le (1 + \delta)\,\frac{|S|}{2^t}. \tag{127}$$
Moreover $h(x)$ can be evaluated in time $2q$, for every $h \in H_{q,t}$.
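The centre of the interval in Eq. (127) is the average number of elements of $S$ that a hash function sends to $0^t$. The sketch below verifies this average exhaustively for a toy family, assuming the family consists of all affine maps $h(x) = Ax + b$ over GF(2) from 4 bits to 2 bits; the set $S$ is an arbitrary example, and the sizes are far too small for the concentration statement of the lemma to be meaningful.

```python
from itertools import product

Q, T = 4, 2  # toy sizes: hash {0,1}^4 -> {0,1}^2


def hash_is_zero(rows, offset, x):
    """True when the affine map h(x) = A x + b over GF(2) sends x to 0^t."""
    return all((bin(r & x).count("1") + b) % 2 == 0 for r, b in zip(rows, offset))


def count_distribution(S):
    """For every h in the family, count the x in S with h(x) = 0^t."""
    counts = []
    for rows in product(range(2**Q), repeat=T):
        for offset in product((0, 1), repeat=T):
            counts.append(sum(hash_is_zero(rows, offset, x) for x in S))
    return counts


S = [1, 2, 3, 5, 8, 11, 13, 14]  # arbitrary subset of {0,1}^4
counts = count_distribution(S)
mean_count = sum(counts) / len(counts)

if __name__ == "__main__":
    print(len(counts), mean_count, len(S) / 2**T)
```

The average matches $|S|/2^t$ because, for a fixed $x$, exactly one offset $b$ per matrix $A$ gives $h(x) = 0^t$.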
ACKNOWLEDGMENTS
We acknowledge Georg Goerg for consultation on statistical analyses. This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725.
Correspondence and requests for materials should be addressed to John M. Martinis (jmartinis@google.com).
† Frank Arute^1, Kunal Arya^1, Ryan Babbush^1, Dave Bacon^1, Joseph C. Bardin^{1,2}, Rami Barends^1, Rupak Biswas^3, Sergio Boixo^1, Fernando G.S.L. Brandao^{1,4}, David A. Buell^1, Brian Burkett^1, Yu Chen^1, Zijun Chen^1, Ben Chiaro^5, Roberto Collins^1, William Courtney^1, Andrew Dunsworth^1, Edward Farhi^1, Brooks Foxen^{1,5}, Austin Fowler^1, Craig Gidney^1, Marissa Giustina^1, Rob Graff^1, Keith Guerin^1, Steve Habegger^1, Matthew P. Harrigan^1, Michael J. Hartmann^{1,6}, Alan Ho^1, Markus Hoffmann^1, Trent Huang^1, Travis S. Humble^7, Sergei V. Isakov^1, Evan Jeffrey^1, Zhang Jiang^1, Dvir Kafri^1, Kostyantyn Kechedzhi^1, Julian Kelly^1, Paul V. Klimov^1, Sergey Knysh^1, Alexander Korotkov^{1,8}, Fedor Kostritsa^1, David Landhuis^1, Mike Lindmark^1, Erik Lucero^1, Dmitry Lyakh^9, Salvatore Mandrà^{3,10}, Jarrod R. McClean^1, Matthew McEwen^5, Anthony Megrant^1, Xiao Mi^1, Kristel Michielsen^{11,12}, Masoud Mohseni^1, Josh Mutus^1, Ofer Naaman^1, Matthew Neeley^1, Charles Neill^1, Murphy Yuezhen Niu^1, Eric Ostby^1, Andre Petukhov^1, John C. Platt^1, Chris Quintana^1, Eleanor G. Rieffel^3, Pedram Roushan^1, Nicholas C. Rubin^1, Daniel Sank^1, Kevin J. Satzinger^1, Vadim Smelyanskiy^1, Kevin J. Sung^{1,13}, Matthew D. Trevithick^1, Amit Vainsencher^1, Benjamin Villalonga^{1,14}, Theodore White^1, Z. Jamie Yao^1, Ping Yeh^1, Adam Zalcman^1, Hartmut Neven^1, John M. Martinis^{1,5}
1. Google AI Quantum, Mountain View, CA, USA, 2. Department of Electrical and Computer Engineering, University of Massachusetts Amherst, Amherst, MA, USA, 3. Quantum Artificial Intelligence Lab. (QuAIL), NASA Ames Research Center, Moffett Field, USA, 4. Institute for Quantum Information and Matter, Caltech, Pasadena, CA, USA, 5. Department of Physics, University of California, Santa Barbara, CA, USA, 6. Friedrich-Alexander University Erlangen-Nürnberg (FAU), Department of Physics, Erlangen, Germany, 7. Quantum Computing Institute, Oak Ridge National Laboratory, Oak Ridge, TN, USA, 8. Department of Electrical and Computer Engineering, University of California, Riverside, CA, USA, 9. Scientific Computing, Oak Ridge Leadership Computing, Oak Ridge National Laboratory, Oak Ridge, TN, USA, 10. Stinger Ghaffarian Technologies Inc., Greenbelt, MD, USA, 11. Institute for Advanced Simulation, Jülich Supercomputing Centre, Forschungszentrum Jülich, Jülich, Germany, 12. RWTH Aachen University, Aachen, Germany, 13. Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA, 14. Department of Physics, University of Illinois at Urbana-Champaign, Urbana, IL, USA
ERRATUM
The caption of Figure 4 in the main paper [1] incorrectly states that the error bars in the figure represent both statistical and systematic uncertainty. They represent the statistical uncertainty. See Figure S41 for comparison of both types of uncertainty and discussion in Section VIII for details. Note that both types of uncertainty were accounted for in the analysis and all conclusions remain intact.
[1] Arute, F. et al. Quantum supremacy using a programmable superconducting processor. Nature 574, 505 (2019). URL https://doi.org/10.1038/s41586-019-1666-5
[2] Barends, R. et al. Superconducting quantum circuits at the surface code threshold for fault tolerance. Nature 508, 500 (2014).
[3] Neill, C. A path towards quantum supremacy with superconducting qubits. Ph.D. thesis, University of California, Santa Barbara (2017).
[4] Yan, F. et al. Tunable coupling scheme for implementing high-fidelity two-qubit gates. Phys. Rev. Applied 10, 054062 (2018).
[5] Chen, Y. et al. Qubit architecture with high coherence and fast tunable coupling. Phys. Rev. Lett. 113, 220502 (2014).
[6] Neill, C. et al. A blueprint for demonstrating quantum supremacy with superconducting qubits. Science 360, 195–199 (2018).
[7] Khezri, M., Dressel, J. & Korotkov, A. N. Qubit measurement error from coupling with a detuned neighbor in circuit QED. Phys. Rev. A 92, 052306 (2015).
[8] Tucci, R. R. An introduction to Cartan's KAK decomposition for QC programmers. Preprint at https://arxiv.org/abs/quant-ph/0507171 (2005).
[9] Dunsworth, A. High fidelity entangling gates in superconducting qubits. Ph.D. thesis, University of California, Santa Barbara (2018).
[10] Dunsworth, A. et al. A method for building low loss multi-layer wiring for superconducting microwave devices. Appl. Phys. Lett. 112, 063502 (2018).
[11] Rosenberg, D. et al. 3D integrated superconducting qubits. npj Quantum Inf. 3, 42 (2017).
[12] Foxen, B. et al. Qubit compatible superconducting interconnects. Quantum Sci. Tech. 3, 014005 (2017).
[13] Foxen, B. et al. High speed flux sampling for tunable superconducting qubits with an embedded cryogenic transducer. Supercond. Sci. Technol. 32, 015012 (2018).
[14] Blais, A., Huang, R.-S., Wallraff, A., Girvin, S. M. & Schoelkopf, R. J. Cavity quantum electrodynamics for superconducting electrical circuits: An architecture for quantum computation. Phys. Rev. A 69, 062320 (2004).
[15] Gambetta, J. et al. Qubit-photon interactions in a cavity: Measurement-induced dephasing and number splitting. Phys. Rev. A 74, 042318 (2006).
[16] Bultink, C. C. et al. General method for extracting the quantum efficiency of dispersive qubit readout in circuit QED. Appl. Phys. Lett. 112, 092601 (2018).
[17] Sank, D. et al. Measurement-induced state transitions in a superconducting qubit: Beyond the rotating wave approximation. Phys. Rev. Lett. 117, 190503 (2016).
[18] Clerk, A. A., Devoret, M. H., Girvin, S. M., Marquardt, F. & Schoelkopf, R. J. Introduction to quantum noise, measurement, and amplification. Rev. Mod. Phys. 82, 1155–1208 (2010).
[19] Caves, C. M. Quantum limits on noise in linear amplifiers. Phys. Rev. D 26, 1817–1839 (1982).
[20] Mutus, J. Y. et al. Strong environmental coupling in a Josephson parametric amplifier. Appl. Phys. Lett. 104, 263513 (2014).
[21] Ryan, C. A. et al. Tomography via correlation of noisy measurement records. Phys. Rev. A 91, 022118 (2015).
[22] Jeffrey, E. et al. Fast accurate state measurement with superconducting qubits. Phys. Rev. Lett. 112, 190504 (2014).
[23] Sank, D. What is the connection between analog signal to noise ratio and signal to noise ratio in the IQ plane in a quadrature demodulation system? Signal Processing Stack Exchange. URL https://dsp.stackexchange.com/questions/24372 (2015).
[24] Reed, M. D. et al. Fast reset and suppressing spontaneous emission of a superconducting qubit. Appl. Phys. Lett. 96, 203110 (2010).
[25] Sete, E. A., Martinis, J. M. & Korotkov, A. N. Quantum theory of a bandpass purcell filter for qubit readout. Phys. Rev. A 92, 012325 (2015).
[26] Chen, Y. et al. Multiplexed dispersive readout of superconducting phase qubits. Appl. Phys. Lett. 101, 182601 (2012).
[27] Boixo, S. et al. Characterizing quantum supremacy in near-term devices. Nat. Phys. 14, 595 (2018).
[28] Wootters, W. K. Random quantum states. Found. Phys. 20, 1365–1378 (1990).
[29] Emerson, J., Livine, E. & Lloyd, S. Convergence conditions for random quantum circuits. Phys. Rev. A 72, 060302 (2005).
[30] Aaronson, S. & Chen, L. Complexity-theoretic foundations of quantum supremacy experiments. In 32nd Computational Complexity Conference (CCC 2017) (2017).
[31] Magesan, E., Gambetta, J. M. & Emerson, J. Characterizing quantum gates via randomized benchmarking. Phys. Rev. A 85 (2012).
[32] Magesan, E., Gambetta, J. M. & Emerson, J. Robust randomized benchmarking of quantum processes. Phys. Rev. Lett. 106 (2011).
[33] Popescu, S., Short, A. J. & Winter, A. Entanglement and the foundations of statistical mechanics. Nat. Phys. 2, 754 (2006).
[34] Bremner, M. J., Mora, C. & Winter, A. Are random pure states useful for quantum computation? Phys. Rev. Lett. 102, 190502 (2009).
[35] Gross, D., Flammia, S. T. & Eisert, J. Most quantum states are too entangled to be useful as computational resources. Phys. Rev. Lett. 102, 190501 (2009).
[36] McClean, J. R., Boixo, S., Smelyanskiy, V. N., Babbush, R. & Neven, H. Barren plateaus in quantum neural network training landscapes. Nat. Comm. 9, 4812 (2018).
[37] Ledoux, M. The concentration of measure phenomenon. 89 (American Mathematical Society, 2005).
[38] Markov, I. L., Fatima, A., Isakov, S. V. & Boixo, S. Quantum supremacy is both closer and farther than it appears. Preprint at https://arxiv.org/pdf/1807.10749 (2018).
[39] Cross, A. W., Bishop, L. S., Sheldon, S., Nation, P. D. & Gambetta, J. M. Validating quantum computers using randomized model circuits. Phys. Rev. A 100, 032328 (2019).
[40] Kelly, J., O'Malley, P., Neeley, M., Neven, H. & Martinis, J. M. Physical qubit calibration on a directed acyclic graph. Preprint at https://arXiv.org/abs/1803.03226 (2019).
[41] Wallraff, A. et al. Strong coupling of a single photon to a superconducting qubit using circuit quantum electrodynamics. Nature 431, 162 (2004).
[42] Chen, Z. Metrology of quantum control and measurement in superconducting qubits. Ph.D. thesis, University of California, Santa Barbara (2018).
[43] Chen, Z. et al. Measuring and suppressing quantum state leakage in a superconducting qubit. Phys. Rev. Lett. 116, 020501 (2016).
[44] Klimov, P. V. et al. Fluctuations of energy-relaxation times in superconducting qubits. Phys. Rev. Lett. 121, 090502 (2018).
[45] Kelly, J. et al. Optimal quantum control using randomized benchmarking. Phys. Rev. Lett. 112, 240504 (2014).
[46] Bialczak, R. C. et al. Quantum process tomography of a universal entangling gate implemented with Josephson phase qubits. Nat. Phys. 6, 409 (2010).
[47] Martinis, J. M. & Geller, M. R. Fast adiabatic qubit gates using only σz control. Phys. Rev. A 90, 022307 (2014).
[48] DiCarlo, L. et al. Demonstration of two-qubit algorithms with a superconducting quantum processor. Nature 460, 240 (2009).
[49] Kivlichan, I. D. et al. Quantum simulation of electronic structure with linear depth and connectivity. Phys. Rev. Lett. 120, 110501 (2018).
[50] Villalonga, B. et al. A flexible high-performance simulator for the verification and benchmarking of quantum circuits implemented on real hardware. npj Quantum Information 5, 1 (2019).
[51] Wallman, J., Granade, C., Harper, R. & Flammia, S. T. Estimating the coherence of noise. New J. Phys. 17, 113020 (2015).
[52] Erhard, A. et al. Characterizing large-scale quantum computers via cycle benchmarking. Preprint at https://arxiv.org/pdf/1902.08543 (2019).
[53] Johnson, J. E. et al. Heralded state preparation in a superconducting qubit. Phys. Rev. Lett. 109, 050506 (2012).
[54] Sank, D. T. Fast, accurate state measurement in superconducting qubits. Ph.D. thesis, University of California, Santa Barbara (2014).
[55] Experimental data repository URL https://doi.org/10.5061/dryad.k6t1rj8
[56] Babbush, R. et al. Low-depth quantum simulation of materials. Phys. Rev. X 8, 011044 (2018).
[57] Boykin, P., Mor, T., Pulver, M., Roychowdhury, V. & Vatan, F. On universal and fault-tolerant quantum computing. Proc. 40th Annual Symposium on Foundations of Computer Science, IEEE Computer Society Press (1999).
[58] DiVincenzo, D. P. Two-bit gates are universal for quantum computation. Phys. Rev. A 51, 1015–1022 (1995).
[59] Shor, P. W. Scheme for reducing decoherence in quantum computer memory. Phys. Rev. A 52, R2493(R) (1995).
[60] Knill, E. et al. Randomized benchmarking of quantum gates. Phys. Rev. A 77, 012307 (2008).
[61] Lehmann, E. L. & Romano, J. P. Testing Statistical Hypotheses (Springer-Verlag New York, 2005).
[62] Jones, E., Oliphant, T., Peterson, P. et al. SciPy: Open source scientific tools for Python (2001). URL http://www.scipy.org/ (2016).
[63] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2017). URL https://www.R-project.org/
[64] Efron, B. Bootstrap methods: Another look at the jackknife. Ann. Stat. 7, 1 (1979).
[65] Dyakonov, M. The case against quantum computing. IEEE Spectrum 56, 3, 24-29 (2019).
[66] Kalai, G. The argument against quantum computers. Preprint at https://arxiv.org/pdf/1908.02499 (2019).
[67] Smelyanskiy, M., Sawaya, N. P. & Aspuru-Guzik, A. qHiPSTER: the quantum high performance software testing environment. Preprint at https://arxiv.org/pdf/1601.07195 (2016).
[68] Villalonga, B. et al. Establishing the quantum supremacy frontier with a 281 Pflop/s simulation. Preprint at https://arxiv.org/pdf/1905.00444 (2019).
[69] Markov, I. L. & Shi, Y. Simulating quantum computation by contracting tensor networks. SIAM J. Comput. 38, 963–981 (2008).
[70] Boixo, S., Isakov, S. V., Smelyanskiy, V. N. & Neven, H. Simulation of low-depth quantum circuits as complex undirected graphical models. Preprint at https://arxiv.org/pdf/1712.05384 (2017).
[71] Lyakh, D. Tensor algebra library routines for shared memory systems. URL https://github.com/DmitryLyakh/TAL_SH (2019).
[72] Chen, J. et al. Classical simulation of intermediate-size quantum circuits. Preprint at https://arxiv.org/pdf/1805.01450 (2018).
[73] Guo, C. et al. General-purpose quantum circuit simulator with projected entangled-pair states and the quantum supremacy frontier. Preprint at https://arxiv.org/pdf/1905.08394 (2019).
[74] De Raedt, K. et al. Massively parallel quantum computer simulator. Comput. Phys. Commun. 176, 121–136 (2007).
[75] De Raedt, H. et al. Massively parallel quantum computer simulator, eleven years later. Comput. Phys. Commun. 237, 47–61 (2019).
[76] Krause, D. & Thörnig, P. Jureca: Modular supercomputer at Jülich supercomputing centre. Journal of large-scale research facilities JLSRF 4, 132 (2018).
[77] Krause, D. Juwels: Modular tier-0/1 supercomputer at the Jülich supercomputing centre. Journal of large-scale research facilities JLSRF 5, 135 (2019).
[78] Kalai, G. & Kindler, G. Gaussian noise sensitivity and bosonsampling. arXiv:1409.3093 (2014).
[79] Bremner, M. J., Montanaro, A. & Shepherd, D. J. Achieving quantum supremacy with sparse and noisy commuting quantum computations. Quantum 1, 8 (2017).
[80] Yung, M.-H. & Gao, X. Can chaotic quantum circuits maintain quantum supremacy under noise? arXiv:1706.08913 (2017).
[81] Boixo, S., Smelyanskiy, V. N. & Neven, H. Fourier analysis of sampling from noisy chaotic quantum circuits. arXiv:1708.01875 (2017).
[82] Nahum, A., Vijay, S. & Haah, J. Operator spreading in random unitary circuits. Phys. Rev. X 8, 021014 (2018).
[83] Von Keyserlingk, C., Rakovszky, T., Pollmann, F. & Sondhi, S. L. Operator hydrodynamics, OTOCs, and entanglement growth in systems without conservation laws. Phys. Rev. X 8, 021013 (2018).
[84] Gogate, V. & Dechter, R. A complete anytime algorithm for treewidth. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, 201–208 (2004).
[85] Zhang, F. et al. Alibaba cloud quantum development kit: Large-scale classical simulation of quantum circuits. Preprint at https://arxiv.org/pdf/1907.11217 (2019).
[86] Chen, M.-C. et al. Quantum teleportation-inspired algorithm for sampling large random quantum circuits. Preprint at https://arxiv.org/pdf/1901.05003 (2019).
[87] We assume 2 vCPUs per core.
[88] Koomey, J. & Naffziger, S. Energy efficiency of computing: What's next? Electronic Design. URL https://www.electronicdesign.com/microprocessors/energy-efficiency-computing-what-s-next (2016).
[89] Preskill, J. Quantum computing and the entanglement frontier. Rapporteur talk at the 25th Solvay Conference on Physics, Brussels (2012).
[90] Aaronson, S. & Arkhipov, A. The computational complexity of linear optics. In Proceedings of the Forty-third Annual ACM Symposium on Theory of Computing, 333–342 (2011).
[91] Bremner, M. J., Montanaro, A. & Shepherd, D. J. Average-case complexity versus approximate simulation of commuting quantum computations. Phys. Rev. Lett. 117, 080501 (2016).
[92] Bouland, A., Fefferman, B., Nirkhe, C. & Vazirani, U. On the complexity and verification of quantum random circuit sampling. Nat. Phys. 15, 159 (2019).
[93] Movassagh, R. Cayley path and quantum computational supremacy: A proof of average-case #P-hardness of random circuit sampling with quantified robustness. Preprint at https://arxiv.org/pdf/1909.06210 (2019).
[94] Bremner, M. J., Jozsa, R. & Shepherd, D. J. Classical simulation of commuting quantum computations implies collapse of the polynomial hierarchy. Proc. Royal Soc. A 467, 459–472 (2010).
[95] Harrow, A. W. & Montanaro, A. Quantum computational supremacy. Nature 549, 203 (2017).
[96] Terhal, B. M. & DiVincenzo, D. P. Adaptive quantum computation, constant depth quantum circuits and Arthur-Merlin games. Quant. Inf. Comp. 4, 134–145 (2004).
[97] Harrow, A. W. & Mehraban, S. Approximate unitary t-designs by short random quantum circuits using nearest-neighbor and long-range gates. Preprint at https://arxiv.org/pdf/1809.06957 (2018).
[98] Arora, S. & Barak, B. Computational complexity: a modern approach (Cambridge University Press, 2009).
[99] Huang, C., Newman, M. & Szegedy, M. Explicit lower bounds on strong quantum simulation. Preprint at https://arxiv.org/pdf/1804.10368 (2018).
[100] Dalzell, A. M., Harrow, A. W., Koh, D. E. & La Placa, R. L. How many qubits are needed for quantum computational supremacy? Preprint at https://arxiv.org/pdf/1805.05224.pdf (2018).
[101] Morimae, T. & Tamaki, S. Fine-grained quantum supremacy of the one-clean-qubit model. Preprint at https://arxiv.org/pdf/1901.01637 (2019).
[102] Aaronson, S. The equivalence of sampling and searching. Theory of Computing Systems 55, 281–298 (2014).
[103] The variational distance is defined as $d_{\mathrm{VD}}(p, q) := \frac{1}{2} \sum_x |p(x) - q(x)|$, where the sum is over all output bitstrings $x$.
[104] Aaronson, S. Certifiable randomness from supremacy. Manuscript in preparation. (2019).
[105] Calabro, C., Impagliazzo, R. & Paturi, R. The complexity of satisfiability of small depth circuits. In International Workshop on Parameterized and Exact Computation, 75–85 (Springer, 2009).
[106] In terms of depth, the current construction gives a circuit $U$ on $n$ qubits of depth $n^{3k/2+5/2}$ for which it takes time $2^{(1-2/k)n}$ to estimate the transition probability to additive error $2^{-(n+1)}$, assuming a form of SETH stating that it takes time no less than $2^{(1-2/k)n}$ to solve $k$-SAT.