# Ultrahigh-fidelity spatial mode quantum gates in high-dimensional space by diffractive deep neural networks

### Concept of quantum gate assisted by D^{2}NN architecture

Figure 1a illustrates the quantum gate implementation concept. The spatial mode quantum gate converts spatial modes, and different mapping rules correspond to different logical gate operations. To design a specific quantum gate, the correspondence between input and output modes must first be determined. As an example, we use three Laguerre-Gaussian (LG) modes (\(LG_0^{-2}\), \(LG_1^0\) and \(LG_0^2\)) as inputs and outputs, chosen for the neat axial symmetry of their superposition states, and exploit both the azimuthal and radial DoFs to show the mapping of the three-dimensional *X* gate. Here, the notation \(LG_p^\ell\) denotes the LG mode with azimuthal order \(\ell\) and radial order \(p\). For clarity, the mode phases and the mappings between them are shown beside the D^{2}NN, with the yellow line indicating the mode conversion currently in operation in this concept figure. Implementing this gate requires generating phase layers, which is where the D^{2}NN comes into play. Inspired by digital artificial neural networks, the D^{2}NN uses phase planes as hidden layers to learn the relation between inputs and outputs in an energy- and time-efficient way by harnessing the power of light. In the D^{2}NN, inputs and outputs are optical fields, and each hidden layer is a phase plane whose pixel phases act as the layer weights. Pixels in successive layers are connected through diffraction, and multiple phase layers are aligned sequentially along the optical axis to accomplish the desired mode conversion. The resulting model of the spatial mode quantum gate is then trained on a computer with an iterative algorithm.
To learn the optimal phase layers that implement the desired operation in the experimental setup, we minimize a loss function measuring the deviation between the inferred output field and the theoretical output field, updating the pixel phase values of the diffractive layers with the Adam gradient-descent algorithm^{31}. The optimization terminates when the loss function converges. See Supplementary Note 1 for more details on the pattern generation.
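The forward model and training loop described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: free-space propagation between layers uses the angular-spectrum method, the loss is the summed squared field deviation averaged over modes, the phase gradients are obtained by back-propagating the field error through the same linear optical system, and the grid size, 10 µm pixel pitch, 100 µm beam waist and 5 mm layer spacing are hypothetical values. The helper names (`lg_mode`, `transfer_function`, `loss_and_grad`, `adam_train`) are illustrative, and the *X*-gate targets assume the basis ordering \(LG_0^{-2}\to LG_1^0\to LG_0^2\to LG_0^{-2}\).

```python
import numpy as np


def lg_mode(p, ell, X, Y, w):
    """Unit-norm LG_p^ell field sampled on the grid (X, Y) with beam waist w."""
    r2 = (X**2 + Y**2) / w**2
    a, x = abs(ell), 2 * r2
    # generalized Laguerre polynomial L_p^a(x) via the standard three-term recurrence
    L_prev, L = np.ones_like(x), 1 + a - x
    if p == 0:
        L = L_prev
    for k in range(1, p):
        L_prev, L = L, ((2 * k + 1 + a - x) * L - (k + a) * L_prev) / (k + 1)
    field = np.sqrt(x) ** a * L * np.exp(-r2) * np.exp(1j * ell * np.arctan2(Y, X))
    return field / np.linalg.norm(field)


def transfer_function(n, dx, wavelength, z):
    """Angular-spectrum transfer function for propagation over a distance z."""
    fx = np.fft.fftfreq(n, d=dx)
    FX, FY = np.meshgrid(fx, fx)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    return np.where(arg > 0, np.exp(1j * kz * z), 0.0)  # evanescent waves dropped


def forward(phases, fields, tf):
    """Phase mask followed by free-space propagation, for each layer in turn."""
    after_mask = []
    for phi in phases:
        v = fields * np.exp(1j * phi)
        after_mask.append(v)  # cached for the backward pass
        fields = np.fft.ifft2(tf * np.fft.fft2(v))
    return fields, after_mask


def loss_and_grad(phases, inputs, targets, tf):
    """Mode-averaged squared field error and its gradient w.r.t. every phase pixel."""
    out, after_mask = forward(phases, inputs, tf)
    m = inputs.shape[0]
    loss = np.sum(np.abs(out - targets) ** 2) / m
    adj = (out - targets) / m  # error field at the output plane
    grads = [None] * len(phases)
    for k in reversed(range(len(phases))):
        b = np.fft.ifft2(np.conj(tf) * np.fft.fft2(adj))  # adjoint of the propagation
        grads[k] = -2 * np.sum(np.imag(np.conj(b) * after_mask[k]), axis=0)
        adj = b * np.exp(-1j * phases[k])  # adjoint of the phase mask
    return loss, grads


def adam_train(phases, inputs, targets, tf, steps=200, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    """Plain Adam updates of all phase layers; returns the phases and the loss history."""
    m1 = [np.zeros_like(p) for p in phases]
    m2 = [np.zeros_like(p) for p in phases]
    history = []
    for t in range(1, steps + 1):
        loss, grads = loss_and_grad(phases, inputs, targets, tf)
        history.append(loss)
        for k, g in enumerate(grads):
            m1[k] = b1 * m1[k] + (1 - b1) * g
            m2[k] = b2 * m2[k] + (1 - b2) * g**2
            m_hat, v_hat = m1[k] / (1 - b1**t), m2[k] / (1 - b2**t)
            phases[k] = phases[k] - lr * m_hat / (np.sqrt(v_hat) + eps)
    return phases, history


# Hypothetical simulation parameters (not the experimental values)
n, dx, wavelength, spacing, w = 64, 10e-6, 1550e-9, 5e-3, 100e-6
c = (np.arange(n) - n / 2) * dx
X, Y = np.meshgrid(c, c)
modes = [lg_mode(0, -2, X, Y, w), lg_mode(1, 0, X, Y, w), lg_mode(0, 2, X, Y, w)]
inputs = np.stack(modes)
targets = np.stack(modes[1:] + modes[:1])  # X gate: |0> -> |1> -> |2> -> |0>
tf = transfer_function(n, dx, wavelength, spacing)
phases, history = adam_train([np.zeros((n, n)) for _ in range(3)], inputs, targets, tf)
print(f"loss: {history[0]:.3f} -> {history[-1]:.3f}")
```

Writing the gradient by hand keeps the sketch dependency-free; an automatic-differentiation framework would give the same gradients with less code.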

To verify the functionality of the generated phase layers, we conduct an experiment in which they are imprinted onto a spatial light modulator (SLM), as depicted in Fig. 1b. The setup comprises four parts: source, preparation, quantum gate, and measurement. The source is a heralded single-photon source, which detects one photon of a photon pair (the heralding photon) to announce the arrival of the other (the heralded photon), as shown in Fig. 1c. The photon pairs at 1550 nm are generated by spontaneous parametric down-conversion (SPDC) in a type-0 periodically poled lithium niobate bulk crystal (PPLN2). The 775 nm pump beam is generated via second-harmonic generation in PPLN1 (see the Methods section for more details). The second-order correlation function \(g^{(2)}(\tau)\) of the heralded single-photon source, characterized with a Hanbury Brown and Twiss (HBT) setup, is shown in Fig. 1d. From three-fold coincidences measured while varying the delay, \(g^{(2)}(0)\) is determined to be 0.024(2). With another commonly used estimator, \(g^{(2)}(0)=C_{123}N_3/(C_{13}C_{23})\), the value of \(g^{(2)}(0)\) can be as low as \(2.56\times10^{-4}\). Here, \(C_{123}\) is the coincidence count of channels 1, 2 and 3 in the HBT setup, \(N_3\) is the single count of channel 3, and \(C_{13}\) (\(C_{23}\)) denotes the coincidence count of channels 1 and 3 (2 and 3).
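As a quick numerical check of the estimator quoted above, the heralded \(g^{(2)}(0)\) can be computed directly from the count totals, with channel 3 acting as the herald. The counts below are hypothetical, and the uncertainty line assumes Poisson statistics dominated by the sparse three-fold coincidences, which is a simplification.

```python
import math


def heralded_g2(c123, n3, c13, c23):
    """g2(0) = C123 * N3 / (C13 * C23), with channel 3 as the herald."""
    if c13 == 0 or c23 == 0:
        raise ValueError("two-fold coincidence counts must be non-zero")
    g2 = c123 * n3 / (c13 * c23)
    # Rough Poisson uncertainty, assuming the rare three-fold counts dominate the error
    err = g2 / math.sqrt(c123) if c123 > 0 else float("nan")
    return g2, err


# Hypothetical counts, for illustration only
g2, err = heralded_g2(c123=4, n3=2_000_000, c13=100_000, c23=100_000)
print(f"g2(0) = {g2:.1e} +/- {err:.1e}")
```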

The heralded photons are then guided to the preparation stage, where we employ the complex modulation technique^{38,39} to prepare the desired input states and to project the output states, while SLM2 is loaded with the generated D^{2}NN phases to implement the quantum gate operation. To make efficient use of the SLM, simplify the experimental setup, and reduce the footprint of the device, we place a mirror facing SLM2 and adjust the incident angle so that the light reflects off SLM2 as many times as there are layers. Because the layers are modeled as perpendicular to the optical axis (Fig. 1a), the oblique incidence affects the alignment; however, the resulting degradation is not significant in this experiment and is thus acceptable. Further analysis of the effect of the incident angle can be found in ref. ^{40}.
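The folded geometry can be estimated with simple ray optics. The sketch below assumes the mirror is parallel to SLM2 at a gap \(g\) and the beam meets SLM2 at an angle \(\theta\) from the normal; the layer-to-layer path along the beam is then \(2g/\cos\theta\), and successive reflection spots are displaced by \(2g\tan\theta\). The function names, the non-overlap criterion and all numbers are illustrative assumptions, not the experimental values.

```python
import math


def folded_geometry(gap, angle_deg):
    """Layer-to-layer path and spot step for an SLM facing a parallel mirror."""
    theta = math.radians(angle_deg)
    path = 2 * gap / math.cos(theta)  # SLM -> mirror -> SLM along the tilted beam
    step = 2 * gap * math.tan(theta)  # lateral shift of the next reflection on the SLM
    return path, step


def fits_on_slm(slm_width, beam_diameter, gap, angle_deg, layers):
    """Hypothetical check: do `layers` non-overlapping spots fit across the SLM?"""
    _, step = folded_geometry(gap, angle_deg)
    return step >= beam_diameter and (layers - 1) * step + beam_diameter <= slm_width


# Lengths in millimetres; all values hypothetical
path, step = folded_geometry(gap=20.0, angle_deg=2.0)
print(f"path per layer = {path:.3f} mm, spot step = {step:.3f} mm")
```

A larger angle separates the spots more but also tilts each layer further from the normal-incidence model, which is the trade-off the paragraph above refers to.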

### High-dimensional spatial mode gates, single-photon CNOT gate and process tomography

The high-dimensional spatial mode gate performs a crucial function in higher-dimensional encoding spaces. We design and demonstrate three-dimensional *X* gates, *H* gates, and a single-photon *CNOT* gate, as depicted in the diagrams of Fig. 2a–c. The three-dimensional *X*_{1} gate cyclically shifts the basis states forward, while the three-dimensional *H*_{1} gate transforms each basis state into a superposition of all basis states with different well-defined phases. For the *CNOT* gate, we adopt a specific coding method that encodes two bits of information in four orbital angular momentum (OAM) modes of a single photon, \(\left|-1\right\rangle\), \(\left|+1\right\rangle\), \(\left|-3\right\rangle\) and \(\left|+3\right\rangle\), corresponding to \(\left|00\right\rangle\), \(\left|01\right\rangle\), \(\left|10\right\rangle\) and \(\left|11\right\rangle\), respectively. With this encoding, the *CNOT* gate flips the input state \(\left|-3\right\rangle\) to \(\left|+3\right\rangle\) and vice versa, while the input states \(\left|\pm 1\right\rangle\) remain unchanged. This method is inspired by path coding. Consider four distinct path states: they can be interpreted as a tensor product of two two-dimensional qubits, such as the top/bottom and left/right directions, as depicted in the upper section of Fig. 2d. Similarly, the OAM DoF, which also possesses infinitely many dimensions, can be viewed as a set of “paths” within mode space. We categorize the four OAM states along two dimensions: sign (the direction of phase rotation) and order (the order of phase rotation). Each dimension has two levels, providing the two levels of the control and target qubits; in the mapping above, the order (\(|\ell|=1\) or 3) acts as the control qubit and the sign of \(\ell\) as the target qubit. This configuration is illustrated in the lower section of Fig. 2d. The path encoding method was recently employed to demonstrate a quantum fault-tolerance threshold^{41}.
A mode encoding method based on the azimuthal and radial orders of LG modes has also been utilized^{16}. Figure 2e–g display the output mode profiles of these gates, which show a high degree of consistency between simulations and experimental measurements. The minor deviations in the output modes might be attributed to misalignment and imperfections in the apparatus. For the *X* and *H* gates, the inputs are prepared in all mutually unbiased bases (MUBs) in three dimensions, while for the *CNOT* gate, an overcomplete state set in 2 × 2 dimensions is utilized. Further details about the MUBs can be found in Supplementary Note 2.
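The logical operations above can be written as small matrices. This sketch assumes the three-dimensional *H*_{1} is the normalized discrete Fourier transform (a common choice of qutrit Hadamard), implements the *CNOT* directly on OAM labels using the Fig. 2d encoding, and generates four mutually unbiased bases in \(d=3\) (12 states in total) with the standard quadratic-phase construction for odd prime dimensions. The paper's own MUB convention is given in its Supplementary Note 2 and may differ in ordering or phases; the helper names are illustrative.

```python
import numpy as np

d = 3
omega = np.exp(2j * np.pi / d)

# 3D X gate: cyclic shift |j> -> |j+1 mod 3>
X1 = np.roll(np.eye(d), 1, axis=0)
# 3D H gate, assumed here to be the normalized discrete Fourier transform
H1 = omega ** np.outer(np.arange(d), np.arange(d)) / np.sqrt(d)

# Fig. 2d encoding: order |l| (1 or 3) -> control bit, sign of l (- or +) -> target bit
oam_to_bits = {-1: (0, 0), +1: (0, 1), -3: (1, 0), +3: (1, 1)}
bits_to_oam = {bits: ell for ell, bits in oam_to_bits.items()}


def cnot_oam(ell):
    """Flip the target (sign) bit when the control (order) bit is 1."""
    control, target = oam_to_bits[ell]
    return bits_to_oam[(control, target ^ control)]


def mub_bases(d):
    """d + 1 MUBs for an odd prime d; the columns of each matrix are the basis states."""
    w = np.exp(2j * np.pi / d)
    j = np.arange(d)
    bases = [np.eye(d, dtype=complex)]
    for k in range(d):
        # column m: (1/sqrt d) * sum_j w^(k j^2 + m j) |j>
        cols = [w ** ((k * j**2 + m * j) % d) / np.sqrt(d) for m in range(d)]
        bases.append(np.stack(cols, axis=1))
    return bases


print({ell: cnot_oam(ell) for ell in (-1, +1, -3, +3)})  # {-1: -1, 1: 1, -3: 3, 3: -3}
bases = mub_bases(d)
print(sum(B.shape[1] for B in bases), "MUB states")  # 12 MUB states
```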

To characterize these gates more comprehensively, we employ quantum process tomography (QPT)^{42}. This involves preparing the input states in all MUBs as described above, applying the gates to this set of states, and performing full state tomography on every output state through projective measurements in all MUBs. The resulting normalized measurement outcomes yield the tomography matrices presented in Fig. 3a–c. From these tomography results, we can infer the quantum process \(\varepsilon(\rho_{\rm in})\), which can be decomposed as

$$\varepsilon(\rho_{\rm in})=\sum_{m,n=1}^{d^2}\chi_{mn}E_m\rho_{\rm in}E_n^\dagger \tag{1}$$

where \(\rho_{\rm in}\) is the input state of the system, \(d\) is the dimension, and \(\{E_m\}\) is a set of operators, usually generalized Pauli matrices. We use the Gell–Mann matrices (together with the identity) for \(d=3\) and the two-qubit Pauli matrices for \(d=4\). The \(d^2\times d^2\) matrix \(\chi\) is the process matrix, which contains all information about the quantum process \(\varepsilon(\rho_{\rm in})\) and can be reconstructed because all other elements in Eq. (1) are known. Figure 3d–f show bar plots of the reconstructed process matrices \(\chi\). For a clearer view, bars with negative values are flipped up and drawn with pale gray edges. A physical process matrix must be positive semidefinite and trace preserving. Due to inherent experimental noise, however, standard QPT might produce an unphysical matrix with negative eigenvalues. We therefore use the maximum-likelihood estimation (MLE) method, which always yields physically sensible results, for the reconstruction^{42,43}. By introducing a Lagrange multiplier and constructing an appropriate iteration form, the MLE method preserves the positive semidefiniteness and trace normalization of the process matrix (see Supplementary Note 3). The agreement between theory and experiment is evaluated by the process fidelity \(F=\mathrm{Tr}(\chi_t\chi_e)\), the trace of the product of the theoretical process matrix \(\chi_t\) and the experimental process matrix \(\chi_e\). In this experiment, we achieve \(F_{3DX_1}=98.4(2)\%\), \(F_{3DH_1}=99.4(3)\%\) and \(F_{CNOT}=99.6(2)\%\). To quantify the comparison among the theoretical, simulated and experimental tomography matrices, their mean squared errors (MSEs) for the different gates are shown in Fig. 3g. The data comparison shows that a lower MSE corresponds to a higher fidelity, and that experimental imperfections noticeably raise the MSE. The process matrices of *X*_{2}, *H*_{2} and *H*_{3} are given in Supplementary Note 2.
Our results indicate not only the feasibility of high-dimensional operation in spatial modes of photons using D^{2}NN, but also the high performance which is essential to the reliable execution of quantum algorithms.
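Eq. (1) and the fidelity \(F=\mathrm{Tr}(\chi_t\chi_e)\) can be illustrated numerically. For brevity, this sketch swaps the Gell–Mann matrices for the clock-and-shift (Weyl) operators, normalized so that \(\mathrm{Tr}(E_m^\dagger E_n)=d\,\delta_{mn}\); the trace fidelity does not depend on which such orthonormal operator basis is used, provided both process matrices are expressed in the same one. The noisy \(\chi_e\) is a hypothetical depolarizing model, not measured data, and no MLE step is included.

```python
import numpy as np


def weyl_operators(d):
    """Clock-and-shift operators X^a Z^b with Tr(E_m^dag E_n) = d * delta_mn."""
    w = np.exp(2j * np.pi / d)
    X = np.roll(np.eye(d), 1, axis=0)
    Z = np.diag(w ** np.arange(d))
    return [np.linalg.matrix_power(X, a) @ np.linalg.matrix_power(Z, b)
            for a in range(d) for b in range(d)]


def chi_from_unitary(U, ops):
    """Expand U = sum_m c_m E_m; then chi_mn = c_m c_n^* (unit trace)."""
    d = U.shape[0]
    c = np.array([np.trace(E.conj().T @ U) / d for E in ops])
    return np.outer(c, c.conj())


def apply_chi(chi, ops, rho):
    """Eq. (1): eps(rho) = sum_mn chi_mn E_m rho E_n^dag."""
    return sum(chi[m, n] * ops[m] @ rho @ ops[n].conj().T
               for m in range(len(ops)) for n in range(len(ops)))


def process_fidelity(chi_t, chi_e):
    return float(np.real(np.trace(chi_t @ chi_e)))


d = 3
ops = weyl_operators(d)
X1 = np.roll(np.eye(d), 1, axis=0)
chi_t = chi_from_unitary(X1, ops)
p = 0.02  # hypothetical depolarizing strength
chi_e = (1 - p) * chi_t + p * np.eye(d * d) / d**2  # X1 followed by depolarizing noise
print(process_fidelity(chi_t, chi_e))  # analytic value: (1 - p) + p / d**2
```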

### Demonstration of the Deutsch algorithm and intelligent deployment

Quantum gates are the fundamental components of quantum algorithms, which leverage quantum superposition and interference to achieve computational advantages over classical algorithms: superposition lets the gates process all possible states simultaneously, and interference steers the result toward the correct outcome. As an application of the D^{2}NN quantum gates, we implement the quantum circuit of the two-bit version of the Deutsch-Jozsa algorithm, known as the Deutsch algorithm^{44,45}. The algorithm’s goal is to determine whether a Boolean function \(f:\{0,1\}^n\to\{0,1\}\) is constant (\(f(x)=0\) for all \(x\), or \(f(x)=1\) for all \(x\)) or balanced (\(f(x)=0\) for half of the inputs \(x\) and \(f(x)=1\) for the other half). In classical computing, the worst case requires \(2^{n-1}+1\) queries of \(f\) to identify it, since there are \(2^n\) possible inputs.

Leveraging the Deutsch algorithm, the question can be answered with just one function evaluation, instead of a number of evaluations that grows exponentially with the number of qubits. In the state preparation stage, we set qubit \(x\) to \(|0\rangle_x\) and qubit \(y\) to \(|1\rangle_y\), and apply an *H* gate to each qubit to create a superposition of all possible state combinations. Thanks to this superposition, the oracle function, which maps the state \(|x\rangle|y\rangle\) to \(|x\rangle|y\oplus f(x)\rangle\) (\(\oplus\) denotes the *XOR* operation), acts on all possible configurations simultaneously. After the oracle and the interference step (an *H* gate on each qubit after the oracle), qubit \(x\) is measured as 0 for constant functions and as 1 for balanced functions. The derivation of this process is given in Supplementary Note 4. A constant and a balanced oracle function are constructed using an identity operator and a *CNOT* operator, respectively, as shown in Fig. 4a, where the corresponding output states before measurement are also presented. Notably, the oracle function and the interference process are implemented with a single SLM, which reduces system complexity and alignment difficulty and enhances the practicality of our implementation. Although only qubit \(x\) needs to be measured, we project and measure the output photon in all four basis states of the two qubits for clarity, as shown in Fig. 4b, and the results are consistent with the theoretical output states in Fig. 4a.
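The circuit can be checked with a two-qubit state-vector simulation. The sketch below orders the basis as \(|xy\rangle\) and models the constant oracles as the identity and \(I\otimes X\), and the balanced oracles as *CNOT* and *CNOT* followed by \(I\otimes X\); only the identity and *CNOT* oracles correspond to those in Fig. 4a, and the other two are added for completeness.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
HH = np.kron(H, H)                              # H on qubit x and on qubit y
I4 = np.eye(4)
X_y = np.kron(np.eye(2), [[0, 1], [1, 0]])      # flips qubit y only
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])  # control x

oracles = {                                      # U_f |x>|y> = |x>|y XOR f(x)>
    "constant f=0": I4,
    "constant f=1": X_y,
    "balanced f(x)=x": CNOT,
    "balanced f(x)=1-x": X_y @ CNOT,
}


def deutsch(U_f):
    """Prepare |0>|1>, apply H x H, the oracle, H x H; return probabilities and x."""
    state = np.zeros(4)
    state[1] = 1.0                               # |0>_x |1>_y, index = 2*x + y
    state = HH @ (U_f @ (HH @ state))
    probs = np.abs(state) ** 2
    return probs, int(round(probs[2] + probs[3]))  # measured value of qubit x


for name, U_f in oracles.items():
    probs, x = deutsch(U_f)
    print(f"{name}: x = {x}, P(00, 01, 10, 11) = {np.round(probs, 3)}")
```

Up to a global phase, the constant oracles end in \(|01\rangle\) and the balanced ones in \(|11\rangle\), which under the OAM encoding of Fig. 2d correspond to \(|+1\rangle\) and \(|+3\rangle\).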

In addition to demonstrating various gates in a quantum circuit, our reconfigurable implementation offers further capabilities. We propose a protocol to explore its potential in applications that demand intelligent deployment. This protocol takes advantage of the flexibility of the reconfigurable setup and the smooth coordination among the preparation, operation, and measurement devices. The task is to configure the gate setup according to preset commands, specifically to switch automatically from \(U_1\) to \(U_2\). The entire self-configuration process, which comprises several steps, is illustrated in Fig. 5a. First, the current gate’s performance is assessed using process tomography. The resulting tomography matrix is then used to reconstruct the gate fidelity via the MLE method described earlier. The MLE method produces different fidelities under different assumptions about the ideal tomography matrix, as depicted in Fig. 5b. The assumption yielding the highest fidelity is most likely to represent the actual gate behavior, which allows the gate to be tested and identified. Subsequently, the system switches to the desired gate configuration by loading a new layer set, which here is *H*_{3}. To confirm that the intended gate has been realized, another round of process tomography and reconstruction is performed. In Fig. 5a, the annotations at the start and end points of the tomography arrow denote the initial and measured tomography matrices, respectively, while the annotations along the switch arrow denote the previous and updated layers.
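The identification step can be illustrated with a simplified stand-in. Instead of the MLE fidelity ranking used in the text, this sketch compares a measured tomography matrix with the ideal matrix of each candidate gate and picks the smallest MSE (the figure of merit of Fig. 3g). The candidate set (identity, cyclic shifts, and a Fourier gate \(F\) with its inverse), the 12 MUB probe states and the noisy "measured" data are all hypothetical.

```python
import numpy as np

d = 3
w = np.exp(2j * np.pi / d)
j = np.arange(d)
# The 12 states of the four 3D MUBs (computational + three quadratic-phase bases)
states = [np.eye(d, dtype=complex)[:, m] for m in range(d)]
states += [w ** ((k * j**2 + m * j) % d) / np.sqrt(d) for k in range(d) for m in range(d)]


def tomography_matrix(U):
    """T[i, k] = |<s_k| U |s_i>|^2: prepare s_i, apply U, project onto s_k."""
    return np.array([[abs(np.vdot(sk, U @ si)) ** 2 for sk in states] for si in states])


X = np.roll(np.eye(d), 1, axis=0)
F = w ** np.outer(j, j) / np.sqrt(d)
# Hypothetical candidate set; the labels are illustrative, not the paper's gate names
candidates = {"I": np.eye(d), "X": X, "X^2": X @ X, "F": F, "F^dag": F.conj().T}


def identify(measured):
    """Rank candidate gates by MSE between measured and ideal tomography matrices."""
    mse = {name: float(np.mean((measured - tomography_matrix(U)) ** 2))
           for name, U in candidates.items()}
    return min(mse, key=mse.get), mse


rng = np.random.default_rng(1)
measured = tomography_matrix(X) + rng.normal(0.0, 0.02, (12, 12))  # synthetic data
best, mse = identify(measured)
print(best, mse)
```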

After switching to the desired gate, the next step is to optimize the spacing between layers to match the experimental setup. Physically adjusting the spacing between SLM2 and the mirror can be tedious, but this can be avoided by varying the spacing during pattern generation instead, an approach that is more compatible with automatic protocols. First, a rough spacing range is estimated from the setup, and five equally spaced sampling points are selected as the initial condition. The visibilities at these sampling points and their distribution are used to update the sampling points for the next step (see Supplementary Note 5). This iterative process stops when the search range falls below a threshold at which further changes in spacing no longer produce a significant change in visibility. The threshold is presumed to be below 0.1 mm, which is supported by the analysis of Z-axis offset effects in Supplementary Note 6 (together with some approaches to improve misalignment tolerance). Figure 5c shows the visibilities at the different sampling points in each update, with the highest visibility of about 0.97 observed at 41 mm.
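A minimal version of the spacing search is sketched below. The actual update rule is described in Supplementary Note 5; here we assume a simple bracketing rule that keeps the neighbors of the best of five samples, which shrinks the range by at least half per round and, for a single-peaked visibility curve, keeps the peak inside the bracket. The Gaussian `toy_visibility` curve and its 37.3 mm peak are synthetic stand-ins, not data.

```python
import numpy as np


def refine_spacing(visibility, lo, hi, tol=0.1, n_samples=5):
    """Shrink [lo, hi] around the best of n equally spaced samples until hi - lo < tol."""
    history = []
    while hi - lo >= tol:
        xs = np.linspace(lo, hi, n_samples)
        vs = np.array([visibility(x) for x in xs])
        i = int(np.argmax(vs))
        history.append((xs, vs))
        # keep the neighbours of the best sample (clipped at the range edges)
        lo, hi = xs[max(i - 1, 0)], xs[min(i + 1, n_samples - 1)]
    return 0.5 * (lo + hi), history


def toy_visibility(spacing_mm, peak=37.3, width=6.0):
    """Hypothetical single-peaked stand-in for simulated visibility vs. spacing."""
    return 0.95 * np.exp(-((spacing_mm - peak) / width) ** 2)


best, history = refine_spacing(toy_visibility, 20.0, 60.0)
print(f"best spacing ~ {best:.3f} mm after {len(history)} rounds")
```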

### Gate performance analysis and comparison with WFM

Optimizing the practical parameters inherent to the SLM is of paramount importance for achieving optimal quantum gate performance. To assess the critical pixel number of the SLM, we explore two distinct scenarios, as diagrammed in Fig. 6a, b. The first scenario varies the number of pixels while keeping the physical dimensions of the layers fixed. The second varies the number of pixels while keeping the pixel dimensions fixed, which in turn changes the physical dimensions of the layers.

In the first scenario, we focus on the influence of pixel density, i.e., sampling precision, on quantum gates. As expected, our analysis suggests that higher pixel densities enhance quantum gate performance. We also examine the consequences of upscaling or downsampling quantum gates designed at different pixel densities. The findings in Fig. 6e reveal that both upscaling and downsampling degrade performance, albeit through distinct mechanisms: upscaling reduces the effective modulation degrees of freedom, while downsampling introduces sampling errors arising from pixel merging. In the second scenario, Fig. 6f shows that increasing the number of pixels does boost performance. However, the marginal improvement diminishes, and performance changes become insignificant beyond approximately 384 pixels. This suggests that, for the scope of our work, 384 pixels suffice to achieve the desired outcomes.
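The two resampling operations of the first scenario can be sketched as follows. Pixel replication for upscaling and complex-amplitude averaging for pixel merging are assumptions about how the masks are resampled, and the function names are illustrative. The random 384 × 384 mask is a deliberately uncorrelated worst case, so its merging error overstates what a smooth trained mask would show.

```python
import numpy as np


def upscale(phase, factor):
    """Replicate each pixel into a factor x factor block (more pixels, same DoF)."""
    return np.kron(phase, np.ones((factor, factor)))


def downsample(phase, factor):
    """Merge factor x factor blocks by averaging exp(i*phase), keeping the argument."""
    n = phase.shape[0] // factor
    t = np.exp(1j * phase[: n * factor, : n * factor])
    return np.angle(t.reshape(n, factor, n, factor).mean(axis=(1, 3)))


def wrapped_rms(a, b):
    """RMS of the phase difference wrapped to (-pi, pi]."""
    return float(np.sqrt(np.mean(np.angle(np.exp(1j * (a - b))) ** 2)))


rng = np.random.default_rng(0)
fine = rng.uniform(0, 2 * np.pi, (384, 384))         # hypothetical high-density design
coarse = downsample(fine, 2)                         # 192 x 192 after pixel merging
merge_error = wrapped_rms(upscale(coarse, 2), fine)  # sampling error from merging
print(coarse.shape, round(merge_error, 3))
```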

Moreover, the gray level depth of the SLM can significantly impact quantum gate performance, as illustrated in Fig. 6c, g. Lower gray level depths introduce inaccuracies in pixel values. Surprisingly, our findings indicate that quantum gate performance remains relatively stable within the range of 32 to 512 gray levels, with a noticeable performance drop occurring only when the depth falls below 16 levels. Another critical parameter affecting performance is the presence of phase distortion in the SLM, as shown in Fig. 6d, h. The distortion phase applied is a combination of the fourth and fifteenth terms of Zernike polynomials. Phase distortion introduces additional phase components that render the original design ineffective. Consequently, the necessity of pre-compensating for shape distortions in the SLM becomes evident.
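The effect of gray-level depth can be estimated with uniform phase quantization. For a rounding error spread uniformly over a step \(s=2\pi/N\), the RMS phase error is \(s/\sqrt{12}\), and the mean-field overlap factor \(|\langle e^{i\delta}\rangle|^2\) equals \(\mathrm{sinc}^2(1/N)\), the familiar efficiency of an \(N\)-level phase profile. This single-layer sketch ignores how errors compound across several layers, so it only indicates the trend; the helper names are illustrative.

```python
import numpy as np


def quantize_phase(phase, levels):
    """Map a phase in radians onto the nearest of `levels` equally spaced gray levels."""
    step = 2 * np.pi / levels
    return (np.round(np.mod(phase, 2 * np.pi) / step) % levels) * step


def quantization_stats(levels, samples=200_000, seed=0):
    """RMS phase error and single-layer overlap factor for random target phases."""
    rng = np.random.default_rng(seed)
    phase = rng.uniform(0, 2 * np.pi, samples)
    err = np.angle(np.exp(1j * (phase - quantize_phase(phase, levels))))
    rms = np.sqrt(np.mean(err**2))
    overlap = np.abs(np.mean(np.exp(1j * err))) ** 2
    return rms, overlap


for levels in (8, 16, 32, 256):
    rms, overlap = quantization_stats(levels)
    # np.sinc(x) = sin(pi x) / (pi x), so np.sinc(1 / levels) ** 2 is the analytic overlap
    print(levels, round(rms, 4), round(overlap, 4), round(np.sinc(1 / levels) ** 2, 4))
```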

Despite our successful experimental demonstration of the D^{2}NN spatial mode quantum gate, we remain concerned about its upper performance limit. Two primary metrics are used to evaluate device performance: visibility and loss. Visibility characterizes the quality of the output and is defined as \(V=\sum_{i=1}^{d}|\langle\varphi_i^{\rm out}|\varphi_i\rangle|^2/\sum_{i,j=1}^{d}|\langle\varphi_i^{\rm out}|\varphi_j\rangle|^2\), where \(|\varphi_j\rangle\) is the defined computational basis, \(|\varphi_i^{\rm out}\rangle\) is the output field whose target is \(|\varphi_i\rangle\), and \(d\) is the dimension. Loss, defined as \((|E_{\rm in}|^2-|E_{\rm out}|^2)/|E_{\rm in}|^2\), is the normalized energy waste and is negatively correlated with efficiency. Note that the energy loss here refers to the scattering loss arising from imperfect phase-pattern design or pattern misalignment and does not account for Fresnel reflection. Several factors might affect visibility and loss, including the number of training iterations, the spacing between layers, and the number of layers at a fixed spacing or a fixed total length, as presented in Fig. 7a–d. Our data indicate that performance generally improves with increasing numbers of training iterations and layers, eventually converging to a certain value. Intuitively, the D^{2}NN fit improves with the number of training iterations, resulting in better visibility and lower loss. Similarly, more layers provide more DoFs to achieve the desired conversion, whether at a fixed spacing or a fixed total length. Poor and less reliable performance is observed when the number of training iterations or layers is very small, because these designs do not fully function: some inputs are not correctly converted, and their energy is lost to scattering. Interestingly, the loss as a function of layer spacing (Fig. 7b) shows a small dip around 10 mm and flattens out as the spacing increases.
This behavior could be attributed to lower pixel utilization at shorter diffraction distances and larger scattering losses at longer diffraction distances.
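The two metrics can be computed from sampled fields as below. The sketch assumes the visibility form \(V=\sum_i|\langle\varphi_i^{\rm out}|\varphi_i\rangle|^2/\sum_{i,j}|\langle\varphi_i^{\rm out}|\varphi_j\rangle|^2\), with inner products taken as discrete sums over pixels; the three "modes" on five pixels and the 10% crosstalk amplitude are toy values chosen so that the result can be checked by hand.

```python
import numpy as np


def visibility(outputs, basis):
    """V = sum_i |<out_i|phi_i>|^2 / sum_ij |<out_i|phi_j>|^2 (rows: outputs, cols: basis)."""
    ov = np.abs(np.array([[np.vdot(o, b) for b in basis] for o in outputs])) ** 2
    return float(np.trace(ov) / ov.sum())


def energy_loss(e_in, e_out):
    """(|E_in|^2 - |E_out|^2) / |E_in|^2 with powers summed over all pixels."""
    p_in, p_out = np.sum(np.abs(e_in) ** 2), np.sum(np.abs(e_out) ** 2)
    return float((p_in - p_out) / p_in)


basis = np.eye(5)[:3]  # three orthonormal stand-in modes sampled on 5 "pixels"
outputs = [0.9 * basis[i] + 0.1 * basis[(i + 1) % 3] for i in range(3)]
print(visibility(outputs, basis), energy_loss(basis[0], outputs[0]))
```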

To clearly illustrate the improvement offered by our approach, we compare the performance of D^{2}NN with that of WFM in Fig. 7e, f. The results indicate that both methods converge within a few dozen iterations in terms of visibility and loss. WFM achieves near-perfect energy conservation, as evidenced by its very low loss curve. This can be explained by the weight update mechanism of WFM, which is based on the overlap between the forward-propagating and backward-propagating fields. In contrast, D^{2}NN allows arbitrary connections between nodes and uses the Adam algorithm, which helps avoid local optima^{46}. Although D^{2}NN requires more training to reduce its loss, it outperforms WFM in visibility and approaches the ideal value of 1. Note that in all cases shown in Fig. 7, we use all 12 states of the three-dimensional MUBs as input states, which diminishes the performance of WFM. Overall, the comparison reveals that D^{2}NN significantly improves visibility at a small cost in energy loss.

Fortunately, the theoretical energy loss of D^{2}NN can be controlled by incorporating it into the optimization loss function, as depicted in Fig. 7g. Originally, the loss function is defined as the MSE between the inferred outputs \(E\) and the corresponding theoretical outputs \(\hat{E}\), \(f_L=\frac{1}{n^2}\sum_{i,j}^{n}\left|E_{ij}-\hat{E}_{ij}\right|^2\). This loss function inherently accounts for energy loss to some extent, yet the energy constraint is not fully expressed during training; instead, greater emphasis is placed on optimizing visibility. Introducing a separate energy loss term aims to strike a balance between these two metrics. As the weight assigned to the energy loss term in the new loss function increases, we observe a slight reduction in output visibility. Nevertheless, the D^{2}NN outputs continue to outperform the optimized visibility achieved with the WFM method. Simultaneously, the energy loss of D^{2}NN converges towards that of the WFM method, as shown in Fig. 7h, i, underscoring the advantage of our approach.
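A sketch of the modified objective is given below. The text does not spell out the exact form of the energy term; here we assume it is the fractional power loss, added to the field MSE \(f_L\) with a scalar weight, and the single 4 × 4 field is a toy input. The function names are illustrative.

```python
import numpy as np


def field_mse(E, E_hat):
    """f_L = (1/n^2) * sum_ij |E_ij - E_hat_ij|^2 over an n x n grid."""
    n = E.shape[-1]
    return float(np.sum(np.abs(E - E_hat) ** 2) / n**2)


def energy_term(E_in, E_out):
    """Fraction of the input power that does not reach the output plane."""
    return float(1 - np.sum(np.abs(E_out) ** 2) / np.sum(np.abs(E_in) ** 2))


def combined_loss(E_in, E_out, E_hat, weight):
    """Assumed form: field MSE plus a weighted energy-loss penalty."""
    return field_mse(E_out, E_hat) + weight * energy_term(E_in, E_out)


E_in = np.ones((4, 4), dtype=complex)
E_out = 0.5 * E_in  # hypothetical output that keeps only 25% of the power
print([combined_loss(E_in, E_out, E_in, w) for w in (0.0, 1.0, 2.0)])
```

Raising `weight` penalizes scattered power more strongly, mirroring the trade-off between visibility and energy loss discussed above.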
