Realizing a deep reinforcement learning agent discovering real-time feedback control strategies for a quantum system
ORAL
Abstract
To realize the full potential of quantum technologies, finding good strategies to control quantum information processing devices in real time is becoming increasingly important. Usually, these strategies require a precise understanding of the device itself, which is generally not available. Model-free reinforcement learning circumvents this need by discovering control strategies from scratch, without relying on an accurate description of the quantum system. Furthermore, important tasks such as state preparation, gate teleportation and error correction need feedback at time scales much shorter than the coherence time, which for superconducting circuits is in the microsecond range. Developing and training a deep reinforcement learning agent that can operate in this real-time feedback regime has been an open challenge.
Here, we have implemented such an agent in the form of a latency-optimized deep neural network on an FPGA. We demonstrate its use to efficiently initialize a superconducting qubit into a target state. To train the agent, we use model-free reinforcement learning that is based solely on measurement data. We study the agent's performance for high-fidelity, low-fidelity and three-level readout, and compare with simple strategies based on thresholding. This demonstration motivates further research towards adoption of reinforcement learning for real-time feedback control of quantum devices and more generally any physical system requiring learnable low-latency feedback control.
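As a point of reference for the thresholding baseline mentioned above, the following is a minimal toy sketch of threshold-based qubit initialization, not the authors' FPGA implementation. It assumes a two-level qubit, a Gaussian single-shot readout signal with unit variance, and an ideal flip pulse; the names `measure`, `threshold_reset`, `ground_state_fraction`, and the parameters `separation`, `threshold` and `max_rounds` are hypothetical and chosen only for illustration.

```python
import random


def measure(state, separation=4.0, rng=random):
    """Toy readout (assumption): Gaussian signal with unit variance,
    centered at 0 for the ground state (0) and at `separation` for
    the excited state (1)."""
    return rng.gauss(separation * state, 1.0)


def threshold_reset(state, threshold=2.0, max_rounds=5,
                    separation=4.0, rng=random):
    """Measure the qubit; stop if the signal is below `threshold`
    (qubit believed to be in the ground state), otherwise apply an
    ideal flip and measure again, up to `max_rounds` times."""
    for rounds in range(1, max_rounds + 1):
        signal = measure(state, separation, rng)
        if signal < threshold:
            return state, rounds
        state = 1 - state  # ideal pi pulse (assumption: no gate error)
    return state, max_rounds


def ground_state_fraction(n_shots=20000, seed=0, **kwargs):
    """Fraction of shots ending in the ground state, starting from a
    50/50 mixture of ground and excited states."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_shots):
        initial = rng.randint(0, 1)
        final, _ = threshold_reset(initial, rng=rng, **kwargs)
        hits += (final == 0)
    return hits / n_shots


if __name__ == "__main__":
    # Well-separated readout versus strongly overlapping readout.
    print(ground_state_fraction(separation=4.0, threshold=2.0))
    print(ground_state_fraction(separation=1.0, threshold=0.5))
```

In this toy model the fixed threshold performs well when the two readout distributions are well separated and degrades when they overlap, which is the low-fidelity regime where a learned strategy is compared against thresholding.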
*This work was supported by the Swiss National Science Foundation (SNSF) through the project "Quantum Photonics with Microwaves in Superconducting Circuits", by the European Research Council (ERC) through the project "Superconducting Quantum Networks" (SuperQuNet), by the National Centre of Competence in Research "Quantum Science and Technology" (NCCR QSIT), a research instrument of the Swiss National Science Foundation (SNSF), by ETH Zurich, the Munich Quantum Valley, which is supported by the Bavarian state government with funds from the Hightech Agenda Bayern Plus, and by the Max Planck Society.
Presenters
- Jonas Landgraf
  - Max Planck Institute for the Science of Light