QGym mapper
The QGym package works in a manner similar to the well-known gym package, in the sense that it provides a number of environments in which reinforcement learning (RL) agents can be trained. The main purpose of qgym is to provide reinforcement learning environments that represent various passes of the OpenQL framework.
The package offers RL environments that resemble quantum compilation steps, namely initial mapping, qubit routing, and gate scheduling. Each environment provides all the components needed to train agents, including state and action spaces and (customizable) reward functions; in other words, all the ingredients of a Markov decision process. The actual training of the agents is handled by the StableBaselines3 Python package, which offers reliable, customizable, out-of-the-box PyTorch implementations of deep reinforcement learning (DRL) agents.
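As a rough sketch of what such an environment exposes, the snippet below creates an initial mapping environment and inspects its observation and action spaces. The import path and constructor arguments of InitialMapping are assumptions and may differ between qgym versions; consult the qgym documentation for the exact signature.

import networkx as nx
from qgym.envs import InitialMapping  # assumed import path

# Connection graph of the target hardware (a 5-qubit star around qubit 2).
connection_graph = nx.Graph([(0, 2), (1, 2), (2, 3), (2, 4)])

# NOTE: the constructor arguments below are assumptions, not the verified API.
env = InitialMapping(
    interaction_graph_edge_probability=0.5,
    connection_graph=connection_graph,
)

# The environment exposes the usual MDP ingredients: an observation space,
# an action space, and (via step) a reward signal.
print(env.observation_space)
print(env.action_space)
print(env.action_space.sample())  # e.g. a random (virtual, physical) qubit assignment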
The initial mapping problem is translated to an RL setting within QGym in the following manner. The setup starts from a fixed connection graph (an undirected graph representing the hardware connectivity), which remains the same across all episodes. Each episode presents the agent with a new, randomly generated interaction graph (an undirected graph representing the qubit interactions within the circuit), together with an initially empty mapping. At every step, the agent maps one virtual qubit to a physical qubit, until the mapping is fully established. In principle, this setup allows training agents that can handle arbitrary interaction graphs on a predetermined connectivity. Both the interaction and connection graphs are easily represented as NetworkX graphs, as illustrated below.
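For illustration (this is plain NetworkX, not part of the QGym or OpenSquirrel API), the connection graph of the five-qubit star topology used later in this section and the interaction graph of the example circuit can be built as follows:

import networkx as nx

# Connection graph of the hardware: qubit 2 is connected to every other qubit.
connection_graph = nx.Graph([(0, 2), (1, 2), (2, 3), (2, 4)])

# Interaction graph of the example circuit: an edge for each pair of qubits
# that share a two-qubit gate.
interaction_graph = nx.Graph([(0, 1), (1, 2), (2, 4), (3, 4)])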
At the moment, the following DRL agents can be used to map circuits in OpenSquirrel:
- Proximal Policy Optimization (PPO)
- Advantage Actor-Critic (A2C)
- Trust Region Policy Optimization (TRPO)
- Recurrent PPO
- PPO with illegal action masking
The last three agents in the list above can be imported from sb3-contrib, the extension/experimental package of StableBaselines3.
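The following sketch shows how a TRPO agent could be trained on a qgym initial-mapping environment and saved as TRPO.zip for later use with the mapper. The StableBaselines3/sb3-contrib calls (TRPO, learn, save) are standard, but the qgym import path and the InitialMapping constructor arguments are assumptions and may need to be adapted to your qgym version.

import networkx as nx
from sb3_contrib import TRPO
from qgym.envs import InitialMapping  # assumed import path

# Same star-shaped connectivity as in the mapping example below.
connection_graph = nx.Graph([(0, 2), (1, 2), (2, 3), (2, 4)])

# NOTE: constructor arguments are assumptions; check the qgym documentation.
env = InitialMapping(
    interaction_graph_edge_probability=0.5,
    connection_graph=connection_graph,
)

# "MultiInputPolicy" assumes a dict-valued observation space; use "MlpPolicy"
# if the environment returns flat observations.
model = TRPO("MultiInputPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
model.save("TRPO")  # writes TRPO.zip to the working directory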
The following code snippet demonstrates the usage of the QGymMapper. We assume that the connectivity of the target backend QPU is known, and that a TRPO.zip file containing the weights of a trained agent is available in the working directory.
from opensquirrel import CircuitBuilder
from opensquirrel.passes.mapper import QGymMapper

# Connectivity of the target backend: qubit 2 is connected to all other qubits.
connectivity = {
    "0": [2],
    "1": [2],
    "2": [0, 1, 3, 4],
    "3": [2],
    "4": [2],
}

# Instantiate the mapper with the trained TRPO agent and the backend connectivity.
qgym_mapper = QGymMapper(
    agent_class="TRPO",
    agent_path="path-to-agent/TRPO.zip",
    connectivity=connectivity,
)

# Build a 5-qubit circuit whose interaction graph the agent will map onto the hardware.
builder = CircuitBuilder(5)
builder.H(0)
builder.CNOT(0, 1)
builder.H(2)
builder.CNOT(1, 2)
builder.CNOT(2, 4)
builder.CNOT(3, 4)
circuit = builder.to_circuit()

# Apply the initial mapping pass.
circuit.map(mapper=qgym_mapper)
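Assuming the pass modifies the circuit in place (as the call above suggests), printing the circuit afterwards is a quick way to inspect the remapped qubit indices:

# Inspect the circuit after the initial mapping pass has been applied.
print(circuit)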