qgym.envs.routing package
Module containing the environment, rewarders, visualizer and other utils for the routing problem of OpenQL.
- class qgym.envs.routing.BasicRewarder(illegal_action_penalty=-50, penalty_per_swap=-10, reward_per_surpass=10)[source]
 Bases: Rewarder
 RL rewarder for computing rewards on the RoutingState.
- __init__(illegal_action_penalty=-50, penalty_per_swap=-10, reward_per_surpass=10)[source]
 Set the rewards and penalties.
- Parameters:
 illegal_action_penalty (float) – Penalty for performing an illegal action. An action is illegal when the action means ‘surpass’ even though the next gate cannot be surpassed. This value should be negative (but is not required to be) and defaults to -50.
 penalty_per_swap (float) – Penalty for placing a swap. In general, we want as few swaps as possible. Therefore, this value should be negative and defaults to -10.
 reward_per_surpass (float) – Reward given for surpassing a gate. In general, we want to get to the end of the circuit as fast as possible. Therefore, this value should be positive and defaults to 10.
- compute_reward(*, old_state, action, new_state)[source]
 Compute a reward, based on the old state, new state, and the given action.
- Parameters:
 old_state (RoutingState) – RoutingState before the current action.
 action (int) – Action that has just been taken.
 new_state (RoutingState) – RoutingState after the current action.
- Return type:
 - Returns:
 The reward for this action.
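A brief usage sketch: the snippet below constructs a BasicRewarder with custom values for the three constructor arguments documented above. The concrete numbers are illustrative assumptions only; the rewarder is normally passed to the Routing environment through its rewarder keyword argument (BasicRewarder is the default when no rewarder is given).

```python
from qgym.envs.routing import BasicRewarder

# Sketch only: custom penalty/reward values for the routing rewarder.
rewarder = BasicRewarder(
    illegal_action_penalty=-100,  # charged when 'surpass' is chosen but impossible
    penalty_per_swap=-20,         # charged for every SWAP that is placed
    reward_per_surpass=10,        # granted for every gate that is surpassed
)
```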
- class qgym.envs.routing.EpisodeRewarder(illegal_action_penalty=-50, penalty_per_swap=-10, reward_per_surpass=10)[source]
 Bases: BasicRewarder
 Rewarder for the Routing environment which only gives a reward at the end of a full episode. The reward is highest for the lowest number of SWAPs. This could be improved by also taking into account the fidelity of the edges on which the circuit is executed.
- compute_reward(*, old_state, action, new_state)[source]
 Compute a reward, based on the new state and the given action.
- Parameters:
 old_state (RoutingState) – RoutingState before the current action.
 action (int) – Action that has just been taken.
 new_state (RoutingState) – RoutingState after the current action.
- Return type:
 - Returns:
 If an action is illegal, returns the illegal_action_penalty. If the episode is finished, returns the reward calculated over the episode; otherwise returns 0.
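A short sketch of how this episodic rewarder plugs into the environment via the rewarder keyword of Routing (documented below). The 3-qubit cycle topology and the penalty value are illustrative assumptions, not package defaults.

```python
import networkx as nx

from qgym.envs.routing import EpisodeRewarder, Routing

# With EpisodeRewarder, intermediate steps yield 0 (or the illegal-action
# penalty); the SWAP penalties are only paid out once the episode finishes.
env = Routing(nx.cycle_graph(3), rewarder=EpisodeRewarder(penalty_per_swap=-5))
```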
- class qgym.envs.routing.Routing(connection_graph, interaction_generator=None, max_observation_reach=5, observe_legal_surpasses=True, observe_connection_graph=False, *, rewarder=None, render_mode=None)[source]
 Bases: Environment[Dict[str, ndarray[Any, dtype[int32]]], int]
 RL environment for the routing problem of OpenQL.
- __init__(connection_graph, interaction_generator=None, max_observation_reach=5, observe_legal_surpasses=True, observe_connection_graph=False, *, rewarder=None, render_mode=None)[source]
 Initialize the action space, observation space, and initial states.
The supported render modes of this environment are "human" and "rgb_array".
- Parameters:
 connection_graph (Union[Graph, Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]], list[int], list[Iterable[int]], tuple[int, ...], tuple[Iterable[int], ...]]) – Graph representation of the QPU topology. Each node represents a physical qubit and each edge represents a connection in the QPU topology. See parse_connection_graph() for supported formats.
 interaction_generator (InteractionGenerator | None) – Interaction generator for generating interaction circuits. This generator is used to generate a new interaction circuit when Routing.reset() is called without an interaction circuit.
 max_observation_reach (int) – Sets a cap on the maximum number of gates the agent can see ahead when making an observation. When bigger than max_interaction_gates, the agent will always see all gates ahead in an observation.
 observe_legal_surpasses (bool) – If True, a boolean array of length observation_reach indicating whether the gates ahead can be executed will be added to the observation_space.
 observe_connection_graph (bool) – If True, the connection_graph will be incorporated in the observation_space. A reason to set it to False is that the QPU topology practically does not change for a given machine; an agent is typically trained for a single QPU topology, which can be learned implicitly through the rewards and/or the legal-surpass booleans (if shown, depending on the flag above). Defaults to False.
 rewarder (Rewarder | None) – Rewarder to use for the environment. Must inherit from Rewarder. If None (default), then BasicRewarder is used.
 render_mode (str | None) – If "human", open a pygame screen visualizing the steps. If "rgb_array", return an RGB array encoding of the rendered frame on each render call.
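A construction sketch for this initializer. The 4-qubit line topology and the chosen keyword values are illustrative assumptions; any format accepted by parse_connection_graph() could be used for the connection graph.

```python
import networkx as nx

from qgym.envs.routing import Routing

# Illustrative QPU topology: four physical qubits on a line (0-1-2-3).
connection_graph = nx.Graph()
connection_graph.add_edges_from([(0, 1), (1, 2), (2, 3)])

env = Routing(
    connection_graph,
    max_observation_reach=5,         # agent sees at most 5 gates ahead
    observe_legal_surpasses=True,    # expose which of those gates are executable
    observe_connection_graph=False,  # topology is fixed, so keep it out of the observation
    render_mode=None,                # or "human" / "rgb_array"
)
```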
- reset(*, seed=None, options=None)[source]
 Reset the state and set/create a new interaction circuit.
To be used after an episode is finished.
- Parameters:
 seed (int | None) – Seed for the random number generator; should only be provided (optionally) on the first reset call, i.e., before any learning is done.
 options (Mapping[str, Any] | None) – Mapping with keyword arguments with additional options for the reset. Keywords can be found in the description of RoutingState.reset().
- Return type:
 tuple[dict[str, ndarray[Any, dtype[int32]]], dict[str, Any]]
- Returns:
 Initial observation and debugging info.
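A minimal episode sketch continuing the env constructed above. It assumes the gym-style step() and action_space provided by qgym's Environment base class; since the exact layout of the tuple returned by step() is not documented in this section, it is unpacked generically and the done flag at index 2 is an assumption.

```python
observation, info = env.reset(seed=42)   # initial observation and debugging info

terminated = False
while not terminated:
    action = env.action_space.sample()   # random agent, purely for illustration
    step_result = env.step(action)       # gym-style step from the Environment base class
    observation, reward = step_result[0], step_result[1]
    terminated = bool(step_result[2])    # assumes a done/terminated flag at index 2
```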
- class qgym.envs.routing.RoutingState(*, interaction_generator, max_observation_reach, connection_graph, observe_legal_surpasses, observe_connection_graph)[source]
 Bases: State[Dict[str, ndarray[Any, dtype[int32]]], int]
 The RoutingState class.
- __init__(*, interaction_generator, max_observation_reach, connection_graph, observe_legal_surpasses, observe_connection_graph)[source]
 Init of the RoutingState class.
- Parameters:
 interaction_generator (InteractionGenerator) – Interaction generator for generating interaction circuits. This generator is used to generate a new interaction circuit when RoutingState.reset() is called without an interaction circuit.
 max_observation_reach (int) – Sets a cap on the maximum number of gates the agent can see ahead when making an observation. When bigger than max_interaction_gates, the agent will always see all gates ahead in an observation.
 connection_graph (Graph) – networkx graph representation of the QPU topology. Each node represents a physical qubit and each edge represents a connection in the QPU topology.
 observe_legal_surpasses (bool) – If True, a boolean array of length max_observation_reach indicating whether the gates ahead can be executed will be added to the observation_space.
 observe_connection_graph (bool) – If True, the connection_graph will be incorporated in the observation_space. A reason to set it to False is that the QPU topology does not change, so an agent could infer the topology from the training data without it being explicitly added to the observations. This reduces the size of the observation_space.
- connection_graph
 networkx graph representation of the QPU topology. Each node represents a physical qubit and each edge represents a connection in the QPU topology.
- create_observation_space()[source]
 Create the corresponding observation space.
- Return type:
 - Returns:
 Observation space in the form of a Dict space containing:
 - MultiDiscrete space representing the interaction gates ahead of the current position.
 - MultiDiscrete space representing the current mapping of logical onto physical qubits.
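A small inspection sketch, continuing the env from above. It assumes the Routing environment exposes this Dict space as env.observation_space and that the space behaves like a standard gymnasium Dict space with a .spaces mapping.

```python
# The observation space of the Routing environment is the Dict space produced
# by RoutingState.create_observation_space().
for name, subspace in env.observation_space.spaces.items():
    print(name, subspace)  # the MultiDiscrete subspaces described above
```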
- edges
 List of all the edges, used to decode given actions.
- interaction_circuit
 An array of 2-tuples of integers, where every tuple represents an unspecified gate acting on the two qubits labeled by the integers in the tuple.
- interaction_generator
 Interaction generator used to generate a new interaction_circuit when reset() is called without an interaction circuit.
- is_done()[source]
 Checks if the current state is in a final state.
- Return type:
 bool
- Returns:
 Boolean value stating whether we are in a final state.
- is_legal_surpass(logical_qubit1, logical_qubit2)[source]
 Checks whether a surpass of the current gate ahead is legal.
- mapping
 Array of which each index represents a logical qubit and each value represents a physical qubit.
- max_observation_reach
 An integer that sets a cap on the maximum number of gates the agent can see ahead when making an observation. When bigger than max_interaction_gates, the agent will always see all gates ahead in an observation.
- position: int
 An integer representing before which gate in the interaction_circuit the agent currently is.
- reset(*, seed=None, interaction_circuit=None, **_kwargs)[source]
 Reset the state (in place) and load a new (random) initial state.
To be used after an episode is finished.
- Parameters:
 seed (int | None) – Seed for the random number generator; should only be provided (optionally) on the first reset call, i.e., before any learning is done.
 interaction_circuit – Optional interaction circuit, given as a list of tuples of ints that encode the interaction gates via the qubits the gates are acting on.
 _kwargs (Any) – Additional options to configure the reset.
- Return type:
 - Returns:
 Self.
- steps_done: int
 Number of steps done since the last reset.
- swap_gates_inserted: deque[tuple[int, int, int]]
 A deque of 3-tuples of integers, to register which gates to insert and where. Every tuple (g, q1, q2) represents the insertion of a SWAP-gate acting on logical qubits q1 and q2 before gate g in the interaction_circuit.
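As a sketch of how this bookkeeping can be consumed, the hypothetical helper below (not part of qgym) interleaves the registered SWAPs with the original interaction_circuit of a RoutingState, using exactly the (g, q1, q2) convention described above.

```python
def routed_circuit(state):
    """Hypothetical helper: merge the SWAPs in swap_gates_inserted into the
    interaction_circuit of a RoutingState."""
    swaps_before = {}
    for gate_index, qubit1, qubit2 in state.swap_gates_inserted:
        swaps_before.setdefault(gate_index, []).append(("swap", qubit1, qubit2))

    circuit = []
    for index, (qubit1, qubit2) in enumerate(state.interaction_circuit):
        circuit.extend(swaps_before.get(index, []))      # SWAPs inserted before gate `index`
        circuit.append(("interaction", qubit1, qubit2))  # the original two-qubit gate
    return circuit
```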
- update_state(action)[source]
 Update the state (in place) of this environment using the given action.
- Parameters:
 action (int) – Integer value in [0, n_connections]. Each value from 0 to n_connections - 1 corresponds to placing a SWAP, and this SWAP gate will be appended to the swap_gates_inserted deque. The value n_connections corresponds to a surpass (see the decoding sketch after this method).
- Return type:
 - Returns:
 Self.
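A small decoding sketch for this action encoding. describe_action is a hypothetical helper, but n_connections and edges are RoutingState attributes listed in this module; each edge is assumed to be a 2-tuple of physical qubits.

```python
def describe_action(action, state):
    """Hypothetical helper: describe what an action integer means for a RoutingState."""
    if action == state.n_connections:
        return "surpass the current gate"
    qubit1, qubit2 = state.edges[action]  # edges are used to decode SWAP actions
    return f"insert SWAP on connection ({qubit1}, {qubit2})"
```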
- class qgym.envs.routing.SwapQualityRewarder(illegal_action_penalty=-50, penalty_per_swap=-10, reward_per_surpass=10, good_swap_reward=5)[source]
 Bases: BasicRewarder
 Rewarder for the Routing environment which takes swap qualities into account.
 The SwapQualityRewarder has an adjusted reward w.r.t. the BasicRewarder in the sense that good SWAPs give lower penalties and bad SWAPs give higher penalties.
- __init__(illegal_action_penalty=-50, penalty_per_swap=-10, reward_per_surpass=10, good_swap_reward=5)[source]
 Set the rewards and penalties and a flag.
- Parameters:
 illegal_action_penalty (float) – Penalty for performing an illegal action. An action is illegal when the action means ‘surpass’ even though the next gate cannot be surpassed. This value should be negative (but is not required to be) and defaults to -50.
 penalty_per_swap (float) – Penalty for placing a swap. In general, we want as few swaps as possible. Therefore, this value should be negative and defaults to -10.
 reward_per_surpass (float) – Reward given for surpassing a gate. In general, we want to get to the end of the circuit as fast as possible. Therefore, this value should be positive and defaults to 10.
 good_swap_reward (float) – Reward given for placing a good swap. In general, we want to place as few swaps as possible; when a swap is good, however, the penalty for placing it should be dampened, which is what this reward does. The value should be positive and smaller in magnitude than penalty_per_swap, so that swaps never yield a net positive reward. Defaults to 5.
- compute_reward(*, old_state, action, new_state)[source]
 Compute a reward, based on the old state, the given action and the new state.
Specifically, the change in observation reach is used.
- Parameters:
 old_state (RoutingState) – RoutingState before the current action.
 action (int) – Action that has just been taken.
 new_state (RoutingState) – RoutingState after the current action.
- Return type:
 - Returns:
 The reward for this action. If the action is illegal, the reward is illegal_action_penalty. If the action is a legal surpass, the reward is simply reward_per_surpass. For a legal swap, the reward is adjusted with respect to the BasicRewarder: the penalty of a swap is reduced if it increases the observation_reach and increased if it decreases the observation_reach.
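A configuration sketch for this rewarder; the values and the line topology are illustrative assumptions, and the rewarder is attached to the environment through the rewarder keyword documented above.

```python
import networkx as nx

from qgym.envs.routing import Routing, SwapQualityRewarder

# Illustration only: reward good SWAPs a little, while still penalizing every SWAP.
rewarder = SwapQualityRewarder(
    illegal_action_penalty=-50,
    penalty_per_swap=-10,
    reward_per_surpass=10,
    good_swap_reward=5,   # kept smaller in magnitude than penalty_per_swap
)
env = Routing(nx.path_graph(4), rewarder=rewarder)
```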
- qgym.envs.routing.routing module
 - qgym.envs.routing.routing_rewarders module
 - qgym.envs.routing.routing_state module
 - qgym.envs.routing.routing_visualiser module