qgym.envs.routing package
Module containing the environment, rewarders, visualizer and other utils for the routing problem of OpenQL.
- class qgym.envs.routing.BasicRewarder(illegal_action_penalty=-50, penalty_per_swap=-10, reward_per_surpass=10)[source]
Bases: Rewarder
RL Rewarder, for computing rewards on the RoutingState.
- __init__(illegal_action_penalty=-50, penalty_per_swap=-10, reward_per_surpass=10)[source]
Set the rewards and penalties.
- Parameters:
  - illegal_action_penalty (float) – Penalty for performing an illegal action. An action is illegal when the action means ‘surpass’ even though the next gate cannot be surpassed. This value should be negative (although this is not required) and defaults to -50.
  - penalty_per_swap (float) – Penalty for placing a swap. In general, we want to use as few swaps as possible. Therefore, this value should be negative and defaults to -10.
  - reward_per_surpass (float) – Reward given for surpassing a gate. In general, we want to reach the end of the circuit as quickly as possible. Therefore, this value should be positive and defaults to 10.
- compute_reward(*, old_state, action, new_state)[source]
Compute a reward, based on the old state, new state, and the given action.
- Parameters:
  - old_state (RoutingState) – RoutingState before the current action.
  - action (int) – Action that has just been taken.
  - new_state (RoutingState) – RoutingState after the current action.
- Return type:
float
- Returns:
The reward for this action.
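The per-step reward these parameters imply can be sketched as follows. This is a simplified illustration rather than the actual implementation; the boolean arguments stand in for checks the rewarder performs on the RoutingState::

    def sketch_basic_reward(*, action_was_legal, placed_swap,
                            illegal_action_penalty=-50.0,
                            penalty_per_swap=-10.0,
                            reward_per_surpass=10.0):
        """Per-step reward implied by the three constructor parameters (sketch only)."""
        if not action_was_legal:
            return illegal_action_penalty
        # A legal action is either a SWAP placement or a surpass of the next gate.
        return penalty_per_swap if placed_swap else reward_per_surpass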
- class qgym.envs.routing.EpisodeRewarder(illegal_action_penalty=-50, penalty_per_swap=-10, reward_per_surpass=10)[source]
Bases: BasicRewarder
Rewarder for the Routing environment, which only gives a reward at the end of a full episode. The reward is highest for the lowest number of SWAPs. This could be improved by taking into account the fidelity of edges and considering on which edges the circuit is executed.
- compute_reward(*, old_state, action, new_state)[source]
Compute a reward, based on the new state and the given action.
- Parameters:
  - old_state (RoutingState) – RoutingState before the current action.
  - action (int) – Action that has just been taken.
  - new_state (RoutingState) – RoutingState after the current action.
- Return type:
float
- Returns:
If the action is illegal, returns the illegal_action_penalty. If the episode is finished, returns the reward calculated over the episode; otherwise returns 0.
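Like the other rewarders in this module, it can be passed to the Routing environment documented below through the rewarder argument; a brief sketch (the ring topology is made up for illustration)::

    import networkx as nx
    from qgym.envs.routing import EpisodeRewarder, Routing

    # Hypothetical 4-qubit ring topology, only for illustration.
    graph = nx.cycle_graph(4)
    env = Routing(connection_graph=graph, rewarder=EpisodeRewarder())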
- class qgym.envs.routing.Routing(connection_graph, interaction_generator=None, max_observation_reach=5, observe_legal_surpasses=True, observe_connection_graph=False, *, rewarder=None, render_mode=None)[source]
Bases: Environment[Dict[str, ndarray[Any, dtype[int32]]], int]
RL environment for the routing problem of OpenQL.
- __init__(connection_graph, interaction_generator=None, max_observation_reach=5, observe_legal_surpasses=True, observe_connection_graph=False, *, rewarder=None, render_mode=None)[source]
Initialize the action space, observation space, and initial states.
The supported render modes of this environment are "human" and "rgb_array".
- Parameters:
  - connection_graph (Union[Graph, Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]], list[int], list[Iterable[int]], tuple[int, ...], tuple[Iterable[int], ...]]) – Graph representation of the QPU topology. Each node represents a physical qubit and each edge represents a connection in the QPU topology. See parse_connection_graph() for supported formats.
  - interaction_generator (InteractionGenerator | None) – Interaction generator for generating interaction circuits. This generator is used to generate a new interaction circuit when Routing.reset() is called without an interaction circuit.
  - max_observation_reach (int) – Sets a cap on the maximum number of gates the agent can see ahead when making an observation. When bigger than max_interaction_gates, the agent will always see all gates ahead in an observation.
  - observe_legal_surpasses (bool) – If True, a boolean array of length observation_reach, indicating whether the gates ahead can be executed, will be added to the observation_space.
  - observe_connection_graph (bool) – If True, the connection_graph will be incorporated in the observation_space. A reason to set it to False is that the QPU topology practically does not change much for one machine, hence an agent is typically trained for just one QPU topology, which it can learn implicitly through rewards and/or the booleans (if they are shown, depending on the flag above). Default is False.
  - rewarder (Rewarder | None) – Rewarder to use for the environment. Must inherit from Rewarder. If None (default), then BasicRewarder is used.
  - render_mode (str | None) – If "human", open a pygame screen visualizing the step. If "rgb_array", return an RGB array encoding of the rendered frame on each render call.
- reset(*, seed=None, options=None)[source]
Reset the state and set/create a new interaction circuit.
To be used after an episode is finished.
- Parameters:
  - seed (int | None) – Seed for the random number generator, should only be provided (optionally) on the first reset call, i.e., before any learning is done.
  - options (Mapping[str, Any] | None) – Mapping with keyword arguments with additional options for the reset. Keywords can be found in the description of RoutingState.reset().
- Return type:
tuple[dict[str, ndarray[Any, dtype[int32]]], dict[str, Any]]
- Returns:
Initial observation and debugging info.
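A minimal usage sketch, assuming the gymnasium-style step() return of (observation, reward, terminated, truncated, info); the line topology below is made up for illustration::

    import networkx as nx
    from qgym.envs.routing import Routing

    # Hypothetical 5-qubit line topology; see parse_connection_graph() for other formats.
    graph = nx.Graph()
    graph.add_edges_from([(0, 1), (1, 2), (2, 3), (3, 4)])

    env = Routing(connection_graph=graph, max_observation_reach=5)
    observation, info = env.reset(seed=42)

    done = False
    while not done:
        action = env.action_space.sample()  # replace with a trained agent's policy
        # Assumes a gymnasium-style 5-tuple return; adjust if your version differs.
        observation, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated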
- class qgym.envs.routing.RoutingState(*, interaction_generator, max_observation_reach, connection_graph, observe_legal_surpasses, observe_connection_graph)[source]
Bases: State[Dict[str, ndarray[Any, dtype[int32]]], int]
The RoutingState class.
- __init__(*, interaction_generator, max_observation_reach, connection_graph, observe_legal_surpasses, observe_connection_graph)[source]
Init of the RoutingState class.
- Parameters:
  - interaction_generator (InteractionGenerator) – Interaction generator for generating interaction circuits. This generator is used to generate a new interaction circuit when RoutingState.reset() is called without an interaction circuit.
  - max_observation_reach (int) – Sets a cap on the maximum number of gates the agent can see ahead when making an observation. When bigger than max_interaction_gates, the agent will always see all gates ahead in an observation.
  - connection_graph (Graph) – networkx graph representation of the QPU topology. Each node represents a physical qubit and each edge represents a connection in the QPU topology.
  - observe_legal_surpasses (bool) – If True, a boolean array of length max_observation_reach, indicating whether the gates ahead can be executed, will be added to the observation_space.
  - observe_connection_graph (bool) – If True, the connection_graph will be incorporated in the observation_space. A reason to set it to False is that the QPU topology does not change, hence an agent could infer the topology from the training data without needing to explicitly add it to the observations. This reduces the size of the observation_space.
- connection_graph
networkx graph representation of the QPU topology. Each node represents a physical qubit and each edge represents a connection in the QPU topology.
- create_observation_space()[source]
Create the corresponding observation space.
- Return type:
Dict
- Returns:
Observation space in the form of a Dict space containing:
  - a MultiDiscrete space representing the interaction gates ahead of the current position;
  - a MultiDiscrete space representing the current mapping of logical onto physical qubits.
- edges
List of all the edges, used to decode given actions.
- interaction_circuit
An array of 2-tuples of integers, where every tuple represents an unspecified gate acting on the two qubits labeled by the integers in the tuple.
- interaction_generator
Interaction generator used to generate a new interaction_circuit when the state is reset without one.
- is_done()[source]
Checks if the current state is in a final state.
- Return type:
bool
- Returns:
Boolean value stating whether we are in a final state.
- is_legal_surpass(logical_qubit1, logical_qubit2)[source]
Checks whether a surpass of the current gate ahead is legal.
- mapping
Array of which each index represents a logical qubit and each value represents a physical qubit.
- max_observation_reach
An integer that sets a cap on the maximum number of gates the agent can see ahead when making an observation. When bigger than max_interaction_gates, the agent will always see all gates ahead in an observation.
- position: int
An integer representing before which gate in the interaction_circuit the agent currently is.
- reset(*, seed=None, interaction_circuit=None, **_kwargs)[source]
Reset the state (in place) and load a new (random) initial state.
To be used after an episode is finished.
- Parameters:
  - seed (int | None) – Seed for the random number generator, should only be provided (optionally) on the first reset call, i.e., before any learning is done.
  - interaction_circuit – Optional list of tuples of ints that encodes the interaction gates via the qubits the gates are acting on.
  - _kwargs (Any) – Additional options to configure the reset.
- Return type:
RoutingState
- Returns:
Self.
- steps_done: int
Number of steps done since the last reset.
- swap_gates_inserted: deque[tuple[int, int, int]]
A deque of 3-tuples of integers, to register which gates to insert and where. Every tuple (g, q1, q2) represents the insertion of a SWAP-gate acting on logical qubits q1 and q2 before gate g in the interaction_circuit.
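To illustrate how this registration can be read back, the following hedged sketch expands an interaction circuit with the registered SWAPs; the tuple layout follows the description above, and everything else is illustrative::

    def sketch_insert_swaps(interaction_circuit, swap_gates_inserted):
        """Build the routed gate sequence implied by swap_gates_inserted (sketch)."""
        routed = []
        for gate_index, gate in enumerate(interaction_circuit):
            # Insert every registered SWAP that belongs before this gate.
            for g, q1, q2 in swap_gates_inserted:
                if g == gate_index:
                    routed.append(("swap", q1, q2))
            routed.append(("gate", *gate))
        return routed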
- update_state(action)[source]
Update the state (in place) of this environment using the given action.
- Parameters:
  - action (int) – Integer value in [0, n_connections]. Each value from 0 to n_connections-1 corresponds to placing a SWAP, and this SWAP gate will be appended to the swap_gates_inserted deque. The value n_connections corresponds to a surpass.
- Return type:
RoutingState
- Returns:
Self.
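A hedged sketch of this action encoding, using the documented edges attribute; the assumption that n_connections equals len(state.edges) is ours::

    def sketch_decode_action(state, action):
        """Map an action value to either a SWAP on an edge or a surpass (sketch)."""
        n_connections = len(state.edges)      # assumption: one action per connection
        if action == n_connections:
            return "surpass"
        qubit1, qubit2 = state.edges[action]  # edge on which a SWAP would be placed
        return ("swap", qubit1, qubit2)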
- class qgym.envs.routing.SwapQualityRewarder(illegal_action_penalty=-50, penalty_per_swap=-10, reward_per_surpass=10, good_swap_reward=5)[source]
Bases: BasicRewarder
Rewarder for the Routing environment which takes swap qualities into account. The SwapQualityRewarder has an adjusted reward w.r.t. the BasicRewarder, in the sense that good SWAPs give lower penalties and bad SWAPs give higher penalties.
- __init__(illegal_action_penalty=-50, penalty_per_swap=-10, reward_per_surpass=10, good_swap_reward=5)[source]
Set the rewards and penalties.
- Parameters:
  - illegal_action_penalty (float) – Penalty for performing an illegal action. An action is illegal when the action means ‘surpass’ even though the next gate cannot be surpassed. This value should be negative (although this is not required) and defaults to -50.
  - penalty_per_swap (float) – Penalty for placing a swap. In general, we want to use as few swaps as possible. Therefore, this value should be negative and defaults to -10.
  - reward_per_surpass (float) – Reward given for surpassing a gate. In general, we want to reach the end of the circuit as quickly as possible. Therefore, this value should be positive and defaults to 10.
  - good_swap_reward (float) – Reward given for placing a good swap. In general, we want to place as few swaps as possible. However, when a swap is good, the penalty for its placement should be suppressed, which is what this reward does. The value should therefore be positive and smaller in magnitude than penalty_per_swap, so that swaps never yield a net positive reward; defaults to 5.
- compute_reward(*, old_state, action, new_state)[source]
Compute a reward, based on the old state, the given action and the new state.
Specifically, the change in observation reach is used.
- Parameters:
  - old_state (RoutingState) – RoutingState before the current action.
  - action (int) – Action that has just been taken.
  - new_state (RoutingState) – RoutingState after the current action.
- Return type:
float
- Returns:
The reward for this action. If the action is illegal, then the reward is the illegal_action_penalty. If the action is legal, then the reward for a surpass is just reward_per_surpass. But for a legal swap, the reward is adjusted with respect to the BasicRewarder: the penalty of a swap is reduced if it increases the observation_reach and increased if the observation_reach decreases.
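As with the other rewarders, it can be plugged into the environment through the rewarder argument; a brief sketch with made-up constructor values::

    import networkx as nx
    from qgym.envs.routing import Routing, SwapQualityRewarder

    rewarder = SwapQualityRewarder(penalty_per_swap=-10, good_swap_reward=5)
    env = Routing(connection_graph=nx.cycle_graph(4), rewarder=rewarder)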
- qgym.envs.routing.routing module
- qgym.envs.routing.routing_rewarders module
- qgym.envs.routing.routing_state module
RoutingState
RoutingState.__init__()
RoutingState.connection_graph
RoutingState.create_observation_space()
RoutingState.edges
RoutingState.interaction_circuit
RoutingState.interaction_generator
RoutingState.is_done()
RoutingState.is_legal_surpass()
RoutingState.mapping
RoutingState.max_observation_reach
RoutingState.n_connections
RoutingState.n_qubits
RoutingState.obtain_info()
RoutingState.obtain_observation()
RoutingState.position
RoutingState.reset()
RoutingState.steps_done
RoutingState.swap_gates_inserted
RoutingState.update_state()
- qgym.envs.routing.routing_visualiser module