qgym.envs.routing package

Module containing the environment, rewarders, visualizer and other utils for the routing problem of OpenQL.

class qgym.envs.routing.BasicRewarder(illegal_action_penalty=-50, penalty_per_swap=-10, reward_per_surpass=10)[source]

Bases: Rewarder

RL rewarder for computing rewards on the RoutingState.

__init__(illegal_action_penalty=-50, penalty_per_swap=-10, reward_per_surpass=10)[source]

Set the rewards and penalties.

Parameters:
  • illegal_action_penalty (float) – Penalty for performing an illegal action. An action is illegal when the action means ‘surpass’ even though the next gate cannot be surpassed. This value should be negative (though this is not enforced) and defaults to -50.

  • penalty_per_swap (float) – Penalty for placing a swap. In general, we want as few swaps as possible. Therefore, this value should be negative and defaults to -10.

  • reward_per_surpass (float) – Reward given for surpassing a gate. In general, we want to reach the end of the circuit as quickly as possible. Therefore, this value should be positive and defaults to 10.
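
A minimal construction sketch; the values are illustrative and only the constructor arguments documented above are used:

    from qgym.envs.routing import BasicRewarder

    # Punish illegal surpasses harder and SWAPs more mildly than the defaults.
    rewarder = BasicRewarder(
        illegal_action_penalty=-100,
        penalty_per_swap=-5,
        reward_per_surpass=10,
    )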

compute_reward(*, old_state, action, new_state)[source]

Compute a reward, based on the old state, new state, and the given action.

Parameters:
  • old_state (RoutingState) – RoutingState before the current action.

  • action (int) – Action that has just been taken.

  • new_state (RoutingState) – RoutingState after the current action.

Return type:

float

Returns:

The reward for this action.

class qgym.envs.routing.EpisodeRewarder(illegal_action_penalty=-50, penalty_per_swap=-10, reward_per_surpass=10)[source]

Bases: BasicRewarder

Rewarder for the Routing environment which only gives a reward at the end of a full episode. The reward is highest for the lowest number of SWAPs. This could be improved by taking into account the fidelity of edges and scoring based on which edges the circuit is executed on.

compute_reward(*, old_state, action, new_state)[source]

Compute a reward, based on the new state and the given action.

Parameters:
  • old_state (RoutingState) – RoutingState before the current action.

  • action (int) – Action that has just been taken.

  • new_state (RoutingState) – RoutingState after the current action.

Return type:

float

Returns:

If the action is illegal, returns the illegal_action_penalty. If the episode is finished, returns the reward calculated over the episode; otherwise returns 0.
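
As a sketch, an EpisodeRewarder can be plugged into the Routing environment via its rewarder argument; the 4-qubit line topology below is illustrative:

    import networkx as nx
    from qgym.envs.routing import EpisodeRewarder, Routing

    # Line topology 0-1-2-3; rewards are only given at the end of an episode.
    env = Routing(nx.path_graph(4), rewarder=EpisodeRewarder())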

class qgym.envs.routing.Routing(connection_graph, interaction_generator=None, max_observation_reach=5, observe_legal_surpasses=True, observe_connection_graph=False, *, rewarder=None, render_mode=None)[source]

Bases: Environment[Dict[str, ndarray[Any, dtype[int32]]], int]

RL environment for the routing problem of OpenQL.

__init__(connection_graph, interaction_generator=None, max_observation_reach=5, observe_legal_surpasses=True, observe_connection_graph=False, *, rewarder=None, render_mode=None)[source]

Initialize the action space, observation space, and initial states.

The supported render modes of this environment are "human" and "rgb_array".

Parameters:
  • connection_graph (Union[Graph, Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]], list[int], list[Iterable[int]], tuple[int, ...], tuple[Iterable[int], ...]]) – Graph representation of the QPU topology. Each node represents a physical qubit and each edge represents a connection in the QPU topology. See parse_connection_graph() for supported formats.

  • interaction_generator (InteractionGenerator | None) – Interaction generator for generating interaction circuits. This generator is used to generate a new interaction circuit when Routing.reset() is called without an interaction circuit.

  • max_observation_reach (int) – Sets a cap on the maximum number of gates the agent can see ahead when making an observation. When bigger than max_interaction_gates, the agent will always see all gates ahead in an observation.

  • observe_legal_surpasses (bool) – If True, a boolean array of length observation_reach, indicating whether the gates ahead can be executed, will be added to the observation_space.

  • observe_connection_graph (bool) – If True, the connection_graph will be incorporated in the observation_space. A reason to set it to False is that the QPU topology practically doesn’t change for a single machine; an agent is typically trained for just one QPU topology, which it can learn implicitly from the rewards and/or the legal-surpass booleans (if shown, depending on the flag above). Defaults to False.

  • rewarder (Rewarder | None) – Rewarder to use for the environment. Must inherit from Rewarder. If None (default), then BasicRewarder is used.

  • render_mode (str | None) – If "human" open a pygame screen visualizing the step. If "rgb_array", return an RGB array encoding of the rendered frame on each render call.
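
A minimal usage sketch is shown below; the connection graph, hyperparameter values, and the gymnasium-style interaction loop (action_space.sample() and a 5-tuple from step()) are illustrative assumptions, not part of the documented signature above:

    import networkx as nx
    from qgym.envs.routing import Routing

    # Illustrative 5-qubit star topology; any format accepted by
    # parse_connection_graph() can be used for the connection graph.
    connection_graph = nx.star_graph(4)

    env = Routing(
        connection_graph,
        max_observation_reach=5,
        observe_legal_surpasses=True,
        render_mode=None,
    )

    # Random-agent rollout, assuming a gymnasium-style interface
    # (action_space.sample() and a 5-tuple returned by step()).
    observation, info = env.reset(seed=42)
    done = False
    while not done:
        action = env.action_space.sample()
        observation, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated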

reset(*, seed=None, options=None)[source]

Reset the state and set/create a new interaction circuit.

To be used after an episode is finished.

Parameters:
  • seed (int | None) – Seed for the random number generator, should only be provided (optionally) on the first reset call, i.e., before any learning is done.

  • options (Mapping[str, Any] | None) – Mapping with keyword arguments with additional options for the reset. Keywords can be found in the description of RoutingState.reset().

Return type:

tuple[dict[str, ndarray[Any, dtype[int32]]], dict[str, Any]]

Returns:

Initial observation and debugging info.
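
Continuing the sketch above, a reset with a fixed interaction circuit could look as follows, assuming the options mapping forwards the interaction_circuit keyword to RoutingState.reset():

    # Interaction circuit of three two-qubit gates, given as pairs of logical qubits.
    interaction_circuit = [(0, 1), (1, 2), (0, 2)]

    # Seed only on the first reset; later resets reuse the generator state.
    observation, info = env.reset(
        seed=123,
        options={"interaction_circuit": interaction_circuit},
    )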

class qgym.envs.routing.RoutingState(*, interaction_generator, max_observation_reach, connection_graph, observe_legal_surpasses, observe_connection_graph)[source]

Bases: State[Dict[str, ndarray[Any, dtype[int32]]], int]

The RoutingState class.

__init__(*, interaction_generator, max_observation_reach, connection_graph, observe_legal_surpasses, observe_connection_graph)[source]

Init of the RoutingState class.

Parameters:
  • interaction_generator (InteractionGenerator) – Interaction generator for generating interaction circuits. This generator is used to generate a new interaction circuit when RoutingState.reset() is called without an interaction circuit.

  • max_observation_reach (int) – Sets a cap on the maximum number of gates the agent can see ahead when making an observation. When bigger than max_interaction_gates, the agent will always see all gates ahead in an observation.

  • connection_graph (Graph) – networkx graph representation of the QPU topology. Each node represents a physical qubit and each edge represents a connection in the QPU topology.

  • observe_legal_surpasses (bool) – If True, a boolean array of length max_observation_reach, indicating whether the gates ahead can be executed, will be added to the observation_space.

  • observe_connection_graph (bool) – If True, the connection_graph will be incorporated in the observation_space. A reason to set it to False is that the QPU topology doesn’t change, so an agent could infer the topology from the training data without it being explicitly added to the observations. This reduces the size of the observation_space.

connection_graph

networkx graph representation of the QPU topology. Each node represents a physical qubit and each edge represents a connection in the QPU topology.

create_observation_space()[source]

Create the corresponding observation space.

Return type:

Dict

Returns:

Observation space in the form of a Dict space containing:

  • MultiDiscrete space representing the interaction gates ahead of current position.

  • MultiDiscrete space representing the current mapping of logical onto physical qubits.

edges

List of all the edges, used to decode given actions.

interaction_circuit

An array of 2-tuples of integers, where every tuple represents an unspecified gate acting on the two qubits labeled by the integers in the tuple.
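
For illustration, a circuit with gates on qubit pairs (0, 1), (1, 2), and (0, 2) could be stored as follows (the exact array layout is an assumption):

    import numpy as np

    # One row per gate; the columns are the two logical qubits the gate acts on.
    interaction_circuit = np.array([[0, 1], [1, 2], [0, 2]])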

interaction_generator

Interaction generator used to generate a new interaction circuit when reset() is called without an interaction circuit.

is_done()[source]

Checks if the current state is in a final state.

Return type:

bool

Returns:

Boolean value stating whether we are in a final state.

is_legal_surpass(logical_qubit1, logical_qubit2)[source]

Checks whether a surpass of the current gate ahead is legal.

Parameters:
  • logical_qubit1 (int) – First logical qubit of the interaction.

  • logical_qubit2 (int) – Second logical qubit of the interaction.

Return type:

bool

Returns:

A boolean value stating whether a gate acting on the two qubits can be executed with the current mapping and connection graph.

mapping

Array in which each index represents a logical qubit and each value represents a physical qubit.

max_observation_reach

An integer that sets a cap on the maximum number of gates the agent can see ahead when making an observation. When bigger than max_interaction_gates, the agent will always see all gates ahead in an observation.

property n_connections: int

Number of connections in the connection_graph.

property n_qubits: int

Number of qubits in the connection_graph.

obtain_info()[source]

Obtain additional information of the current state.

Return type:

dict[str, int | deque[tuple[int, int, int]] | ndarray[Any, dtype[int32]] | list[tuple[int]]]

Returns:

Dictionary containing optional debugging info for the current state.

obtain_observation()[source]

Observe the current state.

Return type:

dict[str, ndarray[Any, dtype[int32]]]

Returns:

Observation based on the current state.

position: int

An integer indicating before which gate in the interaction_circuit the agent currently is.

reset(*, seed=None, interaction_circuit=None, **_kwargs)[source]

Reset the state (in place) and load a new (random) initial state.

To be used after an episode is finished.

Parameters:
  • seed (int | None) – Seed for the random number generator, should only be provided (optionally) on the first reset call, i.e., before any learning is done.

  • interaction_circuit – Optional list of tuples of ints that encode the interaction gates via the qubits the gates act on.

  • _kwargs (Any) – Additional options to configure the reset.

Return type:

RoutingState

Returns:

Self.

steps_done: int

Number of steps done since the last reset.

swap_gates_inserted: deque[tuple[int, int, int]]

A deque of 3-tuples of integers, to register which gates to insert and where. Every tuple (g, q1, q2) represents the insertion of a SWAP-gate acting on logical qubits q1 and q2 before gate g in the interaction_circuit.
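
For illustration, a single SWAP on logical qubits 1 and 2 inserted before gate 3 of the interaction_circuit would be registered as:

    from collections import deque

    # (g, q1, q2): SWAP on logical qubits q1 and q2, inserted before gate g.
    swap_gates_inserted = deque([(3, 1, 2)])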

update_state(action)[source]

Update the state (in place) of this environment using the given action.

Parameters:

action (int) – Integer value in [0, n_connections]. Each value from 0 to n_connections-1 corresponds to placing a SWAP, and this SWAP gate will be appended to the swap_gates_inserted deque. The value n_connections corresponds to a surpass (see the sketch below this entry).

Return type:

RoutingState

Returns:

Self.
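
A sketch of the action encoding described above, assuming state is an existing RoutingState instance:

    # Actions 0 .. n_connections-1 place a SWAP on the corresponding edge;
    # the value n_connections means 'surpass the current gate'.
    n_connections = state.n_connections

    swap_action = 0                  # SWAP on state.edges[0]
    surpass_action = n_connections   # surpass the gate ahead (if legal)

    state.update_state(swap_action)     # appends a SWAP to swap_gates_inserted
    state.update_state(surpass_action)  # advances position past the current gate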

class qgym.envs.routing.SwapQualityRewarder(illegal_action_penalty=-50, penalty_per_swap=-10, reward_per_surpass=10, good_swap_reward=5)[source]

Bases: BasicRewarder

Rewarder for the Routing environment which takes swap qualities into account.

The SwapQualityRewarder has an adjusted reward w.r.t. the BasicRewarder in the sense that good SWAPs give lower penalties and bad SWAPs give higher penalties.

__init__(illegal_action_penalty=-50, penalty_per_swap=-10, reward_per_surpass=10, good_swap_reward=5)[source]

Set the rewards and penalties and a flag.

Parameters:
  • illegal_action_penalty (float) – Penalty for performing an illegal action. An action is illegal when the action means ‘surpass’ even though the next gate cannot be surpassed. This value should be negative (though this is not enforced) and defaults to -50.

  • penalty_per_swap (float) – Penalty for placing a swap. In general, we want as few swaps as possible. Therefore, this value should be negative and defaults to -10.

  • reward_per_surpass (float) – Reward given for surpassing a gate. In general, we want to reach the end of the circuit as quickly as possible. Therefore, this value should be positive and defaults to 10.

  • good_swap_reward (float) – Reward given for placing a good swap. In general, we want to place as few swaps as possible. However, when a swap is good, the penalty for placing it should be dampened; that is what this reward does. The value should therefore be positive and smaller than the magnitude of penalty_per_swap, so that swaps never yield a net positive reward. Defaults to 5.

compute_reward(*, old_state, action, new_state)[source]

Compute a reward, based on the old state, the given action and the new state.

Specifically, the change in observation reach is used.

Parameters:
  • old_state (RoutingState) – RoutingState before the current action.

  • action (int) – Action that has just been taken.

  • new_state (RoutingState) – RoutingState after the current action.

Return type:

float

Returns:

The reward for this action. If the action is illegal, then the reward is illegal_action_penalty. If the action is legal, then the reward for a surpass is just reward_per_surpass. For a legal swap, however, the reward is adjusted with respect to the BasicRewarder: the penalty of a swap is reduced if it increases the observation reach and increased if it decreases the observation reach.
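
A construction sketch, passing the rewarder to the environment; the 5-qubit cycle topology is illustrative:

    import networkx as nx
    from qgym.envs.routing import Routing, SwapQualityRewarder

    # Dampen the SWAP penalty when a SWAP increases the observation reach.
    rewarder = SwapQualityRewarder(
        illegal_action_penalty=-50,
        penalty_per_swap=-10,
        reward_per_surpass=10,
        good_swap_reward=5,
    )

    env = Routing(nx.cycle_graph(5), rewarder=rewarder)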