qgym.envs.initial_mapping package

Module containing the environment, rewarders and visualizer for the initial mapping problem of OpenQL.

class qgym.envs.initial_mapping.BasicRewarder(illegal_action_penalty=-100, reward_per_edge=5, penalty_per_edge=-1)[source]

Bases: Rewarder

Basic rewarder for the InitialMapping environment.

__init__(illegal_action_penalty=-100, reward_per_edge=5, penalty_per_edge=-1)[source]

Initialize the reward range and set the rewards and penalties.

Parameters:
  • illegal_action_penalty (float) – Penalty for performing an illegal action. An action is illegal if it contains a virtual or physical qubit that has already been mapped. This value should be negative (though this is not enforced) and defaults to -100.

  • reward_per_edge (float) – Reward gained per ‘good’ edge in the interaction graph. An edge is ‘good’ if the mapped edge overlaps with an edge of the connection graph. This value should be positive (though this is not enforced) and defaults to 5.

  • penalty_per_edge (float) – Penalty given per ‘bad’ edge in the interaction graph. An edge is ‘bad’ if it is mapped and is not ‘good’. This value should be negative (though this is not enforced) and defaults to -1.

compute_reward(*, old_state, action, new_state)[source]

Compute a reward based on the new state and the given action.

Specifically, the connection graph, interaction graph, and mapping are used.

Parameters:
  • old_state – State of the environment before the action was taken.

  • action – Action that has just been taken.

  • new_state – State of the environment after the action has been taken.

Return type:

float

Returns:

The reward for this action. If the action is illegal, then the reward is illegal_action_penalty. If the action is legal, then the reward is the total number of ‘good’ edges times reward_per_edge plus the total number of ‘bad’ edges times penalty_per_edge.
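
As an illustration of how these values combine, here is a minimal sketch assuming a hypothetical partial mapping with three ‘good’ and two ‘bad’ edges (the edge counts are made up for the example):

    from qgym.envs.initial_mapping import BasicRewarder

    rewarder = BasicRewarder()  # defaults: illegal_action_penalty=-100, reward_per_edge=5, penalty_per_edge=-1

    # Hypothetical edge counts for the mapped part of an interaction graph.
    n_good_edges = 3  # mapped edges that overlap with a connection-graph edge
    n_bad_edges = 2   # mapped edges that do not

    # Reward for a legal action under the default values:
    reward = n_good_edges * 5 + n_bad_edges * (-1)  # 15 - 2 = 13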

class qgym.envs.initial_mapping.EpisodeRewarder(illegal_action_penalty=-100, reward_per_edge=5, penalty_per_edge=-1)[source]

Bases: BasicRewarder

Rewarder for the InitialMapping environment, which only gives a reward at the end of the episode or when an illegal action is taken.

compute_reward(*, old_state, action, new_state)[source]

Compute a reward based on the new state and the given action.

Specifically, the connection graph, interaction graph, and mapping are used.

Parameters:
  • old_state – State of the environment before the action was taken.

  • action – Action that has just been taken.

  • new_state – State of the environment after the action has been taken.

Return type:

float

Returns:

The reward for this action. If the action is illegal, then the reward is illegal_action_penalty. If the action is legal, but the mapping is not yet finished, then the reward is 0. If the action is legal, and the mapping is finished, then the reward is the number of ‘good’ edges times reward_per_edge plus the number of ‘bad’ edges times penalty_per_edge.
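
A rewarder is selected by passing an instance to the environment's rewarder keyword argument. A minimal sketch, assuming a networkx Graph is accepted directly as the connection graph (the 5-qubit ring below is only an example topology):

    import networkx as nx
    from qgym.envs.initial_mapping import EpisodeRewarder, InitialMapping

    connection_graph = nx.cycle_graph(5)  # example QPU topology: 5 qubits in a ring
    env = InitialMapping(connection_graph, rewarder=EpisodeRewarder())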

class qgym.envs.initial_mapping.InitialMapping(connection_graph, graph_generator=None, *, rewarder=None, render_mode=None)[source]

Bases: Environment[Dict[str, ndarray[Any, dtype[int32]]], ndarray[Any, dtype[int32]]]

RL environment for the initial mapping problem of OpenQL.

__init__(connection_graph, graph_generator=None, *, rewarder=None, render_mode=None)[source]

Initialize the action space, observation space, and initial states. Furthermore, the connection graph and the graph generator for the random interaction graph of each episode are defined.

The supported render modes of this environment are "human" and "rgb_array".

Parameters:
  • connection_graph (Graph) – networkx Graph representation of the QPU topology. Each node represents a physical qubit and each edge represents a connection in the QPU topology.

  • graph_generator (GraphGenerator | None) – Graph generator used to generate a new interaction graph for each episode.

  • rewarder (Rewarder | None) – Rewarder to use for the environment.

  • render_mode (str | None) – Render mode to use. The supported render modes are "human" and "rgb_array".

action_space: Space[Any]

The action space of this environment.

metadata: dict[str, Any]

Additional metadata of this environment.

observation_space: Space[Any]

The observation space of this environment.

reset(*, seed=None, options=None)[source]

Reset the state and set a new interaction graph.

To be used after an episode is finished.

Parameters:
  • seed (int | None) – Seed for the random number generator; should only be provided (optionally) on the first reset call, i.e. before any learning is done.

  • options (Mapping[str, Any] | None) – Mapping with keyword arguments with additional options for the reset. Keywords can be found in the description of InitialMappingState.reset().

Return type:

tuple[dict[str, ndarray[Any, dtype[int32]]], dict[str, Any]]

Returns:

Initial observation and debugging info.
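
A minimal interaction loop, shown as a sketch: the options key and the gymnasium-style 5-tuple returned by step() are assumptions based on the descriptions above and the standard Environment interface, and the graphs are arbitrary examples.

    import networkx as nx
    from qgym.envs.initial_mapping import InitialMapping

    env = InitialMapping(nx.cycle_graph(5))

    # Optionally fix the interaction graph for this episode; the accepted
    # keywords are described in InitialMappingState.reset().
    observation, info = env.reset(options={"interaction_graph": nx.path_graph(5)})

    terminated = truncated = False
    while not (terminated or truncated):
        action = env.action_space.sample()  # random agent, for illustration only
        observation, reward, terminated, truncated, info = env.step(action)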

class qgym.envs.initial_mapping.InitialMappingState(connection_graph, graph_generator)[source]

Bases: State[Dict[str, ndarray[Any, dtype[int32]]], ndarray[Any, dtype[int32]]]

The InitialMappingState class.

__init__(connection_graph, graph_generator)[source]

Init of the InitialMappingState class.

Parameters:
  • connection_graph (Graph) – networkx Graph representation of the QPU topology. Each node represents a physical qubit and each edge represents a connection in the QPU topology.

  • graph_generator (GraphGenerator) – Graph generator for generating interaction graphs. This generator is used to generate a new interaction graph when InitialMappingState.reset() is called without an interaction graph.

create_observation_space()[source]

Create the corresponding observation space.

Return type:

Dict

Returns:

Observation space in the form of a Dict space containing the following values if the connection graph has no fidelity information:

graphs

Dictionary containing the graph and matrix representations of both the interaction graph and the connection graph.
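
The environment built on this state is expected to expose the same space through its observation_space attribute (see InitialMapping above); a quick way to inspect it, using an arbitrary example topology:

    import networkx as nx
    from qgym.envs.initial_mapping import InitialMapping

    env = InitialMapping(nx.cycle_graph(5))
    print(env.observation_space)  # Dict space with the entries described above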

is_done()[source]

Determine if the state is done or not.

Return type:

bool

Returns:

Boolean value stating whether we are in a final state.

is_truncated()[source]

Determine if the episode should be truncated or not.

Return type:

bool

Returns:

Boolean value stating whether the episode should be truncated. The episode is truncated if the number of steps in the current episode is more than 10 times the number of nodes in the connection graph.

mapped_qubits: dict[str, set[int]]

Dictionary with two sets containing mapped physical and logical qubits.

mapping

Array in which the index represents a physical qubit and the value a virtual qubit. A value of n_nodes + 1 represents the case when nothing has been mapped to the physical qubit yet.

mapping_dict: dict[int, int]

Dictionary that maps logical qubits (keys) to physical qubits (values).

property n_nodes: int

The number of physical qubits.

obtain_info()[source]

Obtain additional information.

Return type:

dict[str, Any]

Returns:

Optional debugging info for the current state.

obtain_observation()[source]

Obtain an observation based on the current state.

Return type:

dict[str, ndarray[Any, dtype[int32]]]

Returns:

Observation based on the current state.

reset(*, seed=None, interaction_graph=None, **_kwargs)[source]

Reset the state and set a new interaction graph.

To be used after an episode is finished.

Parameters:
  • seed (int | None) – Seed for the random number generator; should only be provided (optionally) on the first reset call, i.e. before any learning is done.

  • interaction_graph (Graph | None) – Interaction graph to be used for the next iteration. If None, a random interaction graph will be created.

  • _kwargs (Any) – Additional options to configure the reset.

Return type:

InitialMappingState

Returns:

(self) New initial state.

steps_done: int

Number of steps done since the last reset.

update_state(action)[source]

Update the state (in place) of this environment using the given action.

Parameters:

action (ndarray[Any, dtype[int32]]) – Mapping action to be executed.

Return type:

InitialMappingState

Returns:

Self.

class qgym.envs.initial_mapping.SingleStepRewarder(illegal_action_penalty=-100, reward_per_edge=5, penalty_per_edge=-1)[source]

Bases: BasicRewarder

Rewarder for the InitialMapping environment, which gives a reward based on the improvement in the current step.

compute_reward(*, old_state, action, new_state)[source]

Compute a reward based on the new state and the given action.

Specifically, the connection graph, interaction graph, and mapping are used.

Parameters:
  • old_state – State of the environment before the action was taken.

  • action – Action that has just been taken.

  • new_state – State of the environment after the action has been taken.

Return type:

float

Returns:

The reward for this action. If the action is illegal, then the reward is illegal_action_penalty. If the action is legal, then the reward is the number of ‘good’ edges created by this action times reward_per_edge plus the number of ‘bad’ edges created by this action times penalty_per_edge.
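
As a worked example of the incremental reward, assuming the default values and a hypothetical legal step that creates one ‘good’ and one ‘bad’ edge:

    from qgym.envs.initial_mapping import SingleStepRewarder

    rewarder = SingleStepRewarder()  # defaults: -100, 5, -1

    # Edges newly created by this single mapping step (hypothetical counts).
    new_good_edges = 1
    new_bad_edges = 1
    reward = new_good_edges * 5 + new_bad_edges * (-1)  # 5 - 1 = 4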