qgym.envs package

Specific environments of this RL Gym in the Quantum domain. This package contains the InitialMapping, Routing and Scheduling environments, which model their respective OpenQL passes.
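A minimal import sketch (assuming qgym and its dependencies are installed); each environment is documented below:

    from qgym.envs import InitialMapping, Routing, Scheduling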

class qgym.envs.InitialMapping(connection_graph, graph_generator=None, *, rewarder=None, render_mode=None)[source]

Bases: Environment[Dict[str, ndarray[Any, dtype[int32]]], ndarray[Any, dtype[int32]]]

RL environment for the initial mapping problem of OpenQL.

__init__(connection_graph, graph_generator=None, *, rewarder=None, render_mode=None)[source]

Initialize the action space, observation space, and initial states. Furthermore, the connection graph and the edge probability for the random interaction graph of each episode are defined.

The supported render modes of this environment are "human" and "rgb_array".

Parameters:
  • connection_graph (Graph | ArrayLike | list[int] | list[Iterable[int]] | tuple[int, ...] | tuple[Iterable[int], ...]) – Graph representation of the QPU topology. Each node represents a physical qubit and each edge represents a connection in the QPU topology. See parse_connection_graph() for supported formats.

  • graph_generator (GraphGenerator | None) – Graph generator for generating interaction graphs. This generator is used to generate a new interaction graph when InitialMapping.reset() is called without an interaction graph.

  • rewarder (Rewarder | None) – Rewarder to use for the environment. Must inherit from Rewarder. If None (default), then BasicRewarder is used.

  • render_mode (str | None) – If "human", open a pygame screen visualizing the step. If "rgb_array", return an RGB array encoding of the rendered frame on each render call.

action_space: Space[Any]

The action space of this environment.

metadata: dict[str, Any]

Additional metadata of this environment.

observation_space: Space[Any]

The observation space of this environment.
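A minimal construction sketch, assuming a networkx Graph is one of the accepted connection_graph formats (see parse_connection_graph()); the topology below is purely illustrative:

    import networkx as nx

    from qgym.envs import InitialMapping

    # Hypothetical 4-qubit QPU topology: physical qubits 0-3 connected in a cycle.
    connection_graph = nx.cycle_graph(4)

    # Default graph generator and rewarder; no rendering.
    env = InitialMapping(connection_graph)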

reset(*, seed=None, options=None)[source]

Reset the state and set a new interaction graph.

To be used after an episode is finished.

Parameters:
  • seed (int | None) – Seed for the random number generator. Should only be provided (optionally) on the first reset call, i.e., before any learning is done.

  • options (Mapping[str, Any] | None) – Mapping with keyword arguments with additional options for the reset. Keywords can be found in the description of InitialMappingState.reset().

Return type:

tuple[dict[str, ndarray[Any, dtype[int32]]], dict[str, Any]]

Returns:

Initial observation and debugging info.
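A hedged episode-rollout sketch, continuing with the env constructed above. It assumes the usual gym-style reset()/step() loop and samples random actions from action_space; the *rest unpacking absorbs the remaining step outputs, whose exact arity is not documented here:

    observation, info = env.reset(seed=42)

    is_done = False
    while not is_done:
        action = env.action_space.sample()  # random mapping action (illustration only)
        observation, reward, is_done, *rest = env.step(action)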

class qgym.envs.Routing(connection_graph, interaction_generator=None, max_observation_reach=5, observe_legal_surpasses=True, observe_connection_graph=False, *, rewarder=None, render_mode=None)[source]

Bases: Environment[Dict[str, ndarray[Any, dtype[int32]]], int]

RL environment for the routing problem of OpenQL.

__init__(connection_graph, interaction_generator=None, max_observation_reach=5, observe_legal_surpasses=True, observe_connection_graph=False, *, rewarder=None, render_mode=None)[source]

Initialize the action space, observation space, and initial states.

The supported render modes of this environment are "human" and "rgb_array".

Parameters:
  • connection_graph (Graph | ArrayLike | list[int] | list[Iterable[int]] | tuple[int, ...] | tuple[Iterable[int], ...]) – Graph representation of the QPU topology. Each node represents a physical qubit and each edge represents a connection in the QPU topology. See parse_connection_graph() for supported formats.

  • interaction_generator (InteractionGenerator | None) – Interaction generator for generating interaction circuits. This generator is used to generate a new interaction circuit when Routing.reset() is called without an interaction circuit.

  • max_observation_reach (int) – Sets a cap on the maximum number of gates the agent can see ahead when making an observation. When larger than max_interaction_gates, the agent will always see all gates ahead in an observation.

  • observe_legal_surpasses (bool) – If True, a boolean array of length observation_reach, indicating whether the gates ahead can be executed, will be added to the observation_space.

  • observe_connection_graph (bool) – If True, the connection_graph will be incorporated in the observation_space. The main reason to keep this False is that the QPU topology rarely changes for a given machine; an agent is typically trained for a single topology, which it can learn implicitly through the rewards and/or the legal-surpass booleans (if observe_legal_surpasses is True). Default is False.

  • rewarder (Rewarder | None) – Rewarder to use for the environment. Must inherit from Rewarder. If None (default), then BasicRewarder is used.

  • render_mode (str | None) – If "human", open a pygame screen visualizing the step. If "rgb_array", return an RGB array encoding of the rendered frame on each render call.
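A minimal construction sketch for Routing, under the same assumption that a networkx Graph is an accepted connection_graph format; the keyword values simply restate the documented defaults:

    import networkx as nx

    from qgym.envs import Routing

    connection_graph = nx.cycle_graph(4)  # hypothetical 4-qubit QPU topology

    env = Routing(
        connection_graph,
        max_observation_reach=5,
        observe_legal_surpasses=True,
        observe_connection_graph=False,
    )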

reset(*, seed=None, options=None)[source]

Reset the state and set/create a new interaction circuit.

To be used after an episode is finished.

Parameters:
  • seed (int | None) – Seed for the random number generator. Should only be provided (optionally) on the first reset call, i.e., before any learning is done.

  • options (Mapping[str, Any] | None) – Mapping with keyword arguments with additional options for the reset. Keywords can be found in the description of RoutingState.reset().

Return type:

tuple[dict[str, ndarray[Any, dtype[int32]]], dict[str, Any]]

Returns:

Initial observation and debugging info.

class qgym.envs.Scheduling(machine_properties, *, max_gates=200, dependency_depth=1, circuit_generator=None, rulebook=None, rewarder=None, render_mode=None)[source]

Bases: Environment[Dict[str, ndarray[Any, dtype[int32]] | ndarray[Any, dtype[int8]]], ndarray[Any, dtype[int32]]]

RL environment for the scheduling problem.

__init__(machine_properties, *, max_gates=200, dependency_depth=1, circuit_generator=None, rulebook=None, rewarder=None, render_mode=None)[source]

Initialize the action space, observation space, and initial states for the scheduling environment.

Parameters:
  • machine_properties (Mapping[str, Any] | str | MachineProperties) – A MachineProperties object, a Mapping containing machine properties, or a string with a filename for a file containing the machine properties.

  • max_gates (int) – Maximum number of gates allowed in a circuit. Defaults to 200.

  • dependency_depth (int) – Number of dependencies given in the observation. Determines the shape of the dependencies observation, which has the shape (dependency_depth, max_gates). Defaults to 1.

  • circuit_generator (CircuitGenerator | None) – Generator class for generating circuits for training.

  • rulebook (CommutationRulebook | None) – CommutationRulebook describing the commutation rules. If None (default) is given, a default CommutationRulebook will be used. (See CommutationRulebook for more info on the default rules.)

  • rewarder (Rewarder | None) – Rewarder to use for the environment. If None (default), then a default BasicRewarder is used.

  • render_mode (str | None) – If "human", open a pygame screen visualizing the step. If "rgb_array", return an RGB array encoding of the rendered frame on each render call.
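A minimal construction sketch; the machine-properties filename below is a hypothetical placeholder, and its contents must follow the format expected by MachineProperties (number of qubits, gate durations, machine restrictions):

    from qgym.envs import Scheduling

    # "machine_properties.json" is a placeholder path; alternatively pass a
    # MachineProperties object or a Mapping with the machine properties.
    env = Scheduling("machine_properties.json", max_gates=200, dependency_depth=1)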

get_circuit(mode='human')[source]

Return the quantum circuit of this episode.

Parameters:

mode (str) – Choose from "human" or "encoded". Defaults to "human".

Raises:

ValueError – If an unsupported mode is provided.

Return type:

list[Gate]

Returns:

Human or encoded quantum circuit.
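A short usage sketch for get_circuit(), continuing with the Scheduling env constructed above; the comments on the two documented modes are assumptions about how the returned circuits differ:

    observation, info = env.reset(seed=0)

    circuit = env.get_circuit()                # mode="human" (default): list[Gate]
    encoded = env.get_circuit(mode="encoded")  # same circuit, encoded representation

    # Any other mode raises ValueError.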

reset(*, seed=None, options=None)[source]

Reset the state and action space, and load a new (random) initial state.

To be used after an episode is finished.

Parameters:
  • seed (int | None) – Seed for the random number generator. Should only be provided (optionally) on the first reset call, i.e., before any learning is done.

  • options (Mapping[str, Any] | None) – Mapping with keyword arguments with additional options for the reset. Keywords can be found in the description of SchedulingState.reset().

Return type:

tuple[dict[str, ndarray[Any, dtype[int32]] | ndarray[Any, dtype[int8]]], dict[str, Any]]

Returns:

Initial observation and debugging info.