qgym.envs.initial_mapping.initial_mapping_rewarders module
This module contains some vanilla Rewarders for the InitialMapping environment.
- Usage:
  The rewarders in this module can be customized by initializing them with different values:

      from qgym.envs.initial_mapping.initial_mapping_rewarders import BasicRewarder

      rewarder = BasicRewarder(
          illegal_action_penalty=-1,
          reward_per_edge=5,
          penalty_per_edge=-2,
      )
After initialization, the rewarders can be given to the InitialMapping environment.
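For example (a minimal sketch; the InitialMapping constructor arguments shown here, connection_graph and rewarder, are assumptions and may differ from the actual signature):

    import networkx as nx

    from qgym.envs import InitialMapping
    from qgym.envs.initial_mapping.initial_mapping_rewarders import BasicRewarder

    # Hypothetical 3-qubit line topology used as the connection graph.
    connection_graph = nx.Graph([(0, 1), (1, 2)])

    rewarder = BasicRewarder(illegal_action_penalty=-100, reward_per_edge=5, penalty_per_edge=-1)

    # Assumed keyword arguments; check the InitialMapping documentation for the exact signature.
    env = InitialMapping(connection_graph=connection_graph, rewarder=rewarder)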
Note

When implementing custom rewarders, they should inherit from Rewarder. Furthermore, they must implement the compute_reward() method, which takes as input the old state, the new state and the given action. See the documentation of the initial_mapping module for more information on the state and action space.
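As a concrete illustration, a minimal custom rewarder could look like the sketch below; the import path of Rewarder and the reward_range attribute are assumptions and may need to be adjusted to the actual package layout:

    from qgym.templates import Rewarder  # assumed import path for the Rewarder base class

    class ConstantRewarder(Rewarder):
        """Toy rewarder that returns the same reward for every action (sketch only)."""

        def __init__(self, reward=1.0):
            self._reward = reward
            self.reward_range = (reward, reward)  # assumed attribute describing the reward bounds

        def compute_reward(self, *, old_state, action, new_state):
            # old_state and new_state are InitialMappingState objects, action is an int32 ndarray.
            return self._reward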
- class qgym.envs.initial_mapping.initial_mapping_rewarders.BasicRewarder(illegal_action_penalty=-100, reward_per_edge=5, penalty_per_edge=-1)[source]
Bases: Rewarder

Basic rewarder for the InitialMapping environment.

- __init__(illegal_action_penalty=-100, reward_per_edge=5, penalty_per_edge=-1)[source]
Initialize the reward range and set the rewards and penalties.
- Parameters:
  - illegal_action_penalty (float) – Penalty for performing an illegal action. An action is illegal if the action contains a virtual or physical qubit that has already been mapped. This value should be negative (but is not required) and defaults to -100.
  - reward_per_edge (float) – Reward gained per ‘good’ edge in the interaction graph. An edge is ‘good’ if the mapped edge overlaps with an edge of the connection graph. This value should be positive (but is not required) and defaults to 5.
  - penalty_per_edge (float) – Penalty given per ‘bad’ edge in the interaction graph. An edge is ‘bad’ if the edge is mapped and is not ‘good’. This value should be negative (but is not required) and defaults to -1.
- compute_reward(*, old_state, action, new_state)[source]
Compute a reward based on the new state and the given action. Specifically, the connection graph, interaction graphs and mapping are used.
- Parameters:
  - old_state (InitialMappingState) – State of the InitialMapping before the current action.
  - action (ndarray[Any, dtype[int32]]) – Action that has just been taken.
  - new_state (InitialMappingState) – Updated state of the InitialMapping.
- Return type:
  float
- Returns:
The reward for this action. If the action is illegal, then the reward is illegal_action_penalty. If the action is legal, then the reward is the total number of ‘good’ edges times reward_per_edge plus the total number of ‘bad’ edges times penalty_per_edge.
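As a worked example of this formula (plain arithmetic with hypothetical edge counts, not an API call):

    # BasicRewarder formula with the default values and hypothetical edge counts.
    reward_per_edge = 5
    penalty_per_edge = -1
    n_good_edges = 3
    n_bad_edges = 2
    reward = n_good_edges * reward_per_edge + n_bad_edges * penalty_per_edge  # 3*5 + 2*(-1) = 13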
- class qgym.envs.initial_mapping.initial_mapping_rewarders.EpisodeRewarder(illegal_action_penalty=-100, reward_per_edge=5, penalty_per_edge=-1)[source]
Bases: BasicRewarder

Rewarder for the InitialMapping environment, which only gives a reward at the end of the episode or when an illegal action is taken.

- compute_reward(*, old_state, action, new_state)[source]
Compute a reward based on the new state and the given action. Specifically, the connection graph, interaction graphs and mapping are used.
- Parameters:
  - old_state (InitialMappingState) – State of the InitialMapping before the current action.
  - action (ndarray[Any, dtype[int32]]) – Action that has just been taken.
  - new_state (InitialMappingState) – Updated state of the InitialMapping.
- Return type:
  float
- Returns:
The reward for this action. If the action is illegal, then the reward is illegal_action_penalty. If the action is legal, but the mapping is not yet finished, then the reward is 0. If the action is legal, and the mapping is finished, then the reward is the number of ‘good’ edges times reward_per_edge plus the number of ‘bad’ edges times penalty_per_edge.
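The schedule can be sketched as follows (plain Python with hypothetical edge counts, not the actual implementation):

    # EpisodeRewarder schedule: no reward until the mapping is finished (or an illegal action occurs).
    illegal_action_penalty, reward_per_edge, penalty_per_edge = -100, 5, -1

    def episode_reward(is_illegal, is_finished, n_good, n_bad):
        if is_illegal:
            return illegal_action_penalty
        if not is_finished:
            return 0  # intermediate legal steps give no reward
        return n_good * reward_per_edge + n_bad * penalty_per_edge

    episode_reward(False, False, n_good=1, n_bad=0)  # 0: mapping not finished yet
    episode_reward(False, True, n_good=4, n_bad=1)   # 19: full mapping rewarded at episode end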
- class qgym.envs.initial_mapping.initial_mapping_rewarders.SingleStepRewarder(illegal_action_penalty=-100, reward_per_edge=5, penalty_per_edge=-1)[source]
Bases: BasicRewarder

Rewarder for the InitialMapping environment, which gives a reward based on the improvement in the current step.

- compute_reward(*, old_state, action, new_state)[source]
Compute a reward based on the new state and the given action. Specifically, the connection graph, interaction graphs and mapping are used.
- Parameters:
  - old_state (InitialMappingState) – State of the InitialMapping before the current action.
  - action (ndarray[Any, dtype[int32]]) – Action that has just been taken.
  - new_state (InitialMappingState) – Updated state of the InitialMapping.
- Return type:
  float
- Returns:
The reward for this action. If the action is illegal, then the reward is illegal_action_penalty. If the action is legal, then the reward is the number of ‘good’ edges times reward_per_edge plus the number of ‘bad’ edges times penalty_per_edge created by this action.
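One way to read ‘created by this action’ is as the change in edge counts between the old and the new state, as in the sketch below (hypothetical counts, not the actual implementation):

    # SingleStepRewarder idea: reward only the edges gained by the current action.
    reward_per_edge, penalty_per_edge = 5, -1

    good_before, bad_before = 2, 1   # edge counts in the old state
    good_after, bad_after = 4, 2     # edge counts after mapping one more virtual qubit

    reward = ((good_after - good_before) * reward_per_edge
              + (bad_after - bad_before) * penalty_per_edge)  # 2*5 + 1*(-1) = 9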