qgym.envs.initial_mapping.initial_mapping_rewarders module
This module contains some vanilla Rewarders for the InitialMapping environment.
- Usage:
  The rewarders in this module can be customized by initializing them with different values:

      from qgym.envs.initial_mapping.initial_mapping_rewarders import BasicRewarder

      rewarder = BasicRewarder(
          illegal_action_penalty=-1,
          reward_per_edge=5,
          penalty_per_edge=-2,
      )
After initialization, the rewarders can be given to the InitialMapping environment.
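For example (a minimal sketch; the InitialMapping constructor arguments shown here, connection_graph and rewarder, are assumptions and may differ from the actual signature):

    import networkx as nx

    from qgym.envs import InitialMapping
    from qgym.envs.initial_mapping.initial_mapping_rewarders import BasicRewarder

    # Hypothetical 3-qubit line topology used as the connection graph.
    connection_graph = nx.Graph([(0, 1), (1, 2)])

    rewarder = BasicRewarder(illegal_action_penalty=-100, reward_per_edge=5, penalty_per_edge=-1)

    # Assumed keyword arguments; check the InitialMapping documentation for the exact signature.
    env = InitialMapping(connection_graph=connection_graph, rewarder=rewarder)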
Note

When implementing custom rewarders, they should inherit from Rewarder. Furthermore, they must implement the compute_reward() method, which takes as input the old state, the new state and the given action. See the documentation of the initial_mapping module for more information on the state and action space.
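As a concrete illustration, a minimal custom rewarder could look like the sketch below; the import path of Rewarder and the reward_range attribute are assumptions and may need to be adjusted to the actual package layout:

    from qgym.templates import Rewarder  # assumed import path for the Rewarder base class

    class ConstantRewarder(Rewarder):
        """Toy rewarder that returns the same reward for every action (sketch only)."""

        def __init__(self, reward=1.0):
            self._reward = reward
            self.reward_range = (reward, reward)  # assumed attribute describing the reward bounds

        def compute_reward(self, *, old_state, action, new_state):
            # old_state and new_state are InitialMappingState objects, action is an int32 ndarray.
            return self._reward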
- class qgym.envs.initial_mapping.initial_mapping_rewarders.BasicRewarder(illegal_action_penalty=-100, reward_per_edge=5, penalty_per_edge=-1)[source]
Bases: Rewarder

Basic rewarder for the InitialMapping environment.

- __init__(illegal_action_penalty=-100, reward_per_edge=5, penalty_per_edge=-1)[source]
Initialize the reward range and set the rewards and penalties.
- Parameters:
  - illegal_action_penalty (float) – Penalty for performing an illegal action. An action is illegal if the action contains a virtual or physical qubit that has already been mapped. This value should be negative (but is not required) and defaults to -100.
  - reward_per_edge (float) – Reward gained per ‘good’ edge in the interaction graph. An edge is ‘good’ if the mapped edge overlaps with an edge of the connection graph. This value should be positive (but is not required) and defaults to 5.
  - penalty_per_edge (float) – Penalty given per ‘bad’ edge in the interaction graph. An edge is ‘bad’ if the edge is mapped and is not ‘good’. This value should be negative (but is not required) and defaults to -1.
- compute_reward(*, old_state, action, new_state)[source]
Compute a reward based on the new state and the given action. Specifically, the connection graph, interaction graphs and mapping are used.
- Parameters:
  - old_state (InitialMappingState) – State of the InitialMapping before the current action.
  - action (ndarray[Any, dtype[int32]]) – Action that has just been taken.
  - new_state (InitialMappingState) – Updated state of the InitialMapping.
- Return type:
  float
- Returns:
The reward for this action. If the action is illegal, then the reward is illegal_action_penalty. If the action is legal, then the reward is the total number of ‘good’ edges times reward_per_edge plus the total number of ‘bad’ edges times penalty_per_edge.
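As a worked example of this formula (plain arithmetic with hypothetical edge counts, not an API call):

    # BasicRewarder formula with the default values and hypothetical edge counts.
    reward_per_edge = 5
    penalty_per_edge = -1
    n_good_edges = 3
    n_bad_edges = 2
    reward = n_good_edges * reward_per_edge + n_bad_edges * penalty_per_edge  # 3*5 + 2*(-1) = 13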
- class qgym.envs.initial_mapping.initial_mapping_rewarders.EpisodeRewarder(illegal_action_penalty=-100, reward_per_edge=5, penalty_per_edge=-1)[source]
Bases: BasicRewarder

Rewarder for the InitialMapping environment, which only gives a reward at the end of the episode or when an illegal action is taken.

- compute_reward(*, old_state, action, new_state)[source]
Compute a reward based on the new state and the given action. Specifically, the connection graph, interaction graphs and mapping are used.
- Parameters:
  - old_state (InitialMappingState) – State of the InitialMapping before the current action.
  - action (ndarray[Any, dtype[int32]]) – Action that has just been taken.
  - new_state (InitialMappingState) – Updated state of the InitialMapping.
- Return type:
  float
- Returns:
The reward for this action. If the action is illegal, then the reward is illegal_action_penalty. If the action is legal, but the mapping is not yet finished, then the reward is 0. If the action is legal, and the mapping is finished, then the reward is the number of ‘good’ edges times reward_per_edge plus the number of ‘bad’ edges times penalty_per_edge.
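The schedule can be sketched as follows (plain Python with hypothetical edge counts, not the actual implementation):

    # EpisodeRewarder schedule: no reward until the mapping is finished (or an illegal action occurs).
    illegal_action_penalty, reward_per_edge, penalty_per_edge = -100, 5, -1

    def episode_reward(is_illegal, is_finished, n_good, n_bad):
        if is_illegal:
            return illegal_action_penalty
        if not is_finished:
            return 0  # intermediate legal steps give no reward
        return n_good * reward_per_edge + n_bad * penalty_per_edge

    episode_reward(False, False, n_good=1, n_bad=0)  # 0: mapping not finished yet
    episode_reward(False, True, n_good=4, n_bad=1)   # 19: full mapping rewarded at episode end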
- class qgym.envs.initial_mapping.initial_mapping_rewarders.SingleStepRewarder(illegal_action_penalty=-100, reward_per_edge=5, penalty_per_edge=-1)[source]
Bases: BasicRewarder

Rewarder for the InitialMapping environment, which gives a reward based on the improvement in the current step.

- compute_reward(*, old_state, action, new_state)[source]
Compute a reward based on the new state and the given action. Specifically, the connection graph, interaction graphs and mapping are used.
- Parameters:
  - old_state (InitialMappingState) – State of the InitialMapping before the current action.
  - action (ndarray[Any, dtype[int32]]) – Action that has just been taken.
  - new_state (InitialMappingState) – Updated state of the InitialMapping.
- Return type:
  float
- Returns:
The reward for this action. If the action is illegal, then the reward is illegal_action_penalty. If the action is legal, then the reward is the number of ‘good’ edges times reward_per_edge plus the number of ‘bad’ edges times penalty_per_edge created by this action.
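One way to read ‘created by this action’ is as the change in edge counts between the old and the new state, as in the sketch below (hypothetical counts, not the actual implementation):

    # SingleStepRewarder idea: reward only the edges gained by the current action.
    reward_per_edge, penalty_per_edge = 5, -1

    good_before, bad_before = 2, 1   # edge counts in the old state
    good_after, bad_after = 4, 2     # edge counts after mapping one more virtual qubit

    reward = ((good_after - good_before) * reward_per_edge
              + (bad_after - bad_before) * penalty_per_edge)  # 2*5 + 1*(-1) = 9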