qgym.envs.scheduling.scheduling_rewarders module
This module contains some vanilla Rewarders for the Scheduling
environment.
- Usage:
The rewarders in this module can be customized by initializing the rewarders with different values.
from qgym.envs.scheduling import BasicRewarder

rewarder = BasicRewarder(
    illegal_action_penalty=-1,
    update_cycle_penalty=-2,
    schedule_gate_bonus=3,
)
After initialization, the rewarders can be given to the Scheduling environment.
Note
Custom rewarders should inherit from Rewarder and must implement the compute_reward() method, which takes as input the old state, the new state and the given action. See the documentation of the scheduling module for more information on the state and action space.
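The note above can be illustrated with a minimal sketch. To stay runnable without qgym installed, it uses a local stand-in base class (RewarderStandIn) rather than the real Rewarder, and it represents states as plain dictionaries with a hypothetical "cycle" key. Both are assumptions made for illustration only; a real custom rewarder would inherit from qgym's Rewarder and read the actual SchedulingState.

```python
from abc import ABC, abstractmethod
from typing import Any


class RewarderStandIn(ABC):
    """Local stand-in for qgym's Rewarder base class (NOT the real class)."""

    @abstractmethod
    def compute_reward(self, *, old_state: Any, action: Any, new_state: Any) -> float:
        """Return the reward for moving from old_state to new_state."""


class CycleDeltaRewarder(RewarderStandIn):
    """Hypothetical custom rewarder: penalize every cycle added by an action."""

    def __init__(self, penalty_per_cycle: float = -1.0) -> None:
        self.penalty_per_cycle = penalty_per_cycle

    def compute_reward(self, *, old_state: Any, action: Any, new_state: Any) -> float:
        # "cycle" is an assumed key used only in this sketch.
        cycles_added = new_state["cycle"] - old_state["cycle"]
        return self.penalty_per_cycle * cycles_added


rewarder = CycleDeltaRewarder(penalty_per_cycle=-2.0)
reward = rewarder.compute_reward(
    old_state={"cycle": 3},
    action=[0, 1],  # placeholder action; its contents are not inspected here
    new_state={"cycle": 4},
)
```

The keyword-only signature mirrors compute_reward(*, old_state, action, new_state) as documented in this module, so the three arguments cannot be passed in the wrong order by accident.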
- class qgym.envs.scheduling.scheduling_rewarders.BasicRewarder(illegal_action_penalty=-5.0, update_cycle_penalty=-1.0, schedule_gate_bonus=0.0)[source]
Bases: Rewarder
Basic rewarder for the Scheduling environment.
- __init__(illegal_action_penalty=-5.0, update_cycle_penalty=-1.0, schedule_gate_bonus=0.0)[source]
Initialize the reward range and set the rewards and penalties.
- Parameters:
illegal_action_penalty (float) – Penalty for performing an illegal action. An action is illegal if action[0] is not in state["legal_actions"]. This value should be negative (although this is not required) and defaults to -5.
update_cycle_penalty (float) – Penalty given for incrementing a cycle. Since the Scheduling environment aims to create the shortest schedules, incrementing the cycle should be penalized. This value should be negative (although this is not required) and defaults to -1.
schedule_gate_bonus (float) – Reward gained for successfully scheduling a gate. This value should be positive (although this is not required) and defaults to 0.
- compute_reward(*, old_state, action, new_state)[source]
Compute a reward based on the new state and the given action, specifically using the ‘legal_actions’ array.
- Parameters:
old_state (SchedulingState) – State of the Scheduling environment before the current action.
action (ndarray[Any, dtype[int32]]) – Action that has just been taken.
new_state (SchedulingState) – Updated state of the Scheduling environment.
- Return type:
- Returns:
The reward for this action. If the action is illegal, then the reward is illegal_action_penalty. If the action is legal, and increments the cycle, then the reward is update_cycle_penalty. Otherwise, the reward is schedule_gate_bonus.
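The three cases described under Returns can be sketched as a standalone function. This is not the library's implementation: the function name basic_reward and the boolean flags is_legal and increments_cycle are hypothetical, standing in for whatever checks the rewarder performs on the state and action. The default values match the defaults documented above.

```python
def basic_reward(
    *,
    is_legal: bool,
    increments_cycle: bool,
    illegal_action_penalty: float = -5.0,
    update_cycle_penalty: float = -1.0,
    schedule_gate_bonus: float = 0.0,
) -> float:
    """Sketch of the documented reward rule (flags are assumed inputs)."""
    if not is_legal:
        # Illegal actions always receive the illegal-action penalty.
        return illegal_action_penalty
    if increments_cycle:
        # Legal action that advances the cycle.
        return update_cycle_penalty
    # Any other legal action, i.e. a gate was scheduled.
    return schedule_gate_bonus


illegal = basic_reward(is_legal=False, increments_cycle=False)
next_cycle = basic_reward(is_legal=True, increments_cycle=True)
scheduled = basic_reward(is_legal=True, increments_cycle=False, schedule_gate_bonus=3.0)
```

Note that the legality check comes first, so an illegal action that would also increment the cycle still yields illegal_action_penalty.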
- class qgym.envs.scheduling.scheduling_rewarders.EpisodeRewarder(illegal_action_penalty=-5.0, update_cycle_penalty=-1.0)[source]
Bases: Rewarder
Rewarder for the Scheduling environment, which only gives a reward at the end of the episode or when an illegal action is taken.
- __init__(illegal_action_penalty=-5.0, update_cycle_penalty=-1.0)[source]
Initialize the reward range and set the rewards and penalties.
- Parameters:
illegal_action_penalty (float) – Penalty for performing an illegal action. An action is illegal if action[0] is not in state["legal_actions"]. This value should be negative (although this is not required) and defaults to -5.
update_cycle_penalty (float) – Penalty given for incrementing a cycle. Since the Scheduling environment aims to create the shortest schedules, incrementing the cycle should be penalized. This value should be negative (although this is not required) and defaults to -1.
- compute_reward(*, old_state, action, new_state)[source]
Compute a reward based on the new state and the given action.
- Parameters:
old_state (SchedulingState) – State of the Scheduling environment before the current action.
action (ndarray[Any, dtype[int32]]) – Action that has just been taken.
new_state (SchedulingState) – Updated state of the Scheduling environment.
- Return type:
- Returns:
The reward for this action. If the action is illegal, then the reward is illegal_action_penalty. If the action is legal, but the episode is not yet done, then the reward is 0. Otherwise, the reward is update_cycle_penalty × current cycle.
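The episode-level rule can likewise be sketched as a standalone function. As before, this is an illustration under stated assumptions, not the library's code: the name episode_reward and the inputs is_legal, is_done and current_cycle are hypothetical stand-ins for information the rewarder would derive from the state and action.

```python
def episode_reward(
    *,
    is_legal: bool,
    is_done: bool,
    current_cycle: int,
    illegal_action_penalty: float = -5.0,
    update_cycle_penalty: float = -1.0,
) -> float:
    """Sketch of the documented end-of-episode reward rule (assumed inputs)."""
    if not is_legal:
        # Illegal actions are penalized immediately.
        return illegal_action_penalty
    if not is_done:
        # Legal intermediate steps carry no reward.
        return 0.0
    # At the end of the episode the penalty scales with the schedule length.
    return update_cycle_penalty * current_cycle


mid_episode = episode_reward(is_legal=True, is_done=False, current_cycle=7)
final_step = episode_reward(is_legal=True, is_done=True, current_cycle=12)
```

With the default update_cycle_penalty of -1, a schedule that finishes at cycle 12 receives a single final reward of -12, so shorter schedules score higher.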