qgym.envs.scheduling.scheduling_rewarders module

This module contains some vanilla Rewarders for the Scheduling environment.

Usage:

The rewarders in this module can be customized by initializing them with different penalty and bonus values.

from qgym.envs.scheduling import BasicRewarder

# Override the default penalties and bonus.
rewarder = BasicRewarder(
    illegal_action_penalty=-1,
    update_cycle_penalty=-2,
    schedule_gate_bonus=3,
)

After initialization, the rewarders can be given to the Scheduling environment.

Note

Custom rewarders should inherit from Rewarder and must implement the compute_reward() method, which takes as input the old state, the new state and the given action. See the documentation of the scheduling module for more information on the state and action space.
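The note above can be illustrated with a minimal, self-contained sketch. To keep it runnable without qgym installed, it defines a stand-in Rewarder base class that models only the compute_reward() method; real code would inherit from qgym's Rewarder instead. The ToyState class, its cycle attribute and the CycleDeltaRewarder class are hypothetical names introduced purely for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


class Rewarder(ABC):
    """Stand-in for qgym's Rewarder (assumption: only compute_reward is modeled)."""

    @abstractmethod
    def compute_reward(self, *, old_state: Any, action: Any, new_state: Any) -> float:
        """Return the reward for moving from old_state to new_state via action."""


@dataclass
class ToyState:
    """Hypothetical state holding only a cycle counter, for illustration."""

    cycle: int


class CycleDeltaRewarder(Rewarder):
    """Hypothetical custom rewarder: penalize every cycle added by an action."""

    def __init__(self, penalty_per_cycle: float = -1.0) -> None:
        self._penalty_per_cycle = penalty_per_cycle

    def compute_reward(self, *, old_state: Any, action: Any, new_state: Any) -> float:
        # Number of cycles the schedule grew by as a result of this action.
        cycles_added = new_state.cycle - old_state.cycle
        return self._penalty_per_cycle * cycles_added


rewarder = CycleDeltaRewarder(penalty_per_cycle=-2.0)
reward = rewarder.compute_reward(
    old_state=ToyState(cycle=3), action=None, new_state=ToyState(cycle=4)
)
```

The keyword-only signature mirrors the compute_reward(*, old_state, action, new_state) signature documented for the rewarders in this module.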

class qgym.envs.scheduling.scheduling_rewarders.BasicRewarder(illegal_action_penalty=-5.0, update_cycle_penalty=-1.0, schedule_gate_bonus=0.0)

Bases: Rewarder

Basic rewarder for the Scheduling environment.

__init__(illegal_action_penalty=-5.0, update_cycle_penalty=-1.0, schedule_gate_bonus=0.0)

Initialize the reward range and set the rewards and penalties.

Parameters:

illegal_action_penalty (float) – Penalty for performing an illegal action. An action is illegal if action[0] is not in state["legal_actions"]. This value should be negative (although this is not required) and defaults to -5.
update_cycle_penalty (float) – Penalty given for incrementing a cycle. Since the Scheduling environment aims to create the shortest possible schedules, incrementing the cycle should be penalized. This value should be negative (although this is not required) and defaults to -1.
schedule_gate_bonus (float) – Reward gained for successfully scheduling a gate. This value should be positive (although this is not required) and defaults to 0.

compute_reward(*, old_state, action, new_state)

Compute a reward based on the new state and the given action, specifically the 'legal_actions' array.

Parameters:

old_state (SchedulingState) – State of the Scheduling environment before the current action.
action (ndarray[Any, dtype[int32]]) – Action that has just been taken.
new_state (SchedulingState) – Updated state of the Scheduling environment.

Return type:

float

Returns:

The reward for this action. If the action is illegal, then the reward is illegal_action_penalty. If the action is legal, and increments the cycle, then the reward is update_cycle_penalty. Otherwise, the reward is schedule_gate_bonus.
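The three-way rule described above can be sketched as a small standalone function. This is not the library's implementation: the function name basic_reward and the boolean inputs is_legal and increments_cycle are assumptions made for illustration, standing in for whatever the rewarder derives from the states and the action.

```python
def basic_reward(
    *,
    is_legal: bool,
    increments_cycle: bool,
    illegal_action_penalty: float = -5.0,
    update_cycle_penalty: float = -1.0,
    schedule_gate_bonus: float = 0.0,
) -> float:
    """Hypothetical sketch of the reward rule documented for BasicRewarder."""
    # Illegal actions are penalized regardless of anything else.
    if not is_legal:
        return illegal_action_penalty
    # A legal action that advances the cycle lengthens the schedule.
    if increments_cycle:
        return update_cycle_penalty
    # Otherwise the action scheduled a gate.
    return schedule_gate_bonus


illegal = basic_reward(is_legal=False, increments_cycle=False)
cycle_step = basic_reward(is_legal=True, increments_cycle=True)
gate_step = basic_reward(is_legal=True, increments_cycle=False, schedule_gate_bonus=3.0)
```

With the default values an illegal action costs five times as much as a cycle increment, and scheduling a gate is neutral unless schedule_gate_bonus is raised.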

class qgym.envs.scheduling.scheduling_rewarders.EpisodeRewarder(illegal_action_penalty=-5.0, update_cycle_penalty=-1.0)

Bases: Rewarder

Rewarder for the Scheduling environment, which only gives a reward at the end of the episode or when an illegal action is taken.

__init__(illegal_action_penalty=-5.0, update_cycle_penalty=-1.0)

Initialize the reward range and set the rewards and penalties.

Parameters:

illegal_action_penalty (float) – Penalty for performing an illegal action. An action is illegal if action[0] is not in state["legal_actions"]. This value should be negative (although this is not required) and defaults to -5.
update_cycle_penalty (float) – Penalty given for incrementing a cycle. Since the Scheduling environment aims to create the shortest possible schedules, incrementing the cycle should be penalized. This value should be negative (although this is not required) and defaults to -1.

compute_reward(*, old_state, action, new_state)

Compute a reward based on the new state and the given action.

Parameters:

old_state (SchedulingState) – State of the Scheduling environment before the current action.
action (ndarray[Any, dtype[int32]]) – Action that has just been taken.
new_state (SchedulingState) – Updated state of the Scheduling environment.

Return type:

float

Returns:

The reward for this action. If the action is illegal, then the reward is illegal_action_penalty. If the action is legal, but the episode is not yet done, then the reward is 0. Otherwise, the reward is update_cycle_penalty multiplied by the current cycle.
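The end-of-episode rule described above can be sketched as a standalone function. This is not the library's implementation: the function name episode_reward and the inputs is_legal, done and current_cycle are assumptions made for illustration, standing in for values the rewarder would read from the states and the action.

```python
def episode_reward(
    *,
    is_legal: bool,
    done: bool,
    current_cycle: int,
    illegal_action_penalty: float = -5.0,
    update_cycle_penalty: float = -1.0,
) -> float:
    """Hypothetical sketch of the reward rule documented for EpisodeRewarder."""
    # Illegal actions are penalized immediately.
    if not is_legal:
        return illegal_action_penalty
    # Legal actions during the episode carry no reward.
    if not done:
        return 0.0
    # At the end of the episode, the penalty scales with the schedule length.
    return update_cycle_penalty * current_cycle


mid_episode = episode_reward(is_legal=True, done=False, current_cycle=7)
final_step = episode_reward(is_legal=True, done=True, current_cycle=12)
```

Under this rule a schedule that finishes at cycle 12 with the default update_cycle_penalty yields a final reward of -12, so shorter schedules score higher.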