qgym.templates package

This package contains templates used for building custom environments.

All environments should inherit from Environment and should contain a rewarder, a state, and a visualiser, which inherit from the base classes Rewarder, State and Visualiser, respectively.
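
As a rough sketch of how these pieces fit together (all names below, such as MyEnvironment, MyState and MyRewarder, as well as the _state and _rewarder attributes, are illustrative and not prescribed by qgym; MyState and MyRewarder are hypothetical subclasses sketched further down this page):

from __future__ import annotations

from collections.abc import Mapping
from typing import Any

import gymnasium.spaces

from qgym.templates import Environment


class MyEnvironment(Environment):
    """Hypothetical environment built on top of the qgym templates."""

    def __init__(self) -> None:
        # Illustrative wiring: a custom State and Rewarder (sketched below),
        # plus the action and observation spaces every subclass must set.
        self._state = MyState()
        self._rewarder = MyRewarder()
        self.observation_space = self._state.create_observation_space()
        self.action_space = gymnasium.spaces.Discrete(3)

    def reset(
        self, *, seed: int | None = None, options: Mapping[str, Any] | None = None
    ) -> tuple[Any, dict[str, Any]]:
        # Delegate to the State and return the initial observation and info.
        self._state.reset(seed=seed, **(options or {}))
        return self._state.obtain_observation(), self._state.obtain_info()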

class qgym.templates.Environment[source]

Bases: Env[ObservationT, ActionT]

RL Environment containing the current state of the problem.

Each subclass should set at least the following attributes:

action_space: Space[Any]

The action space of this environment.

close()[source]

Close the screen used for rendering.

Return type:

None

observation_space: Space[Any]

The observation space of this environment.

render()[source]

Render the current state using pygame.

Return type:

None | ndarray[Any, dtype[int32]]

Returns:

Result of rendering.

abstract reset(*, seed=None, options=None)[source]

Reset the environment and load a new random initial state.

To be used after an episode is finished. Optionally, one can provide additional options to configure the reset.

Parameters:
  • seed (int | None) – Seed for the random number generator, should only be provided (optionally) on the first reset call, i.e., before any learning is done.

  • options (Mapping[str, Any] | None) – Dictionary containing keyword-argument pairs to configure the reset.

Return type:

tuple[TypeVar(ObservationT), dict[str, Any]]

Returns:

Initial observation and a dictionary containing debugging information.

property rewarder: Rewarder

Return the rewarder that is set for this environment.

Used to compute rewards after each step.

property rng: Generator

Return the random number generator of this environment.

If none is set yet, this will generate a new one using numpy.random.default_rng.

step(action)[source]

Update the state based on the input action.

Return the observation, reward, done indicator, truncated indicator and debugging info based on the updated state.

Parameters:

action (TypeVar(ActionT)) – Action to be performed.

Return type:

tuple[TypeVar(ObservationT), float, bool, bool, dict[Any, Any]]

Returns:

A tuple containing five entries:

  1. An observation of the updated state;

  2. The reward for the given action;

  3. A Boolean value stating whether the new state is a final state (i.e., whether we are done);

  4. A Boolean value stating whether the episode is truncated;

  5. Additional (debugging) information.
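
A typical interaction loop looks as follows (usage sketch; MyEnvironment is the hypothetical environment from the top of this page, and any concrete qgym environment follows the same pattern):

env = MyEnvironment()

# Seed only on the first reset, i.e., before any learning is done.
observation, info = env.reset(seed=42)
terminated = truncated = False
total_reward = 0.0

while not (terminated or truncated):
    action = env.action_space.sample()  # stand-in for an RL agent's policy
    observation, reward, terminated, truncated, info = env.step(action)
    total_reward += reward

env.close()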

class qgym.templates.Rewarder[source]

Bases: object

RL Rewarder, for computing rewards on a state.

__eq__(other)[source]

Check whether the object other is equal to self.

This check is performed by checking whether self and other are of exactly the same type, whether all slots (if any) are equal, and whether all attributes are equal.

Return type:

bool

Returns:

Boolean stating whether other is equal to self.

abstract compute_reward(*, old_state, action, new_state)[source]

Compute a reward, based on the old state, new state, and the given action.

Parameters:
  • old_state (Any) – State of the Environment before the current action.

  • action (Any) – Action that has just been taken.

  • new_state (Any) – Updated state of the Environment.

Return type:

float

Returns:

The reward for this action.

property reward_range: tuple[float, float]

Reward range of the rewarder, i.e., the range that rewards can lie in.
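
A minimal sketch of a custom rewarder (the MyRewarder name and the reward scheme are illustrative; only compute_reward must be implemented, and reward_range is overridden here for completeness):

from __future__ import annotations

from typing import Any

from qgym.templates import Rewarder


class MyRewarder(Rewarder):
    """Hypothetical rewarder: reward reaching a final state, penalise every step."""

    def compute_reward(self, *, old_state: Any, action: Any, new_state: Any) -> float:
        # Illustrative scheme: +1 when the new state is final, -0.01 otherwise.
        return 1.0 if new_state.is_done() else -0.01

    @property
    def reward_range(self) -> tuple[float, float]:
        return (-0.01, 1.0)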

class qgym.templates.State[source]

Bases: Generic[ObservationT, ActionT]

RL State containing the current state of the problem.

abstract create_observation_space()[source]

Create the corresponding observation space.

Return type:

Space[Any]

abstract is_done()[source]

Boolean value stating whether we are in a final state.

Return type:

bool

is_truncated()[source]

Boolean value stating whether the episode is truncated.

Return type:

bool

abstract obtain_info()[source]

Optional debugging info for the current state.

Return type:

dict[Any, Any]

abstract obtain_observation()[source]

Observation based on the current state.

Return type:

TypeVar(ObservationT)

abstract reset(*, seed=None, **_kwargs)[source]

Reset the state.

Return type:

State[TypeVar(ObservationT), TypeVar(ActionT)]

Returns:

Self.

property rng: Generator

Return the random number generator of this environment. If none is set yet, this will generate a new one using numpy.random.default_rng.

Returns:

Random number generator used by this Environment.

seed(seed=None)[source]

Seed the rng of this state, using numpy.random.default_rng.

Parameters:

seed (int | None) – Seed for the rng. Defaults to None.

Return type:

list[int | None]

Returns:

The used seeds.

steps_done: int

Number of steps done since the last reset.

abstract update_state(action)[source]

Update the state of this Environment using the given action.

Parameters:

action (TypeVar(ActionT)) – Action to be executed.

Return type:

State[TypeVar(ObservationT), TypeVar(ActionT)]

Returns:

Self.
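
A minimal sketch of a custom state (a toy counter problem; the MyState name, the observation layout and the use of a gymnasium Box space are illustrative):

from __future__ import annotations

from typing import Any

import gymnasium.spaces
import numpy as np
from numpy.typing import NDArray

from qgym.templates import State


class MyState(State[NDArray[np.int_], int]):
    """Hypothetical state: a counter that has to reach a target value."""

    def __init__(self, target: int = 5) -> None:
        self.steps_done = 0
        self.target = target
        self.counter = 0

    def create_observation_space(self) -> gymnasium.spaces.Space[Any]:
        return gymnasium.spaces.Box(low=0, high=self.target, shape=(1,), dtype=np.int_)

    def reset(self, *, seed: int | None = None, **_kwargs: Any) -> MyState:
        self.seed(seed)
        self.steps_done = 0
        self.counter = 0
        return self

    def update_state(self, action: int) -> MyState:
        self.steps_done += 1
        self.counter = min(self.counter + action, self.target)
        return self

    def obtain_observation(self) -> NDArray[np.int_]:
        return np.array([self.counter], dtype=np.int_)

    def obtain_info(self) -> dict[Any, Any]:
        return {"steps_done": self.steps_done}

    def is_done(self) -> bool:
        return self.counter >= self.target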

class qgym.templates.Visualiser(render_mode, *args)[source]

Bases: object

Visualiser for the current state of the problem.

close()[source]

Close the screen used for rendering.

Return type:

None

property colors: dict[str, Tuple[int, int, int]]

Dict containing name-color pairs.

property font: dict[str, Font]

Dict containing name-Font pairs.

property is_open: bool

Boolean value stating whether a pygame screen is currently open.

abstract render(state)[source]

Render the current state using pygame.

Return type:

None | ndarray[Any, dtype[int32]]

render_data: RenderData

property screen: Surface

Main screen to draw on.

property screen_height: int

Screen height of the main screen.

property screen_width: int

Screen width of the main screen.

step(state)[source]

To be used during a step of the environment.

Renders the display if render_mode is ‘human’; does nothing otherwise.

Parameters:

state (Any) – State to render if render_mode is ‘human’.

Return type:

None
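
Finally, a minimal sketch of a custom visualiser (illustrative only; how the screen is set up from the constructor arguments and how frames are produced for the ‘rgb_array’ render mode is handled by the base class and is not shown here):

from __future__ import annotations

from typing import Any

import numpy as np
from numpy.typing import NDArray

from qgym.templates import Visualiser


class MyVisualiser(Visualiser):
    """Hypothetical visualiser for MyState; the drawing logic is illustrative."""

    def render(self, state: Any) -> None | NDArray[np.int32]:
        # Assumes the base class has prepared `self.screen` (a pygame Surface).
        self.screen.fill((255, 255, 255))  # white background, purely illustrative
        # ... draw `state` here, e.g. using pygame.draw and self.font ...
        # A full implementation would also return an RGB frame when the
        # render mode is 'rgb_array'.
        return None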