qgym.templates.environment module

Generic abstract base class for RL environments.

All environments should inherit from Environment.

class qgym.templates.environment.Environment[source]

Bases: Env[ObservationT, ActionT]

RL Environment containing the current state of the problem.

Each subclass should set at least the following attributes:

action_space: Space[Any]

The action space of this environment.

close()[source]

Close the screen used for rendering.

Return type:

None

observation_space: Space[Any]

The observation space of this environment.

render()[source]

Render the current state using pygame.

Return type:

None | ndarray[Any, dtype[int32]]

Returns:

Result of rendering.

abstract reset(*, seed=None, options=None)[source]

Reset the environment and load a new random initial state.

To be used after an episode is finished. Optionally, one can provide additional options to configure the reset.

Parameters:
  • seed (int | None) – Seed for the random number generator; it should only be provided (optionally) on the first reset call, i.e., before any learning is done.

  • options (Mapping[str, Any] | None) – Dictionary containing keyword-argument pairs to configure the reset.

Return type:

tuple[TypeVar(ObservationT), dict[str, Any]]

Returns:

Initial observation and a dictionary containing debugging information.
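
For illustration, a typical reset pattern could look like the sketch below. MyQgymEnv is a hypothetical placeholder for any concrete subclass of Environment, and the option key is made up for the example:

    # MyQgymEnv stands in for any concrete subclass of Environment (hypothetical name).
    env = MyQgymEnv()

    # Provide a seed only on the first reset, i.e., before any learning is done.
    observation, info = env.reset(seed=42)

    # Subsequent resets, e.g. after an episode has finished, omit the seed.
    observation, info = env.reset()

    # If the subclass supports reset options, pass them as a mapping
    # ("example_option" is an illustrative key, not a qgym option).
    observation, info = env.reset(options={"example_option": True})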

property rewarder: Rewarder

Return the rewarder that is set for this environment.

Used to compute rewards after each step.

property rng: Generator

Return the random number generator of this environment.

If none is set yet, this will generate a new one using numpy.random.default_rng.
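
The lazy initialisation described above follows a common pattern, sketched here for illustration (this is not qgym's actual source):

    from typing import Optional

    import numpy as np
    from numpy.random import Generator


    class _RngSketch:
        """Illustrative stand-in for the rng property's lazy behaviour."""

        _rng: Optional[Generator] = None

        @property
        def rng(self) -> Generator:
            # Create a new generator on first access if none has been set yet,
            # mirroring the numpy.random.default_rng fallback described above.
            if self._rng is None:
                self._rng = np.random.default_rng()
            return self._rng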

step(action)[source]

Update the state based on the input action.

Return the observation, reward, done-indicator, truncated-indicator, and debugging info based on the updated state.

Parameters:

action (TypeVar(ActionT)) – Action to be performed.

Return type:

tuple[TypeVar(ObservationT), float, bool, bool, dict[Any, Any]]

Returns:

A tuple containing five entries:

  1. The observation of the updated state;

  2. Reward for the given action;

  3. Boolean value stating whether the new state is a final state (i.e., if we are done);

  4. Boolean value stating whether the episode is truncated;

  5. Additional (debugging) information.
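
Putting reset and step together, a typical interaction loop could look like the following sketch. MyQgymEnv is the same hypothetical placeholder as above; only the attributes and signatures documented on this page are assumed:

    # MyQgymEnv stands in for any concrete subclass of Environment (hypothetical name).
    env = MyQgymEnv()
    observation, info = env.reset(seed=42)

    terminated = truncated = False
    while not (terminated or truncated):
        # Sample a random action from the documented action_space attribute.
        action = env.action_space.sample()

        # step returns the five-tuple documented above:
        # observation, reward, terminated, truncated, and debugging info.
        observation, reward, terminated, truncated, info = env.step(action)

        # Optionally render the current state; qgym renders using pygame,
        # so this may require a display to be available.
        env.render()

    env.close()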