:py:mod:`multirotor.env`
========================

.. py:module:: multirotor.env

.. autoapi-nested-parse::

   This module defines OpenAI Gym-compatible environment classes based on the ``Multirotor`` class.


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   multirotor.env.BaseMultirotorEnv
   multirotor.env.DynamicsMultirotorEnv
   multirotor.env.SpeedsMultirotorEnv


.. py:class:: BaseMultirotorEnv(vehicle: multirotor.simulation.Multirotor = None, seed: int = None)


   Bases: :py:obj:`gym.Env`

   The base environment class, which defines the episode and the reward function.

   .. py:property:: state
      :type: numpy.ndarray


   .. py:attribute:: max_angle

      The max tilt angle in radians.


   .. py:attribute:: proximity
      :value: 0.5

      Distance from the waypoint at which to consider it has been reached.


   .. py:attribute:: period
      :value: 10

      Maximum duration of the episode (seconds).


   .. py:attribute:: bounding_box
      :value: 20

      Size of the cube in which the vehicle can fly, centered at origin.
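
   The attributes above bound an episode. As a minimal sketch, and not the module's implementation, they could be combined into a termination check as follows. The helper ``episode_ended`` is hypothetical. The sketch assumes that ``bounding_box`` is the edge length of the cube, so each coordinate must stay within ``bounding_box / 2`` of the origin, and that tilt is judged from the roll and pitch angles.

   .. code-block:: python

      import numpy as np


      def episode_ended(position, roll, pitch, waypoint, t,
                        max_angle, proximity=0.5, period=10, bounding_box=20):
          """Hypothetical termination check using the documented attributes.

          Returns (done, reason). Assumes ``bounding_box`` is the cube's edge
          length, so each coordinate must stay within +/- bounding_box / 2.
          """
          position = np.asarray(position, dtype=float)
          waypoint = np.asarray(waypoint, dtype=float)
          # Waypoint reached: Euclidean distance within the proximity radius.
          if np.linalg.norm(position - waypoint) <= proximity:
              return True, "reached"
          # Episode exceeded its maximum duration (seconds).
          if t >= period:
              return True, "timeout"
          # Vehicle left the cube centred at the origin (assumed half-width).
          if np.any(np.abs(position) > bounding_box / 2):
              return True, "out_of_bounds"
          # Vehicle tilted beyond the allowed angle (radians).
          if abs(roll) > max_angle or abs(pitch) > max_angle:
              return True, "tilt"
          return False, ""

   For example, ``episode_ended([0, 0, 0], 0.0, 0.0, [0, 0, 0.3], 1.0, max_angle=np.pi / 6)`` reports ``(True, "reached")`` because the waypoint is 0.3 away, within the 0.5 proximity. The order of the checks is itself an assumption: it decides which reason is reported when several conditions hold at once.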


   .. py:attribute:: motion_reward_scaling

      
   .. py:attribute:: bonus

      
   .. py:method:: seed(seed: int = None, _seed_with_none: bool = False) -> List[Union[int, tuple]]


   .. py:method:: reset(x: numpy.ndarray = None) -> numpy.ndarray

      Reset the vehicle to a random initial position.

      Parameters
      ----------
      x : np.ndarray, optional
          A state to set the vehicle to, by default None

      Returns
      -------
      np.ndarray
          The state vector of the vehicle.
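
      A minimal sketch of what a random reset could look like. It assumes a 12-element state layout ``[x, y, z, vx, vy, vz, roll, pitch, yaw, p, q, r]`` and a starting position drawn uniformly from the bounding cube. The function name ``random_initial_state`` and the state layout are illustrative assumptions, not the module's API.

      .. code-block:: python

         import numpy as np


         def random_initial_state(x=None, bounding_box=20, rng=None):
             """Hypothetical reset helper.

             Assumes a 12-element state [x, y, z, vx, vy, vz, roll, pitch, yaw,
             p, q, r] and that ``bounding_box`` is the cube's edge length.
             """
             if x is not None:
                 # An explicit state was given: return a float copy of it.
                 return np.array(x, dtype=float)
             rng = np.random.default_rng() if rng is None else rng
             state = np.zeros(12)
             # Random position inside the cube centred at the origin;
             # velocities, attitude and angular rates start at zero.
             half = bounding_box / 2
             state[:3] = rng.uniform(-half, half, size=3)
             return state

      Passing a seeded generator, e.g. ``random_initial_state(rng=np.random.default_rng(0))``, makes the initial position reproducible, which mirrors the purpose of the environment's ``seed`` method.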


   .. py:method:: reward(state: numpy.ndarray, action: numpy.ndarray, nstate: numpy.ndarray) -> float


   .. py:method:: step(action: numpy.ndarray) -> tuple[numpy.ndarray, float, bool, dict]
      :abstractmethod:

      Run one timestep of the environment's dynamics.

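      When the end of an episode is reached, you are responsible for calling :meth:`reset` to reset this environment's state.
      Accepts an action and returns a tuple. The annotated return type of this method is the four-element
      ``(observation, reward, done, info)`` form listed under "(deprecated)" below, rather than the newer
      ``(observation, reward, terminated, truncated, info)`` form.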

      Args:
          action (ActType): an action provided by the agent

      Returns:
          observation (object): this will be an element of the environment's :attr:`observation_space`.
              This may, for instance, be a numpy array containing the positions and velocities of certain objects.
          reward (float): The amount of reward returned as a result of taking the action.
          terminated (bool): whether a `terminal state` (as defined under the MDP of the task) is reached.
              In this case further step() calls could return undefined results.
          truncated (bool): whether a truncation condition outside the scope of the MDP is satisfied.
              Typically a timelimit, but could also be used to indicate agent physically going out of bounds.
              Can be used to end the episode prematurely before a `terminal state` is reached.
          info (dict): `info` contains auxiliary diagnostic information (helpful for debugging, learning, and logging).
              This might, for instance, contain: metrics that describe the agent's performance state, variables that are
              hidden from observations, or individual reward terms that are combined to produce the total reward.
              It can also contain information that distinguishes truncation and termination; however, this is deprecated in favour
              of returning two booleans, and will be removed in a future version.

          (deprecated)
          done (bool): Whether the episode has ended, in which case further :meth:`step` calls will return undefined results.
              A done signal may be emitted for different reasons: the task underlying the environment may have been solved successfully,
              a certain time limit may have been exceeded, or the physics simulation may have entered an invalid state.
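
      The annotated return type ``tuple[numpy.ndarray, float, bool, dict]`` has four elements, so code written against the five-element API needs an adapter. A minimal sketch, assuming that a time-limit truncation, if any, is flagged with the ``"TimeLimit.truncated"`` key in ``info`` (the convention of older Gym ``TimeLimit`` wrappers); whether this environment sets that key is an assumption, and ``to_terminated_truncated`` is a hypothetical name.

      .. code-block:: python

         def to_terminated_truncated(step_result):
             """Convert a 4-tuple (obs, reward, done, info) to a 5-tuple.

             Hypothetical adapter. Assumes truncation is signalled by the
             "TimeLimit.truncated" key in ``info``; otherwise ``done`` is
             treated as termination.
             """
             observation, reward, done, info = step_result
             truncated = bool(done and info.get("TimeLimit.truncated", False))
             terminated = bool(done and not truncated)
             return observation, reward, terminated, truncated, info

      Typical use would be ``obs, reward, terminated, truncated, info = to_terminated_truncated(env.step(action))`` for some environment instance ``env``.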


.. py:class:: DynamicsMultirotorEnv(vehicle: multirotor.simulation.Multirotor = None, allocate: bool = False, max_rads: float = np.inf)


   Bases: :py:obj:`BaseMultirotorEnv`

   An environment whose actions are the x, y, z forces and torques acting on the vehicle in its local frame.

   .. py:method:: step(action: numpy.ndarray, disturb_forces: numpy.ndarray = 0.0, disturb_torques: numpy.ndarray = 0.0) -> Tuple[numpy.ndarray, float, bool, dict]

      Step the environment by providing the forces and torques acting on the vehicle in its local frame.

      Parameters
      ----------
      action : np.ndarray
          An array of x, y, z forces and x, y, z torques in the vehicle's local frame.
      disturb_forces : np.ndarray, optional
          Disturbing x, y, z forces in the vehicle's local frame, by default 0.
      disturb_torques : np.ndarray, optional
          Disturbing x, y, z torques in the vehicle's local frame, by default 0.

      Returns
      -------
      Tuple[np.ndarray, float, bool, dict]
          The next state, the reward, whether the episode is done, and a dictionary of auxiliary information.
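
      The action lists the forces first and the torques second. A minimal sketch of building a hover action, assuming SI units, a known vehicle mass, and a local z axis that points up so that a positive z force opposes gravity; ``hover_action`` and ``mass`` are hypothetical.

      .. code-block:: python

         import numpy as np


         def hover_action(mass, g=9.81):
             """Hypothetical hover command for DynamicsMultirotorEnv.step.

             Layout: [fx, fy, fz, tx, ty, tz] in the local frame. Assumes the
             local z axis points up, so thrust equal to weight is +mass * g.
             """
             action = np.zeros(6)
             action[2] = mass * g  # vertical force balancing gravity
             return action  # all torques zero: no rotation commanded

      With an environment instance ``env``, a call might then look like ``env.step(hover_action(1.5), disturb_forces=np.array([0.5, 0.0, 0.0]))``, which adds a constant disturbance along the local x axis.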


.. py:class:: SpeedsMultirotorEnv(vehicle: multirotor.simulation.Multirotor)


   Bases: :py:obj:`BaseMultirotorEnv`

   A multirotor environment that uses speed signals as action inputs. The speed
   signals can be one of two kinds:

     1. Actual speeds (rad/s).

        a. If the multirotor's propellers have ``Motor`` instances, then the
           ``MotorParams.speed_voltage_scaling`` parameter should be provided. It
           converts a speed signal into the voltage signal used in the motor's
           speed calculations. The ``helpers.learn_speed_voltage_scaling``
           function can be used to learn it.

        b. Otherwise, if the propellers do not have motors, the parameter need
           not be provided.

     2. Voltage signals (V). In this case, the propellers should have ``Motor``
        instances with the ``speed_voltage_scaling`` parameter equal to 1 (there
        is no need to learn it, since voltage is already being given).
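
   One way to picture ``speed_voltage_scaling``: if voltage is roughly proportional to steady-state speed, the scaling is a single factor that maps a speed to a voltage, and it can be estimated from measured (speed, voltage) pairs by least squares through the origin. This is a sketch under that linearity assumption; the actual ``helpers.learn_speed_voltage_scaling`` may use a different procedure, and both function names below are hypothetical.

   .. code-block:: python

      import numpy as np


      def fit_speed_voltage_scaling(speeds, voltages):
          """Least-squares factor k minimising sum((voltages - k * speeds)**2).

          Hypothetical; assumes voltage is proportional to steady-state speed.
          """
          speeds = np.asarray(speeds, dtype=float)
          voltages = np.asarray(voltages, dtype=float)
          return float(speeds @ voltages / (speeds @ speeds))


      def speeds_to_voltages(speeds, speed_voltage_scaling):
          """Map speed signals (rad/s) to voltage signals (V).

          With a scaling of 1, voltage signals pass through unchanged.
          """
          return np.asarray(speeds, dtype=float) * speed_voltage_scaling

   The scaling of 1 in the second case above corresponds to ``speeds_to_voltages`` acting as the identity, since the action is already a voltage.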

   .. py:method:: step(action: numpy.ndarray, disturb_forces: numpy.ndarray = 0.0, disturb_torques: numpy.ndarray = 0.0) -> Tuple[numpy.ndarray, float, bool, dict]

      Step the environment by providing speed signals.

      Parameters
      ----------
      action : np.ndarray
          An array of speed signals, one per propeller.
      disturb_forces : np.ndarray, optional
          Disturbing x, y, z forces in the vehicle's local frame, by default 0.
      disturb_torques : np.ndarray, optional
          Disturbing x, y, z torques in the vehicle's local frame, by default 0.

      Returns
      -------
      Tuple[np.ndarray, float, bool, dict]
          The next state, the reward, whether the episode is done, and a dictionary of auxiliary information.
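
      To choose a sensible speed action, one can compute the propeller speeds at which total thrust balances weight. A minimal sketch, assuming the common quadratic thrust model T = k_T * omega**2 with the same known thrust coefficient for every propeller, all propellers pointing straight up; ``hover_speeds``, ``mass`` and ``k_thrust`` are hypothetical, and the propeller model in ``multirotor.simulation`` may differ.

      .. code-block:: python

         import numpy as np


         def hover_speeds(mass, k_thrust, n_props, g=9.81):
             """Per-propeller speed (rad/s) at which total thrust equals weight.

             Assumes thrust per propeller is k_thrust * omega**2 and that all
             propellers point straight up and share the load equally.
             """
             omega = np.sqrt(mass * g / (n_props * k_thrust))
             return np.full(n_props, omega)  # one speed signal per propeller

      These are actual speeds (rad/s), i.e. the first kind of signal described for this class; an environment set up for voltage signals would need them converted first.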