Reinforcement Learning Environment Tutorial (Python 3)
In this tutorial, we will create a Reinforcement Learning environment similar to OpenAI Gym Pendulum-v0. We will use a Vortex Studio model of the inverted pendulum, which is a part connected to a reference frame (static part) using a Hinge constraint. This Environment will be compatible with a Keras DDPG (Deep Deterministic Policy Gradient) Agent. The training algorithm is already coded, so we need to create an Environment (env class) that interfaces the AI Agent with the Vortex simulation.
| Name | Location |
|---|---|
| Reinforcement Learning Environment | <Vortex Studio Installation Folder>\tutorials\Python\Vortex\Python\Vortex\PyLearningEnvironment |
Looking at the file structure
Before we begin, let's look at the files available in the PyLearningEnvironment folder.
Vortex Resources Folder
This folder contains Setup.vxc. It's a barebones application setup with an Engine module and a Debugger Window.
In "Pendulum", you will find also the Pendulum Mechanism, Assembly, Graphics Gallery and embedded scripts. Open Pendulum.vxmechanism with the Vortex Editor. You will notice there are two VHL Interfaces, "RL Interface" and "Settings".
In "Settings", you'll be able to modify the pendulum mass and color.
"RL Interface" will be used to send and receive data to and from the DDPG Agent. It outputs observations cos, sin and speed (ω), which are derived from the rotational position and speed of the pendulum (see image below). It inputs torque (τ) , which is the added torque on the hinge, and the only output from the DDPG Agent. episode and reward are there solely for the HUD.
In this example, we are rewarding the Agent for maximizing -cos (standing upright) and minimizing abs(sin) and speed (to prevent the Agent from spinning really fast). This reward is updated each time the algorithm extracts observations, and the total reward is then written to our RL Interface VHL. In the Vortex mechanism, this is only used for the HUD display.
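A minimal sketch of a reward with this shape is shown below; the exact terms and weights in environment_solution.py may differ, and the 0.1 factor on speed is an assumption:

```python
def compute_reward(cos_obs, sin_obs, speed):
    # Reward pointing upward (-cos is largest when the pendulum is upright)
    # and penalize lateral offset and high angular speed.
    # The 0.1 weight on speed is illustrative, not the tutorial's value.
    return -cos_obs - abs(sin_obs) - 0.1 * abs(speed)
```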
An Episode is a run of the simulation, stopped either by a timeout or by a critical event (like a collision).
models.py
Script that contains the actor and critic models. These models use the TensorFlow Keras libraries; more details can be found here.
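As an illustration of what such a model can look like, here is a minimal actor network in the style of the Keras DDPG example; the layer sizes are assumptions, and models.py may differ:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_actor(num_states=3, num_actions=1):
    # Maps the three observations (cos, sin, speed) to a torque in [-1, 1].
    # Layer sizes are illustrative; see models.py for the tutorial's version.
    inputs = layers.Input(shape=(num_states,))
    x = layers.Dense(256, activation="relu")(inputs)
    x = layers.Dense(256, activation="relu")(x)
    outputs = layers.Dense(num_actions, activation="tanh")(x)
    return tf.keras.Model(inputs, outputs)
```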
train_ddpg.py
This Python script contains the main loop that trains the model. This code is heavily inspired by the example provided by Keras for a DDPG algorithm. You can refer to it here. You'll notice that this script imports the env class from environment, which is the Vortex Studio Gym-like environment we're going to be building.
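The interaction between that loop and the environment follows the usual Gym pattern. The sketch below is not the actual training script: a random torque stands in for the DDPG actor, and it assumes env exposes the classic reset() and step() methods (with a Gym-style (observation, reward, done, info) return) that we build later in this tutorial:

```python
import numpy as np
from environment import env

environment = env()

for episode in range(10):                 # episode count is illustrative
    state = environment.reset()           # start a fresh episode
    episodic_reward = 0.0
    done = False
    while not done:
        # A real run queries the DDPG actor (plus exploration noise) here;
        # a random torque in [-1, 1] stands in for it in this sketch.
        action = np.random.uniform(-1.0, 1.0, size=(1,))
        state, reward, done, info = environment.step(action)
        episodic_reward += reward
    print(f"Episode {episode}: total reward {episodic_reward:.2f}")
```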
If you want to find out more about Reinforcement Learning, we highly recommend the machine learning tutorials by Sentdex.
run_ddpg.py
This script lets you load a saved model and run it, without any training or noise.
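Conceptually it looks like the sketch below, which is not the tutorial's script: the file name pendulum_actor.h5 is hypothetical, and it assumes the actor was saved with Keras' standard save/load_model API:

```python
import numpy as np
import tensorflow as tf
from environment import env

# Hypothetical file name; use whatever train_ddpg.py actually saved.
actor = tf.keras.models.load_model('pendulum_actor.h5')
environment = env()

state = environment.reset()
done = False
while not done:
    # No exploration noise: act greedily with the trained actor.
    action = actor(np.expand_dims(state, axis=0)).numpy()[0]
    state, reward, done, info = environment.step(action)
```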
environment.py
The gym environment script that will create a Vortex Studio Application, load the Pendulum, and calculate the reward. For now, this script file is empty; it will be the main focus of this tutorial.
environment_solution.py
This file contains the solution for this tutorial. You can simply rename it to environment.py and skip this tutorial to get the solution. But what's the fun in that?
Setting up the Python 3 Interpreter
Install a Python 3.8.6 64-bit distribution.
Add this Environment Variable to your System variables:
PYTHONPATH="C:\CM Labs\Vortex Studio <version>\bin"
Where <version> is your installed version (ex. 2020b).
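A quick way to confirm the variable is picked up is to try importing the Vortex bindings from that interpreter (a sanity check, not a tutorial step):

```python
# Succeeds only if PYTHONPATH points at the Vortex Studio bin folder.
import Vortex
import vxatp3
print("Vortex Python bindings loaded")
```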
Install the following packages to your interpreter:
TensorFlow
NumPy
Matplotlib
Open "Resources/Setup.vxc".
Select Setup in the Explorer to access its properties.
In the Python 3 section of its Parameters, change the Interpreter Directory to your Python 3 interpreter folder. By default, Python installs in "C:/Users/<user>/AppData/Local/Programs/Python/Python38".
Defining the env class, its constructor and destructor
Open environment.py
Add imports.
```python
import Vortex
import vxatp3
import numpy as np
```

We import vxatp3, which is the Vortex Automated Test Platform library. It's a toolset designed to allow automated testing of Vortex, which is very useful in the case of Reinforcement Learning, where you have to load and reload the same environment over and over. This library is included when installing Vortex to your Python 3.8 interpreter.
Next, we'll create the env class and its __init__ function. At this step, we'll define the setup file and the mechanism to be loaded. We'll also create the Vortex Application and a 3D display.
```python
class env():

    def __init__(self):
        # VxMechanism variable for the mechanism to be loaded.
        self.vxmechanism = None
        self.mechanism = None
        self.interface = None

        # Define the setup and mechanism file paths
        self.setup_file = 'Resources/Setup.vxc'
        self.content_file = 'Resources/Pendulum/Pendulum.vxmechanism'

        # Create the Vortex Application
        self.application = vxatp3.VxATPConfig.createApplication(self, 'Pendulum App', self.setup_file)

        # Create a display window
        self.display = Vortex.VxExtensionFactory.create(Vortex.DisplayICD.kExtensionFactoryKey)
        self.display.getInput(Vortex.DisplayICD.kPlacementMode).setValue("Windowed")
        self.display.setName('3D Display')
        self.display.getInput(Vortex.DisplayICD.kPlacement).setValue(Vortex.VxVector4(50, 50, 1280, 720))
```

Still in the __init__ function, we need to define the observation and action spaces. This code defines the shape and expected ranges of actions and observations. It is very similar to what can be found in Pendulum-v0. You might notice that the torque output from the model is [-1, 1] N.m, which is very low compared to what it takes to even move the pendulum. This is because Neural Networks work better when their outputs vary between -1 and 1, so this value will be multiplied later in the code.
```python
        # Initialize Action and Observation Spaces for the NN
        self.max_speed = 8.0
        self.max_torque = 1.0

        high = np.array([1., 1., self.max_speed])
        self.action_space = np.array([-self.max_torque, self.max_torque, (1,)])
        self.observation_space = np.array([-high, high])
```
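The multiplication itself happens in the step logic written later in the tutorial. As a hedged illustration (the multiplier below is an assumption, not the value used in environment_solution.py), scaling the network output could look like this:

```python
import numpy as np

def scale_action(action, max_torque=1.0, torque_multiplier=500.0):
    # Clip the network output to the declared action range, then scale it
    # up to a torque large enough to actually move the pendulum.
    # torque_multiplier is illustrative only.
    clipped = np.clip(action, -max_torque, max_torque)
    return float(clipped[0]) * torque_multiplier
```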
Finally, we define a simple destructor method to handle the cleanup of the Vortex Application.

```python
    def __del__(self):
        # It is always a good idea to destroy the VxApplication when we are done with it.
        self.application = None
```