About • Hello World • Framework Architecture • Installation • Personalization • Algorithm • Credits • Support • License
The main objective of our open-source framework WiseRL is to facilitate distributed reinforcement learning by implementing distributed data sampling and centralized training. It also serves to elucidate the principles of Reinforcement Learning.
Reinforcement learning is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives while interacting with a complex, uncertain environment.
-- Sutton and Barto
In the main function, WiseRL is used to create and manage agents for single-agent or multi-agent algorithms. Training is parallelized with the Ray library, while interaction with the environment uses multi-processing. By combining reinforcement learning algorithm implementations with Ray's parallel processing and multi-processing, the framework simplifies the development of, and experimentation with, distributed reinforcement learning algorithms.
In this section, we'll use a "Hello World" example to guide you through Ray tasks and actors, on which WiseRL is built, as well as how to work with instance objects.
The first step is to import and initialize Ray:
```python
import ray

ray.init()
```
Ray lets you run functions as remote tasks in the cluster. To do this, you decorate your function with `@ray.remote` to declare that you want to run it remotely. You then call that function with `.remote()` instead of calling it normally. This remote call returns a future, a so-called Ray object reference, that you can then fetch with `ray.get`:
```python
# Define a print task.
@ray.remote
def run(x):
    return "Hello World"

# Launch three parallel print tasks.
rst = [run.remote(i) for i in range(3)]

# Retrieve results.
print(ray.get(rst))
# -> ["Hello World", "Hello World", "Hello World"]
```
Ray provides actors to allow you to parallelize computation across multiple actor instances. When you instantiate a class that is a Ray actor, Ray will start a remote instance of that class in the cluster. This actor can then execute remote method calls and maintain its own internal state:
```python
# Define the GymRunner actor.
@ray.remote
class GymRunner:
    def __init__(self):
        self.i = 0

    def run(self, value):
        self.i += value
        return "Hello World " + str(self.i)

# Create a GymRunner actor.
g = GymRunner.remote()

# Submit calls to the actor. These calls run asynchronously but in
# submission order on the remote actor process.
for _ in range(100):
    g.run.remote(1)

# Retrieve final actor state (pass 0 so the counter is unchanged).
print(ray.get(g.run.remote(0)))
# -> Hello World 100
```
At the Terminal

- Set up a Python environment (Python >= 3.7) and install the corresponding version of PyTorch.

```shell
conda create -n wise python=3.8
conda activate wise
```

You can choose either the CPU version or the CUDA version of PyTorch, depending on your device. Follow the installation instructions on the official PyTorch website (https://pytorch.org).
- Enter the command:

```shell
git clone https://github.com/wiseworker/wiseRL.git
```

> [!NOTE]
> Remember to turn off your VPN when running `git clone`.

Alternatively, you can:

- Download the latest version of wiseRL.
- Open the archive and extract the contents of the `wiseRL-main` folder into your project root: `wiseRL-main/`.
Then enter the following commands at the terminal:

```shell
cd wiseRL
pip install pettingzoo pygame
pip install -e .  # run setup.py
cd example
ray start --head --port=6379
python ddpg.py
```
You have the freedom to develop your own algorithms within our framework.
- Start by adding a new algorithm folder in the `example` directory, and create a `.py` file within it to serve as the main function.

- In the `GymRunner` initialization, create the environment (`env`):

```python
self.env = gym.make("Acrobot-v1")
```

You can obtain environment information using APIs like:

```python
self.env.observation_space.shape[0]  # dimension of the observation space
self.env.action_space.shape[0]       # dimension of the action space
self.env.action_space.high[0]        # maximum value of the action space
```

These may vary depending on the environment wrapper. You can also customize your own environment; a minimal sketch follows.
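Any class exposing the standard Gym interface (`reset`, `step`, `observation_space`, `action_space`) will work with the snippets above. Below is a hypothetical sketch of such a custom environment, assuming a recent Gym/Gymnasium API; the class name `MyCustomEnv` and its toy dynamics are illustrative assumptions, not part of WiseRL.

```python
import gym
import numpy as np
from gym import spaces

class MyCustomEnv(gym.Env):
    """Toy continuous-control environment, used only to illustrate the Gym API."""

    def __init__(self):
        super().__init__()
        # 3-dimensional observation, 1-dimensional action bounded in [-2, 2].
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(3,), dtype=np.float32)
        self.action_space = spaces.Box(low=-2.0, high=2.0, shape=(1,), dtype=np.float32)
        self.state = np.zeros(3, dtype=np.float32)
        self.t = 0

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.state = self.np_random.uniform(-1.0, 1.0, size=3).astype(np.float32)
        self.t = 0
        return self.state, {}  # (observation, info), matching the newer reset API

    def step(self, action):
        # Toy dynamics: drift the state toward the action and penalize distance from zero.
        self.state = (0.9 * self.state + 0.1 * float(action[0])).astype(np.float32)
        self.t += 1
        reward = -float(np.linalg.norm(self.state))
        terminated = False
        truncated = self.t >= 200
        return self.state, reward, terminated, truncated, {}
```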
- For improved sampling efficiency, consider wrapping the environment for multi-processing to speed up the interaction and collection of environment information; see the sketch below. If your algorithm uses a replay buffer, be mindful of the dimensions of the collected information, since several environments are sampled in parallel.
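One common way to do this (an illustrative sketch using Gym's vectorized API, not WiseRL's built-in wrapper, and assuming a recent Gym/Gymnasium version) is to step several environment copies in worker processes and receive batched observations. Note the extra leading batch dimension on every returned array, which is exactly the replay-buffer pitfall mentioned above.

```python
import gym

# Run 4 copies of the environment in separate worker processes.
env_fns = [lambda: gym.make("Acrobot-v1") for _ in range(4)]
vec_env = gym.vector.AsyncVectorEnv(env_fns)

obs, info = vec_env.reset()
print(obs.shape)  # (4, obs_dim): note the extra leading batch dimension

# Step all copies at once with one action per environment.
actions = vec_env.action_space.sample()
obs, rewards, terminated, truncated, infos = vec_env.step(actions)

# When filling a replay buffer, remember to unpack this batch dimension,
# e.g. store each of the 4 transitions separately rather than as one entry.
vec_env.close()
```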
- Create your agent and get its information.

Use the `make_agent` function to create an agent when `self.rank == 0`:

```python
if self.rank == 0:
    wise_rl.make_agent(name='ddpg_agent', agent_class=DDPGAgent, sync=True, **vars(self.config))
```
Here, you need to pass several pieces of information: the agent class (`DDPGAgent`, which will be discussed later), a synchronization or asynchrony argument (`sync`), and all hyperparameters as a dictionary (`**vars(self.config)`). These become available for use inside the Agent class.
We create an agent only once and retrieve its information in every process.
- Then use `get_agent` to access the agent's model information:

```python
self.agent = wise_rl.getAgent('ddpg_agent')
```

This `self.agent` can be invoked directly later on.
- Initialize the Training Environment for Each Episode

Each new episode starts with resetting the environment and obtaining the initial state:

```python
state = self.env.reset()[0]
```

This step ensures that each episode begins from a fresh starting point in the environment, allowing the agent to learn from a diverse range of situations.
- Running the Training Loop

Within each episode, perform the following steps in a loop until the episode ends or the maximum number of steps (`max_step`) is reached:

```python
for t in range(self.config.max_step):
    action = self.agent.choose_action(state)
    next_state, reward, done, info, _ = self.env.step(action)
    ep_r += reward
    self.agent.update(state, action, reward, next_state, done)
    state = next_state
    if done or t == self.config.max_step - 1:
        break
```

- Action Selection: The agent selects an action based on the current state.
- Environment Interaction: The agent performs the action in the environment, which returns the next state, reward, and a boolean indicating whether the episode has ended (`done`).
- Agent Update: The agent updates its policy based on the transition (current state, action, reward, next state).
- State Transition: The current state is updated to the next state for the next iteration.
- Episode Termination: The loop breaks if the episode ends (`done` is `True`) or the maximum number of steps is reached.
- Monitoring Progress

After each episode, update the training information and periodically report progress:

```python
print("Episode:{}--train_info:{}".format(i, np.mean(train_info[-20:])))
plt_list.append(np.mean(train_info[-20:]))
```

- After each episode, the average reward of the last 20 episodes is printed. This helps monitor the training progress and evaluate the agent's performance over time.
- The average rewards are appended to a list (`plt_list`) so the training progress can be plotted graphically; a plotting sketch follows.
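A minimal plotting sketch, assuming `plt_list` has been filled as above and that matplotlib is available (it is not installed by WiseRL itself):

```python
import matplotlib.pyplot as plt

# Plot the running 20-episode average reward collected in plt_list.
plt.plot(plt_list)
plt.xlabel("Episode")
plt.ylabel("Average reward (last 20 episodes)")
plt.title("DDPG training progress")
plt.savefig("training_progress.png")  # or plt.show() in an interactive session
```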
- Initialization

To begin, navigate to the `wiserl/agent` directory. Here, you'll create a new agent class that inherits from the base `Agent` class. This base class should support both synchronous and asynchronous model parameter updates, which are crucial for distributed learning scenarios.

Create the DDPGAgent class:
```python
class DDPGAgent(Agent):
    def __init__(self, sync=True, **kwargs):
        super().__init__(sync)
        self.__dict__.update(kwargs)
```
In this class:

- The constructor initializes the agent, optionally synchronizing model parameters.
- `self.__dict__.update(kwargs)` allows for flexible parameter initialization, accommodating any number of additional keyword arguments.
Initialize Components:

```python
self.actor = make_actor_net()
self.critic = make_critic_net()
self.replay_buffer = ReplayBuffer(self.buffer_size, self.state_dim, self.action_dim)
self.actor_optimizer = optim.Adam(self.actor.parameters(), lr=self.lr_a)
self.critic_optimizer = optim.Adam(self.critic.parameters(), lr=self.lr_c)
```

Here, you initialize the actor and critic networks, a replay buffer for storing experiences, and optimizers for training the networks. Custom network definitions should be placed in the `net.py` file; a minimal sketch is shown below.
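As a rough illustration of what such network definitions could look like in `net.py`, here is a minimal PyTorch sketch. The class names `Actor` and `Critic`, the hidden sizes, and the `max_action` scaling are assumptions made for the sketch, not WiseRL's actual implementations.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy: maps a state to a bounded continuous action."""

    def __init__(self, state_dim, action_dim, hidden_dim=256, max_action=1.0):
        super().__init__()
        self.max_action = max_action
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, action_dim), nn.Tanh(),  # output in [-1, 1]
        )

    def forward(self, state):
        return self.max_action * self.net(state)

class Critic(nn.Module):
    """Q-function: maps a (state, action) pair to a scalar value."""

    def __init__(self, state_dim, action_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=1))
```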
- Choose Action

For selecting actions based on the current state:

```python
def choose_action(self, state):
    state = torch.FloatTensor(state).unsqueeze(0).to(device)
    action = self.actor(state)
    return action.detach().cpu().numpy()[0]
```

This method transforms the current state into a tensor, processes it through the actor network to obtain the action, and then returns the action as a NumPy array.
- Update

The update method is where the core of the DDPG algorithm's learning logic is implemented. It generally involves sampling transitions from the replay buffer and performing gradient updates on both the actor and critic networks (a sketch follows below).

- Sample Transitions: Start by sampling a batch of transitions from the replay buffer.
- Algorithm Customization: Implement the specific learning updates for the DDPG algorithm, including calculating the loss for both the actor and critic networks and performing backpropagation.
- Synchronize Models (if using asynchronous updates):
```python
def _sync_model(self):
    param = self.actor.state_dict()
    if device.type != "cpu":
        for name, mm in param.items():
            param[name] = mm.cpu()
    self._fire(param)
```
This method synchronizes the model parameters across different instances or nodes, ensuring consistency in distributed setups.
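For orientation, here is a hedged sketch of what the update step of a DDPG-style agent typically looks like. It assumes target networks `self.actor_target` / `self.critic_target`, a replay buffer exposing `store`, `size`, and `sample(batch_size)` returning batched tensors, and hyperparameters `self.batch_size`, `self.gamma`, and `self.tau`. These names are assumptions made for the sketch and may differ from WiseRL's actual `ddpg_agent.py`.

```python
import torch
import torch.nn.functional as F

# Sketch of DDPGAgent.update (shown standalone for readability).
def update(self, state, action, reward, next_state, done):
    # Store the newest transition, then learn from a random batch.
    self.replay_buffer.store(state, action, reward, next_state, done)
    if self.replay_buffer.size < self.batch_size:
        return

    s, a, r, s2, d = self.replay_buffer.sample(self.batch_size)

    # Critic update: regress Q(s, a) toward the one-step TD target.
    with torch.no_grad():
        target_q = r + self.gamma * (1 - d) * self.critic_target(s2, self.actor_target(s2))
    critic_loss = F.mse_loss(self.critic(s, a), target_q)
    self.critic_optimizer.zero_grad()
    critic_loss.backward()
    self.critic_optimizer.step()

    # Actor update: maximize Q under the current policy (minimize its negative).
    actor_loss = -self.critic(s, self.actor(s)).mean()
    self.actor_optimizer.zero_grad()
    actor_loss.backward()
    self.actor_optimizer.step()

    # Soft-update the target networks toward the online networks.
    for target, online in ((self.actor_target, self.actor), (self.critic_target, self.critic)):
        for tp, p in zip(target.parameters(), online.parameters()):
            tp.data.copy_(self.tau * p.data + (1 - self.tau) * tp.data)
```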
The `argparse` library in Python is commonly used for handling command-line arguments. It's particularly useful in machine learning projects for setting and adjusting hyperparameters externally without modifying the source code.
First, initialize the Parser:
```python
parser = argparse.ArgumentParser("Hyperparameter Setting for DDPG")
parser.add_argument("--net_dims", default=(256, 128), help="The number of neurons in hidden layers of the neural network")
```
In this snippet, we initialize an `ArgumentParser` object with a description. Then, we add a hyperparameter `--net_dims` that expects two integers, representing the number of neurons in the hidden layers of a neural network. The `default=(256, 128)` specifies default values if none are provided externally.
Then:

```python
args = parser.parse_args()
```
Here, `parse_args()` processes the command-line arguments. The `args` object now has an attribute `net_dims`, accessible via `args.net_dims`, containing the provided values for the neural network dimensions.
Later, you might encounter situations where you need to add more hyperparameters after the initial parsing. You can add these dynamically using `setattr`:

```python
setattr(args, 'state_dim', self.env.observation_space.shape[0])
```
In this example, `setattr` is used to add a new attribute `state_dim` to the `args` object. This is particularly useful for parameters like `state_dim`, which depends on the environment's observation space and can only be set after the environment is initialized.
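Putting the two pieces together, here is a hedged sketch of how the hyperparameters referenced elsewhere in this README (`lr_a`, `lr_c`, `buffer_size`, `max_step`, `state_dim`, `action_dim`) might be assembled before being passed to the agent via `**vars(args)`. The exact argument names, defaults, and the choice of `Pendulum-v1` are assumptions for illustration.

```python
import argparse
import gym

parser = argparse.ArgumentParser("Hyperparameter Setting for DDPG")
parser.add_argument("--net_dims", default=(256, 128), help="Hidden layer sizes")
parser.add_argument("--lr_a", type=float, default=1e-3, help="Actor learning rate")
parser.add_argument("--lr_c", type=float, default=1e-3, help="Critic learning rate")
parser.add_argument("--buffer_size", type=int, default=100000, help="Replay buffer capacity")
parser.add_argument("--max_step", type=int, default=500, help="Maximum steps per episode")
args = parser.parse_args()

# Environment-dependent dimensions can only be filled in after the env exists.
env = gym.make("Pendulum-v1")  # any continuous-action env; illustrative choice
setattr(args, 'state_dim', env.observation_space.shape[0])
setattr(args, 'action_dim', env.action_space.shape[0])

# vars(args) turns the namespace into a dict, matching the **vars(self.config)
# pattern used when creating the agent.
print(vars(args))
```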
You can create a set of runner objects (`runners`) for executing RL environments:

```python
runners = wise_rl.makeRunner("runner", GymRunner, args, num=2)
wise_rl.startAllRunner(runners)
```

By adjusting the `num` parameter, you can control the number of parallel processes. This makes scaling your experiments as simple as modifying a single argument, provided you have the resources to support the increased parallelism.
Alternatively, you can choose to run your code locally without Ray.
If you choose not to use Ray, your code will execute on a single machine, running one process at a time if not otherwise parallelized. This approach is simpler and doesn't require managing a distributed system, making it attractive for development, debugging, or when resources are limited.
```python
runners = GymRunner()
runners.run()
```
```
WiseRL
├── __init__.py
├── example
│   ├── ddpg
│   │   └── ddpg.py
│   ├── dqn
│   │   ├── dqn.py
│   │   └── dqn_cnn.py
│   ├── mappo
│   │   ├── mpe.py
│   │   └── mpe2.py
│   ├── ppo
│   │   ├── ppo_continuous.py
│   │   └── ppo_discrete.py
│   └── sac
│       └── sac.py
├── setup.py
├── wiserl
│   ├── agent
│   │   ├── ddpg_agent
│   │   │   ├── __pycache__
│   │   │   ├── config.py
│   │   │   └── ddpg_agent.py
│   │   ├── dqn_agent
│   │   │   ├── doul_dqn_agent.py
│   │   │   └── dqn_agent.py
│   │   ├── ppo_agent
│   │   │   ├── ppo2_agent.py
│   │   │   └── ppo_agent.py
│   │   └── sac_agent
│   │       └── sac_agent.py
│   ├── core
│   │   ├── __init__.py
│   │   ├── actor.py
│   │   ├── agent.py
│   │   ├── agent_proxy.py
│   │   ├── registre_server.py
│   │   ├── remote.py
│   │   ├── runner.py
│   │   ├── runner_proxy.py
│   │   └── wise_rl.py
│   ├── net
│   │   └── nn_net.py
│   └── utils
│       ├── __pycache__
│       ├── mem_store.py
│       ├── normalization.py
│       └── replay_buffer.py
└── wiserl.egg-info
```
Algorithm | Multi-processing | Parallel-processing |
---|---|---|
DQN | ✔️ | ✔️ |
PPO-discrete | ✔️ | ✔️ |
PPO-continuous | ✔️ | ✔️ |
DDPG | ✔️ | ✔️ |
SAC | ✔️ | ✔️ |
TRPO | ✔️ | ✔️ |
TD3 | ✔️ | ✔️ |
MADDPG | ❌ | ✔️ |
MAPPO | ❌ | ✔️ |
QMIX | ❌ | ✔️ |
VDN | ❌ | ✔️ |
Wiseworker
Reach out to me via the contact addresses on my profile.