AML-IP Workload Distribution Scenario

The AML-IP tool can be used to parallelize complex computational tasks, and it can be deployed locally, remotely, or both at the same time. Keep reading to discover how!

The Algebraic Machine Learning - Integrating Platform (AML-IP) is a communications framework that enables the simple deployment of a distributed AML architecture. It is designed to cover a wide range of actions in an AML network, and it will serve as a platform for non-expert users to create and manage different AML nodes and the communication between them, in local and remote networks, while preserving the privacy and independence of each node. It will also allow the creation of complex distributed networks, with one or multiple users working collaboratively on the same problem, and provide an easy-to-use framework that facilitates the different aspects of the AML process. Please take a look at the article "Making Algebraic Machine Learning node communication easy with AML Integration Platform" for more information on the AML Integrating Platform.

One of the biggest strengths of AML is its parallelization capability, allowing different AML engines to share their data in order to distribute the computational effort. These parallelization possibilities are exploited by the AML-IP, which differentiates three main scenarios: a) collaborative learning scenario, b) inference service scenario and c) workload distribution scenario. This article explains the last one.

The so-called Workload Distribution Scenario refers to those scenarios in AML where one computationally complex task can be divided into subtasks, which are processed in parallel by different computational nodes to finally obtain a joint result. A Workload Distribution Scenario can be deployed locally, remotely, or both at the same time.

AML-IP-Engine nodes are in charge of training an AML model by processing chunks of data. This is expensive and requires high computational power, but it is also a parallelizable process. In a workload distribution scenario, this computational effort is distributed among the available remote nodes in the network in a balanced and automatic way, regardless of their location and ownership, in order to parallelize the training process and avoid blocking other critical actions that may need to run on the same device. To publish those tasks efficiently, AML-IP implements MultiService communication, a communication protocol designed by eProsima that exploits the advantages of DDS to create a specific protocol suitable for AML.

  

 

The training dataset of an AML model is stored in a Main Node, which divides it into subsets of the original training data, referred to as Jobs, and sends them to the available Computing Nodes to perform this parallel training.
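The division into Jobs described above can be pictured with a minimal Python sketch. It does not use the AML-IP API: the function name `split_into_jobs`, the `job_size` parameter and the toy dataset are assumptions made only for illustration, and a real Main Node may partition its data differently.

```python
from typing import List, Sequence


def split_into_jobs(dataset: Sequence[str], job_size: int) -> List[List[str]]:
    """Split a dataset into consecutive Jobs of at most `job_size` items.

    Hypothetical helper: it only illustrates the idea of dividing the
    training data into subsets; it is not part of AML-IP.
    """
    if job_size < 1:
        raise ValueError("job_size must be at least 1")
    return [
        list(dataset[start:start + job_size])
        for start in range(0, len(dataset), job_size)
    ]


# Toy dataset of five samples divided into Jobs of two samples each.
jobs = split_into_jobs(["a", "b", "c", "d", "e"], job_size=2)
print(jobs)  # [['a', 'b'], ['c', 'd'], ['e']]
```

Each inner list stands for one Job that could be sent to a different Computing Node; the last Job is simply smaller when the dataset size is not a multiple of `job_size`.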

 

AML-IP nodes involved in Workload Distribution Scenario

The AML-IP nodes involved in this use case are as follows:

 

  • Main Node: This node acts as a central point for the training process. It divides the training dataset into Jobs, which could be seen as batches of data to be processed by the training algorithm. These Jobs are sent to the Computing Nodes, which process the Jobs and send back a Solution once the task has been executed. This Solution can be a new AML Atomization or a new model state.
  • Computing Node: This node waits for a Job from a Main Node and, once one is received, performs the training over a batch of AML data, returning a new AML Atomization or a new model state. Its processing function should be implemented by the user.
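The interaction between the two node kinds can be sketched in plain Python, with threads standing in for Computing Nodes and an in-process queue standing in for the network. This is only an analogy under stated assumptions: `main_node`, `computing_node` and the sentinel-based shutdown are hypothetical names and choices, not the AML-IP implementation, which communicates over DDS.

```python
import queue
import threading
from typing import Callable, Dict, List


def computing_node(jobs: queue.Queue, solutions: Dict[int, str],
                   process: Callable[[str], str]) -> None:
    """Wait for Jobs, apply the user-supplied function, store each Solution."""
    while True:
        item = jobs.get()
        if item is None:  # sentinel: no more Jobs for this worker
            break
        job_id, job = item
        solutions[job_id] = process(job)


def main_node(job_data: List[str], n_nodes: int,
              process: Callable[[str], str]) -> List[str]:
    """Publish Jobs to whichever worker is free and collect the Solutions."""
    jobs: queue.Queue = queue.Queue()
    solutions: Dict[int, str] = {}
    workers = [
        threading.Thread(target=computing_node, args=(jobs, solutions, process))
        for _ in range(n_nodes)
    ]
    for worker in workers:
        worker.start()
    for job_id, job in enumerate(job_data):
        jobs.put((job_id, job))
    for _ in workers:  # one sentinel per worker so that every thread exits
        jobs.put(None)
    for worker in workers:
        worker.join()
    # Return the Solutions in the order in which the Jobs were issued.
    return [solutions[job_id] for job_id in range(len(job_data))]


print(main_node(["first_job", "second job"], n_nodes=2, process=str.upper))
```

Because every worker pulls from the same queue, a free worker picks up the next Job as soon as it finishes the previous one, which gives a simple form of automatic load balancing.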

 

 

Demo

This project intends to simulate what a real future AML network would look like, as well as to give a simple example of how to implement, use and deploy Main and Computing Nodes.

For this demo the actual AML Engine is not provided. Instead, the AML Engine is mocked so it does not use real AML Jobs. This mocked AML Engine simulates a simple functional block that converts an input string to uppercase and randomly waits between 1 and 5 seconds.
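A mock of this kind can be written in a few lines. The sketch below is not the demo's source code: the function name `mock_engine` and its `min_wait`/`max_wait` parameters are assumptions, and the demo may draw its waiting time differently (for example, as a whole number of seconds).

```python
import random
import time


def mock_engine(job: str, min_wait: float = 1.0, max_wait: float = 5.0) -> str:
    """Stand-in for an AML Engine: wait a random time, then uppercase the Job.

    The defaults mirror the 1 to 5 second wait described in the text;
    the parameters exist only so the delay can be shortened when experimenting.
    """
    time.sleep(random.uniform(min_wait, max_wait))
    return job.upper()


# Shortened delay so that the example returns almost immediately.
print(mock_engine("first_job", min_wait=0.0, max_wait=0.1))  # FIRST_JOB
```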

The steps to install AML-IP and this demo can be found in the AML-IP Documentation.

The Main Node converts each keyboard input or argument it receives to a string (str) and sends it as a Job. When an empty string is given or the arguments run out, execution finishes and the node is destroyed.
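The input handling described above might look roughly like the following sketch. It is an assumption about the control flow, not the demo's actual script: `iter_jobs` and `read_line` are hypothetical names, and a real script would presumably pass `sys.argv[1:]` and the built-in `input` instead of the canned values used here.

```python
from typing import Callable, Iterator, List


def iter_jobs(args: List[str], read_line: Callable[[], str]) -> Iterator[str]:
    """Yield one Job string per argument, or per typed line if no arguments.

    Iteration stops when the arguments run out or an empty string is seen.
    """
    if args:
        for arg in args:
            if arg == "":
                return
            yield str(arg)
        return
    while True:
        line = read_line()
        if line == "":
            return
        yield str(line)


# Argument mode: two Jobs, then the arguments run out.
print(list(iter_jobs(["first_job", "second job"], read_line=lambda: "")))

# Keyboard mode, simulated with canned lines; the empty string ends the loop.
typed = iter(["hello", "world", ""])
print(list(iter_jobs([], read_line=lambda: next(typed))))
```

Passing the reader as a parameter keeps the loop easy to try out without a terminal; each yielded string would then be sent as one Job.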

The Solution calculated by the Computing Node is an uppercase conversion of the string received.

 

 

Execution

Open as many terminals as there are Main and Computing Nodes to run. In each of them, source the workspace and launch a node, optionally specifying the number of iterations to perform.

 

# Source colcon installation

source install/setup.bash

 

Run Main Node:

 

# To send 2 jobs

python3 ./install/amlip_demo_nodes/bin/main_node.py "first_job" "second job"

 

# To wait for keyboard input

python3 ./install/amlip_demo_nodes/bin/main_node.py

 

Run Computing Node:

 

# To answer 2 jobs

./install/amlip_demo_nodes/bin/computing_node 2

 

Future work

Stay tuned for upcoming news! Soon we will upload a new demo on how to perform distributed inferences using TensorFlow and Darknet. 

 


 

 

 

 

 
