# QuickStart¶

We are going to deploy a federated infrastructure where CometCloud will orchestrate the execution of workflows across different federated resources. In this case we will federate a Cluster and a Cloud (i.e. Chameleon Alamo). The deployment will look like as follows.

Figure: Simple CometCloud Deployment

## Download Software¶

Please visit the download section to obtain the software.

All you need for this guide is located inside the cometWorkflow.tgz file that you downloaded. This file can be uncompressed using tar vxfz cometWorkflow.tgz, and it contains three directories dist, lib, and simple_run. The two first contain libraries and the last one contains scripts and examples to help with the deployment.

## Requirements¶

This guide requires your system to have Java 1.7+ runtime installed, rsync, and python 2.6+.

## Deploy Workflow Manager¶

We deploy the workflow manager in machine1.domain.com, the workflow service is listening in port 8888 and the autonomic scheduler in port 7778. Moreover, we reference to the Task Generator/WorkflowMaster machine that is located in machine2.domain.com listening in port7777. More details about the workflow manager can be found in the Configuring Workflow Manager and Autonomic Scheduler Section.

Open a terminal and go to the directory simple_run/workflow, that you got after uncompressing the cometWorkflow.tgz file.

### Edit manager.properties file¶

publicIpManager=machine1.domain.com
portManager=8888

logFile=/tmp/WorkflowManager.log

#task generator(workflowMaster)
workflowmasterURI=machine2.domain.com:7777

#resource manager and autonomic scheduler
StartCentralManager=true
CentralManagerAddress=machine1.domain.com
CentralManagerPort=7778

MonitorInterval=60


### Starting the Service¶

Once the configuration is ready, you can start the workflow manager and autonomic scheduler by executing the script startWorkflowManager.sh. This script contains the following code that defines the CLASSPATH and executes the appropriated java class.



## Deploy Agent for Cluster¶

We deploy an agent in machine3.domain.com, which is the front-end node of a cluster. The Task Service of this agent is listening on port 8880 for request from workers and replies from Task Manager (machine where request handler is running); the Local Resource Management Service is listening in port 9889 for commands from Autonomic Scheduler; and File Server is listening in port 6668 for queries from workers.

Open a terminal and go to the directory simple_run/agent, that you got after uncompressing the cometWorkflow.tgz file.

### comet.properties file¶

Replace local file with the file that has been modified in Task Manager.

### Edit agent.properties file¶

This file contains the information required to configure the different services of an Agent. We are going to explain the content of this file by sections.

#Task Service
publicIpAgent=machine3.domain.com
portAgent=8880
MaxNoTaskRetries=100
logFile=Agent.log

#File Server
StartFileServer=true
FileServerPort=6668
FileServerMaxThreads=10
StageFileDir=/tmp/stage/

#Local Resource Management Service configuration
StartManager=true
MgmtPort=9889
CentralManagerServer=machine1.domain.com:7778
MonitorInterval=30

#Resource configuration
Resources=cluster:clusterDell


### Configuring a Cluster (clusterDell)¶

We find the clusterDell configuration file in the directory. Thus, we only need to modify the resource information, and the software . More information about how to configure a cluster can be found in Configuring a Cluster Section.

#resource information
Overhead=5
WorkerLimit=192.168.2.156:1;192.168.2.157:1
Cost=192.168.2.156:0.6;192.168.2.157:0.12
Perf=192.168.2.156:0.67;192.168.2.157:1.34
#port where worker will be started
workerPortRange=7777:7788

#supported apps
SupportedApps=testapp2,testapp3,sampleTutorial
testapp2=tassl.application.cometcloud.sample.AppWorker
testapp3=tassl.application.cometcloud.sampleNonBlocking.AppWorker
sampleTutorial=tassl.application.cometcloud.sampleTutorial.AppWorker

#Resource Information internal to the worker machine (i.e. 192.168.2.156)
SoftwareDirWorker=/cac/soft/cometcloud/
UserWorker=userid
WorkingDir=/tmp/work/


Important

SoftwareDirWorker. Software directory in worker machine. Directory where the software is expected to be in the resource for each application. For example, the software of sampleTutorial is expected to be in the path <SoftwareDirWorker>/sampleTutorial/ (i.e. /cac/soft/cometcloud/sampleTutorial/). Inside that directory we must have any binary required by our application plus the dist and lib directories with the CometCloud jar files as well as the specific worker application jar. More details can be found in Configure Cluster worker.

### Configure Agent machine¶

1. Configure Agent machine to automatically trust worker machines (SSH first handshake). First make sure you have a directory called .ssh in your home directory, if not you can create it executing mkdir ~/.ssh. Then create a file inside that directory called config (file should be ~/.ssh/config)

Host *
StrictHostKeyChecking no
ForwardAgent yes

2. Typically we would want to retrieve data from other Agents. In this way we can make use of intermediate results that are available at remote sites or we can exploit data locality by retrieving the copy of a file that is closest to us. For this we should be able to ssh with no password between Agents.

• Generate sshkey in each agent by executing ssh-keygen. Press enter to all questions and do not establish a passphrase.
• Add public ssh key to all other Agents and data storage sites from which Agents may retrieve data. Copy the content of the file ~/.ssh/id_rsa.pub and place insert it in the file ~/.ssh/authorized_keys of all other Agents. If ~/.ssh/authorized_keys does not exist, you can create it and make sure that the permissions are 0600 (chmod 0600 ~/.ssh/authorized_keys).

### Starting the Service¶

Once the configuration is ready, you can start the Agent by executing the script startAgent.sh.



### Testing Nova Client¶

We should test that our cloud configuration file works. For this we should execute the following commands in the agent VM.

$sudo apt-get install python-nova$ source novarc
$nova list$ nova image-list


### VMs Firewall¶

In a typical cloud configuration all ports are closed by default. Hence, it is required for us to open them. This is done by modifying the security group (e.g., in OpenStack there is a section called Access & Security). We need to make sure that we made modification in the default security group. The specific ports that you need to open are: (i) for worker VMs we need to open the ports specified in portRange, and (ii) for an agent VM we need to open the ports specified in portAgent, FileServerPort, MgmtPort, and the port 12342 (or the port specified in variable IsolatedProxyPort in comet.properties, if defined).

### Configure Agent machine¶

1. Configure Agent machine to automatically trust worker machines (SSH first handshake). First make sure you have a directory called .ssh in your home directory, if not you can create it executing mkdir ~/.ssh. Then create a file inside that directory called config (file should be ~/.ssh/config). Make sure that the permissions are 0600 (chmod 0600 ~/.ssh/config).

Host *
StrictHostKeyChecking no
ForwardAgent yes

2. Typically we would want to retrieve data from other Agents and from data sources (InputData field in the workflow XML). In this way we can make use of intermediate results that are available at remote sites or we can exploit data locality by retrieving the copy of a file that is closest to us. For this we should be able to ssh with no password between Agents.

• Generate sshkey in each agent by executing ssh-keygen. Press enter to all questions and do not establish a passphrase.
• Add public ssh key to all other Agents and data storage sites from which Agents may retrieve data. Copy the content of the file ~/.ssh/id_rsa.pub and place insert it in the file ~/.ssh/authorized_keys of all other Agents. If ~/.ssh/authorized_keys does not exist, you can create it and make sure that the permissions are 0600 (chmod 0600 ~/.ssh/authorized_keys).

### Starting the Service¶

Once the configuration is ready, you can start the Agent by executing the script startAgent.sh.

$./startAgent.sh  ## Executing Workflows¶ Open a terminal and go to the directory simple_run/client, that you got after uncompressing the cometWorkflow.tgz file. ### Defining workflow¶ We are going to define a workflow to execute the sample application called mapreduce. This workflow has two stages, one that generates PR tasks per input file (map method) and another that aggregates all results and produces a final consolidated result file (reduce method). The results of stage 1 will be placed in the staging area of the Agent that managed the execution of those tasks. sample.properties has different values useful for the application. We have a created a workflow called workflowSample.xml with the following content. <xflow name="SimpleWorkflow"> <!--Stages definition --> <stages> <!-- Stage 1 --> <stage id="S1" type="AppGenerateClass" value="tassl.application.cometcloud.sampleTutorial.GenerateTasks" method="map"/> <stage id="S1" type="PropertyFile" value="./sample.properties"/> <stage id="S1" type="Application" value="sampleTutorial"/> <stage id="S1" type="InputData"> <InputData value="jdiaz@sierra.futuregrid.org:/tmp/inputs/" zone="zoneA" site="siteSierra" constraint="zoneA,siteIndia,siteAlamo"/> </stage> <stage id="S1" type="Results" value="" zone="" site="" constraint=""/> <!-- Stage 2 --> <stage id="S2" type="AppGenerateClass" value="tassl.application.cometcloud.sampleTutorial.GenerateTasks" method="reduce"/> <stage id="S2" type="PropertyFile" value="./sample.properties"/> <stage id="S2" type="Application" value="sampleTutorial"/> <stage id="S2" type="InputData"> </stage> <stage id="S2" type="Results" value="jdiaz@sierra.futuregrid.org:/home/output/" zone="zoneB" site="siteSierra" constraint="siteSierra"/> </stages> <!-- Scheduling Policies --> <objectives> <objective id="S1" type="MinRunningTime" value="0" /> <objective id="S2" type="MinRunningTime" value="0" /> </objectives> <!-- Dependencies --> <transitions> <transition from="S1" to="S2" blocking="true"/> </transitions> </xflow>  Important Please make sure that you can SSH without password to the machine specified in the InputData field. Otherwise you may get an exception or the execution of the workflow may halt. Important Please make sure that the directory specified in InputData (e.g., jdiaz@sierra.futuregrid.org:/tmp/inputs/”) contains files. Otherwise, the workflow will not do anything or you may get some exceptions. #### sample.properties file¶ We define properties for the application in this file. We are going to define PR=1, which means that we create one task per input file. We also define minTime and maxTime to indicate the estimated completion time of the tasks. Finally, we define ReferenceBenchmark to allow the system estimate the completion time of the tasks in heterogeneous resources (i.e. all resources in the system have a benchmark score calculated using the same benchmark). The content of this file is as follows. PR=1 minTime=10 maxTime=100 ReferenceBenchmark=5757  #### Create dummy input files¶ As InputData we defined that the input files will be located in the machine sierra.futuregrid.org in the directory /tmp/inputs/ and that they can be accessed with the user jdiaz. The task generator will try to ssh to list the files within the directory and the agents will also ssh to retrieve the files for computation. Hence, we should have SSH no password configured. Important We need to be able to SSH no password to the machine sierra.futuregrid.org from machine2.domain.org, machine3.domain.org, and machine4.domain.org. To create the files we ssh into sierra.futuregrid.org, and create files executing the following command: $ cd /tmp/inputs
$dd if=/dev/zero of=file1 bs=1024 count=0 seek=1024$ dd if=/dev/zero of=file2 bs=1024 count=0 seek=1024
$dd if=/dev/zero of=file3 bs=1024 count=0 seek=1024$ dd if=/dev/zero of=file4 bs=1024 count=0 seek=1024
$dd if=/dev/zero of=file5 bs=1024 count=0 seek=1024  #### Managing workflow¶ • Register workflow $ workflowClient.sh -serverAddress machine1.domain.com -serverPort 8888 ­‐regWorkflow workflowSample.xml

• Check status of the workflow identified by the ID 21312541

$workflowClient.sh -serverAddress machine1.domain.com -serverPort 8888 ‐checkStatus 21312541  • Retrieve results from workflow identified by the ID 21312541. The user in the remote machine is jdiaz and I would like to get the results in my local directory /home/jdiaz/results/ $ workflowClient.sh -serverAddress machine1.domain.com -serverPort 8888 ‐retrieveResutls 21312541 -user jdiaz -path /home/jdiaz/results/
`