# NXG HPC Jobs

The code in this repository is used to run IEDB NextGen Tools on the
cluster.  The goal is to include scripts that can run pipeline stages, both
prediction stages and filter stages, on the cluster.

Currently, these are Python scripts that use the NXG API client and the
standalone command-line tool.  Some stages or pipelines may eventually be
better served by Nextflow pipelines.

## Main stage scripts

A request to run a pipeline stage is sent to the cluster through the
[Slurm API](https://slurm.schedmd.com/rest_api.html) by passing along
the stage ID.  This kicks off the wrapper script
[run_stage_prediction.sh](run_stage_prediction.sh), which sets up the
environment and launches [run_stage_prediction.py](run_stage_prediction.py).
This script uses the [NXG Client API](../nxg_client/README.md) and the
appropriate command-line tool to:

* create a temporary directory for the stage
* pull down stage information
* pull down inputs for the stage
* split jobs into smaller chunks
* submit jobs to the cluster
* aggregate results
* post results, warnings, errors, job ids back to the database
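The "split jobs into smaller chunks" and "aggregate results" steps amount to a scatter/gather over the stage inputs. The sketch below assumes the inputs are a flat list of records and uses hypothetical helper names (`split_into_chunks`, `aggregate`); the actual chunking logic in `run_stage_prediction.py` may differ.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_chunks(items: Sequence[T], chunk_size: int) -> List[List[T]]:
    """Split stage inputs into consecutive chunks of at most chunk_size items.

    Hypothetical helper: each chunk would become one cluster job.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    return [list(items[i:i + chunk_size]) for i in range(0, len(items), chunk_size)]


def aggregate(chunk_results: List[List[T]]) -> List[T]:
    """Concatenate per-chunk results into one list, preserving chunk order."""
    return [row for chunk in chunk_results for row in chunk]


if __name__ == "__main__":
    inputs = ["seq1", "seq2", "seq3", "seq4", "seq5"]
    chunks = split_into_chunks(inputs, 2)
    print(chunks)
    # In a real run each chunk would be a separate job; here the inputs are echoed back.
    print(aggregate(chunks) == inputs)
```

Keeping chunk order stable makes aggregation a plain concatenation, so results line up with the original input order.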

### Usage

The run_stage_prediction.py script is run by passing a stage
ID and an environment name (default is 'dev').

```shell
python run_stage_prediction.py \
  --stage-id f1bdd585-a162-4fa3-a271-cb790f4ed5c1 \
  --env-name dev
```

The environment name will be used to set appropriate parameters for the given run
environment.  These parameters will be pulled from [settings.py](settings.py).
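One way to let `--env-name` drive the defaults is a two-pass `argparse` setup: parse the environment name first, then build the full parser with defaults looked up from a settings mapping. This is a minimal sketch; the `SETTINGS` dictionary and its keys are hypothetical stand-ins for whatever [settings.py](settings.py) actually defines. The dev base URI and `/scratch` directory are taken from elsewhere in this README.

```python
import argparse
from typing import List, Optional

# Hypothetical stand-in for settings.py: one dict of defaults per environment.
SETTINGS = {
    "dev": {
        "base_uri": "https://api-nextgen-tools-dev.iedb.org/api/v1",
        "scratch_dir": "/scratch",
    },
}


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # First pass: only look for --env-name so we know which defaults to use.
    env_parser = argparse.ArgumentParser(add_help=False)
    env_parser.add_argument("--env-name", default="dev")
    known, _ = env_parser.parse_known_args(argv)
    if known.env_name not in SETTINGS:
        env_parser.error(f"unknown environment: {known.env_name}")
    defaults = SETTINGS[known.env_name]

    # Second pass: the full parser, with defaults taken from the chosen environment.
    parser = argparse.ArgumentParser(parents=[env_parser])
    parser.add_argument("--stage-id", required=True)
    parser.add_argument("--base-uri", default=defaults["base_uri"])
    parser.add_argument("--scratch-dir", default=defaults["scratch_dir"])
    return parser.parse_args(argv)
```

With this design, options passed explicitly on the command line still override the environment defaults.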

There are many command line options that can be used to change or fine-tune the
behavior.  Here is the usage information as of [44d0aa1f3](https://gitlab.lji.org/iedb/tools/tools-redesign/global-dependencies/nxg-tools/-/commit/44d0aa1f3870bb6510aec98c3bdfbeba4c5f0604):

```shell
usage: run_stage_prediction.py [-h] --stage-id STAGE_ID [--base-uri BASE_URI] [--env-name ENV_NAME] [--dev]
                               [--scratch-dir SCRATCH_DIR] [--stage-dir STAGE_DIR] [--cmdline-path CMDLINE_PATH]
                               [--token TOKEN] [--python-path PYTHON_PATH] [--virtualenv VIRTUALENV]
                               [--modules MODULES] [--virtualenv-nxgtools VIRTUALENV_NXGTOOLS]
                               [--modules-nxgtools MODULES_NXGTOOLS]

options:
  -h, --help            show this help message and exit
  --stage-id STAGE_ID   The stage id to submit for execution on the cluster
  --base-uri BASE_URI   The URI for the API
  --env-name ENV_NAME   Name of the environment to use and pull setttings from for all default values
  --dev                 A flag to change the URI to point to the dev server. This will override the --base_uri
                        parameter.
  --scratch-dir SCRATCH_DIR
                        The top level directory to be used for creating temporary directories. Default is temp
                        directory defined by the shell.
  --stage-dir STAGE_DIR
                        The directory in which the job will be run. If not specified, a temporary directory will be
                        created
  --cmdline-path CMDLINE_PATH
                        Path to the pepmatch command line tool
  --token TOKEN         API token for POSTing data
  --python-path PYTHON_PATH
                        Path to the python binary to used for running the command line script
  --virtualenv VIRTUALENV
                        Path to the virtualenv to be activated before running the command line script
  --modules MODULES     List of modules that need to be loaded before running the command line script
  --virtualenv-nxgtools VIRTUALENV_NXGTOOLS
                        Path to the virtualenv to be activated to run scripts in the nxg-tools project
  --modules-nxgtools MODULES_NXGTOOLS
                        List of modules that need to be loadedto run scripts in the nxg-tools project
```
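The `--scratch-dir` and `--stage-dir` options interact as described in the help text: an explicit stage directory is used as-is, and otherwise a fresh temporary directory is created under the scratch directory (or the default temp directory). A minimal sketch of that logic, with a hypothetical function name:

```python
import os
import tempfile
from typing import Optional


def resolve_stage_dir(stage_id: str, scratch_dir: Optional[str] = None,
                      stage_dir: Optional[str] = None) -> str:
    """Return the directory a stage should run in, creating it if needed."""
    if stage_dir:
        # An explicit --stage-dir wins; create it if it does not exist yet.
        os.makedirs(stage_dir, exist_ok=True)
        return stage_dir
    # Otherwise create a unique temporary directory. With dir=None, tempfile
    # falls back to its default location (TMPDIR, TEMP, TMP, then /tmp etc.).
    return tempfile.mkdtemp(prefix=f"stage-{stage_id}-", dir=scratch_dir)
```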

Note that the ```NXG_AUTH_TOKEN``` environment variable must be set in order for
the script to be able to post results and job IDs back to the database.
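Because a missing token only surfaces when the script tries to post back, it can help to check for it before any cluster jobs are submitted. A minimal sketch, with a hypothetical function name; how the token is then attached to API requests is left to the NXG client:

```python
import os
from typing import Mapping, Optional


def require_auth_token(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return NXG_AUTH_TOKEN, failing fast if it is missing or empty."""
    environ = os.environ if environ is None else environ
    token = environ.get("NXG_AUTH_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "NXG_AUTH_TOKEN is not set; results and job IDs cannot be posted back to the database"
        )
    return token
```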

## Filter stage script

*NOTE*: As of 10/14/2023, the filter stages are still being run from the API and the content of this section is outdated.

The [run_stage_filter.py](run_stage_filter.py) script will:

* pull down stage information
* pull down previous/input stage information
* pull down input stage results
* apply any filters
* post filtered results back to database

### Usage

Currently, a stage ID and an input stage ID need to be provided.  In a future version,
only the stage ID will be needed, as the input stage ID will be provided by
the API.  Here is the current usage information:

```shell
usage: run_stage_filter.py [-h] --stage_id STAGE_ID [--input_stage_id INPUT_STAGE_ID] [--base_uri BASE_URI] [--dev]

options:
  -h, --help            show this help message and exit
  --stage_id STAGE_ID   The filter stage id
  --input_stage_id INPUT_STAGE_ID
                        The input stage ID. This is only here for debugging as the input stage ID should be grabbed from the filter stage data
  --base_uri BASE_URI   The URI for the API
  --dev                 A flag to change the URI to point to the dev server. This will override the --base_uri parameter.
```

As with the ```run_stage_prediction.py``` script, the ```NXG_AUTH_TOKEN``` environment
variable must be set so that this script can post results back to the database.

As with the main stage script, a wrapper, [run_stage_filter.sh](run_stage_filter.sh),
is provided and should be called from the Slurm API.

## Shell wrapper scripts

To simplify the most common calls to the Python scripts from the Slurm
API, a shell script corresponding to each Python script is provided.  These
scripts:

* read ```NXG_BASE_URI``` from the environment, or use the dev API as the default
* source a 'secrets' file that adds ```NXG_AUTH_TOKEN``` to the environment
* load the right Python module and virtual environment
* run the Python script
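The wrappers themselves are shell scripts; the base-URI fallback and the final call are sketched in Python here for illustration only. The sketch assumes the wrapper forwards the resolved URI through the existing `--base-uri` option, which may not match what run_stage_prediction.sh actually does, and `build_stage_command` is a hypothetical name. The dev URI is the one used in the example Slurm payload in this README.

```python
import os
import sys
from typing import List, Mapping, Optional

# Dev API URI, as used in the example Slurm payload in this README.
DEV_BASE_URI = "https://api-nextgen-tools-dev.iedb.org/api/v1"


def build_stage_command(stage_id: str, environ: Optional[Mapping[str, str]] = None,
                        python: str = sys.executable) -> List[str]:
    """Build the run_stage_prediction.py call a wrapper might make (illustrative)."""
    environ = os.environ if environ is None else environ
    # Prefer NXG_BASE_URI from the environment; fall back to the dev API.
    base_uri = environ.get("NXG_BASE_URI") or DEV_BASE_URI
    return [python, "run_stage_prediction.py", "--stage-id", stage_id, "--base-uri", base_uri]
```

Passing the result to `subprocess.run(cmd, check=True)` would launch the script and raise if it exits with a non-zero status.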

## Using the Slurm API

The Slurm API is used to kick off the main and filter stage jobs.  A payload
is sent to the ```job/submit``` endpoint (note that dev and prod use different
API versions):

```
dev - http://ar-hpc-dev.internal.iedb.org:6820/slurm/v0.0.36/job/submit
prod - http://slurm.internal.iedb.org:8999/slurm/v0.0.38/job/submit
```

An example payload:

```json
{
    "job": {
        "tasks": 1,
        "name": "tc1-stage-1",
        "comment": "stage_id=efc0e6c5-c5f9-4cc6-be95-86eb641ee0c5;",
        "nodes": 1,
        "current_working_directory": "/scratch/",
        "environment": {
            "PATH": "/bin:/usr/bin/:/usr/local/bin/",
            "LD_LIBRARY_PATH": "/lib/:/lib64/:/usr/local/lib",
            "NXG_BASE_URI": "https://api-nextgen-tools-dev.iedb.org/api/v1"
        }
    },
    "script": "#!/bin/bash\n. ~/.bash_profile\n /apps/iedb-libs/nxg-tools/nxg_hpc_jobs/run_stage_prediction.sh 'efc0e6c5-c5f9-4cc6-be95-86eb641ee0c5'"
}
```
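A payload like the one above can be built and wrapped in an authenticated POST with the standard library alone. This is a sketch: `build_payload` and `build_submit_request` are hypothetical names, the URL, paths and environment values are copied from this README, and the headers are the ones described under __POST headers__. The UUID check is an added safeguard, not something the existing scripts are known to do.

```python
import json
import urllib.request
import uuid

# Values copied from the dev endpoint and example payload in this README.
DEV_SUBMIT_URL = "http://ar-hpc-dev.internal.iedb.org:6820/slurm/v0.0.36/job/submit"
WRAPPER = "/apps/iedb-libs/nxg-tools/nxg_hpc_jobs/run_stage_prediction.sh"


def build_payload(stage_id: str, name: str, base_uri: str, cwd: str = "/scratch/") -> dict:
    """Build a job/submit payload shaped like the example above."""
    # Reject anything that is not a UUID before it is interpolated into a shell script.
    stage_id = str(uuid.UUID(stage_id))
    return {
        "job": {
            "tasks": 1,
            "name": name,
            "comment": f"stage_id={stage_id};",
            "nodes": 1,
            "current_working_directory": cwd,
            "environment": {
                "PATH": "/bin:/usr/bin/:/usr/local/bin/",
                "LD_LIBRARY_PATH": "/lib/:/lib64/:/usr/local/lib",
                "NXG_BASE_URI": base_uri,
            },
        },
        "script": f"#!/bin/bash\n. ~/.bash_profile\n{WRAPPER} '{stage_id}'",
    }


def build_submit_request(url: str, payload: dict, user: str, token: str) -> urllib.request.Request:
    """Wrap the payload in a POST carrying the Slurm auth headers."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-SLURM-USER-NAME": user,
            "X-SLURM-USER-TOKEN": token,
        },
        method="POST",
    )
```

Sending is left to the caller, e.g. `with urllib.request.urlopen(req) as resp: print(json.load(resp))`; the shape of the response depends on the Slurm API version.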

__Parameters__

More specifics on the available parameters for submission can be found
in the Slurm API documentation for the [/submit](https://slurm.schedmd.com/rest_api.html#v0.0.36_job_desc_msg) endpoint.

* ```name``` will become the job name in Slurm and should ideally be
unique.
* ```current_working_directory``` is the directory where the main script is
executed and will contain ```slurm-JOBID.out```.  Passing ```standard_output``` and
```standard_error``` locations overrides this.
* ```script``` is the actual script that gets executed.  Note that we source
```.bash_profile``` explicitly because it does not appear to be sourced by default.
* ```comment``` is used to encode the stage ID in the job so that the backend
can group all related jobs together.  All downstream jobs created from the master
job will have the stage ID embedded in their comments as well.
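The `stage_id=...;` convention is a simple `key=value;` list, so encoding and decoding it takes only a few lines. The helpers below are a hypothetical sketch, not the backend's actual parser; they show how jobs could be grouped by the stage ID found in their comments.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def encode_comment(**fields: str) -> str:
    """Encode fields as 'key=value;' pairs, e.g. 'stage_id=...;'."""
    return "".join(f"{key}={value};" for key, value in fields.items())


def decode_comment(comment: str) -> Dict[str, str]:
    """Parse a 'key=value;key2=value2;' comment into a dict, skipping malformed parts."""
    fields = {}
    for part in comment.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


def group_jobs_by_stage(jobs: Iterable[Tuple[int, str]]) -> Dict[str, List[int]]:
    """Group (job_id, comment) pairs by the stage_id embedded in each comment."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for job_id, comment in jobs:
        stage_id = decode_comment(comment).get("stage_id")
        if stage_id is not None:
            groups[stage_id].append(job_id)
    return dict(groups)
```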

__POST headers__

In order to be accepted, proper credentials must be supplied with the POST
headers:

* ```X-SLURM-USER-NAME``` the username for the cluster user
* ```X-SLURM-USER-TOKEN``` the token for the cluster user - obtained from the
cluster by running ```scontrol token```.  By default these tokens are
valid for 30 minutes, but the lifespan can be extended with the ```lifespan``` option,
given in seconds.  For example, to create a token valid for 30 days:

```shell
scontrol token lifespan=2592000
```

The [get_slurm_token.py](get_slurm_token.py) script has a function that creates valid tokens, given the path to a private key file.  The private keys for the dev and prod head nodes are found at ```/var/spool/slurmd/jwt_hs256.key``` and have been copied to ```/mnt/data/secrets``` on the nextgen-tools and nextgen-tools-dev servers.
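Slurm's `auth/jwt` tokens are HS256-signed JWTs whose claims include the user name (`sun`), issue time (`iat`) and expiry (`exp`), signed with the raw contents of the key file. The sketch below builds such a token with the standard library only. It is an illustration under that assumption and not necessarily how get_slurm_token.py is written; `make_slurm_token` and `token_from_key_file` are hypothetical names.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

THIRTY_DAYS = 30 * 24 * 60 * 60  # 2592000 seconds, same as the scontrol example


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_slurm_token(key: bytes, username: str, lifespan: int = 1800,
                     now: Optional[int] = None) -> str:
    """Return an HS256 JWT with sun/iat/exp claims (sketch)."""
    issued_at = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"exp": issued_at + lifespan, "iat": issued_at, "sun": username}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def token_from_key_file(path: str, username: str, lifespan: int = THIRTY_DAYS) -> str:
    """Read the raw key file (e.g. a copy of jwt_hs256.key) and mint a token."""
    with open(path, "rb") as fh:
        return make_slurm_token(fh.read(), username, lifespan)
```

The returned string would go in the ```X-SLURM-USER-TOKEN``` header, with the same user name in ```X-SLURM-USER-NAME```.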

## Important paths

__Installation location__

This repo is currently installed on the cluster at ```/apps/iedb-libs/nxg-tools/nxg_hpc_jobs```.

__Scratch directory__

```/scratch``` is mounted on all nodes and is where all temporary files and stage
directories are created.

__Command line tool__

The T cell MHC I command line tool is on the cluster under ```/apps/iedbtools/tc_mhci-0.0.2```.

__Virtual environments__

Python virtual environments are installed under ```/apps/python-virtualenvs```.