IEDB Next-Generation Tools Peptide Variant Comparison - version 0.2-beta
=========================================================================

Introduction
------------
This package is a wrapper around the T Cell Class I tool and will run
predictors from that tool (binding, elution, immunogenicity) on a set of
paired peptides and compare the results. Additionally, it includes ICERFIRE,
which was specifically developed to quantify differences between wild-type
and mutant peptides in the context of cancer neoepitopes.

Release Notes
-------------
v0.2-beta
  - Initial public beta release

Prerequisites
-------------
The following prerequisites must be met before installing this tool:

+ Linux 64-bit environment. Most modern Linux distributions should work.
  * http://www.ubuntu.com/

+ Python 3.8 or higher
  * http://www.python.org/

+ IEDB T Cell Class I Tool
  * https://nextgen-tools-dev.iedb.org/download-all

Installation
------------
Below, we will use the example of installing to /opt/iedb_tools.

1. Extract the code and change directory:

   $ mkdir /opt/iedb_tools
   $ tar -xvzf IEDB_NG_PVC-0.2-beta.tar.gz -C /opt/iedb_tools
   $ cd /opt/iedb_tools/ng_pvc-0.2-beta

2. (Optional) If you plan to use ICERFIRE, there are a few additional steps.

   a. Create a Python virtual environment under which ICERFIRE will run.
      Note that ICERFIRE will not run under Python 3.11 or higher, so it
      may be necessary to add an additional Python installation to your
      system.

      $ python3 -m venv /path/to/icerfire_virtualenv
      $ source /path/to/icerfire_virtualenv/bin/activate
      $ pip install -r method/icerfire-1.0-executable/requirements.txt

   b. Download a copy of the IEDB PepX database. Note that this database
      is large (>100GB), so ensure you have enough free space available.
      The database can be downloaded from
      https://downloads.iedb.org/datasets/pepx/LATEST.

3. If you have already set up the T Cell Class I Python virtual
   environment, activate it:

   $ source /path/to/tc1_virtualenv/bin/activate

   Otherwise, you can create it now (note that Python 3.8 or higher is
   required for the T Cell Class I virtual environment):

   $ python3 -m venv /path/to/mhci_virtualenv
   $ source /path/to/mhci_virtualenv/bin/activate
   $ pip install -r environments/pvc-requirements.txt

4. Using a text editor (e.g., nano), update the values in `paths.py` to
   match the layout of your system.

5. Run the `configure` script:

   $ ./configure

Usage
-----
python3 src/run_pvc.py [-j] [-o]

The output format will be 'json'.

Example:

   $ python3 src/run_pvc.py -j examples/input_sequence_text.json -o output

Run the following command for a full list of options, or see the
'example_commands.txt' file in the 'src' directory for typical usage
examples:

   $ python3 src/run_pvc.py -h

Input formats
-------------
Inputs may be specified in JSON format. See the JSON files in the
'examples' directory. When multiple methods are selected, jobs will be run
serially and the output will be concatenated. This can be avoided with the
'--split' and '--aggregate' workflow, which is described below.

Here is an example JSON that illustrates the basic format:

{
  "input_sequence_text": "RKLYCVLLFLSAAE\tRKLYCVLLFLSAFE\nCVLLLSAFFEATYM\tCVLLLSAFFEFTYM\nLSAFEFTAYMINFG\tLSAFEFTFYMINFG\nEFTYMAFFGRGQNA\tEFTYMNFFGRGQNA",
  "alleles": "HLA-A*02:01",
  "predictors": [
    {
      "type": "binding",
      "method": "netmhcpan_el"
    }
  ]
}

* input_sequence_text: A list of peptide pairs. The two peptides in each
  pair are separated by a tab ("\t"), and each pair is followed by a
  newline character ("\n").
* alleles: A comma-separated string of alleles.
* predictors: A list of individual predictors to run. See the file
  examples/input_sequence_text.json from the T Cell Class I tool for a
  list of all possible predictors and options. Multiple predictors may be
  specified.
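As an illustration of the format above, here is a minimal Python sketch that
builds such an input file from a list of wild-type/variant peptide pairs.
Only the JSON fields come from this README; the peptide pairs, variable
names, and the output filename 'my_input.json' are arbitrary examples.

   # Minimal sketch: assemble a PVC input JSON from (wild-type, variant)
   # peptide pairs. The field names follow the format documented above;
   # the pairs and the filename are placeholders.
   import json

   pairs = [
       ("RKLYCVLLFLSAAE", "RKLYCVLLFLSAFE"),
       ("CVLLLSAFFEATYM", "CVLLLSAFFEFTYM"),
   ]

   payload = {
       # Peptides within a pair are tab-separated; pairs are separated by
       # newline characters, as in the example JSON above.
       "input_sequence_text": "\n".join(f"{wt}\t{mut}" for wt, mut in pairs),
       "alleles": "HLA-A*02:01",
       "predictors": [{"type": "binding", "method": "netmhcpan_el"}],
   }

   with open("my_input.json", "w") as handle:
       json.dump(payload, handle, indent=2)

The resulting file can then be passed to the tool as in the Usage section
above, e.g. python3 src/run_pvc.py -j my_input.json -o output.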
Job splitting and aggregation
-----------------------------
*NOTE* that this is an experimental workflow and this package does not
contain the code to automate job submission and aggregation. Although this
workflow is intended for IEDB internal usage, it is the only workflow
currently supported by this tool.

The workflow consists of:

1. Running PVC with the --split option to create a job_description.json
   file.
2. Running tcell_mhci.py for the peptide A and peptide B datasets
   separately.
3. Collating the results with the --aggregate option.

An illustrative command sketch of these three steps is included at the end
of this README.

The 'job_description.json' file produced with the --split option includes
the commands needed to run each individual job, its dependencies, and the
expected outputs. Each job can be executed as its dependencies are
satisfied. The job description file will also contain an aggregation job
that will combine all of the individual outputs into one JSON file.

Caveats
-------
All IEDB next-generation standalones have been developed with the primary
focus of supporting the website. Some user-facing features may be lacking,
but will be improved as these tools mature.

Contact
-------
Please contact us with any issues encountered or questions about the
software through any of the channels listed below.

IEDB Help Desk: https://help.iedb.org/
Email: help@iedb.org
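Appendix: Job splitting command sketch
--------------------------------------
The sequence below only illustrates the three steps described in the 'Job
splitting and aggregation' section. The --split and --aggregate options,
the job_description.json file, and the tcell_mhci.py script are taken from
that section; the paths, the use of -j/-o alongside --split, and the
bracketed placeholder arguments are assumptions. The exact command for each
job should be taken from the generated job_description.json.

   # Step 1: split the PVC job; this writes job_description.json, which
   # lists the individual jobs, their dependencies, and expected outputs.
   $ python3 src/run_pvc.py -j examples/input_sequence_text.json -o output --split

   # Step 2: run the T Cell Class I predictions for the peptide A and
   # peptide B datasets separately, using the commands listed in
   # job_description.json (placeholder arguments shown here).
   $ python3 /path/to/tc1/src/tcell_mhci.py <peptide A job arguments>
   $ python3 /path/to/tc1/src/tcell_mhci.py <peptide B job arguments>

   # Step 3: collate the individual outputs into a single JSON result.
   $ python3 src/run_pvc.py --aggregate <aggregation job arguments>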