IEDB Next-Generation Tools Peptide Variant Comparison - version 0.2-beta
=========================================================================

Introduction
------------
This package is a wrapper around the T Cell Class I tool and will run
predictors from that tool (binding, elution, immunogenicity) on a set of
paired peptides and compare the results. Additionally, it includes ICERFIRE,
which was specifically developed to quantify differences between wild-type
and mutant peptides in the context of cancer neoepitopes.

Release Notes
-------------
v0.2-beta
  - Initial public beta release

Prerequisites
-------------
The following prerequisites must be met before installing this tool:

+ Linux 64-bit environment. Most modern Linux distributions should work.
  * http://www.ubuntu.com/

+ Python 3.8 or higher
  * http://www.python.org/

+ IEDB T Cell Class I Tool
  * https://nextgen-tools-dev.iedb.org/download-all

Installation
------------
Below, we will use the example of installing to /opt/iedb_tools.

1. Extract the code and change directory:

   $ mkdir /opt/iedb_tools
   $ tar -xvzf IEDB_NG_PVC-0.2-beta.tar.gz -C /opt/iedb_tools
   $ cd /opt/iedb_tools/ng_pvc-0.2-beta

2. (Optional) If you plan to use ICERFIRE, there are a few additional steps.

   a. Create a Python virtual environment under which ICERFIRE will run.
      Note that ICERFIRE will not run under Python 3.11 or higher, so it
      may be necessary to add an additional Python installation to your
      system.

      $ python3 -m venv /path/to/icerfire_virtualenv
      $ source /path/to/icerfire_virtualenv/bin/activate
      $ pip install -r method/icerfire-1.0-executable/requirements.txt

   b. Download a copy of the IEDB PepX database. Note that this database
      is large (>100GB), so ensure you have enough free space available.
      The database can be downloaded from
      https://downloads.iedb.org/datasets/pepx/LATEST.

3. If you have already set up the T Cell Class I Python virtual
   environment, activate it:

   $ source /path/to/tc1_virtualenv/bin/activate

   Otherwise, you can create it now (note that Python 3.8 or higher is
   required for the T Cell Class I virtual environment):

   $ python3 -m venv /path/to/mhci_virtualenv
   $ source /path/to/mhci_virtualenv/bin/activate
   $ pip install -r environments/pvc-requirements.txt

4. Using a text editor (e.g., nano), update the values in `paths.py` to
   match the layout of your system.

5. Run the `configure` script:

   $ ./configure

Usage
-----
python3 src/run_pvc.py [-j] [-o]

The output format will be 'json'.

Example:

   $ python3 src/run_pvc.py -j examples/input_sequence_text.json -o output

Run the following command for a full list of options, or see the
'example_commands.txt' file in the 'src' directory for typical usage
examples:

   $ python3 src/run_pvc.py -h

Input formats
-------------
Inputs may be specified in JSON format. See the JSON files in the
'examples' directory. When multiple methods are selected, jobs will be run
serially and the output will be concatenated. This can be avoided with the
'--split' and '--aggregate' workflow, which is described below.

Here is an example JSON that illustrates the basic format:

{
  "input_sequence_text": "RKLYCVLLFLSAAE\tRKLYCVLLFLSAFE\nCVLLLSAFFEATYM\tCVLLLSAFFEFTYM\nLSAFEFTAYMINFG\tLSAFEFTFYMINFG\nEFTYMAFFGRGQNA\tEFTYMNFFGRGQNA",
  "alleles": "HLA-A*02:01",
  "predictors": [
    {
      "type": "binding",
      "method": "netmhcpan_el"
    }
  ]
}

* input_sequence_text: A list of peptide pairs. The two peptides in each
  pair are separated by a tab ("\t"), and each pair is followed by a
  newline character ("\n").
* alleles: A comma-separated string of alleles.
* predictors: A list of individual predictors to run. See the file
  examples/input_sequence_text.json from the T Cell Class I tool for a
  list of all possible predictors and options. Multiple predictors may be
  specified.
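As an illustration of the format above, here is a minimal Python sketch that
builds such an input file from a list of wild-type/variant peptide pairs.
Only the JSON fields come from this README; the peptide pairs, variable
names, and the output filename 'my_input.json' are arbitrary examples.

   # Minimal sketch: assemble a PVC input JSON from (wild-type, variant)
   # peptide pairs. The field names follow the format documented above;
   # the pairs and the filename are placeholders.
   import json

   pairs = [
       ("RKLYCVLLFLSAAE", "RKLYCVLLFLSAFE"),
       ("CVLLLSAFFEATYM", "CVLLLSAFFEFTYM"),
   ]

   payload = {
       # Peptides within a pair are tab-separated; pairs are separated by
       # newline characters, as in the example JSON above.
       "input_sequence_text": "\n".join(f"{wt}\t{mut}" for wt, mut in pairs),
       "alleles": "HLA-A*02:01",
       "predictors": [{"type": "binding", "method": "netmhcpan_el"}],
   }

   with open("my_input.json", "w") as handle:
       json.dump(payload, handle, indent=2)

The resulting file can then be passed to the tool as in the Usage section
above, e.g. python3 src/run_pvc.py -j my_input.json -o output.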
Job splitting and aggregation
-----------------------------
*NOTE* that this is an experimental workflow and this package does not
contain the code to automate job submission and aggregation. Although this
workflow is intended for IEDB internal usage, it is the only workflow
currently supported by this tool.

The workflow consists of:

1. Running PVC with the --split option to create a job_description.json
   file.
2. Running tcell_mhci.py for the peptide A and peptide B datasets
   separately.
3. Collating the results with the --aggregate option.

An illustrative command sketch of these three steps is included at the end
of this README.

The 'job_description.json' file produced with the --split option includes
the commands needed to run each individual job, its dependencies, and the
expected outputs. Each job can be executed as its dependencies are
satisfied. The job description file will also contain an aggregation job
that will combine all of the individual outputs into one JSON file.

Caveats
-------
All IEDB next-generation standalones have been developed with the primary
focus of supporting the website. Some user-facing features may be lacking,
but will be improved as these tools mature.

Contact
-------
Please contact us with any issues encountered or questions about the
software through any of the channels listed below.

IEDB Help Desk: https://help.iedb.org/
Email: help@iedb.org
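Appendix: Job splitting command sketch
--------------------------------------
The sequence below only illustrates the three steps described in the 'Job
splitting and aggregation' section. The --split and --aggregate options,
the job_description.json file, and the tcell_mhci.py script are taken from
that section; the paths, the use of -j/-o alongside --split, and the
bracketed placeholder arguments are assumptions. The exact command for each
job should be taken from the generated job_description.json.

   # Step 1: split the PVC job; this writes job_description.json, which
   # lists the individual jobs, their dependencies, and expected outputs.
   $ python3 src/run_pvc.py -j examples/input_sequence_text.json -o output --split

   # Step 2: run the T Cell Class I predictions for the peptide A and
   # peptide B datasets separately, using the commands listed in
   # job_description.json (placeholder arguments shown here).
   $ python3 /path/to/tc1/src/tcell_mhci.py <peptide A job arguments>
   $ python3 /path/to/tc1/src/tcell_mhci.py <peptide B job arguments>

   # Step 3: collate the individual outputs into a single JSON result.
   $ python3 src/run_pvc.py --aggregate <aggregation job arguments>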