IEDB Next-Generation Tools PEPMatch - version 0.1 beta
======================================================

Introduction
------------
PEPMatch takes a list of peptides as input and searches a reference
proteome for peptides that match them with up to a user-specified number
of substitutions (mismatches).

This command-line tool powers the PEPMatch tool at
https://nextgen-tools.iedb.org/pipeline?tool=pepmatch. It is a wrapper
layer around version 0.9.6 of PEPMatch, which is available on GitHub:
https://github.com/IEDB/PEPMatch/.


Release Notes
-------------
v0.1 beta - Initial public beta release


Prerequisites
-------------
The following prerequisites must be met before installing the tools:

 + Python 3.7 or higher
   * http://www.python.org/

The following prerequisites must be met before running the tools:

 + Preprocessed proteome files. These may be downloaded from:
   https://downloads.iedb.org/datasets/pepmatch-proteome/preprocessed/LATEST
 + 8 GB of RAM or more


Installation
------------
Below, we will use the example of installing to /opt/iedb_tools.

1. Extract the code and change directory:

  $ mkdir /opt/iedb_tools
  $ tar -xvzf IEDB_NG_PEPMATCH-VERSION.tar.gz -C /opt/iedb_tools
  $ cd /opt/iedb_tools/ng_pepmatch-VERSION

2. Optionally, create and activate a Python 3.7+ virtual environment
   using your favorite virtual environment manager. Here, we will assume
   the virtual environment is at ~/venvs/pepmatch:

  $ python3 -m venv ~/venvs/pepmatch
  $ source ~/venvs/pepmatch/bin/activate

3. Install the Python requirements:

  $ pip install -r requirements.txt


Usage
-----
python3 src/match.py -j <input_json_file> [-o <output_prefix>] [-f <output_format>]

The format of the input JSON file is described below. The output_prefix
and output_format arguments are optional. By default, the output is
printed to the screen in TSV format. Valid values for output_format are
'tsv' and 'json'.


Input formats
-------------
Currently, only JSON input is supported.
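Rather than writing the JSON by hand, the input file can be generated from a script. The Python sketch below is one possible approach and is not part of this package: the helper names `fasta_to_sequence_text` and `build_pepmatch_input` are hypothetical, and the field names and allowed proteome names are taken from the input description in this README. Because `json.dumps` escapes embedded newlines as `\n`, it produces the same kind of `input_sequence_text` string as the awk one-liner, without a separate conversion step.

```python
import json
from pathlib import Path

# Proteome names accepted by this version, copied from the list in this README.
ALLOWED_PROTEOMES = {"cow", "dog", "horse", "human", "mouse", "pig", "rabbit", "rat"}


def fasta_to_sequence_text(fasta_path):
    """Hypothetical helper: join the lines of a FASTA file with newlines.

    Similar to the awk one-liner: every line (headers included) is kept,
    and the lines are separated by newline characters.
    """
    lines = Path(fasta_path).read_text().splitlines()
    return "\n".join(lines)


def build_pepmatch_input(sequence_text, proteome, mismatch=3, best_match=True):
    """Hypothetical helper: return a dict with the PEPMatch input fields."""
    if proteome not in ALLOWED_PROTEOMES:
        raise ValueError(f"unsupported proteome: {proteome!r}")
    # Simple sanity check; the tool itself may validate differently.
    if mismatch < 0:
        raise ValueError("mismatch must be non-negative")
    return {
        "input_sequence_text": sequence_text,
        "mismatch": mismatch,
        "proteome": proteome,
        "best_match": best_match,
    }


if __name__ == "__main__":
    # Example peptides taken from the sample input in this README.
    peptides = ["DDEDSKQNIFHFLYR", "ADPGPHLMGGGGRAK", "KAVELGVKLLHAFHT"]
    params = build_pepmatch_input("\n".join(peptides), "human", mismatch=3)
    # json.dumps escapes the embedded newlines as \n, like the awk command.
    print(json.dumps(params, indent=2))
```

For example, saving the printed JSON with `python3 make_input.py > input.json` (the script name is hypothetical) gives a file that can be passed to `src/match.py` with `-j`.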
The input file must be a JSON object in the following format:

{
  "input_sequence_text": "DDEDSKQNIFHFLYR\nADPGPHLMGGGGRAK\nKAVELGVKLLHAFHT\nQLQNLGINPANIGLS\nHEVWFFGLQYVDSKG",
  "mismatch": 3,
  "proteome": "human",
  "best_match": true
}

 * input_sequence_text: the input peptides as a FASTA-formatted string,
   with lines separated by \n (the example above simply lists one peptide
   per line). To create an appropriate string from a FASTA file (replace
   peptides.fasta with your own file):

     awk '{printf "%s\\n", $0}' peptides.fasta

 * mismatch: the maximum number of mismatches (substitutions) to allow in
   the search
 * proteome: this must correspond to one of the preprocessed proteome
   names listed below. Future versions of this tool will allow for custom
   proteomes.
     - cow
     - dog
     - horse
     - human
     - mouse
     - pig
     - rabbit
     - rat
 * best_match: if true, return only the best match per peptide. If false,
   return all matches at or below the mismatch threshold.


Caveats
-------
All IEDB next-generation standalones have been developed with the primary
focus of supporting the website. Some user-facing features may be lacking,
but they will be improved as these tools mature.


Contact
-------
Please contact us with any issues encountered or questions about the
software through any of the channels listed below.

IEDB Help Desk: https://help.iedb.org/
Email: help@iedb.org