AXEL-F - Standalone version 1.1.0
==========================================

Introduction
------------
This package includes the AXEL-F prediction tool, which improves epitope prediction by taking into account
both peptide-MHC binding affinities and the expression levels of the peptides' source proteins.
The collection contains Python scripts that run in Linux-based environments and a Dockerfile that allows
users to build an image containing the AXEL-F tool.


Prerequisites:
-------------

+ Python 3.6 or higher
  * http://www.python.org/

+ tcsh
  * http://www.tcsh.org/Welcome
    - Under Ubuntu: sudo apt-get install tcsh


Installation (Linux environment):
--------------------------------
Unpack the tar.gz file (IEDB_AXELF-VERSION.tar.gz), then install the packages listed in 'requirements.txt',
which AXEL-F depends on.

$ tar -xzvf IEDB_AXELF-VERSION.tar.gz
$ cd axelf
$ pip install -r requirements.txt

Installation (Mac OS or others):
-------------------------------
AXEL-F does not work properly under macOS. As a workaround, build a Docker image and run AXEL-F in a container:

$ tar -xzvf IEDB_AXELF-VERSION.tar.gz
$ cd axelf
$ docker build -t axelf_img .

Help:
----
On Linux  : python run_axelf.py -h  (or: python run_axelf.py --help)
Container : docker run --rm axelf_img python run_axelf.py --help


View all available alleles for Axelf:
------------------------------------
On Linux  : python run_axelf.py -p
Container : docker run --rm axelf_img python run_axelf.py -p


CSV Examples:
------------
When providing a CSV file as input to AXEL-F, it must be in valid CSV format. A valid
CSV file contains a header row, and no row may have missing data or empty cells.
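Before running AXEL-F, it can help to check a CSV file against the two rules above. The sketch below uses only
Python's standard csv module; check_axelf_csv is a hypothetical helper, not part of AXEL-F. It cannot tell
whether the first row really is a header, only that it has no empty column names, and it flags any row whose
cell count differs from the header or that contains an empty cell.

```python
import csv
import io


def check_axelf_csv(text):
    """Return a list of problems found in CSV text; an empty list means no problems were found.

    Hypothetical helper (not part of AXEL-F): it only checks that a first row exists,
    that it has no empty column names, and that every later row has the same number
    of cells as that row with none of them empty.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    header = rows[0]
    problems = []
    if any(not name.strip() for name in header):
        problems.append("header has an empty column name")
    # Data rows start on line 2 of the file.
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(f"line {line_no}: expected {len(header)} cells, found {len(row)}")
        elif any(not cell.strip() for cell in row):
            problems.append(f"line {line_no}: empty cell")
    return problems


sample = "peptide\nADMGHLKY\nELDDTLKY\n"
print(check_axelf_csv(sample))  # prints []
```

To check a file on disk, pass its contents, e.g. check_axelf_csv(open(path, newline="").read()). This does not
replace AXEL-F's own input checks.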

1. CSV file with peptide sequences only.
The simplest CSV file contains only peptide sequences.
Ex) peptide
    ADMGHLKY
    ELDDTLKY
    FMDHVLRY
    FSDLPLRV

With a file like the one above, the allele must be provided on the command line, along with either
a TPM value or TCGA data. Here are some example commands:

* Providing the allele with a TPM value.
On Linux  : python run_axelf.py tests/data/input/peptide_input.csv -a "HLA-A*01:01" -t 3.0
Container : docker run --rm axelf_img python run_axelf.py tests/data/input/peptide_input.csv -a "HLA-A*01:01" -t 3.0

* Providing the allele with TCGA data (MUST specify cancer type and gene name).
On Linux  : python run_axelf.py tests/data/input/peptide_input.csv -s tcga -c CESC -g TIGAR -a "HLA-A*01:01"
Container : docker run --rm axelf_img python run_axelf.py tests/data/input/peptide_input.csv -s tcga -c CESC -g TIGAR -a "HLA-A*01:01"

2. CSV file with peptide sequences and allele names only.
AXEL-F can also take a CSV file that includes allele information alongside the peptides. In this case,
the allele flag (-a) is not necessary unless you want to override the alleles in the CSV file.
Ex) peptide,allele
    ADMGHLKY,HLA-A*01:01
    ELDDTLKY,HLA-A*01:01
    FMDHVLRY,HLA-A*01:01
    FSDLPLRV,HLA-A*01:01

* Providing a TPM value.
On Linux  : python run_axelf.py tests/data/input/peptide_allele_input.csv -t 3.0
Container : docker run --rm axelf_img python run_axelf.py tests/data/input/peptide_allele_input.csv -t 3.0

* Providing TCGA data (MUST specify cancer type and gene name).
On Linux  : python run_axelf.py tests/data/input/peptide_allele_input.csv -s tcga -c CESC -g TIGAR
Container : docker run --rm axelf_img python run_axelf.py tests/data/input/peptide_allele_input.csv -s tcga -c CESC -g TIGAR

3. CSV file with peptide sequences, alleles, and TPM values.
When using TPM values, you may also include them as a 'tpm' column in the CSV file.
Ex) peptide,allele,tpm
    ADMGHLKY,HLA-A*01:01,122.985
    ELDDTLKY,HLA-A*01:01,34.705
    FMDHVLRY,HLA-A*01:01,16.2825
    FSDLPLRV,HLA-A*01:01,3.7025

* With such a CSV file, no flags need to be specified.
On Linux  : python run_axelf.py tests/data/input/sample_input.csv
Container : docker run --rm axelf_img python run_axelf.py tests/data/input/sample_input.csv
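If your TPM values come from another Python workflow, one way to produce a file with this layout is the
standard csv module. This is a minimal sketch: write_axelf_csv and the output name my_input.csv are
hypothetical, and the rows are copied from the example above.

```python
import csv

# (peptide, allele, tpm) rows copied from the example above.
rows = [
    ("ADMGHLKY", "HLA-A*01:01", 122.985),
    ("ELDDTLKY", "HLA-A*01:01", 34.705),
    ("FMDHVLRY", "HLA-A*01:01", 16.2825),
    ("FSDLPLRV", "HLA-A*01:01", 3.7025),
]


def write_axelf_csv(path, rows):
    """Write rows under the header peptide,allele,tpm (hypothetical helper)."""
    # newline="" lets the csv module control line endings, as its docs recommend.
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["peptide", "allele", "tpm"])
        writer.writerows(rows)


write_axelf_csv("my_input.csv", rows)
```

The resulting file can then be passed without extra flags, e.g. python run_axelf.py my_input.csv.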

4. CSV file with peptide sequences, alleles, and gene names.
If peptides come from different genes, simply add a column containing the gene name for each peptide.
Ex) allele,peptide,gene name
    HLA-B*52:01,EGMKTQYSV,RP11-368I23.2
    HLA-C*12:02,YLASLHPRL,RP11-167B3.1
    HLA-A*26:01,ELFQGSDLGV,RP11-742D12.2
    HLA-B*38:01,LRDDKDNIERL,RAB4B

* Providing TCGA data by specifying the cancer type only (gene names are read from the CSV file).
On Linux  : python run_axelf.py tests/data/input/sample_input2.csv -s tcga -c CESC
Container : docker run --rm axelf_img python run_axelf.py tests/data/input/sample_input2.csv -s tcga -c CESC


FASTA Examples:
--------------
When providing a FASTA file as input to AXEL-F, it must be in valid FASTA format. According to the NIH,
a valid FASTA file consists of a single-line description followed by lines of sequence data.
The description line is distinguished from the sequence data by a ">" symbol at the beginning.

Ex) >SEQUENCE_1
    MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
    LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHK
    IPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTL
    MGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL
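The format rules above can be illustrated with a small parser. This is a minimal sketch in plain Python, not
AXEL-F's own reader; read_fasta is a hypothetical name. It returns (description, sequence) pairs, so repeated
descriptions are kept separately rather than silently merged.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (description, sequence) pairs.

    A line starting with ">" begins a new record; the sequence lines that follow
    are concatenated. Blank lines are skipped. Sequence data appearing before the
    first description line is treated as an error.
    """
    records = []
    name = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name = line[1:].strip()
            chunks = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' description line")
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records
```

For example, read_fasta(open(path).read()) on the file above would yield one record named SEQUENCE_1 whose
sequence is the four data lines joined together.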

Also note that whenever a FASTA file is used as input, the peptide length must be specified with the "-l" or
"--peptide-length" flag. This allows the FASTA sequence to be broken up into k-mers (peptides of length k), which are then processed.
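The k-mer step can be sketched as a sliding window. This assumes overlapping windows that advance one residue
at a time, which is the usual meaning of k-mers but is not confirmed by this README; kmers is a hypothetical
helper. Under that assumption, running with -l 8 on a sequence of length n would yield n - 7 peptides.

```python
def kmers(sequence, k):
    """Return every overlapping substring of length k (window step of 1).

    Returns an empty list when the sequence is shorter than k.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


print(kmers("MTEITAAMVK", 8))  # prints ['MTEITAAM', 'TEITAAMV', 'EITAAMVK']
```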

1. Providing a TPM value.
On Linux  : python run_axelf.py tests/data/input/sample_input.fasta --tpm 3.0 -l 8 -a "HLA-A*01:01"
Container : docker run --rm axelf_img python run_axelf.py tests/data/input/sample_input.fasta --tpm 3.0 -l 8 -a "HLA-A*01:01"

2. Providing TCGA data (MUST specify cancer type and gene name).
On Linux  : python run_axelf.py tests/data/input/sample_input.fasta -s tcga -c CESC -g TIGAR -l 8 -a "HLA-C*12:02"
Container : docker run --rm axelf_img python run_axelf.py tests/data/input/sample_input.fasta -s tcga -c CESC -g TIGAR -l 8 -a "HLA-C*12:02"
