# Allele Validator

This library provides functions for retrieving allele data from several data sources: validating alleles, converting between labels, synonyms, and MRO IDs, and retrieving related information. Allele Validator relies on several files that must be preprocessed first. Follow the steps below to prepare them.

> **NOTE**: Allele Validator also prepares the datasource file used by **[Allele Autocomplete](https://gitlab.lji.org/iedb/tools/tools-redesign/api-dependencies/allele-autocomplete)**.

<br>

## Required Initial Files

- **Tools_MRO_mapping.xlsx**: Static file that shouldn't be modified (default).
- **mro_molecules.tsv**: File that must be refreshed periodically from the **[MRO GitHub repository](https://github.com/IEDB/MRO)**.
- **method-table.xlsx**: Table containing method names, versions, and the default version. This file must be updated whenever a new method or version is added.
- **allele-lengths.xlsx**: Table containing method and available allele lengths.
- **Additional_netMHCpanRV.xlsx**: File containing 94 NetMHCpan 4.1 alleles that were newly mapped and do not appear in *Tools_MRO_mapping.xlsx*.

<br>

> The ***unmapped_alleles.txt*** and ***unmapped_98_alleles.txt*** files can be disregarded; they were mainly used for analysis.<br>More information can be found in [issue #346](https://gitlab.lji.org/iedb/tools/tools-redesign/ar-redesign-prototype/-/issues/346#note_26488).

<br>

## Getting Started
Within a clean virtual environment, install the requirements:
```shell
pip install -r requirements.txt
```
In a Python script, instantiate Allele Validator:
```python
from allele_validator import AlleleValidator

def main():
    validator = AlleleValidator()
```
The library supports many different functions. Use cases for each of them can be found in the unit test files. A few are shown here:
```python
iedb_label = 'BoLA-2*026:01'
tool_label = validator.convert_iedblabel_to_methodlabel(iedb_label)
print(tool_label) # 'BoLA-2:02601'

tool_label = 'DRB5*02:02'
mroid = validator.convert_methodlabel_to_mroid(tool_label)
print(mroid) # 'MRO:0001349'
```

### Return Values
Every function in Allele Validator has `input-output agreement`: if the input is a string, the output is also a string, and if the input is a list, the output is also a list.
Here's an example:
```python
# Input is a string
tool_label = 'DRB5*02:02'
result = validator.convert_methodlabel_to_mroid(tool_label)
print(result) # 'MRO:0001349'

# Input is a list
tool_label = ['DRB5*02:02']
result = validator.convert_methodlabel_to_mroid(tool_label)
print(result) # ['MRO:0001349']
```
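
One way such input-output agreement can be achieved is to apply a single-item conversion either directly (for a string) or element by element (for a list). The sketch below is illustrative only: `with_io_agreement` and `LOOKUP` are hypothetical names that are not part of Allele Validator, and the single mapping is borrowed from the example above.

```python
from typing import Callable, List, Optional, Union

# Hypothetical lookup table; the one entry is taken from the example above.
LOOKUP = {"DRB5*02:02": "MRO:0001349"}


def with_io_agreement(
    value: Union[str, List[str]],
    convert_one: Callable[[str], Optional[str]],
) -> Union[Optional[str], List[Optional[str]]]:
    """Apply convert_one so that the output shape mirrors the input shape."""
    if isinstance(value, str):
        # String in, single value out.
        return convert_one(value)
    # List in, list out (one result per element, order preserved).
    return [convert_one(item) for item in value]


print(with_io_agreement("DRB5*02:02", LOOKUP.get))    # MRO:0001349
print(with_io_agreement(["DRB5*02:02"], LOOKUP.get))  # ['MRO:0001349']
```

In this sketch an unknown label yields `None`; how Allele Validator itself reports unknown labels is not covered here.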

## Unit Testing
All the unit testing files are located in the `tests` folder. Run the following command to run all the unit test files.
```shell
python -m unittest discover -s tests
```
> **NOTE:**<br> 
> When running locally, install packages that are necessary for the unit tests before running the above command.
> ```
> pip install -r requirements_unittest.txt
> ```
> Some packages (NetMHCpan, NetMHCIIpan, etc.) require `tcsh` and `gawk`, which prevents the tests from running on M1 MacBooks. We recommend running the tests on Windows or Linux machines.


## Regular Build

The **mro_molecules.tsv** file is regularly updated on GitHub, so the data needs to be pulled down, processed, and re-pickled periodically.
Run the following script to do so:

```
sh weekly_build.sh
```

The weekly build script consists of three stages:

1. Retrieving the latest MRO data from GitHub using the `get_latest_MRO.sh` script.
2. Running the regular build, which adds extra or newly added tool labels as synonyms, populates the predictor availability and tool_group columns, and rebuilds the datasource for Allele Autocomplete.
3. Generating pickle files of the data files.
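
As a rough illustration of the third stage, the sketch below reads a tab-separated file into a list of dictionaries and pickles it using only the standard library. It is not the project's build code: `pickle_tsv`, `load_pickle`, the file names, and the column names are all hypothetical, and the real build may store the data in a different structure.

```python
import csv
import pickle
import tempfile
from pathlib import Path


def pickle_tsv(tsv_path: Path, pickle_path: Path) -> int:
    """Read a TSV file into a list of dicts, pickle it, and return the row count."""
    with tsv_path.open(newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))
    with pickle_path.open("wb") as handle:
        pickle.dump(rows, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return len(rows)


def load_pickle(pickle_path: Path):
    """Load a previously pickled object back into memory."""
    with pickle_path.open("rb") as handle:
        return pickle.load(handle)


# Demo on a throwaway file; the columns and values are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    tsv = Path(tmp) / "example.tsv"
    tsv.write_text("mro_id\tlabel\nMRO:0000000\tEXAMPLE-LABEL\n", encoding="utf-8")
    pkl = Path(tmp) / "example.pickle"
    n_rows = pickle_tsv(tsv, pkl)
    restored = load_pickle(pkl)

print(n_rows)                # 1
print(restored[0]["label"])  # EXAMPLE-LABEL
```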

### Testing weekly_build script locally

When testing the script locally, it can be convenient to run `weekly_build.sh` multiple times. By default, however, the build only proceeds when the MRO molecules file on GitHub has been updated, so repeated runs against an unchanged file will not rebuild.

To run `weekly_build.sh` multiple times even when the latest MRO molecules file is already in place, pass `1` as a parameter:

```bash
sh weekly_build.sh 1
```
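
The gating behaviour described above can be pictured with the following sketch. How `weekly_build.sh` actually detects an updated file is not documented here; comparing SHA-256 checksums is purely an assumption, and `file_digest` and `should_build` are hypothetical names used for illustration.

```python
import hashlib
from typing import Optional


def file_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of the given file contents."""
    return hashlib.sha256(data).hexdigest()


def should_build(previous_digest: Optional[str], latest_data: bytes, force_flag: str = "") -> bool:
    """Build when forced with "1", or when the downloaded content has changed."""
    if force_flag == "1":
        return True
    return previous_digest != file_digest(latest_data)


old = file_digest(b"mro v1")
print(should_build(old, b"mro v1"))       # False: nothing changed
print(should_build(old, b"mro v1", "1"))  # True: forced
print(should_build(old, b"mro v2"))       # True: content changed
```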
