
# PAPER TITLE

This GitHub repository contains the main material used in the paper “title”, by “Authors”. The paper applies a set of machine learning algorithms to a dataset derived from the Electronic Health Records (EHR) of patients who received treatment at public addiction centers in Andalusia. The EHR system is managed by the Information System of the Andalusian Plan on Drugs (SiPASDA), which maintains a centralized dataset for all addiction centers. The EHR stores information following the standards outlined by the European Monitoring Centre for Drugs and Drug Addiction (EMCDDA, 2012).

Specifically, the study population consists of patients who were under treatment for alcohol disorders. The goal of the study is to use machine learning techniques to predict whether a given patient is likely to drop out of treatment. Such models help identify the main variables that drive dropout, so that more attention can be paid to the patients at highest risk. On the one hand, this can increase the effectiveness of the therapeutic treatment: closer attention to at-risk patients may reduce dropout and therefore increase the chances of completing the treatment successfully. On the other hand, it can improve the processes associated with these treatments and bring further benefits, such as reduced costs.

The material in this repository consists of:

* [code](https://medal.ctb.upm.es/internal/gitlab/compara/mldropoutalcohol/tree/master/code): Contains the code developed for data handling, model creation, and explainability.
    
    * [models](https://medal.ctb.upm.es/internal/gitlab/compara/mldropoutalcohol/tree/master/code/models): Contains two Python files.
    
    * [explainability](https://medal.ctb.upm.es/internal/gitlab/compara/mldropoutalcohol/tree/master/code/shap): Contains X Python files with the code used to run SHAP and obtain the explainability results.


* [results](https://medal.ctb.upm.es/internal/gitlab/compara/mldropoutalcohol/tree/master/results): Contains the supplementary material for the results shown in the paper.
    
    * [models_results.xlsx](https://medal.ctb.upm.es/internal/gitlab/compara/mldropoutalcohol/blob/master/results/models_results.xlsx): This Excel file summarizes the results of all the algorithms tested. It contains the average results of the 10-fold cross-validation.
    
    * [filtering_discretization.txt](): The filtering process that reduces the number of features is based on a discretization of the original features according to their values. For each of the two candidate sub-datasets (psychological/psychiatric disorders grouped in clusters, and psychological/psychiatric disorders not grouped), this file explains all the features selected by the filtering method.
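
To make the idea of discretization-based filtering more concrete, the following is a minimal sketch in standard-library Python. It is not the code used in the paper: the equal-width binning, the mutual-information score, the `min_score` threshold, and the feature names `informative` and `uninformative` are all assumptions chosen for illustration, since this README does not specify the actual filtering method.

```python
import math
from collections import Counter


def discretize(values, n_bins=3):
    """Equal-width binning (an assumed scheme): map each value to a bin index."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would fall into bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs, ys):
    """Mutual information, in bits, between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def filter_features(columns, labels, n_bins=3, min_score=0.1):
    """Score each discretized feature against the labels; keep the high scorers."""
    scores = {
        name: mutual_information(discretize(values, n_bins), labels)
        for name, values in columns.items()
    }
    selected = [name for name, score in scores.items() if score >= min_score]
    return selected, scores


# Hypothetical toy data: 1 = dropout, 0 = no dropout.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "informative": [20, 22, 25, 27, 50, 52, 55, 60],
    "uninformative": [1, 9, 2, 8, 1, 9, 2, 8],
}
selected, scores = filter_features(columns, labels)
print(selected, scores)
```

On this toy data only the `informative` feature is kept, because its bins separate the two classes perfectly while the bins of `uninformative` are independent of the label.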

The dataset is not provided because the authors do not have permission from the data providers to share it.
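
Without the original data, the cross-validation averages reported in the results cannot be reproduced directly. As a rough illustration of how a 10-fold cross-validation average is computed, here is a minimal standard-library sketch on synthetic data. Everything in it is an assumption made for the example: the synthetic single-feature data, the toy threshold classifier standing in for a real model, the use of accuracy as the metric, and the unstratified fold split.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_threshold(xs, ys):
    """Toy stand-in for a real model: the midpoint between the class means."""
    mean_pos = mean(x for x, y in zip(xs, ys) if y == 1)
    mean_neg = mean(x for x, y in zip(xs, ys) if y == 0)
    return (mean_pos + mean_neg) / 2


def cross_validate(xs, ys, k=10, seed=0):
    """Return the accuracy on each held-out fold."""
    accuracies = []
    for test_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        threshold = fit_threshold(
            [xs[i] for i in train_idx], [ys[i] for i in train_idx]
        )
        # Predict class 1 when the feature exceeds the learned threshold.
        correct = sum((xs[i] > threshold) == (ys[i] == 1) for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies


# Synthetic data: the positive class is shifted by two standard deviations.
rng = random.Random(42)
ys = [i % 2 for i in range(200)]
xs = [rng.gauss(2.0 if y == 1 else 0.0, 1.0) for y in ys]

fold_accuracies = cross_validate(xs, ys)
mean_accuracy = mean(fold_accuracies)
print(f"mean accuracy over {len(fold_accuracies)} folds: {mean_accuracy:.3f}")
```

Each sample is held out exactly once, and the reported figure is the mean of the ten per-fold scores. A real pipeline would typically also stratify the folds by class and report several metrics.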


For any inquiries, please contact:
* Xxx xxx email
* Yyy yyy email