# RECLASS: Multi-Task Deep Learning for App Review Classification

**COMP6013 | Oxford Brookes University | 2025-26**

---

*Note: this README is not yet finished.*

## Overview

RECLASS is a multi-task learning system for app review classification. It uses a shared multilingual transformer encoder with task-specific heads. Single-task implementations are included for optional comparison.

| Task | Output | Classes |
|------|--------|---------|
| Bug Report Detection | Binary | Yes / No |
| Feature Request Detection | Binary | Yes / No |
| Aspect Classification | Multi-class | Driver, App, Pricing, Service, Payment, General |
| Aspect Sentiment | Multi-class | Positive, Neutral, Negative |

## Dataset

- **Source**: [Uber Customer Reviews (Kaggle)](https://www.kaggle.com/datasets/khushipitroda/ola-vs-uber-play-store-reviews)
- **Original size**: ~1.07M reviews
- **After preprocessing**: ~495K reviews
- **Annotation subsets**: 5,000 from the original distribution, 5,000 from a keyword-boosted sample

## Preprocessing Steps

- Removed URLs and emails
- Normalised text and punctuation
- Removed duplicate reviews
- Filtered out reviews with fewer than 5 words
- Output sets
  - Original: matches the original distribution of the raw dataset
  - Boosted: oversamples bug reports and feature requests using keyword heuristics

## Model

- Encoder: XLM-RoBERTa (large multilingual transformer model)
- Architecture:
  - Shared encoder
  - Task-specific classification heads
- Training setups:
  - MTL (multi-task learning)
  - STL (single-task learning)

Class weights are applied to reduce the effects of class imbalance.

## Repository Structure

    .
    ├── data
    │   └── processed
    │       ├── boosted_test.csv
    │       ├── boosted_train.csv
    │       ├── boosted_val.csv
    │       ├── original_test.csv
    │       ├── original_train.csv
    │       ├── original_val.csv
    │       └── review.csv
    ├── notebooks/
    │   ├── outputs
    │   └── figures/
    ├── README.md
    ├── architecture.png
    └── src
        ├── dataset.py
        ├── evaluate.py
        ├── infer.py
        ├── model.py
        ├── multitag.py
        ├── preprocess.py
        ├── sampler.py
        └── train.py

## Results

Evaluation includes precision, recall, macro F1, confusion matrices and confidence analysis.

Results and summaries are found in `outputs/*.json` and `outputs/figures/`.

## Installation

```
# Create conda environment
conda create -n reclass python=3.11
conda activate reclass
```

```
# Install dependencies
conda install --file requirements.txt
```

## Usage

#### Train Model

```
python src/train.py --mode mtl --dataset original
```

#### Evaluate Model

```
python src/evaluate.py --mode mtl --dataset original --model_path .pt
```

#### Run Inference

```
python src/infer.py --mode mtl --model_path .pt --dataset review
```

## Notes

- The same tokenizer is used across training, evaluation and inference to ensure consistency
- Sampling and preprocessing choices are documented further in the `src` files and the dissertation

---

*Last updated: January 2025*
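
#### Preprocessing Sketch

As an illustration of the preprocessing steps listed earlier in this README (URL and email removal, normalisation, deduplication, and the 5-word minimum), the following is a minimal sketch using only the Python standard library. It is not the code in `src/preprocess.py`: the regular expressions, the function names `clean_review` and `preprocess`, the NFKC normalisation, and the case-insensitive duplicate check are all assumptions made for this example.

```python
import re
import unicodedata

# Hypothetical patterns for illustration; the project's real rules may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")
REPEAT_PUNCT_RE = re.compile(r"([!?.,])\1+")
MIN_WORDS = 5  # reviews with fewer words than this are dropped


def clean_review(text: str) -> str:
    """Strip URLs and emails, then normalise Unicode, punctuation and whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = URL_RE.sub(" ", text)
    text = EMAIL_RE.sub(" ", text)
    text = REPEAT_PUNCT_RE.sub(r"\1", text)  # collapse runs such as "!!!" to "!"
    return " ".join(text.split())  # collapse whitespace and trim the ends


def preprocess(reviews: list[str]) -> list[str]:
    """Clean each review, drop short ones, and remove duplicates (assumed case-insensitive)."""
    seen: set[str] = set()
    kept: list[str] = []
    for raw in reviews:
        cleaned = clean_review(raw)
        if len(cleaned.split()) < MIN_WORDS:
            continue
        key = cleaned.casefold()
        if key in seen:
            continue
        seen.add(key)
        kept.append(cleaned)
    return kept
```

Calling `preprocess` on a list of raw review strings returns the cleaned, deduplicated reviews in their original order. Cleaning before deduplicating means reviews that differ only by a URL or repeated punctuation are treated as the same review, which may or may not match the project's actual behaviour.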