
RECLASS: Multi-Task Deep Learning for App Review Classification

COMP6013 | Oxford Brookes University | 2025-26


Project Overview

RECLASS is a multi-task learning system for app review classification. A single shared BERT encoder feeds task-specific classification heads, one per task:

Task                        Output        Classes
Bug Report Detection        Binary        Yes / No
Feature Request Detection   Binary        Yes / No
Aspect Classification       Multi-class   Driver, App, Pricing, Service, Payment, General
Aspect Sentiment            Multi-class   Positive, Neutral, Negative
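
The task table can be written down as a small label schema, which is handy when turning per-head scores into readable labels. The sketch below is illustrative only: the dictionary keys, the `Task` class, the `decode` helper, and the index order of the labels are all assumptions, not names taken from the repository.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    labels: tuple[str, ...]


# Label sets copied from the task table above.
# Keys and label index order are hypothetical.
TASKS = {
    "bug_report": Task("Bug Report Detection", ("No", "Yes")),
    "feature_request": Task("Feature Request Detection", ("No", "Yes")),
    "aspect": Task(
        "Aspect Classification",
        ("Driver", "App", "Pricing", "Service", "Payment", "General"),
    ),
    "aspect_sentiment": Task("Aspect Sentiment", ("Positive", "Neutral", "Negative")),
}


def decode(logits_per_task: dict[str, list[float]]) -> dict[str, str]:
    """Pick the highest-scoring label for each task head."""
    decoded = {}
    for key, logits in logits_per_task.items():
        task = TASKS[key]
        if len(logits) != len(task.labels):
            raise ValueError(f"{key}: expected {len(task.labels)} scores, got {len(logits)}")
        best = max(range(len(logits)), key=logits.__getitem__)
        decoded[key] = task.labels[best]
    return decoded
```

Each head contributes one list of scores, so a head with the wrong output width fails loudly instead of silently producing a wrong label.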

Dataset

  • Source: Uber Customer Reviews (Kaggle)
  • Original size: 1,069,616 reviews
  • Cleaned size: 495,036 reviews (after removing short/duplicate reviews)
  • Annotation target: 5,000 manually labelled reviews
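
The cleaning step (removing short and duplicate reviews) could look roughly like the sketch below. It is not the project's `preprocess.py`: the function name `clean_reviews`, the `min_words` threshold, and the choice to treat reviews as duplicates after lowercasing and collapsing whitespace are assumptions made for illustration.

```python
import re


def clean_reviews(reviews, min_words=3):
    """Drop reviews that are too short or that repeat an earlier review."""
    seen = set()
    kept = []
    for text in reviews:
        # Normalise only for comparison; the kept text keeps its original case.
        normalized = re.sub(r"\s+", " ", text).strip().lower()
        if len(normalized.split()) < min_words:
            continue  # too short (threshold is an assumed value)
        if normalized in seen:
            continue  # duplicate of an earlier review
        seen.add(normalized)
        kept.append(text.strip())
    return kept
```

For example, `clean_reviews(["Great app, love it", "great  app, love it ", "ok"])` keeps only the first review.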

Repository Structure

6013/
    README.md
    requirements.txt
    multitag/
        data/
            uber_reviews.csv           # Raw dataset
            uber_reviews_cleaned.csv   # Preprocessed reviews
            uber_reviews_sampled.csv   # Stratified sample for annotation
            uber_reviews_tagged.csv    # Annotated reviews (in progress)
        notebooks/
            datasets_reviews.ipynb     # Initial data exploration
            preprocessing_uber.ipynb   # Preprocessing analysis
            uber_cleaned.ipynb         # Cleaned data verification
        src/
            preprocess.py              # Text cleaning and filtering pipeline
            sampler.py                 # Stratified sampling strategies
            multitag.py                # GUI annotation tool
            train.py                   # Model training (in progress)
            infer.py                   # Inference pipeline (in progress)
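
As a rough illustration of the stratified sampling that `sampler.py` is described as providing, the sketch below draws a proportional sample per stratum using only the standard library. The function name, its parameters, and the use of a star rating as the stratum are assumptions; the repository may stratify on something else entirely.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, n, seed=0):
    """Sample about n rows, allocating to each stratum in proportion to its size."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)

    total = len(rows)
    sample = []
    for stratum in sorted(strata):
        group = strata[stratum]
        # Proportional share, at least one row, never more than the stratum holds.
        k = min(len(group), max(1, round(n * len(group) / total)))
        sample.extend(rng.sample(group, k))
    return sample
```

Because each stratum's share is rounded independently, the result can differ from `n` by a few rows; a production sampler would need to redistribute the remainder.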

Current Progress

  • Manual annotation of 5,000 reviews
  • BERT baseline implementation
  • Multi-task model architecture
  • Training and evaluation
  • Comparative analysis (MTL vs single-task)
  • Final report and presentation

Installation

# Clone repository
...
# Create conda environment
...
# Install dependencies
...requirements.txt

Usage

References

Licenses


Last updated: January 2026

Description
Review classification using XLM-RoBERTa in a multi-task configuration, compared against single-head baselines. All data, statistics, and model checkpoints used are attached.