cleaned notebooks, finished data labelling

charlie-rasberry
2026-02-16 12:36:29 +00:00
parent 8d3dee6d30
commit b88504725d
5 changed files with 199 additions and 64 deletions


@@ -24,26 +24,28 @@ RECLASS is a multi-task learning system which uses a shared BERT encoder with ta
## Repository Structure
```
6013/
  README.md
  requirements.txt
  multitag/
    data/
      uber_reviews.csv          # Raw dataset
      uber_reviews_cleaned.csv  # Preprocessed reviews
      uber_reviews_sampled.csv  # Stratified sample for annotation
      uber_reviews_tagged.csv   # Annotated reviews (in progress)
    notebooks/
      datasets_reviews.ipynb    # Initial data exploration
      preprocessing_uber.ipynb  # Preprocessing analysis
      uber_cleaned.ipynb        # Cleaned data verification
    src/
      preprocess.py             # Text cleaning and filtering pipeline
      sampler.py                # Stratified sampling strategies
      multitag.py               # GUI annotation tool
      train.py                  # Model training (in progress)
      infer.py                  # Inference pipeline (in progress)
README.md
.gitignore
data/
  uber_reviews.csv          # Raw dataset
  uber_reviews_cleaned.csv  # Preprocessed reviews
  uber_reviews_sampled.csv  # Stratified sample for annotation
  uber_reviews_tagged.csv   # Annotated reviews (in progress)
notebooks/
  preprocessing_uber.ipynb  # Preprocessing analysis
  uber_cleaned.ipynb        # Cleaned data verification
src/
  preprocess.py             # Text cleaning and filtering pipeline
  sampler.py                # Stratified sampling strategies
  multitag.py               # GUI annotation tool
  train.py                  # Model training (in progress)
  infer.py                  # Inference pipeline (in progress)
outputs/
  figures/
```
## Current Progress