afe61eaaa2
Merge branch 'main' of github.com:charlie-rasberry/6013
charlie-rasberry
2026-03-07 18:52:38 +00:00
5206e62d95
Analysis started and almost complete: compiled some Excel sheets from the CSV output with notes. Started infer.py, nothing major implemented yet.
charlie-rasberry
2026-03-07 18:51:15 +00:00
a8aaa077c7
Delete src/__pycache__ directory
Charlie Rasberry
2026-02-26 20:39:59 +00:00
96a0c45e84
Added implementation for single-task RoBERTa; using args for everything made it simple
charlie-rasberry
2026-02-26 18:21:13 +00:00
01e2142276
Fixed a few issues with performance data collection and debugging output. MTL training is ready; moving on to single-task training to compare in the write-up
charlie-rasberry
2026-02-26 17:40:37 +00:00
4f0c54fe28
Added training loop for the MTL architecture on the original distribution
charlie-rasberry
2026-02-23 16:26:48 +00:00
7bd68108d0
Implemented initial training structure; adding further logic soon, including loss, stopping, optimisation and the loop
charlie-rasberry
2026-02-23 12:54:23 +00:00
76d9b8509b
Model almost complete, need to work on loss functions soon
charlie-rasberry
2026-02-20 19:17:22 +00:00
cccd91a680
Small bit of progress towards model.py, now building forward()
charlie-rasberry
2026-02-20 18:18:17 +00:00
61df4e3e26
Implemented dataset.py which tokenises and returns tensors, ready to load the model now
charlie-rasberry
2026-02-19 22:10:25 +00:00
19c0d4bce3
Started dataset.py, added the ReviewDataset class and implemented the __init__, __len__ and __getitem__ methods. The __getitem__ method currently just returns the review text, but will be updated to return the tokenized review as a tensor
charlie-rasberry
2026-02-19 18:45:55 +00:00
19bcf2aa18
Started dataset.py, added the ReviewDataset class and implemented the __init__, __len__ and __getitem__ methods. The __getitem__ method currently just returns the review text, but will be updated to return the tokenized review as a tensor
charlie-rasberry
2026-02-19 18:41:37 +00:00
c5e91b79b2
Decided on max_length by finding out how many and which reviews would be truncated (it will be 256 tokens)
charlie-rasberry
2026-02-19 01:28:10 +00:00
0be7da2dde
Finally processed the data fully and tested. Moving on to dataset.py and model.py
charlie-rasberry
2026-02-19 00:44:36 +00:00
608588f023
Preprocessed tagged datasets, fixed CSV formatting issues, and added integrity checks. Also saved mappings for later inference use.
charlie-rasberry
2026-02-18 22:36:58 +00:00
5b9fbfc75e
Data processing pipeline now finished; just need to annotate reviews
charlie-rasberry
2025-11-22 09:41:12 +00:00
45ec02fa46
Moving on to multitag.py, sampling complete I think
charlie-rasberry
2025-11-12 06:21:16 +00:00
2cbdd55243
Fixed get_stratified_sample() and replaced broken x() with actual working logic; added sample_with_keywords().
charlie-rasberry
2025-11-12 02:05:20 +00:00
4d6e2511e6
Added multitag, which includes preprocess.py, sampler.py and multitag.py (the main GUI for labelling/annotation)
charlie-rasberry
2025-11-06 17:40:29 +00:00