docs: add instruction for NER processing

2025-08-16 22:37:39 +02:00
parent e08084797f
commit ed60f9deff
+31 -3
@@ -56,8 +56,6 @@ the `drc-ners-nlp/config/pipeline.yaml` file.
 stages:
   - "data_cleaning"
   - "feature_extraction"
-  - "ner_annotation"
-  - "llm_annotation"
   - "data_splitting"
 ```
@@ -67,6 +65,36 @@ stages:
 python main.py --env development
 ```
+## NER Processing
+This project implements a custom named entity recognition (NER) pipeline tailored for Congolese names.
+Its main objective is to accurately identify and tag the different components of a Congolese name,
+specifically distinguishing between the native part and the surname.
+```bash
+python ner.py --env development
+```
+Once you've built and trained the NER model, you can use it to annotate **CoMPOSE** names in the original dataset.
+**Running the Pipeline with NER Annotation**
+```yaml
+stages:
+  - "data_cleaning"
+  - "feature_extraction"
+  - "ner_annotation"
+  - "data_splitting"
+```
+**Running the Pipeline with LLM Annotation**
+```yaml
+stages:
+  - "data_cleaning"
+  - "feature_extraction"
+  - "llm_annotation"
+  - "data_splitting"
+```
 ## Experiments
 This project provides a modular experiment (model training and evaluation) framework for systematic model comparison and
@@ -100,7 +128,7 @@ experiments and make predictions without needing to understand the underlying co
 ### Running the Web Interface
 ```bash
-streamlit run app.py
+streamlit run web/app.py
 ```
 ## Contributors
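
The two stage lists documented in this commit differ only in which annotation stage sits between `feature_extraction` and `data_splitting`. The sketch below illustrates that relationship. It assumes two hypothetical helpers, `build_stages` and `to_yaml`, which are not part of the repository. Only the stage names come from the README. How `main.py` actually reads `pipeline.yaml` is not shown in the diff.

```python
# Hypothetical helpers for illustration only; they are not part of the project.
# The stage names are copied from the README snippets in this commit.
BASE_STAGES = ["data_cleaning", "feature_extraction", "data_splitting"]
ANNOTATION_STAGES = {"ner": "ner_annotation", "llm": "llm_annotation"}


def build_stages(annotation=None):
    """Return the stage list, placing the annotation stage before data_splitting."""
    stages = list(BASE_STAGES)
    if annotation is not None:
        if annotation not in ANNOTATION_STAGES:
            raise ValueError(f"unknown annotation mode: {annotation!r}")
        # Annotation runs after feature extraction and before the split.
        stages.insert(stages.index("data_splitting"), ANNOTATION_STAGES[annotation])
    return stages


def to_yaml(stages):
    """Render the stage list as a YAML fragment for the `stages` key."""
    return "\n".join(["stages:"] + [f'  - "{name}"' for name in stages])


if __name__ == "__main__":
    print(to_yaml(build_stages("ner")))
```

Calling `build_stages()` with no argument yields the base list that this commit leaves in the default configuration example.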