diff --git a/README.md b/README.md
index a338236..b6017a0 100644
--- a/README.md
+++ b/README.md
@@ -56,8 +56,6 @@ the `drc-ners-nlp/config/pipeline.yaml` file.
 stages:
   - "data_cleaning"
   - "feature_extraction"
-  - "ner_annotation"
-  - "llm_annotation"
   - "data_splitting"
 ```
 
@@ -67,6 +65,36 @@ stages:
 python main.py --env development
 ```
 
+## NER Processing
+
+This project implements a custom named entity recognition (NER) pipeline tailored for Congolese names.
+Its main objective is to accurately identify and tag the different components of a Congolese name,
+specifically distinguishing between the native part and the surname.
+
+```bash
+python ner.py --env development
+```
+
+Once you have built and trained the NER model, you can use it to annotate **CoMPOSE** names in the original dataset.
+
+**Running the Pipeline with NER Annotation**
+```yaml
+stages:
+  - "data_cleaning"
+  - "feature_extraction"
+  - "ner_annotation"
+  - "data_splitting"
+```
+
+**Running the Pipeline with LLM Annotation**
+```yaml
+stages:
+  - "data_cleaning"
+  - "feature_extraction"
+  - "llm_annotation"
+  - "data_splitting"
+```
+
 ## Experiments
 
 This project provides a modular experiment (model training and evaluation) framework for systematic model comparison and
@@ -100,7 +128,7 @@ experiments and make predictions without needing to understand the underlying co
 ### Running the Web Interface
 
 ```bash
-streamlit run app.py
+streamlit run web/app.py
 ```
 
 ## Contributors
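The three `stages` lists in this change differ only in which annotation stage, if any, sits immediately before `data_splitting`. The following is a minimal sketch of that relationship. `with_annotation`, `to_yaml`, and `ANNOTATION_STAGES` are hypothetical helpers that are not part of the project. The sketch also assumes the two-space list indentation used in the YAML examples above. Its only purpose is to show how each annotated variant follows from the default list.

```python
# Hypothetical helpers (not part of drc-ners-nlp): derive the annotated
# stage lists shown in the README from the default pipeline.yaml stages.
BASE_STAGES = ["data_cleaning", "feature_extraction", "data_splitting"]

# Assumed mapping from a short annotator name to its pipeline stage name.
ANNOTATION_STAGES = {"ner": "ner_annotation", "llm": "llm_annotation"}


def with_annotation(base: list[str], annotator: str) -> list[str]:
    """Return a copy of `base` with the chosen annotation stage inserted
    immediately before "data_splitting", mirroring the README examples."""
    stage = ANNOTATION_STAGES[annotator]  # raises KeyError for unknown names
    stages = list(base)  # copy so the default list is left untouched
    stages.insert(stages.index("data_splitting"), stage)
    return stages


def to_yaml(stages: list[str]) -> str:
    """Render a `stages:` block in the layout used by the README examples
    (two-space indentation is an assumption)."""
    lines = ["stages:"] + [f'  - "{name}"' for name in stages]
    return "\n".join(lines)


if __name__ == "__main__":
    # Prints the "Running the Pipeline with NER Annotation" block.
    print(to_yaml(with_annotation(BASE_STAGES, "ner")))
```

Passing `"llm"` instead of `"ner"` yields the LLM-annotation variant. Because the helper copies the default list, `BASE_STAGES` keeps matching the trimmed default config in the first hunk.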