drc-ners-nlp

Author	SHA1	Message	Date
bernard-ng	0816207a2c	fix: use full_name feature for all models	2025-08-19 19:36:04 +02:00
bernard-ng	7101cea5e7	fix: dependencies in requirements.txt	2025-08-19 17:38:56 +02:00
bernard-ng	d4e8e2a34e	remove max_len from config	2025-08-19 08:04:48 +02:00
bernard-ng	cab5f63809	fix: update default template path in argument parser	2025-08-17 16:31:11 +02:00
bernard-ng	33c7aceb0c	feat: remove data heavy viz	2025-08-17 16:03:46 +02:00
bernard-ng	b65aad6ac6	feat: add visualizations for gender, province, and name length distributions in dashboard	2025-08-17 15:52:15 +02:00
bernard-ng	f70b4be6e0	feat: add NER testing interface and evaluation statistics handling	2025-08-17 15:33:16 +02:00
bernard-ng	6faf9f355e	fix: NER training loop	2025-08-17 14:15:12 +02:00
bernard-ng	3122c92f5e	fix: escape csv field to avoid error on empty fields	2025-08-17 13:39:19 +02:00
bernard-ng	ed60f9deff	docs: add instruction for NER processing	2025-08-16 22:37:39 +02:00
bernard-ng	e08084797f	feat: Experiment Builder	2025-08-16 22:14:55 +02:00
bernard-ng	cf1cbac1a8	hotfixes	2025-08-16 20:34:45 +02:00
bernard-ng	84f7d41a84	feat: web application multipage support	2025-08-16 19:05:24 +02:00
bernard-ng	7b652d6999	hotfixes	2025-08-15 08:08:11 +02:00
bernard-ng	9601c5e44d	feat: enhance logging and memory management across modules	2025-08-13 23:09:05 +02:00
bernard-ng	47e52d130c	hotfixes	2025-08-12 23:17:18 +02:00
bernard-ng	3977d5c313	feat: implement NER dataset feature engineering with multiple transformation formats	2025-08-12 00:11:46 +02:00
bernard-ng	d5a4aaaf4a	feat: add NER annotation step and integrate into pipeline	2025-08-11 07:13:09 +02:00
bernard-ng	6d39c3afc1	feat: enhance training pipeline with research templates and experiment configuration	2025-08-08 23:48:55 +02:00
bernard-ng	96291b4ad0	refactor: update configuration loading and ensure directory existence across modules	2025-08-07 00:36:32 +02:00
bernard-ng	104d7e1146	refactor: rename setup_config_and_logging to setup_config and update references	2025-08-06 22:50:04 +02:00
bernard-ng	9338d6eab8	feat: implement unified configuration loading and logging setup across entry points	2025-08-06 22:17:02 +02:00
bernard-ng	d7aa24a935	refactor: reorganize project structure and enhance model verbosity	2025-08-06 21:57:10 +02:00
Amaury Cansa	ad8db43748	Add analysis and map of categories of dominant first names, surnames and middle names by province with GeoPandas (#7)	2025-08-06 08:37:36 +02:00
Amaury Cansa	80496feb99	Added full name analysis with grouping by first name, last name and postname by region, gender and former provinces. Extraction via identified_name, co-occurrence heatmaps, filtering of simple cases only, and restructuring of regional mappings, co-occurrence, and heatmaps by first name, last name and middle name. (#6)	2025-08-05 21:15:06 +02:00
bernard-ng	f4689faf80	refactoring: add initial pipeline configuration and model classes	2025-08-04 16:12:25 +02:00
bernard-ng	19c66fd0ee	fix: dataype	2025-07-25 10:42:02 +02:00
bernard-ng	14fc302b28	fix: eda with latest dataset	2025-07-24 19:57:51 +02:00
bernard-ng	cbe3b0ecf2	feat: fix annotated datatype	2025-07-24 17:17:52 +02:00
bernard-ng	9f410ca674	refactor: fix logging	2025-07-24 14:27:54 +02:00
bernard-ng	326b854615	refactor: fix logging	2025-07-24 14:18:16 +02:00
bernard-ng	5e5e07c601	refactor: prompt engineering	2025-07-24 14:14:03 +02:00
bernard-ng	72c7007404	refactor: prompt engineering	2025-07-24 13:28:59 +02:00
bernard-ng	2b63c37f4e	refactor: optimization, no need to annotate entire dataset	2025-07-24 13:16:47 +02:00
bernard-ng	e2536c1899	refactor: include province and annotation pipeline	2025-07-24 12:53:51 +02:00
1Cansa	da7b09dab3	mapping of regions (educational provinces) into the current political provinces, then into 11 large former provinces to facilitate distribution (#5)	2025-07-23 23:41:30 +02:00
bernard-ng	eacbb94a48	experiment: using LLM for initial annotation	2025-07-18 22:49:45 +02:00
bernard-ng	78355eb1d1	feat: add analysis exploration	2025-07-18 09:33:57 +02:00
1Cansa	1aed22016a	Add functionality to display top middle names and surnames by region and sex with flexible filtering (#4); Implement region-based limiting for cleaner, more focused data views and visualizations	2025-07-03 11:47:23 +02:00
bernard-ng	efd97911d3	feat: create evaluation dataset	2025-07-03 10:16:52 +02:00
bernard-ng	0888d94596	feat: balanced dataset loading	2025-06-30 01:32:10 +02:00
bernard-ng	eb139ee09a	fix: artifacts saving and dataset loading	2025-06-24 21:49:03 +02:00
bernard-ng	fb95c72ab7	fix: lstm model	2025-06-24 09:40:42 +02:00
1Cansa	d8980ec328	Firstnames treatment (#3) * feat: name processing added, first name/last name/post name extraction and display of top 10 first names * [FIX] Fix path in __init__.py and modify name analysis * [ENH] Group first names by gender, by region, by region and gender and then group first names common to both sexes by region * Update requirements.txt --------- Co-authored-by: Bernard Ngandu <31113941+bernard-ng@users.noreply.github.com>	2025-06-23 15:37:48 +02:00
bernard-ng	88bb2f207e	docs: add gender inference instructions	2025-06-21 10:53:02 +02:00
bernard-ng	25f1df46d8	feat: improve inference for logreg model	2025-06-21 10:35:48 +02:00
bernard-ng	a46a5f7924	feat: improve inference for logreg model	2025-06-21 10:34:26 +02:00
bernard-ng	33d096f8ff	fix: dataset path	2025-06-20 16:48:03 +02:00
bernard-ng	b20f96a450	fix: dependencies	2025-06-20 16:45:54 +02:00
1Cansa	c829cac51c	Add exploratory data analysis (#1) * feat: name processing added, first name/last name/post name extraction and display of top 10 first names * [FIX] Fix path in __init__.py and modify name analysis --------- Co-authored-by: Bernard Ngandu <31113941+bernard-ng@users.noreply.github.com>	2025-06-20 16:41:06 +02:00

52 Commits