Commit Graph

28 Commits

Author SHA1 Message Date
Amaury Cansa 80496feb99 Added full name analysis with grouping by first name, last name and postname by region, gender and former provinces. Extraction via identified_name, co-occurrence heatmaps, filtering of simple cases only, and restructuring of regional mappings, co-occurrence, and heatmaps by first name, last name and middle name. (#6) 2025-08-05 21:15:06 +02:00
bernard-ng f4689faf80 refactoring: add initial pipeline configuration and model classes 2025-08-04 16:12:25 +02:00
bernard-ng 19c66fd0ee fix: datatype 2025-07-25 10:42:02 +02:00
bernard-ng 14fc302b28 fix: eda with latest dataset 2025-07-24 19:57:51 +02:00
bernard-ng cbe3b0ecf2 feat: fix annotated datatype 2025-07-24 17:17:52 +02:00
bernard-ng 9f410ca674 refactor: fix logging 2025-07-24 14:27:54 +02:00
bernard-ng 326b854615 refactor: fix logging 2025-07-24 14:18:16 +02:00
bernard-ng 5e5e07c601 refactor: prompt engineering 2025-07-24 14:14:03 +02:00
bernard-ng 72c7007404 refactor: prompt engineering 2025-07-24 13:28:59 +02:00
bernard-ng 2b63c37f4e refactor: optimization, no need to annotate entire dataset 2025-07-24 13:16:47 +02:00
bernard-ng e2536c1899 refactor: include province and annotation pipeline 2025-07-24 12:53:51 +02:00
1Cansa da7b09dab3 mapping of regions (educational provinces) into the current political provinces, then into 11 large former provinces to facilitate distribution (#5) 2025-07-23 23:41:30 +02:00
bernard-ng eacbb94a48 experiment: using LLM for initial annotation 2025-07-18 22:49:45 +02:00
bernard-ng 78355eb1d1 feat: add analysis exploration 2025-07-18 09:33:57 +02:00
1Cansa 1aed22016a Add functionality to display top middle names and surnames by region and sex with flexible filtering; implement region-based limiting for cleaner, more focused data views and visualizations (#4) 2025-07-03 11:47:23 +02:00
bernard-ng efd97911d3 feat: create evaluation dataset 2025-07-03 10:16:52 +02:00
bernard-ng 0888d94596 feat: balanced dataset loading 2025-06-30 01:32:10 +02:00
bernard-ng eb139ee09a fix: artifacts saving and dataset loading 2025-06-24 21:49:03 +02:00
bernard-ng fb95c72ab7 fix: lstm model 2025-06-24 09:40:42 +02:00
1Cansa d8980ec328 Firstnames treatment (#3) 2025-06-23 15:37:48 +02:00
* feat: name processing added, first name/last name/post name extraction and display of top 10 first names
* [FIX] Fix path in __init__.py and modify name analysis
* [ENH] Group first names by gender, by region, by region and gender and then group first names common to both sexes by region
* Update requirements.txt
Co-authored-by: Bernard Ngandu <31113941+bernard-ng@users.noreply.github.com>
bernard-ng 88bb2f207e docs: add gender inference instructions 2025-06-21 10:53:02 +02:00
bernard-ng 25f1df46d8 feat: improve inference for logreg model 2025-06-21 10:35:48 +02:00
bernard-ng a46a5f7924 feat: improve inference for logreg model 2025-06-21 10:34:26 +02:00
bernard-ng 33d096f8ff fix: dataset path 2025-06-20 16:48:03 +02:00
bernard-ng b20f96a450 fix: dependencies 2025-06-20 16:45:54 +02:00
1Cansa c829cac51c Add exploratory data analysis (#1) 2025-06-20 16:41:06 +02:00
* feat: name processing added, first name/last name/post name extraction and display of top 10 first names
* [FIX] Fix path in __init__.py and modify name analysis
Co-authored-by: Bernard Ngandu <31113941+bernard-ng@users.noreply.github.com>
bernard-ng 1d58e3ccc4 feat: add gender base models architectures 2025-06-20 16:38:48 +02:00
bernard-ng f454ba7938 Initial commit 2025-06-19 18:45:11 +02:00