bernard-ng
e41b15a863
feat: document models
2025-09-20 23:35:54 +02:00
bernard-ng
dd2a9f2711
refactor: clean up imports and improve gender normalization method
2025-09-20 22:55:24 +02:00
bernard-ng
0816207a2c
fix: use full_name feature for all models
2025-08-19 19:36:04 +02:00
bernard-ng
7101cea5e7
fix: dependencies in requirements.txt
2025-08-19 17:38:56 +02:00
bernard-ng
d4e8e2a34e
remove max_len from config
2025-08-19 08:04:48 +02:00
bernard-ng
cab5f63809
fix: update default template path in argument parser
2025-08-17 16:31:11 +02:00
bernard-ng
33c7aceb0c
feat: remove data heavy viz
2025-08-17 16:03:46 +02:00
bernard-ng
b65aad6ac6
feat: add visualizations for gender, province, and name length distributions in dashboard
2025-08-17 15:52:15 +02:00
bernard-ng
f70b4be6e0
feat: add NER testing interface and evaluation statistics handling
2025-08-17 15:33:16 +02:00
bernard-ng
6faf9f355e
fix: NER training loop
2025-08-17 14:15:12 +02:00
bernard-ng
3122c92f5e
fix: escape csv field to avoid error on empty fields
2025-08-17 13:39:19 +02:00
bernard-ng
ed60f9deff
docs: add instruction for NER processing
2025-08-16 22:37:39 +02:00
bernard-ng
e08084797f
feat: Experiment Builder
2025-08-16 22:14:55 +02:00
bernard-ng
cf1cbac1a8
hotfixes
2025-08-16 20:34:45 +02:00
bernard-ng
84f7d41a84
feat: web application multipage support
2025-08-16 19:05:24 +02:00
bernard-ng
7b652d6999
hotfixes
2025-08-15 08:08:11 +02:00
bernard-ng
9601c5e44d
feat: enhance logging and memory management across modules
2025-08-13 23:09:05 +02:00
bernard-ng
47e52d130c
hotfixes
2025-08-12 23:17:18 +02:00
bernard-ng
3977d5c313
feat: implement NER dataset feature engineering with multiple transformation formats
2025-08-12 00:11:46 +02:00
bernard-ng
d5a4aaaf4a
feat: add NER annotation step and integrate into pipeline
2025-08-11 07:13:09 +02:00
bernard-ng
6d39c3afc1
feat: enhance training pipeline with research templates and experiment configuration
2025-08-08 23:48:55 +02:00
bernard-ng
96291b4ad0
refactor: update configuration loading and ensure directory existence across modules
2025-08-07 00:36:32 +02:00
bernard-ng
104d7e1146
refactor: rename setup_config_and_logging to setup_config and update references
2025-08-06 22:50:04 +02:00
bernard-ng
9338d6eab8
feat: implement unified configuration loading and logging setup across entry points
2025-08-06 22:17:02 +02:00
bernard-ng
d7aa24a935
refactor: reorganize project structure and enhance model verbosity
2025-08-06 21:57:10 +02:00
Amaury Cansa
ad8db43748
Add analysis and map of categories of dominant first names, surnames and middle names by province with GeoPandas (#7)
2025-08-06 08:37:36 +02:00
Amaury Cansa
80496feb99
Added full name analysis with grouping by first name, last name and postname by region, gender and former provinces. Extraction via identified_name, co-occurrence heatmaps, filtering of simple cases only, and restructuring of regional mappings, co-occurrence, and heatmaps by first name, last name and middle name. (#6)
2025-08-05 21:15:06 +02:00
bernard-ng
f4689faf80
refactoring: add initial pipeline configuration and model classes
2025-08-04 16:12:25 +02:00
bernard-ng
19c66fd0ee
fix: datatype
2025-07-25 10:42:02 +02:00
bernard-ng
14fc302b28
fix: eda with latest dataset
2025-07-24 19:57:51 +02:00
bernard-ng
cbe3b0ecf2
feat: fix annotated datatype
2025-07-24 17:17:52 +02:00
bernard-ng
9f410ca674
refactor: fix logging
2025-07-24 14:27:54 +02:00
bernard-ng
326b854615
refactor: fix logging
2025-07-24 14:18:16 +02:00
bernard-ng
5e5e07c601
refactor: prompt engineering
2025-07-24 14:14:03 +02:00
bernard-ng
72c7007404
refactor: prompt engineering
2025-07-24 13:28:59 +02:00
bernard-ng
2b63c37f4e
refactor: optimization, no need to annotate entire dataset
2025-07-24 13:16:47 +02:00
bernard-ng
e2536c1899
refactor: include province and annotation pipeline
2025-07-24 12:53:51 +02:00
1Cansa
da7b09dab3
mapping of regions (educational provinces) into the current political provinces, then into 11 large former provinces to facilitate distribution (#5)
2025-07-23 23:41:30 +02:00
bernard-ng
eacbb94a48
experiment: using LLM for initial annotation
2025-07-18 22:49:45 +02:00
bernard-ng
78355eb1d1
feat: add analysis exploration
2025-07-18 09:33:57 +02:00
1Cansa
1aed22016a
Add functionality to display top middle names and surnames by region and sex with flexible filtering; (#4)
Implement region-based limiting for cleaner, more focused data views and visualizations
2025-07-03 11:47:23 +02:00
bernard-ng
efd97911d3
feat: create evaluation dataset
2025-07-03 10:16:52 +02:00
bernard-ng
0888d94596
feat: balanced dataset loading
2025-06-30 01:32:10 +02:00
bernard-ng
eb139ee09a
fix: artifacts saving and dataset loading
2025-06-24 21:49:03 +02:00
bernard-ng
fb95c72ab7
fix: lstm model
2025-06-24 09:40:42 +02:00
1Cansa
d8980ec328
Firstnames treatment (#3)
* feat: name processing added, first name/last name/post name extraction and display of top 10 first names
* [FIX] Fix path in __init__.py and modify name analysis
* [ENH] Group first names by gender, by region, by region and gender and then group first names common to both sexes by region
* Update requirements.txt
---------
Co-authored-by: Bernard Ngandu <31113941+bernard-ng@users.noreply.github.com>
2025-06-23 15:37:48 +02:00
bernard-ng
88bb2f207e
docs: add gender inference instructions
2025-06-21 10:53:02 +02:00
bernard-ng
25f1df46d8
feat: improve inference for logreg model
2025-06-21 10:35:48 +02:00
bernard-ng
a46a5f7924
feat: improve inference for logreg model
2025-06-21 10:34:26 +02:00
bernard-ng
33d096f8ff
fix: dataset path
2025-06-20 16:48:03 +02:00