Commit Graph

67 Commits

Author SHA1 Message Date
bernard-ng 9dd4f759b3 refactoring: uv 2025-10-05 18:14:15 +02:00
bernard-ng f3b06fbd07 feat: regions clusters 2025-10-03 11:58:36 +02:00
bernard-ng 912d518106 feat: support gpu 2025-09-29 22:52:08 +02:00
bernard-ng a1d500830b feat: support gpu 2025-09-29 21:07:23 +02:00
bernard-ng 9e35f95107 feat: statistics tests 2025-09-28 23:50:40 +02:00
bernard-ng 9039e9a4cf feat: statistics tests 2025-09-28 17:16:02 +02:00
bernard-ng ef4ec70fcc feat: generate names based on gender 2025-09-25 23:45:44 +02:00
bernard-ng 817081b443 feat: stabilize name analysis 2025-09-25 23:17:49 +02:00
Amaury Cansa 4874b178c9 Name Analysis (#9)
* feat: implement representative sampling by province (~500k records), extract surnames from the first token of name, build letter transition matrices (frequency and probability), add heatmap visualization for transitions, and integrate a Markov chain–based name generator.

* Implemented letter frequency analysis with histograms, computed bigram and trigram frequencies, and displayed the top results in tabular format. Rebuilt the transition probability matrix, and developed a name generator capable of producing realistic outputs based on surname data.
2025-09-24 20:23:40 +02:00
bernard-ng dda83510ac fix: add missing regions in region_mapper 2025-09-23 00:05:35 +02:00
bernard-ng c1b502c878 feat: add osm data 2025-09-21 16:23:44 +02:00
bernard-ng 63e23d6600 fix: normalize hyper params 2025-09-21 13:10:07 +02:00
bernard-ng 83d21c640b feat: add more baseline experiments 2025-09-21 00:06:01 +02:00
bernard-ng e41b15a863 feat: document models 2025-09-20 23:35:54 +02:00
bernard-ng dd2a9f2711 refactor: clean up imports and improve gender normalization method 2025-09-20 22:55:24 +02:00
bernard-ng 0816207a2c fix: use full_name feature for all models 2025-08-19 19:36:04 +02:00
bernard-ng 7101cea5e7 fix: dependencies in requirements.txt 2025-08-19 17:38:56 +02:00
bernard-ng d4e8e2a34e remove max_len from config 2025-08-19 08:04:48 +02:00
bernard-ng cab5f63809 fix: update default template path in argument parser 2025-08-17 16:31:11 +02:00
bernard-ng 33c7aceb0c feat: remove data heavy viz 2025-08-17 16:03:46 +02:00
bernard-ng b65aad6ac6 feat: add visualizations for gender, province, and name length distributions in dashboard 2025-08-17 15:52:15 +02:00
bernard-ng f70b4be6e0 feat: add NER testing interface and evaluation statistics handling 2025-08-17 15:33:16 +02:00
bernard-ng 6faf9f355e fix: NER training loop 2025-08-17 14:15:12 +02:00
bernard-ng 3122c92f5e fix: escape csv field to avoid error on empty fields 2025-08-17 13:39:19 +02:00
bernard-ng ed60f9deff docs: add instruction for NER processing 2025-08-16 22:37:39 +02:00
bernard-ng e08084797f feat: Experiment Builder 2025-08-16 22:14:55 +02:00
bernard-ng cf1cbac1a8 hotfixes 2025-08-16 20:34:45 +02:00
bernard-ng 84f7d41a84 feat: web application multipage support 2025-08-16 19:05:24 +02:00
bernard-ng 7b652d6999 hotfixes 2025-08-15 08:08:11 +02:00
bernard-ng 9601c5e44d feat: enhance logging and memory management across modules 2025-08-13 23:09:05 +02:00
bernard-ng 47e52d130c hotfixes 2025-08-12 23:17:18 +02:00
bernard-ng 3977d5c313 feat: implement NER dataset feature engineering with multiple transformation formats 2025-08-12 00:11:46 +02:00
bernard-ng d5a4aaaf4a feat: add NER annotation step and integrate into pipeline 2025-08-11 07:13:09 +02:00
bernard-ng 6d39c3afc1 feat: enhance training pipeline with research templates and experiment configuration 2025-08-08 23:48:55 +02:00
bernard-ng 96291b4ad0 refactor: update configuration loading and ensure directory existence across modules 2025-08-07 00:36:32 +02:00
bernard-ng 104d7e1146 refactor: rename setup_config_and_logging to setup_config and update references 2025-08-06 22:50:04 +02:00
bernard-ng 9338d6eab8 feat: implement unified configuration loading and logging setup across entry points 2025-08-06 22:17:02 +02:00
bernard-ng d7aa24a935 refactor: reorganize project structure and enhance model verbosity 2025-08-06 21:57:10 +02:00
Amaury Cansa ad8db43748 Add analysis and map of categories of dominant first names, surnames and middle names by province with GeoPandas (#7) 2025-08-06 08:37:36 +02:00
Amaury Cansa 80496feb99 Added full name analysis with grouping by first name, last name and postname by region, gender and former provinces. Extraction via identified_name, co-occurrence heatmaps, filtering of simple cases only, and restructuring of regional mappings, co-occurrence, and heatmaps by first name, last name and middle name. (#6) 2025-08-05 21:15:06 +02:00
bernard-ng f4689faf80 refactoring: add initial pipeline configuration and model classes 2025-08-04 16:12:25 +02:00
bernard-ng 19c66fd0ee fix: datatype 2025-07-25 10:42:02 +02:00
bernard-ng 14fc302b28 fix: eda with latest dataset 2025-07-24 19:57:51 +02:00
bernard-ng cbe3b0ecf2 feat: fix annotated datatype 2025-07-24 17:17:52 +02:00
bernard-ng 9f410ca674 refactor: fix logging 2025-07-24 14:27:54 +02:00
bernard-ng 326b854615 refactor: fix logging 2025-07-24 14:18:16 +02:00
bernard-ng 5e5e07c601 refactor: prompt engineering 2025-07-24 14:14:03 +02:00
bernard-ng 72c7007404 refactor: prompt engineering 2025-07-24 13:28:59 +02:00
bernard-ng 2b63c37f4e refactor: optimization, no need to annotate entire dataset 2025-07-24 13:16:47 +02:00
bernard-ng e2536c1899 refactor: include province and annotation pipeline 2025-07-24 12:53:51 +02:00
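
The Name Analysis (#9) commit in the log above mentions letter transition matrices (frequency and probability) and a Markov chain-based name generator. The following is a minimal sketch of that general idea, not the repository's actual code: the function names, the `^`/`$` boundary markers, the `max_len` cap, and the toy input list are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical boundary markers for the start and end of a name.
START, END = "^", "$"


def build_transition_counts(names):
    """Count how often each letter follows another (frequency matrix)."""
    counts = defaultdict(lambda: defaultdict(int))
    for name in names:
        letters = [START] + list(name.lower()) + [END]
        for current, following in zip(letters, letters[1:]):
            counts[current][following] += 1
    return counts


def to_probabilities(counts):
    """Normalize each row of counts so it sums to 1 (probability matrix)."""
    probs = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probs[current] = {nxt: n / total for nxt, n in followers.items()}
    return probs


def generate_name(probs, rng, max_len=12):
    """Walk the first-order Markov chain from START until END or max_len."""
    out = []
    current = START
    while len(out) < max_len:
        choices = list(probs[current].keys())
        weights = list(probs[current].values())
        current = rng.choices(choices, weights=weights, k=1)[0]
        if current == END:
            break
        out.append(current)
    return "".join(out).capitalize()


if __name__ == "__main__":
    # Toy input for illustration only; the commit describes ~500k sampled records.
    sample_surnames = ["kabila", "kasongo", "kalala", "mukendi", "mutombo"]
    probabilities = to_probabilities(build_transition_counts(sample_surnames))
    rng = random.Random(42)  # seeded so repeated runs give the same names
    for _ in range(5):
        print(generate_name(probabilities, rng))
```

A first-order chain like this conditions only on the previous letter. The bigram and trigram frequencies mentioned in the same commit suggest longer contexts could be used in the same way, by keying the table on the last two letters instead of one.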