fc469a037e
feat: implementation of transition matrices (P_male, P_female, P_both) by province and activation of synthetic name generation by province, as well as analysis of letter, 3-gram, 4-gram and 5-gram frequencies.
feature/name-analysis-region
amaury
2025-09-27 03:32:25 +02:00
773ebf32c6
Adding surname transition analysis with Markov models, frequency studies, and visualizations, including cleaned surname preprocessing, province sampling, bigram/trigram stats, and male–female transition comparisons
amaury
2025-09-26 13:20:37 +02:00
ef4ec70fcc
feat: generate names based on gender
bernard-ng
2025-09-25 23:45:44 +02:00
3977d5c313
feat: implement NER dataset feature engineering with multiple transformation formats
bernard-ng
2025-08-12 00:11:46 +02:00
d5a4aaaf4a
feat: add NER annotation step and integrate into pipeline
bernard-ng
2025-08-11 07:13:09 +02:00
6d39c3afc1
feat: enhance training pipeline with research templates and experiment configuration
bernard-ng
2025-08-08 23:48:55 +02:00
96291b4ad0
refactor: update configuration loading and ensure directory existence across modules
bernard-ng
2025-08-07 00:36:32 +02:00
104d7e1146
refactor: rename setup_config_and_logging to setup_config and update references
bernard-ng
2025-08-06 22:50:04 +02:00
9338d6eab8
feat: implement unified configuration loading and logging setup across entry points
bernard-ng
2025-08-06 22:17:02 +02:00
d7aa24a935
refactor: reorganize project structure and enhance model verbosity
bernard-ng
2025-08-06 21:57:10 +02:00
ad8db43748
Add analysis and map of categories of dominant first names, surnames and middle names by province with GeoPandas (#7)
Amaury Cansa
2025-08-06 08:37:36 +02:00
80496feb99
Added full name analysis with grouping by first name, last name and postname by region, gender and former provinces. Extraction via identified_name, co-occurrence heatmaps, filtering of simple cases only, and restructuring of regional mappings, co-occurrence, and heatmaps by first name, last name and middle name. (#6)
Amaury Cansa
2025-08-05 21:15:06 +02:00
f4689faf80
refactoring: add initial pipeline configuration and model classes
bernard-ng
2025-08-04 16:12:25 +02:00
2b63c37f4e
refactor: optimization, no need to annotate entire dataset
bernard-ng
2025-07-24 13:16:47 +02:00
e2536c1899
refactor: include province and annotation pipeline
bernard-ng
2025-07-24 12:50:30 +02:00
da7b09dab3
mapping of regions (educational provinces) into the current political provinces, then into 11 large former provinces to facilitate distribution (#5)
1Cansa
2025-07-23 23:41:30 +02:00
eacbb94a48
experiment: using LLM for initial annotation
bernard-ng
2025-07-18 22:49:45 +02:00
1aed22016a
Add functionality to display top middle names and surnames by region and sex with flexible filtering (#4)
1Cansa
2025-07-03 11:47:23 +02:00