18 Commits

Author SHA1 Message Date
bernard-ng 9039e9a4cf feat: statistics tests 2025-09-28 17:16:02 +02:00
bernard-ng ef4ec70fcc feat: generate names based on gender 2025-09-25 23:45:44 +02:00
bernard-ng 817081b443 feat: stabilize name analysis 2025-09-25 23:17:49 +02:00
Amaury Cansa 4874b178c9 Name Analysis (#9)
* feat: implement representative sampling by province (~500k records), extract surnames from the first token of name, build letter transition matrices (frequency and probability), add heatmap visualization for transitions, and integrate a Markov chain–based name generator.

* Implemented letter frequency analysis with histograms, computed bigram and trigram frequencies, and displayed the top results in tabular format. Rebuilt the transition probability matrix, and developed a name generator capable of producing realistic outputs based on surname data.
2025-09-24 20:23:40 +02:00
bernard-ng dda83510ac fix: add missing regions in region_mapper 2025-09-23 00:05:35 +02:00
bernard-ng c1b502c878 feat: add osm data 2025-09-21 16:23:44 +02:00
bernard-ng dd2a9f2711 refactor: clean up imports and improve gender normalization method 2025-09-20 22:55:24 +02:00
Amaury Cansa ad8db43748 Add analysis and map of categories of dominant first names, surnames and middle names by province with GeoPandas (#7) 2025-08-06 08:37:36 +02:00
Amaury Cansa 80496feb99 Added full name analysis with grouping by first name, last name and postname by region, gender and former provinces. Extraction via identified_name, co-occurrence heatmaps, filtering of simple cases only, and restructuring of regional mappings, co-occurrence, and heatmaps by first name, last name and middle name. (#6) 2025-08-05 21:15:06 +02:00
bernard-ng 19c66fd0ee fix: datatype 2025-07-25 10:42:02 +02:00
bernard-ng 14fc302b28 fix: eda with latest dataset 2025-07-24 19:57:51 +02:00
1Cansa da7b09dab3 mapping of regions (educational provinces) into the current political provinces, then into 11 large former provinces to facilitate distribution (#5) 2025-07-23 23:41:30 +02:00
bernard-ng 78355eb1d1 feat: add analysis exploration 2025-07-18 09:33:57 +02:00
1Cansa 1aed22016a Add functionality to display top middle names and surnames by region and sex with flexible filtering (#4)
Implement region-based limiting for cleaner, more focused data views and visualizations
2025-07-03 11:47:23 +02:00
1Cansa d8980ec328 Firstnames treatment (#3)
* feat: name processing added, first name/last name/post name extraction and display of top 10 first names

* [FIX] Fix path in __init__.py and modify name analysis

* [ENH] Group first names by gender, by region, by region and gender and then group first names common to both sexes by region

* Update requirements.txt

---------

Co-authored-by: Bernard Ngandu <31113941+bernard-ng@users.noreply.github.com>
2025-06-23 15:37:48 +02:00
bernard-ng 33d096f8ff fix: dataset path 2025-06-20 16:48:03 +02:00
bernard-ng b20f96a450 fix: dependencies 2025-06-20 16:45:54 +02:00
1Cansa c829cac51c Add exploratory data analysis (#1)
* feat: name processing added, first name/last name/post name extraction and display of top 10 first names

* [FIX] Fix path in __init__.py and modify name analysis

---------

Co-authored-by: Bernard Ngandu <31113941+bernard-ng@users.noreply.github.com>
2025-06-20 16:41:06 +02:00