Name Analysis (#9)

* feat: implement representative sampling by province (~500k records), extract surnames from the first token of each name, build letter transition matrices (frequency and probability), add a heatmap visualization of the transitions, and integrate a Markov chain–based name generator.

* Implemented letter frequency analysis with histograms, computed bigram and trigram frequencies, and displayed the top results in tabular format. Rebuilt the transition probability matrix and developed a name generator that produces realistic surnames from the surname data.
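
The notebook diffs are suppressed, so the code itself is not visible here. As a rough illustration of the pipeline the two bullets describe, the following is a minimal sketch, not the notebooks' actual code. It assumes the surname is the first whitespace-separated token of a name, as the first bullet states. The function names, the `^`/`$` boundary markers, the `max_len` cap, and the toy name list are all hypothetical and chosen only for this example.

```python
import random
from collections import Counter

# Hypothetical boundary markers for word start and end (not from the commit).
START, END = "^", "$"


def extract_surname(full_name):
    """Return the first whitespace-separated token, lowercased ('' if none)."""
    tokens = full_name.split()
    return tokens[0].lower() if tokens else ""


def ngram_counts(words, n):
    """Count letter n-grams (n=2 for bigrams, n=3 for trigrams) over all words."""
    counts = Counter()
    for word in words:
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


def transition_counts(words):
    """Frequency matrix as nested dicts: counts[a][b] = times b follows a."""
    counts = {}
    for word in words:
        padded = START + word + END
        for a, b in zip(padded, padded[1:]):
            counts.setdefault(a, Counter())[b] += 1
    return counts


def to_probabilities(counts):
    """Row-normalize the frequency matrix so every row sums to 1."""
    probs = {}
    for a, row in counts.items():
        total = sum(row.values())
        probs[a] = {b: c / total for b, c in row.items()}
    return probs


def generate_name(probs, rng, max_len=12):
    """Walk the first-order Markov chain from START until END or max_len letters."""
    letters = []
    current = START
    while len(letters) < max_len:
        row = probs[current]
        nxt = rng.choices(list(row), weights=list(row.values()))[0]
        if nxt == END:
            break
        letters.append(nxt)
        current = nxt
    return "".join(letters).capitalize()


# Toy input, made up for illustration; the real data set is far larger.
names = ["Kabila Joseph", "Kasongo Marie", "Mukendi Jean", "Kalala Paul"]
surnames = [s for s in (extract_surname(n) for n in names) if s]

bigrams = ngram_counts(surnames, 2)
trigrams = ngram_counts(surnames, 3)
probs = to_probabilities(transition_counts(surnames))

print(bigrams.most_common(5))
print(trigrams.most_common(5))
print(generate_name(probs, random.Random(0)))
```

Nested dictionaries keep the sketch dependency-free. For the heatmap mentioned in the first bullet, the same counts would more naturally live in a 2-D array or DataFrame indexed by letter.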

Amaury Cansa, 2025-09-24 20:23:40 +02:00, committed by GitHub

parent dda83510ac

commit 4874b178c9

2 changed files with 2229 additions and 219 deletions

notebooks/eda.ipynb (vendored): +227 -219

File diff suppressed because one or more lines are too long

notebooks/provinces_stats.ipynb (vendored): +2002

File diff suppressed because one or more lines are too long