In the final episode of Game of Thrones, Tyrion Lannister asks a powerful question:
“What unites people? Armies? Gold? Flags?”
“Stories,” he answers. “There’s nothing in the world more powerful than a good story.”
That line stuck with me. Because when you think about it, every nation is built on stories—of struggle, of pride, of faith, of identity. And there’s no better place to hear a country’s story than in its national anthem.
So when I came across a Kaggle dataset containing the lyrics of national anthems from around the world, all translated into English for ease of understanding, I immediately knew what I wanted to do. A few years ago, I watched a video by India in Pixels in which he clustered countries by their anthems using TF-IDF. At the time I found it fascinating, but I also knew we could do better.
Instead of relying on TF-IDF, which scores words by how often they appear in each anthem and how rare they are across all anthems rather than by what they mean, I turned to a more powerful sentence-embedding model: paraphrase-multilingual-MiniLM-L12-v2. Although the dataset was in English, this model is still good at capturing deeper semantic meaning, which is exactly what I wanted when working with something as poetic and nuanced as anthem lyrics.
I also decided to use just three clusters—not because it’s “scientific,” but because it intuitively felt right. One cluster for the Western power center led by the U.S., one for China and its sphere, and a third for emerging nations like India, Brazil, and Vietnam. Again, there’s no perfect number of clusters; I simply picked a number that made sense to me.
With this, I set out on a journey—not just to group anthems by linguistic similarity, but to uncover the shared values, hopes, and identities that unite nations through song. In a way, I was looking for the kinds of stories Tyrion spoke about—the ones powerful enough to define a people.
The Approach
I used the SentenceTransformers model paraphrase-multilingual-MiniLM-L12-v2 to encode each national anthem as a 384-dimensional vector. Even though the lyrics were all in English, this model is still excellent at capturing semantic meaning at the sentence and short-paragraph level.
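Here's a minimal sketch of the encoding step, assuming the lyrics are available as a list (or pandas Series) of strings. The helper names `load_model`, `embed_anthems`, and `embed_anthems_by_line` are hypothetical, not taken from the original notebook. It calls `SentenceTransformer.encode` with `normalize_embeddings=True` so every vector has unit length. One caveat: the model truncates input beyond its `max_seq_length` (128 tokens for this model, as far as I know; check `model.max_seq_length`), so a long anthem may only be partly embedded. The line-by-line variant, which averages per-line embeddings, is one possible workaround, not something the original analysis is known to have used.

```python
import numpy as np

MODEL_NAME = "paraphrase-multilingual-MiniLM-L12-v2"


def load_model(name=MODEL_NAME):
    """Load the SentenceTransformers model (downloads it on first use)."""
    # Imported here so the helpers below work with any object exposing .encode().
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(name)


def embed_anthems(lyrics, model):
    """Embed each anthem as a single text; returns an (n_anthems, dim) array of unit vectors."""
    # Text beyond model.max_seq_length tokens is truncated before encoding.
    return np.asarray(
        model.encode(list(lyrics), normalize_embeddings=True, show_progress_bar=False)
    )


def embed_anthems_by_line(lyrics, model):
    """Alternative: embed every non-empty line, average per anthem, then re-normalize."""
    vectors = []
    for text in lyrics:
        # Fall back to the whole text if it contains no line breaks.
        lines = [line.strip() for line in text.splitlines() if line.strip()] or [text]
        line_vecs = np.asarray(
            model.encode(lines, normalize_embeddings=True, show_progress_bar=False)
        )
        mean = line_vecs.mean(axis=0)
        norm = np.linalg.norm(mean)
        vectors.append(mean / norm if norm > 0 else mean)
    return np.vstack(vectors)


# Usage (df and its "lyrics" column are hypothetical):
#   model = load_model()
#   print(model.max_seq_length)  # inputs longer than this many tokens get truncated
#   embeddings = embed_anthems(df["lyrics"], model)
```

Normalizing up front is a deliberate choice: it makes Euclidean distances between anthems a direct function of cosine similarity, which matters for the clustering step.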
Once I had the embeddings, I applied K-Means clustering with k=3. The algorithm then grouped countries based on the thematic content of their anthem lyrics. What emerged was fascinating: three distinct clusters, each with a unique underlying theme.
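The clustering step can be sketched with scikit-learn's `KMeans`, assuming the anthem embeddings are already in a NumPy array (one row per country) aligned with a list of country names. The helper names, `n_init=10`, and `random_state=42` are my own choices for a reproducible sketch, not values from the original analysis. Since k=3 was picked by intuition, I've also included an optional `silhouette_by_k` helper that compares silhouette scores across a few values of k; that's one quantitative sanity check, not part of the original method.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def cluster_anthems(embeddings, countries, k=3, seed=42):
    """Fit K-Means with k clusters and return {country: cluster_label}."""
    # n_init=10 restarts from different initial centroids and keeps the lowest-inertia run.
    km = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labels = km.fit_predict(np.asarray(embeddings))
    return dict(zip(countries, labels.tolist()))


def silhouette_by_k(embeddings, ks=range(2, 7), seed=42):
    """Silhouette score per candidate k; values closer to 1 mean better-separated clusters."""
    X = np.asarray(embeddings)
    scores = {}
    for k in ks:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = float(silhouette_score(X, labels))
    return scores


def group_by_cluster(assignments):
    """Invert {country: label} into {label: sorted list of countries}."""
    groups = {}
    for country, label in assignments.items():
        groups.setdefault(label, []).append(country)
    return {label: sorted(names) for label, names in sorted(groups.items())}
```

K-Means uses Euclidean distance, but on unit-length embeddings the squared distance between two vectors equals 2 − 2·cos(θ), so assignments closely track cosine similarity. Cluster label numbers themselves are arbitrary; names like "Religious Devotion" come from reading the lyrics in each group afterwards.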
Let’s start with the big picture before we delve into the specifics of each group. The map below illustrates how countries are clustered based on their national anthems.
Group 1: Religious Devotion
This group immediately stood out for its strong spiritual and religious undertones. Countries in this cluster include:
- Saudi Arabia
- Pakistan
- Turkey
- Lebanon
- Albania
- United States (surprisingly!)
Common themes:
- References to God, faith, and blessings
- Sacredness of the land
- Divine protection and guidance
Example – Pakistan:
“Blessed be the sacred land, Happy be the bounteous realm, Symbol of high resolve, Land of Pakistan, Blessed be thou citadel of faith”
Example – Turkey:
“Fear not! For the crimson banner that proudly ripples in this glorious twilight, shall never fade…”
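Even the U.S. anthem aligns thematically. The fourth verse of “The Star-Spangled Banner” reads:
“Praise the Power that hath made and preserved us a nation!… And this be our motto: ‘In God is our trust.’”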
The clustering seems to have picked up on shared vocabulary and sentiment: words like “sacred,” “faith,” and “divine” recur across these anthems.
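K-Means on sentence embeddings never sees individual words, so the vocabulary observation is a reading of the results after the fact. One way to check it is to count, per cluster, how many anthems mention a few hand-picked terms, and which words are most frequent. The term list, stopword list, and helper names below are illustrative assumptions, not part of the original analysis.

```python
import re
from collections import Counter

# Hand-picked, illustrative term list (an assumption; edit as needed).
RELIGIOUS_TERMS = frozenset({"god", "lord", "sacred", "faith", "divine", "bless", "blessed", "holy"})

# A tiny stopword list so the top-word counts aren't dominated by function words.
STOPWORDS = frozenset({"the", "and", "of", "our", "to", "in", "a", "we", "is", "be",
                       "thy", "thee", "thou", "o", "for", "with", "all", "that", "this"})


def tokenize(text):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"\w+", text.lower())


def keyword_share_by_cluster(lyrics, labels, terms=RELIGIOUS_TERMS):
    """Fraction of anthems in each cluster that contain at least one word from `terms`."""
    hits, totals = Counter(), Counter()
    for text, label in zip(lyrics, labels):
        totals[label] += 1
        if set(tokenize(text)) & terms:
            hits[label] += 1
    return {label: hits[label] / totals[label] for label in sorted(totals)}


def top_words(lyrics, labels, cluster, n=10, stopwords=STOPWORDS):
    """Most frequent non-stopword tokens across all anthems assigned to `cluster`."""
    counts = Counter(
        word
        for text, label in zip(lyrics, labels)
        if label == cluster
        for word in tokenize(text)
        if word not in stopwords
    )
    return [word for word, _ in counts.most_common(n)]
```

If the Religious Devotion cluster really is defined by this vocabulary, its share from `keyword_share_by_cluster` should be noticeably higher than the other two clusters'.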
Group 2: Natural Heritage
This cluster was dominated by African nations, island countries, and Southeast Asian states:
- Ghana
- Kenya
- Nigeria
- Fiji
- Indonesia
- Papua New Guinea
Common themes:
- Natural beauty and land
- Ancestral pride
- Unity and tradition
Example – Ghana:
“Hail to thy name, O Ghana, To thee we make our solemn vow…”
Example – Kenya:
“O God of all creation, Bless this our land and nation, Justice be our shield and defender.”
These countries often emphasized their heritage, unity among diverse groups, and connection to land and nature.
Group 3: Democratic Ideals
This was the most geographically and culturally diverse group, yet it was thematically coherent. It included:
- France
- Germany
- India
- Brazil
- Japan
- Canada
Common themes:
- Freedom and independence
- National unity
- Democratic and humanistic values
Example – France:
“Arise, children of the fatherland, The day of glory has arrived!”
Example – Brazil:
“They heard, on placid shores of the Ipiranga river, the resounding cry of a heroic people”
Example – India:
“Thou art the ruler of the minds of all people, Dispenser of India’s destiny”
These anthems emphasize progress, liberty, and the right to self-determination—reflecting both historical struggles and modern aspirations.
Final Thoughts
This clustering revealed something profound: anthems, at their core, are reflections of national identity. They echo a nation’s story—its values, struggles, and dreams.
- Nations in the Religious Devotion group see themselves through faith and divine purpose.
- The Natural Heritage group celebrates land, community, and tradition.
- The Democratic Ideals group envisions freedom, unity, and progress.
Some results were surprising: the U.S. clustering with predominantly Muslim countries because of its religious tone, or India sharing a cluster with European democracies. But that’s what makes this exploration so compelling. Nations that seem different on the surface often share deep, poetic similarities.
One more interesting find: while cleaning the data, I discovered that Cyprus and Greece share the same national anthem, the “Hymn to Liberty”!
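Shared anthems like this are easy to flag during cleaning by grouping countries on a normalized version of their lyrics. Here's a minimal sketch, assuming the data is a mapping from country name to lyrics; the function names are hypothetical, and the original notebook may have spotted the duplicate differently.

```python
import re
from collections import defaultdict


def normalize_lyrics(text):
    """Lowercase and keep only word tokens, so punctuation and spacing differences are ignored."""
    return " ".join(re.findall(r"\w+", text.lower()))


def find_shared_anthems(anthems):
    """Given {country: lyrics}, return sorted groups of countries whose normalized lyrics match."""
    groups = defaultdict(list)
    for country, lyrics in anthems.items():
        groups[normalize_lyrics(lyrics)].append(country)
    return sorted(sorted(countries) for countries in groups.values() if len(countries) > 1)
```

This only catches exact matches after normalization; two slightly different translations of the same anthem would slip through, and comparing embedding cosine similarity would be one way to catch those. Identical lyrics also produce identical embeddings, so such countries will always land in the same cluster.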
In the end, I didn’t just learn about data or clustering. I discovered that when countries sing about themselves, they tell stories—stories that transcend borders, stories that unite people.
And as Tyrion said, there’s nothing more powerful than a good story.
If you’d like to explore the code or try the clustering yourself, here are the notebook and dataset links.