The World of Wikipedia

Algorithms - Learn How to Make Your Own Maps

We start with raw, vectorized data representing Wikipedia articles, which comes from WikiBrain. Each article is represented by a 100-dimensional vector that captures its relationship to other articles (via word occurrence and linking context). These vectors are then clustered using the k-means algorithm, and the resulting clusters later become the colored countries seen on the map.
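As a rough illustration of the clustering step, here is a minimal, dependency-free sketch of k-means (Lloyd's algorithm). The function name `kmeans`, its parameters, and the toy data are assumptions made for this example; they are not taken from our code base, and a production pipeline would normally rely on an optimized library instead.

```python
import random

def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd's k-means. Returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k vectors picked at random from the data.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        for i, v in enumerate(vectors):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy stand-in for article vectors: two well-separated groups in 100 dimensions.
rng = random.Random(1)
articles = [[rng.gauss(0, 0.1) for _ in range(100)] for _ in range(20)]
articles += [[rng.gauss(5, 0.1) for _ in range(100)] for _ in range(20)]

labels, centroids = kmeans(articles, k=2)
```

Each entry in `labels` is the cluster, and so eventually the country, that the corresponding article belongs to.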

The next step is to move from the 100-dimensional vector space to a 2-dimensional embedding that we can represent visually. We do this using t-SNE, which stands for t-Distributed Stochastic Neighbor Embedding. The result is a flat scatter of points, one per article.
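As a rough sketch of this projection step, the snippet below runs t-SNE on random stand-in vectors. It assumes NumPy and scikit-learn are installed and uses scikit-learn's `TSNE` class purely for illustration; the t-SNE implementation, the perplexity value, and the data shown here are assumptions, not a description of our actual pipeline.

```python
import numpy as np
from sklearn.manifold import TSNE

# Toy stand-in for the article vectors: 60 articles, 100 dimensions each.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(60, 100))

# Project to 2 dimensions. Perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=10, init="pca", random_state=0)
embedding = tsne.fit_transform(vectors)

# One (x, y) position per article, ready to be plotted on the map.
xs, ys = embedding[:, 0], embedding[:, 1]
```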

From there, we draw the country borders. This involves setting a "water level", which floods water points into the less-dense regions of the map, creating lakes and coastlines. Outlier points are thrown out before border generation in order to create neater, more coherent countries.
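One simplified way to picture the water level is as a density threshold on a grid: cells that hold too few article points count as water, and the rest count as land. The sketch below illustrates that idea only. The function `flood`, the parameters `cell_size` and `water_level`, and the sample points are hypothetical, and the real border generation is more involved than this.

```python
from collections import Counter

def flood(points, cell_size, water_level):
    """Bin 2-D points into square grid cells and return the land cells.

    A cell is land when it holds at least `water_level` points;
    every other cell is treated as water.
    """
    counts = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in points
    )
    land = {cell for cell, n in counts.items() if n >= water_level}
    return land, counts

# Four points crowd into one cell; a single stray point sits far away.
points = [(0.1, 0.2), (0.3, 0.4), (0.5, 0.5), (0.8, 0.9), (5.5, 5.5)]
land, counts = flood(points, cell_size=1.0, water_level=3)
```

Raising `water_level` shrinks the land area and produces more lakes and more ragged coastlines; lowering it does the opposite.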

Next, we generate the contours. There are two options. Density-based contouring is the straightforward kind, where brighter areas represent higher density. Centroid-based contouring draws the contours based on the center of a cluster in 100-dimensional space: the closer an article is to the center, the brighter the color.
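To make the centroid-based option concrete, here is a minimal sketch that turns distance from a cluster center into a brightness value between 0 and 1. The function name `centroid_brightness` and the linear scaling are assumptions chosen for illustration; the example uses 2-dimensional vectors for readability, but the same code works for 100 dimensions.

```python
import math

def centroid_brightness(vectors, centroid):
    """Return one brightness per vector: 1.0 at the centroid itself,
    falling linearly to 0.0 at the vector farthest from the centroid."""
    dists = [math.dist(v, centroid) for v in vectors]
    farthest = max(dists)
    if farthest == 0:
        # Every vector sits exactly on the centroid.
        return [1.0] * len(vectors)
    return [1.0 - d / farthest for d in dists]

# Distances from the centroid are 0, 5 and 10.
cluster = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
brightness = centroid_brightness(cluster, centroid=[0.0, 0.0])
```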

The final generation step involves labeling the map. Label rendering is generally handled by Mapnik, our mapping library, but we wrote the code that determines which labels show up as you zoom in and how large each label is based on its popularity.
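The following sketch shows one possible way to derive a label's first visible zoom level and font size from popularity. The function `assign_labels`, the doubling rule for labels per zoom level, and the size range are all hypothetical choices made for this example; they are not our actual rules, and the snippet does not call Mapnik.

```python
def assign_labels(articles, max_zoom=10, min_size=10, max_size=28):
    """articles: list of (title, popularity) pairs with positive popularity.

    Returns {title: (min_zoom, font_size)}, where min_zoom is the first
    zoom level at which the label is shown.
    """
    if not articles:
        return {}
    ranked = sorted(articles, key=lambda a: a[1], reverse=True)
    top = ranked[0][1]
    result = {}
    for rank, (title, popularity) in enumerate(ranked):
        # Each zoom level reveals twice as many labels as the previous one:
        # rank 0 at zoom 0, ranks 1-2 at zoom 1, ranks 3-6 at zoom 2, ...
        min_zoom = min(max_zoom, (rank + 1).bit_length() - 1)
        # Font size scales linearly with popularity relative to the top article.
        size = min_size + (max_size - min_size) * popularity / top
        result[title] = (min_zoom, round(size))
    return result

labels = assign_labels([("A", 1000), ("B", 500), ("C", 250), ("D", 100)])
```

Here the most popular article is labeled from the start at the largest size, while less popular articles only appear, in smaller type, once the reader zooms in.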

User interaction is done in Leaflet.js, a JavaScript mapping library with a wide selection of plugins to enable functions like search and interactivity.

If you'd like to know more or want to check out our code, fork us on GitHub!

