Malaria and Population Density in Africa


This map shows population density as well as the malaria (Plasmodium falciparum) incidence rate per 1,000 people in Africa in 2015. Areas of high population density are represented by vertical rises in the lines, and malaria incidence is represented by the color of the lines. The lines are grey where there was no data for malaria incidence.

Malaria is caused by Plasmodium parasites. Plasmodium falciparum is the most prevalent type in Africa, and it is responsible for most of the world’s malaria-related deaths. The map above shows that many large population centers in Africa, like Addis Ababa, Cairo, and Johannesburg, are at a low risk of malaria. However, large swaths of densely populated West Africa are at a very high risk of malaria, as is much of Central Africa and the region stretching southeast towards Mozambique.

About the map and data:

Data on malaria incidence came from the Malaria Atlas Project based out of the University of Oxford. The raster of malaria incidence plotted below was used to make the main map.


Data on population density was obtained from NASA’s Socioeconomic Data and Applications Center and had global coverage. In order to limit the population density raster to African countries, a shapefile containing only African countries was merged with the global population density data, creating the plot of the data below.


Using R, the rasters for population density and malaria were merged to put all the needed data into one place. Then, using the rasterToPoints command in the “raster” package, the raster data was transformed into a data frame that could be plotted with ggplot. In order to create the heart monitor-like look of the map, the points on each line of latitude had to be grouped together so ggplot would know to draw lines connecting each point of longitude that lay on the same line of latitude. Population data from 10 latitude points above and 10 latitude points below each line was aggregated, and malaria incidence data was averaged, to avoid losing any data while only drawing 1/20th of the latitude lines. The code to create this map can be found on my GitHub page.
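The banding step described above can be sketched in base R. This is only an illustration under stated assumptions, not the code from the GitHub page: the column names (x, y, pop, malaria), the toy values, and the use of fixed bands of 20 consecutive latitude rows (rather than a window centered on each drawn line) are all assumptions made for the example.

```r
# Toy stand-in for the data frame produced by rasterToPoints():
# 40 latitude rows by 5 longitude columns, with made-up values.
set.seed(1)
grid <- expand.grid(x = 1:5, y = 1:40)
grid$pop <- runif(nrow(grid), 0, 100)
grid$malaria <- runif(nrow(grid), 0, 500)

# Assign each latitude row to a band of 20 consecutive rows,
# so only 1/20th of the latitude lines end up being drawn.
rows_per_band <- 20
lat_index <- match(grid$y, sort(unique(grid$y)))
grid$band <- (lat_index - 1) %/% rows_per_band

# Within each band and longitude, sum population and average malaria incidence.
pop_sum <- aggregate(pop ~ band + x, data = grid, FUN = sum)
mal_mean <- aggregate(malaria ~ band + x, data = grid, FUN = mean)
bands <- merge(pop_sum, mal_mean, by = c("band", "x"))
```

Each band could then be drawn as one line in ggplot by mapping the band to the group aesthetic, the summed population to the vertical offset, and the averaged malaria incidence to the line color.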

Mapping Web Traffic: A Post on a Previous Post

I got lucky with my last post: a friend helped me publish it in Foreign Policy. In order to publish the interactive map created in D3, I had to host it on my own server, which meant I had access to more web traffic data than this lowly blog ever gets.

My hosting service doesn’t provide me with much data but there was enough to provide a pretty good sample of the hits that the Foreign Policy article received. Below are two maps of where people were viewing the article from and on which kind of device, and then some explanation on how the maps were made.


The map above shows 1,000 IP locations for visitors to this article. The article is in English and from an American publication, which likely explains why most of the visits are from the US (56% of visitors) or from countries where English is widely spoken. There are still quite a few visitors from parts of Asia, but very few from the “global south.” The fact that Foreign Policy is not widely read there isn’t particularly surprising. Just 0.6% of the visitors are from mainland China, while 2.2% are from Hong Kong and 0.8% are from Taiwan. Maybe it’s best that home buyers in large Chinese cities aren’t reading another article on a potential real estate bubble.


By looking at another dataset from my server, I was able to extract both IP locations and the type of device used to access the article. One reason this data is mildly interesting is that the interactive graphic in the Foreign Policy article was not viewable on mobile devices (or at least not on iPhones), so every iPhone and Android “dot” was probably confused when the article mentioned how to interact with the map. Luckily, only 28% of people were using mobile devices to read the article.

About the maps and data: 

The shapefiles for the maps were downloaded from Natural Earth and loaded into R using the rgdal package.

Data for the first map was collected from the AWStats service of Bluehost, my server host. The data was already in a table, so it was just pasted into a CSV file. The data covers visitors from 6:00am on July 27 to 6:00am on July 28. IP addresses were linked to geographic locations with an R function created by Andrew Ziem. The map image was created with ggplot:

library(ggplot2)

# World polygons (map) overlaid with one translucent red dot per IP location (ip_df)
ip_world_map <- ggplot() +
   geom_polygon(data = map, aes(long, lat, group = group)) +
   coord_equal() +
   # Dot size scales with the number of hits from each location
   geom_point(data = ip_df, aes(x = longitude, y = latitude, size = hits),
      color = "red", alpha = .15) +
   scale_size(range = c(.1, 3)) +
   ggtitle("Website Visitor IP Locations") +
   # Hide axis ticks and labels
   theme(plot.title = element_text(lineheight = .8),
      axis.ticks.y = element_blank(),
      axis.text.y = element_blank(),
      axis.ticks.x = element_blank(),
      axis.text.x = element_blank())


Data for the second map of IP locations and devices was gathered by going to “Latest Visitors” on my server’s cPanel and then viewing the page source. From there I was able to copy the JSON into a text editor and read it into R using the rjson package. Originally each of the 385 objects from the JSON file looked like this (IP address scrubbed out):

{"localtime":"7\/27\/16 7:52 PM","protocol":"HTTP\/1.1","status":"200","ip":"##.###.##.###","httpdate":"27\/Jul\/2016:19:52:52","size":"36274","timestamp":1469670772,"agent":"Mozilla\/5.0 (iPad; CPU OS 9_3_2 like Mac OS X) AppleWebKit\/601.1.46 (KHTML, like Gecko) Mobile\/13F69","url":"\/chinarealestate\/Indexed_China_Housing.csv","tz":"-0600","method":"GET","referer":"http:\/\/\/chinarealestate\/","line":999}

This wasn’t very hard to turn into a nice R-friendly data frame, but parsing the text to get the device was harder than expected because the device type was embedded in a string like this: “Mozilla\/5.0 (iPad; CPU OS 9_3_2 like Mac OS X) AppleWebKit\/601.1.46 (KHTML, like Gecko) Mobile\/13F69”. First I extracted the characters between the first set of parentheses, then took the first word in that string. This was a useful resource for figuring out how to parse text. The same function used to get the geographic locations of IP addresses for the first map was used to get the latitude and longitude of the IP addresses for this map. Finally, ggplot was used to create the map image. The data are from 3:00pm-8:00pm on July 27 and 6:00am-10:00am on July 28.
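The two parsing steps described above (take the text inside the first set of parentheses, then keep its first word) can be sketched in base R with regular expressions. This is a minimal illustration rather than the code from the GitHub repository: the function name extract_device is hypothetical, and the second and third user-agent strings are made up for the example. It also assumes the JSON has already been parsed, so the escaped "\/" appears as a plain "/".

```r
# Hypothetical helper: return the first word inside the first pair of
# parentheses in each user-agent string, or NA if there is no such pair.
extract_device <- function(agent) {
  # Capture everything between the first "(" and the next ")".
  inside <- sub("^[^(]*\\(([^)]*)\\).*$", "\\1", agent)
  # sub() returns non-matching strings unchanged, so mark those as missing.
  inside[!grepl("\\([^)]*\\)", agent)] <- NA_character_
  # Keep the leading run of letters and digits, dropping ";" and the rest.
  sub("^([[:alnum:]]+).*$", "\\1", inside)
}

agents <- c(
  "Mozilla/5.0 (iPad; CPU OS 9_3_2 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Mobile/13F69",
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)",  # illustrative
  "curl/7.47.0"                                                                         # illustrative, no parentheses
)

devices <- extract_device(agents)
```

A first-word rule like this is coarse: it separates an iPad from a Windows machine, but further cleanup would likely be needed for strings whose first token is generic, such as "Linux" on Android devices.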

The R code can be found here on GitHub.