I've been working on some mapping stuff for an internet business. I love working with maps and data - there's a real geeky pleasure to seeing numbers translate to places, making sense of data in the real world. The website in question is on the front page of Google for almost all of its target searches and these are all geographical ("event catering in Warwickshire", for example). In the back end of the website, there are 400 or so locations around the UK that form the basis of these targeted Google search results. My job is to look at the geographical coverage of the site and identify if there are missing areas. I started off by extracting the 400 or so postcodes from the back end of the site, then moving these over to Google Maps - "My Maps" CSV import feature makes this sort of thing brilliantly easy.
There we are. Of course, given any pile of geographical data, you can produce something like this, which is brilliant for reports, funding bids and so on - this is why I'm such an advocate for good, basic data. If you're a small charity, recording the home postcode of everyone you work with can lead to really professional-looking reports.
Anyway - from this we can see gaps in the coverage - some areas of Wales, Skegness, and so on. This may not really matter, because there might not be many target customers in those areas anyway - that's for the website owners to decide. The next step is to compare this with real geographical areas, the most important ones being counties and major cities. It's surprisingly easy to get the data to accurately draw counties - you can get it from the ONS in KML format. Essentially this is a big list of points to draw on the map and then join up, giving you a county area - like so (this is Buckinghamshire).
The trouble is that there are so many points that the text file which contains all of this information is about 20MB, which isn't much in general terms but is massive for a text file, and Google Maps will only let you add KML files up to 5MB. One option is to separate out the counties individually, but doing this would draw so many points on the map that the map as a whole gets very slow and unresponsive. To give you an idea, the file to draw Buckinghamshire contained about 15,000 coordinate pairs. What I needed was a way of smoothing the shape.
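If you want to check how many points a KML file really holds before deciding how hard to thin it, something like this Python sketch would do it. It is only an illustration: the function name `read_coordinates` and the tiny sample file are made up for the example, and it assumes the boundaries sit in standard KML `<coordinates>` elements, where each point is written as longitude,latitude with whitespace between points.

```python
import xml.etree.ElementTree as ET


def read_coordinates(kml_text):
    """Return one list of (longitude, latitude) tuples per <coordinates> element."""
    root = ET.fromstring(kml_text)
    rings = []
    for element in root.iter():
        # Compare the local tag name only, so the KML namespace version doesn't matter.
        if element.tag.rsplit("}", 1)[-1] == "coordinates" and element.text:
            ring = []
            for point_text in element.text.split():
                parts = point_text.split(",")  # longitude, latitude, optional altitude
                ring.append((float(parts[0]), float(parts[1])))
            rings.append(ring)
    return rings


# A made-up four-point shape, NOT real ONS data.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>Sample county</name>
    <Polygon><outerBoundaryIs><LinearRing>
      <coordinates>
        -0.80,51.80 -0.70,51.90 -0.60,51.80 -0.80,51.80
      </coordinates>
    </LinearRing></outerBoundaryIs></Polygon>
  </Placemark>
</kml>"""

rings = read_coordinates(SAMPLE_KML)
print(len(rings), "ring(s);", len(rings[0]), "points in the first")
```

Run against a real county file (read in with `open(...).read()`), the second number is the one to watch: it's the count that was around 15,000 for Buckinghamshire.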
The points are in a list, like this...
...latitude and longitude pairs. I worked out that removing every other point would roughly halve the file size but keep the shape basically the same (until you zoom right in, which I'm not interested in anyway). So how to discard every other point?
It turns out this is rather difficult. You can do it with Excel macros, but (I did warn you this was geeky) I did it with the Unix utility sed. Working on a Mac, I used Terminal, then the following command:
sed -n 'n;p' a.txt > b.txt
Which means "take a.txt, go through it, discard every odd-numbered line (so lines 2, 4, 6 and so on are kept), then write the output to a new file called b.txt".
In fact, I did this process three times, ending up with d.txt, which is approximately one-eighth the size of the original list of points.
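For anyone who'd rather not chain commands, here's a rough Python sketch of the same idea. Halving three times keeps the same points as taking every eighth one in a single pass, so about 15,000 points would come down to roughly 1,875. The names `thin_points` and `close_ring` are my own for this example, and the list of points is a made-up stand-in for the real coordinate lines. The `close_ring` step is an extra that wasn't part of the sed approach: KML expects the last point of a polygon ring to repeat the first, and thinning can drop that closing point.

```python
def thin_points(points, step=2):
    """Keep every step-th point, counting from 1 (points 2, 4, 6... when step is 2)."""
    if step < 1:
        raise ValueError("step must be at least 1")
    return points[step - 1::step]


def close_ring(points):
    """Repeat the first point at the end if the ring isn't already closed."""
    if points and points[0] != points[-1]:
        return points + [points[0]]
    return points


# Hypothetical stand-in for one county's coordinate lines (16 made-up entries).
points = [f"point {n}" for n in range(1, 17)]

# Three successive halvings, as with a.txt -> b.txt -> c.txt -> d.txt ...
three_passes = thin_points(thin_points(thin_points(points)))
# ... keep the same points as one pass that takes every eighth point.
one_pass = thin_points(points, step=8)

print(three_passes == one_pass)
print(close_ring(one_pass))
```

The `step` argument makes it easy to experiment: try a few values, look at the result on the map at the zoom level you care about, and stop at the largest one that still looks right.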
I'm working through the list of 27 counties at the moment. If you want a copy of this KML file then get in touch with me on Twitter (@data_gardener) and I'll send it over!
Here's Cambridgeshire treated in this fashion. As you can see, at the zoom levels we need, there's no noticeable loss of resolution, but the file size has been greatly reduced.
Until next time, if I haven't put you off reading my blog entirely :D