Address Twins – Postscript

Well!
You're Winner!
So my previous blog post ended up being quite popular on Twitter. I mean, not Justin Bieber levels of popular, but for the sake of context, when I tweeted about my Isle Of Wight Map Names blog post, it got precisely one ‘like’ and one retweet.

So, when I got up to 40 ‘likes’ I thought “Yes! Fame at last!” But then the ‘likes’ and retweets kept coming, until they gradually started to tail off.

And then data-god Ben Goldacre tweeted about it (and spoilered the big reveal! Seriously, if you’re not up to date with Game Of Thrones, don’t talk to Ben!). At that point it went a bit silly, and now my website’s Google Analytics look like this…

[Image: Google Analytics graph showing a huge increase in hits]

Most of the comments centred on time. How much time did I spend? Do I have too much spare time on my hands? What is time anyway?

So I thought I’d write a follow-up post. Unfortunately, life stuff got in the way – as it is wont to do – and now it’s over four months later (time eh?). But here it is anyway!


The first thing I should explain is that when I said something like “I removed uniquely-named streets, and calculated the distance between any two streets with the same name”, what I meant was that my computer did this.

Reading my post back, I realised that if someone was unfamiliar with processing lots of data, they might form the impression that I did all this work by hand.

I didn’t.

What generally happened was I’d spend somewhere up to five minutes writing a script to do something, and then I’d set the computer off doing the work while I did the washing up, or went to the shops, or read a book, or just did something else.

And then, with the resulting output, I’d run a new script to do the next stage, etc.

TECH INFO:

I did all the processing on tab-delimited data files using PHP, because I know it. Also, I ran everything off a virtual RAM drive for speed and to avoid wear-and-tear on my hard drive.
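For the curious, here is a minimal sketch of what one such stage could look like: drop the uniquely-named streets, then measure the distance between every pair that shares a name. It is in Python rather than the PHP actually used, and the column names (`street`, `easting`, `northing`), the grid coordinates and the sample rows are all made up for illustration.

```python
import csv
import io
import math
from collections import defaultdict
from itertools import combinations

# Made-up sample rows standing in for a tab-delimited file.
# Assumed columns: street name, easting, northing (metres on a grid).
SAMPLE_TSV = (
    "street\teasting\tnorthing\n"
    "High Street\t1000\t2000\n"
    "High Street\t4000\t6000\n"
    "Unique Lane\t500\t500\n"
    "Mill Road\t0\t0\n"
    "Mill Road\t300\t400\n"
)


def same_name_distances(tsv_text):
    """Return (name, distance_in_metres) for each pair of same-named streets."""
    by_name = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        point = (float(row["easting"]), float(row["northing"]))
        by_name[row["street"]].append(point)

    results = []
    for name, points in by_name.items():
        if len(points) < 2:
            continue  # uniquely-named street: nothing to pair it with
        for (x1, y1), (x2, y2) in combinations(points, 2):
            results.append((name, math.hypot(x2 - x1, y2 - y1)))

    # Sorted by distance purely to make the output easy to scan.
    return sorted(results, key=lambda item: item[1])


pairs = same_name_distances(SAMPLE_TSV)
for name, distance in pairs:
    print(f"{name}\t{distance:.0f}")
```

Each stage reads one tab-delimited input and produces output the next script can pick up, which is what makes the "write a quick script, go and do the washing up" routine workable.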

I’m a self-taught programmer/data nerd/etc. so my methods are possibly a bit hacky!

So when it came to using the HM Land Registry data, I personally didn’t really spend much time on it. It was mostly just my PC churning data while I got on with my life. The most data-intensive part was geolocating all the postcodes. Even with the process optimised as much as possible, this took the computer hours. I started the script running in the evening and it was still going when I went to bed, so I left it running. It had finished by the morning. When I said “All this took ages”, I meant that in the context of “My computer was churning data for 10 hours”.
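As a rough idea of what geolocating postcodes involves, here is a small sketch that loads a postcode lookup table into memory once and then matches each address row against it. This is only one plausible approach, written in Python rather than PHP; the table layouts, the column names and every value in the sample data are invented, and it says nothing about how the real script was optimised.

```python
import csv
import io

# Hypothetical lookup table: postcode, latitude, longitude (invented values).
LOOKUP_TSV = (
    "postcode\tlat\tlon\n"
    "AB1 2CD\t57.1\t-2.1\n"
    "ZZ9 9ZZ\t50.5\t-1.3\n"
)

# Hypothetical address rows with inconsistently formatted postcodes.
ADDRESSES_TSV = (
    "street\tpostcode\n"
    "High Street\tab12cd\n"
    "Mill Road\tZZ9  9ZZ\n"
    "Nowhere Lane\tQQ1 1QQ\n"
)


def normalise(postcode):
    """Upper-case and strip all whitespace so formatting differences don't matter."""
    return "".join(postcode.split()).upper()


def load_lookup(tsv_text):
    """Read the lookup table once into a dict keyed by normalised postcode."""
    lookup = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        lookup[normalise(row["postcode"])] = (float(row["lat"]), float(row["lon"]))
    return lookup


def geolocate(addresses_tsv, lookup):
    """Split address rows into those with coordinates and those without a match."""
    located, missing = [], []
    for row in csv.DictReader(io.StringIO(addresses_tsv), delimiter="\t"):
        coords = lookup.get(normalise(row["postcode"]))
        if coords is None:
            missing.append(row["street"])
        else:
            located.append((row["street"], *coords))
    return located, missing


located, missing = geolocate(ADDRESSES_TSV, load_lookup(LOOKUP_TSV))
print(located)
print(missing)
```

Keeping the unmatched rows in a separate list makes it easy to see afterwards which postcodes were missing from the lookup, rather than having them silently vanish.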

Even the results didn’t take much checking. All the problems I mentioned about houses appearing under two different postcodes, and new-built houses having the wrong postcodes, and so on, only applied to pairs of streets which were in the same postcode area. Because I started by checking streets which were in different areas, the ultimate result was literally about the second pair I checked.

If I’d started out using this data, I’d have finished the project pretty quickly, including getting my housework done!

…but I didn’t.


So, I mentioned errors in the Ordnance Survey data a few times, and some people got the impression I was slagging off the OS. Well, I wasn’t. I love the OS!

The thing is, with both this project and my previous street/town names post I was using the data in a really unorthodox way. The OS manages such a huge amount of data that errors are inevitable, but they can’t be corrected until someone discovers them. If an error is of a type which doesn’t show up during ‘normal’ usage of the data, it might never be found.

The process for the OS data was pretty much identical to the Land Registry one. The only difference was that I had to check the results by hand. If I’d known how long this would take, I wouldn’t have started it – although, I did optimise the process somewhat.

I find it difficult to believe anyone actually cares about the details here, but basically it involved creating an HTML page with a list of street names, with two map links per name. I then went down the list in batches while listening to podcasts, just opening the two links, seeing if it was actually the same street twice, and then closing them again. Since visited links changed colour, that served as a record of which ones I’d done. I decided to do this while listening to podcasts because I listen to podcasts anyway, so I wasn’t wasting time I could be using for something else.
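A checking page like that takes only a few lines to generate. The sketch below is in Python rather than PHP, the candidate pairs are invented, and the OpenStreetMap marker URLs are an assumption standing in for whichever map site was really used.

```python
import html

# Invented candidate pairs: a street name plus two (lat, lon) points to compare.
CANDIDATES = [
    ("High Street", (57.1, -2.1), (57.1005, -2.1003)),
    ("Mill Road & Yard", (50.5, -1.3), (51.2, -0.9)),
]


def map_url(lat, lon):
    """Build an OpenStreetMap marker URL (assumed map site, not the original one)."""
    return f"https://www.openstreetmap.org/?mlat={lat}&mlon={lon}#map=17/{lat}/{lon}"


def build_checklist(candidates):
    """Return an HTML page listing each name with two map links side by side."""
    items = []
    for name, first, second in candidates:
        # Escape names and URLs so '&' and quotes cannot break the markup.
        link_a = html.escape(map_url(*first))
        link_b = html.escape(map_url(*second))
        items.append(
            f"<li>{html.escape(name)}: "
            f'<a href="{link_a}" target="_blank">A</a> '
            f'<a href="{link_b}" target="_blank">B</a></li>'
        )
    body = "\n".join(items)
    return f"<!DOCTYPE html>\n<html><body><ul>\n{body}\n</ul></body></html>\n"


page = build_checklist(CANDIDATES)
print(page)
```

The page deliberately has no styling of its own, so the browser's default colour for visited links still works as the record of which pairs have been checked.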


Too much spare time?

A common thread in the responses to my post was disbelief that I’d spend so much time on this. Some of these responses were jokey from people who were clearly fellow nerds, and possibly recognised something of themselves in my madness, but other people seemed to genuinely think I’m a bit mentally ill.

I’ve never understood the whole “you have too much free time” sentiment. People only ever seem to say it about nerdy pastimes, even in jest.

If someone watches all 32 hours (per series) of X-Factor, that’s seen as an acceptable use of their time. But if they build a large-scale model of Lincoln Cathedral out of matchsticks, somebody will joke that they have too much spare time on their hands – even if they built it while watching X-Factor.

A journalist from the Manchester Evening News asked me why I did this. I know that he was just prompting me for quotes he could use, but I’d never actually stopped to consider why I was doing it. I mean, I was curious about something so I found out the answer. Don’t other people do that?

I experience a similar thing with my website, british-film-locations.com, for which I track down movie filming locations in the British Isles. People can’t believe how much work I’ve put into the site for seemingly no reward (including the producer of the TV show QI, who you’d expect to be used to people who’re obsessed with tracking down information!). For me, finding the information is the reward.

I have dozens of ongoing personal ‘projects’ – things I’m trying to research, or find answers to. Not important things in the grand scheme of things, just things which piqued my interest. Some of them will, I imagine, end up as future blog posts. It never really occurred to me that this might be considered weird.

This project consisted almost entirely of things I enjoy doing: recreational programming, creative problem solving, finding out new and quirky things about the world. So as strange as it might sound, this is recreation for me – in the same way watching X-Factor is for other people.

The only difference is that my hobby got me a fleeting fifteen seconds of mild fame! Plus some money from a major magazine, a two-page article in the Manchester Evening News, my own tag on Neatorama, and Twitter follows from some of my major nerd/data heroes (who I’m fairly sure are going to unfollow me again once they realise their mistake)!

So yeah. Being curious about the world – even its mundane aspects – should be the norm. If we were all brought up to be more curious, and were given the tools to find our own answers, the world might not be in quite the mess it currently finds itself.

Anyway, that’s maybe straying too far into profundity. I hope to be a bit more prolific in my writing output this year. I currently have ideas for about a dozen future blog posts, all on random topics. I may get around to actually writing some of them. If I do, I hope you like them.
