
Years back (while I was still in college) I did a few projects that entailed a lot of time studying the small town of Deerfield, MA (often referred to as the most over-studied small town in America).  During this process, my advisor (Bob) introduced me to what was known as The Deerfield Lunch. The Deerfield Lunch was an irregular brown-bag affair, the attendees of which tended to fluctuate for a variety of reasons, the most commonplace being that everyone was quite busy.  You see, the people who sat down to these lunches were all scholars of varying stripe, many of them leaders in their field.

I managed to attend a couple Deerfield Lunches (maybe a few – it was a long time ago).  They were fun and extremely informative.  The first one I attended made the largest impression on me, mainly because it was there that I met Abbott Lowell Cummings.  Those of you who don’t happen to be students of New England architectural history may not have heard of him, but he’s a pretty big dog in the world of Domestic Architecture, or – more accurately – Vernacular Architecture.  He’s also an absolute sweetheart.

Vernacular Architecture is one of those terms that has been defined extensively, but unfortunately in varying ways.  The basic gist tends to hold more or less the same, but the details get muddy at times.  In a nutshell, Vernacular Architecture describes structures that were constructed by someone who had not been formally educated in the discipline of architecture.  This does not mean they were unskilled or unprofessional.  Neither does it mean that they were inexperienced or incompetent.  All it means is that a builder of Vernacular structures has not been formally educated as an architect.  And it most certainly does not mean that Vernacular structures are in any way sub-standard.

To drive the point home, it has been estimated that as much as 90% of the extant structures on the face of the Earth are, in fact, Vernacular Architecture.

Anyway, a slew of conversations I’ve been having lately have brought Abbott to mind.  He’s a big fan of maps in general and GIS in particular, and I think he’d be interested in the ongoing dialogue about data sources.

You know the one I’m talking about.  It usually starts when someone says ‘crowd-sourced’ or ‘authoritative’ and then it tends to degenerate from there.  Pretty soon, everyone’s arguing about data quality and reliability as if they have a direct correlation to the data’s source (they don’t, in case you were wondering).

As far as I can tell, the problems all stem from inadequate and/or ill-defined terminology.  There seems to be a certain amount of determination to doggedly stick to terms that are not appropriate to the task at hand.  While I love the term ‘crowd-sourced’, it has some connotations attached to it that are rather counter-productive.  It’s the crowd, after all.  Barely a step away from the mob. And we all know about the mob – they’re unskilled and unruly.  They’re the Great Unwashed.  They stormed the Bastille, for crying out loud!

All personal interpretations aside, though, ‘crowd-sourced’ doesn’t actually describe what we’re talking about here.  At least, it doesn’t according to the guy who coined the term.  In fact, ‘crowd-sourced’ is far closer to ‘outsourced’ than it is to ‘volunteered’.  When we ‘crowd-source’ a job, we start by looking for volunteers.  Failing that, we hire free-lancers.  The waters only get cloudier (no pun intended) when we start adding cost to the definition.

We’re not just talking about free v. not-free data here (and by ‘free’, I’m talking about monetary cost.  Any other form of ‘free’ falls under licensing, which is far too large a discussion to delve into here).  If we were, we could just draw that line.  Neither are we talking about good quality v. poor quality data.  Again, we could simply draw that line and be done with it.  The problem we’re having is that free and not-free, as well as good quality and poor quality, fall on all sides.

The point I’m trying to make is that if we must draw lines (and it appears that there’s no avoiding it), we should do so using terms that do not imply judgment calls in regard to cost and/or quality.

And since everyone seems to be trying to draw lines according to data source, why don’t we go ahead and do just that?  A reasonable attempt was made with Volunteered Geographic Information (VGI), except that ‘volunteered’ describes the method by which the data is delivered, not the source from which it comes.  I also think we should avoid narrowing our definitions to geographical data.  Data comes from a variety of sources – applying geography to it is kind of our job, isn’t it?  Besides – I think we Map Dorks tend to put too much emphasis on the ‘G’ in GIS.  GIS should be information that is geographically informed, not information that’s geographically driven.

Another horrible attempt was to label some data as ‘authoritative’ (as opposed to ‘crowd-sourced’).  I hope I don’t have to spell out everything that’s wrong with this one.  ‘Authoritarian’ would probably hit closer to the intent behind this choice.  ‘Official’ would be a better term to describe the source, but it still implies data that is more accurate, correct, or just plain better.

Why don’t we stop worrying about cost (it seems obvious enough) and quality (a judgment call best left to the individual user) and focus our attention on data source?  Where does our data come from (aside from the data we gather ourselves, which need not enter into this conversation)?  While this division usually ends in two categories, I would argue that three is a more appropriate number (and I do not mean to imply that three categories can adequately include all available sources of data.  They can cover a very large percentage of them, though).  Here, presented in an order that is not intended to imply or connote a bloody thing, are the three categories I would separate our data into, including the terms I use to describe them along with a brief definition and examples.

1) Governmental:  This is probably the largest of the three categories.  This describes data that is produced directly by a governmental body, be it national, regional or local.  This data is often free (unless you count the fact that we pay for it through taxes), but not always.  This data is often described as ‘official’ or ‘authoritative’.  While these descriptions are technically correct, they should not be taken to mean that governmental data is necessarily more accurate or ‘correct’ than other sources of data (see the previous post for a brief discussion of this).  The USGS and the oft-mentioned MassGIS are good examples of Governmental data sources.

2) Commercial:  This is data that springs from professional sources.  I use the term ‘commercial’ instead of ‘professional’ because this is not the only category to include data created by professionals.  In fact, professionals contribute enormous amounts of data in all three categories.  What separates this category from the others is that it exists in the private sector and it consists of data that was gathered, created and/or derived for money.  Of course, after the fact much of the data is made freely available for a variety of reasons, not least of which is that government is often the entity paying for the contract (or corporations so large they might as well be governmental bodies.  You know – like Google and Microsoft).  Companies like GeoEye, DigitalGlobe and Navteq are better known examples of Commercial data sources, but there are many, many more out there.

3) Vernacular:  Vernacular data is data that is provided voluntarily, mostly by private entities, for public consumption.  This is the kind of data that’s provided by people on the ground.  And while most of it is freely accessible to all, a certain amount of knowledge and skill is needed before actually contributing (my mother, for instance, wouldn’t know where to begin).  What separates this kind of data from the others is its self-correcting nature.  Vernacular data tends to be openly editable, which means that anyone who notices errors can correct them.  When this is not the case, the public at large is usually given access to the machinery needed to report errors.   For those of you who don’t know me and have never before read this blog, this comprises my personal favorite source of data, for a variety of reasons (it’s also the one I often refer to as ‘Dork-sourced’).  I choose the word ‘vernacular’ for much the same reason it was chosen to describe structures that have been standing for centuries – because experience and dedication and commitment are often stronger than formal indoctrination.  Examples of these data sources are OpenStreetMap (including the numerous spin-offs that expand upon OSM’s data) and Google Maps (the My Maps aspect, as well as the API).  Other examples (sometimes referred to as ‘passive’ data) include data like geo-tagged photos at Flickr or Panoramio.

The most important thing to remember about all three of these categories is that none of them possess any kind of exclusive claim to accuracy and/or quality.  I’ve chosen the locations of these lines carefully, and I feel they’re safely drawn.  It is important to remember that the lines refer only to source.

In practice, I find that I tend to dance between all three categories, depending on the project at hand and the data I can get my hands on for said project.  I have my personal preferences, and they usually dictate where I start a search for data, but at the end of the day I select my sources based solely upon who offers the best available data for the task I have before me.  I’m sure I’m not alone in this.


In sixty-nine I was twenty-one and I called the road my own
I don’t know when that road turned into the road I’m on

– Jackson Browne

Except in my case it wasn’t ‘69, I was 16, and I’ve never stopped calling the road my own.

See, when I was in High School, I read Kerouac and Kesey, I read about Woody Guthrie and really listened to his music, and I – along with a bunch of my friends – succumbed to the siren song calling us to stand on the highway with our thumbs out.  A month later I returned dirtier, stronger and better than I had ever been.  I came away from the experience with a better understanding of the road, of the United States of America and – most important – a real understanding of freedom.  Travelling just for its own sake (and in a fashion that leaves you at the mercy of fate) – with no money, no real destination and no discrete itinerary – entails a level of freedom that the average person doesn’t really understand.  Reading Kerouac and listening to Woody can net you a glimpse, but you’ll never know the reality of it until you experience it firsthand.

A large part of that freedom is ownership.  When you develop such an intimate relationship with the road, you begin to understand the communal – no, universal – aspect of the road.  The road that – in effect – belongs to everyone.  The road that actually deserves capital letters and will hereafter be given them.  The Road that exists outside of boundaries and municipal spheres of influence (even while passing through them).

This is The Road that OpenStreetMap is about.  The Road that belongs to each and every one of us.  This is why Woody Guthrie would love OSM (although I’m pretty sure Kesey wouldn’t understand it and Kerouac wouldn’t give a rat’s ass about it).  Because a central aspect of OSM is about returning ownership of The Road to us, the people.  You know – the Great Unwashed.

And it is ours, you know.  And not just because we paid for it.

Lately, I’ve been thinking a lot about ownership and how we (the collective, all-inclusive ‘we’) fit into it.  And how ownership differs from possession.  This train of thought started with this discussion.  This post added fuel to the flames.  Now we’re down to the stew my brain has made of it.  You might want to look away.

So the question that bubbles to the surface of my brain stew is:  Do lines in the sand (i.e., political boundaries and/or parcel data) have a place in OSM?

My immediate reaction is to say “no”.  For technical and philosophical reasons.  On a technical level, these are not the sort of data that the average person on the ground is able to provide.  I can easily take my GPS out into the world and accurately record streets, buildings, rivers, railways, bus-stops, parks, bathrooms, pubs, and trees.  These are all concrete physical features that any one of us can locate on the surface of the Earth and record.  More importantly, they’re features that anyone else can check.  Or double-check.  And this – in case you haven’t noticed – is the strength of OSM.  It’s self-correcting.

But we can’t do this with the lines in the sand.  Where – exactly – is the border of your town?  Can you stand on it and take a waypoint?  Sometimes you can.  Most roads have convenient signs telling you when you’re leaving one political sphere of influence and entering into another.  Here in New England, there are often monuments of one sort or another at pertinent locations to mark the dividing line betwixt one town and another.  And these are certainly locations that can be marked – as points.  If you’re not willing to walk the entire border, however, you shouldn’t draw the line in the sand.  Sometimes you can’t just connect the dots.  Actually, most of the time you can’t.

Of course, we often have the option of downloading border data from various (presumably authoritative) governmental sources.  But then we run into the question of whether we have the right to upload that data to OSM.  Personally, I don’t think the payoff is worth the expenditure of neurons necessary to figure it out.  Especially because the ‘authorities’ don’t always agree:

A quick comparison of the counties of western Massachusetts.  The green background with black outlines was provided by the USGS.  The semi-transparent grey foreground with white outlines was provided by MassGIS.  Note the differences.  For the record, the data provided by MassGIS is vastly superior to that provided by the USGS.  Trust me.
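If you'd rather put a number on that kind of disagreement than eyeball it, here is a minimal sketch of one way to go about it. It is not how the comparison above was made. It assumes each border has already been reduced to a plain list of (latitude, longitude) vertices, and it reports the discrete Hausdorff distance between the two lists: the farthest any vertex in one source sits from its nearest neighbor in the other. The coordinates and the names `source_a` and `source_b` are made up for illustration; they are not real USGS or MassGIS data.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres (spherical approximation)


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = math.radians(p[0]), math.radians(p[1])
    lat2, lon2 = math.radians(q[0]), math.radians(q[1])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def directed_hausdorff_m(a, b):
    """Largest distance from any vertex in a to its nearest vertex in b."""
    return max(min(haversine_m(p, q) for q in b) for p in a)


def hausdorff_m(a, b):
    """Symmetric discrete Hausdorff distance between two vertex lists."""
    return max(directed_hausdorff_m(a, b), directed_hausdorff_m(b, a))


# Hypothetical vertices for the same stretch of border from two sources.
source_a = [(42.50, -72.60), (42.50, -72.50), (42.60, -72.50)]
source_b = [(42.50, -72.60), (42.501, -72.50), (42.60, -72.502)]

print("Worst disagreement: %.0f m" % hausdorff_m(source_a, source_b))
```

Because this only compares vertices, two sources that trace the same line with different vertex spacing will look worse than they are, so treat the result as a rough flag for where to look rather than a verdict on which source is right.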

Parcel data is even worse.  Frankly, I don’t know why anyone would want to include parcel data on any map, but then I’ve had a lot of experience with it and therefore I am cognizant of its uselessness.  Parcel maps are more for bean counting than anything else.  Their primary purpose is to delineate taxation and therefore they tend to conform to a “close is good” standard.  They don’t need to be accurate – tax collectors are quite happy to round up.  Take it from a guy who has had occasion to check a large number of parcel maps against the truth on the ground – they are grossly inaccurate (in these parts, it used to be thought that the ground trumped the map and/or the deed.  I’ve seen many maps that have ‘corrected’ acreages on them.  These days, though, the thinking tends in the other direction.  After all – what you paid for is what the paper says you paid for).

On a purely philosophical level, I feel as though lines in the sand have no place in OSM.  Lines in the sand are all about possession.  They are someone’s way of saying “This land is my land.  It’s not your land”.  In my far from humble opinion, this is pretty much the polar opposite of what OSM is about.  OSM is about taking ownership back from the line-drawers and the so-called authorities.  It’s a declaration that the map belongs to us – all of us – and we’d kind of like it to be an accurate map.  If it’s all the same to you.

But then, Kate had an excellent point (she does that, and is almost never annoying about it):  people tend to want to know where they are.  While I agree that this is, indeed, the case, I don’t think borders need to be a part of the picture.  When people go from Town A to Town B, they like to know where they are when they are actually inside the town proper.  But I question whether the average person cares when they cross over the border between Town A and Town B (except, of course, for the 3-year-old in my back seat who always wants to know.  Lucky for him, the driver’s seat is occupied by a daddy with a very accurate personal GPS in his head).  And while I think there is a place in the world for some borders (as I said before, we need some way to determine who’s responsible for plowing the roads and collecting the garbage), I doubt whether that place is on a ‘People’s Map’ like OSM.

Is there a solution?  I think so, and I think Andy hit upon it pretty soundly in the post linked to above:  labels.  With absolutely no lines whatsoever, people have no difficulty identifying points and areas if a map is sprinkled with labels of judicious size and font.  If you doubt me on this one, just look at this map:

If a couple Hobbits can find their way from the Brandywine river to the bowels of Mount Doom without borders, I think maybe OSM can do without them, as well.

I came across this post the other day, and it made flashy things go on and off inside my braincase as normally underused neurons woke up and stretched lazily (do click on the blue letters and read the post).  While I agree with the crux of the above linked post, the light show inside my skull was actually related to (mostly) other ideas.  In my usual, intensely dull, Map Dorkish manner, I was thinking about data.

Really.  It’s something I think about.  A lot.  It’s a sickness.

Anyway, I got to thinking about a discussion I had with a fellow Map Dork on Twitter a short while ago, about data and GIS.  About how the majority of the GIS community spends the bulk of its time thinking about what to do with data, and not enough time thinking about the quality of the data itself.

It’s like this – whenever I make a map, there are two primary components involved in the process.  The first is the software that produces said map.  The leader in the field is far and away Esri, the company that produces ArcGIS (which used to be known as ArcView).  Esri does not produce my software of choice, for a variety of reasons, none of which should be taken as a comment on the software itself (okay – some of it should, but not a lot.  Maybe 30% or so).  Truth is that Esri wins Best In Show when it comes to proprietary software.

In Map Dorkia, though, proprietary software doesn’t carry the kind of weight it does in other fields.  You see, a fair number of Map Dorks also happen to be coders (maybe even most of them).  Because of this, the market has been flooded with a vast number of good, stable, working, free and open source alternatives.  I can’t begin to mention them all, but I will point you to this site, where someone better informed than myself has put together some good overviews (even if parts of them are a bit out of date).

At the end of the day, my go-to GIS application is Quantum GIS (although it’s far from the only one I use).  Like the Esri offerings, Quantum GIS is a good, all-around GIS package (but not as feature-packed).  Unlike EsriWare, Quantum GIS has a huge, talented support base.  Everyone who’s working on Quantum GIS is doing so because they care, not just to get a paycheck.  Think about that.

The second component of any map I make is the data with which I make the map.  This data comes in many shapes and sizes, as well as different formats and/or projections.  The lion’s share of what I actually do involves taking all that crap and turning it into an accurate, useful and (hopefully) visually pleasing map.  The problem that Map Dorks run into at this point is:  Where to get the data?

Often, we turn to the federal government.  The USGS has been producing quality maps almost since the Boston Tea Party, so we tend to think of them as a pretty safe bet.  However, it’s wise to check the fine print on the quadrangle you’re looking at.  Around these parts, they generally date back to the sixties, although many of them were updated in the eighties or nineties.

Our government also provides geographic data through the Census Bureau, in the form of TIGER (Topologically Integrated Geographic Encoding and Referencing system) files.  TIGER data comes in a variety of shapes and sizes, and is of varying accuracy (see below).

These days, most state governments have some sort of GIS department, as do many cities and towns.  These tend to be more accurate than federal sources (although not always) due mainly to the fact that they have a much smaller area of focus.  And, of course, some are better than others.  Here in Massachusetts, we are lucky to have MassGIS.  While MassGIS can be rather quirky (their file naming conventions leave a bit to be desired), they freely offer a wealth of data that tends to be pretty accurate (I know because I’ve checked a fair amount of it on the ground).  They do have a budget, however, so some of their data gets a little old between updates.  And while they offer tons of data via WMS, their servers – well – suck.

For my money, the most accurate data around (besides the data I go out and gather myself, of course) is that which comes from OpenStreetMap.  Steve touched upon this in the post mentioned previously, but it bears repeating.  Because data sources are many and various, it is often difficult to assess the accuracy of the data in question (especially if it’s data depicting an area geographically removed from your own location).

What makes OSM (OpenStreetMap) unique among data providers is the workforce that acquires the data.  The OSM workforce isn’t made up of people looking only for a paycheck.  The OSM workforce doesn’t daydream about something else while they’re gathering data.  The OSM workforce is extraordinarily focused on the job at hand because they’re doing it only because they really want to.  They also really want the data to be accurate.

Possibly the most important aspect of the OSM workforce is their proximity to the area they provide data about.  In the majority of cases, OSM data is collected by people who can vouch for the accuracy of their data because they can see it out their window or because they walked by it on their way home from work.  When it comes to the OSM workforce, the person who mapped any given road has most probably walked down that road.

Because of the nature of the OSM workforce, I tend to trust the accuracy of OSM data more than most.  To my mind it’s just plain common sense.  And in my experience, OSM data is at least as good as any other source, usually better.  Here’s a comparison of road data from three sources:


You can see the obvious shortcomings of the TIGER data.  You will probably also note the similarity between the MassGIS data and the OSM data.  This is because MassGIS (bless their little hearts) handed a bunch of data to OSM many moons ago (I don’t know exactly when this occurred).  While this is a great thing for Massachusetts, not all of America was so lucky.  And in my experience, even here in Massachusetts OSM data tends to be more up to date than MassGIS’s (the primary reason for this, I think, is that MassGIS dedicates the lion’s share of their budget to flashy projects.  For instance, they just finished gathering new, state-wide aerial imagery – most at 30cm/pixel, some at 15cm.  While the imagery is very cool and very useful, OSM will probably get around to utilizing it before MassGIS does).

As luck would have it, you don’t have to take my word for this.  Bing maps just rolled out a new feature:  an OpenStreetMap layer.  I did a quick comparison:


This pretty much speaks for itself.  Not only is the OSM data more accurate (note the British Rail lines on the left, as well as the placement of the Oxford Canal), but OSM provides far more information than the Bing data (without overcrowding the map).  In pretty much all ways, it’s just plain better data.

And before anyone points to the fact that OSM started in Great Britain (so of course OSM data is better over there), here’s a section of Boston I visited just the other day:


Kudos to Microsoft for including the OSM layer.  By all means head on over to Bing maps and check it out.  It’s nice to see that they’ve finally figured out what many of us Map Dorks figured out long ago:

Always use the best data you can get your hands on.




