Years back (while I was still in college) I did a few projects that entailed a lot of time studying the small town of Deerfield, MA (often referred to as the most over-studied small town in America). During this process, my advisor (Bob) introduced me to what was known as The Deerfield Lunch. The Deerfield Lunch was an irregular brown-bag affair, the attendees of which tended to fluctuate for a variety of reasons, the most commonplace being that everyone was quite busy. You see, the people who sat down to these lunches were all scholars of varying stripe, many of them leaders in their field.
I managed to attend a couple of Deerfield Lunches (maybe a few – it was a long time ago). They were fun and extremely informative. The first one I attended made the largest impression on me, mainly because it was there that I met Abbott Lowell Cummings. Those of you who don’t happen to be students of New England architectural history may not have heard of him, but he’s a pretty big dog in the world of Domestic Architecture, or – more accurately – Vernacular Architecture. He’s also an absolute sweetheart.
Vernacular Architecture is one of those terms that has been defined extensively, but unfortunately in varying ways. The basic gist tends to hold more or less the same, but the details get muddy at times. In a nutshell, Vernacular Architecture describes structures that were constructed by someone who had not been formally educated in the discipline of architecture. This does not mean they were unskilled or unprofessional. Neither does it mean that they were inexperienced or incompetent. All it means is that a builder of Vernacular structures has not been formally educated as an architect. And it most certainly does not mean that Vernacular structures are in any way sub-standard.
To drive the point home, it has been estimated that as much as 90% of the extant structures on the face of the Earth are, in fact, Vernacular Architecture.
Anyway, a slew of conversations I’ve been having lately have brought Abbott to mind. He’s a big fan of maps in general and GIS in particular, and I think he’d be interested in the ongoing dialogue about data sources.
You know the one I’m talking about. It usually starts when someone says ‘crowd-sourced’ or ‘authoritative’ and then it tends to degenerate from there. Pretty soon, everyone’s arguing about data quality and reliability as if they have a direct correlation to the data’s source (they don’t, in case you were wondering).
As far as I can tell, the problems all stem from inadequate and/or ill-defined terminology. There seems to be a certain amount of determination to doggedly stick to terms that are not appropriate to the task at hand. While I love the term ‘crowd-sourced’, it has some connotations attached to it that are rather counter-productive. It’s the crowd, after all. Barely a step away from the mob. And we all know about the mob – they’re unskilled and unruly. They’re the Great Unwashed. They stormed the Bastille, for crying out loud!
All personal interpretations aside, though, ‘crowd-sourced’ doesn’t actually describe what we’re talking about here. At least, it doesn’t according to the guy who coined the term. In fact, ‘crowd-sourced’ is far closer to ‘outsourced’ than it is to ‘volunteered’. When we ‘crowd-source’ a job, we start by looking for volunteers. Failing that, we hire freelancers. The waters only get cloudier (no pun intended) when we start adding cost to the definition.
We’re not just talking about free v. not-free data here (and by ‘free’, I’m talking about monetary cost. Any other form of ‘free’ falls under licensing, which is far too large a discussion to delve into here). If we were, we could just draw that line. Neither are we talking about good quality v. poor quality data. Again, we could simply draw that line and be done with it. The problem we’re having is that free and not-free data, as well as good quality and poor quality data, turn up on every side of any line we draw by source.
The point I’m trying to make is that if we must draw lines (and it appears that there’s no avoiding it), we should do so using terms that do not imply judgment calls in regard to cost and/or quality.
And since everyone seems to be trying to draw lines according to data source, why don’t we go ahead and do just that? A reasonable attempt was made with Volunteered Geographical Information (VGI), except that ‘volunteered’ describes the method by which the data is delivered, not the source from which it comes. I also think we should avoid narrowing our definitions to geographical data. Data comes from a variety of sources – applying geography to it is kind of our job, isn’t it? Besides – I think we Map Dorks tend to put too much emphasis on the ‘G’ in GIS. GIS should be information that is geographically informed, not information that’s geographically driven.
A far worse attempt was to label some data as ‘authoritative’ (as opposed to ‘crowd-sourced’). I hope I don’t have to spell out everything that’s wrong with this one. ‘Authoritarian’ would probably hit closer to the intent behind this choice. ‘Official’ would be a better term to describe the source, but it still implies data that is more accurate, correct, or just plain better.
Why don’t we stop worrying about cost (it seems obvious enough) and quality (a judgment call best left to the individual user) and focus our attention on data source? Where does our data come from (aside from the data we gather ourselves, which need not enter into this conversation)? While this division usually ends in two categories, I would argue that three is a more appropriate number (and I do not mean to imply that three categories can adequately include all available sources of data. They can cover a very large percentage of them, though). Here, presented in an order that is not intended to imply or connote a bloody thing, are the three categories I would separate our data into, including the terms I use to describe them along with a brief definition and examples.
1) Governmental
This is probably the largest of the three categories. This describes data that is produced directly by a governmental body, be it national, regional or local. This data is often free (unless you count the fact that we pay for it through taxes), but not always. This data is often described as ‘official’ or ‘authoritative’. While these descriptions are technically correct, they should not be taken to mean that governmental data is necessarily more accurate or ‘correct’ than other sources of data (see the previous post for a brief discussion of this). The USGS and the oft-mentioned MassGIS are good examples of Governmental data sources.
2) Commercial
This is data that springs from professional sources. I use the term ‘commercial’ instead of ‘professional’ because this is not the only category to include data created by professionals. In fact, professionals contribute enormous amounts of data in all three categories. What separates this category from the others is that it exists in the private sector and it consists of data that was gathered, created and/or derived for money. Of course, after the fact much of the data is made freely available for a variety of reasons, not least of which is that the entity paying for the contract is often government (or a corporation so large it might as well be a governmental body. You know – like Google and Microsoft). Companies like GeoEye, DigitalGlobe and Navteq are better-known examples of Commercial data sources, but there are many, many more out there.
3) Vernacular
Vernacular data is data that is provided voluntarily, mostly by private entities, for public consumption. This is the kind of data that’s provided by people on the ground. And while most of it is freely accessible to all, a certain amount of knowledge and skill is needed before actually contributing (my mother, for instance, wouldn’t know where to begin). What separates this kind of data from the others is its self-correcting nature. Vernacular data tends to be openly editable, which means that anyone who notices errors can correct them. When this is not the case, the public at large is usually given access to the machinery needed to report errors. For those of you who don’t know me and have never before read this blog, this is my personal favorite source of data, for a variety of reasons (it’s also the one I often refer to as ‘Dork-sourced’). I choose the word ‘vernacular’ for much the same reason it was chosen to describe structures that have been standing for centuries – because experience and dedication and commitment are often stronger than formal indoctrination. Examples of these data sources are OpenStreetMap (including the numerous spin-offs that expand upon OSM’s data) and Google Maps (the My Maps aspect, as well as the API). Other examples (sometimes referred to as ‘passive’ data) include data like geo-tagged photos at Flickr or Panoramio.
The most important thing to remember about all three of these categories is that none of them possess any kind of exclusive claim to accuracy and/or quality. I’ve chosen the locations of these lines carefully, and I feel they’re safely drawn. It is important to remember that the lines refer only to source.
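For the Map Dorks who would rather see this as a data model, here is a minimal sketch in Python of one way the three categories could be recorded. Everything in it is my own assumption for illustration: the names `Source`, `Dataset` and `best_for_task` are hypothetical, and the `fitness` numbers are made-up placeholders, not measurements of any real provider. The only point is that source gets its own field, while cost and quality live somewhere else entirely.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class Source(Enum):
    """Where a dataset comes from. Says nothing about cost or quality."""
    GOVERNMENTAL = "governmental"
    COMMERCIAL = "commercial"
    VERNACULAR = "vernacular"


@dataclass(frozen=True)
class Dataset:
    name: str
    source: Source
    # Monetary cost only; None means "not recorded".
    free_of_charge: Optional[bool] = None
    # The individual user's own 0-1 judgment for one specific task.
    fitness: Optional[float] = None


def best_for_task(candidates: Iterable[Dataset]) -> Optional[Dataset]:
    """Return the candidate with the highest user-assigned fitness.

    Source is deliberately ignored: it describes origin, not quality.
    """
    rated = [d for d in candidates if d.fitness is not None]
    return max(rated, key=lambda d: d.fitness) if rated else None


# Placeholder fitness scores for one imaginary project (not real ratings).
candidates = [
    Dataset("USGS", Source.GOVERNMENTAL, fitness=0.6),
    Dataset("GeoEye", Source.COMMERCIAL, fitness=0.7),
    Dataset("OpenStreetMap", Source.VERNACULAR, fitness=0.8),
]

choice = best_for_task(candidates)
print(choice.name, choice.source.value)
```

Change the placeholder scores for a different project and a different category wins, which is rather the point: the `source` field never enters into the selection.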
In practice, I find that I tend to dance between all three categories, depending on the project at hand and the data I can get my hands on for said project. I have my personal preferences, and they usually dictate where I start a search for data, but at the end of the day I select my sources based solely upon who offers the best available data for the task I have before me. I’m sure I’m not alone in this.