I’m glad to see the news about the release of objects from the collections of the Science Museum, the National Media Museum and the National Railway Museum has spread so far and wide already.
A few people have commented on the licence (Creative Commons Attribution-NonCommercial-ShareAlike, CC BY-NC-SA) and on the format (CSV). As tomorrow is my last day, I can’t really speak for the museum but the intention is to learn from how people use the data – the things they make, the barriers they face, etc – and iterate (as resources allow) until we get to an optimal solution (or solutions). So please get in touch if you’ve got requests or think you can help clear up some of the issues these kinds of projects face, because there’s a good chance you’ll help make a difference.
The licence is a pragmatic solution – it’s a clarification of existing terms rather than a change to our terms, because this avoided the need for legal advice, policy review, etc, that would have added several months to the process.
And yes, I know CSV is quick and dirty, but it’s effective. The museum sector is still working out how to match the resources available with the needs of mash-up type developers who work best with JSON and those who are aiming for linked open data; my hope is that your feedback on this will help museums figure out how to support people using open data in various forms. A simple solution like this also means it’s easy for the museum to re-run the export to update the data as time goes on, and that anyone, geek or not, can open the files without being startled by angle brackets and acronyms. Also, did I mention it was quick?
Finally, we’ve already had some useful feedback and even some improved files. Richard Light sent us a geocoded version of records from the National Railway Museum (NRM): an index of locations at http://api.sciencemuseum.org.uk/collections/updates_from_other_people/Richard_Light/nrm-geo-sort.xml (63 KB) and the full file at http://api.sciencemuseum.org.uk/collections/updates_from_other_people/Richard_Light/nrm-geo.xml (20 MB, browser beware).
I’ll let Richard explain in his own words:
I converted the source CSV to XML using my CSV Converter program, which is a home-made program I wrote to do a “mail-merge” on CSV data, with the aim of easily generating other formats such as XML.
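If you don’t have a converter like Richard’s to hand, the CSV-to-XML step can be approximated in a few lines of Python. This is only a rough sketch, not his program: it assumes the column headers are valid XML element names, and the element names `records` and `record` and the sample rows are made up for illustration (only `PLACE_MADE` comes from the real data).

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text, root_tag="records", row_tag="record"):
    """Turn each CSV row into one element with a child element per column."""
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.SubElement(root, row_tag)
        for column, value in row.items():
            # Assumption: every header is a valid XML name (no spaces etc).
            field = ET.SubElement(record, column)
            field.text = value
    return root


# Illustrative two-row sample; not taken from the real export.
sample = (
    "TITLE,PLACE_MADE\n"
    'Example locomotive,"Swindon, Wiltshire, England"\n'
    "Example poster,\n"
)
root = csv_to_xml(sample)
print(ET.tostring(root, encoding="unicode"))
```

The `csv` module takes care of quoted fields, which matters here because place descriptions contain commas.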
The geocoding was carried out by calls to my place URL-ifier program. This uses the standard Geonames query API, but splits a place description into its component place names (e.g. “Swindon, Wiltshire, England” becomes three place names) and searches for a “Swindon” contained within places “Wiltshire” and “England”.
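To make the splitting and “contained within” idea concrete, here is a hedged Python sketch. It is not Richard’s URL-ifier: the function names are invented, the choice of the `name_equals` search parameter is an assumption, and so is the idea of checking containment against the `adminName1`, `adminName2` and `countryName` fields of a Geonames `searchJSON` result (which of those fields come back depends on the `style` you request). The candidate records and their ids are made up.

```python
from urllib.parse import urlencode


def split_place(description):
    """Split 'Swindon, Wiltshire, England' into its component place names."""
    return [part.strip() for part in description.split(",") if part.strip()]


def build_search_url(place_name, username):
    """Build a Geonames searchJSON URL for the most specific place name."""
    params = {"name_equals": place_name, "maxRows": 10, "username": username}
    return "http://api.geonames.org/searchJSON?" + urlencode(params)


def pick_contained(candidates, containers):
    """Return the first candidate whose admin/country names include every container."""
    wanted = {name.lower() for name in containers}
    for candidate in candidates:
        # Assumption: containment is judged from these response fields only.
        seen = {
            str(candidate.get(key, "")).lower()
            for key in ("adminName1", "adminName2", "countryName")
        }
        if wanted <= seen:
            return candidate
    return None


name, *containers = split_place("Swindon, Wiltshire, England")

# Hypothetical candidates shaped like search results; ids and values are made up.
candidates = [
    {"geonameId": 1, "name": "Swindon", "adminName1": "Elsewhere",
     "countryName": "Nowhere"},
    {"geonameId": 2, "name": "Swindon", "adminName1": "England",
     "adminName2": "Wiltshire", "countryName": "United Kingdom"},
]
match = pick_contained(candidates, containers)
print(build_search_url(name, "YOUR_USERNAME"))
print(match["geonameId"])
```

No network call is made here; you would fetch the URL, decode the JSON and pass its list of results to `pick_contained`.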
I wrote an XSLT transform which copied the source document, and each time it found a place field, it called out to my URL-ifier using the document() function:
<xsl:template match="PLACE_MADE[text()!='']">
  <xsl:variable name="geonames"
      select="document(concat('http://light.demon.co.uk/scripts/getPlaceURL.exe?&amp;q=', text()))/*/text()"/>
  <xsl:copy>
    <xsl:if test="$geonames!=''">
      <xsl:attribute name="geonamesId"><xsl:value-of select="$geonames"/></xsl:attribute>
    </xsl:if>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>
Where this was successful in inferring a Geonames identifier, it added a “geonamesId” attribute to the PLACE_MADE field. So the result is a copy of the source data, with added geocoding.
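For anyone more at home in Python than XSLT, the same copy-and-annotate pass might look roughly like this. It is a sketch of the idea rather than a translation of Richard’s stylesheet: `add_geonames_ids` and the `lookup` callable are hypothetical names, and the lookup is stubbed with a dictionary (using a made-up identifier) so nothing here touches the network.

```python
import copy
import xml.etree.ElementTree as ET


def add_geonames_ids(root, lookup):
    """Return a copy of the tree with geonamesId set wherever lookup succeeds."""
    result = copy.deepcopy(root)
    for place in result.iter("PLACE_MADE"):
        text = (place.text or "").strip()
        if not text:
            continue  # skip empty place fields, as the template's match does
        geonames = lookup(text)
        if geonames:
            place.set("geonamesId", geonames)
    return result


source = ET.fromstring(
    "<records>"
    "<record><PLACE_MADE>Swindon, Wiltshire, England</PLACE_MADE></record>"
    "<record><PLACE_MADE>Somewhere unknown</PLACE_MADE></record>"
    "<record><PLACE_MADE></PLACE_MADE></record>"
    "</records>"
)

# Stub lookup with a made-up identifier; a real one would call a geocoding service.
known = {"Swindon, Wiltshire, England": "http://sws.geonames.org/0/"}
geocoded = add_geonames_ids(source, known.get)
print(ET.tostring(geocoded, encoding="unicode"))
```

Places the lookup can’t resolve are copied through untouched, so the output is always a superset of the input.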
All of the NRM data was geocoded in a single XSLT operation, but this operation had to call my URL-ifier, and hence the Geonames API, many times. There are limits on how hard you can hit this service, so care needs to be exercised! (You can get your own Geonames identifier for free, and then have your own allocation of API calls, if you want to use this service in a serious way.)
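One way to take that care is to cache answers and pause between calls, since collection data repeats the same places over and over. The wrapper below is a minimal sketch under assumptions of my own: `make_polite_lookup` is a hypothetical helper, and the one-second default interval is an arbitrary choice rather than a documented Geonames limit.

```python
import time


def make_polite_lookup(lookup, min_interval=1.0, sleep=time.sleep,
                       clock=time.monotonic):
    """Wrap a lookup so repeats come from a cache and real calls are spaced out."""
    cache = {}
    last_call = [None]  # time of the most recent real call, if any

    def polite(place):
        if place in cache:
            return cache[place]
        if last_call[0] is not None:
            wait = min_interval - (clock() - last_call[0])
            if wait > 0:
                sleep(wait)
        last_call[0] = clock()
        cache[place] = lookup(place)
        return cache[place]

    return polite


calls = []


def fake_lookup(place):
    """Stand-in for a real geocoding call; records what it was asked for."""
    calls.append(place)
    return "id-for-" + place


polite = make_polite_lookup(fake_lookup, min_interval=0.01)
for place in ["York", "Swindon", "York"]:
    polite(place)
print(len(calls))  # 2: the second "York" is served from the cache
```

Failed lookups are cached too, which stops the same unresolvable place being retried on every record.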
Now that the data contains Geonames URLs, you have access to all the background information about each place. All Geonames entries have lat/long co-ordinates (which is what you need to stick a pin on a map in your browser, using e.g. KML markup), but in addition will often have info such as population. You just need to make an HTTP request for the Geonames URL, specifying that you want RDF back, e.g.: http://light.demon.co.uk/scripts/cgiforwarder.exe?url=http://sws.geonames.org/2633352/&accept=rdf and process the RDF/XML which comes back.
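Processing the RDF/XML that comes back needn’t be daunting. The sketch below pulls out the name, co-ordinates and population and turns them into a KML placemark. The namespaces are the ones the Geonames ontology and W3C WGS84 vocabulary use, but the function names are invented, the sample document is made up (identifier and values included), and the `fetch_rdf` helper assumes the server honours an `Accept: application/rdf+xml` header, which I haven’t verified.

```python
import urllib.request
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "gn": "http://www.geonames.org/ontology#",
    "wgs84_pos": "http://www.w3.org/2003/01/geo/wgs84_pos#",
}


def fetch_rdf(geonames_url):
    """Request a Geonames URL asking for RDF (assumed content negotiation)."""
    request = urllib.request.Request(
        geonames_url, headers={"Accept": "application/rdf+xml"}
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def parse_place(rdf_xml):
    """Extract name, latitude, longitude and population from Geonames RDF/XML."""
    feature = ET.fromstring(rdf_xml).find("gn:Feature", NS)
    population = feature.findtext("gn:population", default=None, namespaces=NS)
    return {
        "name": feature.findtext("gn:name", namespaces=NS),
        "lat": float(feature.findtext("wgs84_pos:lat", namespaces=NS)),
        "lon": float(feature.findtext("wgs84_pos:long", namespaces=NS)),
        "population": int(population) if population else None,
    }


def to_kml_placemark(place):
    """Render a place as a KML Placemark; KML puts longitude before latitude."""
    return (
        "<Placemark><name>{}</name>"
        "<Point><coordinates>{},{}</coordinates></Point></Placemark>"
    ).format(escape(place["name"]), place["lon"], place["lat"])


# Made-up sample in the shape of a Geonames RDF response.
SAMPLE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:gn="http://www.geonames.org/ontology#"
    xmlns:wgs84_pos="http://www.w3.org/2003/01/geo/wgs84_pos#">
  <gn:Feature rdf:about="http://sws.geonames.org/0/">
    <gn:name>Exampleton</gn:name>
    <gn:population>1000</gn:population>
    <wgs84_pos:lat>51.5</wgs84_pos:lat>
    <wgs84_pos:long>-1.75</wgs84_pos:long>
  </gn:Feature>
</rdf:RDF>"""

place = parse_place(SAMPLE_RDF)
kml = to_kml_placemark(place)
print(kml)
```

`fetch_rdf` is deliberately not called in the example, so it runs offline; swap `SAMPLE_RDF` for `fetch_rdf(url)` when you want live data.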
Personally, this kind of thing makes it all worthwhile – we can’t easily export our entire geographical hierarchy, so being able to geocode the imperfect data we have is really useful.
If you’ve done something interesting with our data we’d love to feature it. We’re also curious to know who’s having a look at it, even if you’re not at the point of having something to share.
Finally, I’d almost forgotten to thank the many wonderful people who’d contributed to the Museums and the machine-processable web site or come along to #linkingmuseums meetups to work out how to get to re-usable museum data. I’ll be keeping up the wiki in future, and can be contacted @mia_out.
Update on collections data and geocoded NRM data