tianjara.net | Andrew Harvey's Blog

Using XSLT to Transform XML data into OSM format
14th February 2011

For a while I thought that all there was to XML was <blah attribute="value">inner</blah>, but of course there is much more. I'm now digging into the real stuff like XPath, XSLT and XML Schemas.

I've come across a data set of bus stops (as well as live info on where buses are, and their status). The bus stop data set was published at http://nswbusdata.info/ptipslivedata/getptipslivedata?filename=stopdescriptions.zip; that URL is no longer active, so I'm hosting my original copies at http://tianjara.net/data/nsw-buses/ for preservation. It is in an XML format,

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<StopDescriptionList license="http://creativecommons.org/licenses/by-nc-nd/3.0/au/" copyright="NSW Roads and Traffic Authority">
    <stop longitude="151.17832" latitude="-33.81852" tsndescription="Osborne Rd nr Ronald Av" TSN="206699"/>
    <stop longitude="151.17359" latitude="-33.8082" tsndescription="Ralston St nr Murray St" TSN="2066138"/>
    <stop longitude="151.17764" latitude="-33.82054" tsndescription="Second Av nr Osborne Rd" TSN="206698"/>
    <stop longitude="151.17629" latitude="-33.81926" tsndescription="Fourth Av nr Second Av" TSN="206697"/>
  ...

Although the license means I cannot use this data in OpenStreetMap, I was still interested in converting it into a .osm file. The perfect job for XSLT!

It turned out to be quite a simple task with a neat solution. Here is the XSLT stylesheet I used to do the translation:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" indent="yes"/> 

    <xsl:template match="/StopDescriptionList">
        <osm version='0.6' generator='XSLT'>
            <xsl:apply-templates select="stop"/>
        </osm>
    </xsl:template>

    <xsl:template match="stop">
        <xsl:variable name="count">
            <xsl:number/>
        </xsl:variable>

        <node id='-{$count}' lat="{@latitude}" lon="{@longitude}">
            <tag k='ref:tsn' v='{@TSN}' />
            <tag k='fixme' v='{@tsndescription}' />
        </node>
    </xsl:template>

</xsl:stylesheet>
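
For anyone without an XSLT processor handy, the same mapping can be sketched in Python with only the standard library. This is a rough equivalent, not a replacement for the stylesheet: each stop becomes a node with a negative id counted from 1 (mirroring what xsl:number gives for sibling stop elements), the latitude/longitude attributes become lat/lon, and the TSN and description become tags. The function name stops_to_osm, the SAMPLE constant and the generator value are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Two stops copied from the data set excerpt above (XML declaration omitted).
SAMPLE = """<StopDescriptionList license="http://creativecommons.org/licenses/by-nc-nd/3.0/au/" copyright="NSW Roads and Traffic Authority">
    <stop longitude="151.17832" latitude="-33.81852" tsndescription="Osborne Rd nr Ronald Av" TSN="206699"/>
    <stop longitude="151.17359" latitude="-33.8082" tsndescription="Ralston St nr Murray St" TSN="2066138"/>
</StopDescriptionList>"""


def stops_to_osm(root):
    """Build an <osm> element from a <StopDescriptionList> element.

    Hypothetical helper: mirrors the two templates in the stylesheet.
    """
    # "python-sketch" is an illustrative generator value, not from the post.
    osm = ET.Element("osm", {"version": "0.6", "generator": "python-sketch"})
    # Count stops from 1, like <xsl:number/> does for sibling <stop> elements.
    for count, stop in enumerate(root.findall("stop"), start=1):
        node = ET.SubElement(osm, "node", {
            "id": f"-{count}",
            "lat": stop.get("latitude"),
            "lon": stop.get("longitude"),
        })
        ET.SubElement(node, "tag", {"k": "ref:tsn", "v": stop.get("TSN")})
        ET.SubElement(node, "tag", {"k": "fixme", "v": stop.get("tsndescription")})
    return osm


if __name__ == "__main__":
    osm = stops_to_osm(ET.fromstring(SAMPLE))
    print(ET.tostring(osm, encoding="unicode"))
```

The sketch assumes every stop carries all four attributes, as in the excerpt; a stop with a missing attribute would need handling before serialising.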

Then it was just a simple,

xsltproc -o busses.osm busses-stylesheet.xslt stopdescription.xml
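
Before opening the result in an editor, a quick sanity check can catch obvious problems. The following is a minimal sketch in Python, assuming the output is a flat list of node elements as produced by the stylesheet; the function name check_osm and the particular checks (unique ids, numeric coordinates within range) are my own choices for illustration, not part of any OSM tool.

```python
import xml.etree.ElementTree as ET


def check_osm(root):
    """Return a list of problems found in an <osm> element (hypothetical helper)."""
    problems = []
    seen = set()
    for node in root.iter("node"):
        node_id = node.get("id")
        # Every node needs its own id; the stylesheet numbers them -1, -2, ...
        if node_id in seen:
            problems.append(f"duplicate id {node_id}")
        seen.add(node_id)
        try:
            lat = float(node.get("lat"))
            lon = float(node.get("lon"))
        except (TypeError, ValueError):
            problems.append(f"node {node_id}: missing or non-numeric lat/lon")
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            problems.append(f"node {node_id}: coordinates out of range")
    return problems


# Usage against the generated file (file name taken from the command above):
# problems = check_osm(ET.parse("busses.osm").getroot())
```

An empty list means none of these particular checks failed; it says nothing about whether the stops are in the right place.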

The data is licensed CC BY-NC-ND 3.0, but they sneak in some additional terms in the fine print. On top of the NC and ND clauses, these terms lead to further incompatibilities with the OSM license and, under my definition of free data, make this data set non-free. For interest, the first three additional terms are,

  1. You must not use the Data in any way that could create false or misleading outcomes or interpretations, or bring the RTA into ridicule or disrepute. You must not use the Data in conjunction with the promotion of alcohol or unsafe road practices.
  2. You must ensure that the Data used is current, and provide details as to the date and time of sourcing the Data from the RTA in all reproductions of the Data (including in any software applications incorporating the Data).
  3. In all reproductions of the Data (including in any software applications incorporating the Data), the following disclaimer must be provided: “The accuracy or suitability of the Data is not verified and it is provided on an “as is” basis.”
Tags: dev, osm.