During the FME 2011 World Tour I had the opportunity to share my excitement and love of XML and GML. One of the things that I discovered on the road is that many users I met with absolutely dread having to work with XML and GML. I was only too happy to share the good news that you do not need to fear XML or GML.

[A note about GML: I won’t mention GML again in this post (for brevity), as GML is XML with predefined primitives that help represent spatial data in standard ways. So if your passion is GML, simply read GML everywhere I say XML. The story remains the same.]

Reasons for the Fear of XML
One of the reasons for this fear is that, until recently, working with XML required users to learn technology that is specific to XML. The tools are typically XML-specific languages such as XQuery or XSLT (often in open source implementations), and/or XML tools from other vendors. Learning new technology is an investment of time and/or money, which makes adoption expensive.

If I had to learn XQuery or XSLT in order to work with XML, I would not be excited about XML, as I don’t have the time to learn and master these tools. What I am excited about is giving people tools to work with XML without them having to learn XML-specific tools, or be fluent in XML.

The Spatial Alternatives to XML
If XML is so scary then what are the alternatives to sharing data with XML in geospatial?

Is it better to share data using vendor or application-specific binary formats? Many of these “formats” are not data formats but application-specific formats that were designed to support a particular application. They are often undocumented and thus poor candidates for data storage.

Is it better to share data using formats like Excel, CSV, or Esri Shapefile? The biggest thing going for these formats is their simplicity. While there’s a lot to be said for keeping things simple, these formats were either not designed with spatial data in mind or (in the case of Shapefile) are showing their age. These formats also do not lend themselves to the storage of other important information such as relationships between objects/entities, and have no support for metadata.

Why XML Trumps the Alternatives
On the other hand, XML is an expressive, open-standards-based approach to sharing all kinds of data. As the name suggests, XML (eXtensible Markup Language) is not a format but rather a language for defining formats (i.e. how data is to be shared). This expressiveness enables XML to store all types of data, spanning many different communities and industries. It is capable of representing both the very simple and the very complex.
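To make that expressiveness a little more concrete, here is a minimal sketch in Python using only the standard library. The document and all of its element and attribute names (`dataset`, `metadata`, `river`, `bridge`, `crosses`, `ref`) are invented for this illustration and do not follow GML or any published schema. It only aims to show that one small XML document can carry both metadata and a relationship between two features, which is awkward to express in a flat CSV row.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: every element and attribute name here is made up
# for illustration and is not taken from GML or any real schema.
DOC = """
<dataset>
  <metadata>
    <source>Example Agency</source>
    <crs>EPSG:4326</crs>
  </metadata>
  <river id="w1">
    <name>Fraser River</name>
  </river>
  <bridge id="b1">
    <name>Example Bridge</name>
    <crosses ref="w1"/>
  </bridge>
</dataset>
"""


def summarize(xml_text):
    """Return the dataset metadata and the (bridge, river) relationships."""
    root = ET.fromstring(xml_text)

    # Metadata lives alongside the data, in the same document.
    metadata = {child.tag: child.text for child in root.find("metadata")}

    # Index rivers by id so that references can be resolved to names.
    rivers = {r.get("id"): r.findtext("name") for r in root.iter("river")}

    # Each <crosses ref="..."/> links a bridge to a river by id.
    relationships = []
    for bridge in root.iter("bridge"):
        for link in bridge.iter("crosses"):
            relationships.append((bridge.findtext("name"), rivers[link.get("ref")]))
    return metadata, relationships


metadata, relationships = summarize(DOC)
print(metadata)
print(relationships)
```

Note that nothing in this sketch requires XQuery or XSLT; how a real dataset should be read depends entirely on the data model its publisher chose.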

I am excited about XML as it is capable of representing all types of data. Solving the XML challenge thus opens up the whole world of data.

XML isn’t Hard. Data Models are Hard.
Is XML itself hard? Or, is what’s hard trying to understand some of the data models that are created using the freedom of XML? I would argue that it is more the data models that are hard. Regardless of your view on this situation there do exist many XML data models that are very complex indeed – some needlessly so.

Einstein reminds you to construct simple XML models. (Sure wish I had as much hair as him.)

It goes without saying that simple XML datasets are easier to work with, and new tools are making it easier and easier to work with complex XML models. If you are defining XML-based “formats”, you should heed the words of Albert Einstein: “Everything should be made as simple as possible, but not simpler.”

Remember, the more complex the data model is, the more effort is required to use it, and therefore the less data exchange will happen. Given data in a needlessly complex model and the same data in a simple model – the simple model has more value!

I am excited about the new solutions under development that will make working with XML even easier. The future of XML is bright with more and more data of all types being shared in XML.

As XML grows in importance (more about that next week) and tools are released which remove the need for users to learn XQuery, XSLT, etc – there’s no longer any reason to fear XML.

Next week, I’ll talk about the future of XML and invite you to participate in the XML Challenge. For now, watch a video of the XML presentation I gave in March 2011. Or, tell me your opinion about XML and GML. What do you love or hate about XML? Is XML hard to work with?

Don Murray

Don is the co-founder and President of Safe Software. Safe Software got its start doing work for the BC Government on a project sharing spatial data with the forestry industry. During that project Don and fellow co-founder Dale Lutz realized the need for a data integration platform like FME. When Don’s not raving about how much he loves XML, you can find him working with the team at Safe to take the FME product to the next level. You will also find him on the road talking with customers and partners to learn more about what new FME features they’d like to see.

Comments

15 Responses to “Defeating XML & GML (Part 1/2: Conquer the Fear)”

  1. Every time the lights dim when I start loading GML into my desktop apps, it makes me wonder if I’m doing something wrong 🙂

    After the parsing is done and the data loaded, then I’m really happy for so many reasons – including the ability to have verbose descriptions of the data therein. I’m not sure if better tools are needed, or if developers need a different approach or if we should just be converting our data (even temporarily) into some other form that seems to require less overhead.

    Looking forward to your thoughts on these sorts of pragmatic challenges and all the secrets for how you deal with it in FME most efficiently. 🙂

  2. Don Murray says:

    Tyler,

    Thanks for reading and for the response. I thought only my lights dimmed as I was grinding thru some massive XML file! 🙂

    XML is many things, but as you suggest it does not have high information density, and it is expensive to parse! The flexibility of XML gives those designing XML schema a lot of rope and it is clear that in many instances little or no regard is given for the software developer that has to work with the data. Those sharing data need to really work to make their files simpler and need to think more about the development community who is trying to work with the data.

    As far as converting XML to some other form that requires less overhead, I fully agree. In my mind XML is a good data transport format for data publishers to deliver their datasets or other information. For data delivery, the XML/GML documents are often delivered in a compressed format/stream for more effective transport. When we receive XML, or work with clients who use XML, the data is always loaded or transformed into some other system for operational purposes.

    What I find exciting is that in the past (as short as 2 years ago) when I was working with XML I spent much of the time worrying about how I was going to pull the information I needed out of XML and get it into a structure suitable for the target system. Now I don’t worry about that at all. I used to dread getting XML files, now I get excited! Seriously.

    Our work for sure is not complete and we continue to work on XML. What sort of XML documents are you working on? I would definitely be interested in following up with you if you are interested.

  3. mhabarta says:

    Hello Don

    Congratulations on your XML excitement. I’m still waiting for the moment that I get pulled over to the XML/GML fan group.

    So far, missing tools and insufficient capabilities in existing tools are still a prevalent part of my everyday work.

    So I took some comfort from reading your article, and I’m looking forward to learning about part 2.

    I fully agree with the finding that complex modeling is more of a challenge than the usage of XML/GML.

    Too many people try to prove what is ultimately possible instead of following Einstein’s advice. The challenge is to find a robust compromise between the scientists at universities who want to impress with intellect and the workers in IT business who need to get things done.

  4. Thanks for the response, Don. One dataset I’ve picked away at over the past couple of years is the Geobase.ca National Hydro Network: http://geobase.ca/geobase/en/data/nhn/index.html It’s cool data and more verbose than I had the right to expect – so big kudos to Geobase for publishing in an open format! Quantum GIS sucks it in great and always successfully churns its way through parsing, but my wife just can’t use the microwave at the same time!

  5. Don Murray says:

    Michael,

    I will do my best to pull you into the XML/GML fan club when we meet again. Your comment on the complex model is key. Regardless of the model that data is shared in, the more complicated the model the more effort it is going to be to:

    a) Understand that model. There are no silver bullets here
    b) Map that complex model into a model that an existing system can use.

    I think that a better challenge for all model developers is to show how they can use their intellect to design the very simplest of models that gets things done. That is much harder than creating a complex monster. Nice clean design wins out over complexity every time. Here is no different. Given two datasets with the same data in them – one very complex and the other simple – the simple one is going to be more successful.

    What I am excited about is that now I get to look forward to getting my hands on GML/XML data and the discussion is now “What information do I need/want from it?” rather than “How the heck am I going to get data out of it?” For me this is exciting.

  6. Don Murray says:

    Tyler,

    Thanks for the link to the data. I am going to check it out and explore it. Thanks also for the warning, I’ll make sure that other appliances are not running.

  7. Don Murray says:

    Tyler,

    I checked it out with FME and our GML reader sucks it straight in with no problem too. Geobase did a good job with their data, keeping the model simple and making it much easier for folks to be able to use the good data that they are providing.

    Don

  8. […] week, in the first part of Defeating XML & GML, I talked about the fear that many have when it comes to XML and GML. Now, I will explain why the […]

  9. […] But somehow in all the excitement over the years about Raster, 3D, BIM, and yes @DonAtSafe, also XML – it seems as if CAD and GIS kind of got a bit lost in the […]

  10. […] you all know XML is one of my passions and I am always looking for others who are excited about XML. Having said that I am still afraid to […]

  11. I am also very enthusiastic about XML technologies (and also, with lesser intensity, about GML). In my opinion XML is very clear and well structured. I teach XML in several study programs and would not say that it is difficult to teach or to understand, and of course it is essential for a lot of technologies, not only in Geo IT. I would join the “party for those who love XML”.

  12. Don Murray says:

    Dr. Behr,

    Good to meet you as another who is excited about XML. It would be great to touch base and learn about the XML studies that you teach and get your perspective on where further research is needed to make working with XML even easier. You also mentioned that XML is not restricted to the spatial domain. While my love for XML has grown through attacking spatial challenges, I am finding that the XML work we have done is opening FME up more and more to general IT XML challenges. Indeed we are finding that many IT tools focus on providing developers with XML tools rather than focusing on developing XML tools for the non-development crowd.

  13. Ben says:

    What about JSON and GeoJSON? Wouldn’t they be just a little smaller, simpler, more flexible, and easier to edit than XML?

  14. Don Murray says:

    Ben,

    Great comment on JSON and GeoJSON. JSON is definitely smaller and in many cases easier to work with. We need only compare the effort we at Safe have spent adding XML support with the time we have spent adding JSON support: the JSON effort is orders of magnitude smaller. While I go on a lot about how we make XML easy, we have also worked to make JSON and GeoJSON easier in the upcoming FME 2012.

    Currently I see JSON being used more for small messages between systems rather than as a data interchange approach for larger datasets.

    In line with what you’ve said, you will be pleased to know that for our upcoming event notification capability in FME Server 2012, we accept both XML and JSON for the user defined data package, with JSON being the default.

  15. […] If you are working with NIEM data or want to work with NIEM data let me know your challenges and comment below. After all NIEM data is XML and you know how I feel about XML. […]
