FME 2014 Sneak Peek: More Ado About Nothing
In a previous Evangelist post (from 2012) I talked about nulls in data and how FME handled them. In 2014 we have exciting new functionality that deals with null values in a proper way.
Because we’ve changed how FME behaves, if you think your source data may contain nulls then it’s important you read this article.
You were nothing to me once, and I was contented; you are now nothing to me again. But how different the second nothing is from the first! – Thomas Hardy in Far From the Madding Crowd critiques our new functionality!
Nulls in General
First let’s recap what we mean by null. In general there are three types of ‘nothing’ values:
- The attribute exists and is an empty value (Empty)
- The attribute exists and is NULL (NULL)
- The attribute doesn’t exist (Non-existent)
I say ‘three’ types, but really there are many more. A numeric attribute with a value of zero could be taken to mean nothing. Also there are NaN (Not-A-Number) values, nils in XML, Nodata values in raster, and many others (did you know Excel has its own special version of ‘nothing’?) – in fact I’m told by our developers that they found fifteen (15) types of nothing; but here we’re looking at the three primary types of nothing and why null is special.
Yes, fifteen types of nothing! When you get all of them in a single dataset it causes a vortex in space-time! True.
But for most people nulls will occur in a database. A true null means a field has been deliberately set to ‘nothing’. It’s not the same as an empty value at all. The closest non-IT analogy I can think of is when you fill in your name on a questionnaire. You could fill in your middle name (in which case it has a value) or you could write “n/a” or cross the field out. That would be equivalent to an empty field; it’s a positive value and we know you have no middle name. However, if you left that field blank, that would be a null; we can’t tell whether or not you have a middle name. It’s a value that signifies “unknown”, sort of a placeholder for real data.
Nulls and FME
Up until now FME did not have explicit support for nulls; we supported empty values and non-existent attributes (plus other forms of nothing, like NaN) but null wasn’t a concept that our internal data model understood. If an attribute was null then it was treated as if it didn’t exist. However, in 2014 we’ve completely overhauled that data model to incorporate null value support. So FME can now read null values, write null values, and carry out transformations on them too.
In terms of databases, it means you can now much more easily identify values that are null and write nulls to new rows, plus change existing rows to be null (which you couldn’t do before).
But, like people, formats come in all shapes and sizes and no two are the same. Interestingly, some formats will support all three of our primary nothing types; some will support just one or two. For example, Shape supports empty values, but not nulls, whereas GeoMedia supports nulls, but doesn’t recognize empty values. Databases generally store both null and empty values, and let you query them too. JSON nicely supports the concept of all three, whereas VPF has at least five types of nothing, maybe six depending on the field type (getting close to a space-time ripple there)! So even with null support, the behaviour of FME’s Readers and Writers will vary according to format, and you have to be aware of that.
Reading and Writing Nulls
If your source data includes null values the FME Reader will now emit null values. You don’t need to update or replace the Reader in your workspace, it will happen automatically. When you write the data out then the Writer will write null values too. If the format you are writing doesn’t support nulls, then the data will be handled appropriately – replaced with zero, or an empty value, or whatever that format requires.
So a translation involving nulls should just work. You don’t need to do anything special to your workspace, even if it’s an older one.
Which formats does this apply to? I’ve created a list at the bottom of the article for you to check, and you can find that list on FMEpedia too. We cover all the major databases, but if there’s a format you think we’ve missed then please do let us know.
So now nulls have been incorporated into the FME Data model, we need a way to inspect the data, and a way to distinguish true nulls from other forms of “nothing”.
When you use the FME Data Inspector to examine your data before translation (you are inspecting your data before translation, aren’t you?) you’ll see null values displayed in the table view:
…and in the Feature Information window:
When an attribute exists, but is empty, the Table View window will show it as an empty field (i.e. the cell in the table is blank) but when the attribute is missing completely (i.e. just doesn’t exist for that feature) then it flags it as such:
Here TaxCoord is part of the source schema, but these features don’t have it at all. So now you can totally identify which attributes are null, which are empty, and which are missing.
Nulls and Transformers
OK, so if we’re reading this data, and have an internal representation, then we also need a way for you to handle it in Workbench. Therefore we’ve updated several transformers to specifically handle nulls, updated many others to simply cope with null data, and added a new transformer to carry out new null functionality.
The list of transformers with null support is, like the formats, listed as an appendix below.
To give you some examples, though, let’s first look at the AttributeCreator.
Notice how you can set (or create) a null attribute using the new entry on the drop-down list, or by typing “<null>” into that field. You’ll find you can also do this with other attribute-setting transformers, such as the AttributeValueMapper:
The Tester transformer doesn’t look much different to before, except for rewording the operators to be more precise, but now when you choose “Attribute is Null” it really is checking for a true null value:
Other examples would be the ChangeDetector and the Matcher. The ChangeDetector and Matcher will now truly compare nulls so that “null” is not the same as “missing” as it was before. The change in the parameters here is just the option to turn this behaviour on or off – that way your workspace can be made to run the way it always has (and in fact – for backwards compatibility reasons – it will default to doing so):
Like I mentioned, other transformers have been updated to better cope with nulls, even if their parameters and outward appearance haven’t changed. For example, the StringPadder won’t attempt to pad a null string, because that would change it to no longer being null. It will still let you pad empty strings, because that won’t affect the attribute’s status.
And then we’ve added a new transformer to handle nulls, called the NullAttributeMapper.
The NullAttributeMapper is a replacement for the NullAttributeReplacer transformer. Basically the function here is to map attribute values to or from null, depending on the original attribute value:
For example here, if my ParkId attribute is missing, or is empty, or has a value of 9999, then I will replace it with null. I’d probably do this when I’m reading from a format that doesn’t support nulls, and writing to one that does. That way I get true null values in my output dataset. Similarly I could map null values to an empty field or a specific value:
I probably wouldn’t need to map to an empty field just for a Writer, because if the format doesn’t support nulls, FME will automatically convert null values to something appropriate for the format. But I might do that if I deliberately wanted to give them a new value or make them empty to match other data.
The other great things about this transformer are that it maps multiple attributes, and that it’s not just limited to nulls; so you could even use it instead of the AttributeValueMapper for some scenarios.
Benefits and Consequences
I think the benefits of this are fairly obvious, so I won’t say anything except that to mention we’ve also updated our APIs to include this functionality too, to ensure developers get the benefits as much as regular users.
However, you do need to be prepared for the effect of reading nulls as “real” attribute values, rather than as a missing attribute.
Many transformers will ignore missing attributes, and you may have come to rely upon that behaviour. But that data might be read as a null now, and that’s going to be treated differently. For example, the ListElementCounter won’t include attributes that are missing. It will, however, include nulls – because these are real values. The two transformers that I think need most careful scrutiny are the Tester and the FeatureMerger.
With the Tester you might check for an attribute that is empty (attr=””) or that “doesn’t exist”. In FME 2014 that attribute might now be read as null instead; in that case the test would fail, where it passed before. Basically, the functionality is better in 2014, it’s just that you’ve come to rely on the older (less good) behaviour. In this scenario you could change the test to be “Attribute is Null” instead. Otherwise, the new NullAttributeMapper transformer can be inserted to change nulls to missing and so revert the behaviour.
For the FeatureMerger, think of what happens when you merge attributes from one feature to another, particularly when a “requestor” attribute has the same name as a “supplier” attribute. The requestor attribute will be overwritten, except for where the supplier attribute is missing. But now, if your missing records are represented as nulls, that requestor would get overwritten as a null. Again it’s because null is a “real value”, and again the NullAttributeMapper can be used to workaround this.
Yes, cases like this are going to be fairly rare and obscure, but it’s worth being aware of them. We’d like you to embrace the new behaviour, but if you want to revert then use the NullAttributeMapper. Basically if you put a NullAttributeMapper into your workspace, directly after your Readers, and map “Null” to “Missing” then the behaviour in the rest of the workspace would be the same as before (although it wouldn’t be as good!)
For the latest info on null behaviour (and the formats/transformers supported) see our null support page on FMEpedia.
I think this update is going to be oh so useful to many of our users. Our developers have done a super job implementing this big change (any time you update the core data model it’s a big change) and then incorporating it into all these different Readers, Writers, and transformers.
If you notice any parts that could be improved (either functionality or user experience) then please do let us know about your experiences (good or bad).
Appendix 1: List of null-complete formats:
- FFS (obviously)
- ArcSDE formats
- Autodesk SDF
- Geodatabase API
- Geodatabase ArcObjects and family
- MS Access and Excel
- MSSQL Server (Windows Azure SQL Database)
- MySQL (MariaDB) (Google Cloud SQL)
- Oracle and family
- PostGIS and family (inc Redshift)
- Teradata (JDBC and TPT formats)
Appendix 2: List of null-support transformers (ones with exposed null parameters or functionality):
NB: Do be aware that many other transformers will now handle nulls without parameters; for example the Logger will log null values, the InlineQuerier can handle null values in its database, the ListHistogrammer will count the number of nulls, and so on.