To continue a theme from my previous post, when we at Safe use terminology we have to be very careful to be consistent. In the wild and exciting early days of FME such things didn’t really matter, but now that we have close to 100 employees – with resellers all over the world – it’s much more important that we are all using the same language.
So this post is about how we’re trying to meet this challenge at Safe. I’ll also cover one particular concept which I (as a trainer) feel is quite important for users to grasp: group-based transformation.
FME Terminology Review
I guess the need to review terminology happens to any long-term product. Theories change over time and names once thought suitable (“The Swizzler” for example) now just sound like a character from a Batman comic, or a trendy kitchen utensil.
Below: The Kitchen Swizzler (nothing to do with FME)
So, for FME2010 we’re going through all of the Workbench interface, documentation, and FME Training and trying to make the terminology consistent throughout. Workbench is the first, probably because so many different people have made contributions to it in different ways.
The biggest changes are probably those which affect the menu or dialog boxes:
Formats and Datasets
At the moment the terms “Formats” and “Datasets” are often used in a generic way, when really they should have a more precise definition. A dataset has its own position in the FME hierarchy, and the term should only be used when referring to this level.
It’s probably easier to understand when viewed as a picture:
Notice that we’re reviving the Mapping File concepts of “Reader” and “Writer”. A “Reader” is an object which reads one or more datasets of a specific format, and a “Writer” is an object which writes data of a specific format.
So the “Source Data” and “Destination Data” items on the menubar will be renamed to Readers and Writers, and the Formats Gallery renamed to the Readers and Writers Gallery (note how this neatly ties in with the Readers and Writers Manual).
Instead of adding a source dataset, you add a reader (and specify any number of source datasets for it to read), so the menubar item Source Data > Add Dataset will now become Reader > Add Reader.
This helps to explain how, for example, you can empty the Workbench canvas but still have items in the Navigator window. These leftovers are Readers or Writers.
Also, dialogs titled “Input Settings” will now become “Reader Parameters”, which leads me nicely on to:
Settings, Parameters and Properties
There has been mixed use of the terms “settings”, “properties” and “parameters” in different dialogs. Anything called “Settings” or “Parameters” will be standardized as “Parameters”. The reasoning for this choice is that this is more in line with the use of the term in “Published Parameters”.
Properties is the title given to the Feature Type dialog.
That won’t change because (as you can see above) many already have a Parameters tab. What might happen – just to be consistent – is that transformer dialogs also get called “Properties”.
Other miscellaneous terms have been used interchangeably, so now we’re trying to make sure that we use the terms Objects (not Nodes), Windows (not Panes) and Username (not User, User Name, UserName, or Database User).
And the Swizzler? That will officially be renamed the Advanced Dataset Manager. But the term “Swizzler” will always be like a secret handshake for the society of long-term FME users.
Group-Based transformation is an important topic for any number of reasons.
Simply put, some transformers work on individual features at a time; for example the Rotator transformer takes a single feature, rotates it, then goes on to the next feature. This we call Feature-Based Transformation.
However, some transformers work on groups of features at a time; for example the Intersector transformer can’t just intersect one feature at a time, but must intersect a whole group of features all at once. This is what we call Group-Based Transformation.
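To make the distinction concrete, here is a minimal Python sketch. It is not FME code: features are plain dictionaries, the function names are made up for illustration, and 1D intervals stand in for real geometry. The first function can emit each result straight away, while the second cannot answer for any feature until it has seen them all.

```python
def rotate_each(features, angle):
    """Feature-based: each feature is handled alone and emitted immediately."""
    for f in features:
        yield {**f, "rotation": (f.get("rotation", 0) + angle) % 360}


def intervals_overlap(a, b):
    """Hypothetical stand-in for a geometric intersection test (1D intervals)."""
    return a["start"] < b["end"] and b["start"] < a["end"]


def count_overlaps(features):
    """Group-based: the whole group must be collected before any result is known."""
    group = list(features)  # every feature has to arrive first
    results = []
    for f in group:
        overlaps = sum(1 for g in group if g is not f and intervals_overlap(f, g))
        results.append({**f, "overlaps": overlaps})
    return results


features = [
    {"id": 1, "start": 0, "end": 5},
    {"id": 2, "start": 3, "end": 8},
    {"id": 3, "start": 10, "end": 12},
]
rotated = list(rotate_each(features, 90))
overlap_counts = [f["overlaps"] for f in count_overlaps(features)]
```

Feature 3 overlaps nothing, but the only way to know that is to compare it against every other feature in the group.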
So, the concept is simple enough, but perhaps the reasons it is important are less so. Here’s a list of reasons why you might want to know about Groups.
1) Group-By
Have you ever wondered “which features form the ‘group’ in a Group-Based Transformation?” This is where it becomes interesting. By default the group is the entire set of features passing through that transformer; I think the term I want to describe this is “inclusive”.
However, many transformers have a setting called Group-By. This is where you say to FME, I don’t want to process ALL my features as a group, I want to create my own set of sub-groups. You select an attribute and all features with the same value for that attribute are formed into their own sub-group.
Below: If I have a transformer called “EatCandies” and set a group-by to “CandyColour” then I would eat all the yellow ones first, then the reds, then the greens, etc. Without a group-by I would just eat them all at once!
Below: Here’s a real example, a LineOnLineOverlayer transformer:
“name” has been selected as the Group-By attribute. The result is that intersection will take place on features with the same value for “name”.
So Group-By is a very powerful ability, but it’s also worth remembering that this is an optional parameter; without it you’ll just be using ALL features as the group being processed.
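The candy example can be sketched in a few lines of Python. This is only an illustration of the idea, not how FME implements it: features are plain dictionaries and `group_features` is a made-up helper. With no group-by attribute there is a single group holding everything; with one, each distinct value gets its own sub-group.

```python
from collections import defaultdict


def group_features(features, group_by=None):
    """Split features into groups keyed by the value of the group_by attribute.

    With group_by=None, every feature lands in one all-inclusive group.
    """
    groups = defaultdict(list)
    for f in features:
        key = f.get(group_by) if group_by is not None else None
        groups[key].append(f)
    return dict(groups)


candies = [
    {"id": 1, "CandyColour": "yellow"},
    {"id": 2, "CandyColour": "red"},
    {"id": 3, "CandyColour": "yellow"},
    {"id": 4, "CandyColour": "green"},
]

everything = group_features(candies)                 # one group of four
by_colour = group_features(candies, "CandyColour")   # yellow, red, green
```

A group-based transformer would then run once per group: once over all four candies in the first case, and three separate times in the second.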
2) Prototyping Workspaces
When creating a workspace for a specific task it’s usual to test it against a single feature. Once that works I might test it on another feature to test a different case. However, before I put it into production, I also need to consider if any transformers are group-based. If so, running the workspace on a whole dataset might produce unexpected results.
Check out this workspace on fmepedia. It creates a special type of one-sided buffer.
The first prototype worked wonderfully on a single feature, but when I fed multiple features into it I had problems. That’s because I’m using group-based transformers and this caused the features to interact amongst themselves.
So how did I fix it? Easy. By assigning each feature a unique ID, and by using that ID in the transformer Group-By parameters, I effectively made each feature a group by itself. Therefore it gets processed by itself without interacting with any other features.
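Here is a rough Python sketch of that fix. The names are hypothetical (`add_unique_id`, `merge_lengths` and the `_uid` attribute are my own inventions, and the real workspace builds buffers rather than summing lengths), but the principle is the same: group by a value that is unique per feature and every group has exactly one member.

```python
from collections import defaultdict
from itertools import count


def add_unique_id(features, attr="_uid"):
    """Give every feature its own ID (attribute name is an assumption)."""
    counter = count()
    return [{**f, attr: next(counter)} for f in features]


def merge_lengths(features, group_by=None):
    """Hypothetical group-based step: one output per group, summing 'length'."""
    totals = defaultdict(float)
    for f in features:
        key = f.get(group_by) if group_by is not None else None
        totals[key] += f["length"]
    return list(totals.values())


roads = [{"length": 2.0}, {"length": 3.5}, {"length": 1.0}]

interacting = merge_lengths(roads)                                # features merge together
isolated = merge_lengths(add_unique_id(roads), group_by="_uid")   # one result per feature
```

Without the ID the three features collapse into a single result; with it, each one is processed entirely on its own.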
3) Flow of Features
This is a strange one, but if you attended the Advanced training at the 2008 User Conference, you should know about this.
To keep it brief, each feature in FME is processed through the entire workspace before the next one is read; i.e. we read a feature, process it, write it, read the next feature, and so on.
But group-based transformers break the usual pattern. They “block” individual features from being sent on until the entire group is available for processing. Therefore the flow of features is interrupted.
Below: The “TakeBath” transformer is a blocker. Water features are stored up until ready to use. When “TakeBath” is complete, the plug is pulled and the water continues down the pipeline:
Why might this be a problem? Well sometimes a workspace needs to deliberately hold up features from being processed immediately; in the same way that a bath isn’t much good if it doesn’t block water. In fact the FeatureHolder transformer was designed especially for this purpose.
But on the other hand, sometimes a workspace is created which absolutely relies upon the feature-based nature of FME. Put a “blocker” transformer in there and you can break the whole workspace. For example, “TakeShower” would not be a transformer which needs to be a blocker.
So it’s sometimes important to know whether you need to block or not. The most common scenario is when using VariableSetter and VariableRetriever transformers. Often these are used to pass information from one feature to the next (or even to the one before it!), but they only work when you have the correct mix of group-based and feature-based transformers. Another case might be when you want to write features to a destination dataset in a certain order.
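Python generators give a reasonable feel for this. The sketch below is an analogy rather than a description of FME internals, and all four function names are made up. Each stage records what it does in a log, so you can compare the order of events with and without a blocker in the pipeline.

```python
def reader(names, log):
    for name in names:
        log.append(f"read {name}")
        yield name


def feature_based(features, log):
    """Passes each feature on as soon as it has been processed."""
    for f in features:
        log.append(f"process {f}")
        yield f


def blocker(features, log):
    """Holds every feature until the whole group has arrived."""
    held = list(features)  # nothing is released before this completes
    log.append("group complete")
    yield from held


def writer(features, log):
    for f in features:
        log.append(f"write {f}")


streamed = []
writer(feature_based(reader(["a", "b"], streamed), streamed), streamed)
# streamed: read a, process a, write a, read b, process b, write b

blocked = []
writer(blocker(reader(["a", "b"], blocked), blocked), blocked)
# blocked: read a, read b, group complete, write a, write b
```

In the first log, feature "a" is written before "b" has even been read. In the second, both features are read before either is written, which is exactly the change in behaviour that can upset a workspace relying on feature-by-feature order.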
4) Memory Resources
Here’s a subject important to everyone: performance. The reason FME uses the flow of features described above (Read-Process-Write each feature individually) is because it’s very efficient. Very little data needs to be stored in memory.
But remember, group-based transformers break this pattern. In these cases FME has to potentially store every single feature in memory so they can be processed as a group. So point 1 is that you need to know that group-based transformers will use more system resources. Sometimes there’s no way around it – if you need to use a Clipper then you need to use a Clipper – but it’s still worth being aware of the issues.
Below: Processing multiple features at once always takes more resources than one at a time.
Point 2 is that putting together a string of group-based transformers can have a cumulative effect on memory. In other words transformer 2 might be hoarding features while transformer 1 still has the complete set already stored. It’s rare, but again worth being aware of.
The final point on the subject of performance is that some transformers that are group-based by nature can be turned back into feature-based ones with the right settings. This would reduce memory use and improve performance. The “Clippers First” setting on the Clipper is one example. If you can guarantee that all Clippers being used arrive first, then there’s no need to store up a group of Clippees, because they can be processed one at a time. This setting has the greatest effect when you have just a few clippers, but a huge number of clippees. Other transformers have similar parameters to help performance.
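The reasoning behind “Clippers First” can be sketched as follows. This is a toy model and not the Clipper’s actual implementation: the `clip` function, the `role` attribute and the 1D “inside” test are all assumptions made for illustration. The point is only to show how many clippees have to be held in memory in each mode.

```python
def inside_any(clippee, clippers):
    """Toy containment test: is the clippee's x within any clipper's range?"""
    return any(c["min"] <= clippee["x"] <= c["max"] for c in clippers)


def clip(stream, clippers_first=False):
    """Return (results, peak number of clippees buffered).

    clippers_first=True assumes every clipper arrives before any clippee.
    """
    clippers, waiting, results = [], [], []
    peak_waiting = 0
    for f in stream:
        if f["role"] == "clipper":
            clippers.append(f)
        elif clippers_first:
            # All clippers are already known, so answer immediately.
            results.append((f["id"], inside_any(f, clippers)))
        else:
            # More clippers might still arrive, so the clippee must wait.
            waiting.append(f)
            peak_waiting = max(peak_waiting, len(waiting))
    for f in waiting:
        results.append((f["id"], inside_any(f, clippers)))
    return results, peak_waiting


stream = [
    {"role": "clipper", "min": 0, "max": 10},
    {"role": "clippee", "id": 1, "x": 5},
    {"role": "clippee", "id": 2, "x": 15},
    {"role": "clippee", "id": 3, "x": 7},
]

buffered_results, buffered_peak = clip(stream)
streamed_results, streamed_peak = clip(stream, clippers_first=True)
```

Both runs give the same answers, but the default mode holds all three clippees at once while the clippers-first mode holds none. Note that the guarantee matters: if a clipper turned up late in clippers-first mode, the earlier clippees would already have been judged without it.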
5) Passing Attributes
Many Group-Based transformers consume attributes (i.e. features are output without attributes) because FME doesn’t know which value out of the group to put onto the output. Sometimes you can get around this using a List parameter. However, it’s worth remembering that if you use a Group-By, then the attributes you select are usually passed through. That’s because FME knows they must be the same value (else they wouldn’t form a group) and so there is no confusion about which value to pass through.
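A small Python sketch shows why this works. The `aggregate` function and the `list_name` argument are hypothetical, loosely modelled on the Group-By and List parameters described above, and features are again plain dictionaries. Only the group-by attributes have a single unambiguous value per group, so only those survive on the output; everything else is either dropped or kept in a list.

```python
from collections import defaultdict


def aggregate(features, group_by, list_name=None):
    """Produce one output feature per group.

    Group-by attributes are copied through, since every member shares them.
    Other attributes are dropped unless list_name is given, in which case
    each member's remaining attributes are kept as a list on the output.
    """
    groups = defaultdict(list)
    for f in features:
        groups[tuple(f[a] for a in group_by)].append(f)

    outputs = []
    for key, members in groups.items():
        result = dict(zip(group_by, key))  # identical across the group
        if list_name:
            result[list_name] = [
                {k: v for k, v in m.items() if k not in group_by} for m in members
            ]
        outputs.append(result)
    return outputs


roads = [
    {"name": "Main St", "lanes": 2},
    {"name": "Main St", "lanes": 4},
    {"name": "Oak Ave", "lanes": 1},
]

plain = aggregate(roads, ["name"])
with_list = aggregate(roads, ["name"], list_name="parts")
```

In `plain` the “lanes” attribute is gone, because the Main St group has two different values for it. In `with_list` both values are still available, one per list element.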
6) Visualizer Fan-Out
A minor point, but a useful tip: Set a group-by in a Visualizer transformer and it has the effect of producing a fan-out in the data when displayed in the FME Viewer. Try it and see for yourself.
How to Tell if a Transformer is Group-Based:
Of course, a key question might be: how do I know if a transformer is group-based? If you can’t work it out then check for a group-by setting: that’s usually a good indicator. However, at some point in the future (maybe 2010) the transformer documentation will have a quick facts section which tells you if a transformer is group-based or not (though we haven’t decided yet whether “group-based” is the name we will use).
Standardizing terminology is also very important for what we call “localization”.
Localization has nothing to do with Unicode or non-English characters in spatial data (phew!) but instead refers to setting up the FME GUI in a series of different languages. This is quite important if FME is to be a truly world-wide product. For example, there are currently versions of FME available in French and German, and there’s a version with Spanish documentation.
However, because localization relies on a language translation of the FME GUI, it’s very important that we have a standard set of terminology and stick to it. Hence the importance of this review.
So, by standardizing terminology we hope to make it easier to create local versions of FME, and we hope that in the near future we’ll have editions of FME in many other languages.
This Edition of the FME Evangelist…
…was written to the tune of the Divine Comedy’s excellent Something for the Weekend.
She said, “there’s something in the woodshed. I know because I saw it. I can’t simply ignore it.”
He said, “now baby don’t be stupid, get this into your sweet head, there ain’t nothing in the woodshed – ‘cept maybe some wood!”
Mark Ireland
Mark, aka iMark, is the FME Evangelist (est. 2004) and has a passion for FME Training. He likes being able to help people understand and use technology in new and interesting ways. One of his other passions is football (aka soccer). He likes both technology and soccer so much that he wrote an article about the two together! Who would’ve thought? (Answer: iMark)