Upserts and Dog-Walking: What’s New with Change Detection in FME 2019
I discussed database updates recently, and mentioned that there are two general scenarios.
One way is when you receive a changelog – a list of updates to be made – and apply them to your control database. That’s a simple one, because you know in advance what has changed and therefore which features need updates.
The second way of updating is when you receive a whole new dataset, with no indication of what is new or changed. For that you’ll need to do change detection. Previously an FME Hub transformer by the name of UpdateDetector was commonly used. But in 2019 we have a great update to the ChangeDetector transformer that should make it your go-to transformer from now on…
Change Detection: What’s New?
In 2018 and earlier the ChangeDetector transformer compared an “original” dataset against a “revised” version and separated the features into Added, Deleted, and Unchanged.
- Added: Records in the revised dataset that weren’t in the original
- Deleted: Records in the original dataset that aren’t in the revised
- Unchanged: Records in the revised dataset that matched a feature in the original dataset
However, what it did not do is identify records that had changed. If a record existed in both the original and revised, but one of its attributes was now a different value, then that counted as a new (Added) feature. This made it hard to carry out “upserts” (operations that update a record if it already exists and insert it if it doesn’t) because there was no easy way to tell whether a feature was truly new.
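To make that limitation concrete, here is a minimal Python sketch of whole-record comparison. It is not FME code: the function name `compare_whole_records` and the sample address records are made up for illustration. It simply treats two records as a match only when every attribute is identical.

```python
def compare_whole_records(original, revised):
    """Classify records by exact, whole-record equality (no key attribute)."""
    # Turn each record into a hashable, order-independent tuple of items.
    original_set = {tuple(sorted(r.items())) for r in original}
    revised_set = {tuple(sorted(r.items())) for r in revised}
    return {
        "Added": [dict(r) for r in sorted(revised_set - original_set)],
        "Deleted": [dict(r) for r in sorted(original_set - revised_set)],
        "Unchanged": [dict(r) for r in sorted(revised_set & original_set)],
    }


original = [
    {"id": 1, "street": "Main St"},
    {"id": 2, "street": "Oak Ave"},
]
revised = [
    {"id": 1, "street": "Main St"},
    {"id": 2, "street": "Oak Avenue"},  # same record, one edited value
]

result = compare_whole_records(original, revised)
print(result["Added"])    # the edited record shows up as brand new
print(result["Deleted"])  # and its old version shows up as deleted
```

Record 2 was only edited, yet it lands in both Added and Deleted, with nothing to say the two are related.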
In 2019, however, the transformer is capable of handling updated records. In fact the new transformer design shows this through an Updated output port:
So this is very important. It vastly widens the scope of change detection for this transformer.
However, there is the issue of matching original features to their revised counterparts. FME can’t decide that a revised record has been updated without an original record to compare against. The match is made using a key attribute value, so the 2019 ChangeDetector has new parameters to handle that:
We’ll look at an example shortly. For now just notice that the parameters dialog has a parameter called Update Detection Key Attributes, through which to select an ID or key value.
Anyway, if that functionality sounds familiar, it might be because we already included it in an FME Hub transformer called the UpdateDetector…
Replacing the FME Hub UpdateDetector
The UpdateDetector was created to fill gaps in the ChangeDetector’s functionality. Judging by the number of downloads, it was a very popular transformer. But now it is deprecated on the FME Hub:
The new ChangeDetector not only replaces this hub transformer, it exceeds it in both functionality and performance. The UpdateDetector will still function in an existing workspace, but we advise you to replace it with a new ChangeDetector.
Now let’s take a look at an example of how the new ChangeDetector works…
Let’s say I have an address database:
Besides that I have been given a new version of the data. I must now determine which address records have changed, so that I can push those changes to the database. I do that by simply adding a reader for each dataset, and passing it to a ChangeDetector:
From there I can see that 35 records changed between the original and revised dataset, with 2 new ones added. Also 13 records from the original data are absent from the revision. The majority of records are unchanged. Let’s look at the parameters I used:
- GlobalID is the update detection key. This is the attribute that tells FME how to match records in the Original data with records in the Revised data.
- The selected attributes are the ones I am checking for changes. i.e. where two records have a matching GlobalID, check these attributes for differences.
- This flag tells the transformer to also check the geometry in a spatial dataset. Several advanced parameters control those exact checks (see below).
- This parameter defines a list in which to store the changes that occurred; for example which attribute values differed and how.
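The logic behind those parameters can be sketched in a few lines of Python. This is only an illustration of the idea, not how the transformer is implemented: `detect_changes`, the `_changes` list name, and the sample records are all hypothetical, and geometry checking is left out entirely.

```python
def detect_changes(original, revised, key, selected_attributes):
    """Split records into Updated/Inserted/Deleted/Unchanged using a key attribute."""
    original_by_key = {r[key]: r for r in original}
    revised_by_key = {r[key]: r for r in revised}
    out = {"Updated": [], "Inserted": [], "Deleted": [], "Unchanged": []}

    for k, rev in revised_by_key.items():
        orig = original_by_key.get(k)
        if orig is None:
            out["Inserted"].append(rev)  # key only exists in the revised data
            continue
        # Only the selected attributes are checked for differences.
        changes = [
            {"attribute": a, "original": orig.get(a), "revised": rev.get(a)}
            for a in selected_attributes
            if orig.get(a) != rev.get(a)
        ]
        if changes:
            out["Updated"].append({**rev, "_changes": changes})
        else:
            out["Unchanged"].append(rev)

    for k, orig in original_by_key.items():
        if k not in revised_by_key:
            out["Deleted"].append(orig)  # key only exists in the original data
    return out


original = [
    {"GlobalID": "a1", "OWNERNM1": "Smith", "OWNERNM2": "Jones", "NOTES": "x"},
    {"GlobalID": "b2", "OWNERNM1": "Lee", "OWNERNM2": "", "NOTES": "y"},
    {"GlobalID": "c3", "OWNERNM1": "Patel", "OWNERNM2": "", "NOTES": "z"},
]
revised = [
    {"GlobalID": "a1", "OWNERNM1": "Smythe", "OWNERNM2": "Johns", "NOTES": "x"},
    {"GlobalID": "b2", "OWNERNM1": "Lee", "OWNERNM2": "", "NOTES": "edited"},
    {"GlobalID": "d4", "OWNERNM1": "Garcia", "OWNERNM2": "", "NOTES": ""},
]

result = detect_changes(original, revised, "GlobalID", ["OWNERNM1", "OWNERNM2"])
for port, records in result.items():
    print(port, [r["GlobalID"] for r in records])
```

Record b2 counts as Unchanged even though its NOTES value differs, because NOTES was not one of the selected attributes.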
Just as the parameters are a little different, so can the output be…
New Change Detection Output
Because the ChangeDetector transformer now checks for matches – but not necessarily on every attribute – it’s possible that Original and Revised might count as a match, and yet not be totally identical. For example, attributes a, b, and c are a match, but d is different. The features are still a match though because you didn’t pick d in the Selected Attributes.
To handle that scenario a parameter allows you to output either – or both – of the matched features:
If you do output both features, then a Match ID attribute is added, so that you can identify which features counted as a match.
Additionally, by setting a List Name, the output features record the differences between Original and Revised. This record (for example) has two differences:
The list tells me that two attributes (OWNERNM1 and OWNERNM2) were modified with new values, whereas this list:
…tells me that the geometry of the feature was modified.
Also – just as the UpdateDetector did – this transformer sets the fme_db_operation attribute. Here is an example for a deleted record (was in the original, but not the revised):
This means that I can simply pass features to a database writer, specify the Feature Operation (fme_db_operation) and match column (here GlobalID again)…
…and my address database is automatically modified with updates, additions, and deletions as necessary.
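If it helps to picture what the writer does with those features, here is a rough Python sketch. A plain dictionary stands in for the database table, and `apply_operations` is a hypothetical helper rather than anything in FME; the only things borrowed from FME are the fme_db_operation attribute name and its INSERT, UPDATE, and DELETE values.

```python
def apply_operations(table, features, match_column):
    """Apply INSERT/UPDATE/DELETE features to a dict keyed on match_column."""
    for feature in features:
        record = dict(feature)  # copy so the input feature is left untouched
        operation = record.pop("fme_db_operation")
        key = record[match_column]
        if operation == "INSERT":
            table[key] = record
        elif operation == "UPDATE":
            table[key].update(record)
        elif operation == "DELETE":
            table.pop(key, None)
        else:
            raise ValueError(f"Unexpected operation: {operation}")
    return table


# A toy "address table" keyed on GlobalID (sample values are made up).
table = {
    "a1": {"GlobalID": "a1", "OWNERNM1": "Smith"},
    "c3": {"GlobalID": "c3", "OWNERNM1": "Patel"},
}
features = [
    {"GlobalID": "a1", "OWNERNM1": "Smythe", "fme_db_operation": "UPDATE"},
    {"GlobalID": "d4", "OWNERNM1": "Garcia", "fme_db_operation": "INSERT"},
    {"GlobalID": "c3", "fme_db_operation": "DELETE"},
]

apply_operations(table, features, "GlobalID")
print(sorted(table))
```

Each feature carries its own instruction, so a single stream of features can mix updates, additions, and deletions.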
New Tolerance Algorithm
You may have noticed one of the advanced geometry parameters called Vector Tolerance:
Despite the name, this is not the same as the tolerance setting I discussed previously for FME 2018. Why is this tolerance different? Because it’s not trying to find where two lines intersect and it’s not trying to adjust existing points. Instead it finds whether two features are within tolerance, using something called the Fréchet distance.
The Fréchet distance is a measure of the similarity of two spatial features (usually “curves”) that takes into account both the location and the ordering of the points along each feature.
The common – and simpler – explanation is that of walking a dog. Say you walk along a path with your dog on a lead (leash). You walk in a relatively straight line (the red line, A, below), but your dog moves from side to side, in order to sniff at trees (the blue line, B):
The question is, how long does the lead need to be for each of you to walk your respective path? In the above diagram the widest gap between dog and walker is marked F. If my dog lead is at least F in length, then we can each walk our respective paths without pulling at each other.
This concept makes a great solution for change detection. In FME terms a feature is unchanged if the Fréchet distance between Original and Revised is less than the specified tolerance value.
We feel that this algorithm is an improvement over the past method. It works on more geometries and it also allows applying tolerance in lenient matching mode, which wasn’t possible before.
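For the curious, the dog-lead idea can be tried out in Python. Be aware that this sketch swaps in the discrete Fréchet distance, which only measures between vertices, because it is short enough to show here; it is not the algorithm FME uses. The function names, coordinates, and tolerance values are all assumptions for illustration.

```python
import math


def discrete_frechet(a, b):
    """Discrete Fréchet distance between two vertex lists (dynamic programming)."""
    n, m = len(a), len(b)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(a[i], b[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Walker steps, dog steps, or both step; keep the shortest lead so far.
                ca[i][j] = max(min(ca[i - 1][j], ca[i][j - 1], ca[i - 1][j - 1]), d)
    return ca[-1][-1]


def is_unchanged(original, revised, tolerance):
    """Treat the geometry as unchanged if the distance is below the tolerance."""
    return discrete_frechet(original, revised) < tolerance


walker = [(0, 0), (1, 0), (2, 0), (3, 0)]   # the straight path, A
dog = [(0, 0), (1, 1), (2, -1), (3, 0)]     # the side-to-side path, B

lead = discrete_frechet(walker, dog)
print(lead)
print(is_unchanged(walker, dog, tolerance=1.5))
print(is_unchanged(walker, dog, tolerance=0.5))
```

With these two paths the lead needs to be 1 unit long, so the pair passes as unchanged with a tolerance of 1.5 but not with 0.5.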
Fréchet? Which Fréchet?
Please feel free to skip past this part if you aren’t into computational geometry. However, for you connoisseurs of Fréchet distances, you’ll notice this is a True Fréchet, not the Discrete Fréchet (which only calculates distance between vertex points). There is also a Weak Fréchet and FME uses that when the Lenient Geometry Matching parameter is active:
A Weak Fréchet is when you say that the walker or the dog is allowed to backtrack their steps. In a True Fréchet each has to keep moving forward.
In this example the dog walker (A) keeps walking in a straight line. The dog walks straight at first (to b1) but then veers around and to the right (to b2). They are still moving forward along their path, but their path is in a different direction:
In a True Fréchet the walker cannot reverse their path to account for this deviation. The most they can do is stop at their current position. For example here the walker minimizes the lead length by stopping to wait at point a1, while the dog follows their meandering path:
That, perhaps, is where the analogy breaks down a bit. Fréchet distances are calculated knowing the path in advance, whereas it’s impossible in real life to know what course a dog will take!
Anyway, in a Weak Fréchet, the walker is allowed to reverse their course. When the dog starts heading from b1 to b2, the walker can double-back to say a2, to make for a shorter Fréchet:
If you don’t understand, you really shouldn’t worry. Just look at the two paths and remember that the Lenient Matching option means two features are more likely to be classed as a match.
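If you would rather see the difference in numbers, here is a hedged Python sketch. Both functions work on vertices only (a discrete approximation), so they are not FME's true or weak Fréchet implementations. The forward-only version is a dynamic program, the backtracking version is a bottleneck search over the same grid, and the names and coordinates are my own.

```python
import heapq
import math


def _distances(a, b):
    return [[math.dist(p, q) for q in b] for p in a]


def forward_only_frechet(a, b):
    """Neither walker nor dog may step backwards along their path."""
    d, n, m = _distances(a, b), len(a), len(b)
    best = [[math.inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                prev = 0.0
            else:
                prev = min(
                    best[i - 1][j] if i > 0 else math.inf,
                    best[i][j - 1] if j > 0 else math.inf,
                    best[i - 1][j - 1] if i > 0 and j > 0 else math.inf,
                )
            best[i][j] = max(prev, d[i][j])
    return best[-1][-1]


def backtracking_frechet(a, b):
    """Walker and dog may both step backwards (weak variant, bottleneck search)."""
    d, n, m = _distances(a, b), len(a), len(b)
    best = [[math.inf] * m for _ in range(n)]
    best[0][0] = d[0][0]
    heap = [(d[0][0], 0, 0)]
    while heap:
        cost, i, j = heapq.heappop(heap)
        if (i, j) == (n - 1, m - 1):
            return cost
        if cost > best[i][j]:
            continue  # stale heap entry
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if (di, dj) != (0, 0) and 0 <= ni < n and 0 <= nj < m:
                    new_cost = max(cost, d[ni][nj])
                    if new_cost < best[ni][nj]:
                        best[ni][nj] = new_cost
                        heapq.heappush(heap, (new_cost, ni, nj))
    return best[-1][-1]


walker = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
# The dog runs ahead to x=3, doubles back to x=1, then carries on to x=4.
dog = [(0, 0), (1, 0), (2, 0), (3, 0), (2, 0), (1, 0), (2, 0), (3, 0), (4, 0)]

strict = forward_only_frechet(walker, dog)
lenient = backtracking_frechet(walker, dog)
print(strict, lenient)
```

When the walker has to wait at x=2, the lead must stretch 1 unit in each direction; when the walker may double back alongside the dog, no lead is needed at all. That is why the lenient option is more likely to class two features as a match.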
More Change Detection Information
The above example looked at the new change detection behaviour. However, in some cases you might want to carry out the same Added/Deleted/Unchanged process that the old ChangeDetector did. i.e. you don’t necessarily need to look for modified records.
If that’s the case, then simply leave the Update Detection Key parameter empty:
Then your unmatched features are assigned to Inserted or Deleted, according to whether they came from the Revised or Original input:
And – of course – if you don’t have 2019, then you can still download the UpdateDetector. The term “UpdateDetector” now appears in Workbench only as an alias for the ChangeDetector, so access the transformer through the FME Hub, being sure to check the Show Deprecated option in the interface.
Additionally, the Matcher transformer got a small makeover for 2019. Its parameters dialog was refreshed and gained the same tolerance parameter and algorithm as the ChangeDetector.
So that’s what’s coming up for Change Detection. Now you’ve read this article you won’t be surprised to see the entirely new ChangeDetector dialog when you upgrade to FME 2019.
I believe this is a really useful update, and I look forward to using it. Oh, and in case you were struggling(!) here are the answers to the canine spot-the-difference image: