Tortoises & Tesseracts: ETL for BI Data Warehousing
The municipality of Arnhem wanted all of their ETL processes on one platform—and to align their geospatial and business intelligence working groups. But could FME handle it all? Slowly Changing Dimensions were one of many BI-centric tasks that it would need to deal with…
I should probably start by explaining this title.
It all started with an email a while back, with the news that the municipality of Arnhem in the Netherlands had decided to try closing the gap between their geo and IT departments – and that included gathering all of their ETL processes onto one platform, if possible. And since they implemented FME Server in 2014, Engelbert Wijnhoven, ICT / BI Specialist, wanted to see if FME could successfully handle all the workflows they needed.
Which brings us to the tortoises and tesseracts.
Now, I had never heard the phrase “Slowly Changing Dimension” before, but it’s a key aspect to BI data warehousing. Me, I thought it sounded like a great premise for a science fiction treatment of a classic fable.
Apparently it’s also a good scenario for an FME user success story.
Challenge: Slowly Changing Dimensions
Slowly Changing Dimensions, or SCDs, as I soon found out, are pieces of information that are mostly static, but can change—and do, but not often or predictably. A change in someone’s title, for example, or a supplier that relocates to another geographic area.
The importance of tracking these changes has more to do (generally) with accurately documenting history than with what’s going on in the immediate present. And accurate history is one of the reasons that we warehouse data – so that questions about the exact state of things at specific moments in time can be answered.
The conventional BI-oriented ETL tool that Engelbert was using is quite accustomed to dealing with this.
FME Takes On BI
Much of the work involved in this project was quite straightforward schema mapping, scheduling, and automation. As fans know already, FME is adept at moving data to where it’s needed – whether or not there is geometry attached.
The case of SCDs, though, took a bit more thinking. When a record comes through, it’s not simply a matter of insert/update/delete. Itay and Engelbert created this workspace (one of many!) that reads data from both the staging and storage platforms, and performs detailed checks by doing CRC calculations on the contents of the record itself—which then determines the further routing and database actions performed as a result.
The goal is to identify non-predicted state changes, and both preserve them and instantly deal with them. By using the CRCCalculator to generate unique keys, and then indexing them, change identification is fast and accurate.
Right now, Arnhem has somewhere in the neighborhood of 30 geospatial tasks and another 25 non-spatial BI workflows that FME Server is taking care of automatically. The project has been an absolute success, both technically and in aligning different working groups on a single platform and bringing them together.
FME is maintaining enterprise geospatial databases, performing data integration tasks, and now populating the BI data warehouse. If a task involves moving data from one place to another, chances are FME is taking care of it—and that has provided consistency across departments, and across technologies.
“BI Data warehousing and FME was a challenge,” says Engelbert. “But combining the two products FME Desktop and FME Server was the best choice we’ve made. We have now one environment for all spatial and non spatial ETL models.”
Itay shared his experience working on the project. “The main challenge for me was not to see any spatial objects in the Data Inspector the way I’m used to. The Table view became essential. But as another FME Certified Pro mentioned, your imagination is the only limitation to what you can accomplish with FME!”
We’re seeing ever more wide-ranging applications of FME in the non-spatial and BI space, and successful projects like this serve to reinforce that spatial and non-spatial are really all just data, and not all that different—on the same plane, I might venture to say!
Even when it changes at the speed of a tortoise in another dimension.
Related Resources –