Author:
Mark Ireland

December 22, 2011

FME 2012 Sneak Peek: Parallel-Processing

Hi folks,
If you’ve installed FME 2012, or follow @FMEEvangelist on Twitter, you’ll be aware that some transformers in 2012 have a new option: one to run the transformation as multiple processes.

And, although you might not realize it, this is a seismic shift in thinking, not just for the new capabilities, but for FME as a whole.

Why is that? Well, read on…

What is Parallel-Processing?
Each FME translation is usually a single process on your computer. Parallel processing is when you choose to transform your data as several simultaneous processes. Because they run simultaneously, the translation can finish several times quicker than it used to.

The idea is designed to make use of the multi-core processors that are so prevalent these days.

Each transformer that is capable of handling parallel processing will have a Parallel Processing Level parameter on it. You can choose levels of aggressiveness from “No Parallelism” up to “Extreme”:

The obvious question for most users is now, “but what do these levels mean?”, and I’ll explain that below. But first, consider which features get assigned to which process. That’s an important question because features in one process get transformed separately from features in another; you can’t clip a feature in one process using a clip boundary that’s being used in another.

So, how do you assign features to processes? With a Group-By setting. When you set a group-by and choose to parallel process, each group gets run as a separate process. Then it doesn’t matter that there is no interaction between processes, because groups get processed individually anyway; the difference is that beforehand they would be processed consecutively, now they can be processed concurrently.
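To make the idea concrete, here is a minimal sketch in Python. It is not how FME works internally: the feature dictionaries, the `country` attribute and the `overlay_group` function are all made up for illustration, and a thread pool stands in for FME's separate worker processes to keep the example short.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_features(features, group_by):
    """Partition features into lists keyed by the group-by attribute."""
    groups = defaultdict(list)
    for feature in features:
        groups[feature[group_by]].append(feature)
    return dict(groups)


def overlay_group(group):
    """Hypothetical stand-in for a transformer: it sees only its own group."""
    return sum(feature["area"] for feature in group)


def run_grouped(features, group_by, workers):
    """Process every group independently, several at a time."""
    groups = group_features(features, group_by)
    # Assumption: a thread pool stands in for separate worker processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(overlay_group, groups.values())
        return dict(zip(groups.keys(), results))


features = [
    {"country": "Canada", "area": 10},
    {"country": "France", "area": 4},
    {"country": "Canada", "area": 5},
]
totals = run_grouped(features, "country", workers=2)
```

Each call to `overlay_group` receives one country's features and nothing else, which is why it doesn't matter whether the groups run one after another or side by side.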

Check out this example:

I’m overlaying sets of area features, but only where each area belongs to the same country. By setting a parallel processing level, each country gets run as a separate (simultaneous) process, like so:

See, I have one Workbench running four “worker” processes.

So that’s potentially a big time-saver for users with multi-core machines, as a translation has the potential to be multiple times quicker. But it actually represents a shift in thinking for FME too, and let’s look at that next…

Why is it such a BIG thing?
Think about it. Up till now one FME license has resulted in one FME process. But now, you can run multiple processes at the same time, and all with the same FME license!

So multi-processing doesn’t need multiple licenses. We decided it’s your FME. If you can improve performance with a multi-core machine then why should we stop you?

Beforehand you could probably run multiple FMEs at the same time, but we didn’t really design it that way. Now we’ve actually sat down and formalized exactly how this will work.

So this is not just a seismic shift in functionality, but in the way that we think of FME. I suspect it will also change how users set up their FME environment, and it’s going to be interesting to see how that happens.

So, are there any rules? Yes, and that is tied into what all the different processing levels mean.

The Parallel Processing Parameter
The values for the parallel processing parameter are obviously quite subjective, and you might be wondering if they map to any particular number of cores or processes. They do. The mapping is like this:

Parameter        Processes
No Parallelism   1
Minimal          Cores / 2
Moderate         Cores
Aggressive       Cores x 1.5
Extreme          Cores x 2

So (for example) on a quad-core machine, minimal parallelism will result in two simultaneous FME processes. Extreme parallelism on an 8-core machine would result in 16 simultaneous processes.
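The mapping is simple enough to write down as a small Python sketch. The multipliers come straight from the table above; the function name and the rounding rule (round down, never below one process) are my own assumptions, since the table doesn't say what happens with an odd number of cores.

```python
# Multipliers from the table in this post (provisional values).
LEVEL_FACTORS = {
    "No Parallelism": None,  # always a single process
    "Minimal": 0.5,
    "Moderate": 1.0,
    "Aggressive": 1.5,
    "Extreme": 2.0,
}


def processes_for(level, cores):
    """Return the number of simultaneous processes for a level and core count."""
    factor = LEVEL_FACTORS[level]
    if factor is None:
        return 1
    # Assumption: round down, but never go below one process.
    return max(1, int(cores * factor))


quad_minimal = processes_for("Minimal", 4)
eight_extreme = processes_for("Extreme", 8)
```

Here `quad_minimal` is 2 and `eight_extreme` is 16, matching the two examples above.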

However, there is also a hard cap for each different level of license.

License Type           Process Cap
Base Edition           4
Professional Edition   8
Other Editions         16

So, if you have a base edition license you are never going to get more than four processes at a time, regardless of the machine type and parallelism parameter.

NB: These numbers are provisional. They might get tweaked up or down.
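Put another way, the cap is just a ceiling on whatever number the parallelism level asks for. A tiny Python sketch, using the provisional caps from the table above (the dictionary keys and function name are illustrative, not FME identifiers):

```python
# Provisional caps from the table in this post.
LICENSE_CAPS = {"Base": 4, "Professional": 8, "Other": 16}


def capped_processes(requested, edition):
    """Limit the requested process count to the cap for the license edition."""
    return min(requested, LICENSE_CAPS[edition])


base_on_big_machine = capped_processes(16, "Base")
pro_on_small_machine = capped_processes(2, "Professional")
```

So a request for 16 processes on a Base Edition license is trimmed to 4, while a request for 2 on a Professional Edition license goes through untouched.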

Which Transformers?
There are a number of transformers that have this capability. We picked the ones that were most likely to benefit from parallel processing (i.e. that had the most intensive processing and usually worked on groups of features).

The current list is:

  • Area/PolygonBuilder
  • AreaOnAreaOverlayer
  • Bufferer
  • Clipper
  • DEMGenerator
  • Dissolver
  • DonutBuilder
  • ImageRasterizer
  • Intersector
  • NeighborFinder
  • NumericRasterizer
  • PolygonBuilder
  • RasterDEMGenerator
  • SurfaceModeller
  • TINGenerator

However, you can also apply this to a Custom Transformer, and this is really quite useful.

Firstly, you can just multi-process data in their natural groups, using a custom transformer, for the sake of performance. And if there aren’t natural groups you could still assign different features to different processes by using the ModuloCounter to create an ID number to group by. Just remember that each group gets processed independently of the others.
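The ModuloCounter trick amounts to numbering features in a repeating cycle. Here is a rough Python equivalent of that idea, not the transformer's actual implementation; the `_group_id` attribute name and the feature dictionaries are made up for the example.

```python
def assign_group_ids(features, group_count):
    """Tag each feature with a cycling ID: 0, 1, ..., group_count - 1, 0, ..."""
    return [
        {**feature, "_group_id": index % group_count}
        for index, feature in enumerate(features)
    ]


features = [{"name": f"f{i}"} for i in range(5)]
tagged = assign_group_ids(features, 2)
ids = [feature["_group_id"] for feature in tagged]
```

With five features and two groups the IDs cycle 0, 1, 0, 1, 0, so grouping by `_group_id` splits the data into two roughly equal batches.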

But secondly, you can use this to apply an artificial group-by, where one didn’t exist before. For example, check out the Tiler transformer; it has no group-by setting. If you put a Tiler into a custom transformer, and set up parallel processing, then in effect you create a group-by where one never existed. The performance isn’t the issue here; it’s more one of using functionality in an innovative way.

Incidentally, to use Parallel Processing on a custom transformer, simply open the transformer definition and you’ll find the parallelism parameter in the Navigator window:

Cautions
So parallel processing can improve FME performance; but it can also degrade it or have very little effect, and it’s worth knowing when you should avoid using this technique.

Many, Small Groups
I would avoid this when you have very many groups each with a small number of features. Remember, each group fires up an FME process and that takes time. For example, with 10,000 groups of 10 features, you might find it costs more performance to start and stop FME 10,000 times than you save in parallel processing. Conversely, 10 groups of 10,000 features would probably be more worthwhile.
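A back-of-the-envelope model shows why. The Python sketch below assumes every group pays a fixed start-up cost plus a per-feature cost, and that groups run in "waves" of as many processes as are available. The timings (2 seconds to start a process, 0.01 seconds per feature, 4 processes) are invented purely for illustration and are not measurements of FME.

```python
import math


def parallel_seconds(groups, features_per_group, processes,
                     startup_s, per_feature_s):
    """Estimate run time when each group starts its own process."""
    per_group = startup_s + features_per_group * per_feature_s
    waves = math.ceil(groups / processes)  # groups run `processes` at a time
    return waves * per_group


def serial_seconds(groups, features_per_group, per_feature_s):
    """Estimate run time in a single process with no extra start-up cost."""
    return groups * features_per_group * per_feature_s


# Hypothetical timings, chosen only to illustrate the trade-off.
STARTUP_S, PER_FEATURE_S, PROCESSES = 2.0, 0.01, 4

many_small = parallel_seconds(10_000, 10, PROCESSES, STARTUP_S, PER_FEATURE_S)
few_large = parallel_seconds(10, 10_000, PROCESSES, STARTUP_S, PER_FEATURE_S)
baseline = serial_seconds(10_000, 10, PER_FEATURE_S)
```

Under these made-up numbers the single-process baseline is about 1,000 seconds. The 10,000 small groups come out at roughly 5,250 seconds in parallel, which is much worse, whereas the 10 large groups finish in roughly 306 seconds.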

Other System Resources
Of course, you also need to make sure other system resources such as memory are adequate for the task. Firing up eight processes to do heavy polygon dissolving when you have eight cores is fine, but if you only have 2 GB of memory then you’re just clogging up the machine and making everything slower.

I’m told that parallel processing is most efficient when the task is being offloaded elsewhere. For example if I have multiple requests to make via the HTTPFetcher, then it might be worth parallel processing. I could fire up as many processes as possible because it’s a tiny impact on my system resources.

Writing to Disk
When the task involves writing to disk then that becomes a bottleneck that I won’t solve by spawning multiple processes. My colleague Dmitri found that serializing point clouds suffers for that very reason, so that you hit the sweet spot with about 4 processes and increasing the number beyond that makes very little difference in final translation time; the extra processes are just competing for disk time.

Really – if you aren’t sure about what will work – I’d suggest trying a small subset of your data in multi-processing mode. It will help to give you an idea of whether it’s worth trying it on the full dataset.

Group-By
One final issue – although not related to performance – is that the group-by and process groups are tied together. You might not want this. For example, you may wish to process in 8 groups, each of which has its own group-by. The best way to do this is to use a custom transformer, because you can set a group-by on a transformer inside it, and set a “process group” on the custom transformer itself. I suspect that we will untangle these (i.e. turn them into two parameters) for 2013.
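Conceptually, the custom transformer approach gives you two levels of grouping: an outer key that decides which process a feature goes to, and an inner key that acts as the ordinary group-by within that process. A minimal Python sketch of that nesting follows; the attribute names `region` and `country` are hypothetical, and this only illustrates the idea, not FME's behaviour.

```python
from collections import defaultdict


def nest_groups(features, process_key, group_by):
    """Split features by process group first, then by group-by within each."""
    nested = defaultdict(lambda: defaultdict(list))
    for feature in features:
        nested[feature[process_key]][feature[group_by]].append(feature)
    # Convert to plain dicts so the result is easy to inspect.
    return {process: dict(inner) for process, inner in nested.items()}


features = [
    {"region": "Europe", "country": "France", "id": 1},
    {"region": "Europe", "country": "Spain", "id": 2},
    {"region": "Americas", "country": "Canada", "id": 3},
    {"region": "Europe", "country": "France", "id": 4},
]
nested = nest_groups(features, "region", "country")
```

Here there would be two process groups (Europe and Americas), and inside the Europe process the features are still grouped separately into France and Spain.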

Summary
So there you go. Parallel processing can be a very beneficial tool. When you have a high-spec machine and a CPU-intensive transformation to carry out, you can exploit multiple cores at once and improve performance. I know some users have translations that take days to run and I’m hoping this technique will reduce that significantly.

It’s still an experimental technology, but we’re crunching numbers to see exactly which transformations will benefit the most, and in which configuration. When we find that out, you’ll read it here first!

In the meantime why not download a 2012 beta and give it a try? At this time of year we are so close to a release candidate that it’s virtually the finished product now.

Thanks for following me in 2011 and I hope to see you in 2012. Season’s greetings to one and all and best wishes for the new year.

Regards