Software vendors worldwide are turning their enterprise web applications into Platform as a Service (PaaS) products to provide their customers with the benefits of cloud infrastructure. FME Cloud, Safe Software’s PaaS, enables our clients to take advantage of our enterprise data transformation solution, FME Server, without the costs of hardware.

This two-part blog series dives into the details behind FME Cloud’s development to give you a behind-the-scenes look into its delivery. I’ll talk about our process and give some insight into what methods and technologies have worked for us.

Read part two: Monitoring and maintaining the infrastructure.

The team

First, an overview of the resources and methodologies we used to deliver the platform. A team of three developers collaborated on the project from day one. We worked with relative autonomy, choosing technologies because they were the best fit for the job rather than because Safe had a historical tie to them. We were also given the freedom to create our own development process based on scrum and test-driven development, which enabled a continuous deployment approach. Looking back, this freedom was crucial to the success of the project.

Architecture at a glance

A key focus of the product is to allow our customers to leverage the benefits of the cloud without exposing them to its complexities. Our service essentially exposes key parts of the Amazon Web Services (AWS) platform in an intuitive way, while we automate and handle the more complex design decisions. Major features of the platform include:

The FME Cloud PaaS has a fairly typical architecture. A Rails application serves both the web application and the API, and it interacts with AWS via the AWS APIs. This allows users to launch and manage their instances, as well as handle payments and account management.

Overview of the FME Cloud architecture

Services at a glance

The bulk of the platform runs on the AWS cloud computing platform. Other key services include Braintree, which enables us to securely store customers’ credit cards and accept payments online. We will cover the other services as we progress through the architecture below.

The services we use to deliver the FME Cloud PaaS

Automated build process

Whenever a new build of FME Server (the software we are providing to customers as a service) passes the internal test suite, an automated build process ensures FME Cloud is updated with the new version within a couple of hours. In FME Cloud we use Amazon Machine Images (AMIs), which are snapshots of EC2 instances. An AMI contains all of the information required to launch an instance.

To build the AMI we need two components: the FME Server installer and the Chef cookbooks. The cookbooks are essentially scripts that configure the server, taking the AMI from a standard Ubuntu install to an optimized install of FME Server. Chef lets us make our infrastructure as versionable and testable as our application code. Without it, we would never be able to manage all of the different FME Server versions that we host.

As the diagram below shows, either a new Chef version or a new FME Server version can trigger the build process. The workflow is triggered by messages sent to the AMIBuilder SQS queue. The AMIBuilder instance is launched by an auto-scaling group that responds to the SQS queue, and this instance runs a series of scripts that build and test the AMI. Once the AMI is built and all of the Chef Minitests pass, we alert the Rails application, which deploys the AMI to all of the AWS regions, ready to be used by clients.
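To make the trigger flow concrete, here is a minimal sketch in Python. It is not our production code (the FME Cloud application itself is a Rails app), and everything named in it is an assumption made for illustration: the message fields `source` and `version`, the function names, and the in-memory queue that stands in for SQS.

```python
import json
from collections import deque

# Hypothetical source names; the only fact taken from the text is that a new
# Chef version or a new FME Server version can trigger a build.
VALID_SOURCES = {"chef", "fme_server"}


def make_trigger(source, version):
    """Serialize a build trigger as a JSON message body (assumed format)."""
    if source not in VALID_SOURCES:
        raise ValueError(f"unknown trigger source: {source}")
    return json.dumps({"source": source, "version": version})


def run_builder(queue, build_ami, run_tests, notify_app):
    """Drain the queue: build one AMI per message, test it, then notify.

    build_ami, run_tests and notify_app are injected callables standing in
    for the build scripts, the Chef Minitests and the alert sent to the
    Rails application. Returns the AMI ids that passed their tests.
    """
    released = []
    while queue:
        trigger = json.loads(queue.popleft())
        ami_id = build_ami(trigger)
        if run_tests(ami_id):
            notify_app(ami_id)  # only tested AMIs are announced
            released.append(ami_id)
    return released


if __name__ == "__main__":
    # Placeholder version strings, not real release numbers.
    demo_queue = deque([make_trigger("chef", "v1"),
                        make_trigger("fme_server", "v2")])
    print(run_builder(
        demo_queue,
        build_ami=lambda t: f"ami-{t['source']}-{t['version']}",
        run_tests=lambda ami_id: True,
        notify_app=lambda ami_id: None,
    ))
```

Passing the build, test and notify steps in as callables keeps the sketch runnable without AWS credentials. In a real deployment those steps would be the build scripts, the Minitest run and a call to the Rails application.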

Diagram showing the automated build process we use to create FME Cloud instances

This architecture was designed a few years ago, and we would likely do things slightly differently now, with Lambda integrated with Simple Workflow Service taking over the code that currently runs on the instances. Moving forward, the entire build process will also be simplified as we switch to Docker.

Deployment of web application

The web application (that is, the Rails application that serves the FME Cloud dashboard and API) also runs on AWS. We use a customized version of AWS OpsWorks to deploy the application from GitHub, and Semaphore for continuous integration testing, which allows us to deploy quickly and with confidence.

Now to dive into the architecture in a bit more detail. We push all of our compiled assets (JavaScript, images and CSS files) to S3 and serve them with CloudFront. Using S3 in conjunction with CloudFront is critical: CloudFront caches files at edge locations around the world, so when a user requests a file, the request is routed to the closest location.

We use an ELB with the OpsWorks failover procedure configured. This means that if an instance in the Rails layer fails, a new instance is brought up behind the ELB. This works because the Rails layer is stateless.

Opsworks deployment of application servers.

Instance launch process

The launch process is a crucial part of the PaaS, as it replaces the work of installing and configuring the equivalent software on-premises. On FME Cloud, a launch takes about 6 minutes.

The instance launch process on FME Cloud is managed by CloudFormation, which allows you to provision a group of related AWS resources. When you request to launch an instance, either via the dashboard or the API, the AWS Ruby SDK launches a CloudFormation stack with a pre-configured CloudFormation template and custom parameters. AWS handles the order in which resources are provisioned, and if one component of the launch fails, all resources are rolled back.
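As an illustration of what such a request might look like, here is a minimal Python sketch that assembles the arguments for a CloudFormation CreateStack call. The key names (`StackName`, `TemplateURL`, `Parameters`, `Capabilities`, `OnFailure`) come from the CreateStack API. The function name, the stack name, the template URL and the parameter names are hypothetical, and our actual application makes this call through the Ruby SDK.

```python
def build_stack_request(stack_name, template_url, params):
    """Build keyword arguments for a CloudFormation CreateStack call.

    params is a plain dict of template parameters. CloudFormation expects
    them as a list of ParameterKey/ParameterValue pairs with string values.
    """
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        "Parameters": [
            {"ParameterKey": key, "ParameterValue": str(value)}
            for key, value in sorted(params.items())
        ],
        # Required when a template creates IAM resources.
        "Capabilities": ["CAPABILITY_IAM"],
        # Roll back every resource if any part of the launch fails.
        "OnFailure": "ROLLBACK",
    }


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    request = build_stack_request(
        "example-instance",
        "https://example.com/instance-template.json",
        {"InstanceType": "m3.medium", "DiskSizeGb": 50},
    )
    print(request)
```

With boto3, a dictionary like this could be passed as `create_stack(**request)` on a CloudFormation client. Building the request separately from sending it keeps the parameter handling easy to test.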

CloudFormation launches the EC2 instance from the AMI, creates IAM roles and security groups, creates DNS entries in Route 53 and creates an SQS queue for the instance. Bash code is also injected into the instance metadata; it runs on instance launch and bootstraps the instance. The bash code pulls down the Chef recipes, which configure the server by downloading a license and an SSL certificate and by configuring Linux and FME Server for the hostname that was assigned on launch. Once this is finished, Chef Minitests run as a final check that all components of FME Server are running as expected. If an issue is identified at any point, a failure action can be sent to CloudFormation to manually trigger a rollback. On FME Cloud, three CloudFormation launches are tried before the launch is abandoned and the user is alerted.
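The "three attempts, then give up" rule can be sketched in a few lines of Python. This is an illustration under assumptions rather than our production code: `launch_stack` and `alert_user` are hypothetical callables standing in for the CloudFormation launch and the user notification.

```python
class LaunchAbandoned(Exception):
    """Raised when every launch attempt has failed."""


def launch_with_retries(launch_stack, alert_user, max_attempts=3):
    """Try a stack launch up to max_attempts times.

    launch_stack receives the attempt number and returns True when the
    stack launched and its checks passed, or False when it was rolled back.
    alert_user is called once if all attempts fail.
    Returns the number of the attempt that succeeded.
    """
    for attempt in range(1, max_attempts + 1):
        if launch_stack(attempt):
            return attempt
    alert_user(f"launch abandoned after {max_attempts} attempts")
    raise LaunchAbandoned(max_attempts)
```

Because a failed stack is rolled back in full, each attempt starts from a clean slate, which is what makes a simple loop like this safe.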

CloudFormation launch process

Instance configuration

Within the FME Cloud dashboard and via the API, the user can perform various actions against the instance (for example, increasing the disk size, changing the instance type, and creating backups). These actions are very powerful for the user, as switching out hardware in a matter of minutes is not possible with an on-premises deployment of FME Server.

A request from the user goes to the Rails application, which uses the AWS API to trigger the actions against the instance. Initially we made synchronous calls directly to the API, but we quickly realized this was not going to work: API calls failed frequently, which created a poor user experience.

We switched to using a state machine to handle transitions between each API call and a queue/worker system, delayed_job (DJ), to perform the API calls. This allows failed API calls to be retried without halting the entire workflow, enabling us to reliably chain together multiple API calls.
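The idea can be sketched in Python as a tiny state machine whose steps are run as queued jobs. The state names, the `Workflow` class and the in-memory queue are all hypothetical, chosen only to show how a failed step can be retried without repeating the steps before it.

```python
from collections import deque

# Hypothetical states; the real workflow's states are not listed here.
TRANSITIONS = {
    "requested": "snapshotting",
    "snapshotting": "resizing",
    "resizing": "done",
}


class Workflow:
    """A tiny state machine where each state is backed by one API call."""

    def __init__(self, api_calls, max_attempts=3):
        self.state = "requested"
        self.api_calls = api_calls        # state name -> callable
        self.max_attempts = max_attempts
        self.jobs = deque()               # stands in for the job queue

    def start(self):
        """Queue the first job for the initial state."""
        self.jobs.append((self.state, 1))

    def work(self):
        """Run queued jobs until the workflow finishes or a step gives up."""
        while self.jobs:
            state, attempt = self.jobs.popleft()
            try:
                self.api_calls[state]()
            except Exception:
                if attempt >= self.max_attempts:
                    self.state = "failed"
                    return self.state
                # Retry only this step; earlier steps are not repeated.
                self.jobs.append((state, attempt + 1))
                continue
            self.state = TRANSITIONS[state]
            if self.state != "done":
                self.jobs.append((self.state, 1))
        return self.state
```

Each state owns exactly one call, so a retry never has to reason about partially completed work from another state.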

The diagram below shows a segment of our state machine for the disk resize workflow. Each state has one API call to AWS, or a logical grouping of API calls if that makes more sense. The process differs slightly depending on the type of request, but the state machine normally puts a job into the delayed_job queue. When the job runs, it makes the call against the AWS API. If the call fails, it is retried using an exponential backoff algorithm.

Once the API call to trigger an action has been sent, we need to find out whether the action succeeded. Many AWS resources can actively push state back to our stack, which makes things easy. If an AWS resource (such as a snapshot) does not have this functionality, we use a scheduler to poll all resources in a transition state and notify the state machine whenever the state changes.
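Both mechanisms, retrying with exponential backoff and polling resources that cannot push their state, can be sketched briefly in Python. The function names, the default delays and the resource ids are assumptions for illustration, and delayed_job's own retry schedule is not reproduced here.

```python
import time


def backoff_delays(base=1.0, factor=2.0, max_attempts=5):
    """Return the wait before each retry: base, base*factor, and so on."""
    return [base * factor ** i for i in range(max_attempts - 1)]


def call_with_backoff(api_call, base=1.0, factor=2.0, max_attempts=5,
                      sleep=time.sleep):
    """Call api_call, retrying failures with exponentially growing waits.

    sleep is injectable so the function can be exercised without waiting.
    The last failure is re-raised once max_attempts is reached.
    """
    delays = backoff_delays(base, factor, max_attempts)
    for attempt in range(max_attempts):
        try:
            return api_call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(delays[attempt])


def poll_transitions(resources, fetch_state, notify):
    """One scheduler tick for resources that cannot push their state.

    resources maps a resource id to its last known state. fetch_state looks
    up the current state, and notify is called for every change.
    """
    for resource_id, known in list(resources.items()):
        current = fetch_state(resource_id)
        if current != known:
            resources[resource_id] = current
            notify(resource_id, current)
```

A scheduler would call `poll_transitions` at a fixed interval and pass a `notify` callback that advances the state machine.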

Workflow that is initiated when a user resizes a disk

This should offer some insight into how FME Cloud is architected and the services we use to build it. Next time, we will look at the automation behind our billing system, security, and maintenance.

Stewart Harper

Stewart is the Technical Director of Cloud Applications and Infrastructure at Safe. When he isn’t building location-based tools for the web, he’s probably skiing or mountain biking.
