In last week’s blog post, we looked at how FME Cloud was architected and the services we used to build it. In this post, we will look at how the automated billing works and the automation we have in place to help us secure and maintain our infrastructure. I’ll share some tips based on what we have learned.
Automated billing models for the cloud
FME Cloud has two billing models: pay for an instance hourly, or save and pay annually.
Because FME Cloud is a data integration platform, users can store large volumes of data and move it in and out of their instances. Since AWS charges us for both data transfer and data storage, FME Cloud passes these costs on to the client. Every month, FME Cloud calculates the costs, automatically generates an invoice, and charges the customer's credit card.
Two monthly components are required to generate the invoice:
Number of hours each instance ran
To calculate this, FME Cloud simply keeps a log of when each instance was started, stopped, and terminated.
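As a sketch, the hour counting might look like the following, assuming the log is an ordered list of `[timestamp, event]` pairs. The function name and the round-up rule are illustrative, not the production code:

```ruby
require "time"

# Sum billable hours for one instance from its start/stop event log.
# Events are assumed chronological; a partial hour is rounded up,
# matching typical hourly billing.
def billable_hours(events, period_end = Time.now.utc)
  hours = 0.0
  running_since = nil
  events.each do |time, event|
    case event
    when :started
      running_since = time
    when :stopped, :terminated
      hours += (time - running_since) / 3600.0 if running_since
      running_since = nil
    end
  end
  # Instance still running when the billing period closes
  hours += (period_end - running_since) / 3600.0 if running_since
  hours.ceil
end
```

For example, an instance started at midnight and stopped at 02:30 would be billed for three hours under this rounding rule.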
AWS charges for each instance
FME Cloud uses the cost allocation report provided by AWS to identify this. This is a CSV file, delivered to an S3 bucket you configure, that lists every cost you have incurred on AWS that month. When you create a resource within AWS, you can apply custom tags; we tag each resource with a unique ID called the stack name. The stack name then appears in the cost allocation report, so it is essentially used as a primary key. These costs are then combined with the hourly charges when the invoice is created.
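A minimal sketch of that tag-based grouping, using Ruby's standard CSV library. The column names (`user:StackName`, `TotalCost`) are assumptions for illustration; the real report's headers should be verified before relying on them:

```ruby
require "csv"

# Group line-item costs by the custom stack-name tag.
# Rows without a stack name (untagged resources) are skipped here;
# in practice those should be reconciled separately.
def costs_by_stack(csv_text)
  totals = Hash.new(0.0)
  CSV.parse(csv_text, headers: true) do |row|
    stack = row["user:StackName"]
    next if stack.nil? || stack.empty?
    totals[stack] += row["TotalCost"].to_f
  end
  totals
end
```

The resulting hash, keyed by stack name, is what gets joined with the per-instance hour counts at invoice time.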
Customers can also see a real-time view of their costs. We added this early on so customers could gain confidence in the pricing model without committing to a full month. It is calculated using the same logic that generates the monthly invoice: instance hours are tallied hourly, and every six hours a script runs to import the AWS costs. This updates the database table that the web application and API read.
Charging a credit card is a relatively simple task with Braintree’s intuitive API. The more complex process for us is integrating the revenue and losses into our other accounting systems. Every month we do a bulk export into Salesforce to sync our systems. We also created an admin dashboard for the sales and accounting teams to handle the payments and check invoices.
Tips for automating billing:
- Assume things can change in the cost allocation report. It is a good idea to have checks that alert you when your parsing script encounters usage types it doesn't know how to handle. AWS billing changes over time (new regions, new usage types, etc.), so the script should alert you rather than silently ignore unknown records.
- Reserved instances are worth leveraging to lower EC2 costs. They are, however, a pain to manage, and if you have more than a few instances you will need to use a service. We use Cloudability.
- Pricing gets complicated when you move into additional AWS regions, as each region has its own prices. We decided to absorb the differences to keep things simple for customers.
- Because reserved instances are not assigned to a specific instance, you can't rely on the cost allocation report alone to charge for EC2 usage. When usage is covered by a reserved instance, the cost comes through as zero in the report.
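The usage-type check from the first tip above might be sketched like this. The prefix list and the `UsageType` column name are assumptions, not the exact report schema:

```ruby
# Usage-type prefixes the parser knows how to handle (illustrative only).
KNOWN_USAGE_PREFIXES = ["BoxUsage", "EBS:VolumeUsage", "DataTransfer-Out-Bytes"].freeze

# Return the usage types in the report that have no handler, so an
# alert can be raised instead of charges being silently dropped.
def unknown_usage_types(rows)
  rows.map { |row| row["UsageType"] }.compact.uniq.reject do |usage|
    KNOWN_USAGE_PREFIXES.any? { |prefix| usage.start_with?(prefix) }
  end
end
```

Wiring the result into whatever alerting channel you already use (email, PagerDuty, etc.) means a new AWS usage pattern surfaces as a notification rather than a billing discrepancy.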
PaaS instance management
Monitoring, Alerts, and Notifications
Because FME Cloud is a PaaS and we provide the customer with an instance, responsibility for keeping the instance running is shared between the customer and ourselves. If there is a hardware failure, we need to step in and recover the instance. However, if the user fills the disk or overloads the instance, they have the tools to remedy the situation themselves.
Monitoring is at the heart of things here. collectd runs on each customer instance and sends a continuous stream of metrics to Librato, which aggregates the data and lets alerts be configured. Whenever an instance is launched, default alerts are configured in Librato to monitor disk space and response time. As you can see from the diagram below, a notification framework in the Rails application allows users to monitor and configure the Librato notifications through a friendly interface. This is largely what our iPaaS is about: taking away the complexity of running an instance in the cloud by exposing only what the customer needs to see and automating the rest.
- Many great “turnkey” services exist to fully monitor servers (Scout, Datadog, New Relic). Use them if you only need to monitor a handful of static servers; only consider rolling your own solution if the price tag grows too high or you genuinely need a more customized workflow.
- Choose a service that is well documented, integrates with the technologies you use, and has a solid API.
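The default alerts created at launch can be modelled as plain data that the launch workflow then pushes to the monitoring service. The metric names and thresholds below are purely illustrative, not our production values:

```ruby
# Build the default alert definitions for a newly launched instance.
# Each hash is later turned into an alert in the monitoring service;
# metric names and thresholds here are placeholders.
def default_alerts(stack_name)
  [
    { name: "#{stack_name}.disk_space_low",
      metric: "df.percent_bytes.free",
      condition: "<", threshold: 10 },
    { name: "#{stack_name}.slow_response",
      metric: "http.response_time_ms",
      condition: ">", threshold: 5000 },
  ]
end
```

Keeping the definitions as data makes it easy for the Rails notification framework to show and edit them without customers ever touching the monitoring service directly.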
We decided not to allow users to SSH into the instances they launch; we felt it would otherwise have been impossible to maintain and support instances with customer modifications. After a while, we realised that in some scenarios we would need to access an instance to perform actions such as triggering a snapshot, patching the operating system, and updating licenses. Because of the way we had secured our instances, we still didn't want to SSH into them ourselves, so we set up the following architecture with SQS queues.
If the Rails app wants to send a message to an instance (to patch the operating system, for example), it publishes the message to an SNS topic, which forwards it on to the instance's SQS queue. Each instance has its own SQS queue, created by the CloudFormation script at launch. A Ruby script scheduled to run periodically on the instance checks the SQS queue for messages. If there is a message, the Ruby script pulls it down and runs the script embedded in the message. If the Rails application needs to be notified after the script has run, the instance can post to an outgoing queue that the Rails application polls.
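A simplified stand-in for that on-instance worker loop is below. A real deployment would use the AWS SDK's SQS client; here `queue` is any object responding to `receive` and `delete`, and messages carry a named action rather than an embedded script, which is a deliberate simplification:

```ruby
# Whitelisted actions the worker knows how to run (illustrative).
ACTIONS = {
  "snapshot" => -> { "triggering EBS snapshot" },
  "os_patch" => -> { "applying OS updates" },
}.freeze

# Drain the queue: run each message's action, then acknowledge
# (delete) the message so it is not redelivered.
def process_queue(queue)
  handled = []
  while (message = queue.receive)
    handler = ACTIONS[message[:body]]
    handler.call if handler
    queue.delete(message)
    handled << message[:body]
  end
  handled
end
```

Deleting the message only after handling it mirrors SQS semantics: an unacknowledged message reappears after its visibility timeout, so a crashed worker doesn't lose commands.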
Maximizing cloud security
When clients transition from running their infrastructure on premises to running it in the cloud, one of the first things many want to discuss is security. Security is a complex problem, but several things we have done have worked well to help customers gain confidence in the platform.
- Leverage existing services that are compliant (e.g. Braintree is PCI compliant so we don’t have to worry about handling credit cards, and AWS complies with almost everything you can comply with).
- Leverage the security material provided by the cloud provider you are using. AWS provides a large amount of material, so we will occasionally reference this for customers.
- Create your own security whitepaper that you can give to customers, walking through what you have in place.
- Ensure you understand how the security access control works for your cloud provider (i.e. IAM). If you can correctly leverage the security that the cloud provider offers, that is a big first step towards securing your architecture.
- Find a security company to partner with. You need a CISSP-certified security professional who can help you practice security-first design, perform vulnerability and network security tests, and review your architecture. Doing this early on gets your developers thinking with a security mindset.
- Use network security and vulnerability management tools like Qualys that continuously scan your architecture for vulnerabilities. Threats are constantly evolving, so having these continuous tests in place is key.
We are obviously big on automation here at Safe. For FME Cloud, we have spent a lot of effort designing automated billing, maintenance, and security processes. The above tips should provide some insight into what works for us and what we have learned. To read about the architecture and services involved, check out the first FME Cloud “behind the scenes” blog post.
Stewart Harper is the Technical Director of Cloud Applications and Infrastructure at Safe. When he isn’t building location-based tools for the web, he’s probably skiing or mountain biking.