FME Cloud Support Policy

The FME Cloud support plans explicitly define what FME Cloud customers can expect from Safe Software when they run services on FME Cloud. By default, all customers are on the Included support plan. If you want additional support, you need to purchase a Managed support plan from an FME Cloud Managed Service Provider (also called an “MSP Partner”).

The main FME Cloud pricing page outlines the basic differences between the Included support plan and the Managed support plan.


Support Channels

There are three primary ways for customers to get support: checking the FME Cloud Status page, getting help from the community, or submitting a private support case.

FME Cloud Status

The Safe Software status page contains information on current production status and will be updated during an outage. After an outage, a root-cause analysis is performed and made available via the page.

Please check the status page before submitting a case. If it contains a notification about an outage, our team is already working to restore service as quickly as possible, and the page will be updated once the issue is resolved. You can use the subscribe button on the page to receive notifications of any changes.

Community Forums

Safe’s Knowledge Base contains a wealth of information in the form of articles, tutorials, demos, and FAQs. If you can’t find what you are looking for there, you can post to our Q&A Forum or to our Ideas Exchange. A question or idea posted in either place receives the same priority as one submitted privately, and may even receive a faster response, since members of the community are online 24/7. Everything posted in our community is reviewed by at least one Safer, and many questions receive responses from multiple individuals. These resources are available to all customers.

Support Center

You can ask a question via live chat or through our Report a Problem form. Support is available only during certain hours of coverage, and response times may vary depending on the volume and complexity of requests already in the queue. If you have a Managed support plan with an MSP Partner, your contract with them defines how to contact them and what response times to expect.


Scope of Support

Helpdesk Support

FME Cloud Helpdesk support requests cover development and production issues on FME Cloud. If you have a support contract with an MSP Partner, they will provide first-line Helpdesk Support. Helpdesk support is limited to:

Helpdesk Support does not include:

Advanced Support

Advanced support is offered by our MSP Partners. Advanced support builds on the Included level of support offered via Helpdesk Support. The specifics of the offering depend on the terms agreed to between you and your MSP Partner. Some examples of the expertise our partners offer:


Service Priorities

We treat all services FME Cloud provides as mission-critical. When prioritizing an issue, we look at two things: its severity level and the type of customer impacted. We have identified three severity levels: Critical, Urgent, and Normal, and we always work on the highest-severity issues first. If multiple customers are affected by the same issue, they are prioritized by support plan and then by instance type, as follows:

Severity Levels

Critical

These issues affect the foundational components of FME Cloud and mean your production workflows will be unavailable.

Examples:

Urgent

These issues affect major functionality of FME Cloud. Key production workflows and data integrity are not affected, but you may experience degraded performance.

Examples:

Normal

Normal priority can include technical questions, configuration issues, suggestions, and defects that affect a small number of users. Acceptable workarounds typically exist.

Examples:


Response Times

FME Cloud customers across all support plans can expect the following initial response times during their plan’s hours of coverage, with priority given to customers on Managed support plans:

Severity    Initial Response Time
--------    ---------------------
Critical    2 hours
Urgent      1 business day
Normal      3 business days

We strive to meet these target response times, but they are not guaranteed, and they do not indicate how long a final resolution may take. These targets also apply only during the hours of coverage defined in your support plan. (Note: Hours of coverage for Included support plans are 8am-5pm PST, excluding statutory holidays in BC, Canada. If you want support outside these hours, you need to purchase a Managed support plan from an MSP Partner.)
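As a rough illustration of the coverage-hours rule above, the sketch below checks whether a given moment falls inside Included-plan hours of coverage. It is a minimal sketch under stated assumptions, not a Safe Software tool: it treats "PST" as local Pacific time (the America/Vancouver zone), assumes coverage runs Monday to Friday, and expects the caller to supply the BC statutory holiday dates. The function name in_included_coverage is hypothetical.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo

# Assumption: "PST" in the policy refers to local Pacific time, so this uses the
# America/Vancouver zone, which switches to PDT in summer.
PACIFIC = ZoneInfo("America/Vancouver")
COVERAGE_START = time(8, 0)   # 8am
COVERAGE_END = time(17, 0)    # 5pm (exclusive)


def in_included_coverage(moment: datetime, bc_holidays: set) -> bool:
    """Return True if `moment` falls inside Included-plan hours of coverage.

    Assumptions not stated in the policy: coverage runs Monday to Friday only,
    and the caller supplies the set of BC statutory holiday dates.
    """
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    local = moment.astimezone(PACIFIC)
    if local.weekday() >= 5:              # 5 = Saturday, 6 = Sunday
        return False
    if local.date() in bc_holidays:
        return False
    return COVERAGE_START <= local.time() < COVERAGE_END


# Example: 17:00 UTC on Tuesday 2 July 2024 is 10:00 PDT in Vancouver.
holidays = {date(2024, 7, 1)}             # Canada Day, supplied by the caller
print(in_included_coverage(datetime(2024, 7, 2, 17, 0, tzinfo=timezone.utc), holidays))  # prints True
```

Converting to the Pacific zone before comparing keeps the check correct regardless of the time zone the request timestamp was recorded in.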

As defined in the FME Cloud Shared Responsibility Model, if an issue originates in the FME Server application tier, it is your responsibility to diagnose and manage it. If you require proactive monitoring of the FME Server application tier, you need to purchase a Managed support plan through an MSP Partner.

During a mass outage event, these response times apply only to customers on a Managed support plan. Our response time for customers on the Included support plan will depend on the nature and priority of the outage; more widespread outages will likely result in longer response times.
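The prioritization rules from Service Priorities (severity first, then support plan, then instance type), together with the Managed-plan priority noted in this section, amount to a lexicographic ordering. The following minimal sketch expresses that ordering as a sort key. The Ticket class, the rank tables, and the numeric instance_type_rank field are hypothetical; the actual instance-type order is not reproduced in this document, so the caller supplies it as a number.

```python
from dataclasses import dataclass

# Lower numbers sort first. The severity order comes from the policy text; the
# plan order reflects the priority given to customers on Managed support plans.
SEVERITY_RANK = {"Critical": 0, "Urgent": 1, "Normal": 2}
PLAN_RANK = {"Managed": 0, "Included": 1}


@dataclass
class Ticket:
    customer: str
    severity: str
    plan: str
    instance_type_rank: int  # hypothetical: caller-supplied rank for the instance type


def priority_key(t: Ticket) -> tuple[int, int, int]:
    # Tuples compare element by element: severity first, then plan, then instance type.
    return (SEVERITY_RANK[t.severity], PLAN_RANK[t.plan], t.instance_type_rank)


queue = [
    Ticket("a", "Normal", "Managed", 0),
    Ticket("b", "Critical", "Included", 1),
    Ticket("c", "Critical", "Managed", 2),
    Ticket("d", "Urgent", "Included", 0),
]
for t in sorted(queue, key=priority_key):
    print(t.customer, t.severity, t.plan)
```

Using a tuple as the sort key keeps each rule separate: a later element only breaks ties left by the earlier ones, so a Managed customer never jumps ahead of a higher-severity issue.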


Security Incidents

We take security very seriously at Safe Software, which you can read more about here. Continuous vulnerability scanning ensures new threats are identified quickly. When we identify a threat, we audit our infrastructure to determine what is affected, then assess the security risk and assign a severity level.

If a vulnerability is high risk, we immediately create a patch and, before applying it, email the emergency contact of each affected customer. For lower-severity issues, we prepare a patch and communicate the issue via in-app notifications on the FME Cloud dashboard. When deploying lower-severity patches, we aim to balance the security risk against keeping the impact on your production workflows minimal.

For high-profile issues, we post a debrief on our blog; for example, we published a post detailing our exposure to the Heartbleed vulnerability.


Incident Types

We have identified two types of incidents that could impact your level of service: outages and isolated issues.

We follow a specific process for each type of incident.

Outage Lifecycle

We classify outages into three types based on their severity and how widespread they are, and we follow a different process for each type.

Mass Outage: Significant parts of the infrastructure are down, and entire sets of customers are experiencing compromised performance. For example, FME Server instances may become unreachable because of a network issue, or a problem in the FME Cloud tier may prevent instances from being started or stopped.

Limited Outage: The issue is more limited than a mass outage in either its severity or the number of customers it affects. Examples include a degraded level of service in a specific data center that affects only 10% of customers, or a bug we introduced that affects all customers but does not compromise key production workflows.

Emerging Issue: A small number of customers have reported an issue (or our monitoring tools have detected one), usually an edge case that produces compromised results.

In all cases, we report the issue on the Safe Software Status page and keep it updated until the issue has been resolved. We recommend subscribing to email updates on that page, since it is not tied to your FME Cloud account.

During an outage, we may not meet our target response times, although we do our best to.

Isolated Issue

If we identify an issue with your specific FME Server instance that we are responsible for (see FME Cloud Shared Responsibility Model), we will contact you directly. To ensure we contact the correct person, we have created an Emergency Contact form on the FME Cloud account settings page. Please fill it out and keep it up to date. If you do not have a named contact on file, your level of service may be affected, because we sometimes need your permission to fix an issue.

Isolated Issue Lifecycle

  1. Safe's monitoring tools identify an issue with your instance that is causing a drop in the level of service. The monitoring tools are integrated with our incident management platform and automatically open an incident.
  2. A support engineer acknowledges the incident during the hours of coverage defined in your support plan.
  3. If the engineer can fix the issue without SSH access to the instance, they will do so and then email the emergency contact to explain the cause.
  4. If we need SSH access to the instance, we will contact the emergency contact (by phone and/or email) to request permission.
  5. Once permission is granted, we will work to resolve the issue as quickly as possible.
  6. Once the issue is resolved, we will contact the emergency contact to debrief them on the cause and outline the steps we have taken to prevent it from happening again.

Key Production Workflows

These are the pieces of functionality that we have identified as being key for production workflows.


FME Cloud Shared Responsibility Model

FME Cloud is a Platform as a Service (PaaS) made up of two components. The first is the dashboard/API, referred to herein as the FME Cloud tier. This is a multi-tenant application where FME Cloud customers sign up, launch and manage FME Server instances, and handle billing and account management.

The second component is the FME Server instances. These are where FME Cloud customers publish their workspaces and associated data. Each FME Server instance is a self-contained environment, isolated from other instances, and includes compute, storage, and database services.

Monitoring, securing and maintaining the FME Cloud tier is the sole responsibility of Safe Software.

For the FME Server instances, both the FME Cloud customer and Safe Software are responsible for supporting the instance to ensure a high level of uptime, under a shared responsibility model. As a customer, you can purchase a Managed support plan from an MSP Partner, who will then handle the customer-side responsibilities on your behalf.

Proactive Monitoring Of The FME Cloud Tier

The FME Cloud tier is monitored 24x7 by comprehensive automated systems. If any issue affects the health and operation of the infrastructure, core systems, or tools, our dedicated operations team is notified and will respond to diagnose and correct it. This 24x7 monitoring of the FME Cloud tier benefits all FME Cloud users.

FME Server Instances

Delivering a high level of uptime for a customer’s FME Server deployment on FME Cloud is slightly different from doing so in an on-premises data center. When FME Cloud customers move their FME Server deployment to the cloud, responsibility for ensuring a high level of uptime for their instance is split between the FME Cloud customer/MSP Partner and Safe Software. Safe Software is responsible for monitoring and maintaining everything from the operating system down to the hardware powering the instance, while the FME Cloud customer/MSP Partner is responsible for monitoring and maintaining the FME Server application. This shared responsibility model can reduce the FME Cloud customer’s operational burden in many ways.

Figure 1: FME Cloud Shared Support Responsibility Model

Safe Software Support Responsibilities

Safe Software is responsible for monitoring and responding if there is an issue with the operating system, hardware or network. We monitor the health and operation of all these components and will be alerted immediately if there is an issue.

Operating System: FME Server instances run on Ubuntu. Safe Software will fix any issues at the operating system (OS) level. Before gaining access to the instance, permission will be requested from the emergency contact on the account.

Hardware Failure: If there is an issue with the underlying hardware hosting the instance, Safe Software will be alerted and will work to either fix the issue or help the FME Cloud customer migrate to another instance if the damage is irreparable.

Networking: If there is a network issue that causes connectivity to the machine to degrade, then Safe Software will be alerted and will work to fix the issue. If it is a global outage that affects all customers, Safe Software will communicate the issue as defined in our support policy.

Customer/MSP Partner Support Responsibilities

FME Cloud is a Platform as a Service (PaaS) that allows FME Cloud customers to provision an instance with FME Server installed in minutes instead of weeks. Once an instance is provisioned, Safe Software has no access to it through the FME Server web interface or APIs. As a result, Safe Software cannot support FME Server application uptime, because we have no access to, and therefore no insight into, the FME Server workloads being run. The MSP Partner or FME Cloud customer is responsible for supporting this application tier.

Monitoring and Automated Alerts

To help the FME Cloud customers and MSP Partners manage this application tier, FME Cloud provides a suite of tools (in addition to those FME Server provides).

Disk Monitoring: If an FME Server instance runs out of disk space, it can cause a critical outage, because FME Server requires free disk space to function. FME Cloud customers/MSP Partners can monitor disk usage and define alerts that send a notification when the remaining disk space falls below a specified value.
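FME Cloud configures these alerts in the dashboard; the following sketch only illustrates the kind of check a low-disk alert performs, using Python's standard shutil.disk_usage. The function name disk_alert and the 5 GiB default threshold are hypothetical values chosen for illustration.

```python
import shutil

GIB = 1024 ** 3


def disk_alert(path: str = "/", min_free_gib: float = 5.0) -> bool:
    """Return True when free space on the filesystem holding `path` drops below the threshold.

    The 5 GiB default is a hypothetical value; pick one that suits your workloads.
    """
    usage = shutil.disk_usage(path)       # named tuple of (total, used, free) in bytes
    return usage.free / GIB < min_free_gib


free_gib = shutil.disk_usage("/").free / GIB
print(f"free: {free_gib:.1f} GiB, alert: {disk_alert('/')}")
```

Alerting on remaining space rather than on percentage used keeps the threshold meaningful across instances with different disk sizes.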

Memory Monitoring: If an FME Server instance is consistently running out of memory, then it can potentially cause a severe degradation of service. FME Cloud customers/MSP Partners can monitor memory and define alerts that will send a notification when the memory usage is above a certain value for a period of time.
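An alert of the form "above a certain value for a period of time" needs to look at more than a single reading. One way to sketch it, assuming memory usage is sampled at fixed intervals so that a window of N samples stands for a period of time, is a sliding window that fires only when every sample in the window exceeds the threshold. The class name, threshold, window length, and readings are hypothetical; this is not FME Cloud's alerting implementation.

```python
from collections import deque


class SustainedThresholdAlert:
    """Fires when every sample in a sliding window exceeds a threshold.

    Models a rule such as "memory usage above 90% for 3 consecutive samples".
    """

    def __init__(self, threshold: float, window: int):
        self.threshold = threshold
        self.samples = deque(maxlen=window)  # oldest sample drops off automatically

    def add(self, value: float) -> bool:
        self.samples.append(value)
        window_full = len(self.samples) == self.samples.maxlen
        # A single spike is ignored; only a fully elevated window triggers the alert.
        return window_full and all(v > self.threshold for v in self.samples)


alert = SustainedThresholdAlert(threshold=90.0, window=3)
readings = [85, 95, 96, 80, 92, 93, 97]   # hypothetical memory usage percentages
print([alert.add(r) for r in readings])   # only the last reading completes a window above 90
```

Requiring the whole window to be elevated is what distinguishes "consistently running out of memory" from a brief spike during a single large job.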

Web Server Responsiveness: If an FME Server instance is overloaded, or experiences connectivity issues, one of the best indicators of a potential critical outage is whether the FME Server web server is responsive. FME Cloud customers/MSP Partners can define alerts on the server response time and there is a special alert which triggers when the server is non-responsive. Notifications can then be configured to ensure the correct people are instantly made aware of the issue.
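A responsiveness alert essentially times a request to the web server and treats a missing response as the worst case. The sketch below, using only Python's standard urllib, shows one way to classify a single probe as ok, slow, HTTP error, or non-responsive. The function names, the 2-second slow threshold, and the 10-second timeout are hypothetical choices, not FME Cloud alert settings.

```python
import time
import urllib.error
import urllib.request
from typing import Optional


def classify(elapsed_s: Optional[float], slow_threshold_s: float) -> str:
    """Map a probe result to a status; None means no response was received."""
    if elapsed_s is None:
        return "non-responsive"
    return "slow" if elapsed_s > slow_threshold_s else "ok"


def probe(url: str, timeout_s: float = 10.0, slow_threshold_s: float = 2.0) -> str:
    """Time a single GET request against `url` and classify the result."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            resp.read(1)                  # wait for the first byte of the body
    except urllib.error.HTTPError:
        return "http-error"               # the server answered, but with a 4xx/5xx status
    except OSError:                       # URLError, timeouts and refused connections
        return classify(None, slow_threshold_s)
    return classify(time.monotonic() - start, slow_threshold_s)


# Usage (hypothetical host): print(probe("https://fme.example.com/"))
```

HTTPError is caught before OSError because it is a subclass of URLError (and therefore of OSError); otherwise an error page returned by a working server would be misreported as non-responsive.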

FME Server Load: If an instance is constantly overloaded, then it can cause a degradation in service as all services (engines, web server, database, etc.) share the same compute. For example, if an FME Engine hogs all of the CPU, then it can cause the database and web server to crash. If the load is consistently high, then the FME Cloud customer/MSP Partner may need to upgrade the instance type. FME Cloud customers/MSP Partners can monitor server load and define alerts that will send a notification when the load is above a certain value for a period of time.
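Unix load averages already smooth load over 1, 5, and 15 minutes, so comparing the 15-minute average to a threshold is one simple way to approximate "above a certain value for a period of time". The sketch below normalizes the load by CPU count so the same threshold works across instance types. The function names and the 1.0 per-CPU threshold are hypothetical, and os.getloadavg is only available on Unix-like systems.

```python
import os


def load_per_cpu(window: str = "15m") -> float:
    """Return the system load average for `window`, divided by the number of CPUs.

    os.getloadavg() is only available on Unix-like systems.
    """
    one, five, fifteen = os.getloadavg()
    averages = {"1m": one, "5m": five, "15m": fifteen}
    return averages[window] / (os.cpu_count() or 1)


def load_alert(per_cpu_load: float, threshold: float = 1.0) -> bool:
    # A per-CPU load above 1.0 roughly means more work is waiting than there are
    # cores to run it. The 1.0 default is a hypothetical threshold.
    return per_cpu_load > threshold


current = load_per_cpu("15m")
print(f"15-minute load per CPU: {current:.2f}, alert: {load_alert(current)}")
```

If an alert like this fires repeatedly, the policy's suggested remedy applies: the instance may need a larger instance type, since engines, web server, and database all share the same compute.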

Security Update Management

Ensuring the FME Cloud customer’s instance is secure is critical to ensuring a high level of uptime. If the operating system is not patched with the latest fixes, then the instance could be vulnerable to attack. FME Cloud provides automated security patching which allows FME Cloud customers/MSP Partners to ensure the instance is patched with a few clicks in the dashboard.