FME Cloud Support Policy
The new FME Cloud support plans explicitly define what FME Cloud customers can expect from Safe Software when they run services on FME Cloud. By default all customers are on the Included support plan. If you wish to receive additional support, then you need to purchase a Managed support plan from an FME Cloud Managed Service Provider (also called an “MSP Partner”).
The main FME Cloud pricing page outlines the basic differences between the Included support plan and the Managed support plan.
Support Channels
There are three primary ways for customers to get support: checking the FME Cloud Status page, getting help from the community, or submitting a private support case.
FME Cloud Status
The Safe Software status page contains information on current production status and will be updated during an outage. After an outage, a root-cause analysis is performed and made available via the page.
Please check the status page before submitting a case. If the status page contains a notification about an outage, our team will already be working on restoring service as quickly as possible. Once the issue is resolved, the status page will be updated to reflect that. There is a button on the page to subscribe to notifications of any changes. A root-cause analysis will be published to the status page once an investigation has been done.
Community Forums
Safe’s Knowledge Base contains a wealth of information in the form of articles, tutorials, demos, and FAQs. If you can’t find what you are looking for in the Knowledge Base then you can post to our Q&A Forum or to our Ideas Exchange. A question or idea posted in either place receives the same priority as one privately submitted, and in fact may receive a faster response since people from the community are online 24/7. Everything posted in our community is reviewed by at least one Safer, and many questions see responses from multiple individuals. These resources are available to all customers.
Support Center
You may ask a question via live chat or our “report a problem” form. Response times are limited to certain hours of coverage and may vary depending on the relative volume and complexity of requests already in the queue. If you have a Managed support plan with an MSP Partner, then they will have defined in the contract how to contact them and their response times.
Scope of Support
Helpdesk Support
FME Cloud Helpdesk support requests cover development and production issues on FME Cloud. If you have a support contract with an MSP Partner, then they will provide the first line of Helpdesk Support. Helpdesk support is limited to:
- Troubleshooting operational or systemic problems on both the FME Cloud tier and FME Server instances.
- Troubleshooting security concerns on both the FME Cloud tier and FME Server instances.
- Troubleshooting access issues to either the FME Cloud tier or FME Server instances.
- Proactive investigation into product regressions, deficiencies and security threats.
Helpdesk Support does not include:
- Proof of concepts
- Advice on leveraging third-party services that complement typical FME Cloud deployments
- Performing system administration tasks
Advanced Support
Advanced support is offered by our MSP Partners. Advanced support builds on the Included level of support offered via Helpdesk Support. The specifics of the offering depend on the terms agreed to between you and your MSP Partner. Some examples of the expertise our partners offer:
- Architectural Review - Review of your current architecture and advice on how to migrate to the cloud to take advantage of the many opportunities it presents.
- Security - Going beyond securing FME Cloud, and advising on leveraging other applications and data in the cloud so FME Cloud can be deployed as part of a secure architecture.
Service Priorities
We treat all services FME Cloud provides as mission-critical. There are two things we look at when prioritizing an issue: the severity level and the type of customer impacted. We have identified three severity levels: Critical, Urgent, and Normal. We always work on the highest severity issues first. If multiple customers are affected by the same issue, customers are prioritized based on their support plan and then the instance types as follows:
- Managed support plan
- Included support plan with Standard, Premium, Professional and Enterprise instance types.
We rely on this framework to help us triage issues internally and to set your expectations.
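As an illustration only, the ordering described above (severity first, then support plan) can be sketched as a sort key. The `Case` class, the rank tables, and `triage_order` are hypothetical names, not part of any Safe Software tooling. The sketch simplifies the policy by applying the plan tie-break to all cases of equal severity, and it does not model ordering by instance type.

```python
from dataclasses import dataclass

# Lower rank is handled first. Severity names come from the policy;
# the numeric ranks are an assumption made for this sketch.
SEVERITY_RANK = {"Critical": 0, "Urgent": 1, "Normal": 2}
PLAN_RANK = {"Managed": 0, "Included": 1}


@dataclass(frozen=True)
class Case:
    """Hypothetical support case record."""
    customer: str
    severity: str  # "Critical", "Urgent", or "Normal"
    plan: str      # "Managed" or "Included"


def triage_order(cases):
    """Return cases sorted by severity, then by support plan."""
    return sorted(
        cases,
        key=lambda c: (SEVERITY_RANK[c.severity], PLAN_RANK[c.plan]),
    )


if __name__ == "__main__":
    queue = [
        Case("a", "Normal", "Managed"),
        Case("b", "Critical", "Included"),
        Case("c", "Critical", "Managed"),
    ]
    for case in triage_order(queue):
        print(case.customer, case.severity, case.plan)
```

Because Python's `sorted` is stable, cases with the same severity and plan keep their arrival order.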
Severity Levels
Critical
These issues affect the foundational components of FME Cloud and mean your production workflows will be unavailable.
Examples:
- FME Server instances: Your instance is unresponsive and you are unable to access it.
- FME Server instances: Data integrity may be at stake.
- FME Cloud tier: You can’t pause or start your instances.
Urgent
These issues affect major functionality of FME Cloud. In urgent issues, key production workflows are not affected, but you may experience a degraded level of performance. Data integrity is also not impacted.
Examples:
- FME Server instances: Your FME Server is operational but certain functionality is degraded.
- FME Cloud tier: Non-critical functions such as resizing a disk are not available.
Normal
Normal priority can include technical questions, configuration issues, suggestions and defects that affect a small number of users. Typically acceptable workarounds exist.
Examples:
- FME Server instances: Advice on configuring FME Server workflows.
- FME Cloud tier: Report missing or erroneous documentation.
Response Times
FME Cloud customers across all support plans can expect the following initial response times during their plan’s hours of coverage, with priority given to customers on Managed support plans:
Severity | Initial Response Time |
---|---|
Critical | 2 hours |
Urgent | 1 business day |
Normal | 3 business days |
We strive to meet these target response times, but they are not guaranteed. These times do not indicate how long a final resolution may take. As well, these target response times are only applicable during the hours of coverage defined in your support plan. (Note: Hours of coverage for Included support plans are 8am-5pm PST, excluding statutory holidays in BC, Canada. If you wish to receive support outside of these hours, you need to purchase a Managed support plan from an MSP Partner.)
As defined in the FME Cloud Shared Responsibility Support Model, if the cause of the issue originates from the FME Server application tier, it is your responsibility to diagnose and manage the issue. If you require the FME Server application tier to be proactively monitored, then you need to purchase a Managed support plan through an MSP Partner.
During a mass outage event, this response time will apply exclusively to customers on a Managed support plan. Our response time for customers on the Included support plan will be contingent upon the nature and priority of the outage. Wider spread outages will likely result in longer response times.
Security Incidents
We take security very seriously at Safe Software, which you can read more about here. Continuous vulnerability scanning ensures new threats are identified quickly. On identifying a threat, we audit our infrastructure to see what is affected, and based on that, assess the security risk and assign a severity level.
If there is a vulnerability and it is high risk, we will immediately create a patch. Before patching, we will send an email out to the emergency contact of affected customers. If it is a lower severity issue, then we will prepare a patch and communicate the issue via the in-app notifications on the FME Cloud dashboard. When we deploy lower severity patches, we aim to strike a balance between risk and ensuring the impact on your production workflows is minimal.
If it is a high-profile issue, we will post a debrief on our blog; for example, we published a post detailing our exposure to the Heartbleed issue.
Incident Types
We have identified two types of incidents that could impact your level of service: outages and isolated issues.
- Outages: An outage potentially impacts more than one customer and instance.
- Isolated: This incident is specific to one FME Server instance and thus one customer.
We follow a specific process for each type of incident.
Outage Lifecycle
We have identified three levels of outage depending on its severity and how widespread it is. We follow a different process for each outage type.
Outage Type | Description |
---|---|
Mass Outage | Significant parts of the infrastructure are down: entire sets of customers are experiencing compromised performance. An example might be that FME Server instances become unreachable because of a network issue, or that there is a problem in the FME Cloud tier preventing instances from being started/stopped. |
Limited Outage | Compared to a mass outage, the issue must be limited in either the severity or number of people it affects. An example might be a degraded level of service in a specific data center which only affects 10% of customers. Another example is a bug that we introduced that affects all customers but does not compromise the key production workflows. |
Emerging Issue | We have received reports from a small number of customers (or our monitoring tools have picked it up), usually about edge-case issues with compromised results. |
In all cases, we report the issue on the Safe Software Status page. We continually update that status page until the issue has been resolved. It is recommended that you subscribe to receive email updates on this page as it is not tied to your FME Cloud account.
During an outage, our target response times may be compromised, although we do our best to meet them.
Isolated Issue
If we identify an issue related to your specific FME Server instance that we are responsible for (see FME Cloud Shared Responsibility Model), we will contact you directly. To ensure we contact the correct person, we have created an Emergency Contact form on the FME Cloud account settings page. Please fill this out and ensure it is kept up to date. If you do not have a named contact on file, it may impact your level of service as we sometimes need your permission to fix an issue.
Isolated Issue Lifecycle
- Safe's monitoring tools identify an issue with your instance that is causing a drop in the level of service. The monitoring tools are integrated with our incident management platform and will automatically trigger an issue.
- A support engineer will acknowledge the incident during the hours defined in your support plan.
- If the engineer can fix the incident without gaining SSH access, then we will. If successful, we will email the emergency contact to explain the cause.
- If we need to gain SSH access to the instance, we will contact the emergency contact to request permission (via phone and/or email).
- On gaining permission, we will work to resolve the issue as quickly as possible.
- Once the issue is resolved, we will contact the emergency contact to debrief them on the cause and outline the steps we have taken to prevent the issue happening again.
Key Production Workflows
These are the pieces of functionality that we have identified as being key for production workflows.
- API and web access to running FME Server instances
- Ability to pause an instance via the FME Cloud dashboard, a schedule, or the API.
- Ability to start an instance via the FME Cloud dashboard, a schedule or the API.
FME Cloud Shared Responsibility Model
FME Cloud is a Platform as a Service (PaaS). Two components comprise FME Cloud. The first component is the dashboard/API, herein referred to as the FME Cloud tier. This is a multi-tenant application where FME Cloud customers sign up, launch/manage FME Server instances, and conduct billing and account management.
The second component is the FME Server instances. These are where FME Cloud customers publish their workspaces and associated data. Each FME Server instance is a self-contained environment, isolated from other instances, and includes compute, storage, and database services.
Monitoring, securing and maintaining the FME Cloud tier is the sole responsibility of Safe Software.
For the FME Server instances, to ensure a high level of uptime, both the FME Cloud customer and Safe Software are responsible for supporting the instance—a shared responsibility model. As a customer, you can purchase a Managed support plan from an MSP Partner who will then handle the customer-side responsibilities on your behalf.
Proactive Monitoring Of The FME Cloud Tier
The FME Cloud tier is monitored 24×7 by comprehensive automated systems. In the event of any issue affecting the health and operation of the infrastructure, core systems, or tools, our dedicated operations team is notified and will respond to diagnose and correct any issues. This 24×7 monitoring of the FME Cloud tier benefits all FME Cloud users.
FME Server Instances
Delivering a high level of uptime for the customer’s FME Server deployment on FME Cloud differs slightly from doing so in an on-premises data centre. When the FME Cloud customer moves their FME Server deployment up to the cloud, the responsibility of ensuring a high level of uptime for their instance is split between the FME Cloud customer/MSP Partner and Safe Software. Safe Software is responsible for monitoring and maintaining the operating system down to the hardware powering the instance, and the FME Cloud customer/MSP Partner is responsible for monitoring and maintaining the FME Server application. This shared responsibility model can reduce the FME Cloud customer’s operational burden in many ways.
Safe Software Support Responsibilities
Safe Software is responsible for monitoring and responding if there is an issue with the operating system, hardware or network. We monitor the health and operation of all these components and will be alerted immediately if there is an issue.
Operating System: FME Server instances run on Ubuntu. Safe Software will fix any issues at the operating system (OS) level. Before gaining access to the instance, permission will be requested from the emergency contact on the account.
Hardware Failure: If there is an issue with the underlying hardware hosting the instance, Safe Software will be alerted and will work to either fix the issue or help the FME Cloud customer migrate to another instance if the damage is irreparable.
Networking: If there is a network issue that causes connectivity to the machine to degrade, then Safe Software will be alerted and will work to fix the issue. If it is a global outage that affects all customers, Safe Software will communicate the issue as defined in our support policy.
Customer/MSP Partner Support Responsibilities
FME Cloud is a Platform as a Service (PaaS), allowing the FME Cloud customer to provision an instance with FME Server installed in minutes instead of weeks. Once the instance is provisioned, Safe Software has no ability to access it through the FME Server web interface or APIs. This means it is impossible for Safe Software to support the FME Server application uptime, as we have no access to, and thus no insight into, the FME Server workloads being run. It is this application tier that the MSP Partner or FME Cloud customer is responsible for supporting.
Monitoring and Automated Alerts
To help the FME Cloud customers and MSP Partners manage this application tier, FME Cloud provides a suite of tools (in addition to those FME Server provides).
Disk Monitoring: If an FME Server instance runs out of disk space, then it can cause a critical outage as FME Server requires free disk to function. FME Cloud customers/MSP Partners can monitor disk usage and define alerts that will send a notification when the amount of remaining disk goes below a certain value.
Memory Monitoring: If an FME Server instance is consistently running out of memory, then it can potentially cause a severe degradation of service. FME Cloud customers/MSP Partners can monitor memory and define alerts that will send a notification when the memory usage is above a certain value for a period of time.
Web Server Responsiveness: If an FME Server instance is overloaded, or experiences connectivity issues, one of the best indicators of a potential critical outage is whether the FME Server web server is responsive. FME Cloud customers/MSP Partners can define alerts on the server response time and there is a special alert which triggers when the server is non-responsive. Notifications can then be configured to ensure the correct people are instantly made aware of the issue.
FME Server Load: If an instance is constantly overloaded, then it can cause a degradation in service as all services (engines, web server, database, etc.) share the same compute. For example, if an FME Engine hogs all of the CPU, then it can cause the database and web server to crash. If the load is consistently high, then the FME Cloud customer/MSP Partner may need to upgrade the instance type. FME Cloud customers/MSP Partners can monitor server load and define alerts that will send a notification when the load is above a certain value for a period of time.
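The memory and load alerts above fire only when a value stays above a threshold "for a period of time". The following is a minimal sketch of that idea, not FME Cloud's actual alerting implementation. The function name `sustained_breach`, the sample format, and the example numbers are all assumptions made for illustration.

```python
def sustained_breach(samples, threshold, min_duration_s):
    """Return True if the metric stayed above `threshold` continuously
    for at least `min_duration_s` seconds.

    `samples` is an iterable of (timestamp_seconds, value) pairs in
    ascending time order (a hypothetical format for this sketch).
    """
    breach_start = None  # timestamp of the first sample in the current breach
    for timestamp, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = timestamp
            if timestamp - breach_start >= min_duration_s:
                return True
        else:
            # Metric recovered, so the breach window starts over.
            breach_start = None
    return False


if __name__ == "__main__":
    # Hypothetical memory usage samples (percent), one per minute.
    memory = [(0, 50), (60, 95), (120, 96), (180, 97), (240, 40)]
    print(sustained_breach(memory, threshold=90, min_duration_s=120))
    print(sustained_breach(memory, threshold=90, min_duration_s=180))
```

Requiring a sustained breach rather than a single high sample avoids notifications for short spikes, at the cost of a delay equal to the chosen duration.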
Security Update Management
Ensuring the FME Cloud customer’s instance is secure is critical to ensuring a high level of uptime. If the operating system is not patched with the latest fixes, then the instance could be vulnerable to attack. FME Cloud provides automated security patching which allows FME Cloud customers/MSP Partners to ensure the instance is patched with a few clicks in the dashboard.