In the first two posts in this series, we looked at what APIs are and how to use them to move data in the cloud. In this third post co-authored with one of our data experts, De Wet van Niekerk, we’re going to dive into the details of migrating data via APIs through one of our own system migration examples. Part 4 details how and why to build your own API.
At Safe, we are always looking for ways to improve our customer service, and we recently upgraded our user community to provide a new Knowledge Center where FME users can ask questions, access resources, and submit ideas straight to our developers all in the same place.
Our old system used a combination of Salesforce and Trello, and APIs made it possible for us to move all of our data to our new AnswerHub community — without losing a thing.
Understanding Requirements and Challenges
Migrating data between services can be a complex process. And one of the most challenging parts is understanding the data models to construct an accurate mapping. But the good news is once that’s finished, flexible data transformation tools, like FME, make the actual migration very straightforward.
Our migration involved three main tasks:
- Moving over 1,500 knowledge articles and 3,000 Q&As from Salesforce.
- Mapping ideas submitted by customers on Trello boards to the Ideas component in AnswerHub.
- Making it possible for users to sign in to the whole site with one login, using Auth0 for site-wide authentication. To achieve this, user information needed to be loaded into both AnswerHub and Auth0 so that sign-in would work automatically.
General Data Migration Challenges
1) Connecting to the APIs — Authentication
To access services, you first need to determine the authentication mechanism: token, OAuth2 or maybe HTTP Basic. Each service usually interprets the standard slightly differently. For example, with token-based authentication, does the token go in the query string or in the header?
The complexities around authentication, especially if the service is using OAuth2, make it one of the biggest barriers for working with web services. But thankfully FME supports OAuth2, token and basic authentication — so once we worked out the intricacies of authentication for the three services, we were able to set and forget in our FME data transformation workflows.
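To make the query-string-versus-header question concrete, here is a minimal Python sketch of the two placements. It only builds the requests and sends nothing. The URL, the `token` parameter name and the `Bearer` scheme are assumptions for illustration; each service documents its own convention.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import Request


def with_header_token(url: str, token: str) -> Request:
    """Attach the token as an Authorization header (Bearer scheme assumed)."""
    return Request(url, headers={"Authorization": f"Bearer {token}"})


def with_query_token(url: str, token: str, param: str = "token") -> Request:
    """Append the token to the query string under a hypothetical parameter name."""
    parts = urlsplit(url)
    # Keep any query parameters that are already on the URL.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, token))
    return Request(urlunsplit(parts._replace(query=urlencode(query))))
```

Either request object can then be passed to `urllib.request.urlopen`. Keeping the placement in one small function means the rest of a script does not need to know which style a given service expects.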
2) Creating a Repeatable Migration Process
A major difference between loading data via API calls and a direct read-and-write method is that the loading process can easily become a multi-phase process. One piece of data can be loaded and the resulting object, now immediately available through the API, can be used in the next phase of the migration. This does require a bit of a shift in approach; creating a repeatable migration process is more about defining a set of steps than about mapping out an exact target dataset.
For example, once a Q&A question was loaded into AnswerHub, its URL was returned in the response header. From here, we could extract the URL and use it to post comments and answers to the question.
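The two-phase pattern described above can be sketched in Python as follows. This is not the FME workflow we used and not AnswerHub's actual API: the `/question` and `/answer` paths, the payload fields and the `Location` header name are all assumptions, and the HTTP call is passed in as a function so the sketch runs without a network.

```python
from typing import Callable, Dict, List

# (url, json_body) -> response headers. Injected so any HTTP client can be used.
PostFn = Callable[[str, dict], Dict[str, str]]


def load_question(post: PostFn, base_url: str, question: dict, answers: List[dict]) -> str:
    """Create a question, then post its answers to the URL the service returns."""
    # Phase 1: create the question and read its new URL from the response header.
    # The "Location" header name is an assumption.
    headers = post(f"{base_url}/question",
                   {"title": question["title"], "body": question["body"]})
    question_url = headers["Location"]

    # Phase 2: the object from phase 1 is now addressable, so attach the answers.
    for answer in answers:
        post(f"{question_url}/answer", {"body": answer["body"]})
    return question_url
```

Because each phase depends only on what the previous phase returned, the same steps can be re-run against a staging or production target without knowing the final IDs in advance.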
3) API Errors
API errors are a fact of life. They can be caused by network timeouts, improperly formatted requests, or various server errors. At a minimum, it’s important to log these to confirm that information doesn’t get lost. Ideally, it is possible to identify what caused the error and resubmit the requests with only the failed content. Again, thankfully, we were able to configure this in our transformation workflows.
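As a rough illustration of logging failures and resubmitting only the failed content, here is a small Python sketch. The `send` function, the retry count and the delay are hypothetical, and catching every `Exception` is a simplification; a real migration would usually separate retryable errors (timeouts, some server errors) from malformed requests that will never succeed.

```python
import logging
import time
from typing import Callable, Iterable, List

logger = logging.getLogger("migration")


def submit_all(send: Callable[[object], None], records: Iterable[object],
               max_attempts: int = 3, delay_seconds: float = 0.0) -> List[object]:
    """Send every record, retrying only the failures.

    Returns the records that still failed after max_attempts passes.
    """
    pending = list(records)
    for attempt in range(1, max_attempts + 1):
        failed = []
        for record in pending:
            try:
                send(record)
            except Exception as exc:  # simplification: treat every error as retryable
                logger.warning("attempt %d failed for %r: %s", attempt, record, exc)
                failed.append(record)
        if not failed:
            return []
        pending = failed  # the next pass resubmits only what failed
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    return pending
```

Whatever the function returns can be written to a file and inspected, so nothing is lost silently.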
Data Specific Challenges
Every migration is unique, and for this particular example, we had a different set of challenges for each set of data we had to load:
1) Loading Users
Users had to be loaded to Auth0 and AnswerHub from Salesforce and Trello. There were several challenges:
- Attribute mapping to match the new schema
- Truncating usernames if they were over 12 characters
- Removing special characters from usernames, as Auth0 only supports alphanumeric characters
- Testing for duplicate usernames as the public display name was not necessarily unique in the old community.
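The username rules above can be sketched in Python like this. The 12-character limit and the alphanumeric-only rule come from the list; the numeric-suffix strategy for duplicates, the case-insensitive comparison and the `user` fallback for empty names are assumptions made for the example, not a description of what our workflow did.

```python
import re
from typing import Iterable, List

MAX_LEN = 12  # limit taken from the requirements above


def sanitize_usernames(names: Iterable[str], max_len: int = MAX_LEN) -> List[str]:
    """Strip non-alphanumerics, truncate, and make each username unique."""
    taken = set()
    result = []
    for name in names:
        # Remove special characters, then truncate; fall back to a placeholder.
        base = re.sub(r"[^A-Za-z0-9]", "", name)[:max_len] or "user"
        candidate = base
        suffix = 1
        # Assumed strategy: append 1, 2, 3... while staying within max_len.
        while candidate.lower() in taken:
            tail = str(suffix)
            candidate = base[: max_len - len(tail)] + tail
            suffix += 1
        taken.add(candidate.lower())
        result.append(candidate)
    return result
```

Processing the whole list in one pass matters here, because two different display names can collapse to the same string once special characters are removed.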
2) Loading Knowledge Articles
All existing Salesforce Knowledge Base articles needed to be migrated to the new site. Some of the challenges encountered:
- Different handling of attachments and images: Salesforce stored attachments and images within their infrastructure, whereas AnswerHub allows you to store images and attachments anywhere. We therefore used the migration as an opportunity to migrate all images and attachments to AWS S3.
- Moving from category-based to tag- and search-based organization: Work was required here to identify how to organize content under a new paradigm, but once that was done FME was used to tag everything using the API.
3) Loading Ideas
Ideas were pulled from two existing Trello boards into AnswerHub’s Ideation system. Trello is a general-purpose tool for visual organization, while Ideation is designed specifically for gathering customer feedback and indicating progress on the implementation of ideas.
Challenges with this migration:
- Mapping the Trello users who suggested ideas to the AnswerHub users who had been imported from Salesforce
- Identifying topic tags for the ideas
- Authenticating with Trello, which required creating an app profile in Trello and getting an authorization token from the Trello site to use in FME
4) Question and Answer Site
The question and answer site was the most complex to migrate:
- Associating content types: A multi-step process was required. Replies had to be associated with questions, and questions had to be associated with users, so it was not possible to load all content at once. Also, many questions and replies had images or other files attached. These needed to be loaded to Amazon S3, and the URLs integrated into the content HTML.
- Bringing over social interactions: We wanted to preserve as much of the flow of the original conversation as possible, while taking advantage of the features of the new platform. This required posting content as the original user, but making use of their new ID on AnswerHub to map ‘Likes’ and ‘Best Answers’ across.
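For the attachment step in the first point, integrating the new URLs into the content HTML amounts to a lookup-and-replace over `src` and `href` attributes. The Python sketch below assumes the files have already been uploaded and that a mapping from old URL to new URL is available; the example URLs are invented. A regular expression is used for brevity, and a proper HTML parser would be more robust on messy markup.

```python
import re
from typing import Dict

# Matches src="..." or href="..." with either quote style.
SRC_OR_HREF = re.compile(
    r'(?P<attr>\b(?:src|href))=(?P<quote>["\'])(?P<url>.*?)(?P=quote)'
)


def rewrite_links(html: str, url_map: Dict[str, str]) -> str:
    """Replace every src/href found in url_map; leave all other links untouched."""
    def swap(match: "re.Match[str]") -> str:
        old = match.group("url")
        new = url_map.get(old, old)  # unknown URLs pass through unchanged
        quote = match.group("quote")
        return f'{match.group("attr")}={quote}{new}{quote}'

    return SRC_OR_HREF.sub(swap, html)
```

Links that are not in the mapping are left alone, so ordinary external links in a question or reply survive the rewrite.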
Migrating the Data
After identifying our requirements and understanding the data models, we developed several workflows in FME that connected to Salesforce and Trello via APIs, transformed the data as it was migrated, and loaded it into AnswerHub and Auth0 via API calls.
Key Transformers Used
FME transformers are used to read data into the workflow, validate the data and correct it. Several key transformers were used when undertaking this bulk migration:
- HTTPCaller – all API communication is an HTTP request – this transformer allows you to make a request to a specific URL.
- JSONTemplater – data sent to the APIs was all JSON; this transformer is used to generate the request body from attribute values
- FeatureMerger – used for linking different content types together by foreign key, and for linking up newly-created article metadata stored in spreadsheets
- S3Uploader – attachments and embedded images were uploaded to S3 right in the workflow
- StringSearcher – used for sanitizing usernames according to a regular expression pattern
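For readers who do not use FME, the roles played by FeatureMerger and JSONTemplater can be loosely approximated in Python: join records on a shared key, then render a JSON request body from the joined attributes. The function and field names below are hypothetical, and this is only an analogy, not how the transformers are implemented.

```python
import json
from typing import Dict, Iterable, List


def merge_by_key(requestors: Iterable[Dict], suppliers: Iterable[Dict], key: str) -> List[Dict]:
    """Join each requestor to the supplier with the same key value.

    Requestors with no matching supplier are dropped in this sketch.
    """
    lookup = {s[key]: s for s in suppliers}
    merged = []
    for r in requestors:
        supplier = lookup.get(r[key])
        if supplier is not None:
            # Requestor attributes win if both sides define the same field.
            merged.append({**supplier, **r})
    return merged


def render_body(record: Dict, fields: List[str]) -> str:
    """Build a JSON request body from selected attributes of one record."""
    return json.dumps({f: record[f] for f in fields}, sort_keys=True)
```

In a real migration the unmatched records would be logged rather than dropped, since an orphaned reply usually points to a user or question that failed to load earlier.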
Connecting to Various APIs
Authentication is a crucial part of working with APIs. FME allows you to authenticate via token, OAuth 2.0 or HTTP Basic, which covers the most popular forms.
While designing the migration workflow, we tested out our ideas in a staging environment. When we were satisfied everything was running smoothly, we switched the target over to the production environment. Rather than carrying out the tedious task of changing the URL, username and password in every HTTPCaller, the URL was set to a published parameter, and the authentication information was stored as a web service. This has several advantages:
- We were able to easily switch between environments
- The migration could easily be run with a different user account
- Credentials were kept separate from the migration workspace
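The same separation can be sketched outside FME by reading the target URL and credentials from the environment instead of hard-coding them. The variable names here (`MIGRATION_TARGET`, `STAGING_URL` and so on) are invented for the example; in our workflow, the equivalents were a published parameter and a stored web service connection.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TargetConfig:
    base_url: str
    username: str
    password: str


def load_target(env: Mapping[str, str] = os.environ) -> TargetConfig:
    """Pick the staging or production settings based on one switch variable."""
    # Hypothetical variable names; defaults to staging so production is opt-in.
    name = env.get("MIGRATION_TARGET", "staging").upper()
    return TargetConfig(
        base_url=env[f"{name}_URL"],
        username=env[f"{name}_USER"],
        password=env[f"{name}_PASSWORD"],
    )
```

Switching environments then means changing a single variable, and the script itself never contains a password.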
Assessing the Result
APIs give you the flexibility to choose the best services for your needs. In our migration example, we successfully moved data from Salesforce and Trello into AnswerHub/Auth0 and upgraded our user community to provide even better customer service.
The key takeaway? You’re never stuck in one system. With APIs and data transformation technology to get your data exactly how it’s needed, you have the freedom to choose whichever services are best for your needs.
Learn More About APIs
In the next and final blog post in this series, we’ll discuss how to build an API on top of FME without writing any code (and why you’d want to).
To learn more about connecting to APIs to access, migrate, and integrate data with FME, check out our API webinar. And as always, if you have any questions or thoughts on this post, please share in the comments.
Stewart Harper
Stewart is the Technical Director of Cloud Applications and Infrastructure at Safe. When he isn’t building location-based tools for the web, he’s probably skiing or mountain biking.