GHD wanted to gather business intelligence from financial data, using natural language processing to find abnormalities and investment diversification opportunities. This data science project involved text mining massive amounts of unstructured data from an online archive.
Using FME, GHD built an API to collect over 21 million U.S. Securities and Exchange Commission (SEC) quarterly filings from the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) website. The workflow reads column-aligned text, unstructured text, and HTML/XML. FME cleans the text data, parses it into financial information, analyzes it using natural language processing, and performs sentiment analysis to find changes in language over time. The data is then written to SAP HANA for further BI tasks.
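To make two of these steps concrete, here is a minimal sketch in plain Python of parsing a column-aligned line and tracking a change in tone between filings. It is not GHD's FME workflow. The column offsets, the word lists, and the function names `parse_fixed_width`, `tone_score`, and `tone_changes` are all hypothetical, chosen only for illustration, and the simple word-count score stands in for whatever sentiment method the project actually used.

```python
import re

# Hypothetical column layout: (field name, start offset, end offset).
# This is NOT the real EDGAR index layout; it only illustrates the idea.
COLUMNS = [("company", 0, 20), ("form", 20, 28), ("filed", 28, 38)]

# Tiny illustrative word lists (assumptions); a real analysis would use
# a proper financial sentiment lexicon or a trained model.
POSITIVE = {"growth", "gain", "improved", "strong"}
NEGATIVE = {"loss", "decline", "risk", "weak"}


def parse_fixed_width(line):
    """Slice one column-aligned line into a dict of stripped fields."""
    return {name: line[start:end].strip() for name, start, end in COLUMNS}


def tone_score(text):
    """Return (positive - negative) / total words, or 0.0 for empty text."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)


def tone_changes(filings):
    """Given (ISO date, text) pairs, return (date, score change vs. previous filing)."""
    scores = [(date, tone_score(text)) for date, text in sorted(filings)]
    return [(d2, s2 - s1) for (_, s1), (d2, s2) in zip(scores, scores[1:])]


if __name__ == "__main__":
    # Build a sample line that matches the hypothetical offsets above.
    line = "Example Corp".ljust(20) + "10-Q".ljust(8) + "2020-03-31"
    print(parse_fixed_width(line))

    # Made-up filing excerpts, one per quarter.
    filings = [
        ("2020-03-31", "strong growth"),
        ("2020-06-30", "loss and decline"),
    ]
    print(tone_changes(filings))
```

A negative change for a given date means the language in that filing scored lower than in the previous one, which is the kind of shift over time the sentiment analysis step is meant to surface.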
The resulting FME workflow performed data enrichment and analysis on large amounts of unstructured text data. GHD was able to create a database of companies that use the EDGAR system, collate financial data in SAP HANA, and execute machine learning algorithms in an automated process.
GHD provides architecture, engineering, construction, advisory and digital services to private and public sector clients.