GHD uses text mining to facilitate data-driven decisions

GHD collects over 21 million filings from the U.S. Securities and Exchange Commission (SEC) to deliver business and financial intelligence to its clients.

Using machine learning to build word vectors and perform sentiment analysis

GHD wanted to gather business intelligence from financial data by performing natural language processing to forecast abnormal returns and find investment diversification opportunities. This data science project involved text mining on massive amounts of unstructured data from an online archive.

Using FME, they built an API to collect over 21 million U.S. Securities and Exchange Commission (SEC) quarterly filings from the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) website. This workflow involves reading column-aligned text, unstructured text, and HTML/XML.
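To give a feel for what reading column-aligned text involves, here is a minimal Python sketch. It is not GHD's FME workflow: the function name `parse_column_aligned`, the column widths, and the sample rows are all hypothetical, and the approach assumes every data row lines up with the header labels.

```python
import re


def parse_column_aligned(text):
    """Turn column-aligned text into a list of dicts keyed by header label.

    Assumption: each column starts at the same offset as its header label,
    and header labels contain at most single spaces between words.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]

    # Each header label is a run of words separated by single spaces.
    labels = list(re.finditer(r"\S+(?: \S+)*", header))
    names = [m.group() for m in labels]
    starts = [m.start() for m in labels]
    ends = starts[1:] + [None]  # the last column runs to the end of the line

    return [
        {name: row[s:e].strip() for name, s, e in zip(names, starts, ends)}
        for row in rows
    ]


# Hypothetical sample rows, padded to fixed widths; not real filing data.
ROW_FORMAT = "{:<12}{:<22}{:<9}{:<12}{}"
SAMPLE = "\n".join(
    ROW_FORMAT.format(*row)
    for row in [
        ("Form Type", "Company Name", "CIK", "Date Filed", "File Name"),
        ("10-Q", "Example Corp", "1234567", "2020-05-01", "data/1234567/q1.txt"),
        ("10-Q", "Sample Holdings Inc", "7654321", "2020-05-04", "data/7654321/q1.txt"),
    ]
)

records = parse_column_aligned(SAMPLE)
for record in records:
    print(record["Company Name"], record["Date Filed"])
```

Slicing by header offsets, rather than splitting on whitespace, keeps multi-word values such as company names intact.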

FME is used to clean the text data, parse it into financial information, analyze it using natural language processing, and perform sentiment analysis to find changes in language over time. The data is then written to SAP HANA for further BI tasks.
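The idea of tracking changes in language over time can be illustrated with a small Python sketch. This is an assumption-laden stand-in, not the method GHD runs in FME: it uses a tiny hypothetical word list for sentiment and plain word-count cosine similarity to compare two made-up filing excerpts.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists for illustration only.
POSITIVE = {"growth", "improved", "strong", "gain"}
NEGATIVE = {"loss", "decline", "risk", "impairment"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


def sentiment_score(text):
    """(positive words - negative words) / total words, or 0.0 for empty text."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


def cosine_similarity(text_a, text_b):
    """Cosine similarity of word-count vectors; lower means the wording changed more."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Made-up excerpts standing in for two consecutive quarterly filings.
filings = [
    ("2020-Q1", "Revenue growth was strong and margins improved."),
    ("2020-Q2", "Revenue decline and impairment loss increased risk."),
]

scores = {period: sentiment_score(text) for period, text in filings}
similarity = cosine_similarity(filings[0][1], filings[1][1])

print(scores)
print(round(similarity, 3))
```

A falling sentiment score combined with low similarity between consecutive filings is the kind of signal such an analysis looks for. A production pipeline would replace the word lists with a trained model or a domain-specific financial lexicon.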

The resulting FME workflow performed data enrichment and analysis on large amounts of unstructured text data. GHD was able to create a database of companies that use the EDGAR system, collate financial data in SAP HANA, and execute machine learning algorithms in an automated process.

GHD is a global professional services company that provides architecture, engineering, construction, advisory and digital services to private and public sector clients.

“I use FME for everything.”

Steven Cyphers

Data Solutions Architect, GHD