What are open data portals? Vast collections of data exist online that are full of possibilities and available for anyone to use. The value that open data provides goes beyond community participation and improved products. Having data readily available regarding taxes, housing prices, crime, navigation, energy efficiency, and more, opens up a whole new level of global transparency and accountability.

Governments, businesses, research organizations, and others have embraced the open data movement and have seen enormous benefits.

Open data is free, accessible data that anyone can use for any purpose—and public interest in it is exploding. With it, we have access to information about the places, businesses, and organizations we care about. It means transparency. It enables collaboration, innovation, and scientific and technological advancement.

Table of Contents

  1. Who Produces Open Data?
  2. Why Does Open Data Matter?
  3. Creating Your Own Open Data Portal

Who Produces Open Data?

All Levels of Government

Governments are currently the main providers of open data and have been at the forefront of the open data movement. Data is provided at all levels of government (e.g. city, state, and federal) in the UK, USA, Canada, and various other countries.

The reason?

Many governments are now mandated (e.g. INSPIRE) to provide open data to their citizens, and a variety of other countries, states, provinces, and cities are going down the same path. We’re seeing the importance of this more and more, especially since COVID-19 has required everyone around the world to contribute and work together towards a common goal. Sharing data is a huge part of that process.

The Tri-County Health Department has done a fantastic job of not only managing their COVID-19 data, but also publishing it so that it’s accessible to anyone, anywhere.

California’s state geoportal, for example, lets anyone explore a wide variety of data categories. Citizens are also encouraged to request datasets, which further increases transparency and engagement.

As we move towards smart cities and gain the ability to measure and control almost any physical object, data volumes will increase exponentially. This makes it critical to keep the cost of entry low for developers.

Non-Governmental Organizations

NGOs have always paid attention to the democratization of data, since they work to provide better public services and to plan development projects. Relief efforts can also benefit significantly from open data.

Crowdsourcing and geomapping efforts let NGOs and other non-profit groups put open data to work. We saw this after the Nepal earthquake in 2015, where 4,000 mappers mapped 13,199 miles of road and 110,681 buildings within 48 hours of the disaster. These maps allowed aid groups to make rescue plans and find the safest and fastest routes to those needing help.

Many NGOs are also starting to produce open data themselves. For example, the Hewlett Foundation and the Gates Foundation share where they spent their aid money for increased transparency. Hundreds of organizations have also published data about their operations and spending via the International Aid Transparency Initiative.

Academic Institutions

Academic institutions are also opening up their data. Fuelled by the success of sharing data on Alzheimer’s, there is a movement beginning for more transparent research practices. As there are many complexities around sharing research data, progress is slow. Despite this, more and more funding agencies and academic publishers are supporting and even mandating data sharing.

Openly accessible research data is also at the heart of the EU’s €80 billion Horizon 2020 program, which gives further hope of dramatic progress in the near future.

One of the greatest successes of open data was The Human Genome Project. In this feat, the human genomic sequence information was made public almost immediately. As a result, a genome can now be mapped in a few hours, costing less than $1000.

Private Companies

Even though governments are the main providers of open data, a significant proportion of corporations are starting to realize that producing open datasets can improve their bottom line. Benjamin Herzberg of the World Bank Institute calls this new frontier the Open Private Sector.

Why Does Open Data Matter?

The debate over whether or not this data should be made available to the public is still ongoing. Traditionally, many governments have viewed the sale of data as a revenue stream. Other governments view the infrastructure costs of freely opening data as prohibitive. Lastly, the raw data may not be in a form that is easily shared.

As data lovers, we believe that data should never be locked inside applications or formats. Data should be free to use whenever, wherever, and however it’s needed. By opening data, knowledge can be shared and new innovations can be made. With open data, citizens have the option to examine data and answer questions they may have, researchers and journalists can gather and analyze data to tell stronger stories, and developers can use data to build applications.

Creating Your Own Open Data Portal

If you want to make your data public, there are a few things to consider before creating your own open data portal.

1. Only Share Good Quality Data

Making decisions based on bad data could have extreme ramifications. Before sharing a dataset with the public, check all aspects of it for completeness, correctness, consistency, and compliance. This includes validating geometry, attributes, standards compliance, format-specific issues like XML / JSON structure, and more.

Good quality data means checking that it fulfills requirements, then repairing it where it doesn’t pass. Use manual verification and out-of-the-box tools to validate your data, or use FME to detect and repair problems automatically.
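To make the idea of checking completeness, correctness, and consistency concrete, here is a minimal Python sketch. It is only an illustration, not how FME or any particular tool performs validation, and the field names in `REQUIRED_FIELDS` are a hypothetical schema chosen for the example.

```python
# Hypothetical schema: the fields every record is expected to carry.
REQUIRED_FIELDS = ("id", "name", "latitude", "longitude")


def validate_record(record):
    """Return a list of human-readable problems found in one record."""
    problems = []
    # Completeness: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    # Correctness: coordinates must be numeric and within valid ranges.
    for field, limit in (("latitude", 90.0), ("longitude", 180.0)):
        value = record.get(field)
        if value in (None, ""):
            continue  # already reported as missing
        try:
            number = float(value)
        except (TypeError, ValueError):
            problems.append(f"{field} is not numeric")
            continue
        if abs(number) > limit:
            problems.append(f"{field} out of range")
    return problems


def validate_dataset(records):
    """Map record index -> problems, also flagging duplicate ids (consistency)."""
    report = {}
    seen_ids = set()
    for index, record in enumerate(records):
        problems = validate_record(record)
        record_id = record.get("id")
        if record_id not in (None, "") and record_id in seen_ids:
            problems.append("duplicate id")
        seen_ids.add(record_id)
        if problems:
            report[index] = problems
    return report
```

Records that appear in the report can then be repaired or held back before the dataset is published.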

2. Offer Format Choices

Data is nothing if no one can read it.

By definition, open data should be easy for the public to use. Offer a choice with respect to format and remember that data should be both machine readable and human readable. Here are some formats we recommend for open data portals:

CSV
A tabular format that’s easily read by humans. Excel is also a good one to offer for these reasons.
Shapefile
A widely used spatial data format. It’s consistently the most popular GIS format in our usage stats.
XML
It’s machine readable and offers the user a lot of power and flexibility for tabular data.
KML
It’s instantly viewable in a web environment and is the format of choice for Google Maps and Google Earth.
JSON
Like XML, it’s machine readable and flexible, plus it’s a format commonly used by APIs to transfer data over the web.
GeoJSON
It’s flexible, machine readable, and commonly used by APIs to transfer data over the web, like JSON, but it also stores spatial data.
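
Offering more than one format does not have to mean maintaining more than one copy of the data. As a rough sketch using only the Python standard library, the function below turns a CSV with coordinate columns into a GeoJSON FeatureCollection. The column names `latitude` and `longitude` are assumptions made for the example, and every row is assumed to hold valid numeric coordinates.

```python
import csv
import io


def csv_to_geojson(csv_text, lon_field="longitude", lat_field="latitude"):
    """Convert CSV text with coordinate columns into a GeoJSON FeatureCollection."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lon = float(row.pop(lon_field))
        lat = float(row.pop(lat_field))
        features.append({
            "type": "Feature",
            # GeoJSON (RFC 7946) orders positions as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            # The remaining columns become the feature's attributes.
            "properties": row,
        })
    return {"type": "FeatureCollection", "features": features}


sample = "name,latitude,longitude\nCity Hall,49.2609,-123.1139\n"
collection = csv_to_geojson(sample)
```

Passing `collection` to `json.dumps` gives a document that web maps and APIs can consume directly.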

Other useful formats to consider:

3. Update Datasets Frequently

Open data or not, using old data is problematic when trying to make informed decisions. So, it’s important that data in an open data portal is updated regularly.

To help others ensure they’re always getting your updated data, it’s worth providing your data as a published feed (e.g. RSS) or API rather than statically downloadable files. This will allow people to consume the endpoint, and if you make updates they will be automatically reflected in a user’s app.

A great way to set this up is to connect your open data platform to your master database. This way your data will be integrated directly instead of being duplicated in two locations, avoiding issues that may occur when updates are needed in the future.

To do this, start with FME. Synchronize your portal with your database using transformers like the ChangeDetector that can watch for updated fields in your database. Then, use Automations in FME Server to ensure your portal is updated as soon as any changes behind the scenes take place.
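
The idea behind change detection can be sketched in a few lines of Python. This is a simplified stand-in for illustration only, not the ChangeDetector transformer's actual implementation. It assumes each record is a dictionary with a unique `id` key, which is a hypothetical convention for this example.

```python
def detect_changes(previous, current, key="id"):
    """Classify records as inserted, updated, or deleted between two snapshots."""
    old = {row[key]: row for row in previous}
    new = {row[key]: row for row in current}
    return {
        # Present now, absent before.
        "inserted": [new[k] for k in new if k not in old],
        # Present in both snapshots, but at least one field differs.
        "updated": [new[k] for k in new if k in old and new[k] != old[k]],
        # Present before, absent now.
        "deleted": [old[k] for k in old if k not in new],
    }


yesterday = [{"id": 1, "status": "open"}, {"id": 2, "status": "open"}]
today = [{"id": 2, "status": "closed"}, {"id": 3, "status": "open"}]
changes = detect_changes(yesterday, today)
```

Only the records in `changes` need to be pushed to the portal, which keeps each sync small even when the master database is large.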

4. Provide Projection Options for Spatial Data

Providing your data in different formats is a great start for making your data accessible to users of all kinds. However, when it comes to spatial data, options for coordinate systems and projections should be considered as well.

When it comes to spatial data, users should be able to choose their projection. Local (e.g. State Plane or British National Grid) and global projections should be provided. For global projections, we recommend:

When using FME to manage open data portals, you can provide coordinate system choices by making a published parameter. There are a few ways to go about selecting a coordinate system. One option is to use the Reprojector transformer, which uses the CS-Map reprojection engine, but others are available (e.g. PROJ, Gtrans, Esri).
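
To give a sense of what a reprojection step does under the hood, the sketch below converts WGS 84 longitude and latitude into spherical Web Mercator (EPSG:3857) coordinates using the textbook formula. It is an illustration only and is not how the Reprojector or any of the engines named above is implemented. Web Mercator is used here simply as a familiar example of a global projection.

```python
import math

# WGS 84 semi-major axis in metres, used as the sphere radius by Web Mercator.
EARTH_RADIUS_M = 6378137.0


def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Project WGS 84 degrees to spherical Web Mercator metres (x, y)."""
    if abs(lat_deg) >= 90.0:
        # The projection is undefined at the poles.
        raise ValueError("latitude must be strictly between -90 and 90 degrees")
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2)
    )
    return x, y


x, y = wgs84_to_web_mercator(-123.1139, 49.2609)
```

For production use, a dedicated reprojection engine is the safer choice, since it also handles datum shifts and the many local coordinate systems that this formula does not cover.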

5. Choose an Open Data Portal Delivery Solution

Now that your data is ready to be shared, you’ll need a platform to provide the data.

Each of the solutions presented here offers its own set of strengths for data publishers. The world is always creating more options and each one offers something new. So, it’s important to always do your own research to find the best solution for you and your team.

ArcGIS Open Data

(Licensing: Commercial, Deployment Model: SaaS)

Configure your own branded open data site with ArcGIS Server or ArcGIS Online. ArcGIS Open Data will be of particular interest if you currently use Esri within your organization.

Esri has made it very easy to create and configure an open data site, allowing you to focus on your strategy, policy, and adoption rather than technical and operational concerns.

Publishing & Management:
  • Support for ArcGIS Online feature services, ArcGIS for Server feature services, and ArcGIS for Server map services
  • Image services
  • Support for CSV, Socrata, and CKAN hosted datasets
  • Can host web maps, URLs, Word documents, and PDFs

Visualization Features:
  • View data and metadata in browser
  • Data interaction is limited to sorting columns and filtering by a search query
  • Create simple histograms, line, donut, or scatter charts to analyze without downloading the data
  • Configure widgets and customize design with code

Geospatial Features:
  • All data is downloaded in WGS 84
  • Download data as KML, shapefile, or via the API (JSON, Geoservice, WMS)
  • Can load any spatial dataset into the ArcGIS web map viewer and get an extremely rich set of tools to visualize and analyze the data

Examples: Open Data DC, City of Burnaby

The City of Langley uses ArcGIS Online along with FME to supply open data to their citizens in a streamlined way. This allowed them to maintain their open data via the ArcGIS Online REST API. Learn more about how they did this by viewing their presentation.

CKAN

(Licensing: Open Source, Deployment Model: Self-hosted with SaaS offerings based on the CKAN technology)

CKAN is a leading open source data portal with over 300 open source data management extensions. It is a powerful platform best suited for large organizations, as it is relatively complex to set up and maintain.

Publishing & Management:
  • Upload data via custom spreadsheet importers or via the web interface
  • Has a rich JSON API that can integrate with FME, letting you load any dataset that FME supports into CKAN
  • Can import data from other services at regular intervals
  • Tools to manage permissions and edit the metadata

Visualization Features:
  • A rich search experience allows you to quickly find the data you want
  • Can review metadata for each dataset and inspect data in a tabular, graphical, and mapping view

Geospatial Features:
  • Can plot the data on an interactive map so users can view a sample of the dataset and analyze individual records
  • Users can filter and search for data based on a geographic location

Examples: US Government Open Data, UK Open Data, Government of Canada Open Data

The City of Surrey is a great example of a government that utilizes both CKAN and FME to manage their open data portal. By using both these technologies, they’re able to supply datasets that can be downloaded in any format and any projection.
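
CKAN's JSON API can be explored with very little code. The sketch below builds a request URL for the Action API's `package_search` call and pulls dataset titles out of a response body. The portal address is a placeholder, and the sample response is trimmed to the fields the example reads, so treat it as a starting point, not a complete client.

```python
import json
from urllib.parse import urlencode


def build_package_search_url(portal_url, query, rows=10):
    """Build a CKAN Action API URL that searches datasets by keyword."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_text):
    """Extract dataset titles from a package_search JSON response body."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    # Fall back to the dataset's machine name when no title is set.
    return [pkg.get("title") or pkg.get("name") for pkg in payload["result"]["results"]]


# Placeholder portal address; substitute a real CKAN site.
url = build_package_search_url("https://portal.example.org/", "bike lanes", rows=5)

# Trimmed, made-up response used only to exercise the parser.
sample_response = json.dumps({
    "success": True,
    "result": {"count": 1, "results": [{"name": "bike-lanes", "title": "Bike Lanes"}]},
})
titles = dataset_titles(sample_response)
```

The URL can be fetched with any HTTP client, and the same pattern applies to the other Action API calls.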

Socrata

(Licensing: Commercial, Deployment Model: SaaS)

A platform that turns data into a utility that can be discovered, consumed, visualized, analyzed, and shared. All Socrata technologies are developed in the open, on the company’s GitHub page, and the majority (including the core API server and all installable clients) are open source licensed and free to use or modify. Socrata also organizes large-scale open beta programs to solicit the input of governments around the world when developing new services.

Publishing & Management:
  • Publish data using a WebUI, desktop sync tool, or API
  • Supports CSV, Excel, TSV, PDF, shapefiles, KML, and GeoJSON
  • Can have both a published and working copy of the dataset
  • Many tools around metadata management and workflow

Visualization Features:
  • Rich set of tools to visually inspect both tabular and geospatial data
  • Can configure data (i.e. select columns) before downloading
  • Chart data to produce powerful visual representations
  • All datasets include an API and an OData endpoint, which can be used to interact with the data using Excel, PowerBI, Tableau, and other analytics tools

Geospatial Features:
  • Visualize data from within the web browser
  • Can overlay your own datasets on top of the map and can save views for easy sharing
  • Allows for a direct connection to an Esri catalog

Examples: NYC OpenData, Washington State

Socrata themselves used FME to help with the Police Foundation in Washington D.C. They needed to automate the ingress and centralization of various police data sources into an open data portal for the Task Force on 21st Century Policing. Learn more about how Socrata used FME for the task.
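
Because every Socrata dataset comes with an API, consuming one mostly comes down to building the right URL. The sketch below assembles a request in the style of Socrata's SODA API, with `$limit` and `$where` query parameters. The domain and dataset identifier are placeholders invented for the example.

```python
from urllib.parse import urlencode


def build_soda_url(domain, dataset_id, limit=100, where=None):
    """Build a SODA-style JSON endpoint URL with optional row filtering."""
    params = {"$limit": limit}
    if where:
        params["$where"] = where
    # Keep "$" unescaped so the query string stays readable.
    query = urlencode(params, safe="$")
    return f"https://{domain}/resource/{dataset_id}.json?{query}"


# Placeholder domain and dataset id; substitute values from a real portal.
url = build_soda_url("data.example.gov", "abcd-1234", limit=5, where="year = 2020")
```

Fetching the resulting URL returns rows as JSON, which an app can poll to pick up dataset updates without re-downloading files.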

Amazon Web Services

(Licensing: Commercial, Deployment Model: PaaS)

By leveraging the lower-level services of AWS (e.g. S3, EC2, RDS) and making use of FME as a data-mover, you can produce an extremely fault-tolerant, scalable, and powerful service that is easy to maintain and cost effective. Extensive format and projection extraction choices can be supported via FME.

Flexible, pay-as-you-go pricing on both AWS and FME makes it a great option if you are conscious of your spending, but still want rich functionality.

Notes on architecture:

The Arkansas GIS Office open data portal is hosted in the cloud using Amazon and supported by FME. With 7+ terabytes of data to migrate, they started with FME and continue to use it as an ongoing automation tool. Learn more about how Arkansas Geographic Information Systems Office did it.

Free Hosting

(Licensing: Commercial Free, Deployment Model: SaaS)

If you are looking for data visualization or analysis then look to the previous solutions, but if all you want is a simple file catalog service, free hosting may be the way to go.

Free hosting with sites like DataHub.io, FTP, Google Drive, and GitHub is a good place to start. The cloud file storage solutions can be used to store and serve large volumes of data. It is simple to upload the data, and a simple web interface can be built on top of the storage system to provide further context.

If collaboration with users is important, look at GitHub. Several people have successfully used it to host open data, and by uploading GeoJSON, you can even visualize the data on a map.

In all cases, FME can be used to sync the storage services with the master database to ensure the databases are up to date.

6. Automate the Process

Now that you have all the pieces of your data portal put together, you’re going to want to keep it functioning. You could do it manually, but with the amount of data being collected, processed, and stored these days, performing these tasks manually will get overwhelming. Fast.

Using FME is a great way to get things done the way you want them, when you want them. Build workflows that connect to your master database, standardize and validate all sorts of datasets, and connect them to the platform of your choice using pre-built connectors or APIs. Then use Automations to ensure that an update made anywhere else is reflected in your portal.

Better yet, use FME for custom open data access like map based data distribution. Your FME workflows can function behind the scenes so that when a user selects a specific area of interest, the dataset is clipped using their exact shape. This saves them the effort of transforming data themselves, providing an even better, full-functioning service. This is the kind of automation that goes above and beyond and can really WOW your users.
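
The core of clipping data to a user-drawn area can be illustrated with a classic ray-casting point-in-polygon test. The Python sketch below is a simplified stand-in, not FME's clipping logic. It handles only Point features and a single polygon ring without holes, and it assumes GeoJSON-style feature dictionaries.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: is (x, y) inside the polygon ring of (x, y) vertices?"""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Toggle each time a horizontal ray from the point crosses edge (j, i).
        if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside


def clip_points(features, ring):
    """Keep only the Point features that fall inside the area of interest."""
    return [
        feature for feature in features
        if feature["geometry"]["type"] == "Point"
        and point_in_polygon(*feature["geometry"]["coordinates"][:2], ring)
    ]


area_of_interest = [(0, 0), (10, 0), (10, 10), (0, 10)]
features = [
    {"type": "Feature", "geometry": {"type": "Point", "coordinates": [5, 5]}, "properties": {}},
    {"type": "Feature", "geometry": {"type": "Point", "coordinates": [15, 5]}, "properties": {}},
]
clipped = clip_points(features, area_of_interest)
```

Lines and polygons need true geometric clipping, where features are cut at the boundary, which is where a dedicated spatial tool earns its keep.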

The City of Surrey uses CKAN and FME to automate how they supply data via their open data portal. To learn more, watch their presentation.

Unleash the Data!

We’re going to be seeing a lot more open data in the world due to its overwhelming popularity. Plus, there are no excuses when it comes to the technical side of things. With cloud and automation tools like FME, it’s straightforward and cheap (and fun!) to create an open data portal.

Of course, there will be demand for higher quality data, not just more of it. Open data must be easy to find, use, and collaborate on. We also expect to see open data become normalized so it’s easier to compare cities globally.

With free access to data, citizens will be able to engage with the community at large and innovate based on their own interests. While open data users can take advantage of this, even the citizens with no knowledge of open data can use it. It might be to research the crime rate before purchasing a new home, to find out where their tax dollars go, or to find out how much members of parliament make.

By keeping open data alive, new trends will start to develop on a global level. The possibilities are endless.


Amanda Schrack

While Amanda's background is in environmental science and GIS, she now writes content for safe.com. Looks like writing all those research reports paid off! In her spare time, Amanda likes to make crafts, watch documentaries, and learn about bugs (the kind with six legs).
