Overview

Data validation is an essential part of any data handling task, whether you’re in the field collecting information, analyzing data, or preparing to present your data to stakeholders. If your data isn’t accurate from the start, your results won’t be accurate either. That’s why it’s necessary to verify and validate your data before it is used.

While data validation is a critical step in any data workflow, it’s often skipped. It may seem as if data validation slows down your pace of work; however, it is essential because it helps you produce the best results possible. These days, data validation can be a much quicker process than you might expect. With data integration platforms that can incorporate and automate validation processes, validation can be treated as an integral part of your workflow rather than an additional step.

Why Validate?

Validating the accuracy, clarity, and details of your data is necessary to mitigate project defects. Without validating your data, you run the risk of basing decisions on imperfect data that does not accurately represent the situation at hand.

While verifying your data inputs and values is important, it is also necessary to validate the data model itself. If your data model is not structured or built correctly, you will run into issues when trying to use your data files in various applications and software.

Both the structure and content of your data files dictate what exactly you can do with your data. Using validation rules to cleanse your data before use helps to mitigate “garbage in = garbage out” scenarios. Protecting the integrity of your data helps to ensure the legitimacy of your conclusions.

Types of Data Validation

Validation Rules for Consistency

The most straightforward (and arguably the most essential) rules used in data validation are rules that ensure data integrity. You’re probably familiar with these types of practices. Spell check? Data validation. Minimum password length? Data validation.

Every organization will have its own unique rules for how data should be stored and maintained. Setting basic data validation rules will help your company uphold consistent standards that make working with your data more efficient. Some other common examples of data validation rules that help maintain integrity and clarity include:

  • Data type (e.g. integer, float, string)
  • Range (e.g. a number between 35 and 40)
  • Uniqueness (e.g. postal code)
  • Consistent expressions (e.g. using only one of St., Str., or Street)
  • No null values
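
The rule types listed above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the record fields (`id`, `age`, `street`), the 35 to 40 range, and the helper name `check_records` are all assumptions made for the example.

```python
# Hypothetical records; the field names and values are invented for illustration.
records = [
    {"id": 1, "age": 37, "street": "12 Main Street"},
    {"id": 2, "age": 52, "street": "4 Oak St."},
    {"id": 2, "age": None, "street": "9 Elm Street"},
]


def check_records(rows):
    """Return a list of (row_index, message) pairs, one per rule violation."""
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows):
        # No null values: no field may hold None.
        for field, value in row.items():
            if value is None:
                problems.append((i, f"{field} is null"))

        age = row.get("age")
        if age is not None:
            # Data type: age must be an integer (bool is excluded explicitly).
            if not isinstance(age, int) or isinstance(age, bool):
                problems.append((i, "age is not an integer"))
            # Range: age must fall between 35 and 40 inclusive.
            elif not 35 <= age <= 40:
                problems.append((i, "age out of range 35-40"))

        # Uniqueness: each id may appear only once.
        if row.get("id") in seen_ids:
            problems.append((i, "duplicate id"))
        seen_ids.add(row.get("id"))

        # Consistent expressions: only the spelled-out "Street" is accepted.
        street = row.get("street") or ""
        if street.endswith(("St.", "Str")):
            problems.append((i, "street suffix is abbreviated"))
    return problems


for index, message in check_records(records):
    print(f"row {index}: {message}")
```

Returning a list of violations, instead of stopping at the first one, lets you see every problem in a dataset in a single pass.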

Format Standards

Validating the structure of your data is just as important as validating the data itself. Doing so ensures that your data model suits the format you are using, and that the format is compatible with the applications in which you want to use your data.

File formats and their standards are maintained by non-profit organizations, government departments, industry advisory panels, and private companies. These bodies continuously develop, document, and define the file structures that hold your data.

When validating your data, make sure you understand the standards and structure of the data model that your dataset is stored in. Failing to do so may result in files that are incompatible with applications and other datasets with which you may want to integrate your data.

How to Perform Data Validation

Validation by Scripts

Depending on your fluency in coding languages, writing a script may be an option for validating your data. You can compare your data values and structure against your defined rules to verify that all the necessary information is within the required quality parameters. Depending on the complexity and size of the dataset you are validating, this method of data validation can be quite time-consuming.
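
As a rough sketch of what such a script might look like, the Python below reads a small CSV sample, checks its structure (the header) and then its values against a set of rules, and collects the failures. The column names, the rules, and the `validate_csv` helper are all hypothetical; a real script would read from a file and encode your own organization's rules.

```python
import csv
import io

# Hypothetical sample data; in practice this would come from a file on disk.
SAMPLE = """station,temperature_c,reading_date
A1,21.5,2023-04-01
A2,,2023-04-01
A3,abc,2023-04-02
"""

EXPECTED_COLUMNS = ["station", "temperature_c", "reading_date"]


def is_float(text):
    """Return True if text parses as a floating-point number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


# Each rule is (column, description, predicate applied to the raw string).
RULES = [
    ("station", "must not be empty", lambda v: v != ""),
    ("temperature_c", "must be a number", is_float),
]


def validate_csv(text):
    """Check the header, then every row; return a list of error strings."""
    reader = csv.DictReader(io.StringIO(text))
    # Structure check: the header must match the expected columns exactly.
    if reader.fieldnames != EXPECTED_COLUMNS:
        return [f"unexpected columns: {reader.fieldnames}"]
    errors = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        for column, description, predicate in RULES:
            if not predicate(row[column]):
                errors.append(f"line {line_no}: {column} {description}")
    return errors


for error in validate_csv(SAMPLE):
    print(error)
```

Keeping the rules in a plain list separates what is being checked from how the file is read, so adding a rule does not require touching the loop. Even so, every new format or rule set means more code to write and maintain, which is where the time cost of scripted validation comes from.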

Validation by Programs

Many software programs can be used to perform data validation for you. This method of validation is very straightforward since these programs have been developed to understand your rules and the file structures you are working with. The ideal tool is one that lets you build validation into every step of your workflow, without requiring an in-depth understanding of the underlying format.

FME For Data Validation

Software like FME enables you to customize your data validation workflow precisely for your needs. You can create workflows that are specific to data validation, or add data validation as a step within other data integration workflows. Additionally, you can automatically run any data validation workflow on a schedule (or on demand), which means you can build a workflow once and reuse it over and over.

To ensure that your data is fit to serve its purpose most effectively, you can add validation-based “transformers” to your workflow. For example, FME’s GeometryValidator, AttributeValidator, and Tester transformers all help you verify that your data is formatted and structured based on your specific data validation rules. These transformers can be used at the beginning of workflows to validate that the data you’re reading is correct, or at the end of a workflow to validate that your data has been converted and transformed properly.

FME supports over 450 formats and applications through tools called readers and writers. Each reader and writer has been designed to understand the specific nature of your data format to aid in the validation process. Readers and writers go beyond just recognizing a file extension; they also account for how the format is being used. For example, not all .xml files are the same. You may be using XML to store data for CityGML, GPX, LandXML, or Microsoft MapPoint Web. Each of FME’s readers and writers will interpret your data by need, not just by format.
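
To illustrate why a file extension alone is not enough, here is a small Python sketch that inspects the root element of an XML document to guess which kind of data it holds. This is not how FME's readers work internally; it is only a simplified illustration, and the `ROOT_TO_FORMAT` mapping and `guess_xml_format` helper are hypothetical names. The root element names used (`gpx`, `CityModel`, `LandXML`) are the ones those formats conventionally use.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup from an XML root element name to the format it suggests.
ROOT_TO_FORMAT = {
    "gpx": "GPX",
    "CityModel": "CityGML",
    "LandXML": "LandXML",
}


def guess_xml_format(xml_text):
    """Guess the data format of an XML document from its root element."""
    root = ET.fromstring(xml_text)
    # ElementTree reports namespaced tags as "{namespace}name"; keep only the name.
    local_name = root.tag.rsplit("}", 1)[-1]
    return ROOT_TO_FORMAT.get(local_name, "unknown")


gpx_doc = '<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1" creator="example"></gpx>'
land_doc = "<LandXML></LandXML>"
other_doc = "<note>hello</note>"

for doc in (gpx_doc, land_doc, other_doc):
    print(guess_xml_format(doc))
```

All three samples could be saved with the same .xml extension, yet each would need to be validated against a different set of structural rules.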

When you run your workflows, you’ll be notified in the reporting details if your data is invalid or if there are any other issues with your workflow. This information will help you retrace your steps and reconfigure your workflow to fix your data.

With FME you can ensure that your data is correct (contains no inconsistencies or errors), complete (there are no missing fields where a value is required), and compliant (meets the specifications of data model standards).

What is FME?

FME is recognized as the data integration platform with the best support for spatial data worldwide. However, it can handle much more than just spatial data. FME can help you integrate business data, 3D data, and applications all within the same platform. FME has a range of supportive data transformation tools called transformers that make it easy to integrate over 450 formats and applications. With FME you have the flexibility to transform and integrate exactly the way you want to.

Safe Software, the makers of FME, are leaders in the technology world that strive to stay one step ahead of data integration trends. FME is continuously upgraded to ensure it has been adapted to support new data formats, updated versions of existing data formats, and large amounts of data. Gone is the idea that individual departments must work in their data silos, with IT structures limiting the company’s potential to truly work as one. Data should be able to flow freely no matter where, when, or how it’s needed.
