noun • [day-tuh val-eh-day-shun] • the process of ensuring consistency and accuracy within a dataset
Data validation is an essential part of any data handling task, whether you’re in the field collecting information, analyzing data, or preparing to present data to stakeholders. If data isn’t accurate from the start, your results won’t be accurate either. That’s why it’s necessary to verify and validate data before it is used.
While data validation is a critical step in any data workflow, it’s often skipped. It may seem as if data validation slows down your pace of work; however, it is essential because it helps you produce the best results possible. These days, data validation can be a much quicker process than you might’ve thought. With data integration platforms that can incorporate and automate validation processes, validation can be treated as an essential ingredient of your workflow rather than an additional step.
Validating the accuracy, clarity, and details of data is necessary to mitigate project defects. Without validating data, you run the risk of basing decisions on imperfect data that does not accurately represent the situation at hand.
While verifying data inputs and values is important, it is also necessary to validate the data model itself. If the data model is not structured or built correctly, you will run into issues when trying to use data files in various applications and software.
Both the structure and content of data files will dictate what exactly you can do with data. Using validation rules to cleanse data before use helps to mitigate “garbage in = garbage out” scenarios. Ensuring the integrity of data helps to ensure the legitimacy of your conclusions.
Types of Data Validation
Validation Rules for Consistency
The most straightforward (and arguably the most essential) rules used in data validation are rules that ensure data integrity. You’re probably familiar with these types of practices. Spell check? Data validation. Minimum password length? Data validation.
Every organization will have its own rules for how data should be stored and maintained. Setting basic data validation rules will help your company uphold consistent standards that make working with data more efficient. Some other common examples of data validation rules that help maintain integrity and clarity include:
Data type (e.g. integer, float, string)
Range (e.g. a number between 35 and 40)
Uniqueness (e.g. postal code)
Consistent expressions (e.g. using only one of St., Str, Street)
No null values
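As a rough illustration, the five rule types listed above could be checked with a short Python script. This is a minimal sketch, not a definitive implementation: the records, the field names (id, reading, street), the 35 to 40 limits, and the check_records helper are all assumptions made up for the example.

```python
import re
from collections import Counter

# Hypothetical records; the field names and limits are illustrative only.
RECORDS = [
    {"id": 1, "reading": 36.5, "street": "12 Main Street"},
    {"id": 2, "reading": 41.0, "street": "7 Oak St."},
    {"id": 2, "reading": None, "street": "3 Elm Street"},
]

# Matches a trailing "St", "St.", "Str" or "Str." but not "Street".
ABBREVIATED_STREET = re.compile(r"\b(?:St|Str)\.?$")


def check_records(rows):
    """Return (row_index, message) pairs, one per rule violation."""
    problems = []
    id_counts = Counter(row.get("id") for row in rows)
    for i, row in enumerate(rows):
        # No null values: no field may hold None.
        for field, value in row.items():
            if value is None:
                problems.append((i, f"{field} is null"))

        reading = row.get("reading")
        if reading is not None:
            if not isinstance(reading, (int, float)):
                # Data type: the reading must be numeric.
                problems.append((i, "reading is not a number"))
            elif not 35 <= reading <= 40:
                # Range: the reading must fall between 35 and 40.
                problems.append((i, "reading is outside 35-40"))

        # Uniqueness: no two rows may share an id.
        if id_counts[row.get("id")] > 1:
            problems.append((i, "id is not unique"))

        # Consistent expressions: "Street" must be spelled out in full.
        street = row.get("street")
        if isinstance(street, str) and ABBREVIATED_STREET.search(street):
            problems.append((i, "street uses an abbreviation"))
    return problems


for index, message in check_records(RECORDS):
    print(f"row {index}: {message}")
```

With the sample records, the first row passes every rule, while the second and third rows are reported for the out-of-range reading, the duplicate id, the abbreviated street name, and the null reading.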
Validating the structure of data is just as important as validating the data itself. Doing so ensures that you are using the appropriate data model for formats that are compatible with the applications in which you want to use the data.
File formats and their standards are maintained by non-profit organizations, government departments, industry advisory panels, and private companies. These bodies continuously develop, document, and define the file structures that hold data.
When validating data, you should understand the standards and structure of the data model in which the dataset is stored. Failing to do so may result in files that are incompatible with the applications and other datasets with which you want to integrate that data.
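To make the idea of structural validation concrete, here is a small Python sketch that checks a CSV file's layout before looking at any values. The expected columns, the sample data, and the check_structure helper are hypothetical; a real data model standard would define its own schema and would usually need a dedicated validator.

```python
import csv
import io

# Hypothetical schema: column name -> converter that raises on bad input.
EXPECTED_COLUMNS = {
    "site_id": int,
    "latitude": float,
    "longitude": float,
    "name": str,
}

# Made-up sample data for the example.
SAMPLE = """site_id,latitude,longitude,name
1,49.28,-123.12,Site A
two,49.10,-122.66,Site B
"""


def check_structure(csv_text):
    """Return a list of structural issues found in CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    issues = [f"missing column: {c}" for c in EXPECTED_COLUMNS if c not in header]
    issues += [f"unexpected column: {c}" for c in header if c not in EXPECTED_COLUMNS]
    if issues:
        return issues  # the layout is wrong, so skip the value checks
    # Line 1 is the header; this assumes one record per line after it.
    for line_no, row in enumerate(reader, start=2):
        for column, convert in EXPECTED_COLUMNS.items():
            try:
                convert(row[column])
            except (TypeError, ValueError):
                issues.append(
                    f"line {line_no}: {column}={row[column]!r} "
                    f"is not a valid {convert.__name__}"
                )
    return issues


for issue in check_structure(SAMPLE):
    print(issue)
```

Checking the header first means a file with the wrong layout is rejected outright, rather than producing a long list of misleading value errors.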
How to Perform Data Validation
Validation by Scripts
Depending on your fluency in coding languages, writing a script may be an option for validating data. You can compare data values and structure against your defined rules to verify that all the necessary information is within the required quality parameters. Depending on the complexity and size of the data set you are validating, this method of data validation can be quite time-consuming.
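One way to keep such a script manageable is to store the rules as data and let a single loop apply them. The sketch below assumes rows are plain Python dictionaries; the rule labels, the field names (name, population), and the split_rows helper are invented for illustration.

```python
# Hypothetical rules: each pairs a label with a predicate over one row (a dict).
RULES = [
    ("name is present", lambda row: bool(row.get("name"))),
    (
        "population is a non-negative integer",
        lambda row: isinstance(row.get("population"), int)
        and row["population"] >= 0,
    ),
]


def split_rows(rows, rules=RULES):
    """Split rows into (passed, failed); failed rows carry the rules they broke."""
    passed, failed = [], []
    for row in rows:
        broken = [label for label, test in rules if not test(row)]
        if broken:
            failed.append((row, broken))
        else:
            passed.append(row)
    return passed, failed


# Made-up sample rows.
rows = [
    {"name": "Town A", "population": 1000},
    {"name": "", "population": 1200},
    {"name": "Town C", "population": -5},
]

passed, failed = split_rows(rows)
print(f"{len(passed)} passed, {len(failed)} failed")
for row, broken in failed:
    print(row, "->", "; ".join(broken))
```

Because the rules live in a list, adding a rule means adding one entry rather than editing the loop, which helps as the rule set grows.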
Validation by Programs
Many software programs can be used to perform data validation for you. This method of validation is very straightforward since these programs have been developed to understand your rules and the file structures you are working with. The ideal tool is one that lets you build validation into every step of your workflow, without requiring an in-depth understanding of the underlying format.
FME for Data Validation
Software like FME enables you to customize data validation workflows precisely for your needs. You can create workflows that are specific to data validation, or add data validation as a step within other data integration workflows. Additionally, you can automatically run any data validation workflow on a schedule (or on demand), which means you can build a workflow once and reuse it over and over.
To ensure that data is fit to serve its purpose most effectively, you can add validation-based “transformers” to your workflow. For example, FME’s GeometryValidator, AttributeValidator, and Tester transformers all help you verify that data is formatted and structured based on your specific data validation rules. These transformers can be used at the beginning of workflows to validate that the data you’re reading is correct, or at the end of a workflow to validate that data has been converted and transformed properly.
FME supports over 450 formats and applications through tools called readers and writers. Each reader and writer has been designed to understand the specific nature of its data format to aid in the validation process. Readers and writers go beyond recognizing a file extension; they also take into account how the format is being used. For example, not all .xml files are the same. You may be using XML to store data for CityGML, GPX, LandXML, or Microsoft MapPoint Web. Each of FME’s readers and writers will interpret the data by need, not just by format.
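To see why a file extension alone is not enough, the following Python sketch tells a few XML-based formats apart by looking at the root element. It is only an illustration of the idea and says nothing about how FME’s readers work internally; the identify_xml_format helper and the lookup table are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# Root element names for three of the XML-based formats named above.
ROOT_TO_FORMAT = {
    "gpx": "GPX",
    "CityModel": "CityGML",
    "LandXML": "LandXML",
}


def identify_xml_format(xml_text):
    """Guess the format of an XML document from its root element name."""
    root = ET.fromstring(xml_text)
    # ElementTree reports namespaced tags as "{namespace}local"; keep the local part.
    local_name = root.tag.rsplit("}", 1)[-1]
    return ROOT_TO_FORMAT.get(local_name, "unknown")


sample = (
    '<gpx xmlns="http://www.topografix.com/GPX/1/1" '
    'version="1.1" creator="example"></gpx>'
)
print(identify_xml_format(sample))  # prints "GPX"
```

A production check would go further, for example by validating the document against the schema published for that format.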
When you run your workflows, if data is invalid or there are any other issues with your workflow, you’ll be notified in the reporting details. This information will help you retrace your steps and reconfigure your workflow to fix the data.
With FME you can ensure that data is correct (contains no inconsistencies or errors), complete (there are no missing fields where a value is required), and compliant (meets the specifications of data model standards).
What is FME?
FME is recognized as the data integration platform with the best support for spatial data worldwide. However, it can handle much more than just spatial data. FME can help you integrate business data, 3D data, and applications all within the same platform. FME has a range of supportive data transformation tools called transformers that make it easy to integrate over 450 formats and applications. With FME you have the flexibility to transform and integrate exactly the way you want to.
Safe Software, the makers of FME, are leaders in the technology world that strive to stay one step ahead of data integration trends. FME is continuously upgraded to ensure it has been adapted to support new data formats, updated versions of existing data formats, and large amounts of data. Gone is the idea that individual departments must work in their data silos, with IT structures limiting the company’s potential to truly work as one. Data should be able to flow freely no matter where, when, or how it’s needed.