The process of ensuring the accuracy and quality of data
Written by Sebastian Taylor
Published April 15, 2021
Updated May 31, 2023
What is Data Validation?
Data validation refers to the process of ensuring the accuracy and quality of data. It is implemented by building several checks into a system or report to ensure the logical consistency of input and stored data.
In automated systems, data is entered with minimal or no human supervision. Therefore, it is necessary to ensure that the data that enters the system is correct and meets the desired quality standards. Data that is not entered properly is of little use and can create larger reporting issues downstream. Unstructured data, even if entered correctly, will incur related costs for cleaning, transforming, and storage.
Types of Data Validation
There are many types of data validation. Most data validation procedures will perform one or more of these checks to ensure that the data is correct before storing it in the database. Common types of data validation checks include:
1. Data Type Check
A data type check confirms that the data entered has the correct data type. For example, a field might only accept numeric data. If this is the case, then any data containing other characters such as letters or special symbols should be rejected by the system.
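A data type check of this kind can be sketched in a few lines of Python. This is only an illustration, assuming the field should hold digits only; the function name is_numeric_field is hypothetical and not part of any particular system.

```python
def is_numeric_field(value: str) -> bool:
    """Return True when the raw input consists only of the digits 0-9."""
    # isascii() rules out non-ASCII digit characters that isdigit() alone would accept.
    return value.isascii() and value.isdigit()


# Letters, special symbols, and empty input are all rejected.
for raw in ["12345", "12a45", "12-45", ""]:
    status = "accepted" if is_numeric_field(raw) else "rejected"
    print(f"{raw!r}: {status}")
```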
2. Code Check
A code check ensures that a field is selected from a valid list of values or follows certain formatting rules. For example, a postal code is easiest to verify by checking it against a list of valid codes. The same concept can be applied to other items such as country codes and NAICS industry codes.
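As a rough sketch, a code check amounts to a membership test against a reference list. The list of country codes below is a small hypothetical sample, and the function name is an assumption made for illustration; a real system would load the valid codes from a maintained source.

```python
# Hypothetical sample of valid codes, for illustration only.
VALID_COUNTRY_CODES = {"CA", "US", "MX", "GB", "FR"}


def is_valid_code(code: str, valid_codes=VALID_COUNTRY_CODES) -> bool:
    """Return True when the entered code appears in the list of valid codes."""
    # Normalise whitespace and case so that " ca " matches "CA".
    return code.strip().upper() in valid_codes


for entry in ["CA", " us ", "ZZ"]:
    print(entry, is_valid_code(entry))
```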
3. Range Check
A range check will verify whether input data falls within a predefined range. For example, latitude and longitude are commonly used in geographic data. A latitude value should be between -90 and 90, while a longitude value must be between -180 and 180. Any values out of this range are invalid.
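The latitude and longitude rule above translates directly into a pair of comparisons. The following is a minimal sketch; the function name is hypothetical, and the boundary values are treated as valid.

```python
def is_valid_coordinate(latitude: float, longitude: float) -> bool:
    """Range check: latitude in [-90, 90] and longitude in [-180, 180]."""
    return -90 <= latitude <= 90 and -180 <= longitude <= 180


print(is_valid_coordinate(44.65, -63.57))   # within both ranges
print(is_valid_coordinate(91.0, 0.0))       # latitude out of range
print(is_valid_coordinate(0.0, -180.5))     # longitude out of range
```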
4. Format Check
Many data types follow a certain predefined format. A common use case is date columns that are stored in a fixed format like “YYYY-MM-DD” or “DD-MM-YYYY.” A data validation procedure that ensures dates are in the proper format helps maintain consistency across data and through time.
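One possible way to sketch a date format check in Python uses the standard datetime module. The function name is hypothetical. Because strptime tolerates unpadded values such as "2023-5-31", the sketch formats the parsed date back and compares it with the input to enforce the fixed layout strictly.

```python
from datetime import datetime


def matches_date_format(text: str, fmt: str = "%Y-%m-%d") -> bool:
    """Return True when text is a real calendar date written exactly in fmt."""
    try:
        parsed = datetime.strptime(text, fmt)
    except ValueError:
        # Wrong layout or an impossible date such as 30 February.
        return False
    # Round-trip comparison rejects loosely formatted input like "2023-5-31".
    return parsed.strftime(fmt) == text


print(matches_date_format("2023-05-31"))              # YYYY-MM-DD
print(matches_date_format("31-05-2023"))              # wrong layout for the default
print(matches_date_format("31-05-2023", "%d-%m-%Y"))  # DD-MM-YYYY
```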
5. Consistency Check
A consistency check is a type of logical check that confirms the data has been entered in a logically consistent way. An example is checking if the delivery date is after the shipping date for a parcel.
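The parcel example can be sketched as a comparison between two date fields. The function name is hypothetical, and the sketch follows the wording above by requiring delivery strictly after shipping; a business that allows same-day delivery would use >= instead.

```python
from datetime import date


def dates_are_consistent(shipping_date: date, delivery_date: date) -> bool:
    """Consistency check: a parcel must be delivered after it is shipped."""
    return delivery_date > shipping_date


print(dates_are_consistent(date(2023, 5, 1), date(2023, 5, 4)))  # delivered later
print(dates_are_consistent(date(2023, 5, 4), date(2023, 5, 1)))  # delivered before shipping
```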
6. Uniqueness Check
Some data like IDs or e-mail addresses are unique by nature. A database should generally hold unique entries in these fields. A uniqueness check ensures that an item is not entered multiple times into a database.
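A uniqueness check can be sketched by counting how often each value occurs. The function name and the sample e-mail addresses are hypothetical, and treating addresses as case-insensitive is an assumption made for this illustration.

```python
from collections import Counter


def find_duplicates(values):
    """Return the normalised values that appear more than once."""
    # Assumption: entries differing only in case or surrounding spaces are the same item.
    counts = Counter(value.strip().lower() for value in values)
    return sorted(value for value, count in counts.items() if count > 1)


emails = ["a@example.com", "b@example.com", "A@example.com "]
print(find_duplicates(emails))
```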
Consider the example of a retailer that collects data on its stores but fails to create a proper check on the postal code. The oversight could make it difficult to leverage the data for information and business intelligence. Several problems can occur if the postal code is not entered or entered improperly.
It can be difficult to define the location of the store in some mapping software. A store postal code will also help generate insights about the neighborhood where the store is located. Without a check on the postal code, the data is more likely to lose its value. Further costs will result if the data needs to be recollected or the postal code needs to be manually entered.
A simple solution to the problem would be to put a check in place that ensures a valid postal code is entered. The solution could be a dropdown menu or an auto-complete form that allows the user to choose the postal code from a list of valid codes. Such a type of data validation is called a code validation or code check.
Data Validation in Excel
The following example is an introduction to data validation in Excel. The Data Validation button, found in the Data Tools section of the Data tab in the Excel ribbon, provides the user with different types of data validation checks based on the data type in the cell. It also allows the user to define custom validation checks using Excel formulas.
Data Entry Task
The example below illustrates a case of data entry, where the province must be entered for every store location. Since stores are only located in certain provinces, any incorrect entry should be caught.
It is accomplished in Excel using a two-fold data validation. First, the relevant provinces are incorporated into a drop-down menu that allows the user to select from a list of valid provinces.
Second, if the user inputs a wrong province by mistake, such as “NY” instead of “NS,” the system warns the user of the incorrect input.
Further, if the user ignores the warning, an analysis can be conducted using the data validation feature in Excel that identifies incorrect inputs.
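Outside Excel, the same after-the-fact audit can be approximated in Python. This is only a sketch of the idea, not Excel's feature: the set of provinces and the function name are hypothetical, and rows are numbered from 1 to resemble spreadsheet rows.

```python
# Hypothetical set of provinces in which stores are located.
VALID_PROVINCES = {"NS", "NB", "ON", "QC"}


def find_invalid_provinces(entries):
    """Return (row number, value) pairs for entries not in the valid list."""
    return [
        (row, value)
        for row, value in enumerate(entries, start=1)
        if value.strip().upper() not in VALID_PROVINCES
    ]


# "NY" typed instead of "NS" and a blank cell are both flagged.
column = ["NS", "ON", "NY", "QC", ""]
for row, value in find_invalid_provinces(column):
    print(f"Row {row}: invalid province {value!r}")
```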
Thank you for reading CFI’s guide to Data Validation.
Data validation is the practice of checking the integrity, accuracy and structure of data before it is used for a business operation. Data validation operation results can provide data used for data analytics, business intelligence or training a machine learning model.
What is data validation examples?
- Data Type. This rule ensures the data being entered has the correct data type as required by the field, for example, text. ...
- Code Check. ...
- Range. ...
- Consistent Expressions. ...
- Format. ...
- Uniqueness. ...
- No Null Values. ...
- Standards for Formatting.
- Detail Plan. It is the most critical step, to create the proper roadmap for it. ...
- Validate the Database. This is responsible for ensuring that all the applicable data is present from source to sink. ...
- Validate Data Formatting. ...
Data validation means checking the accuracy and quality of source data before using, importing or otherwise processing data. Different types of validation can be performed depending on destination constraints or objectives. Data validation is a form of data cleansing.
What is the main purpose of data validation?
The goal is to create data that is consistent, accurate and complete so as to prevent data loss and errors during a move.
What are the 4 types of validation?
- A) Prospective validation (or premarket validation)
- B) Retrospective validation.
- C) Concurrent validation.
- D) Revalidation.
Data Type Validation: This technique checks if the data entered into the system is of the correct data type, such as a string, integer, or date.
Range Validation: This technique checks if the data entered into the system falls within a specific range of values, such as a customer's age between 18 and 65 years old.
Which is the best approach to validate data?
The best way to ensure the high data quality of your datasets is to perform up-front data validation. Check the accuracy and completeness of collected data before you add it to your data warehouse. This will increase the time you need to integrate new data sources into your data warehouse.
How do you create Data Validation?
- Select the cells you want to validate.
- Click the Data tab.
- Click the Data Validation button.
- Click the Allow list arrow.
- Select the type of data you want to allow. Any value: No validation criteria applied. ...
- Specify the data validation rules. ...
- Click OK.
Data validation in Excel is a feature that allows you to control the type of data entered into your worksheet. For example, Excel data validation allows you to limit data entries to a selection from a dropdown list and to restrict certain data entries, such as dates or numbers outside of a predetermined range.
Data validation rules allow you to constrain the values that can be entered into a worksheet cell. You can define one or more data validation rules for your worksheet. Typically, you define a separate data validation rule for each column in your worksheet where you need to constrain user-entered values.
What is data validation in SQL?
Data validation is the method for checking the accuracy and quality of data. It is often performed prior to adding, updating, or processing data. Similarly, when we want to merge data from disparate sources we often talk of 'cleansing' the data, in other words validating it.
What is the difference between data verification and data validation?
Now that we understand the literal meaning of the two words, let's explore the difference between “data verification” and “data validation”. Data verification: to make sure that the data is accurate. Data validation: to make sure that the data is correct.
What are the 3 stages of process validation?
The 3 stages of process validation are 1) Process Design, 2) Process Qualification, and 3) Continued Process Verification. Current Good Manufacturing Practices (cGMP) come strongly into play when participating in pharmaceutical process validation activities. A number of them are legally enforceable requirements.
What are the 6 levels of validation?
- SIX LEVELS of VALIDATION.
- Level One: Stay Awake and Pay Attention.
- Level Two: Accurate Reflection.
- Level Three: Stating What Hasn't Been Said Out Loud (“the unarticulated”)
- Level Four: Validating Using Past History or Biology.
- Level Five: Normalizing.
- Level Six: Radical Genuineness.
- Set up a team and assign a leader to carry out the design of the validation. ...
- Determine the scope of the study. ...
- Design a sampling plan. ...
- Select a method of analysis. ...
- Establish acceptance criteria.
Data validation provides accuracy, cleanness, and completeness to the dataset by eliminating data errors from any project to ensure that the data is not corrupted. Data validation can be performed on any data, including data within a single application such as Excel, and it leads to better results.
What is data validation in ETL?
In simple terms, Data Validation is the act of validating the fact that the data that are moved as part of ETL or data migration jobs are consistent, accurate, and complete in the target production live systems to serve the business requirements.
What are the two key elements of validation?
- Conducting data analysis of collected data to identify conclusions, insights, and trends.
- Reporting analyses, observations, and potential COAs.
- Source system loopback verification: ...
- Ongoing source-to-source verification: ...
- Data-Issue tracking: ...
- Data certification: ...
- Statistics collection: ...
- Workflow management:
There are many statistical tools that can be used as part of validation. Control charts, capability studies, designed experiments, tolerance analysis, robust design methods, failure modes and effects analysis, sampling plans, and mistake proofing are but a few.
What are examples of data validation controls?
There are three types of data validation checks: (1) field checks, (2) record checks, and (3) file checks. Common field check controls include alphanumeric field tests, missing data (completeness) tests, range tests, limit tests, existence (validity) tests, and check-digit verification tests.
Why would you want to validate data on a spreadsheet?
You can use data validation to restrict the type of data or values that users enter into cells. For example, you might use data validation to calculate the maximum allowed value in a cell based on a value elsewhere in the workbook.
How do I create a data validation list from a table?
Select the cell in the worksheet where you want the drop-down list. Go to the Data tab on the Ribbon, then click Data Validation. On the Settings tab, in the Allow box, click List. If it's OK for people to leave the cell empty, check the Ignore blank box.
What are the 4 critical aspects of validation?
Validation determines if assessment tools have produced the intended evidence. Validators must look at the evidence in the sample, and determine if it is valid, reliable, sufficient, current and authentic.
What are simple validation rules?
A simple validation rule is based on a PredefinedGreexRule, which is used in conjunction with the value of a required attribute. The value is entered as DataCapture information in Sterling Business Center.
What is the process of validation?
Process validation is defined as documented verification that the manufacturing approach operated according to its specifications consistently generates a product complying with its predefined quality attributes and release specifications.
What are the 4 levels of validation in data visualization?
There are four nested levels of vis design: domain situation, task and data abstraction, visual encoding and interaction idiom, and algorithm.
How to use SQL for data validation?
- Click Tables on the Model menu. ...
- Select the table in the Navigation Grid for which you want to define validation rule usage. ...
- Click the Validation tab.
- Select the validation usage item in the grid that you want to define and work with the following options: ...
- Click Close.
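In SQL itself, validation rules are often expressed as column constraints that the database enforces on every insert or update. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the stores table, its columns, and the try_insert helper are hypothetical names chosen for illustration, not part of the tool described in the steps above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE stores (
        store_id  INTEGER PRIMARY KEY,
        email     TEXT NOT NULL UNIQUE,                                 -- presence + uniqueness
        latitude  REAL NOT NULL CHECK (latitude BETWEEN -90 AND 90),    -- range check
        longitude REAL NOT NULL CHECK (longitude BETWEEN -180 AND 180)  -- range check
    )
    """
)


def try_insert(row):
    """Insert (email, latitude, longitude); return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO stores (email, latitude, longitude) VALUES (?, ?, ?)",
                row,
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(("halifax@example.com", 44.65, -63.57)))  # valid row
print(try_insert(("bad-lat@example.com", 123.0, 0.0)))     # violates the latitude CHECK
print(try_insert(("halifax@example.com", 45.0, -64.0)))    # violates UNIQUE on email
```

Placing the rules in the table definition means every application that writes to the table is subject to the same checks.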
Field validation rules: Use a field validation rule to check the value that you enter in a field when you leave the field. For example, suppose you have a Date field, and you enter >=#01/01/2010# in the Validation Rule property of that field.
- presence check - a username must be entered.
- length check - a password must be at least eight characters long.
- range check - age restrictions may require the user's date of birth to be before a certain date.
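The three checks in this list can be combined in a single routine that collects every failure instead of stopping at the first. This is a minimal sketch; the function name, parameter names, and error messages are hypothetical.

```python
from datetime import date


def validate_signup(username, password, date_of_birth, latest_birth_date):
    """Run presence, length, and range checks; return a list of error messages."""
    errors = []
    # Presence check: a username must be entered.
    if not username.strip():
        errors.append("presence: a username must be entered")
    # Length check: a password must be at least eight characters long.
    if len(password) < 8:
        errors.append("length: password must be at least eight characters long")
    # Range check: the date of birth must be before the cutoff date.
    if not date_of_birth < latest_birth_date:
        errors.append("range: date of birth must be before the cutoff date")
    return errors


cutoff = date(2005, 1, 1)  # hypothetical cutoff for an age restriction
print(validate_signup("sam", "longenough", date(1990, 6, 1), cutoff))
print(validate_signup("", "short", date(2010, 6, 1), cutoff))
```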
Select the cell you want to validate. Go to the Data tab > Data tools, and click on the Data Validation button. A data validation dialogue box will appear having 3 tabs - Settings, Input Message, and Error Alerts.
What is validation testing with example?
Validation testing is the process of assessing a new software product to ensure that its performance matches consumer needs. Product development teams might perform validation testing to learn about the integrity of the product itself and its performance in different environments.
What are the 3 validation rules?
Validation rule and validation text examples
Value must be zero or greater. You must enter a positive number. Value must be either 0 or greater than 100.
- TYPES OF VALIDATIONS.
- 1) Prospective validation. It is the most common type of validation. ...
- 2) Retrospective validation. ...
- 3) Concurrent validation. ...
- 4) Revalidation (Periodic and After Change).
What is the difference between data verification and validation?
In other words, verification may take place as part of a recurring data quality process, whereas validation typically occurs when a record is initially created or updated. Verification plays an especially critical role when data is migrated or merged from outside data sources.