What is Data Integrity? Definition, Types & Tips
Learn about data integrity, data integrity vs. data security, and more in Data Protection 101, our series on the fundamentals of data protection.
What is Data Integrity?
Data integrity refers to the accuracy and consistency (validity) of data over its lifecycle. Compromised data, after all, is of little use to enterprises, not to mention the dangers presented by sensitive data loss. For this reason, maintaining data integrity is a core focus of many enterprise security solutions.
Data integrity can be compromised in several ways. Each time data is replicated or transferred, it should remain intact and unaltered between updates. Error checking methods and validation procedures are typically relied on to ensure the integrity of data that is transferred or reproduced without the intention of alteration.
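One common error checking method is to compare a checksum computed before a transfer with one computed afterward. The following is a minimal sketch, assuming SHA-256 from Python's standard `hashlib` module; the helper names `sha256_digest` and `is_intact` and the sample record are hypothetical and chosen only for illustration.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_intact(received: bytes, expected_digest: str) -> bool:
    """Check that received data still matches the digest recorded at the source."""
    return sha256_digest(received) == expected_digest


# Hypothetical record; the digest is computed before the data is transferred.
original = b"customer_id=1042;balance=250.00"
digest = sha256_digest(original)

altered = b"customer_id=1042;balance=950.00"

print(is_intact(original, digest))  # True: the copy is unaltered
print(is_intact(altered, digest))   # False: a single changed character is detected
```

A matching digest only shows that the data did not change between the two points where the digest was computed. It does not show that the data was accurate to begin with.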
DATA INTEGRITY AS A PROCESS AND AS A STATE
The term data integrity can cause confusion because it may refer either to a state or to a process. Data integrity as a state describes a data set that is both valid and accurate. Data integrity as a process, on the other hand, describes the measures used to ensure the validity and accuracy of a data set or of all data contained in a database or other construct. For instance, error checking and validation methods may be referred to as data integrity processes.
A CASE FOR DATA INTEGRITY
Maintaining data integrity is important for several reasons. For one, data integrity ensures recoverability and searchability, traceability (to origin), and connectivity. Protecting the validity and accuracy of data also increases stability and performance while improving reusability and maintainability.
Data increasingly drives enterprise decision-making, but it must undergo a variety of changes and processes to go from raw form to formats more practical for identifying relationships and facilitating informed decisions. Therefore, data integrity is a top priority for modern enterprises.
Types of Data Integrity
Data integrity can be compromised in a variety of ways, making data integrity practices an essential component of effective enterprise security protocols. Data integrity may be compromised through:
- Human action, whether malicious or unintentional
- Transfer errors, including unintended alterations or data compromise during transfer from one device to another
- Bugs, viruses/malware, hacking, and other cyber threats
- Compromised hardware, such as a device or disk crash
- Physical compromise to devices
Since data security can adequately prevent only some of these compromises, data backup and duplication become critical for ensuring data integrity. Other data integrity best practices include input validation to prevent invalid data from being entered, error detection and data validation to identify errors in data transmission, and security measures such as data loss prevention, access control, and data encryption.
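Input validation can be as simple as checking each field of a record before it is stored. The sketch below assumes a hypothetical employee record with three fields; the field names, the `ALLOWED_DEPARTMENTS` set, and the `validate_employee` function are illustrative assumptions rather than part of any particular product.

```python
# Hypothetical set of departments the system accepts.
ALLOWED_DEPARTMENTS = {"engineering", "finance", "sales"}


def validate_employee(record: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the record is acceptable."""
    errors = []

    number = record.get("employee_number")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if not isinstance(number, int) or isinstance(number, bool) or number <= 0:
        errors.append("employee_number must be a positive integer")

    name = record.get("name")
    if not isinstance(name, str) or not name.strip():
        errors.append("name must be a non-empty string")

    if record.get("department") not in ALLOWED_DEPARTMENTS:
        errors.append("department must be one of the allowed values")

    return errors


good = {"employee_number": 1001, "name": "Ana Ruiz", "department": "finance"}
bad = {"employee_number": -5, "name": "  ", "department": "unknown"}

print(validate_employee(good))  # no errors
print(validate_employee(bad))   # one error per invalid field
```

Returning every error at once, rather than stopping at the first, lets the person entering the data correct the whole record in one pass.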
DATA INTEGRITY FOR DATABASES
In the broad sense, data integrity describes the health and maintenance of any digital information. For many, the term is associated with database management. For databases, there are four types of data integrity.
- Entity Integrity: A database is organized into tables, rows, and columns, and each row in a table is identified by a primary key. A primary key should include as many columns as are needed to identify a row accurately, yet no more than necessary. No two rows should have the same primary key value, and no primary key value should be null. For example, a database of employees could use a specific “employee number” as its primary key, since names alone may repeat.
- Referential Integrity: A foreign key is a column (or set of columns) in one table that refers to the primary key of another table in the database. Foreign key values may be shared by many rows or, where the design allows it, may be null. For instance, several employees could share the same role or work in the same department.
- Domain Integrity: Every column in a database has a defined set of acceptable values, including whether nulls (e.g., N/A) are allowed. The domain integrity of a database refers to the common rules for how this data is entered and read. For instance, if a database stores monetary values in dollars and cents, a value with three decimal places will not be allowed.
- User-Defined Integrity: Users can define their own data and rules that fall outside of entity, referential, and domain integrity. If an employer creates a column to record corrective action taken with employees, this data would be classified as “user-defined.”
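The four types can be illustrated with database constraints. The following is a minimal sketch using Python's standard `sqlite3` module with an in-memory database; the table names, column names, and allowed corrective-action values are hypothetical, and other database systems express the same ideas with slightly different syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE department (
    department_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL UNIQUE
);
CREATE TABLE employee (
    -- Entity integrity: each row is identified by a unique employee number.
    employee_number   INTEGER PRIMARY KEY,
    name              TEXT NOT NULL,
    -- Referential integrity: must match an existing department, or be NULL.
    department_id     INTEGER REFERENCES department(department_id),
    -- Domain integrity: amounts are stored as whole cents and cannot be negative.
    salary_cents      INTEGER NOT NULL CHECK (salary_cents >= 0),
    -- User-defined integrity: a business rule chosen by the employer.
    corrective_action TEXT NOT NULL DEFAULT 'none'
        CHECK (corrective_action IN ('none', 'warning', 'suspension'))
);
""")


def try_insert(sql: str, params: tuple) -> str:
    """Run one insert and report whether the database accepted or rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


add_employee = "INSERT INTO employee VALUES (?, ?, ?, ?, ?)"

results = [
    try_insert("INSERT INTO department VALUES (?, ?)", (1, "Finance")),
    try_insert(add_employee, (1001, "Ana", 1, 5500000, "none")),    # valid row
    try_insert(add_employee, (1001, "Ben", 1, 4800000, "none")),    # duplicate key
    try_insert(add_employee, (1002, "Cy", 99, 4800000, "none")),    # no such department
    try_insert(add_employee, (1003, "Di", 1, -5, "none")),          # negative amount
    try_insert(add_employee, (1004, "Eli", 1, 4800000, "praise")),  # value not allowed
    try_insert(add_employee, (1005, "Fay", None, 4800000, "none")), # NULL department is allowed
]

for result in results:
    print(result)
```

In this sketch the first two inserts and the last one are accepted, while the four in between are each rejected by a different constraint. Declaring the rules in the schema means they are enforced for every application that writes to the database, not only for the one that happens to validate its input.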
DATA INTEGRITY VS. DATA SECURITY
Data integrity and data security are related terms, each playing an important role in the successful achievement of the other. Data security refers to the protection of data against unauthorized access or corruption and is necessary to ensure data integrity.
That said, data integrity is a desired result of data security, while the term data integrity refers only to the validity and accuracy of data rather than to the act of protecting it. Data security, in other words, is one of several measures that can be employed to maintain data integrity. Whether the cause is malicious intent or accidental compromise, data security plays an important role in maintaining data integrity.
For modern enterprises, data integrity is essential for the accuracy and efficiency of business processes as well as decision making. It’s also a central focus of many data security programs. Achieved through a variety of data protection methods, including backup and replication, database integrity constraints, validation processes, and other systems and protocols, data integrity is critical yet manageable for organizations today.
Frequently Asked Questions
What is an example of data integrity?
Data integrity is a concept that speaks to the maintenance and assurance of consistent and accurate data throughout its lifecycle. Data integrity is a crucial aspect of any system that handles and processes data. An example of data integrity can be seen in the processes and procedures an organization employs to ensure that customer data stored in a cloud database is accurate and free from corrupting influences such as malware or unauthorized access.
What are the five principles of data integrity?
The five principles of data integrity are known by the acronym ALCOA.
- Attributable - All data needs to be attributable to the individual who generates it and should include a timestamp indicating when an action regarding the data was performed.
- Legible - Recorded data needs to be permanent and legible.
- Contemporaneous - This principle states that all data must be recorded at the time an action is performed, not retroactively to fit expectations.
- Original - All data should be in its original form and should not be transferred to alternate formats.
- Accurate - Data should be accurate, truthful, complete, and free of errors.
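Some of these principles can be supported in software. The sketch below is one possible approach, not a standard implementation: each entry names its author (attributable), is timestamped when it is recorded (contemporaneous), and cannot be edited afterward (original). The `AuditEntry` and `AuditLog` names are hypothetical, and legibility and accuracy still depend on the people and processes producing the data.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class AuditEntry:
    user: str    # who performed the action
    action: str  # what was done
    value: str   # the recorded data
    # Timestamp is taken when the entry is created, not supplied later.
    recorded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class AuditLog:
    """Append-only log: entries can be added and read, but not edited or removed."""

    def __init__(self) -> None:
        self._entries: list[AuditEntry] = []

    def record(self, user: str, action: str, value: str) -> AuditEntry:
        if not user.strip():
            raise ValueError("every entry must name the person who made it")
        entry = AuditEntry(user, action, value)
        self._entries.append(entry)
        return entry

    def entries(self) -> tuple[AuditEntry, ...]:
        # A tuple is returned so callers cannot modify the underlying list.
        return tuple(self._entries)


log = AuditLog()
entry = log.record("jdoe", "measure", "pH=7.2")

try:
    entry.value = "pH=7.0"  # attempting to rewrite a recorded value
except AttributeError:
    print("entries cannot be changed after they are recorded")
```

A correction would be recorded as a new entry rather than as a change to the old one, which preserves the history of who recorded what and when.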
What are the four types of logical data integrity?
Data integrity can be categorized as physical or logical. Logical data integrity is further defined with four types.
- Entity integrity is enforced with primary keys that identify data as distinct entities and ensure elements are not duplicated.
- Referential integrity employs rules, typically foreign keys, that keep references between related tables valid so that the tables in a database can be combined reliably.
- Domain integrity is the collection of processes that ensure only acceptable values are stored in a database column.
- User-defined integrity allows users to define constraints that determine how data is used for a particular purpose.
What are the key elements of data integrity?
The three key elements of data integrity are completeness, accuracy, and consistency over the lifetime of the data. Data integrity cannot be ensured if one of these elements is missing.
What are 3 ways data integrity can be compromised?
Data integrity can be compromised in multiple ways, including these three:
- Malware or viruses that corrupt data resources
- Accidental or malicious human activity
- Errors during data transfer that are caused by incompatibility or inconsistent connectivity