I. Definition and Characteristics of Data
So, what exactly is data? Simply put, data is a representation of the objective world. It is the result obtained from facts or observations and the raw material formed by logically summarizing objective things. When we observe natural phenomena or record social events, the content of those observations and records is the prototype of data.
Data comes in various forms. It can take the form of continuous values, such as sound and images; we refer to these as analog data or measurement-type data. Imagine a melodious tune: it changes continuously along the time axis, making it a type of continuous data. Likewise, in the beautiful landscape photos we see, the color and light-and-shadow information is also continuous analog data.
Data can also be discrete, such as symbols and characters. Written language is an important tool for human communication and information recording: each character is independent, and when combined, characters convey specific meanings. This type of data is called numerical data or count-type data. In a computer system, data exists in the form of binary information units, 0 and 1. A computer is like a super translator that converts all kinds of real-world data into combinations of 0s and 1s, and then stores, processes, and transmits them.
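This translation into 0s and 1s can be made concrete with a minimal sketch: each character of a text is mapped to a numeric code, and that number is stored as a pattern of binary digits (bits). The helper name `to_bits` is illustrative, not part of any standard library.

```python
def to_bits(text: str) -> str:
    """Return the 8-bit binary pattern for each byte of the text."""
    # Each character is encoded to one or more bytes; format each
    # byte as an 8-digit binary string (the 0s and 1s a computer stores).
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))

print(to_bits("Hi"))  # → 01001000 01101001
```

The same principle extends to sound and images: analog signals are sampled and quantized into numbers, which are then stored as bit patterns in exactly this way.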
On March 30, 2020, the Communist Party of China Central Committee and the State Council released the "Opinions on Establishing a More Complete System and Mechanism for the Market-Oriented Allocation of Production Factors". This is the first central document on the market-oriented allocation of production factors, and it is of great significance. The document sets out the directions for reform in five key areas of production factors: land, labor, capital, technology, and data. Among them, data, as a new type of production factor, is listed alongside the traditional factors for the first time, which fully reflects the important position of data in today's economic and social development. Data is no longer just information for recording and storage; it has become an important driving force for economic development, like a new form of energy injecting vitality into various industries.
II. Connotation and Requirements of Data Quality
So, what is data quality? Data quality refers to the degree to which a set of inherent attributes of data meets the requirements of data consumers. Here, data consumers can be enterprises, scientific research institutions, government departments, etc., and they have different needs for data.
The inherent attributes of data mainly include authenticity, timeliness, and relevance. Authenticity means that data is a true reflection of the objective world. For example, in the medical field, patients' medical record data must be real and accurate so that doctors can make correct diagnoses and treatment plans based on this data. If the data is not authentic, it may lead to serious consequences.
Timeliness requires that data be updated promptly as objective circumstances change. In the financial market, stock price data needs to be updated in real time so that investors can make investment decisions based on the latest price information. If the data is not updated in a timely manner, investors may miss the optimal investment opportunities.
Relevance means that the data is what data consumers care about and need. For an e-commerce enterprise, sales-related data, such as product sales volume and user reviews, is what it focuses on. Such data can help the enterprise understand market demand and optimize its products and services.
High-quality data needs to meet the requirements of data consumers from multiple perspectives. In terms of availability, data consumers should be able to obtain the data smoothly when they need it. This is similar to a library, where readers can conveniently find and borrow a book when they need it.
Timeliness not only requires that data can be obtained in a timely manner but also that it is updated in a timely fashion. Integrity means that the data is complete without any omissions. In a financial statement, all revenue and expenditure items should be fully recorded without any missing entries. Otherwise, it will affect the accurate assessment of an enterprise's financial situation.
Security refers to ensuring the safety of data and preventing unauthorized access and manipulation. In today's digital age, data breaches occur frequently, which can bring huge losses to enterprises and individuals. Therefore, it is crucial to protect data security.
Comprehensibility requires that data can be understood and interpreted. If the data is too complex or lacks clear explanations, data consumers will not be able to extract useful information from it. Correctness, like authenticity, emphasizes that data is a true reflection of the real world, but it places more weight on the accuracy and reliability of the data.
Based on the above requirements for data quality, we need to evaluate the data to determine whether it meets the needs of consumers. This is the core content of data quality management.
III. Contents and Influencing Factors of Data Quality Management
Data quality management is the management of data throughout its entire lifecycle, from planning, acquisition, storage, sharing, maintenance, and application through to disposal. At each stage of this process, various data quality issues may arise, which call for a series of management activities such as identification, measurement, monitoring, and early warning. By improving the management level of the organization, we can further enhance data quality.
The assessment of data quality has multiple dimensions. Completeness refers to whether the data information is complete and whether there are any missing parts. For example, in an employee information form, if the contact information of some employees is missing, then there is a problem with the completeness of this data.
Normativity requires that records comply with the relevant specifications and be stored in the prescribed format. For example, under standard coding rules, each product has a unique code. If a code does not conform to the rules, it will lead to data chaos.
Consistency means that data should be logical, and there are reasonable logical relationships among single or multiple items of data. In a sales dataset, if the sales volume of a certain product suddenly shows an abnormal increase without a reasonable explanation, then there may be a problem of data inconsistency.
Accuracy measures whether data and information are incorrect or out of date. If the data in a market research report is from several years ago, it is inaccurate for current market analysis.
Timeliness refers to the time interval between the generation of data and its availability for viewing, which is also called the data latency. In a real-time monitoring system, the timeliness of data is very important. If the data latency is too long, the data will lose its value.
Uniqueness measures whether records, or particular attributes of records, are duplicated. In a customer information database, multiple identical customer records will lead to data redundancy and management chaos.
Rationality means judging whether data is correct from the perspective of business logic; in evaluating it, the practices used for normativity and consistency can serve as a reference. Redundancy concerns whether unnecessary duplication exists across multiple levels of data. Accessibility emphasizes whether the data is easy to obtain, understand, and use.
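Some of these dimensions lend themselves to simple automated checks. The sketch below, using hypothetical customer records (the field names are illustrative only), scores two of them: completeness as the fraction of records with no missing fields, and uniqueness as the fraction of distinct identifiers among all records.

```python
# Hypothetical customer records; field names and values are illustrative.
records = [
    {"id": 1, "name": "Alice", "phone": "555-0100"},
    {"id": 2, "name": "Bob",   "phone": None},        # missing field: completeness issue
    {"id": 1, "name": "Alice", "phone": "555-0100"},  # duplicate id: uniqueness issue
]

# Completeness: fraction of records with no missing (None) fields.
complete = [r for r in records if all(v is not None for v in r.values())]
completeness = len(complete) / len(records)

# Uniqueness: fraction of distinct ids among all records.
uniqueness = len({r["id"] for r in records}) / len(records)

print(f"completeness = {completeness:.2f}, uniqueness = {uniqueness:.2f}")
# → completeness = 0.67, uniqueness = 0.67
```

Dimensions such as rationality and consistency usually require business-specific rules on top of generic checks like these.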
The factors influencing data quality mainly come from four aspects. In terms of information factors, the causes of data quality problems include errors in describing and understanding metadata, the inability to guarantee the various properties of data measurement, and inappropriate update frequencies. For example, in a data warehouse, an inaccurate metadata description will lead to deviations in data usage and analysis.
Technical factors mainly refer to data quality problems caused by abnormalities in each technical link of specific data processing. Problems may occur in various links such as data creation, acquisition, transmission, loading, use, and maintenance. For example, during the data transmission process, if the network is unstable, it may lead to data loss or errors.
Process factors refer to data quality issues caused by improper settings of system operation processes and manual operation processes. All aspects of the system data, including the creation process, transfer process, loading process, usage process, maintenance process, and auditing process, need to be reasonably configured. If there is a lack of an auditing link in the data creation process, a large amount of incorrect data may be generated.
Management factors refer to data quality problems caused by reasons related to personnel quality and management mechanisms. Inadequate personnel training, improper personnel management, ineffective training, or inappropriate reward and punishment measures may all lead to management deficiencies or flaws. If employees have not received professional training, they may make mistakes during the data operation process.
Organizations can address data quality issues from the perspective of the Deming Cycle (PDCA). The PDCA cycle consists of four phases: Plan, Do, Check, and Act. Through continuous cycle improvement, data quality can be enhanced.
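The PDCA idea can be reduced to a toy sketch: plan a target for a quality score, check the measured score against it, and act by applying an improvement, repeating until the target is met. The function name and the integer percentage scores are purely illustrative, not a real methodology.

```python
def pdca_cycles(score: int, target: int, step: int) -> int:
    """Count how many Plan-Do-Check-Act rounds are needed to reach the target.

    Scores are illustrative integer percentages (e.g. a completeness score).
    """
    cycles = 0
    while score < target:   # Check: measure the score against the planned target
        score += step       # Do / Act: apply and lock in one round of improvement
        cycles += 1
    return cycles

print(pdca_cycles(score=70, target=95, step=5))  # → 5
```

The point of the cycle is not the arithmetic but the loop structure itself: improvements are verified against a plan, and the verified result feeds the next round.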
IV. Policy Significance and Quality Management Considerations of Data Elements
Data, as a new type of production factor, has been written into the "Opinions", which sends an important policy signal: the role of data in economic development is receiving increasing attention, and data will play a greater role in various fields in the future. How can we interpret this signal? One way is through the multiplier effect of the data factor on the efficiency of other factors. Data is like a catalyst that can improve the utilization efficiency of traditional factors such as land, labor, capital, and technology, and promote high-quality economic development.
How to make the allocation of data elements better regulated is also an important issue. This requires establishing a scientific and reasonable allocation mechanism for data elements to ensure that data can be distributed fairly and efficiently among different entities. And how can the big data trading market develop from nothing into something? This calls for effort on multiple fronts, such as improving relevant laws and regulations, establishing data trading platforms, and cultivating professional talent.
In the internationally recognized quality management standard ISO 9001, several clauses state that an organization should evaluate and analyze data to improve its processes. In ISO 9001:2015, the general principles state that applying the process approach in the quality management system enables an organization to understand and continuously meet requirements, consider processes from a value-added perspective, achieve effective process performance, and improve processes based on the evaluation of data and information. The clause on analysis and evaluation requires the organization to analyze and evaluate appropriate data and information obtained through monitoring and measurement, and to use the results to assess aspects such as the conformity of products and services, customer satisfaction, and the performance and effectiveness of the quality management system.
Finally, we leave you with a question to ponder: How can an organization make good use of quality data to improve its processes? This requires the organization to establish a comprehensive data management system, cultivate employees' data awareness, and apply scientific data analysis methods to extract valuable information from quality data, so as to continuously optimize the organization's processes and decision - making.