Data Cleaning, Normalization, and Enhancement
Data cleaning, normalization, and enhancement techniques aim to improve the quality of data sets. Quality can be measured along several dimensions; we define each of them below by referring to the concepts we have seen in the previous sections.
- Validity refers to whether the values in the data set are of acceptable data types (e.g., integer, fractional number, or text), fall within acceptable ranges (e.g., between 0 and 100), are from an approved list of options (e.g., "Approved" or "Rejected"), are non-empty, and so on.
- Consistency refers to whether there are contradictory entries within a single data set or across data sets (e.g., if the same customer identifier is associated with different values in an address column).
- Uniformity refers to whether the values found in records represent measurements in the same units (within the data set or across data sets).
- Accuracy refers to how well the values in each record represent the properties of the real-world object to which the record corresponds. In general, improving accuracy requires some external reference against which the data can be compared.
- Completeness refers to whether there are any missing values in the records. Missing data is very difficult to replace without going back and collecting it again; however, it is possible to introduce new values (such as "Unknown") as placeholders that reflect the fact that information is missing.
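Three of these dimensions (validity, consistency, and completeness) can be checked mechanically, and a small sketch may make the definitions more concrete. The example below uses plain Python; the sample records, the field names (`customer_id`, `address`, `score`, `status`), and the helper functions are all hypothetical and chosen only to mirror the examples in the list above. Accuracy and uniformity are left out because they depend on an external reference or on knowledge of the units involved.

```python
from collections import defaultdict

# Hypothetical sample records; field names and values are illustrative only.
records = [
    {"customer_id": 1, "address": "12 Oak St", "score": 87, "status": "Approved"},
    {"customer_id": 1, "address": "99 Elm Ave", "score": 140, "status": "Approved"},
    {"customer_id": 2, "address": None, "score": 55, "status": "Pending"},
]

# Assumed approved list of options, taken from the validity example above.
ALLOWED_STATUSES = {"Approved", "Rejected"}


def validity_errors(record):
    """Validity: check data type, acceptable range, and approved options."""
    errors = []
    score = record.get("score")
    if not isinstance(score, int):
        errors.append("score is not an integer")
    elif not 0 <= score <= 100:
        errors.append("score is outside 0-100")
    if record.get("status") not in ALLOWED_STATUSES:
        errors.append("status is not an approved option")
    return errors


def inconsistent_ids(rows, key="customer_id", field="address"):
    """Consistency: find keys associated with more than one distinct value."""
    seen = defaultdict(set)
    for row in rows:
        if row.get(field) is not None:
            seen[row[key]].add(row[field])
    return sorted(k for k, values in seen.items() if len(values) > 1)


def fill_missing(rows, field, placeholder="Unknown"):
    """Completeness: return copies with missing values replaced by a placeholder."""
    return [
        {**row, field: placeholder if row.get(field) is None else row[field]}
        for row in rows
    ]


for record in records:
    print(record["customer_id"], validity_errors(record))
print("Inconsistent customer ids:", inconsistent_ids(records))
print("Filled address:", fill_missing(records, "address")[2]["address"])
```

Note that `fill_missing` returns new records rather than modifying the originals, so the placeholder marks the gap without overwriting the evidence that the value was missing in the source data.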