Sunday, December 20, 2020

Common Data Transformations

Forms of common data transformation

The basic data transformation types are enumerated and discussed in detail below:

Union transformations take two data sets as their input and produces an output data set that contains all entries found in both data sets. The output data set must have at least as many records as one of the two input data sets.

Intersection transformations take two data sets as their input and produces an output data set that contains only those entries found in both input data sets. The output data set must have at most as many records as one of the two input data sets.

Difference transformations take two data sets as their input and produce a data set that contains only those records found in the first data set but not the second data set. The output data set must have at most as many records as the number of records in the first input data set.

Selection transformations involve extracting some portion of the data based on zero or more filtering conditions or criteria. A selection transformation might return the entire original data set (e.g., if the criteria are already satisfied by all the records in the input data set), but it cannot return a result that is larger than the original data set.

A filtering condition within a selection transformation usually consists of a logical expression that is either true or false for each record. The condition can reference the values found in each record using their corresponding attribute/column names; it can also contain arithmetic operators (addition, subtraction, multiplication, division, and so on), relational operators (equality, comparison, and so on), and logical operations (such as "and" and "or"). In some database management systems, more complex conditions can be defined (e.g., ones that do text search).

Projection transformations involve converting every record in a data set in some way to produce a new data set. A projection transformation always produces the same number of records in the output data set as there were in the input data set. The conversion itself might throw away some attributes/columns or might introduce new ones. The definition of a projection transformation can use arithmetic and other operations to transform the values inside the input data set's records into the values within the records of the output data set.

Renaming transformations simply rename one or more of the attributes/columns. They are usually combined with projection and selection transformations so that the output data sets can have informative attribute/column names.

The advanced data transformation types are enumerated and discussed in detail below for your reference and review.

Aggregation transformations involve combining the values within a particular attribute/column across all records in a data setExamples of tasks for which this may be useful include counting the number of records in a data set, taking the sum of all values in a column, finding the maximum value across all values in a column, and so on. In its basic form, an aggregation transformation produces a data set with exactly one record.

In some languages and database management systems, it is possible to group the records using a particular attribute (which we call the grouping attribute) when performing an aggregation. In this case, the aggregation operation is only applied to those collections of records that have the same grouping attribute. In this case, the number of records in the output data set corresponds to the number of unique values found in the grouping attribute/column.

Join transformations take two input data sets and return their Cartesian product. Thus, the number of entries in the output data set may be larger (even significantly larger) than the number of entries in each of the two input data sets. It is common to combine join transformations with selection transformations in order to pair corresponding records using their identifiers (or other attributes) even if the records are found in different data sets. One example of this might be matching all purchase records in a purchases data set with all customer records in a customers data set.

 

No comments:

Post a Comment

Please keep your comments relevant.
Comments with external links and adult words will be filtered.