This page explains how to perform numeric calculations when you prepare data in the Wrangler workspace of the Cloud Data Fusion Studio.
You can perform these transformations in columns of the following data types:
- Integer
- Decimal
- Double
- Float
- Long
- Short
For more information, see the set-column
directive.
You can use numeric functions on one or more columns. The available numeric functions vary by the number of columns you select. You can choose to create a new column with the results of the calculations, or you can have the results appear in the column where you add the transformation.
If you apply a numeric function to a single column, the behavior is different than if you apply a numeric function to multiple columns. For example:
- If you use the
Multiply()
function on a single column, you specify the decimal value to multiply each value in the column by. Wrangler performs the multiplication on the sample data and displays the new values in the same column or in a new column. - If you use the
Multiply()
function on multiple columns, Wrangler multiplies the values in each row for the selected columns and displays the new values in the first column of the transformation.
Perform numeric calculations on values in one column
To apply a numeric calculation to one column, follow these steps:
- Go to the Wrangler workspace in Cloud Data Fusion.
- On the Data tab, go to a column name and click the arrow_drop_down expander arrow.
Select Calculate and select an option—for example, Add.
To complete the calculation, some functions require you to enter a decimal value. For example, if you select Subtract, you must enter the value to subtract from each row.
Optional: to create a new column for the calculated values, select the Copy to a new column checkbox.
Click Apply.
The values change based on the calculation. Wrangler adds the corresponding
directive as a step in the recipe. For example, if you subtract two from each
value in the Price
column, Wrangler adds the following transformation to the
recipe:
set-column :Price Price - 2
When you run the data pipeline, the transformation is applied to all values in the column.
Supported calculations for one column
You can perform the following calculations on all values in one column:
Function | Description |
---|---|
Absolute value | Returns the absolute value for each value in a column. For example, the absolute value of -10 is 10. |
Add | Adds a positive or negative decimal to a column or adds multiple columns to output the sum of each row. |
Arccos | Returns the inverse of cosine for each value in a column. |
Arcsin | Returns the inverse of sin for each value in a column. |
Arctan | Returns the inverse of tangent for each value in a column. |
Ceil | Returns the smallest integer greater than or equal to the numeric value in each cell of a column. For example, if the value is 2.21, Ceil returns 3. If the value is 5.88, Ceil returns 6. If the value is -5.15, Ceil returns -5. |
Cos | Returns cosine of each column value. |
Cube | Raises each value in the column to the power of 3. |
Cube root | Returns the cube root of each value in the column. |
Divide | Divides each value in the column by a positive or negative decimal or divides each value in two columns. |
Floor | Returns the largest integer greater than or equal to the numeric value in each cell of a column. For example, if the value is 2.21, FLOOR returns 2. If the value is 5.88, FLOOR returns 5. Likewise, if the value is -5.15, FLOOR returns -6. |
Log | Returns the logarithm of each value in a column. |
Modulo | Returns the remainder when dividing each value in the column by a positive decimal or returns the remainder when dividing each value in two columns. |
Multiply | Multiplies each value in the column by a positive or negative decimal or multiplies each value in two columns. |
Natural Log | Returns the natural logarithm of each value in a column. |
Power of | Raises the column value to the specified power of value. |
Random | Generates random, unpredictable Double values in a specified column. |
Round | Rounds each value in a column. |
Sin | Returns the sin for each value in a column. |
Square | Raises each value in the column to the power of 2. |
Square root | Returns the square root of each value in a column. |
Substract | Subtracts a positive or negative decimal to each value in the column. |
Tan | Returns the tangent for each value in a column. |
Perform numeric calculations on values in two columns
To apply a numeric calculation to two columns, follow these steps:
- Go to the Wrangler workspace in Cloud Data Fusion.
- On the Data tab, select the checkbox by two column names.
- Go to a column name and click the arrow_drop_down expander arrow.
- Select Calculate and select an option—for example, Add.
- Optional: to create a new column for the calculated values, select the Copy to a new column checkbox. Otherwise, the calculated values override existing values.
- Click Apply.
The values change based on the calculation. Wrangler adds the corresponding
directive as a step in the recipe. For example, if you add the values in each row
of the Q1_Sales
and Q2_Sales
columns and create a new column called
H1_Sales
, Wrangler adds the following transformation to the recipe:
set-column :H1_Sales arithmetic:add(Q1_Sales, Q2_Sales)
When you run the data pipeline, Wrangler performs the transformation and creates
a new column called H1_Sales
with the total of Q1_Sales
and Q2_Sales
.
Supported calculations in two columns
You can perform the following numeric calculations on values in each row in two columns:
Function | Description |
---|---|
Add | Adds a positive or negative decimal to a column or adds multiple columns to output the sum of each row. |
Average | Returns the average of each row in multiple columns. |
Divide | Divides each value in the column by a positive or negative decimal or divides each value in two columns. |
Equal | Boolean function that compares two columns to see if values match, row
by row. Returns true or false . |
LCM | Returns the least common multiple of each number in two columns. |
Max | Returns the maximum value for each row in two columns. |
Min | Returns the minimum value for each row in two columns. |
Modulo | Returns the remainder when dividing each value in the column by a positive decimal or returns the remainder when dividing each value in two columns. |
Multiply | Multiplies each value in the column by a positive or negative decimal or multiplies each value in two columns. |
Subtract | Subtracts a positive or negative decimal to each value in the column. |
Perform numeric calculations on values in three or more columns
Cloud Data Fusion supports performing numeric calculations on values in three or more columns in version 6.8.0 and later.
To apply a numeric calculation to three or more columns, follow these steps:
- Go to the Wrangler workspace in Cloud Data Fusion.
- On the Data tab, select the checkbox by two column names.
- Go to a column name and click the arrow_drop_down expander arrow.
- Select Calculate and select an option—for example, Add.
- Optional: to create a new column for the calculated values, select the Copy to a new column checkbox. Otherwise, the calculated values override existing values.
- Click Apply.
The values change based on the calculation. Wrangler adds the corresponding
directive as a step in the recipe. For example, if you add the values in each row
of the Q1_Sales
, Q2_Sales
, Q3_Sales
, and Q4_Sales
columns and create a
new column called 2022_Sales
, Wrangler adds the following transformation to
the recipe:
set-column :2022_Sales arithmetic:add(Q1_Sales, Q2_Sales, Q3_Sales, Q4_Sales)
When you run the data pipeline, Wrangler performs the transformation and creates
a new column called 2022_Sales
with the total of Q1_Sales
, Q2_Sales
,
Q3_Sales
, and Q4_Sales
.
Supported calculations in three or more columns
You can perform the following calculations on values in each row in three or more columns:
Function | Description |
---|---|
Add | Adds a positive or negative decimal to a column or adds multiple columns to output the sum of each row. |
Average | Returns the average of each row in multiple columns. |
Equal | Boolean function that compares two columns to see if values match, row
by row. Returns true or false . |
Max | Returns the maximum value for each row in two columns. |
Min | Returns the minimum value for each row in two columns. |
Multiply | Multiplies each value in the column by a positive or negative decimal or multiplies each value in two columns. |
What's next
- Learn more about Wrangler directives.