DataProfileResult defines the output of a DataProfileScan. Each field of the table has a profile result specific to its field type.
JSON representation |
---|
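{ "rowCount": string, "profile": { object (Profile) }, "scannedData": { object (ScannedData) }, "postScanActionsResult": { object (PostScanActionsResult) } } |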
Fields | |
---|---|
rowCount | The count of rows scanned. |
profile | The profile information per field. |
scannedData | The data scanned for this result. |
postScanActionsResult | Output only. The result of post scan actions. |
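As a minimal sketch of reading this structure: the JSON representation types `rowCount` as a string, so a client would convert it before doing arithmetic. The payload below is hand-written for illustration (it is not real API output), and `summarize_result` is a hypothetical helper name.

```python
import json

# Hypothetical payload shaped like the JSON representation above.
# All values, including the "type" strings, are made up for illustration.
payload = json.loads("""
{
  "rowCount": "1200",
  "profile": {
    "fields": [
      {"name": "id", "type": "INTEGER", "profile": {"nullRatio": 0.0}},
      {"name": "email", "type": "STRING", "profile": {"nullRatio": 0.25}}
    ]
  }
}
""")


def summarize_result(result: dict) -> dict:
    """Return the row count as an int plus the names of the profiled fields."""
    # rowCount arrives as a string, so convert it explicitly.
    row_count = int(result.get("rowCount", "0"))
    fields = result.get("profile", {}).get("fields", [])
    return {"row_count": row_count, "field_names": [f["name"] for f in fields]}


summary = summarize_result(payload)
print(summary)
```

Using `.get` with defaults keeps the helper tolerant of fields that are absent from a given result.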
Profile
Contains name, type, mode and field type specific profile information.
JSON representation |
---|
{ "fields": [ { object (Field) } ] } |
Fields | |
---|---|
fields[] | List of fields with structural and profile information for each field. |
Field
A field within a table.
JSON representation |
---|
{ "name": string, "type": string, "mode": string, "profile": { object (ProfileInfo) } } |
Fields | |
---|---|
name | The name of the field. |
type | The data type retrieved from the schema of the data source. For instance, for a BigQuery native table, it is the BigQuery Table Schema. For a Dataplex Entity, it is the Entity Schema. |
mode | The mode of the field. Possible values include: |
profile | Profile information for the corresponding field. |
ProfileInfo
The profile information for each field type.
JSON representation |
---|
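{ "nullRatio": number, "distinctRatio": number, "topNValues": [ { object (TopNValue) } ], "stringProfile": { object (StringFieldInfo) }, "integerProfile": { object (IntegerFieldInfo) }, "doubleProfile": { object (DoubleFieldInfo) } } |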
Fields | |
---|---|
nullRatio | Ratio of rows with a null value to the total number of scanned rows. |
distinctRatio | Ratio of rows with distinct values to the total number of scanned rows. Not available for complex, non-groupable field types, including RECORD, ARRAY, GEOGRAPHY, and JSON, or for fields with REPEATABLE mode. |
topNValues[] | The list of the top N non-null values, with the frequency and ratio at which they occur in the scanned data. N is 10 or the number of distinct values in the field, whichever is smaller. Not available for complex, non-groupable field types, including RECORD, ARRAY, GEOGRAPHY, and JSON, or for fields with REPEATABLE mode. |
Union field field_info. Structural and profile information for a specific field type. Not available if mode is REPEATABLE. field_info can be only one of the following: | |
stringProfile | String type field information. |
integerProfile | Integer type field information. |
doubleProfile | Double type field information. |
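Because `field_info` is a union, at most one of `stringProfile`, `integerProfile`, and `doubleProfile` is expected on a given ProfileInfo. The following sketch shows one way a client might pick out whichever member is set. `pick_field_info` is a hypothetical helper, and the sample objects are made up.

```python
from typing import Optional, Tuple

# Member keys of the field_info union, as listed in the table above.
FIELD_INFO_KEYS = ("stringProfile", "integerProfile", "doubleProfile")


def pick_field_info(profile_info: dict) -> Optional[Tuple[str, dict]]:
    """Return (key, value) for the union member that is set, or None if absent."""
    present = [key for key in FIELD_INFO_KEYS if key in profile_info]
    if len(present) > 1:
        raise ValueError(f"field_info is a union, but several members are set: {present}")
    if not present:
        return None
    return present[0], profile_info[present[0]]


# Made-up ProfileInfo objects for illustration.
string_field = {
    "nullRatio": 0.25,
    "distinctRatio": 0.9,
    "stringProfile": {"minLength": "3", "maxLength": "40", "averageLength": 17.5},
}
field_without_info = {"nullRatio": 0.0}

print(pick_field_info(string_field))
print(pick_field_info(field_without_info))
```

Returning `None` rather than raising covers the documented case where `field_info` is not available for a field.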
TopNValue
Top N non-null values in the scanned data.
JSON representation |
---|
{ "value": string, "count": string, "ratio": number } |
Fields | |
---|---|
value | String value of a top N non-null value. |
count | Count of the corresponding value in the scanned data. |
ratio | Ratio of the corresponding value in the field against the total number of rows in the scanned data. |
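The relationship between `count`, `ratio`, and the size of the list can be sketched as a consistency check. This assumes that the "total number of rows in the scanned data" is known to the caller as `row_count` and that the number of distinct values is known as `distinct_count`. Both parameter names, the helper `check_top_n`, and the sample entries are illustrative only.

```python
import math


def check_top_n(top_n_values: list, row_count: int, distinct_count: int) -> bool:
    """Check list length and per-entry ratio against the documented definitions."""
    # N is 10 or the number of distinct values, whichever is smaller.
    if len(top_n_values) != min(10, distinct_count):
        return False
    for entry in top_n_values:
        # count is typed as a string in the JSON representation.
        expected_ratio = int(entry["count"]) / row_count
        if not math.isclose(entry["ratio"], expected_ratio, rel_tol=1e-6):
            return False
    return True


# Made-up entries for a field with three distinct values over 200 scanned rows.
sample = [
    {"value": "US", "count": "100", "ratio": 0.5},
    {"value": "DE", "count": "60", "ratio": 0.3},
    {"value": "FR", "count": "40", "ratio": 0.2},
]

print(check_top_n(sample, row_count=200, distinct_count=3))
```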
StringFieldInfo
The profile information for a string type field.
JSON representation |
---|
{ "minLength": string, "maxLength": string, "averageLength": number } |
Fields | |
---|---|
minLength | Minimum length of non-null values in the scanned data. |
maxLength | Maximum length of non-null values in the scanned data. |
averageLength | Average length of non-null values in the scanned data. |
IntegerFieldInfo
The profile information for an integer type field.
JSON representation |
---|
{ "average": number, "standardDeviation": number, "min": string, "quartiles": [ string ], "max": string } |
Fields | |
---|---|
average | Average of non-null values in the scanned data. NaN, if the field has a NaN. |
standardDeviation | Standard deviation of non-null values in the scanned data. NaN, if the field has a NaN. |
min | Minimum of non-null values in the scanned data. NaN, if the field has a NaN. |
quartiles[] | An ordered list of approximate quartile values for the scanned data, in the order Q1, median, Q3. Quartiles divide the data points into four parts of roughly equal size. The first quartile (Q1) separates the lowest 25% of the data from the highest 75%. It is also known as the lower or 25th empirical quartile, because 25% of the data lies below this point. The second quartile (Q2) is the median of the data set, so 50% of the data lies below this point. The third quartile (Q3) separates the highest 25% of the data from the lowest 75%. It is also known as the upper or 75th empirical quartile, because 75% of the data lies below this point. |
max | Maximum of non-null values in the scanned data. NaN, if the field has a NaN. |
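In the JSON representation above, `min`, `max`, and the entries of `quartiles` are typed as strings, while `average` and `standardDeviation` are numbers. A minimal sketch of converting them and deriving the interquartile range follows. It assumes none of the values is NaN. `integer_summary` is a hypothetical helper, and the sample object is made up.

```python
def integer_summary(info: dict) -> dict:
    """Convert the string-typed members of an IntegerFieldInfo to ints."""
    # quartiles is ordered Q1, median, Q3.
    q1, median, q3 = (int(q) for q in info["quartiles"])
    return {
        "min": int(info["min"]),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": int(info["max"]),
        "iqr": q3 - q1,  # interquartile range: spread of the middle 50% of the data
    }


# Made-up IntegerFieldInfo for illustration.
sample_info = {
    "average": 42.5,
    "standardDeviation": 10.0,
    "min": "1",
    "quartiles": ["20", "40", "65"],
    "max": "99",
}

print(integer_summary(sample_info))
```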
DoubleFieldInfo
The profile information for a double type field.
JSON representation |
---|
{ "average": number, "standardDeviation": number, "min": number, "quartiles": [ number ], "max": number } |
Fields | |
---|---|
average | Average of non-null values in the scanned data. NaN, if the field has a NaN. |
standardDeviation | Standard deviation of non-null values in the scanned data. NaN, if the field has a NaN. |
min | Minimum of non-null values in the scanned data. NaN, if the field has a NaN. |
quartiles[] | An ordered list of quartile values for the scanned data, in the order Q1, median, Q3. Quartiles divide the data points into four parts of roughly equal size. The first quartile (Q1) separates the lowest 25% of the data from the highest 75%. It is also known as the lower or 25th empirical quartile, because 25% of the data lies below this point. The second quartile (Q2) is the median of the data set, so 50% of the data lies below this point. The third quartile (Q3) separates the highest 25% of the data from the lowest 75%. It is also known as the upper or 75th empirical quartile, because 75% of the data lies below this point. |
max | Maximum of non-null values in the scanned data. NaN, if the field has a NaN. |
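Since any of these statistics may be NaN, a client may want to filter them before use. The sketch below is one possible approach. How NaN is encoded on the wire is not stated here, so the code assumes it may arrive either as a float NaN or as the string "NaN" and accepts both. `is_usable` and `usable_stats` are hypothetical helpers, and the sample objects are made up.

```python
import math


def is_usable(value) -> bool:
    """True if value converts to a finite float (so not NaN, not infinite)."""
    try:
        return math.isfinite(float(value))
    except (TypeError, ValueError):
        return False


def usable_stats(info: dict) -> dict:
    """Keep only the scalar statistics that are safe to use in arithmetic."""
    return {
        key: float(value)
        for key, value in info.items()
        if key != "quartiles" and is_usable(value)
    }


# Made-up DoubleFieldInfo objects; the NaN encodings are assumptions.
clean = {"average": 2.5, "standardDeviation": 1.0, "min": 0.5, "max": 4.0}
with_nan = {"average": "NaN", "standardDeviation": float("nan"), "min": 0.5, "max": 4.0}

print(usable_stats(clean))
print(usable_stats(with_nan))
```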
PostScanActionsResult
The result of the post scan actions of a DataProfileScan job.
JSON representation |
---|
{ "bigqueryExportResult": { object (BigQueryExportResult) } } |
Fields | |
---|---|
bigqueryExportResult | Output only. The result of BigQuery export post scan action. |
BigQueryExportResult
The result of BigQuery export post scan action.
JSON representation |
---|
{ "state": enum (State), "message": string } |
Fields | |
---|---|
state | Output only. Execution state of the BigQuery export. |
message | Output only. Additional information about the BigQuery export. |
State
Execution state of the export.
Enums | |
---|---|
STATE_UNSPECIFIED | The export state is unspecified. |
SUCCEEDED | The export completed successfully. |
FAILED | The export is no longer running due to an error. |
SKIPPED | The export was skipped because there was no valid scan result to export (usually because the scan failed). |
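A client might branch on these states roughly as follows. This is a sketch under stated assumptions, not a prescribed pattern: `ExportState` and `export_status` are hypothetical names, the enum values are assumed to appear in JSON as their names, and the sample message text is invented.

```python
from enum import Enum


class ExportState(Enum):
    # Mirrors the State enum values listed above.
    STATE_UNSPECIFIED = "STATE_UNSPECIFIED"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    SKIPPED = "SKIPPED"


def export_status(result: dict) -> str:
    """Summarize the BigQuery export outcome of a DataProfileResult-shaped dict."""
    export = result.get("postScanActionsResult", {}).get("bigqueryExportResult", {})
    # Treat a missing state as unspecified.
    state = ExportState(export.get("state", "STATE_UNSPECIFIED"))
    message = export.get("message", "")
    if state is ExportState.SUCCEEDED:
        return "export succeeded"
    if state is ExportState.FAILED:
        return f"export failed: {message}"
    if state is ExportState.SKIPPED:
        return f"export skipped: {message}"
    return "export state unknown"


# Made-up result for illustration; the message text is invented.
failed_result = {
    "postScanActionsResult": {
        "bigqueryExportResult": {"state": "FAILED", "message": "example error text"}
    }
}

print(export_status(failed_result))
print(export_status({}))
```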