An expression is required to determine the scope of a data
quality monitoring rule.
Functions can be combined using the boolean operators AND, OR, and NOT, with
parentheses to indicate precedence. Functions are based on historical
statistics gathered by running the SHOW STATS query on the table. Expressions
are case insensitive.
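
To illustrate why parentheses matter when combining functions, the following Python sketch mirrors a compound rule with plain boolean logic. It is not the rule engine itself: the `stats` dictionary, its values, and the two helper functions are hypothetical stand-ins, with keys named after the `row_count` and `nulls_fraction` statistics.

```python
# Hypothetical statistics snapshot. The values are invented for illustration;
# the keys are named after the row_count and nulls_fraction statistics.
stats = {
    "row_count": 12,
    "nulls_fraction": {"temperature": 0.7, "humidity": 0.1},
}


def row_count_min(minimum):
    """Stand-in for row_count_min: the table has at least `minimum` rows."""
    return stats["row_count"] >= minimum


def nulls_fraction_min(column, minimum):
    """Stand-in for nulls_fraction_min: NULL fraction is at least `minimum`."""
    return stats["nulls_fraction"][column] >= minimum


# Mirrors: nulls_fraction_min("temperature", 0.6)
#          OR (row_count_min(7) AND nulls_fraction_min("humidity", 0.3))
as_written = nulls_fraction_min("temperature", 0.6) or (
    row_count_min(7) and nulls_fraction_min("humidity", 0.3)
)

# Same functions, different grouping:
#          (nulls_fraction_min("temperature", 0.6) OR row_count_min(7))
#          AND nulls_fraction_min("humidity", 0.3)
regrouped = (
    nulls_fraction_min("temperature", 0.6) or row_count_min(7)
) and nulls_fraction_min("humidity", 0.3)

print(as_written, regrouped)
```

With these invented values the two groupings disagree, so the placement of parentheses changes whether the rule passes.
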
The following table contains examples of valid data quality rule expressions:
| Expression | Description | 
|---|---|
| `row_count_min(5000)` | There are a minimum of `5000` rows in the table. |
| `row_count_max(99999)` | There are a maximum of `99999` rows in the table. |
| `row_count_range(5000, 99999)` | The number of rows is between `5000` and `99999`. |
| `row_count_delta(1000)` | Row count cannot vary by more than `1000` compared to previous row count. |
| `row_count_delta(0.05)` | Row count cannot vary by more than `5%` compared to previous row count. |
| `nulls_fraction_min("age", 0.2)` | Column `age` minimum fraction of `NULL` values is `0.2`. |
| `nulls_fraction_max("age", 0.3)` | Column `age` maximum fraction of `NULL` values is `0.3`. |
| `nulls_fraction_range("age", 0.1, 0.9)` | Column `age` fraction of `NULL` values is between `0.1` and `0.9`. |
| `nulls_fraction_rows_delta("age", 5000)` | Column `age` row count multiplied by fraction of `NULL` values cannot vary by more than `5000` from the previous such product. |
| `nulls_fraction_delta("age", 0.2)` | Column `age` fraction of `NULL` values cannot vary by more than `0.2` from the previous fraction. |
| `low_value_min("age", 18)` | Column `age` lowest value must be at least `18`. |
| `low_value_max("age", 34)` | Column `age` lowest value must be less than or equal to `34`. |
| `low_value_range("saturation", 0.5, 0.99)` | Column `saturation` lowest value must be between `0.5` and `0.99`. |
| `low_value_delta("age", 10)` | Column `age` lowest value cannot vary by more than `10` from previous low value. |
| `high_value_min("age", 18)` | Column `age` highest value must be at least `18`. |
| `high_value_max("age", 34)` | Column `age` highest value must be less than or equal to `34`. |
| `high_value_range("saturation", 0.5, 0.99)` | Column `saturation` highest value must be between `0.5` and `0.99`. |
| `high_value_delta("age", 10)` | Column `age` highest value cannot vary by more than `10` from previous high value. |
| `distinct_values_count_min("age", 200)` | Column `age` minimum count of distinct values is `200`. |
| `distinct_values_count_max("age", 9999)` | Column `age` maximum count of distinct values is `9999`. |
| `distinct_values_count_range("age", 200, 9999)` | Column `age` count of distinct values is between `200` and `9999`. |
| `distinct_values_count_delta("age", 5000)` | Column `age` count of distinct values cannot vary by more than `5000` from previous distinct values count. |
| `data_size_min("csv_attachment", 200)` | Column `csv_attachment` minimum data size is `200` bytes. |
| `data_size_max("csv_attachment", 9999)` | Column `csv_attachment` maximum data size is `9999` bytes. |
| `data_size_range("csv_attachment", 200, 9999)` | Column `csv_attachment` data size is between `200` bytes and `9999` bytes. |
| `data_size_delta("csv_attachment", 5000)` | Column `csv_attachment` data size cannot vary by more than `5000` bytes from previous data size. |
| `nulls_fraction_min("temperature", 0.6) OR (row_count_min(7) AND nulls_fraction_min("humidity", 0.3))` | Column `temperature` minimum fraction of `NULL` values is `0.6`, or there are a minimum of `7` rows and column `humidity` minimum fraction of `NULL` values is `0.3`. |
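
The delta functions compare the current statistic with the previously recorded one. The Python sketch below shows one way to read the `row_count_delta` and `nulls_fraction_rows_delta` rows. It rests on an assumption inferred only from the two `row_count_delta` examples: a threshold between 0 and 1 is treated as a fraction of the previous row count, and any other threshold as an absolute number of rows. The function names and sample numbers are hypothetical.

```python
def row_count_delta_ok(previous, current, threshold):
    """Check whether the row count changed by no more than `threshold`."""
    change = abs(current - previous)
    if 0 < threshold < 1:
        # ASSUMPTION: fractional thresholds are relative to the previous count,
        # so 0.05 means "at most 5% of the previous row count".
        return change <= threshold * previous
    # Otherwise the threshold is read as an absolute number of rows.
    return change <= threshold


def nulls_fraction_rows_delta_ok(prev_rows, prev_fraction,
                                 cur_rows, cur_fraction, threshold):
    """Compare row count multiplied by NULL fraction across two snapshots."""
    previous_null_rows = prev_rows * prev_fraction
    current_null_rows = cur_rows * cur_fraction
    return abs(current_null_rows - previous_null_rows) <= threshold


# Invented sample numbers, chosen well away from the thresholds.
print(row_count_delta_ok(10000, 10800, 1000))   # 800 rows of change vs. 1000
print(row_count_delta_ok(10000, 10600, 0.05))   # 600 rows of change vs. 5%
print(nulls_fraction_rows_delta_ok(10000, 0.30, 12000, 0.50, 5000))
```
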