Example3: Quickly check how many sample results are out of standard

This sample example is why I developed this dependency package in the first place.

EPA is proposing to lower the lead action level from 15 µg/L to 10 µg/L.

We want to look at the database and see those samples that match the original rule, but not the new one.

Step1: we should know the structure of the Table Lcr_Sample_Result

we can use get_table_first_data method to fetch first data from LcrSampleResult table, the output is a table containing a single row of data. This table is formatted with column names in the first row and the corresponding values in the second row.

import sdwis_drink_water

lcr_sample_result_api = sdwis_drink_water.models.LcrSampleResult()

# fetch first 10 data from LcrSampleResult table
first_10_lcr_sample_result_data = lcr_sample_result_api.get_table_first_n_data(n=10, print_to_console=False)

sdwis_drink_water.utils_for_jupyter_print.print_result_data(first_10_lcr_sample_result_data)
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[1], line 1
----> 1 import sdwis_drink_water
      3 lcr_sample_result_api = sdwis_drink_water.models.LcrSampleResult()
      5 # fetch first 10 data from LcrSampleResult table

ModuleNotFoundError: No module named 'sdwis_drink_water'

Step2: Understand Data Meaning

To understand what each column means, we can use the get_table_columns_description method

# Setting "multi_threads=True" will use multithreading to speed up the fetching of data
table_column_description_dict = lcr_sample_result_api.get_table_columns_description(multi_threads=True,
                                                                                    print_to_console=False)
sdwis_drink_water.utils_for_jupyter_print.print_column_description(table_column_description_dict)
Fetching column descriptions for LCR_SAMPLE_RESULT from SDWIS Official Website by multi_threads_mode: 100%|██████████| 9/9 [00:02<00:00,  3.25it/s]

Step 3: Start Filtering Data Based on Our Requirements

When we combine the provided descriptions, we gain a clearer understanding of the dataset:

  • "contaminant_code": Represents the contamination element in the sample test.

  • "unit_of_measure": Denotes the unit of contaminant content.

  • "result_sign_code": Indicates the operation symbol. For results less than 0.01, which are not subdivided, this value is “<”.

  • "sample_measure": Refers to the contaminant content in the sample test.

Unit Conversion

As the unit of measure in our data is “mg/L”, it’s necessary to adjust the units stated in the regulations for consistency:

Before:

EPA is proposing to lower the lead action level from 15 µg/L to 10 µg/L.

After:

EPA is proposing to lower the lead action level from 0.015 mg/L to 10 µg/L.

Filtering Data

To filter data from the LCR_SAMPLE_RESULT_TABLE, we will use the "get_lcr_sample_result_data_by_conditions" method, adhering to the specifications outlined below.

Original Rule Conditions

For the original rule, apply these three conditional formulas:

  • contaminant_code = PB90

  • result_sign_code = =

  • sample_measure > 0.015

New Rule Conditions

For the new rule, the conditional formulas are slightly different:

  • contaminant_code = PB90

  • result_sign_code = =

  • sample_measure > 0.01

# Samples exceeding the original rule
pb90_exceed_original_rule = lcr_sample_result_api.get_lcr_sample_result_data_by_conditions("contaminant_code=PB90",
                                                                                           "result_sign_code==",
                                                                                           "sample_measure>0.015")
# Samples exceeding the new rule
pb90_exceed_new_rule = lcr_sample_result_api.get_lcr_sample_result_data_by_conditions("contaminant_code=PB90",
                                                                                      "result_sign_code==",
                                                                                      "sample_measure>0.01")

Step 4: Handle fetched data with some methods

we can handle fetched data by use some methods, including "intersect_with", "merge_with", "difference_with". Also, we can use "count()" method to get the number of data

# here intersection is the number of pb90_exceed_original_rule
pb90_exceed_original_rule.intersect_with(pb90_exceed_new_rule).count()
# here union is the number of pb90_exceed_new_rule
pb90_exceed_original_rule.merge_with(pb90_exceed_new_rule).count()
# get difference. These samples comply with the original rule, but not the new rule. Additional attention is required
pb90_exceed_new_rule.difference_with(pb90_exceed_original_rule).count()
65

Step 5: Export data we need to file

we can use "count()" method to export data supported “format_type” includes:

  • txt

  • csv

  • xlsx

  • parquet

result_we_need = pb90_exceed_new_rule.difference_with(pb90_exceed_original_rule)
result_we_need.export_data("./output_files/Sample sets for additional attention in the new regulation of Lead.xlsx",
                           format_type="xlsx")
Data is successfully exported to ./output_files/Sample sets for additional attention in the new regulation of Lead.xlsx!