Example3: Quickly check how many sample results are out of standard
This sample example is why I developed this dependency package in the first place.
EPA is proposing to lower the lead action level from 15 µg/L to 10 µg/L.
We want to look at the database and see those samples that match the original rule, but not the new one.
Step1: we should know the structure of the Table Lcr_Sample_Result
we can use get_table_first_data method to fetch first data from LcrSampleResult table, the output is a table containing a single row of data.
This table is formatted with column names in the first row and the corresponding values in the second row.
import sdwis_drink_water
lcr_sample_result_api = sdwis_drink_water.models.LcrSampleResult()
# fetch first 10 data from LcrSampleResult table
first_10_lcr_sample_result_data = lcr_sample_result_api.get_table_first_n_data(n=10, print_to_console=False)
sdwis_drink_water.utils_for_jupyter_print.print_result_data(first_10_lcr_sample_result_data)
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
Cell In[1], line 1
----> 1 import sdwis_drink_water
3 lcr_sample_result_api = sdwis_drink_water.models.LcrSampleResult()
5 # fetch first 10 data from LcrSampleResult table
ModuleNotFoundError: No module named 'sdwis_drink_water'
Step2: Understand Data Meaning
To understand what each column means, we can use the get_table_columns_description method
# Setting "multi_threads=True" will use multithreading to speed up the fetching of data
table_column_description_dict = lcr_sample_result_api.get_table_columns_description(multi_threads=True,
print_to_console=False)
sdwis_drink_water.utils_for_jupyter_print.print_column_description(table_column_description_dict)
Fetching column descriptions for LCR_SAMPLE_RESULT from SDWIS Official Website by multi_threads_mode: 100%|██████████| 9/9 [00:02<00:00, 3.25it/s]
Step 3: Start Filtering Data Based on Our Requirements
When we combine the provided descriptions, we gain a clearer understanding of the dataset:
"contaminant_code": Represents the contamination element in the sample test."unit_of_measure": Denotes the unit of contaminant content."result_sign_code": Indicates the operation symbol. For results less than 0.01, which are not subdivided, this value is “<”."sample_measure": Refers to the contaminant content in the sample test.
Unit Conversion
As the unit of measure in our data is “mg/L”, it’s necessary to adjust the units stated in the regulations for consistency:
Before:
EPA is proposing to lower the lead action level from 15 µg/L to 10 µg/L.
After:
EPA is proposing to lower the lead action level from 0.015 mg/L to 10 µg/L.
Filtering Data
To filter data from the LCR_SAMPLE_RESULT_TABLE, we will use the "get_lcr_sample_result_data_by_conditions" method, adhering to the specifications outlined below.
Original Rule Conditions
For the original rule, apply these three conditional formulas:
contaminant_code=PB90result_sign_code==sample_measure>0.015
New Rule Conditions
For the new rule, the conditional formulas are slightly different:
contaminant_code=PB90result_sign_code==sample_measure>0.01
# Samples exceeding the original rule
pb90_exceed_original_rule = lcr_sample_result_api.get_lcr_sample_result_data_by_conditions("contaminant_code=PB90",
"result_sign_code==",
"sample_measure>0.015")
# Samples exceeding the new rule
pb90_exceed_new_rule = lcr_sample_result_api.get_lcr_sample_result_data_by_conditions("contaminant_code=PB90",
"result_sign_code==",
"sample_measure>0.01")
Step 4: Handle fetched data with some methods
we can handle fetched data by use some methods, including "intersect_with", "merge_with", "difference_with".
Also, we can use "count()" method to get the number of data
# here intersection is the number of pb90_exceed_original_rule
pb90_exceed_original_rule.intersect_with(pb90_exceed_new_rule).count()
# here union is the number of pb90_exceed_new_rule
pb90_exceed_original_rule.merge_with(pb90_exceed_new_rule).count()
# get difference. These samples comply with the original rule, but not the new rule. Additional attention is required
pb90_exceed_new_rule.difference_with(pb90_exceed_original_rule).count()
65
Step 5: Export data we need to file
we can use "count()" method to export data
supported “format_type” includes:
txtcsvxlsxparquet
result_we_need = pb90_exceed_new_rule.difference_with(pb90_exceed_original_rule)
result_we_need.export_data("./output_files/Sample sets for additional attention in the new regulation of Lead.xlsx",
format_type="xlsx")
Data is successfully exported to ./output_files/Sample sets for additional attention in the new regulation of Lead.xlsx!