Example2: Working with specific Datatable (use "WaterSystem" as example)

import sdwis_drink_water

# Due to the width limitation, I wrote some functions for outputting some results using an interactive scrollbar

water_system_api = sdwis_drink_water.models.WaterSystem(print_url=False)
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[1], line 1
----> 1 import sdwis_drink_water
      3 # Due to the width limitation, I wrote some functions for outputting some results using an interactive scrollbar
      5 water_system_api = sdwis_drink_water.models.WaterSystem(print_url=False)

ModuleNotFoundError: No module named 'sdwis_drink_water'

fetch table column name

water_system_columns = water_system_api.get_table_column_name(print_to_console=False)
sdwis_drink_water.utils_for_jupyter_print.print_columns(water_system_columns)

fetch table column names with descriptions

water_system_columns_description_dict = water_system_api.get_table_columns_description(print_to_console=False, multi_threads=True)
sdwis_drink_water.utils_for_jupyter_print.print_column_description(water_system_columns_description_dict)
Fetching column descriptions for WATER_SYSTEM from SDWIS Official Website by multi_threads_mode: 100%|██████████| 45/45 [00:05<00:00,  8.30it/s]

fetch first data from Water_System table

# fetch first data from LcrSampleResult table
first_water_system_data = water_system_api.get_table_first_data(print_to_console=False)

sdwis_drink_water.utils_for_jupyter_print.print_result_data(first_water_system_data)

fetch first n data from Water_System table

first_20_water_system_data = water_system_api.get_table_first_n_data(n=20, print_to_console=False)

sdwis_drink_water.utils_for_jupyter_print.print_result_data(first_20_water_system_data)

We can use this method to quickly access the full datasheet, then export to a file

water_system_data_number = water_system_api.get_table_data_number()
all_water_system = water_system_api.get_table_first_n_data(n=water_system_data_number, multi_threads=True, print_to_console=False)
# You can also just set n to a big number
# all_water_system = water_system_api.get_table_first_n_data(n=99999999, multi_threads=True, print_to_console=False)
all_water_system.export_data("./output_files/all_water_system.xlsx", format_type="xlsx")
WATER_SYSTEM
The data number in the database provided by the API is 428808. Your request number exceeds this limit, please note.
Fetching Data by multi_threads_mode:   0%|          | 0/43 [00:00<?, ?it/s]
  0%|          | 0/43 [00:00<?, ?it/s]
 28%|██▊       | 12/43 [00:02<00:05,  5.97it/s]
 30%|███       | 13/43 [00:04<00:10,  2.75it/s]
 33%|███▎      | 14/43 [00:06<00:16,  1.73it/s]
 35%|███▍      | 15/43 [00:08<00:22,  1.24it/s]
 37%|███▋      | 16/43 [00:10<00:27,  1.03s/it]
 40%|███▉      | 17/43 [00:12<00:32,  1.23s/it]
 42%|████▏     | 18/43 [00:14<00:35,  1.41s/it]
 44%|████▍     | 19/43 [00:16<00:37,  1.56s/it]
 47%|████▋     | 20/43 [00:18<00:38,  1.68s/it]
 49%|████▉     | 21/43 [00:20<00:38,  1.77s/it]
 51%|█████     | 22/43 [00:22<00:38,  1.83s/it]
 53%|█████▎    | 23/43 [00:24<00:37,  1.88s/it]
 56%|█████▌    | 24/43 [00:26<00:36,  1.93s/it]
 58%|█████▊    | 25/43 [00:28<00:35,  1.95s/it]
 60%|██████    | 26/43 [00:30<00:33,  1.97s/it]
 63%|██████▎   | 27/43 [00:32<00:31,  1.98s/it]
 65%|██████▌   | 28/43 [00:34<00:29,  1.99s/it]
 67%|██████▋   | 29/43 [00:36<00:27,  1.99s/it]
 70%|██████▉   | 30/43 [00:38<00:25,  2.00s/it]
 72%|███████▏  | 31/43 [00:40<00:24,  2.00s/it]
 74%|███████▍  | 32/43 [00:42<00:22,  2.01s/it]
 77%|███████▋  | 33/43 [00:44<00:20,  2.01s/it]
 79%|███████▉  | 34/43 [00:46<00:18,  2.01s/it]
 81%|████████▏ | 35/43 [00:48<00:16,  2.01s/it]
 84%|████████▎ | 36/43 [00:50<00:14,  2.01s/it]
 86%|████████▌ | 37/43 [00:52<00:12,  2.01s/it]
 88%|████████▊ | 38/43 [00:54<00:10,  2.01s/it]
 91%|█████████ | 39/43 [00:56<00:08,  2.01s/it]
 93%|█████████▎| 40/43 [00:58<00:06,  2.01s/it]
 95%|█████████▌| 41/43 [01:00<00:04,  2.01s/it]
 98%|█████████▊| 42/43 [01:02<00:02,  2.01s/it]
100%|██████████| 43/43 [01:16<00:00,  1.78s/it]
Fetching Data by multi_threads_mode:   0%|          | 0/43 [01:16<?, ?it/s]
Data is successfully exported to ./output_files/all_water_system.xlsx!

fetch data from Water_System table by conditions

columbia_water_systems = water_system_api.get_water_system_data_by_conditions("city_name=columbia", print_to_console=False)
columbia_water_systems.count()
487

get summarize data number according to epa_region

water_system_api.summarize_water_system_data_by_epa_region(multi_threads=True, print_to_console=True)
Fetching the total amount of data by EPA region from the WATER_SYSTEM data table: 100%|██████████| 10/10 [00:05<00:00,  1.91it/s]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
┌─────────────────┬─────────────────────────────┐
│   EPA_REGION_ID │   WATER_SYSTEM_TOTAL_NUMBER │
├─────────────────┼─────────────────────────────┤
│    EPA_REGION_1 │                       45508 │
├─────────────────┼─────────────────────────────┤
│    EPA_REGION_2 │                       67410 │
├─────────────────┼─────────────────────────────┤
│    EPA_REGION_3 │                       24074 │
├─────────────────┼─────────────────────────────┤
│    EPA_REGION_4 │                       33898 │
├─────────────────┼─────────────────────────────┤
│    EPA_REGION_5 │                       30931 │
├─────────────────┼─────────────────────────────┤
│    EPA_REGION_6 │                       15799 │
├─────────────────┼─────────────────────────────┤
│    EPA_REGION_7 │                       43325 │
├─────────────────┼─────────────────────────────┤
│    EPA_REGION_8 │                       25563 │
├─────────────────┼─────────────────────────────┤
│    EPA_REGION_9 │                       17077 │
├─────────────────┼─────────────────────────────┤
│   EPA_REGION_10 │                      125223 │
└─────────────────┴─────────────────────────────┘
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

get data according to epa_region and export

water_system_epa_1 = water_system_api.get_water_system_by_epa_region(epa_region=1, print_to_console=True)
# export data to xlsx
water_system_epa_1.export_data("./output_files/Water System of EPA region 1.xlsx", format_type="xlsx")
IOPub data rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_data_rate_limit`.

Current values:
ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
ServerApp.rate_limit_window=3.0 (secs)
Data is successfully exported to ./output_files/Water System of EPA region 1.xlsx!

Data Handle

# print all keys to determine how to use this data
sdwis_drink_water.utils_for_jupyter_print.print_columns(list(water_system_epa_1.get_all_keys()))

# find by condition "is_grant_eligible_ind==Y"
is_grant_eligible_ind = water_system_epa_1.find_by_condition("is_grant_eligible_ind=Y")
print(is_grant_eligible_ind.count())

# find by condition "is_wholesaler_ind==Y"
is_wholesaler_ind = water_system_epa_1.find_by_condition("is_wholesaler_ind=Y")
print(is_wholesaler_ind.count())

# find by condition "is_grant_eligible_ind==Y" and "is_wholesaler_ind==Y"
is_grant_eligible_ind_and_is_wholesaler_ind = is_grant_eligible_ind.find_by_condition("is_wholesaler_ind=Y")
print(is_grant_eligible_ind_and_is_wholesaler_ind.count())

# export data to csv
is_grant_eligible_ind_and_is_wholesaler_ind.export_data("./output_files/water_system_epa_1_filtered_result.xlsx",
                                                        format_type="xlsx")
10112
41
25
Data is successfully exported to ./output_files/water_system_epa_1_filtered_result.xlsx!