Example2: Working with specific Datatable (use "WaterSystem" as example)
import sdwis_drink_water
# Due to the width limitation, I wrote some functions for outputting some results using an interactive scrollbar
water_system_api = sdwis_drink_water.models.WaterSystem(print_url=False)
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
Cell In[1], line 1
----> 1 import sdwis_drink_water
3 # Due to the width limitation, I wrote some functions for outputting some results using an interactive scrollbar
5 water_system_api = sdwis_drink_water.models.WaterSystem(print_url=False)
ModuleNotFoundError: No module named 'sdwis_drink_water'
fetch table column name
water_system_columns = water_system_api.get_table_column_name(print_to_console=False)
sdwis_drink_water.utils_for_jupyter_print.print_columns(water_system_columns)
fetch table column names with descriptions
water_system_columns_description_dict = water_system_api.get_table_columns_description(print_to_console=False, multi_threads=True)
sdwis_drink_water.utils_for_jupyter_print.print_column_description(water_system_columns_description_dict)
Fetching column descriptions for WATER_SYSTEM from SDWIS Official Website by multi_threads_mode: 100%|██████████| 45/45 [00:05<00:00, 8.30it/s]
fetch first data from Water_System table
# fetch first data from LcrSampleResult table
first_water_system_data = water_system_api.get_table_first_data(print_to_console=False)
sdwis_drink_water.utils_for_jupyter_print.print_result_data(first_water_system_data)
fetch first n data from Water_System table
first_20_water_system_data = water_system_api.get_table_first_n_data(n=20, print_to_console=False)
sdwis_drink_water.utils_for_jupyter_print.print_result_data(first_20_water_system_data)
We can use this method to quickly access the full datasheet, then export to a file
water_system_data_number = water_system_api.get_table_data_number()
all_water_system = water_system_api.get_table_first_n_data(n=water_system_data_number, multi_threads=True, print_to_console=False)
# You can also just set n to a big number
# all_water_system = water_system_api.get_table_first_n_data(n=99999999, multi_threads=True, print_to_console=False)
all_water_system.export_data("./output_files/all_water_system.xlsx", format_type="xlsx")
WATER_SYSTEM
The data number in the database provided by the API is 428808. Your request number exceeds this limit, please note.
Fetching Data by multi_threads_mode: 0%| | 0/43 [00:00<?, ?it/s]
0%| | 0/43 [00:00<?, ?it/s]
28%|██▊ | 12/43 [00:02<00:05, 5.97it/s]
30%|███ | 13/43 [00:04<00:10, 2.75it/s]
33%|███▎ | 14/43 [00:06<00:16, 1.73it/s]
35%|███▍ | 15/43 [00:08<00:22, 1.24it/s]
37%|███▋ | 16/43 [00:10<00:27, 1.03s/it]
40%|███▉ | 17/43 [00:12<00:32, 1.23s/it]
42%|████▏ | 18/43 [00:14<00:35, 1.41s/it]
44%|████▍ | 19/43 [00:16<00:37, 1.56s/it]
47%|████▋ | 20/43 [00:18<00:38, 1.68s/it]
49%|████▉ | 21/43 [00:20<00:38, 1.77s/it]
51%|█████ | 22/43 [00:22<00:38, 1.83s/it]
53%|█████▎ | 23/43 [00:24<00:37, 1.88s/it]
56%|█████▌ | 24/43 [00:26<00:36, 1.93s/it]
58%|█████▊ | 25/43 [00:28<00:35, 1.95s/it]
60%|██████ | 26/43 [00:30<00:33, 1.97s/it]
63%|██████▎ | 27/43 [00:32<00:31, 1.98s/it]
65%|██████▌ | 28/43 [00:34<00:29, 1.99s/it]
67%|██████▋ | 29/43 [00:36<00:27, 1.99s/it]
70%|██████▉ | 30/43 [00:38<00:25, 2.00s/it]
72%|███████▏ | 31/43 [00:40<00:24, 2.00s/it]
74%|███████▍ | 32/43 [00:42<00:22, 2.01s/it]
77%|███████▋ | 33/43 [00:44<00:20, 2.01s/it]
79%|███████▉ | 34/43 [00:46<00:18, 2.01s/it]
81%|████████▏ | 35/43 [00:48<00:16, 2.01s/it]
84%|████████▎ | 36/43 [00:50<00:14, 2.01s/it]
86%|████████▌ | 37/43 [00:52<00:12, 2.01s/it]
88%|████████▊ | 38/43 [00:54<00:10, 2.01s/it]
91%|█████████ | 39/43 [00:56<00:08, 2.01s/it]
93%|█████████▎| 40/43 [00:58<00:06, 2.01s/it]
95%|█████████▌| 41/43 [01:00<00:04, 2.01s/it]
98%|█████████▊| 42/43 [01:02<00:02, 2.01s/it]
100%|██████████| 43/43 [01:16<00:00, 1.78s/it]
Fetching Data by multi_threads_mode: 0%| | 0/43 [01:16<?, ?it/s]
Data is successfully exported to ./output_files/all_water_system.xlsx!
fetch data from Water_System table by conditions
columbia_water_systems = water_system_api.get_water_system_data_by_conditions("city_name=columbia", print_to_console=False)
columbia_water_systems.count()
487
get summarize data number according to epa_region
water_system_api.summarize_water_system_data_by_epa_region(multi_threads=True, print_to_console=True)
Fetching the total amount of data by EPA region from the WATER_SYSTEM data table: 100%|██████████| 10/10 [00:05<00:00, 1.91it/s]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
┌─────────────────┬─────────────────────────────┐
│ EPA_REGION_ID │ WATER_SYSTEM_TOTAL_NUMBER │
├─────────────────┼─────────────────────────────┤
│ EPA_REGION_1 │ 45508 │
├─────────────────┼─────────────────────────────┤
│ EPA_REGION_2 │ 67410 │
├─────────────────┼─────────────────────────────┤
│ EPA_REGION_3 │ 24074 │
├─────────────────┼─────────────────────────────┤
│ EPA_REGION_4 │ 33898 │
├─────────────────┼─────────────────────────────┤
│ EPA_REGION_5 │ 30931 │
├─────────────────┼─────────────────────────────┤
│ EPA_REGION_6 │ 15799 │
├─────────────────┼─────────────────────────────┤
│ EPA_REGION_7 │ 43325 │
├─────────────────┼─────────────────────────────┤
│ EPA_REGION_8 │ 25563 │
├─────────────────┼─────────────────────────────┤
│ EPA_REGION_9 │ 17077 │
├─────────────────┼─────────────────────────────┤
│ EPA_REGION_10 │ 125223 │
└─────────────────┴─────────────────────────────┘
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
get data according to epa_region and export
water_system_epa_1 = water_system_api.get_water_system_by_epa_region(epa_region=1, print_to_console=True)
# export data to xlsx
water_system_epa_1.export_data("./output_files/Water System of EPA region 1.xlsx", format_type="xlsx")
IOPub data rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_data_rate_limit`.
Current values:
ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
ServerApp.rate_limit_window=3.0 (secs)
Data is successfully exported to ./output_files/Water System of EPA region 1.xlsx!
Data Handle
# print all keys to determine how to use this data
sdwis_drink_water.utils_for_jupyter_print.print_columns(list(water_system_epa_1.get_all_keys()))
# find by condition "is_grant_eligible_ind==Y"
is_grant_eligible_ind = water_system_epa_1.find_by_condition("is_grant_eligible_ind=Y")
print(is_grant_eligible_ind.count())
# find by condition "is_wholesaler_ind==Y"
is_wholesaler_ind = water_system_epa_1.find_by_condition("is_wholesaler_ind=Y")
print(is_wholesaler_ind.count())
# find by condition "is_grant_eligible_ind==Y" and "is_wholesaler_ind==Y"
is_grant_eligible_ind_and_is_wholesaler_ind = is_grant_eligible_ind.find_by_condition("is_wholesaler_ind=Y")
print(is_grant_eligible_ind_and_is_wholesaler_ind.count())
# export data to csv
is_grant_eligible_ind_and_is_wholesaler_ind.export_data("./output_files/water_system_epa_1_filtered_result.xlsx",
format_type="xlsx")
10112
41
25
Data is successfully exported to ./output_files/water_system_epa_1_filtered_result.xlsx!