Reference Guide
metpyqc.range
Range tests
- metpyqc.range.range_all(x, min_val, max_val, flag_val)
Check if observed values are outside the instrumental limits.
- Parameters
x (pd.DataFrame) – Dataframe to be tested (time, stations)
min_val (float) – lower limit of instrumental range.
max_val (float) – upper limit of instrumental range.
flag_val (int) – integer flag value to be assigned to erroneous values.
- Returns
flag (pd.DataFrame) – Dataframe with flags identifying values which fail the test.
res (pd.DataFrame) – Dataframe with quantitative residuals from prescribed limits: positive values indicate wrong values.
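The fixed-limit range test can be sketched in a few lines of pandas. This is a minimal illustration, not the metpyqc source: `range_all_sketch` is a hypothetical name, and it assumes that passing values receive flag 0 and that the residual is the distance beyond the nearer limit.

```python
import numpy as np
import pandas as pd


def range_all_sketch(x: pd.DataFrame, min_val: float, max_val: float, flag_val: int):
    """Hypothetical re-implementation of a fixed-limit range test."""
    # Residual: how far each value lies beyond the nearer limit (positive = outside).
    res = np.maximum(x - max_val, min_val - x)
    # Assumed convention: flag_val where the test fails, 0 elsewhere (NaN never fails).
    flag = (res > 0).astype(int) * flag_val
    return flag, res


x = pd.DataFrame({"station_a": [-50.0, 10.0, 60.0]})
flag, res = range_all_sketch(x, min_val=-40.0, max_val=50.0, flag_val=1)
print(flag["station_a"].tolist())  # [1, 0, 1]
```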
- metpyqc.range.range_seas(x, min_vals, max_vals, flag_val)
Check if observed values are outside the seasonal climatological limits.
- Parameters
x (pd.DataFrame) – Dataframe to be tested (time, stations)
min_vals (array_like, shape(4,)) – Array of lower limits for each season in this order: DJF (December, January, February), MAM (March, April, May), JJA (June, July, August), SON (September, October, November).
max_vals (array_like, shape(4,)) – Array of upper limits for each season in this order: DJF (December, January, February), MAM (March, April, May), JJA (June, July, August), SON (September, October, November).
flag_val (int) – integer flag value to be assigned to erroneous values.
- Returns
flag (pd.DataFrame) – Dataframe with flags identifying values which fail the test.
res (pd.DataFrame) – Dataframe with quantitative residuals from prescribed limits: positive values indicate wrong values.
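The seasonal variant only differs in how the limits are chosen for each row. A minimal sketch, assuming a DatetimeIndex and the DJF, MAM, JJA, SON order described above; `range_seas_sketch` and `season_index` are hypothetical helpers, not part of metpyqc. The month-to-season mapping `(month % 12) // 3` sends December, January and February to 0.

```python
import numpy as np
import pandas as pd


def season_index(index: pd.DatetimeIndex) -> np.ndarray:
    # Map months to 0=DJF, 1=MAM, 2=JJA, 3=SON: (12 % 12) // 3 == 0 for December.
    return (np.asarray(index.month) % 12) // 3


def range_seas_sketch(x: pd.DataFrame, min_vals, max_vals, flag_val: int):
    """Hypothetical seasonal range test: one limit pair per season."""
    season = season_index(x.index)
    lo = np.asarray(min_vals, dtype=float)[season]  # lower limit for every time step
    hi = np.asarray(max_vals, dtype=float)[season]  # upper limit for every time step
    # axis=0 aligns the per-row limits with the time index and broadcasts over stations.
    res = np.maximum(x.sub(hi, axis=0), x.rsub(lo, axis=0))
    flag = (res > 0).astype(int) * flag_val
    return flag, res


idx = pd.to_datetime(["2020-01-15", "2020-07-15"])
x = pd.DataFrame({"station_a": [15.0, 15.0]}, index=idx)
flag, res = range_seas_sketch(x, [-30, -15, 0, -15], [10, 25, 40, 25], flag_val=2)
print(flag["station_a"].tolist())  # [2, 0]
```

The same 15 value fails in January (DJF upper limit 10) but passes in July (JJA upper limit 40).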
metpyqc.temporal
Temporal consistency tests
- metpyqc.temporal.isolated(x, n, flag_val)
Check whether a rainfall observation (or an observation of another discrete variable) is isolated with respect to the adjacent measurements in time.
- Parameters
x (pd.DataFrame) – Dataframe to be tested (time, stations)
n (int) – Number of previous and following time steps to be considered in the test. The value must be greater than 1.
flag_val (int) – integer flag value to be assigned to suspect values.
- Returns
flag (pd.DataFrame) – Dataframe with flags identifying values which fail the test.
res (pd.DataFrame) – Dataframe with quantitative residuals representing the difference between the selected observation and the sum of the adjacent ones. The larger this value, the more isolated the observation is with respect to its neighbors.
Warning
This function requires dataframes with a homogeneous temporal resolution on the time index. Missing dates must be filled with np.nan values.
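A minimal sketch of the isolation residual described above, assuming a regular time index. The failure rule used here (a positive value whose 2n neighbors are all zero) is an assumption for illustration, as is the name `isolated_sketch`; the library may use a different criterion.

```python
import pandas as pd


def isolated_sketch(x: pd.DataFrame, n: int, flag_val: int):
    """Hypothetical isolated-rainfall test on a regular time index."""
    # Sum of the n previous and n following observations; any missing neighbor
    # (including the series edges) makes the sum NaN, so that step is not tested.
    neigh_sum = sum(x.shift(k) + x.shift(-k) for k in range(1, n + 1))
    res = x - neigh_sum
    # Assumed failure rule: a positive value surrounded only by zeros.
    fail = (x > 0) & (neigh_sum == 0)
    flag = fail.astype(int) * flag_val
    return flag, res


x = pd.DataFrame({"gauge": [0.0, 0.0, 0.0, 4.2, 0.0, 0.0, 0.0]})
flag, res = isolated_sketch(x, n=2, flag_val=3)
print(flag["gauge"].tolist())  # [0, 0, 0, 3, 0, 0, 0]
```

Using explicit shifts instead of a rolling sum keeps the zero test exact, and it is also why the warning above matters: a shift by k rows only means "k time steps" when the index is regular.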
- metpyqc.temporal.persistence_noc(x, n, flag_val)
Not Observed Change (NOC) persistence test, which checks the minimum variability of a variable with respect to the previous n time steps.
- Parameters
x (pd.DataFrame) – Dataframe to be tested (time, stations)
n (int) – Number of previous time steps to be considered in the test. The value must be greater than 1.
flag_val (int) – integer flag value to be assigned to suspect values.
- Returns
flag (pd.DataFrame) – Dataframe with flags identifying values which fail the test.
res (pd.DataFrame) – Dataframe with quantitative residuals representing the maximum absolute difference between the selected observation and the preceding ones. The closer this value is to zero, the more persistent the observation.
Warning
This function requires dataframes with a homogeneous temporal resolution on the time index. Missing dates must be filled with np.nan values.
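The NOC residual can be sketched as a running maximum of absolute differences. This assumes the failure condition is a residual of exactly zero (no change over the last n steps); `persistence_noc_sketch` is a hypothetical name, not the metpyqc implementation.

```python
import numpy as np
import pandas as pd


def persistence_noc_sketch(x: pd.DataFrame, n: int, flag_val: int):
    """Hypothetical 'not observed change' test over the previous n steps."""
    # Maximum absolute difference from each of the n preceding observations;
    # np.maximum propagates NaN, so incomplete histories give res = NaN.
    res = (x - x.shift(1)).abs()
    for k in range(2, n + 1):
        res = np.maximum(res, (x - x.shift(k)).abs())
    # Assumed failure rule: no change at all, i.e. a residual of exactly zero.
    flag = (res == 0).astype(int) * flag_val
    return flag, res


x = pd.DataFrame({"temp": [1.0, 2.0, 2.0, 2.0, 2.0, 3.0]})
flag, res = persistence_noc_sketch(x, n=3, flag_val=5)
print(flag["temp"].tolist())  # [0, 0, 0, 0, 5, 0]
```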
- metpyqc.temporal.persistence_var(x, window, perc_min, method, var_val_min, flag_val)
Minimum variability persistence test, which checks that a variable reaches a minimum allowed variability within a given time window.
- Parameters
x (pd.DataFrame) – Dataframe to be tested (time, stations)
window (int) – Number of hours over which to evaluate the temporal variability
perc_min (int) – Minimum percentage of valid (non-missing) observations required to calculate the temporal variability
method ({'STD', 'MAX_MIN', 'IQR'}) – Method for calculating the temporal variability within the time window: standard deviation (STD), difference between the absolute maximum and minimum values (MAX_MIN), or interquartile range (IQR).
var_val_min (float) – Minimum allowed variability for the selected variable within the defined time window
flag_val (int) – integer flag value to be assigned to suspect values.
- Returns
flag (pd.DataFrame) – Dataframe with flags identifying values which fail the test.
res (pd.DataFrame) – Dataframe with quantitative residuals representing the difference between the actual variability and the minimum allowed variability. Positive values indicate suspect observations.
- Raises
Exception – If the selected method is neither ‘STD’, ‘MAX_MIN’ nor ‘IQR’.
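A sketch of the three variability measures using pandas rolling windows. It assumes hourly data, so that a window of `window` hours spans `window` rows, and a trailing window; both are assumptions, as is the name `persistence_var_sketch`. It raises `ValueError`, which is a subclass of `Exception`, for an unknown method.

```python
import math

import pandas as pd


def persistence_var_sketch(x, window, perc_min, method, var_val_min, flag_val):
    """Hypothetical minimum-variability test; assumes one observation per hour."""
    # With hourly data a window of `window` hours spans `window` rows (trailing window).
    min_obs = math.ceil(window * perc_min / 100)
    roll = x.rolling(window, min_periods=min_obs)
    if method == "STD":
        variability = roll.std()
    elif method == "MAX_MIN":
        variability = roll.max() - roll.min()
    elif method == "IQR":
        variability = roll.quantile(0.75) - roll.quantile(0.25)
    else:
        raise ValueError(f"unknown method {method!r}")
    # Positive residual: variability below the minimum allowed value.
    res = var_val_min - variability
    flag = (res > 0).astype(int) * flag_val
    return flag, res


x = pd.DataFrame({"temp": [5.0] * 6 + [1.0, 9.0, 2.0, 8.0, 3.0, 7.0]})
flag, res = persistence_var_sketch(x, window=4, perc_min=75, method="MAX_MIN",
                                   var_val_min=0.5, flag_val=4)
print(flag["temp"].tolist()[:7])  # [0, 0, 4, 4, 4, 4, 0]
```

The first two rows are not tested because fewer than 75 % of the 4-hour window is available.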
- metpyqc.temporal.step_all(x, step_val_susp, step_val_wrong, flag_val_susp, flag_val_wrong)
Check the difference between consecutive observations.
- Parameters
x (pd.DataFrame) – Dataframe to be tested (time, stations)
step_val_susp (float) – step limit for suspect values.
step_val_wrong (float) – step limit for wrong values.
flag_val_susp (int) – integer flag value to be assigned to suspect values.
flag_val_wrong (int) – integer flag value to be assigned to wrong values.
- Returns
flag (pd.DataFrame) – Dataframe with flags identifying values which fail the test.
res (pd.DataFrame) – Dataframe with quantitative residuals from prescribed limits: positive values indicate suspect or wrong values.
Warning
This function requires dataframes with a homogeneous temporal resolution on the time index. Missing dates must be filled with np.nan values.
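A sketch of the two-level step test, assuming a step is the absolute difference between consecutive rows, that the wrong flag takes precedence over the suspect flag, and that the residual is measured against the suspect limit. These conventions and the name `step_all_sketch` are assumptions, not the documented implementation.

```python
import numpy as np
import pandas as pd


def step_all_sketch(x, step_val_susp, step_val_wrong, flag_val_susp, flag_val_wrong):
    """Hypothetical step test between consecutive observations."""
    step = x.diff().abs()  # |x(t) - x(t-1)|; NaN for the first row or after a gap
    # Assumed residual: exceedance of the suspect limit (positive = suspect or wrong).
    res = step - step_val_susp
    # The wrong flag takes precedence over the suspect flag.
    codes = np.where(step > step_val_wrong, flag_val_wrong,
                     np.where(step > step_val_susp, flag_val_susp, 0))
    flag = pd.DataFrame(codes, index=x.index, columns=x.columns)
    return flag, res


x = pd.DataFrame({"temp": [10.0, 11.0, 16.0, 30.0, 29.0]})
flag, res = step_all_sketch(x, 4.0, 10.0, flag_val_susp=1, flag_val_wrong=2)
print(flag["temp"].tolist())  # [0, 0, 1, 2, 0]
```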
- metpyqc.temporal.step_seas(x, step_vals_susp, step_vals_wrong, flag_val_susp, flag_val_wrong)
Check the difference between consecutive observations by considering different limits for each season.
- Parameters
x (pd.DataFrame) – Dataframe to be tested (time, stations)
step_vals_susp (array_like, shape(4,)) – Array of step limits for suspect values for each season in this order: DJF (December, January, February), MAM (March, April, May), JJA (June, July, August), SON (September, October, November).
step_vals_wrong (array_like, shape(4,)) – Array of step limits for wrong values for each season in this order: DJF (December, January, February), MAM (March, April, May), JJA (June, July, August), SON (September, October, November).
flag_val_susp (int) – integer flag value to be assigned to suspect values.
flag_val_wrong (int) – integer flag value to be assigned to wrong values.
- Returns
flag (pd.DataFrame) – Dataframe with flags identifying values which fail the test.
res (pd.DataFrame) – Dataframe with quantitative residuals from prescribed limits: positive values indicate suspect or wrong values.
Warning
This function requires dataframes with a homogeneous temporal resolution on the time index. Missing dates must be filled with np.nan values.
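A seasonal version of the step test, sketched under the same assumptions as a plain step test (absolute row-to-row differences, wrong flag overriding suspect flag) plus a DatetimeIndex from which the season of each row is derived. `step_seas_sketch` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def step_seas_sketch(x, step_vals_susp, step_vals_wrong, flag_val_susp, flag_val_wrong):
    """Hypothetical seasonal step test: limits chosen by the season of each row."""
    season = (np.asarray(x.index.month) % 12) // 3  # 0=DJF, 1=MAM, 2=JJA, 3=SON
    susp = np.asarray(step_vals_susp, dtype=float)[season]
    wrong = np.asarray(step_vals_wrong, dtype=float)[season]
    step = x.diff().abs()
    res = step.sub(susp, axis=0)  # positive = suspect or wrong
    codes = np.where(step.gt(wrong, axis=0), flag_val_wrong,
                     np.where(step.gt(susp, axis=0), flag_val_susp, 0))
    flag = pd.DataFrame(codes, index=x.index, columns=x.columns)
    return flag, res


idx = pd.date_range("2021-02-27", periods=4, freq="D")  # crosses the DJF/MAM boundary
x = pd.DataFrame({"temp": [0.0, 6.0, 12.0, 13.0]}, index=idx)
flag, res = step_seas_sketch(x, [5, 8, 8, 8], [10, 15, 15, 15], 1, 2)
print(flag["temp"].tolist())  # [0, 1, 0, 0]
```

The same 6-unit step is suspect on 28 February (DJF limit 5) but acceptable on 1 March (MAM limit 8).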
metpyqc.internal
Internal Consistency Tests
- metpyqc.internal.dewpoint_test(temp, rh, flag_val)
Check whether the derived dew-point temperature is less than or equal to the temperature.
- Parameters
temp (pd.DataFrame) – Temperature dataframe to be tested (time, stations_temp) in degrees Celsius
rh (pd.DataFrame) – Relative humidity dataframe to be tested (time, stations_rh) in percent
flag_val (int) – integer flag value to be assigned to erroneous values.
- Returns
flag_temp (pd.DataFrame) – Dataframe with flags identifying values whose temperature fails the test.
flag_rh (pd.DataFrame) – Dataframe with flags identifying values whose relative humidity fails the test.
res (pd.DataFrame) – Dataframe with quantitative residuals from prescribed limits: positive values indicate wrong values (residuals refer to the temperature dataframe).
temp_dew (pd.DataFrame) – Dew-point temperature in degrees Celsius
Warning
This test is applied only to stations measuring both temperature and relative humidity.
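A minimal sketch of the dew-point consistency test, restricted to stations present in both dataframes (matched by column label, an assumption). It computes the dew point with the inverted Bolton formula given under metpyqc.calculate.dewpoint, using the identity ln(e/6.112) = ln(RH/100) + 17.67 T/(T + 243.5). Reporting identical flags for temperature and relative humidity is also an assumption; `dewpoint_test_sketch` is hypothetical.

```python
import numpy as np
import pandas as pd


def dewpoint_test_sketch(temp, rh, flag_val):
    """Hypothetical dew-point consistency test on stations measuring T and RH."""
    common = temp.columns.intersection(rh.columns)  # only co-located sensors
    t, h = temp[common], rh[common]
    # ln(e / 6.112) with e = (RH / 100) * 6.112 * exp(17.67 T / (T + 243.5)).
    ln_ratio = np.log(h / 100.0) + 17.67 * t / (t + 243.5)
    temp_dew = 243.5 / (17.67 / ln_ratio - 1.0)
    res = temp_dew - t  # positive = dew point above air temperature
    flag = (res > 0).astype(int) * flag_val
    # Assumed: the same failures are reported on both the temperature and RH side.
    return flag, flag.copy(), res, temp_dew


temp = pd.DataFrame({"S1": [20.0, 20.0], "S2": [10.0, 10.0]})
rh = pd.DataFrame({"S1": [50.0, 105.0], "S3": [80.0, 80.0]})
flag_temp, flag_rh, res, temp_dew = dewpoint_test_sketch(temp, rh, flag_val=1)
print(flag_temp["S1"].tolist())  # [0, 1]
```

Only S1 is tested; an RH reading above 100 % pushes the dew point above the air temperature and is flagged.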
- metpyqc.internal.heated_raingauge(prec, temp, ind_heater, flag_val)
Check whether precipitation occurs at freezing temperatures (below 0 degrees Celsius). Values are flagged only when the rain gauge is not heated.
- Parameters
prec (pd.DataFrame) – Precipitation dataframe to be tested (time, stations_prec) in mm
temp (pd.DataFrame) – Temperature dataframe to be tested (time, stations_temp) in degrees Celsius.
ind_heater (list) – List of indexes indicating which rain gauges are heated.
flag_val (int) – integer flag value to be assigned to erroneous values.
- Returns
flag – Dataframe with flags identifying precipitation values which fail the test.
- Return type
pd.DataFrame
Warning
This test is applied only to stations measuring both precipitation and temperature.
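A sketch of the freezing-precipitation check. It assumes that `ind_heater` holds the column labels of heated gauges (the documentation only says "indexes", so positional indexes are equally possible) and that stations are matched by column label; `heated_raingauge_sketch` is hypothetical.

```python
import pandas as pd


def heated_raingauge_sketch(prec, temp, ind_heater, flag_val):
    """Hypothetical freezing-precipitation test for unheated rain gauges."""
    common = prec.columns.intersection(temp.columns)
    # Precipitation recorded while the air temperature is below 0 degC.
    fail = (prec[common] > 0) & (temp[common] < 0)
    # Assumption: ind_heater holds column labels of heated gauges, which are exempt.
    fail.loc[:, fail.columns.isin(ind_heater)] = False
    return fail.astype(int) * flag_val


prec = pd.DataFrame({"A": [1.0, 0.0], "B": [2.0, 2.0]})
temp = pd.DataFrame({"A": [-3.0, -3.0], "B": [-1.0, -1.0]})
flag = heated_raingauge_sketch(prec, temp, ind_heater=["B"], flag_val=7)
print(flag["A"].tolist(), flag["B"].tolist())  # [7, 0] [0, 0]
```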
- metpyqc.internal.humidity_prec(rh, prec, rh_min, flag_val)
Check whether relative humidity is below a given value (rh_min) while precipitation is occurring.
- Parameters
rh (pd.DataFrame) – Relative humidity dataframe to be tested (time, stations_rh) in percent
prec (pd.DataFrame) – Precipitation dataframe to be tested (time, stations_prec) in mm
rh_min (float) – Minimum acceptable value of relative humidity during a precipitation event.
flag_val (int) – integer flag value to be assigned to erroneous values.
- Returns
flag – Dataframe with flags identifying relative humidity values which fail the test.
- Return type
pd.DataFrame
Warning
This test is applied only to stations measuring both precipitation and relative humidity.
- metpyqc.internal.leafwet_humidity(lw, rh, rh_min, flag_val)
Check whether relative humidity is below a given value (rh_min) while leaf wetness duration is greater than 0.
- Parameters
lw (pd.DataFrame) – Leaf wetness duration dataframe to be tested (time, stations_lw) in minutes
rh (pd.DataFrame) – Relative humidity dataframe to be tested (time, stations_rh) in percent
rh_min (float) – Minimum acceptable value of relative humidity for observing leaf wetness duration.
flag_val (int) – integer flag value to be assigned to erroneous values.
- Returns
flag – Dataframe with flags identifying leaf wetness values which fail the test.
- Return type
pd.DataFrame
Warning
This test is applied only to stations measuring both leaf wetness and relative humidity.
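humidity_prec and leafwet_humidity share the same logic: relative humidity below rh_min while a "wet" variable (precipitation or leaf wetness duration) is positive. A minimal shared sketch, assuming stations are matched by column label; `rh_while_wet_sketch` is a hypothetical helper, not a metpyqc function.

```python
import pandas as pd


def rh_while_wet_sketch(trigger, rh, rh_min, flag_val):
    """Hypothetical shared logic: RH below rh_min while `trigger` is positive."""
    common = trigger.columns.intersection(rh.columns)  # co-located stations only
    fail = (trigger[common] > 0) & (rh[common] < rh_min)
    # humidity_prec reports these flags on RH, leafwet_humidity on leaf wetness.
    return fail.astype(int) * flag_val


prec = pd.DataFrame({"S1": [0.0, 2.4, 1.0]})
rh = pd.DataFrame({"S1": [40.0, 45.0, 92.0]})
print(rh_while_wet_sketch(prec, rh, rh_min=60.0, flag_val=9)["S1"].tolist())  # [0, 9, 0]
```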
- metpyqc.internal.maxmin(x, x_max, x_min, flag_val)
Check whether averaged observed values lie between the corresponding maximum and minimum observations.
- Parameters
x (pd.DataFrame) – Dataframe of average values to be tested (time, stations)
x_max (pd.DataFrame) – Dataframe of maximum values to be tested (time, stations)
x_min (pd.DataFrame) – Dataframe of minimum values to be tested (time, stations)
flag_val (int) – integer flag value to be assigned to erroneous values.
- Returns
flag (pd.DataFrame) – Dataframe with flags identifying values which fail the test.
res (pd.DataFrame) – Dataframe with quantitative residuals from prescribed limits: positive values indicate wrong values.
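The max/min consistency check is a range test whose limits vary at every time step. A sketch, assuming the three dataframes share the same index and columns and that a residual equal to 0 (average equal to a bound) passes; `maxmin_sketch` is hypothetical.

```python
import numpy as np
import pandas as pd


def maxmin_sketch(x, x_max, x_min, flag_val):
    """Hypothetical check that x_min <= x <= x_max at every time step."""
    res = np.maximum(x - x_max, x_min - x)  # positive = average outside [min, max]
    flag = (res > 0).astype(int) * flag_val
    return flag, res


x = pd.DataFrame({"S1": [12.0, 15.0]})
x_max = pd.DataFrame({"S1": [14.0, 14.0]})
x_min = pd.DataFrame({"S1": [9.0, 9.0]})
flag, res = maxmin_sketch(x, x_max, x_min, flag_val=1)
print(flag["S1"].tolist(), res["S1"].tolist())  # [0, 1] [-2.0, 1.0]
```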
- metpyqc.internal.snow_grass(snd, snd_toll, start_grass, end_grass, flag_val)
Check whether snow depth exceeds the instrumental tolerance during the grass growth period.
- Parameters
snd (pd.DataFrame) – Snow depth dataframe to be tested (time, stations) in cm
snd_toll (float) – Snow depth instrumental tolerance in cm
start_grass (int) – Month number identifying the start of the grass growth period.
end_grass (int) – Month number identifying the end of the grass growth period.
flag_val (int) – integer flag value to be assigned to erroneous values.
- Returns
flag – Dataframe with flags identifying snow depth values which fail the test.
- Return type
pd.DataFrame
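A sketch of the snow-on-grass check, assuming a DatetimeIndex, an inclusive growth period and support for periods that wrap around the new year (for example start_grass=11, end_grass=2). The inclusivity and wrap-around handling are assumptions; `snow_grass_sketch` is hypothetical.

```python
import pandas as pd


def snow_grass_sketch(snd, snd_toll, start_grass, end_grass, flag_val):
    """Hypothetical test: snow depth above tolerance during grass growth."""
    month = snd.index.month
    # Assumption: the growth period is inclusive and may wrap around the new year.
    if start_grass <= end_grass:
        in_season = (month >= start_grass) & (month <= end_grass)
    else:
        in_season = (month >= start_grass) | (month <= end_grass)
    fail = snd > snd_toll
    fail.loc[~in_season, :] = False  # outside the growth period nothing is flagged
    return fail.astype(int) * flag_val


idx = pd.to_datetime(["2022-01-10", "2022-06-10", "2022-06-11"])
snd = pd.DataFrame({"S1": [40.0, 0.5, 8.0]}, index=idx)
print(snow_grass_sketch(snd, snd_toll=2.0, start_grass=5, end_grass=9, flag_val=6)["S1"].tolist())  # [0, 0, 6]
```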
- metpyqc.internal.wspeed_wdir(ws, wd, flag_val)
Check whether wind speed and direction are consistent: if wind speed is null, wind direction should also be null, and vice versa. Residuals cannot be evaluated from this logical condition.
- Parameters
ws (pd.DataFrame) – Wind speed dataframe to be tested (time, stations) in m/s
wd (pd.DataFrame) – Wind direction dataframe to be tested (time, stations) in degrees from north.
flag_val (int) – integer flag value to be assigned to erroneous values.
- Returns
flag – Dataframe with flags identifying values which fail the test.
- Return type
pd.DataFrame
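The logical condition can be written as an exclusive or. This sketch interprets "null" as a value of zero (calm wind) and skips time steps where either value is missing; both points are assumptions, and `wspeed_wdir_sketch` is a hypothetical name.

```python
import pandas as pd


def wspeed_wdir_sketch(ws, wd, flag_val):
    """Hypothetical calm-wind consistency test: speed and direction zero together."""
    both_valid = ws.notna() & wd.notna()
    # Exclusive or: exactly one of speed and direction is zero.
    fail = ((ws == 0) != (wd == 0)) & both_valid
    return fail.astype(int) * flag_val


ws = pd.DataFrame({"S1": [0.0, 3.2, 0.0, 2.0]})
wd = pd.DataFrame({"S1": [0.0, 0.0, 180.0, 90.0]})
print(wspeed_wdir_sketch(ws, wd, flag_val=8)["S1"].tolist())  # [0, 8, 8, 0]
```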
metpyqc.spatial
Spatial Consistency Tests
- metpyqc.spatial.hubbard_consistency(lat, lon, x, start_test, end_test, n_max, t_max, search_radius, min_neigh, missing_perc, f, flag_val)
Hubbard spatial weighted regression analysis.
- Parameters
lat (array_like, shape(n,)) – Array of latitudes in decimal degrees
lon (array_like, shape(n,)) – Array of longitudes in decimal degrees
x (pd.DataFrame, shape(t,n)) – Pandas dataframe of observations, where t is time and n is the number of stations
start_test (string) – Datetime string indicating when to start testing
end_test (string) – Datetime string indicating when to end testing
n_max (int) – Maximum number of best fit stations to use for the estimate
t_max (int) – Number of time steps to be considered in the regression analysis (even number)
search_radius (float) – Radius for Hubbard analysis in decimal degrees
min_neigh (int) – Minimum number of neighbors to find the estimate
missing_perc (int) – Maximum percentage of missing data to perform regression
f (int) – Factor multiplying the standard deviation to compute the acceptable range for valid observations.
flag_val (int) – integer flag value to be assigned to erroneous values.
- Returns
df_x_est (pd.DataFrame, shape(t,n)) – Estimated observations, filled with np.nan values where estimate is not possible
df_std_est (pd.DataFrame, shape(t,n)) – Standard deviation from the estimated observations, filled with np.nan values where estimate is not possible
flag (pd.DataFrame, shape(t,n)) – Dataframe with flags identifying values which fail the test.
res (pd.DataFrame, shape(t,n)) – Dataframe with quantitative residuals from prescribed limits: positive values indicate wrong values.
Notes
The spatial weighted regression test is based on the algorithm proposed by [Hubbard2005]. First, for each reference station \((0)\), the neighbour stations \((n)\) inside a given search_radius are found. The search radius should be set close to the average spacing of the observations and large enough to provide at least one neighbour for each station.
Once the neighbours have been established, if the percentage of missing values in each series is lower than missing_perc and n \(\ge\) n_max, a linear regression is computed between the neighbour values \(x(t,n)\) and the reference station values \(x(t,0)\) over the t_max selected time steps, in order to obtain a first estimate \(x^*_n(t,0)\) of \(x(t,0)\) that is consistent with \(x(t,n)\) at each time step.
Then the root mean square error between the reference values \(x(t,0)\) and the values \(x^*_n(t,0)\) estimated from the regression line with neighbour station \(n\) (corresponding to the sample standard deviation \(\sigma^*_n\) of the residuals) is evaluated as a measure of the correlation between the stations:
\[\sigma^*_n(0)=\sqrt{\frac{1}{t_{max}}\sum_{t=1}^{t_{max}} \big[\underbrace{x(t,0)-x^*_n(t,0)}_{\text{Residuals}}\big]^2}\]
This error, which characterizes each neighbour station, is used as a weight in the final estimate of the reference value \(x^*(t,0)\) and of the reference standard deviation \(\sigma^*(0)\) from the surrounding stations at each instant t:
\[\begin{split}x^*(t,0)=\frac{\sum_{n=1}^{n_{max}} x^*_n(t,0)/(\sigma^*_n)^2}{\sum_{n=1}^{n_{max}} 1/(\sigma^*_n)^2} \\ \sigma^{*2}(0)= \frac{n_{max}}{\sum_{n=1}^{n_{max}} 1/(\sigma^*_n)^2}\end{split}\]
Finally, at each time step a tolerance interval is established using a constant factor f, and spatial consistency is verified by ensuring that:
\[x^*(t,0)-f\sigma^*(0) < x(t,0) < x^*(t,0)+f\sigma^*(0)\]
This procedure is repeated for each station and each time step. The final estimates \(x^*(t,n)\) and reference standard deviations \(\sigma^*(n)\) are returned in the output dataframes df_x_est and df_std_est, respectively.
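The core of the weighted estimate for a single reference station can be sketched with NumPy. This is a simplified illustration, not the hubbard_consistency implementation: it assumes the neighbours have already been selected and have complete data, and it omits the search radius, missing-data screening and the moving t_max window. `hubbard_estimate_sketch` is a hypothetical name.

```python
import numpy as np


def hubbard_estimate_sketch(x0, neigh, f):
    """Hypothetical core of the weighted-regression estimate for one station.

    x0    : shape (t_max,), reference station values x(t, 0)
    neigh : shape (t_max, n_max), values x(t, n) of the selected neighbours
    """
    estimates, sigmas = [], []
    for j in range(neigh.shape[1]):
        # Linear regression of the reference on neighbour j: x*_n(t, 0) = a + b x(t, n).
        b, a = np.polyfit(neigh[:, j], x0, 1)
        x_n = a + b * neigh[:, j]
        estimates.append(x_n)
        sigmas.append(np.sqrt(np.mean((x0 - x_n) ** 2)))  # sigma*_n with 1/t_max
    weights = 1.0 / np.asarray(sigmas) ** 2
    x_star = np.column_stack(estimates) @ weights / weights.sum()
    sigma_star = np.sqrt(len(weights) / weights.sum())
    # Spatial consistency: x* - f sigma* < x(t, 0) < x* + f sigma*.
    ok = (x_star - f * sigma_star < x0) & (x0 < x_star + f * sigma_star)
    return x_star, sigma_star, ok


rng = np.random.default_rng(0)
signal = 10 + 5 * np.sin(np.linspace(0, 6, 60))
neigh = np.column_stack([signal + rng.normal(0, 0.3, 60),
                         0.8 * signal + 2 + rng.normal(0, 0.3, 60)])
x0 = signal + rng.normal(0, 0.3, 60)
x0[30] += 20.0  # inject a spatially inconsistent spike
x_star, sigma_star, ok = hubbard_estimate_sketch(x0, neigh, f=3)
print(np.flatnonzero(~ok))  # [30]
```

Note that the spike also inflates each \(\sigma^*_n\) because it is part of the regression sample, which widens the tolerance interval; with t_max = 60 the spike still falls well outside it.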
References
- Hubbard2005
Hubbard, K. G., et al. “Performance of quality assurance procedures for an applied climate information system.” Journal of Atmospheric and Oceanic Technology 22.1 (2005): 105-112.
metpyqc.reconstruct
Reconstruction techniques for data gap-filling
metpyqc.calculate
Useful routines for quality control tests
- metpyqc.calculate.dewpoint(temperature, relative_humidity)
Calculate dewpoint temperature from temperature and relative humidity.
- Parameters
temperature (pd.DataFrame) – Temperature in degrees Celsius
relative_humidity (pd.DataFrame) – Relative humidity in percent
- Returns
dew – Dewpoint temperature in degrees Celsius
- Return type
pd.DataFrame
Notes
The formula used to calculate dewpoint temperature is obtained by inverting Bolton’s formula [Emanuel1994]:
\[T_{d} \simeq \frac{243.5}{\frac{17.67}{\ln (e/6.112)} -1},\]
where \(e = \frac{RH}{100} e^{*}\) is the actual vapor pressure in hPa, RH the relative humidity in percent and \(e^{*}\) the saturation vapor pressure at temperature T (in degrees Celsius), as obtained from the [Bolton1980] formula:
\[e^{*} = 6.112 \exp \left(\frac{17.67 T}{T + 243.5}\right)\]
References
- Emanuel1994
Emanuel, K. A. Atmospheric Convection. Oxford University Press, 1994.
- Bolton1980
Bolton, D. “The computation of equivalent potential temperature.” Monthly Weather Review 108.7 (1980): 1046-1053.
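The two formulas in the notes translate directly into vectorized pandas/NumPy code. A minimal sketch, assuming RH is given in percent and must be divided by 100; `dewpoint_sketch` is a hypothetical name, not the library function. At RH = 100 % the inversion returns the air temperature, which is a quick sanity check.

```python
import numpy as np
import pandas as pd


def dewpoint_sketch(temperature, relative_humidity):
    """Hypothetical dew point (degC) from T (degC) and RH (%), inverted Bolton formula."""
    e_sat = 6.112 * np.exp(17.67 * temperature / (temperature + 243.5))  # hPa
    e = relative_humidity / 100.0 * e_sat  # actual vapor pressure, RH as a fraction
    ln_ratio = np.log(e / 6.112)
    return 243.5 / (17.67 / ln_ratio - 1.0)


t = pd.DataFrame({"S1": [20.0, 20.0]})
rh = pd.DataFrame({"S1": [50.0, 100.0]})
print(dewpoint_sketch(t, rh).round(2)["S1"].tolist())  # [9.27, 20.0]
```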
- metpyqc.calculate.find_neighbors(points, xi, r)
Find neighboring points inside a search radius.
- Parameters
points (array_like, shape(N,2)) – (lat,lon) of surrounding points in decimal degrees
xi (array_like, shape(1,2)) – (lat,lon) of the center of the search ball in decimal degrees
r (float) – Search radius in decimal degrees
- Returns
indices – Indices of the neighboring points inside the search radius
- Return type
array_like
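A brute-force sketch of the radius search, assuming (lat, lon) in decimal degrees are treated as planar coordinates with a Euclidean distance, which matches a radius expressed in decimal degrees. `find_neighbors_sketch` is hypothetical; the library may use a spatial tree instead.

```python
import numpy as np


def find_neighbors_sketch(points, xi, r):
    """Hypothetical radius search treating (lat, lon) degrees as planar coordinates."""
    points = np.asarray(points, dtype=float)
    centre = np.asarray(xi, dtype=float).reshape(2)  # (1, 2) -> (2,)
    dist = np.hypot(points[:, 0] - centre[0], points[:, 1] - centre[1])
    return np.flatnonzero(dist <= r)


stations = [[45.0, 11.0], [45.1, 11.1], [46.0, 12.0]]
print(find_neighbors_sketch(stations, [[45.0, 11.0]], r=0.2))  # [0 1]
```

For many stations, `scipy.spatial.cKDTree(points).query_ball_point(x, r)` performs the same Euclidean search more efficiently. Because a degree of longitude shrinks with the cosine of latitude, a great-circle (haversine) distance is more accurate over large areas.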