Reference Guide

metpyqc.range

Range tests

metpyqc.range.range_all(x, min_val, max_val, flag_val)

Check if observed values are outside the instrumental limits.

Parameters
  • x (pd.DataFrame) – Dataframe to be tested (time, stations)

  • min_val (float) – lower limit of instrumental range.

  • max_val (float) – upper limit of instrumental range.

  • flag_val (int) – Integer representing flag values to be associated with erroneous values.

Returns

  • flag (pd.DataFrame) – Dataframe with flags identifying values which fail the test.

  • res (pd.DataFrame) – Dataframe with quantitative residuals from prescribed limits: positive values indicate wrong values.
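The range test reduces to an element-wise comparison with the two limits. A minimal pandas sketch of that logic follows; it is not the metpyqc source, the helper name `range_check` is hypothetical, and it assumes that cells passing the test (or missing) carry `np.nan` in `flag`.

```python
import numpy as np
import pandas as pd


def range_check(x: pd.DataFrame, min_val: float, max_val: float, flag_val: int):
    """Hypothetical sketch of a fixed instrumental range test."""
    below = min_val - x  # positive where the value is under the lower limit
    above = x - max_val  # positive where the value is over the upper limit
    # Residual = distance beyond the nearest limit; NaN observations stay NaN.
    res = below.where(below >= above, above)
    # Assumption: cells that pass the test (or are missing) get np.nan as flag.
    flag = pd.DataFrame(np.where(res > 0, flag_val, np.nan),
                        index=x.index, columns=x.columns)
    return flag, res


x = pd.DataFrame({"A": [-5.0, 10.0, 45.0], "B": [np.nan, 0.0, 41.0]})
flag, res = range_check(x, min_val=0.0, max_val=40.0, flag_val=2)
```

Taking the larger of the two distances yields a single residual that is positive exactly when a value lies outside [min_val, max_val].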

metpyqc.range.range_seas(x, min_vals, max_vals, flag_val)

Check if observed values are outside the seasonal climatological limits.

Parameters
  • x (pd.DataFrame) – Dataframe to be tested (time, stations)

  • min_vals (array_like, shape(4,)) – Array of lower limits for each season in this order: DJF (December, January, February), MAM (March, April, May), JJA (June, July, August), SON (September, October, November).

  • max_vals (array_like, shape(4,)) – Array of upper limits for each season in this order: DJF (December, January, February), MAM (March, April, May), JJA (June, July, August), SON (September, October, November).

  • flag_val (int) – Integer representing flag values to be associated with erroneous values.

Returns

  • flag (pd.DataFrame) – Dataframe with flags identifying values which fail the test.

  • res (pd.DataFrame) – Dataframe with quantitative residuals from prescribed limits: positive values indicate wrong values.
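For the seasonal variant, each timestamp must first be mapped to its season. The sketch below (hypothetical names `season_index` and `range_seas_check`; not the library implementation) assumes a `DatetimeIndex` and uses `(month % 12) // 3`, which sends December, January and February to 0 and the following seasons to 1, 2 and 3, matching the documented DJF, MAM, JJA, SON order.

```python
import numpy as np
import pandas as pd


def season_index(index: pd.DatetimeIndex) -> np.ndarray:
    """Map each timestamp to 0=DJF, 1=MAM, 2=JJA, 3=SON."""
    # December (12) -> 0, Jan/Feb -> 0, Mar-May -> 1, Jun-Aug -> 2, Sep-Nov -> 3
    return (np.asarray(index.month) % 12) // 3


def range_seas_check(x, min_vals, max_vals, flag_val):
    """Hypothetical seasonal range test: one (min, max) pair per season."""
    season = season_index(x.index)
    lo = pd.Series(np.asarray(min_vals, dtype=float)[season], index=x.index)
    hi = pd.Series(np.asarray(max_vals, dtype=float)[season], index=x.index)
    below = x.rsub(lo, axis=0)  # lo - x, limit chosen row by row
    above = x.sub(hi, axis=0)   # x - hi, limit chosen row by row
    res = below.where(below >= above, above)
    # Assumption: unflagged cells are np.nan.
    flag = pd.DataFrame(np.where(res > 0, flag_val, np.nan),
                        index=x.index, columns=x.columns)
    return flag, res


idx = pd.to_datetime(["2020-01-15", "2020-07-15"])
x = pd.DataFrame({"S1": [15.0, 15.0]}, index=idx)
flag, res = range_seas_check(x, [-20, -5, 5, 0], [10, 25, 40, 30], flag_val=2)
```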

metpyqc.temporal

Temporal consistency tests

metpyqc.temporal.isolated(x, n, flag_val)

Check whether an observation of rainfall (or another discrete variable) is isolated with respect to the adjacent measurements in time.

Parameters
  • x (pd.DataFrame) – Dataframe to be tested (time, stations)

  • n (int) – Number of previous and following time steps to be considered in the test. The value must be greater than 1.

  • flag_val (int) – Integer representing flag values to be associated with suspect values.

Returns

  • flag (pd.DataFrame) – Dataframe with flags identifying values which fail the test.

  • res (pd.DataFrame) – Dataframe with quantitative residuals representing the difference between the selected observation and the sum of the adjacent ones. As this value increases, the selected observation is more isolated with respect to neighbors.

Warning

This function must be used with dataframes having a constant time step on the temporal index. Missing dates must be filled with np.nan values.
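The residual described above can be sketched with shifted rolling sums. The flagging rule used here (a positive value whose 2n neighbours are all zero) is an assumption, since the exact criterion is not stated; `isolated_check` is a hypothetical name and not the metpyqc source.

```python
import numpy as np
import pandas as pd


def isolated_check(x: pd.DataFrame, n: int, flag_val: int):
    """Hypothetical isolation test for rainfall-like series (constant time step)."""
    # Sum of the n previous values x(t-n) ... x(t-1); NaN if any is missing.
    prev = x.shift(1).rolling(n).sum()
    # Sum of the n following values x(t+1) ... x(t+n); NaN if any is missing.
    nxt = x.shift(-n).rolling(n).sum()
    adjacent = prev + nxt
    res = x - adjacent
    # Assumed flagging rule: a non-zero value surrounded only by zeros.
    flag = pd.DataFrame(np.where((x > 0) & (adjacent == 0), flag_val, np.nan),
                        index=x.index, columns=x.columns)
    return flag, res


rain = pd.DataFrame({"S1": [0.0, 0.0, 5.0, 0.0, 0.0, 1.0, 2.0]})
flag, res = isolated_check(rain, n=2, flag_val=3)
```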

metpyqc.temporal.persistence_noc(x, n, flag_val)

Not Observed Change (NOC) Persistence test to check minimum variability of a certain variable with respect to the previous n time steps.

Parameters
  • x (pd.DataFrame) – Dataframe to be tested (time, stations)

  • n (int) – Number of previous time steps to be considered in the test. The value must be greater than 1.

  • flag_val (int) – Integer representing flag values to be associated with suspect values.

Returns

  • flag (pd.DataFrame) – Dataframe with flags identifying values which fail the test.

  • res (pd.DataFrame) – Dataframe with quantitative residuals representing the maximum absolute difference between the selected observation and the preceding ones. As this value tends to zero, the selected observation indicates temporal persistence.

Warning

This function must be used with dataframes having a constant time step on the temporal index. Missing dates must be filled with np.nan values.
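A sketch of the NOC residual, computed as the maximum absolute difference from each of the n previous values. Flagging cells whose residual is exactly zero is an assumption suggested by the description above, not a documented rule; `persistence_noc_check` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def persistence_noc_check(x: pd.DataFrame, n: int, flag_val: int):
    """Hypothetical 'no observed change' test over the n previous time steps."""
    # |x(t) - x(t-k)| for k = 1..n, stacked into shape (n, time, stations).
    diffs = np.stack([(x - x.shift(k)).abs().to_numpy() for k in range(1, n + 1)])
    # ndarray.max propagates NaN, so incomplete histories give a NaN residual.
    res = pd.DataFrame(diffs.max(axis=0), index=x.index, columns=x.columns)
    # Assumed flagging rule: no change at all with respect to the n previous values.
    flag = pd.DataFrame(np.where(res == 0, flag_val, np.nan),
                        index=x.index, columns=x.columns)
    return flag, res


temp = pd.DataFrame({"S1": [1.0, 1.0, 1.0, 2.0]})
flag, res = persistence_noc_check(temp, n=2, flag_val=4)
```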

metpyqc.temporal.persistence_var(x, window, perc_min, method, var_val_min, flag_val)

Minimum Variability Persistence test to check minimum allowed variability of a certain variable with respect to a certain time window.

Parameters
  • x (pd.DataFrame) – Dataframe to be tested (time, stations)

  • window (int) – Number of hours over which to evaluate temporal variability

  • perc_min (int) – Minimum percentage of valid (non-missing) observations required to calculate temporal variability

  • method ({'STD', 'MAX_MIN', 'IQR'}) – Method for calculating the temporal variability within the desired time window: standard deviation (STD), difference between absolute maximum and minimum values (MAX_MIN), interquartile range (IQR).

  • var_val_min (float) – Minimum allowed variability for the selected variable within the defined time window

  • flag_val (int) – Integer representing flag values to be associated with suspect values.

Returns

  • flag (pd.DataFrame) – Dataframe with flags identifying values which fail the test.

  • res (pd.DataFrame) – Dataframe with quantitative residuals representing the difference between the minimum allowed variability and the actual variability. Positive values indicate suspect observations.

Raises

Exception – If the selected method is neither ‘STD’, ‘MAX_MIN’ nor ‘IQR’.
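A minimal sketch of the minimum-variability test, assuming regularly spaced hourly data so that `window` hours equals `window` rows, and using `ValueError` (a subclass of `Exception`) for an unknown method. The name `persistence_var_check` is hypothetical, and the residual is taken as minimum allowed minus actual variability so that positive values mark suspect observations.

```python
import numpy as np
import pandas as pd


def persistence_var_check(x, window, perc_min, method, var_val_min, flag_val):
    """Hypothetical minimum-variability test; assumes one row per hour."""
    if method not in ("STD", "MAX_MIN", "IQR"):
        raise ValueError("method must be 'STD', 'MAX_MIN' or 'IQR'")
    roll = x.rolling(window, min_periods=1)  # trailing window of `window` rows
    if method == "STD":
        var = roll.std()
    elif method == "MAX_MIN":
        var = roll.max() - roll.min()
    else:  # IQR
        var = roll.quantile(0.75) - roll.quantile(0.25)
    # Keep only windows with enough valid (non-missing) observations.
    valid_perc = 100.0 * roll.count() / window
    var = var.where(valid_perc >= perc_min)
    res = var_val_min - var  # positive: variability below the allowed minimum
    flag = pd.DataFrame(np.where(res > 0, flag_val, np.nan),
                        index=x.index, columns=x.columns)
    return flag, res


x = pd.DataFrame({"S1": [5.0, 5.0, 5.0, 5.0, 1.0, 9.0]})
flag, res = persistence_var_check(x, 4, 75, "MAX_MIN", 0.5, flag_val=5)
```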

metpyqc.temporal.step_all(x, step_val_susp, step_val_wrong, flag_val_susp, flag_val_wrong)

Check the difference between consecutive observations against the step limits for suspect and wrong values.

Parameters
  • x (pd.DataFrame) – Dataframe to be tested (time, stations)

  • step_val_susp (float) – step limit for suspect values.

  • step_val_wrong (float) – step limit for wrong values.

  • flag_val_susp (int) – Integer representing flag values to be associated with suspect values.

  • flag_val_wrong (int) – Integer representing flag values to be associated with wrong values.

Returns

  • flag (pd.DataFrame) – Dataframe with flags identifying values which fail the test.

  • res (pd.DataFrame) – Dataframe with quantitative residuals from prescribed limits: positive values indicate suspect or wrong values.

Warning

This function must be used with dataframes having a constant time step on the temporal index. Missing dates must be filled with np.nan values.
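A sketch of the two-threshold step test, assuming that the step is the absolute difference between consecutive rows, that `step_val_susp < step_val_wrong`, and that residuals are measured from the suspect limit. The helper `step_check` is hypothetical.

```python
import numpy as np
import pandas as pd


def step_check(x, step_val_susp, step_val_wrong, flag_val_susp, flag_val_wrong):
    """Hypothetical step test; assumes step_val_susp < step_val_wrong."""
    step = x.diff().abs()          # |x(t) - x(t-1)|, NaN at the first row
    res = step - step_val_susp     # positive: at least suspect
    # Wrong takes precedence over suspect; unflagged cells stay np.nan.
    flag = np.where(step > step_val_wrong, flag_val_wrong,
                    np.where(step > step_val_susp, flag_val_susp, np.nan))
    return pd.DataFrame(flag, index=x.index, columns=x.columns), res


x = pd.DataFrame({"S1": [10.0, 12.0, 18.0, 26.0]})
flag, res = step_check(x, 5.0, 7.0, flag_val_susp=1, flag_val_wrong=2)
```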

metpyqc.temporal.step_seas(x, step_vals_susp, step_vals_wrong, flag_val_susp, flag_val_wrong)

Check the difference between consecutive observations by considering different limits for each season.

Parameters
  • x (pd.DataFrame) – Dataframe to be tested (time, stations)

  • step_vals_susp (array_like, shape(4,)) – Array of step limits for suspect values for each season in this order: DJF (December, January, February), MAM (March, April, May), JJA (June, July, August), SON (September, October, November).

  • step_vals_wrong (array_like, shape(4,)) – Array of step limits for wrong values for each season in this order: DJF (December, January, February), MAM (March, April, May), JJA (June, July, August), SON (September, October, November).

  • flag_val_susp (int) – Integer representing flag values to be associated with suspect values.

  • flag_val_wrong (int) – Integer representing flag values to be associated with wrong values.

Returns

  • flag (pd.DataFrame) – Dataframe with flags identifying values which fail the test.

  • res (pd.DataFrame) – Dataframe with quantitative residuals from prescribed limits: positive values indicate suspect or wrong values.

Warning

This function must be used with dataframes having a constant time step on the temporal index. Missing dates must be filled with np.nan values.

metpyqc.internal

Internal Consistency Tests

metpyqc.internal.dewpoint_test(temp, rh, flag_val)

Check whether the dewpoint temperature derived from temperature and relative humidity is less than or equal to the temperature.

Parameters
  • temp (pd.DataFrame) – Temperature dataframe to be tested (time, stations_temp) in degrees Celsius

  • rh (pd.DataFrame) – Relative humidity dataframe to be tested (time, stations_rh) in percentage

  • flag_val (int) – Integer representing flag values to be associated with erroneous values.

Returns

  • flag_temp (pd.DataFrame) – Dataframe with flags identifying temperature values which fail the test.

  • flag_rh (pd.DataFrame) – Dataframe with flags identifying relative humidity values which fail the test.

  • res (pd.DataFrame) – Dataframe with quantitative residuals from prescribed limits: positive values indicate wrong values (residuals refer to the temperature dataframe).

  • temp_dew (pd.DataFrame) – Dew-point temperature in degrees Celsius

Warning

This test is applied only to stations measuring both temperature and relative humidity.
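A sketch of the dewpoint test restricted to stations present in both dataframes. It uses the formula documented under metpyqc.calculate.dewpoint, rearranged algebraically, and adds a small tolerance (an assumption of this sketch) so that RH = 100 %, where the dewpoint equals the temperature up to rounding, is not flagged. All names are hypothetical.

```python
import numpy as np
import pandas as pd


def dewpoint_from(temp, rh):
    """Dewpoint (deg C) from temperature (deg C) and relative humidity (%)."""
    es = 6.112 * np.exp(17.67 * temp / (temp + 243.5))  # saturation vapour pressure, hPa
    ln_ratio = np.log(rh / 100.0 * es / 6.112)           # ln(e / 6.112)
    # Equivalent to 243.5 / (17.67 / ln(e / 6.112) - 1).
    return 243.5 * ln_ratio / (17.67 - ln_ratio)


def dewpoint_check(temp, rh, flag_val, tol=1e-6):
    """Hypothetical dewpoint test on stations present in both dataframes."""
    stations = temp.columns.intersection(rh.columns)
    t, h = temp[stations], rh[stations]
    temp_dew = dewpoint_from(t, h)
    res = temp_dew - t  # positive: dewpoint above air temperature
    # tol (assumed) keeps RH = 100 % from being flagged by rounding error.
    flag = pd.DataFrame(np.where(res > tol, flag_val, np.nan),
                        index=t.index, columns=stations)
    return flag, flag.copy(), res, temp_dew


temp = pd.DataFrame({"S1": [20.0, 20.0, 20.0], "S2": [5.0, 5.0, 5.0]})
rh = pd.DataFrame({"S1": [50.0, 105.0, 100.0]})
flag_t, flag_rh, res, td = dewpoint_check(temp, rh, flag_val=7)
```

Since the dewpoint cannot exceed the air temperature when RH ≤ 100 %, this check in practice catches relative humidity readings above saturation.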

metpyqc.internal.heated_raingauge(prec, temp, ind_heater, flag_val)

Check whether precipitation is recorded at freezing temperatures (below 0 degrees Celsius). Values are flagged only when the rain gauge is not heated.

Parameters
  • prec (pd.DataFrame) – Precipitation dataframe to be tested (time, stations_prec) in mm

  • temp (pd.DataFrame) – Temperature dataframe to be tested (time, stations_temp) in degrees Celsius.

  • ind_heater (list) – List of indexes indicating which rain gauges are heated.

  • flag_val (int) – Integer representing flag values to be associated with erroneous values.

Returns

flag – Dataframe with flags identifying precipitation values which fail the test.

Return type

pd.DataFrame

Warning

This test is applied only to stations measuring both precipitation and temperature.
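A sketch of the heated rain gauge check, assuming that `ind_heater` contains the column labels of the heated gauges (the description above does not say whether labels or positions are meant) and that unflagged cells are `np.nan`. `heated_raingauge_check` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def heated_raingauge_check(prec, temp, ind_heater, flag_val):
    """Hypothetical check: precipitation > 0 mm with temperature < 0 deg C."""
    stations = prec.columns.intersection(temp.columns)
    bad = (prec[stations] > 0) & (temp[stations] < 0)
    # Assumption: ind_heater holds column labels of heated gauges, which are exempt.
    bad.loc[:, stations.isin(ind_heater)] = False
    return pd.DataFrame(np.where(bad, flag_val, np.nan),
                        index=bad.index, columns=stations)


prec = pd.DataFrame({"A": [1.0, 0.0], "B": [2.0, 2.0]})
temp = pd.DataFrame({"A": [-1.0, -1.0], "B": [-3.0, 4.0]})
flag = heated_raingauge_check(prec, temp, ind_heater=["B"], flag_val=9)
```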

metpyqc.internal.humidity_prec(rh, prec, rh_min, flag_val)

Check whether relative humidity is below a given threshold (rh_min) while precipitation is occurring.

Parameters
  • rh (pd.DataFrame) – Relative humidity dataframe to be tested (time, stations_rh) in percentage

  • prec (pd.DataFrame) – Precipitation dataframe to be tested (time, stations_prec) in mm

  • rh_min (float) – Minimum acceptable value of relative humidity during a precipitation event.

  • flag_val (int) – Integer representing flag values to be associated with erroneous values.

Returns

flag – Dataframe with flags identifying relative humidity values which fail the test.

Return type

pd.DataFrame

Warning

This test is applied only to stations measuring both precipitation and relative humidity.
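A sketch of the humidity/precipitation check on the stations common to both dataframes; `humidity_prec_check` is hypothetical and unflagged cells are assumed to be `np.nan`.

```python
import numpy as np
import pandas as pd


def humidity_prec_check(rh, prec, rh_min, flag_val):
    """Hypothetical check: relative humidity below rh_min while precipitation > 0."""
    stations = rh.columns.intersection(prec.columns)
    bad = (rh[stations] < rh_min) & (prec[stations] > 0)
    return pd.DataFrame(np.where(bad, flag_val, np.nan),
                        index=bad.index, columns=stations)


rh = pd.DataFrame({"S1": [30.0, 90.0, 30.0]})
prec = pd.DataFrame({"S1": [1.0, 1.0, 0.0]})
flag = humidity_prec_check(rh, prec, rh_min=60.0, flag_val=6)
```

The same pattern would apply to leafwet_humidity, replacing the `prec > 0` condition with `lw > 0`.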

metpyqc.internal.leafwet_humidity(lw, rh, rh_min, flag_val)

Check whether relative humidity is below a given threshold (rh_min) while leaf wetness duration is greater than 0.

Parameters
  • lw (pd.DataFrame) – Leaf wetness duration dataframe to be tested (time, stations_lw) in minutes

  • rh (pd.DataFrame) – Relative humidity dataframe to be tested (time, stations_rh) in percentage

  • rh_min (float) – Minimum acceptable value of relative humidity for observing leaf wetness duration.

  • flag_val (int) – Integer representing flag values to be associated with erroneous values.

Returns

flag – Dataframe with flags identifying leaf wetness values which fail the test.

Return type

pd.DataFrame

Warning

This test is applied only to stations measuring both leaf wetness and relative humidity.

metpyqc.internal.maxmin(x, x_max, x_min, flag_val)

Check whether averaged observed values lie between the corresponding maximum and minimum observations.

Parameters
  • x (pd.DataFrame) – Dataframe of average values to be tested (time, stations)

  • x_max (pd.DataFrame) – Dataframe of the corresponding maximum values (time, stations)

  • x_min (pd.DataFrame) – Dataframe of the corresponding minimum values (time, stations)

  • flag_val (int) – Integer representing flag values to be associated with erroneous values.

Returns

  • flag (pd.DataFrame) – Dataframe with flags identifying values which fail the test.

  • res (pd.DataFrame) – Dataframe with quantitative residuals from prescribed limits: positive values indicate wrong values.
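A sketch of the max/min consistency check: the residual is the larger of `x - x_max` and `x_min - x`, so it is positive exactly when the average falls outside its bounds. `maxmin_check` is hypothetical and unflagged cells are assumed to be `np.nan`.

```python
import numpy as np
import pandas as pd


def maxmin_check(x, x_max, x_min, flag_val):
    """Hypothetical check that averages lie within [x_min, x_max]."""
    above = x - x_max  # positive: average above the maximum
    below = x_min - x  # positive: average below the minimum
    res = above.where(above >= below, below)
    flag = pd.DataFrame(np.where(res > 0, flag_val, np.nan),
                        index=x.index, columns=x.columns)
    return flag, res


x = pd.DataFrame({"S1": [5.0, 12.0, -1.0]})
x_max = pd.DataFrame({"S1": [10.0, 10.0, 10.0]})
x_min = pd.DataFrame({"S1": [0.0, 0.0, 0.0]})
flag, res = maxmin_check(x, x_max, x_min, flag_val=8)
```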

metpyqc.internal.snow_grass(snd, snd_toll, start_grass, end_grass, flag_val)

Check whether snow depth exceeds the instrumental tolerance (snd_toll) during the grass growth period.

Parameters
  • snd (pd.DataFrame) – Snow depth dataframe to be tested (time, stations) in cm

  • snd_toll (float) – Snow depth instrumental tolerance in cm

  • start_grass (int) – Month number identifying the starting period of grass growth.

  • end_grass (int) – Month number identifying the ending period of grass growth.

  • flag_val (int) – Integer representing flag values to be associated with erroneous values.

Returns

flag – Dataframe with flags identifying snow depth values which fail the test.

Return type

pd.DataFrame
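A sketch of the snow/grass check, assuming a `DatetimeIndex`, a growth period that includes both `start_grass` and `end_grass`, and a period that wraps across the new year when `start_grass > end_grass`. `snow_grass_check` is hypothetical.

```python
import numpy as np
import pandas as pd


def snow_grass_check(snd, snd_toll, start_grass, end_grass, flag_val):
    """Hypothetical check: snow depth above tolerance during grass growth."""
    month = np.asarray(snd.index.month)
    if start_grass <= end_grass:
        in_season = (month >= start_grass) & (month <= end_grass)
    else:  # assumed: period spanning the new year, e.g. November to February
        in_season = (month >= start_grass) | (month <= end_grass)
    bad = snd > snd_toll
    bad.loc[~in_season, :] = False  # outside the growth period nothing is flagged
    return pd.DataFrame(np.where(bad, flag_val, np.nan),
                        index=snd.index, columns=snd.columns)


idx = pd.to_datetime(["2021-01-10", "2021-06-10", "2021-06-11"])
snd = pd.DataFrame({"S1": [50.0, 5.0, 0.5]}, index=idx)
flag = snow_grass_check(snd, snd_toll=1.0, start_grass=5, end_grass=9, flag_val=10)
```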

metpyqc.internal.wspeed_wdir(ws, wd, flag_val)

Check whether wind speed and direction are consistent: if wind speed is null, wind direction should also be null, and vice versa. Since this is a logical condition, no residuals can be evaluated.

Parameters
  • ws (pd.DataFrame) – Wind speed dataframe to be tested (time, stations) in m/s

  • wd (pd.DataFrame) – Wind direction dataframe to be tested (time, stations) in degrees from north.

  • flag_val (int) – Integer representing flag values to be associated with erroneous values.

Returns

flag – Dataframe with flags identifying values which fail the test.

Return type

pd.DataFrame
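A sketch of the wind consistency check. It assumes that "null" means a value of zero (calm) and leaves missing readings unflagged; both are interpretations rather than documented behaviour, and `wspeed_wdir_check` is hypothetical.

```python
import numpy as np
import pandas as pd


def wspeed_wdir_check(ws, wd, flag_val):
    """Hypothetical check: zero speed and zero direction must occur together."""
    stations = ws.columns.intersection(wd.columns)
    s, d = ws[stations], wd[stations]
    # Assumption: "null" means zero; exactly one of the two being zero is inconsistent.
    bad = ((s == 0) ^ (d == 0)) & s.notna() & d.notna()
    return pd.DataFrame(np.where(bad, flag_val, np.nan),
                        index=s.index, columns=stations)


ws = pd.DataFrame({"S1": [0.0, 3.0, 0.0, 2.0, np.nan]})
wd = pd.DataFrame({"S1": [0.0, 180.0, 90.0, 0.0, 0.0]})
flag = wspeed_wdir_check(ws, wd, flag_val=11)
```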

metpyqc.spatial

Spatial Consistency Tests

metpyqc.spatial.hubbard_consistency(lat, lon, x, start_test, end_test, n_max, t_max, search_radius, min_neigh, missing_perc, f, flag_val)

Hubbard spatial weighted regression analysis.

Parameters
  • lat (array_like, shape(n,)) – Array of latitudes in decimal degrees

  • lon (array_like, shape(n,)) – Array of longitudes in decimal degrees

  • x (pd.DataFrame, shape(t,n)) – Pandas dataframe of observations, where t is time and n is the number of stations

  • start_test (string) – Datetime string indicating when to start testing

  • end_test (string) – Datetime string indicating when to end testing

  • n_max (int) – Maximum number of best-fit stations to use for the estimate

  • t_max (int) – Number of time steps to be considered in the regression analysis (even number)

  • search_radius (float) – Radius for Hubbard analysis in decimal degrees

  • min_neigh (int) – Minimum number of neighbors required to compute the estimate

  • missing_perc (int) – Maximum percentage of missing data allowed to perform the regression

  • f (int) – Factor multiplying standard deviation for calculating the acceptable range for valid observations.

  • flag_val (int) – Integer representing flag values to be associated with erroneous values.

Returns

  • df_x_est (pd.DataFrame, shape(t,n)) – Estimated observations, filled with np.nan values where estimate is not possible

  • df_std_est (pd.DataFrame, shape(t,n)) – Standard deviation from the estimated observations, filled with np.nan values where estimate is not possible

  • flag (pd.DataFrame, shape(t,n)) – Dataframe with flags identifying values which fail the test.

  • res (pd.DataFrame, shape(t,n)) – Dataframe with quantitative residuals from prescribed limits: positive values indicate wrong values.

Notes

The spatial weighted regression test is based on the algorithm proposed by [Hubbard2005]. First, for each reference station \((0)\), the neighbour stations \((n)\) within search_radius are found. This search radius should be set close to the average spacing of the observations and large enough to have at least one neighbour for each station.

Once the neighbours have been established, if the percentage of missing values in each series is lower than missing_perc and n \(\ge\) n_max, a linear regression is computed between their values \(x(t,n)\) over all the selected time steps t_max and the reference station values \(x(t,0)\), in order to find a first estimate \(x^*_n(0)\) of \(x(0)\) that should be consistent with \(x(n)\) at each time step.

Then the root mean square error between the reference values \(x(0)\) and the values \(x^*_n(0)\) estimated from the regression line with neighbour station \(n\) (corresponding to the sample standard deviation of the residuals \(\sigma^*_n\)) is evaluated as a measure of the correlation between the two stations:

\[\sigma^*_n(0)=\sqrt{\frac{1}{t_{max}}\sum_{t=1}^{t_{max}} \big[\underbrace{x(t,0)-x^*_n(t,0)}_{\text{Residuals}}\big]^2}\]

This error characterizing each neighbour station is used as weight in the final estimate of the reference value \(x^*(t,0)\) and reference standard deviation \(\sigma^*(0)\) from the surrounding stations at each instant t:

\[\begin{split}x^*(t,0)=\frac{\sum_{n=1}^{n_{max}} x^*_n(t,0)/(\sigma^*_n)^2}{\sum_{n=1}^{n_{max}} 1/(\sigma^*_n)^2} \\ \sigma^{*2}(0)= \frac{n_{max}}{\sum_{n=1}^{n_{max}} 1/(\sigma^*_n)^2}\end{split}\]

Finally, at each time step, a tolerance interval is established using the constant factor f, and the spatial consistency is verified by ensuring that:

\[x^*(t,0)-f\sigma^*(0) < x(t,0) < x^*(t,0)+f\sigma^*(0)\]

This procedure is repeated for each station and each time step. The final estimates \(x^*(t,n)\) and reference standard deviations \(\sigma^*(n)\) are returned in the output dataframes df_x_est and df_std_est, respectively.
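The three steps of these notes (per-neighbour regression, weighted combination, tolerance check) can be sketched for a single station with NumPy. The function names are hypothetical, the best-fit neighbours are assumed to be those with the smallest \(\sigma^*_n\), and the combination weights each neighbour estimate by \(1/(\sigma^*_n)^2\), i.e. an inverse-variance weighted mean. This is not the metpyqc implementation.

```python
import numpy as np


def fit_neighbour(y, x_n):
    """Regress reference series y on neighbour series x_n over the t_max window."""
    slope, intercept = np.polyfit(x_n, y, 1)
    fitted = intercept + slope * x_n              # x*_n(t, 0)
    sigma = np.sqrt(np.mean((y - fitted) ** 2))   # RMSE with 1/t_max
    return fitted, sigma


def combine(est, sigma, n_max):
    """Inverse-variance weighted estimate from the n_max best-fitting neighbours."""
    est, sigma = np.asarray(est, dtype=float), np.asarray(sigma, dtype=float)
    best = np.argsort(sigma)[:n_max]              # smallest RMSE = best fit (assumed)
    w = 1.0 / sigma[best] ** 2
    x_star = np.sum(w * est[best]) / np.sum(w)
    # len(best) equals n_max unless fewer neighbours are available.
    sigma_star = np.sqrt(len(best) / np.sum(w))
    return x_star, sigma_star


def is_consistent(x_obs, x_star, sigma_star, f):
    """Tolerance interval x* - f*sigma* < x < x* + f*sigma*."""
    return x_star - f * sigma_star < x_obs < x_star + f * sigma_star
```

In a full test these calls would be repeated for every station and time step, with the neighbour set obtained from a radius search such as find_neighbors.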

References

Hubbard2005

Hubbard, K. G., et al. “Performance of quality assurance procedures for an applied climate information system.” Journal of Atmospheric and Oceanic Technology 22.1 (2005): 105-112.

metpyqc.reconstruct

Reconstruction techniques for data gap-filling

metpyqc.calculate

Useful routines for quality control tests

metpyqc.calculate.dewpoint(temperature, relative_humidity)

Calculate dewpoint temperature from temperature and relative humidity.

Parameters
  • temperature (pd.DataFrame) – Temperature in degrees Celsius

  • relative_humidity (pd.DataFrame) – Relative humidity in percentage

Returns

dew – Dewpoint temperature in degrees Celsius

Return type

pd.DataFrame

Notes

The formula used to calculate dewpoint temperature is obtained by inverting Bolton’s formula [Emanuel1994]:

\[T_{d} \simeq \frac{243.5}{\frac{17.67}{\ln(e/6.112)} - 1}\]

where \(e = (RH/100)\,e^{*}\) is the actual vapor pressure in hPa, RH is the relative humidity in percent, and \(e^{*}\) is the saturation vapor pressure in hPa at temperature T (in degrees Celsius), as obtained from the [Bolton1980] formula:

\[e^{*} = 6.112 \exp\left(\frac{17.67\,T}{T + 243.5}\right)\]
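A direct NumPy transcription of the two formulas above, written as a sketch (the name `dewpoint_sketch` is hypothetical). It accepts scalars, arrays or DataFrames, since `np.exp` and `np.log` apply element-wise, and it takes RH in percent.

```python
import numpy as np


def dewpoint_sketch(temperature, relative_humidity):
    """Dewpoint (deg C) from temperature (deg C) and relative humidity (%)."""
    # Saturation vapour pressure e* (hPa) from Bolton's formula.
    e_sat = 6.112 * np.exp(17.67 * temperature / (temperature + 243.5))
    # Actual vapour pressure e = (RH / 100) * e*.
    e = relative_humidity / 100.0 * e_sat
    # Inverted formula: Td = 243.5 / (17.67 / ln(e / 6.112) - 1).
    return 243.5 / (17.67 / np.log(e / 6.112) - 1.0)


td = dewpoint_sketch(np.array([10.0, 25.0]), np.array([80.0, 30.0]))
```

At RH = 100 % the inversion returns the input temperature, which is a convenient sanity check.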

References

Emanuel1994

Emanuel, Kerry A. Atmospheric convection. Oxford University Press on Demand, 1994.

Bolton1980

Bolton, David. “The computation of equivalent potential temperature.” Monthly Weather Review 108.7 (1980): 1046-1053.

metpyqc.calculate.find_neighbors(points, xi, r)

Find neighboring points inside a search radius.

Parameters
  • points (array_like, shape(N,2)) – (lat, lon) of the surrounding points in decimal degrees.

  • xi (array_like, shape(1,2)) – (lat, lon) of the center of the search ball in decimal degrees.

  • r (float) – Search radius in decimal degrees

Returns

indices – Indices of the neighboring points inside the search radius

Return type

array_like
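A sketch of a radius search using the planar Euclidean distance in decimal degrees. This metric is an assumption: it ignores the shrinking of longitude degrees with latitude. `find_neighbors_sketch` is hypothetical; `scipy.spatial.cKDTree.query_ball_point` performs the same kind of ball query on a prebuilt tree.

```python
import numpy as np


def find_neighbors_sketch(points, xi, r):
    """Indices of points within r decimal degrees of xi (planar distance)."""
    pts = np.asarray(points, dtype=float).reshape(-1, 2)
    center = np.asarray(xi, dtype=float).reshape(1, 2)
    dist = np.sqrt(((pts - center) ** 2).sum(axis=1))  # Euclidean distance in degrees
    return np.flatnonzero(dist <= r)  # boundary included (assumption)


idx = find_neighbors_sketch([[45.0, 9.0], [45.5, 9.0], [47.0, 9.0]], [45.0, 9.0], 1.0)
```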