Anastasiadis Stavros 10Oct10
Statistical modeling of extreme rainfall event (Case Study: Naxos Meteorological Station)
Abstract: The modeling of extreme rainfall events is a fundamental part of flood hazard estimation. Most extreme hydrological events cause severe human and material damage. Estimation of extreme rainfall values is also of prime importance in engineering practice for water resources and risk assessment. This report introduces a stepwise procedure for estimating quantiles of extreme rainfall events using classical extreme value theory, based on the Generalized Extreme Value (GEV) distribution and on the Generalized Pareto Distribution (GPD) applied through the Peak Over Threshold (POT) method. The aim of this methodology is the estimation of the extreme event x_T of return period T, based on N years of available rainfall records from the Naxos Meteorological Station.
Procedure of Case study, Station Naxos Daily Rainfall Records (1955-2003):
· Datasets: three datasets are used for the analysis
    - Daily Rainfall record of Naxos for the period 1955-2003 (N=17897)
    - Annual Max record of the Daily Dataset (N=49)
    - Monthly Max record of the Daily Dataset (N=588)
  o Descriptive statistics for the three datasets
  o Distribution fits of the datasets
    - Histograms with 15 bins for the three datasets
    - Plots with Normal fit, cumulative probability, empirical distribution
    - QQ plots of the datasets for Normal, Exponential, Weibull, GEV, GPD, EV
· GEV approach
  o GEV parameter estimation with the Maximum Likelihood estimator for the Annual and Monthly Max records using Daily Rainfall Data (table 2)
  o Quantile estimation using Eq. 17 for the Annual and Monthly Max records using Daily Rainfall Data (table 3)
· Peak Over Threshold method
  o Threshold selection (2 methods)
    - Sample Mean Excess plot for the three datasets
    - Hill plot for the three datasets
  o Creation of the dataset: Daily Rainfall records over threshold (u) (u=30 mm, N=71) (figure 20)
  o Descriptive statistics for Daily Rainfall records over threshold (u)
  o Annual numbers of data over threshold (table 4)
    - Histogram of the annual numbers of data over threshold (5 bins) (figure 19)
  o Distribution fits of Daily Rainfall records over threshold (u)
    - Histograms with 15 bins
    - Plots with Normal fit, cumulative probability, empirical distribution
    - QQ plots for Normal, Exponential, Weibull, GEV, GPD, EV
  o GPD parameter estimation for Daily Rainfall records over threshold (u) (table 5)
    - Pickands' estimator
    - Probability Weighted Moments
    - Moment method
    - Maximum Likelihood method
  o Quantile estimation using Eq. 11 and Eq. 18 for Daily Rainfall records over threshold (u)
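As an illustration of how the Annual Max and Monthly Max records can be derived from the daily record, the following Python sketch groups daily values by year or by (year, month) and keeps the largest value in each block. The function names and the small sample are hypothetical and are not the Naxos data; the daily record is assumed to be a list of (date, rainfall in mm) pairs.

```python
from collections import defaultdict
from datetime import date


def block_maxima(records, key):
    """Return {block: largest rainfall in that block}.

    records: iterable of (date, rainfall_mm) pairs (assumed layout).
    key: function mapping a date to its block label.
    """
    # Rainfall is non-negative, so 0.0 is a safe starting value per block.
    maxima = defaultdict(float)
    for day, rain_mm in records:
        block = key(day)
        maxima[block] = max(maxima[block], rain_mm)
    return dict(maxima)


def annual_maxima(records):
    return block_maxima(records, key=lambda d: d.year)


def monthly_maxima(records):
    return block_maxima(records, key=lambda d: (d.year, d.month))


# Made-up sample values, for illustration only.
sample = [
    (date(1955, 1, 3), 12.4),
    (date(1955, 1, 17), 30.1),
    (date(1955, 2, 9), 0.0),
    (date(1956, 11, 21), 45.0),
    (date(1956, 11, 22), 8.2),
]
annual = annual_maxima(sample)
monthly = monthly_maxima(sample)
print(annual)
print(monthly)
```

A month with no rain still contributes a block maximum of 0, which is consistent with the minimum of 0 reported for the Monthly Max dataset.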
Hydrological Definitions:

Frequency Analysis provides a systematic approach for using historical data to relate the magnitude of a naturally occurring event to the probability of its occurring in a given time period, or to its recurrence interval. Frequency analysis is typically included in the study of hydrology.

Return Period (T): The average length of time, in years, between exceedances of an event (e.g. flood or river level) of given magnitude. For example, if rainfall with a 50 year return period at a given location is 120 mm, this is just another way of saying that a rainfall of 120 mm or greater should occur at that location on average only once every 50 years.

Probability of Occurrence (p) (of an event of specified magnitude): The probability that an event of the specified magnitude will be equalled or exceeded during a one year period. If N is the total number of values and m is the rank of a value in a list ordered by descending magnitude (x1 > x2 > x3 > ... > xN), the exceedance probability of the mth largest value, xm, is

$$ P(X \ge x_m) = \frac{m}{N} \qquad (1) $$

Equation (1) is applied successfully to closed time series, but in hydrology time series are usually open. For open series, the Probability of Occurrence (p) is calculated by the equations in the following table.
Table 1 : Empirical Equation for Probability of Occurrence
Probability of Nonoccurrence (q) (of an event of specified magnitude): The probability that an event of the specified magnitude will not be equalled or exceeded during a one year period.

Probability of Occurrence within a period of n years (pn): The probability that an event of specified magnitude will be equalled or exceeded within a period of n years.

Probability of Nonoccurrence within a period of n years (qn): The probability that an event of specified magnitude will not be equalled or exceeded within a period of n years.

A fundamental relationship is that between return period (T) and probability of occurrence (p). These two variables are inversely related to each other:

$$ T = \frac{1}{p}, \qquad p = \frac{1}{T} \qquad (2) $$

For example, the probability of a 50 year storm occurring in a one year period is 1/50 or 0.02.
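The rank-based probability of occurrence and its return period can be sketched in Python as follows. The "closed" option uses p = m/N; the "weibull" option uses the Weibull plotting position m/(N+1), which is a commonly used formula for open series and is an assumption here, since the equations of Table 1 are not reproduced in this text. The function name and the sample values are hypothetical.

```python
def exceedance_table(values, plotting="closed"):
    """Rank values in descending order and attach p and T = 1/p.

    Returns a list of (rank m, value, probability p, return period T).
    Ties are ranked in sorted order; the text assumes distinct values.
    """
    ranked = sorted(values, reverse=True)
    n = len(ranked)
    rows = []
    for m, x in enumerate(ranked, start=1):
        if plotting == "closed":
            p = m / n          # exceedance probability for a closed series
        elif plotting == "weibull":
            p = m / (n + 1)    # assumed open-series formula (Weibull)
        else:
            raise ValueError("unknown plotting position: " + plotting)
        rows.append((m, x, p, 1.0 / p))   # return period is the inverse of p
    return rows


# Made-up annual maxima in mm, for illustration only.
for row in exceedance_table([40.0, 98.8, 21.0, 55.5], plotting="weibull"):
    print(row)
```

With the closed formula the smallest value always receives p = 1, which is one reason open-series formulas are preferred for hydrological records.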
The probability of occurrence and probability of nonoccurrence are related by the fact that something must either occur or not occur, so p + q = 1 and pn + qn = 1. From basic probability theory, assuming independent years, qn = q^n. Substituting to get an equation relating pn and p: 1 - pn = (1 - p)^n. This can be rearranged to:

$$ p_n = 1 - (1 - p)^n \qquad (3) $$

From (2) and (3), the Probability of Occurrence in n years is

$$ p_n = 1 - \left(1 - \frac{1}{T}\right)^n \qquad (4) $$

Statistical Method of Probability Estimations of extreme events:
There are several approaches to simulate the frequency of extreme events, and to reflect stochastic volatility and leptokurtosis of the return distributions (Goldstein & others):
(1) The parametric method is based upon fitting some particular distribution to a set of observed or simulated returns.
(2) The historical non-parametric method evaluates an appropriate return period histogram. It does not take into consideration events beyond the sample range and does not indicate the tail form.
(3) Stochastic (Monte Carlo) methods generate repeated situations that simulate returns based on random draws from stochastic projections.
(4) The extreme value theory approach is designed specifically for tail estimation, for recognition and modeling of leptokurtic distributions, for dealing with nonstationary distributions and for the determination of current volatility.

Extreme Value Theory:
Extreme Value (EV) theory forms the theoretical stochastic framework for estimation of extreme quantiles. It is a powerful and yet fairly robust framework for studying the tail behavior of a distribution. According to the Fisher-Tippett theorem, the block maxima of a sequence of independent and identically distributed (iid) random variables follow, in the limit, a Generalized Extreme Value (GEV) distribution. A parallel result states that the excesses over a high threshold follow a Generalized Pareto distribution (GPD), which is the basis of the Peak Over Threshold method.

Hydrological time series do not necessarily fulfil the basic requirements of EV theory (Engeland & others, 2004), as they are the result of a complex dynamic physical process (for example, daily stream data and their extremes are often autocorrelated and not identically distributed). EV theory has, however, been shown to provide reasonable approximations in many hydrological cases.
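The relation between return period and the probability of occurrence within n years, given earlier in this section, can be checked numerically. The sketch below assumes independent years; the function names are hypothetical.

```python
def annual_probability(return_period_years):
    """Probability of occurrence in one year, p = 1/T."""
    return 1.0 / return_period_years


def probability_within(n_years, return_period_years):
    """Probability of at least one occurrence in n years: 1 - (1 - 1/T)^n."""
    p = annual_probability(return_period_years)
    return 1.0 - (1.0 - p) ** n_years


p = annual_probability(50)          # the 50 year storm of the text
p30 = probability_within(30, 50)    # chance of at least one such storm in 30 years
p50 = probability_within(50, 50)    # chance of at least one such storm in 50 years
print(p, round(p30, 4), round(p50, 4))
```

For the 50 year storm the probability of at least one occurrence is roughly 0.45 within 30 years and roughly 0.64 within 50 years, so a "50 year event" is far from certain to occur in any given 50 year window.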
Figure 1 Maximum and minimum events at desired probability level
Generalized Extreme Value Distribution
The normal distribution is the important limiting distribution for sample sums or averages, as summarized in the central limit theorem. Similarly, the family of extreme value distributions is used to study the limiting distributions of sample maxima. This family can be presented under a single parameterization known as the generalized extreme value (GEV) distribution. The theorem of Fisher and Tippett (1928) is at the core of extreme value theory, which deals with the convergence of maxima. Suppose that x1, x2, ..., xm is a sequence of independent and identically distributed random variables from an unknown distribution function F(x), where m is the sample size. Denote the maximum of the first n < m observations by Mn = max(x1, x2, ..., xn). If there exist sequences an > 0 and bn such that the normalized maxima (Mn - bn)/an converge in distribution to a non-degenerate limit, that limit is the following GEV distribution:

$$ F(x;\mu,\sigma,\xi) = \begin{cases} \exp\left\{-\left[1 - \xi\,\dfrac{x-\mu}{\sigma}\right]^{1/\xi}\right\}, & \xi \ne 0 \\[2ex] \exp\left\{-\exp\left[-\dfrac{x-\mu}{\sigma}\right]\right\}, & \xi = 0 \end{cases} \qquad (5) $$

where ξ, σ and μ are the shape, scale and location parameters, respectively, and x is the maximum of an epoch (in hydrology the epoch is usually a year). If ξ = 0, Eq. (5) is the Gumbel distribution, with a light upper tail and positive skew. If ξ < 0, Eq. (5) is the Frechet distribution, with a heavy upper tail and infinite higher order moments. If ξ > 0, Eq. (5) is the Weibull distribution, with a bounded upper tail.
Figure 2 Extreme Value distribution forms
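A minimal Python sketch of the GEV distribution function, written in the sign convention stated in the text (negative shape for the heavy-tailed Frechet form, positive shape for the bounded Weibull form), is given below. The function name and the parameter values are hypothetical. Many software packages use the opposite sign for the shape parameter, so the convention should be checked before fitted values are plugged in.

```python
import math


def gev_cdf(x, shape, scale, loc):
    """GEV distribution function with shape < 0 heavy-tailed, shape > 0 bounded."""
    z = (x - loc) / scale
    if shape == 0.0:
        return math.exp(-math.exp(-z))        # Gumbel case
    t = 1.0 - shape * z
    if t <= 0.0:
        # Outside the support: above the upper bound when shape > 0,
        # below the lower bound when shape < 0.
        return 1.0 if shape > 0.0 else 0.0
    return math.exp(-t ** (1.0 / shape))


# Hypothetical parameters, for illustration only.
for shape in (-0.2, 0.0, 0.2):
    print(shape, round(gev_cdf(60.0, shape, scale=10.0, loc=35.0), 4))
```

At x equal to the location parameter the function returns exp(-1) for every shape, which is a quick sanity check on an implementation.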
Generalized Pareto distribution (GPD) and Peak over Threshold method (POT)
The GEV approach, as mentioned earlier, focuses only on the maximum of an epoch. An alternative approach, often referred to as the peaks over threshold (POT) approach, is to consider all values greater than a given threshold value. Given a threshold u, the distribution of excess values of x over u is defined by:

$$ F_u(y) = P(X - u \le y \mid X > u) = \frac{F(y+u) - F(u)}{1 - F(u)} \qquad (6) $$

which represents the probability that the value of x exceeds u by at most an amount y, where y = x - u. A theorem by Balkema and de Haan (1974) and Pickands (1975) shows that, for a sufficiently high threshold u, the distribution function of the excess may be approximated by the generalized Pareto distribution (GPD): as the threshold gets large, the excess distribution Fu(y) converges to the GPD, which is

$$ G(x;\xi,\sigma,\mu) = \begin{cases} 1 - \left[1 - \xi\,\dfrac{x-\mu}{\sigma}\right]^{1/\xi}, & \xi \ne 0 \\[2ex] 1 - \exp\left[-\dfrac{x-\mu}{\sigma}\right], & \xi = 0 \end{cases} \qquad (7) $$

where x are the excesses, σ is a scale parameter, μ is a location parameter and ξ is a shape parameter, which is also called the tail index or extreme value index. If ξ = 0, Eq. (7) is the Exponential distribution (medium-size tail). If ξ < 0, Eq. (7) is the ordinary Pareto distribution (long tailed). If ξ > 0, Eq. (7) is the Pareto II type distribution (short tailed).

Figure 3 Generalized Pareto Forms

The POT method involves three steps: the first step is to choose an appropriate threshold; the second step is to estimate the distribution parameters; the last step is to estimate extreme quantiles.
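The first POT step produces the sample of excesses, and the GPD is then evaluated on those excesses. A minimal Python sketch is given below, using the sign convention of the text (negative shape for the long-tailed case) and a location of zero because the function is applied to excesses. The function names and the numbers are hypothetical.

```python
import math


def excesses_over(values, u):
    """Excesses y = x - u for all observations x strictly above the threshold u."""
    return [x - u for x in values if x > u]


def gpd_cdf(y, shape, scale):
    """GPD distribution function of an excess y (location 0)."""
    if y <= 0.0:
        return 0.0
    if shape == 0.0:
        return 1.0 - math.exp(-y / scale)     # exponential case
    t = 1.0 - shape * y / scale
    if t <= 0.0:
        return 1.0                            # beyond the upper bound (shape > 0)
    return 1.0 - t ** (1.0 / shape)


# Made-up daily rainfall values in mm, for illustration only.
daily = [0.0, 12.0, 30.0, 30.1, 41.5, 98.8]
ys = excesses_over(daily, u=30.0)
print(len(ys), [round(y, 1) for y in ys])
print(round(gpd_cdf(20.0, shape=-0.1, scale=12.0), 4))
```

Values exactly equal to the threshold are not counted as exceedances in this sketch; whether to include them is a convention that should match the way the over-threshold dataset was built.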
Threshold selection
The choice of threshold is an important practical problem, which is mainly based on a compromise between bias and variance. The threshold must be high enough for the excesses over the threshold to follow the GPD, but at the same time the sample of exceedances should be large enough.

· Hill plot threshold selection
Hill (1975) proposed the Hill estimator of the tail index as follows:

$$ \hat{H}_{k,N} = \frac{1}{k} \sum_{i=1}^{k} \left( \log X_{i,N} - \log X_{k+1,N} \right) \qquad (8) $$

where X_{1,N} >= X_{2,N} >= ... >= X_{N,N} are the observations in descending order, k is the number of exceedances and N is the sample size. A Hill plot is constructed such that the tail index estimated through Eq. (8) is plotted as a function of either k or the threshold. A threshold is then selected from the region of the plot where the tail index is fairly stable.

· Sample Mean Excess threshold selection
For a claim amount random variable X, the mean excess function (or mean residual life function) is the expected payment per claim on a policy with a fixed deductible u, where claims with amounts less than or equal to u are completely ignored:

$$ e(u) = E[X - u \mid X > u] = \frac{\int_u^{\infty} [1 - F(x)]\,dx}{1 - F(u)} \qquad (9) $$

In practice, the mean excess function e(u) is estimated by ê(u), based on a representative sample x1, x2, ..., xn:

$$ \hat{e}(u) = \frac{\sum_{i:\,x_i > u} (x_i - u)}{\#\{i : x_i > u\}} \qquad (10) $$

Extreme Quantile estimation for Return Period T (POT) (Paper: Uncertainty in statistical modeling of extreme hydrological events):
For a given return period T, the extreme quantile xT at the tail is estimated as follows:

$$ x_T = u + \frac{\sigma}{\xi}\left[ 1 - \left( \frac{N}{N_u}\,\frac{1}{T} \right)^{\xi} \right] \qquad (11) $$

where N is the sample size, Nu is the number of exceedances, ξ is the shape parameter, σ is the scale parameter and u is the threshold.

Extreme Quantile estimation for Return Period T (Paper: Statistical Modeling of Severe Wind Gust):
From (6) and the fact that Fu(y) converges to the GPD G when u is large, we have the following equation for x > u:

$$ F(x) = [1 - F(u)]\,G(x - u;\xi,\sigma,0) + F(u) \qquad (12) $$
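Given a threshold u, F(u) can be estimated by Fn(u) = (n - Nu)/n, where n is the sample size and Nu is the number of exceedances. In the case of ξ ≠ 0, (12) can be simplified to

$$ F(x) = 1 - \frac{N_u}{n}\left[ 1 - \xi\,\frac{x-u}{\sigma} \right]^{1/\xi} \qquad (13) $$

It can be seen that (13) is also a GPD with parameters (ξ, σ', μ'), where

$$ \sigma' = \sigma \left( \frac{N_u}{n} \right)^{-\xi} \qquad (14) $$

$$ \mu' = u + \frac{\sigma}{\xi}\left[ 1 - \left( \frac{N_u}{n} \right)^{-\xi} \right] \qquad (15) $$

and ξ and σ are the GPD parameters fitted to x - u, where x > u, using the POT method.

When the threshold is chosen sufficiently large, it is assumed that the number of exceedances Nu (where u is the threshold) has an approximate Poisson distribution with parameter λ (the rate of exceedances per year, also called the crossing rate). Hence λT is the expected number of exceedances in T years. Let xT be the level that is exceeded on average once in T years. Then

$$ x_T = F^{-1}\left( 1 - \frac{1}{\lambda T} \right) \qquad (16) $$

where F^{-1} is the inverse of the CDF of the GPD (or the GEV). For the GPD, the crossing rate λ can be estimated by Nu/Tdata, where Tdata is the number of years for which data have been recorded. For the GEV, the crossing rate has the value 1 if the yearly maximum is used or 12 if the monthly maximum is used. The quantile estimate for the GEV can be obtained by inverting (5) and using (16):

$$ x_T = \begin{cases} \mu + \dfrac{\sigma}{\xi}\left\{ 1 - \left[ -\ln\left( 1 - \dfrac{1}{\lambda T} \right) \right]^{\xi} \right\}, & \xi \ne 0 \\[2ex] \mu - \sigma \ln\left[ -\ln\left( 1 - \dfrac{1}{\lambda T} \right) \right], & \xi = 0 \end{cases} \qquad (17) $$

For the GPD, the quantile estimate is obtained by inverting (7) with the parameters (ξ, σ', μ') and using (16):

$$ x_T = \begin{cases} \mu' + \dfrac{\sigma'}{\xi}\left[ 1 - (\lambda T)^{-\xi} \right], & \xi \ne 0 \\[2ex] \mu' + \sigma' \ln(\lambda T), & \xi = 0 \end{cases} \qquad (18) $$

where σ' and μ' are given in (14) and (15).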
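As a hedged illustration of the return level of Eq. (16) for the GPD, the Python sketch below inverts the GPD fitted to the excesses, with the threshold u as location and the crossing rate estimated as Nu/Tdata. It does not use the rescaled parameters of (14) and (15). The counts Nu = 71 and Tdata = 49 years are taken from this report; the shape and scale values are hypothetical placeholders, and the sign convention is the one stated in the text (negative shape for the long-tailed case).

```python
import math


def gpd_inverse_cdf(p, shape, scale, loc=0.0):
    """Value x at which the GPD distribution function equals p (0 <= p < 1)."""
    if shape == 0.0:
        return loc - scale * math.log(1.0 - p)
    return loc + (scale / shape) * (1.0 - (1.0 - p) ** shape)


def gpd_return_level(T, shape, scale, u, n_exceed, years_of_data):
    """Level exceeded on average once in T years, with rate = Nu / Tdata."""
    rate = n_exceed / years_of_data
    if rate * T <= 1.0:
        raise ValueError("rate * T must exceed 1 for a level above the threshold")
    return gpd_inverse_cdf(1.0 - 1.0 / (rate * T), shape, scale, loc=u)


# Nu and Tdata from the report; shape and scale are hypothetical placeholders.
u, n_exceed, years = 30.0, 71, 49
shape, scale = -0.1, 12.0
for T in (5, 10, 50, 100):
    print(T, round(gpd_return_level(T, shape, scale, u, n_exceed, years), 2))
```

By construction every returned level lies above the threshold and increases with T, which gives a simple plausibility check when comparing estimators.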
Station Naxos Daily Rainfall Records (1955-2003):

Descriptive statistics of Daily Rainfall Naxos Dataset (N=17897), values in mm
Min    Max     Mean     Median   Mode   Std      Range
0      98.8    1.003    0        0      4.1986   98.8

Figure 4 Naxos Dataset Daily Rainfall 1955-2003
Descriptive statistics of Annual Max Rainfall Naxos Dataset (N=49), values in mm
Min    Max     Mean     Median   Mode   Std      Range
21     98.8    42.508   40       40     17.945   77.8
Figure 5 Annual Max Observed Rainfall Value
Figure 6 Histogram of Annual Max Rainfall
Bin   Count   Center
1     8       23.59
2     7       28.78
3     8       33.96
4     6       39.15
5     4       44.34
6     2       49.52
7     7       54.71
8     1       59.90
9     1       65.08
10    2       70.27
11    1       75.46
12    0       80.64
13    0       85.83
14    0       91.02
15    2       96.20
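The bin centers listed for the Annual Max histogram are consistent with 15 equal-width bins spanning the minimum (21 mm) to the maximum (98.8 mm) of the dataset. The Python sketch below shows this construction; the function names are hypothetical and the binning rule (maximum value placed in the last bin) is an assumption about how the histogram was built.

```python
def bin_centers(lo, hi, n_bins):
    """Centers of n_bins equal-width bins covering [lo, hi]."""
    width = (hi - lo) / n_bins
    return [lo + (i + 0.5) * width for i in range(n_bins)]


def histogram(values, n_bins):
    """Counts and centers for equal-width bins between min and max of values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins          # assumes hi > lo
    counts = [0] * n_bins
    for x in values:
        # The maximum would fall on the right edge, so clamp it into the last bin.
        i = min(int((x - lo) / width), n_bins - 1)
        counts[i] += 1
    return counts, bin_centers(lo, hi, n_bins)


# Min and max of the Annual Max dataset, taken from the descriptive statistics.
centers = bin_centers(21.0, 98.8, 15)
print([round(c, 2) for c in centers])
```

The same construction with a minimum of 0 and a maximum of 98.8 gives a first center of about 3.29, in line with the Month Max histogram.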
Figure 7 SubPlots Annual Max Rainfall
Figure 8 QQplots Of Annual Max Rainfall
Descriptive statistics of Month Max Rainfall Naxos Dataset (N=588), values in mm
Min    Max     Mean     Median   Mode   Std      Range
0      98.8    12.555   7.75     0      14.886   98.8
Figure 9 Month Max Rainfall Value
Figure 10 Histogram of Month Max Rainfall
Bin   Count   Center
1     275     3.2933
2     83      9.88
3     78      16.467
4     68      23.053
5     35      29.64
6     15      36.227
7     14      42.813
8     6       49.4
9     5       55.987
10    3       62.573
11    2       69.16
12    2       75.747
13    0       82.333
14    0       88.92
15    2       95.507
Figure 11 SubPlots Month Max Rainfall
Figure 12 QQplots Of Month Max Rainfall
GEV Parameter Estimation
                 Annual Max Rainfall Data              Month Max Rainfall Data
Parameter        MLE        95% CI                     MLE       95% CI
Shape (ξ)        0.24011    [-0.081931, 0.56215]       5.292     [NaN, NaN]
Scale (σ)        11.024     [8.3183, 14.609]           5.3604    [NaN, NaN]
Location (μ)     33.239     [29.489, 36.989]           1.0129    [NaN, NaN]
Table 2 GEV Parameter Estimation with Maximum Likelihood estimator
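As a worked check, the GEV quantile obtained by inverting the distribution function with the crossing rate λ (1 for annual maxima, 12 for monthly maxima) can be evaluated with the maximum likelihood estimates listed above. The Python sketch below uses the sign convention of the text and a hypothetical function name. With the annual estimates it gives values that agree with the Eq. 17 column of Table 3 (for example about 47.12 for T = 5 and about 70.41 for T = 1000). Calling the same function with the sign of the shape reversed gives about 53.14 for T = 5, which matches the Eq. 16 column; this is an observation from the numbers, not a statement made in the report.

```python
import math


def gev_quantile(T, shape, scale, loc, rate=1.0):
    """Level exceeded on average once in T years for block maxima.

    rate is the number of blocks per year (1 for annual, 12 for monthly maxima).
    """
    if rate * T <= 1.0:
        raise ValueError("rate * T must exceed 1")
    p = 1.0 - 1.0 / (rate * T)        # non-exceedance probability per block
    y = -math.log(p)
    if shape == 0.0:
        return loc - scale * math.log(y)              # Gumbel case
    return loc + (scale / shape) * (1.0 - y ** shape)


# Maximum likelihood estimates for the Annual Max data (Table 2).
shape, scale, loc = 0.24011, 11.024, 33.239
for T in (5, 10, 20, 50, 100, 1000):
    print(T, round(gev_quantile(T, shape, scale, loc, rate=1.0), 3))
```

Because the two columns of Table 3 differ only in the sign of the shape parameter, it is worth confirming which convention the fitting software uses before interpreting either column.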
Quantile Estimation (Eq. 16, Eq. 17)
         Annual Max Rainfall Data (λ=1)     Month Max Rainfall Data (λ=12)
T        Eq. 16      Eq. 17                 Eq. 16        Eq. 17
5        53.144      47.124                 2.49*10^9     2.0258
10       66.138      52.405                 9.97*10^10    2.0258
20       81.008      56.65                  3.95*10^12    2.0258
50       104.5       61.16                  5.07*10^14    2.0258
100      125.88      63.937                 1.99*10^16    2.0258
1000     228.43      70.408                 3.91*10^21    2.0258
Table 3 Quantile Estimation with GEV (Eq. 16, Eq. 17)

Threshold Selection
Figure 13 Sample Mean Excess Plot with Daily Rainfall Dataset
Figure 14 Sample Mean Excess Plot with Annual Max Rainfall Dataset
Figure 15 Sample Mean Excess Plot with Month Max Rainfall Dataset
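The points of a sample mean excess plot can be computed as sketched below in Python: for each candidate threshold u, the excesses of the observations above u are averaged. The function names and the sample values are hypothetical.

```python
def mean_excess(values, u):
    """Average of (x - u) over the observations x strictly above u."""
    excess = [x - u for x in values if x > u]
    if not excess:
        raise ValueError("no observations exceed the threshold")
    return sum(excess) / len(excess)


def mean_excess_curve(values, thresholds):
    """(u, mean excess at u) pairs for plotting."""
    return [(u, mean_excess(values, u)) for u in thresholds]


# Made-up rainfall values in mm, for illustration only.
sample = [10.0, 20.0, 30.0, 40.0]
for u, e in mean_excess_curve(sample, thresholds=[5.0, 15.0, 25.0, 35.0]):
    print(u, e)
```

For a GPD the mean excess is linear in u, so a threshold is usually read off where the plotted points start to follow an approximately straight line.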
Figure 16 Hill plot with Daily Rainfall Dataset (95% CI)
Figure 17 Hill plot with Annual Max Rainfall Dataset (95% CI)
Figure 18 Hill plot with Month Max Rainfall Dataset (95% CI)
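The points of a Hill plot can be sketched in Python as follows, taking the Hill estimator as the average log-spacing between the k largest observations and the (k+1)-th largest. The function names and sample values are hypothetical, and the confidence bands of the figures are not computed. The estimator needs strictly positive order statistics, so zero-rainfall days must fall outside the range of k that is used.

```python
import math


def hill_estimator(values, k):
    """Hill estimate from the k largest values, relative to the (k+1)-th largest."""
    ordered = sorted(values, reverse=True)
    if not 1 <= k < len(ordered):
        raise ValueError("k must satisfy 1 <= k < sample size")
    if ordered[k] <= 0.0:
        raise ValueError("order statistics must be positive for the logarithm")
    log_ref = math.log(ordered[k])      # (k+1)-th largest value
    return sum(math.log(x) - log_ref for x in ordered[:k]) / k


def hill_plot_points(values, k_max):
    """(k, Hill estimate) pairs for k = 1..k_max."""
    return [(k, hill_estimator(values, k)) for k in range(1, k_max + 1)]


# Made-up positive values, for illustration only.
sample = [98.8, 61.0, 45.2, 40.0, 38.5, 33.1, 30.4, 12.0]
for k, h in hill_plot_points(sample, k_max=6):
    print(k, round(h, 4))
```

A threshold is then read from the range of k over which the estimates remain roughly constant.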
A threshold of 30 mm is selected from the above plots.

Annual Number of Data over Threshold (30 mm)
Year  No.    Year  No.    Year  No.    Year  No.    Year  No.    Year  No.    Year  No.
1955  1      1963  1      1971  1      1979  3      1987  1      1995  1      2003  4
1956  2      1964  0      1972  3      1980  2      1988  1      1996  0
1957  4      1965  0      1973  2      1981  3      1989  2      1997  1
1958  0      1966  0      1974  3      1982  2      1990  1      1998  2
1959  1      1967  0      1975  2      1983  0      1991  1      1999  2
1960  1      1968  0      1976  5      1984  1      1992  2      2000  0
1961  3      1969  0      1977  0      1985  5      1993  2      2001  0
1962  0      1970  2      1978  4      1986  3      1994  4      2002  3
Table 4 Annual Numbers of Data over Threshold (30mm)
Figure 19 Histogram of Number of Daily Rainfall Data over Threshold (30mm 10 bins)
Descriptive statistics of Daily Rainfall over Threshold (30 mm) Naxos Dataset (N=71), values in mm
Min    Max     Mean     Median   Mode   Std      Range
30.1   98.8    43.573   38       40     14.647   68.7
Figure 20 Daily Rainfall Data over Threshold (30mm)
Figure 21 Histogram Rainfall Over Threshold (30mm)
Bin   Count   Center
1     24      32.39
2     12      36.97
3     11      41.55
4     5       46.13
5     5       50.71
6     5       55.29
7     2       59.87
8     1       64.45
9     2       69.03
10    2       73.61
11    0       78.19
12    0       82.77
13    0       87.35
14    0       91.93
15    2       96.51
Figure 22 SubPlots Rainfall Data over Threshold (30 mm)
Figure 23 QQ plot with Data over Threshold (30mm)
Daily Rainfall Data over Threshold (30 mm): GPD Parameter Estimation
Parameter Estimation Method      Shape (ξ)   Scale (σ)
Pickands' estimator              0.88783     13.259
Probability Weighted Moments     3.8643      211.95
Moment method                    3.925       214.6
Maximum Likelihood method        0.64839     65.317
Table 5 GPD Parameter Estimations
Daily Rainfall Data over Threshold (30 mm), Eq. 18
T       Pickands' estimator   Probability Weighted Moments   Moment method   Maximum Likelihood method
2       15.20                 4.8E+09                        6.5E+09         67.7769
5       15.37                 1.4E+08                        1.8E+08         65.3746
10      15.64                 9639812                        1.2E+07         62.3315
20      16.13                 661844                         769513          57.5617
50      17.46                 19104.7                        21018.3         46.8703
100     19.51                 1232.81                        1304.62         33.3276
200     23.29                 5.62877                        6.78897         12.1003
1000    49.40                 84.66                          84.50           95.74942
Table 6 Quantile Estimation with GPD (Eq. 18)
Daily Rainfall Data over Threshold (30 mm), Eq. 11
T       Pickands' estimator   Probability Weighted Moments   Moment method   Maximum Likelihood method
2       1048.65               24.8492                        24.6747         2186.53
5       439.85                24.8492                        24.6746         1148.51
10      217.056               24.849                         24.6745         685.412
20      96.6518               24.8461                        24.672          389.959
50      17.8306               24.7432                        24.5789         156.713
100     11.01454              23.3056                        23.22           52.6533
200     26.60318              2.37                           2.57902         13.73585
1000    40.54269              11267.61                       12215.13        89.52921
Table 7 Quantile Estimation with GPD (Eq. 11)
Questions
· Main Idea: Fit the data with a suitable distribution and, from the relation between probability and return period (T, in years), estimate rainfall values (xm) for several return periods.
· For the Generalized Extreme Value distribution, parameter estimation was made using the Maximum Likelihood Estimator (table 2). Which other estimators are available for GEV?
· According to the Fisher-Tippett theorem, the block maxima of a sequence of independent and identically distributed (iid) random variables follow, in the limit, a Generalized Extreme Value (GEV) distribution. Which methods can be used to examine whether our data are independent and identically distributed?
· How can we evaluate the parameter estimators?
· The Naxos Daily Rainfall dataset (N=17897) has 14853 records of daily rainfall equal to zero. What is the best method to treat the zeros of our dataset?
· How can the threshold (u) be examined for other values? Example: u follows a Normal distribution.
· Why does POT give negative estimations of xm? (table 6, table 7)
· Which objective function can be used to determine the histogram bins?
· The maximum value observed in the Naxos Daily Rainfall Data (N=17897) is 98.8 mm. How can the probability of 98.8 mm appearing in the following year be estimated? (Naxos data 1955-2003: what is the probability of 98.8 mm appearing in 2004?)
· In the paper "Statistical Modelling of Severe Wind Gust", p. 23, the number of exceedances is assumed to follow a Poisson distribution. Which other distributions can be used?
· The threshold must be high enough for the excesses over the threshold to follow the GPD, but at the same time the sample size should be large enough. How can we build a process that satisfies both "high enough" and "large enough sample size" for different thresholds? (The sample mean excess plot and the Hill plot place the threshold at the point where the plot starts to become more linear.)
· Secondary investigation: Given the Naxos Daily Rainfall Dataset split into two periods (example: 1955-1990 / 1991-2003), which is the best approach to investigate whether statistical differences appear between the two periods?
References
Goldstein J., Mirza M., Etkin D., Milton J. (.....) "Hydrological Assessment: Application of extreme value theory for climate extremes scenarios construction", Environment Canada.
Katz R., Parlange M., Naveau P. (2002) "Statistics of extremes in hydrology", Elsevier, Advances in Water Resources 25, 1287-1304.
Engeland K., Hisdal H., Frigessi A. (2004) "Practical Extreme Value Modelling of Hydrological Floods and Droughts: A Case Study", Springer Science, Extremes 7, 2-30.
Adlouni S., Bobee B., Ouarda T.B.M.J. (2008) "On the tails of extreme event distributions in hydrology", Science Direct, Journal of Hydrology 355, 16-33.
Xu Y.P., Booij M., Tong Y.B. (2009) "Uncertainty analysis in modeling of extreme hydrological events", Stoch Environ Res Risk Assess, 10.1007/s00477-009-0337-8.
Lin X.G. (....) "Statistical Modelling of Severe Wind Gust", Risk Modelling Project, Minerals and Geohazards Division, GeoScience Australia, Canberra, Australia.
Drees H., de Haan L., Resnick S. (......) "How to make a Hill plot",