Contents
 Introduction
 Metrics for Deterministic Forecasts
 Mean Absolute Error (MAE)
 Mean Bias Error (MBE)
 Root Mean Square Error (RMSE)
 Forecast Skill
 Mean Absolute Percentage Error (MAPE)
 Normalized Mean Absolute Error (NMAE)
 Normalized Mean Bias Error (NMBE)
 Normalized Root Mean Square Error (NRMSE)
 Centered (unbiased) Root Mean Square Error (CRMSE)
 Pearson Correlation Coefficient (r)
 Coefficient of Determination (R^2)
 Relative Euclidean Distance (D)
 Kolmogorov-Smirnov Test Integral (KSI)
 OVER
 Combined Performance Index (CPI)
 Metrics for Deterministic Event Forecasts
 Probability of Detection (POD)
 False Alarm Ratio (FAR)
 Probability of False Detection (POFD)
 Critical Success Index (CSI)
 Event Bias (EBIAS)
 Event Accuracy (EA)
 Metrics for Probabilistic Forecasts
 Brier Score (BS)
 Brier Skill Score (BSS)
 Reliability (REL)
 Resolution (RES)
 Uncertainty (UNC)
 Quantile Score (QS)
 Quantile Skill Score (QSS)
 Sharpness (SH)
 Continuous Ranked Probability Score (CRPS)
 Value Metrics
 References
Metrics
The Solar Forecast Arbiter evaluation framework provides a suite of metrics for evaluating deterministic and probabilistic solar forecasts. These metrics serve different purposes, e.g., comparing a forecast with the measurement, comparing the performance of multiple forecasts, and evaluating an event forecast.
Metrics for Deterministic Forecasts
The following metrics provide measures of the performance of deterministic forecasts. Each metric is computed from a set of \(n\) forecasts \((F_1, F_2, \dots, F_n)\) and corresponding observations \((O_1, O_2, \dots, O_n)\).
In the metrics below, we adopt the following nomenclature:
 \(n :\) number of samples
 \(F :\) forecasted value
 \(O :\) observed (actual) value
 \(\text{norm} :\) normalizing factor (with the same units as the forecasted and observed values)
 \(\bar{F}, \, \bar{O} :\) the mean of the forecasted and observed values, respectively
For more information on these metrics and others, see Zhang15, Wilks11 and the references listed below.
Note that for normalized metrics (NMAE, NMBE, NRMSE), the Solar Forecast Arbiter currently allows no user control over normalization via the dashboard. Instead, the Arbiter has the following behavior depending on the forecasted variable type:
 AC power: normalize using the AC capacity of the selected power plant
 DC power: normalize using the DC capacity of the selected power plant
 irradiance: no normalization; return normalized metric values as NaN
 weather (e.g. wind speed): no normalization; return normalized metric values as NaN
Additionally, the Solar Forecast Arbiter allows users to account for observation uncertainty by setting the error (forecast - observation) equal to zero for any point that is within a specified deadband, with the error unchanged for any point that lies outside the deadband. The deadband is specified as a percentage of the observation value at each time. A value of None indicates that no deadband was applied for that observation/forecast pair. Currently, the deadband is accounted for in the following metrics: MAE, MBE, RMSE, MAPE, NMAE, NMBE, NRMSE. The deadband is ignored for all other metrics.
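The deadband rule can be illustrated with a minimal sketch. The function name `deadband_errors` and the exact comparison (absolute error less than or equal to the deadband percentage of the absolute observation) are assumptions made for illustration, not the Arbiter's implementation:

```python
def deadband_errors(forecast, observation, deadband_pct=None):
    """Return forecast - observation, set to zero inside the deadband.

    deadband_pct is a percentage of each observation value;
    None means that no deadband is applied.
    """
    errors = []
    for f, o in zip(forecast, observation):
        error = f - o
        # Assumption: a point is "within" the deadband when
        # |F - O| <= (deadband_pct / 100) * |O|.
        if deadband_pct is not None and abs(error) <= deadband_pct / 100 * abs(o):
            error = 0.0
        errors.append(error)
    return errors


fx = [102.0, 90.0, 50.0]
obs = [100.0, 100.0, 40.0]
print(deadband_errors(fx, obs, deadband_pct=5))  # [0.0, -10.0, 10.0]
print(deadband_errors(fx, obs))                  # [2.0, -10.0, 10.0]
```

The resulting errors would then be passed to the error metrics (MAE, MBE, RMSE, etc.) in place of the raw differences.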
Mean Absolute Error (MAE)
The absolute error is the absolute value of the difference between the forecasted and observed values. The MAE is defined as:
\[\text{MAE} = \frac{1}{n} \sum_{i=1}^n \lvert F_i - O_i \rvert\]
Mean Bias Error (MBE)
The bias is the difference between the forecasted and observed values. The MBE is defined as:
\[\text{MBE} = \frac{1}{n} \sum_{i=1}^n (F_i - O_i)\]
Root Mean Square Error (RMSE)
The RMSE is the square root of the average of the squared differences between the forecasted and observed values, and is defined as:
\[\text{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^n (F_i - O_i)^2 }\]RMSE is a frequently used measure for evaluating forecast accuracy. Since the errors are squared before being averaged, the RMSE gives higher weight to large errors.
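As a minimal sketch of the three definitions above (the function names and sample values are illustrative only, not the Arbiter's API):

```python
import math


def mae(fx, obs):
    """Mean absolute error."""
    return sum(abs(f - o) for f, o in zip(fx, obs)) / len(fx)


def mbe(fx, obs):
    """Mean bias error (positive when the forecast is too high on average)."""
    return sum(f - o for f, o in zip(fx, obs)) / len(fx)


def rmse(fx, obs):
    """Root mean square error."""
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(fx, obs)) / len(fx))


fx = [3.0, 5.0, 2.0, 8.0]
obs = [2.0, 5.0, 4.0, 4.0]
print(mae(fx, obs))   # 1.75
print(mbe(fx, obs))   # 0.75
print(rmse(fx, obs))  # sqrt(5.25), about 2.29
```

In this example the single large error of 4 raises the RMSE well above the MAE, which illustrates the higher weight that RMSE gives to large errors.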
Forecast Skill (\(s\))
The forecast skill measures the performance of a forecast relative to a reference forecast (Marquez12). The Solar Forecast Arbiter uses the definition of forecast skill based on RMSE:
\[s = 1 - \frac{\text{RMSE}_f}{\text{RMSE}_{\text{ref}}}\]where \(\text{RMSE}_f\) is the RMSE of the forecast of interest, and \(\text{RMSE}_{\text{ref}}\) is the RMSE of the reference forecast, e.g., persistence.
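A short sketch of the skill calculation, using made-up values and a hypothetical persistence-style reference (the function names are illustrative, not the Arbiter's API):

```python
import math


def rmse(fx, obs):
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(fx, obs)) / len(fx))


def forecast_skill(fx, ref, obs):
    """Skill s = 1 - RMSE_f / RMSE_ref; undefined if RMSE_ref is zero."""
    return 1 - rmse(fx, obs) / rmse(ref, obs)


obs = [10.0, 12.0, 15.0, 11.0]
ref = [9.0, 10.0, 12.0, 15.0]   # hypothetical reference forecast
fx = [10.0, 11.0, 14.0, 13.0]   # forecast of interest

print(forecast_skill(fx, ref, obs))   # 1 - sqrt(0.2), about 0.553
print(forecast_skill(ref, ref, obs))  # 0.0: no better than the reference
print(forecast_skill(obs, ref, obs))  # 1.0: perfect forecast
```

A skill of 1 corresponds to a perfect forecast, 0 to a forecast that matches the reference RMSE, and negative values to a forecast that performs worse than the reference.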
Mean Absolute Percentage Error (MAPE)
The absolute percentage error is the absolute value of the difference between the forecasted and observed values, divided by the observed value. The MAPE is defined as:
\[\text{MAPE} = 100\% \cdot \frac{1}{n} \sum_{i=1}^n \left\lvert \frac{F_i - O_i}{O_i} \right\rvert\]
Normalized Mean Absolute Error (NMAE)
The NMAE [%] is the normalized form of the MAE and is defined as:
\[\text{NMAE} = \frac{100\%}{\text{norm}} \cdot \frac{1}{n} \sum_{i=1}^n \lvert F_i - O_i \rvert\]where norm is a constant upper bound on the value of the forecasted variable, e.g., the nameplate AC (DC) capacity of a PV plant when forecasting AC (DC) power.
Normalized Mean Bias Error (NMBE)
The NMBE [%] is the normalized form of the MBE and is defined as:
\[\text{NMBE} = \frac{100\%}{\text{norm}} \cdot \frac{1}{n} \sum_{i=1}^n (F_i - O_i)\]where norm is a constant upper bound on the value of the forecasted variable, e.g., the nameplate AC (DC) capacity of a PV plant when forecasting AC (DC) power.
Normalized Root Mean Square Error (NRMSE)
The NRMSE [%] is the normalized form of the RMSE and is defined as:
\[\text{NRMSE} = \frac{100\%}{\text{norm}} \cdot \sqrt{ \frac{1}{n} \sum_{i=1}^n (F_i - O_i)^2 }\]where norm is a constant upper bound on the value of the forecasted variable, e.g., the nameplate AC (DC) capacity of a PV plant when forecasting AC (DC) power.
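The percentage metrics can be sketched as follows. The function names, the sample values and the 100 MW capacity are hypothetical; returning NaN when no normalizing factor is available mirrors the behavior described earlier for irradiance and weather variables, but is written here as an assumption rather than the Arbiter's code:

```python
import math


def mape(fx, obs):
    """MAPE in percent; raises ZeroDivisionError if any observation is 0."""
    return 100 * sum(abs((f - o) / o) for f, o in zip(fx, obs)) / len(fx)


def nmae(fx, obs, norm):
    if norm is None:
        return math.nan  # no normalizing factor available
    return 100 / norm * sum(abs(f - o) for f, o in zip(fx, obs)) / len(fx)


def nmbe(fx, obs, norm):
    if norm is None:
        return math.nan
    return 100 / norm * sum(f - o for f, o in zip(fx, obs)) / len(fx)


def nrmse(fx, obs, norm):
    if norm is None:
        return math.nan
    return 100 / norm * math.sqrt(
        sum((f - o) ** 2 for f, o in zip(fx, obs)) / len(fx))


fx = [40.0, 55.0, 20.0]    # forecast AC power [MW]
obs = [50.0, 50.0, 25.0]   # observed AC power [MW]
capacity = 100.0           # hypothetical AC capacity [MW]

print(mape(fx, obs))             # about 16.67 %
print(nmae(fx, obs, capacity))   # about 6.67 %
print(nmbe(fx, obs, capacity))   # about -3.33 %
print(nrmse(fx, obs, capacity))  # about 7.07 %
```

Note that MAPE divides by each observation, so it grows quickly when observations are close to zero (e.g. near sunrise and sunset), whereas the normalized metrics divide by a constant.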
Centered (unbiased) Root Mean Square Error (CRMSE)
The CRMSE describes the variation in errors around the mean and is defined as:
\[\text{CRMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^n \left( (F_i - \bar{F}) - (O_i - \bar{O}) \right)^2 }\]The CRMSE is related to the RMSE and MBE through \(\text{RMSE}^2 = \text{CRMSE}^2 + \text{MBE}^2\), and can be decomposed into components related to the standard deviation and correlation coefficient:
\[\text{CRMSE}^2 = \sigma_F^2 + \sigma_O^2 - 2 \sigma_F \sigma_O r\]where \(\sigma_F\) and \(\sigma_O\) are the standard deviations of the forecast and observation, respectively, and \(r\) is the correlation coefficient.
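Both identities can be checked numerically. The sketch below uses made-up data and assumes population standard deviations (dividing by \(n\), not \(n - 1\)), which is what makes the second identity hold exactly; the helper names are illustrative:

```python
import math


def mean(values):
    return sum(values) / len(values)


def std(values):
    """Population standard deviation (divides by n)."""
    m = mean(values)
    return math.sqrt(mean([(v - m) ** 2 for v in values]))


def pearson_r(fx, obs):
    fbar, obar = mean(fx), mean(obs)
    cov = mean([(f - fbar) * (o - obar) for f, o in zip(fx, obs)])
    return cov / (std(fx) * std(obs))


def crmse(fx, obs):
    fbar, obar = mean(fx), mean(obs)
    return math.sqrt(
        mean([((f - fbar) - (o - obar)) ** 2 for f, o in zip(fx, obs)]))


fx = [3.0, 5.0, 2.0, 8.0]
obs = [2.0, 5.0, 4.0, 4.0]

rmse_sq = mean([(f - o) ** 2 for f, o in zip(fx, obs)])
mbe = mean([f - o for f, o in zip(fx, obs)])
crmse_sq = crmse(fx, obs) ** 2
decomposed = (std(fx) ** 2 + std(obs) ** 2
              - 2 * std(fx) * std(obs) * pearson_r(fx, obs))

print(rmse_sq, crmse_sq + mbe ** 2)  # both approximately 5.25
print(crmse_sq, decomposed)          # both approximately 4.6875
```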
Pearson Correlation Coefficient (\(r\))
Correlation indicates the strength and direction of a linear relationship between two variables. The Pearson correlation coefficient, also known as the sample correlation coefficient, measures the linear dependency between the forecasted and observed values, and is defined as the ratio of the covariance of the variables to the product of their standard deviations:
\[r = \frac{ \sum_{i=1}^n (F_i - \bar{F}) (O_i - \bar{O}) }{ \sqrt{ \sum_{i=1}^n (F_i - \bar{F})^2} \times \sqrt{ \sum_{i=1}^n (O_i - \bar{O})^2 } }\]
Coefficient of Determination (\(R^2\))
The coefficient of determination measures the extent to which the variability in the forecast errors is explained by variability in the observed values, and is defined as:
\[R^2 = 1 - \frac{ \sum_{i=1}^n (O_i - F_i)^2 }{ \sum_{i=1}^n (O_i - \bar{O})^2 }\]By this definition, a perfect forecast has an \(R^2\) value of 1.
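The following sketch evaluates both definitions on made-up data (function names are illustrative). It also shows that \(R^2\) as defined here is not the square of \(r\) and can be negative when the squared forecast errors exceed the spread of the observations about their mean:

```python
import math


def pearson_r(fx, obs):
    fbar = sum(fx) / len(fx)
    obar = sum(obs) / len(obs)
    numerator = sum((f - fbar) * (o - obar) for f, o in zip(fx, obs))
    denominator = (math.sqrt(sum((f - fbar) ** 2 for f in fx))
                   * math.sqrt(sum((o - obar) ** 2 for o in obs)))
    return numerator / denominator


def r_squared(fx, obs):
    obar = sum(obs) / len(obs)
    residual = sum((o - f) ** 2 for f, o in zip(fx, obs))
    total = sum((o - obar) ** 2 for o in obs)
    return 1 - residual / total


fx = [3.0, 5.0, 2.0, 8.0]
obs = [2.0, 5.0, 4.0, 4.0]

print(pearson_r(fx, obs))   # about 0.35
print(r_squared(fx, obs))   # 1 - 21 / 4.75, about -3.42
print(r_squared(obs, obs))  # 1.0 for a perfect forecast
```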
Relative Euclidean Distance (\(D\))
The relative Euclidean distance (D) combines a percent bias error, a percent variance error, and the correlation error in quadrature (Wu12). It is defined as:
\[\text{D} = \sqrt{ \left( \frac{\overline{F} - \overline{O} } { \overline{O} } \right) ^ 2 + \left( \frac{\sigma_{F} - \sigma_{O} } { \sigma_{O} } \right) ^ 2 + \left( \textrm{corr} - 1 \right) ^ 2 }\]where:
 \(\overline{F}\) is the forecast mean
 \(\overline{O}\) is the observation mean
 \(\sigma_{F}\) is the forecast standard deviation
 \(\sigma_{O}\) is the observation standard deviation
 \(\textrm{corr}\) is the Pearson correlation coefficient
Special cases include:
 If \(\overline{F} = 0\) and \(\overline{O} = 0\), the bias term is 0 and the metric is defined by the remaining terms.
 If \(\overline{F} \neq 0\) and \(\overline{O} \rightarrow 0\), \(D \rightarrow \infty\).
 If \(\sigma_{F} = 0\) or \(\sigma_{O} = 0\), \(D\) is undefined.
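A minimal sketch of \(D\), including the special cases listed above. The function name is illustrative, population standard deviations are assumed, and the order in which the special cases are checked (zero standard deviation first) is a choice made here, not something specified by the definition:

```python
import math


def relative_euclidean_distance(fx, obs):
    n = len(fx)
    f_mean, o_mean = sum(fx) / n, sum(obs) / n
    f_std = math.sqrt(sum((f - f_mean) ** 2 for f in fx) / n)
    o_std = math.sqrt(sum((o - o_mean) ** 2 for o in obs) / n)

    if f_std == 0 or o_std == 0:
        return math.nan          # D is undefined
    if o_mean == 0:
        if f_mean != 0:
            return math.inf      # bias term diverges
        bias_term = 0.0          # both means are zero
    else:
        bias_term = (f_mean - o_mean) / o_mean

    variance_term = (f_std - o_std) / o_std
    cov = sum((f - f_mean) * (o - o_mean) for f, o in zip(fx, obs)) / n
    corr = cov / (f_std * o_std)
    return math.sqrt(bias_term ** 2 + variance_term ** 2 + (corr - 1) ** 2)


print(relative_euclidean_distance([3.0, 5.0, 2.0, 8.0], [2.0, 5.0, 4.0, 4.0]))
print(relative_euclidean_distance([1.0, -1.0], [2.0, -2.0]))     # 0.5
print(relative_euclidean_distance([1.0, 3.0], [1.0, -1.0]))      # inf
print(relative_euclidean_distance([1.0, 1.0, 1.0], [1.0, 2.0, 3.0]))  # nan
```

In the second call both means are zero and the correlation is 1, so only the variance term contributes to \(D\).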
Kolmogorov-Smirnov Test Integral (KSI)
The KSI quantifies the level of agreement between the cumulative distribution functions (CDFs) of the forecasted and observed values (Espinar09), and is defined as:
\[\text{KSI} = \int_{p_{\text{min}}}^{p_{\text{max}}} D_n(p) dp\]where \(p_{\text{min}}\) and \(p_{\text{max}}\) are the minimum and maximum values of the union of forecast and observed values, and \(D_n(p)\) is the absolute difference between the two empirical CDFs:
\[D_n(p) = \text{max}( \lvert \text{CDF}_O(p) - \text{CDF}_F(p) \rvert )\]A KSI value of zero implies that the CDFs of the forecast and observed values are equal.
KSI can be normalized as:
\[KSI (\%) = \frac{100}{a_{\text{critical}}} KSI\]where \(a_{\text{critical}} = V_c (p_{\text{max}} - p_{\text{min}})\) and \(V_c = 1.63 / \sqrt{n}\). When \(n \geq 35\), the normalized KSI can be interpreted as a statistic that tests the hypothesis that the two empirical CDFs represent samples drawn from the same population.
OVER
Conceptually, the OVER metric modifies the KSI to quantify the difference between the two CDFs, but only where the CDFs differ by more than a critical limit \(V_c\) (Espinar09). The OVER metric is calculated as:
\[OVER = \int_{p_{\text{min}}}^{p_{\text{max}}} D_n^* dp\]where
\[D_n^* = \begin{cases} \displaystyle D_n - V_c & \text{if} & D_n > V_c \\ \displaystyle 0 & \text{if} & D_n \leq V_c \end{cases}\]The OVER metric can be normalized using the same approach as for KSI.
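KSI and OVER can be sketched together because both integrate a function of \(D_n\) over the same range. The sketch below relies on the fact that empirical CDFs are step functions, so \(D_n\) is constant between consecutive data values and the integral is an exact sum. The function names are illustrative, \(n\) is taken as the number of forecast/observation pairs, and an actual implementation may evaluate the integral differently (e.g. on a fixed grid):

```python
import bisect
import math


def ecdf(sample):
    """Return the empirical CDF of sample as a function of p."""
    data = sorted(sample)
    return lambda p: bisect.bisect_right(data, p) / len(data)


def ksi_and_over(fx, obs):
    """Return (KSI, OVER, KSI %, OVER %).

    Assumes fx and obs have equal length and are not all one identical
    value (otherwise a_critical would be zero).
    """
    cdf_f, cdf_o = ecdf(fx), ecdf(obs)
    points = sorted(set(fx) | set(obs))   # breakpoints of both step CDFs
    v_c = 1.63 / math.sqrt(len(obs))
    ksi = over = 0.0
    for left, right in zip(points, points[1:]):
        d_n = abs(cdf_o(left) - cdf_f(left))  # constant on [left, right)
        ksi += d_n * (right - left)
        over += max(d_n - v_c, 0.0) * (right - left)
    a_critical = v_c * (points[-1] - points[0])
    return ksi, over, 100 * ksi / a_critical, 100 * over / a_critical


obs = [0.0, 1.0, 2.0, 3.0]
fx = [1.0, 2.0, 3.0, 4.0]
print(ksi_and_over(fx, obs))   # KSI = 1.0, OVER = 0.0

obs16 = [float(x) for x in range(16)]
fx16 = [x + 10.0 for x in obs16]
print(ksi_and_over(fx16, obs16))  # KSI = 10.0, OVER > 0
```

In the first example the CDFs never differ by more than \(V_c\), so OVER is zero even though KSI is not. In the second, the forecast distribution is shifted by 10 units and part of the difference exceeds the critical limit.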
Combined Performance Index (CPI)
The CPI provides a measure of the agreement between the distributions of forecasted and observed values, and the overall error by combining KSI, OVER and RMSE (Gueymard12):
\[\text{CPI} = \frac{1}{4} ( \text{KSI} + \text{OVER} + 2 \times \text{RMSE} )\]
Metrics for Deterministic Event Forecasts
An event is defined by values that exceed or fall below a threshold. A typical event is a ramp in solar power generation, which is determined by:
\[\lvert P(t + \Delta t) - P(t) \rvert > \text{Ramp Forecasting Threshold}\]where \(P(t)\) is the solar power output at time \(t\) and \(\Delta t\) is the duration of the ramp event.
Based on the predefined threshold, all observations or forecasts can be evaluated by placing them in either the “event occurred” (Positive) or “event did not occur” (Negative) categories. Then individual pairs of forecasts and observations can be placed into one of four groups based on whether the event forecast agrees (or disagrees) with the event observed value:
 True Positive (TP): Forecast = Event, Observed = Event
 False Positive (FP): Forecast = Event, Observed = No Event
 True Negative (TN): Forecast = No Event, Observed = No Event
 False Negative (FN): Forecast = No Event, Observed = Event
By then counting the number of TP, FP, TN and FN values, the following metrics can be computed:
Probability of Detection (POD)
The POD is the fraction of observed events correctly forecasted as events:
\[POD = \frac{TP}{TP + FN}\]
False Alarm Ratio (FAR)
The FAR is the fraction of forecasted events that did not occur:
\[FAR = \frac{FP}{TP + FP}\]
Probability of False Detection (POFD)
The POFD is the fraction of observed non-events that were forecasted as events:
\[POFD = \frac{FP}{FP + TN}\]
Critical Success Index (CSI)
The CSI evaluates how well an event forecast predicts observed events, e.g., ramps in irradiance or power. The CSI is the relative frequency of hits, i.e., how well predicted “yes” events correspond to observed “yes” events:
\[CSI = \frac{TP}{TP + FP + FN}\]
Event Bias (EBIAS)
The EBIAS is the ratio of counts of forecast and observed events:
\[EBIAS = \frac{TP + FP}{TP + FN}\]
Event Accuracy (EA)
The EA is the fraction of events that were forecasted correctly, i.e., forecast = “yes” and observed = “yes” or forecast = “no” and observed = “no”:
\[EA = \frac{TP + TN}{TP + FP + TN + FN} = \frac{TP + TN}{n}\]where \(n\) is the number of samples.
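The event metrics can be sketched end to end for ramp events. The function names, the power values and the threshold are made up for illustration, and \(\Delta t\) is assumed to equal one sample step. Denominators of zero (e.g. no observed events) are not handled in this sketch:

```python
def ramp_events(power, threshold):
    """True where |P(t + dt) - P(t)| > threshold for consecutive samples."""
    return [abs(later - earlier) > threshold
            for earlier, later in zip(power, power[1:])]


def contingency_counts(fx_events, obs_events):
    pairs = list(zip(fx_events, obs_events))
    tp = sum(f and o for f, o in pairs)
    fp = sum(f and not o for f, o in pairs)
    tn = sum(not f and not o for f, o in pairs)
    fn = sum(not f and o for f, o in pairs)
    return tp, fp, tn, fn


def event_metrics(tp, fp, tn, fn):
    return {
        "POD": tp / (tp + fn),
        "FAR": fp / (tp + fp),
        "POFD": fp / (fp + tn),
        "CSI": tp / (tp + fp + fn),
        "EBIAS": (tp + fp) / (tp + fn),
        "EA": (tp + tn) / (tp + fp + tn + fn),
    }


obs_power = [10.0, 30.0, 32.0, 5.0, 6.0, 40.0]
fx_power = [12.0, 28.0, 15.0, 10.0, 8.0, 35.0]
threshold = 10.0

counts = contingency_counts(ramp_events(fx_power, threshold),
                            ramp_events(obs_power, threshold))
print(counts)                  # (2, 1, 1, 1) as (TP, FP, TN, FN)
print(event_metrics(*counts))
```

Here the forecast and observations contain the same number of ramps (EBIAS = 1), but only two of the three observed ramps are forecast at the right time (POD = 2/3).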
Metrics for Probabilistic Forecasts
Probabilistic forecasts represent uncertainty in the forecast quantity by providing a probability distribution or a prediction interval, rather than a single value.
In the metrics below, we adopt the following nomenclature:
 \(F(t_k) :\) probability forecast for an event \(o\) at each time \(t_k\)
 \(f_i :\) discrete values that appear in the probability forecast \(F\)
 \(o(t_k) :\) indicator for event \(o\): \(o(t_k) = 1\) if an event occurs at time \(t_k\) and \(o(t_k) = 0\) otherwise
 \(N_i :\) the number of times each forecast value \(f_i\) appears in the forecast \(F\)
 \(n = \sum_{i=1}^I N_i :\) number of forecast events
 \(p(f_i) = \frac{N_i}{n} :\) the relative frequency of each forecast value \(f_i\) in the forecast \(F\)
 \(\bar{o}_i = p(o_i \mid f_i) = \frac{1}{N_i} \sum_{k \in N_i} o_k :\) the average of \(o(t_k)\) at the \(N_i\) times \(t_k\) when \(F(t_k) = f_i\)
 \(\bar{o} = \frac{1}{n} \sum_{k=1}^n o(t_k) = \frac{1}{n} \sum_{i=1}^I N_i \bar{o}_i :\) the average of \(o(t_k)\) for all times \(t_k\)
Brier Score (BS)
The BS measures the accuracy of forecast probability for one or more events (Brier50). For events with binary outcomes, BS is defined as:
\[\text{BS} = \frac{1}{n} \sum_{i=1}^n (f_i - o_i)^2\]Smaller values of BS indicate better agreement between forecasts and observations. Note that while BS can be generalized to events with more than two outcomes, the Solar Forecast Arbiter only includes built-in support for the (more commonly used) binary events definition. For more info, see Section 7.4.2. of Wilks11.
When the probability forecast takes on a finite number of values (e.g. 0.0, 0.1, …, 0.9, 1.0), the BS can be decomposed into a sum of three metrics that give additional insight into a probability forecast (Murphy73):
\[\text{BS} = \text{REL} - \text{RES} + \text{UNC}\]where REL is the reliability, RES is the resolution and UNC is the uncertainty, as defined below.
Reliability (REL)
The REL is given by:
\[\text{REL} = \frac{1}{n} \sum_{i=1}^I N_i (f_i - \bar{o}_i)^2\]Reliability is the weighted average of the squared differences between the forecast probabilities \(f_i\) and the relative frequencies of the observed event in the forecast subsample of times where \(F(t_k) = f_i\). A forecast is perfectly reliable if \(\text{REL} = 0\). This occurs when the relative event frequency in each subsample is equal to the forecast probability for the subsample.
Resolution (RES)
The RES is given by:
\[\text{RES} = \frac{1}{n} \sum_{i=1}^I N_i (\bar{o}_i - \bar{o})^2\]Resolution is the weighted average of the squared differences between the relative event frequency for each forecast subsample and the overall event frequency. Resolution measures the forecast’s ability to produce subsample forecast periods where the event frequency is different. Higher values of RES are desirable.
Uncertainty (UNC)
The UNC is given by:
\[\text{UNC} = \bar{o} (1 - \bar{o})\]Uncertainty is the variance of the event indicator \(o(t)\). Low values of UNC indicate that the event being forecasted occurs either very rarely or almost always.
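The decomposition \(\text{BS} = \text{REL} - \text{RES} + \text{UNC}\) can be verified numerically. The sketch below uses made-up probability forecasts restricted to three discrete values; the function names are illustrative and not the Arbiter's API:

```python
from collections import defaultdict


def brier_score(fx_prob, obs_event):
    n = len(fx_prob)
    return sum((f - o) ** 2 for f, o in zip(fx_prob, obs_event)) / n


def brier_decomposition(fx_prob, obs_event):
    """Return (REL, RES, UNC) for forecasts with a finite set of values."""
    n = len(fx_prob)
    # Group the event indicators by the forecast value f_i issued.
    groups = defaultdict(list)
    for f, o in zip(fx_prob, obs_event):
        groups[f].append(o)
    o_bar = sum(obs_event) / n
    rel = res = 0.0
    for f_i, outcomes in groups.items():
        n_i = len(outcomes)
        o_bar_i = sum(outcomes) / n_i   # event frequency in the subsample
        rel += n_i * (f_i - o_bar_i) ** 2
        res += n_i * (o_bar_i - o_bar) ** 2
    unc = o_bar * (1 - o_bar)
    return rel / n, res / n, unc


fx_prob = [0.1, 0.1, 0.5, 0.5, 0.9, 0.9, 0.9, 0.1]
obs_event = [0, 0, 1, 0, 1, 1, 0, 1]

bs = brier_score(fx_prob, obs_event)
rel, res, unc = brier_decomposition(fx_prob, obs_event)
print(bs)                 # approximately 0.27
print(rel - res + unc)    # approximately 0.27
```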
Brier Skill Score (BSS)
The BSS is based on the BS and measures the performance of a probability forecast relative to a reference forecast:
\[\text{BSS} = 1 - \frac{\text{BS}_f}{\text{BS}_{\text{ref}}}\]where \(\text{BS}_f\) is the BS of the forecast of interest, and \(\text{BS}_{\text{ref}}\) is the BS of the reference forecast. BSS greater than zero indicates the forecast performed better than the reference and vice versa for BSS less than zero, while BSS equal to zero indicates the forecast is no better (or worse) than the reference.
Quantile Score (QS)
QS measures the accuracy of quantile forecasts, in which the forecast predicts the variable value corresponding to a constant probability (Koenker78, Wilks19). QS is similar to BS, but measures accuracy in terms of the variable value (e.g. MW) and is defined as:
\[\text{QS} = \frac{1}{n} \sum_{i=1}^n (\text{fx}_i - \text{obs}_i) \cdot (p - \mathbf{1}\{ \text{obs}_i > \text{fx}_i \})\]where \(\mathbf{1}\{ \text{obs} > \text{fx} \}\) is an indicator function:
\[\mathbf{1}\{ \text{obs} > \text{fx} \} = \begin{cases} \displaystyle 1 & \text{obs} > \text{fx} \\ \displaystyle 0 & \text{obs} \leq \text{fx} \end{cases}\]Smaller QS values indicate more accurate forecasts.
QS is always greater than or equal to 0. If \(\text{obs} > \text{fx}\), then QS is nonnegative:
\[\begin{align} (\text{fx} - \text{obs}) &< 0 \\ (p - \mathbf{1}\{\text{obs} > \text{fx}\}) &= (p - 1) \leq 0 \\ (\text{fx} - \text{obs}) \cdot (p - 1) &\geq 0 \end{align}\]If instead \(\text{obs} < \text{fx}\), then QS is also nonnegative:
\[\begin{align} (\text{fx} - \text{obs}) &> 0 \\ (p - \mathbf{1}\{\text{obs} > \text{fx}\}) &= (p - 0) \geq 0 \\ (\text{fx} - \text{obs}) \cdot p &\geq 0 \end{align}\]
Quantile Skill Score (QSS)
QSS is based on the QS and measures the performance of a quantile forecast relative to a reference forecast (Bouallegue15):
\[\text{QSS} = 1 - \frac{ \text{QS}_{\text{fx}} }{ \text{QS}_{\text{ref}} }\]where \(\text{QS}_{\text{fx}}\) is the QS of the forecast of interest, and \(\text{QS}_{\text{ref}}\) is the QS of the reference forecast. The interpretation of QSS values is the same as BSS.
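A minimal sketch of QS and QSS for a single quantile level \(p\). The function names and the sample values (a 90th-percentile forecast and a constant reference) are hypothetical:

```python
def quantile_score(fx, obs, p):
    """Mean of (fx - obs) * (p - 1{obs > fx}) for a quantile level p in [0, 1]."""
    total = 0.0
    for f, o in zip(fx, obs):
        indicator = 1.0 if o > f else 0.0
        total += (f - o) * (p - indicator)
    return total / len(fx)


def quantile_skill_score(fx, ref, obs, p):
    """QSS = 1 - QS_fx / QS_ref; undefined if QS_ref is zero."""
    return 1 - quantile_score(fx, obs, p) / quantile_score(ref, obs, p)


p = 0.9
obs = [10.0, 20.0, 30.0]
fx = [12.0, 18.0, 30.0]    # forecast of the 90th percentile
ref = [15.0, 15.0, 15.0]   # hypothetical reference forecast

print(quantile_score(fx, obs, p))             # about 0.667
print(quantile_score(ref, obs, p))            # about 2.167
print(quantile_skill_score(fx, ref, obs, p))  # about 0.692
```

For \(p = 0.9\), an observation above the forecast is weighted by 0.1 per unit of error and an observation below it by 0.9. The two weights sum to one and apply to opposite sides of the forecast, so the penalty is asymmetric unless \(p = 0.5\).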
Sharpness (SH)
The SH represents the degree of “concentration” of a forecast comprising a prediction interval of the form \([ f_l, f_u ]\) within which the forecast quantity is expected to fall with probability \(1 - \beta\). A good forecast should have a low sharpness value. The prediction interval endpoints are associated with quantiles \(\alpha_l\) and \(\alpha_u\), where \(\alpha_u - \alpha_l = 1 - \beta\). For a single prediction interval, the SH is:
\[\text{SH} = f_u - f_l\]and for a timeseries of prediction intervals, the SH is given by the average:
\[\text{SH} = \frac{1}{n} \sum_{i=1}^n \left( f_{u,i} - f_{l,i} \right)\]
Continuous Ranked Probability Score (CRPS)
The CRPS is a score designed to measure both the reliability and sharpness of a probabilistic forecast (Matheson76). For a timeseries of forecasts comprising a CDF at each time point, the CRPS is:
\[\text{CRPS} = \frac{1}{n} \sum_{i=1}^n \int ( F_i(x) - O_i(x) )^2 dx\]where \(F_i(x)\) is the CDF of the forecast quantity \(x\) at time point \(i\), and \(O_i(x)\) is the CDF associated with the observed value \(x_i\):
\[O_i(x) = \begin{cases} \displaystyle 0 & x < x_i \\ \displaystyle 1 & x \geq x_i \end{cases}\]The CRPS reduces to the mean absolute error (MAE) if the forecast is deterministic.
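The CRPS integral can be evaluated exactly when the forecast CDF is a step function. The sketch below assumes, purely for illustration, that each forecast CDF is the empirical CDF of a small set of hypothetical ensemble members; a forecast supplied as probabilities at fixed values would need a different integration scheme. The function names are not the Arbiter's API:

```python
import bisect


def crps_single(members, observed):
    """CRPS at one time point for an empirical forecast CDF."""
    data = sorted(members)
    # Both CDFs are constant between consecutive breakpoints, and the
    # integrand is zero outside [min breakpoint, max breakpoint].
    points = sorted(set(data) | {observed})
    total = 0.0
    for left, right in zip(points, points[1:]):
        forecast_cdf = bisect.bisect_right(data, left) / len(data)
        observed_cdf = 1.0 if left >= observed else 0.0
        total += (forecast_cdf - observed_cdf) ** 2 * (right - left)
    return total


def crps(member_sets, observations):
    """Average CRPS over a timeseries of forecasts."""
    scores = [crps_single(m, o) for m, o in zip(member_sets, observations)]
    return sum(scores) / len(scores)


print(crps_single([5.0], 3.0))        # 2.0, the absolute error |5 - 3|
print(crps_single([1.0, 3.0], 2.0))   # 0.5
print(crps([[5.0], [1.0, 3.0]], [3.0, 2.0]))  # 1.25
```

The first call shows the deterministic case: a forecast with a single member has a step CDF, and the CRPS equals the absolute error.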
Value Metrics
Forecasts can provide economic value in a number of different ways. At a system operator (balancing authority) level they improve scheduling of the system by more effectively committing and dispatching resources to balance supply and demand. This can result in reduced startup of quick start units, more effective use of cheaper generation resources and better use of storage to manage variability. Forecasts can also provide financial benefits to plant owners, traders, and other market participants by allowing them to improve bidding strategies or otherwise reduce risks.
There are two main approaches to assessing cost-related impacts of forecasts: 1) as a function of forecast error (e.g. $/MW of RMSE) and 2) simulations using a production cost model (PCM). Both approaches focus on the value from decisions made based on the forecasts and do not include secondary costs, e.g., the cost to develop and deploy the forecast models. The Solar Forecast Arbiter will only include built-in support for evaluating value as a function of error, but we provide a brief introduction to production cost modeling (see below) for users interested in more accurate assessments.
Value as a Function of Error
Let \(\text{cost}\) be the cost incurred due to forecast error when, say, operating a system or participating in a market. This cost can be written as:
\[\text{cost} = \sum_{i=1}^n C_i(S(F_i, O_i)) ,\]where \(S(\cdot)\) is a measure of the error between the forecast (\(F_i\)) and observation (\(O_i\)), and \(C_i(\cdot)\) are functions that map the forecast error to a cost. In the simplest case, all the \(C_i(\cdot)\) are identical and defined as a constant cost per error value (e.g. [$/MW of RMSE]):
\[\text{cost} = C \cdot S(F, O) .\]However, the \(C_i(\cdot)\) can be defined such that the cost per error varies as a function of time (e.g. on-peak vs off-peak or weekday vs weekend) or as a function of error magnitude (e.g. costs increasing in tiers, with larger errors costing more than smaller errors). For example, an on-peak/off-peak cost could be defined as:
\[C_i = \begin{cases} \displaystyle C_1 & \text{4pm} \leq \text{time} \leq \text{8pm} \\ \displaystyle C_2 & \text{otherwise} \end{cases}\]where \(C_1 \gg C_2\), i.e., the cost of misforecasts during on-peak periods is greater than during off-peak. Similarly, the cost could be defined in tiers based on the error magnitude:
\[C_i = \begin{cases} \displaystyle C_1 & \text{error} \leq 20\% \\ \displaystyle C_2 & 20\% < \text{error} \leq 50\% \\ \displaystyle C_3 & \text{error} > 50\% \end{cases}\]where \(C_1 < C_2 < C_3\).
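Both cost structures can be sketched as follows. Everything specific here is an assumption made for illustration: the error measure \(S\) is taken to be the absolute error, each \(C_i\) is a cost per unit of error, the tier percentages are relative to a hypothetical plant capacity, and the rates are made-up numbers:

```python
def time_of_day_cost(hours, fx, obs, on_peak_rate, off_peak_rate):
    """Sum of rate * |F - O|, with the on-peak rate from 4pm through 8pm."""
    total = 0.0
    for hour, f, o in zip(hours, fx, obs):
        rate = on_peak_rate if 16 <= hour <= 20 else off_peak_rate
        total += rate * abs(f - o)
    return total


def tiered_cost(fx, obs, capacity, rates=(1.0, 2.0, 5.0)):
    """Sum of rate * |F - O|, with the rate set by error as % of capacity."""
    c1, c2, c3 = rates
    total = 0.0
    for f, o in zip(fx, obs):
        error = abs(f - o)
        error_pct = 100 * error / capacity
        if error_pct <= 20:
            rate = c1
        elif error_pct <= 50:
            rate = c2
        else:
            rate = c3
        total += rate * error
    return total


# Hours of day (24 h clock), forecast and observed power [MW].
print(time_of_day_cost([12, 17, 21], [50.0, 60.0, 20.0], [45.0, 50.0, 22.0],
                       on_peak_rate=10.0, off_peak_rate=2.0))  # 114.0
print(tiered_cost([50.0, 90.0, 20.0], [45.0, 50.0, 80.0],
                  capacity=100.0))                             # 385.0
```

In the first example the 10 MW error at 5pm accounts for most of the total cost because it falls in the on-peak window.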
While this approach is straightforward to interpret, a key challenge is how to determine the \(C_i(\cdot)\). The \(C_i(\cdot)\) could be based on analysis of historical data such as real-time energy prices, differences between day-ahead and real-time prices, reserve prices (where reserve depends on forecast error) or suitable proxies for non-ISO regions. The Solar Forecast Arbiter relies on users to supply the \(C_i(\cdot)\) relevant to their forecast application.
The monetary value of an improved forecast (\(\text{value}_f\)) is then defined as:
\[\text{value}_f = \text{cost}_{f} - \text{cost}_{\text{ref}},\]where \(\text{cost}_f\) and \(\text{cost}_{\text{ref}}\) refer to the costs of the selected forecast and reference forecast, respectively. Note that the choice of the reference forecast is crucial and should be consistent with current operational practices.
Production Cost Modeling
An alternative approach (not implemented by the Solar Forecast Arbiter) is to perform simulations using a production cost model (PCM) and then compare differences in costs incurred when using different forecasts. In addition to providing a more direct evaluation of the forecasts, simulations can provide insight into future value, e.g., how improved forecasts can improve system operations as solar penetration increases. However, such simulations require additional data dependencies and expertise that may not be readily available to forecasters.
In order to simulate the system, a PCM should be used to simulate the operations with and without energy storage. A number of key considerations for such simulations include:
 Use of multi-cycle models: A multi-cycle model captures operations in at least two decision stages, such as day-ahead and real-time processes, and links the data together. For example, a day-ahead decision may be made based on day-ahead forecasts and certain generators committed to provide the forecasted energy needs, plus any reserves. Then, the model updates to real-time actuals and the system is redispatched, recognizing limitations on the ability to commit additional generation in response to errors. If such a model is used, the ability of an improved forecast to reduce startup of quick start units or reduce solar curtailment can be captured. An example of such a model is included in Ela13 and MartinezAnido16.
 Use of dynamic reserves that reflect forecast errors: As solar penetration increases, it is likely to impact reserves associated with balancing, such as regulation or ramping reserves. A number of ISOs and utilities are moving towards dynamically setting those reserves based on analysis of historical forecast error. Therefore, reducing forecast errors can result in reduced reserve requirements, which should also be included in simulations.
Models that include the above can be used to assess the value of forecasts, and have been exercised in previous studies, such as by NREL and others (Zhang15, Wang16a, Wang16b, Wang17). Such an approach provides a more extensive estimate of the value of improved forecasts. At the same time, these models are still limited by the simplifications made in any model, and are best used for order-of-magnitude or relative studies. For example, a PCM may say that reducing forecast MAPE by 10% reduces costs in a given system by 1.5%. However, the 1.5% result should be interpreted as meaning that reducing the MAPE by 10% is likely to reduce the costs in the range of 0-10%, rather than stating that the cost reduction will be exactly 1.5%.
References
 [Bouallegue15] Z. Bouallegue, P. Pinson and P. Friederichs, “Quantile forecast discrimination ability and value”, Quarterly Journal of the Royal Meteorological Society, vol. 141, pp. 3415-3424, 2015. DOI: 10.1002/qj.2624
 [Brier50] G. W. Brier, “Verification of Forecasts Expressed in Terms of Probability”, Mon. Wea. Rev., vol. 78, pp. 1-3, 1950. DOI: 10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2
 [Ela13] E. Ela, V. Diakov, E. Ibanez, and M. Heaney, “Impacts of variability and uncertainty in solar photovoltaic generation at multiple timescales”, Technical Report, NREL/TP-5500-58274, Golden, CO, May 2013
 [Espinar09] B. Espinar, L. Ramírez, A. Drews, H. G. Beyer, L. F. Zarzalejo, J. Polo, and L. Martín, “Analysis of different comparison parameters applied to solar radiation data from satellite and German radiometric stations”, Solar Energy, vol. 83, issue 1, pp. 118-125, 2009. DOI: 10.1016/j.solener.2008.07.009
 [Gueymard12] C. A. Gueymard, “Clear-sky irradiance predictions for solar resource mapping and large-scale applications: improved validation methodology and detailed performance analysis of 18 broadband radiative models”, Solar Energy, vol. 86, pp. 2145-2169, 2012. DOI: 10.1016/j.solener.2011.11.011
 [Koenker78] R. Koenker and G. Bassett, Jr., “Regression Quantiles”, Econometrica, vol. 46, no. 1, pp. 33-50, 1978. DOI: 10.2307/1913643
 [MartinezAnido16] C. B. Martinez-Anido, B. Botor, A. R. Florita, C. Draxl, S. Lu, H. F. Hamann, and B. M. Hodge, “The value of day-ahead solar power forecasting improvement”, Solar Energy, vol. 129, pp. 192-203, 2016. DOI: 10.1016/j.solener.2016.01.049
 [Marquez12] R. Marquez and C. F. M. Coimbra, “Proposed Metric for Evaluation of Solar Forecasting Models”, 2012
 [Matheson76] J. E. Matheson and R. L. Winkler, “Scoring Rules for Continuous Probability Distributions”, Management Science, vol. 22, no. 10, pp. 1087-1096, 1976. DOI: 10.1287/mnsc.22.10.1087
 [Murphy73] A. H. Murphy, “A New Vector Partition of the Probability Score”, J. Appl. Meteor., vol. 12, pp. 595-600, 1973. DOI: 10.1175/1520-0450(1973)012<0595:ANVPOT>2.0.CO;2
 [Wang16a] Q. Wang, H. Wu, A. R. Florita, C. B. Martinez-Anido, and B. M. Hodge, “The value of improved wind power forecasting: Grid flexibility quantification, ramp capability analysis, and impacts of electricity market operation timescales”, Applied Energy, vol. 184, pp. 696-713, 2016. DOI: 10.1016/j.apenergy.2016.11.016
 [Wang16b] Q. Wang, C. Brancucci, H. Wu, A. R. Florita, and B. M. Hodge, “Quantifying the Economic and Grid Reliability Impacts of Improved Wind Power Forecasting”, IEEE Transactions on Sustainable Energy, vol. 7, no. 4, pp. 1525-1537, 2016. DOI: 10.1109/TSTE.2016.2560628
 [Wang17] Q. Wang and B. M. Hodge, “Enhancing Power System Operational Flexibility with Flexible Ramping Products: A Review”, IEEE Transactions on Industrial Informatics, vol. 13, no. 4, pp. 1652-1664, 2017. DOI: 10.1109/TII.2016.2637879
 [Wilks11] D. S. Wilks, “Statistical Methods in the Atmospheric Sciences”, 3rd ed. Oxford; Waltham, MA; Academic Press, 2011.
 [Wilks19] D. S. Wilks, “Statistical Methods in the Atmospheric Sciences”, 4th ed. Oxford; Waltham, MA; Academic Press, 2019.
 [Wu12] W. Wu, Y. Liu, and A. K. Betts, “Observationally based evaluation of NWP reanalyses in modeling cloud properties over the Southern Great Plains”, Journal of Geophysical Research, vol. 117, D12202, 2012. DOI: 10.1029/2011JD016971
 [Zhang15] J. Zhang, A. Florita, B. M. Hodge, S. Lu, H. F. Hamann, V. Banunarayanan, A. Brockway, “A suite of metrics for assessing the performance of solar power forecasting”, Solar Energy, vol. 111, pp. 157-175, 2015. DOI: 10.1016/j.solener.2014.10.016