
Table 2.3: The Backward Algorithm

Backward Algorithm

Initialization (i = L) : b_k(L) = a_{k0} for all states k

Recursion (i = L-1, ..., 1) : b_k(i) = Σ_m a_{km} e_m(x_{i+1}) b_m(i+1)

Termination : P(x) = Σ_m a_{0m} e_m(x_1) b_m(1)



b_k(i) - probability of observing the rest of the sequence (x_{i+1} ... x_L) when in state k, having already seen the first i symbols

e_m(x_i) - probability of emitting the symbol x_i from state m

a_{km} - probability of a transition from state k to state m

P(x) - probability of observing the entire sequence

At the start of the Backward algorithm we initialize b_k(L), the backward probability at state k once the entire sequence has been observed, to the transition probability from state k to the end state.
The backward probabilities at each position in the symbol sequence are then calculated moving from the end of the sequence to the start. The final probability of the sequence, P(x), obtained by either the Forward or the Backward algorithm is the same. The pseudocode of the Backward algorithm [Durbin et al., 1989] is shown in Table 2.3.
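
To make the recursion concrete, here is a minimal Python sketch of the Backward algorithm. It assumes a discrete-emission HMM stored as nested dictionaries, with state 0 acting as the silent start/end state; the function name and data layout are illustrative choices and are not taken from the thesis.

```python
def backward(x, states, a, e):
    """Backward algorithm for a discrete-emission HMM (sketch).

    x      : observed symbol sequence of length L (a Python list, 0-based)
    states : emitting states; state 0 is the silent start/end state
    a      : a[k][m], probability of a transition from state k to state m
    e      : e[m][c], probability that state m emits symbol c

    Returns b, with b[k][i] = probability of observing x_{i+1} ... x_L
    given that position i was generated by state k, and the total
    sequence probability P(x).
    """
    L = len(x)
    b = {k: [0.0] * (L + 1) for k in states}

    # Initialization (i = L): transition from state k to the end state
    for k in states:
        b[k][L] = a[k][0]

    # Recursion (i = L-1, ..., 1); x[i] is the (i+1)-th symbol of the sequence
    for i in range(L - 1, 0, -1):
        for k in states:
            b[k][i] = sum(a[k][m] * e[m][x[i]] * b[m][i + 1] for m in states)

    # Termination: sum over the choice of the first emitting state
    px = sum(a[0][m] * e[m][x[0]] * b[m][1] for m in states)
    return b, px
```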
2.3.2 The Baum-Welch Algorithm
The Baum-Welch algorithm, also called the Forward-Backward algorithm, is used to adjust the HMM parameters when the path taken by each training sequence is not known. The HMM parameters can be initialized to predetermined values or to a constant before applying the Baum-Welch algorithm. As the path taken is not known, the Baum-Welch algorithm uses the expected number of times each parameter is used when the observed symbols of the training sequences are given to the present HMM. The algorithm consists of two steps, the Expectation step (E step) and the Maximization step (M step). In the Expectation step we first find the forward and backward probability values at each position in the sequence. The forward and backward probabilities are then combined to obtain the probability of the entire sequence with the symbol at position i being emitted by state k, which is given by

P(x, state k at position i) = f_k(i) b_k(i).

Using all of the training sequences, we can calculate the expected number of times a symbol c is emitted at state k using

n_{k,c} = Σ_j (1 / P(x^j)) Σ_{i : x^j_i = c} f^j_k(i) b^j_k(i).

Table 2.4: The Baum-Welch Algorithm

Baum-Welch Algorithm

Initialize the parameters of the HMM and the pseudocounts n'_{k,c} and n'_{k→m}

Iterate until convergence or for a fixed number of iterations:

- E Step: for each training sequence j = 1 ... n
  • calculate the forward probabilities f_k(i) for sequence j
  • calculate the backward probabilities b_k(i) for sequence j
  • add the contribution of sequence j to n_{k,c} and n_{k→m}

- M Step: update the HMM parameters using the expected counts n_{k,c} and n_{k→m} and the pseudocounts n'_{k,c} and n'_{k→m}.

The expected number of times a transition from state k to state m occurs is given by

n_{k→m} = Σ_j (1 / P(x^j)) Σ_i f^j_k(i) a_{km} e_m(x^j_{i+1}) b^j_m(i+1),

where the superscript j refers to an instance in the training set.


The Maximization step uses the expected counts obtained in the Expectation step, the number of times a symbol is emitted at a state and the number of times a transition occurs between two states, to update the emission and transition probabilities so as to maximize the likelihood of the training data. The updated emission probability is

e_k(c) = (n_{k,c} + n'_{k,c}) / Σ_{c'} (n_{k,c'} + n'_{k,c'}).

The transition probability is updated using

a_{km} = (n_{k→m} + n'_{k→m}) / Σ_{m'} (n_{k→m'} + n'_{k→m'}).
Pseudocounts n'_{k,c} and n'_{k→m} for the emission and transition probabilities, respectively, are included because they prevent the numerator or denominator from becoming zero, which happens when a particular state is not used, or a transition does not take place, in the given set of observed sequences. Pseudocounts are usually small values added to the expected counts. Table 2.4 shows the pseudocode of the Baum-Welch algorithm.
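
As an illustration of one Expectation-Maximization pass, the sketch below follows the pseudocode in Table 2.4. It assumes a forward() routine implemented analogously to the backward() sketch above, and it omits the re-estimation of transitions into the end state for brevity; the function names and data layout are illustrative and not part of the thesis.

```python
def baum_welch_iteration(sequences, states, symbols, a, e, pseudo_emit, pseudo_trans):
    """One E step + M step of the Baum-Welch algorithm (sketch).

    sequences    : list of training symbol sequences
    states       : emitting states; state 0 is the silent start/end state
    a, e         : current transition and emission probabilities (dicts of dicts)
    pseudo_emit  : pseudocounts n'_{k,c}, one entry per state and symbol
    pseudo_trans : pseudocounts n'_{k->m}, one entry per pair of emitting states
    forward() is assumed to return f[k][i] = P(x_1 ... x_i, state k at i) and P(x),
    analogously to the backward() sketch given earlier.
    """
    # expected counts, initialized with the pseudocounts
    n_emit = {k: dict(pseudo_emit[k]) for k in states}
    n_trans = {k: dict(pseudo_trans[k]) for k in states}

    # E step: accumulate expected counts over all training sequences
    for x in sequences:
        f, px = forward(x, states, a, e)
        b, _ = backward(x, states, a, e)
        L = len(x)
        for k in states:
            for i in range(1, L + 1):
                # expected number of times state k emits the symbol at position i
                n_emit[k][x[i - 1]] += f[k][i] * b[k][i] / px
            for m in states:
                for i in range(1, L):
                    # expected number of k -> m transitions inside the sequence
                    n_trans[k][m] += f[k][i] * a[k][m] * e[m][x[i]] * b[m][i + 1] / px

    # M step: re-estimate emission and transition probabilities from the counts
    for k in states:
        emit_total = sum(n_emit[k].values())
        for c in symbols:
            e[k][c] = n_emit[k][c] / emit_total
        trans_total = sum(n_trans[k].values())
        for m in states:
            a[k][m] = n_trans[k][m] / trans_total
    return a, e
```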
2.3.3 The Viterbi Algorithm
The Viterbi algorithm is used to find the most probable path taken across the states in the HMM. It uses dynamic programming and a recursive approach to find the path. The algorithm checks all possible paths leading to a state and gives the most probable one. The calculations are done using induction, in an approach similar to the forward algorithm, but instead of using a summation, the Viterbi algorithm uses maximization.

Table 2.5: The Viterbi Algorithm

Viterbi Algorithm

Initialization (i = 0) : v_0(0) = 1, v_k(0) = 0 for all k > 0

Recursion (i = 1, ..., L) : v_m(i) = e_m(x_i) max_k( v_k(i-1) a_{km} )

ptr_i(m) = argmax_k( v_k(i-1) a_{km} )

Termination : P(x, path*) = max_k( v_k(L) a_{k0} )

path*_L = argmax_k( v_k(L) a_{k0} )

Traceback (i = L, ..., 1) : path*_{i-1} = ptr_i(path*_i)



v_m(i) - probability of the most probable path that ends at state m after observing the first i characters of the sequence

ptr_i(m) - pointer that stores the state that leads to state m after observing i symbols

path*_i - state visited at position i in the sequence

The probability of the most probable path obtained after observing the first i characters of the sequence and ending at state m, represented by v_m(i), is

v_m(i) = e_m(x_i) max_k( v_k(i-1) a_{km} ).
The algorithm starts from the start state and thus v_0(0) is initialized to 1. The algorithm keeps track of the best state used during a transition using pointers. The pointer ptr_i(m) stores the state that leads to state m after observing i symbols in the given sequence; it is found using

ptr_i(m) = argmax_k( v_k(i-1) a_{km} ).
The most probable path is found by moving through the pointers backwards starting from the end state to the start state. Sometimes we may obtain more than one path as the most probable; in such cases one path is randomly selected. The pseudocode for the Viterbi Algorithm [Durbin et al., 1989] is shown in Table 2.5.
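
For completeness, here is a minimal Python sketch of the Viterbi algorithm in the same illustrative dictionary representation used in the earlier sketches (state 0 is the silent start/end state); ties between equally probable paths are broken arbitrarily by max().

```python
def viterbi(x, states, a, e):
    """Viterbi algorithm for a discrete-emission HMM (sketch).

    Returns the most probable state path for the sequence x and its
    joint probability P(x, path*), following the notation of Table 2.5.
    """
    L = len(x)
    all_states = [0] + list(states)          # include the silent start state
    v = {k: [0.0] * (L + 1) for k in all_states}
    ptr = [dict() for _ in range(L + 1)]

    # Initialization (i = 0): the path begins in the start state
    v[0][0] = 1.0

    # Recursion (i = 1, ..., L): keep the best predecessor of each state
    for i in range(1, L + 1):
        for m in states:
            scores = {k: v[k][i - 1] * a[k][m] for k in all_states}
            best_k = max(scores, key=scores.get)
            v[m][i] = e[m][x[i - 1]] * scores[best_k]
            ptr[i][m] = best_k

    # Termination: P(x, path*) = max_k( v_k(L) a_{k0} )
    final = {k: v[k][L] * a[k][0] for k in states}
    best_end = max(final, key=final.get)

    # Traceback (i = L, ..., 1): follow the stored pointers backwards
    path = [best_end]
    for i in range(L, 1, -1):
        path.append(ptr[i][path[-1]])
    path.reverse()
    return path, final[best_end]
```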

Chapter 3
ML Methods for Weather Data Modeling

To detect abnormal behavior of an RWIS sensor we build a model that provides a predicted value for a weather condition, which we can compare to the actual value reported by the sensor, calculating the difference between the two to detect likely sensor malfunctions. We can build such models using machine learning (ML) methods that predict the weather conditions at the RWIS site. To build a weather model, ML methods require historical weather data obtained from the site and its nearby sites in order to learn the weather patterns. By including nearby sites, we provide additional information that the methods can use to characterize the current climatic conditions at the site used for predictions.
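
As a minimal illustration of this comparison, the sketch below flags a reading whose deviation from the model's prediction exceeds a chosen threshold; the threshold value and function name are hypothetical and not taken from the thesis.

```python
def flag_possible_malfunction(actual, predicted, threshold):
    """Flag a sensor reading as a possible malfunction when it deviates
    from the model's predicted value by more than the given threshold."""
    return abs(actual - predicted) > threshold

# Example with made-up numbers: a reported 41.0 F against a predicted 33.5 F
print(flag_possible_malfunction(actual=41.0, predicted=33.5, threshold=5.0))  # True
```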


To predict the temperature at a given time at site 67 in Figure 3.1, we can use the current and a couple of previous hours' weather data, such as temperature and visibility, obtained from a set of sites located close to site 67.


Figure 3.1: Predicting the temperature value at a site using weather data from nearby sensors.

Nearby sites to site 67 are indicated by arrows that point from them to site 67 (the arrows show that the information from these sites will be used in making predictions about the site they point to). ML methods are used to build the weather model for site 67 using historical data for the different variables used for predictions. The model built for site 67 will use the temperature and visibility information from the nearby sites and predict the temperature value that will be seen at site 67 at a given time.


In this chapter we discuss the use of machine learning (ML) methods to detect RWIS sensor malfunctions. In the first section, we describe the process of selecting the RWIS and AWOS sites used for modeling, followed by a description of the variables in the weather data collected from the selected sites. In the next section we describe the feature representation used by the ML and HMM methods. In the final section we describe the general approach followed by these methods to predict weather variables.
3.1 Choosing RWIS - AWOS Sites
To predict weather variables at a site, we gather relevant weather information from the site and use it to build a predictive model with various machine learning methods. We then use the predictive model to classify new or previously unseen data. The prediction of values reported by the sensors at a site can be made more accurate if, along with information from the site itself, we consider information obtained from sensors located at the surrounding sites. Todey et al. [2002] report a significant improvement in the analysis of weather data when using a combined dataset obtained from the sites in the RWIS and AWOS networks. To predict values reported by an RWIS sensor at a site we therefore use meteorological data from surrounding RWIS sites and also data gathered from AWOS sites.
Out of the 76 RWIS sites present in Minnesota, we selected 13 sites at which to detect RWIS sensor malfunctions. The selection was based on the climatic conditions and landscape at the locations where these sites are situated. Sites in regions that have micro-climates, such as Duluth and many places in southern Minnesota, were not selected because the climatic conditions at these sites do not reflect changes happening in their surroundings; they have their own unique ecosystems. Similarly, sensors located in urban areas, like Minneapolis, were not selected because of the drastic climatic changes that occur in such areas due to human involvement. We further grouped the selected 13 RWIS sites into three sets in order to prevent macro-climate comparisons. As Minnesota has a diverse landscape, climatological conditions in the north do not always reflect conditions in the southern regions. The aim of grouping the RWIS sites into sets was to prevent comparisons between two sites located in totally different climatological regions. Each set can be treated as a single climatological regime, and climatic changes at one site in a set are reflected at the other sites, not necessarily at that instant but after a certain duration of time. Grouping helps in predicting the weather conditions at a site when those conditions are known at the other sites in the set.

Along with the weather information from the RWIS sites, we use meteorological data gathered from the AWOS sites to help with the prediction of values at an RWIS site. The location of the RWIS sites with respect to the site's topography is variable, as these sensors sit near a roadway and are sometimes located on bridges. AWOS sites are located at airports on a flat surrounding topography, which leads to better comparability between weather data obtained from these sites. Thus including surrounding AWOS information can be beneficial in predicting values at an RWIS site.


We associated each RWIS site, of the 13 selected for prediction, with all the AWOS sites located within 30 miles of it. 30 miles was chosen as the measure for association so as to pair at least one AWOS site with each RWIS site; for distances of more than 30 miles, some RWIS sites would be paired with the same AWOS sites. The distance is calculated using the latitude and longitude coordinates of the respective RWIS and AWOS sites (Tables A1 and A2). All RWIS sites are paired with one AWOS site, with the exception of site 20, which is associated with two AWOS sites, namely KAIT and KBRD, with KAIT being the closer one. Due to the comparatively small distance between an RWIS site and its associated AWOS site, there is often a correlation between the values observed at these sites, which we can use as the basis of our models. Figure 3.2 shows the locations of the RWIS and AWOS sites that we grouped together and Table 3.1 lists these groups.
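
The thesis does not spell out the distance computation, so the sketch below assumes the usual great-circle (haversine) formula applied to the latitude/longitude coordinates, together with a hypothetical record layout for the sites.

```python
from math import asin, cos, radians, sin, sqrt

def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points
    (haversine formula)."""
    earth_radius_miles = 3959.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    h = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_miles * asin(sqrt(h))

def nearby_awos(rwis, awos_sites, max_miles=30.0):
    """Return the AWOS sites that lie within max_miles of the given RWIS site.
    Each site is assumed to be a dict with 'lat' and 'lon' keys."""
    return [s for s in awos_sites
            if distance_miles(rwis["lat"], rwis["lon"], s["lat"], s["lon"]) <= max_miles]
```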

Figure 3.2: Grouping of RWIS and AWOS sites into three sets. This map also shows the locations of the selected RWIS and AWOS sites across Minnesota.


Table 3.1: Grouping of the selected 13 RWIS sites into three sets, along with their respective AWOS sites.

Set      RWIS Sites              AWOS Sites
Set 1    19, 27, 67              KLYU, KINL, KORB
Set 2    14, 20, 35, 49, 62      KFFM, KAIT, KBRD, KLXL, KPKD, KDTL
Set 3    25, 56, 60, 68, 78      KROX, KTVF, KCKN, KFSE, KBDE

3.2 Features Used
Of all the available features reported by RWIS sensors, we decided to focus on predicting air temperature, precipitation type and visibility. These three features were selected because they represent critical aspects of weather data for Mn/DOT.
All of these variables (temperature, precipitation type and visibility) are also reported by the AWOS sites. However, the data format used for reporting these variables by RWIS and AWOS sensors differs (refer to Sections 2.1.1.1 and 2.1.2.1 for details). To make the data from these two sources usable in ML algorithms and HMMs, and for comparisons, the data needs to be transformed into a common format and in some cases pre-processing of the variables may be required based on the requirements of the algorithms used.
3.2.1 Transformation of the Features
To use data from RWIS and AWOS together, for predictions and comparisons, we converted the data reported by them to a common format. RWIS reports data every 10 minutes whereas AWOS provides hourly reports, so the RWIS data needs to be averaged when used along with AWOS data. The changes made to the features reported by RWIS sites to arrive at a common format are listed below (a small conversion sketch follows the list).


  • RWIS uses Greenwich Mean Time (GMT) and AWOS uses Central Time (CT) when reporting data. The reporting time in RWIS is changed from GMT to CT.

  • Variables like air temperature, surface temperature and dew point that are reported in Celsius by RWIS are converted to Fahrenheit, which is the format used by AWOS. To convert six readings per hour to a single hourly reading in RWIS, a simple average is taken.

  • Distances, which RWIS measures in kilometers for visibility and wind speed, are converted into miles, the units used by AWOS. A single hourly reading is obtained through a simple average of the RWIS readings.

  • In order to obtain hourly values for the precipitation type and intensity reported by RWIS, we use the most frequently reported code for that hour, while for precipitation rate a simple average is used. RWIS sites report precipitation type and intensity as separate variables, whereas AWOS combines them into a single weather code [NCDC, 2005]. Mapping the precipitation type and intensity reported by RWIS to the AWOS weather codes is not feasible without compromises, so we keep these variables in their original format.
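
A minimal sketch of these conversions is shown below; the helper names are illustrative, and the daylight-saving handling is simplified to a flag rather than a full time-zone lookup.

```python
from collections import Counter
from datetime import timedelta

def celsius_to_fahrenheit(c):
    """Celsius (RWIS) to Fahrenheit (AWOS)."""
    return c * 9.0 / 5.0 + 32.0

def km_to_miles(km):
    """Kilometers (RWIS) to miles (AWOS)."""
    return km * 0.621371

def gmt_to_central(timestamp, dst=False):
    """Shift an RWIS timestamp from GMT to Central Time
    (GMT-6 in winter, GMT-5 during daylight saving time)."""
    return timestamp - timedelta(hours=5 if dst else 6)

def hourly_average(readings):
    """Average the six 10-minute RWIS readings of an hour into one value."""
    return sum(readings) / len(readings)

def hourly_mode(codes):
    """Most frequently reported code in an hour, used for the RWIS
    precipitation type and intensity variables."""
    return Counter(codes).most_common(1)[0][0]
```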

Of the three features selected for use in predictions, precipitation type is the one whose RWIS and AWOS values cannot be compared directly, because each network uses a different format for reporting it. For a broader comparison, we combine all the codes that report different forms of precipitation, in both RWIS and AWOS, into a single code that indicates the presence of some form of precipitation.


Apart from using the features obtained from the RWIS and AWOS sites, we also made use of historical information to represent our training data. This increases the amount of weather information we have for a given location or region. We collected hourly temperature values for the AWOS sites listed in the three sets (refer to Table 3.1), for the years 1997 to 2004, from the Weather Underground website. For many locations the temperature was reported more than once an hour; in such cases the average of the temperatures across the hour was taken. As we already have temperature readings from two different sources, RWIS and AWOS, we use the information gathered from the website to augment our dataset by deriving values such as the projected hourly temperature. To calculate the projected hourly temperatures for an AWOS site we use the past temperature information obtained from the website for that location. The projected hourly temperature for a given hour of a day is defined as the sum of the average temperature for that day and the monthly average temperature difference for that hour of the day in the corresponding month.
The steps followed to calculate the projected hourly temperature for an AWOS site are

  1. Obtain the hourly temperatures from the data collected from wunderground.com for the respective AWOS site.

  2. The average temperature for a day was calculated as the mean of the hourly readings across a day.

  3. The hourly difference for each hour was calculated as the difference between the actual temperature seen at that hour and the average daily temperature.

  4. The average difference in temperature for a particular month (monthly average difference) per hour was calculated as the average of all hourly differences in a month for that hour obtained from all the years in the data collected.

  5. The projected hourly temperature for a day is obtained from the sum of the average temperature for that day and the monthly average difference of that hour in the day.

For example, let the temperature seen at the AWOS site KORB for the first hour of January 1st, 1997 be 32ºF, and let the average of the temperature values for all 24 hours of January 1st be 30ºF. The hourly difference for this day for the first hour is then 2ºF. Let the average of all the first-hour differences for the month of January, computed from the KORB data for the years 1997 to 2004, be 5ºF. Then the projected temperature value for the first hour of January 1st, 1997 is 35ºF, which is the sum of the average temperature seen on January 1st, 1997 and the monthly average difference for the first hour in the month of January.
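
The steps above can be expressed compactly with pandas; the sketch below assumes a hypothetical hourly data frame with 'date', 'month', 'hour' and 'temp' columns built from the Weather Underground archive.

```python
import pandas as pd

def projected_hourly_temperature(df: pd.DataFrame) -> pd.Series:
    """Projected hourly temperature, following steps 2-5 above."""
    # Step 2: average temperature for each day
    daily_avg = df.groupby("date")["temp"].transform("mean")

    # Step 3: hourly difference between the actual temperature and the daily average
    hourly_diff = df["temp"] - daily_avg

    # Step 4: monthly average difference for each hour of the day,
    # pooled over all years present in the collected data
    monthly_avg_diff = hourly_diff.groupby([df["month"], df["hour"]]).transform("mean")

    # Step 5: projected hourly temperature = daily average + monthly average difference
    return daily_avg + monthly_avg_diff
```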


The projected hourly temperature was used as a feature in the datasets used for predicting weather variables at an RWIS site and is also used in the process of discretization of temperature, which will be discussed in the following section.
3.2.2 Discretization of the Features
Continuous features need to be discretized when used in HMMs and in classification algorithms. HMMs require all of the features to be discrete. Classification algorithms need the output attributes to be discrete, and also the input attributes if the algorithm cannot deal with continuous inputs. Regression algorithms can take inputs with discrete attributes. Discretization of a feature involves finding a set of values that split the continuous range into intervals, and each interval is given a single discrete value. Discretization can be done using unsupervised or supervised methods. In unsupervised discretization, the attribute is divided into a fixed number of equal intervals, without any prior knowledge of the target class values (the output attributes) of the instances in the given dataset. In supervised discretization, the splitting point is placed at a location that increases the information gain with respect to the given training dataset [Quinlan, 1986]. Dougherty et al. [1995] give a brief description of the process and a comparison of unsupervised and supervised discretization. WEKA provides a wide range of options to discretize any continuous variable, using supervised and unsupervised mechanisms.
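
As an illustration of the unsupervised case, the sketch below splits the observed range of a continuous attribute into a fixed number of equal-width intervals; it is not WEKA's implementation, only an illustration of the idea.

```python
def equal_width_bins(values, num_bins):
    """Unsupervised (equal-width) discretization: map each value to the
    index of the equal-width interval it falls into."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    bins = []
    for v in values:
        idx = int((v - lo) / width) if width > 0 else 0
        bins.append(min(idx, num_bins - 1))  # the maximum value goes into the last bin
    return bins

# e.g. equal_width_bins([10.0, 12.5, 19.9, 30.0], 4) -> [0, 0, 1, 3]
```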
In this thesis we propose a new method for discretization of temperature values obtained from RWIS sensors using temperature information obtained from other sources. Using the projected hourly temperature (refer to Section 3.2.1) for an AWOS site along with the current reported temperature for the closest RWIS site, we determine the class value for the current RWIS temperature value.
To determine the class value, the projected hourly temperature for the AWOS site closest to the RWIS site, for that specific hour, is subtracted from the actual temperature reported at the RWIS site. This difference is then divided by the standard deviation of the projected hourly temperature for that AWOS site. The result indicates how far the actual value deviates above or below the projected value, that is, the number of standard deviations from the projected value (or the mean):

num_stdev = (actual_temp – proj_temp) / std_dev
The classes are divided according to the number of standard deviations from the mean, for example:

Class Value    Range of num_stdev

1              num_stdev < -2
2              -2 ≤ num_stdev ≤ -1
3              -1 < num_stdev ≤ -0.5
4              -0.5 < num_stdev ≤ -0.25
5              -0.25 < num_stdev ≤ 0.25
6              0.25 < num_stdev ≤ 0.5
7              0.5 < num_stdev ≤ 1
8              1 < num_stdev ≤ 2
9              num_stdev > 2

Thus the number of standard deviations from the mean obtained for a given temperature value is mapped to one of the ranges and the temperature is assigned the respective class value. The ranges and the number of splits can be determined by the user or based on requirements of the algorithm.


For example, to convert the actual temperature 32ºF at an RWIS site we calculate the projected temperature for that hour at the associated AWOS site KORB, which might be 30ºF, and the standard deviation of the projected temperatures at KORB for the year, which might be 5.06. Then our new representation for 32ºF is calculated as 0.396, which means the temperature 32ºF is 0.396 standard deviations above the mean at site KORB. 0.396 maps to class 6 in the split example given above. We thus arrive at the class value of 6 for the temperature 32ºF. Such a representation has the advantage that the effect of season and time of day is at least partially removed from the data.
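
A minimal sketch of this mapping, using the class boundaries from the example table above, is given below; the function name is illustrative.

```python
def temperature_class(actual_temp, proj_temp, std_dev):
    """Map an RWIS temperature reading to a class value (1-9) based on the
    number of standard deviations it lies from the projected hourly
    temperature of the associated AWOS site."""
    num_stdev = (actual_temp - proj_temp) / std_dev
    if num_stdev < -2.0:
        return 1
    upper_bounds = [-1.0, -0.5, -0.25, 0.25, 0.5, 1.0, 2.0]   # classes 2 .. 8
    for cls, upper in enumerate(upper_bounds, start=2):
        if num_stdev <= upper:
            return cls
    return 9

# Worked example from the text: (32 - 30) / 5.06 is about 0.4 standard
# deviations above the mean, which falls in (0.25, 0.5] and maps to class 6.
print(temperature_class(32.0, 30.0, 5.06))  # -> 6
```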