SITE: Airport code where the AWOS unit is located
LON: Longitude
LAT: Latitude
Appendix B
Using WEKA
WEKA is written in Java and is organized into packages arranged in a hierarchical manner. Details of the packages and the hierarchy are given by Witten & Frank [2005]. WEKA can be run through its graphical user interface or by entering textual commands at the command prompt. The general structure of a WEKA textual command that performs multiple 10-fold cross-validations on a dataset using an algorithm (classifier) is
java -mx1024M -cp classpath callClassifier classifier_path classifier_options -t trainset.arff -x 10 -s seed_value -c attribute_index
where -cp specifies the path (i.e., the class path) where WEKA is located; callClassifier is a Java class used to output the complete class probabilities, without which WEKA outputs only an evaluative summary of the algorithm; classifier_path is the location of the algorithm in the WEKA package hierarchy; classifier_options specifies the options taken by the algorithm; -t specifies the training file; -x specifies the number of folds for cross-validation; -s indicates the seed value when multiple n-fold cross-validations need to be performed; and -c specifies the position of the output attribute in the dataset provided. The -T option is used when a separate test file is used for evaluating the model; when it is not given, cross-validation is performed on the training set provided.
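As a sketch of how such runs can be scripted, the command line above can be assembled programmatically, varying only the seed across repetitions. The class path (`weka.jar`), training file name, and classifier choice below are placeholders, not the actual paths used in this thesis.

```python
# Sketch: assemble the WEKA command line described above for multiple
# 10-fold cross-validations, changing only the seed on each run.
# "weka.jar" and "trainset.arff" are placeholder names.

def weka_command(classifier_path, options, trainset, seed,
                 folds=10, class_index="last"):
    """Build the argument list for one cross-validation run."""
    return (["java", "-mx1024M", "-cp", "weka.jar", "callClassifier",
             classifier_path] + options +
            ["-t", trainset, "-x", str(folds),
             "-s", str(seed), "-c", str(class_index)])

# Ten 10-fold cross-validations, one per seed value 1..10:
commands = [weka_command("weka.classifiers.trees.M5P", ["-M", "4.0"],
                         "trainset.arff", seed)
            for seed in range(1, 11)]
```

Each resulting list can be passed to a process launcher; averaging the ten runs' errors gives the values reported in Appendix C.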
WEKA requires the data in the train/test file to be in ARFF format. The general format of an ARFF file is given in Table B.1. The string @relation is used to name the dataset, @attribute is used to define an attribute's name and type, and @data is used to indicate the start of the data, which is in comma-separated form. Lines beginning with % are comments.

Table B.1: Format of an ARFF file.

@relation Predict_Temp_Year2002
@attribute temperature_site1 real
@attribute temperature_site2 real
@attribute precipitation {yes, no}
% used for comments
@data
23,22,yes
12,23,no
23,32,no
.............
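The ARFF structure in Table B.1 is simple enough to read with a few lines of code. The following sketch parses a minimal file like the one above (header mirroring the table); it is an illustration, not WEKA's own loader, and it ignores details such as quoted strings and sparse data.

```python
# Sketch: a very small ARFF reader for files shaped like Table B.1.
# Collects attribute names and the comma-separated data rows.

ARFF = """\
@relation Predict_Temp_Year2002
@attribute temperature_site1 real
@attribute temperature_site2 real
@attribute precipitation {yes, no}
% used for comments
@data
23,22,yes
12,23,no
"""

def parse_arff(text):
    """Return (attribute names, data rows) from ARFF text."""
    attributes, rows, in_data = [], [], False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        low = line.lower()
        if low.startswith("@attribute"):
            attributes.append(line.split()[1])
        elif low.startswith("@data"):
            in_data = True
        elif in_data:
            rows.append(line.split(","))
    return attributes, rows

attrs, rows = parse_arff(ARFF)
```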
The following are the classifier_path values for the machine learning algorithms used in this thesis, along with their default options (classifier_options).
Linear Regression
weka.classifiers.functions.LinearRegression -S 0 -R 1.0E-8
where -S specifies the attribute selection methods with 0 representing the M5 method, and -R specifies the value of the ridge parameter.
Least Median Square
weka.classifiers.functions.LeastMedSq -S 4 -G 0
where -S specifies the size of random samples used to generate the least squared regression function, and -G specifies the seed value used to select subsets of the training data.
M5Prime
weka.classifiers.trees.M5P -M 4.0
where -M specifies the minimum number of instances per leaf.
Multilayer Perceptron
weka.classifiers.functions.MultilayerPerceptron -L 0.3 -M 0.2 -N 500 -V 0 -S 0 -E 20 -H a
where -L specifies the learning rate, -M specifies the momentum, -N specifies the number of training epochs, -V specifies the validation set size, -S specifies the seed for the random number generator (random values are used to initialize the weights), -E specifies the validation threshold, and -H specifies the hidden layers, with the value 'a' representing a single hidden layer containing (num_attributes + num_classes)/2 nodes.
RBF Network
weka.classifiers.functions.RBFNetwork -B 2 -S 1 -R 1.0E-8 -M -1 -W 0.1
where -B specifies the number of clusters generated by K-means, -S specifies the value of the seed passed on to the K-means, -R specifies the value of the ridge parameter, -M specifies the number of iterations to be performed by logistic regression, and -W specifies the minimum standard deviation for the clusters.
Conjunctive Rule
weka.classifiers.rules.ConjunctiveRule -N 3 -M 2.0 -P -1 -S 1
where -N specifies the number of folds of data used for pruning (one fold serves as the pruning set), -M specifies the minimum total weight of the instances in a rule, -P specifies the number of antecedents allowed in a rule when pre-pruning is used, and -S specifies the seed value used.
J48
weka.classifiers.trees.J48 -C 0.25 -M 2
where -C specifies the confidence factor, and -M specifies the minimum number of instances per leaf.
Naive Bayes
weka.classifiers.bayes.NaiveBayes
Bayes Net
weka.classifiers.bayes.BayesNet -D -Q weka.classifiers.bayes.net.search.local.K2 -- -P 1 -E weka.classifiers.bayes.net.estimate.SimpleEstimator -- -A 0.5
where -D specifies that the ADTree data structure should not be used (using it can cause memory problems on large datasets), -Q specifies the search algorithm, and -E specifies the estimator used for finding the CPTs (conditional probability tables). The K2 search algorithm is given by weka.classifiers.bayes.net.search.local.K2, with its option -P specifying the maximum number of parents a node in the Bayesian network may take. The estimator used for filling the CPTs is weka.classifiers.bayes.net.estimate.SimpleEstimator, with its option -A specifying the alpha value of the estimator.
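The role of the alpha value can be illustrated with a small sketch, assuming the usual Laplace-style smoothing rule: each count effectively starts from alpha rather than zero, so values unseen in the training data never receive probability zero in the CPT. The function name and example counts below are illustrative, not WEKA code.

```python
# Sketch of an alpha-smoothed probability estimate for a CPT entry,
# assuming the Laplace-style rule P(v) = (count(v) + alpha) / (N + alpha*K),
# where N is the total count and K the number of possible values.

def smoothed_probs(counts, alpha=0.5):
    """counts: dict mapping each value to its observed count."""
    total = sum(counts.values()) + alpha * len(counts)
    return {v: (c + alpha) / total for v, c in counts.items()}

probs = smoothed_probs({"yes": 3, "no": 1}, alpha=0.5)
# With alpha = 0.5: P(yes) = 3.5/5 = 0.7, P(no) = 1.5/5 = 0.3
```

With alpha = 0 the estimate reduces to the raw relative frequency; larger alpha pulls the distribution toward uniform.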
Appendix C
Detailed Results
Table C.1: Results obtained from using regression algorithms to predict temperature at an RWIS site (Experiment 1). The feature vector consists of temperature information from the RWIS-AWOS sites in a set along with the temperature offset for the RWIS sites. The table reports mean absolute error values averaged over ten 10-fold cross-validations.
| Set | RWIS Site | LMS | LR | M5P | RBF | CR | MLP |
|---|---|---|---|---|---|---|---|
| 1 | 19 | 0.908 | 0.960 | 0.936 | 9.521 | 11.150 | 1.059 |
| 1 | 27 | 1.217 | 0.896 | 0.873 | 9.465 | 10.199 | 1.058 |
| 1 | 67 | 1.069 | 0.918 | 0.885 | 10.062 | 11.596 | 1.108 |
| 2 | 14 | 0.659 | 0.795 | 0.751 | 8.478 | 10.605 | 0.789 |
| 2 | 20 | 0.743 | 0.820 | 0.776 | 9.417 | 10.821 | 1.001 |
| 2 | 35 | 0.553 | 0.977 | 0.864 | 9.523 | 10.817 | 1.051 |
| 2 | 49 | 0.916 | 0.913 | 0.898 | 9.579 | 11.017 | 1.074 |
| 2 | 62 | 0.800 | 0.779 | 0.769 | 9.383 | 11.040 | 0.892 |
| 3 | 25 | 0.984 | 1.062 | 0.889 | 10.386 | 11.957 | 1.097 |
| 3 | 56 | 0.925 | 0.913 | 0.807 | 10.510 | 11.512 | 1.133 |
| 3 | 60 | 0.889 | 0.867 | 0.833 | 9.675 | 11.078 | 1.002 |
| 3 | 68 | 0.958 | 1.015 | 0.901 | 9.017 | 10.439 | 1.235 |
| 3 | 78 | 0.929 | 0.875 | 0.809 | 8.945 | 10.449 | 1.012 |
| | Mean of Abs. Errors (ºF) | 0.888 | 0.907 | 0.845 | 9.535 | 10.975 | 1.039 |
| | StdDev of Abs. Errors | 0.171 | 0.083 | 0.058 | 0.559 | 0.503 | 0.110 |

StdDev refers to standard deviation.
Figure C.1: Mean absolute errors for different RWIS sites obtained from predicting temperature using regression algorithms.
Table C.2: Results obtained from using regression algorithms to predict temperature at an RWIS site (Experiment 2). The feature vector consists of temperature information from the RWIS-AWOS sites in a set along with the precipitation type for the RWIS sites. The table reports mean absolute error values averaged over ten 10-fold cross-validations.
| Set | RWIS Site | LMS | LR | M5P |
|---|---|---|---|---|
| 1 | 19 | 1.023 | 1.115 | 1.001 |
| 1 | 27 | 0.935 | 0.959 | 0.931 |
| 1 | 67 | 1.001 | 1.006 | 0.984 |
| 2 | 14 | 0.726 | 0.788 | 0.771 |
| 2 | 20 | 0.862 | 0.872 | 0.827 |
| 2 | 35 | 1.052 | 1.062 | 0.938 |
| 2 | 49 | 1.051 | 1.014 | 1.022 |
| 2 | 62 | 0.848 | 0.827 | 0.815 |
| 3 | 25 | 1.217 | 1.222 | 1.004 |
| 3 | 56 | 1.084 | 1.077 | 0.992 |
| 3 | 60 | 1.007 | 0.981 | 0.924 |
| 3 | 68 | 1.046 | 1.154 | 0.973 |
| 3 | 78 | 0.876 | 0.891 | 0.856 |
| | Mean of Abs. Errors (ºF) | 0.979 | 0.997 | 0.926 |
| | StdDev of Abs. Errors | 0.127 | 0.130 | 0.083 |

StdDev refers to standard deviation.
Figure C.2: Mean absolute errors for different RWIS sites obtained from predicting temperature using regression algorithms, with precipitation type information added to the feature vector.
Table C.3: Results obtained from using classification algorithms to predict precipitation type at an RWIS site (Experiment 4). The feature vector consists of temperature information from the RWIS-AWOS sites in a set along with the precipitation type for the RWIS sites. The table reports classification error values, as reported by WEKA, averaged over ten 10-fold cross-validations.
| Set | RWIS Site | J48 | NB | Bayes Net |
|---|---|---|---|---|
| 1 | 19 | 0.064 | 0.325 | 0.346 |
| 1 | 27 | 0.256 | 0.420 | 0.363 |
| 1 | 67 | 0.356 | 0.472 | 0.414 |
| 2 | 14 | 0.213 | 0.384 | 0.330 |
| 2 | 20 | 0.265 | 0.450 | 0.379 |
| 2 | 35 | 0.077 | 0.312 | 0.337 |
| 2 | 49 | 0.062 | 0.328 | 0.328 |
| 2 | 62 | 0.061 | 0.312 | 0.341 |
| 3 | 25 | 0.068 | 0.342 | 0.342 |
| 3 | 56 | 0.072 | 0.345 | 0.333 |
| 3 | 60 | 0.095 | 0.317 | 0.323 |
| 3 | 68 | 0.061 | 0.253 | 0.315 |
| 3 | 78 | 0.065 | 0.318 | 0.231 |
| | Mean of Classification Errors | 0.132 | 0.352 | 0.337 |
| | StdDev of Classification Errors | 0.102 | 0.062 | 0.041 |