
Aditya Polumetla, in partial fulfillment of the requirements for the degree of Master of Science



SITE: Airport code where the AWOS unit is located

LON: Longitude

LAT: Latitude
Appendix B
Using WEKA
WEKA is written in Java and is organized into hierarchically arranged packages. Details of the packages and the hierarchy are given by Witten & Frank [2005]. WEKA can be run through its graphical user interface or by entering textual commands at the command prompt. The general structure of the WEKA command for performing multiple 10-fold cross-validations on a dataset with an algorithm (classifier) is

java -mx1024M -cp classpath callClassifier classifier_path classifier_options -t trainset.arff -x 10 -s seed_value -c attribute_index

where -cp specifies the class path, i.e., the path where WEKA is located; callClassifier is a Java class used to output the complete class probabilities, without which WEKA outputs only an evaluation summary of the algorithm; classifier_path is the location of the algorithm in the WEKA package hierarchy; classifier_options specifies the options taken by the algorithm; -t specifies the training file; -x specifies the number of folds for cross-validation; -s indicates the seed value when multiple n-fold cross-validations need to be performed; and -c specifies the position of the output attribute in the dataset provided. The -T option is used when a test file is used for evaluating the model; when it is not used, cross-validation is performed on the training set provided.
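
As a rough illustration of how repeated cross-validations could be scripted, the sketch below assembles one command per seed value using the command structure shown above. The class path value weka.jar:., the seed values 1 to 10, and the class attribute index 3 are assumptions made for the example; only the option names and the J48 defaults come from this appendix.

```python
def cross_validation_commands(classpath, classifier_path, classifier_options,
                              train_file, class_index, seeds, folds=10):
    """Build one WEKA command (as an argument list) per seed value."""
    commands = []
    for seed in seeds:
        command = ["java", "-mx1024M", "-cp", classpath,
                   "callClassifier", classifier_path]
        command += list(classifier_options)
        command += ["-t", train_file, "-x", str(folds),
                    "-s", str(seed), "-c", str(class_index)]
        commands.append(command)
    return commands


# Hypothetical values: class path, seeds, and class index are assumptions.
commands = cross_validation_commands(
    classpath="weka.jar:.",
    classifier_path="weka.classifiers.trees.J48",
    classifier_options=["-C", "0.25", "-M", "2"],
    train_file="trainset.arff",
    class_index=3,
    seeds=range(1, 11),
)
for command in commands:
    print(" ".join(command))
```

Each argument list could then be handed to a process launcher such as Python's subprocess.run, one run per seed.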

WEKA requires the data in the train/test file to be in ARFF format. The general format of an ARFF file is given in Table B.1. The string @relation gives the name of the dataset, @attribute defines an attribute's name and type, and @data indicates the start of the data, which is in comma-separated form.

Table B.1: Format of an ARFF file.

@relation Predict_Temp_Year2002
@attribute temperature_site1 real
@attribute temperature_site2 real
@attribute precipitation {yes, no}
% used for comments
@data
23,22,yes
12,23,no
23,32,no
.............
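
To show how the three ARFF keywords divide a file, here is a minimal reader sketch. It is not WEKA's own loader: it handles only the simple layout of Table B.1, assumes nominal values are written in braces as in standard ARFF, and uses a hypothetical function name, parse_arff.

```python
def parse_arff(text):
    """Split a simple ARFF document into relation name, attributes, and rows."""
    relation = None
    attributes = []  # (name, type) pairs in declaration order
    rows = []        # each row is a list of string values
    in_data = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and % comments
        keyword = line.lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif keyword.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif keyword.startswith("@attribute"):
            _, name, attr_type = line.split(None, 2)
            attributes.append((name, attr_type))
        elif keyword.startswith("@data"):
            in_data = True  # everything after @data is comma-separated data
    return relation, attributes, rows


SAMPLE = """\
@relation Predict_Temp_Year2002
@attribute temperature_site1 real
@attribute temperature_site2 real
@attribute precipitation {yes, no}
% used for comments
@data
23,22,yes
12,23,no
23,32,no
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation)
print(attributes)
print(rows)
```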

The classifier_path values for the machine learning algorithms used in this thesis are listed below, along with their default options (classifier_options).

Linear Regression

weka.classifiers.functions.LinearRegression -S 0 -R 1.0E-8

where -S specifies the attribute selection methods with 0 representing the M5 method, and -R specifies the value of the ridge parameter.
Least Median Square

weka.classifiers.functions.LeastMedSq -S 4 -G 0

where -S specifies the size of the random samples used to generate the least squares regression functions, and -G specifies the seed value used to select subsets of the training data.

M5Prime

weka.classifiers.trees.M5P -M 4.0

where -M specifies the minimum number of instances.
Multilayer Perceptron

weka.classifiers.functions.MultilayerPerceptron -L 0.3 -M 0.2 -N 500 -V 0 -S 0 -E 20 -H a

where -L specifies the learning rate, -M specifies the momentum, -N specifies the number of training epochs, -V specifies the validation set size, -S specifies the seed value taken by the random number generator (random values are used to initialize the weights), -E specifies the validation threshold, and -H specifies the hidden layers, with the value 'a' denoting a single hidden layer of (num_attributes+num_classes)/2 nodes.
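
To make the 'a' setting concrete, the sketch below resolves a -H value into hidden-layer sizes. It assumes, following WEKA's documentation of this option, that each comma-separated entry gives the number of nodes in one hidden layer, that 'a' stands for (num_attributes+num_classes)/2 with integer division, and that num_attributes excludes the class attribute. The function name is hypothetical.

```python
def hidden_layer_sizes(spec, num_attributes, num_classes):
    """Resolve a -H style value into a list of nodes per hidden layer."""
    sizes = []
    for token in spec.split(","):
        token = token.strip()
        if token == "a":
            # Assumed meaning of 'a': (attributes + classes) / 2, rounded down.
            sizes.append((num_attributes + num_classes) // 2)
        else:
            sizes.append(int(token))  # an explicit node count
    return sizes


print(hidden_layer_sizes("a", 3, 1))
print(hidden_layer_sizes("a", 6, 2))
print(hidden_layer_sizes("5,3", 6, 2))
```

Under these assumptions, three input attributes and one class give a single hidden layer of two nodes, while the value 5,3 would describe two hidden layers.
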
RBF Network

weka.classifiers.functions.RBFNetwork -B 2 -S 1 -R 1.0E-8 -M -1 -W 0.1

where -B specifies the number of clusters generated by K-means, -S specifies the value of the seed passed on to the K-means, -R specifies the value of the ridge parameter, -M specifies the number of iterations to be performed by logistic regression, and -W specifies the minimum standard deviation for the clusters.
Conjunctive Rule

weka.classifiers.rules.ConjunctiveRule -N 3 -M 2.0 -P -1 -S 1

where -N specifies the amount of data used for pruning, -M specifies the minimum total weight of the instances in a rule, -P specifies the minimum number of antecedents allowed in a rule when pre-pruning is used, and -S specifies the seed value used.


J48

weka.classifiers.trees.J48 -C 0.25 -M 2

where -C specifies the confidence factor, and -M specifies the minimum number of instances taken by a leaf.
Naive Bayes

weka.classifiers.bayes.NaiveBayes


Bayes Net

weka.classifiers.bayes.BayesNet -D -Q weka.classifiers.bayes.net.search.local.K2 -- -P 1 -E weka.classifiers.bayes.net.estimate.SimpleEstimator -- -A 0.5

where -D disables the ADTree data structure to prevent memory problems, -Q specifies the search algorithm, and -E specifies the estimator used for finding the conditional probability tables (CPTs). The K2 search algorithm is given by weka.classifiers.bayes.net.search.local.K2, with its option -P specifying the maximum number of parents taken by a node in the Bayesian network. The estimator used for filling in the CPTs is weka.classifiers.bayes.net.estimate.SimpleEstimator, with its option -A specifying the alpha value of the estimator.
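
For scripting, the classifier paths and default options listed above can be gathered into a single lookup table. The sketch below is one possible arrangement; the dictionary and function names are hypothetical, while the option strings are copied from this appendix.

```python
import shlex

# Classifier path followed by its default options, as listed in this appendix.
DEFAULT_CLASSIFIERS = {
    "Linear Regression": "weka.classifiers.functions.LinearRegression -S 0 -R 1.0E-8",
    "Least Median Square": "weka.classifiers.functions.LeastMedSq -S 4 -G 0",
    "M5Prime": "weka.classifiers.trees.M5P -M 4.0",
    "Multilayer Perceptron": "weka.classifiers.functions.MultilayerPerceptron "
                             "-L 0.3 -M 0.2 -N 500 -V 0 -S 0 -E 20 -H a",
    "RBF Network": "weka.classifiers.functions.RBFNetwork -B 2 -S 1 -R 1.0E-8 -M -1 -W 0.1",
    "Conjunctive Rule": "weka.classifiers.rules.ConjunctiveRule -N 3 -M 2.0 -P -1 -S 1",
    "J48": "weka.classifiers.trees.J48 -C 0.25 -M 2",
    "Naive Bayes": "weka.classifiers.bayes.NaiveBayes",
    "Bayes Net": "weka.classifiers.bayes.BayesNet -D "
                 "-Q weka.classifiers.bayes.net.search.local.K2 -- -P 1 "
                 "-E weka.classifiers.bayes.net.estimate.SimpleEstimator -- -A 0.5",
}


def classifier_args(name):
    """Return the classifier path and options as separate command-line tokens."""
    return shlex.split(DEFAULT_CLASSIFIERS[name])


print(classifier_args("J48"))
print(classifier_args("Bayes Net"))
```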

Appendix C
Detailed Results
Table C.1: Results obtained from using regression algorithms to predict temperature at an RWIS site (Experiment 1). The feature vector consists of temperature information from the RWIS-AWOS sites in a set along with the temperature offset for the RWIS sites. The table reports mean absolute error values averaged over ten 10-fold cross-validations.

                          ML Algorithms
Set    RWIS Site          LMS     LR      M5P     RBF      CR       MLP
Set 1  19                 0.908   0.960   0.936   9.521    11.150   1.059
       27                 1.217   0.896   0.873   9.465    10.199   1.058
       67                 1.069   0.918   0.885   10.062   11.596   1.108
Set 2  14                 0.659   0.795   0.751   8.478    10.605   0.789
       20                 0.743   0.820   0.776   9.417    10.821   1.001
       35                 0.553   0.977   0.864   9.523    10.817   1.051
       49                 0.916   0.913   0.898   9.579    11.017   1.074
       62                 0.800   0.779   0.769   9.383    11.040   0.892
Set 3  25                 0.984   1.062   0.889   10.386   11.957   1.097
       56                 0.925   0.913   0.807   10.510   11.512   1.133
       60                 0.889   0.867   0.833   9.675    11.078   1.002
       68                 0.958   1.015   0.901   9.017    10.439   1.235
       78                 0.929   0.875   0.809   8.945    10.449   1.012
Mean of Abs. Errors (°F)  0.888   0.907   0.845   9.535    10.975   1.039
StdDev of Abs. Errors     0.171   0.083   0.058   0.559    0.503    0.110

StdDev refers to Standard Deviation
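
As a worked check of the two summary rows, the sketch below recomputes the mean and standard deviation for the LMS column of Table C.1 from the thirteen per-site values. The choice of the sample standard deviation (dividing by n-1) is an inference from the numbers, not something the source states; it reproduces the reported LMS figures of 0.888 and 0.171 to three decimals.

```python
from statistics import mean, stdev

# Mean absolute errors for LMS at the 13 RWIS sites (Table C.1, Sets 1 to 3).
lms_errors = [
    0.908, 1.217, 1.069,                # Set 1: sites 19, 27, 67
    0.659, 0.743, 0.553, 0.916, 0.800,  # Set 2: sites 14, 20, 35, 49, 62
    0.984, 0.925, 0.889, 0.958, 0.929,  # Set 3: sites 25, 56, 60, 68, 78
]

lms_mean = mean(lms_errors)
lms_stdev = stdev(lms_errors)  # sample standard deviation (n-1 in the denominator)

print(f"mean = {lms_mean:.3f}, stdev = {lms_stdev:.3f}")
```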

Figure C.1: Mean absolute errors for different RWIS sites obtained from predicting temperature using regression algorithms.

Table C.2: Results obtained from using regression algorithms to predict temperature at an RWIS site (Experiment 2). The feature vector consists of temperature information from the RWIS-AWOS sites in a set along with the precipitation type for the RWIS sites. The table reports mean absolute error values averaged over ten 10-fold cross-validations.

                          ML Algorithms
Set    RWIS Site          LMS     LR      M5P
Set 1  19                 1.023   1.115   1.001
       27                 0.935   0.959   0.931
       67                 1.001   1.006   0.984
Set 2  14                 0.726   0.788   0.771
       20                 0.862   0.872   0.827
       35                 1.052   1.062   0.938
       49                 1.051   1.014   1.022
       62                 0.848   0.827   0.815
Set 3  25                 1.217   1.222   1.004
       56                 1.084   1.077   0.992
       60                 1.007   0.981   0.924
       68                 1.046   1.154   0.973
       78                 0.876   0.891   0.856
Mean of Abs. Errors (°F)  0.979   0.997   0.926
StdDev of Abs. Errors     0.127   0.130   0.083

StdDev refers to standard deviation



Figure C.2: Mean absolute errors for different RWIS sites obtained from predicting temperature using regression algorithms, with precipitation type information added to the feature vector.


Table C.3: Results obtained from using classification algorithms to predict precipitation type at an RWIS site (Experiment 4). The feature vector consists of temperature information from the RWIS-AWOS sites in a set along with the precipitation type for the RWIS sites. The table reports classification error values, as reported by WEKA, averaged over ten 10-fold cross-validations.




                                 ML Algorithms
Set    RWIS Site                 J48     NB      Bayes Net
Set 1  19                        0.064   0.325   0.346
       27                        0.256   0.420   0.363
       67                        0.356   0.472   0.414
Set 2  14                        0.213   0.384   0.330
       20                        0.265   0.450   0.379
       35                        0.077   0.312   0.337
       49                        0.062   0.328   0.328
       62                        0.061   0.312   0.341
Set 3  25                        0.068   0.342   0.342
       56                        0.072   0.345   0.333
       60                        0.095   0.317   0.323
       68                        0.061   0.253   0.315
       78                        0.065   0.318   0.231
Mean of Classification Errors    0.132   0.352   0.337
StdDev of Classification Errors  0.102   0.062   0.041