AAAI’2006 Workshop Proposal: Evaluation Methods for Machine Learning
Chris Drummond, William Elazmeh and Nathalie Japkowicz
1. Description of the topic
The goal of this workshop is to encourage debate within the machine learning community about how we experimentally evaluate new algorithms. We aim to discuss which properties of an algorithm need to be measured (e.g., accuracy, comprehensibility, conciseness); to discuss the need for more elaborate approaches than those currently in use; to discuss the kinds of alternative methods that could be useful; and, ultimately, to propose specific techniques appropriate to particular categories of problems.
To ground this debate, we invite position papers and technical papers addressing three main topics.
Topic I: Basic Issues
What is the purpose of evaluation, and why is it important?
Is it necessary to have a simple quantitative metric to measure progress?
Is evaluation inherently multi-objective?
Is the trade-off among objectives always a qualitative judgment best left to the user?
Topic II: Properties to Evaluate
What are the important properties of an algorithm that need evaluation?
Can we summarize the results into statistically valid single measures?
Are evaluation methods problem-dependent, and if so, can we categorize them?
How valuable are benchmark data sets, and is their use statistically valid?
Topic III: Lessons Learned from Other Fields
What have other fields measured, and why did they measure it?
What sorts of statistical techniques have they used in evaluation?
The purpose of this workshop is twofold: 1) to discuss the need for better evaluation methods in Machine Learning and establish the directions such research should take; and 2) to educate the broader Machine Learning community about the need for better evaluation methods and about what these methods are.
2. Timeliness of the topic
Most disciplines, such as Psychology, Economics, Sociology and Medicine, are concerned with the evaluation of their predictive models. As a community, our evaluation methods are not as mature as those in these other fields. We do not pay careful attention to what our measures mean, to their utility, or to statistical tests that guarantee, as best they can, the validity of findings on larger or future data sets. This was acceptable in the early years, when the basic concepts of machine learning were being established and tested on standard problems from the UC Irvine repository. In the last decade, however, the field has reached a level of maturity at which its algorithms are applied in a variety of real-life situations. It is time to use more robust tests of their performance.
The timeliness of the topic is supported by the fact that a discussion of evaluation techniques for machine learning has sprung up, particularly in the sub-community of researchers working on cost-sensitive and class-imbalanced learning. Indeed, the use of ROC Analysis or Cost Curves, rather than accuracy, has become quite standard in that sub-community, and two workshops on ROC Analysis have already been held.
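To illustrate the concern that motivates this sub-community, consider a minimal sketch of why accuracy can mislead on a class-imbalanced problem. The synthetic data and the `roc_auc` helper below are our own illustration (AUC computed via the standard Mann-Whitney rank statistic), not code drawn from any of the works mentioned above.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the probability that a randomly chosen positive example is
    scored above a randomly chosen negative one (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A synthetic, highly imbalanced test set: 95 negatives, 5 positives.
labels = [0] * 95 + [1] * 5
# A trivial scorer that assigns every example the same score,
# i.e., it predicts "negative" for everything at threshold 0.5.
trivial = [0.0] * 100

accuracy = sum((s >= 0.5) == y for y, s in zip(labels, trivial)) / len(labels)
print(accuracy)                  # 0.95 -- accuracy looks excellent
print(roc_auc(labels, trivial))  # 0.5  -- ranking is no better than chance
```

The trivial classifier attains 95% accuracy while having no discriminative power at all, which is exactly the failure mode that ROC-based evaluation makes visible.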
We believe that this discussion should not be limited to the cost-sensitive and class-imbalanced learning community, and that methods other than ROC Analysis and Cost Curves should also be considered and designed.
We will, thus, discuss where the field of evaluation in Machine Learning needs to go from here and what our goals should be for the next few years.
3. Proposed Format
We are planning to have several invited talks, including some from outside of our research community that could criticize our accepted practices from an external point of view and some from inside our community that would explain how we could improve on our current practices. Potential speakers are:
Jim Malley, National Institutes of Health (Outside look)
Tom Dietterich (Inside look)
Charles Elkan (Inside look)
Foster Provost (Inside look)
We are also planning a panel discussion on the issue with all our invited speakers. The audience would, of course, be welcome to participate in that discussion.
The rest of the day would be devoted to the presentation of papers submitted to and accepted at the workshop.
We are envisioning a one-day workshop.
5. Contact Information
Chris Drummond, NRC Institute for Information Technology, 1200 Montreal Road,
Building M-50, Room 374, Ottawa, ON K1A 0R6; Telephone: +1 (613) 993-0709;
Dr. Chris Drummond is a Research Officer at the Institute for Information Technology of the National Research Council of Canada. Together with Rob Holte, he proposed a new visual evaluation technique, Cost Curves, which captures the behaviour of learning algorithms over the entire cost space of a problem. This technique has been adopted by a large number of researchers in the cost-sensitive/class-imbalanced learning community as a more informative alternative to ROC Analysis.
William Elazmeh, SITE, University of Ottawa, 800 King Edward Ave., P.O. Box 450, Stn A, Ottawa, Ontario, Canada, K1N 6N5; Telephone: (613) 562-5800 ext. , Fax: (613) 562-; E-mail: email@example.com
William Elazmeh is a Ph.D. student at the School of Information Technology and Engineering of the University of Ottawa. He is interested in surveying techniques from the biostatistics community for use in Machine Learning research. More specifically, he is currently comparing the confidence intervals generated by the Tango-Wilson approach to those generally used in machine learning. He is also developing new techniques that adapt these biostatistical approaches to the needs of machine learning researchers.
Nathalie Japkowicz, SITE, University of Ottawa, 800 King Edward Ave., P.O. Box 450, Stn A, Ottawa, Ontario, Canada, K1N 6N5; Telephone: (613) 562-5800 ext. 6693, Fax: (613) 562-5187; E-mail: firstname.lastname@example.org
Dr. Nathalie Japkowicz is an associate professor at the School of Information Technology and Engineering of the University of Ottawa. She previously co-organized two workshops on learning from class imbalanced data sets (AAAI’2000 and ICML’2003) and one workshop on autoencoders (NIPS’1997). She also co-edited a special issue of the ACM SIGKDD Explorations newsletter on class imbalanced data sets. She was an early user of ROC curves in Machine Learning and is currently investigating the development of new evaluation measures in both supervised and unsupervised learning.
6. Potential Attendees
A list of potential attendees includes: