Overview
Learn how to process, analyse, and model large data sets
On this course, led by the University of Waikato where Weka originated, you’ll be introduced to advanced data mining techniques and skills.
Following on from the first Data Mining with Weka course, you’ll learn to process a dataset with 10 million instances and to mine a 250,000-word text dataset.
You’ll analyse a supermarket dataset representing 5000 shopping baskets, and learn about filters for preprocessing data, selecting attributes, classification, clustering, association rules, and cost-sensitive evaluation.
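Association rules of the kind mined from such basket data are judged by their support and confidence. As a taste of the idea only (the course itself uses Weka's tools, not hand-written code), here is a minimal Python sketch on a few toy baskets; the function names and data are illustrative, not Weka's:

```python
# Toy shopping baskets (the course uses a 5000-basket supermarket dataset)
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"bread", "butter"},
    {"beer", "milk"},
]

def support(itemset):
    """Fraction of baskets that contain every item in the set."""
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(lhs, rhs):
    """Of the baskets containing lhs, the fraction that also contain rhs."""
    return support(lhs | rhs) / support(lhs)

print(support({"bread", "milk"}))       # 0.4 (2 of the 5 baskets)
print(confidence({"bread"}, {"milk"}))  # 0.5 (milk in 2 of the 4 bread baskets)
```

A rule miner such as Weka's Apriori searches for rules whose support and confidence exceed chosen thresholds.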
You’ll also explore learning curves and how to automatically optimize learning parameters.
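A learning curve plots performance against training-set size. The course explores these within Weka; purely as an illustration of the concept, this self-contained Python sketch trains an intentionally simple classifier (nearest estimated class mean, on synthetic data of my own invention) on ever-larger samples and reports test accuracy:

```python
import random

random.seed(0)
# Toy two-class problem, one numeric attribute: class 0 values centred
# at 0.0, class 1 values at 1.0 (both with standard deviation 0.5).
pool0 = [random.gauss(0.0, 0.5) for _ in range(400)]
pool1 = [random.gauss(1.0, 0.5) for _ in range(400)]
# Hold out the second half of each pool as a fixed test set.
test = [(x, 0) for x in pool0[200:]] + [(x, 1) for x in pool1[200:]]

accuracies = []
for n in (5, 20, 80, 200):  # training instances taken from each class
    m0 = sum(pool0[:n]) / n            # estimated class means
    m1 = sum(pool1[:n]) / n
    threshold = (m0 + m1) / 2          # classify by nearest estimated mean
    acc = sum((x > threshold) == (c == 1) for x, c in test) / len(test)
    accuracies.append(acc)
    print(f"{2 * n:3d} training instances -> test accuracy {acc:.3f}")
```

Typically accuracy climbs quickly at first and then flattens out as the class means are estimated more reliably, which is the characteristic learning-curve shape.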
This course is aimed at anyone who deals in data professionally or is interested in furthering their professional or academic skills in data science.
This course follows on from Data Mining with Weka; unless you already have a rudimentary knowledge of Weka, it’s recommended that you complete that course first.
As with the previous course, it involves no computer programming, although you’ll need some experience of using a computer for everyday tasks.
High school maths is more than enough; some elementary statistics concepts (means and variances) are assumed.
Before the course starts, download the free Weka software. It runs on any computer, under Windows, Linux, or Mac. It has been downloaded millions of times and is being used all around the world.
(Note: Depending on your computer and system version, you may need admin access to install Weka.)
Syllabus
- Exploring Weka’s interfaces, and working with big data
  - Hello again
  - What are Weka’s other interfaces for?
  - Exploring the Experimenter
  - Comparing classifiers
  - The Knowledge Flow interface
  - Using the Command Line
  - Can Weka process big data?
  - Working with big data
- Discretization and text classification
  - How can you discretize numeric attributes?
  - Discretizing numeric attributes
  - Supervised discretization
  - Discretization in J48
  - How do you classify documents?
  - Document classification
  - Evaluating 2-class classification
  - Multinomial Naive Bayes
  - How are you getting on?
- Classification rules, association rules, and clustering
  - Is it better to generate rules or trees?
  - Decision trees and rules
  - Generating decision rules
  - What if there’s no “class” attribute?
  - Association rules
  - Learning association rules
  - Representing clusters
  - Evaluating clusters
- Selecting attributes and counting the cost
  - How about selecting key attributes before applying a classifier?
  - “Wrapper” attribute selection
  - The Attribute Selected Classifier
  - Scheme-independent selection
  - Attribute selection using ranking
  - What happens when different errors have different costs?
  - Counting the cost
  - Cost-sensitive classification
- Neural networks, learning curves, and performance optimization
  - What are “neural networks” and how can I use them?
  - Simple neural networks
  - Multilayer perceptrons
  - How much training data do I need? And how do I optimize all those parameters?
  - Learning curves
  - Performance optimization
  - ARFF and XRFF
  - There’s no magic in data mining
  - Farewell
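One syllabus topic that is easy to preview is discretizing numeric attributes. Weka's unsupervised Discretize filter divides an attribute's range into equal-width bins by default; the sketch below shows that idea in a few lines of Python (an illustration of the principle only, not Weka's implementation, and the function name is my own):

```python
def equal_width_bins(values, k):
    """Assign each value to one of k equal-width bins spanning the
    attribute's observed range (the idea behind Weka's default
    unsupervised discretization; illustrative sketch only)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    bins = []
    for v in values:
        i = int((v - lo) / width) if width else 0
        bins.append(min(i, k - 1))  # clamp the maximum value into the top bin
    return bins

temps = [64, 65, 68, 69, 70, 71, 72, 75, 80, 81, 83, 85]
print(equal_width_bins(temps, 3))  # [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
```

The course goes further, covering supervised discretization, which chooses bin boundaries using the class labels rather than the attribute values alone.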