Data Mining

Course Objective

To provide a strong foundation in data mining techniques that combine pattern-recognition rules, statistical rules, and rules drawn from machine learning. The course discusses the widely used data-mining tools in depth and demonstrates them on industry-based data sets.

Learning Outcomes

Upon completion of this course, students will be able to:

Extract useful information from large and complex data sets.

Recognize patterns and trends in databases and model them.

Use pattern-recognition rules, statistical rules, and rules drawn from machine learning to extract valuable information from databases.

Detailed Syllabus

1. Introduction to Data Mining, Processing the Information and Getting to Know Your Data, Descriptive statistics, Simple plots, Pre-processing, Outlier detection and cleaning

2. Basic Statistical Models, Standard linear regression, Nonparametric regression approaches, Local polynomial regression, Importance of parsimony in statistical modeling, Penalty-based variable selection in regression models with many parameters (LASSO), Logistic regression, Multinomial logistic regression

3. Classification Models, Binary classification, Probabilities and evaluating classification performance, Classification using nearest-neighbour analysis, Naive Bayesian analysis

4. More on Classification and Useful Tools, Discussion of discriminant analysis, Decision trees, Chi-square automatic interaction detection (CHAID), Ensemble methods (bagging, boosting, and random forests), Support vector machines (SVM), Neural networks

5. Clustering, Different types of clustering methods, Hierarchical clustering, K-means clustering, Applications with market basket analysis, Discussion of association rules and lift

6. Dimension Reduction, Factor models and principal components, Reducing the dimension in regressions with multicollinear inputs, Principal components regression and partial least squares

7. Text as Data and Network Data, Text mining and sentiment analysis, Analysis of network data