The ability to collect and store data has increased significantly as a result of advances in many areas, such as the Internet, electronic commerce, electronic transactions, bar-code readers, mobile devices and intelligent machines. Data mining is a rapidly growing research area concerned with developing techniques that help data owners make effective use of these data collections. Combined with forecasting techniques, data mining can also leverage existing information to make predictions and support decisions.
In this course we will study methods for data analysis and for knowledge extraction from large data collections, as well as techniques for forecasting and decision making.
Fundamental concepts in data mining and data preparation.
Requirements and review of the main data mining tasks. Data cleaning and transformation. Similarity and distance measures. Overview of analytical prediction methods.
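As a minimal sketch of the similarity and distance measures listed above, the Python functions below compute Euclidean (L2) and Manhattan (L1) distance, cosine similarity and Jaccard similarity. The function names are hypothetical and chosen only for illustration. Libraries such as SciPy provide tuned equivalents.

```python
import math


def euclidean_distance(x, y):
    """L2 distance between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan_distance(x, y):
    """L1 distance: sum of absolute coordinate differences."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(x, y))


def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero vectors, in [-1, 1]."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_x * norm_y)


def jaccard_similarity(s, t):
    """|S & T| / |S | T| for two sets (taken as 1.0 when both are empty)."""
    s, t = set(s), set(t)
    union = s | t
    return len(s & t) / len(union) if union else 1.0


if __name__ == "__main__":
    print(euclidean_distance([0, 0], [3, 4]))  # 5.0
    print(manhattan_distance([0, 0], [3, 4]))  # 7
    print(cosine_similarity([1, 0], [1, 1]))   # about 0.7071
    print(jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"}))  # 0.5
```

Cosine and Jaccard are similarities, so larger values mean "closer". Euclidean and Manhattan are distances, so smaller values mean "closer". Clustering and nearest-neighbour methods must be told which kind they receive.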
Simple and multiple linear regression, logistic regression, inverse normal regression (probit regression), spectral regression, analysis of variance and multivariate analysis of variance (ANOVA, MANOVA). Exploratory factor analysis. Data extraction from databases and advanced forecasting techniques. Experimental design. Regression-based predictive modelling (e.g., forecasting, cancer prediction).
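A minimal sketch of multiple linear regression fitted by ordinary least squares follows. It assumes NumPy is available. The helper names `fit_linear_regression` and `predict` are hypothetical. The data are synthetic and noise-free, so you can check the recovered coefficients by eye.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Ordinary least squares fit of y ~ b0 + X @ [b1, ..., bp].

    X: (n_samples, n_features); y: (n_samples,).
    Returns [b0, b1, ..., bp] with the intercept first.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient acts as the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    # lstsq minimises ||design @ beta - y||^2 without inverting X^T X explicitly.
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def predict(beta, X):
    """Apply fitted coefficients to new rows of X."""
    X = np.asarray(X, dtype=float)
    return beta[0] + X @ beta[1:]


if __name__ == "__main__":
    # Synthetic data generated from y = 1 + 2*x1 - 3*x2 (illustrative only).
    X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]])
    y = 1 + 2 * X[:, 0] - 3 * X[:, 1]
    beta = fit_linear_regression(X, y)
    print(np.round(beta, 6))        # close to [1, 2, -3]
    print(predict(beta, [[3, 1]]))  # close to [4]
```

Using `lstsq` rather than the textbook normal equations $(X^\top X)^{-1} X^\top y$ is a design choice for numerical stability. Both give the same coefficients when $X^\top X$ is well conditioned.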
Representative clustering algorithms for large databases. Spectral clustering methods. Divisive hierarchical clustering. Clustering of non-linearly separable data. Fuzzy clustering. Cluster validity techniques.
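Fuzzy clustering is often introduced through fuzzy c-means. In this method each point has a membership degree in every cluster rather than a single hard label. The sketch below is one standard formulation: fuzzifier m = 2, Euclidean distance and random initial memberships. It assumes NumPy is available. The function name and parameters are hypothetical, and the course may present a different variant.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Fuzzy c-means clustering.

    Returns (centers, U) where U[i, j] is the membership of point i in
    cluster j; each row of U sums to 1.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalised so each row sums to 1.
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        # Centers are membership-weighted means, with weights u_ij ** m.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every point to every center, shape (n, c).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against a point sitting exactly on a center
        # u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))
        inv = d ** (-2.0 / (m - 1))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


if __name__ == "__main__":
    # Two well-separated toy blobs (illustrative data).
    X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    centers, U = fuzzy_c_means(X, c=2)
    print(np.round(U, 3))    # soft memberships, one row per point
    print(U.argmax(axis=1))  # hardened labels, one per point
```

Larger values of `m` make memberships softer. As `m` approaches 1, the method behaves like hard k-means.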
Fundamental classification methods. Statistical classification. Discriminant function analysis. Support vector machines. Evaluation methods. Cross-classification analysis. Application examples.
Apriori algorithm. Frequent itemsets. Representative association rules.
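The level-wise idea behind the Apriori algorithm can be sketched in plain Python. It counts frequent k-itemsets, joins them into (k+1)-candidates, and prunes any candidate that has an infrequent subset. In this sketch `min_support` is an absolute transaction count, and the toy basket data are made up for illustration. The hypothetical helper `association_rules` produces ordinary confidence-based rules, not the reduced set of representative association rules named in the topic.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every itemset that
    occurs in at least min_support transactions (an absolute count)."""
    transactions = [frozenset(t) for t in transactions]

    def frequent_only(candidates):
        # One pass over the data counts every candidate, then filters.
        counts = dict.fromkeys(candidates, 0)
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: frequent single items.
    frequent = frequent_only({frozenset([i]) for t in transactions for i in t})
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): all (k-1)-subsets must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        frequent = frequent_only(candidates)
        result.update(frequent)
        k += 1
    return result


def association_rules(frequent, min_confidence):
    """Plain rules X -> Y with confidence = support(X u Y) / support(X)."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


if __name__ == "__main__":
    baskets = [
        {"bread", "milk"},
        {"bread", "diapers", "beer", "eggs"},
        {"milk", "diapers", "beer", "cola"},
        {"bread", "milk", "diapers", "beer"},
        {"bread", "milk", "diapers", "cola"},
    ]
    freq = apriori(baskets, min_support=3)
    for itemset, count in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
    # Only beer -> diapers reaches confidence 1.0 on this data.
    print(association_rules(freq, min_confidence=1.0))
```

The lookup `frequent[lhs]` is always safe because every subset of a frequent itemset is itself frequent. That is the same Apriori property the prune step relies on.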
Models for representing streaming data. Stream clustering approaches.
Introduction to recommendation techniques. Collaborative filtering, content-based approaches, matrix factorization.
Social network analysis.
Network modelling. Computing graph metrics (degree, betweenness, centrality, connected components) and the clustering coefficient.
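Several of the graph metrics above can be computed without any dependencies for a small undirected, unweighted graph. The sketch below covers node degree, the local clustering coefficient, connected components found by breadth-first search, and betweenness centrality computed with Brandes' algorithm. Betweenness is left unnormalised and counts each unordered pair once. The function names and the toy edge list are hypothetical. Libraries such as NetworkX offer production implementations.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def degree(adj):
    """Number of neighbours of every node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbours that are themselves linked."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention: undefined for degree < 2, reported as 0
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]])
    return 2 * links / (k * (k - 1))


def connected_components(adj):
    """List of node sets, one per component, found by breadth-first search."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:
            v = queue.popleft()
            component.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(component)
    return components


def betweenness(adj):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalised)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of non-increasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair {s, t} was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}


if __name__ == "__main__":
    # Triangle a-b-c, pendant node d on c, and a separate edge e-f.
    g = build_graph([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("e", "f")])
    print(degree(g))
    print({v: round(clustering_coefficient(g, v), 3) for v in sorted(g)})
    print(connected_components(g))
    print(betweenness(g))
```

In this toy graph, node c is the only bridge between d and the rest of its triangle. As a result, c is the only node with non-zero betweenness and the only triangle node whose clustering coefficient falls below 1.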
Models for text representation, text similarity measures, predictive models for text, and text clustering techniques.
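One common text representation is TF-IDF weighting, with cosine similarity used to compare the resulting vectors. The sketch below assumes one particular variant: tf is the term count divided by document length, and idf is log(N / document frequency). It also uses naive whitespace tokenisation. Other weighting schemes exist, and the function names and corpus are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors ({term: weight}) for a list of token lists.

    tf = term count / document length, idf = log(N / document frequency).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid dividing by zero for an empty document
        vectors.append({t: (c / length) * math.log(n / df[t]) for t, c in counts.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    corpus = [
        "data mining finds patterns in data",
        "text mining finds patterns in text",
        "graph algorithms rank web pages",
    ]
    # Naive tokenisation: lowercase and split on whitespace.
    vecs = tfidf_vectors([doc.lower().split() for doc in corpus])
    print(round(cosine(vecs[0], vecs[1]), 3))  # small positive: shared terms
    print(cosine(vecs[0], vecs[2]))            # 0.0: no terms in common
```

Under this idf definition, a term that appears in every document gets weight 0, so it contributes nothing to similarity. Such terms behave like stop words.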
Hyperlink analysis, page-ranking algorithms (PageRank), hubs and authorities (HITS).
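Both link-analysis methods named above can be sketched with power iteration over an adjacency list. This sketch makes three assumptions. The damping (teleport) factor is 0.85. Rank held by dangling pages is spread uniformly over all pages. Hub and authority scores are rescaled each round so that the largest is 1, which is one common normalisation. The helper names and the three-page graph are hypothetical.

```python
def _nodes(links):
    """All pages that appear as a source or a target, in sorted order."""
    return sorted(set(links) | {v for outs in links.values() for v in outs})


def pagerank(links, beta=0.85, tol=1e-10, max_iter=200):
    """Power iteration for PageRank on {page: [pages it links to]}."""
    nodes = _nodes(links)
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(max_iter):
        # Rank held by dangling pages (no out-links) is spread over all pages.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = dict.fromkeys(nodes, (1 - beta) / n + beta * dangling / n)
        for u, outs in links.items():
            for v in outs:
                new[v] += beta * rank[u] / len(outs)
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


def hits(links, iterations=50):
    """Hub and authority scores, each rescaled so its largest value is 1."""
    nodes = _nodes(links)
    hub = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # A page's authority sums the hub scores of pages linking to it.
        auth = dict.fromkeys(nodes, 0.0)
        for u, outs in links.items():
            for v in outs:
                auth[v] += hub[u]
        top = max(auth.values()) or 1.0
        auth = {v: a / top for v, a in auth.items()}
        # A page's hub score sums the authority scores of pages it links to.
        hub = {v: sum(auth[w] for w in links.get(v, ())) for v in nodes}
        top = max(hub.values()) or 1.0
        hub = {v: h / top for v, h in hub.items()}
    return hub, auth


if __name__ == "__main__":
    # Toy web graph: A -> B, A -> C, B -> C, C -> A.
    web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    print(pagerank(web))
    hub, auth = hits(web)
    print(hub)
    print(auth)
```

On this graph C collects links from both other pages and ends up with the highest PageRank and the highest authority score. A links to both B and C and ends up as the strongest hub.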
- Daniel T. Larose, Chantal D. Larose. Data Mining and Predictive Analytics. Wiley, 2015 (2nd Edition).
- Jure Leskovec, Anand Rajaraman, Jeff Ullman. Mining of Massive Datasets. Cambridge University Press, 2014 (2nd Edition).