The ability to collect and store data has increased significantly as a result of innovation in various fields, such as the internet, e-commerce, bar-code readers, mobile devices and smart machines. Data mining is a rapidly evolving field that develops techniques to help data holders make smart use of these collections.
In this course, students will learn methods that support the analysis of data, the extraction of useful knowledge patterns from data, and the decision-making process.
Basic concepts in data mining and data preparation
Requirements and review of basic data mining operations. Data cleaning, transformation. Measures of similarity, distance. Summary of analytical forecasting methods.
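As an illustration of the similarity and distance measures listed above, the following minimal sketch implements three common ones using only the Python standard library. The function names are chosen here for illustration and are not taken from the course material; the course may cover other measures as well.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def jaccard_similarity(s, t):
    """Size of the intersection of two sets divided by the size of their union."""
    s, t = set(s), set(t)
    if not s and not t:
        return 1.0  # convention assumed here: two empty sets are identical
    return len(s & t) / len(s | t)
```

Euclidean distance suits numeric attributes on comparable scales, cosine similarity ignores vector length and is common for text, and Jaccard similarity applies to set-valued data such as market baskets.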
Presentation of basic clustering algorithms for large databases. Spectral clustering methods. Partitional and hierarchical clustering. Clustering of non-linearly separable data. Fuzzy clustering. Techniques for evaluating clustering results.
Simple and multiple linear regression, logistic regression, probit regression, spectral regression, analysis of variance (ANOVA, MANOVA). Exploratory factor analysis. Database extraction and advanced forecasting techniques. Experimental design. Predictive modeling (e.g., forecasting, cancer prediction).
Basic types of classification. Statistical classification. Discriminant function analysis. Criteria for evaluating classification methods. Cross-classification analysis. Typical applications.
Decision trees. Support vector machines. Applications with WEKA.
Dimensionality reduction techniques
The curse of dimensionality. Presentation of basic dimensionality reduction techniques (PCA, SVD).
Hyperlink analysis topics, Page ranking algorithms, Hubs and authorities (HITS).
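To make the page ranking topic concrete, here is a minimal sketch of PageRank computed by power iteration. It assumes the common formulation with a damping factor of 0.85 and an even redistribution of rank from pages without out-links; the function name, parameters and the toy link graph in the usage note are illustrative assumptions, not part of the course material.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread evenly over all pages.
        dangling = sum(rank[page] for page in pages if not links.get(page))
        new_rank = {
            page: (1.0 - damping) / n + damping * dangling / n for page in pages
        }
        # Every other page splits its rank equally among the pages it links to.
        for page in pages:
            targets = links.get(page, [])
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank
```

For example, with `links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}`, page C receives links from three pages and ends up with the highest score, while D, which nobody links to, ends up with the lowest.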
Analysis of social networks
Network modeling, metrics in graphs (degree, betweenness centrality, connected components), clustering coefficient.
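Several of the graph metrics named above can be sketched in a few lines for an undirected graph without self-loops, stored as adjacency sets. The helper names below are illustrative assumptions; betweenness centrality is left out because it requires shortest-path counting and would make the sketch considerably longer.

```python
from collections import deque


def build_graph(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degrees(adj):
    """Number of neighbours of each node."""
    return {node: len(nbrs) for node, nbrs in adj.items()}


def connected_components(adj):
    """List of node sets, one per connected component (breadth-first search)."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        components.append(component)
    return components


def local_clustering(adj, node):
    """Fraction of pairs of a node's neighbours that are themselves connected."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    linked = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * linked / (k * (k - 1))
```

On a toy graph built from the edges a-b, b-c, a-c, c-d and e-f, node c has degree 3, there are two connected components, and the clustering coefficient of a is 1.0 because its two neighbours are connected to each other.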
Extracting communities from graphs
Introduction to the basic concepts of clustering in graph data. Basic techniques for extracting communities from graphs.
Text representation model, similarity measures, text prediction models, clustering techniques.
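One widely used text representation is the bag-of-words model with TF-IDF weights, compared through cosine similarity. The sketch below assumes whitespace tokenization and the plain logarithmic IDF formula; both are simplifications chosen for illustration, and the function names are hypothetical rather than taken from the course material.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """One sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Documents that share informative terms score higher than documents with no terms in common, and a term that occurs in every document receives weight zero. The resulting vectors can serve as input to clustering or classification methods.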
- Daniel T. Larose, Chantal D. Larose. Data Mining and Predictive Analytics. Wiley, 2015 (2nd Edition).
- Jure Leskovec, Anand Rajaraman, Jeff Ullman. Mining of Massive Datasets. Cambridge University Press, 2014 (2nd Edition).