Course Code

AIS-188

Semester

2nd Semester

ECTS Credits

7.5

Type of Course

Mandatory

Data Mining and Analysis

Objective

The ability to collect and store data has increased significantly as a result of innovation in various fields, such as the internet, e-commerce, bar-code readers, mobile devices and smart machines. Data mining is a rapidly evolving field concerned with developing techniques that help data holders make smart use of these collections.

In this course, students will learn methods that support the analysis of data, the extraction of useful patterns and knowledge from it, and the decision-making process.


Course Contents

Basic concepts in data mining and data preparation

Requirements and review of basic data mining operations. Data cleaning and transformation. Similarity and distance measures. Summary of analytical forecasting methods.
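As an illustration of the similarity and distance measures mentioned above, the following is a minimal sketch in plain Python. The function names and the sample vectors are hypothetical and chosen only for this example; they are not taken from the course material.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical sample records: v is u scaled by 2.
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]

print(euclidean_distance(u, v))  # the vectors are far apart in distance...
print(cosine_similarity(u, v))   # ...but point in the same direction
```

The two measures can disagree, as here: scaling a record changes its Euclidean distance to the original but not its cosine similarity, which is one reason the choice of measure matters during data preparation.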

Clustering

Presentation of basic clustering algorithms for large databases. Spectral clustering methods. Partitional and hierarchical clustering. Clustering of non-linearly separable data. Fuzzy clustering. Techniques for evaluating clustering results.

Regression

Linear and multiple linear regression, logistic regression, probit regression, spectral regression, analysis of variance and multivariate analysis of variance (ANOVA, MANOVA). Exploratory factor analysis. Database extraction and advanced forecasting techniques. Experimental design. Predictive modeling (forecasting, cancer prediction).

Classification

Basic types of classification. Statistical classification. Discriminant function analysis. Criteria for evaluating classification methods. Cross-classification analysis. Typical applications.

Classification algorithms

Decision trees. Support vector machines. Applications with WEKA.

Dimensionality reduction techniques

The problem of high dimensionality. Presentation of basic dimensionality reduction techniques (PCA, SVD).

Link Analysis

Topics in hyperlink analysis, page-ranking algorithms, hubs and authorities (HITS).

Social network analysis

Network modeling, metrics in graphs (degree, betweenness centrality, connected components), clustering coefficient.
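Two of the graph metrics listed above, degree and the local clustering coefficient, can be sketched in plain Python as follows. The small undirected graph and the function names are hypothetical and serve only as an illustration.

```python
from itertools import combinations

# Hypothetical undirected graph stored as an adjacency map (node -> set of neighbours).
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

def degree(g, node):
    """Number of edges incident to the node."""
    return len(g[node])

def clustering_coefficient(g, node):
    """Fraction of pairs of the node's neighbours that are themselves connected."""
    neighbours = g[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pair to check
    links = sum(1 for u, v in combinations(neighbours, 2) if v in g[u])
    return 2.0 * links / (k * (k - 1))

for node in sorted(graph):
    print(node, degree(graph, node), clustering_coefficient(graph, node))
```

In this example, node A has three neighbours of which only one pair (B and C) is linked, so its coefficient is one third, while B and C each sit in a closed triangle and score 1.0.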

Community extraction from graphs

Introduction to the basic concepts of clustering in graph data. Basic techniques for extracting communities from graphs.

Text mining

Text representation models, similarity measures, text prediction models, clustering techniques.





Recommended Readings

  • Daniel T. Larose, Chantal D. Larose. Data Mining and Predictive Analytics, Wiley, 2015 (2nd Edition).
  • Jure Leskovec, Anand Rajaraman, Jeff Ullman. Mining of Massive Datasets, Cambridge University Press, 2014 (2nd Edition).