Code: ΠΠΣ-188
Semester: 2nd
ECTS: 7.5
E-Services
Category: Obligatory

Objective

Innovations in areas such as the internet, e-commerce, electronic transactions, bar-code readers, mobile devices and intelligent machines have greatly increased our ability to collect and store data. Data mining is a rapidly growing field that develops techniques to help data owners make effective use of these collections.

After successfully completing the course, students will be able to:

  • understand basic data mining techniques
  • know methods for clustering, classification and regression
  • apply and implement data mining algorithms
  • apply data analysis techniques to text, World Wide Web and social network data

Learning outcomes

  • Searching for, analysing and synthesising data and information using the necessary technology
  • Adapting to new situations
  • Decision-making
  • Working independently
  • Producing new research ideas
  • Project planning and management
  • Criticism and self-criticism

Syllabus

  • Basic concepts in data mining and data preparation

    Requirements and overview of the basic data mining tasks. Data cleaning and transformation. Similarity and distance measures. Overview of analytical forecasting methods.

  • Clustering

    Introduction to basic clustering algorithms for large databases. Spectral clustering methods. Partitional and hierarchical clustering. Clustering of non-linearly separable data. Fuzzy clustering. Techniques for evaluating clustering results.

  • Regression

    Simple and multiple linear regression, logistic regression, probit regression, spectral regression, analysis of variance and multivariate analysis of variance (ANOVA, MANOVA). Exploratory factor analysis. Database mining and advanced prediction techniques. Experimental design. Regression-based predictive modeling (e.g. forecasting, cancer prediction).

  • Classification

    Basic types of classification. Statistical classification. Discriminant function analysis. Evaluation criteria for classification methods. Analysis of cross-classifications (contingency tables). Typical applications.

  • Classification algorithms

    Decision trees. Support vector machines. Applications with WEKA.

  • Dimensionality reduction techniques

    The problem of high dimensionality (the curse of dimensionality). Presentation of the basic dimensionality reduction techniques (PCA, SVD).

  • Link analysis

    Topics in hyperlink analysis. Page ranking algorithms. Hubs and authorities (HITS).

  • Social network analysis

    Network modeling. Graph metrics (degree, betweenness centrality, connected components). Clustering coefficient.

  • Extracting communities from graphs

    Introduction to the basic concepts of clustering on graph data. Basic techniques for extracting communities from graphs.

  • Text mining

    Text representation models. Similarity measures. Predictive models for text. Text clustering techniques.
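As a small illustration of the text mining topics (a text representation model plus a similarity measure), the sketch below assumes a bag-of-words TF-IDF representation with weights tf × log(N / df) and compares documents using cosine similarity. The function names, the naive whitespace tokenizer and the toy documents are hypothetical and are not taken from the course material.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        if not tokens:
            vectors.append({})  # empty document -> empty vector
            continue
        tf = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = [
    "data mining finds patterns in data",
    "text mining finds patterns in text",
    "graphs model social networks",
]
vecs = tfidf_vectors(docs)
print(round(cosine(vecs[0], vecs[1]), 3))  # shared terms -> positive similarity
print(round(cosine(vecs[0], vecs[2]), 3))  # no shared terms -> 0.0
```

A term that occurs in every document receives an IDF of zero and therefore no weight, which is the intended effect of IDF weighting. Library implementations such as scikit-learn's `TfidfVectorizer` use a smoothed IDF by default, so their weights will differ from this unsmoothed sketch.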

Bibliography