Data Mining and Preparation - MSc in Information Systems and Services

Code

ΜΔΑ-283

Semester

1st

ECTS

7.5

E-Services

Lefkippos

Category

Obligatory

Instructors

M. Filippakis
M. Halkidi

Objective

The ability to collect and store data has increased significantly as a result of innovation in various areas, such as the internet, e-commerce, electronic transactions, bar-code readers, mobile devices and intelligent machines. Data mining is a rapidly growing field that deals with the development of techniques that aim to help data owners make intelligent use of these collections.

This course covers methods for selecting and preparing data before analysis and knowledge mining techniques are applied. It also presents the basic techniques for extracting useful knowledge patterns from large data collections, along with techniques for analyzing various types of data, including text, data from the World Wide Web and data from social networks. Through this course, students are expected to acquire significant technical skills in data analysis and become familiar with knowledge mining algorithms and methods.

After successfully completing the course, students will be able to:

assess the quality of the data to be analyzed and apply the necessary data preparation techniques
choose the appropriate data mining technique based on the requirements and data types
apply data mining techniques
use appropriate techniques and tools to extract knowledge from data collections
evaluate the quality of data mining results

Learning outcomes

Search for, analysis and synthesis of data and information, with the use of the necessary technology
Adapting to new situations
Decision-making
Working independently
Production of new research ideas
Project planning and management
Criticism and self-criticism

Syllabus

Basic concepts in data mining and data preparation

Requirements and overview of basic data mining tasks. Data cleaning and transformation. Similarity and distance measures. Overview of predictive analytics methods.
Clustering

Introduction to basic clustering algorithms for large databases. Spectral clustering methods. Separative-hierarchical clustering. Clustering of non-linearly separable data. Fuzzy clustering. Techniques for evaluating clustering results.
Classification

Basic types of classification. Statistical classification. Discriminant function analysis. Support vector machines. Evaluation criteria for classification methods. Cross-classification analysis. Typical applications.
Dimensionality reduction techniques

The problem of high dimensionality. Presentation of basic dimensionality reduction techniques (PCA, SVD).
Association rules, frequent itemsets

Apriori algorithm, comparison of algorithms, representative association rules.
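As an informal illustration of this topic, the following is a minimal sketch of Apriori-style frequent itemset mining in plain Python. It is not taken from the course material: the function name `apriori`, the sample `baskets` and the absolute `min_support` threshold are all assumptions chosen for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    min_support is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]
    # Level-1 candidates: every distinct single item.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count the transactions that contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all of its (k-1)-item
        # subsets were frequent at the previous level.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical market-basket data, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
frequent_itemsets = apriori(baskets, min_support=3)
for itemset, count in sorted(frequent_itemsets.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), count)
```

The prune step relies on the property that every subset of a frequent itemset is itself frequent, which is what lets the search discard most candidates without counting them.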
Link Analysis

Topics in hyperlink analysis, page ranking algorithms, hubs and authorities (HITS).
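To give a flavour of page ranking algorithms, here is a minimal sketch of PageRank computed by power iteration. It is an illustration under stated assumptions rather than course material: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count and the four-page `web` graph are all hypothetical choices.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate PageRank scores by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by dangling pages (no out-links) is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {
            p: (1.0 - damping) / n + damping * dangling / n for p in pages
        }
        # Each page passes its rank on in equal shares to its targets.
        for p, targets in links.items():
            for q in targets:
                new_rank[q] += damping * rank[p] / len(targets)
        rank = new_rank
    return rank


# Hypothetical four-page web graph, for illustration only.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this toy graph, page C collects links from three pages and ends up with the highest score, while page D, which nobody links to, keeps only the baseline share.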
Social network analysis

Network modeling, graph metrics (degree, betweenness centrality, connected components), clustering coefficient.
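Some of these metrics can be sketched in a few lines of plain Python. The example below is only an illustration and assumes a simple undirected graph (no self-loops or parallel edges). The helper names (`build_graph`, `degree`, `connected_components`, `clustering_coefficient`) and the sample edge list are hypothetical. Betweenness centrality is left out because it needs a longer shortest-path computation.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj):
    """Number of neighbours of each node."""
    return {node: len(neigh) for node, neigh in adj.items()}


def connected_components(adj):
    """Find connected components with breadth-first search."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), {start}
        while queue:
            node = queue.popleft()
            for neigh in adj[node]:
                if neigh not in seen:
                    seen.add(neigh)
                    component.add(neigh)
                    queue.append(neigh)
        components.append(component)
    return components


def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbours of node that are themselves linked."""
    neigh = adj[node]
    k = len(neigh)
    if k < 2:
        return 0.0
    # Each edge between two neighbours is seen from both ends, so halve it.
    links = sum(1 for u in neigh for v in adj[u] if v in neigh) / 2
    return 2 * links / (k * (k - 1))


# Hypothetical small social network, for illustration only.
graph = build_graph(
    [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("e", "f")]
)
components = connected_components(graph)
print(degree(graph))
print(len(components), "connected components")
print(clustering_coefficient(graph, "c"))
```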
Extracting communities from graphs

Introduction to the basic concepts of clustering on graph data. Basic techniques for extracting communities from graphs.
Text mining

Text representation models, similarity measures, predictive models for text, clustering techniques.
Recommender systems

Content-based systems, collaborative filtering systems, personalization, knowledge mining techniques for large-scale recommender systems, evaluation of recommender systems, applications of recommender systems.

Suggested bibliography

Daniel T. Larose, Chantal D. Larose. Data Mining and Predictive Analytics. Wiley, 2015 (2nd Edition).
Jure Leskovec, Anand Rajaraman, Jeff Ullman. Mining of Massive Datasets. Cambridge University Press, 2014 (2nd Edition).
J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2006.

Related academic journals

Advanced Information Systems

Big Data and Analytics

IT Governance

Area: Big Data and Analytics

Specialization: Big Data and Analytics