The main objective of the course is to train students in methodologies and technologies for data management. It covers advanced topics in database design and query processing for modern data management architectures, in the context of broader network-centric systems and services. It also examines business intelligence, including business data integration, business process modeling, and advanced techniques for mining business data. On completing the course, students are expected to be able to develop traditional and network-centric systems and services effectively in database environments that involve structured, semi-structured, and unstructured data. Students also acquire basic knowledge and skills in data analysis and in extracting useful information from data.
Distributed data management
Basic concepts, problems, architectures, distributed query processing, peer-to-peer data management systems, unstructured and structured peer-to-peer networks.
Parallel data management
Fundamental concepts and architecture of parallel databases, parallel query processing, data management in the cloud, the MapReduce programming model, the Hadoop implementation, HDFS.
Query processing and optimization
Rank-aware query processing, rank-join query processing, algorithms for rank-aware query processing, skyline queries, algorithms for processing skyline queries.
Dimensionality reduction – Feature selection
Multidimensional data, modeling, problems of high dimensionality (“the curse of dimensionality”, “the empty space phenomenon”), failure of indexing methods in high dimensions, dimensionality reduction algorithms, application to practical problems in data management.
Security and privacy issues
Authentication, access control, security policies, user roles (the RBAC model), the problem of publishing anonymized data, k-anonymity, l-diversity, privacy-enforcing mechanisms.
Introduction to business intelligence
Basic concepts, an industry viewpoint on business intelligence, new trends (Big Data, fast business, better software), business process modeling.
Information integration in business intelligence – Data preprocessing
Data selection, data cleaning, handling missing values, data integration, semantic heterogeneity, data visualization for decision support.
Similarity search
Distance and similarity measures for different data types (numerical, categorical, text), processing similarity queries (range queries and k-nearest neighbor queries), applications in machine learning.
Data warehouses and OLAP
Multidimensional data model, architecture of data warehouses, design of data warehouses, extract-transform-load (ETL), OLAP operations, data warehouses as tools for business intelligence.
Data mining and text analysis
Basic data mining techniques and application to business intelligence, information extraction techniques from diverse data sources (text, Web, social networks).
- Teorey T. J., Lightstone S. S., Nadeau T. and Jagadish H. V. (2011): Database Modeling and Design: Logical Design, Fifth Edition, Morgan Kaufmann, ISBN-10: 0123820200.
- Teorey T. J. (1998): Database Modeling & Design: The Fundamental Principles, Morgan Kaufmann, ISBN-10: 1558602941.
- Siau K. (2007): Contemporary Issues in Database Design and Information Systems Development, IGI Publishing, ISBN-10: 1599042894.
- Ng R. T. et al. (2013): Perspectives on Business Intelligence, Morgan & Claypool Publishers, Synthesis Lectures on Data Management.
- Han J. and Kamber M. (2006): Data Mining: Concepts and Techniques, Morgan Kaufmann, ISBN 1-55860-901-6.
- Vazirgiannis M., Halkidi M. and Gunopoulos D. (2003): Quality Assessment and Uncertainty Handling in Data Mining, Springer-Verlag, LNAI Series, ISBN-10: 1852336552.
- Chakrabarti S. (2002): Mining the Web: Discovering Knowledge from Hypertext Data, Morgan Kaufmann, ISBN-10: 1558607544.
- Chaudhuri S., Dayal U. and Narasayya V. (2011): An Overview of Business Intelligence Technology, Communications of the ACM, Vol. 54, No. 8, pp. 88-98.