Course Information

FORCE-ADD Requests: If you would like to be enrolled in the course, please fill in the Background form and send the file (in PDF format) to the course email.

Time and Location:

Teaching Assistant:

Subhodip Biswas

Course Email:

Course Description:

With the advent of web technology and the availability of massive amounts of data, the traditional approach of "algorithm driven science" is moving towards "data driven science". In recent years, data mining has emerged as a promising tool for solving problems in various application domains. A well-rounded methodology for interpreting and learning from data must combine data modeling, algorithm design, prototyping and extensive experimentation with careful interpretation of the results. The underlying principle of data mining is to develop robust algorithms that extract useful information from the huge amounts of data that have been amassed. Automatically modeling, organizing and interpreting the available data not only enables intelligent manipulation later on but also filters out unwanted and unnecessary information.

This course introduces the fundamental principles, algorithms and applications of intelligent data processing and analysis. It will provide an in-depth understanding of various concepts and popular techniques used in the field of data mining. Effective data mining comprehensively integrates various concepts from different computer science areas such as machine learning, databases, artificial intelligence, pattern recognition, optimization, algorithms, parallel processing and data visualization. This course is mainly designed for beginning graduate students who are interested in data analysis and applications.

This course will cover the basic techniques in data analytics, including the preparation and manipulation of data for analysis. It gives an overview of data mining algorithms for classification, regression, clustering, association analysis, probabilistic modeling and anomaly detection. It includes a detailed study of classification methods, including tree-based methods, Bayesian methods, logistic regression, ensemble methods (bagging and boosting), neural networks, support vector machines and Bayesian networks, and a detailed study of clustering methods, including k-means, hierarchical clustering and self-organizing maps.


Text Books:

Reference Books (These books are placed on reserve in the Library):  

Homework Assignments:

There will be five written homework assignments. Homework problems may include programming exercises designed to help students understand the performance of data mining algorithms. Students are encouraged to discuss the material with other students to improve their conceptual understanding, but the final submission must be their own work. If you receive help from others, please acknowledge them in your submission. Any homework turned in late will be penalized 10% for each late day.


Exams:

This course will have one midterm exam (after Spring break) and one final exam (at the end of the semester). Both exams will be in-class and closed book. Each exam will have a brief review session that focuses on the relevant topics.

Final Project:

One of the major components of this course is the final project. In this project, students will investigate an interesting aspect of a data mining algorithm and apply it to a real-world problem. The main purpose of the project is to give students hands-on experience in the design and implementation of a practical data mining system. Beyond the core computer science aspects, the performance of a data mining system depends significantly on domain-specific expert knowledge of the application field (such as bioinformatics, business intelligence, e-commerce, etc.). More details about the project proposal and project submission will be provided on the course webpage later.

Grading Policies:

Final grades are based on performance in the homework assignments, exams and final project. Here is the distribution.

Final grades will be assigned relative to the performance of others in the class.

Accommodations Statement:

Students are encouraged to discuss any special needs or accommodations with the instructor as soon as they become aware of them. Those seeking accommodations based on disabilities should obtain a Faculty Letter from the Services for Students with Disabilities office (540-231-0858) located in Lavery Hall, Suite 310 (

Honor Code Statement:

All students must adhere to the Honor Code Policies of Virginia Tech. The Honor Code will be strictly enforced in this course. All assignments shall be considered graded work, unless otherwise noted. All aspects of your coursework are covered by the honor system. Any suspected violations of the Honor Code will be promptly reported to the honor system. Honesty in your academic work will develop into professional integrity. The faculty and students of Virginia Tech will not tolerate any form of academic dishonesty. See