Course Detail
Units: 3.0
Course Components: Lecture
Enrollment Information
Enrollment Requirement:
Prerequisites: "C-" or better in (CS 3500 AND CS 3190).
Description
Meets with CS 6140. Data mining is the study of efficiently finding structures and patterns in large data sets. We will focus on: (1) converting a messy and noisy raw data set to a structured and abstract one, (2) applying scalable and probabilistic algorithms to these well-structured abstract data sets, and (3) formally modeling and understanding the error and other consequences of parts (1) and (2), including the choice of data representation and trade-offs between accuracy and scalability. These steps are essential for training as a data scientist. Topics will include: similarity search, clustering, regression/dimensionality reduction, graph analysis, PageRank, and small space summaries. We will also cover several recent developments and applications.