Course Detail
Units: 3.0
Course Components: Lecture
Enrollment Information
Enrollment Requirement: Prerequisites: "C-" or better in both CS 3500 and CS 3190.
Description
Data mining is the study of efficiently finding structures and patterns in large data sets. We will focus on: (1) converting a messy, noisy raw data set into a structured, abstract one, (2) applying scalable and probabilistic algorithms to these well-structured abstract data sets, and (3) formally modeling and understanding the error and other consequences of steps (1) and (2), including the choice of data representation and the trade-offs between accuracy and scalability. These steps are essential for training as a data scientist. Topics will include similarity search, clustering, regression/dimensionality reduction, graph analysis, PageRank, and small-space summaries. We will also cover several recent developments and applications.