11-756 DESIGN AND IMPLEMENTATION OF SPEECH RECOGNITION SYSTEMS

Instructor: Bhiksha Raj, co-instructed by Rita Singh and Mosur Ravishankar

COURSE NUMBER

ECE: 18799D

LTI: 11756

LTI students can also register for this course as a lab course

Credits:	12
Timings:	4:30 p.m. -- 5:50 p.m.
Days:	Mondays and Wednesdays
Location:	GHC 4211

Prerequisites:

Mandatory: Linear Algebra. Basic Probability Theory.

Recommended: Signal Processing.

Coding Skills: This course will require significant programming from the students. Students must be able to program fluently in at least one language (C, C++, Java, Python, LISP, and Matlab are all acceptable).

This is a project-based course.

PROJECTS PAGE

Speech recognition systems draw on concepts from a variety of fields, including speech production, algebra, probability and statistics, information theory, linguistics, and various aspects of computer science. Speech recognition has therefore largely been viewed as an advanced science, typically meant for students and researchers who possess the requisite background and motivation.

In this course we take an alternative approach. We present speech recognition systems from the perspective of a novice. Beginning with the very simple problem of matching two strings, we present the algorithms and techniques as a series of intuitive and logical increments, until we arrive at a fully functional continuous speech recognition system.
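
As an illustration of the starting point named above, here is a minimal sketch of matching two strings by edit distance. It assumes Levenshtein distance with unit costs for insertion, deletion, and substitution; the function name `edit_distance` and the cost choices are assumptions made for this example, and the formulation used in the course projects may differ.

```python
def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance between two strings (assumed unit costs)."""
    # previous[j] is the cost of matching the already-processed prefix
    # of `source` against target[:j]. For an empty prefix, the cost is
    # j insertions.
    previous = list(range(len(target) + 1))

    for i, s_char in enumerate(source, start=1):
        # Matching source[:i] against an empty target costs i deletions.
        current = [i]
        for j, t_char in enumerate(target, start=1):
            # Diagonal move: free if the characters agree, else cost 1.
            substitution = previous[j - 1] + (s_char != t_char)
            # Vertical move: drop s_char from the source.
            deletion = previous[j] + 1
            # Horizontal move: add t_char to the source.
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current

    return previous[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

The table is filled one row at a time, so only two rows are kept in memory. The same kind of grid-based dynamic programming is what Dynamic Time Warping, listed in the schedule, applies to sequences of feature vectors instead of characters.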

Following the philosophy that the best way to understand a topic is to work on it, the course will be project oriented, combining formal lectures with required hands-on work. Students will be required to work on a series of projects of increasing complexity. Each project will build on the previous one, so that the added complexity of each project is small and manageable. By the end of the course, merely by completing the series of projects, students will have built their own fully functional speech recognition systems.

Grading will be based on project completion and presentation.


Date	Topic	Slides	Assignments	Notes
13 Jan 2010	Introduction	Slides	Assignment 1
20 Jan 2010	Feature computation	Slides	Assignment 2	Notes
25 Jan 2010	Dynamic Time Warping	Slides	Assignment 3, Assignment 4
22 Feb 2010	From DTW to HMMs	Slides
24 Feb 2010	HMMs for isolated words	Slides	Assignment 5
3 Mar 2010	Recognizing Continuous Speech	Slides
24 Mar 2010	BP tables and training from continuous speech	Slides	Assignment 6
5 Apr 2010	Subword units, parts 1 and 2	Slides	Assignment 7
19 Apr 2010	Ngram models, approximate decoding strategies	Slides	Assignment 8
21 Apr 2010	Approximate decoding strategies	Slides	Assignment 9