Course number: ECE 18799D; LTI 11756
Credits: 12
Timings: 4:30 p.m. to 5:50 p.m.
Days: Mondays and Wednesdays
Location: GHC 4102
Prerequisites:
Mandatory: Linear algebra; basic probability theory.
Recommended: Signal processing.
Coding skills: This course requires significant programming from students. Students must be able to program fluently in at least one language (C, C++, Java, Python, LISP, and MATLAB are all acceptable).
Voice recognition systems draw on concepts from a variety of fields, including speech production, algebra, probability and statistics, information theory, linguistics, and various aspects of computer science. As a result, voice recognition has largely been viewed as an advanced subject, typically reserved for students and researchers who possess the requisite background and motivation.
In this course we take an alternative approach: we present voice recognition systems from the perspective of a novice. Beginning with the very simple problem of matching two strings, we present the algorithms and techniques as a series of intuitive and logical increments, until we arrive at a fully functional continuous speech recognition system.
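As a rough illustration of that starting point, string matching is commonly posed as minimum edit distance computed by dynamic programming, and the same recursion carries over to sequences of feature vectors as dynamic time warping (DTW). The sketch below is a minimal example under our own assumptions, not the course's assignment code: `edit_distance` uses unit costs for insertion, deletion, and substitution, and the hypothetical `dtw` function uses a Euclidean local distance with horizontal, vertical, and diagonal steps, which is only one of several DTW variants.

```python
import math


def edit_distance(s, t):
    """Levenshtein distance: minimum number of insertions, deletions,
    and substitutions (each costing 1) needed to turn s into t."""
    n, m = len(s), len(t)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all of s[:i]
    for j in range(m + 1):
        d[0][j] = j  # insert all of t[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match or substitution
    return d[n][m]


def dtw(x, y):
    """Hypothetical DTW sketch: the same dynamic-programming recursion, but over
    feature vectors, with Euclidean distance as the local (substitution) cost."""
    n, m = len(x), len(y)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(x[i - 1], y[j - 1])  # requires Python 3.8+
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


print(edit_distance("kitten", "sitting"))  # 3
# The second sequence repeats one frame; DTW absorbs the repetition at zero cost.
print(dtw([[0.0], [1.0], [2.0]], [[0.0], [1.0], [1.0], [2.0]]))  # 0.0
```

The only change from strings to speech frames is the local cost: an exact-match test becomes a distance between vectors, which is why the two problems can be taught as consecutive increments.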
Following the philosophy that the best way to understand a topic is to work on it, the course will be project oriented, combining formal lectures with required hands-on work. Students will be required to work on a series of projects of increasing complexity. Each project will build on the previous one, so that the added complexity at each step is minimal and eminently doable. By the end of the course, merely by completing the series of projects, students will have built their own fully functional speech recognition systems.
In this edition of the course we will also introduce the theory of Weighted Finite State Transducers (WFSTs). In the latter half of the course, students will learn to build their own WFST systems and use open-source tools to compose their own WFST recognizers.
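To make "compose" concrete, here is a minimal sketch of WFST composition, assuming epsilon-free transducers and the tropical semiring (weights are costs that add along a path). The `WFST` class and `compose` function are hypothetical illustrations, not the API of any particular open-source toolkit; practical tools also handle epsilon labels and other semirings, which this sketch omits.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WFST:
    """Hypothetical minimal transducer representation (not a real library API)."""
    start: object
    finals: dict  # state -> final weight
    arcs: dict = field(default_factory=dict)  # state -> [(ilabel, olabel, weight, next_state)]


def compose(a, b):
    """Compose epsilon-free WFSTs a and b in the tropical semiring.

    Output labels of `a` are matched against input labels of `b`; each matched
    pair of arcs becomes one arc whose weight is the sum of the two weights.
    Only state pairs reachable from the start pair are constructed.
    """
    start = (a.start, b.start)
    result = WFST(start=start, finals={})
    queue = deque([start])
    seen = {start}
    while queue:
        state = queue.popleft()
        q1, q2 = state
        result.arcs[state] = []
        if q1 in a.finals and q2 in b.finals:
            result.finals[state] = a.finals[q1] + b.finals[q2]
        for i1, o1, w1, n1 in a.arcs.get(q1, []):
            for i2, o2, w2, n2 in b.arcs.get(q2, []):
                if o1 == i2:  # a's output must equal b's input
                    nxt = (n1, n2)
                    result.arcs[state].append((i1, o2, w1 + w2, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return result


# Toy example with made-up labels and weights: the first transducer maps
# "a" -> "x" and "b" -> "y"; the second accepts only "x" and rewrites it as "X".
first = WFST(0, {1: 0.0}, {0: [("a", "x", 1.0, 1), ("b", "y", 2.0, 1)]})
second = WFST(0, {1: 0.0}, {0: [("x", "X", 0.5, 1)]})
composed = compose(first, second)
print(composed.arcs[(0, 0)])  # [('a', 'X', 1.5, (1, 1))]
```

In a recognizer, the same operation chains, for example, a pronunciation lexicon with a grammar, so that a single search over the composed machine accounts for both.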
Grading will be based on project completion and presentation.
| Class | Date | Topic | Slides | Assignments |
|---|---|---|---|---|
| Class 1 | 23 Jan 2013 | Introduction | Slides | |
| Class 2 | 28 Jan 2013 | Data capture | Slides | |
| Class 3 | 30 Jan 2013 | Feature computation | Slides | Assignment 1 |
| Class 4 | 4 Feb 2013 | String matching | Slides | |
| Class 5 | 6 Feb 2013 | Dynamic time warping (DTW) | Slides | Assignment 2 |
| Class 6 | 11 Feb 2013 | Assignment 1 presentations | | |
| Class 7 | 13 Feb 2013 | From DTW to HMMs | Slides | |
| Class 8 | 18 Feb 2013 | HMMs, part 1 | Slides | |
| Class 9 | 23 Feb 2013 | Assignment 2 presentations | | Assignment 3 |
| Class 10 | 25 Feb 2013 | HMMs, part 2 | Slides | |
| Class 11 | 27 Feb 2013 | Recognizing continuous speech | Slides | Assignment 4 |
| Class 12 | 4 Mar 2013 | Grammars | Slides | |
| Class 13 | 6 Mar 2013 | Assignment presentations | | |
| Class 14 | 20 Mar 2013 | Assignment 4 presentations | | |
| Class 15 | 22 Mar 2013 | Backpointer tables; training from continuous recordings | Slides | Assignment 5 |
| Class 16 | 25 Mar 2013 | No class (instructor away) | | |
| Class 17 | 27 Mar 2013 | No class (instructor away) | | |
| Class 18 | 1 Apr 2013 | Assignment 5 presentations | | |
| Class 19 | 3 Apr 2013 | N-gram models | Slides | Assignment 6 |
| Class 20 | 8 Apr 2013 | N-gram models, contd. | Slides | |
| Class 21 | 10 Apr 2013 | Subword units | Slides | |
| Class 23 | 15 Apr 2013 | Subword units, contd. | Slides | |
| Class 24 | 17 Apr 2013 | Assignment presentations | | |
| Class 25 | 22 Apr 2013 | Tying states | Slides | Assignment 7 |
| Class 26 | 24 Apr 2013 | Inexact search | Slides | |
| Class 27 | 29 Apr 2013 | Lattices and rescoring | Slides | Assignments 8 and 9 |