11-756 / 18799D Design and Implementation of ASR Systems

11-756/18799D ASR: Assignment 3, DTW for isolated word recognition

Note: In this assignment you are encouraged to reuse much of the code you have already written. Your feature computation code can be used to derive features from the recordings. Your Levenshtein distance code can be modified with little effort to perform DTW. Only Problem 2 requires coding from scratch, and even there part of the procedure (the segmentation step of segmental K-means) can reuse the DTW code.

In this assignment you will record spoken digits many times and build DTW-based and HMM-based isolated word recognition systems. In total, you will need to record each of the digits 0, 1, ..., 9 ten times. Each recording must be isolated, with no more than half a second of silence before and after the digit. If you are working in a team, you may want to split the recording task across team members. The details of the problems are below.

Problem 1

The first problem is on DTW-based recognition.
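
As a rough illustration of how an edit-distance style dynamic program carries over to DTW, the sketch below replaces the 0/1 substitution cost with a distance between feature vectors and fills the trellis one test frame at a time. The allowed moves (stay on the same template frame, advance by one, skip one), the Euclidean frame distance, and the names dtw_cost and recognize_dtw are assumptions made for this example, not requirements stated in this handout.

```python
import math

def euclidean(a, b):
    # Distance between two feature vectors of equal length.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_cost(template, test):
    """Minimum cumulative frame distance of a path that starts at the first
    template frame and ends at the last one, consuming every test frame."""
    n = len(template)
    inf = float("inf")
    prev = [inf] * n
    for j, frame in enumerate(test):
        cur = [inf] * n
        for i in range(n):
            if j == 0:
                # Assumption: the path must begin at template frame 0.
                best = 0.0 if i == 0 else inf
            else:
                best = prev[i]                      # stay on template frame i
                if i >= 1:
                    best = min(best, prev[i - 1])   # advance by one frame
                if i >= 2:
                    best = min(best, prev[i - 2])   # skip one template frame
            cur[i] = best + euclidean(template[i], frame)
        prev = cur
    return prev[n - 1]

def recognize_dtw(templates, test):
    """templates maps a label to a list of feature sequences; the label of
    the lowest-cost template is returned."""
    scored = [(dtw_cost(t, test), label)
              for label, seqs in templates.items() for t in seqs]
    return min(scored)[1]
```

Only two columns of the trellis are kept in memory because the cost of a column depends only on the previous one. A backpointer table would be needed if the alignment itself, and not just its cost, were required.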

Problem 2

Use the segmental K-means procedure to train an HMM for each digit from the 5 "training" recordings you have for it. Assume that each state has a single Gaussian distribution and that the HMM for each digit has 5 states. Recognize the 5 test utterances using the HMMs and report the recognition accuracy.
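
One possible shape for the segmental K-means loop is sketched below: start from a uniform segmentation, estimate one Gaussian per state from the frames assigned to it, re-segment every training sequence with a Viterbi alignment, and repeat until the segmentation stops changing. Several simplifications are assumptions of this sketch and not part of the assignment text: diagonal covariances with a variance floor, a strict left-to-right topology without skips, no transition probabilities in the path score, and sequences that have at least as many frames as states. All function names are hypothetical.

```python
import math

def uniform_segmentation(num_frames, num_states):
    # Split the frames into contiguous, roughly equal segments, one per state.
    return [t * num_states // num_frames for t in range(num_frames)]

def estimate_states(sequences, alignments, num_states, var_floor=1e-3):
    """Mean and diagonal variance of the frames assigned to each state."""
    dim = len(sequences[0][0])
    params = []
    for s in range(num_states):
        frames = [x for seq, ali in zip(sequences, alignments)
                  for x, st in zip(seq, ali) if st == s]
        mean = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
        var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / len(frames),
                   var_floor) for d in range(dim)]
        params.append((mean, var))
    return params

def neg_log_gauss(x, mean, var):
    # Negative log density of a diagonal-covariance Gaussian.
    return 0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                     for xi, m, v in zip(x, mean, var))

def viterbi_align(seq, params):
    """Best state sequence that starts in state 0, ends in the last state,
    and either stays in a state or moves to the next one at each frame."""
    num_states, num_frames = len(params), len(seq)
    inf = float("inf")
    cost = [[inf] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]
    cost[0][0] = neg_log_gauss(seq[0], *params[0])
    for t in range(1, num_frames):
        for s in range(num_states):
            stay = cost[t - 1][s]
            move = cost[t - 1][s - 1] if s > 0 else inf
            if move < stay:
                best, back[t][s] = move, s - 1
            else:
                best, back[t][s] = stay, s
            cost[t][s] = best + neg_log_gauss(seq[t], *params[s])
    path = [num_states - 1]
    for t in range(num_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, cost[num_frames - 1][num_states - 1]

def train_hmm(sequences, num_states=5, max_iters=20):
    alignments = [uniform_segmentation(len(s), num_states) for s in sequences]
    for _ in range(max_iters):
        params = estimate_states(sequences, alignments, num_states)
        new_alignments = [viterbi_align(s, params)[0] for s in sequences]
        if new_alignments == alignments:
            break  # segmentation is stable, so the estimates will not change
        alignments = new_alignments
    return params

def recognize_hmm(models, seq):
    # models maps a digit label to the state parameters returned by train_hmm.
    return min((viterbi_align(seq, p)[1], label)
               for label, p in models.items())[1]
```

The alignment step has the same trellis structure as DTW, with the states playing the role of template frames and the negative log Gaussian density replacing the frame distance. A fuller version would also estimate transition probabilities from the segment lengths and add their negative logs to the path cost.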

Optional: Repeat the segmental K-means to train HMMs with mixtures of 2 and 4 Gaussians per state. This should improve recognition performance.
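
With more than one Gaussian per state, the per-frame state cost becomes the negative log of a weighted sum of Gaussian densities. A minimal sketch of that cost follows, assuming diagonal covariances and using the log-sum-exp trick to avoid underflow. The function names and the argument layout are hypothetical, and how the mixture parameters are obtained (for example by clustering the frames of each state) is left open here.

```python
import math

def log_gauss(x, mean, var):
    # Log density of a diagonal-covariance Gaussian.
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def neg_log_mixture(x, weights, means, variances):
    """Negative log likelihood of x under a Gaussian mixture.
    weights are assumed to be positive and to sum to 1."""
    logs = [math.log(w) + log_gauss(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    # log(sum(exp(l))) computed relative to the largest term for stability.
    return -(top + math.log(sum(math.exp(l - top) for l in logs)))
```

With a single component of weight 1 this reduces to the single-Gaussian cost, so the same alignment code can be used for 1, 2, or 4 Gaussians per state.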

Due: Wednesday, 6 Mar 2013