This course brings the student to the forefront of signal processing, combining many practical results with a fundamental understanding of what is required to develop novel algorithms in speech recognition and in speech processing where the resulting signals are meant for listening, such as speech coding. Speech processing is covered in three parts: acoustics, psychophysics, and information processing.
The textbook is Speech Analysis Synthesis and Perception, Third Edition, by James L. Flanagan, Jont B. Allen, and Mark A. Hasegawa-Johnson, to be published by Academic Press (2009).
You will need a DjVu viewer to read and print many of the materials in this course. A description of DjVu can be found at djvu.org. DjVu is useful because it stores scanned bitmaps (e.g., old papers for which there is no text or PDF version) in a highly compressed format. It automatically detects images and photos and compresses them differently from text.
There are many versions of the reader software, all free, and any of them will work. Several are open source, including djview4. If you have trouble, please let either Prof. Allen or TA Reggie Weece know, and we will help you get started.
| L/W | D | Date | Lecture and Assignment |
|---|---|---|---|
| 1/1 | M | 8/25 | Self-introductions from each student. Lecture: Overview: Acoustics, Psychophysics, Information processing. Read: Flanagan, Chapt. 1 djvu |
| 2 | W | 8/27 | Lecture: Mechanisms of Speech production: Sounds of speech: Vowels and consonants; Chapter 2 djvu; Acoustic Transmission lines; Intensity, speech power; sound level; dB, dB-SPL, Pressure, volume velocity, impedance. Flanagan, Ch. 3 (pp 13-25) djvu, HW1: Basic Acoustics (due Wed 9/3/07) pdf |
| 3 | F | 8/29 | Lecture: Basic acoustics and the ABCD-Transmission (Chain) matrix; (Notes: 1-D Wave Equations); Flanagan, Ch. 2 (pp 15-17) djvu |
| -/2 | M | 9/1 | Labor Day Holiday -- No class |
| 4 | W | 9/3 | Lecture: Solution of 1-D wave equation, (Notes on: Waves and 1-Port Reflectance); Upgraded homework on 8/29/07 to better explain Helmholtz resonators: pdf Flanagan, Ch II (pp 17-22) djvu |
| 5 | F | 9/5 | Lecture: Vowels, Formants; Introduction to Matlab; filter design, bilinear Z, FIR, IIR, ...; Read: Flanagan, Ch. III (pp 36-43) djvu; HW2: TL and reflectance (due Fri 9/14) |
| -/3 | M | 9/8 | Read: Flanagan, Ch. III (pp 36-43) [pdf, original (2.8MB), djvu (.9MB)] |
| 6 | W | 9/10 | Lecture: Reflectance on 1-D transmission lines; Conversion tables for 2-ports (djvu); Read: Model of Ear drum, Parent and Allen 2007 (pdf, djvu) (pp 918-920) |
| 7 | F | 9/12 | Lecture: Transmission Lines with complex loads and the Propagated Reflectance (Notes: 2-Port Reflectance/Transmittance); HW3: TLs with complex terminations (due in two weeks on Fri 9/21; Extended to Wed 9/25); HW3 Refs: Guinan and Peake (djvu), Lynch et al. (pdf, djvu); Read: Flanagan, Ch. VI (Sect. 6.262, pp 272-276) [pdf, original, djvu]; Read: pp. 1-15 from Bilbao PhD Thesis (pdf, djvu) |
| 8/4 | M | 9/15 | Lecture 2: Radiation impedance of tube junction (Karal [Ref: djvu] correction), and half-sphere (the mouth); Read: Text Ch. 3, Sec. 3.3, pages 136-152 and pp. 1-15 from Bilbao PhD Thesis (pdf, djvu) |
| 9 | W | 9/17 | Lecture: 3-port networks and the nasal tract; Read: Text Chap. 3, sect. 3.4.1 pages 153-156 radiation impedance; Review: Rosowski, Carney and Peake (1988) on the cat middle ear (pdf, djvu) |
| 10 | F | 9/19 | Lecture: Signal processing review: Fourier Series, Fourier Transform, Laplace Transform, ZT, DTFT, DFT, FFT; HW4: Vocal-Tract Simulation (Due Mon 9/28, 2 weeks! Extended to: Wed Oct 3) PetersonBarney52.djvu |
| 11/5 | M | 9/22 | Lecture: Signal processing review, cont.; Network Postulates: djvu; Optional Read: Thevenin; Norton |
| 12 | W | 9/24 | Lecture: STFT window methods; Inverse STFT; STFT for speech processing with analysis/synthesis; filtering; Reading spectrograms |
| Not proofed beyond here | |||
| 13 | F | 9/26 | Lecture: History of acoustics: BC: Pythagoras; Aristotle; 17C: Mersenne, Marin; Galilei, Galileo; Hooke, Robert; Boyle, Robert; Newton; 18C: Bernoulli, D.; Euler; d'Alembert; 19C: Gauss; Laplace; Fourier; Lagrange; Helmholtz; Heaviside; Strutt, John William (Lord Rayleigh); 20-21C: Campbell, George; Hilbert, David; Noether, Emmy; Fletcher, Harvey; Nyquist, Harry; Bode, Hendrik; Dudley, Homer; Shannon, Claude; Flanagan, James |
| 14/6 | M | 9/29 | Lecture: Bernoulli's equation, and the glottis as an oscillator Text Flanagan pp 41-53, van den Berg (1957) (djvu) |
| 15 | W | 10/1 | Lecture: Review of HW2-3; Reading spectrograms, Features in speech; Read: Flanagan Sec. 3.74 pages 69-72 |
| 16 | F | 10/3 | Lecture: Linear prediction of speech; Allen on LPC: (pdf, djvu) Flanagan Sec. 8.112, pp 372-376; Sec. 8.13, pp 390-395; Atal and Hanauer (1971) (pdf, djvu) HW5: LPC (Due Mon 10/9), Speech samples |
| 17/7 | M | 10/6 | Lecture: Cepstral analysis I; Read: Flanagan Chap. 8, pp 361-363; (Continue working on Final-Section I) HW4 due |
| 18 | W | 10/8 | Lecture: Cepstral analysis II; CELP coding; Review for Exam I (In 2008, have the exam now!) |
| 19 | F | 10/10 | Lecture: Room acoustics; point source; 1, 2 and 6 wall Image method; Wall reflection coef. |
| 20/8 | M | 10/13 | No class due to Exam I HW6: STFT/OLA/Speech coding (Due Mon 10/16) |
| - | Tu | 10/14 | Exam I: Acoustics, Modeling the VT, STFT, Signal processing of speech; Time: Tues night (Oct. 9), 7PM EH 106B1 (across the hall from our normal classroom, and south by one door); one 8.5x11 crib sheet, two sides, handwritten. No computer-printed sheets! Leave cell phones at home. Calculators on the floor and off when you're not using them. |
| 21 | W | 10/15 | Lecture: Psychoacoustics I: Intensity JND and the near-miss; Internal noise model of the JND; Riesz pure-tone intensity JND (1928) djvu; Masking; Weber's and Fechner's Law; Introduction to loudness, Stevens' Law; Loudness Lecture notes (Allen): (pdf, djvu); Read: Flanagan Chapter 4 (pdf, original pdf, djvu) |
| 22 | F | 10/17 | Lecture: Psychoacoustics II: Frequency JND, semitone, Internal noise and Masking; relation between the intensity and frequency JND (Cochlear frequency response and the slope of the tuning curve); Read: Allen Review (pages 20-30) pdf, Fletcher and Munson (1933) (pp 82-94) |
| 23/9 | M | 10/20 | Lecture: Cochlear Physiology I: Middle ear and inner ear (Cochlear) anatomy, basilar membrane, 1D Models, Hair cells, Nonlinear basilar membrane; Read: Review of Cochlear Modeling: Part II (pp 19-28) (pdf, djvu); Wegel and Lane (1924); MIT/HST-725: The auditory system (pdf) |
| 24 | W | 10/22 | Lecture: Cochlear Physiology II: traveling waves, neural tuning curves, critical bands, hair cells, neural masking, Upward spread of masking; Forward masking; Auditory Pathway I: Neural Tuning (pdf); Read: Review of Cochlear Modeling: Part I (pp 1-19) (pdf, djvu); Supplement: Pitch: MIT HST725-5 Pitch models (pdf); Fletcher and Pitch (djvu) |
| 25 | F | 10/24 | Lecture: Cochlear Physiology III: Micromechanics, OHC, IHC; Lecture Notes: Modeling the Cochlea (pdf, djvu) and Organ of Corti (pdf, djvu); Read: Wegel and Lane (1924), Part II |
| 26/10 | M | 10/27 | Lecture: Psychoacoustics III: Relations between Psychophysics and the cochlea; Greenwood's place-map function; forward masking; upward spread of masking; Read: Fletcher and Munson (1933) (pp 82-94), Review of Cochlear Modeling: Part I (pp 1-19) (pdf, djvu) |
| 27 | W | 10/29 | Lecture: Cochlear Physiology IV: Nonlinear Cochlear model; supplement: The Auditory Nerve; Read: Representation of speech-like sounds ... auditory-nerve fibers (1980) djvu; HW7: (Due Fri 11/3) |
| 28 | F | 10/31 | Lecture: Cochlear Critical bands: upward spread of masking; 2 tone suppression; Read: French and Steinberg (1947) (pdf, djvu) (pp 90-100) |
| 29/11 | M | 11/3 | Lecture: Wickesberg, Auditory Pathway (ppt, pdf) M. Bosi: Lecture on Speech coding (pdf, djvu); [Refs.] |
| 30 | W | 11/5 | Lecture: Pandya: The auditory system (pdf) |
| 31 | F | 11/7 | Lecture: Allen Review for Exam II; |
| 32/12 | M | 11/10 | Exam II: NO CLASS Monday at 1PM Psychoacoustics, Physiology, Speech coding, Tube models, Some Historical items; one 8.5x11'' facts-sheet two sides, hand written, plus your facts-sheet from Exam I (two sheets total). Date: 11/5/2008 Time: 7:00 PM Place: Eng. Hall Room 106B6 |
| 33 | W | 11/12 | Lecture: Information theory I: Information, Entropy, Relative Entropy; Channel Capacity; DEMO: Audio samples from ASA CDROM; Review of exam; Read: pp 1-10 Shannon (1948) pdf, djvu |
| 34 | F | 11/14 | Lecture: Information theory II: Morse code example; Shannon Channel; HW8: Information processing (Due Mon 11/25); Materials AULI.zip; Read: pp 10-15 (up to Sec. 8) Shannon (1948) |
| 35/13 | M | 11/17 | Lecture: Information theory III: Entropy, Relative Entropy, Markov models, State diagram |
| 36 | W | 11/19 | Lecture: EM algorithm: Example: Speech and noise separation; Read: French and Steinberg (1947) [pdf], [djvu] |
| 37 | F | 11/21 | Read: Miller, Heise and Lichten (1951) [pdf], [djvu] |
| - | - | - | Thanksgiving Holiday (11/22-12/1) |
| 38/14 | M | 12/1 | Lecture: Human speech recognition (HSR) and the Articulation Index (AI) [pdf], [djvu]; The confusion matrix; Confusions between sounds; Read: Continue with French and Steinberg (1947); Read: Miller and Nicely (1955); Miller and Nicely confusions as a function of the articulation index; entropy, grouping and chance; Discussion of the EM alg. with more examples, and a discussion of my solution to HW8, problem 3; Read: Steeneken & Houtgast (1980) and modulation processing and detection: Houtgast (1989). HW9: Speech recognition (Due: Fri 12/1, >2 weeks) |
| 39 | W | 12/3 | Lecture: Effects of language and semantic context Miller (1962), Boothroyd (1988), Allen Overview: Introduction (2004) Lecture Notes: Events and the AI (djvu, ps.gz, pdf) |
| 40 | F | 12/5 | Lecture: Language context models, Bronkhorst93.djvu |
| 41/15 | M | 12/8 | Lecture: Allen more on context processing |
| 43 | W | 12/10 | Last class: Final overview of the course material, re your final exam. Free Pizza. |
| -/16 | M | 12/11 | Final Due on reading day, Dec 11. |
The final is a 15-25 page paper, written in the style and format (but single column) of a journal paper, that discusses everything that you have learned in this course. Writing style, spelling, figures, and figure labels are all part of the grade.
The final is graded against a list of all the topics that are covered. If there is a paragraph that discusses a topic on my list, you get at least 1 point for it, and if the discussion covers the topic effectively, you can get up to 5 points. There are at least 20 topics on the list. When you get to 100 points, you get an A+ on the exam. I expect you to draw on the homework as a starting point. Don't just dump the homework into the exam without modification; that won't get you points. Don't just dump in a large number of unexplained figures (that you got from someone else, for example) and expect to get points. I need words around each figure. I am looking for insightful comments that link the material together.
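The point arithmetic for the final can be illustrated with a short sketch. It is only an illustration: it assumes exactly 20 topics and whole-number points per topic, and the function name `final_score` is hypothetical. The actual topic list and scoring are the instructor's.

```python
def final_score(topic_points):
    """Sum per-topic points, clamping each topic to the 0-5 range (assumed rubric)."""
    return sum(max(0, min(5, p)) for p in topic_points)


# Hypothetical paper: 20 topics, each discussed effectively (5 points each).
full_marks = [5] * 20
# Hypothetical paper: 20 topics, each with only a bare paragraph (1 point each).
bare_minimum = [1] * 20

print(final_score(full_marks))    # 100
print(final_score(bare_minimum))  # 20
```

Under these assumptions, a paper that merely mentions each of 20 topics earns 20 points, while reaching 100 points takes an effective discussion of every one of them, or coverage of more topics if the list is longer than 20.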
Your comments on the relevance of each of the topics I covered in this course, homework problems, exams, etc., are welcome. No points will be taken off, nor given, for strong opinions on my teaching style, or lack thereof, organization, or lack thereof, etc. Please put all such comments in a discussion section at the end of the paper, isolated from the rest of the material.
| Instruction | Begins | Monday, August 25 |
| Labor Day | Monday, September 1 (no classes) | |
| Thanksgiving Vacation | Begins | Saturday, November 22, 1 p.m. |
| Instruction | Resumes | Monday, December 1, 7 a.m. |
| Instruction | Ends | Wednesday, December 10 |
| Reading Day | Thursday, December 11 | |
| Finals | Begin | Friday, December 12 |
| Finals | End | Friday, December 19 |