From: deltagreen
Subject: Re: Mac OSX
Date: 13 Dec 2004 05:07:58 -0800
Whoever told you that you can get phoneme alignments and phoneme confidence scores most probably misled you. As far as I know, you can't do that with ViaVoice. Here is what you can do instead. Once a word is recognized (whether in command-and-control or dictation mode), you get the pronunciation, the score, and the start and end times associated with the recognition (in a SmWord structure). After that, you're on your own: you know when the word was said, how it was pronounced, and what score it got.
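A minimal sketch of what you can work with at word level, assuming you copy the SmWord fields into your own record. The class `RecognizedWord`, its field names (including `spelling`), and the millisecond time unit are all hypothetical stand-ins, not the actual ViaVoice/SMAPI structure.

```python
from dataclasses import dataclass


@dataclass
class RecognizedWord:
    """Hypothetical stand-in for the per-word data ViaVoice reports in SmWord.

    Field names and the millisecond unit are assumptions, not the real API.
    """
    spelling: str
    pronunciation: str  # phone string for the recognized word
    score: float        # recognizer score for the whole word
    start_ms: int       # start time of the word in the audio
    end_ms: int         # end time of the word in the audio

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def describe(word: RecognizedWord) -> str:
    # Word level is as fine-grained as it gets: no phone boundaries or phone scores.
    return (f"{word.spelling} /{word.pronunciation}/ "
            f"{word.start_ms}-{word.end_ms} ms ({word.duration_ms} ms), "
            f"score {word.score}")


w = RecognizedWord("cat", "K AE T", 42.0, 1200, 1530)
print(describe(w))  # cat /K AE T/ 1200-1530 ms (330 ms), score 42.0
```

Anything finer, such as where each phone starts, has to be estimated by your own code from these word-level values.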
On a side note, today's speech recognition has nothing like a confidence score at the phoneme level. A phoneme is only ever analyzed in the context of other phonemes (through grammars or HMMs). In fact, if you look at the scores of phonemes recognized individually in words that you speak clearly, you will most probably be disappointed by how low they are. The word is recognized only because most of its individual phonemes have a nonzero score.
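A toy illustration of that last point, not how ViaVoice actually scores words: if a word hypothesis is scored as the sum of its phones' log-probabilities, it can win even though no single phone scores high, while one near-zero phone sinks a competing hypothesis. The per-phone probabilities below are made-up numbers.

```python
import math


def word_log_score(phone_probs):
    # Sum of log-probabilities: the hypothesis stays viable as long as
    # no phone is close to zero, even if every phone is individually weak.
    return sum(math.log(p) for p in phone_probs)


# Made-up per-phone probabilities for two candidate words over the same audio.
candidates = {
    "cat": [0.30, 0.25, 0.40],  # all phones mediocre, none near zero
    "cap": [0.30, 0.25, 0.02],  # last phone nearly zero
}

best = max(candidates, key=lambda word: word_log_score(candidates[word]))
print(best)  # cat
```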
I do not know anything about Tcl scripts.
Good luck!
Philippe Roy Offshore speech developer based in South America (SAPI 5.1, LumenVox, ViaVoice)
From: James Salsman
Subject: Re: Mac OSX
Date: Tue, 14 Dec 2004 03:29:04 GMT
Philippe,
Thank you for your reply:
> ... here is what you can do. Once a word
> is recognized (regardless if it's command-and-control or dictation),
> you will get the pronunciation, the score and start and end times
> associated with the recognition (in a SmWord structure).
Do you know whether single-phoneme words can be chained in series to obtain phoneme alignments, and, if so, whether any inter-word durational penalty is imposed?
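If chaining single-phoneme words does work, the phone alignment could be read straight off the reported start and end times. A minimal sketch, assuming each recognized "word" has already been reduced to a hypothetical `(phone, start_ms, end_ms)` tuple; the data are invented. It only exposes the timing the recognizer reports, including gaps between consecutive words, and cannot by itself show whether a durational penalty was applied.

```python
from typing import List, Tuple

# Hypothetical (phone, start_ms, end_ms) per recognized single-phoneme word.
Segment = Tuple[str, int, int]


def alignment_with_gaps(segments: List[Segment]):
    """Return (phone, start, end, duration, gap_before) for each segment."""
    rows = []
    for i, (phone, start, end) in enumerate(segments):
        # Gap between the previous word's end and this word's start.
        gap = 0 if i == 0 else start - segments[i - 1][2]
        rows.append((phone, start, end, end - start, gap))
    return rows


segs = [("K", 1200, 1290), ("AE", 1290, 1450), ("T", 1470, 1530)]
for row in alignment_with_gaps(segs):
    print(row)
```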
> ... if you look at phonemes recognized individually in
> words that you clearly speak, you will be most probably disappointed by
> the low scores they generate. The word is recognized only through the
> fact that most individual phonemes have a higher than null score.
Oh, I'm okay with normalizing non-normal distributions. Most of the phone scores I deal with are lognormal, so they are easy to normalize.
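One common way to normalize lognormal scores: take logs, which turns them into normally distributed values, then standardize. A minimal sketch, assuming per-phone reference scores (for example, from native speakers) are available to fit the parameters; James doesn't say how he does it, and the function names are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def fit_lognormal(reference: Sequence[float]) -> Tuple[float, float]:
    # Parameters of the underlying normal: mean and sample std of the log scores.
    logs = [math.log(s) for s in reference]
    return statistics.mean(logs), statistics.stdev(logs)


def normalize(score: float, mu: float, sigma: float) -> float:
    # z-score in log space, comparable across phones with different raw scales.
    return (math.log(score) - mu) / sigma


mu, sigma = fit_lognormal([1.0, math.e, math.e ** 2])
print(normalize(math.e ** 2, mu, sigma))  # approximately 1.0
```

Fitting `mu` and `sigma` separately for each phone keeps a naturally low-scoring phone from always looking like an error.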
Sincerely,
James
--
www.readsay.com - maker of the ReadSay PROnounce English literacy system
400 MHz PDA included: $499
http://www.readsay.com/PROnounce.html