Current group: comp.speech.users

Re: Mac OSX

From: deltagreen
Subject: Re: Mac OSX
Date: 13 Dec 2004 05:07:58 -0800
Whoever told you that you can get phoneme alignments and phoneme
confidence scores most probably misled you. As far as I know, you
can't do that with ViaVoice. But here is what you can do. Once a word
is recognized (regardless of whether it's command-and-control or
dictation), you will get the pronunciation, the score, and the start
and end times associated with the recognition (in a SmWord structure).
After that, you're on your own: you know when the word was said, how it
was pronounced, and what score it received.
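
To make "you're on your own" concrete, here is a rough Python sketch of
what one could do with that word-level information. The RecognizedWord
class and its field names are hypothetical stand-ins for the fields
described above, not the actual SmWord layout, and the even split is
only a crude placeholder, not a real phoneme alignment.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RecognizedWord:
    # Hypothetical record mirroring the fields described in the post;
    # this is NOT the real SmWord structure.
    spelling: str
    phonemes: List[str]   # pronunciation as a list of phoneme labels
    score: float          # word-level score reported by the engine
    start_ms: int         # start time of the word
    end_ms: int           # end time of the word


def uniform_phoneme_spans(word: RecognizedWord) -> List[Tuple[str, float, float]]:
    """Split the word's time interval evenly across its phonemes.

    This is only a first guess at phoneme boundaries; real phonemes
    differ widely in duration.
    """
    step = (word.end_ms - word.start_ms) / len(word.phonemes)
    return [
        (p, word.start_ms + i * step, word.start_ms + (i + 1) * step)
        for i, p in enumerate(word.phonemes)
    ]


# Made-up example values, for illustration only.
word = RecognizedWord("cat", ["k", "ae", "t"], 0.8, 1200, 1500)
spans = uniform_phoneme_spans(word)
```

Anything better than an even split would have to come from outside the
word-level result, which is the limitation described above.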

On a side note, there is nothing like a confidence score at the phoneme
level in today's speech recognition. A phoneme is always analyzed
in the context of other phonemes (through grammars or HMMs).
As a matter of fact, if you look at phonemes recognized individually in
words that you clearly speak, you will most probably be disappointed by
the low scores they generate. The word is recognized only because
most of its individual phonemes score above zero.
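
A toy Python illustration of that point, assuming per-phoneme scores
behave like probabilities that are combined in the log domain. The
numbers and the two candidate words are made up, and this is not a
description of how ViaVoice scores words internally.

```python
import math


def word_log_score(phoneme_probs):
    """Sum of log scores: a word's score is the product of its phoneme scores."""
    return sum(math.log(p) for p in phoneme_probs)


# Made-up per-phoneme scores for two competing words.
# Every score for "cat" is low in absolute terms, yet "cat" still wins
# because its competitor has one phoneme scoring close to zero.
candidates = {
    "cat": [0.30, 0.25, 0.20],
    "cut": [0.30, 0.01, 0.20],
}

best = max(candidates, key=lambda w: word_log_score(candidates[w]))
```

Here `best` is "cat", even though none of its phonemes scores above 0.30
on its own.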

I do not know anything about Tcl scripts.

Good luck!

Philippe Roy
Offshore speech developer based in South America (SAPI 5.1, LumenVox,
ViaVoice)
From: James Salsman
Subject: Re: Mac OSX
Date: Tue, 14 Dec 2004 03:29:04 GMT
Philippe,

Thank you for your reply:

>... here is what you can do. Once a word
> is recognized (regardless if it's command-and-control or dictation),
> you will get the pronunciation, the score and start and end times
> associated with the recognition (in a SmWord structure).

Do you know whether single-phoneme words will work in series to
get the phoneme alignments and, if so, whether any kind of
inter-word durational penalty is imposed?

>... if you look at phonemes recognized individually in
> words that you clearly speak, you will be most probably disappointed by
> the low scores they generate. The word is recognized only through the
> fact that most individual phonemes have a higher than null score.

Oh, I'm okay with normalizing non-normal distributions. Most
of the phone scores I deal with are lognormal, so they are easy.
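
For what it's worth, here is a minimal Python sketch of the kind of
normalization I mean, assuming the raw phone scores are positive and
roughly lognormal. The function name and the sample values are made up
for illustration.

```python
import math
import statistics


def normalize_lognormal(scores):
    """Convert positive, roughly lognormal scores to z-scores.

    Taking the log of a lognormal variable gives a normal one, so the
    usual mean / standard-deviation standardization applies afterwards.
    """
    logs = [math.log(s) for s in scores]
    mu = statistics.mean(logs)
    sigma = statistics.stdev(logs)  # sample standard deviation
    return [(x - mu) / sigma for x in logs]


# Made-up raw phone scores, for illustration only.
raw_scores = [0.02, 0.05, 0.11, 0.24, 0.50]
z_scores = normalize_lognormal(raw_scores)
```

After this step the scores have mean zero and unit variance in the log
domain, so low raw values are not a problem in themselves.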

Sincerely,
James
--
www.readsay.com - maker of the ReadSay PROnounce English literacy system
400 MHz PDA included: $499 -- http://www.readsay.com/PROnounce.html
   
