[New site is under construction.
Currently, it presents nested content within the older site. The legacy
site header is displayed on the Home page and other pages, with the new header
below. Limited content from the old site is displayed on the current Home
page. The company has created new Contact, About, SpeechMax™, SpeechServers™,
and SweetSpeech™ pages. Clicking on new header hotspots will take you to a
newly redesigned page. The Support, Downloads, Resellers, Search, and
Translate pages remain the same.]
Software that enhances the user experience and
improves productivity, accuracy, and confidentiality.
Custom Speech USA, Inc. (CSUSA) is an integrator,
software developer, and reseller of a wide range of
speech solutions and development tools.
The company first provided training services for desktop users of Dragon
NaturallySpeaking speech recognition (SR).
The company later developed software for
business systems integration,
manual transcription (MT),
audio conversion, and
speech and language processing. Speech and language processing includes
real-time and server-based speech recognition (SR) and text to speech (TTS).
The company became a certified Nuance (Dragon)
reseller and a Microsoft Certified
Partner. It has used
Microsoft, Dragon, and IBM ViaVoice
speaker-adaptive speech recognition,
AT&T NaturalVoices, and ProNexus (telephony)
software development tools.
CSUSA software integrates with Microsoft
Word and other popular applications and hardware,
including Philips dictation
microphones, Sony and Olympus handheld
recorders, VEC Electronics (Infinity)
transcriptionist foot controls and
headsets, and X-Keys (P.I. Engineering) input devices.
The CSUSA approach has enabled customers to
use boxed SR software that they
already owned or to purchase less
expensive runtimes from CSUSA.
The company has worked with over 75 market partner resellers
in the U.S. and overseas.
Customers have included transcription
companies, physicians and hospitals, law firms, software
developers, businesses, insurance companies, schools and
universities, law enforcement, and government.
For a selected customer list, click
The company has received positive comments from many customers:
"We selected their . . .
running with Dragon NaturallySpeaking . . . . We
recommend it to anyone looking for cost-effective
software for server-based speech recognition."
Robert Duffy, Programmer, Columbia University Medical
Center, New York, NY
"After implementation of web transfer server and web
services for transcription,
there was no significant interruption in service for the
doctors or transcriptionists.
The typing ladies think the new server is wonderful . .
. . "
Ian Yates, Medical I.T. Ltd., Queensland, Australia
"Since installing your product,
we have processed over 1,000,000
conversions (on one
server alone) without a problem. . . . Thank you for . . . ."
Scott D. Stuckey, ASP Product
Manager, Voice Systems, Inc., Tampa, FL
Various publications have reviewed CSUSA
software, including Speech Strategy
News, Law Office Computing, Law
Technology News, TechnoLawyer,
Proceedings of Australasian Technology
Workshop (2005), and Speech in the User
Interface: Lessons from Experience.
- Document window supports read/write text, audio, or image data
- Training datasets are important for accurate SR; enrollment scripts are often unrepresentative of daily speech
- Dictation audio is often discarded as a useless byproduct of transcription
- SR systems segment and time stamp the same dictation audio differently
- Other speech to text conversion variables (e.g., acoustic model) differ
- Different SR text for the same audio indicates higher risk of misrecognition
- SR text matches indicate increased likelihood of accuracy
- Even with high SR accuracy, misrecognitions may occur in every sentence
- Editors often listen to mostly correctly recognized speech
- Confidence scores can be misleading as to accuracy
- Especially true when comparing output from different SR systems
- SR misrecognizes, adds, or omits words, but never misspells them
- With improved SR accuracy, editors need ways to quickly "spot" potential errors
- Similar issues arise with other text, audio, and image pattern recognition and can benefit from availability of comparison of results
CSUSA developed software to support languages
and dialects underserved by mainstream
SR, to increase SR accuracy in potentially
"noisy" environments such as a home living
room or car, and to increase low SR accuracy
for meetings, interviews, or videos.
1. Reduce SR
editor audio review time up to 90% with 90% accurate SR . . . .
Use text compare to
highlight differences from two
SR engines and direct the editor to potential errors rather
than listening to audio from likely accurate text.
Time savings from decreased audio review increase with
more accurate SR. With more accurate SR and fewer
errors, there are fewer differences and less audio for
the SR editor to review. Expected document accuracy is
about the same as from gold standard manual transcription.
2. Improve SR accuracy
with a nonadaptive speaker-specific
user profile compared to conventional adaptive
SR . . . . Use company text compare and other
tools to create highly accurate individual speech user
profiles from dictation, everyday conversational speech,
meetings, video audio, and other speech for individuals
or small groups. Improved SR accuracy enhances the
speaker experience and reduces the expense of human review.
3. Selectively modify SR session file document text or audio tags;
this supports speaker B's voice correction of speaker A's
session file text and multi-speaker collaborative document creation . .
. . The company's document window and
annotation (comment) desktop features permit one or more
users to make an unlimited number of text or audio
annotations to document text to modify session content
without corruption of speaker A's or speaker B's profile.
4. Protect confidentiality of
automatically or manually processed session file
data during outside processing . . . . Redact confidential text or
speech, divide session file segments into two or more
groups, and scramble session file segments within
each group before outside processing.
Offsite and remote storage and processing of electronic
data raise issues about individual privacy and
confidentiality. The redaction, division, and scramble
techniques provide an approach that balances privacy
concerns and the need for efficient speech editing.
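The review-time arithmetic behind value proposition 1 can be sketched in a few lines. This is an illustrative back-of-envelope model, not CSUSA code; the `review_minutes` helper and the assumption that roughly 10% of segments get flagged at 90% accurate SR are hypothetical figures for the example.

```python
# Hypothetical sketch (not CSUSA code): estimating editor audio review
# time when text compare flags only differing segments for playback.

def review_minutes(total_audio_min: float, flagged_fraction: float) -> float:
    """Minutes of audio the editor must replay, assuming only
    flagged (differing) segments need listening."""
    return total_audio_min * flagged_fraction

# Illustrative assumption: with ~90% accurate SR, suppose ~10% of
# segments differ between the two engines and get flagged.
full_review = review_minutes(60.0, 1.0)      # listen to everything
compare_review = review_minutes(60.0, 0.10)  # flagged segments only
savings = 1 - compare_review / full_review   # -> up to 90% reduction
print(full_review, compare_review, savings)
```

As SR accuracy rises, the flagged fraction shrinks, which is why the document claims review time approaches zero as accuracy approaches 100%.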
Major Speech and Language Software
Company software integrates with Dragon, IBM, and Microsoft
Windows SASR and AT&T NaturalVoices.
Company speech recognition and text to
speech software are Microsoft
Windows SAPI 5.x compliant.
Potential exists for integration with
other SAPI 5.x compliant SR and TTS software.
For more information
on version and operating system requirements, click
SpeechMax™ is a multilingual, multiwindow HTML desktop speech-oriented
session file editor
for real-time and server-based SR and
other speech and language processing.
It supports standard formatting options,
spell-check, macros (for word expansion
or other purposes), "undo" and "redo,"
disaster recovery, and custom style
sheets. It is an alternative to
the user interface provided by speech
recognition software. Text differences between SR software indicate
higher risk of misrecognition. Matches
indicate likelihood of greater
reliability. The user can open two or more windows to
compare documents. The software
improves voice correction of another
speaker's SR in collaborative documents
and helps protect privacy and
confidentiality with remote, off-site SR
editing. It reads/writes a proprietary session file format.
- Dictation record, audio playback, and manual transcription
- Speaker-side real-time dictation and desktop server-side SR
- Speaker or editor correction in audio-aligned document window with text and annotation (comment) window
- Text compare across document or synchronized by phrase (utterance)
- Correct errors and save verbatim text session file for SR training
- Transcribe presegmented dictation to create SR training data
- Compare results with other audio, text, or image pattern recognition
A 2002 review described TurboTranscribe™ and VerbatiMAX™
text compare functionality using Dragon
NaturallySpeaking and IBM ViaVoice.
. . . to Custom Speech USA's SpeechMax,
the days of having to choose between
Dragon NaturallySpeaking and IBM
ViaVoice have come to a close.
Operating on the . . .
principle, a companion application
called SpeechServers runs your
dictation through both
NaturallySpeaking and ViaVoice.
SpeechMax then compares the two
results, and enables you to correct
the text far more rapidly than you
could when using NaturallySpeaking
or ViaVoice alone. SpeechMax can
display a split screen containing
the transcription from each program,
and you can quickly select the text
from Dragon or IBM that is correct
for the final version.
To speed up the correction process,
technology highlights the likely
errors, and enables you to playback
specific portions of the original
audio by simply selecting text.
(Using the optional VerbatiMAX
technology, you can also compare
manually transcribed text to speech
recognition text to generate
verbatim text for automated speech
training.) After correcting
the text, you can send it and the
accompanying audio back to
SpeechServers for automated,
repetitive training to further
improve recognition accuracy. . . .
Pascoe, "Two Speech Recognition
Programs Are Better Than One," TL
Newswire (June 5, 2002)
Diagram shows "dual-engine" text compare.
SpeechMax™ multidocument text compare supports (1)
text compare of
manual or automated pattern recognition processing of
synchronized source audio, text, or image data and (2)
presegmentation before manual transcription,
speech recognition, and other speech and language processing.
Value proposition (SpeechMax™):
This software is a session
file processor for audio, text, and image processing.
Among other benefits, it provides
significant value in three ways:
1. It can synchronize
text from different speech recognition systems.
After synchronization, it can highlight differences from the same
audio (increased risk of misrecognition) and matches
(increased likelihood of reliability). This helps the
operator detect potential SR errors more rapidly.
It also helps the process more quickly generate data
sets to train speech recognition, sometimes without need
for human supervision. Similar techniques apply to
other speech and language processing, as well as other
audio, text, or image pattern recognition.
2. Software supports
selective modification of SR text or audio tags.
This enables a second speaker to voice correct the SR text
created by a first speaker within a speech-generated
multispeaker collaborative document. The process uses the
first and second speakers' audio and corrected text to train
the respective speech profiles for both users without
corruption of the user profiles.
3. Software provides privacy
and confidentiality protection
for remote, offsite editing of SR documents with speech and text
redaction, document division, and scrambling of session file data.
Synchronized Text Compare
1. Using output from two or more SR
programs, automated wave analysis identifies identical
start/end SR session file text arising from the same audio.
A retagging and resegmenting algorithm creates an
identical number of synchronized segments in each
session file. The operator may text compare by segment
(usually a short phrase) or use traditional text compare
across the entire document.
2. By tabbing to
differences, listening to audio, and correcting text, the
speech editor can more rapidly correct output.
This typically has about
the same error rate as gold standard manual
transcription (about 5% or less).
3. Text compare
supports an expected 80% reduction in
speech recognition editor audio review time for 90% accurate SR.
Audio review time using text compare is expected to approach zero as
SR accuracy approaches 100%, resulting in significant
savings. Text compare error-spotting technology is also supported for
other audio, text, or image pattern recognition.
It also supports text compare of
manual processing of delimited source data.
4. Software can compare synchronized
session file text results from virtually any source or processing
method. Synchronization requires the same number of
segments, not identical data content.
5. Nontext results can also be displayed and synchronized
using main document windows and/or an annotation feature
that can open files, websites, or run programs (e.g., media players).
6. Matched text is more likely
accurate and can be used for "unsupervised" training of a speech user profile
without manual verification. The process is
supported, for example, whether SR uses Hidden Markov models, Gaussian mixtures,
neural networks, or other SR methodology.
For more information on text compare, click
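The segment-synchronized compare described above can be sketched in a few lines. This is a hypothetical illustration, not the SpeechMax implementation; the `compare_segments` function and the sample phrases are invented for the example, and real comparison would work on audio-tagged session file segments rather than bare strings.

```python
# Hypothetical sketch of segment-synchronized text compare: given outputs
# from two SR engines that have been resegmented into the same number of
# phrase segments, flag the segments whose text differs so an editor can
# jump straight to them instead of replaying all the audio.

def compare_segments(engine_a: list[str], engine_b: list[str]) -> list[int]:
    """Return indices of segments where the two engines disagree.
    Synchronization requires the same number of segments."""
    if len(engine_a) != len(engine_b):
        raise ValueError("session files must have the same segment count")
    return [
        i for i, (a, b) in enumerate(zip(engine_a, engine_b))
        if a.strip().lower() != b.strip().lower()
    ]

a = ["the patient was seen", "on may fifth", "for follow up"]
b = ["the patient was seen", "on may ninth", "for follow up"]
print(compare_segments(a, b))  # -> [1]: only segment 1 needs audio review
```

Matching segments (indices 0 and 2 here) are the "increased reliability" text that could feed unsupervised profile training; differing segments are the editor's work queue.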
Selective modification of session file text and audio tags,
including voice correction by a second speaker of a first
speaker's speech recognition text
7. Program supports selective modification of speech
recognition text or audio tags. This feature
supports voice correction in collaborative
documents and training of user profiles
with modified text-audio annotation pairs. The process
has other applications, including creation of speech
user profiles for robotic speech using synthetic speech.
For more information
on multispeaker correction in a
collaborative document and other topics, click
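One way to picture selective modification of text and audio tags is a segment record whose text can be replaced while its audio offsets stay untouched. The `Segment` structure below is hypothetical (the actual CSUSA session file format is proprietary); it only illustrates why a corrected text-audio pair remains usable for profile training.

```python
# Hypothetical data-structure sketch: each session file segment pairs
# text with an audio tag (start/end offsets). Correcting the text
# replaces only the text field, so the audio alignment survives and the
# corrected pair can later train the dictating speaker's profile.

from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Segment:
    text: str
    audio_start_ms: int
    audio_end_ms: int
    speaker: str  # e.g. "A" dictated; "B" may voice correct the text

def correct_text(seg: Segment, verbatim: str) -> Segment:
    """Swap in corrected text while keeping the audio tag intact."""
    return replace(seg, text=verbatim)

seg = Segment("the patient has a history of", 0, 2300, "A")
fixed = correct_text(seg, "the patient had a history of")
print(fixed.text, fixed.audio_start_ms, fixed.audio_end_ms)
```

Because speaker B's correction never touches speaker A's audio tag, each profile trains only on its own speaker's audio, which is the "no profile corruption" property the document describes.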
Privacy and confidentiality protection with speech and text
redaction and session file division and scramble
8. Software limits
access to content during the correction and editing
process. It uses selective audio and text redaction, division of
session files into two or more groups, and scrambling of
segments within the different groups. After
processing, the censored, redacted material is typically
restored, and session file segments are merged and
unscrambled. The process strikes a balance
between privacy concerns and maintaining efficient
speech editing of SR and other transcription.
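The divide-and-scramble idea can be sketched as a reversible shuffle. This is an illustrative sketch, not CSUSA's algorithm: real group assignment and redaction would be policy-driven, and the `seed` here simply stands in for whatever key lets the originator restore order.

```python
# Hypothetical sketch of divide-and-scramble: split session file segments
# into two or more groups, shuffle each group's order before sending it
# out for editing, and keep the original indices as the key that restores
# document order afterward. No outside editor sees the whole document in
# sequence.

import random

def scramble(segments: list[str], n_groups: int = 2, seed: int = 7):
    """Return scrambled groups; each entry keeps (original_index, text)
    so the originator can unscramble later."""
    rng = random.Random(seed)
    indexed = list(enumerate(segments))
    rng.shuffle(indexed)
    return [indexed[i::n_groups] for i in range(n_groups)]

def restore(groups) -> list[str]:
    """Merge groups and put segments back in original document order."""
    merged = [pair for group in groups for pair in group]
    return [text for _, text in sorted(merged)]

doc = ["seg0", "seg1", "seg2", "seg3", "seg4"]
assert restore(scramble(doc)) == doc  # round-trips to original order
```

Redacted material would be stripped before `scramble` and reinserted after `restore`, matching the document's "restore, merge, and unscramble" step.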
For more information
on privacy and confidentiality protection, click
Productivity-enhancing features
9. Session file editor implements other
productivity-enhancing features. These include synchronized
speech/text for speech editor playback; automatic
reassignment of word audio tags after session file
correction; rapid creation of verbatim text with
verbatim annotation tab; creation of separate
audio-tagged final (distribution) and verbatim text
(training) session files; and rapid identification of
document audio without
having to play the entire audio file.
10. Software provides a
single, speech-oriented graphical user interface to
process, compare, or report synchronized results from a
variety of source input generated by computers, humans, or both.
For more general information on
the product, click
SpeechServers™ supports server-based SR so that a speech engine can be
centralized and output used at several PCs. Desktop
audio file autotranscribe is also available. Software outputs audio-linked text
in original Dragon or other proprietary format. It
also optionally converts to CSUSA session file format
(.SES). Software can also output text (.TXT).
- Speech user profile enrollment for Dragon and IBM SR
- Speech user profile training for Dragon and IBM SR
- Server-based transcription for Dragon, IBM, and SweetSpeech™
- Presegmentation of dictation audio before manual transcription
- MT transcribes segmented audio to create transcribed session file
- Transcribed session file for user profile training
- SweetSpeech™ profile for single speaker (speaker dependent)
- SweetSpeech™ small-group or large-group speaker profile
1. Software supports automated speech user profile training
and nonadaptive SR, server-based transcription, and
audio file presegmentation.
2. For adaptive SR, CommandProfile™
service enrolls and creates a speech user profile with verbatim
text and an audio file that is characteristic of the speaker's
real speech, such as day-to-day dictation audio. The
service provides repetitive, iterative corrective
adaptive training. It transcribes the audio file, compares
output with verbatim text, corrects text through the
correction window, and retranscribes until correct or . . . .
SpeechSplitter™ service presegments dictation
speech into an
audio-linked untranscribed session file (USF). Manual transcription results in
an audio-linked verbatim, training transcribed session
file (TSF). The same process applies to other speech, such
as recorded conversation, video speech, or professional
voice talent reading for an audio book.
SR uses the verbatim audio-linked TSF or other audio-linked
pairs to train or update the speech user profile.
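Presegmentation into utterance-sized chunks can be illustrated with a simple silence-based splitter over frame energies. This is a toy sketch, not SpeechSplitter internals; the `presegment` function, threshold, and minimum-silence length are all invented for the example.

```python
# Hypothetical sketch of dictation presegmentation: split an audio stream
# into utterance segments at runs of low-energy "silence" frames, yielding
# the phrase-sized chunks a transcriptionist would type against in an
# untranscribed session file (USF).

def presegment(frames: list[float], threshold: float = 0.1,
               min_silence: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) frame-index pairs for each speech segment;
    end is exclusive."""
    segments, start, silence = [], None, 0
    for i, energy in enumerate(frames):
        if energy >= threshold:
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_silence:       # long pause ends the segment
                segments.append((start, i - silence + 1))
                start, silence = None, 0
    if start is not None:                    # flush trailing speech
        segments.append((start, len(frames)))
    return segments

frames = [0.5, 0.6, 0.0, 0.0, 0.0, 0.7, 0.8, 0.9]
print(presegment(frames))  # -> [(0, 2), (5, 8)]
```

Each (start, end) pair becomes one audio-linked segment; after manual transcription, the text-audio pairs form the training TSF described above.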
6. Servers provide
speech recognition output in the form of audio-linked
session files or a simple text file. The
audio-linked session file may represent a
manufacturer-specific format (such as .DRA for older
Dragon) or the common session file format developed by CSUSA.
7. System is Microsoft
Windows SAPI 5.x compliant but is otherwise independent
of SR speech to text conversion techniques. For
example, the system can process speaker-adaptive (SA) or
nonadaptive SR.
For more general information on
the server-based processing, click
SweetSpeech™ is a speech engine and model builder for
speaker-dependent nonadaptive SR. It also supports
speaker-dependent small-group profiles for
meetings, legal proceedings, or videos. It is
designed as a "do-it-yourself" (DIY) toolkit for
transcription companies, business, law, health care, and
government.
Companies and other users can create speaker models from
day-to-day transcription or other speech data.
The speech and language data associated with these
models is accessible to the toolkit user. It is
available for use with other speech
and language processing, such as voice commands,
interactive voice response for telephony, speaker ID,
text to speech, phoneme generation, machine translation, audio mining, or natural
language understanding. Software is Unicode
compatible and supports unilingual, bilingual, or
multilingual speech user profiles.
- Nonadaptive (NASR) approach compared to mainstream adaptive SR
- Profile creation without MAP, MLLR, or similar mathematical approximation
- Creation of a single user profile with speaker's dictation or other speech
- Small-group or large-group user profiles based on conversational, dictation, meeting, or interview speech
- Profile tuned to speaker's speaking style and word use
- Support for microphone, handheld recorder, telephone/cell, video, or other recording sources
- Profile reflecting recording device, background, or channel noise
1. Software includes
automatic TTS pronunciation generator for phonetics,
linguistic question generation, and other tools for
automating production of speech user profiles and reducing the need for
expensive lexical expertise.
2. No microphone speaker
enrollment or corrective adaptation is required. With
this system, the user can create a nonadaptive SR user profile from
dictation, conversational, or other speech.
3. Process may generate training datasets from manual
transcription or extraction from transcribed SR.
4. Automatic tying/untying of
states for creation of and updates to the speech user profile using
accumulator and combiner techniques.
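The accumulator and combiner pattern in item 4 can be illustrated for a single Gaussian: each chunk of training data yields sufficient statistics that simply add together, so the model can be (re)estimated incrementally or in parallel. This is a textbook sketch, not SweetSpeech code, and real acoustic-model training operates on HMM state occupancies rather than raw values.

```python
# Hypothetical sketch of the accumulator/combiner pattern used in
# acoustic-model training: each worker accumulates sufficient statistics
# (count, sum, sum of squares) over its share of the data; a combiner
# merges them so the Gaussian mean/variance can be re-estimated as if all
# data were processed at once.

def accumulate(values: list[float]) -> tuple[int, float, float]:
    """Sufficient statistics for a 1-D Gaussian over one data chunk."""
    return (len(values), sum(values), sum(v * v for v in values))

def combine(*accs: tuple[int, float, float]) -> tuple[int, float, float]:
    """Merge statistics from several chunks by simple addition."""
    return (
        sum(a[0] for a in accs),
        sum(a[1] for a in accs),
        sum(a[2] for a in accs),
    )

def estimate(acc: tuple[int, float, float]) -> tuple[float, float]:
    """Re-estimate mean and (biased) variance from combined statistics."""
    n, s, sq = acc
    mean = s / n
    return (mean, sq / n - mean * mean)

chunk1, chunk2 = [1.0, 2.0, 3.0], [4.0, 5.0]
mean, var = estimate(combine(accumulate(chunk1), accumulate(chunk2)))
print(mean, var)  # same result as estimating over all five values at once
```

Because combining is plain addition, new speech data can update an existing profile without reprocessing the old data, which is what makes ongoing profile updates cheap.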
5. A 2004 article
evaluated a nonadaptive single-speaker SR user profile
with less than 8 and over 15 hours of training data.
This resulted in improved accuracy of about 1% compared to adaptive
SR. It also showed higher relative error reduction for nonadaptive SR
compared to speaker-independent and adaptive SR, and less rapid saturation of the acoustic
model compared to a speaker-adaptive system.
For more information on this article's findings, click
6. Company software
supports creation of nonadaptive small-group profiles for meetings,
interviews, or video transcription. Intended use
is for a group of two or more speakers that meets
frequently over longer periods of time. The group
profile represents the range of speech characteristics of
the different speakers. In addition, the software supports
creation of separate speaker-dependent speech user
profiles for each speaker.
For more information on the
software features, click