In 1998 the company began developing Windows-based dictation,
transcription, and speech recognition software in Crown Point, IN.
The company has served as an authorized reseller for Microsoft, IBM, Dragon
(Nuance), AT&T, Sony, and Olympus products. The company last developed
using the Microsoft Vista operating system. It closed its business and
development office in 2007 and has since operated as a virtual
company providing primarily consulting services. To obtain more
information, click the contact button above and complete the form.
Dictation and transcription software includes acWAVE™ audio file conversion for Sony and Olympus handheld recorders and other audio sources. CustomMike™ is a driver for the handheld Philips microphone. MacroBLASTER™ is a macro editor for a programmable keypad supporting voice, barcode, or keyboard commands. PlayBax™ runs with the Infinity transcriptionist foot pedal and Spectra headset for manual transcription and provides a small-footprint transcription window.
Speech Recognition Processing
. . . . More than a word processor™
The software includes an application programming
interface (API) and a software development kit (SDK). These can be used to develop
software add-ons that create, review, or edit a session file from
nonspeech audio, text, or image data. In 2007 the
company implemented the SpeechMax™
Microsoft Word Toolbar Add-In. Among other functions, the Add-In has Next/Previous
toolbar arrows to navigate to Word bookmarks, plus Import/Export toolbar functions that support annotation and phrase
migration wizards. These wizards support, for example, data migration between
a customizable form in SpeechMax™ and a Word document.
The annotation feature is located below the main document window. The dictating speaker, transcriptionist, speech recognition editor, or other user can annotate document window text with text and/or audio comments. Office staff, the speaker, a transcriptionist, or another user can also use the annotation feature to create a customized fill-in-the-blank form in the main document window, such as an employee information form (see below).
The location and order of the "blanks" are completely customizable. Different users can enter text and/or record voice data in the order preferred by a particular user, e.g., a dictating speaker. The type of data entry is customizable too, so that one user can enter data by voice and another by text.
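The per-blank entry modes and per-user fill order described above can be modeled with a simple data structure. The sketch below is illustrative only; the field names and types are assumptions, not the product's actual session file format.

```python
from dataclasses import dataclass

@dataclass
class FormBlank:
    """One blank in a fill-in-the-blank annotation form."""
    name: str
    entry_modes: tuple   # allowed input: ("voice",), ("text",), or both
    value: str = ""

def fill_order(blanks, preferred_names):
    """Return the blanks in the order preferred by a particular user,
    e.g. a dictating speaker who completes fields in dictation order."""
    by_name = {b.name: b for b in blanks}
    return [by_name[n] for n in preferred_names]

form = [
    FormBlank("Last Name", ("voice", "text")),
    FormBlank("First Name", ("voice", "text")),
    FormBlank("Employee ID", ("text",)),
]
# One user might fill the ID first by text; another might dictate names first.
ordered = fill_order(form, ["Employee ID", "Last Name", "First Name"])
```

The same blank definitions serve every user; only the traversal order and chosen entry mode differ.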
The last name "Streeter" has been entered at the bottom in the annotation window (first screen shot). It is associated with the first blank of the "Full Name" field (last name first). As shown in the second screen shot, the user entered the other data as annotations and uploaded them to the main document window. The process automatically removes the underlining as annotation text moves into the document window.
Demo #10 demonstrates use of the employee information form and data migration to Microsoft Word. Audio annotation (blue highlighting) supports text and/or audio entry. Text annotation (purple highlighting) supports text-only entry. The form includes instructions created as an audio annotation for playback by the user. Flash MP4
Demo #11 shows
form creation. Form creation generally involves
entry of a field name and creation of an audio annotation
within an otherwise empty session file segment. The video
also shows text entry with manual transcription or speech
recognition. Flash WMP MP4
Demo #13A illustrates the SpeechMax™ Microsoft
Office Toolbar Add-In with data
transfer between Microsoft Word and SpeechMax™.
The software enables a speaker to
dictate into Microsoft Word with Dragon speech recognition, transfer text and audio
data to SpeechMax™ for
correction, and migrate the final text into Microsoft Word or another
application. The process, for example, transcribes the audio using one or more speech engines for
server-based recognition that returns audio-tagged text after processing
of the entire file.
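"Audio-tagged text" means each transcribed phrase carries the audio offsets it came from, so an editor can play back exactly the audio behind any phrase. The minimal sketch below illustrates the idea; the field names are assumptions, not the product's session file schema.

```python
from dataclasses import dataclass

@dataclass
class TaggedPhrase:
    """One transcribed phrase tagged with its source audio range."""
    text: str
    audio_start_ms: int
    audio_end_ms: int

# A (hypothetical) session: phrases returned by server-based recognition.
session = [
    TaggedPhrase("patient presents with", 0, 1450),
    TaggedPhrase("acute chest pain", 1450, 2600),
]

def audio_span(session, phrase_index):
    """Audio range (in ms) to play back for one phrase during review."""
    p = session[phrase_index]
    return (p.audio_start_ms, p.audio_end_ms)
```

During correction, a reviewer who doubts the second phrase would replay only its tagged span rather than the whole recording.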
Text may be reviewed by a speech editor and/or
a session file editor.
Protocols for speech editor review include: (1) the speech editor reviews all audio and text, makes corrections, and returns the text to the speaker for final review and approval; (2) the editor reviews only text differences and the associated audio (as a difference indicates that one or both texts are incorrectly recognized); or (3) some variation of the above. Based upon company experience, highly accurate speech recognition returns text with about the same error rate as gold standard manual transcription. For critical documents, it is recommended that the dictating speaker review the entire document and make changes and/or request additional edits where appropriate before sign-off.
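The choice between full review and differences-only review can be expressed as a simple selection step. This is a sketch of the general idea under an assumed segment format (dicts holding the two engines' texts), not the product's actual workflow code.

```python
def segments_for_review(segments, protocol):
    """Select which session file segments a speech editor reviews.

    protocol "full": all segments (editor reviews all audio and text).
    protocol "differences": only segments where the two engine texts
    disagree, since a difference means at least one text is wrong.
    """
    if protocol == "full":
        return list(segments)
    if protocol == "differences":
        return [s for s in segments if s["text_a"] != s["text_b"]]
    raise ValueError(f"unknown protocol: {protocol}")

segs = [
    {"text_a": "the patient", "text_b": "the patient"},
    {"text_a": "underlying", "text_b": "underlining"},
]
to_review = segments_for_review(segs, "differences")
```

Under the "differences" protocol only the disagreeing segment reaches the editor; agreement does not guarantee correctness (identical misrecognition), which is why the speaker still reviews critical documents.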
In one approach, WordCheck™ synchronized text compare identifies
differences between the speech engine outputs and highlights them.
Nonhighlighted text indicates no detected text difference.
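A word-level synchronized compare of this kind can be sketched with the Python standard library's sequence matcher. This is a generic illustration of the technique, not WordCheck™'s implementation.

```python
import difflib

def compare_engine_outputs(text_a, text_b):
    """Word-level comparison of two speech engine outputs.

    Returns (word, flagged) pairs for engine A's text, where
    flagged=True marks words that differ from engine B's output
    and would be highlighted for speech editor review.
    """
    a_words = text_a.split()
    b_words = text_b.split()
    matcher = difflib.SequenceMatcher(a=a_words, b=b_words)
    result = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        for word in a_words[i1:i2]:
            result.append((word, op != "equal"))
    return result

flags = compare_engine_outputs(
    "the patient has an underlying condition",
    "the patient has an underlining condition",
)
```

Only the disagreeing word is flagged; the editor plays back its audio tag rather than re-listening to the whole dictation.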
The organization can use the time savings from SpeechMax™ to support other activities. For example, a speaker who previously was required to self-correct can spend less time correcting errors (as a speech editor has corrected most or all of them) and more time on his or her primary work, e.g., a doctor who treats patients. The same applies to other dictating speakers, such as lawyers, engineers, or business personnel.
The bar graph below shows estimated time savings using text comparison. The graph assumes that the speech editor reviews DIFFERENCES in FirstLook™ mode only. At the theoretical limit of 100% accuracy in all texts, there are no differences, and speech editor review time drops to zero. The dictating speaker still carefully reviews all text before signoff, primarily to detect errors, including identical misrecognition (where two or more speech engines make the same mistake).
At 90% accuracy, the speech editor generally will identify some differences requiring review. At this accuracy level, the speech editor's audio review time decreases by 80%. The dictating speaker still reviews the final text. With highly accurate speech engines, the speaker has few if any errors to correct.
As noted above, different speech engines sometimes make the same recognition error. These and
other errors become less frequent with improved accuracy. The process can use the SDK and API to create a database reliability
index that tracks previous identical misrecognitions. This index can
alert the editor and/or speaker if previously identically misrecognized text (e.g.,
"underlining" instead of "underlying") appears in transcribed text.
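A reliability index of this kind can be sketched as a lookup from previously misrecognized text to the words speakers actually said. The structure and names below are illustrative assumptions, not the product's API.

```python
# Hypothetical log of identical misrecognitions seen in past documents:
# text both engines produced -> words the speaker actually said.
known_misrecognitions = {
    "underlining": ["underlying"],
}

def reliability_alerts(transcript):
    """Return (word, possibly_intended) alerts for transcript words
    that match previously logged identical misrecognitions."""
    alerts = []
    for word in transcript.lower().split():
        token = word.strip(".,;:!?")
        if token in known_misrecognitions:
            alerts.append((token, known_misrecognitions[token]))
    return alerts

alerts = reliability_alerts("Remove the underlining cause of the error.")
```

Because identical misrecognitions produce no highlighted difference, this lookup is the only automated signal that such a word deserves a second listen.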
Total audio review time with text compare
depends upon how much audio the speech editor must review. The bar graph shows the expected decrease in audio
review time when reviewing text with highlighted differences.
The speech editor corrects the detected errors. With highly accurate speech recognition, text
returned to the speaker after speech editor review generally will have the
same, or about
the same, error rate as manual transcription. As described
above, when there are no misrecognitions and no differences for review,
speech editor review time drops to approximately zero. Recognition
errors are also not highlighted when the speech engines make the same mistake
(identical misrecognition). It is assumed, for purposes of this
graph, that each speech engine misrecognizes different words from those
misrecognized by the other speech engine. This assumption tends to
increase expected speech editor review time and reduce the projected
time savings. Consider first the case where each speech engine misrecognizes the same word
in a sentence as the other engine, but differently (e.g., the user
said "ball," the first engine transcribed "hall," and the second "wall").
The speech editor need only review text and audio for a single word. However, if the misrecognitions involve words in different
locations in the sentence, then word audio for two different locations
(two different audio tags) in the sentence must be reviewed. This
would increase editor audio review time. As
misrecognition errors may occur in the same word (position) in a
sentence, the graph would tend to underestimate the potential time savings.
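The graph's worst-case assumption can be stated as a one-line estimate: if the engines misrecognize different words, every error in either text produces a highlighted difference. This sketch is a simplified model of that reasoning, not the product's calculation.

```python
def flagged_fraction(err_a, err_b):
    """Upper-bound estimate of the fraction of words flagged for
    speech editor review when two engine outputs are compared.

    Assumes (as the graph does) that the engines misrecognize
    different words, so each error yields a highlighted difference.
    Identical misrecognitions produce no difference and are not counted.
    """
    return min(err_a + err_b, 1.0)

# At 90% accuracy per engine, up to 20% of words are flagged,
# i.e. roughly an 80% decrease in audio review time.
review_fraction = flagged_fraction(0.10, 0.10)
```

When errors land on the same word position, the flagged fraction is smaller than this sum, which is why the graph tends to understate the savings.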
The process reorders speech utterances (phrases) before manual transcription or
speech recognition editing. This is designed to limit
understanding of the document content as a whole by any one transcriptionist or
session file editor. The process can send all rearranged audio phrases
to a single transcriptionist, several reordered phrases to several
transcriptionists, or a single phrase to each of many transcriptionists.
After transcription, the transcribed phrases are rearranged into their
original sequence and the text distributed as a final report or other
document. Similarly, the process can scramble and divide multiple
speech recognition text-tagged utterances among different speech
recognition editors, rearrange the reviewed and corrected session file
segments into proper order, and return the session file to the speaker for
review and approval.
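The scramble-and-reassemble step can be sketched as follows: each utterance keeps its original index through the shuffle, so the transcripts can be restored to dictation order no matter how they were distributed. A minimal sketch under assumed names, not the product's implementation:

```python
import random

def scramble(utterances, seed):
    """Shuffle utterances for distribution to transcriptionists,
    keeping each one's original index so the transcripts can be
    reassembled later. Returns (original_index, utterance) pairs."""
    indexed = list(enumerate(utterances))
    random.Random(seed).shuffle(indexed)
    return indexed

def reassemble(transcribed):
    """Restore transcribed (original_index, text) pairs to the
    original dictation order and join them into one document."""
    ordered = sorted(transcribed, key=lambda pair: pair[0])
    return " ".join(text for _, text in ordered)

phrases = ["patient presents with", "acute chest pain", "radiating to the left arm"]
shuffled = scramble(phrases, seed=42)
# Each (index, audio phrase) pair would be transcribed separately;
# here the "transcripts" stand in for the returned text.
document = reassemble(shuffled)
```

No single transcriptionist needs the full sequence; only the reassembly step, which holds the index map, sees the document as a whole.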
Microsoft research also indicates that a nonadaptive speaker-specific approach is potentially more accurate than mainstream speaker-adaptive technology.
The increased accuracy in the bar graph from Microsoft research refers to an 8.6% error reduction for nonadaptive recognition compared to the adaptive technique with less than 8 hours of training data, and a 12.3% error reduction with greater than 15 hours of training. Word error rate (%) is about 1% lower for the nonadaptive speaker-specific model across different levels of speaker training data. The authors estimate a word error rate of about 7% with training data of 12,000 sentences for the nonadaptive technique, and 8% for the adaptive model with the same level of training data. Similarly, there is an improvement in relative error reduction of 45% and 52.2% for the nonadaptive approach compared to a speaker-independent model.
Price, terms, specifications, and availability are subject to change without notice. Custom Speech USA, Inc. trademarks are indicated. Other marks are the property of their respective owners. Dragon and NaturallySpeaking® are licensed trademarks of Nuance® (Nuance Communications, Inc.).