Practical Solutions for Common Tasks

The company's speech and workflow software integrates with widely-used
software and hardware products. 

U.S. patents and patent pending 

Company began operation in 1998, closed main development office in 2007,
and now operates as a virtual company.  Company last developed software using Microsoft Vista operating system in 2007.  

To request specific product information, see Contact. 

Partial Customer List

U.S. and overseas purchasers have included solo transcriptionists, transcription companies, physician offices, medical clinics, medical examiner offices, hospitals, imaging centers, medical billing services, medical office software suppliers, law firms and public defender offices, marketing services, a flow systems equipment manufacturer, a networking equipment manufacturer, a consumer product manufacturer, municipalities, and law enforcement and sheriff offices.

Trade Shows and Meetings

ABA Law Practice Management Section
American Association for Medical Transcription (AAMT)
Annual Conference New Hampshire Medical Group Management Assn
Annual Neuroimaging Symposium at the Barrow Neurological Institute
IBM Annual Conference for Speech Recognition Dealers
Legal Tech Chicago
Medical Transcription Industry Association (MTIA)
Northern Illinois Physicians for Connectivity
Northwestern University Medical Informatics
OPEN MRI 2000 Conference
Radiological Society of North America Annual Meeting

Products and Services

*Indicates not sold separately

Speech and Other Pattern Recognition

SpeechMax multilingual, multiwindow session file editor
SpeechServers™ speech recognition user profile training
SweetSpeech™ model builder, speech engine


Workflow

Command™ workflow manager
Command Call Center for telephone dictation
PathPerfect™* workflow diagram editor ("drag and drop where you want the files to go")

Desktop Utilities

CustomMike™ driver/configuration for Philips handheld microphone
PlayBax™ driver for Infinity transcriptionist foot pedal and Spectra transcriptionist headset
Audio conversion for Sony, Olympus, and other audio file formats
MacroBlaster™ macro editor for programmable keypad, bar code device, or voice recognition 

Other Software

TTSVoice™ text to speech add-on for AT&T text to speech
SR Autotranscribe* desktop, server-based speech recognition for Microsoft Vista

Services have included programming, configuration, installation, and consulting.

Dragon, IBM, and Windows SR Compatibility/OS Support

Company developed Microsoft Windows compatible software.  Latest software versions are compatible with Windows XP with limited testing with Vista or later Windows operating systems. 

Company uses Microsoft SAPI 5.x for SweetSpeech™, Microsoft (Vista), Dragon (Nuance), and IBM speech recognition, and for Microsoft and AT&T Natural Voices text to speech.  As of 2007, IBM no longer supported ViaVoice speech recognition. 

Server-based and/or real-time speech recognition software has supported Dragon Professional, Medical, and Legal 10, Dragon Preferred 10, IBM ViaVoice Professional 10, Windows Vista SR, and this company's nonadaptive SweetSpeech™.  System supports SAPI 5.x text to speech, including Microsoft and AT&T Natural Voices.  System may run with Dragon Professional, Medical, or Legal, but only version 10.x is supported.  System may run with IBM ViaVoice Professional v.8.x or higher, but only IBM USB Pro 10.x is supported. 

Company developed latest version of SpeechMax software with Microsoft .NET 2.0.  It tested this and other company software with Windows XP.  Limited testing with early Windows Vista indicated compatibility issues involving device drivers. 


Reviews have appeared in a variety of publications, including Law Office Computing, Speech Strategy News, Law Technology News, TechnoLawyer, Proceedings of Australasian Technology Workshop (2005), and Speech in the User Interface:  Lessons from Experience (2010) (William Meisel, editor). 

Speech Strategy News--Custom Speech Offers Wide Range of Speech Development Options (3/07)
Speech Strategy News--Custom Speech Adds Compatibility with Latest Dragon Versions (11/06)
Proceedings of the Australasian Technology Workshop (12/05)
TechnoLawyer--Command! Call Center (08/03)
The Times (Crown Point)--Patenting Northwest Indiana's Products (01/03)
Law Office Computing--SpeechProfessional (01/03)
Law Technology News--SpeechProfessional (09/02)
Crown Point Star--Crown Point Firm Nominated for Award (09/02)
The Times (Crown Point)--Seeing What You Say (07/02)
Lake County Post-Tribune--Talking to Software (07/02)
TechnoLawyer--SpeechProfessional (06/02)

Crown Point Star--Crown Point Firm Takes Lead in Voice Recognition (06/02)

Later Review 

Speech in the User Interface (2010) includes over 50 articles written by contributors from various companies active in speech processing interface design.  These include larger companies such as Microsoft, Nuance, IBM, M*Modal, Nortel, Vlingo, Convergys, Voxify, BBN Technologies, and Loquendo, as well as smaller companies such as Custom Speech USA.  The collection includes a description of this company's SpeechMax™ text compare for automated error detection, utterance scramble, and other features.  Mr. Meisel is editor of the widely-read industry publication Speech Strategy News. 

Back cover states, "Speech recognition and other speech technologies are an increasingly important means of interacting with users of technology and communications.  This book highlights what works and what doesn't based on the hands-on experience of fifty-one top experts."  

The review describes SpeechMax™ session file editor comparison of synchronized speech data sets from two or more speech engines to detect errors for faster editing.  It also notes that the tool can be used for data sets processed by manual transcribers. 

Article also states, "Custom Speech offers software tools that make it easier to develop applications using speech engines on PCs, applications that leverage the typical speech-to-text or command execution in speech recognition software from Nuance (Dragon NaturallySpeaking), IBM, Microsoft (speech recognition in Windows Vista), as well as other SAPI 5.x speech recognition, such as our Custom Speech speaker-specific SweetSpeech™."

According to the article, this "do-it-yourself" speech engine and toolkit "enables a transcription service to create unlimited speech user profiles from day-to-day dictation."  With these features, transcription companies, hospitals and clinics, law firms, and other organizations can create customized speech user profiles for highly accurate, nonadaptive, massively speaker-specific recognition from day-to-day dictation and transcription.  Article focuses upon three examples of "value-add":

1.  Speech transcription aid comparing speech data sets processed by one or more speech recognition engines, manual transcriptionists, or both.

2.  Features addressing privacy concerns when human editors review speech recognition or when speech recognition or manual transcription is sent out for transcription or review.

3.  Fill-in forms, including interacting with local or web-based databases.

As explained in the article, 

. . . a human speech editor can identify many misrecognitions without having to listen to the entire audio.  Instead the editor can use the tab key to advance to the next difference, listen to the audio, and select the appropriate text from a dropdown of engine output.  If correct text is not listed, the editor can manually transcribe it.  The edited text may be returned to the original speaker for review and optionally further edited.  If all speech engines agree, suggesting a high level of accuracy, editing time approaches zero.  The application can apply the same synchronized text comparison techniques to documents created by human transcribers (or a transcriber and speech engine).  In some cases, a speech engine may recognize unusual words (e.g., technical terms) more accurately than a human transcriber.

Article further explains, "The same tools support synchronization of source text and one or more Unicode translations."  Similarly, "color-coding can indicate not only the presence of differences, but also the number."  If dictation is transcribed by three engines, for example, no highlighting (clear) could indicate agreement by all three texts; pink, agreement by 2 of 3; and red, no agreement, indicating a higher risk of error.  Software also supports text compare of synchronized human output, or sequential text compare of speech recognition followed by text compare of the respective translations of the speech recognition text.  More than 3 texts can be synchronized and compared as well. 
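The consolidated color-coding logic described above can be sketched in a few lines.  This is a hypothetical Python illustration, not the actual SpeechMax™ implementation; the function names and sample engine outputs are invented for the example.

```python
def compare_utterances(texts):
    """Risk color for one synchronized utterance across engines:
    'clear' if all outputs match, 'pink' if some but not all match,
    'red' if every engine produced different text."""
    distinct = set(texts)
    if len(distinct) == 1:
        return "clear"
    if len(distinct) < len(texts):
        return "pink"
    return "red"

def color_code(engine_outputs):
    """engine_outputs: one utterance list per engine, synchronized so
    that index i in every list refers to the same audio segment."""
    assert len({len(e) for e in engine_outputs}) == 1, "outputs must be synchronized"
    return [compare_utterances(texts) for texts in zip(*engine_outputs)]

engine_a = ["i pledge allegiance", "to the flag"]
engine_b = ["i pledge allegiance", "to the flak"]
engine_c = ["i pledge allegiance", "toothy flag"]
print(color_code([engine_a, engine_b, engine_c]))   # ['clear', 'red']
```

The same majority/no-majority logic extends to more than three synchronized texts, though finer-grained color scales are possible.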

The review notes that SpeechMax™ also addresses confidentiality concerns related to transcription with the ScrambledSpeech™ feature.  An utterance is a phrase or short sentence.  As described in the article, the feature can scramble the order of audio-tagged utterance text of two or more utterances.  Process can send different scrambled utterances to different speech editors for correction.  This limits knowledge of the document by any single speech recognition editor, but leaves intact the utterance, a basic unit of human speech.  Session file editor also supports audio and text redaction, as well as scrambling utterance audio segments sent for manual transcription. 
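The scramble/unscramble round trip can be illustrated with a short sketch.  This is hypothetical Python assuming utterances are plain strings; the actual ScrambledSpeech™ data format is not described in the source.

```python
import random

def scramble(utterances, n_editors, seed=0):
    """Shuffle (index, utterance) pairs and deal them out to editors."""
    indexed = list(enumerate(utterances))
    random.Random(seed).shuffle(indexed)      # scramble document order
    return [indexed[i::n_editors] for i in range(n_editors)]

def unscramble(editor_batches):
    """Reassemble corrected utterances into original document order."""
    merged = [pair for batch in editor_batches for pair in batch]
    return [text for _, text in sorted(merged)]

doc = ["patient is a 54 year old male", "presents with chest pain",
       "history of hypertension", "plan stress test"]
batches = scramble(doc, n_editors=2)
assert unscramble(batches) == doc    # round-trips to original order
```

Each editor receives only part of the shuffled document, so no single editor sees the whole record, yet the retained indices allow exact reassembly.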

The article also describes the SweetSpeech™ "do-it-yourself" speech recognition toolkit.  Among other features, this software makes it easier for a relative novice (such as a transcriptionist) to generate speech and language data for a speech engine user profile, including data for the company's nonadaptive speaker-dependent speech engine and toolkit.  The data can also be used for other speech and language processing, such as text to speech or voice commands.
Review also describes easy creation of speech-oriented forms, "With these features, a transcriptionist or clerk can create a structured dictation form in a matter of minutes.  This may include [Talking Form™] audio prompts with a human or text-to-speech voice.  The medical, legal, law enforcement, or other user can record audio for each blank.  The audio for the fill-in-the-blank text can later be transcribed manually or with server-based speech recognition. The user may also input data with the keyboard, real-time speech recognition, or a bar code reader." 
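A structured dictation form of this kind might be modeled as below.  This is an illustrative Python sketch; the class and field names are invented and do not reflect Talking Form™'s actual format.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class FormBlank:
    label: str
    prompt_audio: Optional[bytes] = None    # human or text-to-speech prompt
    response_audio: Optional[bytes] = None  # speaker's recorded answer
    text: str = ""                          # filled by keyboard, SR, bar code, or transcription

@dataclass
class DictationForm:
    title: str
    blanks: List[FormBlank] = field(default_factory=list)

    def unfilled(self) -> List[str]:
        """Labels of blanks still awaiting text (audio may await transcription)."""
        return [b.label for b in self.blanks if not b.text]

form = DictationForm("Patient Intake",
                     [FormBlank("Full Name"), FormBlank("Date of Birth")])
form.blanks[0].text = "Streeter"
print(form.unfilled())   # ['Date of Birth']
```

The point of the design is that each blank carries its own audio slots, so recorded answers can be transcribed later (manually or by server-based speech recognition) without restructuring the form.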

Review includes brief discussion of SpeechMax™ My AV Notebook.  With the software, user can easily create audiovisual presentations for playback of speech and other audio related to lectures, electronic greeting cards and diaries, karaoke, and other documents and activities.  See, for example: 

Singalong/karaoke:  Stairway to Heaven (Led Zeppelin) . . . Demo shows playback of audio recording, slider bar, and highlighted text in SpeechMax™.  Flash WMP MP4

"Stairway to Heaven," produced by Jimmy Page, executive producer Peter Grant, © 1971 Atlantic Recording Corporation for the United States and WEA International Inc. for the world outside of the United States. 

A.  Summary of Features


SpeechMax™ multilingual, multiwindow speech-oriented session file HTML editor synchronizes and compares output from different speech engines and other pattern recognition.  A color bar over the text highlights differences.  This helps an editor or other user more quickly find recognition mistakes.  Software also supports synchronization and error spotting for other audio, text, and image pattern recognition.  Software currently is hard-coded to support display, synchronization, and comparison of over a thousand session files generated by different pattern recognition and artificial intelligence programs.  Document window annotation feature includes an unlimited number of text boxes associated to specific document text.  There is an annotation sound recorder and upload audio file functionality.  These features support an unlimited number of text or audio comments associated to specific document text.  This may be used to support "crowdsourcing" of a document.  Annotation feature supports a command line to open a text, audio, or image file or web page.  User may also use annotation to run a program associated with document text, for example, launching a video player for multimedia display linked to specific document text.  Software can also convert and synchronize third-party speech recognition and other files to a standardized format.  With the lock session file feature, user can distribute a "portable" session file representing dictation, audio book, audiovisual speech or lecture, electronic scrapbook, singalong or karaoke, and other speech-related multimedia.  Other features include audio and text redaction and phrase "scramble."  Both protect document confidentiality.  Software includes application program interface (API).  

A few screenshots showing basic organization and architecture of SpeechMax™. . . .

Document Window

Screen shot shows toolbars and main document window.  Purple vertical markers (utterance boundaries) represent placeholders delimiting phrases in Pledge of Allegiance. 

Main/Annotation Window

There is no need for expensive programming or scripting to create custom forms that can easily be completed with voice and/or text.   Plus, the process can use audio-text data to train a speech user profile.

First screen shot (below) shows main document window and annotation window.  Annotation window includes sound recorder and text box.  Tools support form creation by nontechnical office staff with no need for software development kit or scripting.  In the example below, main window has form with blanks associated to annotations.  Process supports other formats.  Audio annotation (blue highlighting) supports text and/or audio entry.  Text annotation (purple highlighting, not shown) supports text entry only.  Text "Streeter" in annotation window represents last name.  It may be moved to first blank after "Full Name" in the document window form. 

Last name "Streeter" entered into the text box may be entered into the first blank of the "Full Name" field (last name first), as shown in the second screen shot.  The remainder of the form has been completed with annotation window data.  The underlining is automatically removed as annotation text is moved into the document window. 

More Than Just for Forms Creation:  Annotation supports entry of text or audio (comment) by one or more authors and/or speakers.  Document text may be transposed (swapped) or replaced by annotation text in same or different language.  Document audio tag may be transposed or replaced by annotation audio.  New audio tag may be same or different voice than original text audio tag.  New tag may represent natural (human) or synthetic speech or other audio.  Process can use word-audio pairs to train first and/or second speaker voice user profile with annotation training of first and/or second speaker word-audio pairs. 
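The transpose/replace operations described above might look like this in outline.  This is a hypothetical Python data model; the Segment class and function names are invented for illustration and are not the actual SpeechMax™ data structures.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Segment:
    text: str                                 # document text
    audio: Optional[bytes] = None             # audio tag for this text
    annotation_text: Optional[str] = None     # e.g., translation or comment
    annotation_audio: Optional[bytes] = None  # e.g., second speaker's recording

def transpose_text(seg: Segment) -> None:
    """Swap document text with annotation text (same or different language)."""
    if seg.annotation_text is not None:
        seg.text, seg.annotation_text = seg.annotation_text, seg.text

def replace_audio(seg: Segment) -> None:
    """Replace the document audio tag with the annotation audio tag."""
    if seg.annotation_audio is not None:
        seg.audio = seg.annotation_audio

seg = Segment("hello", annotation_text="bonjour")
transpose_text(seg)
print(seg.text, seg.annotation_text)   # bonjour hello
```

Because each segment keeps both the document pair and the annotation pair, the resulting word-audio pairs remain available for training a first and/or second speaker voice user profile, as the text describes.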

Multiwindow Tiled Display

Screen shot shows tiled horizontal display of 3 session files representing original English text and 2 translations (into French and Spanish).  Vertical placeholders delimit text in the 3 document windows.  The translations have the same number of segments delimited by the same punctuation.  The last synchronized segment is highlighted in all three windows.  Software supports horizontal, vertical, and cascade tiling.

Integration with Microsoft Word

Microsoft Word runs SpeechMax™ Microsoft Office Toolbar Add-In.  Add-In has Next/Previous toolbar arrows for user navigation to Word bookmarks.  Add-In also has Import/Export toolbar functions that support annotation and phrase migration wizards.  These wizards support data migration to/from SpeechMax™ session file to Word. 

Software has HTML display and XML storage of proprietary session file content. 

Full/Reader Editions of Text Editor

Full Edition supports audio record for dictation, audio playback with transcriptionist foot pedal or hotkeys, real-time speech recognition, desktop autotranscribe audio file with speech recognition, edit server-based speech recognition, multilingual, multiwindow document display, and text and audio annotation (comments).   It integrates with SpeechServers™ server-based speech recognition and SweetSpeech™ speech engine and model builder. 

Reader Edition displays session file text, audio, and image content.  Document license-embedded SessionLock™ is designed to prevent unauthorized editing.

SpeechMax™ is available with SpeechProfessional™ combo package. 

SpeechMax™ General Functions

  • Document window supports read/write text, audio, or image
  • Read/write includes .TXT, .RTF, .HTML, or .SES (proprietary session file)
  • Copy/paste graphics into document window
  • Content stored as XML
  • Single, double, or multiple window viewing available
  • Tile documents horizontally, vertically, or cascade
  • Enter text by keyboard, barcode, or speech recognition
  • SweetSpeech™ real-time or server-based (local autotranscribe) SR
  • Plugins (addins) for third party-software
  • Includes SR, text to speech, and other speech and language processing
  • Continuous or segmental (utterance) time-stamped audio playback
  • Select text, playback audio with audio-tagged document text
  • Use keyboard hotkeys or foot control to start/stop playback
  • Text compare across document (like word processor)
  • Represents text compare of text strings with no reference to source data
  • Text compare by phrase (synchronized text compare)
  • Use to compare SR audio-aligned text
  • Use different SR SDKs to convert to CSUSA proprietary .SES format
  • Create equal number of segments (phrases) in .SES session files
  • Each phrase (utterance) arises from same audio
  • Each respective phrase has same start/duration time
  • Once synchronized, text segments (phrase) compared
  • Differences highlighted, matches clear
  • Differences indicate higher error risk
  • Matches indicate greater reliability
  • Consolidated text compare color codes (highlights) degree of differences
  • Visualized in single window
  • E.g., with 3 texts, clear = all match, pink = 2 differ, red = all differ
  • Any type of text can be phrase compared if synchronized
  • With equal numbers of segments, session files are "synchronized"
  • Phrase compare supported even if underlying source data different
  • For example, phrase compare synchronized SR text and translation text
  • DataInSync™ = bounded synchronized data input for pattern recognition
  • Input data may include text, audio, image, volumes, and spaces
  • Vertical placeholders delimit bounded input and output data
  • Optional display with/without vertical placeholders (delimiters) 
  • Create text and/or audio annotation (comment) to selected document text
  • Unlimited annotations (multilevel data) to selected text
  • Unlimited users (multiuser collaborative) may annotate same text
  • Text annotation may include comment or question
  • Text annotation may include hyperlink, command line
  • Command line may launch program, e.g., open media player or web page
  • Use annotation window sound recorder to record audio comment  
  • Load audio file or use text to speech to create audio annotation
  • Switch (transpose) document/annotation content
  • Selectively move annotation text into document to replace session file text
  • Selectively move annotation audio into document to replace audio
  • Divide session file segments into two or more groups
  • Sort (scramble)/unsort (unscramble) group session file content
  • Session lock converts document/annotation content to read only
  • Data migration document phrase or annotation
  • Application programming interface (API)
  • Web services and file management available through workflow manager
  • Web-based Help files
  • How much time does text compare save?

    Time required by editor to review 1 minute of dictation
    American Health Information Management Association (AHIMA) position paper indicates that it takes a speech recognition editor 2-3 minutes to listen to and edit a minute of speech recognition text using traditional playback techniques (playing back audio for all text) 
    (AHIMA October 2003) 

    Fewer text differences => more editor time savings
    Hours/minutes saved by using text compare to correct errors depend mainly upon speech recognition accuracy.  As speech engines become more accurate, there are more potential time savings.  The speaker saves time because the speech editor has edited most, if not all, errors--leaving the speaker with few, if any, corrections to make. 

    With 90% accurate server-based speech recognition, a correctionist listens to 10 hours of transcribed business dictation or voice mail to find 1 hour of incorrectly transcribed speech.  With 90% accuracy in both speech engines, there is an estimated reduction of up to 80% in audio review time for a speech editor using text compare (see graphic below).  Consequently, the speech editor can reduce review time of three hours of audio to as little as 36 minutes--a time savings of up to 144 minutes (.8 x 180). 
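The arithmetic above can be checked directly.  A small Python sketch; the 80% reduction figure is the article's estimate for 90% engine accuracy, not a measured constant.

```python
def review_time(audio_minutes, reduction=0.8):
    """Audio review time remaining after text compare, given an assumed
    fractional reduction in the audio the editor must replay."""
    return audio_minutes * (1 - reduction)

remaining = review_time(180)   # 3 hours of dictation audio
print(round(remaining), round(180 - remaining))   # 36 144
```

The 144-minute savings matches the ".8 x 180" figure cited in the text; smaller assumed reductions (less accurate engines) shrink the savings proportionally.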

    In those cases where organizations require doctors and other dictating speakers to self-correct their dictation, or where speech editors correct text before speaker review, supplementary WordCheck™ by the dictating speaker or speech editor can enhance detection of recognition errors. 

    Bar graph shows expected decrease in audio review time when comparing output from two different speech engines and reviewing text with highlighted differences.  Using this technique, the speech editor corrects most or all errors for the dictating speaker.  With highly accurate speech recognition, text returned to the speaker after speech editor review generally will have about the same error rate as manual transcription.  As described above, when there are no misrecognitions and no differences for review, speech editor review time drops to virtually zero.  Note that no differences are highlighted when both speech engines make the same mistake (identical misrecognition), so such errors are not flagged.  Total audio review time with text compare depends upon how much audio the speech editor must review.  Consider the case where both speech engines misrecognize the same word in a sentence or phrase, but differently (e.g., the user said "ball," the first engine transcribed "hall," and the second "wall").  Only audio for a single word (a single audio tag) need be reviewed.  However, if the misrecognitions occur at different locations in the sentence, then word audio for two different locations (two different audio tags) must be reviewed, increasing the editor's audio review time.  The graph reflects the assumption that errors may occur in different parts of the sentence, so the estimated decrease in audio review time may understate actual time savings.

    How accurate is speech recognition?
    Nuance® reports up to 98% accuracy for physician users of Dragon® Medical.  Voice Recognition Software Dictation Test indicates that MacSpeech Dictate had one error in 124 words.  Microsoft claims parity with human transcriptionists, reporting a word error rate of 6.3% (Microsoft Hits Another Milestone in Speech Recognition Software Accuracy).  One study indicates substantially higher error rates based on radiology reports at a South African hospital where speakers were dictating in a second language (The Accuracy of Radiology Speech Recognition Reports in a Multilingual South African Teaching Hospital).  Reasons for speech recognition errors are often poorly understood, and their occurrence is hard to predict (Which Words are Hard to Recognize? Prosodic, Lexical, and Disfluency Factors that Increase Speech Recognition Error Rates). 

    Importance of error identification and correction
    Practice standards and health care regulations emphasize the importance of detecting spelling and misrecognition errors
    to protect patient safety.   See, e.g.,
    The Joint Commission, Division of Health Care Improvement, "Transcription Translates to Patient Risk," Quick Safety (April 2015).   Misrecognition can have serious consequences in other areas.  For those documents requiring accuracy, text compare can sometimes help identify an error that would have been missed by both speaker and speech editor. 

    A technology whose time has come
    With improved software and faster computers, speech recognition has become highly accurate for many dictating speakers.  Speech recognition accuracy of up to 98% is obtainable for some speakers.  Occasionally there is 100% accuracy.  Even with 90% accurate recognition, there are significant speech editor productivity gains compared to use with less accurate recognition (see bar graph above).

    WordCheck™ for other machine or manual processing
    Company's patented process finds mistakes using the principle that output differences between two pattern recognition processes indicate a mistake by one or both.  Same logic applies to differences between three or more texts, as well as nonspeech audio, text, or image pattern recognition or manually-generated, synchronized output.  Software supports synchronized text compare for audio mining, speaker identification, machine translation, natural language understanding, facial recognition, fusion biometrics (e.g., facial + speaker recognition), computer-aided diagnosis (CAD) for medical purposes, and other pattern recognition.

    SweetSpeech™ includes a speaker-dependent, nonadaptive speech engine and "do-it-yourself" (DIY) model builder for creation of acoustic model, language model, and lexicon.  Software can return recognized text from audio stored in a file after transcription by SpeechServers™.  Model builder supports English and other languages and dialects.  Software is designed for transcription companies, government, business, and others with access to dictation or other speech and text data.  Transcriptionists, information technology personnel, and other users can create speaker-specific speech recognition user profiles from day-to-day transcription, voice mail, or call center inquiries.  Toolkit can help create user profiles for languages and dialects underserved by current speech recognition.  Toolkit can also help create user profiles for speakers with physical or psychological speech impediments.  Software includes tools to reduce reliance upon expensive lexical expertise.  These tools include an easy-to-use text-to-speech phonetic pronunciation generator using the SAPI Universal Phone Set (based upon IPA pronunciation).  Software also includes automatic generation of speech engine linguistic questions.  End users may also use text and audio data to create user recognition models for voice commands, speaker identification, text to speech, machine translation, and other speech and language processing.  Research indicates that speaker-specific, nonadaptive speech recognition has a lower word error rate than speaker-adaptive speaker-dependent or speaker-independent models. 

    Increased accuracy in bar graph below from Microsoft research refers to 8.6% error reduction of nonadaptive recognition compared to adaptive technique with less than 8 hours of training data, and 12.3% error reduction with greater than 15 hours of training.  Word error rate (%) is about 1% lower for nonadaptive speaker-specific model for different levels of speaker training data.  Authors estimate word error rate is about 7% for training data of 12,000 sentences for nonadaptive technique, and 8% for adaptive model with same level of training data.  Similarly, there is improvement in relative error reduction of 45% and 52.2% for nonadaptive approach compared to speaker-independent model. 
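The relative error reduction figures above follow the standard formula (baseline WER - new WER) / baseline WER.  A quick Python check against the approximate 8% versus 7% word error rates the authors estimate:

```python
def relative_error_reduction(baseline_wer, new_wer):
    """Standard relative reduction: (baseline - new) / baseline."""
    return (baseline_wer - new_wer) / baseline_wer

r = relative_error_reduction(8.0, 7.0)
print(f"{r:.1%}")   # 12.5%
```

The 12.5% from the rounded 8%/7% estimates is close to the 12.3% reduction the authors report on the full training data, as expected.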

    SpeechServers™ supports return of audio-tagged text after complete transcription of entire audio file.  Software also supports repetitive, iterative training of speech recognition user profile.   This server system is part of the Command™ workflow and file management system.  Software supports Microsoft SAPI 5.x server-based speech recognition with SweetSpeech™, Microsoft  speech recognition, Dragon NaturallySpeaking, and IBM ViaVoice (no longer supported by IBM).   Desktop server-based autotranscribe is available for SweetSpeech, Microsoft, and Dragon engines.  Software supports pretraining speech user profile with verbatim text and audio file with elimination of script-reading of traditional microphone enrollment.  Process supports repetitive, iterative training of speech recognition speech user profile, as described in following newswire.

    B.  Video Demos

    Demo #1A shows SpeechSplitter™ utterance (phrase) segmentation to create an untranscribed session file (USF) from a dictation audio file.  Transcriptionist manually transcribes in SpeechMax™ to create a transcribed session file (TSF) using PlaySpeech™ functionality.  Demo also shows realignment of a segment boundary marker to include audio for "period" with the larger adjacent utterance.  Flash WMP MP4  
    Demo #1B shows SpeechSplitter™ utterance (phrase) segmentation to create an untranscribed session file from dictation audio.  Transcriptionist imports previously transcribed text, sequentially listens to each untranscribed utterance using PlaySpeech™ functionality, and sequentially delimits each utterance by toggling the play audio control.  This demo shows how the process can generate audio-aligned text from an audio file and preexisting text.  The segmented transcribed session file can be used as a training session file.  Flash WMP MP4  

    Demo #2 shows server-based transcription using prototype SweetSpeech speech recognition.  In-house staff created speech user profile with SweetSpeech speech and language processing toolkit.  Video first shows text immediately after speech-to-text conversion (raw speech engine decoding).  This is followed by regular expressions algorithms to search and match text strings.  Conversion rules may reflect speaker or institutional preferences.  Speech user profile typically reflects these preferences.  User loaded post-formatting transcribed session file (TSF) into SpeechMax™ to play back audio and make any needed corrections.  Flash WMP MP4

    Demo #3 shows single-window dual-engine comparison using server-based SweetSpeech and Dragon NaturallySpeaking.  User sequentially opens Dragon and Custom Speech USA™ session files, clicks the compare documents toolbar button to highlight differences, plays differences using the menu dropdown, makes changes, increases leading/trailing playback to listen to the word "of," copies/pastes "well-maintained" from Dragon to format text, and enters new lines.  Operator saves the final distribution report as .TXT.  A lowercase "l" was transcribed by both engines and capitalized as "L" for report distribution.  Since user did not create a separate verbatim annotation, the final text is automatically saved as the verbatim text with the Instant Verbatim™ feature.  Flash WMP MP4

    Demo #4 shows double-window dual-engine comparison of Demo #3 session files.  Operator selects toolbar window icon to horizontally display  audio-aligned text.  Operator specifically references option of play entire phrase (including difference) as opposed to playing difference only.  Flash  WMP MP4

    Demo #5  shows double-window SpeechMax™ VerbatiMAX™ comparison of uncorrected speech recognition transcribed session file (TSF) with verbatim text .TXT.  Any difference represents an error.  Text comparison reduces transcriptionist review time.  This supports batch, automated correction of transcribed session file (TSF) to verbatim transcribed session file for speech user profile training.  Flash WMP MP4 

    Demo #6 shows potential training application for manual transcription.  In this approach, a medical transcription instructor can create a verbatim transcribed session file text key and compare to student output.  Text comparison identifies errors in spelling and punctuation and generates accuracy rate.  Instructor and student do not have to search/replay any part of original audio file with audio-tagged text to find errors.  Multilingual SpeechMax™ supports medical transcriptionist training in multiple languages.  Accuracy levels are determined automatically.  Flash WMP MP4
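The accuracy rate such a comparison produces can be sketched with a standard word-level edit distance.  This is illustrative Python; the actual comparison method used by the software is not specified in the source.

```python
def word_edit_distance(ref, hyp):
    """Minimum word-level edits (insert/delete/substitute) from hyp to ref."""
    ref, hyp = ref.split(), hyp.split()
    # (len(ref)+1) x (len(hyp)+1) table; first row/column = i or j
    d = [[i + j if i * j == 0 else 0 for j in range(len(hyp) + 1)]
         for i in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]))
    return d[-1][-1], len(ref)

def accuracy(reference, hypothesis):
    """1 - (word errors / reference word count)."""
    errors, n = word_edit_distance(reference, hypothesis)
    return 1 - errors / n

print(accuracy("the patient denies chest pain",
               "the patient denies chess pain"))   # 0.8
```

Comparing student output against a verbatim key this way flags every divergence, so neither instructor nor student needs to replay the source audio to locate errors.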

    Demo #7A was prepared for a local elementary school to show use of the software for phonetics training ("phonics").  A local elementary school teacher requested development of a prototype for the local school board as proof of concept.  The video for "r" phonics shows how a teacher can customize training with web-based resources and use text comparison.  The teacher can tailor training to a child's needs and make web resources available by attaching multilevel text annotations to specific document text.  Accuracy levels are determined automatically.  Flash WMP MP4

    Demo #7B was prepared for a local institution to demonstrate use of the software for training students.  Using SpeechMax™, the teacher can enter text phrases into the document window and record his or her personalized interpretation of the melody with the sound recorder in the annotation (bottom) window.  Within the annotation window, the user can associate (link) document phrases with the annotation window melody.  As a result, each musical (audio) phrase is synchronized to the corresponding text in the document (top) window.  Because the recording reflects the instructor's own interpretation, rather than a recording by another instructor with a different one, it is easier for the student to remember how the instructor renders the melody.  By selecting the text, the student can play back the audio associated with that text.  The video shows that the instructor can also provide informational audio or text annotations (comments) to help the student learn the material, as well as hyperlinks to web sources, programs, or other files related to the training.  MP4
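    The phrase-to-melody linking in Demo #7B can be pictured as a simple association between document phrases and recorded clips: selecting a phrase looks up its linked audio.  This is an illustrative sketch only; the class names and file paths are hypothetical, not the product's API:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PhraseLink:
    """One document phrase linked to an annotation-window audio clip."""
    phrase: str
    audio_clip: str  # path to the instructor's recording (hypothetical)

@dataclass
class AnnotatedDocument:
    links: List[PhraseLink] = field(default_factory=list)

    def associate(self, phrase: str, audio_clip: str) -> None:
        """Link a document phrase to a recorded clip."""
        self.links.append(PhraseLink(phrase, audio_clip))

    def clip_for(self, selected_text: str) -> Optional[str]:
        """Return the audio clip linked to the selected phrase, if any."""
        for link in self.links:
            if link.phrase == selected_text:
                return link.audio_clip
        return None

doc = AnnotatedDocument()
doc.associate("measures 1-4", "melody_bars_1_4.wav")
doc.associate("measures 5-8", "melody_bars_5_8.wav")
```

    Selecting "measures 5-8" in the document would then resolve to the instructor's recording for that phrase.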

    Navigating II OTR © 2000 World OTR, London, UK. US contact World OTR, New York, NY
    Demo #8 shows opening the MT Desk website for a medical transcriptionist in training learning about prostate cancer treatment.  This functionality can be used to launch a video player or any other program.  Flash WMP MP4

    Demo #9 shows use of spell check supplemented by an audio annotation pronunciation of a medical term ("Prostaseed") for a student medical transcriptionist.  Spell check flagged a word that required capitalization.  The annotation supplied by the instructor shows the capital letter and explains the meaning of the term.  Flash WMP MP4

    Demo #10 demonstrates use of an employee information form and data migration to Microsoft Word.  Audio annotation (blue highlighting) supports text and/or audio entry.  Text annotation (purple highlighting) supports text-only entry.  The form includes instructions created as an audio annotation for playback by the user.  Flash MP4

    Demo #11 shows form creation.  Form creation generally involves entry of a field name and creation of an audio annotation within an otherwise empty session file segment.  The video also shows completing the form by manual transcription or by entry using speech recognition.  Flash WMP MP4

    Demo #12 discusses The Talking Form™ and audio prompt creation for the form user.  Flash WMP MP4

    Demo #13A illustrates the SpeechMax™ Microsoft Office Toolbar Add-In with data transfer between Microsoft Word and SpeechMax™.  Using the software, a speaker can dictate into Microsoft Word with Dragon speech recognition, transfer text and audio data to SpeechMax™ for transcription, and migrate the data back to Microsoft Word.  The user can also enter data into SpeechMax™ and migrate it to Microsoft Word.  WMP MP4

    Demo #13B illustrates the SpeechMax™ Microsoft Office Toolbar Add-In with transfer of XML data among the Continuity of Care Record (CCR), Microsoft Word, and SpeechMax™.  The Add-In supports download/upload of XML data between Microsoft Word and the CCR.  The demo represents a proof-of-concept workflow.  The video demonstrates downloading data into Word using the Add-In and modifying the data based upon written or dictated information.  If dictated, a transcriptionist may play back the dictated audio using SpeechMax™, PlayBax™, or other software and transcribe into Word.  Alternatively, the user may dictate into Microsoft Office using speech recognition.  Data may be transferred to SpeechMax™ and modified using dictation/transcription, speech recognition, keyboard, or bar code.  Modified data may be transferred to Word and uploaded directly into the CCR using the Add-In.  Alternatively, the operator may modify the data in Word before upload.  WMP MP4

    The first seven examples below demonstrate how a user can create complex presentations with speech, audio, dictated and nondictated text, and graphics.  Examples include an electronic audio book, an electronic scrapbook, a presentation on segmenting dictation, a lecture on the geography of Ireland, a presentation of the Gettysburg Address, a sales presentation, and language instruction for introductory German.  Only the final example (singalong or karaoke) provides the completed presentation.
    The last example shows a short AV presentation using "Stairway to Heaven" by Led Zeppelin.  It includes highlighted, audio-synchronized text, a document window elapsed time display, and a slider bar available for user adjustment of the play location.  Click the presentation hyperlinks to see web page images; Flash or WMP video is also available for all except the segmenting dictation lecture.

    Electronic audio book:  Romeo and Juliet Demo #14A Flash WMP MP4
    Electronic scrapbook:  One Fabulous Vacation Demo #14B Flash WMP MP4
    Lecture #1:  Segmenting Dictation   See Demo #1A Flash WMP MP4
    Lecture #2:  Geography of Ireland  Demo #15 Flash WMP MP4
    Speech: Gettysburg Address  Demo #16 Flash WMP MP4
    Sales presentation:  Pet Palace  Demo #17 Flash WMP MP4
    Language instruction:  Introductory German Demo #18 Flash WMP MP4

    Singalong/karaoke:  Stairway to Heaven (Led Zeppelin)  Demo #19 Flash WMP MP4

    "Stairway to Heaven," produced by Jimmy Page, executive producer Peter Grant, © 1971 Atlantic Recording Corporation for the United States and WEA International Inc. for the world outside of the United States.
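    The elapsed-time display and highlighted, audio-synchronized text in the karaoke demo imply a lookup from playback position to the currently active text segment.  A minimal sketch of that lookup, assuming each segment's start time is known in advance (the times below are made up for illustration):

```python
import bisect

def active_segment(start_times, elapsed: float) -> int:
    """Index of the text segment to highlight at `elapsed` seconds.

    `start_times` holds each segment's start time in ascending order;
    a binary search finds the last segment that has already begun.
    """
    return max(bisect.bisect_right(start_times, elapsed) - 1, 0)

# Hypothetical segment start times (seconds) for four lyric phrases.
starts = [0.0, 4.5, 9.2, 14.0]
```

    Moving the slider bar simply re-runs this lookup with the new elapsed time, so the highlight always tracks the play location.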

    Demo #20 video shows the operator copying and pasting English (source) text into a web-based machine translator.  The user delimits the translated Spanish (target) text output with vertical placeholders.  The operator clicks the synchronize session tags button.  Clicking in an English source text segment highlights the corresponding translated segment.  Flash WMP MP4

    Demo #21 video shows the operator tiling document windows horizontally and synchronizing a French translation with the English source.  The operator opens a third document window for the previously delimited Spanish translation, synchronizes the Spanish translation with both the English source and the French translation, then selects each English segment sequentially and confirms synchronized highlighting of the French and Spanish segments.  Flash WMP MP4

    To compare accuracy of translation, the operator may repeat the process by substituting identically delimited French and Spanish translations from a different manual or automatic translation source.  Thereafter, the operator can text compare these against the initial French or Spanish translations or another standard.  The initial translations are shown in the screen shot below.
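    The placeholder-delimited synchronization in Demos #20 and #21 can be sketched as an index-wise pairing of source segments with delimiter-split target segments.  This is an illustrative approximation, not the product's implementation; the vertical bar is assumed here as the placeholder character:

```python
def synchronize(source_segments, delimited_target: str):
    """Pair each source segment with the target segment at the same index.

    The target translation is delimited with vertical-bar placeholders;
    segment counts must match for the texts to synchronize.
    """
    target_segments = [s.strip() for s in delimited_target.split("|") if s.strip()]
    if len(target_segments) != len(source_segments):
        raise ValueError("segment counts differ; cannot synchronize")
    # Map source segment index -> corresponding translated segment.
    return dict(enumerate(target_segments))

english = ["Good morning.", "How are you?"]
spanish = "Buenos días. | ¿Cómo está usted?"
mapping = synchronize(english, spanish)
```

    Clicking source segment 0 would then highlight "Buenos días." in the target window; a third delimited translation can be synchronized the same way against either text.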

    Demo #22 shows use of multilevel annotations as a multilevel knowledge base for creation of a nondisclosure agreement and efficient document assembly with SpeechMax™.  In the example, the user selects phrases from various "fill-in-the-blank" alternatives and text compares the alternative form selections.  The user can also use the color-coded composite Best Session™ to visualize the variability in selection choice for each field.  Color coding shows agreement between source documents in the knowledge base: red indicates considerable difference, pink minimal difference, and clear no difference.  The knowledge base may be utilized by a law firm, business, or other organization.  Flash MP4

    Demo #23 video shows dividing/scrambling an untranscribed audio session file and merging/unscrambling the transcribed session files.  Flash WMP MP4

    Demo #24 video shows a transcribed session file in SpeechMax™ and selective redaction of the patient name from the transcribed session file (TSF).  Flash WMP MP4

    Demo #25 shows a transcribed session file in SpeechMax™, selective redaction of the patient name from the transcribed session file (TSF), playback of the export in PlayBax™ transcription software controlled with a foot pedal, and transcription of the redacted audio file in Word.  Flash WMP MP4
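    Selective redaction of a patient name from transcript text, as in Demos #24 and #25, can be approximated with a case-insensitive search-and-replace.  This sketch handles only the text layer; the demos also redact the aligned audio, which is not shown here:

```python
import re

def redact(text: str, protected_terms, mask: str = "[REDACTED]") -> str:
    """Replace each protected term (e.g., a patient name) with a mask,
    case-insensitively, leaving the rest of the transcript intact."""
    for term in protected_terms:
        text = re.sub(re.escape(term), mask, text, flags=re.IGNORECASE)
    return text

redacted = redact(
    "Patient John Smith was seen today. john smith reports pain.",
    ["John Smith"],
)
```

    Both the capitalized and lowercase occurrences are masked, so the redacted transcript can be released for transcription without exposing the name.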

    Demo #26 shows dictation with Olympus handheld recorder and remote Dragon server-based speech recognition.  Flash WMP MP4

    Demo #27 shows Windows Vista speech recognition with desktop, server-based autotranscribe.  WMP MP4


    Price, terms, specifications, and availability are subject to change without notice.  Custom Speech USA, Inc. trademarks are indicated.  Other marks are the property of their respective owners.  Dragon and NaturallySpeaking® are licensed trademarks of Nuance® (Nuance Communications, Inc.).