Let There Be Speech!™

Supports Server-Based Speech Recognition (SR), Speech User Training, and Presegmentation of Dictation Audio

Supports Windows and SweetSpeech™ SR and earlier versions of Dragon and IBM SR.

Repetitive, Iterative Training for Speaker-Adaptive Speech Recognition


Pricing/custom programming/business systems integration

Version compatibility/operating systems/dependencies

Feature summary: SpeechMax™/SpeechServers™/SweetSpeech™

Video Demos

Software Development Milestones

Nonadaptive Speech Recognition

Company software also supports workflow and business systems integration, workflow design, call center, telephone dictation, digital dictation with handheld microphone, manual transcription audio playback with foot control, audio conversion, and macro creation for programmable keypad, voice, or bar code.

U.S. patents and patents pending


Software provides server-based speech user management/workflow for server-based speech recognition (SR).  SpeechServers™ is available for purchase separately or with the SpeechProfessional™ combo package.

Company has offered Microsoft SAPI 5.x compatible, Dragon, and IBM speech recognition.  The SAPI 5.x compatible SweetSpeech™ supports NASR based upon the speech of a single speaker (speaker-dependent) as well as small-group and large-group speech user profiles.

Edit output session files with the SpeechMax™ HTML text editor. 

Servers run as a remote server or on the desktop (e.g., autotranscribe for Microsoft SR or SweetSpeech™).  Software transcribes dictation from a microphone, handheld recorder, or landline/cell phone, and also presegments audio prior to manual transcription (MT).

System supports server-based transcription with delayed return of transcribed session file.  Remote, real-time processing is not available.

SpeechServers™ SAPI 5.x is included in same install kit as SweetSpeech™, but is separately licensed.

The servers output an audio-linked transcribed session file for Microsoft speech recognition (tested with Microsoft Vista) and other compatible Microsoft SAPI 5.x speech engines (SaveSession™).  Other options include transcription of a .WAV or other audio file into a separate text file (TransWaveX™).

Services for Dragon and other adaptive SR also include CompleteProfile™, which supports remote enrollment through an audio file and verbatim text.  Software was available for early versions of Dragon and IBM.  It was not extensively tested with Windows Vista SR.  It is not used with SweetSpeech™, which enrolls the user with audio-linked text.

Software also supports presegmentation of dictation audio into an untranscribed session file (SpeechSplitter™).  Presegmentation parameters are the same for speech-to-text conversion and for manual transcription (common segmentation module).  This results in the same number of segments.

Subsequent manual transcription converts the data into a transcribed session file.  This includes creation of audio-linked distribution (final) text, as well as a verbatim text training session file (.TRS) for training and updating the company's nonadaptive SweetSpeech™.
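A silence-based presegmentation pass of this kind can be sketched in Python. The frame size, energy threshold, and minimum pause length below are illustrative assumptions, not the actual SpeechSplitter™ parameters:

```python
# Sketch of silence-based utterance segmentation (hypothetical parameters;
# the real SpeechSplitter™ module's thresholds are not documented here).

def segment_utterances(samples, rate, frame_ms=20, silence_thresh=500,
                       min_silence_ms=300):
    """Split a list of PCM samples into (start, end) utterance spans."""
    frame = int(rate * frame_ms / 1000)
    min_silence = min_silence_ms // frame_ms      # pause length, in frames
    spans, start, silent_run = [], None, 0
    for i in range(0, len(samples), frame):
        chunk = samples[i:i + frame]
        energy = max((abs(s) for s in chunk), default=0)
        if energy >= silence_thresh:              # speech frame
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:                   # silence inside an utterance
            silent_run += 1
            if silent_run >= min_silence:         # long pause: close utterance
                spans.append((start, i))
                start, silent_run = None, 0
    if start is not None:                         # flush trailing utterance
        spans.append((start, len(samples)))
    return spans
```

Because the same pass (and the same parameters) runs before both manual transcription and SR, the two paths yield the same number of segments.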

The SpeechServers™ version from 2007 supports only Microsoft, SweetSpeech™, and Dragon speech recognition:

1.  SweetSpeech™/SpeechServers™ SAPI 5.x ("SAPI 5.x") output includes audio-linked text in a proprietary .SES session file (SaveSession™), text output (TransWaveX™), and an untranscribed .SES session file with segmented audio (SpeechSplitter™).  This supports server-based SR for Microsoft and SweetSpeech™.

2.  SpeechServers™ ("Dragon") supports Dragon 9.x.  It may run with more recent versions.  Services support transcribed .TXT text output, transcribed audio-linked Dragon (.DRA) or CSUSA (.SES) session files, and speech user enrollment for SASR with verbatim text and audio.  Open .SES in SpeechMax™ only.

A graphical interface for SweetSpeech™ speech user profile training is available through that software's Model Builder.

User profile creation and updates for CompleteProfile™ run through the Dragon, IBM, and Microsoft speech engines, with higher-level user/file management running through SpeechServers™.

System integrates with non-Windows workflow.  For example, a medical center purchased SpeechServers™ (Dragon) for back-end, server-based speech recognition.

One requirement was the ability to transfer transcription data from a Windows platform to Unix through use of a Java interface.  Ease of integration with preexisting applications was a consideration in selecting the company's system.

The system incorporates the Dragon NaturallySpeaking Medical 9 dictation runtime and supports custom scripting for back-end workflow integration.  Two-way file transfer occurs through the SpeechServers™ Command!™ workflow/file management system, which supported two-way transfer of data via a file monitor.

The older version (Dragon v. 5/6 or IBM v. 8+) supported SpeechTrainer™.  It also included a standalone desktop client that is no longer available.

Older Help files describe SpeechTrainer™ service:

1.  SpeechTrainer™ corrective adaptation was handled through automatic activity of CSUSA_Train_X, where X is the engine type.  The activity must exist on a workflow for processing.  System performs corrective adaptation by transcribing audio, identifying differences between the transcribed output and the provided verbatim text, and applying corrections.  The process is repeated until appropriate conditions are met (target accuracy reached, unable to correct further, or maximum number of cycles).

2.  The SpeechTrainer™ source audio file may be in any supported format and will be converted to the engine's requirements before training.  To ensure maximum accuracy, the company recommended high-quality recording (low bit rates can cause more inaccuracies).  Source verbatim text should be in .TXT format.  If the job does not have an assigned engine user ID, the engine automatically creates a new user (based on default settings) for processing.  Software assigns the new user to the job's author.  [This supported creating a speech user on a workflow without prior Dragon microphone enrollment.]
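The corrective-adaptation loop described in item 1 (transcribe, compare against verbatim text, apply corrections, repeat until target accuracy, no further improvement, or the cycle limit) can be sketched as follows. The engine object and its two methods are hypothetical stand-ins, not a vendor SDK:

```python
# Sketch of SpeechTrainer™-style corrective adaptation.  The engine
# interface (transcribe/apply_corrections) is a hypothetical stand-in.
import difflib

def word_accuracy(hypothesis, verbatim):
    """Fraction of verbatim words matched in the hypothesis."""
    hyp, ref = hypothesis.split(), verbatim.split()
    matched = sum(b.size for b in
                  difflib.SequenceMatcher(None, ref, hyp).get_matching_blocks())
    return matched / max(len(ref), 1)

def corrective_adaptation(engine, audio, verbatim,
                          target_accuracy=0.95, max_cycles=5):
    """Repeat transcribe/compare/correct until a stop condition is met."""
    prev_accuracy = -1.0
    for cycle in range(max_cycles):
        accuracy = word_accuracy(engine.transcribe(audio), verbatim)
        if accuracy >= target_accuracy:           # target accuracy reached
            return accuracy, cycle
        if accuracy <= prev_accuracy:             # unable to correct further
            return accuracy, cycle
        engine.apply_corrections(audio, verbatim)  # adapt the user profile
        prev_accuracy = accuracy
    return prev_accuracy, max_cycles              # maximum number of cycles
```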

For more information on version compatibility, operating system requirements, and other dependencies, click here

SpeechServers™ General Functions (Services)

  • Workflow/speech user management
  • SweetSpeech™/SpeechServers™ SAPI 5.x ("SAPI 5.x")
  • SpeechServers™ ("Dragon")
  • SpeechTrainer™ iterative training, available only on early versions
  • SaveSession™ output audio-linked session file
  • TransWaveX™ output .TXT file only
  • CompleteProfile™ create user profile with verbatim text and audio file
  • SpeechSplitter™ segments dictation audio before manual transcription


Earlier version of the dialog when running Dragon v. 6.0 (the last version in which SpeechTrainer™ was available)

Value Proposition:  Efficient workflow/file management that also provides server-based pretraining for SASR with representative data before SR use, and ongoing training if required.  SweetSpeech™ provides a separate interface for pretraining its model.

Problem:  A speaker who usually dictates into a handheld recorder out of the office, e.g., in a car, will typically dictate differently when reading a script on a computer monitor, and will experience different background noise.  Research indicates that this results in a mismatch between the speaker's actual speech and the model created, and a likely decrease in accuracy.  How does a speaker generate representative data for user profile creation?

Solution:  SpeechServers™ supports server-based enrollment with an audio file and verbatim text for SASR.  A speech user can download a handheld recorder audio file and submit it with verbatim text to train the speech user profile.  SweetSpeech™ speaker-dependent NASR requires submission of an audio-linked verbatim text training session file.


Use representative, actual speech to train the speech user profile

  • Representative data used to create SASR and NASR speech user profile
  • CompleteProfile™ uses dictation audio file and verbatim text for SASR
  • SpeechSplitter™ segments audio for manual transcription (MT)
  • Resulting transcribed audio-linked text used to train SweetSpeech™
  • Use audio-linked session file for training SweetSpeech™
  • If starting with MT, text-compare MT with SR output to determine accuracy
  • If accuracy is sufficient, speaker enters automated SR phase
  • May also extract text and audio from preexisting SR output for training data
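The MT-versus-SR text compare in the list above amounts to a word-level accuracy measure. A standard word error rate (WER) computation, sketched below, is one way to do it; the company's own comparison tool is not documented here:

```python
# Sketch of a word-level text compare between manual transcription (MT)
# and SR output, using standard word error rate (WER).

def word_error_rate(reference, hypothesis):
    """Edit distance over words, divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    d = list(range(len(hyp) + 1))        # distances for the empty reference
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = min(d[j] + 1,          # deletion
                      d[j - 1] + 1,      # insertion
                      prev + (r != h))   # substitution (or exact match)
            prev, d[j] = d[j], cur
    return d[len(hyp)] / max(len(ref), 1)
```

A WER at or below the chosen threshold would indicate the speaker is ready for the automated SR phase.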


1.  System supports use of more representative data through remote speaker audio file/text enrollment.  Data may be obtained from manual transcription, or audio and text may be extracted from dictation made with an SR system.  Use the audio file and text for a SASR system; use an audio-linked session file for NASR.  An SR SDK is required to convert to .SES format for NASR training.  For SASR training, extract audio and text, convert the text to verbatim, and submit both to CompleteProfile™ for remote speaker enrollment.

2.  Efficient file/workflow management.  Compliant with the well-recognized SAPI 5.x standard.  Currently supports 3 speech engines (Dragon, Microsoft, and CSUSA), with the potential to increase the number of speech engines utilized.  The process supports different underlying speech engine techniques; for example, SR may use hidden Markov models with Gaussian mixtures, neural networks, adaptive, nonadaptive, speaker-dependent, speaker-independent, or other approaches.

3.  Software integrates with non-Windows or other systems through a workflow file monitor.  Two-way file transfer is established.  The audio file and other outside data are passed by the non-Windows system to the company workflow file monitor server as a workflow "job".  The speech engine processes the data and returns it to the file monitor folder.  The non-Windows software or other system then picks up the processed data from the file monitor folder.

SpeechSplitter™, SaveSession™ (SweetSpeech™)

Demo #1A . . . SpeechSplitter™ segmentation . . . Dictation audio file is segmented to create an untranscribed session file (USF) consisting of multiple utterances (phrases).  Transcriptionist manually transcribes in SpeechMax™ to create a transcribed session file (TSF) using PlaySpeech™ functionality.  Demo also shows realignment of a segment boundary marker to include the audio for "period" with the larger adjacent utterance.  Flash WMP MP4
Demo #1B . . . SpeechSplitter™ segmentation and creation of transcribed session file from previously transcribed file . . . Video shows SpeechSplitter™ utterance (phrase) segmentation to create an untranscribed session file from dictation audio.  Transcriptionist imports previously transcribed text, sequentially listens to each untranscribed utterance using PlaySpeech™ functionality, and sequentially delimits each utterance by toggling the play audio control.  This demo shows how the process can generate audio-aligned text from an audio file and preexisting text.  The segmented transcribed session file can be used as a training session file.  Flash WMP MP4

Demo #2 . . . Server-based transcription using prototype SweetSpeech™ speech-to-text conversion . . . In-house staff created a speech user profile with the SweetSpeech™ speech and language processing toolkit.  Video first shows text immediately after speech-to-text conversion (raw speech engine decoding).  This is followed by regular-expression algorithms that search and match text strings.  Conversion rules may reflect speaker or institutional preferences; the speech user profile typically reflects these preferences.  User loaded the post-formatting transcribed session file (TSF) into SpeechMax™ to play back the audio and make any needed corrections.  Flash WMP MP4
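The regular-expression post-formatting step shown in the demo can be sketched as an ordered rule list. The rules below are illustrative examples only, not the actual conversion rules stored in any speech user profile:

```python
# Sketch of ordered regular-expression conversion rules applied after raw
# speech-engine decoding (example rules only; real rules come from the
# speaker's or institution's preferences).
import re

CONVERSION_RULES = [
    (re.compile(r"\bperiod\b", re.I), "."),           # spoken punctuation
    (re.compile(r"\bnew paragraph\b", re.I), "\n\n"),
    (re.compile(r"\bone hundred\b", re.I), "100"),    # number formatting
    (re.compile(r"\s+([.,])"), r"\1"),                # close up punctuation
]

def post_format(raw_text):
    """Apply each conversion rule in order to the raw decoded text."""
    text = raw_text
    for pattern, replacement in CONVERSION_RULES:
        text = pattern.sub(replacement, text)
    return text
```

Applying the rules in a fixed order lets later rules (like the punctuation spacing fix) clean up the output of earlier ones.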

SaveSession™ Functionality (Dragon, Vista Windows SR)

Demo #26 . . . Dictation with Olympus handheld recorder and remote Dragon server-based speech recognition . . . Flash WMP MP4

Demo #27 . . . Vista Windows SR and desktop autotranscribe . . .  WMP MP4

Price, terms, specifications, and availability are subject to change without notice.  Custom Speech USA, Inc. trademarks are indicated.  Other marks are the property of their respective owners.  Dragon and NaturallySpeaking® are licensed trademarks of Nuance® (Nuance Communications, Inc.).