STT-Helper

STT-Helper is an application for the linguistic processing of automatically generated transcripts. Submitted text passes through a three-stage LLM-driven pipeline consisting of correction, stylistic revision, and structuring. Processing runs in the background; the result is delivered by email.

In contrast to a direct LLM call, the application splits the input text into overlapping chunks, applies a dedicated prompt configuration per stage, and merges the partial results using a fault-tolerant strategy. Long transcripts can therefore be processed in a single run without losing context across chunk boundaries.
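The overlapping split described above can be illustrated with a minimal sketch. The function name `split_into_chunks` and the parameters `chunk_size` and `overlap` are hypothetical and not taken from STT-Helper. The sketch cuts at plain character offsets, so it does not reproduce any sentence-boundary handling the application may perform.

```python
def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into fixed-length chunks; neighbours share `overlap` characters.

    Hypothetical helper for illustration only; sizes are counted in characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks


if __name__ == "__main__":
    print(split_into_chunks("abcdefghij", chunk_size=4, overlap=1))
    # prints ['abcd', 'defg', 'ghij']
```

Each chunk repeats the tail of its predecessor, which is what gives the model context on both sides of a cut. A real implementation would still have to remove the duplicated overlap when merging the processed chunks.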

At a glance

  • Turn raw transcripts into readable, linguistically cleaned text
  • Convert spoken-style content into a sober, written-style register
  • Generate Markdown-structured documents with headings and paragraphs
  • Process long transcripts in a single run without manual splitting
  • Prepare recordings for downstream use in RAG applications
  • Steer terminology recognition by providing subject-area context
  • Receive results asynchronously by email, without keeping a browser session open

Highlights

Unlike a plain call to an LLM chat endpoint, the application runs a multi-stage, fault-tolerant processing pipeline. This has three consequences for output quality: longer texts remain processable, partial failures do not invalidate the entire run, and each phase pursues a clearly delimited goal.

  • Three-stage pipeline: correction, stylistic revision, and structuring (formatting) are independent phases with dedicated prompts. Each phase builds on the result of the previous one; the desired final stage is chosen in the UI.
  • Chunking with overlap: input texts are split into fixed-length segments with overlap, preserving sentence boundaries and contextual coherence across chunk borders.
  • Parallel processing with retry: multiple chunks are sent to the LLM API concurrently; on transient failures the call is retried with progressive backoff.
  • Robust merging: if a stage fails permanently for individual chunks, the procedure reuses the highest-ranked successful intermediate result and falls back to the original chunk as a last resort, instead of aborting the whole run.
  • Context-driven terminology: a free-text context field (subject area, special vocabulary) is injected into every stage prompt and steers the recognition and correction of technical terms.
  • Asynchronous background processing: a job queue separates the interactive frontend from the long-running processing. The session can be closed once the job has been submitted.
  • Multiple input channels: file upload (text formats) or direct entry into the web form.
  • Input validation: encoding fallback across multiple character sets, binary content detection, size and length checks.
  • Bilingual user interface: German and English, switched by browser language or URL parameter.
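
The retry and fallback behaviour from the list above can be sketched as follows. This is an illustration under stated assumptions, not the application's code: the names `call_with_retry` and `pick_chunk_result` are hypothetical, "progressive backoff" is modelled here as a delay that doubles after each failed attempt, and "highest-ranked successful intermediate result" is read as the output of the latest stage that succeeded for a chunk.

```python
import time
from typing import Callable, Optional


def call_with_retry(
    fn: Callable[[], str],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fn; on an exception wait and retry. Return None if all attempts fail.

    Assumption: the delay doubles per attempt (base_delay, 2*base_delay, ...).
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            # No waiting after the final attempt; the failure is permanent.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    return None


def pick_chunk_result(original: str, stage_results: list[Optional[str]]) -> str:
    """Choose the text to use for one chunk when merging.

    stage_results holds the outputs in pipeline order (correction, revision,
    formatting); None marks a stage that failed permanently for this chunk.
    """
    for result in reversed(stage_results):
        if result is not None:
            return result  # latest stage that succeeded
    return original  # last resort: the unprocessed chunk


if __name__ == "__main__":
    # Formatting failed for this chunk, so the revised text is used instead.
    print(pick_chunk_result("raw text", ["corrected", "revised", None]))
    # prints revised
```

Returning `None` instead of raising keeps a single failed chunk from aborting the run; the merge step then decides per chunk which text to keep.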