STT-Helper

STT-Helper is an application for the linguistic processing of automatically generated transcripts. Submitted text passes through a three-stage LLM-driven pipeline consisting of correction, stylistic revision, and structuring. Processing runs in the background; the result is delivered by email.
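
The stage sequence can be pictured as a loop in which each stage consumes the previous stage's output and stops at the selected final stage. This is a minimal Python sketch under stated assumptions, not the application's code: the stage keys, the prompt templates in `PROMPTS`, and the injected `call_llm` function are hypothetical placeholders. It also shows the free-text context string being inserted into every stage prompt. Chunking is left out; the function works on a single piece of text.

```python
from typing import Callable, Dict, List

# Hypothetical stage keys and prompt templates; the real prompts are not documented.
STAGES: List[str] = ["correction", "revision", "formatting"]
PROMPTS: Dict[str, str] = {
    "correction": "Fix recognition errors in this transcript.\nContext: {context}\n\n{text}",
    "revision": "Rewrite this text in a sober, written register.\nContext: {context}\n\n{text}",
    "formatting": "Structure this text as Markdown with headings and paragraphs.\nContext: {context}\n\n{text}",
}


def run_pipeline(text: str, final_stage: str, context: str,
                 call_llm: Callable[[str], str]) -> str:
    """Run the stages in order up to and including final_stage."""
    if final_stage not in STAGES:
        raise ValueError(f"unknown stage: {final_stage!r}")
    result = text
    for stage in STAGES[: STAGES.index(final_stage) + 1]:
        # The same context string is injected into every stage prompt.
        prompt = PROMPTS[stage].format(context=context, text=result)
        result = call_llm(prompt)  # each stage builds on the previous output
    return result
```

Passing `call_llm` in as a parameter keeps the sketch independent of any particular LLM client.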

In contrast to a direct LLM call, the application splits the input text into overlapping chunks, applies a dedicated prompt configuration per stage, and merges the partial results using a fault-tolerant strategy. Long transcripts can therefore be processed in a single run without losing context across chunk boundaries.
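
A sentence-aware chunker with overlap might look like the following sketch. It illustrates the idea and is not the application's implementation: the regex-based sentence splitter, the character budget `max_chars`, and measuring the overlap in whole sentences (`overlap_sentences`) are all hypothetical choices. Each new chunk starts with the last few sentences of the previous one, so the model sees some preceding context at every chunk boundary.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_with_overlap(text: str, max_chars: int = 2000,
                       overlap_sentences: int = 2) -> List[str]:
    """Group whole sentences into chunks of about max_chars characters.

    max_chars is a soft limit: a single long sentence, or the carried-over
    overlap plus one new sentence, may exceed it.
    """
    chunks: List[str] = []
    current: List[str] = []
    for sentence in split_sentences(text):
        candidate_len = len(" ".join(current + [sentence]))
        if current and candidate_len > max_chars:
            chunks.append(" ".join(current))
            # Carry the last sentences over so neighboring chunks share context.
            current = current[-overlap_sentences:] if overlap_sentences > 0 else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

For example, with `max_chars=14` and `overlap_sentences=1`, the text `"A one. B two. C three. D four."` yields `["A one. B two.", "B two. C three.", "C three. D four."]`.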

At a glance

Turn raw transcripts into readable, linguistically cleaned text
Convert spoken-style content into a sober, written-style register
Generate Markdown-structured documents with headings and paragraphs
Process long transcripts in a single run without manual splitting
Prepare recordings for downstream use in RAG applications
Steer terminology recognition by providing subject-area context
Receive results asynchronously by email, without keeping a browser session open

Highlights

Unlike a plain call to an LLM chat endpoint, the application runs text through a multi-stage, fault-tolerant processing pipeline. In practice this means that longer texts remain processable, partial failures do not invalidate the entire run, and each phase pursues a clearly delimited goal.

Three-stage pipeline: correction, revision, and formatting are separate phases, each with its own dedicated prompt. Each phase builds on the result of the previous one; the desired final stage is chosen in the UI.
Chunking with overlap: input texts are split into fixed-length segments with overlap, preserving sentence boundaries and contextual coherence across chunk borders.
Parallel processing with retry: multiple chunks are sent to the LLM API concurrently; on transient failures the call is retried with progressive backoff.
Robust merging: if a stage fails permanently for individual chunks, the procedure reuses the highest-ranked successful intermediate result and falls back to the original chunk as a last resort, instead of aborting the whole run.
Context-driven terminology: a free-text context field (subject area, special vocabulary) is injected into every stage prompt and steers the recognition and correction of technical terms.
Asynchronous background processing: a job queue separates the interactive frontend from the long-running processing. The session can be closed once the job has been submitted.
Multiple input channels: file upload (text formats) or direct entry into the web form.
Input validation: encoding fallback across multiple character sets, binary content detection, size and length checks.
Bilingual user interface: German and English, switched by browser language or URL parameter.
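
The concurrent calls with retry described under "Parallel processing with retry" can be sketched with a thread pool and exponential backoff. Everything here is an assumption for illustration: `TransientError` stands in for whatever the real client raises on timeouts or rate limits, and the doubling delay is just one common form of progressive backoff. A chunk whose retries are exhausted does not abort the run; its exception is returned in place of a result so that a later merge step can fall back.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Union


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, rate limits)."""


def call_with_retry(fn: Callable[[str], str], chunk: str,
                    max_attempts: int = 4, base_delay: float = 1.0) -> str:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(chunk)
        except TransientError:
            if attempt == max_attempts:
                raise  # retries exhausted; the caller decides what happens next
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


def process_chunks(fn: Callable[[str], str], chunks: List[str],
                   max_workers: int = 4,
                   **retry_kwargs: Any) -> List[Union[str, Exception]]:
    """Process chunks concurrently; results keep the input order.

    A chunk that still fails after all retries yields its exception
    instead of a string, so one bad chunk never cancels the others.
    """
    def safe(chunk: str) -> Union[str, Exception]:
        try:
            return call_with_retry(fn, chunk, **retry_kwargs)
        except Exception as exc:
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(safe, chunks))
```

`ThreadPoolExecutor.map` returns results in submission order, which keeps chunk positions stable for merging even though calls finish out of order.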
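
The fallback rule under "Robust merging" can be expressed as a small selection function. In this sketch, "highest-ranked" is read as the latest pipeline stage that produced a usable result for a chunk; that reading, the `STAGE_ORDER` list, and treating `None` or an empty string as a failed stage are assumptions. How the overlapping regions of neighboring chunks are reconciled is not described here and is not shown.

```python
from typing import Dict, List, Optional

# Assumed ranking: a later stage outranks an earlier one.
STAGE_ORDER: List[str] = ["correction", "revision", "formatting"]


def best_result(original: str, stage_results: Dict[str, Optional[str]],
                target_stage: str) -> str:
    """Return the latest successful stage output up to target_stage, else the original chunk."""
    ranked = STAGE_ORDER[: STAGE_ORDER.index(target_stage) + 1]
    for stage in reversed(ranked):
        result = stage_results.get(stage)
        if result:  # None (failed) or "" (empty answer) count as unusable
            return result
    return original  # last resort: keep the unprocessed chunk


def select_chunk_results(originals: List[str],
                         per_chunk: List[Dict[str, Optional[str]]],
                         target_stage: str) -> List[str]:
    """Apply the fallback rule to every chunk; the run never aborts."""
    return [best_result(o, r, target_stage) for o, r in zip(originals, per_chunk)]
```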
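
The separation between frontend and background processing under "Asynchronous background processing" follows the usual producer/consumer pattern. The sketch below uses Python's in-process `queue.Queue` and a single worker thread purely for illustration; the application's actual queue technology is not specified, and the `process` and `deliver` callables (standing in for the pipeline and the email step) are hypothetical. The web request only enqueues a `Job` and returns, which is why the browser session can be closed afterwards.

```python
import logging
import queue
import threading
from dataclasses import dataclass
from typing import Callable


@dataclass
class Job:
    text: str
    email: str  # address that receives the result


def start_worker(jobs: "queue.Queue[Job]",
                 process: Callable[[str], str],
                 deliver: Callable[[str, str], None]) -> threading.Thread:
    """Start a daemon thread that processes queued jobs one at a time."""
    def run() -> None:
        while True:
            job = jobs.get()  # blocks until the frontend submits a job
            try:
                deliver(job.email, process(job.text))
            except Exception:
                # Keep the worker alive; a real system would also notify the user.
                logging.exception("job for %s failed", job.email)
            finally:
                jobs.task_done()

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker
```

A frontend handler would call `jobs.put(Job(text, email))` and respond immediately, leaving the long-running work to the worker.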
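
The checks listed under "Input validation" could be combined roughly as follows. The byte and character limits, the order of candidate encodings in `ENCODINGS`, and the control-character heuristic in `looks_binary` are hypothetical values chosen for illustration; the application's actual limits and character sets are not documented here.

```python
from typing import Tuple

# Hypothetical limits and fallback order.
MAX_BYTES = 5 * 1024 * 1024
MAX_CHARS = 500_000
# utf-8-sig also decodes plain UTF-8 and strips a BOM if present;
# latin-1 accepts every byte, so it acts as the final catch-all.
ENCODINGS: Tuple[str, ...] = ("utf-8-sig", "cp1252", "latin-1")


class InputError(ValueError):
    """Raised when an upload or form entry is rejected."""


def looks_binary(data: bytes, sample_size: int = 8192) -> bool:
    sample = data[:sample_size]
    if not sample:
        return False
    if b"\x00" in sample:  # NUL bytes: strong binary signal (also rejects UTF-16 text)
        return True
    # Share of control bytes other than tab, LF, form feed, and CR.
    control = sum(1 for b in sample if b < 32 and b not in (9, 10, 12, 13))
    return control / len(sample) > 0.1


def decode_input(data: bytes) -> str:
    if len(data) > MAX_BYTES:
        raise InputError("file too large")
    if looks_binary(data):
        raise InputError("binary content detected")
    for encoding in ENCODINGS:
        try:
            text = data.decode(encoding)
            break
        except UnicodeDecodeError:
            continue
    else:  # only reachable if latin-1 is removed from ENCODINGS
        raise InputError("no supported encoding matched")
    if not text.strip():
        raise InputError("empty input")
    if len(text) > MAX_CHARS:
        raise InputError("text too long")
    return text
```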
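
Language selection by URL parameter or browser language can be sketched as follows. The precedence (an explicit URL parameter overrides the browser's `Accept-Language` header), the simplified header parsing, and the fallback `DEFAULT_LANGUAGE` are assumptions; the source only states that both mechanisms exist.

```python
from typing import List, Optional, Tuple

SUPPORTED_LANGUAGES: Tuple[str, ...] = ("de", "en")
DEFAULT_LANGUAGE = "en"  # assumption; the actual default is not documented


def parse_accept_language(header: str) -> List[str]:
    """Return language tags from an Accept-Language header, highest q-value first."""
    weighted = []
    for part in header.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        weighted.append((quality, tag.strip().lower()))
    # list.sort is stable even with reverse=True, so equal q-values keep header order.
    weighted.sort(key=lambda item: item[0], reverse=True)
    return [tag for quality, tag in weighted if quality > 0]


def select_language(url_param: Optional[str], accept_language: Optional[str]) -> str:
    """Explicit URL parameter first, then browser preference, then the default."""
    if url_param and url_param.lower() in SUPPORTED_LANGUAGES:
        return url_param.lower()
    for tag in parse_accept_language(accept_language or ""):
        primary = tag.split("-")[0]  # "de-AT" -> "de"
        if primary in SUPPORTED_LANGUAGES:
            return primary
    return DEFAULT_LANGUAGE
```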