STT-Helper
STT-Helper is an application for the linguistic post-processing of automatically generated speech-to-text (STT) transcripts. Submitted text passes through a three-stage LLM-driven pipeline: correction, stylistic revision, and structuring. Processing runs in the background, and the result is delivered by email.
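The staged design can be pictured as a loop in which each stage consumes the previous stage's output and processing stops at the stage the user selected. The following Python sketch assumes a generic `call_llm(stage, text)` callable that applies a stage-specific prompt. `run_pipeline`, `call_llm`, `fake_llm`, and the stage identifiers are hypothetical names for illustration, not STT-Helper's actual code, and the real prompts are omitted.

```python
from typing import Callable

# Hypothetical stage identifiers, in pipeline order.
STAGES = ("correction", "revision", "structuring")


def run_pipeline(text: str, final_stage: str,
                 call_llm: Callable[[str, str], str]) -> str:
    """Run the stages in order, up to and including `final_stage`.

    `call_llm(stage, text)` is a placeholder for a call that applies the
    stage-specific prompt to `text` and returns the model output.
    """
    if final_stage not in STAGES:
        raise ValueError(f"unknown stage: {final_stage!r}")
    result = text
    for stage in STAGES[: STAGES.index(final_stage) + 1]:
        # Each stage works on the output of the previous one.
        result = call_llm(stage, result)
    return result


# Stub "LLM" that only records which stages touched the text.
def fake_llm(stage: str, text: str) -> str:
    return f"{text}|{stage}"


print(run_pipeline("raw", "revision", fake_llm))  # raw|correction|revision
```

Because the stages are chained, choosing an earlier final stage simply truncates the loop; no stage ever sees the raw input except the first.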
In contrast to a direct LLM call, the application splits the input text into overlapping chunks, applies a dedicated prompt configuration per stage, and merges the partial results using a fault-tolerant strategy. Long transcripts can therefore be processed in a single run without losing context across chunk boundaries.
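One way to split text into overlapping chunks while keeping sentences intact is to pack whole sentences greedily up to a length limit and repeat the last sentence(s) of each chunk at the start of the next. The Python sketch below illustrates that idea only; the function name `split_into_chunks`, the parameters `max_chars` and `overlap_sentences`, their defaults, and the naive regex sentence splitter are all assumptions, since the actual chunk size and overlap used by STT-Helper are not documented here.

```python
import re


def split_into_chunks(text: str, max_chars: int = 2000,
                      overlap_sentences: int = 1) -> list[str]:
    """Greedily pack whole sentences into chunks of roughly `max_chars`.

    The last `overlap_sentences` sentences of each finished chunk are
    repeated at the start of the next one as shared context. A single
    sentence longer than `max_chars` still becomes its own chunk.
    """
    # Naive sentence split after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        if current and len(" ".join(current + [sentence])) > max_chars:
            chunks.append(" ".join(current))
            # Carry the tail of the finished chunk over (guard: [-0:] would copy all).
            current = current[-overlap_sentences:] if overlap_sentences else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


print(split_into_chunks("One. Two. Three. Four.", max_chars=10))
# ['One. Two.', 'Two. Three.', 'Three. Four.']
```

The overlap gives the model the preceding sentence as context at every chunk border; the cost is that overlapping text is processed twice and has to be reconciled when the results are merged.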
At a glance
- Turn raw transcripts into readable, linguistically cleaned text
- Convert spoken-style content into a sober, written-style register
- Generate Markdown-structured documents with headings and paragraphs
- Process long transcripts in a single run without manual splitting
- Prepare recordings for downstream use in RAG applications
- Steer terminology recognition by providing subject-area context
- Receive results asynchronously by email, without keeping a browser session open
Highlights
Unlike a plain call to an LLM chat endpoint, the application runs a multi-stage, fault-tolerant processing pipeline. This has three consequences for output quality: longer texts remain processable, partial failures do not invalidate the entire run, and each phase pursues a clearly delimited goal.
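To illustrate how a partial failure can stay local to one chunk, the Python sketch below picks one text per chunk: the output of the latest pipeline stage that succeeded for that chunk, or the original chunk if every stage failed. It assumes that per-stage results are stored as lists indexed by chunk, with `None` marking a permanent failure; `select_chunk_results` and that data layout are hypothetical. How the overlapping regions of neighboring chunks are reconciled afterwards is not described in the source and is left out.

```python
from typing import Optional

# Hypothetical stage identifiers, in pipeline order (latest stage last).
STAGES = ("correction", "revision", "structuring")


def select_chunk_results(original_chunks: list[str],
                         stage_results: dict[str, list[Optional[str]]]) -> list[str]:
    """Pick one text per chunk.

    `stage_results[stage][i]` holds the output of `stage` for chunk i, or
    None if that stage failed permanently for the chunk. Stages that were
    not run are simply absent from the dict. The latest successful stage
    wins; the original chunk is the last resort.
    """
    selected = []
    for i, original in enumerate(original_chunks):
        chosen = original  # last resort: the unprocessed chunk
        for stage in reversed(STAGES):
            results = stage_results.get(stage)
            if results is not None and results[i] is not None:
                chosen = results[i]
                break
        selected.append(chosen)
    return selected


originals = ["a", "b", "c"]
stage_outputs = {
    "correction": ["a1", "b1", None],  # chunk 2 failed in the first stage
    "revision": ["a2", None, None],    # chunk 1 failed in the second stage
}
print(select_chunk_results(originals, stage_outputs))  # ['a2', 'b1', 'c']
```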
- Three-stage pipeline: correction, revision, and structuring are separate phases, each with its own prompt. Each phase builds on the result of the previous one; the user selects the final stage in the UI.
- Chunking with overlap: input texts are split into segments of a fixed target length that overlap their neighbors; sentence boundaries are respected, which preserves contextual coherence across chunk borders.
- Parallel processing with retry: multiple chunks are sent to the LLM API concurrently; on transient failures the call is retried with progressive backoff.
- Robust merging: if a stage fails permanently for an individual chunk, the procedure reuses that chunk's result from the latest stage that did succeed and falls back to the original chunk as a last resort, instead of aborting the whole run.
- Context-driven terminology: a free-text context field (subject area, special vocabulary) is injected into every stage prompt and steers the recognition and correction of technical terms.
- Asynchronous background processing: a job queue separates the interactive frontend from the long-running processing. The session can be closed once the job has been submitted.
- Multiple input channels: file upload (text formats) or direct entry into the web form.
- Input validation: encoding fallback across multiple character sets, binary content detection, size and length checks.
- Bilingual user interface: German and English, selected via the browser language or a URL parameter.
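Parallel processing with retry and progressive backoff, as listed above, can be sketched with Python's `asyncio`: a semaphore caps the number of concurrent API calls, and each call is retried with exponentially growing, slightly jittered delays. The exception class `TransientError`, the functions `call_with_retry` and `process_chunks`, and the defaults (four attempts, one-second base delay, four concurrent calls) are assumptions for illustration; the actual retry policy of STT-Helper is not documented here.

```python
import asyncio
import random
from typing import Awaitable, Callable, List, Optional


class TransientError(Exception):
    """Hypothetical marker for retryable failures (e.g. timeouts, rate limits)."""


async def call_with_retry(call: Callable[[str], Awaitable[str]], chunk: str,
                          max_attempts: int = 4, base_delay: float = 1.0) -> str:
    """Retry `call(chunk)` on TransientError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return await call(chunk)
        except TransientError:
            if attempt == max_attempts:
                raise  # give up: the caller treats this as a permanent failure
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ... by default
            await asyncio.sleep(delay + random.uniform(0, delay / 10))  # small jitter
    raise AssertionError("unreachable")


async def process_chunks(call: Callable[[str], Awaitable[str]], chunks: List[str],
                         max_concurrency: int = 4,
                         base_delay: float = 1.0) -> List[Optional[str]]:
    """Process chunks concurrently; None marks a chunk that failed permanently."""
    semaphore = asyncio.Semaphore(max_concurrency)  # cap parallel API calls

    async def worker(chunk: str) -> Optional[str]:
        async with semaphore:
            try:
                return await call_with_retry(call, chunk, base_delay=base_delay)
            except TransientError:
                return None

    # gather() keeps results in input order, so index i belongs to chunks[i].
    return list(await asyncio.gather(*(worker(c) for c in chunks)))


async def demo_call(chunk: str) -> str:
    await asyncio.sleep(0)  # stand-in for the real LLM API request
    return chunk.upper()


print(asyncio.run(process_chunks(demo_call, ["a", "b"])))  # ['A', 'B']
```

Returning `None` instead of raising keeps one failed chunk from cancelling its siblings, which is what allows a fallback at merge time.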
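The input-validation bullet combines several checks. The Python sketch below shows one plausible arrangement: a size limit, a NUL-byte heuristic for binary content, decoding through a chain of candidate encodings, and empty/length checks on the decoded text. The limits `MAX_BYTES` and `MAX_CHARS`, the encoding order, and the function name `decode_upload` are assumptions for illustration only.

```python
# Hypothetical limits and encoding order; STT-Helper's actual values are not documented here.
MAX_BYTES = 2_000_000
MAX_CHARS = 1_000_000
CANDIDATE_ENCODINGS = ("utf-8-sig", "cp1252", "latin-1")


def decode_upload(data: bytes) -> str:
    """Validate an uploaded file and decode it to text, or raise ValueError."""
    if len(data) > MAX_BYTES:
        raise ValueError("file too large")
    if b"\x00" in data:
        # NUL bytes are a cheap heuristic for binary content
        # (note: this also rejects UTF-16 encoded text files).
        raise ValueError("binary content detected")
    for encoding in CANDIDATE_ENCODINGS:
        try:
            text = data.decode(encoding)  # "utf-8-sig" also strips a leading BOM
        except UnicodeDecodeError:
            continue  # try the next, more permissive encoding
        break
    else:
        # Unreachable while latin-1 (which accepts any byte) is last in the list.
        raise ValueError("no supported encoding matched")
    if not text.strip():
        raise ValueError("empty text")
    if len(text) > MAX_CHARS:
        raise ValueError("text too long")
    return text


print(decode_upload("Grüße".encode("cp1252")))  # Grüße (UTF-8 fails, cp1252 succeeds)
```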