Features

STT-Helper covers a narrow, clearly defined set of functions: it accepts a transcript, processes it in several stages using an LLM service, and delivers the result by email. The sections below describe this functionality in terms of input, processing, quality assurance, and output.

Use cases

Lecture and seminar notes: automatically generated transcripts of a teaching session are turned into readable, linguistically polished text suitable for distribution as accompanying material.
Interview and meeting transcripts: spoken contributions containing filler words, broken sentences, and transcription errors are cleaned up; the result is a citable text version.
Stylistic conversion to written register: spoken-style content is rewritten into a sober, active style oriented toward academic prose.
Structured documentation from raw text: a contiguous transcript is turned into a Markdown document organized by headings and paragraphs.
Preparation of recordings for RAG applications: recorded content is converted into cleaned, structured Markdown that can be split into chunks and indexed by retrieval-based chatbots or knowledge bases.
Domain-specific correction: a context field for subject area and terminology steers the recognition and correction of mistranscribed technical terms (in fields such as medicine, law, or engineering).

At a glance

Input channels: file upload and direct text entry
Three selectable, cumulative processing stages (correction, revision, formatting)
Free-text field for context to steer domain-specific processing
Chunk-based processing with overlap, parallel execution, and retries
Result delivery by email with a stage-specific filename
Bilingual user interface (German and English)
Validation of input data and email address

Input

Texts enter the application either as a file upload or as direct input in the web form. File uploads are limited to the plain-text extensions .txt, .text, .md, and .markdown and to a maximum size of 10 MB. When reading a file, the application tries several character encodings (UTF-8, UTF-8 with BOM, Latin-1, CP1252) and detects binary content by its proportion of null bytes. Input texts shorter than 100 characters or longer than five million characters are rejected.
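The following is a minimal Python sketch of the upload checks described above, not the application's actual code. The names `decode_upload`, `validate_text`, and `InputError`, the 10 % null-byte threshold, and the reading of 10 MB as 10 × 1024 × 1024 bytes are assumptions. The encoding order deliberately differs from the list above: Python's `utf-8-sig` codec decodes UTF-8 both with and without a BOM, and because Latin-1 accepts every byte sequence, it only works as the final fallback.

```python
from pathlib import Path

# Limits taken from the description above.
ALLOWED_EXTENSIONS = {".txt", ".text", ".md", ".markdown"}
MAX_FILE_BYTES = 10 * 1024 * 1024  # 10 MB, read here as binary megabytes (assumption)
MIN_CHARS = 100
MAX_CHARS = 5_000_000

# Assumption: a file with more than 10 % null bytes is treated as binary.
NULL_BYTE_THRESHOLD = 0.10

# Assumed trial order: utf-8-sig covers UTF-8 with and without BOM;
# latin-1 decodes any byte sequence, so it must come last.
ENCODINGS = ("utf-8-sig", "cp1252", "latin-1")


class InputError(ValueError):
    """Raised when an upload or a text entry fails validation."""


def validate_text(text: str) -> str:
    """Apply the length limits that hold for uploads and direct entry alike."""
    if len(text) < MIN_CHARS:
        raise InputError(f"text shorter than {MIN_CHARS} characters")
    if len(text) > MAX_CHARS:
        raise InputError(f"text longer than {MAX_CHARS} characters")
    return text


def decode_upload(filename: str, data: bytes) -> str:
    """Check an uploaded file and return its decoded, length-checked text."""
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        raise InputError(f"unsupported file extension: {filename}")
    if len(data) > MAX_FILE_BYTES:
        raise InputError("file larger than 10 MB")
    # Binary detection via the share of null bytes.
    if data and data.count(0) / len(data) > NULL_BYTE_THRESHOLD:
        raise InputError("file appears to contain binary data")
    for encoding in ENCODINGS:
        try:
            text = data.decode(encoding)
        except UnicodeDecodeError:
            continue  # try the next encoding
        return validate_text(text)
    raise InputError("file could not be decoded")  # unreachable while latin-1 is last
```

Text entered directly into the web form would only pass through `validate_text`.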

The delivery email address is validated at submission time; optionally, accepted addresses can be restricted to the domain of a specific institution.
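A minimal sketch of such an address check, assuming a deliberately simple syntax pattern (not full RFC 5322 validation) and a hypothetical `allowed_domains` list that accepts the listed domains and their subdomains. The exact rule the application uses for the institutional restriction is not specified here.

```python
import re
from typing import Iterable, Optional

# Deliberately simple syntax check (not full RFC 5322): local part, "@",
# and a domain with at least one dot.
EMAIL_PATTERN = re.compile(r"[^@\s]+@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)")


def is_valid_delivery_address(
    address: str, allowed_domains: Optional[Iterable[str]] = None
) -> bool:
    """Return True if the address looks valid and, if a restriction is given,
    belongs to one of the allowed domains or one of their subdomains.
    An empty or missing domain list means no restriction."""
    match = EMAIL_PATTERN.fullmatch(address.strip())
    if match is None:
        return False
    if not allowed_domains:
        return True  # no institutional restriction configured
    domain = match.group(1).lower()
    allowed = {d.strip().lower().lstrip(".") for d in allowed_domains}
    # Exact match or a true subdomain ("mail.uni.de" matches "uni.de",
    # "evil-uni.de" does not).
    return any(domain == d or domain.endswith("." + d) for d in allowed)
```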

Processing stages

Processing is cumulative: stage 2 builds on the output of stage 1, and stage 3 builds on the output of stage 2. The user selects the desired final stage in the interface, and the application automatically runs all preceding stages.

Stage 1 — Correction: removal of obvious transcription errors, incomplete sentences, and colloquial filler. Technical terms are corrected based on the supplied context.
Stage 2 — Revision: stylistic conversion into a sober, professional written register with active voice and complete sentences. Content is neither added nor abridged.
Stage 3 — Formatting: structuring into Markdown with headings (level two and below), paragraphs, and a consistently readable layout.
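The cumulative behavior can be sketched as a loop that feeds each stage's output into the next. `run_pipeline` and the `call_llm` callback are hypothetical names; the callback stands in for whatever request the application actually sends to the LLM API. Keeping every intermediate result leaves earlier outputs available if a later stage fails.

```python
from typing import Callable, Dict

# The three cumulative stages in processing order.
STAGES = ("correction", "revision", "formatting")

# Hypothetical callback: (stage name, input text) -> output text.
LLMCall = Callable[[str, str], str]


def run_pipeline(text: str, final_stage: str, call_llm: LLMCall) -> Dict[str, str]:
    """Run every stage up to and including final_stage.

    Each stage receives the previous stage's output; all intermediate
    results are returned so that earlier outputs remain available.
    """
    last = STAGES.index(final_stage)  # ValueError for an unknown stage name
    results: Dict[str, str] = {}
    current = text
    for stage in STAGES[: last + 1]:
        current = call_llm(stage, current)
        results[stage] = current
    return results
```

With a stand-in callback such as `lambda stage, t: f"{stage}({t})"` and the input `"raw"`, requesting `"revision"` returns `{"correction": "correction(raw)", "revision": "revision(correction(raw))"}`.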

Context steering

A free-text field accepts a description of the subject area or desired terminology. This context is injected into every stage as part of the prompt and influences both the correction of individual terms and the stylistic direction. If no context is supplied, the application uses a generic default.
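How the context might be combined with a stage instruction is shown in the sketch below. The instruction wording, the `DEFAULT_CONTEXT` text, and the prompt layout are illustrative assumptions paraphrasing the stage descriptions above; the application's real prompts are not documented here.

```python
from typing import Optional

# Illustrative instructions paraphrasing the stage descriptions above.
STAGE_INSTRUCTIONS = {
    "correction": (
        "Remove obvious transcription errors, incomplete sentences and "
        "colloquial filler. Correct technical terms using the context."
    ),
    "revision": (
        "Rewrite the text in a sober, professional written register with "
        "active voice and complete sentences. Do not add or omit content."
    ),
    "formatting": (
        "Structure the text as Markdown with headings of level two and "
        "below, paragraphs and a consistent layout."
    ),
}

# Hypothetical generic default used when the context field is empty.
DEFAULT_CONTEXT = "General content without a specific subject area."


def build_prompt(stage: str, chunk: str, context: Optional[str] = None) -> str:
    """Combine the stage instruction, the user context and one text chunk."""
    context = (context or "").strip() or DEFAULT_CONTEXT
    return (
        f"{STAGE_INSTRUCTIONS[stage]}\n\n"
        f"Context (subject area and terminology): {context}\n\n"
        f"Text:\n{chunk}"
    )
```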

Quality-assurance functions

Chunk overlap: consecutive chunks share an overlap of 500 characters, so that sentence boundaries and semantic context are not lost at split points.
Multi-stage pipeline with separated concerns: each stage pursues its own isolated goal; correction, stylistic alignment, and formatting are not conflated into a single prompt.
Per-chunk retries: in case of API errors or timeouts, a chunk is retried up to five times with progressively increasing backoff.
Best-result merging: when assembling the output, the application uses, for each chunk, the result of the most advanced stage that succeeded. If a later stage produces no result for a chunk, the result of an earlier stage is used instead.
Original fallback: if processing of a chunk fails across all stages, the original segment is emitted with an error note instead of discarding the entire result.
Input validation: file and text properties are checked before the job is accepted (size, extension, encoding, binary detection, minimum and maximum length).
Statistics logging: job submissions are recorded with a timestamp in a statistics file; write access is serialized through a file lock.
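The chunk overlap can be illustrated with a simple character-based splitter. The 500-character overlap comes from the description above; the chunk size of 4,000 characters is a hypothetical value, and the real implementation may prefer to cut at sentence or paragraph boundaries.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 4000, overlap: int = 500) -> List[str]:
    """Split text into fixed-size chunks; each chunk after the first starts
    `overlap` characters before the end of its predecessor.

    chunk_size=4000 is a hypothetical default; overlap=500 follows the text.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current chunk already reaches the end of the text
        start += step
    return chunks
```

A quick consistency check: dropping the first 500 characters of every chunk except the first and concatenating the rest reproduces the input exactly.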
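Per-chunk retries combined with parallel execution could look like the following sketch. Several details are assumptions: the delay doubles from one second (the text only says the backoff increases progressively), "retried up to five times" is read as five retries after the first attempt, the worker count of 4 is arbitrary, and `TransientAPIError` is a hypothetical stand-in for the errors the real API client raises. A chunk that still fails is returned as `None`, leaving the fallback decision to the caller.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from typing import Callable, List, Optional


class TransientAPIError(Exception):
    """Hypothetical stand-in for the API errors the real client raises."""


RETRYABLE = (TransientAPIError, TimeoutError)


def call_with_retries(func: Callable[[str], str], chunk: str, *, max_retries: int = 5,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Call func(chunk); on a retryable error wait and try again.

    The delay doubles after every failed attempt (1 s, 2 s, 4 s, ...);
    after max_retries retries the last error is re-raised.
    """
    for attempt in range(max_retries + 1):
        try:
            return func(chunk)
        except RETRYABLE:
            if attempt == max_retries:
                raise
            sleep(base_delay * 2 ** attempt)


def _process_one(func, chunk, retry_options) -> Optional[str]:
    try:
        return call_with_retries(func, chunk, **retry_options)
    except RETRYABLE:
        return None  # retries exhausted; the caller decides on a fallback


def process_chunks_parallel(chunks: List[str], func: Callable[[str], str], *,
                            max_workers: int = 4, **retry_options) -> List[Optional[str]]:
    """Process chunks concurrently; results keep the input order."""
    worker = partial(_process_one, func, retry_options=retry_options)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, chunks))
```

The injectable `sleep` parameter exists only so that the backoff can be tested without real waiting.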
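Best-result merging and the original fallback can be expressed as a per-chunk search from the requested final stage backwards. The data layout (one dictionary of stage outputs per chunk), the wording of `ERROR_NOTE`, and the blank-line separator are assumptions. In particular, the sketch does not show how the overlapping regions of neighboring chunks are reconciled, which the description above does not specify.

```python
from typing import Dict, List

STAGES = ("correction", "revision", "formatting")

# Hypothetical wording of the note that precedes an unprocessed segment.
ERROR_NOTE = "[Note: this segment could not be processed; original text follows.]"


def merge_results(originals: List[str], stage_results: List[Dict[str, str]],
                  final_stage: str) -> str:
    """Assemble the output from per-chunk stage results.

    stage_results[i] maps stage name -> output for chunk i; a stage that
    failed is simply absent (an empty string is treated the same way).
    Overlap reconciliation between neighboring chunks is not shown.
    """
    wanted = STAGES[: STAGES.index(final_stage) + 1]
    parts = []
    for original, results in zip(originals, stage_results):
        # Search from the requested final stage back to stage 1.
        for stage in reversed(wanted):
            if results.get(stage):
                parts.append(results[stage])
                break
        else:
            # No stage succeeded: keep the original segment with a note.
            parts.append(f"{ERROR_NOTE}\n{original}")
    return "\n\n".join(parts)
```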
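Serialized appends to the statistics file might use an advisory file lock. This sketch relies on `fcntl.flock`, which is only available on Unix-like systems, and writes one JSON line per submission; both the file format and the function name `record_submission` are assumptions.

```python
import fcntl  # Unix only
import json
from datetime import datetime, timezone


def record_submission(stats_path: str) -> None:
    """Append one JSON line with the submission timestamp.

    An exclusive advisory lock serializes concurrent writers; other
    processes must also use flock for the lock to have any effect.
    """
    entry = {"submitted_at": datetime.now(timezone.utc).isoformat()}
    with open(stats_path, "a", encoding="utf-8") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            fh.write(json.dumps(entry) + "\n")
            fh.flush()  # push the line out before releasing the lock
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
```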

Output

The result is delivered as an email attachment. The filename reflects the chosen final stage (corrected_text, revised_text, or formatted_text.md). On success, the message contains a short summary and the processing duration; on failure, it contains a description of the problem encountered. After delivery, all input data, intermediate results, and the working directory are removed.
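A sketch of the delivery step using the standard library's `email.message.EmailMessage`: it attaches the result under the stage-specific filename listed above and keeps all job files in a temporary directory that is deleted when processing ends. The subject and body wording, the `run_job` helper, and its callback structure are hypothetical, and actually sending the message (for example via `smtplib`) is omitted.

```python
import tempfile
from email.message import EmailMessage
from pathlib import Path
from typing import Callable

# Attachment names as listed above (the first two are shown without an extension).
RESULT_FILENAMES = {
    "correction": "corrected_text",
    "revision": "revised_text",
    "formatting": "formatted_text.md",
}


def build_result_email(to_addr: str, final_stage: str, result: str,
                       duration_s: float) -> EmailMessage:
    """Build a success message with the result as a stage-specific attachment."""
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = "STT-Helper: your text is ready"  # hypothetical wording
    msg.set_content(f"Final stage: {final_stage}\nProcessing time: {duration_s:.0f} s")
    filename = RESULT_FILENAMES[final_stage]
    subtype = "markdown" if filename.endswith(".md") else "plain"
    msg.add_attachment(result, subtype=subtype, filename=filename)
    return msg


def run_job(input_text: str, process: Callable[[Path], str],
            deliver: Callable[[str], None]) -> None:
    """Keep all job files in a temporary directory that is removed afterwards,
    whether processing and delivery succeed or raise."""
    with tempfile.TemporaryDirectory(prefix="stt-job-") as tmp:
        workdir = Path(tmp)
        (workdir / "input.txt").write_text(input_text, encoding="utf-8")
        deliver(process(workdir))
    # Leaving the with-block deletes the directory and everything in it.
```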

Operation

The interface is available in German and English; the language is selected automatically via the Accept-Language header or explicitly via a URL parameter. A dedicated button checks the availability of the connected LLM API without submitting a job.
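Language selection could follow the logic below, in which an explicit URL parameter takes precedence over the `Accept-Language` header and header entries are ranked by their quality weights. The precedence, the fallback to German, and the function name are assumptions; the name of the URL parameter is not given here, so the function simply receives its value.

```python
from typing import Optional

SUPPORTED_LANGUAGES = ("de", "en")
DEFAULT_LANGUAGE = "de"  # assumption: fallback when nothing matches


def select_language(url_param: Optional[str], accept_language: Optional[str]) -> str:
    """Pick the UI language: an explicit URL parameter wins, then the
    highest-weighted supported entry of the Accept-Language header."""
    if url_param and url_param.strip().lower() in SUPPORTED_LANGUAGES:
        return url_param.strip().lower()
    candidates = []
    for position, item in enumerate((accept_language or "").split(",")):
        tag, *params = [p.strip() for p in item.split(";")]
        primary = tag.lower().split("-")[0]  # "en-US" -> "en"
        quality = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # malformed weight: ignore the entry
        if primary in SUPPORTED_LANGUAGES and quality > 0:
            # Sort key: higher quality first, then earlier position.
            candidates.append((-quality, position, primary))
    return min(candidates)[2] if candidates else DEFAULT_LANGUAGE
```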