Architecture¶
STT-Helper follows a separation between an interactive web frontend and an asynchronous processing worker. The application does not present a persistent result view in the UI; instead, it submits a job to a queue, and the actual long-running LLM processing takes place in a decoupled component, ending in an email delivery. The frontend therefore stays responsive and is not tied to the lifetime of the browser session.
At a glance¶
- Separation between web frontend, job queue, and worker process
- Asynchronous background processing; result delivered by email
- Chunk-based pipeline with semaphore-limited parallelism
- Cumulative three-stage processing through separate prompts
- File-based job configuration in a temporary working directory
- HTTP/JSON connection to an LLM API with retry and timeout
- Stateless frontend; persistent state only in a simple statistics log
Components¶
The application consists of three active components and several connections:
- Web frontend: receives input, validates the file and email address, sets up a temporary working directory containing the input text and a configuration file, and submits a job to the queue.
- Job queue: brokers jobs between frontend and worker. Jobs are submitted in background mode; the frontend receives no progress feedback.
- Worker process: reads the job configuration, performs chunking, calls the LLM API stage by stage, merges the results, and hands the output file off to the email dispatcher.
- LLM API: an externally addressed HTTP endpoint, called once per chunk and stage.
- Email dispatcher: a helper utility responsible for delivering the result, or an error notification, to the recipient.
Workflow¶
flowchart TD
User[User]
UI[Web Frontend]
Validate[Validation]
Workdir[Working Directory<br/>Input + Config]
Queue[Job Queue]
Worker[Worker Process]
Chunker[Chunking + Overlap]
Stage1[Stage 1: Correction]
Stage2[Stage 2: Revision]
Stage3[Stage 3: Formatting]
LLM[LLM API]
Merge[Result Merging]
Result[Result File]
Mailer[Email Dispatcher]
Mailbox[User Mailbox]
User -->|File or Text| UI
UI --> Validate
Validate -->|valid| Workdir
Workdir --> Queue
Queue --> Worker
Worker --> Chunker
Chunker --> Stage1
Stage1 -->|HTTP/JSON| LLM
LLM --> Stage1
Stage1 --> Stage2
Stage2 -->|HTTP/JSON| LLM
LLM --> Stage2
Stage2 --> Stage3
Stage3 -->|HTTP/JSON| LLM
LLM --> Stage3
Stage1 --> Merge
Stage2 --> Merge
Stage3 --> Merge
Merge --> Result
Result --> Mailer
Mailer --> Mailbox
Mailbox --> User
The flow starts with input through the web frontend. After successful validation, a temporary working directory is created containing the input text and a configuration file with the recipient address, the chosen final stage, the language preference, and the context. The frontend hands this path to the job queue and confirms acceptance to the user; the browser session can be closed at this point.
The worker process is launched by the queue, loads the configuration, and reads the input text. It then splits the text into chunks of fixed length with overlap. For each active stage, the LLM API call is executed in parallel for every chunk, bounded by a semaphore. On transient errors, up to five retries with progressive backoff are performed. The output of one stage becomes the input of the next.
When merging the final result, the worker selects the highest-ranked successful intermediate result per chunk and falls back to the original segment with an error note if a chunk fails permanently. The result file is stored under a name reflecting the final stage and is handed off to the email dispatcher. After delivery, the working directory is removed entirely.
Role of the LLM in the pipeline¶
The application uses a single LLM, but invokes it with a dedicated prompt at each stage. The stages pursue distinct tasks (linguistic correction, stylistic revision, Markdown structuring) and are not bundled into a single mixed call. Embedders, rerankers, or agentic decision logic are not used; the pipeline is deterministic in the order of stages and in the mapping of chunk to API call.
Concurrency and robustness¶
Parallelism is bounded through a semaphore to a fixed number of concurrent chunk calls. This caps the load on the LLM API and keeps the call rate predictable. Each chunk and stage allows up to five retries with progressive backoff; a permanently failing chunk does not block the processing of other chunks or the completion of the overall job. The statistics file is serialized through a file lock to avoid write conflicts under concurrent submissions.
Configuration and operation¶
Key runtime parameters (root path, request timeout) are controlled through environment variables. The LLM connection is defined through central constants. The job queue is contacted on startup, and the web frontend binds to a configurable port. The worker is orchestrated by a Bash script that invokes the Python processing process and then handles email delivery and cleanup of the working directory.
Technology overview¶
- Web frontend: Gradio (Python)
- Asynchronous HTTP calls: asyncio, aiohttp
- Job queue: Gearman (Python client
gear) - File lock: filelock
- Email delivery:
sendemail(CLI) - Worker-run orchestration: Bash script
- LLM connection: HTTP/JSON against a chat-completions-compatible endpoint