Skip to content

Architecture

STT-Helper follows a separation between an interactive web frontend and an asynchronous processing worker. The application does not present a persistent result view in the UI; instead, it submits a job to a queue, and the actual long-running LLM processing takes place in a decoupled component, ending in an email delivery. The frontend therefore stays responsive and is not tied to the lifetime of the browser session.

At a glance

  • Separation between web frontend, job queue, and worker process
  • Asynchronous background processing; result delivered by email
  • Chunk-based pipeline with semaphore-limited parallelism
  • Cumulative three-stage processing through separate prompts
  • File-based job configuration in a temporary working directory
  • HTTP/JSON connection to an LLM API with retry and timeout
  • Stateless frontend; persistent state only in a simple statistics log

Components

The application consists of three active components and several connections:

  • Web frontend: receives input, validates the file and email address, sets up a temporary working directory containing the input text and a configuration file, and submits a job to the queue.
  • Job queue: brokers jobs between frontend and worker. Jobs are submitted in background mode; the frontend receives no progress feedback.
  • Worker process: reads the job configuration, performs chunking, calls the LLM API stage by stage, merges the results, and hands the output file off to the email dispatcher.
  • LLM API: an externally addressed HTTP endpoint, called once per chunk and stage.
  • Email dispatcher: a helper utility responsible for delivering the result, or an error notification, to the recipient.

Workflow

flowchart TD
    User[User]
    UI[Web Frontend]
    Validate[Validation]
    Workdir[Working Directory<br/>Input + Config]
    Queue[Job Queue]
    Worker[Worker Process]
    Chunker[Chunking + Overlap]
    Stage1[Stage 1: Correction]
    Stage2[Stage 2: Revision]
    Stage3[Stage 3: Formatting]
    LLM[LLM API]
    Merge[Result Merging]
    Result[Result File]
    Mailer[Email Dispatcher]
    Mailbox[User Mailbox]

    User -->|File or Text| UI
    UI --> Validate
    Validate -->|valid| Workdir
    Workdir --> Queue
    Queue --> Worker
    Worker --> Chunker
    Chunker --> Stage1
    Stage1 -->|HTTP/JSON| LLM
    LLM --> Stage1
    Stage1 --> Stage2
    Stage2 -->|HTTP/JSON| LLM
    LLM --> Stage2
    Stage2 --> Stage3
    Stage3 -->|HTTP/JSON| LLM
    LLM --> Stage3
    Stage1 --> Merge
    Stage2 --> Merge
    Stage3 --> Merge
    Merge --> Result
    Result --> Mailer
    Mailer --> Mailbox
    Mailbox --> User

The flow starts with input through the web frontend. After successful validation, a temporary working directory is created containing the input text and a configuration file with the recipient address, the chosen final stage, the language preference, and the context. The frontend hands this path to the job queue and confirms acceptance to the user; the browser session can be closed at this point.

The worker process is launched by the queue, loads the configuration, and reads the input text. It then splits the text into chunks of fixed length with overlap. For each active stage, the LLM API call is executed in parallel for every chunk, bounded by a semaphore. On transient errors, up to five retries with progressive backoff are performed. The output of one stage becomes the input of the next.

When merging the final result, the worker selects the highest-ranked successful intermediate result per chunk and falls back to the original segment with an error note if a chunk fails permanently. The result file is stored under a name reflecting the final stage and is handed off to the email dispatcher. After delivery, the working directory is removed entirely.

Role of the LLM in the pipeline

The application uses a single LLM, but invokes it with a dedicated prompt at each stage. The stages pursue distinct tasks (linguistic correction, stylistic revision, Markdown structuring) and are not bundled into a single mixed call. Embedders, rerankers, or agentic decision logic are not used; the pipeline is deterministic in the order of stages and in the mapping of chunk to API call.

Concurrency and robustness

Parallelism is bounded through a semaphore to a fixed number of concurrent chunk calls. This caps the load on the LLM API and keeps the call rate predictable. Each chunk and stage allows up to five retries with progressive backoff; a permanently failing chunk does not block the processing of other chunks or the completion of the overall job. The statistics file is serialized through a file lock to avoid write conflicts under concurrent submissions.

Configuration and operation

Key runtime parameters (root path, request timeout) are controlled through environment variables. The LLM connection is defined through central constants. The job queue is contacted on startup, and the web frontend binds to a configurable port. The worker is orchestrated by a Bash script that invokes the Python processing process and then handles email delivery and cleanup of the working directory.

Technology overview

  • Web frontend: Gradio (Python)
  • Asynchronous HTTP calls: asyncio, aiohttp
  • Job queue: Gearman (Python client gear)
  • File lock: filelock
  • Email delivery: sendemail (CLI)
  • Worker-run orchestration: Bash script
  • LLM connection: HTTP/JSON against a chat-completions-compatible endpoint