
Architecture

The application is built as a single container with a clear separation of concerns: UI, orchestration, prompt construction, and LLM access are implemented as separate components. It communicates with an externally provided LLM instance via an OpenAI-compatible API and holds no persistent state itself. Configuration, including the connection to the LLM, is handled entirely via environment variables.

At a glance

  • Single-container application based on Gradio 6
  • Layered model: UI · orchestration · prompt builder · LLM client
  • External dependency: locally operated vLLM instance with an OpenAI-compatible API
  • Two-stage LLM workflow: neutralization (optional) → stylization
  • Configuration exclusively via environment variables (.env)
  • Session-scoped state in the UI, no data persistence
  • Reverse-proxy operation supported via GRADIO_ROOT_PATH

Component overview

The application is organized into five components, spread across four Python files:

  • UI layer (app.py). Builds the Gradio interface with four tabs (Transformation, Style Controls, Neutralization, History), manages session state (history, counter), and wires up the event handlers.
  • Orchestration (app.py:transform_text). Drives the two-stage workflow: when neutralization is active, the neutralization prompt is generated and executed first, and its result is fed into the stylization stage.
  • Prompt builder (prompt_builder.py). Generates system and user prompts for both stages. Translates numeric control values into a five-tier intensity scale and combines the active controls into a structured instruction list.
  • LLM client (llm_client.py). Wraps the OpenAI client, manages timeout and retry logic, and returns LLM responses to the orchestration layer.
  • Data models (models.py). Defines the data classes for controls, control settings, neutralization configuration, and transformation results.

In addition, there are a configuration module (config.py), a token counter (token_counter.py), and two JSON files holding the default controls and default presets.
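
The data models listed above could look roughly like the following sketch. The class and field names are assumptions chosen for illustration; the source only states that models.py defines data classes for controls, control settings, neutralization configuration, and transformation results.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Control:
    """Definition of one style control (all field names are hypothetical)."""
    name: str
    opposite_pole: Optional[str] = None  # set only for polar controls


@dataclass
class ControlSetting:
    """A control together with the numeric value chosen in the UI."""
    control: Control
    value: int = 0  # assumed convention: 0 means the control is inactive


@dataclass
class NeutralizationConfig:
    """Whether stage 1 runs and which dimensions it should neutralize."""
    enabled: bool = False
    dimensions: List[str] = field(default_factory=list)


@dataclass
class TransformationResult:
    """What one run produces and what the session history would store."""
    input_text: str
    output_text: str
    intermediate_text: Optional[str] = None  # present only when stage 1 ran
```

Keeping these as plain data classes without behavior would fit the layering described above: the UI, the orchestration, and the prompt builder can all exchange them without depending on each other.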

Data flow

flowchart TB
    User[Browser user]

    subgraph Container[Application container]
        UI[Gradio UI<br/>app.py]
        Orchestrator[Orchestration<br/>transform_text]
        PromptBuilder[Prompt builder]
        LLMClient[LLM client<br/>OpenAI-compatible]
        Token[Token counter<br/>tiktoken]
        State[Session state<br/>history]
        Defaults[(default_regler.json<br/>default_presets.json)]
    end

    vLLM[Locally operated<br/>vLLM instance]

    User -->|Input text, control values| UI
    UI --> Token
    UI --> Orchestrator
    Orchestrator -->|Stage 1: optional| PromptBuilder
    PromptBuilder -->|System + user prompt| LLMClient
    LLMClient -->|HTTP/JSON| vLLM
    vLLM -->|Response| LLMClient
    LLMClient --> Orchestrator
    Orchestrator -->|Stage 2| PromptBuilder
    Orchestrator --> State
    Orchestrator --> UI
    UI -->|Result, comparison| User
    Defaults --> UI

Workflow in detail

A transformation proceeds as follows:

  1. Input. The UI receives the text and the active control settings. The token counter checks the input size against the configured limit.
  2. Neutralization (optional). If neutralization is enabled and at least one dimension is selected, the prompt builder constructs a system prompt with the chosen dimensions and the strict anti-preamble rules. The LLM client issues the request to the vLLM instance and returns the neutralized intermediate text.
  3. Stylization. The prompt builder translates the active controls into textual instructions. Numeric values are mapped via _get_intensitaet_details to tier descriptions (slight, moderate, distinct, strong, EXTREME) and corresponding instructional sentences. Polar controls additionally include an avoidance clause for the opposite pole.
  4. LLM call. The LLM client invokes the chat completions API. On failure, the request is retried as configured by LLM_MAX_RETRIES, waiting LLM_RETRY_DELAY_SECONDS between attempts.
  5. Result handling. Input, intermediate, output, and the chosen configuration are stored in the session history; the UI renders the result as Markdown and updates the comparison selection.
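
The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's code: `intensity_tier`, `build_style_instructions`, and `run_two_stage` are hypothetical names, the tier thresholds and the 1 to 100 value range are assumed, the prompt wording is invented, and `call_llm` stands in for the LLM client. The avoidance clause for polar controls is left out for brevity.

```python
from typing import Callable, Dict, List, Optional

# Tier names come from the text; the upper bounds are assumed, not documented.
TIERS = [(20, "slight"), (40, "moderate"), (60, "distinct"),
         (80, "strong"), (100, "EXTREME")]


def intensity_tier(value: int) -> str:
    """Map a numeric control value (assumed 1..100) to one of five tiers."""
    for upper_bound, name in TIERS:
        if value <= upper_bound:
            return name
    return TIERS[-1][1]  # values above the range fall into the top tier


def build_style_instructions(controls: Dict[str, int]) -> List[str]:
    """Emit one instruction line per active control (value > 0)."""
    return [f"- {name}: {intensity_tier(value)}"
            for name, value in controls.items() if value > 0]


def run_two_stage(text: str,
                  controls: Dict[str, int],
                  neutralize_dimensions: List[str],
                  call_llm: Callable[[str, str], str]) -> dict:
    """Run optional neutralization, then stylization, and collect all texts."""
    intermediate: Optional[str] = None
    if neutralize_dimensions:  # stage 1 only runs with at least one dimension
        system = ("Neutralize the text along: "
                  + ", ".join(neutralize_dimensions)
                  + ". Return only the rewritten text.")
        intermediate = call_llm(system, text)

    # Stage 2 works on the neutralized text if there is one.
    stage2_input = intermediate if intermediate is not None else text
    system = ("Rewrite the text following these instructions:\n"
              + "\n".join(build_style_instructions(controls)))
    output = call_llm(system, stage2_input)
    return {"input": text, "intermediate": intermediate, "output": output}
```

Passing the LLM call in as a function keeps the orchestration testable without a running vLLM instance; a stub such as `lambda system, user: user` is enough to exercise both stages.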

Role of the LLM

The application uses the LLM as a pure text-processing tool for two clearly separated tasks (neutralization, stylization). No embedder or reranker is involved, and there is no agentic orchestration in the sense of tool-based autonomy. Predictability and controllability are instead achieved through the explicit step separation, the strict system prompts, and the five-tier intensity scale.

Robustness and configuration

Robustness rests on three mechanisms: configurable timeouts (LLM_TIMEOUT_SECONDS), automatic retries (LLM_MAX_RETRIES), and upstream input validation through token counting. The full configuration (LLM API base URL, API key, model name, token limits, server port, and reverse-proxy path) is read from environment variables in a .env file, which is loaded at runtime via python-dotenv.
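
Two of these mechanisms, the token check and the retry loop, can be sketched as below. The function names, the default values, and the choice to retry on any exception are assumptions; only the environment variable names and the `o200k_base` encoding are taken from this document. The timeout is not shown here; with the openai SDK it would typically be passed to the client constructor.

```python
import os
import time
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def within_token_limit(text: str, max_tokens: int,
                       encode: Optional[Callable[[str], list]] = None
                       ) -> Tuple[bool, int]:
    """Return (ok, token_count) for text measured against max_tokens."""
    if encode is None:
        # Imported lazily so the sketch also runs with a substitute encoder.
        import tiktoken
        encode = tiktoken.get_encoding("o200k_base").encode
    count = len(encode(text))
    return count <= max_tokens, count


def call_with_retries(request: Callable[[], T],
                      max_retries: Optional[int] = None,
                      delay_seconds: Optional[float] = None) -> T:
    """Run request(); on failure retry up to max_retries times with a pause."""
    if max_retries is None:
        # Fallback value is an assumption, not the application's default.
        max_retries = int(os.environ.get("LLM_MAX_RETRIES", "3"))
    if delay_seconds is None:
        # Fallback value is an assumption, not the application's default.
        delay_seconds = float(os.environ.get("LLM_RETRY_DELAY_SECONDS", "2"))

    last_error: Optional[Exception] = None
    for attempt in range(max_retries + 1):  # first try plus the retries
        try:
            return request()
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
            if attempt < max_retries:
                time.sleep(delay_seconds)
    raise RuntimeError(
        f"LLM request failed after {max_retries + 1} attempts"
    ) from last_error
```

Running the token check before the first request means oversized inputs are rejected in the UI and never consume a retry budget.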

Deployment

The application is delivered as a single-container image (Python 3.11 slim base, non-privileged user, health check). A docker-compose.yml orchestrates startup, exposes port 7860, and forwards the configuration from the .env file into the container. An extra_hosts entry lets the container reach a vLLM instance running on the host system; alternatively, any other reachable OpenAI-compatible API can be used. The GRADIO_ROOT_PATH variable enables operation behind a reverse proxy under a sub-path.
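
How the port and the reverse-proxy path might reach Gradio can be sketched as follows. The helper `launch_kwargs` and the variable name `SERVER_PORT` are hypothetical; GRADIO_ROOT_PATH and port 7860 come from this document, and `server_name`, `server_port`, and `root_path` are parameters of Gradio's `launch()`.

```python
import os
from typing import Mapping


def launch_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Build keyword arguments for Gradio's launch() from the environment."""
    kwargs = {
        # Listen on all interfaces so the published container port is reachable.
        "server_name": "0.0.0.0",
        # SERVER_PORT is an assumed variable name; 7860 is the documented port.
        "server_port": int(env.get("SERVER_PORT", "7860")),
    }
    root_path = env.get("GRADIO_ROOT_PATH", "").strip()
    if root_path:
        # Only set a sub-path when the app actually runs behind a proxy.
        kwargs["root_path"] = root_path
    return kwargs
```

With a Gradio app object named `demo`, the call would then be `demo.launch(**launch_kwargs())`.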

Technology overview

  • UI: Gradio 6
  • LLM client: openai (Python SDK), used against an OpenAI-compatible API
  • LLM backend: locally operated vLLM instance
  • Token counting: tiktoken (encoding o200k_base)
  • Configuration: python-dotenv
  • Containerization: Docker, Docker Compose
  • Language and runtime: Python 3.11