Architecture
The application is built as a single container with a clear separation of concerns: UI, orchestration, prompt construction, and LLM access are implemented as separate modules. It communicates with an externally provided LLM instance via the OpenAI-compatible API and itself holds no persistent state. Configuration and connection are handled entirely via environment variables.
At a glance
- Single-container application based on Gradio 6
- Layered model: UI · orchestration · prompt builder · LLM client
- External dependency: locally operated vLLM instance with an OpenAI-compatible API
- Two-stage LLM workflow: neutralization (optional) → stylization
- Configuration exclusively via environment variables (.env)
- Session-scoped state in the UI, no data persistence
- Reverse-proxy operation supported via GRADIO_ROOT_PATH
Component overview
The application is organized into five modules:
- UI layer (app.py). Builds the Gradio interface with four tabs (Transformation, Style Controls, Neutralization, History), manages session state (history, counter), and wires up the event handlers.
- Orchestration (app.py:transform_text). Drives the two-stage workflow: when neutralization is active, the neutralization prompt is generated and executed first, and its result is fed into the stylization stage.
- Prompt builder (prompt_builder.py). Generates system and user prompts for both stages. Translates numeric control values into five-tier intensity semantics and combines the active controls into a structured instruction list.
- LLM client (llm_client.py). Wraps the OpenAI client, manages timeout and retry logic, and returns LLM responses to the orchestration layer.
- Data models (models.py). Defines the data classes for controls, control settings, neutralization configuration, and transformation results.
In addition there is a configuration module (config.py), a token counter (token_counter.py), and two JSON files holding the default controls and default presets.
Data flow
flowchart TB
User[Browser user]
subgraph Container[Application container]
UI[Gradio UI<br/>app.py]
Orchestrator[Orchestration<br/>transform_text]
PromptBuilder[Prompt builder]
LLMClient[LLM client<br/>OpenAI-compatible]
Token[Token counter<br/>tiktoken]
State[Session state<br/>history]
Defaults[(default_regler.json<br/>default_presets.json)]
end
vLLM[Locally operated<br/>vLLM instance]
User -->|Input text, control values| UI
UI --> Token
UI --> Orchestrator
Orchestrator -->|Stage 1: optional| PromptBuilder
PromptBuilder -->|System + user prompt| LLMClient
LLMClient -->|HTTP/JSON| vLLM
vLLM -->|Response| LLMClient
LLMClient --> Orchestrator
Orchestrator -->|Stage 2| PromptBuilder
Orchestrator --> State
Orchestrator --> UI
UI -->|Result, comparison| User
Defaults --> UI
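The flow through the orchestrator can be sketched in a few lines of Python. This is an illustrative stand-in for transform_text, not the actual implementation: the function name, the prompt wording, and the injected call_llm callable are all assumptions.

```python
from typing import Callable

# Hypothetical signature: (system_prompt, user_prompt) -> completion text.
LLMCall = Callable[[str, str], str]


def transform_text_sketch(
    text: str,
    style_instructions: list[str],
    neutralize_dimensions: list[str],
    neutralization_enabled: bool,
    call_llm: LLMCall,
) -> dict:
    """Run the optional neutralization stage, then the stylization stage."""
    intermediate = None
    stage_input = text

    # Stage 1 (optional): only when enabled and a dimension is selected.
    if neutralization_enabled and neutralize_dimensions:
        system = "Neutralize the text along: " + ", ".join(neutralize_dimensions)
        intermediate = call_llm(system, text)
        stage_input = intermediate

    # Stage 2: stylize either the original or the neutralized text.
    system = "Rewrite the text. Follow these instructions:\n" + "\n".join(
        f"- {item}" for item in style_instructions
    )
    output = call_llm(system, stage_input)
    return {"input": text, "intermediate": intermediate, "output": output}


def fake_llm(system: str, user: str) -> str:
    """Stand-in for the real LLM client, useful for local experiments."""
    return user.upper() if system.startswith("Neutralize") else f"[styled] {user}"


result = transform_text_sketch("Hello", ["formal tone"], ["emotion"], True, fake_llm)
# result["intermediate"] == "HELLO", result["output"] == "[styled] HELLO"
```

Passing the LLM call in as a parameter keeps the two-stage logic testable without a running vLLM instance.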
Workflow in detail
A transformation proceeds as follows:
- Input. The UI receives the text and the active control settings. The token counter checks the input size against the configured limit.
- Neutralization (optional). If neutralization is enabled and at least one dimension is selected, the prompt builder constructs a system prompt with the chosen dimensions and the strict anti-preamble rules. The LLM client issues the request to the vLLM instance and returns the neutralized intermediate text.
- Stylization. The prompt builder translates the active controls into textual instructions. Numeric values are mapped via _get_intensitaet_details to tier descriptions (slight, moderate, distinct, strong, EXTREME) and corresponding instructional sentences. Polar controls additionally include an avoidance clause for the opposite pole.
- LLM call. The LLM client invokes the chat completions API. On failure, the request is retried according to LLM_MAX_RETRIES with a wait time of LLM_RETRY_DELAY_SECONDS.
- Result handling. Input, intermediate, output, and the chosen configuration are stored in the session history; the UI renders the result as Markdown and updates the comparison selection.
Role of the LLM
The application uses the LLM as a pure text-processing tool for two clearly separated tasks (neutralization, stylization). Neither an embedder nor a reranker is involved, and there is no agentic orchestration in the sense of tool-based autonomy. Determinism and controllability are instead achieved through the explicit step separation, the strict system prompts, and the five-tier intensity semantics.
Robustness and configuration
Robustness rests on three mechanisms: configurable timeouts (LLM_TIMEOUT_SECONDS), automatic retries (LLM_MAX_RETRIES), and upstream input validation through token counting. The full configuration (LLM API base URL, API key, model name, token limits, server port, and reverse-proxy path) is read from environment variables defined in a .env file, which is loaded at runtime via python-dotenv.
Deployment
The application is delivered as a single-container image (Python 3.11 Slim, non-privileged user, health check). A docker-compose.yml orchestrates startup, exposes port 7860, and forwards the configuration from the .env file into the container. The extra_hosts setting lets the container reach a vLLM instance running on the host system; alternatively, any other reachable OpenAI-compatible API can be used. The GRADIO_ROOT_PATH variable enables operation behind a reverse proxy under a sub-path.
Technology overview
- UI: Gradio 6
- LLM client: openai (Python SDK), used against an OpenAI-compatible API
- LLM backend: locally operated vLLM instance
- Token counting: tiktoken (encoding o200k_base)
- Configuration: python-dotenv
- Containerization: Docker, Docker Compose
- Language and runtime: Python 3.11