Architecture¶

TextTool is designed as a single container that provides a web interface and forwards requests to an external LLM endpoint. The application is separated by responsibility into five modules; editing state is held exclusively per browser session in memory and is not persisted.

At a glance¶

Single-container application based on Python and Gradio, configured through environment variables.
Modular separation: UI/orchestration, prompt management, LLM client, export, configuration.
Communication with the LLM via the OpenAI-compatible Chat Completions interface (HTTP).
Session-bound state (source text, result, history) without database or disk persistence.
In-memory generation of export files; downloads are isolated per session.
Synchronous request/response model with timeout and error handling in the client.
Deployment via Docker or docker-compose; operation behind a reverse proxy is supported (configurable ROOT_PATH).

Architecture description¶

Components and layers¶

The application follows a classic three-layer arrangement: a presentation layer in the browser, a server-side application layer in the container, and an external inference service.

The presentation layer is generated by the Gradio framework. It comprises the tool bar, the side-by-side text fields for source text and result, the accordions for the modifier instruction and the custom command, the export controls, and a second tab for the editing history. Session data is held in Gradio state variables and is isolated per browser session.

The application layer in the container consists of five cooperating modules:

app.py orchestrates the interface, binds buttons to processing functions, manages session state (source text, result, history, accordion states), and triggers tool execution, the apply action, restoration, and export.
prompts.py contains the predefined tool prompts, the shared system prompt, and tool-specific generation parameters. A helper function combines the tool prompt, the optional modifier instruction, and the input text into a complete prompt.
llm_integration.py encapsulates the call to the LLM endpoint. It builds chat messages from the system and user prompt, sets temperature, top-p, maximum tokens, and timeout, measures response time, and translates exceptions into user-readable error messages.
export_utils.py generates the four export formats from source text and result. Markdown is rendered into headings, lists, tables, and inline formatting for the DOCX output; all files are assembled in memory.
config.py loads configuration values (endpoint URL, model name, sampling parameters, server port, history limit, ROOT_PATH for reverse-proxy operation) from environment variables using type-safe conversion helpers.

The external LLM endpoint is treated as an internally provided inference service. The only requirement is that the service offers the OpenAI-compatible Chat Completions interface.

Workflow¶

flowchart TB
    User[User in browser]

    subgraph Container[Container]
        UI[Gradio UI<br/>app.py]
        State[Session state<br/>source, result, history]
        Prompts[Prompt module<br/>prompts.py]
        Client[LLM client<br/>llm_integration.py]
        Export[Export module<br/>export_utils.py]
        Config[Configuration<br/>config.py]
    end

    LLM[LLM endpoint<br/>OpenAI-compatible]
    Files[(Export files<br/>txt / md / html / docx)]

    User -->|HTTP| UI
    UI <-->|reads and writes| State
    UI -->|tool selection, text, modifier| Prompts
    Prompts -->|constructed prompt| Client
    Client -->|Chat Completions request| LLM
    LLM -->|response| Client
    Client -->|result, latency| UI
    UI -->|export request| Export
    Export -->|generated in memory| Files
    Files -->|download| User
    Config -.->|parameters| Client
    Config -.->|server, paths| UI

A single editing step proceeds as follows: after a text is entered and a tool is selected, the UI takes the current source text and any modifier instruction and invokes prompt construction. The prompt module combines the predefined tool prompt, the modifier instruction, and the input text into a complete user message and resolves any tool-specific generation parameters. The LLM client builds a chat request with system and user message from this, sends it to the OpenAI-compatible endpoint, and returns the response text together with the measured latency. The UI updates the result field, adds an entry to the session history, and shows a brief status message.

The "Apply" action sets the result as the new source text and adds another history entry, so several tools can be applied to the same text in sequence. From the history tab a previous version can be inspected and restored as the current version; the previously active version is itself stored in the history. On export, the requested content (source, result, or both) is generated in all four formats simultaneously in memory and offered as a file bundle for download.

Robustness and concurrency¶

The LLM client uses a configurable timeout and translates connection, timeout, and authentication errors into structured notices in the interface. Before each tool invocation, the input is checked for presence; for empty source or result text, actions are rejected with an informational message.

The application is designed for parallel operation of multiple sessions. All editing data — source text, result, history, accordion states — is held in Gradio state variables that are kept separate per browser session. Export files are not written to a shared directory but generated in a temporary, session-bound directory so that no file conflicts arise between users. The history is limited per session to a configurable maximum number of entries; older entries are dropped on a FIFO basis.

Configuration and deployment¶

The application is built via a Dockerfile as a Python 3.11 container and can be operated standalone or through docker-compose. A health-check definition probes the availability of the web server; runtime is performed under a non-root user. All runtime parameters — endpoint URL, API key, model name, sampling defaults, maximum tokens, timeout, server port, share switch, reverse-proxy path, and history limit — are set through environment variables. This allows the LLM endpoint or model to be swapped without code changes. Operation behind a reverse proxy (e.g. nginx) is supported through the ROOT_PATH variable.

Technology overview¶

UI framework: Gradio (Blocks API, version 5.48 or later).
LLM client: official openai Python client against the Chat Completions interface.
Interface: HTTP/HTTPS, OpenAI-compatible (/v1/chat/completions).
Export: python-docx for Word documents, the markdown package, and custom routines for Markdown-to-DOCX conversion; HTML is generated with an embedded style sheet.
Containerisation: Docker, docker-compose, Python 3.11-slim base image, health checks, resource limits.
Configuration: environment variables with type-safe parsing; no persistent data storage.