Skip to content

Flex-Kartierung — Architecture

Flex-Kartierung follows a container-based layered model with a clear separation between synchronous operation through the web API and asynchronous processing in a worker pool. Jobs are decoupled through a job queue, so that the operating interface and the pipeline run independently. The delivery of public content is separated from capture: it takes place as a static website against a dedicated directory that can be served by a web server or CDN.

At a glance

  • Three containers in standard operation: application (FastAPI + admin UI + site generator), worker pool, and database, complemented by a Redis container for queue and cache.
  • Layer separation: operating interface, REST API, business logic (services), data access via repository pattern, persistence.
  • Asynchronous processing: all costly steps (crawl, extract, validate, normalize, translate, generate) run as prioritized background jobs.
  • LLM calls are encapsulated through a uniform client and protected by circuit breaker, retry manager, and a worker-spanning rate limit.
  • Configuration as YAML files (categories, prompts, translation strings) and through environment variables (backends, limits, behavior).
  • Database schema with schema migrations via Alembic; each processing stage is modeled as a separately updatable field.
  • Static website export decouples delivery from capture and enables operation without an application backend for end users.

Components and workflow

Layers

The application is structured into four logical layers. The operating interface comprises the server-side rendered admin UI and the generated public website. The API layer provides an admin API for management and a public API for read access. The business logic encapsulates the processing steps in self-contained services: crawler, prompt engine, entity normalizer, translation service, profile generator, site generator, and the robustness components circuit breaker and retry manager. The data-access layer follows the repository pattern and hides SQLAlchemy details from the services. Persistence consists of a PostgreSQL database for all master and history data and a Redis instance for the job queue.

Pipeline and data flow

The processing of a source is modeled as a sequence of decoupled jobs in the job queue. When a URL is registered, a crawl job is created first. The crawler fetches the page, optionally additional linked subpages of the same domain, converts the HTML to Markdown, and writes it to the source in the database. Subsequently, extract jobs are created for each prompt of the corresponding category. A dependency resolver determines the execution order in waves so that fields with dependencies are only run once their source fields are available. Each extract job triggers an LLM request with the Markdown as context; the result is stored as the raw extraction and is then re-evaluated by the LLM in a dedicated validate phase. Fields with entity references run through entity normalization, in which the LLM produces a suggestion from the existing entity inventory; depending on the confidence, the link is established automatically or a review task is created. Translatable fields are then translated to English individually, each with the full profile as the context frame and with protection for proper names, URLs, and technical abbreviations. As soon as all fields of a source are available, the profile generator produces the Markdown document from the category template; on demand, the static website is regenerated.

Diagram

flowchart TB
    User([Operators])
    Web[(University web pages)]
    LLM[(LLM backend<br/>OpenAI-compatible)]

    subgraph Frontend[Operating interface]
        AdminUI[Admin UI<br/>htmx + Alpine.js]
        PublicSite[Static website<br/>HTML + client search]
    end

    subgraph API[FastAPI backend]
        AdminAPI[Admin API]
        PublicAPI[Public API]
        SiteGen[Site generator]
    end

    Queue[(Redis<br/>job queue)]
    DB[(PostgreSQL)]

    subgraph Workers[Worker pool]
        Crawler[Crawler service]
        PromptEngine[Prompt engine<br/>extract + validate]
        Normalizer[Entity normalizer]
        Translator[Translation service]
        Profile[Profile generator]
    end

    subgraph Robust[Robustness layer]
        CB[Circuit breaker]
        Retry[Retry manager]
    end

    User --> AdminUI
    AdminUI --> AdminAPI
    PublicSite --> PublicAPI

    AdminAPI --> Queue
    AdminAPI --> DB
    PublicAPI --> DB
    SiteGen --> DB
    SiteGen --> PublicSite

    Queue --> Crawler
    Crawler --> Web
    Crawler --> DB
    Crawler --> PromptEngine

    PromptEngine --> DB
    PromptEngine --> Normalizer
    Normalizer --> DB
    Normalizer --> Translator
    Translator --> DB
    Translator --> Profile
    Profile --> DB

    PromptEngine -.-> CB
    Normalizer -.-> CB
    Translator -.-> CB
    CB -.-> LLM
    Retry -.-> LLM
    PromptEngine -.-> LLM
    Normalizer -.-> LLM
    Translator -.-> LLM

Explanation

Operators interact exclusively with the admin UI, which writes jobs synchronously to the database and to the job queue. The worker pool pulls jobs from the queue and executes them in the order determined by job priority: crawl jobs have the highest priority, followed by independent extract jobs, the validate phase, dependent extract jobs, entity normalization, translation, and finally profile generation. Each stage writes its result into a separate database field, so that downstream stages are repeatable without re-running previous stages.

All calls to the LLM backend run through a uniform LLM client that enforces a per-worker and global rate limit. Behind it sit the circuit breaker and retry manager: on repeated failures, the circuit breaker opens and blocks further requests for a configurable pause; transient errors are dampened through exponential backoff with jitter and a bounded number of attempts. Non-retryable errors (such as validation errors) are deliberately not retried.

The public website is generated by the site generator from the published profiles, the category data, and the i18n strings, and is written as plain file trees into a mounted directory. It can subsequently be served by a web server or a CDN; backend access is not required for end users. Each language has its own URL structure; the search function operates client-side on a co-delivered JSON index.

Role of the LLM in the pipeline

The LLM is invoked at four pipeline stages: extraction (one request per prompt with the crawled Markdown as context), validation (one request to evaluate the raw result, returning a quality class, a score, and a justification), entity normalization (one request per entity candidate with the existing entity inventory as the reference space), and translation (one request per translatable field). No embedders, vector indexes, or rerankers are used; the pipeline is modeled as a chain of specialized prompts. The system is therefore deliberately tailored to a single LLM endpoint and connectable to arbitrary OpenAI-compatible models.

Concurrency and configuration

The worker pool runs multiple jobs in parallel; the maximum number of concurrent workers and concurrent LLM calls is configurable separately. Within the pool, a semaphore limits the number of concurrent LLM requests, a token bucket the requests per minute, and a minimum interval between requests the load on the backend. Crawler and translation use their own rate limits per domain or globally. All thresholds, time limits, worker counts, crawler options, LLM address, and model data are set via environment variables; application settings use Pydantic validation. Schema changes to the database are managed via Alembic migrations.

Deployment

Standard operation is a Docker Compose composition consisting of an application container (FastAPI with admin UI and site generator), a worker container (background processing), a database container (PostgreSQL), and a cache/queue container (Redis). Application and worker share a common directory for the generated static website; application code, templates, and migrations are mounted read-only.

Technologies

  • Application framework: FastAPI, Uvicorn, Pydantic, Pydantic-Settings.
  • Data access and migrations: SQLAlchemy (asyncio), asyncpg, Alembic, PostgreSQL.
  • Queue and cache: Redis (async client).
  • Crawler: httpx, BeautifulSoup, lxml, html5lib, markdownify, urllib robotparser.
  • Operating interface: Jinja2, htmx, Alpine.js.
  • LLM connection: OpenAI-compatible chat-completions interface via httpx.
  • Site generation: Jinja2, markdown2, static delivery via web server or CDN.
  • Observability: structlog (structured JSON logs), prometheus-client, health endpoint.
  • Operations: Docker, Docker Compose.