Architecture¶

Recherche-Tool is built as a containerised Python application with clearly separated layers: a Gradio-based user interface, an orchestration layer for the research and analysis pipelines, a connector layer for external and institution-internal sources, an LLM layer with two separate model endpoints, and a persistence layer on the file system. The pipeline runs fully asynchronously; several phases are parallelised.

At a glance¶

Container setup with three services: application container, SearXNG metasearch, and a Redis cache for SearXNG.
Layered design covering user interface, orchestration, pipeline, connectors, LLM layer, and persistence.
Asynchronous processing via asyncio with per-domain rate limiting and configurable concurrency limits for the fetch and harvest phases.
Dual-LLM configuration with separately configurable endpoints for primary and harvest models.
External embedder and reranker services accessed through OpenAI-compatible interfaces for the hybrid people search.
Configuration exclusively via environment variables, with default values bundled in a central configuration class.
Persistence per research run as a directory containing report, sources, extracts, and metadata.

Architecture description¶

The application is structured into five logical layers. The user interface accepts queries, manages session state and context documents, and dispatches to the appropriate pipeline path depending on the selected mode. The orchestration layer drives the research through its phases and delegates per phase to LLM calls or connectors. The connectors encapsulate access to individual sources behind a common interface with search and fetch methods. The LLM layer provides two independent clients. The persistence layer stores input documents, intermediate results, and final reports on the file system.

Data flow¶

flowchart TB
    User([User])
    UI[Gradio UI]

    subgraph Orchestration
        FormatAgent[Format agent]
        Planner[Research planner]
        Loop[Search and harvest loop]
        Gap[Gap analysis]
        Contradict[Contradiction check]
        Synth[Synthesis]
    end

    subgraph Connectors
        SearXNG[SearXNG]
        Web[Web scraper]
        Git[GitHub / GitLab]
        Solr[Solr index]
        ES[Elasticsearch]
        ZIS[Directory]
        ZSearch[People search]
        LitAPI[Literature APIs]
        Local[Local files / WebDAV]
    end

    subgraph LLM-layer
        Primary[Primary LLM]
        Harvest[Harvest LLM]
        Embed[Embedder]
        Rerank[Reranker]
    end

    subgraph Analysis-pipeline
        Decompose[Decomposer]
        DAG[DAG executor]
        Tasks[Sub-tasks]
    end

    Storage[(Persistence: reports, sources, extracts)]
    LitChecker[Bibliography checker]

    User --> UI
    UI -->|research modes| FormatAgent
    UI -->|analysis modes| Decompose
    UI -->|bibliographies| LitChecker

    FormatAgent --> Planner
    Planner --> Loop
    Loop --> Gap
    Gap -->|follow-up questions| Loop
    Gap -->|complete| Contradict
    Contradict --> Synth
    Synth --> Storage
    Synth --> UI

    Loop --> SearXNG
    Loop --> Web
    Loop --> Git
    Loop --> Solr
    Loop --> ES
    Loop --> ZIS
    Loop --> ZSearch
    Loop --> Local

    LitChecker --> LitAPI
    LitChecker --> Storage

    Decompose --> DAG
    DAG --> Tasks
    Tasks --> Storage

    FormatAgent -.-> Primary
    Planner -.-> Primary
    Synth -.-> Primary
    Gap -.-> Primary
    Contradict -.-> Primary
    Loop -.-> Harvest
    Tasks -.-> Primary
    Tasks -.-> Harvest
    ZSearch -.-> Embed
    ZSearch -.-> Rerank

A query is first handed to the format agent, which derives an output schema with title, sections, and style guidance from the query, the selected template, and the chat history. The research planner then produces a research plan with concrete research questions, multilingual search terms, URLs to fetch directly, and, where applicable, specific repository or directory queries. Optionally, the plan is shown to the user for confirmation at this point.

In the search-and-harvest loop, requests are run against the connectors per round, results are deduplicated, and the corresponding content is downloaded. From every fetched document, a harvest LLM call extracts relevant facts in a structured form, each linked to a plan question. Negative extracts are filtered. After every round, a gap analysis checks which questions remain open and schedules follow-up queries for another round; if a round yields no new positive extracts, the loop stops.

A contradiction check then runs over the collected extracts. Finally, the synthesis phase produces the report in Markdown from the extracts, the plan, the schema, and the context documents. Throughout the run, progress, token usage, and phase status are tracked and streamed to the user interface.

Role of the AI components¶

The application separates LLM tasks by complexity and concurrency requirements. Format agent, research planner, gap analysis, contradiction check, and synthesis run on the primary model because they require longer context and higher linguistic quality. Per-source fact extraction is highly parallelisable and runs on the harvest model, which is operated with shorter context, lower temperature, and without thinking mode. A semaphore caps the number of concurrent harvest calls.

The hybrid people search relies on two additional AI services. An embedder produces vectors for semantic similarity search in the local directory index. A cross-encoder reranker reorders candidates from full-text and vector search for the final ranking. Both services are accessed through OpenAI-compatible interfaces and can be configured independently of the main research LLM.

The analysis modes use a second pipeline variant. A decomposer breaks a task down into sub-tasks with dependencies and phase assignments. A DAG executor sorts the sub-tasks topologically into layers and runs tasks within a layer in parallel. Each sub-task selects whether to use the primary or the harvest model and whether to enable thinking mode. Checkpoints are persisted between layers, so execution can resume after a stop or a reload.

Concurrency and robustness¶

The pipeline runs fully asynchronously via asyncio. At the connector level, fetches are parallelised; a per-domain rate limiter enforces a minimum interval between requests to the same domain. For the literature APIs, a shared rate limiter coordinates parallel calls. URLs are deduplicated through normalisation, and a blocklist filters out off-topic and low-quality domains.

Model calls are retried with exponential backoff on transient server errors. JSON responses are parsed defensively; truncated or invalid JSON is handled by a repair heuristic, and on failure the call is retried once with an explicit JSON instruction. Running searches can be cancelled at any time via a stop signal, which is checked between phases and triggers a controlled shutdown with status persistence.

Configuration and deployment¶

The application runs as a container and reads its configuration entirely from environment variables. A central configuration class bundles settings for the user interface, pipeline, connectors, LLM endpoints, and document processing with sensible defaults. Per connector, activation, endpoints, authentication, field mappings, and limits can be controlled separately.

The default deployment combines three containers: the application container with the Gradio UI, a SearXNG container with a tailored engine configuration, and a Redis container as a cache for SearXNG. External services such as search engines, literature APIs, the institution-internal directory, Solr and Elasticsearch indices, and the LLM, embedder, and reranker endpoints are accessed over the network. The application creates a persistent data directory for stored research runs; cleanup parameters control the retention period.

A separate synchronisation script builds the local people index: it crawls the institution-internal directory, fetches linked homepages and vita pages, builds a SQLite index with FTS5, and computes vector embeddings for semantic search. Consent is respected throughout, and revocations are reconciled regularly.

Technology overview¶

Language and core frameworks: Python 3.12, Gradio for the user interface, asyncio for concurrency, httpx as the asynchronous HTTP client.
LLM integration: OpenAI Python SDK against OpenAI-compatible endpoints, with thinking mode passed through chat_template_kwargs for vLLM-based backends.
Web and document processing: trafilatura for text extraction from web pages, optional Playwright for heavily JavaScript-driven pages, pdfminer.six, python-docx, python-pptx, and openpyxl for office and PDF documents, python-docx with custom XML extensions for the Word export.
Persistence and indices: SQLite with FTS5 for the people index, NumPy for vector operations, file-system persistence for research runs.
External search and index services: SearXNG metasearch, Solr and Elasticsearch for internal indices, Redis as a SearXNG cache.
External data APIs: GitHub REST API, GitLab REST API, plus arXiv, CrossRef, OpenAlex, Semantic Scholar, DBLP, and OpenLibrary for bibliographic work.
Deployment: Docker with python:3.12-slim as the base image, multi-container topology (application, SearXNG, Redis), configuration exclusively via environment variables.