Features

Recherche-Tool offers ten modes for different task types, ranging from open web research to structured analyses. In every mode, sources are evaluated through several connectors, facts are extracted in a multi-stage procedure, and the results are compiled into a structured report.

Application scenarios

General web research on an open question: The starting point is a question without predefined sources. The application creates a research plan, searches the web with multilingual search terms, extracts relevant facts, and delivers a structured report with clickable source references.
Institution-internal research on people and units: A query concerns staff, departments, or content of the university. The application combines the internal directory, the institution-wide web index, and external web search results, and reconciles statements with verified directory data.
Validation of a bibliography before submission: An existing reference list is to be checked for formal correctness and discoverability. The application parses the entries, queries several academic databases in parallel, fills in missing DOIs, marks field-level deviations, and produces a corrections report.
Literature search for a topic overview: A research question is to be backed up with current literature. The application queries several literature APIs, ranks results by citations, and screens each result for relevance before producing an annotated suggestion list.
Structured decision analysis: Several options are to be compared on the basis of criteria. The application generates an analysis plan composed of sub-tasks, gathers evidence per criterion and option, checks for consistency, and produces a comparison table together with an explanatory discussion.
Virtual peer review of a manuscript: A submitted text is to be reviewed. The application produces a structured review with isolated assessments per criterion and structured comments.

At a glance

Ten modes: web research, institution research, bibliography validation, literature search, concept explainer, virtual peer review, decision analysis, research design, grant proposal, literature review.
Six output templates for research modes: general research, summary, structured overview, comparison analysis, fact check, technical documentation.
Multilingual search with separate search terms per question and language, topic-driven language selection, and a language filter for academic sources.
Import formats: PDF, Word, PowerPoint, Excel, HTML, RTF, Markdown, reStructuredText, plain text, and CSV as context documents; free text and URLs as input.
Export formats: Markdown in the browser; Word export with real hyperlinks and an appendix containing sources, extracts, and metadata.
Quality assurance: URL deduplication, domain blocklist, negative-extract filter, gap analysis across rounds, contradiction detection, person cross-check, reliability rating.
Persistence per research run: report, sources, extracts, plan, search strategy, token usage, and metadata are stored on disk.

Connectors

The research pipeline relies on several connectors that are orchestrated per query and per mode. External connectors are described in more detail below; institution-internal connectors are described only by their function.

External connectors

SearXNG metasearch: Central search layer over a self-hosted SearXNG instance. Returns results from multiple search engines and supports filters for language, time range, and engine selection. The mode for academic literature uses a whitelist of academic engines (Wikipedia, Wikidata, Google Scholar, Semantic Scholar, arXiv, PubMed).
Web scraper: Generic fetcher for arbitrary web pages. Built on httpx with a custom user agent and trafilatura for content extraction. Includes a time-to-live cache, per-domain rate limiting, and an optional Playwright fallback for heavily JavaScript-driven pages.
GitHub: Connector for the GitHub REST API for repository, code, and issue search and for targeted retrieval of README files, releases, commits, and issues of a given project. Uses a personal access token for authentication; rate limits are handled with backoff.
GitLab: Connector for the GitLab REST API of a configurable instance, with the same search and retrieval patterns as the GitHub connector.
Literature APIs: Parallel queries against arXiv, CrossRef, OpenAlex, Semantic Scholar, DBLP, and OpenLibrary for bibliography validation and literature search. A shared rate limiter coordinates the parallel calls; results are scored against the original entry.
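
The shared rate limiter mentioned for the literature APIs can be pictured roughly as follows. This is a minimal sketch, not the application's actual code: the class `SharedRateLimiter`, the functions `query_source` and `query_all`, and the `min_interval` parameter are hypothetical names, and the HTTP request is replaced by a simulated delay so the example runs without network access.

```python
import asyncio
import time


class SharedRateLimiter:
    """Space out call starts by at least `min_interval` seconds across all tasks."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._next_slot = 0.0

    async def wait(self) -> None:
        # The slot bookkeeping contains no await, so it cannot be interleaved
        # with other tasks running on the same event loop.
        now = time.monotonic()
        slot = max(now, self._next_slot)
        self._next_slot = slot + self.min_interval
        if slot > now:
            await asyncio.sleep(slot - now)


async def query_source(name: str, query: str, limiter: SharedRateLimiter) -> dict:
    """Hypothetical per-API query; a real connector would issue an HTTP request."""
    await limiter.wait()
    await asyncio.sleep(0.01)  # simulated network latency
    return {"source": name, "query": query, "hits": []}


async def query_all(query: str, sources: list[str], min_interval: float = 0.05) -> list[dict]:
    """Query all sources concurrently while sharing one limiter."""
    limiter = SharedRateLimiter(min_interval)
    tasks = [query_source(name, query, limiter) for name in sources]
    # gather() returns results in the order of `sources`.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    sources = ["arXiv", "CrossRef", "OpenAlex", "Semantic Scholar", "DBLP", "OpenLibrary"]
    for result in asyncio.run(query_all("graph neural networks", sources)):
        print(result["source"], len(result["hits"]))
```

A single limiter object is passed to every task, so the calls overlap in time but never start closer together than the configured interval.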

Institution-internal connectors

Directory (ZIS): Connector to the institution's internal directory for people, organisational units, and functions. Provides contact data, affiliations, and homepages, and also serves as the reference for cross-checking statements from web sources.
Local people search: Local searchable index built from the directory and from crawled vita and homepage content. Combines full-text search (FTS5), semantic vector search through an embedder, and a cross-encoder reranker.
Institution web index (Solr): Institution-wide web index with a German and an English core for searching the official web content of the university.
Elasticsearch index: Optional connector to a further full-text index, for example a specific website or an editorial system. Field mapping is configurable.
WebDAV: Optional connector to a WebDAV store for internal document sources.
Local files: Connector for uploaded context documents and configurable local directories.
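
The local people search combines a full-text ranking with a vector ranking before reranking. How the two lists are merged is not specified here; one common option is reciprocal rank fusion, sketched below as an assumption. The function name `reciprocal_rank_fusion`, the constant `k = 60`, and the person identifiers are illustrative only, and the cross-encoder reranking step is left out.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge several ranked ID lists into one list sorted by fused score.

    Each list contributes 1 / (k + rank) per item, so items that appear
    near the top of several lists end up first.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a stable result.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from the two retrieval paths.
fulltext_hits = ["p17", "p03", "p42"]  # e.g. from the FTS5 index
vector_hits = ["p03", "p99", "p17"]    # e.g. from the embedding search

fused = reciprocal_rank_fusion([fulltext_hits, vector_hits])
candidates = [doc_id for doc_id, _ in fused]  # would be passed on to the reranker
```

Rank-based fusion avoids having to calibrate full-text scores against cosine similarities, which live on different scales.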

Import and export formats

For input, the application accepts free text, URLs, and uploaded documents as context. Supported document types are PDF (handled via pdfminer with a fallback to a heavyweight extractor), Word (DOCX/DOC), PowerPoint (PPTX/PPT), Excel (XLSX/XLS), HTML, RTF, Markdown, reStructuredText, plain text, and CSV. Document processing strips typical artefacts such as headers, footers, and page numbers and deduplicates repeated paragraphs.
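
The clean-up step can be illustrated with a small sketch. It assumes that paragraphs are separated by blank lines and that page numbers stand on a line of their own; the function `clean_document` and the `PAGE_NUMBER` pattern are hypothetical and much simpler than what a production pipeline would need.

```python
import re

# Lines such as "7", "Page 7", "7/12" or "Page 7 of 12" (assumed formats).
PAGE_NUMBER = re.compile(r"^\s*(?:page\s+)?\d+(?:\s*(?:/|of)\s*\d+)?\s*$", re.IGNORECASE)


def clean_document(text: str) -> str:
    """Drop page-number lines and paragraphs that repeat an earlier one."""
    seen: set[str] = set()
    kept: list[str] = []
    for paragraph in re.split(r"\n\s*\n", text):
        lines = [line for line in paragraph.splitlines() if not PAGE_NUMBER.match(line)]
        body = "\n".join(lines).strip()
        if not body:
            continue
        # Compare paragraphs case-insensitively and ignoring whitespace differences.
        key = " ".join(body.lower().split())
        if key in seen:
            continue
        seen.add(key)
        kept.append(body)
    return "\n\n".join(kept)
```

Because running headers and footers repeat verbatim on every page, the same deduplication pass removes them after their first occurrence. A caveat of this simple pattern is that a legitimate line consisting only of a number would also be dropped.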

On the output side, the report is shown as Markdown in the browser. A Word export produces a structured document with clickable hyperlinks in the body and an appendix containing the source list, per-source extracts, a progress log, and metadata. In addition, every research run is persisted as a directory containing the report, source snapshots, extracts, and metadata.

Quality assurance

Several mechanisms safeguard the quality of the research output and intervene at different points along the pipeline.

URL deduplication and domain blocklist: URLs are normalised before processing (scheme, subdomain, path, tracking parameters) and checked against a blocklist of off-topic, low-quality, or commonly mis-routed domains.
Negative-extract filter: Statements that explicitly express the absence of information ("no information available", "not mentioned") are removed before synthesis so the report is not inflated with descriptions of what is missing.
Iterative gap analysis: After each research round, a dedicated LLM step checks which plan questions are still open and schedules targeted follow-up queries or additional direct URLs for the next round.
Diminishing-returns detection: When a round produces no new positive extracts, the loop stops early to avoid unnecessary model calls.
Reliability rating per extract: Every extracted fact carries a reliability level that feeds into the synthesis and is shown in the extracts panel.
Person cross-check: Web findings that assign a person to a unit of the university are cross-checked against the verified directory; deviations are marked as unverified in the report. A hard name filter prevents fuzzy-match errors in people search.
Contradiction detection: An LLM-based contradiction check runs over the collected extracts; identified conflicts are flagged in the report.
Plan confirmation: The generated plan can optionally be inspected and approved before the search and extraction phase begins.
Token and call statistics: Per run, the number of model calls and the token usage per model are recorded and reported.
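
Two of the filters above, URL normalisation with a blocklist and the negative-extract filter, can be sketched with the standard library. All names here (`normalise_url`, `filter_urls`, `drop_negative_extracts`, `TRACKING_PARAMS`, `BLOCKLIST`) and the concrete rules are assumptions for illustration; the phrase list reuses the two examples given above, and the real filter may work differently.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PARAMS = {"fbclid", "gclid"}  # assumed; utm_* is handled by prefix
BLOCKLIST = {"spam.test"}              # hypothetical blocked domain
NEGATIVE_MARKERS = ("no information available", "not mentioned")


def normalise_url(url: str) -> str:
    """Unify scheme, strip 'www.', trailing slash, fragment and tracking parameters."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()  # note: this also drops any port
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_") and key.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit(("https", host, path, urlencode(sorted(query)), ""))


def filter_urls(urls: list[str]) -> list[str]:
    """Return normalised URLs without duplicates and without blocked domains."""
    seen: set[str] = set()
    kept: list[str] = []
    for url in urls:
        normalised = normalise_url(url)
        host = urlsplit(normalised).hostname or ""
        blocked = host in BLOCKLIST or any(host.endswith("." + d) for d in BLOCKLIST)
        if blocked or normalised in seen:
            continue
        seen.add(normalised)
        kept.append(normalised)
    return kept


def drop_negative_extracts(extracts: list[str]) -> list[str]:
    """Remove extracts that only state that information is missing."""
    return [
        text for text in extracts
        if not any(marker in text.lower() for marker in NEGATIVE_MARKERS)
    ]
```

Normalising before deduplication matters because the same page often appears with different schemes, `www.` prefixes, or campaign parameters in different search results.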

Further functions

Discuss the request: Before starting a research run, the request can be refined in a guided chat; the system then proposes a concrete formulation for the research task.
Output templates: Six predefined templates for research reports (general research, summary, structured overview, comparison analysis, fact check, technical documentation) control the structure and style of the report.
Mode filters: Two switches restrict the search to institution-internal sources or to academic search engines.
Research history: Earlier runs are listed in the sidebar and can be reopened.
Stop and progress display: Running searches can be stopped at any time; progress and current phase are updated during the run.
Analysis pipeline for structured modes: The analysis modes are executed by a DAG runner that parallelises sub-tasks, persists intermediate results as checkpoints, and resumes after a stop or a reload.
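
The behaviour of such a DAG runner can be approximated in a few lines. The sketch below is an assumption about the general technique, not the application's implementation: `run_dag`, the JSON checkpoint files, and the task names in the usage example are hypothetical, and task results must be JSON-serialisable.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter
from pathlib import Path
from typing import Callable

Task = Callable[[dict], object]  # receives the results of its dependencies


def run_dag(tasks: dict[str, Task], deps: dict[str, set[str]], checkpoint_dir: Path) -> dict:
    """Run tasks in dependency order, in parallel where possible, with checkpoints."""
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    results: dict[str, object] = {}
    sorter = TopologicalSorter({name: deps.get(name, set()) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle

    def execute(name: str) -> object:
        checkpoint = checkpoint_dir / f"{name}.json"
        if checkpoint.exists():
            # Resume: reuse the stored result instead of recomputing it.
            return json.loads(checkpoint.read_text(encoding="utf-8"))
        inputs = {dep: results[dep] for dep in deps.get(name, set())}
        value = tasks[name](inputs)
        checkpoint.write_text(json.dumps(value), encoding="utf-8")
        return value

    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            ready = sorter.get_ready()  # all tasks whose dependencies are done
            for name, value in zip(ready, pool.map(execute, ready)):
                results[name] = value
                sorter.done(name)
    return results


if __name__ == "__main__":
    import tempfile

    tasks = {
        "evidence_a": lambda _: 2,
        "evidence_b": lambda _: 3,
        "comparison_table": lambda inputs: inputs["evidence_a"] + inputs["evidence_b"],
    }
    deps = {"comparison_table": {"evidence_a", "evidence_b"}}
    with tempfile.TemporaryDirectory() as tmp:
        print(run_dag(tasks, deps, Path(tmp))["comparison_table"])
```

Calling `run_dag` again with the same checkpoint directory skips every task that already has a stored result, which is what allows a run to continue after a stop or a reload.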