Recherche-Tool
Recherche-Tool is an application for autonomous, multi-stage research and analysis backed by large language models. A query is not handed to a language model in a single step but runs through an agentic pipeline of planning, search across multiple connectors, extraction, gap analysis, and synthesis. Beyond open web research, dedicated modes cover institution-internal queries, bibliographic work, and structured analyses.
At a glance
- Produce research reports for open questions: every stage runs automatically, from the initial query to a structured report with embedded source links.
- Run institution-internal research: the university's people, organisational units, and web content are evaluated together and cross-checked against the verified directory.
- Validate bibliographies: existing reference lists are checked entry by entry against multiple academic databases, missing DOIs are added, and deviations are flagged.
- Find literature for a research question: results are weighted by citations and screened for relevance.
- Produce structured analyses: concept explainer, virtual peer review, decision analysis, research design, and literature review are available as dedicated modes.
- Include local documents: uploaded PDF, Word, PowerPoint, and Excel files feed into the research as context.
- Export reports: as Markdown directly in the browser, or as Word documents with clickable hyperlinks and an appendix.
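The entry-by-entry bibliography check can be illustrated with a minimal offline sketch. It assumes that a candidate record has already been retrieved from one of the academic databases; the `Reference` record, the helper names, and the 0.9 title-similarity threshold are hypothetical and not taken from Recherche-Tool. DOIs are compared after stripping `https://doi.org/` or `doi:` prefixes and lowercasing, since DOI names are case-insensitive.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Reference:
    """Hypothetical bibliography record; the fields are illustrative."""
    title: str
    year: Optional[int] = None
    doi: Optional[str] = None


# Matches "https://doi.org/", "http://dx.doi.org/" or "doi:" at the start of a DOI string.
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def normalize_doi(doi: Optional[str]) -> Optional[str]:
    """Strip URL or 'doi:' prefixes and lowercase (DOI names are case-insensitive)."""
    if not doi:
        return None
    return _DOI_PREFIX.sub("", doi.strip()).lower()


def _norm_title(title: str) -> str:
    """Case-fold and collapse whitespace before comparing titles."""
    return " ".join(title.casefold().split())


def check_entry(entry: Reference, found: Reference, threshold: float = 0.9) -> tuple[Reference, list[str]]:
    """Compare a user entry with a retrieved record; return a completed copy and flags."""
    similarity = SequenceMatcher(None, _norm_title(entry.title), _norm_title(found.title)).ratio()
    if similarity < threshold:
        # Probably a different work: flag it and copy nothing from the record.
        return entry, ["title mismatch"]
    flags: list[str] = []
    doi, found_doi = normalize_doi(entry.doi), normalize_doi(found.doi)
    if doi is None and found_doi is not None:
        doi = found_doi
        flags.append("DOI added")
    elif doi and found_doi and doi != found_doi:
        flags.append("DOI differs")
    if entry.year and found.year and entry.year != found.year:
        flags.append("year differs")
    return Reference(entry.title, entry.year or found.year, doi), flags


if __name__ == "__main__":
    entry = Reference("Attention Is All You Need", 2017)
    record = Reference("Attention is all you need", 2017, "https://doi.org/10.48550/arXiv.1706.03762")
    fixed, flags = check_entry(entry, record)
    print(fixed.doi, flags)  # 10.48550/arxiv.1706.03762 ['DOI added']
```

Copying metadata only after the title check passes keeps a wrong database hit from silently "completing" an entry with another work's DOI.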
Highlights
In contrast to a direct LLM prompt or a plain search-engine query, Recherche-Tool delivers a reproducible, multi-stage run with traceable sources. The following properties distinguish the application from simpler alternatives and have a direct effect on the quality of the resulting reports:
- Agentic pipeline with separated phases: Format agent, research plan, iterative search-and-fetch loop, fact extraction, gap analysis, and synthesis are kept apart. Each phase has its own prompt schema, so errors in one stage do not propagate into the next.
- Dual-LLM architecture: Complex tasks such as planning and synthesis run on a primary model; parallelisable fact extraction runs on a separately configurable harvest model. Both can be assigned independent endpoints, context sizes, and concurrency limits.
- Coverage of many sources: SearXNG metasearch, a generic web scraper, GitHub, GitLab, a Solr index, an Elasticsearch index, the institution-internal directory of people and units, WebDAV, local files, and academic literature APIs (arXiv, CrossRef, OpenAlex, Semantic Scholar, DBLP, OpenLibrary).
- Multilingual search: Search terms are produced per question and per language; the pipeline picks topic-relevant languages and avoids cross-language duplicates.
- Iterative gap analysis: After each research round, a dedicated LLM step checks which plan questions remain unanswered and schedules targeted follow-up queries or additional URLs for the next round.
- Hybrid people search with embedder and reranker: For directory queries, the application combines full-text search, semantic vector similarity, and a cross-encoder reranker, complemented by a hard name filter that prevents fuzzy-match errors.
- Cross-checking of person-related claims: Web findings that assign a person to a unit of the university are cross-checked against the verified directory; deviations are marked as unverified in the report.
- Contradiction detection: Inconsistent statements across the extracted facts are flagged in the report so they do not disappear into a smoothed-out synthesis.
- Plan confirmation before research: The generated plan, including the search strategy, can optionally be inspected and approved before the costly search and extraction phase begins.
- Reproducible research runs: Every run is persisted on disk together with the report, sources, extracts, plan, search strategy, and metadata, and can be revisited later.
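The separation of phases and the iterative gap analysis can be sketched as plain control flow. In this minimal illustration each phase is a separate callable standing in for its own LLM prompt; the names `research`, `RunState`, and `max_rounds` are hypothetical, and for simplicity the gap analysis re-queues unanswered questions instead of generating new follow-up queries or URLs.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RunState:
    """Hypothetical state passed between the phases of one research run."""
    questions: list[str]
    facts: dict[str, list[str]] = field(default_factory=dict)
    rounds: int = 0


def research(
    query: str,
    plan: Callable[[str], list[str]],                 # planning prompt: query -> plan questions
    search: Callable[[str], list[str]],               # search-and-fetch: question -> document texts
    extract: Callable[[str, str], list[str]],         # fact extraction per (question, document)
    open_questions: Callable[[RunState], list[str]],  # gap analysis: what is still unanswered?
    synthesize: Callable[[str, RunState], str],       # synthesis prompt
    max_rounds: int = 3,
) -> str:
    state = RunState(questions=plan(query))
    pending = list(state.questions)
    # Repeat search and extraction until the gap analysis reports nothing open
    # or the round budget is exhausted.
    while pending and state.rounds < max_rounds:
        state.rounds += 1
        for question in pending:
            for document in search(question):
                state.facts.setdefault(question, []).extend(extract(question, document))
        pending = open_questions(state)
    return synthesize(query, state)


if __name__ == "__main__":
    corpus = {"Who maintains X?": ["doc: X is maintained by team A."], "Since when?": []}
    report = research(
        "Tell me about X",
        plan=lambda q: list(corpus),
        search=lambda q: corpus.get(q, []),
        extract=lambda q, doc: [doc.removeprefix("doc: ")],
        open_questions=lambda s: [q for q in s.questions if not s.facts.get(q)],
        synthesize=lambda q, s: f"{q}: {s.facts} (rounds: {s.rounds})",
    )
    print(report)
```

Because "Since when?" never yields a document, the loop ends at the round budget rather than searching indefinitely; a real gap-analysis step would also decide when further rounds are no longer worth their cost.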
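The independent concurrency limits of the dual-LLM architecture can be modelled with one `asyncio.Semaphore` per model. This sketch assumes an async client; `ModelConfig`, `Model`, the example endpoints, the context sizes, and the limits of 2 and 4 are hypothetical, and the HTTP request is replaced by `asyncio.sleep`.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical per-model settings."""
    endpoint: str
    context_tokens: int  # carried along for completeness; not used in this sketch
    max_concurrency: int


class Model:
    """Wraps one LLM endpoint and caps its number of in-flight requests."""

    def __init__(self, config: ModelConfig) -> None:
        self.config = config
        self._slots = asyncio.Semaphore(config.max_concurrency)
        self.in_flight = 0
        self.peak = 0  # highest number of simultaneous requests observed

    async def complete(self, prompt: str) -> str:
        async with self._slots:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            try:
                await asyncio.sleep(0.01)  # stands in for the HTTP call to self.config.endpoint
                return f"[{self.config.endpoint}] {prompt}"
            finally:
                self.in_flight -= 1


async def harvest(model: Model, documents: list[str]) -> list[str]:
    # Extraction is independent per document, so all calls are started at once;
    # the model's semaphore decides how many actually run concurrently.
    return list(await asyncio.gather(*(model.complete(f"Extract facts: {d}") for d in documents)))


async def main() -> None:
    # Models are created inside the running event loop.
    primary = Model(ModelConfig("http://primary.example/v1", 32_768, 2))
    harvester = Model(ModelConfig("http://harvest.example/v1", 8_192, 4))
    plan = await primary.complete("Plan research for: example query")
    facts = await harvest(harvester, [f"document {i}" for i in range(10)])
    print(plan)
    print(len(facts), "extracts, peak concurrency", harvester.peak)


if __name__ == "__main__":
    asyncio.run(main())
```

Running it reports 10 extracts with a peak concurrency of 4 on the harvest model, while the primary model keeps its own, separate limit.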