Skip to content

Features

CodeDocumentation covers the entire path from source code submission to the downloaded Markdown bundle. Three input sources are available; the subsequent analysis captures languages, frameworks, endpoints, configuration keys, and dependencies, and produces six coherent documents along with an accompanying generation report.

Application scenarios

  • Initial documentation of an existing project — A previously undocumented codebase receives a complete Markdown baseline with architecture overview, API reference, and installation guide, which can then be refined manually.
  • Standardisation across multiple projects — Several projects receive uniformly structured documentation with the same file layout, the same diagram convention, and the same tone.
  • Onboarding of new staff members — Architecture and API overviews serve as an entry point into an unfamiliar codebase, without the team having to write a complete introductory documentation from scratch.
  • Periodic regeneration as a baseline — After significant code changes, the generated version is regenerated and compared with the manually maintained version to identify outdated sections.
  • Migration of a GitLab codebase — A repository is loaded directly from GitLab without a local checkout, analysed, and converted into a searchable Markdown document set.

At a glance

  • Three input sources: local directory, ZIP upload, GitLab API
  • Deep support for four frameworks; generic fallback for other Python and PHP codebases
  • Six output documents plus a generation report, individually selectable
  • Two separately configurable LLM endpoints with optional thinking mode
  • Test files excluded by default, with opt-in toggle
  • Preview of individual documents in the interface, both rendered and as source
  • ZIP bundle as a single download with a date-stamped filename

Input sources and authentication

Three sources are accessed via a shared source loader component. The source is switched in the interface; input fields adapt to the selection.

  • Local directory — A project that has already been checked out is analysed directly from the file system. Suitable for codebases that are already available on the same system.
  • ZIP upload — A ZIP file is extracted into a temporary working directory. If the archive contains a single root directory, that directory is automatically used as the project root.
  • GitLab repository — A project is loaded via the GitLab REST API, both from gitlab.com and from self-hosted instances. Input is provided as a path (group/project) or as a numeric ID; branch or tag are optional. The personal access token with the read_repository scope remains exclusively in the session state, is not persisted, and can be cleared from the interface at any time.

Language model integration

CodeDocumentation connects to two OpenAI-compatible endpoints that are configured separately.

  • Fast model — Applied to individual files in parallel during the analysis phase, typically with thinking mode disabled.
  • Thinking model — Used in the generation phase for the prose sections and the project characterisation.
  • Connection test — Both endpoints can be tested individually before a run, in order to detect configuration and authentication errors early.
  • Thinking extensions — The non-standard enable_thinking parameter is forwarded to the OpenAI interface via extra_body. Endpoints without support for the parameter ignore or reject it; in that case, the mode can be disabled via a checkbox.

Captured structures

The analysis distinguishes between deterministic inspection and model-based evaluation.

  • Languages and frameworks — Languages are weighted by file extension and lines of code; frameworks are detected via dependency files and source code signals.
  • Endpoints and routes — FastAPI, Flask, Laravel, and Symfony are evaluated by dedicated extractors, including path parameters, methods, and tags. For other codebases, a generic regex-based extractor provides baseline coverage.
  • Configuration keys — Environment variables from .env files, configuration files (e.g. config/*.yml), and Dockerfile ENV instructions are captured.
  • Dependenciescomposer.json/composer.lock, requirements.txt, pyproject.toml, and Poetry manifests are evaluated.
  • Build and runtime environment — Dockerfile (base image, ports, entrypoint, env), Docker Compose services, Makefile targets, and CI configurations for GitLab CI and GitHub Actions are parsed.
  • Per file — Purpose, public classes and functions, notable properties, and an importance score are determined with model support.

Generated documents

Each run produces up to six documents plus a generation report.

  • README — Overview, description, key features, quick start, tech stack, and links to the other documents.
  • Architecture — Overview, Mermaid component diagram, module structure as a table, external dependencies.
  • API reference — Endpoint catalog grouped by resource, with method, path, parameters, and schemas where they can be derived from the code.
  • Configuration — Table of environment variables, list of configuration files, Dockerfile variables.
  • Installation — Prerequisites, installation steps, Docker variant, CI overview.
  • Highlights — Notable technology choices, unusual implementations, and a compact set of project metrics.
  • Generation report — Runtime, model calls, and token usage per model role, detected languages and frameworks, recommendations for manual review.

Individual documents can be deselected before a run.

Quality assurance mechanisms

  • Combination of deterministic and model-based sources — Endpoints from framework-specific extractors and those from model analyses are merged and deduplicated by method and path.
  • Importance scoring — Each file is assigned a score that drives the ordering in the generated documents and the selection of files used for project characterisation.
  • Package-level aggregation — File analyses are condensed into package overviews with file count, lines of code, public API, and endpoints per package.
  • JSON robustness — Model responses are parsed even when wrapped in Markdown code fences or embedded in surrounding text; on failure, an empty result is returned rather than passing on malformed data.
  • Thinking block handling<think> sections from thinking models are removed before further processing, including truncated, incomplete blocks.
  • Retry logic — LLM calls are retried up to three times on failure with exponential backoff; token usage is tracked separately per model role.
  • Bounded concurrency — The number of concurrent model calls during the analysis phase is bounded by a semaphore and can be adjusted in the interface.
  • Exclusions and filters — Generic exclusion patterns (e.g. virtual environments, build artefacts, cache directories) and test files are skipped by default; additional glob patterns can be specified. .gitignore entries are evaluated as well.

Additional features

  • Preview in the interface — Generated files can be viewed as rendered Markdown or as source, without downloading the ZIP first.
  • Date-stamped ZIP bundle — The download contains all enabled documents and the report under a date-stamped filename.
  • Configurable bind address and port — Bind address, port, and reverse proxy path prefix are set via environment variables.