
Code Analyzer

Code Analyzer is an application for the automated analysis and documentation of source code projects in Java, PHP, and Python. It combines static structural analysis with a multi-stage, role-based LLM pipeline. The pipeline examines each source file from four specialised perspectives (business logic, technical aspects, interfaces, issues) and then consolidates the per-file results into a system-wide analysis from which a report is generated. Intermediate results are persisted as YAML and thus form a traceable, auditable layer between the source code and its evaluation.
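The per-file stage can be pictured as four prompted calls whose JSON answers are checked and merged into one record per file. The following Python sketch illustrates that flow under stated assumptions. The role table follows the description in this document. The JSON key names (`capabilities`, `components`, `endpoints`, `findings`), the prompt wording, and the helpers `build_prompt`, `analyse_file`, and `fake_llm` are hypothetical. The LLM is replaced by an offline stub so the example runs without an endpoint.

```python
import json
from typing import Callable

# Role per perspective, as described above (Architect x3, Security Engineer x1).
ROLES = {
    "business_logic": "Senior Software Architect",
    "technical_aspects": "Senior Software Architect",
    "interfaces": "Senior Software Architect",
    "issues": "Senior Security Engineer",
}

# Hypothetical top-level key that each perspective's JSON answer must contain.
REQUIRED_KEY = {
    "business_logic": "capabilities",
    "technical_aspects": "components",
    "interfaces": "endpoints",
    "issues": "findings",
}


def build_prompt(perspective: str, path: str, source: str) -> str:
    """Compose a focused prompt for one perspective (illustrative wording)."""
    return (
        f"You are a {ROLES[perspective]}. Analyse {path} from the "
        f"'{perspective}' perspective. Reply with JSON only, containing "
        f"the key '{REQUIRED_KEY[perspective]}'.\n\n{source}"
    )


def analyse_file(path: str, source: str, llm: Callable[[str], str]) -> dict:
    """Make one LLM call per perspective, validate each answer, and merge them."""
    record: dict = {"file": path}
    for perspective, key in REQUIRED_KEY.items():
        answer = json.loads(llm(build_prompt(perspective, path, source)))
        if not isinstance(answer, dict) or key not in answer:
            raise ValueError(f"{perspective}: answer lacks required key '{key}'")
        record[perspective] = answer
    return record  # Code Analyzer persists such per-file results as YAML


def fake_llm(prompt: str) -> str:
    """Offline stand-in for an LLM: answers with the key the prompt asks for."""
    for key in REQUIRED_KEY.values():
        if f"'{key}'" in prompt:
            return json.dumps({key: []})
    return "{}"


record = analyse_file("OrderService.java", "class OrderService {}", fake_llm)
print(sorted(record))
```

Checking for a single required key per role is the simplest form of an enforced output structure. A stricter variant could validate each answer against a full JSON Schema instead.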

At a glance

  • Systematically capture and document complete code bases in Java, PHP, or Python without manually reviewing each class.
  • Extract business functions, capabilities, and critical methods per file in a structured form and consolidate them at the domain level.
  • Automatically catalogue the API landscape (REST endpoints, SOAP services) and visualise it as a hierarchy.
  • Review and prioritise security and quality findings by severity and category.
  • Derive and assess the architectural style, layer distribution, and module structure of a project.
  • Generate a complete analysis report covering everything from a management summary to concrete recommendations.
  • Export capabilities and issues as CSV for further processing.
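Sorting findings by severity and writing them to CSV needs only the Python standard library. The sketch below assumes a four-level severity scale (`critical`, `high`, `medium`, `low`) and the column names `file`, `severity`, `category`, and `description`. Both the scale and the column names are illustrative and are not taken from Code Analyzer's actual export format.

```python
import csv
import io

# Assumed severity scale, most severe first; unknown levels sort last.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}
FIELDS = ["file", "severity", "category", "description"]


def issues_to_csv(issues: list[dict]) -> str:
    """Rank findings by severity, then category, and render them as CSV text."""
    ranked = sorted(
        issues,
        key=lambda issue: (
            SEVERITY_ORDER.get(issue["severity"], len(SEVERITY_ORDER)),
            issue["category"],
        ),
    )
    buffer = io.StringIO()
    # extrasaction="ignore" drops any additional keys a finding may carry.
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(ranked)
    return buffer.getvalue()


sample = [
    {"file": "util.py", "severity": "low", "category": "style",
     "description": "line too long"},
    {"file": "Login.php", "severity": "high", "category": "security",
     "description": "SQL query built by string concatenation"},
    {"file": "Order.java", "severity": "medium", "category": "quality",
     "description": "empty catch block"},
]
print(issues_to_csv(sample))
```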

Highlights

Compared with prompting an LLM directly or running a simple script, the application produces results that are reproducible and can be aggregated. This is because it combines rule-based static preprocessing, staged LLM calls, and persistent intermediate states: once a result is stored, later evaluations reuse it instead of querying the LLM again.
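The rule-based preprocessing can be illustrated with a small layer classifier. It checks framework annotations first, then falls back to path keywords, and uses the resulting distribution as a hint for the architectural style. The keyword lists, the regular expressions, and the `layered` heuristic below are hypothetical examples. The actual rules used by Code Analyzer are not documented here.

```python
import re
from collections import Counter

# Hypothetical path/name keywords per architectural layer.
LAYER_KEYWORDS = {
    "presentation": ["controller", "view", "resource", "endpoint"],
    "service": ["service", "usecase", "manager"],
    "persistence": ["repository", "dao", "entity", "model"],
}

# Hypothetical annotation/decorator patterns (Spring, JAX-RS, Flask) per layer.
LAYER_PATTERNS = {
    "presentation": re.compile(r"@(RestController|Controller|Path)\b|@app\.route\("),
    "persistence": re.compile(r"@(Repository|Entity)\b"),
}


def classify(path: str, source: str) -> str:
    """Assign one file to a layer: annotations first, then path keywords."""
    for layer, pattern in LAYER_PATTERNS.items():
        if pattern.search(source):
            return layer
    lowered = path.lower()
    for layer, words in LAYER_KEYWORDS.items():
        if any(word in lowered for word in words):
            return layer
    return "unclassified"


def layer_distribution(files: dict[str, str]) -> Counter:
    """Count files per detected layer."""
    return Counter(classify(path, source) for path, source in files.items())


def guess_style(distribution: Counter) -> str:
    """Hypothetical heuristic: all three core layers present means 'layered'."""
    core = {"presentation", "service", "persistence"}
    present = {layer for layer, count in distribution.items() if count > 0}
    return "layered" if core <= present else "undetermined"


files = {
    "src/web/OrderController.java": "@RestController\nclass OrderController {}",
    "src/service/OrderService.java": "class OrderService {}",
    "src/repo/OrderRepository.java": "@Repository\ninterface OrderRepository {}",
    "app/util/strings.py": "def pad(s): return s",
}
distribution = layer_distribution(files)
print(distribution, guess_style(distribution))
```

Because these rules are deterministic, running them twice on the same code base yields the same baseline. The LLM evaluation can then be compared against that baseline.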

  • Multi-stage LLM pipeline with role specialisation — Each source file is processed by four separate LLM calls with dedicated roles (Senior Software Architect for business logic, technical aspects, and interfaces; Senior Security Engineer for issues). Each role receives a focused prompt and an enforced JSON structure, increasing the depth and consistency of the results.
  • Aggregated system analysis in seven steps — Building on the per-file YAMLs, a second pipeline produces a project-wide evaluation in seven stages (overview, architecture, business domains, interfaces, quality, modernisation, executive summary) and consolidates them into a coherent report.
  • Separation of capture and evaluation — Code Scanner and Code Analyzer are separate applications that communicate exclusively through the YAML directory. Once a code base has been captured, it can be evaluated, exported, or re-reported any number of times without further LLM calls.
  • Static structural analysis as a baseline — Architectural style, layers, and build modules are detected by rules (regular expressions and keyword lists) before any LLM is invoked. This reduces hallucinations and provides an objective reference alongside the LLM evaluation.
  • YAML as an auditable intermediate layer — All LLM responses are stored per file as structured YAML. Findings are therefore traceable, version-controllable, diff-friendly, and usable independently of the evaluation frontend.
  • Support for three language families and any OpenAI-compatible LLM endpoint — Processes Java, PHP, and Python projects from the file system and talks to any OpenAI-compatible endpoint, whether self-hosted (Ollama, vLLM, LM Studio) or cloud-hosted (the OpenAI API).
  • Asynchronous processing with configurable parallelism — File analyses run in parallel, limited by a semaphore (default: 3 concurrent analyses). The limit can be raised for capable LLM servers to increase throughput or lowered to avoid overloading small local endpoints.
  • Validation and repair steps — A dedicated validation view checks the YAML inventory for completeness, and a CLI tool re-analyses only the files whose analysis failed.
  • Configurable LLM operation — Endpoint, model, and API key can be set via environment variables or in the UI, so local and cloud-hosted models can be used interchangeably.
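Bounded parallelism of the kind described in the Highlights maps directly onto `asyncio.Semaphore`. The sketch below limits concurrent analyses to three by default. `analyse_all` and the `fake_analyse` stub are hypothetical names; the stub sleeps instead of calling an LLM. The stub also records the peak number of concurrent calls, so the limit can be observed.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


async def analyse_all(
    paths: Iterable[str],
    analyse: Callable[[str], Awaitable[str]],
    max_parallel: int = 3,
) -> list[str]:
    """Run analyse() for every path, with at most max_parallel calls in flight."""
    semaphore = asyncio.Semaphore(max_parallel)

    async def guarded(path: str) -> str:
        async with semaphore:
            return await analyse(path)

    # gather() returns results in the order of the input paths.
    return await asyncio.gather(*(guarded(p) for p in paths))


async def demo(n_files: int = 10) -> tuple[list[str], int]:
    """Analyse n_files dummy files and report the peak concurrency observed."""
    active = peak = 0

    async def fake_analyse(path: str) -> str:
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # stands in for the LLM request
        active -= 1
        return f"{path}: done"

    files = [f"File{i}.java" for i in range(n_files)]
    results = await analyse_all(files, fake_analyse)
    return results, peak


results, peak = asyncio.run(demo())
print(len(results), peak)
```

A semaphore lets every task start immediately while throttling only the expensive section. This keeps the code simpler than a hand-written worker pool.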
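A completeness check like the one the validation view performs amounts to comparing the set of source files with the set of stored results. This sketch assumes a hypothetical layout in which `src/a/B.java` is stored as `<yaml_root>/src/a/B.java.yaml`. It treats a missing or empty YAML file as a failed analysis. The returned paths are the candidates a repair run would re-analyse. `missing_results` and `demo` are illustrative names, not part of the tool.

```python
import tempfile
from pathlib import Path

SOURCE_SUFFIXES = {".java", ".php", ".py"}


def missing_results(project_root: Path, yaml_root: Path) -> list[Path]:
    """List source files (relative to project_root) without a non-empty result YAML."""
    missing = []
    for source in sorted(project_root.rglob("*")):
        if source.is_file() and source.suffix in SOURCE_SUFFIXES:
            rel = source.relative_to(project_root)
            # Hypothetical naming scheme: <yaml_root>/<relative path>.yaml
            result = yaml_root / rel.parent / (rel.name + ".yaml")
            if not result.is_file() or result.stat().st_size == 0:
                missing.append(rel)
    return missing


def demo() -> list[Path]:
    """Build a throwaway project with one good, one missing, and one empty result."""
    with tempfile.TemporaryDirectory() as tmp:
        project, results = Path(tmp, "project"), Path(tmp, "yaml")
        for name in ["src/A.java", "src/b.py", "src/c.php", "README.md"]:
            file = project / name
            file.parent.mkdir(parents=True, exist_ok=True)
            file.write_text("// source\n")
        (results / "src").mkdir(parents=True)
        (results / "src/A.java.yaml").write_text("file: src/A.java\n")
        (results / "src/c.php.yaml").write_text("")  # empty: analysis failed
        return missing_results(project, results)


print(demo())
```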