Code Analyzer

Code Analyzer is an application for the automated analysis and documentation of source code projects in Java, PHP, and Python. It combines static structural analysis with a multi-stage, role-based LLM pipeline that examines each source file from four specialised perspectives (business logic, technical aspects, interfaces, issues) and consolidates the aggregated results into a system-wide deep analysis with report generation. Intermediate results are persisted as YAML and thus form a traceable, auditable layer between source code and evaluation.

At a glance

Systematically capture and document complete code bases in Java, PHP, or Python without manually reviewing each class.
Extract business functions, capabilities, and critical methods per file in a structured form and consolidate them at the domain level.
Automatically catalogue the API landscape (REST endpoints, SOAP services) and visualise it as a hierarchy.
Review and prioritise security and quality findings by severity and category.
Derive and assess the architectural style, layer distribution, and module structure of a project.
Generate a complete analysis report covering everything from a management summary to concrete recommendations.
Export capabilities and issues as CSV for further processing.
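
As an illustration of the last two points, prioritising findings and exporting them could look roughly like the sketch below. The field names (`file`, `severity`, `category`, `title`), the severity levels, and the function `issues_to_csv` are assumptions made for this example; they are not taken from the application itself.

```python
import csv
import io

# Hypothetical severity ranking; the application's actual levels may differ.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def issues_to_csv(issues):
    """Sort issues by severity, then category, and return them as CSV text."""
    ordered = sorted(
        issues,
        key=lambda issue: (
            # Unknown severities sort after all known ones.
            SEVERITY_ORDER.get(issue["severity"], len(SEVERITY_ORDER)),
            issue["category"],
        ),
    )
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["file", "severity", "category", "title"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(ordered)
    return buffer.getvalue()


# Made-up sample findings, one per supported language.
issues = [
    {"file": "OrderService.java", "severity": "low",
     "category": "quality", "title": "Unused import"},
    {"file": "LoginController.php", "severity": "critical",
     "category": "security", "title": "SQL built by string concatenation"},
    {"file": "report.py", "severity": "high",
     "category": "quality", "title": "Bare except clause"},
]
csv_text = issues_to_csv(issues)
print(csv_text)
```

The resulting text can be written to a file or opened in a spreadsheet; the sort key puts the most severe findings first.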

Highlights

Compared with a direct prompt to an LLM or a simple script invocation, the application produces deterministically reproducible and aggregable results because it combines static preprocessing, staged LLM calls, and persistent intermediate states.

Multi-stage LLM pipeline with role specialisation — Each source file is processed by four separate LLM calls with dedicated roles (Senior Software Architect for business logic, technical aspects, and interfaces; Senior Security Engineer for issues). Each role receives a focused prompt and an enforced JSON structure, increasing the depth and consistency of the results.
Aggregated system analysis in seven steps — Building on the per-file YAMLs, a second pipeline produces a project-wide evaluation in seven stages (overview, architecture, business domains, interfaces, quality, modernisation, executive summary) and consolidates them into a coherent report.
Separation of capture and evaluation — Code Scanner and Code Analyzer are separate applications that communicate exclusively through the YAML directory. Once a code base has been captured, it can be evaluated, exported, or re-reported any number of times without further LLM calls.
Static structural analysis as a baseline — Architectural style, layers, and build modules are detected rule-based via regex and keyword lists before any LLM is invoked. This reduces hallucinations and provides an objective reference alongside the LLM evaluation.
YAML as an auditable intermediate layer — All LLM responses are stored per file as structured YAML. Findings are therefore traceable, version-controllable, diff-friendly, and usable independently of the evaluation frontend.
Connection to three language families and arbitrary LLM endpoints — Processes Java, PHP, and Python projects from the file system; talks to any OpenAI-compatible endpoint (locally with Ollama or vLLM, via the OpenAI API, or LM Studio).
Asynchronous processing with configurable parallelism — File analyses run in parallel under a semaphore (default: 3), aligning throughput with capable LLMs while avoiding overloading small local endpoints.
Validation and repair steps — A dedicated validation view checks the YAML inventory for completeness; a CLI tool re-analyses failed files in a targeted manner.
Configurable LLM operation — Endpoint, model, and API key can be exchanged via environment variables or through the UI; local and cloud-hosted models can be used interchangeably.
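
The combination of role-specific calls and semaphore-limited parallelism described above can be sketched as follows. This is a minimal illustration, not the application's code: the role identifiers, the stubbed `call_llm` function, and the choice to run the four roles of one file sequentially are all assumptions. Only the four perspectives and the default limit of 3 come from the description.

```python
import asyncio

# Hypothetical identifiers for the four perspectives named in the text.
ROLES = ("business_logic", "technical_aspects", "interfaces", "issues")
MAX_PARALLEL = 3  # default parallelism mentioned in the text


async def call_llm(role, path):
    """Stand-in for a real LLM request; returns a placeholder result."""
    await asyncio.sleep(0.01)
    return {"role": role, "file": path}


async def analyse_file(path, semaphore, stats):
    """Run the four role-specific calls for one file under the semaphore."""
    async with semaphore:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        try:
            # Assumption: roles are processed one after another per file.
            results = [await call_llm(role, path) for role in ROLES]
        finally:
            stats["active"] -= 1
    return path, results


async def analyse_project(paths, limit=MAX_PARALLEL):
    """Analyse all files, with at most `limit` files in flight at once."""
    semaphore = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}
    analysed = await asyncio.gather(
        *(analyse_file(path, semaphore, stats) for path in paths)
    )
    return dict(analysed), stats["peak"]


paths = [f"src/Module{i}.java" for i in range(8)]
results, peak = asyncio.run(analyse_project(paths))
print(f"{len(results)} files analysed, peak concurrency {peak}")
```

Raising `limit` increases throughput against a capable endpoint, while a small local model may be better served by a lower value.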