Workshop report: LLM-enabled development#

How do you develop with LLMs?#

The aim of this exercise was to gain methodological experience with LLM-supported coding:

  • How does the specification process work for more complex software?
  • What architectural decisions are necessary to keep complexity manageable?
  • What are the practical limits in terms of code size and complexity?
  • How do you systematically deal with errors in LLMs?

Prompted by a question from another project, a code analysis tool for Java was chosen. Such a tool required various technical components – file scanning, LLM integration, asynchronous processing, structured data storage and web interfaces.

The following functional requirements were defined for the learning project:

  • Analysis of the architectural structure and layer assignment
  • Identification of business functionalities
  • Detection of security and quality issues as well as technical debt
  • Cataloguing of API interfaces
  • Analysis of dependencies between components
  • Textual description of these aspects and readability of the output

These requirements were deliberately chosen to test various aspects of LLM integration: structured data extraction, natural language descriptions, pattern recognition and categorisation.

Technical design and architecture#

Fundamental design decisions#

The development of the analysis tool was based on several deliberate architectural decisions that arose from the requirements and capabilities of large language models.

Hierarchical analysis approach#

A key design principle was the realisation that, in large projects, a complete analysis cannot be processed in a single LLM context. Therefore, a hierarchical approach was chosen, in which analyses were performed and aggregated at different levels:

  • File level: Detailed analysis of individual Java files
  • Package level: Aggregation and summarisation at package level
  • Module level: Module-based analysis
  • System level: Overall system architecture and cross-cutting patterns

This structure allowed each level of analysis to serve as input for further LLM-based analyses without exceeding context boundaries.
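A minimal sketch of this roll-up idea, in which the `summarise` stub stands in for an LLM call (all names here are illustrative, not the tool's actual functions; the module level is omitted for brevity):

```python
# Each level's results become the input for the next level, so no single
# prompt ever has to hold the whole code base.
def summarise(texts: list[str], level: str) -> str:
    # Placeholder for an LLM summarisation call.
    return f"[{level} summary of {len(texts)} items]"

def analyse_project(files_by_package: dict[str, list[str]]) -> str:
    package_summaries = []
    for package, file_contents in files_by_package.items():
        # File level: one detailed analysis per Java file.
        file_summaries = [summarise([code], "file") for code in file_contents]
        # Package level: aggregate the file summaries.
        package_summaries.append(summarise(file_summaries, "package"))
    # System level: aggregate the package summaries.
    return summarise(package_summaries, "system")

result = analyse_project({"com.example.user": ["class A {}", "class B {}"]})
```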

Storage format and data structure#

YAML was chosen as the storage format because it is easier to handle than XML and is well suited for further machine processing. The analysis results were stored in a package-based directory structure that reflects the logical organisation of the source code:

analysis/
├── project_structure.yaml
├── summary.yaml
└── files/
    └── com/example/package/
        └── ClassName.yaml
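The mapping from a fully qualified class name to its file in this layout can be sketched as follows (the function name is an illustration, not the tool's actual API):

```python
from pathlib import Path

def result_path(root: Path, fqcn: str) -> Path:
    # "com.example.package.ClassName" -> files/com/example/package/ClassName.yaml
    *packages, cls = fqcn.split(".")
    return root.joinpath("files", *packages, f"{cls}.yaml")

path = result_path(Path("analysis"), "com.example.package.ClassName")
```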

Technology stack#

Python and Gradio were chosen as the base technologies for pragmatic reasons:

  • Python offers broad support for LLM integration and asynchronous processing
  • Gradio enabled rapid development of web interfaces, was easy to deploy, and was well ‘understood’ by LLMs, which facilitated AI-assisted development
  • Asyncio allowed efficient parallel processing with controllable resource usage

Mistral Small 2506 was used as a locally hosted model for LLM integration. One design criterion was the availability of a local LLM without restrictions on the token size for queries.

Architecture components#

The tool consisted of 6 Python files with a total of 5000-6000 lines of code and implemented a modular architecture:

  • File Scanner: Identified all relevant Java files based on include and exclude patterns
  • Structure Analyzer: Analysed the package hierarchy, identified architecture styles (MVC, Layered, Hexagonal) and assigned packages to layers
  • Code Analyzer: Performed LLM-based analyses at file level
  • Storage Manager: Managed the persistent storage of the analysis results in YAML files
  • Aggregator: Summarised individual results to higher levels of abstraction
  • Report Builder: Generated different views of the analysis data
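The File Scanner's include/exclude filtering can be sketched with glob-style patterns; the pattern values below are illustrative defaults, not the tool's actual configuration:

```python
from fnmatch import fnmatch

def select(paths, include=("*.java",), exclude=("*Test.java",)):
    # Keep paths matching at least one include pattern and no exclude pattern.
    return [p for p in paths
            if any(fnmatch(p, pat) for pat in include)
            and not any(fnmatch(p, pat) for pat in exclude)]

selected = select(["src/UserService.java", "src/UserServiceTest.java", "pom.xml"])
# → ['src/UserService.java']
```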

Two-stage interface design#

A deliberate architectural decision was to separate the system into two distinct Gradio interfaces:

  • Analyser Interface (Port 7860): Performed the actual analysis, which could take several hours
  • Analysis Dashboard (Port 7861): Enabled interactive exploration of the analysis data already generated

This separation made it easy to distinguish between code analysis and data exploration.

LLM-based code analysis: Implementation details#

Multi-prompt strategy#

One key implementation detail that developed iteratively was the division of the analysis into specialised LLM prompts per file:

  • Business Logic Analysis: Identified the business purpose, capabilities and critical methods
  • Technical Aspects Analysis: Recognised frameworks, design patterns and dependencies
  • Interface Analysis: Extracted REST endpoints and SOAP services
  • Issue Detection: Identified security issues, code smells, and technical debt

This specialisation proved helpful, as a single prompt was not sufficiently focused to capture all relevant aspects at once.
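The split into specialised prompts can be sketched as one focused LLM call per aspect; the prompt texts and the `llm` callable here are placeholders, not the tool's actual prompts:

```python
PROMPTS = {
    "business": "Identify the business purpose and capabilities of:\n{code}",
    "technical": "List frameworks, design patterns and dependencies in:\n{code}",
    "interfaces": "Extract REST endpoints and SOAP services from:\n{code}",
    "issues": "Find security issues, code smells and technical debt in:\n{code}",
}

def analyse_file(code: str, llm) -> dict:
    # One focused call per aspect instead of a single catch-all prompt.
    return {aspect: llm(template.format(code=code))
            for aspect, template in PROMPTS.items()}

# Stub LLM that just echoes the first line of the prompt:
answers = analyse_file("class UserService {}", lambda p: p.splitlines()[0])
```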

Concurrency and stability#

Analysing large projects required dealing with several technical challenges:

  • Rate Limiting: A semaphore-based mechanism controlled parallel execution (configurable 1-10 workers) to avoid overloading the LLM
  • Error Handling: Each file analysis was isolated; errors did not cause the entire process to abort
  • JSON extraction: LLM responses were searched for JSON structures using regex parsing to ensure robust results even with inconsistent LLM response formats
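The semaphore and error-isolation pattern can be sketched as follows, assuming an async `analyse(path)` coroutine (all names are illustrative):

```python
import asyncio

async def analyse_all(paths, analyse, max_workers: int = 4):
    # The semaphore caps concurrent LLM calls (configurable 1-10 in the tool).
    sem = asyncio.Semaphore(max_workers)

    async def guarded(path):
        async with sem:
            try:
                return path, await analyse(path)
            except Exception as exc:
                # A failure in one file must not abort the whole run.
                return path, {"error": str(exc)}

    return dict(await asyncio.gather(*(guarded(p) for p in paths)))

async def demo_analyse(path):
    if path == "Broken.java":
        raise ValueError("unparseable response")
    return {"ok": True}

results = asyncio.run(analyse_all(["A.java", "Broken.java"], demo_analyse))
```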

Example of an LLM analysis#

The business logic analysis used a structured prompt that guided the LLM to JSON-formatted responses:

Analyse this Java class and identify its business purpose and capabilities.

Class: UserService

Code: [first 3000 characters]

Focus on:
1. What is the business purpose of this class?
2. What business capabilities does it provide?
3. Which public methods are business-critical?

Respond in JSON format:
{
  "purpose": "Brief description",
  "capabilities": [
    {
      "name": "capability name",
      "description": "what it does",
      "methods": ["method1", "method2"]
    }
  ],
  "critical_methods": ["method1", "method2"]
}
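As noted under concurrency and stability, replies in this shape were not parsed directly; a regex first located the JSON structure. A minimal sketch of that idea, assuming one top-level JSON object per reply (the function name is illustrative):

```python
import json
import re

def extract_json(reply: str):
    # Grab the outermost {...} span, then parse it; this tolerates prose
    # before and after the JSON and returns None when nothing parses.
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None

reply = 'Sure! Here is the analysis:\n{"purpose": "Manages user accounts"}'
extract_json(reply)  # → {'purpose': 'Manages user accounts'}
```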

Development process with LLM support#

Specification phase (1 hour)#

The development process began with a detailed specification phase in which the requirements and constraints were worked out using various LLMs. This phase was deliberately designed as a dialogue in which the LLM was not only supposed to implement requirements, but also ask questions and request clarifications.

Consistent application of the KISS principle (Keep It Small and Simple) ensured that the specification remained realistic and feasible.

Implementation phase (2 hours)#

The actual code generation took place in 2 hours over 4-5 main iterations. A linear development process without subsequent refactoring was followed. The strategy was:

  • Clarify functional requirements: What should the tool do?
  • Define UI structure: Which interfaces were needed?
  • Specify technologies: Which libraries and frameworks?
  • Design architecture: How did the components interact?

Technical framework conditions were partly specified in advance and partly developed in discussion with the LLM. The Gradio interfaces were specified in terms of their structure, while the detailed implementation was left to the LLM.

Role of the developers#

The role of the author, a non-developer, primarily consisted of:

  • Clarifying requirements: What specifically should the tool solve?
  • Architectural decisions: What basic structure was appropriate?
  • Quality control: Did the code correspond to the specification?
  • Complexity control: Was the solution as simple as possible?

The LLM took care of the actual code implementation, syntax details and the concrete implementation of algorithms.

Methodological findings#

Specification as a success factor#

A key insight from this project was that LLM-supported coding works with good specifications. The clearer, more consistent and more complete the specification, the better the generated code. This clarity was developed in dialogue with the LLM, as the LLM was able to uncover gaps and contradictions by asking specific questions.

Possibilities and limitations of code generation#

While LLMs demonstrated impressive capabilities in code generation, there were realistic limitations:

Scope limitation: LLMs were able to reliably generate code up to about 1000-1500 lines per file. Beyond that, quality declined and the likelihood of inconsistencies increased.

Overengineering tendency: LLMs often proposed overly complex solutions that were technically correct but difficult to maintain; this required active control. Because little material was apparently available on building tools around LLMs, the LLM frequently proposed complicated heuristics instead, and these had to be actively avoided.

Understanding of architecture required: Even without writing code oneself, a thorough understanding of software architecture was needed to specify realistic and controllable solutions.

Background knowledge of deployment and IT security either had to be available already or had to be built up during the project.

Changed development workflow#

The development process with LLMs changed the priorities:

  • Requirements clarification became a core skill
  • UI/UX design took place before implementation
  • Architecture decisions were central
  • Implementation details were delegated

This enabled subject matter experts without in-depth programming knowledge to develop specialised tools for their domain.

Validation of the learning project#

Functional test on a real example#

To validate the functionality of the developed tool, it was applied to a real Java project – a system with approximately 1.2 million lines of code, including Java files, XML configurations and XSD definitions. The analysis run took place overnight.

The goal was not to create a productive tool, but to understand:

  • Did the developed approach work technically?
  • Were the LLM analyses meaningful in terms of content?
  • Did the architecture scale to large code bases?
  • Where did problems arise in the implementation?

Observed results#

The tool generated comprehensive structured documentation:

  • Project structure overview with recognised architecture style
  • Package hierarchy with automatic layer assignment
  • Detailed file analyses with business and technical aspects
  • Catalogue of all REST and SOAP interfaces
  • Categorised list of identified issues by severity
  • Aggregated statistics at various levels of abstraction

Technical validation: The code worked without problems, the analysis ran to completion, the YAML outputs were valid and the data structures were consistent.

Content quality: The LLM-generated descriptions were linguistically readable, recognised architectural patterns appeared plausible, and issue detection identified visible problem areas. Discussions with the development team confirmed the key findings.

Focus: Methodological insights#

The value of the experiment lay not in the tool itself, but in the insights gained about LLM-assisted coding:

  • Specification methodology: How was the dialogue with LLMs structured?
  • Architectural patterns: Which structures worked with LLM-generated code?
  • Practical limitations: Where were the realistic limits?
  • Workflow changes: How did the role of the developers change?

These methodological insights may be transferable to other projects, regardless of the specific application.

Conclusion: Methodological insights from the learning project#

The experiment with LLM-supported software development was designed as a learning project and fulfilled this purpose. In 3 hours of development time (1 hour of specification, 2 hours of implementation), a functional tool with 5000-6000 lines of code was created that worked technically and could analyse a project with 1.2 million lines of code.

The value lay not in the tool itself, but in the methodological insights gained about LLM-supported coding at a realistic level of complexity.

Key insights into the specification methodology#

Specification as a success factor: LLM-supported coding worked on the basis of clear, consistent specifications. The more precise the requirements, the better the generated code. This clarity was developed in a structured dialogue with the LLM, as the LLM was able to uncover gaps and inconsistencies through targeted questions.

Dialogue structure: The time invested in the specification phase (1 hour) paid off. The dialogue followed a pattern:

  • Clarify functional requirements
  • Define UI structure
  • Make technology decisions
  • Specify architecture components

Enforce the KISS principle: LLMs tended to over-engineer. Active control for simplicity was necessary and was carried out continuously. Complex solutions proposed by the LLM were questioned and simplified.

Insights into architecture patterns#

Hierarchical structuring: The hierarchical analysis approach (file → package → module → system) proved to be a viable pattern for LLM-generated code. Each level produced results that could themselves serve as input.

Modularisation: The division into six separate Python files with clear responsibilities (scanner, analyser, storage, etc.) worked well. Dataclasses as data models facilitated structuring.
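An illustrative data model in that spirit; the field names are assumptions for illustration, not the tool's actual schema:

```python
from dataclasses import asdict, dataclass, field

@dataclass
class Capability:
    name: str
    description: str
    methods: list[str] = field(default_factory=list)

@dataclass
class FileAnalysis:
    class_name: str
    purpose: str
    capabilities: list[Capability] = field(default_factory=list)
    critical_methods: list[str] = field(default_factory=list)

analysis = FileAnalysis(
    "UserService", "Manages user accounts",
    capabilities=[Capability("registration", "registers users", ["createUser"])],
)
record = asdict(analysis)  # nested plain dict, ready for YAML serialisation
```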

Interface separation: The deliberate separation into two Gradio interfaces (analysis vs. exploration) reflected different usage patterns and reduced the complexity of the individual components.

Practical limitations of LLM coding#

The experiment revealed realistic limits:

Scope limitation: LLMs could generate code up to about 1000-1500 lines per component. Beyond that, coherence decreased. The six files with a total of 5000-6000 lines remained within this range thanks to clever modularisation.

Architectural expertise required: LLMs implemented details, but the basic architecture had to be understood and specified. Without an understanding of software architecture, more complex projects could not be managed.

Iterative corrections: Four to five main iterations were required to eliminate inconsistencies and refine the specification.

Technical challenges: Stability over long analysis runs and clean JSON generation were initially problematic. Robust error handling and regex-based JSON parsing were helpful adjustments.

Changed development workflow#

LLM-supported coding changed priorities:

Requirements clarification became a core skill: The ability to specify precisely proved to be more important than syntax knowledge.

Architecture before implementation: Structural decisions were made before code generation. The LLM could advise on the architecture, but the decision was up to the developers.

New role: Developers became specification authors, architects and quality gatekeepers. Code implementation was delegated.

Empowerment of non-developers: With an understanding of architecture, subject matter experts without in-depth coding knowledge were able to create functional prototypes.