Methodological insights from the development of a multi-agent system with LLMs
In an experiment lasting several days, a presentation preparation tool with a two-agent architecture was developed in order to gain more experience with agentic approaches and LLM systems.
Project framework
The tool assists in structuring presentations based on uploaded documents. Unlike many other solutions, it does not create content, but structures existing information according to the user’s preferences, such as target audience and presentation time. The architecture consists of two specialised agents: a chat agent conducts the conversation with the user, while an artefact agent maintains the presentation structure as a Markdown document.
Overview of how it works
A typical workflow looks like this: Users upload one or more documents (PDF, DOCX, text, Markdown or PPTX). The system automatically creates an initial artefact from these – a structured Markdown document that represents the presentation. This artefact contains slide titles as Markdown headings, key points as bullet points and speaker notes as block quotes.
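As an illustration, an initial artefact for a short deck might look like the following. The content is invented for this example; only the conventions (headings for slide titles, bullets for key points, block quotes for speaker notes) come from the description above.

```markdown
## Project results

- Rollout completed in all three regions
- Feedback from the pilot group was positive

> Keep this slide short; executives care about the headline outcomes.

## Next steps

- Decide on the phase-2 budget

> Close with a clear ask.
```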
In the subsequent chat, the user clarifies details such as ‘The presentation is for executives, 20 minutes long’ or ‘The focus should be on the results, not the methodology’. After each chat turn, the artefact agent analyses whether the chat contains information relevant to the structure and updates the artefact accordingly. If information is missing, it forwards questions to the chat agent. The final artefact can be exported as Markdown, PowerPoint or Word.
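The per-turn flow described above can be sketched as follows. `ChatAgent` and `ArtefactAgent` are hypothetical stand-ins for the LLM-backed agents (in the real system both would make model calls); the names and the toy update logic are illustrative only.

```python
class ChatAgent:
    """Stand-in for the LLM-backed chat agent."""

    def respond(self, message: str, pending_questions: list[str]) -> str:
        # If the artefact agent forwarded open questions, surface them
        # to the user; otherwise acknowledge the turn.
        if pending_questions:
            return " ".join(pending_questions)
        return f"Noted: {message}"


class ArtefactAgent:
    """Stand-in for the LLM-backed artefact agent."""

    def update(self, artefact: str, message: str) -> tuple[str, list[str]]:
        # Decide whether the turn contains structure-relevant information.
        # Returns the (possibly updated) artefact plus any questions to
        # forward to the chat agent. The keyword check is a toy heuristic.
        if "minutes" in message:
            return artefact + f"\n<!-- duration: {message} -->", []
        return artefact, ["How long should the presentation be?"]


def handle_turn(message: str, artefact: str,
                chat: ChatAgent, art: ArtefactAgent) -> tuple[str, str]:
    # After each chat turn: analyse the turn, update the artefact,
    # and forward any open questions back to the chat agent.
    artefact, questions = art.update(artefact, message)
    reply = chat.respond(message, questions)
    return artefact, reply
```

The key design point from the article is visible even in this toy version: the artefact agent never talks to the user directly; it updates the artefact and routes its questions through the chat agent.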
Development process
The development process followed a proven methodology: first a functional specification was drawn up, then a technical specification – approximately two hours in total over several iterations. Based on the detailed specification, the actual implementation of around 3000 lines of code then took only 30-60 minutes.
Key findings
Prompt following as a success factor: An important finding concerns the prompt following capabilities of LLMs for agentic systems. Initially, it was assumed that clear system prompts would suffice. In practice, however, it became apparent that ambiguities in agent execution required a structured communication protocol between the agents. The implementation of a ‘back channel’ – a JSON-based protocol through which the artefact agent can make queries to the chat agent – significantly improved the quality of the results. Free-text communication between agents proved to be too error-prone.
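A minimal sketch of what such a JSON back channel might look like, assuming a message schema with a `type` field. The field names and allowed types here are illustrative, not the project's actual protocol; the point is that malformed or free-text replies fail loudly instead of being misinterpreted.

```python
import json

# Hypothetical message types the chat agent is prepared to handle.
ALLOWED_TYPES = {"clarification_request", "artefact_updated"}


def parse_backchannel(raw: str) -> dict:
    """Parse and validate one back-channel message.

    Free-text replies fail at json.loads (JSONDecodeError is a
    subclass of ValueError); well-formed JSON with an unknown type
    is rejected explicitly rather than silently ignored.
    """
    msg = json.loads(raw)
    if not isinstance(msg, dict) or msg.get("type") not in ALLOWED_TYPES:
        raise ValueError(f"invalid back-channel message: {raw!r}")
    return msg


# Example query from the artefact agent to the chat agent.
query = json.dumps({
    "type": "clarification_request",
    "field": "target_audience",
    "question": "Who is the audience for this presentation?",
})
```

Validating at the protocol boundary is what makes the structured channel more robust than free text: an agent that drifts off-format produces an immediate error rather than a subtly corrupted artefact.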
Importance of detailed specifications: The separation between functional and technical specifications was crucial. In the functional phase, 4-5 iterations were needed to clarify architectural decisions – for example, the exact interaction between the chat and artefact agents. Discussions about technical feasibility and complexity estimates were particularly important in order to identify lean solutions. The technical specification then defined exact data structures, interfaces and error handling strategies. This clarity enabled rapid code generation without further iterations.
LLM-maintainable code base: Code files under 1000 lines are significantly better suited for LLM-supported maintenance. This requires consistent modularisation, but pays off in terms of development speed. The largest component, the artefact agent with around 900 lines, remained easily manageable. The module structure (core/, agents/, export/) developed iteratively with the LLM was specifically designed with this maintainability in mind.
Model selection based on prompt following: Mistral Small 2506 was used for the production agents. The choice of smaller models should be based primarily on their prompt-following capabilities, not solely on general benchmarks. Mistral Small proved sufficient, although larger models would probably have been more accurate. The high inference speed (response times of a few seconds), however, made for a good user experience.
Practical validation
The tool was tested by various stakeholders with real documents (ranging from a few pages to 60). The quality of the time planning and target-audience adaptation was particularly surprising, demonstrating the tool's 'inspirational value': it provides structured suggestions that serve as a starting point for further refinement. Structuring presentation sequences from uploaded documents also worked robustly.
Transferable principles
The following principles can be derived for future multi-agent projects with LLMs:
Invest in detailed specifications. The two hours spent on the functional and technical specs enabled a 30-60 minute implementation.
Use structured protocols for agent communication. JSON-based interfaces are more robust than free text.
Allow two-way communication. The back channel helps when agents need to work in a coordinated manner; unidirectional data flows were insufficient.
Modularise with LLM maintainability in mind. The guideline of under 1000 lines per file has proven its worth.
Select models by their prompt-following capabilities for agentic applications.
The structured approach – functional specification, technical specification, then implementation – has established itself as a reproducible workflow and will be retained in further projects.
This article is part of a series on the methodical documentation of LLM-supported development projects.