From idea to presentation: What we learned about multi-agent systems while developing an AI tool#
We developed a tool that automatically creates structured presentations from uploaded documents. What makes it special is that two hours of planning resulted in only 30-60 minutes of programming time – and we learned a lot about the practical application of language models in the process.
What was the goal?#
Less manual effort in preparing presentations.
Most presentation tools create finished slides with pre-designed content. We wanted to take a different approach: a system that takes existing information and structures it according to your own specifications (e.g. target audience, presentation time, content focus).
What can the tool do?#
- Process documents: You upload files (PDF, Word, text, Markdown or PowerPoint) and the system develops a first draft from them
- Refine during conversation: You specify details such as ‘The presentation is for laypeople, 20 minutes’ or ‘Focus on results, not methodology’
- Automatic updates: After each discussion step, the system checks whether the structure needs to be adjusted
- Flexible export: The result can be saved as a Markdown, PowerPoint or Word file
This creates a presentation structure that exactly matches your requirements.
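The upload-draft-refine loop described above can be sketched roughly as follows. This is a hypothetical illustration, not the tool's actual API: the names `Outline`, `draft_outline`, and `apply_instruction` are assumptions, and the stub logic stands in for what the language-model agents would do.

```python
from dataclasses import dataclass, field

@dataclass
class Outline:
    title: str
    sections: list[str] = field(default_factory=list)
    audience: str = "general"
    minutes: int = 30

def draft_outline(document_text: str) -> Outline:
    # In the real tool, an LLM proposes a structure from the uploaded
    # document; this placeholder just takes the first line as the title.
    first_line = document_text.strip().splitlines()[0]
    return Outline(title=first_line,
                   sections=["Introduction", "Main part", "Conclusion"])

def apply_instruction(outline: Outline, instruction: str) -> Outline:
    # The real system re-checks the whole structure after each turn;
    # this stub only handles two example instructions.
    if "laypeople" in instruction:
        outline.audience = "laypeople"
    if "minutes" in instruction:
        outline.minutes = int(instruction.split()[0])
    return outline

outline = draft_outline("Quarterly results\n...")
outline = apply_instruction(outline, "20 minutes for laypeople")
print(outline.audience, outline.minutes)  # → laypeople 20
```

The point of the sketch is the shape of the loop: a draft from the document first, then incremental adjustments after each conversational turn, with export only at the end.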
How was it developed?#
We proceeded in three clear steps:
- Step 1 (4-5 iterations): Functional planning – What should the tool be able to do? How should the components work together?
- Step 2 (2-3 iterations): Technical planning – What data structures do we need? How do we handle errors?
- Step 3 (30-60 minutes): Programming according to the detailed specifications
Total planning time: Approximately 2 hours
Result: Approximately 3,000 lines of code in several modules
Why did it work so well?#
Because we took the preparation seriously.
A clear description of the requirements prevented many errors later on. In the functional phase, we discussed which technical approaches were actually feasible – and deliberately opted for lean solutions. The technical planning then defined exact interfaces. The result: the code could be written in one go, without any major changes.
The two-agent architecture was central to this: one agent conducts the conversation with the users. A second agent maintains the presentation structure. This separation simplified development.
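The separation could look something like this in code. The class and method names are invented for illustration; the article does not publish its implementation, and the real agents would of course call a language model rather than use keyword matching.

```python
class StructureAgent:
    """Owns the presentation structure; never talks to the user directly."""
    def __init__(self):
        self.sections = ["Introduction"]

    def update(self, instruction: str) -> list[str]:
        # Placeholder for the LLM call that adjusts the outline.
        if "results" in instruction:
            self.sections.append("Results")
        return self.sections

class ConversationAgent:
    """Talks to the user and forwards structural requests to the other agent."""
    def __init__(self, structure_agent: StructureAgent):
        self.structure_agent = structure_agent

    def handle_turn(self, user_message: str) -> str:
        updated = self.structure_agent.update(user_message)
        return f"Updated outline: {updated}"

agent = ConversationAgent(StructureAgent())
print(agent.handle_turn("Focus on results"))
# → Updated outline: ['Introduction', 'Results']
```

The benefit of this split is that each agent has one responsibility: the conversation agent never mutates the structure itself, and the structure agent never has to reason about dialogue.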
Key findings#
1. Language models need structured instructions
Initially, we relied on carefully worded natural-language descriptions in the system prompt. In practice, this was not enough: the agents worked too imprecisely.
We then introduced a structured protocol – a JSON format that the agents use to communicate with each other. The structure agent can ask the conversation agent specific questions. This significantly improved the quality. Free-text communication between agents was too error-prone.
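A JSON protocol of this kind might look as follows. The message schema here is an assumption made for this sketch; the article does not publish the actual format its agents use.

```python
import json

ALLOWED_ACTIONS = {"ask_user", "update_section", "confirm"}

def make_request(action: str, payload: dict) -> str:
    """Serialize an agent-to-agent message into the shared JSON format."""
    return json.dumps({"action": action, "payload": payload})

def parse_request(raw: str) -> dict:
    """Validate an incoming message; reject anything outside the protocol."""
    message = json.loads(raw)
    if message.get("action") not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {message.get('action')}")
    return message

raw = make_request("ask_user", {"question": "Who is the target audience?"})
print(parse_request(raw)["payload"]["question"])
# → Who is the target audience?
```

The validation step is what makes this more robust than free text: a malformed or unexpected message fails loudly at the boundary instead of silently degrading the outline.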
2. Thorough planning saves time
The two hours spent on functional and technical planning paid off. We hardly had to revise anything during programming.
The discussions about technical feasibility were particularly important. Instead of building complex solutions, we focused on the essentials. The technical planning then defined the exact data structures. This clarity enabled rapid implementation.
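"Exact data structures" could mean something like a typed model of the presentation that both agents share. The following is a hedged guess at what such a model might look like; the field names are assumptions, not the project's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Slide:
    heading: str
    bullet_points: list[str] = field(default_factory=list)
    estimated_minutes: float = 2.0

@dataclass
class Presentation:
    title: str
    target_audience: str
    total_minutes: int
    slides: list[Slide] = field(default_factory=list)

    def planned_minutes(self) -> float:
        # Lets an agent check the outline against the requested duration.
        return sum(s.estimated_minutes for s in self.slides)

p = Presentation("Results 2024", "laypeople", 20,
                 [Slide("Overview"), Slide("Key findings", estimated_minutes=5.0)])
print(p.planned_minutes())  # → 7.0
```

Pinning a structure like this down before coding is exactly the kind of interface decision that lets the implementation proceed without rework.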
3. Small code files are easier to maintain
Files with fewer than 1,000 lines are much easier to edit with language models. This requires consistent division into modules, but it pays off.
Our largest component – the structure agent with around 900 lines – was still easy to manage. We developed the module structure iteratively, always with maintainability in mind.
4. The choice of model is important
We used Mistral Small 2506 for the production agents. With smaller models, the primary question should be: do they follow instructions reliably? That matters more than how they score on general benchmarks.
Mistral Small was sufficient; larger models would probably have been more precise, but its fast response times (a few seconds) ensured a good user experience.
What can others learn from this?#
- Invest time in planning: The two hours spent on functional and technical preparation enable very fast implementation.
- Use structured protocols: JSON-based interfaces between agents are more robust than free text.
- Two-way communication: A return channel helps when agents need to work in a coordinated manner.
- Consider maintainability: The guideline of fewer than 1,000 lines per file has proven effective.
- Choose models based on instruction fidelity: It’s not overall benchmark performance that counts most, but how accurately the model implements specifications.
Practical validation#
Various people tested the tool with real documents (range: a few to 60 pages). Time planning and adaptation to different target groups worked particularly well.
The tool provides structured suggestions that serve as a starting point. The structuring of presentations based on uploaded documents worked robustly.
Conclusion#
✔ Structured preparation with functional and technical planning significantly speeds up implementation.
✔ Clear protocols for communication between agents significantly improve quality.
✔ Maintainable code structure and the right model choice based on instruction fidelity are crucial.
This is part of a series on the practical use of language models. The focus is on what can be learned from such projects – not just on the results.