Insights from an AI-powered survey tool: When conceptual complexity becomes more important than lines of code
Context: As part of a learning project, an AI-powered survey tool was developed that asks intelligent follow-up questions to clarify unclear answers. The primary goal was not the tool itself, but rather the exploration of methodological limits and possibilities of LLM-supported coding in multi-stage workflows.
The tool: Intelligent survey orchestration
The system is based on a single-agent approach with three core functions: a SurveyAgent evaluates incoming responses for clarity and specificity, generates context-specific follow-up questions where needed, and structures the final responses for later clustering. The technical implementation uses Python, Gradio for the UI, Pydantic for data validation, and asyncio for asynchronous processing: a stack deliberately chosen from existing experience so that attention could stay on the actual learning task.
The system processes survey responses in a multi-stage workflow: after each response, an automatic evaluation (clarity score between 0 and 1) is performed. If clarity is insufficient, the agent generates a targeted follow-up query; the expanded response is then re-evaluated and finally converted into a structured form for statistical analysis. Modularity was specified from the outset to create clear responsibilities.
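The workflow described above can be sketched as a bounded loop. The function names, the 0.7 threshold, and the follow-up cap below are illustrative assumptions, not the project's actual values:

```python
# Sketch of the multi-stage workflow: evaluate, optionally follow up,
# re-evaluate, then structure. evaluate(), ask_followup(), and
# structure() stand in for the actual LLM calls.
CLARITY_THRESHOLD = 0.7  # assumed cutoff for "sufficiently clear"
MAX_FOLLOWUPS = 2        # bounds the loop so unclear answers cannot recurse forever

def process_response(answer, evaluate, ask_followup, structure):
    followups = 0
    while evaluate(answer) < CLARITY_THRESHOLD and followups < MAX_FOLLOWUPS:
        # Append the clarification to the original answer and re-evaluate.
        answer = f"{answer} {ask_followup(answer)}"
        followups += 1
    return structure(answer)
```

The cap on follow-ups doubles as the loop safeguard discussed under the findings below: even if an answer never reaches the threshold, processing terminates.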
Key methodological findings
1. Conceptual complexity exceeds code volume: The real challenge was not the number of lines of code (4,000 lines across 15 files) but understanding the required architectural patterns. Without prior experience of such agent-workflow interactions, it was difficult to specify how the individual components should be orchestrated. The takeaway: more complex LLM architectures first require exploratory prototypes in order to become familiar with the underlying architectural patterns.
2. LLMs tend to overengineer: Across the five to six iteration rounds spent refining the prompt templates, it repeatedly became apparent that the LLM proposed overly ambitious and complex mechanisms. The concept had to be kept strictly simple, and the LLM actively steered towards the KISS principle; this required explicit guidelines against unnecessary abstractions.
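An example of what such an explicit guideline can look like when embedded in a system prompt; the wording below is an assumption for illustration, not the project's actual template:

```python
# Illustrative prompt fragment steering the model toward KISS.
# The exact wording is an assumption, not the project's template.
KISS_GUIDELINES = """\
Design constraints:
- Propose the simplest mechanism that satisfies the requirement.
- Ask at most one follow-up question per response.
- Do not introduce new abstractions, layers, or configuration options
  unless the task cannot be solved without them.
"""
```

Constraints of this kind are typically prepended to every agent prompt so the model is reminded of them on each call, not just at session start.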
3. Structured JSON outputs as the key: The consistent use of structured JSON responses was crucial for the interaction between the survey agent, workflow manager and UI. This enabled clear state management and predictable data flows. At the same time, a critical lesson emerged: JSON-driven LLM systems need robust safeguards against infinite loops, since faulty processing can quickly lead to circular queries.
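One way to implement such a safeguard is to bound the re-prompting that happens when the model returns malformed JSON. The sketch below uses hypothetical names and an assumed retry limit:

```python
import json

MAX_JSON_RETRIES = 3  # assumed bound; prevents endless re-prompt cycles

def parse_agent_json(raw, reprompt):
    """Parse the model's JSON reply, re-prompting a bounded number of times.

    `reprompt` stands in for the actual LLM call; it receives the invalid
    output so the model can correct it.
    """
    for _ in range(MAX_JSON_RETRIES):
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            raw = reprompt(f"Reply with valid JSON only. You sent: {raw}")
    raise ValueError(f"No valid JSON after {MAX_JSON_RETRIES} attempts")
```

Raising instead of silently retrying forever keeps a single bad response from stalling the whole survey workflow; the caller can log the failure and fall back to treating the answer as unclear.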
4. Specification vs. experimentation: The initial specification phase (two hours) was important for clarifying basic requirements. However, specification alone proved insufficient for novel architectural patterns; actual feasibility and optimal structuring can only be determined through practical experimentation. This led to an iterative approach of specification, implementation and architectural learning.
5. Limitations in criteria derivation: The greatest technical challenge was automatically deriving evaluation criteria from responses. While the system worked reliably in clear-cut cases (very vague vs. very specific answers), evaluating the completeness of responses was inconsistent. This points to a fundamental limitation in semantic deep analysis.
Practical suitability and consequences
The tool has been functionally validated and shows mixed results: for certain types of questions the follow-up system works reliably, while for others criteria derivation remains problematic. A development time of about four hours of pure implementation, spread over three days, demonstrates the efficiency of LLM-supported coding for clearly defined tasks.
The project was explicitly designed as an exploratory learning vehicle. The insights gained were directly incorporated into an improved follow-up project ("ppt-helper"), which further develops the architectural patterns with several specialised agents. This demonstrates the real value of such experiments: not immediate productive use, but the systematic development of expertise for more complex LLM architectures.
Conclusion
The key finding of this experiment: when developing more complex systems with LLM support, understanding the architecture is the limiting factor, not code generation. LLMs are highly efficient at implementing clearly specified components, but orchestrating multi-stage workflows requires experience that can only be gained through practical experimentation. A clear but not overly detailed specification combined with iterative prototyping forms the optimal workflow. In this process, the LLM must be actively guided towards simple solutions; its natural tendency is towards overengineering.
For future projects, this means that complex architectures should first be tested in small exploratory prototypes to understand the required patterns before moving on to productive implementation.