# LLM coding experiment: Personnel cost calculator with hybrid architecture
## Initial situation
A department at Humboldt University in Berlin asked whether a tool for calculating personnel costs in third-party funded projects could be built with LLM support. The challenge: precise calculations according to the TV-L collective agreement combined with natural-language operation, two requirements that are traditionally difficult to reconcile with LLMs.
The experiment was designed to explore whether and how exact calculations can be combined with the flexibility of natural language input.
## The tool
The personnel cost calculator allows project requirements to be entered in natural language:
“We need 2 doctoral students E13/2 at 67% and one postdoc E14/4, duration April 2026 to March 2029.”
The system automatically extracts the relevant parameters (pay grades, levels, job shares, time periods), performs the calculation according to TV-L, and provides a structured cost overview with annual slices. If the information is incomplete, the system asks specific questions. An Excel export allows direct use in third-party funding applications.
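For illustration, the structured parameters extracted from the example request above might look like this. The field names are hypothetical; the article does not show the tool's actual schema:

```python
# Hypothetical extraction result for the sample request.
# All field names are illustrative, not the tool's actual schema.
extracted = {
    "positions": [
        {"role": "doctoral student", "pay_grade": "E13", "level": 2,
         "fte_percent": 67, "count": 2},
        {"role": "postdoc", "pay_grade": "E14", "level": 4,
         "fte_percent": 100, "count": 1},
    ],
    "period": {"start": "2026-04", "end": "2029-03"},
}
```

Everything downstream of this structure (calculation, annual slices, Excel export) can then run deterministically, without further LLM involvement.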
## Architecture decision
The key finding of the specification phase: LLMs and precise calculations require strict separation. The chosen architecture:
LLM component: Used exclusively for parameter extraction from natural language and for generating follow-up questions. The LLM delivers structured JSON data but performs no calculations.
Deterministic component: All calculations are performed without LLM involvement using Python Decimal for exact accuracy. Remuneration tables, contribution rates, and employer contributions are statically configured.
State machine: Controls the dialog flow between parsing, follow-up questions, calculation, and fallback.
Fallback mechanism: After three failed parsing attempts, a manual input form is displayed. This was planned from the outset, since the reliability of LLM extraction could not be guaranteed.
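The split between the deterministic core and the dialog control can be sketched roughly as follows. This is a minimal illustration of the pattern, not the tool's real implementation: the table value, the employer factor, and the state names are placeholders, not actual TV-L figures.

```python
from decimal import Decimal, ROUND_HALF_UP
from enum import Enum, auto

class State(Enum):
    PARSING = auto()
    QUERY = auto()        # ask the user for missing details
    CALCULATION = auto()
    FALLBACK = auto()     # manual input form after repeated parse failures

MONTHLY_TABLE = {("E13", 2): Decimal("4971.88")}  # placeholder, not a real TV-L value
EMPLOYER_FACTOR = Decimal("1.28")                 # placeholder overhead factor

def monthly_cost(grade: str, level: int, fte_percent: int) -> Decimal:
    """Deterministic step: table salary x FTE share x employer factor, exact Decimal math."""
    cost = MONTHLY_TABLE[(grade, level)] * Decimal(fte_percent) / Decimal(100) * EMPLOYER_FACTOR
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

def next_state(state: State, parse_ok: bool, attempts: int, complete: bool) -> State:
    """Transition logic: three failed parses trigger the manual fallback."""
    if state is State.PARSING:
        if parse_ok:
            return State.CALCULATION if complete else State.QUERY
        return State.FALLBACK if attempts >= 3 else State.PARSING
    return state
```

The point of the sketch is the boundary: the LLM only ever produces the inputs to `monthly_cost`; the arithmetic itself uses `Decimal` so that rounding is exact and reproducible.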
## Development process
The process followed a clear sequence:
Requirements gathering (upstream): A specially developed prompt systematically captured the department's requirements: problem definition, functional requirements, and technical constraints.
Specification discussion (50 minutes): Intensive alignment on architecture, data flows, workflows, and the UI concept. The specification was finalized only once all aspects had been clarified.
Implementation (approx. 40 minutes): The LLM generated 5700 lines of code across 29 Python files in a single pass. One revision was necessary.
The total duration of less than two hours for a functional tool of this complexity was only possible thanks to the thorough preparatory work.
## Methodological insights
Specification before implementation: The quality of the specification determined success. Architecture, data flows, workflows, and UI must be fully clarified before any code is generated. The focus on architecture prevented overengineering.
Hybrid architecture as a pattern: Not everything has to be done by the LLM. Classic programming techniques remain superior for many tasks – the LLM serves as an intelligent interface between the user and robust code. This pattern is suitable for all applications that need to combine precise calculations with flexible input.
Plan for robustness: The fallback mechanism was not an afterthought, but part of the concept from the outset. When LLM reliability is uncertain, alternative input paths are required.
JSON extraction as an interface: The structured output of the LLM as JSON with subsequent Pydantic validation proved to be a robust bridge between natural language input and type-safe processing.
## Current status
The tool is in the evaluation phase and, in tests to date, has worked as planned. The question of whether LLMs and exact calculations are compatible can be answered with nuance: yes, provided the architecture deliberately combines the strengths of both worlds.
## Metrics
| Aspect | Value |
|---|---|
| Code size | 5700 lines |
| Files | 29 Python files |
| Specification time | 50 minutes |
| Implementation time | approx. 40 minutes |
| Main iterations | 1 + rework |
| Sessions | 1 |
This article is part of a series documenting methodological findings from LLM-supported development projects.