From natural language to precise figures: A tool for calculating personnel costs
In this series, we report on findings from AI-supported development projects. This time, we look at a tool that understands project requirements in plain language yet still calculates exactly according to the collective agreement. What makes it special: we developed it in less than two hours.
What was the challenge?
A department needed a tool for calculating personnel costs for third-party funded projects (e.g., research proposals to the DFG, BMBF, or EU).
The problem: users want to enter their requirements in everyday language, but the calculations must follow the TV-L pay tables (the collective agreement for public-sector employees of the German federal states) exactly.
These two requirements traditionally do not go well together. Language models are good at understanding text – but they are unreliable when it comes to calculations.
We wanted to find out: Can both strengths be combined?
What can the tool do?
The tool allows inputs such as:
“We need 2 doctoral students E13/2 at 67% and one postdoc E14/4, term April 2026 to March 2029.”
It automatically generates a complete cost calculation from this. Here are the most important functions:
Parameter extraction: The AI recognizes all relevant information in the text (e.g., pay grades, experience levels, job shares, start and end dates).
Tariff calculation: The tool calculates according to TV-L with correct contribution rates and employer contributions (social security, supplementary pension).
Annual slices: Costs are automatically broken down by fiscal year, as required by third-party funding providers.
Intelligent queries: If information is missing (e.g., term unclear, job share not specified), the tool asks specific questions.
Export: At the end, there is a finished Excel file for the third-party funding application.
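To make the annual-slice step concrete, here is a minimal sketch in plain Python. It is not the tool's actual code, and the monthly cost figure used below is a made-up placeholder, not a real TV-L value:

```python
from datetime import date

def months_per_year(start: date, end: date) -> dict[int, int]:
    """Count how many project months fall into each calendar year."""
    slices: dict[int, int] = {}
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        slices[year] = slices.get(year, 0) + 1
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return slices

def annual_costs(monthly_cost: float, start: date, end: date) -> dict[int, float]:
    """Spread a constant monthly personnel cost over calendar years."""
    return {y: round(m * monthly_cost, 2)
            for y, m in months_per_year(start, end).items()}

# Term from the example input: April 2026 to March 2029 (36 months)
print(months_per_year(date(2026, 4, 1), date(2029, 3, 31)))
# {2026: 9, 2027: 12, 2028: 12, 2029: 3}
```

Real funding calculations additionally have to account for pay-scale step advancement and tariff increases within the term, which is exactly the kind of rule-heavy logic that belongs in classic code rather than in the language model.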
Initial tests show that the extraction works correctly in most cases. This saves time and reduces errors when submitting applications.
How was it developed?
We divided the process into three phases:
Phase 1 – Requirements gathering (upstream): A specially developed prompt systematically captured all requirements: problem definition, desired functions, and technical constraints. Good preparation was key.
Phase 2 – Specification (50 minutes): We worked out the architecture, data flows, and interface concept in detail. Every component was defined before we started coding.
Phase 3 – Implementation (approx. 40 minutes): The language model generated the entire code base in a single pass.
Total effort: Less than 2 hours for a functional tool
Result: 5,700 lines of code in 29 Python files
Revisions: Only 1 correction loop was necessary
Why did it work so well?
The answer in one sentence: We neatly separated the strengths of language models and classic programming.
The key architectural decision: the language model handles only the parameter extraction from natural language. All calculations are performed by classic Python code, entirely without AI involvement.
In concrete terms, it looks like this:
Language model: Understands the input and delivers structured data (in JSON format, i.e., machine-readable)
Python component: Performs all calculations with exact precision – nothing can be “hallucinated” here
Flow control: Coordinates the dialogue between input, queries, and calculation
Fallback level: After 3 failed attempts, a manual form appears automatically
This hybrid pattern – language model as an intelligent interface, classic code for critical logic – was planned from the outset. This is what made rapid implementation possible in the first place.
Important insights
1. Specification determines success
Thorough preparatory work accelerated everything. The architecture, data flows, and interface concept were completely clarified before we even generated a single line of code. The better the planning, the faster the implementation.
2. Hybrid architecture as a reusable pattern
Not everything has to be done by the language model. Classic code remains superior for precise calculations. The language model serves as an intelligent bridge between the user and robust code – it “translates” from human to machine, so to speak.
3. Plan fallback levels from the outset
The manual input method was not an afterthought. Given the uncertain reliability of text recognition, alternative input methods (e.g., classic forms) were part of the concept from the start.
4. Structured output as a bridge
The JSON output of the language model with subsequent validation combines natural language input with type-safe processing. This allows us to be flexible with input—but strict with calculation.
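One way to picture that validation step is a typed data class that rejects implausible values before they reach the calculation code. The field names and ranges below are illustrative assumptions, not the tool's real schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Position:
    """Typed interface between language model and calculation code."""
    pay_grade: str         # e.g. "E13"; TV-L grades run E1 to E15
    experience_level: int  # TV-L experience step, 1 to 6
    fte_share: float       # job share, 0 < share <= 1

    def __post_init__(self):
        if self.pay_grade not in {f"E{i}" for i in range(1, 16)}:
            raise ValueError(f"unknown pay grade: {self.pay_grade}")
        if not 1 <= self.experience_level <= 6:
            raise ValueError(f"experience level out of range: {self.experience_level}")
        if not 0 < self.fte_share <= 1:
            raise ValueError(f"job share out of range: {self.fte_share}")

# The model's JSON answer is parsed and then forced through this constructor,
# so a hallucinated "E99" fails loudly instead of reaching the calculation:
pos = Position(pay_grade="E13", experience_level=2, fte_share=0.67)
```

Anything the constructor rejects is treated like missing information: it triggers a follow-up question or, ultimately, the manual form.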
What can others learn from this?
For applications with exact calculations: Use the language model only for input processing – not for the calculation itself.
Invest more time in the specification – this saves time during implementation (in our case: 50 minutes of planning enabled 40 minutes of implementation).
Plan for fallback levels if the reliability of a component is not guaranteed (e.g., manual forms as an alternative).
Structured intermediate formats such as JSON create clear interfaces between language processing and calculation logic.
Good preparation is the most important success factor – this has been confirmed once again in this project.
Conclusion
✔ Language models and exact calculations go well together – if the architecture separates them cleanly
✔ Thorough specification enabled less than 2 hours of development time for 5,700 lines of code
✔ The hybrid pattern (AI for understanding, code for computing) suits any application that needs to combine flexible input with precise processing
This is part of a series on experiences with AI-supported software development. The focus is on what can be learned from such projects – not just on the finished tools.