Features

Umfrage-Analyse-System covers the complete evaluation workflow, from raw data import through coding and clustering of free-text responses to the production of multilingual reports and dashboard datasets. The focus is on traceable processing, with manual intervention possible at every critical step.

Use cases

Complete evaluation of a university survey. A survey with mixed question types, exported as CSV or Excel, is imported; the application detects question types, translates foreign-language responses, clusters free texts, computes significance tests and produces a structured Word report.
Reworking automatically generated clusters. After an initial LLM-supported clustering, some cluster names turn out to be vague or the granularity unsuitable. In the workbench, clusters are renamed, merged or redefined, and a new clustering run then reassigns the items to the corrected categories.
Cross-question evaluation by topic. Themes are defined via keywords and explicit question assignments; the application produces a theme-to-question mapping as the basis for navigation in a separate dashboard application.
Segment comparison by country, institution type and size. For every question, including free-text questions with cluster structures, segment shares, grouped charts, small multiples and chi-square tests are computed and documented in an appendix.
Multilingual publication. A generated report is produced in German, French and English; supplementary Markdown sources (cover, background, introduction) are translated paragraph-by-paragraph and combined with the evaluation part into a publication-ready document.
Data provisioning for an accompanying website. All analysis results, correlations and theme assignments are exported as JSON and serve as the data basis for a separately operated dashboard application.

At a glance

Seven question-type handlers (single choice, multiple choice, yes/no matrix, Likert, ranking, free text, cooperation matrix)
Three connectors: tabular import (CSV/Excel), LimeSurvey structure file (.lss), LLM API
Four export formats: Word, JSON, CSV, PNG
Three languages for processing and output (DE/FR/EN)
Segmentation by country, institution type and size, including for free-text clusters
Workbench for clusters, items and normalisation
Resumable batch operations with database cache

Data import

Tabular survey data is loaded via an import dialogue. The application detects column structures automatically and suggests suitable handler types; existing configurations can be loaded from YAML files or edited in the UI.
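How a handler type might be suggested from a column's values can be illustrated with a small heuristic. The sketch below is an assumption, not the application's detection logic: the label sets, the handler names (`yes_no`, `likert`, `single_choice`, `free_text`) and the thresholds are all hypothetical, and matrix, ranking and multiple-choice questions, which usually span several columns, are left out.

```python
from typing import Iterable, Optional

# Hypothetical label sets covering the working languages (DE/FR/EN).
YES_NO = {"yes", "no", "ja", "nein", "oui", "non"}
LIKERT_LABELS = {
    "strongly agree", "agree", "neutral", "disagree", "strongly disagree",
    "stimme voll zu", "stimme zu", "teils/teils", "stimme nicht zu", "stimme gar nicht zu",
}


def suggest_handler(
    values: Iterable[Optional[str]], max_choices: int = 12, max_label_len: int = 60
) -> str:
    """Suggest a (hypothetical) handler type for a single column of cell values."""
    # Ignore missing and blank cells.
    cells = [v.strip() for v in values if v is not None and v.strip()]
    if not cells:
        return "unknown"
    distinct = {c.lower() for c in cells}
    if distinct <= YES_NO:
        return "yes_no"
    if distinct <= LIKERT_LABELS:
        return "likert"
    # A few short, repeated labels look like predefined answer options.
    if len(distinct) <= max_choices and all(len(c) <= max_label_len for c in distinct):
        return "single_choice"
    # Many distinct or long values are treated as free text.
    return "free_text"
```

Such a suggestion would only pre-fill the import dialogue; the final type can still be set in the UI or in the YAML configuration.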

CSV / Excel: Semicolon- or comma-delimited survey data is read with automatic encoding detection (UTF-8, UTF-8 with BOM, Latin-1, CP1252). Excel files (.xlsx, .xls) are supported directly.
LimeSurvey structure file (.lss): The XML-based structure file exported by LimeSurvey is parsed to import multilingual question texts, help texts and answer options into the internal configuration, so question translations do not have to be maintained manually.
LLM API (external): An OpenAI-compatible API (e.g. one served by vLLM) is called for translation, item extraction, clustering, theme extraction and summary texts. The URL, model name and timeouts are configurable via environment variables.
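The encoding fallback described for CSV import can be sketched with the standard library alone. This is a minimal illustration: the function names, the order in which codecs are tried and the delimiter rule (count semicolons versus commas in the header line) are assumptions made for the example, not the application's documented behaviour. Strict codecs are tried first because Latin-1 decodes any byte sequence and would otherwise mask the others. Excel files need a separate reader and are not covered.

```python
import csv
import io

# Assumed order: "utf-8-sig" also accepts plain UTF-8 and strips a BOM if present;
# CP1252 leaves a few bytes undefined and can fail; Latin-1 never fails.
CANDIDATE_ENCODINGS = ("utf-8-sig", "cp1252", "latin-1")


def decode_bytes(raw: bytes) -> tuple[str, str]:
    """Return the decoded text and the name of the first codec that succeeded."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Unreachable while latin-1 is the last candidate.
    raise ValueError("no candidate encoding could decode the input")


def read_table(raw: bytes) -> tuple[list[list[str]], str, str]:
    """Decode raw CSV bytes and split them into rows with a guessed delimiter."""
    text, encoding = decode_bytes(raw)
    header = text.splitlines()[0] if text else ""
    delimiter = ";" if header.count(";") >= header.count(",") else ","
    # newline="" lets the csv module handle line endings itself.
    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
    return rows, encoding, delimiter
```

For example, a CP1252 export containing "Réponse" fails the UTF-8 attempt and is then read as CP1252.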

Processing of free-text responses

Free texts are processed in two stages. First, composite responses are split into individual items; then the items are grouped thematically. The results of both steps remain traceable and can be corrected manually.

Item extraction (hybrid). Rule-based detection handles structured responses (lists, enumerations, separators). For low-confidence cases or running prose, LLM-supported extraction takes over, with two prompt modes: conservative (split as little as possible) or thematic (split into independent statements while preserving context).
Clustering. Items are grouped into thematic clusters by the LLM. The minimum and maximum number of clusters, as well as the maximum share any single cluster may take, scale with the response volume to avoid both over- and under-segmentation.
Selection of representative examples. For each cluster, the LLM selects a small set of supporting items from those assigned to it; these appear as examples in the report.
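The rule-based first stage of the hybrid extraction can be sketched as a function that splits obviously structured answers and reports a confidence score, so that only low-confidence answers are passed on to the LLM. The patterns, the confidence values, the 0.6 threshold and the names `Extraction` and `extract_items` are illustrative assumptions; the LLM call itself is not shown.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical pattern for bullet or numbered lines ("- x", "* x", "• x", "1. x", "2) x").
BULLET = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")


@dataclass
class Extraction:
    items: List[str]
    confidence: float
    needs_llm: bool  # True: hand the raw answer to the LLM-supported extraction


def extract_items(answer: str, threshold: float = 0.6, max_words: int = 8) -> Extraction:
    """Split structured answers by rule and flag unclear ones for the LLM stage."""
    text = answer.strip()
    if not text:
        return Extraction([], 1.0, False)
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) >= 2 and all(BULLET.match(line) for line in lines):
        # Every line is a list entry: strip the markers and keep one item per line.
        items, confidence = [BULLET.sub("", line).strip() for line in lines], 0.95
    elif ";" in text:
        items, confidence = [part.strip() for part in text.split(";") if part.strip()], 0.8
    elif len(lines) >= 2:
        items, confidence = lines, 0.7
    elif len(text.split()) <= max_words:
        items, confidence = [text], 0.9  # short single answer, e.g. one tool name
    else:
        items, confidence = [text], 0.3  # running prose: leave the split to the LLM
    return Extraction(items, confidence, confidence < threshold)
```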

Workbench for quality assurance

The workbench bundles all tools for correcting automatically generated results. It is designed for evaluators who review LLM suggestions and adjust them from a domain perspective.

Cluster editor: edit cluster names (DE/FR/EN) and descriptions; add or delete clusters.
Item editor: rename, delete, merge and split extracted items; reassign items between clusters; search and filter by cluster and keyword.
Normalisation: detect and unify variant spellings and names (such as "M365" and "Office 365") using rule-based or LLM-supported suggestions.
Re-clustering with fixed categories: reassign items to a manually defined set of categories, optionally allowing new categories to be created.
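Rule-based normalisation suggestions could combine an alias table with a string-similarity score. In the sketch below, the alias table, the 0.85 cut-off and the function names are hypothetical, and `difflib.SequenceMatcher` from the Python standard library stands in for whatever similarity measure the application actually uses. The alias table is needed because variants such as "M365" and "Office 365" share too few characters to be found by similarity alone.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Iterable, List, Tuple

# Hypothetical alias table (rule-based part): names with little spelling overlap.
ALIASES = {"m365": "microsoft 365", "o365": "microsoft 365", "office 365": "microsoft 365"}


def canonical(term: str) -> str:
    """Lower-case, collapse whitespace and resolve known aliases."""
    key = " ".join(term.lower().split())
    return ALIASES.get(key, key)


def suggest_merges(
    terms: Iterable[str], min_ratio: float = 0.85
) -> List[Tuple[str, str, str, float]]:
    """Return (term_a, term_b, reason, score) pairs that are candidates for unification."""
    suggestions = []
    for a, b in combinations(sorted(set(terms)), 2):
        ca, cb = canonical(a), canonical(b)
        if ca == cb:
            # Identical after normalisation or alias lookup.
            suggestions.append((a, b, "rule", 1.0))
            continue
        ratio = SequenceMatcher(None, ca, cb).ratio()
        if ratio >= min_ratio:
            suggestions.append((a, b, "similar", round(ratio, 2)))
    return suggestions
```

The output is only a list of candidates; in the workbench an evaluator would still decide which merges to apply.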

Analysis and segmentation

Each question type has a dedicated handler that aggregates responses in a way appropriate to that type. Evaluation can be performed in aggregate or by segment.

Segmentation: by country, institution type and size; configurable in the central configuration file.
Free-text segmentation: cluster comparison per segment, including comparison tables and grouped charts.
Statistical tests: chi-square test with Cramér's V as the effect size; warnings when expected cell frequencies are too low for the test to be reliable.
Correlation analysis: pairwise associations between questions, each classified by effect size.
Chart types: horizontal bar, stacked 100% bar, pie chart, treemap, diverging bar, ranking chart, grouped bar and small multiples.
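The chi-square test with Cramér's V and the low-expected-frequency warning can be written out in plain Python for an r × c contingency table. This is a self-contained sketch, not the application's implementation: it applies no continuity correction (unlike the default of `scipy.stats.chi2_contingency` for 2 × 2 tables), counts every cell with an expected frequency below 5, and labels V with the conventional 0.1 / 0.3 / 0.5 thresholds, which are an assumption here. The p-value uses the closed form of the chi-square upper tail for integer degrees of freedom, which is adequate for the small tables typical of survey cross-tabulations.

```python
import math


def chi2_sf(x: float, dof: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square distribution with integer dof."""
    if x <= 0 or dof < 1:
        return 1.0
    y = x / 2
    if dof % 2 == 0:
        # Even dof: Q(k/2, y) = exp(-y) * sum_{i=0}^{k/2-1} y^i / i!
        term, total = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd dof: start at Q(1/2, y) = erfc(sqrt(y)), then add y^a e^(-y) / Gamma(a + 1)
    # for a = 1/2, 3/2, ... up to (k - 2) / 2.
    total = math.erfc(math.sqrt(y))
    term = math.sqrt(y) * math.exp(-y) / math.gamma(1.5)
    for i in range(1, (dof - 1) // 2 + 1):
        total += term
        term *= y / (i + 0.5)
    return total


def chi_square_test(table: list[list[int]], min_expected: float = 5.0) -> dict:
    """Pearson chi-square test of independence; drop empty rows/columns beforehand."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat, small_cells = 0.0, 0
    for i, row_total in enumerate(row_totals):
        for j, col_total in enumerate(col_totals):
            expected = row_total * col_total / n
            if expected < min_expected:
                small_cells += 1
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    k = min(len(row_totals), len(col_totals))
    cramers_v = math.sqrt(stat / (n * (k - 1))) if k > 1 else 0.0
    return {
        "chi2": stat,
        "dof": dof,
        "p": chi2_sf(stat, dof),
        "cramers_v": cramers_v,
        "small_expected_cells": small_cells,  # > 0 would trigger a sample warning
    }


def classify_effect(v: float) -> str:
    """Conventional labels for Cramér's V (assumed thresholds)."""
    if v < 0.1:
        return "negligible"
    if v < 0.3:
        return "small"
    if v < 0.5:
        return "medium"
    return "large"
```

For the table [[10, 20], [20, 10]], every expected count is 15, so χ² = 100/15 ≈ 6.67 with one degree of freedom and V = 1/3.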

Evaluation texts and themes

Beyond the numerical evaluation, the application produces texts that put the results into context, and it assigns questions to thematic areas.

Question summaries: for each question, three text blocks are produced (a description of the distribution, an interpretation of notable findings and a note on significant segment differences). The texts are produced in all three languages and cached in the database.
Theme analysis: themes are defined with keywords; the assignment is performed via string matching or a deeper LLM-supported review of the individual clusters. The result is a theme-to-question mapping as the basis for dashboard navigation.
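The string-matching variant of the theme analysis amounts to a keyword lookup over question texts and cluster names, merged with any explicit question assignments. The data layout below (themes as keyword lists, one searchable string per question ID) and the name `map_themes` are assumptions for illustration. Keywords are anchored only at the start of a word, so "partner" also matches "partners" and "partnership".

```python
import re
from typing import Dict, Iterable, List, Optional


def map_themes(
    themes: Dict[str, List[str]],
    questions: Dict[str, str],
    assignments: Optional[Dict[str, Iterable[str]]] = None,
) -> Dict[str, List[str]]:
    """Build a theme-to-question mapping from keyword hits plus explicit assignments."""
    mapping = {theme: set() for theme in themes}
    for theme, keywords in themes.items():
        # Anchor each keyword at a word start only, so simple inflections still match.
        patterns = [re.compile(r"\b" + re.escape(k.lower())) for k in keywords]
        for question_id, text in questions.items():
            lowered = text.lower()
            if any(p.search(lowered) for p in patterns):
                mapping[theme].add(question_id)
    # Explicit assignments are added on top of the keyword matches.
    for theme, question_ids in (assignments or {}).items():
        mapping.setdefault(theme, set()).update(question_ids)
    return {theme: sorted(ids) for theme, ids in mapping.items()}
```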

Export and report generation

The application produces both individual reports and a pipeline output containing all components of a publication-ready evaluation.

Structured Word report: title page, table of contents, main body organised by question group with charts per question (overall, grouped segment comparison, small multiples), Appendix A with detail tables and significance values, and Appendix B with free-text clusters and examples.
Document Builder: Markdown source files (cover, introduction, background, theme summaries) are translated paragraph-by-paragraph with glossary support and combined with the report into a multilingual publication document.
Dashboard JSON: complete dataset with global and segmented results, correlations and theme assignments for a separate dashboard web application.
CSV and PNG: evaluation tables as CSV and individual charts as PNG.
Resume mode: pipeline runs reuse previously computed results, so only missing steps are executed.
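Resume mode and the database cache for summaries can be illustrated with a small SQLite-backed store in which each pipeline step's result is keyed by a hash of its inputs; a rerun then computes only the entries that are missing. The class name `StepCache`, the table layout and the hashing scheme are hypothetical, since the application's database schema is not described here.

```python
import hashlib
import json
import sqlite3
from typing import Any, Callable


class StepCache:
    """Hypothetical result store: one row per (pipeline step, input hash)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS results ("
            "step TEXT NOT NULL, key TEXT NOT NULL, value TEXT NOT NULL, "
            "PRIMARY KEY (step, key))"
        )

    @staticmethod
    def key_for(payload: Any) -> str:
        # sort_keys makes the hash independent of dict insertion order.
        blob = json.dumps(payload, sort_keys=True, ensure_ascii=False)
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()

    def get_or_compute(self, step: str, payload: Any, compute: Callable[[Any], Any]) -> Any:
        key = self.key_for(payload)
        row = self.db.execute(
            "SELECT value FROM results WHERE step = ? AND key = ?", (step, key)
        ).fetchone()
        if row is not None:
            return json.loads(row[0])  # cached: skip the (possibly expensive) LLM call
        value = compute(payload)
        with self.db:  # commits the insert, so an interrupted run keeps finished steps
            self.db.execute(
                "INSERT OR REPLACE INTO results (step, key, value) VALUES (?, ?, ?)",
                (step, key, json.dumps(value, ensure_ascii=False)),
            )
        return value
```

With a file path instead of `:memory:`, cached results survive between runs, which is what lets an interrupted batch resume where it stopped.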