Architecture overview#

Introduction#

AI mapping at universities is a system for recording, analysing and visualising AI activities at German universities. The application lets users submit the URLs of AI projects; it then automatically extracts the relevant information, creates structured profiles and presents them in a clear web interface.

System architecture#

The application is designed as a modern web application with a clear separation between backend and frontend. It follows an API-first approach with static site generation for the presentation of data.

Architecture diagram#

graph TD
    User -->|Submits URL| API[FastAPI backend]
    Admin -->|Manages data| API
    
    API -->|Stores data| DB[(SQLite/PostgreSQL)]
    API -->|Extracts text| Crawler[Web crawler]
    Crawler -->|Processes content| LLM[LLM client]
    
    API -->|Generates profile| LLM
    API -->|Generates static pages| Generator[Site generator]
    
    Generator -->|Generates HTML| Static[Static website]
    
    User -->|Views results| Static
    
    subgraph Backend
        API
        Crawler
        LLM
        Generator
    end
    
    subgraph Database
        DB
    end
    
    subgraph Frontend
        Static
    end

Main components#

1. FastAPI backend#

The backend is implemented with FastAPI and forms the core of the application. It provides RESTful API endpoints for:

  • Submitting URLs
  • Extracting and analysing website content
  • Managing submissions and projects
  • Generating profiles
  • Generating the static website
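
As a rough sketch of the first of these endpoints, the backend has to validate submitted URLs before storing them. The function and endpoint names below are assumptions for illustration, not the actual API:

```python
from urllib.parse import urlparse

def validate_submission_url(url: str) -> str:
    """Validate a submitted project URL; raise ValueError if unusable.

    A simplified stand-in for the checks the FastAPI endpoint would
    perform before persisting a submission.
    """
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        raise ValueError("Only http(s) URLs are accepted")
    if not parsed.netloc:
        raise ValueError("URL has no host")
    return parsed.geturl()

# In the real backend this would back a POST endpoint such as
# /api/submissions (path assumed); here we call it directly:
print(validate_submission_url("https://example-university.de/ki-projekt"))
```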

2. Web crawler & extractor#

This component is responsible for extracting relevant information from submitted web pages:

  • SimpleExtractor: Extracts basic text and metadata from web pages
  • FeatureAwareExtractor: Advanced extraction with a focus on AI-specific content
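
To illustrate what the simpler of the two extractors does, here is a dependency-free sketch using the standard library's HTML parser. The real SimpleExtractor builds on BeautifulSoup/Trafilatura; the class below only demonstrates the idea of pulling out a title and the visible text:

```python
from html.parser import HTMLParser

class MinimalTextExtractor(HTMLParser):
    """Illustrative stand-in for SimpleExtractor: collects the page
    title and visible body text, skipping script/style content."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.text_parts = []
        self._in_title = False
        self._skip = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip and data.strip():
            self.text_parts.append(data.strip())

extractor = MinimalTextExtractor()
extractor.feed("<html><head><title>KI-Projekt</title></head>"
               "<body><p>Ein Projekt zur KI-Forschung.</p></body></html>")
print(extractor.title)                  # KI-Projekt
print(" ".join(extractor.text_parts))   # Ein Projekt zur KI-Forschung.
```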

3. LLM integration#

The application uses locally hosted large language models (LLMs) via an OpenAI-compatible client to:

  • Analyse the extracted texts
  • Identify AI-relevant information
  • Generate structured profiles
  • Recognise relationships between different projects
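
The profile-generation step boils down to turning extracted text into a structured-output request for the model. The prompt wording, JSON keys, base URL and model name below are all assumptions used for illustration:

```python
import json

def build_profile_prompt(page_text: str) -> list[dict]:
    """Build chat messages asking the model for a structured profile.

    Prompt text and profile schema are illustrative, not the
    application's actual prompt.
    """
    return [
        {"role": "system",
         "content": "You extract structured profiles of AI projects. "
                    "Answer with JSON containing the keys "
                    "'title', 'summary' and 'ai_topics'."},
        {"role": "user", "content": page_text[:8000]},  # truncate long pages
    ]

# With the OpenAI-compatible client, the call might look like this
# (base_url and model name are placeholders for the local deployment):
#
#   from openai import OpenAI
#   client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
#   resp = client.chat.completions.create(
#       model="local-model",
#       messages=build_profile_prompt(text),
#   )
#   profile = json.loads(resp.choices[0].message.content)

messages = build_profile_prompt("Ein Forschungsprojekt zu maschinellem Lernen.")
print(json.dumps(messages, indent=2)[:60])
```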

4. Database#

The application uses SQLite for development and PostgreSQL for production:

  • Submissions: Stores submitted URLs and their status
  • Projects: Stores approved and structured project data
  • Users: Manages administrator accounts for the backend
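
The three tables can be sketched directly in SQLite, which the application uses in development. Column names are illustrative; the real schema is defined through SQLAlchemy models and Alembic migrations:

```python
import sqlite3

# In-memory database standing in for the development SQLite file.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE submissions (
    id     INTEGER PRIMARY KEY,
    url    TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending'  -- e.g. pending/approved/rejected
);
CREATE TABLE projects (
    id            INTEGER PRIMARY KEY,
    submission_id INTEGER REFERENCES submissions(id),
    title         TEXT NOT NULL,
    profile_json  TEXT                      -- structured profile from the LLM
);
CREATE TABLE users (
    id       INTEGER PRIMARY KEY,
    username TEXT UNIQUE NOT NULL,
    api_key  TEXT NOT NULL                  -- admin authentication
);
""")
conn.execute("INSERT INTO submissions (url) VALUES (?)",
             ("https://example-university.de/ki-projekt",))
status = conn.execute("SELECT status FROM submissions").fetchone()[0]
print(status)  # pending
```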

5. Site generator#

The site generator:

  • Generates static HTML pages from the project data
  • Uses Jinja2 templates for consistent design
  • Creates index, project list and detail pages
  • Generates metadata for search engines
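
In essence, the generator renders each project through a template into an HTML file. The real implementation uses Jinja2; the sketch below substitutes the standard library's `string.Template` to stay dependency-free, and the template text and file naming are assumptions:

```python
from pathlib import Path
from string import Template
import tempfile

DETAIL_TEMPLATE = Template(
    "<html><head><title>$title</title>"
    '<meta name="description" content="$summary"></head>'
    "<body><h1>$title</h1><p>$summary</p></body></html>"
)

def generate_site(projects: list[dict], out_dir: Path) -> list[Path]:
    """Write one static detail page per project and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    pages = []
    for project in projects:
        page = out_dir / f"project-{project['id']}.html"
        page.write_text(DETAIL_TEMPLATE.substitute(project), encoding="utf-8")
        pages.append(page)
    return pages

with tempfile.TemporaryDirectory() as tmp:
    pages = generate_site(
        [{"id": 1, "title": "KI-Projekt", "summary": "Ein Beispielprojekt."}],
        Path(tmp),
    )
    print(pages[0].name)  # project-1.html
```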

6. Frontend#

The frontend consists of:

  • Static HTML pages with CSS and JavaScript
  • User-friendly forms for submitting URLs
  • Visualisations and filters for project data
  • Responsive design for various end devices

Technology stack#

Backend#

  • Python 3.11+: Core programming language
  • FastAPI: Web framework for modern API development
  • SQLAlchemy: ORM for database access
  • Pydantic: Data validation and conversion
  • Alembic: Database migration tool
  • OpenAI client: Communication with local LLMs
  • BeautifulSoup/Trafilatura: HTML parsing and text extraction

Frontend#

  • HTML5/CSS3: Markup and styling
  • JavaScript: Client-side interactivity
  • Jinja2: Template engine for site generation

Database#

  • SQLite: For development and smaller deployments
  • PostgreSQL: For production environments
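
The SQLite/PostgreSQL split is typically driven by a single connection-string setting. A minimal sketch, assuming an environment variable named `DATABASE_URL` and a default SQLite path (both names are assumptions):

```python
import os

def database_url() -> str:
    """Return the SQLAlchemy connection string.

    Falls back to a local SQLite file for development; production
    deployments would set DATABASE_URL to a PostgreSQL DSN, e.g.
    postgresql+psycopg://user:pass@db:5432/ai_map.
    """
    return os.environ.get("DATABASE_URL", "sqlite:///./ai_map.db")

print(database_url())
```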

Deployment#

  • Docker: Containerisation
  • Docker Compose: Container orchestration

Data flow#

  1. User submits the URL of an AI project via the web form
  2. The URL is validated and stored as a submission in the database
  3. The crawler extracts text and metadata from the URL
  4. The FeatureAwareExtractor analyses the content and identifies relevant structures
  5. The LLM client generates a structured profile based on the extracted text
  6. An administrator reviews and approves the submission
  7. The approved project is stored in the database
  8. The site generator creates updated static HTML pages
  9. Users can view and search the projects on the website
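
The life cycle a submission passes through in these steps can be sketched as a small state machine. The status names below are assumptions, not the application's actual values:

```python
# Allowed status transitions for a submission as it moves through the
# pipeline described above (status names are illustrative).
TRANSITIONS = {
    "submitted": {"extracted", "failed"},   # steps 2-3: crawler ran
    "extracted": {"profiled", "failed"},    # steps 4-5: LLM profile created
    "profiled":  {"approved", "rejected"},  # step 6: admin review
    "approved":  {"published"},             # steps 7-8: stored + site rebuilt
}

def advance(status: str, new_status: str) -> str:
    """Move a submission to new_status, enforcing the pipeline order."""
    if new_status not in TRANSITIONS.get(status, set()):
        raise ValueError(f"cannot go from {status!r} to {new_status!r}")
    return new_status

s = "submitted"
for nxt in ("extracted", "profiled", "approved", "published"):
    s = advance(s, nxt)
print(s)  # published
```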

Security and authentication concept#

  • API key-based authentication for admin endpoints
  • Validation and sanitisation of all user input
  • CORS configuration to protect the frontend
  • Server-side CSRF tokens

Scaling concept#

The application is designed for different scaling levels:

  1. Simple deployment: Single server with SQLite for smaller instances
  2. Medium scaling: PostgreSQL database and Docker deployment
  3. Advanced scaling: Distributed crawlers and LLM processing for high request volumes

Because the frontend is delivered as static pages, the website itself can be scaled easily via CDNs or static hosting services.