Changelog for AH2K25 Project

This document outlines the incremental improvements and changes across three stages of the AH2K25 project: Task 1a, Task 1b, and the Final Backend.

Final Backend Overview

The Final Backend is a microservices-based platform with unified middleware, per-request timeouts, and observability hooks. Tasks 1a and 1b were independent efforts; the production-ready Final Backend builds on both.

Architecture

Microservices with clear boundaries and APIs.
Unified middleware in backend/core: middleware.py, exception_handler.py, error_handler.py, timeout_middleware.py.
Request timeouts across services to prevent hangs.
Observability-ready: structured logs/metrics compatible with Grafana/Prometheus/Loki.
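
To illustrate what "structured logs" can mean in practice, here is a minimal sketch that emits one JSON object per log line using only the standard library, a shape that log aggregators such as Loki can typically parse. The names JsonFormatter and build_logger and the field names (ts, level, service, message) are assumptions for illustration; the project's actual logging setup is described in LOGGING_README.md and may differ.

```python
import io
import json
import logging

# Attributes present on every LogRecord; anything else arrived via `extra=`.
_STANDARD_ATTRS = set(vars(logging.makeLogRecord({}))) | {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter: render each record as a single JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "service": record.name,
            "message": record.getMessage(),
        }
        # Copy caller-supplied fields (e.g. duration_ms) into the payload.
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        return json.dumps(payload, default=str)


def build_logger(service: str, stream) -> logging.Logger:
    """Hypothetical helper: one logger per service, writing JSON to `stream`."""
    logger = logging.getLogger(service)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    buffer = io.StringIO()
    log = build_logger("chat_service", buffer)
    log.info("request done", extra={"duration_ms": 12})
    print(buffer.getvalue(), end="")
```

Keeping the service name and timing fields as top-level keys is one way to make the lines easy to filter and aggregate in a dashboard.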

Key Services

chat_service, collection_summary_service, insights_service
persona, podcast_service, relevance_service
chunk_builder, embeddings, indexer, snippets
core (logging + middleware)

Operational Readiness

Strict timeouts via timeout_middleware.py.
Global exception and error handling with consistent JSON responses.
Metrics/logs structured for drop-in dashboards and alerts (Grafana/Prometheus/Loki).
Tests in backend cover error and timeout handling to validate resilience.
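
The timeout and error-handling behaviour listed above can be sketched without any web framework. The example below wraps an async handler with asyncio.wait_for and maps both timeouts and unexpected exceptions to one JSON-style envelope. It is an illustration under assumptions, not the contents of timeout_middleware.py or exception_handler.py: the names call_with_guard and error_body, the envelope fields, and the 504/500 status codes are all hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

Handler = Callable[[], Awaitable[Dict[str, Any]]]


def error_body(code: str, message: str, status: int) -> Dict[str, Any]:
    """Hypothetical envelope: every failure has the same shape."""
    return {"status": status, "error": {"code": code, "message": message}}


async def call_with_guard(handler: Handler, timeout_s: float) -> Dict[str, Any]:
    """Run `handler`, enforcing a deadline and normalising failures."""
    try:
        result = await asyncio.wait_for(handler(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # The handler was cancelled because it exceeded the deadline.
        return error_body("timeout", f"request exceeded {timeout_s}s", 504)
    except Exception as exc:
        # Expose only the exception type, not internal details.
        return error_body("internal_error", type(exc).__name__, 500)
    return {"status": 200, "data": result}


async def slow_handler() -> Dict[str, Any]:
    await asyncio.sleep(1)
    return {}


async def failing_handler() -> Dict[str, Any]:
    raise ValueError("bad input")


if __name__ == "__main__":
    print(asyncio.run(call_with_guard(slow_handler, 0.01)))
    print(asyncio.run(call_with_guard(failing_handler, 1.0)))
```

A test for resilience can then assert on the envelope alone, regardless of which service produced the failure.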

File Structure (Final Backend)

├── LOGGING_README.md
├── README.md
├── chat_service/
├── chunk_builder/
├── collection_summary_service/
├── core/
├── embeddings/
├── indexer/
├── insights_service/
├── lexicons/
├── main.py
├── outline_extractor/
├── persona/
├── podcast_service/
├── relevance_service/
├── scripts/
├── snippets/
├── uploads/
├── vector_store/
└── web_static/

Foundations from Prior Tasks

Task 1a (Independent)

PDF heading extraction with heuristics + XGBoost.
PyMuPDF (fitz) text extraction and heading level assignment.
Title extraction and multilingual support.
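
As a rough illustration of the heuristic side of heading-level assignment, the sketch below ranks font sizes larger than the body text and maps them to H1, H2, H3. It covers only a font-size heuristic, not the XGBoost classifier, and it is not the Task 1a implementation. The (text, size) tuples are a hypothetical simplification of the spans that PyMuPDF's page.get_text("dict") returns; the function name and the three-level cap are assumptions.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Span = Tuple[str, float]  # (text, font size in points); simplified, hypothetical


def assign_heading_levels(
    spans: List[Span], max_levels: int = 3
) -> List[Tuple[str, Optional[str]]]:
    """Label spans whose font size exceeds the body size as H1..Hn."""
    if not spans:
        return []

    # Weight each size by character count so the dominant size is body text.
    weights: Counter = Counter()
    for text, size in spans:
        weights[round(size, 1)] += len(text)
    body_size = weights.most_common(1)[0][0]

    # Larger-than-body sizes, biggest first, become successive heading levels.
    heading_sizes = sorted(
        {round(size, 1) for _, size in spans if round(size, 1) > body_size},
        reverse=True,
    )[:max_levels]
    level_of: Dict[float, str] = {
        size: f"H{rank + 1}" for rank, size in enumerate(heading_sizes)
    }

    return [(text, level_of.get(round(size, 1))) for text, size in spans]


if __name__ == "__main__":
    sample = [
        ("Annual Report", 24.0),
        ("Introduction", 16.0),
        ("This paragraph is ordinary body text of the document.", 11.0),
        ("Scope", 13.0),
        ("Another paragraph of ordinary body text follows here.", 11.0),
    ]
    for text, level in assign_heading_levels(sample):
        print(level, text)
```

A pure font-size rule misclassifies documents that use bold or spacing instead of size for headings, which is one reason to combine heuristics with a learned model as Task 1a did.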

Task 1b (Independent)

RAG pipeline with embeddings, retrieval, and cross-encoder reranking.
Persona-aware filtering and dynamic exclude lists.
Standardized I/O and analysis of relevant sections.
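
The retrieve-then-rerank flow with a persona exclude list can be sketched in a few lines. This is a toy version under stated assumptions: vectors are hand-written rather than produced by an embedding model, and overlap_score is a simple word-overlap stand-in for a cross-encoder, not the model used in Task 1b. All function and parameter names here are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def overlap_score(query: str, passage: str) -> float:
    """Stand-in for a cross-encoder: fraction of query words in the passage."""
    q_words = set(query.lower().split())
    p_words = set(passage.lower().split())
    return len(q_words & p_words) / len(q_words) if q_words else 0.0


def retrieve_and_rerank(
    query_vec: Vector,
    query_text: str,
    chunks: Dict[str, Tuple[str, Vector]],
    rerank_score: Callable[[str, str], float],
    exclude_terms: Set[str],
    top_k: int = 3,
    final_k: int = 2,
) -> List[str]:
    # 1. Persona filter: drop chunks containing any excluded (lowercase) term.
    allowed = {
        cid: (text, vec)
        for cid, (text, vec) in chunks.items()
        if not any(term in text.lower() for term in exclude_terms)
    }
    # 2. Retrieval: keep the top_k chunks by embedding similarity.
    candidates = sorted(
        allowed, key=lambda cid: cosine(query_vec, allowed[cid][1]), reverse=True
    )[:top_k]
    # 3. Rerank: reorder the candidates with the pairwise scorer.
    reranked = sorted(
        candidates, key=lambda cid: rerank_score(query_text, allowed[cid][0]), reverse=True
    )
    return reranked[:final_k]


if __name__ == "__main__":
    chunks = {
        "a": ("vegetarian pasta recipe with tomato", [0.8, 0.6]),
        "b": ("beef stew recipe", [1.0, 0.0]),
        "c": ("tomato soup", [1.0, 0.0]),
        "d": ("train timetable", [0.0, 1.0]),
    }
    print(retrieve_and_rerank([1.0, 0.0], "tomato recipe", chunks,
                              overlap_score, {"beef"}, top_k=2))
```

Filtering before retrieval keeps excluded content from consuming candidate slots; reranking only the short candidate list keeps the more expensive pairwise scorer affordable.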