Changelog for AH2K25 Project

This document summarizes the changes across three stages of the AH2K25 project: Task 1a, Task 1b, and the Final Backend.

Final Backend Overview

The Final Backend is a microservices-based platform with unified middleware, per-request timeouts, and observability hooks. Tasks 1a and 1b were developed independently; the Final Backend combines both sets of work into a single production-ready backend.

Architecture

  • Microservices with clear boundaries and APIs.
  • Unified middleware in backend/core: middleware.py, exception_handler.py, error_handler.py, timeout_middleware.py.
  • Request timeouts across services to prevent hangs.
  • Observability-ready: structured logs/metrics compatible with Grafana/Prometheus/Loki.
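
As a rough illustration of the request-timeout idea, the sketch below wraps an async handler with asyncio.wait_for. It is not the contents of timeout_middleware.py: the with_timeout helper, the dict-shaped request/response, and the 504 body are assumptions made for this example, and the web framework used by the project is not stated here.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

# Hypothetical handler type: takes a request dict, returns a response dict.
Handler = Callable[[Dict[str, Any]], Awaitable[Dict[str, Any]]]


def with_timeout(handler: Handler, seconds: float) -> Handler:
    """Return a handler that is cancelled if it runs longer than `seconds`."""

    async def wrapped(request: Dict[str, Any]) -> Dict[str, Any]:
        try:
            # wait_for cancels the inner coroutine once the deadline passes.
            return await asyncio.wait_for(handler(request), timeout=seconds)
        except asyncio.TimeoutError:
            # Assumed response shape; the real service may differ.
            return {
                "status": 504,
                "body": {"error": "timeout", "detail": f"Request exceeded {seconds}s"},
            }

    return wrapped


async def slow_handler(request: Dict[str, Any]) -> Dict[str, Any]:
    await asyncio.sleep(1.0)  # simulates a hung downstream call
    return {"status": 200, "body": {"ok": True}}


async def fast_handler(request: Dict[str, Any]) -> Dict[str, Any]:
    return {"status": 200, "body": {"ok": True}}


if __name__ == "__main__":
    print(asyncio.run(with_timeout(slow_handler, 0.05)({}))["status"])  # 504
    print(asyncio.run(with_timeout(fast_handler, 0.05)({}))["status"])  # 200
```

Returning a response instead of letting the timeout propagate keeps a slow dependency from surfacing as a dropped connection.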

Key Services

  • chat_service, collection_summary_service, insights_service
  • persona, podcast_service, relevance_service
  • chunk_builder, embeddings, indexer, snippets
  • core (logging + middleware)
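
The logging code in core is not reproduced in this changelog. One common way to produce structured log lines that a collector such as Loki can parse is a JSON formatter on top of the standard logging module. The JsonFormatter and build_logger names and the "service" field below are hypothetical and chosen only for this sketch.

```python
import io
import json
import logging
from typing import TextIO


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # "service" is an assumed field, passed through `extra=`.
            "service": getattr(record, "service", "unknown"),
        }
        return json.dumps(payload)


def build_logger(name: str, stream: TextIO) -> logging.Logger:
    """Create a logger that writes JSON lines to `stream`."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]  # replace handlers so reruns do not duplicate output
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    buffer = io.StringIO()
    log = build_logger("chat_service", buffer)
    log.info("request finished", extra={"service": "chat_service"})
    print(buffer.getvalue().strip())
```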

Operational Readiness

  • Strict timeouts via timeout_middleware.py.
  • Global exception and error handling with consistent JSON responses.
  • Metrics/logs structured for drop-in dashboards and alerts (Grafana/Prometheus/Loki).
  • Tests in backend cover error and timeout handling to validate resilience.
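
To make "consistent JSON responses" concrete, here is a minimal sketch of a global error wrapper. The ServiceError class, the error envelope ({"error": {"code", "message"}}), and the handle_errors decorator are assumptions for illustration; the actual exception_handler.py and error_handler.py may be structured differently.

```python
import json
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("core.sketch")

Handler = Callable[[Dict[str, Any]], Dict[str, Any]]


class ServiceError(Exception):
    """Hypothetical application error carrying a code and an HTTP status."""

    def __init__(self, code: str, message: str, status: int = 400) -> None:
        super().__init__(message)
        self.code = code
        self.message = message
        self.status = status


def error_response(status: int, code: str, message: str) -> Dict[str, Any]:
    """Build the same JSON envelope for every failure."""
    body = json.dumps({"error": {"code": code, "message": message}})
    return {"status": status, "body": body}


def handle_errors(handler: Handler) -> Handler:
    """Convert any exception raised by `handler` into the JSON envelope."""

    def wrapped(request: Dict[str, Any]) -> Dict[str, Any]:
        try:
            return handler(request)
        except ServiceError as exc:
            return error_response(exc.status, exc.code, exc.message)
        except Exception:
            # Log the traceback, but do not leak internals to the client.
            logger.exception("unhandled error")
            return error_response(500, "internal_error", "Internal server error")

    return wrapped


@handle_errors
def missing_document(request: Dict[str, Any]) -> Dict[str, Any]:
    raise ServiceError("not_found", "Document not found", status=404)


@handle_errors
def buggy(request: Dict[str, Any]) -> Dict[str, Any]:
    return {"status": 200, "body": str(1 / 0)}  # raises ZeroDivisionError
```

Both handlers return the same envelope shape, so clients can parse failures without branching on the service that produced them.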

File Structure (Final Backend)

├── LOGGING_README.md
├── README.md
├── chat_service/
├── chunk_builder/
├── collection_summary_service/
├── core/
├── embeddings/
├── indexer/
├── insights_service/
├── lexicons/
├── main.py
├── outline_extractor/
├── persona/
├── podcast_service/
├── relevance_service/
├── scripts/
├── snippets/
├── uploads/
├── vector_store/
└── web_static/

Foundations from Prior Tasks

Task 1a (Independent)

  • PDF heading extraction with heuristics + XGBoost.
  • PyMuPDF (fitz) text extraction and heading level assignment.
  • Title extraction and multilingual support.
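
The heuristic side of heading detection can be sketched with font sizes alone. The example assumes each line has already been reduced to a (text, font_size) pair, for instance from the text and size fields of PyMuPDF spans. The assign_heading_levels function is hypothetical, and it omits the XGBoost classifier, title extraction, and multilingual handling listed above.

```python
from collections import Counter
from typing import List, Tuple


def assign_heading_levels(
    lines: List[Tuple[str, float]], max_levels: int = 3
) -> List[Tuple[str, str]]:
    """Label each (text, font_size) line as 'H1'..'H<max_levels>' or 'body'."""
    if not lines:
        return []

    sizes = [round(size, 1) for _, size in lines]
    # Assumption: the most frequent font size is body text.
    body_size = Counter(sizes).most_common(1)[0][0]

    # Distinct sizes larger than body text, largest first, become H1, H2, ...
    heading_sizes = sorted({s for s in sizes if s > body_size}, reverse=True)
    level_of = {s: f"H{i + 1}" for i, s in enumerate(heading_sizes[:max_levels])}

    return [(text, level_of.get(round(size, 1), "body")) for text, size in lines]


if __name__ == "__main__":
    sample = [
        ("Annual Report", 24.0),
        ("Introduction", 16.0),
        ("This year we expanded.", 11.0),
        ("Revenue grew steadily.", 11.0),
        ("Details", 13.0),
        ("Costs stayed flat.", 11.0),
    ]
    for text, level in assign_heading_levels(sample):
        print(level, text)
```

A size-only rule fails on documents that mark headings with bold text or numbering instead of larger type, which is one reason to combine heuristics with a learned classifier.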

Task 1b (Independent)

  • RAG pipeline with embeddings, retrieval, and cross-encoder reranking.
  • Persona-aware filtering and dynamic exclude lists.
  • Standardized I/O and analysis of relevant sections.
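
The retrieve-then-rerank flow can be sketched with the standard library only. Everything here is illustrative: the vectors are toy values instead of real embeddings, word_overlap is a stand-in for a cross-encoder score, and applying the exclude list before retrieval is an assumed ordering.

```python
import math
from typing import Any, Callable, Dict, List, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def word_overlap(query: str, text: str) -> float:
    """Stand-in reranker: fraction of query words that appear in the text."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def retrieve_and_rerank(
    query_vec: Sequence[float],
    query_text: str,
    docs: List[Dict[str, Any]],
    rerank_score: Callable[[str, str], float],
    exclude_terms: Set[str],
    top_k: int = 3,
    top_n: int = 2,
) -> List[Dict[str, Any]]:
    # 1. Persona filter: drop sections mentioning any excluded term.
    allowed = [
        d for d in docs if not any(term in d["text"].lower() for term in exclude_terms)
    ]
    # 2. Retrieval: keep the top_k sections by embedding similarity.
    candidates = sorted(
        allowed, key=lambda d: cosine(query_vec, d["vec"]), reverse=True
    )[:top_k]
    # 3. Rerank: reorder the candidates with the more expensive scorer.
    return sorted(
        candidates, key=lambda d: rerank_score(query_text, d["text"]), reverse=True
    )[:top_n]


DOCS = [
    {"id": "a", "text": "vegetarian dinner recipes with lentils", "vec": [0.9, 0.1]},
    {"id": "b", "text": "beef stew dinner recipes", "vec": [1.0, 0.0]},
    {"id": "c", "text": "quick vegetarian lunch", "vec": [0.7, 0.3]},
    {"id": "d", "text": "train timetable", "vec": [0.0, 1.0]},
]

if __name__ == "__main__":
    hits = retrieve_and_rerank(
        [1.0, 0.0], "vegetarian dinner recipes", DOCS, word_overlap, {"beef"}
    )
    print([h["id"] for h in hits])  # ['a', 'c']
```

Section "b" has the highest embedding similarity but is removed by the exclude list, which shows why the filter has to run before the final ranking is reported.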