Knowledge Base Sync Repository (KBs)

This repository is the source-of-truth mirror for DuploCloud Knowledge Base Articles (KBAs) that are published in GitBook and consumed by the PrivateGPT AI agent via the GitBook MCP server.

Its primary responsibility is to automatically synchronize KB content exported to S3 into Git, so that:

  • GitBook stays up to date

  • Content changes are auditable in Git

  • AI agents can reliably read the latest KBAs via MCP


High-level Architecture

Pylon KB Export (S3)
  ↓
GitHub Actions (scheduled)
  ↓
tools/pylon_sync_from_s3.py
  ↓
/pylon (Git-tracked markdown)
  ↓
GitBook Space (/KB)
  ↓
GitBook MCP Server
  ↓
PrivateGPT / DuploCloud AI Agents

Repository Purpose

This repo exists to:

  • Pull the latest Pylon Knowledge Base export from S3

  • Normalize it into a Git-friendly Markdown structure

  • Commit changes automatically

  • Serve as the backing repository for the GitBook KB space

  • Enable the PrivateGPT AI agent to query up-to-date KBAs through the GitBook MCP server

No Kubernetes CronJobs. No cluster credentials. Everything is GitHub-native and auditable.


Directory Structure

/pylon Directory

The /pylon directory holds the GitBook-facing content.

pylon/public/

  • Public knowledge base articles

  • Markdown files generated from Pylon exports

  • File names are stable and ID-based:

pylon/customer/

  • Customer-only KB articles

  • Same naming and structure as public articles

  • Used by authenticated GitBook content and AI agents

pylon/.last_manifest_etag

  • Stores the ETag of the last processed manifests/latest.json

  • Used to detect no-op syncs

  • Prevents unnecessary downloads and commits

This file is intentionally committed to Git to preserve sync state across runs.
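
The no-op check described above can be sketched roughly as follows. This is a minimal illustration, not the actual code in tools/pylon_sync_from_s3.py: the function names are hypothetical, and it assumes the file holds a single ETag string. S3 normally wraps ETag header values in double quotes, so the sketch strips them before comparing.

```python
from pathlib import Path
from typing import Optional

# Path taken from this page; everything else below is a hypothetical sketch.
ETAG_FILE = Path("pylon/.last_manifest_etag")


def _normalize(etag: str) -> str:
    """Strip whitespace and the double quotes S3 puts around ETag values."""
    return etag.strip().strip('"')


def read_last_etag(path: Path = ETAG_FILE) -> Optional[str]:
    """Return the ETag recorded by the previous sync, or None on a first run."""
    if not path.exists():
        return None
    value = _normalize(path.read_text(encoding="utf-8"))
    return value or None


def is_noop(current_etag: str, path: Path = ETAG_FILE) -> bool:
    """True when the manifest ETag matches the one from the last sync."""
    return read_last_etag(path) == _normalize(current_etag)


def record_etag(current_etag: str, path: Path = ETAG_FILE) -> None:
    """Persist the ETag so the next run can detect an unchanged manifest."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(_normalize(current_etag) + "\n", encoding="utf-8")
```

Because the file is committed, a fresh runner with no local state can still tell whether the manifest changed since the previous run.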

GitHub Actions Workflow

Workflow: Sync Pylon KB export from S3

Defined in:

Triggers

  • Scheduled: Every Monday morning (UTC-based cron)

  • Manual: workflow_dispatch (can target a branch if needed)

What it does

  1. Checks out the target branch (normally main)

  2. Sets up Python

  3. Installs dependencies

  4. Runs the sync script

  5. Commits and pushes changes only if content changed
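
Step 5 ("only if content changed") can be pictured with the sketch below. The workflow file is not shown on this page, and in practice this step may well be a few shell lines in the workflow; Python is used here only for illustration. The function names, the commit message, and the injectable `run` parameter are all assumptions.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], str]


def git(args: Sequence[str]) -> str:
    """Run a git command in the current checkout and return its stdout."""
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def commit_if_changed(
    paths: Sequence[str] = ("pylon",),
    message: str = "Sync Pylon KB export from S3",  # hypothetical message
    run: Runner = git,
) -> bool:
    """Commit and push only when the tracked output paths actually changed."""
    # `git status --porcelain` prints nothing when the paths are clean.
    status = run(["status", "--porcelain", "--", *paths])
    if not status.strip():
        return False  # no-op run: nothing to commit
    run(["add", "--", *paths])
    run(["commit", "-m", message])
    run(["push"])
    return True
```

Passing a fake `run` callable makes the decision logic testable without touching a real repository.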

Why GitHub Actions?

  • No cluster resources

  • No CronJob DST issues

  • Native audit logs

  • Uses GITHUB_TOKEN (no secrets stored in Kubernetes)


Sync Tool (tools/pylon_sync_from_s3.py)

This Python script is the engine of the repository.

Responsibilities

  • Fetch manifests/latest.json from the Pylon S3 export

  • Compare the manifest's ETag with the value stored in pylon/.last_manifest_etag

  • Exit early if nothing changed

  • Download public + customer article indexes

  • Fetch all referenced articles

  • Write normalized Markdown into /pylon

  • Optionally archive removed content

  • Leave Git commit/push to the workflow
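
The write and archive responsibilities above might look something like the following sketch. It is not the real script: the `<id>.md` naming, the `_archive` folder name, and the `write_articles` function are hypothetical stand-ins, and fetching from S3 is left out entirely. The point is that a file is rewritten only when its normalized content differs, which keeps repeated runs idempotent and diffs small.

```python
from pathlib import Path
from typing import Dict, List


def write_articles(
    out_dir: Path,
    articles: Dict[str, str],
    archive_removed: bool = False,
) -> List[str]:
    """Write one Markdown file per article ID; return names of changed files."""
    out_dir.mkdir(parents=True, exist_ok=True)
    changed: List[str] = []

    for article_id, markdown in sorted(articles.items()):
        target = out_dir / f"{article_id}.md"  # hypothetical naming scheme
        # Normalize line endings and end with exactly one newline.
        body = markdown.replace("\r\n", "\n").rstrip() + "\n"
        if not target.exists() or target.read_text(encoding="utf-8") != body:
            target.write_text(body, encoding="utf-8")
            changed.append(target.name)

    if archive_removed:
        archive = out_dir / "_archive"  # hypothetical archive location
        # sorted() materializes the listing before any file is moved.
        for stale in sorted(out_dir.glob("*.md")):
            if stale.stem not in articles:
                archive.mkdir(exist_ok=True)
                stale.replace(archive / stale.name)
                changed.append(stale.name)

    return changed
```

An empty return value means the run was a no-op for that directory, so the workflow has nothing to commit.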

Configuration (via environment variables)

Variable            Purpose
EXPORT_BASE_URL     Base S3 URL for the Pylon export
REPO_OUTPUT_ROOT    Output directory (pylon)
INCLUDE_PUBLIC      Include public KB articles
INCLUDE_CUSTOMER    Include customer KB articles
ARCHIVE_REMOVED     Archive removed articles
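
One way these variables could be read is sketched below. The variable names come from the table; the defaults, the accepted boolean spellings, and the `SyncConfig` / `load_config` names are assumptions made for illustration and may differ from the real script.

```python
import os
from dataclasses import dataclass
from typing import Mapping

_TRUE_VALUES = {"1", "true", "yes", "on"}  # assumed spellings for "enabled"


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Parse a boolean environment variable, falling back to a default."""
    raw = env.get(name)
    if raw is None or not raw.strip():
        return default
    return raw.strip().lower() in _TRUE_VALUES


@dataclass(frozen=True)
class SyncConfig:
    export_base_url: str
    repo_output_root: str
    include_public: bool
    include_customer: bool
    archive_removed: bool


def load_config(env: Mapping[str, str] = os.environ) -> SyncConfig:
    """Build the sync configuration from environment variables."""
    base_url = env.get("EXPORT_BASE_URL", "").strip()
    if not base_url:
        raise ValueError("EXPORT_BASE_URL is required")
    return SyncConfig(
        export_base_url=base_url.rstrip("/"),
        repo_output_root=env.get("REPO_OUTPUT_ROOT", "pylon"),
        # The defaults below are assumptions, not documented behavior.
        include_public=_flag(env, "INCLUDE_PUBLIC", True),
        include_customer=_flag(env, "INCLUDE_CUSTOMER", True),
        archive_removed=_flag(env, "ARCHIVE_REMOVED", False),
    )
```

Accepting any mapping instead of reading `os.environ` directly lets the parsing be exercised with a plain dictionary.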


Relationship to GitBook

  • This repo is connected to a GitBook space (e.g. /KB)

  • GitBook reads Markdown directly from /pylon

  • SUMMARY.md controls navigation

  • Changes pushed here are automatically reflected in GitBook

This repo does not render content itself; GitBook is the presentation layer.


Relationship to AI / PrivateGPT

  • DuploCloud’s PrivateGPT AI Agent uses the GitBook MCP server

  • MCP provides structured, read-only access to KB content

  • By keeping GitBook synced, we ensure:

    • AI agents always see the latest KBAs

    • No direct S3 or Pylon coupling inside the agent

    • Clear separation between content ingestion and AI inference


Design Principles

  • Git is the source of truth

  • Stateless sync jobs (the only persisted state is the ETag file committed to Git)

  • Idempotent runs

  • No cluster dependencies

  • Human-readable diffs

  • AI-consumable structure

Operational Notes

  • Normal runs should be a no-op when content has not changed

  • Large commits indicate real KB updates

  • Manual runs are safe for testing

  • Feature branches can be used for dry-runs before merging to main
