Karpathy LLM Wiki Pattern

episteme

An LLM-maintained wiki for arXiv papers.

Ingests PDFs into a self-maintaining markdown knowledge base — section by section, with full source tracking. Every answer is traced back to the exact passage in the original paper that produced it.

Python · ChromaDB · SQLite · Markdown Wiki · Source Tracing · arXiv

Background

Karpathy's LLM Wiki, for research

In April 2026, Andrej Karpathy shared a pattern that went viral in the AI community.

The core idea: Instead of re-reading raw documents every time you ask a question (RAG), have the LLM compile sources into a structured markdown wiki at ingest time. Knowledge accumulates. The LLM is the librarian.

episteme implements this for academic papers and adds a source tracing layer: after answering from the wiki, it embeds the response and vector-searches the original source sections — returning the exact passages that produced the answer. Every claim is auditable.

System Design

Two pipelines

Two POST endpoints. The ingest pipeline builds the wiki. The chat pipeline queries it and traces the answer back to the source.

POST /ingest: arXiv PDF → wiki

📄 PDF Uploaded
arXiv PDF received via multipart form.
✂️ Section Splitter
Splits the PDF into logical sections, each emitted as
{ section_id, title, text, source_file }
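A minimal sketch of how pdf_splitter.py could work, assuming pypdf for text extraction and a numbered-heading regex to find section boundaries (both are assumptions, not the repo's actual implementation):

# pdf_splitter.py -- sketch: split a PDF into numbered sections (assumes pypdf)
import re
from pypdf import PdfReader

HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+([A-Z][^\n]{2,80})$", re.MULTILINE)

def split_sections(pdf_path: str, source_file: str) -> list[dict]:
    text = "\n".join(page.extract_text() or "" for page in PdfReader(pdf_path).pages)
    matches = list(HEADING.finditer(text))
    sections = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections.append({
            "section_id": f"sec_{i + 1:03d}",
            "title": f"{m.group(1)} {m.group(2)}".strip(),
            "text": text[m.start():end].strip(),
            "source_file": source_file,
        })
    return sections

Real papers need more care (two-column layouts, references, figures), but the emitted shape is what the rest of the pipeline consumes.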
🗄️ Indexed to SQLite
Stores section_id, section_text, section_title, source_filename.
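The SQLite layer needs only one table. A sketch of section_store.py using the standard library; the table and column names follow the fields listed above but are otherwise assumed:

# section_store.py -- sketch: persist raw section text + metadata in SQLite
import sqlite3

def index_sections(db_path: str, sections: list[dict]) -> None:
    with sqlite3.connect(db_path) as conn:
        conn.execute("""CREATE TABLE IF NOT EXISTS sections (
            section_id TEXT PRIMARY KEY,
            section_title TEXT,
            section_text TEXT,
            source_filename TEXT)""")
        conn.executemany(
            "INSERT OR REPLACE INTO sections VALUES (?, ?, ?, ?)",
            [(s["section_id"], s["title"], s["text"], s["source_file"])
             for s in sections],
        )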
🔷 Embedded to ChromaDB
Vector embedding of section_text, with section_id and source_filename as metadata.
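A sketch of vector_store.py using ChromaDB's persistent client and default embedding function; the collection name paper_sections is an assumption:

# vector_store.py -- sketch: embed section text into a persistent Chroma collection
import chromadb

def embed_sections(chroma_path: str, sections: list[dict]) -> None:
    client = chromadb.PersistentClient(path=chroma_path)
    collection = client.get_or_create_collection("paper_sections")
    collection.add(
        ids=[s["section_id"] for s in sections],
        documents=[s["text"] for s in sections],  # embedded automatically
        metadatas=[{"section_id": s["section_id"],
                    "source_filename": s["source_file"]} for s in sections],
    )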
🤖 LLM Wiki Writer
For each section from the paper:
1. Check wiki/index.md — page exists?
2. Yes → retrieve the wiki section and update it.
   No → create a new wiki page and section.
3. Write section_id → description to index.md.
4. Write section_id → source_id to index.json.
Touches wiki/*.md, index.md, index.json.
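The update-or-create decision is the heart of the pattern. A compressed sketch of wiki_writer.py, where llm is a hypothetical str → str completion helper wrapping whatever model LLM_MODEL points at, and the prompts are illustrative only:

# wiki_writer.py -- sketch: update-or-create loop over ingested sections
import json
from pathlib import Path

def update_wiki(wiki_dir: str, sections: list[dict], llm) -> None:
    wiki = Path(wiki_dir)
    index_md, index_json = wiki / "index.md", wiki / "index.json"
    index_text = index_md.read_text() if index_md.exists() else ""
    mapping = json.loads(index_json.read_text()) if index_json.exists() else {}

    for s in sections:
        # 1. Does an existing wiki page already cover this topic?
        page = llm(
            f"Wiki index:\n{index_text}\n\nNew paper section:\n{s['title']}\n"
            f"{s['text'][:2000]}\n\nReply with the wiki page filename to update, "
            "or NEW if no page fits."
        ).strip()
        if page == "NEW":
            page = s["title"].lower().replace(" ", "_") + ".md"
        # 2. Rewrite (or create) the page with the new knowledge merged in.
        existing = (wiki / page).read_text() if (wiki / page).exists() else ""
        (wiki / page).write_text(llm(
            "Merge this section into the wiki page, keeping its structure:\n\n"
            f"PAGE:\n{existing}\n\nSECTION:\n{s['text']}"
        ))
        # 3-4. Record the section in both indexes.
        index_text += f"\n## {s['section_id']}\n{s['title']}\nSource: {s['source_file']}\n"
        mapping[s["section_id"]] = s["source_file"]

    index_md.write_text(index_text)
    index_json.write_text(json.dumps(mapping, indent=2))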
POST /chat: query → answer + sources

💬 User Query
{ "query": "How does X work?" }
📖 Wiki Section Retrieval
Searches wiki/index.md for relevant section_ids and descriptions, then fetches the matching wiki page sections.
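Because index.md is small and descriptive, retriever.py can let the LLM itself pick the sections instead of running a vector search. A sketch, reusing the hypothetical llm helper:

# retriever.py -- sketch: select relevant section_ids by letting the LLM read index.md
from pathlib import Path

def retrieve_section_ids(wiki_dir: str, query: str, llm) -> list[str]:
    index_text = (Path(wiki_dir) / "index.md").read_text()
    reply = llm(
        f"Wiki index:\n{index_text}\n\n"
        f"Query: {query}\n\n"
        "Return the section_ids most relevant to this query, comma-separated."
    )
    return [tok.strip() for tok in reply.split(",") if tok.strip().startswith("sec_")]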
🤖 LLM Answer Generation
The answer is generated from the retrieved wiki section context only.
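responder.py then answers strictly from the retrieved wiki text. A sketch of the prompt shape (the wording is assumed):

# responder.py -- sketch: answer only from the retrieved wiki sections
def answer_from_wiki(query: str, wiki_context: str, llm) -> str:
    return llm(
        "Answer the question using ONLY the wiki sections below. "
        "If they do not contain the answer, say so.\n\n"
        f"WIKI SECTIONS:\n{wiki_context}\n\nQUESTION: {query}"
    )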
🔍 Source Tracing
1. Look up section_ids → source_ids via index.json.
2. Embed the LLM response.
3. Vector search in ChromaDB, scoped to those source papers only.
4. Return exact passages + similarity scores.
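tracer.py is where the scoping happens: a where filter limits the vector search to the papers the retrieved wiki sections were compiled from. A sketch using Chroma's $in operator, with the same assumed collection name as the ingest sketch:

# tracer.py -- sketch: trace an answer back to source passages in ChromaDB
import json
import chromadb

def trace_sources(chroma_path: str, index_json_path: str,
                  answer: str, section_ids: list[str], k: int = 3) -> list[dict]:
    with open(index_json_path) as f:
        mapping = json.load(f)  # section_id → source filename
    papers = sorted({mapping[sid] for sid in section_ids if sid in mapping})
    client = chromadb.PersistentClient(path=chroma_path)
    collection = client.get_or_create_collection("paper_sections")
    # Embed the answer and search only within the contributing papers.
    hits = collection.query(
        query_texts=[answer],
        n_results=k,
        where={"source_filename": {"$in": papers}},
    )
    return [
        {"paper": meta["source_filename"],
         "section": meta["section_id"],  # title lookup would go through SQLite
         "passage": doc,
         "similarity": round(1 - dist, 2)}  # crude distance-to-similarity conversion
        for doc, meta, dist in zip(hits["documents"][0],
                                   hits["metadatas"][0],
                                   hits["distances"][0])
    ]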
Response
answer + [ { paper, section, passage, similarity } ]

Wiki Internals

index.md & index.json

All reads and writes happen at the section level. These two index files are what let the system track exactly where every piece of knowledge came from.

wiki/index.md

Human-readable. Maps every section_id to a description of what that wiki section covers and which paper it came from. The LLM consults this on every ingest to decide whether to create a new section or update an existing one.

## sec_001
Transformer architecture overview.
Encoder-decoder structure, self-attention.
Source: attention_is_all_you_need.pdf

## sec_002
Multi-head attention mechanism.
Parallel projection heads, concatenation.
Source: attention_is_all_you_need.pdf

## sec_003
BERT masked language modeling.
Pre-training objective, [MASK] token.
Source: bert_pretraining.pdf
wiki/index.json

Machine-readable. Maps section_id → source_id (source filename). Used by the chat pipeline's tracer to scope the ChromaDB vector search to only the papers that the retrieved wiki sections were compiled from.


  "sec_001" "attention_is_all_you_need.pdf"
  "sec_002" "attention_is_all_you_need.pdf"
  "sec_003" "bert_pretraining.pdf"
  "sec_004" "bert_pretraining.pdf"
  "sec_005" "gpt2_paper.pdf"
  "sec_006" "attention_is_all_you_need.pdf"
A wiki section can have multiple sources: if two papers both discuss attention mechanisms, sec_002 in index.json maps to both. The tracer then searches ChromaDB against all of them.
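One way to represent that (an assumed shape, not taken from the repo) is to let the value be a list:

"sec_002": ["attention_is_all_you_need.pdf", "bert_pretraining.pdf"]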

Comparison

Why not just RAG?

The difference is when the reasoning happens — at query time, or at ingest time.

| Property | Classic RAG | episteme (LLM Wiki) |
|---|---|---|
| Knowledge accumulation | None — every query starts fresh | Wiki grows richer with each paper |
| Cross-paper synthesis | Chunks retrieved in isolation | LLM links concepts at write time |
| Source attribution | Rough chunk-level | Exact passage via post-answer trace |
| Self-maintenance | Passive — never improves | LLM updates wiki on new ingests |
| Human readability | Raw chunks, not browsable | Structured markdown you can read |

Structure

Project layout

episteme/
├── backend/
│   ├── app.py                # POST /ingest, POST /chat
│   │
│   ├── ingest/
│   │   ├── pdf_splitter.py   # Section-aware PDF extraction
│   │   ├── section_store.py  # SQLite: section text + metadata
│   │   ├── vector_store.py   # ChromaDB: section embeddings
│   │   └── wiki_writer.py    # LLM wiki updater (section-level)
│   │
│   ├── chat/
│   │   ├── retriever.py      # wiki/index.md section lookup
│   │   ├── responder.py      # LLM answer from wiki context
│   │   └── tracer.py         # index.json → ChromaDB source trace
│   │
│   ├── wiki/
│   │   ├── index.md          # section_id → description
│   │   ├── index.json        # section_id → source_file
│   │   └── *.md              # wiki pages (one per topic)
│   │
│   └── requirements.txt
│
└── frontend/                 # Chat UI
    ├── index.html
    ├── src/app.js
    └── styles/main.css

API Reference

Two endpoints

POST /ingest
# request
curl -X POST localhost:8000/ingest \
  -F "file=@attention_is_all_you_need.pdf"

# response
{
  "status": "ok",
  "paper": "attention_is_all_you_need.pdf",
  "sections_indexed": 12,
  "wiki_pages_updated": ["transformers.md", "attention_mechanism.md"],
  "new_sections": ["sec_041", "sec_042"]
}
POST /chat
# request
curl -X POST localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"query": "How does multi-head attention work?"}'

# response
{
  "answer": "Multi-head attention runs the attention function in parallel...",
  "wiki_sections_used": ["sec_002", "sec_007"],
  "sources": [
    {
      "paper": "attention_is_all_you_need.pdf",
      "section": "3.2 Multi-Head Attention",
      "passage": "Instead of performing a single attention function...",
      "similarity": 0.94
    }
  ]
}

Quick Start

Get running

01
Clone & set up
git clone https://github.com/xreedev/episteme.git
cd episteme/backend
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
02
Configure .env
LLM_API_KEY=your_key
LLM_MODEL=gpt-4o          # or claude-3-5-sonnet
CHROMA_PATH=./chroma_db
SQLITE_PATH=./sections.db
WIKI_PATH=./wiki
03
Start the server & ingest a paper
python app.py   # → localhost:8000
curl -X POST localhost:8000/ingest -F "file=@paper.pdf"
curl -X POST localhost:8000/chat \
  -d '{"query":"What does this paper contribute?"}'

GitHub Metadata

Repo settings

Chosen so that developers who saw Karpathy's LLM Wiki idea can find this project.

Repo Name
episteme
Description
"LLM wiki for arXiv papers — ingests PDFs into a self-maintaining markdown knowledge base and traces every answer back to the exact source passage."
Topics
karpathy-llm-wiki llm-wiki second-brain arxiv source-attribution verifiable-ai chromadb sqlite knowledge-base research-assistant pdf-ingestion markdown-wiki python rag

Built by

Sreedev

Building tools that make research tractable.

Inspired by Andrej Karpathy's LLM Wiki gist