Karpathy LLM Wiki Pattern

episteme

An LLM-maintained wiki for arXiv papers.

Ingests PDFs into a self-maintaining markdown knowledge base — section by section, with full source tracking. Every answer is traced back to the exact passage in the original paper that produced it.

Python · ChromaDB · SQLite · Markdown Wiki · Source Tracing · arXiv

Background

Karpathy's LLM Wiki, for research

In April 2026, Andrej Karpathy shared a pattern that went viral in the AI community.

The core idea: Instead of re-reading raw documents every time you ask a question (RAG), have the LLM compile sources into a structured markdown wiki at ingest time. Knowledge accumulates. The LLM is the librarian.

episteme implements this for academic papers and adds a source tracing layer: after answering from the wiki, it embeds the response and vector-searches the original source sections — returning the exact passages that produced the answer. Every claim is auditable.

System Design

Two pipelines

Two POST endpoints. The ingest pipeline builds the wiki. The chat pipeline queries it and traces the answer back to the source.

POST /ingest: arXiv PDF → wiki

📄 PDF Uploaded
arXiv PDF received via multipart form.
✂️ Section Splitter
Splits the PDF into logical sections, each emitted as
{ section_id, title, text, source_file }
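A minimal sketch of how pdf_splitter.py could work, assuming pypdf for text extraction and a numbered-heading regex to find section boundaries (both are assumptions, not the repo's actual implementation):

# pdf_splitter.py -- sketch: split a PDF into numbered sections (assumes pypdf)
import re
from pypdf import PdfReader

HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+([A-Z][^\n]{2,80})$", re.MULTILINE)

def split_sections(pdf_path: str, source_file: str) -> list[dict]:
    text = "\n".join(page.extract_text() or "" for page in PdfReader(pdf_path).pages)
    matches = list(HEADING.finditer(text))
    sections = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections.append({
            "section_id": f"sec_{i + 1:03d}",
            "title": f"{m.group(1)} {m.group(2)}".strip(),
            "text": text[m.start():end].strip(),
            "source_file": source_file,
        })
    return sections

Real papers need more care (two-column layouts, references, figures), but the emitted shape is what the rest of the pipeline consumes.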
🗄️ Indexed to SQLite
Stores section_id, section_text, section_title, source_filename.
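The SQLite layer needs only one table. A sketch of section_store.py using the standard library; the table and column names follow the fields listed above but are otherwise assumed:

# section_store.py -- sketch: persist raw section text + metadata in SQLite
import sqlite3

def index_sections(db_path: str, sections: list[dict]) -> None:
    with sqlite3.connect(db_path) as conn:
        conn.execute("""CREATE TABLE IF NOT EXISTS sections (
            section_id TEXT PRIMARY KEY,
            section_title TEXT,
            section_text TEXT,
            source_filename TEXT)""")
        conn.executemany(
            "INSERT OR REPLACE INTO sections VALUES (?, ?, ?, ?)",
            [(s["section_id"], s["title"], s["text"], s["source_file"])
             for s in sections],
        )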
🔷 Embedded to ChromaDB
Vector embedding of section_text, with section_id and source_filename as metadata.
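A sketch of vector_store.py using ChromaDB's persistent client and default embedding function; the collection name paper_sections is an assumption:

# vector_store.py -- sketch: embed section text into a persistent Chroma collection
import chromadb

def embed_sections(chroma_path: str, sections: list[dict]) -> None:
    client = chromadb.PersistentClient(path=chroma_path)
    collection = client.get_or_create_collection("paper_sections")
    collection.add(
        ids=[s["section_id"] for s in sections],
        documents=[s["text"] for s in sections],  # embedded automatically
        metadatas=[{"section_id": s["section_id"],
                    "source_filename": s["source_file"]} for s in sections],
    )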
🤖 LLM Wiki Writer
For each section from the paper:
1. Check wiki/index.md — page exists?
2. Yes → retrieve the wiki section and update it.
   No → create a new wiki page and section.
3. Write section_id → description to index.md.
4. Write section_id → source_id to index.json.
Touches wiki/*.md, index.md, index.json.
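The update-or-create decision is the heart of the pattern. A compressed sketch of wiki_writer.py, where llm is a hypothetical str → str completion helper wrapping whatever model LLM_MODEL points at, and the prompts are illustrative only:

# wiki_writer.py -- sketch: update-or-create loop over ingested sections
import json
from pathlib import Path

def update_wiki(wiki_dir: str, sections: list[dict], llm) -> None:
    wiki = Path(wiki_dir)
    index_md, index_json = wiki / "index.md", wiki / "index.json"
    index_text = index_md.read_text() if index_md.exists() else ""
    mapping = json.loads(index_json.read_text()) if index_json.exists() else {}

    for s in sections:
        # 1. Does an existing wiki page already cover this topic?
        page = llm(
            f"Wiki index:\n{index_text}\n\nNew paper section:\n{s['title']}\n"
            f"{s['text'][:2000]}\n\nReply with the wiki page filename to update, "
            "or NEW if no page fits."
        ).strip()
        if page == "NEW":
            page = s["title"].lower().replace(" ", "_") + ".md"
        # 2. Rewrite (or create) the page with the new knowledge merged in.
        existing = (wiki / page).read_text() if (wiki / page).exists() else ""
        (wiki / page).write_text(llm(
            "Merge this section into the wiki page, keeping its structure:\n\n"
            f"PAGE:\n{existing}\n\nSECTION:\n{s['text']}"
        ))
        # 3-4. Record the section in both indexes.
        index_text += f"\n## {s['section_id']}\n{s['title']}\nSource: {s['source_file']}\n"
        mapping[s["section_id"]] = s["source_file"]

    index_md.write_text(index_text)
    index_json.write_text(json.dumps(mapping, indent=2))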
POST /chat: query → answer + sources

💬 User Query
{ "query": "How does X work?" }
📖 Wiki Section Retrieval
Searches wiki/index.md for relevant section_ids and descriptions, then fetches the matching wiki page sections.
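Because index.md is small and descriptive, retriever.py can let the LLM itself pick the sections instead of running a vector search. A sketch, reusing the hypothetical llm helper:

# retriever.py -- sketch: select relevant section_ids by letting the LLM read index.md
from pathlib import Path

def retrieve_section_ids(wiki_dir: str, query: str, llm) -> list[str]:
    index_text = (Path(wiki_dir) / "index.md").read_text()
    reply = llm(
        f"Wiki index:\n{index_text}\n\n"
        f"Query: {query}\n\n"
        "Return the section_ids most relevant to this query, comma-separated."
    )
    return [tok.strip() for tok in reply.split(",") if tok.strip().startswith("sec_")]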
🤖 LLM Answer Generation
The answer is generated from the retrieved wiki section context only.
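responder.py then answers strictly from the retrieved wiki text. A sketch of the prompt shape (the wording is assumed):

# responder.py -- sketch: answer only from the retrieved wiki sections
def answer_from_wiki(query: str, wiki_context: str, llm) -> str:
    return llm(
        "Answer the question using ONLY the wiki sections below. "
        "If they do not contain the answer, say so.\n\n"
        f"WIKI SECTIONS:\n{wiki_context}\n\nQUESTION: {query}"
    )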
🔍 Source Tracing
1. Look up section_ids → source_ids via index.json.
2. Embed the LLM response.
3. Vector search in ChromaDB, scoped to those source papers only.
4. Return exact passages + similarity scores.
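tracer.py is where the scoping happens: a where filter limits the vector search to the papers the retrieved wiki sections were compiled from. A sketch using Chroma's $in operator, with the same assumed collection name as the ingest sketch:

# tracer.py -- sketch: trace an answer back to source passages in ChromaDB
import json
import chromadb

def trace_sources(chroma_path: str, index_json_path: str,
                  answer: str, section_ids: list[str], k: int = 3) -> list[dict]:
    with open(index_json_path) as f:
        mapping = json.load(f)  # section_id → source filename
    papers = sorted({mapping[sid] for sid in section_ids if sid in mapping})
    client = chromadb.PersistentClient(path=chroma_path)
    collection = client.get_or_create_collection("paper_sections")
    # Embed the answer and search only within the contributing papers.
    hits = collection.query(
        query_texts=[answer],
        n_results=k,
        where={"source_filename": {"$in": papers}},
    )
    return [
        {"paper": meta["source_filename"],
         "section": meta["section_id"],  # title lookup would go through SQLite
         "passage": doc,
         "similarity": round(1 - dist, 2)}  # crude distance-to-similarity conversion
        for doc, meta, dist in zip(hits["documents"][0],
                                   hits["metadatas"][0],
                                   hits["distances"][0])
    ]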
Response
answer + [ { paper, section, passage, similarity } ]

Wiki Internals

index.md & index.json

All reads and writes happen at the section level. These two index files are what let the system track exactly where every piece of knowledge came from.

wiki/index.md

Human-readable. Maps every section_id to a description of what that wiki section covers and which paper it came from. The LLM consults this on every ingest to decide whether to create a new section or update an existing one.

## sec_001
Transformer architecture overview.
Encoder-decoder structure, self-attention.
Source: attention_is_all_you_need.pdf

## sec_002
Multi-head attention mechanism.
Parallel projection heads, concatenation.
Source: attention_is_all_you_need.pdf

## sec_003
BERT masked language modeling.
Pre-training objective, [MASK] token.
Source: bert_pretraining.pdf
wiki/index.json

Machine-readable. Maps section_id → source_id (source filename). Used by the chat pipeline's tracer to scope the ChromaDB vector search to only the papers that the retrieved wiki sections were compiled from.


  "sec_001" "attention_is_all_you_need.pdf"
  "sec_002" "attention_is_all_you_need.pdf"
  "sec_003" "bert_pretraining.pdf"
  "sec_004" "bert_pretraining.pdf"
  "sec_005" "gpt2_paper.pdf"
  "sec_006" "attention_is_all_you_need.pdf"
A wiki section can have multiple sources: if two papers both discuss attention mechanisms, sec_002 in index.json maps to both. The tracer then searches ChromaDB against all of them.
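One way to represent that (an assumed shape, not taken from the repo) is to let the value be a list:

"sec_002": ["attention_is_all_you_need.pdf", "bert_pretraining.pdf"]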

Comparison

Why not just RAG?

The difference is when the reasoning happens — at query time, or at ingest time.

| Property | Classic RAG | episteme (LLM Wiki) |
|---|---|---|
| Knowledge accumulation | None — every query starts fresh | Wiki grows richer with each paper |
| Cross-paper synthesis | Chunks retrieved in isolation | LLM links concepts at write time |
| Source attribution | Rough chunk-level | Exact passage via post-answer trace |
| Self-maintenance | Passive — never improves | LLM updates wiki on new ingests |
| Human readability | Raw chunks, not browsable | Structured markdown you can read |

Structure

Project layout

episteme/
├── backend/
│   ├── app.py                # POST /ingest, POST /chat
│   │
│   ├── ingest/
│   │   ├── pdf_splitter.py   # Section-aware PDF extraction
│   │   ├── section_store.py  # SQLite: section text + metadata
│   │   ├── vector_store.py   # ChromaDB: section embeddings
│   │   └── wiki_writer.py    # LLM wiki updater (section-level)
│   │
│   ├── chat/
│   │   ├── retriever.py      # wiki/index.md section lookup
│   │   ├── responder.py      # LLM answer from wiki context
│   │   └── tracer.py         # index.json → ChromaDB source trace
│   │
│   ├── wiki/
│   │   ├── index.md          # section_id → description
│   │   ├── index.json        # section_id → source_file
│   │   └── *.md              # wiki pages (one per topic)
│   │
│   └── requirements.txt
│
└── frontend/                 # Chat UI
    ├── index.html
    ├── src/app.js
    └── styles/main.css

API Reference

Two endpoints

POST /ingest
# request
curl -X POST localhost:8000/ingest \
  -F "file=@attention_is_all_you_need.pdf"

# response
{
  "status": "ok",
  "paper": "attention_is_all_you_need.pdf",
  "sections_indexed": 12,
  "wiki_pages_updated": ["transformers.md", "attention_mechanism.md"],
  "new_sections": ["sec_041", "sec_042"]
}
POST /chat
# request
curl -X POST localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"query": "How does multi-head attention work?"}'

# response
{
  "answer": "Multi-head attention runs the attention function in parallel...",
  "wiki_sections_used": ["sec_002", "sec_007"],
  "sources": [
    {
      "paper": "attention_is_all_you_need.pdf",
      "section": "3.2 Multi-Head Attention",
      "passage": "Instead of performing a single attention function...",
      "similarity": 0.94
    }
  ]
}

Quick Start

Get running

01
Clone & set up
git clone https://github.com/xreedev/episteme.git
cd episteme/backend
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
02
Configure .env
LLM_API_KEY=your_key
LLM_MODEL=gpt-4o          # or claude-3-5-sonnet
CHROMA_PATH=./chroma_db
SQLITE_PATH=./sections.db
WIKI_PATH=./wiki
03
Start the server & ingest a paper
python app.py   # → localhost:8000
curl -X POST localhost:8000/ingest -F "file=@paper.pdf"
curl -X POST localhost:8000/chat \
  -d '{"query":"What does this paper contribute?"}'

GitHub Metadata

Repo settings

Chosen so that developers who saw Karpathy's LLM Wiki idea can find this project.

Repo Name
episteme
Description
"LLM wiki for arXiv papers — ingests PDFs into a self-maintaining markdown knowledge base and traces every answer back to the exact source passage."
Topics
karpathy-llm-wiki llm-wiki second-brain arxiv source-attribution verifiable-ai chromadb sqlite knowledge-base research-assistant pdf-ingestion markdown-wiki python rag

Built by

Sreedev

Building tools that make research tractable.

Inspired by Andrej Karpathy's LLM Wiki gist