# 📚 book-ingestor

> *"The agent reads the book so you don't have to explain it."*

A standalone Python service that watches a folder for PDFs (and other text documents), intelligently processes them into layered memories, and feeds them into a [mem0](https://github.com/mem0ai/mem0) server via its REST API. The result: your AI agent doesn't *search* for knowledge — it simply *knows* it.

---

## How it works

```
📂 books/inbox/         ← drop a PDF here
        ↓  (watchdog detects new file)
🔍 Structure Detection  ← is this a book with chapters, or a flat doc?
        ↓
✂️ Chunking             ← smart paragraph/semantic chunking (no LLM used)
        ↓
🧠 Summarization        ← Groq/Llama generates book + chapter summaries
        ↓
💾 mem0 /memories       ← layered memories POSTed to your mem0 server
        ↓
📂 books/done/          ← file archived, manifest saved
```

Memories are stored in layers:

- **Book summary** — one high-level memory for the whole document
- **Chapter summaries** — one memory per chapter/section (structured docs)
- **Content chunks** — paragraph-level memories for fine-grained recall

---

## Requirements

- Python 3.11+
- A running [mem0 server](https://github.com/mem0ai/mem0) accessible on your LAN
- A [Groq API key](https://console.groq.com/) (free tier is plenty)

---

## Quick Start

```bash
git clone https://github.com/yourname/book-ingestor.git
cd book-ingestor
cp .env.example .env          # fill in your values
pip install -r requirements.txt
python -m book_ingestor.watchdog_runner
```

Drop a PDF into `books/inbox/` and watch it get ingested.
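The chunking step above runs without an LLM. A minimal sketch of the idea, greedily packing whole paragraphs up to a `CHUNK_SIZE_TOKENS` budget — note that `chunk_paragraphs` and the words-as-tokens approximation are illustrative assumptions, not the actual `chunker.py` implementation:

```python
# Hypothetical sketch of paragraph chunking (the real chunker.py may differ).
# Tokens are approximated as whitespace-separated words; CHUNK_SIZE_TOKENS
# (default 350) caps each chunk. Paragraphs are never split mid-way, so
# recalled memories stay readable.

def chunk_paragraphs(text: str, max_tokens: int = 350) -> list[str]:
    """Greedily pack whole paragraphs into chunks of roughly max_tokens words."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_tokens = 0
    for para in paragraphs:
        para_tokens = len(para.split())
        # Start a new chunk if adding this paragraph would exceed the budget
        if current and current_tokens + para_tokens > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        current.append(para)
        current_tokens += para_tokens
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each resulting chunk becomes one "content chunk" memory in mem0; a paragraph larger than the budget still becomes its own chunk rather than being split.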
---

## Configuration

All config lives in `.env`:

```env
MEM0_BASE_URL=http://192.168.0.200:8420
MEM0_AGENT_ID=knowledge_base
GROQ_API_KEY=your_groq_key_here
GROQ_MODEL=meta-llama/llama-4-scout-17b-16e-instruct
BOOKS_INBOX=./books/inbox
BOOKS_PROCESSING=./books/processing
BOOKS_DONE=./books/done
BOOKS_MANIFESTS=./books/manifests
CHUNK_SIZE_TOKENS=350
LOG_LEVEL=INFO
```

---

## Folder Structure

```
book-ingestor/
├── books/
│   ├── inbox/          ← drop zone (watched)
│   ├── processing/     ← in-flight (do not touch)
│   ├── done/           ← archived originals
│   └── manifests/      ← JSON record per ingested book
├── book_ingestor/
│   ├── watchdog_runner.py
│   ├── pipeline.py
│   ├── detector.py
│   ├── chunker.py
│   ├── summarizer.py
│   ├── mem0_writer.py
│   ├── manifest.py
│   └── config.py
├── .env.example
├── requirements.txt
├── PROJECT.md
└── README.md
```

---

## Supported File Types

| Format | Status |
|--------|--------|
| PDF (text-based) | ✅ |
| PDF (scanned/image) | 🔜 (OCR planned) |
| Markdown (.md) | 🔜 |
| Plain text (.txt) | 🔜 |
| EPUB | 🔜 |

---

## Notes

- This project is **completely independent** of OpenClaw or any specific AI agent — it only talks to mem0.
- Any machine on the LAN with network access to your mem0 server can run this.
- Docker support is planned for a future release.

---

## License

MIT
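The layered memories are written to the mem0 server's `/memories` endpoint using `MEM0_BASE_URL` and `MEM0_AGENT_ID` from the config above. A rough sketch of how `mem0_writer.py` might shape and send one memory — the payload field names (`messages`, `agent_id`, `metadata`) are assumptions about your mem0 server's API, and `build_memory_payload` / `post_memory` are hypothetical helpers, not the project's actual code:

```python
# Hypothetical sketch of mem0_writer.py. Payload field names are assumptions;
# check your mem0 server's API reference before relying on them.
import json
import urllib.request


def build_memory_payload(text: str, layer: str, book: str,
                         agent_id: str = "knowledge_base") -> dict:
    """Wrap one piece of layered content as a mem0 memory payload.

    layer is one of: book_summary, chapter_summary, chunk (assumed naming).
    """
    return {
        "messages": [{"role": "user", "content": text}],
        "agent_id": agent_id,
        "metadata": {"layer": layer, "book": book},
    }


def post_memory(base_url: str, payload: dict) -> None:
    """POST one memory to the mem0 server, e.g. http://192.168.0.200:8420."""
    req = urllib.request.Request(
        f"{base_url}/memories",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req).read()
```

Tagging each memory with `layer` and `book` metadata is what lets an agent later filter recall to, say, only chapter summaries of a given title.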