"The agent reads the book so you don't have to explain it."
A standalone Python service that watches a folder for PDFs (and other text documents), intelligently processes them into layered memories, and feeds them into a mem0 server via its REST API.
The result: your AI agent doesn't search for knowledge — it simply knows it.
```text
📂 books/inbox/          ← drop a PDF here
        ↓  (watchdog detects new file)
🔍 Structure Detection   ← is this a book with chapters, or a flat doc?
        ↓
✂️ Chunking              ← smart paragraph/semantic chunking (no LLM used)
        ↓
🧠 Summarization         ← Groq/Llama generates book + chapter summaries
        ↓
💾 mem0 /memories        ← layered memories POSTed to your mem0 server
        ↓
📂 books/done/           ← file archived, manifest saved
```
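The watch step above uses the `watchdog` library; as a dependency-free illustration of the same idea, here is a minimal polling sketch (function names are hypothetical, not the project's actual API):

```python
import time
from pathlib import Path


def scan_inbox(inbox: Path, seen: set) -> list:
    """Return PDFs that have arrived since the last scan."""
    new = [p for p in sorted(inbox.glob("*.pdf")) if p not in seen]
    seen.update(new)
    return new


def watch(inbox: Path, handler, interval: float = 1.0) -> None:
    """Poll the inbox forever, handing each new PDF to the pipeline."""
    seen: set = set()
    while True:
        for pdf in scan_inbox(inbox, seen):
            handler(pdf)  # real pipeline: detect → chunk → summarize → POST
        time.sleep(interval)
```

The real `watchdog_runner.py` gets event-driven notifications instead of polling, but the contract is the same: each new file in the inbox triggers one pipeline run.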
Memories are stored in layers: a book-level summary, per-chapter summaries, and the underlying content chunks, so the agent can answer at whatever granularity the question demands.
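The write step boils down to a REST POST per memory. A sketch of what that looks like — note the payload field names (`agent_id`, `messages`, `metadata.layer`) are assumptions about the mem0 server's schema, not a documented contract:

```python
import json
import urllib.request


def build_memory_payload(text: str, layer: str, book: str,
                         agent_id: str = "knowledge_base") -> dict:
    """Shape one layered memory; `layer` tags book/chapter/chunk level."""
    return {
        "agent_id": agent_id,
        "messages": [{"role": "user", "content": text}],
        "metadata": {"layer": layer, "book": book},
    }


def post_memory(base_url: str, payload: dict) -> bytes:
    """POST the memory to the mem0 server's /memories endpoint."""
    req = urllib.request.Request(
        f"{base_url}/memories",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

In the project itself this lives in `mem0_writer.py`, which reads the base URL and agent ID from `.env`.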
```bash
git clone https://github.com/yourname/book-ingestor.git
cd book-ingestor
cp .env.example .env          # fill in your values
pip install -r requirements.txt
python -m book_ingestor.watchdog_runner
```
Drop a PDF into books/inbox/ and watch it get ingested.
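Once the original is archived to `books/done/`, a JSON record lands in `books/manifests/`. A quick way to see what has been ingested so far (this assumes one `.json` manifest per book, named after the source file):

```python
from pathlib import Path


def list_ingested(manifest_dir: Path) -> list:
    """Return the names of books that have a manifest on disk."""
    return sorted(p.stem for p in manifest_dir.glob("*.json"))
```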
All config lives in .env:
```ini
MEM0_BASE_URL=http://192.168.0.200:8420
MEM0_AGENT_ID=knowledge_base
GROQ_API_KEY=your_groq_key_here
GROQ_MODEL=meta-llama/llama-4-scout-17b-16e-instruct
BOOKS_INBOX=./books/inbox
BOOKS_PROCESSING=./books/processing
BOOKS_DONE=./books/done
BOOKS_MANIFESTS=./books/manifests
CHUNK_SIZE_TOKENS=350
LOG_LEVEL=INFO
```
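Inside the service, `config.py` turns these variables into typed settings. One plausible shape (a sketch, not the actual implementation — the real module may use `python-dotenv` and cover every key):

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Typed view over the .env variables, with the documented defaults."""
    mem0_base_url: str = os.getenv("MEM0_BASE_URL", "http://192.168.0.200:8420")
    mem0_agent_id: str = os.getenv("MEM0_AGENT_ID", "knowledge_base")
    groq_api_key: str = os.getenv("GROQ_API_KEY", "")
    books_inbox: str = os.getenv("BOOKS_INBOX", "./books/inbox")
    chunk_size_tokens: int = int(os.getenv("CHUNK_SIZE_TOKENS", "350"))
```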
```text
book-ingestor/
├── books/
│   ├── inbox/          ← drop zone (watched)
│   ├── processing/     ← in-flight (do not touch)
│   ├── done/           ← archived originals
│   └── manifests/      ← JSON record per ingested book
├── book_ingestor/
│   ├── watchdog_runner.py
│   ├── pipeline.py
│   ├── detector.py
│   ├── chunker.py
│   ├── summarizer.py
│   ├── mem0_writer.py
│   ├── manifest.py
│   └── config.py
├── .env.example
├── requirements.txt
├── PROJECT.md
└── README.md
```
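The LLM-free chunking that `chunker.py` performs can be sketched as greedy paragraph packing up to `CHUNK_SIZE_TOKENS` (here word count stands in for real tokenization, and the function name is illustrative):

```python
def chunk_paragraphs(text: str, max_tokens: int = 350) -> list:
    """Greedily pack paragraphs into chunks of roughly max_tokens words."""
    chunks, current, count = [], [], 0
    for para in (p.strip() for p in text.split("\n\n") if p.strip()):
        words = len(para.split())
        # Flush the current chunk before it would overflow the budget.
        if current and count + words > max_tokens:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Packing on paragraph boundaries keeps each memory self-contained, which matters because every chunk is later stored (and retrieved) independently.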
| Format | Status |
|---|---|
| PDF (text-based) | ✅ |
| PDF (scanned/image) | 🔜 (OCR planned) |
| Markdown (.md) | 🔜 |
| Plain text (.txt) | 🔜 |
| EPUB | 🔜 |
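Routing a dropped file per the table above is a simple extension check — a minimal sketch (the function and return labels are illustrative, not the project's API):

```python
from pathlib import Path

SUPPORTED = {".pdf"}                  # shipping today (text-based PDFs)
PLANNED = {".md", ".txt", ".epub"}    # on the roadmap


def route(path: Path) -> str:
    """Decide what to do with a file that lands in the inbox."""
    ext = path.suffix.lower()
    if ext in SUPPORTED:
        return "ingest"
    if ext in PLANNED:
        return "skip (planned)"
    return "skip (unsupported)"
```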
MIT