"The agent reads the book so you don't have to explain it."
A standalone Python service that watches a folder for PDFs (and other text documents), intelligently processes them into layered memories, and feeds them into a mem0 server via its REST API.
The result: your AI agent doesn't search for knowledge — it simply knows it.
```
📂 books/inbox/          ← drop a PDF here
        ↓  (watchdog detects new file)
🔍 Structure Detection   ← is this a book with chapters, or a flat doc?
        ↓
✂️ Chunking              ← smart paragraph/semantic chunking (no LLM used)
        ↓
🧠 Summarization         ← Groq/Llama generates book + chapter summaries
        ↓
💾 mem0 /memories        ← layered memories POSTed to your mem0 server
        ↓
📂 books/done/           ← file archived, manifest saved
```
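The chunking step runs without an LLM. A minimal sketch of the idea — greedy paragraph packing with a crude word-count heuristic in place of a real tokenizer (the actual `chunker.py` may differ; `target_tokens` mirrors the `CHUNK_SIZE_TOKENS` setting):

```python
def chunk_paragraphs(text: str, target_tokens: int = 350) -> list[str]:
    """Greedily pack paragraphs into chunks of roughly target_tokens.

    Uses a rough ~0.75 words-per-token heuristic instead of a real
    tokenizer, so no model download is needed. Illustrative sketch,
    not the project's actual chunker.
    """
    target_words = int(target_tokens * 0.75)
    chunks, current, count = [], [], 0
    for para in (p.strip() for p in text.split("\n\n") if p.strip()):
        words = len(para.split())
        # Start a new chunk if adding this paragraph would overshoot.
        if current and count + words > target_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Packing whole paragraphs (rather than cutting mid-sentence) keeps each memory self-contained, which matters once the chunks are retrieved in isolation.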
Memories are stored in layers:

- **Book summary** — one top-level memory for the whole book
- **Chapter summaries** — one memory per detected chapter
- **Chunks** — the raw ~350-token chunks underneath
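As a sketch, a layered memory could be shaped and POSTed to the mem0 server like this. The `messages` / `agent_id` / `metadata` fields follow mem0's REST API, but the metadata keys (`layer`, `book`, `chapter`) are illustrative assumptions, not a schema documented by this project:

```python
import json
import urllib.request

MEM0_BASE_URL = "http://192.168.0.200:8420"  # from .env
MEM0_AGENT_ID = "knowledge_base"

def build_memory_payload(text: str, layer: str, **metadata) -> dict:
    """Shape one layered memory for POST /memories.

    `layer` ("book_summary" | "chapter_summary" | "chunk") and the
    extra metadata keys are illustrative, not a fixed mem0 schema.
    """
    return {
        "messages": [{"role": "user", "content": text}],
        "agent_id": MEM0_AGENT_ID,
        "metadata": {"layer": layer, **metadata},
    }

def post_memory(payload: dict) -> None:
    # Roughly what mem0_writer.py would do for each layer.
    req = urllib.request.Request(
        f"{MEM0_BASE_URL}/memories",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req)  # raises on HTTP errors
```

Storing the layer name in metadata lets the agent (or a later cleanup job) filter summaries from raw chunks at query time.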
```bash
git clone https://github.com/yourname/book-ingestor.git
cd book-ingestor
cp .env.example .env          # fill in your values
docker compose up -d --build
```
Watch logs:

```bash
docker compose logs -f
```

Stop / restart:

```bash
docker compose down
docker compose up -d
```
If a PDF gets stuck in `books/processing/` after an interrupted run:

```bash
mv books/processing/*.pdf books/inbox/
docker compose restart
```
```bash
git clone https://github.com/yourname/book-ingestor.git
cd book-ingestor
cp .env.example .env          # fill in your values
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
python -m book_ingestor.watchdog_runner
```
Drop a PDF into `books/inbox/` and watch it get ingested.
All config lives in `.env`:

```env
MEM0_BASE_URL=http://192.168.0.200:8420
MEM0_AGENT_ID=knowledge_base
GROQ_API_KEY=your_groq_key_here
GROQ_MODEL=meta-llama/llama-4-scout-17b-16e-instruct
BOOKS_INBOX=./books/inbox
BOOKS_PROCESSING=./books/processing
BOOKS_DONE=./books/done
BOOKS_MANIFESTS=./books/manifests
CHUNK_SIZE_TOKENS=350
LOG_LEVEL=INFO
```
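A sketch of how `config.py` might surface these values via stdlib `os.getenv` (the project's actual config handling may differ; the defaults below just mirror `.env.example`):

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    """Typed view of the .env configuration (illustrative sketch)."""
    mem0_base_url: str
    mem0_agent_id: str
    groq_api_key: str
    chunk_size_tokens: int
    books_inbox: str
    log_level: str

def load_settings() -> Settings:
    # Defaults mirror .env.example; the real config.py may differ.
    return Settings(
        mem0_base_url=os.getenv("MEM0_BASE_URL", "http://localhost:8420"),
        mem0_agent_id=os.getenv("MEM0_AGENT_ID", "knowledge_base"),
        groq_api_key=os.getenv("GROQ_API_KEY", ""),
        chunk_size_tokens=int(os.getenv("CHUNK_SIZE_TOKENS", "350")),
        books_inbox=os.getenv("BOOKS_INBOX", "./books/inbox"),
        log_level=os.getenv("LOG_LEVEL", "INFO"),
    )
```

A frozen dataclass keeps the config immutable and gives every module one typed object to import instead of scattered `os.getenv` calls.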
```
book-ingestor/
├── books/
│   ├── inbox/        ← drop zone (watched)
│   ├── processing/   ← in-flight (do not touch)
│   ├── done/         ← archived originals
│   └── manifests/    ← JSON record per ingested book
├── book_ingestor/
│   ├── watchdog_runner.py
│   ├── pipeline.py
│   ├── detector.py
│   ├── chunker.py
│   ├── summarizer.py
│   ├── mem0_writer.py
│   ├── manifest.py
│   └── config.py
├── Dockerfile
├── docker-compose.yml
├── .env.example
├── requirements.txt
├── PROJECT.md
└── README.md
```
| Format | Status |
|---|---|
| PDF (text-based) | ✅ |
| PDF (scanned/image) | 🔜 (OCR planned) |
| Markdown (.md) | 🔜 |
| Plain text (.txt) | 🔜 |
| EPUB | 🔜 |
The `books/` folder is mounted into the container, so PDFs, manifests, and archives survive restarts and rebuilds.

`network_mode: host` is used so the container can reach your LAN mem0 server without extra networking config.

License: MIT