I looked into PageIndex, a RAG system that builds hierarchical document trees using only LLM reasoning, without chunking or vector databases. I also considered how it fits with layout detection and OCR pipelines.
I am bringing in a Mac mini M4 Pro to build an in-house RAG system. While waiting for the specs to be finalized, I plan to use the downtime to set up a LoRA training environment.
When Layout Parser wouldn't install and NDLOCR alone couldn't handle a 4-column vertical text book, I used PyMuPDF and histogram analysis to brute-force split the columns.
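The histogram idea can be sketched roughly as follows. This is a minimal illustration, not the code from the article: it assumes the page has already been rendered (for example with PyMuPDF's `page.get_pixmap()`) and binarized into a 2D list of 0/1 values, where 1 marks ink. The function names `projection_profile` and `find_bands`, and the `threshold` and `min_gap` parameters, are hypothetical. Whether to split along x or y depends on how the book lays out its columns, so the axis is left as a parameter.

```python
def projection_profile(mask, axis):
    """Sum ink pixels of a 2D 0/1 mask along one axis.

    axis=0 gives one total per pixel column (profile along x);
    axis=1 gives one total per pixel row (profile along y).
    """
    if axis == 0:
        return [sum(col) for col in zip(*mask)]
    return [sum(row) for row in mask]


def find_bands(profile, threshold=0, min_gap=2):
    """Return (start, end) ranges, end exclusive, where profile > threshold.

    Bands separated by a gap narrower than min_gap are merged, so thin
    gaps between characters are not mistaken for column gutters.
    """
    # Pass 1: collect every run of positions that contain ink.
    bands = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold:
            if start is None:
                start = i
        elif start is not None:
            bands.append((start, i))
            start = None
    if start is not None:
        bands.append((start, len(profile)))

    # Pass 2: merge runs whose separating gap is too narrow to be a gutter.
    merged = []
    for s, e in bands:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged
```

Each returned range could then be used as a crop box, with each cropped band passed to OCR separately. In practice `threshold` and `min_gap` would need tuning per scan, since noise and ruled lines can fill in the gutters.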