Zhipu AI's GLM-OCR reaches 94.62% on OmniDocBench v1.5 despite using only 0.9B parameters. I dug into its layout parsing, vertical text handling, and math recognition.
Baidu's PaddleOCR-VL-1.5 reaches 94.5% accuracy on OmniDocBench v1.5 with just 0.9B parameters, surpassing much larger models such as GPT-4o and Qwen2.5-VL-72B.
Explains the difference between conventional OCR and VLM-based (vision-language model) OCR, introduces DeepSeek-OCR, and explores the possibility of combining the two approaches.