- Refactored OCR page recognition to use a configured Tesseract worker.
- Added functions to manage font URLs and asset filenames based on language.
- Implemented language availability checks and error handling for unsupported languages.
- Enhanced the PDF workflow to display the available OCR languages and handle the user's selection.
- Introduced utility functions for resolving Tesseract asset configurations.
- Added tests for OCR functionality, font loading, and Tesseract runtime behavior.
- Updated global types to include environment variables for Tesseract and font configurations.
- Created a new `hocr-transform.ts` utility for parsing hOCR output.
- Added line-aware text processing with baseline and rotation support.
- Implemented width-based font size calculation so rendered text matches each word's bounding box.
- Fixed an issue where text selection did not cover full characters.
- Added type definitions for `OcrLine`, `OcrPage`, and `WordTransform`.
- Added support for RTL languages and CJK word-break handling.
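The language availability check and the error for unsupported languages could look roughly like the sketch below. It assumes the requested value uses Tesseract's `+`-joined multi-language syntax (for example `eng+deu`) and that the set of available codes is known up front; `resolveLanguages` and `UnsupportedLanguageError` are hypothetical names, not the ones in this change.

```typescript
// Sketch only: the error class and function names are hypothetical.
class UnsupportedLanguageError extends Error {
  readonly missing: string[];

  constructor(missing: string[]) {
    super(`Unsupported OCR language(s): ${missing.join(", ")}`);
    this.name = "UnsupportedLanguageError";
    this.missing = missing;
  }
}

// Tesseract accepts several languages joined with "+", e.g. "eng+deu".
function resolveLanguages(requested: string, available: ReadonlySet<string>): string[] {
  const langs = requested
    .split("+")
    .map((lang) => lang.trim())
    .filter((lang) => lang.length > 0);
  const missing = langs.filter((lang) => !available.has(lang));
  if (langs.length === 0 || missing.length > 0) {
    // Report the whole request when it is empty, otherwise only the unknown codes.
    throw new UnsupportedLanguageError(langs.length === 0 ? [requested] : missing);
  }
  return langs;
}
```

Checking before any worker is created means an unknown code fails with a message that names it, instead of surfacing later as a generic load error.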
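Choosing a font asset by language and building its URL from a configurable base might be sketched as follows. The language-to-group table, the file names (`cjk-font.otf` and so on), and the idea that the base URL comes from an environment-provided setting are all placeholders for illustration, not the project's real assets or configuration keys.

```typescript
// Hypothetical sketch: file names and groupings are placeholders, not the project's assets.
const FONT_BY_GROUP: Record<string, string> = {
  cjk: "cjk-font.otf",
  arabic: "arabic-font.ttf",
  default: "latin-font.ttf",
};

// Tesseract language codes mapped to the font group whose glyphs they need.
const LANGUAGE_GROUP: Record<string, string> = {
  chi_sim: "cjk",
  chi_tra: "cjk",
  jpn: "cjk",
  kor: "cjk",
  ara: "arabic",
  fas: "arabic",
};

function fontFileFor(lang: string): string {
  const group = LANGUAGE_GROUP[lang] ?? "default";
  return FONT_BY_GROUP[group] ?? FONT_BY_GROUP["default"];
}

// Join base and file name without doubling the slash between them.
function fontUrlFor(lang: string, baseUrl: string): string {
  return `${baseUrl.replace(/\/+$/, "")}/${fontFileFor(lang)}`;
}
```

Grouping languages by script keeps the table small: several languages that share a script can share one font file.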
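For the hOCR parsing and the baseline/rotation support, a minimal sketch is shown below. It relies on the hOCR convention that geometry lives in the `title` attribute as `;`-separated properties (for example `bbox 36 92 618 184; baseline 0.015 -18`), where `baseline` gives a slope and an offset relative to the bottom-left corner of the line's bbox and `textangle` gives a rotation in degrees. The interfaces and function names are hypothetical and do not reflect the actual `OcrLine` or `WordTransform` shapes.

```typescript
// Sketch only: names and shapes are hypothetical, not the hocr-transform.ts API.
interface Bbox {
  x0: number;
  y0: number;
  x1: number;
  y1: number;
}

interface LineGeometry {
  bbox: Bbox;
  baselineSlope: number;  // rise per pixel along the line
  baselineOffset: number; // relative to the bottom-left corner of the bbox
  textAngle: number;      // degrees, from the optional `textangle` property
}

// Split "key v1 v2; key2 v3" into a map of key -> raw tokens.
function parseTitle(title: string): Map<string, string[]> {
  const props = new Map<string, string[]>();
  for (const part of title.split(";")) {
    const tokens = part.trim().split(/\s+/);
    if (tokens[0]) props.set(tokens[0], tokens.slice(1));
  }
  return props;
}

function parseLineGeometry(title: string): LineGeometry | undefined {
  const props = parseTitle(title);
  const box = props.get("bbox")?.map(Number);
  if (!box || box.length !== 4 || box.some(Number.isNaN)) return undefined;
  const [x0, y0, x1, y1] = box;
  // Missing baseline/textangle fall back to a flat, unrotated line.
  const [slope = 0, offset = 0] = (props.get("baseline") ?? []).map(Number);
  const angle = Number(props.get("textangle")?.[0] ?? 0);
  return { bbox: { x0, y0, x1, y1 }, baselineSlope: slope, baselineOffset: offset, textAngle: angle };
}

// Baseline y at horizontal position x, in image coordinates (y grows downward).
function baselineYAt(line: LineGeometry, x: number): number {
  return line.bbox.y1 + line.baselineOffset + line.baselineSlope * (x - line.bbox.x0);
}
```

Placing each word on the line's baseline, rather than on the bottom of its own bbox, keeps descenders from pushing individual words out of alignment.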
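The width-based font size calculation rests on the fact that a string's advance width scales linearly with font size, so the size that makes a word fill its bounding box is the box width divided by the word's width at size 1. In the sketch below, `measureAtSize1` is a hypothetical stand-in for whatever width function the embedded font provides, and the fallback size is an assumption for empty or degenerate boxes.

```typescript
// Hypothetical sketch: `measureAtSize1` stands in for the real font's width function.
type MeasureFn = (text: string) => number; // advance width of `text` at font size 1

function fontSizeForWidth(
  text: string,
  targetWidth: number,
  measureAtSize1: MeasureFn,
  fallbackSize: number,
): number {
  const unitWidth = measureAtSize1(text);
  // Degenerate input (empty text, zero-width box): keep a sane default.
  if (unitWidth <= 0 || targetWidth <= 0) return fallbackSize;
  // width(size) = size * unitWidth, so solve size * unitWidth = targetWidth.
  return targetWidth / unitWidth;
}
```

With a monospace measure of 0.5 units per character, `fontSizeForWidth("hello", 50, (t) => t.length * 0.5, 12)` returns 20. Sizing by width rather than by bbox height is what lets the invisible text layer span each word exactly, which is the behaviour the selection fix depends on.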
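One plausible reading of the CJK word-break handling is shown below: when hOCR words are joined into line text, no space is inserted between two adjacent Chinese or Japanese characters, since those scripts do not separate words with spaces. Korean (Hangul) is deliberately left out because it does use spaces. The code-point ranges cover only the main Unicode blocks, which is a simplifying assumption, and `joinWords` is a hypothetical name.

```typescript
// Main Unicode blocks only (an assumption; extension blocks and CJK punctuation are not covered).
const NO_SPACE_RANGES: ReadonlyArray<readonly [number, number]> = [
  [0x3040, 0x309f], // Hiragana
  [0x30a0, 0x30ff], // Katakana
  [0x3400, 0x4dbf], // CJK Unified Ideographs Extension A
  [0x4e00, 0x9fff], // CJK Unified Ideographs
];

function isNoSpaceChar(ch: string): boolean {
  const cp = ch.codePointAt(0);
  return cp !== undefined && NO_SPACE_RANGES.some(([lo, hi]) => cp >= lo && cp <= hi);
}

// Join word texts, omitting the separator when both sides of the boundary are CJK.
function joinWords(words: readonly string[]): string {
  let out = "";
  let lastChar: string | undefined;
  for (const word of words) {
    const chars = Array.from(word); // iterate by code point, not UTF-16 unit
    if (chars.length === 0) continue;
    if (lastChar !== undefined && !(isNoSpaceChar(lastChar) && isNoSpaceChar(chars[0]))) {
      out += " ";
    }
    out += word;
    lastChar = chars[chars.length - 1];
  }
  return out;
}
```

Mixed lines still get a space at script boundaries, so `joinWords(["中文", "OCR"])` yields `"中文 OCR"`.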