feat: enhance sanitization

This commit is contained in:
alam00000
2026-04-17 23:40:24 +05:30
parent d92ee1a003
commit b4779bb49b
35 changed files with 2703 additions and 1240 deletions

View File

@@ -95,8 +95,8 @@ docker run -d -p 3000:8080 bentopdf:custom
| `SIMPLE_MODE` | Build without LibreOffice tools | `false` |
| `BASE_URL` | Deploy to subdirectory | `/` |
| `VITE_WASM_PYMUPDF_URL` | PyMuPDF WASM module URL | `https://cdn.jsdelivr.net/npm/@bentopdf/pymupdf-wasm@0.11.16/` |
| `VITE_WASM_GS_URL` | Ghostscript WASM module URL | `https://cdn.jsdelivr.net/npm/@bentopdf/gs-wasm/assets/` |
| `VITE_WASM_CPDF_URL` | CoherentPDF WASM module URL | `https://cdn.jsdelivr.net/npm/coherentpdf/dist/` |
| `VITE_WASM_GS_URL` | Ghostscript WASM module URL | `https://cdn.jsdelivr.net/npm/@bentopdf/gs-wasm@0.1.1/assets/` |
| `VITE_WASM_CPDF_URL` | CoherentPDF WASM module URL | `https://cdn.jsdelivr.net/npm/coherentpdf@2.5.5/dist/` |
| `VITE_TESSERACT_WORKER_URL` | OCR worker script URL | _(empty; use Tesseract.js default CDN)_ |
| `VITE_TESSERACT_CORE_URL` | OCR core runtime directory | _(empty; use Tesseract.js default CDN)_ |
| `VITE_TESSERACT_LANG_URL` | OCR traineddata directory | _(empty; use Tesseract.js default CDN)_ |
@@ -110,6 +110,16 @@ docker run -d -p 3000:8080 bentopdf:custom
WASM module URLs are pre-configured with CDN defaults — all advanced features work out of the box. Override these for air-gapped or self-hosted deployments.
### Content-Security-Policy
The nginx image ships with an enforcing `Content-Security-Policy` header. The CSP's `script-src`, `connect-src`, and `font-src` directives are **generated at build time** from the `VITE_*` URL variables above — whatever hosts you pass via `--build-arg` are automatically added to the policy.
As a result:
- If you override `VITE_CORS_PROXY_URL` or `VITE_WASM_*_URL` at build time, the CSP permits those origins automatically — no extra config needed.
- If you configure custom WASM URLs at _runtime_ via the in-app Advanced Settings page, those origins are **not** in the CSP and the browser will block fetches to them. Runtime configuration is intended for experimentation; for permanent custom URLs set the matching `VITE_*` build arg.
- Air-gapped deployments that override all three `VITE_WASM_*_URL` values also get the public `cdn.jsdelivr.net` removed from CSP (each default is replaced, not appended). Similarly, setting `VITE_CORS_PROXY_URL` replaces the public `bentopdf-cors-proxy.bentopdf.workers.dev` default.
For OCR, leave the `VITE_TESSERACT_*` variables empty to use the default online assets, or set all three together for self-hosted/offline OCR. Partial OCR overrides are rejected because the worker, core runtime, and traineddata directory must match. For fully offline searchable PDF output, also set `VITE_OCR_FONT_BASE_URL` so the OCR text-layer fonts are loaded from your internal server instead of the public Noto font URLs.
`VITE_DEFAULT_LANGUAGE` sets the UI language for first-time visitors. Supported values: `en`, `ar`, `be`, `fr`, `de`, `es`, `zh`, `zh-TW`, `vi`, `tr`, `id`, `it`, `pt`, `nl`, `da`. Users can still switch languages — this only changes the default.