feat: enhance sanitization

This commit is contained in:
alam00000
2026-04-17 23:40:24 +05:30
parent d92ee1a003
commit b4779bb49b
35 changed files with 2703 additions and 1240 deletions

View File

@@ -95,8 +95,8 @@ docker run -d -p 3000:8080 bentopdf:custom
| `SIMPLE_MODE` | Build without LibreOffice tools | `false` |
| `BASE_URL` | Deploy to subdirectory | `/` |
| `VITE_WASM_PYMUPDF_URL` | PyMuPDF WASM module URL | `https://cdn.jsdelivr.net/npm/@bentopdf/pymupdf-wasm@0.11.16/` |
| `VITE_WASM_GS_URL` | Ghostscript WASM module URL | `https://cdn.jsdelivr.net/npm/@bentopdf/gs-wasm/assets/` |
| `VITE_WASM_CPDF_URL` | CoherentPDF WASM module URL | `https://cdn.jsdelivr.net/npm/coherentpdf/dist/` |
| `VITE_WASM_GS_URL` | Ghostscript WASM module URL | `https://cdn.jsdelivr.net/npm/@bentopdf/gs-wasm@0.1.1/assets/` |
| `VITE_WASM_CPDF_URL` | CoherentPDF WASM module URL | `https://cdn.jsdelivr.net/npm/coherentpdf@2.5.5/dist/` |
| `VITE_TESSERACT_WORKER_URL` | OCR worker script URL | _(empty; use Tesseract.js default CDN)_ |
| `VITE_TESSERACT_CORE_URL` | OCR core runtime directory | _(empty; use Tesseract.js default CDN)_ |
| `VITE_TESSERACT_LANG_URL` | OCR traineddata directory | _(empty; use Tesseract.js default CDN)_ |
@@ -110,6 +110,16 @@ docker run -d -p 3000:8080 bentopdf:custom
WASM module URLs are pre-configured with CDN defaults — all advanced features work out of the box. Override these for air-gapped or self-hosted deployments.
### Content-Security-Policy
The nginx image ships with an enforcing `Content-Security-Policy` header. The CSP's `script-src`, `connect-src`, and `font-src` directives are **generated at build time** from the `VITE_*` URL variables above — whatever hosts you pass via `--build-arg` are automatically added to the policy.
As a result:
- If you override `VITE_CORS_PROXY_URL` or `VITE_WASM_*_URL` at build time, the CSP permits those origins automatically — no extra config needed.
- If you configure custom WASM URLs at _runtime_ via the in-app Advanced Settings page, those origins are **not** in the CSP and the browser will block fetches to them. Runtime configuration is intended for experimentation; for permanent custom URLs set the matching `VITE_*` build arg.
- Air-gapped deployments that override all three `VITE_WASM_*_URL` values also get the public `cdn.jsdelivr.net` removed from CSP (each default is replaced, not appended). Similarly, setting `VITE_CORS_PROXY_URL` replaces the public `bentopdf-cors-proxy.bentopdf.workers.dev` default.
For OCR, leave the `VITE_TESSERACT_*` variables empty to use the default online assets, or set all three together for self-hosted/offline OCR. Partial OCR overrides are rejected because the worker, core runtime, and traineddata directory must match. For fully offline searchable PDF output, also set `VITE_OCR_FONT_BASE_URL` so the OCR text-layer fonts are loaded from your internal server instead of the public Noto font URLs.
`VITE_DEFAULT_LANGUAGE` sets the UI language for first-time visitors. Supported values: `en`, `ar`, `be`, `fr`, `de`, `es`, `zh`, `zh-TW`, `vi`, `tr`, `id`, `it`, `pt`, `nl`, `da`. Users can still switch languages — this only changes the default.

View File

@@ -203,8 +203,8 @@ These are set in `.env.production` and baked into the build:
```bash
VITE_WASM_PYMUPDF_URL=https://cdn.jsdelivr.net/npm/@bentopdf/pymupdf-wasm@0.11.16/
VITE_WASM_GS_URL=https://cdn.jsdelivr.net/npm/@bentopdf/gs-wasm/assets/
VITE_WASM_CPDF_URL=https://cdn.jsdelivr.net/npm/coherentpdf/dist/
VITE_WASM_GS_URL=https://cdn.jsdelivr.net/npm/@bentopdf/gs-wasm@0.1.1/assets/
VITE_WASM_CPDF_URL=https://cdn.jsdelivr.net/npm/coherentpdf@2.5.5/dist/
VITE_TESSERACT_WORKER_URL=
VITE_TESSERACT_CORE_URL=
VITE_TESSERACT_LANG_URL=

View File

@@ -26,6 +26,31 @@ The tool extracts PKCS#7 signature objects from the PDF, decodes the ASN.1 struc
- **Self-signed detection**: Is the certificate its own issuer?
- **Trust chain**: When a trusted certificate is provided, does the signer's certificate chain back to it?
### Cryptographic Verification
- The tool reconstructs the bytes covered by the signature's `/ByteRange`, hashes them with the algorithm the signer declared, and compares the result against the `messageDigest` attribute inside the signature.
- It then re-serializes the signed attributes as a DER SET and verifies the signature against the signer certificate's public key.
- **A signature is reported as valid only when all of these pass.** If the PDF bytes were modified after signing, or the embedded signature does not match the signer's key, the status shows "Invalid — Cryptographic Verification Failed" with the specific reason.
#### Supported Signature Algorithms
| Algorithm | Verification path |
| ----------------------- | -------------------------------------------------------------------------------------- |
| RSA (PKCS#1 v1.5) | node-forge `publicKey.verify`, with Web Crypto fallback |
| RSA-PSS (RSASSA-PSS) | Web Crypto `verify({name: 'RSA-PSS', saltLength})` |
| ECDSA P-256/P-384/P-521 | Web Crypto `verify({name: 'ECDSA', hash})` after DER → IEEE P1363 signature conversion |
If a signature uses an algorithm outside this list (for example Ed25519, SM2, or RSA with an unusual digest OID), the card shows **"Unverified — Unsupported Signature Algorithm"** in yellow, along with the specific OID or reason. This is a deliberate three-state distinction:
- **Valid** — signature cryptographically verified against the signer's public key.
- **Invalid** — verification ran and produced a negative result (bytes changed, key mismatch).
- **Unverified** — the tool could not run verification for this algorithm. The certificate metadata is still shown, but you should treat the signature as "unknown" and verify it with Adobe Acrobat or `openssl cms -verify`.
### Insecure Digest Algorithms
- Signatures using **MD5 or SHA-1** are rejected as invalid and flagged with "Insecure Digest" status. Both algorithms have published collision attacks, so a signature over a SHA-1 or MD5 hash offers no integrity guarantee.
- SHA-224, SHA-256, SHA-384, and SHA-512 are all accepted.
### Document Coverage
- **Full coverage**: The signature covers the entire PDF file, meaning no bytes were added or changed after signing.