feat: separate AGPL libraries and add dynamic WASM loading

- Add WASM settings page for configuring external AGPL modules
- Implement dynamic loading for PyMuPDF, Ghostscript, and CoherentPDF
- Add Cloudflare Worker proxy for serving WASM files with CORS
- Update all affected tool pages to check WASM availability
- Add showWasmRequiredDialog for missing module configuration

Documentation:
- Update README, licensing.html, and docs to clarify AGPL components
  are not bundled and must be configured separately
- Add WASM-PROXY.md deployment guide with recommended source URLs
- Rename "CPDF" to "CoherentPDF" for consistency
alam00000
2026-01-27 15:26:11 +05:30
parent f6d432eaa7
commit 2c85ca74e9
75 changed files with 9696 additions and 6587 deletions

cloudflare/WASM-PROXY.md

@@ -0,0 +1,92 @@
# WASM Proxy Setup Guide
BentoPDF uses a Cloudflare Worker to proxy WASM library requests, bypassing CORS restrictions when loading AGPL-licensed components (PyMuPDF, Ghostscript, CoherentPDF) from external sources.
## Quick Start
### 1. Deploy the Worker
```bash
cd cloudflare
npx wrangler login
npx wrangler deploy -c wasm-wrangler.toml
```
### 2. Configure Source URLs
Set environment secrets with the base URLs for your WASM files:
```bash
# Option A: Interactive prompts
npx wrangler secret put PYMUPDF_SOURCE -c wasm-wrangler.toml
npx wrangler secret put GS_SOURCE -c wasm-wrangler.toml
npx wrangler secret put CPDF_SOURCE -c wasm-wrangler.toml
# Option B: Set via Cloudflare Dashboard
# Go to Workers & Pages > bentopdf-wasm-proxy > Settings > Variables
```
**Recommended Source URLs:**
- PYMUPDF_SOURCE: `https://cdn.jsdelivr.net/npm/@bentopdf/pymupdf-wasm@0.1.9/`
- GS_SOURCE: `https://cdn.jsdelivr.net/npm/@bentopdf/gs-wasm/assets/`
- CPDF_SOURCE: `https://cdn.jsdelivr.net/npm/coherentpdf/dist/`
> **Note:** You can use your own hosted WASM files instead of the recommended URLs. Ensure your files match the directory structure and file names that BentoPDF expects for each module.
### 3. Configure BentoPDF
In BentoPDF's Advanced Settings (wasm-settings.html), enter:
| Module | URL |
| ----------- | ------------------------------------------------------------------- |
| PyMuPDF | `https://bentopdf-wasm-proxy.<your-subdomain>.workers.dev/pymupdf/` |
| Ghostscript | `https://bentopdf-wasm-proxy.<your-subdomain>.workers.dev/gs/` |
| CoherentPDF | `https://bentopdf-wasm-proxy.<your-subdomain>.workers.dev/cpdf/` |
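Before entering these URLs, it can help to confirm that the worker sees all three source secrets. The sketch below reads the JSON returned by the worker's `/` health endpoint and lists any module whose `configured` flag is false. The helper names `unconfiguredModules` and `checkProxy` are illustrative and not part of BentoPDF; `proxyUrl` stands in for your deployed worker URL.

```javascript
// Hypothetical helper: list the modules the proxy reports as not configured.
// `health` is the JSON body returned by the worker's `/` endpoint.
function unconfiguredModules(health) {
  const configured = (health && health.configured) || {};
  return ['pymupdf', 'gs', 'cpdf'].filter((name) => !configured[name]);
}

// Usage sketch: fetch the health endpoint and report missing sources.
// `proxyUrl` is a placeholder, e.g. your workers.dev URL.
async function checkProxy(proxyUrl) {
  const response = await fetch(proxyUrl);
  if (!response.ok) {
    throw new Error(`Health check failed with status ${response.status}`);
  }
  return unconfiguredModules(await response.json());
}
```

An empty array means every module has a source URL set; any listed name still needs its `*_SOURCE` secret.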
## Custom Domain (Optional)
To use a custom domain like `wasm.bentopdf.com`:
1. Add route in `wasm-wrangler.toml`:
   ```toml
   routes = [
     { pattern = "wasm.bentopdf.com/*", zone_name = "bentopdf.com" }
   ]
   ```
2. Add DNS record in Cloudflare:
   - Type: AAAA
   - Name: wasm
   - Content: 100::
   - Proxied: Yes
3. Redeploy:
   ```bash
   npx wrangler deploy -c wasm-wrangler.toml
   ```
## Security Features
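- **Origin validation**: Rejects requests whose `Origin` header is not in `ALLOWED_ORIGINS`; requests without an `Origin` header (e.g., direct browser navigation) are allowed
- **Rate limiting**: 100 requests/minute per IP (only active when the `RATE_LIMIT_KV` namespace is bound)
- **File type restrictions**: Only WASM-related files (.js, .wasm, .data, etc.)
- **Size limits**: Max 100MB per file
- **Caching**: Responses are cached for 7 days (`CACHE_TTL_SECONDS = 604800`), which reduces origin requests and improves performance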
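The rate limit is a sliding window over request timestamps stored per IP. A minimal sketch of that check as a pure function follows; the name `checkRateLimit` and its return shape are illustrative and do not appear in the worker, which performs the same filtering inline against KV.

```javascript
const WINDOW_MS = 60 * 1000; // same window as the worker
const MAX_REQUESTS = 100; // same limit as the worker

// Hypothetical helper mirroring the worker's sliding-window logic.
// `timestamps` holds the millisecond times of earlier requests from one IP.
function checkRateLimit(timestamps, now, windowMs = WINDOW_MS, max = MAX_REQUESTS) {
  // Drop requests that have aged out of the window
  const recent = timestamps.filter((t) => now - t < windowMs);
  if (recent.length >= max) {
    return { allowed: false, timestamps: recent };
  }
  // Record the current request so it counts against later ones
  return { allowed: true, timestamps: [...recent, now] };
}
```

Because the counter is read and written back as two separate KV operations, concurrent requests can slip past the limit, so the figure is best treated as approximate.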
## Self-Hosting Notes
1. Update `ALLOWED_ORIGINS` in `wasm-proxy-worker.js` to include your domain
2. Host your WASM files on any origin (R2, S3, or any CDN)
3. Set source URLs as secrets in your worker
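For step 1, a self-hosted origin list might look like the sketch below. The domain `https://pdf.example.com` is a placeholder. The check shown compares whole origin strings, which is one way to avoid accepting look-alike hosts that merely start with an allowed origin; treat it as an illustration rather than the exact code the worker ships with.

```javascript
// Worker defaults plus one hypothetical self-hosted origin
const ALLOWED_ORIGINS = [
  'https://www.bentopdf.com',
  'https://bentopdf.com',
  'https://pdf.example.com', // placeholder: replace with your own domain
];

function isAllowedOrigin(origin) {
  // Requests without an Origin header (e.g. direct navigation) pass through
  if (!origin) return true;
  // Exact comparison after trimming a trailing slash
  return ALLOWED_ORIGINS.includes(origin.replace(/\/$/, ''));
}
```

Origins are scheme plus host plus optional port, so `http://` and `https://` variants of the same domain need separate entries.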
## Endpoints
| Endpoint         | Description                            |
| ---------------- | -------------------------------------- |
| `/` or `/health` | Health check, shows configured modules |
| `/pymupdf/*`     | PyMuPDF WASM files                     |
| `/gs/*`          | Ghostscript WASM files                 |
| `/cpdf/*`        | CoherentPDF files                      |
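Each module endpoint strips its prefix and appends the remaining path to the matching source URL. The sketch below shows that mapping as a standalone function; `ROUTES` and `resolveTarget` are illustrative names, and the example source URLs in the usage comment are placeholders.

```javascript
// Prefix-to-secret mapping, matching the endpoints in the table above
const ROUTES = [
  { prefix: '/pymupdf/', envKey: 'PYMUPDF_SOURCE' },
  { prefix: '/gs/', envKey: 'GS_SOURCE' },
  { prefix: '/cpdf/', envKey: 'CPDF_SOURCE' },
];

// Hypothetical helper: returns the upstream URL for a request path,
// or null when no route matches or the source is not configured.
// (The worker itself answers 404 and 503 respectively in those two cases.)
function resolveTarget(pathname, env) {
  const route = ROUTES.find((r) => pathname.startsWith(r.prefix));
  if (!route) return null;
  const base = env[route.envKey];
  if (!base) return null;
  // Keep the leading slash of the remainder, drop any trailing slash on the base
  const subpath = pathname.slice(route.prefix.length - 1);
  return base.replace(/\/$/, '') + subpath;
}

// Usage sketch with a placeholder source:
// resolveTarget('/gs/gs.wasm', { GS_SOURCE: 'https://cdn.example.com/gs/' })
```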


@@ -0,0 +1,356 @@
/**
* BentoPDF WASM Proxy Worker
*
* This Cloudflare Worker proxies WASM module requests to bypass CORS restrictions.
* It fetches WASM libraries (PyMuPDF, Ghostscript, CoherentPDF) from configured sources
* and serves them with proper CORS headers.
*
* Endpoints:
* - /pymupdf/* - Proxies to PyMuPDF WASM source
* - /gs/* - Proxies to Ghostscript WASM source
* - /cpdf/* - Proxies to CoherentPDF WASM source
*
* Deploy: cd cloudflare && npx wrangler deploy -c wasm-wrangler.toml
*
* Required Environment Variables (set in the Cloudflare dashboard or with `wrangler secret put`):
* - PYMUPDF_SOURCE: Base URL for PyMuPDF WASM files (e.g., https://cdn.example.com/pymupdf)
* - GS_SOURCE: Base URL for Ghostscript WASM files (e.g., https://cdn.example.com/gs)
* - CPDF_SOURCE: Base URL for CoherentPDF files (e.g., https://cdn.example.com/cpdf)
*/
const ALLOWED_ORIGINS = ['https://www.bentopdf.com', 'https://bentopdf.com'];
const MAX_FILE_SIZE_BYTES = 100 * 1024 * 1024;
const RATE_LIMIT_MAX_REQUESTS = 100;
const RATE_LIMIT_WINDOW_MS = 60 * 1000;
const CACHE_TTL_SECONDS = 604800;
const ALLOWED_EXTENSIONS = [
'.js',
'.mjs',
'.wasm',
'.data',
'.py',
'.so',
'.zip',
'.json',
'.mem',
'.asm.js',
'.worker.js',
'.html',
];
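function isAllowedOrigin(origin) {
  if (!origin) return true; // Allow no-origin requests (e.g., direct browser navigation)
  // Compare whole origins: a prefix match would also accept look-alike hosts
  // such as https://bentopdf.com.evil.example
  const normalized = origin.replace(/\/$/, '');
  return ALLOWED_ORIGINS.some(
    (allowed) => allowed.replace(/\/$/, '') === normalized
  );
}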
function isAllowedFile(pathname) {
const ext = pathname.substring(pathname.lastIndexOf('.')).toLowerCase();
if (ALLOWED_EXTENSIONS.includes(ext)) return true;
if (!pathname.includes('.') || pathname.endsWith('/')) return true;
return false;
}
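function corsHeaders(origin) {
  return {
    // Echo the caller's origin when present; fall back to '*' for no-origin requests
    'Access-Control-Allow-Origin': origin || '*',
    'Access-Control-Allow-Methods': 'GET, HEAD, OPTIONS',
    'Access-Control-Allow-Headers': 'Content-Type, Range, Cache-Control',
    'Access-Control-Expose-Headers':
      'Content-Length, Content-Range, Content-Type',
    'Access-Control-Max-Age': '86400',
    // The allowed origin differs per request, so downstream caches must key on Origin
    Vary: 'Origin',
  };
}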
function handleOptions(request) {
const origin = request.headers.get('Origin');
return new Response(null, {
status: 204,
headers: corsHeaders(origin),
});
}
function getContentType(pathname) {
const ext = pathname.substring(pathname.lastIndexOf('.')).toLowerCase();
const contentTypes = {
'.js': 'application/javascript',
'.mjs': 'application/javascript',
'.wasm': 'application/wasm',
'.json': 'application/json',
'.data': 'application/octet-stream',
'.py': 'text/x-python',
'.so': 'application/octet-stream',
'.zip': 'application/zip',
'.mem': 'application/octet-stream',
'.html': 'text/html',
};
return contentTypes[ext] || 'application/octet-stream';
}
async function proxyRequest(request, env, sourceBaseUrl, subpath, origin) {
if (!sourceBaseUrl) {
return new Response(
JSON.stringify({
error: 'Source not configured',
message: 'This WASM module source URL has not been configured.',
}),
{
status: 503,
headers: {
...corsHeaders(origin),
'Content-Type': 'application/json',
},
}
);
}
const normalizedBase = sourceBaseUrl.endsWith('/')
? sourceBaseUrl.slice(0, -1)
: sourceBaseUrl;
const normalizedPath = subpath.startsWith('/') ? subpath : `/${subpath}`;
const targetUrl = `${normalizedBase}${normalizedPath}`;
if (!isAllowedFile(normalizedPath)) {
return new Response(
JSON.stringify({
error: 'Forbidden file type',
message: 'Only WASM-related file types are allowed.',
}),
{
status: 403,
headers: {
...corsHeaders(origin),
'Content-Type': 'application/json',
},
}
);
}
try {
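// Cache API keys must be GET requests; build the key from the target URL only
// so HEAD requests and client headers do not break or fragment cache lookups.
const cacheKey = new Request(targetUrl, { method: 'GET' });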
const cache = caches.default;
let response = await cache.match(cacheKey);
if (!response) {
response = await fetch(targetUrl, {
headers: {
'User-Agent': 'BentoPDF-WASM-Proxy/1.0',
Accept: '*/*',
},
});
if (!response.ok) {
return new Response(
JSON.stringify({
error: 'Failed to fetch resource',
status: response.status,
statusText: response.statusText,
targetUrl: targetUrl,
}),
{
status: response.status,
headers: {
...corsHeaders(origin),
'Content-Type': 'application/json',
},
}
);
}
const contentLength = parseInt(
response.headers.get('Content-Length') || '0',
10
);
if (contentLength > MAX_FILE_SIZE_BYTES) {
return new Response(
JSON.stringify({
error: 'File too large',
message: `File exceeds maximum size of ${MAX_FILE_SIZE_BYTES / 1024 / 1024}MB`,
}),
{
status: 413,
headers: {
...corsHeaders(origin),
'Content-Type': 'application/json',
},
}
);
}
response = new Response(response.body, response);
response.headers.set(
'Cache-Control',
`public, max-age=${CACHE_TTL_SECONDS}`
);
if (response.status === 200) {
await cache.put(cacheKey, response.clone());
}
}
const bodyData = await response.arrayBuffer();
return new Response(bodyData, {
status: 200,
headers: {
...corsHeaders(origin),
'Content-Type': getContentType(normalizedPath),
'Content-Length': bodyData.byteLength.toString(),
'Cache-Control': `public, max-age=${CACHE_TTL_SECONDS}`,
'X-Proxied-From': new URL(targetUrl).hostname,
},
});
} catch (error) {
return new Response(
JSON.stringify({
error: 'Proxy error',
message: error.message,
}),
{
status: 500,
headers: {
...corsHeaders(origin),
'Content-Type': 'application/json',
},
}
);
}
}
export default {
async fetch(request, env, ctx) {
const url = new URL(request.url);
const pathname = url.pathname;
const origin = request.headers.get('Origin');
if (request.method === 'OPTIONS') {
return handleOptions(request);
}
if (!isAllowedOrigin(origin)) {
return new Response(
JSON.stringify({
error: 'Forbidden',
message:
'Origin not allowed. Add your domain to ALLOWED_ORIGINS if self-hosting.',
}),
{
status: 403,
headers: {
'Content-Type': 'application/json',
...corsHeaders(origin),
},
}
);
}
if (request.method !== 'GET' && request.method !== 'HEAD') {
return new Response('Method not allowed', {
status: 405,
headers: corsHeaders(origin),
});
}
if (env.RATE_LIMIT_KV) {
const clientIP = request.headers.get('CF-Connecting-IP') || 'unknown';
const rateLimitKey = `wasm-ratelimit:${clientIP}`;
const now = Date.now();
const rateLimitData = await env.RATE_LIMIT_KV.get(rateLimitKey, {
type: 'json',
});
const requests = rateLimitData?.requests || [];
const recentRequests = requests.filter(
(t) => now - t < RATE_LIMIT_WINDOW_MS
);
if (recentRequests.length >= RATE_LIMIT_MAX_REQUESTS) {
return new Response(
JSON.stringify({
error: 'Rate limit exceeded',
message: `Maximum ${RATE_LIMIT_MAX_REQUESTS} requests per minute.`,
}),
{
status: 429,
headers: {
...corsHeaders(origin),
'Content-Type': 'application/json',
'Retry-After': '60',
},
}
);
}
recentRequests.push(now);
await env.RATE_LIMIT_KV.put(
rateLimitKey,
JSON.stringify({ requests: recentRequests }),
{
expirationTtl: 120,
}
);
}
if (pathname.startsWith('/pymupdf/')) {
const subpath = pathname.replace('/pymupdf', '');
return proxyRequest(request, env, env.PYMUPDF_SOURCE, subpath, origin);
}
if (pathname.startsWith('/gs/')) {
const subpath = pathname.replace('/gs', '');
return proxyRequest(request, env, env.GS_SOURCE, subpath, origin);
}
if (pathname.startsWith('/cpdf/')) {
const subpath = pathname.replace('/cpdf', '');
return proxyRequest(request, env, env.CPDF_SOURCE, subpath, origin);
}
if (pathname === '/' || pathname === '/health') {
return new Response(
JSON.stringify({
service: 'BentoPDF WASM Proxy',
version: '1.0.0',
endpoints: {
pymupdf: '/pymupdf/*',
gs: '/gs/*',
cpdf: '/cpdf/*',
},
configured: {
pymupdf: !!env.PYMUPDF_SOURCE,
gs: !!env.GS_SOURCE,
cpdf: !!env.CPDF_SOURCE,
},
}),
{
status: 200,
headers: {
...corsHeaders(origin),
'Content-Type': 'application/json',
},
}
);
}
return new Response(
JSON.stringify({
error: 'Not Found',
message: 'Use /pymupdf/*, /gs/*, or /cpdf/* endpoints',
}),
{
status: 404,
headers: {
...corsHeaders(origin),
'Content-Type': 'application/json',
},
}
);
},
};


@@ -0,0 +1,69 @@
name = "bentopdf-wasm-proxy"
main = "wasm-proxy-worker.js"
compatibility_date = "2024-01-01"
# =============================================================================
# DEPLOYMENT
# =============================================================================
# Deploy this worker:
# cd cloudflare
# npx wrangler deploy -c wasm-wrangler.toml
#
# Set environment secrets (one of the following methods):
# Option A: Cloudflare Dashboard
# Go to Workers & Pages > bentopdf-wasm-proxy > Settings > Variables
# Add: PYMUPDF_SOURCE, GS_SOURCE, CPDF_SOURCE
#
# Option B: Wrangler CLI
# npx wrangler secret put PYMUPDF_SOURCE -c wasm-wrangler.toml
# npx wrangler secret put GS_SOURCE -c wasm-wrangler.toml
# npx wrangler secret put CPDF_SOURCE -c wasm-wrangler.toml
# =============================================================================
# WASM SOURCE URLS
# =============================================================================
# Set these as secrets in the Cloudflare dashboard or via wrangler:
#
# PYMUPDF_SOURCE: Base URL to PyMuPDF WASM files
# Example: https://cdn.jsdelivr.net/npm/@bentopdf/pymupdf-wasm/assets
# https://your-bucket.r2.cloudflarestorage.com/pymupdf
#
# GS_SOURCE: Base URL to Ghostscript WASM files
# Example: https://cdn.jsdelivr.net/npm/@bentopdf/gs-wasm/assets
# https://your-bucket.r2.cloudflarestorage.com/gs
#
# CPDF_SOURCE: Base URL to CoherentPDF files
# Example: https://cdn.jsdelivr.net/npm/coherentpdf/cpdf
# https://your-bucket.r2.cloudflarestorage.com/cpdf
# =============================================================================
# USAGE FROM BENTOPDF
# =============================================================================
# In BentoPDF's WASM Settings page, configure URLs like:
# PyMuPDF: https://wasm.bentopdf.com/pymupdf/
# Ghostscript: https://wasm.bentopdf.com/gs/
# CoherentPDF: https://wasm.bentopdf.com/cpdf/
# =============================================================================
# RATE LIMITING (Optional but recommended)
# =============================================================================
# Create KV namespace:
# npx wrangler kv namespace create "RATE_LIMIT_KV"
#
# Then replace the ID below with your own namespace ID.
# Use the same KV namespace as the CORS proxy if you want shared rate limiting.
[[kv_namespaces]]
binding = "RATE_LIMIT_KV"
id = "b88e030b308941118cd484e3fcb3ae49"
# =============================================================================
# CUSTOM DOMAIN (Optional)
# =============================================================================
# If you want a custom domain like wasm.bentopdf.com:
# routes = [
# { pattern = "wasm.bentopdf.com/*", zone_name = "bentopdf.com" }
# ]