Remove |safe filter from 6 template locations where data from external
search engine APIs was rendered as raw HTML without sanitization. Jinja2
autoescape now properly escapes these fields.
The |safe filter was originally added in commit 213041adc (March 2021)
by copying the pattern from result.title|safe and result.content|safe.
However, title and content are pre-escaped via escape() in webapp.py
lines 704-706 before highlight_content() adds trusted <span> tags for
search term highlighting. The metadata, info.value, link.url_label,
repository, and filename fields never go through any escaping and flow
directly from external API responses to the template.
Affected templates and their untrusted data sources:
- macros.html: result.metadata from DuckDuckGo, Reuters, Presearch,
Podcast Index, Fyyd, bpb, moviepilot, mediawiki, and others
- paper.html: result.metadata from academic search engines
- map.html: info.value and link.url_label from OpenStreetMap
user-contributed extratags
- code.html: result.repository and result.filename from GitHub API
Example exploit: a search engine API returning
metadata='<img src=x onerror=alert(document.cookie)>' would execute
arbitrary JavaScript in every user's browser viewing that result.
This patch adds a new result type: Code
- Python class: searx/result_types/code.py
- Jinja template: searx/templates/simple/result_templates/code.html
- CSS (less) client/simple/src/less/result_types/code.less
Signed-of-by: Markus Heiser <markus.heiser@darmarIT.de>
This patch adds GitHub Code Search [1] engine to allow querying the codebases.
Template code.html is changed to allow passthrough of strip and highlighting
options.
Engine Searchcode is adjusted to pass filename and not rely on hardcoded
extensions.
GitHub search code API does not return the exact code line indices, this
implementation assigns the code arbitrary numbers starting from 1
(effectively relabeling the code).
The API allows for unauth calls, and the default engine settings default to
that, although the calls are heavily rate limited.
The 'text' lexer is the default pygments lexer when parsing fails.
[1] https://docs.github.com/en/rest/search/search?apiVersion=2022-11-28#search-code
Co-authored-by: Markus Heiser <markus.heiser@darmarIT.de>