I found a bypass using the Android Google App this time. However, unlike the iPhone GSA method, this one does have rate limits, although it took a couple of hundred consecutive requests to trigger them.
* [enh] engines: rework bing engine
Only Bing-Web has been reworked.
Some features now require JavaScript (paging and time-range results).
Cookies no longer work; parameters such as `cc`, `ui`, ... alter the results.
The engine appears to properly honor only the locale from the `Accept-Language` header.
The rest of Bing's child engines (Bing-Image, Bing-Video, ...) seem to benefit
from using `mkt` param in conjunction with the `Accept-Language` header
override, although Bing-Web does not (?)
* [enh] explicit mkt
* [fix] engines: bing_videos.py
https://github.com/searxng/searxng/pull/5793#pullrequestreview-3881883250
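The locale handling described for the Bing engines above can be sketched roughly as follows. This is only an illustration: the helper `build_request`, the example URLs, and the exact `Accept-Language` value format are assumptions, not the engine's actual code.

```python
from urllib.parse import urlencode


def build_request(base_url, query, locale, use_mkt):
    """Hypothetical helper: build URL and headers for one Bing-style request."""
    language = locale.split("-")[0]
    # The locale is always signalled through the Accept-Language header.
    headers = {"Accept-Language": f"{locale},{language};q=0.9"}
    args = {"q": query}
    if use_mkt:
        # Child engines (images, videos, ...) additionally pass the market.
        args["mkt"] = locale
    return {"url": f"{base_url}?{urlencode(args)}", "headers": headers}


# Bing-Web: header only. Child engines: header plus `mkt`.
web = build_request("https://www.bing.com/search", "weather", "de-DE", use_mkt=False)
images = build_request("https://www.bing.com/images/search", "weather", "de-DE", use_mkt=True)
```

Keeping `mkt` behind a flag mirrors the observation that Bing-Web does not seem to benefit from it while the child engines do.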
Google recently changed the DOM structure for mobile-centric responses, causing the `google_videos` engine to return zero results and the main `google` engine to drop the majority of its results (due to missing snippets or failed URL parsing). These changes restore the functionality and improve the result count for both engines.
This patch updates the parsing logic for both the `google` and `google_videos` engines to handle the modern HTML structure returned by Google when using GSA (Google Search App) User-Agents.
**Specific changes include:**
* **Google Videos (`gov`)**:
  * Updated title XPath to support `role="heading"`.
  * Improved URL extraction to correctly decode Google redirectors (`/url?q=...`) using `unquote`.
  * Added support for the `WRu9Cd` class to capture publication metadata (author/date).
  * Broadened thumbnail search and added a fallback to YouTube's `hqdefault.jpg`.
* **Google Web**:
  * Relaxed the strict snippet (`content`) requirement. Valid results are no longer discarded if a snippet is missing in the mobile UI.
  * Hardened URL extraction to handle both direct and redirected URLs safely.
  * Improved thumbnail extraction by searching the entire result block.
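The redirector handling listed above could look roughly like the sketch below. The function name `extract_target_url` is hypothetical and the logic is a simplified assumption about the approach, not the code from this patch.

```python
from typing import Optional
from urllib.parse import unquote


def extract_target_url(href: str) -> Optional[str]:
    """Return the real target of a result link, or None if it is unusable."""
    prefix = "/url?q="
    if href.startswith(prefix):
        # Redirector: the target is the percent-encoded value of `q`,
        # terminated by the next query argument (if any).
        encoded = href[len(prefix):].split("&", 1)[0]
        return unquote(encoded)
    if href.startswith(("http://", "https://")):
        # Direct link: use it as-is.
        return href
    return None
```

Returning `None` for anything else lets the caller skip a result instead of raising on an unexpected link format.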
The online engines emulate a request as it would come from a web browser, which
is why the HTTP headers in the default settings should also be set the way a
standard web browser would set them.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Add support for albums containing multiple videos in iqiyi engine. When
albumInfo contains a "videos" list, process each video individually to
create separate search results for each episode/video instead of a single
result for the entire album.
Also get video length from `duration` instead of `subscriptContent`.
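A minimal sketch of the album expansion described above. Only `videos` and `duration` are taken from this message; the helper names and the `title` and `pageUrl` keys are assumptions for illustration.

```python
def to_result(item):
    """Hypothetical mapping from one API item to a search result."""
    return {
        "title": item.get("title"),    # assumed key
        "url": item.get("pageUrl"),    # assumed key
        "length": item.get("duration"),
    }


def album_to_results(album_info):
    """Return one result per video, or one for the album if it has no videos."""
    videos = album_info.get("videos") or []
    if not videos:
        return [to_result(album_info)]
    return [to_result(video) for video in videos]
```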
Signed-off-by: Hu Butui <hot123tea123@gmail.com>
For unknown locales, the return value of::
  locales.get_locale(params['searxng_locale'])
is None, which causes the following issue::
  ERROR searx.engines.presearch : exception : 'NoneType' object has no attribute 'territory'
  Traceback (most recent call last):
    File "search/processors/online.py", line 256, in search
      search_results = self._search_basic(query, params)
    File "search/processors/online.py", line 231, in _search_basic
      self.engine.request(query, params)
      ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
    File "engines/presearch.py", line 153, in request
      request_id, cookies = _get_request_id(query, params)
                            ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
    File "engines/presearch.py", line 140, in _get_request_id
      if l.territory:
         ^^^^^^^^^^^
  AttributeError: 'NoneType' object has no attribute 'territory'
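The kind of guard that avoids this error can be sketched as follows. The `Locale` class is a stand-in for the object returned by `get_locale`, and the `en-US` fallback is an assumption, not necessarily what the fix uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Locale:
    """Stand-in for the locale object; only the fields used here."""
    language: str
    territory: Optional[str] = None


def language_tag(locale: Optional[Locale], default: str = "en-US") -> str:
    """Build a language tag, tolerating an unknown (None) locale."""
    if locale is None:
        # Unknown locale: fall back instead of touching `.territory`.
        return default
    if locale.territory:
        return f"{locale.language}-{locale.territory}"
    return locale.language
```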
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Apparently, yep has been broken for a while. Measures to fix it:
- only use HTTP/1.1, because our HTTP2 client gets fingerprinted and blocked
- send the `Origin` HTTP header
For some reason, I keep getting this error from the brave engine:
httpx.DecodingError: BrotliDecoderDecompressStream failed while processing the stream
Forcing the server to use either gzip or deflate fixes this issue.
This makes the brave engine work when the server seems to be encoding brotli incorrectly, or at least in a way incompatible with certain installs.
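As a sketch, forcing the encoding comes down to overriding the `Accept-Encoding` request header so that brotli (`br`) is no longer offered. The helper name is hypothetical and the plain `dict` stands in for the engine's request headers.

```python
def without_brotli(headers):
    """Return a copy of the headers that only advertises gzip and deflate."""
    updated = dict(headers)
    # Leaving out "br" means the server must not answer with brotli.
    updated["Accept-Encoding"] = "gzip, deflate"
    return updated


request_headers = without_brotli({"Accept-Encoding": "gzip, deflate, br", "Accept": "*/*"})
```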
Related:
- https://github.com/searxng/searxng/pull/1787
- https://github.com/searxng/searxng/pull/5536
* [mod] client/simple: client plugins
Defines a new interface for client-side *"plugins"* that coexist with the
server-side plugin system. Each plugin (e.g., `InfiniteScroll`) extends the base
`ts Plugin`. Client-side plugins are independent and lazy‑loaded via `router.ts`
when their `load()` conditions are met. On each navigation request, all
applicable plugins are instantiated.
Since these are client-side plugins, we can only invoke them once the DOM is fully
loaded. E.g. `Calculator` will not render a new `answer` block until fully
loaded and executed.
For some plugins, we might want to handle their availability in `settings.yml`
and toggle them in the UI, like we do for server-side plugins. In that case, we
extend `py Plugin`, instantiating only the information, and then check on the
client side whether the
[`settings.plugins`](1ad832b1dc/client/simple/src/js/toolkit.ts (L134))
array has the plugin id.
* [mod] client/simple: rebuild static
There are other classes like `site-name-content` that we don't want to match.
However, using only `contains(@class, 'content')` would also match e.g. `site-name-content`;
thus, we also explicitly require the spaces as class separators.
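The difference between the two matching strategies can be illustrated as below. The XPath expression is the common space-padding idiom and may differ from the selector used in the engine; both function names are hypothetical.

```python
def class_token_xpath(name):
    """XPath predicate matching `name` only as a whole class token."""
    return f"contains(concat(' ', normalize-space(@class), ' '), ' {name} ')"


def has_class_token(class_attr, name):
    """Pure-Python equivalent of the predicate above, for illustration."""
    normalized = " ".join(class_attr.split())
    return f" {name} " in f" {normalized} "


# A plain substring test matches the unwanted class as well ...
naive_match = "content" in "site-name-content"
# ... while the space-separated token test does not.
token_match = has_class_token("site-name-content", "content")
```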
brave web:
XPath selectors needed to be adjusted
brave images & videos:
The JS code with the JS object was read incorrectly; not always, but quite
often, it led to exceptions when the Python data structure was created from it.
BTW: A complete review was conducted and corrections or additions were made to
the type definitions.
To test all brave engines at once::
  !br !brimg !brvid !brnews weather
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
This patch is based on PR #2792 (old PR from 2023)
- js_obj_str_to_python handles more cases
- bring in tests from chompjs
- comment out tests that do not pass
The tests from chompjs give some overview of what is not implemented.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>