Blocking Content when Websites are Crawled

Websites often contain elements, such as their navigation, that appear on every subpage and are therefore not useful as search results. OpenSearchServer allows you to block such elements during the crawling process.

This is how it works

  1. Choose a web crawler index in the OpenSearchServer configuration dialog.

  2. Click Schema > Parser list > HTML parser.

  3. In the area XPATH Exclusion, add an XPath selector for the area to be blocked. For example, specify a selector that matches a CSS class to ignore the entire surrounding block. The OpenSearchServer documentation contains instructions on how to do this.

Example: //*[@class="mega-menu-container"]
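To illustrate which part of a page such a selector excludes, here is a minimal sketch using only Python's standard library (this is not OpenSearchServer code; the HTML snippet is a made-up example). Note that ElementTree only supports a subset of XPath, so the expression is written relative to the document root, while OpenSearchServer accepts the full expression shown above.

```python
import xml.etree.ElementTree as ET

# A made-up page fragment: a navigation block plus the actual content.
html = """<html><body>
<div class="mega-menu-container"><a href="/about">About</a></div>
<p>Main article text that should be indexed.</p>
</body></html>"""

root = ET.fromstring(html)

# ElementTree's limited XPath dialect; matches any element whose
# class attribute is exactly "mega-menu-container".
excluded = root.findall(".//*[@class='mega-menu-container']")
for element in excluded:
    print(element.tag)  # the <div> holding the navigation menu
```

Everything inside the matched element, including its links, is what the crawler would skip.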

Warning

The area you select in this manner will be completely ignored. Any URLs it contains, such as links in navigation menus, will no longer be followed, so the crawler may not find all existing subpages.

If possible, enter a site map URL on the tab Crawler > Site Map, from which the crawler can dynamically request all URLs to be crawled. Many content management systems provide a site map URL.
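For reference, a minimal site map in the standard sitemaps.org format looks like the following sketch (the URLs are placeholders, not part of any real configuration):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
  </url>
  <url>
    <loc>https://www.example.com/about</loc>
  </url>
</urlset>
```

Because the site map lists every URL directly, the crawler can reach all subpages even when the navigation block is excluded.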