Blocking Content when Websites are Crawled

Websites often contain elements, such as their navigation, that appear on every subpage and are therefore not useful as search results. OpenSearchServer allows you to block such elements during the crawling process.

This is how it works

  1. Choose a web crawler index in the OpenSearchServer configuration dialog.

  2. Click Schema > Parser list > HTML parser.

  3. In the area XPATH Exclusion, add an XPath selector for the area to be blocked. For example, specify a selector that matches a CSS class to ignore the entire surrounding block. The OpenSearchServer documentation contains instructions on how to do this.

Example: //*[@class="mega-menu-container"]
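To illustrate which part of a page such a selector excludes, here is a minimal sketch using only Python's standard library (this is not OpenSearchServer code; the HTML snippet is a made-up example). Note that ElementTree only supports a subset of XPath, so the expression is written relative to the document root, while OpenSearchServer accepts the full expression shown above.

```python
import xml.etree.ElementTree as ET

# A made-up page fragment: a navigation block plus the actual content.
html = """<html><body>
<div class="mega-menu-container"><a href="/about">About</a></div>
<p>Main article text that should be indexed.</p>
</body></html>"""

root = ET.fromstring(html)

# ElementTree's limited XPath dialect; matches any element whose
# class attribute is exactly "mega-menu-container".
excluded = root.findall(".//*[@class='mega-menu-container']")
for element in excluded:
    print(element.tag)  # the <div> holding the navigation menu
```

Everything inside the matched element, including its links, is what the crawler would skip.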

Warning

The area you select in this manner will be completely ignored. Any URLs it contains, such as links in navigation menus, will no longer be followed, so the crawler may not find all existing subpages.

If possible, enter a site map URL on the tab Crawler > Site Map, from which the crawler can dynamically request all URLs to be crawled. Many content management systems provide a site map URL.
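For reference, a minimal site map in the standard sitemaps.org format looks like the following sketch (the URLs are placeholders, not part of any real configuration):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
  </url>
  <url>
    <loc>https://www.example.com/about</loc>
  </url>
</urlset>
```

Because the site map lists every URL directly, the crawler can reach all subpages even when the navigation block is excluded.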