Configure crawl settings

Australia ServiceNow AI Platform Administration

Release

australia

ft:locale

en-US

ft:publication_title

Australia ServiceNow AI Platform Administration

ft:clusterId

platadm

bundleId

platadm

workflow

Platform

Configure crawl settings for a WordPress external content connector

Release version: Australia

Updated May 15, 2026

7 minutes to read

Specify the pages and blog posts you want your WordPress external content connector to crawl. Define inclusion or exclusion filters to dictate the types of content the crawl retrieves and feeds to AI Search for indexing.

Before you begin

A connector administrator must have already created the WordPress external content connector that you want to configure crawl settings for. To learn about this procedure, see Create a WordPress external content connector.

Role required: sn_ext_conn.xcc_admin

About this task

This task is optional. By default, the WordPress external content connector crawls all pages and blog posts from its specified source system and sends attachments with all supported file extensions to AI Search for indexing. Only perform this task if you want the connector to use any of the following non-default settings:

Inclusion or exclusion filters for the pages and blog posts to crawl when running content crawls
Inclusion or exclusion filters for the file extensions to retrieve when running content crawls

Content is only retrieved from the source system if it passes all of your configured crawl setting filters. If any crawl setting filter excludes a content item, the external content connector doesn't retrieve it.

Important:

By default, each external content connector can index up to one million (1,000,000) content items from its source system. When a connector exceeds this limit, it continues to crawl the source system, but only sends content item deletions and updates to AI Search for indexing, ignoring new content items. The connector logs an error message for every 10,000 content items it crawls beyond the indexing limit.

When a connector's indexed content item count exceeds 800,000, a warning message appears in the connector's UI to indicate that it's approaching the indexing limit. If the connector reaches the indexing limit, an error message appears in its UI.

External content connectors that support user permissions crawls can handle permissions for up to five hundred thousand (500,000) users and their groups. If a connector retrieves users in excess of this limit, user and group permissions may not be correctly applied to the connector's retrieved content. As a result, the content may not be searchable.

If one of your connectors reaches the content indexing limit, you can update its crawl settings and file inclusion/exclusion filters to reduce the number of content items it retrieves. Alternatively, if you need a connector to index more than 1,000,000 content items, you can create a Customer Service and Support case at https://support.servicenow.com/now to request a limit increase for the connector.

Procedure

Navigate to All > External Content Connectors > External Content Admin Home.
In the Connectors list, select the record for the WordPress external content connector whose settings you want to modify.
In the connector editor's Settings tab, select Crawl settings.
Select one of the following Pages options:
- To crawl all pages from the source system, select Crawl all pages.
- To crawl only a specified set of pages from the source system, select Include only these pages, then use the Add page URL to include field and Add button to enter URLs for pages you want the connector to include when crawling.
  
  As an example, you might enter https://example.com/mainSite/mainPage to only retrieve searchable content from this page.
- To crawl all but a specified set of pages from the source system, select Exclude only these pages, then use the Add page URL to exclude field and Add button to enter URLs for pages you want the connector to exclude when crawling.
  
  As an example, you might enter https://example.com/testSite/testPage to exclude searchable content from this page.
Select one of the following Blog posts options:
- To crawl all blog posts from the source system, select Crawl all blog posts.
- To crawl only a specified set of blog posts from the source system, select Include only these blog posts, then use the Add URL to include field and Add button to enter URLs for blog posts you want the connector to include when crawling.
  
  As an example, you might enter https://example.com/blog/latestContent to only retrieve searchable content from this blog post.
- To crawl all but a specified set of blog posts from the source system, select Exclude only these blog posts, then use the Add URL to exclude field and Add button to enter URLs for blog posts you want the connector to exclude when crawling.
  
  As an example, you might enter https://example.com/blog/testPost to exclude searchable content from this blog post.
Select one of the following Categories options:
- To retrieve blog posts that belong to all categories from the source system, select Crawl all categories.
- To retrieve only blog posts belonging to a specified set of categories from the source system, select Include only these categories, then use the Categories to include field and Add button to enter URLs for categories you want the connector to include when crawling blog posts.
  
  As an example, you might enter Company news, Tech blog, and Press events to only retrieve searchable content from blog posts that belong to the specified categories.
- To retrieve blog posts belonging to all but a specified set of categories from the source system, select Exclude only these blog posts, then use the Categories to exclude field and Add button to enter URLs for categories you want the connector to exclude when crawling blog posts.
  
  As an example, you might enter Testing to exclude searchable content from blog posts that belong to the specified category.
Select one of the following Date options:
- To retrieve pages and blog posts from the source system that were modified in the last year, select Modified last year.
- To retrieve pages and blog posts from the source system that were modified in the last quarter, select Modified last quarter.
- To retrieve pages and blog posts from the source system that were modified in the last month, select Modified last month.
- To retrieve pages and blog posts from the source system that were modified in the last week, select Modified last week.
Select one of the following State options:
- To retrieve pages and blog posts with all states from the source system, select Crawl all states.
- To retrieve only pages and blog posts with a specified set of states from the source system, select Include only these states, then use the States to include field and Add button to enter states you want the connector to include when crawling.
  
  As an example, you might enter publish to only retrieve pages and blog that have this state.
- To retrieve pages and blog posts with all but a specified set of states from the source system, select Exclude only these states, then use the States to exclude field and Add button to enter states you want the connector to exclude when crawling.
  
  As an example, you might enter private, future, pending, and draft to exclude pages and blog that have any of the specified states.
Select one of the following Attachments options:
- To retrieve all attachments with supported file extensions from the source system, select Crawl all attachments.
- To retrieve only attachments with specified file extensions from the source system, select Include only these file extensions, then use the File extensions to include field to enter attachment file extensions you want the connector to include when crawling.
  
  As an example, you might enter .docx to retrieve only attachments with the Microsoft Word file format.
- To retrieve all attachments except those with specified file extensions from the source system, select Exclude only these file extensions, then use the File extensions to exclude field to enter attachment file extensions you want the connector to exclude when crawling.
  
  As an example, you might enter .csv to exclude attachments with the Comma-Separated Values (CSV) file format.
For details on the supported attachment file extensions, see Binary file extensions supported in External Content Connectors.
Optional: If you want AI Search to automatically generate captions for content in attachments and files retrieved by the connector, select the Multimodal captions option.
When you select this option, the Platform Multimodal Service automatically generates descriptive captions for images, tables, charts, and other visual elements found in retrieved attachments and files. You can discover these retrieved attachments and files by searching for terms from their generated captions.
This option is only available when the Platform Multimodal Service plugin is activated on your instance.
- For details on activating the plugin, see Activate the Platform Multimodal Service plugin.
- To learn how to select the VLM (visual learning model) provider and model used for the Platform Multimodal Service, see Configure multimodal captioning for AI Search.
Select Save and validate.

Result

The WordPress external content connector is updated with your modified crawl settings.

What to do next

To retrieve content from your WordPress source system using your modified crawl settings, create and run a one-time content crawl for your WordPress external content connector. To learn about creating and running one-time content crawls, see Create a content crawl for an external content connector.