Configure crawl settings for a GitHub Enterprise Cloud external content connector

Release version: Yokohama

Updated October 28, 2025

Specify the public and internal repositories you want your GitHub Enterprise Cloud external content connector to crawl. Define inclusion or exclusion filters to dictate the types of content the crawl retrieves and feeds to AI Search for indexing.

Before you begin

A connector administrator must have already created the GitHub Enterprise Cloud external content connector that you want to configure crawl settings for. To learn about this procedure, see Create a GitHub Enterprise Cloud external content connector.

Role required: sn_ext_conn.xcc_admin

About this task

This task is optional. By default, the GitHub Enterprise Cloud external content connector crawls all public and internal repositories from its specified source system and sends all commits, issues, and pull requests to AI Search for indexing. Perform this task only if you want the connector to use any of the following non-default settings:

- Ignore one or more of the default content types when running content crawls:
  - Commits
  - Issues
  - Pull requests
- Apply inclusion or exclusion filters for the repositories to crawl when running content crawls

Content is only retrieved from the source system if it passes all of your configured crawl setting filters. If any crawl setting filter excludes a content item, the external content connector doesn't retrieve it.
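
The "passes all filters" rule can be pictured with a minimal sketch. This is not the connector's implementation. The `CrawlSettings` class, its field names, and the `is_retrieved` function are hypothetical and exist only to illustrate that the content type filter and the repository filter are combined with a logical AND.

```python
from dataclasses import dataclass, field

# Default content types named in this topic.
CONTENT_TYPES = {"commit", "issue", "pull_request"}


@dataclass
class CrawlSettings:
    """Hypothetical model of the crawl settings; not the connector's real schema."""
    ignored_types: set = field(default_factory=set)
    mode: str = "all"  # assumed values: "all", "include", or "exclude"
    repo_urls: set = field(default_factory=set)


def is_retrieved(settings, content_type, repo_url):
    """Return True only if the item passes every configured filter."""
    type_ok = (
        content_type in CONTENT_TYPES
        and content_type not in settings.ignored_types
    )
    if settings.mode == "include":
        repo_ok = repo_url in settings.repo_urls
    elif settings.mode == "exclude":
        repo_ok = repo_url not in settings.repo_urls
    else:
        repo_ok = True
    # A single failing filter is enough to skip the item.
    return type_ok and repo_ok


settings = CrawlSettings(
    ignored_types={"commit"},
    mode="exclude",
    repo_urls={"https://github.com/example/beta"},
)
print(is_retrieved(settings, "issue", "https://github.com/example/production"))
print(is_retrieved(settings, "commit", "https://github.com/example/production"))
```

In this sketch, an issue from the production repository passes both filters, while a commit from the same repository is skipped because the commit content type is ignored.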

Important:

By default, an external content connector can index up to one million (1,000,000) documents from its source system. When a connector exceeds this limit, it continues to crawl the source system, but only sends document deletions and updates to AI Search for indexing, ignoring new documents. The connector logs an error message for every 10,000 documents it crawls beyond the indexing limit.

When a connector's indexed document count exceeds 800,000, a warning message appears in the connector's UI to indicate that it's approaching the indexing limit. If the connector reaches the indexing limit, an error message appears in its UI.

If one of your connectors reaches the indexing limit, you can update its crawl settings and file inclusion/exclusion filters to reduce the number of documents it retrieves. Alternatively, if you need to index more than 1,000,000 documents, you can create a Customer Service and Support case at https://support.servicenow.com/now to request a limit increase for the connector.
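
The thresholds described in this note can be summarized in a short sketch. The function names are hypothetical, and the exact boundary behavior (for example, whether new documents are ignored at exactly 1,000,000 or only above it) is an assumption. Only the numbers come from this topic.

```python
INDEX_LIMIT = 1_000_000       # default indexing limit per connector
WARNING_THRESHOLD = 800_000   # UI warning appears above this count
ERROR_LOG_INTERVAL = 10_000   # one error logged per this many extra documents


def ui_status(indexed_count, limit=INDEX_LIMIT):
    """Return the message level the connector UI would show (assumed labels)."""
    if indexed_count >= limit:
        return "error"
    if indexed_count > WARNING_THRESHOLD:
        return "warning"
    return "ok"


def is_sent_for_indexing(change, indexed_count, limit=INDEX_LIMIT):
    """At the limit, only updates and deletions are sent; new documents are ignored.

    The values "new", "update", and "delete" are illustrative labels.
    """
    if indexed_count < limit:
        return True
    return change in {"update", "delete"}


def overflow_errors_logged(crawled_beyond_limit):
    """Count of error messages after crawling this many documents past the limit."""
    return crawled_beyond_limit // ERROR_LOG_INTERVAL


print(ui_status(850_000))
print(is_sent_for_indexing("new", 1_000_000))
print(overflow_errors_logged(25_000))
```

A support-approved limit increase would correspond to passing a larger `limit` value in this sketch.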

Procedure

Navigate to All > External Content Connectors > External Content Admin Home.
In the Connectors list, select the record for the GitHub Enterprise Cloud external content connector whose settings you want to modify.
In the connector editor's Settings tab, select Crawl settings.
In the Repositories section, select the options for the types of items you want the connector to retrieve searchable content and metadata from.
Select one of the following Repositories options:
- To include content from items in all repositories from the source system, select Crawl all repositories.
- To include only content from items in a specified set of repositories, select Include only these repositories. Then, for each repository you want the connector to crawl, enter its URL in the Add repository URLs to include field and select Add.
  
  As an example, you might enter https://github.com/example/production to retrieve searchable content only from items in that repository.
- To exclude content from items in a specified set of repositories, select Exclude only these repositories. Then, for each repository you want the connector to skip, enter its URL in the Add repository URLs to exclude field and select Add.
  
  As an example, you might enter https://github.com/example/beta to exclude searchable content from items in the specified repository.
Select Save and validate.
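
Before entering a long list of repository URLs, it may help to pre-check that each one follows the pattern used in the examples above. The following sketch is an assumption-based helper and does not reproduce whatever checks Save and validate performs. The `looks_like_repo_url` function and its `host` parameter are hypothetical. The `host` parameter is there in case your enterprise uses a hostname other than github.com.

```python
from urllib.parse import urlparse


def looks_like_repo_url(url, host="github.com"):
    """Check that a URL has the form https://<host>/<owner>/<repository>."""
    parts = urlparse(url.strip())
    # Drop empty segments so a trailing slash is tolerated.
    segments = [s for s in parts.path.split("/") if s]
    return (
        parts.scheme == "https"
        and parts.netloc.lower() == host
        and len(segments) == 2
    )


candidates = [
    "https://github.com/example/production",
    "https://github.com/example",
]
for url in candidates:
    print(url, looks_like_repo_url(url))
```

The second candidate fails the check because it names an owner but no repository.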

Result

The GitHub Enterprise Cloud external content connector is updated with your modified crawl settings.

What to do next

To retrieve content from your GitHub Enterprise Cloud source system using your modified crawl settings, create and run a one-time content crawl for your GitHub Enterprise Cloud external content connector. To learn about creating and running one-time content crawls, see Create a content crawl for an external content connector.