Configure crawl settings for a GitLab external content connector

Release version: Yokohama

Updated February 11, 2026

Specify the groups, projects, and repositories you want your GitLab external content connector to crawl. Select the issues, wikis, merge requests, tags, branches, and commits you want the crawl to retrieve and feed to AI Search for indexing.

Before you begin

A connector administrator must have already created the GitLab external content connector that you want to configure crawl settings for. To learn about this procedure, see Create a GitLab external content connector.

Role required: sn_ext_conn.xcc_admin

About this task

This task is optional. By default, the GitLab external content connector crawls content from all subgroups, projects, and repositories found in top-level groups owned by the GitLab.com user that it's configured to impersonate, and sends all supported content types (issues, wikis, merge requests, tags, branches, and commits) to AI Search for indexing. Perform this task only if you want the connector to use any of the following non-default settings:

Inclusion or exclusion filters for the subgroups to crawl when running content crawls
Inclusion or exclusion filters for the projects/repositories to crawl when running content crawls
Inclusion or exclusion filters for the types of content to retrieve from the source system when running content crawls
Inclusion or exclusion filters for the branches to retrieve from the source system when running content crawls

Content is retrieved from the source system only if it passes all of your configured crawl setting filters. If any crawl setting filter excludes a content item, the external content connector doesn't retrieve it.
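
The rule that an item must pass every filter amounts to a logical AND across the configured filters. The following Java sketch illustrates that idea only. It is not the connector's implementation, and the `CrawlItem` class, its fields, and `CrawlFilterChain` are hypothetical names introduced for illustration.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of a content item; the field names are illustrative only.
final class CrawlItem {
    final String groupUrl;
    final String projectUrl;
    final String contentType;

    CrawlItem(String groupUrl, String projectUrl, String contentType) {
        this.groupUrl = groupUrl;
        this.projectUrl = projectUrl;
        this.contentType = contentType;
    }
}

final class CrawlFilterChain {
    // An item is retrieved only if every configured filter accepts it.
    // A single rejecting filter is enough to exclude the item.
    static boolean isRetrieved(CrawlItem item, List<Predicate<CrawlItem>> filters) {
        return filters.stream().allMatch(filter -> filter.test(item));
    }
}
```

In this sketch, an issue in an included subgroup is still skipped if the content type filter rejects issues, because both predicates must return true.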

Important:

By default, an external content connector can index up to one million (1,000,000) documents from its source system. When a connector exceeds this limit, it continues to crawl the source system, but only sends document deletions and updates to AI Search for indexing, ignoring new documents. The connector logs an error message for every 10,000 documents it crawls beyond the indexing limit.

When a connector's indexed document count exceeds 800,000, a warning message appears in the connector's UI to indicate that it's approaching the indexing limit. If the connector reaches the indexing limit, an error message appears in its UI.

If one of your connectors reaches the indexing limit, you can update its crawl settings and file inclusion/exclusion filters to reduce the number of documents it retrieves. Alternatively, if you need to index more than 1,000,000 documents, you can create a Customer Service and Support case at https://support.servicenow.com/now to request a limit increase for the connector.
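
The thresholds described above can be summarized in a small Java sketch. This is an illustration under stated assumptions, not ServiceNow code: the class, enum, and method names are hypothetical, and the exact boundary behavior (for example, whether a count of exactly 1,000,000 counts as reaching the limit) is an assumption drawn from the wording above.

```java
final class IndexingLimitSketch {
    static final long INDEX_LIMIT = 1_000_000L;      // default indexing limit from the text
    static final long WARNING_THRESHOLD = 800_000L;  // UI warning threshold from the text

    enum Status { OK, WARNING, LIMIT_REACHED }

    // Assumption: the warning appears once the count exceeds 800,000,
    // and the error appears once the count reaches 1,000,000.
    static Status statusFor(long indexedCount) {
        if (indexedCount >= INDEX_LIMIT) {
            return Status.LIMIT_REACHED;
        }
        if (indexedCount > WARNING_THRESHOLD) {
            return Status.WARNING;
        }
        return Status.OK;
    }

    // New documents are ignored once the limit is reached;
    // updates and deletions are still sent for indexing.
    static boolean acceptsNewDocument(long indexedCount) {
        return indexedCount < INDEX_LIMIT;
    }
}
```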

Procedure

Navigate to All > External Content Connectors > External Content Admin Home.
In the Connectors list, select the record for the GitLab external content connector whose settings you want to modify.
In the connector editor's Settings tab, select Crawl settings.
Select one of the following Group filtering options:
- To crawl all subgroups found in top-level groups owned by the connector's impersonated GitLab.com user account, select Crawl all groups.
- To crawl only a specified set of subgroups found in top-level groups owned by the connector's impersonated GitLab.com user account, select Include only these groups, then use the Add group URLs to include field and Add button to enter URLs for the groups that you want to include in the crawl.
  
  For example, you might enter https://gitlab.com/example-dot-com/production to include only searchable content from the production subgroup and all subgroups that it contains.
- To crawl all except a specified set of subgroups found in top-level groups owned by the connector's impersonated GitLab.com user account, select Exclude only these groups, then use the Add group URLs to exclude field and Add button to enter URLs for the groups that you want to exclude from the crawl.
  
  For example, you might enter https://gitlab.com/example-dot-com/test-* to exclude searchable content from all subgroups with names that start with test-.
Note:
Subgroup inclusion and exclusion URLs can be specified as prefixes, with the wildcard character * at the end of the URL matching any string.
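
To make the trailing-wildcard behavior concrete, here is a minimal Java sketch of prefix matching on group URLs. It is an assumption about how such matching could work, not the connector's actual logic. `GroupUrlMatcher` is a hypothetical name, and the rule that a URL without a wildcard also covers nested subgroups is inferred from the production example above.

```java
final class GroupUrlMatcher {
    // Returns true if groupUrl is covered by the configured pattern.
    static boolean matches(String pattern, String groupUrl) {
        if (pattern.endsWith("*")) {
            // Trailing * matches any string: compare against the prefix before it.
            String prefix = pattern.substring(0, pattern.length() - 1);
            return groupUrl.startsWith(prefix);
        }
        // Assumption: without a wildcard, the pattern covers the group itself
        // and any subgroup nested beneath it.
        return groupUrl.equals(pattern) || groupUrl.startsWith(pattern + "/");
    }
}
```

Under these assumptions, the pattern https://gitlab.com/example-dot-com/test-* covers a subgroup named test-alpha, while https://gitlab.com/example-dot-com/production covers production and production/api but not a sibling subgroup named production-old.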
Select one of the following Project/repository filtering options:
- To crawl all projects and repositories owned by the connector's impersonated GitLab.com user account, select Crawl all projects/repositories.
- To crawl only a specified set of projects and repositories owned by the connector's impersonated GitLab.com user account, select Include only these projects/repositories, then use the Add project/repository URLs to include field and Add button to enter URLs for the projects and repositories that you want to include in the crawl.
  Note:
  Project and repository inclusion URLs can be specified as prefixes, with the wildcard character * at the end of the URL matching any string.
  
  For example, you might enter https://gitlab.com/example-dot-com/prod-* to include only searchable content from projects whose names start with prod-.
- To crawl all except a specified set of projects and repositories owned by the connector's impersonated GitLab.com user account, select Exclude only these projects/repositories, then use the Add project/repository URLs to exclude field and Add button to enter URLs for the projects and repositories that you want to exclude from the crawl.
  Note:
  Project and repository exclusion URLs can be specified as prefixes, with the wildcard character * at the end of the URL matching any string.
  
  For example, you might enter https://gitlab.com/example-dot-com/confidential273 to exclude searchable content from the confidential273 project.
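
The three Project/repository filtering options behave like three modes applied to one list of URL patterns. The Java sketch below shows that decision under simplifying assumptions. `ProjectFilterSketch`, the `Mode` enum, and `shouldCrawl` are hypothetical names, and the matching helper handles only exact URLs and a trailing *.

```java
import java.util.List;

final class ProjectFilterSketch {
    // Hypothetical names for the three filtering options.
    enum Mode { CRAWL_ALL, INCLUDE_ONLY, EXCLUDE_ONLY }

    static boolean shouldCrawl(Mode mode, List<String> patterns, String projectUrl) {
        boolean listed = patterns.stream().anyMatch(p -> matches(p, projectUrl));
        switch (mode) {
            case INCLUDE_ONLY:
                return listed;   // crawl only projects that match a pattern
            case EXCLUDE_ONLY:
                return !listed;  // crawl everything except matching projects
            default:
                return true;     // CRAWL_ALL ignores the pattern list
        }
    }

    // Simplified matching: exact URL, or prefix when the pattern ends with *.
    private static boolean matches(String pattern, String url) {
        return pattern.endsWith("*")
                ? url.startsWith(pattern.substring(0, pattern.length() - 1))
                : url.equals(pattern);
    }
}
```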

Enable the Crawl content types options for the types of content you want to retrieve when you run content crawls.

The GitLab external content connector supports indexing of searchable content for these content types:


Content type	Searchable content indexed
Issues	Issue description
Wikis	Markdown content converted to HTML (without attachments)
Merge requests	Merge request description (Markdown) and discussions
Tags	Tag message
Branches	Commit message of head commit
Commits	Commit message

Important:

The GitLab external content connector doesn't support indexing of searchable content from any of these content types:

Commit, issue, and wiki discussions
Commit diffs
Content from archived groups or projects
Content from groups or projects in the pending deletion state
Content from subgroups of top-level groups that aren't owned by the impersonated GitLab.com user
Content of files attached to issues or merge requests
Content of wiki attachments in formats other than plain text (.txt)
Internal or confidential notes in merge request discussions
Repository files

If you enabled the Branches content type in the previous step, use the Add branches to include in regex format field and the Add button to specify Java regular expression patterns that match the names of the branches you want to include in content crawls.
For example, you might specify ^2025.*$ to include branches with names that start with 2025, or specify ^.*$ to crawl all branches. To learn about Java regular expression pattern syntax, see the Javadoc for the java.util.regex.Pattern class.
Note:
The branch name expressions ^main$ and ^master$ are included by default. You can't remove these branches from the list.
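
Before you add a branch pattern, you can check it locally against sample branch names with `java.util.regex.Pattern`. The sketch below is one way to do that. `BranchFilterSketch` and `isIncluded` are hypothetical names, and the use of `find()` rather than `matches()` is an assumption (for fully anchored patterns such as the examples above, both give the same result).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class BranchFilterSketch {
    // ^main$ and ^master$ are always included, per the note above.
    static final List<String> DEFAULT_PATTERNS = Arrays.asList("^main$", "^master$");

    // Returns true if the branch name matches a default or user-supplied pattern.
    static boolean isIncluded(String branchName, List<String> userPatterns) {
        List<String> all = new ArrayList<>(DEFAULT_PATTERNS);
        all.addAll(userPatterns);
        for (String regex : all) {
            if (Pattern.compile(regex).matcher(branchName).find()) {
                return true;
            }
        }
        return false;
    }
}
```

With the user pattern ^2025.*$, this sketch includes a branch named 2025-release as well as main and master, and skips a branch named feature/login.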
Select Save and validate.

Result

The GitLab external content connector is updated with your modified crawl settings.

What to do next

To retrieve content from your GitLab source system using your modified crawl settings, create and run a one-time content crawl for your GitLab external content connector. To learn about creating and running one-time content crawls, see Create a content crawl for an external content connector.