Configure crawl settings for a Microsoft SharePoint Online external content connector
Specify the sites you want your Microsoft SharePoint Online external content connector to crawl. Define inclusion or exclusion filters to dictate the types of content the crawl retrieves and feeds to AI Search for indexing.
Before you begin
A connector administrator must have already created the Microsoft SharePoint Online external content connector that you want to configure crawl settings for. To learn about this procedure, see Create a Microsoft SharePoint Online external content connector.
Role required: sn_ext_conn.xcc_admin
About this task
You can configure the following crawl settings for the connector:
- Inclusion or exclusion filters for the sites to crawl when running content crawls
- Inclusion or exclusion filters for individual content types (sites, lists, list items, attachments, and files)
- Inclusion or exclusion filters for the attachment file extensions to retrieve when running content crawls
- Exclusion of guest users from Entra ID when retrieving users for user mappings
The connector retrieves a content item from the source system only if the item passes all of your configured crawl setting filters. If any crawl setting filter excludes a content item, the external content connector doesn't retrieve it.
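The all-filters-must-pass rule can be illustrated with a minimal Python sketch. This is not the connector's implementation: the `ContentItem` and `CrawlSettings` classes, their field names, the glob-style site patterns, and the choice to apply the extension filter only to attachments are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class ContentItem:
    # Hypothetical model of a crawled item; not a connector data structure.
    site: str
    content_type: str  # "site", "list", "list_item", "attachment", or "file"
    extension: str = ""  # file extension without the dot, if any


@dataclass
class CrawlSettings:
    # Hypothetical settings; glob patterns are an assumed matching syntax.
    site_include: list = field(default_factory=lambda: ["*"])
    site_exclude: list = field(default_factory=list)
    excluded_types: set = field(default_factory=set)
    excluded_extensions: set = field(default_factory=set)


def passes_site_filter(item, settings):
    # The site must match an inclusion pattern and no exclusion pattern.
    included = any(fnmatchcase(item.site, p) for p in settings.site_include)
    excluded = any(fnmatchcase(item.site, p) for p in settings.site_exclude)
    return included and not excluded


def passes_type_filter(item, settings):
    return item.content_type not in settings.excluded_types


def passes_extension_filter(item, settings):
    # Assumption: the extension filter applies to attachments only.
    if item.content_type != "attachment":
        return True
    return item.extension.lower() not in settings.excluded_extensions


def should_retrieve(item, settings):
    # An item is retrieved only if every filter accepts it.
    filters = (passes_site_filter, passes_type_filter, passes_extension_filter)
    return all(f(item, settings) for f in filters)
```

In this sketch, an attachment on an included site is still skipped if its extension is excluded, because a single failing filter is enough to reject the item.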
By default, the Microsoft SharePoint Online external content connector can index up to ten million (10,000,000) content items from its source system. When the connector exceeds this limit, it continues to crawl the source system, but only sends content item deletions and updates to AI Search for indexing, ignoring new content items. The connector logs an error message for every 10,000 content items it crawls beyond the indexing limit.
When the connector's indexed content item count exceeds eight million (8,000,000) content items, a warning message appears in the connector's UI to indicate that it's approaching the indexing limit. If the connector reaches the indexing limit, an error message appears in its UI.
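The default limit behavior described above can be summarized in a short Python sketch. The numeric thresholds come from this topic; the function names, the status strings, the change-type labels, and the treatment of the exact boundary (new items are ignored once the count reaches the limit) are assumptions for illustration, not the connector's actual logic.

```python
INDEX_LIMIT = 10_000_000        # default indexing limit
WARNING_THRESHOLD = 8_000_000   # UI warning appears above this count
ERROR_LOG_INTERVAL = 10_000     # one logged error per this many overflow items


def ui_status(indexed_count):
    # Hypothetical helper: which message the connector UI would show.
    if indexed_count >= INDEX_LIMIT:
        return "error"
    if indexed_count > WARNING_THRESHOLD:
        return "warning"
    return "ok"


def should_send(change_type, indexed_count):
    # At the limit, updates and deletions are still sent for indexing,
    # but new content items are ignored (boundary handling is assumed).
    if indexed_count < INDEX_LIMIT:
        return True
    return change_type in ("update", "delete")


def should_log_overflow_error(items_crawled_beyond_limit):
    # An error is logged for every 10,000 items crawled beyond the limit.
    return (
        items_crawled_beyond_limit > 0
        and items_crawled_beyond_limit % ERROR_LOG_INTERVAL == 0
    )
```

For example, with 10,000,000 items already indexed, `should_send("update", 10_000_000)` is true while `should_send("new", 10_000_000)` is false.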
The Microsoft SharePoint Online external content connector can handle permissions for up to five hundred thousand (500,000) users and their groups. If the connector retrieves more users than this limit allows, user and group permissions may not be applied correctly to the connector's retrieved content. As a result, the content may not be searchable.
If your Microsoft SharePoint Online external content connector reaches the content indexing limit, you can update its crawl settings and file inclusion or exclusion filters to reduce the number of content items it retrieves. Alternatively, if you need the connector to index more than 10,000,000 content items, you can create a Customer Service and Support case at https://support.servicenow.com/now to request a limit increase for the connector.
Procedure
Result
The Microsoft SharePoint Online external content connector is updated with your crawl scope and file extension filter settings.
What to do next
To retrieve content from your Microsoft SharePoint Online source system using your modified crawl settings, create and run a one-time content crawl for your Microsoft SharePoint Online external content connector. To learn about creating and running one-time content crawls, see Create a content crawl for an external content connector.