Create an Amazon Redshift metadata collector
Create a collector to import metadata from Amazon Redshift.
Before you begin
Before you begin, verify the following:
- A MID Server is set up for the collectors. For more information, see MID Server for metadata collectors.
- All prerequisite tasks are completed. For more information, see Prepare to run the Amazon Redshift collector.
- Role required: connection-admin
Procedure
- Navigate to All > Workflow Data Fabric > Workflow Data Fabric Home.
- Select the Connect Hub icon in the left sidebar.
- Select Create > Metadata collector.
- From the System list, select Redshift.
- From the Connection type list, select one of the following:
  - Select New connection to configure a new connection.
  - Select Existing connection to reuse an existing connection, and then select that connection from the Connections list.
    The configuration form is filled with details from the existing connection. The word Copy is appended to the name, and sensitive details such as the password aren't copied.
- Complete the form.
Table 1. Redshift metadata collector form

| Field | Description |
| --- | --- |
| Connection name | Unique identifier for the connection. This field can't be modified once the connection is established. |
| Short description | Purpose and details of the connection. |

- Configure the authentication options.
Table 2. Authentication options

| Field | Description |
| --- | --- |
| Username | Username to use to connect to the database. |
| Password | Password of the database user. |

- From the schema collection options, select one of the following: Collect all schemas or Specify which schema to collect.
Table 3. Schema collection options

| Option | Field | Description |
| --- | --- | --- |
| Collect all schemas | Collect all schemas | Catalog all schemas to which the user has access. |
| Collect all schemas | Exclude Schema | Name or regular expression of the database schema to be excluded. |
| Collect all schemas | Include Information Schema | Include the database's Information Schema in catalog collection. |
| Specify which schema to collect | Specify which schema to collect | Catalog only the specified schemas. |
| Specify which schema to collect | Schema | Name of the database schema to catalog. |

- Configure the connection information.
Table 4. Connection information

| Field | Description |
| --- | --- |
| Server | Hostname of the database server to connect to. |
| Server port | Port of the database server (if not the default). |
| Database | Name of the database to connect to. Specify multiple databases by adding one value per line. |
| Excluded database | Name or regular expression indicating databases not to catalog when the Database field is empty. Note: This parameter is ignored if the Database field is specified. |

- Configure the harvesting scope and limits options.
Table 5. Harvesting scope and limits options

| Field | Description |
| --- | --- |
| Enable column statistics collection | Enable harvesting of column statistics (that is, data profiling). Note: Enabling profiling can increase the collector's runtime, as the collector must read table data to generate profiling metadata. |
| Target sample size for column statistics | Number of rows sampled for computation of column statistics and string-value histograms. For example, to sample 1000 rows, set the parameter to 1000. Default: 100000 |
| Disable Lineage collection | Skip harvesting of intra-database lineage metadata. |
| Disable Extended Metadata collection | Skip harvesting of extended metadata for data asset types such as database, schema, table, columns, functions, stored procedures, user-defined types, and synonyms. Basic metadata for these data asset types is still harvested. |
| Enable Sample String Values collection | Enable harvesting of sample values and histograms for columns containing string data. |
| Exclude system functions | Exclude system functions from metadata collection. |

- Configure the connection and reliability options.
Table 6. Connection and reliability options

| Field | Description |
| --- | --- |
| Server environment | Friendly name for the environment in which your database server runs when the server name is localhost. Used to differentiate it from other environments. |
| Database ID | Unique identifier for this database. Used to generate the database ID when the database name is not sufficiently unique. |
| JDBC properties | JDBC driver properties to pass through to the driver connection. |
| SQL parsing timeout | Timeout in seconds for SQL parsing during lineage collection. Default: 60 |
- Select Save.
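The Exclude Schema field accepts either a schema name or a regular expression. The following Python sketch illustrates how such a filter might narrow the set of cataloged schemas. It is not the collector's implementation: the function name, the sample schema names, and the assumption that a pattern must match the whole schema name are all hypothetical, and the collector's regular expression syntax may differ from Python's.

```python
import re

def schemas_to_catalog(all_schemas, exclude_patterns):
    """Return the schemas that remain after applying exclusion entries.

    Assumption: each entry is a regular expression that must match the
    entire schema name. The real collector may match differently.
    """
    compiled = [re.compile(pattern) for pattern in exclude_patterns]
    return [
        schema
        for schema in all_schemas
        if not any(rx.fullmatch(schema) for rx in compiled)
    ]

# Hypothetical schema names, for illustration only.
schemas = ["public", "sales", "sales_staging", "tmp_load_01", "tmp_load_02"]

# One plain name and one regular expression.
kept = schemas_to_catalog(schemas, ["sales_staging", r"tmp_.*"])
print(kept)
```

Trying a pattern against a list of known schema names in this way can help catch an expression that excludes more, or less, than intended before the collector runs.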
Result
The metadata collector is created and appears on the Connectors page with a Configured status. It is now ready to connect to the source system and harvest metadata.
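As a rough illustration of what the Server, Server port, and Database fields describe, the sketch below builds one JDBC URL per database in the form used by the Amazon Redshift JDBC driver (jdbc:redshift://host:port/database). How the collector actually assembles its connection is not documented here, so the function name and the hostname are hypothetical. The fallback port 5439 is the Amazon Redshift default.

```python
DEFAULT_REDSHIFT_PORT = 5439  # Amazon Redshift's default port

def jdbc_urls(server, database_field, port=None):
    """Build one JDBC URL per database listed in the Database field.

    Assumption: the Database field holds one database name per line,
    and blank lines are ignored.
    """
    databases = [
        line.strip() for line in database_field.splitlines() if line.strip()
    ]
    effective_port = port if port is not None else DEFAULT_REDSHIFT_PORT
    return [
        f"jdbc:redshift://{server}:{effective_port}/{db}" for db in databases
    ]

# Hypothetical hostname and database names, for illustration only.
urls = jdbc_urls(
    "example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    "analytics\nreporting\n",
)
for url in urls:
    print(url)
```

Values entered in the JDBC properties field would be passed to the driver separately and are not part of this sketch.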
What to do next
After creating the collector, you can perform any of the following tasks:
- Run the collector manually to harvest metadata immediately. See Run metadata collectors manually.
- Automate metadata collection by scheduling regular collector runs. See Schedule metadata collector runs.
- Monitor execution status and troubleshoot issues by viewing the runtime logs. See View runtime logs for collector runs.
- Discover and evaluate the harvested data assets in the Data Catalog. See Governing the Data Catalog.