externalDataSources

Type: object[]
Parameter syntax
{
  externalDataSources: [
    {
      dataSourceId: 'your_data_source_id',
      type: 'googleanalytics'|'csv',
      // if type is "csv"
      url: 'csv_url',
      // if type is "googleanalytics"
      metrics: ['ga:metric'],
      startDate: 'startdate',
      endDate: 'enddate',
      credentials: {
        type: 'service_account',
        client_email: 'client_email',
        private_key:
          'privatekey',
        viewIds: ['view_id'],
      },
    },
  ],
}

About this parameter

Defines external data sources you want to retrieve during every crawl and make available to your extractor function.

There are two supported data sources: Google Analytics and CSV files.

Once you set up an externalDataSource, it is exposed to your recordExtractor. You can access it through the dataSources object, which has the following structure:

{
  dataSourceId1: { data1: 'val1', data2: 'val2' },
  dataSourceId2: { data1: 'val1', data2: 'val2' },
}

You can add a maximum of 10 sources, which combined can provide data for a maximum of 11 million URLs.
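
As a rough illustration of how the dataSources object might be consumed, the sketch below merges one source's values into a record. The myPageviews identifier comes from the CSV example on this page; the pageviews field name, the record shape, and the assumption that url is a URL object are hypothetical, so adapt them to your own data.

```javascript
// Sketch of a recordExtractor that reads external data for the crawled page.
// Assumptions: `url` is a URL object, and the `myPageviews` source exposes
// a `pageviews` value (hypothetical field name).
function recordExtractor({ url, dataSources }) {
  // A page may have no entry in a given source, so fall back to an empty object.
  const external = (dataSources && dataSources.myPageviews) || {};

  return [
    {
      objectID: url.href,
      // External values may arrive as strings; coerce and default to 0.
      pageviews: Number(external.pageviews) || 0,
    },
  ];
}
```

Falling back to an empty object keeps the extractor from throwing on pages that have no matching entry in the source.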

Examples

Adding a CSV to your externalDataSources

externalDataSources: [
  {
    dataSourceId: 'myPageviews',
    type: 'csv',
    url: 'http://www.example.com/pageviews.csv',
  },
  {
    dataSourceId: 'myCSV',
    type: 'csv',
    url: 'http://www.example.com/website-data.csv',
  },
],

Adding Google Analytics to your externalDataSources

{
  externalDataSources: [
    {
      dataSourceId: 'myAnalytics',
      type: 'googleanalytics',
      metrics: ['google_analytics_metric1', 'google_analytics_metric2' /* , ... */],
      startDate: 'start_date',
      endDate: 'end_date',
      credentials: {
        type: 'service_account', // the only supported value
        client_email: 'example@my-project.iam.gserviceaccount.com',
        private_key:
          'your_google_analytics_private_key',
        viewIds: ['target_view_id1', 'target_view_id2' /* , ... */],
      },
    },
  ],
}

Parameters

externalDataSource

An external data source object from the provided list.

externalDataSource ➔ dataSource

dataSourceId
type: string
Required

Each external data source must have a unique identifier, dataSourceId, which you use to access the corresponding data from the extractors. The other properties are used to connect to the data source.

type
type: string
Required

Type of data source. Supported values are: "googleanalytics", "csv".
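
The constraints described on this page (a unique dataSourceId, a supported type, at most 10 sources) can be checked before saving a configuration. The helper below is a hypothetical sketch, not part of the Crawler API; the function name and error messages are made up for illustration.

```javascript
// Hypothetical pre-flight check; not part of the Crawler API.
const SUPPORTED_TYPES = ['googleanalytics', 'csv'];
const MAX_SOURCES = 10;

function validateExternalDataSources(sources) {
  const errors = [];
  if (sources.length > MAX_SOURCES) {
    errors.push(`too many sources: ${sources.length} (maximum ${MAX_SOURCES})`);
  }
  const seen = new Set();
  sources.forEach((source, index) => {
    // Every source needs a non-empty, unique identifier.
    if (typeof source.dataSourceId !== 'string' || source.dataSourceId === '') {
      errors.push(`source ${index}: dataSourceId is required`);
    } else if (seen.has(source.dataSourceId)) {
      errors.push(`source ${index}: duplicate dataSourceId "${source.dataSourceId}"`);
    } else {
      seen.add(source.dataSourceId);
    }
    // Only the two documented types are accepted.
    if (!SUPPORTED_TYPES.includes(source.type)) {
      errors.push(`source ${index}: unsupported type "${source.type}"`);
    }
  });
  return errors;
}
```

An empty array means the list passed these basic checks; it does not verify that the data source itself is reachable.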

metrics
type: string[]
if type is googleanalytics

List of metrics to fetch from Google Analytics for each URL, e.g. ['ga:uniquePageViews']. See the full list of supported metrics in Google Analytics’ API reference (https://ga-dev-tools.appspot.com/dimensions-metrics-explorer/). The value of each metric is made available through the dataSources parameter of the extractors. Note: the 'ga:uniquePageViews' metric is always included.
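
Because Google Analytics metric names contain a colon, they cannot be read with dot notation. Assuming each metric is exposed under its own name as a key (an assumption; check the actual shape of your dataSources object), a value could be read as sketched below. The myAnalytics identifier is taken from the Google Analytics example on this page, and the function name is hypothetical.

```javascript
// Sketch: read a Google Analytics metric from the dataSources object.
// Assumption: metrics are keyed by their metric name, e.g. 'ga:uniquePageViews'.
function getUniquePageViews(dataSources) {
  const analytics = (dataSources && dataSources.myAnalytics) || {};
  // Bracket notation is needed because of the colon in the key.
  const value = Number(analytics['ga:uniquePageViews']);
  // Missing or non-numeric values fall back to 0.
  return Number.isFinite(value) ? value : 0;
}
```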

startDate
type: string
default: 7daysAgo
Optional

Specify the date from which the analytics should be fetched. Its format should comply with ISO 8601. Google Analytics also supports values like ‘365daysAgo’.

endDate
type: string
default: today
Optional

Specify the ending date of the period for which the analytics should be fetched. Its format should comply with ISO 8601. Google Analytics also supports values like ‘365daysAgo’.
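
For an explicit reporting window, both dates can be given in the ISO 8601 calendar form YYYY-MM-DD. The helpers below are an illustrative sketch (the function names are hypothetical) that derive such strings from JavaScript Date objects in UTC.

```javascript
// Format a Date as YYYY-MM-DD (UTC), an ISO 8601 calendar date.
function toIsoDate(date) {
  return date.toISOString().slice(0, 10);
}

// Build the startDate/endDate pair for a window of `days` days ending at `end`.
function dateRange(end, days) {
  const start = new Date(end.getTime() - days * 24 * 60 * 60 * 1000);
  return { startDate: toIsoDate(start), endDate: toIsoDate(end) };
}

// Example: the 30 days up to 31 January 2024.
const range = dateRange(new Date(Date.UTC(2024, 0, 31)), 30);
// range is { startDate: '2024-01-01', endDate: '2024-01-31' }
```

Relative values such as the defaults '7daysAgo' and 'today' avoid hard-coding dates in a configuration that runs on every crawl.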

credentials
type: object
default: false
if type is googleanalytics

Contains your Google Analytics credentials.

externalDataSource ➔ credentials ➔ credentials

type
type: string
Required

Type of authentication mechanism. So far, service_account is the only supported value for this property.

client_email
type: string
Required

Client email provided after creating a service account. It must have been given read permissions to the Google Analytics view(s).

private_key
type: string
Required

Private key provided after creating a service account.

viewIds
type: string[]
default: all views accessible with the credentials
Optional

List of Google Analytics view identifiers to fetch data from.
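
A Google service account key file (JSON) contains the type, client_email, and private_key fields that the credentials object expects. The sketch below copies them from a parsed key; the function name is hypothetical, and how you load the key (file, secret store, environment variable) is up to you.

```javascript
// Hypothetical helper: build the `credentials` object from a parsed
// Google service account key (the JSON downloaded when creating the key).
function buildCredentials(serviceAccountKey, viewIds) {
  if (serviceAccountKey.type !== 'service_account') {
    throw new Error('only service_account credentials are supported');
  }
  const credentials = {
    type: serviceAccountKey.type,
    client_email: serviceAccountKey.client_email,
    private_key: serviceAccountKey.private_key,
  };
  // viewIds is optional; omit it to use every view the account can read.
  if (Array.isArray(viewIds) && viewIds.length > 0) {
    credentials.viewIds = viewIds;
  }
  return credentials;
}
```

Copying only the three required fields keeps the rest of the key file (project identifiers, token URLs) out of the crawler configuration.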
