login
Type: object
Parameter syntax
login: {
  fetchRequest: {
    url: 'your_url',
    requestOptions: {
      ...
    }
  }
}

login: {
  browserRequest: {
    url: 'your_login_page',
    username: 'login',
    password: 'password',
  }
}

About this parameter

This property defines how the crawler acquires a session cookie.

The crawler extracts the cookie from the Set-Cookie header of the login response and sends that cookie with every request while crawling the pages of the website defined in the configuration.
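To make the mechanism concrete, here is a minimal sketch of how Set-Cookie response values can be turned into the single Cookie header that later requests carry. It is not the crawler's actual implementation. The helper name cookieHeaderFromSetCookie is hypothetical, and it assumes the Set-Cookie values arrive as an array of strings.

```javascript
// Hypothetical helper: turn Set-Cookie response header values into one Cookie request header.
// Only the leading "name=value" pair of each Set-Cookie is kept; attributes such as
// Path, Expires, HttpOnly, or Secure are never sent back to the server.
function cookieHeaderFromSetCookie(setCookieValues) {
  return setCookieValues
    .map((value) => value.split(';')[0].trim()) // "session=abc; Path=/" -> "session=abc"
    .filter((pair) => pair.includes('='))       // skip malformed entries
    .join('; ');
}

// Usage: two cookies returned by a login endpoint.
const cookieHeader = cookieHeaderFromSetCookie([
  'session=abc123; Path=/; HttpOnly',
  'csrf=xyz; Secure',
]);
console.log(cookieHeader); // session=abc123; csrf=xyz
```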

This cookie is only fetched at the beginning of each complete crawl. If it expires, we won’t renew it automatically.
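Because the cookie is not renewed, it can help to know how long it stays valid compared with how long a complete crawl of your site takes. The sketch below reads the Max-Age or Expires attribute of a Set-Cookie value; the helper name cookieLifetimeSeconds is hypothetical, and it follows the cookie specification's rule that Max-Age takes precedence over Expires. A null result means a session cookie with no explicit expiry.

```javascript
// Hypothetical helper: estimate how many seconds a cookie stays valid.
// Returns null for a session cookie (no Max-Age and no Expires attribute).
function cookieLifetimeSeconds(setCookieValue, now = new Date()) {
  // Collect attributes after the leading "name=value" pair, keyed by lower-case name.
  const attributes = new Map();
  for (const part of setCookieValue.split(';').slice(1)) {
    const index = part.indexOf('=');
    if (index === -1) continue; // flag attributes such as HttpOnly carry no value
    attributes.set(part.slice(0, index).trim().toLowerCase(), part.slice(index + 1).trim());
  }
  // Max-Age takes precedence over Expires when both are present.
  if (attributes.has('max-age')) {
    return Number(attributes.get('max-age'));
  }
  if (attributes.has('expires')) {
    const expiresAt = Date.parse(attributes.get('expires'));
    return Math.round((expiresAt - now.getTime()) / 1000);
  }
  return null; // session cookie: valid until the client discards it
}

const now = new Date(Date.UTC(2024, 0, 1, 0, 0, 0));
console.log(cookieLifetimeSeconds('session=abc; Max-Age=3600; Path=/', now)); // 3600
console.log(cookieLifetimeSeconds('session=abc; Expires=Mon, 01 Jan 2024 02:00:00 GMT', now)); // 7200
console.log(cookieLifetimeSeconds('session=abc; HttpOnly', now)); // null
```

If the lifetime is shorter than a typical crawl, pages fetched after the cookie expires are likely to be crawled as a logged-out visitor.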

There are two ways the crawler can interact with your login page:

  • By sending a direct request with the credentials to your login endpoint, like a standard cURL command.
  • By emulating a web browser: loading your login page, entering the credentials, and submitting the login form.

Examples

{
  login: {
    fetchRequest: {
      url: 'http://example.com/secure/login-with-post',
      requestOptions: {
        method: 'POST',
        headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
        body: 'id=my-id&password=my-password',
        timeout: 5000 // in milliseconds
      }
    }
  }
}
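The body in this example is an application/x-www-form-urlencoded string. Instead of writing it by hand, you can build it with the standard URLSearchParams class, which also percent-encodes special characters in credentials. This is a sketch: the field names id and password come from the example above and must match whatever your own endpoint expects, and whether URLSearchParams is available inside your crawler configuration is an assumption. If it isn't, compute the string locally (for example with Node.js) and paste the result into body.

```javascript
// Build a form-encoded login body; URLSearchParams percent-encodes special characters.
const body = new URLSearchParams({
  id: 'my-id',
  password: 'p@ss word&more', // '@', ' ' and '&' would break a hand-written body
}).toString();

console.log(body); // id=my-id&password=p%40ss+word%26more

const login = {
  fetchRequest: {
    url: 'http://example.com/secure/login-with-post',
    requestOptions: {
      method: 'POST',
      headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
      body, // the encoded string built above
      timeout: 5000, // in milliseconds
    },
  },
};
```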
{
  login: {
    browserRequest: {
      url: 'http://example.com/secure/login-page',
      username: 'my-id',
      password: 'my-password',
    }
  }
}

Parameters

fetchRequest

This allows you to manually craft the login request that the crawler sends.

url
type: string
Required

The URL to target.

requestOptions
type: object

This object is passed to our extended version of the request library.

fetchRequest ➔ requestOptions

method
type: string
default: GET

The HTTP method to use.

headers
type: object
default: {}

HTTP headers to pass.

body
type: string

The body of the request.

timeout
type: number

Time to wait before aborting the request (in milliseconds).
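If your login endpoint expects JSON rather than form data, the same requestOptions fields can carry it: set the Content-Type header to application/json and pass a string produced by JSON.stringify, since body is typed as a string. This is a sketch; the URL and the email and password field names are hypothetical placeholders. It only helps if the endpoint answers with a Set-Cookie header, because that cookie is what the crawler keeps.

```javascript
// Hypothetical JSON login: body must be a string, so serialize the credentials first.
const login = {
  fetchRequest: {
    url: 'https://example.com/api/login', // hypothetical endpoint
    requestOptions: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ email: 'me@example.com', password: 'my-password' }),
      timeout: 5000, // abort after 5 seconds
    },
  },
};

console.log(login.fetchRequest.requestOptions.body);
// {"email":"me@example.com","password":"my-password"}
```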

requestOptions ➔ headers

Content-Type
type: string
Authorization
type: string
type: string
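For an endpoint protected by HTTP Basic authentication, the Authorization header value is "Basic " followed by the Base64 encoding of username:password. The sketch below computes it with the Node.js Buffer class; the helper name basicAuthHeader, the URL, and the credentials are placeholders, and computing the value locally before pasting it into your configuration is an assumption about your workflow rather than a documented crawler feature. As with any fetchRequest, the crawler only keeps the cookie the endpoint returns.

```javascript
// Hypothetical helper: build an HTTP Basic Authorization value, "Basic " + base64("user:password").
function basicAuthHeader(username, password) {
  return 'Basic ' + Buffer.from(`${username}:${password}`, 'utf8').toString('base64');
}

const login = {
  fetchRequest: {
    url: 'https://example.com/secure/login', // placeholder URL
    requestOptions: {
      // method defaults to GET
      headers: { Authorization: basicAuthHeader('my-id', 'my-password') },
    },
  },
};

console.log(login.fetchRequest.requestOptions.headers.Authorization);
```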

browserRequest

Makes the crawler use a web browser to visit your login page and submit the login form, as a human would.

url
type: string
Required

The URL of the login page. The HTML elements expected on this page are input[type=text] or input[type=email] for the username and input[type=password] for the password.
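To check ahead of time whether your page matches these expectations, the rule can be modeled over the page's input elements. This is a hypothetical sketch of the rule stated above, not the crawler's own code: each input is described by its type attribute (undefined when absent), and picking the first match of each kind is an assumption. One detail worth knowing if these selectors are applied literally: the CSS attribute selector input[type=text] only matches inputs that declare type="text", so an input with no type attribute, although rendered as a text field, would not match.

```javascript
// Hypothetical model of input[type=text], input[type=email], and input[type=password].
const USERNAME_TYPES = new Set(['text', 'email']);

function findLoginFields(inputs) {
  // An attribute selector only matches when the attribute is present,
  // so an input without a type attribute is not treated as a username field.
  const username = inputs.find((input) => USERNAME_TYPES.has(input.type)) ?? null;
  const password = inputs.find((input) => input.type === 'password') ?? null;
  return { username, password };
}

const fields = findLoginFields([
  { name: 'q', type: undefined }, // no type attribute: not matched
  { name: 'user', type: 'email' },
  { name: 'pass', type: 'password' },
]);
console.log(fields.username.name, fields.password.name); // user pass
```

If your page has another text input before the login form (a search box, for instance), which field gets filled depends on the crawler's actual matching logic, which this sketch doesn't claim to reproduce.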

username
type: string
Required

The username to enter in the login form.

password
type: string
Required

The password to enter in the login form.

waitTime
type: object
Optional

Sets the minimum and maximum time to wait before the login is considered complete.

browserRequest ➔ waitTime

min
type: number
default: 0
Optional

If the login finishes before this minimum time, the browser stays open until the minimum has elapsed and then returns the cookies.

max
type: number
default: 20000
Optional

Once this maximum time is reached, execution stops and the cookies are returned as they are at that point.
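Taken together, min and max bound the moment the cookies are returned: the login's actual duration, raised to min if it is shorter and cut off at max if it is longer. The sketch below restates that rule with the documented defaults (0 and 20000). The helper name cookieReturnTime is hypothetical, and treating both values as milliseconds is an assumption, consistent with the 20000 default but not stated on this page.

```javascript
// Hypothetical model of browserRequest.waitTime: when are the cookies returned,
// given how long the login actually took? Values are assumed to be in milliseconds.
function cookieReturnTime(loginDuration, { min = 0, max = 20000 } = {}) {
  // A login faster than `min` still waits until `min`;
  // a login slower than `max` is cut off at `max`.
  return Math.min(Math.max(loginDuration, min), max);
}

console.log(cookieReturnTime(1200, { min: 3000, max: 10000 })); // 3000
console.log(cookieReturnTime(5000, { min: 3000, max: 10000 })); // 5000
console.log(cookieReturnTime(25000)); // 20000

// Example configuration using waitTime.
const login = {
  browserRequest: {
    url: 'http://example.com/secure/login-page',
    username: 'my-id',
    password: 'my-password',
    waitTime: { min: 3000, max: 10000 },
  },
};
```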
