
Uploading Existing Events Via CSV

Alternative way of capturing events

Capturing events through the API is essential for the continuous training and improvement of the models. However, it can take time for your users to generate enough events for Recommend. After implementing event ingestion, you can benefit from Recommend sooner by importing past events: upload a CSV file in the Algolia dashboard.

This feature is in beta. By joining the beta program, you understand that Algolia Recommend’s CSV upload might not work for your use case.

You can upload your events when configuring your model in the Collect Events section in the Algolia dashboard.

Your events must meet the following format requirements:

  • The CSV file must be 100 MB or less in size.
  • Each row should represent an event tied to a single objectID.
  • The timestamps should cover a period of at least 30 days, and the data should be as recent as possible. When the model trains, it ignores any data older than 90 days.
  • The first row must be a header row containing userToken, timestamp, objectID, eventType, and eventName. Any extra columns are ignored.

  • The values must match the following criteria:
    • userToken: a unique identifier for the user session.
    • timestamp: the date of the event in a standard format (ISO 8601 or RFC 3339), with or without the time.
    • objectID: a unique identifier for the item the event is tied to.
    • eventType: the type of event (either ‘click’ or ‘conversion’).
    • eventName: a name for the event, which can be the same as eventType.
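
For example, a minimal well-formed file could look like this (the user tokens and object IDs are hypothetical):

userToken,timestamp,objectID,eventType,eventName
anonymous-71e9c1f7,2023-10-05T14:48:00Z,SKU-001,click,product_view
anonymous-71e9c1f7,2023-10-05T14:52:10Z,SKU-001,conversion,purchase
anonymous-8bd3a0dc,2023-11-02,SKU-042,click,click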

After you upload a well-formatted file with enough events, the model can start training. If you upload a new file, training takes only the newer file into account and discards the old one.

Recommend models rely only on the timestamp values to determine the most recent window with enough data for training. For example, if all the events you upload have timestamps older than 90 days, the models have no valid events to train on. Once you send enough events through the API to train the model, Algolia Recommend uses only those events for training and discards the events from the CSV file.

Exporting Google Analytics events through BigQuery

To export Google Analytics (GA360) data from BigQuery, your GA360 property must be linked to BigQuery so that the daily ga_sessions_* export tables are available.

The productSKU from GA360 must match the objectID in your Algolia index.

You can use the query below to export the data required to train both models. In the following code, you must replace:

  • GCP_PROJECT_ID with the name of the project that holds the GA360 data in BigQuery.
  • BQ_DATASET with the name of the dataset the exports are stored in.
  • DATE_FROM and DATE_TO with the corresponding dates (in YYYY-MM-DD format) for a time window of at least 30 days.
WITH ecommerce_data AS (
    SELECT
        fullVisitorId as user_token,
        TIMESTAMP_SECONDS(visitStartTime + CAST(hits.time/1000 AS INT64)) as timestamp,
        products.productSKU as object_id,
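        -- GA360 enhanced ecommerce action types:
        -- "2" = product detail view, "3" = add to cart, "5" = checkout, "6" = purchase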
        CASE WHEN hits.eCommerceAction.action_type = "2" THEN 'click'
            WHEN hits.eCommerceAction.action_type = "3" THEN 'click'
            WHEN hits.eCommerceAction.action_type = "5" THEN 'click'
            WHEN hits.eCommerceAction.action_type = "6" THEN 'conversion'
        END
        AS event_type,
        CASE WHEN hits.eCommerceAction.action_type = "2" THEN "product_view"
            WHEN hits.eCommerceAction.action_type = "3" THEN "add_to_cart"
            WHEN hits.eCommerceAction.action_type = "5" THEN "checkout"
            WHEN hits.eCommerceAction.action_type = "6" THEN "purchase"
        END
        AS event_name
    FROM
        `GCP_PROJECT_ID.BQ_DATASET.ga_sessions_*`,
        UNNEST(hits) as hits,
        UNNEST(hits.product) as products
    WHERE
        _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE('DATE_FROM')) AND FORMAT_DATE('%Y%m%d',DATE('DATE_TO'))
    AND
        fullVisitorId IS NOT NULL
    AND hits.eCommerceAction.action_type in UNNEST(["2", "3", "5", "6"])
),
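-- Deduplicate events: GROUP BY collapses rows that are identical across all five columns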
dedup_ecommerce_data AS (
    SELECT user_token as userToken,
    timestamp,
    event_name as eventName,
    event_type as eventType,
    object_id as objectID
    FROM ecommerce_data
    GROUP BY userToken, timestamp, eventName, eventType, objectID
)

SELECT * FROM dedup_ecommerce_data

You can run this query in the SQL workspace for BigQuery. You can then export the results as a CSV file to Google Drive, where it's available for download.
You can also automate this task with the BigQuery API client libraries.
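
As a sketch of that automation, assuming the Python client library (google-cloud-bigquery), an authenticated environment, and the query above stored in a QUERY string, you could run the export and write the rows to a CSV file directly:

import csv

from google.cloud import bigquery

# Assumes your environment is authenticated for the GCP project
# that holds the GA360 export tables.
QUERY = "..."  # the export query from the previous section

client = bigquery.Client(project="GCP_PROJECT_ID")  # replace with your project ID
rows = client.query(QUERY).result()  # starts the job and waits for it to finish

with open("recommend_events.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["userToken", "timestamp", "objectID", "eventType", "eventName"])
    for row in rows:
        writer.writerow([
            row["userToken"],
            row["timestamp"].isoformat(),  # BigQuery TIMESTAMP arrives as a datetime
            row["objectID"],
            row["eventType"],
            row["eventName"],
        ])

This writes the columns in the exact order the CSV upload expects.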
