Description
- The Clickstream dataset is a user and event-level dataset that reports on every Page View and Conversion event tracked via the Rockerbox pixel on your site.
- These events are attributed back to click-based marketing touchpoints, with the Rockerbox tier structure and spend keys applied for standardization with other Rockerbox conversion datasets.
Table Creation
- This table is automatically created upon activation of the clickstream feature.
- Once the Clickstream dataset has been enabled by Rockerbox, the table will appear in your data warehouse as ON_SITE_EVENTS_ALL_PAGES.
Partition Keys
date
Logical Primary Key
While data warehouses do not enforce primary key constraints, the event_id functions as the logical primary key for the table.
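Because the key is logical rather than enforced, it can be worth checking uniqueness yourself after a sync. The following is a minimal sketch in Python, assuming you have already pulled the event_id column into a list; the sample values are hypothetical.

```python
from collections import Counter

# Hypothetical sample of event_id values pulled from the table.
event_ids = ["evt-001", "evt-002", "evt-003", "evt-002"]

# Any event_id that appears more than once violates the logical primary key.
duplicates = sorted(eid for eid, n in Counter(event_ids).items() if n > 1)
print(duplicates)
```

An empty list means every event_id in the sample is unique.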
Field Reference
| # | Name | Description | Type |
|---|---|---|---|
| 1 | action | Name of the raw pixel event. This includes events for all conversion segments configured in Rockerbox + a page view. | str |
| 2 | advertiser | Rockerbox Account ID | str |
| 3 | base_id | Primary User ID | str |
| 4 | date | Date when the action occurred | date |
| 5 | engaged_session | Binary 0 or 1 indicating a session lasting > 10 seconds (session_max - session_min > 10). | int |
| 6 | event_id | A unique identifier for each action. Can be used as the primary key. | str |
| 7 | hash_ip_events | Hashed IP address of user for a particular action | str |
| 8 | identifier | Advertiser-specific identifier | str |
| 9 | marketing_type | Type of marketing touchpoint. This will always be onsite as this dataset reflects click-based marketing events only. | str |
| 10 | onsite_count | The total number of actions seen against a given user within a given session. | int |
| 11 | original_url | URL of the landing page | str |
| 12 | rb_sync_id | Identifier used by Rockerbox to sync dataset to your warehouse | int |
| 13 | report | The name of the report | str |
| 14 | request_referrer | Page Referrer (the previous site where the user came from) | str |
| 15 | session_id | Identifier for a session, indicated by a timestamp. A unique user session will be a combination of session_id|uid, or can be identified using session_start. | str |
| 16 | session_max | Timestamp of the last time a user was seen on site during a session | timestamp |
| 17 | session_min | Timestamp of the first time a user was seen on site during a session | timestamp |
| 18 | session_start | Binary field indicating the first event of a session. Can be used for session / visitor analysis by filtering for session_start = 1. | int |
| 19 | spend_key | The ID used to pull spend from an advertising platform. This is typically the Ad ID, but may differ based on your account setup. | str |
| 20 | tier_1 | Aligns to the 5-tiered categorization structure available in the Rockerbox UI. tier_1 = most broad categorization (level 1). | str |
| 21 | tier_2 | Aligns to the 5-tiered categorization structure available in the Rockerbox UI. tier_2 = level 2 categorization (more granular than level 1). | str |
| 22 | tier_3 | Aligns to the 5-tiered categorization structure available in the Rockerbox UI. tier_3 = level 3 categorization. | str |
| 23 | tier_4 | Aligns to the 5-tiered categorization structure available in the Rockerbox UI. tier_4 = level 4 categorization. | str |
| 24 | tier_5 | Aligns to the 5-tiered categorization structure available in the Rockerbox UI. tier_5 = level 5 categorization. | str |
| 25 | timestamp_action | Timestamp of when the action occurred | timestamp |
| 26 | timestamp_event | Timestamp of when the marketing touchpoint occurred. Populated only on the first action within a session, since that is the only action with a marketing touchpoint. On that row, timestamp_event and timestamp_action will match. | timestamp |
| 27 | transform_table_id | ID associated with the Rockerbox table used to apply mappings and spend. Closed beta feature only. | int |
| 28 | uid | Rockerbox User ID cookie | str |
| 29 | updated_at | Time the cache record was updated most recently | timestamp |
| 30 | user_agent | Web identifier that includes characteristics like browser, device, operating system, and application | str |
| 31 | utm_campaign | utm_campaign value parsed from the landing page URL, if present | str |
| 32 | utm_content | utm_content value parsed from the landing page URL, if present | str |
| 33 | utm_id | utm_id value parsed from the landing page URL, if present | str |
| 34 | utm_medium | utm_medium value parsed from the landing page URL, if present | str |
| 35 | utm_source | utm_source value parsed from the landing page URL, if present | str |
| 36 | utm_term | utm_term value parsed from the landing page URL, if present | str |
Clickstream FAQ
How does Rockerbox handle a session that spans two dates (UTC)?
- Rockerbox will break a session (generating a new session_id and date) if a user’s session is active across two days. This may be common if users are active around UTC midnight.
How does Rockerbox define a session? How can I make sure a session is unique when I query against the dataset?
- The session_id used by Rockerbox is a timestamp of the first event in a session, vs the session_id cached in your browser. This allows Rockerbox to maintain the same session_id when a user opens a new tab.
- Because session_id is defined as a timestamp, it’s not unique per user. To identify all unique user sessions, combine session_id and uid (session_id|uid) or filter for session_start = 1 in your query.
- A session “expires” after 30 minutes of inactivity. If the user is seen as active again, the session_id will be reset.
- A session is NOT reset if new source information is provided (for example, a UTM on an internal page).
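The two ways of counting unique sessions described above can be sketched as follows. This is an illustration only: it uses Python's built-in sqlite3 with a handful of hypothetical rows, and the SQL dialect of your own warehouse may differ. Only the column names (uid, session_id, session_start) and the table name come from this page.

```python
import sqlite3

# Hypothetical sample rows: (uid, session_id, session_start).
rows = [
    ("user_a", "1700000000", 1),
    ("user_a", "1700000000", 0),
    ("user_b", "1700000000", 1),  # same session_id as user_a, different user
    ("user_a", "1700005000", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE on_site_events_all_pages "
    "(uid TEXT, session_id TEXT, session_start INTEGER)"
)
conn.executemany("INSERT INTO on_site_events_all_pages VALUES (?, ?, ?)", rows)

# Option 1: count distinct (session_id, uid) pairs.
by_key = conn.execute(
    "SELECT COUNT(*) FROM "
    "(SELECT DISTINCT session_id, uid FROM on_site_events_all_pages)"
).fetchone()[0]

# Option 2: count rows flagged as the first event of a session.
by_flag = conn.execute(
    "SELECT COUNT(*) FROM on_site_events_all_pages WHERE session_start = 1"
).fetchone()[0]

# Counting session_id alone undercounts, because two users share a timestamp.
naive = conn.execute(
    "SELECT COUNT(DISTINCT session_id) FROM on_site_events_all_pages"
).fetchone()[0]

print(by_key, by_flag, naive)
```

In this sample both recommended approaches return 3 sessions, while counting session_id on its own returns 2.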
How do Rockerbox sessions compare to other session sources?
- Rockerbox’s source data is pixel based events, with sessions logic layered on top of source data to group disparate actions into connected sessions. Source pixel data may differ from session source data from other providers like Shopify, GA4, Amplitude, etc.
- Rockerbox’s session definition (described above) may differ from other source session definitions and cannot be customized to match the definitions of other data providers.
How does Rockerbox handle bot traffic?
- Today, the only filtering performed on top of your raw site data is to remove any uids seen > 200x in the same day. Additional bot filtering is not applied by default, knowing that brands often prefer a custom approach to this type of filtering.
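If you want to layer a custom rule on top of the default, one option is the same kind of per-day volume check with a stricter cutoff. The sketch below is an assumption-laden illustration, not Rockerbox's implementation: the limit of 50, the table name events, and the sample rows are all hypothetical, and it runs against Python's built-in sqlite3 rather than your warehouse.

```python
import sqlite3

# Hypothetical stricter cutoff; the default filter described above uses 200.
CUSTOM_DAILY_LIMIT = 50

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (uid TEXT, date TEXT, event_id TEXT)")

# Hypothetical data: one heavy uid (60 events) and one light uid (3 events).
rows = [("uid_heavy", "2024-01-01", f"h{i}") for i in range(60)]
rows += [("uid_light", "2024-01-01", f"l{i}") for i in range(3)]
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

# Flag every (uid, date) pair whose event count exceeds the custom limit.
flagged = conn.execute(
    """
    SELECT uid, date, COUNT(*) AS n_events
    FROM events
    GROUP BY uid, date
    HAVING COUNT(*) > ?
    """,
    (CUSTOM_DAILY_LIMIT,),
).fetchall()

print(flagged)
```

The flagged pairs can then be excluded from downstream queries, for example with an anti-join on uid and date.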
Why are some fields for a given row in the dataset blank?
- Not all actions will have associated marketing context. Typically, only the first action in a session will pass along click-based marketing context in the URL. In your Rockerbox conversion data, the marketing context for each conversion event is carried over via this first action with marketing context. In this dataset, only the events that carry marketing context will have relevant fields populated to avoid any duplication.
Can I see non-click marketing context?
- Non-click marketing context like view-based data from Linear TV, OTT, Display, and Social as well as other marketing context like promo code attribution or direct mail matching is not currently available in this dataset.
How can I identify a repeat visitor?
- When a visitor returns to the site, they may or may not have the same Rockerbox cookie ID (uid). To string together a user path when the uid is NOT the same, Rockerbox applies an identity resolution process to our conversion datasets. This is not currently available for this dataset.
- Users with > 200 events / day are excluded from this dataset under the assumption that these are admin users or server-side cookie IDs.
What is an engaged session?
- An engaged session is a session where the difference between the session_max and session_min timestamps is > 10 seconds.
- By this logic, a session with only 1 action cannot be an engaged session, since the session_min and session_max timestamps will be the same.
- The engaged_session flag carries through all events within a given session (e.g., sessions with 4 events that last > 10 seconds will have an engaged_session flag on each row). To compare engaged session starts to overall session starts, filter for session_start = 1 in your queries.
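The definition above can be sketched in a few lines of Python. This is an illustrative reconstruction from the description on this page, not Rockerbox's code; the helper name engaged_flag, the session labels, and the sample rows are hypothetical.

```python
from datetime import datetime, timedelta


def engaged_flag(session_min, session_max):
    """Return 1 when the session lasted strictly more than 10 seconds, else 0."""
    return int((session_max - session_min) > timedelta(seconds=10))


t0 = datetime(2024, 1, 1, 12, 0, 0)

# Hypothetical rows, one per event. Session A lasts 45s, B has a single
# event, and C lasts exactly 10s (which does not count as engaged).
rows = [
    {"session": "A", "session_start": 1, "session_min": t0, "session_max": t0 + timedelta(seconds=45)},
    {"session": "A", "session_start": 0, "session_min": t0, "session_max": t0 + timedelta(seconds=45)},
    {"session": "B", "session_start": 1, "session_min": t0, "session_max": t0},
    {"session": "C", "session_start": 1, "session_min": t0, "session_max": t0 + timedelta(seconds=10)},
    {"session": "C", "session_start": 0, "session_min": t0, "session_max": t0 + timedelta(seconds=10)},
]

# The flag is the same on every row of a session.
for row in rows:
    row["engaged_session"] = engaged_flag(row["session_min"], row["session_max"])

# Restrict to session starts so each session is counted once.
starts = [r for r in rows if r["session_start"] == 1]
engaged_starts = sum(r["engaged_session"] for r in starts)

print(engaged_starts, len(starts))
```

Here 1 of the 3 session starts is engaged. Without the session_start = 1 filter, session A would be counted twice.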
Why does the Clickstream channel attribution differ from GA4?
- Rockerbox applies custom channel rules per advertiser using custom logic and parameters beyond UTMs, which can lead to variances in how a session is categorized for attribution.
- Most advertisers have “Last Non-Direct Click Attribution” applied in GA4, meaning Direct attribution may be overwritten by the last non-Direct marketing channel seen against the same user.
How can I join the Clickstream dataset to the Clickstream Event Parameters dataset?
- Each unique event_id in the Clickstream dataset will have multiple rows in the Event Parameters dataset, reflecting the individual parameters passed on the pixel. To retrieve specific query_param_name and value details, most advertisers will:
- Join on the event_id and date
- Filter by a specific query_param_name
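The join and filter steps above can be sketched as follows, using Python's built-in sqlite3 as a stand-in for your warehouse. The table names clickstream and event_params, the parameter name revenue, and all sample rows are hypothetical; only the columns event_id, date, action, query_param_name, and value are taken from this page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE clickstream (event_id TEXT, date TEXT, action TEXT);
    CREATE TABLE event_params (
        event_id TEXT, date TEXT, query_param_name TEXT, value TEXT
    );
    """
)

# Hypothetical sample data: one row per event, several parameter rows per event.
conn.executemany(
    "INSERT INTO clickstream VALUES (?, ?, ?)",
    [("e1", "2024-01-01", "view"), ("e2", "2024-01-01", "purchase")],
)
conn.executemany(
    "INSERT INTO event_params VALUES (?, ?, ?, ?)",
    [
        ("e1", "2024-01-01", "page_title", "Home"),
        ("e2", "2024-01-01", "order_id", "A-100"),
        ("e2", "2024-01-01", "revenue", "59.99"),
    ],
)

# Join on event_id and date, then keep a single parameter of interest.
result = conn.execute(
    """
    SELECT c.event_id, c.action, p.value
    FROM clickstream AS c
    JOIN event_params AS p
      ON p.event_id = c.event_id AND p.date = c.date
    WHERE p.query_param_name = ?
    """,
    ("revenue",),
).fetchall()

print(result)
```

Filtering to one query_param_name keeps the result at one row per event; without the filter, each event would repeat once per parameter.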
What is each query_param_name?
- The query_param_name and value fields are parsed directly from your on-site pixels with no further modifications applied. While certain parameters are required to be passed for Rockerbox implementation, in many cases additional parameters are also provided. Questions about what values are passed and what each means will likely need to be investigated by your team as the experts on your implementation and data layer, vs by the Rockerbox team.
- If you don’t see a certain query_param_name, check the name of the field in your implementation (e.g., in GTM). Otherwise, check if the query_param_name is passed anywhere on your pixel.