
Overview

With a Snowflake Reader Account, you can access Rockerbox datasets and egress the data to the platform of your choice.
  • You do not need to be a Snowflake customer to have access to a reader account.
  • You will not pay for any storage or compute for this account; these are billed to Rockerbox as the provider of the account.
📊 Rockerbox enforces a monthly credit quota on the account, and will work with you to ensure that you can run the egress out of Snowflake within that quota.

Step 1: Access your reader account 

  • Rockerbox Support will provide you with your (1) username, (2) password, and (3) account URL
  • Log in to your Snowflake reader account

Step 2: Sync Rockerbox Data Sets

  • Choose which datasets to share and create the share tables in the Rockerbox UI:
    • Platform Data: Select the ad platforms you actively use.
    • Rockerbox Data: Select Conversion and any Log Level MTA datasets for each conversion event that you need.
  • Click Sync this dataset when ready.
📦 The aggregate_mta and taxonomy_lookup tables are created automatically in the data share.

⚠️ Backfill Notes

  • Platform Performance Schemas: No data is backfilled automatically. Backfills are available on a limited basis upon request to support@rockerbox.com.
  • Rockerbox First Party Data Schemas: For each conversion dataset, Rockerbox will backfill based on the conversion event “First Reporting Date” in Rockerbox.
    💡 If no date is set, Rockerbox backfills one day of data. This process can take up to 24 hours.

Step 3: Query your new data tables

Open a new SQL worksheet to access the query editor and confirm that you can run a query against a share table.

Test setup:
SELECT * FROM <share_db>.public.aggregate_mta LIMIT 1;

Step 4: Build egress pipeline

Objective: Pull Rockerbox data out of a Snowflake reader account and load it into an external destination.

General Notes
  • Run this script for each table that needs to be egressed.
  • Rockerbox recommends processing updates 3 times per day for the most recent 2 days.
  • For longer lookback windows, you can ingest updates on a rolling basis depending on your integrations. Work with Rockerbox Support to determine the right update frequency and configure the lookback appropriately (see the sketch after this list).
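
For reference, here is a minimal sketch of how the recommended 2-day rolling window might be computed before each run. The LOOKBACK_DAYS constant and dates_to_process name are illustrative, not part of the Rockerbox pipeline; adjust the window if you configure a longer lookback.

# A minimal sketch of the recommended rolling window (an assumption to
# tune with Rockerbox Support if you need a longer lookback)
from datetime import date, timedelta

LOOKBACK_DAYS = 2  # most recent 2 days, per the recommendation above

today = date.today()
dates_to_process = [
    (today - timedelta(days=offset)).isoformat()
    for offset in range(LOOKBACK_DAYS)
]
print(dates_to_process)  # e.g. ['2024-05-02', '2024-05-01']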

Step 1: Create Python connector

# https://docs.snowflake.com/en/developer-guide/python-connector/python-connector-connect
import snowflake.connector

# Credentials are provided by Rockerbox Support (see Step 1 above)
conn = snowflake.connector.connect(
    user='READER_ACCOUNT_USER',
    password='READER_ACCOUNT_PASSWORD',
    account='READER_ACCOUNT_ACCOUNT_IDENTIFIER'
)
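
Before building the rest of the pipeline, it is worth confirming the connection is live. A minimal sanity check (SELECT CURRENT_VERSION() is a standard Snowflake statement):

# Sanity check: confirm the reader account connection works
with conn.cursor() as cur:
    print(cur.execute("SELECT CURRENT_VERSION()").fetchone())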

Step 2: Create a list of dates to process for a given table

# Run query to identify files in the Snowflake external table that were updated in the last 24 hours
# Note: TABLE_NAME is dependent on the name of the table you defined in the Rockerbox UI
# Note: Adjust the lookback interval depending on how frequently you want to extract the data

DATABASE_NAME = "ROCKERBOX"
SCHEMA_NAME = "PUBLIC"
TABLE_NAME = "PLACEHOLDER"


with conn.cursor() as cur:
    MANIFEST_QUERY = f"""
    -- Replace this with one of the SQL queries below, depending on the table schema
    SELECT * FROM <your_table>
    """
    
    # Results is a list of tuples that contains fields:
    #  - advertiser, report, identifier, date, registered_on
    #  - iterate through dates to load into external destination
    results = cur.execute(MANIFEST_QUERY).fetchall()

Query for platform performance tables:

SELECT
  SPLIT_PART(SPLIT_PART(file_name, '/', 2), '=', 2) as advertiser,
  SPLIT_PART(SPLIT_PART(file_name, '/', 5), '=', 2) as report,
  SPLIT_PART(SPLIT_PART(file_name, '/', 6), '=', 2) as identifier,
  SPLIT_PART(
      REGEXP_SUBSTR(
          file_name,
          'date=[0-9]{4}-[0-9]{2}-[0-9]{2}',
          1,
          1
      ),
      '=',
      2
  ) AS date,
  registered_on
FROM
    TABLE(
        {DATABASE_NAME}.information_schema.external_table_files(
            TABLE_NAME => '{DATABASE_NAME}.{SCHEMA_NAME}.{TABLE_NAME}'
        )
    )
WHERE
    last_modified >= CURRENT_TIMESTAMP - interval '24 hour';

Query for the aggregate_mta table:

SELECT
    file_name,
    SPLIT_PART(SPLIT_PART(file_name, '/', 3), '=', 2) as advertiser,
    SPLIT_PART(SPLIT_PART(file_name, '/', 2), '=', 2) as report,
    SPLIT_PART(SPLIT_PART(file_name, '/', 7), '=', 2) as conversion_event_id,
    SPLIT_PART(
        REGEXP_SUBSTR(
            file_name,
            'date=[0-9]{4}-[0-9]{2}-[0-9]{2}',
            1,
            1
        ),
        '=',
        2
    ) AS date,
    registered_on
FROM
    TABLE(
        {DATABASE_NAME}.information_schema.external_table_files(
            TABLE_NAME => '{DATABASE_NAME}.{SCHEMA_NAME}.aggregate_mta'
        )
    )
WHERE
    last_modified >= CURRENT_TIMESTAMP - interval '24 hour';
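
One gotcha when pasting either query into the MANIFEST_QUERY f-string above: Python treats the regex quantifiers {4} and {2} as replacement fields, so the quantifier braces must be doubled when embedding. A sketch of the fix, trimmed to the date column for brevity:

# Inside an f-string, double the regex quantifier braces so Python does
# not treat them as replacement fields ({4} becomes {{4}})
MANIFEST_QUERY = f"""
SELECT
  SPLIT_PART(
      REGEXP_SUBSTR(file_name, 'date=[0-9]{{4}}-[0-9]{{2}}-[0-9]{{2}}', 1, 1),
      '=', 2
  ) AS date,
  registered_on
FROM TABLE(
    {DATABASE_NAME}.information_schema.external_table_files(
        TABLE_NAME => '{DATABASE_NAME}.{SCHEMA_NAME}.{TABLE_NAME}'
    )
)
WHERE last_modified >= CURRENT_TIMESTAMP - interval '24 hour'
"""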

Step 3: Load the data to your destination

with conn.cursor() as cur:
    for result in results:
        date = result[3]  # position of "date" in the manifest output tuple
        rows = cur.execute(f"""
        SELECT * FROM {DATABASE_NAME}.{SCHEMA_NAME}.{TABLE_NAME}
        WHERE date = '{date}'
        """).fetchall()

        ## INSERT CODE TO LOAD "rows" TO YOUR EXTERNAL DESTINATION HERE