
Overview

With a Snowflake Reader Account, you can access Rockerbox datasets and egress the data to the platform of your choice.
  • You do not need to be a Snowflake customer to have access to a reader account.
  • You will not pay for any storage or compute for this account; these are billed to Rockerbox as the provider of the account.
📊 Rockerbox enforces a monthly credit quota on the account, and will work with you to ensure that you can run the egress out of Snowflake within that quota.

Step 1: Access your reader account 

  • Rockerbox Support will provide you with your (1) username, (2) password, and (3) account URL
  • Log in to your Snowflake reader account

Step 2: Sync Rockerbox Data Sets

  • Choose which datasets to share and create the share tables in the Rockerbox UI:
    • Platform Data: Select the ad platforms you actively use.
    • Rockerbox Data: Select Conversion and any Log Level MTA datasets for each conversion event that you need.
  • Click Sync this dataset when ready.
📦 The aggregate_mta and taxonomy_lookup tables are created automatically in the data share.

⚠️ Backfill Notes

  • Platform Performance Schemas: No data is backfilled automatically. Backfills are available on a limited basis upon request to support@rockerbox.com.
  • Rockerbox First Party Data Schemas: For each conversion dataset, Rockerbox will backfill based on the conversion event “First Reporting Date” in Rockerbox.
    💡 If no date is set, Rockerbox backfills one day of data. This process can take up to 24 hours.

Step 3: Query your new data tables

Open a new SQL worksheet to access the query editor and confirm that you can run a query against a share table.

Test setup:
SELECT * FROM <share_db>.public.aggregate_mta LIMIT 1;

Step 4: Build egress pipeline

Objective: Pull Rockerbox data out of a Snowflake reader account and load it into an external destination.

General Notes
  • Run this script for each table that needs to be egressed.
  • Rockerbox recommends processing updates 3 times per day for the most recent 2 days.
  • For longer lookback windows, you can ingest updates on a rolling basis depending on your integrations. Work with Rockerbox Support to determine the right update frequency and configure the lookback appropriately (see the sketch after this list).
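
For reference, here is a minimal sketch of how the recommended 2-day rolling window might be computed before each run. The LOOKBACK_DAYS constant and dates_to_process name are illustrative, not part of the Rockerbox pipeline; adjust the window if you configure a longer lookback.

# A minimal sketch of the recommended rolling window (an assumption to
# tune with Rockerbox Support if you need a longer lookback)
from datetime import date, timedelta

LOOKBACK_DAYS = 2  # most recent 2 days, per the recommendation above

today = date.today()
dates_to_process = [
    (today - timedelta(days=offset)).isoformat()
    for offset in range(LOOKBACK_DAYS)
]
print(dates_to_process)  # e.g. ['2024-05-02', '2024-05-01']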

Step 1: Create Python connector

# https://docs.snowflake.com/en/developer-guide/python-connector/python-connector-connect
import snowflake.connector

# Credentials are provided by Rockerbox Support (see Step 1 above)
conn = snowflake.connector.connect(
    user='READER_ACCOUNT_USER',
    password='READER_ACCOUNT_PASSWORD',
    account='READER_ACCOUNT_ACCOUNT_IDENTIFIER'
)
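
Before building the rest of the pipeline, it is worth confirming the connection is live. A minimal sanity check (SELECT CURRENT_VERSION() is a standard Snowflake statement):

# Sanity check: confirm the reader account connection works
with conn.cursor() as cur:
    print(cur.execute("SELECT CURRENT_VERSION()").fetchone())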

Step 2: Create a list of dates to process for a given table

# Run query to identify files in the Snowflake external table that were updated in the last 24 hours
# Note: TABLE_NAME is dependent on the name of the table you defined in the Rockerbox UI
# Note: Adjust the lookback interval depending on how frequently you want to extract the data

DATABASE_NAME = "ROCKERBOX"
SCHEMA_NAME = "PUBLIC"
TABLE_NAME = "PLACEHOLDER"


with conn.cursor() as cur:
    MANIFEST_QUERY = f"""
    -- Replace this with one of the SQL queries below, depending on the table schema
    SELECT * FROM <your_table>
    """
    
    # Results is a list of tuples that contains fields:
    #  - advertiser, report, identifier, date, registered_on
    #  - iterate through dates to load into external destination
    results = cur.execute(MANIFEST_QUERY).fetchall()

Query for platform performance tables:

SELECT
  SPLIT_PART(SPLIT_PART(file_name, '/', 2), '=', 2) as advertiser,
  SPLIT_PART(SPLIT_PART(file_name, '/', 5), '=', 2) as report,
  SPLIT_PART(SPLIT_PART(file_name, '/', 6), '=', 2) as identifier,
  SPLIT_PART(
      REGEXP_SUBSTR(
          file_name,
          'date=[0-9]{4}-[0-9]{2}-[0-9]{2}',
          1,
          1
      ),
      '=',
      2
  ) AS date,
  registered_on
FROM
    TABLE(
        {DATABASE_NAME}.information_schema.external_table_files(
            TABLE_NAME => '{DATABASE_NAME}.{SCHEMA_NAME}.{TABLE_NAME}'
        )
    )
WHERE
    last_modified >= CURRENT_TIMESTAMP - interval '24 hour';

Query for the aggregate_mta table:

SELECT
    file_name,
    SPLIT_PART(SPLIT_PART(file_name, '/', 3), '=', 2) as advertiser,
    SPLIT_PART(SPLIT_PART(file_name, '/', 2), '=', 2) as report,
    SPLIT_PART(SPLIT_PART(file_name, '/', 7), '=', 2) as conversion_event_id,
    SPLIT_PART(
        REGEXP_SUBSTR(
            file_name,
            'date=[0-9]{4}-[0-9]{2}-[0-9]{2}',
            1,
            1
        ),
        '=',
        2
    ) AS date,
    registered_on
FROM
    TABLE(
        {DATABASE_NAME}.information_schema.external_table_files(
            TABLE_NAME => '{DATABASE_NAME}.{SCHEMA_NAME}.aggregate_mta'
        )
    )
WHERE
    last_modified >= CURRENT_TIMESTAMP - interval '24 hour';
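
One gotcha when pasting either query into the MANIFEST_QUERY f-string above: Python treats the regex quantifiers {4} and {2} as replacement fields, so the quantifier braces must be doubled when embedding. A sketch of the fix, trimmed to the date column for brevity:

# Inside an f-string, double the regex quantifier braces so Python does
# not treat them as replacement fields ({4} becomes {{4}})
MANIFEST_QUERY = f"""
SELECT
  SPLIT_PART(
      REGEXP_SUBSTR(file_name, 'date=[0-9]{{4}}-[0-9]{{2}}-[0-9]{{2}}', 1, 1),
      '=', 2
  ) AS date,
  registered_on
FROM TABLE(
    {DATABASE_NAME}.information_schema.external_table_files(
        TABLE_NAME => '{DATABASE_NAME}.{SCHEMA_NAME}.{TABLE_NAME}'
    )
)
WHERE last_modified >= CURRENT_TIMESTAMP - interval '24 hour'
"""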

Step 3: Load the data to your destination

with conn.cursor() as cur:
    for result in results:
        date = result[3]  # position of "date" in the manifest output tuple
        rows = cur.execute(f"""
        SELECT * FROM {DATABASE_NAME}.{SCHEMA_NAME}.{TABLE_NAME}
        WHERE date = '{date}'
        """).fetchall()

        ## INSERT CODE TO LOAD "rows" TO YOUR EXTERNAL DESTINATION HERE