Compare BigQuery job costs for running a job either on BigQuery Editions with the autoscaler or on-demand, under both the new and old pricing models.
/*
 * This query analyzes the past 30 days of job history to estimate its cost under
 * BigQuery Editions with the new autoscaling feature, for both PAYG (Pay As You Go)
 * and commitment models, and compares that against running the same jobs under the
 * on-demand model.
 *
 * Note that this query models several behaviors of the BigQuery autoscaler:
 * the up-to-10-second slot scale-up time, the minimum 60-second slot scale-down
 * time, and the fact that the autoscaler scales up and down in increments of
 * 100 slots for each job. Scale-up and scale-down periods are still billed.
 *
 * The costs generated here are strictly estimates, and for Editions they should
 * be treated as a minimum cost of the query, since the autoscaler may not scale
 * "evenly" and may change its scaling during runtime.
*/
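-- Worked example of the modeling above (hypothetical numbers, not taken from any real job):
-- a job that runs for 30 seconds and consumes 4,000 slot-seconds gets an autoscaled duration of
-- 30 + 10 + 60 = 100 seconds. Average slots = 4,000 / 100 = 40, which rounds up to 100 slots.
-- Slot-hours = 100 slots * 100 s / 3,600 s ≈ 2.78, so the Enterprise PAYG estimate is roughly
-- 2.78 * $0.06 ≈ $0.17, while the on-demand estimate is $6.25 per TiB processed by the job
-- (e.g. 0.5 TiB ≈ $3.13).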
WITH jobs_duration AS
(
  SELECT
    job_id,
    query,
    start_time,
    end_time,
    TIMESTAMP_DIFF(end_time, start_time, SECOND) AS job_duration_s,
    -- Add 60 seconds scale-down time + 10 seconds scale-up time to the duration for the autoscaled duration
    TIMESTAMP_DIFF(end_time, start_time, SECOND) + 60 + 10 AS autoscaled_duration_s,
    -- Turn slot milliseconds into slot seconds
    total_slot_ms / 1000 AS total_slot_s
  FROM
    `region-us`.INFORMATION_SCHEMA.JOBS
  WHERE
    creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY) AND CURRENT_TIMESTAMP()
    --AND job_id NOT LIKE 'script_%'
    AND total_slot_ms > 0
    AND total_bytes_billed > 0
    AND state = "DONE"
),
jobs_slot_usage AS
(
  SELECT
    jd.job_id,
    jd.query,
    job_duration_s,
    total_bytes_processed / POW(1024, 4) AS total_tib_processed,
    CAST(FLOOR((CEIL(total_slot_s / autoscaled_duration_s) + 99) / 100) * 100 AS INT64) AS autoscaled_slot_usage,
    autoscaled_duration_s,
    -- Slot usage (rounded up to the next 100 slots) * duration in seconds / (60 seconds * 60 seconds) = slot-hours used by this job
    CAST(FLOOR((CEIL(total_slot_s / autoscaled_duration_s) + 99) / 100) * 100 AS INT64)
      * (autoscaled_duration_s / (60 * 60))
      AS slot_hour,
  FROM
    jobs_duration AS jd JOIN `region-us`.INFORMATION_SCHEMA.JOBS AS jobs
      ON jd.job_id = jobs.job_id
)
SELECT
  job_id,
  query,
  job_duration_s,
  total_tib_processed,
  -- Removed this because it has been a few months since the price change and it isn't very relevant anymore
  --ROUND(total_tib_processed * 5, 2) AS old_on_demand_cost,
  ROUND(total_tib_processed * 6.25, 2) AS on_demand_cost,
  autoscaled_duration_s,
  slot_hour,
  ROUND(slot_hour * 0.04, 2) AS standard_edition_cost,
  ROUND(slot_hour * 0.06, 2) AS enterprise_edition_cost,
  ROUND(slot_hour * 0.1, 2) AS enterprise_plus_edition_cost,
  ROUND(slot_hour * 0.048, 2) AS enterprise_edition_1yr_commit_cost,
  ROUND(slot_hour * 0.036, 2) AS enterprise_edition_3yr_commit_cost,
  ROUND(slot_hour * 0.08, 2) AS enterprise_plus_edition_1yr_commit_cost,
  ROUND(slot_hour * 0.06, 2) AS enterprise_plus_edition_3yr_commit_cost
FROM
  jobs_slot_usage;
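
-- A minimal follow-up sketch (an addition for illustration, not part of the original gist):
-- if the results of the query above are materialized into a table, for example a hypothetical
-- `my_project.my_dataset.job_cost_estimates` with the same output columns, the totals per
-- pricing model can be compared side by side:
SELECT
  ROUND(SUM(on_demand_cost), 2) AS total_on_demand_cost,
  ROUND(SUM(standard_edition_cost), 2) AS total_standard_edition_cost,
  ROUND(SUM(enterprise_edition_cost), 2) AS total_enterprise_edition_cost,
  ROUND(SUM(enterprise_plus_edition_cost), 2) AS total_enterprise_plus_edition_cost
FROM
  `my_project.my_dataset.job_cost_estimates`;
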
@nonmanager

Thank you for this, Sayle! Do you know if there is a way to change the autoscaler behavior to make it allocate slots more conservatively?

@sayle-doit
Author

As far as I know, there are currently no methods available to us as users to adjust the autoscaler's behavior. I highly suspect this is coming in the future (maybe as soon as Next in 2 weeks?), but I have not been given any information on it; it's just my conjecture that they will release some "knobs" for end-users to adjust how the autoscaler works. Many customers I have worked with have experienced this same behavior and given feedback directly to the teams at Google, which makes me believe they are listening and adjusting appropriately.

@LorenzoIacolare

A proposal to better estimate slot consumption for jobs that fall within the same 60-second window (it assumes constant slot usage after the first query, so those jobs skip the scale-up/scale-down padding):

WITH jobs_duration AS
(
  SELECT
    job_id,
    query,
    
    start_time,
    end_time,
    
    TIMESTAMP_DIFF(end_time, start_time, SECOND) AS job_duration_s,
    -- If another job ended within the previous 60 seconds, assume the slots are still warm and
    -- skip the padding; otherwise add the 60 seconds scale-down time + 10 seconds scale-up time
    -- to the duration for the autoscaled duration.
    CASE
      WHEN FIRST_VALUE(end_time) OVER (
        ORDER BY UNIX_MICROS(end_time)
        RANGE BETWEEN 60000000 PRECEDING AND 1 PRECEDING
      ) IS NOT NULL THEN TIMESTAMP_DIFF(end_time, start_time, SECOND) + 0.01
      ELSE TIMESTAMP_DIFF(end_time, start_time, SECOND) + 60 + 10
    END AS autoscaled_duration_s,


    -- Turn slot milliseconds into slot seconds
    total_slot_ms/ 1000 AS total_slot_s
  FROM
    `region-us`.INFORMATION_SCHEMA.JOBS
  WHERE
    creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY) AND CURRENT_TIMESTAMP()
    --AND job_id NOT LIKE 'script_%'
    AND total_slot_ms > 0
    AND total_bytes_billed > 0
    AND state = "DONE"
    

),
jobs_slot_usage AS
(
  SELECT
    jd.job_id,
    jd.query,
    job_duration_s,
    total_bytes_processed / POW(1024, 4) AS total_tib_processed,

    CAST(FLOOR((CEIL(total_slot_s / autoscaled_duration_s) + 99) / 100) * 100 AS INT64) AS autoscaled_slot_usage,
    autoscaled_duration_s,

    -- Slot usage (rounded up to the next 100 slots) * duration in seconds / (60 seconds * 60 seconds) = slot-hours used by this job
    CAST(FLOOR((CEIL(total_slot_s / autoscaled_duration_s) + 99) / 100) * 100 AS INT64)
      * (autoscaled_duration_s/(60*60))
      AS slot_hour,
  FROM
    jobs_duration AS jd JOIN `region-us`.INFORMATION_SCHEMA.JOBS AS jobs
      ON jd.job_id = jobs.job_id
)

SELECT
  job_id,
  query,
  job_duration_s,
  total_tib_processed,

  
  ROUND(total_tib_processed * 6.25, 2) AS on_demand_cost,

  autoscaled_duration_s,
  slot_hour,

  ROUND(slot_hour * 0.04, 2) AS standard_edition_cost,
  ROUND(slot_hour * 0.06, 2) AS enterprise_edition_cost,
  ROUND(slot_hour * 0.1, 2) AS enterprise_plus_edition_cost,

  ROUND(slot_hour * 0.048, 2) AS enterprise_edition_1yr_commit_cost,
  ROUND(slot_hour * 0.036, 2) AS enterprise_edition_3yr_commit_cost,
  
  ROUND(slot_hour * 0.08, 2) AS enterprise_plus_edition_1yr_commit_cost,
  ROUND(slot_hour * 0.06, 2) AS enterprise_plus_edition_3yr_commit_cost
FROM
  jobs_slot_usage
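
For illustration (hypothetical numbers, mirroring the worked example in the gist header): a 30-second job using 4,000 slot-seconds that starts while slots are still warm gets autoscaled_duration_s ≈ 30.01 s, an average of 4,000 / 30.01 ≈ 134 slots rounded up to 200, and 200 * 30.01 / 3,600 ≈ 1.67 slot-hours, versus the 2.78 slot-hours estimated with the full cold-start padding.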
