Skip to content

Instantly share code, notes, and snippets.

@jonatas
Created May 20, 2024 14:34
Show Gist options
  • Save jonatas/6ba56014185855b7f2efae74d6250016 to your computer and use it in GitHub Desktop.
Save jonatas/6ba56014185855b7f2efae74d6250016 to your computer and use it in GitHub Desktop.
-- Setup temperatures tracking system
DROP VIEW if exists avg_by_hour CASCADE;
DROP TABLE if exists temperatures CASCADE;
CREATE TABLE temperatures
( time TIMESTAMP NOT NULL,
location varchar,
value decimal);
SELECT create_hypertable('temperatures',
by_range('time', INTERVAL '1 month'));
INSERT INTO temperatures ( location, time, value)
VALUES
( 'kitchen', '2000-01-01 12:20:00', 22.2),
( 'kitchen', '2000-01-01 12:32:00', 22.8),
( 'bedroom', '2000-01-01 12:22:00', 22.8),
( 'bedroom', '2000-01-01 14:33:00', 24.2);
SELECT time_bucket('1h', time) as time,
location,
avg(value) as value
FROM temperatures
GROUP BY 1,2;
CREATE MATERIALIZED VIEW avg_temperature_by_hour
WITH (timescaledb.continuous) AS
SELECT time_bucket('1h', time) as time,
location,
avg(value) as value
FROM temperatures
GROUP BY 1,2
WITH DATA;
# Search time_bucket calls
Fast.shortcut :time_bucket, '^^(func_call (funcname (string (sval "time_bucket")'
if ENV['PG_URI']
require 'bundler/inline'
gemfile do
source 'https://rubygems.org'
gem 'timescaledb', path: '../../code/timescale/timescale-gem'
gem 'pry'
end
require 'fast/cli'
# Timescaledb.establish_connection(ENV['PG_URI'])
ActiveRecord::Base.establish_connection ENV['PG_URI']
def hypertables
@hypertables ||=
Timescaledb.hypertables.map(&:hypertable_name)
end
def call_time_bucket(node)
node.first('(func_call (funcname (string (sval "time_bucket")')
end
def from_hypertable(node)
if (relname = node.capture('(relname $_)')[0])
hypertables.include?(relname)
end
end
Fast.shortcut :check_query do
pattern = <<~FAST
(select_stmt
(target_list #call_time_bucket)
(from_clause #from_hypertable)
FAST
@materialized = {}
@query = {}
search_all pattern, ARGV.last, parallel: false, on_result: ->(file, results) do
results.each do |node|
if node.parent.nil?
@query[node.capture('(relname $_)')[0]] = node
else
root = node.ancestors.last
case root.type
when :create_table_as_stmt
view_name = <<~FAST
(create_table_as_stmt
(query ... )
(into
(rel
(relname $_)
(inh true)
(relpersistence "p"))
(options
(def_elem
(defnamespace "timescaledb")
(defname "continuous")
FAST
if (name=root.capture(view_name)[0])
@materialized[name] = node
end
end
end
end
@query.each do |table_name, query_table|
@materialized.each do |view_name, query_materialized|
if query_table == query_materialized
puts "The table #{table_name} is also tracked as a continuous aggregate of #{view_name}", ""
report(query_table.source)
puts "\nQuery the data from the materialized view to get pre-computed results", ""
new_source = "SELECT * FROM #{view_name}"
report(new_source.gsub(/\s+/, " "))
end
end
end
end
end
end

Grepping SQL Code like a boss

SELECT
  'Lambda Days' as conference_name,
  2024 as edition,
  'Grepping SQL code [LIKE A BOSS]' as title,
  'Jônatas Davi Paganini' as author,
  'jonatasdp' as x_nickname, -- or gmail
  'jonatas' as github_nickname,
  '[email protected]' as company_mail

Introduction

I travel from Brazil to Poland to show you how to grep SQL code like a boss.

More than grep, my itent is inspire you to build and hack your own tools.

2 decades in the terminal

Yes, this presentation was made in the terminal. IO is simple and easy

Learn simple tools with single proposal

Minimalist examples from unix:

  • cat
  • head
  • wc
  • sort
  • uniq
  • grep
  • sed
  • echo

Count the topics of talk in the terminal

grep '^# ' talk-sql.md | wc -l # => 60

Code examples:

grep '```ruby' talk-sql.md | wc -l # => 29
grep '```sql' talk-sql.md | wc -l # => 5
grep '```bash' talk-sql.md | wc -l # => 9

TODO: kill PR review checklist

I onboarded 300+ backend developers in 5 years. One of the most boring things is the PR review checklist.

  • Check the code style
  • Check the usage of the framework components
  • Check the broken links in the documentation
  • Check the performance
  • Check the security
  • Check the architecture
  • Check the design
  • Check the business logic
  • Check the requirements

Automate the education

  • Don't add more bureaucracy to PR review
  • Build linters and tools to help the developers
  • Educate on demand instead of expecting perfection

Build smart stuff with simple tools

  • REPL: Play with code in a proper runtime environment
  • I love Ruby
  • I was several years onboarding people on code, showing frameworks, examples and hunting for best practices, legacy code, architecture patterns, etc.

I'm a Rubyist

I already did a similar talk on Ruby code. Did the "Grepping Ruby code like a boss" talk in Ruby Kaigi 2020

I'm becoming a Postgrist

  • 50% community manager
  • 50% developer advocate

My daily work involves a lot of SQL and amost nothing on programming languages itself. I also maintain the timescaledb ruby gem: github.com/jonatas/timescaledb

Krakow conferences 🫶

  • 2019: krk.rb - The Ruby Conference of Krakow
    • How to build a cop in RuboCop
    • Introduction to RuboCop Node Pattern
  • 2023: Lambda Days
    • Functional Pipelines in PostgreSQL
    • Mentoship panel
  • 2024: Lambda Days
    • Grepping SQL code like a boss
    • Mentoship panel

Adventures with fast

The Value of Trying

Experimenting with new tools and techniques can lead to unexpected discoveries and insights.

I built a tool to grep Ruby & SQL code like a boss.

  • My journey with "compilers", ASTs, and code analysis
  • My journey with SQL
  • Advanced search and refactoring

The Power of Regex

Regular expressions are a powerful tool for searching and manipulating text.

The Limitations of Regex

Regex can be difficult to use for complex searches and refactoring tasks.

  • Include/exclude context
  • Nested contexts and scenarios
  • Capture groups

When Regex is not enough

Build your own Regular Expression Engine for the AST

The AST Advantage

Abstract Syntax Trees provide a more powerful and flexible way to search and manipulate code.

Fast.ast("1").type # => :int
Fast.ast("1").children # => [1]

The AST in Ruby

puts Fast.ast("1")

Output:

(int 1)

AST of summing two numbers in Ruby

puts Fast.ast("1 + 2")

Outputs the String Expression aka sexp in Ruby:

(send
  (int 1) :+
  (int 2))

The AST unify code

Ternary operator in Ruby

a > 1 ? 2 : 3

Is the same as

if a > 1
  2
else
  3
end

AST unifies syntax

Ruby allows the keyword unless for example

puts Fast.ast("a unless b")

Outputs simplified to if clause:

(if
  (send nil :b)
  nil
  (send nil :a))

Ternary operator

puts Fast.ast("a > 1 ? 1 : 0")

Outputs:

(if
  (send
    (send nil :a) :>
    (int 1))
  (int 1)
  (int 0))

In SQL is the same

Fast.parse_sql("table customers") ==
Fast.parse_sql("SELECT * FROM customers") # => true

Say welcome to AST in SQL

puts Fast.parse_sql("table customers")

Outputs:

(select-stmt
  (target-list
    (res-target
      (val
        (column-ref
          (fields)))))
  (from-clause
    (range-var
      (relname "customers")
      (inh true)
      (relpersistence "p"))))

Getting the simplest AST

Fast.parse_sql("select 1")

Outputs:

(select-stmt
  (target-list
    (res-target
      (val
        (a-const
          (ival
            (ival 1)))))))

The 'fast' Tool

'fast' is a tool that leverages ASTs to provide advanced search and refactoring capabilities.

Fast.ast("1 + 2").search("int")
# => [s(:int, 1), s(:int, 2)]

The 'fast' Philosophy

'fast' is designed to be MY pocket knife. home-made, full of bugs and sharp edges, but powerful, flexible, and easy to use.

Like lisp, but for the AST

Fast.ast("1 + 2.0").search("int")        # => [s(:int, 1)]
Fast.ast("1 + 2.0").search("(float _)")  # => [s(:float, 2.0)]

Similar to the sexp output, but for search :)

Combine expressions with {}

Fast.ast("1 + 2.0").search("{int float}")  # => [s(:int, 1), s(:float, 2.0)]

Combine expressions

Example verify integer or float and positive values

Fast.ast("1 + 2.0").search("({int float} .positive?)")
# => [s(:int, 1), s(:float, 2.0)]

Fast - Like regex but for the AST

  • 'exp'
  • '(type *children)'
  • '_' for anything not nil
  • '{ this or that }'
  • '[ this and that ]'
  • '_' and '...' for something or a node with children _ '$' for captures _ '!' to negate

Try fast .finders

Started with Ruby

Ruby is a great language for building tools and experimenting with new ideas.

Fast.ast("a = 1 + 2")
# => s(:lvasgn, :a, s(:send, s(:int, 1), :+, s(:int, 2)))

SQL Support in 'fast'

'fast' provides advanced SQL support for searching and refactoring SQL code.

Fast.parse_sql('select 1')
s(:select_stmt,
  s(:target_list,
    s(:res_target,
      s(:val,
        s(:a_const,
          s(:ival,
            s(:ival, 1)))))))

fast .sql_parser

  • pg_query (PostgreSQL C bindings) do the main work
# lib/fast/sql.rb:110
    def parse(statement, buffer_name: "(sql)")
      return [] if statement.nil?
      source_buffer = SQL::SourceBuffer.new(buffer_name, source: statement)
      tree = PgQuery.parse(statement).tree
      first, *, last = source_buffer.tokens
      stmts = tree.stmts.map do |stmt|
        from = stmt.stmt_location
        to = stmt.stmt_len.zero? ? last.end : from + stmt.stmt_len
        expression = Parser::Source::Range.new(source_buffer, from, to)
        source_map = Parser::Source::Map.new(expression)
        sql_tree_to_ast(clean_structure(stmt.stmt.to_h), source_buffer: source_buffer, source_map: source_map)
      end.flatten
      stmts.one? ? stmts.first : stmts
    end

Fastfile

'Fastfile' can help to organize the dictionary of patterns in shortcuts.

Fast.shortcut :sql_parser, "(def parse)", "lib/fast/sql.rb"

Refactoring operations

Fast
  .parse_sql("SELECT * FROM customers")
  .replace("relname", "other_table")
# => "SELECT * FROM other_table"

Format SQL

Fast.shortcut :format_sql do
  require 'fast/sql'
  file = ARGV.last
  method = File.exist?(file) ? :parse_sql_file : :parse_sql
  ast = Fast.public_send(method, file)
  ast = ast.first if ast.is_a? Array
  eligible_kw = [:RESERVED_KEYWORD]
  eligible_tokens = [:BY]

  output = Fast::SQL.replace('_', ast) do |root|
    sb = root.loc.expression.source_buffer
    sb.tokens.each do |token|
      if eligible_kw.include?(token.keyword_kind) || eligible_tokens.include?(token.token)
        range = Parser::Source::Range.new(sb, token.start, token.end)
        replace(range, range.source.upcase)
      end
    end
  end
  require 'fast/cli'
  puts Fast.highlight(output, sql: true)
end

Anonymize SQL

Fast.shortcut :anonymize_sql do
  require 'fast/sql'
  file = ARGV.last
  method = File.exist?(file) ? :parse_sql_file : :parse_sql
  ast = Fast.public_send(method, file)
  memo = {}

  relnames = search("(relname $_)", ast).grep(String).uniq
  pattern = "{relname (sval {#{relnames.map(&:inspect).join(' ')}})}"
  puts "searching with #{pattern}"

  content = Fast::SQL.replace(pattern, ast) do |node|
    new_name = memo[node.source.tr(%|"'|, '')] ||= "x#{memo.size}"
    new_name = "'#{new_name}'" if node.type == :sval
    replace(node.loc.expression, new_name)
  end
  puts Fast.highlight(content, sql: true)
end

My latest experiments with SQL AST

CREATE TABLE temperatures
( time TIMESTAMP NOT NULL,
  location varchar,
  value decimal);

-- automatic partitioning by month
SELECT create_hypertable('temperatures',
  by_range('time', INTERVAL '1 month'));

Inserting data

INSERT INTO temperatures ( location, time, value)
VALUES
( 'kitchen',   '2000-01-01 12:20:00', 22.2),
( 'kitchen',   '2000-01-01 12:32:00', 22.8),
( 'bedroom',   '2000-01-01 12:22:00', 22.8),
( 'bedroom',   '2000-01-01 14:33:00', 24.2);

Query data

SELECT time_bucket('1h', time) as time,
  location,
  avg(value) as value
  FROM temperatures
  GROUP BY 1,2;

Creating a materialized view

CREATE MATERIALIZED VIEW  avg_temperature_by_hour
WITH (timescaledb.continuous) AS
SELECT time_bucket('1h', time) as time,
  location,
  avg(value) as value
  FROM temperatures
GROUP BY 1,2
WITH DATA;

My experiment - small linter

Hey buddy, I see you're querying from the hypertable but you also have the materialized view. Let's use the materialized view instead.

Learning the AST patterns

puts Fast.parse_sql("SELECT time_bucket('1h', now())")

Outputs:

(select-stmt
  (target-list
    (res-target
      (val
        (func-call
          (funcname
            (string
              (sval "time_bucket")))
          (args
            (a-const
              (sval
                (sval "1h")))
            (func-call
              (funcname
                (string
                  (sval "now")))
              (funcformat :COERCE_EXPLICIT_CALL)))
          (funcformat :COERCE_EXPLICIT_CALL))))))

Detecting select with time_bucket from hypertable

Fast.shortcut :check_query do 
  pattern = <<~FAST
  (select_stmt
    (target_list #call_time_bucket)
    (from_clause #from_hypertable)
  FAST
  search_all pattern, # ...
end

The call_time_bucket pattern

def call_time_bucket(node)
  node.first('(func_call (funcname (string (sval "time_bucket")')
end

The from_hypertable pattern:

def from_hypertable(node)
  if (relname = node.capture('(relname $_)')[0])
    hypertables.include?(relname)
  end
end
def hypertables
  @hypertables ||=
    Timescaledb.hypertables.map(&:hypertable_name)
end

Track @query and @materialized

    # Previous pattern
    search_all pattern, ARGV.last, parallel: false, on_result: ->(file, results) do
      puts "#{file}: #{results.size}"
      results.each do |node|
        report node
        if node.parent.nil?
          @query[node.capture('(relname $_)')[0]] = node
        else
          root = node.ancestors.last
          # ... next slide

Map @materialized with the view name

case root.type
when :create_table_as_stmt
  view_name = <<~FAST
    (create_table_as_stmt
      (query ... )
      (into
        (rel
          (relname $_)
          (inh true)
          (relpersistence "p"))
        (options
          (def_elem
            (defnamespace "timescaledb")
            (defname "continuous")
FAST

The cool part of AST

@query["temperatures"] == @materialized["avg_temperature_by_hour"] # => true

query_table == query_materialized

# ... previous context
      @query.each do |table_name, query_table|
        @materialized.each do |view_name, query_materialized|
          if query_table == query_materialized
            puts "The table #{table_name} is also tracked as a continuous aggregate of #{view_name}", ""
            report(query_table.source)
            puts "Query the data from the materialized view to get pre-computed results", ""
            report("SELECT * FROM #{view_name}")
          end
        end

Demo

 fast .check_query demo.sql
The table temperatures is also tracked as a continuous aggregate of avg_temperature_by_hour



SELECT time_bucket('1h', time) as time,
  location,
  avg(value) as value
  FROM temperatures
  GROUP BY 1,2

Query the data from the materialized view to get pre-computed results

SELECT * FROM avg_temperature_by_hour

Primitive tools 🫶

  • Playing with toys from compiler level is fun.
  • Primitive tools can be powerful.

The future

Help the query planner in the ORM level. Integrate with the timescaledb gem to replace queries before the query planner.

Sources

Special thanks

  • Timescale team for the support
  • Lambda Days team for the opportunity
  • Ruby community for the inspiration

Thank you!

@jonatas
Copy link
Author

jonatas commented May 20, 2024

Use md-show to present it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment