SELECT
'Lambda Days' as conference_name,
2024 as edition,
'Grepping SQL code [LIKE A BOSS]' as title,
'Jônatas Davi Paganini' as author,
'jonatasdp' as x_nickname, -- or gmail
'jonatas' as github_nickname,
'[email protected]' as company_mail
I travel from Brazil to Poland to show you how to grep SQL code like a boss.
More than grep, my itent is inspire you to build and hack your own tools.
Yes, this presentation was made in the terminal. IO is simple and easy
Minimalist examples from unix:
- cat
- head
- wc
- sort
- uniq
- grep
- sed
- echo
grep '^# ' talk-sql.md | wc -l # => 60
Code examples:
grep '```ruby' talk-sql.md | wc -l # => 29
grep '```sql' talk-sql.md | wc -l # => 5
grep '```bash' talk-sql.md | wc -l # => 9
I onboarded 300+ backend developers in 5 years. One of the most boring things is the PR review checklist.
- Check the code style
- Check the usage of the framework components
- Check the broken links in the documentation
- Check the performance
- Check the security
- Check the architecture
- Check the design
- Check the business logic
- Check the requirements
- Don't add more bureaucracy to PR review
- Build linters and tools to help the developers
- Educate on demand instead of expecting perfection
- REPL: Play with code in a proper runtime environment
- I love Ruby
- I was several years onboarding people on code, showing frameworks, examples and hunting for best practices, legacy code, architecture patterns, etc.
I already did a similar talk on Ruby code. Did the "Grepping Ruby code like a boss" talk in Ruby Kaigi 2020
- 50% community manager
- 50% developer advocate
My daily work involves a lot of SQL and amost nothing on programming languages itself. I also maintain the timescaledb ruby gem: github.com/jonatas/timescaledb
- 2019: krk.rb - The Ruby Conference of Krakow
- How to build a cop in RuboCop
- Introduction to RuboCop Node Pattern
- 2023: Lambda Days
- Functional Pipelines in PostgreSQL
- Mentoship panel
- 2024: Lambda Days
- Grepping SQL code like a boss
- Mentoship panel
Experimenting with new tools and techniques can lead to unexpected discoveries and insights.
I built a tool to grep Ruby & SQL code like a boss.
- My journey with "compilers", ASTs, and code analysis
- My journey with SQL
- Advanced search and refactoring
Regular expressions are a powerful tool for searching and manipulating text.
Regex can be difficult to use for complex searches and refactoring tasks.
- Include/exclude context
- Nested contexts and scenarios
- Capture groups
Build your own Regular Expression Engine for the AST
Abstract Syntax Trees provide a more powerful and flexible way to search and manipulate code.
Fast.ast("1").type # => :int
Fast.ast("1").children # => [1]
puts Fast.ast("1")
Output:
(int 1)
puts Fast.ast("1 + 2")
Outputs the String Expression aka sexp in Ruby:
(send
(int 1) :+
(int 2))
Ternary operator in Ruby
a > 1 ? 2 : 3
Is the same as
if a > 1
2
else
3
end
Ruby allows the keyword
unless
for example
puts Fast.ast("a unless b")
Outputs simplified to if clause:
(if
(send nil :b)
nil
(send nil :a))
puts Fast.ast("a > 1 ? 1 : 0")
Outputs:
(if
(send
(send nil :a) :>
(int 1))
(int 1)
(int 0))
Fast.parse_sql("table customers") ==
Fast.parse_sql("SELECT * FROM customers") # => true
puts Fast.parse_sql("table customers")
Outputs:
(select-stmt
(target-list
(res-target
(val
(column-ref
(fields)))))
(from-clause
(range-var
(relname "customers")
(inh true)
(relpersistence "p"))))
Fast.parse_sql("select 1")
Outputs:
(select-stmt
(target-list
(res-target
(val
(a-const
(ival
(ival 1)))))))
'fast' is a tool that leverages ASTs to provide advanced search and refactoring capabilities.
Fast.ast("1 + 2").search("int")
# => [s(:int, 1), s(:int, 2)]
'fast' is designed to be MY pocket knife. home-made, full of bugs and sharp edges, but powerful, flexible, and easy to use.
Fast.ast("1 + 2.0").search("int") # => [s(:int, 1)]
Fast.ast("1 + 2.0").search("(float _)") # => [s(:float, 2.0)]
Similar to the sexp output, but for search :)
Fast.ast("1 + 2.0").search("{int float}") # => [s(:int, 1), s(:float, 2.0)]
Example verify integer or float and positive values
Fast.ast("1 + 2.0").search("({int float} .positive?)")
# => [s(:int, 1), s(:float, 2.0)]
- 'exp'
- '(type *children)'
- '_' for anything not nil
- '{ this or that }'
- '[ this and that ]'
- '_' and '...' for something or a node with children _ '$' for captures _ '!' to negate
Try fast .finders
Ruby is a great language for building tools and experimenting with new ideas.
Fast.ast("a = 1 + 2")
# => s(:lvasgn, :a, s(:send, s(:int, 1), :+, s(:int, 2)))
'fast' provides advanced SQL support for searching and refactoring SQL code.
Fast.parse_sql('select 1')
s(:select_stmt,
s(:target_list,
s(:res_target,
s(:val,
s(:a_const,
s(:ival,
s(:ival, 1)))))))
- pg_query (PostgreSQL C bindings) do the main work
# lib/fast/sql.rb:110
def parse(statement, buffer_name: "(sql)")
return [] if statement.nil?
source_buffer = SQL::SourceBuffer.new(buffer_name, source: statement)
tree = PgQuery.parse(statement).tree
first, *, last = source_buffer.tokens
stmts = tree.stmts.map do |stmt|
from = stmt.stmt_location
to = stmt.stmt_len.zero? ? last.end : from + stmt.stmt_len
expression = Parser::Source::Range.new(source_buffer, from, to)
source_map = Parser::Source::Map.new(expression)
sql_tree_to_ast(clean_structure(stmt.stmt.to_h), source_buffer: source_buffer, source_map: source_map)
end.flatten
stmts.one? ? stmts.first : stmts
end
'Fastfile' can help to organize the dictionary of patterns in shortcuts.
Fast.shortcut :sql_parser, "(def parse)", "lib/fast/sql.rb"
Fast
.parse_sql("SELECT * FROM customers")
.replace("relname", "other_table")
# => "SELECT * FROM other_table"
Fast.shortcut :format_sql do
require 'fast/sql'
file = ARGV.last
method = File.exist?(file) ? :parse_sql_file : :parse_sql
ast = Fast.public_send(method, file)
ast = ast.first if ast.is_a? Array
eligible_kw = [:RESERVED_KEYWORD]
eligible_tokens = [:BY]
output = Fast::SQL.replace('_', ast) do |root|
sb = root.loc.expression.source_buffer
sb.tokens.each do |token|
if eligible_kw.include?(token.keyword_kind) || eligible_tokens.include?(token.token)
range = Parser::Source::Range.new(sb, token.start, token.end)
replace(range, range.source.upcase)
end
end
end
require 'fast/cli'
puts Fast.highlight(output, sql: true)
end
Fast.shortcut :anonymize_sql do
require 'fast/sql'
file = ARGV.last
method = File.exist?(file) ? :parse_sql_file : :parse_sql
ast = Fast.public_send(method, file)
memo = {}
relnames = search("(relname $_)", ast).grep(String).uniq
pattern = "{relname (sval {#{relnames.map(&:inspect).join(' ')}})}"
puts "searching with #{pattern}"
content = Fast::SQL.replace(pattern, ast) do |node|
new_name = memo[node.source.tr(%|"'|, '')] ||= "x#{memo.size}"
new_name = "'#{new_name}'" if node.type == :sval
replace(node.loc.expression, new_name)
end
puts Fast.highlight(content, sql: true)
end
CREATE TABLE temperatures
( time TIMESTAMP NOT NULL,
location varchar,
value decimal);
-- automatic partitioning by month
SELECT create_hypertable('temperatures',
by_range('time', INTERVAL '1 month'));
INSERT INTO temperatures ( location, time, value)
VALUES
( 'kitchen', '2000-01-01 12:20:00', 22.2),
( 'kitchen', '2000-01-01 12:32:00', 22.8),
( 'bedroom', '2000-01-01 12:22:00', 22.8),
( 'bedroom', '2000-01-01 14:33:00', 24.2);
SELECT time_bucket('1h', time) as time,
location,
avg(value) as value
FROM temperatures
GROUP BY 1,2;
CREATE MATERIALIZED VIEW avg_temperature_by_hour
WITH (timescaledb.continuous) AS
SELECT time_bucket('1h', time) as time,
location,
avg(value) as value
FROM temperatures
GROUP BY 1,2
WITH DATA;
Hey buddy, I see you're querying from the hypertable but you also have the materialized view. Let's use the materialized view instead.
puts Fast.parse_sql("SELECT time_bucket('1h', now())")
Outputs:
(select-stmt
(target-list
(res-target
(val
(func-call
(funcname
(string
(sval "time_bucket")))
(args
(a-const
(sval
(sval "1h")))
(func-call
(funcname
(string
(sval "now")))
(funcformat :COERCE_EXPLICIT_CALL)))
(funcformat :COERCE_EXPLICIT_CALL))))))
Fast.shortcut :check_query do
pattern = <<~FAST
(select_stmt
(target_list #call_time_bucket)
(from_clause #from_hypertable)
FAST
search_all pattern, # ...
end
def call_time_bucket(node)
node.first('(func_call (funcname (string (sval "time_bucket")')
end
The from_hypertable
pattern:
def from_hypertable(node)
if (relname = node.capture('(relname $_)')[0])
hypertables.include?(relname)
end
end
def hypertables
@hypertables ||=
Timescaledb.hypertables.map(&:hypertable_name)
end
# Previous pattern
search_all pattern, ARGV.last, parallel: false, on_result: ->(file, results) do
puts "#{file}: #{results.size}"
results.each do |node|
report node
if node.parent.nil?
@query[node.capture('(relname $_)')[0]] = node
else
root = node.ancestors.last
# ... next slide
case root.type
when :create_table_as_stmt
view_name = <<~FAST
(create_table_as_stmt
(query ... )
(into
(rel
(relname $_)
(inh true)
(relpersistence "p"))
(options
(def_elem
(defnamespace "timescaledb")
(defname "continuous")
FAST
@query["temperatures"] == @materialized["avg_temperature_by_hour"] # => true
# ... previous context
@query.each do |table_name, query_table|
@materialized.each do |view_name, query_materialized|
if query_table == query_materialized
puts "The table #{table_name} is also tracked as a continuous aggregate of #{view_name}", ""
report(query_table.source)
puts "Query the data from the materialized view to get pre-computed results", ""
report("SELECT * FROM #{view_name}")
end
end
fast .check_query demo.sql
The table temperatures is also tracked as a continuous aggregate of avg_temperature_by_hour
SELECT time_bucket('1h', time) as time,
location,
avg(value) as value
FROM temperatures
GROUP BY 1,2
Query the data from the materialized view to get pre-computed results
SELECT * FROM avg_temperature_by_hour
- Playing with toys from compiler level is fun.
- Primitive tools can be powerful.
Help the query planner in the ORM level. Integrate with the timescaledb gem to replace queries before the query planner.
- Timescale team for the support
- Lambda Days team for the opportunity
- Ruby community for the inspiration
-
Jônatas Davi Paganini 🇧🇷
Use md-show to present it.