Skip to content

Instantly share code, notes, and snippets.

@theY4Kman
Last active August 29, 2015 14:09
Show Gist options
  • Select an option

  • Save theY4Kman/4aff4fee10a3804fd57c to your computer and use it in GitHub Desktop.

Select an option

Save theY4Kman/4aff4fee10a3804fd57c to your computer and use it in GitHub Desktop.
Building a simple SQLAlchemy-like ORM

The Base

At the core of things, you're taking an abstract syntax tree (AST) and compiling it into a string (the query which gets sent to the database). The point of an AST is to break a language down into its core components, where each component contains other componennts until you reach a concrete level where things can go no further (such as a literal number 1 or a string "test").

To build a good extensible ORM, the you'll first need AST components. Let's build one of those, and a compiler to turn the AST into a query string.

class Element(object):
    __visit_name__ = 'element'
    
    def _compile(self, compiler):
        method = getattr(compiler, 'visit_' + self.__visit_name__)
        return method(self)


class Value(Element):
    __visit_name__ = 'value'
    
    def __init__(self, value):
        self.value = value


class Compiler(object):
    def visit_value(self, value_element):
        return str(value_element.value)


# How to use it
print Value(14)._compile(Compiler())
# 14

We create an Element base class first. This class defines one method which finds a visit_XXXXX method on a Conpiler object and calls it with itself. That method's task is to take the Element and turn it into a its string representation in the query.

Then we make our first subclass of Element: Value. Its purpose is simply to record the structure of a literal value in a query. By itself, it's really not useful, because the compiler is the one to make meaning of it. The Compiler defines the visit_value method which will convert the Value into its string representation (which it literally does with str).

The last line creates a Value of 14 and compiles it using a new instance of Compiler. Boom! It outputs 14.

The beauty of this is a separation of concerns. Value simply records its inherent value, and Compiler turns that value into what the query needs.

Utilizing the Tree

As far as the last example goes, you could have just as well done str(14). One of the benefits of having an abstract syntax tree is higher levels not caring what the lower levels are (as long as they quack like a duck). Let's modify our example to produce a list of values:

class Element(object):
    __visit_name__ = 'element'
    
    def _compile(self, compiler):
        method = getattr(compiler, 'visit_' + self.__visit_name__)
        return method(self)


class Value(Element):
    __visit_name__ = 'value'
    
    def __init__(self, value):
        self.value = value


class ValueList(Element):
    __visit_name__ = 'value_list'

    def __init__(self, *values):
        self.values = values


class Compiler(object):
    def visit_value(self, value_element):
        return str(value_element.value)

    def visit_value_list(self, value_list):
        return ', '.join(v._compile(self) for v in value_list.values)


# How to use it
print ValueList(Value(14), Value(2.0))._compile(Compiler())
# 14, 2.0

The new ValueList class takes a list of Value objects. When the compiler sees a ValueList object (with the visit_value_list method), it calls the compiler on each of the Value objects in the ValueList and joins them with a comma.

Database Parameters

Every good database has support for parameters, which are essentially placeholders for actual values. To execute a query with paramaters, you pass the query and the values for each parameter. The benefits of this are 1) the database can escape values, so no injections are possible through the values, and 2) the database can reuse the query, even if you change the values.

With regards to reusing a query, it's important to know that query languages are only an interface to a much more complicated piece of software. The database will parse your query (first into an AST) and determine how it should be run. It then caches the plan of action. So, if you change the query, it has to determine a new plan of action, but if you reuse the same query, it doesn't incur that cost again. This isn't a problem with a few queries, but with several thousand the time becomes noticeable.

Parameters mean extra performance. This is a Good Thing. So, let's introduce parameters to our example:

class Element(object):
    __visit_name__ = 'element'
    
    def _compile(self, compiler):
        method = getattr(compiler, 'visit_' + self.__visit_name__)
        return method(self)


class Value(Element):
    __visit_name__ = 'value'
    
    def __init__(self, value):
        self.value = value


class Parameter(Element):
    __visit_name__ = 'parameter'

    def __init__(self, name, value):
        self.name = name
        self.value = value


class ValueList(Element):
    __visit_name__ = 'value_list'

    def __init__(self, *values):
        self.values = values


class Compiler(object):
    def __init__(self):
        self.values = {}

    def visit_value(self, value_element):
        return str(value_element.value)

    def visit_value_list(self, value_list):
        return ', '.join(v._compile(self) for v in value_list.values)

    def visit_parameter(self, parameter):
        self.values[parameter.name] = parameter.value._compile(self)
        return ':' + parameter.name


# How to use it
compiler = Compiler()
print ValueList(Parameter('fourteen', Value(14)), Parameter('two_point_oh', Value(2.0)))._compile(compiler)
# fourteen, two_point_oh
print compiler.values
# {'fourteen': '14', 'two_point_oh': '2.0'}

We changed Compiler a bit, so it keeps track of the parameters which have been compiled. Parameter takes a name and a value. When the Parameter gets compiled, it records its value against its name with the Compiler. Then, when you execute the query against the database, you pass the query string and the values, so the database will replace the placeholders with your passed values. (The colon before the parameter name is a common way for a DB to recognize parameters.)

Making Real Queries

None of this shit is useful if it can't be used with a real database. So, let's create a real databases. Python ships with sqLite support, and even lets you create an in-memory database (using the special name ":memory:"):

compiler = Compiler()
query = ValueList(Parameter('fourteen', Value(14)), Parameter('two_point_oh', Value(2.0)))._compile(compiler)
params = compiler.values

import sqlite3
db = sqlite3.connect(':memory:')
db.execute()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment