Skip to content

Instantly share code, notes, and snippets.

@lionelbarrow
Last active August 12, 2024 01:16
Show Gist options
  • Save lionelbarrow/8177236 to your computer and use it in GitHub Desktop.
Save lionelbarrow/8177236 to your computer and use it in GitHub Desktop.
SQL primer

SQL Primer

While there are plenty of good tutorials for SQL on the Internet, I didn't find anything that set the up the historical context and "assumed understanding" of SQL very well, so I decided to write a short primer on the topic. Here's what we'll cover:

  • What is SQL?
  • The SQL programmng language
  • Databases and SQL
  • Schemas and queries

What is SQL?

At this point, SQL, which stands for Structured Query Language, is several different things, and at any moment someone saying "SQL" (which is variously pronounced "sequel" or "ess-que-ell") might mean any or several of them. I can think of at least three distinct things which people refer to as SQL in common usage:

  • The SQL standard, a massive document produced by the ANSI standards committee, which [can be yours](http://webstore.ansi.org/RecordDetail.aspx?sku=ANSI+INCITS+135-1992+(R1998) for the low, low price of only $60! Useless, especially since it hasn't really changed in 30 years.

  • The programming langauge described by the SQL standard, which people use to talk to databases.

  • The various databases (Oracle, Microsoft, MySQL, PostgresSQL, etc) that implement the SQL programming language.

Most of the time, we'll be interested in the second meaning of the word -- the SQL programming language. However, it's important to realize that this programming language is a sort of platonic ideal of what SQL is; few database programs are entirely compliant with the SQL standard. However, this almost always doesn't matter and most of the time SQL commands that work in one database will work in another.

The SQL programming language

SQL is a declarative programming language intended to express relational algebra queries in a syntax similar to natural language. I'll go over each of these bullet points below:

  • SQL is, unusually among programming languages, declarative[1]. When using SQL, you issue commands (declare things) and let the database itself figure out how to actually do what you said. You don't run around telling the computer how to find the data you want -- you just specify what you want and let the database bring it to you. Idiomatic SQL is almost always declarative; if you find yourself getting too bogged down in how to get your data, you're probably doing something wrong.

  • SQL is a natural-language papering over the mathematics of relational algebra. In typically helpful form, Wikipedia explains that relational algebra is "an offshoot of first-order logic and of algebra of sets concerned with operations over finitary relations, usually made more convenient to work with by identifying the components of a tuple by a name (called attribute) rather than by a numeric column index, which is called a relation in database terminology." Right. What this is saying is that the there is a well-understood mathematical model behind what SQL is doing. Relational algebra is really very easy to understand once you've done a little SQL. I recommend this tutorial on the topic, but only after you're familiar with SQL (particularly joins).

  • SQL is a natural language programming language. This means that the designers of the language tried really hard to make SQL queries look like English. Don't be fooled: SQL is not English. SQL is a full-on programming language that is parsed and interpreted by single-minded programs incapable of overlooking unclosed quotemarks or misplaced commas. I have found SQL consoles to be particularly bad at identifying the root cause of syntax errors. You just have to deal with it.

We'll go over the basics of how to actually do stuff in SQL momentarily. However, notice that in the above bullet points I repeatedly referenced the SQL 'consoles' and 'databases' which answer user queries. Before diving into the language itself, we should briefly talk about what these programs are.

Databases and SQL

SQL, the language, is implemented by various long-running programs we usually refer to as SQL databases. A SQL database is simply a program that accepts some variety of SQL commands, manipulates some state internal to the program, and spits output back to the user. That's it.

At a conceptual level, databases are pretty simple things. However, modern programming relies so heavily on databases that these programs have reached an extraordinary level of maturity. For example, modern SQL databases can ensure that data safety and consistency is maintained even in the event of sudden power failure, disk corruption[2], etc, even while serving hundreds of simultaneous commands.

SQL databases are the workhorses of most of the modern Internet. Many websites essentially perform what developers refer to as 'CRUD' operations: they Create, Read, Update or Delete various data according to business rules. SQL databases excel in this role, and for that reason they are among the most used pieces of technology on the Internet. However, SQL databases do have limits. Most, by design, attempt to protect the safety and consistency of data at all costs. For this reason, there are usually strong upper limits on the effective scale of a SQL database. Google search does not and cannot run on SQL; on the other hand, most companies do not operate at Google's scale.

There are two basic flavors of SQL databases: open-source and closed-source. Open source databases are free to use, and are typically developed by companies looking to sell support to users of the database. Popular open source databases include MySQL, PostgresSQL[3] and SQLLite. Closed source databases are not free to use and typically users of closed source databases have much closer relationships with the database development company than users of open source databases. Popular closed source databases are Microsoft SQL Server and Oracle.

There are a few cultural connotations around which database companies use. Smaller companies and startups, especially tech-savvy companies, will usually use an open source product because they are cheaper. Larger companies will often closed source databases because of the easier access to support and advice from the database vendor. However, these lines are somewhat blurry. Salesforce runs almost entirely on Oracle, but also maintains one of the largest Postgres instances in the world. Google's core data services are home-grown systems that do not implement SQL at all, but it runs a number of auxillary services on a home-grown implementation of MySQL. Facebook is also known to rely heavily on MySQL.

Schemas and queries

Now we'll briefly touch on what day-to-day work in SQL looks like. This is not intended to be a tutorial (since so many have been written already) as much as a preview.

SQL is oriented around the creation and querying of tables that store data in an organized way. A table is essentially a name, a list of columns (each of which also has a data type, such as "string" or "numeric"), and possibly some constraints on the table (such as "start_date must be before end_date").

Taken together, the tables in a database are called the schema. A database schema is typically designed up front and then slowly evolves over time; changing a schema once it has been created can be very tricky.

The individual entries in a table are typically called relations, rows, or entries. They are essentially "just data" -- they are the core elements that the rest of SQL is concerned with manipulating. Here are a couple SQL queries involving these elements; note that often SQL programers will capitalize the words that are part of the SQL syntax and leave lower-case the specifics of the query. However, this is optional; SQL is not case-sensitive.

CREATE TABLE users (
    id          INTEGER,
    name        TEXT,
    join_date   DATE
);

This would create a 'users' table, where each user can have an id, a name and a join date. If we wanted to add some data to the table we could do so:

INSERT INTO users (id, name, join_date) VALUES (1, 'Lionel', '2013-12-29');
INSERT INTO USERS (id, name, join_date) VALUES (2, 'Fred', '2013-12-30');
INSERT INTO USERS (id, name, join_date) VALUES (3, 'Jane', '2013-12-29'), (5, 'Dan', '2013-11-20');

(The last query would insert two users at once.)

We could then issue a query on the table:

SELECT id, name FROM users WHERE join_date = '2013-12-29';

This would return something like this:

id, name
1, 'Lionel'
3, 'Jane'

But it could also return this:

id, name
3, 'Jane'
1, 'Lionel'

That's because I haven't declared an order on the data. This is the first lesson in about SQL: it does what you ask for and nothing more. Because I didn't ask for an order, no order is guaranteed. I can fix this easily enough though:

SELECT id, name FROM users WHERE join_date = '2013-12-29' ORDER BY id;

This will always return Jane first, because she has the largest id in the set returned.

Conclusion

That's about it for this primer. SQL is a big and complicated topic, but hopefully now you have all the context you need to dive into the nuts and bolts of the queries and problems you need to solve. To actually learn how to do SQL, I recommend Postgres Exercises, an interactive tutorial. Be aware -- it ramps up fast and heavily relies on the user's ability to Google things for themselves when they get stuck, but that's basically a requirement of modern programming in any sense regardless. Good luck.

[1] Declarative programming languages are relatively rare. SQL is by far the most successful of the breed, in part because it is not intended as a general-purpose language.

[2] Up to a point. You can prove that if the disk is allowed behave corruptly all the time, the database can't do shit.

[3] Often simply called Postgres or psql, PostgresSQL is the backbone of Braintree's data services.

@christophemarois
Copy link

Thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment