Skip to content

Instantly share code, notes, and snippets.

@DanielEFrampton
Last active September 19, 2020 21:36
Show Gist options
  • Save DanielEFrampton/cf5be7daa000a36f52c3fa0e1f434e2c to your computer and use it in GitHub Desktop.
Save DanielEFrampton/cf5be7daa000a36f52c3fa0e1f434e2c to your computer and use it in GitHub Desktop.
An API Project from Start to Finish

An API Project from Start to Finish

I'm a back-end software engineer in training (and in fact!) at Turing School of Software & Design. In the third of four modules in our program we were assigned a project of creating an API using the Rails framework. For your benefit and amusement, I've documented the entire process and the resources that helped me along the way.

1. Create a New Rails App

Creating a new Rails project from scratch is something that happens so infrequently that it's perpetually been something I google. My mentor (through Turing's network of mentors) who is a Rails developer in the Denver area agreed, saying that he's never once created a new Rails app from scratch in his current position. That being said, I've been enjoying creating new greenfield (i.e., brand new) projects for hobby coding or for side projects, and wanted to use this opportunity to learn the syntax a bit better.

This rubyonrails.org guide got me started, and was mostly helpful for pointing out that you can view all the command-line options and flags with this command:

rails new -h

I jumped into my shell (ZSH, using iTerm 2, if you're curious) and tried this out, and here's the big infodump I received:

Usage:
  rails new APP_PATH [options]

Options:
      [--skip-namespace], [--no-skip-namespace]              # Skip namespace (affects only isolated applications)
  -r, [--ruby=PATH]                                          # Path to the Ruby binary of your choice
                                                             # Default: /Users/dframpton/.rbenv/versions/2.5.3/bin/ruby
  -m, [--template=TEMPLATE]                                  # Path to some application template (can be a filesystem path or URL)
  -d, [--database=DATABASE]                                  # Preconfigure for selected database (options: mysql/postgresql/sqlite3/oracle/frontbase/ibm_db/sqlserver/jdbcmysql/jdbcsqlite3/jdbcpostgresql/jdbc)
                                                             # Default: sqlite3
      [--skip-gemfile], [--no-skip-gemfile]                  # Don't create a Gemfile
  -G, [--skip-git], [--no-skip-git]                          # Skip .gitignore file
      [--skip-keeps], [--no-skip-keeps]                      # Skip source control .keep files
  -M, [--skip-action-mailer], [--no-skip-action-mailer]      # Skip Action Mailer files
      [--skip-action-mailbox], [--no-skip-action-mailbox]    # Skip Action Mailbox gem
      [--skip-action-text], [--no-skip-action-text]          # Skip Action Text gem
  -O, [--skip-active-record], [--no-skip-active-record]      # Skip Active Record files
      [--skip-active-storage], [--no-skip-active-storage]    # Skip Active Storage files
  -P, [--skip-puma], [--no-skip-puma]                        # Skip Puma related files
  -C, [--skip-action-cable], [--no-skip-action-cable]        # Skip Action Cable files
  -S, [--skip-sprockets], [--no-skip-sprockets]              # Skip Sprockets files
      [--skip-spring], [--no-skip-spring]                    # Don't install Spring application preloader
      [--skip-listen], [--no-skip-listen]                    # Don't generate configuration that depends on the listen gem
  -J, [--skip-javascript], [--no-skip-javascript]            # Skip JavaScript files
      [--skip-turbolinks], [--no-skip-turbolinks]            # Skip turbolinks gem
  -T, [--skip-test], [--no-skip-test]                        # Skip test files
      [--skip-system-test], [--no-skip-system-test]          # Skip system test files
      [--skip-bootsnap], [--no-skip-bootsnap]                # Skip bootsnap gem
      [--dev], [--no-dev]                                    # Setup the application with Gemfile pointing to your Rails checkout
      [--edge], [--no-edge]                                  # Setup the application with Gemfile pointing to Rails repository
      [--rc=RC]                                              # Path to file containing extra configuration options for rails command
      [--no-rc], [--no-no-rc]                                # Skip loading of extra configuration options from .railsrc file
      [--api], [--no-api]                                    # Preconfigure smaller stack for API only apps
  -B, [--skip-bundle], [--no-skip-bundle]                    # Don't run bundle install
  --webpacker, [--webpack=WEBPACK]                           # Preconfigure Webpack with a particular framework (options: react, vue, angular, elm, stimulus)
      [--skip-webpack-install], [--no-skip-webpack-install]  # Don't run Webpack install

Runtime options:
  -f, [--force]                    # Overwrite files that already exist
  -p, [--pretend], [--no-pretend]  # Run but do not make any changes
  -q, [--quiet], [--no-quiet]      # Suppress status output
  -s, [--skip], [--no-skip]        # Skip files that already exist

Rails options:
  -h, [--help], [--no-help]        # Show this help message and quit
  -v, [--version], [--no-version]  # Show Rails version number and quit

Description:
    The 'rails new' command creates a new Rails application with a default
    directory structure and configuration at the path you specify.

    You can specify extra command-line arguments to be used every time
    'rails new' runs in the .railsrc configuration file in your home directory.

    Note that the arguments specified in the .railsrc file don't affect the
    defaults values shown above in this help message.

Example:
    rails new ~/Code/Ruby/weblog

    This generates a skeletal Rails installation in ~/Code/Ruby/weblog.

From here I noticed a few important things. First, the required syntax is very simple; you just need a directory name for the app, which I prefer to be the name of the app in snake_case. That's been given to me by the project assignment, and despite my irritation at the pun on Rails and a preceding Ruby project called "SalesEngine" I will use it for clarity. (I've already navigated to my Turing project's folder as my working directory, so I don't need to provide the entire path.) So here's where I started:

rails new rales_engine

Secondly, at least half of the options are ways to leave out certain files or gems. I knew from a practice exercise in one of our classes today that the Spring gem can cause problems in certain cases; a partner and I experienced Rails stalling out while trying to generate a model, for example. That gets me to here:

rails new rales_engine --skip-spring

Third, I also know that I intend to use RSpec as my testing library, and from past projects I know that the default testing library included with Rails is Minitest, which I don't want. -T will force Rails not to include Minitest and its associated directories, which clears the way for me to install RSpec later, so that gets me to here:

rails new rales_engine -T --skip-spring

(I don't know and can't find any documentation that explains why the single-dash options always tend to be written before the double-dash options in documentation and guides. The truth is out there. I did, however, learn from responses to this StackOverflow question that the single-dash options are abbreviations of the more verbose double-dash options and can be combined for brevity.)

Fourth, I know that I intend for this Rails app to only ever expose an API and not generate HTML Views. From the aforementioned practice exercise I know that the --api option creates a lighter installation of Rails focused on this purpose. This rubyonrails.org guide explains what this option does:

  • Configure your application to start with a more limited set of middleware than normal. Specifically, it will not include any middleware primarily useful for browser applications (like cookies support) by default.
  • Make ApplicationController inherit from ActionController::API instead of ActionController::Base. As with middleware, this will leave out any Action Controller modules that provide functionalities primarily used by browser applications.
  • Configure the generators to skip generating views, helpers, and assets when you generate a new resource.

This brings us to here:

rails new rales_engine -T --api --skip-spring

Finally, I know that I intend to use PostgreSQL as my database system. By default Rails will configure itself to use Sqlite3. The -d option followed by one of the options listed in the above help page (options: mysql/postgresql/sqlite3/...) will cause it to be configured otherwise. I don't believe I can combine it with the other abbreviated option as -Td because it takes an option. Now I have this:

rails new rales_engine -T -d postgresql --api --skip-spring

And this is what I used. It didn't catch on fire!

2. Configure Gems & Test Suite

Next, through a combination of head-scratching, trial-and-error, and many trips to the documentation of these gems and other tutorials, I installed and configured the gems I intend to use for testing, debugging, and serialization.

If you're smart you could probably add these all to your Gemfile in the root directory, run bundle once, add the gem-specific configuration code and be on your merry way. Or you could do it piecemeal, adding them one at a time and running bundle every time like I did. You do you. Some of the gems depend on files created by configuring RSpec, however, so it is probably worth getting that running first.

Other than FastJsonapi, all these gems go in the group :development, :test do block because they're not going to be used in production.

RSpec

This Medium article by Andrew Livingston is a keeper. The key takeaway for me was to run this command after installing the gem to create the spec sub-directory and, within it, the test helper files rails_helper.rb and spec_helper.rb:

rails generate rspec:install

Forgetting to do this and wondering where you put rails_helper is embarrassing and bad for your health.

Shoulda-Matchers

The Shoulda-Matchers gem allows you to test validations, relationships, and other non-method-related model attributes in your model tests. In addition to installing the gem, the following code block must be added to /spec/rails_helper.rb:

Shoulda::Matchers.configure do |config|
  config.integrate do |with|
    with.test_framework :rspec
    with.library :rails
  end
end

Or you can not, and waste twenty minutes of your life chasing vaporous error messages.

SimpleCov

SimpleCov is what I use for tracking test coverage (i.e., what percentage of your code is executed in the course of running your tests). There are two extra steps to getting it configured after installing the gem.

Add these lines to the top of rails_helper which load SimpleCov and tell it to start tracking:

require 'simplecov'
SimpleCov.start

Add the coverage report output folder to the .gitignore file created by Rails in the root directory so that the heap of HTML files it creates doesn't maddeningly show up as unstaged changes or untracked files in the eyes of Git:

/coverage

Pry

Pry is my best friend. And yours! Upon looking it up after seeing it in Rails' default Gemfile, I heard that Byebug is pretty great too, maybe even better, but until I make the switch Pry is the excellent debugging tool I can't do without. There aren't any additional steps to configure it, but while we're on it I highly recommend adding this to your Snippets file if you're an Atom user to be able to tab-complete from pry to both require Pry and add a binding:

'.source.ruby':
  'require "pry"':
    'prefix': 'pry'
    'body': 'require "pry"; binding.pry'

FactoryBot & Faker

For test data. See this presentation I gave on the subject.

FastJsonapi

See Medium article by Mike Cronin and "JSON Serialization in Rails" guide by Paweł Dąbrowski.

**If consuming other APIs:

Figaro

VCR & Webmock

Require in Gemfile, then bundle:

group :development, :test do
  # ...
  gem 'vcr'
  gem 'webmock'
end

VCR setup in rails_helper:

# VCR requirements
require 'vcr'
require 'webmock/rspec'

# VCR Configuration
VCR.configure do |config|
  config.cassette_library_dir = 'spec/cassettes'
  config.hook_into :webmock
  config.filter_sensitive_data("<GITHUB_API_KEY>") { ENV['GITHUB_API_KEY'] } # If using Figaro
end

Faraday

3. Create Database & Models

For our project we were given a set of CSV files to use as our production data. Our project requirements state that we'll want to create a Rake task to handle importing that data. Before tackling that, there need to be models and a database to import the data into. There are a number of steps I need to take:

  • Download the data files to a /data folder in my project directory
  • Set up the PostgreSQL database
  • Create the Rails models corresponding to each table
  • Create and run migrations to add the tables to the database

Lots to do! Let's get started.

Download data files to project directory

Rails does not have a conventional sub-directory to place data files in. However, among various articles and guides I've read that depend upon CSV, JSON, YAML, or other static (i.e., fixed and not dynamic) non-relational (i.e., one big spreadsheet without connections to other spreadsheets) data file formats, developers tend to place such files in a /data sub-directory within the root project directory. So I did that:

The CSV files were made available to us as part of a remote repo. Rather than clone down that repo to a separate directory and copy the files, or copy the text from Github and paste them into new files, I decided to do a little digging and find out how to download files directly from Github. This StackOverflow conversation supplied several methods, one of which is to use subversion (or SVN, an older alternative version-control system to Git) via the command line to download a particular directory.

These were the steps:

  • Using Github, navigate to the folder you want to download. In my case this was https://github.com/turingschool-examples/sales_engine/tree/master/data.
  • Modify the URL to match the format required by subversion by replacing tree/master with trunk. This gave me https://github.com/turingschool-examples/sales_engine/trunk/data.
  • Use svn checkout url to download the folder in your working directory with SVN. Within the data sub-directory of my project I ran:
svn checkout https://github.com/turingschool-examples/sales_engine/trunk/data

This gave me the six CSV files I wanted, but nested within an additional /data sub-directory because that was the folder I requested via SVN. That folder in turn contained a /.svn sub-directory containing SVN meta-data similar to what you'd get in the hidden /.git directory in a local Git repo. To mask my errors, I deleted that, moved the files up one directory, and deleted the extraneous /data/data directory. Was this worth the effort? Yes--For Science!

Set up the PostgreSQL database

This step was a simple matter of skimming this guide just to remember the command rails db:setup. Welp. Moving on.

Create the Rails models and database tables for each resource

Articles referenced:

4. Create a Rake task to import data

At this point I have the database, models, and their relationships with one another built out. The framework is (hypothetically) ready to import all the CSV data. Because I might make future changes to the test and development databases, I want the import process to be easily repeatable. This is where Rake tasks come in. I know nothing about creating Rake tasks.

This rubyonrails.org guide is quite thorough, and among other more relevant things let me know that "Rake is Ruby Make, a standalone Ruby utility that replaces the Unix utility 'make.'" My etymology-obsessed brain is now at ease. This rubyguides.com guide gave me a quick overview of how Rake tasks are defined and what they're useful for. Between them I get an idea of how to go about this.

Rake tasks can be defined either in the Rakefile file created by Rails in the root directory, or defined in individual files saved in /lib/tasks with a .rake extension. The basic syntax looks like this:

desc "Print a calming message."
task :calm_message do
  puts "Don't panic!"
end

As rubyonrails.org says, inside the task block "any valid Ruby code is allowed." It also says, "All your magic here." So let's make some magic happen inside a task block.

I'll start by listing what I need this task to do:

  • Destroy existing records in the database so it starts fresh on repeated uses
  • Use the built-in CSV library in Ruby to read and parse the CSV files
  • Iterate over the parsed data to create ActiveRecord objects stored in the database

Then I'll create a rake file (/lib/tasks/import.rake) and set up the basic framework:

desc "Clear records and import CSV fixture data."
task :import do
 
end

Destroy existing records

I started with research about the best method for destroying existing records. I went first to the docs, and found two methods: destroy_all and delete_all. destroy_all appears to be more thorough because it activates models' "callback" methods, while delete_all is more efficient but skips those.

Not sure which is preferable in this situation, I searched further and found this article by Kenny Hearn in which he elucidates the pros and cons. I decided in favor of destroy_all to begin with, erring on the side of a thorough reset, with the option open to switch to delete_all if I find it to be inconveniently time-consuming.

That last article made me aware that, while defining relationships between models, I had not given any of them the dependent: destroy_all attribute which tells ActiveRecord to automatically delete "dependent" objects on the "many" side of a has_many relationship if its belongs_to object on the "one" side is deleted. (Phew. There's probably a faster way to say that.) If I did, that would change which classes I need to destroy in this task.

Considering this prompted a second question: will I be exposing the ability to delete resources in my API? I reviewed the project requirements and, lo and behold, it is purely a data-retrieval API that responds only to GET HTTP requests. So the only time adding dependent: destroy_all would come into play is this Rake task or other test-data-cleanup tasks. Not worth the trouble.

I need to call this method on all six models, so that brings my Rake task to this state:

desc "Clear records and import CSV fixture data."
task :import do
  Customer.destroy_all
  Merchant.destroy_all
  Item.destroy_all
  Invoice.destroy_all
  ItemInvoice.destroy_all
  Transaction.destroy_all
end

This is getting a bit long already, so I'm bookmarking the idea of separating out this task's responsibilities into distinct tasks which can be run as a group.

Read and parse CSV files

I start again with a visit to the docs. I've interacted with CSV files in past projects, but it was two modules ago in our program and I want to refresh my understanding. A skim through the Ruby CSV class documentation reminds me that the foreach class method is "intended as the primary interface for reading CSV files," is ideal for iterating over data to turn them into database objects, and is simpler to use than new because you skip the extra step of instantiating a CSV object. foreach takes a path to the CSV file and any number of options, most importantly header: true which causes foreach to return each row as a CSV::Row object that acts like a hash and header_converters: :symbol which causes the CSV::Row object to convert the headers into symbols before using them as keys for the values in each row.

In other words, these options take CSV like these first two rows of customers.csv:

id,first_name,last_name,created_at,updated_at
1,Joey,Ondricka,2012-03-27 14:54:09 UTC,2012-03-27 14:54:09 UTC

And would give this back as the first value iterated upon by foreach:

<CSV::Row id:"1" first_name:"Joey" last_name:"Ondricka" created_at:"2012-03-27 14:54:09 UTC" updated_at:"2012-03-27 14:54:09 UTC">

Which looks an awful lot like an attributes hash I could use to instantiate a boatload of ActiveRecord objects. Let's get cooking!

...well, hold up. Look at that first key/value pair:

id:"1"

ID fields in the database are typically an integer data type. Will this cause problems down the road when we go to import this data, or will Rails work its syntactially-sugary Rails magic and just wink at it? I have no idea. We'll test it later--For Science! If I have the choice to validate the data type at the outset, though, I'll take it. CSV has another option you can add to convert all values that are able be converted to Numeric and/or DateTime, converters: :all. Testing this in irb by using CSV.read to store the array of arrays in a variable and then calling .first, I now get:

#<CSV::Row id:1 first_name:"Joey" last_name:"Ondricka" created_at:"2012-03-27 14:54:09 UTC" updated_at:"2012-03-27 14:54:09 UTC">

The id value is now an integer. Huzzah! The created_at and updated_at timestamp values are still strings, but DateTime is stored that way; we'll wait and see if we get errors when using these to create ActiveRecord objects.

Here's our Rake task updated with a CSV.foreach block:

desc "Clear records and import CSV fixture data."
task :import do
  Customer.destroy_all
  # ... lines omitted ...
  
  CSV.foreach('./data/customers.csv', headers: true, header_converters: :symbol, converters: :all) do |row|
    
  end
end

Alright, let's do this.

Create ActiveRecord objects from CSV data

In that foreach block, I want to create one Customer record for each CSV::Row object. If I did my job well setting up the models and tables, the symbol format keys should perfectly match the Customer attributes, so I can hypothetically pass the CSV::Row in the block variable row in as the single argument of Customer.create().

Well, at least say for myself that I thought to qualify that statement. Rails gave me a surprisingly helpful error message when I tested this out on a single row in the Rails console (rails c):

ArgumentError (When assigning attributes, you must pass a hash as an argument, CSV::Row passed.)

That's a simple enough problem to fix: I call .to_h on the CSV::Row object, and the record is created (without any complaints about the data type of the timestamps):

#<Customer id: 1, first_name: "Joey", last_name: "Ondricka", created_at: "2012-03-27 14:54:09", updated_at: "2012-03-27 14:54:09">

When I access that customer's created_at attribute with Customer.all.first.created_at, it comes back more verbose, so it looks like we're cooking with gas:

Tue, 27 Mar 2012 14:54:09 UTC +00:00

If I integrate this back in to the Rake task, it now looks like this:

desc "Clear records and import CSV fixture data."
task :import do
  Customer.destroy_all
  # ... lines omitted ...
  
  CSV.foreach('./data/customers.csv', headers: true, header_converters: :symbol, converters: :all) do |row|
    Customer.create(row.to_h)
  end
end

And if we do this again for the five other classes, we have this completed (and completely untested) Rake task:

desc "Clear records and import CSV fixture data."
task :import do
  Customer.destroy_all
  Item.destroy_all
  Invoice.destroy_all
  ItemInvoice.destroy_all
  Merchant.destroy_all
  Transaction.destroy_all

  CSV.foreach('./data/customers.csv', headers: true, header_converters: :symbol, converters: :all) do |row|
    Customer.create(row.to_h)
  end

  CSV.foreach('./data/invoices.csv', headers: true, header_converters: :symbol, converters: :all) do |row|
    Invoice.crate(row.to_h)
  end

  CSV.foreach('./data/invoice_items.csv', headers: true, header_converters: :symbol, converters: :all) do |row|
    InvoiceItem.create(row.to_h)
  end

  CSV.foreach('./data/items.csv', headers: true, header_converters: :symbol, converters: :all) do |row|
    Item.crate(row.to_h)
  end

  CSV.foreach('./data/merchants.csv', headers: true, header_converters: :symbol, converters: :all) do |row|
    Merchant.crate(row.to_h)
  end

  CSV.foreach('./data/transactions.csv', headers: true, header_converters: :symbol, converters: :all) do |row|
    Transaction.crate(row.to_h)
  end
end

Oookay, let's see what happens.

Yep, nothing.

The Debugging Doldrums

The first error I got was, thankfully, one that I'd anticipated possibly seeing; good old NameError.

NameError: uninitialized constant Customer

I had seen this note in the aforementioned rubyonrails.org Rake guide about needing to take an extra step to make a Rake task able to interact with your Rails environment:

If your need to interact with your application models, perform database queries and so on, your task should depend on the environment task, which will load your application code.

When I first saw that I said to myself, "Self, this is probably going to be relevant later." Thanks, past self!

That same guide demonstrates the syntax for making a task "depend on" another task like this:

task task_name: [:prerequisite_task, :another_task_we_depend_on] do

Implementing this in my import.rake task looks like this:

task import: :environment do

The next error was caused by a misspelling of the joins table model (InvoiceItem, not ItemInvoice). Gets me every time.

The next error was caused by me somehow not having run the migrations for two of the tables specifically on the development database; I must have run them specifically on the test database (rails db:migrate RAILS_ENV=test) in response to an RSpec error message and neglected to run it generally.

The next error was caused by the CSV library not having been loaded. I added the following line to the Rake task between the mass record destruction and CSV imports:

  require 'csv'

The next error was caused by a misspelling of 'create' as 'crate'. If you caught it earlier and were waiting this whole time to see it blow up, congratu-frickin-lations, Sherlock. Just because I'm an English major doesn't mean I'm not a human who makes mistakes. Geez. Anyway, I went ahead and fixed the other three 'crates' while I was at it.

The next...wait, I think's it working. Hey! HEY! It's working!

...It's also taking a really long time. Apparently creating tens of thousands of ActiveRecord objects is no small feat.

A couple minutes later, the rake task unceremoniously stops. (I'll add a completion message to the end of it for clarity.) To confirm that it worked, I'll run some simple ActiveRecord queries in the Rails console (rails c):

  • Customer.count => 1000. Checks out against number of lines in CSV minus the header. Same for Merchant.
  • Invoice.count => 0. Something went wrong there, apparently. Same with Item, InvoiceItem, and Transaction. I notice right off the bat that the models that succeeded in being imported were those with no foreign keys on them; that is, they have_many of something but do not belong_to anything. I'll start by assuming that's relevant. To find out more, I'll change the Rake file to use create! instead of create to force Rails to stop and produce an error message upon the first unsuccesful record creation attempt.

Sure enough, the Rake task fails with this error message on line 17, creating an Invoice:

ActiveRecord::RecordInvalid: Validation failed: Merchant must exist

Thinking this through, it occurs to me that the order in which I import the tables must be significant. I double-check the invoices.csv file and confirm that the customer_id column precedes the merchant_id column, and belongs_to :customer precedes belongs_to :merchant in the Invoice model, but the missing Merchant is what trips the error. I currently have the import of Customers first and Merchants last in the Rake file, so when creating an Invoice ActiveRecord was (presumably) able to find a Customer matching each customer_id but not able to find a Merchant matching each merchant_id. I will reorganize the CSV import blocks to account for their dependencies:

  • Customer
  • Merchant
  • Invoice (belongs to Customer and Merchant)
  • Item (belongs to Merchant)
  • InvoiceItem (belongs to Invoice and Item)
  • Transaction (belongs to Invoice)

...it's taking a really long time to run again. I think that's a good sign.

After a solid 3-4 minutes, it made it all the way to line 32 (importing Transactions) before hitting an error:

ActiveModel::RangeError: 4654405418249632 is out of range for ActiveModel::Type::Integer with limit 4 bytes

I check the transactions.csv file and confirm that "4654405418249632" is the credit_card_number for the very first row. Because I'm not positive whether this is a problem with the PostgreSQL data type I defined for that column in the migration or an unforeseen side-effect of the converters: :all CSV option, I tested this in the Rails console:

rails c
Loading development environment (Rails 6.0.2.1)
irb(main):001:0> require 'csv'
=> true
irb(main):002:0> transactions = CSV.read('./data/transactions.csv', headers: true, header_converters: :symbol, converters: :all)
=> #<CSV::Table mode:col_or_row row_count:5596>
irb(main):003:0> transactions.first
=> #<CSV::Row id:1 invoice_id:1 credit_card_number:4654405418249632 credit_card_expiration_date:nil result:"success" created_at:"2012
-03-27 14:54:09 UTC" updated_at:"2012-03-27 14:54:09 UTC">
irb(main):004:0> Transaction.create(transactions.first.to_h)
   (0.2ms)  BEGIN
  Invoice Load (0.5ms)  SELECT "invoices".* FROM "invoices" WHERE "invoices"."id" = $1 LIMIT $2  [["id", 1], ["LIMIT", 1]]

   (0.2ms)  ROLLBACK
Traceback (most recent call last):
        1: from (irb):4
ActiveModel::RangeError (4654405418249632 is out of range for ActiveModel::Type::Integer with limit 4 bytes)

So the problem is not the CSV type conversion. After returning to a guide to PostgreSQL data types that I referenced in my SQL Oddities guide, by way of forgetting it and finding it again through Google, I found the problem but not necessarily the solution. The INT data type "is a 4-byte integer that has a range from -2,147,483,648 to 2,147,483,647." "4654405418249632" exceeds that by quite a bit.

After reading this forum thread about preferred data types for storing credit card numbers, I came across a mention of a BIGINT data type. Another guide on postgresqltutorial.com about how the various Integer data types differ from one another says BIGINT "can store any number in the range of (-9,223,372,036,854,775,808,+9,223,372,036,854,775,807)." That would work. The folks in that forum thread point out, however, that a credit card number with leading zeros would get messed up, and you're never going to perform a calculation on it; so why not just store it as text?

Out of curiosity, I skimmed through the transactions.csv file to see if there were any with leading zeroes, and found that they uniformly begin with "4." Recalling something I read about this during a credit card checksum algorithm project earlier in our program, I looked it up and confirmed that the initial digit(s) of a credit card number signify which credit card system it belongs to. A leading "4" indicates a Visa card.

So it is both impossible in this dataset and highly unlikely in the real world to encounter a credit card number with a leading zero. Therefore, it comes down to other considerations. I'm persuaded by the argument that a number that will never be used in a calculation is better off as a string, because it says something to future developers about what that data is intended to be used for; and the BIGINT type takes up more space in memory than I need, whereas VARCHAR can be given the exact number of characters I need: VARCHAR(16).

However. As I discuss in my SQL Oddities guide, finding the right name for a data type is challenging because they vary from database system to database system, and then you have to deal with what ActiveRecord will recognize as what. After some digging I confirmed that PostgreSQL can handle varchar(n), but ActiveRecord only recognizes string and text, which is offset by letting you provide a limit: option to set the maximum number of characters.

In light of all that, I'll generate a migration to change the data type of that column like so:

rails g migration ChangeCreditCardNumberToBeVarcharInTransactions

This is verbose, but this guide suggests that--because Rails doesn't automate anything based on a migration name of this kind--it's valuable to be descriptive. The migration starts off as a bare skeleton:

class ChangeCreditCardNumberToBeVarcharInTransactions < ActiveRecord::Migration[6.0]
  def change
  end
end

Per that tutorial, the syntax for changing a column type is this:

  change_column :table_name, :column_name, :new_type

In my case, it looks like this:

  change_column :transactions, :credit_card_number, :string, limit: 16

I made one last related change, modifying the converters option for the CSV block which handles Transactions to have the value :date_time instead of :all to (hopefully) avoid converting the credit_card_number column into an integer. Let's run it.

...huh! Remember that conversation about dependent: destroy_all earlier? Well:

ActiveRecord::InvalidForeignKey: PG::ForeignKeyViolation: ERROR:  update or delete on table "customers" violates foreign key constraint "fk_rails_0d349e632f" on table "invoices"
DETAIL:  Key (id)=(1) is still referenced from table "invoices".

ActiveRecord is essentially complaining because an Invoice record will be orphaned once its associated Customer is deleted. Googling non-project-specific portions of the error message eventually led me to the official PostgreSQL manual, and to this specific section on Foreign Keys. The relevant section reads:

We know that the foreign keys disallow creation of orders that do not relate to any products. But what if a product is removed after an order is created that references it? SQL allows you to handle that as well. Intuitively, we have a few options ... Restricting and cascading deletes are the two most common options. RESTRICT prevents deletion of a referenced row. NO ACTION means that if any referencing rows still exist when the constraint is checked, an error is raised; this is the default behavior if you do not specify anything.

It goes on to say that CASCADE is another option, which is analogous to ActiveRecord's dependent: destroy_all option. But I'm in the same position as before: I don't intend to ever delete records except in this Rake task, so I'd prefer not to go back and reconfigure the models.

It occurs to me that the same workaround I used with importing might also work with deleting: reorganize the list, but in the opposite direction so that dependent objects are destroyed before the objects they depend upon:

  • Transaction (belongs to Invoice)
  • InvoiceItem (belongs to Invoice and Item)
  • Item (belongs to Merchant)
  • Invoice (belongs to Customer and Merchant)
  • Merchant
  • Customer

...It's gone back to taking a long time. I'll go make a cup of tea.

Okay, I'm back. No errors await me. I pop into Rails console and check the numbers--all present and accounted for. I connect to the database using the free Postico PostgreSQL client which lets you view all your data via a graphic interface, and it's all there and appears to be formatted & interrelated correctly. Also, holy cow, Postico is really helpful.

I made it through the doldrums! I have data, I have a database--I'm ready to make an API!

5. Build RESTful API Endpoints

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment