Hybrid Databases in Rails: Using SQLite Alongside PostgreSQL

My Rails site Booko has several ancillary services which support it. There's the Cover Art system which downloads and resizes book cover art. CSV-to-API service which downloads CSV files of book prices and turns them into an API for Booko to use. The latest of these, which I've just (re-)written, is a Ruby app called BookHarvest which scrapes publisher websites for book information, so Booko has up-to-date titles and cover art. These ancillary services commonly use an API for Booko and push data in on a schedule.

SQLite is having a bit of a resurgence in the Rails world, largely thanks to people who've optimised how Rails communicates with it, vastly improving its performance for Ruby and Rails apps. SQLite's simplicity and high performance are perfect for single host systems. If you can use SQLite rather than MySQL and PostgreSQL, you'll dramatically simplify your deployment and administration of your applications. There are now the Solid series of Gems - solid-cache, solid-queue and solid-cable all of which can take advantage of SQLite for database work.

One of the tasks you need to check-off when considering which database to use for your app, is how to back and restore it. For SQLite, I'm using Litestream. It's quite brilliant - in a nutshell, it uses an S3 bucket as destination, uploading the database and then tracks the WAL files, posting them up to s3 at whatever frequency makes sense. The sync-interval defaults to 1 second. This provides near instant backups without any application impacts. Recovery via Litestream is likewise straightforward - it will download the database and the WAL files and apply them to the database, leaving you with up-to-date replica of the origin.

Once I'd set up Litestream to backup the BookHarvest app, I discovered that restoring the database directly into any of Booko's host's /storage directory was possible. This is quite a simple and straightforward processes using Litestream's restore command. And once I have the extra SQLite database on the host, I can import data from it directly, skipping the overhead and latency of using an API to insert data.

Accessing an external database from a Rails app is quite straightforward. Here's a simplified version of the Booko side of the BookHarvest service:

# /app/services/book_harvest.rb

module BookHarvest
  class Base < ActiveRecord::Base
    self.abstract_class = true

    establish_connection(
      adapter: "sqlite3",
      database: Rails.root.join("storage/book_harvest.sqlite3").to_s,
      readonly: true
    )
  end

  def self.import
    BookHarvest::Book.find_each do |book|
      next if book.empty?

      product = import_book(book)
    rescue => e
      puts "Error importing book: #{e.message}"
      puts e.backtrace
    end
  end

  def self.import_book(book)
    return if book.empty?

    booko_book = ::Book.find_by(book.gtin)
    # rest of the import code
  end

  # Here is a regular ActiveRecord model
  # Use associations or any ActiveRecord code here
  class Book < Base
    attribute :alternates, :json, default: []
    attribute :rrp, :json, default: []
    attribute :image_urls, :json, default: []
    attribute :tags, :json, default: []

    def empty?
      return true if title.blank? && subtitle.blank? && description.blank? && contributors.empty? && image_urls.empty? && alternates.empty? && tags.empty?
      false
    end 
  end
end

This single file sets up all the ActiveRecord Models which map to your new, recently restored SQLite database.

The process is now automated, using SolidQueue, of the Rails Job processing libraries, to regularly restore the BookHarvest database and run BookHarvest.import to import them.

This is a very effective, simple and fast technique to get data transferred into your Rails app.

update: Looks like the Litestream guys already had this idea! https://fly.io/blog/skip-the-api/