Skip to content

Instantly share code, notes, and snippets.

@noteed
Last active March 20, 2021 22:21
Show Gist options
  • Save noteed/012e9e8d9744f7204fa2abfee629ce00 to your computer and use it in GitHub Desktop.
Save noteed/012e9e8d9744f7204fa2abfee629ce00 to your computer and use it in GitHub Desktop.
Pandoc, SQLite, and FTS

This is a Gist demonstrating how to populate a SQLite database with some content coming from a Markdown file, adding full-text search capability, and querying it with a small Scotty-based web server.

In particular, this uses

  • direnv and a shell.nix file to easily get a development environment,
  • pandoc --to markdown to remove any metadata block (an alternative is --to plain,
  • SQLite's readile function to insert the content of a file in the database,
  • SQLite's FTS feature,
  • a trigger to automatically insert data into the FTW table (an INSERT INTO copying the data would have worked too, since that database is only used for reading),
  • Scotty to write a small Haskell script,
  • sqlite-simple to query the database from Haskell.

The content of the .direnv file is:

NIX_PATH=nixpkgs=channel:nixos-unstable
use_nix

Creating the database is done using the init.sh file, which prepares the text content of the database, and then executes init.sql to create the schema and insert data.

There is also search.sh script to perform the search query from the command-line.

#! /usr/bin/env bash
# Create a SQLite database, allowing FTS search of the .md files.
# Using Pandoc removes the metadata block.
pandoc --to markdown page-1.md > page-1.txt
pandoc --to markdown page-2.md > page-2.txt
sqlite3 content.db < init.sql
--------------------------------------------------------------------------------
-- Schema.
--------------------------------------------------------------------------------
CREATE TABLE prototype_pages (
url TEXT PRIMARY KEY,
body TEXT NOT NULL
);
CREATE VIRTUAL TABLE prototype_pages_fts USING fts5 (
body,
content='prototype_pages',
content_rowid='rowid'
);
-- Instead of a on-insert trigger, a simple INSERT INTO at the end of this file
-- would work too.
CREATE TRIGGER prototype_pages_ai AFTER INSERT ON prototype_pages
BEGIN
INSERT INTO prototype_pages_fts (rowid, body)
VALUES (new.rowid, new.body);
END;
--------------------------------------------------------------------------------
-- Data.
--------------------------------------------------------------------------------
-- Insert the content of the Markdown files, prepared in init.sh first.
INSERT INTO prototype_pages (url, body) VALUES
("/page-1.html", readfile('page-1.txt')),
("/page-2.html", readfile('page-2.txt'));
title
Page 1

This is a sample page to try using the full-text search feature of SQLite together with Markdown content.

title
Page 2

This is a second page (see page 1).

#! /usr/bin/env bash
# This is an example query to search the term on the (two) Markdown files. The
# found token is bracketed by `<b/>` tags and newlines are replaced by a space.
sqlite3 content.db "
SELECT
replace(snippet(prototype_pages_fts, -1, '<b>', '</b>', '...', 10), x'0A', ' ')
FROM prototype_pages_fts
WHERE prototype_pages_fts match 'sqlite' order by rank"
-- A small Scotty-based web server exposing a search bar, and a handler to
-- perform a search query against a SQLite database. (The search query is about
-- the same as in search.sh.)
{-# LANGUAGE OverloadedStrings #-}
import Data.Monoid (mconcat)
import qualified Data.Text.Lazy as LT
import Database.SQLite.Simple
import Database.SQLite.Simple.FromRow
import Web.Scotty
--------------------------------------------------------------------------------
main :: IO ()
main = do
scotty 9010 $ do
get "/" $ do
html
"<form method=\"GET\" action=\"/search\">\n\
\<input type=\"text\" name=\"q\" id=\"search\">\n\
\<button type=\"submit\">Search</button>\n\
\</form>"
get "/search" $ do
term <- param "q"
rs <- liftAndCatchIO $ searchTerm' term
html $ mconcat ("Results:<br>" : map ((`LT.append` "<br>") . searchResultBody) rs)
--------------------------------------------------------------------------------
searchTerm' :: LT.Text -> IO [SearchResult]
searchTerm' term = do
conn <- open "content.db"
rs <- searchTerm conn term
close conn
return rs
searchTerm :: Connection -> LT.Text -> IO [SearchResult]
searchTerm conn term =
query conn
"SELECT\n\
\ replace(snippet(prototype_pages_fts, -1, '<b>', '</b>', '...', 16), x'0A', ' ')\n\
\FROM prototype_pages_fts\n\
\WHERE prototype_pages_fts match ? order by rank"
(Only term)
--------------------------------------------------------------------------------
data SearchResult =
SearchResult
{ searchResultBody :: LT.Text
}
deriving (Show)
instance FromRow SearchResult
where
fromRow = SearchResult <$> field
{ pkgs ? import <nixpkgs> {} }:
pkgs.mkShell {
buildInputs = [
pkgs.sqlite
(pkgs.haskellPackages.ghcWithPackages (hpkgs: [
hpkgs.pandoc
hpkgs.scotty
hpkgs.sqlite-simple
]))
];
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment