Simon Gardner simg

@simg
simg / data_importer.js
Last active August 29, 2015 14:20
Non-blocking, single-threaded data import from XML to PostgreSQL "template" using Node.js
/*
This is a simple Node program originally written to import a large volume of data
stored in XML files into a Postgres database.
It uses the async library to process items concurrently (on one thread, so this is
concurrency rather than true parallelism), with a concurrency limit of 20.
Adjusting the concurrency can change performance depending on the machine and the
exact workload.
The program is single threaded, so on multi-core machines you might want to consider
using the cluster module to run several worker processes for further increases in
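The concurrency limit described above can be sketched without the async library. The TypeScript below is a minimal, hypothetical sketch using plain promises (a swapped-in technique, not the gist's code): `mapWithConcurrency` and `importRecord` are illustrative names, and `importRecord` only stands in for the real "parse XML, insert into Postgres" step.

```typescript
// Hypothetical helper: apply `worker` to every item, keeping at most `limit`
// calls in flight at once. Results come back in input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each runner keeps claiming the next unclaimed index until none are left.
  // JavaScript runs this code on one thread, so `next++` cannot race.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  // Start at most `limit` runners and wait for all of them to drain the list.
  const runners: Promise<void>[] = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) {
    runners.push(runner());
  }
  await Promise.all(runners);
  return results;
}

// Hypothetical stand-in for "parse one XML file and insert its rows";
// it records how many calls are running at the same time.
let inFlight = 0;
let maxInFlight = 0;
async function importRecord(id: number): Promise<number> {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  await new Promise<void>((resolve) => setTimeout(resolve, 1)); // simulated I/O wait
  inFlight--;
  return id;
}
```

With the async library the same role is played by functions such as `async.eachLimit` or `async.queue`, which take the concurrency as an argument; the preview does not show which one the gist uses. Because every runner shares one thread, this overlaps time spent waiting on I/O (file reads, database round trips) but does not spread CPU-bound XML parsing across cores, which is why the comment suggests multiple processes for that.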
simg / Node-pg Entity Relationship Benchmark.md
Last active September 30, 2019 05:18
Benchmarking the performance of creating entity relationships with Postgres Arrays compared to "join tables"

A very common pattern in relational databases is the use of a join table to create one-to-many or many-to-many relationships between entities.

e.g., something like:

CREATE TABLE table1 ( 
  id serial PRIMARY KEY 
, acolumn character varying); 

CREATE TABLE table2 (
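
The two representations being benchmarked hold the same information: a join table stores one row per related `(table1, table2)` pair, while an array column stores all related ids on the parent row. The TypeScript sketch below is an in-memory illustration rather than the benchmark's code; `JoinRow`, `toArrayColumn` and `toJoinRows` are hypothetical names.

```typescript
// Hypothetical row shape for a join table linking table1 and table2.
interface JoinRow {
  table1Id: number;
  table2Id: number;
}

// Collapse join-table rows into the "array column" form:
// one array of related table2 ids per table1 id.
function toArrayColumn(rows: JoinRow[]): Map<number, number[]> {
  const result = new Map<number, number[]>();
  for (const { table1Id, table2Id } of rows) {
    const ids = result.get(table1Id);
    if (ids) {
      ids.push(table2Id);
    } else {
      result.set(table1Id, [table2Id]);
    }
  }
  return result;
}

// Expand the array form back into one join-table row per related pair.
function toJoinRows(arrays: Map<number, number[]>): JoinRow[] {
  const rows: JoinRow[] = [];
  arrays.forEach((ids, table1Id) => {
    for (const table2Id of ids) {
      rows.push({ table1Id, table2Id });
    }
  });
  return rows;
}
```

Inside Postgres the same conversions are usually done with `array_agg()` (join rows to an array, grouped by the parent id) and `unnest()` (array back to rows).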

// run using: node --expose_gc loop_reduce_test.ts
// (--expose_gc makes global.gc() available, so garbage collection can be
// triggered manually between timed runs)
function arrayReduceTest(ary) {
  return ary.reduce((acc, x) => {
    return acc + x;
  }, 0);
}
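
For comparison, a plain `for` loop computes the same sum as `arrayReduceTest`. The sketch below is hypothetical: `arrayLoopTest` and `timeIt` are illustrative names, the reduce version is repeated with type annotations so the block stands alone, and it assumes a runtime where `performance.now()` is available (Node.js 16+ or a browser). It calls `gc()` only when Node was started with `--expose_gc`.

```typescript
// Same logic as arrayReduceTest, with type annotations.
function arrayReduceTest(ary: number[]): number {
  return ary.reduce((acc, x) => acc + x, 0);
}

// Hypothetical loop-based counterpart.
function arrayLoopTest(ary: number[]): number {
  let acc = 0;
  for (let i = 0; i < ary.length; i++) {
    acc += ary[i];
  }
  return acc;
}

// Mean milliseconds per call of fn(arg) over `runs` calls, measured with
// performance.now() (sub-millisecond resolution).
function timeIt<A>(fn: (arg: A) => unknown, arg: A, runs: number): number {
  // global.gc only exists when Node was started with --expose_gc.
  const holder = globalThis as unknown as { gc?: () => void };
  if (typeof holder.gc === "function") {
    holder.gc(); // clear earlier garbage before the timed region
  }
  const start = performance.now();
  for (let i = 0; i < runs; i++) {
    fn(arg);
  }
  return (performance.now() - start) / runs;
}
```

Micro-benchmarks like this are sensitive to JIT warm-up, so running each function a few times before timing usually gives steadier numbers.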
[1 of 3] Compiling Codec.Archive.Zip.Util ( src/Codec/Archive/Zip/Util.hs, .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/Codec/Archive/Zip/Util.o )
/home/www/projects/datamine/components/zip-conduit-0.2.2.2/src/Codec/Archive/Zip/Util.hs:110:25: warning: [-Wdeprecations]
    In the use of type constructor or class ‘Sink’
    (imported from Data.Conduit, but defined in conduit-1.3.1.1:Data.Conduit.Internal.Conduit):
    Deprecated: "Use ConduitT directly"
    |
110 | crc32Sink :: Monad m => Sink ByteString m Word32
    |                         ^^^^
import Codec.Archive.Zip (sourceEntry, EntrySelector, getEntrySource, getEntry, getEntryName, getEntries, withArchive)
import Control.Monad.IO.Class (MonadIO, liftIO)
import Data.ByteString (ByteString)
--import Data.Foldable (for)
import Conduit (mapC, mapM_C, yieldMany, ($$))
import Data.Conduit (Conduit(..), yield, runConduit, (.|), awaitForever)
import qualified Data.Conduit.Binary as CB
import qualified Data.Conduit.List as CL
import qualified Data.Csv as Csv
simg / stateT.hs
Created August 3, 2019 10:48
StateT Question
{-
This program parses a zip file and for each entry in the zip generates two lists of strings.
Each list contains many values, some of which are duplicates.
I would like to count the number of instances of each string in both lists.
e.g. ["a1", "b1", "aa1", "aa1", "a1"] -> ["a1" -> 2, "b1" -> 1, "aa1" -> 2 ]
The zip contains a very large number of small files.
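
The counting step itself (independent of `StateT` and the zip handling) is a fold into a map from string to count. The TypeScript sketch below is illustrative only; `countOccurrences` and `mergeCounts` are hypothetical names, and merging per-entry maps is one assumed way to combine results across the zip's many files. In Haskell the same fold is commonly written `Map.fromListWith (+) [(s, 1) | s <- xs]` using `Data.Map.Strict`.

```typescript
// Count how many times each string occurs in one list.
function countOccurrences(values: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const v of values) {
    counts.set(v, (counts.get(v) ?? 0) + 1);
  }
  return counts;
}

// Hypothetical combiner: sum several count maps (e.g. one per zip entry).
function mergeCounts(maps: Map<string, number>[]): Map<string, number> {
  const total = new Map<string, number>();
  for (const m of maps) {
    m.forEach((n, key) => total.set(key, (total.get(key) ?? 0) + n));
  }
  return total;
}
```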
simg / main.hs
Last active August 4, 2019 14:54
Aeson Encode List of Tuples
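{-# LANGUAGE DeriveGeneric, OverloadedStrings, RecordWildCards #-}
import Data.Aeson (ToJSON (..), object, (.=))
import Data.List (sortBy)
import Data.Map (Map, toList)
import Data.Ord (comparing)
import Data.Text (Text)
import GHC.Generics (Generic)

data Summary = Summary {
    count :: Int
  , list1 :: Map Text Int
  } deriving (Generic, Show)

instance ToJSON Summary where
  toJSON (Summary {..}) = object [
      "count" .= count
      -- the value must be parenthesised: (.=) binds more tightly than ($)
    , "list1" .= (sortBy (comparing snd) (toList list1) :: [(Text, Int)])
    ]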
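
Aeson encodes a `(Text, Int)` tuple as a two-element JSON array, so `list1` is serialised as an array of `[key, count]` pairs rather than a JSON object. The TypeScript sketch below mirrors that output shape for illustration; the `Summary` interface and `summaryToJson` are hypothetical, and ties on the count are broken by key, which matches `Data.Map.toList`'s ascending-key order followed by the stable `sortBy` for plain ASCII keys.

```typescript
// Hypothetical TypeScript mirror of the Haskell Summary record.
interface Summary {
  count: number;
  list1: Map<string, number>;
}

// Build the same JSON shape: list1 as [key, count] pairs, sorted by count,
// with ties ordered by key.
function summaryToJson(s: Summary): string {
  const pairs: [string, number][] = [];
  s.list1.forEach((n, key) => pairs.push([key, n]));
  pairs.sort((a, b) => a[1] - b[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
  return JSON.stringify({ count: s.count, list1: pairs });
}
```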