@markmacgillivray
markmacgillivray / nrecs2es.py
Created August 21, 2012 09:52
Example of setting up an ES index with nested objects
#
# gisted at https://gist.github.com/3414096
#
# DO ALL OF THIS FIRST, BEFORE RUNNING THIS SCRIPT
#
# Download elasticsearch (latest version) and unpack it. More info here:
# http://www.elasticsearch.org/download/
#
# Then, run it - here is an example:
# sudo /opt/elasticsearch/bin/elasticsearch start
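As a rough illustration of what "an index with nested objects" means, the sketch below builds a mapping in which one field is declared with `"type": "nested"`. It is not taken from the script. The type name `record`, the field names `title`, `author`, `name` and `affiliation`, and the helper `nested_mapping` are all hypothetical. The `string` field type and the per-type mapping layout match Elasticsearch releases from around 2012; newer versions use different field types and no mapping types.

```python
import json


def nested_mapping(doc_type="record", nested_field="author"):
    """Return a mapping dict that declares one field as a nested object.

    All names here are illustrative assumptions, not the gist's own schema.
    """
    return {
        doc_type: {
            "properties": {
                "title": {"type": "string"},
                # "nested" makes each object in the list separately queryable
                nested_field: {
                    "type": "nested",
                    "properties": {
                        "name": {"type": "string"},
                        "affiliation": {"type": "string"},
                    },
                },
            }
        }
    }


if __name__ == "__main__":
    # The JSON printed here is what would be sent as the mapping body.
    print(json.dumps(nested_mapping(), indent=2))
```

The function only constructs the mapping body; sending it to a running Elasticsearch instance is left out so the sketch runs without a server.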
markmacgillivray / additional
Created May 10, 2012 10:46
Convert records from Medline, MalariaWorld, and Wikipedia Open Access Media Importer from their native formats into BibJSON
# Query to retrieve malaria-related articles from the indices
# and write the result object to search.json.
# (Change the ES URL target and the size param for different datasets.)
# The size needed can be found from the value of data['hits']['total'] in a response.
# Only 10 results are returned by default, so set size to a number larger than the total.
curl -X GET 'http://localhost:9200/medline2012/record/_search?size=10000&q=gametocyte%20OR%20merozoite%20OR%20sporozoite%20OR%20trophozoite%20OR%20schizont%20OR%20artemisia%20OR%20ITN%20OR%20LLIN%20OR%20malaria%20OR%20mosquito%20OR%20anopheles%20OR%20plasmodium%20OR%20falciparum%20OR%20vivax%20OR%20ovale%20OR%20malariae%20OR%20knowlesi%20OR%20DDT%20OR%20pyrethroid%20OR%20carbamate%20OR%20organophosphate%20OR%20organochlorine%20OR%20bednet%20OR%20repellent%20OR%20artemisinin%20OR%20chloroquine%20OR%20quinine%20OR%20artesunate%20OR%20lumefantrin%20OR%20mefloquine%20OR%20atovaquone%20OR%20paludrine%20OR%20"insecticide%20treated%20bednet"%20OR%20"indoor%20residual%20spraying"' -o search
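The long query string above is easier to maintain when generated from a list of terms. The following is a minimal sketch of that idea, assuming the same URL shape as the curl command. The helper names `build_search_url` and `total_hits` are hypothetical, and the sketch percent-encodes the double quotes around phrases, which the curl command leaves literal. `total_hits` reads `data['hits']['total']` as described in the comments, and also accepts the object form with a `value` key that newer Elasticsearch versions return.

```python
from urllib.parse import quote


def build_search_url(base_url, terms, size=10):
    """Join terms with OR and percent-encode them into a _search URL."""
    query = " OR ".join(terms)
    return "{}?size={}&q={}".format(base_url, size, quote(query, safe=""))


def total_hits(data):
    """Read the total hit count from a parsed search response."""
    total = data["hits"]["total"]
    # Newer Elasticsearch versions wrap the count in {"value": ..., "relation": ...}
    if isinstance(total, dict):
        return total["value"]
    return total


if __name__ == "__main__":
    base = "http://localhost:9200/medline2012/record/_search"
    terms = ["malaria", "plasmodium", '"indoor residual spraying"']
    print(build_search_url(base, terms, size=10000))
```

One possible workflow is to request the URL once with a small `size`, pass the parsed JSON to `total_hits`, and then rebuild the URL with a `size` larger than that total.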
markmacgillivray / bnb2bibjson.py
Created February 3, 2012 18:24
Convert BNB from xml files into bibjson files
# Used to convert the data at: http://thedatahub.org/dataset/jiscopenbib-bl_bnb-1
# to a JSON format suitable for importing into BibServer
# NOTE - there will be an error in the output files. I noticed this after running,
# so I used another script to fix it - see the attached file.
# Also, the final file on this gist shows how to upload to an elasticsearch index.
# I also made some changes to the JSON in that file before indexing it.
# These bits and pieces could be put into one file and run without writing intermediate files to disk,
# but this was fine for the way I was doing it (intermittently).
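One way to avoid the intermediate files mentioned above is to turn converted records straight into a request body for the Elasticsearch bulk endpoint, which expects alternating action and document lines as newline-delimited JSON. The sketch below shows only that step and is not the gist's own upload code. The function `bulk_payload`, the index name `bnb`, the type name `record`, and the record fields are hypothetical, and the `_type` key applies to older Elasticsearch versions only.

```python
import json


def bulk_payload(records, index="bnb", doc_type="record"):
    """Build a newline-delimited bulk body from an iterable of dict records.

    Index, type and field names are illustrative assumptions.
    """
    lines = []
    for rec in records:
        action = {"index": {"_index": index, "_type": doc_type}}
        # Reuse the record's own id when it has one; otherwise ES assigns one
        if "id" in rec:
            action["index"]["_id"] = rec["id"]
        lines.append(json.dumps(action))
        lines.append(json.dumps(rec))
    if not lines:
        return ""
    # The bulk endpoint requires the body to end with a newline
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [{"id": "1", "title": "First record"}, {"title": "Second record"}]
    print(bulk_payload(sample), end="")
```

Because `records` can be any iterable, a generator that yields converted records one at a time could feed this function directly, in batches, without writing anything to disk.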