@kojix2
Last active January 25, 2021 08:12
ruby-htslib Interim report

ruby-htslib - High-throughput sequencing data manipulation for Ruby - Interim report

(Created by Google Translate)

2021-01-25 kojix2

GitHub: https://github.com/kojix2/ruby-htslib

Slack: https://sciruby.slack.com (Japanese)

Overview

ruby-htslib provides Ruby bindings to HTSlib, a C library for processing high-throughput sequencing (HTS) data. It will provide APIs to read and write file formats such as SAM/BAM and VCF/BCF.

In recent years, next-generation sequencing (NGS) technologies for reading DNA and RNA sequences have become popular in the life sciences. We will provide a way to manipulate HTS file formats from Ruby, with the aim of improving the Ruby ecosystem for genomics.

Implementation

  • Low level API

    • Added about 320 htslib functions to the HTS::FFI module.
    • The bindings were written mainly by regular-expression replacement, with manual additions.
    • Bit fields are not supported yet.
    • Only a few macros have been implemented.
    • Created an example that reads a BAM file using the low level API.
  • High level API
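
To illustrate the regular-expression approach mentioned for the low level API, here is a minimal sketch that turns simple C prototypes into Ruby-FFI `attach_function` lines. The regex, the type mapping, and the helper names `ffi_type` and `prototype_to_attach` are assumptions made for this example and are not the script actually used in ruby-htslib. The prototypes are simplified from the htslib headers.

```ruby
# Sketch only: turn simple C prototypes into Ruby-FFI `attach_function` lines.
# The regex and the type mapping are illustrative assumptions, not the
# project's actual conversion script.

# Matches "<return type> <name>(<args>);" written on a single line.
PROTOTYPE = /\A\s*(?<ret>[\w\s]+?[\s*]+)(?<name>\w+)\s*\((?<args>[^)]*)\)\s*;/

# Map a C type string to a Ruby-FFI type symbol (returned as source text).
def ffi_type(c_type)
  t = c_type.strip.gsub(/\s+/, ' ')
  return ':string'  if t.match?(/\A(const )?char \*\z/)
  return ':pointer' if t.end_with?('*')

  # Single-word names such as int, void, size_t pass through unchanged.
  # Multi-word types (e.g. "unsigned int") would still need a manual fix.
  ":#{t.tr(' ', '_')}"
end

# Convert one prototype line; returns nil when the line is not a prototype.
def prototype_to_attach(line)
  m = PROTOTYPE.match(line)
  return nil unless m

  args = m[:args].split(',').map(&:strip)
  args = [] if args == ['void']
  # Drop the trailing parameter name, keeping only its type.
  arg_types = args.map { |a| ffi_type(a.sub(/\w+\z/, '')) }
  "attach_function :#{m[:name]}, [#{arg_types.join(', ')}], #{ffi_type(m[:ret])}"
end

header = <<~C
  htsFile *hts_open(const char *fn, const char *mode);
  int hts_close(htsFile *fp);
  int sam_read1(samFile *fp, sam_hdr_t *h, bam1_t *b);
C

header.each_line { |l| puts prototype_to_attach(l) }
# attach_function :hts_open, [:string, :string], :pointer
# attach_function :hts_close, [:pointer], :int
# attach_function :sam_read1, [:pointer, :pointer, :pointer], :int
```

A converter like this only covers single-line prototypes with named parameters. Struct arguments, callbacks, and multi-word types are where manual additions would come in.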

Future plan

  • Low level API

    • Improving reliability
      • Add tests incrementally.
    • Support for bit fields, macro functions, constants, etc.
    • Automatic generation of the bindings
      • Currently considering c2ffi.
  • High level API

    • Work in progress.
  • Add an advanced analysis workflow as an example.
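
C macros and constants are not exported as symbols from the shared library, so an FFI binding cannot attach them and they have to be re-implemented on the Ruby side. The following is a rough sketch of what that could look like for a few macros from htslib's `sam.h`. The module name `SamMacros` and the helper `bit_field` are hypothetical and not part of ruby-htslib. The constant values are the ones defined in `sam.h`.

```ruby
# Sketch only: Ruby ports of a few htslib macros and constants.
module SamMacros
  # Values as defined in htslib's sam.h.
  BAM_CIGAR_STR   = 'MIDNSHP=XB'
  BAM_CIGAR_SHIFT = 4
  BAM_CIGAR_MASK  = 0xf
  BAM_FREVERSE    = 16

  module_function

  # bam_cigar_op(c): operation code stored in the low 4 bits.
  def bam_cigar_op(c)
    c & BAM_CIGAR_MASK
  end

  # bam_cigar_oplen(c): operation length stored in the upper bits.
  def bam_cigar_oplen(c)
    c >> BAM_CIGAR_SHIFT
  end

  # bam_cigar_opchr(c): operation as a character, '?' if out of range.
  def bam_cigar_opchr(c)
    BAM_CIGAR_STR[bam_cigar_op(c)] || '?'
  end

  # bam_cigar_gen(l, o): pack a length and an operation code.
  def bam_cigar_gen(len, op)
    (len << BAM_CIGAR_SHIFT) | op
  end

  # bam_is_rev: the C macro takes a bam1_t pointer; this sketch takes the
  # integer FLAG value directly.
  def bam_is_rev(flag)
    (flag & BAM_FREVERSE) != 0
  end

  # Hypothetical generic helper: extract `width` bits starting at `offset`
  # from an integer. Real C bit-field layout depends on the compiler and ABI,
  # so offsets would have to be checked per platform.
  def bit_field(word, offset, width)
    (word >> offset) & ((1 << width) - 1)
  end
end

cigar = SamMacros.bam_cigar_gen(76, 0)
puts "#{SamMacros.bam_cigar_oplen(cigar)}#{SamMacros.bam_cigar_opchr(cigar)}" # prints 76M
puts SamMacros.bam_is_rev(83) # prints true
puts SamMacros.bam_is_rev(99) # prints false
```

Keeping these as plain Ruby methods means they can be tested without loading the shared library at all.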

Not-to-do

  • Performance improvement
  • Integration with the BioRuby API inside ruby-htslib
  • Migration from FFI to Fiddle

Appendix
