@mbklein
Created April 26, 2010 16:24
kleinm-apim24:Desktop KleinM$ irb
>> require 'open-uri'
=> true
>> raw = open('https://catalog.library.jhu.edu/mods/?format=marc&bib=1144347') { |io| io.read }
=> "\nLDR:\302\24001341nam 2200301 a 450 \n005: 19971120234400.0\n008: 890316s1988 caua b 101 0 eng \n010: \302\240\302\240$a 86025055\n020: \302\240\302\240$a0520055756 (alk. paper)\n035: \302\240\302\240$a1144347\n035: \302\240\302\240$aADR9480EI\n035: \302\240\302\240$a(CStRLIN)MDJGADR9480-B\n040: \302\240\302\240$dMdBJ\n043: \302\240\302\240$aaz-----\n049: \302\24000\302\240$aJHE\n050: \302\24000\302\240$aBP63.A4$bS646 1988\n082: \302\24000\302\240$a297/.14/0954$219\n245: \302\24000\302\240$aShari\314\204\312\273at and ambiguity in South Asian Islam /$cedited by Katherine P. Ewing.\n260: \302\24000\302\240$aBerkeley :$bUniversity of California Press,$cc1988.\n300: \302\24000\302\240$axiii, 321 p. :$bill. ;$c24 cm.\n500: \302\24000\302\240$aPapers from the conference entitled \"South Asian Islam: moral principles in tension\" held at the Pendle Hill Conference Center in Pennsylvania, May 22-24, 1981.\n500: \302\24000\302\240$a\"Sponsored by the Joint Committee on South Asia of the Social Science Research Council and the American Council of Learned Societies\"--P. facing t.p.\n504: \302\24000\302\240$aIncludes bibliographies and index.\n650: \302\240 0\302\240$aIslam$zSouth Asia$vCongresses.\n650: \302\240 0\302\240$aIslamic law$zSouth Asia$vCongresses.\n700: \302\240 0\302\240$aEwing, Katherine Pratt\n710: \302\240 0\302\240$aJoint Committee on South Asia\n910: \302\240 0\302\240$a1144347$bHorizon bib#\n999: \302\240 0\302\240$lmain$nelc$s1$wi$aBP63.A4$bS783 1988$mc. 1$61$tnorm$c4$103/16/1989$211/02/1995$711/02/1995$i31151005777655$z96f@$zNCC:L$xemsel$henorm$gemain\n\n"
# Copy the 245 $a title out of the raw output above and paste it in as a string literal
>> title = "Shari\314\204\312\273at and ambiguity in South Asian Islam"
=> "Shari\314\204\312\273at and ambiguity in South Asian Islam"
>> print title
Sharīʻat and ambiguity in South Asian Islam=> nil
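# The copy-and-paste step could probably be replaced by pulling the 245 $a
# subfield out of the response with a regular expression. This is only a sketch
# for a current Ruby: the `raw` below is a trimmed stand-in for the real
# response, and the pattern assumes the title is followed by " /$c", which
# holds for this record but not for every one.

```ruby
# Trimmed, hypothetical stand-in for `raw`: only the 245 and 260 lines are kept.
# \u00A0 is the no-break space that shows up as \302\240 in the output above.
raw = "\n245: \u00A000\u00A0$aShari\u0304\u02BBat and ambiguity in South Asian Islam " \
      "/$cedited by Katherine P. Ewing.\n" \
      "260: \u00A000\u00A0$aBerkeley :$bUniversity of California Press,$cc1988.\n"

# Capture everything between "$a" and the " /$c" that closes the title.
title = raw[/^245: .*?\$a(.+?)\s*\/\$c/, 1]

puts title
```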
# Convert title to an array of hex code points (not bytes) by unpacking to integers and using Integer#to_s(base)
>> codepoints = title.unpack('U*').collect { |cp| cp.to_s(16) }
=> ["53", "68", "61", "72", "69", "304", "2bb", "61", "74", "20", "61", "6e", "64", "20", "61", "6d", "62", "69", "67", "75", "69", "74", "79", "20", "69", "6e", "20", "53", "6f", "75", "74", "68", "20", "41", "73", "69", "61", "6e", "20", "49", "73", "6c", "61", "6d"]
# The Unicode module (from the unicode gem) to the rescue!
>> require 'unicode'
=> true
# Normalize title to NFKC so that "i" + combining macron (U+0304) becomes the precomposed U+012B
>> title = Unicode.normalize_KC(title)
=> "Shar\304\253\312\273at and ambiguity in South Asian Islam"
>> codepoints = title.unpack('U*').collect { |cp| cp.to_s(16) }
=> ["53", "68", "61", "72", "12b", "2bb", "61", "74", "20", "61", "6e", "64", "20", "61", "6d", "62", "69", "67", "75", "69", "74", "79", "20", "69", "6e", "20", "53", "6f", "75", "74", "68", "20", "41", "73", "69", "61", "6e", "20", "49", "73", "6c", "61", "6d"]
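# On Ruby 2.2 or later the same check should work without the unicode gem,
# since String#unicode_normalize is in the standard library. A minimal sketch,
# assuming a UTF-8 string; the variable names are illustrative and :nfkc is
# chosen to mirror Unicode.normalize_KC above.

```ruby
# Decomposed title: "i" followed by U+0304 COMBINING MACRON, then U+02BB.
title = "Shari\u0304\u02BBat and ambiguity in South Asian Islam"

# Hex code points before normalization.
before = title.unpack('U*').map { |cp| cp.to_s(16) }

# NFKC composes "i" + U+0304 into the single code point U+012B.
composed = title.unicode_normalize(:nfkc)
after = composed.unpack('U*').map { |cp| cp.to_s(16) }

puts before.first(7).inspect
puts after.first(6).inspect
```

# The second list should be one element shorter than the first, with "12b"
# where "69", "304" used to be.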