@edouard-lopez
Created September 4, 2013 08:12
For each Unicode code point (first column), keep only its `kMandarin` line, and only when that code point also has a `kDefinition` line. How can I do this?
U+3400 kCantonese jau1
U+3400 kDefinition (same as U+4E18 丘) hillock or mound
U+3400 kMandarin qiū
U+3401 kCantonese tim2
U+3401 kDefinition to lick; to taste, a mat, bamboo bark
U+3401 kHanyuPinyin 10019.020:tiàn
U+3401 kMandarin tiàn
U+3402 kDefinition (J) non-standard form of U+559C 喜, to like, love, enjoy; a joyful thing
U+3404 kMandarin kuà
U+3405 kCantonese ng5
U+3405 kMandarin wǔ
Expected output:
U+3400 kMandarin qiū
U+3401 kMandarin tiàn
@edouard-lopez

Add a `grep` check to the loop so that a `kMandarin` line is kept only when the same code point also has a `kDefinition` line, for instance:

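while read -r unicode field pinyin; do
  # Only kMandarin lines are candidates; skip everything else before calling grep.
  [[ $unicode == U+* && $field == kMandarin ]] || continue
  # Keep the line only if the same code point also has a kDefinition entry.
  # grep -q prints nothing and reports a match through its exit status;
  # in a basic regex the "+" in "U+3400" is matched literally.
  if grep -q "^${unicode}[[:space:]]kDefinition" ./resources/unihan/Unihan_Readings.txt; then
    # \U accepts up to 8 hex digits, so code points above U+FFFF also work.
    printf '%s;%b;%s\n' \
      "$unicode" \
      "\\U${unicode#U+}" \
      "$pinyin" \
    >> "$outputFile"
  fi
done < "$inputFile"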
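
The loop above runs `grep` once per `kMandarin` line, which can be slow on a file the size of `Unihan_Readings.txt`. As a possible alternative, here is a minimal sketch that reads the file once and remembers which code points have a `kDefinition`. The function name `filter_mandarin_with_definition` is made up for illustration, and the sketch assumes bash 4 or later for associative arrays.

```shell
# Hypothetical helper (not part of the original gist): print the kMandarin
# line of every code point that also has a kDefinition line.
# Usage: filter_mandarin_with_definition FILE
filter_mandarin_with_definition() {
  local file=$1 unicode field value
  local -A has_def=() mandarin=()   # keyed by code point, e.g. "U+3400"
  local -a order=()                 # code points in order of first kMandarin

  while read -r unicode field value; do
    [[ $unicode == U+* ]] || continue   # skip comments and blank lines
    case $field in
      kDefinition)
        has_def[$unicode]=1
        ;;
      kMandarin)
        # Record each code point once, keeping the input order.
        [[ -n ${mandarin[$unicode]:-} ]] || order+=("$unicode")
        mandarin[$unicode]=$value
        ;;
    esac
  done < "$file"

  # Emit only the code points that had both fields.
  for unicode in "${order[@]}"; do
    if [[ -n ${has_def[$unicode]:-} ]]; then
      printf '%s kMandarin %s\n' "$unicode" "${mandarin[$unicode]}"
    fi
  done
}
```

It could be called as `filter_mandarin_with_definition ./resources/unihan/Unihan_Readings.txt`. Because both fields are stored before anything is printed, the sketch does not depend on `kDefinition` appearing before `kMandarin` in the file.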
