Last active
January 23, 2022 06:14
Wordalla helper (ವರ್ಡಲ್ಲಾ ಸಹಾಯಕ)
# encoding: utf-8
#
# The Kannada dictionary file needed for this script is available here: https://github.com/alar-dict/data
# Download the YAML file and place it in the same directory as this script.
#
# Using this script:
# This script is meant to be used interactively from an IRB console.
#
# 1. Launch irb.
# 2. Load this script with require './wordalla.rb'
# 3. It will take a few seconds to parse the large Kannada dictionary file.
# 4. It then selects all the five-letter words and keeps them ready for further filtering.
# 5. Based on what the wordalla website shows you, narrow down the words by calling the method filter_with_includes_and_excludes.
# 6. Inputs for this method are:
#    a. words : The list of words to be filtered. You can pass `fives` here; that is where the script stores the words of length five.
#    b. must_include : Array of letters that must be included
#    c. must_exclude : Array of letters that must be excluded
#
require 'yaml'

# Signs that attach to a base letter and are not counted as letters themselves.
ANUSVARA = "\u0C82".freeze
VISARGA = "\u0C83".freeze
VOTTU = "್".freeze
VOWEL_SIGNS = %w( ಾ ಿ ೀ ು ೂ ೆ ೇ ೈ ೊ ೋ ೌ ೃ).freeze

# Counts the letters of a Kannada word the way wordalla does: dependent signs
# are ignored, and a consonant joined to the previous one by a vottu
# (a vattakshara) is not counted separately.
def kannada_word_length(word)
  length = 0
  is_vattakshara = false
  word.each_char do |letter|
    next if letter == ANUSVARA || letter == VISARGA
    next if VOWEL_SIGNS.include?(letter)

    if letter == VOTTU
      # The next consonant belongs to the current letter.
      is_vattakshara = true
      next
    end

    if is_vattakshara
      is_vattakshara = false
    else
      length += 1
    end
  end
  length
end

# Keeps the words whose entry contains every letter in must_include and
# none of the letters in must_exclude.
def filter_with_includes_and_excludes(words, must_include = [], must_exclude = [])
  words.select do |w|
    must_include.all? { |letter| w['entry'].include?(letter) } &&
      must_exclude.none? { |letter| w['entry'].include?(letter) }
  end
end
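
# Top-level local variables are not visible to an IRB session that requires
# this file, so the data is kept in constants and exposed through `fives`.
DICT = YAML.load_file(File.join(__dir__, 'alar.yml'))
FIVES = DICT.select { |w| kannada_word_length(w['entry']) == 5 }

def fives
  FIVES
end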
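
The counting rule and the filter are easier to follow on a few concrete words. The sketch below is a minimal stand-alone example, not part of the script: the `demo_`/`DEMO_` names and the four sample entries are hypothetical, chosen so they can be run without `alar.yml` and without clashing with the script's own definitions. It assumes, as the script does, that each dictionary item is a hash with an `'entry'` key.

```ruby
# Anusvara, visarga and the vowel signs used by the script, written as
# code points so the combining marks stay readable.
DEMO_IGNORED = "\u0C82\u0C83\u0CBE\u0CBF\u0CC0\u0CC1\u0CC2\u0CC3\u0CC6\u0CC7\u0CC8\u0CCA\u0CCB\u0CCC".chars.freeze
DEMO_VOTTU = "\u0CCD"

# Same rule as kannada_word_length: skip dependent signs, and do not count
# the consonant that follows a vottu.
def demo_length(word)
  length = 0
  joined = false
  word.each_char do |ch|
    next if DEMO_IGNORED.include?(ch)

    if ch == DEMO_VOTTU
      joined = true
    elsif joined
      joined = false
    else
      length += 1
    end
  end
  length
end

# Same rule as filter_with_includes_and_excludes.
def demo_filter(words, must_include = [], must_exclude = [])
  words.select do |w|
    must_include.all? { |l| w['entry'].include?(l) } &&
      must_exclude.none? { |l| w['entry'].include?(l) }
  end
end

# Hypothetical stand-in for the parsed dictionary.
DEMO_WORDS = %w[ಕನ್ನಡ ಮನೆ ಅಕ್ಷರ ನಮಸ್ಕಾರ].map { |e| { 'entry' => e } }.freeze

DEMO_WORDS.each { |w| puts "#{w['entry']}: #{demo_length(w['entry'])}" }

# Three-letter candidates, then: must contain ಕ, must not contain ಷ.
threes = DEMO_WORDS.select { |w| demo_length(w['entry']) == 3 }
hits = demo_filter(threes, ['ಕ'], ['ಷ'])
puts hits.map { |w| w['entry'] }.inspect
```

Here ಕನ್ನಡ and ಅಕ್ಷರ both count as three letters because the consonant after the vottu is folded into the previous letter, and the filter then keeps only ಕನ್ನಡ. With the real data the same call shape applies, for example `filter_with_includes_and_excludes(fives, ['ಕ'], ['ಷ'])`.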