@dustalov
Last active August 26, 2024 21:05
Link Grammar for Russian (Parser of the Parser)
# encoding: utf-8

require 'strscan'

# Processor of Link Grammar for Russian output.
#
class LinkParser::Lexer
  # This exception is raised when the link grammar output is invalid
  # and the Lexer is unable to understand it.
  #
  class InvalidLinkGrammar < RuntimeError
    attr_reader :input

    # @private
    def initialize input
      super 'Invalid link grammar'
      @input = input
    end
  end

  # Abstract syntax tree of the parser output.
  #
  AST = Struct.new(:value)

  # A structure that represents a link in Link Grammar.
  # Includes type and position definitions along with the word and its
  # morphosyntactic descriptors.
  #
  Link = Struct.new(:type, :subtype, :id, :word, :msd)

  # A structure that represents a word in Link Grammar. Includes
  # morphosyntactic descriptors.
  #
  Word = Struct.new(:word, :msd)

  attr_reader :input, :lexer
  private :input, :lexer

  # Create a new {Lexer} instance to process the given parser output.
  #
  # @param input [String] output of the parser.
  #
  def initialize input
    @input = input
  end
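
  # Perform parsing of the parser output. This wording is silly, but
  # I really can't implement a good Link Parser right now.
  #
  # @return [Array, Word, Link] the value of the root AST node of the
  #   given parser output.
  #
  def parse
    @lexer = StringScanner.new(input)
    result = parse_value.value
    # Check for leftovers only after a successful parse: raising from an
    # `ensure` clause would mask the original exception. `rest` is used
    # because `pos` is a byte offset, not a character index.
    lexer.eos? or
      raise('Unexpected data: "%s"' % lexer.rest)
    result
  end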

  protected

  # Parse any supported syntactic construction of our parser.
  #
  # @return [AST] the AST of given parser output.
  #
  def parse_value
    trim_space!
    parse_list or
      parse_string or
      parse_link or
      raise InvalidLinkGrammar, input
  ensure
    trim_space!
  end

  # List parser.
  #
  # @return [AST, false] the AST of the list, or false if no list
  #   starts at the current position.
  #
  def parse_list
    return false unless lexer.scan(/\(\s*/)
    list = []
    more_values = false
    # Any error raised while parsing an element ends the list.
    while (contents = (parse_value rescue nil))
      list << contents.value
      more_values = lexer.scan(/\s+/)
    end
    raise 'Missing value' if more_values
    lexer.scan(/\s*\)\s*/) or raise 'Unclosed list'
    AST.new(list)
  end
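
  # String parser.
  #
  # @return [AST, false] the AST of the quoted word, or false if no
  #   string starts at the current position.
  #
  def parse_string
    return false unless lexer.scan(/"/)
    # `*` rather than `+`, so that an empty string "" yields '' and not nil.
    string = lexer.scan(/[^"]*/)
    lexer.scan(/"/) or raise 'Unterminated string'
    AST.new(Word.new(*classify_word(string)))
  end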
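
  # Link parser.
  #
  # @return [AST, false] the AST of the link, or false if no link
  #   token starts at the current position.
  #
  def parse_link
    return false unless token = lexer.scan(/[\wА-Яа-яЁё!:\-\.\,\?]+/)
    # A token is split as TYPEsubtype:id:word; the limit of 3 keeps any
    # colon inside the word itself.
    complex_type, id, string = token.split(':', 3)
    match = complex_type.to_s.match(/([A-Z]+)(.*)/)
    raise InvalidLinkGrammar, input unless match && id && string
    type, subtype = match[1..2]
    AST.new(Link.new(type, subtype, id.to_i, *classify_word(string)))
  end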

  # Skip whitespace characters because we are not interested in them.
  #
  def trim_space!
    lexer.scan(/\s+/)
    self
  end

  # Word classification method that identifies LEFT-WALL, RIGHT-WALL,
  # punctuation and regular word tokens.
  #
  # @param word [String] the word to classify.
  #
  # @return [Array<[String, Symbol], [String, NilClass]>]
  #   classification data.
  #
  def classify_word(word)
    case word
    when 'LEFT-WALL' then [ :left_wall ]
    when 'RIGHT-WALL' then [ :right_wall ]
    when '.' then [ '.' ]
    else
      if unknown_word = word.match(/^\[(.+)\]$/)
        [ unknown_word[1] ]
      else
        word.split('.', 2).map { |s| s.empty? ? nil : s }
      end
    end
  end
end
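
The way `parse_link` and `classify_word` take a single link token apart can be tried in isolation. The following is a minimal standalone sketch, not part of the gist: the names `LinkToken`, `classify` and `split_link_token` are hypothetical, and the sample tokens `Sp:2:кошка.n` and `Wd:0:LEFT-WALL` are made-up illustrations of the assumed `TYPEsubtype:id:word.msd` shape rather than real parser output.

```ruby
# Standalone sketch of decomposing one link token of the assumed form
# TYPEsubtype:id:word[.msd]. All names and sample tokens are hypothetical.

LinkToken = Struct.new(:type, :subtype, :id, :word, :msd)

# Closely follows classify_word from the gist: walls become symbols,
# a bracketed (unknown) word loses its brackets, and anything else is
# split at the first dot into the word and its descriptor.
def classify(word)
  case word
  when 'LEFT-WALL'  then [:left_wall]
  when 'RIGHT-WALL' then [:right_wall]
  when '.'          then ['.']
  else
    if (unknown = word.match(/\A\[(.+)\]\z/))
      [unknown[1]]
    else
      word.split('.', 2).map { |part| part.empty? ? nil : part }
    end
  end
end

# Splits the token at the first two colons, then separates the
# upper-case link type from whatever follows it (the subtype).
def split_link_token(token)
  complex_type, id, word = token.split(':', 3)
  match = complex_type.to_s.match(/([A-Z]+)(.*)/)
  unless match && id && word
    raise ArgumentError, "malformed link token: #{token.inspect}"
  end
  LinkToken.new(match[1], match[2], id.to_i, *classify(word))
end

p split_link_token('Sp:2:кошка.n').to_a
p split_link_token('Wd:0:LEFT-WALL').to_a
```

A struct keeps the five fields in the same order as `Link` in the gist, so the result of `classify` can be splatted straight into the constructor; a missing descriptor simply stays `nil`.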
@mirth

mirth commented Jan 30, 2013

It looks like a typo has crept into link_parser.rb:62.

@dustalov
Author

I'm so inattentive! Thanks, fixed.
