@khanh96le
Created March 10, 2017 03:36
Convert column to text
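import csv
import io
from functools import reduce

import pandas as pd

# Input: one "token<TAB>tag" pair per line, sentences separated by a blank line.
with open("test5.iob2.txt", "r", encoding="utf-8") as f:
    content = f.read().strip()

sentences = content.split("\n\n")
# Read every column as plain text so tokens such as "NA", "2017" or a quote
# character are not reinterpreted by the parser.
sentences = [
    pd.read_table(io.StringIO(sentence), names=["text", "tag"], dtype=str,
                  keep_default_na=False, quoting=csv.QUOTE_NONE)
    for sentence in sentences
]


def combine_text(first, second):
    # `second` always holds exactly one token; an I_W tag marks it as a
    # continuation of the previous word, so glue it on with an underscore.
    if second[0]["tag"] == "I_W":
        first[-1]["text"] = first[-1]["text"] + "_" + second[0]["text"]
    else:
        first += second
    return first


def convert_column_to_text(sentence):
    tokens = sentence.to_dict("records")  # one {"text": ..., "tag": ...} dict per row
    tokens = [[token] for token in tokens]
    tokens = reduce(combine_text, tokens)
    return " ".join(token["text"] for token in tokens)


lines = [convert_column_to_text(sentence) for sentence in sentences]
with open("my_output.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(lines))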
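
The merge rule can also be tried on an in-memory sample without pandas or any files. This is only a minimal sketch: the helper name `columns_to_text`, the sample tokens, and the `B_W` tag are assumptions made for illustration; only the `I_W` tag and the underscore joining come from the script itself.

```python
def columns_to_text(block):
    """Join one sentence block of "token<TAB>tag" lines into a single line.

    A token tagged I_W is glued to the previous token with "_";
    any other tag starts a new word.
    """
    words = []
    for row in block.strip().split("\n"):
        text, tag = row.split("\t")
        if tag == "I_W" and words:
            words[-1] += "_" + text
        else:
            words.append(text)
    return " ".join(words)


# Hypothetical sample: two sentences separated by a blank line.
sample = "New\tB_W\nYork\tI_W\nis\tB_W\nbig\tB_W\n\nI\tB_W\nagree\tB_W"
result = "\n".join(columns_to_text(block) for block in sample.split("\n\n"))
print(result)
```

A plain loop replaces `reduce` here, which avoids wrapping each token in a one-element list; the output has one sentence per line, with multi-token words joined by underscores.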