Last active
November 8, 2021 14:16
Simple Python script to remove duplicate lines from a txt text file
# coding=utf-8
# @File : removeSameRow.py
# @Desc : Detect identical lines in a file and remove the duplicates
# @Author : Forgo7ten
# @Time : 2021/11/3

# Input file name
readFilePath = "./original.txt"
# Output file name
outFilePath = "./new.txt"

# Open both files; the with block flushes and closes them on exit
with open(readFilePath, "r", encoding='UTF-8') as readfile, \
        open(outFilePath, "w", encoding='UTF-8') as outfile:
    lines_seen = set()
    for line in readfile:
        # Strip leading/trailing spaces and newlines
        line = line.strip(' \n')
        if line not in lines_seen:
            # First occurrence: write it to the new file
            outfile.write(line + '\n')
            # Record it in the set
            lines_seen.add(line)
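The same first-seen deduplication can be tried without touching any files. The following is only a minimal sketch, not part of the original script: the helper name `unique_lines` and the `sample` list are hypothetical, introduced here for illustration. It applies the script's `strip(' \n')` normalisation to an in-memory list of lines.

```python
def unique_lines(lines):
    """Yield each distinct line once, in first-seen order.

    Hypothetical helper for illustration; not part of the original script.
    """
    seen = set()
    for line in lines:
        # Same normalisation as the script: trim spaces and newlines at both ends
        line = line.strip(' \n')
        if line not in seen:
            seen.add(line)
            yield line


# Made-up sample data; "  apple \n" matches "apple\n" once stripped
sample = ["apple\n", "banana\n", "  apple \n", "cherry\n", "banana\n"]
result = list(unique_lines(sample))
print(result)  # ['apple', 'banana', 'cherry']
```

Because lines are stripped before the comparison, lines that differ only in leading or trailing spaces count as duplicates, and any run of blank lines collapses into a single empty line.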