Last active August 29, 2015 14:18
matthieuauger/dc6e3de83624c29ff286
Link articles in "code-civil"
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
import re
from urllib.parse import quote

# A space followed by an article number such as "1382", "16-1" or "311-19-1".
ARTICLE_REF = re.compile(r' (\d+-?\d*-?\d*)')

# Single-digit numbers are never linked.
SINGLE_DIGITS = {"1", "2", "3", "4", "5", "6", "7", "8", "9"}


def find_between(s, first, last):
    """Return the part of s between first and last, or None if either is missing."""
    try:
        start = s.index(first) + len(first)
        end = s.index(last, start)
        return s[start:end]
    except ValueError:
        return None


def link_or_self(matched_article, article_path_dict, current_article):
    """Return a Markdown link to matched_article, or the bare number if it is
    the current article, a single digit, or has no matching file."""
    if (matched_article == current_article
            or matched_article in SINGLE_DIGITS
            or matched_article not in article_path_dict):
        return " " + matched_article
    return " [{}]({})".format(matched_article, article_path_dict[matched_article])


def article_files(root="."):
    """Yield (article number, path) for every "Article <number>.md" file."""
    for dirpath, _, files in os.walk(root):
        for filename in files:
            article = find_between(filename, "Article ", ".md")
            if article is not None:
                yield article, os.path.join(dirpath, filename)


# First pass: article number -> link target relative to the repository root.
# The leading "." is dropped and the path percent-encoded, because spaces
# ("Article 1382.md") are not allowed in a bare Markdown link destination.
article_path_dict = {}
for article, path in article_files():
    article_path_dict[article] = quote(path[1:])
    print(article_path_dict[article])

# Second pass: rewrite each article, linking references to other articles.
for article, path in article_files():
    with open(path, encoding="utf-8") as f:
        content = f.read()
    content = ARTICLE_REF.sub(
        lambda m: link_or_self(m.group(1), article_path_dict, article), content)
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
Yeah OK, from a quick look I can see there are quite a few possible connectors, e.g. "x à y et z" ("x to y and z").
So either someone writes out every possible combination, or your method (simple regexp + corrections) is indeed the better one, since it will produce fewer false negatives than mine.
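The trade-off can be seen on a single sentence. The keyword-anchored pattern below is a hypothetical stand-in (the regex discussed in the thread is not shown): it only captures the number right after "article(s)", so the other references become false negatives, while the simple pattern from the script catches all of them plus a date.

```python
import re

# Simple pattern from the script: any number preceded by a space.
SIMPLE = re.compile(r' (\d+-?\d*-?\d*)')
# Hypothetical keyword-anchored pattern: only the number directly after
# "article" or "articles" is captured.
ANCHORED = re.compile(r'\barticles? (\d+-?\d*-?\d*)')

text = "Les articles 10 à 12 et 14 ont été modifiés par la loi du 4 mars 2002."

print(SIMPLE.findall(text))    # ['10', '12', '14', '4', '2002']
print(ANCHORED.findall(text))  # ['10']
```

In the script, "4" is dropped by the single-digit rule, and "2002" would only become a link if an "Article 2002.md" file exists; that is the kind of false positive left to manual correction.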
Yes, I kept it simple to match as many references as possible! If we put "article" in the regex, we then have to handle all the connectors, e.g.:
articles 10, 11 et 12 ou 13
I tested your latest regex, but these cases don't seem to be handled.
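One way to handle such lists is to match "article(s)" followed by numbers joined by connectors, then link each number inside the match. This is a hypothetical sketch, not the regex from either comment: the connector set (",", "et", "ou", "à") is taken from the examples in this thread, and the flat link paths are made up.

```python
import re

# An article number such as "10", "16-1" or "311-19-1".
NUM = r'\d+(?:-\d+){0,2}'
# Connectors seen in the thread: ",", "et", "ou", "à".
CONNECTOR = r'(?:\s*,\s*|\s+(?:et|ou|à)\s+)'
# "article"/"articles" followed by one or more numbers joined by connectors.
ARTICLE_LIST = re.compile(
    r'\b(articles?\s+)(' + NUM + r'(?:' + CONNECTOR + NUM + r')*)',
    re.IGNORECASE)


def link_numbers(match, paths):
    """Link every known article number inside one matched list."""
    def link(num_match):
        number = num_match.group(0)
        if number not in paths:
            return number
        return "[{}]({})".format(number, paths[number])

    keyword, numbers = match.group(1), match.group(2)
    return keyword + re.sub(NUM, link, numbers)


def link_article_lists(text, paths):
    return ARTICLE_LIST.sub(lambda m: link_numbers(m, paths), text)


# Hypothetical index of existing article files.
paths = {
    "10": "/Article%2010.md",
    "11": "/Article%2011.md",
    "12": "/Article%2012.md",
}
print(link_article_lists("Voir les articles 10, 11 et 12 ou 13, alinéa 2.", paths))
# Voir les articles [10](/Article%2010.md), [11](/Article%2011.md) et [12](/Article%2012.md) ou 13, alinéa 2.
```

The list stops at the first connector not followed by a number, so "alinéa 2" is left alone; every new connector still has to be added to `CONNECTOR` by hand, which is the cost raised above.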