@ThomasG77
Last active July 25, 2024 15:07
French athletes, 2024 Olympic Games (JO 2024)

Retrieving the data on French athletes at the 2024 Olympic Games from the API behind https://data.equipedefrance.com

The most interesting part is the list of athletes that the API exposes. A few geographic coordinates are missing (they correspond to the place of birth; 29 were missing in my tests). We had to derive them from the athletes' slugs: the place of birth appears as text on HTML pages such as https://www.equipedefrance.com/athlete/guylaine-marchand even when its geolocation is absent, so that text can be geocoded.

There were a few special cases:

  • no place of birth in the HTML page
  • the region name after the place of birth was changed because the geocoder did not return the expected result
  • the place of birth was taken from "elsewhere" because a few HTML pages were inaccessible (500 errors) or did not give the place of birth

Athletes whose birthPlace property is filled in are the ones that were geocoded. Those with a null value are the ones I did not touch, because they already had coordinates.
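To make that convention concrete, here is a minimal sketch that separates geocoded features from untouched ones. The function name `split_by_geocoding`, the slug `example-athlete`, and the two-feature sample are hypothetical; only the `birthPlace` property and its null-or-filled meaning come from the script.

```python
def split_by_geocoding(feature_collection):
    """Return (geocoded, untouched) feature lists based on birthPlace."""
    geocoded, untouched = [], []
    for feature in feature_collection["features"]:
        if feature["properties"].get("birthPlace") is None:
            # Coordinates came straight from the API.
            untouched.append(feature)
        else:
            # Coordinates were obtained by geocoding the birthplace text.
            geocoded.append(feature)
    return geocoded, untouched


# Hypothetical sample: one untouched feature, one geocoded feature.
paris = {"coordinates": [2.3200410217200766, 48.8588897], "type": "Point"}
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"slug": "alexandre-lloveras", "birthPlace": None},
            "geometry": {
                "coordinates": [4.836284906729304, 45.7710938512817],
                "type": "Point",
            },
        },
        {
            "type": "Feature",
            "properties": {"slug": "example-athlete", "birthPlace": paris},
            "geometry": paris,
        },
    ],
}

geocoded, untouched = split_by_geocoding(sample)
print(len(geocoded), "geocoded,", len(untouched), "untouched")
```

In the script, a filled birthPlace holds the cached Point geometry rather than the place name, which is why the sample stores a geometry there.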

The script produces a GeoJSON file as output (jo_2024_athletes.json).

Each GeoJSON "feature" looks like this:

{
  "type": "Feature",
  "properties": {
    "gender": "homme",
    "firstname": "Alexandre",
    "lastname": "Lloveras",
    "slug": "alexandre-lloveras",
    "type": "paralympic",
    "birthdate": "2000-06-26",
    "pictureUrl": "https://medias.equipedefrance.com/root/6fc2f70b-1edb-47d3-a0a8-a34b6016b323.jpg",
    "isMedalist": false,
    "olympicMedals": {
      "gold": 1,
      "silver": 0,
      "bronze": 2
    },
    "olympicGames": [
      {
        "year": 2020
      },
      {
        "year": 2024
      }
    ],
    "disciplines": [
      {
        "objectID": 62,
        "slug": "para-cyclisme"
      }
    ],
    "objectID": "967",
    "birthPlace": null
  },
  "geometry": {
    "coordinates": [
      4.836284906729304,
      45.7710938512817
    ],
    "type": "Point"
  }
}

This example is shown so that you can see there are nested attributes, which you will need to extract if necessary.
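One possible way to extract those nested attributes is to flatten each feature into a single-level dictionary, as sketched below. The function name `flatten_properties` and the choice of output keys are assumptions made for illustration; the input keys (`olympicMedals`, `olympicGames`, `disciplines`) are those of the sample feature above, shortened here.

```python
def flatten_properties(feature):
    """Flatten the nested properties of one athlete feature."""
    props = feature["properties"]
    medals = props.get("olympicMedals") or {}
    geometry = feature.get("geometry") or {}
    lon, lat = geometry.get("coordinates") or (None, None)
    return {
        "slug": props.get("slug"),
        "gold": medals.get("gold", 0),
        "silver": medals.get("silver", 0),
        "bronze": medals.get("bronze", 0),
        # Lists of dictionaries become plain lists of values.
        "years": [game["year"] for game in props.get("olympicGames") or []],
        "disciplines": [d["slug"] for d in props.get("disciplines") or []],
        "lon": lon,
        "lat": lat,
    }


# Shortened copy of the sample feature shown above.
feature = {
    "type": "Feature",
    "properties": {
        "slug": "alexandre-lloveras",
        "olympicMedals": {"gold": 1, "silver": 0, "bronze": 2},
        "olympicGames": [{"year": 2020}, {"year": 2024}],
        "disciplines": [{"objectID": 62, "slug": "para-cyclisme"}],
        "birthPlace": None,
    },
    "geometry": {
        "coordinates": [4.836284906729304, 45.7710938512817],
        "type": "Point",
    },
}

flat = flatten_properties(feature)
print(flat)
```

A flat dictionary like this is easier to write to CSV or to load into a table than the nested original.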

If you want entries per sport, use URLs such as https://data.equipedefrance.com/api/sport/67 instead, after listing the sport identifiers via https://data.equipedefrance.com/api/init
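A minimal sketch of building and fetching such per-sport URLs follows. The helper names `sport_url` and `fetch_json` are hypothetical, and so is the short user-agent header (the script sends a browser-like one; whether the API requires it is not established here). The structure of the JSON returned by either endpoint is not described in this document, so inspect the /api/init response yourself to find where the sport identifiers live.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "https://data.equipedefrance.com/api"


def sport_url(sport_id):
    """Build the per-sport URL for one sport identifier."""
    return f"{BASE_URL}/sport/{sport_id}"


def fetch_json(url):
    """Download a URL and decode its JSON body (performs a network call)."""
    request = Request(url, headers={"user-agent": "Mozilla/5.0"})
    with urlopen(request) as response:
        return json.load(response)


print(sport_url(67))

# Network calls are left commented out on purpose:
# init = fetch_json(f"{BASE_URL}/init")
# sport = fetch_json(sport_url(67))
```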

There is also an entry per region, but it does not let you retrieve everyone: not all athletes of the French team were born on French soil, and there is no case that returns all those born abroad. The https://data.equipedefrance.com/api/init entry has no such problem.

{"Nemours (77)": {"coordinates": [2.6953079, 48.268026], "type": "Point"}, "Alger (Alg\u00e9rie)": {"coordinates": [3.029126556356098, 36.70005095], "type": "Point"}, "Le Lamentin": {"coordinates": [-61.0018145, 14.614557], "type": "Point"}, "Lun\u00e9ville (54)": {"coordinates": [6.4919563, 48.5916164], "type": "Point"}, "Bissau (Guin\u00e9e-Bissau)": {"coordinates": [-15.607704564772728, 11.87018055], "type": "Point"}, "Nancy (Lorraine)": {"coordinates": [6.1834097, 48.6937223], "type": "Point"}, "Paris": {"coordinates": [2.3200410217200766, 48.8588897], "type": "Point"}, "Montb\u00e9liard (25)": {"coordinates": [6.7977564, 47.5102368], "type": "Point"}, "Melun (Seine-et-Marne)": {"coordinates": [2.6608169, 48.539927], "type": "Point"}, "Ivry-sur-Seine (94)": {"coordinates": [2.3872525, 48.8122302], "type": "Point"}, "Valenciennes (Nord)": {"coordinates": [3.5234846, 50.3579317], "type": "Point"}, "Muret (Haute-Garonne)": {"coordinates": [1.3262332, 43.4599858], "type": "Point"}, "Entraigues-sur-la-Sorgue (84)": {"coordinates": [4.9272616, 44.0023187], "type": "Point"}, "Neuilly-sur-Seine (Hauts-de-Seine)": {"coordinates": [2.2695658, 48.884683], "type": "Point"}, "Le Port (La R\u00e9union)": {"coordinates": [55.2916763, -20.9354584], "type": "Point"}, "Nantes (44)": {"coordinates": [-1.5541362, 47.2186371], "type": "Point"}, "Clermont-Ferrand (63)": {"coordinates": [3.0819427, 45.7774551], "type": "Point"}, "Cambrai": {"coordinates": [3.2346145, 50.1757546], "type": "Point"}, "Lom\u00e9 (Togo)": {"coordinates": [1.215829, 6.130419], "type": "Point"}, "Corbeil-Essonnes": {"coordinates": [2.4818087, 48.6137734], "type": "Point"}, "Echirolles (38)": {"coordinates": [5.718687, 45.1481694], "type": "Point"}, "Valenciennes (59)": {"coordinates": [3.5234846, 50.3579317], "type": "Point"}, "Martigues (Bouches-du-Rh\u00f4ne)": {"coordinates": [5.0548176, 43.4057279], "type": "Point"}, "Angers (Maine-et-Loire)": {"coordinates": [-0.5515588, 47.4739884], "type": "Point"}, 
"Poitiers (Vienne)": {"coordinates": [0.340196, 46.5802596], "type": "Point"}, "Echillais (17)": {"coordinates": [-0.9524315, 45.8983203], "type": "Point"}, "Le Creusot (Sa\u00f4ne-et-Loire)": {"coordinates": [4.4285961, 46.8054064], "type": "Point"}, "Poitiers (86)": {"coordinates": [0.340196, 46.5802596], "type": "Point"}, "Lyon (Rhone)": {"coordinates": [4.8320114, 45.7578137], "type": "Point"}}

import requests
import logging
import json
import os
from bs4 import BeautifulSoup
from time import sleep

'''
# Optional HTTP debugging, disabled because it sits inside this string literal.
# You will see the REQUEST, including HEADERS and DATA, and RESPONSE with HEADERS but without DATA.
# The only thing missing will be the response.body which is not logged.
try:
    import http.client as http_client
except ImportError:
    # Python 2
    import httplib as http_client
http_client.HTTPConnection.debuglevel = 1

# You must initialize logging, otherwise you'll not see debug output.
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True
'''

# Keys copied from each API hit into the GeoJSON properties.
attributes = [
    'gender',
    'firstname',
    'lastname',
    'slug',
    'type',
    'birthdate',
    'pictureUrl',
    'isMedalist',
    'olympicMedals',
    'olympicGames',
    'disciplines',
    'objectID'
]

# Browser-like headers sent to the API and to Nominatim.
headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
}
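

def get_birth_place(slug):
    """Scrape the birthplace text from an athlete's HTML page."""
    response = requests.get(f'https://www.equipedefrance.com/athlete/{slug}')
    html_doc = response.text
    soup = BeautifulSoup(html_doc, 'html.parser')
    # find_all replaces the deprecated findAll alias.
    identity_infos = soup.find_all('p', {"class": "identityFeature-3-E2q"})
    # The second identity paragraph holds the birthplace; this raises
    # IndexError when the page is unavailable or lacks that information.
    return identity_infos[1].text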
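

# Geocoding results are cached on disk, keyed by birthplace name.
filename_cache = 'cache.json'
if os.path.exists(filename_cache):
    with open(filename_cache) as jsonfile:
        cache = json.load(jsonfile)
else:
    cache = {}

# [index, slug, birthplace name] for every athlete that had to be geocoded.
idx_slug_birthplace = []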


def nominatim_city_from_freeform(place_name):
    """Geocode a free-form place name with Nominatim, using the disk cache."""
    if place_name not in cache:
        response_nominatim = requests.get(
            'https://nominatim.openstreetmap.org/search.php',
            params={"polygon_geojson": 0, "format": "jsonv2", "q": place_name},
            headers=headers
        )
        # import ipdb;ipdb.set_trace()
        content = response_nominatim.json()
        # Keep only administrative boundaries.
        content = [i for i in content if i.get('type') == "administrative"]
        if len(content) > 0:
            result = content[0]
            print(place_name, result)
            geometry = {
                "coordinates": [float(result.get('lon')), float(result.get('lat'))],
                "type": "Point"
            }
            cache[place_name] = geometry
            # Persist the cache after every new result.
            with open(filename_cache, 'w') as jsonfile:
                json.dump(cache, jsonfile)
            return geometry
        # No administrative result: nothing is cached.
        return None
    else:
        return cache[place_name]


# birthplace = get_birth_place('ameline-douarre')
# geometry = nominatim_city_from_freeform(birthplace)


# Birthplaces set by hand: pages that were unreachable (500 errors) or lacked
# the birthplace, and names adjusted so the geocoder returns the expected place.
birthplace_overrides = {
    'anais-rigal': 'Paris',
    'julie-ligner': 'Cambrai',
    'estelle-marsa-galant': 'Corbeil-Essonnes',
    'thomas-peyroton-dartet': 'Muret (Haute-Garonne)',
    'wissam-amazigh-yebba': 'Poitiers (86)',
    'jean-christophe-rambeau': 'Lyon (Rhone)',
}


def get_all_athletes():
    """Fetch every athlete and return a list of GeoJSON features."""
    response = requests.get('https://data.equipedefrance.com/api/init', headers=headers)
    content = response.json()
    hits = content.get('athletes').get('hits')
    # print(hits)
    features = []
    for idx, hit in enumerate(hits):
        geoloc = hit.get('_geoloc')
        if geoloc is not None:
            # The API already provides coordinates: use them as-is.
            geometry = {
                "coordinates": [geoloc.get('lng'), geoloc.get('lat')],
                "type": "Point"
            }
        else:
            slug = hit.get('slug')
            if slug in birthplace_overrides:
                birthplace_name = birthplace_overrides[slug]
            else:
                birthplace_name = get_birth_place(slug)
            idx_slug_birthplace.append([idx, slug, birthplace_name])
            if birthplace_name not in cache:
                # Throttle real Nominatim requests; cached places skip the wait.
                print('sleep 10')
                sleep(10)
            # May be None when Nominatim returns no administrative result.
            geometry = nominatim_city_from_freeform(birthplace_name)
        properties = {k: v for k, v in hit.items() if k in attributes}
        feature = {
            "type": "Feature",
            "properties": properties,
            "geometry": geometry
        }
        features.append(feature)
    return features


myfeatures = get_all_athletes()
# import ipdb;ipdb.set_trace()

# Map each geocoded athlete's slug to its birthplace name (built once).
dict_slug_city = {slug: name for _, slug, name in idx_slug_birthplace}
for feature in myfeatures:
    slug = feature['properties']['slug']
    # birthPlace stays None for athletes that already had coordinates.
    feature['properties']['birthPlace'] = None
    if slug in dict_slug_city:
        # .get() avoids a KeyError when geocoding returned no result.
        feature['properties']['birthPlace'] = cache.get(dict_slug_city[slug])

featureCollection = {
    "type": "FeatureCollection",
    "features": myfeatures
}
with open('jo_2024_athletes.json', 'w') as jsonfile:
    json.dump(featureCollection, jsonfile)
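
The script combines a cache lookup with a pause before each real geocoding request. The sketch below isolates that pattern so it can be tried offline. It is not the script's own code: `geocode_cached`, `fake_geocode`, and the injected `sleep` parameter are hypothetical names, and the fake geocoder stands in for the Nominatim call.

```python
import time


def geocode_cached(place_name, cache, geocode, delay=10, sleep=time.sleep):
    """Return a geometry for place_name, calling geocode only on a cache miss."""
    if place_name in cache:
        return cache[place_name]
    # Pause before every real request to avoid hammering the geocoder.
    sleep(delay)
    geometry = geocode(place_name)
    if geometry is not None:
        # Only successful lookups are cached.
        cache[place_name] = geometry
    return geometry


# Offline demonstration with a fake geocoder and a recorded "sleep".
calls = []
waits = []


def fake_geocode(name):
    calls.append(name)
    if name == "Paris":
        return {"type": "Point", "coordinates": [2.32, 48.86]}
    return None


cache = {}
first = geocode_cached("Paris", cache, fake_geocode, sleep=waits.append)
second = geocode_cached("Paris", cache, fake_geocode, sleep=waits.append)
missing = geocode_cached("Nowhere", cache, fake_geocode, sleep=waits.append)
print(calls, waits)
```

Passing the geocoder and the sleep function as arguments keeps the logic testable without network access or real delays.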
@fbahoken
Another version of this map:
Paris2024_Potentiel de medailles fr