@phillipoertel
Last active February 5, 2025 08:03
import time

item = {
    "images": ["img1", "img2", "img1"]
}

while True:
    print("------ new loop")
    # original scraper code:
    # https://github.com/revamediadk/pyspiders/blob/e1b6a338e6fc8fe04b9d023c709ea42893b27539/python_spiders/pipelines.py#L154
    # item["images"] = list(set(item["images"]))
    images_out = list(set(item["images"]))
    print(images_out)  # deduplicated, but set order is arbitrary and can differ between runs
    images_out = list(dict.fromkeys(item["images"]))
    print(images_out)  # deduplicated, first-seen order preserved (dicts keep insertion order in Python 3.7+)
    time.sleep(0.5)
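
The order-preserving variant above could be wrapped in a small helper for use in a pipeline step. This is only a sketch: the name `dedupe_images` is hypothetical and does not come from the linked scraper code, and it assumes `item` behaves like a dict with an optional `"images"` list.

```python
def dedupe_images(item):
    """Return a shallow copy of item whose "images" list has duplicates removed.

    Hypothetical helper, not part of the original scraper. The first
    occurrence of each image is kept, so the original order survives.
    """
    deduped = dict(item)  # shallow copy so the caller's item is not mutated
    deduped["images"] = list(dict.fromkeys(item.get("images", [])))
    return deduped


item = {"images": ["img1", "img2", "img1"]}
print(dedupe_images(item)["images"])
```

Returning a copy rather than assigning to `item["images"]` in place is a design choice for the sketch. The original pipeline line mutates the item directly, which would work just as well with `dict.fromkeys`.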