Soraya sulaya86

@sulaya86
sulaya86 / downloadFilesFromWeb.py
Last active February 22, 2022 08:16
Download Files from a Website and save them into Folders
"""
The problem:
Downloading files one by one and sorting them into folders by country may not sound tedious, but it can be time-consuming
when you are busy.
The purpose of this script is to:
implement web scraping with the "beautifulsoup" library (to get the links of the files and the country they belong to)
and use "wget" to download the files.
References:
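The scraping step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the HTML snippet, the table layout (country in the first cell, link in the second), the `example.com` URLs, and the helper name `plan_downloads` are all hypothetical and do not come from the gist. The sketch only plans the downloads; the actual download call is left as a comment.

```python
from pathlib import Path

from bs4 import BeautifulSoup

# Hypothetical page fragment: one row per country, with a link to its file.
SAMPLE_HTML = """
<table>
  <tr><td>Peru</td><td><a href="https://example.com/files/peru.pdf">report</a></td></tr>
  <tr><td>Chile</td><td><a href="https://example.com/files/chile.pdf">report</a></td></tr>
</table>
"""


def plan_downloads(html, root="downloads"):
    """Return (url, target_path) pairs, one per linked file, grouped by country folder."""
    soup = BeautifulSoup(html, "html.parser")
    plan = []
    for row in soup.find_all("tr"):
        cells = row.find_all("td")
        link = row.find("a", href=True)
        if not cells or link is None:
            continue  # skip rows without a country cell or a link
        country = cells[0].get_text(strip=True)
        url = link["href"]
        filename = url.rsplit("/", 1)[-1]
        plan.append((url, Path(root) / country / filename))
    return plan


plan = plan_downloads(SAMPLE_HTML)
for url, target in plan:
    print(url, "->", target)
    # To actually download (assumes the third-party "wget" package is installed):
    # target.parent.mkdir(parents=True, exist_ok=True)
    # wget.download(url, out=str(target))
```

Separating the planning step from the download keeps the parsing logic easy to check against a saved copy of the page before any files are fetched.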
@sulaya86
sulaya86 / deduplicate_products.py
Last active August 30, 2021 20:18
Removes extra blank spaces in a column and removes duplicated rows
# The problem: the Marketing team needs to cross-check the list of products available
# on all of the company's websites
# Input: an Excel file that contains "Model" and "Item Path" columns
# Requirements: models are duplicated; remove the duplicated models and keep the Item Path
# The problem: the cells in the "Model" column contain extra spaces, which makes every model look unique
# even when it is not
# This script removes the extra spaces, then removes duplicated rows, keeping the first item found
# author: Soraya Ruiz
# date of creation: 27-08-2021
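The cleanup described in these comments could look something like the sketch below. It assumes pandas is used (the preview does not show which library the script relies on) and it works on a small in-memory DataFrame with made-up rows instead of the real Excel file; only the "Model" and "Item Path" column names come from the description.

```python
import pandas as pd

# Hypothetical rows standing in for the Excel input.
products = pd.DataFrame({
    "Model": ["AB-100 ", " AB-100", "CD  200", "CD 200"],
    "Item Path": ["/site-a/ab-100", "/site-b/ab-100", "/site-a/cd-200", "/site-b/cd-200"],
})

# Trim both ends and collapse runs of whitespace inside each model name.
products["Model"] = products["Model"].str.split().str.join(" ")

# Keep the first row found for each model; its "Item Path" is preserved.
deduplicated = (
    products.drop_duplicates(subset="Model", keep="first")
    .reset_index(drop=True)
)
print(deduplicated)
```

Normalizing the whitespace before calling `drop_duplicates` is the important ordering: without it, `"AB-100 "` and `" AB-100"` would be treated as different models.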
@sulaya86
sulaya86 / centraltendency.py
Last active July 1, 2021 05:38
This Python script opens an SPSS file. SPSS is a software platform that offers advanced statistical analysis. In this exercise, given a dataset in an SPSS file (.sav), we need to find the mean, median, minimum, and maximum values for students and allstudents in the Teaching Ratings data. The purpose is to show basic familiarity with Python to calculate …
"""This Python script opens an SPSS file. SPSS is a software platform that offers
advanced statistical analysis. In this exercise, given a dataset in an SPSS file (.sav),
we need to find the mean, median, minimum, and maximum values for students and allstudents
in the Teaching Ratings data. The purpose is to show basic familiarity with Python
to calculate central tendency.
"""
# Author: Soraya Ruiz
# Creation Date: 2021-07-01
# Import some required libraries
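As a rough illustration of the statistics mentioned in the docstring, the sketch below computes them with Python's standard `statistics` module. The numbers are invented sample values, not the Teaching Ratings data, and reading the `.sav` file is deliberately left out because the preview does not show which library the script uses for that; `summarize` is a hypothetical helper name.

```python
from statistics import mean, median


def summarize(values):
    """Return the mean, median, minimum, and maximum of a sequence of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


# Invented sample values for the two columns named in the description.
ratings = {
    "students": [20, 35, 50, 15],
    "allstudents": [30, 40, 60, 25],
}

summary = {column: summarize(values) for column, values in ratings.items()}
for column, stats in summary.items():
    print(column, stats)
```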
@sulaya86
sulaya86 / groupbyDatasetSaveExcel.py
Created June 18, 2021 08:00
For a given dataset from an Excel file, create one file per group of a desired column; each file is named after the column and today's date
# ~
"""
For a given dataset from an Excel file,
create one file per group of a desired column;
each file is named after the column and today's date
"""
# Author: Soraya Ruiz
# Creation Date: 2021-06-18
import pandas as pd
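One way to read the description is sketched below: group the rows by a chosen column and build one output per distinct value, named after that value plus the date. The sample data, the `split_by_column` helper, the exact file name pattern, and the interpretation of "named as column" as "named after the column value" are all assumptions. The write to disk is left as a comment, since `to_excel` needs an Excel writer such as openpyxl to be installed.

```python
from datetime import date

import pandas as pd


def split_by_column(df, column, today=None):
    """Return {filename: sub-DataFrame}, one entry per distinct value in `column`."""
    stamp = (today or date.today()).isoformat()
    return {
        f"{value}_{stamp}.xlsx": group.reset_index(drop=True)
        for value, group in df.groupby(column)
    }


# Hypothetical sample data standing in for the Excel sheet.
sales = pd.DataFrame({
    "Country": ["Peru", "Chile", "Peru"],
    "Amount": [10, 20, 30],
})

files = split_by_column(sales, "Country", today=date(2021, 6, 18))
for filename, frame in files.items():
    print(filename, len(frame))
    # frame.to_excel(filename, index=False)  # uncomment to write each file
```

Passing `today` explicitly makes the file names reproducible; leaving it out falls back to the current date.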
@sulaya86
sulaya86 / comparetwofilelist.py
Last active May 29, 2019 05:48
The purpose is to compare two lists of files (A and B) and get the files that do not exist in list A, and vice versa. This can be helpful, for example, when we want to make sure files were downloaded or reloaded successfully, to avoid missing data.
import os
from pathlib import PureWindowsPath
def print_diff_files():
    '''
    Compare the content of two files
    '''
    _local_dir = os.path.dirname(os.path.abspath(__file__))
    files_to_extract = PureWindowsPath(_local_dir + '\\' + 'files_to_find.txt')
    downloaded_files = PureWindowsPath(_local_dir + '\\' + 'filelist.txt')
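The preview stops before the comparison itself, so the sketch below shows one plausible way to compare two lists of file names using set differences. It is not the gist's actual implementation: the helper name `diff_file_lists` and the sample names are hypothetical, and in practice each list would come from reading the lines of `files_to_find.txt` and `filelist.txt`.

```python
def diff_file_lists(expected, downloaded):
    """Return (missing, unexpected): names only in `expected`, and names only in `downloaded`."""
    # Strip whitespace and drop blank lines so trailing newlines do not cause false differences.
    expected_set = {name.strip() for name in expected if name.strip()}
    downloaded_set = {name.strip() for name in downloaded if name.strip()}
    missing = sorted(expected_set - downloaded_set)
    unexpected = sorted(downloaded_set - expected_set)
    return missing, unexpected


# Hypothetical contents of the two text files, one file name per line.
files_to_find = ["a.csv\n", "b.csv\n", "c.csv\n"]
file_list = ["b.csv\n", "c.csv\n", "d.csv\n", "\n"]

missing, unexpected = diff_file_lists(files_to_find, file_list)
print("Not downloaded:", missing)
print("Not requested:", unexpected)
```

Sets make each direction of the comparison a single operation and ignore duplicate names; sorting the results keeps the output stable between runs.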