paduel/32ac6f0a47f3fae67e414a73b9779e89 (last active August 27, 2024 09:19)
Code to download historical Yahoo Finance data for the current S&P 500 components.
# Import the necessary modules
import pandas as pd
import yfinance as yf


def get_sp_data(start='2008-01-01', end=None):
    # Get the current S&P 500 components from Wikipedia and build a ticker list.
    # header=0 makes pandas use the first table row as the column names.
    sp_assets = pd.read_html(
        'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies', header=0)[0]
    # Yahoo Finance writes class shares with '-' instead of '.' (BRK.B -> BRK-B);
    # regex=False treats '.' as a literal dot, not a regex wildcard.
    assets = sp_assets['Symbol'].str.replace('.', '-', regex=False).tolist()
    # Download historical data to a multi-index DataFrame.
    # The old as_panel argument is no longer accepted by yfinance, so it is dropped.
    try:
        data = yf.download(assets, start=start, end=end)
        filename = 'sp_components_data.pkl'
        data.to_pickle(filename)
        print('Data saved at {}'.format(filename))
    except ValueError:
        print('Failed download, try again.')
        data = None
    return data


if __name__ == '__main__':
    sp_data = get_sp_data()
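The pickle holds the raw `yf.download` output. The sketch below shows two follow-up steps offline: the Wikipedia-to-Yahoo ticker conversion done inside `get_sp_data`, and pulling closing prices out of the (field, ticker) column layout that a multi-ticker download produces (assuming the default `group_by='column'`). The helper names `to_yahoo_symbols` and `close_prices` are hypothetical, and the DataFrame is made-up data shaped like a download, not real prices.

```python
import pandas as pd


def to_yahoo_symbols(symbols):
    """Convert Wikipedia-style tickers such as 'BRK.B' to Yahoo style ('BRK-B')."""
    # regex=False makes '.' a literal dot rather than a regex wildcard.
    return pd.Series(symbols, dtype=object).str.replace('.', '-', regex=False).tolist()


def close_prices(data):
    """Return a date-by-ticker DataFrame of closing prices."""
    # Multi-ticker downloads (assumed group_by='column') have (field, ticker)
    # column pairs, so selecting the top level 'Close' leaves one column per ticker.
    return data['Close']


print(to_yahoo_symbols(['AAPL', 'BRK.B', 'BF.B']))  # ['AAPL', 'BRK-B', 'BF-B']

# Made-up numbers shaped like a multi-ticker download; not real market data.
columns = pd.MultiIndex.from_product([['Close', 'Volume'], ['AAPL', 'BRK-B']])
fake = pd.DataFrame(
    [[10.0, 20.0, 100, 200], [11.0, 21.0, 110, 210]],
    index=pd.to_datetime(['2024-01-02', '2024-01-03']),
    columns=columns,
)
print(close_prices(fake))
```

With the real file, something like `close_prices(pd.read_pickle('sp_components_data.pkl'))` should give one column of closes per ticker, indexed by date.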
Nice, simple code.
I had some problems with pd.read_html, so I had to make a correction:
sp_assets = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
sp_assets[0].columns = sp_assets[0].iloc[0].values
sp_assets[0].drop(0, axis=0, inplace=True)
assets = sp_assets[0].Symbol.tolist()
Great! To solve that problem, just pass header=0 to pd.read_html.
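The manual workaround from the earlier comment can be wrapped as a small function. This is a sketch: `promote_first_row_to_header` is a hypothetical name, and the input is a made-up frame standing in for a `read_html` result whose header row was parsed as data. Passing `header=0` to `pd.read_html` should make this step unnecessary.

```python
import pandas as pd


def promote_first_row_to_header(df):
    """Use the first row as column names, then drop that row."""
    df = df.copy()
    df.columns = df.iloc[0].values
    # index=0 is passed by keyword; pandas 2.x no longer accepts axis positionally.
    df = df.drop(index=0).reset_index(drop=True)
    return df


# Hypothetical table as read_html might return it when the header row is not
# recognized: integer column labels, with the real headers sitting in row 0.
raw = pd.DataFrame([['Symbol', 'Security'],
                    ['MMM', '3M'],
                    ['BRK.B', 'Berkshire Hathaway']])
table = promote_first_row_to_header(raw)
print(table['Symbol'].tolist())  # ['MMM', 'BRK.B']
```

Unlike the in-place version, this returns a new frame and resets the index so the rows are numbered from 0 again.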
Very useful, thanks!