@paduel
Last active August 27, 2024 09:19
Code to download the SP500 components from Yahoo Finance.
# Import the necessary modules
import pandas as pd
import yfinance as yf


def get_sp_data(start='2008-01-01', end=None):
    """Download price history for the current S&P 500 components."""
    # Get the current S&P 500 components and build a list of tickers.
    # Yahoo Finance uses '-' where Wikipedia uses '.' (e.g. BRK.B -> BRK-B);
    # regex=False treats '.' as a literal dot, not a regex wildcard.
    sp_assets = pd.read_html(
        'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')[0]
    assets = sp_assets['Symbol'].str.replace('.', '-', regex=False).tolist()
    # Download historical data to a multi-index DataFrame
    try:
        data = yf.download(assets, start=start, end=end)
        filename = 'sp_components_data.pkl'
        data.to_pickle(filename)
        print('Data saved at {}'.format(filename))
    except ValueError:
        print('Failed download, try again.')
        data = None
    return data


if __name__ == '__main__':
    sp_data = get_sp_data()
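
Once the data is saved, a common next step is slicing the multi-index DataFrame by price field or by ticker. The sketch below does not call Yahoo Finance; it builds a small hypothetical frame by hand, assuming the columns are a two-level MultiIndex of (price field, ticker). That layout, the tickers, and the values are all illustrative assumptions, so check `data.columns` on a real download before relying on it.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the downloaded frame (not real market data).
# Assumption: column level 0 is the price field, level 1 is the ticker.
dates = pd.date_range('2020-01-01', periods=3, freq='D')
columns = pd.MultiIndex.from_product([['Close', 'Volume'], ['AAPL', 'MSFT']])
data = pd.DataFrame(
    np.arange(12, dtype=float).reshape(3, 4), index=dates, columns=columns)

# All tickers for one field: selecting on the first column level.
closes = data['Close']

# All fields for one ticker: cross-section on the second column level.
aapl = data.xs('AAPL', axis=1, level=1)

print(closes)
print(aapl)
```

With the real file, the same selections should apply after `data = pd.read_pickle('sp_components_data.pkl')`, provided the column layout matches the assumption above.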
@nunodsousa
nunodsousa commented Mar 27, 2019

Nice simple code.

I had some problems with pd.read_html (the header row came back as the first data row), so I had to make a correction:

sp_assets = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
sp_assets[0].columns = sp_assets[0].iloc[0].values
sp_assets[0].drop(0, axis=0, inplace=True)
assets = sp_assets[0].Symbol.tolist()
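
The correction above can be tried offline on a small hand-built frame. This is only a sketch: `raw` is a hypothetical stand-in for a parsed table whose header row ended up as data row 0, and the two companies are sample rows, not the full Wikipedia table. It also applies the '.' to '-' ticker substitution from the gist.

```python
import pandas as pd

# Hypothetical parsed table: integer column labels, header text in row 0.
raw = pd.DataFrame([
    ['Symbol', 'Security'],
    ['MMM', '3M'],
    ['BRK.B', 'Berkshire Hathaway'],
])

# Promote row 0 to column labels, then drop it from the data.
table = raw.copy()
table.columns = table.iloc[0].values
table = table.drop(index=0).reset_index(drop=True)

# Replace the literal dot with a dash; regex=False avoids treating '.' as a wildcard.
assets = table['Symbol'].str.replace('.', '-', regex=False).tolist()
print(assets)
```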

@crdelossantos

Great! To solve that problem, just pass header=0 to pd.read_html.
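
A minimal sketch of that suggestion, using an inline HTML string instead of the Wikipedia page so it runs without network access. The table contents are made-up sample rows, and `pd.read_html` still needs an HTML parser such as lxml installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical table whose header is an ordinary <td> row, so it is not
# detected as a header unless header=0 is passed.
html = """
<table>
  <tr><td>Symbol</td><td>Security</td></tr>
  <tr><td>BRK.B</td><td>Berkshire Hathaway</td></tr>
  <tr><td>MMM</td><td>3M</td></tr>
</table>
"""

# header=0 tells pandas to use the first parsed row as the column labels.
table = pd.read_html(StringIO(html), header=0)[0]
symbols = table['Symbol'].tolist()
print(table)
```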

@mian20110

very useful! Thanks!
