@jayspeidell
Last active July 18, 2023 12:23
Sample script to download Kaggle files
# Info on how to get your API key (kaggle.json) here: https://github.com/Kaggle/kaggle-api#api-credentials
!pip install kaggle
import json
import os
import zipfile

api_token = {"username":"USERNAME","key":"API_KEY"}

# Create the config directory first; open() fails if it does not exist.
os.makedirs('/content/.kaggle', exist_ok=True)
with open('/content/.kaggle/kaggle.json', 'w') as file:
    json.dump(api_token, file)
# Make the key file readable and writable only by its owner.
!chmod 600 /content/.kaggle/kaggle.json
!kaggle config path -p /content
!kaggle competitions download -c jigsaw-toxic-comment-classification-challenge
os.chdir('/content/competitions/jigsaw-toxic-comment-classification-challenge')
# Extract every downloaded archive into the current directory.
for file in os.listdir():
    if file.endswith('.zip'):
        with zipfile.ZipFile(file, 'r') as zip_ref:
            zip_ref.extractall()
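For readers who would rather avoid shell magics, the credential step of the script above can be sketched in plain Python. The helper name write_kaggle_credentials and its config_dir argument are hypothetical and chosen for illustration; only the kaggle.json file name and the username/key fields come from the script.

```python
import json
import os
import stat


def write_kaggle_credentials(config_dir, username, key):
    """Write kaggle.json into config_dir and restrict it to the owner.

    Hypothetical helper for illustration; not part of the kaggle package.
    """
    os.makedirs(config_dir, exist_ok=True)
    path = os.path.join(config_dir, "kaggle.json")
    with open(path, "w") as f:
        json.dump({"username": username, "key": key}, f)
    # Same effect as `chmod 600`: read and write for the owner only.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return path
```

A call such as write_kaggle_credentials('/content/.kaggle', 'USERNAME', 'API_KEY') would then stand in for the json.dump and chmod lines.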
@bmusangu

I got it to work just by breaking the code down into separate cells. For some reason it wouldn't work with the code as one block in a single cell.

@zionverse

I found this gist today because I wanted to download a Kaggle dataset into Google Colab, but I ended up using a different approach. I hope it helps:

!mkdir -p /root/.kaggle
!echo '{"username":"USERNAME","key":"API_KEY"}' > /root/.kaggle/kaggle.json

This way you don't need to change the kaggle.json file directory in the Kaggle settings, and you can now use kaggle to do whatever you need!

!kaggle datasets download -d owner/dataset-slug

This is simple and it works! Thanks, man!

@bmusangu

bmusangu commented Jul 9, 2019

Nice! Cheers!

@Joycechidi

Thanks so much, @bothmena. This clearly works. It's the easiest way to download a Kaggle dataset.

@McGregorWwww

Thanks for sharing! However, is there any way to use a dataset without downloading it? Some of the datasets are quite large, 100GB or more.

@ucalyptus

Does it work with private datasets?

@bothmena

bothmena commented Nov 6, 2019

@ucalyptus It should work if you are authenticated, so you should be prompted to enter your credentials every time you run the command. You can probably overcome this by using an SSH key, but I don't recommend it, especially if you will share the notebook with others.

Note that I did not try my method with private datasets.
Edit: @ucalyptus try using my method without any modifications, I believe it should work, because you're already using an API key for authentication.

@McGregorWwww I do not think it's possible to use Kaggle datasets on Google Colab without downloading them. If you wish to use datasets without downloading them your only option is to use Kaggle kernels.

@mdresaj

mdresaj commented Nov 6, 2019

Once the data is downloaded using bothmena's method, how do you define it and actually begin to use it? I received the message saying the download was successful, but lack the ability to actually see/use the data now.

@bothmena

bothmena commented Nov 7, 2019

@mdresaj Try running the command !ls . in a cell. It will list everything in your current working directory, and there you should see the files that the download command fetched.
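The same check can be sketched in Python, which also shows file sizes so it is easy to tell whether a download completed. The function name list_downloads is hypothetical, and the sketch assumes the files landed in the current working directory.

```python
import os


def list_downloads(directory="."):
    """Return (name, size_in_bytes) for each regular file in directory, sorted by name."""
    entries = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            entries.append((name, os.path.getsize(path)))
    return entries


# Print one line per file found in the current working directory.
for name, size in list_downloads("."):
    print(f"{name}\t{size} bytes")
```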

@XiaohanYa

Thanks for sharing! However, is there any way to use a dataset without downloading it? Some of the datasets are quite large, 100GB or more.

Maybe you could try it on Google Colab. It's quite fast.

@dhosco

dhosco commented Dec 12, 2019

Hi guys!
Can someone actually put all this together? I am a newbie, I like using Colab, and I want to be able to do everything straight from Google Colab. Also, is it possible to avoid downloading the data locally?

@arthurcotaf

I found this gist today because I wanted to download a Kaggle dataset into Google Colab, but I ended up using a different approach. I hope it helps:

!mkdir -p /root/.kaggle
!echo '{"username":"USERNAME","key":"API_KEY"}' > /root/.kaggle/kaggle.json

This way you don't need to change the kaggle.json file directory in the Kaggle settings, and you can now use kaggle to do whatever you need!

!kaggle datasets download -d owner/dataset-slug

Is it possible to save the downloaded file to a variable so it can be used?
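One way to get a downloaded file into a variable, sketched with the standard library only: open the zip archive and parse a CSV member directly, without extracting it to disk. This assumes the archive contains at least one UTF-8 encoded CSV file; the function name read_csv_from_zip and the archive name in the usage note are hypothetical.

```python
import csv
import io
import zipfile


def read_csv_from_zip(zip_path, member=None):
    """Load one CSV file from a zip archive into a list of dicts.

    If member is None, the first entry whose name ends in .csv is used.
    """
    with zipfile.ZipFile(zip_path) as archive:
        if member is None:
            csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
            if not csv_names:
                raise ValueError(f"no CSV file found in {zip_path}")
            member = csv_names[0]
        with archive.open(member) as raw:
            # Wrap the binary stream so csv can read it as text.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))
```

For example, rows = read_csv_from_zip('dataset-slug.zip') would leave the parsed rows in the variable rows, one dict per line of the CSV.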

@ayushxx7

ayushxx7 commented Oct 29, 2020

Setup and Download dataset

Imports

import json
import zipfile
import os
!pip install kaggle
api_token = {"username":"---Your Username","key":"Your API Key"}
!mkdir -p ~/.kaggle
with open('kaggle.json', 'w') as file:
    json.dump(api_token, file)
!cp kaggle.json ~/.kaggle/
!ls ~/.kaggle
!chmod 600 /root/.kaggle/kaggle.json
!kaggle datasets download -d heeraldedhia/groceries-dataset
  • The dataset will now be present in the /content/ folder (you can see it using os.listdir())

Further, to extract the dataset,

for file in os.listdir():
    if file.endswith('.zip'):
        with zipfile.ZipFile(file, 'r') as zip_ref:
            zip_ref.extractall()
  • This will also place the files directly inside the /content/ folder
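If the extracted files should go somewhere other than /content/, the extraction loop can be wrapped in a small function. This is a sketch under assumptions: the name extract_archives and its parameters are hypothetical, and it detects archives with zipfile.is_zipfile rather than by file extension.

```python
import os
import zipfile


def extract_archives(source_dir=".", target_dir=None):
    """Extract every zip archive in source_dir into target_dir.

    Returns the names of the archives that were extracted.
    """
    target_dir = target_dir or source_dir
    os.makedirs(target_dir, exist_ok=True)
    extracted = []
    for name in sorted(os.listdir(source_dir)):
        path = os.path.join(source_dir, name)
        # is_zipfile inspects the file contents, not just the extension.
        if os.path.isfile(path) and zipfile.is_zipfile(path):
            with zipfile.ZipFile(path) as archive:
                archive.extractall(target_dir)
            extracted.append(name)
    return extracted
```

Calling extract_archives('/content', '/content/data') would leave the archives in place and put their contents under /content/data.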

@Sebastian-ctr

Sebastian-ctr commented Mar 18, 2021

Just set the variables...

# Set the environment variables
import os
os.environ['KAGGLE_USERNAME'] = "xxxx"
os.environ['KAGGLE_KEY'] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
!kaggle datasets download -d iarunava/happy-house-dataset

Thanks, it works, but when I participate in a competition:
import os
os.environ['KAGGLE_USERNAME'] = "xxxx"
os.environ['KAGGLE_KEY'] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
!kaggle competition download -c xxxxxxxxxx
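Because a key typed into a cell is visible to anyone the notebook is shared with, the environment-variable approach above can be wrapped so the values are passed in rather than hardcoded. A minimal sketch: the function name set_kaggle_env is hypothetical, while KAGGLE_USERNAME and KAGGLE_KEY are the variable names used in the comment above.

```python
import os


def set_kaggle_env(username, key, environ=os.environ):
    """Set KAGGLE_USERNAME and KAGGLE_KEY, refusing empty values."""
    if not username or not key:
        raise ValueError("both username and key are required")
    environ["KAGGLE_USERNAME"] = username
    environ["KAGGLE_KEY"] = key
```

In a notebook the values could come from prompts, for instance input() for the username and getpass.getpass() for the key, so that neither is saved in the cell source.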
