# Info on how to get your API key (kaggle.json) here: https://github.com/Kaggle/kaggle-api#api-credentials
!pip install kaggle
api_token = {"username":"USERNAME","key":"API_KEY"}
import json
import zipfile
import os
os.makedirs('/content/.kaggle', exist_ok=True)  # make sure the config directory exists
with open('/content/.kaggle/kaggle.json', 'w') as file:
    json.dump(api_token, file)
!chmod 600 /content/.kaggle/kaggle.json
!kaggle config path -p /content
!kaggle competitions download -c jigsaw-toxic-comment-classification-challenge
os.chdir('/content/competitions/jigsaw-toxic-comment-classification-challenge')
for file in os.listdir():
    zip_ref = zipfile.ZipFile(file, 'r')
    zip_ref.extractall()
    zip_ref.close()
does it work with private datasets?
@ucalyptus It should work as long as you are authenticated; you should be prompted to enter your credentials every time you run the command. You could probably get around that with an SSH key, but I don't recommend it, especially if you will share the notebook with others.
Note that I did not try my method with private datasets.
Edit: @ucalyptus try using my method without any modifications; I believe it should work, because you're already using an API key for authentication.
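For private datasets the command side doesn't change; a minimal sketch, assuming kaggle.json is already in place and with the slug below standing in for your own private dataset (hypothetical name):
# Hypothetical slug; replace with your own private dataset.
# This only works for datasets your Kaggle account can access.
!kaggle datasets download -d yourusername/your-private-dataset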
@McGregorWwww I do not think it's possible to use Kaggle datasets on Google Colab without downloading them. If you wish to use datasets without downloading them, your only option is to use Kaggle kernels.
Once the data is downloaded using bothmena's method, how do you actually define it and begin to use it? I received the message saying the download was successful, but I can't actually see or use the data now.
@mdresaj try running this command: !ls .
It will show you everything in your current working directory; there you should see the files that the download command saved.
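Once the files are visible, loading them is just regular pandas; a minimal sketch, assuming the extracted files include a train.csv (adjust the filename to whatever !ls shows):
import pandas as pd

# Hypothetical filename; use whatever !ls shows after extraction.
train = pd.read_csv('train.csv')
print(train.shape)
train.head()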
Thanks for sharing! However, I wonder if there is any way to use the dataset without downloading it, because some of the datasets are quite large, like 100GB+.
Maybe you could try it on Google Colab. It's quite fast.
### Hi guys!
Can someone actually put all this together? I am a newbie and I like using Colab, and I want to be able to do everything straight from Google Colab. Also, is there a possibility to not download the data locally?
I found this gist today because I wanted to download a Kaggle dataset into Google Colab, but I ended up using a different approach; I hope it helps:
!mkdir /root/.kaggle
!echo '{"username":"USERNAME","key":"API_KEY"}' > /root/.kaggle/kaggle.json
This way you don't need to change the kaggle.json path in the Kaggle config, and now you can use kaggle to do whatever you need!
!kaggle datasets download -d owner/dataset-slug
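One follow-up to this approach: the CLI saves a zip archive in the current directory, so it still needs to be extracted; a minimal sketch, assuming the archive is named dataset-slug.zip (check with !ls):
import zipfile

# The archive name is an assumption; confirm it with !ls first.
with zipfile.ZipFile('dataset-slug.zip', 'r') as zip_ref:
    zip_ref.extractall()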
This is simple and working !! Thanx man !
Is it possible to save the downloaded file so it can be used in a variable?
Setup and download the dataset
Imports:
import json
import zipfile
import os
!pip install kaggle
api_token = {"username":"Your Username","key":"Your API Key"}
!mkdir -p ~/.kaggle
# Write the token to kaggle.json and copy it where the Kaggle CLI expects it
with open('kaggle.json', 'w') as file:
    json.dump(api_token, file)
!cp kaggle.json ~/.kaggle/
!ls ~/.kaggle
!chmod 600 /root/.kaggle/kaggle.json
!kaggle datasets download -d heeraldedhia/groceries-dataset
- The dataset will now be present in the /content/ folder (you can see it using os.listdir()).
Further, to extract the dataset:
for file in os.listdir():
    if '.zip' in file:
        zip_ref = zipfile.ZipFile(file, 'r')
        zip_ref.extractall()
        zip_ref.close()
- This will also place the extracted files directly inside the /content/ folder.
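From there, reading the data is straightforward; a minimal sketch, assuming the groceries dataset extracted to a single CSV (the exact filename may differ, so check os.listdir()):
import pandas as pd

# Assumed filename; verify the exact name with os.listdir().
df = pd.read_csv('/content/Groceries_dataset.csv')
df.head()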
Just set the environment variables...
# Set the environment variables
import os
os.environ['KAGGLE_USERNAME'] = "xxxx"
os.environ['KAGGLE_KEY'] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
!kaggle datasets download -d iarunava/happy-house-dataset
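If the installed version of the kaggle CLI supports it, the --unzip flag extracts the archive in the same step, which skips the separate zipfile loop:
# --unzip (if supported by your kaggle CLI version) extracts the download
# into the current directory automatically.
!kaggle datasets download -d iarunava/happy-house-dataset --unzip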
Thanks, it works! But what about when I participate in a competition:
import os
os.environ['KAGGLE_USERNAME'] = "xxxx"
os.environ['KAGGLE_KEY'] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
!kaggle competitions download -c xxxxxxxxxx