Amir22010/convert_voc_to_yolo.md

Forked from myounus96/convert_voc_to_yolo.md

Created February 22, 2020 22:01

convert pascal voc dataset to yolo format

Convert PascalVOC Annotations to YOLO

This script reads PascalVOC XML files and converts them to YOLO txt files.

Note: This script was written and tested on Ubuntu. Results may vary on other operating systems.

Disclaimer: This code is a modified version of Joseph Redmon's voc_label.py

Instructions:

Place the convert_voc_to_yolo.py file into your data folder.
Edit the dirs array (line 8) to contain the folders where your images and XML files are located. Note: this script assumes all of your images have the .jpg extension (line 13).
Edit the classes array (line 9) to contain all of your classes.
Run the script. Afterwards, each of the given directories will contain a 'yolo' folder holding all of the YOLO txt files. For each given directory, a text file listing all of its image paths will also be created in the current working directory.

Make sure to put the images and XML files directly in the root of each folder (for example, train). A screenshot of the layout is in the comments; in my case the dataset folder is named VOCData, and the yolo folder is generated by the script.

convert_voc_to_yolo.py:

import glob
import os
import pickle
import xml.etree.ElementTree as ET
from os import listdir, getcwd
from os.path import join

dirs = ['train', 'val']
classes = ['person', 'car']

def getImagesInDir(dir_path):
    image_list = []
    for filename in glob.glob(dir_path + '/*.jpg'):
        image_list.append(filename)

    return image_list

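def convert(size, box):
    # size is (width, height) in pixels; box is (xmin, xmax, ymin, ymax).
    dw = 1./(size[0])
    dh = 1./(size[1])
    # Box centre; the "- 1" shifts it by one pixel before normalising.
    x = (box[0] + box[1])/2.0 - 1
    y = (box[2] + box[3])/2.0 - 1
    # Box width and height in pixels.
    w = box[1] - box[0]
    h = box[3] - box[2]
    # Scale everything by the image width and height.
    x = x*dw
    w = w*dw
    y = y*dh
    h = h*dh
    return (x,y,w,h)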

def convert_annotation(dir_path, output_path, image_path):
    basename = os.path.basename(image_path)
    basename_no_ext = os.path.splitext(basename)[0]

    # Parse the PascalVOC XML that shares the image's base name.
    tree = ET.parse(dir_path + '/' + basename_no_ext + '.xml')
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)

    # The context manager closes the label file even if an object fails to parse.
    with open(output_path + basename_no_ext + '.txt', 'w') as out_file:
        for obj in root.iter('object'):
            # Some annotation tools omit <difficult>; treat a missing tag as 0.
            difficult = obj.findtext('difficult', default='0')
            cls = obj.find('name').text
            # Skip unlisted classes and objects flagged as difficult.
            if cls not in classes or int(difficult) == 1:
                continue
            cls_id = classes.index(cls)
            xmlbox = obj.find('bndbox')
            # Box order expected by convert(): (xmin, xmax, ymin, ymax).
            b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
            bb = convert((w,h), b)
            out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')

cwd = getcwd()

for dir_path in dirs:
    full_dir_path = cwd + '/' + dir_path
    output_path = full_dir_path +'/yolo/'

    if not os.path.exists(output_path):
        os.makedirs(output_path)

    image_paths = getImagesInDir(full_dir_path)
    list_file = open(full_dir_path + '.txt', 'w')

    for image_path in image_paths:
        list_file.write(image_path + '\n')
        convert_annotation(full_dir_path, output_path, image_path)
    list_file.close()

    print("Finished processing: " + dir_path)
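As a rough sanity check of the arithmetic in convert, the sketch below repeats the function and applies it to one made-up box. The 640x480 image size, the box coordinates, and the class id 0 are assumptions chosen for illustration, not values from any dataset.

```python
def convert(size, box):
    """Same arithmetic as the script: size=(width, height), box=(xmin, xmax, ymin, ymax)."""
    dw = 1.0 / size[0]
    dh = 1.0 / size[1]
    x = (box[0] + box[1]) / 2.0 - 1  # box centre, shifted by one pixel as in the script
    y = (box[2] + box[3]) / 2.0 - 1
    w = box[1] - box[0]              # box width in pixels
    h = box[3] - box[2]              # box height in pixels
    return (x * dw, y * dh, w * dw, h * dh)


# Hypothetical 640x480 image with a box spanning x 100..300 and y 50..250.
example_box = convert((640, 480), (100.0, 300.0, 50.0, 250.0))

# A YOLO label line has the form "<class_id> <x_center> <y_center> <width> <height>".
example_line = "0 " + " ".join(str(v) for v in example_box)
print(example_line)
```

All four values should fall between 0 and 1; a value outside that range usually points to a wrong image size or swapped box coordinates in the XML.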

harinduravin commented Oct 29, 2021

Useful and quick

Hedenir commented Nov 4, 2021 (edited)

Thank you!

Bellahmer-hacene commented Nov 5, 2021 (edited)

Thanks, really helpful!
You should add some code that writes a "classes.txt" file into the "yolo" folder, so that the labeled images can be opened in labelImg, for example.

with open(full_dir_path + "/yolo/classes.txt", "w") as file:
    file.write("\n".join(classes) + "\n")

Here classes is the list of class names that you used to label your images; they are written one per line.

samida22 commented Nov 15, 2021

@Yogiwolf did you solve it? I am getting the same error. Thanks

Shubhs0411 commented Nov 25, 2021

This code is not working

Hedenir commented Nov 25, 2021 via email

Worked for me. I ran code.py in the same folder as the coordinate files. Converted successfully.

Shubhs0411 commented Nov 25, 2021 via email

I made some changes and it is running now.

Hedenir commented Nov 25, 2021 via email

Congratulations!

iaverypadberg commented Feb 4, 2022

Your script is FIRE

c-pineau commented Feb 24, 2022

Hi!
Thank you very much for this script.

I had to change a few things, so I'll list them below in case anyone has the same issues.

My files were JPEGs, but the extension match in the script is case sensitive, so check your files' extension. I went from .jpg to .JPG:

for filename in glob.glob(dir_path + '/*.JPG'):

(It's on line 13)

I had some images that weren't labelled (because I couldn't find anything in them), but I still need to keep them in my folder, so I added an if statement to make sure the script skips my unlabelled images. My getImagesInDir function looks like this now:

def getImagesInDir(dir_path):
    image_list = []
    for filename in glob.glob(dir_path + '/*.JPG'):
        xml_filename = os.path.splitext(os.path.basename(filename))[0] + '.xml'
        if (os.path.exists(dir_path + '/' + xml_filename)):
            image_list.append(filename)

    return image_list
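The two changes in this comment (matching the extension regardless of case, and skipping images without an XML file) could also be combined into one helper. This is a minimal sketch, not the gist author's code: the name get_images_in_dir and the extensions parameter are hypothetical, and it lists the folder with os.listdir instead of glob.

```python
import os


def get_images_in_dir(dir_path, extensions=(".jpg", ".jpeg")):
    """Return image paths in dir_path that have a matching .xml file.

    The extension check is case-insensitive, so .jpg, .JPG and .Jpeg all match.
    """
    image_list = []
    for name in sorted(os.listdir(dir_path)):
        stem, ext = os.path.splitext(name)
        if ext.lower() not in extensions:
            continue
        # Skip images that were never annotated.
        if os.path.exists(os.path.join(dir_path, stem + ".xml")):
            image_list.append(os.path.join(dir_path, name))
    return image_list
```

Because the result is sorted, the image list file written by the main loop would also come out in a stable order between runs.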

mhamza19 commented Mar 29, 2022

Please help. I have uploaded my entire dataset to google drive and mounted it to Google Colab. I am trying to run this script on Colab. My dataset directory is as follows: There are two separate folders for Images and Annotations. Both of these folders reside in a folder named MyDataset. I am compiling this script in MyDataset folder but I am getting an empty yolo folder and .txt file. Also, I don't have train and validation folders of my dataset yet. Any help would be highly appreciated!
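One possible explanation for the empty output is that the script only looks for .jpg files directly inside the folders named in dirs and expects each XML file next to its image. For a layout like the one described here, with separate image and annotation folders, the pairing step might be sketched as follows. The function name pair_images_with_annotations is hypothetical, and the folder names passed to it are whatever your dataset uses.

```python
import os


def pair_images_with_annotations(images_dir, annotations_dir, extensions=(".jpg", ".jpeg")):
    """Return (image_path, xml_path) pairs when images and XML files live in different folders."""
    pairs = []
    for name in sorted(os.listdir(images_dir)):
        stem, ext = os.path.splitext(name)
        if ext.lower() not in extensions:
            continue
        # The annotation is expected to share the image's base name.
        xml_path = os.path.join(annotations_dir, stem + ".xml")
        if os.path.isfile(xml_path):
            pairs.append((os.path.join(images_dir, name), xml_path))
    return pairs
```

Each xml_path could then be handed to ET.parse in place of the path that convert_annotation builds from dir_path. Alternatively, copying the XML files next to the images and listing that single folder in dirs should work without code changes.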

NguyenDuyHung99 commented Jul 22, 2022

Works well. Thanks.

imadgohar commented Sep 1, 2022

@c-pineau Thanks for your post. I had the same issue, but I am not getting the output YOLO file. I am wondering about the directory structure. I am working with Colab. Also, my data has no train/val split, just like @mhamza19 commented above. What should I do in this situation? Your guidance will be appreciated.

mhamza19 commented Sep 1, 2022

@imadgohar What do you mean by YOLO file? Also, I would suggest you use Roboflow, as it will save you a lot of time by automating all this data-related stuff in a few clicks.

imadgohar commented Oct 6, 2022

I am done with it thanks.

ipiyushvaghela commented Nov 8, 2022 (edited)

import glob
import os
import pickle
import xml.etree.ElementTree as ET
from os import listdir, getcwd
from os.path import join

dirs = ['train', 'test']
classes = ['apple', 'banana', 'orange']


def getImagesInDir(dir_path):
    image_list = []
    for filename in glob.glob(dir_path + '\\*.jpg'):
        image_list.append(filename)

    return image_list

def convert(size, box):
    # Guard against annotations that report a zero width or height.
    if size[0] == 0:
        dw = 1./(size[0]+0.00001)
    else:
        dw = 1./(size[0])

    if size[1] == 0:
        dh = 1./(size[1]+0.00001)
    else:
        dh = 1./(size[1])

    x = (box[0] + box[1])/2.0 - 1
    y = (box[2] + box[3])/2.0 - 1
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x*dw
    w = w*dw
    y = y*dh
    h = h*dh
    return (x,y,w,h)

def convert_annotation(dir_path, output_path, image_path):
    basename = os.path.basename(image_path)
    basename_no_ext = os.path.splitext(basename)[0]

    in_file = open(dir_path + '\\' + basename_no_ext + '.xml')
    out_file = open(output_path + basename_no_ext + '.txt', 'w')
    tree = ET.parse(in_file)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)

    for obj in root.iter('object'):
        difficult = obj.find('difficult').text
        cls = obj.find('name').text
        if cls not in classes or int(difficult)==1:
            continue
        cls_id = classes.index(cls)
        xmlbox = obj.find('bndbox')
        b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
        bb = convert((w,h), b)
        out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')

cwd = getcwd()
for dir_path in dirs:
    full_dir_path = cwd + '\\' + dir_path
    output_path = full_dir_path +'\\yolo\\'

    if not os.path.exists(output_path):
        print(output_path)
        os.makedirs(output_path)

    image_paths = getImagesInDir(full_dir_path)
    list_file = open(full_dir_path + '.txt', 'w')

    for image_path in image_paths:
        print(image_path)
        list_file.write(image_path + '\n')
        convert_annotation(full_dir_path, output_path, image_path)
    list_file.close()

    print("Finished processing: " + dir_path)
print('github.com/ipiyushvaghela')

For Windows users...

Suppose the DATASET folder contains the train and test folders.
Then create a .py file inside the DATASET folder, paste the above code into it, and run it. Your .txt files will be created inside train --> yolo.
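Instead of keeping one copy of the script with '/' and another with '\\', the paths could be built with os.path.join so the same file runs on Linux and Windows. The following is only a sketch: build_paths and jpg_pattern are hypothetical helper names, not part of the original script.

```python
import os


def build_paths(base_dir, dir_name):
    """Return (full_dir_path, output_path, list_file_path) using the OS path separator."""
    full_dir_path = os.path.join(base_dir, dir_name)
    output_path = os.path.join(full_dir_path, "yolo")
    # The image list file sits next to the folder, e.g. DATASET/train.txt.
    list_file_path = full_dir_path + ".txt"
    return full_dir_path, output_path, list_file_path


def jpg_pattern(dir_path):
    """Glob pattern for the .jpg files directly inside dir_path."""
    return os.path.join(dir_path, "*.jpg")
```

Note that output_path has no trailing separator here, so the label file name would need to be joined with os.path.join(output_path, basename_no_ext + '.txt') rather than concatenated as in the script.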

BayanFatayer commented Nov 27, 2022

it didn't work for me

abdollah-semej commented Aug 8, 2023 (edited)

Very nice.
It worked for me.
Thanks, bro.

carlosgomez1987 commented Oct 12, 2023

Very nice post!!! It worked well for me.

DungHD-1997 commented Oct 13, 2023

Very nice, thank you for sharing!

onur-unsoy commented Nov 16, 2023

It didn't work for me. I got this error:
runfile('D:/bit dataset file/BITVehicle_Dataset/from_xml_to_yolo_convert_label_script.py', wdir='D:/bit dataset file/BITVehicle_Dataset')
Traceback (most recent call last):

File ~.conda\envs\thesis_two\lib\site-packages\spyder_kernels\py3compat.py:356 in compat_exec
exec(code, globals, locals)

File d:\bit dataset file\bitvehicle_dataset\from_xml_to_yolo_convert_label_script.py:75
convert_annotation(full_dir_path, output_path, image_path)

File d:\bit dataset file\bitvehicle_dataset\from_xml_to_yolo_convert_label_script.py:47 in convert_annotation
w = int(size.find('width').text)

ValueError: invalid literal for int() with base 10: '[[1920]]'

This is because the size fields in my annotation files look like this:

[[1920]]
[[1080]]
3
How can I change this code? Thank you.
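One way to cope with size values such as '[[1920]]' is to pull the digits out before converting to int. This is a hedged sketch under the assumption that each size field contains exactly one integer wrapped in extra characters; parse_dimension is a hypothetical helper, not part of the original script.

```python
import re


def parse_dimension(text):
    """Extract the first integer from strings such as '1920' or '[[1920]]'."""
    match = re.search(r"\d+", text)
    if match is None:
        raise ValueError("no integer found in size field: %r" % text)
    return int(match.group())
```

In convert_annotation, the two int(...) calls would then become w = parse_dimension(size.find('width').text) and h = parse_dimension(size.find('height').text).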

DungHD-1997 commented Nov 17, 2023

This code worked for me, but I used another script for convenience:
https://github.com/Ryo-Kawanami/xml2yolo/tree/master

Wuito commented Feb 20, 2024

The author's code works. In the VOC dataset, the XML annotation files and JPG images are stored separately. You need to place your XML annotation files and JPG images in the same folder and then run the program to get the converted output. Make sure the file layout looks like this:
├── convert_voc_to_yolo.py
├── train
│   ├── xxx0.xml
│   └── xxx0.jpg
└── test
    ├── xxx1.xml
    └── xxx1.jpg
