@carceneaux
Last active July 17, 2024 09:39
Script for removing GitLab Job Artifacts.
#!/bin/bash
#
# Written by Chris Arceneaux
# GitHub: https://github.com/carceneaux
# Email: [email protected]
# Website: http://arsano.ninja
#
# Note: This code is a stop-gap to erase Job Artifacts for a project. I HIGHLY recommend you leverage
# "artifacts:expire_in" in your .gitlab-ci.yml
#
# https://docs.gitlab.com/ee/ci/yaml/#artifactsexpire_in
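#
# For example, a minimal .gitlab-ci.yml sketch (the job name and artifact
# path here are hypothetical):
#
#   build:
#     artifacts:
#       paths:
#         - dist/
#       expire_in: 1 week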
#
# Software Requirements: curl, jq
#
# This code has been released under the terms of the Apache-2.0 license
# http://opensource.org/licenses/Apache-2.0
# project_id, find it here: https://gitlab.com/[organization name]/[repository name] at the top underneath repository name
project_id="207"
# token, find it here: https://gitlab.com/profile/personal_access_tokens
token="9hjGYpwmsMfBxT-Ghuu7"
server="gitlab.com"
# Retrieving Jobs list page count
total_pages=$(curl -sD - -o /dev/null -X GET \
    "https://$server/api/v4/projects/$project_id/jobs?per_page=100" \
    -H "PRIVATE-TOKEN: ${token}" | grep -Fi X-Total-Pages | sed 's/[^0-9]*//g')
# Creating list of Job IDs for the Project specified with Artifacts
job_ids=()
echo ""
echo "Creating list of all Jobs that currently have Artifacts..."
echo "Total Pages: ${total_pages}"
for ((i=2;i<=${total_pages};i++)) #starting with page 2 skipping most recent 100 Jobs
do
    echo "Processing Page: ${i}/${total_pages}"
    response=$(curl -s -X GET \
        "https://$server/api/v4/projects/$project_id/jobs?per_page=100&page=${i}" \
        -H "PRIVATE-TOKEN: ${token}")
    length=$(echo $response | jq '. | length')
    for ((j=0;j<${length};j++))
    do
        # Numeric comparison; [[ ... > 0 ]] would compare strings lexicographically
        if [[ $(echo $response | jq ".[${j}].artifacts_file | length") -gt 0 ]]; then
            echo "Job found: $(echo $response | jq ".[${j}].id")"
            job_ids+=($(echo $response | jq ".[${j}].id"))
        fi
    done
done
# Loop through each Job erasing the Artifact(s)
echo ""
echo "${#job_ids[@]} Jobs found. Commencing removal of Artifacts..."
for job_id in "${job_ids[@]}"
do
    response=$(curl -s -X DELETE \
        -H "PRIVATE-TOKEN: ${token}" \
        "https://$server/api/v4/projects/$project_id/jobs/$job_id/artifacts")
    echo "Processing Job ID: ${job_id} - Status: $(echo $response | jq '.status')"
done
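
To run the script, fill in project_id, token, and server above, then (the filename here is hypothetical):

chmod +x gitlab-remove-artifacts.sh
./gitlab-remove-artifacts.sh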
@YoungPyDawan

YoungPyDawan commented Sep 30, 2019

@carceneaux

Thanks @YoungPyDawan! I've modified the API call so that only the artifacts are deleted. Good to see that API call was added. 😄

@Kage-Yami

FYI... I came across this today in my search for an easy way to delete all artifacts for a project; unfortunately, it won't work in all cases (like mine) due to X-Total-Pages being omitted when the item count is greater than 10,000.

@carceneaux

@Kage-Yami - Thanks for the heads up! I'll work on an updated version of the code. The fix is to not worry about X-Total-Pages and simply check for the X-Next-Page header and key off of it instead.

As I'm pretty busy right now, it'll take a week or two for me to get to this. If you get the code sorted before then, please share. 😄

Here's the link mentioning the new logic to be used if you're interested:

https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/23931/diffs#34fe105b9f0ef77edad95de0c13084ff7f54c344_260_298
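
In the meantime, something along these lines should work (a rough, untested sketch keying off X-Next-Page, which GitLab sends with an empty value on the last page):

page=2  # start at 2 to skip the most recent 100 Jobs, as in the script above
while [ -n "$page" ]; do
    headers=$(mktemp)
    response=$(curl -s -D "$headers" -X GET \
        "https://$server/api/v4/projects/$project_id/jobs?per_page=100&page=${page}" \
        -H "PRIVATE-TOKEN: ${token}")
    # ... collect job IDs from $response with jq, as in the script above ...
    page=$(grep -Fi x-next-page "$headers" | sed 's/[^0-9]*//g')
    rm -f "$headers"
done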

@Kage-Yami

I ended up writing my own version that accepts the number of pages as an argument (with project and token also being arguments); I manually determined the page count by trial-and-error beforehand. So not great, but it got the job done.

I could probably adapt it to loop endlessly and simply exit once X-Next-Page either vanishes or equals the current page (haven't looked into what GitLab sends on the last page)... But I don't really need the script anymore, so probably won't bother.

Though as a bonus, mine is parallelised a bit; I was lucky and didn't need to worry about rate-limiting as I was only averaging around 300 calls a minute (out of the maximum of 600).
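
For instance, the DELETE calls can be fanned out with xargs (a rough sketch, assuming the collected job IDs are in job_ids.txt, one per line):

xargs -P 4 -I {} curl -s -X DELETE \
    -H "PRIVATE-TOKEN: ${token}" \
    "https://$server/api/v4/projects/$project_id/jobs/{}/artifacts" < job_ids.txt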

@Atarity

Atarity commented Apr 28, 2020

For some reason it did not remove artifacts from the 1st page of my pipelines list. The .status attribute was also missing in the console log. The rest is as advertised, thanks!

@philipptempel

philipptempel commented Jul 21, 2020

For some reason it did not remove artifacts from the 1st page of my pipelines list. The .status attribute was also missing in the console log. The rest is as advertised, thanks!

@Atarity Check the source code and you will find the hint #starting with page 2 skipping most recent 100 Jobs; it is intended that the first page of artifacts is not removed.
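
If you do want the first page cleaned as well, the minimal change (untested) is to start the loop at page 1:

for ((i=1;i<=${total_pages};i++))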

@voiski

voiski commented Sep 23, 2020

The response can contain JSON with embedded line breaks (\n). Consider stripping them like ${response//\\n/}:

response=${response//\\n/}
length=$(echo $response | jq '. | length')

Also, you can easily simulate next-page detection by checking [ $length -ne 0 ] and letting the page loop run up to 1000 or more.
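
For instance (a rough, untested sketch combining both suggestions):

for ((page=1;page<=1000;page++)); do
    response=$(curl -s -H "PRIVATE-TOKEN: ${token}" \
        "https://$server/api/v4/projects/$project_id/jobs?per_page=100&page=${page}")
    response=${response//\\n/}
    length=$(echo $response | jq '. | length')
    [ "$length" -ne 0 ] || break
    # ... erase artifacts for the jobs on this page, as in the script above ...
done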

@mitar

mitar commented Jan 7, 2021

I made the following Python script, which works for over 10k jobs, too:

#!/usr/bin/env python3

import time

import requests

project_id = '...'
token = '...'
server = 'gitlab.com'

print("Creating list of all jobs that currently have artifacts...")
# We skip the first page.
url = f"https://{server}/api/v4/projects/{project_id}/jobs?per_page=100&page=2"
while url:
    print(f"Processing page: {url}")
    response = requests.get(
        url,
        headers={
            'private-token': token,
        },
    )

    if response.status_code in [500, 429]:
        print(f"Status {response.status_code}, retrying.")
        time.sleep(10)
        continue

    response.raise_for_status()
    response_json = response.json()
    for job in response_json:
        if job.get('artifacts_file', None):
            job_id = job['id']
            delete_response = requests.delete(
                f"https://{server}/api/v4/projects/{project_id}/jobs/{job_id}/artifacts",
                headers={
                    'private-token': token,
                },
            )
            print(f"Processing job ID: {job_id} - status: {delete_response.status_code}")

    url = response.links.get('next', {}).get('url', None)

@tamasgal

tamasgal commented Nov 5, 2021

The if job.get('artifacts_file', None): needs to be changed to if job.get('artifacts', None): in the current version of the API; at least, I don't see artifacts_file in any of the JSON responses.

@mitar

mitar commented Nov 7, 2021

@tamasgal

tamasgal commented Nov 7, 2021

I don't know why, but none of the jobs on our server had artifacts_file; they had artifacts instead, where the artifacts were listed along with their sizes, etc.

@willstott101

willstott101 commented Dec 9, 2021

"artifacts_file" worked for me, but it's trivial to support both, I also tweaked the output so you can see what job failed if any, and made it start at the first page:

#!/usr/bin/env python3

import time

import requests

project_id = '...'
token = '...'
server = 'gitlab.com'
start_page = 1

print("Creating list of all jobs that currently have artifacts...")
# Start at start_page (1 includes the most recent jobs).
url = f"https://{server}/api/v4/projects/{project_id}/jobs?per_page=100&page={start_page}"
while url:
    print(f"Processing page: {url}")
    response = requests.get(
        url,
        headers={
            'private-token': token,
        },
    )

    if response.status_code in [500, 429]:
        print(f"Status {response.status_code}, retrying.")
        time.sleep(10)
        continue

    response.raise_for_status()
    response_json = response.json()
    for job in response_json:
        if job.get('artifacts_file', None) or job.get('artifacts', None):
            job_id = job['id']
            print(f"Processing job ID: {job_id}", end="")
            delete_response = requests.delete(
                f"https://{server}/api/v4/projects/{project_id}/jobs/{job_id}/artifacts",
                headers={
                    'private-token': token,
                },
            )
            print(f" - status: {delete_response.status_code}")

    url = response.links.get('next', {}).get('url', None)

@kbaran1998

While the script deletes the jobs' artifacts, you can also delete the project's artifacts by adding this code:

url = f"https://{server}/api/v4/projects/{project_id}/artifacts"
delete_response = requests.delete(
    url,
    headers={
        'private-token': token,
    }
)
print(f" - status: {delete_response.status_code}")

@Muffinman

This does not work if your project has more than 10,000 jobs, due to the removal of the X-Total-Pages header from the GitLab API responses.

@cmuller

cmuller commented Jul 7, 2023

Yes, I just found out that the X-Total-Pages header is now omitted for performance reasons. Fortunately, when a page number is too high, an empty JSON list ([]) is returned, so it is quite easy to use a loop such as this (here in bash):

PER_PAGE=100
PAGE=1
while JOBS=$(curl -s --header "PRIVATE-TOKEN: $TOKEN" "$GITLAB_INSTANCE/$PROJECT_ID/jobs?per_page=$PER_PAGE&page=$PAGE&sort=asc") && [ "$JOBS" != "[]" ]
do
   for JOB in $(echo $JOBS | jq .[].id)
   do
      [...]
   done
   PAGE=$((PAGE+1))
done

@mikeller

mikeller commented Dec 7, 2023

Here's my slightly improved version for the 'do it in Python' section (it ignores job.log files, which seem to be non-deletable, and uses command-line arguments to load the settings):

#!/usr/bin/env python3

import time
import requests
import sys

server = sys.argv[1]
project_id = sys.argv[2]
token = sys.argv[3]
start_page = sys.argv[4]

print("Creating list of all jobs that currently have artifacts...")
# Start at the page passed as an argument.
url = f"https://{server}/api/v4/projects/{project_id}/jobs?per_page=100&page={start_page}"
while url:
    print(f"Processing page: {url}")
    response = requests.get(
        url,
        headers={
            'private-token': token,
        },
    )

    if response.status_code in [500, 429]:
        print(f"Status {response.status_code}, retrying.")
        time.sleep(10)
        continue

    response.raise_for_status()
    response_json = response.json()
    for job in response_json:
        artifacts = job.get('artifacts_file', None)
        if not artifacts:
            artifacts = job.get('artifacts', None)

        has_artifacts = False
        for artifact in artifacts:
            if artifact['filename'] != 'job.log':
                has_artifacts = True
                break

        if has_artifacts:
            job_id = job['id']
            print(f"Processing job ID: {job_id}", end="")
            delete_response = requests.delete(
                f"https://{server}/api/v4/projects/{project_id}/jobs/{job_id}/artifacts",
                headers={
                    'private-token': token,
                },
            )
            print(f" - status: {delete_response.status_code}")

    url = response.links.get('next', {}).get('url', None)

@Tim-Schwalbe

(Quoting the 'slightly improved version' script above.)

I get this error:

remove_artifacts.py", line 38, in <module>
    if artifact['filename'] != 'job.log':
       ~~~~~~~~^^^^^^^^^^^^
TypeError: string indices must be integers, not 'str'

@mikeller

@Tim-Schwalbe: Apologies, yes, I overlooked this case. I have amended the script to ignore artifacts_file, as this file seems to be contained in artifacts anyway.

I have improved my version a bit: it now automatically selects for deletion expired artifacts that (in my opinion) should be deleted in the first place, because they belong to jobs that were run on:

  • merge requests that have been merged or closed;
  • branches that have been merged.

It will also take a list of project IDs as the final arguments, making it easy to use in a cron job: Usage: {sys.argv[0]} <server> <token> <project id>...

#!/usr/bin/env python3

import time
import requests
import sys
from datetime import datetime, timezone
from dateutil import parser
import re

if len(sys.argv) < 4:
    print(f'Usage: {sys.argv[0]} <server> <token> <project id>...')

    exit(1)

server = sys.argv[1]
token = sys.argv[2]
project_ids = []
for i in range(3, len(sys.argv)):
    project_ids.append(sys.argv[i])


now = datetime.now(timezone.utc)

overall_space_savings = 0
for project_id in project_ids:
    print(f'Processing project {project_id}:')

    merge_request_url = f"https://{server}/api/v4/projects/{project_id}/merge_requests?scope=all&per_page=100&page=1"
    merge_requests = {}
    while merge_request_url:
        response = requests.get(
            merge_request_url,
            headers={
                'private-token': token,
            },
        )

        if response.status_code in [500, 429]:
            print(f"Status {response.status_code}, retrying.")
            time.sleep(10)
            continue

        response.raise_for_status()
        response_json = response.json()

        for merge_request in response_json:
            iid = merge_request.get('iid', None)
            if iid:
                merge_requests[int(iid)] = merge_request['state']

        merge_request_url = response.links.get('next', {}).get('url', None)

    branch_url = f"https://{server}/api/v4/projects/{project_id}/repository/branches?per_page=100&page=1"
    unmerged_branches = []
    while branch_url:
        response = requests.get(
            branch_url,
            headers={
                'private-token': token,
            },
        )

        if response.status_code in [500, 429]:
            print(f"Status {response.status_code}, retrying.")
            time.sleep(10)
            continue

        response.raise_for_status()
        response_json = response.json()

        for branch in response_json:
            is_merged = branch['merged']
            if not is_merged:
                unmerged_branches.append(branch['name'])

        branch_url = response.links.get('next', {}).get('url', None)


    url = f"https://{server}/api/v4/projects/{project_id}/jobs?per_page=100&page=1"

    job_count = 0
    artifact_count = 0
    artifact_size = 0
    deleted_artifact_count = 0
    deleted_artifact_size = 0
    while url:
        response = requests.get(
            url,
            headers={
                'private-token': token,
            },
        )

        if response.status_code in [500, 429]:
            print(f"Status {response.status_code}, retrying.")
            time.sleep(10)
            continue

        response.raise_for_status()
        response_json = response.json()
        for job in response_json:
            job_count += 1

            artifacts = job.get('artifacts', None)
            artifacts_expire_at_string = job.get('artifacts_expire_at', None)
            artifacts_expire_at = None
            if artifacts_expire_at_string:
                artifacts_expire_at = parser.parse(artifacts_expire_at_string)

            has_expired_artifacts = False
            deleted_job_artifact_count = 0
            deleted_job_artifact_size = 0
            if artifacts:
                for artifact in artifacts:
                    if artifact['filename'] != 'job.log':
                        size = artifact['size']

                        artifact_count += 1
                        artifact_size += size

                        if not artifacts_expire_at or artifacts_expire_at < now:
                            has_expired_artifacts = True
                            deleted_job_artifact_count += 1
                            deleted_job_artifact_size += size


            delete_artifacts = False
            if has_expired_artifacts:
                ref = job['ref']
                merge_request_iid_match = re.search(r'refs\/merge-requests\/(\d+)\/head', ref)
                if merge_request_iid_match:
                    merge_request_iid = merge_request_iid_match.group(1)
                    if merge_request_iid:
                        merge_request_status = merge_requests.get(int(merge_request_iid))
                        if merge_request_status in ['merged', 'closed', None]:
                            delete_artifacts = True
                            deleted_artifact_count += deleted_job_artifact_count
                            deleted_artifact_size += deleted_job_artifact_size

                elif ref not in unmerged_branches:
                    delete_artifacts = True
                    deleted_artifact_count += deleted_job_artifact_count
                    deleted_artifact_size += deleted_job_artifact_size

            if delete_artifacts:
                job_id = job['id']
                print(f"Processing job ID: {job_id}", end="")
                delete_response = requests.delete(
                    f"https://{server}/api/v4/projects/{project_id}/jobs/{job_id}/artifacts",
                    headers={
                        'private-token': token,
                    },
                )
                print(f" - status: {delete_response.status_code}\033[K", end = "\r")


        print(f'Processed page {url}.\033[K', end = "\r")

        url = response.links.get('next', {}).get('url', None)

    overall_space_savings += deleted_artifact_size

    print()
    print(f'Jobs analysed: {job_count}')
    print(f'Pre artifact count: {artifact_count}')
    print(f'Pre artifact size [MB]: {artifact_size / (1024 * 1024)}')
    print(f'Post artifact count: {artifact_count - deleted_artifact_count}')
    print(f'Post artifact size [MB]: {(artifact_size - deleted_artifact_size) / (1024 * 1024)}')
    print()

print(f'Overall savings [MB]: {overall_space_savings / (1024 * 1024)}')
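
A hypothetical invocation (the script filename is assumed), e.g. from a cron job:

python3 clean_gitlab_artifacts.py gitlab.example.com <token> 207 208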

@voiski

voiski commented Dec 13, 2023

@mikeller I suggest you put your script in its own gist, or even fork this one and replace it with your Python code =)

Each gist indicates which forks have activity, making it easy to find interesting changes from others.

@mikeller

New version that takes a GitLab group id as a parameter and then cleans up all repositories in the group: https://gist.github.com/mikeller/7034d99bc27c361fc6a2df84e19c36ff
