#!/usr/bin/env bash
##
##  Query a GitLab v4 API endpoint, with pagination
##
##  Author:    Kevin Ernst <ernstki -at- mail.uc.edu>
##  License:   ISC or WTFPL, at your discretion
##  Date:      22 May 2024
##  Requires:  jq (https://github.com/jqlang/jq)
##  Homepage:  https://gist.github.com/ernstki/3707675c8a4ddb06d128154947c49e29
##
ME=$(basename "${BASH_SOURCE[0]}")

_urlencode() (
    # Author: Chris Down (https://gist.github.com/cdown/1163649)
    #         with modifications to support multiple arguments by me
    # License: Unknown
    LC_COLLATE=C
    while (( $# )); do
        for (( i = 0; i < ${#1}; i++ )); do
            c=${1:i:1}
            case $c in
                # unreserved characters pass through unchanged
                [a-zA-Z0-9.~_-]) printf '%s' "$c" ;;
                # everything else is percent-encoded
                *) printf '%%%02X' "'$c" ;;
            esac
        done
        if (( $# > 1 )); then printf '+'; fi  # delimit separate args w/ +'s
        shift
    done
)

_gitlab_api() (
    # set TRACE=1 in the environment to enable execution tracing
    if (( TRACE )); then set -x; fi
    set -u

    : "${GITLAB_URL:?Please define GITLAB_URL as the base URL for your GitLab instance}"
    : "${GITLAB_TOKEN:?Please define GITLAB_TOKEN with your private token for the GitLab API}"

    local api=$GITLAB_URL/api/v4
    local endpoint=/search
    local searchterms=()
    local curlargs=(
        --silent
        --header "Authorization: Bearer $GITLAB_TOKEN"
    )
    local perpage=20
    local queryargs= count= totals= all= pages= wantheader=

    while (( $# )); do
        case $1 in
            -h|--help|--flags|-\?)
                echo "
  $ME - query GitLab v4 API endpoints with pagination

  usage:
    $ME [-h|--help]
    $ME [-c|--count] [-t|--totals] { /endpoint | TERM [TERM…] }
    $ME [-a|--all] [-p|--pages INT] [-pp|--per-page INT]
    ${ME//?/ } { /endpoint | TERM [TERM…] } [&qs_arg1[&qs_arg2…]]

  where:
    -h, --help       shows this help
    -c, --count      just prints the number of results and returns
    -t, --totals     prints HTTP headers for # pages, # per page, total results
    -a, --all        returns all records instead of just the first page
    -p, --pages      limit results to this many pages (default: 1)
    -pp, --per-page  specifies page size (default: $perpage)

    …and other options starting with a dash are passed through to \`curl\`

  examples:
    $ $ME search terms              # code search for 'search' and 'terms'
    $ $ME --all '\"exact phrase\"'    # search for an exact phrase, all results
    $ $ME -I '\"search phrase\"'      # see HTTP headers for the above
    $ $ME --count /projects         # count how many projects

  homepage:
    https://gist.github.com/ernstki/3707675c8a4ddb06d128154947c49e29
"
                return
                ;;
            -c|--count)
                count=1
                ;;
            -t|--totals)
                totals=1
                ;;
            -a|--all)
                all=1
                ;;
            -p|--pages)
                shift
                pages=$1
                ;;
            -pp|--pp|--per-page)
                shift
                perpage=$1
                ;;
            -*)
                # FIXME: should probably _only_ accept -I / --head
                # anchored, so options like `--insecure` aren't caught by mistake
                if [[ $1 =~ ^-(i|-include)$ ]]; then
                    # because it intersperses headers into JSON output which
                    # `jq` can't handle
                    echo 'Ignoring unsupported `-i` / `--include` curl option.' >&2
                else
                    curlargs+=("$1")
                fi
                ;;
            /*)
                endpoint=$1
                ;;
            \&*)
                queryargs+=$1
                ;;
            *)
                searchterms+=("$1")
                ;;
        esac
        shift
    done

    if [[ ${#searchterms[*]} -gt 0 ]]; then
        if [[ $endpoint != /search ]]; then
            echo 'ERROR: Bare search terms only accepted for `/search` endpoint.' >&2
            return 1
        fi

        # otherwise
        queryargs+="&scope=blobs&search=$(set +x; _urlencode "${searchterms[@]}")"
        (( ${TRACE:-0} )) && declare -p queryargs
    fi

    queryargs+="&per_page=$perpage"

    if (( all && pages )); then
        echo 'ERROR: The `--all` and `--pages` options are mutually-exclusive.' >&2
        return 1
    fi

    if (( count )); then
        # HTTP headers end with CR+LF, so make sure to get _only_ the digits
        curl --head "${curlargs[@]}" "$api$endpoint?$queryargs" \
          | sed -n 's/X-Total: \([[:digit:]][[:digit:]]*\).*/\1/p'
        return
    elif (( totals )); then
        # print summary of results using HTTP response headers
        curl --head "${curlargs[@]}" "$api$endpoint?$queryargs" \
          | sed -nE '/X-(Page|Per-Page|Total|Total-Pages):/p' \
          | tr -d \\r
        return
    elif (( all )); then
        # get the total number of pages
        pages=$(
            curl --head "${curlargs[@]}" "$api$endpoint?$queryargs" \
              | sed -n 's/X-Total-Pages: \([[:digit:]][[:digit:]]*\).*/\1/p'
        )
        if ! [[ $pages =~ ^[[:digit:]]+$ ]]; then
            echo "ERROR: Problem fetching total pages; try TRACE=1." >&2
            return 1
        fi
    else
        if (( !pages )); then pages=1; fi
    fi

    # check each curl argument individually for a standalone `-I` / `--head`;
    # matching against the joined argument list would miss a trailing `--head`
    # and could be fooled by a `-I` inside the bearer token
    local arg headonly=
    for arg in "${curlargs[@]}"; do
        if [[ $arg == -I || $arg == --head ]]; then headonly=1; fi
    done

    if (( headonly )); then
        curl "${curlargs[@]}" "$api$endpoint?$queryargs"
        # only need first page of results, so don't pipe through `jq`
    else
        # the first `jq` unwraps each page's results array, the second
        # combines all results back into a single array
        for (( p=1; p<=pages; p++ )); do
            if (( pages > 1 )); then echo "Fetching page ${p} of results…" >&2; fi
            curl "${curlargs[@]}" "$api$endpoint?$queryargs&page=$p" | jq '.[]'
        done | jq --slurp .
    fi
)

# https://stackoverflow.com/a/28776166/785213
# works because you can't `return` from a script
(return 0 2>/dev/null) && sourced=1 || sourced=0

if (( !sourced )); then
    _gitlab_api "$@"
fi
Your tool is very nice, thanks 🙏
I just had to change the capitalized headers to lowercase because this is what my gitlab (16.9) was inserting in the HTTP header. I don't know if it depends on the version.
E.g.,
x-next-page: 2
x-page: 1
x-per-page: 20
x-prev-page:
x-request-id: 01JD7...VZRF
x-runtime: 0.534680
x-total: 4154
x-total-pages: 208
@cmuller, thanks. Glad it helped.
Please don't tell anyone I said this, but I'm not thrilled by the design changes of recent GitLab versions, so the version we use internally is quite far behind. I wouldn't doubt that there will be little issues like this if you're using a more recent version.
That said, I can and should update the script so it works either way; piping curl's output through `tr A-Z a-z` should be sufficient for that, since sed itself doesn't appear to have a portable "case-insensitive" option.
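A minimal sketch of that lowercasing approach, assuming the headers arrive on stdin the way `curl --head` prints them. `_header_value` is a hypothetical helper for illustration, not something in the script:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the script above): print the numeric
# value of one HTTP header, whatever case the server used for its name.
_header_value() {
    # $1 = header name in any case; raw headers are read from stdin
    local name
    name=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
    tr 'A-Z' 'a-z' \
      | tr -d '\r' \
      | sed -n "s/^${name}: \([[:digit:]][[:digit:]]*\).*/\1/p"
}

# mixed-case sample input, with the CR+LF line endings HTTP uses
printf 'X-Total: 4154\r\nx-total-pages: 208\r\n' | _header_value X-Total
```

Lowercasing both the header name and the input means the same `sed` expression works for `X-Total` and `x-total` alike, without relying on a case-insensitive flag that not every `sed` has.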
Born of the need to return all the results from a GitLab code search, and run some simple summary stats on those results. This seems like something `jq` or a competing utility would do, but searching the InterWebs, I turned up empty-handed. The usual suggestion was to just use a `for` loop in a shell script.

Probably Xidel has some kind of support for pagination, but the way Xidel works is sometimes difficult to reason about. Shell script, though, I can do.

Tested on macOS, with the *BSD version of `sed`, but I don't think I've done anything there that won't work on Linux. Feedback welcome.

Installation
Make sure you have `jq` available in your search path.

Create a personal access token in your GitLab settings with the `read_api` scope.

Then:
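A rough sketch of the remaining steps, under stated assumptions: the filename `gitlab-api`, the URL `https://gitlab.example.com`, and the token value are all placeholders, not values from the original instructions.

```shell
# Placeholders throughout: adjust the filename, URL, and token to your setup.
script=./gitlab-api      # wherever you saved the script (hypothetical filename)
bindir=$HOME/bin

# the script needs jq at run time
command -v jq >/dev/null || echo "jq not found in your PATH" >&2

# copy the script into ~/bin and make it executable
mkdir -p "$bindir"
if [ -f "$script" ]; then
    cp "$script" "$bindir/" && chmod +x "$bindir/${script##*/}"
fi

# put these in your ~/.profile or similar to make them permanent
export GITLAB_URL=https://gitlab.example.com
export GITLAB_TOKEN=your-token-here
```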
Your `~/bin` is typically already in your `$PATH` for most modern Unixes. You may need to log out and back in if your `~/.profile` or similar checks for the existence of `~/bin` on login, though.

Examples
When searching for the exact phrase, make sure to wrap with literal double quotes, as shown below.
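To see why the outer single quotes matter, here is a small sketch. `show_args` is a hypothetical stand-in that just prints each argument it receives, so you can see what the script would be handed:

```shell
# Hypothetical stand-in: print each received argument on its own line.
show_args() { printf '[%s]\n' "$@"; }

show_args search terms         # two separate search terms
show_args '"exact phrase"'     # one argument; the double quotes survive
show_args "exact phrase"       # one argument, but the shell strips the quotes
```

With the script installed under a name of your choosing (say `gitlab-api`, an assumed name), the matching invocations from its own `--help` output would be `gitlab-api search terms`, `gitlab-api --all '"exact phrase"'`, and `gitlab-api --count /projects`.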
Bugs and misfeatures
No rate-limiting is done when there are multiple pages of results; this wasn't an issue for me since I created it for use on an internal site, but you could find yourself blocked if you try this on a public or heavily-loaded instance.
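If you do need to go easier on the server, one option is a pause between page requests. This is only a sketch of the idea, not what the script does: `DELAY` is an assumed variable name, and `fetch_page` stands in for the real `curl … | jq '.[]'` call.

```shell
# Hypothetical sketch: sleep between page fetches to avoid hammering the API.
DELAY=${DELAY:-1}                     # seconds between requests (assumed name)

fetch_page() { echo "page $1"; }      # stand-in for the real curl | jq call

fetch_all() {
    local pages=$1 p
    for (( p = 1; p <= pages; p++ )); do
        fetch_page "$p"
        # pause between requests, but not after the last one
        if (( p < pages )); then sleep "$DELAY"; fi
    done
}
```

Calling `fetch_all 3` would then fetch three pages with `$DELAY` seconds between each.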
There is no error handling if you mess up the `GITLAB_URL` (remember to include e.g. the `/gitlab` part of the URL if not served from the root) or your `GITLAB_TOKEN` is wrong. Here's how you can troubleshoot that, though: pass `-I` along with your request to see the HTTP response headers instead of the results.

That is curl's `-I` / `--head` option. Other curl options like `-f` / `--fail` may work, too.
may work, too.References