@mattwiebe
Created January 23, 2025 21:13
<!-- wp:paragraph -->
<p>After my <a href="https://mattwiebe.blog/2023/08/30/how-to-run-codellama-in-vscode-on-macos/">last post</a> about setting up CodeLlama, some colleagues have asked the million-dollar question: how does CodeLlama compare to Copilot? My early answer is "I don't know yet," but here's one useful comparison.</p>
<!-- /wp:paragraph -->
<!-- wp:paragraph -->
<p>I have a little shell script that lets me interface with <a href="https://phacility.com/phabricator/arcanist/">Arcanist</a> (<code>arc</code>), part of the Phabricator toolset we use internally. One of the (many) weird things about it compared to something like GitHub is that your PRs (Diffs in Phabricator parlance) are independent of git branches. This is a problem when I need to update a Diff but can't tell from the current branch which one I've been working on. <code>git push</code> is just not possible; I need to run <code>arc diff --update DXXXXX</code> instead. But what is the ID?</p>
<!-- /wp:paragraph -->
<!-- wp:paragraph -->
<p>There's already a command for this: <code>arc which</code>. But its output is messy. I'm working on a Diff that adds <a href="https://github.com/Automattic/wordpress-activitypub/pull/395">this plugin PR</a> to WordPress.com, fixing (inevitable) bugs along the way, and this is what <code>arc which</code> gives me:</p>
<!-- /wp:paragraph -->
<!-- wp:syntaxhighlighter/code -->
<pre class="wp-block-syntaxhighlighter-code">REPOSITORY
To identify the repository associated with this working copy, arc followed this process:
Configuration value "repository.callsign" is set to "WPGIT".
Found a unique matching repository.
This working copy is associated with the WordPress.com repository.
COMMIT RANGE
If you run 'arc diff', changes between the commit:
212461c993d9d7ad [redacted commit message]
...and the current working copy state will be sent to Differential, because
it is the merge-base of 'origin/trunk' and HEAD, as specified by
'git:merge-base(origin/trunk)' in your project 'base' configuration.
You can see the exact changes that will be sent by running this command:
$ git diff 212461c993d9d7ad..HEAD
These commits will be included in the diff:
60eb751ea35ecfe1 [redacted commit message]
55dc4536770138b0 ActivityPub: add Follow Me block
MATCHING REVISIONS
These Differential revisions match the changes in this working copy:
D120281 ActivityPub: add Follow Me block
Reason: Commit message for '55dc4536770138b0' has explicit 'Differential Revision'.
Since exactly one revision in Differential matches this working copy, it will
be updated if you run 'arc diff'.</pre>
<!-- /wp:syntaxhighlighter/code -->
<!-- wp:paragraph -->
<p>(That last line is a lie.) I already have a shell command that cuts out the noise and just provides me with the URL and Diff ID that I want:</p>
<!-- /wp:paragraph -->
<!-- wp:syntaxhighlighter/code -->
<pre class="wp-block-syntaxhighlighter-code">arc which | grep -E 'D[0-9]+\ ' -o | awk '{ printf "https://redacted.a8c.com/%s %s\n", $1, $1 }'</pre>
<!-- /wp:syntaxhighlighter/code -->
<!-- wp:paragraph -->
<p>Which for the above gives me:</p>
<!-- /wp:paragraph -->
<!-- wp:syntaxhighlighter/code -->
<pre class="wp-block-syntaxhighlighter-code">https://redacted.a8c.com/D120281 D120281</pre>
<!-- /wp:syntaxhighlighter/code -->
<!-- wp:paragraph -->
<p>Cool. But I decided that I'd stripped too much context: I also wanted the Diff title (which follows the ID in the <code>arc which</code> output), so that my output would look like this:</p>
<!-- /wp:paragraph -->
<!-- wp:syntaxhighlighter/code -->
<pre class="wp-block-syntaxhighlighter-code">ActivityPub: add Follow Me block
https://redacted.a8c.com/D120281 D120281</pre>
<!-- /wp:syntaxhighlighter/code -->
<!-- wp:paragraph -->
<p>That was harder than expected: <code>awk</code> splits fields on whitespace, so my first attempt only gave me the first word of the title:</p>
<!-- /wp:paragraph -->
<!-- wp:syntaxhighlighter/code -->
<pre class="wp-block-syntaxhighlighter-code">arc which | grep -E 'D[0-9]+\ (.*)' -o | awk '{ printf "%s\nhttps://redacted.a8c.com/%s %s\n", $2, $1, $1 }'
ActivityPub:
https://redacted.a8c.com/D120281 D120281</pre>
<!-- /wp:syntaxhighlighter/code -->
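<!-- wp:paragraph -->
<p>To make the field splitting concrete, here is a minimal sketch that runs without Arcanist. It assumes the matching line has already been extracted by <code>grep</code> and hard-codes it in a <code>line</code> variable (my own stand-in, not part of the real script):</p>
<!-- /wp:paragraph -->
<!-- wp:syntaxhighlighter/code -->
<pre class="wp-block-syntaxhighlighter-code">
```shell
# Sample line copied from the arc which output above; hard-coded as an
# assumption so the example runs without Arcanist installed.
line='D120281 ActivityPub: add Follow Me block'

# awk splits each record on runs of whitespace, so this line has 6 fields.
printf '%s\n' "$line" | awk '{ print NF }'   # prints 6

# $1 is the Diff ID.
printf '%s\n' "$line" | awk '{ print $1 }'   # prints D120281

# $2 is only the first word of the title, not the whole title.
printf '%s\n' "$line" | awk '{ print $2 }'   # prints ActivityPub:
```
</pre>
<!-- /wp:syntaxhighlighter/code -->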
<!-- wp:paragraph -->
<p>So I decided to ask both Copilot and CodeLlama (<code>7b-instruct-q5_K_M</code>) for help. I kept my wording a bit vague to see how they would do.</p>
<!-- /wp:paragraph -->
<!-- wp:paragraph -->
<p>Spoiler: they both contributed something useful, and it was doubly interesting that CodeLlama told me I was doing it wrong.</p>
<!-- /wp:paragraph -->
<!-- wp:heading {"level":3} -->
<h3 class="wp-block-heading">Prompt</h3>
<!-- /wp:heading -->
<!-- wp:syntaxhighlighter/code -->
<pre class="wp-block-syntaxhighlighter-code">I have the following shell script, which parses an ID and the title out of arc which output.
arc which | grep -E 'D[0-9]+ (.*)' -o | awk '{ printf "%s\nhttps://code.a8c.com/%s %s\n", $2, $1, $1 }'
arc which line that concerns me looks like:
` D120281 ActivityPub: add Follow Me block`
The trouble is that awk thinks $2 is only the first word of the remaining line, just "ActivityPub". I want the whole line in the output. How can I achieve that?</pre>
<!-- /wp:syntaxhighlighter/code -->
<!-- wp:heading {"level":3} -->
<h3 class="wp-block-heading">Copilot response</h3>
<!-- /wp:heading -->
<!-- wp:syntaxhighlighter/code -->
<pre class="wp-block-syntaxhighlighter-code">To include the entire title in the output, you can modify the awk command to print all fields starting from the second field. Here's the updated command:
```
arc which | grep -E 'D[0-9]+ (.*)' -o | awk '{ printf "%s\nhttps://code.a8c.com/%s ", $2, $1; for (i=2; i&lt;=NF; i++) printf "%s ", $i; printf "\n" }'
```
In this updated command, we're using a for loop to print all fields starting from the second field. The NF variable represents the total number of fields in the input, so we're looping from the second field (i=2) to the last field (i&lt;=NF). We're then using printf to print each field separated by a space.
This should output the entire title in the output, along with the ID and URL.</pre>
<!-- /wp:syntaxhighlighter/code -->
<!-- wp:heading {"level":3} -->
<h3 class="wp-block-heading">CodeLlama response</h3>
<!-- /wp:heading -->
<!-- wp:syntaxhighlighter/code -->
<pre class="wp-block-syntaxhighlighter-code">To get the whole line in the output, you need to use a different approach with awk. Instead of using $2 to print out the second word of the remaining line, you can use $0 to print out the entire line. Here's an updated version of your shell script that uses awk to print out the whole line:
```
arc which | grep -E 'D[0-9]+\ (.*)' -o | awk '{ printf "https://code.a8c.com/%s %s\n", $1, $0 }'
```
In this version of the script, we use awk to print out the entire line ($0) as part of the URL that we want to output. We also use $1 to print out the identifier followed by one or more numbers. I hope this helps! Let me know if you have any questions.</pre>
<!-- /wp:syntaxhighlighter/code -->
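<!-- wp:paragraph -->
<p>For reference, this is roughly what CodeLlama's <code>$0</code> suggestion prints. It is a sketch that swaps <code>arc which | grep</code> for a hard-coded sample line (an assumption, so that it runs anywhere); the <code>awk</code> program itself is the one from the response above:</p>
<!-- /wp:paragraph -->
<!-- wp:syntaxhighlighter/code -->
<pre class="wp-block-syntaxhighlighter-code">
```shell
# Sample line from the post, standing in for the arc which and grep steps.
line='D120281 ActivityPub: add Follow Me block'

# $0 is the whole record, so the ID appears twice: once inside the URL
# (from $1) and once more at the start of $0.
printf '%s\n' "$line" | awk '{ printf "https://code.a8c.com/%s %s\n", $1, $0 }'
# prints: https://code.a8c.com/D120281 D120281 ActivityPub: add Follow Me block
```
</pre>
<!-- /wp:syntaxhighlighter/code -->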
<!-- wp:paragraph -->
<p>Neither approach was exactly what I wanted, but I learned something from both and was able to use this to produce my desired outcome:</p>
<!-- /wp:paragraph -->
<!-- wp:syntaxhighlighter/code -->
<pre class="wp-block-syntaxhighlighter-code">arc which | grep -E 'D[0-9]+ (.*)' -o | awk '{ for (i=2; i&lt;=NF; i++) printf "%s ", $i; printf "\n %s https://code.a8c.com/%s\n", $1, $1; }'</pre>
<!-- /wp:syntaxhighlighter/code -->
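<!-- wp:paragraph -->
<p>If you want the title on its own line with no trailing space, followed by the URL and the ID, one possible variant is to save the ID and then strip it from <code>$0</code> with <code>sub()</code> instead of looping over fields. This is a sketch rather than the script I actually use, and <code>diff_summary</code> is a hypothetical helper name:</p>
<!-- /wp:paragraph -->
<!-- wp:syntaxhighlighter/code -->
<pre class="wp-block-syntaxhighlighter-code">
```shell
# Hypothetical helper; reads arc which output on stdin.
diff_summary() {
  grep -E -o 'D[0-9]+ .*' | awk '{
    id = $1               # first field is the Diff ID, e.g. D120281
    sub(/^[^ ]+ +/, "")   # drop the ID and the spaces after it from $0
    printf "%s\nhttps://code.a8c.com/%s %s\n", $0, id, id
  }'
}

# With the real tool (requires Arcanist): arc which | diff_summary
# Offline check using the sample line from the post:
printf '    D120281 ActivityPub: add Follow Me block\n' | diff_summary
# prints:
# ActivityPub: add Follow Me block
# https://code.a8c.com/D120281 D120281
```
</pre>
<!-- /wp:syntaxhighlighter/code -->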
<!-- wp:paragraph -->
<p>In any case, googling for code help is probably all but dead. And this is only the first generation of these tools, which will only get better.</p>
<!-- /wp:paragraph -->