The main tricks I came up with in LAST are precisely basing it on quaternary and making up S-optimization. The former allows for streamlining the basic BLC interpreter design (the 4 paths that handle the basic term elements are simplified), while the latter capitalizes on that by reducing repetition (i.e. making the whole thing more efficient). I spent quite a bit of time working out and automating S-optimization -- which is certainly the most interesting aspect of LAST and what sets it apart from RFNHS3 or other LC variants. I think it's pretty neat.
So of course if we get rid of the quaternary input, we'll immediately lose the advantage.
I noticed that using two separate tokens for variable handling allows BLC to interpret LAST in only 193 bits.
I think you can get that down to 181 bits, assuming a 1:1 translation of the S-deoptimized version of the LAST self-interpreter into BLC syntax (so not a valid BLC program, because the assumed input is different). Anyway, 193 bits is also nice, and I like how it's both one bit less than LAST-in-LAST and an odd number of bits. :D And it's shorter than BLC-in-BLC. Perhaps this goes to confirm that quaternary input fits better with this sort of interpreter design. Certainly the BLC self-interpreter translated to LAST syntax goes in the other direction, growing to 268 bits.
Still, I suspect that for most programs the savings from S-optimization do not quite make up for the (n-1) extra bits needed for every occurrence of variable n. What, for instance, would be the length of the shortest LAST program for the prime number character sequence, which takes 167 bits in BLC?
A direct syntactic translation seems to be 192 bits before and 174 bits after S-optimization -- so longer either way. I won't attempt a properly optimized port -- perhaps I'll figure that out the next time I fall into a bout of lambda calculus obsession. :D But certainly for short programs it's true that even with S-optimization, an equivalent LAST program will be longer. S-optimization will certainly cut down on longer programs, especially ones that build long stacks (use a lot of variables) and do some jumping over them. Some of these programs will then have shorter equivalents in LAST, particularly if the port takes advantage of the quaternary input -- case in point being the self-interpreter.
S-optimization is interesting on its own regardless, as a minimal LC optimization technique.
As for optimizing the length not of the self-interpreter but of BLC programs in general, you may find interesting a few simple syntax variants of BLC that I've worked out. They are based on different binary encodings for variables, which decrease the length of all variables except a few, e.g.:
variable  BLC repr.         variant BLC repr.  BLC length  variant length  difference
0.        10                10                 2           2               0
1.        110               1100               3           4               -1
2.        1110              1101               4           4               0
3.        11110             1110               5           4               1
4.        111110            111100             6           6               0
5.        1111110           111101             7           6               1
6.        11111110          111110             8           6               2
7.        111111110         11111100           9           8               1
8.        1111111110        11111101           10          8               2
9.        11111111110       11111110           11          8               3
10.       111111111110      1111111100         12          10              2
11.       1111111111110     1111111101         13          10              3
12.       11111111111110    1111111110         14          10              4
13.       111111111111110   111111111100       15          12              3
14.       1111111111111110  111111111101       16          12              4
...
Here variable 0. is encoded the same as in BLC, but the other variables are encoded differently. Only variable 1. is longer in this BLC variant; variables 0., 2., and 4. are the same length, and all other variables are shorter and grow in length more slowly than in vanilla BLC (still linearly, but by 2 bits per group of 3 variables instead of 1 bit per variable). So programs that only use variables 0. thru 2. will be longer, whereas programs that use variables above 2. will start shrinking in length.
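For concreteness, here's a small Python sketch of this variant encoding -- my own reconstruction from the table above, not anything official: variable n >= 1 gets a prefix of 2 * ceil(n/3) ones followed by a 2-bit tag 00/01/10.

```python
def blc_var(n):
    """Vanilla BLC: variable n is n+1 ones followed by a zero."""
    return "1" * (n + 1) + "0"

def variant_var(n):
    """Grouped variant (reconstructed from the table above): variable 0
    stays '10'; variable n >= 1 is 2*ceil(n/3) ones plus a tag 00/01/10."""
    if n == 0:
        return "10"
    group = (n + 2) // 3                  # ceil(n / 3)
    tag = ["00", "01", "10"][(n - 1) % 3]
    return "1" * (2 * group) + tag

# Side-by-side comparison of the two encodings for the first few variables
for n in range(8):
    print(n, blc_var(n), variant_var(n))
```

This makes the 2-bits-per-3-variables growth easy to check mechanically against the table.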
With this syntax the original 167-bit primes program becomes 166 bits:
0001000110011001010001101000000001011000001001000101011110111010010001101000011100011010000000000101101101011100011111011110000000011110011011010000010110000011001100
The 1001 BF-interpreter shrinks down to 994 bits:
0001000101110001000110100001000100010001000100011100010111010110010111010110010111010110011001111001111110000000010111011011000101110101011101011110000000000001100001010111110110111011010111100111111000100111000111100000000000000101101111000101010111110111110011101101110001100110010111010111000111100000000001100001011111001000010110111100111001110001111000000000010111001110000101101110110001110001100110010111010110011000000101101111100001011111100100011010000000010101100100011010000000010111000110011101110100011111110001011111101111101011001010011100011000000001011000101100011100011101001000010111001001110100100000000110000101101110110100000001011011100000000101100001111110011110101100011010000101010001101000000101100101000110011001100000010110000011001100000001110001110010000010011100110001100000000000001010000000011100010101000110100000000101110000000001010111110111110111000000010111110100010110011100111101110101010111111111001111001000100101101100000000010111011011001000001100
The 206-bit self-interpreter on the other hand grows by 2 bits:
0100011010000000010101100000000001110100010111110001110100010111000111010000011101000101101100110101111000011110000101110110011100100101100111000001101100000101111000011110000111000110111011110001110111011100
There are also other possible encodings for variables that need far fewer bits for higher variables, at the expense of a few bits on the lower ones. For example this encoding:
variable  BLC repr.         variant BLC repr.  BLC length  variant length  difference
0.        10                10                 2           2               0
1.        110               1110               3           4               -1
2.        1110              110010             4           6               -2
3.        11110             110110             5           6               -1
4.        111110            111110             6           6               0
5.        1111110           11000010           7           8               -1
6.        11111110          11000110           8           8               0
7.        111111110         11001110           9           8               1
8.        1111111110        11010010           10          8               2
9.        11111111110       11010110           11          8               3
10.       111111111110      11011110           12          8               4
11.       1111111111110     11110010           13          8               5
12.       11111111111110    11110110           14          8               6
13.       111111111111110   11111110           15          8               7
14.       1111111111111110  1100000010         16          10              6
...
Here variable 0. is encoded the same as in BLC; every further variable n. is encoded as 11<n-1 in bijective ternary>10, where the ternary digits 1, 2, 3 are written as the bit pairs 00, 01, 11. So variables 1., 2., 3., and 5. are longer in this BLC variant, variables 0., 4., and 6. are the same length, and all other variables are shorter and increase in length not linearly but logarithmically, according to powers of 3. So programs that only use variables 0. thru 6. will be longer, whereas programs that use variables much above 6. will quickly start shrinking in length.
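A sketch of this second encoding, again reconstructed from the table (in particular, I'm reading the bracketed number as n-1 written in bijective base-3, with the empty string standing for 0):

```python
def ternary_var(n):
    """Ternary variant (reconstructed from the table above): variable 0 is
    '10'; variable n >= 1 is 11 <n-1 in bijective base-3> 10, with the
    ternary digits 1, 2, 3 written as the bit pairs 00, 01, 11."""
    if n == 0:
        return "10"
    m, digits = n - 1, []
    while m > 0:
        d = m % 3 or 3                # bijective base-3 digit in {1, 2, 3}
        digits.append({1: "00", 2: "01", 3: "11"}[d])
        m = (m - d) // 3
    return "11" + "".join(reversed(digits)) + "10"
```

Since each bijective-ternary digit costs 2 bits, variable n takes about 2 * log3(n) + 4 bits, which is where the logarithmic growth comes from.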
I wonder if you stumbled upon anything similar when working out the syntax of BLC, which in any case is a great local optimum that works well in terms of simplicity and length for short programs. The other encodings for variables would certainly increase parsing complexity, so self-interpreters for these BLC variants would be much longer than the original.
I chose 24 bytes because it's a nice round number (3 * 2^3 * 2^3 bits) that sat at a seemingly comfortable 14-bit margin below my best effort.
The conjecture assumes a binary input that must be read bit-by-bit.
Ah cool, thanks for demystifying it for me! ;)
As I suspected, LAST's trickery indeed goes too far.
In the end, as we relax the restrictions on the encoding of the input, it seems that we can decrease the length of the self-interpreter almost arbitrarily -- down to λm.m(λx.x)(λx.x) [Mogensen] (as pointed out by @sargstuff in this thread) or even λm.m(λx.x) [Brown and Palsberg], if we are very clever about it.
Then there's the problem of practicality -- to truly take advantage of our minimalism, we now need to think about a proper input encoder or specialized hardware.
That would be more justifiable if we were starting from scratch, coming up with an entirely new architecture that takes lambda calculus as its basis, the way Lisp machines were based on Lisp.
While obsessing about this and letting my mind wander, I noticed a nice direct correspondence between the quaternary encoding of LAST and the DNA/RNA nucleobases -- we could trivially encode LAST with AGCT/AGCU.
Now this is a pretty cool coincidence that can provide some individuals with peculiar interests with even more peculiar entertainment, such as going through all possible permutations of AGCT, looking for the longest valid LC program embedded in random mosquito DNA. Or maybe all this could lead to something more practical, and we could find a real place in biological computing for messing around with LC-based architectures.
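The encoding really is trivial -- here's a toy sketch, treating LAST's four symbols simply as the quaternary digits 0-3 (the particular digit-to-base assignment below is arbitrary, my own choice for illustration):

```python
# Hypothetical mapping of LAST's four quaternary symbols (here just the
# digits 0-3) to DNA nucleobases; for RNA, swap T for U.
TO_DNA = {"0": "A", "1": "G", "2": "C", "3": "T"}
FROM_DNA = {base: digit for digit, base in TO_DNA.items()}

def to_dna(quaternary):
    """Encode a quaternary digit string as a nucleobase string."""
    return "".join(TO_DNA[d] for d in quaternary)

def from_dna(strand):
    """Decode a nucleobase string back into quaternary digits."""
    return "".join(FROM_DNA[b] for b in strand)

print(to_dna("0123"))  # prints AGCT
```

Scanning a real genome for valid programs would then just be a matter of running from_dna over a sliding window and feeding the result to a LAST parser.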
Anyway, these things are pretty fun to think about sometimes.
Thank you for coherently writing down your thoughts so that anybody can build upon what you have figured out. It's fun, it's inspiring, and it works!
Cheerio