terrycojones · September 30, 2011 15:00
diff --git a/gistfile1.txt b/gistfile1.txt
 What I have often wanted was some semi-automated way to produce filters
 written in C. That way, with some little language, one could produce a
 filter, compile it and run it repeatedly. Not only that, adding to the
 produced filter would be possible, since you would have the C code.
 Plus it would run quickly. I wrote enough small filters do do
 specialised little tasks to be sick of doing it, and so I wrote "filta"
 to do the job.

 The language of filta is currently very small. I'd go so far as to say
 it's tiny. But I have found it very useful - which is more than I can
 say for some of the other things I've done. Here are the only tokens
 recognised in the language...


 == = != < > && || if else ( ) { } f# $# n t s "string" split

 Like awk, filta splits all input lines by white space (' ' and TAB).
 This can be changed or even turned off. The only action available is to
 print. The only other things you can do are comparisons and
 line-splitting. In the above, # refers to a positive integer. f# is
 equivalent to $# and refers to the #'th field on the current input
 line. f0 (and $0) refer to the whole line, f$ (and $$) refer to the
 last field.

 A string enclosed in double quotes is printed - C character
 representations ("\n", "\t" etc etc) may be used. n, t and s are
 equivalent to "\n", "\t" and " " respectively. (n = newline, t = tab, s
 = space).

 if, else, (, ), {, }, &&, ||, ==, !=, <, and > are all used as in C. 
 = is entirely equivalent to ==.

 The word "split" should be followed by a string of characters that the
 input line should be split up on. Thus split " \t" splits on white
 space, split ":" splits on colons. split "" does nothing. In
 particular, if the first command in a filta program is a split, then
 the default splitting on white space is not done. So to turn off
 splitting altogether one does split "" first off.

 There is no need to separate commands with white space. Thus f1f2 means
 print field one and field two, as does f1 f2 and $1$2 and $1 $2.

 Here is a simple use of filta.

    cat file | filta 'f1 "\n"'

 which prints the first white space separated field of each line and
 then a newline. This could have also been done more simply as "filta
 f1n".  Here is a more complicated example.

    cat file | filta 'if (f1 = f2) f3 else f4'

 And you can probably work out what this does. Of course this could have
 been written as filta 'if(f1=f2)f3elsef4' for those that don't like
 spaces. Note the hard single quotes around the program to hide the
 parentheses (or double quotes) from the shell.

    cat /etc/passwd | 
    filta 'split":" if (f1 = "tcjones") "Terry's home directory is " f6n'

 etc. It is possible to leave out the "if" as well. Thus the above could
 have started off filta 'split":" (f1="tcjones")' etc etc.

 It is also possible to split a line more than once. For example
 (assuming my encrypted password contains no commas!)

    cat /etc/passwd 
    | filta -s 'split":" if ($1="tcjones") {split"," "my office is " f2n}'

 And so on.


 So what does filta actually do? Your small program is read and parsed
 and a simple (usually 50 odd lines) C program is produced. This is then
 executed by filta and as a result gets the standard input that filta
 was supplied with. By necessity the resulting a.out file is left in the
 current directory (if possible) and can (and SHOULD) be re-used. By
 default the source is removed before the a.out file is executed. You
 can arrange for the source to be kept with the -s flag.  So

    cat file | filta -s f1 s f2n

 is a filter that prints the first field, a space, the second field and
 a newline BUT in addition you get to keep the C source. If you want to
 run it again you just say

    cat file | a.out

 The source is placed in Filta.c in the current directory (if
 possible).  The filter program that is built can handle input lines of
 length up to 4K with up to 20 fields. But that's easily changed, seeing
 as -s gives you the source.

 If you just type

    filta -s

 filta will wait for you to enter a program. This will be translated
 into C, the resulting code will be compiled but NOT executed. Saying

    filta -s f1n

 is not the same. This will write the C program, compile it and execute
 it (and will therefore sit there waiting for you to type input at it).

 If filta is unable to write the current directory it puts the source in
 strangely named files in under /tmp and tells you where they may be found.

 Also valid is 

    filta -f <filename>

 which reads the program from the file <filename>.



 Anyway I'm not going to go any more. There are more details it is handy
 to know, but if you want them send me mail or read the code. filta is
 not meant to be an awk replacement, it just makes it easier to do some
 things with greater speed, and easier and MUCH faster to repeat (using
 the a.out produced).  It is also very useful as a starting point for
 the writing of your own special purpose filters. Have fun.
	What I have often wanted was some semi-automated way to produce filters
	written in C. That way, with some little language, one could produce a
	filter, compile it and run it repeatedly. Not only that, adding to the
	produced filter would be possible, since you would have the C code.
	Plus it would run quickly. I wrote enough small filters do do
	specialised little tasks to be sick of doing it, and so I wrote "filta"
	to do the job.

	The language of filta is currently very small. I'd go so far as to say
	it's tiny. But I have found it very useful - which is more than I can
	say for some of the other things I've done. Here are the only tokens
	recognised in the language...


	== = != < > && \|\| if else ( ) { } f# $# n t s "string" split

	Like awk, filta splits all input lines by white space (' ' and TAB).
	This can be changed or even turned off. The only action available is to
	print. The only other things you can do are comparisons and
	line-splitting. In the above, # refers to a positive integer. f# is
	equivalent to $# and refers to the #'th field on the current input
	line. f0 (and $0) refer to the whole line, f$ (and $$) refer to the
	last field.

	A string enclosed in double quotes is printed - C character
	representations ("\n", "\t" etc etc) may be used. n, t and s are
	equivalent to "\n", "\t" and " " respectively. (n = newline, t = tab, s
	= space).

	if, else, (, ), {, }, &&, \|\|, ==, !=, <, and > are all used as in C.
	= is entirely equivalent to ==.

	The word "split" should be followed by a string of characters that the
	input line should be split up on. Thus split " \t" splits on white
	space, split ":" splits on colons. split "" does nothing. In
	particular, if the first command in a filta program is a split, then
	the default splitting on white space is not done. So to turn off
	splitting altogether one does split "" first off.

	There is no need to separate commands with white space. Thus f1f2 means
	print field one and field two, as does f1 f2 and $1$2 and $1 $2.

	Here is a simple use of filta.

	cat file \| filta 'f1 "\n"'

	which prints the first white space separated field of each line and
	then a newline. This could have also been done more simply as "filta
	f1n". Here is a more complicated example.

	cat file \| filta 'if (f1 = f2) f3 else f4'

	And you can probably work out what this does. Of course this could have
	been written as filta 'if(f1=f2)f3elsef4' for those that don't like
	spaces. Note the hard single quotes around the program to hide the
	parentheses (or double quotes) from the shell.

	cat /etc/passwd \|
	filta 'split":" if (f1 = "tcjones") "Terry's home directory is " f6n'

	etc. It is possible to leave out the "if" as well. Thus the above could
	have started off filta 'split":" (f1="tcjones")' etc etc.

	It is also possible to split a line more than once. For example
	(assuming my encrypted password contains no commas!)

	cat /etc/passwd
	\| filta -s 'split":" if ($1="tcjones") {split"," "my office is " f2n}'

	And so on.


	So what does filta actually do? Your small program is read and parsed
	and a simple (usually 50 odd lines) C program is produced. This is then
	executed by filta and as a result gets the standard input that filta
	was supplied with. By necessity the resulting a.out file is left in the
	current directory (if possible) and can (and SHOULD) be re-used. By
	default the source is removed before the a.out file is executed. You
	can arrange for the source to be kept with the -s flag. So

	cat file \| filta -s f1 s f2n

	is a filter that prints the first field, a space, the second field and
	a newline BUT in addition you get to keep the C source. If you want to
	run it again you just say

	cat file \| a.out

	The source is placed in Filta.c in the current directory (if
	possible). The filter program that is built can handle input lines of
	length up to 4K with up to 20 fields. But that's easily changed, seeing
	as -s gives you the source.

	If you just type

	filta -s

	filta will wait for you to enter a program. This will be translated
	into C, the resulting code will be compiled but NOT executed. Saying

	filta -s f1n

	is not the same. This will write the C program, compile it and execute
	it (and will therefore sit there waiting for you to type input at it).

	If filta is unable to write the current directory it puts the source in
	strangely named files in under /tmp and tells you where they may be found.

	Also valid is

	filta -f <filename>

	which reads the program from the file <filename>.



	Anyway I'm not going to go any more. There are more details it is handy
	to know, but if you want them send me mail or read the code. filta is
	not meant to be an awk replacement, it just makes it easier to do some
	things with greater speed, and easier and MUCH faster to repeat (using
	the a.out produced). It is also very useful as a starting point for
	the writing of your own special purpose filters. Have fun.