Created
September 30, 2011 15:00
What I have often wanted was some semi-automated way to produce filters
written in C. That way, with some little language, one could produce a
filter, compile it and run it repeatedly. Not only that, adding to the
produced filter would be possible, since you would have the C code.
Plus it would run quickly. I wrote enough small filters to do
specialised little tasks to be sick of doing it, and so I wrote "filta"
to do the job.

The language of filta is currently very small. I'd go so far as to say
it's tiny. But I have found it very useful - which is more than I can
say for some of the other things I've done. Here are the only tokens
recognised in the language...

    == = != < > && || if else ( ) { } f# $# n t s "string" split

Like awk, filta splits all input lines by white space (' ' and TAB).
This can be changed or even turned off. The only action available is to
print. The only other things you can do are comparisons and
line-splitting. In the above, # refers to a positive integer. f# is
equivalent to $# and refers to the #'th field on the current input
line. f0 (and $0) refer to the whole line, f$ (and $$) refer to the
last field.
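
To make the field rules concrete, here is a rough sketch in C of awk-style splitting on white space. It is not taken from filta's generated code: the names split_fields and MAX_FIELDS are made up for illustration, and the limit of 20 is the one mentioned later in this file.

```c
#include <assert.h>
#include <string.h>

#define MAX_FIELDS 20   /* hypothetical name; 20 is the limit quoted in this file */

/* Split `line` in place on any character in `delims`, treating a run of
 * delimiters as a single separator (the awk-like behaviour assumed here).
 * Stores pointers to at most `max` fields in `fields` and returns how
 * many fields were found. */
int split_fields(char *line, const char *delims, char *fields[], int max)
{
    int n = 0;
    char *p = line;

    assert(line != NULL && delims != NULL && fields != NULL);

    while (n < max) {
        p += strspn(p, delims);     /* skip any leading delimiters */
        if (*p == '\0')
            break;                  /* nothing left on the line */
        fields[n++] = p;            /* a field starts here */
        p += strcspn(p, delims);    /* move to the end of the field */
        if (*p == '\0')
            break;                  /* field ran to the end of the line */
        *p++ = '\0';                /* terminate the field, step past it */
    }
    return n;
}
```

With a count n returned from split_fields, f1 corresponds to fields[0] and f$ to fields[n - 1]. Because the split overwrites delimiters in the buffer, anything that wants f0 would need to keep its own copy of the unsplit line.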

A string enclosed in double quotes is printed - C character
representations ("\n", "\t" etc.) may be used. n, t and s are
equivalent to "\n", "\t" and " " respectively. (n = newline, t = tab, s
= space).

if, else, (, ), {, }, &&, ||, ==, !=, <, and > are all used as in C.
= is entirely equivalent to ==.

The word "split" should be followed by a string of characters that the
input line should be split up on. Thus split " \t" splits on white
space, split ":" splits on colons. split "" does nothing. In
particular, if the first command in a filta program is a split, then
the default splitting on white space is not done. So to turn off
splitting altogether one does split "" first off.
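
As an illustration of splitting on a chosen character set, here is a small C sketch. It rests on two assumptions that this file does not confirm: that adjacent delimiters produce an empty field (handy for /etc/passwd, where a field may be empty), and that an empty delimiter string simply leaves the line unsplit. The name split_keep_empty is invented for the example.

```c
#include <assert.h>
#include <string.h>

/* Split `line` in place on any character in `delims`, keeping empty
 * fields (an assumption, not documented filta behaviour). An empty
 * `delims` means "do not split" and yields zero fields. Returns the
 * number of fields stored in `fields`, at most `max`. */
int split_keep_empty(char *line, const char *delims, char *fields[], int max)
{
    int n = 0;
    char *p = line;

    assert(line != NULL && delims != NULL && fields != NULL);

    if (*delims == '\0')
        return 0;                   /* split "": leave the line alone */

    while (n < max) {
        fields[n++] = p;            /* every position starts a field */
        p += strcspn(p, delims);    /* move to the next delimiter */
        if (*p == '\0')
            break;                  /* last field on the line */
        *p++ = '\0';                /* terminate the field, step past it */
    }
    return n;
}
```

Under these assumptions a passwd entry with an empty fifth field still has the home directory as its sixth field, which is what a program using f6 would want.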

There is no need to separate commands with white space. Thus f1f2 means
print field one and field two, as does f1 f2 and $1$2 and $1 $2.

Here is a simple use of filta.

    cat file | filta 'f1 "\n"'

which prints the first white space separated field of each line and
then a newline. This could have also been done more simply as "filta
f1n". Here is a more complicated example.

    cat file | filta 'if (f1 = f2) f3 else f4'

And you can probably work out what this does. Of course this could have
been written as filta 'if(f1=f2)f3elsef4' for those that don't like
spaces. Note the hard single quotes around the program to hide the
parentheses (or double quotes) from the shell.
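
For readers who want to picture the C behind the if/else example, here is a hand-written sketch of the decision it makes for one line. This is not filta's actual output. It assumes that = compares fields as strings and that a field missing from the line behaves like an empty string; the helper names field and choose_output are invented.

```c
#include <assert.h>
#include <string.h>

/* Return field number `i` (1-based), or "" when the line has fewer than
 * `i` fields. How filta treats missing fields is not stated in this
 * file; the empty string is a guess. */
const char *field(char *fields[], int nf, int i)
{
    assert(fields != NULL && i >= 1);
    return i <= nf ? fields[i - 1] : "";
}

/* Hand-written equivalent of:  if (f1 = f2) f3 else f4
 * Returns the text that would be printed for this line. */
const char *choose_output(char *fields[], int nf)
{
    if (strcmp(field(fields, nf, 1), field(fields, nf, 2)) == 0)
        return field(fields, nf, 3);
    return field(fields, nf, 4);
}
```

A complete filter would read each input line, split it into fields, and write the returned text to standard output.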

    cat /etc/passwd |
    filta 'split":" if (f1 = "tcjones") "Terry'\''s home directory is " f6n'

etc. (The apostrophe in "Terry's" is written '\'' so that it does not
end the single-quoted program.) It is possible to leave out the "if" as
well. Thus the above could have started off
filta 'split":" (f1="tcjones")' etc.

It is also possible to split a line more than once. For example
(assuming my encrypted password contains no commas!)

    cat /etc/passwd |
    filta -s 'split":" if ($1="tcjones") {split"," "my office is " f2n}'

And so on.

So what does filta actually do? Your small program is read and parsed
and a simple (usually 50-odd lines) C program is produced. This is then
compiled and executed by filta and as a result gets the standard input
that filta was supplied with. By necessity the resulting a.out file is
left in the current directory (if possible) and can (and SHOULD) be
re-used. By default the source is removed before the a.out file is
executed. You can arrange for the source to be kept with the -s flag. So

    cat file | filta -s f1 s f2n

is a filter that prints the first field, a space, the second field and
a newline BUT in addition you get to keep the C source. If you want to
run it again you just say

    cat file | a.out

The source is placed in Filta.c in the current directory (if
possible). The filter program that is built can handle input lines of
length up to 4K with up to 20 fields. But that's easily changed, seeing
as -s gives you the source.

If you just type

    filta -s

filta will wait for you to enter a program. This will be translated
into C, the resulting code will be compiled but NOT executed. Saying

    filta -s f1n

is not the same. This will write the C program, compile it and execute
it (and will therefore sit there waiting for you to type input at it).
If filta is unable to write the current directory it puts the source in
strangely named files under /tmp and tells you where they may be found.

Also valid is

    filta -f <filename>

which reads the program from the file <filename>.

Anyway I'm not going to go on any more. There are more details it is
handy to know, but if you want them send me mail or read the code. filta
is not meant to be an awk replacement, it just makes it easier to do
some things with greater speed, and easier and MUCH faster to repeat
(using the a.out produced). It is also very useful as a starting point
for the writing of your own special purpose filters. Have fun.