Notes on the GUPPI Raw Data Format
S. Ellingson, Nov 1, 2013
This format consists of blocks, with each block consisting of a text header and a raw binary data segment.
An example of a header is shown in the file "header_example.txt". The header ends with the word "END". Some important fields are:
- OBSFREQ: [MHz] center of the RF passband
- OBSBW: [MHz] width of passband; negative sign indicates spectral flip
- OBSNCHAN: Number of channels (subbands)
- NPOL: Number of polarizations times 2. For example, NPOL=4 means 2 polarizations.
- NBITS: Number of bits per I or Q value. So, one complex-valued sample has 2*NBITS bits.
- TBIN: [s] sample period within a channel
- CHAN_BW: [MSPS] sample rate for a channel. Negative sign indicates spectral flip.
- OVERLAP: This many samples per subband from the previous data block are repeated at the beginning of this data block.
- BLOCSIZE: The size of the raw data segment in bytes.
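As an illustration of reading these fields, here is a minimal Python sketch of a header parser. It assumes that the header is a sequence of fixed-width, 80-character FITS-style cards of the form "KEY = value" with no line terminators, and that the raw data segment starts immediately after the END card. Neither assumption is stated in these notes, so check both against header_example.txt and an actual file. The names parse_header and CARD_LEN are made up for this example.

```python
CARD_LEN = 80  # assumption: FITS-style fixed-width header cards


def _convert(text):
    """Return an int or float where the text parses as one, else the string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def parse_header(buf):
    """Parse a GUPPI-style text header from bytes.

    Returns (fields, header_length), where header_length is the number of
    bytes consumed up to and including the END card.
    """
    fields = {}
    offset = 0
    while offset + CARD_LEN <= len(buf):
        card = buf[offset:offset + CARD_LEN].decode("ascii")
        offset += CARD_LEN
        if card.rstrip() == "END":
            # Assumption: the raw data segment begins right after this card.
            return fields, offset
        key, _, value = card.partition("=")
        # String values are assumed to be wrapped in single quotes.
        fields[key.strip()] = _convert(value.strip().strip("'").strip())
    raise ValueError("no END card found in header")


# Usage with a small synthetic header built from fields named in these notes.
cards = ["OBSNCHAN= 32", "NPOL    = 4", "NBITS   = 8", "END"]
demo = "".join(c.ljust(CARD_LEN) for c in cards).encode("ascii")
fields, header_length = parse_header(demo)
print(fields["OBSNCHAN"], fields["NPOL"], fields["NBITS"], header_length)
```

The returned header_length is what a reader would use to seek to the start of the raw data segment, under the assumption above.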
The center frequency of channel i (where i is in [1..OBSNCHAN]) is OBSFREQ - OBSBW/2 + (i-0.5)*CHAN_BW [MHz].
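The formula above can be checked with a few lines of Python. This is only a sketch: the function name channel_center_mhz is made up for this example, and the numbers are the header values of the example dataset listed later in these notes.

```python
def channel_center_mhz(i, obsfreq, obsbw, chan_bw):
    """Center frequency in MHz of channel i, with i counted from 1."""
    return obsfreq - obsbw / 2 + (i - 0.5) * chan_bw


# Header values from the example dataset in these notes.
OBSFREQ, OBSBW, CHAN_BW, OBSNCHAN = 1378.125, -200.0, -6.25, 32

centers = [channel_center_mhz(i, OBSFREQ, OBSBW, CHAN_BW)
           for i in range(1, OBSNCHAN + 1)]
print(centers[0], centers[-1])  # 1475.0 1281.25
```

Because OBSBW and CHAN_BW are negative here, channel 1 is the highest-frequency channel and channel 32 is the lowest.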
Pseudocode describing the structure of the raw data block is as follows:
    for channel=1..OBSNCHAN,
        for nsamples=1..NDIM,
            for polarization=1..(NPOL/2)
                write I, Q
Above, NDIM is the number of samples per channel in the block; i.e., BLOCSIZE/(OBSNCHAN*NPOL*(NBITS/8)). This count includes the overlap samples. For NBITS=8, the samples are signed char.
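To make the layout concrete, the following Python sketch pulls one channel out of a raw data segment that is already in memory as a bytes object. It handles only NBITS=8 and uses only the standard library. The function name extract_channel is made up for this example, and this is not how src/rg.c is written. It simply follows the loop order given in the pseudocode.

```python
from array import array


def extract_channel(block, channel, obsnchan, npol, nbits=8):
    """Return one list of complex samples per polarization.

    block is the raw data segment (bytes); channel is counted from 1.
    """
    if nbits != 8:
        raise ValueError("this sketch handles NBITS=8 only")
    npols = npol // 2                      # NPOL counts I and Q separately
    ndim = len(block) // (obsnchan * npol * (nbits // 8))
    values_per_channel = ndim * npol       # one byte per I or Q value
    start = (channel - 1) * values_per_channel
    raw = array("b", block[start:start + values_per_channel])  # signed char
    out = [[] for _ in range(npols)]
    for n in range(ndim):                  # sample index within the channel
        for p in range(npols):             # polarization index
            k = 2 * (n * npols + p)        # position of I; Q follows it
            out[p].append(complex(raw[k], raw[k + 1]))
    return out


# Tiny synthetic block: 2 channels, 2 polarizations, 3 samples per channel.
demo = array("b", range(-12, 12)).tobytes()
pol0, pol1 = extract_channel(demo, channel=2, obsnchan=2, npol=4)
print(pol0)  # [0j+..., ...] I/Q pairs for polarization 1 of channel 2
print(pol1)
```

For a full-size block, a pure-Python loop like this is slow. It is meant to show the indexing, not to be used for real processing.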
For the one and only dataset I've worked with so far (identified below):
OBSFREQ = 1378.125
OBSBW = -200
OBSNCHAN = 32
NPOL = 4
NBITS = 8
TBIN = 1.6E-07
CHAN_BW = -6.25
OVERLAP = 512
BLOCSIZE = 1073545216
In this case, NDIM = 8387072 and the time span covered by a raw data block is NDIM*TBIN = 1.3419 s. Keep in mind, however, that this 1.3419 s span overlaps with the next block by 512 samples.
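The arithmetic can be reproduced with a short Python sketch. The function name block_timing is made up for this example. The "new" span, (NDIM-OVERLAP)*TBIN, is my reading of what the OVERLAP definition implies for the time between the starts of consecutive blocks. It is not a figure taken from the file itself.

```python
def block_timing(blocsize, obsnchan, npol, nbits, tbin, overlap):
    """Return (ndim, full_span_s, new_span_s) for one raw data block."""
    # Same as BLOCSIZE/(OBSNCHAN*NPOL*(NBITS/8)), written in bits so that
    # the division stays in integers.
    ndim = (blocsize * 8) // (obsnchan * npol * nbits)
    full_span = ndim * tbin               # includes the overlapped samples
    new_span = (ndim - overlap) * tbin    # excludes the repeated samples
    return ndim, full_span, new_span


# Header values from the example dataset.
ndim, full_span, new_span = block_timing(
    blocsize=1073545216, obsnchan=32, npol=4, nbits=8,
    tbin=1.6e-07, overlap=512)
print(ndim)                 # 8387072
print(f"{full_span:.4f}")   # 1.3419
print(f"{new_span:.4f}")    # 1.3418
```

As a consistency check on the header, 1/TBIN = 6.25 MSPS, which matches the magnitude of CHAN_BW.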
As an example, src/rg.c is C source code which reads a single header + raw data block from a GUPPI raw data file, extracts one channel, and writes it back out as time and spectra. (See the source code for compiling instructions and usage.) The script src/a.sh runs "rg" repeatedly to obtain the output for all channels. src/a.gp is a Gnuplot script which reads these files and plots the entire bandpass, including all channels.
guppi.png is the output when the above code is applied to the file guppi_56465_J1713+0747_0006.0000.raw (NRAO folks: /lustre/pulsar/scratch/1713+0747_global/raw). For this particular dataset there is a large DC offset in each channel, which accounts for the spike in the center of each channel bandpass. In this output, channel 1 is on the right and channel 32 is on the left.
Thanks to Paul Demorest for helping me figure this out.