

@tstellanova
Last active July 8, 2024 17:59
Notes on the GUPPI Raw Data Format (from S. Ellingson)

S. Ellingson, Nov 1, 2013

This format consists of blocks, with each block consisting of a text header and a raw binary data segment.

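An example of a header is shown in the file "header_example.txt". The header is ASCII text made up of 80-character "KEYWORD = value" records, in the style of FITS header cards, and it ends with a record containing the keyword "END". Some important fields are: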

  • OBSFREQ: [MHz] center of the RF passband
  • OBSBW: [MHz] width of passband; negative sign indicates spectral flip
  • OBSNCHAN: Number of channels (subbands)
  • NPOL: Number of polarizations times 2. For example, NPOL=4 means 2 polarizations.
  • NBITS: Number of bits per I or Q value, so one complex-valued sample occupies 2*NBITS bits.
  • TBIN: [s] sample period within a channel
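  • CHAN_BW: [MSPS] sample rate for a channel, so |CHAN_BW| = 1e-6/TBIN; because the samples are complex, this also equals the channel bandwidth in MHz. Negative sign indicates spectral flip.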
  • OVERLAP: This many samples per subband from the previous data block are repeated at the beginning of this data block.
  • BLOCSIZE: The size of the raw data segment in bytes.
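A minimal C sketch of reading one of these fields from a header that has already been loaded into memory. It assumes the header is a sequence of 80-character "KEYWORD = value" records with the "=" in column 9, as in FITS headers; check this against header_example.txt. The helper names `card_keyword_is` and `header_get_double` are hypothetical and are not taken from rg.c.

```c
#include <stdlib.h>
#include <string.h>

#define CARD_LEN 80 /* assumed length of one header record */

/* Return 1 if the 8-character keyword field of this card is exactly `key`
 * (blank-padded), else 0. */
static int card_keyword_is(const char *card, const char *key)
{
    size_t n = strlen(key);
    if (n > 8 || strncmp(card, key, n) != 0)
        return 0;
    for (size_t i = n; i < 8; i++)
        if (card[i] != ' ')
            return 0;
    return 1;
}

/* Look up `key` in the header `hdr` (hdr_len bytes, not NUL-terminated).
 * On success store its numeric value in *out and return 1; return 0 if
 * the key is absent before "END" or its value is not numeric. */
static int header_get_double(const char *hdr, size_t hdr_len,
                             const char *key, double *out)
{
    char buf[CARD_LEN + 1];
    for (size_t pos = 0; pos + CARD_LEN <= hdr_len; pos += CARD_LEN) {
        const char *card = hdr + pos;
        if (card_keyword_is(card, "END"))
            return 0;                     /* end of header reached */
        if (!card_keyword_is(card, key))
            continue;
        memcpy(buf, card, CARD_LEN);      /* make a NUL-terminated copy */
        buf[CARD_LEN] = '\0';
        if (buf[8] != '=')
            return 0;                     /* not a KEYWORD = value card */
        char *end;
        double v = strtod(buf + 9, &end); /* strtod skips leading blanks */
        if (end == buf + 9)
            return 0;                     /* e.g. a quoted string value */
        *out = v;
        return 1;
    }
    return 0;
}
```

Integer fields such as OBSNCHAN or BLOCSIZE can be read the same way and cast; a value the size of BLOCSIZE (about 1.07e9) is represented exactly in a double.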

The center frequency of channel i (where i is in [1..OBSNCHAN]) is OBSFREQ - OBSBW/2 + (i-0.5)*CHAN_BW [MHz].
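The same formula as a small C function (the name `chan_center_mhz` is hypothetical). With the dataset values listed below (OBSFREQ = 1378.125, OBSBW = -200, CHAN_BW = -6.25), it places channel 1 at 1475.0 MHz and channel 32 at 1281.25 MHz; because CHAN_BW is negative, frequency decreases as the channel index increases.

```c
/* Center frequency [MHz] of channel i, 1 <= i <= OBSNCHAN:
 * OBSFREQ - OBSBW/2 + (i - 0.5) * CHAN_BW. */
static double chan_center_mhz(double obsfreq, double obsbw,
                              double chan_bw, int i)
{
    return obsfreq - obsbw / 2.0 + (i - 0.5) * chan_bw;
}
```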

The raw data segment is ordered as follows (pseudocode; the innermost loop varies fastest):

for channel=1..OBSNCHAN,
  for sample=1..NDIM,
    for polarization=1..(NPOL/2),
      write I, Q

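Above, NDIM is the number of time samples per channel in the block, i.e., BLOCSIZE/(OBSNCHAN*NPOL*NBITS/8), where NPOL*NBITS/8 is the number of bytes per time sample per channel. NDIM includes the overlap samples. For NBITS=8, each I and Q value is a signed 8-bit integer (signed char).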
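A C sketch of the layout arithmetic, using 0-based indices (the pseudocode above is 1-based); `block_ndim` and `sample_offset` are hypothetical names. NDIM is computed as BLOCSIZE*8/(OBSNCHAN*NPOL*NBITS), which equals the formula above for NBITS = 8 and avoids the integer division NBITS/8 becoming 0 for sub-byte sample widths; `sample_offset`, however, assumes 8-bit samples.

```c
#include <stddef.h>

/* Time samples per channel in one block, overlap included. */
static long long block_ndim(long long blocsize, int obsnchan, int npol,
                            int nbits)
{
    return blocsize * 8 / ((long long)obsnchan * npol * nbits);
}

/* Byte offset of the I value of sample (chan, t, pol) within the raw data
 * segment, for NBITS = 8; the Q value is the next byte. Channel varies
 * slowest, then time, then polarization, matching the loop order above. */
static size_t sample_offset(int chan, long long t, int pol,
                            long long ndim, int npol)
{
    long long npol_real = npol / 2;   /* NPOL = 2 x number of polarizations */
    long long idx = ((long long)chan * ndim + t) * npol_real + pol;
    return (size_t)(idx * 2);         /* 2 bytes (I, Q) per complex sample */
}
```

With the example dataset, `block_ndim(1073545216, 32, 4, 8)` gives 8387072. The bytes at an offset are read as signed values, e.g. `(int8_t)data[off]` for I and `(int8_t)data[off + 1]` for Q.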

For the one and only dataset I've worked with so far (identified below):

OBSFREQ = 1378.125
OBSBW = -200
OBSNCHAN = 32
NPOL = 4
NBITS = 8
TBIN = 1.6E-07
CHAN_BW = -6.25
OVERLAP = 512
BLOCSIZE = 1073545216

In this case, NDIM = 8387072 and the time span covered by a raw data block is NDIM*TBIN = 1.3419 s. Keep in mind, however, that consecutive blocks overlap: the last 512 samples of this span (512*TBIN = 81.92 µs) are repeated at the beginning of the next block.
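The timing arithmetic in C, as a sketch with hypothetical function names. It also checks that TBIN is consistent with CHAN_BW: 1/(6.25 × 10^6 samples/s) = 1.6 × 10^-7 s. Because each block begins with OVERLAP samples repeated from its predecessor, consecutive block start times are (NDIM - OVERLAP)*TBIN = 8386560 × 1.6e-7 = 1.3418496 s apart for this dataset.

```c
#include <math.h>

/* Sample period [s] implied by CHAN_BW [MSPS]; should match TBIN. */
static double tbin_from_chan_bw(double chan_bw_msps)
{
    return 1.0 / (fabs(chan_bw_msps) * 1e6);
}

/* Duration [s] of n consecutive samples in one channel. */
static double samples_to_s(long long n, double tbin)
{
    return (double)n * tbin;
}

/* Start time [s] of block k (0-based) relative to block 0. Each block
 * begins with OVERLAP samples repeated from the previous block, so the
 * start advances by NDIM - OVERLAP samples per block. */
static double block_start_s(long long k, long long ndim, long long overlap,
                            double tbin)
{
    return samples_to_s(k * (ndim - overlap), tbin);
}
```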

As an example, src/rg.c is C source code which reads a single header + raw data block from a GUPPI raw data file, extracts one channel, and writes that channel out both as a time series and as spectra. (See the source code for compiling instructions and usage.) The script src/a.sh runs "rg" repeatedly to obtain the output for all channels, and src/a.gp is a Gnuplot script which reads these files and plots the entire bandpass, including all channels.
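A sketch of the kind of channel extraction rg.c performs, written independently: it is not the code in rg.c, and `extract_channel` is a hypothetical name. It copies one polarization of one channel out of an NBITS = 8 data segment into float arrays. The `first` parameter is an addition for stitching consecutive blocks: passing 0 for the first block and OVERLAP for later ones drops the repeated samples.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy samples t = first..ndim-1 of channel `chan`, polarization `pol`
 * (both 0-based) into re[] and im[]; returns the number written. */
static long long extract_channel(const int8_t *block, long long ndim,
                                 int npol, int chan, int pol, long long first,
                                 float *re, float *im)
{
    int npol_real = npol / 2;   /* NPOL = 2 x number of polarizations */
    /* Each channel occupies NDIM * NPOL bytes when NBITS = 8. */
    const int8_t *base = block + (size_t)chan * (size_t)ndim * (size_t)npol;
    long long n = 0;
    for (long long t = first; t < ndim; t++, n++) {
        const int8_t *s = base + (t * npol_real + pol) * 2; /* I, then Q */
        re[n] = (float)s[0];
        im[n] = (float)s[1];
    }
    return n;
}
```

Since rg.c processes a single block, the overlap only matters when several blocks are concatenated into one time series.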

guppi.png is the output when the above code is applied to the file guppi_56465_J1713+0747_0006.0000.raw (NRAO folks: /lustre/pulsar/scratch/1713+0747_global/raw). For this particular dataset there is a large DC offset in each channel, which accounts for the spike in the center of each channel bandpass. In this output, channel 1 is on the right and channel 32 is on the left.

Thanks to Paul Demorest for helping me figure this out.
