Jimmy jimsrc

Overview

I just read about this trick for text compression. It saves tokens in subsequent interactions during a long conversation, or when a long text is passed later to be summarized.

SHORT VERSION:

It's useful to first give the model a mapping from common words (or phrases) in a long text that one intends to pass later to shorter codes. Then pass that long text to GPT-4, encoded with that mapping. The idea is that the encoded version contains fewer tokens than the original text. Several techniques can help identify frequent or important words and phrases in a given text, such as named-entity recognition (NER), TF-IDF, part-of-speech (POS) tagging, etc.
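A minimal sketch of the mapping idea, assuming plain word counts stand in for NER/TF-IDF/POS tagging. The function names (`build_mapping`, `encode`, `decode`) and the `@0`, `@1`, ... code format are hypothetical choices for illustration. The sketch compares character counts only; whether the token count actually drops depends on the tokenizer, so that should be measured separately.

```python
import re
from collections import Counter


def build_mapping(text, top_n=5, min_len=6):
    """Map the most frequent long words to short codes such as '@0', '@1'."""
    words = re.findall(r"[A-Za-z]+", text.lower())
    counts = Counter(w for w in words if len(w) >= min_len)
    return {word: f"@{i}" for i, (word, _) in enumerate(counts.most_common(top_n))}


def encode(text, mapping):
    """Replace every mapped word (case-insensitive) by its short code."""
    if not mapping:
        return text
    alternatives = "|".join(re.escape(w) for w in sorted(mapping, key=len, reverse=True))
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: mapping[m.group(0).lower()], text)


def decode(text, mapping):
    """Invert encode(). Original capitalization is not restored.

    Assumes the original text contains no '@<digits>' sequences itself.
    """
    reverse = {code: word for word, code in mapping.items()}
    if not reverse:
        return text
    # Longest codes first, so '@10' is not read as '@1' followed by '0'.
    alternatives = "|".join(re.escape(c) for c in sorted(reverse, key=len, reverse=True))
    return re.sub(alternatives, lambda m: reverse[m.group(0)], text)


sample = ("The magnetosphere shields Earth. The magnetosphere deflects "
          "particles. Particles spiral.")
mapping = build_mapping(sample, top_n=2)
encoded = encode(sample, mapping)
print(mapping)
print(encoded)
print(len(sample), "->", len(encoded), "characters")
```

In practice the mapping would be sent once at the start of the conversation, followed by the encoded text.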

Run for example:

./build_long.prof.py -- --years 2006 2015  --fname_inp ./test_iv.h5  --nbin 1200  --obs wAoP_wPrs  -fig ./test.png

where:

wAoP_wPrs (the value passed to --obs) is one of the available fields. Others are: wAoP (i.e. only AoP correction), wAoP_wPrs_wGh (with AoP, pressure, and geopotential height), aop (AoP values), etc.
-fig refers to the filename of the output figure.

run

To build temperature-corrected histograms:

./build_temp.corr.py -- -o ../out/out.build_temp.corr/shape.ok_and_3pmt.ok/15min/test.h5 
# see the remaining options and their default values:
./build_temp.corr.py -- -h

Cython example of exposing C-computed arrays in Python without data copies

The goal of this example is to show how an existing C codebase for numerical computing (here c_code.c) can be wrapped in Cython to be exposed in Python.

The meat of the example is that the data is allocated in C, but exposed in Python without a copy using the PyArray_SimpleNewFromData numpy function.
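The zero-copy behaviour can be illustrated without compiling anything. The sketch below is not the Cython wrapper from this example: it swaps in `ctypes` for the C side and `np.ctypeslib.as_array` in place of PyArray_SimpleNewFromData, purely to show what "exposed without a copy" means. The names `c_buf` and `arr` are hypothetical.

```python
import ctypes

import numpy as np

n = 5
# A C array of doubles, standing in for memory allocated by the C code.
c_buf = (ctypes.c_double * n)(*[float(i) for i in range(n)])

# Wrap the C buffer as a NumPy array; the data is shared, not copied.
arr = np.ctypeslib.as_array(c_buf)

# A write through either object is visible through the other.
arr[0] = 42.0
c_buf[1] = -1.0
print(c_buf[0], arr[1])
```

As with any array that wraps memory it does not own, the underlying C buffer has to stay valid for as long as the array is in use.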

	% Definitions for the journal names
	\newcommand{\adv}{{\it Adv. Space Res.}}
	\newcommand{\ag}{{\it Ann. Geophys.}}
	\newcommand{\annG}{{\it Ann. Geophys.}}
	\newcommand{\arxiv}{{\it ArXiv e-prints}}
	\newcommand{\aap}{{\it Astron. Astrophys.}}
	%\newcommand{\aaps}{{\it Astron. Astrophys. Suppl.}}
	%\newcommand{\aapr}{{\it Astron. Astrophys. Rev.}}
	%\newcommand{\aj}{{\it Astronom. J.}}
	%\newcommand{\an}{{\it Astronomische Nachrichten}}