session_analytics
=================

This PostgreSQL extension provides some functions for querying hstore arrays.

Functions
---------
 * **count\_elements(elements hstore[], pattern text)**. returns the number of
   elements that match the given pattern.
 * **multi\_count\_elements(elements hstore[], patterns text[])**. returns an array
   of integers which contains number of elements that match the corresponding
   pattern. This is equivalent of multiple calls of count\_elements, but is usually
   faster.
 * **sum(int[])**. aggregate function for summing vectors. This is useful for
   aggregating result of multi\_count\_elements.
 * **filter\_elements(elements hstore[], pattern text)**. returns the subsequence
   of elements that match the given pattern.
 * **contains\_elements(elements hstore[], patterns text[])**. returns 1 if we
   can find a subsequence of elements such that they match the corresponding
   patterns. Otherwise, it returns 0.
 * **funnel\_events(elements hstore[], patterns text[])**. returns an array with
   length equal to length of patterns. if patterns[1..i] matches a subesequence of
   patterns, then result[i] is 1, otherwise result[i] is 0.
 * **group\_elements(elements hstore[], fields text[])**. groups the elements
   by the value of the given fields, and returns a row per each group. Each group
   is tuple (keys text[], elements hstore[]).
 * **group\_over\_time(elements hstore[], fields text[], time\_field text,
   truncation\_unit text, client\_offset interval)**. this function is similar to
   group\_elements, but also adds the value of the given time field to group
   keys. The time field is shifted by the client offset and truncated by the given
   unit before being add to the group keys.
 * **filter\_groups(group group_elements_result, pattern text)**. filters out the
   given group if it doesn't match the given pattern.


Deprecated Functions
--------------------
 * **cohort\_over\_time(elements hstore[], fields text[], time_field text,
   truncation\_unit text, client\_offset interval, patterns text[],
   min\_sizes int[])**. similar to group\_over\_time, but filters out the groups
   who doesn't have minimum count of elements matching the given patterns.
   This function now can be replaced with a nested call to filter\_groups and
   group\_over\_time.


Pattern Language
---------------
The pattern language below is used to match hstore objects using field based expressions 
that can be combined using AND (",") and OR ("|") operators to create complex 
expressions:
 * **field\_name => value**. Is value of field\_name equal to the given value?
 * **field\_name =>not value**. Is value of field\_name not equal to the given value?
 * **field\_name =>gt value**, **field\_name =>gte value**,
   **field\_name =>lt value**, **field\_name =>lte value**. int64 comparison between
   value of field\_name and given value.
 * **field\_name =>contains value**. Is given value substring of the value of
   field\_name?
 * **field\_name =>not_contains value**. Opposite of =>contains.
 * **field\_name =>regex pattern**. Does value of field\_name match the given regex
   pattern?
 * **field\_name =>like pattern**. Does value of field\_name match the given like
   pattern?
 * **field\_name =>ilike pattern**. Similar to =>like, but case insensitive.
 * **A , B**. Does both A and B hold?
 * **A | B**. Does A or B hold?
 * **(A)**. Parentheses for arbitrary mixing AND and OR operators.
 * values can be escaped and quoted.

The above pattern language is extended for filter\_groups to support aggregate
functions:

 * You can use **function(pattern)** in place of field\_name in any of the above
   patterns.
 * **count(pattern)**. number of elements matching the pattern.
 * **sum(pattern;field\_name)**. sum of field\_name in elements matching the pattern.
 * **max(pattern;field\_name)**. max of field\_name in elements matching the pattern.
 * **min(pattern;field\_name)**. min of field\_name in elements matching the pattern.
 * **avg(pattern;field\_name)**. integer part of average of field\_name in elements
   matching the pattern.

Examples
--------
These examples are based on a database where you have a schema like the following:

```
┌─────────────┬──────────┬───────────┐
│   Column    │   Type   │ Modifiers │
├─────────────┼──────────┼───────────┤
│ cust_id     │ bigint   │           │
│ cust_events │ hstore[] │           │
└─────────────┴──────────┴───────────┘
```

You can think of element in the cust_events array as representing a single event that a
customer generated. An example event might look like:

```
{
    client      => iphone-32.2.1a,
    page_class  => message,
    action      => send,
    time        => 1398363313
}
```

## Count occurences of certain types of events
Here is an example where you are counting the total number of message page views:

```SQL
SELECT
    sum(count_elements(cust_events, 'action=>view,page_class=>message'))
FROM
    cust_events;
```

## Count occurences of certain types of events grouped by additional event metadata
Here is an example of counting the types of clients cucstomers use to view or list
messages. Note the use of the '|' and ',' operators to do OR and AND operations and 
also the use of '(...)' to group expressions.

```SQL
SELECT
    count(*),
    unnest(
        filter_elements(cust_events, '(action=>view,page_class=>message)|(action=>list,page_class=>message)')
    )->'client' as client,
FROM
    cust_events
GROUP BY
    client;
```

Another way to ask the same question illustrating how filters can be chained:
```SQL
SELECT
    count(*),
    unnest(
        filter_elements(
            filter_elements(cust_events, 'page_class=>message'),
            'action=>list|action=>view'
        )
    )->'client' as client,
FROM
    cust_events
GROUP BY
    client;

```

## Count customers who performaned certain actions over their lifetime
Here's an example illustrating the use of the count_elements function to count the number
of users who repeatedly viewed a message:

```SQL
SELECT
    count(*)
FROM
    cust_events
WHERE
    count_elements(cust_events, 'action=>view,page_class=>message') > 2;
```

## Listing customers who performed a sequence of actions
Say you run a retail application and want to list users who searched for a product, viewed
it, and went on to purchase:
```SQL
SELECT
    user_id
FROM
    cust_events
WHERE
    contains_elements(cust_events, 
        ARRAY['action=>search', 'action=>view,page_class=>product', 'action=>purchase']
    );
```

## Generating funnel queries
Now imagine an application similar to the one above but instead of a list of users you want
to know the counts of users that moved through various stages of your funnel:
```SQL
SELECT
    sum(contains_elements(cust_events, ARRAY['action=>search']) as search,
    sum(contains_elements(cust_events, ARRAY['action=>search', 
                                             'action=>view,page_class=>product']) as view,
    sum(contains_elements(cust_events, ARRAY['action=>search', 
                                             'action=>view,page_class=>product', 
                                             'action=>purchase']) as purchase
FROM
    cust_events;
```

Adding New Operators
--------------------
To add new operators:

1. add the operator to the TOKENIZATION_REGEX,
2. add it to the list of token string pound-definitions,
3. add it to the OperatorCode enum,
4. add it to the OperatorTokens mapping,
5. For string operators, modify StringOperator()/StringOperatorResult()
   functions.
6. For numeric operators, modify NumericOperator()/NumericOperatorResult()
   functions.


Copyright
---------
Copyright 2014-2016 Citus Data.

Licensed under the Apache License, Version 2.0.