###
# The result below shows the sum of the population of all cities with more than 1,000,000 inhabitants.
###
{
  Local {
    Get(where:{
      operands: [{
        path: ["Things", "City", "population"],
        operator: GreaterThan
        valueInt: 1000000
      }]
    },{
      group:{
        operands: [{
          path: ["Things", "City", "population"],
          aggregate: SUM # other options: COUNT, MAX, MIN, AVG
        }]
      }
    }) {
      Things {
        City {
          population
        }
      }
    }
  }
}
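To make the intent of the query above concrete, here is a minimal Python sketch of the same filter-then-sum logic over in-memory data. The city records and the `sum_population_above` helper are hypothetical and only illustrate the semantics; nothing here is part of Weaviate.

```python
# Hypothetical sample data; the names and numbers are made up for illustration.
cities = [
    {"name": "A", "population": 1_800_000},
    {"name": "B", "population": 400_000},
    {"name": "C", "population": 2_100_000},
]


def sum_population_above(records, threshold):
    """Mimic the `where` filter (population > threshold), then apply SUM."""
    return sum(r["population"] for r in records if r["population"] > threshold)


total = sum_population_above(cities, 1_000_000)
print(total)
```

City "B" falls below the threshold, so only "A" and "C" contribute to the total.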
An extra advantage of doing this is that you'll be able to clearly defend that these are different functions, with different pricing, than just slurping the data out of Weaviate or a network.
That indeed sounds reasonable. I'm not necessarily in favor of one or the other, but syntactically it might well be preferable to introduce a Stats{} function.
Naming-wise, maybe Aggregate{} would suit better. Any thoughts @laura-ham and @moretea?
It would also be possible to add all aggregation functions as GQL-functions.
{
  Local {
    Aggregate(where: { ... }) { # or Stats...
      Sum{}
      Percentile{}
      Count{}
      Average{}
      Maximum{}
      Median{}
      Minimum{}
      Mode{}
      GroupBy() {} # Would be used for more complex group-by functions.
    }
  }
}
@moretea would it be fair to say that splitting out these aggregate functions (except for GroupBy()) would make them relatively easy to implement?
Maybe they are simple to implement; I would expect so, based on my experience with SQL. However, Gremlin is not as well rounded, and I have not researched this for Gremlin yet.
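As a rough illustration of why the simple aggregates are expected to be easy, each one is a single reduction over a list of values. The sketch below maps the proposed field names to plain Python functions; the `SIMPLE_AGGREGATES` table, the `run_aggregates` helper, and the sample values are all hypothetical, and this says nothing about how hard the Gremlin translation would be.

```python
import statistics

# Hypothetical mapping from the proposed field names to one-line reductions.
SIMPLE_AGGREGATES = {
    "Sum": sum,
    "Count": len,
    "Average": statistics.mean,
    "Maximum": max,
    "Median": statistics.median,
    "Minimum": min,
    "Mode": statistics.mode,
}


def run_aggregates(values, requested):
    """Evaluate each requested aggregate over the same list of values."""
    return {name: SIMPLE_AGGREGATES[name](values) for name in requested}


# Made-up population values.
populations = [500_000, 1_200_000, 1_200_000, 3_000_000]
result = run_aggregates(populations, ["Sum", "Count", "Median", "Mode"])
print(result)
```

GroupBy() does not fit this table because it first has to partition the records, which is why it is treated separately.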
To me these feel like 'statistical' functions, and not 'Get' functions.
I'd argue that a separate 'Aggregated' or 'Statistics' field under Local would do wonders for keeping the 'Get' function simple.
{
  Local {
    Aggregated() {
      Things {
        City {
          ....
        }
      }
    }
    # or
    Statistics(...) {
      Things {
        City {
          ....
        }
      }
    }
  }
}
I imagine that such a field could be translated to Network queries too.
I find it hard to understand what these different aggregations are supposed to do, based on the GraphQL query.
I believe that we should distinguish between simple counts with conditions and more complex operations like groupBy.
Each different function/operation should ideally correspond to a field below 'Aggregated' or 'Statistics'.
This will make it very simple for end users to start doing some operations.
Initial impressions count, and an initial exposure to simple statistics is very good for the demo-ability of Weaviate.
Simple sum
{
  Local {
    Statistics(where: { ... }) {
      Sum {
        Things {
          City {
            population
          }
        }
      }
    }
  }
}
Output
95% percentile
{
  Local {
    Statistics(where: { ... }) {
      # Compute 95% range of data.
      Percentile(from: 0.025, to: 0.975) {
        Things {
          City {
            population
          }
        }
      }
    }
  }
}
Output
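A sketch of what a 95% range could mean in practice, assuming the intended bounds are the 2.5th and 97.5th percentiles and assuming Percentile keeps the values that fall between them (the proposal does not spell this out). The `percentile` and `percentile_range` helpers and the sample data are hypothetical; linear interpolation is just one of several possible percentile definitions.

```python
def percentile(sorted_values, q):
    """Linear-interpolation quantile for q in [0, 1] over a sorted list."""
    if not sorted_values:
        raise ValueError("cannot take a percentile of an empty list")
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def percentile_range(values, start, end):
    """Keep only the values between the start and end percentiles (inclusive)."""
    ordered = sorted(values)
    low = percentile(ordered, start)
    high = percentile(ordered, end)
    return [v for v in ordered if low <= v <= high]


# Made-up values 1..101 so the cut-offs are easy to follow.
populations = list(range(1, 102))
kept = percentile_range(populations, 0.025, 0.975)
print(len(kept), kept[0], kept[-1])
```

With these 101 evenly spaced values the cut-offs land between data points, so the three smallest and three largest values are dropped.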
Group By
{
  Local {
    Statistics(where: { ... }) {
      GroupBy() {
        Things {
          City {
            country @groupBy(fn: GROUP_BY)
            totalPopulation: population @groupBy(fn: SUM)
            smallestCity: population @groupBy(fn: MIN)
            biggestCity: population @groupBy(fn: MAX)
          }
        }
      }
    }
  }
}
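For comparison, the result such a GroupBy query might be expected to produce can be sketched in Python: partition the cities by country, then apply SUM, MIN and MAX to each partition. The `group_by_country` helper and the records are hypothetical (the population figures are made up), and the output keys simply reuse the aliases from the query.

```python
from collections import defaultdict

# Made-up records; country codes and populations are illustrative only.
cities = [
    {"country": "NL", "population": 850_000},
    {"country": "NL", "population": 650_000},
    {"country": "DE", "population": 3_600_000},
    {"country": "DE", "population": 1_800_000},
]


def group_by_country(records):
    """Partition populations by country, then aggregate each partition."""
    groups = defaultdict(list)
    for record in records:
        groups[record["country"]].append(record["population"])
    return {
        country: {
            "totalPopulation": sum(populations),
            "smallestCity": min(populations),
            "biggestCity": max(populations),
        }
        for country, populations in groups.items()
    }


stats = group_by_country(cities)
print(stats)
```

Unlike the simple aggregates, this needs a partitioning step before any reduction runs, which matches the distinction drawn earlier in the thread.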