###
# The result below shows the sum of the population of all cities with a population greater than 1,000,000.
###
{
  Local {
    Get(where: {
      operands: [{
        path: ["Things", "City", "population"],
        operator: GreaterThan
        valueInt: 1000000
      }]
    }, group: {
      operands: [{
        path: ["Things", "City", "population"],
        aggregate: SUM # other options: COUNT, MAX, MIN, AVG
      }]
    }) {
      Things {
        City {
          population
        }
      }
    }
  }
}
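For reference, here is a minimal Python sketch of what this query is meant to compute: keep only the cities whose population is greater than 1,000,000, then sum the population of those cities. It is not Weaviate code. The `City` type, the `sum_population_above` helper and the example data are all hypothetical.

```python
from typing import TypedDict


class City(TypedDict):
    # Hypothetical shape of a City thing: only the fields used here.
    name: str
    population: int


def sum_population_above(cities: list[City], threshold: int) -> int:
    """Apply the where filter (population > threshold), then SUM population."""
    return sum(c["population"] for c in cities if c["population"] > threshold)


# Hypothetical example data, not from the original discussion.
cities: list[City] = [
    {"name": "city-x", "population": 500_000},
    {"name": "city-y", "population": 2_000_000},
    {"name": "city-z", "population": 3_500_000},
]

print(sum_population_above(cities, 1_000_000))  # 5500000
```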
To me these feel like 'statistical' functions, and not 'Get' functions.
I'd argue that a separate 'Aggregated' or 'Statistics' field under Local would do wonders for keeping the 'Get' function simple.
{
Local {
Aggregated() {
Things { City { .... } }
}
# or
Statistics(...) {
Things { City { .... } }
}
}
}
I imagine that such a field could be translated to Network queries too.
I find it hard to understand what these different aggregations are supposed to do, based on the GraphQL query.
I believe that we should distinguish between simple counts with conditions, and more complex operations like groupBy.
Each different function/operation should ideally correspond to a field below 'Aggregated' or 'Statistics'.
This would make it very simple for end users to start running some operations.
First impressions count, and easy initial exposure to simple statistics is very good for Weaviate's demo-ability.
Simple sum
{
Local {
Statistics(where: { ... }) {
Sum {
Things {
City {
population
}
}
}
}
}
}
Output
{ "Local": { "Statistics": { "Sum": { "Things": { "City": { "population": 42 } } } } } }
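The proposed output nests the aggregated value under the same path as the query's selection set. A small sketch of that envelope, assuming the where filter has already been applied; the `statistics_response` helper and the population values are hypothetical.

```python
import json


def statistics_response(function: str, thing_class: str, prop: str, value: object) -> dict:
    """Wrap one aggregated value in the nested shape proposed above:
    Local -> Statistics -> <function> -> Things -> <class> -> <property>."""
    return {"Local": {"Statistics": {function: {"Things": {thing_class: {prop: value}}}}}}


# Hypothetical population values of the cities that passed the where filter.
populations = [1_200_000, 2_000_000, 3_300_000]
response = statistics_response("Sum", "City", "population", sum(populations))
print(json.dumps(response))
# {"Local": {"Statistics": {"Sum": {"Things": {"City": {"population": 6500000}}}}}}
```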
95% percentile range
{
Local {
Statistics(where: { ... }) {
# Compute 95% range of data.
Percentile(from: 0.025, to: 0.975) {
Things {
City {
population
}
}
}
}
}
}
Output
{ "Local": { "Statistics": { "Percentile": { "Things": { "City": { "population": {
"min": 1000,
"max": 2000
}} } } } } }
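A central 95% range runs from the 2.5th to the 97.5th percentile, i.e. `from: 0.025` and `to: 0.975`; with a lower bound of 0.25 you would get the first quartile instead. The proposal does not say how percentiles would be interpolated, so the sketch below assumes linear interpolation between the two nearest ranks (the same rule as NumPy's default). The `quantile` and `percentile_range` helpers are hypothetical.

```python
import math
from typing import Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Quantile with linear interpolation between the closest ranks
    (same rule as NumPy's default 'linear' method)."""
    if not values:
        raise ValueError("quantile of empty data")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)  # fractional index into the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def percentile_range(values: Sequence[float], lower: float = 0.025, upper: float = 0.975) -> dict:
    """Return the {"min", "max"} pair used in the proposed Percentile output."""
    return {"min": quantile(values, lower), "max": quantile(values, upper)}


# Hypothetical population values: 0, 100, 200, ..., 10000.
populations = [i * 100 for i in range(101)]
print(percentile_range(populations))
```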
Group By
{
Local {
Statistics(where: { ... }) {
GroupBy() {
Things {
City {
country @groupBy(fn: GROUP_BY)
totalPopulation: population @groupBy(fn: SUM)
smallestCity: population @groupBy(fn: MIN)
biggestCity: population @groupBy(fn: MAX)
}
}
}
}
}
}
An extra advantage of doing this is that you can clearly argue that these are different functions, with different pricing than just slurping the data out of Weaviate or a network.
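The semantics of the GroupBy query above can be sketched as: bucket the cities by `country`, then compute SUM, MIN and MAX of `population` per bucket, returned under the query's aliases. This is a hypothetical in-memory illustration (the `group_by_country` helper and the data are made up), not a proposed implementation. Note that `smallestCity` and `biggestCity` end up holding population numbers, not city names.

```python
from collections import defaultdict


def group_by_country(cities: list[dict]) -> list[dict]:
    """Group cities by country and aggregate population per group,
    using the same aliases as the GroupBy query."""
    groups: dict[str, list[int]] = defaultdict(list)
    for city in cities:
        groups[city["country"]].append(city["population"])
    return [
        {
            "country": country,              # @groupBy(fn: GROUP_BY)
            "totalPopulation": sum(pops),    # @groupBy(fn: SUM)
            "smallestCity": min(pops),       # @groupBy(fn: MIN)
            "biggestCity": max(pops),        # @groupBy(fn: MAX)
        }
        for country, pops in sorted(groups.items())
    ]


# Hypothetical example data.
cities = [
    {"name": "city-x", "country": "NL", "population": 800_000},
    {"name": "city-y", "country": "NL", "population": 200_000},
    {"name": "city-z", "country": "BE", "population": 500_000},
]
for row in group_by_country(cities):
    print(row)
```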
That indeed sounds reasonable. I'm not necessarily in favor of one or the other, but syntactically it might well be preferable to introduce a Stats{} function.
Naming-wise, maybe Aggregate{} would suit better. Any thoughts, @laura-ham and @moretea?
It would also be possible to add all aggregation functions as GraphQL functions.
{
Local {
Aggregate(where: { ... }) { # or Stats...
Sum{}
Percentile{}
Count{}
Average{}
Maximum{}
Median{}
Minimum{}
Mode{}
GroupBy() {} # Would be used for more complex group-by functions.
}
}
}
@moretea, would it be fair to say that splitting out these aggregate functions (except for GroupBy()) would make them relatively easy to implement?
Maybe they are simple to implement. I would expect so, based on my experience with SQL. However, Gremlin is not as well-rounded, and I have not researched this for Gremlin yet.
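To make the "one field per aggregate" idea concrete, here is a hypothetical Python sketch that maps the proposed field names onto standard-library functions over a list of property values. It leaves out `Percentile` (which needs range arguments) and `GroupBy()`, and it says nothing about how hard the same would be in Gremlin; it only shows that, in memory, each of these aggregates is an independent one-liner. The `AGGREGATES` table and the `aggregate` helper are illustrative names, not an existing API.

```python
import statistics
from typing import Callable, Sequence

# Hypothetical mapping from the proposed Aggregate{} field names to
# plain in-memory implementations over a list of property values.
AGGREGATES: dict[str, Callable[[Sequence[float]], float]] = {
    "Sum": sum,
    "Count": len,
    "Average": statistics.mean,
    "Maximum": max,
    "Minimum": min,
    "Median": statistics.median,
    "Mode": statistics.mode,
}


def aggregate(fields: list[str], values: Sequence[float]) -> dict[str, float]:
    """Compute each requested aggregate field over the same values."""
    unknown = [f for f in fields if f not in AGGREGATES]
    if unknown:
        raise ValueError(f"unsupported aggregate(s): {unknown}")
    return {f: AGGREGATES[f](values) for f in fields}


populations = [100, 200, 200, 500]  # hypothetical values
print(aggregate(["Sum", "Count", "Average", "Median", "Mode"], populations))
```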
Sure @laura-ham,
Assume our DB has:
Option 1
The query below will result in:
{
  Local {
    Get(where: {
      operands: [{ path: ["Things", "City", "population"], operator: GreaterThan, valueInt: 1000000 }]
    }, group: {
      operands: [{
        path: ["Things", "City", "population"],
        aggregate: SUM # other options: COUNT, MAX, MIN, AVG
      }]
    }) {
      Things { City { population } }
    }
  }
}
Option 2
The query below will result in:
{
  Local {
    Get(where: {
      operands: [{ path: ["Things", "City", "population"], operator: GreaterThan, valueInt: 1000000 }]
    }, group: {
      operands: [{
        path: ["Things", "City", "population"],
        aggregate: SUM # other options: COUNT, MAX, MIN, AVG
      }]
    }) {
      Things { City { name population } }
    }
  }
}
Option 3
The query below will result in:
{
  Local {
    Get(where: {
      operands: [{ path: ["Things", "City", "population"], operator: GreaterThan, valueInt: 1000000 }]
    }, group: {
      operands: [{
        path: ["Things", "City", "population"],
        aggregate: COUNT # other options: SUM, MAX, MIN, AVG
      }]
    }) {
      Things { City { name } }
    }
  }
}
Option 4
The query below will result in:
{
  Local {
    Get(where: {
      operands: [{ path: ["Things", "City", "population"], operator: GreaterThan, valueInt: 1000000 }]
    }, group: {
      operands: [{
        path: ["Things", "City", "population"],
        aggregate: COUNT # other options: SUM, MAX, MIN, AVG
      }]
    }) {
      Things { City { population } }
    }
  }
}
Option 5
Change in the DB
city-c's)
The query below will result in:
{
  Local {
    Get(where: {
      operands: [{ path: ["Things", "City", "population"], operator: GreaterThan, valueInt: 1000000 }]
    }, group: {
      operands: [{
        path: ["Things", "City", "population"],
        aggregate: SUM # other options: COUNT, MAX, MIN, AVG
      }]
    }) {
      Things { City { name population } }
    }
  }
}