Aggregation
Aggregation
Molecule implements common aggregation functions that work for all supported databases.
Available aggregate functions
All cardinality-one attribute types can be called with the following aggregation functions that either return a single value or a set of values:
Single value: Multiple values:
------------- -----------------------------------------
min min(n) // returns n lowest values
max max(n) // returns n highest values
sample samples(n) // returns n random values
count distinct // returns all distinct values
countDistinctTwo exceptions exist:
Enumtype not supported to be aggregated.Booleanvalues can be aggregatd withcountandcountDistinctonly.
Furthermore, numbers can be aggregated with the following standard math functions:
sum
avg
median
variance
stddevNumbers types:
Byte Short
Int Float
Long Double
BigInt BigDecimal6 operations on aggregated values
Aggregated values can be compared to a constant value using the following operators:
Customer.id(count)(5).country // Countries with 5 customers (equality)
Customer.id(count).not(5).country // Countries not with 5 customers (negation)
Customer.id(count).<(5).country // Countries with less than 5 customers
Customer.id(count).>(5).country // Countries with more than 5 customers
Customer.id(count).<=(5).country // Countries with maximum 5 customers
Customer.id(count).>=(5).country // Countries with minimum 5 customersSorting of aggregated values
Aggregated value can be sorted ascending (a1) or descending (d1):
// (sorting a matching single value doesn't make sense)
// Countries not with 5 customers, descending customer count
Customer.id(count).not(5).d1.country
// Countries with less than 5 customers, ascending customer count
Customer.id(count).<(5).a1.country
// Countries with more than 5 customers, descending customer count
Customer.id(count).>(5).d1.country
// Countries with maximum 5 customers, ascending customer count
Customer.id(count).<=(5).a1.country
// Countries with minimum 5 customers, descending customer count
Customer.id(count).>=(5).d1.countryFor all types
Let's examine the aggregate functions one by one using the following data set:
Person.firstName.lastName.age.insert(
("Bob", "Johnson", 23),
("Liz", "Benson", 24),
("Liz", "Murray", 24),
("Liz", "Taylor", 25)
).transactmin/max
Apply min or max to any (cardinality-one) attribute to return its lowest/highest value.
Person.age(min).query.get.head ==> 23
Person.age(max).query.get.head ==> 25
Person.lastName(min).query.get.head ==> "Benson"
Person.lastName(max).query.get.head ==> "Taylor"Get min/max n values:
// 2 lowest/highest ages
Person.age(min(2)).query.get.head ==> Set(23, 24)
Person.age(max(2)).query.get.head ==> Set(24, 25)
// 2 firstly/lastly ordered lastNames
Person.lastName(min(2)).query.get.head ==> Set("Benson", "Johnson")
Person.lastName(max(2)).query.get.head ==> Set("Murray", "Taylor")Notice how the return type changes from <basetype to Set[<basetype>].
sample
Return a random saved value
// Multiple invocations, random value
Person.age(sample).query.get.head ==> 24
Person.age(sample).query.get.head ==> 23
Person.age(sample).query.get.head ==> 24Return n random values
Person.age(sample(2)).query.get.head ==> Set(24, 25)
Person.age(sample(2)).query.get.head ==> Set(25, 23)
Person.age(sample(2)).query.get.head ==> Set(23, 24)count
Count all persons with an age value:
Person.age(count).query.get ==> List(4)Ages per person name:
Person.firstName.age(count).query.get ==> List(
("Bob", 1),
("Liz", 3), // 24, 24, 25
)Ages per person name if more than 1 age:
Person.firstName.age(count).>(1).query.get ==> List(
("Liz", 3), // 24, 24, 25
)Occurences of each age:
Person.age.age(count).query.get ==> List(
(23, 1),
(24, 2),
(25, 1),
)countDistinct
Count all persons with a distinct age value:
Person.age(count).query.get ==> List(4)Distinct ages per person name:
Person.firstName.age(countDistinct).query.get ==> List(
("Bob", 1),
("Liz", 2), // 24, 25
)Distinct ages per person name if more than 1 age:
Person.firstName.age(countDistinct).>(1).query.get ==> List(
("Liz", 2), // 24, 25
)Distinct cccurences of each age (will always be 1):
Person.age.age(countDistinct).query.get ==> List(
(23, 1),
(24, 1), // 24 and 24 coalesced to one distinct value
(25, 1),
)distinct
Return distinct values in a Set
// Distinct ages by firstName
Person.firstName.age(distinct).query.get ==> List(
("Bob", Set(23)),
("Liz", Set(24, 25))
)
// Distinct ages
Person.age(distinct).query.get ==> List(
Set(23, 24, 25)
)Notice how applying the distinct keyword changes the return type from Int to Set[Int], or more generally from <basetype to Set[<basetype>].
For numbers
sum
Person.firstName.age(sum).query.get ==> List(
("Bob", 23),
("Liz", 24 + 24 + 25),
)
Person.age(sum).query.get.head ==> (23 + 24 + 24 + 25)avg
Person.firstName.age(avg).query.get ==> List(
("Bob", 23),
("Liz", (24 + 24 + 25) / 3.0), // compare double value
)
Person.age(avg).query.get.head ==> (23 + 24 + 24 + 25) / 4.0median
Person.firstName.age(median).query.get ==> List(
("Bob", 23),
("Liz", 24),
)
Person.age(median).query.get.head ==> 24variance
Person.age(variance) // variance of agesstddev
Person.age(stddev) // standard deviation of ages