Aggregation
Aggregation
Molecule implements common aggregation functions that work for all supported databases.
Available aggregate functions
All cardinality-one attribute types can be called with the following aggregation functions that either return a single value or a set of values:
Single value: Multiple values:
------------- -----------------------------------------
min min(n) // returns n lowest values
max max(n) // returns n highest values
sample samples(n) // returns n random values
count distinct // returns all distinct values
countDistinct
Two exceptions exist:
Enum
type not supported to be aggregated.Boolean
values can be aggregatd withcount
andcountDistinct
only.
Furthermore, numbers can be aggregated with the following standard math functions:
sum
avg
median
variance
stddev
Numbers types:
Byte Short
Int Float
Long Double
BigInt BigDecimal
6 operations on aggregated values
Aggregated values can be compared to a constant value using the following operators:
Customer.id(count)(5).country // Countries with 5 customers (equality)
Customer.id(count).not(5).country // Countries not with 5 customers (negation)
Customer.id(count).<(5).country // Countries with less than 5 customers
Customer.id(count).>(5).country // Countries with more than 5 customers
Customer.id(count).<=(5).country // Countries with maximum 5 customers
Customer.id(count).>=(5).country // Countries with minimum 5 customers
Sorting of aggregated values
Aggregated value can be sorted ascending (a1
) or descending (d1
):
// (sorting a matching single value doesn't make sense)
// Countries not with 5 customers, descending customer count
Customer.id(count).not(5).d1.country
// Countries with less than 5 customers, ascending customer count
Customer.id(count).<(5).a1.country
// Countries with more than 5 customers, descending customer count
Customer.id(count).>(5).d1.country
// Countries with maximum 5 customers, ascending customer count
Customer.id(count).<=(5).a1.country
// Countries with minimum 5 customers, descending customer count
Customer.id(count).>=(5).d1.country
For all types
Let's examine the aggregate functions one by one using the following data set:
Person.firstName.lastName.age.insert(
("Bob", "Johnson", 23),
("Liz", "Benson", 24),
("Liz", "Murray", 24),
("Liz", "Taylor", 25)
).transact
min/max
Apply min
or max
to any (cardinality-one) attribute to return its lowest/highest value.
Person.age(min).query.get.head ==> 23
Person.age(max).query.get.head ==> 25
Person.lastName(min).query.get.head ==> "Benson"
Person.lastName(max).query.get.head ==> "Taylor"
Get min/max n values:
// 2 lowest/highest ages
Person.age(min(2)).query.get.head ==> Set(23, 24)
Person.age(max(2)).query.get.head ==> Set(24, 25)
// 2 firstly/lastly ordered lastNames
Person.lastName(min(2)).query.get.head ==> Set("Benson", "Johnson")
Person.lastName(max(2)).query.get.head ==> Set("Murray", "Taylor")
Notice how the return type changes from <basetype
to Set[<basetype>]
.
sample
Return a random saved value
// Multiple invocations, random value
Person.age(sample).query.get.head ==> 24
Person.age(sample).query.get.head ==> 23
Person.age(sample).query.get.head ==> 24
Return n random values
Person.age(sample(2)).query.get.head ==> Set(24, 25)
Person.age(sample(2)).query.get.head ==> Set(25, 23)
Person.age(sample(2)).query.get.head ==> Set(23, 24)
count
Count all persons with an age
value:
Person.age(count).query.get ==> List(4)
Ages per person name:
Person.firstName.age(count).query.get ==> List(
("Bob", 1),
("Liz", 3), // 24, 24, 25
)
Ages per person name if more than 1 age:
Person.firstName.age(count).>(1).query.get ==> List(
("Liz", 3), // 24, 24, 25
)
Occurences of each age:
Person.age.age(count).query.get ==> List(
(23, 1),
(24, 2),
(25, 1),
)
countDistinct
Count all persons with a distinct age
value:
Person.age(count).query.get ==> List(4)
Distinct ages per person name:
Person.firstName.age(countDistinct).query.get ==> List(
("Bob", 1),
("Liz", 2), // 24, 25
)
Distinct ages per person name if more than 1 age:
Person.firstName.age(countDistinct).>(1).query.get ==> List(
("Liz", 2), // 24, 25
)
Distinct cccurences of each age (will always be 1):
Person.age.age(countDistinct).query.get ==> List(
(23, 1),
(24, 1), // 24 and 24 coalesced to one distinct value
(25, 1),
)
distinct
Return distinct values in a Set
// Distinct ages by firstName
Person.firstName.age(distinct).query.get ==> List(
("Bob", Set(23)),
("Liz", Set(24, 25))
)
// Distinct ages
Person.age(distinct).query.get ==> List(
Set(23, 24, 25)
)
Notice how applying the distinct
keyword changes the return type from Int
to Set[Int]
, or more generally from <basetype
to Set[<basetype>]
.
For numbers
sum
Person.firstName.age(sum).query.get ==> List(
("Bob", 23),
("Liz", 24 + 24 + 25),
)
Person.age(sum).query.get.head ==> (23 + 24 + 24 + 25)
avg
Person.firstName.age(avg).query.get ==> List(
("Bob", 23),
("Liz", (24 + 24 + 25) / 3.0), // compare double value
)
Person.age(avg).query.get.head ==> (23 + 24 + 24 + 25) / 4.0
median
Person.firstName.age(median).query.get ==> List(
("Bob", 23),
("Liz", 24),
)
Person.age(median).query.get.head ==> 24
variance
Person.age(variance) // variance of ages
stddev
Person.age(stddev) // standard deviation of ages