Schema

A Datomic schema defines the set of possible attributes that we can use.

In Molecule we make this definition in a Schema definition file:

Schema definition file

Molecule provides an intuitive and type-safe DSL for modelling your schema in a schema definition file. After each change to this file, compile your project with sbt compile so that the sbt-molecule plugin can generate the Molecule DSL from your definitions.

Let’s look at the schema definition from the Seattle tutorial:

package path.to.your.project
import molecule.schema.definition._  // import schema definition DSL

@InOut(2, 8)
object SeattleDefinition {

  trait Community {
    val name         = oneString.fulltextSearch.doc("A community's name") // optional doc text
    val url          = oneString
    val category     = manyString.fulltextSearch
    val orgtype      = oneEnum('community, 'commercial, 'nonprofit, 'personal)
    val `type`       = oneEnum('email_list, 'twitter, 'facebook_page) // + more...
    val neighborhood = one[Neighborhood]
  }

  trait Neighborhood {
    val name     = oneString
    val district = one[District]
  }

  trait District {
    val name   = oneString
    val region = oneEnum('n, 'ne, 'e, 'se, 's, 'sw, 'w, 'nw)
  }
}

The outer object SeattleDefinition encapsulates our schema definition. The name of this object has to end with “Definition” in order for the sbt-molecule plugin to recognize it.

Custom Scala Doc generation

The sbt-molecule plugin also generates ScalaDoc documentation for the custom DSL generated from the schema definition file. Attribute types are explained, and an optional doc(<text...>) can be added to give a hint about the attribute when working with the code in the IDE. Given the doc text above for the Community.name attribute, we can see this in our IDE:

Attribute Scala docs

Molecule arity

The @InOut(2, 8) arity annotation instructs the sbt-molecule plugin to generate boilerplate code with the ability to create molecules with up to 8 attributes including up to 2 input attributes.

When developing your schema, you might set the first arity parameter (for input attributes) to 0 and later, when your schema is stabilizing, add the ability to make input molecules by setting it to 1, 2 or 3 (the maximum). Using parameterized input attributes can be a performance optimization, since using input values in Datalog queries allows Datomic to cache the query.

The second arity parameter determines how many attributes a molecule can contain (this doesn’t affect how many attributes you can define in each namespace). The maximum arity is 22, the same as for Scala tuples.

If at some point you need to make molecules with more than 22 attributes, you can use composite molecules or insert/query in two steps, as described in attribute basics.

Namespaces

Attribute names in Datomic are namespaced keywords with the lexical form :<namespace>/<name>. Molecule lets you define the <namespace> part with the name of the trait, like Community in the Seattle example above. In this way Molecule can construct the full name of each attribute, such as Community.category.

Namespace != Table

If you come from an SQL background, you might at first think of a namespace as a table with columns (attributes). But this is not the case. An entity in Datomic can have attribute values from any namespace.

So, when we build a molecule

val toughCommunities = Community.name.Neighborhood.name("Tough").get

we shouldn’t think of it like a

“Community table with a name field, joined to a Neighborhood table with a name field set to ‘Tough’” (wrong!)

but rather think of it as

“Entities with a communityName attribute having a reference to an entity with a neighborhoodName value ‘Tough’”
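
To make the distinction concrete, here is a minimal sketch in plain Scala that does not use Molecule or Datomic at all. It models facts as entity/attribute/value triples and answers the same question as the molecule above. The Datom case class, the entity ids, the attribute name strings and all sample values are assumptions made up for this illustration.

```scala
object DatomSketch {
  // Hypothetical simplification: one fact = entity id + attribute name + value
  final case class Datom(e: Long, a: String, v: Any)

  val facts: List[Datom] = List(
    Datom(1L, "Community/name", "Block Watch"),
    Datom(1L, "Community/neighborhood", 10L),
    Datom(2L, "Community/name", "Garden Club"),
    Datom(2L, "Community/neighborhood", 11L),
    Datom(10L, "Neighborhood/name", "Tough"),
    Datom(11L, "Neighborhood/name", "Quiet"),
    // Nothing stops entity 10 from also holding an attribute of another namespace
    Datom(10L, "District/name", "Northside")
  )

  // Ids of entities that have a Neighborhood/name fact with the wanted value
  def neighborhoodIds(wanted: String): Set[Long] =
    facts.collect {
      case Datom(e, "Neighborhood/name", v) if v == wanted => e
    }.toSet

  // Names of entities that reference one of those neighborhood entities
  def communityNamesIn(neighborhoodName: String): Set[String] = {
    val hoods = neighborhoodIds(neighborhoodName)
    val communityIds = facts.collect {
      case Datom(e, "Community/neighborhood", ref: Long) if hoods(ref) => e
    }.toSet
    facts.collect {
      case Datom(e, "Community/name", name: String) if communityIds(e) => name
    }.toSet
  }
}
```

There is no table anywhere in this sketch: the result is found purely by following attribute values and a reference between entity ids, and entity 10 carries attributes from two namespaces at once.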

Partitions

Namespaces can also be organized in partitions.

From the Datomic schema reference:

“All entities created in a database reside within a partition. Partitions group data together, providing locality of reference when executing queries across a collection of entities. In general, you want to group entities based on how you’ll use them. Entities you’ll often query across - like the community-related entities in our sample data - should be in the same partition to increase query performance. Different logical groups of entities should be in different partitions. Partitions are discussed in more detail in the Indexes topic.”

In the schema definition file we can organize namespaces in partitions with objects:

@InOut(0, 4)
object PartitionTestDefinition {

  object gen {
    trait Person {
      val name   = oneString
      val gender = oneEnum('male, 'female)
    }
    // ..more namespaces in the `gen` partition
  }

  object lit {
    trait Book {
      val title  = oneString
      val author = one[gen.Person]
      // To avoid attr/partition name clashes we can prepend the definition object name
      // (in case we would have needed an attribute named `gen` for instance)
      val editor = one[PartitionTestDefinition.gen.Person]
      val cat    = oneEnum('good, 'bad)
    }
    // ..more namespaces in the `lit` partition
  }
}

Here we have a gen (general) partition and a lit (literature) partition. Each partition can contain as many namespaces as you want, which can also be a way to structure large domains conceptually. The partition name has to be lowercase and is prepended to the namespaces it contains.

When we build molecules the partition name is prepended to the namespace like this:

lit_Book.title.cat.Author.name.gender.get === ...

Since Author is already defined as a related namespace we don’t need to prepend the partition name there.

When we insert a Person the created entity will automatically be saved in the gen partition (or whatever we call it).
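
The naming rule can be sketched in plain Scala. This is only an illustration of the convention described above (lowercase partition name, joined to the namespace with an underscore as in lit_Book); the prefixed helper is hypothetical and is not part of Molecule or the sbt-molecule plugin.

```scala
object PartitionNaming {
  // Hypothetical helper: builds the partition-prefixed namespace name,
  // rejecting partition names that are empty or not all lowercase.
  def prefixed(partition: String, namespace: String): Either[String, String] =
    if (partition.isEmpty || partition != partition.toLowerCase)
      Left(s"Partition name '$partition' has to be lowercase")
    else
      Right(s"${partition}_$namespace")
}
```

For example, PartitionNaming.prefixed("lit", "Book") gives Right("lit_Book"), while a capitalized partition name such as "Lit" gives a Left with an error message.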

Attribute types

In the Seattle example we see the attributes being defined with the following types that should be pretty self-explanatory:

  • oneString, manyString etc. define the cardinality and type of an attribute
  • oneEnum/manyEnum define enumerated values (predefined words)
  • one[<ReferencedNamespace>] defines a reference to another namespace

We can define the following types of attributes:

Cardinality one              Cardinality many                 Mapped cardinality many
-------------------          -------------------------        --------------------------------
oneString     : String       manyString    : Set[String]      mapString     : Map[String, String]
oneInt        : Int          manyInt       : Set[Int]         mapInt        : Map[String, Int]
oneLong       : Long         manyLong      : Set[Long]        mapLong       : Map[String, Long]
oneFloat      : Float        manyFloat     : Set[Float]       mapFloat      : Map[String, Float]
oneDouble     : Double       manyDouble    : Set[Double]      mapDouble     : Map[String, Double]
oneBigInt     : BigInt       manyBigInt    : Set[BigInt]      mapBigInt     : Map[String, BigInt]
oneBigDecimal : BigDecimal   manyBigDecimal: Set[BigDecimal]  mapBigDecimal : Map[String, BigDecimal]
oneBoolean    : Boolean      manyBoolean   : Set[Boolean]     mapBoolean    : Map[String, Boolean]
oneDate       : Date         manyDate      : Set[Date]        mapDate       : Map[String, Date]
oneUUID       : UUID         manyUUID      : Set[UUID]        mapUUID       : Map[String, UUID]
oneURI        : URI          manyURI       : Set[URI]         mapURI        : Map[String, URI]
oneEnum       : String       manyEnum      : Set[String]

Cardinality-one attributes can have one value per entity.

Cardinality-many attributes can have a Set of unique values per entity. Often we choose instead to model many-values as a many-reference to another entity that could have more than one attribute.

Mapped cardinality many attributes are a special Molecule variation based on cardinality-many attributes. Read more here
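
As a rough illustration of the Scala types in the table, the following plain Scala sketch shows what values of the three kinds look like. It does not use Molecule; the Community case class, its field names and the sample values are assumptions for this example only.

```scala
object AttributeTypeSketch {
  // Hypothetical entity shape; each field mirrors one column of the table
  final case class Community(
    name: String,                      // like oneString: a single value
    categories: Set[String],           // like manyString: a Set of unique values
    translations: Map[String, String]  // like mapString: values keyed by a String
  )

  val gardenClub: Community = Community(
    name = "Garden Club",
    // Adding "plants" a second time has no effect, since a Set holds unique values
    categories = Set("plants", "outdoors") + "plants",
    translations = Map("en" -> "Garden Club", "da" -> "Haveklub")
  )
}
```

Because categories is a Set, the duplicate "plants" is silently dropped, which matches the note above that a cardinality-many attribute holds a Set of unique values per entity.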

Reference types

References are also treated like attributes. A reference attribute basically points to one or many entities. We define such a relationship by supplying the referenced namespace as the type parameter to one/many:

Cardinality one         Cardinality many
---------------         ----------------
one[<Ref-namespace>]    many[<Ref-namespace>]

In the example above we saw a reference from Community to Neighborhood defined as one[Neighborhood]. We would for instance likely define an Order/OrderLine relationship in an Order namespace as many[OrderLine].
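
The Order/OrderLine relationship could be sketched in a schema definition file roughly as follows. This assumes the same import and annotation as the Seattle example; the object name OrderDefinition and the attributes customer, orderLines, product and quantity are hypothetical names chosen for illustration. Like any schema definition file, it needs the Molecule dependency and the sbt-molecule plugin to compile.

```scala
package path.to.your.project
import molecule.schema.definition._ // schema definition DSL, as in the Seattle example

@InOut(0, 4)
object OrderDefinition {

  trait Order {
    val customer   = oneString        // hypothetical attribute
    val orderLines = many[OrderLine]  // cardinality-many reference to OrderLine entities
  }

  trait OrderLine {
    val product  = oneString  // hypothetical attribute
    val quantity = oneInt     // hypothetical attribute
  }
}
```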

Attribute options

Each attribute can also have some extra options:

Option          Indexes  Description
--------------  -------  -----------
doc                      Attribute description.
uniqueValue     ✔︎        Attribute value is unique to each entity. Attempts to insert a duplicate value for a different entity id will fail.
uniqueIdentity  ✔︎        Attribute value is unique to each entity and "upsert" is enabled. Attempts to insert a duplicate value for a temporary entity id will cause all attributes associated with that temporary id to be merged with the entity already in the database.
indexed         ✔︎        Generated index for this attribute. By default all attributes are set with the indexed option automatically by Molecule, so you don't need to set this.
fulltextSearch  ✔︎        Generate eventually consistent fulltext search index for this attribute.
isComponent     ✔︎        Specifies that an attribute whose type is :db.type/ref is a component. Referenced entities become subcomponents of the entity to which the attribute is applied. When you retract an entity with :db.fn/retractEntity, all subcomponents are also retracted. When you touch an entity, all its subcomponent entities are touched recursively.
noHistory                Whether past values of an attribute should not be retained.

Datomic indexes the values of all attributes having an option except for the doc and noHistory options.

As you saw, we added fulltextSearch to some of the attributes in the Seattle definition above. Molecule’s schema definition DSL lets you choose only the options allowed for each attribute type.
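
A schema definition using several of these options might look like the sketch below. It assumes that each option in the table is applied by chaining it onto the attribute definition, in the same way fulltextSearch and doc are chained in the Seattle example. The AccountDefinition object, the Account and Profile namespaces and all attribute names are hypothetical. The file needs the Molecule dependency and the sbt-molecule plugin to compile.

```scala
package path.to.your.project
import molecule.schema.definition._ // schema definition DSL, as in the Seattle example

@InOut(0, 5)
object AccountDefinition {

  trait Account {
    // Unique with upsert semantics, plus a doc text shown in the generated ScalaDoc
    val email    = oneString.uniqueIdentity.doc("Login email")
    // Unique without upsert: a duplicate value on another entity fails
    val handle   = oneString.uniqueValue
    // Eventually consistent fulltext index
    val bio      = oneString.fulltextSearch
    // Past values are not retained
    val lastSeen = oneDate.noHistory
    // Component reference: retracting an Account also retracts its Profile
    val profile  = one[Profile].isComponent
  }

  trait Profile {
    val avatarUrl = oneURI
  }
}
```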

Next

Schema transaction…