JSON Schema Validation & Expressive Query Syntax in MongoDB 3.6
This article was originally published on MongoDB. Thank you for supporting the partners who make SitePoint possible.
One of MongoDB’s key strengths has always been developer empowerment: by relying on a flexible schema architecture, MongoDB makes it easier and faster for applications to move through the development stages from proof-of-concept to production and iterate over update cycles as requirements evolve.
However, as applications mature and scale, they tend to reach a stable stage where frequent schema changes are no longer critical or must be rolled out in a more controlled fashion, to prevent undesirable data from being inserted into the database. These controls are especially important when multiple applications write into the same database, or when analytics processes rely on predefined data structures to be accurate and useful.
MongoDB 3.2 was the first release to introduce Document Validation, one of the features that developers and DBAs who are accustomed to relational databases kept demanding. As MongoDB’s CTO, Eliot Horowitz, highlighted in Document Validation and What Dynamic Schemas Means:
Along with the rest of the 3.2 “schema when you need it” features, document validation gives MongoDB a new, powerful way to keep data clean. These are definitely not the final set of tools we will provide, but is rather an important step in how MongoDB handles schema.
Announcing JSON Schema Validation Support
Building upon MongoDB 3.2’s Document Validation functionality, MongoDB 3.6 introduces a more powerful way of enforcing schemas in the database, with its support of JSON Schema Validation, a specification which is part of IETF’s emerging JSON Schema standard.
JSON Schema Validation extends Document Validation in many different ways, including the ability to enforce schemas inside arrays and prevent unapproved attributes from being added. These are the new features we will focus on in this blog post, as well as the ability to build business validation rules.
Starting with MongoDB 3.6, JSON Schema is the recommended way of enforcing Schema Validation. The next section highlights the features and benefits of using JSON Schema Validation.
Switching from Document Validation to JSON Schema Validation
We will start by creating an orders collection (based on an example we published in the Document Validation tutorial blog post):
db.createCollection("orders", {
validator: {
item: { $type: "string" },
price: { $type: "decimal" }
}
});
With this document validation configuration, we not only make sure that both the item and price attributes are present in any order document, but also that item is a string and price a decimal (which is the recommended type for all currency and percentage values). Therefore, the following element cannot be inserted (because of the “rogue” price attribute):
db.orders.insert({
"_id": 6666,
"item": "jkl",
"price": "rogue",
"quantity": 1 });
However, the following document could be inserted (notice the misspelled “pryce” attribute):
db.orders.insert({
"_id": 6667,
"item": "jkl",
"price": NumberDecimal("15.5"),
"pryce": "rogue" });
Prior to MongoDB 3.6, you could not prevent the addition of misspelled or unauthorized attributes. Let’s see how JSON Schema Validation can prevent this behavior. To do so, we will use a new operator, $jsonSchema:
db.runCommand({
collMod: "orders",
validator: {
$jsonSchema: {
bsonType: "object",
required: ["item", "price"],
properties: {
item: {
bsonType: "string"
},
price: {
bsonType: "decimal"
}
}
}
}
});
The JSON Schema above is the exact equivalent of the document validation rule we previously set above on the orders collection. Let’s check that our schema has indeed been updated to use the new $jsonSchema operator by using the db.getCollectionInfos() method in the Mongo shell:
db.getCollectionInfos({name:"orders"})
This command prints out a wealth of information about the orders collection. For the sake of readability, below is the section that includes the JSON Schema:
...
"options" : {
"validator" : {
"$jsonSchema" : {
"bsonType" : "object",
"required" : [
"item",
"price"
],
"properties" : {
"item" : {
"bsonType" : "string"
},
"price" : {
"bsonType" : "decimal"
}
}
}
},
"validationLevel" : "strict",
"validationAction" : "error"
}
...
Now, let’s enrich our JSON schema a bit to make better use of its powerful features:
db.runCommand({
collMod: "orders",
validator: {
$jsonSchema: {
bsonType: "object",
<strong>additionalProperties: false</strong>,
required: ["item", "price"],
properties: {
<strong>_id: {}</strong>,
item: {
bsonType: "string",
description: "'item' must be a string and is required"
},
price: {
bsonType: "decimal",
description: "'price' must be a decimal and is required"
},
quantity: {
<strong>bsonType: ["int", "long"]</strong>,
minimum: 1,
maximum: 100,
exclusiveMaximum: true,
description:
"'quantity' must be short or long integer between 1 and 99"
}
}
}
}
});
Let’s go through the additions we made to our schema:
- First, note the use of the additionalProperties:false attribute: it prevents us from adding any attribute other than those mentioned in the properties section. For example, it will no longer be possible to insert data containing a misspelled pryce attribute. As a result, the use of additionalProperties:false at the root level of the document also makes the declaration of the _id property mandatory: whether our insert code explicitly sets it or not, it is a field MongoDB requires and would automatically create, if not present. Thus, we must include it explicitly in the properties section of our schema.
- Second, we have chosen to declare the quantity attribute as either a short or long integer between 1 and 99 (using the minimum, maximum and exclusiveMaximum attributes). Of course, because our schema only allows integers lower than 100, we could simply have set the bsonType property to int. But adding long as a valid type makes application code more flexible, especially if there might be plans to lift the maximum restriction.
- Finally, note that the description attribute (present in the item, price, and quantity attribute declarations) is entirely optional and has no effect on the schema aside from documenting the schema for the reader.
With the schema above, the following documents can be inserted into our orders collection:
db.orders.insert({
"item": "jkl",
"price": NumberDecimal(15.50),
"quantity": NumberInt(99)
});
db.orders.insert({
"item": "jklm",
"price": NumberDecimal(15.50),
"quantity": NumberLong(99)
});
However, the following documents are no longer considered valid:
db.orders.insert({
"item": "jkl",
"price": NumberDecimal(15.50),
<strong>"quantity": NumberInt(100)</strong>
});
db.orders.insert({
"item": "jkl",
"price": NumberDecimal(15.50),
<strong>"quantity": "98"</strong>
});
db.orders.insert({
"item": "jkl",
<strong>"pryce": NumberDecimal(15.50),</strong>
"quantity": NumberInt(99)
});
You probably noticed that our orders above are seemingly odd: they only contain one single item. More realistically, an order consists of multiple items and a possible JSON structure might be as follows:
{
_id: 10000,
total: NumberDecimal(141),
VAT: 0.20,
totalWithVAT: NumberDecimal(169),
lineitems: [
{
sku: "MDBTS001",
name: "MongoDB Stitch T-shirt",
quantity: NumberInt(10),
unit_price:NumberDecimal(9)
},
{
sku: "MDBTS002",
quantity: NumberInt(5),
unit_price: NumberDecimal(10)
}
]
}
With MongoDB 3.6, we can now control the structure of the lineitems
array, for instance with the following JSON Schema:
The post JSON Schema Validation & Expressive Query Syntax in MongoDB 3.6 appeared first on SitePoint.
LEAVE A REPLY
You must be logged in to post a comment.