Validate JSON input with JSON Schema

For several years now, JSON has been a standard format for data submitted to and returned from HTTP APIs. Even in traditional web sites (by which I mean, ones that deliver primarily HTML to be consumed by a human with a web browser), JSON is also used, primarily by JavaScript code. Handling JSON is sometimes carried out entirely by the browser, and sometimes in the communication between the browser and the server.

For ensuring that JSON inputs adhere to some sensible constraints, JSON Schema is a good tool to keep in your web developer toolbox. With JSON Schema, one describes what form your JSON data should have, and then one checks a piece of data against the schema to make sure that the data has the intended form. If so, you can pass it along for further processing. If not, well, you need to reject the input somehow.

Argo is a Racket package for JSON Schema validation. Here’s how you can add Argo to your Racket web site.

Example: adding items to a sales catalog

Let’s imagine that we need to add items to a sales catalog. The site you’re working on has a form where employees can add new items to a catalog, which customers can then buy. New items should come with the following information:

The aim here is to build a web site that offers a single HTTP resource that consumes such an input, in JSON format, and creates a new record in a relational database. JSON Schema will help us to ensure that the data has some reasonable structure. We will check incoming data against the schema (which we will soon write) and, if everything looks good, convert the JSON data into an SQL INSERT statement, to be executed and stored by a relational database sitting outside of Racket.

JSON Schema for creating new products

Step one in all of this is to design our data. For our purposes here, there are two aspects to this problem: one is to design a JSON Schema that checks that a JSON input is sensible, and the other is to design an SQL schema for storing JSON data, assuming that it survives validation by the JSON Schema. Let’s tackle the first problem.

Let’s try to flesh out the structure of a new product in terms of JSON Schema. A reasonable first go at nailing down a schema is to list the properties that must be present in an input, and then nail down step-by-step what each of the properties should look like.

Here’s a sensible first attempt:

{
    "type": "object",
    "properties": {
	"name": {
	    "type": "string"
	},
	"sku": {
	    "type": "string"
	},
	"categories": {
	    "type": "array",
	    "items": {
		"type": "string"
	    }
	},
	"price": {
	    "type": "number"
	},
	"currency": {
	    "type": "string"
	},
	"image": {
	    "type": "string"
	}
    }
}

Let’s fine-tune the schema by ensuring that items are non-trivial: we don’t allow the empty string neither as a tag nor as the name of a product, the array of tags cannot be empty, and the image URL should be, well, a URL.

Here’s the improved schema:

{
    "type": "object",
    "properties": {
	"name": {
	    "type": "string",
	    "minLength": 1
	},
	"sku": {
	    "type": "string",
	    "minLength": 1
	},
	"categories": {
	    "type": "array",
	    "items": {
		"type": "string",
		"minLength": 1,
		"pattern": "^[^,]+$"
	    },
	    "uniqueItems": true,
	    "minItems": 1
	},
	"price": {
	    "type": "number",
	    "exclusiveMinimum": 0
	},
	"currency": {
	    "type": "string",
	    "enum": [
		"EUR",
		"USD",
		"GBP",
		"JPY",
		"INR"
	    ]
	},
	"image": {
	    "type": "string",
	    "format": "uri"
	}
    },
    "required": [
	"name",
	"sku",
	"categories",
	"price",
	"currency",
	"image"
    ],
    "additionalProperties": false
}

Here we’ve hardcoded a list of currencies that we allow, in ISO 4217 format.

Database for storing products

We’d like to store our data in some place outside of Racket, so that other people (or programs) can work with it. PostgreSQL is a great choice, though by no means is it the only one (MySQL is pretty good, too, and there are others).

Here’s how we can get that running. Let’s imagine a table that has six columns: product_id, name, sku (stock keeping unit), price, currency, and image.

CREATE TABLE catalog (
    product_id integer NOT NULL,
    name text NOT NULL,
    sku text NOT NULL,
    price numeric NOT NULL,
    currency text NOT NULL,
    image text NOT NULL,
    categories text NOT NULL
);

More structure could be added to the table, of course. A one-to-many relationship (with the help of a separate table) could be defined for the categories of a product instead of storing categories with products. Additional data might also make sense.

We intend that SKUs are unique. That is not reflected in the CREATE TABLE statement.

Notice that the SQL schema overlaps, semantically speaking, with the JSON Schema. For instance, in the SQL schema, we insist that all columns be non-NULL, reflecting our choice in the JSON Schema that all properties must be present. The JSON Schema goes somewhat further, though, by requiring that strings be non-empty. In the JSON Schema, moreover, categories are an array, which needs to be converted to a string when inserting entries in the catalog table.

Putting it all together

Once we’ve got a database running and a JSON Schema to validate inputs, we need to sew everything together with the HTTP server.

You’re welcome to download the Racket code and JSON Schema that accompany this tutorial.

$ racket products.rkt

This server little server offers a resource, /products, to which one makes POST requests to create a new product. The server also offers a family of resources of the form /products/ID, where ID is a positive integer representing the internal Postgres ID of a product. It offers a JSON view of a row in the catalog table.

A prerequisite for running the Racket code is that you have a Postgres database running. The script will complain if any of the environment variables DB_HOST, DB_DATABASE, DB_PORT, DB_USER, DB_PASSWORD are unset. We use these variables to specify a database connection. When it comes time to connect to the database, we will understandably get an error if these credentials are incorrect.

That said, the main function here is create-product. Its argument is an HTTP request (à la request?). It does some simple validation of the request (checking that it actually has a body, that the payload is well-formed JSON) before doing JSON Schema validation. If all these validation steps succeed, we insert a new record into the database (with the insert-new-product! function). The call to insert-new-product! is wrapped in a with-handlers form to catch SQL errors. All other errors are caught by the fallback error handler oops.

Given an array of categories (non-empty strings that do not contain a comma), we join them with a comma and save that to the database. As discussed above, this is perhaps not the smartest way to represent a one-to-many relationship between products and categories. But this will do for now.

Takeaway

JSON Schema is a powerful (if incomplete) way to ensure that JSON data adheres to sensible constraints. A JSON Schema in your application stands as a check between you and a database.

Argo is a useful Racket library for JSON Schema validation. Just use Argo’s adheres-to-schema? function, possibly combined with Argo’s parse-json function to parse JSON and json-schema? to check that the parsed JSON really is a JSON Schema.