json-everything

Exploring Code Generation with JsonSchema.Net.CodeGeneration

About a month ago, my first foray into the world of code generation was published with the extension library JsonSchema.Net.CodeGeneration. For this post, I'd like to dive into the process a little to show how it works. Hopefully, this will also give better insight into how to use it. This library currently serves as an exploration platform for the JSON Schema IDL Vocab work, which aims to create a new vocabulary designed to help support translating between code and schemas (both ways).

Extracting type information

The first step in the code generation process is determining what the schema is trying to model. This library uses a complex set of mini-meta-schemas to identify supported patterns. A meta-schema is just a schema that validates another schema.

For example, in most languages, enumerations are basically just named constants. The ideal JSON Schema representation of this would be a schema with an enum. So .Net's System.DayOfWeek enum could be modelled like this:

```json
{
  "title": "DayOfWeek",
  "enum": [
    "Sunday", "Monday", "Tuesday", "Wednesday",
    "Thursday", "Friday", "Saturday"
  ]
}
```

To identify this schema as defining an enumeration, we'd need a meta-schema that looks like this:

```json
{
  "type": "object",
  "title": "MyEnum",
  "properties": {
    "enum": {
      "type": "array"
    }
  },
  "required": [ "enum" ]
}
```

However, in JSON Schema, an enum item can be any JSON value, whereas most languages require strings. So, we also want to ensure that the values of that enum are strings.

```json
{
  "type": "object",
  "title": "MyEnum",
  "properties": {
    "enum": {
      "items": { "type": "string" }
    }
  },
  "required": [ "enum" ]
}
```

We don't need to include type or uniqueItems because we know the data is a schema, and its meta-schema (e.g. Draft 2020-12) already has those constraints. We only need to define constraints on top of what the schema's meta-schema defines.

Now that we have the idea, we can expand this by defining mini-meta-schemas for all of the patterns we want to support. There are some that are pretty easy, only needing the type keyword:

- string
- number
- integer
- boolean

And there are some that are a bit more complex:

- arrays
- dictionaries
- custom objects (inheritable and non-inheritable)

And we also want to be able to handle references. The actual schemas that were used are listed in the docs. As with any documentation, I hope to keep these up-to-date, but short of that, you can always look at the source.

Building type models

Now that we have the different kinds of schemas that we want to support, we need to represent them in a sort of type model from which we can generate code. The idea behind the library was to be able to generate multiple code writers that could support just about any language, so .Net's type system (i.e. System.Type) isn't quite the right model. The type model as it stands has the following:

- TypeModel - Serves as a base class for the others while also supporting our simple types. This basically just exposes a type name property.
- EnumModel - Each value has a name and an integer value derived from the item's index.
- ArrayModel - Exposes a property to track the item type.
- DictionaryModel - Exposes properties to track key and item types.
- ObjectModel - Handles both open and closed varieties. Each property has a name, a type, and whether it can read/write.

Whenever we encounter a subschema or a reference, that represents a new type for us to generate. Lastly, in order to avoid duplication, we set up some equality for type models.
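To make that model concrete, here's a minimal sketch of what such a hierarchy could look like in C#. These are hypothetical, illustrative types only; the actual classes in JsonSchema.Net.CodeGeneration differ in their members and in how equality is set up.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of a type-model hierarchy like the one described above.
public class TypeModel
{
    public string Name { get; }
    public TypeModel(string name) { Name = name; }
}

public class EnumModel : TypeModel
{
    // Each value pairs a name with an integer derived from its index in the enum array.
    public IReadOnlyList<(string Name, int Value)> Values { get; }
    public EnumModel(string name, IEnumerable<string> valueNames) : base(name)
    {
        Values = valueNames.Select((n, i) => (n, i)).ToList();
    }
}

public class ArrayModel : TypeModel
{
    public TypeModel ItemType { get; }
    public ArrayModel(string name, TypeModel itemType) : base(name) { ItemType = itemType; }
}

public class DictionaryModel : TypeModel
{
    public TypeModel KeyType { get; }
    public TypeModel ItemType { get; }
    public DictionaryModel(string name, TypeModel keyType, TypeModel itemType) : base(name)
    {
        KeyType = keyType;
        ItemType = itemType;
    }
}

public class ObjectModel : TypeModel
{
    // Open models may be inherited from; closed models may not.
    public bool IsOpen { get; }
    public IReadOnlyList<PropertyModel> Properties { get; }
    public ObjectModel(string name, bool isOpen, IReadOnlyList<PropertyModel> properties) : base(name)
    {
        IsOpen = isOpen;
        Properties = properties;
    }
}

public record PropertyModel(string Name, TypeModel Type, bool CanRead, bool CanWrite);
```

The important design point is that the model carries only what a writer needs to emit code: names, value/property lists, and the item/key types, with no ties back to .Net's reflection types.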
With this, all of the types supported by this library can be modelled. As more patterns are identified, this modelling system can be expanded as needed.

Writing code

The final step for code generation is the part everyone cares about: actually writing code. The library defines ICodeWriter, which exposes two methods:

- TransformName() - Takes a JSON string and transforms it into a language-compatible name.
- Write() - Renders a type model into a type declaration in the language.

There's really quite a bit of freedom in how this is implemented. The built-in C# writer branches on the different model types and has private methods to handle each one.

One aspect of writing types that I hadn't thought about when I started writing the library was that there's a difference between writing the usage of a type and writing the declaration of a type. Before, when I thought about code generation, I typically thought it was about writing type declarations: you have a schema, and you generate a class for it. But what I found was that if the properties of an object also use any of the generated types, only the type name needs to be written. For example, for the DayOfWeek enumeration we discussed before, the declaration is

```csharp
public enum DayOfWeek
{
    Sunday,
    Monday,
    Tuesday,
    Wednesday,
    Thursday,
    Friday,
    Saturday
}
```

But if I have an array of them, I need to generate DayOfWeek[], which only really needs the type name. So my writer needs to be smart enough to write the declaration once and write just the name any time it's used. There are a couple of other little nuance behaviors that I added in, and I encourage you to read the docs on the capabilities.

Generating a conclusion

Overall, writing this was an enjoyable experience. I found a simple architecture that seems to work well and is also extensible. My hope is that this library will help inform the IDL Vocab effort back in JSON Schema Land. It's useful having a place to test things.

If you like the work I put out, and would like to help ensure that I keep it up, please consider becoming a sponsor!
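For a feel of what an implementation might do with those two responsibilities, here is a toy writer for the enum case, reusing the hypothetical model types sketched earlier. The real ICodeWriter signatures in the library may differ; this only illustrates the declaration-versus-name distinction.

```csharp
using System.Linq;
using System.Text;

// Toy writer, not the library's built-in C# writer.
public class ToyCSharpWriter
{
    // Make a JSON string usable as a C# identifier (simplistic on purpose).
    public string TransformName(string jsonName) =>
        new string(jsonName.Where(char.IsLetterOrDigit).ToArray());

    // Write the full declaration once; usages elsewhere only ever need TransformName(model.Name).
    public void Write(StringBuilder builder, EnumModel model)
    {
        builder.AppendLine($"public enum {TransformName(model.Name)}");
        builder.AppendLine("{");
        foreach (var (name, value) in model.Values)
        {
            builder.AppendLine($"\t{TransformName(name)} = {value},");
        }
        builder.AppendLine("}");
    }
}
```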

The New JsonSchema.Net

Some changes are coming to JsonSchema.Net: faster validation and fewer memory allocations thanks to a new keyword architecture. The best part: unless you've built your own keywords, this probably won't require any changes in your code.

A new keyword architecture?

For about the past year or so, I've had an idea that I've tried and failed to implement several times: by performing static analysis of a schema, some work can be performed before ever getting an instance to validate. That work can then be saved and reused across multiple evaluations. For example, with this schema

```json
{
  "type": "object",
  "properties": {
    "foo": { "type": "string" },
    "bar": { "type": "number" }
  }
}
```

we know:

- that the instance must be an object
- if that object has a foo property, its value must be a string
- if that object has a bar property, its value must be a number

These are the constraints that this schema applies to any instance that it validates. Each constraint is composed of an instance location and a requirement for the corresponding value. What's more, most of the time, we don't need the instance to identify these constraints.

This is the basic idea behind the upcoming JsonSchema.Net v5 release. If I can capture these constraints and save them, then I only have to perform this analysis once. After that, it's just applying the constraints to the instance.

Architecture overview

With the upcoming changes, evaluating an instance against a schema occurs in two phases: gathering constraints, and processing individual evaluations. For the purposes of this post, I'm going to refer to the evaluation of an individual constraint as simply an "evaluation."

Collecting constraints

There are two kinds of constraints: schema and keyword. A schema constraint is basically a collection of keyword constraints, but it also needs to contain some things that are either specific to schemas, such as the schema's base URI, or common to all the local constraints, like the instance location. A keyword constraint, in turn, will hold the keyword it represents, any sibling keywords it may have dependencies on, schema constraints for any subschemas the keyword defines, and the actual evaluation function.

I started with just the idea of a generic "constraint" object, but I soon found that the two had very different roles, so it made sense to separate them. I think this was probably the key distinction from previous attempts that allowed me to finally make this approach work.

So for constraints we have a recursive definition that really just mirrors the structural definition represented by JsonSchema and the different keyword classes. The primary difference between the constraints and the structural definition is that the constraints are more generic (implemented by two types) and evaluation-focused, whereas the structural definition is the more object-oriented model and is used for serialization and other things.

Building a schema constraint consists of building constraints for all of the keywords that (sub)schema contains. Each keyword class knows how to build the constraint that should represent it, including getting constraints for subschemas and identifying keyword dependencies. Once we have the full constraint tree, we can save it in the JsonSchema object and reuse that work for each evaluation.

Evaluation

Each constraint object produces an associated evaluation object. Again, there are two kinds: one for each kind of constraint.
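Conceptually, the two constraint kinds might be sketched like this. These are not the library's actual v5 types (the names and members here are invented for illustration); they only show what data each kind carries and how the two nest recursively.

```csharp
using System;
using System.Text.Json.Nodes;
using Json.Pointer;

// Conceptual sketch only; the real v5 constraint types differ.
public class SchemaConstraintSketch
{
    public Uri BaseUri { get; set; }
    public JsonPointer InstanceLocation { get; set; }
    public KeywordConstraintSketch[] KeywordConstraints { get; set; }
}

public class KeywordConstraintSketch
{
    public string Keyword { get; set; }
    // Sibling keywords whose results this keyword needs (e.g. additionalProperties needs properties).
    public string[] SiblingDependencies { get; set; }
    // Constraints for any subschemas this keyword defines.
    public SchemaConstraintSketch[] ChildConstraints { get; set; }
    // The work to run once an instance is available.
    public Action<JsonNode?> Evaluator { get; set; }
}
```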
When constructing a schema evaluation, we need the instance (of course), the evaluation path, and any options to apply during this evaluation. It's important to recognize that options can change between evaluations; for example, sometimes you may or may not want to validate format. A results object for this subschema will automatically be created. Creating a schema evaluation will also call on the contained keyword constraints to build their evaluations.

To build a keyword evaluation, the keyword constraint is given the schema constraint that's requesting it, the instance location, and the evaluation path. From that, it can look at the instance, determine if the evaluation even needs to run (e.g. is there a value at that instance location?), and create an evaluation object if it does. It will also create schema evaluations for any subschemas.

In this way, we get another tree: one built for evaluating a specific instance. The structure of this tree may (and often will) differ from the structure of the constraint tree. For example, when building constraints, we don't know what properties additionalProperties will need to cover, so we build a template from which we can later create multiple evaluations: one for each property. Or maybe properties contains a property that doesn't exist in the instance; no evaluation is created because there's nothing to evaluate.

While building constraints only happens once, building evaluations occurs every time JsonSchema.Evaluate() is called.

And there was much rejoicing

This is a lot, and it's a significant departure from the more procedural approach of previous versions. But I think it's a good change overall because this new design encapsulates forethought and theory present within JSON Schema and uses that to improve performance.

If you find you're in the expectedly small group of users writing your own keywords, I'm also updating the docs, so you'll have some help there. If you still have questions, feel free to open an issue or you can find me in Slack (link on the repo readme). I'm also planning a post for the JSON Schema blog which looks at a bit more of the theory of JSON Schema static analysis separately from the context of JsonSchema.Net, so watch for that as well.

If you like the work I put out, and would like to help ensure that I keep it up, please consider becoming a sponsor!
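From the caller's side, none of this machinery is visible: you build or parse a JsonSchema once and call Evaluate() repeatedly, and the constraint analysis is amortized across those calls. A small sketch of that usage, assuming the public API stays as described in this post:

```csharp
using System;
using System.Text.Json.Nodes;
using Json.Schema;

// Reusing the same JsonSchema instance means the constraint work is done once
// and only the per-instance evaluations are built on each call.
var schema = JsonSchema.FromText(
    "{ \"type\": \"object\", \"properties\": { \"foo\": { \"type\": \"string\" }, \"bar\": { \"type\": \"number\" } } }");

var instance1 = JsonNode.Parse("{ \"foo\": \"hello\", \"bar\": 1 }");
var instance2 = JsonNode.Parse("{ \"foo\": 42 }");

var result1 = schema.Evaluate(instance1);
var result2 = schema.Evaluate(instance2, new EvaluationOptions { OutputFormat = OutputFormat.List });

Console.WriteLine(result1.IsValid);  // True
Console.WriteLine(result2.IsValid);  // False
```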

JsonNode's Odd API

```csharp
var array = new JsonArray
{
    ["a"] = 1,
    ["b"] = 2,
    ["c"] = 3,
};
```

This compiles. Why does this compile?! Today we're going to explore that.

What's wrong?

In case you didn't see it, we're creating a JsonArray instance and initializing it using key/value pairs. But arrays don't contain key/value pairs; they contain values. Objects contain key/value pairs.

```csharp
var list = new List<int>
{
    ["a"] = 1,
    ["b"] = 2,
    ["c"] = 3,
};
```

This doesn't compile, as one would expect. So why does JsonArray allow this? Is the collection initializer broken?

Collection initializers

Microsoft actually has some really good documentation on collection initializers, so I'm not going to dive into it here. Have a read through that if you like. The crux of it comes down to when collection initializers are allowed. First, you need to implement IEnumerable<T> and an .Add(T) method (apparently it also works as an extension method). This will enable the basic collection initializer syntax, like

```csharp
var list = new List<int> { 1, 2, 3 };
```

But you can also enable direct-indexing initialization by adding an indexer. This lets us do things like

```csharp
var list = new List<int>(10)
{
    [2] = 1,
    [5] = 2,
    [6] = 3
};
```

More commonly, you may see this used for Dictionary<TKey, TValue> initialization:

```csharp
var dict = new Dictionary<string, int>
{
    ["a"] = 1,
    ["b"] = 2,
    ["c"] = 3,
};
```

But, wait… does that mean that JsonArray has a string indexer?

JsonArray has a string indexer!

It sure does! You can see it in the documentation, right there under Properties.

Why?!

Why would you define a string indexer on an array type? Well, they didn't. They defined it and the integer indexer on the base type, JsonNode, as a convenience for people working directly with the base type without having to cast it to a JsonArray or JsonObject first. But now, all of the JsonNode-derived types have both an integer indexer and a string indexer, and it's really weird. It makes all of this code completely valid:

```csharp
JsonValue number = ((JsonNode)5).AsValue(); // can't cast directly to JsonValue
_ = number[5];      // compiles but will explode
_ = number["five"]; // compiles but will explode

JsonArray array = new() { 0, 1, 2, 3, 4, 5, 6 };
_ = array[5];       // fine
_ = array["five"];  // compiles but will explode

JsonObject obj = new() { ["five"] = 1 };
_ = obj[5];         // compiles but will explode
_ = obj["five"];    // fine
```

Is this useful?

This seems like a very strange API design decision to me. I don't think I'd ever trust a JsonNode enough to confidently attempt to index it before checking to see if it can be indexed. Furthermore, the process of checking whether it can be indexed can easily result in a correctly-typed variable.

```csharp
if (node is JsonArray array)
    Console.WriteLine(array[5]);
```

This will probably explode because I didn't check bounds, but from a type safety point of view, this is SO much better. I have no need to access indexed values directly from a JsonNode. I think this API enables programming techniques that are dangerously close to using the dynamic keyword, which should be avoided at all costs.

If you like the work I put out, and would like to help ensure that I keep it up, please consider becoming a sponsor!
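For completeness, here's the kind of type-checked access argued for above, written as a plain helper; nothing here is specific to json-everything, it's just standard pattern matching over the node types.

```csharp
using System;
using System.Text.Json.Nodes;

// Pattern-match to the concrete node type first, then index. Purely illustrative.
public static class NodeAccess
{
    public static void PrintFifth(JsonNode? node)
    {
        switch (node)
        {
            case JsonArray array when array.Count > 5:
                Console.WriteLine(array[5]);
                break;
            case JsonObject obj when obj.TryGetPropertyValue("five", out var value):
                Console.WriteLine(value);
                break;
            default:
                Console.WriteLine("node can't be indexed that way");
                break;
        }
    }
}
```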

Correction: JSON Path vs JSON Pointer

In my post comparing JSON Path and JSON Pointer, I made the following claim:

A JSON Pointer can be expressed as a JSON Path only when all of its segments are non-numeric keys.

Thinking about this a bit more in the context of the upcoming JSON Path specification, I realized that this only considers JSON Path segments that have one selector. If we allow for multiple selectors, and the specification does, then we can write /foo/2/bar as:

$.foo[2,'2'].bar

Why this works

The /2 segment in the JSON Pointer says

- If the value is an array, choose the item at index 2.
- If the value is an object, choose the value under property "2".

So to write this as a path, we just need to consider both of these options.

- If the value is an array, we need a [2] to select the item at index 2.
- If the value is an object, we need a ["2"] to select the value under property "2".

Since the value cannot be both an array and an object, having both of these selectors in a segment [2,"2"] is guaranteed not to cause duplicate selection, and we're still guaranteed to get a single value.

Caveat

While this path is guaranteed to yield a single value, it's still not considered a "Singular Path" according to the syntax definition in the specification. I raised this to the team, and we ended up adding a note to clarify.

Summary

A thing that I previously considered impossible turned out to be possible. I've added a note to the original post summarizing this as well as linking here.

If you like the work I put out, and would like to help ensure that I keep it up, please consider becoming a sponsor!
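Two hypothetical documents make the equivalence concrete. In the first, /foo/2/bar walks an array, so only the [2] selector in $.foo[2,'2'].bar matches:

```json
{ "foo": [ {}, {}, { "bar": 1 } ] }
```

In the second, /foo/2/bar walks an object with a "2" key, so only the ['2'] selector matches:

```json
{ "foo": { "2": { "bar": 1 } } }
```

Either way, the path selects exactly one value, the same one the pointer does.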

Parallel Processing in JsonSchema.Net

This post wraps up (for now) the adventure of updating JsonSchema.Net to run in an async context by exploring parallel processing. First, let's cover the concepts in JSON Schema that allow parallel processing. Then, we'll look at what that means for JsonSchema.Net as well as my experience trying to make it work.

Part of the reason I'm writing this is sharing my experience. I'm also writing this to have something to point at when someone asks why I don't take advantage of a multi-threaded approach.

Parallelization in JSON Schema

There are two aspects of evaluating a schema that can be parallelized. The first is by subschema (within the context of a single keyword). For those keywords which contain multiple subschemas, e.g. anyOf, properties, etc., their subschemas are independent from each other, and so evaluating them simultaneously won't affect the others' outcomes. These keywords then aggregate the results from their subschemas in some way:

- anyOf ensures that at least one of the subschemas passed (logical OR). This can be short-circuited to a passing validation when any subschema passes.
- allOf ensures that all of the subschemas passed (logical AND). This can be short-circuited to a failing validation when any subschema fails.
- properties and patternProperties map subschemas to object-instance values by key and ensure that those values match the associated subschemas (logical AND, but only for those keys which match). These can also be short-circuited to a failing validation when any subschema fails.

The other way schema evaluation can be parallelized is by keyword within a (sub)schema. A schema is built using a collection of keywords, each of which defines a constraint. Those constraints are usually independent (e.g. type, minimum, properties, etc.), however some keywords have dependencies on other keywords (e.g. additionalProperties, contains, else, etc.). If we organize the keywords into dependency groups and then sort those groups so that each group's dependencies run before the group itself, the keywords within each group can be run in parallel (see the sketch after this list).

1. Keywords with no dependencies

We start with keywords which have no dependencies.

- type
- minimum/maximum
- allOf/anyOf/not
- properties
- patternProperties
- if
- minContains/maxContains

None of these keywords (among others) have any impact on the evaluation of the others within this group. Running them in parallel is fine. Interestingly, though, some of these, like properties, patternProperties, and if, are themselves dependencies of keywords not in this set.

2. Keywords with only dependencies on independent keywords

Once we have all of the independent keywords processed, we can evaluate the next set of keywords: ones that only depend on the first set.

- additionalProperties (depends on properties and patternProperties)
- then/else (depends on if)
- contains (depends on minContains/maxContains)

Technically, if we don't mind processing some keywords multiple times, we can run all of the keywords in parallel. For example, we can process then and else in the first set if we process if for each of them. JsonSchema.Net seeks to process each keyword once, so it performs this dependency grouping.

This then repeats, processing only those keywords which have dependencies that have already been processed. In each iteration, all of the keywords in that iteration can be processed in parallel because their dependencies have completed. The last keywords to run are unevaluatedItems and unevaluatedProperties.
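As a rough illustration of that grouping (and not JsonSchema.Net's actual code), a dependency-group runner could look like the following; the keyword type and its evaluation delegate are placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Placeholder keyword work item: a name, the keywords it depends on, and its async evaluation.
public record KeywordWork(string Name, string[] DependsOn, Func<Task> EvaluateAsync);

public static class DependencyGroups
{
    public static async Task RunAsync(IEnumerable<KeywordWork> keywords)
    {
        var remaining = keywords.ToList();
        var completed = new HashSet<string>();

        while (remaining.Count != 0)
        {
            // Everything whose dependencies have already run forms the next group.
            var group = remaining.Where(k => k.DependsOn.All(completed.Contains)).ToList();
            if (group.Count == 0)
                throw new InvalidOperationException("Circular keyword dependency");

            // Keywords within a group are independent, so they can run in parallel.
            await Task.WhenAll(group.Select(k => Task.Run(k.EvaluateAsync)));

            foreach (var k in group) completed.Add(k.Name);
            remaining.RemoveAll(group.Contains);
        }
    }
}
```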
These keywords are special in that they consider the results of subschemas in any adjacent keywords, such as allOf. That means any keyword, including keywords defined in third-party vocabularies, can be a dependency of these two. Running them last ensures that all dependencies are met.

Parallelization in JsonSchema.Net

For those who wish to see what this ended up looking like, the issue where I tracked this process is here and the final result of the branch is here. (Maybe someone looking at the changes can find somewhere I went wrong. Additional eyes are always welcome.)

Once I moved everything over to async function calls, I started on the parallelization journey by updating AllOfKeyword for subschema parallelization. In doing this, I ran into my first conundrum.

The evaluation context

Quite a long time ago, in response to a report of high allocations, I updated the evaluation process so that it re-used the evaluation context. Before this change, each subschema evaluation (and each keyword evaluation) would create a new context object based on information in the "current" context, and then the results from that evaluation would be copied back into the "current" context as necessary. The update changed this process so that there was a single context that maintained a series of stacks to track where it was in the evaluation process.

A consequence of this change, however, was that I could only process serially because the context indicated one specific evaluation path at a time. The only way to move into a parallel process (in which I needed to track multiple evaluation paths simultaneously) was to revert at least some of that allocation management, which meant more memory usage again. I think I figured out a good way to do it without causing too many additional allocations by only creating a new context when multiple branches were possible. So that means any keywords that have only a single subschema would continue to use the single context, but any place where the process could branch would create new contexts that only held the top layer of the stacks from the parent context.

I updated all of the keywords to use this branching strategy, and it passed the test suite, but for some reason it ran slower.

Sync

| Method | optimized | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| RunSuite | False | 874.0 ms | 13.53 ms | 12.65 ms | 80000.0000 | 19000.0000 | 6000.0000 | 178.93 MB |
| RunSuite | True | 837.3 ms | 15.76 ms | 14.74 ms | 70000.0000 | 22000.0000 | 8000.0000 | 161.82 MB |

Async

| Method | optimized | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| RunSuite | False | 1.080 s | 0.0210 s | 0.0206 s | 99000.0000 | 29000.0000 | 9000.0000 | 240.26 MB |
| RunSuite | True | 1.050 s | 0.0204 s | 0.0201 s | 96000.0000 | 29000.0000 | 9000.0000 | 246.53 MB |

Investigating this led to some interesting discoveries.

Async is not always parallel

My first thought was to check whether evaluation was utilizing all of the processor's cores. So I started up my Task Manager and re-ran the benchmark.

(Image: Performance tab of the Task Manager during a benchmark run.)

One core is pegged out completely, and the others are unaffected. That's not parallel. A little research later, and it seems that unless you explicitly call Task.Run(), a task will be run on the same thread that spawned it. Task.Run() tells .Net to run the code on a new thread. So I updated all of the keywords again to create new threads.

Things get weird

Before I ran the benchmark again, I wanted to run the test suite to make sure that the changes I made still actually evaluated schemas properly.
After all, what good is running really fast if you're going the wrong direction?

Of the 7,898 tests that I run from the official JSON Schema Test Suite, about 15 failed. That's not bad, and it usually means that I have some data mixed up somewhere, a copy/paste error, or something like that. Running each test on its own, though, they all passed. Running the whole suite again, 17 would fail. Running all of the failed tests together, they would all pass. Running the suite again… 12 failed. Each time I ran the full suite, it was a different group of fewer than 20 tests that would fail. And every time, they'd pass if I ran them in isolation or in a smaller group. This was definitely a parallelization problem.

I added some debug logging to see what the context was holding. Eventually, I found that for the failed tests, the instance would inexplicably delete all of its data. Here's some of that logging:

```
starting /properties - instance root: {"foo":[1,2,3,4]} (31859421)
starting /patternProperties - instance root: {"foo":[1,2,3,4]} (31859421)
returning /patternProperties - instance root: {} (31859421)
returning /properties - instance root: {} (31859421)
starting /additionalProperties - instance root: {} (31859421)
returning /additionalProperties - instance root: {} (31859421)
```

The "starting" line was printed immediately before calling into a keyword's .Evaluate() method, and the "returning" line was printed immediately afterward. The parenthetical numbers are the hash code (i.e. .GetHashCode()) of the JsonNode object, so you can see that it's the same object; only the contents are missing. None of my code edits the instance: all access is read-only. So I have no idea how this is happening.

A few days ago, just by happenstance, this dotnet/runtime PR was merged, which finished off changes in this PR from last year, which resolved multi-threading issues in JsonNode… that I reported! I'm not sure how that slipped by me while working on this. This fix is slated to be included in .Net 8.

I finally figured out that if I access the instance before (or immediately after) entering each thread, then it seems to work, so I set about making edits to do that. If the instance is a JsonObject or JsonArray, I simply access the .Count property. This is the simplest and quickest thing I could think to do. That got all of the tests working.

Back to our regularly scheduled program

With the entire test suite now passing every time I ran it, I wanted to see how we were doing on speed. I once again set up the benchmark and ran it with the Task Manager open.

(Image: Performance tab of the Task Manager during a benchmark run with proper multi-threading.)

The good news is that we're actually multi-threading now. The bad news is that the benchmark is reporting that the test takes twice as long as synchronous processing and uses a lot more memory.

| Method | optimized | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|---|---|---|---|---|---|---|---|
| RunSuite | False | 1.581 s | 0.0128 s | 0.0120 s | 130000.0000 | 39000.0000 | 299.3 MB |
| RunSuite | True | 1.681 s | 0.0152 s | 0.0135 s | 134000.0000 | 37000.0000 | 309.65 MB |

I don't know how this could be. Maybe touching the instance causes a re-initialization that's more expensive than I expect. Maybe spawning and managing all of those threads takes more time than the time saved by running the evaluations in parallel. Maybe I'm just doing it wrong.
The really shocking result is that it's actually slower when "optimized," that is, when taking advantage of short-circuiting where possible by checking for the first task that completed with a result matching a predicate and then cancelling the others. (My code for this was basically re-inventing this SO answer.)

Given this result, I just can't see this library moving into parallelism anytime soon. Maybe once .Net Framework is out of support, and I move it into the newer .Net era (which contains the threading fix) and out of .Net Standard (which won't ever contain the fix), I can revisit this.

If you like the work I put out, and would like to help ensure that I keep it up, please consider becoming a sponsor!
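For reference, the short-circuiting helper I was re-inventing looks roughly like this: wait for the first task whose result satisfies a predicate and ignore the rest (a fuller version would also pass a CancellationToken so the losers can stop early). This is a sketch, not the code from the branch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class TaskExtensions
{
    // Returns the first result matching the predicate, or default if none does.
    public static async Task<T?> FirstOrDefaultAsync<T>(
        this IEnumerable<Task<T>> tasks,
        Func<T, bool> predicate)
    {
        var pending = tasks.ToList();
        while (pending.Count != 0)
        {
            var finished = await Task.WhenAny(pending);
            pending.Remove(finished);

            var result = await finished;
            if (predicate(result)) return result;
        }
        return default;
    }
}
```

For anyOf, the predicate would be "this subschema passed"; for allOf, "this subschema failed."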

The "Try" Pattern in an Async World

Something I ran across while converting JsonSchema.Net from synchronous to asynchronous is that the "try" method pattern doesn't work in an async context. This post explores the pattern and attempts to explain what happens when we try to make the method async.

What is the "try" method pattern?

We've all seen various TryParse() methods. In .Net, they're on pretty much any data type that has a natural representation as a string, typically numbers, dates, and other simple types. When we want to parse a string into the type, we might go for a static parsing method which returns the parsed value. For example,

```csharp
static int Parse(string s) { /* ... */ }
```

The trouble with these methods is that they throw exceptions when the string doesn't represent the type we want. If we don't want the exception, we could wrap the Parse() call in a try/catch, but that will incur exception handling costs that we'd like to avoid. The answer is to use another static method that has a slightly different form:

```csharp
static bool TryParse(string s, out int i) { /* ... */ }
```

Here, the return value is a success indicator, and the parsed value is passed as an out parameter. If the parse was unsuccessful, the value in the out parameter can't be trusted (it will still have a value, though, usually the default for the type). Ideally, this method does more than just wrap Parse() in a try/catch for you. Instead, it should reimplement the parsing logic so that it doesn't throw an exception in the first place. However, calling TryParse() from Parse() and throwing on a failure is the ideal setup for this pair of methods if you want to re-use logic.

This pattern is very common for parsing, but it can be used for other operations as well. For example, JsonPointer.Net uses this pattern for evaluating JsonNode instances because of .Net's decision to unify .Net-null and JSON-null. There needs to be a distinction between "the value doesn't exist" and "the value was found and is null," and a .TryEvaluate() method allows this.

Why would I need to make this pattern async?

As I mentioned in the intro, I came across this when I was converting JsonSchema.Net to async. Specifically, the data keyword implementation uses a set of resolvers to locate the data that is being referenced. Those resolvers implement an interface that defines a .TryResolve() method.

```csharp
bool TryResolve(EvaluationContext context, out JsonNode? node);
```

I have a resolver for JSON Pointers, Relative JSON Pointers, and URIs. Since the entire point of this change was to make URI resolution async, I now have to make this "try" method async.

Let's make the pattern async

To make any method support async calls, its return type needs to be a Task. In the case of .TryResolve(), it needs to return Task<bool>.

```csharp
Task<bool> TryResolve(EvaluationContext context, out JsonNode? node);
```

No problems yet. Let's go to one of the resolvers and tag it with async so that we can use await for the resolution calls. Oh… that's not going to work. Since we can't have out parameters for async methods, we have two options:

- Implement the method without using async and await.
- Get the value out another way.

I went with the second solution.

```csharp
async Task<(bool, JsonNode?)> TryResolve(EvaluationContext context) { /* ... */ }
```

This works perfectly fine: it gives a success output and a value output. Hooray for tuples in .Net!

Later, I started thinking about why out parameters are forbidden in async methods.

Why are out parameters forbidden in async methods?
Without going into too much detail, when you have an async method, the compiler is actually doing a few transformations for you. Specifically, it has to transform your method that looks like it's returning a bool into one that returns a Task<bool>. This async method

```csharp
async Task<bool> SomeAsyncMethod()
{
    // some stuff
    await AnotherAsyncMethod();
    // some other stuff
    return true;
}
```

essentially becomes

```csharp
Task<bool> SomeAsyncMethod()
{
    // some stuff
    return Task.Run(AnotherAsyncMethod)
        .ContinueWith(result =>
        {
            // some other stuff
            return true;
        });
}
```

There are a few other changes and optimizations that happen, but this is the general idea. So when we add an out parameter,

```csharp
Task<bool> SomeAsyncMethod(out int value)
{
    // some stuff
    return Task.Run(AnotherAsyncMethod)
        .ContinueWith(result =>
        {
            // some other stuff
            return true;
        });
}
```

it needs to be set before the method returns. That means it can only be set as part of // some stuff. But in the async version, it's not apparent that value has to be set before anything awaits, so they just forbid having out parameters in async methods altogether.

In the context of my .TryResolve() method, I'd have to set the out parameter before I fetch the URI content, but I can't do that because the URI content is what goes in the out parameter. Given this new information, it seems the first option of implementing the async method without async/await really isn't an option.

A new pattern

While I found musing over the consequences of out parameters in async methods interesting, I think the more significant outcome from this experience is finding a new version of the "try" pattern.

```csharp
Task<(bool, ResultType)> TrySomethingAsync(InputType input)
{
    // ...
}
```

It's probably a pretty niche need, but I hope having this in your toolbox helps you at some point.

If you like the work I put out, and would like to help ensure that I keep it up, please consider becoming a sponsor!
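Here's a hypothetical consumer of that shape; naming the tuple elements keeps the call site almost as readable as the classic out-parameter version.

```csharp
using System;
using System.Threading.Tasks;

// Invented example of the async "try" pattern; the names here are not from any library.
public static class AsyncTryExample
{
    public static async Task<(bool Success, int Value)> TryParseRemoteNumberAsync(string id)
    {
        await Task.Delay(10);               // stand-in for a real async lookup
        return int.TryParse(id, out var n)  // reuse the synchronous "try" internally
            ? (true, n)
            : (false, default);
    }

    public static async Task RunAsync()
    {
        var (success, value) = await TryParseRemoteNumberAsync("42");
        Console.WriteLine(success ? $"got {value}" : "not found");
    }
}
```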

JSON Schema, But It's Async

The one thing I don't like about how I've set up JsonSchema.Net is that SchemaRegistry.Fetch only supports synchronous methods. Today, I tried to remedy that. This post is a review of those prospective changes. For those who'd like to follow along, take a look at the commit that is the fallout of this change. Just about every line in this diff is a required, direct consequence of just making SchemaRegistry.Fetch async.

What is SchemaRegistry?

Before we get into the specific change and why we need it, we need to cover some aspects of dereferencing the URI values of keywords like $ref. The JSON Schema specification states

… implementations SHOULD NOT assume they should perform a network operation when they encounter a network-addressable URI.

That means that, to be compliant with the specification, we need some sort of registry to preload any documents that are externally referenced by schemas. This text addresses the specification's responsibility around the many security concerns that arise as soon as you require implementations to reach out to the network. By recommending against this activity, the specification avoids those concerns and passes them on to the implementations that, on their own, wish to provide that functionality.

JsonSchema.Net is one of a number of implementations that can be configured to perform these "network operations" when they encounter a URI they don't recognize. This is acceptable to the specification because it is opt-in. In JsonSchema.Net this is accomplished using the SchemaRegistry.Fetch property. By not actually defining a method in the library, I'm passing on those security responsibilities to the user.

I actually used to use it to run the test suite. Several of the tests reference external documents through a $ref value that starts with http://localhost:1234/. The referenced documents, however, are just files stored in a dedicated directory in the suite. So in my function, I replaced that URI prefix with the directory, loaded the file, and returned the deserialized schema. Now I just pre-load them all to help the suite run a bit faster.

SchemaRegistry.Fetch is declared as an instance property of type Func<Uri, IBaseDocument?>. Really, this acts as a method that fetches documents that haven't been pre-registered. Declaring it as a property allows the user to define their own method to perform this lookup. As this function returns an IBaseDocument?, it's synchronous.

Why would we want this to be async?

The way to perform a network operation in .Net is by creating an HttpClient and calling one of its methods. Funnily, though, all of those methods are… async. One could create a quasi-synchronous method that makes the call and waits for it,

```csharp
IBaseDocument? Download(Uri uri)
{
    using var client = new HttpClient();

    var text = client.GetStringAsync(uri).Result;
    if (text == null) return null;

    return JsonSchema.FromText(text);
}
```

but that isn't ideal, and, in some contexts, it's actively disallowed. Attempting to block on a task's .Result in Blazor WebAssembly throws a PlatformNotSupportedException, which is why json-everything.net doesn't yet support fetching referenced schemas, despite it being online, where fetching such documents automatically might be expected.

So we need the SchemaRegistry.Fetch property to support an async method. We need it to be of type Func<Uri, Task<IBaseDocument?>>.
Then our method can look like this:

```csharp
async Task<IBaseDocument?> Download(Uri uri)
{
    using var client = new HttpClient();

    var text = await client.GetStringAsync(uri);
    if (text == null) return null;

    return JsonSchema.FromText(text);
}
```

Making the change

Changing the type of the property is simple enough. However, this means that everywhere the function is called now needs to be within an async method… and those methods also need to be within async methods… and so on. Async propagation is real! In the end, the following public methods needed to be changed to async:

- JsonSchema.Evaluate()
- IJsonSchemaKeyword.Evaluate() and all of its implementations, which is every keyword, including the ones in the vocabulary extensions
- SchemaRegistry.Register()
- SchemaRegistry.Get()
- IBaseDocument.FindSubschema()

The list doesn't seem that long like this, but there were a lot of keywords and internal methods. The main thing that doesn't make this list, though, is the tests. Oh my god, there were so many changes in the tests! Even with the vast majority of the over 10,000 tests being part of the JSON Schema Test Suite (which really just has some loading code and a single method), there were still a lot of .Evaluate() calls to update.

Another unexpected impact of this change was in the validating JSON converter from a few posts ago. JsonConverter's methods are synchronous, and I can't change them. That means I had to use .Result inside the .Read() method. That means the converter can't be used in a context where that doesn't work.

It's ready…

… but it may be a while before this goes in. All of the tests pass, and I don't see any problems with it, but it's a rather large change. I'll definitely bump major versions for any of the packages that are affected, which is effectively all of the JSON Schema packages. I'll continue exploring a bit to see what advantages an async context will bring. Maybe I can incorporate some parallelism into schema evaluation. We'll see.

But really I want to get some input from users. Is this something you'd like to see? Does it feel weird at all to have a schema evaluation be async, even if you know you're not making network calls? How does this impact your code? Leave some comments below or on this issue with your thoughts.

If you like the work I put out, and would like to help ensure that I keep it up, please consider becoming a sponsor!
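For illustration, wiring a fetch function into the registry might look like this once the property is async. Whether it's set on SchemaRegistry.Global or on per-evaluation options depends on the final shape of the change, so treat this as a sketch rather than the released API.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Json.Schema;

public static class FetchSetup
{
    // Reuse one HttpClient instead of creating one per fetch.
    private static readonly HttpClient _client = new();

    public static async Task<IBaseDocument?> Download(Uri uri)
    {
        var text = await _client.GetStringAsync(uri);
        return JsonSchema.FromText(text);
    }

    public static void Configure()
    {
        // Opt in to network resolution; anything not pre-registered goes through Download.
        SchemaRegistry.Global.Fetch = Download;
    }
}
```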

Numbers Are Numbers, Not Strings

A common practice when serializing to JSON is to encode floating point numbers as strings. This is done any time high precision is required, such as in the financial or scientific sectors. This approach is designed to overcome a flaw in many JSON parsers across multiple platforms, and, in my opinion, it's an anti-pattern.

Numbers in JSON

The JSON specification (the latest being RFC 8259 as of this writing) does not place limits on the size or precision of numbers encoded into the format. Nor does it distinguish between integers and floating point. That means that if you were to encode the first million digits of π as a JSON number, that precision would be preserved. Similarly, if you were to encode 85070591730234615847396907784232501249, the square of the 64-bit integer limit, it would also be preserved. They are preserved because JSON, by its nature as a text format, encodes numeric values as decimal strings. The trouble starts when you try to get those numbers out via parsing.

It should also be noted that the specification does have a couple paragraphs regarding support for large and high-precision numbers, but that does not negate the "purity" of the format.

This specification allows implementations to set limits on the range and precision of numbers accepted. Since software that implements IEEE 754 binary64 (double precision) numbers [IEEE754] is generally available and widely used, good interoperability can be achieved by implementations that expect no more precision or range than these provide, in the sense that implementations will approximate JSON numbers within the expected precision. A JSON number such as 1E400 or 3.141592653589793238462643383279 may indicate potential interoperability problems, since it suggests that the software that created it expects receiving software to have greater capabilities for numeric magnitude and precision than is widely available.

Note that when such software is used, numbers that are integers and are in the range [-(2**53)+1, (2**53)-1] are interoperable in the sense that implementations will agree exactly on their numeric values.

The problem with parsers

Mostly, parsers are pretty good, except when it comes to numbers. An informal, ad-hoc survey conducted by the engineers at a former employer of mine found that the vast majority of parsers in various languages automatically parse numbers into their corresponding double-precision (IEEE 754) floating point representation. If the user of that parsed data wants the value in a more precise data type (e.g. a decimal or bigint), that floating point value is converted into the requested type afterward. But at that point, all of the precision stored in the JSON has already been lost! In order to properly get these types out of the JSON, they must be parsed directly from the text.

My sad attempt at repeating the survey

Perl will at least give you the JSON text for the number if it can't parse the number into a common numeric type. This lets the consumer handle those cases. It also appears to have some built-in support for bignum.

A JSON number becomes either an integer, numeric (floating point) or string scalar in perl, depending on its range and any fractional parts.

Javascript actually recommends the anti-pattern for high-precision needs.

… numbers in JSON text will have already been converted to JavaScript numbers, and may lose precision in the process.
To transfer large numbers without loss of precision, serialize them as strings, and revive them to BigInts, or other appropriate arbitrary precision formats.

Go (I just researched online) parses a bigint number as floating point and truncates high-precision decimals. There's even an alternative parser that behaves the same way.

Ruby only supports integers and floating point numbers.

PHP (search for "Example #5 json_decode() of large integers") appears to operate similarly to Perl in that it can give output as a string for the consumer to deal with.

.Net actually stores the tokenized value (_parsedData) and then parses it upon request. So when you ask for a decimal (via .GetDecimal()) it actually parses that data type from the source text and gives you what you want. 10pts for .Net! This is why JsonSchema.Net uses decimal for all non-integer numbers. While there is a small sacrifice on range, you get higher precision, which is often more important.

It appears that many languages support dynamically returning an appropriate data type based on what's in the JSON text (integer vs floating point), which is neat, but then they only go half-way: they only support basic integer and floating point types without any support for high-precision values.

Developers invent a workaround

As is always the case, the developers who use these parsers need a solution, and they don't want to have to build their own parser to get the functionality they need. So what do they do? They create a convention where numbers are serialized as JSON strings any time high precision is required. This way the parser gives them a string, and they can parse that back into a number of the appropriate type however they want.

However, this has led to a multitude of support requests and StackOverflow questions.

- How do I configure the serializer to read string-encoded numbers?
- How do I validate string-encoded numbers?
- When is it appropriate or unnecessary to serialize numbers as strings?

And, as we saw with the Javascript documentation, this practice is actually being recommended now! This is wrong! Serializing numbers as strings is a workaround that came about because parsers don't do something they should.

On the validation question, JSON Schema can't apply numeric constraints to numbers that are encoded into JSON strings. They need to be JSON numbers for keywords like minimum and multipleOf to work.

Where to go from here

Root-cause analysis gives us the answer: the parsers need to be fixed. They should support extracting any numeric type we want from JSON numbers and at any precision. A tool should make a job easier. However, in this case, we're trying to drive a screw with a pair of pliers. It works, but it's not what was intended.

If you like the work I put out, and would like to help ensure that I keep it up, please consider becoming a sponsor!
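A quick System.Text.Json illustration of the .Net behavior described above: asking for a decimal re-parses the original token, so digits that a double would round away survive.

```csharp
using System;
using System.Text.Json;

// The value 1.0000000000000001 is exactly representable as a decimal,
// but rounds to 1 when forced through binary64.
var doc = JsonDocument.Parse("{\"amount\": 1.0000000000000001}");
var element = doc.RootElement.GetProperty("amount");

Console.WriteLine(element.GetDouble());   // 1
Console.WriteLine(element.GetDecimal());  // 1.0000000000000001
Console.WriteLine(element.GetRawText());  // the original token: 1.0000000000000001
```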

JSON Path Has a New Spec!

The IETF is publishing a new RFC to formalize the well-known JSON Path query syntax, originally proposed by Stefan Gössner in 2008.

A brief history

The effort to build a specification started after Christoph Burgmer created his amazing JSON Path Comparison website, a project he started in 2019. To build it, he gathered all of the implementations he could find, created test harnesses for each of them, and ran them all against a very large and comprehensive set of JSON Path queries that were found to be supported by at least one of the implementations. The resulting grid revealed that many implementations had their own "flavor" of Gössner's original JSON Path. Some added syntax that wasn't specified, while others merely filled in gaps to the best of their ability.

In 2020, I was invited (along with many other library maintainers) by Glyn Normington to participate in writing an official specification, and soon after, the IETF was invoked to manage the spec-writing process. Since then, we have been working to solidify a syntax that can be implemented on any platform, in any language, and provide consistent results.

The charter

The idea behind writing the specification was to provide common ground that the implementations, which at this point vary considerably from each other, can all aim for. In this way, there would be at least a minimal level of interoperability or guaranteed support. As long as a user wrote a JSON Path query that only used features in the specification, it would be supported by whatever implementation they were using.

The other thing the charter wanted to ensure was minimal breakage of existing support. We wanted to make sure that we were supporting existing queries as much as possible.

Similarities

I think that we covered the basics, and for the most part, it's largely the same as what's in Gössner's post:

- bracketed syntax for everything
- dot syntax for friendly names (.foo) and wildcards (.*)
- select by object key name
- array index (negative selects from end)
- array index slice (1:10:2 to select indices 1, 3, 5, 7 and 9)
- wildcard (all children)
- expression
- double-dot syntax to make any query recursive
- use single- or double-quotes

This should support most users' needs.

Once parsed (meaning the syntax is valid), an implementation must not error. That means if, for example, a comparison doesn't make sense (e.g. an array being less than a number), the result is just false, and the node isn't selected. This behavior wasn't stated in Gössner's post, but it seemed reasonable to include.

Finally, the return value is what we call a "nodelist." A node is a tuple that consists of a value that is present in the JSON data as well as its location within that data. Duplicate nodes (same value and location) may appear in nodelists if they're selected by different parts of the path.

Additions

In addition to the above, some features from other implementations did make it into the specification.

Multiple selectors within a bracketed syntax:

$['foo','bar']

You can even mix and match selectors:

$['foo',42,1:10:2]

Parentheses are no longer needed for filter expressions:

$[?@.foo==42]

Omissions

Currently, math operators aren't supported by the specification (though I encouraged the group to add them).

// find items where the difference between .foo and .bar is 42
$[?@.foo-@.bar==42]

Although it wasn't supported by Gössner's post, starting the overall JSON Path with @, which does seem to be supported by a considerable number of implementations, was decided against.
@.foo

Only JSON primitives are allowed in filter expressions, so explicit objects and arrays are not permitted.

$[?@.foo==["string",42,false]]

The in operator was excluded. This one isn't quite as common, but several of the larger implementations do support it, so I figured it was worth mentioning.

$[?@.foo in [1,2,3,4,"string"]]
// also requires structured values
$[?42 in @.foo]

All of the above are supported in JsonPath.Net via the PathParsingOptions object.

What I came to call "container queries" are also not supported. These are expression queries where, instead of evaluating to a boolean, the expression would evaluate to the index or key to select. The team just couldn't find a compelling use case for them, though I did propose a couple niche use cases.

// can be written as $[-1]
$[(@.length-1)]
// can't be otherwise expressed
$[@.discriminator]
// for an object like {"discriminator": "foo", "foo": "bar"}

.length to determine the length of an array is not supported. It's featured in Gössner's post, and just about every implementation supports it. However, it creates an ambiguity with trying to select length properties in data. Most of the time with existing implementations, the workaround for selecting a length property is to use the bracket syntax, ['length']. This indicates that you want the value of that property rather than the number of items that the data contains. However, the team felt that it was better not to have special cases. The functionality was restored as the length() function, although it is a different syntax (which we'll come to), and .length will no longer be supported as generally expected.

Filter expressions

I think the biggest difference is in how filter expressions (?@.foo==42) are supported. Gössner's post says that the filter expressions should use the underlying language engine. Doing this is easier to specify, and it's easier to implement. However, it's not at all interoperable. If I need to send a JSON Path to some server, I shouldn't need to know that, if the server is written in Python, I need to write my filter expression in Python. If I want to send that same query to another server that's written in another language, I have to write a new path for that server that attempts to do the same thing. (There are also security implications of receiving code to be executed.) The only way to resolve this is to specify the filter expression syntax: a common syntax that can be implemented using any language.

Based on well-known syntax

Most developers should be used to the C-style operators, && and ==, etc. The order of operations should be familiar as well: ||, &&, !, then comparisons. Also, both single- and double-quoted strings are permitted.

Functions

Yes! JSON Path now supports functions in filter expressions. These were introduced partially to support the .length functionality, but also as a general extension mechanism. So instead of

$[?@.foo.length>42]

you'd use

$[?length(@.foo)>42]

There are four other functions defined by the specification:

- count(<path>) will return the number of results from a sub-query
- match(<string>, <regex>) will exact-match a string against a regular expression (implicit anchoring)
- search(<string>, <regex>) will contains-match a string against a regular expression
- value(<path>) will return the value of a query that only returns a single result

Every function must declare a "type" for its return value and its parameters.
If a function is written so that these types are not correct, the specification requires that the parse fail. There are three types:

- ValueType - any JSON value and Nothing, which is akin to undefined in Javascript
- LogicalType - either LogicalTrue or LogicalFalse; think of it like the result of a comparison. It's distinct from JSON's true and false literals.
- NodesType - the return of a query

Technically, the well-typedness of functions is determined during a semantic analysis step that occurs after the parse, but JsonPath.Net does both the parse and the semantic analysis at the same time, so in my head it's all just "parsing." You give it a string, and it gives you a path… or errors.

Extending JSON Path

Lastly, the IETF will be maintaining a function registry where new functions can be defined for all implementations to use. The five functions in the spec document (listed above) will be required, and the registry functions will be recommended. You'll need to check with the implementation to see what it supports. I plan on supporting everything in JsonPath.Net.

In summary

That's pretty much the spec. There are a few changes that are incompatible with what is understood by many implementations, but I think what we have should be supportable by everyone. If you'd like to join in on the fun, have a look at the GitHub repo where we're writing the spec and join the IETF mailing list for the project. I hope that we continue this effort to further define and enhance JSON Path.

If you like the work I put out, and would like to help ensure that I keep it up, please consider becoming a sponsor!
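As a quick illustration of the new function syntax from an implementation's point of view, here's one of those queries run through JsonPath.Net. The result-object property names here are from my reading of the library's docs, so double-check them against the current API.

```csharp
using System;
using System.Text.Json.Nodes;
using Json.Path;

var instance = JsonNode.Parse(
    "[ { \"foo\": \"short\" }, { \"foo\": \"a string that is definitely longer than forty-two characters\" } ]");

// Select items whose foo string is longer than 42 characters, using the spec's length() function.
var path = JsonPath.Parse("$[?length(@.foo)>42]");
var result = path.Evaluate(instance);

if (result.Matches is not null)
{
    foreach (var match in result.Matches)
    {
        Console.WriteLine($"{match.Location}: {match.Value}");
    }
}
```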

JSON Deserialization with JSON Schema Validation

This past weekend, I wondered if it was possible to validate JSON as it was being deserialized. For streaming deserialization, this is very difficult, if not impossible. It's certainly not something that a validator like JsonSchema.Net is set up to do. But I found another option. In this post, I'm going to go over the new v4.1.0 release of JsonSchema.Net, which includes support for full JSON Schema validation during deserialization, and why this approach is preferred over the built-in validation options.

What options have we?

Let's start by looking at what we have available to us for model validation. For this post, I'm going to use the following model:

```csharp
public class MyModel
{
    public string Foo { get; set; }
    public int Bar { get; set; }
    public DateTime Baz { get; set; }
}
```

The serializer is pretty simple. It'll check basic things like value type, and that's pretty much it. In some cases, it can do a little better; for example, in cases like Baz where the serialization results in a string, it will verify that the string content is representative of the type it's supposed to deserialize to, but this is really just an extension of the type checking.

For anything more robust, we need to add attributes from System.ComponentModel.DataAnnotations. These annotations allow us to specify data validity for our properties. Let's add the following requirements:

- Foo must be between 10 and 50 characters long
- Bar cannot be negative
- Baz is required

```csharp
public class MyModel
{
    [MinLength(10)]
    [MaxLength(50)]
    public string Foo { get; set; }
    [Range(0, int.MaxValue)]
    public int Bar { get; set; }
    [Required]
    public DateTime Baz { get; set; }
}
```

But we have a problem. The serializer doesn't support these attributes at all. This JSON will still be deserialized successfully:

```json
{
  "Foo": "foo",
  "Bar": -42
}
```

Ideally, we want to get errors for all three of these properties, but the serializer gives us nothing. Instead, we get a model where

- Foo is the string "foo"
- Bar is -42
- Baz is DateTime.MinValue (the default for DateTime)

We have to separately check the model after it's been deserialized to determine if what we received is valid.

```csharp
var results = new List<ValidationResult>();
Validator.TryValidateObject(myModel, new ValidationContext(myModel), results, true);
```

This will populate results with the errors that it can detect. But because Baz is not nullable in our model, it receives the default value for its type, and thus the [Required] attribute is met, even though it was missing from the JSON data.

While this system can work, it has its shortcomings. We can receive errors from either the serializer via exceptions or from the model validator via the list. And the serializer is only going to report the first error it receives; there may be others.

Validation during deserialization

In order to make a better experience, we want to validate the JSON as it's coming in. To do that, we need a couple things:

- a way to attach a schema to a type
- a way to hook into the serializer

For the first one, we'll create a [JsonSchema()] attribute that can be applied to a type. This is easy; we'll come back to it later. The second one is harder, so let's tackle that first.

The only way to hook into the serialization process is with a JSON converter. Well… kinda. There is a more roundabout way, called a JSON converter factory. Basically, this is a special JsonConverter-derived class that produces other JsonConverter instances that are then used to perform the conversion.
The idea now is to create a converter that performs validation, then passes off deserialization to another converter, specifically the one the serializer would have chosen without our validation converter. The flow looks like this:

The serializer checks any custom converters to see if they can handle our type.
ValidatingJsonConverter (this is actually our factory) checks the type for the [JsonSchema()] attribute. If it's found, it returns a ValidatingJsonConverter<T> (T is the type to convert); otherwise it says it can't convert that type.
The serializer invokes the converter.
The converter reads the JSON payload and validates it against the schema. If the JSON isn't valid, it throws a JsonException with the details of the validation; otherwise, it passes deserialization to another converter.

The factory turns out to be pretty easy. We need to create a converter that's typed for what we're trying to deserialize. A little reflection, and we're done. The interesting challenge is in the converter itself, and it uses a rather neat consequence of .Net's decision to make Utf8JsonReader a struct.

The converter

Historically, when I've tried to validate JSON before deserialization, I would first parse the JSON into JsonNode (or JsonElement before that was available). Then I could validate it with a schema. If the validation succeeded, I could then deserialize directly from the JSON model. However, this secondary deserialization actually meant getting the string representation back out of the JSON model and then parsing it again in the deserialization step. As quick as the System.Text.Json serializer is, making it perform a complete parse twice can get expensive.

To avoid (some of) this duplication of work, it turns out that we can just grab a copy of the Utf8JsonReader object by assigning it to a new variable. Because it's a struct, all of its data is copied directly, and we can modify the copy all we want without affecting the original. This lets us reuse the tokenization and everything else that has already been done to build the reader without having to repeat that work. Now we are free to parse out a JsonNode and validate it with our schema. So far, our converter looks like this:

public override T? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
{
    var readerCopy = reader;
    var node = JsonSerializer.Deserialize<JsonNode?>(ref readerCopy, options);

    var validation = _schema.Evaluate(node, new EvaluationOptions
    {
        OutputFormat = OutputFormat,
        Log = Log!,
        RequireFormatValidation = RequireFormatValidation
    });

    if (validation.IsValid)
    {
        // TODO: deserialize the model
    }

    throw new JsonException("JSON does not meet schema requirements")
    {
        Data = { ["validation"] = validation }
    };
}

That's most of it. Now we just need to invoke the deserializer again, but we need to be careful that we don't cause a recursive loop. To avoid this, we can create a copy of the serializer options, remove our converter (the factory), and just deserialize normally.

if (validation.IsValid)
{
    // _optionsFactory is a delegate that's passed into the converter.
    // It copies the options and removes the converter factory
    // so we don't enter a recursive loop.
    var newOptions = _optionsFactory(options);
    return JsonSerializer.Deserialize<T>(ref reader, newOptions);
}
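The post doesn't show the _optionsFactory delegate itself, so here's a minimal sketch of what that options-copying step can look like; the method name is mine, and the library's internal wiring differs. The JsonSerializerOptions copy constructor (available since .NET 5) does the heavy lifting.

// A rough sketch of the "copy the options and drop the validating factory" step.
// CopyWithoutValidation is an illustrative name, not the library's API.
private static JsonSerializerOptions CopyWithoutValidation(JsonSerializerOptions options, JsonConverter validatingFactory)
{
    // The copy constructor clones all settings, including the converter list.
    var copy = new JsonSerializerOptions(options);
    copy.Converters.Remove(validatingFactory);
    return copy;
}

In practice you'd want to cache the copied options rather than rebuild them on every Read call, since constructing JsonSerializerOptions instances repeatedly is relatively expensive.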
Declaring the schema

Now that we have the hard bit out of the way, let's work on that attribute. We want to get the schema dynamically at runtime.

The best way to do this is by following the example of unit test frameworks such as NUnit and XUnit. Both of these frameworks allow the developer to specify test cases by exposing a property that returns them. I use this to run the JSON Schema Test Suite: I can load the files from disk, read all of the tests they contain, and return a massive collection with all of the test cases. The key part, though, is that the test cases aren't known until the test suite runs.

The way these work is by adding an attribute to the test method that gives the name of a property or method on the test class that will return the cases. We'll do something similar, but we don't want to restrict developers to defining the schema in the model, so we'll also need the type that declares that member. For example, if we wanted to have all of our model schemas available in a static class called ModelSchemas, we could add this attribute to MyModel:

[JsonSchema(typeof(ModelSchemas), nameof(ModelSchemas.MyModelSchema))]

The attribute can then reflectively load the type, find that property (fields are also supported the way I built it), and invoke it to get the value. With that, the attribute has the schema, which ValidatingJsonConverter can use later.

Putting it all together

First, let's define our schema. We wanted to put this in a static class, so here's the declaration:

public static class ModelSchemas
{
    public static readonly JsonSchema MyModelSchema = new JsonSchemaBuilder()
        .Type(SchemaValueType.Object)
        .Properties(
            (nameof(MyModel.Foo), new JsonSchemaBuilder()
                .Type(SchemaValueType.String)
                .MinLength(10)
                .MaxLength(50)
            ),
            (nameof(MyModel.Bar), new JsonSchemaBuilder()
                .Type(SchemaValueType.Integer)
                .Minimum(0)
            ),
            (nameof(MyModel.Baz), new JsonSchemaBuilder()
                .Type(SchemaValueType.String)
                .Format(Formats.DateTime)
            )
        )
        .Required(nameof(MyModel.Baz));
}

Now let's attach that to our model:

[JsonSchema(typeof(ModelSchemas), nameof(ModelSchemas.MyModelSchema))]
public class MyModel
{
    public string Foo { get; set; }
    public int Bar { get; set; }
    public DateTime Baz { get; set; }
}

And finally, when we deserialize, we need to include ValidatingJsonConverter:

var jsonText = @"{ ""Foo"": ""foo"", ""Bar"": -42 }";

var converter = new ValidatingJsonConverter();
var options = new JsonSerializerOptions
{
    Converters = { converter }
};

var myModel = JsonSerializer.Deserialize<MyModel>(jsonText, options);

This will throw a JsonException that carries an EvaluationResults in its .Data dictionary under "validation". Those results will show that validation failed, but that's it. To get more detailed output, you need to configure the validation through the converter.

var jsonText = @"{ ""Foo"": ""foo"", ""Bar"": -42 }";

var converter = new ValidatingJsonConverter
{
    OutputFormat = OutputFormat.List
};
var options = new JsonSerializerOptions
{
    Converters = { converter }
};

var myModel = JsonSerializer.Deserialize<MyModel>(jsonText, options);

Now it will give errors:

Foo is too short
Bar cannot be negative
Baz is missing
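On its own, the JsonException only tells you that validation failed; the details ride along in its Data dictionary. Here's a rough sketch of pulling them back out and printing them. Treat the Details, Errors, and InstanceLocation property names as assumptions about the shape of the list-format output rather than a definitive reading of the API.

// Sketch: surfacing the schema validation errors carried by the JsonException.
// The Details/Errors/InstanceLocation shape below is an assumption about the
// library's list-format output; adjust if the actual API differs.
try
{
    var myModel = JsonSerializer.Deserialize<MyModel>(jsonText, options);
}
catch (JsonException ex) when (ex.Data["validation"] is EvaluationResults results)
{
    foreach (var detail in results.Details)
    {
        if (detail.Errors is null) continue;

        foreach (var (keyword, message) in detail.Errors)
            Console.WriteLine($"{detail.InstanceLocation}: [{keyword}] {message}");
    }
}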
So what about this data?

{
  "Foo": "foo is long enough",
  "Bar": 42,
  "Baz": "May 1, 2023"
}

Here, everything is right except the format of Baz: it is a date, but JSON Schema requires RFC 3339 formatting for date/time values. Also, the format keyword is an annotation by default, which means it's not even validated, so this would pass validation and then explode during deserialization. To fix this, we need to add another option to the converter:

var jsonText = @"{ ""Foo"": ""foo is long enough"", ""Bar"": 42, ""Baz"": ""May 1, 2023"" }";

var converter = new ValidatingJsonConverter
{
    OutputFormat = OutputFormat.List,
    RequireFormatValidation = true
};
var options = new JsonSerializerOptions
{
    Converters = { converter }
};

var myModel = JsonSerializer.Deserialize<MyModel>(jsonText, options);

There. Now it will validate the format and… still explode. But it'll explode for the right reason this time, with proper JSON Schema output.

Bonus material

It turns out that while JSON Schema's date-time format requires RFC 3339 formatting, .Net's serializer requires ISO 8601-1:2019 formatting. The two overlap a little but aren't exactly the same; dates in the format 2023-05-01T02:09:48.54Z will generally be accepted by both. I've opened an issue with the .Net team to see if I can persuade them to be more tolerant of date/times during deserialization. Short of waiting for that, you can create a custom format that checks for ISO 8601-1:2019 date/times.

If you like the work I put out, and would like to help ensure that I keep it up, please consider becoming a sponsor!