
# Why Is Everything So Expensive?!

Just a quick announcement.

# Rebuilding JsonSchema.Net: The Destination

JsonSchema.Net has just undergone a major overhaul. It’s now faster, it uses less memory, it’s easier to extend, and it’s just more pleasant to work with. In the previous post, I discussed the motivations behind the change and the architectural concepts applied. In this post, I’ll get into the technical bits.

## The status quo

In the previous design of JsonSchema.Net, which started with v5, the idea was to model each keyword as a constraint, and then a schema (or subschema) was merely a collection of these constraints. A constraint consisted of some validation requirement applying to some location in the instance. By identifying the constraints, embedding the validation logic in the model, and pre-processing as much as possible, the actual evaluation could happen faster because it already knew what to do; the evaluation was just doing it.

Upon reflection, I think the idea was solid, but some of the choices I made led to unnecessary memory allocations and difficulties when attempting concurrent validations, and, well, the result was just ugly and difficult to use and extend. I didn’t enjoy working in the library, and as a result I avoided fixing some of the issues that were reported.

## A new approach

The new model follows a similar idea: collect information about the keywords and build a model of the schema, resolving as much as possible at build time. However:

- Instead of creating a delegate to perform evaluation, I’d use stateless logic in the form of a singleton handler that I could attach alongside the keyword data.
- Instead of using JsonNode, which requires an allocation for each node, I’d use JsonElement, which uses spans under the hood to reference the original data.
- Instead of having a global keyword registry that applied to all schemas throughout the app, I’d implement the concept of a dialect, which is the pool of available keywords, and I’d make the chosen dialect configurable. More on dialects in a bit.
- Instead of hiding the build step inside the evaluation and attempting to manage it myself by guessing what my client, you, might want to do, I’d expose the build step as a separate action and let you manage it yourself.
- Instead of deciding for you that decimal was the type to use for numeric operations, maybe I could simply operate on the underlying JSON text. I only need comparisons and divisibility (not division, just “is x divisible by y?”).

## The schema model

The root still starts with the JsonSchema class. You build a schema through one of the factory methods:

- .Build(JsonElement)
- .FromText(string)
- .FromFile(string)

Serialization is still supported, but it can only use the default build options.

Each of these methods also takes a BuildOptions object, on which you can declare the dialect you want to use and registries which define the collection of dialects, schemas, and vocabularies you want to be available during the build. Each of the registries defaults to a global registry, and any lookups also fall back to the global registry if the item isn’t found locally. Once built, the schema object is immutable.

By encapsulating the build dependencies this way, you can build the same schema JSON text against different options to get different behaviors. For example, if you want to run a schema as the Draft 7 specification and then later as the Draft 2020-12 specification, you can: you just need to build it twice, setting the respective dialect in the build options each time.
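To make that concrete, here’s a minimal sketch of building the same schema text twice with different dialects. JsonSchema.FromText and BuildOptions are the names described above; the Json.Schema namespace, the property used to select the dialect, and the dialect constants are assumptions for illustration only.

```csharp
using Json.Schema;  // namespace assumed

var schemaText = """
    {
      "properties": {
        "count": { "type": "integer" }
      }
    }
    """;

// Build the same text twice, once per dialect (the property and constant
// names here are illustrative, not the exact API surface).
var asDraft7 = JsonSchema.FromText(schemaText,
    new BuildOptions { Dialect = Dialects.Draft7 });
var asDraft202012 = JsonSchema.FromText(schemaText,
    new BuildOptions { Dialect = Dialects.Draft202012 });

// Each built schema is immutable and carries its own build dependencies,
// so both can be evaluated side by side with different behaviors.
```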
Dialects are also auto-selected when encountering a $schema keyword. The dialect is looked up, and if it’s registered, then it just uses those keywords. If that dialect isn’t registered, then it looks for the meta-schema in the schema registry and checks for a $vocabulary keyword. If that succeeds, it can dynamically build the dialect for you, supposing the appropriate vocabularies are registered; otherwise, an exception is thrown.

Keywords are supported with singleton handlers. A handler implements logic for validating the keyword data itself, building subschemas, and evaluating an instance. In previous versions, when a keyword’s behavior had evolved over the various specification versions, there would be a single keyword type that handled all of the different flavors. Now, there’s a handler for each different behavior. This keeps the logic very simple in each one, and it makes composing the logic into a dialect very easy. If keyword validation fails, an exception is thrown.

Inside JsonSchema, there is a JsonSchemaNode which is the actual root of the graph. This contains information about where in the schema we are, relative to the root, and information on any keywords in the subschema that were supported by the dialect. Keyword information is provided by the KeywordData struct, which identifies the keyword name, a reference to the handler, and a list of subschemas, which are represented by further nodes.

Once the schema is navigated, and all of the nodes and keywords are built, the build process checks the graph for cycles and attempts to resolve references. If a cycle is detected, an exception is thrown. If a reference cannot be resolved, the build continues anyway. The last step during the build is adding the schema to the schema registry found on the build options, which defaults to the global registry.

Allowing the references to remain unresolved handles situations where two schemas reference each other. For example, you could have Schema A define a property which is validated by Schema B, which in turn has a property which is validated by Schema A. When Schema A is built, the reference is left unresolved. When Schema B is built, the process of resolving its references will drill down into Schema A, resolving its reference back to Schema B. If a schema that has not been fully resolved is used to perform an evaluation, an exception will be thrown.
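As a sketch of that cross-referencing scenario (FromText is the factory described above; the $ref layout is just for illustration, and registration into the global registry is the default behavior described above):

```csharp
// Schema A references Schema B, which doesn't exist yet, so A's
// reference is left unresolved after this build.
var schemaA = JsonSchema.FromText("""
    {
      "$id": "https://example.com/schema-a",
      "properties": { "b": { "$ref": "https://example.com/schema-b" } }
    }
    """);

// Building Schema B resolves its references by drilling into Schema A,
// which also completes A's reference back to B.
var schemaB = JsonSchema.FromText("""
    {
      "$id": "https://example.com/schema-b",
      "properties": { "a": { "$ref": "https://example.com/schema-a" } }
    }
    """);

// Both schemas are now fully resolved and safe to evaluate.
```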
## Supporting custom keywords

In the old model, keywords needed to:

- implement IJsonSchemaKeyword to enable processing
- add a [SchemaKeyword] attribute to identify the keyword name
- add a [SchemaSpecVersion] attribute for all appropriate versions of the spec
- add a [Vocabulary] attribute for the vocabulary that defines the keyword
- add a [DependsOnAnnotationsFrom] attribute to identify keywords that needed to be processed first
- add a [JsonConverter] attribute for JSON deserialization

The implementation of the interface was particularly difficult to think through. It required that you build a constraint object that held a delegate that captured the keyword data (generally creating a closure). Some of these keyword implementations were extremely difficult to get working just right. Once I had it, I didn’t want to touch it. Just looking at it wrong might cause it to fail.

The new model is much simpler:

- implement IKeywordHandler
- maybe make it a singleton
- [DependsOnAnnotationsFrom] is still around just in case

The new interface is literally just three methods:

- validate the keyword value, throwing an exception if invalid
- build any subschemas (for most keywords this is a no-op)
- evaluate an instance and return a KeywordEvaluation result

With the new model, you have access to the local schema’s raw data in the form of a JsonElement, so a lot of the keyword dependencies can be sorted out before evaluation time and without actually having that dependency. For example, in the old system, additionalProperties needed the annotation results from properties. Now, it just looks at the properties value in the raw data and grabs the property list directly.

Instead of registering the keyword with the SchemaKeywordRegistry, you just create a dialect that includes your keyword instance and use that on the build options.
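To give a feel for the shape of this, here’s a rough sketch of a custom keyword handler. IKeywordHandler and KeywordEvaluation are the names described above, but the method names, signatures, and result helpers are assumptions for illustration only, not the library’s actual surface.

```csharp
using System.Text.Json;

// Hypothetical keyword: "isEven": true requires even integer instances.
public class IsEvenKeywordHandler : IKeywordHandler
{
    // a singleton, as suggested above
    public static readonly IsEvenKeywordHandler Instance = new();
    private IsEvenKeywordHandler() { }

    // 1. Validate the keyword's own value; throw if it's invalid.
    public void ValidateKeywordValue(JsonElement value)   // signature assumed
    {
        if (value.ValueKind is not (JsonValueKind.True or JsonValueKind.False))
            throw new ArgumentException("'isEven' must be a boolean.");
    }

    // 2. Build subschemas; this keyword has none, so it's a no-op.
    public void BuildSubschemas(JsonElement value) { }    // signature assumed

    // 3. Evaluate an instance and return a result.
    public KeywordEvaluation Evaluate(JsonElement keywordValue, JsonElement instance)  // signature assumed
    {
        if (keywordValue.ValueKind == JsonValueKind.False ||
            instance.ValueKind != JsonValueKind.Number)
            return KeywordEvaluation.Valid;               // result helpers assumed

        var isEven = instance.TryGetInt64(out var n) && n % 2 == 0;
        return isEven
            ? KeywordEvaluation.Valid
            : KeywordEvaluation.Fail("value must be even");
    }
}
```

A handler like this would then be included in a custom dialect and passed in via the build options, rather than registered globally.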
## JSON math

Admittedly, this was coded purely by AI, but I made sure to test it thoroughly. The JsonMath static class performs numeric comparisons and divisibility tests on numbers while they’re still encoded in the JsonElement. No parsing into a numeric type means that we now fully support arbitrary size and precision. Of course, you’ll need to deserialize the value into whatever model you need for your application, but the benefit is that you get to decide which numeric type is right.

## No more serialization

The primary way to get a schema used to be to deserialize it. This meant jumping through a lot of hoops to get keywords to deserialize properly. And then trying to make that whole system AOT-compatible was an absolute pain (that I was very glad to have help on). Instead, everything is now built directly from a JsonElement, and each part saves its source element, so returning back to JSON is basically already done.

## Performance

I had spent a lot of time with the previous iteration, doing a lot of gross hacking to utilize array pools and stack-allocated arrays to squeeze out every microsecond of performance. It was pretty quick, but there wasn’t anything more I could do with it in its current state.

When I was done with the rebuild, I ran a benchmark that built a moderate schema and ran it through the build and evaluation processes. For one test, I had it build and evaluate on each iteration, and for the other, I had it build once and evaluate repeatedly. I implemented the benchmark for both versions.

Here are the results for v7:

| Method      | Runtime   | n  | Mean      | Gen0     | Gen1   | Allocated  |
|-------------|-----------|----|-----------|----------|--------|------------|
| BuildAlways | .NET 8.0  | 5  | 64.33 us  | 14.4043  | 0.4883 | 119.15 KB  |
| BuildAlways | .NET 9.0  | 5  | 58.50 us  | 13.9160  | 0.4883 | 114.69 KB  |
| BuildAlways | .NET 10.0 | 5  | 61.78 us  | 13.6719  | 0.4883 | 112.04 KB  |
| BuildAlways | .NET 8.0  | 10 | 130.92 us | 28.3203  | 0.9766 | 238.29 KB  |
| BuildAlways | .NET 9.0  | 10 | 117.20 us | 27.3438  | 0.9766 | 229.39 KB  |
| BuildAlways | .NET 10.0 | 10 | 109.07 us | 27.3438  | 0.4883 | 225.64 KB  |
| BuildAlways | .NET 8.0  | 50 | 668.65 us | 144.5313 | 3.9063 | 1191.47 KB |
| BuildAlways | .NET 9.0  | 50 | 596.34 us | 139.6484 | 4.8828 | 1146.93 KB |
| BuildAlways | .NET 10.0 | 50 | 548.28 us | 136.7188 | 3.9063 | 1120.37 KB |
| BuildOnce   | .NET 8.0  | 5  | 33.29 us  | 8.4229   | 0.2441 | 69.39 KB   |
| BuildOnce   | .NET 9.0  | 5  | 30.16 us  | 8.1787   | 0.2441 | 67.1 KB    |
| BuildOnce   | .NET 10.0 | 5  | 26.86 us  | 7.8125   | 0.2441 | 64.78 KB   |
| BuildOnce   | .NET 8.0  | 10 | 58.78 us  | 15.3809  | 0.4883 | 126.35 KB  |
| BuildOnce   | .NET 9.0  | 10 | 53.01 us  | 14.8926  | 0.4883 | 122.29 KB  |
| BuildOnce   | .NET 10.0 | 10 | 47.51 us  | 14.4043  | 0.4883 | 117.95 KB  |
| BuildOnce   | .NET 8.0  | 50 | 249.25 us | 70.8008  | 1.9531 | 582 KB     |
| BuildOnce   | .NET 9.0  | 50 | 226.23 us | 68.8477  | 2.1973 | 563.88 KB  |
| BuildOnce   | .NET 10.0 | 50 | 211.27 us | 66.4063  | 1.9531 | 543.29 KB  |

and for v8:

| Method      | Runtime   | n  | Mean      | Gen0     | Gen1    | Allocated |
|-------------|-----------|----|-----------|----------|---------|-----------|
| BuildAlways | .NET 8.0  | 5  | 80.87 us  | 11.9629  | 3.9063  | 98.4 KB   |
| BuildAlways | .NET 9.0  | 5  | 74.84 us  | 10.8643  | 3.1738  | 89.06 KB  |
| BuildAlways | .NET 10.0 | 5  | 99.24 us  | 10.7422  | 3.1738  | 89.22 KB  |
| BuildAlways | .NET 8.0  | 10 | 161.28 us | 23.9258  | 7.3242  | 196.8 KB  |
| BuildAlways | .NET 9.0  | 10 | 152.47 us | 21.4844  | 6.3477  | 178.13 KB |
| BuildAlways | .NET 10.0 | 10 | 142.67 us | 21.4844  | 6.3477  | 178.44 KB |
| BuildAlways | .NET 8.0  | 50 | 818.38 us | 120.1172 | 40.0391 | 983.98 KB |
| BuildAlways | .NET 9.0  | 50 | 756.01 us | 108.3984 | 31.2500 | 890.63 KB |
| BuildAlways | .NET 10.0 | 50 | 704.13 us | 108.3984 | 31.2500 | 892.19 KB |
| BuildOnce   | .NET 8.0  | 5  | 31.72 us  | 6.9580   | 1.7090  | 57.27 KB  |
| BuildOnce   | .NET 9.0  | 5  | 28.76 us  | 6.4697   | 1.5869  | 53.59 KB  |
| BuildOnce   | .NET 10.0 | 5  | 25.41 us  | 6.4697   | 1.5869  | 53.63 KB  |
| BuildOnce   | .NET 8.0  | 10 | 53.64 us  | 12.6953  | 2.1973  | 104.27 KB |
| BuildOnce   | .NET 9.0  | 10 | 47.20 us  | 11.9629  | 2.9297  | 98.32 KB  |
| BuildOnce   | .NET 10.0 | 10 | 41.21 us  | 11.9629  | 2.9297  | 98.35 KB  |
| BuildOnce   | .NET 8.0  | 50 | 212.17 us | 58.5938  | 2.9297  | 480.2 KB  |
| BuildOnce   | .NET 9.0  | 50 | 190.14 us | 55.6641  | 2.9297  | 456.13 KB |
| BuildOnce   | .NET 10.0 | 50 | 158.25 us | 55.6641  | 2.9297  | 456.16 KB |

The times are roughly the same across everything, and I don’t think you’ll really notice a difference, except for a couple things I’d like to highlight.

### Build once, evaluate a lot

This trend is true for both versions, so I’ll stick with the v8 numbers.

| Method      | Runtime   | n  | Mean      | Gen0     | Gen1    | Allocated |
|-------------|-----------|----|-----------|----------|---------|-----------|
| BuildAlways | .NET 10.0 | 50 | 704.13 us | 108.3984 | 31.2500 | 892.19 KB |
| BuildOnce   | .NET 10.0 | 50 | 158.25 us | 55.6641  | 2.9297  | 456.16 KB |

It should be obvious, but not having to build a schema every time is definitely the way to go.

### Better performance for v8 over volume

| Version | Method    | Runtime   | n  | Mean      | Gen0    | Gen1   | Allocated |
|---------|-----------|-----------|----|-----------|---------|--------|-----------|
| v7      | BuildOnce | .NET 10.0 | 5  | 26.86 us  | 7.8125  | 0.2441 | 64.78 KB  |
| v7      | BuildOnce | .NET 10.0 | 10 | 47.51 us  | 14.4043 | 0.4883 | 117.95 KB |
| v7      | BuildOnce | .NET 10.0 | 50 | 211.27 us | 66.4063 | 1.9531 | 543.29 KB |
| v8      | BuildOnce | .NET 10.0 | 5  | 25.41 us  | 6.4697  | 1.5869 | 53.63 KB  |
| v8      | BuildOnce | .NET 10.0 | 10 | 41.21 us  | 11.9629 | 2.9297 | 98.35 KB  |
| v8      | BuildOnce | .NET 10.0 | 50 | 158.25 us | 55.6641 | 2.9297 | 456.16 KB |

The performance gained in the long term over increasing evaluations is greater for v8. It scales better.
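Both highlights come down to the same usage pattern: hoist the build out of the hot path and reuse the built schema. A minimal sketch of that pattern (FromFile is one of the factory methods above; the Evaluate call, result shape, and the incomingDocuments collection are assumptions for illustration):

```csharp
using System.Text.Json;

// Build once, up front...
var schema = JsonSchema.FromFile("person.schema.json");

// ...then evaluate as many instances as needed against the same built schema.
foreach (var json in incomingDocuments)              // hypothetical IEnumerable<string>
{
    using var doc = JsonDocument.Parse(json);
    var results = schema.Evaluate(doc.RootElement);  // method name assumed
    if (!results.IsValid)
        Console.WriteLine("instance failed validation");
}
```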
## It’s a better life

The new implementation is so much easier to work with. It’s easier to implement custom keywords and create custom dialects. That it actually performs better is just icing on the cake.

Building this new version has been a great learning experience, and honestly I couldn’t be happier with it. The knowledge and understanding I gained from taking the time to investigate the static analysis approach has made me a better developer overall. I encourage everyone to occasionally take a moment, step back, and really consider what you’re building. You never know what you’ll uncover.

If you like the work I put out, and would like to help ensure that I keep it up, please consider becoming a sponsor!

# Rebuilding JsonSchema.Net: The Journey

JsonSchema.Net has just undergone a major overhaul. It’s now faster, it uses less memory, it’s easier to extend, and it’s just more pleasant to work with. This post discusses the motivations behind the change and the architectural concepts applied. In the next post, I’ll get into the technical bits.

## Two years ago

At the time, I was still working on the JSON Schema specification full-time, and I had the first inklings of an idea for an implementation. For the next few months, I couldn’t shake the idea, but I also couldn’t pin it down. Finally, after about a year of mental nagging, the idea was still elusive, but I had to figure it out.

Coding AI tools had started becoming kinda good, and I decided to spend some time just chatting in the abstract to work out the idea. After a few days, and many, many threads of conversation, I landed on the idea of building a cyclical graph that was representative of the schema. This graph would allow me to perform some degree of static analysis, which meant that I could complete certain tasks, like reference resolution, at build time instead of at evaluation time.

## Addressing a memory sink

Once the idea had shape, it was time to start looking at what an implementation could look like. But first, I needed to assess what was causing the high memory usage in the current implementation. After some testing, I discovered that it was largely string allocations from JSON Pointer management.

So the first step was to rebuild JsonPointer.Net using spans as the under-the-hood pointer representation. Instead of using what boiled down to an array of strings for the pointer data representation, the new implementation uses a single ReadOnlyMemory&lt;char&gt;. I also updated it to a struct, so if a new pointer is created and used within the scope of a method, there could be no allocation at all. The parsing logic makes use of the array pool, and extracting subpointers just adjusts the span. I wanted to make it a ref struct, but that wouldn’t have suited since I needed to be able to store it in a class, and ref structs can only live on the stack.

The “downside” to this new implementation is that work like evaluating a pointer or identifying individual segments happens on the fly. But that’s so much quicker than allocating memory that it’s still a huge net gain in processing.
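To illustrate the idea (this is a toy, not the actual JsonPointer.Net API), here’s a minimal sketch of a struct pointer that stores a single ReadOnlyMemory&lt;char&gt; and produces sub-pointers by slicing rather than allocating new strings:

```csharp
using System;

// Illustrative only: a toy pointer over a single memory block.
// It skips details like ~0/~1 escape handling.
public readonly struct TinyPointer
{
    private readonly ReadOnlyMemory<char> _source;

    public TinyPointer(string pointer) : this(pointer.AsMemory()) { }
    private TinyPointer(ReadOnlyMemory<char> source) => _source = source;

    // Segments are identified on the fly when they're needed.
    public ReadOnlySpan<char> FirstSegment()
    {
        var span = _source.Span;
        if (span.Length == 0) return ReadOnlySpan<char>.Empty;
        var next = span.Slice(1).IndexOf('/');
        return next == -1 ? span.Slice(1) : span.Slice(1, next);
    }

    // Dropping the first segment is just a slice; no new string is allocated.
    public TinyPointer SkipFirstSegment()
    {
        var span = _source.Span;
        var next = span.Length == 0 ? -1 : span.Slice(1).IndexOf('/');
        return next == -1
            ? new TinyPointer(ReadOnlyMemory<char>.Empty)
            : new TinyPointer(_source.Slice(next + 1));
    }

    public override string ToString() => _source.ToString();
}
```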
## Experimentation on schemas

With the new JSON Pointer implementation in place, I extracted it to a new project and started working with AI to build a new JSON Schema implementation that followed the research that I had compiled. Over a few months, I’d go through this exercise several times. With each iteration, a pattern emerged: the implementation showed lots of promise, being super-fast, then as complexity increased, that advantage disappeared. Ultimately most of them either grew too slow or were just architectures or APIs I didn’t care for. But I saw a lot of different ways to solve the same problem, all of them even following the same approach: build a cyclical graph, pre-resolving anything that doesn’t need an instance to evaluate, then perform repeated evaluations.

There is still one of them on a branch if you want to see it. This is the final state that I managed to get with AI writing the code. The final commit on this was almost exactly a year ago. In the intervening time between then and now, the desire to actually make the update stayed with me, but I was quite busy with my new job, and I just didn’t have the time or energy to work on the library. But it still burned in my mind.

Somewhere around this time, the computer I used for development decided to quit, and I could only use my gaming PC. It was demoralizing to lose the computer. I really liked it. And it was distracting to use the computer that also had my games, all of which only made working on this project slower.

## Buckling down

Fast-forwarding to about six weeks ago, I had the mental space to work on the library again. I excluded all of the keywords and other supporting files from the project so that I could add them back, one by one. The approach I wanted to use was simple.

### Use JsonElement instead of JsonNode

In the first versions of JsonSchema.Net, I had used JsonElement because it was the only JSON DOM available. When JsonNode was added, I decided to use that because it closely resembled the DOM I used in Manatee.Json, my previous library. I later realized that JsonNode carried with it a lot of memory allocations, whereas JsonElement used spans to refer back to the original JSON string, eliminating allocations. The first and probably simplest improvement is moving to JsonElement.

### Keyword handlers instead of instances

One of the things that the AI would repeatedly implement through the experimentation was static keyword handlers: stateless logic machines that would be called on the raw data. That means that the keywords don’t need to be deserialized. But it also means that validation of the keyword data needed to be handled differently. The new keywords need three functions: validate keyword data, build subschemas, and perform instance evaluation. I also didn’t want static handlers because I wanted instances of them in collections. Singletons provide that function nicely.

### Separate build from evaluation

This was really the crux of the idea I had so long ago: a single build that performed most of the computation ahead of time in order to make the evaluation quick and easy. The current code actually does this to a degree. It saves as much info as possible between evaluations, but if you tried to evaluate it with a different set of options, say with a different specification version or a different schema registry, then it couldn’t assume that the current build was valid, and it had to rebuild from the start again. The solution here was two-fold: save the build options (or at least what was needed for evaluation later) with the schema, and make the schema immutable.

### Allow multiple build configurations

I had multiple GitHub issues open that centered around the fact that adding a keyword made it available for every schema. I also thought the vocabulary implementation was clunky. What was needed was a way to have a keyword supported for one schema build and unsupported for another. That just isn’t possible with the current approach. So we need registries for everything, a configuration that specifies everything needed to build a schema, and nothing static.

## Settling in

With these ideas in mind, I loaded up the solution and got to work. While I used AI to help with working out the ideas and with some experimental implementations, the final output was coded by hand. It feels a bit odd to say that, though, as if I’m advertising that this code is hand-crafted… with love… for you.

The first thing I did was to remove stuff I didn’t need. Instead of deleting files, I excluded them from the project file. This allowed me to slowly add stuff back as I was ready to work on it, and it also gave me an insurance policy against forgetting anything.
I got the IKeywordHandler interface in place, re-added the type keyword as a sample, and implemented the new interface. This was enough for me to tear down the evaluation logic in JsonSchema and see what else I needed.

After a few hours of work, my computer promptly decided to quit. So now I didn’t have any computer to work on. I hadn’t even committed my work. I was bummed. Fortunately, the hard drive seemed fine, and I was able to do some hackery and get the files onto a network backup, but I wouldn’t get a computer I could work on for another three weeks.

When I finally did get back in action, I was full throttle. Three weeks of late nights and full weekends later, the implementation is done, including all of the extension libraries (except data generation), and passing all of the tests. Over the past couple days, I’ve completed the updates to the docs.

## Finishing up

It’s been a long journey, and I’m so happy with the library now. Before this update, I was discouraged about making changes and fixing bugs. I just didn’t enjoy working in the code. Now the code is simple to understand and edit, and it’s not because I just finished writing it; it’s legitimately simpler. And to boot, it’s faster!

If you like the work I put out, and would like to help ensure that I keep it up, please consider becoming a sponsor!

# Built-in ASP.Net Validation for API Requests

I’ve been playing around with the validating JSON converter a lot at work lately, and sharing this cool feature with my coworkers yielded some interesting feedback that helped expand its capabilities. In this post, we’ll look at those improvements, how to incorporate that validation directly into the ASP.Net pipeline, and how I discovered a way I could be comfortable with runtime schema generation.

## The validating JSON converter

A more extensive explanation can be found in the docs linked above, but I’d still like to give a quick overview. System.Text.Json comes with an ultra-efficient serializer that converts between JSON text and .Net models with ease. Its primary shortcoming, however, is validation. Because .Net is so strongly typed, the serializer can handle type validation pretty well, but anything more than that is left to a secondary system that runs after the model is created.

The ValidatingJsonConverter in JsonSchema.Net is the answer to that problem. This converter uses a JSON Schema that is attached to a type through a [JsonSchema] attribute to hook into the serializer itself and perform validation on the incoming JSON text prior to creating a model. In this way, validation becomes declarative.
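As a quick sketch of what this looks like outside ASP.Net: the converter is added to the JsonSerializerOptions, and a validation failure surfaces as a JsonException carrying the EvaluationResults in its Data dictionary (the model binder below relies on exactly that). The Person model with its [JsonSchema] attribute appears later in this post; the Json.Schema.Serialization namespace is an assumption.

```csharp
using System.Text.Json;
using Json.Schema;
using Json.Schema.Serialization;  // namespace for ValidatingJsonConverter (assumed)

var options = new JsonSerializerOptions
{
    Converters =
    {
        new ValidatingJsonConverter { Options = { OutputFormat = OutputFormat.Hierarchical } }
    }
};

try
{
    // Person is decorated with [JsonSchema], as shown later in the post.
    var person = JsonSerializer.Deserialize<Person>(
        """{ "name": "Ross", "age": "not a number" }""", options);
}
catch (JsonException ex) when (ex.Data["validation"] is EvaluationResults results)
{
    // The schema evaluation results ride along on the exception.
    Console.WriteLine(results.IsValid);  // false
}
```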
## Integration with ASP.Net

Taking a quick look at a typical controller and route handler method:

```csharp
[Route("{Controller}")]
public class PersonController
{
    [HttpPost]
    public IActionResult CreatePerson([FromBody] PersonModel person)
    {
        // ...
    }
}
```

you can see that by the time the method is invoked, any serialization has taken place. I think this may have driven the “deserialize, then validate” design that exists out of the box. To hook into the serialization, we have to backtrack into the pipeline. This requires three additional objects: a model binder (and its provider) and a filter.

## The model binder

The model binder handles the part we’re talking about here: hooking into the serialization itself.

```csharp
public class ValidatingJsonModelBinder : IModelBinder
{
    public async Task BindModelAsync(ModelBindingContext bindingContext)
    {
        if (bindingContext == null) throw new ArgumentNullException(nameof(bindingContext));

        // For body binding, we need to read the request body
        if (bindingContext.BindingSource == BindingSource.Body)
        {
            bindingContext.HttpContext.Request.EnableBuffering();
            using var reader = new StreamReader(
                bindingContext.HttpContext.Request.Body,
                leaveOpen: true);
            var body = await reader.ReadToEndAsync();
            bindingContext.HttpContext.Request.Body.Position = 0;

            if (string.IsNullOrEmpty(body)) return;

            try
            {
                var options = bindingContext.HttpContext.RequestServices
                    .GetRequiredService<IOptions<JsonOptions>>()
                    .Value.JsonSerializerOptions;
                var model = JsonSerializer
                    .Deserialize(body, bindingContext.ModelType, options);
                bindingContext.Result = ModelBindingResult.Success(model);
            }
            catch (JsonException jsonException)
            {
                if (jsonException.Data.Contains("validation") &&
                    jsonException.Data["validation"] is EvaluationResults results)
                {
                    var errors = ExtractValidationErrors(results);
                    if (errors.Any())
                    {
                        foreach (var error in errors)
                        {
                            bindingContext.ModelState
                                .AddModelError(error.Path, error.Message);
                        }

                        bindingContext.Result = ModelBindingResult.Failed();
                        return;
                    }
                }

                bindingContext.ModelState
                    .AddModelError(bindingContext.FieldName, jsonException, bindingContext.ModelMetadata);
                bindingContext.Result = ModelBindingResult.Failed();
            }

            return;
        }

        // For other binding sources, use the value provider
        var valueProviderResult = bindingContext.ValueProvider
            .GetValue(bindingContext.ModelName);
        if (valueProviderResult == ValueProviderResult.None) return;

        bindingContext.ModelState
            .SetModelValue(bindingContext.ModelName, valueProviderResult);

        try
        {
            var value = valueProviderResult.FirstValue;
            if (string.IsNullOrEmpty(value)) return;

            var options = bindingContext.HttpContext.RequestServices
                .GetRequiredService<IOptions<JsonOptions>>().Value.JsonSerializerOptions;
            var model = JsonSerializer
                .Deserialize(value, bindingContext.ModelType, options);
            bindingContext.Result = ModelBindingResult.Success(model);
        }
        catch (JsonException jsonException)
        {
            bindingContext.ModelState
                .AddModelError(bindingContext.ModelName, jsonException, bindingContext.ModelMetadata);
            bindingContext.Result = ModelBindingResult.Failed();
        }
    }

    static List<(string Path, string Message)> ExtractValidationErrors(
        EvaluationResults validationResults)
    {
        var errors = new List<(string Path, string Message)>();
        ExtractValidationErrorsRecursive(validationResults, errors);
        return errors;
    }

    static void ExtractValidationErrorsRecursive(
        EvaluationResults results, List<(string Path, string Message)> errors)
    {
        if (results.IsValid) return;

        if (results.Errors != null)
        {
            foreach (var error in results.Errors)
            {
                errors.Add((results.InstanceLocation.ToString(), error.Value));
            }
        }

        foreach (var detail in results.Details)
        {
            ExtractValidationErrorsRecursive(detail, errors);
        }
    }
}
```

And then we need a model binder provider. This class actually gets registered with the DI container.

```csharp
public class ValidatingJsonModelBinderProvider : IModelBinderProvider
{
    public IModelBinder? GetBinder(ModelBinderProviderContext context)
    {
        if (context == null) throw new ArgumentNullException(nameof(context));

        // Only use this binder for types that have the [JsonSchema] attribute
        if (context.Metadata.ModelType.GetCustomAttributes(
                typeof(JsonSchemaAttribute), true).Any())
            return new ValidatingJsonModelBinder();

        return null;
    }
}
```
## The filter

The filter handles binding failures and builds the Problem Details response. We need to implement two interfaces. IActionFilter handles partial binding success, like when there are multiple parameters in the method, and some of the parameters bind successfully. IAlwaysRunResultFilter handles total binding failure. I haven’t really dug into the critical differences, but I discovered that we need both. For what it’s worth, Google’s AI had this to say:

> The primary distinction lies in their scope and guarantee of execution. IActionFilter targets the action method execution itself, while IAlwaysRunResultFilter focuses on the action result execution and guarantees its execution even if other filters short-circuit the pipeline.

```csharp
public class JsonSchemaValidationFilter : IActionFilter, IAlwaysRunResultFilter
{
    public void OnActionExecuting(ActionExecutingContext context)
    {
        // this method seems to handle partial binding success
        var check = HandleJsonSchemaErrors(context);
        if (check is not null) context.Result = check;
    }

    public void OnResultExecuting(ResultExecutingContext context)
    {
        // this method seems to handle total binding failure
        var check = HandleJsonSchemaErrors(context);
        if (check is not null) context.Result = check;
    }

    static IActionResult? HandleJsonSchemaErrors(FilterContext context)
    {
        if (context.ModelState.IsValid) return null;

        var errors = context.ModelState
            .Where(x => x.Value?.Errors.Any() == true)
            .SelectMany(x => x.Value!.Errors.Select(e => new
            {
                Path = x.Key,
                Message = e.ErrorMessage,
            }))
            .Where(e => string.IsNullOrEmpty(e.Path) || e.Path.StartsWith('/'))
            .GroupBy(x => x.Path)
            .ToDictionary(x => x.Key, x => x.Select(e => e.Message).ToList());

        // If we don't have JSON Pointer errors, JSON Schema didn't handle this.
        // Don't change anything.
        if (errors.Count == 0) return null;

        var problemDetails = new ProblemDetails
        {
            Type = "https://zeil.com/errors/validation",
            Title = "Validation Error",
            Status = 400,
            Detail = "One or more validation errors occurred.",
            Extensions =
            {
                ["errors"] = errors
            }
        };

        return new BadRequestObjectResult(problemDetails);
    }

    public void OnActionExecuted(ActionExecutedContext context) { }   // no-op

    public void OnResultExecuted(ResultExecutedContext context) { }   // no-op
}
```

## Integration with ASP.Net

To hook everything up, we need to edit the application startup:

```csharp
var builder = WebApplication.CreateBuilder(args);

var mvcBuilder = builder.Services.AddControllersWithViews(o => // or just .AddControllers()
{
    // add the filter
    o.Filters.Add<JsonSchemaValidationFilter>();

    // add the binder at the start
    o.ModelBinderProviders.Insert(0, new ValidatingJsonModelBinderProvider());
}).AddJsonOptions(o =>
{
    o.JsonSerializerOptions.Converters.Add(new ValidatingJsonConverter
    {
        Options =
        {
            OutputFormat = OutputFormat.Hierarchical,
            RequireFormatValidation = true,
        }
    });
});
```

## Feedback from coworkers

The feedback I received after adding this validation to several of my API models and integrating it into the ASP.Net pipeline was varied but mostly good.
They loved the idea of adding this kind of validation. They really loved that it automatically produced a 400 Bad Request response, in Problem Details format and complete with schema-generated errors, when validation failed. What they didn’t like was the cruft of explicitly writing out the schema for every type. A small model can be validated fairly easily:

```csharp
[JsonSchema(typeof(Person), nameof(Schema))]
class Person
{
    public static JsonSchema Schema = new JsonSchemaBuilder()
        .Type(JsonSchemaType.Object)
        .Properties(
            ("name", new JsonSchemaBuilder()),
            ("age", new JsonSchemaBuilder().Type(JsonSchemaType.Integer))
        );

    public string Name { get; set; }
    public int Age { get; set; }
}
```

However, it’s easy to see how this can become quite complex and cumbersome as the model grows. The answer was to let the system generate the schemas.

## Accepting schema generation

Generating the schemas from the model types would reduce the cruft and lower the bar for other developers to begin validating requests with schemas. (I guess others just don’t derive the joy I do from writing schemas.)

I began by creating a new attribute: [GenerateJsonSchema]. Then I had to copy the ValidatingJsonConverter (because inheriting from it wasn’t an option the way I had written it) to a new version that handles my new attribute as well as the original [JsonSchema] attribute. Generated schemas are usually good enough for most types, but sometimes you need validation that the generation doesn’t support. For those, we still need to support the explicit approach.

Then I just needed to update a few things we’ve previously created. The binder provider needs to react to the new attribute.

```csharp
public class ValidatingJsonModelBinderProvider : IModelBinderProvider
{
    public IModelBinder? GetBinder(ModelBinderProviderContext context)
    {
        if (context == null) throw new ArgumentNullException(nameof(context));

        if (context.Metadata.ModelType.GetCustomAttributes(
                typeof(JsonSchemaAttribute), true).Any() ||
            context.Metadata.ModelType.GetCustomAttributes(
                typeof(GenerateJsonSchemaAttribute), true).Any())
            return new ValidatingJsonModelBinder();

        return null;
    }
}
```

We also need to register the new converter in the JsonSerializerOptions:

```csharp
o.JsonSerializerOptions.Converters.Add(new GenerativeValidatingJsonConverter
{
    Options =
    {
        OutputFormat = OutputFormat.Hierarchical,
        RequireFormatValidation = true,
    }
});
```

## Ensuring quality

I’ve generally not trusted schema generation (even generation that I wrote) in production systems. The only way that I can accept it is if the generated schemas are checked by a human developer at dev-time. I landed on approval tests as a way to enforce that the schema for any given type is checked.

An approval test runs within the unit test framework and outputs a file that is committed to the repository. Later, when a developer makes a change, the unit test runs again. If the newly generated approval text differs from what is saved, the approval framework can open a diff editor (e.g. VS Code) to alert the user and allow them to accept the changes by merging them into the committed file. Those changes can then be reviewed in a PR.

Specifically, I had two tests: one that generates schema approvals for each of the types decorated with the [GenerateJsonSchema] attribute (for human verification of the schema itself), and one that finds any request models that don’t have schema validation (to prevent new schema-less requests from being created).
```csharp
public class JsonSchemaGenerationTests
{
    static readonly JsonSerializerOptions JsonOptions = new()
    {
        WriteIndented = true,
        Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping,
    };

    static readonly SchemaGeneratorConfiguration GenerationConfig = new()
    {
        PropertyNameResolver = PropertyNameResolvers.CamelCase,
    };

    public class TypeWrapper(Type type)
    {
        public Type Type { get; } = type;

        public override string ToString() => Type.Name;
    }

    public static IEnumerable<object[]> TypesThatGenerateJsonSchemas
    {
        get
        {
            var webAssembly = typeof(Program).Assembly;
            var typesWithAttribute = webAssembly.GetTypes()
                .Where(type => type.GetCustomAttribute<GenerateJsonSchemaAttribute>() != null)
                .OrderBy(type => type.FullName)
                .ToList();

            return typesWithAttribute.Select(type => new object[] { new TypeWrapper(type) });
        }
    }

    [Theory]
    [MemberData(nameof(TypesThatGenerateJsonSchemas))]
    public void JsonSchemaGeneration(TypeWrapper type)
    {
        JsonSchema schema = new JsonSchemaBuilder().FromType(type.Type, GenerationConfig);
        var schemaJson = JsonSerializer.Serialize(schema, JsonOptions);

        this.Assent(schemaJson, new Configuration()
            .UsingExtension("json")
            .UsingApprovalFileNameSuffix($"_{type}")
        );
    }

    [Fact]
    public void ModelsWithoutJsonSchemaValidation()
    {
        var webAssembly = typeof(Program).Assembly;
        var controllerTypes = webAssembly.GetTypes()
            .Where(t => t is { IsClass: true, IsAbstract: false } &&
                        t.IsAssignableTo(typeof(Controller)))
            .ToList();

        var missingJsonSchemaModels = new List<(Type ControllerType, MethodInfo Method, Type ParameterType)>();

        foreach (var controllerType in controllerTypes)
        {
            var methods = controllerType
                .GetMethods(BindingFlags.Public | BindingFlags.Instance)
                .Where(m => m.GetCustomAttributes<HttpMethodAttribute>().Any() ||
                            m.GetCustomAttributes<HttpPostAttribute>().Any() ||
                            m.GetCustomAttributes<HttpPutAttribute>().Any() ||
                            m.GetCustomAttributes<HttpPatchAttribute>().Any())
                .ToList();

            foreach (var method in methods)
            {
                var parameters = method.GetParameters()
                    .Where(p => p.ParameterType.Namespace?.StartsWith("Zeil") == true)
                    .ToList();

                foreach (var parameter in parameters)
                {
                    // don't check parameters that we know aren't coming in as JSON
                    var isExplicitlyNotJson =
                        parameter.GetCustomAttribute<FromRouteAttribute>() != null ||
                        parameter.GetCustomAttribute<FromQueryAttribute>() != null ||
                        parameter.GetCustomAttribute<FromFormAttribute>() != null ||
                        parameter.GetCustomAttribute<FromHeaderAttribute>() != null;
                    if (isExplicitlyNotJson) continue;

                    var parameterType = parameter.ParameterType;
                    var supportsSchemaValidation =
                        parameterType.GetCustomAttribute<GenerateJsonSchemaAttribute>() != null ||
                        parameterType.GetCustomAttribute<JsonSchemaAttribute>() != null;
                    if (supportsSchemaValidation) continue;

                    missingJsonSchemaModels.Add((controllerType, method, parameterType));
                }
            }
        }

        var groupedByType = missingJsonSchemaModels
            .GroupBy(m => m.ParameterType.FullName ?? m.ParameterType.Name)
            .OrderBy(g => g.Key)
            .ToList();

        var reportLines = new List<string>();
        foreach (var group in groupedByType)
        {
            reportLines.Add(group.Key);
            var controllerMethods = group
                .Select(m => $"    {m.ControllerType.FullName ?? m.ControllerType.Name}.{m.Method.Name}")
                .Distinct()
                .OrderBy(method => method)
                .ToList();
            reportLines.AddRange(controllerMethods);
        }

        var reportText = string.Join(Environment.NewLine, reportLines);
        this.Assent(reportText, new Configuration().UsingExtension("txt"));
    }
}
```

These tests allow me to sleep at night, knowing that any generated code has been checked.

## For the masses

I have pulled the new [GenerateJsonSchema] attribute and the new GenerativeValidatingJsonConverter into JsonSchema.Net.Generation for you. You’ll need to copy and adapt the rest into your solution, though.

Better APIs for everyone!

If you like the work I put out, and would like to help ensure that I keep it up, please consider becoming a sponsor!

# Revamping the JsonSchema.Net Build Chain

Last week I discovered that my pack and publish builds for JsonSchema.Net and its language packs were failing. Turns out nuget.exe isn’t supported on Ubuntu Linux anymore. In this post I’m going to describe the solution I found.

## The build that was

Rewind two and a half years. I’ve added the ErrorMessages class to JsonSchema.Net and I want to be able to support multiple languages on-demand, the way Humanizr does: a base package that supports English, and satellite language packs. (They also publish a meta-package that pulls in all of the languages, but I didn’t want to do that.)

So the first thing to do was check out how they were managing their build process. After some investigation, it seemed they were using nuget pack along with a series of custom .nuspec files. The big change for me was that they weren’t using the built-in “pack on build” feature of dotnet, which is what I was using.

So I worked it up. The final solution had three parts:

1. Build the library
2. Pack and push JsonSchema.Net
3. Pack and push the language packs

The first two steps were pretty straightforward. The language packs step utilized a GitHub Actions matrix that I built by scanning the file system for .nuspec files during the build step. And to run the pack and push, I used nuget.exe, which was provided by the nuget/setup-nuget action.

Everything was great. Until it wasn’t.

## Sadness ensues

As I mentioned, last week I discovered that the workflow was failing, so I went to investigate. Turns out the failing action was nuget/setup-nuget: simply installing the Nuget CLI. After some investigation, I found that the Nuget CLI requires Mono, and Mono is now out of support. I never had to install Mono, so either it was pre-installed on the Ubuntu image or the Nuget setup action installed it as a pre-requisite. Probably the latter. And now, since Mono is no longer supported, they don’t do that anymore.

Whatever the reason, the action wasn’t working, so I couldn’t use the Nuget CLI. That meant I needed to figure out how to use the dotnet CLI to build a custom Nuget package. But there’s a problem with that: dotnet pack doesn’t support .nuspec files; it only works on project files, like .csproj.

The help I needed came from Glenn Watson from the .Net Foundation. I happened to comment about the build not working, and he was able to point me to another project that built custom Nuget packages with dotnet pack and the project file. After about four hours of playing with it, I finally landed on something that worked well enough. It’s not perfect, but it does the job.

## Building the base package

To start, I just wanted to see if I could get the main package built. Then I’d move on to the language packs.

I learned from the other project that to build a custom Nuget package, I need to do two things:

1. Prevent the packing step from using the build output:

   ```xml
   <IncludeBuildOutput>false</IncludeBuildOutput>
   ```

2. Create an ItemGroup with a bunch of entries to indicate the files that need to go into the package:

   ```xml
   <None Include="README.md" Pack="true" PackagePath="\" />
   ```

Doing it this way does mean that you have to explicitly list every file that is supposed to be in the package. This is basically the same as using nuget.exe with a .nuspec file, so I really already had the list of files I needed, just in a different format.

This new ItemGroup had a side effect, though: I could see all of these files in my project in Visual Studio. To fix this, I put a condition on the ItemGroup that defaults to false.
```xml
<ItemGroup Condition="'$(ResourceLanguage)' == 'base'">
```

This condition means the ItemGroup only applies when the ResourceLanguage property equals base, which we’ll use to indicate the main library.

What’s the ResourceLanguage property? I made it up. Apparently you can just make up properties and then define them on various dotnet commands:

```bash
dotnet pack -p:ResourceLanguage=base
```

The property’s default value is nothing, which gives an empty string… and an empty string doesn’t equal base, so we’ve successfully hidden the package files while still having access to them during the packing process.

The new section now looks like this:

```xml
<ItemGroup Condition="'$(ResourceLanguage)' == 'base'">
  <None Include="README.md" Pack="true" PackagePath="\" />
  <None Include="..\..\LICENSE" Pack="true" PackagePath="\" />
  <None Include="..\..\Resources\json-logo-256.png" Pack="true" PackagePath="\" />

  <None Include="bin\$(Configuration)\netstandard2.0\JsonSchema.Net.dll" Pack="true" PackagePath="lib\netstandard2.0" />
  <None Include="bin\$(Configuration)\netstandard2.0\JsonSchema.Net.xml" Pack="true" PackagePath="lib\netstandard2.0" />
  <None Include="bin\$(Configuration)\netstandard2.0\JsonSchema.Net.pdb" Pack="true" PackagePath="lib\netstandard2.0" />

  <None Include="bin\$(Configuration)\net8.0\JsonSchema.Net.dll" Pack="true" PackagePath="lib\net8.0" />
  <None Include="bin\$(Configuration)\net8.0\JsonSchema.Net.xml" Pack="true" PackagePath="lib\net8.0" />
  <None Include="bin\$(Configuration)\net8.0\JsonSchema.Net.pdb" Pack="true" PackagePath="lib\net8.0" />

  <None Include="bin\$(Configuration)\net9.0\JsonSchema.Net.dll" Pack="true" PackagePath="lib\net9.0" />
  <None Include="bin\$(Configuration)\net9.0\JsonSchema.Net.xml" Pack="true" PackagePath="lib\net9.0" />
  <None Include="bin\$(Configuration)\net9.0\JsonSchema.Net.pdb" Pack="true" PackagePath="lib\net9.0" />
</ItemGroup>
```

Using the command line (because that’s what’s going to run in the GitHub workflow), I built the project and ran the pack command. Sure enough, I got a Nuget package that was properly versioned and contained all of the right files! Step 1 complete.

## Building language packs

The language pack Nuget files carry different package names, versions, and descriptions. In order to support this, we need to isolate the properties for the base package by defining a PropertyGroup for the base package that also carries the condition from before, so that those properties don’t get mixed into the language packs.

```xml
<PropertyGroup Condition="'$(ResourceLanguage)' == 'base'">
  <IncludeSymbols>true</IncludeSymbols>
  <SymbolPackageFormat>snupkg</SymbolPackageFormat>
  <PackageId>JsonSchema.Net</PackageId>
  <Description>JSON Schema built on the System.Text.Json namespace</Description>
  <Version>7.3.2</Version>
  <PackageTags>json-schema validation schema json</PackageTags>
  <EmbedUntrackedSources>true</EmbedUntrackedSources>
</PropertyGroup>
```

Now we can define an additional PropertyGroup and ItemGroup for when ResourceLanguage isn’t nothing (remember, nothing is for Visual Studio and the code build) and isn’t base (for the base package).
```xml
<PropertyGroup Condition="'$(ResourceLanguage)' != '' And '$(ResourceLanguage)' != 'base'">
  <PackageId>JsonSchema.Net.$(ResourceLanguage)</PackageId>
  <PackageTags>json-schema validation schema json error language-pack</PackageTags>
</PropertyGroup>

<ItemGroup Condition="'$(ResourceLanguage)' != '' And '$(ResourceLanguage)' != 'base'">
  <None Include="Localization\README.$(ResourceLanguage).md" Pack="true" PackagePath="\README.md" />
  <None Include="..\..\LICENSE" Pack="true" PackagePath="\" />
  <None Include="..\..\Resources\json-logo-256.png" Pack="true" PackagePath="\" />

  <None Include="bin\$(Configuration)\netstandard2.0\$(ResourceLanguage)\JsonSchema.Net.resources.dll" Pack="true" PackagePath="lib\netstandard2.0\$(ResourceLanguage)" />
  <None Include="bin\$(Configuration)\net8.0\$(ResourceLanguage)\JsonSchema.Net.resources.dll" Pack="true" PackagePath="lib\net8.0\$(ResourceLanguage)" />
  <None Include="bin\$(Configuration)\net9.0\$(ResourceLanguage)\JsonSchema.Net.resources.dll" Pack="true" PackagePath="lib\net9.0\$(ResourceLanguage)" />
</ItemGroup>
```

Notice that I’ve also incorporated the ResourceLanguage property to identify the correct paths.

And finally, I used an additional PropertyGroup for each language I support so that they can each get their own description and version:

```xml
<PropertyGroup Condition="'$(ResourceLanguage)' == 'de'">
  <Description>JsonSchema.Net Locale German (de)</Description>
  <Version>1.0.1</Version>
</PropertyGroup>
```

Now I can run a similar dotnet command for each of the languages I support:

```bash
dotnet pack -p:ResourceLanguage=de
```

## Updating the workflow

The final thing I needed to update was the GH Actions workflow. I still like the idea of using the matrix, but now I don’t have the .nuspec files I used previously to generate the list of languages. But I do know all of the languages I support, and that list doesn’t change much, so I can just list it explicitly in the workflow file and update it as needed.

Also, I found that including base as one of the options also packs the base library, so I don’t need a separate job for it, which is nice. Now I just have a single matrixed job that runs for base and all of the languages. (Link to the workflow at the end of the post.)

## That’s good enough

The only thing I wasn’t able to figure out is the dependencies for the language packs. They’re currently the dependencies of the main lib. I tried putting the condition on the ItemGroups with the project and package references, but it didn’t have any effect on the pack command. Because of this, and some feedback I got while trial-and-erroring this, I suspect it detects the dependencies from the obj/ folder rather than from the .csproj file.

You can view the final project file here and the GH Actions workflow file here. I’ve also opened an issue on Humanizr to let them know of the solution I found in case they encounter the same problem.

If you like the work I put out, and would like to help ensure that I keep it up, please consider becoming a sponsor!

# A Common Pitfall of Working with JsonNode

When anyone publishes a work of creativity, they invite both praise and criticism. But open source development has a special third category: bug reports. Sometimes, these “bugs” are really just user error. In this post I’m going to review what is arguably the most common of these cases: failing to parse string-encoded JSON data.

## The JsonNode model

All of the json-everything libraries operate on the JsonNode family of models from System.Text.Json. These models offer a remarkable feature that makes inlining JSON data very simple: implicit casts into JsonValue from compatible .Net types. So, C# bool maps to the true and false JSON literals, null maps to the null JSON literal, double and all of the other numeric types map to JSON numbers, and string maps to JSON strings. That means the compiler considers all of the following code as valid and performs the appropriate conversion in the background:

```csharp
JsonNode jsonBool = false;
// in modern C#, you need to qualify that a var can be nullable
JsonNode? jsonNull = null;
JsonNode jsonNumber = 42;
JsonNode jsonString = "my string data";
```

The cast itself results in a JsonValue, which inherits from JsonNode. JsonObject and JsonArray also derive from JsonNode. What this enables is a very intuitive approach to building complex JSON in a way that, if you squint just right, looks like the JSON syntax itself:

```csharp
// e.g. data for a person
var jsonObject = new JsonObject
{
    ["name"] = "Ross",
    ["age"] = 25,
    ["married"] = false,
    ["friends"] = new JsonArray
    {
        "Rachel",
        "Chandler",
        "Phoebe",
        "Joey",
        "Monica"
    }
};
```

However, one of these conversions creates a perfect storm for confusion.

## Falling into the trap

I’m going to use JsonE.Evaluate() for illustration, but since basically all of the json-everything libraries expose methods which take JsonNode as a parameter, this pitfall applies to them all.

Getting straight to the point, the error I see a lot of people making is passing the JSON data as a string into methods that have JsonNode parameters.

```csharp
var template = """
    {
      "$flatten": [
        [1, 2],
        [3, 4],
        [[5]]
      ]
    }
    """;

var result = JsonE.Evaluate(template);
```

These users expect that the template will be interpreted as JSON and processed accordingly, giving the JSON result of [1, 2, 3, 4, 5]. Instead, they just get the template back. Then, because it’s not working, they file a bug. (Some people create a “question” issue, but most people assume something is wrong with the lib.)

Since the compiler, which is supposed to provide guardrails against incorrect typing, reports that everything is fine, they assume the problem must be with the library. But in this case, JsonNode’s implicit cast has subverted the compiler’s type-checking in the name of providing a service (easy, JSON-like, inline data building).

## The solution

The user just needs to parse the string-encoded JSON into the JsonNode model and then pass that into the JsonE.Evaluate() method. This can be done in multiple ways, but the primary ones I would use (in order) are:

- JsonNode.Parse(jsonText)
- JsonSerializer.Deserialize&lt;JsonNode&gt;(jsonText)

Both of these will give you a JsonNode. The second is a bit indirect, but it gets the job done, and I’m pretty sure it just ends up calling the first.
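Putting that together, the corrected version of the earlier snippet just adds the parse step (JsonNode.Parse and JsonE.Evaluate are the calls named above; the Json.JsonE namespace is an assumption):

```csharp
using System.Text.Json.Nodes;
using Json.JsonE;  // namespace assumed

var template = JsonNode.Parse("""
    {
      "$flatten": [ [1, 2], [3, 4], [[5]] ]
    }
    """)!;

var result = JsonE.Evaluate(template);  // now yields [1, 2, 3, 4, 5]
Console.WriteLine(result!.ToJsonString());
```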
## What can be done?

I don’t really think anything can be done aside from educating users of System.Text.Json. It’s an API decision, and frankly one that I agree with. When I first built Manatee.Json almost ten years ago, I started with only a JSON DOM that very closely resembled JsonNode, including all of the same implicit casts. It’s a very useful API, but it does require knowledge that the cast is happening.

In the end, I assume many of the users who fall into this trap and report a “bug” (or open a question issue) are likely just new to .Net. Whatever the reason, the best approach to addressing these cases is maintaining an attitude of helpfulness, understanding, and education.

If you like the work I put out, and would like to help ensure that I keep it up, please consider becoming a sponsor!

# The End of an Era and a New Beginning

I’ve recently had some life changes, and it’s going to impact how json-everything is maintained. I just wanted to put out a quick post to let everyone know what’s going on. I want to be open and transparent about the project and its future.

## tl;dr

I’m starting full-time work again on Monday, so I’ll be less available to work on the libraries than I have been for the past couple years. I’m still fully invested in the project: it continues to be my passion. However, updates will likely come more slowly than you’ve grown accustomed to.

## What happened?

About three years ago, Ben Hutton, the unofficial JSON Schema project leader (unofficial because none of us have official titles there), was approached by Postman with a deal to work on JSON Schema full time. They added him to their payroll to facilitate a sort of sponsorship and basically left him alone to work on the project. About a year later, Postman approved bringing three more of the JSON Schema team into this sponsorship. I was one of them. Shortly after, we added a community manager, making five people working full-time on JSON Schema.

So for the past two years, I’ve been able to work on this project and JSON Schema full-time, full-remote, and it’s been great. I also learned during that time that Postman was similarly sponsoring AsyncAPI and Microcks, two other API-centric projects.

However, recently it seems that Postman has had some internal changes in their sights: changes that didn’t include supporting JSON Schema or AsyncAPI. I can’t speak to what influenced those changes, but as of late July, Postman is no longer sponsoring these two projects through individual support. (Supposedly they’re still donating to the overall projects.) This decision left eight of us without income.

Overall, I’m grateful to Postman (and Kin Lane, specifically) for this opportunity. JSON Schema has really benefited from having five people working full time on the project for two years. It had to end at some point. I just thought it would go longer. There were signs, but what we were being told was that the sponsorship wasn’t in danger of ending.

## What’s next?

Well, naturally, I’ve had to go find a job. Some of the other JSON Schema team members had come from a consulting background, so they were used to more sporadic income, and they’re mostly looking to return to that. But I was an enterprise developer before, so that’s what I’ve been looking for.

Fortunately, the New Zealand tech scene is pretty tight-knit, and a lot of us know each other. I had an almost-colleague (we had just missed each other at a previous employer) reach out with a position that was opening up, and I had several interviews for other roles that had been going well. Ultimately, I decided on the role with my colleague, and I start on Monday! The company and team are much smaller than I’ve worked with before, and I’m really looking forward to the “startup” mentality of just getting stuff done as well as the opportunity to contribute toward building a work culture as the company grows.

## Working on a passion project

I really enjoyed the opportunity to get paid to work on projects that hold my passion, but I don’t think I’d do it again. Postman was really good about letting us manage ourselves, and they didn’t provide any input as to where the project should go. It was obvious they just wanted the project to flourish on its own. Any public perception that they had somehow “bought” JSON Schema was misplaced, but I suppose it was only possible to really know that by being on the inside.
However, before I worked on JSON Schema and my open source projects full time, I worked on them when I could, when I wanted to. Since being “employed” to build JSON Schema, I found that having a requirement to work on it discouraged my desire to do so. It became less of a passion project, and instead it truly became a job. Don’t get me wrong, though. I still cared; it’s just that choosing to do something is quite different than being required to do it.

Since Postman stopped their sponsorship, I’ve read a lot of articles and seen a lot of YouTube videos that basically say the same, and it seems to apply across all fields, not just software. The general consensus from those who have experienced turning a hobby into work is that your hobby should probably stay a hobby.

## Keep moving forward

So json-everything and my efforts in JSON Schema are returning to an “evenings and weekends, as I have time for it” gig. I’ve recently seen an increase in PRs from users on json-everything, which I’m grateful for and excited about. I hope this trend continues and it can become more of a community project rather than just something that I threw out into the void.

Thanks to everyone who contributes code, ideas, and questions. And thanks to those who have supported me directly.

If you like the work I put out, and would like to help ensure that I keep it up, please consider becoming a sponsor!

Joining the .Net Foundation

That’s right! The json-everything project is officially a .Net Foundation member!

How it started

Inspiration from JSON Schema

A couple of years ago, JSON Schema started the onboarding process to join the OpenJS Foundation. Joining a foundation means that the project can lean on the experience of other members for help with things like governance, outreach, project organization, etc. It helps to have the backing of a larger organization. However, while the specification group would be joining the Foundation, all of the tooling built around the spec remained independent. Sadly, the OpenJS Foundation onboarding journey was interrupted, so JSON Schema is still independent. We’ll likely try again, maybe with another foundation, but that’s still on the horizon for now… and this post is about json-everything anyway!

A push via JSON Path

As part of the JSON Path specification effort with the IETF, I reached out to a lot of JSON Path implementations to let them know a specification was coming, and I kept my eyes open for other places where JSON Path was being used and/or requested. One of those places was a .Net issue requesting that System.Text.Json get first-party support for the query syntax. I posted about JsonPath.Net, and one of the responses intrigued me.

That is awesome and for my personal stuff this is great, but professionally, I might be limited by corporate policy to use 1st party (Microsoft), or 2nd party (.net foundation membered), or “verified” 3rd party, (Newtonsoft), libraries. - @frankhaugen

I had never considered that a professional wouldn’t be able to use my libraries because of a corporate policy. They go on to say that many of these policies are driven by “auditing agencies for things like SOC2 and ISO2700 -certifications.” As I created these libraries to help developers make great software, this barrier bothered me.

Investigation

Looking into the three options mentioned, I first discovered that I am not Microsoft. (This was a devastating realization, and I had to re-evaluate my entire worldview.) I’m also not a .Net Foundation member, but I could look into joining. But first I wondered what it would take to have my packages verified on Nuget. Verifying packages is pretty simple: you just need a signing certificate. There are a lot of companies that provide them… and WOW are they expensive! So the .Net Foundation seemed to be my best option. I researched the benefits, the requirements, and the T&Cs. (I looked for links to all of the pages I found before, but the site has changed, and it looks like the application process now starts by filling out a web form. When I looked into it before, I just had to open an issue on the .Net Foundation’s Projects repo. If you’d like to join, I recommend going through the web form.)

Application

They use a very extensive issue template that plainly lists all of their requirements. Fortunately, because I had already wanted to make my repository the best it could be, most of the requirements had already been met. I had some questions about the IP implications of joining, and the Projects Committee was very helpful. Shaun Walker answered these questions to my satisfaction, and Chris Sfanos has been guiding the application through the rest of the process.

Acceptance

The Projects Committee decides on projects to be inducted on what appears to be a monthly basis. The result of their decision then goes to the .Net Foundation Board, who ultimately accepts or rejects the application. I was quite pleased when I received notification that my humble json-everything had been accepted.
How it’s going

I’m currently finishing up the onboarding process. There’s a checklist on my application issue that details all of the things that need to happen (or that I need to ensure have already happened). I think the biggest change is that the project will be under a CLA. I’ve read through it, and it basically says the contributor allows the project and the .Net Foundation to distribute and potentially patent their contribution (as part of the project). I’m not sure anything contributed to json-everything will or could be patented, but I suppose it’s come up enough for them to add it to the CLA. Outside of that, the contributor retains all rights.

I’ve also moved all of the related repos into a new json-everything org, and I spruced up the place a bit and made all the readmes pretty. GitHub has done a good job of applying redirects, so everyone’s links should still work. Then there are some housekeeping things for the repos and their public announcement, which will come via the Foundation’s newsletter.

The future

I like this ship! You know, it's exciting! - Star Trek, 2009

The future is bright for the project. I expect to be working mostly on the new learning site by adding more lessons for JsonSchema.Net and the other libraries. I’ve also been working hard over in JSON-Schema-Land getting the spec ready for its next release; keep an eye on the JSON Schema blog for news about that. And hopefully this means that more people can use my work!

If you like the work I put out, and would like to help ensure that I keep it up, please consider becoming a sponsor!

Learn json-everything

JSON Schema is really a cool community to work in. Over the past couple of years, our Community Manager, Benja Granados, has had us involved in Google’s Summer of Code program (GSoC), which gives (primarily) students an opportunity to hone their software development skills on real-world problems through contributions to open source. This year, one of the GSoC projects JSON Schema is working on is a “JSON Schema Tour” website, which will provide a number of (very simple) coding challenges to the user as a way to teach the ins and outs of JSON Schema. I was fortunate enough to be shown a preview of this new site a couple of weeks ago, and it inspired me to do something similar for my libraries.

Announcing Learn json-everything

Learn json-everything is a new site where you can learn how to use the various JSON technologies supported by the json-everything project. In building the lessons for this site, I’m trying to focus more on using the libraries than on how to use the underlying technologies. I don’t want to step on the toes of the aforementioned GSoC project or other fantastic learning and reference material sites like Learn JSON Schema. One exception to this is JSON Path. Since RFC 9535 is relatively new, there’s not a lot of documentation out there yet, so I’ll be teaching the specification’s particular flavor of JSON Path as well.

A typical lesson

Currently, a lesson consists of some background information, a link or two to relevant documentation, and a coding challenge. The coding challenge is made up of a task to complete, a code snippet (into which the user’s code will be inserted), and some tests to verify completion. The background information typically describes the use case for a particular feature, and the coding challenge allows the user to get their hands dirty actually writing code. In my experience, doing is the most effective way to learn. As an example, here’s the first lesson for JSON Schema, which teaches you how to deserialize a schema:

Deserializing a Schema

Background

JSON Schema is typically itself represented in JSON. To support this, the JsonSchema type is completely compatible with the System.Text.Json serializer. [Documentation]

Task

Deserialize the text in schemaText into a JsonSchema variable called schema.

Code template

```csharp
using System.Text.Json;
using System.Text.Json.Nodes;

using Json.Schema;

namespace LearnJsonEverything;

public class Lesson : ILessonRunner<EvaluationResults>
{
    public EvaluationResults Run(JsonObject test)
    {
        var instance = test["instance"];
        var schemaText =
            """
            {
              "type": "object",
              "properties": {
                "foo": { "type": "number", "minimum": 0 },
                "bar": { "type": "string" }
              },
              "required": ["foo", "bar"]
            }
            """;

        /* USER CODE */

        return schema.Evaluate(instance);
    }
}
```

Tests

| Instance | Is valid |
| --- | --- |
| {"foo":13,"bar":"a string"} | true |
| {"foo":false,"bar":"a string"} | false |
| {"foo":13} | false |
| {"bar":"a string"} | false |
| [1,2,3] | false |
| 6.8 | false |

Then you’re given a code editor in which you can provide code to replace the /* USER CODE */ comment in the template. The code in the lesson, along with the user’s code, constructs an ILessonRunner<T> implementation. Each lesson type (JSON Schema, JSON Path, etc.) defines what T is; for JSON Schema, it’s the evaluation results. The implementation will then be instantiated, Run() will be called for each of the tests, and the results will be compared with the expected outcomes from the tests. The goal is to make all of the tests pass.
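For reference, here’s one way the user code could be filled in: a minimal sketch based on my reading of the task above, not necessarily the site’s official solution.

```csharp
// Goes in place of the /* USER CODE */ comment in the template above.
// JsonSchema is compatible with the System.Text.Json serializer,
// so a plain Deserialize call produces the schema instance the tests need.
var schema = JsonSerializer.Deserialize<JsonSchema>(schemaText)!;
```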
If compilation fails, the user will get the compiler output so that they can fix their code. Funnily enough, I discovered that adding Console.WriteLine() calls in the user code outputs to the browser console, so that can be used for debugging.

How it works

Like the main playground, Learn json-everything is built with .Net’s Blazor WASM. Everything that the site does happens in the client, including building and running the C# code you enter. The Blazor stuff is pretty straightforward. Well, as straightforward as web development can be. I despise CSS layout. I spent two days just trying to get the layout right, whereas the rest of the site infrastructure only took a couple of hours! Oh, how I long for the good ol’ days of building UI/UX in WPF… The really interesting part is how the code is built. I figured out the majority of this when building support for schema generation on the playground, but I refined it a bit more for this site.

Building and running C# inside a web browser

Blazor WASM does most of the heavy lifting by providing a way to run .Net in the browser at all, and a good portion of the rest is provided by the Microsoft.CodeAnalysis.CSharp.Scripting Nuget package. What’s left involves building a context in which your compilation can run and then explicitly loading the new assembly. Compilation requires two inputs and several steps. The inputs are the source code (of course) and any referenced libraries. The source code is pretty easy: it’s provided by a combination of the code from the lesson and the user’s code. The referenced libraries are provided by whatever’s in the current app domain along with the json-everything libraries.

Getting references

In Blazor WASM, all of the libraries needed by a particular site can be found in a /_framework folder on the site root. Also, by default, the libraries are trimmed, which means that parts of the libraries might have been removed to decrease load times. While generally beneficial, trimming can be a problem when you’re ad-hoc compiling your user’s code. I ended up just turning it off by adding <PublishTrimmed>false</PublishTrimmed> to my project file. As of .Net 8, the libraries are also published as .wasm files, not .dll files, which the compiler doesn’t understand. To get the .dll files, you’ll need to add <WasmEnableWebcil>false</WasmEnableWebcil> to your project file.

I also noticed that not all of the libraries you want to reference are loaded into the app domain right away (like the json-everything libraries themselves) because the site doesn’t immediately make use of them. The solution was to make sure that those specific libraries were loaded by simply searching for them explicitly by name. Each library must be loaded as a MetadataReference, which means loading the file from a download stream. All of this can be performed asynchronously, so I just kick it off as soon as the site loads. I also put in protections so that if the user tries to run code before the assemblies are loaded, they get a message asking them to wait for the loading to finish. I still need to look into a progress indicator so that the user can know when that’s done; for now, it’s just listed in the browser console.

Building code

The next step is actually building the code, which starts with parsing the source into a syntax tree. This is accomplished using the CSharpSyntaxTree.ParseText() method. It doesn’t need the references we just gathered; it’s just looking at C# symbols and making sure the syntax itself is good. You’ll also need a temporary “file” for your assembly. This is easy and doesn’t require anything special: just use Path.GetTempFileName() and change the extension to .dll.

Next up, we create a compilation. This takes the file name, the syntax tree, and the references and builds an actual compilation, which is an intermediate representation of the build. Finally, we use the compilation to emit IL and other build outputs. The build outputs include the .dll itself and optionally a .pdb symbols file and/or an .xml documentation file; you’ll need to supply streams for these. (I need all of them to support schema generation.) This process produces an EmitResult, which contains any diagnostics (errors, warnings, etc.). Once the IL is emitted into the assembly stream, it can be loaded via AssemblyLoadContext.LoadFromStream(), and you can start using its types directly in your code.

You’ll probably want to unload the assembly when you’re done with it; using an AssemblyLoadContext instead of Assembly.Load() allows this. This site creates a new assembly with each compilation (every time the user clicks “Run”), so they stack up pretty quickly. Unloading old contexts between each run helps keep memory usage down. All of the source for this is on GitHub.
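To tie those steps together, here’s a minimal sketch of what the whole pipeline might look like. This isn’t the site’s actual implementation; the class name, the HttpClient-based reference download, and the collectible load context are my own illustrative choices, and it assumes trimming and Webcil packaging have been disabled as described above.

```csharp
using System.Collections.Generic;
using System.Collections.Immutable;
using System.IO;
using System.Net.Http;
using System.Reflection;
using System.Runtime.Loader;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

public static class BrowserCompiler
{
    // Downloads assemblies from the Blazor /_framework folder and wraps them
    // as metadata references for the compiler.
    public static async Task<List<MetadataReference>> LoadReferencesAsync(
        HttpClient client, IEnumerable<string> assemblyFileNames)
    {
        var references = new List<MetadataReference>();
        foreach (var fileName in assemblyFileNames)
        {
            var bytes = await client.GetByteArrayAsync($"_framework/{fileName}");
            references.Add(MetadataReference.CreateFromImage(ImmutableArray.Create(bytes)));
        }
        return references;
    }

    // Parses, compiles, emits, and loads the combined lesson + user source.
    // Returns null (along with the diagnostics) when compilation fails.
    public static Assembly? CompileAndLoad(
        string source,
        IEnumerable<MetadataReference> references,
        out ImmutableArray<Diagnostic> diagnostics)
    {
        // 1. Parse the source into a syntax tree (no references needed yet).
        var syntaxTree = CSharpSyntaxTree.ParseText(source);

        // 2. A temporary "file" name for the output assembly.
        var assemblyPath = Path.ChangeExtension(Path.GetTempFileName(), ".dll");

        // 3. Create the compilation from the name, the tree, and the references.
        var compilation = CSharpCompilation.Create(
            Path.GetFileNameWithoutExtension(assemblyPath),
            new[] { syntaxTree },
            references,
            new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

        // 4. Emit IL into a stream and check the diagnostics.
        using var assemblyStream = new MemoryStream();
        var result = compilation.Emit(assemblyStream);
        diagnostics = result.Diagnostics;
        if (!result.Success) return null;

        // 5. Load the assembly in a collectible context so it can be unloaded later.
        assemblyStream.Seek(0, SeekOrigin.Begin);
        var loadContext = new AssemblyLoadContext(name: null, isCollectible: true);
        return loadContext.LoadFromStream(assemblyStream);
    }
}
```

From there, the lesson infrastructure can reflect over the loaded assembly, instantiate the ILessonRunner<T> implementation, and call Run() for each test, as described in the previous post section.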
The hard part

With the above, I can build code provided by the user. But honestly, for me, that was the easy part. Now I have to do the actual hard part, which is building out lessons. So far, the approach I’ve been taking is to go through the documentation and identify things that could be enhanced with interactivity. That’s worked well enough, but I think I’m going to need more material soon. As I mentioned, I’ll be teaching the RFC 9535 flavor of JSON Path, so that should keep me busy for a while. And while I’ve done a few JSON Schema lessons, I still have the rest of the libraries to fill out as well. I also have a slew of usability features I’d like to add, like some level of intellisense, but I haven’t figured out how just yet. If you think of some lessons you’d like to see, or enhancements to the site, please feel free to open an issue or create a PR.

If you like the work I put out, and would like to help ensure that I keep it up, please consider becoming a sponsor!

Improving JsonSchema.Net (Part 2)

Over the last few posts, I’ve gone over some recent changes to my libraries that work toward better performance by way of reducing memory allocations. In this post, I’d like to review some changes I made internally to JsonSchema.Net that helped the code make more sense while also providing some of the performance increase.

The sad state of things

In version 6 and prior, analysis of schemas was performed and stored in code that was strewn about in many different places. JsonSchema would assess and store a lot of its own data, like base URI, dialect, and anchors. There were extension methods for various lookups that I had to do a lot, and the static class that defined those methods had private static dictionaries to cache the data:

- keyword Type and instance to keyword name (e.g. TitleKeyword -> “title”)
- whether a keyword supported a given JSON Schema version (e.g. prefixItems is only 2020-12)
- keyword priority calculation and lookup (e.g. properties needs to run before additionalProperties)
- whether a keyword produced annotations that another keyword needed (e.g. unevaluatedProperties depends on annotations from properties, even nested ones)

The code to determine which keywords to evaluate was in EvaluationOptions, but the code to determine which keywords were supported by the schema’s declared meta-schema was in EvaluationContext. Yeah, a lot of code in places it didn’t need to be. Moreover, a lot of this work was performed at evaluation time. It was time to fix this.

A better way

About a month ago, I ran an experiment to see if I could make a JSON Schema library (from scratch) that didn’t have an object model. This came out of reworking my JSON Logic library to do the same. The results of this experiment can be found in the schema/experiment-modelless-schema branch, if you want to have a look. There’s a new static JsonSchema.Evaluate() method that calls each keyword via a new IKeywordHandler interface. While the single-run performance is great, it can’t compete at scale with the static analysis that was introduced a few versions ago.

In building the experiment, I had to rebuild things like the schema and keyword registries, and I discovered that I could do a lot of the analysis that yielded the above information at registration time. This meant that I wasn’t trying to get this data during evaluation, which is what led to the stark increase in performance for single evaluations. I had decided not to pursue the experiment further, but I had learned a lot by doing it, so it wasn’t a waste. Sometimes rebuilding something from scratch can give you better results, even if it just teaches you things. So let’s get refactoring!

We got a lot to do. We gotta get to it. - The Matrix, 1999

Managing keyword data

I started with the keyword registry. I wanted to get rid of all of those extensions and just precalculate everything as keywords were registered. As it stood, SchemaKeywordRegistry contained three different dictionaries:

- keyword name → keyword type
- keyword type → instance (for keywords that need to support null values, like const; this resolves some serializer problems)
- keyword type → keyword TypeInfoResolver (supports Native AOT)

In the keyword extensions, I then had more dictionaries:

- keyword type → keyword name (the reverse of what’s in the registry)
- keyword type → evaluation group (supporting priority and keyword evaluation order)
- keyword type → specification versions

That’s a lot of dictionaries! And I needed them all to be concurrent!
Consolidation

First, I need to consolidate all of this into a “keyword meta-data” type. This is what I came up with:

```csharp
class KeywordMetaData
{
    public string Name { get; }
    public Type Type { get; }
    public long Priority { get; set; }
    public bool ProducesDependentAnnotations { get; set; }
    public IJsonSchemaKeyword? NullValue { get; set; }
    public SpecVersion SupportedVersions { get; set; }
    public JsonSerializerContext? SerializerContext { get; }

    // constructor contains most of the keyword inspection as well
}
```

This single type stores all of the information for a single keyword that was previously spread across the various dictionaries listed above.

Access

Second, I need a way to store these so that I can access them in multiple ways. What I’d really like is a concurrent dictionary that allows access to items using multiple keys. There are probably (definitely) a number of ways to do this. My approach was to wrap a ConcurrentDictionary<object, KeywordMetaData> and keep a collection of “key functions” that would produce a number of key objects for an item. When I add an item, it produces all of the keys and creates an entry for each, using the item as the value. That way, I can look up the item using any of the keys.
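To make that a little more concrete, here’s a rough sketch of what such a multi-key wrapper could look like. This is my own illustration rather than the library’s actual type, and the names (MultiKeyLookup, AddKeyFunction, TryGet) are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// A dictionary-like store that indexes each item under several keys at once.
class MultiKeyLookup<TValue>
{
    private readonly ConcurrentDictionary<object, TValue> _entries = new();
    private readonly List<Func<TValue, object>> _keyFunctions = new();

    // Registers a function that derives one of the keys from an item,
    // e.g. metaData => metaData.Name or metaData => metaData.Type.
    public void AddKeyFunction(Func<TValue, object> keyFunction) =>
        _keyFunctions.Add(keyFunction);

    // Stores the item once per derived key, with every entry pointing at the same value.
    public void Add(TValue item)
    {
        foreach (var getKey in _keyFunctions)
            _entries[getKey(item)] = item;
    }

    // Looks the item up by any of its keys (name, Type, etc.).
    public bool TryGet(object key, out TValue item) =>
        _entries.TryGetValue(key, out item);
}
```

With key functions registered for both the keyword name and the keyword Type, the registry can resolve the same KeywordMetaData entry from either direction.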
Data initialization

With these pieces in place, I can simply take all of the keyword types, build meta-data objects, and add those to the lookup. Finally, once the lookup has all of the keywords, I run some dependency analysis logic to calculate the priorities, and it’s done. When a client adds a new keyword, I simply add it to the lookup and run the dependency analysis again.

Deletion

The final step for this part of the refactor was to move the extension methods into the SchemaKeywordRegistry class (which was already static anyway) and delete the KeywordExtensions class.

Managing schema data

The other significant update I wanted to make was how schema data was handled. Like the keyword data, it should be gathered at registration time rather than at evaluation time. So what kind of data do I need (or can I get) from schemas?

- What is the root document for any given URI?
- Are there any anchors defined in the document?
- Are any of those anchors dynamic (defined by $dynamicAnchor)?
- Are any of those anchors legacy (defined by $id instead of $anchor)?
- Is there a $recursiveAnchor?
- What version of the specification should it use?
- What dialect does the schema use (i.e. which keywords does its meta-schema declare)?

I currently have several chunks of code in various places that calculate and store this. Like the keyword data, it could be consolidated.

Consolidation

In previous versions, JsonSchema contained a method called PopulateBaseUris() that would run on the first evaluation. This method would recursively scan the entire tree, set all of the base URIs for all of the subschemas, and register any anchors. The anchor registry was on JsonSchema itself. Later, when resolving a reference that had an anchor on it, the RefKeyword (or DynamicRefKeyword, or whatever needed to resolve the reference) would ask the schema registry for the schema using the base URI, and then it would check that schema directly to see if it had the required anchor. A better way would be to just let the registry figure it all out. To do that, we need a registration type to hold all of the schema identifier meta-data.

```csharp
class Registration
{
    public required IBaseDocument Root { get; init; }
    public Dictionary<string, JsonSchema>? Anchors { get; set; }
    public Dictionary<string, JsonSchema>? LegacyAnchors { get; set; }
    public Dictionary<string, JsonSchema>? DynamicAnchors { get; set; }
    public JsonSchema? RecursiveAnchor { get; set; }
}
```

Access

The next step was to expose all of this glorious data to consumers of the registry. I already had a .Get(Uri) method, but for this I’d need something a bit more robust. So I created these:

- .Get(Uri baseUri, string? anchor, bool allowLegacy = false)
- .Get(DynamicScope scope, Uri baseUri, string anchor, bool requireLocalAnchor)
- .GetRecursive(DynamicScope scope)

These are all internal, but .Get(Uri) still exists publicly. These methods let me query for schemas identified by URIs, URIs with anchors, and recursive and dynamic anchors, all with varied support based on which specification version I’m using:

- Draft 6/7 defines anchors in $id, but that usage is disallowed since 2019-09, which added $anchor.
- Draft 2019-09 defines $recursiveAnchor, but that was replaced by $dynamicAnchor in 2020-12.
- In draft 2020-12, $dynamicRef requires that a matching $dynamicAnchor exist within the same schema resource. This requirement has been removed for the upcoming specification version.

I have to support all of these variances, and I can do that with these three methods.

Data initialization

Scanning the schemas seemed like it was going to be the hard part, but it turned out to be pretty easy. As mentioned before, the old scanning approach was recursive: it would scan the local subschema to see if it had the appropriate keywords, then it would call itself on any nested subschemas to scan them. However, during all of the changes described in this and the previous posts, I developed a pattern that lets me scan a recursive structure iteratively. I’m not sure if it’s the best way, but it’s a good way and it’s mine. Here’s some pseudocode.

```csharp
Result[] Scan(Item root)
{
    var itemsToScan = new Queue<Item>();
    itemsToScan.Enqueue(root);
    var result = new List<Result>();

    while (itemsToScan.Count != 0)
    {
        // get the next item
        var item = itemsToScan.Dequeue();

        // gather the data we want from it
        var localResult = GetDataForLocal(item);
        result.Add(localResult);

        // check to see if it has children
        foreach (var sub in item.GetSubItems())
        {
            // set up each child for its own scan
            itemsToScan.Enqueue(sub);
        }
    }

    return result.ToArray();
}
```

The things I wanted to gather at each stage were all of the anchors from before. And since I was already iterating through all of the subschemas and tracking their base URIs, it was simple to just set that on the subschemas as well. I also checked for:

- a declared version, determined by the meta-schema, which I could get because I’m already in the schema registry
- the dialect, which is the set of vocabularies (which declare support for keywords) defined by that meta-schema

Deletion

With all of this now pre-calculated when the schema is registered, I no longer needed all of the code that did this work spread out all over everywhere. So it’s gone!

- JsonSchema no longer keeps anchor data
- EvaluationOptions no longer determines which keywords to process
- EvaluationContext no longer determines vocab or stores dialect information

(This seems like a short list, but it was a serious chunk of code.)

Wrap up

This was a lot of refactoring, but I’ve been wanting to do something about the disorganized state of my code for a really long time. I knew that it needed fixing, and I unexpectedly discovered how to fix it by writing a new implementation from scratch. Hopefully that won’t be necessary every time.
Thanks for reading through this series of posts covering the latest set of improvements and the things I learned along the way.

One last thing

I’ve recently set up my GitHub Sponsors page, so if you or your company find my work useful, I’d be eternally grateful if you signed up for a monthly contribution. When you sign up at any level, you’ll be listed in the sponsors section on that page as well as on the new Support page on this blog. Higher levels can get social media shoutouts as well as inclusion in the sponsors bubble cloud at the bottom of the json-everything.net landing page (which will show up as soon as I have such a sponsor). Thanks again.

If you like the work I put out, and would like to help ensure that I keep it up, please consider becoming a sponsor!