Thursday, 29 September 2016

Find Documents with missing properties in Azure DocumentDb with the .Net SDK

Azure DocumentDb stores documents as JSON. One of the effects of this is that sometimes you may end up with documents in the database that have missing properties and it can be quite tricky to search for them with the .Net SDK. This blog post has an approach to doing it - and quite simply too.


Most commonly, you would encounter the issue of the missing property when you add a new property to an existing class in your .Net code. There is no automatic method of adding this new property to all the existing entries in the database, short of re-saving them all.

Alternatively, you can explicitly configure Json.Net to not store properties that have null values like this:
JsonConvert.DefaultSettings = () => 
  new JsonSerializerSettings
          NullValueHandling = NullValueHandling.Ignore

You can use this configuration option to test the behaviour I am describing here or to save space in the database.

For example, imagine you have a class called MyItem looking like this:
public class MyItem
    public string Id { get; set; }
    public string SomeValue { get; set; }

If you have an item where SomeValue is null, by default that will be serialised and stored in DocumentDb like this:
  "id" : "1",
  "SomeValue" : null

However, if you configure Json.Net to not store null values (or the SomeValue field was added to your .Net code after you stored this item in DocumentDb) it will look like this in the database:
  "id" : "1",

Selecting missing properties with SQL

According to the documentation you can use SQL to select missing properties like this:
SELECT f.lastName ?? f.surname AS familyName
FROM Families f
You can then extrapolate from that example to, for example, select items etc.

Finding items with null or missing with the .Net SDK

Imagine you have added the SomeValue property to the MyItem class after you had already saved some items. Further, sometimes you store a null in the SomeValue property. Or you have configured Json.Net to ignore null values. And now you want to find all the items where SomeValue is either missing or null.
You might try this: 
var query1 = client.CreateDocumentQuery<MyItem>(collectionUrl)
              .Where(i => i.SomeValue == null);
But you will find that this will not actually return any results - at least, it won't return any documents where SomeValue is not present at all. However, this odd-looking statement will work:
var query2 = client.CreateDocumentQuery<MyItem>(collectionUrl)
              .Where(i => (i.SomeValue ?? null) == null);

It is using the null coalescor to make DocumentDb return a null value for the property for the property if it does not exist, which we can then compare to null.

I have tested this with version 1.6, 1.8 and 1.10 of the SDK, but I would advise you to put an integration test in your code if you are going to rely on it, just in case the behaviour changes in the future. You'll probably also want to put a comment wherever you use this syntax as R# is quite keen to tell you that you should get rid of the "?? null" part.
Finally, I have not done any performance testing on this, but I suspect DocumentDb won't be able to use any indexes to execute this query; it will probably have to evaluate each document in a scan so use with caution.

A full sample

If you want to try this out, there is a full example here: