Writing and Running Queries on NoSQL Document Databases

In this chapter, we will write and run queries to retrieve data from documents in a collection. We will use the Cosmos DB dialect of SQL to work against a document database with the SQL API. We will understand the different ways of working with the documents, their sub-documents, and their arrays, and we will learn how queries consume resource units.

Running queries against a collection with different tools

In Chapter 2, Getting Started with Cosmos DB Development and NoSQL Document Databases, we created an Azure Cosmos DB account with the SQL API, a document database, and a collection. Then, we inserted two JSON documents in the collection. Now, we will run queries against the collection with the two documents to learn the basics of the Cosmos DB dialect of SQL and how to run queries in the different web-based and GUI tools.

When Cosmos DB was launched, one of the most frustrating issues was that the only available tool to run queries was the web-based the Azure portal with its screen real estate problems. Luckily, Microsoft added the web-based Azure Cosmos DB Explorer and included support for Cosmos DB in the Azure Storage Explorer GUI tool. The three tools allow us to run queries against a collection by following very simple steps.

If you decide to work with the Azure Cosmos DB Emulator, you can use the web-based portal for the emulator or you can work with Azure Storage Explorer.

First, we will learn the necessary steps for running a simple query in each tool and to check the request units consumed by the query. Then, you can select the most appropriate tool for you to run the subsequent examples and then work with your databases.

The following instructions will always avoid running the default filter, which retrieves documents for a collection whenever possible. This way, we avoid consuming unnecessary request units before running our desired query. However, some tools have a wrong design and don't allow us to write queries without retrieving the first documents as a first step. It is very important to understand that the filter feature included in the Documents view for a collection in the different tools doesn't provide the same result set that we will get from executing the same query. Thus, we won't apply filters by adding criteria in the filter panel found in the different tools. Instead, we will work with queries and their real results in order to be ready to work with the different SDKs.

Understanding query results in JSON arrays

The following lines show one of the simplest queries that we can run against the previously created VideoGames1 collection and retrieves the document whose id is equal to 2. The code file for the sample is included in the learning_cosmos_db_03_01 folder in the sql_queries/videogame_1_01.sql file:

SELECT *
FROM Videogames v
WHERE v.id = '2'

In the Azure portal, make sure you are in the page for the Cosmos DB account in the portal. Click on the Data Explorer option, click on the database name (Competition) to expand the collections for the database, and click on the collection name (VideoGames1). Click on New SQL Query in the toolbar located at the top of the panel and the portal will add a new query tab with a default query at the top: SELECT * FROM c. Select this text, paste the new text for the query, and click Execute Query. Cosmos DB will execute the query and the portal will display the JSON document with the results in the Results tab, as shown in the following screenshot:

Notice that the header for the results indicates that the Results panel is displaying the results from number 1 up to number 1 with the following label: Results 1 - 1. The following lines show the results of the query:

[
   {
        "id": "2",
        "name": "Fortnite vs Zombies",
        "lastCompetitionDate": "2018-09-30T03:31:20.7251173Z",
        "tags": [
            "3D",
            "battle royale",
            "monsters",
            "shooter"
        ],
        "platforms": [
            "PS4",
            "XBox",
            "PC",
            "Switch",
            "iPad",
            "iPhone",
            "Android"
        ],
        "levels": [
            {
                "title": "Dancing in the storm",
                "maximumPlayers": 50,
                "minimumExperienceLevel": 30
            },
            {
                "title": "Rainbows after the storm",
                "maximumPlayers": 30,
                "minimumExperienceLevel": 60
            },
            {
                "title": "The last of us",
                "maximumPlayers": 10,
                "minimumExperienceLevel": 100
            }
        ],
        "highestScores": [
            {
                "player": {
                    "nickName": "PlaystationBoy",
                    "clan": "USA Players",
                    "experienceLevel": 140
                },
                "score": "5600"
            },
            {
                "player": {
                    "nickName": "KevinSwitchMan",
                    "clan": "Italian Warriors",
                    "experienceLevel": 125
                },
                "score": "3300"
            }
        ],
        "_rid": "prUNAKtPjRoGAAAAAAAAAA==",         "_self":
"dbs/prUNAA==/colls/prUNAKtPjRo=/docs/prUNAKtPjRoGAAAAAAAAAA==/",
        "_etag": "\"220066eb-0000-0000-0000-5b8704a20000\"",
        "_attachments": "attachments/",
        "_ts": 1535575202
    }
]

If we pay close attention to the previous JSON document returned as the result, we will notice that it is an array with the only JSON document that matches the query as its single element.

Whenever we execute a valid query in a collection with the SQL API, we will always receive a JSON array with one or more elements as a response. If our query just retrieves one document, we will receive this document as an element in an array, as in our current example. If we compare the results of our query with the results of inspecting the document that is stored in the database, we will notice the query results enclose the document in brackets ([]) to include the document as an element of an array. It is extremely important to understand the way the SQL API provides responses to queries.

If you have experience with relational databases and their SQL dialects, you must consider that the following query isn't equivalent to the previously shown query and it has an invalid syntax for the Cosmos DB SQL dialect:

SELECT v.*
FROM Videogames v
WHERE v.id = '2'

Hence, whenever you want to retrieve all the contents, just use SELECT * without the alias followed by a dot (.) as a prefix.

Checking the request units spent by a query

Every query we execute consumes request units. We can easily check the request charges for a query by clicking on the Query Information tab. The following screenshot shows the information provided by this tab for the previously executed query:

The value for the Request Charge metric specifies the request units that we were charged by Cosmos DB for the executed query. In this case, the query spent 2.35 request units from the request units we are provisioned for the VideoGames1 collection. Remember that we configured the settings for this collection to provide a throughput of 1,000 request units per second. Hence, after we execute this query, we will have 1,000 - 2.35 = 997.65 request units remaining after they are reset to 1,000 in the next second. In an application, we would be able to run this query 425 times in one second with the provisioned 1,000 request units per second. We would have to wait until the next second to run this query again in order to have 1,000 request units available for one second.

In this case, we are inspecting the query information in the Azure portal. It is very important to know that whenever we execute queries through the different Cosmos DB SDKs, we also receive this information as part of the results, and therefore, we are able to check the request units spent by a query. In fact, the SDKs provide a huge amount of additional data related to the query execution. Unluckily, at the time I was writing the book, the Azure portal didn't offer more information about the query execution.

The Cosmos DB account is using the default consistency level for database accounts:

session consistency. Notice that the selected default consistency level has an impact on the request units charged to a query.

Now, let's make a simple change to the previous query to retrieve a different document. Specifically, we will indicate we want to retrieve the document whose id is equal to 1. The code file for the sample is included in the learning_cosmos_db_03_01 folder in the sql_queries/videogame_1_02.sql file:

SELECT *
FROM Videogames v
WHERE v.id = '1'

This time, we will take advantage of the Cosmos DB Explorer to run the query and benefit from a full-screen view:

Click Open Full Screen on the toolbar for the Cosmos DB database account and then click Open to open the read-write URL in a new tab. The left-hand panel will list all the connections for the Competition database.

Click on the collection name (VideoGames1). Click on New SQL Query in the toolbar located at the top of the panel and the portal will add a new query tab with a default query at the top: SELECT * FROM c.

Select this text, paste the new text for the query, and click Execute Query. Cosmos DB will execute the query and the portal will display the JSON document with the results in the Results tab, as shown in the next screenshot. Notice the additional screen real estate compared with the usage of the Azure portal for the same task:

Click on the Query Information tab to check the request charges for this query. The following screenshot shows the information provided by this tab for the recently executed query:

The new query has a request charge of 2.32 request units, which is lower than the 2.35 request units spent by the previous query. The only difference between the two queries is the id value. The new query consumes 0.03 fewer request units because the retrieved document that is returned as the single element of the result array is smaller than the document whose id is equal to 2. The document retrieved by the new query has 1,180 characters and the document retrieved by the first query has 1,731 characters.

The amount of data retrieved by a query has an impact on the request units charged to it by Cosmos DB.

Working with schema-agnostic queries

The VideoGames1 collection has two documents. The document with id equal to 2 has a platforms key whose value is an array of string with the platforms in which the video game can be executed. The document with id equal to 1 doesn't include the platforms key.

Now we will write a query that will indicate the properties we want to retrieve from all the documents in the collection. Specifically, we will specify we want to retrieve the name and platforms properties. In addition, we will request the results to be sorted by name in ascending order. The code file for the sample is included in the learning_cosmos_db_03_01 folder in the sql_queries/videogame_1_03.sql file:

SELECT v.name,
    v.platforms FROM Videogames v
ORDER BY v.name

You can follow the steps to execute the preceding query:

Go to Azure Storage Explorer and navigate through the existing document database (Competition), its collection (VideoGames1), and its Documents element by expanding and clicking on the different elements.

Right-click on the Documents element for the VideoGames1 collection and select the Open query tab in the context menu. The application will open a new query tab with a default query at the top: SELECT * FROM c.

Select this text, paste the new text for the query, and click Execute Query. Cosmos DB will execute the query and the application will display the JSON document with the results in the Results tab, as shown in the following screenshot:

As happened with the other tools, we can click on the Query Information tab to check the request charges for this query. The following screenshot shows the information provided by this tab for the recently executed query:

The result for this query is an array with two elements that have different keys or properties. The first element only provides the value for the name key because the document whose id is equal to 1 didn't have a key named platforms at the root level. The second element provides the value for the name and platform keys. The query takes advantage of the schema-agnostic features in Cosmos DB and the document database, and therefore, we can request properties that aren't present in all of the documents and the query provides the results without issues.

It is very important to understand that the element related to the document that doesn't have a requested property won't include the key in the result.

This query required the retrieval of two documents, the extraction of the required properties from them, and the sorting of the results by the name property in ascending order. In this case, the request charge for the query is 3.28 request units.

We have learned the necessary steps to execute queries on the different available tools. Now we will focus on running useful queries and analyzing their results. You just need to follow the previously explained steps in your favorite tool to check the results.

Cosmos DB supports many built-in functions, including type-checking functions. For example, the following query retrieves the id of the documents that define the platforms key by taking advantage of the IS_DEFINED built-in type-checking function. This function returns a Boolean indicating whether the property received as an argument has been assigned a value. The code file for the sample is included in the learning_cosmos_db_03_01 folder in the sql_queries/videogame_1_04.sql file:

SELECT v.id
FROM Videogames v
WHERE IS_DEFINED(v.platforms)

The following lines show the results of the query. Only the video game whose id is equal to "2" defined the platforms property:

[
    {
        "id": "2"
    }
]

The two documents in the VideoGames1 collection have the levels key with an array of JSON documents that defines the different levels that the game has. In the two video games, the properties that define a level have different properties and only the title property is included in the levels for both video games. The video game whose id is equal to 1 defines a level with the following properties: title, towers, and towerPower. The video game whose id is equal to 2 defines a level with the following properties: title, maximumPlayers, and minimumExperienceLevel.

The following query will build a new document for the first level of each video game document that has a levels property defined at the root level and its value is an array. The query takes advantage of the IS_ARRAY built-in type-checking function. This function returns a Boolean indicating whether the property received as an argument has been assigned a value. In addition, the query uses the AS keyword to generate a property based on the expression specified at the left-hand side. The code file for the sample is included in the learning_cosmos_db_03_01 folder in the sql_queries/videogame_1_05.sql file:

SELECT v.id AS videoGameId,
    v.name AS videoGameName,
    v.levels[0] AS firstLevel
FROM videogames v
WHERE IS_ARRAY(v.levels)

The v.id and v.name properties are mapped to the videoGameId and videoGameName properties. The first level in the levels array (v.levels[0]) is mapped to the firstLevel property.

We can access the elements of an array with the desired element enclosed within brackets ([]) after the property name, with a syntax that is very common in many programming languages. Notice that the arrays use 0 for the index origin, and therefore, the first element in an array is addressed with [0].

The following lines show the results of the query. Notice that the firstLevel property has an object with different properties in each case:

[
    {
        "videoGameId": "1",
        "videoGameName": "Battle Royale Kingdoms",
        "firstLevel": {
            "title": "Training Camp for Dummies",
            "towers": 2,
            "towerPower": 30
        }
    },
    {
        "videoGameId": "2",
        "videoGameName": "Fortnite vs Zombies",
        "firstLevel": {
            "title": "Dancing in the storm",
            "maximumPlayers": 50,
            "minimumExperienceLevel": 30
        }
    }
]

In some cases, we will want to limit the number of values returned in the query. The TOP operator followed by a number will do this job. The following lines show a new version of the previous query that will sort the results by the video game's name and will return an array with only the first document of all the video games that match the specified criteria. The code file for the sample is included in the learning_cosmos_db_03_01 folder in the sql_queries/videogame_1_06.sql file:

SELECT TOP 1 v.id AS videoGameId,
    v.name AS videoGameName,
    v.levels[0] AS firstLevel
FROM videogames v
WHERE IS_ARRAY(v.levels)
ORDER BY v.name

The following lines show the results of the query:

[
    {
        "videoGameId": "1",
        "videoGameName": "Battle Royale Kingdoms",
        "firstLevel": {
            "title": "Training Camp for Dummies",
            "towers": 2,
            "towerPower": 30
        }
    }
]

Using built-in array functions

The previous query made sure that it only processes each videogame document that has a level property defined at the root level and its value is an array. However, it is possible to have a new videogame document that declares an empty array ([]) as the value for the level property because the video game doesn't have defined levels yet.

The next query is a new version of the previous query that takes advantage of the ARRAY_LENGTH built-in function. This function returns the number of elements of the array expression received as an argument. The query makes sure that the level property is an array and that it contains at least one element. The results for the query will be the same that were shown for its previous version. The code file for the sample is included in the learning_cosmos_db_03_01 folder in the sql_queries/videogame_1_07.sql file:

SELECT v.id AS videoGameId,
    v.name AS videoGameName,
    v.levels[0] AS firstLevel
FROM Videogames v
WHERE IS_ARRAY(v.levels)
AND ARRAY_LENGTH(v.levels) >= 1

The following query will build a new document for each video game that has at least three elements in the levels array at the root level. The query takes advantage of the ARRAY_SLICE built-in function. This function returns the part of the array expression received as an argument specified by the starting element and the number of elements. The query uses this function in combination with the AS keyword to generate the selectedLevels property with an array composed of the second and third elements of the levels array. Notice that the parameters for ARRAY_SLICE after v.levels are 1 and 2, which means we want the part of the array starting with the second element (the first element would be 0) and we must extract 2 elements. The code file for the sample is included in the learning_cosmos_db_03_01 folder in the sql_queries/videogame_1_08.sql file:

SELECT v.id AS videoGameId,
    v.name AS videoGameName,
    ARRAY_SLICE(v.levels, 1, 2) AS selectedLevels
FROM Videogames v
WHERE IS_ARRAY(v.levels)
AND (ARRAY_LENGTH(v.levels) >= 3)

The following lines show the results of the query. Notice that the selectedLevels property has an array with objects with different properties in each case:

[
    {
        "videoGameId": "1",
        "videoGameName": "Battle Royale Kingdoms",
        "selectedLevels": [
            {
                "title": "Jungle Arena",
                "towers": 2,
                "towerPower": 40
            },
            {
                "title": "Legendary World",
                "towers": 5,
                "towerPower": 100
            }
        ]
    },
    {
        "videoGameId": "2",
        "videoGameName": "Fortnite vs Zombies",
        "selectedLevels": [
            {
                "title": "Rainbows after the storm",
                "maximumPlayers": 30,
                "minimumExperienceLevel": 60
            },
            {
                "title": "The last of us",
                "maximumPlayers": 10,
                "minimumExperienceLevel": 100
            }
        ]
    }
]

The two documents in the VideoGames1 collection have the tags key with an array of strings with tags related to the video game. The following query will build a new document for each videogame that includes monsters as one of the string values of the tags key. The query takes advantage of the ARRAY_CONTAINS built-in function. This function returns a Boolean indicating whether the array received as an argument contains the specified value. The code file for the sample is included in the learning_cosmos_db_03_01 folder in the sql_queries/videogame_1_09.sql file:

SELECT v.id AS videoGameId,
    v.name AS videoGameName,
    v.tags AS videoGameTags
FROM Videogames v
WHERE ARRAY_CONTAINS(v.tags, "monsters")

The following lines show the results of the query. Only one document includes "monsters" as one of the string values of the tags key:

[
    {
        "videoGameId": "2",
        "videoGameName": "Fortnite vs Zombies",
        "videoGameTags": [
            "3D",
            "battle royale",
            "monsters",
            "shooter"
        ]
    }
]

Now we want to retrieve some properties for each video game that includes a specific level definition in the levels array. The following query will build a new document for each video game that includes a level definition with specific title, maximumPlayers, and minimumExperienceLevel values. The query uses the previously introduced ARRAY_CONTAINS built-in function. This time, the query calls this function with an object as the second argument. The code file for the sample is included in the learning_cosmos_db_03_01 folder in the sql_queries/videogame_1_10.sql file:

SELECT v.id AS videoGameId,
    v.name AS videoGameName,
    v.levels AS videoGameLevels
FROM Videogames v
WHERE ARRAY_CONTAINS(v.levels,
    {
        "title": "Rainbows after the storm",
        "maximumPlayers": 30,
        "minimumExperienceLevel": 60
    })

The following lines show the results of the query. Only one document includes the requested level definition as one of the objects of the levels key:

[
    {
        "videoGameId": "2",
        "videoGameName": "Fortnite vs Zombies",
        "videoGameLevels": [
            {
                "title": "Dancing in the storm",
                "maximumPlayers": 50,
                "minimumExperienceLevel": 30
            },
            {
                "title": "Rainbows after the storm",
                "maximumPlayers": 30,
                "minimumExperienceLevel": 60
            },
            {
                "title": "The last of us",
                "maximumPlayers": 10,
                "minimumExperienceLevel": 100
            }
        ]
    }
]

Now we will make some changes to the previous query and we will specify a level definition with specific title and towers values. The code file for the sample is included in the learning_cosmos_db_03_01 folder in the sql_queries/videogame_1_11.sql file:

SELECT v.id AS videoGameId,
    v.name AS videoGameName,
    v.levels AS videoGameLevels
FROM Videogames v
WHERE ARRAY_CONTAINS(v.levels,
    {
        "title": "Jungle Arena",
        "towers": 2
    })

The following line shows the results of the query, an empty array:

[]

One of the documents includes the following level definition as one of the objects of the levels key. However, the query result is an empty array because we are using the default option for the Boolean third argument of the ARRAY_CONTAINS built-in function, which specifies whether we want a partial match. The default value for this argument is false and it indicates that we want all the elements to match, and therefore, only an object that provides exactly the same key-value pairs that are specified will evaluate to true:

{
    "title": "Jungle Arena",
    "towers": 2,
    "towerPower": 40
}

The following is a new version of the query that specifies true as the third argument for the ARRAY_CONTAINS built-in function and indicates that we are specifying a partial fragment that we want to match. The code file for the sample is included in the learning_cosmos_db_03_01 folder in the sql_queries/videogame_1_12.sql file:

SELECT v.id AS videoGameId,
    v.name AS videoGameName,
    v.levels AS videoGameLevels
FROM Videogames v
WHERE ARRAY_CONTAINS(v.levels,
    {
        "title": "Jungle Arena",
        "towers": 2
    }, true)

The following lines show the results of the query. Only one document includes a partial match for the requested fragment of the level definition as one of the objects of the levels key:

[
    {
        "videoGameId": "1",
        "videoGameName": "Battle Royale Kingdoms",
        "videoGameLevels": [
            {
                "title": "Training Camp for Dummies",
                "towers": 2,
                "towerPower": 30
            },
            {
                "title": "Jungle Arena",
                "towers": 2,
                "towerPower": 40
            },
            {
                "title": "Legendary World",
                "towers": 5,
                "towerPower": 100
            }
        ]
    }
]

Working with joins

The two documents in the VideoGames1 collection have the highestScores key with an array of JSON documents that provide a player object, also known as sub-document when we work with Cosmos DB. The array of JSON documents also provide a score value. In the two video games, the player sub-documents have different properties and only the nickName and clan properties are included in the player sub-document for the high score definitions of both video games. The video game whose id is equal to "1" defines a player with the following properties: nickName and clan. The video game whose id is equal to 2 defines a player with the following properties: nickName, clan, and experienceLevel.

Now, we want to retrieve the video game name, the player's nickname, and the high score achieved by the player on the game. The following table summarizes the properties we want to retrieve and the alias that we will assign to each of them:

Property

Alias

name

videoGameName

highestScoreplayernickName

playerNickName

highestScorescore

highScore

The following query will build a cross product of the video game documents and their high scores sub-documents by using a self-join. The query will take advantage of the JOIN clause, which forms tuples from the full cross product of specified sets. The code file for the sample is included in the learning_cosmos_db_03_01 folder in the sql_queries/videogame_1_13.sql file:

SELECT v.name AS videoGameName,
    h.player.nickName AS playerNickName,
    h.score AS highScore
FROM Videogames v
JOIN h IN v.highestScores

The query produces the full product of the video game documents (the root documents) with the highestScores sub-document for each game and assigns the h alias to each highScore sub-document. The JOIN h IN v.highestScores clause generates an iterator that expands each child element, h, in the v.highestScores array and applies a cross product with the root of the document whose alias is v with each flattened child element, h. The query projects the desired properties from the cross product.

The following lines show a pseudo-code that rewrites the previous query with imperative code:

resultArray = [];
foreach (var v in Videogames)
{
    foreach (var h in v.highestScores)
    {
        resultTuple = new Tuple(
             videoGameName: v.name,
             highScore: h.player.nickName,
             highScore : h.score);
         resultArray.Add(resultTuple);
    }
} return resultArray;

The following lines show the results of the query. There is one document per high score registered. The first document has one high score and the second document has two high scores, and therefore, the results array has three documents (2 + 1 = 3):

[
    {
        "videoGameName": "Battle Royale Kingdoms",
        "playerNickName": "Brandon in Wonderland",
        "highScore": "750"
    },
    {
        "videoGameName": "Fortnite vs Zombies",
        "playerNickName": "PlaystationBoy",
        "highScore": "5600"
    },
    {
        "videoGameName": "Fortnite vs Zombies",
        "playerNickName": "KevinSwitchMan",
        "highScore": "3300"
    }
]

It is possible to chain multiple joins. However, it is very important to understand that we are asking Cosmos DB to produce full cross products. We must be extremely careful with self-joins.

Using array iteration

Now we will build a simple query to retrieve all the high scores defined for each video game flattened into a single array. The following query will take advantage of the IN keyword, which makes it possible to iterate over properties that are of the array type. The code file for the sample is included in the learning_cosmos_db_03_01 folder in the sql_queries/videogame_1_14.sql file:

SELECT *
FROM h IN Videogames.highestScores

The following lines show a pseudo-code that rewrites the previous query with imperative code:

resultArray = [];
foreach (var v in Videogames)
{
    foreach (var h in v.highestScores)
    {
        resultArray.Add(h);
    }
} return resultArray;

The following lines show the results of the query. Notice that each highest score document is an element of the generated array:

[
    {
        "player": {
            "nickName": "Brandon in Wonderland",
            "clan": "Wonderland Warriors"
        },
        "score": "750"
    },
    {
        "player": {
            "nickName": "PlaystationBoy",
            "clan": "USA Players",
            "experienceLevel": 140
        },
        "score": "5600"
    },
    {
        "player": {
            "nickName": "KevinSwitchMan",
            "clan": "Italian Warriors",
            "experienceLevel": 125
        },
        "score": "3300"
    }
]

Now, we will add a filter to retrieve all the high scores defined for each video game flattened into a single array that have a player object with an experience level higher than 120. The next query combines the use of the IN keyword with a filter on the flattened single array it generates. The code file for the sample is included in the learning_cosmos_db_03_01 folder in the sql_queries/videogame_1_15.sql file:

SELECT *
FROM h IN Videogames.highestScores
WHERE h.player.experienceLevel > 120

The following lines show a pseudo-code that rewrites the previous query with imperative code:

resultArray = [];
foreach (var v in Videogames)
{
    foreach (var h in v.highestScores)
    {
        if (h.player.experienceLevel > 120)
        {
            resultArray.Add(h);
        }
    }
} return resultArray;

The following lines show the results of the query. Notice that each highest score document whose player has an experienceLevel value higher than 120 is an element of the generated array:

[
    {
        "player": {
            "nickName": "PlaystationBoy",
            "clan": "USA Players",
            "experienceLevel": 140
        },
        "score": "5600"
    },
    {
        "player": {
            "nickName": "KevinSwitchMan",
            "clan": "Italian Warriors",
            "experienceLevel": 125
        },
        "score": "3300"
    }
]

It is extremely important to understand the difference between the use of the IN operator in a query and a similar query that produces different results because it doesn't flatten the highest scores in a single array. Thus, we will analyze the next queries in detail.

The following query retrieves all the highestScores arrays for each video game and adds each of them to the results array. The code file for the sample is included in the learning_cosmos_db_03_01 folder in the sql_queries/videogame_1_16.sql file:

SELECT *
FROM Videogames.highestScores h

The following lines show a pseudo-code that rewrites the previous query with imperative code:

resultArray = [];
foreach (var v in Videogames)
{
    resultArray.Add(v.highestScores);
} return resultArray;

The following lines show the results of the query. Notice that each highestScore array is an element of the generated array. When we used the IN keyword, each highest score document was an element of the generated array. Of course, in this case, there is a big difference with the previous query, in that we don't use a WHERE clause and the results are different:

[
    [
        {
            "player": {
                "nickName": "Brandon in Wonderland",
                "clan": "Wonderland Warriors"
            },
            "score": "750"
        }
    ],
    [
        {
            "player": {
                "nickName": "PlaystationBoy",
                "clan": "USA Players",
                "experienceLevel": 140
            },
            "score": "5600"
        },
        {
            "player": {
                "nickName": "KevinSwitchMan",
                "clan": "Italian Warriors",
                "experienceLevel": 125
            },
            "score": "3300"
        }
    ]
]

Working with aggregate functions

Cosmos DB SQL provides support for aggregations in the SELECT clause. For example, the following query uses the SUM aggregate function to sum all the values in the expression and calculate the total number of levels in the filtered games. The query uses the ARRAY_LENGTH built-in function to calculate the length of the levels array for each game and use it as an argument for the SUM aggregate function. The code file for the sample is included in the learning_cosmos_db_03_01 folder in the sql_queries/videogame_1_17.sql file:

SELECT SUM(ARRAY_LENGTH(v.levels))
FROM Videogames v

The following lines show the results of the query. Notice that the element of the array includes a key named $1:

[
    {
        "$1": 6
    }
]

Whenever we use an expression in the SELECT clause that is not a property name and we don't specify the desired alias name, Cosmos DB generates a key that starts with the $ prefix and continues with a number that starts in 1. Hence, if we have three expressions that aren't property names and don't include their desired aliases, Cosmos DB will use $1, $2 and $3 for the properties in the output results.

If we just want to generate the value without a key in the result, we can use the VALUE keyword. The following query uses this keyword. The code file for the sample is included in the learning_cosmos_db_03_01 folder in the sql_queries/videogame_1_18.sql file:

SELECT VALUE SUM(ARRAY_LENGTH(v.levels))
FROM Videogames v

The following lines show the results of the query. Notice that the element of the array doesn't include the key:

[
    6
]

It is also possible to achieve the same goal by using the COUNT aggregate function combined with the IN keyword. The following query uses the COUNT aggregate function to count the number of items in the expression and calculate the total number of levels in the iterated levels for all the games. The code file for the sample is included in the learning_cosmos_db_03_01 folder in the sql_queries/videogame_1_19.sql file:

SELECT COUNT(l) AS totalNumberOfLevels
FROM l IN Videogames.levels

The following lines show the results of the query. Notice that, in this case, the query specified the desired alias:

[
    {
        "totalNumberOfLevels": 6
    }
]

Now we want to calculate the average tower power for the levels defined in the video games. The towerPower property is not defined for all the levels and it is only available for the levels of the game whose id is equal to 1. Whenever we use the AVG aggregate function to calculate an average for an expression, only the documents that have the property will be part of the average calculation. Hence, the levels that don't have the towerPower property won't generate an impact on the average. The following query uses the AVG aggregate function combined with the IN keyword to iterate all the levels of the games that have the towerPower property and compute its average value. The code file for the sample is

included in the learning_cosmos_db_03_01 folder in the sql_queries/videogame_1_20.sql file:

SELECT AVG(l.towerPower) AS towerPowerAverage
FROM l IN Videogames.levels

The following lines show the results of the query. Notice that, in this case, the query specified the desired alias:

[
    {
        "towerPowerAverage": 56.666666666666664
    }
]

In order to make things simpler, we have been always running queries that used the default indexing and worked with a single partition. In the next chapter, we will analyze indexing strategies for our application and we will work with multiple partitions.

Test your knowledge

Let's see whether you can answer the following questions correctly:

  1. Which of the following queries is valid in the Cosmos DB SQL dialect?
    1. SELECT g.* FROM Games g WHERE g.id == '5'
    2. SELECT g.* FROM Games g WHERE g.id = '5'
    3. SELECT * FROM Games g WHERE g.id = '5'
  2. The IS_DEFINED built-in type-checking function does what exactly? 1. Returns a Boolean indicating whether the property received as an argument has been assigned a value
    1. Returns the number of times a property has been assigned a value in the array expression received as an argument
    2. Returns a Boolean indicating whether the property received as an argument has been defined as required for the collection that contains the document
  3. Which of the following queries retrieves all the levels flattened into a single array?
    1. SELECT * FROM l IN Games.levels
    2. SELECT FLAT * FROM Games.levels
    3. SELECT * FROM Games.levels[0]
  4. Which of the following queries returns an array with a single value without a key?
    1. SELECT TOP 1 SUM(ARRAY_LENGTH(g.highestScores)) FROM Games g
    2. SELECT FLAT SUM(ARRAY_LENGTH(g.highestScores)) FROM Games g
    3. SELECT VALUE SUM(ARRAY_LENGTH(g.highestScores)) FROM Games g
  5. Which of the following queries returns an array with a key-value pair?
    1. SELECT TOP 1 VALUE SUM(ARRAY_LENGTH(g.highestScores)) FROM Games g
    2. SELECT SUM(ARRAY_LENGTH(g.highestScores)) AS totalHighestScores FROM Games g
    3. SELECT FLAT SUM(ARRAY_LENGTH(g.highestScores)) FROM Games g

Summary

In this chapter, we learned the necessary steps for composing and executing queries against a Cosmos DB collection that uses the SQL API. Now we are able to select the most appropriate tool based on our specific needs.

We understood query results for the Cosmos DB SQL dialect. We learned that the results are JSON arrays. We used the different tools to check the resource units that Cosmos DB charges for each query and we understood its impact on the available resource units after a query is executed.

We worked with schema-agnostic queries and we used many built-in functions, including type-checking and array functions. We worked with joins, array iterations, projections, and aggregate functions. We understood how to take advantage of the Cosmos DB SQL dialect to work with a schema-agnostic document database.

Now that we have a very clear understanding of the document database and its SQL dialect, we will use the .NET Core SDK and C# to code and tune our first Cosmos DB application, which is what we are going to discuss in the next chapter.