CosmosDB is a NoSQL database designed to solve the problem of big data, as characterized by the three V’s: volume, velocity and variety.
CosmosDB is built to be distributed from the ground up. That means you can have multiple replicas of the same database. The more replicas you have, the more available and resilient to failures your database becomes. This requires no extra effort: replication is built in, not added as an afterthought. Besides replication, CosmosDB also offers scaling out for high volume and high throughput. Scaling out is different from scaling up: instead of moving to a bigger machine, more machines are added when needed. CosmosDB also removes the need for schema changes. Schema changes become ever more complex for business-critical databases that require continuous, around-the-clock uptime, especially when those databases are scaled out, because the changes then need to be deployed across multiple database servers.
CosmosDB has the following main features:
CosmosDB is first and foremost a document database that stores JSON documents in containers. The documents in a container can differ in schema and can be of different types: sales orders, products, invoices, clients, you name it. The reasons to use a separate container are therefore not schema type or storage (storage is unlimited), but throughput (request units), partitioning (the partition key) or both. An example of a partition key is /address/zipCode. You can start with the free tier, which offers 400 request units per second and 5GB of storage. It’s not suitable for large production workloads, but it is very handy for testing, and you can always upgrade later on. You can also use a local emulator (available at http://aka.ms/cosmosdb-emulator). Finally, it’s important to realize that CosmosDB always encrypts data at rest, using either a service-managed key or a customer-managed key.
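To make the partition key more concrete, here is a minimal Python sketch, assuming documents are plain dicts. The helpers `partition_key_value` and `group_by_partition` are hypothetical illustrations, not part of any CosmosDB SDK. The first resolves a path such as /address/zipCode inside a document. The second groups documents that share the same value, which roughly corresponds to a logical partition. The sample documents also show different schema types living side by side in one container.

```python
from collections import defaultdict
from typing import Any


def partition_key_value(doc: dict, path: str) -> Any:
    """Resolve a partition key path like '/address/zipCode' inside a document.

    Hypothetical helper: returns None when any segment of the path is missing.
    """
    value: Any = doc
    for segment in path.strip("/").split("/"):
        if not isinstance(value, dict) or segment not in value:
            return None
        value = value[segment]
    return value


def group_by_partition(docs: list[dict], path: str) -> dict[Any, list[str]]:
    """Group document ids by partition key value (one group per logical partition)."""
    groups: dict[Any, list[str]] = defaultdict(list)
    for doc in docs:
        groups[partition_key_value(doc, path)].append(doc["id"])
    return dict(groups)


# Different schema types can live in the same container.
docs = [
    {"id": "order-1", "type": "salesOrder", "address": {"zipCode": "1011"}},
    {"id": "client-7", "type": "client", "address": {"zipCode": "1011"}},
    {"id": "order-2", "type": "salesOrder", "address": {"zipCode": "3511"}},
]

print(group_by_partition(docs, "/address/zipCode"))
# {'1011': ['order-1', 'client-7'], '3511': ['order-2']}
```

Choosing a path with many distinct values, like a zip code, spreads documents over many logical partitions. That spread is what lets the container scale out.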
Request units (RUs) are not a measure of the number of requests per second. An RU is a performance measure that abstracts the system resources an operation consumes: CPU, IOPS and memory. The short version is as follows. Lower values give lower performance at a lower price; higher values give higher performance at a higher price. Always disable the option “Provision Database Throughput”, because it shares one throughput budget across all containers in the database, whereas varying throughput per container is typically an important reason to use different containers.
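As a rough illustration of what an RU/s budget buys, consider Microsoft’s documented reference point: a point read (fetching one item by id and partition key) of a 1 KB item costs about 1 RU. The sketch below assumes that figure. The query charge is a hypothetical placeholder, because real charges depend on item size, indexing and query complexity. The function name `max_ops_per_second` is also hypothetical.

```python
def max_ops_per_second(provisioned_ru_per_s: float, ru_per_op: float) -> float:
    """Upper bound on operations per second if every operation costs ru_per_op RUs."""
    if ru_per_op <= 0:
        raise ValueError("ru_per_op must be positive")
    return provisioned_ru_per_s / ru_per_op


POINT_READ_1KB_RU = 1.0       # documented cost of a point read of a 1 KB item
HYPOTHETICAL_QUERY_RU = 10.0  # placeholder: measure the real charge of your own queries

for ru in (400, 4000):
    reads = max_ops_per_second(ru, POINT_READ_1KB_RU)
    queries = max_ops_per_second(ru, HYPOTHETICAL_QUERY_RU)
    print(f"{ru} RU/s -> ~{reads:.0f} point reads/s or ~{queries:.0f} such queries/s")
```

Requests that exceed the provisioned RU/s are rate-limited (HTTP 429) rather than simply slowed down. That is why containers with very different workloads benefit from their own throughput setting.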
You can create a CosmosDB account and one or more containers via the Azure Portal, the Azure CLI or PowerShell. You can also use the Data Explorer in the Azure Portal, with or without notebooks. CosmosDB supports multiple APIs for the underlying data model, but for document databases Core (SQL API) is the best option. For key-value storage in tables, you can use the Table API.
A CosmosDB account can have multiple read regions and multiple write regions (also called multi-master). Under Keys you’ll find the primary and secondary keys and connection strings, in both Read-Write and Read-Only variants. Be careful: the Read-Write keys give unlimited access to the database and should never be shared with end users.
Once you have an account and a container, you can use the Data Explorer for querying. You don’t have to worry about indexing, because by default every property is indexed automatically. Conceptually, each document is treated as a tree: every scalar property value becomes a leaf node, and each array introduces another level in the hierarchy. The tree is then translated into paths with their values, and these paths are what end up in the index.
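The indexing idea above can be illustrated with a small Python sketch that flattens a JSON document into path/value pairs, with array positions becoming path segments. This is a simplified model for illustration only. The function name `index_paths` is hypothetical, and this is not CosmosDB’s actual index implementation.

```python
from typing import Any, Iterator


def index_paths(node: Any, prefix: str = "") -> Iterator[tuple[str, Any]]:
    """Yield (path, value) pairs for every leaf in a JSON-like document.

    Objects add one path segment per property; arrays add one per position.
    """
    if isinstance(node, dict):
        for key, child in node.items():
            yield from index_paths(child, f"{prefix}/{key}")
    elif isinstance(node, list):
        for position, child in enumerate(node):
            yield from index_paths(child, f"{prefix}/{position}")
    else:
        # A scalar (string, number, bool, null) is a leaf node.
        yield prefix, node


doc = {
    "id": "client-7",
    "address": {"city": "Utrecht", "zipCode": "3511"},
    "tags": ["vip", "newsletter"],
}

for path, value in index_paths(doc):
    print(path, "=", value)
# /id = client-7
# /address/city = Utrecht
# /address/zipCode = 3511
# /tags/0 = vip
# /tags/1 = newsletter
```

Under the default indexing policy, which includes all paths, a filter such as `WHERE c.address.zipCode = '3511'` can be served from the index without you defining one first.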
We can add documents to populate the container. When we leave out the id property, a GUID is automatically generated for the id. The system properties _rid, _self, _etag, _attachments and _ts (last-modified timestamp) are added automatically.
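Here is a minimal Python sketch of the two behaviours described above, assuming plain dicts. The helper `ensure_id` is hypothetical and fills in a GUID when `id` is missing. The helper `last_modified` is also hypothetical: it converts the `_ts` system property, which holds the last-modified time in seconds since the Unix epoch, into a UTC datetime. The sample `_ts` value is made up.

```python
import uuid
from datetime import datetime, timezone


def ensure_id(doc: dict) -> dict:
    """Return a copy of doc with a GUID string as 'id' if none is present."""
    result = dict(doc)
    if "id" not in result:
        result["id"] = str(uuid.uuid4())
    return result


def last_modified(doc: dict) -> datetime:
    """Convert the _ts system property (Unix epoch seconds) to a UTC datetime."""
    return datetime.fromtimestamp(doc["_ts"], tz=timezone.utc)


new_doc = ensure_id({"type": "product", "name": "Widget"})
print(new_doc["id"])  # prints a random GUID

stored = {**new_doc, "_ts": 1600000000}  # hypothetical _ts value
print(last_modified(stored).isoformat())  # 2020-09-13T12:26:40+00:00
```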
For simple notebook examples, you can start from one of the standard templates in the gallery or create a new notebook. Notebooks are just a collection of cells. A cell can contain a code snippet (Add Code) or formatted text using markdown syntax (Add Text). For code you can, for instance, use Python or C# (selected via the Kernel dropdown); the notebooks themselves are stored as *.ipynb files. You can use so-called “magic” commands in code, such as %%upload to add a document to a container, or %%sql to run a SQL query against your container. See the exercise files for examples.