Data Isolation and Sharing in a Multitenant System - Part 2
This post is the second part of the article that shows how a well encapsulated data access implementation can play a key role in implementing a multitenant application. It is also a continuation of my post that outlines the additional benefits such a data access implementation may bring, multitenancy being one of them.
The previous post (the first part of the article) focuses on presenting the main two strategies for addressing Data Isolation and Sharing in a multitenant application:
- Separate Databases one for each tenant, or
- a Shared Database used by all tenants
It also shows how we can make an informed decision on which of these strategies fit best a certain context.
Now, we go into the implementation details for these strategies. Both are in C# and rely on a well encapsulated data access implementation, for example one like the iQuarc Data Access, which I've presented a while ago in this post. This post covers the first strategy and the next one the second.
We'll use the same example as in the first part: the multitenant application used in clinics for physiotherapy, where the clinics are the tenants, the patient related data represents tenant specific data and the diagnostics or common exercises related data is the tenant shared data.
Separate Databases Implementation
With this strategy we'll have one database for each tenant. If we'll have n tenants we'll have n databases. When a new tenant comes, we'll have to create a new database for it.
Being in a multitenant architecture, we have the same instance of the application serving all the users from all the tenants. Based on the tenant to which the current user belongs to, the application should connect to the database that corresponds for that tenant. To maximize the cost savings that the multitenancy may bring we'll have the same database schema (same version) for all the databases from all the tenants.
At the database schema level, we do not need to do any changes from a design of a non-multitenant application. Each database will have the tables both for the tenant specific data and for the tenant shared data. The tenant shared data will be duplicated in all the databases. Also the DataModel
assembly, which contains the DTOs that are mapped by EF to the tables of the database will suffer no changes. Moreover, having the same database schema for all tenants, means that we can have the same DataModel
for all. (for more details on why the EF DTOs would be generated in a separate assembly DataModel
from the data access implementation DataAccess
you can refer to the Separating Data Access Concern post)
The only place where we need to do intervene to implement multitenancy with this strategy is in the DataAccess
. More precisely where we create a connection to the database. This is the only place, because we are relying on a well encapsulated data access implementation, which assures us that only here a connection to the database is made. Otherwise, we would have to go through all the services or controllers that might create an entity framework DbContext
(or a connection to the database with other means) and deal with those places as well.
First we need a configuration of the connection strings which allows us to identify them for each tenant. We'll keep them in the config file.
One option is to have more connection strings and use a convention for the name
attribute based on the tenant_key
.Something like:
<connectionStrings>
<add name="Tenant1_PhysioDb" connectionString="..." />
<add name="Tenant2_PhysioDb" connectionString="..." />
...
</connectionStrings>
Another option is to have a template connection string and to have a convention for the database name based on the tenant_key
. The application will have to replace the <tenant_key>
with the key of the current tenant, before the connection string will be used. This would look like:
<connectionStrings>
<add name="PhysioDb"
connectionString="...;initial catalog=<tenant_key>_PhysioDb;..." />
</connectionStrings>
This option works fine if we deploy all the databases on the same server and we'll use the same credentials for the application to connect to any of the databases. Otherwise, the template gets too complicated.
Next we need to make the application to use the connection string based on the tenant of the current user. The Entity Framework (EF) DbContext
receives into the constructor the name of the connection string to use. We need to change the code generator, to make this constructor public for the class it generates for our specific data model (which inherits the DbContext
). It should look like:
public class PhysioEntities : DbContext
{
public PhysioEntities(string nameOrConnectionString)
: base(nameOrConnectionString)
{
}
public virtual DbSet<PatientFile> PatientFiles { get; set; }
public virtual DbSet<RehabSession> RehabSessions { get; set; }
...
}
Having a well encapsulated data access, it means that the PhysioEntities
class is hidden from the rest of the code. It is instantiated inside the data access and used by a Repository or an Unit of Work implementation. If we look into the iQuarc Data Access library code, we see that this is already well separated. The IDbContextFactory
abstraction takes the responsibility of constructing the context, and the Repository
or UnitOfWork
clases only use it as a DbContext
instance. (An application of the Separate construction and configuration from use principle.) Here is a simplified snippet of this code.
public interface IDbContextFactory
{
DbContext CreateContext();
}
public class Repository : IRepository, IDisposable
{
private IDbContextFactory contextFactory;
public Repository(IDbContextFactory contextFactory, IInterceptorsResolver interceptorsResolver, IDbContextUtilities contextUtilities)
{
this.contextFactory = contextFactory;
...
}
private DbContext context
protected DbContext Context
{
get
{
if (context == null)
context = contextFactory.CreateContext()
return contextBuilder.Context;
}
}
public IQueryable<T> GetEntities<T>() where T : class
{
return Context.Set<T>().AsNoTracking();
}
...
}
If we have this abstraction already done, it means that we don't even need to change the core code of the data access implementation to add the multitenanacy. We only need to come with another implementation for the IDbContextFactory
. If we don't, we just need to create the abstraction as above. It's simple, just delegate the new PhysioEntities()
code to the IDbContextFactory.CreateContext()
.
The multitenancy IDbContextFactory
implementation, will have to take the tenant_key
of the current user, to determine the connection string to use and to instantiate the PhysioEntities
with it. Here is a snippet of the code:
public class MultitenancyDbContextFactory : IDbContextFactory
{
public DbContext CreateContext()
{
string tenantKey = GetTenantKeyFromCurrentUser();
string connectionName = $"{tenantKey}_PhysioDb";
return new PhysioEntities(connectionName);
}
private string GetTenantKeyFromCurrentUser()
{
const string tenantKeyClaim = "tenant_key";
Claim tenantClaim = ClaimsPrincipal.Current.FindFirst(tenantKeyClaim);
return tenantClaim.Value;
}
}
Next we only need to register in the DI Container the MultitenancyDbContextFactory
class as the default implementation of the IDbContextFactory
, so the Repository
and UnitOfWork
use it, and we're done.
One thing to notice is that we take the tenant_key
from the claims of the current user. We expect that an authenticated user will have this claim, with the correspondent value for the tenant she belongs to. This should be assured by the authentication part, which is another important aspect in a multitenant application. We should have a separate Identity Provider Service which handles the identity management and user authentication. For each authenticated user it should add this tenant_key
claim, so our application can use it.
The Identity Management and Authentication in a multitenant application is not in the scope of these posts, but here is a good guidance on how to do this using OpenID Connect, and Azure AD as the Identity Provider Service. Identity Server is an alternative to Azure AD, which you can host yourself.
Another important aspect for this Data Isolation and Sharing strategy is to maintain the database schemas in sync for all the databases of all the tenants. The best way to do this is to rely on a tool which can generate SQL scripts that create the schema of the database. These should include the tables, view, stored procedure, system data and everything else. The SQL Server Data Tools (also known as Visual Studio Database Project) is a good tool for this. Once we generate the database schema, we can add it in git and use it as the single source of truth for the state of the database. With this, when we have a new tenant is easy to create its database. Even more, such a tool can also generate a diff T-SQL script which results from comparing the one in git with a deployed database. This diff T-SQL could update the schema of the deployed database. If we integrate this into the tools that do the deployments in all our environments (testing, acceptance, production) the overhead of having more databases and keeping their schema in sync can be very much reduced.
Closing
This covers the implementation of the Separate Databases strategy for data isolation and sharing in a multitenant application. In the next post, which is the third part of this article, I will detail the implementation of the other strategy we've discussed in the first part, the Shared Database strategy.