Data Isolation and Sharing in a Multitenant System - Part 3
This is the last part of the "Data Isolation and Sharing in a Multitenant System" article, which is also a continuation of my post that outlines the benefits of a well encapsulated data access.
In this post, we'll look in detail at the implementation of the Shared Database strategy, explained in the first part. We'll see how to refactor the database schema for multitenancy and then how to build a Lambda Expression to filter the tenant specific data.
With this strategy we'll have one database for all the tenants, which holds both tenant shared data and tenant specific data:
This means that all the tables that hold tenant specific data will have the TenantID
as the discriminant column. Here is a small view on the database diagram for our example, where with blue are the tenant specific tables.
If we're starting the project with multitenacy in mind, we'll add the TenantID
from the start, so we can skip the next section and go directly at the data access. Otherwise, if we are in the case where we add multitenancy at a later stage of the project, we need to refactor the database to add the TenantID
column to the tenant specific tables.
Refactor the Database Schema
Doing this is not as easy as it sounds, especially if we have data which we want to preserve.
Basically, we need to create the Tenants
table and to insert in it, a row for the first tenant. (We assume that all existent data belongs to this First Tenant).
GO
BEGIN TRAN
CREATE TABLE [dbo].[Tenants] (
[ID] INT IDENTITY (1, 1) NOT NULL,
[Name] NVARCHAR (50) NOT NULL,
[Key] NVARCHAR (50) NOT NULL,
CONSTRAINT [PK_Tenants] PRIMARY KEY CLUSTERED ([ID] ASC)
);
CREATE UNIQUE NONCLUSTERED INDEX [UK_Tenants_Key]
ON [dbo].[Tenants]([Key] ASC);
SET IDENTITY_INSERT [dbo].[Tenants] ON
INSERT INTO [dbo].[Tenants] ([ID], [Name], [Key])
VALUES (1, 'First Tenant', 'FirstTenant')
SET IDENTITY_INSERT [dbo].[Tenants] OFF
COMMIT TRAN
Then, we should alter each table with tenant specific data to add the TenantID
, plus a FK to the Tenants
table. Here, existent data complicates things. We should first add the TenantID
as nullable, then update all the rows with SET TenantID = 1
and then alter again the table to make the the TenantID
not nullable. Here is a simplified version of the script for the Patients
table:
GO
BEGIN TRAN
ALTER TABLE [dbo].[Patients]
ADD [TenantID] INT NULL
ALTER TABLE [dbo].[Patients] WITH CHECK
ADD CONSTRAINT [FK_Patients_Tenants] FOREIGN KEY ([TenantID]) REFERENCES [dbo].[Tenants] ([ID])
GO
UPDATE Patients SET [TenantID] = 1 WHERE [TenantID] IS NULL
ALTER TABLE [dbo].[Patients]
ALTER COLUMN [TenantID] INT NOT NULL
COMMIT TRAN
Based on this, we can create a small tool (it may be just another T-SQL script) which generates such scripts for all the tables that keep tenant specific data.
Another option to do the database schema refactor, is to use an existent schema compare tool. Such a tool may be the SQL Server Data Tools (aka Visual Studio Database Project).
Here are the steps we should do with it (you can follow the code changes for all these steps on my Code-Design-Training GitHub repository. For each step I've made a tag, so you can easily follow the progress):
- Create a database project with the current schema of the database (the one without multitenancy things) [Tag on GitHub:
mt-shared-step1
] - Edit the schema in the tool, to add the
Tenants
table, theTenantID
column and the foreign keys [Tag on GitHub:mt-shared-step2
]
- we can do this in Visual Studio, or we could publish the schema in a temporary database (which has no data), do the changes there (with SQL Management Studio) and then update back the schema from the Database project
- Generate the Publish Script from the Database project to our original database [Tag on GitHub:
mt-shared-step3
]
- this should be a script that prevents data loss and assures a safe way to refactor the database
- Edit the generated Publish Script to insert the row for the First Tenant, to set the FKs to it, and to make the FKs not nullable [Tag on GitHub:
mt-shared-step4
]
- for each table, the generated script creates temporary table with the new schema (which contains the
TenantID
column) and copies the existent data to it. Then it drops the original table and renames the temporary one as the original. So, we do the following:- search for all
CREATE TABLE
statements and make theTenantID NOT NULL
- add in all
INSERT INTO ... SELECT (..)
statements theTenantID
and its PK value1
- add the
INSERT INTO [dbo].[Tenants] ([ID], [Name], [Key]) VALUES (1, 'First Tenant', 'FirstTenant')
after theCREATE TABLE [dbo].[Tenants] (...
- search for all
- Run the Publish Script against the database to execute the refactor [Tag on GitHub:
mt-shared-step5
]
- after this we could update the database project with the changes we did to the database with this refactor
Isolate Tenant Data through Data Access
The next step is to make sure that tenant data is isolated. This means that each query or command we send to the database must have a WHERE TenantID = ...
clause appended. Here, having a well encapsulated data access implementation makes a huge difference, because it assures us that all the queries and commands go through it, so we can intercept them to append the WHERE
.
If we are using a Linq based data access, we can build a Lambda Expression for the TenantID = currentTenantId
filter and add a .Where()
with it, to IQueryable<T>
we're going to pass to the caller. If the data access is built on top of Entity Framework (EF) this simplifies to adding the .Where()
to the DbContext.DbSet<T>
property.
Lets take the iQuarc.DataAccess as an example for the data access implementation (for more on this, you can go to my older post "Separating Data Access Concern"). We should modify the Repository.GetEntities<T>()
and the UnitOfWork.GetEntities<T>()
functions, from:
public class Repository : IRepository, IDisposable
{
public IQueryable<T> GetEntities<T>() where T : class
{
return Context.Set<T>().AsNoTracking();
}
...
}
into
public class Repository : IRepository
{
public IQueryable<T> GetEntities<T>() where T : class
{
int tenantId = GetCurrentUserTenantId();
Expression<Func<T, bool>> condition = BuildWhereExpression<T>(tenantId);
return Context.Set<T>()
.Where(condition)
.AsNoTracking();
}
...
}
Actually, we don't have to modify the existent Repository
or UnitOfWork
classes. We could apply the Decorator pattern and write a MultitenancyRepository
which wraps the existent implementation and does the filtering based on the current tenant.
Lets look at the two helper functions we've introduced.
The GetCurrentUserTenantId()
is similar with the one from the previous post, where we've looked at separate databases strategy implementation.
private int GetCurrentUserTenantId()
{
const string tenantKeyClaim = "tenant_key";
Claim tenantClaim = ClaimsPrincipal.Current.FindFirst(tenantKeyClaim);
int ternanId = tenantsCache[tenantClaim.Value];
return ternanId;
}
It relays on the existent tenant_key
claim of the current user, which should be set by the authentication mechanism. Then, it uses a cache build from the Tenants
table to return the tenantId
which corresponds to the key. Nothing fancy.
The BuildWhereExpression<T>()
is a bit more complex. It needs to build a Binary Expression with the equals operator. The left operand should be the TenantID
property of the current entity and the right operand the tenantId
which is passed as a parameter. For the Patient
entity this whould be: patient.TenantID == tenantId
.
This means that all the tenant specific entities should have the TenantID
property mapped to the TenantID
column. So we should change the EF code generator to make these entities implement the ITenantEntity
interface
interface ITenantEntity
{
int TenantID { get; set; }
}
Having this, we can easily build the left operand and also we can know if the current entity is tenant specific or not. The operands are:
- left operand:
tenantIdSelector = entity => entity.TenantID;
- right operand:
tenantIdParam = () => tenantId;
so the entire function code goes like this:
private Expression<Func<T, bool>> BuildWhereExpression<T>(int tenantId)
{
if (IsTenantEntity<T>())
{
Expression<Func<ITenantEntity, int>> tenantIdSelector = entity => entity.TenantID;
Expression<Func<int>> tenantIdParam = () => tenantId;
var filterExpression= Expression.Lambda<Func<T, bool>>(
Expression.MakeBinary(ExpressionType.Equal,
Expression.Convert(tenantIdSelector.Body, typeof(int)),
tenantIdParam.Body),
tenantIdSelector.Parameters);
return filterExpression;
}
else
{
Expression<Func<T, bool>> trueExpression = entity => true;
return trueExpression;
}
}
If the current entity type is not implementing ITenantEntity
interface, then on the else
branch we will just build and return an expression which is always true
so the .Where()
we append has no effect for the tenant shared tables.
Now, if we do the same for the UnitOfWork
class we have consistently achieved data isolation for each tenant data at the data access level.
You can see a full running demo of this implementation if you go to my Code Design Training GitHub repository and open the SharedDbConsoleDemo sample.