Data Isolation and Sharing in a Multitenant System - Part 3

This is the last part of the "Data Isolation and Sharing in a Multitenant System" article, which is also a continuation of my post that outlines the benefits of a well encapsulated data access. In this post, we'll look in detail at the implementation of the

Data Isolation and Sharing in a Multitenant System - Part 3

This is the last part of the "Data Isolation and Sharing in a Multitenant System" article, which is also a continuation of my post that outlines the benefits of a well encapsulated data access.

In this post, we'll look in detail at the implementation of the Shared Database strategy, explained in the first part. We'll see how to refactor the database schema for multitenancy and then how to build a Lambda Expression to filter the tenant specific data.

With this strategy we'll have one database for all the tenants, which holds both tenant shared data and tenant specific data:

Multitenancy - Shared Database

This means that all the tables that hold tenant specific data will have the TenantID as the discriminant column. Here is a small view on the database diagram for our example, where with blue are the tenant specific tables.

Multitenancy - Database Diagram

If we're starting the project with multitenacy in mind, we'll add the TenantID from the start, so we can skip the next section and go directly at the data access. Otherwise, if we are in the case where we add multitenancy at a later stage of the project, we need to refactor the database to add the TenantID column to the tenant specific tables.

Refactor the Database Schema

Doing this is not as easy as it sounds, especially if we have data which we want to preserve.

Basically, we need to create the Tenants table and to insert in it, a row for the first tenant. (We assume that all existent data belongs to this First Tenant).

GO
BEGIN TRAN

CREATE TABLE [dbo].[Tenants] (
    [ID]   INT           IDENTITY (1, 1) NOT NULL,
    [Name] NVARCHAR (50) NOT NULL,
    [Key]  NVARCHAR (50) NOT NULL,
    CONSTRAINT [PK_Tenants] PRIMARY KEY CLUSTERED ([ID] ASC)
);

CREATE UNIQUE NONCLUSTERED INDEX [UK_Tenants_Key]
    ON [dbo].[Tenants]([Key] ASC);

SET IDENTITY_INSERT [dbo].[Tenants] ON
INSERT INTO [dbo].[Tenants] ([ID], [Name], [Key]) 
	VALUES (1, 'First Tenant', 'FirstTenant')
SET IDENTITY_INSERT [dbo].[Tenants] OFF

COMMIT TRAN

Then, we should alter each table with tenant specific data to add the TenantID, plus a FK to the Tenants table. Here, existent data complicates things. We should first add the TenantID as nullable, then update all the rows with SET TenantID = 1 and then alter again the table to make the the TenantID not nullable. Here is a simplified version of the script for the Patients table:

GO
BEGIN TRAN

ALTER TABLE [dbo].[Patients] 
	ADD [TenantID] INT NULL

ALTER TABLE [dbo].[Patients] WITH CHECK
    ADD CONSTRAINT [FK_Patients_Tenants] FOREIGN KEY ([TenantID]) REFERENCES [dbo].[Tenants] ([ID])

GO
UPDATE Patients SET [TenantID] = 1 WHERE [TenantID] IS NULL

ALTER TABLE [dbo].[Patients] 
	ALTER COLUMN [TenantID] INT NOT NULL

COMMIT TRAN

Based on this, we can create a small tool (it may be just another T-SQL script) which generates such scripts for all the tables that keep tenant specific data.

Another option to do the database schema refactor, is to use an existent schema compare tool. Such a tool may be the SQL Server Data Tools (aka Visual Studio Database Project).

Here are the steps we should do with it (you can follow the code changes for all these steps on my Code-Design-Training GitHub repository. For each step I've made a tag, so you can easily follow the progress):

  1. Create a database project with the current schema of the database (the one without multitenancy things) [Tag on GitHub: mt-shared-step1]
  2. Edit the schema in the tool, to add the Tenants table, the TenantID column and the foreign keys [Tag on GitHub: mt-shared-step2]
  • we can do this in Visual Studio, or we could publish the schema in a temporary database (which has no data), do the changes there (with SQL Management Studio) and then update back the schema from the Database project
  1. Generate the Publish Script from the Database project to our original database [Tag on GitHub: mt-shared-step3]
  • this should be a script that prevents data loss and assures a safe way to refactor the database
  1. Edit the generated Publish Script to insert the row for the First Tenant, to set the FKs to it, and to make the FKs not nullable [Tag on GitHub: mt-shared-step4]
  • for each table, the generated script creates temporary table with the new schema (which contains the TenantID column) and copies the existent data to it. Then it drops the original table and renames the temporary one as the original. So, we do the following:
    • search for all CREATE TABLE statements and make the TenantID NOT NULL
    • add in all INSERT INTO ... SELECT (..) statements the TenantID and its PK value 1
    • add the INSERT INTO [dbo].[Tenants] ([ID], [Name], [Key]) VALUES (1, 'First Tenant', 'FirstTenant') after the CREATE TABLE [dbo].[Tenants] (...
  1. Run the Publish Script against the database to execute the refactor [Tag on GitHub: mt-shared-step5]
  • after this we could update the database project with the changes we did to the database with this refactor

Isolate Tenant Data through Data Access

The next step is to make sure that tenant data is isolated. This means that each query or command we send to the database must have a WHERE TenantID = ... clause appended. Here, having a well encapsulated data access implementation makes a huge difference, because it assures us that all the queries and commands go through it, so we can intercept them to append the WHERE.

If we are using a Linq based data access, we can build a Lambda Expression for the TenantID = currentTenantId filter and add a .Where() with it, to IQueryable<T> we're going to pass to the caller. If the data access is built on top of Entity Framework (EF) this simplifies to adding the .Where() to the DbContext.DbSet<T> property.

Lets take the iQuarc.DataAccess as an example for the data access implementation (for more on this, you can go to my older post "Separating Data Access Concern"). We should modify the Repository.GetEntities<T>() and the UnitOfWork.GetEntities<T>() functions, from:

public class Repository : IRepository, IDisposable
{
  public IQueryable<T> GetEntities<T>() where T : class
  {
    return Context.Set<T>().AsNoTracking();
  }
...
}

into

public class Repository : IRepository
{
     public IQueryable<T> GetEntities<T>() where T : class
     {
         int tenantId = GetCurrentUserTenantId();
         Expression<Func<T, bool>> condition = BuildWhereExpression<T>(tenantId);

         return Context.Set<T>()
                  .Where(condition)
                  .AsNoTracking();
     }
...
}

Actually, we don't have to modify the existent Repository or UnitOfWork classes. We could apply the Decorator pattern and write a MultitenancyRepository which wraps the existent implementation and does the filtering based on the current tenant.

Lets look at the two helper functions we've introduced.

The GetCurrentUserTenantId() is similar with the one from the previous post, where we've looked at separate databases strategy implementation.

private int GetCurrentUserTenantId()
{
    const string tenantKeyClaim = "tenant_key";
    Claim tenantClaim = ClaimsPrincipal.Current.FindFirst(tenantKeyClaim);
    int ternanId = tenantsCache[tenantClaim.Value];
    return ternanId;
}

It relays on the existent tenant_key claim of the current user, which should be set by the authentication mechanism. Then, it uses a cache build from the Tenants table to return the tenantId which corresponds to the key. Nothing fancy.

The BuildWhereExpression<T>()is a bit more complex. It needs to build a Binary Expression with the equals operator. The left operand should be the TenantID property of the current entity and the right operand the tenantId which is passed as a parameter. For the Patient entity this whould be: patient.TenantID == tenantId.

This means that all the tenant specific entities should have the TenantID property mapped to the TenantID column. So we should change the EF code generator to make these entities implement the ITenantEntity interface

interface ITenantEntity
{
    int TenantID { get; set; }
}

Having this, we can easily build the left operand and also we can know if the current entity is tenant specific or not. The operands are:

  • left operand: tenantIdSelector = entity => entity.TenantID;
  • right operand: tenantIdParam = () => tenantId;

so the entire function code goes like this:

private Expression<Func<T, bool>> BuildWhereExpression<T>(int tenantId)
{
    if (IsTenantEntity<T>())
    {
        Expression<Func<ITenantEntity, int>> tenantIdSelector = entity => entity.TenantID;
        Expression<Func<int>> tenantIdParam = () => tenantId;

        var filterExpression= Expression.Lambda<Func<T, bool>>(
            Expression.MakeBinary(ExpressionType.Equal,
                Expression.Convert(tenantIdSelector.Body, typeof(int)),
                tenantIdParam.Body),
            tenantIdSelector.Parameters);

        return filterExpression;
    }
    else
    {
        Expression<Func<T, bool>> trueExpression = entity => true;
        return trueExpression;
    }
}

If the current entity type is not implementing ITenantEntity interface, then on the else branch we will just build and return an expression which is always true so the .Where() we append has no effect for the tenant shared tables.

Now, if we do the same for the UnitOfWork class we have consistently achieved data isolation for each tenant data at the data access level.

You can see a full running demo of this implementation if you go to my Code Design Training GitHub repository and open the SharedDbConsoleDemo sample.

More discussions multitenancy design are part of my Code Design Training
Featured image credit: alinoubigh via 123RF Stock Photo
Code Design
Blog

We share our insights and experiences from working on client projects and teaching developers.

Our articles blend technical expertise with real-world challenges, offering a unique perspective on Code Design.

For us, Code Design means structuring code to ensure predictability and making it easy to manage and adapt long after it's written.