Why I Built a CIAM Engine From Scratch

Everyone told me not to do this. "Use Auth0. Use Firebase Auth. Use Clerk. Don't roll your own identity." I've given that same advice to clients for years. And I stand by it — for most teams, building your own identity system is a mistake. So why did I do it?

Because Melhousen Solutions doesn't have one product. We have six. Each product has different identity requirements: different user types, different permission models, different tenant isolation boundaries, different subscription tiers that gate feature access. After evaluating every major CIAM provider, I realized that what I needed wasn't an authentication service — it was an identity platform that could be embedded into every product we build, with a consistent security posture across all of them.

Auth0 could handle the authentication. But the authorization model — multi-tenant RBAC with hierarchical organizations, subscription-gated feature flags, cross-product identity federation, and tenant-scoped audit logging — that was going to be custom code regardless. At that point, the identity provider is just the token issuer, and you've built 80% of the identity system yourself anyway.

The decision criteria: Build your own identity system only if (1) you have more than three products sharing a common identity, (2) your authorization model is more complex than flat RBAC can express, and (3) you have the engineering discipline to maintain it for years. If any of those aren't true, use a managed provider.

The Tenant Isolation Decision

The single most consequential architectural decision in any multi-tenant system is how you isolate tenants. There are three standard models, each with real tradeoffs, plus the hybrid that Tallawah uses:

  • Model A: Database-per-Tenant
  • Model B: Schema-per-Tenant
  • Model C: Row-Level Isolation
  • Model D: Hybrid (Tallawah)

Database-per-tenant gives you the strongest isolation. Each tenant's data is physically separated. Backup, restore, and compliance are straightforward. But operational complexity scales linearly with tenant count. At 500 tenants, you're managing 500 databases. Connection pooling becomes a nightmare. Schema migrations require orchestration across every database.

Schema-per-tenant is the middle ground. One database engine, one connection string, but each tenant gets their own schema with their own tables. You get logical isolation without the operational overhead of separate databases. But most ORMs don't handle dynamic schema resolution well, and you still have the migration orchestration problem.

Row-level isolation is the simplest to operate. One database, one schema, a tenant_id column on every table, ideally enforced by row-level security policies. This is what most SaaS platforms use. It's efficient, it scales, and it's the easiest to get wrong: if isolation is enforced only in application code, a single missing WHERE clause leaks data across tenants.

What Tallawah Uses

Tallawah uses a hybrid model. The identity core — user records, credentials, sessions, audit logs — uses row-level isolation with enforced tenant context in every query. The authorization layer — roles, permissions, organization hierarchies — uses tenant-scoped configuration stored in a separate data plane that can be cached aggressively.

Why hybrid? Because identity data and authorization data have fundamentally different access patterns. Identity data is write-heavy during authentication (session creation, token refresh, audit logging) and needs transactional consistency. Authorization data is read-heavy during request processing (permission checks on every API call) and needs sub-millisecond latency. Optimizing for both patterns in a single storage model forces compromises I didn't want to make.

The trap I almost fell into: I initially considered building tenant isolation using application-level enforcement only — every query including a tenant_id filter in the application code. This works until someone writes a raw SQL query for a migration script and forgets the filter. Lesson learned: enforce tenant isolation at the database level with row-level security policies, and at the application level with middleware. Defense in depth applies to your own codebase, not just external threats.
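The following is a minimal sketch of the application-level half of that defense. It assumes a request middleware that binds the tenant from a validated token. The names `tenant_scope` and `fetch_users` and the `users` table are hypothetical. SQLite is used only so the sketch runs. It has no row-level security, so the database-level half would live elsewhere: in PostgreSQL, for example, via `ALTER TABLE ... ENABLE ROW LEVEL SECURITY` and a `CREATE POLICY` whose `USING` clause compares `tenant_id` to a per-session setting.

```python
import contextvars
import sqlite3
from contextlib import contextmanager
from typing import Iterator, List, Optional, Tuple

# Hypothetical request-scoped tenant context. In this sketch, middleware sets it
# after the access token has been validated; data-access code only reads it.
_current_tenant = contextvars.ContextVar("current_tenant", default=None)


class MissingTenantContext(RuntimeError):
    """Raised when tenant data is accessed outside a tenant scope."""


@contextmanager
def tenant_scope(tenant_id: str) -> Iterator[None]:
    """Bind a tenant to the current context for the duration of a request."""
    token = _current_tenant.set(tenant_id)
    try:
        yield
    finally:
        _current_tenant.reset(token)


def fetch_users(conn: sqlite3.Connection) -> List[Tuple[str, str]]:
    """Return (id, email) rows for the bound tenant; refuse to run unscoped."""
    tenant_id: Optional[str] = _current_tenant.get()
    if tenant_id is None:
        # Fail closed: no tenant context means no query, never "all tenants".
        raise MissingTenantContext("no tenant bound to this request")
    # The tenant filter lives in the data-access layer, not in every caller.
    cursor = conn.execute(
        "SELECT id, email FROM users WHERE tenant_id = ? ORDER BY id",
        (tenant_id,),
    )
    return cursor.fetchall()
```

Failing closed is the key design choice here. A forgotten scope surfaces as an error instead of a cross-tenant read. It does nothing for a raw migration script, though, which is exactly why the database-level policy is still needed.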

Authentication Architecture That Scales

Tallawah's authentication flow is deliberately boring. I don't mean that dismissively — I mean that authentication is not where you innovate. You innovate on what happens after authentication. The auth flow itself should be a solved problem executed correctly.

The authentication stack:

  • Primary: Enterprise IdP federation — for enterprise tenants, Tallawah delegates authentication to the tenant's own identity provider. We receive a standardized assertion, validate it, and map the external identity to a Tallawah user record. No passwords stored. No credential management. The customer's IT department manages their users, and we trust their identity provider.
  • Secondary: Local credentials — for tenants that don't have an enterprise IdP, Tallawah manages credentials directly. Passwords are hashed with a memory-hard algorithm resistant to GPU cracking. Configurable account lockout with exponential backoff. No password complexity rules — length minimums only, per NIST 800-63B guidance.
  • MFA: App-based + hardware — Authenticator app as the baseline, hardware security keys for admin accounts. No SMS-based MFA. I've seen too many SIM-swap attacks to trust phone-based verification for privileged accounts.
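The following is a minimal sketch of the local-credentials path. It assumes Python's standard-library `hashlib.scrypt` as the memory-hard algorithm. The post does not say which algorithm Tallawah uses, and Argon2id is a common alternative that needs a third-party package. The cost parameters, length minimum, lockout threshold, and delay cap are illustrative assumptions, not Tallawah's values.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

# Assumed parameters for illustration only. Check the current NIST SP 800-63B
# revision for the minimum length, and tune the scrypt cost to your hardware.
MIN_PASSWORD_LENGTH = 8
SCRYPT_N, SCRYPT_R, SCRYPT_P = 2**14, 8, 1   # 128 * r * n = 16 MiB per hash
SCRYPT_MAXMEM = 64 * 1024 * 1024
LOCKOUT_THRESHOLD = 5        # hypothetical: failures allowed before delays start
BASE_DELAY_SECONDS = 1.0
MAX_DELAY_SECONDS = 15 * 60.0


def check_password_policy(password: str) -> None:
    """Length is the only rule: no composition (upper/digit/symbol) requirements."""
    if len(password) < MIN_PASSWORD_LENGTH:
        raise ValueError(f"password must be at least {MIN_PASSWORD_LENGTH} characters")


def _scrypt(password: str, salt: bytes) -> bytes:
    return hashlib.scrypt(
        password.encode("utf-8"), salt=salt,
        n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P, maxmem=SCRYPT_MAXMEM,
    )


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Derive a 64-byte key with scrypt (memory-hard) and a fresh random salt."""
    check_password_policy(password)
    salt = secrets.token_bytes(16)
    return salt, _scrypt(password, salt)


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the hash and compare in constant time."""
    return hmac.compare_digest(_scrypt(password, salt), expected)


def lockout_delay(failed_attempts: int) -> float:
    """Seconds to wait before the next attempt: zero below the threshold,
    then doubling with each further failure, capped at MAX_DELAY_SECONDS."""
    if failed_attempts < LOCKOUT_THRESHOLD:
        return 0.0
    exponent = min(failed_attempts - LOCKOUT_THRESHOLD, 20)  # keep numbers small
    return min(BASE_DELAY_SECONDS * 2**exponent, MAX_DELAY_SECONDS)
```

Backing off instead of hard-locking makes online guessing expensive. It also avoids handing attackers a trivial way to lock real users out of their accounts.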

The RBAC Model Nobody Warns You About

Role-Based Access Control sounds simple until you need to implement it across multiple products with different permission semantics. A "Manager" role in a monitoring product (ConsoleSentinel) means "can view all dashboards and configure alerts." A "Manager" role in a cybersecurity product (ImaraForge) means "can approve vulnerability remediation workflows and access sensitive scan results." Same role name, completely different permission sets.

Tallawah solves this with a three-layer authorization model:

  1. Platform Roles: global roles that span all products: Platform Admin, Tenant Admin, Billing Manager. These control access to the identity platform itself: user management, organization settings, subscription management.
  2. Product Roles: scoped roles defined per product: ConsoleSentinel Operator, ImaraForge Analyst, SunSeedKitchen Subscriber. Each product registers its role definitions with Tallawah, including the specific permissions each role grants within that product's domain.
  3. Resource Permissions: fine-grained permissions attached to specific resources: "can edit dashboard X," "can view report Y," "can execute workflow Z." These are evaluated at runtime by the product, using claims in the Tallawah-issued token.

A single user might be a Platform Admin, an ImaraForge Analyst, and have resource-level permissions on three specific ConsoleSentinel dashboards. Tallawah resolves all of these into a single JWT with scoped claims. The receiving product doesn't need to query Tallawah at request time — the token contains everything it needs for authorization decisions.
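To make the three layers concrete, here is a sketch of how a receiving product might evaluate a request using only the token's claims. The claim names (`product_roles`, `resource_perms`, `tid`), the permission strings, and `is_authorized` are hypothetical. The sketch assumes signature validation has already happened. It leaves platform roles out of the check because they govern the identity platform itself rather than product features.

```python
from typing import Any, Dict, Optional, Set

# Layer 2: each product registers what its roles mean. The same role name
# ("manager") maps to different permissions in different products.
ROLE_DEFINITIONS: Dict[str, Dict[str, Set[str]]] = {
    "consolesentinel": {
        "operator": {"dashboard.view", "alert.ack"},
        "manager": {"dashboard.view", "alert.configure"},
    },
    "imaraforge": {
        "analyst": {"scan.view"},
        "manager": {"scan.view", "remediation.approve"},
    },
}

# Hypothetical decoded claims for the user described above.
EXAMPLE_CLAIMS: Dict[str, Any] = {
    "sub": "user-123",
    "tid": "tenant-acme",
    "platform_roles": ["platform_admin"],           # layer 1
    "product_roles": {"imaraforge": ["analyst"]},    # layer 2
    "resource_perms": {                               # layer 3
        "consolesentinel": {
            "dash-7": ["dashboard.view"],
            "dash-13": ["dashboard.view"],
            "dash-42": ["dashboard.view", "dashboard.edit"],
        }
    },
}


def is_authorized(claims: Dict[str, Any], product: str, permission: str,
                  resource_id: Optional[str] = None) -> bool:
    """Decide from token claims alone, with no call back to the identity service."""
    # Product-scoped roles grant a permission across every resource in the product.
    for role in claims.get("product_roles", {}).get(product, []):
        if permission in ROLE_DEFINITIONS.get(product, {}).get(role, set()):
            return True
    # Resource permissions grant a permission on one specific resource only.
    if resource_id is not None:
        grants = claims.get("resource_perms", {}).get(product, {}).get(resource_id, [])
        return permission in grants
    return False
```

In this sketch the role definitions live with the product. An alternative is to expand roles into explicit permissions inside the token. Either way, no request-time call to the identity service is needed.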

Token Architecture and Session Management

Tokens are the currency of distributed identity. Get the token architecture wrong and you'll spend six months debugging subtle authorization failures, session leaks, and race conditions.

Tallawah uses a dual-token model:

  • Short-lived access token — contains user identity, tenant context, platform roles, and product-scoped claims. Deliberately short TTL. Never stored in localStorage — transmitted in HTTP-only secure cookies or Authorization headers from secure token storage. Signed with asymmetric cryptography using tenant-scoped signing keys.
  • Longer-lived refresh token — stored server-side in the session store, indexed by a hash of the token value. Supports rotation: every refresh request issues a new refresh token and invalidates the previous one. If a refresh token is used twice (indicating theft), the entire session family is invalidated.
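The following is a minimal in-memory sketch of rotation with reuse detection. `RefreshTokenStore` and its methods are hypothetical stand-ins for the server-side session store. Only SHA-256 hashes of tokens are kept, matching the hashed index described above. A real deployment would persist this state and expire session families by TTL.

```python
import hashlib
import secrets
from dataclasses import dataclass
from typing import Dict


class InvalidRefreshToken(Exception):
    """Unknown token, or a token from a revoked session family."""


class RefreshTokenReuse(InvalidRefreshToken):
    """An already-rotated token was presented again (likely theft)."""


def _digest(token: str) -> str:
    # Index only a hash, so a leaked store does not leak usable tokens.
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


@dataclass
class SessionFamily:
    user_id: str
    current_hash: str
    revoked: bool = False


class RefreshTokenStore:
    def __init__(self) -> None:
        self._families: Dict[str, SessionFamily] = {}
        self._family_by_hash: Dict[str, str] = {}  # every hash ever issued -> family

    def _new_token(self, family_id: str) -> str:
        token = secrets.token_urlsafe(32)
        self._family_by_hash[_digest(token)] = family_id
        return token

    def issue(self, user_id: str) -> str:
        """Start a new session family at login and return its first refresh token."""
        family_id = secrets.token_hex(16)
        token = self._new_token(family_id)
        self._families[family_id] = SessionFamily(user_id, _digest(token))
        return token

    def rotate(self, presented: str) -> str:
        """Exchange a refresh token for a new one; the presented token becomes invalid."""
        presented_hash = _digest(presented)
        family_id = self._family_by_hash.get(presented_hash)
        if family_id is None:
            raise InvalidRefreshToken("unknown refresh token")
        family = self._families[family_id]
        if family.revoked:
            raise InvalidRefreshToken("session family has been revoked")
        if presented_hash != family.current_hash:
            # Old hashes stay indexed precisely so that reuse can be detected.
            family.revoked = True
            raise RefreshTokenReuse("refresh token reused; session family revoked")
        new_token = self._new_token(family_id)
        family.current_hash = _digest(new_token)
        return new_token
```

Revoking the whole family matters because the server cannot tell whether the legitimate client or the attacker presented the stale token. Invalidating every descendant logs out both.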

Why tenant-scoped signing keys? Because if a single global signing key is compromised, every tenant's tokens are compromised. With per-tenant keys stored in a managed key vault, a key compromise affects one tenant. You rotate that one key, invalidate that tenant's sessions, and the blast radius is contained.
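The blast-radius argument depends on one check that is easy to miss: the key that verifies a token must belong to the tenant the token claims. The following sketches a per-tenant key registry keyed by the JOSE `kid` (key ID) header. `TenantKeyRegistry` and `SigningKey` are hypothetical names. The sketch deliberately omits the asymmetric signature verification itself, which would require a third-party crypto library and depends on the key vault in use.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class SigningKey:
    kid: str              # value of the token's "kid" header
    tenant_id: str
    public_key_pem: str   # placeholder: verification itself is out of scope here
    active: bool = True


class TenantKeyRegistry:
    def __init__(self) -> None:
        self._keys: Dict[str, SigningKey] = {}

    def add(self, key: SigningKey) -> None:
        self._keys[key.kid] = key

    def rotate(self, tenant_id: str, new_key: SigningKey) -> None:
        """Retire every key for one tenant and install its replacement.
        Keys belonging to other tenants are left untouched."""
        if new_key.tenant_id != tenant_id:
            raise ValueError("replacement key belongs to a different tenant")
        for kid, key in list(self._keys.items()):
            if key.tenant_id == tenant_id:
                self._keys[kid] = replace(key, active=False)
        self.add(new_key)

    def key_for(self, kid: str, claimed_tenant: str) -> SigningKey:
        """Return the verification key for a token, or raise if it must be rejected.
        claimed_tenant comes from the not-yet-verified payload; it becomes
        trustworthy only once the signature verifies with the returned key."""
        key = self._keys.get(kid)
        if key is None or not key.active:
            raise LookupError(f"unknown or retired key: {kid}")
        if key.tenant_id != claimed_tenant:
            # Tenant A's key must never vouch for a token that claims tenant B.
            raise PermissionError("key does not belong to the token's tenant")
        return key
```

Retiring all of a tenant's keys at once invalidates that tenant's outstanding tokens, which matches the incident response described above. A routine, planned rotation would more typically keep the old public key valid for a short overlap window.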

Performance reality: Asymmetric token validation with a cached public key is fast — sub-millisecond per request. For most products this is negligible. For high-throughput monitoring endpoints, we added a validated-token cache with a short TTL. The tradeoff: a revoked token could remain valid briefly. Acceptable for read-only monitoring data. Not acceptable for administrative actions, which always validate against the token store directly.
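The following sketches the short-TTL validated-token cache. It assumes a `full_validate` callable that performs signature, expiry, and revocation checks and raises on failure. `ValidatedTokenCache` and the `admin_action` flag are hypothetical names. Administrative calls skip the cache entirely, which is the tradeoff described above.

```python
import hashlib
import time
from typing import Any, Callable, Dict, Tuple

Claims = Dict[str, Any]


class ValidatedTokenCache:
    """Remembers tokens that already passed full validation, for a short TTL."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Claims]] = {}  # hash -> (expiry, claims)

    def validate(self, token: str, full_validate: Callable[[str], Claims],
                 *, admin_action: bool = False) -> Claims:
        key = hashlib.sha256(token.encode("utf-8")).hexdigest()
        now = self._clock()
        if not admin_action:
            cached = self._entries.get(key)
            if cached is not None and cached[0] > now:
                # Fast path: may briefly accept a token revoked after it was cached.
                return cached[1]
        # Slow path (always taken for admin actions); raises if the token is invalid.
        claims = full_validate(token)
        self._entries[key] = (now + self._ttl, claims)
        return claims
```

A production version would also bound the cache's size. It should never let an entry outlive the token's own `exp` claim.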

Performance Tradeoffs at Scale

Every identity system makes performance tradeoffs. Here are the ones I made and why:

  • Permission resolution is pre-computed, not runtime-queried. When a user's roles change, Tallawah recomputes their effective permissions and caches the result. API calls check the cache, not the permission tables. This means permission changes aren't instant: there's a short, configurable propagation window. I chose fast checks over immediate propagation because permission changes are rare events, while fast authorization checks are needed on every API call.
  • Session storage uses a purpose-built cache, not the primary database. Sessions are hot, ephemeral data. They don't belong in your relational database competing for connections with your identity queries. A dedicated cache layer gives us sub-millisecond session lookups and automatic TTL expiration without a cleanup job.
  • Audit logs are write-optimized and eventually consistent. Every authentication event, permission change, and administrative action generates an audit log entry. These are written to an append-only event stream and materialized into queryable storage asynchronously. The audit trail is complete but not instantly queryable — there's a short propagation window. For compliance queries this is invisible. For real-time security monitoring, we tap the event stream directly.
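The following sketches pre-computed permission resolution with a propagation window. It assumes role changes are queued and applied by a periodic `refresh` job. `PermissionIndex` and its methods are hypothetical. The point is that the hot path is a single set lookup, and readers keep seeing the previous permissions until the next refresh.

```python
from typing import Dict, FrozenSet, Iterable, List, Set


class PermissionIndex:
    def __init__(self, role_definitions: Dict[str, Set[str]]) -> None:
        self._role_definitions = role_definitions
        self._effective: Dict[str, FrozenSet[str]] = {}  # what the hot path reads
        self._pending: Dict[str, List[str]] = {}         # role changes not yet applied

    def assign_roles(self, user_id: str, roles: Iterable[str]) -> None:
        """Record a role change; it becomes visible only after the next refresh()."""
        self._pending[user_id] = list(roles)

    def refresh(self) -> None:
        """Recompute effective permissions for users whose roles changed.
        Running this on a schedule is what creates the propagation window."""
        for user_id, roles in self._pending.items():
            effective: Set[str] = set()
            for role in roles:
                effective |= self._role_definitions.get(role, set())
            self._effective[user_id] = frozenset(effective)
        self._pending.clear()

    def has_permission(self, user_id: str, permission: str) -> bool:
        """Hot path: one dictionary lookup and one set membership test, no joins."""
        return permission in self._effective.get(user_id, frozenset())
```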

What I'd Do Differently

Building Tallawah took eighteen months from first commit to production deployment across all six products. Here's what I'd change if I started over:

  • Start with the SDK, not the platform. I built the Tallawah platform first and the client SDKs second. This meant every product team was writing custom integration code against raw APIs for six months while the SDKs caught up. Next time I'd build the TypeScript and Python SDKs first, use them as the design contract, and build the platform to satisfy the SDK's expectations.
  • Define the audit schema before writing any code. Our audit log schema evolved organically, which means early events have different field names and structures than later events. Migrating historical audit data to match the current schema was painful. Define the audit contract upfront — it's the one thing in an identity system that's hardest to change after the fact.
  • Invest in tenant provisioning automation earlier. For the first year, onboarding a new tenant was a semi-manual process: run a script, verify the database, configure the signing keys, test the federation endpoint. Now it's fully automated and takes 90 seconds. That automation should have been built in month three, not month twelve.
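Defining the audit contract upfront might look like the following sketch: a frozen event type with an explicit schema version, so later changes can be additive and old events stay interpretable. The field names, the version constant, and `new_event` are assumptions for illustration, not Tallawah's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional

AUDIT_SCHEMA_VERSION = 1
OUTCOMES = {"success", "failure"}


@dataclass(frozen=True)
class AuditEvent:
    schema_version: int
    event_id: str
    occurred_at: str   # ISO 8601 timestamp in UTC
    tenant_id: str
    actor_id: str
    action: str        # dotted verb, e.g. "auth.login"
    target: str        # what was acted on, e.g. "session:s1"
    outcome: str       # one of OUTCOMES

    def to_json(self) -> str:
        # Sorted keys give a stable serialization for the append-only stream.
        return json.dumps(asdict(self), sort_keys=True)


def new_event(event_id: str, tenant_id: str, actor_id: str, action: str,
              target: str, outcome: str,
              now: Optional[datetime] = None) -> AuditEvent:
    """Build an event stamped with the current schema version; reject unknown outcomes."""
    if outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return AuditEvent(AUDIT_SCHEMA_VERSION, event_id, timestamp, tenant_id,
                      actor_id, action, target, outcome)
```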

Final Thoughts

Identity is infrastructure. Not in the "it's boring" sense — in the "everything depends on it" sense. Every product we build at Melhousen routes through Tallawah for authentication, authorization, tenant isolation, and audit. When Tallawah is down, everything is down. That constraint forced a level of engineering discipline — reliability, testing, operational monitoring — that has made every system it touches more robust.

I don't recommend building your own CIAM engine. I recommend understanding your identity requirements deeply enough to know whether you need to. For most teams, Auth0 or Entra External ID will get you 90% of the way. If you're in the 10% that needs the other 10%, prepare for a multi-year investment in one of the most consequential systems you'll ever build.

— Jamel A. Housen, Melhousen Solutions