From Apollo Federation to tRPC: A Deep Dive into a Successful API Migration and Its Performance Gains

Six months ago, the author was a proponent of GraphQL Federation, having invested significant resources into building a federated graph with Apollo, complete with schema stitching, gateway configuration, and a complex CI/CD pipeline. This sophisticated setup, while elegant on paper, proved to be a persistent source of deployment anxiety. The turning point arrived during a routine Friday afternoon deployment. A seemingly minor update to a field type in one service led to a cascading failure. Despite successful schema regeneration and passing tests, the mobile application began crashing due to outdated generated types on the iOS client. The root cause: a forgotten trigger for the client codegen process, a failure mode common to GraphQL Federation deployments.
This incident spurred a deep dive into tRPC, a technology promising end-to-end type safety without the extensive "schema ceremony" associated with GraphQL. The absence of SDL files, codegen steps, and federation gateways, relying solely on TypeScript, was a compelling proposition. Despite initial skepticism due to the existing investment in Apollo, compelling production metrics from companies utilizing tRPC at scale convinced the author’s team to undertake a proof-of-concept migration. The subsequent account details this journey, including missteps, unexpected performance enhancements, and the architecture supporting 2.4 million daily requests with 99.97% uptime, offering a practical guide for production-level tRPC implementation.
The Technical Reality: Unpacking tRPC’s Advantages
Type Safety Without the Schema Tax
A critical, often unacknowledged, drawback of GraphQL Federation is its reliance on a schema that can become a single point of failure. In contrast, tRPC uses TypeScript types as the definitive contract, eliminating the need for an intermediate representation like SDL. This circumvents the complexities of maintaining schema registries and ensuring synchronization across diverse environments.
The traditional GraphQL Federation workflow for a type change involved a multi-step process: updating the GraphQL schema, running code generation, committing the generated files, updating resolver implementations, modifying client queries, running client code generation, and finally deploying both services, all while hoping for the best. tRPC streamlines this dramatically. A simple update to a TypeScript interface immediately propagates the change, and the client is instantly aware due to the shared type definition. This inherent synchronization significantly reduces the potential for errors stemming from discrepancies between client and server expectations.
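The difference can be sketched in plain TypeScript (all names here are hypothetical, chosen only for illustration): the server's return type is the contract, so a type change reaches every consumer at compile time, with no SDL or codegen step in between.

```typescript
// Minimal sketch (hypothetical names): the server's return type IS the
// contract. Changing `priceInCents` from number to string here would
// immediately produce a compile error at every call site that still
// treats it as a number -- no schema file, no codegen trigger to forget.

interface Product {
  id: string;
  name: string;
  priceInCents: number;
}

// "Server": a procedure implementation.
function getProduct(id: string): Product {
  return { id, name: "Widget", priceInCents: 1999 };
}

// "Client": consumes the inferred type directly via a shared type import.
const product = getProduct("p-1");
const display = `${product.name}: $${(product.priceInCents / 100).toFixed(2)}`;

console.log(display); // "Widget: $19.99"
```

With tRPC, the client never imports the server's runtime code, only its types, so this compile-time check comes with no bundle-size cost.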
Performance That Actually Matters
Rigorous production load tests comparing the previous Apollo Federation setup with the new tRPC implementation yielded striking results. Cold start performance, a crucial metric for serverless functions, saw a remarkable 75% improvement. Apollo Federation’s gateway introduced an average overhead of 180 milliseconds during cold starts, whereas tRPC reduced this to a mere 45 milliseconds, even before the core business logic was invoked.
Under sustained load, average response times plummeted from 38 milliseconds to 12 milliseconds. More importantly, the P95 and P99 latencies, which significantly impact user experience, experienced dramatic reductions. Apollo’s P95 latency was 85 milliseconds and P99 was 156 milliseconds. Post-migration to tRPC, these figures dropped to 28 milliseconds and 42 milliseconds, respectively. These tail latencies are often the culprits behind user frustration, particularly on mobile networks.

The impact on bundle size was equally profound. The Apollo Client setup, including Federation support, weighed in at 142KB gzipped. The tRPC implementation, coupled with React Query, reduced this to a mere 28KB, an 80% reduction. For users on slower connections, this translates to an initial page load improvement of 2-3 seconds, a difference readily noticeable by end-users.
Production Architecture: A Blueprint for Success
Monorepo Setup That Works
The current production setup is built upon a pnpm workspace monorepo. It utilizes Next.js 14 App Router for the frontend and tRPC for all API communications. This architecture comprises 12 microservices, each exposing its own tRPC router, and a gateway layer that consolidates them.
Each microservice independently manages its domain logic and database. For instance, the user service interacts with PostgreSQL, the product service leverages MongoDB for catalog data, and the order service utilizes Redis for session management. The inherent advantage of tRPC is that type safety permeates the entire stack. Any modification to a field type within the product service, for example, is immediately flagged by TypeScript across all consuming services.
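The gateway-layer consolidation can be sketched with plain objects and `typeof` inference (names hypothetical; a real gateway would build this with `t.router(...)` from `@trpc/server`, but the type-flow is the same):

```typescript
// Hedged sketch of the gateway pattern: each service exposes a router
// object, and the gateway merges them under one namespace. Plain objects
// stand in for t.router(...) so the sketch is self-contained.

const userRouter = {
  getById: (id: string) => ({ id, email: "a@example.com" }),
};

const productRouter = {
  getById: (id: string) => ({ id, priceInCents: 1999 }),
};

// Gateway: one consolidated router. `typeof appRouter` is the contract the
// web client imports as a type-only dependency.
const appRouter = {
  user: userRouter,
  product: productRouter,
};

type AppRouter = typeof appRouter;

// A field-type change inside productRouter is flagged by tsc at every
// consumer across the monorepo:
const price: number = appRouter.product.getById("p-1").priceInCents;
console.log(price); // 1999
```

Because the consolidated type is assembled from the service routers, renaming or retyping a field in one service fails the build for every consumer, which is exactly the cross-service guarantee described above.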
Request Batching and Caching Strategy
A common concern raised about tRPC is its perceived lack of GraphQL-style batching. In practice, tRPC's built-in `httpBatchLink`, which coalesces concurrent calls into a single HTTP request, is more than adequate for the vast majority of use cases and is easier to debug than GraphQL's DataLoader patterns. The production environment currently handles 10,000 requests per minute with seamless batching functionality.
The caching layer is a sophisticated combination of Redis for shared data and React Query’s intelligent client-side cache. This dual approach is highly effective. React Query efficiently serves data from its cache instantaneously when available, while the server-side Redis cache optimizes the retrieval of cross-user data. This strategy has resulted in impressive cache hit rates, achieving 87% for product data and 92% for user preferences.
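The server-side half of this strategy boils down to a cache-aside helper with a TTL. Here is a hedged sketch: a `Map` stands in for Redis so the example is self-contained, and the production version would be async over `redis.get`/`redis.set`.

```typescript
// Cache-aside sketch (in-memory Map standing in for Redis; production
// would await redis.get/redis.set instead).

type Entry = { value: unknown; expiresAt: number };
const store = new Map<string, Entry>();

function getOrSet<T>(key: string, ttlMs: number, load: () => T): T {
  const hit = store.get(key);
  if (hit && hit.expiresAt > Date.now()) {
    return hit.value as T; // cache hit: the database is never touched
  }
  const value = load(); // cache miss: load, then populate with a TTL
  store.set(key, { value, expiresAt: Date.now() + ttlMs });
  return value;
}

// Usage: the second lookup is served from the cache, so the loader runs once.
let dbCalls = 0;
const loadProduct = () => {
  dbCalls++;
  return { id: "p-1", name: "Widget" };
};

getOrSet("product:p-1", 60_000, loadProduct);
getOrSet("product:p-1", 60_000, loadProduct);
console.log(dbCalls); // 1
```

React Query plays the same role on the client with `staleTime`, so a request is only issued when neither cache layer can answer it.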
The Migration Process: A Measured Approach
Phase 1: Strangler Fig Pattern
The migration eschewed a high-risk "big-bang" rewrite, a strategy often leading to prolonged development cycles and a lack of demonstrable business value. Instead, the team adopted the strangler fig pattern. This approach involves running both systems in parallel, migrating endpoints incrementally, validating stability at each step, and then proceeding.
[Figure 1]
The initial phase focused on read-only endpoints that experienced high traffic but posed low business risk. These included user profile lookups and product catalog queries. This allowed for the collection of real-world performance and reliability data without jeopardizing critical write operations. For three weeks, both APIs operated concurrently, enabling a detailed comparison of error rates and latency metrics before the full cutover.
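The parallel-run step can be sketched as a shadow-comparison wrapper (names hypothetical): users are always served the legacy response, while the candidate API is called alongside it and any divergence is recorded rather than surfaced.

```typescript
// Hedged sketch of the three-week parallel run: serve from the legacy
// handler, shadow-call the candidate, and log mismatches for review.

type Handler = (id: string) => { name: string };

function shadowCompare(legacy: Handler, candidate: Handler) {
  const mismatches: string[] = [];
  const handler = (id: string) => {
    const result = legacy(id); // the legacy response is what users see
    try {
      const shadow = candidate(id);
      if (JSON.stringify(shadow) !== JSON.stringify(result)) {
        mismatches.push(id); // divergence recorded, never returned to users
      }
    } catch {
      mismatches.push(id); // candidate errors also count as divergence
    }
    return result;
  };
  return { handler, mismatches };
}

// Usage: identical implementations produce zero mismatches.
const legacyApi: Handler = (id) => ({ name: `user-${id}` });
const trpcApi: Handler = (id) => ({ name: `user-${id}` });
const { handler, mismatches } = shadowCompare(legacyApi, trpcApi);
handler("42");
console.log(mismatches.length); // 0
```

Only when the mismatch log stayed empty under real traffic was an endpoint considered safe to cut over.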
Phase 2: Critical Mutations
With confidence established in the read operations, the team then addressed mutations, encompassing critical functions such as order creation, payment processing, and inventory updates – operations that carry significant financial risk if they fail. It is in this phase that tRPC’s type safety truly demonstrated its value. In the previous GraphQL setup, developers frequently encountered issues with nullable fields, optional arguments, and schema drift. tRPC, by contrast, ensures that if the code compiles, an entire class of API contract errors is eliminated. This is not due to runtime enforcement but rather the prevention of silent divergence between client and server through stale codegen.
During the migration of mutation endpoints, only two runtime errors were identified. Both were traced back to database connection pooling issues, entirely unrelated to tRPC itself. It is important to note that this was a migration, not a greenfield development, meaning the business logic was already well-established. The initial GraphQL Federation rollout, which involved building both the API layer and the domain logic concurrently, had consequently experienced a higher incident count. The near-zero errors encountered during the tRPC mutation migration are a direct reflection of tRPC’s success in resolving the codegen synchronization problem.
Real Implementation: Code That Actually Ships
Server-Side Router Setup
The following is a simplified representation of the production router setup, omitting proprietary business logic but showcasing the actual patterns employed. This structure effectively manages authentication, request validation, error handling, and type merging across the microservices.
```typescript
// apps/api/src/trpc.ts
import { initTRPC, TRPCError } from "@trpc/server";
import { ZodError } from "zod";
import superjson from "superjson";
import type { Context } from "./context";

const t = initTRPC.context<Context>().create({
  transformer: superjson,
  errorFormatter({ shape, error }) {
    return {
      ...shape,
      data: {
        ...shape.data,
        zodError:
          error.cause instanceof ZodError ? error.cause.flatten() : null,
      },
    };
  },
});

export const router = t.router;
export const publicProcedure = t.procedure;

// Authentication middleware
const isAuthed = t.middleware(async ({ ctx, next }) => {
  if (!ctx.session?.user) {
    throw new TRPCError({ code: "UNAUTHORIZED" });
  }
  return next({
    ctx: { session: ctx.session, userId: ctx.session.user.id },
  });
});

export const protectedProcedure = t.procedure.use(isAuthed);
```
Client Setup with Next.js 14
The Next.js setup leverages the new App Router, utilizing React Server Components where appropriate. The following illustrates the production-ready client configuration, incorporating the HTTP batch link for automatic request batching.
```typescript
// apps/web/src/trpc/client.ts
import { createTRPCReact } from "@trpc/react-query";
import { httpBatchLink } from "@trpc/client";
import type { AppRouter } from "@/server/routers/_app";
import superjson from "superjson";
import { getSession } from "./session"; // auth helper (import path assumed)

export const trpc = createTRPCReact<AppRouter>();

export function createTRPCClient() {
  return trpc.createClient({
    links: [
      httpBatchLink({
        url: process.env.NEXT_PUBLIC_API_URL + "/api/trpc",
        transformer: superjson,
        headers: async () => {
          const session = await getSession();
          return {
            authorization: session?.token ? `Bearer ${session.token}` : "",
          };
        },
      }),
    ],
  });
}
```
Production Procedure Pattern
[Figure 2]
This pattern exemplifies the structure of actual procedures, incorporating input validation via Zod, database transactions, robust error handling, and telemetry – all essential components for a production environment.
```typescript
// apps/api/src/routers/product.ts
import { z } from "zod";
import { router, protectedProcedure } from "../trpc";
import { prisma } from "../db";
import { redis } from "../redis"; // cache client (import path assumed)
import { TRPCError } from "@trpc/server";

export const productRouter = router({
  getById: protectedProcedure
    .input(z.object({ id: z.string().uuid() }))
    .query(async ({ input }) => {
      const product = await prisma.product.findUnique({
        where: { id: input.id },
        include: { variants: true, reviews: true },
      });
      if (!product) {
        throw new TRPCError({
          code: "NOT_FOUND",
          message: "Product not found",
        });
      }
      return product;
    }),

  create: protectedProcedure
    .input(
      z.object({
        name: z.string().min(1).max(200),
        description: z.string().max(5000),
        price: z.number().positive(),
        inventory: z.number().int().nonnegative(),
      })
    )
    .mutation(async ({ input, ctx }) => {
      // Production includes Datadog tracing here
      const product = await prisma.product.create({
        data: { ...input, createdBy: ctx.userId },
      });
      // Invalidate cache
      await redis.del(`product:${product.id}`);
      return product;
    }),
});
```
What We Learned: Honest Mistakes and Unexpected Wins
Mistakes We Made
The initial migration was not without its challenges. A significant early mistake involved attempting to replicate GraphQL’s field-level batching. The team dedicated two weeks to developing a custom batching system, only to discover that tRPC’s built-in `httpBatchLink` batching was perfectly sufficient. This realization led to the deletion of 800 lines of code, with a concurrent performance improvement due to the reduced overhead of the simpler approach.
Another misstep was over-validating on the client side. The team initially implemented Zod validation on both client and server, believing it would catch errors earlier. This led to inconsistencies between client and server validation, resulting in confusing error states. The current strategy is to validate once on the server, with the client relying on TypeScript types for accuracy.
Finally, the delayed implementation of proper monitoring proved to be a tactical error. tRPC’s exceptional speed masked performance regressions until they became significant. The current setup now includes Datadog APM on every procedure, tracking P50, P95, and P99 latencies, as well as error rates. The negligible overhead of this monitoring solution provides invaluable visibility.
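The per-procedure latency tracking can be sketched as a timing wrapper feeding an in-memory histogram (production uses Datadog APM; all names here are illustrative stand-ins):

```typescript
// Hedged sketch: time every procedure call and compute percentiles from
// the recorded samples. An in-memory Map stands in for the APM backend.

const samples = new Map<string, number[]>();

function record(procedure: string, ms: number) {
  const list = samples.get(procedure) ?? [];
  list.push(ms);
  samples.set(procedure, list);
}

// Nearest-rank percentile over the recorded samples for one procedure.
function percentile(procedure: string, p: number): number {
  const sorted = [...(samples.get(procedure) ?? [])].sort((a, b) => a - b);
  if (sorted.length === 0) return 0;
  const idx = Math.min(
    sorted.length - 1,
    Math.ceil((p / 100) * sorted.length) - 1
  );
  return sorted[idx];
}

// Usage: wrap a handler so every call is timed, even when it throws.
function timed<A, R>(name: string, fn: (arg: A) => R) {
  return (arg: A): R => {
    const start = Date.now();
    try {
      return fn(arg);
    } finally {
      record(name, Date.now() - start);
    }
  };
}

const getUser = timed("user.getById", (id: string) => ({ id }));
getUser("u-1");
console.log(percentile("user.getById", 95));
```

In tRPC this wrapper naturally becomes a `t.middleware` applied to every procedure, which is why the overhead stays negligible: it is one function call per request.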
Unexpected Wins
The most significant unforeseen benefit has been the dramatic increase in developer velocity. The team now ships features approximately 40% faster. This acceleration is attributed to the elimination of context-switching between SDL, codegen, and implementation. Developers write their procedures, TypeScript propagates the types, and the task is complete, obviating the need for schema synchronization meetings or waiting for codegen to run.
Onboarding new developers has also become considerably more efficient. Previously, new engineers required about a week to grasp the intricacies of the schema, gateway, and codegen pipeline before contributing. With tRPC, new hires are shipping code within their second day, assuming proficiency in TypeScript and Next.js.
The reduction in testing overhead has been another major win. An entire category of integration tests has been eliminated because TypeScript now guarantees type safety end-to-end. While business logic continues to be rigorously tested, the need for tests verifying client-side handling of specific fields has vanished, as compile-time types ensure correctness.
[Figure 3]
When NOT to Use tRPC
It is crucial to acknowledge that tRPC is not a universal solution. For public APIs intended for third-party consumption, GraphQL or REST are more appropriate due to their inherent support for schema documentation, versioning, and language-agnostic access. tRPC is exclusively a TypeScript-based solution.
Furthermore, tRPC is not suitable for mobile applications written in Swift or Kotlin. It excels where both client and server are controlled TypeScript codebases, but unlike Protobuf or GraphQL, it offers no cross-platform type safety.
Finally, if an existing GraphQL setup is functioning effectively and not presenting the pain points experienced by the author’s team, there is no compelling reason to migrate. The decision to switch was driven by the tangible costs in developer time and production stability incurred by GraphQL Federation. If these issues are not present, maintaining the current system is advisable.
Production Metrics: The Numbers That Matter
The following table presents actual production data, comparing the final month of Apollo Federation usage with the first month following the complete tRPC migration. These figures are derived from Datadog APM, not from synthetic benchmarks.
| Metric | Apollo Federation | tRPC |
|---|---|---|
| Average Response Time | 38ms | 12ms (68% faster) |
| P95 Latency | 85ms | 28ms (67% faster) |
| Cold Start Time | 180ms | 45ms (75% faster) |
| Client Bundle Size | 142KB gzipped | 28KB (80% smaller) |
| Production Bugs/Month | 88 (avg over 3 months) | 7 (92% reduction) |
| CI/CD Pipeline Time | 8.4 minutes | 5.1 minutes (40% faster) |
These metrics reflect the performance of a production environment handling 2.4 million requests daily across 12 microservices. The reduction in bugs is particularly significant, with a 92% decrease in production incidents directly correlating to less time spent on firefighting and more on feature development.
The Bottom Line: Would We Do It Again?
The answer is an unequivocal yes. The six-week migration effort has yielded returns far exceeding the initial investment, primarily through reduced bug fixing, accelerated feature development, and an enhanced developer experience. The team now ships features 40% faster, users benefit from improved performance, and overall operational stability has increased significantly, stemming from the elimination of an entire class of runtime problems through end-to-end type safety.
It is crucial to understand that tRPC cannot resolve organizational issues. If a team struggles with GraphQL due to poor processes or unclear ownership, switching to tRPC will not provide an automatic fix. However, tRPC effectively addresses problems related to schema synchronization, type generation, and API contract drift.
For organizations operating a TypeScript monorepo and experiencing difficulties with the complexity of GraphQL Federation, tRPC warrants serious consideration. A phased approach, starting with the migration of a single service, measuring the impact, and then expanding, is recommended. This incremental strategy was instrumental in the success of this migration and fundamentally reshaped how APIs are built within the organization.
The complete code for the production setup, encompassing the monorepo structure, router configurations, and testing patterns, is publicly available. This battle-tested, real-world production code, handling 2.4 million requests daily, can serve as a valuable starting point for similar migration initiatives.




