git push --force

Request Mirroring and Shadow Testing with Caddy

Surprisingly there wasn't already a module for this, so I made one!

https://github.com/dotvezz/caddy-mirror

I’ve been using Caddy for a long time, both professionally and for self-hosted hobby stuff (like this website!). It’s a workhorse project that does a lot more than what’s advertised on the website (but the “ultimate server” claim on the site might roll some eyes).

With easy automatic HTTPS, reverse-proxy features, basic load-balancing, and caching functionality, Caddy is seen at or near the edge of a lot of websites. When you consider the flexible routing, the JWT validation module, and at least two different zero-downtime ways to deploy configuration changes, it suddenly starts looking like a quick and lightweight API gateway too.

So basically what I’m saying is, “hey look, reverse proxy does reverse proxy things!” and that’s not especially exciting. But imagine my surprise when I realized there’s a common, useful reverse proxy feature that Caddy doesn’t have. That’d be request mirroring, which is provided by HAProxy, NGINX, Envoy, Varnish, Traefik, and probably a list of other proxies too. It’s even been discussed in Caddy’s Github issues at least a few times (#4211 and #6706 if you’re curious). All that considered, this seemed like a fun way to contribute a feature to one of my favorite open source projects!

Request Mirroring and Shadow Testing

Depending on your experience with these terms, there could be a bit of fuzziness and overlap; this is one of those areas where terms aren’t consistently defined in industry. For the sake of clarity, in this post (and indeed, when considering support and features for the caddy-mirror project) I’ll be using some simplified definitions.

Request Mirroring involves duplicating an incoming request and sending it to an additional handler, such as another webserver or cluster. This could be done for all requests, a fraction of requests, just for requests on a specific route, or some other configuration. It doesn’t imply anything specific is being done with the secondary handler’s response, and in fact most implementations of mirroring throw away the secondary response.

This kind of feature is helpful for throwing a bunch of production-like traffic at a test environment for observation, especially if a synthetic load doesn’t safely cover edge cases or reflect real-world user patterns. I’ve heard some people describe request mirroring as “dark launching.” Not to drag this post into semantics, but I’d suggest that it’s a possible component of a dark launch strategy, not a standalone application of dark launching.

Typically, proxies which implement request mirroring make sure to follow a few common rules:

  • The mirrored request is sent out-of-band; it must never block or delay the response to the client.
  • The client only ever receives the primary backend’s response; the secondary’s response is discarded (or, at most, inspected).
  • Errors or slow responses from the secondary backend shouldn’t affect the client-facing result.

For illustration, we can take a high-level perspective and draw a simple diagram which puts Caddy in the middle of this.

Request Mirroring with Shadow Test

Note that Caddy remains active even after fully sending the response downstream to the client, as it waits for the secondary request to complete. Also keep in mind this is an illustration of the request decoupling and doesn’t dictate the real-world order of operations; in practice, it’s entirely possible for the secondary request to complete before the primary request. But even if that happens, it’s not a race - Caddy should only serve the primary backend’s response downstream.

Shadow Testing involves silently sending a request to a “shadow” handler, and usually comparing its response against the primary to check for correctness. That response inspection bit adds some overhead, but makes the capability useful in a lot of situations where mirroring falls short. The response from the shadow handler is only inspected, not served downstream.

If you’ve rebuilt a service and want to ensure full compatibility with clients, you can set up a shadow test to compare the responses using real traffic. Or maybe you’ve refactored an API endpoint but don’t have a test covering regressions on that operation. I’d love to write up a deep-dive on shadow testing itself some day, but for now I’ll just point you at this little article from Microsoft that goes into a bit more detail than what I’ve covered here.

After reading those definitions, you may already be thinking they sound like a match made in heaven! And I reckon you’d be right: Request Mirroring serves as a great mechanism to power Shadow Testing. And while that’s exactly what I’m setting out to enable, it’s still important to note that my way is not the only way. Mirroring is often done without Shadowing, and Shadowing is often done without Mirroring.

Making it work in Caddy

My goal with the caddy-mirror project is to provide both features, with the shadow feature powered by mirroring. At this early stage, the shadow testing part is very much a work in progress. As a matter of fact, I’d love to get some suggestions or feedback on that part!

But first, let’s look at the part that’s a bit more established. Here’s a simple Caddyfile to demonstrate how we can set up mirroring alone (we’ll add some shadow testing bits in another example file further down the page). Let’s imagine we’re responsible for serving static assets (images, css files, stuff like that) for bigcompany.com. We’re working on some major updates to the architecture, and we want to understand how it behaves under load before we roll it out.

Let’s make a quick Caddyfile that forwards all static asset requests to the normal backend, and also mirrors them to our staging environment.

assets.bigcompany.com {
  mirror {
      primary {
          reverse_proxy prod.assets.internal.bigcompany.com
      }
      secondary {
          reverse_proxy staging.assets.internal.bigcompany.com
      }
  }
}

Performance Metrics

It’s helpful to measure performance for the primary and secondary backends for comparison. After all, maybe the change you want to test on the secondary is a performance optimization. The caddy-mirror project does provide a way to record some basic performance metrics, like time-to-first-byte and full response time, from both the primary and the secondary. You can use these to compare the performance of either backend; however, there are limitations to consider. Typically a TTFB measurement is taken from the client, but here we’re taking it at the proxy. Depending on where your Caddy instance lives, that could eliminate something like 99% of your “real” request time.

That’s why, if performance measurement is important to you (as it probably should be), these metrics should only be a supplemental component of your observability strategy.

With all that out of the way, we can modify the above Caddyfile to enable metrics!

{
    metrics
}

assets.bigcompany.com {
  mirror {
      metrics mirror_all # We added this here!
      primary {
          reverse_proxy prod.assets.internal.bigcompany.com
      }
      secondary {
          reverse_proxy staging.assets.internal.bigcompany.com
      }
  }
}

This sets up a set of Prometheus histograms (subject to change! I would love to hear requests or feedback here!) covering things like time-to-first-byte and full response time for both the primary and secondary backends, under the namespace mirror_all as defined on the metrics line inside the mirror block.
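
If it helps to picture what those histograms look like on the Go side, here’s a rough sketch using the Prometheus client library. To be clear, the metric and label names below are hypothetical placeholders for illustration, not necessarily what caddy-mirror actually registers.

package mirror

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// Hypothetical histograms for illustration only; the real metric names and
// labels in caddy-mirror may differ.
var (
	ttfbSeconds = prometheus.NewHistogramVec(prometheus.HistogramOpts{
		Namespace: "mirror_all", // taken from the metrics line in the Caddyfile
		Name:      "ttfb_seconds",
		Help:      "Time to first byte, per backend.",
	}, []string{"backend"})

	responseSeconds = prometheus.NewHistogramVec(prometheus.HistogramOpts{
		Namespace: "mirror_all",
		Name:      "response_duration_seconds",
		Help:      "Full response time, per backend.",
	}, []string{"backend"})
)

func init() {
	// In a standalone program you'd register against the default registry;
	// a Caddy module registers against the registry Caddy provides.
	prometheus.MustRegister(ttfbSeconds, responseSeconds)
}

// observe records one request's timings for either the "primary" or
// "secondary" backend.
func observe(backend string, ttfb, total time.Duration) {
	ttfbSeconds.WithLabelValues(backend).Observe(ttfb.Seconds())
	responseSeconds.WithLabelValues(backend).Observe(total.Seconds())
}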

Shadow Testing

We’ve finally reached the point where the project in its current state starts to show its immaturity. There are a few areas that I want to address before tagging a proper v1 release.

  1. Performance: As of writing, the project buffers responses to compare, then sends downstream after buffering. This introduces memory overhead and can impact response timing.
    • For an API or something else where response bodies aren’t big (say under ~200kb), this isn’t likely to be a meaningful problem. But for a static asset server slinging video files, you might want to wait for a v1 release.
    • It may be a good idea to add a configurable “maximum comparable size” value as well, just to avoid running response comparisons on big video files etc.
  2. Compressed responses: Right now there’s no support for automatically decompressing responses.
    • That said, I’m not entirely sure that it’s a good feature to bake into this module. Caddy already provides several ways to manage response compression both upstream and downstream. It may be best to just cover those options in my docs.
  3. Shadow Test Reporting
    • Right now, the only real “reporting” feature for the shadow test is logging. That… works but it’s lazy.
    • I’d really like to learn what makes sense for possible users, if there’s any feedback then by all means throw it at me!

But let’s get to the Caddyfile example. If we add compare_body to the mirror directive, it enables 1:1 comparison of the response body.

assets.bigcompany.com {
  mirror {
      compare_body # This is the operative part.
      primary {
          reverse_proxy prod.assets.internal.bigcompany.com
      }
      secondary {
          reverse_proxy staging.assets.internal.bigcompany.com
      }
  }
}

There’s an additional feature available that uses itchyny/gojq. If you’re working on an API and don’t necessarily want to compare the full response body, you can add compare_jq with a list of jq queries. Each of these jq queries will be applied to both the primary and secondary responses, and the results will be compared. If your new API includes additional metadata in the response resource, but the original .data is meant to be backwards-compatible, you might consider using something like this example.

assets.bigcompany.com {
  mirror {
      compare_jq .data 
      primary {
          reverse_proxy prod.assets.internal.bigcompany.com
      }
      secondary {
          reverse_proxy staging.assets.internal.bigcompany.com
      }
  }
}

In both cases, any mismatches will be flagged in Caddy’s logs.
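
For a rough idea of what that comparison involves under the hood, here’s a sketch using gojq directly. This isn’t the module’s exact implementation, just the general shape: parse the query once, run it over each decoded body, and compare the results.

package mirror

import (
	"encoding/json"
	"fmt"
	"reflect"

	"github.com/itchyny/gojq"
)

// jqEqual applies a jq query to two JSON bodies and reports whether the
// extracted results match. A sketch only; error handling is kept minimal.
func jqEqual(query string, primary, secondary []byte) (bool, error) {
	q, err := gojq.Parse(query)
	if err != nil {
		return false, fmt.Errorf("parsing jq query: %w", err)
	}

	extract := func(body []byte) ([]any, error) {
		var decoded any
		if err := json.Unmarshal(body, &decoded); err != nil {
			return nil, err
		}
		var results []any
		iter := q.Run(decoded)
		for {
			v, ok := iter.Next()
			if !ok {
				break
			}
			if err, isErr := v.(error); isErr {
				return nil, err
			}
			results = append(results, v)
		}
		return results, nil
	}

	p, err := extract(primary)
	if err != nil {
		return false, err
	}
	s, err := extract(secondary)
	if err != nil {
		return false, err
	}
	return reflect.DeepEqual(p, s), nil
}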

Technical Challenges

There are a lot of interesting things that you run into when working on a project like this. Between the quirks of Go, Caddy, and HTTP in general, a lot of unintuitive things can happen.

Request Body: Currently, caddy-mirror buffers request bodies to multiplex the request across two handlers. We use a sync.Pool to minimize the allocation overhead, but I want to implement lower-overhead request multiplexing as the project matures.

One of the things I love about Caddy is that it mostly (more on that below…) lets you work with requests as you would in a normal Go project; you get to work with normal http.Request, http.ResponseWriter, and other familiar types instead of needing to learn some custom abstraction.

For a normal HTTP request that has a body attached to it (an API PATCH, a form submission, etc.), that body is essentially streamed once from the client to the server. In Go, request.Body implements io.ReadCloser, which generally abstracts that stream into a read-once value; if you need to read it multiple times, you need to capture it in a reusable way. That means if we want to send the same request to two different handlers, they can’t both read from the same request.Body (without causing big problems, at least). We can copy it to a bytes.Buffer (which is exactly what we do in caddy-mirror) to reuse the body.
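
To make that concrete, here’s a minimal sketch of the buffering idea (not the module’s exact code): drain the body into a pooled buffer once, then hand each handler its own reader over the same bytes.

package mirror

import (
	"bytes"
	"io"
	"net/http"
	"sync"
)

// bufPool reuses buffers across requests to cut down on allocations.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// bufferBody drains r.Body into a pooled buffer and returns two independent
// readers over the same bytes, one for the primary handler and one for the
// mirror. The caller must return buf to the pool once both handlers are done.
func bufferBody(r *http.Request) (primary, secondary io.ReadCloser, buf *bytes.Buffer, err error) {
	buf = bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	if r.Body != nil {
		if _, err = buf.ReadFrom(r.Body); err != nil {
			bufPool.Put(buf)
			return nil, nil, nil, err
		}
		_ = r.Body.Close()
	}
	// Each handler gets its own reader, so neither consumes the other's copy.
	primary = io.NopCloser(bytes.NewReader(buf.Bytes()))
	secondary = io.NopCloser(bytes.NewReader(buf.Bytes()))
	return primary, secondary, buf, nil
}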

I said “generally” up there because in both Caddy and normal Go programs, it’s possible for a higher link in a middleware chain to replace the original request.Body with some other value that implements io.ReadCloser, so technically we might be receiving a value that can be reused. That presents optimization opportunities, but it’s also an edge case to tread carefully around.

Response Body: Just as we buffer request bodies when they’re present, we also buffer response bodies when shadow testing is enabled. sync.Pool is used to reduce overhead, but there are opportunities to reduce it further here too.

Just as request.Body is an io.ReadCloser, http.ResponseWriter is essentially an extension of io.Writer. If our middleware isn’t concerned with the response body, we can normally pass the http.ResponseWriter directly upstream without modification. But if we are concerned with the body, such as for a shadow test, then we need to capture it and still stream it back downstream. This introduces overhead that needs to be safely optimized away.
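
As a rough sketch of what that capture can look like, here’s a simple buffering wrapper along the lines of what I described above: hold the status, headers, and body until the inspection is done, then flush the primary response downstream. The real thing also has to care about optional interfaces like http.Flusher, which I’m ignoring here.

package mirror

import (
	"bytes"
	"net/http"
)

// bufferingWriter captures a handler's response without writing anything
// downstream until flushTo is called.
type bufferingWriter struct {
	header http.Header
	status int
	body   bytes.Buffer
}

func newBufferingWriter() *bufferingWriter {
	return &bufferingWriter{header: make(http.Header), status: http.StatusOK}
}

func (w *bufferingWriter) Header() http.Header         { return w.header }
func (w *bufferingWriter) WriteHeader(status int)      { w.status = status }
func (w *bufferingWriter) Write(p []byte) (int, error) { return w.body.Write(p) }

// flushTo copies the captured status, headers, and body to the real
// ResponseWriter once we've finished inspecting them.
func (w *bufferingWriter) flushTo(dst http.ResponseWriter) error {
	for k, v := range w.header {
		dst.Header()[k] = v
	}
	dst.WriteHeader(w.status)
	_, err := dst.Write(w.body.Bytes())
	return err
}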

Caddy Foot-Guns

As much as I like to recommend Caddy, nothing is perfect. Over the years, I’ve run into a few things that complicate life when offloading almost anything to a goroutine in Caddy MiddlewareHandlers. I’ll drop a short list of foot-guns Caddy hands us, and try to share some examples of what I think are valid features that don’t jibe with how Caddy does some things.

Before I get into it, this isn’t a complaining session about Caddy. If anything, it’s a complaining session about context.Context. If you’ve worked with me for any amount of time, you’ve probably heard me preach that context.Context often introduces debugging challenges and unintuitive tight couplings. Both of the foot-guns here come down to how the request context is handled in Caddy.

Canceled Contexts are the most straightforward example, and the one where Caddy is clearly doing the right thing. The context.Context for every request passed into your MiddlewareHandler’s ServeHTTP method can be canceled. When Caddy is finished sending results down to the client, it executes a deferred cancel. It’s not hard to imagine the leaky connection or leaky goroutine issues that this helps mitigate. But what if we want to keep a goroutine processing a request after we’ve already responded to it? To briefly digress from mirroring as a feature, let’s look at another thing you might want a proxy to do: caching!

Say we’ve received a request GET /images/logo.svg. We have a cached result with a Cache-Control header max-age value of 600, but the age of our cached response is 601 seconds. In a simple setup, that’s just a cache miss, so we need to proxy the request to origin. But if the cached response’s Cache-Control header has a sufficient stale-while-revalidate value, we can serve the stale value immediately and do a refresh in the background. If our cache is compliant with RFC 9211, then we can send back a Cache-Status header saying that we did that: Cache-Status: my-cache; hit; ttl=-1.

But in a Caddy MiddlewareHandler, when we reply to the request while that background refresh is still running, our request context will be canceled early! So a simple clone of the request isn’t safe for this use case; we need to take some extra steps.

package cache

import (
	"context"
	"net/http"
	"time"
)

func BackgroundRefresh(r *http.Request) {
	// We can block existing context cancellation like this.
	noCancelCtx := context.WithoutCancel(r.Context())
	
	// But blocking context cancellation is _sorta_ like playing with fire. We need to ensure that we clean up after
	// ourselves, so we add a timeout and a new deferred cancel.
	backgroundCtx, cancel := context.WithTimeout(noCancelCtx, time.Second*30)
	defer cancel()

	// Let's clone the request and give it the new context to be safe
	backgroundRequest := r.Clone(backgroundCtx)
	
	// . . . do stuff with the new backgroundRequest
}

Back to mirroring now. Remember how we said above that we need to respond to the original request without letting the secondary backend block the response? Essentially the same thing is happening here; instead of a stale-but-servable cache value, if our primary backend returns before the secondary is finished, we need to keep processing the mirrored request in the background. That’s exactly how we handle the mirrored request’s context cancellation for caddy-mirror as well.

Caddy Context Vars are a bit more annoying than the context cancellation thing. Caddy uses context to provide a request-scoped map[string]any under the key caddyhttp.VarsCtxKey. The problem I have with it: It can be accessed by other handlers and middlewares, and by Caddy itself internally. That’s a problem when we’re multiplexing a request across concurrent handlers, because maps aren’t safe for concurrent read/write.

I learned about caddyhttp.VarsCtxKey the hard way on a caching project I worked on in the past: my Caddy instance would hard-crash every few hours with a very frustrating message: fatal error: concurrent map read and map write.

Now, I should be clear here, I absolutely don’t think it’s fair to say, “the Caddy team should change this to a sync.Map.” These vars can be accessed by any link in the chain, upstream or downstream. It’s not meant to have two different handlers throwing values at it, much less concurrently. So when you do have concurrent handlers, you need to decide which one is the canonical handler that gets to write to the map.

For our case with caddy-mirror, we’re treating the primary handler as the canonical handler. The secondary handler still needs to be able to read and write to it - or something just like it. That’s where maps.Clone comes in.

package mirror

import (
	"context"
	"maps"
	"net/http"
	
	"github.com/caddyserver/caddy/v2/modules/caddyhttp"
)

func cloneRequest(r *http.Request) *http.Request {
	ctx := r.Context()
	ctx = context.WithValue(
		ctx,
		caddyhttp.VarsCtxKey,
		maps.Clone( // The vars map isn't concurrency safe, so we'll clone it for the mirrored request
			ctx.Value(caddyhttp.VarsCtxKey).(map[string]any),
		),
	)

	return r.Clone(ctx)
}

When will it be v1?

I’ve touched on three main areas that I want to improve before tagging a v1 release, but let’s dig in a bit deeper to learn more about the process. In order of importance, I want to focus on…

  1. Ergonomics and API/Configuration stability
    • Tests, tests, tests
    • In this v0 period, I’m treating the configuration and behavior as “unstable.” This is the first and biggest opportunity to make the project safe and easy to use. I want to focus on making configuration clear and readable and ensuring that the configured behavior is totally unsurprising to anyone using the project.
    • That means that performance optimizations may come and go during the early stages of the project while I focus more on simplicity and safety.
    • In v1 this all needs to be stable, so care needs to be taken with the design to make it extensible and minimize the risk of breaking changes as improvements are made.
  2. Performance
    • Benchmarks, benchmarks, benchmarks
    • While I’m able to actively develop in this “unstable” pre-v1 world, this is also the biggest opportunity to ensure optimizations can be made without accidentally breaking a real user’s workflow.
    • At the time of writing, there is a lot of low-hanging fruit to pick here. For example, where bodies are currently buffered in full before being transmitted, we could use something like io.Pipe or io.TeeReader to fill the comparison buffers while streaming, improving request timing (see the sketch after this list).
      • But that streaming approach comes with coupling caveats, so I need to be careful when doing those optimizations.
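
To show what I mean, here’s a hedged sketch of the streaming idea rather than a committed design: instead of filling a buffer and only then writing downstream, something like io.TeeReader (or an io.Pipe feeding a second consumer) lets the body stream to its destination while a copy lands in the comparison buffer.

package mirror

import (
	"bytes"
	"io"
)

// teeBody streams src (say, an upstream response body) to dst (say, the
// client) while capturing a copy for later comparison. The downstream write
// and the capture share a single read loop, which is exactly where the
// coupling caveats come from: a slow dst now slows the capture too.
func teeBody(dst io.Writer, src io.Reader) (*bytes.Buffer, error) {
	captured := new(bytes.Buffer)
	_, err := io.Copy(dst, io.TeeReader(src, captured))
	return captured, err
}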

Feedback Welcome!

This is where I sign off for now, but I’d love to hear from anyone interested in giving feedback or contributing to the project! Feel free to reach me at ben@gitpush--force.com or open an issue on the GitHub repo!
