Introducing Hyperlight: Virtual machine-based security for functions at scale
The Microsoft Azure Core Upstream team is excited to announce the Hyperlight…
In recent years, many developers have discovered the power of distributed tracing for debugging regressions and performance issues in their backend systems, especially for those of us with complex microservices architectures. But in many organizations, our most complex code may not be server-side code — it’s just as likely to be running client-side in a browser or mobile app.
While it’s still less common to see tracing instrumentation in client-side code, there’s no reason we can’t use it there. The high resolution, depth-first debugging capabilities tracing offers are useful regardless of where our code is running — and tracing in the client can give us a rich view of how users are interacting with our app. This article walks through a practical, incremental approach to using tracing-style instrumentation in client-side JavaScript apps.
To get started, you’ll need:
Other things that may be handy:
As you start instrumenting, it’s useful to have a mental picture of the shape of the data you will send. So what goes into a typical trace? A trace is composed of many spans (also called events). Each span represents one unit of work for a service, application, or piece of code. For example, a single database query or one HTTP request to a third party service might each be represented as a span. Spans can have parent-child relationships to each other, so you may see a trace representing an HTTP request to an app server; that trace may include a child span representing an HTTP request to an internal service, and then that child span might have many child spans of its own representing different database queries it triggered.
Here are some common types of fields you’ll see that help describe those spans:
Different distributed Tracing APIs use different field names, so check the docs of the tool or API you are using to find the right ones. OpenCensus and OpenTracing are great vendor-agnostic starting points if you aren’t sure which tracing tool you expect to use over the long term.
In backend applications, the root span — the span that represents the bounds of the whole trace — often maps to a single user request to your system. In the browser, it’s less clear what the root span should map to. A single request? A page view? A session? The single request option is too narrow for most use cases — it might help you spot a performance bottleneck, but it won’t do very much to help you understand how your users are experiencing your site. The page view and session options both will allow you to see sequences of user interactions instead of single atomic interactions. Choosing one or the other depends somewhat on the constraints of the tracing tool you are using. If navigating long time spans, e.g. minutes or hours, is difficult, a single page load may be better. If you have a single-page app, it may not make much difference which you choose, since users are likely to spend a whole session on a single page load. Some tools, including Honeycomb, allow you to choose which field to use as the trace ID field, so in that case, you may be able to send both and toggle between page-level traces and session-level traces in the UI.
In this case, we’ve decided each trace will represent a single page load. We need a few pieces of code to send our first span (the root span) and thereby start a trace.
First, you’ll want a reusable function to handle the mechanics of sending spans. This is ours:
// Generates random string ID like "2147483648" for each span in a trace. const newSpanID = () => Math.floor(Math.random() * 2 ** 31).toString(); const traceID = newSpanID(); export default class Tracing { sendSpan(spanName, metadata, duration, isRootSpan) { const params = { name: spanName, service_name: metadata.serviceName, duration_ms: duration, "trace.trace_id": traceID, // If no parent ID is passed, just attach to the root span "trace.parent_id": isRootSpan ? null : (metadata.parentID || traceID), "trace.span_id": isRootSpan ? traceID : newSpanID(), // Send the current state of all feature flags with each span ...flags, // Send the rest of our fields ...metadata, }; // Use the Beacon API to send spans to our proxy endpoint, so we don't // have to worry about missing spans if the user navigates while sending. navigator.sendBeacon(`/create_span/`, JSON.stringify(params)); } }
Then, you’ll need to add code to create your span on page load. It not only adds an on-load listener to create and send the event, but also handles capturing some client metadata and performance data about the current page.
// Start a trace every time someone loads a Honeycomb page in the browser, // and capture perf stats about the current page load. // Assumes the presence of a `window.performance.timing` object const pageLoadMetadata = function() { const nt = window.performance.timing; const hasPerfTimeline = !!window.performance.getEntriesByType; totalDurationMS = nt.loadEventEnd - nt.connectStart; const metadata = { // Chrome-only (for now) information on internet connection type (4g, wifi, etc.) // https://developers.google.com/web/updates/2017/10/nic62 connection_type: navigator.connection && navigator.connection.type, connection_type_effective: navigator.connection && navigator.connection.effectiveType, connection_rtt: navigator.connection && navigator.connection.rtt, // Navigation timings, transformed from timestamps into deltas (shortened) timing_unload_ms: nt.unloadEnd - nt.navigationStart, timing_dns_end_ms: nt.domainLookupEnd - nt.navigationStart, timing_ssl_end_ms: nt.connectEnd - nt.navigationStart, timing_response_end_ms: nt.responseEnd - nt.navigationStart, timing_dom_interactive_ms: nt.domInteractive - nt.navigationStart, timing_dom_complete_ms: nt.domComplete - nt.navigationStart, timing_dom_loaded_ms: nt.loadEventEnd - nt.navigationStart, timing_ms_first_paint: nt.msFirstPaint - nt.navigationStart, // Nonstandard IE/Edge-only first pain // Entire page load duration timing_total_duration_ms: totalDurationMS, // Client properties user_agent: window.navigator.userAgent, window_height: window.innerHeight, window_width: window.innerWidth, screen_height: window.screen && window.screen.height, screen_width: window.screen && window.screen.width, }; // PerformancePaintTiming data (Chrome only for now) if (hasPerfTimeline) { let paints = window.performance.getEntriesByType("paint"); // Loop through array of two PerformancePaintTimings and send both _.each(paints, paint => { if (paint.name === "first-paint") { metadata.timing_first_paint_ms = paint.startTime; } else if (paint.name === "first-contentful-paint") { metadata.timing_first_contentful_paint_ms = paint.startTime; } }); } // Redirect Count (inconsistent browser support) metadata.redirect_count = window.performance.navigation && window.performance.navigation.redirectCount; return metadata; }; window.addEventListener("load", () => { _userEvents.pageLoad(pageLoadMetadata(), totalDurationMS); });
The duration field on this span presents an interesting dilemma. Do we want it to represent just the work of initially loading the page, or should it represent the entire time the user spent viewing the page? In this case, I chose the former, because initial page loading is something I’m very interested in quantifying. I will also send a final span when the browser fires its “unload” event to help us capture the time spent viewing the page. However, if your tracing tool doesn’t have good support for displaying asynchronous spans — spans that have a start time or duration that push them outside the bounds of the root span — you may want to instead send this page load span on unload and have its duration represent the entire time spent viewing the page.
This gets us one span, which is both an “event” and a “trace!” It’s not very interesting, though. While it might tell us whether a particular page load was fast or slow, it doesn’t help us understand much about the user’s experience on that page. So let’s add more spans!
What else can we add to this trace view? One option that will help us get a higher-fidelity picture of our user experience is adding a span each time a user triggers an interesting event like a mouse click or a tap. If you already send custom events to a product analytics tool, these are often the exact kinds of events you want to capture. For example, this app sends product analytics events to both Amplitude and Honeycomb, so the easiest thing to do is have our analytics code also call our sendSpan function defined above. In our case, that gets us lots of events:
We’ve included a default duration of 100ms to make sure all of these show on the waterfall view, but in an ideal world we would use the duration to track the time it took the browser to bubble the event, call the event handler, and render any resulting changes.
If you don’t have existing product analytics instrumentation to hook into, you can always start by setting up event listeners for interesting events like clicks. Here’s a simple example of what that code would look like — in a real app you might want to add some logic to filter for only the most interesting click targets.
const t = new Tracing(); // Adds a span to our traces for each user click document.addEventListener("click", e => { const start = window.performance.now(); // Wrap in a setTimeout so the duration includes the running time of the // relevant event handler & related page updates. setTimeout(() => { const duration = window.performance.now() - start; t.sendSpan("click", { clickTargetClass: e.target.className, clickTargetID: e.target.id, }, duration); }, 0); // Use click capture so we can record the start time closer to the user's action }, true);
Some of our most interesting cases to track will be cases where a user interaction — like clicking a button — triggers an http request — like submitting a form via xhr. If possible, it’s also good to track xhr requests, plus any response process and DOM mutation that result, as spans too. In our React app, we can use our requests module and life cycle callbacks like componentDidUpdate to do this, but the exact patterns will vary from app to app.
Even though our team uses a dedicated error monitoring tool (something like Sentry, Bugsnag, Airbrake, etc.) with this app to give us a more convenient team workflow for finding and fixing errors, we still want to see errors in our traces too. This will help us understand if a user’s experience might be unsuccessful or slow because of a preventable JavaScript error. That way if a user loads a page and then immediately leaves, we can guess if they left because something went wrong, or because they didn’t find what they were looking for.
If you do use a dedicated error monitoring tool, consider adding a few custom fields to your error reports: your tracing spanID and a link to your tracing tool that will allow you to query for this particular span. That will let you easily jump from your error monitoring tool back to your tracing tool if you find you want to better understand the context around a particular error.
const t = new Tracing(); // Sends a span each time an error happens in the browser window.onerror = (message, source, lineno, colno, error) => { const start = window.performance.now(); const spanID = newSpanID(); // Send a unique ID to our error monitoring service AND add a reference back // to this span, so we can cross-link easily between tools. recordError(error, { honeycombSpanID: spanID, viewHoneycombTraceURL: `https://example.com/link/to/our/trace/${spanID}` }) t.sendSpan("error", { spanID: spanID, name: error.name, "error.message": message, "error.source": source, "error.lineno": lineno, "error:colno": colno, }, window.performance.now() - start); };
Finally, our React app includes some single-page-app-style behavior, where large portions of the page will change without doing a full browser navigation. These kinds of page changes are great things to track in our traces, because they help us understand how users are navigating around our app.
If you’re using a client-side routing solution that lets you fire a callback every time the user performs a navigation, that’s a great place to send a span. for the rest of us, we’ll need to shim calls to history-changing functions like pushState and replaceState to get them to do our bidding. Here’s an example:
const t = new Tracing(); // Shim the browser's native pushState function const pushState = history.pushState; // Send a span for each pushState call history.pushState = function (stateObj, title, location) { const lastPage = window.location.pathname; // Call through to the old pushState so the location/history actually changes pushState.apply(history, arguments); // Wrap in this in a setTimeout so the duration includes the running time of // related page updates. setTimeout(() => { t.sendSpan("pushState", { lastPage, newPage: location, }, duration); }, 0); };
Between events, navigations, and errors, we have a record of all the major things a user did while navigating our page. Not bad!
This is a great starting point that will give us a high-level overview of users’ interactions with each page of our app. As we find other significant interaction categories, we can add a span for each one.
While sending client-side telemetry like performance metrics, errors, and product analytics to your application server or third-party tools has been a standard practice in the industry for many years, tracing data usually presents a higher-fidelity view of user behavior than was available in the prior generation of tools. Tracing-style tools also often have better support for high cardinality querying, meaning it can be easier to associate a particular set of behaviors with an individual IP addresses or person. This higher-fidelity view into users’ activity brings with it new potential user privacy concerns.
Some suggestions:
Sending distributed tracing-style instrumentation from the browser is still a relatively uncommon practice, and we are still finding new patterns and figuring out the best practices as a community. If you’re interested in seeing more approaches to browser tracing and client-side instrumentation, check out:
If you have questions or suggestions from your own client-side tracing, let us know in the comments! Happy tracing.