Fresh URL: Clean URLs at Last!

February 13, 2014

Brendan Schwartz

Founder, CTO

We’ve all received links to websites and blog posts that are riddled with UTM codes. Just look at this sorry specimen:

https://wistia.com/blog/2013-wrap-up?utm_medium=social&utm_source=twitter.com&utm_campaign=buffer&utm_content=buffer2ba10

This particular URL is for the 2013 recap video on our blog. The UTM codes here indicate that this visitor clicked through from a tweet that we published using Buffer. There’s some great tracking information contained in that link, but good golly, that’s one ugly URL. It makes me want to cry. Really, it does.
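To see exactly what those codes record, here's a small sketch that pulls the utm_ parameters out of a landing URL using the standard URL and URLSearchParams APIs, which are available in modern browsers and in Node.js. The function name readUtmParams is just for illustration.

```javascript
// Collect every query parameter whose name starts with "utm_".
// URLSearchParams handles the decoding (for example, "+" becomes a space).
function readUtmParams(href) {
  const utm = {};
  for (const [name, value] of new URL(href).searchParams) {
    if (name.startsWith('utm_')) {
      utm[name] = value;
    }
  }
  return utm;
}

const utm = readUtmParams(
  'https://wistia.com/blog/2013-wrap-up?utm_medium=social&utm_source=twitter.com' +
    '&utm_campaign=buffer&utm_content=buffer2ba10'
);
// utm.utm_medium === 'social', utm.utm_source === 'twitter.com',
// utm.utm_campaign === 'buffer', utm.utm_content === 'buffer2ba10'
```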

No one deserves to be this sad — enough’s enough! I’m very happy to introduce Fresh URL, a simple JavaScript library to help rid the world of these gross URLs!

Introducing Fresh URL

Fresh URL makes it easy to keep URLs squeaky clean while still getting the benefits of campaign tracking. Just include this little snippet of code on your website. It’ll automatically strip UTM codes and a few other tracking parameters from your URLs:

<script src="//fast.wistia.net/labs/fresh-url/v1.js" async></script>

Here’s how it works. Fresh URL detects which analytics providers you’re using and sets up watches on them to see when they’re ready. Once Fresh URL confirms that each provider has extracted the information it needs from the URL, it calls history.replaceState to remove the UTM codes from the address bar without reloading the page.
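Here's a rough sketch of the "wait until every provider is ready, then clean up" idea. It assumes each provider can be represented by a hypothetical zero-argument readiness check that returns true once that provider has read the URL. whenAllReady is an illustrative name, not Fresh URL's actual API, and polling is just one simple way to watch for readiness.

```javascript
// Call `callback` once every readiness check returns true.
// Checks immediately, then polls every `pollMs` milliseconds.
function whenAllReady(readyChecks, callback, pollMs = 50) {
  let timer = null;
  let done = false;

  function check() {
    if (done) return;
    // Every provider must report ready before we touch the URL.
    if (readyChecks.every((isReady) => isReady())) {
      done = true;
      if (timer !== null) clearInterval(timer);
      callback();
    }
  }

  check(); // fire right away if everything is already loaded
  if (!done) timer = setInterval(check, pollMs);

  // Return a cancel function so callers can stop polling.
  return function cancel() {
    done = true;
    if (timer !== null) clearInterval(timer);
  };
}

// In a browser, the callback would strip the tracking parameters, e.g.
// (gaIsReady, hubspotIsReady, and cleanedUrl are hypothetical):
// whenAllReady([gaIsReady, hubspotIsReady], () => {
//   history.replaceState(history.state, '', cleanedUrl);
// });
```

A production version would probably also want a timeout, so that a provider that never finishes loading can't keep the tracking codes in the URL forever.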

It’s hosted on the same global CDN infrastructure we use to serve our video embeds, so not only will it load quickly, but we’ll pick up the bandwidth tab for you!

Visit GitHub for the full code and some more technical instructions on how to use it.

The dark side of UTM codes

UTM codes, or Urchin Tracking Module codes, are the industry standard for tracking marketing campaigns. They were originally created by the Urchin web analytics company, which Google purchased in 2005 and turned into Google Analytics. Google did away with the Urchin name, but the utm_ parameter prefix is here to stay.

I’m sure I’m not the only one for whom a messy, UTM-code-ridden URL causes a visceral reaction. Allow me to enumerate the reasons why I’m overcome with sadness when I encounter them:

  1. It’s a reminder I’m being tracked.
  2. Short, simple URLs look nicer and feel better. Would we tolerate parts of the HTML or JavaScript code poking through on a website? No, that makes zero sense.
  3. If I want to share that link, I feel compelled to chop off the extra junk in the query string, because I don’t want to spread gnarly URLs. I mean, what does that say about me? I don’t want to be that guy. I’d wager that having cleaner URLs actually encourages sharing, especially among the more savvy members of your audience.

Granted, I have a somewhat extreme viewpoint on design details like this, but it’s no secret that to create a truly great experience, you have to pay very close attention to the details.

Even if you and your audience are not bothered in the slightest by heinous-looking URLs, there’s one more reason having these codes in the URL is bad for you: it’s muddying up your marketing analytics!

This is the story of a URL

Let’s pick on First Round Capital for a moment. I absolutely love their content and share it all over the place. But like nearly everyone, they’ve got some ugly-looking URLs!

Just the other day, I clicked a link in one of their emails and landed here:

https://firstround.com/article/Rap-Genius-Explains-Why-Worse-is-Better?utm_source=Firstround.com+Library&utm_campaign=8f863e7524-The_Right_Way_to_Grant_Equity_to_Your_Employees&utm_medium=email&utm_term=0_d9bb43e05b-8f863e7524-75883833

Okay, fight back the tears. Let’s dig into this URL. It tells them some interesting information about me:

  • An article about employee equity originally hooked me.
  • I clicked through to this article from an email campaign.

In aggregate, this data can answer some interesting questions: Where does most of their traffic come from? How does their email list drive traffic to their website and content? How valuable is an email address to them? What type of content is best at convincing people to give them their email address?
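As a rough illustration of how those questions get answered, here's a sketch that tallies landing-page visits by utm_medium and utm_source. Visits with no UTM codes go into a "(direct)" bucket, similar to the label Google Analytics uses for traffic it can't attribute. tallyBySource and the sample URLs are made up for the example.

```javascript
// Count visits per "medium / source" pair; untagged visits count as "(direct)".
function tallyBySource(landingUrls) {
  const counts = new Map();
  for (const href of landingUrls) {
    const params = new URL(href).searchParams;
    const medium = params.get('utm_medium');
    const source = params.get('utm_source');
    const key = medium ? `${medium} / ${source || '(not set)'}` : '(direct)';
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

const tally = tallyBySource([
  'https://example.com/article?utm_source=newsletter&utm_medium=email',
  'https://example.com/article?utm_source=newsletter&utm_medium=email',
  'https://example.com/article',
]);
// tally.get('email / newsletter') === 2, tally.get('(direct)') === 1
```

Keep in mind that a shared link which still carries the email's UTM codes lands in the "email" bucket, not "(direct)".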

Here’s the problem, though. If I read that article from First Round Capital and find it fascinating, I will share that link. If I copy and paste it into a chat with my buddy and he clicks through, he’s being tracked as if he were me! He didn’t arrive via an email campaign. He arrived via sharing.

This is not good! If I were First Round Capital, I’d want to see that sharing traffic coming in as direct traffic, so I would at least have an idea of how much an article is being spread this way. Alexis Madrigal wrote an excellent article on this type of traffic. He dubs it “Dark Social.”

There are some clever solutions to dealing with this problem. The fine people at Luna Metrics developed Directmonster.js, which dynamically appends every visitor’s referral information to their URL, so that when they copy and share the link, you get rich tracking information about the visitors they drove to your site. The downside of an approach like this is that your URLs are still chock full of tracking codes.
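To make that trade-off concrete, here's a hypothetical sketch of the general "append referral info to the URL" approach. It is not Directmonster.js's actual code; the function name tagWithReferrer and the shared_ref parameter are invented for illustration.

```javascript
// Hypothetical: record where the visitor came from in the URL itself,
// so the information travels along if they copy and share the link.
function tagWithReferrer(href, referrer) {
  const url = new URL(href);
  // Keep an existing tag: this visitor arrived through someone else's share.
  if (!url.searchParams.has('shared_ref')) {
    const source = referrer ? new URL(referrer).hostname : '(direct)';
    url.searchParams.set('shared_ref', source);
  }
  return url.toString();
}

// In a browser, something like this would rewrite the address bar:
// history.replaceState(history.state, '', tagWithReferrer(location.href, document.referrer));

const tagged = tagWithReferrer('https://example.com/post', 'https://news.ycombinator.com/item?id=1');
// tagged === 'https://example.com/post?shared_ref=news.ycombinator.com'
```

Note the cost: the link people copy now carries a tracking parameter again, which is exactly the kind of URL Fresh URL is meant to avoid.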

There’s always going to be a trade-off between the amount of information you’re collecting about your visitors and the cleanliness of your URLs.

The precursor to Fresh URL

One cold day in December, I reached a tipping point with the UTM codes on our blog. As a quick hack, I changed our Google Analytics tracking code so it would call a function to remove those parameters once it was ready. Here’s that code:

<script type="text/javascript">
  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-535854-7']);
  _gaq.push(['_setDomainName', 'wistia.com']);
  _gaq.push(['_trackPageview']);

  // If the client supports replaceState, strip utm tags out of the query string
  // after GA loads.

  _gaq.push(function() {
    if (!window.history.replaceState) return;

    var cleanSearch = window.location.search
      .replace(/utm_[^&]+&?/g, '')
      .replace(/&$/, '')
      .replace(/^\?$/, '');

    // Keep the #hash so in-page anchors survive the cleanup.
    window.history.replaceState({}, '', window.location.pathname + cleanSearch + window.location.hash);
  });

  (function() {
    var ga = document.createElement('script');
    ga.type = 'text/javascript';
    ga.async = true; // don't block page rendering while ga.js loads
    ga.src =
      ('https:' == document.location.protocol ? 'https://ssl' : 'https://www') +
      '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0];
    s.parentNode.insertBefore(ga, s);
  })();
</script>

If the visitor’s browser supports replaceState, we remove every query parameter that starts with utm_. The two extra replace calls clean up after that first pass: one removes a trailing & if there is one, and the other removes a lone ? if that’s all that’s left of the query string.
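The regex version has a rough edge: it matches utm_ anywhere in the query string, so ?q=foo_utm_bar comes out as ?q=foo_. Here's a slightly more defensive sketch (not Fresh URL's actual code) that only removes parameters whose names start with a given prefix; stripParams and its prefixes argument are hypothetical.

```javascript
// Remove query parameters whose *names* start with any of the given prefixes.
// Takes a location.search-style string ("" or "?a=1&b=2") and returns the
// same style, with no "?" left behind when nothing remains.
function stripParams(search, prefixes = ['utm_']) {
  const kept = search
    .replace(/^\?/, '')
    .split('&')
    .filter((pair) => {
      if (pair === '') return false;
      const name = pair.split('=')[0];
      return !prefixes.some((prefix) => name.startsWith(prefix));
    });
  return kept.length ? '?' + kept.join('&') : '';
}

// In the browser, the result would go to replaceState, keeping the hash:
// history.replaceState(history.state, '',
//   location.pathname + stripParams(location.search) + location.hash);
```

Splitting on & and checking names means there's no trailing & or lone ? to clean up afterwards, and values are never touched.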

The key to all of this is queuing this function to run after Google Analytics picks apart the URL and does its thing. GA makes this easy: we just push our function onto the end of the _gaq array, and GA runs it after it has run the commands queued before it. So in this example, it will set the account, set the domain to wistia.com, track the page view, and then run our URL-cleaning function.
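To make the queue idea concrete, here's a simplified, hypothetical model of a _gaq-style command queue. It is not Google's code, and installQueue and runCommand are made-up names; it only shows why a function pushed after _trackPageview runs after it.

```javascript
// Before the "library" loads, commands are just items in a plain array.
// installQueue replays them in order, then returns an object whose push()
// runs each new command immediately.
function installQueue(queue, runCommand) {
  for (const command of queue) {
    runCommand(command);
  }
  return {
    push(...commands) {
      for (const command of commands) {
        runCommand(command);
      }
    },
  };
}

const log = [];
function runCommand(command) {
  if (typeof command === 'function') {
    command(); // functions are simply called, like the cleanup callback
  } else {
    log.push(command[0]); // arrays are [methodName, ...args]; we just log the name
  }
}

let queue = [];
queue.push(['_setAccount', 'UA-XXXXX-Y']);
queue.push(['_trackPageview']);
queue.push(() => log.push('clean URL')); // queued after the pageview
queue = installQueue(queue, runCommand); // the "library" finishes loading
// log is now ['_setAccount', '_trackPageview', 'clean URL']
```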

Google has great documentation on how all of this works.

The full solution

This worked great, except for one huge flaw. We use a lot of other analytics providers besides Google Analytics, and we need to make sure they’re all ready before changing the URL. Otherwise, they’ll miss out on this tracking information.

I looked through all the libraries we used and narrowed it down to the ones that really need access to the UTM codes: HubSpot, Pardot, Google Analytics, Clicky, and Simplex (a tiny JS library we built to track a visitor’s first touch with our site). They all operate differently, but I was fairly confident I could find a way to detect if each one was ready.
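Here's a hypothetical sketch of that detection step: keep a small registry of providers, each with a presence check and a readiness check, and only wait on the ones actually found on the page. The provider names and globals (analyticsA, analyticsB, loaded, ready) are placeholders, not how HubSpot, Pardot, Google Analytics, Clicky, or Simplex actually expose their state.

```javascript
// Placeholder registry: each real provider would need its own checks.
const providerRegistry = [
  {
    name: 'analyticsA',
    isPresent: (win) => 'analyticsA' in win,
    isReady: (win) => Boolean(win.analyticsA && win.analyticsA.loaded),
  },
  {
    name: 'analyticsB',
    isPresent: (win) => 'analyticsB' in win,
    isReady: (win) => Boolean(win.analyticsB && win.analyticsB.ready),
  },
];

// Return readiness checks only for providers present on this page,
// so a provider that isn't installed can never block the cleanup.
function readinessChecksFor(win, registry = providerRegistry) {
  return registry
    .filter((provider) => provider.isPresent(win))
    .map((provider) => () => provider.isReady(win));
}

// In a browser you would pass `window`; here we use a fake window object.
const checks = readinessChecksFor({ analyticsA: { loaded: true } });
// checks.length === 1, checks[0]() === true
```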

There was one other major snag. Our product has a handy feature that allows you to identify a visitor by putting their email in the query string like this:

https://wistia.com/?wemail=hi@wistia.com

Our video player will detect this parameter in the URL and pass that information to our analytics system. We use this when we send out email newsletters to track what people are watching. Here’s how it works.

Dropping wemail into the query string is an amazingly simple solution for tracking visitors, but it has a problem that’s been plaguing me ever since its creation. People will share links that have their email address in them! Now our system will think everyone who clicks that link is that same person. It’s pretty annoying.

If we could coordinate with the Wistia players on the page and strip the wemail parameter, that would be huge! We’d have even cleaner URLs and no longer be errantly tagging visitors.
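A hypothetical sketch of that hand-off: read the wemail value first, so the player or analytics code can use it, then produce a cleaned URL without it. takeParam is an illustrative name, not part of Wistia's player API.

```javascript
// Read a query parameter, then return the URL with that parameter removed.
// Other parameters and the #hash are preserved.
function takeParam(href, name) {
  const url = new URL(href);
  const value = url.searchParams.get(name); // null if the parameter is absent
  url.searchParams.delete(name);
  // Assigning an empty string to url.search drops the "?" entirely.
  url.search = url.searchParams.toString();
  return { value, href: url.toString() };
}

const result = takeParam('https://wistia.com/?wemail=hi@wistia.com', 'wemail');
// result.value === 'hi@wistia.com', result.href === 'https://wistia.com/'

// In a browser, once the players have read the value:
// history.replaceState(history.state, '', takeParam(location.href, 'wemail').href);
```

One caveat: URLSearchParams decodes + as a space, so an address like hi+news@wistia.com needs its + percent-encoded (%2B) in the link.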

Thankfully, after some late nights and a lot of help from Max, Fresh URL was born. Take a look at the code if you’re curious how it works!


Hopefully I’ve convinced you to help me rid the world of ugly URLs! If that’s not the case, just know that it will be hard for me personally to share links to your content, because I have trouble using my computer when my eyes are filled with tears!!!
