Redirecting URLs After Major Content Restructure In Sitecore

About a year or so ago I was working on a project which involved an upgrade from Sitecore 6.4 to 7.0 (the latest version at the time). The upgrade was fairly incidental; the site was (kind of) “working fine” in production. The main reason for touching the codebase was to fix issues caused by a poor initial implementation, namely:

  • Lack of Page Editor support
  • Poor coding not following Sitecore Best Practices
  • Sitecore section in Web.config modified directly (which made upgrading more difficult)
  • Poorly structured Content Tree
  • Difficulty applying a good security model for multi-site, multi-editor environment

…amongst others. It wasn’t all bad, just some common mistakes made by developers struggling with Sitecore during a first implementation.

One issue was the poorly structured content tree. It seems the original developers struggled to work out how to structure the content whilst also using that same structure to drive the navigation menu on the site. They had the usual “Include In Navigation Menu” checkboxes, but the navigation structure did not completely match the content structure.

Tree Structure

Since the structure of the items in the Content Tree directly affects the item URL, we ended up with ugly, unfriendly URLs. Here are some examples, each followed by what it should ideally look like:

/en/utilitypages/productdetail.aspx?productID={1760E4F2-C2CD-4109-8E80-2BAB340830AC}
→ /en/products/category/sample-product-name

/en/main%20navigation/about%20us/about%20us/who%20we%20are.aspx
→ /en/about-us/who-we-are

/en/utilitypages/where%20to%20buy.aspx
→ /en/where-to-buy

Redirecting Content

So we made a whole bunch of code changes, updated all the Item Names (replacing spaces with dashes), restructured all the content into something more meaningful, and all was well. We were almost ready to go live, except now we had to redirect all the old URLs to the new structure; we didn’t want all that search engine goodness to be lost. My initial thought was to use the IIS URL Rewrite Module, automating the generation of the rules with some custom code. Except when we finally looked at the rules already set up in the Production config file, we were faced with over 10,000 lines of them. Bugger 😦

We were rushing to meet marketing deadlines and tight on time anyway so really did not have time to investigate which rules were used, redundant or needed updating. So I came up with a different solution to deal with redirects after a major restructure of Sitecore content by attaching a secondary web database.

Steps to create the database:

  1. Set up a Sitecore instance with the old databases after they have been upgraded to the Sitecore version you will be using (you remembered to take backups along the way right?)
  2. Publish all the content, layouts and templates to the web database. There is no need to publish the Media Library, since we were trying to keep the database lightweight. That is also why we published rather than using a copy of the master database, which would have included all item versions, system settings etc.
    [You may also be able to achieve the same by creating a content package directly from the web database and then installing it into a clean Sitecore 7.0 install. I recall that should work and the difference in versions should not matter… but please test this, I can’t remember…]
  3. Rename the “web” database to “web_legacy” (or whatever you want; the SQL database name only has to match what you put in the connection string)
  4. Attach the new database to your Sitecore solution and add in the necessary config

Required config changes:

  • Add a connection string entry for "legacy_web" in connectionStrings.config
  • Add an entry to the sitecore/databases section of config. You can make a copy of the “web” entry and rename the values to “legacy_web”
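
As a rough sketch of those two changes: the server name, credentials and SQL database name below are placeholders, and the child elements of the database definition are elided because they should be copied unchanged from your own “web” entry. As far as I recall, the stock definition uses $(id) for both the database name and the connection string lookup, so the id is the only value that needs to change, but check that against your own config.

```xml
<!-- App_Config/ConnectionStrings.config: placeholder server, credentials and catalog -->
<add name="legacy_web"
     connectionString="user id=sitecore_user;password=CHANGE_ME;Data Source=SQLSERVER;Database=web_legacy" />

<!-- configuration/sitecore/databases: a copy of the "web" definition with only the id changed -->
<database id="legacy_web" singleInstance="true" type="Sitecore.Data.Database, Sitecore.Kernel">
  <param desc="name">$(id)</param>
  <!-- ...remaining child elements copied as-is from the "web" definition... -->
</database>
```

The id on the database element and the name on the connection string need to be the same value, and it is that value which gets passed to Factory.GetDatabase in code.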

Then add the following processor to the httpRequestBegin pipeline, after the standard Sitecore.Pipelines.HttpRequest.ItemResolver:

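using System;
using System.Web;
using Sitecore;
using Sitecore.Configuration;
using Sitecore.Data;
using Sitecore.Data.Items;
using Sitecore.Links;
using Sitecore.Pipelines.HttpRequest;

// ConfigSettings and ProductRepository are this project's own helpers (IsInternal() appears
// to be one as well), so add the namespaces for your equivalents too.
public class LegacyItemResolver : HttpRequestProcessor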
{
	public override void Process(HttpRequestArgs args)
	{
		if (Context.Site.IsInternal())
			return;

		string redirectUrl = null;
		Item legacyItem = null;

		// try to resolve the new url by looking up the item in the old database
		if (Context.Item == null)
		{
			// TODO: Add some caching in here after resolving the first time!
			string path = MainUtil.DecodeName(args.Url.ItemPath);
			using (new Sitecore.SecurityModel.SecurityDisabler())
			{
				// try to resolve from the old database
				Database legacyWebDb = Factory.GetDatabase("legacy_web");
				legacyItem = legacyWebDb.SelectSingleItem(path);
			}

			if (legacyItem != null)
			{						
				// if we are requesting a product item then find the product and redirect to the product page
				if (legacyItem.ID == ConfigSettings.ProductDetailPage)
				{
					var querystring = HttpUtility.ParseQueryString(args.Url.QueryString);
					var product = ProductRepository.GetItemByID(querystring["productId"]);
					
					if (product != null)
					{
					    // our custom LinkManager would create an SEO-friendly Product URL
					    // which is displayed using a wildcard node item
					    redirectUrl = LinkManager.GetItemUrl(product);
					}
				}
				else 
				{
					// try to find the item in the restructured database
					var newItem = Context.Database.GetItem(legacyItem.ID);
					if (newItem != null)
						redirectUrl = LinkManager.GetItemUrl(newItem);
				}						
			}
		}

		// redirect to the new url with a 301 Moved Permanently status code
		if (!String.IsNullOrEmpty(redirectUrl))
		{
			args.Context.Response.Clear();
			args.Context.Response.StatusCode = 301;
			args.Context.Response.RedirectLocation = redirectUrl;
			args.Context.Response.End();
		}
	}
}

This code only does anything if the Item has not already been resolved from the content database by the standard Sitecore Item Resolver (or any of our own custom resolvers). In that case it makes one last-ditch effort to locate the Item in the original Sitecore content tree. If the Item is found there, we look up the same Item ID in the current content tree and redirect the user to its new URL.
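
Given that editing the Sitecore section of Web.config directly was one of the things we were trying to get away from, the processor is best registered with a patch include file. Something along these lines should do it; the file name, namespace and assembly name are made up for illustration, so swap in your own.

```xml
<!-- App_Config/Include/LegacyItemResolver.config (hypothetical file name) -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <httpRequestBegin>
        <!-- "MySite.Pipelines" and "MySite" are placeholder namespace and assembly names -->
        <processor type="MySite.Pipelines.LegacyItemResolver, MySite"
                   patch:after="processor[@type='Sitecore.Pipelines.HttpRequest.ItemResolver, Sitecore.Kernel']" />
      </httpRequestBegin>
    </pipelines>
  </sitecore>
</configuration>
```

If you have custom resolvers of your own, make sure the legacy resolver ends up after those as well, otherwise it may redirect requests that one of them would have handled.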

Old is gold!

Why does this work? Basically because we had restructured content by moving it, not by creating new instances of pages, and so the GUIDs did not change 🙂 We didn’t create a new site or copy old content into a new structure and work with it that way, specifically because we did not want the IDs to change. Content was constantly changing in the production environment and we could not arrange a content freeze for the amount of time it would take to make all the code/structure changes, test and then deploy. The idea was to use RAZL to compare production content with the initial backup we took and merge in the content changes before go live. And yes, we were working with a snapshot of production in the dev environment – possibly not the best idea but I couldn’t think of a better one 🙂

We were working in a multisite setup so redirects needed to work for both sites. Since Sitecore is nice enough to resolve args.Url.ItemPath to the full item location (i.e. /sitecore/content/site/home/location/to/page) there is no need for us to mess around appending site context start paths.

I would still recommend using the URL Rewrite module over this code if possible, since it will perform much better: the rewrite rules are evaluated by IIS without any Sitecore database lookups.
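
If you do end up running the resolver for a while, the caching TODO in the code is the first thing I would tackle, so that each legacy URL only costs a database lookup once. Here is a minimal sketch of what that could look like. LegacyRedirectCache is a made-up helper rather than anything from Sitecore or from our actual project, and it is a plain in-memory dictionary with no expiry or size limit.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper (not part of Sitecore): remembers the new URL for each legacy path.
public static class LegacyRedirectCache
{
    // Keys are compared case-insensitively, since URLs can arrive in any casing.
    private static readonly ConcurrentDictionary<string, string> Cache =
        new ConcurrentDictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    // Returns true and sets newUrl if this legacy path has already been resolved.
    public static bool TryGet(string legacyPath, out string newUrl)
    {
        return Cache.TryGetValue(legacyPath, out newUrl);
    }

    // Stores a successful lookup. Misses are deliberately never stored,
    // so requests for junk URLs cannot grow the cache.
    public static void Set(string legacyPath, string newUrl)
    {
        if (!string.IsNullOrEmpty(newUrl))
            Cache[legacyPath] = newUrl;
    }
}
```

In Process you would call TryGet before touching legacy_web, and Set once redirectUrl has been worked out. For the product detail page the key would need to include the query string, since the target depends on the productID rather than on the path alone.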

There was some logic behind the madness, since we figured these redirects would only need to be kept in place for a few months whilst we waited for the search engines to re-index all our content and related URLs. It also gave us some time to then go ahead and clean up the rewrite rules in the existing config.

Security

Any user with access to the Sitecore desktop would be able to switch to legacy_web and browse content and make changes. This should only affect those users in the Sitecore\Developer role. Your normal content editor should not have access to the database switcher if the correct roles have been set.

[Screenshot: URL Redirect - Switch Database]

This could be useful, but if you want to restrict it then switch over to the legacy database and use the Sitecore Security Editor to restrict access. You will need to Unprotect the sitecore item first to allow you to set the permissions. Give users Read-only access, or even Deny Read access to the Everyone role on the /sitecore node… Nothing to see here, move along 🙂

[Screenshot: URL Redirect - Restrict Permissions]

Other Uses

I’m sure there are plenty of uses for a secondary web database. I’ve heard about comments being collected in a secondary web database, which keeps the master database more lightweight, comments appear on the site immediately (no master=>web publish required) and also means that any subsequent (full) publishes do not delete those comments. There is also a great post from Mark Cassidy about Working With Multiple Content Database to store product information.

3 comments

  1. mursino · April 3, 2015

    This is definitely a clever solution and by maintaining the Item IDs you’re able to map between the different content trees. Will definitely think of this if a similar problem comes my way!

  2. Dr.ian · September 18, 2015

    I am so interested with this!! Thank you for sharing., now gonna bookmarkit!

  3. Ruud van Falier · August 25, 2016

    Very clever!
