Upcoming LunaMetrics Seminars
Los Angeles - Anaheim, Sep 8-12 Washington DC, Sep 22-26 Boston, Oct 6-10 Chicago, Oct 20-24

How to Bulk Redirect Complicated URLs

This is The 404 Situation. 

mike_the_404_situation

No wait, this is. You have thousands of product images stored across many different directories on your website. In an effort to keep things more organized you decide to consolidate those images to one directory on your website, leaving you with the task of creating a best match redirect map for the old images to their new directory. The only similarity between the new and old URLs is the product name. Below I’ll show you how to use regular expression and Excel to make a best match redirect map for complex URLs, leaving you more time to GTL.

You can follow along with this example spreadsheet.

Step 1: Compile a list of all 404 URLs and all possible live (202) URLs. Moving forward I will assume you have already done this. Below is a snapshot of my example’s URLs. The left column contains the live URLs, and the right column contains the 404 URLs.

new-and-old-urls

Step2: Deconstruct the URLs using regular expressions so that each part of the URL is separated by tabs (instead of backslashes).

The expression looks like this:

Find: ^http://www.example.com/(.+)/(.+)/(.+)\.(.+)$
Replace with: http://www.example.com\t$1\t$2\t$3\t$4

The regular expression above captures the areas between each backslash and then places them in-between tab strokes instead. Doing this creates a tab-delimited file which will allow you to do a quick copy/paste into excel. Excel recognizes the tab spaces when you do this, and places each section of the URL into its own column.

This is what each URL should look like before and after:

Before: http://www.example.com/images/extraLarge/example-product-1.jpg
After: http://www.example.com     images     extraLarge     example-product-1     jpg

If you aren’t familiar with regular expressions “.” looks for any single character (i.e. a-z, 0-9, and spaces), and the “+” tells the text editor to look for 1 or more of any character. When these two are placed inside parenthesis it groups them.

^http://www.example.com/ – The carrot denotes the beginning of the strings of characters you want to match, and the rest matches an exact string of characters.

1st (.+) – matches everything in-between the backslash before and after it. The second (.+) operates the same way.

3rd (.+) – matches the product image. At this point we could match the entire image name – examples-product-1.jpg. But because our new url structure is more complicated than the original, we need to break out the file extension from the file name.

\. – escapes the period making it a normal character, not a regular expression character. The escaped period now acts like the backlash and allows us to replace it with a tab break.

4th (.+) – Captures the file extension (jpg, png, gif, etc.).

$ - Tells the text editor that it has reached the end of the string you’re looking for.

Step 3: Use VLOOKUP and CONCATENATE to find and match URLs.

Once you have created the tab-delimited files for your list of 404 and 202 URLs, it’s time to plug them into Excel and match them. In this example the 202 URLs take up 5 columns – A through E – so in column F we will recompile the URL and use it as a helper column during out VLOOKUP.

concatenate-202URLs

You will need to do the same with the 404 URLs.

concatenate-404URLs

As you can see the concatenate function is referencing cells that contain portions of the URL and is inserting backslashes (and a period) back into the URLs’ structure.

Once you have that it’s time to make the matches with VLOOKUP. Using the two columns that contain unique product information – column C and column J – use VLOOKUP to find each 404 product (column J) within an array of all new products (column C), and have it return our concatenated helper column – column F. The function will look like this:

=VLOOKUP(J2,$C$2:$F$6857,4,0)

VLOOKUP-404s

The VLOOKUP function is working like so:

J2 – is the value you want to look up. Within the function its labeled ‘lookup_value’.
$C$2:$F$6857 – defines the array within which your lookup value exists – column C – and includes the value you would like to return – column F. Within the function this is labeled ‘table_array’. VLOOKUP always looks through the first column in this array to match your lookup_value.
4 – denotes the column within the array which you would like to return. Because we would like to return the complete URL related to the match, we return the concatenated column, column F. Within the function this is called ‘col_index_num’.
0  - zero (or FALSE) tells VLOOKUP we want an exact match. Within the function it’s called ‘range_lookup’

Copy and paste that VLOOKUP function to the bottom of your 404 list, copy columns L (404 URL) and column M – the matching 202 URL – then paste them as data into a new spreadsheet. You now have a 404 redirect map for thousands of complex URLs.

 

Sean McQuaide

About Sean McQuaide

Sean McQuaide is a Search Analyst in the SEO department of LunaMetrics. His passion for business and all things digital fuels his need to ask the simplest of questions, why. Along with his education in marketing and economics he brings to LunaMetrics a background in web design, SEO and sales. He is a movie addict and avid drinker of craft beer, but would take on a mountain bike trail over those two any day.

http://www.lunametrics.com/blog/2013/12/16/bulk-redirect-complicated-urls/

One Response to “How to Bulk Redirect Complicated URLs”

Thanks for sharing, Sean. I love using Excel to build both html content merges or deconstruct CSV exports from one site to reconstruct for another. We regularly import CSVs to WordPress Woo Commerce, and of course quite a few of our clients are not already using this platform. Instead of loading product after product one at a time, it’s way better use of time to pull apart any given export file and reassemble it in a format that works for Woo. This often means writing formulas that extract or re-unite elements, or remove regexes, illegal characters (especially for importing from a site the allowed a char set for export that breaks the html). Importing images becomes a breeze when you can rebuild an import list and get Woo to grab them from current live URLs.
Of course, there are so many combinations of export-import that sharing in the hope they might help others is a bit futile, but for readers, you can make fairly simple things like this:

=IF(BD3=0,” “,(CONCATENATE(BP3,BG3,BQ3,BD3,BR3)))

Now if readers are not familiar with Excel, it might look not simple at all, but basically this formula says if cell BD3=0, then place a space in the resulting cell, but if it’s not 0, then added all the content from these other cells into a long string of characters. This formula was used to extract a JPG file name, determine if there were more images for the product, then build known html code around the jpg file name to map the URL href and anchor texts etc.

Learning this sort of stuff in Excel can save you many, many hours of manual and excruciatingly & mind-numbingly tedious work.

Leave a Reply