August 31, 2011

Search & Promote: the implementation, part 1

“I can’t find anything!”

This is the most common response we came across during the scoping and implementation of Search and Promote as the new internal search for Murdoch University.

Hardly surprising, given the issues with internal search that I covered in my previous post, but amazingly consistent!

In fact, one of the great truths we found during this project is that people truly don’t care where content is located, or whether it’s authenticated and/or accessible – they just want to type something in the search box, immediately find what they’re looking for, then carry on with their work.

We’ve now completed the implementation across our internal sites, and it’s working really well – so well that we’re now 2-3 weeks away from covering our external sites.

In my last post I promised to run through the implementation. There’s a lot to talk about, however, so today I’ll cover SEO metatags (or the lack thereof), using multiple content sources, and how we integrated Search & Promote with SiteCatalyst to dynamically alter search result ranking.

Given the issues with internal search across campus, and the wide range of staff and students who were more than happy to tell us just how bad it was, we decided to first implement Search & Promote across the internal sites, where our primary audience is current staff and students.

Having implemented SiteCatalyst across our network sites a few years back, we have been able to segment our staff and student traffic, so we knew from the outset just how many searches each segment was doing, and how long on average they were taking.

Looking specifically at staff, approximately 2,400 people collectively performed 234,131 searches in 2010, spending an average of 202 seconds per search. Wow!

That equates to 13,137 hours, which, at an average of $40/hr, comes out to a $524,498 productivity cost. This figure alone should catch the attention of your key stakeholders and finance people.
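For anyone wanting to run the same numbers for their own organisation, here is a minimal sketch of the arithmetic in Python. The function names are mine, and the inputs are the rounded figures quoted above. With those rounded inputs the cost lands at roughly $525,000, slightly off the $524,498 quoted, presumably because the original calculation used unrounded averages.

```python
def search_hours(searches: int, avg_seconds: float) -> float:
    """Total hours spent searching, given a search count and average seconds per search."""
    return searches * avg_seconds / 3600


def productivity_cost(hours: float, hourly_rate: float) -> float:
    """Dollar cost of the time spent searching at an average hourly rate."""
    return hours * hourly_rate


# Rounded figures quoted in this post
hours = search_hours(searches=234_131, avg_seconds=202)
cost = productivity_cost(hours, hourly_rate=40)

print(f"{hours:,.0f} hours")  # 13,137 hours
print(f"${cost:,.0f}")        # $525,494 with these rounded inputs
```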

Armed with that knowledge, we set the following key objective for the Search & Promote trial across our internally facing sites:

  • Reduce the time staff spend searching by 10% by delivering a single set of filterable results, transparent of source, influenced by recent traffic.

Now that we had a clear objective, we could begin planning and implementation. We were greatly aided by a project team at Search & Promote – thanks John, Wally and Richard; you were all very helpful, and it was great working with each of you.

The first step was to set up the organic crawl of our internal sites, which largely consisted of listing the appropriate entry points:

Screenshot of URL entry points in Search and Promote

And their corresponding URL masks (note the test feature that allows you to try your masks before saving them):

Screenshot of URL masks in Search & Promote
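To illustrate the general idea of include/exclude masks (this is not Search & Promote's own mask syntax or matching logic), here is a rough Python sketch. The hostnames, the prefix matching, and the "first matching mask wins, otherwise skip" behaviour are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical masks as (action, URL prefix) pairs. These are made-up
# example.edu addresses, not our real entry points or masks.
MASKS: List[Tuple[str, str]] = [
    ("exclude", "http://intranet.example.edu/archive/"),
    ("include", "http://intranet.example.edu/"),
    ("include", "http://library.example.edu/"),
]


def is_crawlable(url: str, masks: List[Tuple[str, str]] = MASKS) -> bool:
    """Return True if the first mask matching the URL is an include.

    Assumption: masks are checked in order, and a URL that matches
    no mask at all is not crawled.
    """
    for action, prefix in masks:
        if url.startswith(prefix):
            return action == "include"
    return False


print(is_crawlable("http://intranet.example.edu/hr/leave.html"))
print(is_crawlable("http://intranet.example.edu/archive/2003/old.html"))
```

Putting the narrower exclude ahead of the broader include is what keeps the archive pages out of the page count in this sketch.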

Search & Promote is licensed by the number of pages crawled: your license allows a certain number of pages, and pages beyond that are not added to your index. It took a bit of tweaking to figure out what that level was for us. Helpfully, there’s a cool feature in Search & Promote that lets the crawl continue and count the number of pages you’ve gone over by, so you at least have an idea of where you are. From there you can either increase your licensed limit, or identify the larger-than-expected sites and pare down the number of pages found by using the error logs and URL masks.

Compensating for the lack of SEO content

One of the issues I’d talked about previously was a lack of the bare minimum SEO metadata across many sites, most of which we had no direct control over. We tackled this by using the metatag injection feature in Search & Promote, which can be configured to dynamically inject metadata during a crawl, based on a URL pattern. This metadata is then included in the index as if it were already embedded within each page, and can range from standard title/description metatags to custom tags that can be used to create search filters (facets).
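Conceptually, pattern-based injection works something like the Python sketch below. This is an illustration of the idea only, not the product's configuration format: the URL patterns, tag names, and the choice to let a page's own tags win over injected ones are all assumptions.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical rules: a URL regex paired with the metadata to inject.
INJECTION_RULES: List[Tuple[re.Pattern, Dict[str, str]]] = [
    (re.compile(r"^https?://library\.example\.edu/"),
     {"site": "Library", "audience": "staff,students"}),
    (re.compile(r"^https?://hr\.example\.edu/"),
     {"site": "Human Resources", "audience": "staff"}),
]


def inject_metadata(url: str, page_meta: Dict[str, str]) -> Dict[str, str]:
    """Combine injected metadata with whatever the page already carries."""
    merged: Dict[str, str] = {}
    for pattern, injected in INJECTION_RULES:
        if pattern.search(url):
            merged.update(injected)
    # Assumption: tags already embedded in the page override injected ones.
    merged.update(page_meta)
    return merged


print(inject_metadata("http://hr.example.edu/leave", {"title": "Leave"}))
```

A custom tag such as the hypothetical `site` above is the kind of value that could then back a search filter.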

We soon found, however, that a significant portion of internal content required authentication to access, which meant that the crawler could not get into that content. The Search & Promote crawler can be given credentials to access that content, but our concern was that content was authenticated for a reason, and showing even a title or extract from authenticated content on a public search might give away too much.

Given that the “we can’t find anything!” comment included authenticated content and applications, we needed an alternate option for this implementation to be successful.

At Murdoch we have a database called the A-Z index, which is maintained by our IT area, and over the past 5-6 years has grown to include an entry for most of our authenticated content and applications. This was a perfect source of information; now we needed to somehow incorporate this content into our search results.

Enter a feature in Search & Promote called ‘index connectors’.

Incorporating multiple sources of content

The index connector feature within Search & Promote allows you to define a third-party XML feed, XML file, or comma/tab-delimited file as an alternate source of content to be crawled.

The IT team at Murdoch were able to provide us with an XML feed out of the A-Z index, which allowed the Search & Promote crawler to include each entry/link within the feed in its scheduled crawls, together with custom mappings from each tag within the entries to predefined custom metatags:

Screenshot of the raw A-Z XML feed

Not only were we able to crawl the feed and include all the authenticated content as separate entries (‘restricted’ in the above screenshots), but we were able to alter the look and feel of the specific A-Z results within the wider search results, and account for a lack of description within the feed.
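To give a feel for the tag-to-metatag mapping, here is a small Python sketch. The feed structure, element names, and metatag names are invented for illustration (the real A-Z feed and our connector mappings are not reproduced here), and falling back to the title when an entry has no description is just one assumed way of handling the gap.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical feed; every element name and URL here is made up.
SAMPLE_FEED = """\
<entries>
  <entry>
    <name>Staff Leave System</name>
    <link>https://apps.example.edu/leave</link>
    <access>restricted</access>
  </entry>
  <entry>
    <name>Campus Map</name>
    <link>https://www.example.edu/map</link>
    <access>public</access>
    <summary>Interactive map of the campus.</summary>
  </entry>
</entries>
"""

# Feed tag -> custom metatag name (both sides are assumptions).
TAG_MAP = {
    "name": "title",
    "link": "url",
    "access": "az-access",
    "summary": "description",
}


def feed_to_records(xml_text: str) -> List[Dict[str, str]]:
    """Turn each <entry> into a dict keyed by the mapped metatag names."""
    records: List[Dict[str, str]] = []
    for entry in ET.fromstring(xml_text).iter("entry"):
        record: Dict[str, str] = {}
        for child in entry:
            if child.tag in TAG_MAP and child.text:
                record[TAG_MAP[child.tag]] = child.text.strip()
        # Entries without a description fall back to their title.
        record.setdefault("description", record.get("title", ""))
        records.append(record)
    return records


for record in feed_to_records(SAMPLE_FEED):
    print(record)
```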

The side-effect that we hadn’t counted on, but which worked to our benefit, was that the A-Z index had entries for related non-Murdoch sites that were still of value to staff and students.

By having entries for the non-Murdoch sites in the A-Z as wayfinders, we didn’t need to crawl the actual sites themselves. This resulted in a significant reduction in the number of sites/pages we needed to organically crawl, while still providing our audience with a complete set of search results.

Using this same index connector functionality we were also able to incorporate the university’s campus directory listings via a new XML feed. Whereas with the A-Z feed we only wanted to incorporate the results within the wider results set, we wanted results from the campus directory to always be the first results and to be displayed in a table format. More on the styling and positioning of these multiple content sources later.

Allowing for cyclical requests to ensure the most relevant results appear

In my previous post on Search & Promote, I mentioned that one of the key advantages the product had over its competitors was the ability to natively integrate with SiteCatalyst.

Via SiteCatalyst we already knew that our internal search terms follow highly cyclical patterns as our student (and staff) needs change over the semester. For example, the term ‘timetable’ is searched for throughout the semester, but the anticipated result changes as the semester progresses. At the beginning of semester, people are looking for their semester timetable, and towards the end, their exam timetable.

In the past we’ve used custom coded mechanisms to help staff and students find what they’re looking for, but with Search & Promote we can take that to a whole new level!

Search & Promote allows you to define a data source within SiteCatalyst, in our case Global Production > Page Views, and then add ranking weight based on those values – the higher the weight, the higher the impact the SiteCatalyst data will have over your search results.

We defined s.prop41 under our Global Production suite in SiteCatalyst as SearchPromoteURL, and then used it to cross-reference the Search & Promote crawled URLs with the associated Page Views data in SiteCatalyst:

Using page view data from SiteCatalyst to influence ranking

Now, every day, the last seven days’ worth of aggregated SiteCatalyst page view data is automatically downloaded and fed into the Search & Promote custom-defined field SearchPromoteURL, which in turn is used in a ranking rule that increases the relevance of pages that were highly trafficked in the last seven days:

Aggregated page view data in Search & Promote

A good example of this in action is the sample and past exam papers on our Library website, where there is a separate page per letter. With the SearchPromoteURL ranking rule disabled, the pages are literally ranked A through to Z, as the other active ranking rules see them as equally relevant. However, when the SearchPromoteURL ranking rule is in place, the top ranked exam page is Exams B, followed by P and I.
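The effect described above can be sketched in a few lines of Python. This is not Search & Promote's actual scoring formula, which isn't covered here: the additive score, the normalisation by the busiest page, the alphabetical tie-break, and the page view counts are all made up to reproduce the B, P, I ordering.

```python
from typing import Dict, List


def rank(pages: List[str],
         relevance: Dict[str, float],
         page_views: Dict[str, int],
         weight: float = 1.0) -> List[str]:
    """Order pages by relevance plus a weighted, normalised page view boost.

    Assumption: ties are broken alphabetically, which mimics the
    A-to-Z ordering seen when the page view rule is switched off.
    """
    max_views = max(page_views.values(), default=0) or 1

    def score(page: str) -> float:
        return relevance[page] + weight * page_views.get(page, 0) / max_views

    return sorted(pages, key=lambda p: (-score(p), p))


pages = ["Exams A", "Exams B", "Exams I", "Exams P"]
relevance = {page: 1.0 for page in pages}  # other rules see them as equal
# Hypothetical seven-day page view totals
views = {"Exams A": 100, "Exams B": 900, "Exams I": 500, "Exams P": 700}

print(rank(pages, relevance, views, weight=0.0))  # rule disabled: A, B, I, P
print(rank(pages, relevance, views, weight=1.0))  # rule enabled: B, P, I, A
```

The higher the weight, the more the traffic data outweighs the other relevance signals.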

In the admin data report for “exams” below you can see how the ranking, relevance and score metrics are all the same for the exam paper pages, and that the differentiating ranking factor is delivered by the page views:

Admin view of results for 'exams' and the different ranking scores that order them

The same ranking results can be seen on the front-end:

Corresponding public search results for 'exams'

This is exactly what we set out to achieve, and so far it looks to have worked pretty well!

In part 2 of this post, I’ll cover how we combined all our sources of search results into a single set of user-centric, filterable search results, as well as how we fared against our original objective of reducing the time our staff spent searching by 10%.
