What is in your Website Search?


Original Photo by JohnStover

There are hundreds, if not thousands, of ‘search engines’ out there. Some of these search engines are for finding stuff on the Internet, like Google, Bing and Yahoo. Some search engines are more specialized, like the search box you see on a single website that searches only that website. Search is an incredibly complex topic, with an astounding number of factors that contribute to finding the one important piece of content you are looking for. Frankly, Google spoiled all of us. I expect to find exactly what I’m looking for out of the millions of pages of stuff all over the Internet by simply typing a single word into a single little box. If I don’t find what I want on the first page of results, I might try changing my search a little bit or adding a word or two, but I won’t keep trying for long.

The Internet contains at least 27.5 billion pages, as of Tuesday, 03 August, 2010, according to http://www.worldwidewebsize.com. Not only do I expect to find exactly what I want on the Internet, but if I use the search on your website, I get EXTREMELY frustrated when it doesn’t find exactly what I want when I want. How is this possible? I know what I want is on your website somewhere. Figure out what I want and show it to me! And please do it in under a second if it’s not too much trouble!

In the beginning, search was simple. Search was based on keyword matching. If I typed in a keyword, the ‘search engine’ scanned the content and found instances of that word and showed me hyperlinks with those results. I could search for ‘blog’ and the search would show me any page that had the word ‘blog’ in it. That was perfect! It’s all anyone needed. Then websites started to grow in complexity. Soon, each website had thousands of pages. If I did a simple keyword search, I would get hundreds of results. This wasn’t useful anymore. Search had to get better.
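To make that early approach concrete, here is a minimal sketch of plain keyword matching. The page paths and text are made up for illustration; it simply scans every page for the keyword, ignoring case.

```python
def keyword_search(pages, keyword):
    """Return the URLs of pages whose text contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [url for url, text in pages.items() if needle in text.lower()]


# Hypothetical site content, just for illustration.
pages = {
    "/about": "About our company and our team.",
    "/blog": "Welcome to the blog. New posts every week.",
    "/contact": "Contact us, or read the Blog for news.",
}

matches = keyword_search(pages, "blog")
print(matches)
```

Note that this is substring matching, so a search for ‘blog’ would also hit ‘blogger’. With a few pages that is fine; with thousands of pages, every match comes back with no sense of which one matters most.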

Search introduced major improvements. Boolean search operators were introduced. I could search for “SharePoint AND WordPress”. I could search for “SharePoint NOT WordPress”. I had some control on what I was searching for exactly. I also got search result sorting. I could sort all of the results to see the most recently created pages at the top. After all, if the page was newer then it clearly was more relevant, right?
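Here is a rough sketch of what AND, NOT and sort-by-date could look like. The pages, dates and the `search` function are my own illustrative assumptions, not how any particular product implements it.

```python
from datetime import date

# Hypothetical pages with creation dates.
PAGES = [
    {"url": "/a", "text": "SharePoint and WordPress compared", "created": date(2010, 5, 1)},
    {"url": "/b", "text": "SharePoint search tips", "created": date(2010, 7, 15)},
    {"url": "/c", "text": "WordPress themes", "created": date(2010, 6, 10)},
    {"url": "/d", "text": "Migrating WordPress content into SharePoint", "created": date(2010, 8, 1)},
]


def has_word(page, word):
    """True if the word appears as a whole word in the page text."""
    return word.lower() in page["text"].lower().split()


def search(pages, must=(), must_not=()):
    """AND together the `must` words, exclude the `must_not` words, newest first."""
    hits = [
        p for p in pages
        if all(has_word(p, w) for w in must)
        and not any(has_word(p, w) for w in must_not)
    ]
    hits.sort(key=lambda p: p["created"], reverse=True)
    return [p["url"] for p in hits]


both = search(PAGES, must=["SharePoint", "WordPress"])        # SharePoint AND WordPress
only_sp = search(PAGES, must=["SharePoint"], must_not=["WordPress"])  # SharePoint NOT WordPress
print(both, only_sp)
```

Sorting by creation date is easy to implement, which is probably why it showed up early, but it only approximates what the searcher actually cares about.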

That statement introduces a very important topic: RELEVANCE. Relevance denotes how well the results meet the need of the user searching; see the all-knowing Wikipedia for more details at http://en.wikipedia.org/wiki/Relevance_(information_retrieval). Relevance is determined by the search algorithm. That’s right; a computer programmer wrote a mathematical formula that uses the available information to determine the relevance of the content to your search word. In reality, that algorithm was written by a very large team of programmers, analysts, mathematicians, executives and many others. And the search is getting more complicated and far better every day.

Most modern search engines are made up of two primary components: the INDEX and the QUERY. The index is just like the index at the back of a book. Rather than scanning all of the content in real time, the search engine builds a big index of all of the content. This is much faster than scouring through the content in real time. Furthermore, the index can be optimized for the type(s) of searches being performed. Your individual website search is responsible for searching your website. Facebook search searches Facebook – the profiles, comments, photos, tags, etc. Google and Bing try to search everything – your website, my website, her website, their website. Your website search should search ALL of your content – web pages, HTML, PDF files, Word docs, PowerPoint files, Excel files, images, comments. The index should include ALL of your content.
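The back-of-the-book analogy maps nicely onto what is usually called an inverted index: a lookup from each word to the pages that contain it. This is a minimal sketch with made-up pages; a real index stores far more than this.

```python
from collections import defaultdict


def build_index(pages):
    """Build a word -> set of URLs mapping once, ahead of any query."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def query(index, word):
    """Answer a query with a single lookup instead of rescanning every page."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical content, just for illustration.
pages = {
    "/home": "welcome to our site search",
    "/docs": "search the product documentation",
    "/news": "latest product news",
}

index = build_index(pages)
print(query(index, "search"))
print(query(index, "product"))
```

The expensive work (reading all of the content) happens once when the index is built, so each query is just a dictionary lookup.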

So how is the index built? Usually indexes are built by a Web crawler – some type of automated software that scours all of the links and content on your site. The index uses the concept of a word breaker to identify individual words. In the English language, there are many characters that break words apart. Spaces, hyphens, periods, colons, semicolons, and exclamation points all separate words in English. When you get into multi-lingual content, the story gets even more complicated because other languages don’t even use the same characters. So the crawler goes through all of the content and builds this enormous index for use in queries. The index contains the words, counts, metadata, information about where the words were found, information about the pages, information about the documents, titles, cached portions of pages and much more.
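As a rough illustration of a word breaker for English, the sketch below splits on the separator characters listed above and records where each word was found. The separator set and function names are my own assumptions; real word breakers are language-aware and much more involved.

```python
import re

# English-only assumption: whitespace, hyphens, periods, colons, semicolons,
# exclamation points, question marks and commas all break words apart.
WORD_BREAKERS = re.compile(r"[\s\-.:;!?,]+")


def break_words(text):
    """Split text into lowercase words on the separator characters."""
    return [w.lower() for w in WORD_BREAKERS.split(text) if w]


def index_words(text):
    """Map each word to the positions where it occurs (its count is the list length)."""
    positions = {}
    for pos, word in enumerate(break_words(text)):
        positions.setdefault(word, []).append(pos)
    return positions


tokens = break_words("Full-text search: fast, simple; done!")
entry = index_words("Search engines index words; search engines rank pages.")
print(tokens)
print(entry["search"], len(entry["search"]))
```

Keeping the positions, not just the counts, is what later lets a search engine show where on the page a word was found.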

When a user enters a query, the search engine uses its algorithm to provide the most relevant information possible. What determines relevancy? There are many factors that should determine relevancy…

  • Content Type. What type of content is the word found on? PowerPoint files typically have fewer words. If your keyword is one of the 20 words on a slide, that file is likely more relevant than a Word document or web page that has 2000 words.
  • Location. If your keyword is found on the homepage or a main landing page, it is likely more relevant than if it is found on a page 30 nodes away through some obscure navigation.
  • Popularity and linking.  How popular is the page? How many other pages and documents link to the page? How frequently is the page visited?
  • Analytics.  How frequently is the page visited with similar queries? If 50 other people searched for the same keyword(s) you searched for, which pages did they eventually go to?
  • Words. How many times is the keyword on the page?  How many
  • Metadata. Is your keyword in the metadata or just the main content area? Is your keyword in the page title?
  • Language Detection. Is my browser set to Spanish? Should documents in Spanish show up with a higher ranking in the search results?
  • Variants (Word Stemming). What if I search for the word “Flying”? Should the search engine also search for Fly and Flew and Flown? What if it’s a different language? Should the search engine be aware of other word variations?
  • Human Influence. What about best bets, synonyms and keyword mapping? If someone is on the Association site and searches for the word Meeting, do you want to artificially influence the search results to show ‘Sign up for the Annual Conference’ as the first result?  I bet the conference organizers do!
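A few of the factors above can be sketched in a toy scoring function: keyword density (so a keyword on a 2-word slide outweighs the same keyword in a long document), a title boost, very crude word stemming, and a best-bets override. Everything here, including the weights, the suffix list, the URLs and the `BEST_BETS` table, is an assumption for illustration and not any vendor's actual algorithm.

```python
# Hypothetical editor-chosen best bet: keyword -> URL pinned to the top.
BEST_BETS = {"meeting": "/annual-conference/sign-up"}


def stem(word):
    """Very crude suffix stripping; real stemmers also handle irregular forms like 'flew'."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def score(doc, keyword):
    """Keyword density in the body plus a flat boost when the keyword is in the title."""
    term = stem(keyword)
    body = [stem(w) for w in doc["body"].split()]
    title = [stem(w) for w in doc["title"].split()]
    if not body:
        return 0.0
    density = body.count(term) / len(body)
    title_boost = 1.0 if term in title else 0.0  # arbitrary weight
    return density + title_boost


def rank(docs, keyword):
    """Order matching docs by score, then pin any best bet to the first position."""
    scored = [(score(d, keyword), d["url"]) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    results = [url for s, url in scored if s > 0]
    best = BEST_BETS.get(keyword.lower())
    if best:
        results = [best] + [u for u in results if u != best]
    return results


docs = [
    {"url": "/manual", "title": "Operations manual",
     "body": "the crew manual covers flying and landing and fuel and weather"},
    {"url": "/slides", "title": "Flying basics", "body": "flying tips"},
    {"url": "/contact", "title": "Contact", "body": "call us"},
]

print(rank(docs, "Flying"))
print(rank(docs, "meeting"))
```

A real engine blends many more signals (links, click analytics, language, location in the site), but the shape is the same: compute a number per page, sort by it, and let humans override the result where it matters.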

As you can see, the effectiveness of the search engine depends on the ability to determine relevance and then use that relevance to rank the search results. Modern search engines are available both inherently integrated with and completely independent of your website content management technology. WordPress, for example, has a built-in search that is pretty simple (and thus largely ineffective).  It’s great for finding a keyword, but I would hardly call it a search engine.  Both Microsoft and Google provide real search solutions.  They have solutions for you at every level: your desktop, your enterprise, your website, and the Internet.  We are focusing primarily on your website and to a lesser extent your enterprise. The Google Search Appliance offers a great solution with excellent relevancy that can be customized for your particular website needs. The Google Search Appliance and Google Mini require annual maintenance fees.

Microsoft provides a free solution to search for your website and for the enterprise. That’s right; Microsoft provides enterprise level search capabilities for FREE. Microsoft Search Server 2010 Express provides the search capabilities described in this overview for FREE. While this solution may not be the perfect fit for every website, I think it is at least worth evaluating. You can download the software for free, install it, and configure it in a matter of minutes. If it works for you, implementing it with your website is as simple as replacing the search box.


SharePoint 2010 Licensing Part III: Search, Office Web Applications, and Project Server

In the previous posts, SharePoint 2010 Licensing Part I: Foundation, Server, and Designer and SharePoint 2010 Licensing Part II: Windows Server and SQL Server, I covered SharePoint Foundation 2010, SharePoint Designer 2010, SharePoint Server 2010, Windows Server, and SQL Server.  In this post, I’ll cover additional related products.

Microsoft Search Server Express 2010.  SharePoint Server 2010 comes with incredibly robust search capabilities.  Microsoft Search Server Express 2010 provides most of these search capabilities for free.  Microsoft provides a Search Server Express 2010 vs SharePoint Server 2010 Search comparison that provides a very good high-level overview of the differences.  So why would you use Search Server Express 2010?  Maybe your organization can’t afford SharePoint Server 2010 in this year’s budget.  Maybe you are running SharePoint Foundation 2010 and want an enterprise search and not just the site level search.  Maybe you want a powerful search engine to index your public facing web site, your file shares, Exchange public folders, other SharePoint sites, or even structured content in your database (like CRM/AMS/LOB systems).  Maybe most important of all is that the Microsoft Search Server Express 2010 license is free.

FAST Search Server 2010 for SharePoint.  FAST Search Server 2010 for SharePoint adds even more functionality to the search capabilities of SharePoint Server 2010 Standard search, including support for indexing up to a BILLION content items, sub-second query latency, better search refinements, visual cues for rapid recognition (think thumbnail previews), advanced content processing, intelligent automatic metadata recognition, and much more.  As mentioned in SharePoint 2010 Licensing Part I, FAST Search Server 2010 for SharePoint licensing is included in SharePoint Server 2010 for Internet Sites, Enterprise licensing.  Microsoft provides a SKU to add a FAST Search Server 2010 for SharePoint license to your SharePoint Server 2010 License for your internal (client/server CAL) SharePoint Server 2010 environment.  Quoted from the Microsoft SharePoint Licensing Details page, “SharePoint Server 2010 for Internet Sites, Enterprise, also includes the rights to FAST Search Server for use in Internet or Extranet scenarios. You can deploy a single server license of SharePoint Server 2010 for Internet Sites, Enterprise, as SharePoint server or a FAST Search server—but not both concurrently.”

A wonderful resource regarding FAST is the FAST Search Server 2010 for SharePoint Evaluation Guide.

Microsoft Office Web Applications 2010.  Office Web Apps are the online version of Word, Excel, PowerPoint, and OneNote so that you can access, view, and edit documents from any authorized web browser – PC, Mac, or mobile.  For business use, Office Web Apps require at least SharePoint Foundation 2010 (which is free), but will also run on SharePoint Server 2010 Standard or Enterprise.  Business users are licensed through the Microsoft Office 2010 Volume License and can access the downloads at the Microsoft Volume Licensing Service Center. For personal use, Office Web Apps are free and available via Live along with your SkyDrive.  Get more details from the Microsoft Office 2010 site.

Microsoft Project Server 2010.  When presenting some of the task management capabilities of SharePoint, a question that inevitably comes up is, “Does SharePoint work with Microsoft Project?”  The short answer is, of course, yes.  You can absolutely use SharePoint to manage MPP files, including version history, exclusive check-out, workflow, alerts – just like any document type.  However, if you want to utilize Microsoft Project to manage projects, tasks, durations, work breakdowns, and assignments, then you probably want to expose that detailed project information via the web site and let other project team members view and update the information directly from their browser.  That’s exactly what Project Server 2010 does.  Project Server 2010 is actually built on top of SharePoint 2010, provides a seamless integrated web experience, and allows you to cohesively interact with your entire project team.

Microsoft Project Server 2010 follows the same client/server licensing model as SharePoint Server 2010 for internal users.  A server license is required for each server, and a Microsoft Project Server 2010 Client Access License (CAL) is required for each user that will authenticate and utilize the software.  Keep in mind that Project Server 2010 runs on top of SharePoint Server 2010; therefore, you must have appropriate licensing for SharePoint Server 2010 with both the Standard CAL and Enterprise CAL as well.  Depending upon your configuration within an enterprise environment, the licensing required for each user may include:

  • Project Server CAL (note that Microsoft Project Professional 2010 also includes a Project Server 2010 CAL)
  • SharePoint Standard CAL
  • SharePoint Enterprise CAL
  • SQL Server CAL
  • Windows Server CAL

  Visit Microsoft to get the Project Server 2010 Licensing Guide for full details.

This is Part III in a series on SharePoint 2010 Licensing.  View the entire series:
SharePoint 2010 Licensing Part I: http://stovereffect.com/2010/06/29/sharepoint-2010-licensing-part-i-the-basics/
SharePoint 2010 Licensing Part II: http://stovereffect.com/2010/06/30/sharepoint-2010-licensing-part-ii-windows-server-and-sql-server/

Microsoft releases Search Engine Optimization Toolkit (SEO)

First, I think that this idea is brilliant.  It’s about time that Microsoft released a tool to help admins improve Web Site relevance.  Unfortunately, this SEO Toolkit is for IIS 7 only.  Unfortunately again, the SEO Toolkit is an IIS 7 add-in – which means that it is only useful for folks that actually have IIS Administrator Rights (whereas SEO reaches far beyond admins).  I think that much of the functionality could have been released as a stand-alone desktop app that could spider against ANY website that you are using and offer many of the same suggestions.
However, there are definitely some great things you can do BECAUSE it is an IIS 7 add-in.  One of the most immediate is supporting and managing files directly on the server (such as SiteMap.xml and robots.txt).
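For anyone who hasn't looked inside a robots.txt file, here is a small sketch of how a crawler interprets one. This is not the SEO Toolkit itself; it uses the robots.txt parser that ships in Python's standard library, and the rules and example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real file lives at the root of the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask the same question a well-behaved crawler would ask before fetching a page.
allowed = parser.can_fetch("*", "http://www.example.com/blog/")
blocked = parser.can_fetch("*", "http://www.example.com/admin/users")
print(allowed, blocked)
```

One misplaced Disallow line can hide a whole section of a site from search engines, which is why a tool that edits and validates these files on the server is handy.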

The toolkit is in Beta – but I’m eager to try it out.  Has anyone else used this yet?

You can get the full details and the download from IIS.NET at http://www.iis.net/extensions/SEOToolkit.  Here is a snippet directly from the site.
The IIS Search Engine Optimization (SEO) Toolkit helps Web developers, hosting providers, and Web server administrators to improve their Web site’s relevance in search results by recommending how to make the site content more search engine-friendly. The IIS SEO Toolkit includes the Site Analysis module, the Robots Exclusion module, and the Sitemaps and Site Indexes module, which let you perform detailed analysis and offer recommendations and editing tools for managing your Robots and Sitemaps

Microsoft Search now available for FREE!

Based on the same technology as the SharePoint 2007 Search, you too can have Microsoft Search for FREE!  As part of the newly released Microsoft Search Server 2008 product line, you can have Microsoft Search Server 2008 Express for FREE! 

I would like to point out the differences in the following hierarchy of products from "low-end" to "high-end".

1. Microsoft Search Server 2008 Express (MSS Express). Free (enough said!), but also has support for Search Center, No Pre-Set Document Limits, Extensible Search Experience, Relevance Tuning, Continuous Propagation Indexing, Federated Search Connectors, Indexing Connectors, Security-Trimmed Results, Unified Administration Dashboard, Query and Results Reporting, Streamlined Installation.

2. Microsoft Search Server 2008 (MSS).  This product supports all of the functionality listed in the Express edition, but adds High Availability and Load Balancing capabilities.

3. Microsoft Office SharePoint Server 2007.  In addition to everything that both of the above versions support, SharePoint 2007 of course adds the great SharePoint Productivity Infrastructure, as well as the People and Expertise Search and the Business Data Catalog (BDC).

Get more details, including the download, at http://www.microsoft.com/enterprisesearch/serverproducts/searchserverexpress/default.aspx

SharePoint 2007 Search Analytics Overview

I think that one of the most useful features of the reports included with the new MOSS SharePoint 2007 Analytics and Reports is the Search Analytics.  While there are a host of improved analytical tools with this new version of SharePoint, such as the SharePoint 2007 Site Collection Usage Reports and the SharePoint 2007 Audit Log Reports, I find that one of the most powerful tools is the Search Analytics tool set: SharePoint 2007 Search Queries reports and SharePoint 2007 Search Results reports.  These reports are accessible by clicking Site Actions, Site Settings, Modify All Site Settings.  Then click on Site Collection Usage Reports under the Site Collection Administration area.

The SharePoint 2007 Search Queries Report shows some pretty simple but useful pieces of the search information. Number of Queries is shown in two views: "Queries Over Previous 30 Days" and "Queries Over Previous 12 Months".  "Queries Per Scope Over Previous 30 Days" shows both a pie chart and a detailed breakdown listing the Scope, the number of queries, and the percentage of all queries that this number represents.  "Top Queries Over Previous 30 Days" is perhaps the most useful of these reports.  This report shows the actual keyword that was being searched for, what scope it was searched against, and the number of occurrences.

The SharePoint 2007 Search Results report shows even more useful information: "Search Results Top Destination Pages", "Queries With Zero Results", and "Queries With Low Clickthrough".  Of course, the following two are really only useful if you are making use of the SharePoint 2007 Search Best Bets features: "Most Clicked Best Bets" and "Queries With Zero Best Bets".

Depending upon what statistics you believe (if you believe any statistics), I’ve heard that more than half of all users immediately look for and use the search engine on your site instead of trying to figure out your navigation.  So, if you are using only traditional web analytics (which pages users hit, browsing paths, landing pages, etc.), you are missing more than half of the truly useful analytical information.  Using the search engine analytics component in conjunction with the traditional web analytics numbers allows you to see a more complete picture of your end users’ site usage patterns.  The search analytics provide information on what users were trying to find on your site, what they did find, what they did not find, and in turn what the keywords they are using actually mean to them!  Just because you use a particular term, keyword, or phrase to describe something within your site doesn’t mean your users do; they may be looking for that very information using a different phrase.  Furthermore, you may find that users are continually looking for some type of information (content, product, or other) that it seems logical your site should have, but that you don’t actually offer.  Using these search analytic reports, you can drive content based on what folks are looking for when they visit your site.  This is very powerful information indeed!  You can cater and fine tune your content based upon what people are actually looking for!  Now that is personalized customer service.
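To show how little it takes to get this kind of insight, here is a sketch that derives "top queries" and "queries with zero results" from a search log. The log format (query text plus the number of results returned) is an assumption of mine; SharePoint and Google Analytics each store and report this in their own way.

```python
from collections import Counter

# Hypothetical search log entries: (query text, number of results returned).
LOG = [
    ("membership", 12),
    ("annual conference", 8),
    ("parking", 0),
    ("membership", 12),
    ("Parking", 0),
    ("job board", 0),
]


def top_queries(log, n=3):
    """Most frequent queries, ignoring case."""
    counts = Counter(q.lower() for q, _ in log)
    return counts.most_common(n)


def zero_result_queries(log):
    """Queries that found nothing: content gaps, or vocabulary your site doesn't use."""
    counts = Counter(q.lower() for q, hits in log if hits == 0)
    return counts.most_common()


print(top_queries(LOG, 2))
print(zero_result_queries(LOG))
```

In this made-up log, people keep searching for ‘parking’ and finding nothing, which is exactly the kind of signal that tells you what content to write next or which synonym to add.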

I do find the out-of-the-box reports exposed in the SharePoint 2007 Search Analytics to be a bit lacking (but so are all of the SharePoint 2007 analytical reports, in my opinion). This is why I generally recommend to clients that they use not only the provided SharePoint 2007 Analytics and Reports but also a third party package, such as Google Analytics (or Omniture, WebTrends, Visual Sciences, ClickTrack, etc.). Using the SharePoint 2007 Analytics gives you a great view of site activity, but much more importantly it includes the basic search analytics.

And now for what I’m sure will be great news: Google Analytics is finally including Search Analytics within their standard (free) package.  This is pulled directly from their news release… "you’ll be able to use Google Analytics to track site search activity. Simply edit any of your Google Analytics profiles to enable "Site Search" and you can find out what people search for on your site and where these searches lead. Located in the Content section of your Google Analytics reporting interface, Site Search reports show you the keywords and search refinement keywords people use, the pages from which people begin and end their searches."  -pulled from http://analytics.blogspot.com/2007/10/exciting-announcements-at-emetrics.html

I’m not yet sure what this actually means.  Will this addition allow Google to supersede all of the SharePoint 2007 usage analytics information?  Obviously, the SharePoint 2007 Audit Log Reports will still be valid, but will the Site Collection Usage Reports become unnecessary?    I think Google Analytics already does some of the analytical reporting (specifically presentation) better, but I’m going to reserve judgment until I actually see how the new Google Search Analytics performs…