Manually Configure SharePoint 2010 Search Service Application Topology

What a long title for a post, right?  And with the current trend toward scripting installs, why would anyone in their right mind manually configure anything in SharePoint?  The truth of the matter is that I didn’t plan on configuring the Search Topology.  In fact, this manual configuration was done as part of my poorly scripted configuration.  In my experience, real lessons rarely come from planning.  The real lessons come from life not going quite as planned…

How Should I Learn SharePoint?

“I’m new to SharePoint.  I have a background in ColdFusion, HTML, Java, Ruby, C++, Photoshop, (insert previous life here), etc., and now I want to learn SharePoint.  Where should I start?”

If you are trying to learn SharePoint, even getting started can be a little intimidating.  SharePoint information is available in many forms: books, blogs, discussion boards, conferences, events, webinars, and more. With so much information available for free, where do you start?

Service Pack 1 announced for SharePoint 2010, Office 2010, Project 2010, and Project Server 2010

While I’m not at TechEd Atlanta today, it’s good to see that the Microsoft Office Sustained Engineering Team announced that SharePoint 2010 Service Pack 1 is “on track for release at the end of June” 2011.  While I’m sure there are plenty of bug fixes included, I’m specifically interested in the following updates that are scheduled to be part of the Service Pack:

Five Best Improvements in SharePoint 2010

SharePoint 2010 is great, but what are the Top 5 enhancements or new features that are available in SharePoint 2010?  Everyone who works with SharePoint will have their own Top 5 list.

  1. Social Network.  The world is now social.  I know, in reality the world has always been social.  The entire Internet is based upon the concept of social.  Being social is the new black.  Status updates, tags, notes, following, being followed – it’s all in SharePoint 2010 out of the box.  Users can tag and make notes everywhere – on docs, wikis, blogs, CMS pages, videos, pictures.  Everywhere.  Organizations have spent a ton of money in the last 4 years custom building these tools in a variety of platforms.
    image
  2. Business Connectivity Services (BCS).  BCS allows you to connect to external data sources.  Easily.  If you are connecting to an external SQL database it really could not be easier.  With SharePoint Designer 2010, you can map to an external data source, map an External Content Type to the data and use the intuitive SharePoint list interface on your external data.  Even better, this functionality exists in the free version of SharePoint – SharePoint Foundation 2010.
  3. SharePoint Designer 2010.  SharePoint Designer 2010 is free.  With a completely reinvented user interface, this is now an extremely powerful tool in your SharePoint arsenal and not a tool to shy away from.  From creating external data sources and managing the UI to creating data views and reusable workflows (see Sean Bordner’s post), SharePoint Designer 2010 is my tool of choice for leveraging the power of SharePoint.  SharePoint Designer 2010 works with all versions of SharePoint 2010, from Foundation up.
  4. Ribbon toolbar.  The overhaul performed on the SharePoint UI is great.  I’ve heard nothing but great reviews from end users.  People are used to working with Microsoft Office, and the familiar ribbon toolbar decreases the learning curve tremendously.  Good on ya.
  5. Search.  Search refinements, improved people search, and FAST Search for SharePoint – all great things.  Of course, FAST Search is amazing, but it adds a little cost and complexity and requires additional hardware to run.  Search refinements were probably the most requested customization for the previous SharePoint search tools.  With these capabilities now available out of the box, it’s really a full-featured search tool that can (and maybe should) be used on solutions across the enterprise – internally and externally.  Search Server 2010 Express provides much of this functionality for absolutely free.  You can plug this into your existing web site technology and completely revamp your search with modern functionality and flair quite easily.

Microsoft Search Server 2010 Express Part 2: External Content Source

One of the key potential uses of Search Server 2010 Express is to provide a great search engine for your existing public facing website.  I work with a lot of different associations that run a lot of different CMS platforms.  While I’m a huge fan of utilizing the CMS capabilities of SharePoint 2010 for a variety of reasons, there isn’t a single platform that is right for everyone.  There isn’t a single auto make and model for everyone, and there isn’t a single pair of shoes that will work for everyone, so why would the CMS industry be any different?  However, a powerful search IS relevant to everyone (pun intended!). 

In Part 1, we walked through a generic install.  Once you have Search Server 2010 Express up and running, it is extremely simple to configure a new content source.  If you are coming directly from the vanilla install, you should see a screen that links you directly to the Search Administration page.

image

If you are just jumping in to Central Admin, you can get to the Search Administration page as follows: under Application Management, click Manage Service Applications, and then click Search Service Application.  While the concept of Service Applications is beyond the scope of this particular post, know that in larger environments (such as SharePoint 2010) you can run multiple Search Service Applications.

image

In the left nav, under Crawling, click Content Sources.  You will be taken to the Manage Content Sources page.  You can use this page to add, edit, or delete content sources, and to manage crawls.

image

Before we go any further, what is a Content Source?  For that matter, what is Content?  In the context of Microsoft SharePoint and Search Server, Content is any item that can be indexed.  This can be HTML, a Web page, a Microsoft Office Word document, a text file, a PDF file, business data, or even an e-mail message.  Content lives somewhere, such as a Web site, a file share, a Notes database, a SQL database, or a SharePoint site.  A Content Source specifies the settings that define what content should be indexed and on what schedule it should be crawled.

You should notice on the Manage Content Sources page that there is at least one Content Source already defined: Local SharePoint sites.  Because we used the wizard to manage the install in Part 1, all local SharePoint sites are already defined as a Content Source.

In order to create a new Content Source (such as our external site), click New Content Source at the top.  You will see the Add Content Source page:

image

Content Source Name – A name that you assign so that you can refer to and manage this Content Source.

Content Source Type – The type of content that you will be crawling.  This is an important setting because it tells the crawler not only what type of content will be located there, but also how to communicate with the Content Source.  For example, communicating with a File Share uses a completely different protocol than communicating with a web site.  The default types of Content Sources are listed below.  Note that I said ‘default’.  You can work with vendors or write your own custom interface to crawl and index content types that are not supported out of the box.  Also note that when you select a different type, the Crawl Settings change to show the details that apply to that type of Content Source.

    • SharePoint Sites
    • Web Sites
    • File Share
    • Exchange Public Folders
    • Line of Business Data
    • Custom Repository

Start Addresses – the addresses where the search system should start crawling.  For SharePoint sites and Web sites, these are traditional URLs.  For File Shares, these will be UNC paths that are accessible from the server.  You can supply more than one Start Address for a Content Source.  If, for example, I wanted a single Content Source to manage the various SusQtech websites that I am crawling, I could add http://www.susqtech.com/, http://www.sharepointacademy.org, http://www.sharepointconference.org, and http://www.thesug.org.  I can then manage all of these URLs as a single Content Source.  I could also opt to create multiple Content Sources so that I can manage each of the crawl schedules and details independently.
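
To make the URL versus UNC path distinction concrete, here is a minimal Python sketch that sorts a list of start addresses by the kind of Content Source they would belong to.  This is only an illustration and not anything Search Server runs: the function name, the returned labels, and the \\fileserver\documents share are all hypothetical.

```python
from urllib.parse import urlparse


def classify_start_address(address: str) -> str:
    """Return a rough Content Source type for a start address.

    Illustrative only: the labels are hypothetical, not values used by
    Search Server itself.
    """
    # UNC paths begin with two backslashes (server name, then share name).
    if address.startswith("\\\\"):
        return "file_share"
    # Web sites and SharePoint sites are addressed with ordinary URLs.
    scheme = urlparse(address).scheme.lower()
    if scheme in ("http", "https"):
        return "web_site"
    return "unknown"


start_addresses = [
    "http://www.susqtech.com/",
    "http://www.sharepointacademy.org",
    r"\\fileserver\documents",  # hypothetical file share
]

for address in start_addresses:
    print(address, "->", classify_start_address(address))
```

A check like this is handy when you are deciding whether a batch of addresses can live in one Content Source or needs to be split by type.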

Crawl Settings – used to specify the behavior of crawling for this Content Source.

image

Crawl Schedules – used to schedule the crawls for this Content Source.  This allows you to configure two different crawl schedules: full and incremental.  Why would you ever want an incremental crawl instead of a full one?  Incremental crawls are supposed to only crawl content modified since the last crawl and thus take less bandwidth, server memory, and CPU cycles.  I typically configure these schedules with a Full crawl during off hours on the weekend and Incremental crawls every night during the week.  Keep in mind that you may need more frequent incremental crawls – such as every hour for your public-facing website – if you are continuously adding new content.
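
If it helps to see the full versus incremental idea spelled out, the following Python sketch models it under some loud assumptions: the full crawl runs on Saturday and every other day gets an incremental crawl, and content is represented as a simple mapping of URL to last-modified time.  The function names, the data shape, and the /news page are hypothetical; the real crawler tracks changes in its own way.

```python
from datetime import date, datetime


def crawl_type_for(day: date) -> str:
    """Assumed schedule: full crawl on Saturday, incremental otherwise."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    return "full" if day.weekday() == 5 else "incremental"


def items_to_crawl(items, last_crawl, crawl_type):
    """Pick the items a crawl would visit.

    `items` maps a URL to its last-modified time (hypothetical data shape).
    """
    if crawl_type == "full":
        # A full crawl visits everything, changed or not.
        return sorted(items)
    # An incremental crawl visits only items changed since the last crawl.
    return sorted(url for url, modified in items.items() if modified > last_crawl)


items = {
    "http://www.susqtech.com/": datetime(2011, 6, 20, 9, 0),
    "http://www.susqtech.com/news": datetime(2011, 6, 27, 14, 30),  # hypothetical page
}
last_crawl = datetime(2011, 6, 26, 23, 0)

today = date(2011, 6, 27)  # a Monday
print(crawl_type_for(today), items_to_crawl(items, last_crawl, crawl_type_for(today)))
```

On a site with thousands of pages and only a handful of daily changes, the incremental list stays tiny, which is where the savings in bandwidth and CPU come from.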

Content Source Priority – normal or high.  The crawler will prioritize ‘high’ items when you have multiple content sources that must be crawled.
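
One way to picture the priority setting is as a sort order over the Content Sources waiting to be crawled.  The Python sketch below is a simplified model only, not the crawler's actual scheduling logic, and the source names other than Local SharePoint sites are made up for the example.

```python
# Lower rank is crawled first; only the two priorities from the form are modeled.
PRIORITY_RANK = {"high": 0, "normal": 1}


def crawl_order(content_sources):
    """Order content sources so that 'high' priority ones come first."""
    # sorted() is stable, so sources with equal priority keep their original order.
    return sorted(content_sources, key=lambda source: PRIORITY_RANK[source["priority"]])


sources = [
    {"name": "Local SharePoint sites", "priority": "normal"},
    {"name": "Public website", "priority": "high"},  # hypothetical source
    {"name": "File shares", "priority": "normal"},  # hypothetical source
]

print([source["name"] for source in crawl_order(sources)])
```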

Start Full Crawl – a checkbox to start a full crawl immediately.