There are literally thousands of "search engines" out there. Some of them are for finding stuff on the Internet, like Google, Bing and Yahoo. Others are more specialized, like the search box on a single website that searches only that website. Search is an incredibly complex topic, with an astounding number of factors that contribute to finding the one important piece of content you are looking for. Frankly, Google spoiled all of us. I expect to find exactly what I'm looking for among the millions of pages all over the Internet by simply typing a single word into a single little box. If I don't find what I want on the first page of results, I might tweak my search a little or add a word or two, but I won't keep trying for long.
The Internet contains at least 27.5 billion pages as of Tuesday, 3 August 2010, according to http://www.worldwidewebsize.com. Not only do I expect to find exactly what I want on the Internet, but if I use the search on your website, I get EXTREMELY frustrated when it doesn't find exactly what I want, when I want it. How is this possible? I know what I want is on your website somewhere. Figure out what I want and show it to me! And please do it in under a second, if it's not too much trouble!
In the beginning, search was simple. It was based on keyword matching: if I typed in a keyword, the "search engine" scanned the content, found instances of that word and showed me hyperlinks to those results. I could search for "blog" and get any page that had the word "blog" in it. That was perfect! It's all anyone needed. Then websites started to grow in complexity. Soon, each website had thousands of pages, and a simple keyword search returned hundreds of results. That wasn't useful anymore. Search had to get better.
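To make the idea concrete, here is a minimal Python sketch of that early keyword matching. It assumes a site small enough to hold in memory as a dictionary of URLs to page text; the `pages` data and the `keyword_search` function are hypothetical, for illustration only. It scans every page on every query and does plain substring matching, so a search for "blog" would also match "blogger".

```python
# Hypothetical site content: URL -> page text (illustrative data only).
pages = {
    "/home": "Welcome to our site. Read the blog for news.",
    "/blog": "Our blog covers SharePoint and WordPress.",
    "/about": "About the team behind this site.",
}

def keyword_search(pages, keyword):
    """Return URLs of pages whose text contains the keyword (case-insensitive).

    Every page is scanned on every query: the naive approach described above.
    """
    keyword = keyword.lower()
    return [url for url, text in pages.items() if keyword in text.lower()]

print(keyword_search(pages, "blog"))  # ['/home', '/blog']
```

Scanning every page on every query is exactly what stops scaling once a site grows to thousands of pages.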
Then search got major improvements. Boolean search operators arrived: I could search for "SharePoint AND WordPress" or "SharePoint NOT WordPress", which gave me some control over exactly what I was searching for. I also got search result sorting, so I could put the most recently created pages at the top. After all, if a page was newer, then it was clearly more relevant, right?
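A rough sketch of Boolean AND/NOT plus newest-first sorting might look like the following. The documents, dates and the `boolean_search` helper are hypothetical; a real engine would parse a query string such as "SharePoint NOT WordPress" rather than take separate term lists, and it would use an index rather than scanning.

```python
from datetime import date

# Hypothetical documents with a creation date (illustrative data only).
docs = [
    {"url": "/a", "text": "SharePoint and WordPress compared", "created": date(2010, 5, 1)},
    {"url": "/b", "text": "SharePoint folder tips", "created": date(2010, 7, 15)},
    {"url": "/c", "text": "WordPress theme guide", "created": date(2010, 6, 2)},
]

def words(text):
    """Lowercased set of whitespace-separated words."""
    return set(text.lower().split())

def boolean_search(docs, must=(), must_not=()):
    """Keep docs containing every `must` term (AND) and no `must_not` term (NOT),
    then sort newest first, the date sort described above."""
    hits = [
        d for d in docs
        if all(t.lower() in words(d["text"]) for t in must)
        and not any(t.lower() in words(d["text"]) for t in must_not)
    ]
    return sorted(hits, key=lambda d: d["created"], reverse=True)

# "SharePoint AND WordPress"
print([d["url"] for d in boolean_search(docs, must=["SharePoint", "WordPress"])])  # ['/a']
# "SharePoint NOT WordPress"
print([d["url"] for d in boolean_search(docs, must=["SharePoint"], must_not=["WordPress"])])  # ['/b']
# "SharePoint", newest first
print([d["url"] for d in boolean_search(docs, must=["SharePoint"])])  # ['/b', '/a']
```

Note that sorting by date says nothing about whether the newest page actually answers the question, which is the weakness the next paragraph picks up on.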
That statement introduces a very important topic: RELEVANCE. Relevance denotes how well the results meet the needs of the person searching; see the all-knowing Wikipedia for more details at http://en.wikipedia.org/wiki/Relevance_(information_retrieval). Relevance is determined by the search algorithm. That's right: a computer programmer wrote a mathematical formula that uses the available information to determine how relevant each piece of content is to your search terms. In reality, that algorithm was written by a very large team of programmers, analysts, mathematicians, executives and many others. And search is getting more complicated, and far better, every day.
Most modern search engines consist of two primary components: the INDEX and the QUERY. The index is just like the index at the back of a book. Rather than scanning all of the content in real time for every search, the search engine builds a big index of all of the content ahead of time, and looking words up in that index is much faster. Furthermore, the index can be optimized for the type(s) of searches being performed. Your individual website search is responsible for searching your website. Facebook search searches Facebook: the profiles, comments, photos, tags, etc. Google and Bing try to search everything: your website, my website, her website, their website. Your website search should search ALL of your content: web pages, HTML, PDF files, Word docs, PowerPoint files, Excel files, images and comments. The index should include ALL of your content.
So how is the index built? Usually it is built by a Web crawler: automated software that follows all of the links on your site and reads the content it finds. Indexing relies on a word breaker to identify individual words. In English, many characters break words apart: spaces, hyphens, periods, colons, semicolons and exclamation points all separate words. With multilingual content, the story gets even more complicated, because other languages don't even use the same characters. So the crawler goes through all of the content and builds this enormous index for use in queries. The index contains the words, their counts, metadata, where the words were found, information about the pages and documents, titles, cached portions of pages and much more.
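The word-breaking and indexing steps can be sketched in a few lines of Python. This assumes a deliberately naive English word breaker that splits on anything other than lowercase letters, digits and apostrophes (real word breakers are language-aware), and the `break_words`, `build_index` and `lookup` names are hypothetical. The result is an inverted index: each word maps to the pages containing it and the positions where it occurs, so counts and locations are available at query time without rescanning the content.

```python
import re
from collections import defaultdict

# Naive English word breaker (an assumption for illustration): any run of
# characters that are not lowercase letters, digits or apostrophes separates words.
WORD_BREAK = re.compile(r"[^a-z0-9']+")

def break_words(text):
    """Split text into lowercase words, dropping empty pieces."""
    return [w for w in WORD_BREAK.split(text.lower()) if w]

def build_index(pages):
    """Build an inverted index: word -> {url: [positions of the word on that page]}.

    The length of each position list is the word's count on that page.
    """
    index = defaultdict(dict)
    for url, text in pages.items():
        for pos, word in enumerate(break_words(text)):
            index[word].setdefault(url, []).append(pos)
    return index

def lookup(index, word):
    """Answer a one-word query with a dictionary lookup instead of a full scan."""
    return index.get(word.lower(), {})

pages = {
    "/a": "Search-engine basics: index, then query!",
    "/b": "The index; the query; the results.",
}
index = build_index(pages)
print(lookup(index, "index"))  # {'/a': [3], '/b': [1]}
print(lookup(index, "the"))    # {'/b': [0, 2, 4]}
```

The hyphen, colon, comma, semicolon and exclamation point all act as word breaks here, so "Search-engine" is indexed as the two words "search" and "engine".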
When a user enters a query, the search engine uses its algorithm to provide the most relevant information possible. What determines relevancy? There are many factors that should determine relevancy…
As you can see, the effectiveness of a search engine depends on its ability to determine relevance and then use that relevance to rank the search results. Modern search engines are available both built into your website's content management technology and completely independent of it. WordPress, for example, has a built-in search that is pretty simple (and thus largely ineffective). It's great for finding a keyword, but I would hardly call it a search engine. Both Microsoft and Google provide real search solutions. They have solutions for you at every level: your desktop, your enterprise, your website and the Internet. We are focusing primarily on your website and, to a lesser extent, your enterprise. The Google Search Appliance is a great solution that provides excellent relevancy and can be customized for your particular website's needs. The Google Search Appliance and Google Mini require annual maintenance fees.
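As one illustration of turning relevance into a ranking, here is a sketch of TF-IDF, a classic textbook scoring scheme. A page scores higher when a query term makes up a larger share of its words (term frequency) and when the term is rare across the site (inverse document frequency). This is not the algorithm Google or Microsoft uses, and it ignores many other relevance signals; the `tf_idf_rank` function and the sample pages are hypothetical.

```python
import math
import re

def tokenize(text):
    """Lowercase the text and keep runs of letters and digits as terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tf_idf_rank(pages, query):
    """Return URLs of pages matching the query, highest TF-IDF score first."""
    docs = {url: tokenize(text) for url, text in pages.items()}
    n = len(docs)
    scores = {}
    for url, tokens in docs.items():
        score = 0.0
        for term in tokenize(query):
            # Term frequency: share of this page's words that are the term.
            tf = tokens.count(term) / len(tokens) if tokens else 0.0
            # Document frequency: how many pages contain the term at all.
            df = sum(1 for other in docs.values() if term in other)
            # Smoothed inverse document frequency: rarer terms weigh more.
            idf = math.log((1 + n) / (1 + df)) + 1
            score += tf * idf
        if score > 0:
            scores[url] = score
    return sorted(scores, key=scores.get, reverse=True)

pages = {
    "/a": "SharePoint search configuration guide",
    "/b": "SharePoint search tips: search scopes and search keywords",
    "/c": "WordPress themes",
}
print(tf_idf_rank(pages, "search"))      # ['/b', '/a']
print(tf_idf_rank(pages, "sharepoint"))  # ['/a', '/b']
```

The two queries rank the same pages differently: "/b" wins for "search" because the term fills 3 of its 8 words, while "/a" wins for "sharepoint" because it is the shorter page and the term is a larger share of it.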
Microsoft provides a free search solution for your website and for the enterprise. That's right: Microsoft provides enterprise-level search capabilities for FREE. Microsoft Search Server 2010 Express provides the search capabilities described in this overview for FREE. While this solution may not be the perfect fit for every website, I think it is at least worth evaluating. You can download the software for free, install it and configure it in a matter of minutes. If it works for you, implementing it on your website is as simple as replacing the search box.
SharePoint has extremely robust content storage capabilities. Because SharePoint is such a robust framework, there are no "wrong ways" to use it. Sure, I've seen SharePoint poorly implemented, but that actually speaks to the capabilities of the platform. Thanks to its feature-rich toolset, there are literally hundreds of ways to configure and use SharePoint. Some are great; some are not so good. That is a primary reason behind the concept of best practices. Unfortunately, best practices are generally taken as the only way to do something on a technology platform, but in reality they are usually just prescriptive guidance based on experience, usability, functionality and performance.
So what is the best practice for document libraries with regard to folders? Do you use folders or not? Here is my prescriptive guidance…
Using folders is such a great concept that the idea has largely not changed since the advent of paper. In fact, even Multics used the concept of folders in the mid-1960s. The idea behind folders is simple: store related content items close together to make them easier to find when you need them. I use folders all the time at home. I have an entire filing cabinet that I use to store papers in their relevant folders: folders for bills, folders for tax info, folders for warranty information, etc. I use these folders out of necessity, because the content I store in them is physical, not digital.
Folders have persisted across nearly every kind of computing platform, from websites to mobile devices. But does it make sense to keep doing something just because that's the way we've always done it? Folders may be easy to understand and explain, but are they really the best use of technology?
I don't think so. I think folders are an antiquated way of storing and retrieving content, and I'm not alone in this. Google agrees with me. Yes, the multi-billion-dollar organization has a singular hive mind, and this massive mind agrees with me. Don't believe me? Gmail doesn't have folders. Gmail has labels.
Labels, tags, keywords and metadata are terms that people use interchangeably. Labels can be applied to any piece of content to help describe it. Most things you purchase have labels: food, clothing, autos, computers and even mobile devices all come with attached labels. Labels can also be attached to digital content. For example, if I upload a video of my child swimming to share and title it "John's kids at the beach", you have no idea from the title alone that it is a video of a 7-year-old learning to swim to a floating dock. This is where labels that describe the video help. I will likely add a label with my child's name, and then some very specific labels, such as Learning, Dock, Ocean City, MD, Swimming, etc. This lets me go back and find videos at a later date using a variety of criteria. I could easily find all videos of that particular child. I could easily find all videos marked as Ocean City. I could easily find all videos that were specifically about Summer 2010. These labels will also help other people locate the information they are seeking.
Can you do this with folders? What folders would you create? If I create a folder for each child, then there is no way to group by activities. If I create a folder for each type of activity, then there is no way to group by child. A major difference between folders and labels is that each piece of content can only exist in a single folder but can be marked with many labels.
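The difference can be expressed as a data structure. In this hypothetical Python sketch (the child names, file names and labels are made up for illustration), each video carries a set of labels, and an inverted label index answers "all videos of this child", "all videos from Ocean City" or "this child in Summer 2010" with set intersections. A folder tree, by contrast, would give each file exactly one parent, so only one of those groupings could be the folder structure.

```python
from collections import defaultdict

# Hypothetical video library: each file carries a *set* of labels.
# (In a folder tree, each file would have exactly one parent folder.)
video_labels = {
    "learning_to_swim.mp4": {"Child A", "Learning", "Swimming", "Dock", "Ocean City, MD", "Summer 2010"},
    "sandcastle.mp4": {"Child B", "Ocean City, MD", "Summer 2010"},
    "first_bike_ride.mp4": {"Child A", "Biking"},
}

def build_label_index(video_labels):
    """Invert item -> labels into label -> items."""
    index = defaultdict(set)
    for item, labels in video_labels.items():
        for label in labels:
            index[label].add(item)
    return index

def find(index, *required_labels):
    """Return the items that carry every required label."""
    if not required_labels:
        return set()
    matches = [index.get(label, set()) for label in required_labels]
    return set.intersection(*matches)

index = build_label_index(video_labels)
print(sorted(find(index, "Child A")))                 # ['first_bike_ride.mp4', 'learning_to_swim.mp4']
print(sorted(find(index, "Ocean City, MD")))          # ['learning_to_swim.mp4', 'sandcastle.mp4']
print(sorted(find(index, "Child A", "Summer 2010")))  # ['learning_to_swim.mp4']
```

The same file shows up under "Child A", "Ocean City, MD" and "Summer 2010" without being copied, which is the property folders cannot offer.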
SharePoint supports both folders and labels (though in SharePoint, labels are called metadata and are stored in columns). So which should you use? I think the answer is clear: use metadata. Though folders and metadata are definitely not mutually exclusive, here are some other good reasons to use metadata INSTEAD of folders.
Of course, you will still run into folders in SharePoint. In fact, SharePoint 2010 has many new enhancements around using folders. Plus, folders are comfortable. Some people will cite SharePoint view limits as a reason to use folders; SharePoint 2010 throttling makes this argument go away. Others will still stand by folders for organization. Still others will say that security is a reason to use folders. While it's true that you can put security on a folder (and thus the items within the folder), managing security at the subfolder level is both time consuming and a management headache. It is much easier to manage security at the library, list or site level, as typical best practices prescribe. I mean, you have item-level security too, but who wants to manage security at the item level? That is the exception, not the rule.
Am I saying that I avoid folders where possible? Yes. Am I saying that there is no place for folders? No. Folders can still be an effective tool if used correctly. Are folders and metadata mutually exclusive? Of course not! Even if you elect to use folders, you should still use an effective metadata structure.
Please wield this powerful folder weapon wisely…