Pingar – Discover new value from unstructured data

By peter.stilgoe









This looks like a very interesting tool, and a very useful one for creating and maintaining your company's taxonomy.

“At PINGAR we have been building semantic search applications for business enterprises since 2007.

Consider the facts. The volume of data companies need to manage is growing 40 percent a year. Fifty percent of data searches are unsuccessful — employees may lose up to 25 percent of their productive time searching for information. And the cost to enterprises from failed searches is approximately USD $5.3 million per year, for every 1000 enterprise workers employed.

PINGAR’s research teams had three objectives:

•Deliver applications to assist enterprise workers to find more relevant data, faster
•Improve employees’ search experience
•Improve enterprise productivity and drive down costs.

These objectives were achieved with PINGAR’s Microsoft SharePoint 2007 and 2010, and Apache SOLR semantic search applications. Each can be purchased as an easy to install ‘plug-in’ server application, for distribution across enterprise networks.”

Find out more at Pingar.com


Increase your search relevancy by removing views & other irrelevant results from your Sharepoint search

By peter.stilgoe









By default, SharePoint search results include ‘noise’ pages such as views, all forms pages and so on. If you don't want these to appear in your search results, which most people won't, you can exclude them by creating exclude rules in the SSP.

1) Navigate to the SSP.
2) Click search settings under Search.
3) Click Crawl Rules.

Some common noise pages you may wish to exclude:

*://*webfldr.aspx – This will exclude all explorer view pages if you choose it as an exclude rule.

*://*mod-view.aspx* – This will exclude the Moderation view page if you choose it as an exclude rule.

*://*my-sub.aspx* – This will exclude the page with your items if you choose it as an exclude rule.

*://*allitems.aspx* – This will exclude the allitems page from the search results if you choose it as an exclude rule.

*://*allforms.aspx* – This will exclude the all forms page from the search results if you choose it as an exclude rule.

*://*/lists/* – This will exclude list pages from the search results if you choose it as an exclude rule.

*://*DispForm.aspx* – This will exclude the list display form from the search results if you choose it as an exclude rule.

Now run a full crawl & you're done.
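
Before adding the rules, it can help to preview which URLs a set of wildcard patterns would catch. The following is only a rough sketch in Python: it uses the standard `fnmatch` module to approximate the wildcard matching, and it assumes matching is case-insensitive. SharePoint's own crawl rule matching may differ, and the sample URLs are made up.

```python
from fnmatch import fnmatchcase

# Exclude patterns copied from the list above.
EXCLUDE_PATTERNS = [
    "*://*webfldr.aspx",
    "*://*mod-view.aspx*",
    "*://*my-sub.aspx*",
    "*://*allitems.aspx*",
    "*://*allforms.aspx*",
    "*://*/lists/*",
    "*://*DispForm.aspx*",
]


def is_excluded(url, patterns=EXCLUDE_PATTERNS):
    """Return True if the URL matches any exclude pattern.

    Assumption: matching is case-insensitive, so both sides are lowercased.
    """
    lowered = url.lower()
    return any(fnmatchcase(lowered, pattern.lower()) for pattern in patterns)


# Hypothetical URLs, for illustration only.
sample_urls = [
    "http://portal/Docs/Forms/AllItems.aspx",
    "http://portal/Lists/Tasks/DispForm.aspx?ID=1",
    "http://portal/Docs/Plan.docx",
]

for url in sample_urls:
    label = "EXCLUDE" if is_excluded(url) else "KEEP"
    print(label, url)
```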


Sharepoint Search – How features work part 1

By peter.stilgoe









Word breakers: A word breaker is a component used by the query and index engines to break compound words and phrases into individual words or tokens. If there is no word breaker for a specific language, the neutral word breaker is used, in which case word breaking occurs where there are white spaces between the words and phrases. At indexing time, if there is any locale information associated with the document (for example, a Word document contains locale information for each text chunk), the index engine will try to use the word breaker for that locale. If the document does not contain any locale information, the user locale of the computer the indexer is installed on is used instead. At query time, the locale (HTTP_ACCEPT_LANGUAGE) of the browser from which the query was sent is used to perform word breaking on the query. Additional information about the language availability of the word breaker component is available in Appendix B: Search Language Considerations.
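
To make the fallback behaviour concrete, here is a minimal sketch in Python. It is not how SharePoint implements word breaking: the registry, the function names and the "English-like" breaker are all hypothetical. It only illustrates the idea of picking a breaker by locale and falling back to a white-space-only neutral breaker.

```python
import re


def neutral_word_breaker(text):
    """Neutral fallback: break only where there is white space."""
    return text.split()


def english_like_word_breaker(text):
    """Hypothetical language-specific breaker: break on any non-alphanumeric run."""
    return re.findall(r"[A-Za-z0-9]+", text)


# Hypothetical registry mapping a primary language tag to a breaker.
WORD_BREAKERS = {"en": english_like_word_breaker}


def break_words(text, locale=None):
    """Pick the breaker for the locale, or the neutral one if none is registered."""
    primary = (locale or "").split("-")[0].lower()
    breaker = WORD_BREAKERS.get(primary, neutral_word_breaker)
    return breaker(text)


print(break_words("state-of-the-art search", "en-US"))
print(break_words("state-of-the-art search", "xx-XX"))
```

With a registered locale the compound is split into its parts; with an unknown locale only the white space is used.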

Stemming: Stemming is a feature of the word breaker component used only by the query engine to determine where the word boundaries are in the stream of characters in the query. A stemmer extracts the root form of a given word. For example, “running,” “ran,” and “runner” are all variants of the verb “to run.” In some languages, a stemmer expands the root form of a word to alternate forms. Stemming is turned off by default. Stemmers are available only for languages that have morphological expansion; this means that, for languages where stemmers are not available, turning on this feature in the Search Result Page (CoreResult Web Part) will not have any effect. Additional information about language availability for the Stemmer feature is available in Appendix B: Search Language Considerations.
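
As a toy illustration of root extraction and expansion, the Python sketch below uses a hand-written table of word forms. A real stemmer derives these from language rules; the table and function name here are invented for the example.

```python
# Toy table of root -> known forms (hypothetical data, not a real stemmer).
WORD_FORMS = {"run": {"run", "runs", "running", "ran", "runner"}}

# Reverse lookup: any known form -> its root.
ROOT_OF = {form: root for root, forms in WORD_FORMS.items() for form in forms}


def expand_term(term):
    """Return all known forms sharing the term's root, or just the term itself."""
    word = term.lower()
    root = ROOT_OF.get(word)
    if root is None:
        return {word}
    return set(WORD_FORMS[root])


print(sorted(expand_term("Running")))
```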

Noise words dictionary: Noise words are words that do not add value to a query, such as “and,” “the,” and “a.” The indexing engine filters them to save index space and to increase performance. Noise word files are customizable, language-specific text files. These files are a simple list of words, one per line. If a noise word file is changed, you must perform a full update of the index to incorporate the changes. Additional information about the noise words dictionary and how to customize it is available at www.microsoft.com.
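
The "one word per line" format is easy to picture with a short Python sketch. The file content below is an invented sample, and the function names are hypothetical; it only shows the filtering idea, not the indexer's actual code.

```python
def load_noise_words(file_text):
    """Parse noise word file content: one word per line, blank lines ignored."""
    return {line.strip().lower() for line in file_text.splitlines() if line.strip()}


def remove_noise(tokens, noise_words):
    """Drop tokens that appear in the noise word set (case-insensitive)."""
    return [token for token in tokens if token.lower() not in noise_words]


# Invented sample content of a noise word file.
noise_file_text = "a\nand\nthe\n"
noise_words = load_noise_words(noise_file_text)

print(remove_noise("the search and the index".split(), noise_words))
```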

Custom dictionary: The custom dictionary file contains values that the search server must include at index and query times. Custom dictionary lists are customizable, language-specific text files. These files are used by Search in both the index and query processes to identify exceptions to the noise word dictionaries. A word such as “AT&T,” for example, will never be indexed by default because the word breaker breaks it into single noise words. To avoid this, the user can add “AT&T” to the custom dictionary file; as a result, this word will be treated as an exception by the word breaker and will be indexed and queried. These files contain a simple list of words, one per line. If the custom dictionary file is changed, you must perform a full update of the index to incorporate the changes. By default, no custom dictionary file is installed during Office SharePoint Server 2007 Setup. Additional information about the custom dictionary file and how to customize it is available at www.microsoft.com.
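
The AT&T case can be sketched in a few lines of Python. This is an illustration under assumptions, not the real pipeline: the noise word set, the breaking rule (split on non-alphanumeric characters) and the function name are all made up for the example.

```python
import re

# Illustrative noise words; assumed to include the fragments "at" and "t".
NOISE_WORDS = {"a", "and", "at", "t", "the"}
# Custom dictionary entries are kept whole instead of being broken up.
CUSTOM_DICTIONARY = {"at&t"}


def index_terms(text, custom=CUSTOM_DICTIONARY, noise=NOISE_WORDS):
    """Return the terms that would be indexed under this simplified model."""
    terms = []
    for chunk in text.split():
        word = chunk.strip(".,;:!?").lower()
        if word in custom:
            terms.append(word)  # exception: keep the word intact
            continue
        for part in re.findall(r"[a-z0-9]+", word):
            if part not in noise:
                terms.append(part)
    return terms


print(index_terms("Call AT&T today", custom=set()))  # without a custom dictionary
print(index_terms("Call AT&T today"))                # with "AT&T" listed
```

Without the custom entry, "AT&T" is split into two noise words and disappears; with it, the term survives as a single token.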

Thesaurus: There is a configurable thesaurus file for each language that Search supports. Using the thesaurus, you can specify synonyms for words and also automatically replace words in a query with other words that you specify. The thesaurus used will always be in the language of the query, not necessarily the server’s user locale. If a language-specific thesaurus is not available, a neutral thesaurus (tseneu.xml) is used. Additional information about the thesaurus file and how to customize it is available at www.microsoft.com.
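
The two behaviours described here, synonyms and replacements, can be pictured with a small Python sketch. The entries and the function name are hypothetical, and the real thesaurus is an XML file rather than Python data; this only shows how a query term might be rewritten.

```python
# Hypothetical thesaurus data, for illustration only.
EXPANSIONS = [{"ie", "internet explorer"}]   # synonym groups: any member expands to all
REPLACEMENTS = {"nt5": "windows 2000"}       # pattern -> substitute


def apply_thesaurus(term):
    """Return the set of terms to search for after thesaurus processing."""
    key = term.lower()
    if key in REPLACEMENTS:
        return {REPLACEMENTS[key]}            # the original term is replaced
    for group in EXPANSIONS:
        if key in group:
            return set(group)                 # the term is expanded to its synonyms
    return {key}


print(sorted(apply_thesaurus("IE")))
print(sorted(apply_thesaurus("NT5")))
```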

Language Auto Detection: The Language Auto Detection (LAD) feature generates a best guess about the language of a text chunk based on the Unicode range and other language patterns. It is used for relevance calculation by the index engine and in queries sent from the Advanced Search Web Part, where the user is able to specify constraints on the language of the documents returned by a query.
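
The "Unicode range" part of the guess can be approximated very roughly in Python. Note the limits of this sketch: it only guesses the dominant script (Latin, Cyrillic and so on) from character names, which is not the same as detecting a language, and it ignores the "other language patterns" that LAD is said to use. The function name is invented.

```python
import unicodedata
from collections import Counter


def guess_script(text):
    """Return the most common script prefix among the letters, or None."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            # Unicode names start with the script, e.g. "CYRILLIC SMALL LETTER PE".
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split(" ")[0]] += 1
    return counts.most_common(1)[0][0] if counts else None


print(guess_script("search"))
print(guess_script("поиск"))
```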

Did You Mean? The Did You Mean? feature is used by the query engine to catch possible spelling errors and to provide suggestions for queries. The Did You Mean? feature builds suggestions by using three components:

· Query log: Information tracked in the query log includes the query terms used, when the search results were returned for search queries, and the pages that were viewed from search results. This search usage data helps you understand how people are using search and what information they are seeking. You can use this data to help determine how to improve the search experience for users.

· Dictionary lexicon: A dictionary of most-used lexicons provided at installation time.

· Custom lexicon: A collection of the most frequently occurring words in the corpus, built at query time by the query engine from indexed information.

The Did You Mean? suggestions are available only for English, French, German, and Spanish.
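
As a rough picture of how a lexicon can drive spelling suggestions, here is a Python sketch using the standard `difflib` module. It is a stand-in technique, not the algorithm the query engine uses: the lexicon, the similarity cutoff and the function name are assumptions chosen for the example.

```python
from difflib import get_close_matches

# Hypothetical lexicon of frequent words (stands in for the dictionary and custom lexicons).
LEXICON = {"sharepoint", "search", "server", "taxonomy", "thesaurus"}


def did_you_mean(term, lexicon=LEXICON, cutoff=0.8):
    """Suggest the closest lexicon word, or None if the term is known or nothing is close."""
    word = term.lower()
    if word in lexicon:
        return None  # spelled as a known word, nothing to suggest
    matches = get_close_matches(word, sorted(lexicon), n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(did_you_mean("sharepiont"))
```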

Definition Extraction: The Definition Extraction feature finds definitions for candidate terms and identifies acronyms and their expansions (for example, NASA, radar, modem, and so on) by examining the grammatical structure of sentences that have been indexed. It is only available for English.
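
One very simple sentence pattern that reveals an acronym is "Expansion (ACRONYM)". The Python sketch below handles only that pattern, by matching the initials of the capitalised words before the parenthesis. It is a toy under that assumption and far simpler than grammatical analysis; the function name is invented.

```python
import re


def find_acronyms(sentence):
    """Map each '(ACRONYM)' to the preceding words whose initials spell it."""
    found = {}
    for match in re.finditer(r"\(([A-Z]{2,})\)", sentence):
        acronym = match.group(1)
        words = sentence[: match.start()].split()
        remaining = list(acronym)
        expansion = []
        # Walk backwards: capitalised words must match the next letter,
        # lower-case words (such as "and") are kept as fillers.
        while words and remaining:
            word = words.pop()
            if word[0].isupper():
                if word[0] != remaining[-1]:
                    break
                remaining.pop()
            expansion.insert(0, word)
        if not remaining:
            found[acronym] = " ".join(expansion)
    return found


print(find_acronyms(
    "The National Aeronautics and Space Administration (NASA) was founded in 1958."
))
```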


Sharepoint Customising Search Series

By peter.stilgoe









Customising Sharepoint Search: Customising Sharepoint Search Series


Search results return wrong url for list items 1_.000 etc

By peter.stilgoe









When performing searches on MOSS / SharePoint, your list items return invalid URLs, e.g.:

http://test-server/contacts/lists/contacts/1_.000 or 2_.000 or 3_.000 and so on.

In our case the problem was caused by our document imaging software from Knowledgelake: during installation it adds some metadata property mappings in the SSP.

To fix this, go into your SSP –> Search Settings –> Metadata property mappings and edit the property called ‘path’.

The first mapping, ‘ows_ItemURL(Text)’, is the cause of the problem.

Remove this entry.

Reset all crawled content.

Now perform a full crawl & list items will be returned with the correct URLs.
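
If you have a list of result URLs to hand (for example, copied from a test search), a quick check like this Python sketch can confirm whether any still have the bad form. The regular expression is my own guess at the pattern based on the example URL above, and the function name is made up.

```python
import re

# Assumed shape of the bad URLs: they end in "/<number>_.000".
BAD_ITEM_URL = re.compile(r"/\d+_\.000$")


def find_bad_urls(urls):
    """Return the URLs that still end in the invalid '<id>_.000' form."""
    return [url for url in urls if BAD_ITEM_URL.search(url)]


results = [
    "http://test-server/contacts/lists/contacts/1_.000",
    "http://test-server/contacts/lists/contacts/DispForm.aspx?ID=1",
]
print(find_bad_urls(results))
```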


MOSS 2007 Enterprise search example SQL queries

By peter.stilgoe









Finds relevant results containing the keyword SharePoint.

SELECT WorkId, Path, Title, Write, Author, HitHighlightedSummary, HitHighlightedProperties, CollapsingStatus FROM Scope() WHERE FREETEXT(defaultproperties, 'SharePoint') ORDER BY Rank DESC

Finds relevant results containing at least one of the keywords SharePoint and Search.

SELECT WorkId, Path, Title, Write, Author, … FROM Scope() WHERE FREETEXT(defaultproperties, 'SharePoint Search') ORDER BY Rank DESC

Finds relevant results containing both the keywords SharePoint and Search.

SELECT WorkId, Path, Title, Write, Author, … FROM Scope() WHERE FREETEXT(defaultproperties, '+SharePoint +Search') ORDER BY Rank DESC

Finds relevant results containing the exact phrase SharePoint Search.

SELECT WorkId, Path, Title, Write, Author, … FROM Scope() WHERE FREETEXT(defaultproperties, ' "SharePoint Search" ') ORDER BY Rank DESC

Finds relevant results containing both the keywords SharePoint and Search but not the keyword WSS.

SELECT WorkId, Path, Title, Write, Author, … FROM Scope() WHERE FREETEXT(defaultproperties, '+SharePoint +Search -WSS') ORDER BY Rank DESC

Finds relevant SharePoint results authored by persons named John.

SELECT WorkId, Path, Title, Write, Author, … FROM Scope() WHERE FREETEXT(defaultproperties, 'SharePoint') AND CONTAINS(Author, ' "John" ') ORDER BY Rank DESC

Finds relevant SharePoint results modified within the last 30 days.

SELECT WorkId, Path, Title, Write, Author, … FROM Scope() WHERE FREETEXT(defaultproperties, 'SharePoint') AND Write >= DATEADD(DAY, -30, GETGMTDATE()) ORDER BY Rank DESC
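
If you build these queries in code, a small helper keeps the quoting consistent. The Python sketch below only assembles the query string; the function and parameter names are hypothetical, and doubling single quotes to escape them is an assumption worth checking against the search SQL reference. For "modified within the last N days" it uses a negative offset with `>=`, so that Write must fall after "now minus N days".

```python
# Columns used in the examples above (the shorter list).
COLUMNS = ["WorkId", "Path", "Title", "Write", "Author"]


def quote(value):
    """Wrap a value in single quotes, doubling embedded quotes (assumed escaping rule)."""
    return "'" + value.replace("'", "''") + "'"


def build_query(freetext, author=None, modified_within_days=None):
    """Assemble a search SQL string in the style of the examples above."""
    conditions = ["FREETEXT(defaultproperties, " + quote(freetext) + ")"]
    if author:
        phrase = '"' + author + '"'
        conditions.append("CONTAINS(Author, " + quote(phrase) + ")")
    if modified_within_days:
        conditions.append(
            "Write >= DATEADD(DAY, -%d, GETGMTDATE())" % modified_within_days
        )
    return "SELECT %s FROM Scope() WHERE %s ORDER BY Rank DESC" % (
        ", ".join(COLUMNS),
        " AND ".join(conditions),
    )


print(build_query("+SharePoint +Search -WSS"))
print(build_query("SharePoint", author="John", modified_within_days=30))
```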


MOSS 2007 Search Query Web Service Test Tool

By peter.stilgoe









This is a UI test tool for the WSS 3.0 and MOSS 2007 search web service, which is located at http://your-portal/_vti_bin/search.asmx. It allows you to change property flags, generate the request XML, send the string to the web service and see the results in the UI. The accessible methods include Query, QueryEx, GetSearchMetaData (available for MOSS only) and GetPortalSearchInfo (available for MOSS only).
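
For a feel of what the generated request XML looks like, here is a Python sketch that builds a query packet with the standard library. The element and attribute names follow my recollection of the Microsoft.Search.Query schema and should be checked against the official schema before use; the function name and default values are hypothetical, and the sketch only builds the string, it does not call the web service.

```python
import xml.etree.ElementTree as ET

NS = "urn:Microsoft.Search.Query"  # namespace as I recall it; verify against the schema


def build_query_packet(query_text, query_type="STRING", start_at=1, count=10):
    """Build the request XML string for a keyword (STRING) or SQL (MSSQLFT) query."""
    ET.register_namespace("", NS)  # emit NS as the default namespace
    packet = ET.Element("{%s}QueryPacket" % NS)
    query = ET.SubElement(packet, "{%s}Query" % NS)
    context = ET.SubElement(query, "{%s}Context" % NS)
    text = ET.SubElement(
        context,
        "{%s}QueryText" % NS,
        {"type": query_type, "language": "en-us"},
    )
    text.text = query_text
    result_range = ET.SubElement(query, "{%s}Range" % NS)
    ET.SubElement(result_range, "{%s}StartAt" % NS).text = str(start_at)
    ET.SubElement(result_range, "{%s}Count" % NS).text = str(count)
    return ET.tostring(packet, encoding="unicode")


print(build_query_packet("SharePoint"))
```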

Download here: MOSS 2007 Search Query Web Service Test Tool
