Showing posts with label full-text. Show all posts
Showing posts with label full-text. Show all posts

Sunday, March 11, 2012

accents in full-text searching

If I have accented characters in my table which is full-text indexed then I
cannot search for the corresponding non-accented character. e.g. I have the
character "é" in my column (which is nvarchar), so searching for "é" comes up
with results because it is actually entered as "é" in the database, but
searching for "e" does not come up with any results. I have tried changing
the collation on this column to SQL Collation "SQL_Latin1_General_CP1_CI_AI"
and then I tried Windows Collation "Latin1_General_CI_AI". My database
collation name is "Latin1_General_CI_AI". I've set the default full-text
language to 0 (neutral).
Does anyone know how to solve the problem of accent insensitivity for
full-text searching?
my select @.@.version is:
Microsoft SQL Server 2000 - 8.00.760 (Intel X86) Dec 17 2002 14:22:05
Copyright (c) 1988-2003 Microsoft Corporation Personal Edition on Windows NT
5.1 (Build 2600: Service Pack 2)
Thanks in advance
Enoch
This is handled correctly in SQL 2005. However in the meantime you should
trap for these accented versions of the words and expand your search for
both the accented and unaccented versions of the work.
So a search on bebe would be expanded to "bb" or "bebe".
There will be some correct stemming when you do a FreeText query is you are
using the French word breaker. For instance if you have removed noise words
for all verb forms of the verb to be (etre) from the French noise word list,
you will get search results to the accented versions of this word when doing
a FreeText search.
Hilary Cotter
Looking for a SQL Server replication book?
Now available for purchase at:
http://www.nwsu.com/0974973602.html
"Enoch" <Enoch@.discussions.microsoft.com> wrote in message
news:E10365FE-F791-4973-84F1-DC4C9596B1AD@.microsoft.com...
> If I have accented characters in my table which is full-text indexed then
> I
> cannot search for the corresponding non-accented character. e.g. I have
> the
> character "" in my column (which is nvarchar), so searching for "" comes
> up
> with results because it is actually entered as "" in the database, but
> searching for "e" does not come up with any results. I have tried
> changing
> the collation on this column to SQL Collation
> "SQL_Latin1_General_CP1_CI_AI"
> and then I tried Windows Collation "Latin1_General_CI_AI". My database
> collation name is "Latin1_General_CI_AI". I've set the default full-text
> language to 0 (neutral).
> Does anyone know how to solve the problem of accent insensitivity for
> full-text searching?
> my select @.@.version is:
> Microsoft SQL Server 2000 - 8.00.760 (Intel X86) Dec 17 2002 14:22:05
> Copyright (c) 1988-2003 Microsoft Corporation Personal Edition on Windows
> NT
> 5.1 (Build 2600: Service Pack 2)
> Thanks in advance
> Enoch
>

accent insensitive full text search

I am running SQL 2000 SP3a on Windows 2000.
Is it possible to obtain accent insensitive results from full-text searches
so that a search for caf will return all records for caf and cafe as if
these were the same?
In the four queries below queries 2a and 2b return the same results as each
other whereas queries 1a and 1b return different results from each other.
1a. select * from MyTable where CONTAINS(*,'cafe')
1b. select * from MyTable where CONTAINS(*,'caf')
2a. select * from MyTable where searchtext like '%cafe%'
2b. select * from Mytable where searchtext like '%caf%'
Thanks
Joseph
No,it is not possible.
This issue has already been discussed in detail several times, do a search
in this newsgroup.
Matthias
"Joseph Tervit" <joseph@.company-net.com> wrote in message
news:eaRfNYeMEHA.1340@.TK2MSFTNGP12.phx.gbl...
> I am running SQL 2000 SP3a on Windows 2000.
> Is it possible to obtain accent insensitive results from full-text
searches
> so that a search for caf will return all records for caf and cafe as if
> these were the same?
> In the four queries below queries 2a and 2b return the same results as
each
> other whereas queries 1a and 1b return different results from each other.
> 1a. select * from MyTable where CONTAINS(*,'cafe')
> 1b. select * from MyTable where CONTAINS(*,'caf')
> 2a. select * from MyTable where searchtext like '%cafe%'
> 2b. select * from Mytable where searchtext like '%caf%'
> Thanks
> Joseph
>
|||Joseph,
Matthias, yes this has been discussed many, many (too many) times in this
newsgroup, IMHO. However, with the new Microsoft newsgroup msg roll-off
policy, the past replies might not be searchable (except via Google groups,
of course! ;-). One easy solution is to simply install SharePoint Server
2001 or 2003 as either will upgrade the MSSearch service to the SPPS version
that does support accent insensitive search.
Regards,
John
"Matthias HALDIMANN" <matthias.haldimann@.epfl.ch> wrote in message
news:409883a9$1@.epflnews.epfl.ch...[vbcol=seagreen]
> No,it is not possible.
> This issue has already been discussed in detail several times, do a search
> in this newsgroup.
> Matthias
>
> "Joseph Tervit" <joseph@.company-net.com> wrote in message
> news:eaRfNYeMEHA.1340@.TK2MSFTNGP12.phx.gbl...
> searches
if[vbcol=seagreen]
> each
other.
>
|||Hi,
I have two questions relating to your suggestion of installing sharepoint.
1. will this work for both sql fts and indexing server fts
(I currently have a file system catolog that I have linked into slq server, I do joined queries that search that catalog and and sql catalog for the same phrases)
2. if so, hwo do I rebuild the catalogs to be case insensitive?
thanks in advance
randyr
"John Kane" wrote:

> Joseph,
> Matthias, yes this has been discussed many, many (too many) times in this
> newsgroup, IMHO. However, with the new Microsoft newsgroup msg roll-off
> policy, the past replies might not be searchable (except via Google groups,
> of course! ;-). One easy solution is to simply install SharePoint Server
> 2001 or 2003 as either will upgrade the MSSearch service to the SPPS version
> that does support accent insensitive search.
> Regards,
> John
>
>
> "Matthias HALDIMANN" <matthias.haldimann@.epfl.ch> wrote in message
> news:409883a9$1@.epflnews.epfl.ch...
> if
> other.
>
>
|||Randyr,
Hmm... a blast from the past... a past posting of mine...
1. will this work for both sql fts and indexing server fts
A. No. While the SQL FTS and Indexing Services use the same basic underlying
technology, they are implemented differently and SharePoint was developed
after both of these products were shipped.
2. if so, how do I rebuild the catalogs to be case insensitive?
A. You cannot rebuild a SQL Server 2000 FT Catalog to be "case insensitive"
or "accent insensitive" as I'm assuming that's what you meant to say...
.. However, this will be possible with SQL Server 2005 (Yukon) this will be
supported via T-SQL: "CREATE FULLTEXT CATALOG FT_Catalog_Name WITH
ACCENT_SENSITIVITY = OFF AS DEFAULT"
Either way, I'd highly recommend that you test this functionality this in a
test environment FIRST before implementing it on your production servers.
Regards,
John
"randyr" <randyr@.online.nospam> wrote in message
news:70BBE3FA-EA50-4D76-9E45-F52B3E5812CF@.microsoft.com...
> Hi,
> I have two questions relating to your suggestion of installing sharepoint.
> 1. will this work for both sql fts and indexing server fts
> (I currently have a file system catolog that I have linked into slq
server, I do joined queries that search that catalog and and sql catalog for
the same phrases)[vbcol=seagreen]
> 2. if so, hwo do I rebuild the catalogs to be case insensitive?
> thanks in advance
> --
> randyr
>
> "John Kane" wrote:
this[vbcol=seagreen]
groups,[vbcol=seagreen]
version[vbcol=seagreen]
search[vbcol=seagreen]
as[vbcol=seagreen]
as[vbcol=seagreen]

Thursday, March 8, 2012

Abstracts for Full-Text Index entries

When using Microsoft Index Server file system catalogs, the Index Server
generates 'abstracts' which give the user an 'at a glance' indication of the
entry contents. Does Full-Text Indexing within SQL Server provide such
abstracts?
I am developing a search facility for a web site where I want to combine
search results for static web pages (managed by Index Server) with Full-Text
search results from SQL. I'll union the two results sets. I can get
abstracts for the Index Server entries, but can't find anything that will
give me a similar abstracts for the SQL entries. I could just use column
contents, but I'd rather have something more in keeping with the Index Server
abstracts.
Anyone got any good ideas?
Many thanks!
Indexing services abstracts are the first 320 bytes of textual data in
Office documents or the contents of the description metatag for html
documents. If the html document does not have a description metatag, its the
first 320 bytes of the html document.
For sql you have to do something like this
declare @.searchphrase varchar(20)
declare @.MarkedupSearchPhrase varchar(20)
set @.searchphrase='shadow'
set @.MarkedupSearchPhrase ='<B>shadow</B>'
select --1,firstword=len(@.searchphrase)-(len(substring(charcol,charindex(@.se
archphrase,
charcol)-50,100))-charindex('shadow',substring(charcol,charindex(@.se archphra
se, charcol)-50,100))),
--len(substring(charcol,charindex(@.searchphrase, charcol)-50,100)),
MarkedUpColumn = case when len(charcol) >100 and
len(@.searchphrase)-(len(substring(charcol,charindex(@.searchphrase,
charcol)-50,100))-charindex(@.SearchPhrase,substring(charcol,charinde x(@.searc
hphrase, charcol)-50,100))) <>-1 then
replace(ltrim(substring(replace(substring(charcol, charindex(@.searchphrase,
charcol)-50,100),@.searchphrase,@.MarkedupSearchPhrase),1,len (replace(substrin
g(charcol,charindex(@.searchphrase,
charcol)-50,100),@.searchphrase,@.MarkedupSearchPhrase))-
len(left(reverse(replace(substring(charcol,charind ex(@.searchphrase,
charcol)-50,100),@.searchphrase,@.MarkedupSearchPhrase)),char index('
',reverse(replace(substring(charcol,charindex(@.sea rchphrase,
charcol)-50,100),@.searchphrase,@.MarkedupSearchPhrase))))))) ,char(13)+char(10
),' ')
when len(charcol) >100 and
len(@.searchphrase)-(len(substring(charcol,charindex(@.searchphrase,
charcol)-50,100))-charindex(@.SearchPhrase,substring(charcol,charinde x(@.searc
hphrase, charcol)-50,100))) =-1 then
replace(substring(charcol,charindex(@.searchphrase,
charcol)-50,100),@.searchphrase,@.MarkedupSearchPhrase)
else replace(replace(substring(charcol,charindex(@.searc hphrase,
charcol)-50,100),@.searchphrase,@.MarkedupSearchPhrase),char( 13)+char(10),' ')
end
from texttable where FreeText(*,@.searchphrase)
this needs some more work as I have to correctly remove the first word or
word fragment in the markedupColumn (as long as its not the search phrase).
This also does inline hit highlighting.
For "true" abstracts like what indexing services delivers you should use the
left function.
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"Dave Hickson" <DaveHickson@.discussions.microsoft.com> wrote in message
news:BEEC89DE-421F-4C0C-84A2-94ADA5B3B56C@.microsoft.com...
> When using Microsoft Index Server file system catalogs, the Index Server
> generates 'abstracts' which give the user an 'at a glance' indication of
the
> entry contents. Does Full-Text Indexing within SQL Server provide such
> abstracts?
> I am developing a search facility for a web site where I want to combine
> search results for static web pages (managed by Index Server) with
Full-Text
> search results from SQL. I'll union the two results sets. I can get
> abstracts for the Index Server entries, but can't find anything that will
> give me a similar abstracts for the SQL entries. I could just use column
> contents, but I'd rather have something more in keeping with the Index
Server
> abstracts.
> Anyone got any good ideas?
> Many thanks!
|||Thanks very much Hilary. I assume from your answer that there is therefore
no built-in support for such abstracts?
"Hilary Cotter" wrote:

> Indexing services abstracts are the first 320 bytes of textual data in
> Office documents or the contents of the description metatag for html
> documents. If the html document does not have a description metatag, its the
> first 320 bytes of the html document.
> For sql you have to do something like this
> declare @.searchphrase varchar(20)
> declare @.MarkedupSearchPhrase varchar(20)
> set @.searchphrase='shadow'
> set @.MarkedupSearchPhrase ='<B>shadow</B>'
> select --1,firstword=len(@.searchphrase)-(len(substring(charcol,charindex(@.se
> archphrase,
> charcol)-50,100))-charindex('shadow',substring(charcol,charindex(@.se archphra
> se, charcol)-50,100))),
> --len(substring(charcol,charindex(@.searchphrase, charcol)-50,100)),
> MarkedUpColumn = case when len(charcol) >100 and
> len(@.searchphrase)-(len(substring(charcol,charindex(@.searchphrase,
> charcol)-50,100))-charindex(@.SearchPhrase,substring(charcol,charinde x(@.searc
> hphrase, charcol)-50,100))) <>-1 then
> replace(ltrim(substring(replace(substring(charcol, charindex(@.searchphrase,
> charcol)-50,100),@.searchphrase,@.MarkedupSearchPhrase),1,len (replace(substrin
> g(charcol,charindex(@.searchphrase,
> charcol)-50,100),@.searchphrase,@.MarkedupSearchPhrase))-
> len(left(reverse(replace(substring(charcol,charind ex(@.searchphrase,
> charcol)-50,100),@.searchphrase,@.MarkedupSearchPhrase)),char index('
> ',reverse(replace(substring(charcol,charindex(@.sea rchphrase,
> charcol)-50,100),@.searchphrase,@.MarkedupSearchPhrase))))))) ,char(13)+char(10
> ),' ')
> when len(charcol) >100 and
> len(@.searchphrase)-(len(substring(charcol,charindex(@.searchphrase,
> charcol)-50,100))-charindex(@.SearchPhrase,substring(charcol,charinde x(@.searc
> hphrase, charcol)-50,100))) =-1 then
> replace(substring(charcol,charindex(@.searchphrase,
> charcol)-50,100),@.searchphrase,@.MarkedupSearchPhrase)
> else replace(replace(substring(charcol,charindex(@.searc hphrase,
> charcol)-50,100),@.searchphrase,@.MarkedupSearchPhrase),char( 13)+char(10),' ')
> end
> from texttable where FreeText(*,@.searchphrase)
>
> this needs some more work as I have to correctly remove the first word or
> word fragment in the markedupColumn (as long as its not the search phrase).
> This also does inline hit highlighting.
>
> For "true" abstracts like what indexing services delivers you should use the
> left function.
>
> --
> Hilary Cotter
> Looking for a SQL Server replication book?
> http://www.nwsu.com/0974973602.html
> Looking for a FAQ on Indexing Services/SQL FTS
> http://www.indexserverfaq.com
> "Dave Hickson" <DaveHickson@.discussions.microsoft.com> wrote in message
> news:BEEC89DE-421F-4C0C-84A2-94ADA5B3B56C@.microsoft.com...
> the
> Full-Text
> Server
>
>
|||Not in this current version, or the next.
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"Dave Hickson" <DaveHickson@.discussions.microsoft.com> wrote in message
news:71E03E56-85DE-45DC-A148-342756EC7CC9@.microsoft.com...
> Thanks very much Hilary. I assume from your answer that there is
therefore[vbcol=seagreen]
> no built-in support for such abstracts?
> "Hilary Cotter" wrote:
the[vbcol=seagreen]
select --1,firstword=len(@.searchphrase)-(len(substring(charcol,charindex(@.se[vbcol=seagreen]
charcol)-50,100))-charindex('shadow',substring(charcol,charindex(@.se archphra[vbcol=seagreen]
charcol)-50,100))-charindex(@.SearchPhrase,substring(charcol,charinde x(@.searc[vbcol=seagreen]
replace(ltrim(substring(replace(substring(charcol, charindex(@.searchphrase,[vbcol=seagreen]
charcol)-50,100),@.searchphrase,@.MarkedupSearchPhrase),1,len (replace(substrin[vbcol=seagreen]
charcol)-50,100),@.searchphrase,@.MarkedupSearchPhrase))))))) ,char(13)+char(10[vbcol=seagreen]
charcol)-50,100))-charindex(@.SearchPhrase,substring(charcol,charinde x(@.searc[vbcol=seagreen]
charcol)-50,100),@.searchphrase,@.MarkedupSearchPhrase),char( 13)+char(10),' ')[vbcol=seagreen]
or[vbcol=seagreen]
phrase).[vbcol=seagreen]
the[vbcol=seagreen]
Server[vbcol=seagreen]
of[vbcol=seagreen]
such[vbcol=seagreen]
combine[vbcol=seagreen]
will[vbcol=seagreen]
column[vbcol=seagreen]