Discussion:
Does jcr:like have a wildcard like jcr:contains?
Juan Huang
2008-01-28 19:20:30 UTC
In jcr:contains, I can use "." to match any attribute of the node, like
/pages/*[jcr:contains(., 'Special')]
But with jcr:like I have to give the attribute name. I can only use
/pages/*[jcr:like(@title, '%Special%')]; I can't use something like
/pages/*[jcr:like(@*, '%Special%')] to find all the nodes that have any
attribute like %Special%. Is there any way to accomplish this?

Thanks.

Juan
--
View this message in context: http://www.nabble.com/does-jcr%3Alike-has-a-wildcard-like-jcr%3Acontain--tp15143136p15143136.html
Sent from the Jackrabbit - Users mailing list archive at Nabble.com.
Ard Schrijvers
2008-01-28 20:14:34 UTC
Hello Juan,

If you want to find all nodes that contain 'Special' in one of their
properties, "/pages/*[jcr:contains(., 'Special')]" is the way to do it
(at the node scope level all properties are added to the index, unless
you configure otherwise in the IndexingConfiguration [1]). I never tried
/pages/*[jcr:like(@*, '%Special%')], but I assume it does not work. Even
if we implemented the XPath to work, having a '%' prefix in jcr:like
is bad practice; see [2] for an explanation of some queries. I hope to
find some time soon to create a wiki page out of it.

Regards Ard

[1] http://wiki.apache.org/jackrabbit/IndexingConfiguration
[2]
http://mail-archives.apache.org/mod_mbox/jackrabbit-users/200801.mbox/%3
Juan Huang
2008-01-28 20:30:07 UTC
Thanks for replying so quickly!

The reason I used jcr:like is that I need to do a plain substring find, not a
full-text search. For example, if the property value is "divided" and I
search for the keyword "div", jcr:like will match "divided", but
jcr:contains won't.
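
To make the difference concrete, here is a rough Python sketch (not Jackrabbit code) of the two matching behaviours described above. The function names are made up for illustration, and the token matcher is only an approximation: it ignores the stemming and stop-word handling a real Lucene analyzer may apply.

```python
import re


def substring_match(value, keyword):
    """Rough analogue of jcr:like(@prop, '%keyword%'): match anywhere in the value."""
    return keyword in value


def whole_token_match(value, keyword):
    """Rough analogue of a full-text term search: match complete tokens only."""
    tokens = re.findall(r"\w+", value.lower())
    return keyword.lower() in tokens


print(substring_match("divided", "div"))    # True: "div" occurs inside "divided"
print(whole_token_match("divided", "div"))  # False: the only token is "divided"
```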

Juan
Ard Schrijvers
2008-01-28 21:39:25 UTC
jcr:contains can handle Lucene syntax, so it can also run the wildcard query
you are suggesting. But at [1] you can already see that a wildcard prefix
will be very expensive (though it depends on the number of different values
you have for a property). For jcr:like, a query with a wildcard prefix is
even *much* more expensive than for jcr:contains.
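
As an illustration of the two wildcard conventions, the following Python sketch builds the XPath statement strings for both functions. The helper names are hypothetical, and the quoting assumes the XPath 2.0 convention of doubling an embedded single quote; characters that are special in the full-text syntax are not escaped here. In a Java client the resulting string would be passed to QueryManager.createQuery with the XPath query language.

```python
def xpath_literal(text):
    """Quote text as an XPath string literal (assumes '' escapes a single quote)."""
    return "'" + text.replace("'", "''") + "'"


def contains_query(path, keyword):
    """Full-text search over all indexed properties; '*' is the wildcard."""
    return f"{path}[jcr:contains(., {xpath_literal(keyword + '*')})]"


def like_query(path, prop, keyword):
    """Pattern match on one named property; '%' is the wildcard."""
    return f"{path}[jcr:like(@{prop}, {xpath_literal('%' + keyword + '%')})]"


print(contains_query("/pages/*", "div"))
print(like_query("/pages/*", "title", "Special"))
```

The jcr:contains variant uses only a trailing wildcard, which avoids the expensive wildcard-prefix case; the jcr:like variant reproduces the '%Special%' pattern from the original question, including its costly leading '%'.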

I am quite sure, though, that you can achieve what you want in a
different way. What is the use case?

-Ard

[1] http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/WildcardQuery.html

PS: the link seems to be down at the time of writing.
Ard Schrijvers
2008-01-28 21:49:49 UTC
To be sure: by "Lucene syntax" I mean that the wildcard in jcr:contains
is * and not %.
Juan Huang
2008-01-29 00:08:15 UTC
The use case is that our users want something like the find and replace
in Word, so we have to match any nodes or properties in the repository
that contain the keyword, even if it is just part of a word. There is then
a UI page where the users choose which matches they want to replace.

Juan
Ard Schrijvers
2008-01-29 08:19:44 UTC
Hello,
For this use case you must use jcr:contains because it will be much less
slow. You might only run into some problems with Lucene stemming
and the removal of stop words. Stemming (the process of reducing inflected
(or sometimes derived) words to their stem, base or root form) means,
for example, that "stemmer", "stemmed" and "stemming" are all reduced to
"stem". Now, with jcr:contains, if you look for "stemm*" you probably do
not get a hit (while for "stemm" you might). With jcr:like you do not
have this problem, because these fields are not tokenized, but this
directly results in extremely slow queries (don't be surprised to
see queries over 1000 nodes taking > 10 sec with a % prefix in jcr:like).

Anyhow, I think your use case requires deep knowledge of Lucene to
implement well (perhaps working with an extra index containing n-gram
tokens), and also of Jackrabbit: what if a user can alter something like
50,000 nodes with one click? You'll have some performance issues there.
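
The n-gram idea can be sketched in a few lines of Python. This is only a toy in-memory illustration under stated assumptions, not how Jackrabbit or Lucene would implement it: the function names, the node paths and the choice of n = 3 are all made up for the example. Each word is split into overlapping character n-grams, so a partial keyword such as "div" becomes an exact lookup instead of a wildcard scan.

```python
def char_ngrams(token, n=3):
    """Return the overlapping character n-grams of a lower-cased token."""
    token = token.lower()
    if len(token) < n:
        return [token]
    return [token[i:i + n] for i in range(len(token) - n + 1)]


def build_ngram_index(values, n=3):
    """Map each n-gram to the set of node paths whose text contains it."""
    index = {}
    for node_path, text in values.items():
        for word in text.split():
            for gram in char_ngrams(word, n):
                index.setdefault(gram, set()).add(node_path)
    return index


def candidates(index, keyword, n=3):
    """Node paths containing every n-gram of the keyword (still to be verified)."""
    gram_sets = [index.get(gram, set()) for gram in char_ngrams(keyword, n)]
    return set.intersection(*gram_sets)


# Hypothetical property values keyed by node path.
index = build_ngram_index({
    "/pages/a": "The sum is divided",
    "/pages/b": "nothing special",
})
print(sorted(candidates(index, "div")))  # ['/pages/a']
```

The result is a candidate set only: sharing all n-grams does not guarantee that they are contiguous in the text, so each candidate should still be checked with a plain substring test before it is offered for replacement. Keywords shorter than n also need separate handling.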

-Ard