Question about lucene.DocNumberCache and initializeHierarchyCache

Uittenbroek, R.M.

2018-10-17 10:44:29 UTC

Hello,

I hope this is the right list for my question.

We are running a CMS on Jackrabbit 2.16.2 and have been using JCR for
years. After startup and initialisation, the first query that is run takes
a very long time (over 1 minute). From what I can see in the logs, the
lucene.DocNumberCache is consulted during that query and is being filled
because it was empty.

We run this query and get the following log output:

2018-10-17 10:48:54,629 INFO [18 ] jcr.JcrSearch - open: xpath query =
/jcr:root/webplatform/www.rug.nl//element(*,
nt:file)/jcr:content[(@cms:vParentLC =
'/education/international-student-blog' and
@fs:id != '12b506d7-130c-4643-8d0a-b684890bf946-33.14') and
(((not(@cms:publicationStart) and @cms:created <
xs:dateTime('2018-10-17T10:48:00.000+02:00')) or @cms:publicationStart <
xs:dateTime('2018-10-17T10:48:00.000+02:00')) and
(not(@cms:publicationEnd) or @cms:publicationEnd >=
xs:dateTime('2018-10-17T10:48:00.000+02:00'))) and (@cms:type =
'blogEntry')]/(@cms:type) order by @jcr:lastModified descending

2018-10-17 10:48:55,575 INFO [18 ] lucene.DocNumberCache -
size=7/1000000, #accesses=3149, #hits=3149, #misses=0, cacheRatio=100%
2018-10-17 10:49:05,576 INFO [18 ] lucene.DocNumberCache -
size=37575/1000000, #accesses=1746225, #hits=1261440, #misses=484785,
cacheRatio=73%
2018-10-17 10:49:15,577 INFO [18 ] lucene.DocNumberCache -
size=99950/1000000, #accesses=616478, #hits=0, #misses=616478, cacheRatio=0%
2018-10-17 10:49:25,578 INFO [18 ] lucene.DocNumberCache -
size=129659/1000000, #accesses=625557, #hits=0, #misses=625557,
cacheRatio=0%
2018-10-17 10:49:35,579 INFO [18 ] lucene.DocNumberCache -
size=151664/1000000, #accesses=608407, #hits=3, #misses=608404,
cacheRatio=1%
2018-10-17 10:49:45,580 INFO [18 ] lucene.DocNumberCache -
size=170923/1000000, #accesses=608628, #hits=0, #misses=608628,
cacheRatio=0%
2018-10-17 10:49:55,581 INFO [18 ] lucene.DocNumberCache -
size=187612/1000000, #accesses=613109, #hits=0, #misses=613109,
cacheRatio=0%
2018-10-17 10:50:05,582 INFO [18 ] lucene.DocNumberCache -
size=201976/1000000, #accesses=593435, #hits=0, #misses=593435,
cacheRatio=0%
2018-10-17 10:50:15,583 INFO [18 ] lucene.DocNumberCache -
size=216065/1000000, #accesses=607567, #hits=0, #misses=607567,
cacheRatio=0%
2018-10-17 10:50:20,331 INFO [18 ] jcr.JcrSearch - open: number of
nodes found = 107
2018-10-17 10:50:20,331 INFO [18 ] query.Search - Query performed with
guest=true

As you can see from the timestamps, this process takes a very long time
(roughly 85 seconds from the first log line to the last).
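
For anyone who wants to check the numbers, here is a minimal sketch in plain
Java that reads the elapsed time and the miss counts out of log lines like
the ones above. The class and method names are hypothetical helpers of my
own, not part of Jackrabbit, and the sketch assumes the timestamp layout
shown in the excerpt.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for reading the log excerpt; not part of Jackrabbit. */
class DocNumberCacheLogStats {
    // Timestamp layout as it appears in the log, e.g. "2018-10-17 10:48:54,629".
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");
    private static final int TS_LENGTH = 23;
    private static final Pattern MISSES = Pattern.compile("#misses=(\\d+)");

    /** Milliseconds between the timestamps that start two log lines. */
    static long elapsedMillis(String startLine, String endLine) {
        LocalDateTime start = LocalDateTime.parse(startLine.substring(0, TS_LENGTH), TS);
        LocalDateTime end = LocalDateTime.parse(endLine.substring(0, TS_LENGTH), TS);
        return Duration.between(start, end).toMillis();
    }

    /** Value of the "#misses=" field in a log line, or -1 if the field is absent. */
    static long misses(String logLine) {
        Matcher m = MISSES.matcher(logLine);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    public static void main(String[] args) {
        String first = "2018-10-17 10:48:54,629 INFO [18 ] jcr.JcrSearch - open: xpath query =";
        String last = "2018-10-17 10:50:20,331 INFO [18 ] jcr.JcrSearch - open: number of";
        System.out.println(elapsedMillis(first, last) + " ms"); // prints "85702 ms"
    }
}
```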

From the docs http://jackrabbit.apache.org/jcr/index-readers.html I see "In
order to speed up lookups by UUID the CachingMultiIndexReader also has a
DocNumberCache. This cache uses a LRU algorithm to keep a limited amount of
UUID to document number mappings." So I assume the DocNumberCache is still
empty when the first query is run.
In another document I read: "initializeHierarchyCache: With the default
value of true the hierarchy cache is initialized on startup and control is
only given back when the initialization has completed. When set to false
the cache is populated during regular use." We use the default, true.
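
To make clear what I understand by the LRU behaviour in the first quote,
here is a minimal sketch of an LRU map from UUID to document number, with
the same kind of access and miss counters that show up in the log. This is
only my mental model, built on java.util.LinkedHashMap. The class name and
methods are hypothetical and this is not Jackrabbit's actual DocNumberCache
implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache from node UUID to document number; not Jackrabbit code. */
class UuidDocNumberLruCache {
    private final LinkedHashMap<String, Integer> map;
    private long accesses;
    private long misses;

    UuidDocNumberLruCache(final int maxSize) {
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.map = new LinkedHashMap<String, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
                // Evict the least recently used entry once the limit is exceeded.
                return size() > maxSize;
            }
        };
    }

    /** Returns the cached document number, or null on a miss. */
    Integer get(String uuid) {
        accesses++;
        Integer docNumber = map.get(uuid);
        if (docNumber == null) {
            misses++;
        }
        return docNumber;
    }

    void put(String uuid, int docNumber) {
        map.put(uuid, docNumber);
    }

    int size() { return map.size(); }
    long accesses() { return accesses; }
    long misses() { return misses; }

    public static void main(String[] args) {
        UuidDocNumberLruCache cache = new UuidDocNumberLruCache(2);
        cache.get("uuid-a");              // miss: the cache starts out empty
        cache.put("uuid-a", 1);
        cache.put("uuid-b", 2);
        cache.get("uuid-a");              // hit: uuid-a becomes most recently used
        cache.put("uuid-c", 3);           // evicts uuid-b, the least recently used
        System.out.println(cache.get("uuid-b"));  // prints "null"
        System.out.println(cache.misses() + "/" + cache.accesses());  // prints "2/3"
    }
}
```

In this model an empty cache produces only misses until it has been filled
by lookups, which matches the cacheRatio=0% lines in the log above.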

So, I would assume at startup this 'DocNumberCache' would be filled with
all hierarchy information and would be 'ready to go' when we start doing
queries. But this does not seem to be the case.

Am I doing something wrong or missing something, or is there another
parameter I need to set for this to work? I would really like this
DocNumberCache to be fully populated before JCR becomes 'available'.

Thanks for your help,

Kind regards,

Robbert Uittenbroek