Discussion:
Jackrabbit Performance Tuning - Large "Transaction" & Concurrent Access to Repository
Daniel Bloomfield Ramagem
2007-01-24 22:33:52 UTC
Permalink
I have successfully imported over 4000 files / 96 MB over a single
Jackrabbit session with no problem, using the default repository settings.

I have a question regarding performance: I observed that during this large
import (which is pretty intensive and takes at least a minute on a fast
machine) access to the same repository for other content, via another
separate session, appears to block. If I previously accessed that content,
it seems to be cached and access to it is instant; if it is newly accessed
content, my read blocks.

Is there some sort of tuning of the repository settings that would improve
this concurrent access to the repository? Or must I break the import down
into "chunks" so as not to create one large "commit" (e.g., one
"session.save()")? Note that I require this atomic import behavior.

Thanks,

Daniel.
Marcel Reutegger
2007-01-25 10:06:43 UTC
Permalink
Hi Daniel,

this is a known issue / limitation with jackrabbit.

See: http://issues.apache.org/jira/browse/JCR-314
Post by Daniel Bloomfield Ramagem
Is there some sort of tuning of the repository settings that would improve
this concurrent access of the repository?
no, there is not.
Post by Daniel Bloomfield Ramagem
Or must I break down the amount
being imported into "chunks" so as not to create one large "commit" ( e.g.,
"session.save()")? Do note that I require this atomic import behavior.
well, you can break your import into several chunks and save them separately,
but then the import will not be atomic :-/

regards
marcel
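Marcel's suggestion of saving in chunks can be sketched as below. This is a minimal illustration of the pattern, not Jackrabbit code: the real loop would call session.save() on a javax.jcr.Session after adding each batch of nodes, and here the save call is stubbed as a Runnable so the sketch runs standalone. The chunk size of 500 is an arbitrary assumption, not a Jackrabbit recommendation, and as Marcel notes each chunk commits independently, so the import as a whole is no longer atomic.

```java
import java.util.ArrayList;
import java.util.List;

// Chunked-import pattern: instead of one session.save() after all 4000
// files, call save() after every CHUNK_SIZE nodes so each commit (and the
// workspace-wide lock it holds) stays short.
public class ChunkedImport {

    static final int CHUNK_SIZE = 500; // arbitrary illustrative value

    // Adds each file, saving after every CHUNK_SIZE items.
    // Returns the number of save() calls made.
    static int importInChunks(List<String> files, Runnable save) {
        int saves = 0;
        int pending = 0;
        for (String file : files) {
            // real code: Node n = parent.addNode(file); n.setProperty(...);
            pending++;
            if (pending == CHUNK_SIZE) {
                save.run();          // real code: session.save()
                saves++;
                pending = 0;
            }
        }
        if (pending > 0) {           // flush the final partial chunk
            save.run();
            saves++;
        }
        return saves;
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 4000; i++) files.add("file-" + i);
        int saves = importInChunks(files, () -> {});
        System.out.println("save() calls: " + saves); // prints 8
    }
}
```

With 4000 files and a chunk size of 500 this performs 8 separate commits, each of which blocks concurrent uncached reads only briefly instead of for the whole import.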
Daniel Bloomfield Ramagem
2007-01-25 14:13:58 UTC
Permalink
Hi Marcel,

Thanks for the link to the JIRA issue. It says there that all store
operations will be serialized and that trying to read something that is
being stored will be blocked. But I seem to be experiencing a block even
when I try to read something that was previously in the repository, before
the import.

For example, suppose I had previously created node A below. I then begin
importing B, C, D, ... While that is happening a separate thread creates a
new session and tries to get A. It will get blocked until the import of
tree B is finished storing in the repository.

Workspace
   /   \
  A     B
       / \
      C   ...

However I have also observed that if A has been previously accessed (before
the large store operation on B) then it will be available to my concurrent
read thread because of caching (?).

Does that seem like the behavior you'd expect from Jackrabbit? I haven't
done any strict testing and have been just informally testing these things.

Thanks,

Daniel.
Marcel Reutegger
2007-01-26 08:56:42 UTC
Permalink
Hi Daniel,
Post by Daniel Bloomfield Ramagem
For example, suppose I had previously created node A below. I then begin
importing B, C, D, ... While that is happening a separate thread creates a
new session and tries to get A. It will get blocked until the import of
tree B is finished storing in the repository.
Workspace
   /   \
  A     B
       / \
      C   ...
The following page has a nice little picture that shows the layering of the
jackrabbit core:
http://jackrabbit.apache.org/doc/arch/operate/index.html

what the jira issue describes is a lock in the shared item state manager, which
is workspace wide. when a change is committed this shared item state manager is
locked and not even a read will go through it to retrieve an item from the
underlying persistence manager. writes are also serialized, one at a time.
Post by Daniel Bloomfield Ramagem
However I have also observed that if A has been previously accessed (before
the large store operation on B) then it will be available to my concurrent
read thread because of caching (?).
yes, that's because of the item state manager layered on top of the shared item
state manager. those are per session and contain a cache. that's why certain
sessions are still able to read A while a change by another session is
committed: those sessions have A in their cache.
Post by Daniel Bloomfield Ramagem
Does that seem like the behavior you'd expect from Jackrabbit? I haven't
done any strict testing and have been just informally testing these things.
yes, that's what you can expect from jackrabbit right now.

regards
marcel
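The layering Marcel describes can be modeled with a toy example. This is not Jackrabbit code, just a sketch of the mechanism: a workspace-wide store guarded by a read-write lock that a commit holds exclusively, with a per-session cache layered on top. A read served from the session cache never touches the shared lock, which is why a previously read node like A stays readable during a large commit, while a first-time read must wait.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy model of the two item state manager layers: a shared, lock-guarded
// store (standing in for the SharedItemStateManager) and a per-session
// cache (standing in for the session-local item state manager).
public class LayeredRead {
    static final ReentrantReadWriteLock sharedLock = new ReentrantReadWriteLock();
    static final Map<String, String> sharedStore = new HashMap<>();

    static class Session {
        final Map<String, String> cache = new HashMap<>();

        String getNode(String path) {
            String cached = cache.get(path);
            if (cached != null) return cached;   // cache hit: no lock taken
            sharedLock.readLock().lock();        // blocks while a commit holds the write lock
            try {
                String v = sharedStore.get(path);
                if (v != null) cache.put(path, v);
                return v;
            } finally {
                sharedLock.readLock().unlock();
            }
        }
    }

    public static void main(String[] args) {
        sharedStore.put("/A", "node A");
        Session reader = new Session();
        reader.getNode("/A");                    // first read warms the session cache

        sharedLock.writeLock().lock();           // simulate a long-running commit
        // Served from the session cache, so it does not block:
        System.out.println(reader.getNode("/A")); // prints "node A"
        sharedLock.writeLock().unlock();
    }
}
```

A session that had never read /A would instead try to acquire the read lock inside getNode() and wait until the simulated commit releases the write lock, which matches the blocking Daniel observed.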
