[MNT-9799] support for solr replication, solr cluster Created: 21-Oct-13  Updated: 14-Aug-15  Resolved: 21-Jul-15

Status: Closed
Project: Service Packs and Hot Fixes
Component/s: Search and Indexing (non-UI)
Affects Version/s: 4.1.4
Fix Version/s: None

Type: Feature
Reporter: Alex Madon [X] (Inactive) Assignee: Closed Issues
Resolution: Fixed Votes: 12
Labels: Customer_Success
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: 1.4.bm0013.v418b2_slave.results_00.xlsx, bm-0010-tracking.xlsx, bm-0013-soak-tracking.xlsx
Issue Links:
Bug Priority: Category 2
ACT Numbers: 137453 Premier

Many customers are asking for support of the built-in Solr replication mechanism, available since Solr 1.4
(this is not the rsync-based command-line sync feature).
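
For readers unfamiliar with the built-in mechanism: it is driven over HTTP through Solr's ReplicationHandler rather than through rsync scripts. The sketch below is a minimal illustration, not Alfresco's implementation. It builds the URL for the handler's `details` command and compares the index version and generation reported by a master and a slave. The host name, core name and sample payloads are hypothetical, and the JSON field names (`indexVersion`, `generation`) are assumed from the `details` response and should be checked against the Solr version in use.

```python
import json
from urllib.parse import urlencode


def replication_url(core_url, command):
    """Build a ReplicationHandler URL for the given command, asking for JSON."""
    query = urlencode({"command": command, "wt": "json"})
    return core_url.rstrip("/") + "/replication?" + query


def parse_details(body):
    """Extract (indexVersion, generation) from a 'details' response body.

    Assumption: the response nests both fields under a top-level "details" key.
    """
    details = json.loads(body)["details"]
    return int(details["indexVersion"]), int(details["generation"])


def in_sync(master_body, slave_body):
    """True when master and slave report the same index version and generation."""
    return parse_details(master_body) == parse_details(slave_body)


if __name__ == "__main__":
    # Hypothetical host and core name, for illustration only.
    print(replication_url("http://solr-master:8080/solr/alfresco", "details"))
```

In practice the response bodies would be fetched from each node with an HTTP client; the parsing is kept separate here so it can be exercised without a running Solr instance.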

This ticket is aimed at tracking the list of customers asking for it.

1) http://wiki.alfresco.com/wiki/Alfresco_And_SOLR
states that "there is currently no slave replication support".
2) If you have three Solr nodes, having each of them run its own tracking puts unnecessary load on the SQL database. Solr replication makes more sense.
3) Many customers have been trying to set it up.
It seems that with 4.1.4 we are not far from having it work;
see ticket 137453.

The index replication works nicely, but we are faced with an issue of index cleanup on the slave:

2013-10-18 10:59:30,081 WARN [alfresco.solr.AlfrescoSolrEventListener] [pool-5-thread-1] Cache state error -> rebuilding
java.lang.IllegalStateException: New sub reader but no new docs ??
at org.alfresco.solr.AlfrescoSolrEventListener.buildCacheUpdateOperations(AlfrescoSolrEventListener.java:1063)
at org.alfresco.solr.AlfrescoSolrEventListener.newSearcher(AlfrescoSolrEventListener.java:325)
at org.apache.solr.core.SolrCore$4.call(SolrCore.java:1163)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
2013-10-18 10:59:59,674 WARN [solr.handler.SnapPuller] [pool-8-thread-1] The update handler is not an instance or sub-class of DirectUpdateHandler2. ReplicationHandler may not be able to cleanup unused index files.

We have tested adding, modifying and deleting content, and the slave Solr stays in sync with the master Solr. The only problem is that the slave cannot clean up unused index files, so its disk usage grows over time.
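
One way to quantify the growth described above is to measure the on-disk size of the slave's index directory at intervals and compare it with the master's. The following is a minimal sketch under that assumption; the function names and the example paths are hypothetical and are not taken from the ticket.

```python
import os


def directory_size_bytes(path):
    """Sum the sizes of all regular files below 'path'."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip entries that vanish or are not regular files (e.g. during a merge).
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def size_ratio(slave_bytes, master_bytes):
    """Slave index size relative to the master's; values well above 1.0 suggest leftover files."""
    if master_bytes <= 0:
        raise ValueError("master index size must be positive")
    return slave_bytes / master_bytes


if __name__ == "__main__":
    # Hypothetical index locations, for illustration only.
    slave = directory_size_bytes("/opt/solr-slave/index")
    master = directory_size_bytes("/opt/solr-master/index")
    if master > 0:
        print("slave/master size ratio: %.2f" % size_ratio(slave, master))
```

Run periodically (for example from cron), a steadily rising ratio would confirm that old index files are accumulating on the slave.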

What are Alfresco's plans to fully implement Solr replication? For the most part it already works really well.

Comment by Alfresco QA Team (Inactive) [ 12-Mar-14 ]

Issue was tested.
Upgrade took 51 minutes.
Reindex took 30 hours and 45 minutes.

The following performance degradation, compared to the previous run, was detected during the BM-0010 JMeter test:


4.1.7 build-37 https | 4.1.8 build-2 | 4.1.8 build-2_slave
282                  | 258           | 345
686                  | 773           | 1051
760                  | 842           | 896
1713                 | 1487          | 1669
1934                 | 1975          | 2240
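
To put the BM-0010 figures in proportion, the relative change of the slave run against each earlier build can be computed as below. This is a small worked example using only the numbers reported in this comment. The ticket does not state the unit, so the sketch assumes the values are timings where lower is better; the names `BM0010_ROWS`, `percent_change` and `slave_vs_builds` are illustrative.

```python
# Each row: (4.1.7 build-37 https, 4.1.8 build-2, 4.1.8 build-2_slave),
# copied from the BM-0010 results in this comment.
BM0010_ROWS = [
    (282, 258, 345),
    (686, 773, 1051),
    (760, 842, 896),
    (1713, 1487, 1669),
    (1934, 1975, 2240),
]


def percent_change(baseline, value):
    """Relative change of 'value' against 'baseline', in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (value - baseline) / baseline * 100.0


def slave_vs_builds(rows):
    """For each row, return (change vs 4.1.7 build-37, change vs 4.1.8 build-2), rounded to 0.1."""
    return [
        (round(percent_change(old, slave), 1), round(percent_change(new, slave), 1))
        for old, new, slave in rows
    ]


if __name__ == "__main__":
    for (old, new, slave), (vs_old, vs_new) in zip(BM0010_ROWS, slave_vs_builds(BM0010_ROWS)):
        print("slave=%d: %+.1f%% vs 4.1.7 b37, %+.1f%% vs 4.1.8 b2" % (slave, vs_old, vs_new))
```

Under that assumption, the slave run is slower than 4.1.8 build-2 in every row, by roughly 6% to 36%.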

The following tests failed during the BM-0013 run:

Suite name                | Total count | Success count | Failure count
share.startWebDrone       | 718         | 619           | 99
share.nav.documentLibrary | 1346        | 1336          | 10
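
The BM-0013 counts above translate into failure rates as follows. This is a short worked example using only the reported counts; the helper name `failure_rate` is illustrative.

```python
# (total, success, failure) per suite, copied from the BM-0013 results in this comment.
BM0013_SUITES = {
    "share.startWebDrone": (718, 619, 99),
    "share.nav.documentLibrary": (1346, 1336, 10),
}


def failure_rate(total, success, failure):
    """Failure percentage, after checking that the counts are consistent."""
    if total <= 0 or success + failure != total:
        raise ValueError("inconsistent counts")
    return failure / total * 100.0


if __name__ == "__main__":
    for suite, counts in BM0013_SUITES.items():
        print("%s: %.2f%% failed" % (suite, failure_rate(*counts)))
```

That is roughly 13.8% of share.startWebDrone runs and 0.7% of share.nav.documentLibrary runs.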

In general, results are not bad.

Please see the attachments for the results; the logs can be found at the paths below:
1. Reindex: BM-0010: ms1: data/replicate/bm0013/logs/solr_reindex_418b2_slave
2. BM-0010: ms1: data/replicate/bm0013/logs/jmeter_418b2_slave
3. BM-0013: BM-0010: ms1: data/replicate/bm0013/logs/soak_V418b2_slave

Snezhana Z.

Generated at Thu Jun 24 01:04:59 BST 2021 using Jira 7.13.15#713015-sha1:7c5ddd2c3e1709974ae9c48c17df8edd3919fe2c.