STEPS TO REPRODUCE:
1) Create 15k sites.
2) Create 350 users.
3) Add all users to every site.
4) Delete 100 sites and upload a couple of documents. This causes all users to be re-indexed, and Solr is stuck for many hours. New documents cannot be searched until Solr has caught up with the transactions generated by the deletion.
Expected behaviour: new documents should be searchable shortly after being added to the system (timescale of minutes).
In the dev environment this took approximately 2 hours per deleted site; on their prod environment, the impact of the whole deletion lasted 5 days.
During indexing there was a high network load between Solr and Alfresco (2 to 10 MB/s). We enabled Solr debug logging for a short time (see attachment). It seems that Solr indexes every possible path for each user. For each site there are 6 paths per user:
In our case this means each user has 6 * 15,000 = 90,000 paths for the site groups alone. There will be additional paths for any other groups the user is a member of. The same applies to the configurations/preferences child nodes of each user (2 extra nodes per user), which gives 3 nodes per user. For 350 users and 15k sites, this means more than 350 * 3 * 6 * 15,000 = 94.5 million path fields have to be indexed. We suspect this is the cause of the very slow indexing we are seeing.
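
The estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in this report: the function name and its parameters are made up for illustration, and it assumes, as described above, 6 paths per site and 3 affected nodes per user (the user node plus the 2 configurations/preferences child nodes). Paths from other group memberships are not counted, so the result is a lower bound.

```python
def estimate_path_fields(users, sites, paths_per_site=6, nodes_per_user=3):
    """Lower-bound estimate of path fields Solr has to index.

    Hypothetical helper for illustration only. Assumes every user is a
    member of every site, 6 paths per site per node, and 3 nodes per user
    (the user node plus 2 configurations/preferences child nodes).
    """
    paths_per_node = paths_per_site * sites
    return users * nodes_per_user * paths_per_node


# Paths per user for the site groups alone: 6 * 15,000
paths_per_user = 6 * 15_000
print(paths_per_user)  # 90000

# Total for 350 users and 15k sites: 350 * 3 * 6 * 15,000
total = estimate_path_fields(users=350, sites=15_000)
print(total)  # 94500000
```

Because the total is the product of users and sites, each additional site adds 350 * 3 * 6 = 6,300 path fields under these assumptions.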