- Type: Service Pack Request
- Status: Closed
- Resolution: Fixed
- Affects Version/s: 5.0.4, 5.1.3, 5.2.2
- Component/s: Tika, POI, and Metadata Extraction
- Labels: None
- Environment: Alfresco 5.1.3.2 Enterprise
- Bug Priority:
- ACT Numbers: 00949455, 00946925, 00950837, 00960777, 00963817
After upgrading Alfresco from 5.0 to 5.1 and re-indexing with Solr4, Alfresco starts reporting high CPU utilisation (400%) and slow response. The Alfresco Admin Console reports Solr indexing in progress. Checking Admin Console > Support Tools > Hot Threads shows multiple examples of the following stack:
<snip>
http-bio-8443-exec-7 - priority:5 - threadId:0x00000000022a5800 - nativeId:0x7f19 - state:RUNNABLE
stackTrace:
java.lang.Thread.State: RUNNABLE
    at java.lang.Object.hashCode(Native Method)
    at java.util.HashMap.hash(HashMap.java:338)
    at java.util.HashMap.put(HashMap.java:611)
    at org.apache.pdfbox.pdmodel.PDResources.reverseMap(PDResources.java:658)
    at org.apache.pdfbox.pdmodel.PDResources.setXObjects(PDResources.java:332)
    at org.apache.pdfbox.pdmodel.PDResources.getXObjects(PDResources.java:269)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:286)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
    at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
    at org.apache.tika.parser.pdf.PDF2XHTML.endPage(PDF2XHTML.java:220)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:473)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:395)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:354)
    at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:117)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:150)
    at org.alfresco.repo.content.transform.TikaPoweredContentTransformer.transformInternal(TikaPoweredContentTransformer.java:255)
    at org.alfresco.repo.content.transform.AbstractContentTransformer2.transform(AbstractContentTransformer2.java:266)
    at org.alfresco.repo.content.transform.AbstractContentTransformer2.transform(AbstractContentTransformer2.java:218)
    at org.alfresco.repo.web.scripts.solr.NodeContentGet.execute(NodeContentGet.java:213)
</snip>
The symptoms match the community-reported issue ALF-21970.
Steps to Reproduce
- Using Alfresco 5.2.2, upload the attached Tikka-Issue-Full-CPU.PDF to Alfresco Share.
- Use the top command to monitor Alfresco CPU use. It will start to climb above 100%.
- On the Admin Console > Support Tools > Hot Threads capture a thread report every few seconds. The stack trace reported above will be seen.
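The repeated thread captures in the last step can be approximated with the standard `Thread.getAllStackTraces()` call. The following is only a minimal sketch: the class name `HotFrameSampler` and its methods are hypothetical, and it assumes the code runs inside the same JVM as the threads being inspected (it cannot see another process's threads, so it is not a replacement for the Hot Threads tool).

```java
import java.util.Map;

// Hypothetical helper, not part of Alfresco or Tika.
class HotFrameSampler {

    // Count how many frames of one stack trace belong to the given class and method.
    static int countFrames(StackTraceElement[] stack, String className, String methodName) {
        int count = 0;
        for (StackTraceElement frame : stack) {
            if (frame.getClassName().equals(className)
                    && frame.getMethodName().equals(methodName)) {
                count++;
            }
        }
        return count;
    }

    // Print each RUNNABLE thread of this JVM whose stack contains the frame at least once.
    static void report(String className, String methodName) {
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            int depth = countFrames(e.getValue(), className, methodName);
            if (depth > 0 && e.getKey().getState() == Thread.State.RUNNABLE) {
                System.out.printf("%s: %d frames of %s.%s%n",
                        e.getKey().getName(), depth, className, methodName);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Take three samples one second apart, mirroring the manual capture above.
        for (int i = 0; i < 3; i++) {
            report("org.apache.tika.parser.pdf.PDF2XHTML", "extractImages");
            Thread.sleep(1000);
        }
    }
}
```

A thread that shows a large and persistent `extractImages` frame count across samples would correspond to the stack reported in this issue.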
Actual Behaviour
CPU usage climbs above 100%.
Expected Behaviour
CPU usage should not climb above 100%.
Workaround
Installing the alf-21970-repo-1.0.0.jar attached to ALF-21970 resolves this specific problem.
As the attached screenshot named "CPU_usage_comparison.png" shows, CPU usage went down after applying the patch.
To install the fix, stop Alfresco, copy alf-21970.jar to <alfresco_home>/tomcat/endorsed and then restart Alfresco.
- relates to ALF-21970 Parsing a PDF freezes the system due to CPU consumption (Tika related issue) (Closed)