  1. Service Packs and Hot Fixes
  2. MNT-19079

High CPU triggered by org.apache.tika.parser.pdf.PDF2XHTML.extractImages

    Details

    • Bug Priority:
      Category 1
    • Escalated:
      Yes
    • ACT Numbers:

      00949455, 00946925, 00950837, 00960777, 00963817

      Description

      After upgrading Alfresco from 5.0 to 5.1 and re-indexing with Solr4, Alfresco reports high CPU utilisation (400%) and slow response times. The Alfresco Admin Console reports that Solr indexing is in progress. Admin Console > Support Tools > Hot Threads shows multiple threads with the following stack:

      <snip>
      http-bio-8443-exec-7 - priority:5 - threadId:0x00000000022a5800 - nativeId:0x7f19 - state:RUNNABLE
      stackTrace:
      java.lang.Thread.State: RUNNABLE
      at java.lang.Object.hashCode(Native Method)
      at java.util.HashMap.hash(HashMap.java:338)
      at java.util.HashMap.put(HashMap.java:611)
      at org.apache.pdfbox.pdmodel.PDResources.reverseMap(PDResources.java:658)
      at org.apache.pdfbox.pdmodel.PDResources.setXObjects(PDResources.java:332)
      at org.apache.pdfbox.pdmodel.PDResources.getXObjects(PDResources.java:269)
      at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:286)
      at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
      at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
      at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
      at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
      at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
      at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
      at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
      at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
      at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
      at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
      at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
      at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
      at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
      at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
      at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
      at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
      at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
      at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
      at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
      at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
      at org.apache.tika.parser.pdf.PDF2XHTML.extractImages(PDF2XHTML.java:288)
      at org.apache.tika.parser.pdf.PDF2XHTML.endPage(PDF2XHTML.java:220)
      at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:473)
      at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:395)
      at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:354)
      at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:117)
      at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:150)
      at org.alfresco.repo.content.transform.TikaPoweredContentTransformer.transformInternal(TikaPoweredContentTransformer.java:255)
      at org.alfresco.repo.content.transform.AbstractContentTransformer2.transform(AbstractContentTransformer2.java:266)
      at org.alfresco.repo.content.transform.AbstractContentTransformer2.transform(AbstractContentTransformer2.java:218)
      at org.alfresco.repo.web.scripts.solr.NodeContentGet.execute(NodeContentGet.java:213)
      </snip>

       
      The symptoms match the community-reported issue ALF-21970.
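
      The repeated extractImages frames in the stack above show the method recursing into nested resources. As a hedged illustration only, and not the Tika or PDFBox code, the sketch below shows how a naive recursive walk can revisit a shared nested object once per path, while a walk that remembers visited objects touches each one once. The Node class, buildSharedChain, countNaive and countOnce are hypothetical names invented for this example, and the assumption that the attached PDF shares nested resources in this way is not confirmed by this report.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class ResourceWalkSketch {

    /** Hypothetical stand-in for a resource dictionary that holds nested forms. */
    static final class Node {
        final List<Node> children = new ArrayList<>();
    }

    /** Builds a chain of the given depth in which every node references the next node twice. */
    static Node buildSharedChain(int depth) {
        Node current = new Node(); // leaf
        for (int i = 0; i < depth; i++) {
            Node parent = new Node();
            parent.children.add(current);
            parent.children.add(current); // second reference to the same object
            current = parent;
        }
        return current;
    }

    /** Naive recursion: a shared node is walked again for every path that reaches it. */
    static long countNaive(Node node) {
        long visits = 1;
        for (Node child : node.children) {
            visits += countNaive(child);
        }
        return visits;
    }

    /** Recursion with an identity-based visited set: each node is walked at most once. */
    static long countOnce(Node node, Set<Node> seen) {
        if (!seen.add(node)) {
            return 0; // already processed, skip the whole subtree
        }
        long visits = 1;
        for (Node child : node.children) {
            visits += countOnce(child, seen);
        }
        return visits;
    }

    public static void main(String[] args) {
        Node root = buildSharedChain(20);
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        System.out.println("naive visits: " + countNaive(root));
        System.out.println("deduplicated visits: " + countOnce(root, seen));
    }
}
```

      With a chain of depth d the naive walk makes 2^(d+1) - 1 visits, whereas the deduplicated walk makes d + 1, which is why a small document with deeply shared resources can keep a thread busy for a long time.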

      Steps to Reproduce

      1. Using Alfresco 5.2.2, upload the attached Tikka-Issue-Full-CPU.PDF to Alfresco Share.
      2. Use the top command to monitor the CPU usage of the Alfresco process. It will start to climb above 100%.
      3. In Admin Console > Support Tools > Hot Threads, capture a thread report every few seconds. The stack trace reported above will appear.
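
      As a complement to step 3, the check for the offending frames can be sketched in code. This is a minimal sketch using only the standard Thread.getAllStackTraces() call; the class name HotFrameScan and the helper countFrames are hypothetical. It inspects only the JVM it runs in, so to be useful it would have to execute inside the Alfresco JVM, which is an assumption and not something the Hot Threads tool requires.

```java
import java.util.Map;

public class HotFrameScan {

    /** Counts how many frames of the given stack belong to className.methodName. */
    static int countFrames(StackTraceElement[] stack, String className, String methodName) {
        int matches = 0;
        for (StackTraceElement frame : stack) {
            if (frame.getClassName().equals(className)
                    && frame.getMethodName().equals(methodName)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Class and method taken from the stack trace in this report.
        String className = "org.apache.tika.parser.pdf.PDF2XHTML";
        String methodName = "extractImages";

        // Snapshot of every live thread in the current JVM.
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            int depth = countFrames(entry.getValue(), className, methodName);
            if (depth > 0) {
                Thread thread = entry.getKey();
                System.out.println(thread.getName()
                        + " state=" + thread.getState()
                        + " extractImages frames=" + depth);
            }
        }
    }
}
```

      Running the scan every few seconds and seeing the same threads reported each time would correspond to what the Hot Threads page shows.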

      Actual Behaviour
      CPU will start to climb above 100%.

      Expected Behaviour
      CPU usage should not climb above 100%.

      Workaround
      Installing the alf-21970-repo-1.0.0.jar attached to ALF-21970 resolves this specific problem.
      As the attached screenshot named "CPU_usage_comparison.png" shows, CPU usage went down after applying the patch.
      To install the fix, stop Alfresco, copy the JAR file to <alfresco_home>/tomcat/endorsed, and then restart Alfresco.


              People

              • Assignee:
                closedbugs Closed Bugs
                Reporter:
                gcussen Gerald Cussen