      Description

      As part of MNT-11225, Tika and its transitive dependencies should be upgraded to include TIKA-1278.

      Library           Current 4.2.N                  Tika 1.6-SNAPSHOT              Notes
      asm               3.1                            4.1                            See MNT-9291
      commons-compress  1.4.1                          1.8
      fontbox           1.8.2                          1.8.4
      java-libpst       -                              0.7                            License: Apache 2
      jdom              -                              1.0                            Dependency of rome; may not be needed
      jempbox           1.8.2                          1.8.4
      jhighlight        -                              1.0                            License: CDDL/LGPL
      pdfbox            1.8.2-alfresco-patched         1.8.4
      poi-*             3.10-beta2-20130720            3.10-FINAL
      tika-*            1.5-20130720-alfresco-patched  1.6-yyyyMMdd-alfresco-patched  Patched to use asm 3.1; see MNT-9291
      vorbis-*          0.3-20130206                   0.4
      xz                1.2                            1.5

      Those that are a little concerning are in bold above.

      PDFBox 1.8.4 must be patched again because our changes have not been incorporated into that release.

      Tika must also be patched to downgrade asm to 3.1, since a change to cglib could affect several other dependencies. (See MNT-9291 and the comment on BDE-266.)

      Tika has added the ability to parse artifacts embedded in PDFs (TIKA-1268). This additional parsing can be quite resource-intensive for some PDFs, particularly one used in our PDF content transformer tests. A way to disable the parsing of embedded attachments via configuration must be developed, and the parsing of embedded images in PDFs should be disabled by default.
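
      The configuration requirement above could be sketched roughly as follows. This is only an illustration, not the actual implementation: the class name, the property keys, and the choice to leave attachment parsing enabled by default are all assumptions made for the example, and wiring the flags into the Tika PDF parser is left out. The one default taken from this issue is that embedded image parsing is off unless explicitly enabled.

```java
import java.util.Properties;

/**
 * Hypothetical settings holder for embedded-content parsing in PDFs.
 * The class and property names are illustrative only; they are not
 * part of Tika or of any existing Alfresco configuration.
 */
final class PdfEmbeddedParsingConfig {
    // Hypothetical property keys (assumptions for this sketch).
    static final String ATTACHMENTS_KEY = "pdf.parseEmbeddedAttachments";
    static final String IMAGES_KEY = "pdf.parseEmbeddedImages";

    private final boolean parseEmbeddedAttachments;
    private final boolean parseEmbeddedImages;

    private PdfEmbeddedParsingConfig(boolean attachments, boolean images) {
        this.parseEmbeddedAttachments = attachments;
        this.parseEmbeddedImages = images;
    }

    /** Reads both flags from the given properties, applying defaults. */
    static PdfEmbeddedParsingConfig fromProperties(Properties props) {
        // Assumed default: attachments are parsed unless switched off.
        boolean attachments =
                Boolean.parseBoolean(props.getProperty(ATTACHMENTS_KEY, "true"));
        // Per this issue: embedded images are NOT parsed by default.
        boolean images =
                Boolean.parseBoolean(props.getProperty(IMAGES_KEY, "false"));
        return new PdfEmbeddedParsingConfig(attachments, images);
    }

    boolean parseEmbeddedAttachments() {
        return parseEmbeddedAttachments;
    }

    boolean parseEmbeddedImages() {
        return parseEmbeddedImages;
    }

    public static void main(String[] args) {
        // Example: an administrator disables attachment parsing entirely.
        Properties props = new Properties();
        props.setProperty(ATTACHMENTS_KEY, "false");

        PdfEmbeddedParsingConfig config = fromProperties(props);
        System.out.println("attachments=" + config.parseEmbeddedAttachments());
        System.out.println("images=" + config.parseEmbeddedImages());
    }
}
```

      Keeping the two flags separate lets the expensive image parsing stay off by default while attachment parsing can be switched off only for deployments that need it.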

                People

                • Assignee: Closed Bugs (Inactive)
                • Reporter: Ray Gauss [X] (Inactive)