Service Packs and Hot Fixes / MNT-13882

CLONE - EMLTransformer ignoring multipart emails

    Details

    • Type: Bug
    • Status: Closed
    • Resolution: Fixed
    • Affects Version/s: 4.2, 5.0
    • Fix Version/s: 5.0.2
    • Component/s: Transformations
    • Labels:
    • Environment:
      Tomcat 7.0.47, macOS or Ubuntu 14.04; H2 or PostgreSQL database.

      Description

      The transformer for RFC 822 messages, EMLTransformer.java, has a severe bug that hurts performance for anyone who stores a lot of email.
      Transforming a multipart email always returns the entire message, including the base64 text of its attachments.

      • Indexing: the base64-encoded attachment text is indexed as if it were plain text. A client of mine with 100,000+ emails could enter pretty much any character combination and get a hit, and the index grew to over 300 GB.
      • Preview: the preview of an EML file can run to 300+ pages in the PdfJS viewer, because the base64 attachment text is displayed.

      How to reproduce

      • Create an email with an HTML body and at least one attachment.
      • Create a folder with a rule that transforms content to plain text.
      • Save the email as an EML file and drop it into that folder in Alfresco.
        Expected: only the message text shows up.
        Actual: the text plus encoding keys are present, and the attachment is visible as base64.
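
      To produce a test email for the steps above without a mail client, a message like the one below can be generated. This is a minimal sketch under stated assumptions: the class SampleEml, the boundary string, the addresses and the file name are all hypothetical, and only the JDK is used (java.util.Base64 for the attachment). The output is a multipart/mixed message with an HTML body and one base64 attachment, which is the shape of email the report describes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds a minimal multipart/mixed RFC 822 message with an
// HTML body and one base64-encoded attachment. Names and addresses are made up.
final class SampleEml {
    static final String BOUNDARY = "sample-boundary-42";
    private static final String CRLF = "\r\n";

    static String build(String html, String fileName, byte[] attachment) {
        // The MIME encoder wraps base64 output at 76 characters using CRLF.
        String b64 = Base64.getMimeEncoder().encodeToString(attachment);
        return "From: sender@example.com" + CRLF
            + "To: recipient@example.com" + CRLF
            + "Subject: Multipart test" + CRLF
            + "MIME-Version: 1.0" + CRLF
            + "Content-Type: multipart/mixed; boundary=\"" + BOUNDARY + "\"" + CRLF
            + CRLF
            // Part 1: the HTML body.
            + "--" + BOUNDARY + CRLF
            + "Content-Type: text/html; charset=UTF-8" + CRLF
            + CRLF
            + html + CRLF
            // Part 2: the attachment, base64-encoded.
            + "--" + BOUNDARY + CRLF
            + "Content-Type: application/octet-stream" + CRLF
            + "Content-Transfer-Encoding: base64" + CRLF
            + "Content-Disposition: attachment; filename=\"" + fileName + "\"" + CRLF
            + CRLF
            + b64 + CRLF
            // Closing boundary ends the multipart body.
            + "--" + BOUNDARY + "--" + CRLF;
    }

    public static void main(String[] args) {
        byte[] data = "pretend this is a PDF".getBytes(StandardCharsets.UTF_8);
        System.out.print(build("<p>Hello <b>world</b></p>", "report.pdf", data));
    }
}
```

      Redirecting the output of main to a file such as multipart-test.eml gives a file that can be dropped into the rule folder.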

      Note: a long-standing issue is that the HTML part of an email is included in the plain-text transformation, so you will probably also see HTML in the output.

      What is the cause?
      In EMLTransformer.java, lines 85-90 set the MIME type of the message to text/plain. This overwrites the message's actual multipart type, so getContent() always returns a String and never an instance of Multipart.
      Removing that code fixes the problem. It may have been needed with javax.mail 1.4.x, but it does not seem to be needed with 1.5.x.
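
      The effect described above can be modelled without the JavaMail jar. In this sketch Node, Leaf, Multi and Message are hypothetical stand-ins for javax.mail's Part, Multipart and MimeMessage; it is not the actual EMLTransformer code. Message.getContent() mimics the relevant JavaMail behaviour: it returns a parsed multipart only when the declared type is multipart/*, otherwise the body as a String. After setContentType("text/plain"), extractText falls through to the raw body, base64 attachment included; with the type left intact, it walks the parts and keeps only non-attachment text/plain bodies.

```java
import java.util.List;

// Hypothetical, simplified stand-ins for javax.mail types so this compiles on the JDK alone.
final class EmlTextSketch {
    interface Node {}
    record Leaf(String contentType, String disposition, String text) implements Node {}
    record Multi(List<Node> parts) implements Node {}

    static final class Message {
        private String contentType;
        private final String rawBody;   // the undecoded message body
        private final Multi parsed;     // what a multipart parser would yield

        Message(String contentType, String rawBody, Multi parsed) {
            this.contentType = contentType;
            this.rawBody = rawBody;
            this.parsed = parsed;
        }

        // Models the problematic step: overwriting the declared type.
        void setContentType(String type) {
            this.contentType = type;
        }

        // Parsed parts only for multipart/*; anything else comes back as a String.
        Object getContent() {
            return contentType.startsWith("multipart/") ? parsed : rawBody;
        }
    }

    static String extractText(Message msg) {
        Object content = msg.getContent();
        if (content instanceof Multi multi) {
            StringBuilder out = new StringBuilder();
            collect(multi, out);
            return out.toString();
        }
        // Bug path: the whole raw body, base64 attachments and all.
        return (String) content;
    }

    // Recursively keeps text/plain bodies and skips parts marked as attachments.
    private static void collect(Node node, StringBuilder out) {
        if (node instanceof Multi multi) {
            for (Node child : multi.parts()) {
                collect(child, out);
            }
        } else if (node instanceof Leaf leaf) {
            boolean isAttachment = "attachment".equalsIgnoreCase(leaf.disposition());
            if (!isAttachment && leaf.contentType().startsWith("text/plain")) {
                if (out.length() > 0) {
                    out.append('\n');
                }
                out.append(leaf.text());
            }
        }
    }
}
```

      With the type intact, a message holding a text/plain body and a PDF attachment yields only the body text; after forcing text/plain, the same message yields its entire raw body.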

      I will also look at making sure that a plain-text transformation does not include the HTML part of the message, and at creating a transformer that picks out the HTML part and uses it when available.
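
      For the follow-up work mentioned above, one possible approach is to choose a single body from a multipart/alternative group depending on the target format. This is a hypothetical sketch (AlternativePicker and Alt are invented names, not Alfresco or JavaMail API): the plain-text path accepts only text/plain, so HTML never leaks into it, while the HTML path prefers text/html and falls back to text/plain. Because RFC 2046 orders alternatives from least to most faithful, each preference is searched from the last part backwards.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical sketch: picking one body out of a multipart/alternative group.
final class AlternativePicker {
    record Alt(String contentType, String text) {}

    // Tries each preferred type in order; for a given type, scans from the last
    // alternative backwards, since later alternatives are the more faithful ones.
    static Optional<Alt> pick(List<Alt> alternatives, List<String> preferredTypes) {
        for (String type : preferredTypes) {
            for (int i = alternatives.size() - 1; i >= 0; i--) {
                Alt alt = alternatives.get(i);
                if (alt.contentType().toLowerCase(Locale.ROOT).startsWith(type)) {
                    return Optional.of(alt);
                }
            }
        }
        return Optional.empty();
    }

    // Plain-text transformation: text/plain only, never the HTML part.
    static Optional<Alt> forPlainText(List<Alt> alternatives) {
        return pick(alternatives, List.of("text/plain"));
    }

    // HTML transformation: prefer text/html, fall back to text/plain.
    static Optional<Alt> forHtml(List<Alt> alternatives) {
        return pick(alternatives, List.of("text/html", "text/plain"));
    }
}
```

      If an email has only an HTML part, forPlainText returns empty; a real transformer would then need some HTML-to-text step, which this sketch leaves out.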

      Marking this as a regression, since it worked in 4.2.

                People

                 • Assignee: Closed Bugs (closedbugs)
                   Reporter: Peter Löfgren (loftux)
                 • Votes: 0
                   Watchers: 2

                    Time Tracking

                    Estimated: Not Specified
                    Remaining: 0 minutes
                    Logged: 2 hours, 30 minutes
