Service Packs and Hot Fixes
MNT-8339

The content of .msg files uploaded via Share is not searchable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Unprioritized
    • Resolution: Fixed
    • Affects Version/s: 3.4.7
    • Fix Version/s: None
    • Component/s: Installer
    • Labels: None
    • Environment: Alfresco 3.4.7, Windows 2008R2, Tomcat, SQL Server

      Description

      The content of .msg files uploaded via Share is not searchable.

      To reproduce:

      Upload a .msg file (provided as an attachment) to Share.
      The content of the .msg file contains these three terms:
      Enjoy!
      Mukilteo
      Mukilteo Speedway
      When searching for any of these terms, in either basic or advanced search, the file does not show up in the results.

        Activity

        jcohorn added a comment -

        After enabling transform logging:

        log4j.logger.org.alfresco.repo.content.transform=DEBUG

        I see the following logged in 3.4.7 when uploading the file via Explorer (Share upload fails):

        15:43:15,769 DEBUG [content.transform.ContentTransformerRegistry] Searched for transformer:
        source mimetype: application/vnd.ms-outlook
        target mimetype: text/plain
        transformers: [PoiContentTransformer[ average=0ms], MailContentTransformer[ average=0ms]]
        15:43:30,917 DEBUG [content.metadata.MetadataExtracterRegistry] Finding extractors for application/vnd.ms-outlook
        15:43:30,926 WARN [content.metadata.AbstractMappingMetadataExtracter] Metadata extraction failed (turn on DEBUG for full error):
        Extracter: org.alfresco.repo.content.metadata.MailMetadataExtracter@54aa1384
        Content: ContentAccessor[ contentUrl=store:///Users/jc/alf/installs/34/e347/tomcat/temp/Alfresco/alfresco7365161311622237915.upload, mimetype=application/vnd.ms-outlook, size=17920, encoding=UTF-8, locale=en_US]
        Failure: Invalid chunk name Olk10SideProps_0001null
        15:43:31,055 DEBUG [content.transform.ContentTransformerRegistry] Searched for transformer:
        source mimetype: application/vnd.ms-outlook
        target mimetype: text/plain
        transformers: [PoiContentTransformer[ average=0ms], MailContentTransformer[ average=60000ms]]
        15:43:33,300 DEBUG [content.transform.ContentTransformerRegistry] Searched for transformer:
        source mimetype: application/vnd.ms-outlook
        target mimetype: text/plain
        transformers: [PoiContentTransformer[ average=60000ms], MailContentTransformer[ average=60000ms]]
        15:44:26,620 DEBUG [content.transform.ContentTransformerRegistry] Searched for transformer:
        source mimetype: application/vnd.ms-outlook
        target mimetype: text/plain
        transformers: [PoiContentTransformer[ average=60000ms], MailContentTransformer[ average=60000ms]]

        It looks as though the metadata/text extractors encountered a malformed element in the file:

        Failure: Invalid chunk name Olk10SideProps_0001null

        This probably came from the POI library. POI appears to have had fixes for issues that resemble this one:
        https://issues.apache.org/bugzilla/show_bug.cgi?id=51873

        Can you confirm whether this is seen only with certain malformed email files, with all MSG files, or only with MSG files saved by a certain version of Outlook?

        Also, since 4.0 includes a slightly newer version of POI you may want to verify that the same issue is seen there.
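
When comparing several files or Alfresco versions, it may help to condense the DEBUG output into one line per transformer lookup. The following is a minimal sketch, not part of Alfresco: it assumes the log layout shown in this comment, the function name `summarize` is hypothetical, and the sample text is a subset of the lines quoted above.

```python
import re

# Subset of the log lines quoted in this comment, copied verbatim.
LOG = """\
15:43:15,769 DEBUG [content.transform.ContentTransformerRegistry] Searched for transformer:
source mimetype: application/vnd.ms-outlook
target mimetype: text/plain
transformers: [PoiContentTransformer[ average=0ms], MailContentTransformer[ average=0ms]]
15:43:30,926 WARN [content.metadata.AbstractMappingMetadataExtracter] Metadata extraction failed (turn on DEBUG for full error):
Failure: Invalid chunk name Olk10SideProps_0001null
15:43:31,055 DEBUG [content.transform.ContentTransformerRegistry] Searched for transformer:
source mimetype: application/vnd.ms-outlook
target mimetype: text/plain
transformers: [PoiContentTransformer[ average=0ms], MailContentTransformer[ average=60000ms]]
15:43:33,300 DEBUG [content.transform.ContentTransformerRegistry] Searched for transformer:
source mimetype: application/vnd.ms-outlook
target mimetype: text/plain
transformers: [PoiContentTransformer[ average=60000ms], MailContentTransformer[ average=60000ms]]
"""

# A log entry starts with a timestamp such as "15:43:15,769".
TIMESTAMP = re.compile(r"^(\d{2}:\d{2}:\d{2},\d{3})\s")
# Matches entries such as "MailContentTransformer[ average=60000ms]".
AVERAGE = re.compile(r"(\w+)\[ average=(\d+)ms\]")


def summarize(log_text):
    """Return (lookups, failures) extracted from the log text.

    lookups  -- list of (timestamp, {transformer name: average in ms})
    failures -- list of messages taken from "Failure:" lines
    """
    lookups, failures = [], []
    last_time = None
    for raw in log_text.splitlines():
        line = raw.strip()
        stamp = TIMESTAMP.match(line)
        if stamp:
            # Remember the timestamp of the entry the next lines belong to.
            last_time = stamp.group(1)
            continue
        if line.startswith("transformers:"):
            averages = {name: int(ms) for name, ms in AVERAGE.findall(line)}
            lookups.append((last_time, averages))
        elif line.startswith("Failure:"):
            failures.append(line[len("Failure:"):].strip())
    return lookups, failures


lookups, failures = summarize(LOG)
for when, averages in lookups:
    print(when, averages)
for message in failures:
    print("failure:", message)
```

In this sample the reported average for MailContentTransformer changes from 0ms to 60000ms between the first and second lookup, and PoiContentTransformer follows in the third. Reading that jump as the registry recording a failed attempt is an assumption; the log itself does not state the cause.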

        Mark Rogers added a comment -

        It's certainly not the case that no .msg files are searchable, since we have unit tests for this, including quick.msg.


          People

          • Assignee: Closed Issues
          • Reporter: Harlin Seritt
          • Votes: 0
          • Watchers: 1
