  1. Service Packs and Hot Fixes
  2. MNT-8387

Upload of a document will wait indefinitely if the metadata extraction performed by OpenOffice hits a problem that OOo does not report back.

    Details

    • Type: Bug
    • Status: Closed
    • Resolution: Fixed
    • Affects Version/s: 3.1.1, 3.1
    • Fix Version/s: 3.2 R
    • Labels:
      None
    • Environment:
      Customer: Alfresco 3.1, Tomcat, Oracle, Linux
      In-house: Alfresco 3.1.1 (229) schema 1009, Tomcat 6.0.18, MySQL 5.1.34, OpenOffice 3.1.1, Windows XP Pro SP3
    • Bug Priority:
      Category 3

      Description

      Problem
      ------------
      Upload of a document will wait indefinitely if the metadata extraction performed by OpenOffice hits a problem that OOo does not report back.
      If OpenOffice struggles to open an MS Word document for metadata extraction during upload, the upload does not time out; it waits indefinitely until OpenOffice is restarted (at which point the metadata extraction fails).
      To explain: if the content being uploaded sends OpenOffice into a state where it struggles to open the document and thrashes the CPU, the JSF/Share interface does not time out and appears to wait indefinitely for the extraction to complete. The only way to let Alfresco continue with the upload is to kill the thrashing soffice.bin process.

      During this time OpenOffice reports nothing back to Alfresco and the reader is not closed, so Alfresco keeps waiting. The metadata extractor bean (or the AbstractMappingMetadataExtracter) should therefore have some kind of timeout for the case where there is no response (good or bad) from OpenOffice within a certain time.

      Steps for replication
      ---------------------------
      Replication is very easy.

      1. Set debug for category 'log4j.logger.org.alfresco.repo.content' in log4j.properties to make life a little easier.
      2. Start Alfresco (ensuring that OpenOffice is started and connected to it).
      3. Try and upload the attached document (kill.doc) via JSF.

      Observations
      -------------------
      You will see soffice.bin hammer the CPU indefinitely, and the UI will wait with no timeout. The logging will look like this:

      11:43:22,603 User:admin DEBUG [content.metadata.MetadataExtracterRegistry] Finding extractors for application/msword
      11:43:22,603 User:admin DEBUG [content.metadata.AbstractMappingMetadataExtracter] Starting metadata extraction:
      reader: ContentAccessor[ contentUrl=store://E:\installations\311e_13114\tomcat\temp\Alfresco\alfresco1769715311266895362.upload, mimetype=application/msword, size=361984, encoding=UTF-8, locale=en_US]
      extracter: org.alfresco.repo.content.metadata.OpenOfficeMetadataExtracter@de0cba
      11:43:22,619 User:admin DEBUG [content.filestore.FileContentReader] Opened write channel to file:
      file: E:\installations\311e_13114\tomcat\temp\Alfresco\alfresco1769715311266895362.upload
      random-access: true
      11:43:22,619 User:admin DEBUG [repo.content.AbstractContentReader] Created callback byte channel:
      original: sun.nio.ch.FileChannelImpl@21eea4
      new: org.alfresco.repo.content.AbstractContentAccessor$CallbackFileChannel@1ba6070
      11:43:22,635 User:admin DEBUG [repo.content.AbstractContentReader] Opened channel onto content: ContentAccessor[ contentUrl=store://E:\installations\311e_13114\tomcat\temp\Alfresco\alfresco1769715311266895362.upload, mimetype=application/msword, size=361984, encoding=UTF-8, locale=en_US]

      Note that metadata extraction started at 11:43. At 12:04 I killed the soffice.bin process:

      12:04:56,829 User:admin WARN [bean.repository.Repository] Metadata extraction failed:
      reader: ContentAccessor[ contentUrl=store://E:\installations\311e_13114\tomcat\temp\Alfresco\alfresco1769715311266895362.upload, mimetype=application/msword, size=361984, encoding=UTF-8, locale=en_US]
      extracter: org.alfresco.repo.content.metadata.OpenOfficeMetadataExtracter@de0cba
      12:04:56,861 User:admin DEBUG [content.filestore.FileContentStore] Created content writer:
      writer: ContentAccessor[ contentUrl=store://2009/11/23/12/4/94cfd6fb-b49a-48fb-b278-83389bcc6288.bin, mimetype=null, size=0, encoding=UTF-8, locale=en_US]

      This now allows the upload procedure to complete.

      So the problem with the document itself is an MS Word/OpenOffice issue. However, the way we handle the upload should be more robust: metadata extraction should have a timeout, so that when no response is received from the OpenOffice extraction it times out, reports the timeout, and continues with the upload as normal.
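
The behaviour requested above (time out, report, continue with the upload) could look roughly like the following. This is only a minimal sketch using the standard java.util.concurrent classes, not the actual Alfresco fix: the class TimedExtraction, the method extractWithTimeout and the Map<String, String> result type are hypothetical names chosen for illustration, and the Callable stands in for whatever call the extracter makes to OpenOffice. It uses Java 8 lambda syntax for brevity.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of the Alfresco API.
public class TimedExtraction {

    /**
     * Runs the given extraction on a worker thread and waits at most
     * timeoutMs for a result. On timeout or failure it reports the problem
     * and returns an empty map so the caller can carry on with the upload.
     */
    static Map<String, String> extractWithTimeout(
            Callable<Map<String, String>> extraction, long timeoutMs) {
        // Daemon thread so a stuck extraction cannot block JVM shutdown.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "metadata-extraction");
            t.setDaemon(true);
            return t;
        });
        Future<Map<String, String>> future = executor.submit(extraction);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // No response (good or bad) within the limit: give up waiting.
            future.cancel(true);
            System.err.println("Metadata extraction timed out after "
                    + timeoutMs + " ms; continuing without extracted metadata");
            return Collections.emptyMap();
        } catch (ExecutionException e) {
            System.err.println("Metadata extraction failed: " + e.getCause());
            return Collections.emptyMap();
        } catch (InterruptedException e) {
            // Preserve the interrupt status for the calling thread.
            Thread.currentThread().interrupt();
            future.cancel(true);
            return Collections.emptyMap();
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Note that Future.cancel(true) only interrupts the worker thread. A call that is blocked waiting on OpenOffice may ignore the interrupt, and the thrashing soffice.bin process would still have to be dealt with separately. The point of the sketch is only that the upload thread stops waiting.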

      Environment
      -------------------
      Alfresco 3.1.1 (229) schema 1009, Tomcat 6.0.18, MySQL 5.1.34, OpenOffice 3.1.1, Windows XP Pro SP3

                People

                • Assignee:
                  closedbugs Closed Bugs
                  Reporter:
                  astrachan Alex Strachan