Service Packs and Hot Fixes / MNT-19061

Title property is garbled when uploading a Japanese HTML document

    Details

    • Type: Hot Fix Request
    • Status: Closed
    • Resolution: Fixed
    • Affects Version/s: 5.0.0.12, 5.2.2
    • Fix Version/s: 5.0.0.16, 5.1.5
    • Labels:
      None
    • Environment:

      Description

      Summary of the issue
      The Title property (cm:title) contains garbled characters when an HTML document with Japanese characters in its <title> element is uploaded to Alfresco. The charset declared in the HTML document is "Shift_JIS", and all content in the document is Japanese. The issue also occurs when the document declares no charset.

      If auditing is enabled, a new entry for the document's Title is automatically created in the alf_prop_string_value table, and both its string_value and string_end_lower fields contain garbled characters. Querying the audit data with the Audit web script (http://localhost:8080/alfresco/s/api/audit/query/alfresco-access?verbose=true&forward=false) shows the garbled title as:

      "{http:\/\/www.alfresco.org\/model\/content\/1.0}title={en_US=\ufffdT\ufffd\ufffd\ufffdv\ufffd\ufffd\ufffdt\ufffd@\ufffdC\ufffd\ufffd\ufffd\...."
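
The pattern in that output, U+FFFD mixed with stray ASCII letters, is what results when Shift_JIS bytes are decoded as UTF-8 with invalid bytes replaced. The sketch below is only an illustration and not Alfresco's code path. The title サンプルファイル is an assumed example (the real title is in the attached files), chosen because its Shift_JIS bytes produce the same leading sequence.

```python
# Assumed example title; the actual title is in the attached sample files.
title = "サンプルファイル"

# Bytes as they would appear in a Shift_JIS encoded HTML file.
raw = title.encode("shift_jis")

# Wrong: decode as UTF-8, replacing each invalid byte with U+FFFD.
garbled = raw.decode("utf-8", errors="replace")

# Right: decode with the charset the document actually uses.
restored = raw.decode("shift_jis")

# Print the garbled string with \uXXXX escapes, as the audit output shows it.
print(garbled.encode("unicode_escape").decode("ascii"))
print(restored)
```

Each Shift_JIS lead byte 0x83 is not a valid UTF-8 start byte, so it becomes U+FFFD, while trail bytes that fall in the ASCII range (0x54 "T", 0x76 "v") survive as plain letters.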

      The issue can be reproduced internally on out-of-the-box installations of versions 5.0.0.12 and 5.2.2. Screenshots and sample files are attached.

      Steps to reproduce
      Optional step: enable auditing by setting audit.enabled=true and audit.alfresco-access.enabled=true in alfresco-global.properties (the issue occurs with or without auditing enabled).

      1) Create an HTML document with Japanese content and Japanese characters in the <title> element (or use the attached sample files Test1.htm and Test2.htm).
      2) Upload the HTML document to Alfresco Share, using either drag and drop or the Upload button.
      3) The document uploads successfully, but the Title property (cm:title) is saved with garbled characters instead of the actual Japanese characters.
      4) If auditing is enabled, several entries are created automatically in the alf_prop_string_value table. One of them holds the Title property value of the new document and contains garbled characters in the "string_value" and "string_end_lower" columns.
      5) Check the audit data associated with the document using the API at http://localhost:8080/alfresco/s/api/audit/query/alfresco-access?verbose=true&forward=false. The title is displayed like this:

      "{http:\/\/www.alfresco.org\/model\/content\/1.0}title={en_US=\ufffdT\ufffd\ufffd\ufffdv\ufffd\ufffd\ufffdt\ufffd@\ufffdC\ufffd\ufffd\ufffd\"
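
If the attached samples are not at hand, comparable test files for step 1 could be generated along these lines. This is a minimal sketch: `build_test_html`, the title, and the body text are hypothetical and are not taken from the attachments.

```python
def build_test_html(title: str, body: str, declare_charset: bool = True) -> bytes:
    """Return a small HTML page encoded as Shift_JIS.

    With declare_charset=False the <meta> charset declaration is omitted,
    which mirrors the "no charset defined" variant of the report.
    """
    meta = (
        '<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS">'
        if declare_charset
        else ""
    )
    html = f"<html><head>{meta}<title>{title}</title></head><body>{body}</body></html>"
    return html.encode("shift_jis")


# Hypothetical Japanese title and body text.
with_charset = build_test_html("サンプルファイル", "テスト")
without_charset = build_test_html("サンプルファイル", "テスト", declare_charset=False)
```

Writing the bytes out, for example with `pathlib.Path("with_charset.htm").write_bytes(with_charset)`, gives a file that can be uploaded in step 2.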

      NOTE: A related problem appears when migrating from one database to another (Oracle -> Postgres): the migration fails with 'ERROR: Invalid byte sequence with encoding method "UTF 8": 0xed 0xb0 0x8b incompatibility' because of the invalid values stored in the database. Once the string_end_lower and string_value columns are changed to valid values, the migration succeeds.
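
To see why a strict UTF-8 consumer rejects such a value, the bytes from the error can be inspected directly. The sketch assumes the three bytes in the message read 0xed 0xb0 0x8b; `is_strict_utf8` is a hypothetical helper, and nothing here reflects how the bytes ended up in the database.

```python
def is_strict_utf8(data: bytes) -> bool:
    """Return True only if the bytes are well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Byte sequence as assumed from the migration error message.
bad = bytes([0xED, 0xB0, 0x8B])

# "surrogatepass" lets Python decode the sequence so the code point is visible.
code_point = ord(bad.decode("utf-8", errors="surrogatepass"))

print(is_strict_utf8(bad))
print(hex(code_point))
```

The sequence is the three-byte form of a lone low surrogate, which well-formed UTF-8 does not allow, so a database that validates its UTF-8 input will refuse it.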

      Expected Behaviour
      When an HTML document is uploaded, the Title property should contain the original Japanese characters, not garbled ones. If auditing is enabled, the entry created should hold the correct Japanese characters in the string_value and string_end_lower columns.
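
One way to get the expected result can be sketched as follows. This is not the fix shipped in the listed fix versions, whose details are not given here. `extract_title`, the regular expressions, and the fallback order are assumptions for illustration, and a regex is a deliberately simple stand-in for a real HTML parser.

```python
import re
from typing import Optional, Sequence

# Matches both <meta charset="..."> and the http-equiv Content-Type form.
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?\s*([A-Za-z0-9_\-]+)', re.IGNORECASE
)
TITLE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(
    raw: bytes, fallbacks: Sequence[str] = ("utf-8", "shift_jis")
) -> Optional[str]:
    """Decode HTML bytes with the declared charset, else try the fallbacks."""
    candidates = []
    declared = META_CHARSET.search(raw)
    if declared:
        candidates.append(declared.group(1).decode("ascii"))
    candidates.extend(fallbacks)

    for encoding in candidates:
        try:
            # Strict decoding: a wrong guess raises instead of yielding U+FFFD.
            text = raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue
        match = TITLE.search(text)
        return match.group(1).strip() if match else None
    return None
```

Decoding strictly is the key design choice: a wrong charset guess fails loudly and the next candidate is tried, rather than silently storing replacement characters.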

      Observed Behaviour
      Garbled characters are saved in the Title property and in the string_value and string_end_lower columns of the alf_prop_string_value table.
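
A quick check for this behaviour is to scan a parsed audit response for U+FFFD. The sketch below walks any JSON structure, so it does not depend on the exact response layout. `find_garbled` and the sample payload are hypothetical and do not reproduce the real audit format.

```python
import json


def find_garbled(node, path="$"):
    """Return the paths of all string values containing U+FFFD."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            hits += find_garbled(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            hits += find_garbled(value, f"{path}[{index}]")
    elif isinstance(node, str) and "\ufffd" in node:
        hits.append(path)
    return hits


# Hypothetical payload; the real audit response has a different shape.
sample = '{"entries": [{"values": {"title": "\\ufffdT\\ufffd"}}, {"values": {"title": "ok"}}]}'
garbled_paths = find_garbled(json.loads(sample))
print(garbled_paths)
```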

        Attachments

        1. Audit Data.png (408 kB)
        2. content_url.png (28 kB)
        3. DB Entry.png (13 kB)
        4. DocLib View.png (214 kB)
        5. FileUsingMYSQL.png (84 kB)
        6. Shift_JIS.png (38 kB)
        7. Test1.htm (8 kB)
        8. Test2.htm (7 kB)
        9. Title in Alfresco.png (136 kB)
        10. Title Property.png (83 kB)
        11. With Charset.htm (8 kB)
        12. Without Charset.htm (8 kB)

                People

                • Assignee: closedbugs Closed Bugs (Inactive)
                • Reporter: kmani Karthick Mani
                • Votes: 0
                • Watchers: 5
