  1. Service Packs and Hot Fixes
  2. MNT-19088

Garbled DB storage when uploading Japanese documents

    Details

    • Type: Hot Fix Request
    • Status: Closed
    • Resolution: Won't Fix
    • Affects Version/s: 5.0.0.12, 5.2.2
    • Fix Version/s: None
    • Labels:
      None
    • Environment:
      Alfresco Version - 5.0.0.12
      Database - Oracle 12c
      Application Server - Tomcat
    • Bug Priority:
      Category 3
    • Hot Fix Version:
      5.0.0
    • ACT Numbers:

      00950665

    • Premier Customer:
      Yes

      Description

      Summary
      When a document whose filename contains the Japanese character '[problem_char]' is uploaded, invalid (garbled) characters are stored in the string_end_lower and string_value columns of the alf_prop_string_value table. (JIRA rejects the character as invalid when the issue is created, so it cannot be shown here; it is the first letter of the filename in the attached Sample Files.zip.) For example, if the original filename is '[problem_char]由美子DSCN8762.png', the stored values look like this:

      string_end_lower - '?由美子dscn8762.png'
      string_value - ' 由美子DSCN8762.png' (the value starts with what looks like a whitespace; copying it from the DB into a text editor shows that the whitespace is actually a sequence of invalid characters)

      NOTE: The problem surfaces when the customer migrates from one DB to another (Oracle -> Postgres). The migration fails with 'ERROR: Invalid byte sequence with encoding method "UTF 8": 0xed 0xb0 0x8b incompatibility' because of the invalid values stored in the DB. Once the string_end_lower and string_value columns are changed to valid values, the migration succeeds.
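
The bytes quoted in the error above can be inspected directly. The following is a minimal Python sketch, not part of the original report, that checks whether 0xed 0xb0 0x8b is well-formed UTF-8 and which code point it corresponds to when decoded leniently. The helper names are hypothetical; only the three bytes come from the ticket.

```python
# Bytes quoted in the PostgreSQL error message from the ticket.
reported = bytes([0xED, 0xB0, 0x8B])


def decodes_as_strict_utf8(data):
    """Return True if data is well-formed UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def surrogate_code_points(data):
    """Decode while allowing surrogates, then list any surrogate code points."""
    text = data.decode("utf-8", errors="surrogatepass")
    return [ord(ch) for ch in text if 0xD800 <= ord(ch) <= 0xDFFF]


strict_ok = decodes_as_strict_utf8(reported)
found = surrogate_code_points(reported)
print(strict_ok, [hex(cp) for cp in found])  # False ['0xdc0b']
```

The three bytes are a UTF-8-style encoding of the lone low surrogate U+DC0B, which strict UTF-8 does not permit. That is consistent with PostgreSQL rejecting the value, although the ticket does not establish how the surrogate ended up in the Oracle column.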

      Steps to reproduce
      1) Enable auditing by setting audit.enabled=true and audit.alfresco-access.enabled=true in alfresco-global.properties.
      2) Upload files whose names contain the Japanese character '[problem_char]' to Alfresco (using either drag and drop or the Upload button).
      3) The documents upload successfully, but in the alf_prop_string_value table the string_end_lower column contains "?由美子dscn8762.png" and the string_value column (" 由美子DSCN8762.png") starts with what looks like a whitespace. Copying the value from the DB into a text editor shows that the whitespace is actually a sequence of invalid characters.
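
The check in step 3 can also be done programmatically once the column values have been fetched. The sketch below is an illustration only: it assumes the rows are already available as (id, value) pairs, and both the row ids and the second sample value (a filename prefixed with a lone surrogate) are made up for the example. It flags values that cannot be encoded as strict UTF-8, which is the kind of value a UTF-8 target database would reject.

```python
def is_valid_utf8_text(value):
    """Return True if the string can be encoded as strict UTF-8."""
    try:
        value.encode("utf-8")
        return True
    except UnicodeEncodeError:
        # Raised, for example, when the string contains a lone surrogate.
        return False


def find_invalid_rows(rows):
    """Return the ids of rows whose value is not encodable as strict UTF-8."""
    return [row_id for row_id, value in rows if not is_valid_utf8_text(value)]


# Hypothetical sample data standing in for rows read from alf_prop_string_value.
sample_rows = [
    (1, "由美子DSCN8762.png"),
    (2, "\udc0b由美子DSCN8762.png"),  # assumed example of a corrupted value
]

invalid_ids = find_invalid_rows(sample_rows)
print(invalid_ids)  # [2]
```

Only the ids are printed, because writing a string that contains a lone surrogate to a UTF-8 terminal would itself raise an encoding error.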

      The issue can be reproduced internally on out-of-the-box installations of versions 5.0.0.12 and 5.2.2. Screenshots and sample files are attached.

      Expected Behaviour
      When documents whose filenames contain the Japanese character '[problem_char]' are uploaded, the string_end_lower and string_value columns should contain the original Japanese character, not garbled or whitespace characters.

      Observed Behaviour
      Garbled characters are saved/stored in the string_value, string_end_lower columns in alf_prop_string_value table.

        Attachments

        1. DB Encoding.png (28 kB)
        2. Invalid Characters in DB.png (43 kB)
        3. postgres.PNG (15 kB)
        4. Sample Files.zip (73 kB)

                People

                • Assignee: Closed Issues (closedissues)
                • Reporter: Karthick Mani (kmani)
                • Votes: 0
                • Watchers: 6
