Alfresco One Platform / ACE-773

Encoding on MySQL should be utf8mb4, not utf8

    Details

      Description

      According to this article, MySQL's utf8 encoding is not a full implementation of UTF-8: it stores at most 3 bytes per character, so none of UTF-8's 4-byte sequences are supported.

      Since MySQL 5.5, MySQL has supported an encoding with true UTF-8 support, called utf8mb4.

      Alfresco, which expects full UTF-8 support, should force the use of this encoding.
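
To make the gap concrete, here is a minimal Python sketch (not part of the original report) that lists the characters in a string whose UTF-8 encoding is 4 bytes long. Those are the characters the legacy MySQL utf8 charset cannot store. The helper name `needs_utf8mb4` and the sample strings are hypothetical.

```python
def needs_utf8mb4(text):
    """Return the characters in text whose UTF-8 encoding is 4 bytes long."""
    return [ch for ch in text if len(ch.encode("utf-8")) == 4]


# Hypothetical sample values: plain ASCII, a 3-byte character (euro sign),
# and a 4-byte character (an emoji outside the Basic Multilingual Plane).
samples = ["Alfresco", "price: 5 \u20ac", "party \U0001F389"]

for sample in samples:
    offenders = needs_utf8mb4(sample)
    status = "needs utf8mb4" if offenders else "fits in utf8"
    print(repr(sample), status)
```

Only the last sample contains a 4-byte sequence, so only that value would require utf8mb4.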

        Attachments

          Activity

          pmonks Peter Monks [X] (Inactive) added a comment -

          I've uploaded a file containing the text of the reproduction steps, as I'm unable to paste it here.

          jsoria Jennie Soria added a comment -

          This issue is affecting customers using Swedish/Nordic languages on MySQL. The ability to use

          CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci

          (utf8mb4 being a superset of utf8) would likely resolve this.

          dhulley Derek Hulley added a comment -

          Please provide details of the attempts made to work around the issue (installing MySQL with default collation, setting the JDBC URL, etc).

          jsoria Jennie Soria added a comment -

          The customer case has been added to this JIRA's ACT field for reference. It is currently under review to establish the exact scenario. Background summary: their issue occurs after upgrading from an older Alfresco v4.1 (MySQL 5.5) to the newer Alfresco v4.2.4. We will update with further details as soon as possible.

          dhulley Derek Hulley added a comment -

          Someone can raise an issue in the ACE project if it's a requirement.

          afaust Axel Faust added a comment - - edited

          This issue also limits some social content use cases, specifically when users try to use emojis such as "pile of poo".

          For a customer I am currently working with, I upgraded the encoding to utf8mb4 on a MariaDB 5.5.50 install today. The main issue I encountered related to key length in some Activiti tables, where the key length with 4-byte Unicode ended up longer than the default 767-byte index key length limit for InnoDB. This affects, for example, the table act_re_procdef, which includes the columns KEY_ (varchar 255), VERSION_ (int 11), and TENANT_ID_ (varchar 255) in a unique key constraint.
          I dealt with this by using the COMPRESSED row format (Barracuda storage file format) and setting innodb_large_prefix to "on".

          Note: I wanted to include the 4-byte Unicode character for "pile of poo" in this comment as an example, but JIRA apparently does not support 4-byte Unicode either.
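
The key-length problem described in the comment above comes down to simple arithmetic, sketched here in Python. This is an illustration, not the commenter's own calculation: the function name is hypothetical, and the 3072-byte figure is an assumption taken from the InnoDB documentation for innodb_large_prefix with the DYNAMIC or COMPRESSED row formats, not something stated in the comment.

```python
# Worst-case bytes per character for each MySQL charset.
BYTES_PER_CHAR = {"utf8": 3, "utf8mb4": 4}

DEFAULT_PREFIX_LIMIT = 767   # default InnoDB limit cited in the comment
LARGE_PREFIX_LIMIT = 3072    # assumed limit with innodb_large_prefix enabled


def varchar_key_bytes(length, charset):
    """Worst-case index key size in bytes for a VARCHAR(length) column."""
    return length * BYTES_PER_CHAR[charset]


# KEY_ and TENANT_ID_ in act_re_procdef are both varchar(255).
for charset in ("utf8", "utf8mb4"):
    size = varchar_key_bytes(255, charset)
    print(
        charset,
        size,
        "fits default limit:", size <= DEFAULT_PREFIX_LIMIT,
        "fits large prefix:", size <= LARGE_PREFIX_LIMIT,
    )
```

A varchar(255) column needs up to 765 bytes under utf8, just inside the 767-byte limit, but up to 1020 bytes under utf8mb4, which is why the conversion fails without the larger prefix limit.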

          dhulley Derek Hulley added a comment -

          It is perfectly reasonable for installations to use a wider or narrower encoding according to specific requirements. MySQL is also not the only database choice.

          We would have a lot of work to update all installations for something that is, in general, not a problem.


            People

            • Assignee: Closed Issues
            • Reporter: Peter Monks [X] (Inactive)
