ALF-21846: Cannot use 4-byte unicode characters in FTS as term query


    • Type: Bug
    • Status: New
    • Priority: Unprioritized
    • Resolution: Unresolved
    • Affects Version/s: Community Edition 201605 GA, Community Edition 201612 GA
    • Fix Version/s: None
    • Security Level: external (External user)
    • Labels:
    • Environment:
      Oracle JDK 1.8.0_112, Tomcat 7.0.47, MariaDB 5.5.50 with utf8mb4, PostgreSQL 9.5 on Windows 10
    • Triage:


      Alfresco, as a heavily internationalized product, is considered to fully support Unicode (provided the backing database and servlet container are properly configured). It is possible to create a folder in Share whose name is a 4-byte Unicode character / emoji such as the "pile of poo" character, U+1F4A9 (note: JIRA does not allow inclusion of the character here).

      When searching for the folder by name, FTS queries that perform a term query fail to be parsed while FTS queries using a phrase query succeed.

      Steps to reproduce:

      1. Ensure the DB / servlet container is set up to fully support Unicode (note: the MySQL/MariaDB utf8 character set stores at most 3 bytes per character, so utf8mb4 is needed - see ACE-773)
      2. Create a folder via Share UI (e.g. in My Files) with name as the "pile of poo" emoji
      3. Open Node Browser via Admin Tools
      4. Perform an FTS query for =cm:name:pileOfPooEmoji (where pileOfPooEmoji stands for the actual character)
      5. Perform an FTS query for =cm:name:"pileOfPooEmoji"

      Expectation: Both FTS queries succeed and show the folder
      Observation: Only the phrase query succeeds - the term query reports "no viable alternative at character"
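
      For anyone reproducing this from code rather than by pasting the character, the two query strings from steps 4 and 5 can be built as in the following minimal sketch. The class and method names are hypothetical and nothing here calls an Alfresco API; it only constructs the strings and shows that the character needs 4 bytes in UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class PileOfPooQueries {
    // U+1F4A9 PILE OF POO, written as its UTF-16 surrogate pair so this file stays ASCII.
    static final String PILE_OF_POO = "\uD83D\uDCA9";

    // Term form used in step 4 (hypothetical helper, not an Alfresco API).
    static String termQuery(String name) {
        return "=cm:name:" + name;
    }

    // Phrase form used in step 5; assumes the name contains no double quotes.
    static String phraseQuery(String name) {
        return "=cm:name:\"" + name + "\"";
    }

    public static void main(String[] args) {
        System.out.println(termQuery(PILE_OF_POO));
        System.out.println(phraseQuery(PILE_OF_POO));
        // Number of bytes the character occupies when encoded as UTF-8.
        System.out.println(PILE_OF_POO.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

      The resulting strings can be pasted into the Node Browser or passed to whatever search entry point is in use.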

      Assumption / analysis: The FTSLexer does not correctly handle such characters when looking for tokens. Instead of handling Unicode code points, it may only be handling individual Java char values (UTF-16 code units) without checking for surrogate pairs (high/low surrogates).
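
      The difference between the two ways of scanning can be illustrated with plain JDK calls. This is only a sketch of the hypothesis above, not the actual FTSLexer code; the class and method names are made up for illustration.

```java
public class SurrogateScan {
    // U+1F4A9 PILE OF POO as a UTF-16 surrogate pair.
    static final String PILE_OF_POO = "\uD83D\uDCA9";

    // Scans char by char and counts values that are lone surrogates when
    // viewed in isolation. A scanner working this way never sees a complete
    // character for anything outside the Basic Multilingual Plane.
    static int isolatedSurrogates(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (Character.isSurrogate(s.charAt(i))) {
                n++;
            }
        }
        return n;
    }

    // Scans by code point, advancing two chars for a surrogate pair.
    static int codePoints(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i += Character.charCount(s.codePointAt(i))) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("chars:               " + PILE_OF_POO.length());
        System.out.println("isolated surrogates: " + isolatedSurrogates(PILE_OF_POO));
        System.out.println("code points:         " + codePoints(PILE_OF_POO));
        System.out.println("is letter or digit:  "
                + Character.isLetterOrDigit(PILE_OF_POO.codePointAt(0)));
    }
}
```

      If the term token rules only accept specific char ranges, neither half of the pair would match on its own, which would be consistent with the "no viable alternative at character" error. That remains an assumption until checked against the lexer grammar.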

      The only related report I found is SEARCH-87, which mentions a SOLR-based limitation to ~32,700 UTF-8 code points. That limitation does not apply in my case: I am using a transactionally executed query, and the error occurs at the FTS parsing stage, before SOLR is involved.





              • Assignee:
                searchAndDiscovery Search and Discovery
                afaust Axel Faust
              • Votes: 0
              • Watchers: 3


                • Created:
                  Date of First Response: