  1. Alfresco One Platform
  2. ACE-5572

Search on version labels returns incorrect results

    Details

    • ACT Numbers:

      00177409

      Description

      Testing search with version labels has uncovered a potential problem with the way Alfresco tokenizes version labels in the index.

      For example:

      We have two distinct documents in a repository. One is at version 1.1, and the other at version 11.0. If we search for files with a version label of 11, using a query like this:

      @cm\:versionLabel:"11"

      We get both the version 1.1 and version 11 documents returned in our search results, despite the fact that only one of the two documents is actually at version 11. If we add the .0 to the query:

      @cm\:versionLabel:"11.0"

      Then we get the expected result: only files at version 11.0 are shown.

      Looking into the content model, the indexing behavior is not defined for the cm:versionLabel property, so the default applies and the field is tokenized for indexing. It looks like the tokenization is creating two entries for the versionLabel property in the index: one with the "." character, and one without. This is confirmed by inspecting the index with Luke (screenshot attached).

      A deeper inspection of Alfresco revealed that we use a custom Lucene TokenFilter (AlfrescoStandardFilter) that might be the source of the problem. This class is called by the AlfrescoStandardAnalyser and appears to detect acronyms in dotted format (C.M.I.S, for example). When it detects one, it creates a new token for the acronym, stripped of its "." characters (CMIS, in our example). I think this is happening to the version labels: a version label of 11.0 ends up in the index as both 11.0 and 110, and a version label of 1.1 ends up in the index as both 1.1 and 11. Thus, a search for 11 hits on both documents, incorrectly.

      I think we can fix this by altering the cm:versionLabel's index behavior so that it is not tokenized.
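
      To make the suspected collision concrete, here is a minimal Python sketch. It is not the real AlfrescoStandardFilter: the names tokenize_label, build_index, and search, and the document names, are hypothetical, and the dot-stripping rule is only the behavior assumed in the description above. It compares a tokenized index with one where each label is kept as a single term.

```python
from collections import defaultdict


def tokenize_label(label: str) -> list[str]:
    """Assumed model of the suspected behavior (hypothetical helper):
    keep the dotted token and also emit a copy with the dots stripped,
    as described for C.M.I.S -> CMIS."""
    tokens = [label]
    if "." in label:
        tokens.append(label.replace(".", ""))
    return tokens


def build_index(docs: dict[str, str], tokenized: bool) -> dict[str, set[str]]:
    """Map each indexed term to the names of the documents that produced it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, label in docs.items():
        # Untokenized indexing keeps the whole label as one exact term.
        terms = tokenize_label(label) if tokenized else [label]
        for term in terms:
            index[term].add(name)
    return dict(index)


def search(index: dict[str, set[str]], term: str) -> list[str]:
    """Exact term lookup, returning matching document names in sorted order."""
    return sorted(index.get(term, set()))


# Hypothetical documents mirroring the two version labels in this report.
docs = {"document-v1.1": "1.1", "document-v11.0": "11.0"}

tokenized_index = build_index(docs, tokenized=True)
untokenized_index = build_index(docs, tokenized=False)

print(sorted(tokenized_index))              # ['1.1', '11', '11.0', '110']
print(search(tokenized_index, "11"))        # ['document-v1.1']
print(search(untokenized_index, "11"))      # []
print(search(untokenized_index, "11.0"))    # ['document-v11.0']
```

      Under this model, the stripped token 11 produced from label 1.1 is what lets a search for 11 return the version 1.1 document. With the label indexed as a single untokenized term, only an exact label such as 11.0 matches.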

        Attachments

        1. 1.png (33 kB)
        2. document - v1.1.png (47 kB)
        3. document - v11.0.png (47 kB)
        4. Luke - versionLabel top terms.png (82 kB)
        5. search results for 11.png (30 kB)

            People

            • Assignee:
              Closed Issues
              Reporter:
              Nathan McMinn [X] (Inactive)

                Time Tracking

                Estimated: Not Specified
                Remaining: 0m
                Logged: 1w 1d 3h