Uploaded image for project: 'Search and Discovery'
  1. Search and Discovery
  2. SEARCH-2166

Support for Date Functions in WHERE Clause

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: Search Services 2.0
    • Labels:
      None
    • Sprint:
      Team Ninja-King - S&I 29, Team Ninja-King - S&I 30, Team Ninja-King - S&I 31, Team Ninja-King - S&I 32, Team Ninja-King - S&I 33
    • Release Train:
      Southall
    • Delivery Team:
      Search
    • Story Points:
      20

      Description

      Background
      As a business user
      When I connect to Insight Engine from Tableau
      Then I want to be able use dates functions in the WHERE clause
      The Tableau functions we need to support are DAY, MONTH, YEAR, Extract <unit of time> from <date field> (this is the function DAY, MONTH, YEAR translates to) and DATE
      e.g.
      SELECT ... FROM ... WHERE MONTH(cm_modified) = '4'
      SELECT ... FROM ... WHERE YEAR(cm_modified) = '2010'
      SELECT ... FROM ... WHERE Extract YEAR from cm_modified = '2010'
      Potential Approach
      Option 3
      Handle the date functions "properly" in the 2.0 branch and then backport to other relevant branches
      Implementation Recommendations
      The WHERE clause has the responsibility of parsing the date functions and mapping them to Apache Solr queries.
      The way Calcite guarantee this mapping is extremely convoluted: there's a lot of indirection and reflection, where expression are first parsed, then interpreted and finally transformed to Apache Solr query in Alfresco code.

      Re-Indexing considerations
      EXTRACT YEAR - OK
      EXTRACT MONTH, DAY - KO
      First of all I would like to express the fact that the only date function that can efficiently implemented in Apache Solr at the moment is the " EXTRACT YEAR".
      This can be translated to a range query:
      Extract YEAR from cm_Date = <YEAR> -> cm_date:[YEAR-01-01T00:00 TO YEAR-12-31T23:59]
      Extract month and day with the current schema are going to be very unpleasant to manage:
      It is possible to build extremely convoluted and not efficient queries that may reach a similar semantic, but I discourage that.
      Potentially it may be possible to obtain the same semantic with custom filters and plugins, also in this case, not recommend that route
      it may be possible to achieve a similar semantic with the use of the frange query parser, using a custom Solr function query (plugin) or combining multiple of the existing functions queries (such as ms(date) and sum/div/subtraction, this is not encouraged, it's very likely to be very performance heavy, defeating the scope of using Solr to perform the query

      Recommendation : change the schema
      On the contrary, if the feature is really necessary, I recommend to do a minimal schema redesign:
      adding specific dynamic fields that contain just the year, month and day
      e.g.
      cm_date_day, cm_date_month, cm_date_year
      populating them at indexing time, in the trackers

      IMPLEMENTATION
      To implement the mapping for the EXTRACT YEAR (and then potentially for the others extract, assuming we do the schema change and RE-INDEXING, these are the areas of intervention:
      1) org/alfresco/solr/sql/SolrRules.java:194 must filter out the exploded version of the EXTRACT and just let pass the EXTRACT itself condition
      (Calcite tends to explode it to AND(>=(/INT(Reinterpret($3), 86400000), 2000-01-01), <(/INT(Reinterpret($3), 86400000), 2001-01-01))) ) but in Solr it is not necessary.
      Because we can directly translate :
      Extract YEAR from cm_Date = <YEAR> -> cm_date:[YEAR-01-01T00:00 TO YEAR-12-31T23:59
      2) this happens In here: org/alfresco/solr/sql/SolrFilter.java:107, where the Extract needs to be managed with an appropriate method that parse it and translate it to the forementioned lucene syntax
      3) Performance must be check, before it reaches the Alfresco Solr code, Calcite does a lot of parsing and translation for the EXTRACT syntax, especially providing duplciate conditions in certain scenarios. We must make sure the translation happens just once.

      Acceptance Criteria

      • no regression in the date sql WHERE related tests
      • mentioned date functions to work in the WHERE
      • add e2e tests related to this work

        Attachments

          Issue Links

            Structure

              Activity

                People

                • Assignee:
                  Unassigned
                  Reporter:
                  abenedetti Alessandro Benedetti
                • Votes:
                  0 Vote for this issue
                  Watchers:
                  4 Start watching this issue

                  Dates

                  • Created:
                    Updated:
                    Resolved:

                    Structure Helper Panel