Affects Version/s: None
Fix Version/s: Search Services 2.0
Sprint:Team Ninja-King - S&I 29, Team Ninja-King - S&I 30, Team Ninja-King - S&I 31, Team Ninja-King - S&I 32, Team Ninja-King - S&I 33
As a business user
When I connect to Insight Engine from Tableau
Then I want to be able use dates functions in the WHERE clause
The Tableau functions we need to support are DAY, MONTH, YEAR, Extract <unit of time> from <date field> (this is the function DAY, MONTH, YEAR translates to) and DATE
SELECT ... FROM ... WHERE MONTH(cm_modified) = '4'
SELECT ... FROM ... WHERE YEAR(cm_modified) = '2010'
SELECT ... FROM ... WHERE Extract YEAR from cm_modified = '2010'
Handle the date functions "properly" in the 2.0 branch and then backport to other relevant branches
The WHERE clause has the responsibility of parsing the date functions and mapping them to Apache Solr queries.
The way Calcite guarantee this mapping is extremely convoluted: there's a lot of indirection and reflection, where expression are first parsed, then interpreted and finally transformed to Apache Solr query in Alfresco code.
EXTRACT YEAR - OK
EXTRACT MONTH, DAY - KO
First of all I would like to express the fact that the only date function that can efficiently implemented in Apache Solr at the moment is the " EXTRACT YEAR".
This can be translated to a range query:
Extract YEAR from cm_Date = <YEAR> -> cm_date:[YEAR-01-01T00:00 TO YEAR-12-31T23:59]
Extract month and day with the current schema are going to be very unpleasant to manage:
It is possible to build extremely convoluted and not efficient queries that may reach a similar semantic, but I discourage that.
Potentially it may be possible to obtain the same semantic with custom filters and plugins, also in this case, not recommend that route
it may be possible to achieve a similar semantic with the use of the frange query parser, using a custom Solr function query (plugin) or combining multiple of the existing functions queries (such as ms(date) and sum/div/subtraction, this is not encouraged, it's very likely to be very performance heavy, defeating the scope of using Solr to perform the query
Recommendation : change the schema
On the contrary, if the feature is really necessary, I recommend to do a minimal schema redesign:
adding specific dynamic fields that contain just the year, month and day
cm_date_day, cm_date_month, cm_date_year
populating them at indexing time, in the trackers
To implement the mapping for the EXTRACT YEAR (and then potentially for the others extract, assuming we do the schema change and RE-INDEXING, these are the areas of intervention:
1) org/alfresco/solr/sql/SolrRules.java:194 must filter out the exploded version of the EXTRACT and just let pass the EXTRACT itself condition
(Calcite tends to explode it to AND(>=(/INT(Reinterpret($3), 86400000), 2000-01-01), <(/INT(Reinterpret($3), 86400000), 2001-01-01))) ) but in Solr it is not necessary.
Because we can directly translate :
Extract YEAR from cm_Date = <YEAR> -> cm_date:[YEAR-01-01T00:00 TO YEAR-12-31T23:59
2) this happens In here: org/alfresco/solr/sql/SolrFilter.java:107, where the Extract needs to be managed with an appropriate method that parse it and translate it to the forementioned lucene syntax
3) Performance must be check, before it reaches the Alfresco Solr code, Calcite does a lot of parsing and translation for the EXTRACT syntax, especially providing duplciate conditions in certain scenarios. We must make sure the translation happens just once.
- no regression in the date sql WHERE related tests
- mentioned date functions to work in the WHERE
- add e2e tests related to this work