Uploaded image for project: 'Search and Discovery'
  1. Search and Discovery
  2. SEARCH-2255

SQL avg(field) function is broken with multiple group by

    Details

    • Type: Bug
    • Status: Done
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: Search Services 2.0
    • Labels:
      None

      Description

      Avg function doesn't work if you are grouping by multiple group by.
      The issue is reproducible through an ad hoc integration test in:
      org.alfresco.solr.query.stream.DistributedGroupBySqlIT
      e.g.

      sql = "select mf_freetext_underscore as underscoreField, count(mf_freetext_underscore) as numUnderscore, exif_manufacturer, finance_title" +
                          "cm_created,  " +
                          "count(*) as ExposureCount,  " +
                          "sum(exif_exposureTime) as TotalExposure,  " +
                          "avg(exif_exposureTime) as AvgExposure,  " +
                          "min(exif_exposureTime) as MinExposure,  " +
                          "max(exif_exposureTime) as MaxExposure  " +
                      "from alfresco  " +
                      "where  " +
                          "exif_exposureTime > 0 and  " +
                          "cm_content='world' and  " +
                          "cm_created BETWEEN '2000-01-01T01:01:01Z' AND '2010-02-14T23:59:59Z'" +
                      "group by mf_freetext_underscore, exif_manufacturer,finance_title";
      

      The reason is that Calcite parses the avg function as avg when there's just one group by.
      When there are multiple group by it is parsed as SUM(field)/COUNT(field).

      This is noticeable here: org.alfresco.solr.sql.SolrTable.SolrQueryable#query

      Taking a look to List<Pair<String, String>> metricPairs input for both the scenarios, the difference is clear (with the multiple group by you see the DIVISION in the String operations input parameter).

      org.alfresco.solr.sql.SolrTable.SolrQueryable#query is kinda the Alfresco custom code entry point, so I suspect the bug may be Calcite side.
      I was not able to find an elegant solution with a quick fix, addressing it may require some work (unless we opt for a brute workaround).

      Acceptance criteria :

      • avg function works when you group by multiple fields
      • few new integration tests are added for that specific use case in org.alfresco.solr.query.stream.DistributedGroupBySqlIT
      • avg(CAST(cm_sizeCurrent AS DOUBLE)) as AvgSize returns the double average even if cm_sizeCurrent was originally BIGINT

        Attachments

          Issue Links

            Structure

              Activity

                People

                • Assignee:
                  Unassigned
                  Reporter:
                  abenedetti Alessandro Benedetti
                • Votes:
                  0 Vote for this issue
                  Watchers:
                  5 Start watching this issue

                  Dates

                  • Created:
                    Updated:
                    Resolved:

                    Structure Helper Panel