  1. Alfresco One Platform
  2. ACE-4812

DojoWidgetsDependencyRule - high page load overhead due to regex evaluation and complex JSON models

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 5.0.d Community, Community Edition 201510 EA
    • Fix Version/s: 5.2.N
    • Component/s: Web Scripts and Surf
    • Labels:
    • Environment:
      Share web application - configured in "production" mode with "client-debug" set to "false"

      Windows 7, Java 8u40
      JVM parameters include -XX:+UseG1GC -Xms1G -Xmx1G
    • ACT Numbers:

      Community

      Description

      In our evaluation / prototyping with Aikau we have created several pages that dynamically generate complex JSON models based on dictionary information, user roles and meta descriptions of widgets. We have noticed that with increasing complexity of the resulting JSON page model, the overhead of the regex processing in DojoWidgetsDependencyRule increases in weight relative to all other operations, causing page load times of up to 3 seconds for our most complex page, with about 2.5 - 2.8 seconds being associated with processRegexRules alone.

      In constructed tests we measured the overhead with different parameters for the depth / breadth of the widget model (how deeply nested / how many in one layer), as well as the amount of misc. data (representing configuration, labels, mappings, rules....). A clear rule of thumb for scaling could not be determined, but it became obvious that the overhead scales unfavorably with increasing depth of the JSON model as well as with the amount / complexity of misc. data. CPU sampling shows a significant share of total CPU time being consumed by processRegexRules, up to 65% for very complex JSON models.

      We have also observed load requests running into OutOfMemoryErrors during regex evaluation despite the JSON model primarily consisting of re-used JS objects. Memory profiling indicates a significant number of temporary Strings / char[] being created during the evaluation (e.g. due to group extraction during recursion and backtracking of the matcher).

      Using regex to evaluate the JSON page model is a very inefficient approach. Regex is not well suited for analysing arbitrary structures built upon context-free languages / representations. In particular, the nesting of structures in JSON and the backtracking involved in matching the simplistic regular expression (configured within Surf)

      "?widgets"?[^:]*?:[\r\s\t\n]*\[(\{(.*)\})\]
      

      will result in high CPU usage (observed near constant 100% core utilization during dependency collection) and high memory usage handling partial / temporary matches.
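
      To make the cost pattern concrete, the following Java sketch applies the quoted expression to a small, single-line model and re-applies it to each captured body. The class name RegexWidgetScan, the copiedChars counter and the recursive re-scan are assumptions for illustration only; Surf's actual processRegexRules implementation is not reproduced here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexWidgetScan {
    // The expression quoted in this report, escaped for a Java string literal.
    static final Pattern WIDGETS = Pattern.compile(
        "\"?widgets\"?[^:]*?:[\\r\\s\\t\\n]*\\[(\\{(.*)\\})\\]");

    // Running total of characters copied into temporary Strings by group extraction.
    static long copiedChars = 0;

    // Returns the number of successful matches found while descending into captures.
    static int scan(String fragment) {
        Matcher m = WIDGETS.matcher(fragment);
        int matches = 0;
        while (m.find()) {
            // The greedy (.*) first runs to the end of the input and then
            // backtracks until the trailing "}]" can be matched.
            String inner = m.group(2);   // allocates a new String for the captured body
            copiedChars += inner.length();
            matches += 1 + scan(inner);  // assumed recursion into the nested model
        }
        return matches;
    }

    public static void main(String[] args) {
        String model =
            "{\"widgets\":[{\"name\":\"a/B\",\"config\":{\"widgets\":[{\"name\":\"c/D\"}]}}]}";
        System.out.println(scan(model) + " matches, " + copiedChars + " chars copied");
    }
}
```

      Each nesting level re-scans and re-copies text that the previous level already captured, which is consistent with the temporary String / char[] churn seen in memory profiling.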

      A JSON-aware evaluation logic (i.e. using parsing and object traversal) would be far better suited to the use case of collecting widget module names from the JSON model. An experimental, custom dependency rule we have tried showed a reduction in processing times from several seconds to a constant double-digit millisecond range for our most complex page.
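
      A minimal sketch of such a traversal, assuming the model has already been parsed into plain java.util Map / List structures and that each entry of a "widgets" array carries its module in a "name" property. The class WidgetNameCollector is hypothetical and is not the attached rule class.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class WidgetNameCollector {
    // Collects the "name" of every object found directly inside a "widgets" array.
    static Set<String> collect(Object model) {
        Set<String> names = new LinkedHashSet<>();
        walk(model, names);
        return names;
    }

    // Visits each node of the parsed model exactly once.
    private static void walk(Object node, Set<String> names) {
        if (node instanceof Map) {
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) node).entrySet()) {
                Object value = entry.getValue();
                if ("widgets".equals(entry.getKey()) && value instanceof List) {
                    for (Object widget : (List<?>) value) {
                        if (widget instanceof Map) {
                            Object name = ((Map<?, ?>) widget).get("name");
                            if (name instanceof String) {
                                names.add((String) name);
                            }
                        }
                    }
                }
                walk(value, names); // descend into config, payloads, nested widgets
            }
        } else if (node instanceof List) {
            for (Object item : (List<?>) node) {
                walk(item, names);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> inner = Map.of("name", "c/D");
        Map<String, Object> outer =
            Map.of("name", "a/B", "config", Map.of("widgets", List.of(inner)));
        Map<String, Object> model = Map.of("widgets", List.of(outer));
        System.out.println(collect(model));
    }
}
```

      Because every node is visited once and no substrings are extracted, the work grows linearly with the size of the model rather than with its nesting depth.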

      The attached web script allows testing regex overhead with different values for the parameters "depth", "breadth" and "properties". It tries to reduce pure JS overhead by re-using widget / data structures, uses a stable set of required widgets and outputs the time needed between start of browser request and end of server response (using the client-side Navigation Timing API).
      The attached screenshots highlight the two key observations when running the web script on the most recent 5.1 EA release.

      The attached custom dependency rule class attempts JSON parsing / traversal for inline JavaScript (most likely JSON model of services and widgets), falling back to regex upon encountering a JSONException. This class was used to stop-watch regex processing without attaching a profiler and showcased a 98% reduction in processing time for our most complex JSON model (~2800ms regex vs. ~50ms JSON).
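
      The parse-first, regex-fallback structure with built-in timing could look roughly like the sketch below. All names here (TimedFallbackRule, ModelParseException, the two collector functions) are hypothetical stand-ins; the attached class and the JSONException it reacts to are not reproduced.

```java
import java.util.Set;
import java.util.function.Function;

class TimedFallbackRule {
    // Hypothetical stand-in for the parser's exception type (e.g. a JSONException).
    static class ModelParseException extends RuntimeException {
        ModelParseException(String message) {
            super(message);
        }
    }

    // Results of the most recent call, kept for simple stop-watch style logging.
    static long lastElapsedNanos;
    static boolean lastUsedFallback;

    static Set<String> collect(String source,
                               Function<String, Set<String>> jsonCollector,
                               Function<String, Set<String>> regexCollector) {
        long start = System.nanoTime();
        try {
            lastUsedFallback = false;
            return jsonCollector.apply(source);   // preferred: parse and traverse
        } catch (ModelParseException e) {
            lastUsedFallback = true;
            return regexCollector.apply(source);  // fallback: original regex path
        } finally {
            lastElapsedNanos = System.nanoTime() - start;
        }
    }
}
```

      Passing both collectors in as functions keeps the timing wrapper independent of the concrete JSON library and of the regex rule it falls back to.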

      === secondary info / fragments about our use case ===

      An example of a complex (real-life) JSON model is the following page:

      • top level widgets: navigation menu, tab container, data list
      • tab container defines 7 tabs using delayed processing
      • tabs are each associated with a pre-defined currentItem which is a complex JSON data object loaded by the page web script (needs to be loaded for dynamic generation, so can't be converted to an asynchronous client-side load)
      • 5 of 7 tabs each contain a data list with paginator / toolbar, 5 - 8 columns, actions
      • 2 tabs contain a structured read-only view of metadata, but provide a button to trigger a form dialog with the button payload containing the dialog widget model
      • the primary data list includes paginator / toolbar, selected items actions and creation actions which in turn define dialog widget models
      • the primary data list contains definitions for 40-60 columns with subsets of those being made dynamically visible depending on state of context / role of user
      • most columns allow inline editing with an enhanced inline edit widget that supports all major form controls (text, date, number, simple select, filtering select, multi select) including their configurations (e.g. options config)
      • status column of primary data list contains rule-based configuration to visualize any of 15 possible entry states with the appropriate icon

      When stored to disk, the generated JSON model (as processed by DojoWidgetsDependencyRule) clocks in at 442 KiB.

      Our generation logic includes the following automated steps (for each column, form field or simple property display):

      • resolve displayed content model property from the repository (includes local cache)
      • mixin model title / description as widget title / description
      • mixin type and mandatory-ness information into widget
      • generate validation rules based on constraints
      • generate options config based on list of values constraint
      • generate value display map based on list of values constraint
      • column only: generate header and content cell
      • column only: generate filter configuration for custom header widget to drive a table filter dialog (providing simple value, list and range filter capabilities)
      • generate visibilityConfig / renderFilter rules

      Of course optimizing the JSON model to reduce its complexity is one of our recurring tasks during refactoring, but we have reached the point where the effort to manually optimize parts of the model / widgets far exceeds the benefit gained. It also requires coding more and more business logic into the model construction, making long-term maintenance more difficult.

              People

              • Assignee:
                shareteam Share Team
                Reporter:
                afaust Axel Faust
              • Votes:
                0
                Watchers:
                6