Machine Learning / MLEARN-11

Configuration to reduce the size of networks


Details

    • Story
    • Resolution: Done
    • Neutral
    • 1.1
    • None
    • None
    • Foundation 6, Foundation 7
    • 2

    Description

      Make the number of output labels configurable; it is currently hardcoded to 10'000 (https://git.magnolia-cms.com/projects/ENTERPRISE/repos/machine-learning/browse/periscope-result-ranker/src/main/java/info/magnolia/periscope/rank/ml/NeuralNetworkResultRanker.java#65).

      Default value: 10'000
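A minimal sketch of what the configurable setting could look like, assuming a plain Java configuration bean. Only the default value (10'000) comes from this ticket; the class name `RankerConfiguration` and the property `maxOutputLabels` are hypothetical and are not taken from the `NeuralNetworkResultRanker` source.

```java
/**
 * Hypothetical configuration bean for the result ranker.
 * The default mirrors the currently hardcoded value; all names are illustrative.
 */
final class RankerConfiguration {

    /** Default number of output labels (output units), matching today's hardcoded value. */
    static final int DEFAULT_MAX_OUTPUT_LABELS = 10_000;

    private int maxOutputLabels = DEFAULT_MAX_OUTPUT_LABELS;

    int getMaxOutputLabels() {
        return maxOutputLabels;
    }

    void setMaxOutputLabels(int maxOutputLabels) {
        // The output layer needs at least one unit, so reject nonsensical values early.
        if (maxOutputLabels <= 0) {
            throw new IllegalArgumentException("maxOutputLabels must be positive, got " + maxOutputLabels);
        }
        this.maxOutputLabels = maxOutputLabels;
    }

    public static void main(String[] args) {
        RankerConfiguration config = new RankerConfiguration();
        System.out.println(config.getMaxOutputLabels()); // prints 10000
        config.setMaxOutputLabels(1_000);                // e.g. a smaller network to save memory
        System.out.println(config.getMaxOutputLabels()); // prints 1000
    }
}
```

The ranker would then size its output layer from `getMaxOutputLabels()` instead of the literal, so existing setups keep the 10'000 default unless they opt in to a smaller value.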

      Original Ticket:

      As a further measure to mitigate possible memory issues, we could reduce the size of the networks. For example, reducing the maximum number of output units (labels) from 10k to 1k would likely shave off ~70% of a network's size. Here are some options and their possible downsides:
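A rough way to sanity-check the ~70% figure, assuming the output layer is a fully connected layer with $h$ inputs from the last hidden layer and $L$ output units (one weight per input plus one bias per unit). The actual layer sizes are not stated in this ticket, so $h$, the total parameter count $P$, and the output-layer share $f$ are placeholders.

```latex
% Parameters of a fully connected output layer with h inputs and L units
P_{\text{out}}(L) = (h + 1)\,L

% Going from 10'000 to 1'000 labels removes 90% of the output layer
P_{\text{out}}(10\,000) - P_{\text{out}}(1\,000) = 9\,000\,(h + 1) = 0.9\,P_{\text{out}}(10\,000)

% If the output layer is a fraction f of all P parameters, the total saving is 0.9 f
\frac{0.9\,P_{\text{out}}(10\,000)}{P} = 0.9\,f \approx 0.7 \;\Rightarrow\; f \approx 0.78
```

In other words, the estimate holds if the output layer accounts for roughly three quarters of the network's parameters and memory use scales with the parameter count; both are assumptions here.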

      • Shrink the output layer from 10k to 1k -> we would "forget" rankings more quickly, that is, once we've seen 1001 different results, the first one would be forgotten. It's a bit hard to estimate how soon that would be in a typical setup, but 10k certainly feels safer.
      • Ignore non-printable ASCII characters and perhaps uppercase letters -> ideally no effect on accuracy, since those are useless anyway. However, we'd only save a small percentage (maybe 10%?) at the cost of quite a bit of implementation work.
      • Shrink the hidden layers -> that's a gamble; the effect on accuracy is hard to predict.
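The "forgetting" described in the first option can be illustrated with a fixed-capacity label index. This is a minimal sketch, assuming results are mapped to output units in the order they are first seen and that the oldest mapping is evicted once all units are taken; the class `BoundedLabelIndex` and its methods are hypothetical, and `NeuralNetworkResultRanker` may assign and evict labels differently.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical fixed-capacity mapping from result identifiers to output-unit indices.
 * When every unit is taken, the oldest result is forgotten (FIFO) and its unit is reused.
 */
final class BoundedLabelIndex {

    private final int capacity;
    // Insertion-ordered map: the first entry is always the oldest result still remembered.
    private final Map<String, Integer> units = new LinkedHashMap<>();

    BoundedLabelIndex(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive, got " + capacity);
        }
        this.capacity = capacity;
    }

    /** Returns the output unit for a result, assigning (and possibly recycling) one if needed. */
    int unitFor(String resultId) {
        Integer existing = units.get(resultId);
        if (existing != null) {
            return existing;
        }
        int unit;
        if (units.size() < capacity) {
            unit = units.size(); // free units are handed out as 0, 1, ..., capacity - 1
        } else {
            // All units are in use: forget the oldest result and reuse its unit.
            Iterator<Map.Entry<String, Integer>> it = units.entrySet().iterator();
            unit = it.next().getValue();
            it.remove();
        }
        units.put(resultId, unit);
        return unit;
    }

    boolean contains(String resultId) {
        return units.containsKey(resultId);
    }

    int size() {
        return units.size();
    }

    public static void main(String[] args) {
        BoundedLabelIndex index = new BoundedLabelIndex(1_000);
        for (int i = 0; i <= 1_000; i++) { // 1001 different results
            index.unitFor("result-" + i);
        }
        System.out.println(index.contains("result-0"));    // prints false (forgotten)
        System.out.println(index.contains("result-1000")); // prints true
        System.out.println(index.size());                  // prints 1000
    }
}
```

With a capacity of 10'000 the same eviction only kicks in after 10'001 distinct results, which is why the larger output layer "feels safer" at the cost of memory.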

    People

      Assignee: Federico Grilli (fgrilli)
      Reporter: Federico Grilli (fgrilli)
      Votes: 0
      Watchers: 5

    Time Tracking

      Estimated: Not Specified
      Remaining: 0d
      Logged: 2d