[IMGREC-40] Train custom neural network for image recognition Created: 26/Oct/18  Updated: 22/Aug/19  Resolved: 09/Jan/19

Status: Closed
Project: Image Recognition
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Neutral
Reporter: Cedric Reichenbach Assignee: Cedric Reichenbach
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Problem/Incident
is caused by MGNLPER-17 Recognize typical marketing images re... Closed
Template:
Acceptance criteria:
Empty
Task DoD:
[ ]* Doc/release notes changes? Comment present?
[ ]* Downstream builds green?
[ ]* Solution information and context easily available?
[ ]* Tests
[ ]* FixVersion filled and not yet released
[ ]  Architecture Decision Record (ADR)
Date of First Response:
Epic Link: Periscope improvements
Sprint: Basel 161, Foundation 1
Story Points: 13

 Description   

See MGNLPER-17 for background, user stories and business benefit.

So far, we're using a model provided by the dl4j model zoo with weights pre-trained on the ImageNet-1000 dataset, which is not a good fit for general-purpose image recognition (see MGNLPER-17 for details).

Since there seem to be no pre-trained networks available that fit better, we should train our own. However, complete training from scratch should not be necessary; transfer learning by "fine-tuning" an existing pre-trained model should be enough: replace only the output layer with one that matches our new number of classes, then freeze all other layers and train on e.g. ImageNet data. See the dl4j documentation about transfer learning.
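A rough sketch of that fine-tuning setup using dl4j's TransferLearning API, assuming VGG16 from the zoo as the base model (the layer names "fc2" and "predictions" are VGG16-specific, and the class count is a placeholder, not a decided value):

```java
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.nn.transferlearning.FineTuneConfiguration;
import org.deeplearning4j.nn.transferlearning.TransferLearning;
import org.deeplearning4j.nn.weights.WeightInit;
import org.deeplearning4j.zoo.model.VGG16;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Nesterovs;
import org.nd4j.linalg.lossfunctions.LossFunctions;

// Placeholder: the actual number of classes is still TBD (see below).
int numClasses = 3000;

// Load VGG16 with ImageNet weights from the dl4j model zoo.
ComputationGraph pretrained =
        (ComputationGraph) VGG16.builder().build().initPretrained();

FineTuneConfiguration fineTune = new FineTuneConfiguration.Builder()
        .updater(new Nesterovs(5e-5))
        .seed(123)
        .build();

// Freeze everything up to and including "fc2", then swap the output layer
// for one with our own number of classes.
ComputationGraph model = new TransferLearning.GraphBuilder(pretrained)
        .fineTuneConfiguration(fineTune)
        .setFeatureExtractor("fc2")
        .removeVertexKeepConnections("predictions")
        .addLayer("predictions",
                new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nIn(4096).nOut(numClasses)
                        .weightInit(WeightInit.XAVIER)
                        .activation(Activation.SOFTMAX)
                        .build(),
                "fc2")
        .build();
```

Only the new output layer would then be trained; all frozen layers keep their ImageNet weights.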

TBD: Classes (labels) we want to support



 Comments   
Comment by Antti Hietala [ 11/Dec/18 ]

TBD: Classes (labels) we want to support

I propose Core WordNet 5000, see MGNLPER-17. It's a list of the 5000 most frequently used English words, of which about 3000 are nouns (we should exclude the adjectives and verbs). Download: http://wordnetcode.princeton.edu/standoff-files/core-wordnet.txt
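A minimal sketch of excluding the adjectives and verbs, assuming (unverified) that each line of core-wordnet.txt starts with a part-of-speech tag ("n", "v", "a") and ends with the lemma; the field layout would need to be checked against the actual file:

```java
import java.util.ArrayList;
import java.util.List;

public class CoreWordnetFilter {

    /**
     * Keeps only noun entries. Assumes each line's first whitespace-separated
     * field is the part-of-speech tag and its last field is the lemma;
     * adjust if the real core-wordnet.txt format differs.
     */
    public static List<String> filterNouns(List<String> lines) {
        List<String> nouns = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length >= 2 && fields[0].equals("n")) {
                nouns.add(fields[fields.length - 1]);
            }
        }
        return nouns;
    }

    public static void main(String[] args) {
        // Invented sample lines illustrating the assumed format.
        List<String> sample = List.of(
                "n 02084071-n dog",
                "v 01835496-v run",
                "a 01382086-a red",
                "n 02121620-n cat");
        System.out.println(filterNouns(sample)); // [dog, cat]
    }
}
```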

Comment by Cedric Reichenbach [ 11/Dec/18 ]

ahietala sounds like a good idea. However, there are a couple of potential issues:

  • Some terms are not supported by ImageNet; for some it doesn't even make sense, because there's no typical depiction for a given noun, e.g. "truth".
  • There might not be enough data in ImageNet for some of the supported synsets (synonym sets, each representing one "thing"). As a rule of thumb, I would say we need at least 100 images per synset to get useful results.
  • Some words match multiple synsets, e.g. "band" might refer to a stripe or to a music group. The question is then how we should tag those. The approach I've started taking so far is to find the first synset supported by ImageNet and use only that one. Alternatively, we could train for all supported synsets of a word and always apply the same tag.
Generated at Mon Feb 12 02:08:56 CET 2024 using Jira 9.4.2#940002-sha1:46d1a51de284217efdcb32434eab47a99af2938b.