The customer operates in the multimedia and entertainment industry and wants to use the model to process a video or scene and generate a caption for it.
TECHNOLOGIES / TOOLS
Caffe
OpenCV
CNN – Object classification – VGG trained on the ILSVRC-2012 dataset
LSTM networks
MODEL HIGHLIGHTS
Sequence to Sequence
Handles variable-length video input and generates variable-length caption output
Two-stage LSTM: the first LSTM encodes the input video frame sequence, and the second LSTM, active during both encoding and decoding, models the output word sequence
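
The highlights above can be illustrated with a minimal sketch of how a two-stage LSTM might be unrolled over time so that both the video and the caption can have any length. This is an assumption about the layout, not the project's actual Caffe network: the function name build_schedule, the tokens <pad> and <BOS>, and the frame_N labels (standing in for per-frame VGG features) are all hypothetical.

```python
from typing import List, Tuple

PAD = "<pad>"  # hypothetical padding token
BOS = "<BOS>"  # hypothetical begin-of-sentence token


def build_schedule(num_frames: int, caption: List[str]) -> List[Tuple[str, str, str]]:
    """Return one (phase, first_lstm_input, second_lstm_word_input) row per time step.

    The rows describe a training-time unrolling in which the caption is known.
    """
    steps: List[Tuple[str, str, str]] = []

    # Encoding phase: the first LSTM reads one frame feature per step.
    # No caption word exists yet, so the second LSTM's word input is padding.
    for t in range(num_frames):
        steps.append(("encode", f"frame_{t}", PAD))

    # Decoding phase: the frames are exhausted, so the first LSTM is fed padding.
    # The second LSTM receives the previous word and is trained to predict the next.
    for previous_word in [BOS] + caption:
        steps.append(("decode", PAD, previous_word))

    return steps


if __name__ == "__main__":
    for row in build_schedule(3, ["a", "man", "sings"]):
        print(row)
```

Because the number of encoding steps follows the frame count and the number of decoding steps follows the caption length, the same unrolled network accommodates clips and captions of different sizes. At inference time the decoding loop would presumably run until an end-of-sentence token is produced rather than for a fixed number of steps.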