Computer Vision News - January 2018

2. The img_model function initializes a placeholder layer for the image weights (features). Note that the code below does not implement the entire CNN (in this case VGGNet): in this implementation, the network was pre-trained on all the images, so at this point the layer is simply prepared for loading the weights determined during pre-training.

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout, Embedding, LSTM

def img_model(dropout_rate):
    print("Creating image model...")
    model = Sequential()
    model.add(Dense(1024, input_dim=4096, activation='tanh'))
    return model

def Word2VecModel(embedding_matrix, num_words, embedding_dim,
                  seq_length, dropout_rate):
    print("Creating text model...")
    model = Sequential()
    model.add(Embedding(num_words, embedding_dim,
                        weights=[embedding_matrix],
                        input_length=seq_length,
                        trainable=False))
    model.add(LSTM(units=512, return_sequences=True,
                   input_shape=(seq_length, embedding_dim)))
    model.add(Dropout(dropout_rate))
    model.add(LSTM(units=512, return_sequences=False))
    model.add(Dropout(dropout_rate))
    model.add(Dense(1024, activation='tanh'))
    return model
```

3. The final function of the model, vqa_model, fuses the features extracted from the text and the image into a single feature vector, adds two fully connected layers, and sets the optimizer for training the network.
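The article does not show the body of vqa_model, but a minimal sketch can be inferred from the description: both branches end in a 1024-dimensional 'tanh' vector, so the fusion takes those two vectors, merges them, and stacks two fully connected layers on top before compiling with an optimizer. The fusion by element-wise multiplication, the layer sizes, and the choice of optimizer below are assumptions for illustration, not the author's original code.

```python
# Hypothetical sketch of vqa_model. Assumptions (not from the original code):
# element-wise multiplication as the fusion, a 1000-unit hidden layer, and
# the 'rmsprop' optimizer. Written with the Keras functional API.
from tensorflow.keras.layers import Input, Dense, Dropout, Multiply
from tensorflow.keras.models import Model

def vqa_model(dropout_rate=0.5, num_classes=1000):
    # Both branches above end in a 1024-d feature vector; here the sketch
    # takes those vectors directly as inputs.
    img_feat = Input(shape=(1024,))
    txt_feat = Input(shape=(1024,))
    # Fuse image and text features into a single 1024-d vector.
    fused = Multiply()([img_feat, txt_feat])
    # Two fully connected layers, as described in the article.
    x = Dropout(dropout_rate)(fused)
    x = Dense(1000, activation='tanh')(x)
    x = Dropout(dropout_rate)(x)
    out = Dense(num_classes, activation='softmax')(x)
    model = Model(inputs=[img_feat, txt_feat], outputs=out)
    # Set the optimizer for training the network.
    model.compile(optimizer='rmsprop',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```

In this sketch the softmax output ranges over the candidate answers, so num_classes is the size of the answer vocabulary.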
