% Created August 12, 2015 16:05
\section{Experiment}
Our fine-tuning starts from an already trained model, the BVLC CaffeNet model. CaffeNet is a slightly modified version of AlexNet, trained on ImageNet. We take the result of fine-tuning this model as our baseline.
We use the Clickture-FilteredDog dataset from Microsoft, a subset of the Clickture-Full dataset that contains only dog-breed-related items. We pick the 107 classes of this subset that contain more than 100 images each, for a total of 89,910 images, and use a 5-fold split: 71,932 images for training and 17,978 for testing.
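The 5-fold split above keeps each breed represented in both partitions. As a minimal sketch (the exact splitting procedure is not specified in this section; the function name and the per-class round-robin assignment are our assumptions), a stratified 5-fold split over image labels could look like:

```python
import random
from collections import defaultdict

def stratified_5fold(labels, fold, seed=0):
    """Split image indices into train/test for one of 5 folds,
    spreading each class's images evenly across the folds.

    labels: list of class labels, one per image; fold: 0..4.
    (Hypothetical helper, for illustration only.)
    """
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)  # shuffle within each class
        for j, i in enumerate(idxs):
            # every 5th image of a class goes to the test fold
            (test if j % 5 == fold else train).append(i)
    return sorted(train), sorted(test)
```

Because class sizes are not multiples of five, the test fold is close to, but not exactly, one fifth of the data, consistent with the 71,932/17,978 split above.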
Our results on Clickture-FilteredDog are shown in Table 1 and Table 2. Our network achieves an accuracy of \textbf{50.5\%}, while the best performance of fine-tuning alone is 46.2\%.
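The column headers in both tables are weights on an MMD (Maximum Mean Discrepancy) loss between feature batches. As a rough illustration only (the kernel, bandwidth, and estimator used in our training are not specified here; this is the standard biased RBF-kernel estimate), a squared-MMD between two batches of features can be computed as:

```python
import numpy as np

def rbf_mmd2(x, y, gamma=1.0):
    """Biased squared-MMD estimate between feature batches x (n, d)
    and y (m, d) under an RBF kernel with bandwidth parameter gamma.
    (Illustrative sketch; gamma=1.0 is an arbitrary choice.)
    """
    def k(a, b):
        # pairwise squared distances, then RBF kernel values
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()
```

The estimate is zero when the two batches coincide and grows as the two feature distributions move apart, which is what the loss weight in the tables trades off against the classification objective.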
In Table 1, our first approach, the average vector, does not exceed the fine-tuning baseline. Our MMD loss stops decreasing after about 7,000 iterations, so we suspect that averaging discards some information in the text, such that the performance is no better than fine-tuning. From the t-SNE visualization, we observe that several different types of dog are clustered together, so we conclude that the average-vector feature loses useful information in the text by averaging the word2vec vectors.
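The average-vector feature described above can be sketched as follows (the function name, the dictionary-style word2vec lookup, and the skipping of out-of-vocabulary tokens are our assumptions, not details from the experiment):

```python
import numpy as np

def average_text_feature(words, w2v):
    """Average the word2vec vectors of the tokens in a query.

    words: list of tokens; w2v: mapping from token to 1-D vector.
    Tokens missing from the vocabulary are skipped.
    (Hypothetical helper, for illustration only.)
    """
    vecs = [w2v[w] for w in words if w in w2v]
    if not vecs:
        return None  # no known word: no text feature
    return np.mean(vecs, axis=0)
```

Note that two different sets of words can average to nearly the same vector, which illustrates why this feature can lose information that distinguishes dog breeds.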
In Table 2, most of our results surpass the baseline. Networks utilizing the VLAD feature improve over the baseline by up to 4\%. The VLAD feature gives our network an impressive improvement: this model corrects 12\% of the wrong predictions made by the average-vector method, while turning 8\% of its correct predictions into errors. %The MMD loss can be lower than with the average-vector method, reflecting that images with dissimilar text are pulled apart.
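Unlike averaging, a VLAD encoding keeps per-cluster residual information from the individual word vectors. A minimal sketch of VLAD encoding with hard assignment (the codebook is assumed to be learned beforehand, e.g.\ by k-means; the function name and normalization details are our assumptions):

```python
import numpy as np

def vlad_encode(local_vecs, centroids):
    """VLAD encoding: sum of residuals to the nearest codebook centroid.

    local_vecs: (n, d) local descriptors (here, word2vec vectors of the
    query terms); centroids: (k, d) codebook. Returns a flat k*d vector.
    (Illustrative sketch, not the exact pipeline of the experiment.)
    """
    # hard-assign each descriptor to its nearest centroid
    d2 = ((local_vecs[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(axis=1)
    k, d = centroids.shape
    v = np.zeros((k, d))
    for i, c in enumerate(assign):
        v[c] += local_vecs[i] - centroids[c]  # accumulate residual
    v = v.flatten()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v  # L2-normalize
```

The vlad-1024/2048/4000 rows in Table 2 correspond to codebooks of different sizes $k$, so the resulting feature dimension is $k$ times the word2vec dimension.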
\begin{table}[]
\centering
\resizebox{88mm}{!}{
\begin{tabular}{|l|l|l|l|l|}
\hline
Text Feature & 0.25* & 0.1* & 0.25 & 0.1 \\ \hline
baseline & 46.2\% & 46.2\% & 46.2\% & 46.2\% \\ \hline
avg-vec & & & & \\ \hline
\end{tabular} }
\caption{Column headers give the weight of the MMD loss; columns marked with an asterisk (*) are from the network without the fc\_adapt layer.}
\label{my-label}
\end{table}
\begin{table}[]
\centering
\resizebox{88mm}{!}{
\begin{tabular}{|l|l|l|l|l|l|l|l|}
\hline
Text Feature & 0.25 & 0.1 & 0.05 & 0.01 & 0.005 & 0.001 & 0.0005 \\ \hline
baseline & 46.2\% & 46.2\% & 46.2\% & 46.2\% & 46.2\% & 46.2\% & 46.2\% \\ \hline
vlad-1024 & 45.8\% & 48.0\% & 48.4\% & 49.6\% & 49.7\% & 49.9\% & \\ \hline
vlad-2048 & 43.0\% & 47.4\% & 48.8\% & 50.0\% & 50.0\% & 50.2\% & \\ \hline
vlad-4000 & & & 48.2\% & 49.9\% & 50.1\% & 50.1\% & 50.2\% \\ \hline
vlad-2048* & 43.1\% & 47.6\% & 48.3\% & 50.0\% & 50.0\% & 50.2\% & \\ \hline
vlad-4000* & & & 48.4\% & 50.0\% & 50.0\% & 50.3\% & {\bf 50.5\%} \\ \hline
\end{tabular} }
\caption{Column headers give the weight of the MMD loss. Text features marked with an asterisk (*) use the ``local vlad'' variant.}
\end{table}