Short notes on papers
1. Mobilenet
Thanks to:
https://blog.csdn.net/t800ghb/article/details/78879612
https://blog.csdn.net/wfei101/article/details/78310226
(Mobilenet V2) https://blog.csdn.net/u011995719/article/details/79135818
Suppose there are M input channels and N kernels; then the output has N channels.
Use padding so that the output spatial size stays the same as the input.
Standard convolution:
The number of multiplications is
(Dk * Dk) * (Dc * Dc) * N * M (1)
where Dk is the kernel dimension and Dc is the spatial dimension of the input/output.
Convolution in Mobilenet (depthwise separable convolution):
First, apply a single Dk * Dk kernel per input channel (depthwise convolution) on the M input channels to get M output channels.
The number of multiplications is
(Dk * Dk) * (Dc * Dc) * M (2)
Second, apply N kernels of size 1 * 1 * M (pointwise convolution) on the M output channels from the first step,
to get the same output dimensions as in standard convolution.
The number of multiplications is
N * M * (Dc * Dc) (3)
The ratio ((2) + (3)) / (1) is (1/N) + (1/(Dk*Dk)), which is usually 1/8 or 1/9 as stated in the paper.
V2 adapts Mobilenet to use resnet-style shortcut connections (inverted residuals). See the Mobilenet V2 link above.
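
A minimal PyTorch sketch of the depthwise separable convolution described above (the toy channel sizes and class name are my own illustration, not from the paper):

import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    # Depthwise conv (one Dk x Dk filter per channel), then a 1x1 pointwise conv.
    def __init__(self, in_channels, out_channels, kernel_size=3):
        super().__init__()
        # groups=in_channels makes this depthwise: cost ~ Dk*Dk * Dc*Dc * M, eq. (2)
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size,
                                   padding=kernel_size // 2, groups=in_channels)
        # 1x1 conv mixes channels: cost ~ N * M * Dc*Dc, eq. (3)
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 32, 56, 56)                 # M = 32 input channels, Dc = 56
y = DepthwiseSeparableConv(32, 64)(x)          # N = 64 output channels
print(y.shape)                                 # torch.Size([1, 64, 56, 56])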
2. Soccer on Your Tabletop | |
See the demo to get a sense of what they did: https://www.youtube.com/watch?v=eRGAB4QBS6U
Input image -> camera calibration -> player detection -> pose estimation -> tracking -> player segmentation | |
-> player depth estimation -> mesh generation -> scene reconstruction | |
(1). camera calibration:
Using the sidelines and the penalty box around the goal, solve for the camera parameters w
(focal length, rotation and translation) that align rendered synthetic field lines with the extracted edge points,
i.e., minimize distance(extracted edges, rendered synthetic field lines) w.r.t. the parameters w.
(2). player detection:
Using Faster R-CNN.
(3). pose estimation:
Detect person keypoints with "Convolutional pose machines".
Use the keypoints to refine the bounding boxes from (2).
(4). tracking:
Track using the bounding boxes from (3).
(5). player segmentation: | |
For every tracked player we need to estimate its segmentation mask to be used in the depth estimation network. | |
A straightforward approach is to apply at each frame a person segmentation method [52], refined with a dense CRF [25],
as we did for training.
[52] F. Yu and V. Koltun. Multi-scale context aggregation by dilated convolutions. In ICLR, 2016.
[25] P. Krahenbuhl and V. Koltun. Efficient inference in fully connected CRFs with Gaussian edge potentials. In NIPS, 2011.
(6). player depth estimation | |
training:
using FIFA games to get (depth, image, estimated segmentation mask) to train a network. | |
Note that we use a player-centric depth estimation because | |
we get more training data by breaking down each frame into 10-20 players, | |
and it is easier for the network to learn individual player’s configuration rather than whole-scene arrangements. | |
inference: | |
input: image with a person's segmentation mask (I think the players are processed one by one)
output: depth at every pixel | |
(7). mesh generation | |
The depth map is then unprojected to world coordinates using the camera parameters, generating the player’s point-cloud | |
in 3D. Each pixel corresponds to a 3D point and we use pixel connectivity to establish faces. We texture-map the mesh | |
with the input image. | |
In short, convert the depth map to world coordinates, then build meshes from these points, and finally texture-map the meshes. (See the sketch below.)
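
A minimal numpy sketch of the unprojection step under a simple pinhole-camera assumption (the intrinsics fx, fy, cx, cy and the function name are illustrative, not from the paper; a full version would also apply the camera-to-world rotation and translation):

import numpy as np

def unproject_depth(depth, fx, fy, cx, cy):
    # Convert a depth map (H x W) into an (H*W, 3) camera-space point cloud.
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Faces come from pixel connectivity: neighboring pixels (i, j), (i+1, j), (i, j+1)
# with valid depth form a triangle, and the input image provides the texture.
depth = np.ones((4, 4))  # toy depth map
points = unproject_depth(depth, fx=500.0, fy=500.0, cx=2.0, cy=2.0)
print(points.shape)      # (16, 3)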
3. Simple Baselines for Human Pose Estimation and Tracking | |
This paper uses a simple method but still achieves state-of-the-art results, which I think is due to FlowNet.
The code will be available later. | |
Pipeline: | |
1. detect person | |
Using Faster R-CNN. If a person is not detected, use FlowNet and the pose keypoints from the previous frame to predict the keypoints
in the current frame, then estimate the bounding box from the predicted keypoints.
2. predict pose keypoints
Estimate the human pose with a CNN (nothing special about this CNN, see fig 1), where the input is the bounding box from step 1.
3. association
The distance is calculated as distance(actually detected pose, pose predicted from the previous pose with FlowNet).
I think the distance here is just the pixel distance.
The association is very simple: each time, take the pair with the least distance and pop it from the pool. Each remaining
unmatched pose (bounding box) is treated as a new target. (See the sketch below.)
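A minimal numpy sketch of this greedy association (the matrix, sizes and function name are my own toy illustration):

import numpy as np

def greedy_associate(dist):
    # dist: (num_tracks x num_detections) distance matrix.
    # Repeatedly take the smallest-distance pair and pop its row/column;
    # leftover detections become new targets.
    dist = dist.astype(float).copy()
    num_detections = dist.shape[1]
    matches = []
    while np.isfinite(dist).any():
        i, j = np.unravel_index(np.argmin(dist), dist.shape)
        matches.append((i, j))
        dist[i, :] = np.inf  # pop this track
        dist[:, j] = np.inf  # pop this detection
    matched = {j for _, j in matches}
    new_targets = [j for j in range(num_detections) if j not in matched]
    return matches, new_targets

dist = np.array([[1.0, 5.0, 9.0],
                 [4.0, 2.0, 8.0]])
print(greedy_associate(dist))  # ([(0, 0), (1, 1)], [2]) -- detection 2 becomes a new target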
4. Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields | |
C++/TensorFlow/PyTorch/... versions of the code are available.
As stated in the paper, "Our method has achieved the speed of 8.8 fps for a video with 19 people". The C++ version should
be faster.
A nice set of slides (maybe by the author): http://image-net.org/challenges/talks/2016/Multi-person%20pose%20estimation-CMU.pdf
pipeline:
First, all keypoints are detected, without grouping them into individual persons.
Then, these keypoints are connected/grouped into different people by greedy matching. The authors use Part Affinity Fields
to calculate the score for connecting two keypoints. (See the sketch below.)
For the Part Affinity Fields, see pages 38, 46, 47, 48 in the slide PDF.
The Part Affinity Fields branch and the keypoint detection branch are learned and run jointly, so it's fast.
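
A rough numpy sketch of the Part Affinity Field connection score: the line integral of the field along a candidate limb, approximated by sampling (the toy field, sizes and function name are mine):

import numpy as np

def paf_score(paf, p1, p2, num_samples=10):
    # paf: (H, W, 2) vector field for one limb type.
    # p1, p2: (x, y) candidate keypoint locations.
    # Average the dot product between the field and the unit vector p1->p2
    # at points sampled along the segment; higher = more likely the same person.
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    d = p2 - p1
    norm = np.linalg.norm(d)
    if norm < 1e-8:
        return 0.0
    unit = d / norm
    score = 0.0
    for t in np.linspace(0.0, 1.0, num_samples):
        x, y = (p1 + t * d).round().astype(int)
        score += paf[y, x] @ unit
    return score / num_samples

paf = np.zeros((64, 64, 2))
paf[..., 0] = 1.0                            # toy field pointing in +x everywhere
print(paf_score(paf, (10, 20), (40, 20)))    # ~1.0: strong horizontal connection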
5. Depth-aware CNN for RGB-D Segmentation | |
https://arxiv.org/pdf/1803.06791.pdf | |
Equations (2)-(5) make up the whole theoretical content of this paper.
6. Relational inductive biases, deep learning, and graph networks | |
GN := graph network | |
(i). I think the authors "propose" GN for tasks where the structural prior of a CNN doesn't fit, such as relation/interaction
reasoning. For example,
"Graph Networks as Learnable Physics Engines for Inference and Control" on ICLR 2018 | |
video, https://m.facebook.com/story.php?story_fbid=429607650887089&id=118896271958230&_rdr | |
"Neural Message Passing for Quantum Chemistry" on ICML 2017, | |
video, https://vimeo.com/238221090 | |
I think the motivation for proposing GN holds if we treat deep learning (CNN/RNN/autoencoder ...) as a new basic "atom" and
use it in more sophisticated cases. As these two say:
https://twitter.com/goodfellow_ian/status/1042246801376436224 | |
https://medium.com/@karpathy/software-2-0-a64152b37c35 | |
(ii). There are many reference papers, and they are described as applications of GN. But actually, only a few of them are direct
applications of this structural form. The authors show that those indirect applications can be transformed into GN form, which I
think is definitely true by intuition. An indirect application, for example:
"Non-local neural networks", on CVPR 2018, | |
Figure 2 and equations (1), (6) are the core of the theoretical part.
7. Dynamic Routing Between Capsules | |
reference: [1]. Neural Network Encapsulation | |
The inputs are, for example, many channels of feature maps v_i, from which we get the prediction capsules \hat{v}_{j|i}.
Then, from the capsules \hat{v}_{j|i}, the outputs are predictions, for example, 10 vectors for the MNIST problem.
The output is a vector because it encodes the object's information in addition to its category, such as rotation and scale. The length of
this vector is the probability.
The output is $s_j = \sum_i c_{ij} \hat{v}_{j|i}$, where j indexes the classes (for example, 10 for MNIST) and s_j is the
vector for each class.
What we want are good c_{ij}, because the c_{ij} let us give an interpretation of the output s_j. And in my opinion,
the c_{ij} are why they chose the name "routing". To get good c_{ij}, we use EM. The variables in EM include c_{ij},
$p(\hat{v}_{j|i} \mid \mu, \sigma)$, s_j, \mu, and \sigma. Here \mu and \sigma are involved because we use Gaussian clusters; \mu and \sigma
are the model parameters of the Gaussian. For the rough EM steps, see equations (2), (3) in [1]; note that in my notation their a_j is
replaced by s_j. (See the routing sketch below, after the quote.)
nice words: | |
The objective of the EM (Expectation Maximization) routing is to group capsules to form a part-whole relationship using a | |
clustering technique (EM). | |
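
A minimal numpy sketch of the routing-by-agreement iteration from the "Dynamic Routing Between Capsules" paper (this is the paper's dynamic routing, not the EM variant from [1]; the toy sizes are mine):

import numpy as np

def squash(s, eps=1e-8):
    # Squash nonlinearity: shrinks vector length into [0, 1), keeps direction.
    sq = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def dynamic_routing(v_hat, num_iters=3):
    # v_hat: (num_in, num_out, dim) prediction vectors \hat{v}_{j|i}.
    # Returns the squashed output capsules, shape (num_out, dim).
    num_in, num_out, _ = v_hat.shape
    b = np.zeros((num_in, num_out))                           # routing logits
    for _ in range(num_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over output capsules: c_ij
        s = np.einsum('ij,ijd->jd', c, v_hat)                 # s_j = sum_i c_ij * v_hat_{j|i}
        v = squash(s)
        b += np.einsum('ijd,jd->ij', v_hat, v)                # agreement updates the routing logits
    return v

v_hat = np.random.randn(6, 10, 16)   # 6 input capsules, 10 classes, 16-dim output vectors
print(dynamic_routing(v_hat).shape)  # (10, 16)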
8. CNN visualization | |
based on the code https://github.com/utkuozbulak/pytorch-cnn-visualizations
(1) Generate an image that maximizes a specific filter's output in a specific layer
A random or zero image is generated
Then input to the network
Then get the desired filter's output
Then, because of our goal, treat the negative of that filter's output as the loss
Then backpropagate the loss to the input image
Then adjust the input image according to the gradient
Iterate several times (see the sketch below)
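
A minimal PyTorch sketch of this activation maximization (the model, layer index and filter index are arbitrary choices of mine, not from the repo):

import torch
from torchvision import models

model = models.vgg16(weights=None).eval()
target_layer, target_filter = model.features[10], 5    # an arbitrary conv layer / filter

activation = {}
target_layer.register_forward_hook(lambda m, i, o: activation.update(out=o))

img = torch.zeros(1, 3, 224, 224, requires_grad=True)  # start from a zero image
optimizer = torch.optim.Adam([img], lr=0.1)
for _ in range(30):
    optimizer.zero_grad()
    model(img)
    # Negative mean activation of the chosen filter: minimizing it maximizes the filter.
    loss = -activation['out'][0, target_filter].mean()
    loss.backward()
    optimizer.step()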
(2) Generate an image that maximizes a class probability
This is essentially the same. One thing to note: the score used is not the probability in [0, 1] after softmax, but the logit
right before softmax in the code. Some papers report that the pre-softmax logit gives better results than the post-softmax probability.
(3) Vanilla backprop
For all classes right before softmax, set the gradient of the desired class to 1 and all the rest to 0.
Backprop this gradient vector (effectively a scalar 1, since the 0 elements contribute nothing) to the input image.
We get a gradient (image) for the input image; then equalize and stretch it to [0, 255] for better viewing. (See the sketch below.)
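
A minimal PyTorch sketch of vanilla backprop (the model and class index are arbitrary choices of mine):

import torch
from torchvision import models

model = models.vgg16(weights=None).eval()
img = torch.randn(1, 3, 224, 224, requires_grad=True)

scores = model(img)          # logits, right before softmax
scores[0, 243].backward()    # equivalent to a one-hot gradient: 1 for class 243, 0 elsewhere

saliency = img.grad[0].abs().max(dim=0)[0]  # collapse the 3 color channels into one map
# Stretch to [0, 255] for viewing.
saliency = 255 * (saliency - saliency.min()) / (saliency.max() - saliency.min() + 1e-8)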
(4) gradcam.py (the operation described below is Grad-CAM)
To visualize a specific layer, aggregate all channels (convolution outputs) in that layer into a single map by a
weighted sum, where the weight for a channel is the mean of the gradient on that channel (a scalar mean over a 2D feature map).
Then normalize the result to [0, 255]. (See the sketch below.)
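
A minimal PyTorch sketch of Grad-CAM as described above (the model, layer and class index are arbitrary choices of mine):

import torch
import torch.nn.functional as F
from torchvision import models

model = models.vgg16(weights=None).eval()
layer = model.features[28]   # an arbitrary late conv layer
feats, grads = {}, {}
layer.register_forward_hook(lambda m, i, o: feats.update(out=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(out=go[0]))

img = torch.randn(1, 3, 224, 224)
model(img)[0, 243].backward()    # class 243 is arbitrary

w = grads['out'].mean(dim=(2, 3), keepdim=True)    # channel weight = mean gradient per channel
cam = F.relu((w * feats['out']).sum(dim=1))[0]     # weighted sum over channels
cam = 255 * (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 255]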
(5) Guided backprop
This modifies vanilla backprop a little.
For all classes right before softmax, set the gradient of the desired class to 1 and all the rest to 0.
In the network, a --> b, where a is the sum of the elementwise multiplication of a filter and an image patch, --> is ReLU,
and b is a after -->. During backprop, the gradient on b is additionally set to 0 wherever it is smaller than 0. (See the sketch below.)
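
A minimal PyTorch sketch of guided backprop via backward hooks on the ReLUs (the model and class index are arbitrary choices of mine):

import torch
import torch.nn as nn
from torchvision import models

model = models.vgg16(weights=None).eval()

def guided_relu_hook(module, grad_input, grad_output):
    # On top of the usual ReLU mask, zero out the negative incoming gradients.
    return (torch.clamp(grad_input[0], min=0.0),)

for m in model.modules():
    if isinstance(m, nn.ReLU):
        m.inplace = False   # in-place ops interfere with backward hooks
        m.register_full_backward_hook(guided_relu_hook)

img = torch.randn(1, 3, 224, 224, requires_grad=True)
model(img)[0, 243].backward()
guided_grads = img.grad     # typically much sharper than vanilla backprop gradients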
(6) smooth_grad.py
Generate a noise image with zero mean and modest variance.
Add this noise image to the original image.
Do vanilla backprop or guided backprop on the new image to get the (un-post-processed) gradient image.
Repeat the above process several times and average the resulting gradient images.
Post-process the averaged image for better viewing. (See the sketch below.)
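
A minimal PyTorch sketch of SmoothGrad on top of vanilla backprop (the function name and noise level are my own choices):

import torch

def smooth_grad(model, img, target_class, n=25, sigma=0.15):
    # Average vanilla-backprop gradients over n noisy copies of the image.
    total = torch.zeros_like(img)
    for _ in range(n):
        noisy = (img + sigma * torch.randn_like(img)).detach().requires_grad_(True)
        model(noisy)[0, target_class].backward()
        total += noisy.grad
    return total / n   # post-process (e.g., stretch to [0, 255]) for viewing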
(7) guided_gradcam.py | |
This is just the pointwise multiplication of the gradcam.py mask and the guided backprop mask.
(8) inverted_representation.py (Understanding Deep Image Representations by Inverting Them)
For an input image, for example a cat image, we get its feature maps at a specific layer.
Then we generate a random image and get its feature maps at the same layer as for the input image.
Then we calculate the L2 distance (loss) between the two sets of feature maps, plus some regularization terms (for a smooth result).
There are two kinds of regularizers in the paper. The first is the p-norm of the image (p = 6 in the paper); I don't know
the effect of this regularizer. The second is the first-order derivatives along the x and y axes (total variation), which encourage
the result image to have constant regions.
Then we use the loss to update the generated image, iterating several times. (See the sketch below.)
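
A minimal PyTorch sketch of the inversion loss described above (the regularizer weights and function name are my own guesses, not the paper's values):

import torch

def inversion_loss(feat_gen, feat_target, img, p=6, p_weight=1e-6, tv_weight=1e-4):
    # Feature reconstruction term: L2 distance between the two sets of feature maps.
    l2 = (feat_gen - feat_target).pow(2).sum()
    # Regularizer 1: p-norm of the image (p = 6 in the paper), keeps pixel values bounded.
    p_reg = img.abs().pow(p).sum()
    # Regularizer 2: first-order x/y derivatives (total variation), encourages constant regions.
    tv = (img[..., 1:, :] - img[..., :-1, :]).pow(2).sum() + \
         (img[..., :, 1:] - img[..., :, :-1]).pow(2).sum()
    return l2 + p_weight * p_reg + tv_weight * tv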
9. 15 Logical Fallacies You Should Know Before Getting Into a Debate | |
https://thebestschools.org/magazine/15-logical-fallacies-know/ | |
(1) Ad Hominem Fallacy | |
Instead of addressing the candidate’s stance on the issues, or addressing his or her effectiveness as a statesman or | |
stateswoman, ad hominems focus on personality issues, speech patterns, wardrobe, style, and other things that affect | |
popularity but have no bearing on their competence. | |
(2) Straw Man | |
In the straw man fallacy, someone attacks a position the opponent doesn’t really hold. Instead of contending with the
actual argument, he or she instead attacks the equivalent of a lifeless bundle of straw, an easily defeated effigy.
Straw man fallacies are a cheap and easy way to make one’s position look stronger than it is. Often the straw man | |
fallacy is accidental, because one doesn’t realize he or she is oversimplifying a nuanced position, or | |
misrepresenting a narrow, cautious claim as if it were broad and foolhardy. | |
(3) Appeal to Ignorance (argumentum ad ignorantiam) | |
Consider the following two claims: “No one has ever been able to prove definitively that extra-terrestrials exist, | |
so they must not be real.” “No one has ever been able to prove definitively that extra-terrestrials do not exist, so | |
they must be real.” If we don’t know whether they exist, then we don’t know that they do exist or that they don’t | |
exist. Ignorance doesn’t prove any claim to knowledge. | |
(4) False Dilemma/False Dichotomy | |
False Dilemma fails by limiting the options to two when there are in fact more options to choose from. For example, | |
there are only two kinds of people in the world, people who love Led Zeppelin, and people who hate music. | |
It’s not a fallacy if there really are only two options. For example, “either Led Zeppelin is the greatest band of | |
all time, or they are not.” That’s a true dilemma, since there really are only two options there. | |
(5) Slippery Slope | |
The slippery slope fallacy suggests that unlikely or ridiculous outcomes are likely when there’s just not enough | |
evidence to think so. You may have used this fallacy on your parents as a teenager: “But, you have to let me go to | |
the party! If I don’t go to the party, I’ll be a loser with no friends. Next thing you know I’ll end up alone and | |
jobless living in your basement when I’m 30!” | |
(6) Circular Argument (petitio principii) | |
When a person’s argument is just repeating what they already assumed beforehand, it’s not arriving at any new | |
conclusion. We call this a circular argument or circular reasoning. Another way to explain circular arguments is | |
that they start where they finish, and finish where they started. | |
(7) Hasty Generalization | |
Hasty generalizations are general statements without sufficient evidence to support them. They are general claims | |
too hastily made, hence they commit some sort of illicit assumption, stereotyping, unwarranted conclusion, | |
overstatement, or exaggeration. Is one example enough to prove the claim that "Apple computers are the most | |
expensive computer brand?" What about 12 examples? What about if 37 out of 50 Apple computers were more expensive
than comparable models from other brands? A simple way to avoid hasty generalizations is to add qualifiers like | |
“sometimes,” "maybe," "often," or "it seems to be the case that ... ". | |
(8) Red Herring (ignoratio elenchi) | |
A “red herring” is a distraction from the argument typically with some sentiment that seems to be relevant but | |
isn’t really on-topic. This tactic is common when someone doesn’t like the current topic and wants to detour into | |
something else instead, something easier or safer to address. | |
(9) Tu Quoque Fallacy | |
The “tu quoque,” Latin for “you too,” is also called the “appeal to hypocrisy” because it distracts from the argument | |
by pointing out hypocrisy in the opponent. If Jack says, “Maybe I committed a little adultery, but so did you Jason!” | |
Jack is trying to diminish his responsibility or defend his actions by distributing blame to other people. But no one | |
else’s guilt excuses his own guilt. No matter who else is guilty, Jack is still an adulterer. | |
(10) Causal Fallacy | |
The Causal Fallacy is any logical breakdown when identifying a cause. You can think of the Causal Fallacy as a parent | |
category for several different fallacies about unproven causes. | |
i). One causal fallacy is the False Cause or non causa pro causa ("not the-cause for a cause") fallacy, which is | |
when you conclude about a cause without enough evidence to do so. Consider, for example, “Since your parents | |
named you ‘Harvest,’ they must be farmers.” | |
ii). Another causal fallacy is the Post Hoc fallacy. This fallacy happens when you mistake something for the | |
cause just because it came first. “Yesterday, I walked under a ladder with an open umbrella indoors while | |
spilling salt in front of a black cat. And I forgot to knock on wood with my lucky dice. That must be why I’m | |
having such a bad day today. It’s bad luck.” | |
iii). Another kind of causal fallacy is the correlational fallacy. This fallacy happens when you mistakenly | |
interpret two things found together as being causally related. Two things may correlate without a causal relation, | |
or they may have some third factor causing both of them to occur. Or perhaps both things just, coincidentally, | |
happened together. Consider for example, “Every time Joe goes swimming he is wearing his Speedos. Something | |
about wearing that Speedo must make him want to go swimming.” | |
(11) Fallacy of Sunk Costs | |
Sometimes we invest ourselves so thoroughly in a project that we’re reluctant to ever abandon it, even when it turns | |
out to be fruitless and futile. It’s natural, and usually not a fallacy to want to carry on with something we find | |
important, not least because of all the resources we’ve put into it. However, this kind of thinking becomes a fallacy | |
when we start to think that we should continue with a task or project because of all that we’ve put into it, without | |
considering the future costs we’re likely to incur by doing so. There may be a sense of accomplishment when finishing, | |
and the project might have other values, but it’s not enough to justify the cost invested in it. | |
(12) Appeal to Authority (argumentum ad verecundiam) | |
This fallacy happens when we misuse an authority. This misuse of authority can occur in a number of ways. We can cite | |
only authorities — steering conveniently away from other testable and concrete evidence as if expert opinion is always | |
correct. Or we can cite irrelevant authorities, poor authorities, or false authorities. Suppose someone says, “I buy | |
Fruit of the Loom™ underwear because Michael Jordan says it’s the best.” But Michael Jordan isn’t a relevant authority | |
when it comes to underwear. This is a fallacy of irrelevant authority. There’s another problem with relying too | |
heavily on authorities. Even the authorities can be wrong sometimes. | |
(13) Equivocation (ambiguity) | |
Equivocation happens when a word, phrase, or sentence is used deliberately to confuse, deceive, or mislead by | |
sounding like it’s saying one thing but actually saying something else. For example, a euphemism might be replacing | |
"lying" with the phrase "creative license," or replacing my "criminal background" with my "youthful indiscretions," | |
or replacing "fired from my job" with "early retirement." | |
(14) Appeal to Pity (argumentum ad misericordiam) | |
It is a fallacy of relevance. Personal attacks, and emotional appeals, aren’t strictly relevant to whether something | |
is true or false. In this case, the fallacy appeals to the compassion and emotional sensitivity of others when these | |
factors are not strictly relevant to the argument. Appeals to pity often appear as emotional manipulation. For example, | |
“How can you eat that innocent little carrot? He was plucked from his home in the ground at a young age, and violently | |
skinned, chemically treated, and packaged, and shipped to your local grocer and now you are going to eat him into | |
oblivion when he did nothing to you. You really should reconsider what you put into your body.” | |
To be fair, emotions can sometimes be relevant. Often, the emotional aspect is a key insight into whether something | |
is morally repugnant or praiseworthy, or whether a governmental policy will be winsome or repulsive. People’s feelings | |
about something can be critically important data when planning a campaign, advertising a product, or rallying a group | |
together for a charitable cause. But it becomes a fallacious appeal to pity when the emotions are used in | |
substitution for facts or as a distraction from the facts of the matter. | |
(15) Bandwagon Fallacy | |
The bandwagon fallacy assumes something is true (or right, or good) because other people agree with it. The form of | |
this argument often looks like this: “Many people do or think X, so you ought to do or think X too.” One problem with | |
this kind of reasoning is that the broad acceptance of some claim or action is not always a good indication that the | |
acceptance is justified. People can be mistaken, confused, deceived, or even willfully irrational. And when people | |
act together, sometimes they become even more foolish — i.e., “mob mentality.” | |