@ZHAOZHIHAO
Last active November 16, 2018 04:27
Short notes on papers
1. Mobilenet
Thanks to:
https://blog.csdn.net/t800ghb/article/details/78879612
https://blog.csdn.net/wfei101/article/details/78310226
(Mobilenet V2)https://blog.csdn.net/u011995719/article/details/79135818
Suppose there are M input channels and N kernels; then there are N output channels.
Padding keeps the output spatial size the same as the input.
Standard convolution:
The number of multiplications is
(Dk * Dk) * (Dc * Dc) * N * M (1)
where Dk is the kernel size and Dc is the spatial size of the input/output feature map.
Convolution in Mobilenet (depthwise separable):
First, apply a single Dk * Dk kernel to each of the M input channels (depthwise), giving M output channels.
The number of multiplications is
(Dk * Dk) * (Dc * Dc) * M (2)
Second, apply N kernels of size 1 * 1 * M (pointwise) to the M channels from the first step,
to get the same output shape as standard convolution.
The number of multiplications is
N * M * (Dc * Dc) (3)
The ratio ((2) + (3)) / (1) is (1/N) + (1/(Dk*Dk)), usually around 1/8 to 1/9 as said in the paper.
V2 adapts Mobilenet to use residual connections as in ResNet. See the previous link.
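The cost comparison above can be checked numerically; this is a small sketch of the two multiplication counts (the function and variable names are my own, not from the paper):

```python
def standard_conv_cost(Dk, Dc, M, N):
    # Formula (1): a Dk x Dk x M kernel per output channel, at every spatial position.
    return (Dk * Dk) * (Dc * Dc) * N * M

def depthwise_separable_cost(Dk, Dc, M, N):
    depthwise = (Dk * Dk) * (Dc * Dc) * M  # formula (2): one Dk x Dk filter per input channel
    pointwise = N * M * (Dc * Dc)          # formula (3): N 1x1xM filters at every position
    return depthwise + pointwise

# Ratio ((2)+(3))/(1) = 1/N + 1/(Dk*Dk); for Dk = 3 and large N this approaches 1/9.
Dk, Dc, M, N = 3, 112, 64, 128
ratio = depthwise_separable_cost(Dk, Dc, M, N) / standard_conv_cost(Dk, Dc, M, N)
```

With Dk = 3 and N = 128 the ratio is 1/128 + 1/9, i.e. roughly 1/8, matching the paper's claim.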
2. Soccer on Your Tabletop
Watch the demo to see what they have done: https://www.youtube.com/watch?v=eRGAB4QBS6U
Input image -> camera calibration -> player detection -> pose estimation -> tracking -> player segmentation
-> player depth estimation -> mesh generation -> scene reconstruction
(1). camera calibration:
using the sidelines and the penalty box around the goal, solve for the camera parameters w
(focal length, rotation, and translation) that align rendered synthetic field lines with the extracted edge points,
i.e., minimize distance(extracted edges, rendered synthetic field lines) w.r.t. the parameters w.
(2). player detection:
using Faster R-CNN.
(3). pose estimation:
detect person keypoints with "Convolutional Pose Machines".
Use the keypoints to refine the bounding boxes from (2).
(4). tracking:
tracking via the bounding boxes from (3)
(5). player segmentation:
For every tracked player we need to estimate its segmentation mask to be used in the depth estimation network.
A straightforward approach is to apply at each frame a person segmentation method [52], refined with a dense CRF[25]
as we did for training.
[52]F. Yu and V. Koltun. Multi-scale context aggregation by dilated convolutions. In ICLR, 2016
[25]P.Krahenbuhl and V.Koltun. Efficient inference in fully connected crfs with gaussian edge potentials. In NIPS, 2011
(6). player depth estimation
training:
using the FIFA video game to obtain (depth, image, estimated segmentation mask) triples to train a network.
Note that we use a player-centric depth estimation because
we get more training data by breaking down each frame into 10-20 players,
and it is easier for the network to learn individual player’s configuration rather than whole-scene arrangements.
inference:
input: an image with a person's segmentation mask (I think the players are processed one by one)
output: depth at every pixel
(7). mesh generation
The depth map is then unprojected to world coordinates using the camera parameters, generating the player’s point-cloud
in 3D. Each pixel corresponds to a 3D point and we use pixel connectivity to establish faces. We texture-map the mesh
with the input image.
In short: convert the depth map to world coordinates, build meshes from these points, and finally texture-map the meshes.
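The unprojection step can be sketched with a pinhole camera model; this is a minimal illustration, assuming hypothetical intrinsics (fx, fy, cx, cy) and a 4x4 camera-to-world matrix, not the paper's actual calibration code:

```python
import numpy as np

def unproject(depth, fx, fy, cx, cy, cam_to_world):
    """Lift an (H, W) depth map to 3D world points via the pinhole model."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # Back-project each pixel into camera coordinates using its depth.
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    pts_cam = np.stack([x, y, depth, np.ones_like(depth)], axis=-1)  # homogeneous
    # Move from camera to world coordinates; drop the homogeneous coordinate.
    return (pts_cam.reshape(-1, 4) @ cam_to_world.T)[:, :3]

# A flat depth map at 2 m with an identity extrinsic stays at z = 2 in world space.
pts = unproject(np.full((4, 4), 2.0), fx=1.0, fy=1.0, cx=2.0, cy=2.0,
                cam_to_world=np.eye(4))
```

Each pixel then becomes one 3D point, and neighboring pixels can be connected into mesh faces as the paper describes.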
3. Simple Baselines for Human Pose Estimation and Tracking
This paper uses a simple method but still achieves state-of-the-art results, which I think is because of the FlowNet.
The code will be available later.
Pipeline:
1. detect person
using Faster R-CNN. If no person is detected, use FlowNet and the pose keypoints from the previous frame to predict the keypoints
in the current frame, then estimate the bounding box from the predicted keypoints.
2. predict pose keypoint
estimate the human pose with a CNN (nothing special about this CNN, see Fig. 1), where the input is the bounding box from step 1.
3. association
the distance is calculated as distance(actually detected pose, pose predicted from the previous pose via FlowNet).
I think the distance here is just pixel distance.
The association is very simple: each time, take the pair with the smallest distance and pop it from the pool. Any remaining
unmatched pose (bounding box) is treated as a new target.
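The greedy association above can be sketched in a few lines; this is my own toy version, where `dist` stands in for whatever pose distance is used (here plain pixel distance, as I suspect):

```python
def greedy_associate(tracked, detected, dist):
    """Greedy matching: repeatedly take the closest (track, detection) pair."""
    # Enumerate all pairs, sorted by distance, smallest first.
    pairs = sorted(
        ((dist(t, d), i, j) for i, t in enumerate(tracked)
                            for j, d in enumerate(detected)),
        key=lambda x: x[0],
    )
    matches, used_t, used_d = [], set(), set()
    for _, i, j in pairs:
        if i not in used_t and j not in used_d:   # both sides still unmatched
            matches.append((i, j))
            used_t.add(i)
            used_d.add(j)
    # Detections left over are treated as new targets.
    new_targets = [j for j in range(len(detected)) if j not in used_d]
    return matches, new_targets

# Toy 1-D "poses": track 0 matches detection 1, track 1 matches detection 0,
# and detection 2 becomes a new target.
matches, new = greedy_associate([0.0, 10.0], [9.0, 0.5, 30.0],
                                dist=lambda a, b: abs(a - b))
```

The sorted-pairs pass is a simple alternative to optimal (Hungarian) assignment, which matches the paper's "very simple" framing.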
4. Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields
C++/TensorFlow/PyTorch/... versions of the code are available.
As said in the paper, "Our method has achieved the speed of 8.8 fps for a video with 19 people". The C++ version should
be faster.
A nice comment(maybe by the author): http://image-net.org/challenges/talks/2016/Multi-person%20pose%20estimation-CMU.pdf
pipeline:
First, all keypoints are detected, without grouping them into individual persons.
Then these keypoints are connected/grouped into different people by greedy matching. The authors use Part Affinity Fields
to score the connection between two keypoints.
For Part Affinity Fields, see pages 38, 46, 47, and 48 of the comment PDF.
The Part Affinity Fields branch and the keypoint detection branch are learned and run jointly, so it's fast.
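The connection score is, roughly, a line integral of the 2-channel PAF along the candidate limb; this is a simplified sketch of that idea (my own function, not the authors' code, and it ignores their integral-of-alignment subtleties like non-maximum handling):

```python
import numpy as np

def paf_score(paf, p1, p2, n_samples=10):
    """Score a candidate limb p1 -> p2 by averaging the dot product of the
    (H, W, 2) field with the limb's unit direction at sampled points."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    v = p2 - p1
    norm = np.linalg.norm(v)
    if norm == 0:
        return 0.0
    u = v / norm                                  # unit vector along the limb
    total = 0.0
    for t in np.linspace(0.0, 1.0, n_samples):    # sample points along the segment
        x, y = (p1 + t * v).round().astype(int)
        total += paf[y, x] @ u                    # alignment of field with direction
    return total / n_samples

# A field pointing uniformly in +x fully supports a horizontal limb (score 1.0)
# and gives no support to a vertical one (score 0.0).
field = np.zeros((8, 8, 2))
field[..., 0] = 1.0
horizontal = paf_score(field, (1, 3), (6, 3))
vertical = paf_score(field, (3, 1), (3, 6))
```

High-scoring candidate connections are then kept greedily, which is what groups keypoints into people.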
5. Depth-aware CNN for RGB-D Segmentation
https://arxiv.org/pdf/1803.06791.pdf
Equations (2) (3) (4) (5) are all theoretical content of this paper.
6. Relational inductive biases, deep learning, and graph networks
GN := graph network
(i). I think the authors "propose" GN for tasks where the structural prior of CNNs doesn't fit, such as relation/interaction
reasoning. For example,
"Graph Networks as Learnable Physics Engines for Inference and Control" on ICLR 2018
video, https://m.facebook.com/story.php?story_fbid=429607650887089&id=118896271958230&_rdr
"Neural Message Passing for Quantum Chemistry" on ICML 2017,
video, https://vimeo.com/238221090
I think the background for proposing GN holds true if we treat deep learning (CNN/RNN/autoencoder ...) as a new basic "atom" and
use it in more sophisticated cases, as these two sources suggest:
https://twitter.com/goodfellow_ian/status/1042246801376436224
https://medium.com/@karpathy/software-2-0-a64152b37c35
(ii). There are many reference papers presented as applications of GN, but in fact only a few of them are direct applications
of this structural form. The authors argue that the indirect applications can be transformed into GN form, which I think is
intuitively true. An indirect application, for example:
"Non-local neural networks", CVPR 2018,
where Figure 2 and equations (1), (6) are the core of the theoretical part.
7. Dynamic Routing Between Capsules
reference: [1]. Neural Network Encapsulation
The inputs are, for example, many feature-map channels v_i, from which we compute prediction capsules \hat{v}_{j|i}.
From the capsules \hat{v}_{j|i}, the outputs are predictions, for example 10 vectors for the MNIST problem.
The output is a vector because it encodes object information beyond the category, such as rotation and scale. The length of
this vector is the probability.
The output is $s_j=\sum_i c_{ij}\hat{v}_{j|i}$, where j indexes the classes (e.g., 10 for MNIST) and s_j is the
vector for each class.
What we want are good c_{ij}, because the c_{ij} let us give an interpretation of the output s_j. And in my opinion,
the c_{ij} are why they chose the name "routing". To get good c_{ij}, we use EM. The variables in EM include c_{ij},
$p(\hat{v}_{j|i} \mid \mu, \sigma)$, s_j, \mu, and \sigma. Here \mu and \sigma are involved because Gaussian clusters are used; they are the
parameters of the Gaussian model. For the rough EM steps, see equations (2), (3) in [1]; note that in my notation a_j is
replaced by s_j.
nice words:
The objective of the EM (Expectation Maximization) routing is to group capsules to form a part-whole relationship using a
clustering technique (EM).
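The simpler routing-by-agreement from the "Dynamic Routing Between Capsules" paper (the EM variant adds the Gaussian \mu and \sigma on top of this) can be sketched in numpy; shapes and iteration count here are illustrative assumptions:

```python
import numpy as np

def squash(s, axis=-1):
    # Squash non-linearity: keeps direction, maps the length into [0, 1).
    n2 = (s ** 2).sum(axis=axis, keepdims=True)
    return (n2 / (1.0 + n2)) * s / np.sqrt(n2 + 1e-9)

def route(v_hat, n_iters=3):
    """Routing by agreement over predictions v_hat of shape (num_in, num_out, dim)."""
    b = np.zeros(v_hat.shape[:2])                 # routing logits b_ij
    for _ in range(n_iters):
        # Coupling coefficients c_ij: softmax of b over the output capsules j.
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        s = (c[..., None] * v_hat).sum(axis=0)    # s_j = sum_i c_ij * v_hat_{j|i}
        out = squash(s)
        # Agreement (dot product) between predictions and outputs updates the logits.
        b = b + (v_hat * out[None]).sum(axis=-1)
    return out, c

rng = np.random.default_rng(0)
out, c = route(rng.normal(size=(6, 10, 16)))      # 6 input capsules, 10 classes, dim 16
```

The output lengths stay below 1 (so they can act as probabilities), and each input capsule's couplings c_ij sum to 1 across classes, which is the "routing" interpretation above.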
8. CNN visualization
based on the code https://github.com/utkuozbulak/pytorch-cnn-visualizations
(1) Generate an image that maximizes a specific filter's output in a specific layer
A random or zero image is generated
Then input to the network
Then get the desired filter's output
Then, because the goal is maximization, treat the negative of that filter's output as the loss
Then backpropagate the loss to the input image
Then adjust the input image according to the gradient
Iterate several times
(2) Generate an image that maximizes a class probability
This is essentially the same. One detail: in the code, the score used is not the post-softmax probability in [0, 1] but the
pre-softmax logit. Some papers report that the pre-softmax score gives better results than the post-softmax one.
(3) Vanilla backprop
For all classes right before softmax, set the gradient of the desired class to 1 and all the rest to 0.
Backprop this gradient vector (effectively a scalar 1, since the 0 elements don't contribute) to the input image.
This gives a gradient (image) for the input; then equalize and stretch it to [0, 255] for better viewing.
(4) gradcam.py (I don't know the term for the operation described below)
To visualize a specific layer's output, aggregate all channels (convolution outputs) in that layer into a single map by a
weighted sum, where the weight for a channel is the mean of the gradient on that channel (a scalar mean over a 2D feature map).
Then normalize the result to [0, 255].
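The channel aggregation described above is a few lines of numpy; this is a minimal sketch assuming the layer's activations and gradients have already been captured (e.g. via hooks), with an added ReLU as in the usual Grad-CAM recipe:

```python
import numpy as np

def grad_cam(activations, gradients):
    """activations, gradients: (C, H, W) arrays from the chosen layer."""
    weights = gradients.mean(axis=(1, 2))             # one scalar weight per channel
    cam = np.tensordot(weights, activations, axes=1)  # weighted sum over channels -> (H, W)
    cam = np.maximum(cam, 0)                          # keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()                         # normalize to [0, 1]
    return (cam * 255).astype(np.uint8)               # rescale to [0, 255]

# Toy example: channel 0 is active with positive mean gradient, channel 1 is silent.
acts = np.stack([np.ones((4, 4)), np.zeros((4, 4))])
grads = np.stack([np.full((4, 4), 0.5), np.full((4, 4), -1.0)])
heat = grad_cam(acts, grads)
```

The result is a single heat map the size of the feature map, usually upsampled and overlaid on the input image.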
(5) Guided backprop
This modifies Vanilla backprop slightly.
For all classes right before softmax, set the gradient of the desired class to 1 and all the rest to 0.
In a network, a --> b, where a is the sum of the elementwise multiplication of a filter and an image patch, --> is ReLU,
and b is a after -->. During backprop, the gradient on b is set to 0 wherever it is negative.
(6) smooth_grad.py
Generate a noise image with 0 mean and modest variance.
Add this noise image to the original image.
Do vanilla backprop or guided backprop on the new image to get the (un-post-processed) gradient image.
Repeat the above process several times, and average the resulting images.
Post-process the averaged image for a better look.
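The averaging loop above is easy to sketch; here `grad_fn` stands in for whichever backprop routine is used (vanilla or guided), and the sample count and noise level are illustrative:

```python
import numpy as np

def smooth_grad(image, grad_fn, n=25, sigma=0.15, seed=0):
    """Average the gradient over n noisy copies of the image.
    grad_fn: any routine returning a gradient with the same shape as its input."""
    rng = np.random.default_rng(seed)
    acc = np.zeros_like(image, dtype=float)
    for _ in range(n):
        noisy = image + rng.normal(0.0, sigma, size=image.shape)  # perturb the input
        acc += grad_fn(noisy)                                     # accumulate gradients
    return acc / n                                                # average

# With a linear "network" x -> w . x the gradient is constant, so averaging recovers w.
w = np.array([1.0, -2.0, 3.0])
g = smooth_grad(np.zeros(3), grad_fn=lambda x: w)
```

For a real network the per-sample gradients are noisy, and it is exactly this averaging that smooths the saliency map.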
(7) guided_gradcam.py
This is just pointwise multiplication of gradcam.py mask and guided backprop mask.
(8) inverted_representation.py (Understanding Deep Image Representations by Inverting Them)
For an input image, for example a cat image, we take its feature maps at a specific layer.
Then we generate a random image and take its feature maps at the same layer.
Then we calculate the L2 distance (loss) between the two sets of feature maps, plus some regularization terms (for a smooth result).
There are two kinds of regularizer in the paper. The first is the alpha-norm (alpha = 6 in the paper); I don't know what the
effect of this regularizer is. The second is the first-order derivative along the x- and y-axes, which encourages the
result image to have constant regions.
Then we use the loss to update the generated image, and iterate several times.
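The second regularizer (a total-variation-style penalty) is simple to write down; this is a toy sketch of the idea, not the paper's exact formulation (which raises the differences to a power):

```python
import numpy as np

def tv_loss(img):
    """Sum of absolute first-order x/y differences of a 2D image.
    Penalizing this encourages piecewise-constant regions."""
    dx = np.abs(np.diff(img, axis=1)).sum()  # horizontal differences
    dy = np.abs(np.diff(img, axis=0)).sum()  # vertical differences
    return dx + dy

flat = np.full((5, 5), 3.0)      # a constant image has zero penalty
step = np.zeros((5, 5))
step[:, 2:] = 1.0                # one vertical edge: 5 unit jumps
```

Minimizing this term alongside the feature-map L2 distance is what makes the reconstructed image look smooth.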
9. 15 Logical Fallacies You Should Know Before Getting Into a Debate
https://thebestschools.org/magazine/15-logical-fallacies-know/
(1) Ad Hominem Fallacy
Instead of addressing the candidate’s stance on the issues, or addressing his or her effectiveness as a statesman or
stateswoman, ad hominems focus on personality issues, speech patterns, wardrobe, style, and other things that affect
popularity but have no bearing on their competence.
(2) Straw Man
In the straw man fallacy, someone attacks a position the opponent doesn’t really hold. Instead of contending with the
actual argument, he or she instead attacks the equivalent of a lifeless bundle of straw, an easily defeated effigy.
Straw man fallacies are a cheap and easy way to make one’s position look stronger than it is. Often the straw man
fallacy is accidental, because one doesn’t realize he or she is oversimplifying a nuanced position, or
misrepresenting a narrow, cautious claim as if it were broad and foolhardy.
(3) Appeal to Ignorance (argumentum ad ignorantiam)
Consider the following two claims: “No one has ever been able to prove definitively that extra-terrestrials exist,
so they must not be real.” “No one has ever been able to prove definitively that extra-terrestrials do not exist, so
they must be real.” If we don’t know whether they exist, then we don’t know that they do exist or that they don’t
exist. Ignorance doesn’t prove any claim to knowledge.
(4) False Dilemma/False Dichotomy
False Dilemma fails by limiting the options to two when there are in fact more options to choose from. For example,
there are only two kinds of people in the world, people who love Led Zeppelin, and people who hate music.
It’s not a fallacy if there really are only two options. For example, “either Led Zeppelin is the greatest band of
all time, or they are not.” That’s a true dilemma, since there really are only two options there.
(5) Slippery Slope
The slippery slope fallacy suggests that unlikely or ridiculous outcomes are likely when there’s just not enough
evidence to think so. You may have used this fallacy on your parents as a teenager: “But, you have to let me go to
the party! If I don’t go to the party, I’ll be a loser with no friends. Next thing you know I’ll end up alone and
jobless living in your basement when I’m 30!”
(6) Circular Argument (petitio principii)
When a person’s argument is just repeating what they already assumed beforehand, it’s not arriving at any new
conclusion. We call this a circular argument or circular reasoning. Another way to explain circular arguments is
that they start where they finish, and finish where they started.
(7) Hasty Generalization
Hasty generalizations are general statements without sufficient evidence to support them. They are general claims
too hastily made, hence they commit some sort of illicit assumption, stereotyping, unwarranted conclusion,
overstatement, or exaggeration. Is one example enough to prove the claim that "Apple computers are the most
expensive computer brand?" What about 12 examples? What about if 37 out of 50 apple computers were more expensive
than comparable models from other brands? A simple way to avoid hasty generalizations is to add qualifiers like
“sometimes,” "maybe," "often," or "it seems to be the case that ... ".
(8) Red Herring (ignoratio elenchi)
A “red herring” is a distraction from the argument typically with some sentiment that seems to be relevant but
isn’t really on-topic. This tactic is common when someone doesn’t like the current topic and wants to detour into
something else instead, something easier or safer to address.
(9) Tu Quoque Fallacy
The “tu quoque,” Latin for “you too,” is also called the “appeal to hypocrisy” because it distracts from the argument
by pointing out hypocrisy in the opponent. If Jack says, “Maybe I committed a little adultery, but so did you Jason!”
Jack is trying to diminish his responsibility or defend his actions by distributing blame to other people. But no one
else’s guilt excuses his own guilt. No matter who else is guilty, Jack is still an adulterer.
(10) Causal Fallacy
The Causal Fallacy is any logical breakdown when identifying a cause. You can think of the Causal Fallacy as a parent
category for several different fallacies about unproven causes.
i). One causal fallacy is the False Cause or non causa pro causa ("not the-cause for a cause") fallacy, which is
when you conclude about a cause without enough evidence to do so. Consider, for example, “Since your parents
named you ‘Harvest,’ they must be farmers.”
ii). Another causal fallacy is the Post Hoc fallacy. This fallacy happens when you mistake something for the
cause just because it came first. “Yesterday, I walked under a ladder with an open umbrella indoors while
spilling salt in front of a black cat. And I forgot to knock on wood with my lucky dice. That must be why I’m
having such a bad day today. It’s bad luck.”
iii). Another kind of causal fallacy is the correlational fallacy. This fallacy happens when you mistakenly
interpret two things found together as being causally related. Two things may correlate without a causal relation,
or they may have some third factor causing both of them to occur. Or perhaps both things just, coincidentally,
happened together. Consider for example, “Every time Joe goes swimming he is wearing his Speedos. Something
about wearing that Speedo must make him want to go swimming.”
(11) Fallacy of Sunk Costs
Sometimes we invest ourselves so thoroughly in a project that we’re reluctant to ever abandon it, even when it turns
out to be fruitless and futile. It’s natural, and usually not a fallacy to want to carry on with something we find
important, not least because of all the resources we’ve put into it. However, this kind of thinking becomes a fallacy
when we start to think that we should continue with a task or project because of all that we’ve put into it, without
considering the future costs we’re likely to incur by doing so. There may be a sense of accomplishment when finishing,
and the project might have other values, but it’s not enough to justify the cost invested in it.
(12) Appeal to Authority (argumentum ad verecundiam)
This fallacy happens when we misuse an authority. This misuse of authority can occur in a number of ways. We can cite
only authorities — steering conveniently away from other testable and concrete evidence as if expert opinion is always
correct. Or we can cite irrelevant authorities, poor authorities, or false authorities. Suppose someone says, “I buy
Fruit of the Loom™ underwear because Michael Jordan says it’s the best.” But Michael Jordan isn’t a relevant authority
when it comes to underwear. This is a fallacy of irrelevant authority. There’s another problem with relying too
heavily on authorities. Even the authorities can be wrong sometimes.
(13) Equivocation (ambiguity)
Equivocation happens when a word, phrase, or sentence is used deliberately to confuse, deceive, or mislead by
sounding like it’s saying one thing but actually saying something else. For example, a euphemism might be replacing
"lying" with the phrase "creative license," or replacing my "criminal background" with my "youthful indiscretions,"
or replacing "fired from my job" with "early retirement."
(14) Appeal to Pity (argumentum ad misericordiam)
It is a fallacy of relevance. Personal attacks, and emotional appeals, aren’t strictly relevant to whether something
is true or false. In this case, the fallacy appeals to the compassion and emotional sensitivity of others when these
factors are not strictly relevant to the argument. Appeals to pity often appear as emotional manipulation. For example,
“How can you eat that innocent little carrot? He was plucked from his home in the ground at a young age, and violently
skinned, chemically treated, and packaged, and shipped to your local grocer and now you are going to eat him into
oblivion when he did nothing to you. You really should reconsider what you put into your body.”
To be fair, emotions can sometimes be relevant. Often, the emotional aspect is a key insight into whether something
is morally repugnant or praiseworthy, or whether a governmental policy will be winsome or repulsive. People’s feelings
about something can be critically important data when planning a campaign, advertising a product, or rallying a group
together for a charitable cause. But it becomes a fallacious appeal to pity when the emotions are used in
substitution for facts or as a distraction from the facts of the matter.
(15) Bandwagon Fallacy
The bandwagon fallacy assumes something is true (or right, or good) because other people agree with it. The form of
this argument often looks like this: “Many people do or think X, so you ought to do or think X too.” One problem with
this kind of reasoning is that the broad acceptance of some claim or action is not always a good indication that the
acceptance is justified. People can be mistaken, confused, deceived, or even willfully irrational. And when people
act together, sometimes they become even more foolish — i.e., “mob mentality.”