Causation implies correlation, but not necessarily the other way around. Correlation can be a result of confounding variables (X and Y are correlated because some latent variable Z is causing both of them).
I think you'd enjoy reading Judea Pearl's "Causality". If you can get your hands on the book, reading just the first chapter should suffice.
I'll attempt a sweeping blurb but I can't guarantee its correctness. I'm probably going to screw up some statistical concepts. Also, I'm going to ignore sampling effects (which can cause spurious correlations and stuff).
Anyway, say you're observing a system. It has two variables: X and Y. You collect *observational* data and fit a statistical model to it. Say we find a correlation (dependence) between X and Y in the data.
With two variables there are a lot of possibilities here:
- X causes Y (X ⇨ Y)
- Y causes X (X ⇦ Y)
- Some unknown variable(s) Z causes X and Y (X ⇦ Z ⇨ Y)
- Some unknown variable(s) Z causes X and Y (X ⇦ Z ⇨ Y) and X causes Y (X ⇨ Y) (think of a triangle graph)
- Some unknown variable(s) Z causes X and Y (X ⇦ Z ⇨ Y) and Y causes X (X ⇦ Y) (think of a triangle graph)
- And so on (a quick simulation of the confounding case follows)
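To make the confounding case concrete, here's a minimal simulation sketch (plain Python with numpy; the linear relationships, coefficients, and Gaussian noise are all invented for illustration). Z drives both X and Y, there is no edge between X and Y, and yet they come out strongly correlated:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Confounder model: X <- Z -> Y, with no direct edge between X and Y.
Z = rng.normal(size=n)             # latent common cause
X = 2.0 * Z + rng.normal(size=n)   # X depends only on Z (plus noise)
Y = -1.5 * Z + rng.normal(size=n)  # Y depends only on Z (plus noise)

# Strongly (negatively) correlated, despite no X-Y edge.
print(np.corrcoef(X, Y)[0, 1])  # roughly -0.74
```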
Say we represent our observed data with a Bayesian network. A Bayesian network is a directed graphical model that can compactly represent a probability distribution. The structure of the edges determines which variables are independent and which are dependent (via d-separation). If there is an edge directly between two variables, though, those two variables are dependent/correlated. Since Bayesian networks have directed edges (with arrows), we'd like to be able to say X causes Y if there's an edge from X to Y.
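Here's a tiny discrete Bayesian network (the conditional probability table numbers are made up) showing both sides of that: with the structure X ⇦ Z ⇨ Y, X and Y are marginally dependent, but conditioning on Z d-separates them and the dependence vanishes:

```python
import itertools

# Tiny discrete Bayesian network X <- Z -> Y (all variables binary).
# CPT numbers are invented for illustration.
P_Z = {0: 0.5, 1: 0.5}
P_X_given_Z = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # P(X=x | Z=z)
P_Y_given_Z = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.1, 1: 0.9}}  # P(Y=y | Z=z)

# Joint distribution from the network factorization P(Z) P(X|Z) P(Y|Z).
joint = {(z, x, y): P_Z[z] * P_X_given_Z[z][x] * P_Y_given_Z[z][y]
         for z, x, y in itertools.product([0, 1], repeat=3)}

def marg(fixed):
    """Sum the joint over every variable not pinned down in `fixed`."""
    return sum(p for (z, x, y), p in joint.items()
               if all(dict(z=z, x=x, y=y)[k] == v for k, v in fixed.items()))

# Marginally, X and Y are dependent (the path through Z is open):
print(marg({'x': 1, 'y': 1}), "!=", marg({'x': 1}) * marg({'y': 1}))
# Conditioned on Z, they are independent (d-separation blocks the path):
pz = marg({'z': 0})
print(marg({'z': 0, 'x': 1, 'y': 1}) / pz, "==",
      (marg({'z': 0, 'x': 1}) / pz) * (marg({'z': 0, 'y': 1}) / pz))
```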
The problem is that the distribution of the data can be represented by any of the possible models: X ⇨ Y, X ⇦ Y, X ⇦ Z ⇨ Y, etc. Any of these will work for representing (fitting to) the *observed* data. So why do I keep italicizing *observed*? Because the inability to determine the correct causal model is a result of using *observational* data alone. In order to determine the correct causal model we need "interventional" data. We need data that is the result of a "do-intervention" (as Pearl likes to put it). Basically, you need experimental data.
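You can see this observational equivalence in miniature with a made-up joint distribution: Bayes' rule guarantees the same joint factorizes as P(X)P(Y|X) (the X ⇨ Y model) and as P(Y)P(X|Y) (the X ⇦ Y model), so the data alone can't prefer one arrow direction:

```python
# A made-up joint distribution over binary X and Y (our "observed data").
joint = {(0, 0): 0.30, (0, 1): 0.10, (1, 0): 0.20, (1, 1): 0.40}

# Marginals.
P_x = {x: joint[(x, 0)] + joint[(x, 1)] for x in (0, 1)}
P_y = {y: joint[(0, y)] + joint[(1, y)] for y in (0, 1)}

# Conditionals for each candidate causal direction (just Bayes' rule).
P_y_given_x = {(x, y): joint[(x, y)] / P_x[x] for (x, y) in joint}  # X -> Y
P_x_given_y = {(x, y): joint[(x, y)] / P_y[y] for (x, y) in joint}  # X <- Y

for (x, y) in joint:
    model_fwd = P_x[x] * P_y_given_x[(x, y)]  # P(X) P(Y|X)
    model_bwd = P_y[y] * P_x_given_y[(x, y)]  # P(Y) P(X|Y)
    assert abs(model_fwd - joint[(x, y)]) < 1e-12
    assert abs(model_bwd - joint[(x, y)]) < 1e-12
print("X -> Y and X <- Y fit the observed joint equally well")
```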
An experiment is us, the experimenters, intervening on the observed system. We stick our hands into the system, force a variable to take on some state/value (pendulum length in your example), and then record data. The resulting experimental data can be used to tease out the actual causal graph.
Do-interventions have a common-sense property. Say X causes Y (X ⇨ Y) but we perform a do-intervention on Y. We are forcing Y to a certain state/value, and as a result X no longer causes Y during our intervention, so we have effectively pruned every edge incoming to Y (X ⇨ Y becomes X and Y unconnected).
Why does this "pruning" of edges help us? The resulting graph after pruning an edge has its own constraints on the dependencies between variables. We can use this knowledge to test for causal relationships.
Now, let's back up a bit to when we just have *observational* data. Our model could be anything (X ⇨ Y, X ⇦ Z ⇨ Y, etc.), but experimental data lets us eliminate some models. Say our real model is X causes Y (X ⇨ Y). If we do an experiment where we perform a do-intervention on X (force X to a state/value), we expect the relationship to hold: a do-intervention only prunes edges incoming to the variable being intervened on, and X has no incoming edges here, so the X ⇨ Y edge survives and the correlation persists. However, if we do an experiment where we perform a do-intervention on Y, we expect the edge from X to Y to be pruned (we're forcing Y to a state/value), and as a result the variables should no longer be correlated/dependent.
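A quick simulation sketch of both experiments (again with invented linear mechanisms and Gaussian noise; "forcing" a variable just means assigning it at random instead of letting the system generate it):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# True (unknown to us) model: X -> Y. Coefficient is made up.
def observe():
    X = rng.normal(size=n)
    Y = 2.0 * X + rng.normal(size=n)
    return X, Y

def do_X():  # force X; X has no incoming edges, so nothing is pruned
    X = rng.normal(size=n)             # experimenter assigns X at random
    Y = 2.0 * X + rng.normal(size=n)   # Y still responds to X
    return X, Y

def do_Y():  # force Y; every edge *into* Y is pruned
    X = rng.normal(size=n)             # X generated as usual
    Y = rng.normal(size=n)             # Y set by us, ignoring X entirely
    return X, Y

for name, exp in [("observe", observe), ("do(X)", do_X), ("do(Y)", do_Y)]:
    X, Y = exp()
    print(f"{name}: corr = {np.corrcoef(X, Y)[0, 1]:+.2f}")
# observe and do(X) show a strong correlation; do(Y) shows roughly 0.
```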
Say we perform an experiment where we do-intervene on X, and we find that in the experiment the variables became decorrelated/independent. This eliminates the models where X causes Y, because that path from X to Y (indicating dependency) should have been preserved under an intervention on X. With that we might happily conclude that Y causes X instead. However, many other models are still valid given this experimental data, including the one where X and Y are both caused by Z. Even worse, Z is an unknown variable: it encapsulates everything we don't measure in our data, so we really can't tell whether Y truly causes X or whether there's a confounding variable. At that point you'd have to talk about controlling variables, making sure your interventions are clean, and so on.
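A sketch of that ambiguity (same invented setup as above): under do(X), the true model Y ⇨ X and the true model X ⇦ Z ⇨ Y both produce the same observation, a correlation of roughly zero, so this one experiment can't tell them apart:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

def do_X_when_Y_causes_X():   # true model Y -> X, we intervene on X
    Y = rng.normal(size=n)
    X = rng.normal(size=n)    # do(X) severs the Y -> X edge
    return X, Y

def do_X_when_Z_confounds():  # true model X <- Z -> Y, we intervene on X
    Z = rng.normal(size=n)
    X = rng.normal(size=n)    # do(X) severs the Z -> X edge
    Y = 1.5 * Z + rng.normal(size=n)
    return X, Y

for name, exp in [("Y -> X", do_X_when_Y_causes_X),
                  ("X <- Z -> Y", do_X_when_Z_confounds)]:
    X, Y = exp()
    print(f"under do(X), true model {name}: corr = {np.corrcoef(X, Y)[0, 1]:+.3f}")
# Both print roughly 0: the experiment can't distinguish the two models.
```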
One last thing I wanted to mention. You can think of three levels of expressiveness: statistical, causal, and counterfactual.
When you have a probability distribution as ascertained by observational data (which is most data) you have a model capable of "statistical expressiveness." Such a model can only contain information about correlation.
If you are able to perform experiments and "force" variables to a specific value (Pearl's do-intervention) then you can now obtain causal expressiveness. A "causal Bayesian network" is an example of a model with this expressiveness (Definition 1.3.1 in the book).
If you can obtain deterministic relations between variables (like Y being a function of X), then you obtain counterfactual expressiveness. With such a model you can answer counterfactual queries such as "what is the probability that Y would not have happened if we had forced X to not happen, given that X did happen and Y did happen?" It sounds crazy (what happens in this alternate world?), but it works (Chapter 7 covers it IIRC). A "structural equation model" is an example of a model with this expressiveness (such as in section 1.4.1).
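Here's a toy structural equation model to show the counterfactual machinery (the equations and observed values are invented). The three steps are the standard abduction/action/prediction recipe: infer the noise from what actually happened, force X to a different value, then recompute Y with the same noise:

```python
# Toy structural equation model (invented for illustration):
#   X = U_x
#   Y = 2 * X + U_y
# Suppose we actually observed X = 1 and Y = 3.

x_obs, y_obs = 1.0, 3.0

# Abduction: recover the noise term consistent with the observation.
u_y = y_obs - 2.0 * x_obs  # U_y = 1

# Action: in the counterfactual world, force X to 0 (a do-intervention).
x_cf = 0.0

# Prediction: rerun the mechanism for Y with the *same* noise.
y_cf = 2.0 * x_cf + u_y
print(f"had X been {x_cf}, Y would have been {y_cf}")  # -> 1.0
```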
Pearl also covers ways to determine causal quantities without having to perform experiments (assuming you accept the assumptions of your model). This is called identifiability and is covered by Pearl's do-calculus. Using counterfactual calculus you can even do the same thing for counterfactual quantities.
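The classic instance is back-door adjustment: if you observe the confounder Z and accept the graph X ⇦ Z ⇨ Y with X ⇨ Y, then P(Y | do(X=x)) = Σ_z P(Y | X=x, Z=z) P(Z=z), computable from observational data alone. A sketch with made-up numbers:

```python
# Back-door adjustment on an invented discrete model (Z confounds X and Y):
#   P(y | do(x)) = sum over z of P(y | x, z) * P(z)
# No experiment needed, provided Z is observed and the assumed graph is right.

P_Z = {0: 0.6, 1: 0.4}  # P(Z=z)
P_Y1_given_XZ = {       # P(Y=1 | X=x, Z=z), numbers invented
    (0, 0): 0.10, (0, 1): 0.50,
    (1, 0): 0.40, (1, 1): 0.80,
}

def p_y1_do_x(x):
    return sum(P_Y1_given_XZ[(x, z)] * P_Z[z] for z in P_Z)

print("P(Y=1 | do(X=0)) =", p_y1_do_x(0))  # 0.10*0.6 + 0.50*0.4 = 0.26
print("P(Y=1 | do(X=1)) =", p_y1_do_x(1))  # 0.40*0.6 + 0.80*0.4 = 0.56
```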
If you manage to get to Chapter 9, the concepts and notation are powerful enough to formally tackle questions like "why do we find striking a match to be a better causal explanation of a fire than the presence of oxygen, even though both are required for the fire to start?"
Anyway, I'm not really a statistics guy so I may have trampled all over proper statistics and experimental design. If I've made any mistakes I'll try to correct them. Hopefully this is helpful though!