The situation: we record a podcast over Skype. We each have a USB audio interface (a Behringer U-Phoria UMC202 and a UMC404, respectively). Each of us records their own audio input, and we mix the two recordings later, aligning them by adjusting the delay manually. Skype is used only for communication and is not involved in the recording.
There may be a way to do this with PulseAudio alone, without Ardour. But since Ardour is already part of our workflow, we prefer this approach, even if it seems needlessly complicated.
Tested on Arch Linux; all program names below refer to Arch packages.