This is a debugging session for spack. The issue is that the self.stage.source_path changes (unexpectedly) in the middle of do_stage, and it's not clear why this happens. I'm at the HEAD of the repo, just pulled the fix to remove the empty line:
commit b92fe5f248892069674de2f090e4da879b5c83d4 (HEAD -> update/package-singularity, origin/update/package-singularity)
Author: Vanessa Sochat <[email protected]>
Date: Wed Apr 24 09:58:08 2019 -0400
removing blank link
Signed-off-by: Vanessa Sochat <[email protected]>
Before I jump in, here are some things to point out.
- For the MakefilePackage, whenever we call a use of self.build_directory it returns self.stage.source_path, but it's also a property, so (I think) this would mean both act as functions (and the self.path.stage_path could be changed silently. If you look at the stages (install, check, etc.) the fact that they all use this build directory could be an issue. Is the variable supposed to change?
- And it follows that whenever we call self.stage.source_path as I mentioned yesterday, it's going to change! I haven't used property decorators a ton, but this makes me uncomfortable because it's not clear when it's going to decide to change. It assumes that either I am aware of my filesystem not changing, or I'm aware of my filesystem changing (and I want the variable to change with it).
I think the issue comes down to using these @property dectorators, and thus having a variable that changes silently.
I want to make sure the stage is actually cleaned (these are the places that I know of)
$ ls -l var/spack/stage/
singularity-develop-7gbfecp3wbbrr7z3crhv3vm3tnbgjwvh
spack-stage-fxis13he
spack-stage-ej9jzy21
$ bin/spack clean -a
==> Removing all temporary build stages
==> Removing cached downloads
==> Removing cached information on repositories
==> Removing python cache files
$ ls var/spack/stage/
$ ls /tmp/vanessa/spack-stage/
First I'll also set a breakpoint at the first line of do_stage
:
# Unpack the tarball as usual, then move the src dir into
# its home within GOPATH.
def do_stage(self, mirror_only=False):
import pdb; pdb.set_trace()
super(Singularity, self).do_stage(mirror_only)
source_path = self.stage.source_path
if not os.path.exists(self.singularity_gopath_dir):
And uninstall, install
$ bin/spack uninstall -a -y singularity
$ bin/spack install [email protected]
Nothing is defined yet:
==> No binary for singularity found: installing from source
> /tmp/spack/var/spack/repos/builtin/packages/singularity/package.py(56)do_stage()
-> source_path = self.stage.source_path
(Pdb) p self.stage.source_path
None
(Pdb) p self.path
None
(Pdb) p self
<spack.pkg.builtin.singularity.Singularity object at 0x7fe886ee4550>
I think we are here given the first message about installing from source.
The stage is also happening in /tmp (and we verified this was cleaned out before starting)
(Pdb) self.stage.path
'/tmp/spack/var/spack/stage/singularity-3.1.1-3n7skikliv4pd6olbouxjtjsdl2j4vla'
(Pdb)
Can also verify nothing is currently there! I think the client entrypoint calls do_install first so that we are at the end of this function. I'm not totally clear where the "spec" get's instantiated. But I can take a step and then we see the download start up.
(Pdb) n
==> Fetching https://github.com/sylabs/singularity/releases/download/v3.1.1/singularity-3.1.1.tar.gz
######################################################################### 100.0%
==> Staging archive: /tmp/spack/var/spack/stage/singularity-3.1.1-3n7skikliv4pd6olbouxjtjsdl2j4vla/singularity-3.1.1.tar.gz
==> Created stage in /tmp/spack/var/spack/stage/singularity-3.1.1-3n7skikliv4pd6olbouxjtjsdl2j4vla
> /tmp/spack/var/spack/repos/builtin/packages/singularity/package.py(57)do_stage()
-> if not os.path.exists(self.singularity_gopath_dir):
And /tmp is now populated with the extracted package:
$ ls /tmp/spack/var/spack/stage/singularity-3.1.1-3n7skikliv4pd6olbouxjtjsdl2j4vla
singularity singularity-3.1.1.tar.gz
This is now correct, because it's using the first entry of os.listdir that is a folder.
(Pdb) self.stage.source_path
'/tmp/spack/var/spack/stage/singularity-3.1.1-3n7skikliv4pd6olbouxjtjsdl2j4vla/singularity'
That makes sense, it's the first/only subdir of the staging dir. All the expected stuff
is in singularity
$ ls /tmp/spack/var/spack/stage/singularity-3.1.1-3n7skikliv4pd6olbouxjtjsdl2j4vla/singularity
CHANGELOG.md dist Gopkg.toml makeit scripts
cmd docs INSTALL.md mconfig singularity.spec
CONTRIBUTING.md etc internal mlocal vendor
CONTRIBUTORS.md examples LICENSE-LBNL.md pkg VERSION
COPYRIGHT.md Gopkg.lock LICENSE.md README.md
(Pdb) n
> /home/ghartzell/spack/var/spack/repos/builtin/packages/singularity/package.py(61)do_stage()
-> mkdirp(self.sylabs_gopath_dir)
(Pdb) p self.stage.source_path
'/home/ghartzell/spack/var/spack/stage/singularity-3.1.1-kuf7bim7pigb6flgtwcbsimkxietlwpr/singularity'
(Pdb)
And here is the bug again - after the shutil call, it changes to src.
-> tty.debug("Moving {0} to {1}".format(
(Pdb) self.stage.source_path
'/tmp/spack/var/spack/stage/singularity-3.1.1-3n7skikliv4pd6olbouxjtjsdl2j4vla/singularity'
(Pdb) n
> /tmp/spack/var/spack/repos/builtin/packages/singularity/package.py(59)do_stage()
-> self.stage.source_path, self.singularity_gopath_dir))
(Pdb) self.stage.source_path
'/tmp/spack/var/spack/stage/singularity-3.1.1-3n7skikliv4pd6olbouxjtjsdl2j4vla/singularity'
(Pdb) n
> /tmp/spack/var/spack/repos/builtin/packages/singularity/package.py(60)do_stage()
-> mkdirp(self.sylabs_gopath_dir)
(Pdb) self.stage.source_path
'/tmp/spack/var/spack/stage/singularity-3.1.1-3n7skikliv4pd6olbouxjtjsdl2j4vla/singularity'
(Pdb) n
> /tmp/spack/var/spack/repos/builtin/packages/singularity/package.py(61)do_stage()
-> shutil.move(source_path,
(Pdb) self.stage.source_path
'/tmp/spack/var/spack/stage/singularity-3.1.1-3n7skikliv4pd6olbouxjtjsdl2j4vla/singularity'
(Pdb) n
> /tmp/spack/var/spack/repos/builtin/packages/singularity/package.py(62)do_stage()
-> self.singularity_gopath_dir)
(Pdb) self.stage.source_path
'/tmp/spack/var/spack/stage/singularity-3.1.1-3n7skikliv4pd6olbouxjtjsdl2j4vla/src'
Any call to self.stage.source_path is going to run this function It seems logical to me - we can look in the /tmp folder and now see that (since we mkdir -p the path starting with src) it's now there:
$ ls /tmp/spack/var/spack/stage/singularity-3.1.1-3n7skikliv4pd6olbouxjtjsdl2j4vla/
singularity singularity-3.1.1.tar.gz src
Let's be silly and open up python in the directory and run the function. First I want to check the fetcher variable - it's actually the kind that would trigger this if statement:
if isinstance(self.fetcher, fs.URLFetchStrategy):
if not self.fetcher.expand_archive:
return self.path
However, it's in a list! So the entire thing is skipped over (is this intentional, perhaps it's done after the url is fetched so it doesn't trigger the statement and get fetched again?)
(Pdb) p self.fetcher[0]
URLFetchStrategy<https://github.com/sylabs/singularity/releases/download/v3.1.1/singularity-3.1.1.tar.gz>
(Pdb) p self.fetcher[0].expand_archive
True
And here is the bug. Knowing that this function returns the first directory, we cannot knowingly create a new one (src) and not expect to be called. The differences that we see between hosts machines are likely do to differences in the ordering of the listing of files.
In [1]: import os
In [2]: path = os.getcwd()
In [3]: for p in [os.path.join(path, f) for f in os.listdir(path)]:
...: if os.path.isdir(p):
...: print(p)
...:
/tmp/vanessa/spack-stage/spack-stage-d56my3dl/src
/tmp/vanessa/spack-stage/spack-stage-d56my3dl/singularity
And this is why my fix to grab the original folder works. I don't want to comment on the design of the software, but for this pull request we can't rely on this changing variable.
It seems to me that this is functioning as it should and as it is designed.
After the tarball is unpacked, the staging directory contains two things: the source tarball (
singularity-3.1.1.tar.gz
) and the unpacked source directory (singularity
). It is the right thing forself.stage.source_path
to be/.../singularity
, that's the path of the source directory, after all.Then, we move/rename the source directory, from
singularity
tosrc
. There are now two things in the staging directory: the source tarball (singularity-3.1.1.tar.gz
) and the now renamed source directory (src.
). At this point the right thing forself.stage.source_path
to return is/.../src
, because that's now the path of the source directory.The only time that you should ever see three things in that directory is if you peek into it while the
shutil.move
call is running. This could happen if, for instance, you have the program stopped in a debugger at just the right/wrong point. It could also happen if this were a multithreaded/multiprocessing/parallel application, but it's not.Before the move,
self.stage.source_path == 'singularity'
. After the move,self.stage.source_path == 'src'
.Back in the PR thread, in this comment, you got this error:
That's really weird, because the src of the rename (which is the first argument to
shutil.move(src,dst)
) is/home/vanessa/Documents/Dropbox/Code/Python/spack/var/spack/stage/singularity-develop-7gbfecp3wbbrr7z3crhv3vm3tnbgjwvh/src
.At first blush, this suggests that the call was
BUT, there's no way that there should be a
/home/van.../src
directory before theshutil.move
.That's the thing that I can't figure out, how did
self.stage.source_path
become/.../src
before the move. It seems like there must have been a directory in the staging dir namedsrc
, before the move. That's odd, we only expect the tarball the it's unpacked dir.We know that the previous generation of the code was creating a symlink there named
src
, which Dropbox was unwinding, so that might have been the problem once or twice, but after you really truly "Beetlejuice, Beetlejuice, Beetlejuice" cleaned up, that shouldn't have been a problem.Skimming back through the thread, I don't see that reported other places.
Perhaps that's a red herring error because a clean didn't complete or Dropbox didn't finish synching or .....
There's also this problem
It looks like you've cloned a copy of the Spack tree into
/tmp
. If you go back to the simple, don't-grab-copy-of-source-path version, e.g.and do a clean and then run the build in
/tmp
, which error to you hit?