Skip to content

Instantly share code, notes, and snippets.

@vsoch
Last active April 24, 2019 22:31
Show Gist options
  • Save vsoch/3ed5d1dab871e72f3342080456fbb353 to your computer and use it in GitHub Desktop.
Save vsoch/3ed5d1dab871e72f3342080456fbb353 to your computer and use it in GitHub Desktop.
Singularity package investigation (again) reveals that self.stage.source_path is not reliable

This is a debugging session for spack. The issue is that the self.stage.source_path changes (unexpectedly) in the middle of do_stage, and it's not clear why this happens. I'm at the HEAD of the repo, just pulled the fix to remove the empty line:

commit b92fe5f248892069674de2f090e4da879b5c83d4 (HEAD -> update/package-singularity, origin/update/package-singularity)
Author: Vanessa Sochat <[email protected]>
Date:   Wed Apr 24 09:58:08 2019 -0400

    removing blank link
    
    Signed-off-by: Vanessa Sochat <[email protected]>

Observations

Before I jump in, here are some things to point out.

  • For the MakefilePackage, whenever we call a use of self.build_directory it returns self.stage.source_path, but it's also a property, so (I think) this would mean both act as functions (and the self.path.stage_path could be changed silently. If you look at the stages (install, check, etc.) the fact that they all use this build directory could be an issue. Is the variable supposed to change?
  • And it follows that whenever we call self.stage.source_path as I mentioned yesterday, it's going to change! I haven't used property decorators a ton, but this makes me uncomfortable because it's not clear when it's going to decide to change. It assumes that either I am aware of my filesystem not changing, or I'm aware of my filesystem changing (and I want the variable to change with it).

I think the issue comes down to using these @property dectorators, and thus having a variable that changes silently.

Test

I want to make sure the stage is actually cleaned (these are the places that I know of)

$ ls -l var/spack/stage/
singularity-develop-7gbfecp3wbbrr7z3crhv3vm3tnbgjwvh
spack-stage-fxis13he
spack-stage-ej9jzy21

$ bin/spack clean -a
==> Removing all temporary build stages
==> Removing cached downloads
==> Removing cached information on repositories
==> Removing python cache files
$ ls var/spack/stage/
$ ls /tmp/vanessa/spack-stage/

First I'll also set a breakpoint at the first line of do_stage:

     # Unpack the tarball as usual, then move the src dir into
     # its home within GOPATH.
     def do_stage(self, mirror_only=False):
         import pdb; pdb.set_trace()
         super(Singularity, self).do_stage(mirror_only)
         source_path = self.stage.source_path
         if not os.path.exists(self.singularity_gopath_dir):

And uninstall, install

$ bin/spack uninstall -a -y singularity
$ bin/spack install [email protected]

Nothing is defined yet:

==> No binary for singularity found: installing from source
> /tmp/spack/var/spack/repos/builtin/packages/singularity/package.py(56)do_stage()
-> source_path = self.stage.source_path
(Pdb) p self.stage.source_path
None
(Pdb) p self.path
None
(Pdb) p self
<spack.pkg.builtin.singularity.Singularity object at 0x7fe886ee4550>

I think we are here given the first message about installing from source.

The stage is also happening in /tmp (and we verified this was cleaned out before starting)

(Pdb) self.stage.path
'/tmp/spack/var/spack/stage/singularity-3.1.1-3n7skikliv4pd6olbouxjtjsdl2j4vla'
(Pdb) 

Can also verify nothing is currently there! I think the client entrypoint calls do_install first so that we are at the end of this function. I'm not totally clear where the "spec" get's instantiated. But I can take a step and then we see the download start up.

(Pdb) n       
==> Fetching https://github.com/sylabs/singularity/releases/download/v3.1.1/singularity-3.1.1.tar.gz
######################################################################### 100.0%
==> Staging archive: /tmp/spack/var/spack/stage/singularity-3.1.1-3n7skikliv4pd6olbouxjtjsdl2j4vla/singularity-3.1.1.tar.gz
==> Created stage in /tmp/spack/var/spack/stage/singularity-3.1.1-3n7skikliv4pd6olbouxjtjsdl2j4vla
> /tmp/spack/var/spack/repos/builtin/packages/singularity/package.py(57)do_stage()
-> if not os.path.exists(self.singularity_gopath_dir):

And /tmp is now populated with the extracted package:

$ ls /tmp/spack/var/spack/stage/singularity-3.1.1-3n7skikliv4pd6olbouxjtjsdl2j4vla
singularity  singularity-3.1.1.tar.gz

This is now correct, because it's using the first entry of os.listdir that is a folder.

(Pdb) self.stage.source_path
'/tmp/spack/var/spack/stage/singularity-3.1.1-3n7skikliv4pd6olbouxjtjsdl2j4vla/singularity'

That makes sense, it's the first/only subdir of the staging dir. All the expected stuff is in singularity

$ ls /tmp/spack/var/spack/stage/singularity-3.1.1-3n7skikliv4pd6olbouxjtjsdl2j4vla/singularity
CHANGELOG.md     dist        Gopkg.toml       makeit     scripts
cmd              docs        INSTALL.md       mconfig    singularity.spec
CONTRIBUTING.md  etc         internal         mlocal     vendor
CONTRIBUTORS.md  examples    LICENSE-LBNL.md  pkg        VERSION
COPYRIGHT.md     Gopkg.lock  LICENSE.md       README.md
(Pdb) n
> /home/ghartzell/spack/var/spack/repos/builtin/packages/singularity/package.py(61)do_stage()
-> mkdirp(self.sylabs_gopath_dir)
(Pdb) p self.stage.source_path
'/home/ghartzell/spack/var/spack/stage/singularity-3.1.1-kuf7bim7pigb6flgtwcbsimkxietlwpr/singularity'
(Pdb)

And here is the bug again - after the shutil call, it changes to src.

-> tty.debug("Moving {0} to {1}".format(
(Pdb) self.stage.source_path
'/tmp/spack/var/spack/stage/singularity-3.1.1-3n7skikliv4pd6olbouxjtjsdl2j4vla/singularity'
(Pdb) n
> /tmp/spack/var/spack/repos/builtin/packages/singularity/package.py(59)do_stage()
-> self.stage.source_path, self.singularity_gopath_dir))
(Pdb) self.stage.source_path
'/tmp/spack/var/spack/stage/singularity-3.1.1-3n7skikliv4pd6olbouxjtjsdl2j4vla/singularity'
(Pdb) n
> /tmp/spack/var/spack/repos/builtin/packages/singularity/package.py(60)do_stage()
-> mkdirp(self.sylabs_gopath_dir)
(Pdb) self.stage.source_path
'/tmp/spack/var/spack/stage/singularity-3.1.1-3n7skikliv4pd6olbouxjtjsdl2j4vla/singularity'
(Pdb) n
> /tmp/spack/var/spack/repos/builtin/packages/singularity/package.py(61)do_stage()
-> shutil.move(source_path,
(Pdb) self.stage.source_path
'/tmp/spack/var/spack/stage/singularity-3.1.1-3n7skikliv4pd6olbouxjtjsdl2j4vla/singularity'
(Pdb) n
> /tmp/spack/var/spack/repos/builtin/packages/singularity/package.py(62)do_stage()
-> self.singularity_gopath_dir)
(Pdb) self.stage.source_path
'/tmp/spack/var/spack/stage/singularity-3.1.1-3n7skikliv4pd6olbouxjtjsdl2j4vla/src'

Any call to self.stage.source_path is going to run this function It seems logical to me - we can look in the /tmp folder and now see that (since we mkdir -p the path starting with src) it's now there:

$ ls /tmp/spack/var/spack/stage/singularity-3.1.1-3n7skikliv4pd6olbouxjtjsdl2j4vla/
singularity  singularity-3.1.1.tar.gz  src

Let's be silly and open up python in the directory and run the function. First I want to check the fetcher variable - it's actually the kind that would trigger this if statement:

        if isinstance(self.fetcher, fs.URLFetchStrategy):
            if not self.fetcher.expand_archive:
                return self.path

However, it's in a list! So the entire thing is skipped over (is this intentional, perhaps it's done after the url is fetched so it doesn't trigger the statement and get fetched again?)

(Pdb) p self.fetcher[0]
URLFetchStrategy<https://github.com/sylabs/singularity/releases/download/v3.1.1/singularity-3.1.1.tar.gz>
(Pdb) p self.fetcher[0].expand_archive
True

And here is the bug. Knowing that this function returns the first directory, we cannot knowingly create a new one (src) and not expect to be called. The differences that we see between hosts machines are likely do to differences in the ordering of the listing of files.

In [1]: import os

In [2]: path = os.getcwd()

In [3]: for p in [os.path.join(path, f) for f in os.listdir(path)]:
   ...:             if os.path.isdir(p):
   ...:                 print(p)
   ...:                 
/tmp/vanessa/spack-stage/spack-stage-d56my3dl/src
/tmp/vanessa/spack-stage/spack-stage-d56my3dl/singularity

And this is why my fix to grab the original folder works. I don't want to comment on the design of the software, but for this pull request we can't rely on this changing variable.

@vsoch
Copy link
Author

vsoch commented Apr 24, 2019

Note that (at least in the documentation os.listdir is said to return things in "arbitrary order."

Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order, and does not include the special entries '.' and '..' even if they are present in the directory.

@hartzell
Copy link

It seems to me that this is functioning as it should and as it is designed.

After the tarball is unpacked, the staging directory contains two things: the source tarball (singularity-3.1.1.tar.gz) and the unpacked source directory (singularity). It is the right thing for self.stage.source_path to be /.../singularity, that's the path of the source directory, after all.

Then, we move/rename the source directory, from singularity to src. There are now two things in the staging directory: the source tarball (singularity-3.1.1.tar.gz) and the now renamed source directory (src.). At this point the right thing for self.stage.source_path to return is /.../src, because that's now the path of the source directory.

The only time that you should ever see three things in that directory is if you peek into it while the shutil.move call is running. This could happen if, for instance, you have the program stopped in a debugger at just the right/wrong point. It could also happen if this were a multithreaded/multiprocessing/parallel application, but it's not.

Before the move, self.stage.source_path == 'singularity'. After the move, self.stage.source_path == 'src'.

Back in the PR thread, in this comment, you got this error:

  File "/home/vanessa/anaconda3/lib/python3.6/shutil.py", line 544, in move
    os.rename(src, real_dst)
OSError: [Errno 22] Invalid argument: '/home/vanessa/Documents/Dropbox/Code/Python/spack/var/spack/stage/singularity-develop-7gbfecp3wbbrr7z3crhv3vm3tnbgjwvh/src' -> '/home/vanessa/Documents/Dropbox/Code/Python/spack/var/spack/stage/singularity-develop-7gbfecp3wbbrr7z3crhv3vm3tnbgjwvh/src/github.com/sylabs/singularity'

That's really weird, because the src of the rename (which is the first argument to shutil.move(src,dst)) is /home/vanessa/Documents/Dropbox/Code/Python/spack/var/spack/stage/singularity-develop-7gbfecp3wbbrr7z3crhv3vm3tnbgjwvh/src.

At first blush, this suggests that the call was

shutil.move('/home/vane...gjwvh/src', '/home/van...gjwvh/src/github.com/sylabs/singularity')

BUT, there's no way that there should be a /home/van.../src directory before the shutil.move.

That's the thing that I can't figure out, how did self.stage.source_path become /.../src before the move. It seems like there must have been a directory in the staging dir named src, before the move. That's odd, we only expect the tarball the it's unpacked dir.

We know that the previous generation of the code was creating a symlink there named src, which Dropbox was unwinding, so that might have been the problem once or twice, but after you really truly "Beetlejuice, Beetlejuice, Beetlejuice" cleaned up, that shouldn't have been a problem.

Skimming back through the thread, I don't see that reported other places.

Perhaps that's a red herring error because a clean didn't complete or Dropbox didn't finish synching or .....


There's also this problem

==> Error: ProcessError: ./mconfig: No such file or directory
    Command: './mconfig' '--prefix=/home/peanutbutter/spack/opt/spack/linux-ubuntu16.04-x86_64/gcc-5.4.0/singularity-3.1.1-3n7skikliv4pd6olbouxjtjsdl2j4vla'
See build log for details:

It looks like you've cloned a copy of the Spack tree into /tmp. If you go back to the simple, don't-grab-copy-of-source-path version, e.g.

     # Unpack the tarball as usual, then move the src dir into
     # its home within GOPATH.
     def do_stage(self, mirror_only=False):
         super(Singularity, self).do_stage(mirror_only)
         source_path = self.stage.source_path
         if not os.path.exists(self.singularity_gopath_dir):
             tty.debug("Moving {0} to {1}".format(
                 self.stage.source_path, self.singularity_gopath_dir))
             mkdirp(self.sylabs_gopath_dir)
             shutil.move(source_path,
                         self.singularity_gopath_dir)

and do a clean and then run the build in /tmp, which error to you hit?

@vsoch
Copy link
Author

vsoch commented Apr 24, 2019

This wasn’t done on Dropbox. I moved the install location to remove that variable.

@vsoch
Copy link
Author

vsoch commented Apr 24, 2019

You are missing the detail that the directory was changed to src not because it exists, but because we call a variable (the property decorator) that changes the value at the same time as the function call because we need the directory there to copy things to.

I’m not going to argue anymore for my approach, this is clearly some kind of race condition and regardless of what should or shouldn’t happening this is what is happening and it’s not a good use of time to consistently tell me it shouldn’t be happening. It happened, I fixed this issue, and if you want to explore the flawed design for another issue I’d be glad to continue discussion there. It’s out of scope for this PR which is working for all cases.

@hartzell
Copy link

Note that (at least in the documentation os.listdir is said to return things in "arbitrary order."

Yep, I mentioned that at the top of my "step-by-step.md" gist.

But that doesn't matter, because except for when shutil.move is in progress, there should never be more than two things in the staging directory and only one of them is a directory.

And worse, the src directory should never exist before the call to shutil.move (which creates it), so how it apparently ended up as an argument to shutil.move has me gobsmacked.

@vsoch
Copy link
Author

vsoch commented Apr 24, 2019

Why are you surprised that it exists? It's manually created, of course it's there:

mkdirp(self.sylabs_gopath_dir)

The problem is that is called before we reference the variable again to update the property. It get's updated to that. There's no mystery. I'm not gobsmacked. Having all of these as properties, with the base one that changes, is a mess:

    @property
    def gopath(self):
        return join_path(self.stage.path)

    @property
    def sylabs_gopath_dir(self):
        return join_path(self.gopath, 'src/github.com/sylabs/')

    @property
    def singularity_gopath_dir(self):
        return join_path(self.sylabs_gopath_dir, 'singularity')

Again, we just walked through the steps and we saw there were more than two directories there. It doesn't matter what should be there, this is the result.

The design flaw is using these properties that update one another silently leading to an almost to impossible thing. My solution is good because it doesn't use them. I'm happy to work on this on another issue / PR with you, but I'm done with this PR unless there is some other issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment