Note: This is cross-posted from Reddit.
I've been trying to determine (a) whether it's possible to achieve something in Rust and (b) if so, how. I will abstract the problem as much as possible, since the details in my code are rather boring and not essential to describing the problem.
I have a program that extracts information from the combination of a Polars data frame and a paired file. This is for a genomics application; for those interested, the data frame contains the locations of sequence features (exons, transcripts, etc.) and the file contains the genome sequence (chromosome by chromosome). The genome is large, so we prefer not to load the whole thing into memory, and instead iterate over it chromosome by chromosome and then feature by feature, yielding the sequence of each feature one at a time.
It turns out it's relatively simple to write a "single sequence" iterator (i.e. an iterator that yields the sequences of all of the features of one chromosome). It looks something like this:
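```rust
struct ChrRowSeqIter<'a> {
    // iterators over the columns of the per-chromosome data frame
    iters: Vec<polars::series::SeriesIter<'a>>,
    // paired sequence record for this chromosome
    record: &'a Record,
}
```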
where the lifetime `'a` is the lifetime of the Polars data frame being referenced, and `record` is the paired sequence record for a specific chromosome. This iterator works fine.
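To make the "single sequence" iterator concrete without pulling in Polars, here is a simplified, self-contained sketch of the same idea. It is a stand-in, not my real code: the hypothetical `Feature` type replaces the data frame rows, a `&str` replaces `Record`, and a single `std::slice::Iter` replaces the `Vec` of `SeriesIter`s. The key property is the same as in the real version: the iterator borrows its inputs for `'a` and yields owned `String`s.

```rust
/// Hypothetical stand-in for one row of the per-chromosome features frame.
struct Feature {
    start: usize, // 0-based, inclusive
    end: usize,   // 0-based, exclusive
}

/// Simplified model of `ChrRowSeqIter`: borrows the features and the
/// sequence of a single chromosome for `'a`.
struct SimpleChrIter<'a> {
    features: std::slice::Iter<'a, Feature>,
    seq: &'a str,
}

impl<'a> SimpleChrIter<'a> {
    fn new(features: &'a [Feature], seq: &'a str) -> Self {
        SimpleChrIter { features: features.iter(), seq }
    }
}

impl<'a> Iterator for SimpleChrIter<'a> {
    // Items are owned Strings, so nothing yielded borrows from the iterator.
    type Item = String;

    fn next(&mut self) -> Option<String> {
        let f = self.features.next()?;
        Some(self.seq[f.start..f.end].to_string())
    }
}
```

For example, features `[0, 3)` and `[4, 6)` over `"ACGTACGT"` yield `"ACG"` and then `"AC"`.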
Now, the problem: I'd like to have an iterator that essentially chains together many of the above iterators transparently to yield, in turn, all features over the entire genome. So this would basically create a `ChrRowSeqIter` for chromosome 1, yield its entries, then move on to chromosome 2, etc. The problem, then, is that I will have to create a `ChrRowSeqIter` that references a data frame for chromosome 1, then chromosome 2, etc., and at the same time the sequence record for chromosome 1, then chromosome 2, etc.
The way things are currently structured in the program, this "outer" iterator takes ownership of a data frame, let's say `X`, from which we create the per-chromosome data frame that is borrowed out to each `ChrRowSeqIter`. The outer iterator should do the following:
* While the current `ChrRowSeqIter` has entries left, yield them one by one (note that each yielded entry is returned by move, so it's not lending out anything dependent on the iterator's own lifetime).
* When the current `ChrRowSeqIter` is exhausted, read the next chromosome from the file and prepare a `ChrRowSeqIter` with the features corresponding to this chromosome. This data frame is created from `X` and then borrowed out to the `ChrRowSeqIter`.
* When there are no more records to read from the file, return `None`.
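The control flow in the list above can be sketched as follows, under one big simplifying assumption that sidesteps exactly the part I'm stuck on: here the hypothetical inner iterator `OwnedChrIter` owns its per-chromosome features and sequence instead of borrowing a data frame created from `X`. The field `x` stands in for `X`, and a `VecDeque<Record>` stands in for reading chromosomes from the file. `Feature`, `Record`, and `OwnedChrIter` are illustrative names, not types from my program or from Polars.

```rust
use std::collections::VecDeque;

/// Hypothetical chromosome record: a name plus its full sequence.
struct Record {
    name: String,
    seq: String,
}

/// Hypothetical feature row: chromosome name plus a 0-based half-open interval.
struct Feature {
    chrom: String,
    start: usize,
    end: usize,
}

/// Simplified inner iterator. Unlike the real `ChrRowSeqIter`, it OWNS its
/// per-chromosome intervals and sequence, so it borrows nothing from `OuterIter`.
struct OwnedChrIter {
    features: std::vec::IntoIter<(usize, usize)>,
    seq: String,
}

impl Iterator for OwnedChrIter {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        let (start, end) = self.features.next()?;
        Some(self.seq[start..end].to_string())
    }
}

/// Outer iterator: owns all features (`x`, standing in for `X`) and a queue
/// of records (standing in for reading the file chromosome by chromosome).
struct OuterIter {
    x: Vec<Feature>,
    records: VecDeque<Record>,
    current: Option<OwnedChrIter>,
}

impl Iterator for OuterIter {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        loop {
            // Step 1: yield from the current chromosome while it has entries.
            if let Some(item) = self.current.as_mut().and_then(|it| it.next()) {
                return Some(item);
            }
            // Step 3: no more records, so the whole iteration is finished.
            let record = self.records.pop_front()?;
            // Step 2: build the next chromosome's intervals from `x`.
            let features: Vec<(usize, usize)> = self
                .x
                .iter()
                .filter(|f| f.chrom == record.name)
                .map(|f| (f.start, f.end))
                .collect();
            self.current = Some(OwnedChrIter {
                features: features.into_iter(),
                seq: record.seq,
            });
        }
    }
}
```

Because nothing stored in `current` borrows from `self`, this version is accepted by the borrow checker; the difficulty in my real code comes from `ChrRowSeqIter` holding references into a data frame that `OuterIter` owns. The `loop` also skips chromosomes that have no features.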
For the life of me, I cannot build such an iterator. The fundamental challenge seems to be that `ChrRowSeqIter` refers to a data frame created from `X`, but `X` is owned by the outer iterator. So I guess this is a form of self-referential struct. When I implement `next()`, I get something like the following:
```rust
impl<'a> Iterator for OuterIter<'a> {
    // type Item = ...; (elided)
    fn next(&mut self) -> Option<Self::Item> {
        // ... stuff
        // the compiler rejects this assignment:
        self.next_chr_row_seq_iter = Some( ChrRowSeqIter(&self.X, &self.next_chromosome) );
    }
}
```
At that inner assignment (of `self.next_chr_row_seq_iter`), the compiler complains that `'1` must outlive `'a`, where `'1` is the lifetime of the `&mut self` borrow. I can't figure out how to shake this.
Strangely, if I don't attempt to implement the actual `Iterator` trait, I can convince the compiler with the following:
```rust
impl<'a> OuterIter<'a> {
    fn next_entry<'b>(&'b mut self) -> Option<Self::Item>
    where
        'b: 'a,
    {
        // ... stuff
        self.next_chr_row_seq_iter = Some( ChrRowSeqIter(&self.X, &self.next_chromosome) );
    }
}
```
This tells the compiler that the assignment is safe because the borrow of `self` lasts at least as long as `'a`, the lifetime of the thing we are creating inside `self` (which is true, because `self` owns `X`). Of course, the signature of `next()` in `Iterator` doesn't permit adding a `where 'b: 'a` bound to `fn next<'b>(&'b mut self)`, so this isn't an option for implementing the real `Iterator` trait.
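To check my understanding of why the `where 'b: 'a` version is accepted, here is a minimal sketch of the same pattern with a hypothetical `Holder` type (not from my program). Since `self` has type `&'b mut Holder<'a>`, and `Holder<'a>` must itself outlive `'b`, the extra bound forces `'b` and `'a` to be the same lifetime, so `self` stays mutably borrowed for all of `'a`.

```rust
/// Hypothetical struct that owns some data and stores a borrow of it.
struct Holder<'a> {
    data: Vec<u8>,
    view: Option<&'a [u8]>,
}

impl<'a> Holder<'a> {
    // Same shape as `next_entry<'b>(&'b mut self) where 'b: 'a`:
    // the bound means `self` is mutably borrowed for the whole of `'a`.
    fn fill<'b>(&'b mut self) -> usize
    where
        'b: 'a,
    {
        // Borrow our own field and store it in another field of `self`.
        self.view = Some(&self.data[..]);
        self.view.map_or(0, |v| v.len())
    }
}
```

As far as I can tell, this means `fill` can be called once, after which the `Holder` cannot be used again (not even to read `view`), because that mutable borrow never ends while `'a` is alive.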
I've been using Rust for a while now, and have usually been able to reorganize my code to satisfy the borrow checker. This one really threw me for a loop, though, because it's more "library"-type code than the "application"-type code I normally write, and I've been unable to figure it out. I'd really appreciate feedback, input, and suggestions from the community on the best path forward. Is something like this simply impossible (in which case, how could one reorganize the code to allow something like an `OuterIter`)? If not, what must I do to convince the borrow checker to allow what I am trying to do? Thanks!