Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: Make LongSequence fixed-length? #329

Open
jakobnissen opened this issue Dec 22, 2024 · 3 comments
Open

Discussion: Make LongSequence fixed-length? #329

jakobnissen opened this issue Dec 22, 2024 · 3 comments
Labels

Comments

@jakobnissen
Copy link
Member

This is obviously a breaking change, so this is for the far future if it will ever happen.

Currently, LongSequence is resizable: They support operations like:

julia> seq = dna"TAG";

julia> push!(seq, DNA_A)
4nt DNA Sequence:
TAGA

julia> append!(seq, rna"UAGA")
8nt DNA Sequence:
TAGATAGA

My proposal is to remove all methods on LongSequence that changes their size. This includes: resize!, pop!, popfirst!, push!, pushfirst!, filter!, deleteat!, and resizing during copy!.

Disadvantages of proposal

The disadvantages are obvious: Some users may want to do these operations. With my proposed suggestion, users would instead need to use immutable operations, just like if they were working with strings.
But: How often do people actually use these operations? I would guess that they are not used very often.

Advantage

It would allow us to change the storage of LongSequence from:

mutable struct OldSeq
    const data::Vector{UInt64}
    len::UInt
end

struct NewSeq
    data::Memory{UInt64}
    len::UInt
end

That is, it would allow making LongSequence a struct instead of a mutable struct, and it would also allow them to use Memory instead of Vector as backing storage. This saves two memory indirections.
More importantly, these indirections may inhibit memory optimisations such as stack-allocating some LongSequence, allocation hoisting, such. Such optimisations will be easier for Julia to do on an immutable struct containing Memory compared to a Memory hiding behind to mutable references.

At the moment, Julia doesn't really have any substantial memory optimisations so at the moment, there won't be much advantage to it.

@cjprybol
Copy link
Contributor

At the moment, Julia doesn't really have any substantial memory optimisations so at the moment, there won't be much advantage to it.

If and when this stops being true, I'd be excited about having these changes and helping out where possible.

I personally never use the resizing and conceptually think of BioSequences like bio-flavored Strings with the appropriate validation checks and functions. For example, Instead of ASCII or UniCode alphabets with uppercasing and lowercasing functions the alphabets are nucleic acids or amino acids with (reverse)complement and/or (reverse)translation functions, etc.


If Strings in base Julia are already immutable as you suggest here:

With my proposed suggestion, users would instead need to use immutable operations, just like if they were working with strings.

then making this change would just make BioSequences more consistent with how I (personally, may not be universal) use the library


Related, my personal experience to:

But: How often do people actually use these operations? I would guess that they are not used very often.

is rarely if ever


Just my perspective but hopefully it is helpful. Thanks as always for your work on BioJulia @jakobnissen !

@kescobo
Copy link
Member

kescobo commented Dec 23, 2024

@jakobnissen Do we have a decent list of dependents? We could maybe use juliahub to query this, or use some of the work that Timothy has been doing. If we could add downstream tests, we could see how breaking this actually is.

I agree it unlikely that this is used frequently.

@jakobnissen
Copy link
Member Author

jakobnissen commented Dec 23, 2024

We have downstream tests already, so we coild just test it out (of course and then do more tests if it looks promising)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants