Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support union of multiple intervals #156

Open
wants to merge 18 commits into
base: master
Choose a base branch
from

Conversation

putianyi889
Copy link

@putianyi889 putianyi889 commented Sep 9, 2023

This also provides a way to simplify DomainSets.UnionDomain of intervals.

@codecov
Copy link

codecov bot commented Sep 9, 2023

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (13c76e1) 99.11% compared to head (a099521) 98.62%.
Report is 13 commits behind head on master.

Files Patch % Lines
src/interval.jl 90.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #156      +/-   ##
==========================================
- Coverage   99.11%   98.62%   -0.49%     
==========================================
  Files           6        7       +1     
  Lines         225      291      +66     
==========================================
+ Hits          223      287      +64     
- Misses          2        4       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dlfivefifty
Copy link
Member

Is the restriction to TypedEndpointsInterval needed anymore?

@putianyi889
Copy link
Author

putianyi889 commented Sep 9, 2023

The function calls _union. Is that supposed to be a generic method?

Copy link
Collaborator

@hyrodium hyrodium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you keep the original union(d1, d2) method for performance?

Before this PR

julia> using BenchmarkTools

julia> using IntervalSets

julia> i1 = zero(T) .. one(T)
0.0 .. 1.0

julia> i3 = one(T)/2 .. 2*one(T)
0.5 .. 2.0

julia> @benchmark (i1, i3)
BenchmarkTools.Trial: 10000 samples with 998 evaluations.
 Range (min  max):  16.133 ns   1.314 μs  ┊ GC (min  max): 0.00%  95.32%
 Time  (median):     17.528 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.641 ns ± 31.241 ns  ┊ GC (mean ± σ):  3.63% ±  2.33%

  ▄▄▂▇█▆▅▄▂▃▅▄▄▂▁                   ▁▁▁▁▁                     ▂
  ████████████████▆▆▆▆▄▅▄▄▆▇▇▆▅▆▇▇█████████▇▆▅▄▄▅▅▄▁▁▆▅▆▆▇▆▅▇ █
  16.1 ns      Histogram: log(frequency) by time      33.5 ns <

 Memory estimate: 32 bytes, allocs estimate: 1.

After this PR

julia> using BenchmarkTools

julia> using IntervalSets

julia> i1 = zero(T) .. one(T)
0.0 .. 1.0

julia> i3 = one(T)/2 .. 2*one(T)
0.5 .. 2.0

julia> @benchmark (i1, i3)
BenchmarkTools.Trial: 10000 samples with 979 evaluations.
 Range (min  max):  65.323 ns   1.561 μs  ┊ GC (min  max): 0.00%  93.07%
 Time  (median):     68.096 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   74.969 ns ± 73.735 ns  ┊ GC (mean ± σ):  4.94% ±  4.81%

   ▄██▆▃        ▂▃▃                         ▁                 ▁
  ▇███████▇▇▇▇▇████▇▇▆▇▆▆▅▅▅▅▅▄▆▅▅▅▆▇▇▆▆▅▅▆▇█▇▇▇▇▇▇▇▆▆▆▄▅▅▅▄▅ █
  65.3 ns      Histogram: log(frequency) by time       114 ns <

 Memory estimate: 128 bytes, allocs estimate: 2.

test/runtests.jl Outdated Show resolved Hide resolved
Co-authored-by: Yuto Horikawa <[email protected]>
@putianyi889
Copy link
Author

putianyi889 commented Feb 18, 2024

The implementation is now to use an iterator-based method to merge sorted intervals. The bottleneck is sorting.

SortingNetworks.jl is fast but doesn't support sorting different types, so it's abandoned.

TupleTools.jl is faster when there are <=6 elements. Then the only available method is Base.sort on vectors. I choose static vector after some benchmarking.

For the union of 2 intervals, the old method is still 25% faster, so it's reserved.

@putianyi889 putianyi889 marked this pull request as ready for review February 21, 2024 10:02
@putianyi889 putianyi889 requested a review from hyrodium February 21, 2024 10:02
Copy link
Collaborator

@hyrodium hyrodium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for updating this PR!
I have not looked into the details yet, but I commented on some points.

Project.toml Outdated Show resolved Hide resolved
Project.toml Outdated Show resolved Hide resolved
src/unionalgorithms.jl Outdated Show resolved Hide resolved
src/unionalgorithms.jl Outdated Show resolved Hide resolved
@putianyi889 putianyi889 marked this pull request as draft February 22, 2024 09:07
@putianyi889
Copy link
Author

Important changes:

  • ref Removed StaticArrays.jl. Instead, use the basic vector to sort. Due to the loss of performance, TupleTools.jl becomes significantly faster. However, due to the exponential compilation time of tuples, the bound is set to 20, that is, for size <= 20, use TupleTools.
  • ref Removed DomainSets.jl from the testing environment. As a result, we lost setdiff and thus the ability to test the correctness of unions.

There is one uncovered line.

union(d::TypedEndpointsInterval) = d # 1 interval

It's to avoid calling

union(I::TypedEndpointsInterval...) = iterunion(sort!(collect(I); lt = leftof)) # ≥21 intervals

in that case, so I don't think it's worth testing.

@putianyi889 putianyi889 marked this pull request as ready for review February 22, 2024 14:20
@putianyi889 putianyi889 requested a review from hyrodium February 22, 2024 14:20
@aplavin
Copy link
Contributor

aplavin commented Feb 22, 2024

Splatting is reasonably used only for small numbers of arguments in Julia. Then, the most straightforward dependency-free implementation is even more performant than this PR:

julia> ints = Tuple(i..(i+1) for i in 1:10)
(1 .. 2, 2 .. 3, 3 .. 4, 4 .. 5, 5 .. 6, 6 .. 7, 7 .. 8, 8 .. 9, 9 .. 10, 10 .. 11)

# this PR:
julia> @btime union($ints...)
  255.479 ns (23 allocations: 1.52 KiB)
1 .. 11

# naive implementation:
julia> myunion(ints...) = reduce(union, ints)

julia> @btime myunion($ints...)
  18.556 ns (0 allocations: 0 bytes)
1 .. 11

UPD: ah, I see, it does require sorting... I heard Julia is adding tuple sorting soon though?

@aplavin
Copy link
Contributor

aplavin commented Feb 22, 2024

Yeah, don't know the current state of the Julia PR JuliaLang/julia#46104, but copying tuple sorting from there

	function tsort(x::NTuple{N}; lt::Function=isless, by::Function=identity,
	               rev::Union{Bool,Nothing}=nothing, order::Base.Ordering=Base.Forward) where N
	     o = Base.ord(lt,by,rev,order)
	    issorted(x, o) ? x : _sort(x, o)
	 end
	 _sort(x::Union{NTuple{0}, NTuple{1}}, o::Base.Ordering) = x
	 function _sort(x::NTuple, o::Base.Ordering)
	     a, b = Base.IteratorsMD.split(x, Val(length(x)>>1))
	     merge(_sort(a, o), _sort(b, o), o)
	 end
	 merge(x::NTuple, y::NTuple{0}, o::Base.Ordering) = x
	 merge(x::NTuple{0}, y::NTuple, o::Base.Ordering) = y
	 merge(x::NTuple{0}, y::NTuple{0}, o::Base.Ordering) = x # Method ambiguity
	 merge(x::NTuple, y::NTuple, o::Base.Ordering) =
	     (Base.lt(o, y[1], x[1]) ? (y[1], merge(x, Base.tail(y), o)...) : (x[1], merge(Base.tail(x), y, o)...))
I get:
julia> myunion(ints...) = reduce(union, tsort(ints; by=i -> (leftendpoint(i), isleftopen(i))))
julia> @btime myunion($ints...)
  59.723 ns (0 allocations: 0 bytes)

@putianyi889
Copy link
Author

putianyi889 commented Feb 22, 2024

Yeah, don't know the current state of the Julia PR JuliaLang/julia#46104, but copying tuple sorting from there

I get:

julia> myunion(ints...) = reduce(union, tsort(ints; by=i -> (leftendpoint(i), isleftopen(i))))
julia> @btime myunion($ints...)
  59.723 ns (0 allocations: 0 bytes)

If I understand correctly, it only supports type-stable tuple sorting, that is, all items must be of the exact same type.

Edit: JuliaLang/julia#52010

By the way, that implementation is exactly the same as TupleTools.jl.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants