Skip to content

Commit

Permalink
[SPARK-50640][CORE][TESTS] Update ChecksumBenchmark by removing `Pu…
Browse files Browse the repository at this point in the history
…reJavaCrc32C` and setting `Adler32` as a baseline

### What changes were proposed in this pull request?

This PR aims to update `ChecksumBenchmark` by
- Removing `PureJavaCrc32C`
- Setting `Adler32` as a baseline

### Why are the changes needed?

Not only Apache Spark, but also Apache Hadoop community doesn't use that legacy code on Java 9+ since 2018 from Hadoop 3.1.0 (HADOOP-15033).
- apache/hadoop#291

We can save our resources by removing obsolete code usage and focusing on our available options.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual review.

I also attached Apple Silicon result.

**Java 17**
```
[info] OpenJDK 64-Bit Server VM 17.0.13+11-LTS on Mac OS X 15.3
[info] Apple M3 Max
[info] Checksum Algorithms:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ----------------------------------------------------------------------------------------------------
[info] Adler32                        8689           8709          28          0.0     8485001.2       1.0X
[info] CRC32                          3201           3205           4          0.0     3125877.4       2.7X
[info] CRC32C                         3199           3205           5          0.0     3124264.6       2.7X
```

**Java 21**
```
[info] OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Mac OS X 15.3
[info] Apple M3 Max
[info] Checksum Algorithms: Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ---------------------------------------------------------------------------------------------------
[info] Adler32                       9208           9226          20          0.0     8991732.4       1.0X
[info] CRC32                         3238           3357         105          0.0     3162007.9       2.8X
[info] CRC32C                        3224           3351         110          0.0     3147966.1       2.9X
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49258 from dongjoon-hyun/SPARK-50640.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
  • Loading branch information
dongjoon-hyun committed Dec 20, 2024
1 parent bccdf1f commit a2e3188
Show file tree
Hide file tree
Showing 3 changed files with 11 additions and 18 deletions.
9 changes: 4 additions & 5 deletions core/benchmarks/ChecksumBenchmark-jdk21-results.txt
Original file line number Diff line number Diff line change
Expand Up @@ -2,13 +2,12 @@
Benchmark Checksum Algorithms
================================================================================================

OpenJDK 64-Bit Server VM 21.0.4+7-LTS on Linux 6.5.0-1025-azure
OpenJDK 64-Bit Server VM 21.0.5+11-LTS on Linux 6.8.0-1017-azure
AMD EPYC 7763 64-Core Processor
Checksum Algorithms: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
CRC32 2743 2746 3 0.0 2678409.9 1.0X
CRC32C 1974 2055 70 0.0 1928129.2 1.4X
Adler32 12689 12709 17 0.0 12391425.9 0.2X
hadoop PureJavaCrc32C 23027 23041 13 0.0 22487098.9 0.1X
Adler32 11116 11123 7 0.0 10855585.4 1.0X
CRC32 2774 2777 4 0.0 2709448.1 4.0X
CRC32C 2083 2148 65 0.0 2034177.5 5.3X


9 changes: 4 additions & 5 deletions core/benchmarks/ChecksumBenchmark-results.txt
Original file line number Diff line number Diff line change
Expand Up @@ -2,13 +2,12 @@
Benchmark Checksum Algorithms
================================================================================================

OpenJDK 64-Bit Server VM 17.0.12+7-LTS on Linux 6.5.0-1025-azure
OpenJDK 64-Bit Server VM 17.0.13+11-LTS on Linux 6.8.0-1017-azure
AMD EPYC 7763 64-Core Processor
Checksum Algorithms: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
CRC32 2757 2758 1 0.0 2692250.2 1.0X
CRC32C 2142 2244 116 0.0 2091901.8 1.3X
Adler32 12699 12712 15 0.0 12401205.6 0.2X
hadoop PureJavaCrc32C 23049 23066 15 0.0 22508320.3 0.1X
Adler32 11112 11130 20 0.0 10851949.2 1.0X
CRC32 2765 2767 2 0.0 2699749.0 4.0X
CRC32C 2101 2159 54 0.0 2051565.3 5.3X


Original file line number Diff line number Diff line change
Expand Up @@ -19,8 +19,6 @@ package org.apache.spark.shuffle

import java.util.zip.{Adler32, CRC32, CRC32C}

import org.apache.hadoop.util.PureJavaCrc32C

import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}

/**
Expand All @@ -41,18 +39,15 @@ object ChecksumBenchmark extends BenchmarkBase {
runBenchmark("Benchmark Checksum Algorithms") {
val data: Array[Byte] = (1 until 32 * 1024 * 1024).map(_.toByte).toArray
val benchmark = new Benchmark("Checksum Algorithms", N, 3, output = output)
benchmark.addCase(s"Adler32") { _ =>
(1 to N).foreach(_ => new Adler32().update(data))
}
benchmark.addCase("CRC32") { _ =>
(1 to N).foreach(_ => new CRC32().update(data))
}
benchmark.addCase(s"CRC32C") { _ =>
(1 to N).foreach(_ => new CRC32C().update(data))
}
benchmark.addCase(s"Adler32") { _ =>
(1 to N).foreach(_ => new Adler32().update(data))
}
benchmark.addCase(s"hadoop PureJavaCrc32C") { _ =>
(1 to N).foreach(_ => new PureJavaCrc32C().update(data))
}
benchmark.run()
}
}
Expand Down

0 comments on commit a2e3188

Please sign in to comment.