43.0.0 (2025-01-07)
Implemented enhancements:
- feat: make max message size configurable for gRPC clients #983 (etolbakov)
- feat: Upgrade to DataFusion 38 #1048 (andygrove)
- feat: Upgrade to DataFusion 39.0.0 #1052 (andygrove)
- feat: default instance for executor configuration #1147 (milenkovicm)
- feat: Expose Ballista Scheduler and Executor in Python #1148 (milenkovicm)
- feat: add test to check for `ctx.enable_url_table()` #1155 (milenkovicm)
Documentation updates:
- docs: Add ASF attribution #973 (simicd)
- Architecture guide #977 (andygrove)
- docs: enhance the ballista-cli docs #979 (haoxins)
- docs: update user guide with docker image information #980 (etolbakov)
- docs: enhance the docs of Ballista client #985 (haoxins)
- docs: list protoc dependency #989 (davidwilemski)
- update asf yaml #1007 (andygrove)
- docs: Add workflow to publish documentation #1040 (andygrove)
- docs: Replace Arrow Ballista with DataFusion Ballista #1041 (andygrove)
- Add maintenance status note #1043 (andygrove)
- Remove helm from supported code #1071 (milenkovicm)
- Remove UI #1072 (milenkovicm)
- Remove HDFS support ... #1073 (milenkovicm)
- Removed Maintenance Notice #1094 (tbar4)
- Update root `README.md` and other documentation with latest changes #1113 (milenkovicm)
- docs: Update benchmarks #1121 (andygrove)
Merged pull requests:
- PyBallista - Python SQL client for Ballista #970 (andygrove)
- docs: Add ASF attribution #973 (simicd)
- [Python] Add `read_csv` and `read_parquet` methods #976 (andygrove)
- Architecture guide #977 (andygrove)
- [Python] Add more methods to SessionContext #978 (andygrove)
- [Python] Add `execute_logical_plan` to context #972 (andygrove)
- Use correct product name in docs #975 (andygrove)
- docs: enhance the ballista-cli docs #979 (haoxins)
- docs: update user guide with docker image information #980 (etolbakov)
- Upgrade Rust version to 1.72 to keep the same as DataFusion v35 #982 (haoxins)
- build: Fix the ballista-cli Dockerfile #981 (haoxins)
- feat: make max message size configurable for gRPC clients #983 (etolbakov)
- Remove some hard-coded gRPC max message sizes #984 (andygrove)
- docs: enhance the docs of Ballista client #985 (haoxins)
- docs: list protoc dependency #989 (davidwilemski)
- Fix ExecutorLost event debug info #988 (lewiszlw)
- Fix shuffle writer test #998 (Jefffrey)
- Bump graphviz-rust from 0.6.1 to 0.8.0 #999 (Jefffrey)
- Add rust-toolchain.toml for clarity #1014 (scnerd)
- Fix executor metadata decode bug #1004 (lewiszlw)
- update asf yaml #1007 (andygrove)
- Fix Ballista rust.yml github workflow #1026 (RaphaelMarinier)
- Bump datafusion to 36.0.0 and make ballista compatible with it. #1027 (RaphaelMarinier)
- Make Ballista compatible with Datafusion 37.0.0 (from 36.0.0) #1031 (RaphaelMarinier)
- Fixes Setting Job Name Not Reflected in Ballista UI #1039 (athultr1997)
- docs: Add workflow to publish documentation #1040 (andygrove)
- [Docs] fix good_first_issue link in the contribution md doc #1022 (Almaz-KG)
- docs: Replace Arrow Ballista with DataFusion Ballista #1041 (andygrove)
- Fix job hangs when partition count of plan is zero #1024 (lewiszlw)
- Add maintenance status note #1043 (andygrove)
- Fix cargo build #1045 (andygrove)
- fix docker build in CI #1046 (andygrove)
- feat: Upgrade to DataFusion 38 #1048 (andygrove)
- Bump actions/setup-node from 3 to 4 #909 (dependabot[bot])
- Bump actions/cache from 3 to 4 #958 (dependabot[bot])
- feat: Upgrade to DataFusion 39.0.0 #1052 (andygrove)
- Update datafusion protobuf definitions #1057 (palaska)
- Fix regression with TPC-H benchmark #1060 (andygrove)
- Upgrade to Datafusion 41 #1062 (palaska)
- Remove helm from supported code #1071 (milenkovicm)
- Remove plugin subsystem #1070 (milenkovicm)
- Remove CI folder #1074 (milenkovicm)
- Code cleanup, move examples, remove unused files #1075 (milenkovicm)
- Remove UI #1072 (milenkovicm)
- Remove key-value stores for scheduler persistence #1077 (milenkovicm)
- Remove cache functionality #1076 (milenkovicm)
- Remove HDFS support ... #1073 (milenkovicm)
- Reorganise and remove dependencies #1078 (milenkovicm)
- Promote keda and flight-sql to optional features #1079 (milenkovicm)
- Update to datafusion 42 ... #1080 (milenkovicm)
- #1086 solve examples errors #1087 (tbar4)
- fix issue with not building python package ... #1085 (milenkovicm)
- another round of code cleanup ... #1089 (milenkovicm)
- Make rest-api optional feature ... #1084 (milenkovicm)
- fix clippy issues after updating to rust 1.82 #1090 (milenkovicm)
- Replace BallistaContext with SessionContext #1088 (milenkovicm)
- Removed Maintenance Notice #1094 (tbar4)
- Ergonomic way to setup/configure `SessionContextExt` #1096 (milenkovicm)
- Executor configuration extended .. #1099 (milenkovicm)
- fix issue with executor registration ... #1101 (milenkovicm)
- Deprecate `BallistaContext` #1103 (milenkovicm)
- fix imports after un-rebased PR #1106 (milenkovicm)
- Ballista proto cleanup #1110 (milenkovicm)
- Update and move deps to workspace #1109 (milenkovicm)
- Trim down `BallistaConfig` #1108 (milenkovicm)
- Remove built-in object store registry #1114 (milenkovicm)
- Update root `README.md` and other documentation with latest changes #1113 (milenkovicm)
- support window functions #1112 (onursatici)
- added a BallistaContext to ballista to allow for Remote or standalone #1100 (tbar4)
- Decommission `BallistaContext` #1119 (milenkovicm)
- docs: Update benchmarks #1121 (andygrove)
- Make easier to create custom schedulers and executors #1118 (milenkovicm)
- refactor: Move BallistaRegistry to better location #1126 (milenkovicm)
- refactor: BallistaLogicalExtensionCodec refactoring and improvements #1127 (milenkovicm)
- refactor: consolidate ballista tests #1129 (milenkovicm)
- refactor: SessionStateExt and SessionConfigExt #1130 (milenkovicm)
- chore: dependency updates #1131 (milenkovicm)
- chore: fix mimalloc warning when building #1137 (milenkovicm)
- refactor: SessionBuilder to return Result<_> #1138 (milenkovicm)
- chore: remove unused cache_ options from executor #1140 (milenkovicm)
- updated maturin version and cargo build to build yml #1136 (tbar4)
- chore: Fix clippy issues after rust update (1.83.0) #1143 (milenkovicm)
- Fix documentation example which still uses BallistaContext #1145 (milenkovicm)
- Ballista proto cleanup #1146 (milenkovicm)
- feat: default instance for executor configuration #1147 (milenkovicm)
- feat: Expose Ballista Scheduler and Executor in Python #1148 (milenkovicm)
- chore: dependency cleanup #1150 (milenkovicm)
- Update DataFusion to 43 #1125 (Dandandan)
- Reinstantiate join order optimization #1122 (Dandandan)
- add partitioning scheme for unresolved shuffle and shuffle reader exec #1144 (onursatici)
- chore: update py-df to 43.1 #1152 (milenkovicm)
- chore: no need to run python test in rust #1154 (milenkovicm)
- feat: add test to check for `ctx.enable_url_table()` #1155 (milenkovicm)
0.12.0 (2024-01-14)
Documentation updates:
- docs: fix link #799 (haoxins)
Merged pull requests:
- [minor] remove outdate todo #683 (Ted-Jiang)
- Add executor terminating status for graceful shutdown #667 (thinkharderdev)
- Allow `BallistaContext::read_*` methods to read multiple paths. #679 (luckylsk34)
- Update scheduler.md #657 (psvri)
- Mark `SchedulerState` as pub #688 (Dandandan)
- Update graphviz-rust requirement from 0.5.0 to 0.6.1 #651 (dependabot[bot])
- Upgrade DataFusion to 19.0.0 #691 (r4ntix)
- Update release docs #692 (andygrove)
- Mark `SchedulerServer::with_task_launcher` as pub #695 (Dandandan)
- Make task_manager pub #698 (Dandandan)
- Add ExecutionEngine abstraction #687 (andygrove)
- Allow accessing s3 locations in client mode #700 (luckylsk34)
- git clone branch incorrect #699 (BubbaJoe)
- Fix for error message during testing #707 (yahoNanJing)
- Upgrade datafusion to 20.0.0 & sqlparser to 0.32.0 #711 (r4ntix)
- Update README.md #729 (jiangzhx)
- Update link to scheduler proto file in dev docs #713 (JAicewizard)
- Fix `show tables` fails #715 (r4ntix)
- Remove redundant fields in ExecutorManager #728 (yahoNanJing)
- Fix parameter '--config-backend' to '--cluster-backend' #720 (paolorechia)
- Upgrade DataFusion to 21.0.0 #727 (r4ntix)
- [minor] remove useless brackets #739 (Ted-Jiang)
- Only decode plan in `LaunchMultiTaskParams` once #743 (Dandandan)
- Upgrade DataFusion to 22.0.0 #740 (r4ntix)
- [feature] support shuffle read with retry when facing IO error. #738 (Ted-Jiang)
- [log] Print long running task status. #750 (Ted-Jiang)
- Upgrade DataFusion to 23.0.0 #755 (yahoNanJing)
- Fix plan metrics length and stage metrics length not match #764 (yahoNanJing)
- added match arms to create ClusterStorageConfig #766 (BokarevNik)
- [Improve] refactor the offer_reservation avoid wait result #760 (Ted-Jiang)
- [fea] Avoid multithreaded write lock conflicts in event queue. #754 (Ted-Jiang)
- Upgrade DataFusion to 24.0.0, Object_Store to 0.5.6 #769 (r4ntix)
- Refine create_datafusion_context() #778 (yahoNanJing)
- Remove output_partitioning for task definition #776 (yahoNanJing)
- Upgrade DataFusion to 25.0.0 #779 (r4ntix)
- Disable the ansi feature of tracing-subscriber #784 (yahoNanJing)
- Add config grpc_server_max_decoding_message_size to make the maximum size of a decoded message at the grpc server side configurable #782 (yahoNanJing)
- Fix nodejs issues in Docker build #731 (jnaous)
- Upgrade node version to fix build in `main` #794 (avantgardnerio)
- Remove redundant mod session_registry #792 (yahoNanJing)
- Make last_seen_ts_threshold for getting alive executor at the scheduler side larger than the heartbeat time interval #786 (yahoNanJing)
- Remove the prometheus-metrics from the default feature #788 (yahoNanJing)
- Refine the ExecuteQuery grpc interface #790 (yahoNanJing)
- Add config to collect statistics, enable in TPC-H benchmark #796 (Dandandan)
- Add support for GCS data sources #805 (haoxins)
- Update DataFusion to 26 #798 (Dandandan)
- Issue 162 build docker image in ci #716 (paolorechia)
- Fix index out of bounds panic #819 (yahoNanJing)
- Refactor the TaskDefinition by changing encoding execution plan to the decoded one #817 (yahoNanJing)
- Fix ballista-cli docs #800 (jonahgao)
- docs: fix link #799 (haoxins)
- Implement the with_new_children for ShuffleReaderExec #821 (yahoNanJing)
- Update to point to the correct documentation #838 (dadepo)
- Remove ExecutorReservation and change the task assignment philosophy from executor first to task first #823 (yahoNanJing)
- Upgrade DataFusion to 27.0.0 #834 (r4ntix)
- Reduce the number of calls to `create_logical_plan` #842 (jonahgao)
- Bump semver from 5.7.1 to 5.7.2 in /ballista/scheduler/ui #843 (dependabot[bot])
- Bump actions/labeler from 4.1.0 to 4.3.0 #841 (dependabot[bot])
- Bump tough-cookie from 4.1.2 to 4.1.3 in /ballista/scheduler/ui #840 (dependabot[bot])
- Update flatbuffers requirement from 22.9.29 to 23.5.26 #801 (dependabot[bot])
- Update dirs requirement from 4.0.0 to 5.0.1 #767 (dependabot[bot])
- Update libloading requirement from 0.7.3 to 0.8.0 #761 (dependabot[bot])
- Introduce a cache crate supporting concurrent cache value loading #825 (yahoNanJing)
- Fix cargo clippy for latest rust version #848 (yahoNanJing)
- Introduce CachedBasedObjectStoreRegistry to use data source cache transparently #827 (yahoNanJing)
- Add ConsistentHash for node topology management #830 (yahoNanJing)
- Implement 3-phase consistent hash based task assignment policy #833 (yahoNanJing)
- Update tonic requirement from 0.8 to 0.9 #733 (dependabot[bot])
- Update itertools requirement from 0.10 to 0.11 #844 (dependabot[bot])
- Update etcd-client requirement from 0.10 to 0.11 #845 (dependabot[bot])
- Update hashbrown requirement from 0.13 to 0.14 #846 (dependabot[bot])
- Bump word-wrap from 1.2.3 to 1.2.4 in /ballista/scheduler/ui #849 (dependabot[bot])
- Update hdfs requirement from 0.1.1 to 0.1.4 #856 (yahoNanJing)
- Update to DataFusion 28 #858 (Dandandan)
- Upgrade datafusion to 30.0.0 #866 (r4ntix)
- refactor: port get_scan_files to Ballista #877 (alamb)
- Upgrade datafusion to 31.0.0 #878 (r4ntix)
- Upgrade datafusion to 32.0.0 #899 (r4ntix)
- Update to DataFusion 33 #900 (Dandandan)
- Refactor lru mod, remove linked_hash_map #918 (PsiACE)
- Dynamically optimize aggregate (count) based on shuffle stats #919 (Dandandan)
- Use lz4 compression for shuffle files & flight stream, refactoring / improvements #920 (Dandandan)
- Make max encoding message size configurable #928 (andygrove)
- Set max message size to 16MB in gRPC clients #931 (andygrove)
- Upgrade to DataFusion 34.0.0-rc1 #927 (andygrove)
- Use official DF 34 release #939 (andygrove)
- Use StreamWriter instead of FileWriter #943 (avantgardnerio)
- Remove some TODO comments related to context fetching schemas from scheduler #946 (andygrove)
- Fix Docker build #947 (andygrove)
- Fix regression in DataFrame.write_xxx #945 (andygrove)
0.11.0 (2023-02-19)
Implemented enhancements:
- Remove `python` since it has been moved to its own repo, `arrow-ballista-python` #653
- Add executor self-registration mechanism in the heartbeat service #648
- Upgrade to DataFusion 17 #638
- Move Python bindings to separate repo? #635
- Implement new release process #622
- Change default branch name from master to main #618
- Update latest datafusion dependency #610
- Implement optimizer rule to remove redundant repartitioning #608
- ballista-cli as (docker) images #600
- Update contributor guide #598
- Fix cargo clippy #570
- Support Alibaba Cloud OSS with ObjectStore #566
- Refactor `StateBackendClient` to be a higher-level interface #554
- Make it concurrently to launch tasks to executors #544
- Simplify docs #531
- Provide an in-memory StateBackend #505
- Add support for Azure blob storage #294
- Add a workflow to build the image and publish it to the package #71
Fixed bugs:
- Rust / Check Cargo.toml formatting (amd64, stable) (pull_request) Failing #662
- Protobuf parsing error #646
- jobs from python client not showing up in Scheduler UI #625
- ballista ui fails to build #594
- cargo build --release fails for ballista-scheduler #590
- docker build fails #589
- Multi-scheduler Job Starvation #585
- Cannot query file from S3 #559
- Benchmark q16 fails #373
Documentation updates:
Merged pull requests:
- Upgrade to DataFusion 18 #668 (andygrove)
- Enable physical plan round-trip tests #666 (andygrove)
- Upgrade to DataFusion 18.0.0-rc1 #664 (andygrove)
- add test_util to make examples work well #661 (jiangzhx)
- Minor refactor to reduce duplicate code #659 (andygrove)
- Cluster state refactor Part 2 #658 (thinkharderdev)
- Remove `python` dir & python-related workflows #654 (iajoiner)
- Add executor self-registration mechanism in the heartbeat service #649 (yahoNanJing)
- Upgrade to DataFusion 17 #639 (avantgardnerio)
- Upgrade to DataFusion 16 (again) #636 (avantgardnerio)
- Update release process documentation #632 (andygrove)
- Implement new release process #623 (andygrove)
- Update contributor guide #617 (andygrove)
- Fix Cargo.toml format issue #616 (andygrove)
- Refactor scheduler main #615 (andygrove)
- Refactor executor main #614 (andygrove)
- Update datafusion dependency to the latest version #612 (yahoNanJing)
- Add support for Azure Blob Storage #599 (aidankovacic-8451)
- Python: add method to get explain output as a string #593 (andygrove)
- Handle job resubmission #586 (thinkharderdev)
- updated readme to contain correct versions of dependencies. #580 (saikrishna1-bidgely)
- Update graphviz-rust requirement from 0.4.0 to 0.5.0 #574 (dependabot[bot])
- Super minor spelling error #573 (jdye64)
- Fix cargo clippy #571 (yahoNanJing)
- Support Alibaba Cloud OSS with ObjectStore #567 (r4ntix)
- fix(ui): fix last seen #562 (duyet)
- Cluster state refactor part 1 #560 (thinkharderdev)
- Make it concurrently to launch tasks to executors #557 (yahoNanJing)
- Update datafusion requirement from 14.0.0 to 15.0.0 #552 (yahoNanJing)
- docs: fix style in the Helm readme #551 (haoxins)
- Fix Helm chart's image format #550 (haoxins)
- Update env_logger requirement from 0.9 to 0.10 #539 (dependabot[bot])
- only build docker images on rc tags #535 (andygrove)
- Remove `--locked` when building Python wheels #533 (andygrove)
- Bump actions/labeler from 4.0.2 to 4.1.0 #525 (dependabot[bot])
- Provide a memory StateBackendClient #523 (yahoNanJing)
0.10.0 (2022-11-18)
Implemented enhancements:
- Add user guide section on prometheus metrics #507
- Don't throw error when job path not exist in remove_job_data #502
- Fix clippy warning #494
- Use job_data_clean_up_interval_seconds == 0 to indicate executor_cleanup_enable #488
- Add a config for tracing log rolling policy for both scheduler and executor #486
- Set up repo where we can push benchmark results #473
- Make the delayed time interval for cleanup job data in both scheduler and executor configurable #469
- Add some validation for the remove_job_data grpc service #467
- Add ability to build docker images using `release-lto` profile #463
- Suggest users download (rather than build) the FlightSQL JDBC Driver #460
- Clean up legacy job shuffle data #459
- Add grpc service for the scheduler to make it able to be triggered by client explicitly #458
- Replace Mutex<HashMap> by using DashMap #448
- Refine log level #446
- Upgrade to DataFusion 14.0.0 #445
- Add a feature for hdfs3 #419
- Add optional flag which advertises host for Arrow Flight SQL #418
- Partitioning reasoning in DataFusion and Ballista #284
- Stop wasting time in CI on MIRI runs #283
- Publish Docker images as part of each release #236
- Cleanup job/stage status from TaskManager and clean up shuffle data after a period after JobFinished #185
Fixed bugs:
- build broken: configure_me_codegen retroactively reserved `bind_host` #519
- Return empty results for SQLs with order by #451
- ballista scheduler is not taken inline parameters into account #443
- [FlightSQL] Cannot connect with Tableau Desktop #428
- Benchmark q15 fails #372
- Incorrect documentation for building Ballista on Linux when using docker-compose #362
- Scheduler silently replaces `ParquetExec` with `EmptyExec` if data path is not correctly mounted in container #353
- SQL with order by limit returns nothing #334
Documentation updates:
Merged pull requests:
- configure_me_codegen retroactively reserved on our `bind_host` parame… #520 (avantgardnerio)
- Bump actions/cache from 2 to 3 #517 (dependabot[bot])
- Update graphviz-rust requirement from 0.3.0 to 0.4.0 #515 (dependabot[bot])
- Add Prometheus metrics endpoint #511 (thinkharderdev)
- Enable tests that work since upgrading to DataFusion 14 #510 (andygrove)
- Update hashbrown requirement from 0.12 to 0.13 #506 (dependabot[bot])
- Don't throw error when job shuffle data path not exist in executor #503 (yahoNanJing)
- Upgrade to DataFusion 14.0.0 and Arrow 26.0.0 #499 (andygrove)
- Fix clippy warning #495 (yahoNanJing)
- Stop wasting time in CI on MIRI runs #491 (Ted-Jiang)
- Remove executor config executor_cleanup_enable and make the configuration name for executor cleanup more intuitive #489 (yahoNanJing)
- Add a config for tracing log rolling policy for both scheduler and executor #487 (yahoNanJing)
- Add grpc service of cleaning up job shuffle data for the scheduler to make it able to be triggered by client explicitly #485 (yahoNanJing)
- [Minor] Bump DataFusion #480 (Dandandan)
- Remove benchmark results from README #478 (andygrove)
- Update `flightsql.md` to provide correct instruction #476 (iajoiner)
- Add support for Tableau #475 (avantgardnerio)
- Add SchedulerConfig for the scheduler configurations, like event_loop_buffer_size, finished_job_data_clean_up_interval_seconds, finished_job_state_clean_up_interval_seconds #472 (yahoNanJing)
- Bump DataFusion #471 (Dandandan)
- Add some validation for remove_job_data in the executor server #468 (yahoNanJing)
- Update documentation to reflect the release of the FlightSQL JDBC Driver #461 (avantgardnerio)
- Bump DataFusion version #453 (andygrove)
- Add shuffle for SortPreservingMergeExec physical operator #452 (yahoNanJing)
- Replace Mutex<HashMap> by using DashMap #449 (yahoNanJing)
- Refine log level for trial info and periodically invoked places #447 (yahoNanJing)
- MINOR: Add `set -e` to scripts, fix a typo #444 (andygrove)
- Add optional flag which advertises host for Arrow Flight SQL #418 #442 (DaltonModlin)
- Reorder joins after resolving stage inputs #441 (Dandandan)
- Add a feature for hdfs3 #439 (yahoNanJing)
- Add Spark benchmarks #438 (andygrove)
- scheduler now verifies that `file://` ListingTable URLs are accessible #414 (andygrove)
0.9.0 (2022-10-22)
Implemented enhancements:
- Support count distinct aggregation function #411
- Use multi-task definition in pull-based execution loop #400
- Make the scheduler event loop buffer size configurable #397
- Remove active execution graph when the related job is successful or failed. #391
- Improve launch task efficiency by calling LaunchMultiTask #389
- Use `tokio::sync::Semaphore` to wait for available task slots #388
- stdout and file log level settings are inconsistent #385
- Use dedicated executor in pull based loop #383
- Avoid calling scheduler when the executor cannot accept new tasks #377
- Add round robin executor slots reservation policy for the scheduler to evenly assign tasks to executors #371
- Switch to mimalloc and enable by default #369
- Integration test script should use docker-compose #364
- Use local shuffle reader in containerized environments #356
- Add `--ext` option to benchmark #352
- Add job cancel in the UI #350
- Using local shuffle reader avoid flight rpc call. #346
- Add a Helm Chart #321
- [UI] Show list of query stages with metrics #306
- [UI] Add ability to specify job name and have it show in the job listing page in the UI #277
- [UI] Add ability to download query plans in dot format #276
- [UI] Add ability to render query plans #275
- Add REST API documentation to User Guide #272
- Graceful shutdown: Handle `SIGTERM` #266
- [EPIC] Scheduler UI #265
- Introduce the datafusion-objectstore-hdfs in datafusion-contrib as an object store feature #259
- Add a feature based object store provider #257
- Add docker build files #248
- Allow IDEs to recognize generated code #246
- Add user guide section on Flight SQL support #230
- `dev/release/README.md` is outdated #228
- Make ShuffleReaderExec output less verbose #211
- Add LaunchMultiTask rpc interface for executor #209
- Make executor fetch shuffle partition data in parallel #208
- Concurrency control and rate limit during shuffle reader #195
- Update User Guide #160
- Ballista 0.8.0 Release #159
- Save encoded execution plan in the ExecutionStage to reduce cost of task serialization and deserialization #142
- Failed task retry #140
- Redefine the executor task slots #132
- Use ArrowFlight bearer token auth to create a session key for FlightSql clients #112
- Leverage Atomic for the in-memory states in Scheduler #101
- Introduce the object stores in datafusion-contrib as optional features #87
- Support multiple paths for ListingTableScanNode #75
- Need clean up intermediate data in Ballista #9
- Ballista does not support external file systems #10
Fixed bugs:
- Build errors in ./dev/build-ballista-rust.sh #407
- The Ballista Scheduler Dockerfile copies a file that no longer exists #402
- Benchmark q20 fails #374
- Integration tests fail #360
- Helm deploy fails #344
- Executor get stopped unexpected #333
- Executor poll work loop failure #311
- Queries with `LIMIT` are failing with "PhysicalExtensionCodec is not provided" #300
- Schema inference does not work in Ballista-cli with a remote context #287
- There are bugs in the yarn build github misses but break our internal build #270
- Race condition running docker-compose #267
- Scheduler UI not working in Docker image #250
- Use bind host rather than the external host for starting a local executor service #244
- Initial query stages read parquet files and repartition them needlessly #243
- Cannot build Docker images on macOS 12.5.1 with M1 chip #234
- CLI uses DataFusion context if no host or port are provided #219
- Unsupported binary operator `StringConcat` #201
- Ballista assumes all aggregate expressions are not DISTINCT #5
- Start ballista ui with docker, but it can not found ballista scheduler #11
- Cannot build Ballista docker images on Apple silicon #17
Documentation updates:
- Fixup links in README.md #366 (romanz)
- Update README in preparation for 0.9.0 release #318 (andygrove)
- User Guide improvements #274 (andygrove)
Closed issues:
- Automatic version updates for github actions with dependabot #127
Merged pull requests:
- Return multiple tasks in poll_work based on free slots #429 (Dandandan)
- Run integration tests as part of release verification script #426 (andygrove)
- Bump actions/setup-node from 2 to 3 #424 (dependabot[bot])
- Bump actions/setup-python from 2 to 4 #423 (dependabot[bot])
- Bump actions/checkout from 2 to 3 #422 (dependabot[bot])
- Bump actions/download-artifact from 2 to 3 #421 (dependabot[bot])
- Bump actions/upload-artifact from 2 to 3 #420 (dependabot[bot])
- MINOR: Fix yarn warnings #415 (andygrove)
- Fix q20 sql typo in benchmarks #409 (r4ntix)
- MINOR: Add notes on Apache Reporter #401 (andygrove)
- Use local shuffle reader in containerized environments and some impro… #399 (Ted-Jiang)
- Make the scheduler event loop buffer size configurable #398 (yahoNanJing)
- Add RoundRobinLocal slots policy for caching executor data to avoid seld persistency #396 (yahoNanJing)
- Add round robin executor slots reservation policy for the scheduler to evenly assign tasks to executors #395 (yahoNanJing)
- Improve launch task efficiency by calling LaunchMultiTask #394 (yahoNanJing)
- Cache encoded stage plan #393 (yahoNanJing)
- Remove active execution graph when the related job is successful or failed #392 (yahoNanJing)
- Update flatbuffers requirement from 2.1.2 to 22.9.29 #390 (dependabot[bot])
- Unified the log level configuration behavior #386 (r4ntix)
- Add DistinctCount support #384 (r4ntix)
- Pull-based execution loop improvements #380 (Dandandan)
- Fix latest commit #379 (Dandandan)
- Avoid calling scheduler when the executor cannot accept new tasks #378 (Dandandan)
- Switch to mimalloc and enable by default in executor #370 (Dandandan)
- Benchmark looks for path with and without extension #354 (andygrove)
- Implement job cancellation in UI #349 (Dandandan)
- Using local shuffle reader avoid flight rpc call. #347 (Ted-Jiang)
- Make helm deployable #345 (avantgardnerio)
- Benchmark & UI improvements #343 (andygrove)
- Add `cancel_job` REST API #340 (tfeda)
- Fix labeler #337 (andygrove)
- Upgrade to DataFusion 13.0.0 #336 (andygrove)
- Check executor id consistency when receive stop executor request #335 (yahoNanJing)
- Enable more benchmark serde tests #331 (andygrove)
- Downgrade `docker-compose.yaml` to version 3.3 so that we can support Ubuntu 20.04.4 LTS #329 (andygrove)
- update labeler #326 (andygrove)
- Upgrade to DataFusion 13.0.0-rc1 #325 (andygrove)
- Dependabot stop suggesting arrow and datafusion updates #324 (andygrove)
- Show job stages metrics #323 (onthebridgetonowhere)
- Add helm chart #322 (avantgardnerio)
- Atomic support for enhancement #319 (metesynnada)
- Allow automatic schema inference when registering csv #313 (r4ntix)
- Add ability to specify job name and have it show in the job listing page in the UI #312 (andygrove)
- Add REST API to generate DOT graph for individual query stage #310 (andygrove)
- [UI] Use tabbed pane with Queries and Executors tabs #309 (andygrove)
- REST API to get query stages #305 (andygrove)
- Add support for SortPreservingMergeExec; fix LIMIT bug #304 (andygrove)
- Add Python script to run benchmarks #302 (andygrove)
- [UI] Add ability to view query plans directly in the UI #301 (onthebridgetonowhere)
- Update datafusion.proto #299 (andygrove)
- Replace function `from_proto_binary_op` from upstream #298 (askoa)
- Fix dead link in contribution guideline readme file #297 (onthebridgetonowhere)
- UI code cleanup #291 (KenSuenobu)
- Add support for S3 data sources #290 (andygrove)
- Use latest datafusion #289 (andygrove)
- Fix documentation example #288 (onthebridgetonowhere)
- Improve formatting of job status in UI #286 (andygrove)
- Enabled download of dot files from Download icon #279 (KenSuenobu)
- Executor graceful shutdown: Handle SIGTERM #278 (mingmwang)
- Also run yarn build to catch JavaScript errors in CI #271 (avantgardnerio)
- Store sessions so users can register tables and query them through flight #269 (avantgardnerio)
- Fix compose for Ian #268 (avantgardnerio)
- Task level retry and Stage level retry #261 (mingmwang)
- Introduce the datafusion-objectstore-hdfs in datafusion-contrib as an object store feature #260 (yahoNanJing)
- Add a feature based object store provider #258 (yahoNanJing)
- Make fetch shuffle partition data in parallel #256 (yahoNanJing)
- Add LaunchMultiTask rpc interface for executor #255 (yahoNanJing)
- CLI uses ballista context instead of datafusion context in local mode #252 (r4ntix)
- Fix Scheduler UI in Docker image #251 (andygrove)
- Generate into source folder to make IDEs happy #247 (avantgardnerio)
- Use bind host rather than the external host for starting a local executor service #245 (yahoNanJing)
- Add REST endpoint to get DOT graph of a job #242 (andygrove)
- Add list of jobs to scheduler UI #241 (andygrove)
- Clean up job data on both Scheduler and Executor #188 (mingmwang)
- Update etcd-client requirement from 0.9 to 0.10 #111 (dependabot[bot])
- Bump terser from 4.8.0 to 4.8.1 in /ballista/ui/scheduler #91 (dependabot[bot])
- Bump jsdom from 16.4.0 to 16.7.0 in /ballista/ui/scheduler #74 (dependabot[bot])
- Bump numpy from 1.21.3 to 1.22.0 in /python #72 (dependabot[bot])
0.8.0 (2022-09-16)
Implemented enhancements:
- Executor should use all available cores by default #218
- Update task status to the task job curator scheduler #179
- update datafusion and arrow to 20.0.0 #176
- No scheduler logs when deployed to k8s #165
- Upgrade to DataFusion 11.0.0 #163
- Better encapsulation for ExecutionGraph #149
- A stage may act as the input of multiple stages #144
- Executor Lost handling #143
- Cancel a running query. #139
- Ignore the previous job_id inside fill_reservations() #138
- Normalize the serialization and deserialization places of protobuf structs #137
- Remove revive offer event loop #136
- Remove Keyspace::QueuedJobs #133
- Spawn a thread for execution plan generation #131
- Introduce CuratorTaskManager for make an active job be curated by only one scheduler #130
- Using tokio tracing for log file #122
- Ballista Executor report plan/operators metrics to Ballista Scheduler when task finish #116
- Add timeout settings for Grpc Client #114
- Add log level config in ballista #102
- Use another channel to update the status of a task set for executor #96
- Add config for concurrent_task in executor #94
- Ballista should support Arrow FlightSQL #92
- Why not include the `ballista-cli` in the member of workspace #88
- Upgrade dependency of arrow-datafusion to commit d0d5564b8f689a01e542b8c1df829d74d0fab2b0 #84
- Support sled path in config file. #79
- Support for multi-scheduler deployments #39
- Ballista 0.7.0 Release #126
- Improvements to Ballista extensibility #8
- Implement Python bindings for BallistaContext #15
Fixed bugs:
- Run example fails via PushStaged mode #214
- Config settings in BallistaContext do not get passed to DataFusion context #213
- Start scheduler fails with arguments "-s PushStaged" #207
- FlightSQL is broken and CI isn't catching it #190
- Query fails with "NULL is invalid as a DataFusion scalar value" #180
- Executor doesn't compile, missing `tokio::signal` #171
- Unable to build master #76
ballista-0.7.0 (2022-05-12)
Breaking changes:
- Make `ExecutionPlan::execute` Sync #2434 (tustvold)
- Add `Expr::Exists` to represent EXISTS subquery expression #2339 (andygrove)
- Remove dependency from `LogicalPlan::TableScan` to `ExecutionPlan` #2284 (andygrove)
- Move logical expression type-coercion code from `physical-expr` crate to `expr` crate #2257 (andygrove)
- feat: 2061 create external table ddl table partition cols #2099 [sql] (jychen7)
- Reorganize the project folders #2081 (yahoNanJing)
- Support more ScalarFunction in Ballista #2008 (Ted-Jiang)
- Merge dataframe and dataframe imp #1998 (vchag)
- Rename `ExecutionContext` to `SessionContext`, `ExecutionContextState` to `SessionState`, add `TaskContext` to support multi-tenancy configurations - Part 1 #1987 (mingmwang)
- Add Coalesce function #1969 (msathis)
- Add Create Schema functionality in SQL #1959 [sql] (matthewmturner)
- remove sync constraint of SendableRecordBatchStream #1884 (doki23)
Implemented enhancements:
- Add `CREATE VIEW` #2279 (matthewmturner)
- [Ballista] Support Union in ballista. #2098 (Ted-Jiang)
- Add missing aggr_expr to PhysicalExprNode for Ballista. #1989 (Ted-Jiang)
Fixed bugs:
- Ballista integration tests no longer work #2440
- Ballista crates cannot be released from DataFusion 7.0.0 source release #1980
- protobuf OctetLength should be deserialized as octet_length, not length #1834 (carols10cents)
Documentation updates:
- MINOR: Make crate READMEs consistent #2437 (andygrove)
- docs: Update the Ballista dev env instructions #2419 (haoxins)
- Revise document of installing ballista pinned to specified version #2034 (WinkerDu)
- Fix typos (Datafusion -> DataFusion) #1993 (andygrove)
Performance improvements:
- Introduce StageManager for managing tasks stage by stage #1983 (yahoNanJing)
Closed issues:
- Make expected result string in unit tests more readable #2412
- remove duplicated `fn aggregate()` in aggregate expression tests #2399
- split `distinct_expression.rs` into `count_distinct.rs` and `array_agg_distinct.rs` #2385
- move sql tests in `context.rs` to corresponding test files in `datafustion/core/tests/sql` #2328
- Date32/Date64 as join keys for merge join #2314
- Error precision and scale for decimal coercion in logic comparison #2232
- Support Multiple row layout #2188
- Discussion: Is Ballista a standalone system or framework #1916
Merged pull requests:
- MINOR: Enable multi-statement benchmark queries #2507 (andygrove)
- Persist session configs in scheduler #2501 (thinkharderdev)
- Update to `sqlparser` `0.17.0` #2500 (alamb)
- Limit cpu cores used when generating changelog #2494 (andygrove)
- MINOR: Parameterize changelog script #2484 (jychen7)
- Fix stage key extraction #2472 (thinkharderdev)
- Add support for list_dir() on local fs #2467 (wjones127)
- minor: update versions and paths in changelog scripts #2429 (andygrove)
- Fix Ballista executing during plan #2428 (tustvold)
- Re-organize and rename aggregates physical plan #2388 (yjshen)
- Upgrade to arrow 13 #2382 (alamb)
- Grouped Aggregate in row format #2375 (yjshen)
- Stop optimizing queries twice #2369 (andygrove)
- Bump follow-redirects from 1.13.2 to 1.14.9 in /ballista/ui/scheduler #2325 (dependabot[bot])
- Move FileType enum from sql module to logical_plan module #2290 (andygrove)
- Add BatchPartitioner (#2285) #2287 (tustvold)
- Update uuid requirement from 0.8 to 1.0 #2280 (dependabot[bot])
- Bump async from 2.6.3 to 2.6.4 in /ballista/ui/scheduler #2277 (dependabot[bot])
- Bump minimist from 1.2.5 to 1.2.6 in /ballista/ui/scheduler #2276 (dependabot[bot])
- Bump url-parse from 1.5.1 to 1.5.10 in /ballista/ui/scheduler #2275 (dependabot[bot])
- Bump nanoid from 3.1.20 to 3.3.3 in /ballista/ui/scheduler #2274 (dependabot[bot])
- Update to Arrow 12.0.0, update tonic and prost #2253 (alamb)
- Add ExecutorMetricsCollector interface #2234 (thinkharderdev)
- minor: add editor config file #2224 (jackwener)
- [Ballista] Enable ApproxPercentileWithWeight in Ballista and fill UT #2192 (Ted-Jiang)
- make nightly clippy happy #2186 (xudong963)
- [Ballista]Make PhysicalAggregateExprNode has repeated PhysicalExprNode #2184 (Ted-Jiang)
- Add LogicalPlan::SubqueryAlias #2172 (andygrove)
- Implement fast path of with_new_children() in ExecutionPlan #2168 (mingmwang)
- [MINOR] ignore suspicious slow test in Ballista #2167 (Ted-Jiang)
- enable explain for ballista #2163 (doki23)
- Add delimiter for create external table #2162 (matthewmturner)
- Update sqlparser requirement from 0.15 to 0.16 #2152 (dependabot[bot])
- Add IF NOT EXISTS to `CREATE TABLE` and `CREATE EXTERNAL TABLE` #2143 (matthewmturner)
- Update quarterly roadmap for Q2 #2133 (matthewmturner)
- [Ballista] Add ballista plugin manager and UDF plugin #2131 (gaojun2048)
- Serialize scalar UDFs in physical plan #2130 (thinkharderdev)
- doc: update release schedule #2110 (jychen7)
- Reduce repetition in Decimal binary kernels, upgrade to arrow 11.1 #2107 (alamb)
- update zlib version to 1.2.12 #2106 (waitingkuo)
- Add CREATE DATABASE command to SQL #2094 [sql] (matthewmturner)
- Refactor SessionContext, BallistaContext to support multi-tenancy configurations - Part 3 #2091 (mingmwang)
- Remove dependency of common for the storage crate #2076 (yahoNanJing)
- [MINOR] fix doc in `EXTRACT(field FROM source)` #2074 (Ted-Jiang)
- [Bug][Datafusion] fix TaskContext session_config bug #2070 (gaojun2048)
- Short-circuit evaluation for `CaseWhen` #2068 (yjshen)
- split datafusion-object-store module #2065 (yahoNanJing)
- Change log level for noisy logs #2060 (thinkharderdev)
- Update to arrow/parquet 11.0 #2048 (alamb)
- minor: format comments (`//` to `//`) #2047 (jackwener)
- use cargo-tomlfmt to check Cargo.toml formatting in CI #2033 (WinkerDu)
- Refactor SessionContext, SessionState and SessionConfig to support multi-tenancy configurations - Part 2 #2029 (mingmwang)
- Simplify prerequisites for running examples #2028 (doki23)
- Use SessionContext to parse Expr protobuf #2024 (thinkharderdev)
- Fix stuck issue for the load testing of Push-based task scheduling #2006 (yahoNanJing)
- Fixing a typo in documentation #1997 (psvri)
- Fix minor clippy issue #1995 (alamb)
- Make it possible to only scan part of a parquet file in a partition #1990 (yjshen)
- Update Dockerfile to fix integration tests #1982 (andygrove)
- Update sqlparser requirement from 0.14 to 0.15 #1966 (dependabot[bot])
- fix logical conflict with protobuf #1958 (alamb)
- Update to arrow 10.0.0, pyo3 0.16 #1957 (alamb)
- update jit-related dependencies #1953 (xudong963)
- Allow different types of query variables (`@@var`) rather than just string #1943 [sql] (maxburke)
- Pruning serialization #1941 (thinkharderdev)
- Fix select from EmptyExec always return 0 row after optimizer passes #1938 (Ted-Jiang)
- Introduce Ballista query stage scheduler #1935 (yahoNanJing)
- Add db benchmark script #1928 (matthewmturner)
- fix a typo #1919 (vchag)
- [MINOR] Update copyright year in Docs #1918 (alamb)
- add metadata to DFSchema, close #1806. #1914 [sql] (jiacai2050)
- Refactor scheduler state mod #1913 (yahoNanJing)
- Refactor the event channel #1912 (yahoNanJing)
- Refactor scheduler server #1911 (yahoNanJing)
- Clippy fix on nightly #1907 (yjshen)
- Updated Rust version to 1.59 in all the files #1903 (NaincyKumariKnoldus)
- Remove unneeded Mutex in Ballista Client #1898 (alamb)
- Create a `datafusion-proto` crate for datafusion protobuf serialization #1887 (carols10cents)
- Fix clippy lints #1885 (HaoYang670)
- Separate cpu-bound (query-execution) and IO-bound(heartbeat) to … #1883 (Ted-Jiang)
- [Minor] Clean up DecimalArray API Usage #1869 [sql] (alamb)
- Changes after went through "Datafusion as a library section" #1868 (nonontb)
- Remove allow unused imports from ballista-core, then fix all warnings #1853 (carols10cents)
- Update to arrow 9.1.0 #1851 (alamb)
- move some tests out of context and into sql #1846 (alamb)
- Fix compiling ballista in standalone mode, add build to CI #1839 (alamb)
- Update documentation example for change in API #1812 (alamb)
- Refactor scheduler state with different management policy for volatile and stable states #1810 (yahoNanJing)
- DataFusion + Conbench Integration #1791 (dianaclarke)
- Enable periodic cleanup of work_dir directories in ballista executor #1783 (Ted-Jiang)
- Use `eq_dyn`, `neq_dyn`, `lt_dyn`, `lt_eq_dyn`, `gt_dyn`, `gt_eq_dyn` kernels from arrow #1475 (alamb)
7.1.0-rc1 (2022-04-10)
Implemented enhancements:
- Support substring with three arguments: (str, from, for) for DataFrame API and Ballista #2092
- UnionAll support for Ballista #2032
- Separate cpu-bound and IO-bound work in ballista-executor by using diff tokio runtime. #1770
- [Ballista] Introduce DAGScheduler for better managing the stage-based task scheduling #1704
- [Ballista] Support to better manage cluster state, like alive executors, executor available task slots, etc #1703
Closed issues:
- Optimize memory usage pattern to avoid "double memory" behavior #2149
- Document approx_percentile_cont_with_weight in users guide #2078
- [follow up]cleaning up statements.remove(0) #1986
- Formatting error on documentation for Python #1873
- Remove duplicate tests from `test_const_evaluator_scalar_functions` #1727
- Question: Is the Ballista project providing value to the overall DataFusion project? #1273
7.0.0-rc2 (2022-02-14)
7.0.0 (2022-02-14)
Breaking changes:
- Update `ExecutionPlan` to know about sortedness and repartitioning optimizer pass respect the invariants #1776 (alamb)
- Update to `arrow 8.0.0` #1673 (alamb)
Implemented enhancements:
- Task assignment between Scheduler and Executors #1221
- Add `approx_median()` aggregate function #1729 (realno)
- [Ballista] Add Decimal128, Date64, TimestampSecond, TimestampMillisecond, Interv… #1659 (gaojun2048)
- Add `corr` aggregate function #1561 (realno)
- Add `covar`, `covar_pop` and `covar_samp` aggregate functions #1551 (realno)
- Add `approx_quantile()` aggregation function #1539 (domodwyer)
- Initial MemoryManager and DiskManager APIs for query execution + External Sort implementation #1526 (yjshen)
- Add `stddev` and `variance` #1525 (realno)
- Add `rem` operation for Expr #1467 (liukun4515)
- Implement `array_agg` aggregate function #1300 (viirya)
Fixed bugs:
- Ballista context::tests::test_standalone_mode test fails #1020
- [Ballista] Fix scheduler state mod bug #1655 (gaojun2048)
- Pass local address host so we do not get mismatch between IPv4 and IP… #1466 (thinkharderdev)
- Add Timezone to Scalar::Time* types, and better timezone awareness to Datafusion's time types #1455 (maxburke)
Documentation updates:
- Add dependencies to ballista example documentation #1346 (jgoday)
- [MINOR] Fix some typos. #1310 (Ted-Jiang)
- fix some clippy warnings from nightly channel #1277 [sql] (Jimexist)
Performance improvements:
- Introduce push-based task scheduling for Ballista #1560 (yahoNanJing)
Closed issues:
- Track memory usage in Non Limited Operators #1569
- [Question] Why does ballista store tables in the client instead of in the SchedulerServer #1473
- Why use the expr types before coercion to get the result type? #1358
- A problem about the projection_push_down optimizer gathers valid columns #1312
- apply constant folding to `LogicalPlan::Values` #1170
- reduce usage of `IntoIterator<Item = Expr>` in logical plan builder window fn #372
Merged pull requests:
- Fix verification scripts for 7.0.0 release #1830 (alamb)
- update README for ballista #1817 (liukun4515)
- Fix logical conflict #1801 (alamb)
- Improve the error message and UX of tpch benchmark program #1800 (alamb)
- Update to sqlparser 0.14 #1796 [sql] (alamb)
- Update datafusion versions #1793 (matthewmturner)
- Update datafusion to use arrow 9.0.0 #1775 (alamb)
- Update parking_lot requirement from 0.11 to 0.12 #1735 (dependabot[bot])
- substitute `parking_lot::Mutex` for `std::sync::Mutex` #1720 (xudong963)
- Create ListingTableConfig which includes file format and schema inference #1715 (matthewmturner)
- Support `create_physical_expr` and `ExecutionContextState` or `DefaultPhysicalPlanner` for faster speed #1700 (alamb)
- Use NamedTempFile rather than `String` in DiskManager #1680 (alamb)
- Abstract over logical and physical plan representations in Ballista #1677 (thinkharderdev)
- upgrade clap to version 3 #1672 (Jimexist)
- Improve configuration and resource use of `MemoryManager` and `DiskManager` #1668 (alamb)
- Make `MemoryManager` and `MemoryStream` public #1664 (yjshen)
- Consolidate Schema and RecordBatch projection #1638 (alamb)
- Update hashbrown requirement from 0.11 to 0.12 #1631 (dependabot[bot])
- Update etcd-client requirement from 0.7 to 0.8 #1626 (dependabot[bot])
- update nightly version #1597 (Jimexist)
- Add support show tables and show columns for ballista #1593 (gaojun2048)
- minor: improve the benchmark readme #1567 (xudong963)
- Consolidate `batch_size` configuration in `ExecutionConfig`, `RuntimeConfig` and `PhysicalPlanConfig` #1562 (yjshen)
- Update to rust 1.58 #1557 (xudong963)
- support mathematics operation for decimal data type #1554 (liukun4515)
- Make call SchedulerServer::new once in ballista-scheduler process #1537 (Ted-Jiang)
- Add load test command in tpch.rs. #1530 (Ted-Jiang)
- Remove one copy of ballista datatype serialization code #1524 (alamb)
- Update to arrow-7.0.0 #1523 (alamb)
- Workaround build failure: Pin quote to 1.0.10 #1499 (alamb)
- add rfcs for datafusion #1490 (xudong963)
- support comparison for decimal data type and refactor the binary coercion rule #1483 (liukun4515)
- Update arrow-rs to 6.4.0 and replace boolean comparison in datafusion with arrow compute kernel #1446 (xudong963)
- support cast/try_cast for decimal: signed numeric to decimal #1442 (liukun4515)
- use 0.13 sql parser #1435 (Jimexist)
- Clarify communication on bi-weekly sync #1427 (alamb)
- Minimize features #1399 (carols10cents)
- Update rust version to 1.57 #1395 [sql] (xudong963)
- Add coercion rules for AggregateFunctions #1387 (liukun4515)
- upgrade the arrow-rs version #1385 (liukun4515)
- Extract logical plan: rename the plan name (follow up) #1354 [sql] (liukun4515)
- upgrade arrow-rs to 6.2.0 #1334 (liukun4515)
- Update release instructions #1331 (alamb)
- Extract Aggregate, Sort, and Join to struct from AggregatePlan #1326 (matthewmturner)
- Extract `EmptyRelation`, `Limit`, `Values` from `LogicalPlan` #1325 (liukun4515)
- Extract CrossJoin, Repartition, Union in LogicalPlan #1322 (liukun4515)
- Extract Explain, Analyze, Extension in LogicalPlan as independent struct #1317 [sql] (xudong963)
- Extract CreateMemoryTable, DropTable, CreateExternalTable in LogicalPlan as independent struct #1311 [sql] (liukun4515)
- Extract Projection, Filter, Window in LogicalPlan as independent struct #1309 (ic4y)
- Add PSQL comparison tests for except, intersect #1292 (mrob95)
- Extract logical plans in LogicalPlan as independent struct: TableScan #1290 (xudong963)
6.0.0-rc0 (2021-11-14)
6.0.0 (2021-11-14)
ballista-0.6.0 (2021-11-13)
Breaking changes:
- File partitioning for ListingTable #1141 (rdettai)
- Register tables in BallistaContext using TableProviders instead of Dataframe #1028 (rdettai)
- Make TableProvider.scan() and PhysicalPlanner::create_physical_plan() async #1013 (rdettai)
- Reorganize table providers by table format #1010 (rdettai)
- Move CBOs and Statistics to physical plan #965 (rdettai)
- Update to sqlparser v 0.10.0 #934 [sql] (alamb)
- FilePartition and PartitionedFile for scanning flexibility #932 [sql] (yjshen)
- Improve SQLMetric APIs, port existing metrics #908 (alamb)
- Add support for EXPLAIN ANALYZE #858 [sql] (alamb)
- Rename concurrency to target_partitions #706 (andygrove)
Implemented enhancements:
- Update datafusion-cli to support Ballista, or implement new ballista-cli #886
- Prepare Ballista crates for publishing #509
- Add drop table support #1266 [sql] (viirya)
- use arrow 6.1.0 #1255 (Jimexist)
- Add support for `create table as` via MemTable #1243 [sql] (Dandandan)
- add values list expression #1165 [sql] (Jimexist)
- Multiple files per partitions for CSV Avro Json #1138 (rdettai)
- Implement INTERSECT & INTERSECT DISTINCT #1135 [sql] (xudong963)
- Simplify file struct abstractions #1120 (rdettai)
- Implement `is [not] distinct from` #1117 [sql] (Dandandan)
- add digest(utf8, method) function and refactor all current hash digest functions #1090 (Jimexist)
- [crypto] add `blake3` algorithm to `digest` function #1086 (Jimexist)
- [crypto] add blake2b and blake2s functions #1081 (Jimexist)
- Update sqlparser-rs to 0.11 #1052 [sql] (alamb)
- remove hard coded partition count in ballista logicalplan deserialization #1044 (xudong963)
- Indexed field access for List #1006 [sql] (Igosuki)
- Update DataFusion to arrow 6.0 #984 (alamb)
- Implement Display for Expr, improve operator display #971 [sql] (matthewmturner)
- ObjectStore API to read from remote storage systems #950 (yjshen)
- fixes #933 replace placeholder fmt_as for ExecutionPlan impls #939 (tiphaineruy)
- Support `NotLike` in Ballista #916 (Dandandan)
- Avro Table Provider #910 [sql] (Igosuki)
- Add BaselineMetrics, Timestamp metrics, add for `CoalescePartitionsExec`, rename output_time -> elapsed_compute #909 (alamb)
- [Ballista] Add executor last seen info to the ui #895 (msathis)
- add cross join support to ballista #891 (houqp)
- Add Ballista support to DataFusion CLI #889 (andygrove)
- Add support for PostgreSQL regex match #870 [sql] (b41sh)
Fixed bugs:
- Test execution_plans::shuffle_writer::tests::test Fail #1040
- Integration test fails to build docker images #918
- Ballista: Remove hard-coded concurrency from logical plan serde code #708
- How can I make ballista distributed compute work? #327
- fix subquery alias #1067 [sql] (xudong963)
- Fix compilation for ballista in stand-alone mode #1008 (Igosuki)
Documentation updates:
- Add Ballista roadmap #1166 (andygrove)
- Adds note on compatible rust version #1097 (1nF0rmed)
- implement `approx_distinct` function using HyperLogLog #1087 (Jimexist)
- Improve User Guide #954 (andygrove)
- Update plan_query_stages doc #951 (rdettai)
- [DataFusion] - Add show and show_limit function for DataFrame #923 (francis-du)
- update docs related to protoc and optional syntax #902 (Jimexist)
- Improve Ballista crate README content #878 (andygrove)
Performance improvements:
Closed issues:
- InList expr with NULL literals do not work #1190
- update the homepage README to include values, `approx_distinct`, etc. #1171
- [Python]: Inconsistencies with Python package name #1011
- Wanting to contribute to project where to start? #983
- delete redundant code #973
- How to build DataFusion python wheel #853
- Produce a design for a metrics framework #21
Merged pull requests:
For older versions, see apache/arrow/CHANGELOG.md
ballista-0.5.0 (2021-08-10)
Breaking changes:
- [ballista] support date_part and date_trunc ser/de, pass tpch 7 #840 (houqp)
- Box ScalarValue:Lists, reduce size by half size #788 (alamb)
- Support DataFrame.collect for Ballista DataFrames #785 (andygrove)
- JOIN conditions are order dependent #778 (seddonm1)
- UnresolvedShuffleExec should represent a single shuffle #727 (andygrove)
- Ballista: Make shuffle partitions configurable in benchmarks #702 (andygrove)
- Rename MergeExec to CoalescePartitionsExec #635 (andygrove)
- Ballista: Rename QueryStageExec to ShuffleWriterExec #633 (andygrove)
- fix 593, reduce cloning by taking ownership in logical planner's `from` fn #610 (Jimexist)
- fix join column handling logic for `On` and `Using` constraints #605 (houqp)
- Move ballista standalone mode to client #589 (edrevo)
- Ballista: Implement map-side shuffle #543 (andygrove)
- ShuffleReaderExec now supports multiple locations per partition #541 (andygrove)
- Make external hostname in executor optional #232 (edrevo)
- Remove namespace from executors #75 (edrevo)
- Support qualified columns in queries #55 (houqp)
- Read CSV format text from stdin or memory #54 (heymind)
- Remove Ballista DataFrame #48 (andygrove)
- Use atomics for SQLMetric implementation, remove unused name field #25 (returnString)
Implemented enhancements:
- Add crate documentation for Ballista crates #830
- Support DataFrame.collect for Ballista DataFrames #787
- Ballista: Prep for supporting shuffle correctly, part one #736
- Ballista: Implement physical plan serde for ShuffleWriterExec #710
- Ballista: Finish implementing shuffle mechanism #707
- Rename QueryStageExec to ShuffleWriterExec #542
- Ballista ShuffleReaderExec should be able to read from multiple locations per partition #540
- [Ballista] Use deployments in k8s user guide #473
- Ballista refactor QueryStageExec in preparation for map-side shuffle #458
- Ballista: Implement map-side of shuffle #456
- Refactor Ballista to separate Flight logic from execution logic #449
- Use published versions of arrow rather than github shas #393
- BallistaContext::collect() logging is too noisy #352
- Update Ballista to use new physical plan formatter utility #343
- Add Ballista Getting Started documentation #329
- Remove references to ballistacompute Docker Hub repo #325
- Implement scalable distributed joins #63
- Remove hard-coded Ballista version from scripts #32
- Implement streaming versions of Dataframe.collect methods #789 (andygrove)
- Ballista shuffle is finally working as intended, providing scalable distributed joins #750 (andygrove)
- Update to use arrow 5.0 #721 (alamb)
- Implement serde for ShuffleWriterExec #712 (andygrove)
- dedup using join column in wildcard expansion #678 (houqp)
- Implement metrics for shuffle read and write #676 (andygrove)
- Remove hard-coded PartitionMode from Ballista serde #637 (andygrove)
- Ballista: Implement scalable distributed joins #634 (andygrove)
- Add Keda autoscaling for ballista in k8s #586 (edrevo)
- Add some resiliency to lost executors #568 (edrevo)
- Add `partition by` constructs in window functions and modify logical planning #501 (Jimexist)
- Support anti join #482 (Dandandan)
- add `order by` construct in window function and logical plans #463 (Jimexist)
- Refactor Ballista executor so that FlightService delegates to an Executor struct #450 (andygrove)
- implement lead and lag built-in window function #429 (Jimexist)
- Implement fmt_as for ShuffleReaderExec #400 (andygrove)
- Add window expression part 1 - logical and physical planning, structure, to/from proto, and explain, for empty over clause only #334 (Jimexist)
- [breaking change] fix 265, log should be log10, and add ln #271 (Jimexist)
- Allow table providers to indicate their type for catalog metadata #205 (returnString)
- Add query 19 to TPC-H regression tests #59 (Dandandan)
- Use arrow eq kernels in CaseWhen expression evaluation #52 (Dandandan)
- Add option param for standalone mode #42 (djKooks)
- [DataFusion] Optimize hash join inner workings, null handling fix #24 (Dandandan)
- [Ballista] Docker files for ui #22 (msathis)
Fixed bugs:
- Ballista: TPC-H q3 @ SF=1000 never completes #835
- Ballista does not support MIN/MAX aggregate functions #832
- Ballista docker images fail to build #828
- Ballista: UnresolvedShuffleExec should only have a single stage_id #726
- Ballista integration tests are failing #623
- Integration test build failure due to arrow-rs using unstable feature #596
- `cargo build` cannot build the project #531
- ShuffleReaderExec does not get formatted correctly in displayable physical plan #399
- Implement serde for MIN and MAX #833 (andygrove)
- Ballista: Prep for fixing shuffle mechanism, part 1 #738 (andygrove)
- Ballista: Shuffle write bug fix #714 (andygrove)
- honor table name for csv/parquet scan in ballista plan serde #629 (houqp)
- MINOR: Fix integration tests by adding datafusion-cli module to docker image #322 (andygrove)
Documentation updates:
- Add minimal crate documentation for Ballista crates #831 (andygrove)
- Add Ballista examples #775 (andygrove)
- Update ballista.proto link in architecture doc #502 (terrycorley)
- Update k8s user guide to use deployments #474 (edrevo)
- use prettier to format md files #367 (Jimexist)
- Make it easier for developers to find Ballista documentation #330 (andygrove)
- Instructions for cross-compiling Ballista to the Raspberry Pi #263 (andygrove)
- Add install guide in README #236 (djKooks)
Performance improvements:
- Ballista: Avoid sleeping between polling for tasks #698 (Dandandan)
- Make BallistaContext::collect streaming #535 (edrevo)
Closed issues:
- Confirm git tagging strategy for releases #770
- arrow::util::pretty::pretty_format_batches missing #769
- move the `assert_batches_eq!` macros to a non part of datafusion #745
- fix an issue where aliases are not respected in generating downstream schemas in window expr #592
- make the planner to print more succinct and useful information in window function explain clause #526
- move window frame module to be in `logical_plan` #517
- use a more rust idiomatic way of handling nth_value #448
- Make Ballista not depend on arrow directly #446
- create a test with more than one partition for window functions #435
- Implement hash-partitioned hash aggregate #27
- Consider using GitHub pages for DataFusion/Ballista documentation #18
- Add Ballista to default cargo workspace #17
- Update "repository" in Cargo.toml #16
- Consolidate TPC-H benchmarks #6
- [Ballista] Fix integration test script #4
- Ballista should not have separate DataFrame implementation #2
Merged pull requests:
- Change datatype of tpch keys from Int32 to UInt64 to support sf=1000 #836 (andygrove)
- Add ballista-examples to docker build #829 (andygrove)
- Update dependencies: prost to 0.8 and tonic to 0.5 #818 (alamb)
- Move `hash_array` into hash_utils.rs #807 (alamb)
- Fix: Update clippy lints for Rust 1.54 #794 (alamb)
- MINOR: Remove unused Ballista query execution code path #732 (andygrove)
- [fix] benchmark run with compose #666 (rdettai)
- bring back dev scripts for ballista #648 (Jimexist)
- Remove unnecessary mutex #639 (edrevo)
- round trip TPCH queries in tests #630 (houqp)
- Fix build #627 (andygrove)
- in ballista also check for UI prettier changes #578 (Jimexist)
- turn on clippy rule for needless borrow #545 (Jimexist)
- reuse datafusion physical planner in ballista building from protobuf #532 (Jimexist)
- update cargo.toml in python crate and fix unit test due to hash joins #483 (Jimexist)
- make `VOLUME` declaration in tpch datagen docker absolute #466 (crepererum)
- Refactor QueryStageExec in preparation for implementing map-side shuffle #459 (andygrove)
- Simplified usage of `use arrow` in ballista. #447 (jorgecarleitao)
- Benchmark subcommand to distinguish between DataFusion and Ballista #402 (jgoday)
- #352: BallistaContext::collect() logging is too noisy #394 (jgoday)
- cleanup function return type fn #350 (Jimexist)
- Update Ballista to use new physical plan formatter utility #344 (andygrove)
- Update arrow dependencies again #341 (alamb)
- Remove references to Ballista Docker images published to ballistacompute Docker Hub repo #326 (andygrove)
- Update arrow-rs deps #317 (alamb)
- Update arrow deps #269 (alamb)
- Enable redundant_field_names clippy lint #261 (Dandandan)
- Update arrow-rs deps (to fix build due to flatbuffers update) #224 (alamb)
- update arrow-rs deps to latest master #216 (alamb)
* This Changelog was automatically generated by github_changelog_generator