Change Log

Unreleased

Changed

  • #383: Dropped support for Python 3.8 and added CI build for Python 3.13.

2.1.0 - 2023-11-26

Changed

  • #348, #367: Bumped default Spark to 3.5.0 and default Hadoop to 3.3.6; dropped support for Python 3.6 and 3.7; added CI builds for Python 3.10, 3.11, and 3.12.
  • #361: Migrated from AdoptOpenJDK, which is deprecated, to Adoptium OpenJDK.
  • #362, #366: Improved Flintrock's ability to clean up after launch failures.
  • #366: Deprecated --ec2-spot-request-duration, which is not needed for one-time spot instances launched using the RunInstances API.
  • #369: Adopted pyproject.toml and tweaked Flintrock's Python packaging accordingly. This keeps Flintrock in line with modern Python packaging standards and should be transparent to end-users.

2.0.0 - 2021-06-10

Added

  • #296: Added support for launching clusters into private VPCs. This includes new infrastructure added in #302 to support testing against private VPCs.
  • #307: Added support for Hadoop/HDFS 3.x.
  • #315: Added a new --ec2-spot-request-duration option to support setting the EC2 spot request duration.
  • #316: Added a new --java-version option and support for Java 11.
  • #323: Flintrock now automatically selects the correct build of Spark to use, based on the version of Hadoop/HDFS that you specify.
  • #324: Flintrock now supports S3 URLs as a download source for Hadoop or Spark. This makes it easy to host your own copies of the Hadoop and Spark release builds in a private bucket.

Changed

  • #285: Flintrock now configures cluster nodes to use private IP addresses for internal communication. This should improve the reliability of cluster launches and restarts.
  • #304: Fixed a bug in how UserData scripts are submitted to new cluster slaves.
  • #311: Changed how Flintrock manages its own security groups to reduce the likelihood of hitting any limits on the number of rules per security group.
  • #326: Switched some internals from using host names to IP addresses, which should improve Flintrock's behavior when running from an EC2 host.
  • #329: Dropped support for Python 3.5 and added automated testing for Python 3.8 and 3.9.
  • #334: Flintrock now ensures that python3 is available on launched clusters and sets that as the default Python that PySpark will use.

1.0.0 - 2020-01-11

Changed

  • #297: Dropped support for Python 3.4.
  • #252: Flintrock now pins all its transitive dependencies via the files under requirements/. This is useful for users who want to build Flintrock themselves.

0.11.0 - 2018-12-02

Changed

  • #258, #268: Fixed up support for Python 3.7.
  • #264: Fixed a logging error in flintrock describe --master-hostname-only.
  • #277: Fixed a bug in resolving client IP addresses from behind a proxy.

0.10.0 - 2018-07-15

Added

  • #242: Flintrock is now available on Homebrew:
    brew install flintrock
    
    This is a community-supported distribution.

Changed

  • #224: Fixed a problem with some Flintrock config combinations related to Hadoop.
  • #232: When you destroy a cluster, Flintrock now waits until the instances are completely terminated before returning.
  • #234: Flintrock now tries more times by default to connect via SSH, which should provide more launch stability in certain environments.
  • #246: Fixed some bugs with flintrock describe that are exposed when a cluster is transitioning states (e.g. from running to terminated).
  • #249: Flintrock now downloads both Spark and Hadoop from Apache mirrors by default. This is a significant change. You can read the background on what prompted this change in #238.
  • #254: Flintrock no longer configures hadoop-aws automatically due to version incompatibilities that are difficult to resolve automatically. Instead, the README now provides additional guidance on using s3a://.
  • #259: Flintrock now correctly ignores tiny devices that show up on some instance types, like the M5 series on EC2. This fixes the problems Flintrock had getting HDFS to work on those instance types.

0.9.0 - 2017-08-06

Added

  • #178: You can now see additional output during launch and other operations with the new --debug option.
  • #185: Added a new mount point under /media/tmp that can be used when /tmp is not big enough.
  • #186: You can now tag your clusters with arbitrary tags on launch using the new --ec2-tag option. (Remember: As with all options, you can also set this via flintrock configure.)
  • #191: You can now specify the size of the root EBS volume with the new --ec2-min-root-ebs-size-gb option.
  • #181: You can now set the number of executors per worker with --spark-executor-instances.

Changed

  • #195: After launching a new cluster, Flintrock now shows the master address and login command.
  • #196, #197: Fixed some bugs that were preventing Flintrock from launching Spark clusters at a specific commit.
  • #204: Flintrock now automatically retries starting the Spark and HDFS masters if it encounters common issues with bringing the cluster up. This greatly improves launch and restart reliability.
  • #208: Flintrock now provides a hint with possible causes for certain SSH errors.

0.8.0 - 2017-02-11

Added

  • #180: Accessing data on S3 from your Flintrock cluster is now much easier! Just configure Flintrock to use Hadoop 2.7+ (which is the default) and an appropriate IAM role, and you'll be able to access paths on S3 using the new s3a:// prefix. Check the README for more information.
  • #176, #187: Flintrock now supports users with non-standard home directories.

Changed

  • #168: Flintrock now does a better job of cleaning up after interrupted operations.
  • #179, #184: Flintrock can now clean up malformed Flintrock clusters.
  • 6b426ae: We fixed an issue affecting some users of Flintrock's standalone package that caused Flintrock to intermittently throw ImportErrors.

0.7.0 - 2016-11-15

Added

  • #146: Flintrock now ensures that launched clusters have Java 8 or higher installed.
  • #149: You can now specify an EC2 user data script to use on launch with the new --ec2-user-data option.

Changed

  • #154, #155, #156: Flintrock now provides friendly error messages when it encounters common configuration or setup problems.

0.6.0 - 2016-08-28

Added

  • #115: Flintrock can now resize existing clusters with the new add-slaves and remove-slaves commands.

Changed

  • #115: Flintrock can now destroy a cluster even if its master has been lost.
  • #115: You can no longer launch clusters with 0 slaves. The implementation was broken. We may fix and add this capability back in the future.

0.5.0 - 2016-07-20

Added

  • #118: You can now specify --hdfs-download-source (or the equivalent in your config file) to tell Flintrock to download Hadoop from a specific URL when launching your cluster.
  • #125: You can now specify --spark-download-source (or the equivalent in your config file) to tell Flintrock to download Spark from a specific URL when launching your cluster.
  • #112: You can now specify --ec2-security-group to associate additional security groups with your cluster on launch.

Changed

  • #103, #114: Flintrock now opens ports 6066 and 7077 so local clients like Apache Zeppelin can connect directly to the Spark master on the cluster.
  • #122: Flintrock now automatically adds executables like spark-submit, pyspark, and hdfs to the default PATH, so they're available to call as soon as you log in to the cluster.

0.4.0 - 2016-03-27

Added

  • #98, #99: You can now specify latest for --spark-git-commit and Flintrock will automatically build Spark on your cluster at the latest commit. This feature is only available for Spark repos hosted on GitHub.
  • #94: Flintrock now supports launching clusters into non-default VPCs.

Changed

  • #86: Flintrock now correctly catches when spot requests fail and bubbles up an appropriate error message.
  • #93, #97: Fixed the ability to build Spark from git. (It was broken for recent commits.)
  • #96, #100: Flintrock launches should now work correctly whether the default Python on the cluster is Python 2.7 or Python 3.4+.

0.3.0 - 2016-02-14

Changed

  • eca59fc, 3cf6ee6: Tweaked a few things so that Flintrock can launch 200+ node clusters without hitting certain limits.

0.2.0 - 2016-02-07

Added

  • b00fd12: Added --assume-yes option to the launch command. Use --assume-yes to tell Flintrock to automatically destroy the cluster if there are problems during launch.

Changed

  • #69: Flintrock now automatically retries Hadoop downloads from flaky Apache mirrors.
  • 0df7004: Flintrock now deletes the unneeded security group after a cluster is destroyed.
  • 244f734: HDFS is no longer installed by default. Going forward, Spark will be the only service that Flintrock installs by default. Defaults can easily be changed via Flintrock's config file.
  • de33412: Flintrock installs services, not modules. The terminology has been updated accordingly throughout the code and docs. Update your config file to use services instead of modules. Warning: Flintrock will have problems managing existing clusters that were launched with versions of Flintrock from before this change.
  • #73: Major refactoring of Flintrock internals.
  • #74: Flintrock now catches common configuration problems upfront and provides simple error messages, instead of barfing out errors from EC2 or launching broken clusters.
  • bf766ba: Fixed a bug in how Flintrock polls SSH availability from Linux. Cluster launches now work from Linux as intended.

0.1.0 - 2015-12-11

  • Initial release.