3.0 Release Notes

Koverse version 3.0 updates underlying components including Spark, Hadoop, and Kafka. This allows customers to take advantage of the latest improvements in those components, including the much larger set of built-in machine learning algorithms in Spark 2.

Speaking of machine learning, developers can now output Spark DenseVectors and SparseVectors directly to a Koverse data set. This makes it easy to store features for machine learning algorithms as well as machine learning models that consist of vectors.
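
For example, here is a minimal Scala sketch of writing vector-valued fields, assuming the Koverse SDK's SimpleRecord type with a put(field, value) method (check the SDK documentation for the exact record class in your version):

    import org.apache.spark.ml.linalg.{Vector, Vectors}
    import com.koverse.sdk.data.SimpleRecord  // assumed record class from the Koverse SDK

    // A dense vector stores every element explicitly
    val dense: Vector = Vectors.dense(0.5, 1.25, -3.0)

    // A sparse vector of size 10 with non-zero values at indices 1 and 7
    val sparse: Vector = Vectors.sparse(10, Seq((1, 2.0), (7, 0.5)))

    // Store both vectors as fields of a single record destined for a Koverse data set
    val record = new SimpleRecord()
    record.put("featuresDense", dense)
    record.put("featuresSparse", sparse)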

Koverse 3.0 also features an all-new Transform SDK that makes it easy to create reusable analytics by writing Koverse Transforms that take advantage of the new APIs in Spark 2, including the Dataset API. See the new Transform API documentation for details.
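
As an illustration, here is a rough Scala sketch of a Dataset-based analytic. The KoverseSparkTransform trait, its execute signature, and the Measurement type are hypothetical placeholders rather than the SDK's actual names; refer to the Transform API documentation for the real interfaces:

    import org.apache.spark.sql.{Dataset, SparkSession}

    // Hypothetical input type used for illustration only
    case class Measurement(sensorId: String, value: Double)

    // Hypothetical stand-in for whatever interface the new Transform SDK defines
    trait KoverseSparkTransform {
      def execute(spark: SparkSession, input: Dataset[Measurement]): Dataset[(String, Double)]
    }

    // Example analytic written against the Spark 2 Dataset API: average value per sensor
    class AverageBySensor extends KoverseSparkTransform {
      override def execute(spark: SparkSession, input: Dataset[Measurement]): Dataset[(String, Double)] = {
        import spark.implicits._
        input
          .groupByKey(_.sensorId)            // group records by sensor ID
          .mapGroups { (id, rows) =>
            val values = rows.map(_.value).toSeq
            (id, values.sum / values.size)   // mean value for this sensor
          }
      }
    }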

Developers who are building web applications should note that all of our REST methods have been renamed to be more consistent and now support versioning.

New Features

  • [KC-5699] - Upgrade components to match HDP 3.1
  • [KC-5685] - Create a new Transform SDK including DataSets, DataFrames, and Java and Scala RDDs
  • [KC-5592] - Store Spark vectors in Koverse records directly

Improvements

  • [KC-4877] - Java 8 now required
  • [KC-5591] - Provide a build that works with Spark 2.3
  • [KC-5682] - Upgrade to Hadoop 3.1.1
  • [KC-5691] - Upgrade to ZooKeeper 3.4.6
  • [KC-5700] - Upgrade to Hive 3.1.0
  • [KC-5701] - Upgrade to Kafka 2.0.0
  • [KC-5704] - Make REST methods consistent

Bug Fixes

3.0.1

  • [KC-5734] - koverse-spark-datasource for 3.0 is missing htrace dependency
  • [KC-5761] - Add Spark jars to classpath of all jobs

3.0.2

  • [KC-5821] - Classpath Conflict
  • [KC-5788] - Issues with Schema jobs
  • [KC-5786] - Error when trying to update job status
  • [KC-5770] - Old Transformers issues
  • [KC-5769] - Input Transform Widget sometimes didn’t display correctly
  • [KC-5767] - Issues with DataFlowThriftServer on startup

3.0.3

  • [KC-5842] - Restore ability to reference input datasets by ID in old SDK

3.0.4

  • [KC-5867] - Fix issue with linking multiple datasets in DataFlow UI
  • [KC-5883] - Fix issues with Python transforms
  • [KC-5816] - Fix issues with DataSet transforms for Scala users
  • [KC-5850] - Field options now populating correctly in the transform forms
  • [KC-5735] - Job scheduler now respects the schedule after a server restart
  • [KC-5711] - FileBasedSink implementations no longer get duplicate filenames
  • [KC-5590] - Improve data set listing time
  • [KC-5549] - Stop logging an empty message when an exception in the PySpark driver has no message associated with it
  • [KC-5824] - Fix Export issue in UI