Greenplum Spark Connector: a new product for Arenadata DB data exchange

ADB-Spark Connector is designed for fast, parallel data transfer between Spark and Arenadata DB. Previously, data export and import tasks were handled, to some extent, with the Greenplum Platform Extension Framework (PXF).

The connector is built with Scala 2.11.x and 2.12.x, Twitter Finagle, and ScalikeJDBC, and exchanges data through an HTTP server that speaks the gpfdist protocol. Unlike other existing ADB exchange methods, it writes to Greenplum segments in parallel without involving the Master, supports flexible partitioning when reading data from Greenplum into Spark, and does not require installing the gpfdist utility on each Spark node, among other advantages.

The gpfdist protocol is served with the Finagle framework, which outperformed the initially selected Akka HTTP when handling multiple simultaneous sessions from ADB segments.

The main functions of ADB-Spark Connector include:

Reading data from Greenplum into Spark, with several partitioning methods supported;
Writing data from Spark to Greenplum in one of several write modes: Append, Overwrite, or ErrorIfExists;
Support for operator push-down;
Extracting additional metadata from Greenplum, including statistics and data distribution schemes;
Automatic data schema generation;
Optimized execution of the count aggregate function.
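
The three write modes listed above share their names with Spark's standard save modes. The sketch below illustrates those semantics on a plain in-memory dictionary. It is only an illustration under that assumption: the names SaveMode and write_rows are hypothetical, and nothing here is the connector's actual API or implementation.

```python
from enum import Enum


class SaveMode(Enum):
    # Hypothetical enum mirroring the write modes named in the text.
    APPEND = "append"
    OVERWRITE = "overwrite"
    ERROR_IF_EXISTS = "errorifexists"


def write_rows(tables, name, rows, mode):
    """Apply `rows` to the in-memory table `name` according to `mode`."""
    exists = name in tables
    if mode is SaveMode.ERROR_IF_EXISTS and exists:
        # Refuse to touch a table that is already there.
        raise ValueError(f"table {name!r} already exists")
    if mode is SaveMode.OVERWRITE or not exists:
        # Replace the existing contents, or create the table.
        tables[name] = list(rows)
    else:
        # APPEND to an existing table.
        tables[name].extend(rows)
    return tables[name]


tables = {}
write_rows(tables, "events", [1, 2], SaveMode.ERROR_IF_EXISTS)  # creates the table
write_rows(tables, "events", [3], SaveMode.APPEND)              # adds a row
write_rows(tables, "events", [9], SaveMode.OVERWRITE)           # replaces all rows
```

In this sketch, a second ErrorIfExists write to "events" would raise instead of modifying the table, which is the behavior that makes the mode useful as a guard against accidental overwrites.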

Dmitry Pluzhnikov

Director of System Architecture Department at Arenadata

“Our solution will be useful to those customers who combine Arenadata Hadoop and Arenadata DB when building their corporate data warehouses. ADB-Spark Connector enables fast bidirectional communication between them and therefore the most effective data reading and writing.”

Compared to the Pivotal Spark-Greenplum connector, its closest commercially available rival, ADB-Spark Connector provides more flexible partitioning (five methods instead of two), supports more data types (including interval and array), and adds extra functionality, such as support for Batch mode in Spark, statistics collection for building query plans with Catalyst, and arbitrary SQL query execution through an ADB Master node.
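
To give a feel for what partitioning a parallel read involves, here is a minimal sketch of one generic technique: splitting an integer column's value range into contiguous intervals, each of which could be fetched by a separate Spark task. It is not claimed to be one of the connector's five methods; the function name range_predicates and the column name are hypothetical.

```python
def range_predicates(column, lower, upper, num_partitions):
    """Split [lower, upper) into contiguous integer ranges, one WHERE clause each."""
    if num_partitions < 1 or upper <= lower:
        raise ValueError("need num_partitions >= 1 and upper > lower")
    # Never create more partitions than there are distinct integer values.
    n = min(num_partitions, upper - lower)
    step, extra = divmod(upper - lower, n)
    predicates, start = [], lower
    for i in range(n):
        # The first `extra` partitions get one additional value each.
        end = start + step + (1 if i < extra else 0)
        predicates.append(f"{column} >= {start} AND {column} < {end}")
        start = end
    return predicates


for predicate in range_predicates("id", 0, 10, 3):
    print(predicate)
# id >= 0 AND id < 4
# id >= 4 AND id < 7
# id >= 7 AND id < 10
```

Because the intervals are disjoint and together cover the whole range, each row is read exactly once no matter how many tasks run in parallel.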

ADB-Spark Connector currently supports Spark 2.3.x and 2.4.x. Near-term plans include adding support for Spark 3.x and implementing streaming functionality.
