dask_cassandra_loader.loader module¶

class dask_cassandra_loader.loader.Connector(cassandra_clusters, cassandra_keyspace, username, password)¶

Bases: object

It sets and manages a connection to a Cassandra Cluster.

shutdown()¶

Shutdowns the existing connection with a Cassandra cluster.

> shutdown()

exception dask_cassandra_loader.loader.DaskCassandraLoaderException¶

Bases: Exception

Raise when the DaskCassandraLoader fails.

class dask_cassandra_loader.loader.Loader¶

Bases: object

A loader to populate a Dask Dataframe with data from a Cassandra table.

connect_to_cassandra(cassandra_clusters, cassandra_keyspace, username, password)¶

Connects to a Cassandra cluster specified by a list of IPs.

> connect_to_cassandra(‘test’, [‘10.0.1.1’, ‘10.0.1.2’])

Parameters:	cassandra_keyspace – It is a string which contains an existent Cassandra keyspace. cassandra_clusters – It is a list of IPs with each IP represented as a string. username – It is a string. password – It is a string.

connect_to_dask(dask_cluster)¶

Connect to a Dask Cluster

> connect_to_Dask(‘127.0.0.1:8786’) or > connect_to_Dask(cluser)

Parameters:	dask_cluster – String with format url:port or an instance of Cluster

connect_to_local_dask()¶

Connects to a local Dask cluster.

> connect_to_local_dask()

disconnect_from_cassandra()¶

Ends the established Cassandra connection.

> disconnect_from_cassandra()

disconnect_from_dask()¶

Ends the established Dask connection.

> disconnect_from_dask()

load_cassandra_table(table_name, projections, and_predicates, partitions_to_load, force=False)¶

It loads a Cassandra table into a Dask dataframe.

> load_cassandra_table(‘tab1’,: [‘id’, ‘year’, ‘month’, ‘day’], [(‘month’, ‘less_than’, [1]), (‘day’, ‘in_’, [1,2,3,8,12,30])], [(‘id’, [1, 2, 3, 4, 5, 6]), (‘year’,[2019])])

Parameters:

table_name – It is a String.
projections – A list of columns names. Each column name is a String.
and_predicates – List of triples. Each triple contains column name as String, operator name as String, and a list of values depending on the operator. CassandraOperators.print_operators() prints all available operators. It should only contain columns which are not partition columns.
partitions_to_load – List of tuples. Each tuple as a column name as String. and a list of keys which should be selected. It should only contain columns which are partition columns.
force – It is a boolean. In case all the partitions need to be loaded, which is not recommended, it should be set to ‘True’. By Default it is set to ‘False’.

class dask_cassandra_loader.loader.LoadingQuery¶

Bases: object

Class to define a SQL select statement over a Cassandra table.

build_query(table)¶

It builds and compiles the query which will be used to load data from a Cassandra table into a Dask Dataframe.

> build_query(table)

Parameters:	table – Instance of CassandraTable.

drop_projections()¶

It drops the list of columns to be projected, i.e., selected.

> drop_projections()

static partition_elimination(table, partitions_to_load, force)¶

It does partition elimination when by selecting only a range of partition key values.

> partition_elimination( table, [(id, [1, 2, 3, 4, 5, 6]), (‘year’,[2019])] )

Parameters:	table – Instance of a CassandraTable partitions_to_eliminate – List of tuples. Each tuple as a column name as String and a list of keys which should be selected. It should only contain columns which are partition columns. force – It is a boolean. In case all the partitions need to be loaded, which is not recommended, it should be set to ‘True’.

print_query()¶

It prints the query which will be used to load data from a Cassandra table into a Dask Dataframe.

> print_query()

remove_and_predicates()¶

It drops the list of predicates with ‘and’ clause over the non partition columns of a Cassandra’s table.

> remove_and_predicates()

set_and_predicates(table, predicates)¶

It sets a list of predicates with ‘and’ clause over the non partition columns of a Cassandra’s table.

> set_and_predicates(table, [(‘month’, ‘less_than’, 1), (‘day’, ‘in_’, [1,2,3,8,12,30])])

Parameters:	table – Instance of class CassandraTable. predicates – List of triples. Each triple contains column name as String, operator name as String, and a list of values depending on the operator. CassandraOperators.print_operators() prints all available operators. It should only contain columns which are not partition columns.

set_projections(table, projections)¶

It set the list of columns to be projected, i.e., selected.

> set_projections(table, [‘id’, ‘year’, ‘month’, ‘day’])

Parameters:	table – Instance of class CassandraTable projections – A list of columns names. Each column name is a String.

class dask_cassandra_loader.loader.Operators¶

Bases: object

Operators for a valida SQL select statement over a Cassandra Table.

static create_predicate(table, col_name, op_name, values)¶

It creates a single predicate over a table’s column using an operator. Call CassandraOperators.print_operators(): to print all available operators.

> create_predicate(table, ‘month’, ‘les_than’, 1)

Parameters:	table – Instance of CassandraTable. col_name – Table’s column name as string. op_name – Operators name as string. values – List of values. The number of values depends on the operator.

print_operators()¶

Print all the operators that can be used in a SQL select statement over a Cassandra’s table.

> print_operators()

class dask_cassandra_loader.loader.PagedResultHandler(future)¶

Bases: object

An handler for paged loading of a Cassandra’s query result.

handle_error(exc)¶: It handles and exception. > handle_error(exc) :param exc: It is a Python Exception. :return:

handle_page(rows)¶: It pages the result of a Cassandra query. > handle_page(rows) :param rows: Cassandra’s query result. :return:

class dask_cassandra_loader.loader.Table(keyspace, name)¶

Bases: object

It stores and manages metadata and data from a Cassandra table loaded into a Dask DataFrame.

load_data(cassandra_connection, ca_loading_query)¶

It defines a set of SQL queries to load partitions of a Cassandra table in parallel into a Dask DataFrame.

> load_data( cassandra_con, ca_loading_query)

Parameters:	cassandra_connection – Instance of CassandraConnector. ca_loading_query – Instance of CassandraLoadingQuery.

load_metadata(cassandra_connection)¶

It loads metadata from a Cassandra Table. It loads the columns names, partition columns, and partition columns keys.

> load_metadata( cassandra_con)

Parameters:	cassandra_connection – It is an instance from a CassandraConnector

print_metadata()¶

It prints the metadata of a CassandraTable.

> print_metadata()