Python Package
Installation
pip install lingodb
Basic API Usage
import lingodb
# create connection
con = lingodb.create_in_memory()
# prints result as Apache Arrow table
print(con.sql("select 42"))
# pyarrow.Table
#: int32
#----
#: [[42]]
# create table and insert rows
con.sql_stmt("create table t (x bigint, y varchar(30) not null, z bigint not null, primary key (x))")
con.sql_stmt("insert into t(x, y, z) values (1,'foo',42), (2,'bar',7)")
# execute query on table
result = con.sql("select * from t where y='foo'")
# convert to pandas using pyarrow
df = result.to_pandas()
print(df)
# x y z
# 0 1 foo 42
Creating a Connection Object
A connection object can be created through two methods:
lingodb.create_in_memory()
: creates a connection to an empty in-memory database, or by usinglingodb.connect_to_db(DB_DIR)
: creates a connection to an database instance stored in theDB_DIR
folder.
Executing SQL statements
SQL statements such as CREATE TABLE
, INSERT INTO
, COPY FROM
and SET ...
can be executed using the sql_stmt
method.
For example, if we want the database to be persistent (written back into the database directory), the following statement should be executed:
con = lingodb.connect_to_db("test-dir")
con.sql_stmt("SET persist=1")
Executing SQL queries
Read-only SQL queries can be executed using the sql
method.
It will return an arrow table that can be used arbitrarily and e.g. converted to pandas.
print(con.sql("select * from t").to_pandas())
Executing MLIR modules
Similarly, also raw MLIR modules can be executed using the mlir
method and produce an arrow table.
# select count(*) from t where x>2
con.mlir("""module {
func.func @main() {
%0 = relalg.basetable {table_identifier = "t"} columns: { x => @t::@x({type = i64})}
%1 = relalg.selection %0 (%arg0 : !tuples.tuple){
%4 = db.constant(2) : i64
%5 = tuples.getcol %arg0 @t::@x : i64
%6 = db.compare gt %5 : i64, %4: i64
tuples.return %6 : i1
}
%2 = relalg.aggregation %1 [] computes : [@aggr0::@tmp_attr0({type = i64})] (%arg0: !tuples.tuplestream,%arg1: !tuples.tuple){
%4 = relalg.count %arg0
tuples.return %4 : i64
}
%3 = relalg.materialize %2 [@aggr0::@tmp_attr0] => ["count"] : !subop.result_table<[count: i64]>
subop.set_result 0 %3 : !subop.result_table<[count: i64]>
return
}
}
""")
Querying PyArrow Tables/Pandas DataFrames
Apache Arrow tables can be imported using the add_table
method. This also enables querying of pandas dataframes through pyarrow.
import pandas as pd
df = pd.DataFrame(data={'col1': [1, 2, 3, 4], 'col2': ["foo", "foo", "bar", "bar"]})
con = lingodb.create_in_memory()
#convert data frame to pyarrow table
arrow_table=pa.Table.from_pandas(df)
con.add_table("df",arrow_table)
#query just as any other table
print(con.sql("select * from df").to_pandas())
Furthermore, the rows of an pyarrow table can be added to an existing table (import) using the append_table
method:
con.append_table("df",arrow_table)