In Python we can work with various databases, such as SQLite, MySQL, and PostgreSQL. This contribution creates a basic SQLite database, executes sample SQL syntax, and builds the database schema in PyCharm. SQLite3 is a fast, lightweight, self-contained, serverless, zero-configuration, transactional database engine. In SQLite, the whole database is stored in a single disk file. It works very well for database management, SQL commands, and PyCharm operations.
1. Database management — Create a basic database with 8 tables
Sample csv files are uploaded here: https://www.kaggle.com/datasets/spribylova/sqlite
table t11 — customer deposit transactions
table t12 — aggregated financial activity
tables t13 — t18 — dimension (reference) tables
First, import the usual libraries:
import pandas as pd
import datetime
import numpy as np
The following commands create the SQLite .db database file, a connection, and a cursor. The my_data.db database will store our tables. To work with my_data.db we need to establish and maintain the connection con. We then create a cursor object, which acts as the communication agent between the connection and the SQL query commands.
from pathlib import Path
Path('my_data.db').touch()           # create the empty database file (sqlite3.connect would also create it)
import sqlite3
con = sqlite3.connect('my_data.db')  # open the connection to my_data.db
cur = con.cursor()                   # cursor: communication agent between connection and SQL commands
sqlite3.sqlite_version               # version of the underlying SQLite engine
We make sure that the my_data.db database is empty and drop the st tables if they already exist (DROP TABLE IF EXISTS does not fail when a table is missing).
cur.execute("DROP TABLE st11")
cur.execute("DROP TABLE st12")
cur.execute("DROP TABLE st13")
cur.execute("DROP TABLE st14")
cur.execute("DROP TABLE st15")
cur.execute("DROP TABLE st16")
cur.execute("DROP TABLE st17")
cur.execute("DROP TABLE st18")
First we create data frames by reading the provided csv files.
t11 = pd.read_csv("t11.csv", index_col=0, parse_dates=True)
t12 = pd.read_csv("t12.csv", index_col=0, parse_dates=True)
t13 = pd.read_csv("t13.csv", index_col=0, parse_dates=True)
t14 = pd.read_csv("t14.csv", index_col=0, parse_dates=True)
t15 = pd.read_csv("t15.csv", index_col=0, parse_dates=True)
t16 = pd.read_csv("t16.csv", index_col=0, parse_dates=True)
t17 = pd.read_csv("t17.csv", index_col=0, parse_dates=True)
t18 = pd.read_csv("t18.csv", index_col=0, parse_dates=True)
The original csv files are stored in stacked (pivot) record format, so we must first reset the index of all new data frames.
t11.reset_index(inplace=True)
t12.reset_index(inplace=True)
t13.reset_index(inplace=True)
t14.reset_index(inplace=True)
t15.reset_index(inplace=True)
t16.reset_index(inplace=True)
t17.reset_index(inplace=True)
t18.reset_index(inplace=True)
The data frames with their new indexes can now be uploaded into SQLite using the defined connection con.
t11.to_sql('st11', con, index=False)
t12.to_sql('st12', con, index=False)
t13.to_sql('st13', con, index=False)
t14.to_sql('st14', con, index=False)
t15.to_sql('st15', con, index=False)
t16.to_sql('st16', con, index=False)
t17.to_sql('st17', con, index=False)
t18.to_sql('st18', con, index=False)
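The read_csv / reset_index / to_sql steps for all eight tables can also be collapsed into a single loop. This is only a sketch, assuming the csv files sit in the working directory; if_exists='replace' makes it safe to re-run:

for i in range(11, 19):
    df = pd.read_csv(f"t{i}.csv", index_col=0, parse_dates=True)
    df.reset_index(inplace=True)                 # unstack the pivot record format
    df.to_sql(f"st{i}", con, index=False, if_exists="replace")
    # sanity check: the row count in SQLite matches the data frame
    rows = cur.execute(f"SELECT COUNT(*) FROM st{i}").fetchone()[0]
    assert rows == len(df)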
We can install the following libraries; they are worth exploring for schema visualization.
!pip install configparser
!pip install pydot
!pip install graphviz
!pip install eralchemy
!pip install sqlalchemy
!pip install sqlalchemy_schemadisplay
from sqlalchemy import MetaData
from sqlalchemy_schemadisplay import create_schema_graph
from sqlalchemy import create_engine
engine = create_engine('sqlite:///my_data.db')  # using relative path
meta = MetaData(bind=engine)                    # bind= works in SQLAlchemy 1.x; it was removed in SQLAlchemy 2.0
engine.connect()
print(engine)
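With these libraries in place we can already render the ER diagram in code. A minimal sketch, assuming sqlalchemy_schemadisplay 1.x with SQLAlchemy 1.x (newer releases changed the create_schema_graph signature) and the Graphviz binaries installed; the output file name is arbitrary:

meta.reflect()                          # load the st11 .. st18 table definitions from my_data.db
graph = create_schema_graph(
    metadata=meta,
    show_datatypes=False,               # hide column types to keep the diagram compact
    show_indexes=False,
    rankdir='LR',                       # lay the tables out left to right
    concentrate=False,
)
graph.write_png('my_data_schema.png')   # pydot writes the picture via Graphviz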
After we have created the database with our tables, we want to list all table names in the connected my_data.db database.
x = cur.execute("SELECT DISTINCT name FROM sqlite_master WHERE type='table'")
for y in x.fetchall():
    print(y)
('st11',)
('st12',)
('st13',)
('st14',)
('st15',)
('st16',)
('st17',)
('st18',)
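The same listing can also be pulled straight into a data frame, matching the query style used in the next section:

pd.read_sql_query("SELECT name FROM sqlite_master WHERE type='table'", con)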
2. SQL Command Syntax Examples
The SQL syntax is similar to any other SQL query analyzer. In a Python console (Colaboratory, Jupyter, ...) we only wrap the SQL query in the SQLite connection nest:
pd.read_sql_query("""
.....
""", con)
2.1.
Calculate the daily turnover and accounting revenue for each brand and each product in the first 6 days of the year.
pd.read_sql_query("""
SELECT
  st12.mk_Calendar,
  st13.brandName,
  st16.ProductName,
  sum(st12.turnover_EUR) AS turnover_EUR,
  sum(st12.totalAccountingRevenue_EUR) AS totalAccountingRevenue_EUR
FROM st12
INNER JOIN st13 ON st12.mk_Customer = st13.mk_customer
INNER JOIN st16 ON st12.mk_Provider = st16.mk_Provider
WHERE st12.mk_Calendar BETWEEN '2019-01-01' AND '2019-01-07'
GROUP BY 1, 2, 3
""", con)
2.2.
Calculate daily actives (unique) for each device in each market separately during the first 6 days of the year.
pd.read_sql_query("""
SELECT
  st12.mk_Calendar,
  st13.CustomerMarketName,
  st17.channelname,
  sum(st12.isCutomerActive) AS dailyActives  -- column name as spelled in the source data
FROM st12
INNER JOIN st13 ON st12.mk_Customer = st13.mk_customer
INNER JOIN st17 ON st12.mk_device = st17.mk_device
WHERE st12.mk_Calendar BETWEEN '2019-01-01' AND '2019-01-07'
GROUP BY 1, 2, 3
""", con)
2.3.
Calculate successful depositors and deposits for each payment method in Brand A during the first 6 days of the year.
pd.read_sql_query("""
SELECT
  st13.CustomerMarketName,
  st14.PaymentMethodName,
  sum(st11.amount_EUR) AS deposits_EUR,
  count(distinct st13.mk_customer) AS depositors
FROM st12
INNER JOIN st13 ON st12.mk_Customer = st13.mk_customer
INNER JOIN st11 ON st11.mk_Customer = st12.mk_Customer
INNER JOIN st14 ON st11.mk_PaymentMethod = st14.mk_PaymentMethod
WHERE st13.brandName LIKE 'Brand A'
  AND st11.mk_paymentstatus = 2
  AND st12.mk_Calendar BETWEEN '2019-01-01' AND '2019-01-07'
GROUP BY 1, 2
""", con)
3. Create ER diagram of the new SQLite database in PyCharm
The PyCharm application can be downloaded and installed from the JetBrains website.
We can open the my_data.db file in PyCharm, and PyCharm generates the schema diagram automatically.
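Alternatively, the eralchemy package installed earlier can render the ER diagram straight from the database file, without PyCharm. A minimal sketch, assuming the Graphviz binaries are available; the output file name is arbitrary:

from eralchemy import render_er

# draw the entity-relationship diagram of my_data.db into a png file
render_er('sqlite:///my_data.db', 'erd_from_sqlite.png')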