top of page

Python - SQLite3 fundamentals in PyCharm

We can work in python in various databases. In SQLite, MySQL, PostgreSQL. This contribution will create the basic SQLite database, execute sample SQL syntax and create database schema in PyCharm. SQLite3 is fast, lightweight, self-contained, serverless, zero-configuration and transactional database engine. In SQLite is the database stored in a single disk file. It works very good for database management, SQL commands or PyCharm operations.


1. Database management — Create a basic database with 8 tables


Sample csv files are uploaded here: https://www.kaggle.com/datasets/spribylova/sqlite


table t11 — customers deposit transactions, while the second table contains

table t12 — aggregated financial activity

tables t13 — t18 — dimension (reference) tables


You can first install the obvious libraries:


import pandas as pd

import datetime

import numpy as np


The following commands will create SQLite .db database, connection and cursor. my_data.db database will store our tables. To work with our my_data.db database we need to establish and maintain the connection con. We create cursor object, which will work as communication agent between the connection and SQL query command.


from pathlib import Path

Path('my_data.db').touch()

import sqlite3

con = sqlite3.connect('my_data.db')

cur = con.cursor()

sqlite3.sqlite_version


We can make sure, that my_data.db database is empty and drop st tables if they already exist.


cur.execute("DROP TABLE st11")

cur.execute("DROP TABLE st12")

cur.execute("DROP TABLE st13")

cur.execute("DROP TABLE st14")

cur.execute("DROP TABLE st15")

cur.execute("DROP TABLE st16")

cur.execute("DROP TABLE st17")

cur.execute("DROP TABLE st18")


First we create data frames and read provided csv files.


t11 = pd.read_csv("t11.csv", index_col=0, parse_dates=True)

t12 = pd.read_csv("t12.csv", index_col=0, parse_dates=True)

t13 = pd.read_csv("t13.csv", index_col=0, parse_dates=True)

t14 = pd.read_csv("t14.csv", index_col=0, parse_dates=True)

t15 = pd.read_csv("t15.csv", index_col=0, parse_dates=True)

t16 = pd.read_csv("t16.csv", index_col=0, parse_dates=True)

t17 = pd.read_csv("t17.csv", index_col=0, parse_dates=True)

t18 = pd.read_csv("t18.csv", index_col=0, parse_dates=True)


Original csv is stored in stacked (pivot) record format and we must reset index first for all new data frames.


t11.reset_index(inplace=True)

t12.reset_index(inplace=True)

t13.reset_index(inplace=True)

t14.reset_index(inplace=True)

t15.reset_index(inplace=True)

t16.reset_index(inplace=True)

t17.reset_index(inplace=True)

t18.reset_index(inplace=True)


New Data Frames with new indexes can be uploaded into SQLite format using defined connection con.


t11.to_sql('st11', con, index=False)

t12.to_sql('st12', con, index=False)

t13.to_sql('st13', con, index=False)

t14.to_sql('st14', con, index=False)

t15.to_sql('st15', con, index=False)

t16.to_sql('st16', con, index=False)

t17.to_sql('st17', con, index=False)

t18.to_sql('st18', con, index=False)


We can install the following libraries, they can be explored.


!pip install configparser

!pip install pydot

!pip install graphviz

!pip install eralchemy


!pip install sqlalchemy

!pip install sqlalchemy_schemadisplay

from sqlalchemy import MetaData

from sqlalchemy_schemadisplay import create_schema_graph

from sqlalchemy import create_engine


engine = create_engine('sqlite:///my_data.db') # using relative path

meta = MetaData(bind=engine)

engine.connect()

print(engine)


After we created database with our tables, we want to list all table names in connected my_data.db database.


x=cur.execute("SELECT distinct name FROM sqlite_master where type='table'")

for y in x.fetchall():

print(y)


('st11',)

('st12',)

('st13',)

('st14',)

('st15',)

('st16',)

('st17',)

('st18',)


2. SQL Command Syntax Examples


SQL syntax is similar to any other SQL query analyzer. In Python console ( Colaboratory, Jupyter, .. ) we only include SQL query into the SQLite connection nest:


pd.read_sql_query(“””

…..

“””, con)


2.1.


Calculate daily turnover and accounting revenue for each brand on each product in the first 6 days of the year.


pd.read_sql_query("""

SELECT

st12.mk_Calendar,

st13.brandName,

st16.ProductName,

sum(st12.turnover_EUR),

sum(st12.totalAccountingRevenue_EUR)

FROM st12 inner join st13

on

st12.mk_Customer=st13.mk_customer

inner join st16

on

st12.mk_Provider=st16.mk_Provider

where st12.mk_Calendar between '2019-01-01' and '2019-01-07'

GROUP BY 1,2,3

""", con)


2.2.


Calculate daily actives (unique) for each device in each market separately during the first 6 days of the year.

pd.read_sql_query("""

SELECT

st12.mk_Calendar,

st13.CustomerMarketName,

st17.channelname,

sum(st12.isCutomerActive)

FROM st12 inner join st13

on

st12.mk_Customer=st13.mk_customer

inner join st17

on

st12.mk_device=st17.mk_device

where st12.mk_Calendar between '2019-01-01' and '2019-01-07'

GROUP BY 1,2,3

""", con)


2.3.


Calculate successful depositors and deposits for each payment method in Brand A during the first 6 days of the year.


pd.read_sql_query("""

SELECT

st13.CustomerMarketName,

st14.PaymentMethodName,

sum(st11.amount_EUR),

count(distinct st13.mk_customer)

FROM

st12 inner join st13

on

st12.mk_Customer=st13.mk_customer

inner join st11

on

st11.mk_Customer=st12.mk_Customer

inner join st14

on

st11.mk_PaymentMethod=st14.mk_PaymentMethod

where

st13.brandName like 'Brand A'

and st11.mk_paymentstatus=2

and st12.mk_Calendar between '2019-01-01' and '2019-01-07'

GROUP BY 1,2

""", con)


3. Create ER diagram of new sqlite database in PyCharm


PyCharm application can be downloaded and installed from here:


We can upload my_data.db file to PyCharm. PyCharm creates the schema automated.




References :


24 views0 comments

Recent Posts

See All

Python - Cryptographic algorithms

In order to examine the basic cryptographic mechanisms, I created the short python codes to generate the old encryption keys for basic...

Python - sktime

There are various libraries created for Python Time Series. Each of them has its own style, contributors and functions. Each library has...

Python - Basic regression comparison

Regression models are the principles of machine learning models as well. They help to understand the dataset distributions. The objective...

Comments


bottom of page