The theoretical and practical aspects of database management and querying.
ALTER TABLE
, 1 pointSELECT
, LIMIT
and OFFSET
in queries, 1 pointDISTINCT
, 1 pointWHERE
, 1 pointLIKE
, 1 pointLEFT JOIN
, 1 pointGROUP BY
, 1 pointCAST AS
to change data types, 1 pointHAVING TO
to filter grouped data, 1 pointSELECT
under another SELECT
, 1 pointBETWEEN
to filter data within a range, 1 pointCREATE VIEW
to pack a query into a named object, 1 pointSQLite
VSCode
extension by alexcvzz, which suffice for the purpose of this module.sqlalchemy
(additional lessons to SQLAlchemy)
which acts as an Object Relational Mapper (ORM), but there are also libraries specific to the database:
psycopg2
mysql-connector-python
or mysqlclient
Using pandas
Using sqlalchemy
Using psycopg2
sqlite
database you created from the CSV files:
metadata
table: from idx_subsector_metadata_080824.csv
reports
table: from idx_sector_reports_080824.csv
sectors.db
ALTER TABLE
and RENAME COLUMN
01_rename.sql
. Use ALTER TABLE <table_name> RENAME COLUMN <existing_name> TO <new_name>
to make alterations to the column in metadata
named sub_sector_id
. Rename it to sub_sector_uid
.
SQLite: New Query
SQLite: Run Query
or use the shortcut (CTRL + SHIFT + Q)PRAGMA table_info(metadata);
to see the column names.SELECT
, LIMIT
and OFFSET
in queries02_select.sql
.
Use SELECT
to query the reports
table and return the first 10 rows with the following columns:
sub_sector
total_companies
filtered_median_pe
OFFSET
clause to your query to get the first 10 rows after the first 5 rows (skipping the first 5 rows).
DISTINCT
and as
aliasing03_distinct.sql
.
Use SELECT
to query the metadata
table and return the first 10 unique sectors in the table.
Add a DISTINCT
clause to your query to return only unique values in the
sector
column. This will remove any duplicate values from the result set.
Name the column Indonesian Sectors
using the AS
keyword.
LIMIT
clause to return only the first 10 rows.
WHERE
04_where.sql
.
From the reports
table, select only rows where the sub-sector has
at least 15 companies and the average market cap of that sector (avg_market_cap
) is lesser than 500_000_000_000 (“500 billion”).
Return the following columns:
sub_sector
total_companies
avg_market_cap
AND
keyword.
LIKE
.05_like.sql
.
In SQL, you can use wildcards for constructing patterns.
%
wildcard matches any sequence of zero or more characters.
_
wildcard matches any single character.
b%
matches bank
, bytes
, bit
%ion
matches institution
, renovation
%it% matches any string containing
it`
b_nk
matches bank
, bunk
l__n
matches loan
, lean
PRAGMA case_sensitive_like = true
;
Select all rows from the metadata
table where the sub_sector
column contains the word Goods
or Services
(both case insensitive),
regardless of the position of these words in the string.
Return all columns.
LEFT JOIN
06_left_join.sql
.
Use a LEFT JOIN
to join the metadata
and reports
tables on the sub_sector
column.
Return the following columns:
sector
sub_sector
total_companies
avg_market_cap
GROUP BY
07_group_by.sql
.
This file should contain two queries.
For the first query, use COUNT
to return the number of rows in each group
of sector
in the metadata
table. This gives us an idea of how many sub sectors
are in each sector.
Name the columns sector
and total_sub_sectors
.
AVG
to return the average number of companies of each sub sector by
grouping on the sector
column.
Name the columns sector
, sub_sector
, total_companies
, and avg_sub_sector_companies
.
This query should be similar to the one in your LEFT JOIN
exercise
since you will need to join the two tables to get the total_companies
column.
CAST AS
and ORDER BY
CAST AS
to cast the column values from character to either integer
or decimal
.
Create an eighth file and name it 08_cast_as.sql
.
Use CAST AS
to change the total_companies
and avg_market_cap
columns in the reports
table to an integer.
Finally, use ORDER BY
to sort the result set by the total_companies
column in descending order. Return
only the first ten rows and the following columns:
sub_sector
total_companies
as companies_numavg_market_cap
as market_cap_averageHAVING TO
to filter grouped dataHAVING
has to be used with GROUP BY
. They act similarly to how WHERE
works on rows, except HAVING
filters on the group levels.
Create a ninth file and name it 09_having.sql
.
Perform the second exercise in the seventh file again, but this
time return only sectors where the avg_sub_sector_companies
is greater than 30.
SELECT
under anotherSELECT
statement.
Create a tenth file and name it 10_subquery.sql
.
First create a subquery that returns all rows from metadata
where the sector is either Financials
or Technology
(case-sensitive!).
Then use this subquery to filter the reports
table, so the returned set
only contains rows where the sub_sector
is in the (earlier) subquery result.
Return the following columns in your query:
sub_sector
min_company_pe
max_company_pe
BETWEEN
to filter data within a range11_between.sql
.
Your query should be similar to the one in the tenth file above,
with a small adjustment.
metadata
where the sector
is either Financials
or Infrastructures
and when the sub_sector_uid
is between 3 and 20.avg_yoy_q_revenue_growth
is between 0 and 0.3.BETWEEN
is inclusive on both ends, so BETWEEN 0 AND 17
includes 0 and 17.
Return the following columns:
sub_sector
avg_yoy_q_revenue_growth
min_company_pe
max_company_pe
sub_sector_uid
column after the renaming exercise.BETWEEN
, as you’d expect, expect two values separated by AND
and that these are
numbers or dates. You might have to cast the column to a number if it’s not already.CREATE VIEW
to pack a query into a named object12_create_view.sql
.
A view is the result set of a stored query. It is a virtual table you define
in the database that can be queried like a table, except that it
does not store data.
Once you have defined a view, you can query it like a table — it can include
columns from any number of tables, involving complex joins, filters and
aggregations.
We create a view with the CREATE VIEW
statement.
watchlist
from reports
where the following conditions are met:
avg_market_cap
is greater than the average of avg_market_cap
. This means we’re only watching
sectors with companies that are generally larger than the average.avg_yoy_q_revenue_growth
is greater than 0.1. This narrow our watchlist to sectors with companies
that are growing at a rate of at least 10% year-on-year.median_yield_ttm
greater than 0.03. This further narrows our watchlist to sectors with companies
that are, on average, paying out at least 3% worth of dividiend yield.PRAGMA
to print out the columns of the watchlist
view.