Table size in PostgreSQL

Disk space seems endless, until you run out and/or have to pay the bill. The question is how to find tables with a high disk storage usage and with the query below it shows the table and index size, but also the size of TOAST data for PostgreSQL.

SELECT schemaname, tablename,
  pg_size_pretty(tsize) AS size_table,
  pg_size_pretty(size) AS size_index,
  pg_size_pretty(total_size) AS size_total
FROM (SELECT *,
        pg_table_size(schemaname||'.'||tablename) AS tsize,
        pg_relation_size(schemaname||'.'||tablename) AS size,
        pg_total_relation_size(schemaname||'.'||tablename) AS total_size
      FROM pg_tables) AS TABLES
WHERE schemaname='public'
ORDER BY total_size DESC;

After running this query on the development schema and exporting the results to CSV, we can see that a ManyToMany table consumes a total of 39 MB. With over 330.000 entries this seems numbers seem to be fine as the table size is in line with the amount of data stored in it.

schemaname,tablename,size_table,size_index,size_total
public,domain_asset_domain_asset_group,12 MB,12 MB,39 MB
public,domain_account_domain_function,2960 kB,2936 kB,9720 kB
public,domain_account,1760 kB,1728 kB,4088 kB
public,domain_command,2016 kB,1992 kB,3528 kB
public,person,832 kB,792 kB,1736 kB
public,domain_command_collection,712 kB,688 kB,1248 kB
public,domain_asset_group,648 kB,624 kB,1160 kB
public,domain_asset,544 kB,520 kB,1088 kB
public,domain_function,440 kB,416 kB,784 kB
public,sessions,64 kB,32 kB,80 kB
public,asset_application,8192 bytes,8192 bytes,56 kB
public,domain_authority,8192 bytes,8192 bytes,40 kB
public,asset_function,8192 bytes,0 bytes,24 kB

Collecting this data and graphing it may help spot problems and predict storage needs. It may help DevOps teams to figure out if their databases are growing and with what speed.

Renaming database in PostgreSQL

Sometimes you have a system with legacy naming standards, but you really want to switch over to the new standard to keep all the scripting clean without some exceptions no one is going to remember in 12 months. Oracle had the command ALTER DATABASE, but since Oracle 10 you need to take the database offline and do some magic. MySQL got the RENAME DATABASE option with release 5.1.7 and lost the option again with release 5.1.23 as it was eating data.

Luckily PostgreSQL still has the command ALTER DATABASE so let rename a database and it’s owner. Before we start we need the password and then we need to login as the PostgreSQL superuser postgres or another account with similar privileges. So first we check the database name and owner.

postgres=# \l
                                    List of databases
     Name     |  Owner   | Encoding  |  Collation  |    Ctype    |   Access privileges   
--------------+----------+-----------+-------------+-------------+-----------------------
 dbu0001      | dbu0001  | UTF8      | en_US.UTF-8 | en_US.UTF-8 | 

Now we rename the database owner as we made a typo and we need to set the password again.

postgres=# alter user dbu0001 rename to dbo0001;
NOTICE:  MD5 password cleared because of role rename
ALTER ROLE
postgres=# alter user dbo0001 password 'yeaxaureiraeLohsh6deJ2ohngahpu9a';
ALTER ROLE

The second task is to rename the database to the correct name.

postgres=# alter database dbu0001 rename to dbs0001;
ALTER DATABASE

And basically we are now done as the ownership was already modified when we renamed the account with our first statement, but let check what PostgreSQL says it now has.

postgres=# \l
                                    List of databases
     Name     |  Owner   | Encoding  |  Collation  |    Ctype    |   Access privileges   
--------------+----------+-----------+-------------+-------------+-----------------------
 dbs0001      | dbo0001  | UTF8      | en_US.UTF-8 | en_US.UTF-8 | 

Please don’t forget to update the connection strings for applications using this database and maybe GRANTS that have been set and the pg_hba.conf file.

Starting to stop SQL-injections, part 2

In a previous posting I gave an example on how to make database queries safer by using parameter binding and basically stopping SQL-injections. The next step is to make the code more readable and maintainable. This doesn’t sound a like a priority for secure software development, but readable code is also code that can be verified and maintained by other people. It gives you the edge to debug problems quickly and invites others to supply patches. So let take the example where the previous posting ended.

$sth = $dbh_rw->prepare('select userid from accounts where username = ?');
$sth->execute(array($form_username));

For one or two parameters this may work, but when queries become bigger you need to start counting and counting beyond three is a bad idea in most cases. So let change the question mark with a named variable called ‘:username’ in this example. One could then use the function bindParam() to specify which named variable needs to be replaced and has additional features, but in this example we use the standard binding during in execute phase.

$sth = $dbh_rw->prepare('select userid from accounts where username = :username');
$sth->execute(array(':username'=>$form_username));

Please remember to use a named variable only once in a SQL-query as it will only be replaced one time and not multiple times.

Starting to stop SQL-injections

In a lot of PHP-examples strings are concatenated before a database query is being executed as below. Some examples advise to use PHP-functions mysql_real_escape_string() and/or addslashes() to make database query safe against SQL-injections. But this isn’t really a solution as when using addslashes() also requires the use of stripslashes() after retrieving data from a database. Some sites show the lack of a proper implementation and show the famous \’ string in a website.

$sth = $dbh_rw->prepare('select userid from accounts where username = "'.$form_username.'"');
$sth->execute();

Like in Perl with DBI, also PHP has PDO that allows for variables to be parameterised while executing a query as in the example below. This removes the need for home made solutions that don’t cover all usecases and allows for a way to provide a stable and more secure interface for your applications when communicating with databases.

$sth = $dbh_rw->prepare('select userid from accounts where username = ?');
$sth->execute(array($form_username));

This doesn’t stop the need of sanitizing variables like as with input from users.

PostgreSQL en NULL

Mijn docenten zullen wel de koude rillingen krijgen, maar soms heb je gewoon attribuut op je tupel staan die de geen waarde heeft. NULL is hier de algemene benaming voor zoals C-programmeurs die ook wel kennen. De grap komt hoe je tupels gaat selecteren die geen waarde hebben. Veel mensen zie je het volgende statement gebruiken en daarna zoeken waarom hun code fout loopt.

# SELECT * FROM TABLE WHERE column1 = NULL;

Het correcte statement hiervoor staat hieronder en de truuk zit hem in de IS. Je wilt geen vergelijking van de waarde van het attribuut, maar je wilt iets weten over de staat van het attribuut.

# SELECT * FROM TABLE WHERE column1 IS NULL;
# SELECT * FROM TABLE WHERE column1 IS NOT NULL;

De tweede regel is een klein toegift om te kijken of het attribuut niet NULL vertegenwoordigd. Zal de komende periode kijken of ik nog meer van dit soort leuke PostgreSQL dingetjes kan posten.