Table size in PostgreSQL

Disk space seems endless, until you run out or have to pay the bill. The question is how to find the tables with high disk storage usage. The query below shows the table and index size, as well as the total size including TOAST data, for each table in PostgreSQL.

SELECT schemaname, tablename,
  pg_size_pretty(tsize) AS size_table,
  pg_size_pretty(size) AS size_index,
  pg_size_pretty(total_size) AS size_total
FROM (SELECT *,
        pg_table_size(schemaname||'.'||tablename) AS tsize,
        pg_relation_size(schemaname||'.'||tablename) AS size,
        pg_total_relation_size(schemaname||'.'||tablename) AS total_size
      FROM pg_tables) AS TABLES
WHERE schemaname='public'
ORDER BY total_size DESC;

After running this query on the development schema and exporting the results to CSV, we can see that a ManyToMany table consumes a total of 39 MB. With over 330,000 entries these numbers seem fine, as the table size is in line with the amount of data stored in it.

schemaname,tablename,size_table,size_index,size_total
public,domain_asset_domain_asset_group,12 MB,12 MB,39 MB
public,domain_account_domain_function,2960 kB,2936 kB,9720 kB
public,domain_account,1760 kB,1728 kB,4088 kB
public,domain_command,2016 kB,1992 kB,3528 kB
public,person,832 kB,792 kB,1736 kB
public,domain_command_collection,712 kB,688 kB,1248 kB
public,domain_asset_group,648 kB,624 kB,1160 kB
public,domain_asset,544 kB,520 kB,1088 kB
public,domain_function,440 kB,416 kB,784 kB
public,sessions,64 kB,32 kB,80 kB
public,asset_application,8192 bytes,8192 bytes,56 kB
public,domain_authority,8192 bytes,8192 bytes,40 kB
public,asset_function,8192 bytes,0 bytes,24 kB

Collecting this data periodically and graphing it may help spot problems and predict storage needs. It can help DevOps teams figure out whether their databases are growing and how fast, as in the sketch below.
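
A minimal sketch of such a collection step, assuming a database named mydb and an output file table_sizes.csv (both names are placeholders, not part of the original setup), is a daily cron job that appends the per-table totals:

$ psql -d mydb -At -F',' -c "SELECT now()::date, schemaname, tablename, pg_total_relation_size(schemaname||'.'||tablename) FROM pg_tables WHERE schemaname='public'" >> table_sizes.csv

Feeding the resulting CSV into a spreadsheet or monitoring tool then makes the growth per table visible over time.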

Increasing Inotify Watches Limit

After upgrading to PyCharm 2017.2, a notice appeared that the inotify watch limit was too low and that the IDE would fall back to recursive directory scanning. For now, the following commands increase the inotify limit to 512k watches.

$ cat <<EOF | sudo tee /etc/sysctl.d/idea.conf
fs.inotify.max_user_watches = 524288
EOF
$ sudo sysctl -p --system
...
* Applying /etc/sysctl.d/idea.conf ...
fs.inotify.max_user_watches = 524288
...
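
To verify that the new limit is active, the value can be read back directly:

$ sysctl fs.inotify.max_user_watches
fs.inotify.max_user_watches = 524288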

It is still interesting that PhpStorm wasn't complaining, even though the Symfony projects are much larger.

Massive file update with sed

Recently I generated kickstart files for a virtual environment where people could experiment and virtual machines could be rebuilt quickly. Sadly, a typo slipped into the generated files that made the Anaconda installer stop. Every kickstart file could be corrected by hand, of course, but a single sed command can also correct the typo in all files in one go.

$ sed -i 's/namesever/nameserver/' *.ks
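
When editing many files in place, it can be reassuring to first list the files that actually contain the typo and to let sed keep a backup copy of each file; the .bak suffix below is just an example:

$ grep -l 'namesever' *.ks
$ sed -i.bak 's/namesever/nameserver/' *.ks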

The Unix toolkit is full of handy tools and options like this, and it pays to get to know your environment, especially when it is your work environment and you're familiar with the Unix philosophy.

Starting to stop SQL-injections, part 2

In a previous posting I gave an example of how to make database queries safer by using parameter binding, basically stopping SQL-injections. The next step is to make the code more readable and maintainable. This doesn't sound like a priority for secure software development, but readable code is also code that can be verified and maintained by other people. It gives you the edge to debug problems quickly and invites others to supply patches. So let's take the example where the previous posting ended.

$sth = $dbh_rw->prepare('select userid from accounts where username = ?');
$sth->execute(array($form_username));

For one or two parameters this may work, but when queries become bigger you need to start counting, and counting beyond three is a bad idea in most cases. So let's replace the question mark with a named parameter, called ':username' in this example. One could use the function bindParam() to bind each named parameter separately, which offers additional features, but in this example we use the standard binding during the execute phase.

$sth = $dbh_rw->prepare('select userid from accounts where username = :username');
$sth->execute(array(':username'=>$form_username));
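
For completeness, a sketch of the bindParam() variant mentioned above; bindParam() binds the variable by reference and accepts an explicit type such as PDO::PARAM_STR:

$sth = $dbh_rw->prepare('select userid from accounts where username = :username');
$sth->bindParam(':username', $form_username, PDO::PARAM_STR);
$sth->execute();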

Please remember to use a named parameter only once in a SQL query; if the same value is needed in multiple places, bind it under a different name for each occurrence, as in the sketch below.
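
A sketch of what that could look like; the alias column is only an example and not part of the original schema:

$sth = $dbh_rw->prepare('select userid from accounts where username = :name or alias = :alias');
$sth->execute(array(':name'=>$form_username, ':alias'=>$form_username));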

Starting to stop SQL-injections

In a lot of PHP examples, strings are concatenated before a database query is executed, as shown below. Some examples advise using the PHP functions mysql_real_escape_string() and/or addslashes() to make the database query safe against SQL-injections. But this isn't really a solution, as using addslashes() also requires the use of stripslashes() after retrieving the data from the database. Some sites show the lack of a proper implementation by displaying the famous \' string on the website.

$sth = $dbh_rw->prepare('select userid from accounts where username = "'.$form_username.'"');
$sth->execute();

Like Perl with DBI, PHP has PDO, which allows variables to be parameterised when executing a query, as in the example below. This removes the need for home-made solutions that don't cover all use cases and provides a stable and more secure interface for your applications when communicating with databases.

$sth = $dbh_rw->prepare('select userid from accounts where username = ?');
$sth->execute(array($form_username));

This doesn't remove the need to validate and sanitize variables, such as input from users. A simple whitelist check is sketched below.
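
A minimal sketch of such a check before the value reaches the query; the allowed pattern is an assumption about the username policy, not a rule from this posting:

// hypothetical whitelist check on user-supplied input
if (!preg_match('/^[A-Za-z0-9._-]{1,32}$/', $form_username)) {
    die('Invalid username');
}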