Table size in PostgreSQL

Disk space seems endless, until you run out and/or have to pay the bill. The question is how to find tables with a high disk storage usage and with the query below it shows the table and index size, but also the size of TOAST data for PostgreSQL.

SELECT schemaname, tablename,
  pg_size_pretty(tsize) AS size_table,
  pg_size_pretty(size) AS size_index,
  pg_size_pretty(total_size) AS size_total
FROM (SELECT *,
        pg_table_size(schemaname||'.'||tablename) AS tsize,
        pg_relation_size(schemaname||'.'||tablename) AS size,
        pg_total_relation_size(schemaname||'.'||tablename) AS total_size
      FROM pg_tables) AS TABLES
WHERE schemaname='public'
ORDER BY total_size DESC;

After running this query on the development schema and exporting the results to CSV, we can see that a ManyToMany table consumes a total of 39 MB. With over 330.000 entries this seems numbers seem to be fine as the table size is in line with the amount of data stored in it.

schemaname,tablename,size_table,size_index,size_total
public,domain_asset_domain_asset_group,12 MB,12 MB,39 MB
public,domain_account_domain_function,2960 kB,2936 kB,9720 kB
public,domain_account,1760 kB,1728 kB,4088 kB
public,domain_command,2016 kB,1992 kB,3528 kB
public,person,832 kB,792 kB,1736 kB
public,domain_command_collection,712 kB,688 kB,1248 kB
public,domain_asset_group,648 kB,624 kB,1160 kB
public,domain_asset,544 kB,520 kB,1088 kB
public,domain_function,440 kB,416 kB,784 kB
public,sessions,64 kB,32 kB,80 kB
public,asset_application,8192 bytes,8192 bytes,56 kB
public,domain_authority,8192 bytes,8192 bytes,40 kB
public,asset_function,8192 bytes,0 bytes,24 kB

Collecting this data and graphing it may help spot problems and predict storage needs. It may help DevOps teams to figure out if their databases are growing and with what speed.

Installing SSL-certificates on Debian

Installing and configuring SSL certificates is always an issue as how to create them and where to store them. Most of the time people can find the procedure on how to create them, but they forget all the places where they have placed them. Some initiatives exist to have centralized key stores on systems, but getting applications to use them is still a problem.

Also on Debian is this an issue and key material is all over the system if you’re not careful. Some Debian developers tried to fix it, but it ended in a “stalemate” and for now an additional package called ssl-cert exists to create self-signed certificates. This package also provides a structure for storing commercial certificates and accessing them in a safer way. So for we install the package ssl-cert.

$ sudo apt-get install ssl-cert

After installing the package the different files for the SSL-key can be placed in /etc/ssl/private and have the right permissions as shown in the output below. This to protect the key material from being use by unauthorized processes as most keys don’t have a passphrase.

$ sudo ls -l /etc/ssl/private
-r--r----- 1 root    ssl-cert 2766 Dec 12 13:06 www.example.org_ca.pem
-r--r----- 1 root    ssl-cert 1671 Dec 12 13:06 www.example.org.crt
-r--r----- 1 root    ssl-cert 1070 Dec 12 13:06 www.example.org.csr
-r--r----- 1 root    ssl-cert 6268 Dec 12 13:06 www.example.org_intermediate.pem
-r--r----- 1 root    ssl-cert 1675 Dec 12 13:06 www.example.org.key
-r--r----- 1 root    ssl-cert 3502 Dec 12 13:06 www.example.org.pem

The location and files can only be accessed by the root user or members of the group ssl-cert. Some applications as Apache startup under the root user and access the files before switching to the actual user like www-data on Debian. For those applications nothing is going to change, but for others like ejabberd that run completely under the ejabberd user somethings changes. Those users need to be made member of the group ssl-cert to read the files in /etc/ssl/private. Below two known services are made member of the group ssl-cert to read the certificates.

$ sudo usermod -a -G ssl-cert ejabberd
$ sudo usermod -a -G ssl-cert postgres
$ id -a ejabberd
uid=123(ejabberd) gid=125(ejabberd) groups=105(ssl-cert),125(ejabberd)
$ id -a postgres
uid=105(postgres) gid=108(postgres) groups=105(ssl-cert),108(postgres)

After checking of the modification was in affect as some servers use a Naming Service Caching Daemon the affected services need to be restarted. In this example both ejabberd and PostgreSQL need to restarted before the SSL certificates can be accesses.

Renaming database in PostgreSQL

Sometimes you have a system with legacy naming standards, but you really want to switch over to the new standard to keep all the scripting clean without some exceptions no one is going to remember in 12 months. Oracle had the command ALTER DATABASE, but since Oracle 10 you need to take the database offline and do some magic. MySQL got the RENAME DATABASE option with release 5.1.7 and lost the option again with release 5.1.23 as it was eating data.

Luckily PostgreSQL still has the command ALTER DATABASE so let rename a database and it’s owner. Before we start we need the password and then we need to login as the PostgreSQL superuser postgres or another account with similar privileges. So first we check the database name and owner.

postgres=# \l
                                    List of databases
     Name     |  Owner   | Encoding  |  Collation  |    Ctype    |   Access privileges   
--------------+----------+-----------+-------------+-------------+-----------------------
 dbu0001      | dbu0001  | UTF8      | en_US.UTF-8 | en_US.UTF-8 | 

Now we rename the database owner as we made a typo and we need to set the password again.

postgres=# alter user dbu0001 rename to dbo0001;
NOTICE:  MD5 password cleared because of role rename
ALTER ROLE
postgres=# alter user dbo0001 password 'yeaxaureiraeLohsh6deJ2ohngahpu9a';
ALTER ROLE

The second task is to rename the database to the correct name.

postgres=# alter database dbu0001 rename to dbs0001;
ALTER DATABASE

And basically we are now done as the ownership was already modified when we renamed the account with our first statement, but let check what PostgreSQL says it now has.

postgres=# \l
                                    List of databases
     Name     |  Owner   | Encoding  |  Collation  |    Ctype    |   Access privileges   
--------------+----------+-----------+-------------+-------------+-----------------------
 dbs0001      | dbo0001  | UTF8      | en_US.UTF-8 | en_US.UTF-8 | 

Please don’t forget to update the connection strings for applications using this database and maybe GRANTS that have been set and the pg_hba.conf file.

Wanting real schema support in MySQL

While upgrading WordPress towards the latest version it also required some database modifications, but there is where I start hating MySQL again and more. I really hope that MySQL will get support for schema’s as PostgreSQL or Oracle has, but it appears that MySQL has painted itself in the corner. And before some will say that MySQL has, the create schema function is only an alias for create database.

There lies a problem as I don’t want to create a new database for every new instance of an application. Others suggest to use a prefix for the tables, but this means I need to know the prefix in advance before creating tables and/or altering tables. This is nice when you have two or three instances, but there it stops. I want a single set of commands and just switch from schema to schema and apply the patches without any additional scripting changing prefixes.

So you have the choice of creating a “database” for every instance, which will just create another directory with database files. Or you need to use prefix, but then you’re limited to a max of 64 characters for the complete table name. If you would do replication it may even needs to be shorter then that if I may belief MySQL fora. I can only hope that some applications will get decent PostgreSQL support or that Oracle will give MySQL real schema support, but I doubt if they would do that.

Als een eerstejaars met psycopg2

Als een eerstejaars ben ik sinds jaren weer in de autocommit grap van database getrapt. Om een of andere reden bleven de tupels netjes in de tabel zitten. Dit terwijl de query direct met psql uitgevoerd wel netjes zijn werk deed. Dus maar eens in de documentatie van psycopg2 gedoken om te zien wat er mis was met de volgende Python-code.

1
2
3
4
5
curr = conn.cursor()
try:
    curr.execute("""delete from tabel where x > 2""")
except Exception, e:
    exit(1)

Het antwoord was helaas snel gevonden en deed me terug denken aan de eerste stappen op Oracle. Een dikke 15 jaar geleden, dat wel. De bijgewerkte code hieronder vertelt eigenlijk wel het verhaal.

1
2
3
4
5
6
7
curr = conn.cursor()
try:
    curr.execute("""delete from tabel where x > 2""")
    conn.commit()
except Exception, e:
    conn.rollback()
    exit(1)

Ik heb me laten misleiden door de autocommit bij andere modules binnen PHP en Perl als een eerstejaars. Want daar stopt autocommit op het moment dat je een transactie start, maar hier moet alles worden gecommit. Dit doet met denken of er dirty read/write bugs zijn te vinden in sommige applicaties die autocommit doen.