Size and fill rate of PostgreSQL tables

Introduction

In PostgreSQL, it is essential to understand how disk space is allocated to tables. This knowledge is crucial for efficient resource management, capacity planning, and preventing performance degradation.

Indeed, unlike some database systems that store large tables in a single file, PostgreSQL uses a segmentation strategy.

Understanding Tablespaces

Before exploring the functions used to determine table sizes, it is important to understand the concept of tablespaces.

A tablespace is a physical location on the disk where database data is stored, including tables, indexes, and other objects.

PostgreSQL uses tablespaces to map database objects’ logical names to physical disk locations. By default, PostgreSQL provides two tablespaces: pg_default and pg_global. pg_default is used to store most data, while pg_global is used for global system objects.

Connecting to the Database and Selecting a Table

Before executing the SQL commands to get the size and fill rate of tables, you must connect to the PostgreSQL database and select the target table. Here are the steps to follow:

  1. Connect to your database using a SQL client like psql.
  2. Select the database that contains the table you want to analyze.
  3. Use the command \dt to display the list of tables in the selected database.

Once these steps are completed, you can run the SQL commands described in the following sections.

Obtaining the Global Database Size

To measure the total size occupied by the current database on the disk, PostgreSQL offers the pg_database_size() function. It accepts the database name or OID as a parameter. You can combine it with pg_size_pretty() to get a human-readable format:

SELECT pg_database_size(current_database()) AS total_bytes, pg_size_pretty(pg_database_size(current_database())) AS total_pretty;

Global and Detailed Analysis of All Tables

If you want to obtain an ordered overview of all tables in your database, including the breakdown between raw data (tables), indexes, and the percentage relative to the total database size, here is a complete and optimized SQL query (excluding system schemas):

SELECT
    n.nspname AS schema_name,
    c.relname AS table_name,
    pg_table_size(c.oid) AS data_bytes,
    pg_size_pretty(pg_table_size(c.oid)) AS data_pretty,
    pg_indexes_size(c.oid) AS index_bytes,
    pg_size_pretty(pg_indexes_size(c.oid)) AS index_pretty,
    pg_total_relation_size(c.oid) AS total_bytes,
    pg_size_pretty(pg_total_relation_size(c.oid)) AS total_pretty,
    round(100.0 * pg_total_relation_size(c.oid) / pg_database_size(current_database()), 2) AS percent
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(c.oid) DESC;

Obtaining the Space Allocated to a Table

PostgreSQL offers several functions to get the size of a table. Here are the most common ones:

  • pg_table_size(‘table_name’): Returns the table size in bytes, excluding indexes. This function includes free space, visibility maps, and data stored in TOAST tables. It is important to note that the TOAST (The Oversized-Attribute Storage Technique) technique allows PostgreSQL to handle large data fields that do not fit in a standard page. TOAST stores this data in a separate storage area and keeps a pointer to it in the main table.
  • pg_indexes_size(‘table_name’): Returns the total size of all indexes attached to the table in bytes.
  • pg_total_relation_size(‘table_name’): Returns the total size of the table in bytes, including indexes and all associated objects.

To display the size in a human-readable format (kB, MB, GB…), you can combine the pg_size_pretty() function with the previous functions. For example:

SELECT pg_size_pretty(pg_total_relation_size('my_table'));

Obtaining the Fill Rate of a Table

There is no single function to directly get the fill rate of a table in PostgreSQL. However, you can calculate it using the information provided by the pgstattuple('table_name') function and considering the space occupied by “dead” tuples and free space.

Here are the columns returned by pgstattuple() and their meanings:

Column Description
table_len Physical length of the relation in bytes
tuple_count Number of live tuples
tuple_len Total length of live tuples in bytes
tuple_percent Percentage of live tuples
dead_tuple_count Number of dead tuples
dead_tuple_len Total length of dead tuples in bytes
dead_tuple_percent Percentage of dead tuples
free_space Total free space in bytes
free_percent Percentage of free space

Calculating the fill rate:

To calculate the fill rate, follow these steps:

Step Formula
Live space live_space = tuple_len
Total space used total_space_used = table_len – free_space
Fill rate fill_rate = (live_space / total_space_used) * 100

Example:

Let’s take the concrete example of the pgstattuple() output presented later in this article.

table_len | 458752

tuple_count | 1470

tuple_len | 438896

tuple_percent | 95.67

dead_tuple_count | 11

dead_tuple_len | 3157

dead_tuple_percent | 0.69

free_space | 8932

free_percent | 1.95

Applying the formulas above, we get:

  • live_space = 438896
  • total_space_used = 458752 – 8932 = 449820
  • fill_rate = (438896 / 449820) * 100 = 97.57%

Remarks:

  • The total space used (table_len – free_space) includes the space occupied by live tuples, dead tuples, and table overhead (page headers, tuple pointers, etc.).
  • The fill rate does not take index space into account.
  • “Dead” tuples are records that have been deleted or updated but are not yet physically removed from the table. They can be reclaimed using the VACUUM command.
  • There are two variants of the VACUUM command: VACUUM and VACUUM FULL. VACUUM reclaims space and makes it available for reuse within the database, while VACUUM FULL is more aggressive, locks the table, removes dead tuples and empty pages, and returns the reclaimed space to the operating system. VACUUM FULL can be resource-intensive and requires an exclusive lock on the table.

Concrete Example

Here is an example of using pg_total_relation_size() to get the size of the “actor” table:

SELECT pg_size_pretty(pg_total_relation_size('actor'));

Result:

72 kB

And here is an example of the output of the pgstattuple() function:

SELECT * FROM pgstattuple('pg_catalog.pg_proc');

Result:

table_len | 458752

tuple_count | 1470

tuple_len | 438896

tuple_percent | 95.67

dead_tuple_count | 11

dead_tuple_len | 3157

dead_tuple_percent | 0.69

free_space | 8932

free_percent | 1.95

Automation in Python

To automate the monitoring of your databases and tables’ size, you can use a Python script. Here is an example of a function that connects via a SQL client and retrieves this information as a structured dictionary:

def get_database_size_analysis(self, client):
    """Fetch total database size and top tables breakdown."""
    sf = self._get_schema_filter()

    # 1. Total Database Size
    db_size_query = (
        "SELECT pg_database_size(current_database()) AS total_bytes, "
        "pg_size_pretty(pg_database_size(current_database())) AS total_pretty;"
    )

    try:
        db_size_rows = client.execute_query(db_size_query)
        db_size = db_size_rows[0] if db_size_rows else None
        total_db_bytes = db_size['total_bytes'] if db_size and db_size['total_bytes'] > 0 else 1

        # 2. Table Breakdown (Data, Index, Total)
        table_size_query = f"""
        SELECT
            n.nspname AS schema_name,
            c.relname AS table_name,
            pg_table_size(c.oid) AS data_bytes,
            pg_size_pretty(pg_table_size(c.oid)) AS data_pretty,
            pg_indexes_size(c.oid) AS index_bytes,
            pg_size_pretty(pg_indexes_size(c.oid)) AS index_pretty,
            pg_total_relation_size(c.oid) AS total_bytes,
            pg_size_pretty(pg_total_relation_size(c.oid)) AS total_pretty,
            round(100.0 * pg_total_relation_size(c.oid) / {total_db_bytes}, 2) AS percent
        FROM pg_class c
        JOIN pg_namespace n ON n.oid = c.relnamespace
        WHERE c.relkind = 'r'
          AND n.nspname NOT IN ('pg_catalog', 'information_schema')
          {sf}
        ORDER BY pg_total_relation_size(c.oid) DESC;
        """

        table_sizes = client.execute_query(table_size_query) or []
        return {
            "database": db_size,
            "tables": table_sizes
        }
    except Exception as e:
        logging.error(f"Failed to fetch size analysis: {e}")
        return None

Best Practices for Managing Table Size and Fill Rate

To maintain optimal performance of your PostgreSQL database, it is important to efficiently manage the size and fill rate of your tables. Here are some best practices to follow:

  • Regularly monitor your table sizes using the functions described in this article.
  • Analyze table bloat using tools like pgstattuple() to identify tables containing a large number of dead tuples.
  • Run the VACUUM command regularly to remove dead tuples and free up space.
  • Consider using VACUUM FULL for highly fragmented tables, but be aware of its performance impact.
  • Optimize table design by choosing appropriate data types and avoiding unnecessary columns.
  • Use indexes wisely to improve query performance, but remember that indexes also consume disk space.

Conclusion

PostgreSQL provides a variety of functions to determine table size, including pg_table_size(), pg_indexes_size(), and pg_total_relation_size().

The pgstattuple() function allows analyzing the space occupied by live tuples, dead tuples, and free space in a table.

By combining this information, you can calculate the fill rate of a table.

It is crucial to monitor table size and fill rate to ensure database performance and efficient resource allocation. The VACUUM command, with its ‘VACUUM’ and ‘VACUUM FULL’ variants, plays an important role in reclaiming disk space occupied by dead tuples.

By following table size management best practices and using the tools provided by PostgreSQL, you can optimize disk space utilization and maintain your database’s performance.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.