PostgreSQL uses TOAST (The Oversized-Attribute Storage Technique) to handle large text and bytea data types efficiently. When a table row contains very large text or binary data, PostgreSQL automatically stores these large values in TOAST tables and keeps only a reference in the main table.
What is a TOAST Table?
A TOAST table is an auxiliary table automatically created by PostgreSQL for each table containing columns that can store large data types, such as text, bytea, or jsonb. When a column value exceeds a certain size threshold (toast_tuple_target, typically 2KB), PostgreSQL moves the oversized value to the TOAST table.
Key Characteristics of TOAST Tables:
* Automatically created by PostgreSQL for supported data types.
* Store large column values that exceed the inline storage threshold.
* Use compression (PGLZ or LZ4) to reduce storage space.
* Maintain references in the main table for quick lookup.
Each TOAST table contains oversized data, and the main table stores only a reference (OID) pointing to the corresponding TOAST entry.
The TOAST management system provides four strategies for handling TOAST-able columns on disk:
1. PLAIN: Disables both compression and out-of-line storage. This is the only strategy applicable to columns with non-TOAST-able data types.
2. EXTENDED: Enables both compression and out-of-line storage. This is the default strategy for most TOAST-able data types. Compression is attempted first; if the row remains too large, out-of-line storage is used.
3. EXTERNAL: Allows out-of-line storage but disables compression. This strategy improves the performance of substring operations on large text and bytea columns by fetching only the necessary parts of the out-of-line value. However, it increases storage space usage.
4. MAIN: Enables compression but restricts out-of-line storage. Out-of-line storage is still possible as a last resort, but only if it is the only way to make the row fit within a page.
Each TOAST-able data type has a default strategy, but the strategy for individual table columns can be modified as needed.
ALTER TABLE ... SET STORAGE
The Role of Compression in TOAST Storage
To optimize storage and improve retrieval performance, PostgreSQL employs data compression techniques before storing large values in TOAST tables. The two primary compression methods used are PGLZ and LZ4.
Threshold for TOAST Storage and Compression
Not every data value is stored in TOAST tables, nor is every value compressed. PostgreSQL uses a threshold value, defined by the toast_tuple_target parameter, to determine when data should be moved to a TOAST table.
* By default, toast_tuple_target is set to 2KB.
* Values larger than this threshold are considered for storage in TOAST tables.
* Compression is applied only if the compressed size is smaller than the original by a sufficient margin (typically 25% or more).
* Smaller values remain inline within the main table row for faster access.
This behavior ensures that the overhead of compression and TOAST storage is justified by the resulting space savings.
What is PGLZ Compression?
PGLZ (PostgreSQL Lightweight Compression) is a default compression algorithm used in PostgreSQL for TOAST storage. It is designed specifically for PostgreSQL with a focus on reducing storage usage while keeping computational overhead relatively low.
Key Features of PGLZ:
* Optimized for compressing moderately large data.
* Works well for text and bytea data types.
* Balances between compression ratio and CPU overhead.
How PGLZ Works: PGLZ uses pattern matching to identify and replace repeated byte sequences with shorter references. It focuses on minimizing overhead in both compression and decompression processes.
Configuration: PGLZ compression is enabled by default in PostgreSQL. You can fine-tune compression behavior using parameters like toast_tuple_target to control when TOAST storage is used.
What is LZ4 Compression?
LZ4 is a high-performance compression algorithm known for its lightning-fast compression and decompression speeds. Starting from PostgreSQL 14, LZ4 has been introduced as an alternative to PGLZ.
Key Features of LZ4
* Extremely fast compression and decompression.
* Slightly larger compressed output compared to PGLZ.
* Suitable for use cases prioritizing performance over storage efficiency.
* Effective for large datasets and real-time applications.
How LZ4 Works: LZ4 operates by scanning data blocks and replacing repeating sequences with references, similar to PGLZ. However, it uses a more optimized approach, making it significantly faster.
Configuration: To enable LZ4 compression in PostgreSQL, modify the toast_compression parameter:
ALTER TABLE table_name SET (toast_compression = 'lz4');
You can also set it globally using:
SET default_toast_compression = 'lz4';
How PostgreSQL Fetches Compressed Data from TOAST Tables
When querying large text stored in a TOAST table, PostgreSQL follows these steps:
1. The main table row contains a reference to the TOAST table.
2. PostgreSQL fetches the compressed data from the TOAST table.
3. Depending on the compression method (PGLZ or LZ4), PostgreSQL decompresses the data.
4. The decompressed data is returned to the client.
The choice between PGLZ and LZ4 impacts both the speed of decompression and the storage size of the compressed data.
PGLZ vs LZ4: A Quick Comparison
Feature | PGLZ | LZ4 |
Compression Speed | Moderate | Very Fast |
Decompression Speed | Moderate | Extremely Fast |
Compression Ratio | Higher | Slightly Lower |
Use Case | General Purpose | Real-time & High-Performance |
Which One to Choose?
* Use PGLZ when storage savings are a higher priority.
* Use LZ4 when speed and performance are critical.
Conclusion
PostgreSQL's TOAST (The Oversized-Attribute Storage Technique) system provides a robust and flexible solution for handling large data values in databases. The system intelligently manages storage through automatic table creation, compression, and customizable storage strategies (PLAIN, EXTENDED, EXTERNAL, and MAIN). With two compression options available - the storage-efficient PGLZ and the high-performance LZ4 - database administrators can optimize their systems based on specific needs. PGLZ is ideal for scenarios where storage space is a primary concern, while LZ4 is better suited for applications requiring faster processing and real-time data access. This flexibility in PostgreSQL's large data handling mechanisms allows organizations to strike the right balance between storage efficiency and performance based on their unique requirements. To learn more about achieving seamless data synchronization, explore our guide on How to Set Up Live Replication in PostgreSQL for detailed instructions and best practices.