Feb 06, 2025

How to Use Partitioning in PostgreSQL for Efficient Data Management

Partitioning in PostgreSQL is an essential technique for managing large datasets by breaking down tables into smaller, more efficient sections. This approach improves query performance, simplifies data administration, and expands scalability.

In this blog, we focus on the several partitioning techniques that PostgreSQL provides, such as hash, list, and range partitioning, and provide useful details on how to use each technique for the best database performance.

What is partitioning?

Partitioning is the process of splitting a large table into smaller sub-tables, called partitions. These partitions share the same schema but store distinct subsets of data. PostgreSQL automatically routes data to the correct partition based on specified rules.

Benefits of Partitioning

* Query Performance: Queries targeting specific partitions are faster as they scan smaller datasets.

* Data Management: Easier to manage and archive old data by detaching partitions.

* Efficient Maintenance: Partition-level operations (like vacuum and reindexing) reduce overhead.

* Scalability: Helps handle large datasets by distributing them across multiple partitions.

Partitioning Strategies

PostgreSQL supports the following partitioning strategies:

1. Range Partitioning

Data is partitioned based on a continuous range of values. Each partition covers a distinct range that does not overlap with others. This method is best suited for sequential data like dates, numeric series, or IDs.

Use Cases: Range partitioning is ideal for time-series data (e.g., logs, sales transactions by date) and historical records (e.g., archive tables by year or quarter).

Example:

CREATE TABLE sales (
    sale_date date not null,
    amount numeric
) PARTITION BY RANGE (sale_date);
CREATE TABLE sales_2023 PARTITION OF sales
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

If you try to insert data into the sales table before creating partitions for it, it will cause an error, because:

* A PARTITION BY RANGE table acts as a parent table and does not directly store any data.

* All data must go into one of the child partition tables that you define.

* Without any partitions, there is no destination for the data to be inserted, leading to an error.

Example Error

ERROR:  no partition of relation "sales" found for row
DETAIL:  Failing row contains (2025-01-07, 1000).

This approach allows efficient querying of specific date ranges. Older partitions can be detached and archived, and purging outdated data is simplified by dropping partitions.

Challenges: Managing partition boundaries carefully is necessary, and queries that span multiple partitions can be slower.

2. List Partitioning

With list partitioning, data is distributed into partitions based on distinct, predefined values. Each partition is associated with specific values from a column, making this method particularly suitable for categorical data.

Use Cases: Common use cases include country codes, regions, product categories, and status values (e.g., "active", "inactive").

Example:

CREATE TABLE orders (
    order_id int,
    country_code text
) PARTITION BY LIST (country_code);
CREATE TABLE orders_us PARTITION OF orders
    FOR VALUES IN ('US', 'USA');
CREATE TABLE orders_ca PARTITION OF orders
    FOR VALUES IN ('CA', 'CAN');

Benefits: List partitioning is highly efficient for queries involving exact matching and simplifies the addition or removal of specific value partitions.

Challenges: Data distribution can become uneven if certain values dominate, and managing large numbers of partitions for numerous discrete values can be complex.

3. Hash Partitioning

Hash partitioning distributes data across partitions using a hash function, ensuring even row distribution. It is ideal when data has no obvious partition key.

Use Cases: Hash partitioning is useful for distributing user records (by user ID) and large data sets randomly.

Example:

Step 1: Define the users table with hash partitioning

CREATE TABLE users (
    user_id int,
    name text
) PARTITION BY HASH (user_id);

Step 2: Create the partitions

Since we are partitioning by hash, we’ll create four partitions. You can adjust the modules as needed.
CREATE TABLE users_0 PARTITION OF users
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE users_1 PARTITION OF users
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE users_2 PARTITION OF users
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE users_3 PARTITION OF users
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);

Step 3: Insert data into the users table

Once the partitions are created, you can insert data into the users table.

-- Insert data into the users table
INSERT INTO users (user_id, name)
VALUES 
    (1, 'Alice'),
    (2, 'Bob'),
    (3, 'Charlie'),
    (4, 'David'),
    (5, 'Eva'),
    (6, 'Frank'),
    (7, 'Grace'),
    (8, 'Hannah'),
    (9, 'Irene'),
    (10, 'Jack'),
    (11, 'Kathy'),
    (12, 'Leo'),
    (13, 'Mona'),
    (14, 'Nina'),
    (15, 'Oscar'),
    (16, 'Paul');

Now, your rows should be distributed into the respective partitions (users_0, users_1, users_2, users_3) based on the hash of the user_id value.

Benefits: Hash partitioning ensures automatic even data distribution and prevents hot spots in specific partitions.

Challenges: Predicting which partition contains specific data is difficult, and there is no natural way to query a single partition directly.

4. Sub-partitioning

PostgreSQL's sub-partitioning, also known as multi-level partitioning, is a powerful feature that allows you to partition your tables using multiple criteria. Think of it as creating partitions within partitions, like organizing a library where books are first sorted by genre (fiction/non-fiction) and then by author's last name within each genre.

Example:

Step 1: Define the global_sales table with range partitioning by sale_date

CREATE TABLE global_sales (
    sale_date date,
    country_code text,
    amount numeric
) PARTITION BY RANGE (sale_date);

Step 2: Create the first-level partitions for sale_date

Since we are partitioning by sale_date using RANGE, we’ll create partitions for the years 2023 and 2024.

-- Create the first-level partition for 2023
CREATE TABLE global_sales_2023 PARTITION OF global_sales
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01')
    PARTITION BY LIST (country_code);
-- Create the first-level partition for 2024
CREATE TABLE global_sales_2024 PARTITION OF global_sales
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01')
    PARTITION BY LIST (country_code);

Step 3: Create the sub-partitions within each first-level partition for country_code

Now, for each year partition (global_sales_2023 and global_sales_2024), we’ll create sub-partitions by country_code using LIST.

-- Create sub-partitions for each country within 2023
CREATE TABLE global_sales_2023_us PARTITION OF global_sales_2023
    FOR VALUES IN ('US', 'USA');
CREATE TABLE global_sales_2023_ca PARTITION OF global_sales_2023
    FOR VALUES IN ('CA', 'CAN');
CREATE TABLE global_sales_2023_in PARTITION OF global_sales_2023
    FOR VALUES IN ('IN', 'IND');
-- Create sub-partitions for each country within 2024
CREATE TABLE global_sales_2024_us PARTITION OF global_sales_2024
    FOR VALUES IN ('US', 'USA');
CREATE TABLE global_sales_2024_ca PARTITION OF global_sales_2024
    FOR VALUES IN ('CA', 'CAN');
CREATE TABLE global_sales_2024_in PARTITION OF global_sales_2024
    FOR VALUES IN ('IN', 'IND');

Step 4: Insert data into the global_sales table

Once the partitions are created, you can insert data into the global_sales table. The data will automatically be routed to the correct sub-partitions based on the sale_date and country_code values.

-- Insert data into the global_sales table
INSERT INTO global_sales (sale_date, country_code, amount)
VALUES 
    ('2023-06-15', 'US', 150.00),  -- Goes to global_sales_2023_us
    ('2023-07-20', 'CA', 200.00),  -- Goes to global_sales_2023_ca
    ('2023-08-10', 'IN', 175.00),  -- Goes to global_sales_2023_in
    ('2024-02-15', 'US', 225.00),  -- Goes to global_sales_2024_us
    ('2024-03-20', 'CA', 250.00),  -- Goes to global_sales_2024_ca
    ('2024-04-25', 'IN', 275.00);  -- Goes to global_sales_2024_in

Step 5: Query specific combinations of sale_date and country_code

You can now query specific date ranges and countries, and PostgreSQL will only scan the relevant partitions for improved performance.

-- Query to get sales data for the US in 2023
SELECT * FROM global_sales 
WHERE sale_date BETWEEN '2023-01-01' AND '2023-12-31' 
AND country_code = 'US';  -- This will only scan global_sales_2023_us

Benefits: Composite partitioning allows fine-grained control over data organization and is optimal for scenarios with multiple segmentation criteria.

Challenges: It can be complex to manage and design, often leading to a high number of partitions.

Key Considerations for Choosing a Partition Strategy

When selecting a partitioning strategy, it's important to consider data growth patterns and the potential for even distribution. Common query filters, such as those in WHERE clauses, should guide your choice of partitioning method. For maintenance, Range and List partitioning are typically more favorable, while Hash partitioning may be better suited for high insert rates. To optimize storage, consider archiving or dropping partitions when they are no longer needed.

Best Practices

Partitioning is most beneficial for large tables containing millions of rows or more. Regularly monitor partition sizes to maintain a balanced distribution of data. Automate partition creation and archiving through scripts, and periodically assess the effectiveness of your partitioning strategy, adjusting it as necessary.

Partitioning in PostgreSQL is a robust tool for managing large datasets efficiently. By dividing tables into smaller, focused segments, it enhances query performance, simplifies maintenance, and supports scalability. Choosing the right strategy—Range, List, Hash, or Sub-partitioning—depends on your data patterns and query needs. With thoughtful planning and regular monitoring, partitioning can optimize your database, streamline operations, and prepare it for future growth.

How to Use Partitioning in PostgreSQL for Efficient Data Management

What is partitioning?

Benefits of Partitioning

Partitioning Strategies

1. Range Partitioning

2. List Partitioning

3. Hash Partitioning

4. Sub-partitioning

Key Considerations for Choosing a Partition Strategy

Best Practices

Category

Related Post