How to Leverage Extensions in PostgreSQL

PostgreSQL offers a rich ecosystem of extensions that extend its functionality, enabling users to handle complex use cases efficiently. Many useful extensions are available in the PostgreSQL /contrib directory. These extensions can be compiled and installed as follows:

Compiling and Installing Extensions

1. Navigate to the extension's directory:

make
sudo make install

2. Enable the extension inside the database:

CREATE EXTENSION extension_name;

Exploring and leveraging these extensions can significantly enhance PostgreSQL's capabilities, allowing you to address a wide range of tasks more efficiently.

In this blog, we will explore six major PostgreSQL extensions and their features, operations, and practical examples. Some of those extensions are:

1. Hstore

2. Auto_explain

3. pg_stat_statements

4. pg_visibility

5. Bloom

Each extension provides unique capabilities to solve specific problems, making PostgreSQL a versatile database solution.

1. Hstore: Managing Key-Value Pairs

What is Hstore?

Hstore is a key-value store data type in PostgreSQL. It allows you to store sets of key-value pairs within a single database column. Each key and value is stored as text, making it a flexible option for handling semi-structured data where complex JSON operations are unnecessary.

Key Features of Hstore:

* Key-Value Storage: Stores pairs of strings.

* Text-Based Storage: All keys and values are stored as text.

* Semi-Structured Data: Perfect for scenarios where schema flexibility is needed but full JSON complexity isn't required.

Main Operations:

* Key Existence Checking (?****): Check if a specific key exists.

* Key-Value Pair Checking (@>****): Verify if a specific key-value pair exists.

* Value Retrieval (->****): Retrieve the value associated with a specific key.

* Key/Value Extraction (akeys()****, avals()): Extract keys or values as arrays.

* Merging Hstores (||****): Combine two hstore objects.

* Deleting Keys (delete()****): Remove a specific key from an hstore.

Setting Up Hstore

Enable Hstore Extension

To use hstore, you need to enable the extension. This requires superuser privileges:

-- Enable the hstore extension
-- Superuser privileges are required to enable this extension because it modifies the database's schema and makes system-level changes.
CREATE EXTENSION IF NOT EXISTS hstore;

Create a Table with an Hstore Column

Here's how you can create a table with an hstore column:

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    attributes hstore
);

Working with Hstore: Key Query Examples

Insert Operations

-- Insert key-value pairs using simple string syntax for hstore
INSERT INTO products (name, attributes) 
VALUES ('Laptop', 'brand=>Dell, color=>black, ram=>16GB, storage=>512GB');
-- Insert using a mix of syntax
INSERT INTO products (name, attributes) 
VALUES ('Phone', '"screen size"=>"6.1 inches", color=>blue, storage=>256GB');

Check if a Key Exists

-- Find rows where the key 'brand' exists in the attributes
SELECT name, attributes 
FROM products 
WHERE attributes ? 'brand';

Check if a Key-Value Pair Exists

-- Find rows with the key-value pair 'color=>black'
SELECT name, attributes 
FROM products 
WHERE attributes @> 'color=>black';

Retrieve Value for a Key

-- Extract the value of the 'brand' key from attributes
SELECT name, attributes -> 'brand' AS brand 
FROM products;

Extract Keys and Values

-- Get all keys and values from the hstore column as arrays
SELECT name,
       akeys(attributes) AS keys,
       avals(attributes) AS values
FROM products;

Delete a Key

-- Remove the 'year' key from attributes for a specific product
UPDATE products 
SET attributes = delete(attributes, 'year')
WHERE name = 'Tablet';

Add or Update Key-Value Pairs

-- Add or update multiple key-value pairs
UPDATE products 
SET attributes = attributes || 
    'warranty=>2years, condition=>new'::hstore
WHERE name = 'Laptop';

2. Auto_explain: Analyzing query execution plans

Auto_explain is an extension that automatically logs the execution plans of queries exceeding a specified execution time. By capturing this information, it enables you to identify slow or inefficient queries, helping you optimize database performance.

How to Enable auto_explain?

Step 1: Load the Module

LOAD 'auto_explain';

This dynamically loads the module for the current session. For permanent use, add it to the shared_preload_libraries in postgresql.conf:

shared_preload_libraries = 'auto_explain'

Step 2: Configure Parameters

ALTER SYSTEM SET auto_explain.log_min_duration = '100ms';  -- Log queries taking > 100ms
ALTER SYSTEM SET auto_explain.log_analyze = 'true';        -- Include EXPLAIN ANALYZE output
ALTER SYSTEM SET auto_explain.log_buffers = 'true';        -- Include buffer usage
ALTER SYSTEM SET auto_explain.log_timing = 'true';         -- Include timing information
ALTER SYSTEM SET auto_explain.log_nested_statements = 'true'; -- Log nested statements
ALTER SYSTEM SET auto_explain.log_verbose = 'true';        -- Use verbose output
ALTER SYSTEM SET auto_explain.log_format = 'json';         -- Output format (text/json/xml/yaml)

Step 3: Reload Configuration

SELECT pg_reload_conf();

This applies to the configuration changes without restarting the server.

Optional: Enable for Current Session Only

SET auto_explain.log_analyze = 'true';
SET auto_explain.log_min_duration = '100ms';

Step 4: Creating Sample Tables and Data

The script creates two tables (orders and order_items) and populates them with random data to simulate a real-world workload.

Step 5: Executing Example Queries

The script provides two complex queries that are likely to trigger auto_explain logging due to their computation complexity, multiple joins, and aggregate functions.

When these queries are executed:

1. If the execution time exceeds 100ms (log_min_duration), the query's execution plan is automatically logged in the configured format (json in this case).

2. The log includes detailed execution statistics, such as timing, buffer usage, and the structure of the query plan.

3. Nested queries and additional details are also logged if enabled.

3. pg_stat_statements: Query Performance Monitoring

pg_stat_statements is a PostgreSQL extension used to track query execution statistics and performance. It helps in monitoring and optimizing query performance by providing insights into slow queries, query patterns, and resource usage.

Key Features

* Query Normalization: Normalizes queries by removing literal values, making it easier to group similar queries.

* Execution Statistics: Includes total execution time, number of calls, and average time per call.

* I/O Statistics: Tracks cache hits and disk reads.

* Cache Hit Ratios: Shows the percentage of blocks served from memory versus disk.

* Resource Usage Tracking: Monitors the impact of queries on system resources.

1. Top 10 Slowest Queries by Total Time

SELECT 
    substring(query, 1, 100) as query_excerpt,  
    calls as number_of_executions,
    round((total_exec_time/1000/60)::numeric, 2) as total_minutes,
    round((total_exec_time/1000/calls)::numeric, 2) as avg_seconds_per_call,
    rows as total_rows
FROM pg_stat_statements 
ORDER BY total_exec_time DESC 
LIMIT 10;

2. Most Frequently Run Queries

SELECT 
    substring(query, 1, 100) as query_excerpt,
    calls as number_of_executions,
    round((total_exec_time/1000/calls)::numeric, 2) as avg_seconds_per_call,
    rows as total_rows
FROM pg_stat_statements 
ORDER BY calls DESC 
LIMIT 10;

3. Queries with Worst I/O Performance (High Block Read)

SELECT 
    substring(query, 1, 100) as query_excerpt,
    calls,
    shared_blks_hit as cache_hits,
    shared_blks_read as disk_reads,
    round((shared_blks_hit::float / (shared_blks_hit + shared_blks_read + 1) * 100)::numeric, 2) as cache_hit_ratio
FROM pg_stat_statements 
WHERE shared_blks_read > 1000  
ORDER BY shared_blks_read DESC 
LIMIT 10;

4. Reset Statistics

This query resets the pg_stat_statements statistics, which is useful for periodic performance tracking (e.g., monthly).

SELECT pg_stat_statements_reset();

4. pg_visibility: Analyzing Table Visibility

The pg_visibility extension allows you to examine the visibility map and page-level visibility bits in your PostgreSQL tables. This visibility information is crucial for PostgreSQL's VACUUM process and query optimization. When a page is marked as "all-visible," it means that all tuples on that page are visible to all transactions, allowing PostgreSQL to skip certain visibility checks during table scans.

Understanding visibility maps is crucial because they directly impact:

* Query Performance

* VACUUM Efficiency

* Storage Optimization

* Overall Database Health

Example Use Case: E-commerce Platform

1. Checking Table Visibility

SELECT relname as table_name,
       pg_size_pretty(pg_table_size(oid)) as table_size,
       pg_visibility_map_summary(oid) as visible_pages_ratio
FROM pg_class
WHERE relkind = 'r'  
  AND relname = 'products';

2. Investigating Problem Pages

SELECT blkno, all_visible, all_frozen
FROM pg_visibility('products')
WHERE all_visible = false
LIMIT 5;

3. Resolution: Running VACUUM

VACUUM (VERBOSE, ANALYZE) products;
4. Verification
SELECT pg_visibility_map_summary('products') as visible_pages_ratio;

5. Bloom: Efficient Multi-Column Indexing

A Bloom index in PostgreSQL is a specialized index type provided by the Bloom extension, designed for efficiently indexing multiple columns, especially when you need to perform queries that involve conditions on several of these columns. It uses the Bloom filter algorithm, a probabilistic data structure that offers fast membership testing with the tradeoff of occasional false positives.

Key Features

Bloom indexes in PostgreSQL are designed for efficient multi-column indexing by combining column values into a compact, hash-based representation. They use a fixed-size bit array for space-efficient storage, making them ideal for large datasets. While they excel at optimizing multi-condition queries (e.g., combining multiple WHERE conditions with AND), they may produce false positives, which are filtered out during query execution, making them unsuitable for scenarios requiring absolute accuracy. Bloom indexes are immutable, requiring a rebuild when the underlying table data changes.
To demonstrate the effectiveness of a Bloom index compared to B-Tree indexes, we can create examples using the same table and dataset, showing query execution times for multi-column queries. This will illustrate when a Bloom index is advantageous over separate B-Tree indexes.

Example

Step 1: Enable the bloom Extension

CREATE EXTENSION IF NOT EXISTS bloom;

Step 2: Create a Table

CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    name TEXT,
    category TEXT,
    price NUMERIC
);

Step 3: Create a Bloom Index

CREATE INDEX bloom_index ON products USING bloom (name, category, price)
WITH (length = 80, col1 = 2, col2 = 2, col3 = 2);

* length: Total length of the Bloom filter in bits.

* col1, col2, col3: Number of hash functions for each column.

PostgreSQL extensions unlock powerful features and allow developers to extend the database's capabilities far beyond its core functions. In addition to the extensions discussed, many others are available in the /contrib directory, ready to be compiled and installed. By leveraging these tools, users can optimize performance, enhance functionality, and solve complex data challenges with ease.

whatsapp_icon
location

Calicut

Cybrosys Technologies Pvt. Ltd.
Neospace, Kinfra Techno Park
Kakkancherry, Calicut
Kerala, India - 673635

location

Kochi

Cybrosys Technologies Pvt. Ltd.
1st Floor, Thapasya Building,
Infopark, Kakkanad,
Kochi, India - 682030.

location

Bangalore

Cybrosys Techno Solutions
The Estate, 8th Floor,
Dickenson Road,
Bangalore, India - 560042

Send Us A Message