Redshift vs. Postgres: Detailed Comparison of Performance and Functionality

--

The critical differences of Redshift vs. Postgres:

1. Table data is implemented via rows with Postgres, while Table data is implemented via columns with Redshift,

2. Indexing and Keys are handled differently by Redshift and Postgres,

3. A Clustered vs Single Node is the key architectural difference of Redshift and Postgres implementations.

This article will define, discover and compare the performance and functionality differences of both Redshift and Postgres, and how theses variances can potentially determine which choice you will make when deciding between the two when selecting a database warehouse solution to manage your critical business data.

Table of Contents

General overview of Redshift and Postgres

Performance comparison — Redshift vs. Postgres

Functionality comparison — Redshift vs. Postgres

When you would use one over the other? — Redshift vs. Postgres

Conclusion

General overview of Redshift and Postgres

Redshift

Redshift, which is owned by Amazon, is an amazingly fast and extremely powerful relational data warehouse solution: it has a strong following within the AWS cloud, and its popularity with businesses and users is growing rapidly.

Redshift’s ability to scale data upwards in size is easy to implement and manage: it can scale to petabytes of raw data with ease. With such a large amount of information that it houses and governs, its therefore more than reasonable to declare that high performance query optimization, and reporting of large data volumes, is readily attainable using Redshift. Business intelligence (BI) and online analytic processing (OLAP) applications that use numerous and complex queries are precisely what Redshift is designed to manage.

Postgres

Postgres is also a powerful database management system, that has been in existence for more than 30 years. It is a free, open source object-relational database system, recognized in industry as being exceptionally reliable with a more than satisfactory performance experience for its end users. Being a free product implies that the administrators will only need to procure the underlying infrastructure where Postgres will be installed on. One downside is the time and effort that will be required for ongoing RDBMS operational tasks and maintenance, as well as any project related activities such as upgrading the storage capacity if upward scaling is required.

Interestingly, the name Postgres referred to it being recognized as the successor to the Ingres database system which was developed by University of California.

Performance comparison — Redshift vs. Postgres

Amazon Redshift has been designed to house large amounts of data and manage and maintain this data with upward scalability more than likely to occur. With this in mind, the data schema and RDMS engine that Redshift has designed is quite different from that of Postgres (even though Redshift was initially a by-product from Postgres!).

For an online transaction processing application that stores data in rows within a database table, Amazon Redshift has made a technical directional choice to implement and store data in columns. What this means is that the stored data — in a column-oriented format — is that the essential data that you are searching for can be accessed very quickly as you do not need to read all of the non-relevant data existing from multiple rows.

This type of RDBMS is typically referred to as a columnar database, and its popularity is increasing significantly. The primary benefit you will realize is that your queries will become very fast — and this is vital when analyzing large amounts of data via multiple business queries. Further performance gains can be realized by utilizing compression encodings on the column data as well.

Postgres on the other hand is totally different to Redshift in that it abides by the more traditional RDBMS technical approach of increasing table size via inserting row data. Another crucial difference is that Postgres is a single database connection, and it cannot utilize more than one CPU.

One drawback of Redshift is how it handles foreign keys, and foreign key constraints. Redshift essentially treats foreign keys as basically developer documentation, and technically they are not respected: foreign keys and their constraints are not enforced when inserting or deleting column data.

So why is it that Redshift is so fast at what it does?

There are three key reasons that Redshift owes to its superior speed when compared with Postgres:

1. Columnar storage is compressed: because Redshift stores data in organized columns, the data can be readily and easily read and retrieved (regardless of the column size) and hence copied into RAM for end user manipulation. Contrast this with Postgres that stores its data via rows, which means to understand aggregated data values you need to read and retrieve all of the row data in the first instance.

2. Block storage: Redshift stores the retrieved columnal data in storage blocks and coupled with multiple nodes (as Redshift’s clustering architectural model denotes) allows Redshift parallel processing resourcing capability to compute data analytics much quicker than what Postgres can manage.

3. Clustering: Amazon Redshift is a cluster of nodes, with one node designated as the ‘leader’ node, with one or more compute nodes. A Redshift cluster can achieve a much higher Input/ Output Operations Per Second (IOPS) than what a Postgres instance can, as every node will read from a different disk and therefore the IOPS sum will occur across the whole cluster.

Postgres can provide terrific performance and reliability as a data warehouse for a smaller volume of data; however, it cannot match the performance behavior of Redshift’s columnar based oriented architecture.

Functionality comparison — Redshift vs. Postgres

Supported and unsupported data types

· Redshift will support most of the key data types, such as: SMALLINT, INTEGER, BIGINT, DECIMAL, REAL, DOUBLE PRECISION, BOOLEAN, CHAR, VARCHAR, DATE, TIMESTAMP, TIMESTAMPTZ

· Postgres supports basically every data type.

There are a wide variety of data types that Redshift does not support (which are supported by Postgres), and they can be found at the following link for your reference: Redshift — unsupported data types documentation.

SQL differences

Yes, both Redshift and Postgres use SQL as its native RDBMS language, and while a majority of the syntax is exactly the same, there are some noticeable and stark differences between the two: they are summarized as follows:

· CREATE TABLE. Redshift does not support tablespaces, table partitioning, inheritance, and certain constraints. The Redshift implementation of CREATE TABLE enables you to define the sort and distribution algorithms for tables to optimize parallel processing.

· ALTER TABLE. Redshift does not support ALTER COLUMN actions, and ADD COLUMN supports adding only one column in each ALTER TABLE statement.

· COPY. The Redshift COPY command is specialized to enable loading of data from Amazon S3 buckets and Amazon DynamoDB tables and to facilitate automatic compression.

· INSERT, UPDATE, and DELETE. Redshift does not support the WITH clause.

· VACUUM. The parameters for VACUUM are different between the two databases. For example, the default VACUUM operation in PostgreSQL reclaims space and makes it available for reuse. The default VACUUM operation in Redshift is VACUUM FULL, which reclaims disk space and resorts all rows.

The following link is a list of Redshift SQL commands (reference — AWS site and documentation repository) that are different from the Postgres list of PostgreSQL commands

Also from the AWS documentation portfolio is a link that describes the key differences between Redshift SQL and PostgreSQL — AWS documentation.

Function and Feature differences

Features

The following Postgres features have been removed from Redshift:

· Table partitioning (range and list partitioning)

· Tablespaces

· Constraints

o Unique

o Foreign key

o Primary key

o Check constraints

o Exclusion constraints

· Database Roles

· Inheritance

· Postgres system columns. Amazon Redshift SQL does not implicitly define system columns. However, the PostgreSQL system column names cannot be used as names of user-defined columns.

· Indexes

· NULLS clause in Window functions

· Collations. Amazon Redshift does not support locale-specific or user-defined collation sequences.

· Value expressions

o Subscripted expressions

o Array constructors

o Row constructors

· Stored procedures

· Triggers

· Management of External Data (SQL/MED)

· Table functions

· VALUES list used as constant tables

· Recursive common table expressions

· Sequences

· Full-text search

Functions

Redshift have removed some of the Postgres functions which are more suited to smaller sized data transaction processing systems:

· Access privilege inquiry functions

· Advisory lock functions

· Aggregate functions

o STRING_AGG()

o ARRAY_AGG()

o EVERY()

o XML_AGG()

o CORR()

o COVAR_POP()

o COVAR_SAMP()

o REGR_AVGX(), REGR_AVGY()

o REGR_COUNT()

o REGR_INTERCEPT()

o REGR_R2()

o REGR_SLOPE()

o REGR_SXX(), REGR_SXY(), REGR_SYY()

· Array functions and operators

· Backup control functions

· Comment information functions

· Database object location functions

· Database object size functions

· Date/Time functions and operators

o CLOCK_TIMESTAMP()

o JUSTIFY_DAYS(), JUSTIFY_HOURS(), JUSTIFY_INTERVAL()

o PG_SLEEP()

o TRANSACTION_TIMESTAMP()

· ENUM support functions

· Geometric functions and operators

· Generic file access functions

· IS DISTINCT FROM

· Network address functions and operators

· Mathematical functions

o DIV()

o SETSEED()

o WIDTH_BUCKET()

· Set returning functions

o GENERATE_SERIES()

o GENERATE_SUBSCRIPTS()

· Range functions and operators

· Recovery control functions

· Recovery information functions

· ROLLBACK TO SAVEPOINT function

· Schema visibility inquiry functions

· Server signaling functions

· Snapshot synchronization functions

· Sequence manipulation functions

· String functions

o BIT_LENGTH()

o OVERLAY()

o CONVERT(), CONVERT_FROM(), CONVERT_TO()

o ENCODE()

o FORMAT()

o QUOTE_NULLABLE()

o REGEXP_MATCHES()

o REGEXP_SPLIT_TO_ARRAY()

o REGEXP_SPLIT_TO_TABLE()

· System catalog information functions

· System information functions

o CURRENT_CATALOG CURRENT_QUERY()

o INET_CLIENT_ADDR()

o INET_CLIENT_PORT()

o INET_SERVER_ADDR() INET_SERVER_PORT()

o PG_CONF_LOAD_TIME()

o PG_IS_OTHER_TEMP_SCHEMA()

o PG_LISTENING_CHANNELS()

o PG_MY_TEMP_SCHEMA()

o PG_POSTMASTER_START_TIME()

o PG_TRIGGER_DEPTH()

· Text search functions and operators

· Transaction IDs and snapshots functions

· Trigger functions

· XML functions

When you would use one over the other? — Redshift vs. Postgres

The following table summarizes the key reasons that you might consider when deciding whether to adopt either Redshift or Postgres to satisfy your data warehouse needs.

What factors would determine whether you will use either Postgres or Redshift for your large Data needs?

You have a more than competent technical team within the IT department, and the database technical knowledge is shared amongst all team members.

You are happy for the technical maintenance and infrastructure operations to be managed and controlled by AWS.

While database reporting is important to the business, you are not anticipating high end load reporting: either sporadically or periodically from the business.

You are predicting, and planning for, large data growth for your DW (petabytes), hence upward scalability is a high priority requirement.

You maintain on-premise servers and will continue to do so well into the future.

Your business logic management will require column processing and the number of columns is expected to fall within the range of hundreds to thousands.

Your business has an appetite for adopting open source systems, and you have been successful implementing that IT strategy.

You are existing AWS client, who has future plans to integrate redshift with your other cloud-based infrastructure.

You are a Government body or department that has data privacy concerns, where hosting and storing sensitive data in the cloud is not feasible.

While AWS will look after the technical maintenance, you are comfortable with managing the business logic (via complex queries) that will investigate and report upon the large data residing in the warehouse.

Allocating an IT budget for a large data warehouse solution, as well as licensing and ongoing operational costs, is a concern, and risk, for your organization.

Allocating an IT budget for a large data warehouse solution, as well as licensing and ongoing operational costs, is not a concern for your organization.

Conclusion

As with every significant IT acquisition that requires in-depth planning and due diligence, the final decision to proceed with either Redshift or Postgres ultimately depends on your unique set of business requirements that need to be satisfied. Redshift is a columnar database, so it is far more suitable RDBMS for analytics and the heavy-duty information processing that it requires. That’s not to say that Postgres cannot cope with massive data volume retrieval and processing — far from it — however, there is an internal department technical resource availability requirement that will need to be upskilled in Postgres to confidentially and effectively create and maintain the data integrity and business processing requirements for your organization. Overall, Redshift is a more scalable RDBMS solution than Postgres.

--

--

Paul Chambiras https://freelance-writer.site
Paul Chambiras https://freelance-writer.site

Written by Paul Chambiras https://freelance-writer.site

I am a freelance writer on all things Business, DIY, Sport, Technology, IT and Management.

No responses yet