Normal_gaussian 3 hours ago

At the time of writing, the query has a small error: the filter checks for reads AND writes, but it should be reads OR writes.

    WHERE
        -- Filter to only show tables that have had some form of read or write activity
        (s.n_tup_ins + s.n_tup_upd + s.n_tup_del) > 0
        AND
        (si.heap_blks_read + si.idx_blks_read) > 0
    )
Should be OR
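
i.e., keeping the article's aliases (s and si, which appear to come from pg_stat_user_tables and pg_statio_user_tables):

    WHERE
        -- Filter to only show tables that have had some form of read or write activity
        (s.n_tup_ins + s.n_tup_upd + s.n_tup_del) > 0
        OR
        (si.heap_blks_read + si.idx_blks_read) > 0
    )
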
wirelesspotat 2 hours ago

Agree with other commenters that the title is a bit confusing and should be renamed to something like "Is your Postgres workload read heavy or write heavy?"

But title aside, I found this post very useful for better understanding PG reads and writes (under the hood) and how to actually measure your workload

Curious if the tuning actions are any different if you're using a non-vanilla storage engine like AWS Aurora or GCP AlloyDB or Neon?

J_McQuade 3 hours ago

This, as a few other commenters have mentioned, is a terrible article.

For a start, the article does not mention any other database. I don't know how you can say something is read or write heavy without comparing it to something else. It doesn't even compare different queries on the same database. Like, they just wrote a query and it does a lot of reads - so what? There's nothing here. Am I going mad? Why does this article exist?

  • acscott 3 hours ago

    A little context may be of help. Maybe a better headline for the article would have been, "How Can You Determine if your PostgreSQL Instance's Workload is Read-Heavy or Write-Heavy?" It's useful to know, both to help optimize settings and hardware for your workload and to tell whether an index might be useful or not. Most major DBMSs have some way to answer this question; the article is aimed at PostgreSQL only.

rednafi 4 hours ago

Despite using CTEs, I found the first query quite impenetrable. Could be because I don’t spend that much time reading non-trivial SQL queries.

I’ve been mostly using the `pg_stat_statements` view (the second technique) to find out whether my workload is read or write heavy; it’s plenty good in most situations.
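
For example, a rough sketch of what I run (the shared_blks_* columns should exist in any recent Postgres version, but check yours):

    -- Rough read/write balance across all tracked statements
    SELECT
        sum(shared_blks_hit + shared_blks_read)        AS blocks_read,
        sum(shared_blks_dirtied + shared_blks_written) AS blocks_written
    FROM pg_stat_statements;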

  • teej 3 hours ago

    pg_ system tables aren’t built for direct consumption. You typically have to massage them quite a bit to measure whatever statistic you need.
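
    For example, the raw counters only mean much once you fold them into a ratio; a sketch against pg_stat_user_tables:

        -- Massage raw tuple counters into a rough write fraction
        SELECT round(
            100.0 * sum(n_tup_ins + n_tup_upd + n_tup_del)
            / NULLIF(sum(n_tup_ins + n_tup_upd + n_tup_del
                         + seq_tup_read + idx_tup_fetch), 0),
            1) AS write_pct
        FROM pg_stat_user_tables;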

moomoo11 3 hours ago

The quality of this article makes me not trust the company.

jagged-chisel 4 hours ago

> When someone asks about [database] tuning, I always say “it depends”.

Indeed. On your schema. On your usage. On your app. On your users.

  • mikepurvis 3 hours ago

    If it didn’t depend they’d just make the “tuned” value the default.

    • acscott 3 hours ago

      Exactly. The parameters you can configure are there because tuning them hasn't been automated, since what you want to optimize for might be different from what an automaton would choose.

alberth 5 hours ago

Odd that OLTP wasn’t mentioned in the article.

Postgres is an OLTP database, and OLTP databases are designed for write-heavy workloads.

That being said, I agree most people have read-heavy needs.

  • da_chicken 5 hours ago

    I disagree. I think the only people that have read-heavy needs are big data and data warehouses. AI being hot right now doesn't mean big data is the only game in town.

    Most applications are used operationally or have a mix of read and write. Even in applications where users can only consume the content presented there, just tracking page history often captures more than enough data to be relatively write heavy.

    • withinboredom 4 hours ago

      Hmmm. Not really. Yes, everything is a mix, but for applications, it very much is on the read-heavy side. Think about how many queries you have to do just to display an arbitrary page. You might, maybe, just maybe, net 2-3 writes vs. hundreds of reads. If that starts to balance out, or even flip, then you probably need to rethink your database as you start to exit traditional db usage patterns. But <30% writes is not write-heavy.

      • da_chicken 3 hours ago

        I am thinking about that. I don't think most data is read that often in an OLTP system.

        I think a very small amount of data is read very often. However, until your DB gets very large, that data is going to end up as data pages cached in memory. So that data is extremely cheap to read.

        I also think a significant amount of data that is generated in an OLTP system is written and never read, but you still had to pay the cost to write it. If you have an audit log, chances are you never need to look at it for any one piece of data. But you definitely had to write all the metadata for it.

        But I'm also assuming that writes are at least 10 times as expensive as reads. More so if what you're modifying has indexes, since indexes are often functionally identical to a partial copy of the entire table. Indeed, I think that 10 times mark is conservative. Most RDBMSs use transaction logging and some kind of locking on writes. There's data validation and integrity checks on inserts and updates (and deletes if you have foreign keys).

        I think 1 write to 10 reads is still write-heavy.
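
        As a back-of-envelope check against the same stats the article leans on (the 10x weight is my assumption above, not a measured cost):

            -- Weight each tuple write 10x a tuple read and compare
            SELECT
                sum(seq_tup_read + idx_tup_fetch)      AS tuples_read,
                sum(n_tup_ins + n_tup_upd + n_tup_del) AS tuples_written,
                10 * sum(n_tup_ins + n_tup_upd + n_tup_del)
                  > sum(seq_tup_read + idx_tup_fetch)  AS write_heavy_at_10x
            FROM pg_stat_user_tables;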

    • hinkley 3 hours ago

      I think read replicas disagree with that pretty strongly.

      The incoming traffic may be very write heavy, but then you have many, many users who need to see that data. The question is whether the database or a copy of the data from the database is what services that interest.

      If you mediate all reads through a cache, then you have split the source of truth from the system of record. And then the read traffic on the system of record is a much lower ratio.

Cupprum 4 hours ago

Surprising number of downvoted comments under this article. I wonder why.

gdulli 5 hours ago

Is a ball red or green? How long is a piece of string?

  • phalangion 4 hours ago

    Did you read the article? It’s about how to tell if your database is read or write heavy.

    • johncolanduoni 3 hours ago

      I think a large part of what people are responding to here is the title, which comes off as something someone who doesn't actually understand the nature of a database workload would write. It may be a simple typo, but "Is YOUR Postgres Read Heavy or Write Heavy?" is the question that can have an answer. "Is Postgres More Appropriate for Read Heavy or Write Heavy workloads?" would also be fine, but it would be a totally different article from the written one.

developper39 5 hours ago

Very useful, and it is clear that the author knows what he is talking about. Nice intro to Pg18 too.

lysace 5 hours ago

Insipid text.

Also: HN needs to upgrade its bot upvoting detection tech. This is embarrassing. It was proper ownage of the HN #1 position for like 15 minutes straight. And then like #2-3 for an hour or so.

  • zug_zug 4 hours ago

    Off topic, but I do feel like there is a significant number of things that mysteriously get to frontpage with 12-40 upvotes, zero comments, and then sit there getting no more upvotes / comments for like 20 minutes.

    Personally, I agree that it's possible to detect this better, that doing so would drastically improve the quality of this site if it weren't the meta, and that it's something that should be openly discussed (in terms of practical suggestions).

    • add-sub-mul-div 4 hours ago

      They don't care what gets ranked where, other than their own recruitment posts and other submissions, which is what this site exists for.

      • Waterluvian 3 hours ago

        I don’t think this holds up to higher order thinking. On the surface, sure that makes sense.

        But then who is left to look at the recruitment ads if the quality of the content, comments, and community degrades enough that everyone stops coming?

        All I know is that pretty much nobody here knows enough about the whole picture to have a meaningfully informed opinion. So a lot of these opinions are going to be driven by their imagination of the whole picture.

    • lysace 4 hours ago

      It is so incredibly obvious when you see it, yes.