What does a 'purpose-built protocol for each message' look like? You avoid type/tagging overhead, but other than that I'd expect a "sufficiently smart" generic protocol to be able to achieve the same level of e.g. data layout optimization. Obviously ProtoBuf in particular is pessimising for the reasons you describe, but I'm thinking of other protocols (e.g. Flatbuffers, Cap'n Proto, etc.)
The problem is that "sufficiently smart" does a lot of heavy lifting.
One way to look at the problem is to go build a sufficiently smart generic protocol and write down everything that's challenging to support in v1. You have tradeoffs between size (big messages are slow on slow networks), data dependencies (slow on modern CPUs), lane segmentation (parallel processing vs cache-friendly single-core access vs code complexity), forward/backward compatibility, how much validation the protocol should do, and so on. Any specific data serialization problem usually has some outside knowledge you can use to remove or simplify a few of those "requirements," and knowledge of the surrounding system can further guide you to have efficient data representations on _both_ sides of the transfer. Code that's less general-purpose tends to have more opportunities for being small, fast, and explainable.
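As a rough sketch of what using outside knowledge can buy you, here is a made-up price-tick message in Python. Everything in it is an assumption for illustration: the field set, the `TICK` layout, and the toy tag-length-value encoder that stands in for "a generic protocol" (it is not any real library's wire format). The purpose-built version assumes both ends agree on byte order and field order and that the message never changes shape, so it can drop tags and lengths entirely.

```python
import struct

# Hypothetical message: 32-bit instrument id, 64-bit timestamp (ns), 64-bit float price.
# Assumption: both sides agree on little-endian, field order, and no versioning.
TICK = struct.Struct("<IQd")  # 4 + 8 + 8 = 20 bytes, no tags, no padding


def encode_tick(instrument_id, ts_ns, price):
    """Purpose-built encoding: fixed layout, nothing but the payload."""
    return TICK.pack(instrument_id, ts_ns, price)


def decode_tick(buf):
    """Single fixed-size unpack; no branching on tags or lengths."""
    return TICK.unpack(buf)


def encode_tlv(fields):
    """Toy generic encoding: each field carries a 1-byte tag and 1-byte length."""
    out = bytearray()
    for tag, payload in fields:
        out += struct.pack("<BB", tag, len(payload))
        out += payload
    return bytes(out)


def decode_tlv(buf):
    """Walk the buffer field by field; each step depends on the previous length."""
    fields = {}
    pos = 0
    while pos < len(buf):
        tag, length = struct.unpack_from("<BB", buf, pos)
        pos += 2
        fields[tag] = buf[pos:pos + length]
        pos += length
    return fields


fixed = encode_tick(7, 123, 1.5)
generic = encode_tlv([
    (1, struct.pack("<I", 7)),
    (2, struct.pack("<Q", 123)),
    (3, struct.pack("<d", 1.5)),
])
```

The generic version pays two extra bytes per field and has to discover the layout while decoding; the fixed version pays nothing but also can't tolerate a schema change, which is exactly the kind of "requirement" you only get to drop when you know the surrounding system.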
A common source of inefficiencies (protobuf is not unique in this) is the use of a schema language in any capacity as a blunt weapon to bludgeon the m x n problem between producers and consumers. The coding pattern of generating generic producers/consumers doesn't allow for fine-tuning of any producer/consumer pair.
Picking on flatbuffers as an example (I _like_ the project, but I'll ignore that sentiment for the moment), the vtable approach is smart and flexible, but it's poorly suited (compared to a full "parse" step) to data you intend to access frequently, especially when doing narrow operations. It's an overhead (one that reduces the CPU's ability to pipeline your operations) you incur precisely by trying to define a generic format which many people can produce and consume, especially when the tech that produces that generic format is itself generic (operating on any valid schema file). Fully generic code is hard enough to make correct, much less fast, so in the interest of correctness and maintainability you usually compromise on speed somewhere.
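To make the vtable point concrete, here is a deliberately simplified imitation in Python. It is _not_ the real flatbuffers wire format; the layout, `N_FIELDS`, and the function names are all invented for illustration. The thing to notice is that every field access does two dependent reads (look up the offset, then read the value), whereas a parse step pays that cost once and leaves you with plain values.

```python
import struct

# Toy vtable layout (an assumption, NOT the real flatbuffers format):
#   [u16 offset of field 0][u16 offset of field 1][u16 offset of field 2][field data...]
# An offset of 0 means "field absent, use the default".
N_FIELDS = 3


def build(values):
    """values: list of N_FIELDS ints, or None for an absent field."""
    vtable_size = 2 * N_FIELDS
    offsets, data = [], bytearray()
    for v in values:
        if v is None:
            offsets.append(0)
        else:
            offsets.append(vtable_size + len(data))
            data += struct.pack("<i", v)
    return struct.pack("<%dH" % N_FIELDS, *offsets) + bytes(data)


def get_field(buf, index, default=0):
    """Access in place: the value read cannot start until the offset read finishes."""
    (off,) = struct.unpack_from("<H", buf, 2 * index)
    if off == 0:
        return default
    (value,) = struct.unpack_from("<i", buf, off)
    return value


def parse(buf):
    """Pay the indirection once; later accesses are plain tuple indexing."""
    return tuple(get_field(buf, i) for i in range(N_FIELDS))


buf = build([10, None, 30])
record = parse(buf)
```

If you read each field once, `get_field` is fine and you skip the parse entirely. If you read the same fields in a hot loop, `record[0]` avoids re-chasing the offset on every access, which is the tradeoff described above.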
For that (slightly vague) flatbuffers example, the "purpose-built protocol" could be as simple as almost anything else with a proper parse step. That might even be Cap'n Proto, though that also has problems in certain kinds of nested/repeated structures because of its arena allocation strategy (better than protobuf, but still more allocations and wasted space than you'd like).