Essentially any specification that includes any kind of image support will pull in this kind of chain of specifications, just as any system that does networking eventually ends up with TCP and any system that does text ends up with Unicode. Even the simplest possible 1995-esque browser will have to deal with that (support for images was added in 1993, and text and networking were always central).
> Even the simplest possible 1995-esque browser will have to deal with that (support for images was added in 1993, and text and networking were always central).