While it is true that Python scripts are far more superior to cryptic bash scripts, those scripts shouldn't be on the CI's side. Nothing wrong having build.py and test.py and the others, but there are no good reasons for them to be executable only by the CI. Debugging CI scripts is just asking for non-stop pager duty.
Most all configuration specifications have a limit unless they are turing complete. Complex systems evolve beyond static configurations easily but the question is can you leverage configuration generation to minimize logic inside your CI?
Orchestration ought not be the province of your build scripts, that should be the role of CI. If the configuration isn't capable of variadic functionality but is capable of orchestration... what do you do?
What is stopping you from making a project dedicated to orchestration/configuration/deployment and using test containers to make those tests? How do you debug those "complex systems" when they fail?
I've made a project in one of my old jobs dedicated to configuration and deployment tools used by CI jobs. At first it got some criticisms about "testing the CI that tests the software" and "testing the tests" but when the number of fires we have to take care of was cut by half it quickly became the norm.