The problem you've mentioned, difficulty getting both parallelized builds and proper dependencies, is because you have "all the Makefiles". It's a recursive make problem; if you had a single instance of make able to see the entire dependency graph, a correct parallelized build would be trivial. Have you read Recursive Make Considered Harmful?
The catch is that it's hard to set up a single Makefile for an entire build; it's different than how anyone does it, so there's no good documentation. Tools like automake only sort of support it. You can include dir.mk files in each directory to have somewhat similar organization, but it gets...weird.
What's more realistic is switching to a make/autoconf/automake alternative like SCons. Everyone who writes an SConstruct (the equivalent of Makefile) uses it to make a single dependency tree of the entire project. It supports this model well by allowing inclusion of SConscript files in subdirectories. These are somewhat like having Makefiles at each level; it's more than just preprocessor-style inclusion. (They have separate scope, though you can import/export variables.) But unlike many Makefiles, it yields a single dependency graph.
Plus, I like SCons better anyway. Clean, uniform syntax (Python!), rather than m4 over make over sh. Built-in help checking for dependencies in C files. The ability to use a real scripting language to self-generate without introducing another new syntax. Less platform variation, since Python provides a more uniform API than the shell utilities. (No need to use portability m4 macros.)
