I believe that programs can be designed so that they are provably correct with respect to their specification. However, I do not believe that writing good specifications is easy. Let's look at ways bugs can creep in.
At the very lowest level, it's always possible that the definitions used in the formal proof don't match the meanings intended. For example, if you run Metamath's set.mm through tr 45 54 (which swaps every '4' and '5' in the file, consistently renaming the two symbols), it will happily verify the theorem ( 2 + 2 ) = 5. I'd consider this a bug, even though when you expand out all the definitions, the resulting hodgepodge of predicate calculus and ZFC set theory is a true statement. While confusing 4 and 5 is unlikely in practice, it's easy to see how subtler mapping bugs could occur.
Of course, when trying to prove things correct about existing systems, the big problem is simply unmanageable complexity. Only toy languages tend to have formally defined semantics at all. The specifications for "real" programming languages are too ambiguous to prove anything from, especially once you take the runtime into account. I believe that it is possible to design programming languages and runtimes that are much, much simpler. If you want formal proofs, you'll either have to chuck existing systems in favor of the new, simpler variety, or spend a huge amount of effort dealing with the warts and overcomplexity of what exists. Both approaches have obvious disadvantages.
Compatibility with existing designs is also important. Internal consistency can be proved, but compatibility can only be tested. This principle extends to file formats, programming languages, instruction sets, compression algorithms, communication protocols, file systems, all manner of I/O techniques, APIs, and so on. In other words, most of the things you care about when building a real system.
Another technical issue is the difficulty of writing specifications that encompass all of the desired functionality. For example, if you want to prove that an implementation of a public key cryptosystem is protected against timing attacks, then your model of execution must take timing into account, and few formal models of execution do.
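To make the timing point concrete, here is a minimal sketch in C (the function names are hypothetical). Both comparisons are functionally correct, and against an execution model that ignores time both would verify; only the second resists a timing attack.

    #include <stddef.h>

    /* Leaky: returns as soon as a byte differs, so the running time
       reveals how long the matching prefix is. A spec that ignores
       timing would call this correct. */
    int compare_leaky(const unsigned char *a, const unsigned char *b,
                      size_t n)
    {
        for (size_t i = 0; i < n; i++)
            if (a[i] != b[i])
                return 0;
        return 1;
    }

    /* Constant-time: accumulates all the differences and always
       touches every byte, so timing is independent of the secrets. */
    int compare_ct(const unsigned char *a, const unsigned char *b,
                   size_t n)
    {
        unsigned char diff = 0;
        for (size_t i = 0; i < n; i++)
            diff |= (unsigned char)(a[i] ^ b[i]);
        return diff == 0;
    }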
Some things lend themselves fairly well to formal specification, others less so. Compression algorithms (particularly lossless ones) are good examples of the former, while GUIs are good examples of the latter. If you haven't written a good spec, then you may end up with a flawless implementation of buttons that disappear when you press them, menus hidden behind windows, and so on. The only way out is careful design and thorough testing, which is of course what we rely on now.
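To illustrate how crisp the lossless case is, here is a minimal sketch in Lean, with hypothetical names: the entire correctness spec is one round-trip law. There is no comparably tidy formula for "menus must not hide behind windows."

    -- Hypothetical names. A lossless codec over bit strings; the
    -- roundtrip field *is* the spec: decompressing any compressed
    -- input must yield the original, with no exceptions.
    structure LosslessCodec where
      compress   : List Bool → List Bool
      decompress : List Bool → List Bool
      roundtrip  : ∀ xs, decompress (compress xs) = xs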
Overall, I believe that provably correct programming is worth the effort. It will take a very long time, though.
Debuggability of reference counting
Peter Deutsch sent email criticizing reference counting on the grounds that every client has to adjust the counts correctly. Often, a garbage-collected runtime can be more forgiving.
I see the point, but still think that reference counting is easier to debug. For one, improved tooling makes it easier to track down problems in reference-counted systems. In particular, I'm thinking of Valgrind. The most common reference counting bugs (failing to decrement, decrementing before the last use, decrementing twice) produce completely clear diagnostics from Valgrind, complete with stack traces. Even better, these failures are deterministic. By contrast, GC bugs depend critically on exactly when garbage collections occur.
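As a concrete sketch of what Valgrind sees, consider a minimal refcounted object in C (the names here are illustrative, not from any particular library):

    #include <stdio.h>
    #include <stdlib.h>

    /* A minimal refcounted object. */
    typedef struct {
        int refcnt;
        int payload;
    } Obj;

    Obj *obj_new(int payload)
    {
        Obj *o = malloc(sizeof(*o));
        if (!o)
            abort();
        o->refcnt = 1;
        o->payload = payload;
        return o;
    }

    void obj_ref(Obj *o)
    {
        o->refcnt++;
    }

    void obj_unref(Obj *o)
    {
        if (--o->refcnt == 0)
            free(o);
    }

    int main(void)
    {
        Obj *o = obj_new(42);
        printf("%d\n", o->payload);
        obj_unref(o);  /* correct: the last unref frees the object */

        /* Each classic mistake maps to a distinct Valgrind report:
           - omitting the obj_unref(o) above: a "definitely lost" leak,
             with the allocation's stack trace;
           - calling obj_unref(o) before the printf: an invalid read of
             freed memory, with both the use and the free stack traces;
           - calling obj_unref(o) a second time here: an invalid access
             to freed memory, again with stack traces.
           All of these fire the same way on every run. */
        return 0;
    }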
Another point is that the client is not always written in a very low-level language, and thus may not have to handle the memory discipline by hand. That's one reason I'm interested in the Python bindings.
Thus, I think that reference counting is a perfectly reasonable approach when debuggability is a goal.