Why Python is not my favorite language

Posted 17 Apr 2009 at 12:01 UTC by audriusa

I recently had an opportunity to develop a project in Python. I previously had a largely neutral opinion about this language; however, after more serious development, I would like to share some doubts.

I will not be talking about implementation-related issues. Also, Python does have positive, advanced features that are worth noting. Python code for the same task is really shorter: you need more C or even Java code to write something like a = b[:3], and b[-1] especially looks nicer than b.get(b.size()-1). But when the line count rolls past the first thousand, other things begin to matter.

1. In Python, an assignment to a nonexistent variable raises no alert. You will not be warned during compilation, and you will not get any alert on execution. Instead, a new variable with your typo in its name will be created, it will store the assigned value, and the original variable that was meant to be changed will stay as it was. There will be no obvious crash anywhere, just buggy behavior to debug. This problem is caused by the absence of declarations in the language: variables (including object fields) are created on demand upon assignment.
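A minimal sketch of this failure mode (the Counter class and its field names are invented for illustration):

class Counter:
    def __init__(self):
        self.total = 0

    def add(self, n):
        # typo: "totl" instead of "total"; Python silently creates a
        # brand-new attribute rather than raising an error
        self.totl = self.total + n

c = Counter()
c.add(5)
print c.total   # prints 0, not 5: no crash, just wrong behavior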

2. Assigning a value of the wrong type may result in a bug far away from that assignment. The "numberstrings" are especially problematic. While Python supports a normal integer type, a bug in the code may assign a string like "1234" to the same variable instead (for instance, when reading the value from some input stream). Comparing "1234" with a true integer value raises no warning and also usually produces the correct result: "1234" > 1500, for instance. However, "4321" > 12340 produces True, as the values are ultimately compared as strings. Hence our bug shows up only sometimes. During debugging, locating the problematic comparison statement does not help: instead, you need to find where and how the "numberstring" managed to get into the variable. Hence I think it was an error to implement the greater-than and less-than operators between strings and integers, even in a weakly typed language.
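To make the failure mode concrete, here is a Python 2 session (Python 3 instead raises a TypeError for such mixed comparisons):

>>> "4321" > 12340
True
>>> "4321" > 99999999
True

In CPython 2, any string compares greater than any integer, whatever the digits say.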

3. The "self" convention is strange and inconvenient. Python demands the "self." prefix for every reference to a field, even from inside a method of the same object. Forgetting "self" assigns the parameter to the global variable without warning, creating a bug that is detected neither at compile time nor at run time. A mandatory "self" looks very much against the Python philosophy of reducing the number of characters you have to type.
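A minimal sketch of the forgotten-"self" bug (Account is an invented example; note, as a commenter points out below, that the stray assignment actually creates a local variable):

class Account:
    def __init__(self, balance):
        self.balance = balance

    def deposit(self, amount):
        # typo: "self." forgotten, so this binds a local name "balance"
        # that vanishes when the method returns; self.balance is untouched
        balance = self.balance + amount

a = Account(100)
a.deposit(50)
print a.balance   # still 100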

I am not sure whether these issues could be easily fixed in the Python language (the "numberstrings" definitely look more like a bug), but they do make the sentence "switch to Python for more productivity" sound less convincing. While it would be interesting to see a new, next-generation language emerging, it may be that the winner is somewhere down in the second half of the Tiobe index - something that is still to come.


Perl doesn't have those problems?, posted 17 Apr 2009 at 13:11 UTC by pphaneuf » (Journeyer)

Am I mistaken, or does Perl not have those problems (while possibly bringing in another boatload of problems in exchange)?

#1 and #3 are somewhat related, where if #1 gave a warning, the missing "self." in #3 would raise that warning (it creates a local variable, by the way, not a global one). With Perl, you can "use strict" in your code, which would get you a warning about using a global symbol without a package name, and you can use "my" to make it local (or put the missing "$self->" bits, as appropriate).

For #2, Perl has separate comparison operators for string comparison and numerical comparison, so the coercion happens the right way: "4321" > 12340 is false (numerical), and "4321" gt 12340 is true (string-wise).

Weird, Perl being safer and/or clearer...

Maybe problems.. yeah, posted 17 Apr 2009 at 14:13 UTC by zanee » (Journeyer)

So #1 could be a problem, sure..

#2 could also be problematic, yes, and it's happened to me every now and then; it's nothing that I couldn't identify, though.

However, #3 I don't see as a problem; it makes the code clearer for me when dealing with larger code bases, especially when someone else has not named variables all that well.

Also, for #2, Perl would have this issue vice versa: let's say a Python programmer switched to doing something in Perl without realizing the separate comparison operators exist. So in this case I'd have to say that, all in all, it'd be the intricacies of learning a different language rather than an out-and-out bullseye issue.

no it doesn't, posted 17 Apr 2009 at 15:06 UTC by lkcl » (Master)

Python demands the .self prefix for every reference to the field, even from inside the method of the same object

no, it doesn't. a method must declare a first parameter that receives the class instance.

the CONVENTION is to call that first parameter "self".

you could call it "this":


class x:
    def __init__(this):
        this.y = 5

but you would be annoying and confusing a lot of people by not following the convention.

_but_, if you were significantly worried about number of characters, you could specify the first parameter as "s" not "self".

let me ask you: without the "self." convention, how would you expect to distinguish between a local variable called "x" and a member instance variable called "x"?
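a two-line illustration (invented names) of the information the prefix carries:

class example:
    def method(self):
        x = 1        # local variable: gone when the method returns
        self.x = 2   # instance attribute: visible to other methods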

no they're not, posted 17 Apr 2009 at 15:13 UTC by lkcl » (Master)


>>> "4321" > 4322
True
>>> "4321" > 4320
True
>>> "4321" > 3320
True
>>> "4321" > 0
True
>>> "4321" > 5320
True
>>> "4321" > "5532"
False
>>> "4321" > "55321"
False
>>> "4321" > "3320"
True

string compare appears to be a weirdo case. certainly, strings-compared-to-strings will do a lexicographic comparison.

but, the str object appears to have no __cmp__ method, which tells me that something distinctly non-orthogonal is going on.

it looks like string-to-non-string comparisons always return True.

Test cases, posted 17 Apr 2009 at 15:26 UTC by lkcl » (Master)

I have previously had a largely neutral opinion about this language. However after more serious development I would like to share some doubts.

not being funny or anything but if you're doing "serious" development, then you will have automatically gone straight into producing test cases, unit tests and regression tests.

without fail, one hundred percent, you _will_ have a test suite that is significantly larger (two to five times larger) than the rest of the code put together.

you will, of course, have been developing the test cases alongside the code, and in many instances have written the test cases _before_ writing the actual code; a sketch follows.
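as a sketch, a test-first fragment using python's stock unittest module (mymodule and parse_port are hypothetical names; the function gets written after the test):

import unittest
from mymodule import parse_port   # hypothetical code under test

class TestParsePort(unittest.TestCase):
    def test_parses_digits(self):
        self.assertEqual(parse_port("8080"), 8080)

    def test_rejects_garbage(self):
        self.assertRaises(ValueError, parse_port, "80x0")

if __name__ == "__main__":
    unittest.main()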

the developers will also have been bluntly told that they will, WITHOUT FAIL, check code into a revision control repository on a regular basis, as well as run the test suites and regression tests without thinking twice and without being asked or reminded.

thus, you have a clearly-delineated linear progression of the project, where you can, if a bug is found (and it will be, very quickly, because of the regression tests), back-track through the repository history to find out which commit broke the relevant test.

this is standard coding practice, and you are, naturally, sticking to good programming practices with such a large project.

the thing with python is that, as a dynamic language, you MUST stick to good programming practices. use pylint as a matter of course, defining your "rules" regarding variable names etc.; there's a sketch of that just below.

if you don't, you can expect to run into difficulties.
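a sketch of what that looks like in practice (the file names and the regex are illustrative, and assume pylint's standard name-checking options):

$ pylint mymodule.py

# excerpt from the project's pylintrc
[BASIC]
# enforce lower_case_with_underscores variable names
variable-rgx=[a-z_][a-z0-9_]{2,30}$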

with python, i strongly advise you to do an absolute maximum of no more than 20 lines of code before writing a test (or three) and doing a repository commit.

preferably as little as 5 to 10 lines.

if that's awkward to do because of the revision control system being used, get a better revision control system, such as git. you can even back-end git onto svn (git-svn), and in that way developers can work independently and "sync up" using the svn repository as the clearing-house. so they can even commit "broken" code into git, fix it, and push it to svn when they're ready: see the sketch below.
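the git-svn round trip looks roughly like this (the repository URL is hypothetical):

$ git svn clone svn://svn.example.org/project/trunk project
$ cd project
$ git commit -a -m "half-done refactor"   # local commits may be "broken"
$ git svn rebase    # pull down new svn revisions first
$ git svn dcommit   # then publish local commits to svn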

Commentary on your problems, posted 23 Apr 2009 at 18:00 UTC by Omnifarious » (Journeyer)

I think 1 and 3 are common in dynamic languages. Perl's use strict; is a pretty unusual feature.

You can get some checking for bad member-variable assignments in Python by using __slots__.
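A minimal sketch (Point is an invented example; in Python 2, __slots__ only works on new-style classes, hence the object base):

class Point(object):
    __slots__ = ('x', 'y')    # only these attribute names may be assigned

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
p.z = 3   # AttributeError: 'Point' object has no attribute 'z'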

#2 is just plain strange. After some research, here's what's going on:

Strings always compare as larger than everything else other than other strings.

Objects compare by id and compare as greater than integers but less than strings.

Integers compare as lower than everything else.

True and False are treated as the integer values 1 and 0 for the purposes of comparison.

So basically Python is defining a total ordering for everything: type is the primary key and value is the secondary key. This is a little strange, and I think it can lead to odd errors. On the other hand, it does allow you to always sort an array without worrying about getting an exception.
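A quick illustration in a Python 2 shell (Python 3 drops this behavior and raises TypeError instead); note that the numeric types are an exception to the type-first rule, comparing by value among themselves:

>>> sorted(["b", 10, "a", 2.5, 3])
[2.5, 3, 10, 'a', 'b']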

cool, posted 24 Apr 2009 at 20:43 UTC by lkcl » (Master)

... so, if you have a list of randomly-typed objects, all the booleans will end up at the front of the list (False, False, False, True, True, True, True), then all the integers (sorted), then floats (or maybe longs) then longs (or maybe floats), then complex numbers.... then strings.... etc.

or whatever the type-ordering-by-number is.

cool!

i like that.

Why is there no favorite General Purpose Language?, posted 29 Apr 2009 at 04:48 UTC by DeepNorth » (Journeyer)

I can more than feel your pain.

WRT auto-declaration of variables: I no like. At the very least, a language should allow you to get the compiler to do this work by insisting on declarations. Auto-declaration almost always causes bugs, and it is completely unnecessary, IMO. It might be easier for beginners, but I don't think beginners should be writing production code.

Somebody said: 'let me ask you: without the "self." convention, how would you expect to distinguish between a local variable called "x" and a member instance variable called "x"? '

If the language were half-way sensible, it would have scoping rules. If the programmer were half-way sensible, they would not be reusing variable names in near-scope like this, unless they are things for which conventions exist (i, j, k) and are typically in a local scope on the same page.

That sorting thing is bizarre and (IMO) violates the 'Rule of least surprise'.

------

At the risk of insulting everybody's favorite language -- they ALL suck.

I have done most of my production work in C, C++, Java and (I am not kidding) Visual Basic. Of the group, C is the least offensive. I will write a more coherent article about this when I go begging for someone to help me create a language that does not suck. However, let me just beat my breast about a few things:

I assume that a competent general purpose language is possible. Most of the world's code is written in C, it seems. Certainly, just about anything *can* be written in C (I assume inline assembly and the emission of arbitrary bytes). For most, if not all, of the world's devices for which a programming language has much meaning, there exists a C compiler.

Before I go on about C, let me just dismiss a few things:

Scripting and/or interpreted languages are fine as far as they go. However, without facilities to compile, create large systems, program down to the bare metal, etc., they are not general purpose in and of themselves. At some level, shell scripting is necessary and desirable. However, before we insist on picking whichever of the incumbents is the winner, let's defer that and assume for the moment that our general purpose language *can* be a scripting, interpreted, byte-code-targeted language as well as a properly compiled one.

Java might be a good stop-gap and it is certainly gathering steam. However, it is targeted to a virtual machine and that means a performance hit any way you slice it.

C++ is also popular. However, for me, the cure is worse than the disease. It is cumbersome and ungainly. It makes unnecessary distinctions, IMO (like structs and classes) and adds junk like:

cout << "I should not have been allowed to design a language";

I know people are going to go nuts over that, but I am not the only one who thinks that is just ugly and superfluous.

I came originally through machine code (sic) -> assembler -> Basic -> Fortran -> PL/1 -> Pascal -> Modula-2. By the time I got to C, I had developed good habits from Modula-2, but was keenly aware of its limitations. C was a nightmare to learn at first, and I still call it 'the language of memory corruption'. However, C allowed me to do whatever I needed to do, and it provided plenty of ability to build abstractions back when memory was measured in bytes and kilobytes.

One of the main complaints about C is that it is 'unsafe'. This is true. However, a powerful language will always be potentially unsafe. It allows you to attempt direct memory access, manipulate the stack, do arithmetic on pointers, etc. Sometimes these things are necessary. People complain that there is some problem with pointers. I have a vague recollection that they were difficult to grasp a long time ago when I was learning the language. However, once you get the hang of them, they are very natural, and though they are 'low-level', they are more reflective of the actual thing you are programming. It is easy to create a 'safe' string type and a set of operations. However, the fact that seasoned C programmers generally don't bother should tell you something about how necessary that is.

The problem with C is not that it allows you to do system programming. The problem is that it makes it very cumbersome to express certain desirable things. I don't want to say 'object oriented programming' here, because I don't want to bring in all the baggage associated with the paradigm. I need facilities that make it possible, but those facilities serve other purposes as well.

I need to be able to create structs (classes in C++) that have some way to self-reference and (potentially) to have prologues and epilogues (constructors and destructors in C++). C++ provides this, but it is clumsy, not very good looking and unnatural.

Some might say that quibbling about giving a second name to structs that have new abilities is silly. However, it is not that I object to the name change. It is that I object to creating an unnecessary distinction between a record that has function pointers (and associated 'behavior') and a record that (for now) does not have such things.

Unless there is a reason to distinguish between something that points to data that happens to be code and something that points to data that happens to be data of another type, I don't think it should be enforced in the language.

Similarly, it seems to me that it should be possible and desirable to remove the distinctions between declarations of code and data.

C suffers from the problem that it does not have facilities for OO programming. Without getting into a debate about their advisability, some of the encapsulation and 'syntactic sugar' provided by these facilities is pretty much a necessity in large modern software systems. It is certainly a nasty business to attempt any kind of GUI work without some kind of object encapsulation at some level.

Other languages address various things, but then make you pay an unnecessary price and fall down in places that make many of them unusable.

A general purpose language should allow you to do anything you wish. It should make it easy to do things well, default sensibly, but allow you to give the compiler all the clues it needs to optimize the finished product.

It seems to me that, warts and all, C is still the best starting point for this and that it should be relatively easy to design the language and build the compiler.

There really is no clear general purpose language that is small, powerful and highly extensible. If there were, we would all be using it. However, I do not see any reason why there should not be such a language.

Although I have my own reasons for preferring C, I think it would be a good candidate starting point because so much of the world's code is built in it already and it would be easier to write translators to the new target language.

I would keep the macro pre-processor, but beef it up to handle template-like stuff. I know a lot of people hate the pre-processor, but to them I would say "don't use it".

I am not a big fan of the assert, break, goto, continue, register, setjmp, longjmp, auto and similar keywords that are either semantically empty or bust structure. However, I think that some facility should exist to allow them to either be created or invoked to allow compatibility.

Although I am not entirely certain how you would implement it elegantly, I think that there should be 'syntactic sugar' such that the compiler figures out whether or not you have attempted to capture an event and if so, trigger the event handler. Thus:


void SomeBlock( void ) {
  int x = 2;
  int y;

  void x.=( int x ) {                // What to do on assignment to x
    if( x < 1 || x > 10 ) {          // S/B manifest consts
      printf( "x value is out of range\n" );   // Value not set
    } else {
      this = x;
    }
  }

  void x.x( int y ) {                // What to do on assignment from x
    if( y > 0 && y < 11 ) {          // S/B manifest consts
      printf( "y value is already in range\n" );   // Value not set
    } else {
      y = this;
    }
  }

  x = 11;   // invoke 'onSet' handler
  y = x;    // y will equal 2, x will equal 2
  x = 3;    // x will equal 3
}

If you have done this, then the compiler should implement the semantics. If not, then the compiler should implement as an ordinary int.

If only y were referenced later, then a smart compiler should be able to simply replace the entire block by assigning y = 2 and strip the rest of the code out. Though it might be tricky to implement, the code above would allow such an optimization.

The code above brings up other points. As I juggle code about, I should be able to do nested functions. I should also be able to access a calling context. The call stack should be available. The name of the function should be available. Information sufficient to locate it in code should be available (usually filename, line number).

Although I like the convenience of breaking interfaces into header files and implementations into code files and that is normally sufficient, I think it is a mistake to conflate the file with the role. That is, I should be able to specify nested interfaces such that many could exist in one file or one could exist in many files. For that matter, I should be able to put code and interface in the same file if I wish. Let the compiler figure out how to deal.

I am obliged to do it, but I am not fond of having to:


#ifndef STUFFUSEDALLOVER
  // define that stuff
  #define STUFFUSEDALLOVER
#endif

Let the compiler figure that out.

There should be some common sense to the compiler. Suppose the standard stream output library is Con.StreamIO, that it has standard calls like PrintLn, and that you have a file named 'hello.g'. Then, instead of:


/* Whole bunch of stuff the compiler should deal with */
implementation hello
use definition hello in "hello.h"

import Con.StreamIO in "constuff.h";

int main( int argc, char **argv, char **envp ) {
  int SomeInventedReturnVal = 0;

  // Bunch of warnings because arguments not used
  (void) Con.StreamIO.Println( "Hello? This is too much work!\n" );
  (void) Con.StreamIO.Println( "argc was:" );
  (void) Con.StreamIO.Println( argc.ToString() );
  (void) Con.StreamIO.Println( "\n" );

  return( SomeInventedReturnVal );
}

The above is extreme, but it is variations on a theme with many languages. I am the first to agree that printf is *way* not pretty. However, it does a job. Sure, I should be able to specify where my tokens are coming from, but if pretty much any program is going to use them, let the compiler default to something sensible. For purists, we could make a default implied import file on the fly.

Let the compiler implicitly include defaults for common definitions encountered, construct whatever wrapper it needs for the operating system, use the same implicit environment and 'self' references it would normally, divine a return value, etc.

All the above should be a one liner:


printf( "Hello! This is more like it.\nArgc was:%i\n", ctxt["argc"] );

Where the return value is the value of the function printf and ctxt[] is an associative array taken from the local 'this' context.

Things that 'bug' me:

The return keyword is not consistent. Using it without brackets should be a syntax error.

The keywords if, for, while, etc. should all require braces, even if they are followed by only a single line. This is a constant cause of bugs and, again, is not consistent.

boolean is a legitimate type because it is returned from things. I am pretty sure they have corrected this in one of the later ANSI C standards, but really... anything that you are consistently using typedefs for should just be given a default existence. Again, let the compiler worry about it. Given that boolean is a type, it might be implemented as a bit, byte, int or whatever; then again, it might not. Values assignable to a boolean should be true/false. I am torn on this point: should it allow a third value of 'undefined'? Unless the language enforces defaults for every declaration, a declared boolean that has not been assigned is not properly true or false; it's undefined. If I had to make a snap decision, I would say a boolean can only take 1 = 'on/true' or 0 = 'off/false', because a lot of the world is using bits as booleans.

Surely there must be some way to make a kick-ass general purpose language with a spare, consistent syntax and elegant semantics. Because C is so spartan and already fairly consistent (well, ish) we should be able to make it sufficiently similar to C so that the porting effort would be reasonable.

I think a lot of the non C/C++/C#/Java people might scream. However, with the right libraries, I think they could likely be bought off by the greater power and expressiveness of the language. Functional programmers are likely sophisticated enough to know that functional programming is *possible* in such a language, just not enforced (and with good reason).

I keep hearing about how this thing or that thing is wonderful, but I don't see a lot of clean, vanilla working code for non-trivial things. In fairness, most of the C code underlying these things is pretty awful and it is usually difficult or impossible to get it to compile.

This year marks the 40th anniversary of the original design of C. It is time for an upgraded language and Java/C++/C#/Python/Ruby/Haskell/Scheme, etc just don't do the trick. Surely we can do better.

What a mess. Sorry this is just a rant. I hope to eventually write a proper article and contact people to see if something can be done.

Re: Why is there no favorite General Purpose Language?, posted 29 Apr 2009 at 21:44 UTC by atai » (Journeyer)

I don't know what language will fill the role you describe, but I am sure I will be able to write an article on Advogato titled "Why <your ideal language> is not my favorite language"

another rule about tests, posted 6 May 2009 at 02:10 UTC by connolly » (Master)

lkcl writes:

an absolute maximum of no more than 20 lines of code

In an 18 Mar 2006 journal entry, I wrote:

I made a sort of new year's resolution for 2006: no more undocumented, untested code.

In practice, my rule is: if the code works the first time, I sometimes let it be. But if not, I add a test before I try to fix it.

Since then, my code is much more often an asset than a liability just waiting to bite me. The confidence that comes from having test cases is something that can't be overemphasized.

And the python doctest module rocks. I found something a little bit similar for javascript (thanks, Ian Bicking!) but not for PHP. And it's much more of a cultural value in the python and ruby communities. I suppose the "serious engineering" segments of perl, php, and javascript also do reasonable testing. But it's not a "batteries included" sort of thing.
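For anyone who hasn't seen it, a minimal doctest (the add function is just an invented example): the tests live in the docstring and double as documentation.

def add(a, b):
    """Return the sum of a and b.

    >>> add(2, 3)
    5
    >>> add(-1, 1)
    0
    """
    return a + b

if __name__ == "__main__":
    import doctest
    doctest.testmod()   # prints nothing when all examples pass; -v for a report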
