benad is currently certified at Apprentice level.

Name: Benoit Nadeau
Member since: 2002-04-03 17:23:09
Last Login: 2014-04-22 17:48:00

FOAF RDF Share This

Homepage: http://benad.me/

Notes:

I am Benoit Nadeau, jr. eng. in Software Engineering,
living and working in Montreal, Canada.

Projects

Recent blog entries by benad

Syndication: RSS 2.0

A Tale of Two Shells

Last year, I completed "properly" learning the bash ("Bash"?) shell, using a combination of the book "Learning the bash Shell" and reading from start to finish the gigantic "man" page. And that was enough to convince me that, regardless of its ubiquity, I don't like it much, be it as a scripting language or as a command-line shell.

Having already learned tcsh, because it was the default shell on Mac OS X and is still popular, I was ready to try out more modern shells, rather than ones stuck in the 80s.

First, on Windows, DOS is quite archaic and annoying. Simple things like sleeping for a second require unintuitive workarounds. DOS batch scripting is painfully difficult, so I was eager to find something better. And since Windows 7, this strange "PowerShell" is now installed by default, and was heralded as a revolutionary step in command-line shells. Is it?

After reading a few tutorials, including this "free ebook" on powershell.com, things became clear. Windows PowerShell is essentially a shell built atop .NET that manipulates streams of objects rather than plain text across pipes, though thankfully formats the objects as plain text on the console screen by default. It provides many "cmdlets" that can manipulate that stream in a more convenient way than grep and awk. For example, listing all files in a directory greater than a megabyte is trivial in PowerShell, while on UNIX shell requires an awkward (pun intended) combination of extracting character positions and arithmetic. The "PowerShell IDE", also provided with PowerShell, can even perform tab-completion of the fields of the objects out of the current pipe, making it easy to extract their attributes. Because so much of the Windows internals are accessible using COM and .NET, it is easy to perform system administration with it, for example installing system services and querying their status. The only major issue I've found with PowerShell is that executing PowerShell scripts is completely disabled by default until unlocked by an administrator. Also, as a minor gripe, the version of PowerShell that came out of the box with Windows 7 is quite outdated. Overall, if there was a "grand vision" of ".NET" in Windows, it is best represented by PowerShell.

Back on UNIX-like systems, it seems like users are quite happy with old shells, or at least incremental evolutions of the old "Bourne shell" and Berkeley's "C Shell". Looking around, I found "fish", the "Friendly Interactive SHell". I liked its ironic tagline of "Finally, a command line shell for the 90s", since it was initially released in 2005. It is, indeed, friendly, as it has a deliberately limited set of features, and has default out-of-the-box functionality that makes interactive use enjoyable. It was built around a comprehensive design document that explicitly favours usability over compatibility with older or popular shells. The results are spectacular: Everything has colour, TAB and arrow keys completion with a type ahead preview in light grey, inline argument completion for most commands (including "man") that present interactively all the options and their meaning automatically extracted from the "man" pages, editing the configuration through a built-in web service, applying configuration changes to all shells instantly, I could go on. Its scripting is quite limited, but that may be a good thing considering the Shellshock bug (Not that "fish" has no security holes, but at least they're not "as designed").

Personally, I am ready to move to both PowerShell and "fish" for day-to-day use. While they both don't have much in common to older shells, they are far more usable. I highly recommend to all command-line users.

Syndicated 2014-10-28 01:37:41 from Benad's Blog

Moniker, the Security Weak Point

By the time I heard about the "Shellshock bug" security hole in the morning of September 25, already the small Debian Linux server that I built up in Ramnode to host my web site patched that security hole by itself. At any rate, my web site hosts only static pages, so it was never impacted.

While I am in control of the security of my web site starting from the Linux kernel, to the web server, up to the web pages it serves, I'm still dependent on its hosting service (Ramnode) to be secure. Another potential danger would be for someone to hijack the "benad.me" domain name, and make it point to a version of the site filled with malware and viruses. Sadly, this almost happened.

Back in 2008 I registered my domain through the registrar Moniker, which used to be recommended partly based on its security. They implemented additional features that one could pay to "lock" the domain and prevent unauthorized transfers from someone that would steal your user name and password. Since then though, the company was bought by another company, and what is now called Moniker is only by name, both in terms of staffing and software.

I did notice a difference in tone in email communications from the new Moniker. They seemed to be highly focused on domain name auctions, and would automatically auction off expired domains. This felt like a conflict of interest, as Moniker would deride higher profit auctioning off your domain than helping you renewing it. Of course, they would never do that on valuable customers that do "domain speculation" and own a large number of (unused) domains, but still that raises the suspicion that the company was sold based on the number of domains it had and how much money they can extract from large speculators rather than providing valuable customer service.

The "new" Moniker had a security hole in 2013, and to fix that Moniker forced users to change their password the next time they logged in. Note though that this happened with the old version of that web site. This summer, the parent company that bought Moniker (and its name) scrapped the old site's code and replaced it with a new broken, buggy interface. The new interface also brought with it worse security, and made the domain locking feature completely ineffective.

By early October, Moniker sent an email to all its users saying that for untold security reasons, all the account passwords would be reset. The shock was that the email contained both the user names and passwords of all the user's accounts. I was shocked that my old Moniker account, identified by a standard-looking user name, was placed under a parent, numerically-identified user name I've never seen, and another numerical sub-account that was created without my knowledge. It should be noted that I could never access the numerical sub-account, even when using the password provided in the email. Also, the email said that your new passwords must fit security requirements, including the use of at least one "special character", even though the passwords provided in the email didn't contain any special character, and when attempting to change passwords, it would refuse most special characters.

OK, I'm not a security expert, but sending user names and passwords in an email, refusing special characters (which would indicate that they don't use bcrypt), and resetting the passwords of all users may indicate that they were hacked. Badly. Moniker cited the Shellshock bug, but as reports of stolen domains started to appear, a user came forth saying that the security hole predated Shellshock by a month.

So, I was convinced that Moniker had a pattern of behaviour of not taking security seriously, that is until they experience a mass exodus of their customers. I started the process of domain name transfer the day after they announced the password reset, and I would recommend everybody else to do the same. I transferred to Namecheap. Despite its name, in my case the price was the same, though as a test I created a new empty account before the transfer, and already I could attest that they take security seriously, including emails for account activity (using secondary email addresses in rotation) and 2-factor authentication (using SMS for now). I completed the transfer yesterday, so that would explain why there was a little bit of downtime when resolving my domain name.

Syndicated 2014-10-15 01:21:21 from Benad's Blog

Eventual Consistency, Squared

Looking back at my article "The Syncing Problem", implementing a generic DVCS seems like a relatively straightforward solution. Actually, if the "data to sync" was simplified to plain text, existing DVCS like git or Mercurial may be sufficient. But there is a fundamental problem I glossed over that has huge ramifications on the design of the DVCS that make existing DVCS implementations dangerous to use.

In modern "Internet-connected" appliances, there are two storage solutions: On-device, and "in the cloud". It is the "cloud" storage that is going to be used for the devices to communicate to each other indirectly when performing data synchronization. There is though a huge behavioural difference between on-device and cloud storage: The cloud storage is "eventually consistent". Beneath its API, the cloud storage itself may also be distributed across machines, and modified data can take a little while to propagate to other machines. Essentially, if you upload a file from one device, it may take a little while for another device to see the change.

Sadly, whatever conflict resolution used by a cloud storage provide is unreliable, because their behaviour is either undocumented or inconsistent. Locking files on such storage may not be possible either. Worse, internal synchronization issues at the cloud provider may make their internal synchronization speed so inconsistent to make it unreliable as a means to communicate information between devices quickly. Almost all VCS (distributed or not) assume a reliable storage area for the version repository. Hosted VCS guarantee ACID. No DVCS was made to push revision information to an unreliable storage, and use that as the primary means to exchange information.

The easiest solution for this is to design a DVCS that supports "write-only" repositories. If the storage key (file name) contains the checksum of the data it contains, it may be possible to have multiple clients writing to the same storage area changes to the shared repository. Even if listing available "data blocks" is inconsistent on the shared storage, all it can do is augment the "knowledge" of what exists in the repository against the repository stored on local storage. The atomicity of the storage blocks should be as close as possible to the atomicity of a transactional version control delta, especially since the cloud storage may make information appear on other devices out-of-order of how they were written. That could make those "patch files" larger than well-optimized VCS, but on cloud storage we may not have any other option.

Sure, a write-only repository may be a big issue if the files in the version control are too large or if storage is limited, but then most VCS tend to avoid deleting historical data, and when they support "cleaning up a repository", the solutions are clumsy and error-prone. In our case, if a device prematurely deletes older historical data in the shared storage, by being unaware that other devices were synced at older versions and may branch from there, then this would be tantamount to using the share storage to host only the latest version and nothing else. All to say, deleting historical data in a shared, eventually-consistent storage is a difficult problem that may involve a lot of tuning based on how long devices can stay unsynchronized before considering them "lost", compared to how fast the cloud storage is expected to be consistent.

Syndicated 2014-10-04 15:05:19 from Benad's Blog

Client-Side JavaScript Modules

I dislike the JavaScript programming language. I despise Node.js for server-side programming, and people with far more experience than me with both also agree, for example Ted Dziuba in 2011 and more recently Eric Jiang. And while I can easily avoid using Node as a server-side solution, the same cannot be said about avoiding JavaScript altogether.

I recently discovered Atom, a text editor based on a custom build of WebKit, essentially running on HTML, CSS and JavaScript. Though it is far from the fastest text editor out there, it feels like a spiritual successor to my favourite editor, jEdit, but based on modern web technologies rather than Java. The net effect is that Atom seems like the fastest growing text editor, and with its deep integration with Git (it was made by Github), it makes it a breeze to change code.

I noticed a few interesting things that were used to make JavaScript more tolerable in Atom. First, it supports CoffeeScript. Second, it uses Node-like modules.

CoffeeScript was a huge discovery for me. It is essentially a programming language that compiles into JavaScript, and makes JavaScript development more bearable. Its syntax reminds me a bit of the syntax difference between Java and Groovy. There's also the very interesting JavaScript to CoffeeScript converter called js2coffee. I used js2coffee on one of my JavaScript module, and the result was far more readable and manageable.

The problem with CoffeeScript is that you need to integrate its compilation to JavaScript somewhere. It just so happens that its compiler is a command-line JavaScript tool made for Node. A JavaScript equivalent to Makefiles (actually, more like Maven) is called Grunt, and from it you can call the CoffeeScript compiler directly, on top of UglifyJS to make the generated output smaller. All of these tools exist under node_modules/.bin when installed locally using npm, the Node Package Manager.

Also, by writing my module as a Node module (actually, CommonJS), I could also use some dependency management and still deploy it for a web browser's environment using Browserify. I could even go further and integrate it with Jasmine for unit tests, and run them in a GUI-less full-stack browser like PhantomJS, but that's going too far for now, and you're better off reading the Browserify article by Bastian Krol for more information.

It remains that Browserify is kind of a hack that isn't ideal for running JavaScript modules in a browser, as it has to include browser-equivalent functionality that is unique to Node and isn't optimized for high-latency asynchronous loading. A better solution for browser-side JavaScript modules is RequireJS, using the module format AMD. While not all Node modules have an AMD equivalent, the major ones are easily accessible with bower. Interestingly, you can create a module that can as AMD, Node and natively in a browser using the templates called UMD (as in "Universal Module Definition"). Also, RequireJS can support Node modules (that don't use Node-specific functionality) and any other JavaScript library made for browsers, so that you can gain asynchronous loading.

It should be noted that bower, grunt and many command-line JavaScript tools are made for Node and installed locally using npm, So, even if "Node as a JavaScript web server" fails (and it should), using Node as an environment for local JavaScript command-line tools works quite well and could have a great future.

After all is said and done, I now have something that is kind of like Maven, but for JavaScript using Grunt, RequireJS, bower and Jasmine, to download, compile (CoffeeScript), inject and optimize JavaScript modules for deployment. Or you can use something like CodeKit if you prefer a nice GUI. Either way, JavaScript development, for client-side software like Atom, command-line scripts or for the browser, is finally starting to feel reasonable.

Syndicated 2014-08-15 03:14:41 from Benad's Blog

The Case for Complexity

Like clockwork, there is a point in a programmer's career where one realizes that most programming tools suck, that not only they hinder the programmer's productivity, but worse may have an impact on the quality of the product for end users. And so, there are cries of the absurdity of it all, some posit that complex software development tools must exist because some programmers like complexity above productivity, while others long for the days where programming was easier.

I find these reactions amusing. Kind of a middle-life crisis for programmers. Trying to rationalize their careers, most just end up admitting defeat for a professional life of mediocrity, by using dumber tools and hoping to avoid the main reason why programming can be challenging. I went into that "programmer's existential crisis" in my third year as a programmer, just before deciding on making it a career, but I went out of it with what seems to be a conclusion seldom shared by my fellow programmers. To some extent this is why I don't really consider myself a programmer but rather a software designer.

The fundamental issue isn't the fact that software is (seemingly) unnecessarily complex, but rather trying to understand the source of that complexity. Too many programmers assume that programming is based on applied mathematics. Well, it ought to be, but programming as practiced in the industry is quite far from its computer science roots. That deviation isn't due only from programming mistakes, but due to the more irrational external constraints and requirements. Even existing bugs become part of the external constraints if they are in things you cannot fix but must "work around".

Those absurdities can come from two directions: Top-down, based on human need and mental models, or Bottom-up, based on faulty mathematical or software design models. Productive and efficient software development tools, by themselves, bring complexity above the programming language. Absurd business requirements, including cost-saving measures and dealing with buggy legacy systems not only bring complexity, but the workarounds they require bring even more absurd code.

Now, you may argue that abstractions make things simpler, and to some extent, they are. But abstractions only tend to mask complexity, and when things break or don't work as expected, that complexity re-surfaces. From the point of view of a typical user, if it's broken, you ask somebody else to fix it or replace it. But being a programmer is being that "somebody else" that takes responsibility into understanding, to some extent, that complexity.

You could argue that software should always be more usable first. And yet, usable software can be far more difficult to implement than software that is more "native" to its computing environment. All those manual pages, the flexible command-line parameters, those adaptive GUIs, pseudo-AIs, Clippy, and so on, bring enormous challenges to the implementation of any software because humans don't think like machines, and vice-versa. As long as users are involved, software cannot be fully "intuitive" for both users and computers at the same time. Computers are not "computing machines", but more sophisticated state machines made to run useful software for users. Gone are the days where room-sized computers just do "math stuff" for banks, where user interaction was limited to numbers and programmers. The moment there were personal computers, people didn't write "math-based software", but rather text-based games with code of dubious quality.

Complexity of software will always increase, because it can. Higher-level programming languages become more and more removed from the hardware execution model. Users keep asking for more features that don't necessarily "fit well", so either you add more buttons to that toolbar, or you create a brand new piece of software with its own interfaces. Even if by some reason computers stopped getting so much faster over time, it wouldn't stop users from asking for "more", and programmers from asking for "productivity".

My realization was that there has to be a balance between always increasing complexity and our ability to understand it. Sure, fifty years ago it would be reasonable to have a single person spend a few years to fully understand a complete computer system, but nowadays we just have to become specialized. Still, specialization is possible because we can understand a higher-level conceptual design of the other components rather than just an inconsistent mash up of absurdity. Design is the solution. Yes, things in software will always get bigger, but we can make it more reasonable to attempt to understand it all if, from afar, it was designed soundly rather than just accidentally "became". With design, complexity becomes a bit smaller and manageable, and even though only the programmers will have to deal with most of that complexity, good design produce qualities that become visible up to the end users. Good design makes tighter "vertical integration" easier since making sense of the whole system is easier.

Ultimately, making a better software product for the end users requires the programmer to take responsibility for the complexity of not only the software's code, but also of its environment. That means using sound design for any new code introduced, and accepting the potential absurdity of the rest. If you can't do that, then you'll never be more than a "code monkey".

Notes

  1. Many programmers tend to assume that their code is logically sound, and that their errors are mostly due to menial mistakes. In my experience, it's the other way around: The buggiest code is produced when code isn't logically sound, and this is what happens most of the time, especially in scripting languages that have weak or implicit typing.
  2. I use the term "complexity" more as the number of module connections than the average of module coupling. I find "complexity as a sum" more intuitive from the point of view of somebody that has to be aware of the complete system: Adding an abstraction layer still adds a new integration point between the old and new code, adding more things that could break. This is why I normally consider programming tools added complexity, even though their code completion and generation can make the programmers more productive.

Syndicated 2014-07-31 02:14:53 from Benad's Blog

106 older entries...

 

benad certified others as follows:

  • benad certified benad as Journeyer
  • benad certified llasram as Journeyer
  • benad certified shlomif as Journeyer

Others have certified benad as follows:

  • benad certified benad as Journeyer
  • llasram certified benad as Journeyer
  • pasky certified benad as Apprentice

[ Certification disabled because you're not logged in. ]

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!

X
Share this page