michi is currently certified at Master level.

Name: Michael Starzinger
Member since: 2009-03-06 15:58:57
Last Login: 2011-03-16 18:20:27


No personal information is available.

Projects

  • Lead Developer on CACAO

Recent blog entries by michi


All-Rules Mail Bundle gets a new home

When I first published the All-Rules Mail Bundle more than two years ago, along with a precompiled binary, I didn’t give much thought to where to host the binary. Hosting it on GitHub together with the source seemed the obvious choice. But then GitHub said goodbye to uploads and discontinued the feature for uploading binary files.

At this point I have to say that I wholeheartedly agree with their decision. GitHub is a great place to host and share source code and I love what they are doing. But hosting (potentially big) binary files was never the idea behind GitHub; it’s just not what they do. Better to stick to your trade: do one thing and do it well. Hence the search for a new home began. It’s important to remember that cool URIs don’t change, so the new home for the All-Rules Mail Bundle binary had better be permanent, which is why I decided to host the binary on my own server. Also, the staggering number of 51 downloads over the past two years reassured me that my available bandwidth could handle the traffic.

Where to get the bundle

The source code repository will of course remain on GitHub and its location is unchanged. Only the location of the binary package has changed and moved off GitHub. The usual amount of URL craftsmanship should allow you to reach previous versions of the binary package.

Note that I also took this opportunity to compile a new version 0.2 binary package. This version contains all the compatibility updates I made over the past two years and is compatible with several environments up to the following.

  • Mac OS X Mountain Lion 10.8.4
  • Mail Application 6.5
  • Message Framework 6.5

As always, your feedback is very much appreciated and I am looking forward to the next fifty or so downloads.

Syndicated 2013-06-09 20:25:23 from michi's blog

Daneel: Type inference for Dalvik bytecode

In the last blog post about Daneel I mentioned one particular caveat of Dalvik bytecode, namely the existence of untyped instructions, which has a huge impact on how we transform bytecode. I want to take a similar approach as last time and look at one specific example to illustrate those implications. So let us take a look at the following Java method.

public float untyped(float[] array, boolean flag) {
   if (flag) {
      float delta = 0.5f;
      return array[7] + delta;
   } else {
      return 0.2f;
   }
}

The above is a straightforward snippet and most of you probably know what the generated Java bytecode will look like. So let’s jump right to the Dalvik bytecode and discuss that in detail.

UntypedSample.untyped:([FZ)F:
  [regs=5, ins=3, outs=0]
   0000: if-eqz v4, 0009
   0002: const/high16 v0, #0x3f000000
   0004: const/4 v1, #0x7
   0005: aget v1, v3, v1
   0007: add-float/2addr v0, v1
   0008: return v0
   0009: const v0, #0x3e4ccccd
   000c: goto 0008

Keep in mind that Daneel doesn’t like to remember things, so he wants to look through the code just once from top to bottom and emit Java bytecode while doing so. He gets really puzzled at certain points in the code.

  • Label 2: What is the type of register v0?
  • Label 4: What is the type of register v1?
  • Label 9: Register v0 again? What’s the type at this point?

You, as a reader, do have the answer because you know and understand the semantics of the underlying Java code, but Daneel doesn’t, so he tries to infer the types. Let’s look through the code in the same way Daneel does.

At method entry he knows the types of the method parameters. Dalvik passes parameters in the last registers (in this case in v3 and v4). Because this is an instance method, the register just before them (in this case v2) holds the this reference. So we start out with the following register types at method entry.

UntypedSample.untyped:([FZ)F:
  [regs=5, ins=3, outs=0]               uninit uninit object [float bool

The array to the right represents the inferred register types at each point in the instruction stream as determined by the abstract interpreter. Note that we also have to keep track of the dimension count and the element type for array references. Now let’s look at the first block of instructions.

   0002: const/high16 v0, #0x3f000000   u32    uninit object [float bool
   0004: const/4 v1, #0x7               u32    u32    object [float bool
   0005: aget v1, v3, v1                u32    float  object [float bool
   0007: add-float/2addr v0, v1         float  float  object [float bool

Each line shows the register type after the instruction has been processed. At each line Daneel learns something new about the register types.

  • Label 2: I don’t know the type of v0, only that it holds an untyped 32-bit value.
  • Label 4: Same applies for v1 here, it’s an untyped 32-bit value as well.
  • Label 5: Now I know v1 is used as an array index, it must have been an integer value. Also the array reference in register v3 is accessed, so I know the result is a float value. The result is stored in v1, overwriting its previous content.
  • Label 7: Now I know v0 is used in a floating-point addition, it must have been a float value.

Keep in mind that at each line, Daneel emits appropriate Java bytecode. So whenever he learns the concrete type of a register, he might need to retroactively patch previously emitted instructions, because some of his assumptions about the type were broken.

Finally we look at the second block of instructions reached through the conditional branch as part of the if-statement.

   0009: const v0, #0x3e4ccccd          u32    uninit object [float bool
   000c: goto 0008                      float  uninit object [float bool

When reaching this block we basically have the same information as at method entry. Again Daneel learns in the process.

  • Label 9: I don’t know the type of v0, only that it holds an untyped 32-bit value.
  • Label 12 (000c in the listing): Now I know that v0 has to be a float value because the unconditional branch targets the join-point at label 8. And I already looked at that code and know that we expect a float value in that register at that point.

This illustrates why our abstract interpreter also has to remember and merge register type information at each join-point. It’s important to keep in mind that Daneel follows the instruction stream from top to bottom, as opposed to the control-flow of the code.
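The merge at a join-point can be sketched in a few lines of Java. This is only an illustration of the idea and not Daneel’s actual code: the names JoinPointMerge, RegType, merge and mergeFrames are made up here, array dimensions and element types are dropped, the boolean parameter is modelled as a plain int, and the rule that a register which is uninitialized on one path stays unusable after the join is an assumption.

```java
import java.util.Arrays;

// Hypothetical sketch; names and merge rules are assumptions, not Daneel's implementation.
final class JoinPointMerge {
    /** Simplified register types; array dimensions and element types are left out. */
    enum RegType { UNINIT, U32, INT, FLOAT, OBJECT, CONFLICT }

    /** Merges what two incoming paths know about a single register. */
    static RegType merge(RegType a, RegType b) {
        if (a == b) return a;
        if (a == RegType.CONFLICT || b == RegType.CONFLICT) return RegType.CONFLICT;
        // Assumption: a register that is not initialized on every path stays unusable.
        if (a == RegType.UNINIT || b == RegType.UNINIT) return RegType.UNINIT;
        // An untyped 32-bit value adopts the concrete type the other path expects.
        // Simplification: null constants are ignored, so U32 never becomes OBJECT.
        if (a == RegType.U32 && (b == RegType.INT || b == RegType.FLOAT)) return b;
        if (b == RegType.U32 && (a == RegType.INT || a == RegType.FLOAT)) return a;
        return RegType.CONFLICT;
    }

    /** Merges two complete register frames element by element. */
    static RegType[] mergeFrames(RegType[] x, RegType[] y) {
        if (x.length != y.length) throw new IllegalArgumentException("frame sizes differ");
        RegType[] merged = new RegType[x.length];
        for (int i = 0; i < x.length; i++) merged[i] = merge(x[i], y[i]);
        return merged;
    }

    public static void main(String[] args) {
        // Frame expected at label 8 when falling through from label 7.
        RegType[] fallThrough = { RegType.FLOAT, RegType.FLOAT, RegType.OBJECT, RegType.OBJECT, RegType.INT };
        // Frame at the goto in the second block, before the merge.
        RegType[] viaGoto = { RegType.U32, RegType.UNINIT, RegType.OBJECT, RegType.OBJECT, RegType.INT };
        System.out.println(Arrays.toString(mergeFrames(fallThrough, viaGoto)));
    }
}
```

Running main merges the two frames that meet at label 8: v0 is narrowed from an untyped 32-bit value to float, while v1 ends up uninitialized because only one of the two paths ever wrote to it.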

Now imagine scrambling up the code so that instruction stream and control-flow are vastly different from each other, together with a few exception handlers and an optimal register re-usage as produced by some SSA representation. That’s where Daneel still keeps choking at the moment. But we can handle most of the code produced by the dx tool already and will hunt down all those nasty bugs triggered by obfuscated code as well.

Disclaimer: The abstract interpreter and the method rewriter were mostly written by Rémi Forax. With this post I take no credit for its implementation whatsoever; I just want to explain how it works.

Syndicated 2011-05-08 20:44:14 from michi's blog

Daneel: The difference between Java and Dalvik

Those of you who kept following IcedRobot might have seen that quite some work went into Daneel over the past months. He¹ is in charge of parsing Android applications containing code intended to run on a Dalvik VM and transforming this code into something which can run on any underlying Java VM. So he is a VM compatible with Dalvik on top of a Java VM, or at least that’s what he wants to become.

So Daneel is multilingual in a strange way: he can read and understand Dalvik bytecode, but he only speaks and writes Java bytecode. To understand how he can do that we have to look at the differences between those two dialects.

Registers vs. Stack: We know Dalvik bytecode uses a register-machine, and Java bytecode uses a stack-machine. But each method frame on that stack-machine not only has an operand stack, it also has an array of local variables. Unfortunately this distinction is lost in our register-machine. To understand what this means, let us look at a full Java-Dalvik-Daneel round-trip for a simple method like the following.

public static int addConst(int val) {
   return val + 123456;
}

The first stop on our round-trip is the Java bytecode. So after we push this snippet through javac we get the following code, which makes use of both an operand stack and local variables.

public static int addConst(int);
  [max_stack=2, max_locals=1, args_size=1]
   0: iload_0
   1: ldc #int 123456
   3: iadd
   4: ireturn

The second stop takes us to the Dalvik bytecode. We push the above code through the dx tool and are left with the following code. Note that the distinction between the operand stack and local variables is lost completely; everything is stored in registers.

public static int addConst(int);
  [regs=2, ins=1, outs=0]
   0: const v0, #0x1E240
   1: add-int/2addr v0, v1
   2: return v0

The third and last stop is Daneel reading the Dalvik bytecode and trying to reproduce sane Java bytecode again. The following is what he spits out after chewing on the input for a bit.

public static int addConst(int);
  [max_stack=2, max_locals=2, args_size=1]
   0: ldc #int 123456
   1: istore_1
   2: iload_1
   3: iload_0
   4: iadd
   5: istore_1
   6: iload_1
   7: ireturn

The observant reader will notice the vast difference between what we had at the beginning of our round-trip and what we ended up with. Daneel maps each Dalvik register to a Java local variable. Fortunately any decent Java VM will optimize away the unnecessary load and store instructions and we can achieve acceptable performance with this naive approach already.
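The naive mapping described above can be sketched as a tiny translator that handles just the three Dalvik instructions of this example. This is a hypothetical illustration and not Daneel’s code: the class NaiveRegisterMapper and its methods are invented for this sketch, and the slot numbering (parameter registers first, remaining registers after them) is an assumption chosen because it reproduces the listing above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names and slot numbering are assumptions, not Daneel's implementation.
final class NaiveRegisterMapper {
    private final int regs;   // total number of Dalvik registers in the method
    private final int ins;    // number of registers holding incoming parameters
    private final List<String> out = new ArrayList<>();

    NaiveRegisterMapper(int regs, int ins) {
        this.regs = regs;
        this.ins = ins;
    }

    /** Maps a Dalvik register to a Java local variable slot. */
    int local(int reg) {
        int firstParam = regs - ins;   // Dalvik keeps parameters in the last registers
        return reg >= firstParam ? reg - firstParam : reg + ins;
    }

    private static String op(String mnemonic, int slot) {
        // Slots 0..3 have dedicated short forms such as iload_0.
        return slot <= 3 ? mnemonic + "_" + slot : mnemonic + " " + slot;
    }

    /** const vDst, #value (treated as an int constant here). */
    void constInt(int dst, int value) {
        out.add("ldc #int " + value);
        out.add(op("istore", local(dst)));
    }

    /** add-int/2addr vA, vB, which computes vA = vA + vB. */
    void addInt2addr(int a, int b) {
        out.add(op("iload", local(a)));
        out.add(op("iload", local(b)));
        out.add("iadd");
        out.add(op("istore", local(a)));
    }

    /** return vSrc for an int result. */
    void returnInt(int src) {
        out.add(op("iload", local(src)));
        out.add("ireturn");
    }

    List<String> emitted() {
        return out;
    }

    public static void main(String[] args) {
        NaiveRegisterMapper mapper = new NaiveRegisterMapper(2, 1);
        mapper.constInt(0, 0x1E240);
        mapper.addInt2addr(0, 1);
        mapper.returnInt(0);
        mapper.emitted().forEach(System.out::println);
    }
}
```

Every Dalvik instruction is translated in isolation: operands are loaded from their local slots onto the operand stack, the operation is performed, and the result is stored straight back. That is where the redundant store and load pairs in the listing come from.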

Untyped Instructions: Another big difference might not be that obvious at first glance. Notice how the instruction at label 0 in the above Dalvik bytecode (the second stop on our round-trip) writes register v0 without specifying the exact type of that register? The only thing Daneel can determine at that point in the code is that we are dealing with a 32-bit value; it could be an int or a float value. For zero-constants it could even be a null reference. The exact type of that register is not revealed until the instruction at label 1, where v0 is accessed again, this time by a typed instruction. It’s at that point that we learn the exact type of that register.

So Daneel has to keep track of all register types while iterating through the instruction stream to determine the exact types and decide which Java bytecode instructions to emit. I intend to write a separate article about how this is done by Daneel in the following days, so stay tuned.
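To see why an untyped constant is ambiguous, it helps to look at how the same 32 bits read under both interpretations. The following is a small sketch in plain Java; the class UntypedConstant and its two helper methods are made up for this illustration and have nothing to do with Daneel’s code.

```java
// Hypothetical sketch; the class and method names are made up for illustration.
final class UntypedConstant {
    /** Interprets the raw 32-bit pattern as an int, as an add-int instruction would. */
    static int asInt(int bits) {
        return bits;
    }

    /** Interprets the very same pattern as a float, as an add-float instruction would. */
    static float asFloat(int bits) {
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        int bits = 0x1E240;                 // the constant loaded at label 0
        System.out.println(asInt(bits));    // 123456
        System.out.println(asFloat(bits));  // a tiny denormal float, clearly not intended

        // An all-zero pattern is even less telling: int 0, float 0.0f or a null reference.
        System.out.println(asInt(0) + " " + asFloat(0));
    }
}
```

Only the later add-int/2addr instruction settles the question in favour of the int interpretation.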

Disclaimer: This is a technical description of just two major differences between Dalvik bytecode and Java bytecode. All political discussions about differences or similarities between Dalvik and Java in general are outside the scope of this article and I won’t comment on them.

¹ Yes, Daneel is male. His girlfriend is called Ika. Together they love to drink iced tea because they try to get off caffeine. They even have a butler working for them who is called Jenkins, a very lazy guy who was regularly seen to crash during work.

Syndicated 2011-04-27 21:11:25 from michi's blog

All-Rules Mail Bundle: The shortcut to your Mail.app rules

Have you ever wanted to automate some message sorting tasks in Apple’s Mail application after you have read a message? I, for example, use one archive folder per account and move all messages into that folder after I’ve read them. The application’s rule system is perfectly suited for that task, but unfortunately there is no way to activate certain rules by pressing a keyboard shortcut. That’s where this bundle comes into play.

The All-Rules Mail Bundle acts as a plugin for Apple’s Mail application and serves just one specific purpose. It provides an additional menu item located under “Message -> Apply All Rules” which applies all active rules to the currently selected messages while ignoring any present “Stop evaluating rules” action.

Where to get the bundle

The source of the bundle is available at GitHub as a standard Xcode project. Feel free to adapt it to your needs if necessary. I will also provide a precompiled binary for those of you who just want to use it out of the box.

Note that I’ve developed and tested the thing on my only Mac machine, which clearly is inadequate test coverage. As always I would be happy about any feedback. So far the bundle is known to run in the following environment, which is the most recent one at the time of writing.

  • Mac OS X Snow Leopard 10.6.6
  • Mail Application 4.4
  • Message Framework 4.4

How it is implemented

First of all, let me emphasize that this is the first time I actually did some Objective-C coding. But I really liked the feel of it. I was really surprised by the power of the Objective-C runtime. You can do lots of nasty stuff at runtime like changing class hierarchies, adding methods to classes, changing method implementations and so on.

I used one technique known as method swizzling in the bundle, which lets you switch the existing implementation of a method with your own replacement at runtime. This enabled me to override the original shouldStopEvaluatingRules implementation of the MessageRule class inside the Message framework.

Unfortunately most of the APIs of the Mail application and the Message framework are private, so I expect my bundle to break sometime in the future. But the API can be easily reverse engineered with the class-dump utility which generates header files out of Objective-C binaries.

To prevent bundles from silently breaking, each bundle includes a list of the exact versions of Message frameworks and Mail applications it is compatible with. I found an article that explains how to fix unsupported plugins after upgrading Mail.app without recompiling them. So if you have different versions running on your machine that are compatible as well, let me know about them.

And last but not least I want to mention one article which helped me a lot in figuring out all those tiny details and really did its job in demystifying Mail.app plugins on Leopard for me.

Related bundles

The same (and more) could be done with Indev’s Mail Act-On bundle, but unfortunately that bundle is sold under a commercial license. With my bundle I cloned the one essential feature which was indispensable for my personal use.

Syndicated 2011-01-11 01:11:39 from michi's blog

WAQL-PP 0.1 released

With this post I am proud to announce the first release of WAQL-PP, a WAQL Preprocessor for Java I have been working on for the last two weeks. In an earlier post I described the motivation behind this little project and how I planned to implement it. I’m rather satisfied with the result, so without further ado here is a copy of the release notes for this version. If you are interested just visit the project page to check it out.

WAQL-PP 0.1 released.
 
This is the first release of the WAQL Preprocessor for Java. Here is a short
list of the most important features:
 
  * Resolves Data Dependencies between separate queries by converting
    replacement objects into a textual representation.
  * Handles nested Data Dependencies from innermost to outermost.
  * Transforms Template List constructs into valid XQuery for-clauses and
    handles correlations between different Template Lists.
  * Parser tested against the XML Query Test Suite (XQTS).
 
This release was developed against and tested with Java SE 1.6.0_22. It uses
Apache Ant as a build tool, JUnit 4.8.2 for testing purposes, JavaCC 5.0 as a
parser generator and has no additional runtime dependencies. It is currently
being used as a component in the WS-Aggregation framework.
 
Information about the project and general documentation can be found on
http://www.antforge.org/waqlpp
 
The WAQL-PP 0.1 release packages can be downloaded from
http://www.antforge.org/waqlpp/download/waqlpp-0.1/
 
File   : waqlpp-0.1-src.zip
md5sum : 57d06bfedaf1abd6eeed793838d96fc7
sha1sum: 1a5fd2196a0916fd74479c4e7aaa57811b673e3b
 
File   : waqlpp-0.1.jar
md5sum : bf97850f878014090eb9b9849e18ab37
sha1sum: 74c4e0e7e78bc16fea4bfd1b0954439e74636118
 
Enjoy!
Michael Starzinger
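
If you want to check a downloaded package against the checksums listed in the release notes, a few lines of Java are enough. This is only a sketch: the class ChecksumCheck and its methods are hypothetical helpers that are not part of WAQL-PP, and main assumes the jar has been saved into the current working directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the WAQL-PP distribution.
final class ChecksumCheck {
    /** Returns the lowercase hex digest of the data for an algorithm such as "MD5" or "SHA-1". */
    static String hexDigest(String algorithm, byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance(algorithm).digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("unsupported digest: " + algorithm, e);
        }
    }

    /** Compares the digest of a file with the expected hex string. */
    static boolean matches(Path file, String algorithm, String expectedHex) {
        try {
            byte[] content = Files.readAllBytes(file);
            return hexDigest(algorithm, content).equalsIgnoreCase(expectedHex.trim());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the jar was downloaded into the working directory.
        Path jar = Paths.get("waqlpp-0.1.jar");
        System.out.println(matches(jar, "MD5", "bf97850f878014090eb9b9849e18ab37"));
        System.out.println(matches(jar, "SHA-1", "74c4e0e7e78bc16fea4bfd1b0954439e74636118"));
    }
}
```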

Syndicated 2010-11-11 22:07:58 from michi's blog


 

michi certified others as follows:

  • michi certified twisti as Master
  • michi certified rlougher as Master
  • michi certified ploppy as Master
