Older blog entries for gary (starting at number 234)

Inside Zero and Shark: Handles and Oops, Traps and Checks

You’re about to run the important enterprise application “Hello World”. What’s going to happen?

class HelloWorld {
  public static void main(String[] args) {
    System.out.println("Hello world!");
  }
}

After initializing itself, HotSpot will create a new Java thread. This will initially be _thread_in_vm because it’s running VM code. Eventually it will call JavaCalls::call (in javaCalls.cpp) to bridge from VM code to Java code. Before we can look at what JavaCalls::call does, however, we need to understand a couple of HotSpot conventions. Look at its prototype:

void JavaCalls::call(JavaValue* result, methodHandle method, JavaCallArguments* args, TRAPS);

The first things we need to understand are handles and oops. All Java objects, and in fact all objects in HotSpot managed by the garbage collector, are oops, and when you’re dealing with oops you need to keep the garbage collector in mind. More specifically, you need to know where in your code the GC might run, because when it does run you need to have told it the location of every single oop you’re using, and when it returns you need to deal with the fact that your oops have probably moved. If your C compiler has optimized your code such that an oop is in a register then the oop in that register is now wrong, and you’re going to crash pretty soon.

Dealing with raw oops is hard, but luckily there are ways of protecting them. In VM code — when you’re _thread_in_vm — the protection of choice is to use handles. A handle wraps an oop, managing access to it such that GC activity becomes transparent. If you’re in VM code and you’re using handles then you don’t have to worry. But you do need to know what’s happening, because if you see some code that’s calling methodHandle methods and you grep the OpenJDK tree to find the methodHandle class definition you will not find it. The methods you are looking for are actually the methods of the methodOopDesc class (in methodOop.hpp).
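To make the idea concrete, here is a toy sketch of why a handle survives a moving collection when a raw pointer does not. Everything in it (ToyObject, ToyHeap, ToyHandle, collect) is invented for illustration and is not HotSpot's real code; the only point is that the handle goes through a slot the collector knows about, and that operator-> forwards to the wrapped object, which is why the methods you call on a handle are really the methods of the thing it wraps.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy model only: none of these types exist in HotSpot.
struct ToyObject { int value; };

class ToyHeap {
 public:
  // Allocate an object and return a raw pointer to it (a "raw oop").
  ToyObject* allocate(int value) {
    storage_.push_back(std::make_unique<ToyObject>(ToyObject{value}));
    return storage_.back().get();
  }

  // Record a pointer in a slot that the collector will keep up to date.
  std::size_t register_slot(ToyObject* obj) {
    slots_.push_back(obj);
    return slots_.size() - 1;
  }

  ToyObject* slot(std::size_t index) const {
    assert(index < slots_.size());
    return slots_[index];
  }

  // Simulate a moving collection: copy each registered object, poison
  // the old copy so stale raw pointers are obvious, and redirect the slot.
  void collect() {
    for (ToyObject*& slot : slots_) {
      storage_.push_back(std::make_unique<ToyObject>(*slot));
      slot->value = -1;
      slot = storage_.back().get();
    }
  }

 private:
  std::vector<std::unique_ptr<ToyObject>> storage_;  // owns old and new copies
  std::vector<ToyObject*> slots_;                    // what the "GC" can see
};

// A handle stores the slot index, never the object's address.
class ToyHandle {
 public:
  ToyHandle(ToyHeap& heap, ToyObject* obj)
      : heap_(&heap), index_(heap.register_slot(obj)) {}

  // Forward member access to whatever the slot currently points at.
  ToyObject* operator->() const { return heap_->slot(index_); }
  ToyObject* raw() const { return heap_->slot(index_); }

 private:
  ToyHeap* heap_;
  std::size_t index_;
};
```

After a call to collect(), a ToyHandle still reaches the live copy, while a raw pointer taken before the collection is left looking at the poisoned one.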

The other thing we need to understand in that prototype is the mysterious TRAPS at the end. It’s kind of a note to the programmer: functions that trap are functions that can throw Java exceptions. When you call them you use CHECK as their final argument for a convenient exception check:

JavaCalls::call(result, method, args, CHECK);

TRAPS and CHECK are defined in exceptions.hpp. You may wish to avert your eyes:

#define TRAPS   Thread* THREAD
#define CHECK   THREAD); if (HAS_PENDING_EXCEPTION) return; (0

Now we can see how HotSpot handles exceptions: they’re simply stored in the thread. Code that cares can access the exception using these guys:

#define PENDING_EXCEPTION       (((ThreadShadow*)THREAD)->pending_exception())
#define HAS_PENDING_EXCEPTION   (((ThreadShadow*)THREAD)->has_pending_exception())
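
If the macro trick is hard to read, this toy version may help. ToyThread, may_throw and caller are made up for the example, and the macros are simplified copies that work on ToyThread rather than on HotSpot's Thread and ThreadShadow. The toy CHECK also ends in (void)(0 rather than (0, purely to keep compilers from warning about an unused value.

```cpp
#include <cassert>

// Toy stand-in for the thread: it just stores the pending exception.
struct ToyThread {
  const char* pending = nullptr;
  bool has_pending_exception() const { return pending != nullptr; }
};

// Simplified copies of the macros, retargeted at ToyThread.
#define TRAPS                 ToyThread* THREAD
#define HAS_PENDING_EXCEPTION (THREAD->has_pending_exception())
#define CHECK                 THREAD); if (HAS_PENDING_EXCEPTION) return; (void)(0

// A function that "traps": it may leave an exception in the thread.
void may_throw(bool fail, TRAPS) {
  assert(THREAD != nullptr);
  if (fail) THREAD->pending = "java.lang.RuntimeException";
}

// The call below is preprocessed into:
//   may_throw(fail, THREAD); if (HAS_PENDING_EXCEPTION) return; (void)(0);
void caller(bool fail, int* steps, TRAPS) {
  may_throw(fail, CHECK);
  ++*steps;  // only reached when no exception is pending
}
```

Note that CHECK as written only works in functions returning void, because the hidden statement is a bare return.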

Next time I really will explain how method invocation works…

Syndicated 2009-01-27 11:11:18 from gbenson.net

Inside Zero and Shark: Java threads and state transitions

Andrew Haley has been doing some work on Zero and Shark lately, and his questions have made me realise that while Zero and Shark are pretty small in comparison with the rest of HotSpot, they’re not the easiest of things to get a handle on. I decided to write some articles to try and present a kind of overview of it all, to make things easier for others in the future.

HotSpot is the Java Virtual Machine (JVM) of the OpenJDK project. Its name refers to its primary mode of operation, in which Java methods are initially executed by a profiling interpreter, and only after they have been executed a certain number of times are they deemed “hot” enough to be compiled to native code by a Just In Time (JIT) compiler. The aim is to avoid wasting time compiling rarely-used methods, so that each method you do compile is one of those that will improve performance the most.
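
As a rough sketch of the counting idea, and nothing more: the class below counts invocations per method and reports the moment a method crosses a threshold. ToyProfiler, its method names and the threshold value are all assumptions made up for the example; HotSpot's real profiling is considerably more involved.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Toy model: count invocations and flag a method once it becomes "hot".
class ToyProfiler {
 public:
  explicit ToyProfiler(int threshold) : threshold_(threshold) {
    assert(threshold > 0);
  }

  // Record one interpreted invocation. Returns true exactly once per
  // method: on the call that reaches the threshold, which is when the
  // method would be handed to the JIT.
  bool record_invocation(const std::string& method) {
    int count = ++counts_[method];
    return count == threshold_;
  }

  // Has this method been invoked at least threshold times?
  bool is_hot(const std::string& method) const {
    auto it = counts_.find(method);
    return it != counts_.end() && it->second >= threshold_;
  }

 private:
  int threshold_;
  std::unordered_map<std::string, int> counts_;
};
```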

If you look inside a running HotSpot process you’ll see a number of different threads. There will be VM threads that handle such things as garbage collection. There may be one or more compiler threads — these are the JITs. And there will be Java threads. These are the threads that are executing Java code, the threads we are interested in.

At any time, each Java thread will be in one of (essentially) three states. A thread that is _thread_in_Java is executing code that was written in Java, either by interpreting bytecode or by executing native code compiled by the JIT. A thread that is _thread_in_native is executing a native Java method — JNI code. And a thread that is _thread_in_vm is running code that is part of the VM rather than code that is part of the application.

Threads change state all over the place. Imagine you’re in a Java method (you’re _thread_in_Java) and you invoke a native method. That switches you to _thread_in_native. Then, your native code calls some VM function, and suddenly you’re _thread_in_vm. Maybe that VM function calls some Java code? Now you’re back in _thread_in_Java. And as those calls return the transitions happen in reverse.
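
The call chain above can be sketched with a small scope guard that switches state on entry and restores it on exit. This is a toy under stated assumptions: the toy_ names, ToyJavaThread and ToyStateTransition are invented here, the guard is just a convenient way to get "the transitions happen in reverse" for free, and the real transitions do far more work than setting a field.

```cpp
#include <cassert>
#include <vector>

enum ToyThreadState { toy_thread_in_Java, toy_thread_in_native, toy_thread_in_vm };

struct ToyJavaThread {
  ToyThreadState state = toy_thread_in_vm;  // new threads start in VM code
  std::vector<ToyThreadState> history;      // every state entered, in order
  void set_state(ToyThreadState s) { state = s; history.push_back(s); }
};

// Scope guard: enter a state now, go back to the previous one on exit.
class ToyStateTransition {
 public:
  ToyStateTransition(ToyJavaThread& t, ToyThreadState to)
      : thread_(t), saved_(t.state) {
    assert(to != t.state);  // Java calling Java needs no transition at all
    thread_.set_state(to);
  }
  ~ToyStateTransition() { thread_.set_state(saved_); }
  ToyStateTransition(const ToyStateTransition&) = delete;
  ToyStateTransition& operator=(const ToyStateTransition&) = delete;

 private:
  ToyJavaThread& thread_;
  ToyThreadState saved_;
};

// A VM function that calls back into Java code.
void vm_function(ToyJavaThread& t) {
  ToyStateTransition in_vm(t, toy_thread_in_vm);
  {
    ToyStateTransition in_java(t, toy_thread_in_Java);
    // ... Java code would run here ...
  }
}

// A native method that calls a VM function.
void native_method(ToyJavaThread& t) {
  ToyStateTransition in_native(t, toy_thread_in_native);
  vm_function(t);
}

// A Java method that invokes a native method.
void java_method(ToyJavaThread& t) {
  ToyStateTransition in_java(t, toy_thread_in_Java);
  native_method(t);
}
```

Running java_method on a fresh ToyJavaThread records Java, native, vm, Java on the way in, and then vm, native, Java, vm as the calls unwind.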

When hacking on HotSpot you tend to avoid thread state transitions where possible because various things happen during them and some directions are not cheap. The most obvious example of this is that threads remain _thread_in_Java across method calls, such that if one non-native method calls another then no transition occurs. In fact, _thread_in_Java is something of a default state. If you look at the function in Zero that handles calls to native methods (CppInterpreter::native_entry, in cppInterpreter_zero.cpp) you’ll see that the transition to _thread_in_native is the very last thing to happen before the actual call itself, and that transitioning back to _thread_in_Java is the very first thing to happen once the native method returns. And whilst you’re in there, check out what happens during the transition back to _thread_in_Java. The transition from _thread_in_native to _thread_in_Java is one of the expensive ones.

That pretty much covers threads and state transitions. Next time I’ll explain some of how method invocation actually works.

Syndicated 2009-01-26 17:31:23 from gbenson.net

Porting Shark

Shark, when it’s done, will be great, a massive improvement over Zero, but LLVM only supports a couple of the platforms people use Zero on. I’ve wondered a few times how the task of porting LLVM to a new architecture compares with writing a full HotSpot port from scratch. This morning I realised I could get a rough idea by simply counting the lines of x86-specific code, x86 being the one port they share:

                   Lines of code
LLVM 2.4                  34,391
HotSpot 14.0b08           77,329

This is just raw lines of code, nothing clever. Both implement a combined IA-32 and x86_64 port, and the HotSpot figure is for the Linux port with the server JIT — one OS, one JIT — so I believe it’s a fair comparison. You could infer that porting LLVM and using Zero and Shark will get you up and running with OpenJDK in about half the time. That’s not bad.

Syndicated 2009-01-14 11:28:39 from gbenson.net

Fun things to type in gdb

Want to see what the C++ interpreter is up to in gdb?

(gdb) bt
…
#6  0x0f42155c in BytecodeInterpreter::run (istate=0xd0f7e55c) at bytecodeInterpreter.cpp:857
…
(gdb) call PI(0xd0f7e55c)
thread: 0x10108650
bcp: 0xf20efe8b
locals: 0xd0f7e5b4
constants: 0xf20f01f8
method: 0xf20efea8[ javasoft.sqe.tests.vm.jdwp.StackFrame.PopFrames.popframes001a$TestedThreadClass.testedMethod(I)I ]
mdx: 0x00000000
stack: 0xd0f7e558
msg: no_request
result_to_call._callee: 0xf2070188
result_to_call._callee_entry_point: 0xf5e95184
result_to_call._bcp_advance: 3
osr._osr_buf: 0xf2070188
osr._osr_entry: 0xf5e95184
result_return_kind 0xf2070188
prev_link: 0x00000000
native_mirror: 0x00000000
stack_base: 0xd0f7e55c
stack_limit: 0xd0f7e54c
monitor_base: 0xd0f7e55c
self_link: 0xd0f7e55c

Syndicated 2009-01-08 13:22:14 from gbenson.net

Super-dirty jtreg hacking

Today I made my second official patch to OpenJDK. I forgot how to make the jtreg test and had to figure it out all over again, so here’s my quick and dirty guide for the future:

  1. Build jtreg. I use the IcedTea one, because it’s there:
    make jtreg
  2. Make a test root and copy your test into it:
    mkdir -p tests/tests
    touch tests/TEST.ROOT
    mv ~/Test6779290.java tests/tests
    
  3. Run the tests:
    openjdk-ecj/control/build/linux-ppc/j2sdk-image/jre/bin/java -jar test/jtreg.jar -v1 -s tests

In other news it’s over a year since I started hacking on Zero. I was hoping to be able to announce a TCK-passing build before Christmas but that’s not going to happen. Oh well.

Syndicated 2008-12-23 11:18:55 from gbenson.net

Fedora 10

Apparently Fedora 10’s eclipse-ecj doesn’t have gcj-compiled libraries any more. Never mind:

mkdir /usr/lib/gcj/eclipse-ecj
aot-compile -c "-O3" /usr/lib/eclipse/dropins/jdt/plugins /usr/lib/gcj/eclipse-ecj
rebuild-gcj-db

Also, whilst I’m messing with my system, I’ve always had to do the following for ppc64 builds to work:

mkdir -p /usr/lib/jvm/java-gcj/jre/lib/ppc64/server
ln -s /usr/lib64/gcj-4.3.2/libjvm.so /usr/lib/jvm/java-gcj/jre/lib/ppc64/server

I never figured out how anyone else manages without this. Maybe nobody else is trying to build two platforms on the one box.

Syndicated 2008-12-04 11:33:30 from gbenson.net

Not dead

For anyone who’s wondering what I’ve been up to, I’ve taken a bit of a break from Shark these past few weeks in order to concentrate on getting a build of Zero through the JCK.

Syndicated 2008-11-24 15:40:55 from gbenson.net

Update

With talk of a new IcedTea release I thought I’d better commit what I had of Shark ready for it. I found a couple of what look like optimizer failures while testing (usually I build with optimization disabled, for debugging) but I managed to work around those this morning and get a set of DaCapo results:

Benchmark  Status  Detail
antlr      FAIL    too many open files
bloat      pass    83178ms
chart      pass    47227ms
eclipse    FAIL    one method miscompiles, one method won’t compile
fop        pass    15762ms
hsqldb     pass    21190ms
jython     pass    67533ms
luindex    pass    35567ms
lusearch   pass    35633ms
pmd        pass    60637ms
xalan      pass    48422ms

These are still with a non-optimized LLVM, but the numbers are much closer to what I was hoping for than the previous sets.

Syndicated 2008-10-03 14:46:14 from gbenson.net

Quickie

I have a really bad cold, but I fixed the lusearch bug.

Syndicated 2008-09-23 08:51:54 from gbenson.net

The State Decacher and Other Animals

It’s been a while. Here’s where I am:

Benchmark  Status  Detail
antlr      FAIL    too many open files
bloat      pass    699657ms
chart      pass    342527ms
eclipse    FAIL    one method miscompiles, one method won’t compile
fop        pass    35198ms
hsqldb     pass    178011ms
jython     pass    983272ms
luindex    pass    140654ms
lusearch   FAIL    segfault
pmd        pass    456881ms
xalan      pass    148200ms

After implementing deoptimization and the remaining bytecodes I’ve been taking some time to rewrite state cache and decache. When methods are compiled, the local variables and expression stack mostly end up in registers, but when you enter the VM some of the locals and stack slots need to be accessible. Garbage collection can happen when you invoke Java methods or call VM functions, for example, so object pointers need to be visible. The way this works in Shark is that each method allocates a frame on the call stack at entry with enough space to store all its locals and stack slots. At VM entry, whatever slots are needed are written to the frame (“decached”), and on return any object pointers are reloaded (“cached”) in case they changed.
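
A toy picture of that round trip, with invented names throughout (ToySlot, ToyFrame, decache, cache, toy_vm_call) and one simplification: this decache writes every slot, where Shark writes only the ones that are needed. The "VM call" stands in for a collection that moves objects by adjusting the object pointers stored in the frame.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One local or stack slot: a value plus whether it is an object pointer.
struct ToySlot {
  bool is_object;
  std::intptr_t value;
};

// The frame allocated at method entry, with room for every slot.
struct ToyFrame {
  std::vector<ToySlot> slots;
};

// Decache: write the values currently held "in registers" to the frame
// so that the VM can see them. (Simplified: writes everything.)
void decache(const std::vector<ToySlot>& registers, ToyFrame& frame) {
  frame.slots = registers;
}

// Stand-in for a VM call during which objects move: every object
// pointer in the frame is shifted by delta.
void toy_vm_call(ToyFrame& frame, std::intptr_t delta) {
  for (ToySlot& slot : frame.slots) {
    if (slot.is_object) slot.value += delta;
  }
}

// Cache: reload only the object pointers, since only they can change.
void cache(const ToyFrame& frame, std::vector<ToySlot>& registers) {
  assert(frame.slots.size() == registers.size());
  for (std::size_t i = 0; i < registers.size(); ++i) {
    if (registers[i].is_object) registers[i].value = frame.slots[i].value;
  }
}
```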

Unfortunately, over time, the cache and decache functions have become ridiculously overcomplicated. The problem is that there are three types of decache and cache — for a Java call, for a VM call, and for deoptimization — each with its own different rules for exactly what needs writing and rereading. The story ends there for cache, but a decache has three separate functions: it must generate the actual code to write the values, it must tell the garbage collector which slots contain objects, and it must describe the frame to the stack trace code. This is all further complicated by the fact that Shark uses a compressed expression stack, where long and double values take up one slot, whereas HotSpot uses an expanded version where they take up two.
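
The compressed versus expanded difference is easy to show in miniature. The sketch below is an assumption-laden toy: ToyType and the three functions are made up, indexes are counted from the bottom of the stack, and it says nothing about which of the two expanded slots actually holds the value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum ToyType { toy_int, toy_float, toy_object, toy_long, toy_double };

// Slots a value occupies on the expanded stack: long and double take
// two, everything else takes one.
std::size_t expanded_slots(ToyType type) {
  return (type == toy_long || type == toy_double) ? 2 : 1;
}

// Depth of the expanded stack for a compressed stack, where the
// compressed stack has exactly one entry per value.
std::size_t expanded_depth(const std::vector<ToyType>& compressed) {
  std::size_t depth = 0;
  for (ToyType type : compressed) depth += expanded_slots(type);
  return depth;
}

// Expanded index of the first slot of the value at compressed index i.
std::size_t expanded_index(const std::vector<ToyType>& compressed,
                           std::size_t i) {
  assert(i < compressed.size());
  std::size_t index = 0;
  for (std::size_t k = 0; k < i; ++k) index += expanded_slots(compressed[k]);
  return index;
}
```

So a compressed stack holding an int, a long, an object and a double is four entries deep, but six slots deep once expanded.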

I’m pretty sure that the Eclipse failure is a decache failure, and I’m leaning towards the other two being decache failures as well, hence the rewrite. It’s in two parts, the first being to move the interface between the compressed and expanded stacks from the decacher code into the bit that parses the bytecode, and the second being to abstract everything so that cache and decache are using as much of the same code as possible. Currently there is not much sharing between the two, and it’s messy.

The first part is done, and seems pretty stable. The only place where compressed stacks now exist is in the SharkBlock class, and where it was necessary or useful to expose the expanded stack I’ve prefixed the method names with “x”.

The second part is a work in progress…

Syndicated 2008-09-12 11:06:15 from gbenson.net
