Older blog entries for mikal (starting at number 889)

Reflecting on Essex

This post is kind of long, and a little self-indulgent. However, I really wanted to spend some time thinking about what I did for the Essex release cycle, and what I want to do for the Folsom release. I spent Essex mostly hacking on things in isolation, except for when Padraig Brady and I were hacking in a similar space. I'd like to collaborate more for Folsom, and I'm hoping that talking in public about what I'm interested in doing might help with that.

I came relatively late to the Essex development cycle, having never even heard of OpenStack before joining Canonical. We can talk about how I'd worked in the cloud space for six years and yet wasn't aware of the open source implementations at some other time.

My initial introduction to OpenStack was being paged for compute nodes which were continually running out of disk. I googled around a bit and discovered that cached images for instances were never cleaned up. To start an instance, an image is fetched from glance, possibly has its format converted, and is resized, and then the instance is started with the resulting image; none of those images were ever removed afterwards. I filed bug 904532 as my absolute first interaction with the OpenStack community. Scott Moser kindly pointed me at the blueprint for how to actually fix the problem.
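The cleanup problem described above boils down to "remove cached images that no instance uses any more, once they are old enough". The following is only a rough sketch of that idea, not the code that shipped in nova; the function name, its arguments, and the use of a last-used timestamp are all assumptions made for illustration.

```python
def find_stale_images(cached_images, images_in_use, max_age_seconds, now):
    """Return the names of cached images that look safe to remove.

    cached_images: dict mapping a cached image name to the unix timestamp
        at which it was last used (hypothetical bookkeeping).
    images_in_use: set of image names backing instances on this node.
    max_age_seconds: how long an unused image must sit idle before removal.
    now: current unix timestamp, passed in so the logic is easy to test.
    """
    stale = []
    for name, last_used in sorted(cached_images.items()):
        if name in images_in_use:
            # Still backing a running instance, never remove.
            continue
        if now - last_used >= max_age_seconds:
            stale.append(name)
    return stale
```

Keeping a minimum age before removal is a deliberate choice in this sketch: an image that was just fetched may be about to be used by an instance that has not finished starting yet.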

(Remind me, if Phil Day comes to the OpenStack developer summit, that I should sit down with him at some point and see how close what was actually implemented got to what he wrote in that blueprint. I suspect we've still got a fair way to go, but I'll talk more about that later in this post.)

This was a pivotal moment. I'd just spent the last six years writing python code to manage largish cloud clusters, and here was a bug which was hurting me in a python package intended to manage clusters very similar to those I had been running. I should just fix the bug, right?

It turns out that the OpenStack core developers are super easy to work with. I'd say that the code review process certainly feels like it was modelled on Google's, but in general the code reviewers are nicer with their comments than what I'm used to. This makes it much easier to motivate yourself to go and spend some more time hacking than a deeply negative review would. I think Vish is especially worthy of a shout out as being an amazing person to work with. He's helpful, patient, and very smart.

In the end I wrote the image cache manager which ships in Essex. It's not perfect, but it's a lot better than what came before, and it's a good basis to build on. There is some remaining tech debt for image cache management which I intend to work on for Folsom. First off, the image cache only works for libvirt instances at the moment. I'd like to pull all the other hypervisors into line as best as possible. There are hooks in the virtualization driver for this, but as far as I am aware no one has started this work. To be completely honest I'd like to see the image cache manager become common code and have all the hypervisors deal with this in exactly the same manner -- that makes it easier to document, and means that on-call operations people don't need to determine what hypervisor a compute node is running before starting to debug. This is something I very much want to sit down with other nova developers and talk about at the summit.

The next step for image cache management is tracked in a very bare bones blueprint. The original blueprint envisaged that it would be desirable to pre-cache some images on all nodes. For example, a cloud host might want to offer slightly faster startup times for some images by ensuring they are pre-cached. I've been thinking about this a lot, and I can see other use cases here as well. For example, if you have mission critical instances and you wanted to tolerate a glance failure, then perhaps you want to pre-cache a class of images that serve those mission critical instances. The intention is to provide an interface and default implementation for the pre-caching logic, and then let users go wild working out their own requirements.
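To make "an interface and default implementation" a little more concrete, here is a minimal sketch of what such a pluggable policy could look like. Nothing here is from the blueprint: the class names, the `images_to_precache` method, and the `mission_critical` tag are hypothetical, and images are represented as plain dicts rather than anything glance actually returns.

```python
class PrecachePolicy:
    """Hypothetical interface: decide which images a node should pre-cache."""

    def images_to_precache(self, images):
        # Default implementation: pre-cache nothing, so behaviour is
        # unchanged unless an operator plugs in their own policy.
        return []


class TaggedPrecachePolicy(PrecachePolicy):
    """Example policy: pre-cache every image carrying a given tag."""

    def __init__(self, tag="mission_critical"):
        self.tag = tag

    def images_to_precache(self, images):
        # Each image is assumed to be a dict with an "id" and an
        # optional list of "tags".
        return [img["id"] for img in images if self.tag in img.get("tags", ())]
```

A deployment that cares about faster startup for popular images, rather than surviving a glance outage, would supply a different subclass without touching the caller.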

I suspect the hardest bit of the pre-caching will be reducing the interactions with glance. The current feeling is that calling glance from a periodic task is a bit scary, and has been actively avoided for Essex. This is especially true if Keystone is enabled, as the periodic task won't have an admin context unless we pull that from the config file. However, if you're trying to determine what images are mission critical, then you really need to talk to glance. I guess another option would be to have a table of such things in nova's database, but that feels wrong to me. We're going to have to talk about this bit more.

(It would also be interesting to talk about the relative priority of instances. If a cluster is experiencing outages, then perhaps some customers would pay more to have their instances be the last killed off or something. Or perhaps I have instances which are less critical than others, so I want the cluster to degrade in an understood manner.)

That leads logically onto a scheduler change I would like to see. If I have a set of compute nodes I know already have the image for a given instance, shouldn't I prefer to start instances on those nodes instead of fetching the image to yet more compute nodes? In fact, if I already have a correctly resized COW base image for an instance on a given node, then it would make sense to run a new instance on that node as well. We need to be careful here, because you wouldn't want to run all of a given class of instance on a small set of compute nodes, but if the image was something like a default Ubuntu image, then it would make sense. I'd be interested in hearing what other people think of doing something like this.
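One way to picture the trade-off above is as a score per host: a bonus for already having the image cached, and a penalty that grows with the number of instances of that image already on the host, so a popular image does not pile up on a handful of nodes. This is purely an illustrative sketch and not nova's scheduler; the host dict layout, the function name, and both weights are made up for the example.

```python
def rank_hosts(hosts, image_id, cache_bonus=1.0, spread_penalty=0.25):
    """Order candidate hosts from most to least preferred for image_id.

    Each host is assumed to be a dict with:
      "name": host name
      "cached_images": set of image ids already cached on the host
      "instances_by_image": dict of image id -> running instance count
    """
    def score(host):
        total = 0.0
        if image_id in host["cached_images"]:
            # Avoids fetching the image onto yet another node.
            total += cache_bonus
        # Discourage concentrating one class of instance on few hosts.
        total -= spread_penalty * host["instances_by_image"].get(image_id, 0)
        return total

    return sorted(hosts, key=score, reverse=True)
```

With these example weights, a host that has the image cached stops being preferred over an empty host once it is already running more than four instances of that image.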

Another thing I've tried to focus on for Essex is making OpenStack easier for operators to run. That started off relatively simply, by adding an option for log messages to specify what instance a message relates to. This means that when a user queries the state of their instance, the admin can now just grep for the instance UUID, and run from there. It's not perfect yet, in that not all messages use this functionality, but that's some tech debt that I will take on in Folsom. If you're a nova developer, then please pass instance= in your log messages where relevant!

This logging functionality isn't perfect, because if you only have the instance UUID in the method you're writing, it won't work. It expects full instance dicts because of the way the formatting code works. This is kind of ironic in that the default logging format only includes the UUID. In Folsom I'll also extend this code so that the right thing happens with UUIDs as well.
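The "right thing happens with UUIDs as well" idea can be sketched with the standard library's `logging.LoggerAdapter`. This is not nova's logging code: the `InstanceAdapter` class and the `[instance: ...]` prefix format are assumptions for illustration, and only the `instance=` keyword is taken from the text above.

```python
import logging


class InstanceAdapter(logging.LoggerAdapter):
    """Prefix log messages with an instance UUID.

    Accepts instance= as either a full instance dict (with a "uuid" key)
    or a bare UUID string.
    """

    def process(self, msg, kwargs):
        # Remove our custom keyword so it is not passed on to the
        # underlying logger, which would not recognise it.
        instance = kwargs.pop("instance", None)
        if isinstance(instance, dict):
            uuid = instance.get("uuid")
        else:
            uuid = instance
        if uuid:
            msg = "[instance: %s] %s" % (uuid, msg)
        return msg, kwargs


LOG = InstanceAdapter(logging.getLogger("demo"), {})
```

A caller could then write `LOG.info("booted", instance=instance_dict)` or `LOG.info("booted", instance=instance_uuid)` and get the same greppable prefix either way.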

Another simple logging tweak I wrote is that tracebacks now have the time and instance included in them. This makes it much easier for admins to determine the context of a traceback in their logs. It should be noted that both of these changes were relatively trivial, but trivial things can often make things much easier for others.

There are two sessions proposed for the Folsom dev summit about how to make OpenStack easier for operators to run. One is from me, and the other is from Duncan McGreggor. Neither has been accepted yet, but if I notice that Duncan's was accepted I'll drop mine. I'm very interested in what operations staff feel is currently painful, because having something which is easy to scale and manage is vital to adoption. This is also the core of what I did at Google, and I feel I can make a real contribution here.

I know I've come relatively late to the OpenStack party, but there's heaps more to do here and I'm super enthused to be working on code that I can finally show people again.

Tags for this post: openstack canonical essex folsom image_cache_management sre
Related posts: Further adventures with base images in OpenStack; Openstack compute node cleanup; Managing MySQL the Slack Way: How Google Deploys New MySQL Servers; I won a radio shark and headphones!; Conference Wireless not working yet?; Taking over a launch pad project; Off to the MySQL tutorials; Links from Rasmus' PHP talk; MySQL Workbench; Slow git review uploads?; Thoughts on the first day of the MySQL user's conference; MySQL cluster stores in RAM!; Wow, qemu-img is fast; Registered for MySQL User Conference 2006; Are you in a LUG? Do you want some promotional materials for LCA 2013?; Announcement video; linux.conf.au Returns to Canberra in 2013; The next thing; MySQL Users Conference; Call for papers opens soon

Syndicated 2012-04-05 18:19:00 from stillhq.com : Mikal, a geek from Canberra living in Silicon Valley (no blather posts)

Call for papers opens soon

It's time to start thinking about your talk proposals, because the call for papers is only eight weeks away!

For the 2013 conference, the papers committee are going to be focusing on deep technical content, and things we think are going to really matter in the future -- that might range from freedom and privacy, to open source cloud systems, or energy efficient server farms of the future. However, the conference is to a large extent what the speakers make it -- if we receive many excellent submissions on a topic, then it's sure to be represented at the conference.

The papers committee will be headed by the able combination of Michael Davies and Mary Gardiner, who have done an excellent job in previous years. They're currently working through the details of the call for papers announcement. I am telling you this now because I want speakers to have plenty of time to prepare for the submissions process, as I think that will produce the highest quality of submissions.

I also wanted to let you know the organising for linux.conf.au 2013 is progressing well. We're currently in the process of locking in all of our venue arrangements, so we will have some announcements about that soon. We've received our first venue contract to sign, which is for the keynote venue. It's exciting, but at the same time a good reminder that the conference is a big responsibility.

What would you like to see at the conference? I am sure there are things which are topical which I haven't thought of. Blog or tweet your thoughts (include the hashtag #lca2013 please), or email us at contact@lca2013.linux.org.au.

Tags for this post: conference lca2013 cfp canonical
Related posts: Taking over a launch pad project; LCA 2006: CFP closes today; Slow git review uploads?; Further adventures with base images in OpenStack; Wow, qemu-img is fast; Are you in a LUG? Do you want some promotional materials for LCA 2013?; Announcement video; linux.conf.au Returns to Canberra in 2013; The next thing; Openstack compute node cleanup

Syndicated 2012-04-02 20:45:00 from stillhq.com : Mikal, a geek from Canberra living in Silicon Valley (no blather posts)

Memorial service details

This is what will be published in the paper on Wednesday this week:

Robyn Barbara Boland
24 April 1948 - 30 March 2012

Dearly loved and cherished mother of
Catherine and Michael, Emily and Justin
Jonathan and Lynley, and Allister.
Proud Ma of Andrew and Matthew.

Robyn took Jesus' hand and
walked peacefully
into her Heavenly Father's arms.
She was a friend to all who met her.
Robyn will be deeply missed.


A celebration of Robyn's life will be held
at Woden Valley Alliance Church,
81 Namatjira Drive, Waramanga on
Tuesday, 10 April 2012 commencing at 1pm.


Tags for this post: health robyn liver funeral
Related posts: A further update on Robyn's health; RIP Robyn Boland; Doh...; Weekend update; Bigger improvements; Robyn's Health; More on Robyn; Update on Robyn from Catherine; Continued improvement; Small improvements

Syndicated 2012-04-02 00:41:00 from stillhq.com : Mikal, a geek from Canberra living in Silicon Valley (no blather posts)

Update on Robyn from Catherine

I apologize if there are factual inaccuracies in this post. It has been written with the best information I have available at the time.


Cat sent this update out to robyn-discuss last night, but I am reposting it here for those who aren't on the mailing list.

25 Mar 2012 (updated 25 Mar 2012 at 11:08 UTC) »

Weekend update

I apologize if there are factual inaccuracies in this post. It has been written with the best information I have available at the time.


Robyn's blood test results are showing a slight decline in both liver and kidney function. She was awake for slightly longer periods this morning, but was back to being really sleepy this afternoon. Based on her ability to stay awake this morning they were talking about removing her breathing tube.

Robyn is still breathing on her own, but they want the breathing tube to stay in place until she is more conscious and able to be roused. Disappointingly, this afternoon she was back to being mostly unresponsive. Overall her condition is stable but she has a long way to go.

Tags for this post: health robyn liver sydney
Related posts: A further update on Robyn's health; Bigger improvements; Robyn's Health; More on Robyn; Continued improvement; Small improvements; In Sydney!; In Sydney for the day; Planes at 600 meters!; Sydney next week; Getting ready to leave Sydney; What are we doing with the pets?; Slack talk at SLUG; Don't use Jetbus Sydney if you want to catch your flight; Travel details so far; In Sydney; Sydney 1, Mikal 1; Sydney redeems itself, if only a little; Google? Sydney?; On the potentially sorry state of second hand science fiction book stores in Sydney

Syndicated 2012-03-25 02:59:00 (Updated 2012-03-25 11:08:32) from stillhq.com : Mikal, a geek from Canberra living in Silicon Valley (no blather posts)

Continued improvement

I apologize if there are factual inaccuracies in this post. It has been written with the best information I have available at the time.


Here's the next status update. Robyn is awake for brief periods. She is on marginally less medication for her low blood pressure. Overall the ICU nurses say they are happy with how stable she is. It should be reinforced that she's still very ill and has a long way to go, but she has improved marginally.

Tags for this post: health robyn liver sydney
Related posts: A further update on Robyn's health; Bigger improvements; Robyn's Health; More on Robyn; Small improvements; In Sydney!; In Sydney for the day; Planes at 600 meters!; Sydney next week; Getting ready to leave Sydney; What are we doing with the pets?; Slack talk at SLUG; Don't use Jetbus Sydney if you want to catch your flight; Travel details so far; In Sydney; Sydney 1, Mikal 1; Sydney redeems itself, if only a little; Google? Sydney?; On the potentially sorry state of second hand science fiction book stores in Sydney

Syndicated 2012-03-21 17:04:00 from stillhq.com : Mikal, a geek from Canberra living in Silicon Valley (no blather posts)

Bigger improvements

I apologize if there are factual inaccuracies in this post. It has been written with the best information I have available at the time.


Last night Robyn went for a CAT scan. That involved detaching her from dialysis. When she got back from the scan they didn't bother to connect her back up, and her kidneys seem to be coping on their own now. Since this morning she has also been breathing without assistance, and she continues to respond to input, both of which are good.

Next steps are that Robyn needs to increase her blood pressure, her kidney function needs to improve, and her kidneys need to start producing urine again. The kids also need to decide at what point they're going to go back home, which is hard for them at the moment as they're all pretty tired. It seems that discussion will happen tomorrow sometime.

Tags for this post: health robyn liver sydney
Related posts: A further update on Robyn's health; Robyn's Health; More on Robyn; Small improvements; In Sydney!; In Sydney for the day; Planes at 600 meters!; Sydney next week; Getting ready to leave Sydney; What are we doing with the pets?; Slack talk at SLUG; Don't use Jetbus Sydney if you want to catch your flight; Travel details so far; In Sydney; Sydney 1, Mikal 1; Sydney redeems itself, if only a little; Google? Sydney?; On the potentially sorry state of second hand science fiction book stores in Sydney

Syndicated 2012-03-20 19:48:00 from stillhq.com : Mikal, a geek from Canberra living in Silicon Valley (no blather posts)

Small improvements

I apologize if there are factual inaccuracies in this post. It has been written with the best information I have available at the time.


I've just been told that Robyn has opened her eyes briefly and is responding more concretely to input than she was previously. Specifically she is squeezing people's hands in response to questions and moving her head around. This is more communicative than she's been for the last few days, so it seems small but it is still a move in the right direction.

Tags for this post: health robyn liver sydney
Related posts: A further update on Robyn's health; Robyn's Health; More on Robyn; In Sydney!; In Sydney for the day; Planes at 600 meters!; Sydney next week; Getting ready to leave Sydney; What are we doing with the pets?; Slack talk at SLUG; Don't use Jetbus Sydney if you want to catch your flight; Travel details so far; In Sydney; Sydney 1, Mikal 1; Sydney redeems itself, if only a little; Google? Sydney?; On the potentially sorry state of second hand science fiction book stores in Sydney

Syndicated 2012-03-19 17:14:00 from stillhq.com : Mikal, a geek from Canberra living in Silicon Valley (no blather posts)

More on Robyn

I apologize if there are factual inaccuracies in this post. It has been written with the best information I have available at the time.


I have just got off the phone with Robyn's kids at RPA. My understanding is that if Robyn's health doesn't improve they can no longer do a transplant. If she doesn't improve within a couple of days the doctors want to start talking about turning off life support.

Tags for this post: health robyn liver sydney
Related posts: A further update on Robyn's health; Robyn's Health; In Sydney!; In Sydney for the day; Planes at 600 meters!; Sydney next week; Getting ready to leave Sydney; What are we doing with the pets?; Slack talk at SLUG; Don't use Jetbus Sydney if you want to catch your flight; Travel details so far; In Sydney; Sydney 1, Mikal 1; Sydney redeems itself, if only a little; Google? Sydney?; On the potentially sorry state of second hand science fiction book stores in Sydney

Syndicated 2012-03-18 19:44:00 from stillhq.com : Mikal, a geek from Canberra living in Silicon Valley (no blather posts)
