Older blog entries for amits (starting at number 83)

31 Jan 2014 (updated 31 Jan 2014 at 10:13 UTC) »

Use of Piwik Analytics

I run Piwik on OpenShift to collect stats on visits to this blog.  I’m not really interested in knowing who visits my site.  I’m only interested in knowing what people are visiting for, and how: which pages are viewed most? Where are people landing on my site from?  How long after a post is published do people still visit it?  And so on.

This is also helpful for tracking 404 (page not found) errors that visitors hit.  After migrating my previous posts from Blogger, I kept monitoring for any posts the automatic migration process may have missed, and moved them manually.

These days, though, the 404 tracking turns up interesting data.  Someone recently tried to access a page on this blog that resulted in a 404 error:

/oxmax/admin/includes/javascript/ckeditor/filemanager/swfupload/upload.php/From =

A quick search on the net revealed it’s a relatively recent vulnerability discovered in some PHP-based e-commerce suite, which gives root access to the server hosting the software.  Thankfully, I don’t run any e-commerce software, and I also run on OpenShift, which gives the servers quite a bit of protection.  In the worst case, some WordPress vulnerability might affect my blog, but the other software hosted on the same server as this blog will be protected (even in the case of a root exploit).

Syndicated 2014-01-31 08:12:09 (Updated 2014-01-31 09:41:28) from Think. Debate. Innovate.

Backing Up Data on Android Phones

Experimenting with the new CyanogenMod builds for Android 4.3 (cm-10.2) resulted in a disaster: my phone was set up for encryption, and the updater messed up the USB storage such that the phone wouldn’t recognize the built-in sdcard on the Nexus S anymore.  I tried several things (factory reset, formatting via the ClockworkMod recovery, etc.), to no avail.  The recovery wouldn’t recognize the /sdcard partition either.

Good thing I had a backup, so I wasn’t worried too much.

I could use adb while CWM recovery was booted, to navigate around.  Using fdisk, I could see the /sdcard partition was intact, but it wouldn’t get recognized by either CWM or the kernel.  I deleted the partition, and created a new one with the same parameters.  I also used the opportunity to try out ext4 instead of the default FAT.  CWM still wouldn’t recognize or mount this partition, but the Android kernel does recognize it.  However, mounting the card as USB storage still doesn’t work.

So I’ve now fallen back to using adb + rsync as my backup solution: USB-tether the phone to the laptop, note the IP address the laptop got, and then from an adb shell, just issue

rsync -av /sdcard/ user@laptop-ip:/path/to/backup/

This is working fine.  adb push/pull also works quite well, and I don’t really miss the ‘mount as usb storage’ functionality much.  I’ll still try fixing this issue, though, since encryption isn’t working either; the key would be to get CWM recovery to identify the partition.  I’m guessing that if that works, the remaining bits (mounting usb storage, encrypting it, etc.) would be fine too.
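The backup flow above, sketched as a small script.  LAPTOP_IP and BACKUP_DIR are placeholders, and a phone-side rsync binary is assumed (e.g. from a busybox build); the script only prints the commands rather than running them:

```shell
#!/bin/sh
# Sketch of the adb + rsync backup flow described above.
# LAPTOP_IP and BACKUP_DIR are placeholders -- substitute your own.
LAPTOP_IP="192.168.42.10"     # IP the laptop got from usb tethering
BACKUP_DIR="/path/to/backup"
BACKUP_CMD="rsync -av /sdcard/ user@${LAPTOP_IP}:${BACKUP_DIR}/"

echo "adb shell"              # drop into a shell on the phone...
echo "$BACKUP_CMD"            # ...then push the sdcard to the laptop
```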

I use GOBackup from the Play store to back up apps+data.  oandbackup from the F-Droid store looks nice, but crashes a lot.  It’s being constantly updated, though, so it has promise to become a nice backup app.

Syndicated 2014-01-15 17:36:06 (Updated 2014-01-15 17:37:10) from Think. Debate. Innovate.

Red-Whiskered Bulbul

A few weeks back, a strange bird call started waking me up.  Though red-whiskered bulbuls are supposed to be pretty common, I’d not heard them or seen one up close.


There were two of them making rounds throughout the day, and they used to visit one plant in my terrace garden frequently.  I took that as a sign that they were building a nest, and made sure they weren’t disturbed when they visited.  I tried taking a few pictures, but couldn’t manage good ones: these birds are very shy, and they’re very alert to any movement or human presence.  There’s a better photo at Wikipedia.

Exactly one week after they started arriving, they had built a nest and they didn’t make as much noise as earlier.  That made me curious.  I checked out their nest, and was quite delighted to see an egg:

Red whiskered bulbul’s nest with egg

Red whiskered bulbul’s nest with egg — camera flash on.

The next day I woke up to mayhem: lots of bird noises near the plant.  I didn’t feel like disturbing anything there.  When the commotion died down, I went to check, and saw the egg was missing.  That was a sad end to the week-long activity around the nest.  I initially suspected the pigeons, permanent residents on the terrace, of causing the damage.  However, later in the day, a big crow came by near the same plant (crows had never come onto the terrace before).  I wonder what all this means, and where the egg vanished.

Syndicated 2013-05-12 07:41:57 (Updated 2013-05-12 08:05:15) from Think. Debate. Innovate.

25 Jan 2013 (updated 22 May 2013 at 11:16 UTC) »

Session notes from the Virtualization microconf at the 2012 LPC

The Linux Plumbers Conf wiki seems to have made the discussion notes for the 2012 conf read-only, as well as visible only to people who have logged in.  I suspect this is due to the spam problem, but I’ll put those notes here so that they’re available without needing a login.  The source is here.

These are the notes I took during the virtualization microconference at the 2012 Linux Plumbers Conference.

Virtualization Security Discussion – Paul Moore


  • threats to virt system
  • 3 things to worry about
    • attacks from host – has full access to guest
    • attacks from other guests on the host
      • break out from guest, attack host and other guests (esp. in multi-tenant situations)
    • attacks from the network
      • traditional mitigation: separate networks, physical devices, etc.
  • protecting guest against malicious hosts
  • host has full access to guest resources
  • host has ability to modify guest stuff at will; w/o guest knowing it
  • how to solve?
    • no real concrete solutions that are perfect
    • guest needs to be able to verify / attest host state
      • root of trust
    • guests need to be able to protect data when offline
      • (discussion) encrypt guests – internally as well as qcow2 encryption
  • decompose host
    • (discussion) don’t run services as root
  • protect hosts against malicious guests
  • just assume all guests are going to be malicious
  • more than just qemu isolation
  • how?
    • multi-layer security
    • restrict guest access to guest-owned resources
    • h/w passthrough – make sure devices are tied to those guests
    • limit avl. kernel interfaces
      • system calls, netlink, /proc, /sys, etc.
    • if a guest doesn’t need an access, don’t give it!
  • libvirt+svirt
    • MAC in host to provide separation, etc.
    • addresses netlink, /proc, /sys
  • (discussion) aside: how to use libvirt w/o GUI?
    • there is ‘virsh’; documentation could be improved.
  • seccomp
    • allows selectively turning off syscalls; addresses the syscalls in the list above.
  • priv separation
    • libvirt handles n/w, file desc. passing, etc.
  • protecting guest against hostile networks
  • guests vulnerable directly and indirectly
  • direct: buggy apache
  • indirect: host attacked
  • qos issue on loaded systems
  • host and guest firewalls can solve a lot of problems
  • extend guest separation across network
    • network virt – for multi-tenant solutions
    • guest ipsec and vpn services on host
  • (discussion) blue pill vulnerability – how to mitigate?
    • lot of work being done by trusted computing group – TPM
    • maintain a solid root of trust
  • somebody pulling rug beneath you, happens even after boot
  • you’ll need h/w support?
    • yes, TPM
    • UEFI, secure boot
  • what about post-boot security threats?
    • let’s say booted securely. other mechanisms you can enable – IMA – extends root of trust higher. signed hashes, binaries.
    • unfortunately, details beyond scope for a 20-min talk
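On the virsh aside above: a dry-run sketch of the handful of virsh commands that cover most day-to-day libvirt use without a GUI.  “mydomain” is a placeholder domain name, and the commands are printed rather than run since they need a libvirt daemon:

```shell
#!/bin/sh
# Dry-run sketch: common virsh invocations for driving libvirt
# from the command line ("mydomain" is a placeholder).
DOM="mydomain"
for cmd in \
    "virsh list --all" \
    "virsh dominfo ${DOM}" \
    "virsh start ${DOM}" \
    "virsh console ${DOM}" \
    "virsh shutdown ${DOM}"
do
    echo "$cmd"    # print rather than run: these need a libvirt daemon
done
```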

Storage Virtualization for KVM: Putting the pieces together – Bharata Rao


  • Different aspects of storage mgmt with kvm
    • mainly using glusterfs as storage backend
    • integrating with libvirt, vdsm
  • problems
    • multiple choices for fs and virt mgmt
      • libvirt, ovirt, etc.
  • not many fs’es are virt-ready
    • virt features like snapshots, thin-provisioning, cloning not present as part of fs
    • some things done in qemu: snapshot, block mig, img fmt handling are better handled outside
  • storage integration
    • storage device vendor doesn’t have well-defined interfaces
  • gluster has potential: leverage its capabilities, and solve many of these problems.
  • intro on glusterfs
    • userspace distributed fs
    • aggregates storage resources from multiple nodes and presents a unified fs namespace
  • glusterfs features
    • replication, striping, distribution, geo-replication/sync, online volume extension
  • (discussion) why gluster vs ceph?
    • gluster is modular; pluggable, flexible.
    • keeps storage stack clean. only keep those things active which are needed
    • gluster doesn’t have metadata.
      • unfortunately, gluster people not around to answer these questions.
  • by having backend in qemu, qemu can already leverage glusterfs features
    • (discussion) there is a rados spec in qemu already
      • yes, this is one more protocol that qemu will now support
  • glusterfs is modular: details
    • translators: convert requests from users into requests for storage
    • open/read/write calls percolate down the translator stack
      • any plugin can be introduced in the stack
  • current status: enablement work to integrate gluster-qemu
    • start by writing a block driver in qemu to support gluster natively
    • add block device support in gluster itself via block device translator
  • (discussion) do all features of gluster work with these block devices?
    • not yet, early stages. Hope is all features will eventually work.
  • interesting use-case: replace qemu block dev with gluster translators
  • would you have to re-write qcow2?
    • don’t need to, many of qcow2 features already exist in glusterfs common code
  • slide showing perf numbers
  • future
    • is it possible to export LUNs to gluster clients?
    • creating a VM image means creating a LUN
    • exploit per-vm storage offload – all this using a block device translator
    • export LUNs as files; also export files as LUNs.
  • (discussion) why not use raw files directly instead of adding all this overhead? This looks like a perf disaster (ip network, qemu block layer, etc.) – combination of stuff increasing latency, etc.
    • all of this is experimentation, to go where we haven’t yet thought about – explore new opportunities. this is just the initial work; more interesting stuff can be built upon this platform later.
  • libvirt, ovirt, vdsm support for glusterfs added – details in slides
  • (discussion) storage array integration (slide) – question
    • way vendors could integrate san storage into virt stack.
    • we should have capability to use array-assisted features to create lun.
    • from ovirt/vdsm/libvirt/etc.
  • (discussion) we already have this in scsi. why add another layer? why in userspace?
    • difficult, as per current understanding: send commands directly to storage: fast copy from lun-to-lun, etc., not via scsi T10 extensions.
    • these are out-of-band mechanisms, in mgmt path, not data path.
  • why would someone want to do that via python etc.?
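The qemu-side enablement described above surfaced as a gluster:// protocol in QEMU’s block layer (merged around QEMU 1.3).  A dry-run sketch of using an image on a gluster volume; server, volume, and image names are placeholders, and the commands are printed since they need a real gluster deployment:

```shell
#!/bin/sh
# Dry-run sketch: QEMU's native gluster block driver.
# Server, volume, and image names below are placeholders.
SERVER="gluster-server"
VOL="volname"
IMG="vmdisk.img"
URI="gluster://${SERVER}/${VOL}/${IMG}"

# Create an image directly on the gluster volume, then boot from it:
echo "qemu-img create ${URI} 10G"
echo "qemu-system-x86_64 -drive file=${URI},if=virtio"
```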

Next generation interrupt virtualization for KVM – Joerg Roedel


  • presenting new h/w tech today that accelerates guests
  • current state
    • kvm emulates local apic and io-apic
    • all reads/writes intercepted
    • interrupts can be queued from user or kernel
    • ipi costs high
  • h/w support
    • limited
    • tpr is accelerated by using cr8 register
    • only used by 64 bit guests
  • shiny new feature: avic
  • avic is designed to accelerate most common interrupt system features
    • ipi
    • tpr
    • interrupts from assigned devs
  • ideally none of those require intercept anymore
  • avic virtualizes apic for each vcpu
    • uses an apic backing page
    • guest physical apic id table, guest logical apic id table
    • no x2apic in first version
  • guest vapic backing page
    • store local apic contents for one vcpu
    • writes to accelerated registers won’t intercept
    • writes to non-accelerated registers cause intercepts
  • accelerated:
    • tpr
    • EOI
    • ICR low
    • ICR high
  • physical apic id table
    • maps guest physical apic id to host vapic pages
    • (discussion) what if guest cpu is not running
      • will be covered later
  • table maintained by kvm
  • logical apic id table
    • maps guest logical apic ids to guest physical apic ids
      • indexed by guest logical apic id
  • doorbell mechanism
    • used to signal avic interrupts between physical cpus
      • src pcpu figures out physical apic id of the dest.
      • when dest. vcpu is running, it sends doorbell interrupt to physical cpu
  • iommu can also send doorbell messages to pcpus
    • iommu checks if vcpu is running too
    • for not running vcpus, it sends an event log entry
  • imp. for assigned devices
  • msr can also be used to issue doorbell messages by hand – for emulated devices
  • running and not running vcpus
    • doorbell only when vcpu running
  • if target pcpu is not running, sw notified about a new interrupt for this vcpu
  • support in iommu
    • iommu necessary for avic-enabled device pass-through
    • (discussion) kvm has to maintain? enable/disable on sched-in/sched-out
  • support can mostly be implemented in kvm-amd module
    • some outside support in apic emulation
    • some changes to lapic emulation
      • change layout
      • kvm x86 core code will allocate vapic pages
      • (discussion) instead of kvm_vcpu_kick(), just run doorbell
  • vapic page needs to be mapped in nested page table
    • likely requires changes to kvm softmmu code
  • open question wrt device passthrough
    • changes to vfio required
      • ideally fully transparent to userspace
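Whether a given host ended up with this support enabled can be checked from the kvm_amd module parameters.  The “avic” parameter name is an assumption here: it exists on kernels that eventually grew AVIC support, not on kernels of the talk’s era, so verify it on your kernel:

```shell
#!/bin/sh
# Check whether kvm_amd was loaded with AVIC enabled. The "avic"
# module parameter path below is an assumption (present only on
# kernels that later gained this support).
PARAM=/sys/module/kvm_amd/parameters/avic
if [ -r "$PARAM" ]; then
    OUT=$(cat "$PARAM")       # Y/1 means AVIC is in use
else
    OUT="kvm_amd not loaded, or no avic parameter on this kernel"
fi
echo "$OUT"
```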

Reviewing Unused and New Features for Interrupt/APIC Virtualization – Jun Nakajima


  • Intel is going to talk about a similar hardware feature
  • intel: have a bitmap, and they decide whether to exit or not.
  • amd: hardcoded. apic timer counter, for example.
  • q to intel: do you have other things to talk about?
    • yes, coming up later.
  • paper on ‘net showed perf improved from 5-6Gig/s to wire speed, using emulation of this tech.
  • intel have numbers on their slides.
  • they used sr-iov 10gbe; measured vmexit
  • interrupt window: when hypervisor wants to inject interrupt, guest may not be running. hyp. has to enter vm. when guest is ready to receive interrupt, it comes back with vmexit. problem: as you need to inject interrupt, more vmexits, guest becomes busier. so: they wanted to eliminate them.
    • read case: if you have something in advance (apic page), hyp can just point to that instead of this exit dance
    • more than 50% exits are interrupt-related or apic related.
  • new features for interrupt/apic virt
    • reads are redirected to apic page
    • writes: vmexit after write; not intercepted. no need for emulation.
  • virt-interrupt delivery
    • extend tpr virt to other apic registers
    • eoi – no need for vm exits (using new bitmap)
      • this looks different from amd
    • but for eoi behaviour, intel/amd can have common interface.
  • intel/amd comparing their approaches / features / etc.
    • most notably, intel have support for x2apic, not for iommu. amd have support for iommu, not for x2apic.
  • for apic page, approaches mostly similar.
  • virt api can have common infra, but data structures are totally different. intel spec will be avl. in a month or so (update: already available now). amd spec should be avl. in a month too.
  • they can remove interrupt window, meaning 10% optimization for 6 VM case
  • net result
    • eliminate 50% of vmexits
    • optimization of 10% vmexits.
  • intel also supports x2apic h/w.
    • this can hide info from other vcpus
    • secure channel between guest and host; can do whatever hypervisor wants.
    • vcpu executes vmfunc instruction in special thread
  • usecases:
    • allow hvm guests to share pages/info with hypervisor in secure fashion
  • (discussion) why not just add to ept table
  • (discussion) does intel’s int. virt. have an iommu component too?
    • doesn’t want to commit.

CoLo – Coarse-grained Lock-stepping VM for non-stop service – Will Auld


  • non-stop service with VM replication
    • client-server
    • Compare and contrast with Remus – Xen’s solution
      • xen: remus
        • buffers responses until checkpoint to secondary server completes (once per epoch)
        • resumes secondary only on failover
        • failover at anytime
      • xen: colo
        • runs two VMs in parallel comparing their responses, checkpoints only on miscompare
        • resumes after every checkpoint
        • failover at anytime
  • CoLo runs VMs on primary and secondary at same time.
    • both machines respond to requests; they check for similarity. When they agree, one of the responses is sent to the client
  • diff. between two models:
    • remus: assumes machine states have to be the same. This is the reason to buffer responses until the checkpoint has completed.
    • in colo; no such req. only requirement is request stream must be the same.
  • CoLo non-stop service focus on server response, not internal machine state (since multiprocessor environment is inherently nondeterministic)
  • there’s heartbeat, checkpoint
  • colo managers on both machines compare requests.
    • when they’re not same, CoLo does checkpoint.
  • (discussion) why will response be same?
    • int. machine state shouldn’t matter for most responses.
    • some exceptions, like tcp/ip timestamps.
    • minor modifications to tcp/ip stacks
      • coarse grain time stamp
      • highly deterministic ack mechanism
    • even then, state of machine is dissimilar.
  • resume of machine on secondary node:
    • another stumbling block.
  • (slides) graph on optimizations
  • how do you handle disk access? network is easier – n/w stack resumes on failover. if you don’t do failover in a state where you know disk is in a consistent state, you can get corruption.
    • Two solutions
      • For NAS, do same compares as with responses (this can also trigger checkpoints).
      • On local disks, buffer the original state of changed pages, revert to the original, then checkpoint with the primary node’s disk writes included. This is equivalent to how the memory image is updated. (This was not described completely enough during the session.)
  • that sounds dangerous. client may have acked data, etc.
    • will have to look closer at this. (More complete explanation above counters this)
  • how often do you get mismatch?
    • depends on workload. some runs saw 300-400 good packets, then a mismatch.
  • during that, are you vulnerable to failure?
    • no, can failover at any point. internal state doesn’t matter. Both VMs, provide consistent request streams from their initial state and match responses up to the moment of failover.

NUMA – Dario Faggioli, Andrea Arcangeli

NUMA and Virtualization, the case of Xen – Dario Faggioli


  • Intro to NUMA
    • access costs to memory differ, based on which processor accesses it
    • remote mem is slower
  • in context of virt, want to avoid accessing remote memory
  • what we used to have in xen
    • on creation of VM, memory was allocated on all nodes
  • to improve: automatic placement
    • at vm1 creation time, pin vm1 to first node,
    • at vm2 create time, pin vm2 to second node since node1 already has a vm pinned to it
  • then they went a bit further, because pinning was inflexible
    • lots of idle cpus and memories
  • what they will have in xen 4.3
    • node affinity
      • instead of static cpu pinning, preference to run vms on specific cpus
  • perf evaluation
    • specjbb in 3 configs (details in slides)
    • they get 13-17% improvements in 2vcpus in each vm
  • open problems
    • dynamic memory migration
    • io numa
      • take into consideration io devices
    • guest numa
      • if vm bigger than 1 node, should guest be aware?
    • ballooning and sharing
      • sharing could cause remote access
      • ballooning causes local pressures
    • inter-vm dependencies
    • how to actually benchmark and evaluate perf to evaluate if they’re improving
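The static pinning that Xen’s automatic placement improves on can still be done by hand with xl.  A dry-run sketch; “guest1” and the cpu range are placeholders, and the command is printed since it needs a running Xen host:

```shell
#!/bin/sh
# Dry-run sketch: manual vcpu pinning with xl ("guest1" and the
# cpu range are placeholders).
CMD="xl vcpu-pin guest1 all 0-3"   # tie all of guest1's vcpus to pcpus 0-3
echo "$CMD"
```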

AutoNUMA – Andrea Arcangeli


  • solving similar problem for Linux kernel
  • implementation details avail in slides, will skip now
  • components of autonuma design
    • novel approach
      • mem and cpu migration tried earlier using diff. approaches, optimistic about this approach.
    • core design rests on two novel ideas
      • introduce numa hinting pagefaults
        • works at thread-level, on thread locality
      • false sharing / relation detection
  • autonuma logic
    • cpu follows memory
    • memory in b/g slowly follows cpu
    • actual migration is done by knuma_migrated
      • all this is async and b/g, doesn’t stall memory channels
  • benchmarking
    • developed a new benchmark tool, autonuma-benchmark
      • generic to measure alternative approaches too
    • comparing to gravity measurement
    • put all memory in single node
      • then drop pinning
      • then see how memory spreads by autonuma logic
  • see slides for graphics on convergence
  • perf numbers
    • also includes comparison with alternative approach, sched numa.
    • graphs show autonuma is better than schednuma, which is better than vanilla kernel
  • criticism
    • complex
    • takes more memory
      • takes 12 bytes per page, Andrea thinks it’s reasonable.
      • it’s done to try to decrease risk of hitting slowdowns (is faster than vanilla already)
    • stddev shows autonuma is pretty deterministic
  • why is autonuma so important?
    • even 2 sockets show differences and improvements.
    • but 8 nodes really shows autonuma shines


  • looks like andrea focussing on 2 nodes / sockets, not more? looks like it will have bad impact on bigger nodes
    • points to graph showing 8 nodes
    • on big nodes, distance is more.
    • agrees autonuma doesn’t worry about distances
    • currently worries only about convergence
    • distance will be taken as future optimisation
    • same for Xen case
      • access to 2 node h/w is easier
    • as Andrea also mentioned, improvement on 2 node is lower bound; so improvements on bigger ones should be bigger too; don’t expect to be worse
  • not all apps just compute; they do io, and they migrate to the cpu where the device is.
    • are we moving memory to cpu, or cpu to device, etc… what should the heuristic be?
      • 1st step should be to get cpu and mem right – they matter the most.
      • doing for kvm is great since it’s in linux, and everyone gets the benefit.
      • later, we might want to take other tunables into account.
    • crazy things in enterprise world, like storage
    • for high-perf networking, use tight binding, and then autonuma will not interfere.
      • this already works.
    • xen case is similar
      • this is also something that’s going to be workload-dependent, so custom configs/admin is needed.
  • did you have a chance to test on AMD Magny-Cours (many more nodes)
    • hasn’t tried autonuma on specific h/w
    • more nodes, better perf, since upstream is that much worse.
    • xen
      • he did, and at least placement was better.
      • more benchmarking is planned.
  • suggestion: do you have a way to measure imbalance / number of accesses going to correct node
    • to see if it’s moving towards convergence, or not moving towards convergence, maybe via tracepoints
    • essentially to analyse what the system is doing.
    • exposing this data so it can be analysed.
  • using printks right now for development; there’s a lot of info, all the info you need to see why the algo is doing what it’s doing.
  • good to have in production so that admins can see
    • what autonuma is doing
    • how much is it converging
      • to decide to make it more aggressive, etc.
  • overall, all such stats can be easily exported; it’s already avl. via printk, but has to be moved to something more structured and standard.
  • xen case is same; trying to see how they can use perf counters, etc. for statistical review of what is going on, but not precise enough
    • tells how many remote memory accesses are happening, but not from where and to where
    • something more in s/w is needed to enable this information.
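The “tight binding” case mentioned above (high-performance networking that autonuma should not touch) is typically done with numactl.  A dry-run sketch; the node number and guest command line are placeholders:

```shell
#!/bin/sh
# Dry-run sketch: pin a VM's cpus and memory to one NUMA node with
# numactl so automatic balancing leaves it alone (node number and
# guest command line are placeholders).
NODE=0
CMD="numactl --cpunodebind=${NODE} --membind=${NODE} qemu-system-x86_64 -m 4096 guest.img"
echo "$CMD"
```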

One balloon for all – towards unified balloon driver – Daniel Kiper


  • wants to integrate various balloon drivers avl. in Linux
  • currently 3 separate drivers
    • virtio
    • xen
    • vmware
  • despite impl. differences, their core is similar
    • feature difference in drivers (xen has selfballooning)
    • overall lots of duplicate code
  • do we have an example of a good solution?
    • yes, generic mem hotplug code
    • core functionality is h/w independent
    • arch-specific parts are minimal, most is generic
  • solution proposed
    • core should be hypervisor-independent
    • should co-operate on h/w independent level – e.g. mem hotplug, tmem, movable pages to reduce fragmentation
    • selfballooning ready
    • support for hugepages
    • standard api and abi if possible
    • arch-specific parts should communicate with underlying hypervisor and h/w if needed
  • crazy idea
    • replace ballooning with mem hot-unplug support
    • however, ballooning operates on single pages whereas hotplug/unplug works on groups of pages that are arch-dependent.
      • not flexible at all
      • have to use userspace interfaces
        • can be done via udev scripts, which is a better way
  • discussion: does acpi hotplug work seamlessly?
    • on x86 baremetal, hotplug works like this:
      • hotplug mem
      • acpi signals to kernel
      • acpi adds to mem space
      • this is not visible to processes directly
      • has to be enabled via sysfs interfaces, by writing ‘enable command’ to every section that has to be hotplugged
  • is selfballooning desirable?
    • kvm isn’t looking at it
    • guest wants to keep mem to itself, it has no interest in making host run faster
    • you paid for mem, but why not use all of it
    • if there’s a tradeoff for the guest: you pay less, you get more mem later, etc., guests could be interested.
    • essentially, what is guest admin’s incentive to give up precious RAM to host?
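The sysfs onlining step from the acpi hotplug discussion above looks roughly like this.  The sketch only reads current section states and prints the (root-only) online command rather than running it:

```shell
#!/bin/sh
# Sketch: hotplugged memory sections must be onlined explicitly via
# sysfs before processes can use them. This only reads current states;
# the actual onlining (printed last) needs root and real hotplug.
for S in /sys/devices/system/memory/memory*/state; do
    [ -r "$S" ] && printf '%s: %s\n' "$S" "$(cat "$S")"
done
HOWTO="echo online > /sys/devices/system/memory/memoryN/state"
echo "to online section N: $HOWTO"
```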

ARM – Marc Zyngier, Stefano Stabellini

KVM ARM – Marc Zyngier


  • ARM architecture virtualization extensions
    • recent introduction in arch
    • new hypervisor mode PL2
    • traditionally secure state and non-secure state
    • Hyp mode is in non-secure side
  • higher privilege than kernel mode
  • adds second stage translation; adds extra level of indirection between guests and physical mem
    • tlbs are tagged by VMID (like EPT/NPT)
  • ability to trap accesses to most system registers
  • can handle irqs, fiqs, async aborts
    • e.g. guest doesn’t see interrupts firing
  • hyp mode: not a superset of SVC
    • has its own pagetables
    • only stage 1, not 2
    • follows LPAE, the new large physical address extensions.
    • one translation table register
      • so difficult to run Linux directly in Hyp mode
      • therefore they use Hyp mode to switch between host and guest modes (unlike x86)
    • uses HYP mode to context switch from host to guest and back
    • exits guest on physical interrupt firing
    • access to a few privileged system registers
    • WFI (wait for interrupt)
      • (discussion) WFI is trapped and then we exit to host
    • etc.
    • on guest exit, control restored to host
    • no nesting; arch isn’t ready for that.
  • MM
    • host in charge of all MM
    • has no stage2 translation itself (saves tlb entries)
    • guests are in total control of page tables
    • becomes easy to map a real device into the guest physical space
    • for emulated devices, accesses fault, generates exit, and then host takes over
    • 4k pages only
  • instruction emulation
    • trap on mmio
    • most instructions described in HSR
    • added complexity due to having to handle multiple ISAs (ARM, Thumb)
  • interrupt handling
    • redirect all interrupts to hyp mode only while running a guest. This only affects physical interrupts.
    • leave it pending and return to host
    • pending int will kick in when returns to guest mode?
      • No, it will be handled in host mode. Basically, we use the redirection to HYP mode to exit the guest, but keep the handling on the host.
  • inject two ways
    • manipulating arch. pins in the guest?
      • The architecture defines virtual interrupt pins that can be manipulated (VI→I, VF→F, VA→A). The host can manipulate these pins to inject interrupts or faults into the guest.
  • using virtual GIC extensions,
  • booting protocol
    • if you boot in HYP mode, and if you enter a non-kvm kernel, it gracefully goes back to SVC.
    • if kvm-enabled kernel is attempted to boot into, automatically goes into HYP mode
    • If a kvm-enabled kernel is booted in HYP mode, it installs a HYP stub and goes back to SVC. The only goal of this stub is to provide a hook for KVM (or another hypervisor) to install itself.
  • current status
    • pending: stable userspace ABI
    • pending: upstreaming
      • stuck on reviewing

Xen ARM – Stefano Stabellini


  • Why?
    • arm servers
    • smartphones
    • 288 cores in a 4U rack – causes a serious maintenance headache
  • challenges
    • traditional way: port xen, and port hypercall interface to arm
    • from Linux side, using PVOPS to modify setpte, etc., is difficult
  • then, armv7 came.
  • design goals
    • exploit h/w as much as possible
    • limit to one type of guest
      • (x86: pv, hvm)
      • no pvops, but pv interfaces for IO
    • no qemu
      • lots of code, complicated
    • no compat code
      • 32-bit, 64-bit, etc., complicated
    • no shadow pagetables
      • most difficult code to read ever
  • NO emulation at all!
  • one type of guest
    • like pv guests
      • boot from a user supplied kernel
      • no emulated devices
      • use PV interfaces for IO
    • like hvm guests
      • exploit nested paging
      • same entry point on native and xen
      • use device tree to discover xen presence
      • simple device emulation can be done in xen
        • no need for qemu
  • exploit h/w
    • running xen in hyp mode
    • no pv mmu
    • hypercall
    • generic timer
      • export timer int. to guest
  • GIC: general interrupt controller
    • int. controller with virt support
    • use GIC to inject event notifications into any guest domains with Xen support
      • x86 taught us this provides a great perf boost (event notifications on multiple vcpus simultaneously)
      • on x86, they had a pci device to inject interrupts to guest at regular intervals (on x86 we had a pci device to inject event notifications as legacy interrupt)
  • hypercall calling convention
    • hvc (hypercall)
    • pass params on registers
    • hvc takes an argument: 0xEA1 – means it’s a xen hypercall.
  • 64-bit ready abi (another lesson from x86)
    • no compat code in xen
      • 2600 fewer lines of code
  • had to write a 1500-line patch of mechanical substitutions to make a 32-bit host run all guests fine
  • status
    • xen and dom0 boot
    • vm creation and destruction work
    • pv console, disk, network work
    • xen hypervisor patches almost entirely upstream
    • linux side patches should go in next merge window
  • open issues
    • acpi
      • will have to add acpi parsers, etc. in device table
      • linux has 110,000 lines – should all be merged
  • uefi
    • grub2 on arm: multiboot2
    • need to virtualise runtime services
    • so only hypervisor can use them now
  • client devices
    • lack ref arch
    • difficult to support all tablets, etc. in market
    • uefi secure boot (is required by win8)
    • windows 8


  • who’s responsible for device tree management for xen?
    • xen takes the dt from hardware, changes it for memory management, then passes it to dom0
    • at present, the dt binary has to be built manually
  • at the moment, linux kernel infrastructure doesn’t support interrupt priorities.
    • needed to prevent a guest doing a DoS on host by just generating interrupts non-stop
    • xen does support int. priorities in GIC

VFIO – Are we there yet? – Alex Williamson


  • are we there yet? almost
  • what is vfio?
    • virtual function io
    • not sr-iov specific
    • userspace driver interface
      • kvm/qemu vm is a userspace driver
    • iommu required
      • visibility issue with devices in iommu, guaranteeing devices are isolated and safe to use – different from uio.
    • config space access is done from kernel
      • adds to safety requirement – can’t have userspace doing bad things on host
  • what’s different from last year?
    • 1st proposal shot down last year, and got revised at last LPC
    • allow IOMMU driver to define device visibility – not per-device, but the whole group exposed
    • more modular
  • what’s different from pci device assignment
    • x86 only
    • kvm only
    • no iommu grouping
    • relies on pci-sysfs
    • turns kvm into a device driver
  • current status
    • core pci and iommu drivers in 3.6
    • qemu will be pushed for 1.3
  • what’s next?
    • qemu integration
    • legacy pci interrupts
      • more of a qemu-kvm problem, since vfio already supports this, but these are unique since they’re level-triggered; host has to mask interrupt so it doesn’t cause a DoS till guest acks interrupt
        • like to bypass qemu directly – irqfd for edge-triggered. now exposing irqfd for level
  • (lots of discussion here)
  • libvirt support
    • iommu grps changed the way we do device assignment
    • sysfs entry point; move device to vfio driver
    • do you pass group by file descriptor?
    • lots of discussion on how to do this
    • existing method needs name for access to /sys
    • how can we pass file descriptors from libvirt for groups and containers to work in different security models?
      • The difficulty is in how qemu assembles the groups and containers. On the qemu command line, we specify an individual device, but that device lives in a group, which is the unit of ownership in vfio and may or may not be connectable to other containers. We need to figure out the details here.
  • POWER support
    • already adding
  • PowerPC
    • freescale looking at it
    • one api for x86, ppc was strange
  • error reporting
    • better ability to inject AER etc to guest
    • maybe another ioctl interrupt
    • What are we going to be able to do if we do get PCIe AER errors to show up at a device, what is the guest going to be able to do (for instance can it reset links).
      • We’re going to have to figure this out and it will factor into how much of the AER registers on the device do we expose and allow the guest to control. Perhaps not all errors are guest serviceable and we’ll need to figure out how to manage those.
  • better page pinning and mapping
    • gup issues with ksm running in b/g
  • PRI support
  • graphics support
    • issues with legacy io port space and mmio
    • can be handled better with vfio
  • NUMA hinting
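The sysfs flow touched on in the libvirt discussion (releasing a device from its host driver and handing it to vfio-pci) looks roughly like this. This is only a sketch: the PCI address and vendor:device ID below are hypothetical, and the commands need root, a VFIO-enabled kernel, and an IOMMU turned on.

```shell
# Hypothetical device: adjust the address and IDs to a real one.
DEV=0000:06:0d.0
modprobe vfio-pci
# Release the device from whatever host driver currently owns it.
if [ -e /sys/bus/pci/devices/$DEV/driver ]; then
    echo "$DEV" > /sys/bus/pci/devices/$DEV/driver/unbind
fi
# Ask vfio-pci to claim devices with this vendor:device ID.
echo 8086 10ca > /sys/bus/pci/drivers/vfio-pci/new_id
# The IOMMU group character devices that userspace (e.g. qemu) opens:
ls /dev/vfio/
```

Note that the whole IOMMU group, not just the one device, has to be handed over, which is exactly the ownership question being debated above.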

Semiassignment: best of both worlds – Alex Graf


  • b/g on device assignment
  • best of both worlds
    • assigned device during normal operation
    • emulated during migration
  • xen solution – prototype
    • create a bridge in domU
    • guest sees a pv device and a real device
    • guest changes needed for bridge
    • migration is guest-visible, since real device goes away and comes back (hotplug)
      • security issue if VM doesn’t ack hot-unplug
  • vmware way
    • writing a new driver for each n/w device they want to support
    • this new driver calls into vmxnet
    • binary blob is mapped into your address space
    • migration is guest exposed
      • new blob needed for destination n/w card
  • alex way
    • emulate real device in qemu
    • e.g. expose emulated igbvf if passing through igbvf
    • need to write migration code for each adapter as well
  • demo
    • doesn’t quite work right now
  • is it a good idea?
  • how much effort really?
    • doesn’t think it’s much effort
    • current choices in datacenters are igbvf and <something else>
      • that’s not true!
      • easily a dozen adapters avl. now
      • lots of examples given why this claim isn’t true
        • no one needs single-vendor/card dependency in an entire datacenter
  • non-deterministic network performance
  • more complicated network configuration
  • discussion
    • Another solution suggested by Benjamin Herrenschmidt: use s3; remove ‘live’ from ‘live migration’.
    • AER approach
  • General consensus was to just do bonding+failover

KVM performance: vhost scalability – John Fastabend


  • current situation: one kernel thread per vhost
  • if we create a lot of VMs and a lot of virtio-net devices, perf doesn’t scale
  • not numa aware
  • Main grouse is it doesn’t scale.
  • instead of having a thread of every vhost device, create a vhost thread per cpu
  • add some numa-awareness scheduling – pick best cpu based on load
  • perf graphs
    • for 1 VM, as the number of netperf instances increases, per-cpu-vhost doesn’t shine.
    • another tweak: use 2 threads per cpu: perf is better
  • for 4 VMs, results are good for 1-thread, much better than 2-thread (2-thread does worse than current). With 4 VMs, per-cpu-vhost was nearly equivalent to baseline.
  • on 12 VMs, 1-thread works better, and 2-thread works better than baseline. Per-cpu vhosts shine here, outperforming both the baseline and the 1-thread/2-thread cases.
  • tried tcp, udp, inter-guest, all netperf tests, etc.
    • this result is pretty standard for all the tests they’ve done.
  • RFC
    • should they continue?
    • strong objections?
  • discussion
    • were you testing with raw qemu or libvirt?
      • as libvirt creates its own cgroups, and that may interfere.
    • pinning vs non-pinning
      • gives similar results
  • no objections!
  • in a cgroup – roundrobin the vhost threads – interesting case to check with pinning as well.
  • transmit and receive interfere with each other – so perf improvement was seen when they pinned transmit side.
  • try this on bare-metal.
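For reference, runs like the ones graphed above can be driven with netperf along these lines. This is a sketch, not the presenters’ actual harness: the guest IP is hypothetical, and netserver must already be running on the target.

```shell
# Run increasing numbers of parallel TCP_STREAM instances against a
# guest at a hypothetical address; netserver runs inside the guest.
GUEST=192.168.122.10
for n in 1 2 4 8; do
    for i in $(seq $n); do
        netperf -H $GUEST -t TCP_STREAM -l 30 &
    done
    wait
done
```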

Network overlays – Vivek Kashyap

  • want to migrate machines from one place to another in a data center
    • don’t want physical limitations (programming switches, routers, mac addr, etc)
  • idea is to define a set of tunnels which are overlaid on top of networks
    • vms migrate within tunnels, completely isolated from physical networks
  • Scaling at layer 2 is limited by the need to support broadcast/multicast over the network
  • overlay networks
    • when migrating across domains (subnets), have to re-number IP addresses
      • when migrating need to migrate IP and MAC addresses
      • When migrating across subnets might need to re-number or find another mechanism
    • solution is to have a set of tunnels
    • every end-user can view their domain/tunnel as a single virtual network
      • they only see their own traffic, no one else can see their traffic.
  • standardization is required
    • being worked on at IETF
    • MTU seen as VM is not same as what is on the physical network (because headers added by extra layers)
    • vxlan adds udp headers
    • one option is to have large(er) physical MTU so it takes care of this otherwise there will be fragmentation
      • Proposal
        • If guest does pathMTU discovery let tunnel end point return the ICMP error to reduce the guest’s view of the MTU.
        • Even if the guest has not set the DF (don’t fragment) bit, return an ICMP error. The guest will handle the ICMP error and update its view of the MTU on the route.
        • having the hypervisor to co-operate so guests do a path MTU discovery and things work fine
          • no guest changes needed, only hypervisor needs small change
  • (discussion) Cannot assume much about guests; guests may not handle ICMP.
  • Some way to avoid flooding
    • extend to support an ‘address resolution module’
    • Stephen Hemminger supported the proposal
  • Fragmentation
    • can’t assume much about guests; they may not like packets getting fragmented if they set DF
    • fragmentation highly likely since new headers are added
      • The above comment is inaccurate: if DF is set, path MTU discovery happens and the packet won’t be fragmented. Also, any fragmentation that does occur is on the tunnel; the VMs don’t see it, but fragmenting and reassembling at the end points is not performant.
      • Instead, the proposal is to use path MTU discovery to make the VMs send packets that won’t need to be fragmented.
  • PXE, etc., can be broken
  • Distributed Overlay Ethernet Network
    • DOVE module for tunneling support
      • use 24-bit VNI
  • patches should be coming to netdev soon enough.
  • possibly using checksum offload infrastructure for tunneling
  • question: layer 2 vs layer 3
    • There is interest in the industry to support overlay solutions for layer 2 and layer 3.
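As a concrete illustration of the encapsulation overhead discussed above, here is a minimal VXLAN endpoint set up with iproute2 (device names and addresses are hypothetical; needs root and a vxlan-capable kernel). The extra VXLAN/UDP/IP headers are why the guest-visible MTU has to shrink, e.g. 1500 - 50 = 1450 on an IPv4 underlay:

```shell
# 24-bit VNI (here 42), a multicast group for flooding unknown
# destinations, and the IANA-assigned VXLAN UDP port 4789.
ip link add vxlan0 type vxlan id 42 group 239.1.1.1 dev eth0 dstport 4789
# Lower the MTU to leave room for the outer headers.
ip link set vxlan0 mtu 1450 up
ip addr add 10.0.0.1/24 dev vxlan0
```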

Lightning talks

QEMU disaggregation – Stefano Stabellini


  • dom0 is a privileged VM
  • better model is to split dom0 into multiple service VMs
    • disk domain, n/w domain, everything else
      • no bottleneck, better security, simpler
  • hvm domain needs device model (qemu)
  • wouldn’t it be nice if one qemu does only disk emulation
    • second does network emulation
    • etc.
  • to do this, they moved pci decoder in xen
    • traps on all pci requests
    • hypervisor de-multiplexes to the ‘right’ qemu
  • open issues
    • need flexibility in qemu to start w/o devices
    • modern qemu better
      • but: always uses PCI host bridge, PIIX3, etc.
    • one qemu uses this all, others have functionality, but don’t use it
  • multifunction devices
    • PIIX3
  • internal dependencies
    • one component pulls others
      • vnc pulls keyboard, which pulls usb, etc.
  • it is in demoable state

Xenner — Alex Graf

  • intro
    • guest kernel module that allows a xen pv kernel to run on top of kvm – messages to xenbus go to qemu
  • is anyone else interested in this at all?
    • xen folks last year did show interest for a migration path to get rid of pv code.
    • xen is still interested, but not in short time. – few years.
    • do you guys want to work together and get it rolling?
      • no one commits to anything right now


Syndicated 2013-01-25 11:36:01 (Updated 2013-05-22 10:22:52) from Think. Debate. Innovate. - Amit Shah's blog

10 Jan 2013 (updated 22 May 2013 at 11:16 UTC) »

About Random Numbers and Virtual Machines

Several applications need random numbers for correct and secure operation.  When ssh-server gets installed on a system, public and private key pairs are generated.  Random numbers are needed for this operation.  Same with creating a GPG key pair.  Initial TCP sequence numbers are randomized.  Process PIDs are randomized.  Without such randomization, we’d get a predictable set of TCP sequence numbers or PIDs, making it easy for attackers to break into servers or desktops.


On a system without any special hardware, Linux seeds its entropy pool from sources like keyboard and mouse input, disk IO, network IO, and any other sources whose kernel modules indicate they are capable of adding to the kernel’s entropy pool (i.e. the interrupts they receive are from sufficiently non-deterministic sources).  For servers, keyboard and mouse inputs are rare (most don’t even have a keyboard / mouse connected).  This makes getting true random numbers difficult: applications requesting random numbers from /dev/random have to wait for indefinite periods to get the randomness they desire (like creating ssh keys, typically during firstboot).


Applications that need random numbers instantaneously, but can make do with slightly lower-quality randomness, have the option of getting it from /dev/urandom, which doesn’t block to serve random numbers — it’s just not guaranteed that the numbers one receives from /dev/urandom truly reflect pure randomness.  Indiscriminate reading of /dev/urandom will reduce the system’s entropy levels, and will starve applications that need true random numbers.  Random numbers in a system are a scarce resource, so applications should only fetch them when they are needed, and only read as many bytes as needed.
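Both behaviours are easy to observe from the shell: this reads a small amount from /dev/urandom (never more than needed) and shows the kernel’s current entropy estimate (both paths are standard on Linux):

```shell
# 16 bytes is plenty for most seeding purposes; read no more than that.
head -c 16 /dev/urandom | od -An -tx1
# The kernel's current estimate of available entropy, in bits.
cat /proc/sys/kernel/random/entropy_avail
```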


There are a few random number generator devices that can be plugged into computers.  These can be PCI or USB devices, and are fairly popular add-ons on servers.  The Linux kernel has a hwrng (hardware random number generator) abstraction layer to select an active hwrng device among several that might be present, and ask the device to give random data when the kernel’s entropy pool falls below the low watermark.  The rng-tools package comes with rngd, a daemon, that reads input from hwrngs and feeds them into the kernel’s entropy pool.
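On a machine with such a device, the hwrng layer’s sysfs files show which generator is active, and rngd bridges it into the entropy pool. The paths below are the standard hwrng ones; the rngd invocation assumes rng-tools is installed and needs root:

```shell
# Which hwrng devices the kernel found, and which one is active:
cat /sys/class/misc/hw_random/rng_available
cat /sys/class/misc/hw_random/rng_current
# Feed the active device into the kernel's entropy pool:
rngd -r /dev/hwrng
```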


Virtual machines are similar to server setups: there is very little going on in a VM’s environment for the guest kernel to source random data.  A server that hosts several VMs may still have a lot of disk and network IO happening as a result of all the VMs it hosts, but a single VM may not itself be generating enough entropy for its applications.  One solution, therefore, to sourcing random numbers in VMs is to ask the host for a portion of the randomness it has collected, and feed it into the guest’s entropy pool.  A paravirtualized hardware random number generator exists for KVM VMs.  The device is called virtio-rng, and as the name suggests, the device sits on top of the virtio PV framework.  The Linux kernel gained support for virtio-rng devices in kernel 2.6.26 (released in 2008).  The QEMU-side device was added in the recent 1.3 release.


On the host side, the virtio-rng device (by default) reads from the host’s /dev/random and feeds that into the guest.  The source of this data can be modified, of course.  If the host lacks any hwrng, /dev/random is the best source to use.  If the host itself has a hwrng, using input from that device is recommended.
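A minimal sketch of wiring this up on the QEMU command line (QEMU 1.3 or newer; the disk image name is hypothetical, and the backend file is spelled out even though /dev/random is the default):

```shell
qemu-system-x86_64 -m 1024 \
    -object rng-random,id=rng0,filename=/dev/random \
    -device virtio-rng-pci,rng=rng0 \
    guest-disk.img
```

Pointing the rng0 backend at a different file is how a host hwrng can be used as the source instead.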


Newer Intel architectures (IvyBridge onwards) have an instruction, RDRAND, that provides random numbers.  This instruction can be directly exposed to guests.  Guests probe for the presence of this instruction (using CPUID) and use it if available.  This doesn’t need any modification to the guest.  However, there’s one drawback to exposing this instruction to guests: live migration.  If not all hosts in a server farm have the same CPU, live-migrating a guest from one host that exposes this instruction to another that doesn’t, will not work.  In this case, virtio-rng in the host can be configured to use RDRAND as its source, and the guest can continue to work as in the previous example.  This is still sub-optimal, as we’ll be passing random numbers to the guest (as in the case of /dev/random), instead of real entropy.  The RDSEED instruction, to be introduced later (Broadwell onwards) will provide entropy that can be safely passed on to a guest via virtio-rng as a source of true random entropy, eliminating the need to have a physical hardware random number generator device.
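Whether a guest actually sees the instruction is easy to verify from inside it; on Linux, CPUID features show up as flags in /proc/cpuinfo:

```shell
# RDRAND is advertised as the "rdrand" CPU flag on Linux.
if grep -qw rdrand /proc/cpuinfo; then
    echo "RDRAND available"
else
    echo "RDRAND not advertised"
fi
```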


It looks like QEMU/KVM is the only hypervisor that supports exposing a hardware random number generator to guests.  (One could pass through a real hwrng to a guest, but that doesn’t scale and isn’t practical for all situations — e.g. live migration.)  Fedora 19 will have QEMU 1.4, which has the virtio-rng device, and even older guests running on top of F19 will be able to use the device.


For more information on virtio-rng, see the QEMU feature page, and the Fedora feature page.  LWN.net has an excellent article on random numbers, based on H. Peter Anvin’s talk at LinuxCon EU 2012.


Updated 2013 May 22: Added info about RDSEED and the Fedora feature page, corrected few typos.

Syndicated 2013-01-10 20:57:30 (Updated 2013-05-22 10:17:40) from Think. Debate. Innovate. - Amit Shah's blog

9 Jan 2013 (updated 22 May 2013 at 11:16 UTC) »

Workarounds for common F18 bugs

I’ve been using the Fedora 18 pre-release for a couple of months now, and am generally happy with how it works.  I filed quite a few bugs, some got resolved, some not.  Here’s a list of things that don’t work as they used to in the past, with workarounds so they may help others:

  • Bug 878619Laptop always suspends on lid close, regardless of g-s-t policy: I used to set the action on laptop lid close to lock the screen by default, instead of putting it in the suspend state.  I used to use the function keys or menu item to suspend earlier.  However, with GNOME 3.6 in F18, the ‘suspend’ menu item has gone away, replaced by ‘Power Off’.  The developers have now removed the dconf settings to tweak the action of lid close (via gnome-tweak-tool or dconf-editor).  As described in GNOME Bug 687277, this setting can be tweaked by adding a systemd inhibitor:
    systemd-inhibit --what=handle-lid-switch \
                    --who=me \
                    --why=because \
                    --mode=block /bin/sh
  • Bug 887218 – 0.5.0-1 regression: 147e:2016 Upek fingerprint reader no longer works: fprintd may not remember the older registered fingerprints, re-registering them is a workaround.
  • Bug 878412Cannot assign shortcuts to switch to workspaces 5+: I use keyboard shortcuts (Ctrl+F<n>) to switch workspaces.  Till F16, I could assign shortcuts to as many workspaces as are currently in use.  Curiously, with F18, shortcuts can only be assigned to workspaces 1 through 4.  This was a major productivity blocker for me, and an ugly workaround is to create a shell script that switches workspaces via window manager commands: install ‘wmctrl’, and create custom shortcuts to switch workspaces by invoking ‘wmctrl -s <workspace-1>’.  wmctrl counts workspaces from 0, so to switch to workspace 5, invoke ‘wmctrl -s 4’.
  • Bug 878736Desktop not shown after unlocking screensaver: This one is due to some focus-stealing apps and gnome-shell’s new screensaver not working together.  I use workrave, an app that helps me keep my eyesight and wrists in relatively good shape.  Other people have complained even SDL windows (games, qemu VMs, etc.) interact badly with the new screensaver.  For my workaround, I’ve set workrave to not capture focus for now.
  • Bug 878981“Alt + Mouse click in a window + mouse move” doesn’t move windows anymore: The modifier key is now changed to the ‘Super’ key, so Super + mouse click + mouse move works in a similar way to how using the Alt key worked earlier.  I’m still lacking the window resize modifier that KDE offers (modifier key + right-click+mouse move)
  • Bug 878428__git_ps1 not found: I’ve discussed this earlier.
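The wmctrl workaround above fits in a tiny script; save something like the following (the name and location are up to you) and point each custom shortcut at it with the workspace number as the argument:

```shell
#!/bin/sh
# Usage: switch-ws.sh N -- switch to workspace N (counted from 1).
# wmctrl counts workspaces from 0, hence the subtraction.
wmctrl -s $(($1 - 1))
```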

Other than these, a couple of bugs that affect running F18 in virtual machines:

  • Bug 864567display garbled in KVM VMs on opening windows: Using any other display driver for the guest other than cirrus works fine.
  • Bug 810040 – F17/F18 xen/kvm/vmware/hyperv guest with no USB: gnome-shell fails to start if fprintd is present: I mentioned this earlier as well: remove fprintd in the VM, or add ‘-usb’ to the qemu command line.

Syndicated 2013-01-09 13:24:36 (Updated 2013-05-22 10:24:38) from Think. Debate. Innovate. - Amit Shah's blog

7 Dec 2012 (updated 22 May 2013 at 11:16 UTC) »

Mystery Shopper Needed

Most of the spam I receive gets caught by spam filters, and pushed into the separate spam folder.  I check the folder once in a while for false positives.

A recent message in my spam folder, with the subject ‘Mystery shopper needed’ caught my attention:

Mystery Shopper needed

We have post of Mystery Shopper in your area. All you need is to act like a customer, you be will surveying different outlets like Walmart, Western Union, etc and provide us with detailed information about their service.

You will get $200.00 per one task and you can handle as many tasks as you want. Each assignment will take one hour and it wont affect your present occupation because it is flexible.

Before any task we will give you with the resources needed. You will be sent a check or money order, which you will cash and use for the task. Included to the  check would be your assignment payment, then we will provide you details through email. You just need to follow instruction given to you as a Secret Shopper.

If you are interested, please fill in the details below and send it back to us to john_paul2_john@aol.com for approval.

First Name:
Last Name:
Full Address:
City, State and Zip code:
Cell and Home Phone Numbers:

Hope to hear from you soon.

Head of Operations,
John Paul.

I can’t resist going shopping — and being paid for it!  Posted this here in case anyone else missed this email due to “bad” spam filters.  We don’t have Walmart here yet, but we certainly do have Western Union.

PS: If you’re interested in treasure hunts: can you spot who’s actually sending these messages?

Return-Path: <john@rapanuiviaggi.redacted>
Delivered-To: <redacted>
Received: (qmail invoked by alias); 28 Nov 2012 04:07:23 -0000
Received: from dns.hsps.ntpc.edu.tw (EHLO dns.hsps.ntpc.edu.tw) []
        by mx0.gmx.net (mx002) with SMTP; 28 Nov 2012 05:07:23 +0100
Received: from dns.hsps.ntpc.edu.tw (localhost [])
        by dns.hsps.ntpc.edu.tw (Postfix) with ESMTP id C5BD97DF740D;
           Wed, 28 Nov 2012 10:34:02 +0800 (CST)
Received: from dns.hsps.ntpc.edu.tw (localhost [])
        by dns.hsps.ntpc.edu.tw (Postfix) with ESMTP id 7DA667DF7379;
           Wed, 28 Nov 2012 10:34:02 +0800 (CST)
From: "John Paul." <john@rapanuiviaggi.redacted>
Reply-To: john_paul2_john@aol.com
Subject: Mystery Shopper needed.
Date: Wed, 28 Nov 2012 10:34:02 +0800
Message-Id: <20121128023012.M26524@rapanuiviaggi.redacted>
X-Mailer: OpenWebMail 2.52 20060502
X-OriginatingIP: (web2)
MIME-Version: 1.0
Content-Type: text/plain;
To: undisclosed-recipients: ;
X-NetStation-Status: PASS
X-NetStation-SPAM: 0.00/5.00-8.00

Syndicated 2012-12-07 06:37:41 (Updated 2013-05-22 10:28:15) from Think. Debate. Innovate. - Amit Shah's blog

20 Nov 2012 (updated 22 May 2013 at 11:16 UTC) »

__git_ps1 not found after upgrade to Fedora 18

If you have enabled git information in the shell prompt (like branch name, working tree status, etc.) [1], an upgrade to F18 breaks this functionality.  What’s worse, __git_ps1 (a shell function) isn’t found, and a yum plugin goes looking for a matching package name to install, making running any command on the shell *very* slow.

A workaround, till the bug is fixed, is to do:

ln -s /usr/share/git-core/contrib/completion/git-prompt.sh  /etc/profile.d/

Bug 878428, if you want to track progress.

[1] To add such git information in the shell display (for bash), add this to your .bashrc file:

export PS1='\[\033[00;36m\]\u@\h\[\033[00m\]:\[\033[01;34m\] \w\[\033[00m\]$(__git_ps1 " (%s)")\$ '
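If you’d rather not touch /etc/profile.d, a per-user variant of the same workaround is to source the script from your .bashrc, guarded so a future path change degrades gracefully instead of breaking the shell:

```shell
# Source git-prompt.sh directly if it's where F18's git package puts it.
if [ -r /usr/share/git-core/contrib/completion/git-prompt.sh ]; then
    . /usr/share/git-core/contrib/completion/git-prompt.sh
fi
```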

Syndicated 2012-11-20 12:22:00 (Updated 2013-05-22 10:26:58) from Think. Debate. Innovate. - Amit Shah's blog

18 Nov 2012 (updated 22 May 2013 at 11:16 UTC) »

Avi Kivity Stepping Down from the KVM Project

Avi Kivity giving his keynote speech

Avi Kivity announced he is stepping down as (co-)maintainer of the KVM Project at the recently-concluded KVM Forum 2012 in Barcelona, Spain.  Avi wrote the initial implementation of the KVM code back at Qumranet, and has been maintaining the KVM-related kernel and qemu code for about 7 years now.

In his keynote speech, he mentioned he’s founding a startup with a friend, and hopes to create new technology as exciting as KVM.  He also mentioned they’re in stealth mode right now, so questions about the new venture didn’t get any answers.

He returned to the stage on the second day of the Forum to talk about the new memory API work he’s been doing in qemu, and in his typical dry humour, he mentioned he was supposed to vanish in a puff of smoke after his keynote, but the special effects machinery didn’t work, so he was back on stage.  Avi later rued the lack of laughter at this joke, and that made him very sad.  To offer him some consolation, it was pointed out that not everyone knew of his departure, as many had missed his keynote.  He quipped “that’s even worse than not getting laughs”.

His leadership, as well as his humour, will be missed.  Personally, he’s helped me grow during the last few years we’ve worked together.  But I’m sure whatever he’s working on will be something to look forward to, and we’re not really bidding him adieu from the tech world.

Syndicated 2012-11-18 06:32:05 (Updated 2013-05-22 10:31:17) from Think. Debate. Innovate. - Amit Shah's blog

28 Oct 2012 (updated 22 May 2013 at 11:16 UTC) »

Setting Up Your Free Private Feed Reader

I’ve tried several RSS feed readers, offline as well as online: aKregator, Liferea, rss2email being the ones tried for a long time. One drawback with these offline tools is they may miss feeds when I’m offline for prolonged periods (travel, vacations, etc.). Also, they’re tied to one device; can’t switch laptops and have the feeds be in sync. I tried Google Reader for a while as well, for a solution in the “cloud”, which worked for a while, but not anymore.

So I started to search for an online feed reader, preferably with hosting services, since I didn’t want to keep up with updates to the software. I found several free readers, and Tiny Tiny RSS seemed like a really good option.  The developer hosts an online version of the reader, which I used for quite a while.  (The online service is soon going to be discontinued.)  I was quite content with that option, but when OpenShift was launched, I thought I’d try hosting tt-rss myself: it initially began as an experiment in using OpenShift. Then, when I moved this blog to OpenShift, I realised it didn’t really take much effort to host the blog, and that I could switch my primary instance of tt-rss from the developer-hosted instance to my own. It turned out to be really easy, and here I’ll share my recipe.

I first grabbed the ttrss sources from the git repo:

cd ~/src/
git clone git://github.com/gothfox/Tiny-Tiny-RSS.git

I then created an OpenShift php app.

cd ~/openshift
rhc app create -a ttr -t php-5.3

Then, added a mysql db and the phpmyadmin tool to manage the db, in case something goes wrong sometime.

rhc-ctl-app -e add-mysql-5.1 -a ttr
rhc-ctl-app -e add-phpmyadmin-3.4 -a ttr

After this initial setup, I copied all the files from the ttrss src dir to the php/ directory of the OpenShift repo:

cp -r ~/src/Tiny-Tiny-RSS/* ~/openshift/ttr/php/

Next is to add all the files to the git repo:

cd ~/openshift/ttr/
git add php
git commit -m 'Add tt-rss sources'

Now to set up the environment on the server for tt-rss to work in. E.g. creating directories where tt-rss will store its feed icons, temporary files, etc. This is needed, as the OpenShift git directory is transient: it’s deleted and re-created whenever ‘git push’ is done. So to store persistent data between git pushes, we need to use the OpenShift data directory. Create an app build-time action hook to setup the proper directory structure each time the app is built (i.e. after a git push). Learn more about the different build hooks here.

Edit the .openshift/action_hooks/build file, so it looks like this:

# This is a simple build script, place your post-deploy but pre-start commands
# in this script.  This script gets executed directly, so it could be python,
# php, ruby, etc.


# Persistent directories under the OpenShift data dir (the paths
# match the tt-rss config settings further below):
TMP_DIR=$OPENSHIFT_DATA_DIR/tmp
LOCK_DIR=$OPENSHIFT_DATA_DIR/lock
CACHE_DIR=$OPENSHIFT_DATA_DIR/cache

if [ ! -d $TMP_DIR ]; then
    mkdir $TMP_DIR
fi

if [ ! -d $LOCK_DIR ]; then
    mkdir $LOCK_DIR
fi

if [ ! -d $CACHE_DIR ]; then
    mkdir $CACHE_DIR
fi

if [ ! -d $CACHE_DIR/export ]; then
    mkdir $CACHE_DIR/export
fi

if [ ! -d $CACHE_DIR/images ]; then
    mkdir $CACHE_DIR/images
fi
Make this file executable, and commit the result:

chmod +x .openshift/action_hooks/build
git add .openshift/action_hooks/build
git commit -m 'build hook: create and link to persistent RW directories'

Next was to create the tt-rss config file from the provided template:

cd ~/openshift/ttr/php/
cp config.php-dist config.php

And then editing the config file.

First, the DB info. I created a new db user via the phpmyadmin interface, but you can use the default admin user as well.

        define('DB_TYPE', "mysql");
        define('DB_HOST', $_ENV['OPENSHIFT_DB_HOST']);
        define('DB_USER', "<user>");
        define('DB_NAME', "ttr");
        define('DB_PASS', "<your pass>");
        //define('DB_PORT', '5432'); // when needed, PG-only

Next come the files and directories section:

        define('LOCK_DIRECTORY', $_ENV['OPENSHIFT_DATA_DIR'] . "/lock");
        // Directory for lockfiles, must be writable to the user you run
        // daemon process or cronjobs under.

        define('CACHE_DIR', $_ENV['OPENSHIFT_DATA_DIR'] . '/cache');
        // Local cache directory for RSS feed content.

        define('TMP_DIRECTORY', $_ENV['OPENSHIFT_DATA_DIR'] . "/tmp");
        // Directory for temporary files

        define('ICONS_DIR',  $_ENV['OPENSHIFT_DATA_DIR'] . '/icons');
        define('ICONS_URL', "ico");

The last icons bit is a modification from the default of ‘feed-icons’. If you’re setting up a new repo, there’s no need to deviate from the default, but when I had deployed the tt-rss instance, the default icons directory was ‘icons’, which unfortunately clashes with Apache’s idea of what $URL/icons is. So I used ‘ico’. Remember to modify the bit in the build hook above to create the appropriate symlink if this ICONS_URL is changed.

These config settings are the ones specific to OpenShift. Modify the others to suit your needs.
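For completeness, the symlink step referred to above can go at the end of the build hook. This is a sketch that follows the directory names used in the config (the ‘ico’ link name matches ICONS_URL):

```shell
# Persistent icons directory, plus a repo-side "ico" link to it.
if [ ! -d $OPENSHIFT_DATA_DIR/icons ]; then
    mkdir $OPENSHIFT_DATA_DIR/icons
fi
ln -sfn $OPENSHIFT_DATA_DIR/icons $OPENSHIFT_REPO_DIR/php/ico
```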

Lastly, add a cron job to update the feeds at an hourly interval:

cd ~/openshift/ttr
mkdir .openshift/cron/hourly

I created a new file, called update-feeds.sh, in the new .openshift/cron/hourly directory, and added the following to it:


#!/bin/bash
php $OPENSHIFT_REPO_DIR/php/update.php --feeds >/dev/null 2>&1
date >> $OPENSHIFT_LOG_DIR/update-feeds.log

For troubleshooting cron jobs, you can append custom output to any file in the log directory, like the date being output above. For other ways to update feeds, refer to the tt-rss documentation.

Add this file to git:

cd ~/openshift/ttr
git add .openshift/cron/hourly/update-feeds.sh
git commit -m 'add hourly cron job to update feeds'

Lastly, push the result to the OpenShift servers:

git push

That’s it! Enjoy your completely free (free as in freedom, as well as free as in beer) and personal feed reader in the clouds.

Syndicated 2012-10-28 16:36:14 (Updated 2013-05-22 10:29:35) from Think. Debate. Innovate. - Amit Shah's blog
