Older blog entries for tampe (starting at number 79)

Am I unique

Today I consider the idea of uniqueness of identities and mixing identities. Basically the problem is that for a set of objects I want to assign an identity or a set of identities so that construction of identities of combinations of the objects are fast enough and also give enough randomness to be used as keys. It is a good exercise to test out the generator framework and as always working gives inspiration.

So the first question you should ask is do I need to guarantee uniqueness? Can I assume that collisions will be negligible. Sometimes clashes is rare and a cheap extra check at the end of the analysis can be used to verify correctness of the analysis and in the minuscule number of clash cases a mitigating strategy can be used.

Hmm, assume that we have objects X1,X2,... and identities I1,I2,... What I'm wondering is if one can use random double precision numbers between in [0.1,0.9] as identities. Then the combination [X1,X2] could be build by simply use (R * X1 + (1-R)*X2), with =.1<R>0.9 a random double. This is rather fast to compute and should mix well and a key should be possible to extract from this operation. Anyone knows about the properties of such an approach?

Here is another approach where clashes do not happen for correct choices of p and q which also has the potential to mix well.

    q**(k1*k2*k3..km) mod p,  
if you are going to mix X:s and Y:s (X x Y) you could generate identities q**i mod p, for X and q**(k1*i) mod p, for Y and just multiply the identities modulo p to get new identities. This could mix well, lets happily hack some generators for this...

(define id-gen
  K -> (mlet ((N   (for X in (li K) (mul X)))
	      (P   (get-prime N))
	      (Q   (MOD (get-prime (* 100 N)) P))
	      (R   (for X in (li K) 
                      (mk-ar (.powmod Q ** (mul X 1..) 
                           mod P)))))

[P (slet ((N (for X in (ar R) (mk-ar X)))) (mk-g [(/. IS (for I in (li IS) (coll (.NTH N I)))) (sig IS (for I in (li IS) (.SETF (.ELT R I) (.MOD (.* (.ELT N I) (.ELT R I)) P)))) (/. IS (for I in (li IS) (coll (.NTH N I)))) (/. _ (setok)) |(pop-like)]))])

The input K is a list that describes the number of objects in each channel (number of X:s , number of Y:s) and so on. N is simply the total number of combinations-. P will be a prime larger then N according to some method (like the first one) Q is just another prime number (we assume that P >> 100 here) R will be q**1, q**k1, q**(k1*k2),... not including N. The 1.. in mul tells the transformer to start from 1.., eg 1, k1, k1*k2, etc. powmod is just a function to quickly evaluate q**k (g**k = q**(k/2) * q**(k/2)).

The slet is a let construct that allows the value of N to be pop and pushable e.g. internal state. N just initializes to a copy of R. mk-g will generate the generator (this is poor mans object system but it's prototyping so I don't care very much and the functionality is similar) Now the list should be a list of lambdas according to 1. get the value e.g. the identity and we have a selector here that picks up the correct identity channel (identity for X or for Y). 2. finding the next value. just updates q^(i*k) -> q^/(++i k) mod p. Sig is a macro mainly introduce code so that we only take the next value ones in each iteration. the third one is the final value and the fourth is the construct to make sure the updating is correct. and the rest is basically memory management. Not a perfect interface but you never get it right if you do not try.

What I learned from my work with this was that it's nice to have lambdas that generate generators e.g.

(mlet (([P G] (id-gen [10 1000 1000]))
       (Fxy   (./. (_) (next [1 2] G)))
       (Ff    (./. (_) (next [0]   G))))
  (gen Key in (li H)
       (coll (switch-if (.= Key f)
P is the prime number used in the modulo code, now /. is the lambda construct in qi, and ./. is a similar lambda construct for generators. So G will be a generator and Fxy will be a generator generating lambda that captures in the closure G. I coded ./. by using something like,

(defmacro ./. 
  [X Y] -> (mlet ((F   (gensym "f"))
		  (F.  (gensym "f."))
		  (F.. (gensym "f..")))
	     (do (eval (splice [[*X* X]]
			       [define F *X* -> Y]))
		 (.mk F F. F.. X id)
This is ugly but it should work. (I know pure lisp can express this logic better) .mk basically constructs a point-wise generator generating function F. and a lazy and delayable version of it in F..


Ahh modern scripting power, feel the fresh electrons streaming

Two of my heros today is 1. Simple serialization of objects to files and back and 2. maps.

The project was the following. I'm doing CFD and optimization at work and have a pretty big directory structure where all the work is done and organized. All work is organized from unix shells and I tend to not use much graphical tools (although when hacking java code I've used popular IDE's to shell out the code mainly because java seem to be pain without IDE's and the fact that I'm not especiall y good at java). My observation was that I tend to not write aliases and shell constructs to ease the navigation in the structure and wanted to make a change. So todays project was to make it dead simple to capture navigation habbits.

To solve this I noted that what I want is to capture the 10 most popular navigations from any popular directory x. e.g. have a maping from a directory x to a list of popular directories. and make it possible to list them, to choose from the list and to add new navigation patterns. I also stored this information for the most popular x:s directories.

popularity was capture by move to front lists. Listing was done in a sorted order of the most popular navigations. The sorting was peculiar. basically lexical sorting with the lexicals beeing the directories in the path in reversed order (starting from the bottom directory and upwards). The result is presented in a nice textual n x m tabel with a number tag in front of the directory and and the directories was cropped keeping the last 30 characters of path

I work from about 3-4 different machines using the same home directory. So simply serializing and deserializing the datastructures to a shared file for each invocation solved any persistancy problems although expensive (but I don't notice this)

I implemented this in python, worked like a charm and I'm probably hooked on to this concept for now. (Some key-based assigned icons to speed up the cognitive selection of navigations should have been a boon but you cannot get everything, actually not hard to implement because of open picture libraries but icons in the shell is not well supported)

The interface is this.

  d mydir
this is the same as cd mydir, but stores the navigational pattern via move to front mechanisms

This list popular navigations from current directory

  d 3
This walk navigation 3 listed by dd.

All this can be generalized but the sketched approach above should be good enough for my need.

By the way what's up with popular languishes like java! Code generation from gui:s, that's totally crazy! Read my lips ... use macros, and if you want to modify, use __macroexpand__ and if java doesn't support it, Invent it! -it's not that hard, the boiler plate is there.



The background, we have a sequence X1,X2,... and by some strange reason want to sort them incrementally and use functionals of the generated sorted sequences as output. So what tricks do exists?

Try explore (linearity), (monotonisity), (associativity and commutativity), (locallity) and (permutational invariance).

One of the simplest functionals are a basic sum of the sequence and of cause for this you just don't care to sort it in the first place.

S is sorted incrementally, now consider a linear filtering of that sequence and we are just interested in the end result. Most certainly you can use locality to improve performance here but you can also use the following. When sorting you will define a subtree, at each junction point. There define the value of the output (the state of the filter) of the sequence related to this part of the tree as a linear function of the input state of the filter. Doing this means that each new value will demand about log(n) operations in order to update at an insertion in the tree. So this is actually n log(n) - the same complexity as sorting a tree but with a bigger constant in front and memory requirements. Still I found this little trick kind of neat. Noting this approach can be generalized to other kinds of operations like some class of logical ones by abstracting this idea gave some coolness and it is a good little pattern to remember.

So there is a family of tricks you can do. The criteria for these tricks is quite restrictive and it is easy to get out of the n log n domain and into the n**2 domain. The interesting thing though is that this class of functionals are composable, but cannot mix well with point wise operations, the thing is that ones you have issued one of these filters you cannot in general then do point-wise operations on it and after that issue one of these kind of filters again for that to work you need it to be some kind of linear pointwise operation.

So The conclusion is that there are some nice tricks but most probably you will be bound to n**2 complexity.

I'm off make for some weird abstractions, or in other words Good Night!

Be brave, Be Relaxed, Become Someone

Academic is much about mental exploring, If we don't explore we wouldn't have the living so many enjoy today.

The problem today is not that people spend time on their own exploring. The problem is that we don't take time to listen to them and engage in heated discussions because we are supposed to bee so busy exploring

At least that's the reason I tend to do much work by my own, none at work are interested, non of my friends are interested and it is bloody difficult to get a relation with some expert at academia (and this is not their fault, they do not know me, and can't effort building such a thing because of the many loners out there really being more or less lunar)

I've always tried to fight these tendencies and tries to be a good discussion partner at least at work.

A good rule though for anyone exploring is to take your work seriously but be much more relaxed with how you consider yourself. The thing is a road that is so exciting to follow my just tend to be a dead end and that is cool because now you have learned something very important and the skills and tools you will have achieved have multiplied.

And last something that sort of guide me in the analysis of the information wind that combs our mind. Beware of pointing fingers (at least to early) or else we could find ourselves one day standing naked as H.C. Andersen probably would have put it.

Anyone, Sharp Your Mind, And the rest of us will look like a bunch of cowards, cheers!

Sort it out, or should we sort it in

or .........

An interesting family of logic is looping around the median. I mean then median, seriously!!!

(sort X in G by M  (coll (sum (X - (median X))**2))
sorting and looping is usually done by first sort a vector v into a vector w, then continue to work with that vector. The Question I'm asking may seem stupid, but what about trying to so a sort incrementally and pipe the state of the sort to further analysis. You may think that this is just crazy, and it is in many circumstances but you will be able to do one thing quite effectively, and that is to freeze functionality such as median values and the whole family of similar constructs that depends on sorting long before you have finished the full sort.


(for X in G (collect X if (outlier 0.25 X)))
We would like to get the extreme values for further analysis. Assume that the stream of X comes in in a close to random fashion, you could then just take the first N examples and build your model for the distribution from those, then X-values that is extreme to these values will be collected. If N is much much smaller then the whole dataset the above will be linear in complexity and we can avoid a log(n) factor that we have from the quick sort examples and a lot less memory needed.

A simple mock-up of the code for outlier would be

(define outlier
  P X -> 
    (gen from [R V] in X sort R into Tree by R
        (let ((Left  (percentile P       Tree))
              (Right (percentile (- 1 P) Tree)))
            (do (freeze Left  if (< (Delta Left ) Eps))
                (freeze Right if (< (Delta Right) Eps))
                (outside? X [Left Right])))))))
Tree is a reference to a generator that contains the sequence of trees.

One thing you may want to do is to actually loop over the sorted values you may want to calculate the height of the first maximum in the probability density, this is a loop along the sorted values and you should be able to freeze the tree generation according to the behavior of this quantity. The general method is to redo the looping for other special cases there is smart tricks to take. Something like

(sort [X M] in G by M  
  (coll (sort-sum ((X+M)/2 - (freezable-median M))**2))
should now make sense. (M is both a tree and a value depending on the context e.g. scoped duck-typing)


And I tried so hard to be academic

Two more test cases added. Handling of sparse tensors and handling of Blocked tensors. Operations on block matrices can be faster for matrix local operations because you use the cache more efficiently. and sparse matrices is a must to have when doing CFD where you cannot store a full matrix. Doing operations on these creatures such as doing different marginals and permutations carries over just fine. There is some refinement in the logic though. my g.. operator has to be modified for the sparse case to store internally an index tree. One sparse representation can be based on lists of pairs of an index and a value. when you encounter a pair you look into your tree to try to find the correct generator if not there you need to recreate it. Else the logic is the same. For blocked matrices you need to consider blocked symbols coll.X.Y which will use a generator of generators generating generators. Permuting blocked sparse matrices are now a dead simple problem and a large set of basic matrix manipulations becomes trivial using this. Also add the .f modifier, eg sum.X.Y.f will mean that the final value of (sum.X.Y ..) is returned at the calculation. It is not available of cause, but again delayed variables come to rescue and everything should work just fine. Wow maybe academic play just turned very practical.

Good Night

No I know it, I'm a nerd, I just started to drool over my code

I might bore you all with this. If so move on. the thing is that I hope to entertain you all, and being a geek this is my way of entertainment. If I'm alone in the universe so bee it, If only one appreciate this stuff I will be a very happy geek, .... , lets hack a mole!

This week I have completed code to do all the things In the last prototype + some other bit's and pieces that inspiration brought forward. I have prototyped and clean up interfaces for parsing including stuff such as look ahead which contains some general patterns.

An Idea would be to make proper interfaces for threading and process handling. This is an item that has to be handled as well. TODO!

look ahead is interesting, and can be implemented in two ways. Either you "rebase" by your incoming sequence so that you have all the data you need then start the actual looping at a later stage normalize by using delayed variables to get in sync with the rest. The other approach is to work with delayed variables throughout and is simpler but some constructs will then not work.

It would be nice to have a type info for the value of the next value. The type information is mediated as the return value when doing a next, so it would be nice to introduce that logic.

And one more thing. Assume G = (g.. Cnstr) will take a generator constructor Cnstr and generate sequence of generators/transformers/collectors but with the quirk that when a finish signal is sent to G it will restart but keeping all the generators created in memory so it reiterates the same iterators when applied again. Introduce the (x G V) G is a generator of generators and V is generator sequence of values (x G V) will then for each iteration take the corresponding G an update it with the value of V. Now,

(let G (g.. coll)
  (for X in (li L)
       (xfor [myColl Y] in [G (li (ghead X))]
	     (coll (x .myColl Y)))))
meaning a transposing of the list of lists L in a algoritmic efficient way but of cause without an optimizer will have poor performance. It may look a little complicated but from this you can introduce the following syntactic sugar:

(for X in (li L)
   (xfor Y in .X
       (xfor Z in .Y
	   (permute (Y coll) (Z coll) (X sum)) Z))))))

(.X just means you take the value of the generator) This code will behave as if you permuted L and issued something like,

(for Y in (li L)
     (coll (xfor Z in .Y
                 (coll (xfor X in .Z
	               (sum X))))))

To see the transformation of the permute construct rewrite it as

  (forX (forY (forZ (Ycoll (Zcoll (Xsum Z))))

Now first move Ycoll as much to the left as possible, then move Zcoll as much to the left as possible and lastly Xcoll as much to the left as possible. An Acoll may not pass an forA and an Acoll is not allowed to pass through another Bcoll. Then you will get

(forX (forY (Ycoll (forZ (Zcoll (Xsum Z)))))

Then replace (forA (AS .. with (forA (S .. and when that has been done replace (AS Rest) with (x .AS Rest) and insert let statements so that we get (let AS (g.. S) (forA ... Following these rules we reach

(let Xsums (g.. sum)
      (forY (coll
            (forZ (coll (x .Xsum) Z))))
Add the correct paralel iteration over the Xsums and we reach.

(let Xsums (g.. sum)
    (for X in (li L)
        (for Y in .X
           (coll (xfor [Xsum Z] in [Xsums .Y]
                     (coll (x Xsum Z)))))))))

This maybe is a little academic but fun fun fun


Some ideas just fly, where?, I don't now but it's butyfly

Using backtracking to parse is sort of inefficient, you take one try at the time. Using the standard tools for parsing (bison yacc lex) is way faster, so, why am I going to choose to use backtracking?

Easy mental model, compact code, extensible. The idea is that users and library writer should have more possibilities to add good checks and extend the parser - at least I have many times missed these features. The hope is that the approach I'm using will make this not only possible but also not to hard to do

But performance, what about performance? Well think about how we code, we have one big code base and incrementally line by line modify it a little. Why not store extra information so that the parser only backtracks on new code sections by using the extra information to select the correct branch directly (when order of choices does not matter)- is that enough? I don't now I'm pretty new in this game. You can if you like do analysis as well to align the choices in order of likelihood and tricks like that. Maybe I'm wrong but I don't expect this to be an issue practically when taking this observation into account. But things can add up, backtracking, lisp instead of C++, flexible but slow data structures. No big deal, I can as well backtrack if my code can.

I will now dive into code again, but one thing I'm missing in the picture is the buffer pattern, this is a thing that is really nice to have, why not extend that to work in a tree context, think about that. A lot of nice butterfideas will come out of that mental exercise, colorize me, colorize life, colorize you

Cheers to you all

Arborists unite

Have been sick last week, nothing much have been done most of the parser loop example or my endavour in prototyping and experimenting with generators about trees is finished though. Must of this stuff was just a simple generalization of previous stuff so 98% of the stuff should be finished. The logic should needs testing. So whats new ideas have come up of prototyping?

Well, for one thing catching and modulating signals and their values has to be streamlined into the framework.

To have some more notion of setf of generator stuff may be of value (think of letting A represents an element in a data-structure, you might want to set that value to something). this is not anything functional minded persons would recomend, but locally in a transient that functionality is nice to have in order to reach efficiency.

managing state of collectors, transformers and diferent constructs is needed or simplifies logic a lot. This is implemented and the framework should now allow for inclusion of similar constructs that adheres to the same pattern .

Noticed that applying a structure of transformers to a similar structure of transformers should be possible think of making sense of

  (apply [Tr1 [Tr2 Tr3] [val1 Tr4])

Lot's of ideas is maturing in handling of parsing and some of it is proturberized into an examplification.

If you want to check out some of this code with terse comments see loop3 , you can check out my first tree effort in loop1

If you are going to read this stuff, happy sleep and good night

Tracking Signals

Prototyping using the framework for parser generators touches one more aspect. The aspect of needing to walk down the tree multiple times when backtracking. If what you have is an abstract generator X, you will want to first try pattern M1 against X, then restart X and try say pattern M2 against X. Enter the need to push and pop the state of X. Actually you could think on many more of these operations like storing a state in variable A, printing the state, make the loop persist and so on. All these operation and similar operations has a common pattern, so I made macros and now introduce these kind of pattern will more easally done.

The pattern is this. A generator has state. And depends on other generators, thats it. The dependency of all generators will be represented by an acyclic graph. Now in order to make the default illusion you will have to make sure at many places that you will update and therefore the signals hit each object multiple times. This is ok because the logic is to first send a clear signal that on all objects with internal state set a flag to true. then when the update signal such as next push or pop reaches the object it will if the flag is true do the action and then set the flag to false. This makes the design clean but costly. Also the model of the action of a next for example will trigger a flow of signals backwards up the call graph in order to skip finish and yield correctly. The next is also special because also updates internal state by using the internal state and the dependents state

There will also be a flow of data as well, and that is the other category of flow and can be represented by the value av the generator or a view of the state of the generator. There is three main abstract streams, ordinary data, delayed data and lazy data. ordinary data is data as you normally would consider as data, delayed data is data that is not yet computed, but we treat it as a computed data but all functional calculation based on that data will be delayed as well. Lazy data is a lambda that when there is a need for that data, the it will be calculated by evaluating the function.

So The framework is dead slow, but I do not want to make a compiler just yet. The idea is to write a slow but full featured loop environment, have nice syntax. Iron out and streamline the code, construct an extensive set of test cases, then make a compiler using this framework (eat the dog-food) include the forward chaining stuff as well, this beautifies a lot of logic I hope.

What's next after parsing in protyping?, well one thing I'm thinking of is tree manipulation. Consider having a tree, to do a macroexpansion in a lisp tree can be viewed as a way to have a walker walking the tree and when there is a match, transform things that comes "after" and "below" on the same level. Now you might want run many of these modifiers one after the other, (to reuse L2 cashes if multiple cores and L1 cashes if they are on the same core) This is an interesting pattern and will brake the functional approach so it would be interesting to try to design some framework to handle this. Again it will be a nice exercise.

I hope that you see the unix in all this and that you can view it as a nice extension of the piping idea, or as a generator for extensions of the piping idea

Have fun!

70 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!