Older blog entries for jmg (starting at number 71)

Well, just got back from the hospital. It's official, I have strep throat. Good thing I caught it now, before it advanced to rheumatic fever, which just wouldn't be any more fun. So, starting on the antibiotics, and of course not going to work tomorrow. Should still be contagious for the next 24 hours. Gotta hole up at home, and maybe watch a few more movies while I'm here, assuming I have the strength. Hey, and maybe even read more of my physics textbook! :)

Better go get some food and take the first of the antibiotics.

P.S. I did write a more expansive diary entry a few hours ago, but this one will supersede it in the recent diary entries list.

I finished reading Ray Bradbury's Fahrenheit 451 (ISBN: 0345342968) on Thursday, I believe it was. Definitely interesting. I had started this book shortly before Christmas, but I had misplaced it. So that's why I read it so quickly. That and it's a short book. It's funny, in the afterword (or coda) he says: "Only six weeks ago, I discovered that, over the years, some cubby-hole editors at Ballantine Books, fearful of contaminating the young, had, bit by bit, censored some 75 separate sections from the novel." Funny that a novel about censorship was censored itself. Was thinking I wasn't going to be able to include the MARC record, but here you go:

001 4768383
003 CU-UC
005 20010211081127.3
008 870928c19871953nyu            00 0 eng d
020    $a 0345342968
035    $9 4768383
040    $a VVX $c VVX $d CLU
100 1  $a Bradbury, Ray, $d 1920-
245 10 $a Fahrenheit 451 : $b Fahrenheit 451 -- the temperature at which bookpap
er catches fire, and burns ... / $c Ray Bradbury.
260    $a New York : $b Ballantine Books, $c 1987, c1953.
300    $a 179 p. ; $c 18 cm.
500    $a "A Del Rey book."
500    $a With an Author's afterword.
546    $a English

The past two days have not been fun. I've come down with something and finally made an appointment for today to have it looked at. Haven't been able to get any sleep the past two nights as my nose is so runny that I can't find a decent position to sleep in. It was so bad Saturday morning that I was thinking of going to the emergency room, but found that some ibuprofen reduced the swelling in the back of my throat, making it more manageable. Sometimes it's nice being sick, but when you can't sleep, being sick just sucks.

Oh, does anyone have a flash ROM burner in the Bay Area? I recently tried to upgrade my FIC PA-2007 motherboard with a beta BIOS, and now the machine doesn't even like an ISA VGA card in it. You turn it on, and it just beeps at you. :( Need to burn the old BIOS back into the chip (or test with another BIOS that is not beta).

alisdair and ajv:
In this day and age of dynamic web content, why pregenerate thumbnails? I wrote a CGI script a while back that will generate thumbnails (and downsized versions) of my images when requested. I of course store the resulting image in a database so I don't have to regenerate it every time. Plus it autogenerates indexes for the images too. ls *.jpg > somename.indx and I have an index. As for that, why not use jpeg + pnmutils? djpeg | pnmscale -xysize 175 175 | cjpeg -optimize? That's what Unix was designed for, putting commands together to build a bigger project.
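A rough sketch of that generate-on-request idea in Python, assuming djpeg, pnmscale, and cjpeg are on the PATH. The function names and the plain dictionary standing in for the database are made up for illustration; this is not the actual CGI script:

```python
import subprocess

def make_thumbnail(jpeg_bytes, size=175):
    """Equivalent of: djpeg | pnmscale -xysize SIZE SIZE | cjpeg -optimize"""
    pnm = subprocess.run(["djpeg"], input=jpeg_bytes,
                         capture_output=True, check=True).stdout
    scaled = subprocess.run(["pnmscale", "-xysize", str(size), str(size)],
                            input=pnm, capture_output=True, check=True).stdout
    return subprocess.run(["cjpeg", "-optimize"], input=scaled,
                          capture_output=True, check=True).stdout

def cached_thumbnail(cache, name, jpeg_bytes, size=175, generate=make_thumbnail):
    """Generate a thumbnail only on the first request, then serve it from cache.

    `cache` is any mapping (a dict here; the original used a database).
    """
    key = (name, size)
    if key not in cache:
        cache[key] = generate(jpeg_bytes, size)
    return cache[key]
```

The generate parameter is only there so the caching logic can be exercised without the external tools installed.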

Hmmm, realized that I'm really hungry right now. Off to find some food.

I agree filesystem versioning is nice, but take a look at VMS, it had this. There are a lot of caveats with this, like the fact that you have to delete old versions to free up disk space, etc. And how many times have you saved a file just out of habit? If it were an option you could turn on, then it might be useful. Though with NetApp snapshots and snapshots supported under SoftUpdates, you aren't too far from supporting filesystem versioning.

A friend of mine mentioned that you might want to check out BeOS for filesystem metadata. Sounds like you can have "virtual folders" that contain queries of the metadata. So you can have a folder of all of your MP3s or other such things. I haven't ever used BeOS myself, but probably something to check out.

Last night I finished another book ISBN 0062585274. The MARC record:

005 19950629160347.0
035    $a (FCLAZ)AJC5084FS00003/01/199506/29/199523409Bpam a D0FS
008 940314s1994    cau           000 1 eng
010    $a    94011765
020    $a 0062585274 : $c $13.00 ($18.00 Can.)
035    $a (Source)ONIFS193-     7
035    $a (OCoLC)30075186
040    $a DLC $c DLC
049    $a FDAA ksp
050 00 $a PS3552.E7942 $b R56 1994
082 00 $a 813/.54 $2 20
100 1  $a Besher, Alexander.
245 10 $a Rim : $b a novel of virtual reality / $c Alexander Besher.
260    $a [San Francisco] : $b HarperCollins West, $c c1994.
300    $a x, 357 p. ; $c 18 cm.
650  0 $a Imaginary wars and battles $x Fiction.
650  0 $a Virtual reality $x Fiction.
655  7 $a Science fiction.
655  7 $a War stories.

Hmmm, that's interesting. They consider it a "War stories." book. There weren't any true battles in this book; it would be more of a spy novel than anything. Oh well. This book is really a bit confusing and ends very abruptly. You get to like 20 pages left in the book and feel there should be at least another 75 or so, but he ends up finishing the book like he was on a deadline or something. Heck, the book ended so quickly that I couldn't even really tell you how it ended (except the good guys won of course). I will be going back to read the end of it sometime.

You might want to take a look at the extended attributes work that is ongoing in <project>FreeBSD</project>. I'm not the best person to contact about it, but rwatson will know most about this (at least I'm pretty sure it's rwatson). They plan on using extended attributes to store info like ACLs, plus possibly MIME type info. Shouldn't be hard to extend that to HTTP headers to be sent along with the file too. No more annoying .meta files.

Well, my Python dbwrap project I started a few days ago is coming along nicely. I have it pretty functional. It wraps a DB-API 2.0 compliant driver into a dictionary-like structure. So, it might look like: b[(pgdb, ':operations')]['users'][('uid', 'jmg')]['registered'] = 'now' which will then set the column registered to now (assuming a DateTime field) where uid = 'jmg' in the table users. This makes it a bit easier to use a DB in your app instead of having to constantly write your own select/update/insert queries. I probably should reduce the number of times I do select queries, but I want to make sure I don't skip anything.
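To give a feel for the dictionary-style access, here is a minimal sketch using the standard library's sqlite3 module in place of pgdb. The Database, Table, and Row class names are invented for this example and are not dbwrap's real internals; table and column names are pasted into the SQL unescaped, so this only suits trusted names:

```python
import sqlite3

class Row:
    """One row, addressed by a (column, value) key; columns act like dict items."""
    def __init__(self, conn, table, key_col, key_val):
        self.conn, self.table = conn, table
        self.key_col, self.key_val = key_col, key_val

    def __getitem__(self, col):
        # sqlite3 uses the "?" (qmark) parameter style; other drivers differ
        sql = f'SELECT "{col}" FROM "{self.table}" WHERE "{self.key_col}" = ?'
        found = self.conn.execute(sql, (self.key_val,)).fetchone()
        if found is None:
            raise KeyError((self.key_col, self.key_val))
        return found[0]

    def __setitem__(self, col, value):
        sql = f'UPDATE "{self.table}" SET "{col}" = ? WHERE "{self.key_col}" = ?'
        self.conn.execute(sql, (value, self.key_val))
        self.conn.commit()

class Table:
    def __init__(self, conn, name):
        self.conn, self.name = conn, name

    def __getitem__(self, key):
        key_col, key_val = key
        return Row(self.conn, self.name, key_col, key_val)

class Database:
    def __init__(self, conn):
        self.conn = conn

    def __getitem__(self, table):
        return Table(self.conn, table)

# Usage: an in-memory database with one user row
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (uid TEXT PRIMARY KEY, registered TEXT)")
conn.execute("INSERT INTO users VALUES ('jmg', NULL)")
db = Database(conn)
db["users"][("uid", "jmg")]["registered"] = "now"
```

Every item lookup here issues its own SELECT, which is the simple-but-chatty trade-off mentioned above.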

I am thinking about making the getdict function return an immutable dictionary (raise an error on __setitem__) so you don't assume that setting a value in this dictionary also sets it in the database. Now that I think about it, why do I even need to support the getdict function? :) Wish I could release a binary only copy (yeh, I know about py[co] files) so people can't see how bad the code looks. I've also only tested with pgdb (PyGreSQL), which interfaces to PostgreSQL v6.5. I might have issues with other DB APIs. I also don't test which parameter passing style the API supports, but that's because the parameter passing that pgdb supports is kinda broken (or maybe it's because it's more designed for Python v2.0).
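One way to get a read-only dictionary in current Python is types.MappingProxyType, which refuses item assignment with a TypeError. A small sketch; the getdict signature here (taking a plain mapping of column values) is an assumption made for the example:

```python
from types import MappingProxyType

def getdict(row):
    """Return a read-only snapshot of a row's column values."""
    # Copy first so later changes to `row` don't show through the proxy
    return MappingProxyType(dict(row))

snapshot = getdict({"uid": "jmg", "registered": "now"})
try:
    snapshot["registered"] = "later"
except TypeError as exc:
    print("refused:", exc)
```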

Oh well, enough for now. How annoying, there isn't a project tag.

Religion: Well, I'm not an atheist, but atheists should have an issue with Advogato. There is no way to say someone doesn't exist. The default certification of an observer means that they exist, therefore everyone on Advogato believes that God and Satan exist because everyone at least has them certified as observer. Oh well. :)

Thanks for the email, but there definitely needs to be more expansion of the docs on PyTypeObject declarations. Also, listobject.c doesn't show where/how/when PyList_New gets called. Right now I'm assuming that I create a module with a routine that does the instantiation of the type, which would make sense, but as you have found out, the docs in this area (at least the official docs) are nonexistent. Glad that you are going to work on the docs, and if you'd like a newbie to review them, send 'em my way! :)

Still need to decide what I want to use to develop a dynamic web site I want to create. Debating if I want to go with fly (I happen to know the people who developed it, and I hold the MIME type for them), or create my own in Python. Probably should go and see about installing Postgres (oh, that's right, I already have it installed, just not doing anything with it) and playing around with it. fly should be really fast as it was written in lex/yacc/C. The fun of db work.

From my experience, there is nothing more annoying than a bug report that says something is broken (when it's obvious, especially documentation) w/o any help. When the documentation is in as much disarray as Defining New Object Types there really isn't that much I can do about it. Either a bug report should already be there, or the group that is working on the documentation should already know about it. It has improved, but still, it isn't usable unless you already know what's going on. As for reading the source code, that's what I do. But there's always a point when the need/want to do a project isn't great enough to go through reading all the source code looking for good examples of what you want to do.

Also, if PyClass_* is internal, why is it publicly defined? And if it's private, then how are you supposed to create an object to pass back for maintaining a connection, such as the state for a Z39.50 connection, w/o using a class? Defining a new type seems a bit extreme, and defining a whole module wouldn't work as you don't instantiate that for each connection. So how else am I supposed to do that w/o using PyClass_*? Again, it really sounds like I should just do a blind wrapper around the functions and do all the glue code in Python. It'll be easier, and I won't have to worry about as many coding bugs.

As for bitching to Advogato, don't forget that this is a diary entry. It's what happens to be on my mind when I'm writing the diary, and what happens to be frustrating me.

29 Jan 2001 (updated 29 Jan 2001 at 07:04 UTC) »

Well, just started to try and write a Z39.50 Python module and forgot how poor some of Python's documentation is. It doesn't document any of the PyClass calls at all, and the documentation for 2.0 is exactly the same in parts. Guess no one felt it worth the time to spend even five minutes improving the Python/C API documentation. Now I remember why it's easier to write most of the code in Python, and only keep a small C kernel as an interface to the calls, instead of wasting your time making the module perform decently.

Now, do I spend the time to make this module or not?

Well, I finally finished reading Dante Alighieri's Inferno (ISBN: 0691018960) today. Personally I preferred Larry Niven/Jerry Pournelle's (ISBN: 0671670557) version of it. It pretty much contains all that is in Dante's version without destroying it too much, and of course it's a tad bit easier to read. I would like to be able to read the original Italian text, but I don't think I'll be learning Italian anytime soon to do it.

Started reading Rim by Alexander Besher (ISBN: 0062585274) and so far it looks like a good book. It's kinda hard to understand exactly if you're in a virtual world or the real one. They also throw out strange terms that you have no way of knowing what exactly they are. Hope that it gets explained soon, but so far I'm less than impressed.

Guess I should head off to the local bar and grab a beer. It really does feel about that time.

Bloody hell, repeat after me, read the recent diary entries before posting your entry.

Do they happen to webcast their station? Sounds not too different from KWVA, which is University of Oregon's college radio station.

Yeh, I was thinking that it wouldn't be too hard to parse the web page that they have at LOC, and if I had to input them, that's what I'd do. Now I haven't taken a look at the DTDs that are supposed to be used now for the XML version of MARC, and I would think that they have most of that defined in the DTDs, but I'm not that familiar with XML really.

technik: Check out PicoBSD based on FreeBSD. I know that they have done various work on making a small system bootable. Not sure if they've relaxed the requirements now that you can do large images on CDs, but I do remember them talking about doing something similar.

Hmmm, am I the only one that is noticing that Advogato has gotten really slow all of a sudden? Both from home and work it takes a good 30 seconds to bring up a page, oh well.

Thanks for the hint, I didn't know that there was a performance difference. Guess I'll have to switch to has_key then. I've been thinking of trying out Python 2 one of these days, but just haven't gotten around to installing it. As for the try/except or has_key, that is the problem with languages today. Even the high level languages today need you to do special stuff to tweak out the last inch of performance. (Of course, if you're after performance why the hell are you using a high level language? :) )

I was thinking about printing out that information too, but after I read how many different fields and subfields there are, I decided I'd leave that to the end application. Really all I'd be doing would be writing a fancy name mapping program and I'd rather not do that. As for my MARC converter, it can also write out the Python MARC records that you read in. Just need to handle the data in the leader a bit more cleanly (like actually storing it) and it'll be somewhat useful. Now to update the books read page on my home page to use this instead of the crappy scripts I'm using now which fetch data from Borders.

Read the Tabs vs. Spaces article that ariya posted. He is right about point 1 being the core of the argument, but the problem is that if you use spaces instead of tabs, then it becomes hard for others to read your code. I personally use 8 space tabs because that is what the FreeBSD style(9) guidelines say. It may sound strange to adopt a project's style guide for your own code, but if we could all agree on a single style, then everyone would have fewer issues with this argument. Or you could always switch to Python, which forces style on you.

I personally agree that 8 space tab stops are good. If you ever get so deeply nested that you can't fit your code on one or two lines, then you need to create more functions for that piece of code. The general rule that if you tab in more than three or four times from your base function then you need to rethink the function is a good thing. If you write with 2 space tab stops, then it's easy to write functions that have about 20 loops in them (that only puts you halfway across the screen) without even thinking about it. If you had 8 space tab stops, you'll have issues going beyond 6 nested loops.

I wrote a MARC binary to ASCII conversion program last night, but I won't release it till I split it into functions, because the one big function goes a whole four indents in from the base of the function. For me this is too much, and writing more, smaller functions makes the code easier to read.

Oh well, just ranting a bit about coding style.

Hmmm, should I rant about the whole binary vs. XML for machine exchange? The reason systems are getting so bloody slow is because they decided to trade a faster to read format for an easier to [human] parse format. If programmers continue to decide to go for solutions like these, we will continue to need faster computers, but it doesn't have to be that way.

I was impressed with how easy to parse the MARC format was without giving up extra space and without dealing with endianness. To deal with endianness, they simply encoded the numbers in base-10 ASCII. Of course, with Python it was too easy to parse the "binary" MARC format to a list of dictionaries.
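For anyone curious what that parsing looks like, here is a minimal sketch (not the actual converter) based on the ISO 2709 layout that MARC uses: a 24-byte leader, a directory of 12-byte entries (3-digit tag, 4-digit length, 5-digit offset), then the field data. The parse_marc name, the dictionary keys, and the latin-1 decoding are choices made for this example; real records may be MARC-8 or UTF-8 encoded:

```python
FIELD_END = b"\x1e"   # terminates the directory and every field
SUBFIELD = b"\x1f"    # introduces each subfield; the next byte is its code

def parse_marc(record):
    """Parse one binary MARC record into (leader, list of field dicts)."""
    leader = record[:24]
    base = int(leader[12:17])        # base address of data, base-10 ASCII
    directory = record[24:base - 1]  # skip the directory's terminator byte
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[:3].decode("ascii")
        length = int(entry[3:7])
        start = int(entry[7:12])
        # The stored length includes the trailing field terminator; drop it
        data = record[base + start:base + start + length - 1]
        if tag < "010":
            # Control fields: raw data, no indicators or subfields
            fields.append({"tag": tag, "data": data.decode("latin-1")})
        else:
            chunks = data[2:].split(SUBFIELD)[1:]
            fields.append({
                "tag": tag,
                "indicators": data[:2].decode("latin-1"),
                "subfields": [(c[:1].decode("latin-1"), c[1:].decode("latin-1"))
                              for c in chunks],
            })
    return leader.decode("ascii"), fields
```

Because every number is zero-padded decimal text, int() on a byte slice is all the "decoding" the leader and directory need.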

Now for a bit about Python. I always forget to use try/except instead of if statements when it's more appropriate. One example is adding a data element to a dictionary when you may have duplicate tags. There are a few ways to deal with this. Simply start out using lists for your data elements (which is probably what I should do), or convert an element to a list once you get more than one. An example of converting on demand is:

try:
	rec[tag].append(data)
except KeyError:
	rec[tag] = data
except AttributeError:
	# a single value has no append(), so turn it into a list now
	rec[tag] = [rec[tag], data]

Using lists from the start would be like:

try:
	rec[tag].append(data)
except KeyError:
	rec[tag] = [data]

Now the latter one in some ways makes more sense, as then you don't have to find out if it's a list or not and handle them differently, but it also means a bit of extra work in the case that multiple tags are the exception rather than the rule.
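For what it's worth, the always-a-list variant can be written in current Python without any exception handling, using dict.setdefault or collections.defaultdict from the standard library. A small sketch; the add_field helper name and the sample values are made up for illustration:

```python
from collections import defaultdict

def add_field(rec, tag, data):
    """Append data under tag, creating the list on first use."""
    rec.setdefault(tag, []).append(data)

rec = {}
add_field(rec, "650", "Virtual reality")
add_field(rec, "650", "Imaginary wars and battles")
add_field(rec, "245", "Rim")

# defaultdict(list) creates the empty list implicitly on first access
rec2 = defaultdict(list)
rec2["650"].append("Virtual reality")
```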

Oh well, enough musing, now hopefully the 45gig IBM 75GXP drive I ordered will be waiting for me today when I get home. I was also lucky to get a couple 128MB PC133 DIMMs for only about $40 each. They were generic stock, but were CAS2 timing. What luck! Of course, I only happen to be using them in PC100 capable hardware, but I'm debating about ordering a couple more.

Boy, that was fun. Just wrote ean2isbn to convert the EAN bar codes to their proper ISBN. You'll need python installed to run it. It verifies that the EAN check digit is correct and generates the ISBN check digit for you.
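The conversion itself is short. The following is a sketch of the general technique rather than the ean2isbn source: check the EAN-13 digit (weights alternate 1 and 3), strip the 978 prefix, and recompute the ISBN-10 check digit (weights 10 down to 2, modulo 11, with 10 written as X). The helper function names are invented for this example:

```python
def ean_check_digit(first12):
    """EAN-13 check digit for the first 12 digits (weights 1, 3, 1, 3, ...)."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10

def isbn10_check_digit(first9):
    """ISBN-10 check character for the first 9 digits (weights 10 down to 2)."""
    total = sum(int(d) * w for d, w in zip(first9, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)

def ean2isbn(ean):
    """Convert a 978-prefixed EAN-13 bar code number to its ISBN-10."""
    ean = ean.replace("-", "").strip()
    if len(ean) != 13 or not ean.isdigit():
        raise ValueError("expected 13 digits")
    if not ean.startswith("978"):
        raise ValueError("not a 978 (Bookland) EAN")
    if ean_check_digit(ean[:12]) != int(ean[12]):
        raise ValueError("bad EAN check digit")
    body = ean[3:12]
    return body + isbn10_check_digit(body)

print(ean2isbn("9780345342966"))  # prints 0345342968
```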

It's kind of interesting, University of California lists two different books for the same ISBN even though the check digit isn't correct for one of them. Sounds like someone entered the data incorrectly:

hydrogen,ttyp6,/tmp/yaz-1.6,534$client/yaz-client tcp:
Sent initrequest.
Connection accepted by target.
ID     : Z39.50
Name   : CDL Z39.50 SERVER
Version: 1.0
Z> base cat
Z> find @attr 1=7 0937175935
Sent searchRequest.
Received SearchResponse.
Search was a success.
Number of hits: 2, setno 1
records returned: 0
Z> show +2
Sent presentRequest (1+2).
Records: 2
[Catalog]Record type: USmarc
001 5856206
003 CU-UC
005 20010124215609.3
008 890905s1989    caub     r    00010 eng d
020    $z 0937175930
035    $9 5856206
040    $a SSL $c SSL $d CAS $d OCL $d CUS
100 10 $a Frey, Donnalyn
245 14 $a !%@:, a directory of electronic mail addressing and networks / $c Donn
alyn Frey and Rick Adams.
250    $a 1st ed.
260 0  $a Sebastopol, CA : $b O'Reilly & Associates, $c c1989.
300    $a xv, 284 p. : $b maps ; $c 23 cm.
440  0 $a Nutshell handbook
500    $a At head of title on cover: UNIX Communications.
504    $a Includes bibliographical references and indexes.
546    $a English
650  0 $a Electronic mail systems $x Directories
700 10 $a Adams, Rick
740 01 $a Directory of electronic mail addressing and networks.
740 01 $a UNIX Communications.
[Catalog]Record type: USmarc
001 7612099
003 CU-UC
005 20010124215609.6
008 930310s1992    caua          001 0 eng
010    $a    93124509 //r96
020    $a 0937175935
035    $9 7612099
040    $a DLC $c DLC $d DLC
050 00 $a QA76.76.O63 $b O73 1992
082 00 $a 005.7/13 $2 20
100 1  $a O'Reilly, Tim
245 10 $a Managing UUCP and Usenet / $c Tim O'Reilly and Grace Todino.
250    $a 10th ed.
260    $a Sebastopol, CA : $b O'Reilly and Associates, $c 1992.
300    $a xxiii, 342 p. : $b ill. ; $c 23 cm.
440  2 $a A Nutshell book
500    $a Includes index.
546    $a English
630 00 $a UUCP
630 00 $a UNIX (Computer file)
650  0 $a Usenet (Computer network)
700 10 $a Todino, Grace
nextResultSetPosition = 0

Now to write/find a MARC parser to parse the data from the YAZ client.
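A first pass at parsing the text the client prints could look like the sketch below. It assumes the input is only the field lines of a single record (no prompts or status lines), that wrapped lines continue the previous field with no separator, and that a subfield always looks like "$x " with a one-character code. The parse_yaz_fields name and the output layout are inventions for this example:

```python
import re

FIELD_START = re.compile(r"^\d{3} ")
# A subfield marker: "$" plus a one-character code, followed by whitespace
SUBFIELD = re.compile(r"(?:^|\s)\$([a-z0-9])\s")

def parse_yaz_fields(lines):
    """Parse yaz-client style field lines into a list of dictionaries."""
    joined = []
    for line in lines:
        line = line.rstrip("\n")
        if FIELD_START.match(line):
            joined.append(line)
        elif joined:
            joined[-1] += line   # wrapped line: glue onto the previous field
    fields = []
    for line in joined:
        tag, rest = line[:3], line[4:]
        if tag < "010":
            # Control fields carry plain data only
            fields.append({"tag": tag, "data": rest})
            continue
        indicators, body = rest[:2], rest[3:]
        parts = SUBFIELD.split(body)
        # split() yields ['', code, value, code, value, ...]
        subfields = list(zip(parts[1::2], parts[2::2]))
        fields.append({"tag": tag, "indicators": indicators,
                       "subfields": subfields})
    return fields
```

Requiring whitespace after the code keeps prices such as "$13.00" inside a value from being mistaken for subfield markers, though it is still a heuristic.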
