Older blog entries for jmg (starting at number 68)

Last night I finished another book, ISBN 0062585274. The MARC record:

005 19950629160347.0
035    $a (FCLAZ)AJC5084FS00003/01/199506/29/199523409Bpam a D0FS
008 940314s1994    cau           000 1 eng
010    $a    94011765
020    $a 0062585274 : $c $13.00 ($18.00 Can.)
035    $a (Source)ONIFS193-     7
035    $a (OCoLC)30075186
040    $a DLC $c DLC
049    $a FDAA ksp
050 00 $a PS3552.E7942 $b R56 1994
082 00 $a 813/.54 $2 20
100 1  $a Besher, Alexander.
245 10 $a Rim : $b a novel of virtual reality / $c Alexander Besher.
260    $a [San Francisco] : $b HarperCollins West, $c c1994.
300    $a x, 357 p. ; $c 18 cm.
650  0 $a Imaginary wars and battles $x Fiction.
650  0 $a Virtual reality $x Fiction.
655  7 $a Science fiction.
655  7 $a War stories.

Hmmm, that's interesting. They classify it as "War stories." There weren't any true battles in this book; it's more of a spy novel than anything. Oh well. This book is really a bit confusing and ends very abruptly. You get to about 20 pages from the end and feel there should be at least another 75 or so, but he finishes the book as if he were on a deadline. Heck, the book ended so quickly that I couldn't really tell you how it ended (except that the good guys won, of course). I will be going back to reread the ending sometime.

RyanMuldoon:
You might want to take a look at the extended attributes work that is ongoing in FreeBSD. I'm not the best person to contact about it, but rwatson will know the most about it (at least I'm pretty sure it's rwatson). They plan on using extended attributes to store info like ACLs, plus possibly MIME-type info. It shouldn't be hard to extend that to HTTP headers to be sent along with the file, too. No more annoying .meta files.

Well, the Python dbwrap project I started a few days ago is coming along nicely; it's pretty functional now. It wraps a DB-API 2.0 compliant driver in a dictionary-like structure. So a call might look like b[(pgdb, ':operations')]['users'][('uid', 'jmg')]['registered'] = 'now', which sets the column registered to now (assuming it's a DateTime field) where uid = 'jmg' in the table users. This makes it a bit easier to use a DB in your app instead of constantly writing your own select/update/insert queries. I probably should reduce the number of select queries I do, but I want to make sure I don't skip anything.
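A minimal sketch of the dictionary-style wrapping described above, assuming the standard library's sqlite3 driver as a stand-in for pgdb (both are DB-API 2.0 drivers). The classes Database, Table and Row are hypothetical, the first index is simplified to an already-open connection instead of a (driver, connection string) tuple, and the '?' placeholders are sqlite3's qmark paramstyle; other drivers use other styles. In SQLite 'now' is stored as a plain string, whereas PostgreSQL would interpret it for a timestamp column.

```python
import sqlite3


class Row:
    """Dict-like view of the row(s) where keycol = keyval; reads use the first match."""

    def __init__(self, conn, table, keycol, keyval):
        self.conn, self.table = conn, table
        self.keycol, self.keyval = keycol, keyval

    def __getitem__(self, column):
        cur = self.conn.cursor()
        # Identifiers can't be bound as parameters, so they are interpolated:
        # only pass trusted table/column names. Values are bound with '?'.
        cur.execute('SELECT %s FROM %s WHERE %s = ?'
                    % (column, self.table, self.keycol), (self.keyval,))
        row = cur.fetchone()
        if row is None:
            raise KeyError((self.keycol, self.keyval))
        return row[0]

    def __setitem__(self, column, value):
        cur = self.conn.cursor()
        cur.execute('UPDATE %s SET %s = ? WHERE %s = ?'
                    % (self.table, column, self.keycol), (value, self.keyval))
        self.conn.commit()


class Table:
    def __init__(self, conn, name):
        self.conn, self.name = conn, name

    def __getitem__(self, key):
        keycol, keyval = key  # e.g. ('uid', 'jmg')
        return Row(self.conn, self.name, keycol, keyval)


class Database:
    def __init__(self, conn):
        self.conn = conn  # an open DB-API 2.0 connection

    def __getitem__(self, table):
        return Table(self.conn, table)


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (uid TEXT, registered TEXT)')
conn.execute("INSERT INTO users VALUES ('jmg', NULL)")
db = Database(conn)
db['users'][('uid', 'jmg')]['registered'] = 'now'
print(db['users'][('uid', 'jmg')]['registered'])  # now
```

Every read issues its own SELECT, which is simple and never stale but chatty; caching rows would cut queries at the cost of possibly missing concurrent updates.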

I am thinking about making the getdict function return an immutable dictionary (one that raises an error on __setitem__) so nobody assumes that setting items in it will update the database. Now that I think about it, why do I even need to support the getdict function? :) I wish I could release a binary-only copy (yeah, I know about py[co] files) so people can't see how bad the code looks. I've also only tested with pgdb (PyGreSQL), which interfaces to PostgreSQL v6.5, so I might have issues with other DB APIs. I also don't test which parameter passing style the API supports, but that's because the parameter passing pgdb supports is kinda broken (or maybe it's because it's designed more for Python v2.0).
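For the read-only getdict idea, the standard library already has a mapping that refuses item assignment: types.MappingProxyType (Python 3.3+). A sketch, where getdict's signature is hypothetical and it is assumed to receive an already-fetched row as a plain dict:

```python
from types import MappingProxyType


def getdict(row):
    """Hypothetical getdict: return a read-only snapshot of one row.

    `row` is assumed to be a plain dict of column -> value.
    """
    # Copy first so later changes to `row` don't show through the proxy.
    return MappingProxyType(dict(row))


snapshot = getdict({'uid': 'jmg', 'registered': 'now'})
print(snapshot['uid'])  # jmg
try:
    snapshot['registered'] = 'never'  # assignment raises TypeError
except TypeError as err:
    print('read-only:', err)
```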

Oh well, enough for now. How annoying, there isn't a project tag.

Religion: Well, I'm not an atheist, but atheists should have an issue with Advogato. There is no way to say someone doesn't exist. The default certification of an observer means that they exist, therefore everyone on Advogato believes that God and Satan exist because everyone at least has them certified as observer. Oh well. :)

mwh:
Thanks for the email, but the docs on PyTypeObject declarations definitely need to be expanded. Also, listobject.c doesn't show where/how/when PyList_New gets called. Right now I'm assuming that I create a module with a routine that does the instantiation of the type, which would make sense, but as you have found out, the docs in this area (at least the official docs) are nonexistent. Glad that you are going to work on the docs, and if you'd like a newbie to review them, send 'em my way! :)

I still need to decide what I want to use to develop a dynamic web site I want to create. I'm debating whether to go with fly (I happen to know the people who developed it, and I hold the MIME type for them) or create my own in Python. I probably should go and see about installing Postgres (oh, that's right, I already have it installed, I'm just not doing anything with it) and play around with it. fly should be really fast, as it was written in lex/yacc/C. The fun of DB work.

mwh:
In my experience, there is nothing more annoying than a bug report that says something is broken (when it's obvious, especially in documentation) without any help. When the documentation is in as much disarray as Defining New Object Types, there really isn't that much I can do about it. Either a bug report should already be there, or the group that is working on the documentation should already know about it. It has improved, but it still isn't usable unless you already know what's going on. As for reading the source code, that's what I do. But there's always a point where the need or want to do a project isn't great enough to justify reading through all the source code looking for good examples of what you want to do.

Also, if PyClass_* is internal, why is it publicly defined? And if it's private, then how are you supposed to create an object to pass back for maintaining a connection, such as the state for a Z39.50 connection, without using a class? Defining a new type seems a bit extreme, and defining a whole module wouldn't work, as you don't instantiate a module for each connection. So how else am I supposed to do that without using PyClass_*? Again, it really sounds like I should just do a blind wrapper around the functions and do all the glue code in Python. It'll be easier, and I won't have to worry about as many coding bugs.
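The "blind wrapper plus Python glue" approach can be sketched like this. Everything here is hypothetical: _lowlevel stands in for a thin C extension exposing flat functions that pass around an opaque integer handle, and Connection is an illustrative Python class that owns the per-connection state, so no PyClass_* calls or new C type are needed. The host name is a placeholder and nothing touches the network.

```python
import itertools


class _lowlevel:
    """Stand-in for a hypothetical thin C module: flat calls, opaque handles."""
    _ids = itertools.count(1)
    _open = {}

    @staticmethod
    def connect(host, port):
        handle = next(_lowlevel._ids)
        _lowlevel._open[handle] = (host, port)
        return handle

    @staticmethod
    def search(handle, query):
        if handle not in _lowlevel._open:
            raise ValueError('connection is closed')
        return 0  # a real module would return the target's hit count

    @staticmethod
    def close(handle):
        _lowlevel._open.pop(handle, None)


class Connection:
    """Python-side glue that owns the per-connection state."""

    def __init__(self, host, port=210):  # 210 is the usual Z39.50 port
        self.host, self.port = host, port
        self.handle = _lowlevel.connect(host, port)

    def search(self, query):
        return _lowlevel.search(self.handle, query)

    def close(self):
        if self.handle is not None:
            _lowlevel.close(self.handle)
            self.handle = None

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()


with Connection('z3950.example.org') as conn:
    print(conn.search('@attr 1=7 0937175935'))
```

The C side then only has to convert arguments and return values; lifetime, defaults and cleanup live in ordinary Python.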

As for bitching to Advogato, don't forget that this is a diary entry. It's whatever happens to be on my mind when I'm writing, and sometimes that's frustration.

29 Jan 2001 (updated 29 Jan 2001 at 07:04 UTC)

Well, I just started trying to write a Z39.50 Python module and forgot how poor some of Python's documentation is. It doesn't document any of the PyClass calls at all, and the documentation for 2.0 is exactly the same in parts. I guess no one felt it worth the time to spend even five minutes improving the Python/C API documentation. Now I remember why it's easier to write most of the code in Python and only keep a small C kernel as an interface to the calls, instead of wasting your time making the module perform decently.

Now, do I spend the time to make this module or not?

Well, I finally finished reading Dante Alighieri's Inferno (ISBN: 0691018960) today. Personally, I preferred Larry Niven and Jerry Pournelle's version of it (ISBN: 0671670557). It contains pretty much everything in Dante's version without mangling it too much, and of course it's a tad easier to read. I would like to be able to read the original Italian text, but I don't think I'll be learning Italian anytime soon to do it.

Started reading Rim by Alexander Besher (ISBN: 0062585274), and so far it looks like a good book. It's kinda hard to tell whether you're in a virtual world or the real one. They also throw out strange terms with no way of knowing what exactly they mean. I hope that gets explained soon, but so far I'm less than impressed.

Guess I should head off to the local bar and grab a beer. It really does feel about that time.

Bloody hell, repeat after me: read the recent diary entries before posting your entry.

voltron:
Do they happen to webcast their station? Sounds not too different from KWVA, the University of Oregon's college radio station.

AlanShutko:
Yeah, I was thinking that it wouldn't be too hard to parse the web page that they have at LOC, and if I had to input them, that's what I'd do. I haven't taken a look at the DTDs that are supposed to be used now for the XML version of MARC, and I would think they have most of that defined in the DTDs, but I'm not really that familiar with XML.

technik: Check out PicoBSD, which is based on FreeBSD. I know they have done various work on making a small bootable system. Not sure if they've relaxed the requirements now that you can put large images on CDs, but I do remember them talking about doing something similar.

Hmmm, am I the only one that is noticing that Advogato has gotten really slow all of a sudden? Both from home and work it takes a good 30 seconds to bring up a page, oh well.

mwh:
Thanks for the hint; I didn't know that there was a performance difference. Guess I'll have to switch to has_key then. I've been meaning to try out Python 2 one of these days, but just haven't gotten around to installing it. As for try/except versus has_key, that is the problem with languages today: even the high-level ones need you to do special stuff to tweak out the last inch of performance. (Of course, if you're after performance, why the hell are you using a high-level language? :) )
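A small timing sketch for comparing the lookup styles yourself. It assumes Python 3, where dict.has_key no longer exists and `key in d` is its replacement; dict.get is added as a third option. The usual rule of thumb is that try/except is cheap when the key is almost always present and relatively expensive when misses are common, but actual numbers depend on the interpreter version and machine, so none are claimed here.

```python
import timeit

d = {'uid': 'jmg'}


def lookup_in(key):
    # Python 3 spelling of the old d.has_key(key) test
    if key in d:
        return d[key]
    return None


def lookup_try(key):
    # EAFP style: index directly and handle the miss
    try:
        return d[key]
    except KeyError:
        return None


def lookup_get(key):
    # dict.get folds the test and the lookup into one call
    return d.get(key)


for key in ('uid', 'missing'):
    for fn in (lookup_in, lookup_try, lookup_get):
        seconds = timeit.timeit(lambda: fn(key), number=100_000)
        print('%-10s key=%-9r %.4fs' % (fn.__name__, key, seconds))
```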

AlanShutko:
I was thinking about printing out that information too, but after I read how many different fields and subfields there are, I decided I'd leave that to the end application. Really, all I'd be doing is writing a fancy name-mapping program, and I'd rather not do that. As for my MARC converter, it can also write back out the Python MARC records that you read in. It just needs to handle the data in the leader a bit more cleanly (like actually storing it) and it'll be somewhat useful. Now to update the books-read page on my home page to use this instead of the crappy scripts I'm using now, which fetch data from Borders.

Read the Tabs vs. Spaces article that ariya posted. He is right about point 1 being the core of the argument, but the problem is that if you use spaces instead of tabs, then it becomes hard for others to read your code. I personally use 8-space tabs because that is what the FreeBSD style(9) guidelines say. It may sound strange to adopt a project's style guide for your own code, but if we could all agree on a single style, everyone would have fewer issues with this argument. Or you could always switch to Python, which forces a style on you.

I personally agree that 8-space tab stops are good. If you ever get so deeply nested that you can't fit a statement on one or two lines, then you need to split that piece of code into more functions. The general rule that if you tab in more than three or four times from your base function you need to rethink the function is a good one. If you write with 2-space tab stops, it's easy to write functions with about 20 nested loops in them (which only puts you halfway across the screen) without even thinking about it. With 8-space tab stops, you'll have issues going beyond 6 nested loops.

I wrote a MARC binary-to-ASCII conversion program last night, but I won't release it until I split it into functions, because the one big function goes a whole four indents in from the base of the function. For me this is too much, and writing more, smaller functions makes the code easier to read.
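The "three or four tabs from the base of the function" rule is easy to check mechanically. A hypothetical sketch (too_deep is not an existing tool): it expands tabs with Python's str.expandtabs and counts 8-column levels from the left margin. A C function body already sits at level 1, so the default max_depth=4 flags anything nested four or more levels inside the function body.

```python
def too_deep(source, max_depth=4, tabstop=8):
    """List (line number, depth) for lines indented more than max_depth levels.

    Depth is the width of the leading whitespace after expanding tabs to
    `tabstop` columns, divided by `tabstop`, measured from the left margin.
    """
    report = []
    for lineno, line in enumerate(source.splitlines(), 1):
        if not line.strip():
            continue  # blank lines say nothing about nesting
        expanded = line.expandtabs(tabstop)
        indent = len(expanded) - len(expanded.lstrip())
        depth = indent // tabstop
        if depth > max_depth:
            report.append((lineno, depth))
    return report


# A style(9)-ish C function: the body starts one tab in, then the loops nest.
code = ("int\nf(void)\n{\n"
        "\tfor (;;)\n"
        "\t\tfor (;;)\n"
        "\t\t\tfor (;;)\n"
        "\t\t\t\tif (x)\n"
        "\t\t\t\t\tbreak;\n"
        "}\n")
print(too_deep(code))  # [(8, 5)]
```

With 8-column tabs, level 5 already starts at column 40, half of an 80-column terminal.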

Oh well, just ranting a bit about coding style.

Hmmm, should I rant about the whole binary vs. XML for machine exchange thing? The reason systems are getting so bloody slow is that people decided to trade a format that is faster for machines to read for one that is easier for humans to parse. If programmers continue to choose solutions like these, we will continue to need faster computers, but it doesn't have to be that way.

I was impressed with how easy the MARC format was to parse, without wasting extra space and without dealing with endianness. To avoid endianness issues, they simply encoded the numbers in base-10 ASCII. Of course, with Python it was very easy to parse the "binary" MARC format into a list of dictionaries.
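The base-10 ASCII layout can be sketched as a round trip. This is a minimal sketch of the ISO 2709 structure that MARC 21 uses (24-byte leader, 12-byte directory entries of tag, 4-digit length and 5-digit offset, and the 0x1E/0x1F/0x1D delimiters), not the author's converter; build_marc, parse_marc and the dict shapes are hypothetical. It assumes UTF-8 data (leader position 09 = 'a'), whereas many records of this era use MARC-8, which needs a real character-set converter.

```python
# ISO 2709 / MARC 21 delimiters
FT = b'\x1e'  # field terminator
RT = b'\x1d'  # record terminator
SF = b'\x1f'  # subfield delimiter


def build_marc(fields):
    """Serialize field dicts into one MARC record (bytes)."""
    directory, body = b'', b''
    for f in fields:
        if 'data' in f:  # control field (001-009): no indicators or subfields
            raw = f['data'].encode('utf-8')
        else:
            raw = f['ind'].encode('utf-8') + b''.join(
                SF + code.encode('utf-8') + value.encode('utf-8')
                for code, value in f['subfields'])
        raw += FT
        # Directory entry: tag, 4-digit length, 5-digit offset, all base-10 ASCII.
        directory += f['tag'].encode('ascii') + b'%04d%05d' % (len(raw), len(body))
        body += raw
    directory += FT
    base = 24 + len(directory)    # data starts right after leader + directory
    total = base + len(body) + 1  # +1 for the record terminator
    leader = b'%05dnam a22%05d   4500' % (total, base)
    return leader + directory + body + RT


def parse_marc(record):
    """Parse one MARC record (bytes) into (leader, list of field dicts)."""
    leader = record[:24]
    record = record[:int(leader[0:5])]  # record length, leader 00-04
    base = int(leader[12:17])           # base address of data, leader 12-16
    directory = record[24:base - 1]     # drop the directory's field terminator
    fields = []
    for i in range(0, len(directory), 12):
        tag = directory[i:i + 3].decode('ascii')
        length = int(directory[i + 3:i + 7])
        start = int(directory[i + 7:i + 12])
        raw = record[base + start:base + start + length - 1]  # drop FT
        if tag < '010':
            fields.append({'tag': tag, 'data': raw.decode('utf-8', 'replace')})
        else:
            subfields = [(s[:1].decode('ascii'), s[1:].decode('utf-8', 'replace'))
                         for s in raw[2:].split(SF)[1:]]
            fields.append({'tag': tag, 'ind': raw[:2].decode('utf-8', 'replace'),
                           'subfields': subfields})
    return leader.decode('ascii'), fields


fields = [
    {'tag': '001', 'data': '7612099'},
    {'tag': '020', 'ind': '  ', 'subfields': [('a', '0937175935')]},
    {'tag': '245', 'ind': '10', 'subfields': [
        ('a', 'Managing UUCP and Usenet /'),
        ('c', "Tim O'Reilly and Grace Todino.")]},
]
rec = build_marc(fields)
leader, parsed = parse_marc(rec)
print(leader)
print(parsed == fields)  # True
```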

Now for a bit about Python. I always forget to use try/except instead of if statements when it's more appropriate. One example is adding a data element to a dictionary when you may have duplicate tags. There are a couple of ways to deal with this: convert the value to a list once you get more than one, or simply start out using lists for all your data elements (which is probably what I should do). An example of the first is:

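try:
	rec[tag].append(data)  # already a list: add to it
except KeyError:
	rec[tag] = data  # first occurrence: store the bare value
except AttributeError:
	rec[tag] = [rec[tag], data]  # second occurrence: promote to a list

The second one would be like:

try:
	rec[tag].append(data)  # tag seen before: add to its list
except KeyError:
	rec[tag] = [data]  # first occurrence: start a new list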
Now the latter one makes more sense in some ways, since you don't have to find out whether it's a list or not and handle the two cases differently, but it also means a bit of extra work when multiple tags are the exception rather than the rule.
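The always-a-list approach can also be written without any try/except. This swaps in two standard-library idioms, dict.setdefault and collections.defaultdict, rather than the try/except code; the tag/data pairs are taken from the UC catalog record in this diary.

```python
from collections import defaultdict

tags = [('020', '0937175935'),
        ('650', 'Electronic mail systems'),
        ('740', 'Directory of electronic mail addressing and networks.'),
        ('740', 'UNIX Communications.')]

# Option A: setdefault creates the list the first time a tag is seen.
rec = {}
for tag, data in tags:
    rec.setdefault(tag, []).append(data)

# Option B: defaultdict(list) does the same on first access to a missing key.
rec2 = defaultdict(list)
for tag, data in tags:
    rec2[tag].append(data)

print(rec['740'])
print(dict(rec2) == rec)  # True
```

setdefault builds a throwaway empty list on every call even when the tag already exists; defaultdict avoids that, but a lookup of a missing key silently inserts an empty list.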

Oh well, enough musing. Hopefully the 45 GB IBM 75GXP drive I ordered will be waiting for me today when I get home. I was also lucky to get a couple of 128MB PC133 DIMMs for only about $40 each. They were generic stock, but with CAS2 timing. What luck! Of course, I'm only using them in PC100-capable hardware, but I'm debating about ordering a couple more.

Boy, that was fun. I just wrote ean2isbn to convert EAN bar codes to the corresponding ISBNs. You'll need Python installed to run it. It verifies that the EAN check digit is correct and generates the ISBN check digit for you.
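A minimal sketch of what an ean2isbn-style converter has to do (not the author's script). A "Bookland" EAN-13 is the prefix 978, the nine ISBN body digits, and an EAN check digit computed with alternating weights 1 and 3; the ISBN-10 check digit is then recomputed with weights 10 down to 2, modulo 11, with 10 written as X. It assumes a 978 prefix and no 5-digit price add-on.

```python
def ean_check_digit(digits12):
    """EAN-13 check digit: weights alternate 1, 3, 1, 3, ... from the left."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits12))
    return (10 - total % 10) % 10


def isbn10_check_digit(digits9):
    """ISBN-10 check digit: weights 10 down to 2, modulo 11; 10 is written 'X'."""
    total = sum(int(d) * w for d, w in zip(digits9, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return 'X' if check == 10 else str(check)


def ean2isbn(ean):
    """Convert a 13-digit Bookland EAN (978 prefix) to an ISBN-10."""
    ean = ean.strip()
    if len(ean) != 13 or not ean.isdigit():
        raise ValueError('EAN must be 13 digits: %r' % ean)
    if not ean.startswith('978'):
        raise ValueError('not a 978 Bookland EAN: %r' % ean)
    if ean_check_digit(ean[:12]) != int(ean[12]):
        raise ValueError('bad EAN check digit: %r' % ean)
    core = ean[3:12]  # the nine ISBN digits between prefix and check digit
    return core + isbn10_check_digit(core)


print(ean2isbn('9780062585271'))  # 0062585274
```

The same isbn10_check_digit helper shows that the nine digits 093717593 call for a check digit of 5, so 0937175935 is well formed and 0937175930 is not.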

It's kind of interesting: the University of California lists two different books for the same ISBN, even though the check digit on one of them isn't correct. It sounds like someone entered the data incorrectly:

hydrogen,ttyp6,/tmp/yaz-1.6,534$client/yaz-client tcp:128.48.141.7:210
Connecting...Ok.
Sent initrequest.
Connection accepted by target.
ID     : Z39.50
Name   : CDL Z39.50 SERVER
Version: 1.0
Z> base cat
Z> find @attr 1=7 0937175935
Sent searchRequest.
Received SearchResponse.
Search was a success.
Number of hits: 2, setno 1
records returned: 0
Z> show +2
Sent presentRequest (1+2).
Records: 2
[Catalog]Record type: USmarc
001 5856206
003 CU-UC
005 20010124215609.3
008 890905s1989    caub     r    00010 eng d
020    $z 0937175930
035    $9 5856206
040    $a SSL $c SSL $d CAS $d OCL $d CUS
100 10 $a Frey, Donnalyn
245 14 $a !%@:, a directory of electronic mail addressing and networks / $c Donnalyn Frey and Rick Adams.
250    $a 1st ed.
260 0  $a Sebastopol, CA : $b O'Reilly & Associates, $c c1989.
300    $a xv, 284 p. : $b maps ; $c 23 cm.
440  0 $a Nutshell handbook
500    $a At head of title on cover: UNIX Communications.
504    $a Includes bibliographical references and indexes.
546    $a English
650  0 $a Electronic mail systems $x Directories
700 10 $a Adams, Rick
740 01 $a Directory of electronic mail addressing and networks.
740 01 $a UNIX Communications.
[Catalog]Record type: USmarc
001 7612099
003 CU-UC
005 20010124215609.6
008 930310s1992    caua          001 0 eng
010    $a    93124509 //r96
020    $a 0937175935
035    $9 7612099
040    $a DLC $c DLC $d DLC
050 00 $a QA76.76.O63 $b O73 1992
082 00 $a 005.7/13 $2 20
100 1  $a O'Reilly, Tim
245 10 $a Managing UUCP and Usenet / $c Tim O'Reilly and Grace Todino.
250    $a 10th ed.
260    $a Sebastopol, CA : $b O'Reilly and Associates, $c 1992.
300    $a xxiii, 342 p. : $b ill. ; $c 23 cm.
440  2 $a A Nutshell book
500    $a Includes index.
546    $a English
630 00 $a UUCP
630 00 $a UNIX (Computer file)
650  0 $a Usenet (Computer network)
700 10 $a Todino, Grace
nextResultSetPosition = 0
Z>

Now to write/find a MARC parser to parse the data from the YAZ client.
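A minimal sketch of parsing the YAZ client's line-oriented record display (tag, two indicator characters, then `$x value` subfields) into field dictionaries. The names are hypothetical, and it assumes that the lines of a single record are passed in (no `Z>` prompts or `[Catalog]` headers), that a subfield marker is `$`, one letter or digit, and a space (so a price like `$13.00` is not split), and that a line not starting with a three-digit tag is a hard terminal wrap of the previous line.

```python
import re

TAG_LINE = re.compile(r'^(\d{3}) (.*)$')
# "$a " style markers; requiring a space after the code keeps "$13.00" intact.
SUBFIELD = re.compile(r'(?:^| )\$([0-9a-z]) ')


def parse_line_format(text):
    """Parse one record in YAZ-style line format into a list of field dicts."""
    lines = []
    for line in text.splitlines():
        if TAG_LINE.match(line):
            lines.append(line)
        elif lines and line.strip():
            lines[-1] += line  # assumption: a hard wrap at the terminal width
    fields = []
    for line in lines:
        tag, rest = TAG_LINE.match(line).groups()
        if tag < '010':  # control fields carry plain data
            fields.append({'tag': tag, 'data': rest})
            continue
        ind, body = rest[:2], rest[3:]  # two indicators, a space, then subfields
        parts = SUBFIELD.split(body)    # ['', code, value, code, value, ...]
        subfields = list(zip(parts[1::2], (p.strip() for p in parts[2::2])))
        fields.append({'tag': tag, 'ind': ind, 'subfields': subfields})
    return fields


sample = """001 7612099
020    $a 0937175935
245 10 $a Managing UUCP and Usenet / $c Tim O'Reilly and Grace Todi
no.
"""
for field in parse_line_format(sample):
    print(field)
```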

Arg!!! Ctrl-W is a bad idea under Netscape for Windows. It closes the bloody window!! Now to remember what I wrote.

Christmas was good; it was nice to see my parents again, and my sister, though I see her pretty often since she's just up in SF. New Year's was nice. We had dinner at my sister's place and then watched the SF fireworks from a park only a few blocks away.

I finally got the PS/2-to-AT and AT-to-PS/2 converters to hook up a Cue Cat that a guy I know gave me (I have an AT keyboard and the Cue Cat uses PS/2 connectors :( ). I decided to open it up and do the mod so that it spews the raw ASCII data without the serial number or the base64 "encryption" (I probably should take a picture of it). It's much more useful.

I have keyboard extension cables that let me reach from my living room into my walk-in closet (where I keep all my books), so I scanned them all. Some books are annoying and don't have the ISBN as a bar code, and I have a few books so old that they don't even have an ISBN!

I downloaded the YAZ software and have learned that the Library of Congress's book selection is sorely lacking. I have found that the University of California catalog has most of the books I own, both fiction and non-fiction. Now to write a script to convert from MARC format to a web page or something so I can automate pulling the data out.
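A sketch of the MARC-to-web step, assuming records have already been parsed into field dictionaries (tag, indicators, and a list of (code, value) subfield pairs). first_subfield and to_html are hypothetical helpers, and using 245 $a, 100 $a and 020 $a while trimming the trailing cataloging punctuation is just one reasonable choice for a books-read list; the sample record is the Rim record from this diary.

```python
from html import escape


def first_subfield(fields, tag, code):
    """Return the first $code of the first field with this tag, or ''."""
    for f in fields:
        if f['tag'] == tag:
            for c, value in f.get('subfields', []):
                if c == code:
                    return value
    return ''


def to_html(fields):
    """Render one record (a list of field dicts) as an HTML list item."""
    title = first_subfield(fields, '245', 'a').rstrip(' /:;')
    author = first_subfield(fields, '100', 'a').rstrip(' ,.')
    isbn = first_subfield(fields, '020', 'a').split(' ')[0]
    return '<li><cite>%s</cite> by %s (ISBN %s)</li>' % (
        escape(title), escape(author), escape(isbn))


record = [
    {'tag': '020', 'ind': '  ',
     'subfields': [('a', '0062585274 :'), ('c', '$13.00 ($18.00 Can.)')]},
    {'tag': '100', 'ind': '1 ', 'subfields': [('a', 'Besher, Alexander.')]},
    {'tag': '245', 'ind': '10',
     'subfields': [('a', 'Rim :'), ('b', 'a novel of virtual reality /'),
                   ('c', 'Alexander Besher.')]},
]
print(to_html(record))
# <li><cite>Rim</cite> by Besher, Alexander (ISBN 0062585274)</li>
```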

At least it's working now! :)

Quote of the day (from Earth by David Brin):

Heck, picture if aliens ever landed in California. Instead of running away or even inquiring about the secrets of the universe, Californians would probably ask the BEMs if they had any new cuisine!

Computers:
I hate them! There are too many incompetent designers out there who can't get things to work properly. I spent the four-day weekend upgrading my main server and it still isn't done. I'm having major issues with the Tekram DC-390F (UW, 53c875) SCSI controller. I originally thought I'd have more issues with the DC-390T (AMD-based), but it turns out that the Tekram driver for that card works well. Of course, having a hard disk that likes to have read/write issues doesn't make things any easier to diagnose. I guess I'll end up sticking my old Micropolis drive back in. I hate to lose the UltraWide-ness of the drive. We'll see.

DNA Lounge looks interesting. If I remember, maybe I should check it out when I head up to SF some time. The write-up on webcasting is interesting and helpful. Once I get things back up and running, the webcast of my 200-disc changer will come back online, and I have the number of users set so low that I don't think it's an issue. It's like having a couple of people drop by. Do I need to pay for the right to let my friends listen to my music? No, so I'm not going to pay. If I actually distributed to more people, then I'd look into it. Anyway, screw the music companies. They have no idea what they're doing.

jLoki:
I'm using one of the new Ricochet modems with 3.4-R, but only through the serial interface. Sure, you don't get the faster data transfer rates, but it works quite reliably (assuming you don't go through WorldCom). I get 11KB/sec, which is kinda slow, but it works. I was going to look at backporting the umodem driver from -current and getting it up and running on my 3.4-R, but I haven't looked at that yet.
