Older blog entries for jmg (starting at number 66)

mwh:
From my experience, there is nothing more annoying than a bug report that says something is broken (when it's obvious, especially in documentation) without any help. When the documentation is in as much disarray as Defining New Object Types, there really isn't much I can do about it. Either a bug report should already be there, or the group working on the documentation should already know about it. It has improved, but it still isn't usable unless you already know what's going on. As for reading the source code, that's what I do. But there's always a point where the need or want to do a project isn't great enough to justify reading through all the source code looking for good examples of what you want to do.

Also, if PyClass_* is internal, why is it publicly defined? And if it's private, then how are you supposed to create an object to pass back for maintaining a connection, such as the state for a Z39.50 connection, without using a class? Defining a new type seems a bit extreme, and defining a whole module wouldn't work, as you don't instantiate a module for each connection. So how else am I supposed to do that without using PyClass_*? Again, it really sounds like I should just write a blind wrapper around the functions and do all the glue code in Python. It'll be easier, and I won't have to worry about as many coding bugs.

As for bitching on Advogato, don't forget that this is a diary entry. It's whatever happens to be on my mind when I'm writing, and it's been frustrating for me.

29 Jan 2001 (updated 29 Jan 2001 at 07:04 UTC)

Well, I just started trying to write a Z39.50 Python module and forgot how poor some of Python's documentation is. It doesn't document any of the PyClass calls at all, and the documentation for 2.0 is exactly the same in parts. I guess no one felt it worth spending even five minutes improving the Python/C API documentation. Now I remember why it's easier to write most of the code in Python and keep only a small C kernel as an interface to the calls, instead of wasting your time making the module perform decently.

Now, do I spend the time to make this module or not?

Well, I finally finished reading Dante Alighieri's Inferno (ISBN: 0691018960) today. Personally, I preferred Larry Niven and Jerry Pournelle's version of it (ISBN: 0671670557). It contains pretty much all that is in Dante's version without destroying it too much, and of course it's a tad easier to read. I would like to be able to read the original Italian text, but I don't think I'll be learning Italian anytime soon to do it.

Started reading Rim by Alexander Besher (ISBN: 0062585274), and so far it looks like a good book. It's kind of hard to tell whether you're in a virtual world or the real one. They also throw out strange terms that you have no way of understanding. I hope they get explained soon, but so far I'm less than impressed.

Guess I should head off to the local bar and grab a beer. It really does feel about that time.

Bloody hell, repeat after me: read the recent diary entries before posting your entry.

voltron:
Do they happen to webcast their station? Sounds not too different from KWVA, which is the University of Oregon's college radio station.

AlanShutko:
Yeah, I was thinking that it wouldn't be too hard to parse the web page that they have at LOC, and if I had to input them, that's what I'd do. I haven't looked at the DTDs that are now supposed to be used for the XML version of MARC, and I would think they have most of that defined in the DTDs, but I'm not really that familiar with XML.

technik: Check out PicoBSD, based on FreeBSD. I know that they have done various work on making a small system bootable. I'm not sure if they've relaxed the requirements now that you can do large images on CDs, but I do remember them talking about doing something similar.

Hmmm, am I the only one that is noticing that Advogato has gotten really slow all of a sudden? Both from home and work it takes a good 30 seconds to bring up a page, oh well.

mwh:
Thanks for the hint; I didn't know that there was a performance difference. Guess I'll have to switch to has_key then. I've been meaning to try out Python 2 one of these days, but just haven't gotten around to installing it. As for try/except versus has_key, that is the problem with languages today: even the high-level ones need you to do special stuff to tweak out the last inch of performance. (Of course, if you're after performance, why the hell are you using a high-level language? :) )
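A rough sketch for measuring the difference yourself. `has_key` was removed in Python 3, where `tag in rec` is the equivalent test. Which version wins typically depends on how often the key is missing: a successful `try` costs very little, while raising and catching `KeyError` is usually more expensive than a membership test. No timings are claimed here; the helper names `add_with_in`, `add_with_try` and `fill` are illustrative.

```python
import timeit


def add_with_in(rec, tag, data):
    # Look before you leap: one membership test on every call.
    if tag in rec:                  # Python 3 spelling of rec.has_key(tag)
        rec[tag].append(data)
    else:
        rec[tag] = [data]


def add_with_try(rec, tag, data):
    # Ask forgiveness: cheap when the key exists, an exception when it doesn't.
    try:
        rec[tag].append(data)
    except KeyError:
        rec[tag] = [data]


def fill(add, tags):
    rec = {}
    for tag in tags:
        add(rec, tag, "x")
    return rec


if __name__ == "__main__":
    all_new = [str(i) for i in range(1000)]   # KeyError path every time
    repeated = ["245"] * 1000                 # append path almost every time
    for label, tags in (("new keys", all_new), ("repeated", repeated)):
        for add in (add_with_in, add_with_try):
            seconds = timeit.timeit(lambda: fill(add, tags), number=100)
            print("%-9s %-13s %.4fs" % (label, add.__name__, seconds))
```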

AlanShutko:
I was thinking about printing out that information too, but after I read how many different fields and subfields there are, I decided I'd leave that to the end application. Really all I'd be doing would be writing a fancy name-mapping program, and I'd rather not do that. As for my MARC converter, I also have it able to write out the Python MARC records that you read in. I just need to handle the data in the leader a bit more cleanly (like actually storing it) and it'll be somewhat useful. Now to update the books-read page on my home page to use this instead of the crappy scripts I'm using now, which fetch data from Borders.

Read the Tabs vs. Spaces article that ariya posted. He is right about point 1 being the core of the argument, but the problem is that if you use spaces instead of tabs, then it becomes hard for others to read your code. I personally use 8-space tabs because that is what the FreeBSD style(9) guidelines say. It may sound strange to adopt a project's style guide for your own code, but if we could all agree on a single style, then everyone would have fewer issues with this argument. Or you could always switch to Python, which forces a style on you.

I personally agree that 8-space tab stops are good. If you ever get so deeply nested that you can't fit your code on one or two lines, then you need to split that piece of code into more functions. The general rule that if you indent more than three or four times from the base of your function, then you need to rethink the function, is a good one. If you write with 2-space tab stops, then it's easy to write functions with about 20 nested loops (that only puts you halfway across an 80-column screen) without even thinking about it. With 8-space tab stops, you'll run into trouble going beyond 6 nested loops.

I wrote a MARC binary-to-ASCII conversion program last night, but I won't release it until I split it into functions, because the one big function goes a whole four indents in from the base of the function. For me this is too much, and writing more, smaller functions makes the code easier to read.

Oh well, just ranting a bit about coding style.

Hmmm, should I rant about the whole binary vs. XML for machine exchange? The reason systems are getting so bloody slow is that people decided to trade a format that's faster to read for one that's easier for humans to parse. If programmers keep choosing solutions like these, we will keep needing faster computers, but it doesn't have to be that way.

I was impressed with how easy the MARC format was to parse, without wasting extra space and without dealing with endianness. To avoid endianness issues, they simply encoded the numbers in base-10 ASCII. Of course, with Python it was almost too easy to parse the "binary" MARC format into a list of dictionaries.
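A minimal sketch of why that layout is easy to parse, assuming a MARC 21 (ISO 2709) record with ASCII-only data held in a `str`; real records are better handled as bytes, since the lengths count bytes. Characters 12 to 16 of the 24-character leader give the base address of the data, and each 12-character directory entry is a 3-character tag, a 4-digit field length and a 5-digit starting offset, all in ASCII decimal. `parse_marc` and `build_marc` are illustrative names, not an existing library; `build_marc` only exists to make test input.

```python
FIELD_TERMINATOR = "\x1e"
SUBFIELD_DELIMITER = "\x1f"
RECORD_TERMINATOR = "\x1d"


def parse_marc(raw):
    """Parse one MARC record into a dict of tag -> list of fields."""
    leader = raw[:24]
    base = int(leader[12:17])            # base address of data, ASCII decimal
    directory = raw[24:base - 1]         # the character before base is a terminator
    rec = {"leader": leader}
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[:3]
        length = int(entry[3:7])         # includes the field terminator
        start = int(entry[7:12])         # offset relative to the base address
        field = raw[base + start:base + start + length].rstrip(FIELD_TERMINATOR)
        if not tag.startswith("00"):     # data fields: indicators + subfields
            indicators, *subfields = field.split(SUBFIELD_DELIMITER)
            field = (indicators, [(s[0], s[1:]) for s in subfields if s])
        rec.setdefault(tag, []).append(field)
    return rec


def build_marc(fields):
    """Assemble a record from (tag, data) pairs, just to have test input."""
    directory, data = "", ""
    for tag, value in fields:
        chunk = value + FIELD_TERMINATOR
        directory += "%s%04d%05d" % (tag, len(chunk), len(data))
        data += chunk
    directory += FIELD_TERMINATOR
    base = 24 + len(directory)
    total = base + len(data) + 1         # +1 for the record terminator
    leader = "%05dnam  22%05d   4500" % (total, base)
    return leader + directory + data + RECORD_TERMINATOR
```

Control fields (tags 00X) stay as plain strings, while data fields become an `(indicators, [(code, value), ...])` pair; splitting a file on the record terminator and mapping `parse_marc` over the pieces gives the list of dictionaries.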

Now for a bit about Python. I always forget to use try/except instead of if statements when it's more appropriate. One example is adding a data element to a dictionary when you may have duplicate tags. There are a few ways to deal with this: you can convert the value to a list once you get more than one, or simply start out using lists for your data elements (which is probably what I should do). An example of the first is:

try:
	rec[tag].append(data)
except KeyError:
	rec[tag] = data
except AttributeError:
	rec[tag] = [rec[tag], data]

The second one would be like:
try:
	rec[tag].append(data)
except KeyError:
	rec[tag] = [data]

Now the latter one in some ways makes more sense, as then you don't have to find out whether it's a list or not and handle the two cases differently, but it also means a bit of extra work when multiple tags are the exception rather than the rule.
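For the always-a-list approach, the standard library also offers two shortcuts that avoid both the if test and the try/except: `dict.setdefault` and `collections.defaultdict`. A small sketch; `group_fields` and `group_fields_defaultdict` are hypothetical helpers, and the sample pairs are taken from the repeated 630 fields in the UC catalog record in this diary.

```python
from collections import defaultdict


def group_fields(pairs):
    """Collect (tag, data) pairs into tag -> list, using dict.setdefault."""
    rec = {}
    for tag, data in pairs:
        # setdefault inserts [] only when the tag is missing, then returns the list
        rec.setdefault(tag, []).append(data)
    return rec


def group_fields_defaultdict(pairs):
    """Same result with collections.defaultdict, which creates missing lists."""
    rec = defaultdict(list)
    for tag, data in pairs:
        rec[tag].append(data)
    return dict(rec)


pairs = [("020", "0937175935"), ("630", "UUCP"), ("630", "UNIX (Computer file)")]
print(group_fields(pairs))
```

One design note: `setdefault` builds a throwaway empty list on every call, even when the tag already exists, so `defaultdict` is usually the tidier choice inside a loop.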

Oh well, enough musing. Hopefully the 45GB IBM 75GXP drive I ordered will be waiting for me when I get home today. I was also lucky to get a couple of 128MB PC133 DIMMs for only about $40 each. They were generic stock, but had CAS2 timing. What luck! Of course, I only happen to be using them in PC100-capable hardware, but I'm debating ordering a couple more.

Boy, that was fun. I just wrote ean2isbn to convert EAN bar codes to their proper ISBNs. You'll need Python installed to run it. It verifies that the EAN check digit is correct and generates the ISBN check digit for you.
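The ean2isbn script itself isn't reproduced here; this is an independent sketch of the same two checks, assuming a 13-digit "Bookland" EAN with the 978 prefix. The EAN-13 check digit weights the first 12 digits alternately 1 and 3, and the ISBN-10 check character weights the 9 ISBN digits from 10 down to 2, modulo 11, with a value of 10 written as X. The function names are hypothetical.

```python
DIGITS = "0123456789"


def ean13_check_digit(first12):
    """EAN-13 check digit: weights 1, 3, 1, 3, ... over the first 12 digits."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def ean13_is_valid(ean):
    return (len(ean) == 13 and all(c in DIGITS for c in ean)
            and int(ean[12]) == ean13_check_digit(ean[:12]))


def isbn10_check_char(first9):
    """ISBN-10 check character: weights 10 down to 2, modulo 11, 10 -> 'X'."""
    total = sum(int(d) * (10 - i) for i, d in enumerate(first9))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def ean_to_isbn(ean):
    """Convert a 978 'Bookland' EAN-13 into an ISBN-10."""
    ean = ean.strip()
    if not ean13_is_valid(ean):
        raise ValueError("bad EAN-13: %r" % ean)
    if not ean.startswith("978"):
        raise ValueError("only 978 EANs map onto ISBN-10: %r" % ean)
    body = ean[3:12]                 # drop the 978 prefix and the EAN check digit
    return body + isbn10_check_char(body)


if __name__ == "__main__":
    print(ean_to_isbn("9780937175934"))   # -> 0937175935
```

The same `isbn10_check_char` also shows that 0937175930, the $z ISBN in the first UC record, can't be right: the first nine digits require a check digit of 5.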

It's kind of interesting: the University of California lists two different books for the same ISBN, even though the check digit on one of them (0937175930) isn't correct. It sounds like someone entered the data incorrectly:

hydrogen,ttyp6,/tmp/yaz-1.6,534$client/yaz-client tcp:128.48.141.7:210
Connecting...Ok.
Sent initrequest.
Connection accepted by target.
ID     : Z39.50
Name   : CDL Z39.50 SERVER
Version: 1.0
Z> base cat
Z> find @attr 1=7 0937175935
Sent searchRequest.
Received SearchResponse.
Search was a success.
Number of hits: 2, setno 1
records returned: 0
Z> show +2
Sent presentRequest (1+2).
Records: 2
[Catalog]Record type: USmarc
001 5856206
003 CU-UC
005 20010124215609.3
008 890905s1989    caub     r    00010 eng d
020    $z 0937175930
035    $9 5856206
040    $a SSL $c SSL $d CAS $d OCL $d CUS
100 10 $a Frey, Donnalyn
245 14 $a !%@:, a directory of electronic mail addressing and networks / $c Donnalyn Frey and Rick Adams.
250    $a 1st ed.
260 0  $a Sebastopol, CA : $b O'Reilly & Associates, $c c1989.
300    $a xv, 284 p. : $b maps ; $c 23 cm.
440  0 $a Nutshell handbook
500    $a At head of title on cover: UNIX Communications.
504    $a Includes bibliographical references and indexes.
546    $a English
650  0 $a Electronic mail systems $x Directories
700 10 $a Adams, Rick
740 01 $a Directory of electronic mail addressing and networks.
740 01 $a UNIX Communications.
[Catalog]Record type: USmarc
001 7612099
003 CU-UC
005 20010124215609.6
008 930310s1992    caua          001 0 eng
010    $a    93124509 //r96
020    $a 0937175935
035    $9 7612099
040    $a DLC $c DLC $d DLC
050 00 $a QA76.76.O63 $b O73 1992
082 00 $a 005.7/13 $2 20
100 1  $a O'Reilly, Tim
245 10 $a Managing UUCP and Usenet / $c Tim O'Reilly and Grace Todino.
250    $a 10th ed.
260    $a Sebastopol, CA : $b O'Reilly and Associates, $c 1992.
300    $a xxiii, 342 p. : $b ill. ; $c 23 cm.
440  2 $a A Nutshell book
500    $a Includes index.
546    $a English
630 00 $a UUCP
630 00 $a UNIX (Computer file)
650  0 $a Usenet (Computer network)
700 10 $a Todino, Grace
nextResultSetPosition = 0
Z>

Now to write or find a MARC parser to parse the data from the YAZ client.
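Since the YAZ client already prints records as text, one quick alternative to a full MARC parser is to parse that display format directly. A minimal sketch, assuming the layout in the transcript above: a 3-digit tag and a space, then either plain data for control fields (00X) or two indicator characters, a space, and `$`-prefixed subfields, with each record starting at a line containing "Record type:". It naively splits on `$`, so a literal dollar sign in the data (a price, say) would break it, and wrapped continuation lines are simply skipped. `parse_yaz_records` is a hypothetical name.

```python
import re

FIELD_LINE = re.compile(r"^(\d{3}) (.*)$")    # tag, one space, rest of line


def parse_field(tag, body):
    """Control fields (00X) are plain data; others carry indicators + $subfields."""
    if tag.startswith("00"):
        return body
    indicators, rest = body[:2], body[3:]     # "10 $a ..." -> "10", "$a ..."
    subfields = []
    for chunk in rest.split("$")[1:]:         # naive: assumes no '$' inside data
        code, _, value = chunk.partition(" ")
        subfields.append((code, value.strip()))
    return indicators, subfields


def parse_yaz_records(text):
    """Turn yaz-client 'show' output into a list of tag -> [field, ...] dicts."""
    records = []
    for line in text.splitlines():
        if line.startswith("[") and "Record type:" in line:
            records.append({})                # a new record starts here
            continue
        match = FIELD_LINE.match(line)
        if match and records:
            tag, body = match.groups()
            records[-1].setdefault(tag, []).append(parse_field(tag, body))
    return records
```

Status lines such as "Records: 2" or "nextResultSetPosition = 0" don't match the tag pattern, so the whole `show` output can be fed in as-is.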

Arg!!! Ctrl-W is a bad idea under Netscape for Windows. It closes the bloody window!! Now to remember what I wrote.

Christmas was good; it was nice to see my parents again, and my sister, though I see her pretty often as she's just up in SF. New Year's was nice. Had dinner at my sister's place and then watched the SF fireworks from a park only a few blocks away.

I finally got the PS/2-to-AT and AT-to-PS/2 converters to hook up a CueCat a guy I know gave me (I have an AT keyboard, and the CueCat uses PS/2 connectors :( ). I decided to open it up and do the mod so that it spews the raw ASCII data without the serial number or base64 "encryption" (I probably should take a picture of it). It's much more useful.

I have keyboard extension cables that let me reach from my living room into my walk-in closet (where I keep all my books), so I scanned them. Some books are annoying and didn't have the ISBN as a bar code, and I have a few books that are so old they don't even have an ISBN!

I downloaded the YAZ software and have learned that the Library of Congress's book selection is sorely lacking. I have found that the University of California catalog has most of the books I own, both fiction and non-fiction. Now to write a script to convert from MARC format to a web page or something, so I can automate pulling the data out.

At least it's working now! :)

Quote of the day (from Earth by David Brin):

Heck, picture if aliens ever landed in California. Instead of running away or even inquiring about the secrets of the universe, Californians would probably ask the BEMs if they had any new cuisine!

Computers:
I hate them! There are too many incompetent designers out there who can't get things to work properly. I spent the four-day weekend upgrading my main server and it still isn't done. I'm having major issues with the Tekram DC-390F (UW, 53c875) SCSI controller. I originally thought I'd have more issues with the DC-390T (AMD-based), but it turns out that the Tekram driver for that card works well. Of course, having a hard disk that likes to have read/write issues doesn't make things much easier to diagnose. I guess I'll end up sticking my old Micropolis drive back in. I hate to lose the UltraWide-ness of the drive. We'll see.

DNA Lounge looks interesting. If I remember, maybe I should check them out when I head up to SF some time. The write-up on webcasting is interesting and helpful. Once I get things back up and running, the webcast of my 200-disc changer will come back online, and I have the number of users set so low that I don't think it's an issue. It's like having a couple of people drop by. Do I need to pay for the right to let my friends listen to the music? No, so I'm not going to pay. If I actually distributed to more people, then I'd look into it. Anyway, screw the music companies. They have no idea what they're doing.

jLoki:
I'm using one of the new Ricochet modems with 3.4-R, but only through the serial interface. Sure, you don't get the faster data transfer rates, but it works quite reliably (assuming you don't go through WorldCom). I get 11KB/sec, which is kinda slow, but it works. I was going to look at backporting the umodem driver from -current and getting it up and running on my 3.4-R, but I haven't looked at that yet.

sergent:
There is the freely available MPEG-1 library based on code from Berkeley that will return a frame for decoding. I wrote a FreeBSD console app with it to view MPEG-1 video. Because it returns a full frame instead of just the differences (which are kinda hard to get at), it's a bit more complex, but for RTP data you probably don't need 30 or 24 fps anyway. It was very simple to write and should do the job for you.

gstein:

Digest Auth doesn't cut it, as it's still a shared-password scheme: the password itself isn't sent in the clear, but the server has to keep a password-equivalent secret for every user. As for the others, there isn't any RFC that standardizes them. Sure, you can write your own, but then it's only your own. So there's still no strong auth.

As for the design document, it should include all the features you want in the product and no implementation details. Implementation comes later; the design document describes what you are going to do and which features you are going to include, so that when you start implementing, you don't end up doing extra work for things that aren't in the design document, or forget to include features. SVN's design doc is more like basic implementation documentation, since it describes how you are going to do things.

If you had known about, or pointed me to, future.texi earlier in the discussion, there would have been less misunderstanding. At least I did finally find out that you guys are planning on it (it sounds like you didn't even know about it before), but you haven't included it as part of your design.

As for it being Open Source, that's just spiffy, but too many people think of open source as the be-all and end-all solution. What open source has really shown is that code sharing between groups and organizations is a good thing. Do you think FreeBSD would be so successful if it weren't for the GNU gcc compiler, nvi, and the other projects out there? It all comes down to being able to share code. People reimplement code all the time, even things like skiplists, and repeat the same stupid mistakes. If everyone writes their own, it's a waste of time.

With inter-repository communication and public/private key authentication, it becomes easy for people to share code and follow the bug fixes that have happened. Imagine being able to run a simple command like sc update and suddenly have all of the latest code for the 15 different libraries your product uses. Now that would be a 21st-century feature. It would be just as applicable to commercial ventures as to free ones: a company could sell certain releases of a library and let customers fetch patches and all those goodies.

So, when will Subversion 1.0 be released? Any general idea? This year? 2001? 2002? Will inter-repository communication make it into the design document, so that it isn't one of those "well, if we have time, we'll get around to it, but we don't really care about it, so it isn't going to be part of the project unless someone contributes code" features? Thanks for the information, and I do hope the project goes well.

thom:
Sounds like your experience was pretty similar to mine while I was in Melbourne, Australia. I danced with this girl for a while, got her mobile number, ran into her a couple of days later, and realized that she's a lesbian.

mfleming:
That Happy Hacking keyboard looks cool. Just the type of thing I need. Though I'm not sure how well it'd balance on my lap, because I'm used to having the lopsided keyboard. Plus it's a bit expensive for being so small. If it were even $30 I'd understand, but at 5 times the price of the inexpensive Mitsumi keyboards I've grown used to, I don't think it's worth it. I'm also not so sure about having no lip on the bottom side of the keyboard. When I rest my hands there, I need that lip to keep them comfortable. Of course, schools teach keyboarding in a way that gives you RSI instead of preventing it. They always have the keyboard too high, and your wrists in the worst positions.

Hmmm, looking at my hand positions, I think that if your index finger isn't nearly straight, you probably aren't typing correctly. It may seem strange, but that keeps your hand lined up with your arm, so your wrists stay straight instead of bent like they teach you.

gstein:
Looks like you guys need to post subversion/doc/future.texi to your web page. It actually includes a small part of what I consider a 21st-century revision control system: inter-repository communication. The short description you provide is good, but it doesn't deal with the strong authentication required, and it doesn't mention inter-project management. Oh well, I'll see what I can whip up. Sure, it ain't going to be pretty, but I'm doing it more as a proof of concept. If it turns into a project, so be it. It's just something I've wanted for a year or two, ever since people started sending me bug fixes for my libraries.

Guess I'll let the issue drop since you're so quiet on it.
