Older blog entries for RyanMuldoon (starting at number 24)

It's been a long time since my last diary entry. A lot has happened since then, so I'll only talk about recent events. I am extremely happy with Nautilus 1.0.....I have a few problems with it, but they are already being worked on, which is great. I am now very much wanting to learn how to write views and sidebars for Nautilus, and bonobo components in general. The RSS viewer thing that is in the works is a really excellent idea. I am deeply saddened by Eazel having to lay people off......that is really unfortunate. But, I think that people are jumping to the conclusion that Eazel's failed - which is obviously not the case. They haven't really tried to sell anything yet. But, as soon as they do offer services for sale, I am most definitely going to get them. I want to support Eazel and Ximian. I think that they do great work. I hope that I will be able to contribute to their projects, and GNOME in general soon. Time is a hard thing to come by, but so many people here seem to manage it, so I'll make an extra effort.

I really place most of the blame on the fact that the US has become obsessed with the stock market. The concept of daytrading has really just messed up the function of the stock market, as well as normal business practices. I don't think that it is a problem unique to Free Software companies, or Linux companies, or tech companies. It is just investors not realizing that an investment is supposed to be long-term. Believe in the company, and support them. But our economy is now on a slide. Why? Because it is in the interests of the Bush administration to push us into a recession, so it can gain support for a tax cut that only really benefits the wealthy. I can't say that I am particularly looking forward to the reincarnation of 80's class warfare, powered by trickle-down (oops, I mean supply-side) economics. And so legitimately good companies that I think are going to not only have excellent profit potential a little down the road, but are able to do so with the good of the public in mind are getting hurt. That is really unfortunate. I am glad that there has been something of a shakeout to get rid of companies just interested in capitalizing on a buzzword, but I hate to see problems extending to companies with a strong commitment to their work, and the community.

I am enjoying reading the "Moneyflow" article discussion. The more I think about it, the more certain I am of a few things. First, the Internet is kind of at a crossroads right now. It has the potential to start getting really cool, or keep getting less and less useful. Second, I am fed up with copyright law, and I am planning on starting to write my various governmental representatives. Third, I think a lot of the problems with payment on the Internet, as well as nifty peer to peer filesharing techniques, can be resolved with a robust, standardized system for Metadata. How else can you find the little guy? How else can you know who to pay? We can't work under the assumption that the consumer will go to the creator's website to get whatever there is to offer. Building copy protection and payment mechanisms into protocols and file formats is a really bad idea. It institutionalizes the middle man, and hurts Free Software. It also encourages piracy rather than diminishing it. Systems that will work are ones that don't treat people like criminals, and allow individuals to pick a proper "reward" for content creators. Furthermore, I refuse to believe that artists create just to get paid. If they do, they really are just not artists. Whenever I choose to create something, there is not much profit motive there. Some of the greatest works ever were created without any notion of copyright. I do believe in the original ideals of copyright law, but as it stands now, it is vastly contorted and rewritten to favor distributors rather than the people that matter - artists and appreciators.

I am becoming more and more interested in figuring out ways to leverage public domain works for the public good. There is a lot of absolutely incredible stuff out there. I have the feeling that a lot of people don't take advantage of it, either because it is hard to find, or they don't know they can. I want to figure out a way to unify a lot of the "virtual library" type projects out there, so people can search and access this stuff using something with more of a napster feel. Websites are all well and good, but we should be thinking of them more as "leisure" sorts of things. We need better searching so we can find what we want initially, and then can choose if we want to bother with the website or not. Sometimes I just want to browse through some Van Gogh. Other times I want to read about each painting, and find out about their relationships. We need ways to facilitate this. We need solid metadata systems. We need means of cross-referencing large bodies of work on the fly. We need ways to tip the people that make this all available for us. And we need it all to be in open file formats that are designed for searching and portability. Documents should be in well-structured XML. Audio should be in open formats like Ogg Vorbis. Images should be in JPG or PNG. Ok, enough ranting on this. One more rant to go:

On the topic of people complaining about various features (or lack of features) of advogato: Really, I think that the current structure of advogato is what makes it a unique and well-defined community. The diaries allow everyone to see what various community members are working on, and the public conversations are interesting to follow. The articles have no real need to be threaded - there are typically few responses to articles, so we may as well keep them as open conversations. Threading can be nice, but at the same time I see it as limiting conversation styles. The other thing that I have been kind of annoyed at is people asking for certification to a given level. It seems kind of antithetical to the idea of *trust* metrics that people ask for a given level of certification. The idea is that you prove yourself in some manner. While I'd like to be ranked higher than Apprentice, I realize that since I haven't been ranked higher, there is probably a reason. So I am perfectly content with my Apprenticeship, and am confident that when I actually deserve it, I'll be ranked higher. I certify people as a result of taking part in discussion with them. To me, that is the best indicator of where they should be in the trust metric system. The trust metric is a pretty damn cool system, and I see it as having a ton of useful applications. But it becomes pretty pointless once it is no longer about trust. So if you're not being certified up, it is probably because you haven't shown that you should be yet. Ok, that's the end of my rant.

slef: I don't think that we'd have to have a regress of metametadata......if the metadata format were standardized, and there were a query language built on top of it, there would be no need for additional description. I guess though, that a schema for the metadata itself is in a way metametadata, but it should all work out nicely. ;-)

Last night I got fed up with XMMS and how it displayed track information in a tasklist, so I changed it from being "XMMS - tracknum. track (time)" to being "track (time) - tracknum - XMMS", which lets you get a good deal more information at a quick glance. I sent a patch to the xmms people....hopefully it is incorporated.

The thought of extending metadata to services is cool. It has a lot of potential.

Quote of the day:
"The mind of man is capable of anything - because everything is in it, all the past as well as all the future. What was there after all? Joy, fear, sorrow, devotion, valour, rage - who can tell? - but truth - truth stripped of its cloak of time. Let the fool gape and shudder - the man knows, and can look on without a wink."
--Joseph Conrad, Heart of Darkness

Ankh: I'd definitely be interested in getting in touch with some people with similar goals. I think that it is ultimately essential to the health of the Internet that public domain works are made easily available, and also that searching is vastly improved.

I'll take this opportunity to rant a bit. First, I am becoming increasingly disillusioned by the world wide web. I used to think that it was the coolest thing since sliced bread. But, overcommercialization is killing it. It is becoming harder and harder to find the actual information that I want, because searching was tacked on as an afterthought. I think that the world-wide web can be broken down into 4 categories:

  • Community sites: Things like Advogato, Slashdot, K5, etc. Sites that are too broad or with too great a userbase are really showing it. Slashdot used to be great, but now it is barely worth going to, except that it is habit. Advogato in my mind strikes a great balance. I think the 2 big reasons for that are that it has a focus (free software developers), and the trust metric. The trust metric is great, and has a ton of uses.
  • Services: Things like expedia.com, ticketmaster.com, and buy.com are all pushing a "web application" and are pretty useful. But I think that they can probably be revised in terms of metadata to be much cooler.
  • News sites: salon.com, cnn.com, etc. These are all useful, but it is annoying that I have to go to the front page to see if there are even any articles I want to read. Another candidate for metadata magic, or client pull.
  • Research Material/Public domain works: This stuff was the original purpose of the WWW. It is really lacking though, because the nature of research is that you have to be able to find it. But, as I said before, searching is kind of weak right now. This is also a huge candidate for metadata magic.

The problem is that people are trying to make the desktop more like the web, where I think the opposite should be true. Web sites should be seen like any document or application. Mozilla should not be an environment for me to do anything. It should be a rendering engine for the content that I asked for. I think that a nice unified search system should be how I find what content I want. Same with things I want to buy. News should be client-pulled for me and put into my desktop environment (like a "News" subdirectory in the gnome menu). Why "browse" unless I'm trying to kill time? It seems kind of dumb.
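A rough sketch of what "client pull" for news could look like, assuming the site publishes a simple RSS 0.91-style feed with plain `<item>`, `<title>`, and `<link>` elements (no namespaces). The feed text, the example.org links, and the function name `interesting_items` are all made up for illustration; fetching the feed over the network is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed contents; a real client would download this periodically.
FEED = """
<rss version="0.91">
  <channel>
    <title>Example News</title>
    <item><title>Nautilus 1.0 released</title><link>http://example.org/1</link></item>
    <item><title>Stock market slides again</title><link>http://example.org/2</link></item>
    <item><title>New metadata standard proposed</title><link>http://example.org/3</link></item>
  </channel>
</rss>
"""

def interesting_items(rss_text, keywords):
    """Return (title, link) pairs for items whose title mentions any keyword."""
    root = ET.fromstring(rss_text)
    wanted = [k.lower() for k in keywords]
    hits = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        if any(k in title.lower() for k in wanted):
            hits.append((title, link))
    return hits

if __name__ == "__main__":
    for title, link in interesting_items(FEED, ["nautilus", "metadata"]):
        print(title, "->", link)
```

The desktop side would then only need to turn the returned pairs into menu entries, so the front page never has to be visited just to check for new articles.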

Now my rant will break out from just technological complaints to general intellectual property complaints. I completely agree with Ankh that people should be focusing their investments on Museums, Libraries, and other public repositories, rather than hogging important works to themselves. I can understand the joy of owning an original painting, or a first edition (and would definitely love to be in a position to be able to afford such things one day), but I'd like to think that it would be better to give or loan such things to museums, and just buy the print for my own enjoyment. Some things are too important to be held privately. However, Museums and libraries need to shape up. They don't display anywhere near 20% of their holdings. What isn't on display is packed in crates where no one can enjoy it, research it, or do anything with it. This isn't in line with the function of a museum. I think that they have an obligation to supply electronic versions of everything they have. Imagine the boon to research that this would represent. Or even just personal enrichment. It would be an admittedly enormous task, but even doing things piece by piece would be beneficial. The argument that this would discourage people from actually seeing the real thing is foolish. I am thrilled that I can go to webmuseum and look at Van Gogh's amazing paintings, but that just makes me want to see the real things even more. And, when I do get the chance to see them, I appreciate it that much more. All of this stuff should be readily available.

I am thrilled to see organizations fund projects like ibiblio.org - it is an excellent collection of knowledge. But, while browsing it yesterday, I couldn't help but think how great it would be if all of that information had accompanying metadata. And then the development of a distributed filesharing system that has places like ibiblio.org as permanent nodes. It would be truly great. It frustrates me that the technology is there, but it is just not happening yet. Hopefully I can help make it happen a bit faster. Incidentally, a filesharing system that uses servers like ibiblio.org as permanent nodes would be virtually impossible to stop - the government couldn't help but fund such an effort eventually. It would be a quantum leap in the usefulness of computers and the internet. Being able to do crossreferencing on the fly would be cool as well, but I could live if that were a later feature. All a project like this needs is a lot of people willing to spend a little time adding metadata to things. After a while, it will be easy to maintain. To some extent, computers would be able to generate some of the metadata for us, leaving us to fill in the blanks as we have time. A guy can dream. ;-)

I took a look at www.canonicaltomes.org - it is a very cool idea. It reminded me of a project I wanted to do 3 or 4 years ago, but that I have yet to get off the ground, or see anyone else really do. The project would be a central compilation of every public domain work that has been digitized. The goal would be to provide a nice, navigable, searchable interface to all the extremely useful research materials out there.

It would have to have the following features:

  • A Yahoo-like category interface for browsing casually
  • Each work would have extensive metadata, covering a standard like Dublin Core
  • A search interface that lets you perform all the basic searches, but also searches by metadata, so you can dynamically regroup works to your liking
  • A nice gdict-like desktop application that is a search gateway
  • Palm/WAP interface?
  • Cross-referencing
  • Text prettification, so you don't get stuck with ascii if you don't want to....some nice HTML or XML with stylesheets would be nice

I'd imagine that the technical side would be the easy part. It would basically just have to be a big database with a well thought-out schema. The hard part is definitely organizing the content, attaching the metadata, and finding it all. Also, it would be good to be mirrored. Eventually it should be able to act as part of a distributed filesharing system. It would be an invaluable research tool.
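To make the "big database with a well thought-out schema" a bit more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table layout, the function names, and the example.org URLs are assumptions for illustration only; the field names (`title`, `creator`) are borrowed from Dublin Core. Storing one row per metadata field is what makes the "regroup works to your liking" search possible.

```python
import sqlite3

def build_catalog():
    """Create an in-memory catalog: one row per work, one row per metadata field."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE work (
            id  INTEGER PRIMARY KEY,
            url TEXT NOT NULL            -- where the digitized work lives
        );
        CREATE TABLE metadata (
            work_id INTEGER NOT NULL REFERENCES work(id),
            field   TEXT NOT NULL,       -- e.g. a Dublin Core element name
            value   TEXT NOT NULL
        );
        CREATE INDEX metadata_field_value ON metadata(field, value);
    """)
    return db

def add_work(db, url, **fields):
    """Insert a work and its metadata fields; returns the new work id."""
    cur = db.execute("INSERT INTO work (url) VALUES (?)", (url,))
    work_id = cur.lastrowid
    db.executemany(
        "INSERT INTO metadata (work_id, field, value) VALUES (?, ?, ?)",
        [(work_id, name, value) for name, value in fields.items()],
    )
    return work_id

def group_by(db, field):
    """Regroup the catalog by any metadata field: {field value: [titles]}."""
    rows = db.execute(
        "SELECT m.value, t.value FROM metadata m "
        "JOIN metadata t ON t.work_id = m.work_id AND t.field = 'title' "
        "WHERE m.field = ? ORDER BY m.value, t.value",
        (field,),
    )
    groups = {}
    for key, title in rows:
        groups.setdefault(key, []).append(title)
    return groups

if __name__ == "__main__":
    db = build_catalog()
    add_work(db, "http://example.org/hod.txt",
             title="Heart of Darkness", creator="Joseph Conrad")
    add_work(db, "http://example.org/lordjim.txt",
             title="Lord Jim", creator="Joseph Conrad")
    add_work(db, "http://example.org/walden.txt",
             title="Walden", creator="Henry David Thoreau")
    print(group_by(db, "creator"))
```

As the paragraph above says, the code is the easy part; the effort goes into collecting the works and filling in the `metadata` rows.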

With things like GNUpedia, and other similar efforts to create free-license encyclopedias, it seems like a much more worthwhile effort is to work on something like I describe above. An encyclopedia is only useful after there is a collection of works to reference. This would probably go further to accomplish what RMS wanted to get done: there is already no copyright on this material, so no competing interest can do anything about it. Once there is a community around it, it can be extended in all sorts of directions.

Of course, I think that the Library of Congress should provide such a resource, but the person running it seems to disagree with me. Ah well. Maybe one day when I have free time I'll try and get something like this started.

Ankh: Thanks for the suggestions for query languages. After spending a bit of time looking into it, I have found the following query languages that I will want to review: XPath, XQuery, SQL, OQL, and WebQL. One potential problem that I will have to review is any licensing issues and patents. WebQL looks like part of a proprietary product. I want to make sure what I do is unencumbered by patents. It will be beneficial to look at it for ideas though, I'm sure. The XPath and XQuery systems look very interesting - the only thing that I would be concerned about if I chose one of them explicitly is that I want to treat the fact that the metadata is stored as XML as an implementation detail. I would like seamless transition to a method that can support resource forks in filesystems, as well as files that store their own metadata. But, I do want to use the XML DOM as how I deal with the data itself. It seems well-designed. Another thing to consider is the XML Fragment Interchange spec, which looks ideal for simple metadata exchange systems. I'll definitely need to do a lot of reading on this. I understand XML well, but there are a lot of related technologies that I need to familiarize myself with. I really want to do this correctly......leveraging as many standards as possible. I need a sane public interface, a well-defined metadata set, and a well-defined query language. I really want to be able to support things like subqueries and unions. The ultimate goal is to provide a solid foundation for things like a much more powerful peer to peer filesharing system, superior search engines, and virtual folders on your desktop. I think if I can get the 3 things specced out right, the rest will fall into place for anyone to pick up and build on top of. But before I can get there, I need to keep doing a lot of research. And hopefully the other people involved in this project will bring as much to the table as I think they will. We'll see how it goes.
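One way to picture "XML as an implementation detail" is a small public interface that happens to use an XPath-style query internally. The sketch below leans on the limited XPath subset in Python's `xml.etree.ElementTree`. The `MetadataStore` class, the `<catalog>`/`<work>` layout, and the example.org URLs are hypothetical, and a real record would more likely use namespaced Dublin Core elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata document; element names are for illustration only.
SAMPLE = """
<catalog>
  <work href="http://example.org/heart-of-darkness.txt">
    <title>Heart of Darkness</title>
    <creator>Joseph Conrad</creator>
    <format>text/plain</format>
  </work>
  <work href="http://example.org/lord-jim.txt">
    <title>Lord Jim</title>
    <creator>Joseph Conrad</creator>
    <format>text/plain</format>
  </work>
  <work href="http://example.org/starry-night.jpg">
    <title>The Starry Night</title>
    <creator>Vincent van Gogh</creator>
    <format>image/jpeg</format>
  </work>
</catalog>
"""

class MetadataStore:
    """Callers see sets of hrefs; only this class knows the data is XML."""

    def __init__(self, xml_text):
        self._root = ET.fromstring(xml_text)

    def find(self, field, value):
        """Return the hrefs of all works whose `field` equals `value`."""
        # The value is pasted into the path, so reject input this sketch
        # cannot quote safely.
        if not field.isidentifier() or "'" in value:
            raise ValueError("unsupported field or value in this sketch")
        path = f".//work[{field}='{value}']"
        return {work.get("href") for work in self._root.findall(path)}

if __name__ == "__main__":
    store = MetadataStore(SAMPLE)
    # A union of two queries is just a set union at this level.
    print(store.find("creator", "Joseph Conrad") | store.find("format", "image/jpeg"))
```

Because `find` returns plain sets, unions and intersections come for free, and the XML backend could later be swapped for resource forks or embedded metadata without changing callers.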

Most of my thinking time has been spent on metadata. I've been trying to do as much research on the issue as possible. The more I think about it, the more I think that it will be feasible to develop a metadata manager that will be forward-compatible. But there is still a lot of work left. I want to begin work on hashing out an XML schema for the format of the metadata. That in and of itself will be a large undertaking. Then figuring out a query language. I want to model it after SQL. I am probably going to take a look at OAF's query system....it is SQL-like, and is designed to query similar kinds of information. So it will probably be a good starting point. Then comes the public interface specification for the metadata manager. It's definitely going to be a project with a long timeline. Hopefully soon we will be able to outline a formal roadmap, and plan exactly what we need to do. Hopefully it will end up being used by a lot of people at some point. ;-)

I've been thinking more about metadata, and how it could realistically be stored and organized. Of course, "realistically" is fairly subjective. My current thinking is pretty much based on the Semantic Web ideas...using RDF files to store metadata on files. That right there poses a couple problems: One, should the filesystem expose these files? I say no, at least not directly. The user should be able to modify information contained in these files, but only through utilities that are designed for it. If someone can just see them as files, and open them in emacs, then the metadata contained can be compromised (due to how I want to organize the metadata.....see below). The other problem is whether or not these files should have metadata themselves....again, I say no. These should not be treated like normal files. They should be more like Mac's resource forks. Of course, this immediately requires a new filesystem API that knows about this, as well as protocol level support. Hence my subjective notion of realism. The one good thing is that protocols can probably be updated in a way that is backwards-compatible.

Now, you may ask, what are these RDF files going to contain? My thought on the matter is this: there should be a standard set of metadata fields to work with. Ideally, these should be based on the Dublin Core work in this area. This is a good start, but I would like to go one step further: use namespaces to specify standard metadata fields by MIME type. The Dublin Core stuff would most likely be the supertype */*, then new metadata fields can be introduced for text/*, audio/*, image/*, video/*, and application/*. The annoying one is application/*, because there is no real continuity in the members of that set. Maybe it should be left out....I don't know. My other thought on breaking down these metadata namespaces is that the more specific areas of metadata should be typedefed fields - there should only be certain keywords that can be used. This greatly eases implementation issues, as there is no need for fuzzy logic in associating similar words. It also quickly establishes a lowest common denominator that everyone can work with. This is why I don't think metadata files should be treated normally.
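The per-MIME-type namespaces and "typedefed" fields described above might look something like this in practice. This is only a sketch: the `*/*` entries borrow three Dublin Core element names, while every type-specific field and keyword list (`channels`, `colorspace`, `language` and their values) is invented for illustration.

```python
# Field name -> allowed keywords, or None for free-form text.
SCHEMAS = {
    "*/*": {"title": None, "creator": None, "date": None},
    # Everything below is a hypothetical extension, not part of any standard.
    "audio/*": {"channels": {"mono", "stereo"}},
    "image/*": {"colorspace": {"rgb", "grayscale", "cmyk"}},
    "text/*": {"language": {"en", "fr", "de"}},
}

def allowed_fields(mime_type):
    """Merge the */* supertype fields with those of the type's major family."""
    major = mime_type.split("/", 1)[0]
    fields = dict(SCHEMAS["*/*"])
    fields.update(SCHEMAS.get(f"{major}/*", {}))
    return fields

def validate(mime_type, metadata):
    """Return a list of problems; an empty list means the metadata is valid."""
    fields = allowed_fields(mime_type)
    problems = []
    for name, value in metadata.items():
        if name not in fields:
            problems.append(f"unknown field: {name}")
        elif fields[name] is not None and value not in fields[name]:
            problems.append(f"bad value for {name}: {value}")
    return problems

if __name__ == "__main__":
    print(validate("audio/ogg", {"title": "Some Song", "channels": "stereo"}))
    print(validate("audio/ogg", {"channels": "quad"}))
    print(validate("text/plain", {"colorspace": "rgb"}))
```

Because the keyword sets are closed, a search tool can compare values exactly instead of guessing that two spellings mean the same thing, which is the point of the lowest common denominator.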

With this base, filesystems and OSes have a ton of room to innovate new features. One thing that I would like to see is the development of association graphs - so files that you use together regularly are associated together. Another side benefit to this is that a heuristic could be developed for dynamically adding metadata to files with incomplete metadata based on how the file is being used with other metadata-rich files. Also, searching that is purely based on metadata should be really fast, as all the files are small, easily indexable, and in a standard format. I'm sure that there are a ton of other things that could be built on top of this basic framework. I'd be interested in anyone's thoughts on this system....especially my initial thinking that metadata files should be treated differently than normal files.
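Here is a toy version of the association graph idea, including a very naive heuristic for filling in missing metadata from neighboring files. The class and function names, the notion of a "session" (a set of files open at the same time), and the weighted-vote rule are all assumptions made for the sake of the sketch.

```python
from collections import Counter
from itertools import combinations

class AssociationGraph:
    """Undirected graph whose edge weights count how often files are co-used."""

    def __init__(self):
        self._weights = Counter()

    def record_session(self, files):
        """Strengthen the edge between every pair of files used together."""
        for a, b in combinations(sorted(set(files)), 2):
            self._weights[(a, b)] += 1

    def neighbors(self, path, min_weight=2):
        """Files associated with `path`, strongest first."""
        result = []
        for (a, b), weight in self._weights.items():
            if weight >= min_weight and path in (a, b):
                result.append((b if a == path else a, weight))
        return sorted(result, key=lambda item: (-item[1], item[0]))

def suggest_metadata(graph, path, known, field):
    """Guess `field` for `path` by a weighted vote among its neighbors."""
    votes = Counter()
    for other, weight in graph.neighbors(path, min_weight=1):
        value = known.get(other, {}).get(field)
        if value is not None:
            votes[value] += weight
    return votes.most_common(1)[0][0] if votes else None

if __name__ == "__main__":
    graph = AssociationGraph()
    graph.record_session(["notes.txt", "thesis.tex", "refs.bib"])
    graph.record_session(["thesis.tex", "refs.bib"])
    known = {"refs.bib": {"subject": "philosophy"}}
    print(graph.neighbors("thesis.tex"))
    print(suggest_metadata(graph, "notes.txt", known, "subject"))
```

A guess produced this way should probably be marked as machine-generated so a person can confirm or correct it later.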

dirtyrat: You're right, there is a lot of infrastructure work to be done. As I've said before, it would be a much easier problem to solve if there weren't any legacy compatibility issues. We could build metadata right into all the filesystems and protocols and/or file formats. Then all that would need to be done would be to develop the features that take advantage of the extra information. It then quickly becomes more of a straight HCI issue. But as it stands now, it is a pretty massive engineering problem, a computer science problem, and an HCI problem. Not simple. ;-) It is really foundational, which is a curse and a blessing. The curse is of course that it takes a lot of work to graft this onto existing infrastructure. The blessing is that once it is there, there are all sorts of cool things that can be done relatively cheaply. Hopefully this (huge) benefit will get people inspired to work on the issues. We'll see how it goes.

I really hate how all the books that I want to buy are in the $60-$200 range. Of course, all of these books are in Philosophy, CS, Semantics, or HCI. All of these fields find it reasonable to have very high prices on books. I can appreciate the fact that upon reading these books, I am theoretically more marketable/smart, but man, it is just a lot of money. My book list is somewhere in the $600 range right now. And that is after having gotten ~$200 worth of books in the past couple months. I am being sucked dry. :( Hopefully I'll learn enough to justify spending so much money. I hope that the authors are actually getting most of that money. Somehow I doubt it. Ah well. I'll live.

As for my article, I was hoping for a bit more conversation, but that's ok. I think that it is partially because it is a fairly specific area of study that not too many people necessarily think about. Or maybe because I'm completely off-base. ;-) I am finding it difficult to find people to do some research with. The Semantic Web stuff seems very cool, but I have no idea how I can get involved with that. Maybe I'll see if I can figure it out. Hmm..that's about all for today.
