Older blog entries for zanee (starting at number 209)

Relational Databases are killing content management.

Why LAMP is wrong for content management

LAMP. Linux/Apache/MySQL/PHP, Perl, Python is THE open source software stack for programmers and administrators doing almost anything, specifically rapid application development on the web. There are numerous frameworks that pull all the pieces together in the hope that, by and large, anyone can build robust and usable web applications and websites. This includes content management. For the most part this works well, even through the hurdles of getting all of the specific components to work together nicely. Unfortunately the masses have taken "you can build anything" to mean that LAMP is the ONLY solution for building applications on the web. So the de facto standard has been to look at a problem and automatically assume LAMP as the underlying technology for the space. Blog? LAMP. Twitter-like site? LAMP. Website for my company? LAMP. Simple web page with a small contact page? LAMP. To be fair, this works well in many contexts, especially because the LAMP system is highly modular: one can literally pull out one piece of the overall solution set and replace it with something completely different. However, as any engineer or architect will tell you, when building a bridge it helps to know what is going to travel over it, under it, through it and, unfortunately, into it. If not, you end up with something like this.

The collapse of the Ironworkers Memorial bridge while under construction on June 17, 1958, due to a miscalculation of the weight-bearing capacity of a temporary arm.

This is generally what content management tends to resemble when it meets the architecture of LAMP. Applied as a de facto standard, the stack is a complete failure for content management, primarily because of the M in LAMP. In this case I mean MySQL, but the actual problem is any RDBMS, or Relational Database Management System. It's not the only problem, but it's one of the most critical components in getting content management right.

Let's start by taking apart the LAMP acronym.

Linux: As far as the operating system goes, Linux is a tried and true system which has been under development for nearly two decades. It has consistently pushed the envelope in utilizing the underlying hardware to provide a robust and capable operating system that can scale from the server to the desktop.

Apache: As far as web serving goes, Apache is again tried and true, and has been under development for nearly as long, with many lessons learned. It is the de facto standard server for static and dynamic content, with an extensible and modular design which makes it capable of supporting many different applications and setups.

MySQL: The relational database management system that, for many, had the honor of being the first RDBMS they used, and the cut-throat, bare-bones data solution. Early MySQL releases had no notion of ACID (Atomicity, Consistency, Isolation, Durability). That overhead was generally seen as a problem that could be solved higher up in the application, and for the most part everyone using MySQL at the time had no need for such compliance. This made MySQL extremely fast and, alongside Postgres (which did concentrate on these things), it was open source and freely available. Things have changed much since those days: MySQL has generally become ACID aware and, surprisingly, is now owned by Oracle.

Perl/PHP/Python: For the most part Perl was the dominant language of the 90s. Since then it has fallen behind in the web application space. There are numerous reasons for this, but one of the major ones is that many new programmers have written Perl code that is difficult to read and maintain. The language wasn't originally intended for large object-oriented projects, and maintenance has generally become a nightmare. (Ruby, which has syntax largely like Perl's, was written with object orientation at its core; it has a vibrant community and a popular web app framework called Ruby on Rails, or RoR.) This isn't a black mark against the language, beyond its allowing poor practices that have seemingly become embedded in the programmer. As those new and junior programmers become more literate their Perl code improves, but the old projects and web applications they wrote or helped write don't fare as well. There tends to be a group-think backlash against the language because of that. "It was written in Perl? Ugh.. nightmare".

PHP tends to have the exact same problem, a low cost of entry, except that the community around PHP is wholly vibrant and releases often. As a language PHP has had many of the same problems as Perl when it comes to building a large software project, most of which have been remedied with time: namespace support, halfway usable object orientation support, etc. It still lacks many commercial-grade features, which you can't readily use without purchasing them or the frameworks that provide them.

Python, on the other hand, has a much steeper learning curve than the previous two. However, it's still a very easy language to learn and it enforces many good practices out of the box that the other two languages don't: proper formatting, a useful object-oriented system, namespaces, unit tests and code structure. These are all important concepts in architecting software (building your bridge), and it's a viable language.

Ruby isn't a P but needs mention as it's also a dominant language. It's highly object oriented and, as stated above, has syntax largely like Perl's. It's referred to by some as "Perl done right", and its primary author has stated: "I wanted a scripting language that was more powerful than Perl, and more object-oriented than Python. That's why I decided to design my own language".

So what is the problem with Relational Database systems and content management?

Relational database systems are from an era when object-oriented programming didn't, by and large, readily exist. The concept of an object was, quite frankly, foreign. Most programming had been functional and procedural, and no one had any idea how useful objects would become. The general crowd-think was more concerned with "records"; that had worked well since the 1940s and many companies had spent large sums of money setting up internal systems. A nice line printer spilling out data onto dead trees with a list of users, phone numbers, etc. Pulling out all of this information was easy. However, as time passed and with the advent of the internet, the concept of object orientation became more important. Java came to dominance because of this. We needed to do more than just relate information; we also needed to update it in real time and expose it to other systems in-house and to our partners' systems. Records in a flat table were simply not enough: we wanted a representation of a user, with all of the attributes important to us, updated dynamically as our staff, customers or both changed them.

Computer languages started to reflect this fact. Most languages started receiving object-oriented features. C got its OO superset in C++, then there was Objective-C, which got its clothing from Smalltalk, and obviously Java, which sparked a new commercial trend and is probably the most dominant object-oriented language in use today.

Cube vs Cylinder

Square vs Circle, Cube vs Cylinder, Oil vs Water.

Unfortunately, as the way we wrote programs changed, the way we stored the data our programs created or needed did not. Programmers began writing object-relational mappers (ORMs) to map the objects they created in their programs onto the way they stored their data. So one would design a user object with a name, address and phone number in the program, then have to create a table for it in the relational database, and then have to map between the two. Obviously, for time-sensitive applications the overhead of conversion became an issue. More importantly, though, a whole system was needed to keep the two consistent. If one got out of sync with the other it would cause no small amount of trouble for critical applications.
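To make the mapping problem concrete, here is a minimal sketch in Python using only the standard library's sqlite3 module. The User class, the users table and the two helper functions are names made up for illustration; a real ORM does far more, but the shape of the work is the same: every attribute is written down once in the class, once in the table, and again in each direction of the mapping.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class User:
    # The object side: what the program actually works with.
    name: str
    address: str
    phone: str


# The relational side: the table has to mirror the class by hand.
SCHEMA = "CREATE TABLE users (name TEXT, address TEXT, phone TEXT)"


def save_user(conn, user):
    """Object -> row: flatten the object's attributes into columns."""
    conn.execute(
        "INSERT INTO users (name, address, phone) VALUES (?, ?, ?)",
        (user.name, user.address, user.phone),
    )


def load_user(conn, name):
    """Row -> object: rebuild the object from the columns."""
    row = conn.execute(
        "SELECT name, address, phone FROM users WHERE name = ?", (name,)
    ).fetchone()
    return User(*row) if row else None


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
save_user(conn, User("Ada", "12 Example St", "555-0100"))
loaded = load_user(conn, "Ada")
```

Add an email attribute to User and three other places have to change with it. That is the consistency burden described above.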

Enter object-oriented databases, or OODBs. A programmer could create an object in his program and store it directly in an object database. Early versions were considered slow and inefficient; however, OODBs tended to hold more data than relational database systems and were generally faster, as there is less to look up and no overhead or extra system to manage between a relational system and an object system. With the added benefit of being more secure, it seemed like an easy win; however, there was no uptake. Most commercial organizations were, and still are, used to the idea of a "record", and Oracle as a company is simply a good salesman: they came up with Oracle Object-Relational support or, in simpler terms, a superset of SQL that makes Oracle's relational database behave in a more object-like way. The sheer force of SQL and the relational database ecosystem also made it hard to see through the clouds. In fact, most people didn't and don't even bother looking. There was no real advantage to going with an object-oriented system if you had an object-relational mapper. In sum, no one took up OODBs except for organizations in the know, primarily scientific and engineering houses that had large amounts of data they needed to warehouse and work on. Everyone else stuck with an RDBMS until it started hurting them. They would eventually retool by consulting with either Oracle or one of the commercial object-oriented database providers. It's a testament to Oracle's success that they are the ONLY database game in town. Really, what other commercial database company can you name off the top of your head? I'll wait... Right, so that leads to the present day, where there is more talk of NoSQL databases (object databases, graph databases, high-performance key/value stores, etc.; things like Hadoop, CouchDB, MongoDB, Redis, Neo4j and AllegroGraph), but not much has changed in the last two decades.
This time around things seem much different, and the database playing field is bound to go through a transformation with web semantics and HTML5 database standards. We can only wait and see.
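For contrast, here is a rough sketch of what "store the object as it is" looks like. It uses Python's standard shelve module purely as a stand-in: shelve is a pickle-backed key/value file, not a real object database, and the user and post attributes are invented for the example. The point is only what is missing: there is no table definition and no mapping layer.

```python
import os
import shelve
import tempfile
from types import SimpleNamespace

# A small object graph: a user who owns a list of posts.
user = SimpleNamespace(
    name="Ada",
    address="12 Example St",
    phone="555-0100",
    posts=[SimpleNamespace(title="Hello", body="First post")],
)

path = os.path.join(tempfile.mkdtemp(), "objects")

# Store the whole graph under one key. No schema, no column mapping.
with shelve.open(path) as db:
    db["ada"] = user

# Later, get the same shape of object back, nested posts included.
with shelve.open(path) as db:
    restored = db["ada"]
```

A real object database adds what this stand-in lacks, such as transactions and loading only the parts of a large object graph that are actually touched.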

In the interim, the previous decades were simply unfortunate for database stores and, consequently, for the content management space, as it is highly object oriented. Your customers want to manage content: videos, users, large lists, blogs, news, images, etc. All of these are objects that need to be stored somewhere, and for the most part that is occurring in a relational database. That means one has to overcome the problems above, and 9 times out of 10 this requires a lot of time and engineering that simply isn't done properly. Hence, when LAMP is chosen as the de facto standard, the bridge eventually collapses. A collapse may be downtime, loss of records, constant maintenance or security threats, all of which can be lessened by building the correct bridge for the problem space. In content management that is an object database, or some combination of NoSQL, relational and object data depending on the application.

How do I change the M in LAMP to an O for object database or something similar?

Well, if you plan on managing content there is ZODB, the Zope Object Database for Python, which is part of the overall package for the Plone content management system. There are also DyBase, db4o, Twig, etc. Beyond that your options are currently limited without an object-relational mapper, but it pays to understand the problem space so you can architect your content management solution appropriately even should you need to keep your data in an RDBMS. Whether it be for something as simple as managing blog data or a large list of data, it pays to know that you have a bridge that can withstand all of the net's elements. Hopefully, the next time you are talking with your consultant, client, design company or web team and you hear LAMP, you will have a better idea of what it entails and how to apply your business needs and processes using LAMP if you need to. LAMP isn't always the answer, and it certainly isn't always the answer for content management solutions.


Syndicated 2010-08-30 19:47:00 from Christopher Warner » Advogato

What computer should I purchase?

Recently I had a conversation with my brother about purchasing a new or better computer, and here are some recommendations that I feel will be useful to anyone looking to do the same.

  1. You don't need the newest whiz bang feature. 9 times out of 10 most people are using their computers for email/facebook/music and web browsing.
  2. More expensive doesn't mean faster. There are a whole host of reasons this is the case but a big price tag doesn't make for a faster machine.
  3. If you know very little about computers, my recommendation is to purchase a Mac, simply because you don't have the wherewithal or knowledge to manage a Windows machine. Fighting off virus after virus is simply too complex, even for very experienced Windows administrators.
  4. If you know a thing or two, my recommendation is still to purchase a Mac. You obviously have work to do, and spending time fighting all of the intricacies of Windows doesn't sound fun or productive. [1]
  5. If you want to build your own, I would spend a lot of time researching your components. Computer hardware moves faster than anything else in any industry. Period. Right this instant the card you are looking at is the fastest. Just now, in the time it took me to type "just now", it became maybe Top 5.. maybe. By next week you'll be lucky if it's still there. A month from now, the new Whiz Bang GTX with 50 more whiz bangs will be out.
  6. Do not purchase cheap. If it's $300 it probably won't last two years. The components are cheap and will break down, and you will find yourself spending another $300 or more, having saved nothing. In fact, with the time you spend on repairs and with a down computer, plus everything else, you might as well have bought a quality machine. If you are like me and a computer is vital to your livelihood, you should make a point of only buying quality gear. It need not be expensive, just quality. E.g., if I purchase memory it will only be ECC memory. If you know what that means, it's a clearly obvious decision.
  7. Buy refurbished quality gear from the manufacturer if you can swing it. You'll get burned-in tested gear in near brand new condition for a fraction of the price that has been certified by the manufacturer. It's an easy sell.
  8. Keep your computer for as long as possible. At some point your needs may change and you may need more whiz-bang features. That is the point at which you need to upgrade. I see too many people upgrading for absolutely no reason. They get no benefit AT ALL. Nothing is inherently faster for your needs just because it is new. For instance, I try to keep my main desktop running for at least 5 years before I even consider an upgrade. The machine that handles most of my network traffic has been in service for close to 12 years. This idea that computers are throwaway products is harmful. They contain all sorts of toxic compounds, and most are serviceable, usable and capable for a decade if not more. Consider recycling or donating your computer. Throwing it out should never be an option.

All that said, if you still think you need 30 inches, 12 cores, 72 GB of RAM, 4 TB of SSDs and a goddamn pony, feel free. It'll be obsolete by tomorrow and your 30k investment will be worth 10% of that by next year. In two years, it'll be in a Dell catalog for $300.

Needless to say I have a variety of kit that is useful to me, but somehow people seem to believe that the more they spend the better. This is wholly incorrect.

[1]: Right, so some idiots will jump to conclusions and go on fanboyism tirades about my recommendation of Apple kit (even though I hold no such allegiance. I use Linux, FreeBSD, OpenSolaris, OpenBSD and have DEC!, Sun! (DEC and Sun are no longer around, but they built some of the most long-lasting machines known to man), etc. kit all over the place for my needs). Unfortunately, when it comes to quality-built hardware there are few consumer manufacturers that do it well. Even Apple has some shoddy gear. However, out of all the consumer manufacturers I would choose Apple first. They provide decent quality components in a nice overall package that holds up over time for your run-of-the-mill user. It also comes with solid software packages, all for a reasonable price. The same can't be said for Dell, HP, Compaq, etc. Windows as a general-purpose operating system has historically been, and still is, pretty much a complete failure. OS X concentrates on the user and the tasks they need to complete, which is why we have computers and why you need one.


Syndicated 2010-08-17 16:00:13 from Christopher Warner » Advogato

SELinux is not for desktop usage

OK, let me state emphatically: SELinux[1] is probably the most secure environment and system that you can get for free. Its emphasis is on an RBAC model, which is different from, let's say, OpenBSD's security-through-code approach. Anyway, I don't have time to get into a lengthy post about all of this right now because my brain is tracking on something else, except to say that I don't believe SELinux is useful on consumer-grade desktop systems.

In tight-security corporate rollout environments, secure military facilities or some such, it makes sense on the desktop, and only there. However, I'm speaking about RUN-OF-THE-MILL machines. So your desktop at home? It's pointless to have SELinux there, because a run-of-the-mill machine is general purpose and its needs are constantly changing. Writing new policy for every new thing you plan to do is a little silly, especially because that policy will probably be insecure ANYWAY and it takes time to vet.

So Fedora with SELinux? Dumb. Ubuntu user distro with SELinux? Dumb. Any other user Linux distro with SELinux? Dumb. You get the idea. If you want a secure desktop, turn on a firewall and flip on encryption for your most sacred files. Apple has FileVault, which is implemented extremely poorly even though I use it; obviously, if you are using Unix then you are aware of your options, etc.

[1]: http://www.nsa.gov/research/selinux/


Syndicated 2010-08-05 13:57:07 from Christopher Warner » Advogato

Extending a default Plone Content Type commentary

So you want to extend a default Plone Archetypes content type. You start saying to yourself, "I'm actually recreating an event content type, why not just utilize Plone's existing type?" You know about archetypes.schemaextender [1] and are thinking to yourself, this is going to be a cake walk. I'll just adapt whatever content type; let's say, in my case, ATEvent. You then waltz along saying to yourself, let me reorder the schemata. However, you cannot do this. That sucks. Then you start thinking to yourself, maybe I shouldn't use archetypes.schemaextender. I mean, it's getting a little silly with all of these workarounds I'm doing for the most basic content-type stuff. It's obvious I want to do more than just add an option or two to the main content type. Yeah, that's the ticket: I'll be adding new methods, browser views, candy, ponies, etc. So you decide that you will subclass one of the main Plone Archetypes content types and be happy. Unfortunately you then realize you're basically copying the whole content type out of Plone and recreating it.

This is when a light bulb goes off, even though it's ever so dim: you just need to make your own content type. If you're lucky, you will have already done all the work, like myself, and will just need to continue on. The bright idea of reusing Plone AT content types for more than just a boolean option or some string material is just a dumb idea. Seriously, it's just a dumb idea.

You may still think it is a bright idea. I really mean it. It is not. Go ahead.. Try it. I'll wait........

See? Dumb idea. If you were smart and heeded this advice without trying it, then you've just saved yourself a couple of hours' worth of work. So when is it a good idea to use archetypes.schemaextender? Well, if you need some minor annotation on an overall content type it makes sense. Usually that means information that you won't have to display readily or override views for. You know, like a boolean option that will trigger a subscriber or some such, or some other tiny bit of string field information where it's just a tiny change and not much of the overall schema is being modified. When you start talking about more than that, just go with your initial gut feeling and do your own thing. If you really want the functionality from the default Plone content type you can always rip it out and throw it into your own content type.

Now today doesn't feel like such a waste. Cheers.

  1. http://pypi.python.org/pypi/archetypes.schemaextender

Syndicated 2010-07-27 20:08:39 from Christopher Warner » Advogato

Race and discrimination; Political Correctness

Rarely do I go into discussion of what I believe to be off-topic issues, but this needs to be said because I have eyeballs. Race and discrimination as an ongoing issue doesn't interest me, as I feel many strides have been made. Many more will be made, and the topic in and of itself seems to be on a level with religion. Every side is right and if you are on the opposing side, you are wrong. It doesn't seem any one side can take a critical look at themselves or offer valuable critique of the other, and that means it is just a general morass. It just keeps popping up in my life, most likely because I am a black male living in the United States of America.

Unfortunately the discord continues here in the USA, and currently the flames are fueled by rhetoric from what many have deemed to be the "Right" wing of American politics. Some time ago, if asked, I would have said it's simply safe to ignore it, as intelligent people tend to ignore such things or at the very least collectively work towards marginalizing the behavior. Over the last decade, this theory has been proven wrong to me numerous times over. So after revisiting this and thinking about a response, it became clear that organizations such as the NAACP and others are wholly inadequate at dealing with the issue. Personally I suspect they are able to hire appropriate PR people, but maybe that idea is wrong. I've not seen a solid formal response to many things from them. So, should you be planning to respond to some racial rhetoric, it's important to note several things.

  1. There is no need to get emotional. It is indeed what the other party is looking for and as we all know an emotional response is not effective. The response should not be argumentative because it is not an argument.
  2. Persuasion happens by not persuading. Persuade by your actions and not by conversation or argument, which is also wholly ineffective in this case.
  3. It is ok to be angry if you feel injured or hurt. You should channel that energy into something positive.
  4. Stand-ins, sit-ins and civil disobedience work for some things. For others they don't. The notion that standing somewhere will change something is idiocy. What will change something is collective organization and changing it by all means available until it is so. Stand-ins worked when people didn't want you to stand where you were. I think as far as civil rights go we are past that.
  5. It pays to be smarter. Know your position and every detail, know their position and every detail. Refute anything that is inaccurate; improve any weakness.
  6. Timing is everything, so bide it.
  7. It is safe to ignore some things. Not every piece of racial rhetoric deserves a remark. Realistically, very little of it needs remark; you can generally pool most commentary together and provide one response to it.
  8. Some will claim that violence is a proper form of response. For instance, shooting an unarmed human being 50+ times and then being fully acquitted is unacceptable in any conceivable context. It evokes a feeling of rage. However, the proper response isn't a stand-in, march or protest. See 4. You'll be surprised to see how effective anger can be as fuel; it is very focusing. Obviously, I'm not saying you should not protect your families or loved ones. In the United States of America that is every free man's right. I'm saying calculated behind-the-scenes efforts are wholly more effective; combined with 6 they are even more precise and deadly.
  9. Nothing is ever one group's problem. Ever. So for instance, the racially motivated discrimination against or killing of one group should be every group's problem. We all have our preconceived discriminations based on race. We should work toward eliminating those beliefs when they are clearly invalid.
  10. It's safe to realize that first we are human beings and secondly that as a group we have our collective fair share of idiots, scumbags and people who should really not be a part of our species. Unfortunately, they come in all colors and sizes. Weed them out; see 8.
  11. Not everything is racially motivated. It's safe to say that sometimes I just don't like a person for whatever reason. That is fine; we are all human beings and that is a fair position to have. It's also fair to take note of one's experience or reaction to something. Sometimes I wonder why senior black people are so racist. Then I realize that some of them were children of slavery, racism and discrimination; they lived through riots, attacks on their families and such. I have to take this experience into context and be fortunate enough to realize that I haven't had to deal with race and discrimination in this way in my lifetime.

Onto Political Correctness, or PC. If you have to be "PC" about something it usually means there is an underlying issue that needs to be addressed. So it's better to address the issue than to be PC about it. Should someone feel offended by something you've said, then humbly and gracefully apologize if that wasn't your intention. If you have to validate an apology then it's not really an apology; try to take their experience into context and avoid doing so again.

We are humans and as much as we think we all fit a mold of behavior, feelings and passion we are highly unique and complex in the universe. Also, feel free to stop saying I have "black, asian, white, indian or whatever friends". That doesn't validate or excuse you being an idiot or discriminating against someone at the time. Anyway.. I guess I'm gonna just call this end of lunch for me.. cheers.


Syndicated 2010-07-16 16:09:52 from Christopher Warner » Advogato

DublinCore Metadata

So, most of us have heard of DublinCore before, which is why I'm not going to go into a long spiel introducing it. What I want to talk about is the lack of use it gets. Realistically, no one is using DublinCore in an exposed, useful manner. Meaning, when my developer cap is on I can easily pull DublinCore data from an object. When my administrative or user cap is on it becomes near impossible. Exposing this metadata set is important! As a user I want to know what the rights are, or who created a specific object/resource/page, whatever. As a developer I'd like to pull this data and mash it up with my own development.

Yes, I know what you just said. The meanings of the terms in the element set are so obtuse that they can be interpreted in almost any fashion. Thinking about the possibilities becomes almost mind numbing. .oO "Simply, the standard is broken by not being specific enough Mr. Christopher, blah blah, let's hit this beach". "Whatever dude, who cares.. no one. that's who. No one gives a shit, pardon me while I drink this beer..." It need not be so, however. Instead of recreating one's own metadata set, which does the same exact thing, we should be looking towards exposing this data and then ADVERTISING that it's available. It's also easy to generate the required data for most of the 15 elements.
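As a rough illustration of how little it takes to expose the data, here is a small Python sketch that renders a record as HTML meta tags, following the DCMI convention of DC.-prefixed names plus a schema.DC link. The dublin_core_meta function and the page record are made up for this example, and the rights value is only a placeholder.

```python
from html import escape


def dublin_core_meta(record):
    """Render a dict of Dublin Core elements as HTML <meta> tags."""
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, value in record.items():
        # Escape both sides so quotes or ampersands cannot break the markup.
        lines.append(
            '<meta name="DC.%s" content="%s">'
            % (escape(element), escape(value, quote=True))
        )
    return "\n".join(lines)


page = {
    "title": "DublinCore Metadata",
    "creator": "Christopher Warner",
    "date": "2010-07-09",
    "rights": "Example rights statement",  # placeholder value
}
print(dublin_core_meta(page))
```

Most content management systems already hold the creator, date and title for every object, so filling a record like this is usually a lookup rather than new data entry.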

How does this help? Well, at the very least, people who respect each other's copyrights/rights and would like to give attribution will know exactly which person or institution to give it to. It also helps to nail down where data originates, etc. It's not perfect by any means, and the holy grail solution most likely lies in a W3C standard that gets accepted by the major browsers.

In the meantime maybe it's high time the DCMI community came up with some badges that simply state "This site or resource is DublinCore aware" or "These resources have DublinCore metadata attached", or just a graphic that highlights that we can indeed review or pull the metadata. There is a nice plugin for Firefox called Dublin Core Viewer that does this. Exposing the data as naturally as possible, i.e. "Creator" == name of user or some such, is also generally good behavior. "Goddd, you're still yapping about this?? Ronnie, pass me another beer.. I'm gonna leave if you keep talking this clickety-click bullshit".

ok.. I'm done... for now.


Syndicated 2010-07-09 14:41:36 from Christopher Warner » Advogato

1 Jul 2010 (updated 1 Jul 2010 at 03:17 UTC) »

Temporarily granting manager permissions with Plone

Recently I had to "sudo" with Plone while updating @@personal-information/member data. However, as an authenticated user you can't do this, even if it's for your own member data. It's been like this forever, but it always gets me because there is no error. It just silently fails. The only way around this is to grant the permission in the current context through USER.manage_permission, where USER is whatever context you're in. So you could do something like the below, which temporarily grants the "Manage users" permission to the Manager and Authenticated roles with acquisition on, and then switches acquisition back off.

# USER is the current context, membertool is the membership tool and uri is the value to store.
USER.manage_permission("Manage users", roles=['Manager', 'Authenticated'], acquire=1)  # ON
member = membertool.getAuthenticatedMember()
member.setMemberProperties(mapping={"CVReference": uri})
USER.manage_permission("Manage users", roles=['Manager', 'Authenticated'], acquire=0)  # OFF

Yes, it is a little ugly, but it's better than having to do some script trash if you can traverse and get membertool. Also, it's safe unless something goes terribly wrong with membertool, in which case we could probably wrap that up in a try/except clause and run the acquire = 0 call in the exception handler.
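One way to get that safety net is a small context manager built on try/finally, so the OFF call runs whether or not the body blows up. This is only a sketch: temporary_permission and its restore_roles argument are my own names, and FakeContext is a stand-in so the example runs without Plone. The only real API assumed is manage_permission(permission, roles, acquire), called as in the snippet above.

```python
from contextlib import contextmanager


@contextmanager
def temporary_permission(context, permission, roles, restore_roles):
    """Grant `permission` to `roles` on `context`, then always restore."""
    context.manage_permission(permission, roles=roles, acquire=1)  # ON
    try:
        yield
    finally:
        # Runs whether or not the body raised, so the grant never leaks.
        context.manage_permission(permission, roles=restore_roles, acquire=0)  # OFF


class FakeContext:
    """Stand-in for a Zope object; it only records the calls made."""

    def __init__(self):
        self.calls = []

    def manage_permission(self, permission_to_manage, roles=(), acquire=0):
        self.calls.append((permission_to_manage, list(roles), acquire))


ctx = FakeContext()
try:
    with temporary_permission(
        ctx, "Manage users", ["Manager", "Authenticated"], ["Manager"]
    ):
        raise RuntimeError("membertool went terribly wrong")
except RuntimeError:
    pass  # the permission was still switched off before we got here
```

In real code the body of the with block would be the getAuthenticatedMember() and setMemberProperties() calls, and restore_roles would be whatever the permission was set to before.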


Syndicated 2010-07-01 03:10:04 from Christopher Warner » Advogato

29 Jun 2010 (updated 29 Jun 2010 at 04:50 UTC) »

PyCurl update

So I haven't worked on PyCurl for a couple of weeks, and I received some email thanking me for all of the fixes to CVS HEAD but asking why I haven't made a release. The issue is that PyCurl releases are aligned specifically with libcurl and there is a need to keep that congruency. If I update to 7.19.2, then I have to add the ABI/API changes that went in for 7.19.1.

Obviously, if I'm going to make a release I may as well target the latest version of libcurl, with all of the changes that have occurred in that space of time. Also, even though ISAW (Institute for the Study of the Ancient World) is gracious with my time on open source projects, there is a lot going on right now priority-wise, so I haven't had work time to spend on it. Needless to say, if you have a patch that fixes a bug, or you have found a bug you would like me to diagnose, I'll do so, so long as it's for the current code base. For those of you having problems, the fixes are most likely in CVS HEAD. You can grab that code from my Github repo or from the Sourceforge repo.


Syndicated 2010-06-29 03:56:57 from Christopher Warner » Advogato

Zotero is a walled garden.

I've been pretty busy lately. As most of you know, I haven't been doing much of anything with Linux and have essentially been quiet about my content management activities, with a few exceptions with Plone here and there (the 27th draws near). Really, I've been busy with life: laying low, getting ready for another leg of study, playing pool, trying to get these street signs changed, etc. However, I would like to take a moment to talk about some of the tools that have crossed my path and some of what I am working on. I will try my best to keep this as short as possible, primarily because I want to go on my run and it's already late. First up, Zotero, and let me state that my employer does not condone anything that comes out of my mouth on my blog and in general may fully disagree. At work I myself may extol Zotero as a virtue of progress or some such, but I'll preface that with an "I wouldn't use it myself".

Zotero is a powerful, easy-to-use research tool that helps you gather, organize, and analyze sources and then share the results of your research.

Yes, the above is true, except for the "share the results of your research" piece. You see, Zotero is a walled garden. The inherent problem is that Zotero simply doesn't have an API that allows anything other than Zotero to utilize the data that is put there. I'll give you a simple example. Remember the Compact Disc Database, aka Gracenote? For the open source/free software sites in syndication this will ring a bell immediately. Remember when they told all of us young, ripe teenagers, with all of our CDs and our xmms players in the upper right of the screen, that if we put all of our info into the CDDB and used their tools it would be so much easier to share with each other exactly what we were listening to? Do you remember what Gracenote did? They simply took all of our data and then sold it back to us via licenses. To this day all of us pay a small license fee to Gracenote, via our music players (hardware and/or software), for the privilege of using the data we provided them; and people STILL provide this data unknowingly! Don't believe me? Click About in iTunes or whatever music player you are using and most likely you'll see Gracenote scroll by. Luckily freedb came to fruition, but it still lags behind Gracenote at this point, and so commercial institutions just purchase the license from Gracenote and pass the cost on to us.

Of course you see, fool me once......*BLINK* can't be fooled again! Joking aside, it is obvious that the data and the uses for it become important, especially for bibliographic information. Beyond just citing a specific author or work, sharing cited material in any fashion you deem necessary can be a very, very powerful thing. The use cases become impressive. For instance, being able to cite an author or a list of works from an author and display only the citations that reference a certain type of material. Or drawing a graph based on works that may be related, with the goal of aiding you in your research. Or even seeing which authors you may have the most in common with based upon works you've cited. Ideas like that are just the tip of the iceberg, for which one needs a powerful, robust and completely open engine. If I'm going to use a tool that I have to input data into, I want access to that data in any fashion I deem necessary, when I want it, and obviously I do not want to have to pay for the privilege.
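To give a rough idea of how little it takes once the data is actually reachable, here is a minimal sketch of two of those use cases: filtering an author's citations by material type, and finding the authors two people have both cited. The record layout, field names and sample entries are made up for illustration and do not reflect Zotero's data model or any real export format.

```python
# Hypothetical citation records; a real export would carry far more fields.
CITATIONS = [
    {"cited_by": "me", "author": "Smith", "title": "Roman Roads", "type": "book"},
    {"cited_by": "me", "author": "Jones", "title": "Trade Routes", "type": "article"},
    {"cited_by": "me", "author": "Smith", "title": "Milestones", "type": "article"},
    {"cited_by": "friend", "author": "Smith", "title": "Roman Roads", "type": "book"},
    {"cited_by": "friend", "author": "Lee", "title": "Harbors", "type": "book"},
]


def citations_of_type(citations, author, material_type):
    """Titles by `author` restricted to one type of material."""
    return [
        c["title"]
        for c in citations
        if c["author"] == author and c["type"] == material_type
    ]


def authors_in_common(citations, person_a, person_b):
    """Authors that both people have cited at least once."""

    def authors(person):
        return {c["author"] for c in citations if c["cited_by"] == person}

    return authors(person_a) & authors(person_b)


print(citations_of_type(CITATIONS, "Smith", "article"))
print(authors_in_common(CITATIONS, "me", "friend"))
```

None of this is hard. The hard part is getting the records out of the tool in the first place, which is the whole point.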

So, while I'm laying low and getting ready to start another leg of study, tools like this have crossed my path and have been championed to me. After taking a look at Zotero, and needing a solution in the interim to scratch my own itch, it became clear that something would have to be done. To be honest I wasn't even interested in the tool and couldn't have cared less, as I'm not writing any immediate papers. It only became an issue when I had to interoperate with Zotero. Initially I sent email asking for access to the API, and I was told that its release was imminent and that in the meantime I could essentially try web scraping. Part of me thought it was a joke, or that maybe my question was misunderstood, but the response I received was untenable and unacceptable, and generally I wasn't a big fan of the tone. Also, searching the Zotero website didn't bolster any confidence in me. It seems everyone likes the tag "Open" nowadays but doesn't really like to be open. We in the free software community are quite familiar with this type of shenanigan and frankly I get tired of it. It's rather funny, because in research for this post I came across a quote from George Mason University, from whence Zotero was born: "anything created by users of Zotero belongs to those users, and that it should be as easy as possible for Zotero users to move to and from the software as they wish, without friction." This was in response to Thomson Reuters, who sued GMU over what they see as Zotero developers reverse engineering EndNote and violating its EULA. Unfortunately, at this time, and from what I have seen, that statement is not factual. Maybe it applies to EndNote specifically, but obviously I want my data as I want it, and specifically I want it available to me on the web.

All of the above, considering we are into lawsuit territory here, is testament that a truly open and free API is needed for bibliographic data. I should note that Thomson Reuters also produces a tool called OpenCalais, which I've spoken about numerous times here before and have used myself on numerous occasions; one doesn't enter data into it, and it is free and open. "There is no plan to someday "drop the other shoe" and charge folks for the basic service."

All of this leads up to the fact that I don't have much time, but I am putting some stuff together and will most likely be releasing a prototype of my work. In that case I'd hope that a community of developers can build on it and make it greater, etc. To my knowledge Zotero is planning to release some API/server kit, which is good news. I'm not holding my breath.

That's one simple example of what I am doing now, and I could probably go on, but suffice it to say, to the people reading this (my friends who are fellow grad students in teaching and lit programs, my lawyer friends, and the list goes on): please do not use Zotero. At least, not if you really care about accessing your data or sharing it.

It took too long to write this up and semi-proofread it, so now I'll probably forgo the run and just watch True Blood unless someone wants to volunteer as a run partner tonight. In which case, I'm down if you wanna do 3-5 miles.


Syndicated 2010-06-23 03:48:31 from Christopher Warner » Advogato

If you are an OpenID Provider please publish your URI for endpoint discovery

If you are an OpenID provider and don't have a published domain or URI where others can begin discovery of an OpenID endpoint and the services within that resource, you aren't helping the adoption of OpenID.

Right now that list includes nearly every provider utilizing OpenID. Developers have to waste time searching the internet or obtuse, decaying documentation to figure out where to start searching for an endpoint in your domain. How about making this information easily available and keeping it updated on a regular basis? At this point I want to just provide my own list, but the onus shouldn't be placed on a third party, so I will refrain for now.
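For anyone wondering what discovery looks like once a provider does publish a starting URI, here is a minimal sketch of the HTML-based route: fetch the page at that URI and look for the `link` elements whose `rel` is `openid2.provider` (OpenID 2.0) or `openid.server` (OpenID 1.1). The class name, the helper and the sample markup are assumptions for illustration. It parses a string rather than fetching anything, it does not check that the links sit inside `head`, and it leaves out Yadis/XRDS discovery entirely.

```python
from html.parser import HTMLParser


class OpenIDLinkParser(HTMLParser):
    """Collect OpenID-related <link> elements from an HTML document."""

    RELS = {"openid2.provider", "openid2.local_id",
            "openid.server", "openid.delegate"}

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        # rel may hold several space-separated values
        for rel in (attrs.get("rel") or "").lower().split():
            if rel in self.RELS and href:
                # keep the first value seen for each rel
                self.found.setdefault(rel, href)


def discover(html):
    """Return a dict mapping each OpenID rel value to its href."""
    parser = OpenIDLinkParser()
    parser.feed(html)
    return parser.found


# Made-up page for a made-up provider.
SAMPLE = """<html><head>
<link rel="openid2.provider openid.server" href="https://openid.example.com/endpoint">
</head><body></body></html>"""

print(discover(SAMPLE))
```

The code is the easy part. It only helps if the provider tells you which URI to feed it.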


Syndicated 2010-06-22 18:47:52 from Christopher Warner » Advogato
