Finally making progress. I've been given the task of writing a search engine for one of our clients' sites, which uses a (to remain nameless for now) project I've always hated. Initially, I wrote a curl script to log in and fetch the cookie info, to be passed on to wget with the -r option to recursively index the site. That didn't work... the curl script did, but the wget didn't. For some reason, I could not get wget to send the cookie information. I tried like hell too, both using the --header option to manually set the cookie and using the --load-cookies option, and neither produced any results.
After failed attempts at doing it the sexy, seemingly proper way, I decided to go with the current tradition of modifying an existing search engine to support indexing with cookie information. Sounds easy enough: I just need to find the HTTP code and add an extra header. Unfortunately, I couldn't get phpDig installed and working properly, and I don't understand ht://dig. I even looked at phpMySearch, but as is the case with most other phpMy* projects out there, it's a flaming piece of shit. phpMyAdmin is gratis, but it is by no means libre: they require that all changes to the code, even private ones, be mailed to them, and they reserve the right to forbid you/anyone from using their code.
So, now I'm writing it from scratch. I'm ending up writing everything in PHP, which may not be best for the spider/indexer, but oh well. I've gotten it to fetch a page of my choosing and pull out all of its links, and to do so as a logged-in user, so it indexes what a logged-in user would see.
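Roughly, the fetch-and-pull-out-the-links step boils down to something like the sketch below. It is in Python rather than PHP, and the names (`build_request`, `fetch`, `extract_links`, and the idea of passing the login cookie around as a raw `name=value` string) are made up for illustration, not lifted from the real spider.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Make the link absolute and drop any #fragment,
                # so the same page is not queued twice.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                if absolute not in self.links:
                    self.links.append(absolute)


def extract_links(html, base_url):
    """Return the unique absolute links found in an HTML string."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def build_request(url, cookie):
    """Build a GET request carrying the session cookie.

    `cookie` is assumed to be the raw "name=value" string
    captured when logging in (hypothetical format).
    """
    return Request(url, headers={"Cookie": cookie})


def fetch(url, cookie):
    """Fetch a page as the logged-in user and return its body as text."""
    with urlopen(build_request(url, cookie)) as response:
        return response.read().decode("utf-8", errors="replace")
```

The spider would then call `fetch` on a start URL, run `extract_links` on the result, and repeat for each new link, sending the same cookie every time.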
I was sick all day yesterday. Very sick, and I can't figure out why. My best guess is that it's something I ate, but I'm still unsure. It's the first time I've been sick in Mexico, which is odd considering how much street food I eat here.
