Name: Gregory Baumgardner
Member since: 2003-06-16 15:26:24
Last Login: N/A
Homepage: http://chaoticset.perlmonk.org/
Notes:
I like footling about with Perl. I hope to someday footle well enough to be paid for designing websites or doing application development (hopefully in Perl). Perl Perl Perl, Perl.
I had actually tried to do this once before, just after I tried the fuzzy thingy. This, I know much better; it's really just a matter of shoving the math into the code and figuring out precisely what data structure my brain's using.
I would say I've got about a third of the basic methods in place. The data structure's sort of unimportant, the way I'm doing it; most of the rules about stuff are in the methods, which is kind of the way it should be. The resulting code's not the most readable thing I've ever laid eyes on: you know what a given chunk of code does, but not precisely what it's meant for.
An easy solution for anagrams was presented to me long ago, in _Programming Pearls_ I think: sort the letters of each word and compare the sorted results. If they match, the words are anagrams.
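A minimal sketch of that sort-and-compare trick, written in Python purely for illustration (the function names are hypothetical, not from any existing code). Bucketing a word list by its sorted-letter "signature" finds every anagram class in one pass instead of comparing every pair of words:

```python
from collections import defaultdict

def anagram_signature(word: str) -> str:
    """Sort the letters of a word; anagrams share the same signature."""
    return "".join(sorted(word.lower()))

def are_anagrams(a: str, b: str) -> bool:
    return anagram_signature(a) == anagram_signature(b)

def group_anagrams(words):
    """Bucket words by signature; every bucket with 2+ words is an anagram class."""
    groups = defaultdict(list)
    for w in words:
        groups[anagram_signature(w)].append(w)
    return [g for g in groups.values() if len(g) > 1]

if __name__ == "__main__":
    print(are_anagrams("listen", "silent"))
    print(group_anagrams(["pots", "stop", "tops", "cat", "act", "dog"]))
```

Sorting each word costs a little up front, but after that the comparison is a plain string equality (or a hash lookup).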
That only works for single words, though. Palindromic words would be easy enough too. What I'm trying to do is take a nearly 2 MB word file and process it so that it can spit out palindromes.
So far I've figured out three things. First, you can build palindromes out of smaller palindromes. Second, the pivot point of a palindrome is always either a position (for even-length palindromes) or a position plus a letter (for odd-length ones). Third, I have a sort of system for figuring out whether, and how far, two words can chain together. The problem is that the system doesn't scale to a file that size (which I'm trying to fix by processing the file in other ways first), and the other things I figured out, besides being fairly obvious, don't reduce the number of word combinations.
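Here's a rough Python sketch of the pivot idea and of one possible way to test whether two words can chain. It's an assumption about how such a check could work, not the system described above, and `chain_remainder` / `forms_palindrome` are hypothetical names. `chain_remainder` puts `left` at the start of a phrase and `right` at the end, checks that their outer letters mirror each other, and returns whatever letters are left unmatched. If that leftover is itself a palindrome, `left + right` is a palindrome; otherwise the leftover is what further words would have to mirror.

```python
def is_palindrome(s: str) -> bool:
    return s == s[::-1]

def pivot(pal: str):
    """Pivot of a palindrome: a position for even length, a position plus a letter for odd."""
    mid = len(pal) // 2
    if len(pal) % 2:  # odd: the middle letter is its own mirror
        return mid, pal[mid]
    return mid, None  # even: the pivot falls between pal[mid - 1] and pal[mid]

def chain_remainder(left: str, right: str):
    """Hypothetical check: can `left` open a palindrome that `right` closes?

    Returns None if the outer letters conflict; otherwise (side, leftover), where
    leftover is the unmatched part of the longer word that still needs a mirror.
    """
    k = min(len(left), len(right))
    for i in range(k):
        if left[i] != right[-1 - i]:
            return None
    if len(left) > len(right):
        return "left", left[k:]
    return "right", right[: len(right) - k]

def forms_palindrome(left: str, right: str) -> bool:
    """left + right is a palindrome exactly when the words chain with a palindromic leftover."""
    result = chain_remainder(left, right)
    return result is not None and is_palindrome(result[1])
```

For example, "race" followed by "car" leaves "e", a one-letter palindrome, so the pair reads "racecar"; "top" followed by "spot" leaves "s" and reads "topspot".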
Any help is appreciated, but I'm sure eventually I'll get it.
Right now I've got a system that produces data from word interactions, but it's too slow for the full dataset (1.7 MB): it can't get through even a tenth of that within four hours.
I'm working on ways to speed it up, but I have a feeling they're few and far between. My first try is a binary encoding of which letters of the alphabet each word contains. With that, a single bitwise AND on two encodings tells me whether the words have any letters in common, and if two words don't share a letter, they sure as hell don't need a lengthy regex comparison.
Could be that I'm duplicating effort here and the regex engine already does this, but I don't think so. Even if it does, it's doing it six times or so, and I'm trying to do it once.
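A minimal Python sketch of that letter encoding, assuming plain a-z words (any other character is ignored); the function names are hypothetical. Each word becomes a 26-bit integer computed once up front, and after that the common-letter test is a single AND. For palindrome chaining in particular, a pair with no shared letters can never work, since the first letter of one word has to match the last letter of the other.

```python
import string

def letter_mask(word: str) -> int:
    """Return a 26-bit mask: bit i is set if the word contains letter i (a=0 ... z=25)."""
    mask = 0
    for ch in word.lower():
        if ch in string.ascii_lowercase:  # skip apostrophes, hyphens, digits, etc.
            mask |= 1 << (ord(ch) - ord("a"))
    return mask

def share_letters(mask_a: int, mask_b: int) -> bool:
    """A single bitwise AND tells whether two words have any letter in common."""
    return (mask_a & mask_b) != 0

def candidate_pairs(words):
    """Yield only the ordered word pairs worth a slower comparison (hypothetical prefilter)."""
    masks = {w: letter_mask(w) for w in words}  # encode each word exactly once
    for a in words:
        for b in words:
            if a != b and share_letters(masks[a], masks[b]):
                yield a, b
```

The pair loop is still quadratic; the mask only makes the per-pair check cheap enough to skip the regex work for pairs that can't possibly match.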
The other side of this problem is that hash insertion takes longer and longer for each successive word. With a hundred, it's a non-issue. With over a billion, I'm thinking I'll need a database. I've got MySQL on this machine, and if I really had to I could install something else, so I'll noodle that around, implement it, run it on my test data, and see what happens. If it looks appreciably faster, I'll throw it at a tenth of the full dataset and see what that looks like.
And, if it blazes through that in an hour or less, then we'll look at running the whole dataset through it over the weekend or something.
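If the database route pans out, batching the inserts is the usual first thing to try. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (a MySQL DB-API driver supports the same executemany/commit pattern, though it uses %s placeholders), and the `pairs` table layout and `store_pairs` name are hypothetical, not a real schema from this project. Rows are committed once per batch rather than once per word, and the index is built once after loading.

```python
import sqlite3

def store_pairs(conn, rows, batch_size=10_000):
    """Bulk-load (left_word, right_word, leftover) rows, committing once per batch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pairs (left_word TEXT, right_word TEXT, leftover TEXT)"
    )
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            conn.executemany("INSERT INTO pairs VALUES (?, ?, ?)", batch)
            conn.commit()
            batch.clear()
    if batch:  # flush the final partial batch
        conn.executemany("INSERT INTO pairs VALUES (?, ?, ?)", batch)
        conn.commit()
    # Index after loading, so the insert loop doesn't pay for index upkeep on every row.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_pairs_leftover ON pairs (leftover)")
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # use a file path for real data
    store_pairs(conn, [("race", "car", "e"), ("top", "spot", "s")], batch_size=1)
    print(conn.execute("SELECT COUNT(*) FROM pairs").fetchone()[0])  # 2
```

Whether this beats an in-memory hash depends on the machine and the data; it's mainly a way to stop memory, rather than time, from being the wall.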
One of the surprising things that fell out of her mouth was "...and we can look at getting Greg certified for some stuff..."
Now, I'm Greg, and I haven't been thinking seriously about getting certified for anything, frankly. I asked her what she thought I should get certified for, and she rattled off a couple of things -- Java (whatever that's called), A+ possibly, a couple of MS operating systems, etc.
Now, while I'm not really opposed to the idea, I am curious. Anybody have any definitive statement on whether or not certification convinces anybody but the people who give the certifications that you're any good?
Realistically, is it financially worth it, or would I be better off taking the money it costs to get certified on X and throwing that into paying myself to learn about X (once, of course, there's money to pay myself with)?
She also talked about getting certified for some stuff herself, but it would all be financial stuff (tax preparer, etc.), so it makes sense: the financial world is rife with certified this and qualified that. (It would also help me learn how to write financial packages. Someday I'd like to be able to write a drop-in replacement for, say, QuickBooks. That would make me really, really happy for reasons that, if they aren't obvious to you, would probably make no sense.)
As part of the New Crazy Idea (which I have no name, notions, etc. for as of yet) I'm building a Pretty Random Content generator (PRC generator, if you will).
What's it do? Not a whole lot. Not really a generator, now that I think about it -- it's a retrieval device. It's going to browse randomly within a nonrandom set of things and build concept maps out of what it finds. I haven't the faintest notion what the hell I'll be doing with this, except that it will be very crackfuelled.