Older blog entries for chaoticset (starting at number 18)

cmiller: I wish. It's palindromes I seek. I'm hoping to reduce the amount of processing by eliminating pairs of words that don't contain any letters in common.

An easy solution for anagrams was shown to me long ago, I think in _Programming Pearls_: sort each word's letters and compare the sorted results. If they match, the words are anagrams.
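Here is a minimal sketch of that sort-and-compare check. It is in Python rather than Perl, and the helper names (`signature`, `are_anagrams`, `anagram_classes`) are made up for illustration. Bucketing words by their sorted-letter signature in a dictionary finds every anagram class in one pass, so you don't have to compare every pair.

```python
from collections import defaultdict


def signature(word):
    """Sort a word's letters; two words are anagrams iff their signatures match."""
    return "".join(sorted(word.lower()))


def are_anagrams(a, b):
    return signature(a) == signature(b)


def anagram_classes(words):
    """Bucket words by signature; any bucket holding 2+ words is a set of anagrams."""
    buckets = defaultdict(list)
    for w in words:
        buckets[signature(w)].append(w)
    return [group for group in buckets.values() if len(group) > 1]


print(are_anagrams("listen", "silent"))  # True
print(anagram_classes(["pots", "stop", "tops", "cat", "act", "dog"]))
# [['pots', 'stop', 'tops'], ['cat', 'act']]
```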

That only works for single words, though. Single-word palindromes would be easy enough too. What I'm trying to do is take a nearly 2 MB word file and process it so that it can spit out palindromes built from its words.

So far I've figured out three things. First, you can create palindromes from smaller palindromes. Second, the pivot point of a palindrome is always either a position (for even-length palindromes) or a position plus a letter (for odd-length palindromes). Third, I have a rough system for figuring out whether two words can chain together at all. The problem is that the system doesn't scale to a file that size (which I'm trying to fix by processing the file in other ways first), and the other things I figured out, besides being fairly obvious, don't reduce the number of word combinations.
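To make the pivot observation concrete, here is a small Python sketch with hypothetical helper names. It checks every ordered pair of distinct words for a two-word palindrome and reports the pivot. For even-length results the pivot is the index where the second half starts. For odd-length results it is that index plus the middle letter. The nested loop is the naive O(n²) approach, which is exactly the part that stops scaling on a large word file.

```python
def is_palindrome(s):
    return s == s[::-1]


def pivot(s):
    """Center of a palindrome: (index where the second half starts, None) for
    even length, (index of the middle letter, middle letter) for odd length."""
    half = len(s) // 2
    if len(s) % 2 == 0:
        return (half, None)
    return (half, s[half])


def pair_palindromes(words):
    """Naively test every ordered pair of distinct words: O(n^2) checks."""
    found = []
    for a in words:
        for b in words:
            if a != b and is_palindrome(a + b):
                found.append((a, b, pivot(a + b)))
    return found


print(pair_palindromes(["race", "car", "top", "spot"]))
# [('race', 'car', (3, 'e')), ('top', 'spot', (3, 's'))]
```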

Any help is appreciated, but I'm sure I'll get it eventually.

Still working on the palindrome problem. Still no solution in sight.

Right now, I've got a system that produces data from word interactions, but it's too slow for the dataset (1.7 MB) it needs to run on: it can't get through a tenth of that within four hours.

I'm working on ways to speed it up, but I have a feeling they're few and far between. My first try is a binary encoding of which letters of the alphabet each word contains. With that, a single bitwise AND of two words' encodings tells me whether they have any letters in common, and if two words don't share a letter, they sure as hell don't need a lengthy regex comparison.
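Here is a Python sketch of that letter-bitmask filter. The function names are hypothetical, and it assumes plain a-z words. Each word becomes a 26-bit integer with bit i set if the word contains the i-th letter, and one `&` tells whether two words share any letter. For two-word palindromes the filter is safe: if `a + b` reads the same backwards, the first letter of `a` equals the last letter of `b`. So a pair with no letters in common can never qualify.

```python
def letter_mask(word):
    """26-bit mask: bit i is set if the word contains chr(ord('a') + i).
    Characters outside a-z are ignored."""
    mask = 0
    for ch in word.lower():
        if "a" <= ch <= "z":
            mask |= 1 << (ord(ch) - ord("a"))
    return mask


def share_letters(mask_a, mask_b):
    """One AND tells whether the two words have any letter in common."""
    return (mask_a & mask_b) != 0


def candidate_pairs(words):
    """Yield only ordered pairs that share a letter; masks are computed once per word."""
    masks = {w: letter_mask(w) for w in words}
    for a in words:
        for b in words:
            if a != b and share_letters(masks[a], masks[b]):
                yield a, b


print(list(candidate_pairs(["cat", "dog", "tan"])))
# [('cat', 'tan'), ('tan', 'cat')]
```

The expensive comparison then only runs on the pairs `candidate_pairs` lets through.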

Could be that I'm duplicating effort here, that the regex engine does this already, etc., but I don't think so, and even if it does, it's doing it 6 times or so, and I'm just trying to do it once.

The other side of this problem is that hash insertion takes longer and longer for each successive word. With a hundred words, it's a non-issue. With over a billion, I'm thinking I'll need a database. I've got MySQL on this machine, and if I really had to I could install something else, so I'll noodle that around, implement it, run it on my test data, and see what happens. If it looks appreciably faster, I'll throw it at a tenth of the full dataset and see what that looks like.

And, if it blazes through that in an hour or less, then we'll look at running the whole dataset through it over the weekend or something.

My SO and I were discussing the *ahem* "future" of my attempts at getting IT work. We discussed the possibility of setting myself up as a small business within the next 4-9 months. (This would bring a lot of advantages, but I'm still sussing out the disadvantages, which include tax issues and whatnot.)

One of the surprising things that fell out of her mouth was "...and we can look at getting Greg certified for some stuff..."

Now, I'm Greg, and I haven't been thinking seriously about getting certified for anything, frankly. I asked her what she thought I should get certified for, and she rattled off a couple of things -- Java (whatever that's called), A+ possibly, a couple of MS operating systems, etc.

Now, while I'm not really opposed to the idea, I am curious. Does anybody have a definitive answer on whether certification convinces anybody besides the people who give the certifications that you're any good?

Realistically, is it financially worth it, or would I be better off taking the money it costs to get certified on X and throwing that into paying myself to learn about X (once, of course, there's money to pay myself with)?

She also talked about getting certified for some stuff herself, but that would all be financial stuff (tax preparer, etc.), which makes sense: the financial world is rife with certified this and qualified that. (It would also help me learn how to write financial packages. Someday, I'd like to be able to write a drop-in replacement for, say, Quickbooks. That would make me really, really happy for reasons that, if they aren't obvious to you, would probably make no sense.)

Craziness

As part of the New Crazy Idea (which I have no name, notions, etc. for as of yet) I'm building a Pretty Random Content generator (PRC generator, if you will).

What's it do? Not a whole lot. Not really a generator, now that I think about it -- it's a retrieval device. It's going to browse randomly within a nonrandom set of things and build concept maps out of what it finds. I haven't the faintest notion what the hell I'll be doing with this, except that it will be very crackfuelled.

Well, I've spent a big pile of time away from here. Unfortunately, it's mirrored in my relative lack of actual development.

I've hit a big stumbling block with the fuzzy module, so I shelved it for a little while. Good news -- the game matrix solving module I wanted to write ages ago is now within my grasp! I've finally comprehended the general method for solving games outlined in _The Compleat Strategyst_ and can attempt to code it.
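For reference, here is a Python sketch of the first two steps. The names are hypothetical. First, look for a saddle point: an entry that is the minimum of its row and the maximum of its column. If there isn't one, solve a 2x2 game with the standard closed-form mixed strategies, which should agree with the book's oddments method. Payoffs are to the row player, and exact `Fraction` arithmetic avoids rounding. This does not cover the general method for larger matrices.

```python
from fractions import Fraction


def saddle_point(m):
    """Return (row, col, value) for an entry that is both its row's minimum
    and its column's maximum, or None if the game has no saddle point."""
    for i, row in enumerate(m):
        for j, v in enumerate(row):
            if v == min(row) and v == max(r[j] for r in m):
                return i, j, v
    return None


def solve_2x2(m):
    """Mixed-strategy solution of a 2x2 zero-sum game without a saddle point.
    Returns (p, q, value): p = P(row player picks row 0),
    q = P(column player picks column 0), value = expected payoff to rows."""
    (a, b), (c, d) = m
    denom = a - b - c + d
    if denom == 0:
        raise ValueError("degenerate game; check for a saddle point first")
    p = Fraction(d - c, denom)
    q = Fraction(d - b, denom)
    value = Fraction(a * d - b * c, denom)
    return p, q, value


print(saddle_point([[3, 5], [1, 2]]))  # (0, 0, 3)
print(solve_2x2([[2, -1], [-1, 1]]))
# (Fraction(2, 5), Fraction(2, 5), Fraction(1, 5))
```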

Once it's in place (inefficient though it will be for very large matrices), I can look at optimizations. Probably the major one will be removing any obviously crappy (dominated) strategies for each player. (The analysis for that will probably get prohibitively complex quickly.) Another could be reducing the resolution: rounding off a lot of the numbers and then caching results would help if there are a whole lot of values to run.
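Here is a sketch of the crappy-strategy removal, using iterated elimination of dominated strategies. That is one standard way to do it, not necessarily the book's, and the function name is hypothetical. A row is dropped when another row pays the row player at least as much against every remaining column. A column is dropped when another column costs the column player no more against every remaining row. Each pass is cheap, but it only removes strategies that are dominated outright, so many matrices won't shrink at all.

```python
def remove_dominated(m):
    """Iteratively drop dominated rows and columns of a zero-sum payoff matrix
    (payoffs to the row player). Returns the surviving row and column indices."""
    rows = list(range(len(m)))
    cols = list(range(len(m[0])))
    changed = True
    while changed:
        changed = False
        # Row r is dominated if some other row is >= it in every remaining column.
        for r in rows:
            if any(o != r and all(m[o][c] >= m[r][c] for c in cols) for o in rows):
                rows.remove(r)
                changed = True
                break
        # Column c is dominated if some other column is <= it in every remaining row.
        for c in cols:
            if any(o != c and all(m[r][o] <= m[r][c] for r in rows) for o in cols):
                cols.remove(c)
                changed = True
                break
    return rows, cols


print(remove_dominated([[3, 1, 4], [2, 0, 3], [5, 2, 1]]))
# ([0, 2], [1, 2])
```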

Anyways, it's good news. On the bad side of things, I have no real chance for employment.

How often does one find themselves working on multiple projects at once, leaving some for a time (even weeks or months) before touching them again?

I'm trying to find out if I feel okay with the idea, or if it's some sort of slippery slope to slackerness.

Okay. Some time with a pad of paper and some Dew has produced the following rough outline of subroutines:

  1. included -- takes the incoming value and returns firing rules and percentages of inclusion in a hash
  2. fire -- takes that hash and returns the graph points for the resulting rule intersection
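One way the two subs could look, sketched in Python under loud assumptions. The membership functions are triangular, and the three-rule base (`RULES`) is made up. Each fired rule's output set is clipped at its degree with min, and the clipped sets are combined with max. The combined shape is then sampled at evenly spaced x values to get the graph points. None of that is necessarily how Kosko or this module does it. It only illustrates the data flow from input value to firing degrees to graph points.

```python
def triangle(x, left, peak, right):
    """Membership of x in a triangular fuzzy set (assumes left < peak < right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical rule base: name -> (input set, output set), each a triangle.
RULES = {
    "cold": ((0, 10, 20), (50, 75, 100)),  # e.g. "if cold then output high"
    "warm": ((10, 20, 30), (25, 50, 75)),
    "hot": ((20, 30, 40), (0, 25, 50)),
}


def included(value, rules=RULES):
    """Return {rule_name: degree} for every rule whose input set contains value."""
    degrees = {}
    for name, (in_set, _) in rules.items():
        d = triangle(value, *in_set)
        if d > 0:
            degrees[name] = d
    return degrees


def fire(degrees, rules=RULES, lo=0.0, hi=100.0, steps=100):
    """Clip each fired rule's output set at its degree (min), combine the
    clipped sets with max, and sample the result as (x, y) graph points."""
    points = []
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        y = max((min(d, triangle(x, *rules[name][1]))
                 for name, d in degrees.items()), default=0.0)
        points.append((x, y))
    return points


print(included(15))  # {'cold': 0.5, 'warm': 0.5}
graph = fire(included(15))
```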

I'm already working out a sub called get_midpoint that takes an incoming set of graph points and produces the X value of the vertical line that would divide the shape defined by the set of graph points into two equally large shapes.
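Here is a Python sketch of `get_midpoint`. It assumes the graph points are (x, y) pairs sorted by x, with y >= 0, joined by straight lines. That is an assumption, and the real shape may differ. The function sums trapezoid areas and walks the segments until it reaches the one holding the halfway mark. It then solves a quadratic inside that segment, so it handles any number of points without special cases. In fuzzy-logic terms this is usually called the bisector of area.

```python
import math


def get_midpoint(points):
    """Return the x of the vertical line that splits the area under a
    piecewise-linear curve into two equal halves.
    points: (x, y) pairs sorted by x, with y >= 0."""
    segments = list(zip(points, points[1:]))
    areas = [(x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in segments]
    total = sum(areas)
    if total == 0:
        raise ValueError("curve encloses no area")
    target = total / 2
    for ((x0, y0), (x1, y1)), area in zip(segments, areas):
        if area < target:
            target -= area  # halfway mark lies further right
            continue
        # Inside this segment y = y0 + m*t with t = x - x0, so the area from
        # x0 to x0 + t is y0*t + m*t**2/2. Solve that for the remaining target.
        m = (y1 - y0) / (x1 - x0)
        if m == 0:
            t = target / y0
        else:
            disc = max(0.0, y0 * y0 + 2 * m * target)  # guard float round-off
            t = (-y0 + math.sqrt(disc)) / m
        return x0 + t
    return points[-1][0]  # only reached through floating-point drift


print(get_midpoint([(0, 0), (1, 1), (2, 0)]))  # 1.0
```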

Those three combined will take the value from a set of firing rules to a graph to a final scalar value.

More math. I have a solution for the midpoint that involves a whole bunch of odd terms and whatnot, so I have to find some slightly more advanced algebra resources to work out how to rearrange it so that one side is clean and the other is icky but solvable.
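Assuming the shape is made of straight segments between graph points (which may not match the model here), the algebra inside one segment works out as follows. The segment runs from (x0, y0) to (x1, y1) with slope m. Let t = x - x0, and let A* be the area still needed to reach half the total.

```latex
\begin{aligned}
m &= \frac{y_1 - y_0}{x_1 - x_0}, \qquad
A(t) = \int_0^t (y_0 + m s)\,ds = y_0 t + \frac{m}{2} t^2, \\
A(t) = A^* &\iff \frac{m}{2} t^2 + y_0 t - A^* = 0, \\
t &= \frac{-y_0 + \sqrt{y_0^2 + 2 m A^*}}{m} \quad (m \neq 0),
\qquad t = \frac{A^*}{y_0} \quad (m = 0).
\end{aligned}
```

The + root is the one that falls inside the segment. For m < 0, the other root lies past the point where the line would cross zero. The midpoint is then x0 + t.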

Frustration about the math model is building, and I'm going to log off and spend some time with my old friend pen and paper to work the kinks out and get the proper formulas translated into Perl.

This math model may be wrong, it may be totally wrong, it may be completely counter to what Kosko meant when he wrote the damn thing, but by Chao it is going to sing and dance when I code it up.

Okay, despite every worst effort on my part, I've managed to locate the heart of this thing. It's the curve calculator I'm about to write, which was previously an icky thing whose behavior changed depending on the number of incoming points.

Translation: I've started to work on the general method that will actually work, instead of the hacky-specific method that didn't really work at all.
