I have now been able to get the introspector Perl scripts to run on the output of rdfproc, which is part of Redland. All you need to use this now are the Redland libraries, and there are Debian packages for them. You can use many tools on this RDF; take a look at http://librdf.org for more information.
You are going to want these packages for Debian:

    librdf-perl     - Perl language bindings for the Redland RDF library
    librdf0         - Redland RDF Application Framework
    librdf0-dev     - Redland RDF library development libraries and headers
    libraptor1      - Raptor RDF Parser library
    libraptor1-dev  - Raptor RDF parser and serializer development libraries and headers
There are two forms of RDF, N-Triples and RDF/XML. You can use either with the introspector like this (the example uses the N-Triples form):
1. gunzip the file:

    gunzip c-dump.rdf.gz

2. Make a Redland repository:

    rdfproc Global parse ntriples file:/

   Global is the name of the repository. file:/ is the base address, which can be whatever URI you want.
That will create a repository in the current directory using Berkeley DB:

    6.2M Global-po2s.db -- predicate-object index (used to find by field)
    9.0M Global-so2p.db -- subject-object index (not used)
    9.5M Global-sp2o.db -- subject-predicate index (graph traversal)
    25M  total
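To make the role of the three index files more concrete, here is a minimal in-memory sketch in Python. It is not Redland's implementation, only a toy analogue under the assumption that each index maps a pair of triple parts to the remaining part. The class name, the node names, and the predicates (rdf:type, ex:name) are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


class TinyTripleStore:
    """Toy analogue of the three hash indexes (not Redland's actual code)."""

    def __init__(self):
        self.sp2o = defaultdict(set)  # (subject, predicate) -> objects
        self.po2s = defaultdict(set)  # (predicate, object)  -> subjects
        self.so2p = defaultdict(set)  # (subject, object)    -> predicates

    def add(self, s, p, o):
        # Every triple is written to all three indexes.
        self.sp2o[(s, p)].add(o)
        self.po2s[(p, o)].add(s)
        self.so2p[(s, o)].add(p)

    def subjects(self, p, o):
        # "Find by field": which subjects have this predicate/object pair?
        return sorted(self.po2s.get((p, o), ()))

    def objects(self, s, p):
        # Graph traversal: follow predicate p outward from subject s.
        return sorted(self.sp2o.get((s, p), ()))


store = TinyTripleStore()
# Hypothetical triples; these names are made up for the example.
store.add("node:1", "rdf:type", "node_types:function_decl")
store.add("node:1", "ex:name", "node:2")
store.add("node:2", "rdf:type", "node_types:identifier_node")
store.add("node:3", "rdf:type", "node_types:function_decl")

functions = store.subjects("rdf:type", "node_types:function_decl")
name_nodes = store.objects("node:1", "ex:name")
print(functions)
print(name_nodes)
```

Storing each triple three times is what makes lookups from any two known parts cheap, and it is also why the index files together are several times larger than the input.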
So you have about 25 MB of indexes (roughly 6 to 9.5 MB each) for a 500 KB gzipped N-Triples file.
The unpacked sizes are here:

    13M  Nov 28 15:34 c-dump.rdf
    4.7M Nov 28 15:34 c-dump.ntriples
wc (word count) on c-dump.ntriples gives 96,818 lines, 387,292 words, and 4,846,776 chars.
The original source file, c-dump.i (expanded with headers), has 13,270 lines, 27,221 words, and 260,051 chars (254K from ls).
So we are talking about roughly a 10x increase in size for indexing.
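The size figures quoted above can be turned into explicit ratios. The short Python sketch below only does arithmetic on the numbers already given in the text; the variable names are my own, and the exact factor you get depends on which two sizes you compare, so treat the 10x figure as a ballpark.

```python
# All figures are copied from the text above; nothing here is measured anew.
index_mb = {"po2s": 6.2, "so2p": 9.0, "sp2o": 9.5}  # Berkeley DB index files
ntriples_mb = 4.7                                    # unpacked c-dump.ntriples
ntriples = {"lines": 96_818, "words": 387_292, "chars": 4_846_776}
source = {"lines": 13_270, "words": 27_221, "chars": 260_051}  # c-dump.i

# Total index size and its ratio to the N-Triples file it was built from.
total_index_mb = sum(index_mb.values())
index_vs_ntriples = total_index_mb / ntriples_mb

# Growth from the preprocessed C source to the N-Triples dump.
ntriples_vs_source = {key: ntriples[key] / source[key] for key in source}

print(f"indexes: {total_index_mb:.1f} MB, "
      f"{index_vs_ntriples:.1f}x the N-Triples file")
for key, ratio in ntriples_vs_source.items():
    print(f"{key}: {ratio:.1f}x the preprocessed source")
```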
For example, I have installed the introspector into my home directory: /home/mdupont/EXPERIMENTS/introspector/introspector-0.7. The CVS version is up to date. You can also download the release from sf.net.
So, to use it, go to the directory containing the RDF database files and run:

    perl -I/home/mdupont/EXPERIMENTS/introspector/introspector-0.7 ~/EXPERIMENTS/introspector/introspector-0.7/recurse5.pl node_types:function_decl file:/
node_types:function_decl is the node type that I am looking for; other interesting ones can be found in the Introspector/GCCTypes.pm file.
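If you want to try several node types in a row, it may help to build the command line programmatically. The following Python sketch only assembles the argument list for the recurse5.pl call shown above; it does not run Perl. The helper name recurse_command is hypothetical, and the install path is simply the one from my setup, so substitute your own.

```python
import shlex


def recurse_command(install_dir, node_type, base_uri="file:/"):
    """Build the argv list for recurse5.pl (hypothetical helper, not part of introspector)."""
    install_dir = install_dir.rstrip("/")
    script = install_dir + "/recurse5.pl"
    # -I adds the install directory to Perl's module search path.
    return ["perl", "-I" + install_dir, script, node_type, base_uri]


cmd = recurse_command(
    "/home/mdupont/EXPERIMENTS/introspector/introspector-0.7",
    "node_types:function_decl",
)
# Print the command in a form that can be pasted into a shell.
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, cwd=...) with cwd set to the directory that holds the Global-*.db files.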
I hope that you take some time and play around with the introspector. It is not running perfectly, but it is fast!