Older blog entries for ruoso (starting at number 19)

Fortaleza.PM

The recent message from Gabor Szabo to all the leaders of the Perl groups sounded to me like a reminder of a debt I've owed for a long time: getting things moving in the local group. Let me start with a short history of the group.

The group was officially created in 2005, as an attempt to draw more attention to Perl in the local Free Software community. We shared social meetings with the local Debian group, which gathered at the Benfica mall on alternate Saturdays to drink some beer. I was able to mention Perl once or twice.

At the same time, I presented some talks at local free software events, especially FLISOL, which also served as advertising. Even though I didn't see the impact at the time, I was surprised to see a Perl workshop, presented by someone else, in one of the first editions of SESOL (Free Software Week, today CESOL - Free Software Congress).

The main problem with this process was that at the time my daily job was in Java/J2EE, which limited my ability to contribute effectively to the language (my CPAN uploads have a gap from that period).

But in fact, I think the major problem is connected to the dismantling of a team that, I believe, had the means to keep a local group moving on its own. This team worked for Inova, my first job whose primary function was Perl programming -- in a way, Alex (the company's director) was kind of my mentor in learning the language, as he was the one who pushed me into uploading my first CPAN module: Server::FastPL (kind of like FastCGI, but for regular scripts using Unix sockets; today it lives only on BackPAN).

Inova had a Perl lab with around 12 people that developed the webmail service Velop, which supported around 300k active email accounts around 2000. This team learned and matured together in web development, and even built its own framework: not MVC, but it got the job done.

This team was dismantled around the end of 2001, and with it went the opportunity to have an active group here in Fortaleza.

In 2006 I moved to Lisbon, where I lived until 2008, which also didn't help the local group, which didn't even have a leader during that period.

When I came back to Fortaleza, on the other hand, I came back working on Perl again and, most importantly, in a good coding rhythm: I came back as a SMOP developer (a Perl 6 implementation) and as the core developer of the Sistema de Atendimento, a Perl Catalyst solution using SOAP over XMPP, which was published in the Brazilian Public Software Portal and has already been mentioned by President Lula and by Rogério Santanna (logistics and IT secretary of the planning ministry).

This is getting things moving again around here. At the same time, I joined a development team at the Prefeitura de Fortaleza to maintain the Sistema de Atendimento, and there I'm lobbying for the spread of Perl. The result is that two people are already inside the Catalyst world and getting excited about it, there's the prospect of a workshop at a local college, and we're already going to have an Epitáfio hackathon at CESOL. And, the other day, when I was in a meeting at CGDT, I saw someone passing by with a copy of "Programming Perl".

Soon we should have support from Oktiva, which uses Catalyst to deploy its sites, to give Fortaleza.PM a new home. I think I'm on the right track.

Syndicated 2009-10-12 09:54:21 from Daniel Ruoso

Missing :ignoreaccent in Perl 5

So I'm working with natural language processing, and one of the steps in that is tokenization. For those not familiar with the term, it means taking an "expression" and splitting it into its several "parts". In SQL, for instance, SELECT * FROM foo WHERE bar='baz' is composed of the tokens 'SELECT', '*', 'FROM', 'foo', 'WHERE', 'bar', '=', "'baz'". This is not very surprising; a naive look would suggest I could just split on word boundaries, but that's not the case, since quoted strings can contain spaces and escape characters. Still, it's a very simple and well-documented syntax, so tokenization is fairly easy. In fact, you can look at SQL::Tokenizer from the fellow Brazilian monk izut.
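
Just to illustrate, here is a minimal sketch of what that looks like with SQL::Tokenizer (the query string is simply the example from the paragraph above):

use strict;
use warnings;
use SQL::Tokenizer;

# tokenize returns the list of tokens, including whitespace tokens
my @tokens = SQL::Tokenizer->tokenize(q{SELECT * FROM foo WHERE bar='baz'});
print "[$_]\n" for @tokens;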

But unlike SQL, natural languages are anything but simple and well documented, and there are no conformance requirements, because the text is intended to be parsed by another human being, not a machine. During this project I realized I was a bit behind in the literature, since I was still using Chomsky as my reference, and the linguistics field has already moved away from that: it now accepts that grammars "statistically emerge" from language use, and that a "top-down" approach, which defines a general grammar model and uses it to classify the text, is not that useful in the long term, since language use evolves too fast, and its variability across media and social environments creates conflicting definitions of the grammar. Any resemblance to Perl 6's ability to have custom grammars changing the way the code is parsed is not a mere coincidence.

So the first problem with tokenizing natural language texts is the locale problem. In German, "ß" would be normalized to "ss", while in Greek it would be "normalized" to "b", and in other languages it should probably be ignored as noise (yes, people often use foreign characters as bullet markers or other decorators). Of course I could do a two-phase tokenization, first doing a naive pass to discover the language, then a second pass assuming that locale, but I decided to do something allegedly more clever, which is to ignore the accents when removing the non-important characters, so I can use a simple \W match to strip the non-alpha characters.

The thing I miss from Perl 6 is that there you could just use the :ignoreaccent modifier, so the match would already be done against the base characters. What is a simple regex modifier in Perl 6 needs to be done in Perl 5 as:

use strict;
use warnings;
use Text::Unaccent;   # provides unac_string
use utf8;             # the literal below contains non-ASCII characters
use Encode;
use 5.10.0;

my $str = 'têmó... åçèñŧos!!!';

# unac_string wants encoded octets and returns the text with the accents stripped
my $unac = unac_string('utf8', encode('utf8', $str));

# Match \W against the unaccented copy and use the match offsets to blank
# the same positions in $d (4-arg substr replaces in place).
my $d = $unac;
(my $words = $unac) =~ s/(\W)/substr($d,$-[0],1,' ')/ge;

say $str;
say $unac;
say $words;
say $d;

And with that code, I have a much easier task when tokenizing natural languages...

Syndicated 2009-08-30 12:46:05 from Daniel Ruoso

Transactions and Authorization made simple

So I really like to follow DRY: Don't Repeat Yourself. During the development of Epitafio (a cemetery management system I mentioned earlier), I was working on my model classes - note that these are not DBIC models, but regular models that access a DBIC schema - and I realized that for every single method of the models I would need to do two things:

  • Enclose code in a transaction, much like:
    $schema->txn_do(sub { ... })
  • Authorize the user against a specific role:
    die 'Access denied!' unless $user->in_role('foo')

So I started wondering on #catalyst whether there would be a pretty way of doing it. I was already using Catalyst::Component::InstancePerContext, but mst quickly guided me to avoid saving the context itself in the object and instead to get the values I need from it. Since my app models will basically all follow this same principle, I wrote a model superclass:

package Epitafio::Model;
use Moose;
with 'Catalyst::Component::InstancePerContext';
has 'user' => (is => 'rw');
has 'dbic' => (is => 'rw');

# Built once per request: keep only the values we need from the context,
# never the context itself.
sub build_per_context_instance {
  my ($self, $c) = @_;
  $self->new(user => $c->user->obj,
             dbic => $c->model('DB')->schema->restrict_with_object($c->user->obj));
}
1;

Note that I'm still using C::M::DBIC::Schema as usual, but I'm additionally making a local DBIC schema that is restricted according to the logged-in user. Check DBIx::Class::Schema::RestrictWithObject for details on how that works, and mst++ for the tip.

Ok, now my model classes can know which user is logged in (in a Cat-independent way) as well as have access to the main DBIC::Schema used in the application. Now we just need to DRO - Don't Repeat Ourselves.

Again following mst++'s tip, I decided against a fancier solution and went for something plain and simple:

txn_method 'foo' => authorize 'rolename' => sub {
   ...
};

For those who didn't get how that is parsed, this could be rewritten as:

txn_method('foo',authorize('rolename',sub { }))

This works as:

  • authorize receives a role name and a code ref and returns a code ref that does the user role check before invoking the actual code.
  • txn_method receives the method name and a code ref and installs, in the caller's package namespace, a new coderef that wraps the given coderef in a transaction, as if it were a regular sub definition.

That means you can have a txn_method without authorization, but you would need something like

*foo = authorize 'rolename' => sub { ... };

to get authorization without a transaction. But since in my application I'll usually want both, I thought it suffices the way it is.

But for the txn_method ... authorize thing to parse, both subs need to be in the package namespace at BEGIN time. To solve that without having to re-type them every time, I wrote a simple Epitafio::ModelUtil module that exports these helpers.

package Epitafio::ModelUtil;
use strict;
use warnings;
use base 'Exporter';

our @EXPORT = qw(txn_method authorize);

# Install $code as a method named $name in the caller's package,
# wrapped in a DBIC transaction.
sub txn_method {
  my ($name, $code) = @_;
  my $method_name = caller().'::'.$name;
  no strict 'refs';
  *{$method_name} = sub {
    $_[0]->dbic->txn_do($code, @_);
  };
}

# Wrap $code in a role check against the logged-in user.
sub authorize {
  my ($role, $code) = @_;
  return sub {
    if ($_[0]->user->in_role($role)) {
      $code->(@_);
    } else {
      die 'Access Denied!';
    }
  };
}

1;

And now the model code looks pretty and non-repetitive ;). See the sources for the full version.
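
For illustration, a model class using these helpers could look roughly like the following sketch (the class, resultset and role names here are made up for the example, not taken from the Epitafio sources):

package Epitafio::Model::Funeral;
use Moose;
extends 'Epitafio::Model';
use Epitafio::ModelUtil;

# Runs inside a transaction and only for users in the 'operator' role.
txn_method 'register_burial' => authorize 'operator' => sub {
  my ($self, $args) = @_;
  $self->dbic->resultset('Burial')->create($args);
};

1;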

Syndicated 2009-08-18 11:14:09 from Daniel Ruoso

SMOPP5 first steps

After a long time wondering when this day would come, today Paweł Murias created a GitHub fork of the perl interpreter so we can start working on the integration of SMOP and perl5.

Some of you might have heard me say that the major reason for SMOP to exist today is the prospective integration with the perl5 interpreter, so we can use Perl 6 while still being able to use all of CPAN, including the things that depend on XS, like the fantastic Gtk2-Perl suite.

In fact, I've been blocking pmurias on some things, like replacing the refcounting with a tracing GC in SMOP, exactly because that would make SMOP incompatible with perl5, and I really want them to cooperate.

This integration should happen at the deepest level of perlguts, where the perl5 interpreter should play the role of the SMOP interpreter and every SV* is also a SMOP__Object*.

Paweł has added smop/base.h to the p5 repo and I started adding the SMOP__ResponderInterface* member to some p5 values (right now _SV_HEAD, which defines the first members of every SV value, and the PerlInterpreter). This is the first step that will allow SMOP to use P5 objects without the need for a proxy value.

After talking with nothingmuch on #p5p, I decided to note here the first set of goals of the SMOPP5 integration:

  • Making every perl value a SMOP__Object*
  • Implementing Responder Interfaces for each of these values
  • Implementing the SMOP interpreter and continuation class APIs in the perl5 interpreter (using Coro::State for now)
  • Have SMOP objects visible in perl5 using proxy objects as already happens today

This set gives us the SMOP->P5 integration; after that we're going to need the P5->SMOP integration, which should involve hacking on every p5 macro in the core, which is a *lot* of hacking, so I won't include it in our goals for now, for sanity's sake!

Syndicated 2009-08-12 22:46:57 from Daniel Ruoso

Far More Than You Ever Wanted To Know About Typeglobs, Closures and Namespaces

These are the slides of a presentation I gave at a tech meeting in Lisbon about two years ago. The slides' text is in Portuguese, but I'm pretty sure they are understandable for non-Portuguese speakers too.

Far More Than You Ever Wanted To Know About Typeglobs, Closures and Namespaces

Syndicated 2009-08-11 16:52:18 from Daniel Ruoso

Too much Perl 6

So, yesterday I was giving a quick perl workshop using Catalyst. The idea was to write a blog in 3 hours. At some point I wrote the following code:

sort { (stat $_)[10] } glob 'posts/*';

And it didn't work, because Perl 5 doesn't DWIM when the sort routine takes only one argument, while Perl 6 notices that and sorts the elements by the value the routine returns. Basically, Perl 6 implements the Schwartzian transform in its core.
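
For the record, this is roughly what the Perl 5 version needed instead -- either an explicit two-argument comparison or a hand-rolled Schwartzian transform (keeping field 10 of stat, as in the snippet above):

# explicit comparison block, calling stat twice per comparison
my @by_ctime = sort { (stat $a)[10] <=> (stat $b)[10] } glob 'posts/*';

# or the Schwartzian transform, calling stat only once per file
my @cached =
  map  { $_->[1] }
  sort { $a->[0] <=> $b->[0] }
  map  { [ (stat $_)[10], $_ ] } glob 'posts/*';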

Syndicated 2009-07-31 09:08:08 from Daniel Ruoso

Epitáfio

The Perl monks from Brazil accepted a challenge with important social relevance. We are working, in our spare time, on the development of a system to manage public cemeteries. Few people know it, but public cemeteries play a fundamental role in the respect for human rights, as well as in public health, giving the population that doesn't have the resources to pay for a tombstone in a private cemetery a memorial service and a decent burial.

At this moment we have already established the features for the first release, and the expectation is to have the system working in the cemetery by "dia de Finados" (the day in memory of those who have passed away). Yesterday we finished the first proposal for the data model (the software will be developed in Portuguese, but I guess you can figure out the meaning of the words).

The discussions regarding the system happen in the #brasil-pm channel on irc.perl.org; we have a space on the perl.org.br wiki to document the development process and a GitHub space to host the source code.

The system will have a Web interface and is going to use PostgreSQL, especially because of its timestamp-related features and also because of the possibility of using PostGIS, which would allow us to store spatial information about the tombstones and the cemetery map.

As the development goes on, I'll post updates here.

Syndicated 2009-07-30 07:42:49 from Daniel Ruoso

Implementing SOAP in Perl today

I started looking at SOAP around 2001, and at that time SOAP::Lite was the only viable option to both consume and produce SOAP services in Perl. Everybody knew SOAP::Lite was a mess, but it was pretty much what we got and we were stuck with it.

In 2001 SOAP::Lite wasn't much of a problem, because it implements the RPC/Encoded format for SOAP messages, and at that time this was pretty much the default. But the usage of SOAP has changed, a lot, and the module didn't catch up with those changes.

First of all, we need to understand that SOAP is much more than a way to do RPC. If you need plain RPC without much management, then XML::RPC or JSON::RPC is just fine and will work without any concerns. But if, on the other hand, you need strict management of the data being transferred, as well as proper documentation of which services use which data types, then SOAP is probably the thing you need.

But to use SOAP, there are two aspects you need to understand:

Style

The style describes the general semantics of the service. There are two different styles in SOAP:

  • Document: This style is used for services that represent the submission of some arbitrary data, which may or may not expect a result. One example where the Document style makes sense is submitting a purchase in a B2B system.
  • RPC: This style is used when you're accessing a procedural or object-oriented API of some sort; its semantics define that one specific resource supports several operations.

The biggest practical difference is that in the Document style you submit one or more XML documents as the message, and that document is the resource to be used by the operation. In the RPC style, the first and only child of the body element names the operation this message is trying to execute, and the parameters of that operation are the direct children of that main element.
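
As a rough sketch (reusing the UnitConversion element defined later in this post; the convert_unit wrapper name is just illustrative), the two bodies would look like:

 <!-- Document style: the document itself is the only child of the body -->
 <soap:Body>
  <unit:UnitConversion>...</unit:UnitConversion>
 </soap:Body>

 <!-- RPC style: an element named after the operation wraps the parameters -->
 <soap:Body>
  <unit:convert_unit>
   <UnitConversion>...</UnitConversion>
  </unit:convert_unit>
 </soap:Body>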

Body Use

Independent of the style of the service, the other aspect that governs how SOAP works is the "use" of the body. There are two types of "use":

  • Encoded: This refers to SOAP-Encoding, which is a specific serialization format described by the SOAP protocol itself. In the early days of SOAP, this was the main way of exchanging data.
  • Literal: This defines that the content of the message is encoded according to some arbitrary XML Schema. This has the advantage of being able to represent any data that XML can, and also decouples the serialization format from the language providing the service.

Style/Use

Considering all that, there are four ways of using SOAP:

  • RPC/Encoded: This is the only format supported by SOAP::Lite, and it is hardly used in new services.
  • RPC/Literal: This allows you to use arbitrary XML data as the parameters and response of an API that you want to expose.
  • Document/Encoded: While this mix is theoretically possible, I have never heard of any use of it.
  • Document/Literal: This is a type of service where you have only one operation per endpoint; it usually doesn't map well to RPC semantics, since it would require several endpoints to implement the whole API.

WARNING: Microsoft created a pseudo-standard called Document/Literal-Wrapped, which is a mix of RPC/Literal and Document/Literal. The service is described as Document/Literal, but the XML Schemas of the service are made considerably more complex in order to specify the wrapper element that would otherwise be natural in the RPC style. It also requires the SOAPAction header to define which operation is being called; that header is HTTP-specific and was supposed to be used for routing purposes, not to define which operation is being invoked. Please use RPC/Literal when you need RPC semantics with Literal body use; although I have implemented a compatibility mode for this aberration, it should not be promoted in any way.

Finally, let's implement some SOAP.

The first thing you need to do when implementing a SOAP service is to decide whether your service has RPC or Document semantics. In our example, I'm going to implement a Document-style "unit converter" service, which receives a document describing the amount, the source unit and the target unit, and returns a document with the target amount filled in.

An example of an RPC-style service would be an API including operations like get_available_source_units(), get_available_target_units() and other related operations. In our case we are going to assume that the document itself provides all the information we need and that the associated metadata (the XML Schema) will guarantee us sane data.

The second thing you need to do is describe your data in the XML Schema format, so for our "unit converter" service we're going to have a UnitConversion element. If you're used to XML, you know that you need a namespace URI for your data and your services; I'm going to use http://daniel.ruoso.com/categoria/perl/soap-today as the namespace for both the data and the service.

<xsd:schema elementFormDefault="qualified"
 targetNamespace="http://daniel.ruoso.com/categoria/perl/soap-today"
 xmlns:xsd="http://www.w3.org/2001/XMLSchema"
 xmlns:unit="http://daniel.ruoso.com/categoria/perl/soap-today">
 <xsd:element name="UnitConversion">
  <xsd:complexType>
   <xsd:sequence>
    <xsd:element name="input">
     <xsd:complexType>
      <xsd:sequence>
       <xsd:element name="amount" type="xsd:double" />
       <xsd:element name="unit" type="xsd:string" />
      </xsd:sequence>
     </xsd:complexType>
    </xsd:element>
    <xsd:element name="output">
     <xsd:complexType>
      <xsd:sequence>
       <xsd:element name="amount" type="xsd:double" />
       <xsd:element name="unit" type="xsd:string" />
      </xsd:sequence>
     </xsd:complexType>
    </xsd:element>
   </xsd:sequence>
  </xsd:complexType>
 </xsd:element>
</xsd:schema>

The above XML Schema will support data that looks like:

 <UnitConversion>
  <input>
   <amount>10</amount>
   <unit>kg</unit>
  </input>
  <output>
   <unit>pound</unit>
  </output>
 </UnitConversion>

Now that you have specified what your data looks like, you need to specify what your service looks like. That is done using the Web Service Description Language -- WSDL. Basically, you define:

  • Messages: Each message describes a single message being exchanged, independent of the Message Exchange Pattern. The WS-I says that you should have "Request" and "Response" as part of the message names, but I disagree, since a message might be used for both request and response.
  • Port Type: The Port Type describes the interface of the service, independent of the transport, basically grouping the messages into a specific "Message Exchange Pattern" -- usually this is request/response, but you can also have request-only and response-only.
  • Binding: The binding associates a given port type with some transport mechanism and describes how the messages will be transported. It's in the binding that you declare that you want to use SOAP, as well as the style/use.
  • Service: The service groups a set of bindings, defining where the endpoints are for the client to access the services.

I'm not going to explain every detail of the WSDL here, but I think it's pretty straightforward. WSDLs are complex when generated by tools like .NET, but they don't need to be. This WSDL assumes that the previous XML Schema is saved as unit-conversion.xsd.

<wsdl:definitions
  xmlns:http="http://schemas.xmlsoap.org/wsdl/http/"
  xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
  xmlns:unit="http://daniel.ruoso.com/categoria/perl/soap-today"
  targetNamespace="http://daniel.ruoso.com/categoria/perl/soap-today">
 <wsdl:import namespace="http://daniel.ruoso.com/categoria/perl/soap-today"
              location="unit-conversion.xsd" />
 <wsdl:message name="ConvertUnit">
  <wsdl:part name="UnitConversion" element="unit:UnitConversion" />
 </wsdl:message>
 <wsdl:portType name="ConvertUnit">
  <wsdl:operation name="convert_unit">
   <wsdl:input message="unit:ConvertUnit" />
   <wsdl:output message="unit:ConvertUnit" />
  </wsdl:operation>
 </wsdl:portType>
 <wsdl:binding name="ConvertUnitSOAPHTTP" type="unit:ConvertUnit">
  <soap:binding transport="http://schemas.xmlsoap.org/soap/http" style="document"/>
  <wsdl:operation name="convert_unit">
   <wsdl:input>
    <soap:body use="literal"/>
   </wsdl:input>
   <wsdl:output>
    <soap:body use="literal"/>
   </wsdl:output>
  </wsdl:operation>
 </wsdl:binding>
 <wsdl:service name="ConvertUnitService">
  <wsdl:port name="ConvertUnit" binding="unit:ConvertUnitSOAPHTTP">
   <soap:address location="http://localhost/myservice" />
  </wsdl:port>
 </wsdl:service>
</wsdl:definitions>

Ok, now let's really implement it

Now that we've gone through all the overhead of SOAP (and I'll say it again: if you think this overhead is overkill, go use XML::RPC or JSON::RPC instead; but if you need the services to be strictly documented, SOAP is what you need), we can implement the service itself. And for that we're going to use Catalyst::Controller::SOAP.

I'm not going to explain how to create a Catalyst application; there are lots of tutorials on how to do it, and if you're still reading this, you're probably aware of Catalyst already. So, after you create your Catalyst application, you need a controller that will implement the service. All you need to do is subclass Catalyst::Controller::SOAP and implement the service itself.

package MyApp::Controller::UnitConverter;
use strict;
use warnings;
use base qw(Catalyst::Controller::SOAP);

__PACKAGE__->config->{wsdl} =
  {wsdl => '/usr/share/unit-converter/schemas/UnitConverter.wsdl',
   schema => '/usr/share/unit-converter/schemas/unit-conversion.xsd'};

# The WSDLPort attribute binds this action to the 'ConvertUnit' port
# declared in the WSDL above.
sub convert_unit :WSDLPort('ConvertUnit') {
    my ($self, $c, $unit_conversion) = @_;
    my $data = $unit_conversion->{UnitConversion};
    if ($data->{input}{unit} eq 'kg' &&
        $data->{output}{unit} eq 'pound') {
      $data->{output}{amount} =
        2.20462262 * $data->{input}{amount};
      $c->stash->{soap}->compile_return($unit_conversion);
    } else {
      $c->stash->{soap}->fault({ code => 'SOAP-ENV:Client', reason => 'unsupported' });
    }
}

1;

And voilà! You have a SOAP service running. Please refer to the Catalyst::Controller::SOAP docs for more details, or just ask me, either in the comments or in #catalyst on irc.perl.org.
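
And if you want to poke at the service from the client side, here is a minimal sketch using XML::Compile::WSDL11 from CPAN (not covered in this post; the file names follow the examples above, and the exact request/response structure depends on how XML::Compile maps the message parts, so check the trace if it doesn't match):

use strict;
use warnings;
use XML::Compile::WSDL11;
use XML::Compile::SOAP11;
use XML::Compile::Transport::SOAPHTTP;

my $wsdl = XML::Compile::WSDL11->new('UnitConverter.wsdl');
$wsdl->importDefinitions('unit-conversion.xsd');

# compile a client for the operation declared in the WSDL
my $convert = $wsdl->compileClient('convert_unit');
my ($answer, $trace) = $convert->(
    UnitConversion => {
        input  => { amount => 10, unit => 'kg' },
        output => { unit   => 'pound' },
    },
);
print $answer->{UnitConversion}{output}{amount}, "\n";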

Syndicated 2009-07-20 15:23:05 from Daniel Ruoso

Dice Game Perl 6

Following SF, I thought I could present an interesting solution to the dice game as in If you only had one programming language to choose –or– Let the FUD be with you.

SF rewrote the same algorithm in Perl 6, but I thought I could give a more Perl 6-ish approach to the problem, leading to the following code:

sub dice($bet, $dice) {
  given $dice {
    when * <=  50 {          0 }
    when * <=  66 {       $bet }
    when * <=  75 { $bet * 1.5 }
    when * <=  99 {   $bet * 2 }
    when * == 100 {   $bet * 3 }
  }
}
sub MAIN($bet, $plays) {
  my $money = 0;
  $money += dice($bet, int(rand() * 100)+1) for ^$plays;
  say "{$bet * $plays}\$ became $money\$ after $plays plays:
     You get {$money / ($bet * $plays)}\$ for a dollar";
}

Let's go through the code step-by-step...

sub MAIN

This is a very handy thing that comes natively in Perl 6: if you declare a signature for the specially named subroutine MAIN, this signature will be used as GetOpt-style instructions. In the code above I asked for two positional arguments, which means two command-line parameters:

perl6 dice.pl 40 100

But I could also ask for named parameters and it would require named command-line switches. Very handy.
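
For instance, a quick sketch (not in the original code) of a named-parameter MAIN:

sub MAIN(:$bet!, :$plays!) {
    say "betting $bet, $plays times";
}
# invoked as: perl6 dice.pl --bet=40 --plays=100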

for ^$plays

The prefix:<^> operator, when used with a number, generates a Range from 0 to that number minus 1, so it is the same as 0..($plays - 1); but since the number of the play is not important here, it has the same effect as 1..$plays... Very handy too.

"{$bet * $plays}"

Quotes in Perl 6 are clever: you can open a curly brace inside a string and type an expression that will be evaluated and interpolated.

when * <= 50

This is the Whatever in action. It generates a closure that takes one parameter; when knows about that and smartmatches by sending the "given" value to it. Very handy indeed.
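
A tiny illustration (not from the original post) of the closure that * <= 50 builds:

my $check = * <= 50;   # a closure that takes one argument
say $check(42);        # true
say $check(80);        # false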

Syndicated 2009-07-06 17:04:59 from Daniel Ruoso

How do we get out of this mess?

The Perl vs DarkPAN issue has been evolving a lot these days; I'd like to explore a viable option for getting out of this mess.

Let me just review the options that are clearly not acceptable to one side or the other:

  • Have the defaults for Perl 5 changed, in a backward-incompatible way, favoring Modern Perl
  • Enforce backward-compatibility in order to preserve the huge amount of software running in Perl around the world.

It seems pretty clear that going to either extreme will not build any consensus, so I was thinking about the way "perl -E" works on 5.10, and I was wondering if we could do one of:

  • Have the defaults changed, but with a switch that would enable "backward-compatible mode"
  • Have the defaults kept, but with a switch that would enable "modern-perl mode"

Basically, what I'm proposing is that we assume there are two Perls in perl, and a switch lets you select which one you're going to work with. Even if the "modern-perl mode" is assumed to be the default, simply adding a switch to the interpreter invocation on old systems wouldn't be very hard.

Syndicated 2009-07-05 09:08:10 from Daniel Ruoso
