Tumblelog by Soup.io

March 23 2013

Build queries for Lucene/Solr indexes in PHP

We use Solr a lot at InterNations. Besides the usual full-text searches, we use it whenever we need to retrieve documents at almost no cost. It is fast, it is stable and, after some wrestling, our data import works very well too.

For more than a year we have been adding more and more functionality to a component for building Solr/Lucene queries programmatically. We provide two different ways to create queries: for complex queries we use an expression builder, for simpler ones we have a string-based class. A huge advantage of a programmatic API is security: while Lucene’s query language is read-only and therefore non-destructive, query injections can still lead to serious data breaches, which both components help to avoid by escaping input strings.
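To illustrate why escaping matters, here is a minimal sketch of quoting user input as a Lucene phrase. The function quote_lucene_phrase() is a hypothetical helper written for this example and is not the API of the released component; it only relies on the fact that inside a quoted phrase the backslash and the double quote are the characters that need escaping.

```php
<?php
// Hypothetical helper for illustration only; the released component has its own API.
function quote_lucene_phrase($value)
{
    // Inside a quoted phrase only the backslash and the double quote need escaping.
    // Backslashes are replaced first so the ones added for quotes are not doubled.
    $escaped = str_replace(array('\\', '"'), array('\\\\', '\\"'), $value);

    return '"' . $escaped . '"';
}

// Input that tries to break out of the phrase and add another clause.
$userInput = 'foo" OR secret:"true';
$query = 'name:' . quote_lucene_phrase($userInput);

echo $query, "\n";
```

Without the escaping step the input above would close the phrase early and smuggle a second clause into the query.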

We now feel they are mature and stable enough to be released to the public: say hello to internations/solr-query-component.

January 17 2013

Functional programming in PHP

PHP has traditionally been a simple, procedural language that took a lot of inspiration from C and Perl, both syntax-wise and in making sure function signatures are as convoluted as possible. PHP 5.0 introduced a proper object model but you know all of that already. PHP 5.3 introduced closures and PHP 5.4 improved them considerably (hint: $this is available by default).

What is functional programming anyway?

After a few years of introducing more and more functional elements to my source code, it is not that straightforward to answer. I still do not have a coherent definition, but “I know it when I see it”. Let me put it that way: functional programs generally do not alter state, but use pure functions. Pure functions take a value and return another value without altering their input arguments. The opposite example is a typical setter in an object oriented context.
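A small sketch of the contrast between a setter and a pure function. The class Counter and the function increment() are made-up names for this illustration.

```php
<?php
// Impure: a typical setter changes the state of the object it is called on.
class Counter
{
    public $value = 0;

    public function setValue($value)
    {
        $this->value = $value;
    }
}

// Pure: the result depends only on the argument and nothing is modified.
function increment($value)
{
    return $value + 1;
}

$start = 1;
$next = increment($start);

echo $start, ' ', $next, "\n"; // 1 2
```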

Typical functional programming languages support higher order functions, that is, functions that take or return other functions. A lot of them support a concept that is called currying or partial function application (PFA).
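As a rough sketch of both ideas in PHP: partial() below is a higher order function, because it takes a function and returns a new one, and what it returns is a partially applied version of its input. The helper name partial() is an assumption made for this example, not a built-in.

```php
<?php
// Hypothetical helper: fixes the first argument of $fn and returns a new function.
function partial($fn, $firstArgument)
{
    return function () use ($fn, $firstArgument) {
        // Prepend the fixed argument to whatever the caller passes later.
        $arguments = array_merge(array($firstArgument), func_get_args());

        return call_user_func_array($fn, $arguments);
    };
}

$add = function ($a, $b) {
    return $a + $b;
};
$addTen = partial($add, 10);

echo $addTen(5), "\n"; // 15
```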

Additional characteristics found in functional programming languages are elaborate type systems that e.g. use option types to prevent null pointer issues typical for imperative or object oriented programming languages.

Functional programming has a lot of desirable attributes: not fumbling with state makes parallelism easier (not easy, it’s never easy), focusing on the smallest unit of reusable code, a function, can lead to really interesting effects with regards to reusability, requiring functions to be deterministic is generally a good idea for stable software.

What does PHP have to offer?

PHP is not a “real” or “pure” functional language. Far from it. We don’t have a proper type system, the cool kids make fun of our exotic syntax for closures and we have array_walk() that looks functional but allows altering state.

Nevertheless, there are a few interesting building blocks for functional programming. Let’s start with call_user_func, call_user_func_array and $callable(). call_user_func takes a callback and a list of arguments and invokes the given callback with the given arguments. call_user_func_array does something similar, except that it takes an array of arguments. That’s pretty much the same as fn.call() and fn.apply() in JavaScript (without passing a scope). A less well-known shiny new thing in PHP 5.4 is the ability to invoke callables directly. callable is a meta type in PHP (as in: it consists of various underlying types): a callable can be a string to call simple functions, an array of <string,string> to call static methods, an array of <object,string> to call object methods, an instance of Closure or anything implementing the __invoke() magic method, also known as a functor. It looks like this:

$print = 'printf';
$print("Hello %s\n", 'World');

Additionally, PHP 5.4 introduces a new type hint “callable” that provides a simple contract for the callable meta type.
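A short sketch of the callable type hint accepting the different callable forms described above. apply_twice() and Doubler are names invented for this example; it assumes PHP 5.4 or newer.

```php
<?php
// Hypothetical function: the callable hint rejects anything that cannot be invoked.
function apply_twice(callable $fn, $value)
{
    return $fn($fn($value));
}

class Doubler
{
    public static function double($x)
    {
        return $x * 2;
    }

    public function __invoke($x)
    {
        return $x * 2;
    }
}

echo apply_twice('strrev', 'abc'), "\n";                     // abc
echo apply_twice(array('Doubler', 'double'), 1), "\n";       // 4
echo apply_twice(new Doubler(), 1), "\n";                    // 4
echo apply_twice(function ($x) { return $x + 1; }, 1), "\n"; // 3
```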

PHP also supports anonymous functions. As I said before, the Haskell community made fun of us but hey, we finally have it. The jokes are somewhat expected, because the syntax is somewhat verbose. Let’s see a simple Python example.

map(lambda v: v * 2, [1, 2, 3])

Nice. Let’s see a little Ruby example:

[1, 2, 3].map{|x| x * 2}

Also pretty, although we use a block here and not strictly a lambda expression. Ruby has lambda as well, but Array#map happens to take a block, not a function. The next example is Scala:

List(1, 2, 3).map((x: Int) => x * 2)

For a strictly typed language that is pretty compact. Now look at PHP:

array_map(function ($x) {return $x * 2;}, [1, 2, 3]);

A function keyword and no implicit return is what makes it look quite cumbersome. But anyway, it works. Another building block for functional programming.

array_map is a good start; there is also array_reduce. Both are important building blocks.
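A minimal sketch of the two functions working together. Note the differing parameter order: array_map() takes the callback first, array_reduce() takes the array first.

```php
<?php
$numbers = [1, 2, 3];

// array_map() takes the callback first, the array second.
$doubled = array_map(function ($x) { return $x * 2; }, $numbers);

// array_reduce() takes the array first, then the callback and an initial value.
$sum = array_reduce($doubled, function ($carry, $x) { return $carry + $x; }, 0);

echo $sum, "\n"; // 12
```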

A real world functional example

Let’s start with a simple program to calculate totals in a shopping cart:

$cart = [
    [
        'name'     => 'Item 1',
        'quantity' => 10,
        'price'    => 9.99,
    ],
    [
        'name'     => 'Item 2',
        'quantity' => 3,
        'price'    => 5.99,
    ]
];
 
function calculate_totals(array $cart, $vatPercentage)
{
    $totals = [
        'gross' => 0,
        'tax'   => 0,
        'net'   => 0,
    ];
 
    foreach ($cart as $position) {
        $sum = $position['price'] * $position['quantity'];
        $tax = $sum / (100 + $vatPercentage) * $vatPercentage;
        $totals['gross'] += $sum;
        $totals['tax'] += $tax;
        $totals['net'] += $sum - $tax;
    }
 
    return $totals;
}
 
calculate_totals($cart, 19);

Yes, it is a simple example that will only work for a single market, but it’s a halfway complicated calculation and we can easily refactor it into a more functional style.

Let’s use higher order functions first:

$cart = [
    [
        'name'     => 'Item 1',
        'quantity' => 10,
        'price'    => 9.99,
    ],
    [
        'name'     => 'Item 2',
        'quantity' => 3,
        'price'    => 5.99,
    ]
];
 
function calculate_totals(array $cart, $vatPercentage)
{
   $cartWithAmounts = array_map(
       function (array $position) use ($vatPercentage) {
           $sum = $position['price'] * $position['quantity'];
           $position['gross'] = $sum;
           $position['tax'] = $sum / (100 + $vatPercentage) * $vatPercentage;
           $position['net'] = $sum - $position['tax'];
           return $position;
       },
       $cart
   );
 
   return array_reduce(
       $cartWithAmounts,
       function ($totals, $position) {
           $totals['gross'] += $position['gross'];
           $totals['net'] += $position['net'];
           $totals['tax'] += $position['tax'];
           return $totals;
       },
       [
           'gross' => 0,
           'tax'   => 0,
           'net'   => 0,
       ]
   );
}
 
calculate_totals($cart, 19);

Now we no longer alter state, not even in the function itself. array_map() returns a new array of cart positions with gross, tax and net amounts, and array_reduce() puts together the totals array. But can we go further? Can we make it much simpler?

What if we destructure the program further and abstract it to what it really does:

  • Sum an element of an array multiplied by another element
  • Take away a percentage of that sum
  • Calculate the difference between the percentage and the sum

Now we need a little helper. This little helper is functional-php, a small library of functional primitives I’ve been developing for a few years now. First, there is Functional\pluck() that does the same as _.pluck() from underscore.js. Another helpful function is Functional\zip(). It “zips” together two lists, optionally using a callback. Functional\sum() sums the elements of a list.

use Functional as F;
$cart = [
    [
        'name'     => 'Item 1',
        'quantity' => 10,
        'price'    => 9.99,
    ],
    [
        'name'     => 'Item 2',
        'quantity' => 3,
        'price'    => 5.99,
    ]
];
 
function calculate_totals(array $cart, $vatPercentage)
{
    $gross = F\sum(
        F\zip(
            F\pluck($cart, 'price'),
            F\pluck($cart, 'quantity'),
            function($price, $quantity) {
                return $price * $quantity;
            }
        )
    );
    $tax = $gross / (100 + $vatPercentage) * $vatPercentage;
 
    return [
        'gross' => $gross,
        'tax'   => $tax,
        'net'   => $gross - $tax,
    ];
}
 
calculate_totals($cart, 19);
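To see what pluck, zip and sum do without installing the library, the gross calculation can be approximated with built-ins only. This is a sketch, not how functional-php is implemented; pluck_key() is a hypothetical stand-in written for this example.

```php
<?php
$cart = [
    ['name' => 'Item 1', 'quantity' => 10, 'price' => 9.99],
    ['name' => 'Item 2', 'quantity' => 3, 'price' => 5.99],
];

// Stand-in for pluck: collect one key from every row.
function pluck_key(array $rows, $key)
{
    return array_map(function (array $row) use ($key) { return $row[$key]; }, $rows);
}

// array_map() with two arrays walks them in parallel,
// which is what zipping with a callback does.
$lineTotals = array_map(
    function ($price, $quantity) { return $price * $quantity; },
    pluck_key($cart, 'price'),
    pluck_key($cart, 'quantity')
);

// Stand-in for sum.
$gross = array_sum($lineTotals);

echo round($gross, 2), "\n";
```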

A good counter argument is: is that really easier to read? At first: no, at a second look: you’ll get used to it. It took me a while to get used to the syntax of Scala, it took a while to learn object oriented programming and it takes a while to grasp functional programs. Is this the perfect solution? No. But it shows what you can do by thinking more in terms of applying functions to data structures instead of using constructs like foreach to handle work on data structures.

What else can we do?

Ever had issues with null pointer exceptions? There is php-option that provides an implementation of a polymorphic “maybe type” using a PHP object.

Then there is partial application: it transforms a function that takes n parameters into a function that takes fewer than n parameters. Why is that helpful? Think about extracting the first character from each string in a list.

The boring way:

$list = ['foo', 'bar', 'baz'];
$firstChars = [];
foreach ($list as $str)  {
    $firstChars[] = substr($str, 0, 1);
}

The functional’ish way without PFA (partial function application):

array_map(function ($str) {return substr($str, 0, 1);}, ['foo', 'bar', 'baz']);

The way with PFA using reactphp/curry (my favorite currying implementation for PHP):

use React\Curry;
array_map(Curry\bind('substr', Curry\…(), 0, 1), ['foo', 'bar', 'baz']);

Yes, … (HORIZONTAL ELLIPSIS, U+2026) is a valid function name in PHP. But if you do not like that, use Curry\placeholder() instead.

The end

Functional programming is a fascinating topic and if I needed to name a single thing that I learned the most from in recent years, it would be looking into functional paradigms. It’s so different it will make your brain hurt. But in a good way.
Ah, one last thing: read Real World Functional Programming. It’s full of good advice and real world examples.

Update

Thank you Christopher Jones for fixing the higher order function example (the second step).

Update II

Thank you Anthony Ferrara for pointing out that the array_map example was wrong. Gotta love parameter ordering.

Update III

There is a Russian translation.

December 20 2012

Polite Exceptions – Fixing the stepchild of API design

Every API has a visible, an invisible and a hidden part. The visible part is obvious: public methods and properties but also constants and parameter values. That’s the most visible part to any client (read: user) of your API. The invisible part is everything private: you can’t really see it and, more importantly, you can’t use it (except if you resort to reflection). The hidden part consists of all the protected symbols, as you can’t really see them until you extend a class. The other hidden part is exceptions. You can’t really see them and there is no common expectation of which methods throw which kind of exception. Yes, @throws docblocks help, but that’s mostly all we have.

Exception handling: the problem

The usability of hidden parts of an API is all about expectations: people love languages like Ruby because once you have learned a certain set of APIs (e.g. the string API), you can instinctively infer a large part of other APIs. This is good and keeps learning costs down. PHP, on the other hand, with its historically grown standard extensions, is on the opposite side of the fence: parameter order varies and the naming scheme is unreliable at best.
The future is multi-lingual, you need to know more than one programming language and speed of learning matters. Like, a lot. Because “X programmers”, for any value of X, are weak players. What type of exception a class might throw should be defined by clear expectations for the general case. If you use a hypothetical HTTP client httpFoo, call method request() and want to handle exception cases, what exactly do you catch?

Talent borrows, genius steals

Zend Framework 2 has a lot of problems but there are two things they did particularly well: the naming of abstract classes and interfaces, and how they treat exceptions. Every component (see, component is a lie here, as they aren’t really standalone components, but I digress) has its own exception subpackage which holds component-specific exceptions. Those exceptions all implement a single marker interface called ExceptionInterface. If you use Zend\Something and want to handle all exceptions, just catch Zend\Something\Exception\ExceptionInterface.
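The marker interface pattern can be sketched in a few lines. All class names below are invented for the hypothetical httpFoo client and are not taken from Zend Framework; namespaces are left out only to keep the sketch in a single file.

```php
<?php
// Marker interface: it declares nothing, it only tags exceptions of this component.
interface HttpFooExceptionInterface
{
}

// Component exceptions extend the matching SPL exception and carry the marker.
class HttpFooRuntimeException extends RuntimeException implements HttpFooExceptionInterface
{
}

class HttpFooInvalidArgumentException extends InvalidArgumentException implements HttpFooExceptionInterface
{
}

$caught = null;
try {
    throw new HttpFooInvalidArgumentException('Invalid URL');
} catch (HttpFooExceptionInterface $e) {
    // Catches every exception of the component, whatever its SPL base class.
    $caught = $e;
}

echo get_class($caught), ': ', $caught->getMessage(), "\n";
```

A client that only cares about one failure mode can still catch InvalidArgumentException or the specific class, so nothing is lost by adding the marker.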

Programming transaction costs

Time to relevant data is the new time to market. We no longer optimize for feature-complete products shipping on a certain date but relevant changes generating relevant data as soon as possible. Therefore programmer round-trips matter. I consider everything that is not core domain or core UI a round trip:

  • I need to create another config file
  • I need to write another test
  • I need to add a few more specific exception classes
  • I need to write a new contract

These steps aren’t worthless, they are worth less from a business perspective as they don’t generate revenue very soon. However they are needed to keep revenue over time. So let’s make those things cheaper.

When dealing with Exceptions in Symfony 2 projects, two steps are particularly expensive:

  • Creating the initial Exception infrastructure for a bundle
  • Creating new specific Exception classes for bundles

Especially the latter can be simplified quite dramatically.

Simplifying

To simplify Exception handling, we just open sourced a bundle we developed at InterNations. Let’s create a few custom exceptions:

php app/console exception:generate app/src/MyVendor/MyBundle "MyVendor\MyBundle" \
 ExceptionInterface RuntimeException DomainException RuntimeException:SpecificRuntimeException
Create directory app/src/MyVendor/MyBundle/Exception
Writing app/src/MyVendor/MyBundle/Exception/ExceptionInterface.php
Writing app/src/MyVendor/MyBundle/Exception/RuntimeException.php
Writing app/src/MyVendor/MyBundle/Exception/SpecificRuntimeException.php
Writing app/src/MyVendor/MyBundle/Exception/DomainException.php

Let’s rewrite an existing bundle to use custom exceptions:

php app/console exception:rewrite app/src/MyVendor/MyBundle "MyVendor\MyBundle"
Found bundle specific exception class BadFunctionCallException
Found bundle specific exception class BadMethodCallException
Found bundle specific exception class DomainException
Found bundle specific exception class InvalidArgumentException
Found bundle specific exception class LengthException
Found bundle specific exception class LogicException
Found bundle specific exception class OutOfBoundsException
Found bundle specific exception class OutOfRangeException
Found bundle specific exception class OverflowException
Found bundle specific exception class RangeException
Found bundle specific exception class RuntimeException
Found bundle specific exception class UnderflowException
Found bundle specific exception class UnexpectedValueException
...............
------------------------------------------------------------
------------------------------------------------------------
SUMMARY
------------------------------------------------------------
------------------------------------------------------------
Files analyzed:               15
Files changed:                1
------------------------------------------------------------
"throw" statements found:     2
"throw" statements rewritten: 1
------------------------------------------------------------
"use" statements found:       1
"use" statements rewritten:   1
"use" statements added:       1
------------------------------------------------------------
"catch" statements found:     0

You’ll find the ExceptionBundle over at GitHub. It uses PHP Parser to rewrite code, which once again proves to be a wonderful project.

December 02 2012

The state of meta programming in PHP

Quoting Wikipedia

Metaprogramming is the writing of computer programs that write or manipulate other programs (or themselves) as their data, or that do part of the work at compile time that would otherwise be done at runtime

Metaprogramming is quite an interesting sub-discipline and knowing about certain techniques and tools allows you to cut corners quite dramatically for certain tasks. As always, don’t overdo it, but to find out when you are overdoing it, first start doing, get excited, overdo, find out the right dose. Let’s have a look at what kind of tools you have available in PHP to solve typical meta programming problems.

What kind of questions can meta programmatic APIs answer?

I would group metaprogramming into three sub areas: type introspection, lower level syntax inspection and metadata management. Typical type introspection questions are:

  • How many arguments does this function have
  • What kind of types does this function take
  • Is this class abstract
  • In what namespace is this class defined

On a lower level you typically interact with a certain kind of syntax tree to answer questions like:

  • Where is an array declaration happening
  • Where does a method start, where does it end
  • Do all switch statements have a break statement

A third category is adding metadata to the declared types: Java, C# and a few others have first-class annotation support for this kind of thing but PHP only has user space solutions so far. A few things you need metadata for:

  • This property is stored in the database as column foo
  • I need the dependency Bar here
  • This method should be access-protected according to the rules of the DSL I put here
  • This method returns a value of type ABCD

The toolkit

Reflection APIs

PHP core delivers 2.5 key APIs for meta programming. The first one is ext/reflection. You can create reflection classes from a lot of things (functions, classes, extensions) and use them to make programmatic assumptions about the APIs you are introspecting.

A simple example to find out the number of required parameters for each method in the class DirectoryIterator:

<?php
$class = new ReflectionClass('DirectoryIterator');
foreach ($class->getMethods() as $method) {
    $numberOfRequiredParameters = $method->getNumberOfRequiredParameters();
}
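The type introspection questions listed earlier (number of arguments, abstract or not, namespace) map directly to methods of the reflection classes. A small sketch, where Shape is a hypothetical class declared only for this example:

```php
<?php
// Shape is a hypothetical class declared only for this sketch.
abstract class Shape
{
    abstract public function scale(array $factors, $precision = 2);
}

$class = new ReflectionClass('Shape');
$method = $class->getMethod('scale');

var_dump($class->isAbstract());                     // bool(true)
var_dump($class->getNamespaceName());               // string(0) ""
var_dump($method->getNumberOfParameters());         // int(2)
var_dump($method->getNumberOfRequiredParameters()); // int(1)
```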

Reflection is all nice and shiny, except when you don’t want to include everything you want to inspect. This is of interest if you inspect various source trees at once that declare duplicate symbols. To do so, there is PHP-Token-Reflection by Ondřej Nešpor. It’s a pretty nifty replacement for ext/reflection completely built in user land and on top of ext/tokenizer that even copes with invalid declarations. Additionally it fixes some oddities of the internal reflection API but tries to keep it as close as possible. I’ve played around with it a bit and I quite like it.

<?php
$broker = new TokenReflection\Broker(new TokenReflection\Broker\Backend\Memory());
$broker->processDirectory("path/to/src");
$class = $broker->getClass('MyClass');
foreach ($class->getMethods() as $method) {
   ...
}

Tokenizer

Another core API, this time much more low level, is ext/tokenizer. If enabled at compile time it allows you to parse PHP source code into a list of tokens. Because the API is so low level it is quite hard to use without a proper abstraction layer on top of it. Most of the successful projects built upon ext/tokenizer have built one. One of them is phpcs by Greg Sherwood, which built a token stream abstraction on top of ext/tokenizer that allows much more convenient navigation in the token stream. Another one shipping its own token stream abstraction is pdepend by Manuel Pichler. Another noteworthy, standalone abstraction is php-manipulator.
For an example on how the raw API can be used, I once wrote this little script to apply a few transformations to source trees to ease converting source trees to PHP 5.4.
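To give an idea of how low level the raw API is, here is a minimal sketch (not the script mentioned above) that collects the names of all named functions in a piece of source code with token_get_all(). It assumes the simple case where the function keyword is followed by whitespace and then the name.

```php
<?php
$source = '<?php function foo() { return [1, 2]; } function bar() {}';

$functionNames = array();
$tokens = token_get_all($source);
foreach ($tokens as $index => $token) {
    // Tokens are either plain strings such as "{" or arrays of [id, text, line].
    if (is_array($token) && $token[0] === T_FUNCTION) {
        // For a named function the next tokens are whitespace and the T_STRING name.
        $functionNames[] = $tokens[$index + 2][1];
    }
}

print_r($functionNames);
```

Even this tiny task needs index arithmetic on the token list, which is why most projects wrap the tokenizer in a stream abstraction.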

PHP Parser: a fully fledged AST parser for PHP

Between a high level API like Reflection and a low level API like ext/tokenizer there surely is a gap: what if I want to work on an AST data structure. There is this beautiful project PHP-Parser by Nikita Popov. This is quite interesting for more complex transformations like user space AOP, all kinds of static code analysis and so on. If ext/tokenizer feels way underpowered, have a look at this project.

Aspect oriented programming

While we are talking about AOP: a relative newcomer is PECL AOP that provides a quite simple API for aspect-oriented programming in PHP. For Zend Framework 2 there is also an AOP module available. Let’s stick to AOP for a moment: for Symfony 2 there is JMSAopBundle by Johannes Schmitt. It provides basic AOP functionality for Symfony 2. JMSSecurityExtraBundle and JMSDiExtraBundle use it to provide annotation support for the Symfony security bundle and the Symfony dependency injection component.

Metadata management

Traditionally, every docblock documentation parser rolled its own annotation system. This changed a little with the rise of Symfony and Doctrine 2. Doctrine 2 allows you to use annotations for persistence definition and Symfony allows you to use annotations for a lot of things (routes, security, etc.). While Doctrine still ships its own metadata handling component in doctrine-common, there is another library by Johannes Schmitt, Metadata, that aims to consolidate metadata handling for PHP. The API of the Metadata library as well as the one of doctrine-common is quite simple: you have some sort of annotation reader that maps metadata information to classes. Think about this annotation:

<?php
use My\Annotation\Some;
/**
  * @some(foo="bar")
  */
class MyClass
{}

This kind of annotation will map to an instance of My\Annotation\Some with the property $foo set to “bar”.

Radioactive, specialized or obscure

Ever dreamed of renaming functions, redeclaring classes and so on? Let us not discuss whether this is a good idea or not, but if you would like, look no further: there is runkit for that (I think this is the most current fork).

If you want to access the opcodes of your code, Stefan Esser wrote bytekit for you (bytekit.org is no longer available, I only found Tyrael/bytekit and Mayflower/Bytekit). To make working with bytekit data a little more convenient, Sebastian Bergmann wrote bytekit-cli.

To register callbacks at every function call, there is funcall by Chen Ze and intercept by Gabriel Ricard.

One should not forget about xdebug by Derick Rethans that provides a quite specialized sub-sub-sub-discipline: code coverage analysis.

The future

PHP core itself could really use native support for annotations. This would fix little differences in how annotations are used nowadays by major projects. Another very interesting development is quite definitely PHP AOP. I would consider that a candidate for core inclusion at some point.

The userland libraries could see some consolidation and, now that we have composer, dependency management isn’t so much of a problem. Especially in the Symfony 2 world, reusing the same metadata framework would make total sense. A first step is that Zend Framework 2 uses doctrine-common for annotations support.

June 25 2012

Latest sprint at InterNations III

What we did the last two weeks at our little startup, InterNations

  • Tons of bug fixes, tons of general product improvements
  • Learned a lot about HAProxy and keepalived and are going to replace diverse load balancing solutions for different protocols with a single solution
  • Learned to love RabbitMQ (Erlang, mnesia, durable queues, nice failover mechanisms) and reimplemented our internal mailing infrastructure. Lot of work but it looks good so far. While doing that we fixed a number of issues (#62410, #62411, #62412) with PECL amqp and Swiftmailer
  • Got four new machines up and running and (for the better part) into production.
  • Learned more about how to structure our site navigation-wise in the future

June 09 2012

Latest sprint at InterNations II

What we did in the last weeks at InterNations

  • Had an ugly downtime due to our master database crashing
  • Implemented registration via Facebook (with a little help of FOSFacebookBundle)
  • Learned a lot about message queues and decided to go with RabbitMQ and possibly RabbitMqBundle
  • A colleague who developed an enthusiastic hatred for JavaScript in the last years started doing JS with Backbone and thinks it was fun (Yay!)
  • Gradually rolled out our new registration process (still rolling)
  • Got annoyed by Google’s decision to discontinue Google Website Optimizer but still started using it (for the lack of better alternatives)
  • Improved our backup and restore strategy
  • Worked on our product agenda and what to do next

If this stuff sounds interesting and you want to work with us, drop me a mail

May 23 2012

Latest sprint at InterNations

This will be my new series about what product and development does at InterNations. Of course I can’t tell you everything :)

January 15 2012

Drupal as a Content Repository — a few months later

I’ve recently blogged about how we use Drupal as a Content Repository. I wanted to write a lessons learned follow up post to see what worked out and what we needed to adjust.

Where we are

We still use Drupal as a Content Repository and just consume its content data via webservices to let our application do the complicated rendering. We launched the external part of our content in November 2011 (see our Expat Magazine and our Country & City Guides) and the internal part in December (you need to request an account to see it). Development of both parts was smooth, but we reached some limits of what one can do with Drupal’s Views module and we needed to adjust our “no custom code” to “as little custom code as possible”.



October 17 2011

Drupal as a Content Repository

As one of my first projects at InterNations we want to introduce rich content management functionality for internal usage. We have a custom made PHP application and want to publish a bunch of content to provide our customers with an even richer experience and greater service. Our requirements can be read along the lines of:

  • Provide an easy to use interface for content and media management for our editorial team
  • A limited set of fairly complex content types (multi page articles, etc.)
  • CMS features like versioning, custom attributes, workflows
  • Deep integration into our custom application
  • Halfway complex rules based on categories (or taxonomies, as Drupal takes it)
  • A few edits per day, not many per hour


May 06 2011

Dependency Injection Container Refactorings, Part Two

This is part of a mini-series about typical refactorings when using DI containers. Read part one.


(c) Jil A. Brown

Introduce Parameter

When configuring objects you will stumble upon occurrences of duplicated configuration. As configuration duplication is as bad as code duplication, making refactorings and maintenance time-intensive and error-prone, we try to avoid it. The occurrences I ran into ranged from defining the same hosts over and over for different services to quasi hard-coded upload prefixes for files sprinkled all over my configuration. I will illustrate this refactoring with the image upload example. We configure Zend_File_Transfer and add a few validators to allow image uploads but only specific ones:

<?xml version="1.0"?>
<container>
   <services>
      <service id="fileTransferService" class="Zend_File_Transfer">
          …
         <call method="addValidator">
            <argument>Count</argument>
            …
            <argument>photo</argument>
         </call>
         <call method="addValidator">
            <argument>Size</argument>
            …
            <argument>photo</argument>
         </call>
         <call method="addValidator">
            <argument>MimeType</argument>
            …
            <argument>photo</argument>
         </call>
         <call method="addValidator">
            <argument>ImageSize</argument>
            …
            <argument>photo</argument>
         </call>
      </service>
   </services>
</container>

When adding validators to Zend_File_Transfer the fourth argument (in this case photo) is the name of the array key of the file. In our case the markup would look like this:

<input type="file" name="photo"/>

The specific key is important if you allow the upload of various file types in one request. Now we change the requirements and allow not only photos but photos and PDFs (in the same input as photos, so that the user does not need to use different inputs based on file formats). To not mislead the next programmer working on this piece of code, we should change the markup to something like this (give me a better name please):

<input type="file" name="photoOrPdf"/>

Now we open our container configuration and change every occurrence of “photo” to “photoOrPdf” and hope not to forget one. Except the one you’ll find out about two months later. To avoid this duplication of configuration, we introduce a parameter and our container configuration changes.

<?xml version="1.0"?>
<container>
   <parameters>
       <parameter key="filePrefix">photoOrPdf</parameter>
   </parameters>
   <services>
      <service id="fileTransferService" class="Zend_File_Transfer">
          …
          <call method="addValidator">
             <argument>Count</argument>
             …
             <argument>%filePrefix%</argument>
          </call>
          <call method="addValidator">
             <argument>Size</argument>
             …
             <argument>%filePrefix%</argument>
          </call>
          <call method="addValidator">
             <argument>MimeType</argument>
             …
             <argument>%filePrefix%</argument>
          </call>
          <call method="addValidator">
             <argument>ImageSize</argument>
             …
             <argument>%filePrefix%</argument>
          </call>
       </service>
   </services>
</container>

To make things even smoother we could inject that parameter into the view and into the controller to make sure configuration value duplication is no longer an issue with this specific module.

Parameterize Service

Excluded, as I no longer think this is actually a good idea.

Allow Environment Specific configuration

When you have a development process where you pass several acceptance stages before an artefact goes into production, these stages are typically slightly different from each other. Starting from different service IP addresses over single machine vs. multi machine, there will definitely be some variance among them. Typical variances are:

  • Logger settings: severity filters, logging targets like file on development, syslog on the rest
  • Database settings: master with fake slave a.k.a. read-only database user on development, master slave on the rest
  • Error handling modes especially for more introspective components: “Hard fail” vs. “soft fail and log”
  • Caching: no caching on development, caching enabled on testing and production stages
  • Code generation and building: “rebuild on request” on development, once per deployment on testing and production

One way to do so is to sprinkle conditions all over your application and check on which host you are but that will lead to an application well beyond manageability. That’s why I was never happy (at least for large applications >100 person-days) with typical PHP application configurations like the preposterous config.inc.php. Having a Turing-complete programming language at hand for configuration will eventually introduce ugly conditionals making configurations unreadable. But I digress.

There are various models for stage configuration, including inheritance from each former stage, inheritance from a main configuration, standalone configuration and all mixes of these models. All of them can be implemented well with the Symfony 2 dependency injection container. Let’s start with the simplest one, standalone configuration for each stage:

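<?php
use Symfony\Component\Config\FileLocator;
use Symfony\Component\DependencyInjection\ContainerBuilder;
use Symfony\Component\DependencyInjection\Loader\XmlFileLoader;

$container = new ContainerBuilder();
$loader = new XmlFileLoader($container, new FileLocator(…));
$loader->import($currentStage . '.xml');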

A more complicated one is main configuration + override per stage:

<?php
$container = new ContainerBuilder();
$baseLoader = new XmlFileLoader($container, new FileLocator(…));
$baseLoader->import('main.xml');
$stageLoader = new XmlFileLoader($container, new FileLocator(…));
$stageLoader->import($currentStage . '.xml');

The most “complicated” would be linear inheritance, where testing extends development, staging extends testing and so on:

<?php
$container = new ContainerBuilder();
$loader = new XmlFileLoader($container, new FileLocator(…));
foreach (array('development', 'testing', 'staging', 'production') as $stage) {
    $loader->import($stage . '.xml');
    if ($stage == $currentStage) {
        break;
    }
}

With this kind of setup, a file loaded later can override parameters and service definitions from the files loaded before it.

Example main.xml:

<?xml version="1.0"?>
<container>
   <parameters>
        <parameter key="database.name">application</parameter>
        …
   </parameters>
   <services>
       <service id="component" class="MyComponent"/>
       <service id="component2" class="MyComponent2"/>
   </services>
</container>

testing.xml with different database name and an alternative for component2:

<?xml version="1.0"?>
<container>
   <parameters>
        <parameter key="database.name">another_database</parameter>
   </parameters>
   <services>
      <service id="component2" class="MyAlternativeComponent2"/>
   </services>
</container>
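
A rough way to picture the override behaviour is the following sketch. It uses plain PHP arrays as hypothetical stand-ins for main.xml and testing.xml and does not touch the container at all; the function name overrideConfiguration() is made up for illustration.

```php
<?php
// Plain-array stand-ins for main.xml and testing.xml (hypothetical, no container involved)
$main = array(
    'parameters' => array('database.name' => 'application'),
    'services'   => array('component' => 'MyComponent', 'component2' => 'MyComponent2'),
);
$testing = array(
    'parameters' => array('database.name' => 'another_database'),
    'services'   => array('component2' => 'MyAlternativeComponent2'),
);

// Entries from the stage file replace entries with the same key from the base file;
// keys the stage file does not mention are kept as they are.
function overrideConfiguration(array $base, array $stage)
{
    return array_replace_recursive($base, $stage);
}

$effective = overrideConfiguration($main, $testing);

echo $effective['parameters']['database.name'], "\n"; // another_database
echo $effective['services']['component'], "\n";       // MyComponent
echo $effective['services']['component2'], "\n";      // MyAlternativeComponent2
```

Only the entries named in the stage file change, which is what keeps the per-stage files short.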

April 22 2011

PECL mogilefs 0.9.0 released

I just released 0.9.0 of PECL mogilefs. This release comes with a few small API breaks. Basically, whenever there was no open connection, we returned false in the past. We no longer do that; instead we throw an exception of type MogileFsException. So the API breakage will be fairly visible. The complete list of changes:

  • Adding new methods setReadTimeout(float readTimeout) and getReadTimeout(). These can be used to set a read timeout that differs from the connect timeout. In past releases, the connect timeout (to the tracker) was used as a read timeout (to the storage nodes). From my experience the read timeout should be a little bit higher than the connect timeout.
  • Remove PHP max version limit so we no longer have to release a new version when PHP is released. This is what other PECL packages are doing, so I think this will work better.
  • Comply with the stricter C99 standard. Yeah, nerd stuff
  • Fixed tests and made them more robust. Try them: PHP_TEST_EXECUTABLE=<php> php tests.php
  • Optimized mogilefs_sock_read() and introduced maximum message size (based on a patch from Andre Pascha of kwick.de). Less allocs, less frees. Good stuff
  • MogileFs::put() throws more exceptions: as said before

Comments, ideas, patches and anything else are more than welcome. Have fun with this release.

April 19 2011

Dependency Injection Container Refactorings, Part One


(c) Jil A. Brown

Working heavily with the Symfony2 Dependency Injection Container, I feel that we found some typical refactorings towards a DI container that emerge while introducing such a component. I want to write down the preliminary results of trying to systematize them, more or less as a draft. I will use the Symfony2 DI container configuration as an example, but most of the refactorings should be applicable to other containers as well, some of them even to dependency injection without a container.

Make Dependency Explicit

This is typically the first step towards Dependency Injection: make a dependency explicit. There are three typical ways to do so: constructor injection, setter injection and, less preferred, property injection. I generally prefer constructor injection for invariant dependencies in my domain and setter injection for infrastructure (e.g. setNotifier). Consider this example:

<?php
namespace Example;
class Client
{
    public function execute()
    {
        $dependency = new Dependency();
        $dependency->execute();
    }
}

Client creates a new instance of Dependency and calls execute(). This is bad for testing and for configuration, as Dependency will always be hard coded there. To make it easier to manage, we refactor towards setter injection:

<?php
namespace Example;
class Client
{
    private $_dependency;

    public function setDependency(Dependency $dependency)
    {
        $this->_dependency = $dependency;
    }
 
    public function execute()
    {
        $this->_dependency->execute();
    }
}

Now we can manage Client in the DI container like this:

<?xml version="1.0"?>
<container xmlns="http://www.symfony-project.org/schema/dic/services">
    <services>
        <service id="example.client" class="Example\Client">
            <call method="setDependency">
                <argument type="service">
                    <service class="Example\Dependency"/>
                </argument>
            </call>
        </service>
    </services>
</container>

We see that the dependency is explicit: we specifically configure Example\Client and pass a specific Example\Dependency object.
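
For comparison, here is a minimal sketch of the constructor injection variant mentioned at the beginning of this section. The namespace is left out and Dependency is reduced to a stand-in so that the snippet runs on its own; the return value 'executed' is only there for illustration.

```php
<?php
// Minimal stand-in for Example\Dependency so the sketch runs on its own
class Dependency
{
    public function execute()
    {
        return 'executed';
    }
}

class Client
{
    private $_dependency;

    // The dependency is required at construction time, so a Client
    // can never exist without it
    public function __construct(Dependency $dependency)
    {
        $this->_dependency = $dependency;
    }

    public function execute()
    {
        return $this->_dependency->execute();
    }
}

$client = new Client(new Dependency());
echo $client->execute(), "\n"; // executed
```

With setter injection a Client can be created first and wired later; with constructor injection the object is complete as soon as it exists, which is why I prefer it for invariant dependencies.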

Introduce Interface Injection

After a number of Explicit Dependency refactorings our configuration file for the service container will become huge. We will notice that we have common dependencies that are used at various places, an event manager for example. To fix that rapid growth we choose to utilize Interface Injection to ease configuration.

This is the configuration starting point:

<?xml version="1.0"?>
<container xmlns="http://www.symfony-project.org/schema/dic/services">
    <services>
        <service id="example.client" class="Example\Client">
            <call method="setDependency">
                <argument type="service">
                    <service class="Example\Dependency"/>
                </argument>
            </call>
        </service>
        <service id="example.anotherClient" class="Example\AnotherClient">
            <call method="setDependency">
                <argument type="service">
                    <service class="Example\Dependency"/>
                </argument>
            </call>
            <call method="setOtherDependency">
                <argument type="service">
                    <service class="Example\OtherDependency"/>
                </argument>
            </call>
        </service>
    </services>
</container>

We notice that both Example\Client and Example\AnotherClient depend on Example\Dependency. First of all we need an interface that declares setDependency. This is basically the Extract Interface refactoring. We call the newly extracted interface Example\DependencyAware.

The interface:

<?php
namespace Example;
interface DependencyAware
{
    public function setDependency(Dependency $dependency);
}

And we refactor both Example\Client and Example\AnotherClient to implement Example\DependencyAware.

Now we change our configuration to call setDependency no longer explicitly for Example\Client and Example\AnotherClient but for every object implementing Example\DependencyAware.

<?xml version="1.0"?>
<container xmlns="http://www.symfony-project.org/schema/dic/services">
    <services>
        <service id="example.client" class="Example\Client"/>
        <service id="example.anotherClient" class="Example\AnotherClient">
            <call method="setOtherDependency">
                <argument type="service">
                    <service class="Example\OtherDependency"/>
                </argument>
            </call>
        </service>
    </services>
    <interfaces>
        <interface class="Example\DependencyAware">
            <call method="setDependency">
                <argument type="service">
                    <service class="Example\Dependency"/>
                </argument>
            </call>
        </interface>
    </interfaces>
</container>
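
Conceptually, the interface rule boils down to an instanceof check on every service that gets created. The following sketch shows that idea in plain PHP; it is not the container's actual implementation, the namespace is left out, and applyInterfaceInjection() and Unrelated are hypothetical names.

```php
<?php
class Dependency
{
}

interface DependencyAware
{
    public function setDependency(Dependency $dependency);
}

class Client implements DependencyAware
{
    public $dependency;

    public function setDependency(Dependency $dependency)
    {
        $this->dependency = $dependency;
    }
}

// A service that does not implement the interface and must stay untouched
class Unrelated
{
    public $dependency;
}

// Hypothetical helper: inject the dependency into everything that
// implements DependencyAware and leave all other objects alone
function applyInterfaceInjection($service, Dependency $dependency)
{
    if ($service instanceof DependencyAware) {
        $service->setDependency($dependency);
    }
    return $service;
}

$dependency = new Dependency();
$client = applyInterfaceInjection(new Client(), $dependency);
$unrelated = applyInterfaceInjection(new Unrelated(), $dependency);

var_dump($client->dependency === $dependency); // bool(true)
var_dump($unrelated->dependency === null);     // bool(true)
```

The rule is written once, and every new class opts in simply by implementing the interface.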

Expose Service

Really simple, but …
Expose Service is applied when a service has been only a dependency but should be used as a top level service. We start with the well known example:

<?xml version="1.0"?>
<container xmlns="http://www.symfony-project.org/schema/dic/services">
    <services>
        <service id="example.client" class="Example\Client">
            <call method="setDependency">
                <argument type="service">
                    <service class="Example\Dependency"/>
                </argument>
            </call>
        </service>
    </services>
</container>

Suppose we want to expose Example\Dependency as a service directly. We need to change the configuration above to reference the service by ID.

<?xml version="1.0"?>
<container xmlns="http://www.symfony-project.org/schema/dic/services">
    <services>
        <service id="example.dependency" class="Example\Dependency"/>
        <service id="example.client" class="Example\Client">
            <call method="setDependency">
                <argument type="service" id="example.dependency"/>
            </call>
        </service>
    </services>
</container>

Simple.

Next topics would be: Introduce Parameter, Parameterize Service and Allow Environment Specific configuration

February 19 2011

PECL MogileFs 0.8.1 released

On Wednesday, 0.8.1 of PECL MogileFs was released. The new version features a few important changes and fixes:

  • Changing timeout parameter for MogileFs::connect() to float to allow specifying microseconds. This is an important change if you want to do connection pooling for your trackers in PHP. You can now limit the time the client tries to connect to a tracker and connect to an alternative one if this fails
  • Connect timeout does not set read timeout. This change became necessary with the better connect timeout handling and is the whole reason there is a 0.8.1. The previous assumption was to reuse the connect timeout as read timeout. This is no longer feasible. If somebody needs the functionality of setting a specific read timeout, I would be happy to implement that as a specific option though. I personally have no use for it.
  • Fixing arginfo for MogileFs::put(). Yo dawg, I’ve heard you like reflections. So I’ve put some reflection into your reflection so you can reflect while you reflect
  • Adding read timeout handling. Andre Pascha of Kwick provided a patch for better read timeout handling. Previously read timeouts were silently ignored, this behavior has been fixed. Thanks!
  • Adding EOF check before reading/writing to a socket (Andre Pascha)

Although it’s marked as “beta”, I’m fairly confident with this release. We already upgraded production environments to the newest version, so you could too.

Tags: MogileFS PECL PHP

February 02 2011

Suicidal

GWUP, the Society for the Scientific Investigation of Parasciences, is planning a mass swallowing of homeopathic pills this coming Saturday. Munich will be swallowing too.

(via Kristian Köhntopp)

September 05 2010

PHP segfaulting with pecl/uuid and pecl/imagick

Ran into a bug yesterday, where http://pecl.php.net/uuid in combination with http://pecl.php.net/imagick yielded a segfault when using uuid_create(). The GDB backtrace looks like this (without the exact place where it happens in libuuid, as there is unfortunately no libuuid1-dbg package in current Ubuntu versions):

gdb --silent --ex run --args php -r "var_dump(uuid_create());"
#0  0xb6e85321 in ?? () from /lib/libuuid.so.1
#1  0xb6e862bf in uuid_generate () from /lib/libuuid.so.1
#2  0xb6bcc67a in zif_uuid_create (ht=0, return_value=0xbffff1e8, return_value_ptr=0x0, this_ptr=0x0, return_value_used=1) at /usr/src/pecl-uuid-trunk/uuid.c:182
#3  0x0835d26a in zend_do_fcall_common_helper_SPEC (execute_data=0x894ed4c) at /build/buildd/php5-5.3.2/Zend/zend_vm_execute.h:313
#4  0x08333d8e in execute (op_array=0x891c464) at /build/buildd/php5-5.3.2/Zend/zend_vm_execute.h:104
#5  0x082fe283 in zend_eval_stringl (str=0xbffff998 "var_dump(uuid_create());", str_len=24, retval_ptr=0x0, string_name=0x871f2fc "Command line code")
    at /build/buildd/php5-5.3.2/Zend/zend_execute_API.c:1172
#6  0x082fe422 in zend_eval_stringl_ex (str=0xbffff998 "var_dump(uuid_create());", str_len=24, retval_ptr=0x0, string_name=0x871f2fc "Command line code", handle_exceptions=1)
    at /build/buildd/php5-5.3.2/Zend/zend_execute_API.c:1214
#7  0x082fe4a3 in zend_eval_string_ex (str=0xbffff998 "var_dump(uuid_create());", retval_ptr=0x0, string_name=0x871f2fc "Command line code", handle_exceptions=1)
    at /build/buildd/php5-5.3.2/Zend/zend_execute_API.c:1225
#8  0x083a0579 in main (argc=3, argv=0xbffff854) at /build/buildd/php5-5.3.2/sapi/cli/php_cli.c:1235

The interesting thing is, the crash happens in libuuid, but only if imagick is enabled. Let’s see what Valgrind says:

valgrind -q  php -r "var_dump(uuid_create());"
==25103== Invalid write of size 2
==25103==    at 0x5517321: ??? (in /lib/libuuid.so.1.3.0)
==25103==    by 0x55182BE: uuid_generate (in /lib/libuuid.so.1.3.0)
==25103==    by 0x57D0679: zif_uuid_create (uuid.c:182)
==25103==    by 0x835D269: zend_do_fcall_common_helper_SPEC (in /usr/bin/php5)
==25103==    by 0x8333D8D: execute (/build/buildd/php5-5.3.2/Zend/zend_vm_execute.h:104)
==25103==    by 0x82FE282: zend_eval_stringl (/build/buildd/php5-5.3.2/Zend/zend_execute_API.c:1172)
==25103==    by 0x82FE421: zend_eval_stringl_ex (/build/buildd/php5-5.3.2/Zend/zend_execute_API.c:1214)
==25103==    by 0x82FE4A2: zend_eval_string_ex (/build/buildd/php5-5.3.2/Zend/zend_execute_API.c:1225)
==25103==    by 0x83A0578: main (/build/buildd/php5-5.3.2/sapi/cli/php_cli.c:1235)
==25103==  Address 0x30 is not stack'd, malloc'd or (recently) free'd
==25103== 
==25103== 
==25103== Process terminating with default action of signal 11 (SIGSEGV)
==25103==  Access not within mapped region at address 0x30
==25103==    at 0x5517321: ??? (in /lib/libuuid.so.1.3.0)
==25103==    by 0x55182BE: uuid_generate (in /lib/libuuid.so.1.3.0)
==25103==    by 0x57D0679: zif_uuid_create (uuid.c:182)
==25103==    by 0x835D269: zend_do_fcall_common_helper_SPEC (in /usr/bin/php5)
==25103==    by 0x8333D8D: execute (/build/buildd/php5-5.3.2/Zend/zend_vm_execute.h:104)
==25103==    by 0x82FE282: zend_eval_stringl (/build/buildd/php5-5.3.2/Zend/zend_execute_API.c:1172)
==25103==    by 0x82FE421: zend_eval_stringl_ex (/build/buildd/php5-5.3.2/Zend/zend_execute_API.c:1214)
==25103==    by 0x82FE4A2: zend_eval_string_ex (/build/buildd/php5-5.3.2/Zend/zend_execute_API.c:1225)
==25103==    by 0x83A0578: main (/build/buildd/php5-5.3.2/sapi/cli/php_cli.c:1235)
==25103==  If you believe this happened as a result of a stack
==25103==  overflow in your program's main thread (unlikely but
==25103==  possible), you can try to increase the size of the
==25103==  main thread stack using the --main-stacksize= flag.
==25103==  The main thread stack size used in this run was 8388608.
Segmentation fault

Not really any more helpful. After two hours of debugging the issue with the help of Mikko and Pierre, we found out that pecl/imagick is linked against libuuid too:

ldd /usr/lib/php5/20090626+lfs/imagick.so
    (...)
    libuuid.so.1 => /lib/libuuid.so.1 (0xb7086000)
    (...)

Whatever the reason for this, it is most likely the root cause of the issue.

Solution (sort of)

pecl/uuid was loaded by /etc/php5/conf.d/uuid.ini and pecl/imagick by /etc/php5/conf.d/imagick.ini. As they are loaded in alphabetical order, imagick was initialized before uuid. Renaming /etc/php5/conf.d/uuid.ini to /etc/php5/conf.d/00-uuid.ini fixed the issue, as uuid is then initialized before imagick, and the segmentation fault was gone.
Not sure about that, but maybe it would be a good idea to check in PHP_MINIT(uuid) in pecl/uuid if pecl/imagick has been initialized before and warn the user about it?

August 01 2010

Proof of Concept: Binary packed UUIDs as primary keys with Doctrine2 and MySQL

The Problem

For a project I need non-guessable synthetic primary keys. I will use them to construct URIs, and these URIs need to be non-guessable. If I went the traditional route of integer primary keys with auto increments or a sequence table, an attacker could easily increment or decrement the integer to find similar items. The next idea was to use UUIDs or GUIDs. These identifiers are globally unique, so they would work for primary keys too. Reading some documentation on the topic brought up the interesting issue of space usage. Storing the UUIDs in a CHAR column would be a huge waste of space compared to an integer primary key. As primary keys are referenced in related tables, this would be a huge issue. Finally I found a trick: storing their binary representation in a BINARY column. Doing that in MySQL is fairly easy:

INSERT INTO items SET id = UNHEX(REPLACE(UUID(), '-', ''));

Selecting a human readable result is easy too:

SELECT HEX(id) FROM items;

Achieving the same thing in PHP is pretty straightforward too. You need the PECL extension UUID (pecl install uuid) and pack()/unpack():

<?php
$uuid = uuid_create(UUID_TYPE_TIME);
$uuid = str_replace("-", "", $uuid);
var_dump(pack('H*', $uuid));
string(16) "?Irp??ߐ
                   )??m"

Converting them back into their hex representation is similar:

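<?php
// array_shift() takes its argument by reference, so store the unpack() result in a variable first
$unpacked = unpack('H*', $binaryUuid);
var_dump(array_shift($unpacked));
string(32) "d2f268509db211df9010000c29abf06d"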
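
To check that both directions really fit together, here is a small round trip sketch. It uses a fixed sample UUID instead of uuid_create(), so it runs without the PECL extension; the variable names are only illustrative.

```php
<?php
// Fixed sample value so the sketch runs without the PECL uuid extension
$uuid = 'd2f26850-9db2-11df-9010-000c29abf06d';

// Strip the dashes and pack the 32 hex characters into 16 raw bytes
$hex = str_replace('-', '', $uuid);
$binary = pack('H*', $hex);

// Unpack the raw bytes back into the hex representation
$unpacked = unpack('H*', $binary);
$roundTrip = array_shift($unpacked);

echo strlen($binary), "\n"; // 16
echo $roundTrip, "\n";      // d2f268509db211df9010000c29abf06d
```

The 16 bytes are what ends up in the BINARY column, compared to 36 characters for the dashed string form.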

Doctrine2 integration

The next step is integration with Doctrine2. To do so, we need to create a custom mapping type. I’m not using Doctrine2 for database abstraction but for its object relational mapping capabilities, so I ignore portability and concentrate on MySQL.

<?php
namespace Lars\Doctrine2\Types\Mysql;
use Doctrine\DBAL\Types\Type;
use Doctrine\DBAL\Platforms\AbstractPlatform;
 
class BinaryType extends Type
{
    const BINARY = 'binary';
 
    public function getSqlDeclaration(array $fieldDeclaration, AbstractPlatform $platform)
    {
        return sprintf('BINARY(%d)', $fieldDeclaration['length']);
    }
 
    public function getName()
    {       
        return self::BINARY;
    }   
      
    public function convertToPhpValue($value, AbstractPlatform $platform)
    {
        if ($value !== null) {
            $value= unpack('H*', $value);
            return array_shift($value);
        }
    }
 
    public function convertToDatabaseValue($value, AbstractPlatform $platform)
    {
        if ($value !== null) {
            return pack('H*', $value);
        }
    }
}

Now we are introducing the new type to Doctrine2 somewhere in our setup logic:

<?php
use Doctrine\DBAL\Types\Type;
Type::addType('binary', 'Lars\Doctrine2\Types\Mysql\BinaryType');

One issue I stumbled upon was Doctrine2’s default behavior. With MySQL it maps binary types to intermediate blob types (in the Doctrine2 type system). This default behavior is not configurable, so we need to patch Doctrine\DBAL\Schema\MySqlSchemaManager. I’m sure there is a more elegant way and I would love to receive some remarks here:

            case 'tinyblob':
            case 'mediumblob':
            case 'longblob':
            case 'blob':
            /** 
             * Commented out to make our custom mapping work
             * case 'binary':
             */         
            case 'varbinary':
                $type = 'blob';
                $length = null;
                break;

Last part is our entity:

<?php
namespace Lars\User\Domain;
 
/**
 * @Entity
 * @table(name="user",indexes={@index(name="user_email_idx",columns={"user_email"})})
 * @HasLifecycleCallbacks
 */
class User
{
    /**
     * @ID
     * @Column(type="binary",length=16,name="user_id")
     * @GeneratedValue(strategy="NONE")
     */
    protected $_id;
 
    /**
     * @Column(type="string",length=32,name="user_email")
     */
    protected $_email;
 
    public function changeEmail($email)
    {
        $this->_email = $email;
        return $this;
    }
 
    public function getId()
    {
        return $this->_id;
    }
 
    /**
     * @PrePersist
     */
    public function generateUuid()
    {
        $this->_id = str_replace('-', '', uuid_create(UUID_TYPE_TIME));
    }
}

The important part here is the generateUuid() method, which generates the UUID once before persisting the domain object. With GeneratedValue(strategy="NONE") we tell Doctrine not to generate the ID by itself, and with HasLifecycleCallbacks we configure Doctrine to scan for lifecycle callback methods, so that generateUuid() will be called before persisting the entity.

Fetching an object by ID is as easy as ever, but don’t forget to convert the ID:

$user = $em->find(
    'Lars\User\Domain\User',
    pack('H*', '16aec29e9db011df8013000c29abf06d')
);

Further ideas

The whole UUID handling should be refactored towards a UUID value object that encapsulates UUID creation and binary conversion.
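
A first draft of such a value object could look like the following sketch. The class name Uuid and all of its method names are hypothetical, and generate() assumes the PECL uuid extension used earlier in this post; everything else is plain PHP.

```php
<?php
// Hypothetical value object; class and method names are illustrative only
final class Uuid
{
    private $_hex;

    private function __construct($hex)
    {
        $this->_hex = $hex;
    }

    // Accepts the hex form with or without dashes
    public static function fromString($uuid)
    {
        $hex = strtolower(str_replace('-', '', $uuid));
        if (!preg_match('/^[0-9a-f]{32}$/', $hex)) {
            throw new InvalidArgumentException('Not a valid UUID: ' . $uuid);
        }
        return new self($hex);
    }

    // Accepts the 16 raw bytes as stored in a BINARY(16) column
    public static function fromBinary($binary)
    {
        if (strlen($binary) !== 16) {
            throw new InvalidArgumentException('Expected exactly 16 bytes');
        }
        $unpacked = unpack('H*', $binary);
        return new self(array_shift($unpacked));
    }

    // Creates a new time based UUID; needs the PECL uuid extension
    public static function generate()
    {
        if (!function_exists('uuid_create')) {
            throw new RuntimeException('The PECL uuid extension is not loaded');
        }
        return self::fromString(uuid_create(UUID_TYPE_TIME));
    }

    public function toBinary()
    {
        return pack('H*', $this->_hex);
    }

    public function toHex()
    {
        return $this->_hex;
    }
}

$uuid = Uuid::fromString('16aec29e-9db0-11df-8013-000c29abf06d');
$same = Uuid::fromBinary($uuid->toBinary());
echo $same->toHex(), "\n"; // 16aec29e9db011df8013000c29abf06d
```

The entity and the mapping type could then pass Uuid instances around instead of calling str_replace() and pack() in several places.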

July 13 2010

April 06 2010

Hagia Sophia