Inline Python macros in C

I had a bit of time today to do some more work on that which is soon to be called something other than csnake. I’ve added a couple of features:

You can now define custom pragmas, providing a Python handler function. Unfortunately Boost Wave, which csnake uses for preprocessing, only provides callbacks for pragmas that start with “#pragma wave”.
Built-in pragmas and macros to support defining Python macros inline in C/C++ code.
A main program in the csnake package. So you can now do “python3.2 -m csnake ”, which will print out the preprocessed source.

So for example, you can do something like the following, entirely in one C++ file:

// factorial macro.
py_def(factorial(n))
    import math
    f = math.factorial(int(str(n)))
    return [Token(T_INTLIT, f)]
py_end

int main()
{
    std::cout << factorial(3) << std::endl;
    return 0;
}

This works as follows: py_def and py_end are macros, which in turn use the _Pragma operator with built-in pragmas. Those pragmas are handled by csnake, and signal to collect the tokens in between. When the py_end macro is reached, the tokens are concatenated and a Python function macro is born.
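To make the collection step concrete, here is a minimal Python sketch. It is not csnake's actual implementation: it works on lines of text rather than preprocessor tokens, `collect_python_macros` is a hypothetical helper, and the macro returns a plain string instead of a list of Token objects.

```python
import re
import textwrap

# Matches a line such as "py_def(factorial(n))" and captures "factorial(n)".
PY_DEF = re.compile(r"^\s*py_def\((\w+\(.*\))\)\s*$")


def collect_python_macros(lines):
    """Return (remaining_lines, macros) for the given source lines.

    Lines between py_def(...) and py_end are gathered, turned into a Python
    function definition, and executed so the function becomes callable.
    """
    remaining, macros = [], {}
    signature, body = None, []
    for line in lines:
        match = PY_DEF.match(line)
        if signature is None and match:
            signature = match.group(1)          # start collecting
        elif signature is not None and line.strip() == "py_end":
            source = "def %s:\n%s\n" % (
                signature,
                textwrap.indent(textwrap.dedent("\n".join(body)), "    "),
            )
            namespace = {}
            exec(source, namespace)             # define the macro function
            name = signature.split("(", 1)[0]
            macros[name] = namespace[name]
            signature, body = None, []
        elif signature is not None:
            body.append(line)                   # inside a py_def block
        else:
            remaining.append(line)              # ordinary C/C++ line
    return remaining, macros


source = [
    "py_def(factorial(n))",
    "    import math",
    "    return str(math.factorial(int(n)))",
    "py_end",
    "int main() { return 0; }",
]
remaining, macros = collect_python_macros(source)
print(remaining)
print(macros["factorial"]("3"))
```

The real thing operates on the token stream, which is why the macro-replacement problem mentioned in this post arises; a line-based sketch like this one sidesteps it.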

I’m intending to add more “Python blocks”, including at least a py_for block, which will replicate the tokens within the block for each iteration of a loop.

There’s one big problem with the py_def support at the moment, which is that the tokens go through the normal macro replacement procedure. I think I’ll have a fix for that soon.

Posted June 18, 2011.

Name clash

Rats, looks like I didn’t do enough homework on the name “csnake”. Turns out there’s another Python-based project called CSnake: https://github.com/csnake-org/CSnake/. Incidentally - and not that it matters much - I had the name picked out before that project was released. Now I need to think up another puntastic name.

Posted June 15, 2011.

C-Preprocessor Macros in Python

TL;DR: I’ve started a new project, csnake, which allows you to write your C preprocessor macros in Python.

Long version ahead…

You want to do what now?

I had this silly idea a couple of years ago, to create a C preprocessor in which macros can be defined in Python. This was born out of me getting sick of hacking build scripts to generate code from data, but pursued more for fun.

I started playing around with Boost Wave, which is a “Standards conformant, and highly configurable implementation of the mandated C99/C++ preprocessor functionality packed behind an easy to use iterator interface”. With a little skulduggery coding, I managed to define macros as C++ callable objects taking and returning tokens. Then it was a simple matter of adding a Python API.

The Result

What we end up with is a Python API that looks something like this:

import sys
from _preprocessor import *
def factorial(n):
    import math
    return [Token(T_INTLIT, math.factorial(int(str(n))))]

p = Preprocessor("test.cpp")
p.define(factorial)
for t in p.preprocess():
    sys.stdout.write(str(t))
sys.stdout.write("\n")

Which will take…

int main(){return factorial(3);}

And give you…

int main(){return 6;}

If it’s not immediately clear, it will translate “factorial()” into an integer literal of the factorial of the input token. This isn’t a very interesting example, so if you can imagine a useful application, let me know ;)

The above script will work with the current code, using Python 3.2, compiling with GCC. If you’d like to play with it, grab the code from the csnake github repository. Once you’ve got it, run “python3.2 setup.py build”. Currently there is just an extension module (“csnake._preprocessor”), so set your PYTHONPATH to the build directory and play with that directly.

I have chosen to make csnake Python 3.2+ only, for a couple of major reasons:

All the cool kids are doing it: it’s the way of the future. But seriously, Python 3.x needs more projects to become more mainstream.
Python 3.2 implements PEP 384, which allows extension modules to be used across Python versions. Finally. I always hated that I had to recompile for each version.

… and one very selfish (but minor) reason: I wanted to modernise my Python knowledge. I’ve been ignoring Python 3.x for far too long.

The Road Ahead

What I’ve done so far is very far from complete, and not immediately useful. It may never be very useful. But if it is to be, it would require at least:

A way of detecting (or at least configuring) pre-defined macros and include paths for a target compiler/preprocessor. A standalone C preprocessor isn’t worth much. It needs to act like or delegate to a real preprocessor, such as GCC.
A #pragma to define Python macros in source, or perhaps if I’m feeling adventurous, something like #pydefine.
A simple, documented Python API.
A simple command line interface with the look and feel of a standard C preprocessor.
Some unit tests.

I hope to add these in the near future. I’ve had code working for the first two points, and the remaining points are relatively simple. I will post again when I have made some significant progress.

Posted June 11, 2011.

Pushy 0.5 Released

After just a few short months since the 0.4 release, I am pleased to announce Pushy 0.5. As usual, this release is mostly bug fixes, with a couple of features to make life easier. Instructions on downloading and installing can be found here:
http://awilkins.id.au/pushy/install.php

On top of standard bug fixes and features, I have made an effort to beautify some of the code, and speed everything up in general, based on code profiling. This will be evident if your Pushy program performs a lot of proxying: 0.5 performs up to ~2x faster than 0.4 in my tests.

New Features

There are two new features in Pushy 0.5: with-statement support, and a simple “execute” interface.

With-statement support
Pushy connections now support the “with” statement. This is functionally equivalent to wrapping a Pushy connection with “contextlib.closing”: it will close the connection when the with-statement exits. For example:

with pushy.connect("local:") as con:
    …
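For readers unfamiliar with how an object supports the with statement, here is a small sketch. The `Connection` class is a hypothetical stand-in, not Pushy's connection class; it only shows the two methods involved and the equivalent use of contextlib.closing.

```python
import contextlib


class Connection:
    """Hypothetical stand-in for a connection object; not Pushy's class."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    # Supporting "with" directly only needs these two methods.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow exceptions raised in the block


with Connection() as con:
    pass
print(con.closed)

# The equivalent spelling using contextlib.closing on a fresh object.
with contextlib.closing(Connection()) as other:
    pass
print(other.closed)
```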

Compile/execute interface
Previously if you wanted to execute a statement in the remote interpreter, you would have to first obtain the remote “compile” built-in function, invoke it to compile a remote code object, and then evaluate that with the “connection.eval” method. For example, to execute “print ‘Hello, World!’”:

code_obj = con.eval("compile")("print 'Hello, World!'", "<filename>", "exec")
con.eval(code_obj, locals=…, globals=…)

This is a bit unfriendly, so I thought it would be a good idea to add a simpler interface. There are two new methods: “connection.compile” and “connection.execute”. The former will compile a remote code object, and the latter executes either a string (statement) or a function. Continuing our very simple example, we get the much simpler:

con.execute("print 'Hello, World!'")

Under the hood, this will do exactly what we would previously have had to do manually: remotely compile the statement and then evaluate the resultant code object. Now that suffices for a very simple use case like we have discussed above, but what if we want to execute a series of statements? Wouldn’t it be nice if we could remotely execute a locally defined function? Well, now you can, using “connection.compile”.

def local_function(*args, **kwargs):
    return (args, kwargs)
remote_function = con.compile(local_function)
remote_function(…)

The “connection.compile” method will take a locally defined function and define it in the remote interpreter. It will then return a reference to the remote function, so that we can invoke it as if it were a local function. This allows the user to define complex and/or expensive logic that will be executed entirely remotely, only communicating back to the peer when proxy objects are involved, or to return the result.
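The compile-then-evaluate sequence described for the execute method can be tried locally with Python's built-in compile and eval. This sketch runs in a single interpreter and does not use Pushy at all; the `execute` helper here is a hypothetical stand-in that only mirrors the two steps.

```python
def execute(statement, namespace):
    """Compile a statement string, then evaluate the resulting code object."""
    code_obj = compile(statement, "<string>", "exec")
    # For code compiled in "exec" mode, eval runs it and returns None.
    eval(code_obj, namespace)


namespace = {}
execute("greeting = 'Hello, ' + 'World!'", namespace)
print(namespace["greeting"])
```

With Pushy, both steps happen in the remote interpreter, so only the statement string and any results cross the connection.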

Bug Fixes

The most noteworthy bugs fixed in 0.5 are:

#738216 Proxied old-style classes can’t be instantiated
Old-style/classic classes (i.e. those that don’t inherit from “object”) previously could not be instantiated via a Pushy connection. This has been rectified by defining an additional proxy type for old-style classes.

#733026 auto-importer doesn’t support modules not imported by their parent
Previously, importing a remote submodule via “connection.modules” would only work if the parent of the submodule imported it. For example, “connection.modules.os.path” would work since os imports os.path. If the parent does not import the submodule, Pushy would fail to import the module. This has been fixed in 0.5; remotely imported modules will provide the same sort of automagical importing interface as “connection.modules”.

#734311 Proxies are strongly referenced and never deleted; __del__ isn’t transmitted to delete proxied object
This is the most significant change in Pushy 0.5. Since its inception, Pushy has never released proxy objects, meaning eventual memory bloat for objects that would otherwise be garbage collected. As of 0.5, Pushy garbage collects proxies by default. Every so often (by default, every 5 seconds), Pushy will send a message to the peer to let it know which proxies have been garbage collected. This allows the peer to release the proxied objects, and reclaim resources.
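One way to notice that proxies have been collected is with weak references. The sketch below is an assumption about the general technique, not Pushy's code: `Proxy` and `ProxyTracker` are hypothetical names, and the periodic message to the peer is reduced to a `flush` method that a timer could call.

```python
import gc
import weakref


class Proxy:
    """Hypothetical local proxy holding the id of an object on the peer."""

    def __init__(self, remote_id):
        self.remote_id = remote_id


class ProxyTracker:
    """Tracks proxies weakly and records which ones have been collected."""

    def __init__(self):
        self._refs = {}       # remote_id -> weak reference to the proxy
        self._collected = []  # ids waiting to be reported to the peer

    def create(self, remote_id):
        proxy = Proxy(remote_id)

        def on_collect(ref, remote_id=remote_id):
            # Called by the interpreter once the proxy has been collected.
            self._refs.pop(remote_id, None)
            self._collected.append(remote_id)

        self._refs[remote_id] = weakref.ref(proxy, on_collect)
        return proxy

    def flush(self):
        """Return the ids to report and reset the pending list."""
        ids, self._collected = self._collected, []
        return ids


tracker = ProxyTracker()
a = tracker.create(1)
b = tracker.create(2)
del a          # drop the only strong reference to proxy 1
gc.collect()   # make collection explicit on non-refcounting interpreters
reported = tracker.flush()
print(reported)
```

Batching the ids and sending them on a timer keeps the cost of releasing proxies to one small message per interval rather than one per object.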

#773811 Daemon/socket transport is slow
Due to the many small messages sent back and forth by Pushy, the “daemon” transport was found to be quite slow in stress testing. This is due to Nagle’s Algorithm, which delays sending network packets so that they can be combined. This has a considerable impact on the latency of Pushy’s operations, and so it is now disabled by Pushy.
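For reference, Nagle's algorithm is disabled on a TCP socket by setting the TCP_NODELAY option. This is a standalone sketch using the standard socket module, not a copy of Pushy's transport code.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Disable Nagle's algorithm so small writes are sent without delay.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
print(nodelay != 0)
sock.close()
```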

Enjoy!

Posted May 23, 2011.

Fabric and Pushy, together at last

A workmate of mine recently brought Fabric to my attention. If you’re not familiar with Fabric, it is a command-line utility for doing sys-admin type things over SSH. There’s an obvious overlap with Pushy there, so I immediately thought, “Could Pushy be used to back Fabric? What are the benefits, what are the downsides?”

A couple of fairly significant benefits:

Pushy does more than just SSH, so Fabric could conceivably be made to support additional transports by using Pushy (which uses Paramiko), rather than using Paramiko directly.
Pushy provides access to the entire Python standard library, which is largely platform independent. So you could do things like determine the operating system name without using “uname”, which is a *NIX thing. That’s a trivialisation, but you get the idea I’m sure.

One big fat con:

Pushy absolutely requires Python on the remote system (as well as SSH, of course, but Fabric requires that anyway.) So requiring Pushy would mean that Fabric would be restricted to working only with remote machines that have Python installed. Probably a safe bet in general, but not ideal.

How about using Pushy if Python is available, and just failing gracefully if it isn’t? This turns out to be really easy to do, since Fabric and Pushy both use Paramiko. So I wrote a Fabric “operation” to import a remote Python module. Under the covers, this piggy-backs a Pushy connection over the existing Paramiko connection created by Fabric. I’ll bring this to the attention of the Fabric developers, but I thought I’d just paste it here for now.

First, an example of how one might use the “remote_import” operation. Pass it a module name, and you’ll get back a reference to the remote module. You can then use the module as if you had done a plain old “import ”.

fabfile.py

from fabric_pushy import remote_import

def get_platform():
    platform = remote_import("platform")
    print platform.platform()

You just execute your fabfile as per usual, and the “remote_import” operation will create a Pushy connection to each host, import the remote Python interpreter’s standard platform module, and call its platform method to determine its platform name. Easy like Sunday morning…

    $ fab -H localhost get_platform
    [localhost] Executing task ‘get_platform’
    Linux-2.6.35-27-generic-i686-with-Ubuntu-10.10-maverick

    Done.
    Disconnecting from localhost… done.

fabric_pushy.py

from fabric.state import env, default_channel
from fabric.network import needs_host

import pushy
import pushy.transport
from pushy.transport.ssh import WrappedChannelFile

class FabricPopen(pushy.transport.BaseTransport):
    """
    Pushy transport for Fabric, piggy-backing the Paramiko SSH connection
    managed by Fabric.
    """

    def __init__(self, command, address):
        pushy.transport.BaseTransport.__init__(self, address)

        # Join arguments into a string, quoting any that contain spaces
        args = command
        for i in range(len(args)):
            if " " in args[i]:
                args[i] = "'%s'" % args[i]
        command = " ".join(args)

        # Run the command over the SSH channel that Fabric already manages
        self.__channel = default_channel()
        self.__channel.exec_command(command)
        self.stdin  = WrappedChannelFile(self.__channel.makefile("wb"), 1)
        self.stdout = WrappedChannelFile(self.__channel.makefile("rb"), 0)
        self.stderr = self.__channel.makefile_stderr("rb")

    def __del__(self):
        self.close()

    def close(self):
        if hasattr(self, "stdin"):
            self.stdin.close()
            self.stdout.close()
            self.stderr.close()
        self.__channel.close()

# Add a "fabric" transport, which piggy-backs the existing SSH connection, but
# otherwise operates the same way as the built-in Paramiko transport.
class pseudo_module:
    Popen = FabricPopen
pushy.transports["fabric"] = pseudo_module

###############################################################################

# Pushy connection cache
connections = {}

@needs_host
def remote_import(name, python="python"):
    """
    A Fabric operation for importing and returning a reference to a remote
    Python package.
    """

    if (env.host_string, python) in connections:
        conn = connections[(env.host_string, python)]
    else:
        conn = pushy.connect("fabric:", python=python)
        connections[(env.host_string, python)] = conn
    m = getattr(conn.modules, name)
    if "." in name:
        for p in name.split(".")[1:]:
            m = getattr(m, p)
    return m

Posted March 9, 2011.

Pushy 0.4 Released

I'm pleased to announce a new release of Pushy, version 0.4. This release has been concerned primarily with improving the existing code quality, fixing bugs that have been found in the use of 0.3, and adding missing functionality to the Java API.

You can install Pushy with easy_install ("easy_install pushy"), or grab the source distribution from here: https://launchpad.net/pushy/+download. I have also been working on some more documentation and getting-started type content on the brand spanking new website: http://awilkins.id.au/pushy.

Do you use Pushy? What for? Do you have any ideas for improvements? Leave a comment here, or drop me an email at [email protected]. I'd really appreciate some feedback!

Posted March 5, 2011.

Java Pushy API

It’s been some time since I’ve spruiked Pushy, so here we go. One of my colleagues was talking the other day about how he had implemented a netcat-like service for STAF, an automation framework aimed at testing. This got me thinking about how this could be done relatively easily, using the pushy.net package, something I haven’t written about much, primarily because it’s still a work in progress.

Back in version 0.3, I introduced a Java API to Pushy, as I mentioned in an earlier post. I briefly mentioned the incorporation of packages which mimic, and in many cases are interface compatible with, packages in the Java standard library such as java.io and java.net.

The pushy.net package currently contains three main classes:

RemoteInetSocketAddress (extends java.net.InetSocketAddress)
RemoteSocket (extends java.net.Socket), and
RemoteServerSocket (extends java.net.ServerSocket).

RemoteInetSocketAddress simply provides a means of creating an InetSocketAddress whose address is resolved at the remote host. RemoteSocket is a wrapper around a remote Python socket object, but extends java.net.Socket to provide a familiar interface to Java developers. Similarly, RemoteServerSocket extends java.net.ServerSocket, and wraps a remote Python socket object.

So how about netcat emulation? Well, I won’t cover the whole implementation of a netcat clone, as that would be a non-trivial undertaking. But I will show you one of the fundamental requirements: to bind a server socket, accept a client connection, and print out the data received from that client.

Step 1. Bind a socket, and listen for connections.

import java.net.ServerSocket;
import java.net.Socket;
import pushy.Client;

public class Test {
    public static void main(String[] args) throws Exception {
        Client conn = new Client("local:");
        try {
            ServerSocket server = new pushy.net.RemoteServerSocket(conn, 0);
            try {
            } finally {
                server.close();
            }
        } finally {
            conn.close();
        }
    }
}

In this code snippet, we’re creating a RemoteServerSocket, and assigning it to a java.net.ServerSocket, to illustrate interface compatibility. The first argument to the constructor is the pushy.Client object we previously created, and the second argument is the port to bind to. Specifying a port of zero means that we want to bind to an ephemeral port.

The creation of the RemoteServerSocket involves creating a Python socket object on the remote host, and performing the bind and listen methods on it.
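On the Python side, the bind and listen steps look roughly like the following. This is a local sketch with the standard socket module, not the code Pushy runs; the loopback address and the backlog of 1 are choices made only for the example.

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))   # port 0 asks the OS for an ephemeral port
server.listen(1)
host, port = server.getsockname()
print(port)                      # the port the OS actually assigned
server.close()
```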

Step 2. Accept a client connection.

Socket client = server.accept();
try {
} finally {
    client.close();
}

Here we can see that accepting a client connection is exactly as we would do with a standard java.net.ServerSocket. This probably isn’t surprising, since we’ve upcast our RemoteServerSocket to a plain old ServerSocket. One thing of interest here is that the Socket object returned is in fact a RemoteSocket, wrapping a remote Python socket object.

Step 3. Read data from the client connection.

InputStream in = client.getInputStream();
byte[] buf = new byte[1024];
int nread = in.read(buf, 0, buf.length);
while (nread != -1) {
    System.out.write(buf, 0, nread);
    System.out.flush();
    nread = in.read(buf, 0, buf.length);
}

Et voila! We can read the remote socket’s output via a java.io.InputStream object, returned by the getInputStream method, which is overridden by RemoteSocket. One thing you may have noticed: to run this all on the local host, sans Pushy, you could substitute the right-hand side of the initial ServerSocket construction with a standard ServerSocket, and the rest of the code would remain unchanged.

There are a few defects in the 0.3 release related to the pushy.net package, which will prevent these examples from working. I have rectified them in the process of writing this post. If you grab the trunk, it should all work nicely. There is one defect remaining: the InputStream returned by RemoteSocket is based on a file returned by Python’s socket.makefile method. This differs from the InputStream returned by Java’s standard Socket, in that a request for N bytes will not return until all N bytes, or EOF, are received. I hope to have this fixed in the coming days.
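The difference between the two read behaviours can be seen in plain Python, without Pushy. This sketch uses socket.socketpair to get two connected sockets in one process; it only illustrates the buffering behaviour described above.

```python
import socket

a, b = socket.socketpair()

# recv returns whatever is available, even if fewer than 1024 bytes.
a.sendall(b"hello")
first = b.recv(1024)

# A buffered file from makefile keeps reading until it has all 1024 bytes
# or sees EOF, so this read only returns because the sender closes.
a.sendall(b"world")
a.close()
reader = b.makefile("rb")
second = reader.read(1024)

print(first, second)
reader.close()
b.close()
```

Had the sender stayed open, the second read would have blocked waiting for the remaining bytes, which is the behaviour a Java caller would not expect from an InputStream.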

Posted February 10, 2011. Tags: java, pushy.

Facebook Puzzles

So a couple of weeks ago I came across a forum post on XKCD about programming puzzles. Figuring that’s a significantly better way to while away the hours I’d previously been spending playing computer games, I thought I’d take a crack.

So after browsing around, I came across Facebook Puzzles, which seems to be part of Facebook’s recruitment tools. Solve some puzzles, apply for a job, and say how rad you are for having solved all of their puzzles. Something like that. Anyway, I had no intention of applying to Facebook, I just wanted to revive my fading computer science knowledge, and have a bit of fun in the process. I don’t see what the big fuss is about working there; so many other companies are producing much more interesting, constructive things.

Naturally I went straight to the most difficult puzzle on the list, Facebull. It didn’t take long for me to realise this was an NP-hard problem. It’s fairly straightforward to see that this is a graph problem. A little more formally, we’re given a directed graph (digraph), and we’re asked to find a minimum cost strongly-connected sub-digraph, containing all vertices, but a subset of the edges.

Spoilers ahead. Turn back now if you don’t want any help.

Originally I had thought that it was the Asymmetric Traveling Salesman Problem (ATSP) in disguise, which is just the Traveling Salesman Problem (TSP) with directed edges. I spent a couple of hours here and there reading the literature, since I didn’t know anything about exact solutions (which is what the puzzle requires), and was only vaguely aware of approximation algorithms (which might help, for example, to reduce the search space.) For anyone wanting to read up on the TSP problem, I would definitely recommend the survey paper The Traveling Salesman Problem, by Jünger, Reinelt and Rinaldi. Also, this website provides a great, gentle introduction.

Not too long after, it dawned on me that the problem was not ATSP at all, since vertices (“compounds”) might be visited more than once. For example, consider a graph with the following edges:
(C1, C2), (C2, C1)
(C1, C3), (C3, C1)
This could not be solved by the ATSP, since a tour covering both C2 and C3 would have to pass through C1 twice (C1 to C2, back to C1, then on to C3 and back to C1). However, it is required to be solvable for the Facebull puzzle. Back to the drawing board…

The general approach I took was similar to the one that would be taken for ATSP, though, and that is to use a “Branch and Bound” algorithm. This is a fancy way of saying, enumerate all possible solutions, backtracking at the earliest possible opportunity by computing lower and upper bounds. For example, we might start with an upper bound of infinity, and update it whenever we find a candidate solution with a lower cost than the upper bound. A lower bound can be computed by taking into account a constraint: each vertex must have an in-degree of at least one, and an out-degree of at least one. By remembering the cost of each edge in the graph, we can easily compute a lower bound.
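As an illustration of the degree constraint, here is one way such a lower bound could be computed. It is a sketch of the idea rather than the code from my submission (which was C++), and the vertex names and costs below are made up for the example.

```python
def lower_bound(vertices, edges):
    """Lower bound on the cost of a strongly connected spanning sub-digraph.

    edges is a list of (source, target, cost) tuples. Every vertex needs at
    least one outgoing and one incoming edge, so the cheapest such edge per
    vertex must be paid for.
    """
    inf = float("inf")
    cheapest_out = {v: inf for v in vertices}
    cheapest_in = {v: inf for v in vertices}
    for source, target, cost in edges:
        cheapest_out[source] = min(cheapest_out[source], cost)
        cheapest_in[target] = min(cheapest_in[target], cost)
    # Each sum counts one distinct edge per vertex, so both are valid
    # bounds and the larger of the two is the tighter one.
    return max(sum(cheapest_out.values()), sum(cheapest_in.values()))


vertices = ["C1", "C2", "C3"]
edges = [
    ("C1", "C2", 5), ("C2", "C1", 4),
    ("C1", "C3", 3), ("C3", "C1", 2),
    ("C2", "C3", 9),
]
bound = lower_bound(vertices, edges)
print(bound)
```

During the search, a partial solution whose cost plus the bound for the remaining vertices already exceeds the current upper bound can be abandoned.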

Is this enough? Perhaps, but we can do better. One more thing we can do is to prune edges. For example, if we have the edges (Ci, Cj), (Ci, Ck), and (Ck, Cj), and the sum of the cost of the latter two is less than the cost of the first, then we can eliminate the first.
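The pruning rule can be sketched as follows. Again this is an illustration and not my submitted code; `prune_edges` is a hypothetical helper, it keeps only the cheapest of any parallel edges between the same pair of vertices, and the costs are invented for the example.

```python
def prune_edges(edges):
    """Drop edges that are more expensive than a two-edge detour."""
    cheapest = {}
    for source, target, cost in edges:
        key = (source, target)
        cheapest[key] = min(cost, cheapest.get(key, cost))
    vertices = {v for key in cheapest for v in key}

    kept = []
    for (source, target), cost in sorted(cheapest.items()):
        detours = [
            cheapest[(source, via)] + cheapest[(via, target)]
            for via in vertices
            if via not in (source, target)
            and (source, via) in cheapest
            and (via, target) in cheapest
        ]
        if detours and min(detours) < cost:
            continue  # a cheaper path Ci -> Ck -> Cj exists
        kept.append((source, target, cost))
    return kept


edges = [("Ci", "Cj", 10), ("Ci", "Ck", 3), ("Ck", "Cj", 4), ("Cj", "Ci", 1)]
pruned = prune_edges(edges)
print(pruned)
```

Here the edge from Ci to Cj costs 10 while the detour through Ck costs 7, so the direct edge is dropped and the other three remain.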

In total, I submitted my solution nine times. Why so many, you might ask? Because I didn’t parse the input correctly, and didn’t realise this while I was banging my head against the problem for two weeks. I wasn’t handling white space at the end of the file, and I (foolishly) assumed that the test cases would not assign multiple edges to the same pair of vertices. Why would you have such a trivial test case to trip people up, when the underlying puzzle is hard enough as it is? It turns out my accepted solution (written in C++) ran in 300ms, so I guess the focus was more or less centred on correctness, rather than speed. That was a little frustrating, but at least I had some fun and learnt a few tricks along the way.

There is still something wrong with my solution, however. It is not finding the optimal solution to some of the test cases people have published. But it passed the puzzle robot, so that’s the main thing that matters. I had one more optimisation, which was to identify edges which are absolutely required, up-front. This can be done very cheaply, and may significantly reduce the search space in sparse graphs. This optimisation was interacting badly with my enumeration function, though; I’m not sure what the problem is, perhaps I’ll go back and take another look some time.

Now, back to wasting my time playing computer games…

Posted February 6, 2011.

Pushy Public Documentation

So, I finally got around to writing some proper documentation for Pushy.

I had originally used Epydoc to extract docstrings, and generate API documents, which I have been hosting. Then I realised I could publish HTML to PyPI, so I thought I’d do something a little more friendly than presenting the gory details of the API.

In the past I’ve used Asciidoc, a lightweight markup language, in the vein of Wiki markup languages. I found Asciidoc fairly simple to write, and there is a standard tool for processing it and producing various outputs, including of course HTML. I wanted my documentation to have the look and feel of the Python standard library documentation, so I’ve been looking into reStructuredText.

I have to say that reStructuredText is very easy to learn, and Sphinx, which is the processing tool used to generate the HTML output for the Python documentation, is a pleasure to use. The format of reStructuredText is similar to that of Asciidoc. So far I don’t have any particular affinity to either - I mainly went with reStructuredText/Sphinx for the Python documentation theming.

Posted August 7, 2010.

Java-to-Python

Did you ever want to call Python code from Java?

Until Java 5, I wasn’t much of a fan of the Java language. The addition of enums, generics, and some syntactic sugar (e.g. foreach) makes it more bearable to develop in. But all of this doesn’t change the fact that Java purposely hides non-portable operating system functions, such as signal handling and Windows registry access. Sure, you can do these things with JNI, but that makes application deployment significantly more complex.

Python, on the other hand, embraces the differences between operating systems. Certainly where it makes sense, platform-independent interfaces are provided - just like Java, and every other modern language. But just because one platform doesn’t deal in signals shouldn’t mean that we can’t use signals on Linux/UNIX.

I realise it’s probably not the most common thing to do, but calling Python code from Java needn’t be difficult. There’s Jython, which makes available a large portion of the Python standard library, but still has some way to go. Some modules are missing (e.g. _winreg, ctypes), and others are incomplete or buggy. That’s not to say that the project isn’t useful, it’s just not 100% there yet.

Enter Pushy (version 0.3). In version 0.3, a Java API was added to Pushy, making it possible to connect to a Python interpreter from a Java program, and access objects and invoke functions therein. So, for example, you now have the ctypes module available to your Java program, meaning you can load shared objects / DLLs and call arbitrary functions from them. See below for a code sample.

import pushy.Client;
import pushy.Module;
import pushy.PushyObject;

public class TestCtypes {
    public static void main(String[] args) throws Exception {
        Client client = new Client("local:");
        try {
            // Import the "ctypes" Python module, and get a reference to the
            // GetCurrentProcessId function in kernel32.
            Module ctypes = client.getModule("ctypes");
            PushyObject windll = (PushyObject)ctypes.getattr("windll");
            PushyObject kernel32 = (PushyObject)windll.getattr("kernel32");
            PushyObject GetCurrentProcessId =
                (PushyObject)kernel32.getattr("GetCurrentProcessId");

            // Call GetCurrentProcessId. Note that the Python interpreter is
            // running in a separate process, so this is NOT the process ID
            // running the Java program.
            Number pid = (Number)GetCurrentProcessId.call();
            System.out.println(pid);
        } finally {
            client.close();
        }
    }
}

Neat, eh? What I have demonstrated here is the following:

Connecting a Java program to a freshly created Python interpreter on the local host.
Importing the ctypes module therein, and then getting a reference to kernel32.dll (this assumes Windows, obviously. It would be much the same on Linux/UNIX platforms.)
Executing the GetCurrentProcessId function to obtain the process ID of the Python interpreter, and returning the result as a java.lang.Number object.

The final point is an important one. Pushy automatically casts types from Python types to their Java equivalent. For example, a Python dictionary object will be returned as a java.util.Map. If that map is modified, then the changes will be made in the remote dictionary object also. Tuples, which are immutable, are returned as arrays, whilst Python lists are returned as java.util.List objects.

The Pushy Java API aims to provide Java standard library equivalent interfaces to Python standard library modules where possible. For example, there is a pushy.io package for dealing with files in the Python interpreter using java.io-like classes. Similarly, a pushy.net package exists for performing socket operations using a java.net-like interface.

One can also connect to SSH hosts, as is possible through the Pushy Python API. This is achieved by creating a connection to a local Python interpreter, and thence to the remote system using the Pushy Python API in the subprocess. This is all done transparently if a target other than “local:” is specified in the Java program.

Enjoy!