wraptypes is a general utility for creating ctypes wrappers from C header
files. The front-end is tools/wraptypes/wrap.py. For usage, run:
python tools/wraptypes/wrap.py -h
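There are three components to wraptypes:
- The preprocessor, which converts the source header files into a list of
  tokens.
- The C parser, which parses the preprocessed tokens and calls handle_
  methods on the class CParser in a similar manner to a SAX parser.
- The ctypes parser, which interprets the declarations from CParser and
  calls handle_ methods on the class CtypesParser.
The front-end wrap.py provides a simple subclass of CtypesParser,
CtypesWrapper, which writes the ctypes declarations found to a file in a
format that can be imported as a module.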
The parsers are built upon a modified version of PLY, a Python implementation
of lex and yacc. The modified source is included in the wraptypes
directory. The modifications are:
The first time the parsers are run (or after they are modified), PLY creates
pptab.py and parsetab.py in the current directory. These are the generated
state machines, which can take a few seconds to generate.
The file parser.out is created if debugging is enabled, and contains the
parser description (of the last parser that was generated), which is essential
for debugging.
The grammar and parser are defined in preprocessor.py.
There is only one lexer state. Each token has a type, which is a string (e.g.
'CHARACTER_CONSTANT'), and a value. Token values, when read directly from the
source file, are only ever strings. When tokens are written to the output
list they sometimes have tuple values (for example, a PP_DEFINE token on
output).
Two lexer classes are defined: PreprocessorLexer, which reads a stack of
files (actually strings) as input, and TokenListLexer, which reads from a
list of already-parsed tokens (used for parsing expressions).
The preprocessing entry-point is the PreprocessorParser class. This creates a
PreprocessorLexer and its grammar during construction. The system include
path includes the GCC search path by default but can be modified by altering
the include_path and framework_path lists. The system_headers dict allows
header files that don’t exist on the filesystem to be implied on the search
path. For example, by setting:
system_headers['stdlib.h'] = '''#ifndef STDLIB_H
#define STDLIB_H
/* ... */
#endif
'''
you can insert your own custom header in place of the one on the filesystem. This is useful when parsing headers from network locations.
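The lookup order this implies can be sketched as follows. This is only an illustration of the idea, not the code in preprocessor.py: the function name resolve_header and its arguments are hypothetical, and the real parser may search differently.

```python
import os


def resolve_header(name, system_headers, include_path):
    """Return the source text for a header, or None if it cannot be found.

    Hypothetical helper: in-memory entries take priority over files found
    on the include path, so an override replaces the on-disk header.
    """
    if name in system_headers:
        return system_headers[name]
    for directory in include_path:
        candidate = os.path.join(directory, name)
        if os.path.exists(candidate):
            with open(candidate) as f:
                return f.read()
    return None
```

With this ordering, an entry such as system_headers['stdlib.h'] is returned even when no stdlib.h exists in any directory on the include path.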
Parsing begins when parse is called. Specify one or both of a filename and a
string of data. If the debug kwarg is True, syntax errors dump the parser
state instead of just the line number where they occurred.
The production rules specify the actions; these are implemented in
PreprocessorGrammar. The actions call methods on PreprocessorParser, such as:
- include(self, header), to push another file onto the lexer.
- include_system(self, header), to search the system path for a file to push
  onto the lexer.
- error(self, message, filename, line), to signal a parse error. Not all
  syntax errors get this far, due to limitations in the parser. A parse
  error at EOF will just print to stderr.
- write(self, tokens), to write tokens to the output list. This is the
  default action when no preprocessing declaratives are being parsed.
The parser has a stack of ExecutionState, which specifies whether the
current tokens being parsed are ignored or not (tokens are ignored in an #if
that evaluates to 0). This is a little more complicated than just a boolean
flag: the parser must also ignore #elif conditions that can have no effect.
The enable_declaratives and enable_elif_conditionals return True if the
top-most ExecutionState allows declaratives and #elif conditionals to be
parsed, respectively. The execution state stack is modified with the
condition_* methods.
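Why a boolean flag is not enough can be seen in a small stand-alone sketch. The class below is hypothetical and does not reproduce ExecutionState or the condition_* methods; it only shows one way to track whether tokens are emitted and whether an #elif condition can still have an effect.

```python
class ConditionStack:
    """Illustrative tracker for nested #if / #elif / #else / #endif."""

    def __init__(self):
        # One frame per open #if: was the enclosing code active, is the
        # current branch active, and has any branch been taken already?
        self._frames = []

    @property
    def emitting(self):
        """True if tokens in the current branch should be kept."""
        return not self._frames or self._frames[-1]['active']

    @property
    def elif_matters(self):
        """True if an #elif condition at this level could have an effect."""
        if not self._frames:
            return False
        top = self._frames[-1]
        return top['parent'] and not top['taken']

    def on_if(self, result):
        parent = self.emitting
        active = parent and bool(result)
        self._frames.append({'parent': parent, 'active': active,
                             'taken': active})

    def on_elif(self, result):
        top = self._frames[-1]
        active = top['parent'] and not top['taken'] and bool(result)
        top['active'] = active
        top['taken'] = top['taken'] or active

    def on_else(self):
        # #else behaves like an #elif whose condition is always true.
        self.on_elif(1)

    def on_endif(self):
        self._frames.pop()
```

Once a branch has been taken, or when the enclosing block is itself ignored, elif_matters is False and the #elif expression need not be evaluated at all.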
PreprocessorParser has a PreprocessorNamespace which keeps track of the
currently defined macros. You can create and specify your own namespace, or
use one that is created by default. The default namespace includes GCC
platform macros needed for parsing system headers, and some of the STDC
macros.
Macros are expanded when tokens are written to the output list, and when
conditional expressions are parsed.
PreprocessorNamespace.apply_macros(tokens) takes care of this, replacing
function parameters, variable arguments, macro objects and (mostly) avoiding
infinite recursion. It does not yet handle the # and ## operators, which are
needed to parse the Windows system headers.
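The recursion problem is easiest to see with object-like macros only. The sketch below is a simplified stand-in, not apply_macros itself: tokens are plain strings, macros is a hypothetical dict mapping a name to its replacement token list, and function-like macros are left out entirely.

```python
def expand(tokens, macros, _active=frozenset()):
    """Expand object-like macros in a list of token strings.

    A macro that is already being expanded is left untouched, which stops
    self-referential definitions from recursing forever.
    """
    out = []
    for tok in tokens:
        if tok in macros and tok not in _active:
            out.extend(expand(macros[tok], macros, _active | {tok}))
        else:
            out.append(tok)
    return out
```

For example, with macros = {'A': ['B', '+', '1'], 'B': ['A']}, expanding ['A'] terminates because the inner A is not expanded a second time.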
The process for evaluating a conditional (#if or #elif) is:
1. The tokens between PP_IF or PP_ELIF and NEWLINE are expanded by
   apply_macros.
2. The resulting token list is used to construct a TokenListLexer.
3. This lexer is used as input to a ConstantExpressionParser. This parser
   uses the ConstantExpressionGrammar, which builds up an AST of
   ExpressionNode objects.
4. parse is called on the ConstantExpressionParser, which returns the
   resulting top-level ExpressionNode, or None if there was a syntax error.
5. The evaluate method of the ExpressionNode is called with the
   preprocessor’s namespace as the evaluation context. This allows the
   expression nodes to resolve defined operators.
6. The result of evaluate is always an int; non-zero values are treated as
   True.
Because pyglet requires special knowledge of the preprocessor declaratives
that were encountered in the source, these are encoded as pseudo-tokens within
the output token list. For example, after a #ifndef is evaluated, it is
written to the token list as a PP_IFNDEF token.
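The final evaluation steps of a conditional can be illustrated with a tiny hand-built tree. The node classes here are hypothetical and much simpler than the real ExpressionNode hierarchy; the namespace is assumed to be anything that supports the in operator, such as a dict of macro names.

```python
import operator


class Constant:
    def __init__(self, value):
        self.value = value

    def evaluate(self, namespace):
        return self.value


class Defined:
    """Models the defined operator by consulting the namespace."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, namespace):
        return int(self.name in namespace)


class Binary:
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def evaluate(self, namespace):
        # Results are always coerced to int, so comparisons give 0 or 1.
        return int(self.op(self.left.evaluate(namespace),
                           self.right.evaluate(namespace)))


def logical_and(a, b):
    return bool(a) and bool(b)


# Tree for the condition: defined(FOO) && (2 << 3) > 8
tree = Binary(logical_and,
              Defined('FOO'),
              Binary(operator.gt,
                     Binary(operator.lshift, Constant(2), Constant(3)),
                     Constant(8)))


def condition_holds(tree, namespace):
    """Non-zero results are treated as true."""
    return tree.evaluate(namespace) != 0
```

Passing the namespace into evaluate is what lets the Defined node answer without the tree holding any reference to the preprocessor.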
#define is handled specially. After applying it to the namespace, it is
parsed as an expression immediately. This is allowed (and often expected) to
fail. If it does not fail, a PP_DEFINE_CONSTANT token is created, and the
value is the result of evaluating the expression. Otherwise, a PP_DEFINE
token is created, and the value is the string concatenation of the tokens
defined. Special handling of parseable expressions makes it simple to later
parse constants defined as, for example:
#define RED_SHIFT 8
#define RED_MASK (0x0f << RED_SHIFT)
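The try-then-fall-back behaviour can be sketched without the real expression parser. The example below swaps in Python's own ast module as a stand-in for ConstantExpressionParser, so it only accepts the small subset of C integer expressions that is also valid Python; classify_define and the constants dict are hypothetical names.

```python
import ast
import operator

_BINOPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.LShift: operator.lshift, ast.RShift: operator.rshift,
    ast.BitOr: operator.or_, ast.BitAnd: operator.and_,
}


def _eval(node, constants):
    """Evaluate a restricted integer expression tree, or raise ValueError."""
    if isinstance(node, ast.Expression):
        return _eval(node.body, constants)
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return node.value
    if isinstance(node, ast.Name) and node.id in constants:
        return constants[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        return _BINOPS[type(node.op)](_eval(node.left, constants),
                                      _eval(node.right, constants))
    raise ValueError('not a constant expression')


def classify_define(name, body, constants):
    """Return a (token type, name, value) tuple for one #define."""
    try:
        value = _eval(ast.parse(body.strip(), mode='eval'), constants)
    except (SyntaxError, ValueError):
        # Failure is expected for most macros: keep the raw text.
        return ('PP_DEFINE', name, body)
    constants[name] = value
    return ('PP_DEFINE_CONSTANT', name, value)
```

Because RED_SHIFT is recorded as a constant first, the later RED_MASK definition evaluates to an integer rather than staying as text.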
The preprocessor can be tested/debugged by running preprocessor.py
stand-alone with a header file as the sole argument. The resulting token list
will be written to stdout.
The lexer for CParser, CLexer, takes as input a list of tokens output from
the preprocessor. The special preprocessor tokens such as PP_DEFINE are
intercepted here and handled immediately; hence they can appear anywhere in
the source header file without causing problems with the parser. At this
point IDENTIFIER tokens which are found to be the name of a defined type (the
set of defined types is updated continuously during parsing) are converted to
TYPE_NAME tokens.
The entry-point to parsing C source is the CParser class. This creates a
preprocessor in its constructor, and defines some default types such as
wchar_t and __int64_t. These can be disabled with kwargs.
Preprocessing can be quite time-consuming, especially on OS X where thousands
of #include declaratives are processed when Carbon is parsed. To minimise
the time required to parse similar (or the same, while debugging) header
files, the token list from preprocessing is cached and reused where possible.
This is handled by CPreprocessorParser, which overrides push_file to check
with CParser if the desired file is cached. The cache is checked against the
file’s modification timestamp as well as a “memento” that describes the
currently defined tokens. This is intended to avoid using a cached file that
would otherwise be parsed differently due to the defined macros. It is by no
means perfect; for example, it won’t pick up on a macro that has been defined
differently. It seems to work well enough for the header files pyglet
requires.
The header cache is saved and loaded automatically in the working directory
as .header.cache. The cache should be deleted if you make changes to the
preprocessor, or are experiencing cache errors (these are usually accompanied
by a “what-the?” exclamation from the user).
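The validity check described for the header cache can be sketched as below. HeaderCache and make_memento are hypothetical names, and the memento format is an assumption: here it is just the set of defined macro names, which reproduces the limitation that a macro redefined with a different value goes unnoticed.

```python
import os


def make_memento(macros):
    """Summarise the namespace by macro names only (values are ignored)."""
    return frozenset(macros)


class HeaderCache:
    """In-memory cache of token lists, keyed by filename."""

    def __init__(self):
        self._entries = {}

    def store(self, filename, memento, tokens):
        mtime = os.path.getmtime(filename)
        self._entries[filename] = (mtime, memento, tokens)

    def lookup(self, filename, memento):
        """Return cached tokens, or None if the entry is missing or stale."""
        entry = self._entries.get(filename)
        if entry is None:
            return None
        mtime, cached_memento, tokens = entry
        if mtime != os.path.getmtime(filename) or cached_memento != memento:
            return None
        return tokens
```

A richer memento that also hashed macro bodies would catch redefinitions, at the cost of more cache misses.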
The actions in the grammar construct parts of a “C object model” and call
methods on CParser. The C object model is not at all complete, containing
only what pyglet (and any other ctypes-wrapping application) requires. The
classes in the object model represent:
- A declaration, with a declarator (see below), type (a Type object) and
  storage (for example, ‘typedef’, ‘const’, ‘static’, ‘extern’, etc).
- A declarator, with an identifier (the name of it, None if the declarator
  is abstract, as in some function parameter declarations), an optional
  initializer (currently ignored), an optional linked-list of array (giving
  the dimensions of the array) and an optional list of parameters (if the
  declarator is a function).
- A pointer, with a pointer to another declarator.
- An array dimension, with array linking to the next array dimension, if
  any.
- A parameter, with type (Type object), storage and declarator.
- A type, with qualifiers (e.g. ‘const’, ‘volatile’, etc) and specifiers
  (the meaty bit).
- A type specifier, such as 'int' or 'Foo' or 'unsigned'. Note that types
  can have multiple TypeSpecifiers; not all combinations are valid.
- A struct or union (if is_union is True) type specifier. tag gives the
  optional foo in struct foo and declarations is the meat (an empty list for
  an opaque or unspecified struct).
- An enum specifier. tag gives the optional foo in enum foo and enumerators
  is the list of Enumerator objects (an empty list for an unspecified enum).
- An enumerator, with a name and expression, an ExpressionNode object.
The ExpressionNode object hierarchy is similar to that used in the
preprocessor, but more fully-featured, and using a different
EvaluationContext which can evaluate identifiers and the sizeof operator
(currently it actually just returns 0 for both).
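To make the shape of such an object model concrete, here is a much-reduced sketch using dataclasses. The class layouts are assumptions for illustration and will not match cparser.py; in particular, the sketch assumes the outermost declarator of a pointer declaration links through pointer to the declarator that carries the identifier.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Type:
    qualifiers: List[str] = field(default_factory=list)
    specifiers: List[str] = field(default_factory=list)


@dataclass
class Declarator:
    identifier: Optional[str] = None
    pointer: Optional['Declarator'] = None


@dataclass
class Declaration:
    declarator: Declarator
    type: Type
    storage: Optional[str] = None


def pointer_depth_and_name(declarator):
    """Walk the pointer chain; return (levels of indirection, identifier)."""
    depth = 0
    while declarator.pointer is not None:
        depth += 1
        declarator = declarator.pointer
    return depth, declarator.identifier


# Model of: extern const char *name;
decl = Declaration(
    declarator=Declarator(pointer=Declarator(identifier='name')),
    type=Type(qualifiers=['const'], specifiers=['char']),
    storage='extern',
)
```

Keeping the specifiers separate from the declarator mirrors how C itself splits a declaration into a base type and the pointer or array decoration around the name.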
Methods are called on CParser as declarations and preprocessor declaratives
are parsed. They are mostly self-explanatory. For example:
- An #ifndef was encountered testing the macro name in file filename at
  line lineno.
- A declaration was parsed; declaration is an instance of Declaration.
These methods should be overridden by a subclass to provide functionality.
The DebugCParser does this and prints out the arguments to each handle_
method.
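The override pattern works like a SAX handler and can be shown with a toy parser. EventParser and Recorder below are hypothetical classes written for this illustration; they do not use CParser, and the handle_ifndef signature is an assumption modelled on the name, filename and lineno arguments mentioned above.

```python
class EventParser:
    """Toy parser that reports #ifndef lines through a handler method."""

    def parse(self, text, filename='<string>'):
        for lineno, line in enumerate(text.splitlines(), start=1):
            parts = line.split()
            if len(parts) >= 2 and parts[0] == '#ifndef':
                self.handle_ifndef(parts[1], filename, lineno)

    def handle_ifndef(self, name, filename, lineno):
        # Default does nothing; subclasses override to react to the event.
        pass


class Recorder(EventParser):
    """Subclass that collects every event it is told about."""

    def __init__(self):
        self.events = []

    def handle_ifndef(self, name, filename, lineno):
        self.events.append((name, filename, lineno))
```

A debugging subclass in the same style would simply print its arguments instead of storing them.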
The CParser can be tested in isolation by running it stand-alone with the
filename of a header as the sole argument. A DebugCParser will be
constructed and used to parse the header.
CtypesParser is implemented in ctypesparser.py. It is a subclass of CParser
and implements the handle_ methods to provide a more ctypes-friendly
interpretation of the declarations.
To use, subclass and override the methods:
- ... #define).
- ... typedef declaration. See below for type of ctype.
- ... static declaration.
Types are represented by instances of CtypesType. This is more easily
manipulated than a “real” ctypes type. There are subclasses for
CtypesPointer, CtypesArray, CtypesFunction, and so on; see the module for
details.
Each CtypesType class implements the visit method, which can be used,
Visitor pattern style, to traverse the type hierarchy. Call the visit method
of any type with an implementation of CtypesTypeVisitor: all pointers, array
bases, function parameters and return types are traversed automatically
(struct members are not, however).
This is useful when writing the contents of a struct or enum. Before writing
a type declaration for a struct type (which would consist only of the struct’s
tag), visit the type and handle the visit_struct method on the visitor to
print out the struct’s members first. Similarly for enums.
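The traversal can be sketched with a few stand-in classes. None of these are the classes in ctypesparser.py: SimpleType, Pointer, Struct and StructCollector are hypothetical, and only the visit and visit_struct names are taken from the description above.

```python
class Visitor:
    def visit_struct(self, struct):
        pass


class SimpleType:
    def __init__(self, name):
        self.name = name

    def visit(self, visitor):
        pass  # nothing to traverse


class Struct:
    def __init__(self, tag, members):
        self.tag = tag
        self.members = members  # list of (name, type) pairs

    def visit(self, visitor):
        # Members are deliberately not traversed automatically.
        visitor.visit_struct(self)


class Pointer:
    def __init__(self, destination):
        self.destination = destination

    def visit(self, visitor):
        # Pointers forward the visitor to the type they point at.
        self.destination.visit(visitor)


class StructCollector(Visitor):
    """Records each struct tag once, in the order first encountered."""

    def __init__(self):
        self.tags = []

    def visit_struct(self, struct):
        if struct.tag not in self.tags:
            self.tags.append(struct.tag)
```

A writer could run such a collector over a type before emitting it, so that every struct it depends on has its members written out first.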
ctypesparser.py cannot be run stand-alone. wrap.py provides a
straightforward implementation that writes a module of ctypes wrappers. It
can filter the output based on the originating filename. See the module
docstring for usage and extension details.