API

computerwords.plugin

Writing a plugin for Computer Words

Summary: write a class inheriting from computerwords.plugin.CWPlugin, put it in a module accessible in PYTHONPATH, and add it to a list of module paths under the "plugins" key in your config file.

1. Inherit from CWPlugin

from computerwords.cwdom.nodes import CWTagNode, CWTextNode
from computerwords.plugin import CWPlugin

class MyPlugin(CWPlugin):
    # we want to add some things to the config file
    CONFIG_NAMESPACE = 'my_plugin'

    # here are our default values
    def get_default_config(self):
        return {
            "files_to_read": []
        }

    # override this method to write to the config after it has
    # been read
    def postprocess_config(self, config):
        local_config = config["my_plugin"]
        local_config["file_contents"] = {}
        for path in local_config["files_to_read"]:
            with open(path, 'r') as f:
                local_config["file_contents"][path] = f.read()

    # If you're writing a custom tag, this is where you
    # implement it.
    def add_processors(self, library):
        @library.processor('under-construction')
        def process_my_tag(tree, node):
            tree.replace_subtree(
                node,
                CWTagNode(
                    'marquee',
                    {'class': 'under-construction'},
                    [CWTextNode('Under Construction')]))

2. Put your plugin in PYTHONPATH

You can either install your plugin as a package or prefix your invocation of python -m computerwords with PYTHONPATH=$PYTHONPATH:$PATH_TO_MODULE_DIR. For example:

# plugin file is at plugins/my_plugin.py
PYTHONPATH=$PYTHONPATH:./plugins \
    python -m computerwords \
        --conf docs/conf.json

3. Add your plugin to the config file

{
    "plugins": [
        "my_plugin"
    ]
}

Things that might help you

If you need the content of a node's subtree in the order it appears in the document, use computerwords.cwdom.traversal.preorder_traversal() to walk the tree. This will be made easier in the future, but the approach works for now, even if it is slow.

You can store data on nodes. Each node has a data dict attribute.

You can also store data on the tree using the tree.processor_data dict.

Best Practices

Do not mutate the node "in place." Instead, create a copy, modify it, and replace the original. That way, any other processors that respond to changes in that node can run again.
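
The copy-then-replace pattern can be sketched without Computer Words installed. Node and replace_child below are hypothetical stand-ins, not library classes; in a real processor you would presumably use node.copy() and tree.replace_node() instead.

```python
import copy

class Node:
    """Hypothetical stand-in for a tree node; not a Computer Words class."""
    def __init__(self, name, data=None, children=None):
        self.name = name
        self.data = dict(data or {})
        self.children = list(children or [])

def replace_child(parent, old, new):
    """Swap old for new in the parent's child list (stand-in for a tree-level replace)."""
    index = parent.children.index(old)
    parent.children[index] = new

original = Node('note', {'level': 'info'})
root = Node('root', children=[original])

# Copy first, change the copy, then swap it in; the original is left untouched.
updated = copy.copy(original)
updated.data = dict(original.data, level='warning')
replace_child(root, original, updated)

print(original.data['level'], root.children[0].data['level'])  # info warning
```

Because the replacement goes through a single function, that function is a natural place for a tree to notice the change and re-run any interested processors.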

class CWPlugin()

Base class for all Computer Words plugins.

CONFIG_NAMESPACE

If you want to include custom information in the config file, specify a namespace for it to live under. For example, the HTML writer's configuration is all under the "html" key.

get_default_config()

If you specified CONFIG_NAMESPACE, provide a default dictionary here.
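
One way to picture how a namespace and its defaults fit together is the sketch below. The apply_plugin_defaults helper is hypothetical, and the shallow merge it performs is an assumption; this document does not say how Computer Words combines defaults with user settings.

```python
def apply_plugin_defaults(config, namespace, defaults):
    """Return a copy of config with defaults filled in under namespace.

    Assumption: a shallow merge in which user-supplied values win.
    """
    merged = dict(defaults)
    merged.update(config.get(namespace, {}))
    result = dict(config)
    result[namespace] = merged
    return result

user_config = {"plugins": ["my_plugin"], "my_plugin": {"files_to_read": ["a.txt"]}}
defaults = {"files_to_read": [], "encoding": "utf-8"}  # "encoding" is a made-up key

resolved = apply_plugin_defaults(user_config, "my_plugin", defaults)
print(resolved["my_plugin"])
```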

add_processors(library)

Use computerwords.library.Library.processor() to define transforms on the tree.

computerwords.library

class Library()

A collection of functions that can be applied to nodes. Each function is called a processor and applies to nodes with the given name.

As a user of Computer Words, you really only need to know about the processor() method.

processor(tag_name, p=None, before_others=False)

Declare a function as a processor for nodes with name tag_name. May be used as a decorator or as a simple function call.

As a decorator:

@library.processor('Text')
def reverse_text(tree, node):
    tree.replace_node(node, CWTextNode(node.text[::-1]))

As a function:

def reverse_text(tree, node):
    tree.replace_node(node, CWTextNode(node.text[::-1]))
library.processor('Text', reverse_text)
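
To see how one method can serve as both a decorator and a plain function call, here is a minimal sketch. MiniLibrary is a hypothetical stand-in, not the real Library class, and treating before_others as "run ahead of processors already registered" is an assumption.

```python
from collections import defaultdict

class MiniLibrary:
    """Hypothetical stand-in showing the dual decorator/function calling style."""
    def __init__(self):
        self.processors = defaultdict(list)

    def processor(self, tag_name, p=None, before_others=False):
        def register(fn):
            if before_others:
                # Assumed meaning: put this processor ahead of existing ones.
                self.processors[tag_name].insert(0, fn)
            else:
                self.processors[tag_name].append(fn)
            return fn
        # With no function given, hand back a decorator; otherwise register now.
        return register if p is None else register(p)

library = MiniLibrary()

@library.processor('Text')
def first(tree, node):
    return 'first'

def second(tree, node):
    return 'second'

library.processor('Text', second, before_others=True)

registered = [fn.__name__ for fn in library.processors['Text']]
print(registered)  # ['second', 'first']
```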

computerwords.cwdom.CWTree

class CWTree()

The CWTree class models the tree of all documents and their contents. It allows you to traverse the tree in various ways and mutate it during some types of traversal.

Properties:

  • root: Root of the tree

  • env: Dictionary containing information about how Computer Words was invoked and configured. Contains keys output_dir, which is a pathlib path to the root directory of output files, and config, which is a dict containing the fully resolved configuration.

  • processor_data: Dict that you can use to store and retrieve arbitrary data during processing.

get_document_path(document_id)

Returns the path of the source document matching document_id.

preorder_traversal(node=None)

Shortcut for computerwords.cwdom.traversal.preorder_traversal() using the root.

postorder_traversal(node=None)

Shortcut for computerwords.cwdom.traversal.postorder_traversal() using the root.

postorder_traversal_allowing_ancestor_mutations(node=None)

Yields every node in the tree in post-order.

While iterating, you may use the mutation methods on this class to mutate the tree.

apply_library(library, initial_data=None)

Run the processing algorithm on the tree using library. Technically public, but you probably have no use for this.

mark_node_dirty(node)

Ensure this node's processors are run at some point in the future.

mark_ancestors_dirty(node)

Ensure this node's ancestors' processors are run at some point in the future.

get_is_node_dirty(node)

Returns True if the node is marked dirty.

get_was_node_removed(node)

Returns True if the node was previously in the tree but has since been removed.

wrap_node(inner_node, outer_node)

Add an ancestor between inner_node and its parent.

tree.wrap_node(C, CWNode('B'))

Limitations

  • outer_node may not have any existing children.
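
The effect of wrapping can be sketched with a toy node class. Node and this wrap_node function are hypothetical stand-ins; the real method lives on CWTree and reports violations with CWTreeConsistencyError, whereas this sketch uses ValueError.

```python
class Node:
    """Hypothetical stand-in for CWNode: a name, a parent, and children."""
    def __init__(self, name, children=None):
        self.name = name
        self.parent = None
        self.children = []
        for child in children or []:
            self.children.append(child)
            child.parent = self

def wrap_node(inner_node, outer_node):
    """Insert outer_node between inner_node and its parent."""
    if outer_node.children:
        raise ValueError('outer_node may not have any existing children')
    parent = inner_node.parent
    parent.children[parent.children.index(inner_node)] = outer_node
    outer_node.parent = parent
    outer_node.children = [inner_node]
    inner_node.parent = outer_node

c = Node('C')
a = Node('A', [c])
wrap_node(c, Node('B'))

# Walk upward from C to confirm B now sits between C and A.
chain = [c.name, c.parent.name, c.parent.parent.name]
print(chain)  # ['C', 'B', 'A']
```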

replace_subtree(old_node, new_node)

Replace a node and all its children with another node and all its children.

tree.replace_subtree(B, CWNode('X', [
    CWNode('Y'),
    CWNode('Z'),
]))

insert_subtree(parent, i, child)

Adds a node child and all its children as a child of parent at index i.

tree.insert_subtree(A, 1, D)

Limitations (may be temporary)

  • parent must be the active node or a descendant of it.

add_siblings_ahead(new_siblings)

Add new_siblings as children of the active node's parent immediately after the active node.

This may be replaced by a more general method later.

@library.processor('B')
def process_b(tree, node):
    tree.add_siblings_ahead([CWNode('C'), CWNode('D')])

replace_node(old_node, new_node)

Replace old_node with new_node. Give all of old_node's children to new_node.

tree.replace_node(B, CWNode('X'))

Limitations (may be temporary)

  • May only be used on the active node.
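
The difference between replace_node and replace_subtree is easiest to see side by side. The sketch below uses a hypothetical Node class and free functions rather than the real CWTree methods, and it ignores the active-node limitation.

```python
class Node:
    """Hypothetical stand-in for CWNode; only tracks name, parent, and children."""
    def __init__(self, name, children=None):
        self.name = name
        self.parent = None
        self.children = []
        self.set_children(children or [])

    def set_children(self, children):
        self.children = list(children)
        for child in self.children:
            child.parent = self

def swap(old, new):
    """Put new where old was in the parent's child list."""
    siblings = old.parent.children
    siblings[siblings.index(old)] = new
    new.parent = old.parent
    old.parent = None

def replace_subtree(old, new):
    # old's children leave the tree together with old.
    swap(old, new)

def replace_node(old, new):
    # new adopts old's children before taking old's place.
    new.set_children(old.children)
    old.children = []
    swap(old, new)

def names(node):
    """Pre-order list of node names, for easy comparison."""
    return [node.name] + [n for child in node.children for n in names(child)]

b1 = Node('B', [Node('C')])
tree1 = Node('A', [b1])
replace_node(b1, Node('X'))
kept = names(tree1)

b2 = Node('B', [Node('C')])
tree2 = Node('A', [b2])
replace_subtree(b2, Node('X', [Node('Y'), Node('Z')]))
dropped = names(tree2)

print(kept, dropped)
```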

get_is_descendant(maybe_descendant, maybe_ancestor)

Returns True if maybe_descendant is a descendant of maybe_ancestor.

text_to_ref_id(text)

Returns a ref_id that is unique against all other ref_ids returned by this function, vaguely resembling text.
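
A function with this contract could be built from a slug plus a uniqueness counter. The sketch below is an assumption about one possible approach, not the library's actual algorithm; RefIdAllocator and its suffix scheme are hypothetical.

```python
import re

class RefIdAllocator:
    """Hypothetical helper that hands out unique, text-like identifiers."""
    def __init__(self):
        self._used = set()

    def text_to_ref_id(self, text):
        # Lowercase and collapse anything non-alphanumeric into single dashes.
        base = re.sub(r'[^a-z0-9]+', '-', text.lower()).strip('-') or 'ref'
        candidate = base
        suffix = 1
        # Append -2, -3, ... until the candidate has not been handed out before.
        while candidate in self._used:
            suffix += 1
            candidate = '{}-{}'.format(base, suffix)
        self._used.add(candidate)
        return candidate

allocator = RefIdAllocator()
ids = [allocator.text_to_ref_id('Getting Started!') for _ in range(3)]
print(ids)  # ['getting-started', 'getting-started-2', 'getting-started-3']
```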

subtree_to_text(node)

Returns a string of the concatenated text in a subtree.
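
Conceptually, this amounts to joining the text of every text node in document order. Here is a minimal sketch with hypothetical Node and TextNode stand-ins; the real method operates on CWNode subclasses and may differ in detail.

```python
class Node:
    """Hypothetical stand-in for CWNode."""
    def __init__(self, name, children=None):
        self.name = name
        self.children = list(children or [])

class TextNode(Node):
    """Hypothetical stand-in for CWTextNode."""
    def __init__(self, text):
        super().__init__('Text')
        self.text = text

def subtree_to_text(node):
    """Concatenate the text of all text nodes under node, in document order."""
    if isinstance(node, TextNode):
        return node.text
    return ''.join(subtree_to_text(child) for child in node.children)

heading = Node('h1', [
    TextNode('Hello, '),
    Node('em', [TextNode('world')]),
    TextNode('!'),
])
text = subtree_to_text(heading)
print(text)  # Hello, world!
```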

class CWTreeConsistencyError(Exception)

Error that is thrown if any of the limitations of CWTree's methods are violated.

computerwords.cwdom.traversal

Utilities for traversing CWNode trees.

function preorder_traversal(node, start=None, end=None) → iterator(CWNode)

Yields every node in the tree. Each node is yielded before its descendants. Mutation is disallowed.

  • start: If specified, only yield nodes following (not including) this node.

  • end: If specified, do not yield this node or nodes following it.
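
The start and end semantics described above can be illustrated with a small generator. Node and preorder are hypothetical stand-ins written for this sketch; they mirror the documented behavior but are not the library's code.

```python
class Node:
    """Hypothetical stand-in for CWNode."""
    def __init__(self, name, children=None):
        self.name = name
        self.children = list(children or [])

def preorder(node, start=None, end=None):
    """Yield nodes parent-first, honoring the exclusive start and end markers."""
    started = start is None
    stack = [node]
    while stack:
        current = stack.pop()
        if current is end:
            return  # neither end nor anything after it is yielded
        if started:
            yield current
        elif current is start:
            started = True  # start itself is skipped
        # Push children reversed so the leftmost child is popped first.
        stack.extend(reversed(current.children))

c = Node('c')
b = Node('b', [c])
d = Node('d')
a = Node('a', [b, d])

everything = [n.name for n in preorder(a)]
after_b = [n.name for n in preorder(a, start=b)]
before_d = [n.name for n in preorder(a, end=d)]
print(everything, after_b, before_d)
```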

function postorder_traversal(node) → iterator(CWNode)

Yields every node in the tree. Each node is yielded after its descendants. Mutation is disallowed.

function iterate_ancestors(node)

Yields every ancestor of a node, starting with its immediate parent.

from computerwords.cwdom.nodes import CWNode
from computerwords.cwdom.traversal import iterate_ancestors
node_c = CWNode('c', [])
tree = CWNode('a', [
    CWNode('b', [node_c]),
    CWNode('d', []),
])
assert ([node.name for node in iterate_ancestors(node_c)] ==
        ['b', 'a'])

function find_ancestor(node, predicate)

Returns the closest ancestor of a node matching the given predicate.

from computerwords.cwdom.traversal import find_ancestor
document_node = find_ancestor(node, lambda n: n.name == 'Document')

function visit_tree(tree, node_name_to_visitor, node=None, handle_error=None)

Recursively call the CWTreeVisitor for each node. If a node is encountered that has no corresponding visitor, MissingVisitorError is thrown.

from computerwords.cwdom.CWTree import CWTree
from computerwords.cwdom.nodes import CWNode
from computerwords.cwdom.traversal import (
    visit_tree,
    CWTreeVisitor
)

visits = []
class SimpleVisitor(CWTreeVisitor):
    def before_children(self, tree, node):
        visits.append('pre-{}'.format(node.name))

    def after_children(self, tree, node):
        visits.append('post-{}'.format(node.name))

tree = CWTree(CWNode('x', [CWNode('y', [])]))
visit_tree(tree, {
    'x': SimpleVisitor(),
    'y': SimpleVisitor(),
})
assert visits == ['pre-x', 'pre-y', 'post-y', 'post-x']

class MissingVisitorError(Exception)

Error thrown when trying to visit a node for which no visitor is available.

class PostorderTraverser()

A class that lets you iterate over a tree while mutating it.

Keeps track of a cursor representing the last visited node. Each time the next node is requested, the iterator looks at the cursor and walks up the tree to find the cursor's next sibling or parent.

You may replace the cursor if you want to replace the node currently being visited.

You may safely mutate the cursor's ancestors, since they haven't been visited yet.
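
The cursor idea can be sketched as a small iterator. Node and PostorderCursor are hypothetical stand-ins that follow the description above; the real PostorderTraverser may be implemented differently.

```python
class Node:
    """Hypothetical stand-in for CWNode with parent links."""
    def __init__(self, name, children=None):
        self.name = name
        self.parent = None
        self.children = []
        for child in children or []:
            self.add(child)

    def add(self, child):
        self.children.append(child)
        child.parent = self

def leftmost_leaf(node):
    while node.children:
        node = node.children[0]
    return node

class PostorderCursor:
    """Post-order iterator that finds each next node from the last one visited."""
    def __init__(self, root):
        self.root = root
        self.cursor = None

    def __iter__(self):
        return self

    def __next__(self):
        if self.cursor is None:
            self.cursor = leftmost_leaf(self.root)
        elif self.cursor is self.root:
            raise StopIteration
        else:
            parent = self.cursor.parent
            i = parent.children.index(self.cursor)
            if i + 1 < len(parent.children):
                # Descend into the next sibling before visiting the parent.
                self.cursor = leftmost_leaf(parent.children[i + 1])
            else:
                self.cursor = parent
        return self.cursor

c = Node('c')
a = Node('a', [Node('b', [c]), Node('d')])

order = []
for node in PostorderCursor(a):
    order.append(node.name)
    if node is c:
        a.add(Node('e'))  # mutate an ancestor that has not been visited yet

print(order)  # ['c', 'b', 'd', 'e', 'a']
```

Because the next node is always computed from the cursor's current position, the child added to the unvisited ancestor is picked up without restarting the walk.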

replace_cursor(new_cursor)

Only use this if you really know what you are doing.

computerwords.cwdom.nodes

This module contains the nodes that make up the document tree.

Vocabulary

  • Document ID, document_id, or doc_id: a tuple of strings representing a document's relative path to the root of the source files. For example, ('api', 'index.md').
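
As an illustration of the vocabulary above, a document ID of this shape can be derived from a path with pathlib. The path_to_document_id helper is hypothetical; Computer Words may compute IDs differently.

```python
from pathlib import PurePosixPath

def path_to_document_id(source_root, document_path):
    """Return the document's path relative to source_root as a tuple of strings."""
    relative = PurePosixPath(document_path).relative_to(PurePosixPath(source_root))
    return relative.parts

doc_id = path_to_document_id('docs', 'docs/api/index.md')
print(doc_id)  # ('api', 'index.md')
```

A tuple is hashable, so an ID in this form can be used directly as a dictionary key.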

class CWNode()

Superclass for all nodes. Unless you're writing test cases, you'll generally be dealing with subclasses of this.

__init__(name, children=None, document_id=None)

  • name: aka type or kind of node

  • children: list of initial children

  • document_id: document ID of the document in which this node resides. Usually assigned after initialization by CWTree.

claim_children()

Call child.set_parent(self) on each child.

set_children(children)

Replace self.children with a new list of children, and set their parent values.

get_parent()

Return this node's current parent, or None if there isn't one.

set_parent(new_parent)

Set this node's parent.

deep_set_document_id(new_id)

Set the document_id of this node and all its children.

copy()

Copy this node but not its children.

deepcopy()

Copy this node and its children.

deepcopy_children_from(node, at_end=False)

Add a deep copy of all node's descendants before or after existing children.

get_string_for_test_comparison(inner_indentation=2)

Returns a string that is very convenient to compare using unittest.TestCase.assertMultiLineEqual().

get_args_string_for_test_comparison()

Subclasses may override this to populate arguments in test strings.

shallow_repr()

Like self.__repr__(), but don't include children.

class CWRootNode(CWNode)

name: Root

The node at the root of a CWTree. Its immediate children should all be instances of CWDocumentNode.

class CWEmptyNode(CWNode)

name: Empty

A node with no content. May be used instead of removing a node.

class CWDocumentNode(CWNode)

name: Document

A node representing a document. Its descendants can be anything but CWRootNode and CWDocumentNode.

__init__(path, children=None, document_id=None)

  • path: Filesystem path to this document

class CWTagNode(CWNode)

name: arbitrary HTML tag name

A node that can be directly converted to valid HTML.

__init__(name, kwargs, children=None, document_id=None)

  • name: Still the type/kind of node, but is also a valid HTML tag name.

  • kwargs: Arguments/attributes parsed from the tag or inserted by processors.

class CWTextNode(CWNode)

name: Text

A node that only contains text to be written to output.

__init__(text, document_id=None, escape=True)

  • text: The text

  • escape: If True, output should be escaped in whatever the output language is. Defaults to True.

class CWAnchorNode(CWTagNode)

name: Anchor

A node that can be linked to via its globally unique ref_id. Essentially a specialized version of CWTagNode('a', {'name': ...}).

__init__(ref_id, kwargs=None, children=None, document_id=None)

  • ref_id: Globally unique ID of this anchor

class CWLinkNode(CWNode)

name: Link

A node that can link to the location of a CWAnchorNode via its globally unique ref_id.

In the future, this may become a subclass of CWTagNode.

class CWDocumentLinkNode(CWNode)

name: DocumentLink

A node that can link to the location of a document without including any anchors inside the page.