API

computerwords.plugin

Writing a plugin for Computer Words

Summary: write a class inheriting from computerwords.plugin.CWPlugin, put it in a module accessible in PYTHONPATH, and add it to a list of module paths under the "plugins" key in your config file.

1. Inherit from CWPlugin

2. Put your plugin in PYTHONPATH

You can either install your plugin as a package or just add PYTHONPATH=$PYTHONPATH:$PATH_TO_MODULE_DIR in front of your invocation of python -m computerwords.

3. Add your plugin to the config file

Things that might help you

If you need to get the content of a node's subtree in the order it appears in the document, use computerwords.cwdom.traversal.preorder_traversal() to walk the tree. In the future, this will be made easier, but this solution should work for now, even if it's slow.

You can store data on nodes. Each node has a data dict attribute.

You can also store data on the tree using the tree.processor_data dict.

Best Practices

Do not mutate the node "in place." Instead, create a copy, modify it, and replace the original. That way, any other processors that respond to changes in that node can run again.
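
The copy, modify, replace advice can be sketched without Computer Words installed. Everything below is a hypothetical stand-in: Node is not the real CWNode, and replace_child and rename_via_copy are illustration-only helpers, not library functions. In a real processor you would use the node's own copy method and the tree's replace methods instead.

```python
import copy


class Node:
    """Hypothetical stand-in for a Computer Words node (not the real CWNode)."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = list(children or [])
        self.data = {}  # per-node storage, mirroring the data dict described above


def replace_child(parent, old_child, new_child):
    """Stand-in for a tree-level replace: swap old_child for new_child."""
    index = parent.children.index(old_child)
    parent.children[index] = new_child


def rename_via_copy(parent, node, new_name):
    """Copy the node, modify the copy, then replace the original."""
    replacement = copy.copy(node)                # new node object
    replacement.children = list(node.children)   # do not share the child list
    replacement.data = dict(node.data)           # do not share the data dict
    replacement.name = new_name                  # modify the copy only
    replace_child(parent, node, replacement)
    return replacement


original = Node("note")
original.data["seen"] = True
root = Node("Root", [original])
new_node = rename_via_copy(root, original, "aside")
```

The original node is left untouched, so anything still holding a reference to it sees the old state, while the tree now contains the modified copy.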
class CWPlugin()

CONFIG_NAMESPACE

[...] "html" key.

get_default_config()

If you set CONFIG_NAMESPACE, provide a default dictionary here.

add_processors(library)

Use cwdom.library.Library.processor to define transforms on the tree.
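
Here is a rough sketch of how the three members of CWPlugin fit together. It is self-contained so it runs without Computer Words: the CWPlugin and RecordingLibrary classes below are simplified stand-ins, ShoutPlugin and its "shout" namespace are made up, and the (tree, node) processor signature is an assumption rather than something this page states.

```python
class CWPlugin:
    """Stand-in for computerwords.plugin.CWPlugin so the sketch runs alone."""

    CONFIG_NAMESPACE = None

    def get_default_config(self):
        return None

    def add_processors(self, library):
        pass


class RecordingLibrary:
    """Stand-in for the library: it only records what gets registered."""

    def __init__(self):
        self.registered = {}

    def processor(self, tag_name, p=None, before_others=False):
        self.registered.setdefault(tag_name, []).append(p)


class ShoutPlugin(CWPlugin):
    """Hypothetical plugin; the names and config keys are invented."""

    CONFIG_NAMESPACE = "shout"

    def get_default_config(self):
        # Defaults for this plugin's own config namespace.
        return {"suffix": "!"}

    def add_processors(self, library):
        # Function-call form of processor registration.
        library.processor("shout", self.process_shout)

    def process_shout(self, tree, node):
        # Assumed processor signature; a real processor would transform
        # the node here using the tree's mutation methods.
        pass


library = RecordingLibrary()
plugin = ShoutPlugin()
plugin.add_processors(library)
```

With the real package, the base class would come from computerwords.plugin and the library instance would be handed to add_processors by Computer Words itself.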
computerwords.library

class Library()

A collection of functions that can be applied to nodes. Each function is called a processor and applies to nodes with the given name. As a user of Computer Words, you really only need to know about the processor() method.

processor(tag_name, p=None, before_others=False)

Declare a function as a processor for nodes with name tag_name. May be used as a decorator or as a simple function call.
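
The two calling styles of processor() can be sketched like this. SketchLibrary is a stand-in written for this example, not the real Library. Treat two things as assumptions: that before_others=True puts the processor ahead of ones already registered for the same name, and that processors take (tree, node).

```python
class SketchLibrary:
    """Stand-in showing one way a dual-use processor() could work."""

    def __init__(self):
        self.tag_name_to_processors = {}

    def processor(self, tag_name, p=None, before_others=False):
        def register(func):
            processors = self.tag_name_to_processors.setdefault(tag_name, [])
            if before_others:
                processors.insert(0, func)  # assumed meaning of before_others
            else:
                processors.append(func)
            return func

        if p is None:
            return register   # decorator form: @library.processor("note")
        return register(p)    # function form: library.processor("note", func)


library = SketchLibrary()


# As a decorator:
@library.processor("note")
def process_note(tree, node):
    return "note"


# As a function:
def process_warning(tree, node):
    return "warning"


library.processor("note", process_warning, before_others=True)
```

Returning the function from register keeps the decorated name usable as a normal function afterwards.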
computerwords.cwdom.CWTree

class CWTree()

The CWTree class models the tree of all documents and their contents. It allows you to traverse the tree in various ways and mutate it during some types of traversal.

Properties:

root: Root of the tree
env: Dictionary containing information about how Computer Words was invoked and configured. Contains keys output_dir, which is a pathlib path to the root directory of output files, and config, which is a dict containing the fully resolved configuration.
processor_data: Dict that you can use to store and retrieve arbitrary data during processing.

get_document_path(document_id)

preorder_traversal(node=None)

Calls computerwords.cwdom.traversal.preorder_traversal() using the root.

postorder_traversal(node=None)

Calls computerwords.cwdom.traversal.postorder_traversal() using the root.

postorder_traversal_allowing_ancestor_mutations(node=None)

Yields every node in the tree in post-order. While iterating, you use the mutation methods on this class to mutate the tree.

apply_library(library, initial_data=None)

[...] library. Technically public, but you probably have no use for this.

mark_node_dirty(node)

mark_ancestors_dirty(node)

get_is_node_dirty(node)

Returns True if the node is marked dirty.

get_was_node_removed(node)

Returns True if the node was previously in the tree but has since been removed.

wrap_node(inner_node, outer_node)

Add an ancestor between inner_node and its parent.

Limitations

outer_node may not have any existing children.

replace_subtree(old_node, new_node)

Replace a node and all its children with another node and all its children.

insert_subtree(parent, i, child)

Adds the node child and all its children as a child of parent at index i.

Limitations (may be temporary)

parent must be the active node or a descendant of it.

add_siblings_ahead(new_siblings)

Add new_siblings as children of the active node's parent immediately after the active node.

This may be replaced by a more general method later.

replace_node(old_node, new_node)

Replace old_node with new_node. Give all of old_node's children to new_node.

Limitations (may be temporary)

May only be used on the active node.

get_is_descendant(maybe_descendant, maybe_ancestor)

Returns True if maybe_descendant is a descendant of maybe_ancestor.

text_to_ref_id(text)

[...] text

subtree_to_text(node)

class CWTreeConsistencyError(Exception)

[...] CWTree's methods are violated.
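
To make wrap_node and get_is_descendant more concrete, here is a small sketch of what they describe, written against a stand-in Node class rather than the real CWTree. It skips dirty tracking and the active-node rules entirely, and it raises ValueError where the real tree would presumably raise its own error, so read it as an illustration of the tree surgery only.

```python
class Node:
    """Stand-in node with parent and children links (not the real CWNode)."""

    def __init__(self, name, children=None):
        self.name = name
        self.parent = None
        self.children = []
        for child in children or []:
            self.add_child(child)

    def add_child(self, child):
        child.parent = self
        self.children.append(child)


def wrap_node(inner_node, outer_node):
    """Put outer_node between inner_node and inner_node's parent."""
    if outer_node.children:
        raise ValueError("outer_node may not have any existing children")
    parent = inner_node.parent
    if parent is None:
        raise ValueError("inner_node has no parent to wrap under")
    index = parent.children.index(inner_node)
    parent.children[index] = outer_node   # outer takes inner's old slot
    outer_node.parent = parent
    outer_node.add_child(inner_node)      # inner becomes outer's only child


def get_is_descendant(maybe_descendant, maybe_ancestor):
    """Walk up from maybe_descendant looking for maybe_ancestor."""
    current = maybe_descendant.parent
    while current is not None:
        if current is maybe_ancestor:
            return True
        current = current.parent
    return False


text = Node("Text")
paragraph = Node("p", [text])
emphasis = Node("em")
wrap_node(text, emphasis)
```

After the call, the paragraph's only child is the em node, and the text node sits underneath it.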
computerwords.cwdom.traversal

[...] CWNode trees.

function preorder_traversal(node, start=None, end=None) → iterator(CWNode)

Yields every node in the tree. Each node is yielded before its descendants. Mutation is disallowed.

start: If specified, only yield nodes following (not including) this node.
end: If specified, do not yield this node or nodes following it.

function postorder_traversal(node) → iterator(CWNode)

function iterate_ancestors(node)

Yields every ancestor of a node, starting with its immediate parent.

function find_ancestor(node, predicate)

Returns the closest ancestor of a node matching the given predicate.

function visit_tree(tree, node_name_to_visitor, node=None, handle_error=None)

Recursively call the CWTreeVisitor for each node. If a node is encountered that has no corresponding visitor, MissingVisitorError is raised.

class MissingVisitorError(Exception)

class PostorderTraverser()

A class that lets you iterate over a tree while mutating it.

Keeps track of a cursor representing the last visited node. Each time the next node is requested, the iterator looks at the cursor and walks up the tree to find the cursor's next sibling or parent.

You may replace the cursor if you want to replace the node currently being visited.

You may safely mutate the cursor's ancestors, since they haven't been visited yet.

replace_cursor(new_cursor)
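
If the traversal orders are unfamiliar, this sketch shows the general technique behind the functions above on a stand-in Node class. These are not the library's implementations: the start and end parameters of preorder_traversal are left out, and nothing here supports mutation during iteration.

```python
class Node:
    """Stand-in node with parent and children links (not the real CWNode)."""

    def __init__(self, name, children=None):
        self.name = name
        self.parent = None
        self.children = list(children or [])
        for child in self.children:
            child.parent = self


def preorder_traversal(node):
    """Yield each node before its descendants."""
    yield node
    for child in node.children:
        yield from preorder_traversal(child)


def postorder_traversal(node):
    """Yield each node after its descendants."""
    for child in node.children:
        yield from postorder_traversal(child)
    yield node


def iterate_ancestors(node):
    """Yield ancestors, starting with the immediate parent."""
    current = node.parent
    while current is not None:
        yield current
        current = current.parent


def find_ancestor(node, predicate):
    """Return the closest ancestor matching predicate, or None."""
    for ancestor in iterate_ancestors(node):
        if predicate(ancestor):
            return ancestor
    return None


text = Node("Text")
tree = Node("Root", [Node("Document", [Node("p", [text]), Node("hr")])])
preorder_names = [n.name for n in preorder_traversal(tree)]
postorder_names = [n.name for n in postorder_traversal(tree)]
```

Pre-order matches the order content appears in the document, which is why it is the one to reach for when collecting a subtree's text.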
computerwords.cwdom.nodes

This module contains the nodes that make up the document tree.

Vocabulary

Document ID, document_id, or doc_id: a tuple of strings representing a document's relative path to the root of the source files. For example, ('api', 'index.md').

class CWNode()

__init__(name, children=None, document_id=None)

name: aka type or kind of node
children: list of initial children
document_id: document ID of the document in which this node resides. Usually assigned after initialization by CWTree.

claim_children()

Call child.set_parent(self) on each child.

set_children(children)

Replace self.children with a new list of children, and set their parent values.

get_parent()

Returns the parent, or None if there isn't one.

set_parent(new_parent)

deep_set_document_id(new_id)

Set the document_id of this node and all its children.

copy()

deepcopy()

deepcopy_children_from(node, at_end=False)

Add deep copies of node's descendants before or after existing children.

get_string_for_test_comparison(inner_indentation=2)

[...] unittest.TestCase.assertMultiLineEqual()

get_args_string_for_test_comparison()

shallow_repr()

Like self.__repr__(), but don't include children.

class CWRootNode(CWNode)

name: Root

The node at the root of a CWTree. Its immediate children should all be instances of CWDocumentNode.

class CWEmptyNode(CWNode)

name: Empty

A node with no content. May be used instead of removing a node.

class CWDocumentNode(CWNode)

name: Document

A node representing a document. Its descendants can be anything but CWRootNode and CWDocumentNode.

__init__(path, children=None, document_id=None)

path: Filesystem path to this document

class CWTagNode(CWNode)

name: arbitrary HTML tag name

A node that can be directly converted to valid HTML.

__init__(name, kwargs, children=None, document_id=None)

name: Still the type/kind of node, but is also a valid HTML tag name.
kwargs: Arguments/attributes parsed from the tag or inserted by processors.

class CWTextNode(CWNode)

name: Text

A node that only contains text to be written to output.

__init__(text, document_id=None, escape=True)

text: The text
escape: If True, output should be escaped in whatever the output language is. Defaults to True.

class CWAnchorNode(CWTagNode)

name: Anchor

A node that can be linked to via its globally unique ref_id. Essentially a specialized version of CWTagNode('a', {'name': ...}).

__init__(ref_id, kwargs=None, children=None, document_id=None)

ref_id: Globally unique ID of this anchor

class CWLinkNode(CWNode)

name: Link

A node that can link to the location of a CWAnchorNode via its globally unique ref_id.

In the future, this may become a subclass of CWTagNode.

class CWDocumentLinkNode(CWNode)

name: DocumentLink

A node that can link to the location of a document without including any anchors inside the page.
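
The parent bookkeeping and document ID methods on CWNode can be pictured with a simplified stand-in. SketchNode below only mirrors the method names listed above; its bodies are guesses at the simplest behavior consistent with the descriptions (including the assumption that deep_set_document_id recurses through all descendants), not the real implementation.

```python
class SketchNode:
    """Simplified stand-in mirroring a few CWNode method names."""

    def __init__(self, name, children=None, document_id=None):
        self.name = name
        self.document_id = document_id
        self.children = list(children or [])
        self._parent = None
        self.claim_children()

    def claim_children(self):
        # Point every child back at this node.
        for child in self.children:
            child.set_parent(self)

    def set_children(self, children):
        # Swap in a new child list and fix up parent links.
        self.children = list(children)
        self.claim_children()

    def get_parent(self):
        return self._parent

    def set_parent(self, new_parent):
        self._parent = new_parent

    def deep_set_document_id(self, new_id):
        # Assumed to reach the whole subtree, not just direct children.
        self.document_id = new_id
        for child in self.children:
            child.deep_set_document_id(new_id)


doc_id = ("api", "index.md")  # tuple form described under Vocabulary
text = SketchNode("Text")
heading = SketchNode("h1", [text])
document = SketchNode("Document", [heading])
document.deep_set_document_id(doc_id)
```

Because the ID is a tuple of path components, it can be compared or used as a dict key directly.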