Internals¶
The following section describes some of the internals of mwlib. Only read this if you plan to extend mwlib’s functionality.
Writers¶
A writer in mwlib generates output from a collection of MediaWiki articles in some writer-specific format.
The writer function¶
Essentially a writer is just a Python function with the following signature:
def writer(env, output, status_callback, **kwargs): pass
Note that the function doesn’t necessarily have to be called “writer”.
The env argument is an mwlib.wiki.Environment instance which always has the wiki attribute set to the configured WikiDB instance and the metabook attribute set to a filled-in mwlib.metabook.MetaBook instance. If images are used, the images attribute of the env object is set to the configured ImageDB instance.
The output argument is the name of the file to which the writer should write its output.
The status_callback argument is a callable with the following signature:
def status_callback(status=None, progress=None, article=None): pass
which should be called from time to time to update the status/progress information. status should be set to a short English description of what’s happening (e.g. “parsing”, “rendering”). progress should be an integer between 0 and 100 indicating the percentage of progress; you don’t have to set it to 0 at the start and to 100 at the end, because mw-render does that. article should be the unicode string of the currently processed article. All parameters are optional, so you can pass only one or two of them to status_callback() and the others will keep their previous value.
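The calling convention described above can be sketched as follows. This is a minimal illustration, not mwlib code: make_status_callback and example_writer are hypothetical names, and the callback here only remembers the last value of each parameter to mimic the "keep their previous value" behaviour.

```python
import os
import tempfile


def make_status_callback():
    """Return (callback, state); state holds the most recent values."""
    state = {"status": None, "progress": None, "article": None}

    def status_callback(status=None, progress=None, article=None):
        # Only overwrite the values that were actually passed.
        for key, value in (("status", status),
                           ("progress", progress),
                           ("article", article)):
            if value is not None:
                state[key] = value

    return status_callback, state


def example_writer(env, output, status_callback, **kwargs):
    # Hypothetical writer: env is unused here. A real writer would read
    # the articles via env.wiki and env.metabook.
    status_callback(status="parsing", progress=10)
    status_callback(article="Test")  # status and progress are kept
    status_callback(status="rendering", progress=90)
    with open(output, "w") as f:
        f.write("rendered output\n")


callback, state = make_status_callback()
out_path = os.path.join(tempfile.mkdtemp(), "out.txt")
example_writer(None, out_path, callback)
```

After the call, state holds the last value reported for each of the three parameters, even though no single call passed all of them.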
The return value of the writer function is not used: if the function returns, this is treated as success. To indicate failure, the writer must raise an exception. Use the WriterError exception defined in mwlib.writerbase (or a subclass thereof) and instantiate it with a human-readable English error message if you want the message to be written to the error file specified with the --error-file option of mw-render. For all other exceptions, the traceback is written to the error file.
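Error reporting might look like the sketch below. So that the example runs without mwlib installed, it defines a local stand-in for WriterError; in a real writer you would import the exception from mwlib.writerbase instead. The function name checked_writer and the empty-filename check are purely illustrative assumptions.

```python
class WriterError(RuntimeError):
    """Stand-in for mwlib.writerbase.WriterError (for this demo only)."""


def checked_writer(env, output, status_callback, **kwargs):
    if not output:
        # Human-readable English message; according to the text above,
        # mw-render writes it to the file given with --error-file.
        raise WriterError("No output filename given")
    # A real writer would produce its output here.
    # Returning normally (with any return value) signals success.


try:
    checked_writer(None, "", lambda **kw: None)
except WriterError as exc:
    message = str(exc)
```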
Your writer function can define additional keyword arguments (indicated by the “**kwargs” above) that can be passed to the writer with the --writer-options argument of the mw-render command (see below). If the user specifies a writer option as option=value, the kwarg option is passed the string "value"; if the user specifies the option just as option, the kwarg option is passed the value True. All writer options should be optional and documented using the options attribute on the writer object (see below).
Attributes¶
Optionally – and preferably – this function object has the following additional attributes:
writer.description = 'Some short description'
writer.content_type = 'Content-Type of the output'
writer.file_extension = 'File extension for documents'
writer.options = {
    'foo': {
        'help': 'help text for "switch" foo',
    },
    'bar': {
        'param': 'PARAM',
        'help': 'help text for option bar with parameter PARAM',
    },
}
For example, the writer “odf” (defined in mwlib.odfwriter) sets the attributes to these values:
writer.description = 'OpenDocument Text'
writer.content_type = 'application/vnd.oasis.opendocument.text'
writer.file_extension = 'odt'
and the writer “rl” from mwlib.rl (defined in mwlib.rl.rlwriter) sets the attributes to these values:
writer.description = 'PDF documents (using ReportLab)'
writer.content_type = 'application/pdf'
writer.file_extension = 'pdf'
writer.options = {
    'coverimage': {
        'param': 'FILENAME',
        'help': 'filename of an image for the cover page',
    },
}
The description is used when the list of writers is displayed with mw-render --list-writers; all information is displayed with mw-render --writer-info SOMEWRITER. The content type and file extension are written to a file, if one is specified with the --status-file argument of mw-render.
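To illustrate how these attributes fit together, the sketch below attaches them to a function object and formats them into a short summary. describe_writer is a hypothetical helper, not part of mwlib, and its output format is invented for this example; it does not reproduce what mw-render --writer-info actually prints.

```python
def writer(env, output, status_callback, coverimage=None, **kwargs):
    pass  # a real writer would render the collection here


writer.description = 'PDF documents (using ReportLab)'
writer.content_type = 'application/pdf'
writer.file_extension = 'pdf'
writer.options = {
    'coverimage': {
        'param': 'FILENAME',
        'help': 'filename of an image for the cover page',
    },
}


def describe_writer(name, func):
    """Hypothetical helper: summarize a writer's optional attributes."""
    description = getattr(func, 'description', '<no description>')
    lines = ['%s: %s' % (name, description)]
    for opt, spec in sorted(getattr(func, 'options', {}).items()):
        # Options with a 'param' entry take a value; others are switches.
        label = '%s=%s' % (opt, spec['param']) if 'param' in spec else opt
        lines.append('  %s: %s' % (label, spec.get('help', '')))
    return '\n'.join(lines)


info = describe_writer('rl', writer)
```

Because every attribute is read with getattr and a default, the helper also works for writers that define none of them, which matches the "optionally" in the description above.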
Publishing the writer¶
Writers are made available as plugins using setuptools entry points.
They have a name and must belong to the entry point group “mwlib.writers”.
To publish writers in your distribution, add all included writers to the entry point group by passing the entry_points kwarg to the call to setuptools.setup() in your setup.py file:
setup(
    ...
    entry_points={
        'mwlib.writers': [
            'foo = somepackage.foo:writer',
            'bar = somepackage.barbaz:bar_writer',
            'baz = somepackage.barbaz:baz_writer',
        ],
    },
    ...
)
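Once a distribution declaring such entry points is installed, the registered writers can be discovered by group name. The following is a minimal sketch using the standard library's importlib.metadata; it is not necessarily how mw-render itself looks up writers, and find_writers is a hypothetical helper name.

```python
import sys
from importlib import metadata


def find_writers(group="mwlib.writers"):
    """Return {name: entry point} for all installed writers in `group`."""
    if sys.version_info >= (3, 10):
        eps = metadata.entry_points(group=group)
    else:
        # Older Pythons return a dict-like mapping of group -> entry points.
        eps = metadata.entry_points().get(group, [])
    return {ep.name: ep for ep in eps}


writers = find_writers()
# With the setup() call above installed, writers["foo"].load() would import
# somepackage.foo and return its `writer` function.
```

If no distribution provides writers in the group, the result is simply an empty dictionary.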
Using writers¶
From the command line, writers can be used with the mw-render command. Called with just the --list-writers option, mw-render lists the available writers together with their descriptions. The name of an available writer can then be passed with the --writer option to produce output with that writer. For example, this will use the ODF writer (named “odf”) to produce a document in the OpenDocument Text format:
$ mw-render --config :en --writer odf --output test.odt Test
Additional options for the writer can be specified with the --writer-options argument, whose value is a “;”-separated list of keywords or “key=value” pairs.
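The mapping from such an option string to writer keyword arguments can be sketched as below. This only illustrates the documented semantics (a bare keyword becomes True, “key=value” becomes the string value); parse_writer_options is a hypothetical function, not mwlib's actual parser, and stripping whitespace and skipping empty segments are assumptions of this sketch.

```python
def parse_writer_options(value):
    """Split 'a=1;b' into {'a': '1', 'b': True} (illustrative only)."""
    options = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate empty segments such as a trailing ';'
        if "=" in item:
            key, val = item.split("=", 1)
            options[key.strip()] = val.strip()
        else:
            options[item] = True
    return options


opts = parse_writer_options("coverimage=cover.png;foo")
# A writer could then be called as writer(env, output, status_callback, **opts).
```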
Metabooks¶
A Metabook describes a collection of articles and chapters together with some metadata like title or version. The actual data (e.g. the wikitext of articles) is not contained in the Metabook.
The Metabook is a simple dictionary containing lists, integers, strings (which are Unicode-safe; they are represented as unicode in Python) and other dictionaries. When read from or written to a file or sent over the network, it’s serialized in JSON format.
Metabook Types¶
Every dictionary contained in the Metabook (and the Metabook dictionary itself) has a type. The different types are described below. The Metabook dictionary itself has type “collection”.
Collection¶
type (string):
Fixed value “collection”
version (integer):
Protocol version, 1 for now
title (string, optional):
Title of the collection
subtitle (string, optional):
Subtitle of the collection
editor (string, optional):
Editor of the collection
items (list of article and/or chapter objects, can be empty):
Chapters and top-level articles contained in the collection
licenses (list of license objects):
List of licenses for articles in this collection
License¶
type (string):
Fixed value “license”
name (string):
Name of license
mw_license_url (string, optional):
URL to license text in wikitext format
mw_rights_page (string, optional):
Title of article containing license text
mw_rights_icon (string, optional):
URL of license icon
mw_rights_url (string, optional):
URL to license text in any format
mw_rights_text (string, optional):
Name and possibly a short description of the license
Article¶
type (string):
Fixed value “article”
content_type (string):
Fixed value “text/x-wiki”
title (string):
Title of this article
displaytitle (string, optional):
Title to be used in rendered output instead of the real title
revision (string, optional):
Revision of article, i.e. oldid for MediaWiki. If omitted, the latest revision is used.
timestamp (integer, optional):
UNIX timestamp (seconds since 1970-1-1) of the revision of this article
url (string):
URL to article in source wiki
authors (list of strings):
list of principal authors
source-url (string):
URL of source wiki. This URL is the key to an item in the sources dictionary in the content.json object of the ZIP file.
Chapter¶
type (string):
Fixed value “chapter”
title (string):
Title of this chapter
items (list of article objects, can be empty):
List of articles contained in this chapter
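The collection, article and chapter types described above can be assembled from plain Python dictionaries and serialized with the standard json module. This is a minimal sketch: the helper functions article and chapter are hypothetical conveniences for this example, not mwlib's API, and the titles are made up.

```python
import json


def article(title, **extra):
    """Build an article dictionary; `extra` may hold revision, url, etc."""
    item = {"type": "article", "content_type": "text/x-wiki", "title": title}
    item.update(extra)
    return item


def chapter(title, items):
    """Build a chapter dictionary containing the given article dictionaries."""
    return {"type": "chapter", "title": title, "items": list(items)}


metabook = {
    "type": "collection",
    "version": 1,
    "title": "Example",
    "items": [
        article("Top-level Article"),
        chapter("First Chapter",
                [article("First Article in Chapter", revision="1234")]),
    ],
    "licenses": [],
}

serialized = json.dumps(metabook)
```

Since the structure contains only dictionaries, lists, integers and strings, a json.loads() of the serialized string yields an equal dictionary again.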
Source¶
type (string):
Fixed value “source”
system (string):
Fixed value “MediaWiki” for now
url (string, optional):
“home” URL of source, e.g. “http://en.wikipedia.org/wiki/Main_Page” (same as key for this entry)
name (string):
Unique name of source, e.g. “Wikipedia (en)”
language (string):
2-character ISO code of language, e.g. “en”
interwikimap (dictionary mapping prefixes to interwiki objects, optional):
Describes interwikimap for this wiki, cf. http://en.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=interwikimap
Interwiki¶
Interwiki entries can describe language links and interwiki links.
type (string):
Fixed value “interwiki”
prefix (string):
Prefix in MediaWiki links, i.e. the part before the “:”. This is the key in the interwikimap attribute of a source object.
url (string):
URL template; the string “$1” gets replaced with the link target (without prefix)
local (bool, optional):
True if the interwiki link is a “local” one
language (string, optional):
Name of the language, if this interwiki describes language links
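The “$1” substitution in the url template can be illustrated with a few lines of Python. This is a sketch under assumptions: resolve_interwiki is a hypothetical helper, the sample interwikimap entry is made up for the demonstration, and no URL quoting of the link target is attempted.

```python
def resolve_interwiki(interwikimap, link):
    """Return the URL for 'prefix:target', or None if the prefix is unknown."""
    prefix, sep, target = link.partition(":")
    entry = interwikimap.get(prefix)
    if not sep or entry is None:
        return None
    # Replace the placeholder with the link target (without the prefix).
    return entry["url"].replace("$1", target)


# Made-up sample data following the field descriptions above.
interwikimap = {
    "de": {
        "type": "interwiki",
        "prefix": "de",
        "url": "http://de.wikipedia.org/wiki/$1",
        "local": True,
        "language": "Deutsch",
    },
}
url = resolve_interwiki(interwikimap, "de:Berlin")
```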
Example¶
Given in JSON notation:
{
    "type": "collection",
    "version": 1,
    "title": "This is the Collection Title",
    "subtitle": "An optional subtitle",
    "editor": "Jane Doe",
    "items": [
        {
            "type": "article",
            "title": "Top-level Article",
            "content_type": "text/x-wiki"
        },
        {
            "type": "chapter",
            "title": "First Chapter",
            "items": [
                {
                    "type": "article",
                    "title": "First Article in Chapter",
                    "revision": "1234",
                    "timestamp": 122331212312,
                    "content_type": "text/x-wiki",
                    "source-url": "http://en.wikipedia.org/wiki/Main_Page"
                },
                {
                    "type": "article",
                    "title": "Second Article in Chapter",
                    "content_type": "text/x-wiki",
                    "source-url": "http://en.wikipedia.org/wiki/Main_Page"
                }
            ]
        }
    ],
    "licenses": [
        {
            "type": "license",
            "name": "GFDL",
            "mw_license_url": "http://en.wikipedia.org/wiki/Wikipedia:Text_of_the_GNU_Free_Documentation_License"
        }
    ]
}
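A consumer of such a structure typically needs the articles in document order, whether they sit at the top level or inside a chapter. The sketch below walks an abbreviated copy of the example; iter_articles is a hypothetical helper for illustration, not a function provided by mwlib.

```python
import json


def iter_articles(metabook):
    """Yield (chapter_title_or_None, article_dict) in document order."""
    for item in metabook.get("items", []):
        if item["type"] == "article":
            yield None, item
        elif item["type"] == "chapter":
            for sub in item.get("items", []):
                yield item["title"], sub


raw = """
{"type": "collection", "version": 1, "items": [
  {"type": "article", "title": "Top-level Article",
   "content_type": "text/x-wiki"},
  {"type": "chapter", "title": "First Chapter", "items": [
    {"type": "article", "title": "First Article in Chapter",
     "content_type": "text/x-wiki"}
  ]}
], "licenses": []}
"""
titles = [(chap, art["title"]) for chap, art in iter_articles(json.loads(raw))]
```

Top-level articles are reported with None as their chapter title, so callers can tell them apart from articles nested in a chapter.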