There are two main areas where showthedocs could improve: supporting a new language and improving an existing parser/annotator. The UI itself doesn't work very well on small/touch screens, so if someone cares about that, feel free to send suggestions or pull requests.
If you'd like to hack on the site but have a hard time navigating the code base, please don't hesitate to reach out to me, either on GitHub or privately, and I'll answer any questions (and add missing documentation).
supporting a new language
Adding a language requires the following:
- writing a parser (or using an existing one)
- using that parser to annotate queries
- importing the documentation (usually a small subset of it) and annotating it as well
This commit that added support for gitconfig can be used for guidance on the various pieces of the code that need to be modified.
writing a parser
The parsing that showthedocs needs is generally much shallower than what the target language's own tooling does. Python has a vast array of existing parsers, so check what's out there before starting one from scratch. If you do write one, prefer a lenient parser, since the purpose of showthedocs isn't to validate input.
The parser should produce some form of an AST that can be easily traversed by the annotator (more on that next). In essence this is similar to what most syntax highlighters do, but for the results to be more meaningful than "this is a keyword", or "that is a string", parsing needs to go deeper and provide things like "this is a SELECT statement, these are the table names", etc.
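To make the "shallow and lenient" idea concrete, here is a minimal sketch of a tokenizer for a tiny SQL-like subset. It is not how showthedocs parses SQL; the `Node` type, the `parse` function and the keyword set are all made up for illustration. The point is that every node records its character range in the input, and that unrecognized input becomes an `other` node rather than an error.

```python
import re
from collections import namedtuple

# Hypothetical node type: a kind plus the character range it covers in the input.
Node = namedtuple('Node', 'kind value start end')

# Illustrative keyword subset, nowhere near complete SQL.
KEYWORDS = {'select', 'from', 'where'}

# Whitespace, identifiers, or any single character (so nothing is ever rejected).
TOKEN = re.compile(r'\s+|[A-Za-z_][A-Za-z_0-9]*|.', re.DOTALL)


def parse(text):
    '''Leniently split text into a flat list of nodes. Anything that is not
    recognized is kept as an 'other' node instead of raising an error.'''
    nodes = []
    expect_table = False
    for m in TOKEN.finditer(text):
        value = m.group()
        if value.isspace():
            continue
        lowered = value.lower()
        if lowered in KEYWORDS:
            kind = lowered
            # The identifier right after FROM is treated as a table name.
            expect_table = lowered == 'from'
        elif value == '*':
            kind = 'column'
        elif value[0].isalpha() or value[0] == '_':
            kind = 'table' if expect_table else 'column'
            expect_table = False
        else:
            kind = 'other'
        nodes.append(Node(kind, value, m.start(), m.end()))
    return nodes


for node in parse('SELECT * FROM foo;'):
    print(node)
```

A real parser would produce a tree rather than a flat list, but the same two properties (offsets on every node, no hard failures) are what the annotator relies on.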
SQL is parsed using sqlparse. Nginx
and gitconfigs are parsed with pyparsing (see
This is the definition of an annotation:
    class Annotation(object):
        '''An annotation selects a range in the input, assigns it a group and
        a list of arbitrary class names.

        Annotations translate to HTML by wrapping the range in a <span> tag.
        The group appears as the value of a data-showdocs attribute, likewise
        for the class names.

        A group is an arbitrary string that identifies the selected range. A
        group is visualized in a special manner in the user interface,
        depending on the decoration applied to it. A group exists to connect a
        piece of the input to its documentation, which will somewhere have a
        tag with the same data-showdocs attribute.

        The list of class names is currently only used to apply decorations. A
        decoration controls the display of the annotation in the UI. The most
        common one is a back decoration, which changes the background color
        and supports things like connecting links when hovering the
        annotation.'''
        def __init__(self, start, end, group, classes):
            if end <= start:
                raise ValueError('end smaller than start')
            if not isinstance(classes, list):
                raise ValueError('classes needs to be a list')

            self.start = start
            self.end = end
            self.group = group
            self.classes = classes
The annotation process consists of creating annotations for the input by traversing the AST produced by the parser. It is the responsibility of the annotator to assign groups to certain parts of the input and apply decorations. Lastly, the annotator needs to request a piece of documentation to appear in the result of the query.
For example, if we consider the query:
SELECT * FROM foo;
we might see these annotations:
SELECT, group=select, decorate=back
*, group=column, decorate=back
FROM, group=from, decorate=back
foo, group=table, decorate=back
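As a rough illustration, the annotations listed above could be built and turned into HTML along these lines. This is a sketch under assumptions, not the code showthedocs uses: the `Annotation` class is a trimmed copy of the definition quoted earlier so the example runs on its own, while the `render` function and the `decorate-back` class name are hypothetical.

```python
from html import escape  # Python 3 standard library


class Annotation(object):
    '''Trimmed copy of the definition quoted earlier in this document.'''
    def __init__(self, start, end, group, classes):
        if end <= start:
            raise ValueError('end smaller than start')
        if not isinstance(classes, list):
            raise ValueError('classes needs to be a list')
        self.start = start
        self.end = end
        self.group = group
        self.classes = classes


BACK = 'decorate-back'  # hypothetical class name standing in for the back decoration

query = 'SELECT * FROM foo;'
annotations = [
    Annotation(0, 6, 'select', [BACK]),    # SELECT
    Annotation(7, 8, 'column', [BACK]),    # *
    Annotation(9, 13, 'from', [BACK]),     # FROM
    Annotation(14, 17, 'table', [BACK]),   # foo
]


def render(text, annotations):
    '''Wrap each annotated range in a <span> carrying its group and classes.
    Hypothetical helper; assumes the ranges do not overlap.'''
    out = []
    pos = 0
    for a in sorted(annotations, key=lambda a: a.start):
        out.append(escape(text[pos:a.start]))
        out.append('<span data-showdocs="%s" class="%s">%s</span>' % (
            escape(a.group),
            escape(' '.join(a.classes)),
            escape(text[a.start:a.end])))
        pos = a.end
    out.append(escape(text[pos:]))
    return ''.join(out)


print(render(query, annotations))
```

Note that the trailing `;` carries no annotation, so it is emitted as plain text.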
The final piece is bringing the documentation of the target language into
showthedocs. All we care about is HTML that has a bunch of tags with
data-showdocs and some decorations applied to those.
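One way to picture this: every group an annotator assigns should have a tag with a matching data-showdocs attribute somewhere in the imported HTML. The sketch below checks that with the standard library's HTML parser. The `GroupCollector` class, the `documented_groups` helper and the documentation fragment are invented for illustration and are not part of showthedocs.

```python
from html.parser import HTMLParser  # Python 3 standard library


class GroupCollector(HTMLParser):
    '''Collects the values of all data-showdocs attributes in a document.'''
    def __init__(self):
        HTMLParser.__init__(self)
        self.groups = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == 'data-showdocs' and value:
                self.groups.add(value)


def documented_groups(html):
    '''Return the set of groups that the given HTML provides documentation for.'''
    collector = GroupCollector()
    collector.feed(html)
    collector.close()
    return collector.groups


# Made-up documentation fragment, not taken from any real repository.
docs = '''
<p><code data-showdocs="select" class="decorate-back">SELECT</code> retrieves rows.</p>
<p><code data-showdocs="from">FROM</code> names the source table.</p>
'''

# Groups used by a hypothetical annotator that have no matching tag in the docs.
missing = {'select', 'from', 'table'} - documented_groups(docs)
print(sorted(missing))
```

Here the `table` group would be reported as missing, meaning nothing in the documentation would be connected to an annotated table name.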
devdocs is a documentation aggregator for a vast array of languages. It scrapes the online copy of each language's documentation and applies a number of modifications to the downloaded HTML. The end result is ideal for our purposes.
is in charge of building documentation and running
that among others modify the output to include
data-showdocs attributes. There are currently
two base repositories.
DevDocsRepository is based on devdocs and simply
shells out to its tools. The other one is
ScrapedRepository, which has
utility functions for downloading web pages.
getdocs is used to build repositories, and puts the output files in
a directory under
external/. An annotator can then request to include a file
from this directory by calling
self.add, e.g. a file at
external/foo/bar.html will show up in the result of a query if its annotator