Corpus analysis is a complex task: it requires learning editors for different file formats and multiple tools, many of which are command-line based or presuppose programming knowledge.
$ATK makes it easy to create pipelines that connect ecosystems to process raw data (automated transcription, format conversion, ...), and to query large corpora of annotations from various sources in order to extract advanced statistics and generate beautiful, always up-to-date charts and timelines.
$ATK is also a flexible converter: it takes as input XML files describing the style and operations needed to generate an HTML document, and exports only the relevant portions of videos and their thumbnail snapshots, which minimizes the final document size and the load times if it is hosted online.
$ATK understands the following file formats:
Some formats are available thanks to the TEI-CORPO project.
$ATK can also process the following media types:
The following software must be installed:
Simply extract the latest release zip
Other XML folders can be added to the editor by passing their paths as arguments (edit the .bat file to see how, and check the CLI arguments).
When $ATK is already installed, follow these steps to update:
An editor for $ATK's XML documents is available in the browser.
To begin, start $ATK by running the launcher (avaa-toolkit.bat on Windows or avaa-toolkit.sh on Linux),
then navigate with your browser to avaa-toolkit.org.
If no internet connection is available, use the offline editor provided in your installation folder (open index.html).
$ATK will process XML files and convert them to HTML.
It expects a document with the following structure
$ATK is all about querying and filtering annotations. Inside the VIEW or CHART tag, complex queries can be built to extract only specific annotations. This is done via the SELECT tag, whose attributes can be combined to make a curated selection of annotations:
Attributes of type regexp (*-match) have additional options:
When multiple attributes are used, the selection will consist only of the annotations fulfilling all the constraints.
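The way combined attributes narrow a selection can be illustrated with a minimal Python sketch. This is not $ATK's implementation: the `Annotation` fields, the `select` helper and the constraint names `tier_match` and `value_match` are hypothetical, chosen only to show that an annotation is kept when every constraint holds (logical AND).

```python
import re
from dataclasses import dataclass


@dataclass
class Annotation:
    # Hypothetical minimal annotation model, for illustration only
    tier: str
    value: str
    start_ms: int


def select(annotations, **constraints):
    """Keep only the annotations that satisfy every constraint (logical AND)."""
    predicates = list(constraints.values())
    return [a for a in annotations if all(p(a) for p in predicates)]


annotations = [
    Annotation("speaker1", "hello", 0),
    Annotation("speaker1", "goodbye", 5000),
    Annotation("speaker2", "hello there", 1200),
]

# Two regexp-style constraints (names are assumptions, not real attributes)
selected = select(
    annotations,
    tier_match=lambda a: re.search(r"^speaker1$", a.tier) is not None,
    value_match=lambda a: re.search(r"hello", a.value) is not None,
)

print([a.value for a in selected])  # only annotations matching BOTH constraints
```

Adding a constraint can only shrink the result, never enlarge it, which is why each extra attribute makes the selection more specific.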
When using processors, a pipeline is created for each section of the document.
A pipeline initially contains a virtual copy of the corpus and its associated media files.
The media files are then modified sequentially by each processor.
The pipeline can be fed different initial media files by defining the processor-pipeline-input setting.
The corpus mode is useful to process corpus files directly (audio anonymization, format conversion, ...), while the all-assets mode could be used, for instance, to apply effects only to the exported media of a document intended for sharing with peers.
Processors inside a pipeline (that is, for now, a section of the document) are executed one after another, each processor working on the results of the previous one.
Complex chains of processors can be built to automate heavy tasks, alleviating the burden of manually running each step and verifying its consistency.
Views placed after a processor (in the same section) will inherit its modified media files when exporting clips and snapshots.
This can be helpful for extracting annotations from cuts of raw media files, which avoids processing a long corpus media file when testing samples, or for preprocessing a media file before it is exported into clips during later view generation.
Processors generating annotations will make these annotations immediately available in the main corpus (and not only for the current pipeline), hence for all subsequent views and processors in the document.
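The pipeline behaviour described above can be pictured with a small Python sketch. It is a conceptual model under stated assumptions, not $ATK's code: the `cut` and `transcribe` processors, the `run_pipeline` helper and the file names are all hypothetical. It shows three points from the text: the pipeline works on a copy of the media list, each processor receives the output of the previous one, and generated annotations land in the shared corpus.

```python
from typing import Callable, List

# Hypothetical processor signature: receives the current media file names and
# the shared corpus annotations, returns the media file names for the next step.
Processor = Callable[[List[str], List[str]], List[str]]


def cut(media: List[str], corpus: List[str]) -> List[str]:
    # Pretend to cut each media file and return the names of the cuts
    return [name.replace(".mp4", ".cut.mp4") for name in media]


def transcribe(media: List[str], corpus: List[str]) -> List[str]:
    # Pretend to generate annotations; they go to the shared corpus,
    # so later views and processors can see them
    corpus.extend(f"transcript of {name}" for name in media)
    return media


def run_pipeline(initial_media: List[str],
                 processors: List[Processor],
                 corpus: List[str]) -> List[str]:
    media = list(initial_media)  # virtual copy: the original list is untouched
    for processor in processors:
        media = processor(media, corpus)  # each step consumes the previous result
    return media


corpus_annotations: List[str] = []
original_media = ["interview.mp4"]
final_media = run_pipeline(original_media, [cut, transcribe], corpus_annotations)

print(final_media)
print(corpus_annotations)
```

Because `transcribe` runs after `cut`, it sees the cut file rather than the original, which mirrors how a view placed after a processor inherits its modified media.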
The style can be changed via CSS. The generated HTML code makes it easy to target specific elements or to apply styling rules to the whole page. Each view has its own structure of elements, and a simple "Inspect Element" in the browser will reveal the selectors.
Styles can be defined directly in the XML file, by using a STYLE tag.
These styles will only apply to this specific HTML document.
<STYLE>
.view-timeline td {
border-color:red;
}
.view-timeline tr.tier-header {
text-align:right;
}
</STYLE>
Styles can also be defined in a separate CSS file, which must be placed in the include folder.
All the generated HTML documents will load this file and have these styles in common.
h2 {
color:green;
}
section {
border-left: 2px solid gray;
}
Views generate simple HTML code and try to follow common guidelines so that applying styles is straightforward.
Annotation text labels always have the annotation class, so for instance, to change the color of all annotations:
.view .annotation {
color:red;
}
$ATK can also generate PDF files, although interactive features such as videos or dynamic charts do not work in this format.
Chrome (or Chromium) must be installed on the system, and the CLI argument --pdf must be specified.
The Chrome executable should be detected automatically; if detection fails, its path must be provided with the --chrome-exe argument.
If everything works correctly, a file.pdf should be generated alongside the file.html document.
$ATK is made for the command line and can integrate seamlessly in any tool chain.
##cli##
Some processors require a full ffmpeg version to work.
Error: A JNI error has occurred, please check your installation and try again Exception in thread "main" java.lang.UnsupportedClassVersionError: org/avaatoolkit/Main has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to X
Solution: Your version of the Java runtime is outdated; follow these steps
java.net.BindException: Couldn't bind to any port in the range `42042:42042`. at org.glassfish.grizzly.AbstractBindingHandler.bind(AbstractBindingHandler.java) at org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java) at org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:) at org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java) at org.avaatoolkit.server.Daemon.start(Daemon.java) at org.avaatoolkit.Main.main(Main.java)
Solution: The toolkit is already started with the --server argument; close it before running a new instance.
Solution: Your firewall has a strict policy regarding localhost port bindings; add a rule to allow localhost:42042
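To tell which of the two situations applies, it can help to test whether the port can be bound at all. The Python sketch below is a generic diagnostic and not part of $ATK: the `port_is_free` helper is hypothetical, and the only value taken from this document is the port number 42042 shown in the error message.

```python
import socket

ATK_PORT = 42042  # port number taken from the error message above


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permission/policy error
            return False
    return True


if port_is_free(ATK_PORT):
    print(f"Port {ATK_PORT} can be bound right now.")
else:
    print(f"Port {ATK_PORT} cannot be bound: another instance may be running, "
          "or a local policy may be blocking it.")
```

If the port cannot be bound and no other $ATK instance is running, the firewall explanation becomes the more likely one.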
On some operating systems, the installed Java runtime may be out of date and prevent $ATK from executing properly.
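Whether the installed runtime is recent enough can be read from the output of `java -version`. The Python sketch below parses such an output and relates it to the class file version 55.0 quoted in the error message, which corresponds to Java 11 (class file major version = Java release + 44). The helper names and sample strings are illustrative assumptions; the sketch does not call Java itself.

```python
import re

REQUIRED_JAVA = 11  # class file version 55 corresponds to Java 11


def java_major_version(version_output):
    """Extract the major Java release from `java -version` style output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found")
    first = int(match.group(1))
    # Legacy scheme: "1.8.0_281" means Java 8
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def class_file_version(java_major):
    """Class file major version produced by a given Java release."""
    return java_major + 44


# Sample outputs in the two common formats (illustrative strings)
for sample in ('java version "1.8.0_281"', 'openjdk version "11.0.2" 2019-01-15'):
    major = java_major_version(sample)
    verdict = "ok" if major >= REQUIRED_JAVA else "too old"
    print(f"Java {major} (class files up to {class_file_version(major)}): {verdict}")
```

A runtime reporting a release below 11 would produce the UnsupportedClassVersionError shown earlier.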
To run $ATK, Java 11 or later is required. To install a valid runtime only for $ATK: