@xmlaspect/reactive-xml/History

From ApiFusion

Reasons for creating this toolkit

For ApiFusion, importing documentation from sources into MediaWiki pages became a sequence of processing steps over data from different sources, mostly data and HTML web services. When implemented with classic asynchronous patterns such as callbacks, Promise chains, or the JS `async` keyword, this mix turned into spaghetti code that is hard to write and maintain.

Another use case that inspired the creation of reactive-xml is web crawling for data mining; in my case, finding out who will share the floor at various dance competitions.

Solution

Create a module for programming an execution tree that supports:

  • remote data retrieval via AJAX
  • retrieving data as String/XML/JSON
  • loading data into a DOM
  • selecting a collection from the DOM
  • loading a DOM element's HTML
  • posting a DOM element string via AJAX
  • an async API for processing stages with callbacks (similar to Promise) that is aware of data from more than one previous stage (unlike Promise, which gives access only to the last stage's result).
  • if a stage returns a collection, running the following call chain for each element.
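The last two points can be illustrated with a minimal sketch in plain JavaScript. It is not the reactive-xml API: `runChain` and the stage functions are hypothetical, and the argument order (previous result first, then earlier results, newest first) is an assumption. Any Array returned by a stage is treated as a collection, and the rest of the chain runs once per element.

```javascript
// Hypothetical sketch, not the reactive-xml API.
// Each stage is called as stage(previous, ...earlier), newest result first.
// If a stage returns an Array, the remaining stages run once per element.
async function runChain(stages, input, history = []) {
  if (stages.length === 0) return [input];        // leaf: collect the final result
  const [stage, ...rest] = stages;
  const result = await stage(input, ...history);
  const nextHistory = [input, ...history];        // earlier results for later stages
  const branches = Array.isArray(result) ? result : [result];
  const leaves = await Promise.all(
    branches.map((item) => runChain(rest, item, nextHistory))
  );
  return leaves.flat();                           // one entry per leaf branch
}

// Example: an "index" stage fans out to two package URLs; the next stage
// receives each package URL together with the index it came from.
runChain(
  [(index) => [`${index}/a`, `${index}/b`], (pkg, index) => `${pkg} (from ${index})`],
  "idx"
).then(console.log); // resolves to ["idx/a (from idx)", "idx/b (from idx)"]
```

Note that `Promise.all` starts every branch at once; with thousands of URLs this unbounded fan-out is exactly the resource problem that limited parallelism has to address.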


Full sequence

Initiated from ApiFusion:

  • "Sync docs from source" button on the project page (when the source repo and build environment are available)
  • Initiate the "Sync ApiFusion" task in the build environment
  • watch for progress, update the current content if it changed.

The build environment, CI, or a manual run executes the "Sync ApiFusion" task:

  • run the CLI to generate docs (JavaDoc, JSDoc, Sphinx, etc.)
  • traverse the generated index pages, sync each to its AF page path
  • traverse the packages, sync each package doc to its AF page path
  • traverse the classes, sync each class doc to its AF page path
  • traverse the methods/members, sync each method doc to its AF page path

Here, "traverse over generated content" means:

  1. read the HTML from a URL
  2. parse the HTML (load it into a DOM)
  3. query the HTML sections matching the context
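The three traversal steps map onto standard browser APIs roughly as follows. This is a sketch under assumptions: `traverse` is a hypothetical helper, and the defaults assume a browser with global `fetch` and `DOMParser`. Both are injectable so the flow can also be exercised outside a browser; any URL or selector passed in is a placeholder.

```javascript
// Hypothetical helper illustrating "traverse over generated content".
// Defaults assume a browser (global fetch + DOMParser); both can be injected.
async function traverse(url, selector, {
  fetchText = async (u) => (await fetch(u)).text(),
  parseHtml = (html) => new DOMParser().parseFromString(html, "text/html"),
} = {}) {
  const html = await fetchText(url);                   // 1. read HTML from the URL
  const doc = parseHtml(html);                         // 2. parse HTML into a DOM
  return Array.from(doc.querySelectorAll(selector));   // 3. query matching sections
}

// In a browser (placeholder URL and selector):
// traverse("docs/index.html", "a[href]").then((links) => console.log(links.length));
```

`querySelectorAll` returns a static NodeList; converting it with `Array.from` gives a plain Array that a following stage can iterate or fan out over.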

"Sync to AF page path" means:

  1. read the AF page
  2. compare the relevant sub-content of the generated doc HTML with the AF content
  3. if they do not match, copy that sub-content of the generated doc HTML into the ApiFusion page whose path matches the module/package/method.
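A minimal sketch of the sync step follows. `syncToPage`, `readPage` and `writePage` are hypothetical names standing in for whatever calls actually read and save an ApiFusion page, and ignoring whitespace-only differences is an assumption, not something the original specifies.

```javascript
// Hypothetical sketch of "sync to AF page path"; readPage/writePage stand in
// for the real calls that read and save an ApiFusion page.
// Comparison ignores whitespace-only differences (an assumption).
async function syncToPage(pagePath, generatedHtml, { readPage, writePage }) {
  const normalize = (s) => (s ?? "").replace(/\s+/g, " ").trim();
  const current = await readPage(pagePath);                 // 1. read AF page
  if (normalize(current) === normalize(generatedHtml)) {    // 2. compare
    return false;                                           // already up to date
  }
  await writePage(pagePath, generatedHtml);                 // 3. copy on mismatch
  return true;                                              // page was updated
}
```

Because the write happens only on a mismatch, running the same sync twice is harmless, which is one ingredient of a process that is safe to restart.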

Each operation is an AJAX call, and thousands of operations are expected. Hence processing should:

  • be executed in a parallel, asynchronous manner to avoid long waits
  • be interruptible
  • be safe to restart
  • use limited parallelism, permitting only a defined number of processing threads at a time, e.g. only one module/class at a time
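The "interruptible" and "safe to restart" points can be sketched together; parallelism is deliberately left out here to keep it short. `processAll` and the `{ key, run }` task shape are hypothetical. The stop request is a standard `AbortSignal`, which a task may also forward to `fetch(url, { signal })`, and the `done` set of finished keys lets a restarted run skip completed work.

```javascript
// Hypothetical sketch: process tasks one at a time, stop when the signal is
// aborted, and record finished keys so a restart skips completed work.
async function processAll(tasks, { signal, done = new Set() } = {}) {
  for (const { key, run } of tasks) {
    if (signal?.aborted) break;   // interrupted: the rest is left for a restart
    if (done.has(key)) continue;  // finished in an earlier run
    await run(signal);            // a task may pass the signal on to fetch()
    done.add(key);
  }
  return done;                    // keep this set to resume later
}
```

Restarting means calling `processAll` again with a fresh signal and the `done` set returned by the interrupted run.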

Pseudocode

import { CallChain as $ } from "@xmlaspect/reactive-xml";

// docs generated by the CLI; the browser loads the following script
$( getIndexUrl )                  // routine that returns the URL of the indexes web service
    .post()                       // POST to that URL
    .$then( processIndexes )      // returns a collection of index file URLs
    .get()                        // AJAX GET of each URL from the previous stage
    .$then( processIndexFile )    // returns a collection of package doc URLs
    .$then( processPackageFile )  // returns a collection of classes/modules/files as a list of URLs
    .$then( processClassFile )    // returns a collection of methods/members
    .$then( processMember );      // collection members are processed in parallel

`$then( callback )` invokes the callback each time the call chain is executed. It differs from `Promise.then()` in two ways: it can be executed multiple times, and its arguments reflect not only the previous call's result but also other results in the call chain.
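A settled Promise runs each `then` callback at most once and passes it a single value. The two differences can be shown with a hypothetical mini chain; `MiniChain` is an illustration, not the reactive-xml implementation. Its newest-first argument order is an assumption chosen to mirror the `processXXX( ajaxData, url )` signature.

```javascript
// Hypothetical mini chain (not the reactive-xml implementation): callbacks are
// stored and re-run on every execution, and each callback receives all
// earlier results, newest first.
class MiniChain {
  constructor() { this.callbacks = []; }
  $then(callback) { this.callbacks.push(callback); return this; }
  async run(input) {
    const results = [input];                      // newest result first
    for (const cb of this.callbacks) {
      results.unshift(await cb(...results));      // cb(previous, ...earlier)
    }
    return results[0];                            // result of the last callback
  }
}

const chain = new MiniChain()
  .$then((url) => `<html>${url}</html>`)                  // stands in for fetched data
  .$then((html, url) => `${url}: ${html.length} chars`);  // sees both results

// The same chain can be executed again for every new input:
// await chain.run("a.html"); await chain.run("b.html");
```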

Each `processXXX` callback follows this outline:

async function processXXX( ajaxData, url )  // async: the body awaits AF calls
{
    // load ajaxData into DOM
    // get content on XXX level
    // await get page from AF
    // compare AF and ajax content
    // if not equal, await post of content to AF page
    // query XXX+1 level URLs from DOM
    // return URL collection
}

Stop/Pause/Restart

The execution tree can grow large, consume a lot of resources (CPU, memory, network connections), or be slow to execute.

In any of these cases it should be possible to break the execution chain and release resources such as closures, open network connections, timers, etc.

After breaking the execution chain, it should be possible to restart it.

Whether restarting from a particular place in the call chain is needed is TBD; it may be useful when the preparation up to a certain point is time-expensive.
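Breaking the chain and releasing its resources could be organised around a per-run scope; the sketch below assumes this design and is not taken from reactive-xml. `RunScope` and `addTimer` are hypothetical names. The scope owns one `AbortController` whose signal is meant to be passed to every `fetch(url, { signal })`, and it tracks pending timers so that `stop()` can clear them and drop the closures they hold.

```javascript
// Hypothetical sketch: resources created during a chain run are registered
// with a scope, so stop() can release them all at once.
class RunScope {
  constructor() {
    this.controller = new AbortController(); // shared signal for fetch() calls
    this.timers = new Set();
  }
  get signal() { return this.controller.signal; }
  addTimer(fn, ms) {
    const id = setTimeout(() => { this.timers.delete(id); fn(); }, ms);
    this.timers.add(id);
    return id;
  }
  stop() {
    this.controller.abort();                 // cancels fetch(url, { signal }) in flight
    for (const id of this.timers) clearTimeout(id);
    this.timers.clear();                     // pending callbacks (closures) are released
  }
}
```

In this design a restart creates a new `RunScope` and executes the chain again; an aborted signal cannot be reset, which keeps a stopped run from accidentally continuing.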

Optimisation

Since the code above ends up creating an AJAX call for each method, in a relatively large project the number of simultaneously open XHR connections and associated JS closures quickly exceeds what the browser can handle.

Resource consumption can be limited by explicitly ordering the chain to process only one collection element at a time:

// docs generated by the CLI; the browser loads the following script
$( getIndexUrl )                  // routine that returns the URL of the indexes web service
    .post()                       // POST to that URL
    .$then( processIndexes )      // returns a collection of index file URLs
    .batch()                      // one at a time
    .get()                        // AJAX GET
    .$then( processIndexFile )    // returns a collection of package doc URLs
    .batch()                      // one at a time
    .$then( processPackageFile )  // returns a collection of classes/modules/files
    .batch()                      // one at a time
    .$then( processClassFile )    // returns a collection of methods/members
    .$then( processMember );      // collection members are processed simultaneously

batch( N )

Before each call in the chain, if the previous result is a collection (Array, NodeList, ...), the rest of the chain is run for each collection element.

Collection elements are processed in batches so that resources are not held for all pending asynchronous chain calls at once.

The batch behavior and size can be changed by an explicit batch(N) call, which treats the result of the previous call as a collection and continues the execution chain with no more than N branches in progress.

batch(): if N is omitted, elements are processed one by one, same as batch(1).

batch(0): the collection is processed as a single object, i.e. the remaining call chain is executed on the collection object (rather than on each element).

batch(-N): with a negative argument, the batch completely finishes processing N elements before picking up the next N.
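The three modes can be sketched as a single hypothetical helper. `batchRun` and `worker` are illustrative names, not part of reactive-xml, and `worker` stands for "the rest of the chain" applied to one element (or to the whole collection for batch(0)).

```javascript
// Hypothetical sketch of the batch(N) modes; `worker` stands for the rest of
// the chain. n > 0: at most n in progress (sliding window); n < 0: finish
// |n| elements before starting the next |n|; n === 0: whole collection once.
async function batchRun(items, worker, n = 1) {
  const list = Array.from(items);                 // Array, NodeList, ...
  if (n === 0) return worker(list);               // collection as a single object
  const results = new Array(list.length);
  if (n < 0) {
    for (let i = 0; i < list.length; i += -n) {   // fixed chunks of |n|
      const chunk = list.slice(i, i - n);
      const done = await Promise.all(chunk.map((x) => worker(x)));
      done.forEach((r, j) => { results[i + j] = r; });
    }
    return results;
  }
  let next = 0;                                   // index of the next element to start
  const lane = async () => {                      // each lane runs one element at a time
    while (next < list.length) {
      const i = next++;
      results[i] = await worker(list[i]);
    }
  };
  await Promise.all(Array.from({ length: Math.min(n, list.length) }, lane));
  return results;
}
```

The difference between N and -N shows up with uneven task durations: with a positive N a new element starts as soon as any running one finishes, while with -N a single slow element holds back the whole next group.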