IVC Software Framework
|Space Filling or Treemap Algorithm|
Tracing its ancestry to Venn diagrams (Venn,1971), the Treemap algorithm was developed in the HCI Lab at the University of Maryland. It uses a space filling technique to map a tree structure (e.g. file directory) into nested rectangles with each rectangle representing a node. A rectangular area is first allocated to hold the representation of the tree, and this area is then subdivided into a set of rectangles that represent the top level of the tree. This process continues recursively on the resulting rectangles to represent each lower level of the tree, each level alternating between vertical and horizontal subdivision.The parent-child relationship is indicated by enclosing the child-rectangle by its parent-rectangle. That is, all descendents of a node are displayed as rectangles inside its rectangle. Associated with each node is a numeric value (e.g. size of a directory) and the size of a nodes rectangle is proportional to its value. Ben Shneiderman's 'Treemaps frspace-constrained visualization of hierarchies' webpage provides the full story. Many efforts have been made to make the space filling technique more effective in visualizing information hierarchy:
The Burst Detection algorithm has been developed and provided by Jon Kleinberg (Cornell University). The algorithm aims to analyze documents to find features that have high intensity over finite/limited durations of time periods. Rather than using plain frequencies of the occurrences of words, the algorithm employs a probabilistic automaton whose states correspond to the frequencies of individual words. State transitions correspond to points in time around which the frequency of the word changes significantly. The algorithm is intended to extract meaningful structure from document streams that arrive continuously over time (e.g., emails or news articles). It generates a ranked list of the most significant word bursts in the document stream, together with the intervals of time in which they occurred. This can serve as a means of identifying topics or concepts that rose to prominence over the course of the stream, were discussed actively for a period of time, and then faded away. An advantage of the implementation is that no pre-processing is performed on the documents (we simply eliminate all non-letter characters and down-case all words). Burst analysis is performed for all words (including stop-words such as `the').
|Pathfinder Network Scaling|
Pathfinder Network Scaling is a structural modeling technique originally developed for the analysis of proximity data in psychology by Schvaneveldt, Durso & Dearholt (1998). Pathfinder algorithms take estimates of the proximity's between pairs of items as input and define a network representation of the items that preserves only the most important links. The resulting Pathfinder network (PFNET) consists of the items as nodes and a set of links (which may be either directed or undirected for symmetrical or non symmetrical proximity estimates) connecting pairs of the nodes.Graphical representations of Pathfinder networks are generated using force directed graph drawing algorithms.
|Latent Semantic Analysis|
Latent Semantic Analysis (LSA) can be applied to induce and represent aspects of the meaning of words (Berry et al., 1995; Deerwester et al., 1990; Landauer & Dumais, 1997; Landauer et al.,1998). LSA is a variant of the vector space model that converts a representative sample of documents to a term-by-document matrix in which each cell indicates the frequency with which each term (rows) occurs in each document (columns). Thus a document becomes a column vector and can be compared with a user's query represented as a vector of the same dimension.In the case of digital libraries, terms may stand for words in abstracts or titles, author names, or cited references. Only author names, titles, abstracts, and cited references occurring in more than one document are included in the analysis.LSA extends the vector space model by modeling term-document relationships using a reduced approximation for the column and row space computed by the singular value decomposition of the term by doment matrix. This is explained in detail subsequently.
|Vector Space Model|
|Worldmapper & User Trail & Chat Log Visualizations|
Open Source Toolkits
Piccolo is a toolkit that supports the development of 2D structured graphics programs, in general, and Zoomable User Interfaces (ZUIs), in particular. A ZUI is a new kind of interface that presents a huge canvas of information on a traditional computer display by letting the user smoothly zoom in, to get more detailed information, and zoom out for an overview. We use a
|Similarity Flooding Algorithm|
Similarity flooding is a matching algorithm based on fix-point computation that is usable across different scenarios [Melnek, S., et. al. 2002]. The algorithm takes two graphs in Pajek format and produces as output a mapping between corresponding nodes of the graphs. Depending on the matching goal, a subset of the mapping is chosen using a set of filters. The percentage of accurate match done by the algorithm is a measure of its accuracy. The data provided as input to the algorithm is first converted into directed labeled graphs. As a first step, an iterative fixpoint computation is performed whose results tell us what nodes in one graph are similar to nodes in the second graph. For computing the similarities, the underlying principle used is that elements of two distinct models are similar when their adjacent elements are similar. A part of the similarity of two elements propagates to their respective neighbors. The spreading of similarities in the matched models is considered similar to the way IP packets flood the network in broadcast communication. The resulting output is then converted into a Pajek input file, which can then be used to easily visualise the similarities and differences between the datasets using appropriate color coding. The amount of human intervention required in correcting the results is used as a measure of accuracy.
|Content-Addressable Network Model|
|Network Analysis Toolkit|
ABSURDIST (Aligning Between Systems Using Relations Derived Inside Systems for Translation) is a computational algorithm that uses within-system relations to find between-system translations (Feng, Y, et. al, 2004, Goldstone, R., 2002). In addition to performing simple string match of node labels, this algorithm also uses network structure information to identify the overlap between two given networks.Within a network, a node is treated as a concept. ABSURDIST derives semantic similarity information for different concepts embedded within each network. The algorithm uses these internal semantic structure by itself, or in combination with external information about similarities between concepts to establish mapping of concepts that exists within two given networks. It supports networks with multiple types of weighted or unweighted, directed or unidrected relations between concepts. The algorithm exploits graph sparsity to improve computational efficiency. Parameters for running the algorithm are optimized to achieve better mapping results, as well as improve noise tolerance, iteration speed, and scalability.