MiMFa RAVAR DataLab

New tools for collecting and analyzing big data have emerged in recent years. As a result, developers in this field face the challenge of learning and working with new programming languages, frameworks, modules, etc. This led us to develop a new tool that addresses most of the problems of large-volume data in one integrated environment. The tool pursues three objectives, which match data-warehousing algorithms:
1. Automatic extraction of data from offline and online resources in different formats, at semi-big to big data volumes,
2. Processing and integration of collected and existing data for future use (data normalization),
3. Presentation of data for analysis and reporting purposes by deriving data marts and other structures.
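
The three objectives above can be read as an extract, normalize, and present pipeline. The following is only a minimal sketch of that idea, not the tool's actual implementation: the function names (extract, normalize, build_mart) and the sample CSV data are hypothetical and chosen purely for illustration.

```python
# Hypothetical sketch: these names are illustrative and are not part of RAVAR DataLab.
import csv
import io
from collections import Counter


def extract(raw_csv: str) -> list[dict]:
    """Objective 1: read records from a source (here an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def normalize(records: list[dict]) -> list[dict]:
    """Objective 2: trim whitespace, lowercase field names, drop exact duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        row = {key.strip().lower(): value.strip() for key, value in record.items()}
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(row)
    return cleaned


def build_mart(records: list[dict], dimension: str) -> dict[str, int]:
    """Objective 3: derive a small summary table (record count per dimension value)."""
    return dict(Counter(row[dimension] for row in records))


# Made-up sample input with inconsistent spacing and one duplicate row.
RAW = "Title, Format\nReport A ,PDF\nReport A ,PDF\nSlides B,PPTX\n"
mart = build_mart(normalize(extract(RAW)), "format")
print(mart)
```

In a real setting the extract step would read from files or web pages rather than a string, and the summary would likely be stored rather than printed.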

Our software includes the following options and utilities:
• Automatic collection of text, image, sound, video, and other multimedia (the focus is mostly on text)
– unstructured, semi-structured and structured texts
– from different formats or resources (MS-Excel, MS-Word, MS-PowerPoint, PDF, XML, HTML, etc.)
– from Local Disk, Web Pages, [Cloud, LAN, etc.]
– at semi-big to big data volumes
– through parallel processing algorithms
• Data presentation and visualization of several related files in table view, form view, text view, chart view, etc.
• Accessing, editing, processing, and normalizing several data files in an integrated environment
• Searching, filtering, and applying queries on different types of data
• Exporting data to one or more given standards for data exchange between different systems (such as the Dublin Core standard for the exchange of media metadata)
• Importing data from different data exchange standard formats
• Organizing data according to different metadata standards for storage and exchange purposes
• Developing algorithms on the data for general indexing, subject indexing, normalization, clustering, and creating different data structures (such as trees, graphs, etc.)
• Developing a standard digital archive and library environment for preserving, using, and analyzing data from different media
• Writing simple scripts to define different scenarios and run them
• Developing and adding new algorithms to the software for the intended pre-processing and processing activities
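
To illustrate the export feature listed above, here is a minimal sketch of writing one record as simple Dublin Core XML with Python's standard library. It is not RAVAR DataLab's own exporter: the to_dublin_core helper, the "metadata" wrapper element, and the sample record are assumptions made for the example. Only the Dublin Core element namespace and the element names (title, creator, format) come from the standard.

```python
# Hypothetical sketch: the helper, wrapper element, and record are illustrative only.
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)


def to_dublin_core(record: dict[str, str]) -> str:
    """Serialize a flat record whose keys are Dublin Core element names."""
    root = ET.Element("metadata")  # assumed wrapper element, not mandated by Dublin Core
    for name, value in record.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


xml_text = to_dublin_core(
    {"title": "Report A", "creator": "Unknown", "format": "application/pdf"}
)
print(xml_text)
```

Another system can read the values back by looking up elements in the same namespace, which is what makes such a standard useful for exchange.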