Tatyana Yudina
Scientific Information Infrastructure for Social Research in Russia

yudina@mail.cir.ru

The last years of the 20th century and the first years of the 21st have been an important period for the social sciences and humanities in Russia: an Internet-based scientific information infrastructure for social research has begun to take shape, and several of its components are complete and freely accessible to the Russian and international academic community. In January 2000 the University Information System RUSSIA (UIS RUSSIA, www.cir.ru; www.cir.ru/eng/) was opened. Early in 2001 SocioNet (http://socionet.ru) was opened. In December 2001 the National Archive of social data (www.vciom.ru) was announced as a joint effort of four main public opinion polling institutions. Taken together, these three resources form a modern scientific information base that covers empirical social data and content for applied research in the social sciences and offers a network for cooperation and communication among specialists. Efforts are under way to integrate all three sources and provide cross search. The UIS RUSSIA technology serves as the basis for integration.

The UIS RUSSIA is maintained as a thematic resource in the social sciences. The current version covers the core range of data and documents in the social domain: more than 400,000 documents and 22,000 tables. The March 2002 version includes:
- official data and documents (laws, presidential decrees and directives, governmental enactments, acts and regulations) since 1991;
- verbatim records (daily stenograms) of the State Duma of the Federal Assembly of the RF since 1994;
- Goscomstat of RF data (full collection);
- CIS economic and social data, provided by the CIS Interstate Statistical Committee;
- monthly monitoring reports, provided by the ministries of the RF;
- election statistics at both federal and local levels since 1993, provided by the Central Election Commission of the RF;
- mass media sources (8 newspapers, 2 information agencies);
- "Expert" weekly journal;
- databases, publications and reports of leading analytical centers;
- extended reference information on the components of the Russian Federation.
In 2001 a scientific publications module was developed and several journals of high academic value were integrated: "Sociological Journal", "Problems of Forecasting", "Effective Anti-crises Management" and the social domain series of the Moscow State University Journal. Work is under way on "International Life", "Federalizm", "Political Studies" and on a new module: publications of leading think tanks in economics, social and political studies.

The "Budget System of RF" database (www.budgetrf.ru) was completed in 2001 as a subject-oriented resource. The database covers government data and documents on the federal budget and regional budgets since 1992, complementary materials, State Duma and Council of Federation documents, analytical reports, mass media articles and other materials on the topic.

New full-text collections will be included later in 2002:
- decisions of the Constitutional Court of the RF, the Supreme Court of the RF and the Arbitration Court of the RF;
- local mass media sources;
- OECD Health Data.

To organize such a range of dynamically updated collections and integrate them into a single system, automatic linguistic analysis techniques have been implemented. The main element is the Thesaurus, which in its current version covers 60,000 descriptors with synonyms. Thesaurus-based terminological analysis supports conceptual indexing, classification and annotation of electronic text corpora. The results of the analysis support an advanced search engine.
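
Conceptual indexing of this kind can be illustrated with a minimal sketch. It is not the UIS RUSSIA implementation: the three thesaurus entries, the function names and the longest-match strategy are all assumptions made for illustration. Each synonym is mapped to its descriptor, and a text is indexed by counting the descriptors found in it.

```python
import re
from collections import Counter

# Hypothetical miniature thesaurus: descriptor -> synonyms (illustrative entries only).
THESAURUS = {
    "federal budget": ["federal budget", "state budget"],
    "election": ["election", "elections", "vote"],
    "public opinion": ["public opinion", "opinion poll", "opinion polls"],
}


def build_lookup(thesaurus):
    """Map every synonym (lower-cased) to its descriptor."""
    lookup = {}
    for descriptor, synonyms in thesaurus.items():
        for term in synonyms:
            lookup[term.lower()] = descriptor
    return lookup


def index_document(text, lookup, max_len=3):
    """Count descriptors in a text, preferring the longest matching phrase."""
    words = re.findall(r"\w+", text.lower())
    counts = Counter()
    i = 0
    while i < len(words):
        for n in range(max_len, 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in lookup:
                counts[lookup[phrase]] += 1
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no thesaurus term starts at this word
    return counts


lookup = build_lookup(THESAURUS)
sample = "The State Budget was debated before the elections; opinion polls followed the vote."
index = index_document(sample, lookup)
```

In this sketch "State Budget" and "federal budget" are indexed under the same descriptor, so a search on either term would retrieve the same document.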

The Thesaurus has been translated into English, and a version of the UIS RUSSIA with search tools in English is available (www.cir.ru/eng).

Academic (research-assisting) services
Academic services are part of the UIS RUSSIA; they comprise organizational efforts, technical preprocessing and content analysis of the collections. The main elements are:
- purposeful building of an electronic scientific information base that meets the needs of social studies,
- maintenance of subject-oriented resources on the scientific and social topics most in demand,
- conversion of documents from different sources into the HTML format,
- bibliographic processing of each source and extraction of source metadata,
- bibliographic processing of each document or table and assignment of source metadata,
- content-based classification of documents and tables,
- annotation of full-text documents,
- integration of all collections in an Oracle-based information system,
- translation into Russian of the search instruments and help of foreign collections.

An additional set of services is provided for statistics and publications:
- special SSH-based classification,
- conversion of all tables into the MS Excel 97 format,
- the Goscomstat of RF Methodological Notes and Glossary complementing the tables,
- classification of scientific publications based on JEL (Journal of Economic Literature) and the Russian Scientific Literature Classifier,
- a module of tables and graphics from analytical publications, with the tables converted into the MS Excel 97 format,
- map-based presentation of election statistics.

Automatic updating of topic queries has been implemented.

The UIS RUSSIA provides advanced search instruments:
- UIS RUSSIA SSH-based search,
- Congressional Research Service, LC, SSH-based search,
- UIS RUSSIA Thesaurus-based navigation and query refinement,
- cross search over all collections,
- relevance-based ranking of query results,
- hyperlinks from the statistics collections to the Methodological Notes and Glossary.

A query marked as a topic is updated automatically; this function is available as an automatic module.
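
How Thesaurus-based query refinement, cross search and relevance-based ranking fit together can be shown with a small sketch. Everything in it is an assumption for illustration: the synonym set, the two toy collections and the scoring by simple term counts are hypothetical and do not describe the actual UIS RUSSIA search engine.

```python
import re
from collections import Counter

# Hypothetical synonym sets standing in for Thesaurus entries.
SYNONYMS = {
    "unemployment": {"unemployment", "joblessness"},
}

# Hypothetical collections: collection name -> {document id: text}.
COLLECTIONS = {
    "statistics": {"t1": "Unemployment by region, 2001", "t2": "Retail prices"},
    "media": {"m1": "Joblessness falls as unemployment benefits change"},
}


def expand_query(query, synonyms):
    """Replace each query word by its synonym set (or keep the word itself)."""
    terms = set()
    for word in re.findall(r"\w+", query.lower()):
        terms |= synonyms.get(word, {word})
    return terms


def cross_search(query, collections, synonyms):
    """Search every collection and rank all hits together by term frequency."""
    terms = expand_query(query, synonyms)
    hits = []
    for name, docs in collections.items():
        for doc_id, text in docs.items():
            words = Counter(re.findall(r"\w+", text.lower()))
            score = sum(words[t] for t in terms)
            if score > 0:
                hits.append((score, name, doc_id))
    # Highest score first; collection name and document id break ties.
    hits.sort(key=lambda h: (-h[0], h[1], h[2]))
    return hits


results = cross_search("unemployment", COLLECTIONS, SYNONYMS)
```

The single ranked list mixes hits from the statistics and media collections, which is the point of cross search: the user does not need to know in which collection a relevant document or table is stored.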

The technologies developed for the UIS RUSSIA are being applied to the documents and data of SocioNet and the National Social Data Archive to integrate the resources and provide cross search. The beta version of the SocioNet-UIS RUSSIA complex is available at www.cir.ru. The next step is to process and integrate the wide range of full-text scientific publications available through RePEc - up to 20,000 articles, mostly in English. Automatic analysis detects the main topics of each text and the links between them (structural annotation); the English translations of the Thesaurus help translate the structural annotation into Russian. The quality of the annotations depends directly on the English part of the Thesaurus. In its current version it is just a list of translations borrowed from the main English-language thesauri - those of the Congressional Research Service, LC, UNESCO, EUROVOC, LegiSlate, Westlaw. The terminology has not been organized into a thesaurus, and, for lack of funding, no expert cross-cultural review has been carried out to evaluate and align the Russian and English concepts.
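
The dependence of annotation quality on the English part of the Thesaurus can be seen in a minimal sketch of structural annotation. The sketch rests on loud assumptions: the three English-Russian descriptor pairs are illustrative and not taken from the Thesaurus, main topics are approximated by the descriptors found in the most sentences, and a link is approximated by two main topics occurring in the same sentence. The actual analysis may work quite differently.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical English-to-Russian descriptor pairs (illustrative only).
EN_RU = {
    "inflation": "инфляция",
    "unemployment": "безработица",
    "monetary policy": "денежно-кредитная политика",
}


def descriptors_in(sentence, vocabulary):
    """Return the descriptors that occur in a sentence as whole words or phrases."""
    lowered = sentence.lower()
    return [d for d in vocabulary
            if re.search(r"\b" + re.escape(d) + r"\b", lowered)]


def annotate(text, en_ru, top_k=2):
    """Pick the top_k descriptors by sentence count, link co-occurring ones,
    and translate the result into Russian through the bilingual mapping."""
    topic_counts = Counter()  # number of sentences mentioning each descriptor
    link_counts = Counter()   # number of sentences mentioning both descriptors
    for sentence in re.split(r"[.!?]+", text):
        found = descriptors_in(sentence, en_ru)
        topic_counts.update(found)
        link_counts.update(combinations(sorted(found), 2))
    topics = [d for d, _ in topic_counts.most_common(top_k)]
    links = [pair for pair in link_counts if set(pair) <= set(topics)]
    return {
        "topics": [en_ru[d] for d in topics],
        "links": [(en_ru[a], en_ru[b]) for a, b in links],
    }


sample = ("Inflation rose sharply. Monetary policy targets inflation. "
          "Monetary policy tightened. Unemployment stayed flat.")
annotation = annotate(sample, EN_RU)
```

A concept that is missing from the English list, or paired with an inexact Russian equivalent, is lost or distorted in the Russian annotation, which is why an unreviewed list of borrowed translations limits the result.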

The UIS RUSSIA, SocioNet and VCIOM form the base of the scientific information infrastructure, with other sources preparing to join. The technology is distributed free of charge to universities, other higher education institutions and academic centers outside Moscow so that they can develop regional information systems based on local resources. The network architecture will ensure integration and cross search. The preparatory stage has already begun in cooperation with the St. Petersburg University Information Center, and the first mirror site is in operation (www.uisrussia.nw.ru).

Tatyana Yudina
Moscow State University
Research Computing Center
Russia
Tel (095) 939 30 15, e-mail: yudina@mail.cir.ru