Robert Wozniak
Emerging from the Quagmire: Building Expert Systems Technologies for the Social Sciences

wozniak@pop.umn.edu

With the acceptance of processable metadata and the explosive growth of online data storage capacity, today's stateless, largely context-free HTTP- or CGI-driven extraction interfaces are quickly proving inadequate for traversing the vast amounts of online social science information. This presentation will explore ways of taking advantage of the latest technology to discover and access ever-growing amounts of social science data.

Before the web, people seeking data had to go to experts who understood it. They described what they were after in whatever terminology they had, and left it to professionals to translate the request into a language the data extraction system understood. Putting an extraction process on the web, while relieving the burden on the professional, has simply shifted the burden of expertise onto the user. Without the guidance of a domain expert on the other end of a phone, users can rely only on the information displayed on their computer screen. They risk spending their time scouring a quagmire of documentation (sometimes with little context) and being overwhelmed by seemingly inexhaustible and often irrelevant lists and options.

Domain experts understand the ontology of their domain and can draw the necessary (even common-sense) inferences and deductions from a user's request to make a data extraction. It is this intellectual property that is missing from the vast majority of current online data extraction systems. Hit-or-miss keyword searches and large selection lists are the norm today. But as data grows in size, comprehensiveness and complexity, this approach becomes a hindrance. It is of paramount importance that organizations and domain experts take advantage of current technology and incorporate as much domain knowledge as possible into their search systems. Such advances will accommodate an ever-broadening user base confronted with an ever-growing amount of social science data.
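
As a rough illustration of what incorporating domain knowledge into a search system might look like, the following minimal sketch maps a user's everyday wording onto concepts, and concepts onto variables, rather than matching keywords directly. The vocabulary, the variable names and the function suggest_variables are all hypothetical, invented for this example; they do not describe any existing system.

```python
# Hypothetical mini-ontology. Every term, concept and variable name below is
# invented for illustration and does not come from a real data collection.
LAY_TERMS = {
    "income": "earnings",
    "wages": "earnings",
    "salary": "earnings",
    "job": "occupation",
    "work": "occupation",
}

CONCEPT_VARIABLES = {
    "earnings": ["INC_TOTAL", "INC_WAGE"],
    "occupation": ["OCC_CODE"],
}


def suggest_variables(request: str) -> list[str]:
    """Return the variables linked to any concept named in the request."""
    suggestions: list[str] = []
    for word in request.lower().split():
        # Strip trailing punctuation so "job?" still matches "job".
        concept = LAY_TERMS.get(word.strip(".,?!"))
        if concept is None:
            continue
        for variable in CONCEPT_VARIABLES[concept]:
            if variable not in suggestions:  # keep first-seen order, no repeats
                suggestions.append(variable)
    return suggestions


print(suggest_variables("How much salary do people earn at their job?"))
```

A real system would need a far richer representation than two lookup tables, but even this sketch shows the shift in burden: the user says "salary" and the system, not the user, knows which variables that implies.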

Tomorrow's web-based solutions offer the means of democratizing access to data as well as interactively assisting users in understanding social science data and methodologies. Leveraging the development of the DDI, rule-based grammars for middle-tier processing, and XSLT-driven interface and documentation generation, the web can be used as a pedagogic device to assist both novice and expert users in compiling meaningful social science data in a highly dynamic, personalized and intuitive way. This democratizes access in the best possible way: first, by accommodating both novice and expert usage, and second, by offering the means by which novices can expand their knowledge of social science and quantitative research to become, should they so choose, domain experts themselves.

Robert Wozniak, Minnesota Population Center