Antonis C. Kakas Fariba Sadri (Eds.)

Computational Logic: Logic Programming and Beyond Essays in Honour of Robert A. Kowalski Part II


Series Editors
Jaime G. Carbonell, Carnegie Mellon University, Pittsburgh, PA, USA
Jörg Siekmann, University of Saarland, Saarbrücken, Germany

Volume Editors
Antonis C. Kakas
University of Cyprus, Department of Computer Science
75 Kallipoleos St., 1678 Nicosia, Cyprus
E-mail: [email protected]

Fariba Sadri
Imperial College of Science, Technology and Medicine
Department of Computing, 180 Queen’s Gate
London SW7 2BZ, United Kingdom
E-mail: [email protected]

Cataloging-in-Publication Data applied for Die Deutsche Bibliothek - CIP-Einheitsaufnahme Computational logic: logic programming and beyond : essays in honour of Robert A. Kowalski / Antonis C. Kakas ; Fariba Sadri (ed.). - Berlin ; Heidelberg ; New York ; Barcelona ; Hong Kong ; London ; Milan ; Paris ; Tokyo : Springer Pt. 2 . (2002) (Lecture notes in computer science ; Vol. 2408 : Lecture notes in artificial intelligence) ISBN 3-540-43960-9

CR Subject Classification (1998): I.2.3, D.1.6, I.2, F.4, I.1 ISSN 0302-9743 ISBN 3-540-43960-9 Springer-Verlag Berlin Heidelberg New York This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law. Springer-Verlag Berlin Heidelberg New York a member of BertelsmannSpringer Science+Business Media GmbH http://www.springer.de © Springer-Verlag Berlin Heidelberg 2002 Printed in Germany Typesetting: Camera-ready by author, data conversion by Boller Mediendesign Printed on acid-free paper SPIN 10873683 06/3142 543210

Foreword

Alan Robinson

This set of essays pays tribute to Bob Kowalski on his 60th birthday, an anniversary which gives his friends and colleagues an excuse to celebrate his career as an original thinker, a charismatic communicator, and a forceful intellectual leader. The logic programming community hereby and herein conveys its respect and thanks to him for his pivotal role in creating and fostering the conceptual paradigm which is its raison d'être. The diversity of interests covered here reflects the variety of Bob's concerns. Read on. It is an intellectual feast.

Before you begin, permit me to send him a brief personal, but public, message: Bob, how right you were, and how wrong I was.

I should explain. When Bob arrived in Edinburgh in 1967 resolution was as yet fairly new, having taken several years to become at all widely known. Research groups to investigate various aspects of resolution sprang up at several institutions, the one organized by Bernard Meltzer at Edinburgh University being among the first. For the half-dozen years that Bob was a leading member of Bernard's group, I was a frequent visitor to it, and I saw a lot of him. We had many discussions about logic, computation, and language.

By 1970, the group had zeroed in on three ideas which were soon to help make logic programming possible: the specialized inference rule of linear resolution using a selection function, together with the plan of restricting it to Horn clauses ("LUSH resolution"); the adoption of an operational semantics for Horn clauses; and a marvellously fast implementation technique for linear resolution, based on structure-sharing of syntactic expressions. Bob believed that this work now made it possible to use the predicate calculus as a programming language. I was sceptical. My focus was still on the original motivation for resolution, to build better theorem provers. I worried that Bob had been sidetracked by an enticing illusion. In particular because of my intellectual investment in the classical semantics of predicate logic I was quite put off by the proposed operational semantics for Horn clauses. This seemed to me nothing but an adoption of MIT's notorious "Planner" ideology of computational inference. I did try, briefly, to persuade Bob to see things my way, but there was no stopping him. Thank goodness I could not change his mind, for I soon had to change mine.

In 1971, Bob and Alain Colmerauer first got together. They pooled their thinking. The rest is history. The idea of using predicate logic as a programming language then really boomed, propelled by the rush of creative energy generated by the ensuing Marseilles-Edinburgh synergy. The merger of Bob's and Alain's independent insights launched a new era. Bob's dream came true, confirmed by the spectacular practical success of Alain's Prolog. My own doubts were swept away. In the thirty years since then, logic programming has developed into a jewel of computer science, known all over the world.

Happy 60th birthday, Bob, from all of us.

Preface

Bob Kowalski together with Alain Colmerauer opened up the new field of Logic Programming back in the early 1970s. Since then the field has expanded in various directions and has contributed to the development of many other areas in Computer Science. Logic Programming has helped to place logic firmly as an integral part of the foundations of Computing and Artificial Intelligence. In particular, over the last two decades a new discipline has emerged under the name of Computational Logic which aims to promote logic as a unifying basis for problem solving. This broad role of logic was at the heart of Bob Kowalski’s work from the very beginning as expounded in his seminal book “Logic for Problem Solving.” He has been instrumental both in shaping this broader scientific field and in setting up the Computational Logic community.

This volume commemorates the 60th birthday of Bob Kowalski as one of the founders of and contributors to Computational Logic. It aspires to provide a landmark of the main developments in the field and to chart out its possible future directions. The authors were encouraged to provide a critical view of the main developments of the field together with an outlook on the important emerging problems and the possible contribution of Computational Logic to the future development of its related areas.

The articles in this volume span the whole field of Computational Logic seen from the point of view of Logic Programming. They range from papers addressing problems concerning the development of programming languages in logic and the application of Computational Logic to real-life problems, to philosophical studies of the field at the other end of the spectrum. Articles cover the contribution of CL to Databases and Artificial Intelligence with particular interest in Automated Reasoning, Reasoning about Actions and Change, Natural Language, and Learning.

It has been a great pleasure to help to put this volume together. We were delighted (but not surprised) to find that everyone we asked to contribute responded positively and with great enthusiasm, expressing their desire to honour Bob Kowalski. This enthusiasm remained throughout the long process of reviewing (in some cases a third reviewing process was necessary) that the invited papers had to go through in order for the decision to be made, whether they could be accepted for the volume. We thank all the authors very much for their patience and we hope that we have done justice to their efforts. We also thank all the reviewers, many of whom were authors themselves, who exhibited the same kind of zeal towards the making of this book. A special thanks goes out to Bob himself for his tolerance with our continuous stream of questions and for his own contribution to the book – his personal statement on the future of Logic Programming.

Bob has had a major impact on our lives, as he has had on many others. I, Fariba, first met Bob when I visited Imperial College for an interview as a PhD applicant. I had not even applied for logic programming, but, somehow, I ended up being interviewed by Bob. In that very first meeting his enormous enthusiasm and energy for his subject was fully evident, and soon afterwards I found myself registered to do a PhD in logic programming under his supervision. Since then, throughout all the years, Bob has been a constant source of inspiration, guidance, friendship, and humour.

For me, Antonis, Bob did not supervise my PhD as this was not in Computer Science. I met Bob well after my PhD and I became a student again. I was extremely fortunate to have Bob as a new teacher at this stage. I already had some background in research and thus I was better equipped to learn from his wonderful and quite unique way of thought and scientific endeavour. I was also very fortunate to find in Bob a new good friend.

Finally, on a more personal note the first editor wishes to thank Kim for her patient understanding and support with all the rest of life’s necessities thus allowing him the selfish pleasure of concentrating on research and other academic matters such as putting this book together.

Antonis Kakas and Fariba Sadri

Table of Contents, Part II

VI Logic in Databases and Information Integration

MuTACLP: A Language for Temporal Reasoning with Multiple Theories
  Paolo Baldan, Paolo Mancarella, Alessandra Raffaetà, Franco Turini
Description Logics for Information Integration
  Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini
Search and Optimization Problems in Datalog
  Sergio Greco, Domenico Saccà
The Declarative Side of Magic
  Paolo Mascellani, Dino Pedreschi
Key Constraints and Monotonic Aggregates in Deductive Databases
  Carlo Zaniolo

VII Automated Reasoning

A Decidable CLDS for Some Propositional Resource Logics
  Krysia Broda
A Critique of Proof Planning
  Alan Bundy
A Model Generation Based Theorem Prover MGTP for First-Order Logic
  Ryuzo Hasegawa, Hiroshi Fujita, Miyuki Koshimura, Yasuyuki Shirai
A ‘Theory’ Mechanism for a Proof-Verifier Based on First-Order Set Theory
  Eugenio G. Omodeo, Jacob T. Schwartz
An Open Research Problem: Strong Completeness of R. Kowalski’s Connection Graph Proof Procedure
  Jörg Siekmann, Graham Wrightson

VIII Non-deductive Reasoning

Meta-reasoning: A Survey
  Stefania Costantini
Argumentation-Based Proof Procedures for Credulous and Sceptical Non-monotonic Reasoning
  Phan Minh Dung, Paolo Mancarella, Francesca Toni
Automated Abduction
  Katsumi Inoue
The Role of Logic in Computational Models of Legal Argument: A Critical Survey
  Henry Prakken, Giovanni Sartor

IX Logic for Action and Change

Logic Programming Updating - A Guided Approach
  José Júlio Alferes, Luís Moniz Pereira
Representing Knowledge in A-Prolog
  Michael Gelfond
Some Alternative Formulations of the Event Calculus
  Rob Miller, Murray Shanahan

X Logic, Language, and Learning

Issues in Learning Language in Logic
  James Cussens
On Implicit Meanings
  Veronica Dahl
Data Mining as Constraint Logic Programming
  Luc De Raedt
DCGs: Parsing as Deduction?
  Chris Mellish
Statistical Abduction with Tabulation
  Taisuke Sato, Yoshitaka Kameya

XI Computational Logic and Philosophy

Logicism and the Development of Computer Science
  Donald Gillies
Simply the Best: A Case for Abduction
  Stathis Psillos

Author Index

Table of Contents, Part I

A Portrait of a Scientist as a Computational Logician
  Maurice Bruynooghe, Luís Moniz Pereira, Jörg H. Siekmann, Maarten van Emden
Bob Kowalski: A Portrait
  Marek Sergot
Directions for Logic Programming
  Robert A. Kowalski

I Logic Programming Languages

Agents as Multi-threaded Logical Objects
  Keith Clark, Peter J. Robinson
Logic Programming Languages for the Internet
  Andrew Davison
Higher-Order Computational Logic
  John W. Lloyd
A Pure Meta-interpreter for Flat GHC, a Concurrent Constraint Language
  Kazunori Ueda

II Program Derivation and Properties

Transformation Systems and Nondeclarative Properties
  Annalisa Bossi, Nicoletta Cocco, Sandro Etalle
Acceptability with General Orderings
  Danny De Schreye, Alexander Serebrenik
Specification, Implementation, and Verification of Domain Specific Languages: A Logic Programming-Based Approach
  Gopal Gupta, Enrico Pontelli
Negation as Failure through Abduction: Reasoning about Termination
  Paolo Mancarella, Dino Pedreschi, Salvatore Ruggieri
Program Derivation = Rules + Strategies
  Alberto Pettorossi, Maurizio Proietti

III Software Development

Achievements and Prospects of Program Synthesis
  Pierre Flener
Logic for Component-Based Software Development
  Kung-Kiu Lau, Mario Ornaghi
Patterns for Prolog Programming
  Leon Sterling

IV Extensions of Logic Programming

Abduction in Logic Programming
  Mark Denecker, Antonis Kakas
Learning in Clausal Logic: A Perspective on Inductive Logic Programming
  Peter Flach, Nada Lavrač
Disjunctive Logic Programming: A Survey and Assessment
  Jack Minker, Dietmar Seipel
Constraint Logic Programming
  Mark Wallace

V Applications in Logic

Planning Attacks to Security Protocols: Case Studies in Logic Programming
  Luigia Carlucci Aiello, Fabio Massacci
Multiagent Compromises, Joint Fixpoints, and Stable Models
  Francesco Buccafurri, Georg Gottlob
Error-Tolerant Agents
  Thomas Eiter, Viviana Mascardi, V.S. Subrahmanian
Logic-Based Hybrid Agents
  Christoph G. Jung, Klaus Fischer
Heterogeneous Scheduling and Rotation
  Thomas Sjöland, Per Kreuger, Martin Aronsson

Author Index

MuTACLP: A Language for Temporal Reasoning with Multiple Theories

Paolo Baldan, Paolo Mancarella, Alessandra Raffaetà, and Franco Turini

Dipartimento di Informatica, Università di Pisa
Corso Italia, 40, I-56125 Pisa, Italy
{baldan,p.mancarella,raffaeta,turini}@di.unipi.it

Abstract. In this paper we introduce MuTACLP, a knowledge representation language which provides facilities for modeling and handling temporal information, together with some basic operators for combining different temporal knowledge bases. The proposed approach stems from two separate lines of research: the general studies on meta-level operators on logic programs introduced by Brogi et al. [7,9] and Temporal Annotated Constraint Logic Programming (TACLP) defined by Frühwirth [15]. In MuTACLP atoms are annotated with temporal information, which is managed via a constraint theory, as in TACLP. Mechanisms for structuring programs and combining separate knowledge bases are provided through meta-level operators. The language is given two different but equivalent semantics: a top-down semantics which exploits meta-logic, and a bottom-up semantics based on an immediate consequence operator.

1 Introduction

Interest in research concerning the handling of temporal information has been growing steadily over the past two decades. On the one hand, much effort has been spent in developing extensions of logic languages capable of dealing with time (see, e.g., [14,36]). On the other hand, in the field of databases, many approaches have been proposed to extend existing data models, such as the relational, the object-oriented and the deductive models, to cope with temporal data (see, e.g., the books [46,13] and references therein). Clearly these two strands of research are closely related, since temporal logic languages can provide solid theoretical foundations for temporal databases, and powerful knowledge representation and query languages for them [11,17,35]. Another basic motivation for our work is the need for mechanisms for combining pieces of knowledge which may be separated into various knowledge bases (e.g., distributed over the web), and which therefore have to be merged before one can reason with them.

This paper aims at building a framework where temporal information can be naturally represented and handled, and, at the same time, knowledge can be separated and combined by means of meta-level composition operators. Concretely, we introduce a new language, called MuTACLP, which is based on Temporal Annotated Constraint Logic Programming (TACLP), a powerful framework defined by Frühwirth in [15], where temporal information and reasoning can be naturally formalized. Temporal information is represented by temporal annotations which say at what time(s) the formula to which they are attached is valid. Such annotations make time explicit but avoid the proliferation of temporal variables and quantifiers of the first-order approach. In this way, MuTACLP supports quantitative temporal reasoning and allows one to represent definite, indefinite and periodic temporal information, and to work both with time points and time periods (time intervals). Furthermore, as a mechanism for structuring programs and combining different knowledge sources, MuTACLP offers a set of program composition operators in the style of Brogi et al. [7,9].

Concerning the semantical aspects, the use of meta-logic allows us to provide MuTACLP with a formal and, at the same time, executable top-down semantics based on a meta-interpreter. Furthermore the language is given a bottom-up semantics by introducing an immediate consequence operator which generalizes the operator for ordinary constraint logic programs. The two semantics are equivalent in the sense that the meta-interpreter can be proved sound and complete with respect to the semantics based on the immediate consequence operator.

An interesting aspect of MuTACLP is the fact that it integrates modularity and temporal reasoning, a feature which is not common to logical temporal languages (e.g., it is lacking in [1,2,10,12,15,16,21,28]). Two exceptions are the language Temporal Datalog by Orgun [35] and the work on amalgamating knowledge bases by Subrahmanian [45]. Temporal Datalog introduces a concept of module, which, however, seems to be used as a means for defining new non-standard algebraic operators, rather than as a knowledge representation tool. On the other hand, the work on amalgamating knowledge bases offers a multi-theory framework, based on annotated logics, where temporal information can be handled, but only a limited interaction among the different knowledge sources is allowed: essentially a kind of message passing mechanism allows one to delegate the resolution of an atom to other databases.

In the database field, our approach is close to the paradigm of constraint databases [25,27]. In fact, in MuTACLP the use of constraints allows one to model temporal information and to enable efficient implementations of the language. Moreover, from a deductive database perspective, each constraint logic program of our framework can be viewed as an enriched relational database where relations are represented partly intensionally and partly extensionally. The meta-level operators can then be considered as a means of constructing views by combining different databases in various ways.

The paper is organized as follows. Section 2 briefly introduces the program composition operators for combining logic theories of [7,9] and their semantics. Section 3, after reviewing the basics of constraint logic programming, introduces the language TACLP. Section 4 defines the new language MuTACLP, which integrates the basic ideas of TACLP with the composition operators on theories. In Section 5 the language MuTACLP is given a top-down semantics by means of a meta-interpreter and a bottom-up semantics based on an immediate consequence operator, and the two semantics are shown to be equivalent. Section 6 presents some examples to clarify the use of operators on theories and to show the expressive power and the knowledge representation capabilities of the language. Section 7 compares MuTACLP with some related approaches in the literature and, finally, Section 8 outlines our future research plans. Proofs of propositions and theorems are collected in the Appendix. Due to space limitations, the proofs of some technical lemmata are omitted and can be found in [4,38].

An extended abstract of this paper was presented at the International Workshop on Spatio-Temporal Data Models and Languages [33].

A.C. Kakas, F. Sadri (Eds.): Computat. Logic (Kowalski Festschrift), LNAI 2408, pp. 1–40, 2002. © Springer-Verlag Berlin Heidelberg 2002

2 Operators for Combining Theories

Composition operators for logic programs have been thoroughly investigated in [7,9], where both their meta-level and their bottom-up semantics are studied and compared. To illustrate the basic notions and ideas of this approach, this section describes the meta-level definition of the operators, which is obtained simply by adding new clauses to the well-known vanilla meta-interpreter for logic programs. The resulting meta-interpreter combines separate programs without actually building a new program. Its meaning is straightforward and, most importantly, the meta-logical definition shows that the multi-theory framework can be expressed from inside logic programming itself.

We consider two operators to combine programs: union ∪ and intersection ∩. The so-called program expressions are then built by starting from a set of plain programs, each consisting of a collection of clauses, and by repeatedly applying the composition operators. Formally, the language of program expressions Exp is defined by the following abstract syntax:

    Exp ::= Pname | Exp ∪ Exp | Exp ∩ Exp

where Pname is the syntactic category of constant names for plain programs.

Following [6], the two-argument predicate demo is used to represent provability. Namely, demo(E, G) means that the formula G is provable with respect to the program expression E.

    demo(E, empty).
    demo(E, (B1, B2)) ← demo(E, B1), demo(E, B2)
    demo(E, A) ← clause(E, A, B), demo(E, B)

The unit clause states that the empty goal, represented by the constant symbol empty, is solved in any program expression E. The second clause deals with conjunctive goals. It states that a conjunction (B1, B2) is solved in the program expression E if B1 is solved in E and B2 is solved in E. Finally, the third clause deals with the case of atomic goal reduction. To solve an atomic goal A, a clause with head A is chosen from the program expression E and the body of the clause is recursively solved in E.

We adopt the simple naming convention used in [29]. Object programs are named by constant symbols, denoted by capital letters like P and Q. Object level expressions are represented at the meta-level by themselves. In particular, object level variables are denoted by meta-level variables, according to the so-called non-ground representation. An object level program P is represented, at the meta-level, by a set of axioms of the kind clause(P, A, B), one for each object level clause A ← B in the program P.

Each program composition operator is represented at the meta-level by a functor, whose meaning is defined by adding new clauses to the above meta-interpreter.

    clause(E1 ∪ E2, A, B) ← clause(E1, A, B)
    clause(E1 ∪ E2, A, B) ← clause(E2, A, B)
    clause(E1 ∩ E2, A, (B1, B2)) ← clause(E1, A, B1), clause(E2, A, B2)

The added clauses have a straightforward interpretation. Informally, union and intersection mirror two forms of cooperation among program expressions. In the case of union E1 ∪ E2, whose meta-level implementation is defined by the first two clauses, either expression E1 or E2 may be used to perform a computation step. For instance, a clause A ← B belongs to the meta-level representation of P ∪ Q if it belongs either to the meta-level representation of P or to the meta-level representation of Q. In the case of intersection E1 ∩ E2, both expressions must agree to perform a computation step. This is expressed by the third clause, which exploits the basic unification mechanism of logic programming and the non-ground representation of object level programs. A program expression E can be queried by demo(E, G), where G is an object level goal.
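To make the behaviour of the two operators concrete, the following is a minimal Python sketch of the extended meta-interpreter, under a strong simplifying assumption: atoms are ground strings, so head matching is plain equality rather than unification. The names `PROGRAMS`, `clauses`, `demo` and the tags `"union"` and `"inter"` are hypothetical and chosen only for this illustration; the two sample programs P and Q are likewise invented.

```python
from itertools import product

# Plain programs: name -> list of (head, body) clauses.
# Assumption: atoms are ground strings (a propositional simplification
# of the non-ground representation used in the text).
PROGRAMS = {
    "P": [("a", ("b",)), ("b", ()), ("c", ())],   # a <- b.  b.  c.
    "Q": [("a", ("c",)), ("c", ())],              # a <- c.  c.
}

def clauses(expr, head):
    """Yield every body B such that clause(expr, head, B) holds."""
    if isinstance(expr, str):                      # Pname: a plain program
        for h, body in PROGRAMS[expr]:
            if h == head:
                yield body
        return
    op, e1, e2 = expr
    if op == "union":                              # either side may supply the clause
        yield from clauses(e1, head)
        yield from clauses(e2, head)
    elif op == "inter":                            # both sides must supply a clause
        for b1, b2 in product(list(clauses(e1, head)), list(clauses(e2, head))):
            yield b1 + b2                          # body is the conjunction (B1, B2)
    else:
        raise ValueError(f"unknown operator: {op}")

def demo(expr, goals):
    """True if the conjunction of atoms in `goals` is provable in `expr`."""
    if not goals:                                  # the empty goal is always solved
        return True
    first, rest = goals[0], tuple(goals[1:])
    return any(demo(expr, body) and demo(expr, rest)
               for body in clauses(expr, first))
```

With these sample programs, `demo(("union", "P", "Q"), ("a",))` succeeds, while `demo(("inter", "P", "Q"), ("a",))` fails: the combined clause for `a` has body `(b, c)`, and `b` has a clause only in P, so the two programs cannot agree on it. The sketch does not guard against non-terminating recursion in the object programs.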

3 Temporal Annotated CLP

In this section we first briefly recall the basic concepts of Constraint Logic Programming (CLP). Then we give an overview of Temporal Annotated CLP (TACLP), an extension of CLP suited to deal with time, which will be used as a basic language for plain programs in our multi-theory framework. The reader is referred to the survey of Jaffar and Maher [22] for a comprehensive introduction to the motivations, foundations, and applications of CLP languages, and to the recent work of Jaffar et al. [23] for the formal presentation of the semantics. A good reference for TACLP is Frühwirth’s paper [15].

3.1 Constraint Logic Programming

A CLP language is completely determined by its constraint domain. A constraint domain $\mathcal{C}$ is a tuple $\langle S_{\mathcal{C}}, L_{\mathcal{C}}, \mathcal{D}_{\mathcal{C}}, T_{\mathcal{C}}, solv_{\mathcal{C}} \rangle$, where

– $S_{\mathcal{C}} = \langle \Sigma_{\mathcal{C}}, \Pi_{\mathcal{C}} \rangle$ is the constraint domain signature, comprising the function symbols $\Sigma_{\mathcal{C}}$ and the predicate symbols $\Pi_{\mathcal{C}}$.
– $L_{\mathcal{C}}$ is the class of constraints, a set of first-order $S_{\mathcal{C}}$-formulae, denoted by $C$, possibly subscripted.
– $\mathcal{D}_{\mathcal{C}}$ is the domain of computation, an $S_{\mathcal{C}}$-structure which provides the intended interpretation of the constraints. The domain (or support) of $\mathcal{D}_{\mathcal{C}}$ is denoted by $D_{\mathcal{C}}$.
– $T_{\mathcal{C}}$ is the constraint theory, an $S_{\mathcal{C}}$-theory describing the logical semantics of the constraints.
– $solv_{\mathcal{C}}$ is the constraint solver, a (computable) function which maps each formula in $L_{\mathcal{C}}$ to either true, or false, or unknown, indicating that the formula is satisfiable, unsatisfiable, or that the solver cannot tell, respectively.

We assume that $\Pi_{\mathcal{C}}$ contains the predicate symbol “=”, interpreted as identity in $\mathcal{D}_{\mathcal{C}}$. Furthermore we assume that $L_{\mathcal{C}}$ contains all atoms constructed from “=”, the always satisfiable constraint true and the unsatisfiable constraint false, and that $L_{\mathcal{C}}$ is closed under variable renaming, existential quantification and conjunction. A primitive constraint is an atom of the form $p(t_1, \ldots, t_n)$ where $p$ is a predicate in $\Pi_{\mathcal{C}}$ and $t_1, \ldots, t_n$ are terms on $\Sigma_{\mathcal{C}}$. We assume that the solver does not take variable names into account. Also, the domain, the theory and the solver agree in the sense that $\mathcal{D}_{\mathcal{C}}$ is a model of $T_{\mathcal{C}}$ and for every $C \in L_{\mathcal{C}}$:

– $solv_{\mathcal{C}}(C) = true$ implies $T_{\mathcal{C}} \models \exists C$, and
– $solv_{\mathcal{C}}(C) = false$ implies $T_{\mathcal{C}} \models \neg\exists C$.

Example 1. (Real) The constraint domain Real has the usual comparison relations over the reals as predicate symbols, +, -, *, / as function symbols and sequences of digits (possibly with a decimal point) as constant symbols. Examples of primitive constraints are X + 3 […] 10. The domain of computation is the structure with reals as domain, and where the predicate symbols and the function symbols +, -, *, / are interpreted as the usual relations and functions over reals. Finally, the theory $T_{Real}$ is the theory of real closed fields. A possible constraint solver is provided by the CLP(R) system [24], which relies on Gauss-Jordan elimination to handle linear constraints. Non-linear constraints are not taken into account by the solver (i.e., their evaluation is delayed) until they become linear.

Example 2. (Logic Programming) The constraint domain Term has = as predicate symbol and strings of alphanumeric characters as function or constant symbols. The domain of computation of Term is the set Tree of finite trees (or, equivalently, of finite terms), while the theory $T_{Term}$ is Clark’s equality theory. The interpretation of a constant is a tree with a single node labeled by the constant. The interpretation of an n-ary function symbol $f$ is the function $f_{Tree} : Tree^n \to Tree$ mapping the trees $t_1, \ldots, t_n$ to a new tree with root labeled by $f$ and with $t_1, \ldots, t_n$ as children. A constraint solver is given by the unification algorithm. Then CLP(Term) coincides with logic programming.
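The three-valued contract of a constraint solver (true and false must be sound, unknown is always allowed) can be illustrated with a small Python sketch. It is not the CLP(R) algorithm: it assumes a hypothetical toy fragment in which a handled constraint is a triple `(variable, op, constant)` with `op` one of `"<="`, `">="`, `"="`, and anything else is treated as delayed. The function name `solv` and this encoding are assumptions made only for the example.

```python
import math

def solv(constraints):
    """Three-valued solver for a conjunction of constraints.

    Handled fragment (an assumption of this sketch): triples
    (var, op, const) with op in {"<=", ">=", "="}. Any other item is
    delayed, mimicking a solver that ignores non-linear constraints.
    """
    bounds = {}            # var -> (lo, hi), a closed interval of reals
    delayed = False
    for c in constraints:
        handled = isinstance(c, tuple) and len(c) == 3 and c[1] in ("<=", ">=", "=")
        if not handled:
            delayed = True                 # outside the fragment: cannot decide
            continue
        var, op, k = c
        lo, hi = bounds.get(var, (-math.inf, math.inf))
        if op in (">=", "="):
            lo = max(lo, k)
        if op in ("<=", "="):
            hi = min(hi, k)
        if lo > hi:                        # the handled part alone is unsatisfiable
            return "false"
        bounds[var] = (lo, hi)
    # "true" only when every constraint was handled and the bounds are consistent
    return "unknown" if delayed else "true"
```

The design mirrors the agreement conditions stated above: `"false"` is returned only when the bound constraints already contradict each other, so the whole conjunction is unsatisfiable, and `"true"` is returned only when nothing was delayed. For instance, `solv([("X", ">=", 1), "X*Y = 2"])` answers `"unknown"` rather than guessing.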


For a given constraint domain $\mathcal{C}$, we denote by CLP($\mathcal{C}$) the CLP language based on $\mathcal{C}$. Our results are parametric with respect to a language $L$ in which all programs and queries under consideration are included. The set of function symbols in $L$, denoted by $\Sigma_L$, coincides with $\Sigma_{\mathcal{C}}$, while the set of predicate symbols $\Pi_L$ includes $\Pi_{\mathcal{C}}$.

A constraint logic program, or simply a program, is a finite set of rules of the form:

$$A \leftarrow C_1, \ldots, C_n, B_1, \ldots, B_m$$

where $A$ and $B_1, \ldots, B_m$ ($m \geq 0$) are atoms (whose predicate symbols are in $\Pi_L$ but not in $\Pi_{\mathcal{C}}$), and $C_1, \ldots, C_n$ ($n \geq 0$) are primitive constraints¹ ($A$ is called the head of the clause and $C_1, \ldots, C_n, B_1, \ldots, B_m$ the body of the clause). If $m = 0$ then the clause is called a fact. A query is a sequence of atoms and/or constraints.

Interpretations and Fixpoints. A $\mathcal{C}$-interpretation for a CLP($\mathcal{C}$) program is an interpretation which agrees with $\mathcal{D}_{\mathcal{C}}$ on the interpretations of the symbols in $L_{\mathcal{C}}$. Formally, a $\mathcal{C}$-interpretation $I$ is a subset of $\mathcal{C}$-base$_L$, i.e. of the set $\{ p(d_1, \ldots, d_n) \mid p \text{ predicate in } \Pi_L \setminus \Pi_{\mathcal{C}},\ d_1, \ldots, d_n \in D_{\mathcal{C}} \}$. Note that the meaning of primitive constraints is not specified, being fixed by $\mathcal{C}$. The notions of $\mathcal{C}$-model and least $\mathcal{C}$-model are a natural extension of the corresponding logic programming concepts.

A valuation $\sigma$ is a function that maps variables into $D_{\mathcal{C}}$. A $\mathcal{C}$-ground instance of an atom $A$ is obtained by applying a valuation $\sigma$ to the atom, thus producing a construct of the form $p(a_1, \ldots, a_n)$ with $a_1, \ldots, a_n$ elements in $D_{\mathcal{C}}$. $\mathcal{C}$-ground instances of queries and clauses are defined in a similar way. We denote by $ground_{\mathcal{C}}(P)$ the set of $\mathcal{C}$-ground instances of clauses from $P$.

Finally the immediate consequence operator for a CLP($\mathcal{C}$) program $P$ is a function $T_P^{\mathcal{C}} : \wp(\mathcal{C}\text{-base}_L) \to \wp(\mathcal{C}\text{-base}_L)$ defined as follows:

$$T_P^{\mathcal{C}}(I) = \left\{ A \;\middle|\; \begin{array}{l} A \leftarrow C_1, \ldots, C_k, B_1, \ldots, B_n \in ground_{\mathcal{C}}(P), \\ \{B_1, \ldots, B_n\} \subseteq I,\ \mathcal{D}_{\mathcal{C}} \models C_1, \ldots, C_k \end{array} \right\}$$

The operator $T_P^{\mathcal{C}}$ is continuous, and therefore it has a least fixpoint which can be computed as the least upper bound of the $\omega$-chain $\{(T_P^{\mathcal{C}})^i\}_{i \geq 0}$ of the iterated applications of $T_P^{\mathcal{C}}$ starting from the empty set, i.e., $(T_P^{\mathcal{C}})^\omega = \bigcup_{i \in \mathbb{N}} (T_P^{\mathcal{C}})^i$.

Temporal Annotated Constraint Logic Programming

Temporal Annotated Constraint Logic Programming (TACLP), proposed by Frühwirth in [15,39], has been shown to be a natural and powerful framework for formalizing temporal information and reasoning. In [15] TACLP is presented

¹ Constraints and atoms can be in any position inside the body of a clause, although, for the sake of simplicity, we will always assume that the sequence of constraints precedes the sequence of atoms.

MuTACLP: A Language for Temporal Reasoning with Multiple Theories


as an instance of annotated constraint logic (ACL) suited for reasoning about time. ACL, which can be seen as an extension of generalized annotated programs [26,30], generalizes basic first-order languages with a distinguished class of predicates, called constraints, and a distinguished class of terms, called annotations, used to label formulae. Moreover ACL provides inference rules for annotated formulae and a constraint theory for handling annotations. An advantage of the languages in the ACL framework is that their clausal fragment can be efficiently implemented: given a logic in this framework, there is a systematic way to make a clausal fragment executable as a constraint logic program. Both an interpreter and a compiler can be generated and implemented in standard constraint logic programming languages.

We next summarize the syntax and semantics of TACLP. As mentioned above, TACLP is a constraint logic programming language where formulae can be annotated with temporal labels and where relations between these labels can be expressed by using constraints. In TACLP the choice of the temporal ontology is free. In this paper, we will consider the instance of TACLP where time points are totally ordered and labels involve convex, non-empty sets of time points. Moreover we will assume that only atomic formulae can be annotated and that clauses are negation free. With an abuse of notation, in the rest of the paper such a subset of the language will be referred to simply as TACLP.

Time can be discrete or dense. Time points are totally ordered by the relation ≤. We denote by D the set of time points and we assume a set of operations (such as the binary operations +, −) to manage such points. We assume that the time-line is left-bounded by the number 0 and open to the future, with the symbol ∞ used to denote a time point that is later than any other.
A time period is an interval [r, s] with r, s ∈ D and 0 ≤ r ≤ s ≤ ∞, which represents the convex, non-empty set of time points {t | r ≤ t ≤ s}². Thus the interval [0, ∞] denotes the whole time line. An annotated formula is of the form A α where A is an atomic formula and α an annotation. In TACLP, there are three kinds of annotations based on time points and on time periods. Let t be a time point and J = [r, s] be a time period.

(at) The annotated formula A at t means that A holds at time point t.
(th) The annotated formula A th J means that A holds throughout, i.e., at every time point in, the time period J. The definition of a th-annotated formula in terms of at is: A th J ⇔ ∀t (t ∈ J → A at t).
(in) The annotated formula A in J means that A holds at some time point(s) in the time period J, but we do not know exactly which. The definition of an in-annotated formula in terms of at is: A in J ⇔ ∃t (t ∈ J ∧ A at t).

The in temporal annotation accounts for indefinite temporal information.

² The results we present naturally extend to time lines that are bounded or unbounded in other ways and to time periods that are open on one or both sides.
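The definitions of th and in in terms of at can be illustrated with a minimal Python sketch. It assumes discrete time with integer time points and a finite set of points at which an atom is known to hold; the names `holds_at`, `holds_th`, `holds_in` and the sample data are hypothetical and not part of TACLP.

```python
from typing import Set, Tuple

Period = Tuple[int, int]  # closed period [r, s] over integer time points (assumption)


def holds_at(at_points: Set[int], t: int) -> bool:
    """A at t: the atom is known to hold at time point t."""
    return t in at_points


def holds_th(at_points: Set[int], period: Period) -> bool:
    """A th [r, s]  <=>  for all t in [r, s]: A at t."""
    r, s = period
    return all(holds_at(at_points, t) for t in range(r, s + 1))


def holds_in(at_points: Set[int], period: Period) -> bool:
    """A in [r, s]  <=>  there exists t in [r, s]: A at t."""
    r, s = period
    return any(holds_at(at_points, t) for t in range(r, s + 1))


# Hypothetical data: the atom holds at time points 3, 4 and 5.
a_points = {3, 4, 5}
print(holds_th(a_points, (3, 5)))  # True
print(holds_th(a_points, (3, 6)))  # False
print(holds_in(a_points, (5, 9)))  # True
print(holds_in(a_points, (6, 9)))  # False
```

The universal and existential quantifiers of the two definitions map directly to `all` and `any`; dense time would require a symbolic treatment of periods instead of enumeration.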


The set of annotations is endowed with a partial order relation ⊑ which turns it into a lattice. Given two annotations α and β, the intuition is that α ⊑ β if α is "less informative" than β in the sense that for all formulae A, A β ⇒ A α. More precisely, being an instance of ACL, in addition to Modus Ponens, TACLP has two further inference rules: the rule (⊑) and the rule (⊔).

$$\frac{A\,\alpha \qquad \gamma \sqsubseteq \alpha}{A\,\gamma}\ \ \text{rule } (\sqsubseteq) \qquad\qquad \frac{A\,\alpha \qquad A\,\beta \qquad \gamma = \alpha \sqcup \beta}{A\,\gamma}\ \ \text{rule } (\sqcup)$$

The rule (⊑) states that if a formula holds with some annotation, then it also holds with all annotations that are smaller according to the lattice ordering. The rule (⊔) says that if a formula holds with some annotation α and the same formula holds with another annotation β then it holds with the least upper bound α ⊔ β of the two annotations.

Next, we introduce the constraint theory for temporal annotations. Recall that a constraint theory is a non-empty, consistent first order theory that axiomatizes the meaning of the constraints. Besides an axiomatization of the total order relation ≤ on the set of time points D, the constraint theory includes the following axioms defining the partial order on temporal annotations.

(at th)  at t = th [t, t]
(at in)  at t = in [t, t]
(th ⊑)   th [s1, s2] ⊑ th [r1, r2] ⇔ r1 ≤ s1, s1 ≤ s2, s2 ≤ r2
(in ⊑)   in [r1, r2] ⊑ in [s1, s2] ⇔ r1 ≤ s1, s1 ≤ s2, s2 ≤ r2

The first two axioms state that th I and in I are equivalent to at t when the time period I consists of a single time point t.³ Next, if a formula holds at every element of a time period, then it holds at every element in all sub-periods of that period ((th ⊑) axiom). On the other hand, if a formula holds at some points of a time period then it holds at some points in all periods that include this period ((in ⊑) axiom). A consequence of the above axioms is

(in th ⊑)  in [s1, s2] ⊑ th [r1, r2] ⇔ s1 ≤ r2, r1 ≤ s2, s1 ≤ s2, r1 ≤ r2

i.e., an atom annotated by in holds in any time period that overlaps with a time period where the atom holds throughout. To summarize the above explanation, the axioms defining the partial order relation on annotations can be arranged in the following chain, where it is assumed that r1 ≤ s1, s1 ≤ s2, s2 ≤ r2:

in [r1, r2] ⊑ in [s1, s2] ⊑ in [s1, s1] = at s1 = th [s1, s1] ⊑ th [s1, s2] ⊑ th [r1, r2]

Before giving an axiomatization of the least upper bound ⊔ on temporal annotations, let us recall that, as explained in [15], the least upper bound of two annotations always exists but sometimes it may be "too large". In fact, rule (⊔) is correct only if the lattice order ensures A α ∧ A β ∧ (γ = α ⊔ β) ⟹ A γ whereas, in general, this is not true in our case. For instance, according to the lattice, th [1, 2] ⊔ th [4, 5] = th [1, 5], but according to the definition of th-annotated formulae in terms of at, the conjunction A th [1, 2] ∧ A th [4, 5] does not imply A th [1, 5], since it does not express that A at 3 holds. From a theoretical point of view, this problem can be overcome by enriching the lattice of annotations with expressions involving ⊔. In practice, it suffices to consider the least upper bound for time periods that produce another different meaningful time period. Concretely, one restricts to th annotations with overlapping time periods that do not include one another:

(th ⊔)  th [s1, s2] ⊔ th [r1, r2] = th [s1, r2] ⇔ s1 < r1, r1 ≤ s2, s2 < r2

³ Especially in dense time, one may disallow singleton periods and drop the two axioms. This restriction has no effects on the results we are presenting.
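As an illustration of the axioms above, the following Python sketch encodes the partial order on annotations and the restricted least upper bound on th annotations. It assumes integer time points and represents an annotation as a tuple `(kind, start, end)`; the names `leq` and `lub_th` are hypothetical, and the symmetric case of the least upper bound is added here as a convenience rather than taken from the axiom.

```python
from typing import Optional, Tuple

Ann = Tuple[str, int, int]  # (kind, start, end), kind is "th" or "in" (assumed encoding)


def leq(a: Ann, b: Ann) -> bool:
    """Return True if a is below b in the order on annotations."""
    (ka, a1, a2), (kb, b1, b2) = a, b
    if a1 > a2 or b1 > b2:
        return False  # periods must be non-empty
    if ka == "th" and kb == "th":
        return b1 <= a1 and a2 <= b2      # th order axiom: sub-period
    if ka == "in" and kb == "in":
        return a1 <= b1 and b2 <= a2      # in order axiom: super-period
    if ka == "in" and kb == "th":
        return a1 <= b2 and b1 <= a2      # derived in/th axiom: periods overlap
    # th below in holds only when both denote the same single time point (at t)
    return a1 == a2 == b1 == b2


def lub_th(a: Ann, b: Ann) -> Optional[Ann]:
    """Restricted least upper bound of two th annotations.

    Defined only for overlapping periods that do not include one another;
    returns None in every other case.
    """
    (ka, s1, s2), (kb, r1, r2) = a, b
    if ka != "th" or kb != "th":
        return None
    if s1 < r1 <= s2 < r2:
        return ("th", s1, r2)
    if r1 < s1 <= r2 < s2:  # same axiom with the arguments swapped
        return ("th", r1, s2)
    return None


print(leq(("in", 2, 4), ("th", 1, 3)))      # True: the periods overlap
print(lub_th(("th", 1, 3), ("th", 2, 5)))   # ('th', 1, 5)
print(lub_th(("th", 1, 2), ("th", 4, 5)))   # None: nothing is known about time point 3
```

Returning `None` for disjoint periods mirrors the restriction discussed above: the lattice-level bound th [1, 5] exists but would not be a sound conclusion.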

Summarizing, a constraint domain for time points is fixed where the signature includes suitable constants for time points, function symbols for operations on time points (e.g., +, −, ...) and the predicate symbol ≤, modeling the total order relation on time points. Such a constraint domain is extended to a constraint domain A for handling annotations, by enriching the signature with function symbols [·, ·], at, th, in, ⊔ and the predicate symbol ⊑, axiomatized as described above. Then, as for ordinary constraint logic programming, a TACLP language is determined by fixing a constraint domain C, which is required to contain the constraint domain A for annotations. We denote by TACLP(C) the TACLP language based on C. To lighten the notation, in the following, the "C" will be often omitted.

The next definition introduces the clausal fragment of TACLP that can be used as an efficient temporal programming language.

Definition 1. A TACLP clause is of the form:

A α ← C1, ..., Cn, B1 α1, ..., Bm αm    (n, m ≥ 0)

where A is an atom (not a constraint), α and αi are (optional) temporal annotations, the Cj's are constraints and the Bi's are atomic formulae. Constraints Cj cannot be annotated. A TACLP program is a finite set of TACLP clauses.

4

Multi-theory Temporal Annotated Constraint Logic Programming

A first attempt to extend the multi-theory framework introduced in Section 2 to handle temporal information is presented in [32]. In that paper an object level program is a collection of annotated logic programming clauses, named by a constant symbol. An annotated clause is of the kind A ← B1, ..., Bn □ [a, b] where the annotation [a, b] represents the period of time in which the clause holds. The handling of time is hidden at the object level and it is managed at the meta-level by intersecting or joining the intervals associated with clauses. However, this approach is not completely satisfactory, in that it does not offer mechanisms for modeling indefinite temporal information and for handling periodic data. Moreover, some problems arise when we want to extract temporal information from the intervals at the object level.

To obtain a more expressive language, where in particular the mentioned deficiencies are overcome, in this paper we consider a multi-theory framework where object level programs are taken from Temporal Annotated Constraint Logic Programming (TACLP) and the composition operators are generalized to deal with temporal annotated constraint logic programs. The resulting language, called Multi-theory Temporal Annotated Constraint Logic Programming (MuTACLP for short), thus arises as a synthesis of the work on composition operators for logic programs and of TACLP. It can be thought of both as a language which enriches TACLP with high-level mechanisms for structuring programs and for combining separate knowledge bases, and as an extension of the language of program expressions with constraints and with time-representation mechanisms based on temporal annotations for atoms.

The language of program expressions remains formally the same as the one in Section 2. However now plain programs, named by the constant symbols in Pname, are TACLP programs as defined in Section 3.2.
Also the structure of the time domain remains unchanged, whereas, to deal with program composition, the constraint theory presented in Section 3.2 is enriched with the axiomatization of the greatest lower bound # of two annotations:

(th #)      th [s1, s2] # th [r1, r2] = th [t1, t2] ⇔ s1 ≤ s2, r1 ≤ r2, t1 = max{s1, r1}, t2 = min{s2, r2}, t1 ≤ t2
(th #′)     th [s1, s2] # th [r1, r2] = in [t2, t1] ⇔ s1 ≤ s2, r1 ≤ r2, t1 = max{s1, r1}, t2 = min{s2, r2}, t2 < t1
(th in #)   th [s1, s2] # in [r1, r2] = in [r1, r2] ⇔ s1 ≤ r2, r1 ≤ s2, s1 ≤ s2, r1 ≤ r2
(th in #′)  th [s1, s2] # in [r1, r2] = in [s2, r2] ⇔ s1 ≤ s2, s2 < r1, r1 ≤ r2
(th in #″)  th [s1, s2] # in [r1, r2] = in [r1, s1] ⇔ r1 ≤ r2, r2 < s1, s1 ≤ s2
(in #)      in [s1, s2] # in [r1, r2] = in [t1, t2] ⇔ s1 ≤ s2, r1 ≤ r2, t1 = min{s1, r1}, t2 = max{s2, r2}

Keeping in mind that annotations deal with time periods, i.e., convex, non-empty sets of time points, it is not difficult to verify that the axioms above indeed define the greatest lower bound with respect to the partial order relation ⊑. For instance the greatest lower bound of two th annotations, th [s1, s2] and th [r1, r2], can be:

– a th [t1, t2] annotation if [r1, r2] and [s1, s2] are overlapping intervals and [t1, t2] is their (not empty) intersection (axiom (th #));
– an in [t1, t2] annotation, otherwise, where interval [t1, t2] is the least convex set which intersects both [s1, s2] and [r1, r2] (axiom (th #′), see Fig. 1.(a)).

In all other cases the greatest lower bound is always an in annotation. For instance, as expressed by axiom (th in #′), the greatest lower bound of two annotations th [s1, s2] and in [r1, r2] with disjoint intervals is given by in [s2, r2], where interval [s2, r2] is the least convex set containing [r1, r2] and intersecting [s1, s2] (see Fig. 1.(b)). The greatest lower bound will play a basic role in the definition of the intersection operation over program expressions. Notice that in TACLP it is not needed since the problem of combining programs is not dealt with.

[Figure: (a) th [s1, s2] and th [r1, r2] with resulting in [t1, t2]; (b) th [s1, s2] and in [r1, r2] with resulting in [s2, r2].]

Fig. 1. Greatest lower bound of annotations.
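The case analysis of the greatest lower bound axioms can be sketched in Python as follows. This is only an illustration under stated assumptions: time points are integers, an annotation is a tuple `(kind, start, end)`, and the function name `glb` is hypothetical.

```python
from typing import Tuple

Ann = Tuple[str, int, int]  # (kind, start, end), kind is "th" or "in" (assumed encoding)


def glb(a: Ann, b: Ann) -> Ann:
    """Greatest lower bound of two annotations, one branch per axiom."""
    (ka, s1, s2), (kb, r1, r2) = a, b
    if ka not in ("th", "in") or kb not in ("th", "in"):
        raise ValueError("unknown annotation kind")
    if s1 > s2 or r1 > r2:
        raise ValueError("time periods must be non-empty")
    if ka == "in" and kb == "th":
        return glb(b, a)  # the operation is symmetric
    if ka == "th" and kb == "th":
        t1, t2 = max(s1, r1), min(s2, r2)
        if t1 <= t2:
            return ("th", t1, t2)      # overlapping periods: their intersection
        return ("in", t2, t1)          # disjoint periods: least period meeting both
    if ka == "th" and kb == "in":
        if s1 <= r2 and r1 <= s2:
            return ("in", r1, r2)      # overlapping periods
        if s2 < r1:
            return ("in", s2, r2)      # th period entirely before the in period
        return ("in", r1, s1)          # th period entirely after the in period
    return ("in", min(s1, r1), max(s2, r2))  # two in annotations: convex hull


print(glb(("th", 1, 3), ("th", 2, 5)))  # ('th', 2, 3)
print(glb(("th", 1, 2), ("th", 4, 5)))  # ('in', 2, 4)
print(glb(("th", 1, 2), ("in", 4, 6)))  # ('in', 2, 6)
print(glb(("in", 1, 2), ("in", 4, 5)))  # ('in', 1, 5)
```

Only the overlapping th/th case yields a th annotation; every other branch returns an in annotation, as stated in the text.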

Finally, as in TACLP we still have, in addition to Modus Ponens, the inference rules (⊑) and (⊔).

Example 3. In a company there are some managers and a secretary who has to manage their meetings and appointments. During the day a manager can be busy if she/he is in a meeting or if she/he is not present in the office. This situation is modeled by the theory Managers.

Managers:
busy(M) th [T1, T2] ← in-meeting(M) th [T1, T2]
busy(M) th [T1, T2] ← out-of-office(M) th [T1, T2]

This theory is parametric with respect to the predicates in-meeting and out-of-office since the schedule of managers varies daily. The schedules are collected in a separate theory Today-Schedule and, to know whether a manager is busy or not, such a theory is combined with Managers by using the union operator. For instance, suppose that the schedule for a given day is the following: Mr. Smith and Mr. Jones have a meeting at 9am lasting one hour. In the afternoon Mr. Smith goes out for lunch at 2pm and comes back at 3pm. The theory Today-Schedule below represents such information.

Today-Schedule:
in-meeting(mrSmith) th [9am, 10am].
in-meeting(mrJones) th [9am, 10am].
out-of-office(mrSmith) th [2pm, 3pm].

To know whether Mr. Smith is busy between 9:30am and 10:30am the secretary can ask for busy(mrSmith) in [9:30am, 10:30am]. Since Mr. Smith is in a meeting from 9am till 10am, she will indeed obtain that Mr. Smith is busy. The considered query exploits indefinite information: knowing that Mr. Smith is busy at some instant in [9:30am, 10:30am], the secretary cannot schedule an appointment for him for that period.

Example 4. At 10pm Tom was found dead in his house. The only hint is that the answering machine recorded some messages from 7pm up to 8pm. At first glance, the doctor said Tom died one to two hours before. The detective made a further assumption: Tom did not answer the telephone so he could be already dead. We collect all these hints and assumptions into three programs, Hints, Doctor and Detective, in order not to mix firm facts with simple hypotheses that might change during the investigations.

Hints:

found at 10pm.
ans-machine th [7pm, 8pm].

Doctor:
dead in [T − 2:00, T − 1:00] ← found at T

Detective:
dead in [T1, T2] ← ans-machine th [T1, T2]

If we combine the hypotheses of the doctor and those of the detective we can extend the period of time in which Tom possibly died. The program expression Doctor ∩ Detective behaves as

dead in [S1, S2] ← in [T − 2:00, T − 1:00] # in [T1, T2] = in [S1, S2], found at T, ans-machine th [T1, T2]

The constraint in [T − 2:00, T − 1:00] # in [T1, T2] = in [S1, S2] determines the annotation in [S1, S2] in which Tom possibly died: according to axiom (in #) the resulting interval is given by S1 = min{T − 2:00, T1} and S2 = max{T − 1:00, T2}. In fact, according to the semantics defined in the next section, a consequence of the program expression Hints ∪ (Doctor ∩ Detective) is just dead in [7pm, 9pm], since the annotation in [7pm, 9pm] is the greatest lower bound of in [8pm, 9pm] and in [7pm, 8pm].

5

Semantics of MuTACLP

In this section we introduce an operational (top-down) semantics for the language MuTACLP by means of a meta-interpreter. Then we provide MuTACLP with a least ﬁxpoint (bottom-up) semantics, based on the deﬁnition of an immediate consequence operator. Finally, the meta-interpreter for MuTACLP is proved sound and complete with respect to the least ﬁxpoint semantics.


In the definition of the semantics, without loss of generality, we assume all atoms to be annotated with th or in labels. In fact at t annotations can be replaced with th [t, t] by exploiting the (at th) axiom. Moreover, each atom which is not annotated in the object level program is intended to be true throughout the whole temporal domain and thus it can be labelled with th [0, ∞]. Constraints remain unchanged.

5.1

Meta-interpreter

The extended meta-interpreter is defined by the following clauses.

demo(E, empty).    (1)

demo(E, (B1, B2)) ← demo(E, B1), demo(E, B2)    (2)

demo(E, A th [T1, T2]) ← S1 ≤ T1, T1 ≤ T2, T2 ≤ S2, clause(E, A th [S1, S2], B), demo(E, B)    (3)

demo(E, A th [T1, T2]) ← S1 ≤ T1, T1 < S2, S2 < T2, clause(E, A th [S1, S2], B), demo(E, B), demo(E, A th [S2, T2])    (4)

demo(E, A in [T1, T2]) ← T1 ≤ S2, S1 ≤ T2, T1 ≤ T2, clause(E, A th [S1, S2], B), demo(E, B)    (5)

demo(E, A in [T1, T2]) ← T1 ≤ S1, S2 ≤ T2, clause(E, A in [S1, S2], B), demo(E, B)    (6)

demo(E, C) ← constraint(C), C    (7)

clause(E1 ∪ E2, A α, B) ← clause(E1, A α, B)    (8)

clause(E1 ∪ E2, A α, B) ← clause(E2, A α, B)    (9)

clause(E1 ∩ E2, A γ, (B1, B2)) ← clause(E1, A α, B1), clause(E2, A β, B2), α # β = γ    (10)

A clause A α ← B of a plain program P is represented at the meta-level by

clause(P, A α, B) ← S1 ≤ S2    (11)

where α = th [S1, S2] or α = in [S1, S2].
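To give a feel for how clauses (3) to (6) operate, the sketch below evaluates th and in queries top-down in Python. It is a deliberately reduced illustration, not the meta-interpreter itself: it assumes integer time points, object programs consisting only of ground annotated facts (empty bodies), no constraint solving, and union modeled as list concatenation (clauses (8) and (9)); intersection is not covered. The names `demo_th`, `demo_in` and the sample facts are hypothetical.

```python
from typing import List, Tuple

Fact = Tuple[str, str, int, int]  # (atom, kind, start, end): ground fact with empty body


def demo_th(facts: List[Fact], atom: str, t1: int, t2: int) -> bool:
    """Is `atom th [t1, t2]` derivable from the facts?"""
    if t1 > t2:
        return False
    for (a, kind, s1, s2) in facts:
        if a != atom or kind != "th" or s1 > s2:
            continue
        # Clause (3): the queried period is contained in the fact's period.
        if s1 <= t1 and t2 <= s2:
            return True
        # Clause (4): the fact covers [t1, s2]; prove the rest [s2, t2] recursively.
        # The recursion terminates because the start point strictly increases.
        if s1 <= t1 < s2 < t2 and demo_th(facts, atom, s2, t2):
            return True
    return False


def demo_in(facts: List[Fact], atom: str, t1: int, t2: int) -> bool:
    """Is `atom in [t1, t2]` derivable from the facts?"""
    if t1 > t2:
        return False
    for (a, kind, s1, s2) in facts:
        if a != atom or s1 > s2:
            continue
        # Clause (5): the queried period overlaps a th period.
        if kind == "th" and t1 <= s2 and s1 <= t2:
            return True
        # Clause (6): the queried period contains an in period.
        if kind == "in" and t1 <= s1 and s2 <= t2:
            return True
    return False


# Hypothetical theories; their union is modeled by concatenating the fact lists.
kb1 = [("p", "th", 1, 10)]
kb2 = [("p", "th", 8, 20), ("q", "in", 4, 6)]
union = kb1 + kb2

print(demo_th(union, "p", 1, 20))  # True, via clause (4) then clause (3)
print(demo_th(kb1, "p", 1, 20))    # False, kb1 alone covers only [1, 10]
print(demo_in(union, "q", 3, 7))   # True, via clause (6)
```

A full implementation would also unify atoms, solve the bodies of clauses and delegate constraints to the solver, as clauses (2), (7) and (10) require.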


This meta-interpreter can be written in any CLP language that provides a suitable constraint solver for temporal annotations (see Section 3.2 for the corresponding constraint theory). A first difference with respect to the meta-interpreter in Section 2 is that our meta-interpreter handles constraints that can either occur explicitly in its clauses, e.g., the constraint s1 ≤ t1, t1 ≤ t2, t2 ≤ s2 in clause (3), or can be produced by resolution steps. Constraints of the latter kind are managed by clause (7) which passes each constraint C to be solved directly to the constraint solver.

The second difference is that our meta-interpreter implements not only Modus Ponens but also rule (⊑) and rule (⊔). This is the reason why the third clause for the predicate demo of the meta-interpreter in Section 2 is now split into four clauses. Clauses (3), (5) and (6) implement the inference rule (⊑): the atomic goal to be solved is required to be labelled with an annotation which is smaller than the one labelling the head of the clause used in the resolution step. For instance, clause (3) states that given a clause A th [s1, s2] ← B whose body B is solvable, we can derive the atom A annotated with any th [t1, t2] such that th [t1, t2] ⊑ th [s1, s2], i.e., according to axiom (th ⊑), [t1, t2] ⊆ [s1, s2], as expressed by the constraint s1 ≤ t1, t1 ≤ t2, t2 ≤ s2. Clauses (5) and (6) are built in an analogous way by exploiting axioms (in th ⊑) and (in ⊑), respectively.

Rule (⊔) is implemented by clause (4). According to the discussion in Section 3.2, it is applicable only to th annotations involving overlapping time periods which do not include one another. More precisely, clause (4) states that if we can find a clause A th [s1, s2] ← B such that the body B is solvable, and if moreover the atom A can be proved throughout the time period [s2, t2] (i.e., demo(E, A th [s2, t2]) is solvable) then we can derive the atom A labelled with any annotation th [t1, t2] ⊑ th [s1, t2]. The constraints on temporal variables ensure that the time period [t1, t2] is a new time period different from [s1, s2], [s2, t2] and their subintervals.

Finally, in the meta-level representation of object clauses, as expressed by clause (11), the constraint s1 ≤ s2 is added to ensure that the head of the object clause has a well-formed, namely non-empty, annotation.

As far as the meta-level definition of the union and intersection operators is concerned, clauses implementing the union operation remain unchanged with respect to the original definition in Section 2, whereas in the clause implementing the intersection operation a constraint is added, which expresses the annotation for the derived atom. Informally, a clause A α ← B, belonging to the intersection of two program expressions E1 and E2, is built by taking one clause instance from each program expression E1 and E2, such that the head atoms of the two clauses are unifiable. Let such instances of clauses be cl1 and cl2. Then B is the conjunction of the bodies of cl1 and cl2 and A is the unified atom labelled with the greatest lower bound of the annotations of the heads of cl1 and cl2.

The following example shows the usefulness of clause (4) to derive new temporal information according to the inference rule (⊔).

Example 5. Consider the databases DB1 and DB2 containing information about people working in two companies. Jim is a consultant and he works for the first company from January 1, 1995 to April 30, 1995 and for the second company from April 1, 1995 to September 15, 1995.

DB1:
consultant(jim) th [Jan 1 1995, Apr 30 1995].

DB2:
consultant(jim) th [Apr 1 1995, Sep 15 1995].

The period of time in which Jim works as a consultant can be obtained by querying the union of the above theories as follows:

demo(DB1 ∪ DB2, consultant(jim) th [T1, T2]).

By using clause (4), we can derive the interval [Jan 1 1995, Sep 15 1995] (more precisely, the constraints Jan 1 1995 ≤ T1, T1 < Apr 30 1995, Apr 30 1995 < T2, T2 ≤ Sep 15 1995 are derived) that otherwise would never be generated. In fact, by applying clause (3) alone, we can prove only that Jim is a consultant in the intervals [Jan 1 1995, Apr 30 1995] and [Apr 1 1995, Sep 15 1995] (or in subintervals of them) separately.

5.2

Bottom-Up Semantics

To give a declarative meaning to program expressions, we define a "higher-order" semantics for MuTACLP. In fact, the results in [7] show that the least Herbrand model semantics of logic programs does not scale smoothly to program expressions. Fundamental properties of semantics, like compositionality and full abstraction, are definitely lost. Intuitively, the least Herbrand model semantics is not compositional since it identifies programs which have different meanings when combined with others. Actually, all the programs whose least Herbrand model is empty are identified with the empty program. For example, the programs {r ← s} and {r ← q} are both denoted by the empty model, though they behave quite differently when composed with other programs (e.g., consider the union with {q.}).

Brogi et al. showed in [9] that defining as meaning of a program P the immediate consequence operator T_P itself (rather than the least fixpoint of T_P), one obtains a semantics which is compositional with respect to several interesting operations on programs, in particular ∪ and ∩. Along the same line, the semantics of a MuTACLP program expression is taken to be the immediate consequence operator associated with it, i.e., a function from interpretations to interpretations.

The immediate consequence operator of constraint logic programming is generalized to deal with temporal annotations by considering a kind of extended interpretations, which are basically sets of annotated elements of C-base_L. More precisely, we first define a set of (semantical) annotations

Ann = {th [t1, t2], in [t1, t2] | t1, t2 time points ∧ D_C ⊨ t1 ≤ t2}

where D_C is the S_C-structure providing the intended interpretation of the constraints. Then the lattice of interpretations is defined as (℘(C-base_L × Ann), ⊆) where ⊆ is the usual set-theoretic inclusion. Finally the immediate consequence operator T_E^C for a program expression E is compositionally defined in terms of the immediate consequence operator for its sub-expressions.

Definition 2 (Bottom-up semantics). Let E be a program expression, the function T_E^C : ℘(C-base_L × Ann) → ℘(C-base_L × Ann) is defined as follows.

– (E is a plain program P)

$$
\begin{aligned}
T_P^C(I) ={} & \{\, (A,\alpha) \mid A\,\alpha \leftarrow \bar{C}, B_1\,\alpha_1,\dots,B_n\,\alpha_n \in ground_C(P),\ (\alpha = th\,[s_1,s_2] \lor \alpha = in\,[s_1,s_2]), \\
& \quad \{(B_1,\beta_1),\dots,(B_n,\beta_n)\} \subseteq I,\ D_C \models \bar{C},\ \alpha_1 \sqsubseteq \beta_1,\dots,\alpha_n \sqsubseteq \beta_n,\ s_1 \le s_2 \,\} \\
\cup\ & \{\, (A, th\,[s_1,r_2]) \mid A\,th\,[s_1,s_2] \leftarrow \bar{C}, B_1\,\alpha_1,\dots,B_n\,\alpha_n \in ground_C(P), \\
& \quad \{(B_1,\beta_1),\dots,(B_n,\beta_n)\} \subseteq I,\ (A, th\,[r_1,r_2]) \in I, \\
& \quad D_C \models \bar{C},\ \alpha_1 \sqsubseteq \beta_1,\dots,\alpha_n \sqsubseteq \beta_n,\ s_1 < r_1,\ r_1 \le s_2,\ s_2 < r_2 \,\}
\end{aligned}
$$

where C̄ is a shortcut for C1, ..., Ck.

– (E = E1 ∪ E2) T_{E1∪E2}^C(I) = T_{E1}^C(I) ∪ T_{E2}^C(I)

– (E = E1 ∩ E2) T_{E1∩E2}^C(I) = T_{E1}^C(I) ⋒ T_{E2}^C(I)

where I1 ⋒ I2 = {(A, γ) | (A, α) ∈ I1, (A, β) ∈ I2, D_C ⊨ α # β = γ}.

Observe that the definition above properly extends the standard definition of the immediate consequence operator for constraint logic programs (see Section 3.1). In fact, besides the usual Modus Ponens rule, it captures rule (⊔) (as expressed by the second set in the definition of T_P^C). Furthermore, also rule (⊑) is taken into account to prove that an annotated atom holds in an interpretation: to derive the head A α of a clause it is not necessary to find in the interpretation exactly the atoms B1 α1, ..., Bn αn occurring in the body of the clause, but it suffices to find atoms Bi βi which imply Bi αi, i.e., such that each βi is an annotation stronger than αi (D_C ⊨ αi ⊑ βi).
Notice that T_E^C(I) is not downward closed, namely, it is not true that if (A, α) ∈ T_E^C(I) then for all (A, γ) such that D_C ⊨ γ ⊑ α, we have (A, γ) ∈ T_E^C(I). The downward closure will be taken only after the fixpoint of T_E^C is computed. We will see that, nevertheless, no deductive capability is lost and rule (⊑) is completely modeled.

The set of immediate consequences of a union of program expressions is the set-theoretic union of the immediate consequences of each program expression. Instead, an atom A labelled by γ is an immediate consequence of the intersection of two program expressions if A is a consequence of both program expressions, possibly with different annotations α and β, and the label γ is the greatest lower bound of the annotations α and β.

Let us formally define the downward closure of an interpretation.

Definition 3 (Downward closure). The downward closure of an interpretation I ⊆ C-base_L × Ann is defined as: ↓I = {(A, α) | (A, β) ∈ I, D_C ⊨ α ⊑ β}.

The next proposition sheds some more light on the semantics of the intersection operator, by showing that, when we apply the downward closure, the image of an interpretation through the operator T_{E1∩E2}^C is the set-theoretic intersection of the images of the interpretation through the operators associated with E1 and E2, respectively. This property supports the intuition that the program expressions have to agree at each computation step (see [9]).

Proposition 1. Let I1 and I2 be two interpretations. Then ↓(I1 ⋒ I2) = (↓I1) ∩ (↓I2).

The next theorem shows the continuity of the T_E^C operator over the lattice of interpretations.

Theorem 1 (Continuity). For any program expression E, the function T_E^C is continuous (over (℘(C-base_L × Ann), ⊆)).

The fixpoint semantics for a program expression is now defined as the downward closure of the least fixpoint of T_E^C which, by continuity of T_E^C, is determined as ⋃_{i∈N} (T_E^C)^i.

Definition 4 (Fixpoint semantics). Let E be a program expression. The fixpoint semantics of E is defined as

$$F^C(E) = \downarrow (T_E^C)^\omega .$$

We remark that the downward closure is applied only once, after having computed the fixpoint of T_E^C. However, it is easy to see that the closure is a continuous operator on the lattice of interpretations ℘(C-base_L × Ann). Thus

$$\downarrow \Big( \bigcup_{i \in \mathbb{N}} (T_E^C)^i \Big) = \bigcup_{i \in \mathbb{N}} \downarrow (T_E^C)^i$$

showing that by taking the closure at each step we would have obtained the same set of consequences. Hence, as mentioned before, rule (⊑) is completely captured.
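The bottom-up construction can be sketched in Python for a very restricted setting. The assumptions are: ground clauses without constraints, integer time points (here, day numbers within 1995 as a hypothetical encoding of the dates of Example 5), and program expressions built with union only. The names `leq`, `t_step`, `least_fixpoint` and `entails` are illustrative and do not come from the paper.

```python
from typing import List, Set, Tuple

Ann = Tuple[str, int, int]            # (kind, start, end), kind is "th" or "in"
AAtom = Tuple[str, Ann]               # annotated atom (A, annotation)
Clause = Tuple[AAtom, List[AAtom]]    # ground clause: head and body, no constraints


def leq(a: Ann, b: Ann) -> bool:
    """Order on annotations (a is below b)."""
    (ka, a1, a2), (kb, b1, b2) = a, b
    if ka == "th" and kb == "th":
        return b1 <= a1 <= a2 <= b2
    if ka == "in" and kb == "in":
        return a1 <= b1 <= b2 <= a2
    if ka == "in" and kb == "th":
        return a1 <= a2 and b1 <= b2 and a1 <= b2 and b1 <= a2
    return a1 == a2 == b1 == b2       # th below in: same single time point only


def body_holds(body: List[AAtom], interp: Set[AAtom]) -> bool:
    """Each body atom must be implied by some stronger annotated atom in interp."""
    return all(any(b == atom and leq(alpha, beta) for (b, beta) in interp)
               for (atom, alpha) in body)


def t_step(program: List[Clause], interp: Set[AAtom]) -> Set[AAtom]:
    """One application of the immediate consequence operator of a plain program."""
    out: Set[AAtom] = set()
    for (head, (kind, s1, s2)), body in program:
        if s1 > s2 or not body_holds(body, interp):
            continue
        out.add((head, (kind, s1, s2)))                  # first set: Modus Ponens
        if kind == "th":                                 # second set: joining th periods
            for (a, (k, r1, r2)) in interp:
                if a == head and k == "th" and s1 < r1 <= s2 < r2:
                    out.add((head, ("th", s1, r2)))
    return out


def least_fixpoint(programs: List[List[Clause]]) -> Set[AAtom]:
    """Iterate the operator of the union of the programs from the empty set."""
    interp: Set[AAtom] = set()
    while True:
        nxt: Set[AAtom] = set()
        for p in programs:                               # union: union of the images
            nxt |= t_step(p, interp)
        if nxt == interp:
            return interp
        interp = nxt


def entails(interp: Set[AAtom], atom: str, ann: Ann) -> bool:
    """Membership in the downward closure of interp."""
    return any(a == atom and leq(ann, beta) for (a, beta) in interp)


# Example 5 with dates encoded as day numbers of 1995 (assumed encoding).
db1 = [(("consultant(jim)", ("th", 1, 120)), [])]    # Jan 1 .. Apr 30
db2 = [(("consultant(jim)", ("th", 91, 258)), [])]   # Apr 1 .. Sep 15
fix = least_fixpoint([db1, db2])
print(("consultant(jim)", ("th", 1, 258)) in fix)            # True
print(entails(fix, "consultant(jim)", ("th", 50, 200)))      # True
print(entails(fix, "consultant(jim)", ("in", 300, 400)))     # False
```

The loop terminates here because the program is finite and ground, so only finitely many annotated atoms can be produced; the downward closure is consulted only at query time, in line with the remark above.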

5.3

Soundness and Completeness

In the spirit of [7,34] we define the semantics of the meta-interpreter by relating the semantics of an object program to the semantics of the corresponding vanilla meta-program (i.e., including the meta-level representation of the object program). When stating the correspondence between the object program and the meta-program we consider only formulae of interest, i.e., elements of C-base_L annotated with labels from Ann, which are the semantic counterpart of object level annotated atoms. We show that given a MuTACLP program expression E (object program) for any A ∈ C-base_L and any α ∈ Ann, demo(E, A α) is provable at the meta-level if and only if (A, α) is provable in the object program.

Theorem 2 (Soundness and completeness). Let E be a program expression and let V be the meta-program containing the meta-level representation of the object level programs occurring in E and clauses (1)-(10). For any A ∈ C-base_L and α ∈ Ann, the following statement holds:

$$demo(E, A\,\alpha) \in (T_V^M)^\omega \iff (A,\alpha) \in F^C(E),$$

where T_V^M is the standard immediate consequence operator for CLP programs. Note that V is a CLP(M) program where M is a multi-sorted constraint domain, including the constraint domain Term, presented in Example 2, and the constraint domain C. It is worth observing that if C is a C-ground instance of a constraint then D_M ⊨ C ⇔ D_C ⊨ C.

6

Some Examples

This section presents examples which illustrate the use of annotations in the representation of temporal information and the structuring possibilities offered by the operators. First we describe applications of our framework in the field of legal reasoning. Then we show how the intersection operator can be employed to define a kind of valid-timeslice operator.

6.1

Applications to Legal Reasoning

Laws and regulations are naturally represented in separate theories and they are usually combined in ways that are necessarily more complex than a plain merging. Time is another crucial ingredient in the deﬁnition of laws and regulations, since, quite often, they refer to instants of time and, furthermore, their validity is restricted to a ﬁxed period of time. This is especially true for laws and regulations which concern taxation and government budget related regulations in general.


British Nationality Act. We start with a classical example in the field of legal reasoning [41], i.e. a small piece of the British Nationality Act. Simply partitioning the knowledge into separate programs and using the basic union operation, one can exploit the temporal information in an orderly way. Assume that Jan 1 1955 is the commencement date of the law. Then the statement

x obtains the British Nationality at time t if x is born in U.K. at time t and t is after commencement and y is parent of x and y is a British citizen at time t or y is a British resident at time t

is modeled by the following program.

BNA:
get-citizenship(X) at T ← T ≥ Jan 1 1955, born(X,uk) at T, parent(Y,X) at T, british-citizen(Y) at T
get-citizenship(X) at T ← T ≥ Jan 1 1955, born(X,uk) at T, parent(Y,X) at T, british-resident(Y) at T

Now, the data for a single person, say John, can be encoded in a separate program.

John:
born(john,uk) at Aug 10 1969.
parent(bob,john) th [T, ∞] ← born(john, _) at T
british-citizen(bob) th [Sept 6 1940, ∞].

Then, by means of the union operator, one can inquire about the citizenship of John, as follows

demo(BNA ∪ John, get-citizenship(john) at T)

obtaining as result T = Aug 10 1969.

Movie Tickets. Since 1997, an Italian regulation for encouraging people to go to the cinema states that on Wednesdays the ticket price is 8000 liras, whereas in the rest of the week it is 12000 liras. The situation can be modeled by the following theory BoxOff.

BoxOff:
ticket(8000, X) at T ← T ≥ Jan 1 1997, wed at T
ticket(12000, X) at T ← T ≥ Jan 1 1997, non_wed at T

The constraint T ≥ Jan 1 1997 represents the validity of the clause, which holds from January 1, 1997 onwards. The predicates wed and non_wed are defined in a separate theory Days, where w is assumed to be the last Wednesday of 1996.


Days:


wed at w.
wed at T + 7 ← wed at T
non_wed th [w + 1, w + 6].
non_wed at T + 7 ← non_wed at T
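The recursive reading of the theory Days can be sketched in Python. This is only an illustration of the periodic definition, not the MuTACLP meta-interpreter: the function names `wed` and `non_wed` mirror the predicates, and the concrete value of w (December 25, 1996) is an assumption derived from "the last Wednesday of 1996".

```python
from datetime import date, timedelta

# Assumption: w, the last Wednesday of 1996, is December 25, 1996.
W = date(1996, 12, 25)

def wed(t: date) -> bool:
    """Mirrors: wed at w.   wed at T+7 <- wed at T."""
    if t < W:
        return False                      # nothing is derivable before the base fact
    if t == W:
        return True                       # unit clause: wed at w
    return wed(t - timedelta(days=7))     # recursive clause, stepping back one week

def non_wed(t: date) -> bool:
    """Mirrors: non_wed th [w+1, w+6].   non_wed at T+7 <- non_wed at T."""
    if t <= W:
        return False
    if t <= W + timedelta(days=6):
        return True                       # unit clause: every point of [w+1, w+6]
    return non_wed(t - timedelta(days=7))
```

The recursion steps back one week per call, so this sketch is only meant for dates within a few decades of w; an iterative or modular-arithmetic version would be preferable for arbitrary dates.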

Notice that, by means of recursive predicates, one can easily express periodic temporal information. In the example, the definition of the predicate wed expresses the fact that a day is a Wednesday if it is a date which is known to be a Wednesday or it is a day coming seven days after a day proved to be a Wednesday. The predicate non_wed is defined in an analogous way; in this case the unit clause states that all six consecutive days following a Wednesday are not Wednesdays.

Now, let us suppose that the owner of a cinema wants to increase the discount for young people on Wednesdays, establishing that the ticket price for people who are eighteen years old or younger is 6000 liras. By resorting to the intersection operation we can build a program expression that represents exactly the desired policy. We define three new programs Cons, Disc and Age.

Cons:

ticket(8000, X) at T ← Y > 18, age(X, Y) at T
ticket(12000, X) at T.

The above theory specifies how the predicate definitions in BoxOff must change according to the new policy. In fact, to get an 8000 liras ticket a further constraint must now be satisfied, namely the customer has to be older than eighteen. On the other hand, no further requirement is imposed to buy a 12000 liras ticket.

Disc:

ticket(6000, X) at T ← Y ≤ 18, wed at T, age(X, Y) at T

The only clause in Disc states that a 6000 liras ticket can be bought on Wednesdays by a person who is eighteen years old or younger. The programs Cons and Disc are parametric with respect to the predicate age, which is defined in a separate theory Age.

Age:

age(X, Y) at T ← born(X) at T1, year-diff(T1, T, Y)

At this point we can compose the above programs to obtain the program expression representing the new policy, namely (BoxOff ∩ Cons) ∪ Disc ∪ Days ∪ Age. Finally, in order to know how much a ticket costs for a given person, the above program expression must be joined with a separate program containing the date of birth of the person. For instance, such a program could be

Tom:

born(tom) at May 7 1982.

Then the answer to the query

demo((BoxOff ∩ Cons) ∪ Disc ∪ Days ∪ Age ∪ Tom, ticket(X, tom) at May 20 1998)

is X = 6000, since May 20 1998 is a Wednesday and Tom is sixteen years old.
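The effect of the composed ticket policy can be approximated by a small Python sketch. It is not the meta-interpreter: intersection is rendered by conjoining the bodies of clauses with the same head, and union by simply adding clauses. The helper `is_wed` stands in for the theory Days, and `age` assumes that year-diff counts completed years; both are assumptions made for illustration.

```python
from datetime import date

START = date(1997, 1, 1)       # validity constraint T >= Jan 1 1997 in BoxOff
TOM_BORN = date(1982, 5, 7)    # the theory Tom

def is_wed(t: date) -> bool:
    """Stand-in for the theory Days (Monday is 0, so Wednesday is 2)."""
    return t.weekday() == 2

def age(born: date, t: date) -> int:
    """Stand-in for the theory Age; assumes year-diff means completed years."""
    return t.year - born.year - ((t.month, t.day) < (born.month, born.day))

def ticket_prices(born: date, t: date) -> set:
    """Prices derivable for one person at time t under the new policy."""
    prices = set()
    y = age(born, t)
    # BoxOff ∩ Cons: the body of each BoxOff clause is conjoined with
    # the body of the Cons clause for the same head.
    if t >= START and is_wed(t) and y > 18:
        prices.add(8000)
    if t >= START and not is_wed(t):
        prices.add(12000)
    # ∪ Disc: the union adds the discount clause as it is.
    if is_wed(t) and y <= 18:
        prices.add(6000)
    return prices
```

For Tom on May 20, 1998 the only derivable price is 6000, matching the answer to the query above; on the following day only the 12000 liras clause applies.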


Invim. Invim was an Italian law dealing with paying taxes on real estate transactions. The original regulation, in force since January 1, 1950, requires time calculations, since the amount of taxes depends on the period of ownership of the real estate property. Furthermore, although the law was abolished in 1992, it still applies, but only for the period of ownership up to the end of 1992. To see how our framework allows us to model the described situation, let us first consider the program Invim below, which contains a sketch of the original body of regulations.

Invim:

due(Amount,X,Prop) th [T2, ∞] ← T2 ≥ Jan 1 1950, buys(X,Prop) at T1, sells(X,Prop) at T2, compute(Amount,X,Prop,T1,T2)
compute(Amount,X,Prop,T1,T2) ← . . .

To update the regulations in order to consider the decisions taken in 1992, as in the previous example we introduce two new theories. The first one includes a set of constraints on the applicability of the original regulations, while the second one is designed to embody regulations capable of handling the new situation.

Constraints:

due(Amount,X,Prop) th [Jan 1 1993, ∞] ← sells(X,Prop) in [Jan 1 1950, Dec 31 1992]
compute(Amount,X,Prop,T1,T2).

The first rule specifies that the relation due is computed, provided that the selling date is not later than December 31, 1992. The second rule specifies that the rules for compute, however many they are and however complex, carry over unconstrained to the new version of the regulation. It is important to notice that the design of the constraining theory can be done without taking care of the details (which may be quite complicated) embodied in the original law.

The theory which handles the case of a property bought before December 31, 1992 and sold after the first of January, 1993, is given below.

Additions:

due(Amount,X,Prop) th [T2, ∞] ← T2 ≥ Jan 1 1993, buys(X,Prop) at T1, sells(X,Prop) at T2, compute(Amount,X,Prop,T1,Dec 31 1992)

Now consider a separate theory representing the transactions regarding Mary, who bought an apartment on March 8, 1965 and sold it on July 2, 1997.

Trans1:

buys(mary,apt8) at Mar 8 1965.
sells(mary,apt8) at Jul 2 1997.


The query

demo(Invim ∪ Trans1, due(Amount,mary,apt8) th [_, _])

yields the amount, say 32.1, that Mary has to pay when selling the apartment according to the old regulations. On the other hand, the query

demo(((Invim ∩ Constraints) ∪ Additions) ∪ Trans1, due(Amount,mary,apt8) th [_, _])

yields the amount, say 27.8, computed according to the new regulations. It is smaller than the previous one because taxes are computed only for the period from March 8, 1965 to December 31, 1992, by using the clause in the program Additions. The clause in Invim ∩ Constraints cannot be used since the condition regarding the selling date (sells(X,Prop) in [Jan 1 1950, Dec 31 1992]) does not hold.

In the transaction represented by the program below, Paul buys the flat on January 1, 1995.

Trans2:

buys(paul,apt9) at Jan 1 1995.
sells(paul,apt9) at Sep 12 1998.

demo(Invim ∪ Trans2, due(Amount,paul,apt9) th [_, _])
Amount = 1.7

demo(((Invim ∩ Constraints) ∪ Additions) ∪ Trans2, due(Amount,paul,apt9) th [_, _])
no

If we query the theory Invim ∪ Trans2 we find that Paul must pay a certain amount of tax, say 1.7, while, according to the updated regulation, he must not pay the Invim tax because he bought and sold the flat after December 31, 1992. Indeed, the answer to the query computed with respect to the theory ((Invim ∩ Constraints) ∪ Additions) ∪ Trans2 is no, i.e., no tax is due.

Summing up, the union operation can be used to obtain a larger set of clauses. We can join a program with another one to provide it with definitions of its undefined predicates (e.g., Age provides a definition for the predicate age not defined in Disc and Cons) or alternatively to add new clauses for an existing predicate (e.g., Disc contains a new definition for the predicate ticket already defined in BoxOff). On the other hand, the intersection operator provides a natural way of imposing constraints on existing programs (e.g., the program Cons constrains the definition of ticket given in BoxOff). Such constraints affect not only the computation of a particular property, like the intersection operation defined by Brogi et al. [9], but also the temporal information in which the property holds.
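The difference between the old and the updated Invim regulation can be illustrated with the following Python sketch. Since the body of compute is elided in the paper, the function `compute` below is a purely hypothetical stand-in (one unit of tax per day of ownership) that is assumed to fail on an empty ownership period; the th annotations of the heads are ignored. Only the choice of which clause applies reflects the programs above.

```python
from datetime import date

LAW_START = date(1950, 1, 1)
LAST_DAY = date(1992, 12, 31)
NEW_START = date(1993, 1, 1)

def compute(bought: date, end: date):
    """Hypothetical compute: days of ownership; fails (None) on an empty period."""
    if end < bought:
        return None
    return (end - bought).days

def due_old(bought: date, sold: date):
    """Invim: tax is due for any sale from Jan 1 1950 onwards."""
    if sold >= LAW_START:
        return compute(bought, sold)
    return None

def due_new(bought: date, sold: date):
    """(Invim ∩ Constraints) ∪ Additions."""
    # Invim ∩ Constraints: the original clause, restricted to sales
    # in [Jan 1 1950, Dec 31 1992].
    if LAW_START <= sold <= LAST_DAY:
        return compute(bought, sold)
    # Additions: sales from Jan 1 1993 on are taxed only up to Dec 31 1992.
    if sold >= NEW_START:
        return compute(bought, LAST_DAY)
    return None

mary = (date(1965, 3, 8), date(1997, 7, 2))    # Trans1
paul = (date(1995, 1, 1), date(1998, 9, 12))   # Trans2
```

With these assumptions, `due_new(*mary)` is smaller than `due_old(*mary)`, while `due_new(*paul)` fails although `due_old(*paul)` succeeds, which mirrors the answers to the four queries above.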


The use of TACLP programs allows us to represent and reason on temporal information in a natural way. Since time is explicit, at the object level we can directly access the temporal information associated with atoms. Periodic information can be easily expressed by recursive predicates (see the predicates wed and non_wed in the theory Days). Indefinite temporal information can be represented by using in annotations. E.g., in the program Constraints the in annotation is used to specify that a certain date is within a time period (sells(X,Prop) in [Jan 1 1950, Dec 31 1992]). This is a case in which it is not important to know the precise date: it is sufficient to have information which delimits the time period in which it can occur.

6.2 Valid-Timeslice Operator

By exploiting the features of the intersection operator we can define an operator which eases the selection of information holding in a certain interval.

Definition 5. Let P be a plain program. For a ground interval [t1, t2] we define

$P \Downarrow [t_1, t_2] = P \cap 1_P^{[t_1, t_2]}$

where $1_P^{[t_1, t_2]}$ is a program which contains a fact "p(X1, . . . , Xn) th [t1, t2]." for all p defined in P with arity n.

Intuitively the operator ⇓ selects only the clauses belonging to P that hold in [t1, t2] or in a subinterval of [t1, t2], and it restricts their validity time to such an interval. Therefore ⇓ allows us to create temporal views of programs, for instance P ⇓ [t, t] is the program P at time point t. Hence it acts as a valid-timeslice operator in the field of databases (see the glossary in [13]).

Consider again the Invim example of the previous section. The whole history of the regulation concerning Invim can be represented by using the following program expression

(Invim ⇓ [0, Dec 31 1992]) ∪ ((Invim ∩ Constraints) ∪ Additions)

By applying the operation ⇓, the validity of the clauses belonging to Invim is restricted to the period from January 1, 1950 up to December 31, 1992, thus modeling the law before January 1, 1993. On the other hand, the program expression (Invim ∩ Constraints) ∪ Additions expresses the regulation in force since January 1, 1993, as we previously explained.

This example suggests how the operation ⇓ can be useful to model updates. Suppose that we want to represent that Frank is a research assistant in mathematics, and that, later, he is promoted becoming an assistant professor. In our formalism we can define a program Frank that records the information associated with Frank as a research assistant.

Frank:

research_assistant(maths) th [Mar 8 1993, ∞].


In March 1996 Frank becomes an assistant professor. In order to modify the information contained in the program Frank, we build the following program expression:

(Frank ⇓ [0, Feb 29 1996]) ∪ {assistant_prof(maths) th [Mar 1 1996, ∞].}

where the second expression is an unnamed theory. Unnamed theories, which have not been discussed so far, can be represented by the following meta-level clause:

clause({X α ← Y}, X α, Y) ← T1 ≤ T2

where α = th [T1, T2] or α = in [T1, T2].

The described update resembles the addition and deletion of a ground atom. For instance in LDL++ [47] an analogous change can be implemented by solving the goal −research_assistant(maths), +assistant_prof(maths). The advantage of our approach is that we do not directly change the clauses of a program, e.g. Frank in the example, but we compose the old theory with a new one that represents the current situation. Therefore the state of the database before March 1, 1996 is preserved, thus maintaining the whole history. For instance, the first query below asks the updated database about a date before Frank's promotion, whereas the second one shows how the information in the database has been modified.

demo((Frank ⇓ [0, Feb 29 1996]) ∪ {assistant_prof(maths) th [Mar 1 1996, ∞].}, research_assistant(X) at Feb 23 1994)
X = maths

demo((Frank ⇓ [0, Feb 29 1996]) ∪ {assistant_prof(maths) th [Mar 1 1996, ∞].}, research_assistant(X) at Mar 12 1996)
no.
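For theories made only of th facts, the valid-timeslice operator and the update of the Frank example can be sketched in Python as follows. This is a simplified illustration, not the general definition by intersection: a theory is assumed to be a list of (atom, interval) pairs, `date.min` and `date.max` stand in for 0 and ∞, and the names `timeslice`, `union` and `holds_at` are hypothetical.

```python
from datetime import date

ZERO = date.min   # stands in for the time point 0
INF = date.max    # stands in for ∞

# A theory is a list of facts (atom, (start, end)), read as "atom th [start, end]".
frank = [("research_assistant(maths)", (date(1993, 3, 8), INF))]

def timeslice(theory, t1, t2):
    """P ⇓ [t1, t2]: restrict every fact to the part of its interval inside [t1, t2]."""
    result = []
    for atom, (start, end) in theory:
        lo, hi = max(start, t1), min(end, t2)
        if lo <= hi:                       # facts that never hold in [t1, t2] are dropped
            result.append((atom, (lo, hi)))
    return result

def union(p, q):
    """P ∪ Q for lists of facts."""
    return p + q

def holds_at(theory, atom, t):
    """Counterpart of demo(theory, atom at t) for th facts."""
    return any(a == atom and start <= t <= end for a, (start, end) in theory)

updated = union(
    timeslice(frank, ZERO, date(1996, 2, 29)),
    [("assistant_prof(maths)", (date(1996, 3, 1), INF))],
)
```

The original list `frank` is left untouched, which reflects the point made above: the update is obtained by composition, so the earlier state remains available.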

7 Related Work

Event Calculus by Kowalski and Sergot [28] was the first attempt to cast into logic programming the rules for reasoning about time. In more detail, Event Calculus is a treatment of time, based on the notion of event, in first-order classical logic augmented with negation as failure. It is closely related to Allen's interval temporal logic [3]. For example, let E1 be an event in which Bob gives the Book to John and let E2 be an event in which John gives Mary the Book. Assume that E2 occurs after E1. Given these event descriptions, we can deduce that there is a period started by the event E1 in which John possesses the book and that there is a period terminated by E1 in which Bob possesses the book. This situation is represented pictorially as follows:


   Bob has the Book           John has the Book
<−−−−−−−−−−−−−−−−−−−−− ◦ −−−−−−−−−−−−−−−−−−−−−>
                       E1

   John has the Book          Mary has the Book
<−−−−−−−−−−−−−−−−−−−−− ◦ −−−−−−−−−−−−−−−−−−−−−>
                       E2

A series of axioms for deducing the existence of time periods and the Start and End of each time period are given by using the Holds predicate. Holds(before(e r)) if Terminates(e r) means that the relationship r holds in the time period before(e r) that denotes a time period terminated by the event e. Holds(after(e r)) is defined in an analogous way. Event Calculus provides a natural treatment of valid time in databases, and it was extended in [43,44] to include the concept of transaction time. Therefore Event Calculus exploits the deductive power of logic and the computational power of logic programming as in our approach, but the modeling of time is different: events are the granularity of time chosen in Event Calculus, whereas we use time points and time periods. Furthermore no provision for multiple theories is given in Event Calculus.

Kifer and Subrahmanian in [26] introduce generalized annotated logic programs (GAPs), and show how Templog [1] and an interval based temporal logic can be translated into GAPs. The annotations used there correspond to the th annotations of MuTACLP. To implement the annotated logic language, the paper proposes to use "reductants", additional clauses which are derived from existing clauses to express all possible least upper bounds. The problem is that a finite program may generate infinitely many such reductants. Then a new kind of resolution for annotated logic programs, called "ca-resolution", is proposed in [30]. The idea is to compute dynamically and incrementally the least upper bounds by collecting partial answers. Operationally this is similar to the meta-interpreter presented in Section 5.1 which relies on recursion to collect the partial answers. However, in [30] the intermediate stages of the computation may not be sound with respect to the standard CLP semantics. The paper [26] also presents two fixpoint semantics for GAPs, defined in terms of two different operators.
The first operator, called T_P, is based on interpretations which associate with each element of the Herbrand Base of a program P a set of annotations which is an ideal, i.e., a set downward closed and closed under finite least upper bounds. For each atom A, the computed ideal is the least one containing the annotations α of annotated atoms A α which are heads of (instances of) clauses whose body holds in the interpretation. The other operator, R_P, is based on interpretations which associate with each atom of the Herbrand Base a single annotation, obtained as the least upper bound of the set of annotations computed as in the previous case. Our fixpoint operator for MuTACLP works similarly to the T_P operator: at each step we take the closure with respect to (representable) finite least upper bounds, and, although we perform the downward closure only at the end of the computation, this does not affect the set of derivable consequences. The main difference resides in the language: MuTACLP is an extension of CLP, which focuses on temporal aspects and provides mechanisms for combining programs, taking from GAP the basic ideas for handling annotations, whereas GAP is a general language with negation and arbitrary annotations but without constraints and multiple theories.

Our temporal annotations correspond to some of the predicates proposed by Galton in [19], which is a critical examination of Allen's classical work on a theory of action and time [3]. Galton accounts for both time points and time periods in dense linear time. Assuming that the intervals I are not singletons, Galton's predicate holds-in(A,I) can be mapped into MuTACLP's A in I, holds-on(A,I) into A th I, and holds-at(A,t) into A at t, where A is an atomic formula. From the described correspondence it becomes clear that MuTACLP can be seen as reified FOL where annotated formulae, for example born(john) at t, correspond to binary meta-relations between predicates and temporal information, for example at(born(john), t). But MuTACLP can also be regarded as a modal logic, where the annotations are seen as parameterized modal operators, e.g., born(john) (at t).

Our temporal annotations also correspond to some temporal characteristics in the ChronoBase data model [42]. Such a model allows for the representation of a wide variety of temporal phenomena in a temporal database which cannot be expressed by using only th and in annotations. However, this model is an extension of the relational data model and, unlike our model, it is not rule-based. An interesting line of research could be to investigate the possibility of enriching the set of annotations in order to capture some other temporal characteristics, like a property that holds in an interval but not in its subintervals, while still maintaining a simple and clear semantics.
In [10], a powerful temporal logic named MTL (tense logic extended by parameterized temporal operators) is translated into first order constraint logic. The resulting language subsumes Templog. The parameterized temporal operators of MTL correspond to the temporal annotations of TACLP. The constraint theory of MTL is rather complex as it involves quantified variables and implication, whose treatment goes beyond standard CLP implementations. On the other hand, MuTACLP inherits an efficient standard constraint-based implementation of annotations from the TACLP framework.

As far as the multi-theory setting is concerned, i.e. the possibility offered by MuTACLP to structure and compose (temporal) knowledge, there are few logic-based approaches providing the user with these tools. One is Temporal Datalog [35], an extension of Datalog based on a simple temporal logic with two temporal operators, namely first and next. Temporal Datalog introduces a notion of module, which however does not seem to be used as a knowledge representation tool but rather to define new non-standard algebraic operators. In fact, to query a Temporal Datalog program, Orgun proposes a "point-wise extension" of the relational algebra upon the set of natural numbers, called TRA-algebra. Then he provides a mechanism for specifying generic modules, called temporal modules, which are parametric Temporal Datalog programs, with a number of input predicates (parameters) and an output predicate. A module can then be regarded as an operator which, given a temporal relation, returns a temporal relation. Thus temporal modules are indeed used as operators of TRA, through which one has access to the use of recursion, arithmetic predicates and temporal operators.

A multi-theory framework in which temporal information can be handled, based on annotated logics, is proposed by Subrahmanian in [45]. This is a very general framework aimed at amalgamating multiple knowledge bases which can also contain temporal information. The knowledge bases are GAPs [26] and temporal information is modeled by using an appropriate lattice of annotations. In order to integrate these programs, a so-called Mediatory Database is given, which is a GAP having clauses of the form

A0 : [m, µ] ← A1 : [D1, µ1], . . . , An : [Dn, µn]

where each Di is a set of database names. Intuitively, a ground instance of a clause in the mediator can be interpreted as follows: "If the databases in set Di, 1 ≤ i ≤ n, (jointly) imply that the truth value of Ai is at least µi, then the mediator will conclude that the truth value of A0 is at least µ". Essentially the fundamental mechanism provided to combine knowledge bases is a kind of message passing. Roughly speaking, the resolution of an atom Ai : [Di, µi] is delegated to different databases, specified by the set Di of database names, and the annotation µi is obtained by considering the least upper bounds of the annotations of each Ai computed in the distinct databases. Our approach is quite different because the meta-level composition operators allow us to access not only the relation defined by a predicate but also the definition of the predicate. For instance P ∪ Q is equivalent to a program whose clauses are the union of the clauses of P and Q, and thus the information which can be derived from P ∪ Q can be greater than the union of what we can derive from P and Q separately.
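The last remark, that P ∪ Q may yield more than P and Q taken separately, can be seen on a minimal propositional example. The sketch below is only an illustration with plain definite clauses (no annotations or constraints); the theories `P` and `Q` and the function `consequences` are hypothetical names introduced here.

```python
def consequences(clauses):
    """Least model of propositional definite clauses given as (head, [body atoms])."""
    derived = set()
    changed = True
    while changed:                         # naive bottom-up iteration to the fixpoint
        changed = False
        for head, body in clauses:
            if head not in derived and all(b in derived for b in body):
                derived.add(head)
                changed = True
    return derived

P = [("a", [])]        # hypothetical theory P: the fact a
Q = [("b", ["a"])]     # hypothetical theory Q: the clause b <- a

separately = consequences(P) | consequences(Q)
together = consequences(P + Q)             # P ∪ Q: union of the clauses
```

Here `separately` contains only a, whereas `together` also contains b, because the clause of Q can use the fact defined in P once the clauses are put together.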

8 Conclusion

In this paper we have introduced MuTACLP, a language which joins the advantages of TACLP in handling temporal information with the ability to structure and compose programs. The proposed framework allows one to deal with time points and time periods and to model definite, indefinite and periodic temporal information, which can be distributed among different theories. Representing knowledge in separate programs naturally leads to the use of knowledge from different sources; information can be stored at different sites and combined in a modular way by employing the meta-level operators. This modular approach also favors the reuse of the knowledge encoded in the programs for future applications.

The language MuTACLP has been given a top-down semantics by means of a meta-interpreter and a bottom-up semantics based on an immediate consequence operator. Concerning the bottom-up semantics, it would be interesting to investigate different definitions of the immediate consequence operator, for instance by considering an operator similar to the function R_P for generalized annotated programs [26]. The domain of interpretations considered in this paper is, in a certain sense, unstructured: interpretations are general sets of annotated atoms and the order, which is simply subset inclusion, does not take into account the order on annotations. Alternative solutions, based on different notions of interpretation, may consider more abstract domains. These domains can be obtained by endowing C-base_L × Ann with the product order (induced by the identity relation on C-base_L and the order on Ann) and then by taking as elements of the domain (i.e. as interpretations) only those subsets of annotated atoms that satisfy some closure properties with respect to such an order. For instance, one can require "downward-closedness", which amounts to including subsumption in the immediate consequence operator. Another possible property is "limit-closedness", namely the presence of the least upper bound of all directed sets, which, from a computational point of view, amounts to considering computations which possibly require more than ω steps.

In [15] the language TACLP is presented as an instance of annotated constraint logic (ACL) for reasoning about time. Similarly, we could have first introduced a Multi-theory Annotated Constraint Logic (MuACL in brief), viewing MuTACLP as an instance of MuACL. To define MuACL the constructions described in this paper should be generalized by using, as basic language for plain programs, the more general paradigm of ACL where atoms can be labelled by a general class of annotations. In defining MuACL we should require that the class of annotations forms a lattice, in order to have both upper bounds and lower bounds (the latter are necessary for the definition of the intersection operator).
Indeed, it is not difficult to see that, under the assumption that only atoms can be annotated and clauses are free of negation, both the meta-interpreter and the immediate consequence operator smoothly generalize to deal with general annotations.

Another interesting topic for future investigation is the treatment of negation. In the line of Frühwirth, a possible solution consists of embodying the "negation by default" of logic programming into MuTACLP by exploiting the logical equalities proved in [15]:

((¬A) th I) ⇔ ¬(A in I)
((¬A) in I) ⇔ ¬(A th I)
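The two equalities can be checked exhaustively on a small discrete time line. The following Python sketch is only a sanity check under stated assumptions: time is taken to be the finite set of points 0..5, ¬A is read classically as the complement of the set of points where A holds (not as negation by default), and the names `th`, `in_` and `equalities_hold` are introduced here for illustration.

```python
from itertools import combinations

POINTS = list(range(6))   # assumption: a small discrete time line 0..5

def th(holds, i, j):
    """A th [i, j]: A holds at every point of the interval."""
    return all(t in holds for t in range(i, j + 1))

def in_(holds, i, j):
    """A in [i, j]: A holds at some point of the interval."""
    return any(t in holds for t in range(i, j + 1))

def equalities_hold():
    """Check both equalities for every set of points and every interval [i, j]."""
    for size in range(len(POINTS) + 1):
        for subset in combinations(POINTS, size):
            a = set(subset)                # points where A holds
            not_a = set(POINTS) - a        # points where ¬A holds (classical reading)
            for i in POINTS:
                for j in range(i, len(POINTS)):
                    if th(not_a, i, j) != (not in_(a, i, j)):
                        return False
                    if in_(not_a, i, j) != (not th(a, i, j)):
                        return False
    return True
```

The check amounts to the familiar duality between "everywhere in I" and "somewhere in I"; it says nothing about the interaction with negation by default or with the composition operators.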

Consequently, the meta-interpreter is extended with two clauses which use such equalities:

demo(E, (¬A) th I) ← ¬demo(E, A in I)
demo(E, (¬A) in I) ← ¬demo(E, A th I)

However, the interaction between negation by default and program composition operations is still to be fully understood. Some results on the semantic interactions between operations and negation by default are presented in [8], where, nevertheless, the handling of time is not considered.

Furthermore, it is worth noticing that in this paper we have implicitly assumed that the same unit for time is used in different programs, i.e. we have not dealt with different time granularities. The ability to cope with different granularities (e.g. seconds, days, etc.) is particularly relevant to support interoperability among systems. A simple way to handle this feature is by introducing in MuTACLP a notion of time unit and a set of conversion predicates which transform time points into the chosen time unit (see, e.g., [5]).

We finally observe that in MuTACLP spatial data can also be naturally modelled. In fact, in the style of the constraint database approaches (see, e.g., [25,37,20]) spatial data can be represented by using constraints. The facilities to handle time offered by MuTACLP allow one to easily establish spatio-temporal correlations, for instance time-varying areas, or, more generally, moving objects, supporting either discrete or continuous changes (see [38,31,40]).

Acknowledgments: This work has been partially supported by Esprit Working Group 28115 - DeduGIS.

References

1. M. Abadi and Z. Manna. Temporal logic programming. Journal of Symbolic Computation, 8:277–295, 1989.
2. J.F. Allen. Maintaining knowledge about temporal intervals. Communications of the ACM, 26(11):832–843, 1983.
3. J.F. Allen. Towards a general theory of action and time. Artificial Intelligence, 23:123–154, 1984.
4. P. Baldan, P. Mancarella, A. Raffaetà, and F. Turini. MuTACLP: A language for temporal reasoning with multiple theories. Technical report, Dipartimento di Informatica, Università di Pisa, 2001.
5. C. Bettini, X. S. Wang, and S. Jajodia. An architecture for supporting interoperability among temporal databases. In [13], pages 36–55.
6. K.A. Bowen and R.A. Kowalski. Amalgamating language and metalanguage in logic programming. In K. L. Clark and S.-A. Tarnlund, editors, Logic programming, volume 16 of APIC studies in data processing, pages 153–172. Academic Press, 1982.
7. A. Brogi. Program Construction in Computational Logic. PhD thesis, Dipartimento di Informatica, Università di Pisa, 1993.
8. A. Brogi, S. Contiero, and F. Turini. Programming by combining general logic programs. Journal of Logic and Computation, 9(1):7–24, 1999.
9. A. Brogi, P. Mancarella, D. Pedreschi, and F. Turini. Modular logic programming. ACM Transactions on Programming Languages and Systems, 16(4):1361–1398, 1994.
10. C. Brzoska. Temporal Logic Programming with Metric and Past Operators. In [14], pages 21–39.
11. J. Chomicki. Temporal Query Languages: A Survey. In Temporal Logic: Proceedings of the First International Conference, ICTL'94, volume 827 of Lecture Notes in Artificial Intelligence, pages 506–534. Springer, 1994.
12. J. Chomicki and T. Imielinski. Temporal Deductive Databases and Infinite Objects. In Proceedings of ACM SIGACT/SIGMOD Symposium on Principles of Database Systems, pages 61–73, 1988.
13. O. Etzion, S. Jajodia, and S. Sripada, editors. Temporal Databases: Research and Practice, volume 1399 of Lecture Notes in Computer Science. Springer, 1998.
14. M. Fisher and R. Owens, editors. Executable Modal and Temporal Logics, volume 897 of Lecture Notes in Artificial Intelligence. Springer, 1995.
15. T. Frühwirth. Temporal Annotated Constraint Logic Programming. Journal of Symbolic Computation, 22:555–583, 1996.
16. D. M. Gabbay. Modal and temporal logic programming. In [18], pages 197–237.
17. D.M. Gabbay and P. McBrien. Temporal Logic & Historical Databases. In Proceedings of the Seventeenth International Conference on Very Large Databases, pages 423–430, 1991.
18. A. Galton, editor. Temporal Logics and Their Applications. Academic Press, 1987.
19. A. Galton. A Critical Examination of Allen's Theory of Action and Time. Artificial Intelligence, 42:159–188, 1990.
20. S. Grumbach, P. Rigaux, and L. Segoufin. The DEDALE system for complex spatial queries. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD-98), pages 213–224, 1998.
21. T. Hrycej. A temporal extension of Prolog. Journal of Logic Programming, 15(1&2):113–145, 1993.
22. J. Jaffar and M.J. Maher. Constraint Logic Programming: A Survey. Journal of Logic Programming, 19 & 20:503–582, 1994.
23. J. Jaffar, M.J. Maher, K. Marriott, and P.J. Stuckey. The Semantics of Constraint Logic Programs. Journal of Logic Programming, 37(1-3):1–46, 1998.
24. J. Jaffar, S. Michaylov, P. Stuckey, and R. Yap. The CLP(R) Language and System. ACM Transactions on Programming Languages and Systems, 14(3):339–395, 1992.
25. P.C. Kanellakis, G.M. Kuper, and P.Z. Revesz. Constraint query languages. Journal of Computer and System Sciences, 51(1):26–52, 1995.
26. M. Kifer and V.S. Subrahmanian. Theory of Generalized Annotated Logic Programming and its Applications. Journal of Logic Programming, 12:335–367, 1992.
27. M. Koubarakis. Database models for infinite and indefinite temporal information. Information Systems, 19(2):141–173, 1994.
28. R. A. Kowalski and M.J. Sergot. A Logic-based Calculus of Events. New Generation Computing, 4(1):67–95, 1986.
29. R.A. Kowalski and J.S. Kim. A metalogic programming approach to multi-agent knowledge and belief. In Artificial Intelligence and Mathematical Theory of Computation. Academic Press, 1991.
30. S.M. Leach and J.J. Lu. Computing Annotated Logic Programs. In Proceedings of the eleventh International Conference on Logic Programming, pages 257–271, 1994.
31. P. Mancarella, G. Nerbini, A. Raffaetà, and F. Turini. MuTACLP: A language for declarative GIS analysis. In Proceedings of the Sixth International Conference on Rules and Objects in Databases (DOOD2000), volume 1861 of Lecture Notes in Artificial Intelligence, pages 1002–1016. Springer, 2000.
32. P. Mancarella, A. Raffaetà, and F. Turini. Knowledge Representation with Multiple Logical Theories and Time. Journal of Experimental and Theoretical Artificial Intelligence, 11:47–76, 1999.
33. P. Mancarella, A. Raffaetà, and F. Turini. Temporal Annotated Constraint Logic Programming with Multiple Theories. In Tenth International Workshop on Database and Expert Systems Applications, pages 501–508. IEEE Computer Society Press, 1999.
34. B. Martens and D. De Schreye. Why Untyped Nonground Metaprogramming Is Not (Much Of) A Problem. Journal of Logic Programming, 22(1):47–99, 1995.
35. M. A. Orgun. On temporal deductive databases. Computational Intelligence, 12(2):235–259, 1996.
36. M. A. Orgun and W. Ma. An Overview of Temporal and Modal Logic Programming. In Temporal Logic: Proceedings of the First International Conference, ICTL'94, volume 827 of Lecture Notes in Artificial Intelligence, pages 445–479. Springer, 1994.
37. J. Paredaens, J. Van den Bussche, and D. Van Gucht. Towards a theory of spatial database queries. In Proceedings of the 13th ACM Symposium on Principles of Database Systems, pages 279–288, 1994.
38. A. Raffaetà. Spatio-temporal knowledge bases in a constraint logic programming framework with multiple theories. PhD thesis, Dipartimento di Informatica, Università di Pisa, 2000.
39. A. Raffaetà and T. Frühwirth. Semantics for Temporal Annotated Constraint Logic Programming. In Labelled Deduction, volume 17 of Applied Logic Series, pages 215–243. Kluwer Academic, 2000.
40. A. Raffaetà and C. Renso. Temporal Reasoning in Geographical Information Systems. In International Workshop on Advanced Spatial Data Management (DEXA Workshop), pages 899–905. IEEE Computer Society Press, 2000.
41. M. J. Sergot, F. Sadri, R. A. Kowalski, F. Kriwaczek, P. Hammond, and H. T. Cory. The British Nationality Act as a logic program. Communications of the ACM, 29(5):370–386, 1986.
42. S. Sripada and P. Möller. The Generalized ChronoBase Temporal Data Model. In Meta-logics and Logic Programming, pages 310–335. MIT Press, 1995.
43. S.M. Sripada. A logical framework for temporal deductive databases. In Proceedings of the Very Large Databases Conference, pages 171–182, 1988.
44. S.M. Sripada. Temporal Reasoning in Deductive Databases. PhD thesis, Department of Computing, Imperial College of Science & Technology, 1991.
45. V. S. Subrahmanian. Amalgamating Knowledge Bases. ACM Transactions on Database Systems, 19(2):291–331, 1994.
46. A. Tansel, J. Clifford, S. Gadia, S. Jajodia, A. Segev, and R. Snodgrass, editors. Temporal Databases: Theory, Design, and Implementation. Benjamin/Cummings, 1993.
47. C. Zaniolo, N. Arni, and K. Ong. Negation and aggregates in recursive rules: The LDL++ Approach. In International conference on Deductive and Object-Oriented Databases (DOOD'93), volume 760 of Lecture Notes in Computer Science. Springer, 1993.


Paolo Baldan et al.

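Appendix: Proofs

Proposition 1. Let $I_1$ and $I_2$ be two interpretations. Then $\downarrow(I_1 \Cap I_2) = \downarrow I_1 \cap \downarrow I_2$.

Proof. Assume $(A, \alpha) \in \downarrow(I_1 \Cap I_2)$. By definition of downward closure there exists $\gamma$ such that $(A, \gamma) \in I_1 \Cap I_2$ and $D_C \models \alpha \sqsubseteq \gamma$. By definition of $\Cap$ there exist $\beta$ and $\beta'$ such that $(A, \beta) \in I_1$ and $(A, \beta') \in I_2$ and $D_C \models \beta \sqcap \beta' = \gamma$. Therefore $D_C \models \alpha \sqsubseteq \beta, \alpha \sqsubseteq \beta'$, and by definition of downward closure we conclude $(A, \alpha) \in \downarrow I_1$ and $(A, \alpha) \in \downarrow I_2$, i.e., $(A, \alpha) \in \downarrow I_1 \cap \downarrow I_2$.

Vice versa, assume $(A, \alpha) \in \downarrow I_1 \cap \downarrow I_2$. By definition of set-theoretic intersection and downward closure there exist $\beta$ and $\beta'$ such that $D_C \models \alpha \sqsubseteq \beta, \alpha \sqsubseteq \beta'$ and $(A, \beta) \in I_1$ and $(A, \beta') \in I_2$. By definition of $\Cap$, $(A, \gamma) \in I_1 \Cap I_2$ and $D_C \models \beta \sqcap \beta' = \gamma$. By property of the greatest lower bound $D_C \models \alpha \sqsubseteq \beta \sqcap \beta'$, hence $(A, \alpha) \in \downarrow(I_1 \Cap I_2)$.

Theorem 1. Let $E$ be a program expression. The function $T^C_E$ is continuous (on $(\wp(C\text{-}base_L \times Ann), \subseteq)$).

Proof. Let $\{I_i\}_{i \in \mathbb{N}}$ be a chain in $(\wp(C\text{-}base_L \times Ann), \subseteq)$, i.e., $I_0 \subseteq I_1 \subseteq \ldots \subseteq I_i \subseteq \ldots$. Then we have to prove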

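$(A, \alpha) \in T^C_E\big(\bigcup_{i \in \mathbb{N}} I_i\big) \iff (A, \alpha) \in \bigcup_{i \in \mathbb{N}} T^C_E(I_i).$

The proof is by structural induction on $E$.

($E$ is a plain program $P$).

$(A, \alpha) \in T^C_P\big(\bigcup_{i \in \mathbb{N}} I_i\big)$

$\iff$ {definition of $T^C_P$}

$\big((\alpha = \mathrm{th}[s_1, s_2] \vee \alpha = \mathrm{in}[s_1, s_2]) \wedge A\,\alpha \leftarrow C_1, \ldots, C_k, B_1\,\alpha_1, \ldots, B_n\,\alpha_n \in ground_C(P)$
$\wedge\ \{(B_1, \beta_1), \ldots, (B_n, \beta_n)\} \subseteq \bigcup_{i \in \mathbb{N}} I_i$
$\wedge\ D_C \models C_1, \ldots, C_k, \alpha_1 \sqsubseteq \beta_1, \ldots, \alpha_n \sqsubseteq \beta_n, s_1 \le s_2\big)$
$\vee$
$\big(\alpha = \mathrm{th}[s_1, r_2] \wedge A\,\mathrm{th}[s_1, s_2] \leftarrow C_1, \ldots, C_k, B_1\,\alpha_1, \ldots, B_n\,\alpha_n \in ground_C(P)$
$\wedge\ \{(B_1, \beta_1), \ldots, (B_n, \beta_n)\} \subseteq \bigcup_{i \in \mathbb{N}} I_i \wedge (A, \mathrm{th}[r_1, r_2]) \in \bigcup_{i \in \mathbb{N}} I_i$
$\wedge\ D_C \models C_1, \ldots, C_k, \alpha_1 \sqsubseteq \beta_1, \ldots, \alpha_n \sqsubseteq \beta_n, s_1 < r_1, r_1 \le s_2, s_2 < r_2\big)$

$\iff$ {property of set-theoretic union and $\{I_i\}_{i \in \mathbb{N}}$ is a chain. Notice that for ($\Longrightarrow$) $j$ can be any element of the set $\{k \mid (B_i, \beta_i) \in I_k, i = 1, \ldots, n\}$, which is clearly not empty}

$\big((\alpha = \mathrm{th}[s_1, s_2] \vee \alpha = \mathrm{in}[s_1, s_2]) \wedge A\,\alpha \leftarrow C_1, \ldots, C_k, B_1\,\alpha_1, \ldots, B_n\,\alpha_n \in ground_C(P)$
$\wedge\ \{(B_1, \beta_1), \ldots, (B_n, \beta_n)\} \subseteq I_j$
$\wedge\ D_C \models C_1, \ldots, C_k, \alpha_1 \sqsubseteq \beta_1, \ldots, \alpha_n \sqsubseteq \beta_n, s_1 \le s_2\big)$
$\vee$
$\big(\alpha = \mathrm{th}[s_1, r_2] \wedge A\,\mathrm{th}[s_1, s_2] \leftarrow C_1, \ldots, C_k, B_1\,\alpha_1, \ldots, B_n\,\alpha_n \in ground_C(P)$
$\wedge\ \{(B_1, \beta_1), \ldots, (B_n, \beta_n)\} \subseteq I_j \wedge (A, \mathrm{th}[r_1, r_2]) \in I_j$
$\wedge\ D_C \models C_1, \ldots, C_k, \alpha_1 \sqsubseteq \beta_1, \ldots, \alpha_n \sqsubseteq \beta_n, s_1 < r_1, r_1 \le s_2, s_2 < r_2\big)$

$\iff$ {definition of $T^C_P$}

$(A, \alpha) \in T^C_P(I_j)$

$\iff$ {set-theoretic union}

$(A, \alpha) \in \bigcup_{i \in \mathbb{N}} T^C_P(I_i)$

($E = Q \cup R$).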

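$(A, \alpha) \in T^C_{Q \cup R}\big(\bigcup_{i \in \mathbb{N}} I_i\big)$

$\iff$ {definition of $T^C_{Q \cup R}$}

$(A, \alpha) \in T^C_Q\big(\bigcup_{i \in \mathbb{N}} I_i\big) \cup T^C_R\big(\bigcup_{i \in \mathbb{N}} I_i\big)$

$\iff$ {inductive hypothesis}

$(A, \alpha) \in \bigcup_{i \in \mathbb{N}} T^C_Q(I_i) \cup \bigcup_{i \in \mathbb{N}} T^C_R(I_i)$

$\iff$ {properties of union}

$(A, \alpha) \in \bigcup_{i \in \mathbb{N}} \big(T^C_Q(I_i) \cup T^C_R(I_i)\big)$

$\iff$ {definition of $T^C_{Q \cup R}$}

$(A, \alpha) \in \bigcup_{i \in \mathbb{N}} T^C_{Q \cup R}(I_i)$

($E = Q \cap R$).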

$(A, \alpha) \in T^C_{Q \cap R}\big(\bigcup_{i \in \mathbb{N}} I_i\big)$

$\iff$ {definition of $T^C_{Q \cap R}$}

$(A, \alpha) \in T^C_Q\big(\bigcup_{i \in \mathbb{N}} I_i\big) \Cap T^C_R\big(\bigcup_{i \in \mathbb{N}} I_i\big)$

$\iff$ {inductive hypothesis}

$(A, \alpha) \in \bigcup_{i \in \mathbb{N}} T^C_Q(I_i) \Cap \bigcup_{i \in \mathbb{N}} T^C_R(I_i)$

$\iff$ {definition of $\Cap$ and monotonicity of $T^C$}

$(A, \alpha) \in \bigcup_{i \in \mathbb{N}} \big(T^C_Q(I_i) \Cap T^C_R(I_i)\big)$

$\iff$ {definition of $T^C_{Q \cap R}$}

$(A, \alpha) \in \bigcup_{i \in \mathbb{N}} T^C_{Q \cap R}(I_i)$

Soundness and Completeness

This section presents the proofs of the soundness and completeness results for the MuTACLP meta-interpreter. Due to space limitations, the proofs of the technical lemmata are omitted and can be found in [4,38].

We first fix some notational conventions. In the following we will denote by $E$, $N$, $R$ and $Q$ generic program expressions, and by $C$ the fixed constraint domain where the constraints of object programs are interpreted. Let $M$ be the fixed constraint domain where the constraints of the meta-interpreter defined in Section 5.1 are interpreted. We denote by $A$, $B$ elements of $C\text{-}base_L$, with $\alpha, \beta, \gamma$ annotations in $Ann$ and by $C$ a $C$-ground instance of a constraint. All symbols may have subscripts. In the following, for simplicity, we will drop the reference to $C$ and $M$ in the name of the immediate consequence operators. Moreover, we refer to the program containing the meta-level representation of object level programs and clauses (1)-(10) as "the meta-program $V$ corresponding to a program expression".


We will say that an interpretation $I \subseteq C\text{-}base_L \times Ann$ satisfies the body of a $C$-ground instance $A\,\alpha \leftarrow C_1, \ldots, C_k, B_1\,\alpha_1, \ldots, B_n\,\alpha_n$ of a clause, or in symbols $I \models C_1, \ldots, C_k, B_1\,\alpha_1, \ldots, B_n\,\alpha_n$, if

1. $D_C \models C_1, \ldots, C_k$, and
2. there are annotations $\beta_1, \ldots, \beta_n$ such that $\{(B_1, \beta_1), \ldots, (B_n, \beta_n)\} \subseteq I$ and $D_C \models \alpha_1 \sqsubseteq \beta_1, \ldots, \alpha_n \sqsubseteq \beta_n$.

Furthermore, we will often denote a sequence $C_1, \ldots, C_k$ of $C$-ground instances of constraints by $\bar{C}$, while a sequence $B_1\,\alpha_1, \ldots, B_n\,\alpha_n$ of annotated atoms in $C\text{-}base_L \times Ann$ will be denoted by $\bar{B}$. For example, with this convention a clause of the kind $A\,\alpha \leftarrow C_1, \ldots, C_k, B_1\,\alpha_1, \ldots, B_n\,\alpha_n$ will be written as $A\,\alpha \leftarrow \bar{C}, \bar{B}$, and, similarly, in the meta-level representation, we will write $clause(E, A\,\alpha, (\bar{C}, \bar{B}))$ in place of $clause(E, A\,\alpha, (C_1, \ldots, C_k, B_1\,\alpha_1, \ldots, B_n\,\alpha_n))$.

Soundness. In order to show the soundness of the meta-interpreter (restricted to the atoms of interest), we present the following easy lemma, stating that if a conjunctive goal is provable at the meta-level then its atomic conjuncts are also provable at the meta-level.

Lemma 1. Let $E$ be a program expression and let $V$ be the corresponding meta-interpreter. For any $B_1\,\alpha_1, \ldots, B_n\,\alpha_n$ with $B_i \in C\text{-}base_L$ and $\alpha_i \in Ann$ and for any $C_1, \ldots, C_k$, with $C_i$ a $C$-ground instance of a constraint, we have: for all $h$,

$demo(E, (C_1, \ldots, C_k, B_1\,\alpha_1, \ldots, B_n\,\alpha_n)) \in T_V^h \implies \{demo(E, B_1\,\alpha_1), \ldots, demo(E, B_n\,\alpha_n)\} \subseteq T_V^h \wedge D_C \models C_1, \ldots, C_k.$

The next two lemmata relate the clauses computed from a program expression $E$ at the meta-level, called "virtual clauses", with the set of consequences of $E$. The first lemma states that whenever we can find a virtual clause computed from $E$ whose body is satisfied by $I$, the head $A\,\alpha$ of the clause is a consequence of the program expression $E$. The second one shows how the head of a virtual clause can be "joined" with an already existing annotated atom in order to obtain an atom with a larger th annotation.

Lemma 2 (Virtual Clauses Lemma 1). Let $E$ be a program expression and $V$ be the corresponding meta-interpreter. For any sequence $\bar{C}$ of $C$-ground instances of constraints, for any $A\,\alpha, \bar{B}$ in $C\text{-}base_L \times Ann$ and any interpretation $I \subseteq C\text{-}base_L \times Ann$, we have:

$clause(E, A\,\alpha, (\bar{C}, \bar{B})) \in T_V^\omega \wedge I \models \bar{C}, \bar{B} \implies (A, \alpha) \in T_E(I).$

Lemma 3 (Virtual Clauses Lemma 2). Let $E$ be a program expression and $V$ be the corresponding meta-program. For any $A\,\mathrm{th}[s_1, s_2]$, $A\,\mathrm{th}[r_1, r_2]$, $\bar{B}$ in $C\text{-}base_L \times Ann$, for any sequence $\bar{C}$ of $C$-ground instances of constraints, and any interpretation $I \subseteq C\text{-}base_L \times Ann$, the following statement holds:

$clause(E, A\,\mathrm{th}[s_1, s_2], (\bar{C}, \bar{B})) \in T_V^\omega \wedge I \models \bar{C}, \bar{B} \wedge (A, \mathrm{th}[r_1, r_2]) \in I \wedge D_C \models s_1 < r_1, r_1 \le s_2, s_2 < r_2 \implies (A, \mathrm{th}[s_1, r_2]) \in T_E(I).$

Now, the soundness of the meta-interpreter can be proved by showing that if an annotated atom $A\,\alpha$ is provable at the meta-level from the program expression $E$ then $A\,\gamma$ is a consequence of $E$ for some $\gamma$ such that $A\,\gamma \Rightarrow A\,\alpha$, i.e., the annotation $\alpha$ is less than or equal to $\gamma$.

Theorem 3 (Soundness). Let $E$ be a program expression and let $V$ be the corresponding meta-program. For any $A\,\alpha$ with $A \in C\text{-}base_L$ and $\alpha \in Ann$, the following statement holds:

$demo(E, A\,\alpha) \in T_V^\omega \implies (A, \alpha) \in F^C(E).$

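Proof. We first show that for all $h$

$demo(E, A\,\alpha) \in T_V^h \implies \exists \gamma : (A, \gamma) \in T_E^\omega \wedge D_C \models \alpha \sqsubseteq \gamma. \qquad (12)$

The proof is by induction on $h$.

(Base case). Trivial since $T_V^0 = \emptyset$.

(Inductive case). Assume that

$demo(E, A\,\alpha) \in T_V^h \implies \exists \gamma : (A, \gamma) \in T_E^\omega \wedge D_C \models \alpha \sqsubseteq \gamma.$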

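Then:

$demo(E, A\,\alpha) \in T_V^{h+1}$

$\iff$ {definition of $T_V^i$}

$demo(E, A\,\alpha) \in T_V(T_V^h)$

We have four cases corresponding to clauses (3), (4), (5) and (6). We only show the cases related to clauses (3) and (4), since the others are proved in an analogous way.

(clause (3)) {$\alpha = \mathrm{th}[t_1, t_2]$, definition of $T_V$ and clause (3)}

$\{clause(E, A\,\mathrm{th}[s_1, s_2], (\bar{C}, \bar{B})), demo(E, (\bar{C}, \bar{B}))\} \subseteq T_V^h \wedge D_C \models s_1 \le t_1, t_2 \le s_2, t_1 \le t_2$

$\implies$ {Lemma 1 and $(\bar{C}, \bar{B}) = (C_1, \ldots, C_k, B_1\,\alpha_1, \ldots, B_n\,\alpha_n)$}

$clause(E, A\,\mathrm{th}[s_1, s_2], (C_1, \ldots, C_k, B_1\,\alpha_1, \ldots, B_n\,\alpha_n)) \in T_V^h$
$\wedge\ \{demo(E, B_1\,\alpha_1), \ldots, demo(E, B_n\,\alpha_n)\} \subseteq T_V^h$
$\wedge\ D_C \models C_1, \ldots, C_k \wedge D_C \models s_1 \le t_1, t_2 \le s_2, t_1 \le t_2$

$\implies$ {inductive hypothesis}

$\exists \beta_1, \ldots, \beta_n : clause(E, A\,\mathrm{th}[s_1, s_2], (C_1, \ldots, C_k, B_1\,\alpha_1, \ldots, B_n\,\alpha_n)) \in T_V^h$
$\wedge\ \{(B_1, \beta_1), \ldots, (B_n, \beta_n)\} \subseteq T_E^\omega \wedge D_C \models \alpha_1 \sqsubseteq \beta_1, \ldots, \alpha_n \sqsubseteq \beta_n$
$\wedge\ D_C \models C_1, \ldots, C_k \wedge D_C \models s_1 \le t_1, t_2 \le s_2, t_1 \le t_2$

$\implies$ {$T_V^\omega = \bigcup_{i \in \mathbb{N}} T_V^i$}

$clause(E, A\,\mathrm{th}[s_1, s_2], (C_1, \ldots, C_k, B_1\,\alpha_1, \ldots, B_n\,\alpha_n)) \in T_V^\omega$
$\wedge\ \{(B_1, \beta_1), \ldots, (B_n, \beta_n)\} \subseteq T_E^\omega \wedge D_C \models \alpha_1 \sqsubseteq \beta_1, \ldots, \alpha_n \sqsubseteq \beta_n$
$\wedge\ D_C \models C_1, \ldots, C_k \wedge D_C \models s_1 \le t_1, t_2 \le s_2, t_1 \le t_2$

$\implies$ {Lemma 2}

$(A, \mathrm{th}[s_1, s_2]) \in T_E(T_E^\omega) \wedge D_C \models s_1 \le t_1, t_2 \le s_2, t_1 \le t_2$

$\implies$ {$T_E^\omega$ is a fixpoint of $T_E$ and $D_C \models s_1 \le t_1, t_2 \le s_2, t_1 \le t_2$}

$(A, \mathrm{th}[s_1, s_2]) \in T_E^\omega \wedge D_C \models \mathrm{th}[t_1, t_2] \sqsubseteq \mathrm{th}[s_1, s_2]$

(clause (4)) {$\alpha = \mathrm{th}[t_1, t_2]$, definition of $T_V$ and clause (4)}

$\{clause(E, A\,\mathrm{th}[s_1, s_2], (\bar{C}, \bar{B})), demo(E, (\bar{C}, \bar{B})), demo(E, A\,\mathrm{th}[s_2, t_2])\} \subseteq T_V^h \wedge D_C \models s_1 \le t_1, t_1 < s_2, s_2 < t_2$

$\implies$ {Lemma 1 and $(\bar{C}, \bar{B}) = (C_1, \ldots, C_k, B_1\,\alpha_1, \ldots, B_n\,\alpha_n)$}

$clause(E, A\,\mathrm{th}[s_1, s_2], (C_1, \ldots, C_k, B_1\,\alpha_1, \ldots, B_n\,\alpha_n)) \in T_V^h$
$\wedge\ \{demo(E, B_1\,\alpha_1), \ldots, demo(E, B_n\,\alpha_n), demo(E, A\,\mathrm{th}[s_2, t_2])\} \subseteq T_V^h$
$\wedge\ D_C \models C_1, \ldots, C_k \wedge D_C \models s_1 \le t_1, t_1 < s_2, s_2 < t_2$

$\implies$ {inductive hypothesis}

$\exists \beta, \beta_1, \ldots, \beta_n : clause(E, A\,\mathrm{th}[s_1, s_2], (C_1, \ldots, C_k, B_1\,\alpha_1, \ldots, B_n\,\alpha_n)) \in T_V^h$
$\wedge\ \{(B_1, \beta_1), \ldots, (B_n, \beta_n), (A, \beta)\} \subseteq T_E^\omega$
$\wedge\ D_C \models \alpha_1 \sqsubseteq \beta_1, \ldots, \alpha_n \sqsubseteq \beta_n, \mathrm{th}[s_2, t_2] \sqsubseteq \beta$
$\wedge\ D_C \models C_1, \ldots, C_k \wedge D_C \models s_1 \le t_1, t_1 < s_2, s_2 < t_2.$

Since $D_C \models \mathrm{th}[s_2, t_2] \sqsubseteq \beta$, then $\beta = \mathrm{th}[w_1, w_2]$ with $D_C \models w_1 \le s_2, t_2 \le w_2$. Hence we distinguish two cases according to the relation between $w_1$ and $s_1$.

– $D_C \models w_1 \le s_1$. In this case we immediately conclude because $D_C \models \mathrm{th}[t_1, t_2] \sqsubseteq \mathrm{th}[w_1, w_2]$, and thus $(A, \mathrm{th}[w_1, w_2]) \in T_E^\omega \wedge D_C \models \mathrm{th}[t_1, t_2] \sqsubseteq \mathrm{th}[w_1, w_2]$.
– $D_C \models s_1 < w_1$. In this case $clause(E, A\,\mathrm{th}[s_1, s_2], (C_1, \ldots, C_k, B_1\,\alpha_1, \ldots, B_n\,\alpha_n)) \in T_V^\omega$, since $T_V^\omega = \bigcup_{i \in \mathbb{N}} T_V^i$. Moreover, from $D_C \models s_1 < w_1, w_1 \le s_2, s_2 < t_2, t_2 \le w_2$, by Lemma 3 we obtain $(A, \mathrm{th}[s_1, w_2]) \in T_E(T_E^\omega)$. Since $T_E^\omega$ is a fixpoint of $T_E$ and $D_C \models s_1 \le t_1, t_2 \le w_2$, we can conclude $(A, \mathrm{th}[s_1, w_2]) \in T_E^\omega$ and $D_C \models \mathrm{th}[t_1, t_2] \sqsubseteq \mathrm{th}[s_1, w_2]$.

We are finally able to prove the soundness of the meta-interpreter with respect to the least fixpoint semantics.

$demo(E, A\,\alpha) \in T_V^\omega$

$\implies$ {$T_V^\omega = \bigcup_{i \in \mathbb{N}} T_V^i$}

$\exists h : demo(E, A\,\alpha) \in T_V^h$

$\implies$ {Statement (12)}

$\exists \beta : (A, \beta) \in T_E^\omega \wedge D_C \models \alpha \sqsubseteq \beta$

$\implies$ {definition of $F^C$}

$(A, \alpha) \in F^C(E).$

Completeness. We first need a lemma stating that if an annotated atom $A\,\alpha$ is provable at the meta-level in a program expression $E$ then we can prove at the meta-level the same atom $A$ with any other "weaker" annotation (namely $A\,\gamma$, with $\gamma \sqsubseteq \alpha$).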


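Lemma 4. Let $E$ be a program expression and $V$ be the corresponding meta-program. For any $A \in C\text{-}base_L$ and $\alpha \in Ann$, the following statement holds:

$demo(E, A\,\alpha) \in T_V^\omega \implies \{demo(E, A\,\gamma) \mid \gamma \in Ann, D_C \models \gamma \sqsubseteq \alpha\} \subseteq T_V^\omega.$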
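The order on th annotations that Lemma 4 and the soundness proof rely on can be illustrated with a small sketch. It assumes integer time points and covers th annotations only; the names `Th`, `leq` and `join` are hypothetical and not part of MuTACLP. The test in `leq` is read off the proof step that derives th[t1, t2] ⊑ th[s1, s2] from s1 ≤ t1, t2 ≤ s2, t1 ≤ t2, and `join` mirrors the side condition of Lemma 3.

```python
from typing import NamedTuple, Optional


class Th(NamedTuple):
    """A th annotation th[lo, hi]: the atom holds throughout [lo, hi]."""
    lo: int
    hi: int


def leq(a: Th, b: Th) -> bool:
    """a ⊑ b: the interval of b contains the (non-empty) interval of a."""
    return b.lo <= a.lo and a.hi <= b.hi and a.lo <= a.hi


def join(a: Th, b: Th) -> Optional[Th]:
    """Join two overlapping th annotations as in Lemma 3.

    Requires a.lo < b.lo <= a.hi < b.hi; returns th[a.lo, b.hi],
    or None when the side condition does not hold.
    """
    if a.lo < b.lo <= a.hi < b.hi:
        return Th(a.lo, b.hi)
    return None
```

Under these assumptions, `leq(Th(3, 5), Th(1, 8))` holds because [3, 5] lies inside [1, 8], and `join(Th(1, 4), Th(3, 9))` yields `Th(1, 9)`, the larger th annotation that clause (4) is meant to derive.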

Now the completeness result for the MuTACLP meta-interpreter basically relies on two technical lemmata (Lemma 7 and Lemma 8). Roughly speaking, they assert that when th- and in-annotated atoms are derivable from an interpretation $I$ by using the $T_E$ operator, then we can find corresponding virtual clauses in the program expression $E$ which allow us to derive the same or greater information. Let us first introduce some preliminary notions and results.

Definition 6 (covering). A covering for a th-annotation $\mathrm{th}[t_1, t_2]$ is a sequence of annotations $\{\mathrm{th}[t^i_1, t^i_2]\}_{i \in \{1, \ldots, n\}}$ such that $D_C \models \mathrm{th}[t_1, t_2] \sqsubseteq \mathrm{th}[t^1_1, t^n_2]$ and, for any $i \in \{1, \ldots, n\}$,

$D_C \models t^i_1 \le t^i_2,\ t^{i+1}_1 \le t^i_2,\ t^i_1 < t^{i+1}_1.$

In words, a covering of a th annotation $\mathrm{th}[t_1, t_2]$ is a sequence of annotations $\{\mathrm{th}[t^i_1, t^i_2]\}_{i \in \{1, \ldots, n\}}$ such that each of the intervals overlaps with its successor, and the union of such intervals includes $[t_1, t_2]$. The next simple lemma observes that, given two annotations and a covering for each of them, we can always build a covering for their greatest lower bound.

Lemma 5. Let $\mathrm{th}[t_1, t_2]$ and $\mathrm{th}[s_1, s_2]$ be annotations and $\mathrm{th}[w_1, w_2] = \mathrm{th}[t_1, t_2] \sqcap \mathrm{th}[s_1, s_2]$. Let $\{\mathrm{th}[t^i_1, t^i_2]\}_{i \in \{1, \ldots, n\}}$ and $\{\mathrm{th}[s^j_1, s^j_2]\}_{j \in \{1, \ldots, m\}}$ be coverings for $\mathrm{th}[t_1, t_2]$ and $\mathrm{th}[s_1, s_2]$, respectively. Then a covering for $\mathrm{th}[w_1, w_2]$ can be extracted from

$\{\mathrm{th}[t^i_1, t^i_2] \sqcap \mathrm{th}[s^j_1, s^j_2] \mid i \in \{1, \ldots, n\} \wedge j \in \{1, \ldots, m\}\}.$

In the hypothesis of the previous lemma $[w_1, w_2] = [t_1, t_2] \cap [s_1, s_2]$. Thus the result of the lemma is simply a consequence of the distributivity of set-theoretical intersection with respect to union.

Definition 7. Let $E$ be a program expression, let $V$ be the corresponding meta-program and let $I \subseteq C\text{-}base_L \times Ann$ be an interpretation. Given an annotated atom $(A, \mathrm{th}[t_1, t_2]) \in C\text{-}base_L \times Ann$, an $(E, I)$-set for $(A, \mathrm{th}[t_1, t_2])$ is a set

$\{clause(E, A\,\mathrm{th}[t^i_1, t^i_2], (\bar{C}^i, \bar{B}^i))\}_{i \in \{1, \ldots, n\}} \subseteq T_V^\omega$

such that

1. $\{\mathrm{th}[t^i_1, t^i_2]\}_{i \in \{1, \ldots, n\}}$ is a covering of $\mathrm{th}[t_1, t_2]$, and
2. for $i \in \{1, \ldots, n\}$, $I \models \bar{C}^i, \bar{B}^i$.

An interpretation $I \subseteq C\text{-}base_L \times Ann$ is called th-closed with respect to $E$ (or $E$-closed, for short) if there is an $(E, I)$-set for every annotated atom $(A, \mathrm{th}[t_1, t_2]) \in I$.
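The covering condition of Definition 6, on which the (E, I)-sets above depend, can be checked mechanically on concrete intervals. The following is a minimal sketch, assuming integer time points and th annotations given as (start, end) pairs; the function name `is_covering` is hypothetical, and the conditions between consecutive intervals are applied only where a successor exists, which is one reading of the definition.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) of a th annotation


def is_covering(target: Interval, seq: List[Interval]) -> bool:
    """Check whether `seq` is a covering of th[target] per Definition 6."""
    if not seq:
        return False
    t1, t2 = target
    # Every interval in the sequence must be non-empty.
    if any(lo > hi for lo, hi in seq):
        return False
    # Each interval must overlap its successor and start strictly earlier.
    for (lo, hi), (next_lo, _next_hi) in zip(seq, seq[1:]):
        if not (next_lo <= hi and lo < next_lo):
            return False
    # th[t1, t2] must lie below th[first start, last end] in the order.
    first_lo, last_hi = seq[0][0], seq[-1][1]
    return first_lo <= t1 <= t2 <= last_hi
```

For instance, `[(1, 4), (3, 7), (6, 10)]` is accepted as a covering of `(2, 9)`, whereas `[(1, 4), (5, 10)]` is rejected because the two intervals do not overlap.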


The next lemma presents some properties of the notion of $E$-closedness, which essentially state that the property of being $E$-closed is invariant with respect to some obvious algebraic transformations of the program expression $E$.

Lemma 6. Let $E$, $R$ and $N$ be program expressions and let $I$ be an interpretation. Then the following properties hold, where $op \in \{\cup, \cap\}$:

1. $I$ is $(E\ op\ E)$-closed iff $I$ is $E$-closed;
2. $I$ is $(E\ op\ R)$-closed iff $I$ is $(R\ op\ E)$-closed;
3. $I$ is $((E\ op\ R)\ op\ N)$-closed iff $I$ is $(E\ op\ (R\ op\ N))$-closed;
4. if $I$ is $E$-closed then $I$ is $(E \cup R)$-closed;
5. if $I$ is $(E \cap R)$-closed then $I$ is $E$-closed;
6. $I$ is $((E \cap R) \cup N)$-closed iff $I$ is $((E \cup N) \cap (R \cup N))$-closed.

We next show that if we apply the $T_E$ operator to an $E$-closed interpretation, then for any derived th-annotated atom there exists an $(E, I)$-set (see Definition 7). This result represents a basic step towards the completeness proof. In fact, it tells us that starting from the empty interpretation, which is obviously $E$-closed, and iterating the $T_E$ operator, we get, step after step, th-annotated atoms which can also be derived from the virtual clauses of the program expression at hand. For technical reasons, to make the induction work, we need a slightly stronger property.

Lemma 7. Let $E$ and $Q$ be program expressions, let $V$ be the corresponding meta-program$^4$ and let $I \subseteq C\text{-}base_L \times Ann$ be an $(E \cup Q)$-closed interpretation. Then for any atom $(A, \mathrm{th}[t_1, t_2]) \in T_E(I)$ there exists an $(E \cup Q, I)$-set.

Corollary 1. Let $E$ be any program expression and let $V$ be the corresponding meta-program. Then for any $h \in \mathbb{N}$ the interpretation $T_E^h$ is $E$-closed. Therefore $T_E^\omega$ is $E$-closed.

Another technical lemma is needed for dealing with the in annotations, which comes in pair with Lemma 7.

Lemma 8. Let $E$ be a program expression, let $V$ be the corresponding meta-program and let $I$ be any $E$-closed interpretation. For any atom $(A, \mathrm{in}[t_1, t_2]) \in T_E(I)$ we have

$clause(E, A\,\alpha, (\bar{C}, \bar{B})) \in T_V^\omega \wedge I \models \bar{C}, \bar{B} \wedge D_C \models \mathrm{in}[t_1, t_2] \sqsubseteq \alpha.$

Now we can prove the completeness of the meta-interpreter with respect to the least fixpoint semantics.

Theorem 4 (Completeness). Let $E$ be a program expression and $V$ be the corresponding meta-program. For any $A \in C\text{-}base_L$ and $\alpha \in Ann$ the following statement holds:

$(A, \alpha) \in F^C(E) \implies demo(E, A\,\alpha) \in T_V^\omega.$

$^4$ The meta-program contains the meta-level representation of the plain programs in $E$ and $Q$.


Proof. We first show that for all $h$

$(A, \alpha) \in T_E^h \implies demo(E, A\,\alpha) \in T_V^\omega. \qquad (13)$

The proof is by induction on $h$.

(Base case). Trivial since $T_E^0 = \emptyset$.

(Inductive case). Assume that

$(A, \alpha) \in T_E^h \implies demo(E, A\,\alpha) \in T_V^\omega.$

Observe that, under the above assumption,

$T_E^h \models \bar{C}, \bar{B} \implies demo(E, (\bar{C}, \bar{B})) \in T_V^\omega. \qquad (14)$

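In fact, let $\bar{C} = C_1, \ldots, C_k$ and $\bar{B} = B_1\,\alpha_1, \ldots, B_n\,\alpha_n$. Then the notation $T_E^h \models \bar{C}$ amounts to saying that for each $i$, $D_C \models C_i$ and thus $demo(E, C_i) \in T_V^\omega$, by definition of $T_V$ and clause (7). Furthermore, $T_E^h \models \bar{B}$ means that for each $i$, $(B_i, \beta_i) \in T_E^h$ and $D_C \models \alpha_i \sqsubseteq \beta_i$. Hence by inductive hypothesis $demo(E, B_i\,\beta_i) \in T_V^\omega$ and thus, by Lemma 4, $demo(E, B_i\,\alpha_i) \in T_V^\omega$. By several applications of clause (2) in the meta-interpreter we finally deduce $demo(E, (\bar{C}, \bar{B})) \in T_V^\omega$.

It is convenient to treat separately the cases of th and in annotations. If we assume that $\alpha = \mathrm{th}[t_1, t_2]$, then

$(A, \mathrm{th}[t_1, t_2]) \in T_E^{h+1}$

$\iff$ {definition of $T_E^i$}

$(A, \mathrm{th}[t_1, t_2]) \in T_E(T_E^h)$

$\implies$ {Lemma 7 and $T_E^h$ is $E$-closed by Corollary 1}

$\{clause(E, A\,\mathrm{th}[t^i_1, t^i_2], (\bar{C}^i, \bar{B}^i))\}_{i \in \{1, \ldots, n\}} \subseteq T_V^\omega$
$\wedge\ T_E^h \models \bar{C}^i, \bar{B}^i$ for $i \in \{1, \ldots, n\}$
$\wedge\ \{\mathrm{th}[t^i_1, t^i_2]\}_{i \in \{1, \ldots, n\}}$ covering of $\mathrm{th}[t_1, t_2]$

$\implies$ {previous remark (14)}

$\{clause(E, A\,\mathrm{th}[t^i_1, t^i_2], (\bar{C}^i, \bar{B}^i))\}_{i \in \{1, \ldots, n\}} \subseteq T_V^\omega$
$\wedge\ demo(E, (\bar{C}^i, \bar{B}^i)) \in T_V^\omega$ for $i \in \{1, \ldots, n\}$
$\wedge\ \{\mathrm{th}[t^i_1, t^i_2]\}_{i \in \{1, \ldots, n\}}$ covering of $\mathrm{th}[t_1, t_2]$

$\implies$ {definition of $T_V$, clause (3) and $T_V^\omega$ is a fixpoint of $T_V$}

$demo(E, A\,\mathrm{th}[t^n_1, t^n_2]) \in T_V^\omega$
$\wedge\ \{clause(E, A\,\mathrm{th}[t^i_1, t^i_2], (\bar{C}^i, \bar{B}^i))\}_{i \in \{1, \ldots, n-1\}} \subseteq T_V^\omega$
$\wedge\ demo(E, (\bar{C}^i, \bar{B}^i)) \in T_V^\omega$ for $i \in \{1, \ldots, n-1\}$
$\wedge\ \{\mathrm{th}[t^i_1, t^i_2]\}_{i \in \{1, \ldots, n\}}$ covering of $\mathrm{th}[t_1, t_2]$

$\implies$ {definition of $T_V$, clause (4), Lemma 4 and $T_V^\omega$ is a fixpoint of $T_V$}

$demo(E, A\,\mathrm{th}[t^{n-1}_1, t^n_2]) \in T_V^\omega$
$\wedge\ \{clause(E, A\,\mathrm{th}[t^i_1, t^i_2], (\bar{C}^i, \bar{B}^i))\}_{i \in \{1, \ldots, n-2\}} \subseteq T_V^\omega$
$\wedge\ demo(E, (\bar{C}^i, \bar{B}^i)) \in T_V^\omega$ for $i \in \{1, \ldots, n-2\}$
$\wedge\ \{\mathrm{th}[t^i_1, t^i_2]\}_{i \in \{1, \ldots, n\}}$ covering of $\mathrm{th}[t_1, t_2]$

$\implies$ {by exploiting several times clause (4) as above}

$demo(E, A\,\mathrm{th}[t^1_1, t^n_2]) \in T_V^\omega \wedge \{\mathrm{th}[t^i_1, t^i_2]\}_{i \in \{1, \ldots, n\}}$ covering of $\mathrm{th}[t_1, t_2]$

$\implies$ {by definition of covering $D_C \models \mathrm{th}[t_1, t_2] \sqsubseteq \mathrm{th}[t^1_1, t^n_2]$ and Lemma 4}

$demo(E, A\,\mathrm{th}[t_1, t_2]) \in T_V^\omega$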


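Instead, if $\alpha = \mathrm{in}[t_1, t_2]$, then

$(A, \mathrm{in}[t_1, t_2]) \in T_E^{h+1}$

$\iff$ {definition of $T_E^i$}

$(A, \mathrm{in}[t_1, t_2]) \in T_E(T_E^h)$

$\implies$ {Lemma 8}

$clause(E, A\,\beta, (\bar{C}, \bar{B})) \in T_V^\omega \wedge T_E^h \models \bar{C}, \bar{B} \wedge D_C \models \mathrm{in}[t_1, t_2] \sqsubseteq \beta$

$\implies$ {previous remark (14)}

$clause(E, A\,\beta, (\bar{C}, \bar{B})) \in T_V^\omega \wedge demo(E, (\bar{C}, \bar{B})) \in T_V^\omega \wedge D_C \models \mathrm{in}[t_1, t_2] \sqsubseteq \beta$

$\implies$ {clause (3) or (6), and $T_V^\omega$ is a fixpoint of $T_V$}

$demo(E, A\,\beta) \in T_V^\omega \wedge D_C \models \mathrm{in}[t_1, t_2] \sqsubseteq \beta$

$\implies$ {Lemma 4}

$demo(E, A\,\mathrm{in}[t_1, t_2]) \in T_V^\omega$

We now prove the completeness of the meta-interpreter of the program expressions with respect to the least fixpoint semantics.

$(A, \alpha) \in F^C(E)$

$\implies$ {definition of $F^C(E)$}

$\exists \gamma \in Ann : (A, \gamma) \in T_E^\omega \wedge D_C \models \alpha \sqsubseteq \gamma$

$\implies$ {$T_E^\omega = \bigcup_{i \in \mathbb{N}} T_E^i$}

$\exists h : (A, \gamma) \in T_E^h \wedge D_C \models \alpha \sqsubseteq \gamma$

$\implies$ {statement (13)}

$demo(E, A\,\gamma) \in T_V^\omega \wedge D_C \models \alpha \sqsubseteq \gamma$

$\implies$ {Lemma 4}

$demo(E, A\,\alpha) \in T_V^\omega$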

Description Logics for Information Integration

Diego Calvanese, Giuseppe De Giacomo, and Maurizio Lenzerini

Dipartimento di Informatica e Sistemistica, Università di Roma "La Sapienza", Via Salaria 113, 00198 Roma, Italy
lastname@dis.uniroma1.it, http://www.dis.uniroma1.it/~lastname

Abstract. Information integration is the problem of combining the data residing at different, heterogeneous sources, and providing the user with a unified view of these data, called mediated schema. The mediated schema is therefore a reconciled view of the information, which can be queried by the user. It is the task of the system to free the user from the knowledge on where data are, and how data are structured at the sources. In this chapter, we discuss data integration in general, and describe a logic-based approach to data integration. A logic of the Description Logics family is used to model the information managed by the integration system, to formulate queries posed to the system, and to perform several types of automated reasoning supporting both the modeling and the query answering process. We focus, in particular, on a specific Description Logic, called DLR, specifically designed for database applications. In the chapter, we illustrate how DLR is used to model a mediated schema of an integration system, to specify the semantics of the data sources, and finally to support the query answering process by means of the associated reasoning methods.

A.C. Kakas, F. Sadri (Eds.): Computat. Logic (Kowalski Festschrift), LNAI 2408, pp. 41–60, 2002. © Springer-Verlag Berlin Heidelberg 2002

1 Introduction

Information integration is the problem of combining the data residing at different sources, and providing the user with a unified view of these data, called mediated schema. The mediated schema is therefore a reconciled view of the information, which can be queried by the user. It is the task of the data integration system to free the user from having to know where data are located and how they are structured at the sources.

Interest in this kind of system has grown continuously in recent years. Many organizations face the problem of integrating data residing in several sources. Companies that build a Data Warehouse, a Data Mining, or an Enterprise Resource Planning system must address this problem. Also, integrating data in the World Wide Web is currently the subject of several investigations and projects. Finally, applications that require accessing or re-engineering legacy systems must deal with the problem of integrating data stored in different sources.

The design of a data integration system is a very complex task, which comprises several different issues, including the following:

1. heterogeneity of the sources,
2. relation between the mediated schema and the sources,
3. limitations on the mechanisms for accessing the sources,
4. materialized vs. virtual integration,
5. data cleaning and reconciliation,
6. how to process queries expressed on the mediated schema.

Problem (1) arises because sources are typically heterogeneous, meaning that they adopt different models and systems for storing data. This poses challenging problems in specifying the mediated schema. The goal is to design such a schema so as to provide an appropriate abstraction of all the data residing at the sources. One aspect deserving special attention is the choice of the language used to express the mediated schema. Since such a schema should mediate among different representations of overlapping worlds, the language should provide flexible and powerful representation mechanisms. We refer to [34] for a more detailed discussion on this subject. Following the work in [32,16,40], in this paper we use a formalism of the family of Description Logics to specify mediated schemas.

With regard to Problem (2), two basic approaches have been used to specify the relation between the sources and the mediated schema. The first approach, called global-as-view (or query-based), requires that the mediated schema be expressed in terms of the data sources. More precisely, to every concept of the mediated schema, a view over the data sources is associated, so that its meaning is specified in terms of the data residing at the sources. The second approach, called local-as-view (or source-based), requires the mediated schema to be specified independently from the sources. The relationships between the mediated schema and the sources are established by defining every source as a view over the mediated schema. Thus, in the local-as-view approach, we specify the meaning of the sources in terms of the concepts in the mediated schema. It is clear that the latter approach favors the extensibility of the integration system, and provides a more appropriate setting for its maintenance. For example, adding a new source to the system only requires providing the definition of the source, and does not necessarily involve changes in the mediated schema.
On the contrary, in the global-as-view approach, adding a new source typically requires changing the definition of the concepts in the mediated schema. For this reason, in the rest of the paper, we adopt the local-as-view approach. A comparison between the two approaches is reported in [51].

Problem (3) refers to the fact that, both in the local-as-view and in the global-as-view approach, a source may present some limitations on the types of accesses it supports. A typical example is a web source accessible through a form where one of the fields must necessarily be filled in by the user. Such a situation can be modeled by specifying the source as a relation supporting only queries with a selection on a column. Suitable notations have been proposed for such situations [44], and the consequences of these access limitations on query processing in integration systems have been investigated in several papers [44,43,27,56,55,41,42].

Problem (4) deals with a further criterion that we should take into account in the design of a data integration system. In particular, with respect to the data explicitly managed by the system, we can follow two different approaches, called materialized and virtual. In the materialized approach, the system computes the extensions of the concepts in the mediated schema by replicating the data at the sources. In the virtual approach, data residing at the sources are accessed during query processing, but they are not replicated in the integration system. Obviously, in the materialized approach, the problem of refreshing the materialized views in order to keep them up-to-date is a major issue [34]. In the following, we only deal with the virtual approach.

Whereas the construction of the mediated schema concerns the intensional level of the data integration system, Problem (5) refers to a number of issues arising when considering the integration at the extensional/instance level. A first issue in this context is the interpretation and merging of the data provided by the sources. Interpreting data can be regarded as the task of casting them into a common representation. Moreover, the data returned by various sources need to be converted/reconciled/combined to provide the data integration system with the requested information. The complexity of this reconciliation step is due to several problems, such as possible mismatches between data referring to the same real world object, possible errors in the data stored in the sources, or possible inconsistencies between values representing the properties of the real world objects in different sources [28]. The above task is known in the literature as Data Cleaning and Reconciliation, and the interested reader is referred to [28,10,4] for more details on this subject.

Finally, Problem (6) is concerned with one of the most important issues in a data integration system, i.e., the choice of the method for computing the answer to queries posed in terms of the mediated schema.
While query answering in the global-as-view approach typically reduces to unfolding, an integration system based on the local-as-view approach must resort to more sophisticated query processing techniques. The main issue is that the system should be able to reexpress the query in terms of a suitable set of queries posed to the sources. In this reformulation process, the crucial step is deciding how to decompose the query on the mediated schema into a set of subqueries on the sources, based on the meaning of the sources in terms of the concepts in the mediated schema. The computed subqueries are then shipped to the sources, and the results are assembled into the final answer.

In the rest of this paper, we concentrate on Problem (6), namely, query processing in a data integration system specified by means of the local-as-view approach, and we present the following contributions:

– We first provide a logical formalization of the problem. In particular, we illustrate a general architecture for a data integration system, comprising a mediated schema, a set of views, and a query. Query processing in this setting is formally defined as the problem of answering queries using views: compute the answer to a query only on the basis of the extension of a set of views [1,29]. We observe that, besides data integration, this problem is relevant in several fields, including data warehousing [54], query optimization [17], supporting physical data independence [50], etc.
– Then we instantiate the general framework to the case where schemas, views and queries are expressed by making use of a particular logical language. In particular:
  • The mediated schema is expressed in terms of a knowledge base constituted by general inclusion assertions and membership assertions, formulated in an expressive Description Logic [6].
  • Queries and views are expressed as non-recursive datalog programs, whose predicates in the body are concepts or relations that appear in the knowledge base.
  • For each view, it can be specified whether the provided extension is sound, complete, or exact with respect to the view definition [1,11]. Such assumptions are used in data integration with the following meaning. A sound view corresponds to an information source which is known to produce only, but not necessarily all, the answers to the associated query. A complete view models a source which is known to produce all answers to the associated query, and maybe more. Finally, an exact view is known to produce exactly the answers to the associated query.
– We then illustrate a technique for the problem of answering queries using views in our setting. We first describe how to formulate the problem in terms of logical implication, and then we present a technique to check logical implication in 2EXPTIME worst case complexity.

The paper is organized as follows. Section 2 presents the general framework. Section 3 illustrates the use of Description Logics for setting up a particular architecture for data integration, according to the general framework. Section 4 presents the method we use for query answering using views in our architecture. Section 5 describes other work on the problem of answering queries using views. Finally, Section 6 concludes the paper.

2 Framework

In this section we set up a logical framework for data integration. Since we assume that we work with relational databases, in the following we refer to a relational alphabet $A$, i.e., an alphabet constituted by a set of predicate and constant symbols. Predicate symbols are used to denote the relations in the database, whereas constant symbols denote the objects stored in relations. We adopt the so-called unique name assumption, i.e., we assume that different constants denote different objects. A database $DB$ is simply a set of relations, one for each predicate symbol in the alphabet $A$. The relation corresponding to the predicate symbol $R_i$ is constituted by a set of tuples of constants, which specify the objects that satisfy the relation associated to $R_i$.

The main components of a data integration system are the mediated schema, the sources, and the queries. Each component is expressed in a specific language over the alphabet $A$:

– the mediated schema is expressed in the schema language $L_S$,
– the sources are modeled as views over the mediated schema, expressed in the view language $L_V$,
– queries are issued over the mediated schema, and are expressed in the query language $L_Q$.

In what follows, we provide a specification of the three components of a data integration system.

– The mediated schema $S$ is a set of constraints, each one expressed in the language $L_S$ over the alphabet $A$. The language $L_S$ determines the expressiveness allowed for specifying the schema of our database, i.e., the constraints that the database must satisfy. If $S$ is constituted by the constraints $\{C_1, \ldots, C_n\}$, we say that a database $DB$ satisfies $S$ if all constraints $C_1, \ldots, C_n$ are satisfied by $DB$.
– The sources are modeled in terms of a set of views $V = \{V_1, \ldots, V_m\}$ over the mediated schema. Associated to each view $V_i$ we have:
  • A definition $def(V_i)$ in terms of a query $V_i(x) \leftarrow v_i(x, y)$ over $DB$, where $v_i(x, y)$ is expressed in the language $L_V$ over the alphabet $A$. The arity of $x$ determines the arity of the view $V_i$.
  • A set $ext(V_i)$ of tuples of constants, which provides the information about the extension of $V_i$, i.e., the content of the sources. The arity of each tuple is the same as that of $V_i$.
  • A specification $as(V_i)$ of which assumption to adopt for the view $V_i$, i.e., how to interpret the content of the source $ext(V_i)$ with respect to the actual set of tuples in $DB$ that satisfy $V_i$. We describe below the various possibilities that we consider for $as(V_i)$.
– A query is expressed in the language $L_Q$ over the alphabet $A$, and is intended to provide the specification of which data to extract from the virtual database represented in the integration system. In general, if $Q$ is a query and $DB$ is a database satisfying $S$, we denote with $ans(Q, DB)$ the set of tuples in $DB$ that satisfy $Q$.
The specification as(Vi) determines how accurate the knowledge about the tuples satisfying the views is, i.e., how accurate the source is with respect to the specification def(Vi) (see footnote 1). As pointed out in several papers [1,29,37,11], the following three assumptions are relevant in a data integration system:

– Sound Views. When a view Vi is sound (denoted with as(Vi) = sound), its extension provides any subset of the tuples satisfying the corresponding definition. In other words, from the fact that a tuple is in ext(Vi) one can conclude that it satisfies the view, while from the fact that a tuple is not in ext(Vi) one cannot conclude that it does not satisfy the view. Formally, a database DB is coherent with the sound view Vi if ext(Vi) ⊆ ans(def(Vi), DB).

Footnote 1. In some papers, for example [11], different assumptions on the domain of the database are also taken into account.


Diego Calvanese, Giuseppe De Giacomo, and Maurizio Lenzerini

– Complete Views. When a view Vi is complete (denoted with as(Vi) = complete), its extension provides any superset of the tuples satisfying the corresponding definition. In other words, from the fact that a tuple is in ext(Vi) one cannot conclude that such a tuple satisfies the view. On the other hand, from the fact that a tuple is not in ext(Vi) one can conclude that such a tuple does not satisfy the view. Formally, a database DB is coherent with the complete view Vi if ext(Vi) ⊇ ans(def(Vi), DB).
– Exact Views. When a view Vi is exact (denoted with as(Vi) = exact), its extension is exactly the set of tuples of objects satisfying the corresponding definition. Formally, a database DB is coherent with the exact view Vi if ext(Vi) = ans(def(Vi), DB).

The ultimate goal of a data integration system is to allow a client to extract information from the database, taking into account that the only knowledge the client has about the database is the extension of the set of views, i.e., the content of the sources. More precisely, the problem of extracting information from the data integration system reduces to the problem of answering queries using views. Given

– a schema S,
– a set of views V = {V1, ..., Vm}, with, for each Vi,
  • its definition def(Vi),
  • its extension ext(Vi), and
  • the specification as(Vi) of whether it is sound, complete, or exact,
– a query Q of arity n, and
– a tuple d = (d1, ..., dn) of constants,

the problem consists in deciding whether d ∈ ans(Q, S, V), i.e., deciding whether (d1, ..., dn) ∈ ans(Q, DB), for each DB such that:

– DB satisfies the schema S,
– DB is coherent with V1, ..., Vm.

From the above definition, it is easy to see that answering queries using views is essentially an extended form of reasoning in the presence of incomplete information [53]. Indeed, when we answer the query on the basis of the views, we know only the extensions of the views, and this provides us with only partial information on the database.
Moreover, since the query language may admit various forms of incomplete information (due to union, for instance), there are in general several possible databases that are coherent with the views. The following example rephrases an example given in [1].

Example 1. Consider a relational alphabet containing (among other symbols) a binary predicate couple, and two constants Ann and Bill. Consider also two views female and male, respectively with definitions

    female(f) ← couple(f, m)
    male(m) ← couple(f, m)

and extensions ext(female) = {Ann} and ext(male) = {Bill}, and assume that there are no constraints imposed by a schema. If both views are sound, we only know that some couple has Ann as its female component and that some (possibly different) couple has Bill as its male component. Therefore, the query Qc(x, y) ← couple(x, y) asking for all couples would return an empty answer, i.e., ans(Qc, S, V) = ∅. However, if both views are exact, we can conclude that all couples have Ann as their female component and Bill as their male component, and hence that (Ann, Bill) is the only couple, i.e., ans(Qc, S, V) = {(Ann, Bill)}.
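The reasoning of Example 1 can be checked by brute force on a tiny alphabet. The following is only an illustrative sketch, not a practical algorithm: it assumes the alphabet contains exactly three constants (the third one, "Carol", is hypothetical and stands for the "other symbols"), enumerates every possible extension of couple, keeps the databases coherent with both views, and intersects the answers to Qc. The function names are introduced here for illustration.

```python
from itertools import combinations, product

# Assumption of this sketch: the alphabet has exactly these constants.
# "Carol" is a hypothetical extra constant standing for the "other symbols".
CONSTANTS = ["Ann", "Bill", "Carol"]


def all_databases(constants):
    """Yield every possible extension of the binary predicate couple."""
    pairs = list(product(constants, repeat=2))
    for size in range(len(pairs) + 1):
        for subset in combinations(pairs, size):
            yield frozenset(subset)


def coherent(answer, extension, assumption):
    """Check whether a view extension is coherent with the view's answer."""
    if assumption == "sound":
        return extension <= answer
    if assumption == "complete":
        return extension >= answer
    if assumption == "exact":
        return extension == answer
    raise ValueError(f"unknown assumption: {assumption}")


def certain_couples(assumption):
    """Intersect ans(Qc, DB) over all databases coherent with both views."""
    ext_female, ext_male = {"Ann"}, {"Bill"}
    certain = None
    for couple in all_databases(CONSTANTS):
        female = {f for f, _ in couple}  # female(f) <- couple(f, m)
        male = {m for _, m in couple}    # male(m) <- couple(f, m)
        if coherent(female, ext_female, assumption) and coherent(
            male, ext_male, assumption
        ):
            certain = set(couple) if certain is None else certain & couple
    return certain


print(certain_couples("sound"))  # set()
print(certain_couples("exact"))  # {('Ann', 'Bill')}
```

Under the sound assumption both {(Ann, Bill)} and {(Ann, Ann), (Bill, Bill)} are coherent databases, so no couple is common to all of them; under the exact assumption only {(Ann, Bill)} survives.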

3 Specifying the Content of the Data Integration System

We propose here an architecture for data integration that is coherent with the framework described in Section 2, and is based on Description Logics [9,8]. In such an architecture, to specify mediated schemas, views, and queries we use the Description Logic DLR [6]. We first introduce DLR, and then we illustrate how we use the logic to specify the three components of a data integration system.

3.1 The Description Logic DLR

Description Logics (DLs; see footnote 2) were introduced in the early 1980s in an attempt to provide a formal ground to Semantic Networks and Frames. Since then they have evolved into knowledge representation languages that are able to capture virtually all class-based representation formalisms used in Artificial Intelligence, Software Engineering, and Databases. One of the distinguishing features of the work on these logics is the detailed computational complexity analysis both of the associated reasoning algorithms, and of the logical implication problem that the algorithms are supposed to solve. By virtue of this analysis, most of these logics have optimal reasoning algorithms, and practical systems implementing such algorithms are now used in several projects.

In DLs, the domain of interest is modeled by means of concepts and relations, which denote classes of objects and relationships, respectively. Here, we focus our attention on the DL DLR [5,6]. The basic elements of DLR are concepts (unary relations), and n-ary relations. We assume an alphabet A consisting of a finite set of atomic relations, atomic concepts, and constants, denoted by P, A, and a, respectively. We use R to denote arbitrary relations (of given arity between 2 and nmax), and C to denote arbitrary concepts, respectively built according to the following syntax:

    R ::= ⊤n | P | ($i/n : C) | ¬R | R1 ⊓ R2
    C ::= ⊤1 | A | ¬C | C1 ⊓ C2 | ∃[$i]R | (≤ k [$i]R)

where i denotes a component of a relation, i.e., an integer between 1 and nmax, n denotes the arity of a relation, i.e., an integer between 2 and nmax, and k denotes a nonnegative integer. We also use the following abbreviations:

– ⊥ for ¬⊤1,
– C1 ⊔ C2 for ¬(¬C1 ⊓ ¬C2),
– C1 ⇒ C2 for ¬C1 ⊔ C2, and
– C1 ≡ C2 for (C1 ⇒ C2) ⊓ (C2 ⇒ C1).

    ⊤n^I            ⊆  (∆^I)^n
    P^I             ⊆  ⊤n^I
    ($i/n : C)^I    =  {(d1, ..., dn) ∈ ⊤n^I | di ∈ C^I}
    (¬R)^I          =  ⊤n^I \ R^I
    (R1 ⊓ R2)^I     =  R1^I ∩ R2^I

    ⊤1^I            =  ∆^I
    A^I             ⊆  ∆^I
    (¬C)^I          =  ∆^I \ C^I
    (C1 ⊓ C2)^I     =  C1^I ∩ C2^I
    (∃[$i]R)^I      =  {d ∈ ∆^I | ∃(d1, ..., dn) ∈ R^I . di = d}
    (≤ k [$i]R)^I   =  {d ∈ ∆^I | ♯{(d1, ..., dn) ∈ R^I | di = d} ≤ k}

Fig. 1. Semantic rules for DLR (P, R, R1, and R2 have arity n)

Footnote 2. See http://dl.kr.org for the home page of Description Logics.

We consider only concepts and relations that are well-typed, which means that

– only relations of the same arity n are combined to form expressions of type R1 ⊓ R2 (which inherit the arity n), and
– i ≤ n whenever i denotes a component of a relation of arity n.

The semantics of DLR is specified as follows. An interpretation I consists of an interpretation domain ∆^I and an interpretation function ·^I that assigns to each constant an element of ∆^I under the unique name assumption, to each concept C a subset C^I of ∆^I, and to each relation R of arity n a subset R^I of (∆^I)^n, such that the conditions in Figure 1 are satisfied. Observe that the "¬" constructor on relations is used to express difference of relations, and not the complement [6].

3.2 Mediated Schema, Views, and Queries

We remind the reader that a mediated schema consists of a finite set of constraints expressed in a schema language LS. In our setting, the schema language LS is based on the DL DLR. In particular, each constraint is formulated as an assertion of one of the following forms:

    R1 ⊑ R2        C1 ⊑ C2

where R1 and R2 are DLR relations of the same arity, and C1 and C2 are DLR concepts. As we said before, a database DB is a set of relations, one for each predicate symbol in the alphabet A. We denote with R^DB the relation in DB corresponding


to the predicate symbol R (either an atomic concept, or an atomic relation). Note that a database can be seen as an interpretation for DLR, whose domain coincides with the set of constants in the alphabet A. We say that a database DB satisfies an assertion R1 ⊑ R2 (resp., C1 ⊑ C2) if R1^DB ⊆ R2^DB (resp., C1^DB ⊆ C2^DB). Moreover, DB satisfies a schema S if DB satisfies all assertions in S.

In order to define views and queries, we now introduce the notion of query expression in our setting. We assume that the alphabet A is enriched with a finite set of variable symbols, simply called variables. A query expression Q is a non-recursive datalog query of the form

    Q(x) ← conj1(x, y1) ∨ ··· ∨ conjm(x, ym)

where each conji(x, yi) is a conjunction of atoms, and x, yi are all the variables appearing in the conjunct. Each atom has one of the forms R(t) or C(t), where R is a relation, C is a concept, and the argument is a tuple of terms for R and a single term for C, each term being a variable in x or yi or a constant in A. The number of variables of x is called the arity of Q, and is the arity of the relation denoted by the query Q.

We observe that the atoms in the query expressions are arbitrary DLR relations and concepts, freely used in the assertions of the knowledge base (KB), i.e., the schema. This distinguishes our approach from that of [22,39], where no constraints on the relations that appear in the queries can be expressed in the KB.

Given a database DB, a query expression Q of arity n is interpreted as the set Q^DB of n-tuples of constants (c1, ..., cn), such that, when substituting each ci for xi, the formula

    ∃y1.conj1(x, y1) ∨ ··· ∨ ∃ym.conjm(x, ym)

evaluates to true in DB.

With the introduction of query expressions, we can now define views and queries. Indeed, in our setting, query expressions constitute both the view language LV and the query language LQ:

– Associated with each view Vi in the set V = {V1, ..., Vm} we have:
  • A definition def(Vi) in terms of a query expression,
  • A set ext(Vi) of tuples of constants,
  • A specification as(Vi) of which assumption to adopt for the view Vi, where each as(Vi) is either sound, complete, or exact.
– A query is simply a query expression, as defined above.

Example 2. Consider for example the following DLR schema Sd, expressing that Americans who have a doctor as relative are wealthy, and that each surgeon is also a doctor

    American ⊓ ∃[$1](RELATIVE ⊓ $2 : Doctor) ⊑ Wealthy
    Surgeon ⊑ Doctor


and two sound views V1 and V2, respectively with definitions

    V1(x) ← RELATIVE(x, y) ∧ Surgeon(y)
    V2(x) ← American(x)

and extensions

    ext(V1) = {Ann, Bill}
    ext(V2) = {Ann, Dan}

Given the query Qw(x) ← Wealthy(x), asking for those who are wealthy, we have that the only constant in ans(Qw, Sd, V) is Ann. Moreover, if we add an exact view V3 with definition V3(x) ← Wealthy(x), and an extension ext(V3) not containing Bill, then, from the constraints in Sd and the information we have on the views, we can conclude that Bill is not American.

3.3 Discussion

We observe that DLR is able to capture a great variety of data models with many forms of constraints [15,6]. For example, DLR is capable of formally capturing Conceptual Data Models typically used in databases [33,24], such as the Entity-Relationship Model [18]. Hence, in our setting, query answering using views is done under the constraints imposed by a conceptual data model. The interest in DLR is not confined to the expressiveness it provides for specifying data schemas. It is also equipped with effective reasoning techniques that are sound and complete with respect to the semantics. In particular, checking whether a given assertion logically follows from a set of assertions is EXPTIME-complete in DLR (assuming that numbers are encoded in unary), and query containment, i.e., checking whether one query is contained in another one in every model of a set of assertions, is EXPTIME-hard and solvable in 2EXPTIME [6].
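To make the interpretation Q^DB of a query expression from Section 3.2 more concrete, the following sketch evaluates a union of conjunctions over a given database by naive enumeration of variable assignments. It is a simplification under stated assumptions: atoms use only atomic concepts and relations (complex DLR expressions are not handled), variables are written with a leading "?", and the database contents and the constants Carl and Dora are hypothetical.

```python
from itertools import product


def is_var(term):
    # Convention of this sketch only: variables are strings starting with "?".
    return term.startswith("?")


def eval_query(head_vars, disjuncts, db, constants):
    """Evaluate Q(x) <- conj_1 v ... v conj_m over db by naive enumeration."""
    answers = set()
    for conj in disjuncts:
        # Collect the variables of this conjunct (head variables first).
        variables = list(head_vars)
        for _, args in conj:
            for term in args:
                if is_var(term) and term not in variables:
                    variables.append(term)
        # Try every assignment of constants to the variables.
        for values in product(constants, repeat=len(variables)):
            env = dict(zip(variables, values))
            if all(
                tuple(env.get(t, t) for t in args) in db.get(pred, set())
                for pred, args in conj
            ):
                answers.add(tuple(env[v] for v in head_vars))
    return answers


# Hypothetical database: each predicate maps to a set of tuples of constants.
db = {
    "RELATIVE": {("Ann", "Carl"), ("Bill", "Dora")},
    "Surgeon": {("Carl",)},
    "American": {("Dora",)},
}
constants = ["Ann", "Bill", "Carl", "Dora"]

# V1(x) <- RELATIVE(x, y) ^ Surgeon(y)
v1 = [[("RELATIVE", ("?x", "?y")), ("Surgeon", ("?y",))]]
# Q(x) <- (RELATIVE(x, y) ^ Surgeon(y)) v American(x)
q_union = v1 + [[("American", ("?x",))]]

print(sorted(eval_query(["?x"], v1, db, constants)))       # [('Ann',)]
print(sorted(eval_query(["?x"], q_union, db, constants)))  # [('Ann',), ('Dora',)]
```

Note that this evaluates a query over one fully known database; answering queries using views, by contrast, must consider all databases that satisfy the schema and are coherent with the views.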

4 Query Answering

In this section we study the problem of query answering using views in the setting just defined: the schema is expressed as a DLR knowledge base, and queries and view definitions are expressed as DLR query expressions. We call the resulting problem answering queries using views in DLR. The technical results regarding answering queries using views in DLR illustrated in this section are taken from [7]. The first thing to observe is that, given a schema S expressed in DLR, a set of views V = {V1, ..., Vm}, a query Q, and a tuple d = (d1, ..., dn) of constants, verifying whether d is in ans(Q, S, V) is essentially a form of logical implication. This observation can be made even sharper if we introduce special assertions, expressed in first-order logic with equality, that encode as logical formulas the extension of the views. In particular, for each view V ∈ V, with def(V) = (V(x) ← v(x, y)) and ext(V) = {a1, ..., ak}, we introduce the following assertions.


– If V is sound, then for each tuple ai, 1 ≤ i ≤ k, we introduce the existentially quantified assertion

    ∃y.v(ai, y)

– If V is complete, then we introduce the universally quantified assertion

    ∀x.∀y.((x ≠ a1 ∧ ··· ∧ x ≠ ak) → ¬v(x, y))

– If V is exact, then, according to the definition, we treat it as a view that is both sound and complete, and introduce both types of assertions above.

Let us call Ext(V) the set of assertions corresponding to the extension of the views V. Now, the problem of query answering using views in DLR, i.e., checking whether d ∈ ans(Q, S, V), can be reformulated as checking whether the following logical implication holds:

    S ∪ Ext(V) |= ∃y.q(d, y)

where q(x, y) is the right-hand side of Q. Checking such a logical implication can in turn be rephrased as checking the unsatisfiability of

    S ∪ Ext(V) ∪ {∀y.¬q(d, y)}

Observe that the assertion ∀y.¬q(d, y) has the same form as the universal assertion used for expressing extensions of complete views, except that the antecedent in the implication is empty.

The problem with the newly introduced assertions is that they are not yet expressed in a DL. The next step is to translate them into a DL. Instead of working directly with DLR, we are going to translate the problem of query answering using views in DLR to reasoning in a DL, called CIQ, that directly corresponds to a variant of Propositional Dynamic Logic [20,6].

4.1 The Description Logic CIQ

The DL CIQ is obtained from DLR by restricting relations to be binary (such relations are called roles and inverse roles) and allowing for complex roles corresponding to regular expressions [20]. Concepts of CIQ are formed according to the following abstract syntax:

    C ::= ⊤ | A | C1 ⊓ C2 | ¬C | ∃R.C | (≤ k Q.C)
    Q ::= P | P⁻
    R ::= Q | R1 ⊔ R2 | R1 ◦ R2 | R* | R⁻ | id(C)

where A denotes an atomic concept, C a generic concept, P an atomic role, Q a simple role, i.e., either an atomic role or the inverse of an atomic role, and R a generic role. We also use the following abbreviations:

– ∀R.C for ¬∃R.¬C,
– (≥ k Q.C) for ¬(≤ k−1 Q.C).

The semantic conditions for CIQ are specified in Figure 2 (see footnote 3).

    A^I             ⊆  ∆^I
    ⊤^I             =  ∆^I
    (¬C)^I          =  ∆^I \ C^I
    (C1 ⊓ C2)^I     =  C1^I ∩ C2^I
    (∃R.C)^I        =  {d ∈ ∆^I | ∃(d, d′) ∈ R^I . d′ ∈ C^I}
    (≤ k Q.C)^I     =  {d ∈ ∆^I | ♯{(d, d′) ∈ Q^I | d′ ∈ C^I} ≤ k}

    P^I             ⊆  ∆^I × ∆^I
    (R1 ⊔ R2)^I     =  R1^I ∪ R2^I
    (R1 ◦ R2)^I     =  R1^I ◦ R2^I
    (R*)^I          =  (R^I)* = ⋃ i≥0 (R^I)^i
    (R⁻)^I          =  {(d1, d2) ∈ ∆^I × ∆^I | (d2, d1) ∈ R^I}
    id(C)^I         =  {(d, d) ∈ ∆^I × ∆^I | d ∈ C^I}

Fig. 2. Semantic rules for CIQ

Footnote 3. The notation (R^I)^i stands for i repetitions of R^I, i.e., (R^I)^1 = R^I, and (R^I)^i = R^I ◦ (R^I)^(i−1).

The use of CIQ allows us to exploit various results established recently for reasoning in such a logic. The basis of these results lies in the correspondence between CIQ and a variant of Propositional Dynamic Logic [26,35] that includes converse programs and "graded modalities" [25,52] on atomic programs and their converse [47]. CIQ inherits from Propositional Dynamic Logics the ability of internalizing assertions. Indeed, one can define a role U that essentially corresponds to a universal modality, as the reflexive-transitive closure of all roles and inverse roles in the language. Using such a universal modality we can re-express each assertion C1 ⊑ C2 as the concept ∀U.(C1 ⇒ C2). This allows us to re-express logical implication as concept satisfiability [47]. Concept satisfiability (and hence logical implication) in CIQ is EXPTIME-complete [20].

Although CIQ, unlike DLR, does not have constructs for n-ary relations, it is possible to represent n-ary relations in a sound and complete way with respect to concept satisfiability (and hence logical implication) by means of reification [20]. An atomic relation P is reified by introducing a new atomic concept AP and n functional roles f1, ..., fn, one for each component of P. In this way, a tuple of the relation is represented by an instance of the corresponding concept, which is linked through each of the associated roles to an object representing the component of the tuple. Performing the reification requires however some attention, since a relation may not contain two equal tuples (i.e., consisting of the same components in the same positions) in its extension. In the reified counterpart, on the other hand, one cannot explicitly rule out (e.g., by using specific assertions) that there are two objects o1 and o2 "representing" the same tuple, i.e., that are connected to exactly the same objects denoting the components of


the tuple. However, due to the fundamental inability of CIQ to express that two role sequences meet in the same object, no CIQ concept can force such a situation. Therefore one does not need to take this constraint explicitly into account when reasoning.

Finally, we are going to make use of CIQ extended with object-names. An object-name is an atomic concept that, in each model, has as extension a single object. Object-names are not required to be disjoint, i.e., we do not make the unique name assumption on them. Disjointness can be explicitly enforced when needed through explicit assertions. In general, adding object-names to CIQ makes reasoning NEXPTIME-hard [49]. However, our use of object-names in CIQ is restricted so as to keep reasoning in EXPTIME.

4.2 Reduction of Answering Queries Using Views in DLR to CIQ Unsatisfiability

We tackle answering queries using views in DLR by reducing the problem of checking whether d ∈ ans(Q, S, V) to the problem of checking the unsatisfiability of a CIQ concept in which object-names appear. Object-names are then eliminated, thus obtaining a CIQ concept.

We translate S ∪ Ext(V) into a CIQ concept as follows. First, we eliminate n-ary relations by means of reification. Then, we reformulate each assertion in S as a concept by internalizing assertions. Representing the assertions in Ext(V), instead, requires the following ad-hoc techniques.

We translate each existentially quantified assertion

    ∃y.v(a, y)

as follows. We represent every constant ai by an object-name Nai, enforcing disjointness between the object-names corresponding to different constants. We represent each existentially quantified variable y, treated as a Skolem constant, by a new object-name without disjointness constraints. We also use additional concept-names representing tuples of objects. Specifically:

– An atom C(t), where C is a concept and t is a term (either a constant or a variable), is translated to

    ∀U.(Nt ⇒ σ(C))

  where σ(C) is the reified counterpart of C, Nt is the object-name corresponding to t, and U is the reflexive-transitive closure of all roles and inverse roles introduced in the reification.
– An atom R(t), where R is a relation of arity n and t = (t1, ..., tn) is a tuple of terms, is translated to the conjunction of the following concepts:

    ∀U.(Nt ⇒ σ(R))

  where σ(R) is the reified counterpart of R and Nt is an object-name corresponding to t,

    ∀U.(Nt ≡ (∃f1.Nt1 ⊓ ··· ⊓ ∃fn.Ntn))


  and for each i, 1 ≤ i ≤ n, a concept

    ∀U.(Nti ⇒ ((∃fi⁻.Nt) ⊓ (≤ 1 fi⁻.Nt)))

Then, the translations of the atoms are combined as in v(a, y).

To translate universally quantified assertions corresponding to the complete views and also to the query, it is sufficient to deal with assertions of the form:

    ∀x.∀y.((x ≠ a1 ∧ ··· ∧ x ≠ ak) → ¬conj(x, y))

Following [6], we construct for conj(x, y) a special graph, called tuple-graph, which reflects the dependencies between variables. Specifically, the tuple-graph is used to detect cyclic dependencies. In general, the tuple-graph is composed of ℓ ≥ 1 connected components. For the i-th connected component we build a CIQ concept δi(x, y) as in [6]. Such a concept contains newly introduced concepts Ax and Ay, one for each x in x and y in y. We have to treat variables in x and y that occur in a cycle in the tuple-graph differently from those outside of cycles. Let xc (resp., yc) denote the variables in x (resp., y) that occur in a cycle, and xl (resp., yl) those that do not occur in cycles. We first define the concept C[xc/s, yc/t] as the concept obtained from

    (∀U.¬δ1(x, y)) ⊓ ··· ⊓ (∀U.¬δℓ(x, y))

as follows:

– for each variable xi in xc (resp., yi in yc), the concept Axi (resp., Ayi) is replaced by Nsi (resp., Nti);
– for each variable yi in yl, the concept Ayi is replaced by ⊤.

Then the concept corresponding to the universally quantified assertion is constructed as the conjunction of:

– ∀U.Cxl, where Cxl is obtained from x ≠ a1 ∧ ··· ∧ x ≠ ak by replacing each (x ≠ a) with (Ax ≡ ¬Na). Observe that (x1, ..., xn) ≠ (a1, ..., an) is an abbreviation for (x1 ≠ a1 ∨ ··· ∨ xn ≠ an).
– One concept C[xc/s, yc/t] for each possible instantiation of s and t with the constants in Ext(V) ∪ {d}, with the proviso that s cannot coincide with any of the ai, for 1 ≤ i ≤ k (notice that the proviso applies only in the case where all variables in x occur in a cycle in the tuple-graph).

The critical point in the above construction is how to express a universally quantified assertion

    ∀x.∀y.((x ≠ a1 ∧ ··· ∧ x ≠ ak) → ¬conj(x, y))

If there are no cycles in the corresponding tuple-graph, then we can directly translate the assertion into a CIQ concept. As shown in the construction above,


dealing with a nonempty antecedent requires some special care to correctly encode the exceptions to the universal rule. If instead there is a cycle, then, due to the fundamental inability of CIQ to express that two role sequences meet in the same object, no CIQ concept can directly express the universal assertion. The same inability, however, is shared by DLR. Hence we can assume that the only cycles present in a model are those formed by the constants in the extension of the views or those in the tuple for which we are checking whether it is a certain answer of the query. These are taken care of by the explicit instantiation.

As the last step to obtain a CIQ concept, we need to encode object-names in CIQ. To do so we can exploit the construction used in [21] to encode CIQ ABoxes as concepts. Such a construction applies to the current case without any need of major adaptation. It is crucial to observe that the translation above uses object-names in order to form a sort of disjunction of ABoxes (cf. [31]).

In [7], the following basic fact is proved for the construction presented above. Let Cqa be the CIQ concept obtained by the construction above. Then d ∈ ans(Q, S, V) if and only if Cqa is unsatisfiable. The size of Cqa is polynomial in the size of the query, of the view definitions, and of the inclusion assertions in S, and is at most exponential in the number of constants in ext(V) ∪ {d}. The exponential blow-up is due to the number of instantiations of C[xc/s, yc/t] with constants in ext(V) ∪ {d} that are needed to capture universally quantified assertions. Hence, considering EXPTIME-completeness of satisfiability in DLR and in CIQ, we get that query answering using views in DLR is EXPTIME-hard and can be done in 2EXPTIME.
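Returning to the first-order assertions Ext(V) introduced at the beginning of this section, the following sketch shows how they can be generated mechanically from a view definition, its extension, and its assumption. It only produces the formulas as strings and does not perform the translation to CIQ; the function names and the representation of view bodies as lists of (predicate, arguments) pairs are assumptions made here for illustration.

```python
def atom_str(pred, args, subst):
    """Render one atom, replacing variables according to subst."""
    return f"{pred}({', '.join(subst.get(a, a) for a in args)})"


def body_str(body, subst=None):
    """Render a conjunction of atoms."""
    subst = subst or {}
    return " ∧ ".join(atom_str(p, args, subst) for p, args in body)


def neq_str(head_vars, tup):
    """Render x ≠ a; for tuples this is a disjunction of componentwise ≠."""
    parts = [f"{x} ≠ {a}" for x, a in zip(head_vars, tup)]
    return parts[0] if len(parts) == 1 else "(" + " ∨ ".join(parts) + ")"


def ext_assertions(head_vars, exist_vars, body, extension, assumption):
    """Return the assertions of Ext(V) for a single view, as strings."""
    out = []
    if assumption in ("sound", "exact"):
        # One existentially quantified assertion per tuple in the extension.
        prefix = "".join(f"∃{y}." for y in exist_vars)
        for tup in extension:
            subst = dict(zip(head_vars, tup))
            out.append(f"{prefix}({body_str(body, subst)})")
    if assumption in ("complete", "exact"):
        # One universally quantified assertion listing the exceptions.
        prefix = "".join(f"∀{v}." for v in head_vars + exist_vars)
        if extension:
            antecedent = " ∧ ".join(neq_str(head_vars, t) for t in extension)
            out.append(f"{prefix}(({antecedent}) → ¬({body_str(body)}))")
        else:
            # Empty antecedent, as for the negated query.
            out.append(f"{prefix}¬({body_str(body)})")
    return out


# V1(x) <- RELATIVE(x, y) ^ Surgeon(y), sound, ext(V1) = {Ann, Bill}
v1 = ext_assertions(
    ["x"], ["y"],
    [("RELATIVE", ("x", "y")), ("Surgeon", ("y",))],
    [("Ann",), ("Bill",)],
    "sound",
)
# V3(x) <- Wealthy(x), exact, with the hypothetical extension {Ann}
v3 = ext_assertions(["x"], [], [("Wealthy", ("x",))], [("Ann",)], "exact")

for assertion in v1 + v3:
    print(assertion)
```

For the exact view the sketch emits both the existential assertion for Ann and the universal assertion excluding every other object, mirroring the treatment of exact views as both sound and complete.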

5 Related Work

We already observed that query answering using views can be seen as a form of reasoning with incomplete information. The interested reader is referred to [53] for a survey on this subject. We also observe that, to compute the whole set ans(Q, S, V), we need to run the algorithm presented above once for each possible tuple (of the arity of Q) of objects in the view extensions. Since we are dealing with incomplete information in a rich language, we should not expect to do much better than considering each tuple of objects separately. Indeed, in such a setting reasoning on objects, such as query answering, requires sophisticated forms of logical inference. In particular, verifying whether a certain tuple belongs to a query gives rise to a line of reasoning which may depend on the tuple under consideration, and which may vary substantially from one tuple to another. For simple languages we may indeed avoid considering tuples individually, as shown in [45] for query answering in the DL ALN without cyclic TBox assertions. Observe, however, that for such a DL, reasoning on objects is polynomial in both data and expression complexity [36,46], and does not require sophisticated forms of inference. Query answering using views has been investigated in recent years in the context of simplified frameworks. In [38,44], the problem has been studied for the


case of conjunctive queries (with or without arithmetic comparisons), in [2] for disjunctive views, in [48,19,30] for queries with aggregates, in [23] for recursive queries and nonrecursive views, and in [11,12] for several variants of regular path queries. Comprehensive frameworks for view-based query answering, as well as several interesting results for various query languages, are presented in [29,1].

Query answering using views is tightly related to query rewriting [38,23,51]. In particular, [3] studies rewriting of conjunctive queries using conjunctive views whose atoms are DL concepts or roles (the DL used is less expressive than DLR). In general, a rewriting of a query with respect to a set of views is a function that, given the extensions of the views, returns a set of tuples that is contained in the answer set of the query with respect to the views. Usually, one fixes a priori the language in which to express rewritings (e.g., unions of conjunctive queries), and then looks for the best possible rewriting expressible in such a language. On the other hand, we may call perfect a rewriting that returns exactly the answer set of the query with respect to the views, independently of the language in which it is expressed. Hence, if an algorithm for answering queries using views exists, it can be viewed as a perfect rewriting [13,14]. The results presented here show the existence of perfect, and hence maximal, rewritings in a setting where the mediated schema, the views, and the query are expressed in DLR.

6 Conclusions

We have illustrated a logic-based framework for data integration, and in particular for the problem of query answering using views in a data integration system. We have addressed the problem for the case of non-recursive datalog queries posed to a mediated schema expressed in DLR. We have considered different assumptions on the view extensions (sound, complete, and exact), and we have presented a technique that solves the problem in 2EXPTIME worst case computational complexity. We have seen in the previous section that an algorithm for answering queries using views is in fact a perfect rewriting. For the setting presented here, it remains open to ﬁnd perfect rewritings expressed in a more declarative query language. Moreover it is of interest to ﬁnd maximal rewritings belonging to well behaved query languages, in particular, languages with polynomial data complexity, even though we already know that such rewritings cannot be perfect [13].

Acknowledgments

The work presented here was partly supported by the ESPRIT LTR Project No. 22469 DWQ – Foundations of Data Warehouse Quality, and by MURST Cofin 2000 D2I – From Data to Integration. We wish to thank all members of the projects. Also, we thank Daniele Nardi, Riccardo Rosati, and Moshe Y. Vardi, who contributed to several ideas illustrated in the chapter.


References

1. Serge Abiteboul and Oliver Duschka. Complexity of answering queries using materialized views. In Proc. of the 17th ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems (PODS'98), pages 254–265, 1998.
2. Foto N. Afrati, Manolis Gergatsoulis, and Theodoros Kavalieros. Answering queries using materialized views with disjunction. In Proc. of the 7th Int. Conf. on Database Theory (ICDT'99), volume 1540 of Lecture Notes in Computer Science, pages 435–452. Springer-Verlag, 1999.
3. Catriel Beeri, Alon Y. Levy, and Marie-Christine Rousset. Rewriting queries using views in description logics. In Proc. of the 16th ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems (PODS'97), pages 99–108, 1997.
4. Mokrane Bouzeghoub and Maurizio Lenzerini. Special issue on data extraction, cleaning, and reconciliation. Information Systems, 26(8), pages 535–536, 2001.
5. Diego Calvanese, Giuseppe De Giacomo, and Maurizio Lenzerini. Conjunctive query containment in Description Logics with n-ary relations. In Proc. of the 1997 Description Logic Workshop (DL'97), pages 5–9, 1997.
6. Diego Calvanese, Giuseppe De Giacomo, and Maurizio Lenzerini. On the decidability of query containment under constraints. In Proc. of the 17th ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems (PODS'98), pages 149–158, 1998.
7. Diego Calvanese, Giuseppe De Giacomo, and Maurizio Lenzerini. Answering queries using views over description logics knowledge bases. In Proc. of the 17th Nat. Conf. on Artificial Intelligence (AAAI 2000), pages 386–391, 2000.
8. Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, Daniele Nardi, and Riccardo Rosati. Description logic framework for information integration. In Proc. of the 6th Int. Conf. on Principles of Knowledge Representation and Reasoning (KR'98), pages 2–13, 1998.
9. Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, Daniele Nardi, and Riccardo Rosati. Information integration: Conceptual modeling and reasoning support. In Proc. of the 6th Int. Conf. on Cooperative Information Systems (CoopIS'98), pages 280–291, 1998.
10. Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, Daniele Nardi, and Riccardo Rosati. Data integration in data warehousing. Int. J. of Cooperative Information Systems, 10(3), pages 237–271, 2001.
11. Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Moshe Y. Vardi. Answering regular path queries using views. In Proc. of the 16th IEEE Int. Conf. on Data Engineering (ICDE 2000), pages 389–398, 2000.
12. Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Moshe Y. Vardi. Query processing using views for regular path queries with inverse. In Proc. of the 19th ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems (PODS 2000), pages 58–66, 2000.
13. Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Moshe Y. Vardi. View-based query processing and constraint satisfaction. In Proc. of the 15th IEEE Symp. on Logic in Computer Science (LICS 2000), pages 361–371, 2000.
14. Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Moshe Y. Vardi. What is query rewriting? In Proc. of the 7th Int. Workshop on Knowledge Representation meets Databases (KRDB 2000), pages 17–27. CEUR Electronic Workshop Proceedings, http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-29/, 2000.

58

Diego Calvanese, Giuseppe De Giacomo, and Maurizio Lenzerini

15. Diego Calvanese, Maurizio Lenzerini, and Daniele Nardi. Description logics for conceptual data modeling. In Jan Chomicki and G¨ unter Saake, editors, Logics for Databases and Information Systems, pages 229–264. Kluwer Academic Publisher, 1998. 16. Tiziana Catarci and Maurizio Lenzerini. Representing and using interschema knowledge in cooperative information systems. J. of Intelligent and Cooperative Information Systems, 2(4):375–398, 1993. 17. S. Chaudhuri, S. Krishnamurthy, S. Potarnianos, and K. Shim. Optimizing queries with materialized views. In Proc. of the 11th IEEE Int. Conf. on Data Engineering (ICDE’95), Taipei (Taiwan), 1995. 18. P. P. Chen. The Entity-Relationship model: Toward a uniﬁed view of data. ACM Trans. on Database Systems, 1(1):9–36, March 1976. 19. Sara Cohen, Werner Nutt, and Alexander Serebrenik. Rewriting aggregate queries using views. In Proc. of the 18th ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems (PODS’99), pages 155–166, 1999. 20. Giuseppe De Giacomo and Maurizio Lenzerini. What’s in an aggregate: Foundations for description logics with tuples and sets. In Proc. of the 14th Int. Joint Conf. on Artificial Intelligence (IJCAI’95), pages 801–807, 1995. 21. Giuseppe De Giacomo and Maurizio Lenzerini. TBox and ABox reasoning in expressive description logics. In Luigia C. Aiello, John Doyle, and Stuart C. Shapiro, editors, Proc. of the 5th Int. Conf. on the Principles of Knowledge Representation and Reasoning (KR’96), pages 316–327. Morgan Kaufmann, Los Altos, 1996. 22. Francesco M. Donini, Maurizio Lenzerini, Daniele Nardi, and Andrea Schaerf. ALlog: Integrating Datalog and description logics. J. of Intelligent Information Systems, 10(3):227–252, 1998. 23. Oliver M. Duschka and Michael R. Genesereth. Answering recursive queries using views. In Proc. of the 16th ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems (PODS’97), pages 109–116, 1997. 24. Ramez A. ElMasri and Shamkant B. Navathe. 
Fundamentals of Database Systems. Benjamin and Cummings Publ. Co., Menlo Park, California, 1988. 25. M. Fattorosi-Barnaba and F. De Caro. Graded modalities I. Studia Logica, 44:197– 221, 1985. 26. Michael J. Fischer and Richard E. Ladner. Propositional dynamic logic of regular programs. J. of Computer and System Sciences, 18:194–211, 1979. 27. Daniela Florescu, Alon Y. Levy, Ioana Manolescu, and Dan Suciu. Query optimization in the presence of limited access patterns. In Proc. of the ACM SIGMOD Int. Conf. on Management of Data, pages 311–322, 1999. 28. Helena Galhardas, Daniela Florescu, Dennis Shasha, and Eric Simon. An extensible framework for data cleaning. Technical Report 3742, INRIA, Rocquencourt, 1999. 29. G¨ osta Grahne and Alberto O. Mendelzon. Tableau techniques for querying information sources through global schemas. In Proc. of the 7th Int. Conf. on Database Theory (ICDT’99), volume 1540 of Lecture Notes in Computer Science, pages 332– 347. Springer-Verlag, 1999. 30. St´ephane Grumbach, Maurizio Rafanelli, and Leonardo Tininini. Querying aggregate data. In Proc. of the 18th ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems (PODS’99), pages 174–184, 1999. 31. Ian Horrocks, Ulrike Sattler, Sergio Tessaris, and Stephan Tobies. Query containment using a DLR ABox. Technical Report LTCS-Report 99-15, RWTH Aachen, 1999.

Description Logics for Information Integration

59

32. Michael N. Huhns, Nigel Jacobs, Tomasz Ksiezyk, Wei-Min Shen an Munindar P. Singh, and Philip E. Cannata. Integrating enterprise information models in Carnot. In Proc. of the Int. Conf. on Cooperative Information Systems (CoopIS’93), pages 32–42, 1993. 33. R. B. Hull and R. King. Semantic database modelling: Survey, applications and research issues. ACM Computing Surveys, 19(3):201–260, September 1987. 34. Matthias Jarke, Maurizio Lenzerini, Yannis Vassiliou, and Panos Vassiliadis, editors. Fundamentals of Data Warehouses. Springer-Verlag, 1999. 35. Dexter Kozen and Jerzy Tiuryn. Logics of programs. In Jan van Leeuwen, editor, Handbook of Theoretical Computer Science — Formal Models and Semantics, pages 789–840. Elsevier Science Publishers (North-Holland), Amsterdam, 1990. 36. Maurizio Lenzerini and Andrea Schaerf. Concept languages as query languages. In Proc. of the 9th Nat. Conf. on Artificial Intelligence (AAAI’91), pages 471–476, 1991. 37. Alon Y. Levy. Obtaining complete answers from incomplete databases. In Proc. of the 22nd Int. Conf. on Very Large Data Bases (VLDB’96), pages 402–412, 1996. 38. Alon Y. Levy, Alberto O. Mendelzon, Yehoshua Sagiv, and Divesh Srivastava. Answering queries using views. In Proc. of the 14th ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems (PODS’95), pages 95–104, 1995. 39. Alon Y. Levy and Marie-Christine Rousset. CARIN: A representation language combining Horn rules and description logics. In Proc. of the 12th Eur. Conf. on Artificial Intelligence (ECAI’96), pages 323–327, 1996. 40. Alon Y. Levy, Divesh Srivastava, and Thomas Kirk. Data model and query evaluation in global information systems. J. of Intelligent Information Systems, 5:121– 143, 1995. 41. Chen Li and Edward Chang. Query planning with limited source capabilities. In Proc. of the 16th IEEE Int. Conf. on Data Engineering (ICDE 2000), pages 401–412, 2000. 42. Chen Li and Edward Chang. 
On answering queries in the presence of limited access patterns. In Proc. of the 8th Int. Conf. on Database Theory (ICDT 2001), 2001. 43. Chen Li, Ramana Yerneni, Vasilis Vassalos, Hector Garcia-Molina, Yannis Papakonstantinou, Jeﬀrey D. Ullman, and Murty Valiveti. Capability based mediation in TSIMMIS. In Proc. of the ACM SIGMOD Int. Conf. on Management of Data, pages 564–566, 1998. 44. Anand Rajaraman, Yehoshua Sagiv, and Jeﬀrey D. Ullman. Answering queries using templates with binding patterns. In Proc. of the 14th ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems (PODS’95), 1995. 45. Marie-Christine Rousset. Backward reasoning in ABoxes for query answering. In Proc. of the 1999 Description Logic Workshop (DL’99), pages 18–22. CEUR Electronic Workshop Proceedings, http://sunsite.informatik.rwth-aachen. de/Publications/CEUR-WS/Vol-22/, 1999. 46. Andrea Schaerf. Query Answering in Concept-Based Knowledge Representation Systems: Algorithms, Complexity, and Semantic Issues. PhD thesis, Dipartimento di Informatica e Sistemistica, Universit` a di Roma “La Sapienza”, 1994. 47. Klaus Schild. A correspondence theory for terminological logics: Preliminary report. In Proc. of the 12th Int. Joint Conf. on Artificial Intelligence (IJCAI’91), pages 466–471, Sydney (Australia), 1991. 48. D. Srivastava, S. Dar, H. V. Jagadish, and A. Levy. Answering queries with aggregation using views. In Proc. of the 22nd Int. Conf. on Very Large Data Bases (VLDB’96), pages 318–329, 1996.

60

Diego Calvanese, Giuseppe De Giacomo, and Maurizio Lenzerini

49. Stephan Tobies. The complexity of reasoning with cardinality restrictions and nominals in expressive description logics. J. of Artificial Intelligence Research, 12:199–217, 2000. 50. O. G. Tsatalos, M. H. Solomon, and Y. E. Ioannidis. The GMAP: A versatile tool for phyisical data independence. Very Large Database J., 5(2):101–118, 1996. 51. Jeﬀrey D. Ullman. Information integration using logical views. In Proc. of the 6th Int. Conf. on Database Theory (ICDT’97), volume 1186 of Lecture Notes in Computer Science, pages 19–40. Springer-Verlag, 1997. 52. Wiebe Van der Hoek and Maarten de Rijke. Counting objects. J. of Logic and Computation, 5(3):325–345, 1995. 53. Ron van der Meyden. Logical approaches to incomplete information. In Jan Chomicki and G¨ unter Saake, editors, Logics for Databases and Information Systems, pages 307–356. Kluwer Academic Publisher, 1998. 54. Jennifer Widom. Special issue on materialized views and data warehousing. IEEE Bulletin on Data Engineering, 18(2), 1995. 55. Ramana Yerneni, Chen Li, Hector Garcia-Molina, and Jeﬀrey D. Ullman. Computing capabilities of mediators. In Proc. of the ACM SIGMOD Int. Conf. on Management of Data, pages 443–454, 1999. 56. Ramana Yerneni, Chen Li, Jeﬀrey D. Ullman, and Hector Garcia-Molina. Optimizing large join queries in mediation systems. In Proc. of the 7th Int. Conf. on Database Theory (ICDT’99), pages 348–364, 1999.

Search and Optimization Problems in Datalog

Sergio Greco¹,² and Domenico Saccà¹,²

¹ DEIS, Univ. della Calabria, 87030 Rende, Italy
² ISI-CNR, 87030 Rende, Italy
{greco,sacca}@deis.unical.it

Abstract. This paper analyzes the ability of DATALOG languages to express search and optimization problems. It is first shown that NP search problems can be formulated as unstratified DATALOG queries under nondeterministic stable model semantics so that each stable model corresponds to a possible solution. NP optimization problems are then formulated by adding a max (or min) construct to select the stable model (thus, the solution) which maximizes (resp., minimizes) the result of a polynomial function applied to the answer relation. In order to enable a simpler and more intuitive formulation of search and optimization problems, we introduce a DATALOG language in which the use of stable model semantics is disciplined to refrain from abstruse forms of unstratified negation. The core of our language is stratified negation extended with two constructs allowing nondeterministic selections and with query goals enforcing conditions to be satisfied by stable models. The language is modular, as the level of expressivity can be tuned and selected by means of a suitable use of the above constructs, thus capturing significant subclasses of search and optimization queries.

1 Introduction

DATALOG is a logic-programming language that was designed for database applications, mainly because of its declarative style and its ability to express recursive queries [3,32]. DATALOG has since been extended in many directions (e.g., various forms of negation, aggregate predicates and set constructs) to enhance its expressive power. In this paper we investigate the ability of DATALOG languages to express search and optimization problems. We recall that, given an alphabet Σ, a search problem is a partial multivalued function f, defined on some (not necessarily proper) subset of Σ*, say dom(f), which maps every string x of dom(f) into a number of strings y1, ..., yn (n > 0); thus f(x) = {y1, ..., yn}. The function f is therefore represented by the following relation on Σ* × Σ*: graph(f) = {(x, y) | x ∈ dom(f) and y ∈ f(x)}. We say that graph(f) is polynomially balanced if, for each (x, y) in graph(f), the size of y is polynomially bounded in the size of x. NP search problems are those functions

Work partially supported by the Italian National Research Council (CNR) and by MURST (projects DATA-X and D2I).

A.C. Kakas, F. Sadri (Eds.): Computat. Logic (Kowalski Festschrift), LNAI 2408, pp. 61–82, 2002. © Springer-Verlag Berlin Heidelberg 2002


f for which both graph(f) is polynomially balanced and graph(f) is in NP, i.e., given x, y ∈ Σ*, deciding whether (x, y) ∈ graph(f) is in NP. In this paper we show that NP search problems can be formulated as DATALOG¬ (i.e., DATALOG with unstratified negation) queries under the nondeterministic version of total stable model semantics [11]; thus the meaning of a DATALOG¬ program is given by any stable model. As an example of the language, take the Vertex Cover problem: given a graph G = (V, E), find a vertex cover, where a subset V' of V is a vertex cover of G if for each edge (x, y) in E either x or y is in V'. The problem can be formulated by the query ⟨P_vc, v'(X)⟩ where P_vc is the following DATALOG¬ program:

    v'(X) ← v(X), ¬v''(X).
    v''(X) ← v(X), ¬v'(X).
    no_cover ← e(X, Y), ¬v'(X), ¬v'(Y).
    refuse_no_cover ← no_cover, ¬refuse_no_cover.

The predicates v and e define the vertices and the edges of the graph by means of a suitable number of facts. The last rule enforces that every total stable model corresponds to some vertex cover (otherwise no_cover would be true and the atom refuse_no_cover would then be undefined). In order to enable a simpler and more intuitive formulation of search problems, we introduce a DATALOG language where the usage of stable model semantics is disciplined to avoid both undefinedness and unnecessary computational complexity, and to refrain from abstruse forms of unstratified negation. Thus the core of our language is stratified negation extended with two constructs (choice and subset) allowing nondeterministic selections and an additional ground goal (called constraint goal) in the query, enforcing conditions to be satisfied by stable models. For instance, the above query can be formulated as ⟨P_vc, !¬no_cover, v'(X)⟩ where P_vc is the following stratified DATALOG¬ program with a subset construct to nondeterministically select a subset of the vertices:

    v'(X) ⊆ v(X).
    no_cover ← e(X, Y), ¬v'(X), ¬v'(Y).
The constraint goal !¬no_cover specifies that only those stable models in which ¬no_cover is true are to be taken into consideration. The expressive power (and the complexity as well) of the language gradually increases by moving from the basic language (stratified DATALOG¬) up to the whole repertoire of additional constructs. Observe that, if we do not add any constraint goal to the query, the query reduces to a stratified program with additional constructs for nondeterministic selections, which cannot later be retracted, thus avoiding exponential explosion of the search space. For example, the query ⟨P_st, st(X, Y)⟩, where P_st is defined below, computes a spanning tree of the graph G in polynomial time:

    st(nil, X) ← v(X), choice((), (X)).
    st(X, Y) ← st(Z, X), e(X, Y), st(nil, Z), Y ≠ Z, Y ≠ X, choice((X), (Y)).


The first choice selects any vertex of the graph as the root of the tree; the second choice selects one vertex y at a time to be added to the current spanning tree st so that y is connected to exactly one vertex x of st, thus preserving the tree structure. Polynomial-time computation is guaranteed since the nondeterministic selections made by the choice constructs are never discarded later: there is no constraint goal to satisfy, unlike in the vertex cover example. Observe that a vertex cover can also be computed in polynomial time; thus we may rewrite the vertex cover query above using the choice construct without a constraint goal so that polynomial-time computation is guaranteed. Obviously, this kind of rewriting is not feasible for all NP search queries, as they can be NP-hard. In the paper we characterize various classes of search queries, including tractable classes (for which an answer can be computed in polynomial time), and we show how such classes can be captured by a suitably disciplined usage of our DATALOG¬ language. In the paper we also deal with the issue of formulating optimization problems. We recall that an optimization (min or max) problem associated with a search problem f is a function g such that dom(g) = dom(f) and, for each x ∈ dom(g), g(x) = {y | y ∈ f(x) and for each other y' ∈ f(x), |y| ≤ |y'| (or |y| ≥ |y'| if it is a maximization problem)}. The optimization problems associated with NP search problems are called NP optimization problems. We show that NP optimization problems can be formulated as DATALOG¬ queries under the nondeterministic version of total stable model semantics by using a max (or min) construct to select the model which maximizes (resp., minimizes) the cardinality of the answer relation. As an example of the language, take the Min Vertex Cover problem: given a graph G = (V, E), find the vertex cover with minimal cardinality. The problem can be formulated by the query ⟨P_vc, !¬no_cover, min(v'(X))⟩ where P_vc is the above program.
The goal min(v'(X)) further restricts the set of suitable stable models to those for which the subset of nodes v' is minimum. The advantage of expressing NP search and NP optimization problems by using rules with built-in predicates rather than standard DATALOG¬ rules is that built-in atoms preserve simplicity and intuition in expressing problems and make query optimization possible. The language is 'modular' in the sense that the desired level of expressivity is achieved by enabling the constructs for non-stratified negation only when needed; in particular, if neither a constraint goal nor a min/max goal is used, then polynomial-time computation is guaranteed. The paper is organized as follows. In Section 2 we introduce search and optimization queries and provide a formal ground for their classification using results from complexity theory on multivalued functions. In Section 3 we prove that NP search queries coincide with DATALOG¬ queries under nondeterministic total stable model semantics. We also introduce the min/max goal to capture NP optimization queries. In order to capture meaningful subclasses of NP search and optimization queries, in Section 4 we then present our language, called DATALOG¬s,c, and we show its ability to express tractable NP search problems. We also prove that optimization problems can be hard even when associated with


tractable search problems. This explains the renewed attention [26,25,19,20,6,7] towards optimization problems, mainly with the aim of characterizing classes of problems that are constant or log approximable (i.e., there is a polynomial-time algorithm that approximates the optimum value of the problem within a factor that is respectively constant or logarithmic in the size of the input). In Section 5 we introduce suitable restrictions to DATALOG¬s,c in order to capture NP optimization subclasses that are approximable and present meaningful examples. We draw conclusions and discuss further work in Section 6.
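The stable-model reading of the unstratified vertex cover program P_vc from this introduction can be sketched in Python. This is only a brute-force illustration under stated assumptions: the program is grounded by hand on a small graph, the function names (`ground_vertex_cover`, `is_stable`, `vertex_covers`) are hypothetical, and stability is tested with the standard reduct-based check (drop rules blocked by the candidate model, delete the remaining negative literals, compare the least model with the candidate).

```python
from itertools import combinations


def ground_vertex_cover(vertices, edges):
    """Ground P_vc on a concrete graph.

    A ground rule is (head, positive_body, negative_body). The EDB atoms
    v(x) and e(x, y) are true facts, so they are dropped from the bodies.
    """
    rules = []
    for x in vertices:
        rules.append((("v'", x), [], [("v''", x)]))   # v'(x)  <- v(x), not v''(x)
        rules.append((("v''", x), [], [("v'", x)]))   # v''(x) <- v(x), not v'(x)
    for x, y in edges:
        # no_cover <- e(x, y), not v'(x), not v'(y)
        rules.append((("no_cover",), [], [("v'", x), ("v'", y)]))
    # refuse_no_cover <- no_cover, not refuse_no_cover
    rules.append((("refuse_no_cover",), [("no_cover",)], [("refuse_no_cover",)]))
    return rules


def least_model(positive_rules):
    """Least fixpoint of the immediate consequence operator."""
    model, changed = set(), True
    while changed:
        changed = False
        for head, pos in positive_rules:
            if head not in model and all(a in model for a in pos):
                model.add(head)
                changed = True
    return model


def is_stable(rules, interp):
    """Check whether interp equals the least model of the reduct w.r.t. interp."""
    reduct = [(h, pos) for h, pos, neg in rules
              if not any(a in interp for a in neg)]
    return least_model(reduct) == interp


def stable_models(rules):
    """Enumerate all total stable models by exhaustive search (exponential)."""
    atoms = sorted({head for head, _, _ in rules})
    for k in range(len(atoms) + 1):
        for cand in combinations(atoms, k):
            if is_stable(rules, set(cand)):
                yield set(cand)


def vertex_covers(vertices, edges):
    """Project every stable model on v' to obtain the answers of the query."""
    rules = ground_vertex_cover(vertices, edges)
    return {frozenset(a[1] for a in m if a[0] == "v'")
            for m in stable_models(rules)}
```

On the path a - b - c, for example, `vertex_covers(["a", "b", "c"], [("a", "b"), ("b", "c")])` yields exactly the five vertex covers of the path; any candidate that leaves an edge uncovered derives no_cover and is then rejected through the refuse_no_cover rule.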

2 Search and Optimization Queries

We assume that the reader is familiar with the basic terminology and notation of relational databases and of database queries [3,18,32]. A relational database scheme DB over a fixed countable domain U is a set of relation symbols {r1, ..., rk} where each ri has a given arity, denoted by |ri|. A database D on DB is a finite structure (A, R1, ..., Rk) where A ⊆ U is the active domain and the Ri ⊆ A^|ri| are the (finite) relations of the database, one for each relation scheme ri; we denote A by U(D) and Ri by D(ri). We assume that a database is suitably encoded by a string and that recognizing whether a string represents a database on DB can be done in polynomial time.

Definition 1. Given a database scheme DB and an additional relation symbol f (the query goal), a search query NQ = ⟨DB, f⟩ is a (possibly partial) multivalued recursive function which maps every database D on DB to a finite, non-empty set of finite (possibly empty) relations F ⊆ U(D)^|f| and is invariant under an isomorphism on U − W, where W is any finite subset of U (i.e., the function is W-generic). Thus NQ(D) yields a set of relations on the goal, which are the answers of the query; the query has no answer if this set is empty or the function is not defined on D. □

The class of all search queries is denoted by NQ. In classifying query classes, we shall refer to the following complexity classes of languages: the class P (languages that are recognized by deterministic Turing machines in polynomial time), the class NP (languages that are recognized by nondeterministic Turing machines in polynomial time), and the class coNP (languages whose complements are in NP); the reader can refer to [10,17,24] for excellent sources of information on this subject.
As search queries correspond to functions rather than to languages, as is instead the case for boolean queries, we next introduce, for their classification, some background on the complexity of functions (for a more comprehensive treatment of this topic we refer readers to [30,31,9]). Let a finite alphabet Σ with at least two elements be given. A partial multivalued (MV) function f : Σ* → Σ* associates zero, one or several outcomes (outputs) to each input string. Let f(x) stand for the set of possible results of f on an input string x; thus, we write y ∈ f(x) if y is a value of f on the input string x. Define dom(f) = {x | ∃y (y ∈ f(x))} and graph(f) = {⟨x, y⟩ | x ∈


dom(f), y ∈ f(x)}. If x ∉ dom(f), we will say that f is undefined at x. It is now clear that a search query is indeed a computable MV function: the input x is a suitable encoding of a database D and each output string y encodes an answer of the query. A computable (i.e., partial recursive) MV function f is computed by some Turing transducer, i.e., a (deterministic or not) Turing machine T which, in addition to accepting any string x ∈ dom(f), writes a string y ∈ f(x) on an output tape before entering the accepting state. So, if x ∈ dom(f), the set of all strings that are written in all accepting computations is f(x); on the other hand, if x ∉ dom(f), T never enters the accepting state. Given two MV functions f and g, define g to be a refinement of f if dom(g) = dom(f) and graph(g) ⊆ graph(f). Moreover, given a class G of MV functions, we say that f ∈c G if G contains a refinement of f. For a class of MV functions F, define F ⊆c G if, for all f ∈ F, f ∈c G. Since we are in general interested in finding any output of a MV function, an important practical question is whether an output can be efficiently computed by means of a polynomial-time, single-valued function. In other terms, desirable MV function classes are those which are refined by PF, where PF is the class of all functions that are computed by deterministic polynomial-time transducers.

Let us now recall some important classes of MV functions. A MV function f is polynomially balanced if, for each x, the size of each result in f(x) is polynomially bounded in the size of x. The class NPMV is defined as the set of all MV functions f such that (i) f is polynomially balanced, and (ii) graph(f) is in NP. By analogy, the classes NPMV_g and coNPMV are defined as the classes of all polynomially balanced multivalued functions f for which graph(f) is respectively in P and in coNP. Observe that NPMV consists of all MV functions that are computed by nondeterministic transducers in polynomial time [30].

Definition 2.
1. NQPMV (resp., NQPMV_g and coNQPMV) is the class of all search queries which are in NPMV (resp., NPMV_g and coNPMV); we shall also call the queries in NQPMV NP search queries;
2. NQPTIME is the class of all queries that are computed by a nondeterministic polynomial-time transducer for which every computation path ends in an accepting state;
3. NQPTIME_g is equal to NQPTIME ∩ NQPMV_g. □

Observe that a query NQ = ⟨DB, f⟩ is in NQPMV (resp., NQPMV_g and coNQPMV) if and only if for each database D on DB and for each relation F on f, deciding whether F is in NQ(D) is in NP (resp., in P and in coNP). We stress that NQPMV is different from the class NQPTIME first introduced in [1,2]: in fact, the latter class consists of all queries in NQPMV for which acceptance is guaranteed no matter which nondeterministic moves are guessed by the transducer.
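The notions of domain, graph and refinement used above can be made concrete on finite examples. The following Python sketch assumes, purely for illustration, that a finite MV function is stored as a dictionary from inputs to sets of outputs (an empty set meaning "undefined"); the helper names are hypothetical and not part of the paper.

```python
def dom(f):
    """Inputs on which the MV function has at least one output."""
    return {x for x, ys in f.items() if ys}


def graph(f):
    """All (input, output) pairs of the MV function."""
    return {(x, y) for x, ys in f.items() for y in ys}


def is_refinement(g, f):
    """g refines f: same domain, and every output of g is an output of f."""
    return dom(g) == dom(f) and graph(g) <= graph(f)


def single_valued_refinement(f, pick=min):
    """Build a single-valued refinement by picking one output per input."""
    return {x: {pick(ys)} for x, ys in f.items() if ys}


# f is undefined at "x3"; g keeps one answer for every input in dom(f).
f = {"x1": {"a", "b"}, "x2": {"c"}, "x3": set()}
g = single_valued_refinement(f)
```

Here `g` plays the role of a single-valued function refining `f`; the question behind the relation ⊆c PF is whether such a `g` can always be computed in deterministic polynomial time, which the toy `pick=min` selection of course does not address.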


We next present some results on whether the above query classes can be refined by PF, that is, whether a query answer in these classes can be computed in deterministic polynomial time; the results have been proven in [21].

Fact 1 [21]
1. NQPMV_g ⊆ (NQPMV ∩ coNQPMV) and the inclusion is strict unless P = NP;
2. neither NQPMV ⊆ coNQPMV nor coNQPMV ⊆ NQPMV unless NP = coNP;
3. NQPTIME ⊂ NQPMV, NQPTIME ⊄ coNQPMV unless NP = coNP, and NQPTIME_g ⊆ NQPTIME and the inclusion is strict unless P = NP;
4. NQPTIME ⊆c PF and NQPMV_g ⊄c PF unless P = NP. □

It turns out that queries in NQPTIME and NQPTIME_g can be efficiently computed whereas queries in the other classes may not. Observe that queries in NQPTIME have a strange anomaly: computing an answer can be done in polynomial time, but testing whether a given relation is an answer cannot (unless P = NP). This anomaly does not occur in the class NQPTIME_g which, therefore, turns out to be very desirable.

Example 1. Let a database scheme DB_G = {v, e} represent a directed graph G = (V, E) such that v has arity 1 and defines the nodes while e has arity 2 and defines the edges. We recall that a kernel is a subset V' of V such that (i) no two nodes in V' are joined by an edge and (ii) for each node x not in V', there is a node y in V' for which (y, x) ∈ E.
– NQ_Kernel is the query which returns the kernels of the input graph G; if the graph has no kernel then the query is not defined. The query is in NQPMV_g, but an answer cannot be computed in polynomial time unless P = NP, since deciding whether a graph has a kernel is NP-complete [10].
– NQ_SubKernel is the query that, given an input graph G, returns any subset of some kernel of G. This query is in NQPMV, but neither in NQPMV_g (unless P = NP) nor in coNQPMV (unless NP = coNP).
– NQ_NodeNoK is the query that, given an input graph G, returns a node not belonging to any kernel of G. This query is in coNQPMV, but not in NQPMV (unless NP = coNP).
– NQ_01K is the query that, given a graph G, returns the relation {0} if G has no kernel, {1} if every subset of nodes of G is a kernel, and both relations {0} and {1} otherwise. Clearly, the query is in NQPTIME: indeed, it is easy to construct a nondeterministic polynomial-time transducer which first nondeterministically generates any subset of nodes of G and then outputs {1} or {0} according to whether this subset is a kernel or not. The query is not in NQPTIME_g, otherwise we could check in polynomial time whether a graph has a kernel (as the graph has a kernel iff {1} is a result of NQ_01K) and, therefore, P would coincide with NP.
– NQ_CUT is the query which returns a subset E' of the edges such that the graph G' = (V, E') is 2-colorable. The query is in NQPTIME_g. □
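The queries NQ_Kernel and NQ_01K of Example 1 can be mimicked by exhaustive search on small graphs. The Python sketch below is an illustration only: the graph encoding (a list of nodes and a set of directed edge pairs) and the function names are assumptions, and the loop over all subsets stands in for the nondeterministic guesses of the transducer, so it takes exponential time.

```python
from itertools import combinations


def subsets(nodes):
    """All subsets of the node set, as frozensets."""
    nodes = list(nodes)
    for k in range(len(nodes) + 1):
        for combo in combinations(nodes, k):
            yield frozenset(combo)


def is_kernel(nodes, edges, cand):
    """Conditions (i) and (ii) of Example 1 for a directed graph."""
    # (i) no two nodes of cand are joined by an edge
    independent = not any(x in cand and y in cand for x, y in edges)
    # (ii) every node outside cand has a predecessor inside cand
    dominating = all(any((y, x) in edges for y in cand)
                     for x in nodes if x not in cand)
    return independent and dominating


def nq_kernel(nodes, edges):
    """Set of answers of NQ_Kernel; empty when the query is undefined."""
    return {s for s in subsets(nodes) if is_kernel(nodes, edges, s)}


def nq_01k(nodes, edges):
    """Outputs of NQ_01K collected over every possible guess of a subset."""
    return {1 if is_kernel(nodes, edges, s) else 0 for s in subsets(nodes)}
```

Every single guess in `nq_01k` is checked in polynomial time and always produces an output, which is the reason the query lies in NQPTIME; deciding whether 1 is among the outputs, however, amounts to deciding whether the graph has a kernel.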


According to Fagin's well-known result [8], a class of finite structures is NP-recognizable iff it is definable by a second-order existential formula; thus queries in NQPMV may be expressed as follows.

Fact 2 Let NQ = ⟨DB, f⟩ be a search query in NQPMV. Then there is a sequence S of relation symbols s1, ..., sk, distinct from those in DB ∪ {f}, and a closed first-order formula φ(DB, f, S) such that for each database D on DB, NQ(D) = { F : F ⊆ U(D)^|f|, Si ⊆ U(D)^|si| (1 ≤ i ≤ k), and φ(D, F, S) is true }. □

From now on, we shall formulate a query in NQPMV as NQ = { f : (DB, f, S) |= φ(DB, f, S) }.

Example 2. CUT. The query NQ_CUT of Example 1 can be defined as follows:

    { e' : (DB_G, e', s) |= (∀x, y)[ e'(x, y) → ( (e(x, y) ∧ s(x) ∧ ¬s(y)) ∨ (e(x, y) ∧ ¬s(x) ∧ s(y)) ) ] }. □
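The formula of Example 2 can be read operationally: a candidate relation e' is an answer if some choice of the auxiliary relation s makes the first-order part true. The Python sketch below spells this out by brute force; the encoding (edges and candidates as sets of pairs) and the name `is_cut_answer` are assumptions made for illustration, and enumerating all s is exponential.

```python
from itertools import product


def is_cut_answer(nodes, edges, candidate):
    """True iff some s satisfies: every pair of candidate is an edge of the
    graph and has exactly one endpoint in s."""
    nodes = list(nodes)
    for bits in product([False, True], repeat=len(nodes)):
        s = {n for n, b in zip(nodes, bits) if b}
        if all((x, y) in edges and ((x in s) != (y in s))
               for x, y in candidate):
            return True
    return False


nodes = ["a", "b", "c"]
triangle = {("a", "b"), ("b", "c"), ("a", "c")}
two_edges = {("a", "b"), ("b", "c")}
```

With these data, `is_cut_answer(nodes, triangle, two_edges)` holds (take s = {b}), while the full triangle is rejected because no 2-coloring separates all three edges.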

Example 3. KERNEL. The query NQ_Kernel of Example 1 can be defined as:

    { v' : (DB_G, v') |= (∀x)[ (v'(x) ∧ ∀y(¬v'(y) ∨ ¬e(x, y))) ∨ (¬v'(x) ∧ ∃y(v'(y) ∧ e(y, x))) ] } □

Definition 3. Given a search query NQ = ⟨DB, f⟩, an optimization query OQ = opt(NQ) = ⟨DB, opt(f)⟩, where opt is either max or min, is a search query refining NQ such that for each database D on DB for which NQ is defined, OQ(D) = opt_|F| {F : F ∈ NQ(D)}, i.e., OQ(D) consists of the answers in NQ(D) with the maximum or minimum cardinality (for opt = max or min, respectively). The query NQ is called the search query associated with OQ and the relations in NQ(D) are the feasible solutions of OQ. The class of all optimization queries is denoted by OPT NQ. Given a search class QC, the class of all optimization queries whose search queries are in QC is denoted by OPT QC. The queries in the class OPT NQPMV are called NP optimization queries. □

Proposition 1. Let OQ = ⟨DB, opt|f|⟩ be an optimization query. Then the following statements are equivalent:
1. OQ is in OPT NQPMV.
2. There is a closed first-order formula φ(DB, f, S) over relation symbols DB ∪ {f} ∪ S such that OQ = opt|f| {f : (DB, f, S) |= φ(DB, f, S)}.
3. There is a first-order formula φ(w, DB, S), where w is an a(f)-tuple of distinct variables, such that the relation symbols are those in DB ∪ S, the free variables are exactly those in w, and OQ = opt|w| {w : (DB, S) |= φ(w, DB, S)}.


PROOF. The equivalence of statements (1) and (2) is obvious. Clearly, the optimization formulae defined in Item 2 (called feasible in [20]) are a special case of the first-order optimization formulae defined in Item 3, which define the class OPT PB of all optimization problems that can be logically defined. Moreover, in [20] it has been shown that the class OPT PB can be expressed by means of feasible optimization first-order formulae. □

The above results pinpoint that the class OPT NQPMV corresponds to the class OPT PB of all optimization problems that can be logically defined [19,20]. For simplicity, but without substantial loss of generality, we use as objective function the cardinality rather than a generic polynomial-time computable function. Moreover, we output the relation with the optimal cardinality rather than just the cardinality.

Example 4. MAX-CUT. The problem consists in finding the cardinality of the largest cut in the graph G = (V, E). The query coincides with max(NQ_CUT) (see Example 2) and can also be defined as:

    max({ (x, y) : (DB_G, s) |= [ (e(x, y) ∧ s(x) ∧ ¬s(y)) ∨ (e(x, y) ∧ ¬s(x) ∧ s(y)) ] }).

The query is an NP maximization query. □
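The second formulation of MAX-CUT in Example 4 ranges over the auxiliary relation s and collects the pairs (x, y) satisfying the formula; the optimum is the largest such set. A brute-force Python sketch follows, assuming the same hypothetical encoding as before (edges as a set of pairs); it is exponential in the number of nodes and only meant to make the definition concrete.

```python
from itertools import product


def cut_edges(edges, s):
    """Pairs (x, y) satisfying the formula of Example 4 for a given s."""
    return {(x, y) for x, y in edges if (x in s) != (y in s)}


def max_cut(nodes, edges):
    """One answer of max(NQ_CUT): a cut of maximum cardinality."""
    nodes = list(nodes)
    best = set()
    for bits in product([False, True], repeat=len(nodes)):
        s = {n for n, b in zip(nodes, bits) if b}
        cut = cut_edges(edges, s)
        if len(cut) > len(best):
            best = cut
    return best
```

In line with the remark above that the relation rather than just its cardinality is output, `max_cut` returns the set of crossing edges; its length gives the cardinality of the largest cut.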

Example 5. MIN-KERNEL. In this case we want to find the minimum cardinality of the kernels of a graph G = (V, E). The query is min(NQ_Kernel) (see Example 3) and can be equivalently defined as:

    min({ w : (DB_G, v') |= v'(w) ∨ ¬(∀x)[ (v'(x) ∧ ∀y(¬v'(y) ∨ ¬e(x, y))) ∨ (¬v'(x) ∧ ∃y(v'(y) ∧ e(y, x))) ] })

This query is an NP minimization query. □
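The second formulation in Example 5 relies on a small trick: for a fixed v', the set of w satisfying the formula is v' itself when v' is a kernel, and the whole node set otherwise, so minimizing the cardinality singles out the smallest kernels. The Python sketch below illustrates this reading on small graphs; the encoding and the names are assumptions, the kernel test is a direct transcription of the formula of Example 3, and the result is only meaningful when the graph has at least one kernel.

```python
from itertools import combinations


def satisfies_kernel_formula(nodes, edges, vp):
    """Literal evaluation of the first-order formula of Example 3."""
    return all(
        (x in vp and all(y not in vp or (x, y) not in edges for y in nodes))
        or (x not in vp and any(y in vp and (y, x) in edges for y in nodes))
        for x in nodes
    )


def answer_set(nodes, edges, vp):
    """{w : v'(w) or not kernel-formula}: v' for kernels, all nodes otherwise."""
    if satisfies_kernel_formula(nodes, edges, vp):
        return frozenset(vp)
    return frozenset(nodes)


def min_kernel(nodes, edges):
    """Minimum-cardinality answer set over all choices of v' (brute force)."""
    nodes = list(nodes)
    best = None
    for k in range(len(nodes) + 1):
        for combo in combinations(nodes, k):
            w = answer_set(nodes, edges, frozenset(combo))
            if best is None or len(w) < len(best):
                best = w
    return best
```

For a star with centre c and leaves l1, l2 (edges in both directions), the kernels are {c} and {l1, l2}, and `min_kernel` returns {c}.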

Finally, note that the query max(NQ_Kernel) equals the query max(NQ_SubKernel) although their search queries are distinct. The following results show that in general optimization queries are much harder than search queries; e.g., they cannot be solved in polynomial time even when the associated search query is in NQPTIME_g.

Proposition 2.
1. neither OPT NQPMV ⊆ coNQPMV nor OPT NQPMV ⊆ NQPMV unless NP = coNP;
2. neither OPT coNQPMV ⊆ coNQPMV nor OPT coNQPMV ⊆ NQPMV unless NP = coNP;
3. OPT NQPMV_g ⊂ coNQPMV and OPT NQPMV_g ⊄ NQPMV unless NP = coNP;
4. neither OPT NQPTIME ⊆ coNQPMV nor OPT NQPTIME ⊆ NQPMV unless NP = coNP;
5. OPT NQPTIME_g ⊂ coNQPMV and OPT NQPTIME_g ⊄ NQPMV_g unless P = NP.


PROOF.
1. Let max Q be a query in MAX NQPMV (the same argument also holds for a minimization query). Given a database D, to decide whether a relation f is an answer of max Q(D), we first have to test whether f is an answer of Q(D) and then verify that there is no other answer of Q(D) with more tuples than f. As the former test is in NP and the latter test is in coNP, deciding whether f is an answer of max Q(D) is neither in NP nor in coNP unless NP = coNP; indeed, it is in the class DP [24].
2. Assume now that the query in the proof of part (1) is in MAX coNQPMV. Then testing whether f is an answer of Q(D) is in coNP, whereas verifying that there is no other answer of Q(D) with more tuples than f is in coNP^NP, a class at the second level of the polynomial hierarchy [24].
3. Suppose now that the query in the proof of part (1) is in MAX NQPMV_g. Then testing whether f is an answer of Q(D) is in P, whereas verifying that there is no other answer of Q(D) with more tuples than f is in coNP.
4. Take any query max Q in MAX NQPMV. We construct the query Q′ by setting Q′(D) = Q(D) ∪ {∅} for each D. Then Q′ is in NQPTIME, as the transducer for Q′ can now accept on every branch by eventually returning the empty relation. It is now easy to see that the complexity of finding the maximum answer for Q′ is in general the same as that of finding the maximum answer for Q. So the results follow from part (1).
5. OPT NQPTIME_g ⊂ coNQPMV follows from part (3), as NQPTIME_g ⊂ NQPMV_g by definition. Consider now the query Q returning a maximal clique (i.e., a clique which is not contained in another one) of an undirected graph. Q is obviously in NQPTIME_g, as a maximal clique can be constructed by selecting any node and adding additional nodes as long as the clique property is preserved. We have that max Q is the query returning the maximum clique in a graph (i.e., the maximal clique with the maximum number of nodes), which is known to be NP-hard. □
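The contrast used in part (5) can be sketched in Python: a maximal clique is grown greedily in polynomial time, while the maximum clique is found here only by exhaustive search. The function names and the four-node example graph are assumptions made for illustration; they do not come from the paper.

```python
from itertools import combinations


def is_clique(nodes, edges):
    """True if every pair of distinct nodes is joined by an edge."""
    return all(frozenset((x, y)) in edges for x, y in combinations(nodes, 2))


def greedy_maximal_clique(vertices, edges, start):
    """Grow a clique from `start`, adding nodes while the clique property holds."""
    clique = [start]
    for v in vertices:
        if v not in clique and is_clique(clique + [v], edges):
            clique.append(v)
    return set(clique)


def maximum_clique(vertices, edges):
    """Exhaustive search over node subsets, largest first (exponential time)."""
    for size in range(len(vertices), 0, -1):
        for candidate in combinations(vertices, size):
            if is_clique(candidate, edges):
                return set(candidate)
    return set()


# Hypothetical example graph: a triangle {2, 3, 4} plus the pendant edge 1-2.
VERTICES = [1, 2, 3, 4]
EDGES = {frozenset(e) for e in [(1, 2), (2, 3), (2, 4), (3, 4)]}

if __name__ == "__main__":
    print(greedy_maximal_clique(VERTICES, EDGES, start=1))
    print(maximum_clique(VERTICES, EDGES))
```

Starting the greedy construction at node 1 yields a maximal clique that is not a maximum one, which is the gap between the search query Q and the optimization query max Q.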

3 Search and Optimization Queries in DATALOG

We assume that the reader is familiar with basic notions of logic programming and DATALOG¬ [3,22,32]. A program P is a finite set of rules r of the form H(r) ← B(r), where H(r) is an atom (head of the rule) and B(r) is a conjunction of literals (body of the rule). A rule with empty body is called a fact. The ground instantiation of P is denoted by ground(P); the Herbrand universe and the Herbrand base of P are denoted by U_P and B_P, respectively. An interpretation I ⊆ B_P is a T-stable (total stable) model [11] if I = T^∞_pos(P,I)(∅), where T is the classical immediate consequence transformation and pos(P, I) denotes the positive logic program that is obtained from ground(P)


Sergio Greco and Domenico Saccà

by (i) removing all rules r such that there exists a negative literal ¬A in B(r) and A is in I, and (ii) removing all negative literals from the remaining rules. It is well known that a program may have n T-stable models with n ≥ 0. Given a program P and two predicate symbols p and q, we write p → q if there exists a rule where q occurs in the head and p in the body, or there exists a predicate s such that p → s and s → q. A program is stratified if there exists no rule where a predicate p occurs in a negative literal in the body, q occurs in the head and q → p, i.e., there is no recursion through negation [5]. Stratified programs have a unique stable model which coincides with the stratified model, obtained by partitioning the program into an ordered number of suitable subprograms (called 'strata') and computing the fixpoints of every stratum in their order [5].

A DATALOG¬ program is a logic program with negation in the rule bodies, but without function symbols. Predicate symbols can be either extensional (i.e., defined by the facts of a database: EDB predicate symbols) or intensional (i.e., defined by the rules of the program: IDB predicate symbols). The class of all DATALOG¬ programs is simply called DATALOG¬; the subclass of all positive (resp. stratified) programs is called DATALOG (resp. DATALOG¬s). A DATALOG¬ program P has an associated relational database scheme DB_P, which consists of all EDB predicate symbols of P. We assume that possible constants in P are taken from the same domain U of DB_P. Given a database D on DB_P, the tuples of D are seen as facts added to P; so P on D yields the following logic program: P_D = P ∪ {q(t). : q ∈ DB_P ∧ t ∈ D(q)}. Given a T-stable model M of P_D and a relation symbol r in P_D, M(r) denotes the relation {t : r(t) ∈ M}.

Definition 4.
A DATALOG¬ search query ⟨P, f⟩, where P is a DATALOG¬ program and f is an IDB predicate symbol of P, defines the query NQ = ⟨DB_P, f⟩ such that for each D on DB_P, NQ(D) = {M(f) : M is a T-stable model of P_D}. The sets of all DATALOG¬, DATALOG and DATALOG¬s search queries are denoted respectively by search(DATALOG¬), search(DATALOG) and search(DATALOG¬s). The DATALOG¬ optimization query ⟨P, opt(f)⟩ defines the optimization query opt(NQ). The sets of all DATALOG¬, DATALOG and DATALOG¬s optimization queries are denoted respectively by opt(DATALOG¬), opt(DATALOG) and opt(DATALOG¬s). □

Observe that, given a database D, if the program P_D has no stable models then both search and optimization queries are not defined on D.

Proposition 3.
1. search(DATALOG¬) = NQPMV and opt(DATALOG¬) = OPT NQPMV;
2. search(DATALOG) ⊂ search(DATALOG¬s) ⊂ NQPTIME_g.

PROOF. In [28] it has been shown that a database query NQ is defined by a query in search(DATALOG¬) if and only if, for each input database, the answers of NQ are NP-recognizable. Hence search(DATALOG¬) = NQPMV and opt(DATALOG¬) = OPT NQPMV. Concerning part (2), observe that queries in search(DATALOG¬s) are a proper subset of deterministic polynomial-time queries


[5] and then search(DATALOG¬s) ⊂ NQPTIME_g. Finally, the relationship search(DATALOG) ⊂ search(DATALOG¬s) is well known in the literature [3]. □

Note that search(DATALOG) = opt(DATALOG) and search(DATALOG¬s) = opt(DATALOG¬s) as the queries are deterministic.

Example 6. Take the queries NQ_cut and max(NQ_cut) of Examples 2 and 4, respectively. Consider the following DATALOG¬ program P_cut:

  v′(X) ← v(X), ¬v̂(X).
  v̂(X) ← v(X), ¬v′(X).
  e′(X, Y) ← e(X, Y), v′(X), ¬v′(Y).
  e′(X, Y) ← e(X, Y), ¬v′(X), v′(Y).

We have that NQ_cut = ⟨P_cut, e′⟩ and max(NQ_cut) = ⟨P_cut, max(e′)⟩. □
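The T-stable model condition can be checked by brute force on a small ground program. The following Python sketch, a toy under stated assumptions rather than an evaluation strategy from the paper, builds pos(P, I), computes its least fixpoint and compares it with I. The string encoding of ground atoms (for instance `vh(a)` for the hatted atom and `v1(a)` for the primed one) and the two-node database with the single edge e(a, b) are assumptions made for illustration; only the ground rule instances that can fire on that database are listed.

```python
from itertools import chain, combinations


def least_model(positive_rules):
    """Least fixpoint of the immediate consequence operator T (positive ground program)."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in positive_rules:
            if head not in model and all(a in model for a in pos):
                model.add(head)
                changed = True
    return model


def reduct(rules, interp):
    """pos(P, I): drop rules with a literal ¬A where A is in I, then drop all negative literals."""
    return [(h, pos) for h, pos, neg in rules if not any(a in interp for a in neg)]


def is_stable(rules, interp):
    """I is T-stable iff it equals the least model of pos(P, I)."""
    return least_model(reduct(rules, interp)) == set(interp)


def stable_models(rules):
    """Enumerate all interpretations over the atoms of the program (exponential)."""
    atoms = sorted({h for h, _, _ in rules} | {a for _, p, n in rules for a in chain(p, n)})
    candidates = chain.from_iterable(combinations(atoms, k) for k in range(len(atoms) + 1))
    return [set(c) for c in candidates if is_stable(rules, c)]


# Ground rules as (head, positive body, negative body); hypothetical database {v(a), v(b), e(a,b)}.
P_CUT = [
    ("v(a)", (), ()), ("v(b)", (), ()), ("e(a,b)", (), ()),
    ("v1(a)", ("v(a)",), ("vh(a)",)), ("vh(a)", ("v(a)",), ("v1(a)",)),
    ("v1(b)", ("v(b)",), ("vh(b)",)), ("vh(b)", ("v(b)",), ("v1(b)",)),
    ("e1(a,b)", ("e(a,b)", "v1(a)"), ("v1(b)",)),
    ("e1(a,b)", ("e(a,b)", "v1(b)"), ("v1(a)",)),
]

if __name__ == "__main__":
    for m in stable_models(P_CUT):
        print(sorted(m))
```

Each stable model corresponds to one way of splitting the nodes, and the edge atom `e1(a,b)` belongs to exactly those models that put a and b on different sides.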

Example 7. Take the queries NQ_kernel and min(NQ_kernel) of Examples 3 and 5. Consider the following DATALOG¬ program P_kernel:

  v′(X) ← v(X), ¬v̂(X).
  v̂(X) ← v(X), ¬v′(X).
  joined_to_v′(X) ← v′(Y), e(Y, X).
  no_kernel ← v′(X), joined_to_v′(X).
  no_kernel ← v̂(X), ¬joined_to_v′(X).
  constraint ← no_kernel, ¬constraint.

We have that NQ_kernel = ⟨P_kernel, v′⟩ and min(NQ_kernel) = ⟨P_kernel, min(v′)⟩. The last rule discards every interpretation in which no_kernel holds, since no stable model can satisfy its body. Observe that P_kernel has no T-stable model iff NQ_kernel is not defined on D (i.e., there is no kernel). □

The problem in using DATALOG¬ to express search and optimization problems is that the usage of unrestricted negation in programs is often neither simple nor intuitive and, besides, it does not allow the expressive power to be disciplined (e.g., the classes NQPTIME and NQPTIME_g are not captured). This situation might lead one to write queries that have no total stable models or whose computation is hard even though the problem is not. On the other hand, as pointed out in Proposition 3, if we just use DATALOG¬s the expressive power is too low, so that we cannot express simple polynomial-time problems. For instance, the query asking for a spanning tree of an undirected graph needs the use of a program with unstratified negation such as:

  (1) reached(a).
  (2) reached(Y) ← spanTree(X, Y).
  (3) spanTree(X, Y) ← reached(X), e(X, Y), Y ≠ a, ¬diffChoice(X, Y).
  (4) diffChoice(X, Y) ← spanTree(Z, Y), Z ≠ X.
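One way to read rules (1)-(4) operationally is as a nondeterministic construction in which every node other than the root receives exactly one parent among the nodes already reached. The Python sketch below follows that reading; the function name `span_tree`, the seeded random generator and the small example graph are assumptions made for illustration, not part of the paper.

```python
import random


def span_tree(edges, root, rng=None):
    """Pick one spanning tree of the part of the graph reachable from `root`.

    `edges` holds directed pairs; an undirected graph is stored in both directions.
    """
    rng = rng or random.Random(0)      # assumption: fixed seed for repeatability
    reached = {root}                   # rule (1)
    tree = set()                       # the spanTree facts selected so far
    while True:
        # Rule (3): X is reached, e(X, Y) holds, Y is not the root, and
        # rule (4) does not fire, i.e. no spanTree(Z, Y) with Z != X exists.
        candidates = [
            (x, y) for (x, y) in sorted(edges)
            if x in reached and y != root and (x, y) not in tree
            and not any(w == y and z != x for (z, w) in tree)
        ]
        if not candidates:
            return tree
        x, y = rng.choice(candidates)  # the nondeterministic selection
        tree.add((x, y))
        reached.add(y)                 # rule (2)


# Hypothetical undirected graph on the nodes a, b, c, d.
UNDIRECTED = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
EDGES = {(x, y) for x, y in UNDIRECTED} | {(y, x) for x, y in UNDIRECTED}

if __name__ == "__main__":
    print(sorted(span_tree(EDGES, "a")))
```

Different seeds may return different trees, mirroring the fact that the program has one stable model per spanning tree rooted at a.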

But the freedom in the usage of negation may result in meaningless programs. For instance, in the above program, in an attempt to simplify it, one could decide to modify the third rule into


  (3′) spanTree(X, Y) ← reached(X), e(X, Y), Y ≠ a, ¬reached(Y).

and remove the fourth rule. Then the resulting program will have no total stable models, thus losing its practical meaning. Of course the risk of writing meaningless programs is present in any language, but this risk is much higher in a language with non-intuitive semantics, as is the case for unstratified negation. In the next section we propose a language where the usage of stable model semantics is disciplined to avoid both undefinedness and unnecessary computational complexity, and to refrain from abstruse forms of unstratified negation. The core of the language is stratified DATALOG extended with only one type of non-stratified negation, hardwired into two ad-hoc constructs. The disciplined structure of negation in our language will enable us to capture interesting subclasses of NQPMV.

4 Datalog Languages for Search and Optimization Problems

In this section we analyze the expressive power of several languages derived from DATALOG¬ by restricting the use of negation. In particular, we consider the combination of stratified negation, a nondeterministic construct called choice, and subset rules computing subsets of tuples of a given relation. The choice construct is supported by several deductive database systems such as LDL++ [33] and Coral [27], and it is used to enforce functional constraints on rules of a logic program. Thus, a goal of the form choice((X), (Y)) in a rule r denotes that the set of all consequences derived from r must respect the FD X → Y. In general, X can be a vector of variables (possibly an empty one, denoted by "( )") and Y is a vector of one or more variables. As shown in [29], the formal semantics of the construct can be given in terms of stable model semantics. For instance, a rule r of the form

  r : p(X, Y, W) ← q(X, Y, Z, W), choice((X), (Y)), choice((Y), (Z)).

expressing that for any stable model M, the ground instantiation of r w.r.t. M must satisfy the FDs X → Y and Y → Z, is rewritten into the following standard rules

  r1 : p(X, Y, W) ← q(X, Y, Z, W), chosen(X, Y, Z).
  r2 : chosen(X, Y, Z) ← q(X, Y, Z, W), ¬diffchoice(X, Y, Z).
  r3 : diffchoice(X, Y, Z) ← chosen(X, Y′, Z′), Y ≠ Y′.
  r4 : diffchoice(X, Y, Z) ← chosen(X′, Y, Z′), Z ≠ Z′.

where the choice predicates have been substituted by the chosen predicate and for each choice predicate there is a diffchoice rule. The rule r will be called choice rule, the rule r1 will be called modified rule, the rule r2 will be called chosen rule and the rules r3 and r4 will be called diffchoice rules. Let P be a DATALOG¬ program with choice constructs; we denote by sv(P) the program obtained by rewriting the choice rules as above. The program sv(P) is called the standard version of P.
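The effect of the rewritten rules can be imitated procedurally. The Python sketch below builds one possible set of chosen tuples by scanning q and keeping a tuple only if neither diffchoice rule would fire against the tuples kept so far, reading r3 and r4 as written (no two chosen tuples agree on X but differ on Y, or agree on Y but differ on Z). The scan order stands in for the nondeterministic choice; the function names and the sample relation are assumptions made for illustration.

```python
def choose(q_tuples):
    """Return one set of chosen (X, Y, Z) tuples consistent with rules r2-r4."""
    chosen = []
    for (x, y, z, _w) in q_tuples:
        if (x, y, z) in chosen:
            continue
        # r3: chosen(X, Y', Z') with Y != Y'; r4: chosen(X', Y, Z') with Z != Z'.
        diffchoice = any(
            (cx == x and cy != y) or (cy == y and cz != z)
            for (cx, cy, cz) in chosen
        )
        if not diffchoice:             # r2: chosen only if diffchoice fails
            chosen.append((x, y, z))
    return chosen


def modified_rule(q_tuples, chosen):
    """r1: p(X, Y, W) holds for the q-tuples whose (X, Y, Z) part was chosen."""
    return {(x, y, w) for (x, y, z, w) in q_tuples if (x, y, z) in chosen}


# Hypothetical instance of the relation q(X, Y, Z, W).
Q = [(1, "a", "u", 10), (1, "b", "u", 11), (2, "a", "v", 12), (2, "c", "w", 13)]

if __name__ == "__main__":
    kept = choose(Q)
    print(kept)
    print(sorted(modified_rule(Q, kept)))
```

Scanning the tuples in a different order generally yields a different chosen set, and each such set corresponds to a different choice model.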


In general, the program sv(P) generated by the transformation discussed above has the following properties [29,13]: 1) if P is in DATALOG or in DATALOG¬s then sv(P) has one or more total stable models, and 2) the chosen atoms in each stable model of sv(P) obey the FDs defined by the choice goals. The stable models of sv(P) are called choice models for P. The set of functional dependencies defined by choice atoms on the instances of a rule r (resp., program P) will be denoted FD_r (resp., FD_P).

A subset rule is of the form

  s(X) ⊆ A_1, ..., A_n.

where s is an IDB predicate symbol not defined elsewhere in the program (subset predicate symbol) and all literals A_1, ..., A_n in the body are EDB. The rule selects any subset of the relation that is derived from the body. The formal semantics of the rule is given by rewriting it into the following set of normal DATALOG¬ rules

  s(X) ← A_1, ..., A_n, ¬ŝ(X).
  ŝ(X) ← A_1, ..., A_n, ¬s(X).

where ŝ is a new IDB predicate symbol with the same arity as s. Observe that the semantics of a subset rule can also be given in terms of choice as follows:

  label(1).
  label(2).
  ŝ(L, X) ← A_1, ..., A_n, label(L), choice((X), (L)).
  s(X) ← ŝ(1, X).

It turns out that subset rules are not necessary in our language, but we keep them in order to simplify the formulation of optimization queries. In the following we shall denote by DATALOG¬s,c the language DATALOG¬s with choice and subset rules. More formally:

Definition 5. A DATALOG¬ program P with choice and subset rules is in DATALOG¬s,c if P′ is stratified, where P′ is obtained from sv(P″) by removing diffchoice rules and diffchoice atoms, and P″ is obtained from P by rewriting subset rules in terms of choice constructs. Search and optimization queries are denoted by search(DATALOG¬s,c) and opt(DATALOG¬s,c), respectively. Moreover, search(DATALOG¬s,c)_g denotes the class of queries NQ = ⟨P, f⟩ such that f is a relation defined by choice or subset rules and such rules are not defined in terms of other choice or subset rules; the corresponding optimization class is opt(DATALOG¬s,c)_g. □

Proposition 4.
1. search(DATALOG¬s,c) = NQPTIME and opt(DATALOG¬s,c) = OPT NQPTIME;
2. search(DATALOG¬s,c)_g = NQPTIME_g and opt(DATALOG¬s,c)_g = OPT NQPTIME_g.


PROOF. The fact that search(DATALOG¬s,c) = NQPTIME has been proven in many places, e.g., in [13,21,12]. Observe now that, given any query Q in search(DATALOG¬s,c)_g, Q ∈ NQPTIME as Q is also in search(DATALOG¬s,c). Moreover, for each D and for each answer of Q(D), the non-deterministic choices that are issued while executing the logic program are kept in the answer; thus every answer contains a certificate of its recognition and, hence, recognition is in P. Therefore Q ∈ NQPMV_g as well and, then, Q ∈ NQPTIME_g. To show that every query Q in NQPTIME_g is also in search(DATALOG¬s,c)_g, we use the following characterization of NQPTIME_g [21]: every answer of Q can be constructed starting from the empty relation by adding one tuple at a time after a polynomial-time membership test. This construction can be easily implemented by defining a suitable query in search(DATALOG¬s,c)_g. □

Next we show how to increase the expressive power of the language. We stress that the additional power is added in a controlled fashion, so that a high level of expressivity is automatically enabled only if required by the complexity of the problem at hand.

Definition 6. Let search(DATALOG¬s,c)_! denote the class of queries NQ = ⟨P, !A, f⟩ such that ⟨P, f⟩ is in search(DATALOG¬s,c) and A is a ground literal (the constraint goal); for each D on DB_P, NQ(D) = {M(f) : M is a T-stable model of P_D and either A ∈ M if A is positive, or B ∉ M if A is the negative literal ¬B}. Accordingly, we define opt(DATALOG¬s,c)_!, search(DATALOG¬s,c)_g,! and opt(DATALOG¬s,c)_g,!. □

Proposition 5.
1. search(DATALOG¬s,c)_! = NQPMV and opt(DATALOG¬s,c)_! = OPT NQPMV;
2. search(DATALOG¬s,c)_g,! = NQPMV_g and opt(DATALOG¬s,c)_g,! = OPT NQPMV_g.

PROOF. Given any query Q = ⟨P, !A, f⟩ in search(DATALOG¬s,c)_!, Q ∈ NQPMV since for each database D and for each relation F, testing whether F ∈ Q(D) can be done in nondeterministic polynomial time as follows: we guess an interpretation M and then check in deterministic polynomial time whether M is a stable model and A is true in M. To prove that every query Q in NQPMV can be defined by some query in search(DATALOG¬s,c)_!, we observe that Q can be expressed by a closed first-order formula by Fact 2 and that this formula can be easily translated into a query in search(DATALOG¬s,c)_!. The proof of part (2) follows the lines of the proof of part (2) of Proposition 4. □

Example 8. The program P_cut of Example 6 can be replaced by the following program P_cut:

  v′(X) ⊆ v(X).
  e′(X, Y) ← e(X, Y), v′(X), ¬v′(Y).
  e′(X, Y) ← e(X, Y), ¬v′(X), v′(Y).

The query ⟨P_cut, e′⟩ is in search(DATALOG¬s,c)_g and, therefore, the query ⟨P_cut, max(e′)⟩ is in max(DATALOG¬s,c)_g. □
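A brute-force reading of this example can be sketched in Python: the subset rule enumerates every v′ ⊆ v, the two stratified rules compute e′ for each selection, and the optimization goal keeps an answer of maximum cardinality. The helper names and the triangle graph are assumptions made for illustration, and the enumeration is exponential in the number of nodes.

```python
from itertools import chain, combinations


def subsets(items):
    """All subsets of `items`, smallest first (the selections of a subset rule)."""
    items = list(items)
    return chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))


def cut_answers(v, e):
    """Yield (v', e') for every selection v' of the subset rule."""
    for s in subsets(v):
        v1 = set(s)
        # e'(X, Y) holds when exactly one endpoint of e(X, Y) is in v'.
        e1 = {(x, y) for (x, y) in e if (x in v1) != (y in v1)}
        yield v1, e1


def max_cut(v, e):
    """An answer (v', e') with the largest number of e' tuples."""
    return max(cut_answers(v, e), key=lambda answer: len(answer[1]))


# Hypothetical triangle graph, each undirected edge stored once.
V = [1, 2, 3]
E = {(1, 2), (2, 3), (1, 3)}

if __name__ == "__main__":
    side, cut = max_cut(V, E)
    print(side, sorted(cut))
```

On a triangle at most two edges can cross any partition, so every maximum answer contains two e′ tuples.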


The program of the above example has been derived from the program of Example 6 by replacing the two rules with unstratified negation, defining v′, with a subset rule.

Example 9. The program P_kernel of Example 7 can be replaced by the following program P_kernel:

  v′(X) ⊆ v(X).
  joined_to_v′(X) ← v′(Y), e(Y, X).
  no_kernel ← v′(X), joined_to_v′(X).
  no_kernel ← ¬v′(X), ¬joined_to_v′(X).

The query ⟨P_kernel, !¬no_kernel, v′⟩ is in search(DATALOG¬s,c)_g,! and, therefore, the query ⟨P_kernel, min|v′|⟩ is in min(DATALOG¬s,c)_g,!. □

The advantage of using restricted languages is that programs with built-in predicates are more intuitive and it is possible to control the expressive power.
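The role of the constraint goal in the kernel example can be illustrated with a small Python sketch: every subset selection is generated, the stratified rules derive no_kernel, and only the selections for which no_kernel fails are kept as answers. The function names and the two sample graphs (a directed path and a directed 3-cycle) are assumptions made for illustration.

```python
from itertools import chain, combinations


def subsets(items):
    """All subsets of `items`, smallest first."""
    items = list(items)
    return chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))


def kernels(v, e):
    """Selections v' accepted by the constraint goal !¬no_kernel."""
    accepted = []
    for s in subsets(v):
        v1 = set(s)
        joined = {x for (y, x) in e if y in v1}          # joined_to_v'(X)
        no_kernel = (
            any(x in joined for x in v1)                  # two selected nodes are joined
            or any(x not in v1 and x not in joined for x in v)  # an unselected node is not joined
        )
        if not no_kernel:
            accepted.append(v1)
    return accepted


def min_kernel(v, e):
    """A kernel of minimum cardinality, or None when the query is undefined."""
    found = kernels(v, e)
    return min(found, key=len) if found else None


# Hypothetical directed graphs: the path 1 -> 2 -> 3 and the cycle 1 -> 2 -> 3 -> 1.
PATH = {(1, 2), (2, 3)}
CYCLE = {(1, 2), (2, 3), (3, 1)}

if __name__ == "__main__":
    print(kernels([1, 2, 3], PATH))
    print(min_kernel([1, 2, 3], CYCLE))
```

The directed 3-cycle has no kernel, which corresponds to the case where the query is not defined on the database.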

5 Capturing Desirable Subclasses of NP Optimization Problems

We have shown that optimization queries are much harder than the associated search queries. Indeed, it often happens that the optimization of polynomial-time computable search queries cannot be done in polynomial time. In this section we show how to capture optimization queries for which “approximate” answers can be found in polynomial time. Let us first recall that, as said in Proposition 1, an NP optimization query opt|NQ| = ⟨DB, opt|f|⟩ corresponds to a problem in the class OPT PB that is defined as

  opt|NQ| = opt_S |{w : (DB, S) ⊨ φ(w, DB, S)}|.

In addition to the free variables w, the first-order formula φ may also contain quantified variables, so that its general format is of one of two types:

  (∃x_1)(∀x_2) ... (Q_k x_k) ψ(w, DB, S, x_1, ..., x_k), or
  (∀x_1)(∃x_2) ... (Q_k x_k) ψ(w, DB, S, x_1, ..., x_k),

where k ≥ 0, Q_k is either ∃ or ∀, and ψ is a non-quantified formula. In the first case φ is a Σ_k formula, while it is a Π_k formula in the latter case. (If φ has no quantifiers then it is both a Σ_0 and a Π_0 formula.) Accordingly, the class of all NP optimization problems for which the formula φ is a Σ_k (resp., Π_k) formula is called OPT Σ_k (resp., OPT Π_k). Kolaitis and Thakur [20] have introduced two hierarchies for the polynomially bounded NP minimization problems and for the polynomially bounded NP maximization problems:

  MAX Σ_0 ⊂ MAX Σ_1 ⊂ MAX Π_1 = MAX Σ_2 ⊂ MAX Π_2 = MAX PB
  MIN Σ_0 = MIN Σ_1 ⊂ MIN Π_1 = MIN Σ_2 = MIN PB


Observe that the classes MAX Σ_0 and MAX Σ_1 were first introduced in [26] with the names MAX SNP and MAX NP, respectively, whereas the class MAX Π_1 was first introduced in [25]. A number of maximization problems have a desirable property: approximation. In particular, Papadimitriou and Yannakakis have shown that every problem in the class MAX Σ_1 is constant-approximable [26]. This is not the case for the complementary class MIN Σ_1 or other minimization subclasses: indeed, the class MIN Σ_0 contains problems which are not log-approximable (unless P = NP) [20]. To single out desirable subclasses for minimization problems, Kolaitis and Thakur introduced a refinement of the hierarchies of NP optimization problems by means of the notion of feasible NP optimization problem, based on the fact that, as pointed out in Proposition 1, an NP optimization query opt|NQ| = ⟨DB, opt|f|⟩ can also be defined as

  opt_{f,S} {|f| : (DB, f, S) ⊨ φ(DB, f, S)}.

Therefore, the class of all NP optimization problems for which the above formula φ is a Σ_k (resp., Π_k) formula is called OPT F Σ_k (resp., OPT F Π_k). The following containment relations hold:

  MAX Σ_0 ⊂ MAX Σ_1 ⊂ MAX F Π_1 = MAX F Σ_2 = MAX Π_1 = MAX Σ_2 ⊂ MAX F Π_2 = MAX Π_2 = MAX PB
  MAX F Σ_1 ⊂ MAX Σ_1

  MIN Σ_0 = MIN Σ_1 = MIN F Π_1 ⊂ MIN F Σ_2 ⊂ MIN Π_1 = MIN Σ_2 = MIN F Π_2 = MIN Π_2 = MIN PB
  (MIN F Σ_1 appears as a side branch of this chain)

Observe that all problems in MAX F Σ_1 are constant-approximable since MAX F Σ_1 ⊂ MAX Σ_1. A further refinement of feasible NP optimization classes can be obtained as follows. A first-order formula φ(S) is positive w.r.t. the relation symbol S if all occurrences of S are within an even number of negations. The class of feasible NP minimization problems whose first-order part is a positive Π_k formula (1 ≤ k ≤ 2) is denoted by MIN F⁺ Π_k. Particularly relevant is MIN F⁺ Π_1, as all optimization problems contained in this class are constant-approximable [20].

We next show that it is possible to further discipline DATALOG¬s,c in order to capture most of the above-mentioned optimization subclasses. First of all, we point out that feasible NP optimization problems can be captured in DATALOG¬s,c,! by restricting to the class opt(DATALOG¬s,c)_g. For instance, the problem expressed by the query of Example 9 is feasible whereas the problem expressed by the query of Example 8 is not feasible.

Let P be a DATALOG¬s,c program, p(y) be an atom and X a set of variables. We say that p(y) is free w.r.t. X (in P) if
1. var(p(y)) ⊆ X, where var(p(y)) is the set of variables occurring in y, and
2. for every r ∈ P such that the head H(r) and p(y) unify, var(B(r)) ⊆ var(H(r)) (i.e., the variables in the body also appear in the head) and, for each atom q(w) in B(r), either q is an EDB predicate or q(w) is free w.r.t. var(q(w)).
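The freeness condition can be approximated by a simple syntactic check. The Python sketch below works at the level of predicate symbols: it assumes that all arguments are variables, so that unification reduces to matching predicate names, it ignores the sign of body literals, and it treats a predicate that is already being examined as acceptable. The rule encoding and the predicate names `v1`, `e1`, `s`, `f` are assumptions made for illustration; the two programs mimic the MAX CUT program with a subset rule and a MAX SATISFIABILITY-style program whose rules use an extra body variable.

```python
def is_free(pred, rules, edb, _seen=None):
    """Simplified test that p(X) is free w.r.t. X for a list X of distinct variables.

    `rules` is a list of ((head_pred, head_vars), [(body_pred, body_vars), ...]).
    """
    seen = _seen if _seen is not None else set()
    if pred in seen:                      # assumption: recursion is accepted
        return True
    seen.add(pred)
    for (head_pred, head_vars), body in rules:
        if head_pred != pred:
            continue
        body_vars = {v for (_q, q_vars) in body for v in q_vars}
        if not body_vars <= set(head_vars):   # var(B(r)) must be within var(H(r))
            return False
        for (q, _q_vars) in body:
            if q not in edb and not is_free(q, rules, edb, seen):
                return False
    return True


# Subset rule v1(X) ⊆ v(X) is encoded like an ordinary rule with an EDB body.
CUT_RULES = [
    (("v1", ("X",)), [("v", ("X",))]),
    (("e1", ("X", "Y")), [("e", ("X", "Y")), ("v1", ("X",)), ("v1", ("Y",))]),
]
SAT_RULES = [
    (("s", ("X",)), [("a", ("X",))]),
    (("f", ("X",)), [("c", ("X",)), ("p", ("X", "V")), ("s", ("V",))]),
    (("f", ("X",)), [("c", ("X",)), ("n", ("X", "V")), ("s", ("V",))]),
]

if __name__ == "__main__":
    print(is_free("e1", CUT_RULES, edb={"v", "e"}))
    print(is_free("f", SAT_RULES, edb={"a", "c", "p", "n"}))
```

The first call succeeds because every body variable also occurs in the head, whereas the second fails because of the body variable V.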


We denote by opt(DATALOG¬s,c)_∄ the class of all queries ⟨P, opt|f|⟩ in opt(DATALOG¬s,c) such that f(X) is free w.r.t. X, where X is a list of distinct variables. Thus, opt(DATALOG¬s,c)_∄ denotes the class of all queries ⟨P, opt|f|⟩ in opt(DATALOG¬s,c) where all rules used to define (transitively) the predicate f do not have additional variables w.r.t. the head variables. For instance, the query of Example 8 is in opt(DATALOG¬s,c)_∄.

Theorem 1. opt(DATALOG¬s,c)_∄ = OPT Σ_0.

PROOF. Let ⟨P, opt|f|⟩ be a query in opt(DATALOG¬s,c)_∄. Consider the rules that define directly or indirectly the goal f and let X be a list of a(f) distinct variables. Since f(X) is free w.r.t. X by hypothesis, it is possible to rewrite the variables in the above rules such that they are a subset of X. It is now easy to show that the query can be written as a quantifier-free first-order formula with the free variables X, i.e., the query is in OPT Σ_0. The proof that every query in OPT Σ_0 can be formulated as a query in opt(DATALOG¬s,c)_∄ is straightforward. □

It turns out that all queries in max(DATALOG¬s,c)_∄ are constant-approximable.

Example 10. MAX CUT. Consider the program P_cut of Example 8. The query ⟨P_cut, max(e′)⟩ is in MAX Σ_0 since e′(X, Y) is free w.r.t. X, Y. □

Let P be a DATALOG¬s,c program and p(y) be an atom. We say that P is semipositive w.r.t. p(y) if
1. p is an EDB or a subset predicate symbol, or
2. for every r ∈ P defining p, P is semipositive w.r.t. every positive literal in the body B(r), while each negative literal is EDB or subset.

We now denote by opt(DATALOG¬s,c)_+ the class of all queries ⟨P, opt(f)⟩ in opt(DATALOG¬s,c) such that P is semipositive w.r.t. f(X). Thus, opt(DATALOG¬s,c)_+ denotes the class of all queries ⟨P, opt|f|⟩ in opt(DATALOG¬s,c) where negated predicates used to define (transitively) the predicate f are either EDB predicates or subset predicates. For instance, the query of Example 8 is in opt(DATALOG¬s,c)_+. Moreover, since the predicate appearing in the goal is a subset predicate, the query of Example 8 is in opt(DATALOG¬s,c)_g,+.

Theorem 2.
1. opt(DATALOG¬s,c)_+ = OPT Σ_1,
2. opt(DATALOG¬s,c)_g,+ = OPT F Σ_1.

PROOF. Let ⟨P, opt|f|⟩ be a query in opt(DATALOG¬s,c)_+ and X be a list of a(f) distinct variables. Consider the rules that define directly or indirectly the goal f. Since P is semipositive w.r.t. f(X) by hypothesis, it is possible to rewrite the variables in the above rules such that each of them is either in X or existentially quantified. It is now easy to show that the query can be formulated in the OPT Σ_1 format. The proof of part (2) is straightforward. □

Then all queries in both max(DATALOG¬s,c)_+ and max(DATALOG¬s,c)_g,+ are constant-approximable.


Example 11. MAX SATISFIABILITY. We are given two unary relations c and a such that a fact c(x) denotes that x is a clause and a fact a(v) asserts that v is a variable occurring in some clause. We also have two binary relations p and n such that the facts p(x, v) and n(x, v) say that a variable v occurs in the clause x positively or negatively, respectively. A boolean formula in conjunctive normal form can be represented by means of the relations c, a, p, and n. The maximum number of clauses simultaneously satisfiable under some truth assignment can be expressed by the query ⟨P_sat, max(f)⟩ where P_sat is the following program:

  s(X) ⊆ a(X).
  f(X) ← c(X), p(X, V), s(V).
  f(X) ← c(X), n(X, V), ¬s(V).

Observe that f(X) is not free w.r.t. X (indeed the query is not in MAX Σ_0) but P_sat is semipositive w.r.t. f(X), so that the query is in MAX Σ_1. Observe now that the query goal f is not a subset predicate: indeed the query is not in MAX F Σ_1. □

Let !A be a goal in a query in opt(DATALOG¬s,c)_! on a program P; recall that A is a positive or negative ground literal. Then a (not necessarily ground) atom C has
1. a mark 0 w.r.t. A if C = A;
2. a mark 1 w.r.t. A if C = ¬A;
3. a mark k ≥ 0 w.r.t. A if there exists a rule r′ in P and a substitution σ for the variables in C such that either (i) H(r′) has mark (k − 1) w.r.t. A and Cσ occurs negated in the body of r′, or (ii) H(r′) has mark k w.r.t. A and Cσ is a positive literal in the body of r′.

Let us now define the class opt(DATALOG¬s,c)_!,∄ of all queries ⟨P, !A, opt(f)⟩ in opt(DATALOG¬s,c)_! such that (i) f(X) is free w.r.t. X and (ii) for each atom C that has an even mark w.r.t. A and for every rule r′ in P whose head unifies with C, the variables occurring in the body B(r′) also occur in the head H(r′). We are finally able to define a subclass which captures OPT F⁺ Π_1, which is approximable when OPT = MIN. To this end, we define opt(DATALOG¬s,c)_!,∄,g,+ as the subclass of opt(DATALOG¬s,c)_!,∄,g consisting of those queries ⟨P, !A, opt(f)⟩ such that there exists no subset atom s(x) having an odd mark w.r.t. A.

Theorem 3.
1. opt(DATALOG¬s,c)_!,∄ = OPT Π_1;
2. opt(DATALOG¬s,c)_!,∄,g = OPT F Π_1;
3. opt(DATALOG¬s,c)_!,∄,g,+ = OPT F⁺ Π_1.

PROOF. Let ⟨P, !A, opt(f)⟩ be a query in opt(DATALOG¬s,c)_!,∄. Consider the rules that define directly or indirectly the goal f and let X be a list of a(f) distinct variables. Since f(X) is free w.r.t. X by hypothesis, it is possible to


rewrite the variables in the above rules such that they are a subset of X. Consider now the rules that define directly or indirectly the goal !A. We can now rewrite the variables in the above rules such that they are universally quantified. It is now easy to show that the query can be written as an existential-free first-order formula with the free variables X and possibly additional variables universally quantified, i.e., the query is in OPT Π_1. The proofs of the other relationships are simple. □

Example 12. MAX CLIQUE. In this example we want to find the cardinality of a maximum clique, i.e., a set of nodes V′ such that for each pair of nodes (x, y) in V′ there is an edge joining x to y. The maximum clique problem can be expressed by the query ⟨P_clique, !¬no_clique, max(v′)⟩ where the program P_clique is as follows:

  v′(X) ⊆ v(X).
  no_clique ← v′(X), v′(Y), X ≠ Y, ¬e(X, Y).

The query is in the class max(DATALOG¬s,c)_!,∄,g and, therefore, the optimization query is in MAX F Π_1 (= MAX Π_1). On the other hand, both atoms v′(X) and v′(Y) in the body of the rule defining the predicate no_clique have mark 1 (i.e., odd) w.r.t. the "!" goal. Therefore, the query ⟨P_clique, !¬no_clique, max(v′)⟩ is not in the class max(DATALOG¬s,c)_!,∄,g,+, thus it is not in MAX F⁺ Π_1. □

Example 13. MIN VERTEX COVER. As discussed in the introduction, the problem can be formulated by the query ⟨P_vc, !¬no_cover, min(v′(X))⟩ where P_vc is the following program:

  v′(X) ⊆ v(X).
  no_cover ← e(X, Y), ¬v′(X), ¬v′(Y).

Observe that both atoms v′(X) and v′(Y) in the rule defining no_cover have a mark 2 (i.e., even) w.r.t. the “!” goal. Therefore, the query is in min(DATALOG¬s,c)_!,∄,g,+ and, then, in MIN F⁺ Π_1; so the problem is constant-approximable. □

Additional interesting subclasses could be captured in our framework, but they are not investigated here. We just give an example of a query which is in the class MIN F⁺ Π_2(1); this class is a subset of MIN Π_2 where every subset predicate symbol occurs positively and at most once in every disjunction of the formula ψ. Problems in this class are log-approximable [20].

Example 14. MIN DOMINATING SET. Let G = (V, E) be a graph. A subset V′ of V is a dominating set if every node is either in V′ or has a neighbour in V′. The query ⟨P_ds, !¬no_ds, min(v′(X))⟩, where P_ds is the following program, computes the cardinality of a minimum dominating set:

  v′(X) ⊆ v(X).
  q(X) ← v′(X).
  q(X) ← e(X, Y), v′(Y).
  no_ds ← v(X), ¬q(X).

This problem belongs to MIN F⁺ Π_2(1). □
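The last two examples share one pattern: a subset rule proposes v′, a stratified rule derives a violation atom, and the constraint goal rejects the selections in which that atom holds. The Python sketch below makes the pattern explicit by exhaustive enumeration. It says nothing about the approximation algorithms behind the constant- and log-approximability results; the helper names and the four-node path graph are assumptions made for illustration.

```python
from itertools import chain, combinations


def subsets(items):
    """All subsets of `items`, smallest first."""
    items = list(items)
    return chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))


def minimize(v, violated):
    """Smallest selection v' for which the violation atom is not derived, or None."""
    feasible = [set(s) for s in subsets(v) if not violated(set(s))]
    return min(feasible, key=len) if feasible else None


def min_vertex_cover(v, e):
    # no_cover <- e(X, Y), not v'(X), not v'(Y).
    return minimize(v, lambda v1: any(x not in v1 and y not in v1 for x, y in e))


def min_dominating_set(v, e):
    def no_ds(v1):
        # q(X) <- v'(X).   q(X) <- e(X, Y), v'(Y).   no_ds <- v(X), not q(X).
        q = set(v1) | {x for x, y in e if y in v1}
        return any(x not in q for x in v)
    return minimize(v, no_ds)


# Hypothetical undirected path 1 - 2 - 3 - 4, stored in both directions.
V = [1, 2, 3, 4]
PATH = [(1, 2), (2, 3), (3, 4)]
E = set(PATH) | {(y, x) for x, y in PATH}

if __name__ == "__main__":
    print(min_vertex_cover(V, E))
    print(min_dominating_set(V, E))
```

On this path both optima have two nodes, although the sets returned need not coincide.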


Observe that the problem min kernel as defined in Example 9 is in the class MIN F Π_2, but not in MIN F⁺ Π_2, as it contains occurrences of the subset predicate v′ which have an odd mark w.r.t. the "!" goal.

6 Conclusion

In this paper we have shown that NP search and optimization problems can be formulated as DATALOG¬ queries under non-deterministic total stable model semantics. In order to enable a simpler and more intuitive formulation of such problems, we have also introduced an extension of stratified DATALOG¬ that is able to express all NP search and optimization queries using a disciplined style of programming in which only simple forms of unstratified negation are supported. The core of this language, denoted by DATALOG¬s,c,!, is stratified DATALOG¬ augmented with three types of non-stratified negation which are hardwired into ad-hoc constructs: choice predicate, subset rule and constraint goal. The first two constructs serve to issue non-deterministic selections while constructing one of the possible total stable models, whereas the third one defines a constraint that must be respected by the stable model in order to be accepted as an intended meaning of the program.

The language DATALOG¬s,c,! has been further refined in order to capture interesting subclasses of NP search queries, some of them computable in polynomial time. As for optimization queries, since in general they are not tractable even when the associated search problems are, we introduced restrictions to our language to single out classes of approximable optimization problems which have been recently introduced in the literature.

Our on-going research follows two directions:
1. efficient implementation schemes for the language, particularly to perform effective subset selections by pushing down constraints and possibly adopting 'intelligent' search strategies; this is particularly useful if one wants to find approximate solutions;
2. further extensions of the language such as (i) adding the possibility to use IDB predicates whenever an EDB predicate is required (provided that IDB definitions are only given by stratified rules), (ii) freezing, under request, nondeterministic selections to enable a "don't care" non-determinism (thus, some selections cannot be eventually retracted because of the constraint goal), and (iii) introducing additional constructs, besides choice and subset rules, to enable nondeterministic selections satisfying predefined constraints that are tested on the fly.

References

1. Abiteboul, S., Simon, E., and Vianu, V., Non-deterministic languages to express deterministic transformations. In Proc. ACM Symp. on Principles of Database Systems, 1990, pp. 218-229.


2. Abiteboul, S., and Vianu, V., Non-determinism in logic-based languages. Annals of Mathematics and Artificial Intelligence 3, 1991, pp. 151-186.
3. Abiteboul, S., Hull, R., and Vianu, V., Foundations of Databases. Addison-Wesley, 1994.
4. Afrati, F., Cosmadakis, S. S., and Yannakakis, M., On Datalog vs. Polynomial Time. Proc. ACM Symp. on Principles of Database Systems, 1991, pp. 13-25.
5. Apt, K., Blair, H., and Walker, A., Towards a theory of declarative knowledge. In Foundations of Deductive Databases and Logic Programming, J. Minker (ed.), Morgan Kaufmann, Los Altos, USA, 1988, pp. 89-142.
6. Ausiello, G., Crescenzi, P., and Protasi, M., Approximate solution of NP optimization problems. Theoretical Computer Science, No. 150, 1995, pp. 1-55.
7. Ausiello, G., Crescenzi, P., Gambosi, G., Kann, V., Marchetti-Spaccamela, A., and Protasi, M., Complexity and Approximation - Combinatorial optimization problems and their approximability properties. Springer-Verlag, 1999.
8. Fagin, R., Generalized First-Order Spectra and Polynomial-Time Recognizable Sets. In Complexity of Computation (R. Karp, Ed.), SIAM-AMS Proc., Vol. 7, 1974, pp. 43-73.
9. Fenner, S., Green, F., Homer, S., Selman, A. L., Thierauf, T., and Vollmer, H., Complements of Multivalued Functions. Chicago Journal of Theoretical Computer Science, 1999.
10. Garey, M., and Johnson, D. S., Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman, New York, USA, 1979.
11. Gelfond, M., and Lifschitz, V., The Stable Model Semantics for Logic Programming. Proc. 5th Int. Conf. on Logic Programming, 1988, pp. 1070-1080.
12. Giannotti, F., Pedreschi, D., and Zaniolo, C., Semantics and Expressive Power of Non-Deterministic Constructs in Deductive Databases. Journal of Computer and System Sciences, 62, 1, 2001, pp. 15-42.
13. Giannotti, F., Pedreschi, D., Saccà, D., and Zaniolo, C., Nondeterminism in Deductive Databases. Proc. 2nd Int. Conf. on Deductive and Object-Oriented Databases, 1991, pp. 129-146.
14. Greco, S., Saccà, D., and Zaniolo, C., Datalog with Stratified Negation and Choice: from P to D^P. Proc. Int. Conf. on Database Theory, 1995, pp. 574-589.
15. Greco, S., and Saccà, D., NP-Optimization Problems in Datalog. Proc. Int. Logic Programming Symp., 1997, pp. 181-195.
16. Greco, S., and Zaniolo, C., Greedy Algorithms in Datalog. Proc. Int. Joint Conf. and Symp. on Logic Programming, 1998, pp. 294-309.
17. Johnson, D. S., A Catalog of Complexity Classes. In Handbook of Theoretical Computer Science, Vol. 1, J. van Leeuwen (ed.), North-Holland, 1990.
18. Kanellakis, P. C., Elements of Relational Database Theory. In Handbook of Theoretical Computer Science, Vol. 2, J. van Leeuwen (ed.), North-Holland, 1991.
19. Kolaitis, P. G., and Thakur, M. N., Logical Definability of NP Optimization Problems. Information and Computation, No. 115, 1994, pp. 321-353.
20. Kolaitis, P. G., and Thakur, M. N., Approximation Properties of NP Minimization Classes. Journal of Computer and System Science, No. 51, 1995, pp. 391-411.


21. Leone, N., Palopoli, L., and Sacc` a, D. On the Complexity of Search Queries. In Fundamentals Of Information Systems (T. Plle, T. Ripke, K.D. Schewe, eds), 1999, pp. 113-127. 22. Lloyd, J., Foundations of Logic Programming. Springer-Verlag, 1987. 23. Marek, W., and Truszczynski, M., Autoepistemic Logic. Journal of the ACM, Vol. 38, No. 3, 1991, pp. 588-619. 24. Papadimitriou, C. H., Computational Complexity. Addison-Wesley, Reading, MA, USA, 1994. 25. Panconesi, A., and Ranjan, D., Quantiﬁers and Approximation. Theoretical Computer Science, No. 1107, 1992, pp. 145-163. 26. Papadimitriou, C. H., and Yannakakis, M., Optimization, Approximation, and Complexity Classes. Journal Computer and System Sciences, No. 43, 1991, pp. 425-440. 27. Ramakrisnhan, R., Srivastava, D., and Sudanshan, S., CORAL — Control, Relations and Logic. In Proc. of 18th Conf. on Very Large Data Bases, 1992, pp. 238-250. 28. Sacc` a, D., The Expressive Powers of Stable Models for Bound and Unbound Queries. Journal of Computer and System Sciences, Vol. 54, No. 3, 1997, pp. 441464. 29. Sacc` a, D., and Zaniolo, C., Stable Models and Non-Determinism in Logic Programs with Negation. In Proc. ACM Symp. on Principles of Database Systems, 1990, pp. 205-218. 30. Selman, A., A taxonomy of complexity classes of functions. Journal of Computer and System Science, No. 48, 1994, pp. 357-381. 31. A. Selman, Much ado about functions. Proc. of the 11th Conf. on Computational Complexity, IEEE Computer Society Press, 1996, pp. 198-212. 32. Ullman, J. K., Principles of Data and Knowledge-Base Systems, volume 1 and 2. Computer Science Press, New York, 1988. 33. Zaniolo, C., Arni, N., and Ong, K., Negation and Aggregates in Recursive Rules: the LDL++ Approach. Proc. 3rd Int. Conf. on Deductive and Object-Oriented Databases, 1993, pp. 204-221.

The Declarative Side of Magic

Paolo Mascellani¹ and Dino Pedreschi²

¹ Dipartimento di Matematica, Università di Siena, via del Capitano 15, Siena - Italy, [email protected]
² Dipartimento di Informatica, Università di Pisa, Corso Italia 40, Pisa - Italy, [email protected]

Abstract. In this paper, we combine a novel method for proving partial correctness of logic programs with a known method for proving termination, and apply them to the study of the magic-sets transformation. As a result, we obtain a declarative reconstruction of efficient bottom-up execution of goal-driven deduction, in the sense that the resulting partial and total correctness results for the transformation abstract away from procedural semantics.

1 Introduction

In recent years, various principles and methods for the verification of logic programs have been put forward, as witnessed for instance in [11,3,16,17,13]. The main aim of this line of research is to verify the crucial properties of logic programs, notably partial and total correctness, on the basis of the declarative semantics only, or, equivalently, by abstracting away from procedural semantics. The aim of this paper is to apply some new methods for partial correctness, combined with some known methods for total correctness, to a case study of clear relevance, namely bottom-up computing. More precisely, we:

– introduce a method for proving partial correctness by extending the ideas in [14],
– combine it with the approach in [6,7] for proving termination, and
– apply both to the study of the transformation techniques known as magic-sets, introduced for the efficient bottom-up execution of goal-driven deduction (see [9,20] for a survey).

We found the exercise stimulating, as all proofs of correctness of the magic-sets transformation(s) available in the literature are based on operational arguments, and are often quite laborious. The results of partial and total correctness presented in this paper are instead based on purely declarative reasoning, which clarifies the natural idea underlying the magic-sets transformation. Moreover, these results are applicable under rather general assumptions, which broadly encompass the programming paradigm of deductive databases.

A.C. Kakas, F. Sadri (Eds.): Computat. Logic (Kowalski Festschrift), LNAI 2408, pp. 83–108, 2002. © Springer-Verlag Berlin Heidelberg 2002


Preliminaries

Throughout this paper we use the standard notation of Lloyd [12] and Apt [1]. In particular, for a logic program $P$ we denote the Herbrand base of $P$ by $B_P$, the least Herbrand model of $P$ by $M_P$, and the immediate consequence operator by $T_P$. Also, we use Prolog's convention of identifying, in the context of a program, each string starting with a capital letter with a variable, reserving other strings for the names of constants, terms or relations. In the programs we use Prolog's list notation. Identifiers ending with "s", like xs, range over lists. Bold capital letters, like $\mathbf{A}$, identify a possibly empty sequence (conjunction) of atoms or a set of variables: the context should always make clear which.

Plan of the Paper

In Section 2 we introduce a declarative method for proving the partial correctness of a logic program w.r.t. a specification. In Section 3 we use this method to obtain a declarative proof of the correctness of a particular implementation of the magic-sets transformation technique. Section 4 recalls the concept of acceptability, which makes it possible to conduct declarative termination proofs for logic programs. In Section 5, we apply this concept to prove the termination of the magic programs under and some related properties. Finally, in Section 6, we provide a set of examples in order to clarify the use of the proposed proof methods and how the magic-sets transformation works.

2 Partial Correctness

Partial correctness aims at characterizing the input-output behavior of programs. The input to a logic program is a query, and the associated output is the set of computed instances of that query. Therefore, partial correctness in logic programming deals with the problem of characterizing the computed instances of a query. In Apt [2,3], a notation recalling that of Hoare's triples (correctness formulas) is used. The triple

$$\{Q\}\ P\ \mathcal{Q}$$

denotes the fact that $\mathcal{Q}$ is the set of computed instances of the query $Q$. A natural question is: can we establish a correctness formula by reasoning on the declarative semantics, i.e. by abstracting away from procedural semantics? The following simple result, which generalizes one from [3], tells us that this is possible in the case of ground output.

Theorem 1. Consider the set $\mathcal{Q}$ of the correct instances of a query $Q$ and a program $P$, and suppose that every query in $\mathcal{Q}$ is ground. Then $\{Q\}\ P\ \mathcal{Q}$.


Proof. Clearly, every computed instance of $Q$ is also a correct instance of $Q$, by the Soundness of SLD-resolution. Conversely, consider a correct instance $Q_1$ of $Q$. By the Strong Completeness of SLD-resolution, there exists a computed instance $Q_2$ of $Q$ such that $Q_1$ is an instance of $Q_2$. By the Soundness of SLD-resolution, $Q_2$ is a correct instance of $Q$, so it is ground. Consequently $Q_2 = Q_1$, hence $Q_1$ is a computed instance of $Q$. □

So, for programs with ground output, correct and computed instances of queries coincide, and therefore we can use the declarative semantics directly to check partial correctness. When considering one-atom queries only, the above result can be rephrased as follows: if the one-atom query $A$ to program $P$ admits only ground correct instances, then

$$\{A\}\ P\ M_P \cap [A]. \tag{1}$$

A simple sufficient condition (taken from [3]) to check that all correct instances of a one-atom query $A$ are ground is to show that the set $[A] \cap M_P$ is finite, i.e. that $A$ admits a finite number of correct ground instances. So, in principle, it is possible to reason about partial correctness on the basis of the least Herbrand model only. As an example, consider the Append program:

    append([], Ys, Ys).
    append([X|Xs], Ys, [X|Zs]) ← append(Xs, Ys, Zs).

the interpretation

$$I_{Append} = \{\, \mathit{append}(xs, ys, zs) \mid xs, ys, zs \text{ are lists and } xs * ys = zs \,\} \tag{2}$$

where zs is some given list, and "$*$" denotes list concatenation, and the correctness formula

$$\{\mathit{append}(Xs, Ys, zs)\}\ \mathit{Append}\ I_{Append}$$

We can establish such a triple provided we can show that the interpretation $I_{Append}$ is indeed the least Herbrand model of the Append program, since the number of pairs of lists whose concatenation yields zs is clearly finite. Unfortunately, despite the fact that the set $I_{Append}$ is the natural intended interpretation of the Append program, it is not a model of Append, because the first clause does not hold in it. In fact, for many programs it is quite cumbersome to construct their least Herbrand model. Note for example that $M_{Append}$ contains elements of the form append(s,t,u) where neither t nor u is a list. A correct definition of $M_{Append}$ is rather intricate, and clearly it is quite clumsy to reason about programs when, even in such simple cases, their semantics is defined in such a laborious way.

Why is the least Herbrand model different from the specification, or intended interpretation, of a program? The reason is that we usually design programs with reference to a class of intended queries, which describes the admissible input for the program. As a consequence, the specification of the program is relative to the set of intended queries, whereas $M_P$ is not. In the example, the intended queries for Append are described by the set

$$\{\, \mathit{append}(s, t, u) \mid s, t \text{ are lists or } u \text{ is a list} \,\} \tag{3}$$

and it is possible to show that the specification (2) is indeed the fragment of the least Herbrand model $M_{Append}$ restricted to the set (3) of the intended queries. A method for identifying the intended fragment of the least Herbrand model is proposed in [4]; such a fragment is then used to establish the desired correctness formulas. This method adopts a notion of well-typedness [8,3,18], which makes it applicable to Prolog programs only, in that it exploits the left-to-right ordering of atoms within clauses. In the next section we introduce a more general characterization of the intended fragment of the least Herbrand model, which remedies the asymmetry of the method in [4], and allows us to prove partial correctness of logic programs with no reference to control issues.

Bases

The key concept of this paper is introduced in the following:

Definition 1. An interpretation $I$ is called a base for a program $P$ w.r.t. some model $M$ of $P$ iff, for every ground instance $A \leftarrow \mathbf{A}$ of every clause of $P$: if $I \models A$ and $M \models \mathbf{A}$, then $I \models \mathbf{A}$.

□

The notion of a base has been designed to formalize the idea of an admissible set of "intended (one-atom) queries". Definition 1 requires that all possible clauses which allow us to deduce an atom $A$ in a base $I$ have their bodies true in $I$ itself. The condition that the body is true in some model of the program (obviously a necessary condition to conclude $A$) is used to weaken the requirement. Roughly speaking, a base is required to include all possible atoms needed to deduce any atom in the base itself. The concept of a base was first introduced in [14], where it is referred to as a closed interpretation. As an example, it is readily checked that the set (3) is a base for Append. Since a base $I$ is assumed to describe the intended queries, the intended fragment of the least Herbrand model is $M_P \cap I$. The main motivation for introducing the notion of a base is to obtain a method for identifying $M_P \cap I$ directly, without having to construct $M_P$ first. To this purpose, given a base $I$ for a program $P$, we define the reduced program, denoted $P_I$, as the set of ground instances of clauses from $P$ whose heads are in $I$. In other words,

$$P_I = \{\, A \leftarrow \mathbf{A} \in \mathit{ground}(P) \mid A \in I \,\}. \tag{4}$$


The following observation is immediate:

$$T_{P_I}(X) = T_P(X) \cap I. \tag{5}$$
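The identity (5) can be checked mechanically on a finite ground program. The following is a minimal Python sketch, not part of the original development: the four-clause program, the atom names and the helper names `tp` and `reduce_program` are all illustrative assumptions, with ground atoms represented as plain strings.

```python
# Ground clauses are (head, body) pairs; atoms are plain strings.
# The tiny program below is a hypothetical example, not taken from the paper.
PROGRAM = [
    ("p(a)", []),
    ("p(b)", ["p(a)"]),
    ("q(a)", []),
    ("q(b)", ["q(a)", "p(b)"]),
]

def tp(program, interp):
    """Immediate consequence operator: heads of clauses whose body holds in interp."""
    return {head for head, body in program if all(b in interp for b in body)}

def reduce_program(program, base):
    """Reduced program P_I: keep only the ground clauses whose head lies in the base."""
    return [(head, body) for head, body in program if head in base]

I = {"p(a)", "p(b)"}   # atoms regarded as intended queries
X = {"p(a)", "q(a)"}   # an arbitrary interpretation

lhs = tp(reduce_program(PROGRAM, I), X)   # T_{P_I}(X)
rhs = tp(PROGRAM, X) & I                  # T_P(X) intersected with I
print(lhs == rhs, sorted(lhs))
```

Note that the identity only uses the way $P_I$ is built by filtering clause heads, so it does not depend on $I$ being a base.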

The following crucial result, first shown in [14], states that the least Herbrand model of the reduced program coincides with the intended fragment (w.r.t. $I$) of the least Herbrand model of the program:

Theorem 2. Let $P$ be a program and $I$ a base for $P$. Then $M_P \cap I = M_{P_I}$.

Proof. First, we show that, for any interpretation $J \subseteq M_P$:

$$T_{P_I}(J) = T_{P_I}(J \cap I). \tag{6}$$

The ($\supseteq$) part is a direct consequence of the fact that $T_{P_I}$ is monotonic. To establish the ($\subseteq$) part, consider $A \in T_{P_I}(J)$. Then, for some clause $A \leftarrow \mathbf{A}$ from $P_I$, we have $J \models \mathbf{A}$, hence $M_P \models \mathbf{A}$, which, together with the fact that $I$ is a base and $I \models A$, implies that $I \models \mathbf{A}$. Therefore $J \cap I \models \mathbf{A}$, which implies $A \in T_{P_I}(J \cap I)$. We now show that, for all $n \geq 0$,

$$T_P^n(\emptyset) \cap I = T_{P_I}^n(\emptyset)$$

which implies the thesis. The proof is by induction on $n$. In the base case ($n = 0$), the claim is trivially true. In the induction case ($n > 0$), we calculate:

$$\begin{aligned}
T_P^n(\emptyset) \cap I &= T_P(T_P^{n-1}(\emptyset)) \cap I \\
&= T_{P_I}(T_P^{n-1}(\emptyset)) && \{(5)\} \\
&= T_{P_I}(T_P^{n-1}(\emptyset) \cap I) && \{T_P^{n-1}(\emptyset) \subseteq M_P \text{ and } (6)\} \\
&= T_{P_I}(T_{P_I}^{n-1}(\emptyset)) && \{\text{induction hypothesis}\} \\
&= T_{P_I}^n(\emptyset).
\end{aligned}$$

□
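Theorem 2 can be illustrated on a small finite ground program. In the Python sketch below (an illustration under assumptions, with a hypothetical five-clause program and helper names of our own choosing), `is_base` transcribes Definition 1 taking $M = M_P$, and the final lines compare $M_P \cap I$ with $M_{P_I}$ for one set that is a base and one that is not.

```python
# Ground clauses are (head, body) pairs over string atoms.
# The program is a hypothetical example chosen for illustration.
PROGRAM = [
    ("p(a)", []),
    ("p(b)", ["p(a)"]),
    ("q(a)", ["p(a)"]),
    ("q(b)", ["q(a)", "p(b)"]),
    ("r(a)", ["r(a)"]),          # r(a) is never derivable
]

def tp(program, interp):
    """Immediate consequence operator T_P."""
    return {head for head, body in program if all(b in interp for b in body)}

def least_model(program):
    """Iterate T_P from the empty set up to its least fixpoint (finite programs only)."""
    interp = set()
    while True:
        nxt = tp(program, interp)
        if nxt == interp:
            return interp
        interp = nxt

def reduce_program(program, base):
    """Reduced program P_I: ground clauses whose head is in the base."""
    return [(head, body) for head, body in program if head in base]

def is_base(program, base, model):
    """Definition 1: head in I and body true in M imply body true in I."""
    return all(
        all(b in base for b in body)
        for head, body in program
        if head in base and all(b in model for b in body)
    )

M_P = least_model(PROGRAM)
good = {"p(a)", "p(b)"}      # closed under the atoms needed to derive it
bad = {"q(b)"}               # needs q(a) and p(b), which it does not contain

for candidate in (good, bad):
    print(is_base(PROGRAM, candidate, M_P),
          M_P & candidate == least_model(reduce_program(PROGRAM, candidate)))
```

For the base the two sets coincide, as the theorem states; for the set that is not a base the reduced program loses the atoms needed to derive q(b), and the equality fails.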

So, given a base $I$ for program $P$, $M_{P_I}$ is exactly the desired fragment of $M_P$. The reduced program $P_I$ is a tool to construct this fragment of $M_P$ without constructing $M_P$ first. Therefore, $M_{P_I}$ can be used directly to prove correctness formulas for intended queries, i.e. queries whose ground instances are in $I$, as stated in the following:


Theorem 3. Let $P$ be a program, $I$ a base for $P$, and $Q$ a one-atom query which admits ground correct instances only. Then

$$\{Q\}\ P\ M_{P_I} \cap [Q].$$

Proof. By Theorem 2, $M_{P_I} = M_P \cap I$, and $[Q] \subseteq I$ implies $M_P \cap [Q] = M_P \cap I \cap [Q]$. The result then follows immediately from (1) or, equivalently, Theorem 1. □

In the Append example, the intended specification (2) is indeed the least Herbrand model of the Append program reduced with respect to the base (3), so, using Theorem 3, we can establish the desired triple:

$$\{\mathit{append}(Xs, Ys, zs)\}\ \mathit{Append}\ \{\, \mathit{append}(xs, ys, zs) \mid zs = xs * ys \,\}$$
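The postcondition of this triple is a finite set: a list of length k can be split into a prefix and a suffix in exactly k + 1 ways. As a small illustration only (the function name `append_answers` and the use of Python lists in place of Prolog lists are assumptions), the set can be enumerated as follows.

```python
def append_answers(zs):
    """All pairs (xs, ys) with xs * ys == zs, i.e. the ground answers to append(Xs, Ys, zs)."""
    return [(zs[:i], zs[i:]) for i in range(len(zs) + 1)]

for xs, ys in append_answers(["a", "b"]):
    print(f"append({xs}, {ys}, ['a', 'b'])")
```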

Later, a simple, induction-less method for proving that a given interpretation is the least Herbrand model of certain programs is discussed.

Example 1. Consider the following program ListSum, computing the sum of a list of natural numbers:

    listsum([],0) ←
    listsum([X|Xs],Sum) ← listsum(Xs,PSum), sum(PSum,X,Sum)
    sum(X,0,X) ←
    sum(X,s(Y),s(Z)) ← sum(X,Y,Z)

and the Herbrand interpretations $I_{ListSum}$ and $M$, defined as follows:

$$\begin{aligned}
I_{ListSum} = {} & \{\, \mathit{listsum}(xs, sum) \mid \mathit{listnat}(xs) \,\} \cup {} \\
& \{\, \mathit{sum}(x, y, z) \mid \mathit{nat}(x) \wedge \mathit{nat}(y) \,\} \\
M = {} & \{\, \mathit{listsum}(xs, sum) \mid \mathit{listnat}(xs) \Rightarrow \mathit{nat}(sum) \,\} \cup {} \\
& \{\, \mathit{sum}(x, y, z) \mid \mathit{nat}(x) \wedge \mathit{nat}(y) \Rightarrow \mathit{nat}(z) \,\}
\end{aligned}$$

where listnat(x) and nat(x) hold when x is, respectively, a list of natural numbers and a natural number. First, we check that $M$ is a model of ListSum:

$$\begin{aligned}
& M \models \mathit{listsum}([\,], 0) \\
& M \models \mathit{listsum}([x|xs], sum) \;\Leftarrow\; M \models \mathit{listsum}(xs, psum), \mathit{sum}(psum, x, sum) \\
& M \models \mathit{sum}(x, 0, x) \\
& M \models \mathit{sum}(x, s(y), s(z)) \;\Leftarrow\; M \models \mathit{sum}(x, y, z)
\end{aligned}$$

Next, we check that $I_{ListSum}$ is a base for ListSum w.r.t. $M$:

$$\begin{aligned}
& I_{ListSum} \models \mathit{listsum}([x|xs], sum) \;\wedge\; M \models \mathit{listsum}(xs, psum), \mathit{sum}(psum, x, sum) \\
& \qquad \Rightarrow\; I_{ListSum} \models \mathit{listsum}(xs, psum), \mathit{sum}(psum, x, sum) \\
& I_{ListSum} \models \mathit{sum}(x, s(y), s(z)) \;\wedge\; M \models \mathit{sum}(x, y, z) \;\Rightarrow\; I_{ListSum} \models \mathit{sum}(x, y, z)
\end{aligned}$$

The following set is the intended interpretation of the ListSum program:

$$\{\, \mathit{listsum}(xs, sum) \mid \mathit{listnat}(xs) \wedge sum = \textstyle\sum_{x \in xs} x \,\} \cup \{\, \mathit{sum}(x, y, z) \mid \mathit{nat}(x) \wedge \mathit{nat}(y) \wedge x + y = z \,\} \tag{7}$$

and, although it is not a model of the program (the unit clause of sum does not hold in it), it is possible to prove that it is the fragment of $M_{ListSum}$ restricted to the base $I_{ListSum}$. Therefore, by Theorem 3, provided xs is a list of natural numbers, we establish the following triple:

$$\{\mathit{listsum}(xs, Sum)\}\ \mathit{ListSum}\ \{\, \mathit{listsum}(xs, sum) \mid sum = \textstyle\sum_{x \in xs} x \,\}.$$

□

In many examples, like the one above, bases are constructed using type information. Typically, a base is constructed by specifying the types of input positions of relations, and the model associated with a base is constructed by specifying how types propagate from input to output positions. If a decidable type definition language is adopted, such as the one proposed in [10], then checking that a given interpretation is a base is fully automatable. However, a full treatment of this aspect is outside the scope of this paper.

3 Partial Correctness and Bottom-Up Computing

Consider a naive bottom-up evaluation of the ListSum program. The sequence of approximating sets is hard to compute for several reasons.

1. The unit clause sum(X, 0, X) ← introduces infinitely many facts at the very first step. In fact, such a clause is not safe in the sense of [19], i.e. variables occur in the head which do not occur in the body.
2. Even if a safe version of the ListSum program is used, using a relation which generates natural numbers, the approximating sets grow exponentially large.
3. In any case, the bottom-up computation diverges.

In a goal-driven execution starting from the query listsum(xs,X), where xs is the input list and X is a variable, however, only a linearly increasing subset of each approximation is relevant. A more efficient bottom-up computation can be achieved using the program ListSum reduced w.r.t. an appropriate base $I$ which includes all instances of the desired query. Indeed, Theorem 2 tells us that, in the bottom-up computation, it is equivalent to take the intersection with the base $I$ at the limit of the computation, or at each approximation. The second option is clearly more efficient, as it allows us to promptly discard all facts which are irrelevant to answering the desired query. Therefore, the base should be chosen as small as possible, in order to minimize the size of the approximations. However, computing with the reduced program is unrealistic for two reasons. First, constructing a suitable base before the actual computation takes place is often impossible. In the ListSum example, an appropriate base should be chosen as follows:

$$\begin{aligned}
I_{xs} = {} & \{\, \mathit{listsum}(ys, sum) \mid \mathit{listnat}(ys) \wedge ys \text{ is a suffix of } xs \,\} \cup {} \\
& \{\, \mathit{sum}(x, y, z) \mid \mathit{nat}(x) \wedge \mathit{nat}(y) \wedge z \geq n \,\}
\end{aligned}$$

where xs is the input list and $n$ is the sum of the numbers in xs, that is, the expected result of the computation! Second, a reduced program is generally infinite or, at best, hopelessly large. Nevertheless, bases and reduced programs are useful abstractions to explain the idea behind optimization techniques like magic-sets, widely used in deductive database systems to support efficient bottom-up execution of goal-driven deduction. In fact, we shall see how the optimized magic program is designed to combine the construction of a base and its exploitation in an intertwined computation.

The Magic-Sets Transformation

In the literature, the problem of the efficient bottom-up execution of goal-driven computations has been tackled in a compilative way, i.e. by means of a repertoire of transformation techniques known under the name of magic-sets (see [9] or [20, Ch. 13] for a survey of this broad topic). Magic-sets is a non-trivial program transformation which, given a program $P$ and a query $Q$, yields a transformed program which, when executed bottom-up, mimics the top-down, Prolog-like execution of the original program $P$ activated on the query $Q$. Many variations of the basic magic-sets technique have been proposed, which however share the original idea. All available justifications of its correctness are given by means of procedural arguments, by relating the bottom-up computation of the transformed (magic) program to the top-down computation of the original program and query. As a consequence, all known proofs of correctness of the magic-sets transformation(s) are rather complicated, although informative about the relative efficiency of the top-down and bottom-up procedures (see for instance [20, pp. 836-841]).

We show here how the core of the magic-sets transformation can be explained in rather natural declarative terms, by adopting the notion of a base and the related results discussed in the previous section. Actually, we show that the "magic" of the transformation lies in computing and exploiting a base of the original program. We provide an incremental version of the core magic-sets transformation, which allows us to compile each clause of the program separately. We need to introduce the concept of call pattern, or mode, which relates to that of binding pattern in [20]. Informally, modes indicate whether the arguments of a relation should be used as an input or as an output, thus specifying the way a given program is intended to be queried.


Definition 2. Consider an $n$-ary relation symbol $p$. A mode for $p$ is a function

$$m_p : [1, n] \to \{+, -\}.$$

If $m_p(i) = +$, we call $i$ an input position of $p$, otherwise we call $i$ an output position of $p$. By a moding we mean a collection of modes, one for each relation symbol in a program. □

We represent modes in a compact way, writing $m_p$ in the more suggestive form $p(m_p(1), \ldots, m_p(n))$. For instance, the mode sum(+,+,-) specifies the input/output behavior of the relation sum, which is therefore expected to be queried with the first two positions filled in with ground terms. From now on we assume that some fixed moding is given for any considered program. To simplify our notation, we assume, without loss of generality, that, in each relation, input positions precede output positions, so that any atom $A$ can be viewed as $p(\mathbf{u}, \mathbf{v})$, where $\mathbf{u}$ are the terms in the input positions of $p$ and $\mathbf{v}$ are the terms in the output positions of $p$. With reference to this notation, the magic version of an atom $A = p(\mathbf{u}, \mathbf{v})$, denoted $A'$, is the atom $p'(\mathbf{u})$, where $p'$ is a fresh predicate symbol (not occurring elsewhere in the program), whose arity is the number of input positions of $p$. Intuitively, the magic atom $p'(\mathbf{u})$ represents the fact that the relation $p$ is called with input arguments $\mathbf{u}$. We are now ready to introduce our version of the magic-sets transformation.

Definition 3. Consider a program $P$ and a one-atom query $Q$. The magic program $O$ is obtained from $P$ and $Q$ by the following transformation steps:

1. for every decomposition $A \leftarrow \mathbf{A}, B, \mathbf{B}$ of every clause from $P$, add a new clause $B' \leftarrow A', \mathbf{A}$;
2. add a new unit clause $Q' \leftarrow$;
3. replace each original clause $A \leftarrow \mathbf{A}$ from $P$ with the new clause $A \leftarrow A', \mathbf{A}$.

□
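Definition 3 is purely syntactic, so it can be sketched in a few lines. The Python below is one possible encoding, assumed for illustration only: atoms are (predicate, arguments) pairs with arguments kept as uninterpreted strings, `MODES` records the number of leading input positions of each predicate (here the moding listsum(+,-), sum(+,+,-) is assumed), and the names `magic`, `magic_transform` and `show` are hypothetical. It is applied to the ListSum program of Example 1 and the query listsum([2,1,5],Sum).

```python
# Number of leading input positions per predicate; as in the text,
# input positions are assumed to precede output positions.
MODES = {"listsum": 1, "sum": 2}

def magic(atom):
    """Magic version of an atom: primed predicate applied to the input arguments."""
    pred, args = atom
    return (pred + "'", args[:MODES[pred]])

def magic_transform(program, query):
    out = []
    for head, body in program:
        # Step 3: guard the original clause with the magic atom of its head.
        out.append((head, [magic(head)] + list(body)))
        # Step 1: one magic clause per body atom, using the atoms that precede it.
        for k, atom in enumerate(body):
            out.append((magic(atom), [magic(head)] + list(body[:k])))
    # Step 2: the seed fact for the query.
    out.append((magic(query), []))
    return out

LISTSUM = [
    (("listsum", ("[]", "0")), []),
    (("listsum", ("[X|Xs]", "Sum")),
     [("listsum", ("Xs", "PSum")), ("sum", ("PSum", "X", "Sum"))]),
    (("sum", ("X", "0", "X")), []),
    (("sum", ("X", "s(Y)", "s(Z)")), [("sum", ("X", "Y", "Z"))]),
]
QUERY = ("listsum", ("[2,1,5]", "Sum"))

def show(atom):
    pred, args = atom
    return f"{pred}({','.join(args)})"

CLAUSES = magic_transform(LISTSUM, QUERY)
for head, body in CLAUSES:
    print(show(head), "<-", ", ".join(show(b) for b in body))
```

Because each clause is processed on its own and the query only contributes the seed, the sketch also reflects the modular character of the transformation.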

The magic program $O$ is the optimized version of the program $P$ w.r.t. the query $Q$. Observe that the transformation step (1) is performed in correspondence with every body atom of every clause in the program. Also, the only unit clause, or fact, is that introduced at step (2), also called a "seed". The collection of clauses generated at steps (1) and (2) allows us to deduce all the magic atoms corresponding to the calls generated in the top-down/left-to-right execution of the original program $P$ starting with the query $Q$. The declarative reading of the clause $B' \leftarrow A', \mathbf{A}$ introduced at step (1) is: "if the relation in the head of the original clause is called with input arguments as in $A'$, and the atoms $\mathbf{A}$ preceding $B$ in the original clause have been deduced, then the relation $B$ is called with input arguments as in $B'$". Finally, the information about the calls represented by the magic atoms is exploited at step (3), where the premises of the original clauses are strengthened by an extra constraint, namely that the conclusion $A$ is taken only if it is pertinent to some needed call, represented by the fact that $A'$ has been deduced.

Example 2. Consider the program ListSum of Example 1 with the moding:

    listsum(+,-)
    sum(+,+,-)

and the query:

    listsum([2,1,5],Sum)

that is consistent with the moding. The corresponding magic program is:

    listsum([],0) ← listsum'([])
    listsum([X|Xs],Sum) ← listsum'([X|Xs]), listsum(Xs,PSum), sum(PSum,X,Sum)
    sum(X,0,X) ← sum'(X,0)
    sum(X,s(Y),s(Z)) ← sum'(X,s(Y)), sum(X,Y,Z)
    listsum'(Xs) ← listsum'([X|Xs])
    sum'(PSum,X) ← listsum'([X|Xs]), listsum(Xs,PSum)
    sum'(X,Y) ← sum'(X,s(Y))
    listsum'([2,1,5]) ←

□
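To see the magic program of Example 2 at work, the sketch below evaluates it bottom-up in Python. This is a hand-written encoding made for illustration, not a general Datalog engine: each of the eight clauses is coded directly, Prolog lists become Python tuples, and a natural number s(...s(0)) is represented by a Python integer, so that s(Y) corresponds to y + 1. The function name `step` and the encoding of facts as tuples are our own assumptions.

```python
SEED = ("listsum'", (2, 1, 5))

def step(facts):
    """Return the given facts together with their immediate consequences."""
    new = set(facts)
    new.add(SEED)                                   # listsum'([2,1,5]) <-
    for f in facts:
        if f[0] == "listsum'":
            xs = f[1]
            if not xs:
                new.add(("listsum", (), 0))         # listsum([],0) <- listsum'([])
                continue
            x, rest = xs[0], xs[1:]
            new.add(("listsum'", rest))             # listsum'(Xs) <- listsum'([X|Xs])
            for g in facts:
                if g[0] == "listsum" and g[1] == rest:
                    psum = g[2]
                    # sum'(PSum,X) <- listsum'([X|Xs]), listsum(Xs,PSum)
                    new.add(("sum'", psum, x))
                    for h in facts:
                        if h[0] == "sum" and h[1:3] == (psum, x):
                            # listsum([X|Xs],Sum) <- listsum'([X|Xs]),
                            #     listsum(Xs,PSum), sum(PSum,X,Sum)
                            new.add(("listsum", xs, h[3]))
        elif f[0] == "sum'":
            _, x, y = f
            if y == 0:
                new.add(("sum", x, 0, x))           # sum(X,0,X) <- sum'(X,0)
            else:
                new.add(("sum'", x, y - 1))         # sum'(X,Y) <- sum'(X,s(Y))
                for h in facts:
                    if h[0] == "sum" and h[1:3] == (x, y - 1):
                        # sum(X,s(Y),s(Z)) <- sum'(X,s(Y)), sum(X,Y,Z)
                        new.add(("sum", x, y, h[3] + 1))
    return new

facts = set()
while True:                      # iterate up to the least fixpoint
    nxt = step(facts)
    if nxt == facts:
        break
    facts = nxt

print(sorted(f for f in facts if f[0] == "listsum"))
```

The iteration stops after finitely many rounds, and the only sum facts derived are those requested by some sum' atom, which is the effect the guards added at step (3) are meant to have.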

Partial Correctness of the Magic-Sets Transformation

We now want to show that the magic-sets transformation is correct. The correctness of the transformation is stated in natural terms in the main result of this section, which essentially says that the original and the magic program share the same logical consequences, when both are restricted to the intended query.

Theorem 4. Let $P$ be a program, $Q$ be a one-atom query, and consider the magic program $O$. Then

$$M_P \cap [Q] = M_O \cap [Q].$$

Proof. The proof is organized in the following three steps:

1. the interpretation $M = \{\, A \in B_P \mid M_O \models A' \Rightarrow M_O \models A \,\}$ is a model of $P$;
2. the interpretation $I = \{\, A \in B_P \mid M_O \models A' \,\}$ is a base for $P$ w.r.t. $M$;
3. $M_P \cap I = M_O \cap I$.


The thesis follows directly from (3), observing that $[Q] \subseteq I$ as a consequence of the fact that the magic program $O$ contains the seed fact $Q' \leftarrow$. We now prove the facts (1), (2) and (3).

Proof of 1. Consider a ground instance $A \leftarrow \mathbf{A}$ of a clause from $P$: to show that $M$ is a model of the clause, we assume

$$M \models \mathbf{A} \tag{8}$$

$$M_O \models A' \tag{9}$$

and prove that $M_O \models A$. In turn, this conclusion is implied by $M_O \models \mathbf{A}$, as a consequence of (9) and the fact that the magic program $O$ contains the clause $A \leftarrow A', \mathbf{A}$. To prove $M_O \models \mathbf{A}$ we proceed by induction on $\mathbf{A}$: in the base case ($\mathbf{A}$ is empty) the conclusion trivially holds. In the induction case ($\mathbf{A} = \mathbf{B}, B, \mathbf{C}$) the magic program contains the clause $B' \leftarrow A', \mathbf{B}$, and therefore $M_O \models B'$ as a consequence of (9) and the induction hypothesis. As $M \models B$ by (8), we have that $M_O \models B'$ implies $M_O \models B$, by the definition of $M$.

Proof of 2. Consider a ground instance $A \leftarrow \mathbf{A}$ of a clause from $P$, and assume

$$I \models A \tag{10}$$

$$M \models \mathbf{A} \tag{11}$$

To obtain the desired conclusion, we prove that $I \models \mathbf{A}$ by induction on $\mathbf{A}$. In the base case ($\mathbf{A}$ is empty) the conclusion trivially holds. In the induction case ($\mathbf{A} = \mathbf{B}, B, \mathbf{C}$) the magic program $O$ contains the clause $c : B' \leftarrow A', \mathbf{B}$. By the induction hypothesis, $I \models \mathbf{B}$, which implies $M_O \models \mathbf{B}'$ (the magic versions of the atoms in $\mathbf{B}$) by the definition of $I$. This, together with (11), implies

$$M_O \models \mathbf{B} \tag{12}$$

by the definition of $M$. Next, by (10) and the definition of $I$, we obtain $M_O \models A'$, which, together with (12) and clause $c$, implies $M_O \models B'$. This directly implies $I \models B$.

Proof of 3. ($\subseteq$). First we show that $M_O$ is a model of $P_I$. In fact, consider a clause $A \leftarrow \mathbf{A}$ of $P_I$, and assume that $M_O \models \mathbf{A}$. By the definition of $P_I$, $I \models A$, which by the definition of $I$ implies $M_O \models A'$. Hence, considering that $A \leftarrow A', \mathbf{A}$ is a ground instance of a clause of $O$, $M_O \models A$. This implies that $M_O$ includes $M_{P_I}$, which, by Theorem 2, is equal to $M_P \cap I$, since $I$ is a base for $P$ by (1) and (2). ($\supseteq$). Clearly $M_O \cap B_P \subseteq M_P$, as the clauses from $P$ are strengthened in $O$ with extra premises in the body. Hence, observing that $I \subseteq B_P$, we obtain $M_O \cap I \subseteq M_P \cap I$. □

The crucial point in this proof is the fact that the set $I$ of atoms corresponding to the magic atoms derived in $O$ is a base, i.e. an admissible set of intended queries, which describes all possible calls to the program originating from the top-level query $Q$. An immediate consequence of Theorem 4 is the following:

Corollary 1. Let $P$ be a program, $Q$ be a one-atom query, and consider the magic program $O$. Then, $A$ is a ground instance of a computed answer of $Q$ in $P$ iff it is a ground instance of a computed answer of $Q$ in $O$. □

Observe that the above equivalence result is obtained with no requirement that the original program respect the specific moding, and with no need to perform the so-called bound/free analysis. In this sense, this result is more general than the equivalence results in the literature, which are based on procedural reasoning. However, those results, such as that in [20], tell us more from the point of view of the relative efficiency of bottom-up and top-down computing. As a consequence of Theorems 1 and 4, we can conclude that, for any one-atom query $A$ which admits only ground correct instances w.r.t. a program $P$, the following triple holds:

$$\{A\}\ P\ M_O \cap [A] \tag{13}$$

i.e. the computed instances of $A$ in $P$ coincide with the correct instances of $A$ in the magic program $O$. However, we need a syntactic condition able to guarantee that every correct instance is ground.

Well-Moded Programs

In the practice of deductive databases, the magic-sets transformation is applied to so-called well-moded programs, as for these programs the computational benefits of the transformation are fully exploited, in a sense which shall be clarified in the sequel.

Definition 4. With reference to some specific, fixed moding:

– a one-atom query $p(\mathbf{i}, \mathbf{o})$ is called well-moded iff $\mathit{vars}(\mathbf{i}) = \emptyset$;
– a clause $p_0(\mathbf{o}_0, \mathbf{i}_{n+1}) \leftarrow p_1(\mathbf{i}_1, \mathbf{o}_1), \ldots, p_n(\mathbf{i}_n, \mathbf{o}_n)$ is called well-moded if, for $i \in [1, n+1]$: $\mathit{vars}(\mathbf{i}_i) \subseteq \mathit{vars}(\mathbf{o}_0) \cup \cdots \cup \mathit{vars}(\mathbf{o}_{i-1})$;
– a program is called well-moded if every clause of it is.

□
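Since well-modedness is a syntactic condition, it can be checked clause by clause. The following Python sketch is one way to do so under stated assumptions: terms are strings (variables start with a capital letter) or tuples of the form (functor, arg1, ..., argk), `MODES` gives the number of leading input positions of each predicate for the moding listsum(+,-), sum(+,+,-), and all function names are hypothetical.

```python
MODES = {"listsum": 1, "sum": 2}   # number of leading input positions

def term_vars(t):
    """Variables of a term: strings starting with a capital letter."""
    if isinstance(t, tuple):       # compound term (functor, arg1, ..., argk)
        return set().union(*(term_vars(a) for a in t[1:]))
    return {t} if t[:1].isupper() else set()

def vars_of(terms):
    return set().union(*(term_vars(t) for t in terms))

def split(atom):
    """Split the arguments of an atom into input terms and output terms."""
    pred, args = atom
    k = MODES[pred]
    return args[:k], args[k:]

def well_moded(head, body):
    head_in, head_out = split(head)
    bound = vars_of(head_in)                 # input terms of the head
    for atom in body:
        ins, outs = split(atom)
        if not vars_of(ins) <= bound:        # inputs must already be bound
            return False
        bound |= vars_of(outs)               # outputs become available
    return vars_of(head_out) <= bound        # head outputs must be bound at the end

LISTSUM = [
    (("listsum", ("nil", "0")), []),
    (("listsum", (("cons", "X", "Xs"), "Sum")),
     [("listsum", ("Xs", "PSum")), ("sum", ("PSum", "X", "Sum"))]),
    (("sum", ("X", "0", "X")), []),
    (("sum", ("X", ("s", "Y"), ("s", "Z"))), [("sum", ("X", "Y", "Z"))]),
]

print(all(well_moded(head, body) for head, body in LISTSUM))
```

A clause such as sum(X,Y,Z) ← sum(Y,W,Z) is rejected by this check, because W occurs in an input position of the body atom without occurring earlier in the clause.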


Thus, in well-moded clauses, all variables in the input positions of a body atom occur earlier in the clause, either in an output position of a preceding body atom, or in an input position of the head. Also, one-atom well-moded queries are ground at input positions. Well-modedness is a simple syntactic condition which guarantees that a given program satisfies a given moding. A well-known property of well-moded programs and queries is that they deliver ground output.

Theorem 5. Let $P$ be a well-moded program, and $A$ a one-atom well-moded query. Then every computed instance of $A$ in $P$ is ground.

Proof. See, for instance, [5]. The general idea of this proof is to show the following points:

1. at each step of the resolution process, the selected atom is well-moded;
2. all the output terms of a selected atom in a refutation appear in the input terms of some selected atom of the refutation.

This, together with the fact that the first selected atom (the query) is well-moded, implies the claim. □

So, well-modedness provides a (syntactic) sufficient condition to fulfill the proof obligation of triple (13).

Example 3. The program ListSum of Example 1 is well-moded w.r.t.:

    listsum(+,-)
    sum(+,+,-)

hence the following triple can be established:

$$\{\mathit{listsum}(xs, Sum)\}\ \mathit{ListSum}\ M_{ListSum} \cap [\mathit{listsum}(xs, Sum)]$$

Consider the magic program $O$ for ListSum and listsum(xs,Sum). As a consequence of (13), we can also establish that:

$$\{\mathit{listsum}(xs, Sum)\}\ \mathit{ListSum}\ M_O \cap [\mathit{listsum}(xs, Sum)]$$

So the computed instances of the desired query can be deduced using the magic program $O$. This is relevant because, as we shall see later, bottom-up computing with the magic program is much easier than with the original program. □

Moreover, well-modedness of the original program implies safety of the magic program, in the sense of [19]: every variable that occurs in the head of a clause of the magic program also occurs in its body.

Theorem 6. Let $P$ be a well-moded program and $Q$ a well-moded query. Then, the magic program $O$ is safe.


Proof. By Definition 3, there are three types of clauses in $O$.

Case $A \leftarrow A', \mathbf{A}$. The variables in the input positions of $A$ occur in $A'$, by Definition 3. By Definition 4, the variables in the output positions of $A$ appear either in the input positions of $A$, and hence in $A'$, or in the output positions of $\mathbf{A}$.

Case $Q' \leftarrow$. By the fact that $Q$ is well-moded, $Q'$ is ground.

Case $B' \leftarrow A', \mathbf{A}$. By Definition 3, the original clause from $P$ is $A \leftarrow \mathbf{A}, B, \mathbf{B}$. The variables of $B'$ are those in the input positions of $B$, which, by Definition 4, occur either in the input terms of $A$, and hence in $A'$, or in the output terms of $\mathbf{A}$. □

Thus, despite the fact that a well-moded program, such as ListSum of Example 1, may not be suited for bottom-up computing, its magic version is, in the sense that the minimum requirement that finitely many new facts are inferred at each bottom-up iteration is fulfilled.

We conclude this section with some remarks about the transformation. First, observe that the optimization algorithm is modular, in the sense that each clause can be optimized separately. In particular, we can obtain the optimized program by transforming the program at compile time and the query, which provides the seed for the computation, at run time. Second, non-atomic queries can be dealt with easily: given a query $\mathbf{A}$, it is sufficient to add to the program a new clause $\mathit{ans}(\mathbf{X}) \leftarrow \mathbf{A}$, where ans is a fresh predicate and $\mathbf{X}$ are the variables in $\mathbf{A}$, and optimize the extended program w.r.t. the one-atom query $\mathit{ans}(\mathbf{X})$. Finally, the traditional distinction between an extensional database (EDB) and an intensional one (IDB) is immaterial to the discussion presented in this paper.

4 Total Correctness

What is the meaning of a triple {Q} P Q in the sense of total correctness? Several interpretations are possible, but the most common is to require partial correctness plus the fact that all derivations for Q in P are finite, a property which is referred to as universal termination. However, such a requirement would be unnecessarily restrictive if an arbitrary selection strategy is allowed in the top-down computation. For this reason, the termination analysis is usually tailored for some particular top-down strategy, such as Prolog’s depth-first strategy combined with a leftmost selection rule, referred to as LD-resolution. A proof method for termination of Prolog programs is introduced in [6,7], based on the following notion of an acceptable program.

Definition 5. Let A be an atom and c be a clause, then:
– A level mapping is a function | | from ground atoms to natural numbers.
– A is bounded w.r.t. | |, if | | is bounded on the set of all ground instances of A.
– c is acceptable w.r.t. | | and an interpretation I, if
  • I is a model of c,
  • for all ground instances A ← A, B, B of c such that I |= A, |A| > |B|.
– A program is acceptable w.r.t. | | and I, if every clause of it is.

The intuition behind this definition is the following. The level mapping plays the role of a termination function, and it is required to decrease from the head to the body of any (ground instance of a) clause. The model I used in the notion of acceptability gives a declarative account of the leftmost selection rule of Prolog. The decreasing of the level mapping from the head A to a body atom B is required only if the body atoms to the left of B have been already refuted: in this case, by the Soundness of SLD-resolution, these atoms are true in any model of the program. In the proof method, the model I is employed to propagate inter-argument relations from left to right. The following result about acceptable programs holds.

Theorem 7. Suppose that
– the program P is acceptable w.r.t. | | and I,
– the one-atom query Q is bounded w.r.t. | |.
Then all Prolog computations of Q in P are finite.

Proof. See [6,7] for a detailed proof. The general idea is to associate a multiset of integers to each query of the resolution and to show that the multiset associated with a query is strictly greater than the one associated with its resolvent.

Moreover, it is possible to show that each terminating Prolog program P is acceptable w.r.t. the following level mapping:

|A| = nodes_P(A)

where nodes_P(A) denotes the number of nodes in the S-tree for P ∪ {← A}.

Example 4. The program ListSum of Example 1 is acceptable w.r.t. any model and the level mapping | | defined as follows:

|listsum(xs, sum)| = size(xs)
|sum(x, y, z)| = size(y)

where size(t) counts the number of symbols in the (ground) term t. This can be easily checked by observing that the number of function symbols of every atom in the body of the clauses is strictly less than the number of function symbols in the corresponding head.


Also, for every ground term xs and variable Sum, the query listsum(xs,Sum) is bounded, so every Prolog computation for it terminates, as a consequence of Theorem 7. In many cases, a non-trivial model is needed in the proof of termination. In the ListSum example, if the two input arguments of the relation sum in the recursive clause of listsum are swapped, then a model I is needed, such that I |= listsum(xs, sum) iff size(xs) ≥ size(sum). Moreover, it is in general possible to use simpler level mappings, but this requires more complicated definitions: see [7,15] for details.

Besides its use in proving termination, the notion of acceptability makes the task of constructing the least Herbrand model of a program much easier. Call an interpretation I for a program P supported if for any A ∈ I there exists a ground instance A ← B of a clause from P such that I |= B. The following result from [6] holds.

Theorem 8. Any acceptable program P has a unique supported model, which coincides with its least Herbrand model M_P.

Proof. See [6] for details. Consider a fix-point X of T_P, strictly greater than M_P, and an element A ∈ X\M_P; then, there must be a ground atom B ∈ X\M_P such that A ← A, B, B ∈ ground(P). But this leads to an infinite chain of resolvents, starting from A.

Usually, checking that an interpretation is a supported model of the program is straightforward, and does not require inductive reasoning. Also, this technique can be used with the reduced program, as reduced programs of acceptable programs are in turn acceptable.

Summarizing, the problem of establishing a triple {A} P A in the sense of total correctness, for a well-moded program P and query A, can be solved by the following steps:
1. find a base I for P such that [A] ⊆ I;
2. show that P is acceptable and A is bounded w.r.t. the same model and level mapping;
3. find a supported model M of P_I;
4. check that A = M ∩ [A].

In the Append example of Section 2, it is easy to show that the set (2), namely {append(xs,ys,zs) | xs,ys,zs are lists and xs * ys = zs}, is indeed a supported model of the program reduced by its base (3), so the desired triple can be established. In the ListSum example, it is readily checked that the set 7 from Example 2 is a supported model of the program reduced by its base I_ListSum.
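Returning to the level mapping of Example 4, its decrease from head to body can be spot-checked on individual ground instances. The Python sketch below is only an illustration under stated assumptions: ground terms are encoded as nested tuples, lists use a hypothetical `cons`/`nil` encoding whose functor counts as one symbol, and the names `size`, `level` and `decreases` are ours. A finite set of instances does not prove acceptability; it merely makes the inequality |A| > |B| concrete.

```python
# Ground terms are nested tuples (functor, arg1, ..., argn); constants have no arguments.
ZERO = ("0",)
NIL = ("nil",)


def s(n):
    return ("s", n)


def cons(head, tail):
    return ("cons", head, tail)


def num(k):
    """Build the numeral s(s(...s(0)...)) with k occurrences of s."""
    term = ZERO
    for _ in range(k):
        term = s(term)
    return term


def size(term):
    """Number of symbols in a ground term."""
    return 1 + sum(size(arg) for arg in term[1:])


def level(atom):
    """Level mapping of Example 4: listsum uses its first argument, sum its second."""
    pred, args = atom[0], atom[1:]
    if pred == "listsum":
        return size(args[0])
    if pred == "sum":
        return size(args[1])
    raise ValueError(f"no level defined for {pred}")


def decreases(head, body):
    """True if the level of the head exceeds the level of every body atom."""
    return all(level(head) > level(atom) for atom in body)


# Ground instance of listsum([X|Xs],Sum) <- listsum(Xs,PSum), sum(PSum,X,Sum)
xs = cons(num(1), NIL)
head1 = ("listsum", cons(num(2), xs), num(3))
body1 = [("listsum", xs, num(1)), ("sum", num(1), num(2), num(3))]

# Ground instance of sum(X,s(Y),s(Z)) <- sum(X,Y,Z)
head2 = ("sum", num(1), s(num(1)), s(num(2)))
body2 = [("sum", num(1), num(1), num(2))]

print(decreases(head1, body1), decreases(head2, body2))
```

Because the mapping ignores the model I, the check is unconditional here; for the variant with swapped arguments mentioned above, the test would additionally have to restrict attention to instances whose left body atoms hold in I.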

5 Total Correctness and Bottom-Up Computing

Although a thorough study of the relative efficiency of bottom-up and top-down execution is outside the reach of our declarative methods, we are able to show the total correctness of the magic-sets transformation on the basis of the results of the previous section. In fact, we can show that if the original program is terminating in a top-down sense, then the magic program is terminating in a bottom-up sense, in a way which is made precise by the next result. Two assumptions on the original programs are necessary, namely acceptability, which implies termination, and well-modedness, which implies ground output.

Theorem 9. Let P be a well-moded, acceptable program, and Q a one-atom well-moded, bounded query. Then the least Herbrand model of the magic program O is finite.

Proof. Let I and | | be the model and level mapping used in the proof of acceptability. We define a mapping of magic atoms into ω ∪ ∞ as follows:

|A’| = max{|B| | A’ = B’}.

Next, we show that M_O contains a finite number of magic atoms. First, we observe that, for the seed fact Q’ ∈ T_P(∅), |Q’| < ω, as the query Q is bounded. Consider now a magic atom B’ deduced at stage n > 1 in the bottom-up computation, i.e. B’ ∈ T_O^n(∅) \ T_O^{n−1}(∅). By the magic transformation, there is a clause B’ ← A’, A in O such that T_O^{n−1}(∅) |= A’, A. Since T_O^{n−1}(∅) |= A implies that A holds in any model of P by the partial correctness Theorem 4, we have by the acceptability of P that, for each clause A ← A, B, B in P, |A| > |B|, which implies |A’| > |B’|. Therefore, the level of newly deduced magic atoms is smaller than that of some preexisting magic atoms, which implies that finitely many magic atoms are in M_O.

To conclude the proof, we have to show that there are finitely many non-magic atoms in M_O. Observe that every non-magic atom A of M_O is a computed answer of a query B such that M_O |= B’. Given A’ ∈ M_O, consider a query B with its output positions filled with distinct variables, and B’ = A’. By Theorems 7 and 5, B has a finite set of ground computed answers. The thesis then follows by the fact that finitely many magic atoms are in M_O.
As an immediate consequence of this theorem we have that, for some n ≥ 0:

T_O^n(∅) = M_O

and therefore the bottom-up computation with O terminates. Notice that this result does not imply that the bottom-up computation with O and the top-down one with P are equally efficient, although both terminate. In [20], an extra condition on the original program is required, namely that it is subgoal rectified, in order to obtain that the cost of computing with the magic program is proportional to top-down evaluation.

As a final example, consider again the ListSum program of Example 1 and the query listsum(xs,Sum). By the partial correctness results, we know that:

{listsum(xs, Sum)} ListSum M_O ∩ [listsum(xs, Sum)]


By Theorem 9, M_O is finite, so we can actually perform a bottom-up computation with O, thus obtaining M_O first, and then extract the desired computed instances from it.

6 Examples

Length of a List

Consider the program ListLen, the call pattern listlen(+,-) and the query listlen([a,b,b,a]). The optimized program is:

listlen([],0) ← listlen’([])
listlen([X|Xs],s(L)) ← listlen’([X|Xs]),base(X),listlen(Xs,L)
listlen’(Xs) ← base(X),listlen’([X|Xs])
listlen’([a,b,b,a]) ←

As we can see, there is only one clause which depends on the query, namely the optimized query w.r.t. C, and it can be easily produced at run time. The bottom-up evaluation of the optimized program is:

T_P^1(∅) = {listlen’([a, b, b, a])}
T_P^2(∅) = T_P^1(∅) ∪ {listlen’([b, b, a])}
T_P^3(∅) = T_P^2(∅) ∪ {listlen’([b, a])}
T_P^4(∅) = T_P^3(∅) ∪ {listlen’([a])}
T_P^5(∅) = T_P^4(∅) ∪ {listlen’([])}
T_P^6(∅) = T_P^5(∅) ∪ {listlen([], 0)}
T_P^7(∅) = T_P^6(∅) ∪ {listlen([a], s(0))}
T_P^8(∅) = T_P^7(∅) ∪ {listlen([b, a], s(s(0)))}
T_P^9(∅) = T_P^8(∅) ∪ {listlen([b, b, a], s(s(s(0))))}
T_P^10(∅) = T_P^9(∅) ∪ {listlen([a, b, b, a], s(s(s(s(0)))))}
T_P^11(∅) = T_P^10(∅)
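The bottom-up evaluation of the optimized ListLen program can be replayed with a few lines of Python. This is a hand-coded sketch, not a general evaluator and not the authors' implementation: the four clauses are translated directly into the function `t_p`, lists are Python tuples, the numeral s^n(0) is represented by the integer n, and the base relation is assumed to contain exactly a and b. All names are hypothetical.

```python
BASE = {"a", "b"}                           # assumed extension of the base relation
SEED = ("listlen'", ("a", "b", "b", "a"))   # the magic seed fact for the query


def t_p(interp):
    """One application of the immediate consequence operator of the optimized program."""
    out = {SEED}                                    # listlen'([a,b,b,a]) <-
    lengths = {f[1]: f[2] for f in interp if f[0] == "listlen"}
    for fact in interp:
        if fact[0] != "listlen'":
            continue
        xs = fact[1]
        if not xs:
            out.add(("listlen", (), 0))             # listlen([],0) <- listlen'([])
            continue
        head, tail = xs[0], xs[1:]
        if head in BASE:
            out.add(("listlen'", tail))             # listlen'(Xs) <- base(X), listlen'([X|Xs])
            if tail in lengths:                     # listlen([X|Xs],s(L)) <- ..., listlen(Xs,L)
                out.add(("listlen", xs, lengths[tail] + 1))
    return out


def bottom_up():
    """Return the successive stages T^1, T^2, ... up to the fixpoint."""
    stages, current = [], set()
    while True:
        nxt = t_p(current)
        if nxt == current:
            return stages
        stages.append(nxt)
        current = nxt


stages = bottom_up()
for n, stage in enumerate(stages, start=1):
    print(n, len(stage))
```

Under these assumptions each stage adds exactly one fact: first the five magic atoms, from the full list down to the empty list, and then the five listlen atoms in the opposite direction, ending with the answer for [a,b,b,a].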

It can be noted that in the first part of the computation the optimized program computes the closed interpretation I_ListLen,[a,b,b,a], and in the last one uses it in order to optimize the computation.

Sum of a List of Numbers

Consider the program ListSum, the call patterns:

listsum(+,-)
sum(+,+,-)

and the query listsum([s(0),s(s(0))], Sum). The optimized program is:

listsum([],0) ← listsum’([])
listsum([X|Xs],Sum) ← listsum’([X|Xs]),listsum(Xs,PSum),sum(PSum,X,Sum)
sum(X,0,X) ← sum’(X,0),nat(X)
sum(X,s(Y),s(Z)) ← sum’(X,s(Y)),sum(X,Y,Z)
listsum’(Xs) ← listsum’([X|Xs])
sum’(PSum,X) ← listsum’([X|Xs]),listsum(Xs,PSum)
sum’(X,Y) ← sum’(X,s(Y))
listsum’([s(0),s(s(0))]) ←

The bottom-up evaluation of the optimized program is:

T_P^1(∅) = {listsum’([s(0), s(s(0))])}
T_P^2(∅) = T_P^1(∅) ∪ {listsum’([s(s(0))])}
T_P^3(∅) = T_P^2(∅) ∪ {listsum’([])}
T_P^4(∅) = T_P^3(∅) ∪ {listsum([], 0)}
T_P^5(∅) = T_P^4(∅) ∪ {sum’(0, s(s(0)))}
T_P^6(∅) = T_P^5(∅) ∪ {sum’(0, s(0))}
T_P^7(∅) = T_P^6(∅) ∪ {sum’(0, 0)}
T_P^8(∅) = T_P^7(∅) ∪ {sum(0, 0, 0)}
T_P^9(∅) = T_P^8(∅) ∪ {sum(0, s(0), s(0))}
T_P^10(∅) = T_P^9(∅) ∪ {sum(0, s(s(0)), s(s(0)))}
T_P^11(∅) = T_P^10(∅) ∪ {listsum([s(s(0))], s(s(0)))}
T_P^12(∅) = T_P^11(∅) ∪ {sum’(s(s(0)), s(0))}
T_P^13(∅) = T_P^12(∅) ∪ {sum’(s(s(0)), 0)}
T_P^14(∅) = T_P^13(∅) ∪ {sum(s(s(0)), 0, s(s(0)))}
T_P^15(∅) = T_P^14(∅) ∪ {sum(s(s(0)), s(0), s(s(s(0))))}
T_P^16(∅) = T_P^15(∅) ∪ {listsum([s(0), s(s(0))], s(s(s(0))))}

In this case the computation of the closed interpretation is interlaced with the computation of the interesting part of the least Herbrand model.

Ancestors

Consider the following program Ancestor:

ancestor(X,Y) ← parent(X,Y)
ancestor(X,Y) ← parent(X,Z),ancestor(Z,Y)

where parent is a base relation. Consider the moding ancestor(+,-) and the query ancestor(f,Y). The optimized program is:

ancestor(X,Y) ← ancestor’(X),parent(X,Y)
ancestor(X,Y) ← ancestor’(X),parent(X,Z),ancestor(Z,Y)
ancestor’(Y) ← parent(X,Y),ancestor’(X)
ancestor’(f) ←


If we suppose the following definition for the base relation parent:

parent(a,b) ←
parent(a,c) ←
parent(a,d) ←
parent(e,b) ←
parent(e,c) ←
parent(e,d) ←
parent(f,a) ←
parent(f,g) ←
parent(h,e) ←
parent(h,i) ←

The computation is:

T_P^1(∅) = {ancestor’(f)}
T_P^2(∅) = T_P^1(∅) ∪ {ancestor’(a), ancestor’(g), ancestor(f, a), ancestor(f, g)}
T_P^3(∅) = T_P^2(∅) ∪ {ancestor’(b), ancestor’(c), ancestor’(d), ancestor(f, b), ancestor(f, c), ancestor(f, d)}
T_P^4(∅) = T_P^3(∅)
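To see how the magic atoms restrict the bottom-up computation, the Ancestor example for the moding ancestor(+,-) can be evaluated in Python both with and without the transformation. The sketch below hard-codes the clauses and is only an illustration; the tags "magic" (for ancestor’) and "anc" (for ancestor) and all function names are hypothetical. A literal application of the clauses also derives the intermediate facts ancestor(a,b), ancestor(a,c) and ancestor(a,d), which are needed before the facts for f can be obtained.

```python
PARENT = {("a", "b"), ("a", "c"), ("a", "d"), ("e", "b"), ("e", "c"), ("e", "d"),
          ("f", "a"), ("f", "g"), ("h", "e"), ("h", "i")}


def fixpoint(step):
    """Iterate a monotone step function from the empty set until nothing changes."""
    facts = set()
    while True:
        new_facts = step(facts)
        if new_facts == facts:
            return facts
        facts = new_facts


def plain_step(anc):
    """Original program; facts are (X, Y) pairs standing for ancestor(X,Y)."""
    out = set(PARENT)                                        # ancestor(X,Y) <- parent(X,Y)
    out |= {(x, y) for (x, z) in PARENT                      # ancestor(X,Y) <- parent(X,Z),
            for (z2, y) in anc if z2 == z}                   #                  ancestor(Z,Y)
    return out


def magic_step(facts, seed="f"):
    """Optimized program for ancestor(+,-) with the seed ancestor'(f)."""
    magic = {f[1] for f in facts if f[0] == "magic"}
    anc = {(f[1], f[2]) for f in facts if f[0] == "anc"}
    out = {("magic", seed)}
    out |= {("magic", y) for (x, y) in PARENT if x in magic}
    out |= {("anc", x, y) for (x, y) in PARENT if x in magic}
    out |= {("anc", x, y) for (x, z) in PARENT if x in magic
            for (z2, y) in anc if z2 == z}
    return out


plain = fixpoint(plain_step)
optimized = fixpoint(magic_step)
answers_plain = {y for (x, y) in plain if x == "f"}
answers_magic = {f[2] for f in optimized if f[0] == "anc" and f[1] == "f"}
print(len(plain), sum(1 for f in optimized if f[0] == "anc"), sorted(answers_magic))
```

With the parent facts above, the untransformed program derives ancestor facts for every node, including the irrelevant ones rooted at e and h, while the magic program derives only those reachable from f; both give the same answers to the query.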

However, we obtain a different optimized program if we consider the moding ancestor(-,+) and the query ancestor(X,b):

ancestor(X,Y) ← ancestor’(Y),parent(X,Y)
ancestor(X,Y) ← ancestor’(Y),parent(X,Z),ancestor(Z,Y)
ancestor’(X) ← parent(X,Y),ancestor’(Y)
ancestor’(b) ←


The computation is:

T_P^1(∅) = {ancestor’(b)}
T_P^2(∅) = T_P^1(∅) ∪ {ancestor’(a), ancestor’(e), ancestor(a, b), ancestor(e, b)}
T_P^3(∅) = T_P^2(∅) ∪ {ancestor’(f), ancestor’(h), ancestor(f, b), ancestor(h, b)}
T_P^4(∅) = T_P^3(∅)

As we can see, different call patterns generate different optimized programs. In general these programs are not equivalent.

Powers

Consider now the following program Power, which computes x^y, where x and y are natural numbers:

power(X,0,s(0)) ←
power(X,s(Y),Z) ← power(X,Y,W),times(X,W,Z)
times(X,0,0) ←
times(X,s(Y),Z) ← times(X,Y,W),sum(X,W,Z)
sum(X,0,X) ←
sum(X,s(Y),s(Z)) ← sum(X,Y,Z)

If we consider the call patterns:

power(+,+,-)
times(+,+,-)
sum(+,+,-)

and the query:

power(s(s(0)),s(s(0)),Z)

the optimized program is:

power(X,0,s(0)) ← power’(X,0)
power(X,s(Y),Z) ← power’(X,s(Y)),power(X,Y,W),times(X,W,Z)
times(X,0,0) ← times’(X,0)
times(X,s(Y),Z) ← times’(X,s(Y)),times(X,Y,W),sum(X,W,Z)
sum(X,0,X) ← sum’(X,0)
sum(X,s(Y),s(Z)) ← sum’(X,s(Y)),sum(X,Y,Z)
power’(X,Y) ← power’(X,s(Y))
times’(X,W) ← power(X,Y,W),power’(X,s(Y))
times’(X,Y) ← times’(X,s(Y))
sum’(X,W) ← times(X,Y,W),times’(X,s(Y))
sum’(X,Y) ← sum’(X,s(Y))
power’(s(s(0)),s(s(0))) ←

The computation is:

T_P^1(∅) = {power’(s(s(0)), s(s(0)))}
T_P^2(∅) = T_P^1(∅) ∪ {power’(s(s(0)), s(0))}
T_P^3(∅) = T_P^2(∅) ∪ {power’(s(s(0)), 0)}
T_P^4(∅) = T_P^3(∅) ∪ {power(s(s(0)), 0, s(0))}
T_P^5(∅) = T_P^4(∅) ∪ {times’(s(s(0)), s(0))}
T_P^6(∅) = T_P^5(∅) ∪ {times’(s(s(0)), 0)}
T_P^7(∅) = T_P^6(∅) ∪ {times(s(s(0)), 0, 0)}
T_P^8(∅) = T_P^7(∅) ∪ {sum’(s(s(0)), 0)}
T_P^9(∅) = T_P^8(∅) ∪ {sum(s(s(0)), 0, s(s(0)))}
T_P^10(∅) = T_P^9(∅) ∪ {times(s(s(0)), s(0), s(s(0)))}
T_P^11(∅) = T_P^10(∅) ∪ {power(s(s(0)), s(0), s(s(0)))}
T_P^12(∅) = T_P^11(∅) ∪ {times’(s(s(0)), s(s(0)))}
T_P^13(∅) = T_P^12(∅) ∪ {sum’(s(s(0)), s(s(0)))}
T_P^14(∅) = T_P^13(∅) ∪ {sum’(s(s(0)), s(0))}
T_P^15(∅) = T_P^14(∅) ∪ {sum(s(s(0)), s(0), s(s(s(0))))}
T_P^16(∅) = T_P^15(∅) ∪ {sum(s(s(0)), s(s(0)), s(s(s(s(0)))))}
T_P^17(∅) = T_P^16(∅) ∪ {times(s(s(0)), s(s(0)), s(s(s(s(0)))))}
T_P^18(∅) = T_P^17(∅) ∪ {power(s(s(0)), s(s(0)), s(s(s(s(0)))))}
T_P^19(∅) = T_P^18(∅)

It is interesting to note that the computation is, in this case, really close to that generated by a functional program with lazy evaluation.

Binary Search

Consider the following program Search, implementing the dichotomic (or binary) search on a list of pairs (Key, Value) ordered with respect to Key:

search(N,Xs,M) ← divide(Xs,Xs1,X,Y,Xs2),switch(N,X,Y,Xs1,Xs2,M)
switch(N,N,M,Xs1,Xs2,M) ← key(N),value(M)
switch(N,X,Y,Xs1,Xs2,M) ← greater(N,X),search(N,Xs2,M)
switch(N,X,Y,Xs1,Xs2,M) ← greater(X,N),search(N,Xs1,M)

where Key and Value are base relations. Observe that the program is not completely specified, as the relations Divide and Greater have no definition. If we consider the following call patterns:

search(+,+,-)
switch(+,+,+,+,+,-)

and the query search(5,[(1,a),(3,b),(5,a),(10,c)],M), the optimized program is:

search(N,Xs,M) ← search’(N,Xs),divide(Xs,Xs1,X,Y,Xs2),switch(N,X,Y,Xs1,Xs2,M)
switch(N,N,M,Xs1,Xs2,M) ← switch’(N,N,M,Xs1,Xs2),key(N),value(M)
switch(N,X,Y,Xs1,Xs2,M) ← switch’(N,X,Y,Xs1,Xs2),greater(N,X),search(N,Xs2,M)
switch(N,X,Y,Xs1,Xs2,M) ← switch’(N,X,Y,Xs1,Xs2),greater(X,N),search(N,Xs1,M)
switch’(N,X,Y,Xs1,Xs2) ← divide(Xs,Xs1,X,Y,Xs2),search’(N,Xs)
search’(N,Xs2) ← N>X,switch’(N,X,Y,Xs1,Xs2)
search’(N,Xs1) ← N<X,switch’(N,X,Y,Xs1,Xs2)
search’(5,[(1,a),(3,b),(5,a),(10,c)]) ←

The computation is the following:

T_P^1(∅) = {search’(5, [(1, a), (3, b), (5, a), (10, c)])}
T_P^2(∅) = T_P^1(∅) ∪ {switch’(5, 3, b, [(1, a)], [(5, a), (10, c)])}
T_P^3(∅) = T_P^2(∅) ∪ {search’(5, [(5, a), (10, c)])}
T_P^4(∅) = T_P^3(∅) ∪ {switch’(5, 5, a, [], [(10, c)])}
T_P^5(∅) = T_P^4(∅) ∪ {switch(5, 5, a, [], [(10, c)], a)}
T_P^6(∅) = T_P^5(∅) ∪ {search(5, [(5, a), (10, c)], a)}
T_P^7(∅) = T_P^6(∅) ∪ {switch(5, 3, b, [(1, a)], [(5, a), (10, c)], a)}
T_P^8(∅) = T_P^7(∅) ∪ {search(5, [(1, a), (3, b), (5, a), (10, c)], a)}
T_P^9(∅) = T_P^8(∅)

Fibonacci Numbers

Consider the following program, which computes the Fibonacci numbers:

fib(0,0) ←
fib(s(0),s(0)) ←
fib(s(s(X)),Y) ← fib(s(X),Y1),fib(X,Y2),sum(Y1,Y2,Y)
sum(X,0,X) ←
sum(X,s(Y),s(Z)) ← sum(X,Y,Z)

with the moding:

fib(+,-)
sum(+,+,-)

and the query fib(s(s(s(0))),Y). The optimized program is:

fib’(s(s(s(0)))) ←
fib’(s(X)) ← fib’(s(s(X)))
fib’(X) ← fib’(s(s(X))),fib(s(X),Y1)
sum’(Y1,Y2) ← fib’(s(s(X))),fib(s(X),Y1),fib(X,Y2)
sum’(X,Y) ← sum’(X,s(Y))
fib(0,0) ← fib’(0)
fib(s(0),s(0)) ← fib’(s(0))
fib(s(s(X)),Y) ← fib’(s(s(X))),fib(s(X),Y1),fib(X,Y2),sum(Y1,Y2,Y)
sum(X,0,X) ← sum’(X,0)
sum(X,s(Y),s(Z)) ← sum’(X,s(Y)),sum(X,Y,Z)

The computation is the following:

T_P^1(∅) = {fib’(s(s(s(0))))}
T_P^2(∅) = T_P^1(∅) ∪ {fib’(s(s(0)))}
T_P^3(∅) = T_P^2(∅) ∪ {fib’(s(0))}
T_P^4(∅) = T_P^3(∅) ∪ {fib’(0), fib(s(0), s(0))}
T_P^5(∅) = T_P^4(∅) ∪ {fib(0, 0)}
T_P^6(∅) = T_P^5(∅) ∪ {sum’(s(0), 0)}
T_P^7(∅) = T_P^6(∅) ∪ {sum(s(0), 0, s(0))}
T_P^8(∅) = T_P^7(∅) ∪ {fib(s(s(0)), s(0))}
T_P^9(∅) = T_P^8(∅) ∪ {sum’(s(0), s(0))}
T_P^10(∅) = T_P^9(∅) ∪ {sum(s(0), s(0), s(s(0)))}
T_P^11(∅) = T_P^10(∅) ∪ {fib(s(s(s(0))), s(s(0)))}
T_P^12(∅) = T_P^11(∅)

Here we can observe that the magic-sets transformation is also suitable for non-linear recursive programs, i.e. programs with more than one mutually recursive body atom. Once again we can see that the computation is “lazy”.

7 Conclusions

In this paper, we introduced a method for proving partial correctness, revised another method for total correctness, and applied both to the case study of the magic-sets transformation for goal-driven bottom-up computing. The obtained results rely on purely declarative reasoning, abstracting away from procedural semantics, and are new from various points of view. First, partial correctness is obtained without any assumption that the program respects the given moding. Second, termination is obtained under the sole assumptions of well-modedness, which is natural in practical bottom-up computing, and acceptability, which is a necessary and sufficient condition for top-down termination.


Moreover, both partial correctness and termination are established for logic programs in full generality, and not only for function-free Datalog programs. Further research may be pursued on the topics of this paper. For instance, we are confident that the same kind of result can be established for other variants of the magic-sets transformation technique and also for extensions of it to general logic programs (i.e. logic programs with negation in the body of the clauses). Moreover, it is interesting to investigate whether other optimization techniques may be defined using the concept of base.

Acknowledgements

Thanks are owing to Yeoshua Sagiv for useful discussions.

References

1. K.R. Apt. Logic programming. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, volume B, pages 493–574. Elsevier, 1990.
2. K.R. Apt. Declarative programming in Prolog. In D. Miller, editor, Proc. International Symposium on Logic Programming, pages 11–35. MIT Press, 1993.
3. K.R. Apt. Program Verification and Prolog. In E. Börger, editor, Specification and Validation Methods for Programming Languages and Systems. Oxford University Press, 1994.
4. K.R. Apt, M. Gabbrielli, and D. Pedreschi. A Closer Look at Declarative Interpretations. Technical Report CS-R9470, Centre for Mathematics and Computer Science, Amsterdam. Journal of Logic Programming, 28(2):147–180, 1996.
5. K.R. Apt and E. Marchiori. Reasoning about Prolog programs: from modes through types to assertions. Formal Aspects of Computing, 6A:743–764, 1994.
6. K.R. Apt and D. Pedreschi. Reasoning about termination of pure Prolog programs. Information and Computation, 106(1):109–157, 1993.
7. K.R. Apt and D. Pedreschi. Modular termination proofs for logic and pure Prolog programs. In G. Levi, editor, Advances in Logic Programming Theory, pages 183–229. Oxford University Press, 1994.
8. A. Bossi and N. Cocco. Verifying Correctness of Logic Programs. In J. Diaz and F. Orejas, editors, TAPSOFT ’89, volume 352 of Lecture Notes in Computer Science, pages 96–110. Springer-Verlag, Berlin, 1989.
9. C. Beeri and R. Ramakrishnan. The power of magic. In Proc. 6th ACM SIGMOD-SIGACT Symposium on Principles of Database Systems, pages 269–283. The Association for Computing Machinery, New York, 1987.
10. F. Bronsard, T.K. Lakshman, and U.S. Reddy. A framework of directionality for proving termination of logic programs. In K.R. Apt, editor, Proceedings of the Joint International Conference and Symposium on Logic Programming, pages 321–335. MIT Press, 1992.
11. P. Deransart. Proof methods of declarative properties of definite programs. Theoretical Computer Science, 118:99–166, 1993.
12. J.W. Lloyd. Foundations of Logic Programming. Springer-Verlag, Berlin, second edition, 1987.
13. P. Mascellani. Declarative Verification of General Logic Programs. In Proceedings of the Student Session, ESSLLI-2000. Birmingham UK, 2000.
14. P. Mascellani and D. Pedreschi. Proving termination of Prolog programs. In Proceedings 1994 Joint Conf. on Declarative Programming GULP-PRODE ’94, pages 46–61, 1994.
15. P. Mascellani and D. Pedreschi. Total correctness of Prolog programs. In F.S. de Boer and M. Gabbrielli, editors, Proceedings of the W2 Post-Conference Workshop ICLP’94. Vrije Universiteit Amsterdam, 1994.
16. D. Pedreschi. Verification of Logic Programs. In M.I. Sessa, editor, Ten Years of Logic Programming in Italy, pages 211–239. Palladio, 1995.
17. D. Pedreschi and S. Ruggieri. Verification of Logic Programs. Journal of Logic Programming, 39(1-3):125–176, April 1999.
18. S. Ruggieri. Proving (total) correctness of Prolog programs. In F.S. de Boer and M. Gabbrielli, editors, Proceedings of the W2 Post-Conference Workshop ICLP’94. Vrije Universiteit Amsterdam, 1994.
19. J.D. Ullman. Principles of Database and Knowledge-base Systems, Volume I. Principles of Computer Science Series. Computer Science Press, 1988.
20. J.D. Ullman. Principles of Database and Knowledge-base Systems, Volume II: The New Technologies. Principles of Computer Science Series. Computer Science Press, 1989.

Key Constraints and Monotonic Aggregates in Deductive Databases

Carlo Zaniolo

Computer Science Department, University of California at Los Angeles, Los Angeles, CA 90095
[email protected]
http://www.cs.ucla.edu/~zaniolo

Abstract. We extend the ﬁxpoint and model-theoretic semantics of logic programs to include unique key constraints in derived relations. This extension increases the expressive power of Datalog programs, while preserving their declarative semantics and eﬃcient implementation. The greater expressive power yields a simple characterization for the notion of set aggregates, including the identiﬁcation of aggregates that are monotonic with respect to set containment and can thus be used in recursive logic programs. These new constructs are critical in many applications, and produce simple logic-based formulations for complex algorithms that were previously believed to be beyond the realm of declarative logic.

1 Introduction

The basic relational data model consists of a set of tables (or base relations) and of a query language, such as SQL or Datalog, from which new relations can be derived. Unique keys can be declared to enforce functional dependency constraints on base relations, and their important role in database schema design has been recognized for a long time [1,28]. However, little attention has been paid so far to the use of unique keys, or functional dependencies, in derived relations. This paper shows that keys in derived relations increase signiﬁcantly the expressive power of the query languages used to deﬁne such relations and this additional power yields considerable beneﬁts. In particular, it produces a formal treatment of database aggregates, including user-deﬁned aggregates, and monotonic aggregates, which can be used without restrictions in recursive queries to express complex algorithms that were previously considered problematic for Datalog and SQL.

2 Keys on Derived Relations

For example, consider a database containing relations student(Name, Major), and professor(Name, Major). In fact, let us consider the following microcollege example that only has three facts:

student(JimBlack, ee).
professor(ohm, ee).
professor(bell, ee).

A.C. Kakas, F. Sadri (Eds.): Computat. Logic (Kowalski Festschrift), LNAI 2408, pp. 109–134, 2002. © Springer-Verlag Berlin Heidelberg 2002

Now, the rule is that the major of a student must match his/her advisor’s main area of specialization. Then, eligible advisors can be computed as follows:

elig_adv(S, P) ← student(S, Majr), professor(P, Majr).

Now the answer to a query ?elig_adv(S, P) is

{elig_adv(JimBlack, ohm), elig_adv(JimBlack, bell)}

But a student can only have one advisor. We can express this constraint by requiring that the first argument be a unique key for the advisor relation. We denote this constraint by the notation

unique_key(advisor, [1])!

Thus, the first argument of unique_key specifies the predicate restricted by the key, and the second argument gives the list of the argument positions that compose the key. An empty list denotes that the derived relation can only contain a single tuple. The exclamation mark is used as the punctuation mark for key constraints. We can now write the following program for our microcollege:

Example 1. For each student select one advisor from professors in the same area

unique_key(advisor, [1])!
advisor(S, P) ← student(S, Majr), professor(P, Majr).
student(JimBlack, ee).
professor(ohm, ee).
professor(bell, ee).

Since the key condition ensures that there is only one professor for each student in the resulting advisor table, our query has two possible answers. One is the set {advisor(JimBlack, ohm)} and the other is the set {advisor(JimBlack, bell)}.

In the next section, we show that positive programs with keys can be characterized naturally by fixpoint semantics containing multiple canonical answers; in Section 4, we show that their meaning can also be modelled by programs with negated goals under stable models semantics.

Let us consider now some examples that provide a first illustration of the expressive power brought to logic programming by keys in derived relations. The following program constructs a spanning tree rooted in node a, for a graph stored in a binary relation g as follows:


Example 2. Computing spanning trees

unique_key(tree, [2])!
tree(root, a).
tree(Y, Z) ← tree(X, Y), g(Y, Z).
g(a, b).
g(b, c).
g(a, c).

Two different spanning trees can be derived, as follows:

{tree(root, a), tree(a, b), tree(b, c)}
{tree(root, a), tree(a, b), tree(a, c)}

More than one key can be declared for each derived relation. For instance, let us add a second key, unique_key(tree, [1]), to the previous graph example. Then, the result may no longer be a spanning tree; instead, it is a simple path, where for each source node, there is only one sink node and vice versa:

Example 3. Computing simple paths

unique_key(spath, [1])!
unique_key(spath, [2])!
spath(root, X) ← g(X, Y).
spath(Y, Z) ← spath(X, Y), g(Y, Z).
freenode ← g(_, Y), ¬spath(_, Y).

The last rule in Example 3, above, detects whether any node remains free, i.e., whether there is a node not touched by the simple path. Now, a query on whether, for some simple path, there is no free node (i.e., is ¬freenode true?) can be used to decide the Hamiltonian path problem for our graph; this is an NP-complete problem. An equivalent way to pose the same question is asking whether freenode is true for all solutions. A system that generates all possible paths and returns a positive answer when freenode holds for all paths implements an all-answer semantics. This example illustrates how exponential problems can be expressed in Datalog with keys under this semantics [14]. Polynomial time problems, however, are best treated using single-answer semantics, since this can be supported in polynomial time for Datalog programs with key constraints and stratified negation, as discussed later in this paper; moreover, these programs can express all the queries that are polynomial in the size of the database, i.e., the queries in the class DB-PTIME [1].
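The two spanning trees of Example 2 can be enumerated by brute force. The Python sketch below is only an illustration under an explicit assumption: a canonical answer is taken to be any set obtained by repeatedly adding one enabled rule head that does not violate the key, until no such head remains. The precise fixpoint semantics is the one developed in the next section of the paper, and the function names used here are hypothetical.

```python
G = {("a", "b"), ("b", "c"), ("a", "c")}      # the graph relation g of Example 2
START = frozenset({("root", "a")})            # the fact tree(root, a)


def enabled(tree):
    """Heads of tree(Y,Z) <- tree(X,Y), g(Y,Z) enabled by the current tree facts."""
    reached = {y for (_, y) in tree}
    return {(y, z) for (y, z) in G if y in reached}


def violates_key(tree, fact):
    """unique_key(tree, [2]): no two tree facts may share their second argument."""
    return any(child == fact[1] and (parent, child) != fact for (parent, child) in tree)


def canonical_answers(tree=START):
    """All maximal fact sets reachable by adding one key-respecting head at a time."""
    candidates = [f for f in enabled(tree)
                  if f not in tree and not violates_key(tree, f)]
    if not candidates:
        return {tree}
    results = set()
    for fact in candidates:
        results |= canonical_answers(tree | {fact})
    return results


for answer in canonical_answers():
    print(sorted(answer))
```

Under this assumption the enumeration yields exactly two sets, one containing tree(b, c) and one containing tree(a, c), whatever the order in which heads are tried. An all-answer evaluator would explore all of them, whereas a single-answer evaluator would follow just one branch of the recursion.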
Under single-answer semantics, a deductive system is only expected to compute one out of the many existing canonical models for a program, and return an answer based on this particular model. For certain programs, this approach results in different query answers being returned for different canonical models computed by the system (nondeterministic queries). For other programs, however, the query answer remains the same for all canonical models (deterministic queries). This is, for instance, the case of the parity query below, which determines whether a non-empty database relation b(X) has an even number of tuples:

Example 4. Counting mod 2

unique_key(chain, [1])!
unique_key(chain, [2])!
chain(nil, X) ← b(X).
chain(X, Y) ← chain(_, X), b(Y).
ca(Y, odd) ← chain(nil, Y).
ca(Y, even) ← ca(X, odd), chain(X, Y).
ca(Y, odd) ← ca(X, even), chain(X, Y).
mod2(Parity) ← ca(Y, Parity), ¬chain(Y, _).

Observe that this program consists of three parts. The first part is the chain rules that enumerate the elements of b(X) one-by-one. The second part is the ca rules that perform a specific aggregate-like computation on the elements of chain, i.e., the odd/even computation for the parity query. The third part is the mod2 rule that uses negation to detect the element of the chain without a successor, and to return the aggregate value ‘odd’ or ‘even’ from that of its final element. We will later generalize this pattern to express the computation of generic aggregates.

Observe that the query in Example 4 is deterministic, inasmuch as the answer to the parity question ?mod2(even) is independent of the particular chain being constructed, and only depends on the length of this chain, which is determined by the cardinality of b(X). The parity query is a well-known polynomial query that cannot be answered by Datalog with stratified negation under the genericity assumption [1]. Furthermore, the chain predicate illustrates how the elements of a domain can be arranged in a total order; we thus conclude that negation-stratified Datalog with key constraints can express all DB-PTIME queries [1].

In a nutshell, key constraints under single-answer semantics extend the expressive power of logic programs, and find important new applications. Of particular importance is the definition of set-aggregates.
While aggregates have been used extensively in database applications, particularly in decision support and data mining applications, a general treatment of this fundamental concept had, so far, been lacking and is presented in this paper.

2.1 Basic Definitions

We assume that the reader is familiar with the relational data model and Datalog [1,36]. A logic program P/K consists of a set of rules, P, and a set of key constraints K; each such constraint has the form unique_key(q, γ), where q is the name of a predicate in P and γ is a subset of the arguments of q. Let I be an interpretation of P; we say that I satisfies the constraint unique_key(q, γ) when no two distinct q-atoms in I are identical in all their γ arguments. The notation I |= K will be used to denote that I satisfies every key constraint in K.

The basic semantics of a positive Datalog program P consists of evaluating “in parallel” all applicable instantiations of P’s rules. This semantics is formalized by the Immediate Consequences Operator, $T_P$, that defines a mapping over the (Herbrand) interpretations of P, as follows:

$$T_P(I) = \{\, A \mid A \leftarrow B_1, \ldots, B_n \in ground(P) \wedge B_1 \in I \wedge \ldots \wedge B_n \in I \,\}.$$

A rule r ∈ ground(P) is said to be enabled by the interpretation I when all its goals are contained in I. Thus the operator $T_P(I)$ returns the set of the heads of rules enabled by I. The upward powers of $T_P$ starting from an interpretation I are defined as follows:

$$T_P^{\uparrow 0}(I) = I$$
$$T_P^{\uparrow (i+1)}(I) = T_P(T_P^{\uparrow i}(I)), \quad \text{for } i \geq 0$$
$$T_P^{\uparrow \omega}(I) = \bigcup_{i \geq 0} T_P^{\uparrow i}(I).$$

The semantics of a positive program is deﬁned by the least ﬁxpoint of TP , denoted lf p(TP ), which is also equal to the least model of P , denoted MP [29]. The least ﬁxpoint of Tp can be computed as the ω-power of TP applied to the empty set: i.e., lf p(Tp ) = TP↑ω (∅). The inflationary version of the TP operator is denoted TP and deﬁned as follows: TP (I) = TP (I) ∪ I For positive programs, we have: TP↑ω = T↑ω P = MP = lf p(TP ) = lf p(TP ) The equivalence of model-theoretic and ﬁxpoint semantics no longer holds in Datalog¬ programs, which allow the use of negated goals in rules. Various semantics have therefore been proposed for Datalog¬ programs. For instance, the inflationary semantics, which adopts T↑ω P as the meaning of a program P , can be implemented eﬃciently but lacks desirable logical properties [1]. On the other hand, stratiﬁed negation is widely used and combines desirable computational and logical properties [22]; however, stratiﬁed negation severely restricts the class of programs that one can write. Formal semantics for more general classes of programs are also available [10,30,2]. Because of its generality and support for nondeterminism, we will use here the stable model semantics, that is deﬁned via a stability transformation [10], as discussed next. Given an interpretation I and a Datalog¬ program P , the stability transformation derives the positive program groundI (P ) by modifying the rules of ground(P ) as follows:


– drop all clauses with a negative literal ¬A in the body with A ∈ I, and
– drop all negative literals in the body of the remaining clauses.

Next, an interpretation M is a stable model for a Datalog¬ program P iff M is the least model of the program $\mathit{ground}_M(P)$. In general, Datalog¬ programs may have zero, one, or many stable models. We shall see how the multiplicity of stable models can be exploited to give a declarative account of non-determinism.
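The definitions of this section can be tried out on small ground programs. The following Python sketch is only an illustration, not part of the original formalization: it assumes ground rules encoded as (head, positive goals, negated goals) triples over string atoms, computes the least model by iterating the inflationary operator from the empty set, and tests stability by applying the two steps of the stability transformation. The toy program and all function names are hypothetical.

```python
from itertools import chain, combinations

# A ground rule is (head, positive_goals, negated_goals); atoms are plain strings.

def tp(rules, interp):
    """Immediate consequences of a positive program: heads of rules enabled by interp."""
    return {h for h, pos, _neg in rules if set(pos) <= interp}

def least_model(rules):
    """Least fixpoint of T_P for a positive ground program, iterated from the empty set."""
    interp = set()
    while True:
        nxt = tp(rules, interp) | interp  # inflationary step; same limit for positive programs
        if nxt == interp:
            return interp
        interp = nxt

def reduct(rules, interp):
    """Stability transformation: drop rules with a negated goal in interp,
    then drop the remaining negated goals."""
    return [(h, pos, ()) for h, pos, neg in rules if not (set(neg) & interp)]

def is_stable(rules, interp):
    return least_model(reduct(rules, interp)) == set(interp)

def stable_models(rules):
    """Brute-force enumeration; only sensible for tiny programs."""
    atoms = sorted({h for h, _, _ in rules} | {a for _, p, n in rules for a in (*p, *n)})
    subsets = chain.from_iterable(combinations(atoms, k) for k in range(len(atoms) + 1))
    return [set(s) for s in subsets if is_stable(rules, set(s))]

# Hypothetical toy program:  p <- not q.   q <- not p.   r <- p.
toy = [("p", (), ("q",)), ("q", (), ("p",)), ("r", ("p",), ())]
models = stable_models(toy)
```

On this toy program the enumeration finds two stable models, which is the kind of multiplicity that the following sections exploit to model nondeterminism.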

3 Fixpoint Semantics

We use the notation P/K to denote a logic program P constrained by the set of unique keys K. We make no distinction between interpretations of P and interpretations of P/K; thus every $I \subseteq B_P$ is an interpretation for P/K. Since a program with key constraints can have multiple interpretations, we will now introduce the concept of family of interpretations. A family of interpretations for P is defined as a non-empty set of maximal interpretations for P. More formally:

Definition 1. Let $\Im$ be a nonempty set of interpretations for P where no element in $\Im$ is a subset of another. Then $\Im$ is called a family of interpretations for P. The set of families of interpretations for P will be denoted by fins(P).

For instance, let P be the program:

a.
b ← a.

Then fins(P) consists of the following families of interpretations:

1. {{}}
2. {{a}}
3. {{b}}
4. {{a}, {b}}
5. {{a, b}}

3.1 Lattice

The set fins(P) can be partially ordered as follows:

Definition 2. Let $\Im_1$ and $\Im_2$ be two elements of fins(P). If $\forall I_1 \in \Im_1, \exists I_2 \in \Im_2$ s.t. $I_1 \subseteq I_2$, then we say that $\Im_1$ is a subfamily of $\Im_2$ and write $\Im_1 \sqsubseteq \Im_2$.

Now, $(\sqsubseteq, \mathit{fins}(P))$ is a partial order, and also a complete lattice, with least upper bound (lub):

$$\Im_1 \sqcup \Im_2 = \{I \in \Im_1 \mid \neg\exists I_2 \in \Im_2 \text{ s.t. } I_2 \supset I\} \cup \{I \in \Im_2 \mid \neg\exists I_1 \in \Im_1 \text{ s.t. } I_1 \supseteq I\}$$

The greatest lower bound (glb) is:

$$\Im_1 \sqcap \Im_2 = \{I_1 \cap I_2 \mid I_1 \in \Im_1, I_2 \in \Im_2 \text{ and } \neg(\exists I' \in \Im_1, \exists I'' \in \Im_2 \text{ s.t. } I' \cap I'' \supset I_1 \cap I_2)\}$$

These two operations are easily extended to families with infinitely many elements; thus we have a complete lattice, with $\{B_P\}$ as top and $\{\emptyset\}$ as bottom.

3.2 Fixpoint Semantics of Positive Programs with Keys

Let us consider first the case of positive programs P without key constraints, by revisiting the computation of the successive powers of $T_P$, where $T_P$ denotes the immediate consequence operator for P. We will also use the inflationary version of this operator, which was previously defined as $\mathbf{T}_P(I) = T_P(I) \cup I$. The computation $T_P^{\uparrow\omega}(\emptyset) = \mathbf{T}_P^{\uparrow\omega}(\emptyset)$ generates an ascending chain; if I is the result obtained at the last step, the application of $\mathbf{T}_P(I)$ adds to the old I the set of new tuples $T_P(I) - I$, all at once. We next define an operator where the new consequences are added one by one; this will be called the Atomic Consequence Operator (ACO), $\mathcal{T}_P$, which is a mapping on families of interpretations. For a singleton set $\{I\}$, $\mathcal{T}_P$ is defined as follows:

$$\mathcal{T}_P(\{I\}) = \{I' \mid \exists x \in [T_P(I) - I] \text{ s.t. } I' = I \cup \{x\}\} \sqcup \{I\}$$

Then, for a family of sets, $\Im$, we have

$$\mathcal{T}_P(\Im) = \bigsqcup_{I \in \Im} \mathcal{T}_P(\{I\})$$

Therefore, our new operator adds to I a single new consequence atom from $T_P(I) - I$, when this is not empty; thus, it produces a family of interpretations from a singleton interpretation $\{I\}$. When $\mathbf{T}_P(I) = I$, then, by the above definition, $\mathcal{T}_P(\{I\}) = \{I\}$. The following result follows immediately from the definitions:

Proposition 1. Let P be a positive logic program without keys. Then, $\mathcal{T}_P$ defines a mapping that is monotonic and also continuous.

Since we have a continuous mapping in a complete lattice, the well-known Knaster-Tarski theorem, and related fixpoint results, can be used to conclude that there always exist solutions of the fixpoint equation $\Im = \mathcal{T}_P(\Im)$, and that there also exists the least of such solutions, called the least fixpoint of $\mathcal{T}_P$. The least fixpoint of $\mathcal{T}_P$, denoted $\mathit{lfp}(\mathcal{T}_P)$, can be computed as the ω-power of $\mathcal{T}_P$ starting from the bottom element $\{\emptyset\}$.

Proposition 2. Let P be a positive logic program without key constraints. Then, $\Im = \mathcal{T}_P(\Im)$ has a least fixpoint solution denoted $\mathit{lfp}(\mathcal{T}_P)$, where:

$$\mathit{lfp}(\mathcal{T}_P) = \mathcal{T}_P^{\uparrow\omega}(\{\emptyset\}) = \bigsqcup_{0<j} \mathcal{T}_P^{\uparrow j}(\{\emptyset\}) = \{\mathit{lfp}(T_P)\}$$


Thus for a positive program without keys, the least fixpoint of $\mathcal{T}_P$ provides an equivalent characterization of the semantics of positive logic programs, since the least fixpoint of $\mathcal{T}_P$ is the singleton set containing the least fixpoint of $T_P$. We now consider the situation of a positive program with keys P/K. The Immediate Consequence Operator (ICO) for this program is obtained by simply ignoring the keys: $T_{P/K}(I) = T_P(I)$. The ACO is defined as follows:

Definition 3. Let P/K be a logic program with key constraints, and let $\{I\} \in \mathit{fins}(P)$ and $\Im \in \mathit{fins}(P)$. Then, $\mathcal{T}_{P/K}(\{I\})$ and $\mathcal{T}_{P/K}(\Im)$ are defined as follows:

$$\mathcal{T}_{P/K}(\{I\}) = \{I' \mid \exists x \in [T_P(I) - I] \text{ s.t. } I' = I \cup \{x\} \text{ and } I' \models K\} \sqcup \{I\}$$

$$\mathcal{T}_{P/K}(\Im) = \bigsqcup_{I \in \Im} \mathcal{T}_{P/K}(\{I\})$$

For instance, if $\mathcal{T}$ denotes the ACO for our tiny college example, then $\mathcal{T}^{\uparrow 1}(\{\emptyset\})$ is simply a family with three singleton sets, one for each fact in the program:

$\mathcal{T}^{\uparrow 1}(\{\emptyset\})$ = {
  {professor(ohm, ee)},
  {professor(bell, ee)},
  {student('JimBlack', ee)} }

Thus, $\mathcal{T}^{\uparrow 2}(\{\emptyset\})$ consists of pairs taken from the three program facts:

$\mathcal{T}^{\uparrow 2}(\{\emptyset\})$ = {
  {professor(bell, ee), professor(ohm, ee)},
  {student('JimBlack', ee), professor(bell, ee)},
  {student('JimBlack', ee), professor(ohm, ee)} }

From the first pair, above, we can only obtain a family containing the three original facts; but from the second pair and third pair we obtain two different advisors. In fact, we obtain:

$\mathcal{T}^{\uparrow 3}(\{\emptyset\})$ = {
  {student('JimBlack', ee), professor(bell, ee), professor(ohm, ee)},
  {student('JimBlack', ee), professor(bell, ee), advisor('JimBlack', bell)},
  {student('JimBlack', ee), professor(ohm, ee), advisor('JimBlack', ohm)} }

In the next step, these three parallel derivations converge into the following two sets:

$\mathcal{T}^{\uparrow 4}(\{\emptyset\})$ = {
  {student('JimBlack', ee), professor(bell, ee), professor(ohm, ee), advisor('JimBlack', bell)},
  {student('JimBlack', ee), professor(bell, ee), professor(ohm, ee), advisor('JimBlack', ohm)} }


No set can be further enlarged at the next step, given that the addition of a new advisor would violate the key constraints. So we have $\mathcal{T}^{\uparrow 5}(\{\emptyset\}) = \mathcal{T}^{\uparrow 4}(\{\emptyset\})$, and we have reached the fixpoint. As illustrated by this example, although the operator $\mathcal{T}_{P/K}$ is not monotonic, the ω-power of $\mathcal{T}_{P/K}$ has desirable characteristics that make it the natural choice for canonical semantics of positive programs with keys. In fact we have the following property:

Proposition 3. Let P/K be a positive program with key constraints. Then, $\mathcal{T}_{P/K}^{\uparrow\omega}(\{\emptyset\})$ is a fixpoint for $\mathcal{T}_{P/K}$, and each $\{I\} \in \mathcal{T}_{P/K}^{\uparrow\omega}(\{\emptyset\})$ is a minimal fixpoint for $\mathcal{T}_{P/K}$.

Proof: The application of $\mathcal{T}_{P/K}$ to $\mathcal{T}_{P/K}^{\uparrow\omega}(\{\emptyset\})$ can only generate elements which were generated in the ω-derivation. Thus $\mathcal{T}_{P/K}^{\uparrow\omega}(\{\emptyset\})$ is a fixpoint. Now, let $\{I\} \in \mathcal{T}_{P/K}^{\uparrow\omega}(\{\emptyset\})$. Clearly, $\mathcal{T}_{P/K}(\{I\}) = \{I\}$, otherwise the previous property does not hold. Thus $\{I\}$ is a fixpoint. To prove that it is minimal, let $J \subset I$. If we trace the derivation chain for $\{I\}$, we find a predecessor $\{I'\}$ of $\{I\}$ where $I'$ is not a subset of J, but its immediate predecessor, $I''$, is. Now let $\{x\} = I' - I''$; then $J \cup \{x\}$ does not violate the key constraints (since its superset I does not), and x is in $T_P(J)$. Thus $\{J\}$ cannot be a fixpoint. □

Therefore, under the all-answer semantics, we expect the whole family $\mathcal{T}_{P/K}^{\uparrow\omega}(\{\emptyset\})$ to be returned as the canonical answer, whereas under a single-answer semantics any of the interpretations in $\mathcal{T}_{P/K}^{\uparrow\omega}(\{\emptyset\})$ is accepted as a valid answer. In the next section, we introduce an equivalent semantics for our programs with keys using the notion of stable models.
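The tiny college derivation above can be replayed mechanically. The sketch below is a minimal Python illustration under stated assumptions: rules are already ground, atoms are tuples whose first component is the predicate name, a family is a set of frozensets, and the lub of families is computed by keeping only maximal members of the union. The function names (aco_step, aco_omega, satisfies_keys) are hypothetical and do not come from the paper or from any LDL++ implementation.

```python
def tp(rules, interp):
    """Immediate consequences: heads of ground rules whose goals all lie in interp."""
    return {h for h, body in rules if set(body) <= interp}

def satisfies_keys(interp, keys):
    """keys: list of (predicate, key_positions), 1-based as in unique_key(q, [..])."""
    for pred, cols in keys:
        seen = set()
        for atom in interp:
            if atom[0] == pred:
                k = tuple(atom[c] for c in cols)  # atom[0] is the predicate name
                if k in seen:
                    return False
                seen.add(k)
    return True

def maximal(family):
    """Keep only interpretations that are not strictly contained in another member."""
    return {i for i in family if not any(i < j for j in family)}

def aco_step(rules, keys, family):
    """One application of the atomic consequence operator to a family."""
    out = set()
    for interp in family:
        succ = {interp | {x} for x in tp(rules, interp) - interp
                if satisfies_keys(interp | {x}, keys)}
        out |= succ if succ else {interp}  # {I} survives only if nothing can be added
    return maximal(out)

def aco_omega(rules, keys):
    """Iterate from the bottom family {emptyset} until the family no longer changes."""
    family = {frozenset()}
    while True:
        nxt = aco_step(rules, keys, family)
        if nxt == family:
            return family
        family = nxt

jb = "JimBlack"
rules = [
    (("student", jb, "ee"), ()),
    (("professor", "ohm", "ee"), ()),
    (("professor", "bell", "ee"), ()),
    (("advisor", jb, "ohm"), (("student", jb, "ee"), ("professor", "ohm", "ee"))),
    (("advisor", jb, "bell"), (("student", jb, "ee"), ("professor", "bell", "ee"))),
]
keys = [("advisor", [1])]  # each student has at most one advisor
result = aco_omega(rules, keys)
```

Under the all-answer semantics the whole of result would be returned; under the single-answer semantics any one of its members is acceptable.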

4 Stable-Model Semantics

Programs with keys have an equivalent model-theoretic semantics. We will next show that $\mathcal{T}_{P/K}^{\uparrow\omega}(\{\emptyset\})$ corresponds to the family of stable models for the program foe(P/K) obtained from P/K by expressing the key constraints by negated goals. The stable model semantics also extends naturally to stratified programs with key constraints.

4.1 Positive Programs with Key Constraints

An equivalent characterization of a positive program P/K can be obtained by introducing negated goals in the rules of P to enforce the key constraints. The program obtained by this transformation will be denoted foe(P/K), and called the first order equivalent of P/K. The program foe(P/K) so obtained always has a formal meaning under stable model semantics [10]. Take, for instance, our advisor example; the rule in Example 1 can also be expressed as follows:


Example 5. The Advisor Example 1 Expressed Using Negation

advisor(S, P) ← student(S, Majr, Year), professor(P, Majr), ¬kviol_advisor(S, P).
kviol_advisor(S, P) ← advisor(S, P'), P ≠ P'.

Therefore, we allow a professor P to become the advisor of a student S provided that no other P' ≠ P is already an advisor of S. In general, if q is the name of a predicate subject to a key constraint, we use a new predicate kviol_q to denote the violation of key constraints on q; then, we add a kviol_q rule for each key declared for q. Finally, a negated kviol_q goal is added to the original rules defining q. For instance, the simple path program of Example 3 can be re-expressed in the following way:

Example 6. The simple-path program of Example 3 Expressed Using Negation

spath(root, X) ← g(X, Y), ¬kviol_spath(root, X).
spath(Y, Z) ← spath(X, Y), g(Y, Z), ¬kviol_spath(Y, Z).
kviol_spath(X1, X2) ← spath(X1, Y2), X2 ≠ Y2.
kviol_spath(X1, X2) ← spath(Y1, X2), X1 ≠ Y1.

Derivation of foe(P/K). In general, given a program P/K constrained with keys, its first order equivalent foe(P/K) is computed as follows:

1. For each rule r, with head q(Z1, ..., Zn), where q is constrained by some key, add the goal ¬kviol_q(Z1, ..., Zn) to r.
2. For each unique_key(q, ArgList)! in K, where n is the arity of q, add a new rule
   kviol_q(X1, ..., Xn) ← q(Y1, ..., Yn), Y1 θ1 X1, ..., Yn θn Xn.
   where θj denotes the equality symbol '=' for every j in ArgList, and the inequality symbol '≠' for every j not in ArgList.

For instance, the foe of our advisor example is:

advisor(S, P) ← student(S, Majr, Year), professor(P, Majr), ¬kviol_advisor(S, P).
kviol_advisor(X1, X2) ← advisor(Y1, Y2), X1 = Y1, X2 ≠ Y2.

This transformation does in fact produce the rules of Example 6, after we replace equals with equals and eliminate all equality goals. The newly introduced predicates with the prefix kviol will be called key-violation predicates. Stable models provide the formal semantics for our foe programs:

Proposition 4. Let P/K be a positive logic program with keys. Then foe(P/K) has one or more stable models.


A proof for this proposition can be easily derived from [25,13], where the same transformation is used to define the formal semantics of programs with the choice construct. With I an interpretation of foe(P/K), let pos(I) denote the interpretation obtained by removing all the key-violation atoms from I and leaving the others unchanged. Likewise, if $\Im$ is a family of interpretations of foe(P/K), then we define:

$$\mathit{pos}(\Im) = \bigcup_{I \in \Im} \{\mathit{pos}(I)\}$$

Then, the following theorem elucidates the equivalence between the two semantics:

Proposition 5. Let P/K be a positive program, and Σ be the set of stable models for foe(P/K). Then $\mathit{pos}(\Sigma) = \mathcal{T}_{P/K}^{\uparrow\omega}(\{\emptyset\})$.

Proof: Let $I \in \mathcal{T}_{P/K}^{\uparrow\omega}(\{\emptyset\})$, and let $P_I = \mathit{ground}_I(\mathit{foe}(P/K))$ be the program produced by the stability transformation on foe(P/K). It suffices to show that $T_{P_I}^{\uparrow\omega}(\emptyset) = I$, i.e., that $\{I\} = \mathcal{T}_{P_I}^{\uparrow\omega}(\{\emptyset\})$. Now, take a derivation in $\mathcal{T}_{P/K}^{\uparrow\omega}(\{\emptyset\})$ producing I; we can find an identical derivation in $\mathcal{T}_{P_I}^{\uparrow\omega}(\{\emptyset\})$. This concludes our proof. □
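As a sanity check of Proposition 5 on the advisor example, the following Python sketch builds a ground version of foe(P/K) and enumerates its stable models by brute force. It is an illustration under several assumptions: atoms are tuples of strings with the predicate name first, the kviol rules are grounded only over the heads that actually occur in the program, and the helper names (foe, stable_models, least_model) are hypothetical. The kviol rule follows the transformation as stated above, requiring equality on key positions and inequality on all other positions.

```python
from itertools import combinations

def foe(rules, keys):
    """Ground first-order equivalent. rules: list of (head, body) with tuple atoms;
    keys: list of (predicate, key_positions) with 1-based positions."""
    keyed = {p for p, _ in keys}
    heads = {h for h, _ in rules if h[0] in keyed}
    out = []
    for h, body in rules:
        # Step 1: add the goal not kviol_q(...) to every rule defining a keyed predicate.
        neg = (("kviol_" + h[0],) + h[1:],) if h[0] in keyed else ()
        out.append((h, tuple(body), neg))
    for pred, cols in keys:
        # Step 2: kviol_q(X) <- q(Y), with '=' on key positions and '!=' elsewhere.
        for x in heads:
            for y in heads:
                if x[0] == y[0] == pred and len(x) == len(y) and all(
                    (x[c] == y[c]) if c in cols else (x[c] != y[c])
                    for c in range(1, len(x))
                ):
                    out.append((("kviol_" + pred,) + x[1:], (y,), ()))
    return out

def least_model(pos_rules):
    model, changed = set(), True
    while changed:
        changed = False
        for h, body in pos_rules:
            if h not in model and set(body) <= model:
                model.add(h)
                changed = True
    return model

def stable_models(rules):
    """Brute force over subsets of the rule heads; feasible only for tiny programs."""
    atoms = sorted({h for h, _, _ in rules})
    found = []
    for k in range(len(atoms) + 1):
        for cand in combinations(atoms, k):
            c = set(cand)
            reduct = [(h, b) for h, b, n in rules if not (set(n) & c)]
            if least_model(reduct) == c:
                found.append(c)
    return found

jb = "JimBlack"
rules = [(("student", jb, "ee"), ()),
         (("professor", "ohm", "ee"), ()),
         (("professor", "bell", "ee"), ())]
rules += [(("advisor", jb, p), (("student", jb, "ee"), ("professor", p, "ee")))
          for p in ("ohm", "bell")]
sigma = stable_models(foe(rules, [("advisor", [1])]))
pos_sigma = {frozenset(a for a in m if not a[0].startswith("kviol_")) for m in sigma}
```

Here pos_sigma contains exactly the two four-atom interpretations that the fixpoint derivation of the tiny college example reaches at its fourth step.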

4.2 Stratification

The notion of stratification significantly increases the expressive power of Datalog, while retaining the declarative fixpoint semantics of programs. Consider first the notion of stratification with respect to negation for programs without key constraints:

Definition 4. Let P be a program with negated goals, and σ1, ..., σn be a partition of the predicate names in P. Then, P is said to be stratified, when for each rule r ∈ P (with head $h_r$) and each goal $g_r$ in r, the following property holds:

1. $\mathit{stratum}(h_r) > \mathit{stratum}(g_r)$ if $g_r$ is a negated goal
2. $\mathit{stratum}(h_r) \geq \mathit{stratum}(g_r)$ if $g_r$ is a positive goal.

Therefore, a stratified program P can be viewed as a stack of rule layers, where the higher layers do not influence the lower ones. Thus the correct semantics can be assigned to a program by starting from the bottom layer and proceeding upward, with the understanding that computation for the higher layers cannot affect lower ones. The computation can be implemented using the ICO $T_P$, which, in the presence of negated goals, is generalized as follows. A rule $r \in \mathit{ground}(P)$ is said to be enabled by an interpretation I when all of its positive goals are in I and none of its negated goals are in I. Then, $T_P(I)$ is defined as containing the heads of all rules in $\mathit{ground}(P)$ that are enabled by I. (This change automatically adjusts the definitions of $\mathbf{T}_P$ and $\mathcal{T}_P$ that are based on $T_P$.)


Therefore, let I[≤ j] and P[≤ j], respectively, denote the atoms in I and the rules in P whose head belongs to strata ≤ j. Also let P[j] denote the set of rules in P whose head belongs to stratum j. Then, we observe that for a stratified program P, the mapping defined by P[j] (i.e., $T_{P[j]}$) is monotonic with respect to I[j]. Thus, if $I_{j-1}$ is the meaning of P[≤ j − 1], then $\mathbf{T}_{P[j]}^{\uparrow\omega}(I_{j-1})$ is the meaning of P[≤ j]. Thus, let P be a program stratified with respect to negation and without key constraints; then the following algorithm inductively constructs the iterated fixpoint for $T_P$ (and $\mathbf{T}_P$):

Iterated Fixpoint computation for $T_P$, where P is stratified with strata σ1, ..., σn.

1. Let $I_0 = \emptyset$;
2. For j = 1, ..., n, let $I_j = \mathbf{T}_{P[j]}^{\uparrow\omega}(I_{j-1})$.

For every 1 ≤ j ≤ n, $I_j = I_n[\leq j]$ is a minimal fixpoint of P[≤ j]. The interpretation $I_n$ obtained at the end of this computation is called the iterated fixpoint for $T_P$ and defines the meaning of the program P. It is well-known that the iterated fixpoint for a stratified program P is equal to P's unique stable model [36].

These notions can now be naturally extended to programs with key constraints. A program P/K is stratified whenever its keyless counterpart P is stratified. Let P/K[j] denote the rules with head in the j-th stratum, along with the key constraints on their head predicates; also, let P/K[≤ j] denote the rules with head in strata no higher than the j-th stratum, along with their applicable key constraints. Finally, let:

$$\Im[\leq j] = \bigcup_{I \in \Im} \{I[\leq j]\}$$

The notion of $\mathcal{T}$ can be extended in natural fashion to stratified programs. If $\Im_{j-1}$ is the meaning of P/K[≤ j − 1], then $\mathcal{T}_{P/K[j]}^{\uparrow\omega}(\Im_{j-1})$ is the natural meaning of P/K[≤ j]. Thus we have the following extension of the iterated fixpoint algorithm:

Iterated Fixpoint Computation for $\mathcal{T}_{P/K}$, where P/K is stratified with strata σ1, ..., σn.

1. Let $\Im_0 = \{\emptyset\}$;
2. For j = 1, ..., n, let $\Im_j = \mathcal{T}_{P/K[j]}^{\uparrow\omega}(\Im_{j-1})$.

The family of interpretations $\Im_n$ obtained from this computation will be called the iterated fixpoint for $\mathcal{T}_{P/K}$. The iterated fixpoint for $\mathcal{T}_{P/K}$ defines the meaning of P/K; it has the property that, for each 1 ≤ j ≤ n, each member in $\Im_j = \Im_n[\leq j]$ is a minimal fixpoint for $\mathcal{T}_{P/K[\leq j]}$.


Stable Model Semantics for Stratified Programs. Every program P that is stratified with respect to negation has a unique stable model that can be computed by the iterated fixpoint computation for $T_P$ previously discussed. Likewise, every stratified program P/K can be expanded into its first order equivalent foe(P/K). Then, it can be shown that (i) foe(P/K) always has one or more stable models, and (ii) if Σ denotes the family of its stable models, then pos(Σ) coincides with the iterated fixpoint of $\mathcal{T}_{P/K}$.
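The iterated fixpoint computation for stratified programs without keys can be sketched in a few lines. The Python fragment below is a hedged illustration rather than the paper's algorithm verbatim: it assumes that the ground rules are already grouped by stratum, encodes each rule as (head, positive goals, negated goals) over string atoms, and uses a made-up reachability program whose second stratum negates a predicate defined in the first.

```python
def tp(rules, interp):
    """Generalized ICO: a rule is enabled when its positive goals are in interp
    and none of its negated goals are."""
    return {h for h, pos, neg in rules
            if set(pos) <= interp and not (set(neg) & interp)}

def iterated_fixpoint(strata):
    """strata: list of rule lists P[1], ..., P[n], lowest stratum first."""
    interp = set()  # I_0 is the empty interpretation
    for rules in strata:
        while True:
            nxt = tp(rules, interp) | interp  # inflationary step for stratum j
            if nxt == interp:
                break
            interp = nxt
    return interp

# Hypothetical ground program: nodes a, b, c and a single edge a -> b.
stratum1 = [
    ("edge_a_b", (), ()),
    ("reach_a", (), ()),
    ("reach_b", ("reach_a", "edge_a_b"), ()),
    ("reach_c", ("reach_b", "edge_b_c"), ()),
]
# unreached(X) <- not reach(X), placed one stratum above reach.
stratum2 = [("unreached_" + n, (), ("reach_" + n,)) for n in "abc"]
model = iterated_fixpoint([stratum1, stratum2])
```

Because the lower stratum is saturated before the negated goals are consulted, the result does not depend on the order in which rules are tried within a stratum.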

5 Single-Answer Semantics and Nondeterminism

The derivation $\mathcal{T}_{P/K}^{\uparrow\omega}(\{\emptyset\})$ can be used to compute in parallel all the stable models for a positive program foe(P/K). In this computation, each application of $\mathcal{T}_{P/K}$ expands in parallel all interpretations in the current family, by the addition of a single new element to each interpretation. In [38], we discuss condensed derivations based on $\mathcal{T}_{P/K}$, which accelerate the derivation process by adding several new elements at each step of the computation. This ensures a faster convergence toward the final result, while still computing all stable models at once. Even with condensed derivations, the computation of all stable models requires exponential time, since the number of such models can be exponential in the size of the database. This computational complexity might be acceptable when dealing with NP-complete problems, such as deciding the existence of a Hamiltonian path. However, in many situations involving programs with multiple stable models, only one such model, not all of them, is required in practice. For instance, this is the case of Example 4, where we use choice to enumerate into a chain the elements of a set one by one, with the knowledge that the even/odd parity of the whole set only depends on its cardinality, and not on the particular chain used. Therefore for Example 4, the computation of any stable model will suffice to answer correctly the parity query. Since this situation is common for many queries, we need efficient operators for computing a single stable model.

Even with NP-complete problems, it is normally desirable to generate the stable models in a serial rather than parallel fashion. For instance, for the Hamiltonian circuit problem of Example 3, we can test if the last generated model satisfies the desired property (i.e., if there is any freenode), and only if this test fails, proceed with the generation of another model, normally calling on some heuristics to aid in the search for a good model. On the average, this search succeeds without having to produce an exponential number of stable models, since exponential complexity only represents the worst-case behavior for many NP-complete algorithms.

Now, the computation of a single stable model is in general NP-hard [26]; however, this computation for a program foe(P/K) derived from one with key constraints can be performed in polynomial time, and, as we describe next, with minimal overhead with respect to the standard fixpoint computation. Therefore, we next concentrate on the problem of generating a single element in $\mathcal{T}_{P/K}^{\uparrow\omega}(\{\emptyset\})$, and on expressing polynomial-time queries using this single-answer semantics.


We define next the notions of soundness and completeness for nondeterministic operators to be used to compute an element in $\mathcal{T}_{P/K}^{\uparrow\omega}(\{\emptyset\})$.

Definition 5. Let P/K be a logic program with keys, and C be a class of functions on interpretations of P. Then we define the following two properties:

1. Soundness. A function τ ∈ C will be said to be sound for a program P/K when $\tau^{\uparrow\omega}(\emptyset) \in \mathcal{T}_{P/K}^{\uparrow\omega}(\{\emptyset\})$. The function class C will be said to be sound when all its members are sound.
2. Completeness. The function class C will be said to be complete for a program P/K when for each $M \in \mathcal{T}_{P/K}^{\uparrow\omega}(\{\emptyset\})$ there exists some τ ∈ C such that $\tau^{\uparrow\omega}(\emptyset) = M$.

In situations where any answer will solve the problem at hand, there is no point in seeking completeness and we can limit ourselves to classes of functions that are sound, and efficient to compute, even if completeness is lost; eager derivations discussed next represent an interesting class of such functions.

Definition 6. Let P/K be a program with key constraints, and let Γ(I) be a function on interpretations of P. Then, Γ(I) will be called an eager derivation operator for P/K if it satisfies the following three conditions:

1. $I \subseteq \Gamma(I) \subseteq \mathbf{T}_P(I)$
2. $\Gamma(I) \models K$
3. Every subset of $\mathbf{T}_P(I)$ that is a proper superset of Γ(I) violates some key constraint in K.

Let $C_\Gamma$ be the class of eager derivation operators for a given program P/K. Then it is immediate to see that $C_\Gamma$ is sound for all programs. Eager derivation operators can be implemented easily. Their implementation only requires tables to memorize atoms previously derived and compare the new values against previous ones to avoid key violations. Inasmuch as table-based memorization is already part of the basic mechanism for the computation of fixpoints in deductive databases, key constraints are easy to implement.

A limitation of eager derivation operators is that they do not form a complete class for all positive programs with key constraints. This topic is discussed in [38], where classes of operators which are both sound and complete are also discussed. However, in the rest of this paper, we only use key constraints to define chain rules, such as those in Example 4; for these rules, the eager derivations are complete, in addition to being sound and efficiently computable.
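To make the eager derivation idea concrete, here is a small Python sketch specialized to the chain and ca rules of Example 4. It is an assumption-laden illustration: the immediate consequences are hard-coded for that one program, candidate atoms are tried in a pseudo-random order to mimic nondeterministic choice, and the final mod2 rule is evaluated after the fixpoint, as stratification requires. The names tp, keys_ok, eager and parity are hypothetical.

```python
import random

def tp(b, interp):
    """Immediate consequences of the b facts, chain rules and ca rules of Example 4."""
    chain = {a[1:] for a in interp if a[0] == "chain"}
    ca = {a[1:] for a in interp if a[0] == "ca"}
    bs = {a[1] for a in interp if a[0] == "b"}
    flip = {"odd": "even", "even": "odd"}
    out = {("b", x) for x in b}                                  # database facts
    out |= {("chain", "nil", x) for x in bs}                     # chain(nil, X) <- b(X).
    out |= {("chain", x, y) for (_, x) in chain for y in bs}     # chain(X, Y) <- chain(_, X), b(Y).
    out |= {("ca", y, "odd") for (x, y) in chain if x == "nil"}  # ca(Y, odd) <- chain(nil, Y).
    out |= {("ca", y, flip[p]) for (x, p) in ca for (x2, y) in chain if x2 == x}
    return out

def keys_ok(interp):
    """unique_key(chain, [1]) and unique_key(chain, [2])."""
    chains = [a for a in interp if a[0] == "chain"]
    firsts = [a[1] for a in chains]
    seconds = [a[2] for a in chains]
    return len(set(firsts)) == len(firsts) and len(set(seconds)) == len(seconds)

def eager(b, interp, rng):
    """Add new consequences one at a time, skipping those that would violate a key."""
    new = sorted(tp(b, interp) - interp)
    rng.shuffle(new)  # the order of attempts models the nondeterministic choice
    out = set(interp)
    for atom in new:
        if keys_ok(out | {atom}):
            out.add(atom)
    return out

def parity(b, seed=0):
    rng = random.Random(seed)
    interp = set()
    while True:
        nxt = eager(b, interp, rng)
        if nxt == interp:
            break
        interp = nxt
    has_successor = {a[1] for a in interp if a[0] == "chain"}
    # mod2(Parity) <- ca(Y, Parity), not chain(Y, _).
    return {a[2] for a in interp if a[0] == "ca" and a[1] not in has_successor}

answers = {frozenset(parity({"x", "y", "z"}, seed)) for seed in range(10)}
```

Different seeds build different chains, yet every run returns the same parity, which is the sense in which the query is deterministic under single-answer semantics.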

6 Set Aggregates in Logic

The additional expressive power brought to Datalog by key constraints finds many uses; here we employ it to achieve a formal characterization of database aggregates, thus solving an important open problem in database theory and logic programming. In fact, the state-of-the-art characterization of aggregates relies on the assumption that the universe is totally ordered [36]. Using this assumption, the atoms satisfying a given predicate are chained together in ascending order, starting from the least value and ending with the largest value. Unfortunately, this solution has four serious drawbacks, since (i) it compromises data independence by violating the genericity property [1], (ii) it relies on negation, thus infecting aggregates with the nonmonotonic curse, (iii) it is often inefficient since it requires the data to be sorted before aggregation, and (iv) it cannot be applied to more advanced forms of aggregation, such as on-line aggregates and rollups, that are used in decision support and other advanced applications [33]. Online aggregation [8], in particular, cannot be expressed under the current approach that relies on a totally ordered universe to sort the elements of the set being processed, starting from its least element. In fact, at the core of on-line aggregation, there is the idea of returning partial results after visiting a proper subset of the given dataset, while the rest is still unknown. Now, it is impossible to compute the least element of a set when only part of it is known.

We next show that all these problems find a simple solution once key constraints are added to Datalog. For concreteness, we use the aggregate constructs of LDL++ [4], but very similar syntactic constructs are used by other systems (e.g., CORAL [23]), and the semantics here proposed is general and applicable to every logic-based language and database query language.

6.1 User Defined Aggregates

Consider the parity query of Example 4. To define an equivalent parity aggregate in LDL++ the user will write the following rules:

Example 7. Definition rules for the parity aggregate mod2

single(mod2, _, odd).
multi(mod2, X, odd, even).
multi(mod2, X, even, odd).
freturn(mod2, _, Parity, Parity).

These rules have the same function as the last four rules in Example 4. The single rule specifies how to initialize the computation of the mod2 aggregate by specifying its value on a singleton set (same as the first ca rule in the example). The two multi rules instead specify how the new aggregate value (the fourth argument) should be updated for each new input value (second argument), given its previous value (third argument). (Thus these rules perform the same function as the second and the third of the ca rules in Example 4.) The freturn rule specifies (as fourth argument) the value to be returned once the last element in the set is detected (same as the last rule in Example 4). For mod2, the value returned is simply taken from the third argument, where it was left by the multi rule executed on the last element of the set. Two important observations can therefore be made:


1. We have described a very general method for defining aggregates by specifying the computation to be performed upon (i) the initial value, (ii) each successive value, and (iii) the final value in the set. This paradigm is very general, and also describes the mechanism for introducing user defined aggregates (UDAs) used by SQL3 and in the AXL system [33].
2. The correspondence between the above rules and those of Example 4 outlines the possibility of providing a logic semantics to UDAs by simply expanding the single, multi, and freturn rules into an equivalent logic program (using the chain rules) such as that of Example 4.

The rules in Example 7 are generic, and can be applied to any set of facts. To reproduce the behavior of Example 4, they must be applied to b(X). In LDL++ this is specified by the aggregate-invocation rule:

p(mod2⟨X⟩) ← b(X).

This rule specifies that the result of the computation of mod2 on b(X) is returned as the argument of a predicate that our user has named p.

There has been much recent interest in online aggregates [8], which also find important applications in logic programming, as discussed later in this paper. For instance, when computing averages on non-skewed data, the aggregate often converges toward the final value long before all the elements in the set are visited. Thus, the system should support early returns to allow the user to check convergence and stop the computation as soon as the series of successive values has converged within the prescribed accuracy [8].

UDAs with early returns can be defined in LDL++ through the use of ereturn rules. Say, for instance, that we want to define a new aggregate myavg, apply it to the elements of d(Y), and view the results of this computation as a predicate q. Then, the LDL++ programmer must specify one aggregate-application rule, and several aggregate-definition rules. For instance, the following is an aggregate application rule:

r : q(myavg⟨Y⟩) ← d(Y).

The ⟨...⟩ notation in the head of r denotes an aggregate; this rule specifies that the definition rules for myavg must be applied to the stream of Y-values that satisfy the body of the rule. The aggregate definition rules include: (i) single rule(s), (ii) multi rule(s), (iii) freturn rule(s) for final returns, and/or (iv) ereturn rule(s) for early returns. All four kinds of rules are used in the following definition of myavg:

single(myavg, Y, cs(1, Y)).
multi(myavg, Y, cs(Cnt, Sum), cs(Cnt1, Sum1)) ← Cnt1 = Cnt + 1, Sum1 = Sum + Y.
freturn(myavg, Y, cs(Cnt, Sum), Val) ← Val = Sum/Cnt.
ereturn(myavg, X, cs(Count, Sum), Avg) ← Count mod 100 = 0, Avg = Sum/Count.

Observe that the first argument in the head of the single, multi, ereturn, and freturn rules contains the name of the aggregate: therefore, these aggregate definition rules can only be used by aggregate application rules that contain myavg⟨...⟩ in the head. The second argument in the head of a single or multi rule holds the 'new' value from the input stream, while the last argument holds the updated partial value; in a multi rule, the third argument holds the partial value returned by the previous computation. Thus, for averages, these arguments should hold the pair cs(Count, Sum). The single rule specifies the value of the aggregate for a singleton set (containing the first value in the stream); for myavg, the single rule must return cs(1, Y). The multi rules prescribe an inductive computation on a set with n + 1 elements, by specifying how the (n + 1)-th element in the stream is to be combined with the value returned (as third argument in multi) by the computation on the first n elements. For myavg, the count is increased by one and the sum is increased by the new value in the stream. The freturn rules specify how the final value(s) of the aggregate are to be returned. For myavg, we return the ratio of sum and count. The ereturn rules specify when early returns are to be produced and what their values are. In particular for myavg, we produce early returns every 100 elements in the stream, and the value produced is the current ratio sum/count; this is online aggregation.

6.2 Semantics of Aggregates

In general, the semantics of an aggregate application rule r

r : q(myavg⟨Y⟩) ← d(Y).

can be defined by expanding it into its key-constrained equivalent logic program, denoted kce(r), which contains the following rules:

1. A main rule

   q(Y) ← results(myavg, Y).

   where results(myavg, Y) is derived from d(Y) by a program consisting of:

2. The chain rules that link the elements of d(Y) into an order-inducing chain (nil is a special value not in d(Y)),

   unique_key(chain_r, [1])!
   unique_key(chain_r, [2])!
   chain_r(nil, Y) ← d(Y).
   chain_r(Y, Z) ← chain_r(X, Y), d(Z).

3. The cag_r rules that perform the inductive computation:

   cag_r(AgName, Y, New) ← chain_r(nil, Y), Y ≠ nil, single(AgName, Y, New).
   cag_r(AgName, Y2, New) ← chain_r(Y1, Y2), cag_r(AgName, Y1, Old), multi(AgName, Y2, Old, New).

   Thus, the cag_r rules are used to memorize the previous results, and to apply (i) single to the first element of d(Y) (i.e., for the pattern chain_r(nil, Y)) and (ii) multi to the successive elements.

4. The two results rules, where the first rule produces early returns and the second rule produces final returns as follows:

   results(AgName, Yield) ← chain_r(Y1, Y2), cag_r(AgName, Y1, Old), ereturn(AgName, Y2, Old, Yield).
   results(AgName, AgValue) ← chain_r(X, Y), ¬chain_r(Y, _), cag_r(AgName, Y, Old), freturn(AgName, Y, Old, AgValue).

   Therefore, the first results rule produces the early returns by applying ereturn to every element in the chain, and the second rule produces the final returns by applying freturn on the last element in the chain (i.e., the element without a successor).

In LDL++, an implicit group-by operation is performed on the head arguments not used to apply aggregates. Thus, to compute the average salary of employees grouped by Dno, the user can write:

avgsal(Dno, myavg⟨Sal⟩) ← emp(Eno, Sal, Dno).

As discussed in [34], the semantics of aggregates with group-by can simply be defined by including an additional argument in the predicates chain_r and results to hold the group-by attributes.

6.3 Applications of User Defined Aggregates

We will now discuss the use of UDAs to express polynomial algorithms in a natural and efficient way. These algorithms use aggregates in programs that yield the correct final results unaffected by the nondeterministic behavior of the aggregates. Therefore, aggregate computation here uses single-answer semantics, which assures polynomial complexity.

Let us consider first uses of nonmonotonic aggregates. For instance, say that from a set of pairs such as (Name, YearOfBirth) as input, we want to return the Name of the youngest person (i.e., the person born in the latest year). This computation cannot be expressed directly as an aggregate in SQL, but can be expressed by the UDA youngest given below (in LDL++, a vector of n arguments (X1, ..., Xn) is basically treated as an n-argument function with a default name).

single(youngest, (N, Y), (N, Y)).
multi(youngest, (N, Y), (N1, Y1), (N, Y)) ← Y ≥ Y1.
multi(youngest, (N, Y), (N1, Y1), (N1, Y1)) ← Y ≤ Y1.
freturn(youngest, (N, Y), (N1, Y1), N1).

User-defined aggregates provide a simple solution to a number of complex problems in deductive databases; due to space limitations we will here consider only simple examples; a more complete set of examples can be found in [37].
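The single/multi/ereturn/freturn protocol can be mimicked procedurally. The following Python sketch is an illustration only: run_uda is a hypothetical driver that consumes a stream in arrival order (standing in for the nondeterministically built chain), each rule is modeled as a Python function, and return rules yield lists so that zero or more results per step can be expressed. Ties in youngest are broken in favor of the newer element, which is one of the choices the two multi rules allow.

```python
def run_uda(stream, single, multi, freturn, ereturn=None):
    """Fold a stream through single/multi; collect early returns, then final returns."""
    early, state, last, seen_any = [], None, None, False
    for y in stream:
        if not seen_any:
            state = single(y)                    # first element of the chain
            seen_any = True
        else:
            if ereturn is not None:
                early.extend(ereturn(y, state))  # ereturn sees the new value and the old state
            state = multi(y, state)
        last = y
    final = freturn(last, state) if seen_any else []
    return early, final

# myavg: the state is the pair (count, sum); early returns once every 100 elements.
myavg = dict(
    single=lambda y: (1, y),
    multi=lambda y, s: (s[0] + 1, s[1] + y),
    freturn=lambda y, s: [s[1] / s[0]],
    ereturn=lambda y, s: [s[1] / s[0]] if s[0] % 100 == 0 else [],
)

# youngest: the state is the (name, year) pair with the latest year seen so far.
youngest = dict(
    single=lambda ny: ny,
    multi=lambda ny, s: ny if ny[1] >= s[1] else s,
    freturn=lambda ny, s: [s[0]],
)

early_avgs, final_avg = run_uda(range(1, 251), **myavg)
_, who = run_uda([("ann", 1990), ("bob", 2001), ("cy", 1995)], **youngest)
```

In this sketch the early returns of myavg are the running averages after 100 and 200 elements, and the final return is the average of the whole stream.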

Key Constraints and Monotonic Aggregates in Deductive Databases


We already discussed the definition and uses of online aggregates, such as myavg that returns values every 100 samples. In a more general framework, the user would want to control how often new results are to be returned, on the basis of the estimated progress toward convergence in the computation [8]. UDAs provide a natural setting for this level of control.

The applications of UDAs are too numerous to list here; as an example, take the interval coalescing problem of temporal databases [35]. For instance, say that from a base relation emp(Eno, Sal, Dept, (From, To)), we project out the attributes Sal and Dept; then the same Eno appears in tuples with overlapping valid-time intervals that must be coalesced. Here we use closed intervals represented by the pair (From, To) where From is the start-time, and To is the end-time. Under the assumption that tuples are sorted by increasing start-time, we can use a special coales aggregate to perform the task in one pass through the data.

Example 8. Coalescing overlapping intervals sorted by start time.

empProj(Eno, coales⟨(From, To)⟩) ← emp(Eno, _, _, (From, To)).
single(coales, (Frm, To), (Frm, To)).
multi(coales, (Nfr, Nto), (Cfr, Cto), (Cfr, Lgr)) ← Nfr ≤ Cto, larger(Cto, Nto, Lgr).
multi(coales, (Nfr, Nto), (Cfr, Cto), (Nfr, Nto)) ← Nfr > Cto.
ereturn(coales, (Nfr, Nto), (Cfr, Cto), (Cfr, Cto)) ← Nfr > Cto.
freturn(coales, _, LastInt, LastInt).
larger(X, Y, X) ← X ≥ Y.
larger(X, Y, Y) ← X < Y.

Thus, the single rule starts the coalescing process by setting the current interval equal to the first interval. The multi rules operate as follows: when the new interval (Nfr, Nto) overlaps the current interval (Cfr, Cto) (i.e., when Nfr ≤ Cto), the two are coalesced into an interval that begins at Cfr, and ends with the larger of Nto and Cto; otherwise, the current interval is returned and the new interval becomes the current one.
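The one-pass behaviour of the coalescing aggregate can be sketched in Python as a generator, where each yield inside the loop plays the role of an early return and the last yield plays the role of the final return. This is an illustrative sketch under the same assumption as the example (closed intervals, sorted by start time); the function name and sample data are hypothetical.

```python
def coalesce(intervals):
    """Yield coalesced closed intervals from (start, end) pairs sorted by start."""
    iterator = iter(intervals)
    try:
        current = next(iterator)              # `single`: first interval is current
    except StopIteration:
        return                                # no input, nothing to return
    for new_from, new_to in iterator:
        cur_from, cur_to = current
        if new_from <= cur_to:                # overlap: extend the current interval
            current = (cur_from, max(cur_to, new_to))
        else:                                 # gap: early return, then start anew
            yield current
            current = (new_from, new_to)
    yield current                             # final return: the last interval


periods = [(1, 3), (2, 5), (7, 8), (8, 10), (12, 13)]
print(list(coalesce(periods)))
```

With closed intervals, (7, 8) and (8, 10) share the point 8 and are therefore merged.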

7 Monotonicity

Commercial database systems and most deductive database systems disallow the use of aggregates in recursion and require programs to be stratified with respect to aggregates. This restriction is also part of the SQL99 standards [7]. However, many important algorithms, particularly greedy algorithms, use aggregates such as count, sum, min and max in a monotonic fashion, inasmuch as previous results are never discarded. This observation has inspired a significant amount of previous work seeking efficient expression of these algorithms in logic [27,6,24,31,9,15]. At the core of this issue there is the characterization of programs where aggregates behave monotonically and can therefore be freely used in recursion. For many interesting programs, special lattices can be found in which aggregates are monotonic [24]. But the identification of such lattices cannot be automated [31], nor is the computation of fixpoints for such programs. Our newly introduced theory of aggregates provides a definitive solution to the monotonic aggregation problem, including a simple syntactic characterization to determine if an aggregate is monotonic and can thus be used freely in recursion.

7.1 Partial Monotonicity

For a program P/K, we will use the terms constrained predicates and free predicates to denote predicates that are constrained by keys and those that are not. With I an interpretation, let I_c and I_f, respectively, denote the atoms in I that are instances of constrained and free predicates; I_c will be called the constrained component of I, and I_f the free component of I. Then, let I and J be two interpretations such that I ⊆ J and I_c = J_c (thus I_f ⊆ J_f). Likewise, each family ℱ can be partitioned into the family of its constrained components, ℱ_c, and the family of its free components, ℱ_f. Then, the following proposition shows that a program P/K defines a monotonic transformation with respect to the free components of families of interpretations:

Proposition 6. Partial Monotonicity: Let ℱ and ℱ′ be two families of interpretations for a program P/K. If ℱ_f ⊑ ℱ′_f, while ℱ_c = ℱ′_c, then T_{P/K}(ℱ) ⊑ T_{P/K}(ℱ′).

Proof. It suffices to prove the property for two singleton sets {I} and {J} where I_f ⊆ J_f, while I_c = J_c. Take an arbitrary I′ ∈ T_{P/K}({I}): we need to show that there exists a J′ ∈ T_{P/K}({J}) where I′ ⊆ J′. If I′ ⊆ J the conclusion is trivial; else, let I′ = I ∪ {x}, x ∈ T_P(I) − I, and I′ ⊨ K. Since I is a subset of J but I′ is not, x is not in J, and x ∈ T_P(J) − J. Also, if J′ = J ∪ {x}, J′ ⊨ K (since J_c = I_c). Thus, J′ ∈ T_{P/K}({J}). □

This partial monotonicity property (i.e., monotonicity w.r.t. free predicates only) extends to the successive powers of T_{P/K}, including its ω-power. Thus, if ℱ_f ⊑ ℱ′_f, while ℱ_c = ℱ′_c, then T_{P/K}^{↑ω}(ℱ) ⊑ T_{P/K}^{↑ω}(ℱ′). This result shows that the program P/K defines a monotonic mapping from unconstrained predicates to every other predicate in the program.

It is customary in deductive databases to draw a distinction between extensional information (base relations) and intensional information (derived relations). Therefore, a program can be viewed as defining a mapping from base relations to derived relations.
In this view, the partial monotonicity property states that the mapping from database relations free of key constraints to derived relations is monotonic; i.e., the larger the base relations, the larger the derived relations. For a base relation R that is constrained by keys, we can introduce an auxiliary input relation RI free of key constraints, along with a copy rule that derives R from RI. Then, we can view RI as the input relation and R as a result of filtering RI with the key constraints. Then, we have a monotonic mapping from the input relation RI to the derived relations in the program.
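The role of the auxiliary input relation can be illustrated with a small Python sketch. It assumes a key on the first column, models the copy rule under that key as the family of all ways of keeping one tuple per key value, and compares families with the ordering used in the proof above (every member of the first family is contained in some member of the second). The function names and sample relations are hypothetical.

```python
from itertools import product


def key_filtered(tuples, key=lambda t: t[0]):
    """Family of relations obtained by keeping exactly one tuple per key value."""
    groups = {}
    for t in sorted(tuples):
        groups.setdefault(key(t), []).append(t)
    # One choice per key value; each combination is one admissible result.
    return {frozenset(choice) for choice in product(*groups.values())}


def family_leq(first, second):
    """True if every member of `first` is a subset of some member of `second`."""
    return all(any(i <= j for j in second) for i in first)


small_input = {("a", 1), ("b", 2)}
large_input = small_input | {("a", 3), ("c", 4)}

print(family_leq(key_filtered(small_input), key_filtered(large_input)))
```

Enlarging the unconstrained input can add alternatives (here two choices for key "a"), but every result obtainable from the smaller input is still contained in some result of the larger one.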

7.2 Monotonic Aggregates

Users normally think of an aggregate application rule, such as r, as a direct mapping from r's body to r's head, a mapping which behaves according to the rules defining the aggregate. This view is also close to the actual implementation, since in a system such as LDL++ the execution of the rules in kce(r) is already built into the system. The encapsulated program for an aggregate application rule r will be denoted 0(r); it contains all the rules in kce(r) and the single, multi, ereturn and freturn rules defining the aggregates used in r. Then, the transitive mapping defined by 0(r) transforms families of interpretations of the body of r to families of interpretations of the heads of rules in 0(r). With I an interpretation of the body of r (i.e., a set of atoms from predicates in the body of r), the mapping for 0(r) is equal to T_{0(r)}^{↑ω}({I}) when there are no freturn rules, and is equal to the result of the iterated fixpoint of the stratified 0(r) program otherwise. For instance, consider the definition and application rules for an online sum aggregate msum:

r′ : q(msum⟨X⟩) ← p(X).
single(msum, Y, Y).
multi(msum, Y, Old, New) ← New = Old + Y.
ereturn(msum, Y, Old, New) ← New = Old + Y.

The transitive mapping established by 0(r′) can be summarized by the chainr atoms, which describe a particular sequencing of the elements in I and the aggregate values for the sequence so generated:

ℱ                    T_{0(r′)}^{↑ω}(ℱ)
{{p(3)}}             {{chainr(nil, 3), q(3)}}
{{p(1), p(3)}}       {{chainr(nil, 1), chainr(1, 3), q(1), q(4)}, {chainr(nil, 3), chainr(3, 1), q(3), q(4)}}
...                  ...

Therefore, the mapping defined by the aggregate rules is multivalued, i.e., from families of interpretations to families of interpretations. The ICO for the set of non-aggregate rules P can also be seen as a mapping between families of interpretations by simply letting T_P({I}) = {T_P(I)}. Then, the encapsulated consequence operator for a program with aggregates combines the immediate consequence operator for regular rules with the transitive consequences for the aggregate rules. Because of the partial monotonicity properties of programs with key constraints, we now derive the following property:

Proposition 7. Let P be a positive program with aggregates defined without final return rules. Then, the encapsulated consequence operator for P is monotonic in the lattice of families of interpretations.
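The rows of the table can be reproduced with a few lines of Python. This is a sketch only: each ordering of the input values plays the role of one chain, and the values returned by msum are taken to be the running sums along that chain, including the first element, as in the rows shown above. The function names are hypothetical.

```python
from itertools import accumulate, permutations


def msum_returns(chain):
    """Running sums along one chain of input values."""
    return list(accumulate(chain))


def transitive_mapping(values):
    """One set of q-values per possible chain (ordering) of the p-values."""
    return {frozenset(msum_returns(chain)) for chain in permutations(values)}


print(transitive_mapping([3]))
print(transitive_mapping([1, 3]))
```

The second call yields two alternative result sets, one per chain, which is why the mapping has to be read over families of interpretations rather than single interpretations.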


Carlo Zaniolo

Therefore, aggregates defined without freturn rules will be called monotonic; thus, monotonic aggregates can be used freely in recursive programs. Aggregate computation in actual programs is very similar to the semi-naive computation used to implement deductive databases [5,35], which is based on combining old values with new values according to rules obtained by the symbolic differentiation of the original rules. For aggregates, we can use the same framework with the difference that the rules for storing the old values and those for producing the results are now given explicitly by the programmer through the single/multi and ereturn/freturn rules in the definition.

7.3 Aggregates in Recursion

Our newly introduced theory of aggregates provides a definitive solution to the monotonic aggregation problem, with a simple syntactic criterion to decide if an aggregate is monotonic and can thus be used freely in recursion. The rule is as follows:

All aggregates which are defined without any freturn rule are monotonic and can be used freely in recursive rules.

The ability to use aggregates with early returns freely in programs allows us to express complex algorithms concisely. For instance, we next define a continuous count that returns the current count after each new element but the first one (thus, it does not have a freturn since that would be redundant).

single(mcount, Y, 1).
multi(mcount, Y, Old, New) ← New = Old + 1.
ereturn(mcount, Y, Old, New) ← New = Old + 1.

Using mcount we can now code the following applications, taken from [24].

Join the Party. Some people will come to the party no matter what, and their names are stored in a sure(Person) relation. But others will join only after they know that at least K = 3 of their friends will be there. Here, friend(P, F) denotes that F is P's friend.

willcome(P) ← sure(P).
willcome(P) ← c_friends(P, K), K ≥ 3.
c_friends(P, mcount⟨F⟩) ← willcome(F), friend(P, F).

Consider now a computation of these rules on the following database.

friend(jerry, mark).    friend(penny, mark).    sure(mark).
friend(jerry, jane).    friend(penny, jane).    sure(tom).
friend(jerry, penny).   friend(penny, tom).     sure(jane).
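The rules and database above can be simulated by a simple fixpoint loop. The Python sketch below is an illustration rather than the LDL++ evaluation strategy: it recomputes the friend counts on every round instead of maintaining them incrementally as the continuous count does, and the function name is hypothetical.

```python
def join_the_party(sure, friend, k=3):
    """Least set of attendees: sure guests plus anyone with >= k attending friends."""
    willcome = set(sure)
    changed = True
    while changed:
        changed = False
        for person in {p for p, _ in friend}:
            if person in willcome:
                continue
            # Friends of `person` who are already known to attend.
            attending = {f for p, f in friend if p == person and f in willcome}
            if len(attending) >= k:
                willcome.add(person)
                changed = True
    return willcome


friend = [
    ("jerry", "mark"), ("jerry", "jane"), ("jerry", "penny"),
    ("penny", "mark"), ("penny", "jane"), ("penny", "tom"),
]
sure = ["mark", "tom", "jane"]

print(sorted(join_the_party(sure, friend)))
```

Since attendees are only ever added, each round can only increase the counts, which is the monotonic behaviour that makes the recursion through the aggregate safe.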

Then, the basic semi-naive computation yields:

willcome(mark), willcome(tom), willcome(jane), c_friends(jerry, 1), c_friends(penny, 1), c_friends(jerry, 2), c_friends(penny, 2), c_friends(penny, 3), willcome(penny), c_friends(jerry, 3), willcome(jerry).

This example illustrates how the standard semi-naive computation can be applied to queries containing monotonic user-defined aggregates. Another interesting example is transitive ownership and control of corporations [24].

Company Control. Say that owns(C1, C2, Per) denotes the percentage of shares that corporation C1 owns of corporation C2. Then, C1 controls C2 if it owns more than, say, 50% of its shares. In general, to decide whether C1 controls C3 we must also add the shares owned by corporations such as C2 that are controlled by C1. This yields the transitive control rules defined with the help of a continuous sum aggregate that returns the partial sum for each new element, but the first one.

control(C, C) ← owns(C, _, _).
control(Onr, C) ← towns(Onr, C, Per), Per > 50.
towns(Onr, C2, msum⟨Per⟩) ← control(Onr, C1), owns(C1, C2, Per).

Thus, every company controls itself, and a company C1 that has transitive ownership of more than 50% of C2's shares controls C2. In the last rule, towns computes transitive ownership with the help of msum that adds up the shares of controlling companies. Observe that any pair (Onr, C2) is added at most once to control, thus the contribution of C1 to Onr's transitive ownership of C2 is accounted for only once.

Bill-of-Materials (BoM) Applications. BoM applications represent an important application area that requires aggregates in recursive rules. Say, for instance, that assembly(P1, P2, QT) denotes that P1 contains part P2 in quantity QT. We also have elementary parts described by the relation basic_part(Part, Price). Then, the following program computes the cost of a part as the sum of the cost of the basic parts it contains.

part_cost(Part, 0, Cst) ← basic_part(Part, Cst).
part_cost(Part, mcount⟨Sb⟩, msum⟨MCst⟩) ← part_cost(Sb, ChC, Cst), prolfc(Sb, ChC), assembly(Part, Sb, Mult), MCst = Cst ∗ Mult.

Thus, the key condition in the body of the second rule is that a subpart Sb is counted in part_cost only when all of Sb's children have been counted. This occurs when the number of Sb's children counted so far by mcount is equal to the out-degree of this node in the graph representing assembly. This number is kept in the prolificacy table, prolfc(Part, ChC), which can be computed as follows:

prolfc(P1, count⟨P2⟩) ← assembly(P1, P2, _).
prolfc(P1, 0) ← basic_part(P1, _).
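The synchronisation between the running count and the prolificacy table in the BoM program can be sketched in Python. The code below is an illustrative simulation, not LDL++: it assumes that the assembly graph is acyclic and that no part is both basic and composite, and the function name and sample data are hypothetical.

```python
def part_costs(basic_part, assembly):
    """Cost of every part, computed bottom-up.

    basic_part: dict mapping an elementary part to its price.
    assembly:   list of (part, subpart, quantity) triples.
    """
    # prolfc: number of subparts of each part (0 for basic parts).
    prolfc = {part: 0 for part in basic_part}
    for part, _, _ in assembly:
        prolfc[part] = prolfc.get(part, 0) + 1

    final_cost = dict(basic_part)   # parts whose children have all been counted
    partial = {}                    # part -> (children counted, running sum)
    used = set()                    # indices of assembly triples already added

    changed = True
    while changed:
        changed = False
        for index, (part, sub, qty) in enumerate(assembly):
            # A subpart contributes only once, and only when its cost is final.
            if index in used or sub not in final_cost:
                continue
            used.add(index)
            count, total = partial.get(part, (0, 0))
            partial[part] = (count + 1, total + final_cost[sub] * qty)
            if partial[part][0] == prolfc[part]:
                final_cost[part] = partial[part][1]
            changed = True
    return final_cost


basic_part = {"bolt": 1, "rim": 20, "frame": 50}
assembly = [
    ("wheel", "rim", 1), ("wheel", "bolt", 4),
    ("bike", "wheel", 2), ("bike", "frame", 1), ("bike", "bolt", 10),
]
costs = part_costs(basic_part, assembly)
print(costs["wheel"], costs["bike"])
```

The pair kept in partial corresponds to the count and sum arguments of part_cost; a part is propagated upward only when its count reaches its prolificacy.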

8 Conclusions

Keys in derived relations extend the expressive power of deductive databases while retaining their declarative semantics and efficient implementations. In this paper, we have presented equivalent fixpoint and model-theoretic semantics for programs with key constraints in derived relations. Database aggregates can be easily modelled under this extension, yielding a simple characterization of monotonic aggregates. Monotonic aggregates can be freely used in recursive programs, thus providing simple and efficient expressions for optimization and greedy algorithms that had been previously considered impervious to the logic programming paradigm.

There has been a significant amount of previous work that is relevant to the results presented in this paper. In particular, LDL++ provides the choice construct to declare functional dependency constraints in derived relations. The stable model characterization and several other results presented in this paper find a similar counterpart in properties of the LDL++ choice construct [13,37]; however, no fixpoint characterization and related results were known for LDL++ choice. An extension of this concept to temporal logic programming was proposed by Orgun and Wadge [21], who introduced the notion of choice predicates that ensure that a given predicate is single-valued. This notion finds applications in intensional logic programming [21].

The cardinality and weight constraints proposed by Niemelä and Simons provide a powerful generalization of the key constraints discussed here [20]. In fact, while key constraints restrict the cardinality of the results to be one, the constraint that such cardinality must be restricted within a user-specified interval is supported in the mentioned work (where different weights can also be attached to atoms). Thus Niemelä and Simons (i) provide a stable model characterization for logic programs containing such constraints, (ii) propose an implementation using Smodels [19], and (iii) show how to express NP-complete problems using these constraints. The implementation approach used for Smodels is quite different from that of LDL++; thus investigating the performance of different approaches in supporting cardinality constraints represents an interesting topic for future research.

Also left for future research is the topic of SLD-resolution, which (along with the fixpoint and model-theoretic semantics treated here) would provide a third semantic characterization for logic programs with key constraints [29]. Memoing techniques could be used for this purpose, and for an efficient implementation of keys and aggregates [3].

Acknowledgements

The author would like to thank the reviewers for the many improvements they have suggested, and Frank Myers for his careful proofreading of the manuscript. The author would also like to express his gratitude to Dino Pedreschi, Domenico Saccà, Fosca Giannotti and Sergio Greco who laid the seeds of these ideas during our past collaborations. This work was supported by NSF Grant IIS-007135.


References

1. S. Abiteboul, R. Hull, and V. Vianu: Foundations of Databases. Addison-Wesley, 1995.
2. N. Bidoit and C. Froidevaux: General logical Databases and Programs: Default Logic Semantics and Stratification. Information and Computation, 91, pp. 15-54, 1991.
3. W. Chen, D. S. Warren: Tabled Evaluation With Delaying for General Logic Programs. JACM 43(1), pp. 20-74, 1996.
4. D. Chimenti, R. Gamboa, R. Krishnamurthy, S. Naqvi, S. Tsur and C. Zaniolo: The LDL System Prototype. IEEE Transactions on Knowledge and Data Engineering, 2(1), pp. 76-90, 1990.
5. S. Ceri, G. Gottlob and L. Tanca: Logic Programming and Databases. Springer, 1990.
6. S. W. Dietrich: Shortest Path by Approximation in Logic Programs. ACM Letters on Programming Languages and Systems, 1(2), pp. 119-137, 1992.
7. S. J. Finkelstein, N. Mattos, I. S. Mumick, and H. Pirahesh: Expressing Recursive Queries in SQL. ISO WG3 report X3H2-96-075, March 1996.
8. J. M. Hellerstein, P. J. Haas, H. J. Wang: Online Aggregation. SIGMOD 1997: Proc. ACM SIGMOD Int. Conference on Management of Data, pp. 171-182, ACM, 1997.
9. S. Ganguly, S. Greco, and C. Zaniolo: Extrema Predicates in Deductive Databases. JCSS 51(2), pp. 244-259, 1995.
10. M. Gelfond and V. Lifschitz: The Stable Model Semantics for Logic Programming. Proc. Joint International Conference and Symposium on Logic Programming, R. A. Kowalski and K. A. Bowen (eds.), pp. 1070-1080, MIT Press, 1988.
11. F. Giannotti, D. Pedreschi, D. Saccà, C. Zaniolo: Non-Determinism in Deductive Databases. In DOOD'91, C. Delobel, M. Kifer, Y. Masunaga (eds.), pp. 129-146, Springer, 1991.
12. F. Giannotti, G. Manco, M. Nanni, D. Pedreschi: On the Effective Semantics of Nondeterministic, Nonmonotonic, Temporal Logic Databases. Proceedings of 12th Int. Workshop, Computer Science Logic, pp. 58-72, LNCS Vol. 1584, Springer, 1999.
13. F. Giannotti, D. Pedreschi, and C. Zaniolo: Semantics and Expressive Power of Non-Deterministic Constructs in Deductive Databases. JCSS 62, pp. 15-42, 2001.
14. S. Greco, D. Saccà: NP Optimization Problems in Datalog. ILPS 1997: Proc. Int. Logic Programming Symposium, pp. 181-195, MIT Press, 1997.
15. S. Greco and C. Zaniolo: Greedy Algorithms in Datalog with Choice and Negation. Proc. 1998 Joint Int. Conference & Symposium on Logic Programming, JCSLP'98, pp. 294-309, MIT Press, 1998.
16. R. Krishnamurthy, S. Naqvi: Non-Deterministic Choice in Datalog. In Proc. 3rd Int. Conf. on Data and Knowledge Bases, pp. 416-424, Morgan Kaufmann, 1988.
17. V. W. Marek and M. Truszczynski: Nonmonotonic Logic. Springer-Verlag, New York, 1995.
18. J. Minker: Logic and Databases: A 20 Year Retrospective. In D. Pedreschi and C. Zaniolo (eds.), Proceedings International Workshop on Logic in Databases (LID'96), Springer-Verlag, pp. 5-52, 1996.
19. I. Niemelä, P. Simons and T. Syrjanen: Smodels: A System for Answer Set Programming. Proceedings of the 8th International Workshop on Non-Monotonic Reasoning, April 9-11, 2000, Breckenridge, Colorado, 4 pages. (Also see: http://www.tcs.hut.fi/Software/smodels/)
20. I. Niemelä and P. Simons: Extending the Smodels System with Cardinality and Weight Constraints. In Jack Minker (ed.): Logic-Based Artificial Intelligence, pp. 491-521, Kluwer Academic Publishers, 2001.
21. M. A. Orgun and W. W. Wadge: Towards an Unified Theory of Intensional Logic Programming. The Journal of Logic and Computation, 4(6), pp. 877-903, 1994.
22. T. C. Przymusinski: On the Declarative and Procedural Semantics of Stratified Deductive Databases. In J. Minker (ed.), Foundations of Deductive Databases and Logic Programming, pp. 193-216, Morgan Kaufmann, 1988.
23. R. Ramakrishnan, D. Srivastava, S. Sudarshan, and P. Seshadri: Implementation of the CORAL Deductive Database System. SIGMOD'93: Proc. Int. ACM SIGMOD Conference on Management of Data, pp. 167-176, ACM, 1993.
24. K. A. Ross and Y. Sagiv: Monotonic Aggregation in Deductive Databases. JCSS 54(1), pp. 79-97, 1997.
25. D. Saccà and C. Zaniolo: Deterministic and Non-deterministic Stable Models. Journal of Logic and Computation, 7(5), pp. 555-579, 1997.
26. J. S. Schlipf: Complexity and Undecidability Results in Logic Programming. Annals of Mathematics and Artificial Intelligence, 15, pp. 257-288, 1995.
27. S. Sudarshan and R. Ramakrishnan: Aggregation and relevance in deductive databases. VLDB'91: Proceedings of 17th Conference on Very Large Data Bases, pp. 501-511, Morgan Kaufmann, 1991.
28. J. D. Ullman: Principles of Data and Knowledge-Based Systems. Computer Science Press, New York, 1988.
29. M. H. Van Emden and R. Kowalski: The Semantics of Predicate Logic as a Programming Language. JACM 23(4), pp. 733-742, 1976.
30. A. Van Gelder, K. A. Ross, and J. S. Schlipf: The Well-Founded Semantics for General Logic Programs. JACM 38, pp. 620-650, 1991.
31. A. Van Gelder: Foundations of Aggregations in Deductive Databases. In DOOD'93, S. Ceri, K. Tanaka, S. Tsur (eds.), pp. 13-34, Springer, 1993.
32. H. Wang and C. Zaniolo: User-Defined Aggregates in Object-Relational Database Systems. ICDE 2000: International Conference on Data Engineering, pp. 111-121, IEEE Press, 2000.
33. H. Wang and C. Zaniolo: Using SQL to Build New Aggregates and Extenders for Object-Relational Systems. VLDB 2000: Proceedings of 26th Conference on Very Large Data Bases, pp. 166-175, Morgan Kaufmann, 2000.
34. C. Zaniolo and H. Wang: Logic-Based User-Defined Aggregates for the Next Generation of Database Systems. In K. R. Apt, V. Marek, M. Truszczynski, D. S. Warren (eds.): The Logic Programming Paradigm: Current Trends and Future Directions, Springer Verlag, pp. 121-140, 1999.
35. C. Zaniolo, S. Ceri, C. Faloutsos, R. Snodgrass, V. S. Subrahmanian, and R. Zicari: Advanced Database Systems. Morgan Kaufmann, 1997.
36. C. Zaniolo: The Nonmonotonic Semantics of Active Rules in Deductive Databases. In DOOD 1997, F. Bry, R. Ramakrishnan, K. Ramamohanarao (eds.), pp. 265-282, Springer, 1997.
37. C. Zaniolo et al.: LDL++ Documentation and Web Demo, 1988: http://www.cs.ucla.edu/ldl
38. C. Zaniolo: Key Constraints and Monotonic Aggregates in Deductive Databases. UCLA technical report, June 2001.

A Decidable CLDS for Some Propositional Resource Logics

Krysia Broda
Department of Computing, Imperial College
180 Queens' Gate, London SW7 2BZ
[email protected]

Abstract. The compilation approach for Labelled Deductive Systems (CLDS) is a general logical framework. Previously, it has been applied to various resource logics within natural deduction, tableaux and clausal systems, and in the latter case to yield a decidable (first order) CLDS for propositional Intuitionistic Logic (IL). In this paper the same clausal approach is used to obtain a decidable theorem prover for the implication fragments of propositional substructural Linear Logic (LL) and Relevance Logic (RL). The CLDS refutation method is based around a semantic approach using a translation technique utilising first-order logic together with a simple theorem prover for the translated theory using techniques drawn from Model Generation procedures. The resulting system is shown to correspond to a standard LL(RL) presentation as given by appropriate Hilbert axiom systems and to be decidable.

1 Introduction

Among the computational logic community no doubt there are very many people, like me, whose enthusiasm for logic and logic programming was fired by Bob Kowalski. In my case it led to an enduring interest in automated reasoning, and especially the connection graph procedure. In appreciation of what Bob taught me, this paper deals with some non-classical resource logics and uses a classical first order theory to give a clausal theorem prover for them.

The general methodology based on Gabbay's Labelled Deductive Systems (LDS) [9], called the Compiled Labelled Deductive Systems approach (CLDS), is described in [5], [6]. The method allows various logics to be formalised within a single framework and was first applied to modal logics in [14] and generally to the multiplicative part of substructural logics in [5], [6]. The CLDS refutation method is based around a semantic approach using a translation into first-order logic together with a simple theorem prover for the translated theory that employs techniques drawn from Model Generation procedures. However, one critical problem with the approach is that the resulting first order theory is often too expressive and therefore not decidable, even when the logic being modelled is known to be so. It was described in [4] how to construct a decidable refutation prover for the case of Intuitionistic Logic (IL); in this paper that prover is extended, in different ways, to deal with the implication fragments of the propositional resource logics Linear Logic (LL) and Relevance Logic (RL).

A.C. Kakas, F. Sadri (Eds.): Computat. Logic (Kowalski Festschrift), LNAI 2408, pp. 135-159, 2002. © Springer-Verlag Berlin Heidelberg 2002

The motivation for using LDS derives from the observation that many logics only differ from each other in small ways. In the family of modal logics, for example, the differences can be captured semantically through the properties of the accessibility relation, or syntactically within various side-conditions on the proof steps. In substructural logics, the differences can be captured in the syntax by means of the structural proof rules. In a CLDS, capturing differences between logics is achieved through the use of a combined language, incorporating a language for wffs and a language for terms (known as labels), called a labelling language. Elements of the two languages are combined to produce declarative units of the form α : λ, where α is a wff and λ is a label. The interpretation of a declarative unit depends on the particular family of logics being formalised. In the case of modal logics the label λ names a possible world, whereas in substructural, or resource, logics it names a combination of resources. A theory built from declarative units is called a configuration and consists both of declarative units and literals stating the relationships between labels of the configuration (called R-literals).

In this LDS approach applied to resource logics the declarative unit α : λ represents the statement that the "resource λ verifies the wff α". This was first exploited in [9]. Resources can be combined using the operator ◦ and their power of verification related by ⪯, where λ ⪯ λ′ is interpreted to mean that λ′ can verify everything that λ can and is thus the more powerful of the two. Depending on the properties given to ◦ the power of combined resources can be controlled. In RL, for example, resources can be copied; that is, λ ◦ λ ⪯ λ, or λ is just as powerful as multiple copies of itself.
In both RL and LL the order in which resources are combined does not matter, so λ ◦ λ′ ⪯ λ′ ◦ λ. These properties, contraction and commutativity, respectively, correspond to the structural rules of contraction and permutation of standard sequent calculi for RL and LL. In fact, in LDS, all substructural logics can be treated in a uniform way, simply by including different axioms in the labelling algebra [1].

The semantics of a CLDS is given by translating a configuration into first order logic in a particular way, the notion of semantic entailment being defined with respect to such translated configurations. An example of a configuration is the set of declarative units

{p → (p → (q → p)) : b, p : a, q : c, q → p : b ◦ a ◦ a, p : b ◦ a ◦ a ◦ c}

and R-literals {a ◦ a ⪯ a, a ⪯ b ◦ a ◦ a ◦ c}, called constraints in this paper. The translation of a configuration uses a language of special monadic predicates of the form [α]∗, one predicate for each wff α. For the above example of a configuration the translation is

{[p → (p → (q → p))]∗(b), [p]∗(a), [q]∗(c), [q → p]∗(b ◦ a ◦ a), [p]∗(b ◦ a ◦ a ◦ c), a ◦ a ⪯ a, a ⪯ b ◦ a ◦ a ◦ c}

A set of axioms to capture the meanings of the logical operators and a theory, called the labelling algebra, are used for manipulating labels and the relations between them. The language, axiom theory and labelling algebra considered in this paper are together referred to as LCLDS and RCLDS, respectively, for LL and RL. An example of a semantic axiom, using the monadic predicates of the form [α]∗, in this case one that captures the meaning of the → operator, is

∀x([α → β]∗(x) ↔ ∀y([α]∗(y) → [β]∗(x ◦ y)))

[Figure: diagram with arrows labelled (1), (2) and (3) relating C ⊨S C′ to the first-order (FOL) and AlgMG treatments of A⁺S ∪ FOT(C) ∪ ¬FOT(C′).]
Fig. 1. Refutation CLDS

For a given problem, the set of semantic axioms is implicitly instantiated for every wff that occurs in the problem; this set of instances together with a translation of the initial configuration, in which α : λ is translated as [α]∗(λ), can also be taken as a compiled form of the problem. Any standard first order theorem prover, for example Otter [12], could be used to find refutations, although not always very efficiently. In [4], a decidable refutation theorem prover based on the methods of Davis Putnam [8], Hyper-resolution [13] and model generation [11] was taken as the proof system and shown to be sound and complete with respect to the semantics. A similar approach, here called AlgMG, can be taken for LL and RL, but appropriate new restrictions to retain decidability for LL and RL are required and definitions of these are the main contribution of this paper.

The CLDS approach is part of a systematic general framework that can be applied to any logic, either old or new. In case a CLDS corresponds to a known logic, the correspondence with a standard presentation of that logic must also be provided. That is, it must be shown (i) that every derivation in the chosen standard presentation of that logic can be simulated by the rules of the CLDS, in this case by the refutation theorem prover, and (ii) how to build an interpretation such that, if a formula α is not a theorem of the logic in question, then there is an appropriate model in which a suitable declarative unit constructed using α is false. It is this second part that needs care in order to obtain a decidable system for the two logics LL and RL.
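The compilation step described above, translating each declarative unit α : λ into [α]∗(λ) and instantiating the semantic axiom for every implication occurring in the problem, can be sketched in Python. This is an illustrative sketch and not the AlgMG prover: wffs are encoded as strings (atoms) or tuples ("->", left, right), labels as tuples of parameter names, and all function names as well as the ASCII rendering of the formulas are assumptions made for the example.

```python
def show(wff):
    """Render a wff; implications are printed right-associatively."""
    if isinstance(wff, str):
        return wff
    _, left, right = wff
    left_text = show(left) if isinstance(left, str) else f"({show(left)})"
    return f"{left_text} -> {show(right)}"


def subformulas(wff):
    """Yield the wff itself and then all of its subformulas."""
    yield wff
    if not isinstance(wff, str):
        yield from subformulas(wff[1])
        yield from subformulas(wff[2])


def translate_unit(wff, label):
    """Translate the declarative unit wff : label into a monadic atom."""
    return f"[{show(wff)}]*({' o '.join(label)})"


def axiom_instance(implication):
    """Instance of the semantic axiom for one implication."""
    _, left, right = implication
    return (f"forall x([{show(implication)}]*(x) <-> "
            f"forall y([{show(left)}]*(y) -> [{show(right)}]*(x o y)))")


def compile_problem(units):
    """Return the translated units and one axiom instance per implication."""
    facts = [translate_unit(wff, label) for wff, label in units]
    implications = []
    for wff, _ in units:
        for sub in subformulas(wff):
            if not isinstance(sub, str) and sub not in implications:
                implications.append(sub)
    return facts, [axiom_instance(imp) for imp in implications]


units = [(("->", "p", "q"), ("a",)), ("p", ("b",))]
facts, axioms = compile_problem(units)
print(facts)
print(axioms)
```

The two returned lists together correspond to what the text calls the compiled form of the problem; constraints between labels are left out of the sketch.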
The approach taken in a refutation CLDS is illustrated in Fig. 1, where C and C′ are configurations and ¬FOT(C) denotes the disjunction of the negations of the translated declarative units in FOT(C). Arrow (2) represents the soundness and completeness of the refutation prover and arrow (1) is the definition of the semantics of a CLDS. The derived arrow (3) represents a soundness and completeness property of the refutation procedure with respect to configurations.

A fuller description of the language, labelling algebra and axioms modelling the derivation rules for the languages under consideration is given in Sect. 2, whilst Sect. 3 outlines the theorem prover and the results concerning soundness and completeness. The main result of the paper, dealing with decidability, is in Sect. 4, with proofs of other properties in Sect. 5 and the paper concludes with a brief discussion in Sect. 6.

2 Refutation CLDS for Substructural Logics

The CLDS approach for the implication fragment¹ of LL and RL is now described. Definitions of the language, syntax and semantics are given, and configurations are introduced.

2.1 Languages and Syntax

A CLDS propositional language is defined as an ordered pair ⟨L_P, L_L⟩, where L_L is a labelling language and L_P is a propositional language. For the implication fragment of LL and RL the language L_P is composed of a countable set of proposition symbols, {p, q, r, . . .}, and the binary connective →. A special proposition symbol is ⊥, and ¬A is defined as A → ⊥, so allowing negation to be represented. (The wff ⊤ is sometimes used in place of ⊥ → ⊥.) The labelling language L_L is a fragment of a first-order language composed of a binary operator ◦, a countable set of variables {x, y, z, . . .}, a binary predicate ⪯, the set of logical connectives {¬, ∧, ∨, →, ↔}, and the quantifiers ∀ and ∃. The first-order language Func(L_P, L_L) is an extension of L_L as follows.

Definition 1. Let the set of all wffs in L_P be {α1, α2, . . .}, then the semi-extended labelling language Func(L_P, L_L) comprises L_L extended with a set of Skolem constant symbols {c_α1, c_α2, . . .}, also referred to as parameters.

Terms of the semi-extended labelling language Func(L_P, L_L) are defined inductively, as consisting of parameters and variables, together with expressions of the form λ ◦ λ′ for terms λ and λ′, and are also called labels. Note that the parameter c_α represents the smallest label verifying α and that all parameters will have a special role in the semantics. There is the parameter 1 (shorthand for c_⊤) that represents the empty resource, since ⊤ is always provable.

To capture different classes of logics within the CLDS framework an appropriate first-order theory written in the language Func(L_P, L_L), called the labelling algebra, needs to be defined. The labelling algebra is a binary first-order theory which axiomatises (i) the binary predicate ⪯ as a pre-ordering relation and (ii) the properties identity and order preserving of the commutative and associative function symbol ◦. For RL, the structural property contraction is also included.

Definition 2. The labelling algebra A_L, written in Func(L_P, L_L), is the first-order theory given by the axioms (1) - (5), where x, y and z all belong to Func(L_P, L_L). The algebra A_R is the algebra A_L enhanced by axiom (6).

¹ Restricted in order to keep the paper short.

A Decidable CLDS for Some Propositional Resource Logics


1. (identity) ∀x[1 ◦ x ⪯ x ∧ x ⪯ 1 ◦ x]
2. (order-preserving) ∀x, y, z[x ⪯ y → x ◦ z ⪯ y ◦ z ∧ z ◦ x ⪯ z ◦ y]
3. (pre-ordering) ∀x[x ⪯ x] and ∀x, y, z[x ⪯ y ∧ y ⪯ z → x ⪯ z]
4. (commutativity) ∀x, y[x ◦ y ⪯ y ◦ x]
5. (associativity) ∀x, y, z[(x ◦ y) ◦ z ⪯ x ◦ (y ◦ z)] and ∀x, y, z[x ◦ (y ◦ z) ⪯ (x ◦ y) ◦ z]
6. (contraction) ∀x[x ◦ x ⪯ x]
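The axioms above suggest a concrete reading of ground labels as multisets of parameters. The following Python sketch is an illustration only and is not part of the system described in this paper. The names label, combine, leq_linear and leq_relevant are hypothetical, and the two tests encode an assumed syntactic reading of ⪯: identical multisets for A_L, and, for A_R, the same parameters with the left-hand label allowed extra occurrences, as licensed by (contraction).

```python
from collections import Counter

def label(s):
    """Build a ground label from a string such as 'aab'; '' stands for the unit label 1."""
    return Counter(s)

def combine(u, v):
    """The operator ◦ read as multiset union, hence commutative and associative."""
    return u + v

def leq_linear(u, v):
    """Assumed reading of u ⪯ v under A_L: both labels are the same multiset."""
    return u == v

def leq_relevant(u, v):
    """Assumed reading of u ⪯ v under A_R: same parameters, and v is reachable
    from u by zero or more contraction steps (u has at least as many occurrences)."""
    return set(u) == set(v) and all(u[p] >= v[p] for p in v)
```

Under this reading the label aa is related to a in A_R but not in A_L, while a is not related to aa in either algebra.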

The CLDS language facilitates the formalisation of two types of information: (i) what holds at particular points, given by the declarative units, and (ii) which points are in relation with each other and which are not, given by constraints (literals). A declarative unit is defined as a pair “formula:label” expressing that a formula “holds” at a point. The label component is a ground term of the language Func(L_P, L_L) and the formula is a wff of the language L_P. A constraint is any ground literal in Func(L_P, L_L) of the form λ1 ⪯ λ2 or ¬(λ1 ⪯ λ2), where λ1 and λ2 are labels, expressing that λ2 is, or is not, related to λ1. In the applications considered here, little use will be made of negated constraints. In Intuitionistic Logic “related to” was interpreted syntactically as “subset of”, but for LCLDS it is interpreted as “has exactly the same elements as” and for RCLDS as “has the same elements as, but possibly with more occurrences”.

This combined aspect of the CLDS syntax yields a definition of a CLDS theory, called a configuration, which is composed of a set of constraints and a set of declarative units. An example of a configuration was given in the introduction. The formal definition of a configuration is as follows.

Definition 3. Given a CLDS language, a configuration C is a tuple ⟨D, F⟩, where D is a finite set of constraints (referred to as a diagram) and F is a function from the set of ground terms of Func(L_P, L_L) to the set of sets of wffs of L_P. Statements of the form α ∈ F(λ) will be written as α : λ ∈ C.

2.2 Semantics

The model-theoretic semantics of CLDS is defined in terms of a first-order semantics using a translation method. This enables the development of a model-theoretic approach which is equally applicable to any logic, even one belonging to a different family, whose operators have a semantics which can be expressed in a first-order theory. As mentioned before, a declarative unit α : λ represents that the formula α is verified (or holds) at the point λ, whose interpretation is strictly related to the type of underlying logic. These notions are expressed in terms of first-order statements of the form [α]*(λ), where [α]* is a predicate symbol. The relationships between these predicate symbols are constrained by a set of first-order axiom schemas which capture the satisfiability conditions of each type of formula α. The extended labelling language Mon(L_P, L_L) is an extension of the language Func(L_P, L_L) given by adding a monadic predicate symbol [α]* for each wff α of L_P. It is formally defined below.


Krysia Broda

Table 1. Basic and clausal semantic axioms for LCLDS and RCLDS

Ax1:  ∀x∀y(x ⪯ y ∧ [α]*(x) → [α]*(y))
Ax2:  ∀x([α]*(x) → ∃y([α]*(y) ∧ ∀z([α]*(z) → y ⪯ z)))
Ax3:  ∀x([α → β]*(x) ↔ ∀y([α]*(y) → [β]*(x ◦ y)))
Ax2a: ∀x([α]*(x) → [α]*(c_α))
Ax2b: ∀x([α]*(x) → c_α ⪯ x)
Ax3a: ∀x∀y([α → β]*(x) ∧ [α]*(y) → [β]*(x ◦ y))
Ax3b: ∀x([α → β]*(x) ← [β]*(x ◦ c_α))
Ax3c: ∀x([α → β]*(x) ∨ [α]*(c_α))

Definition 4. Let Func(L_P, L_L) be a semi-extended labelling language. Let the ordered set of wffs of L_P be α1, . . . , αn, . . ., then the extended labelling language, called Mon(L_P, L_L), is defined as the language Func(L_P, L_L) extended with the set {[α1]*, . . . , [αn]*, . . .} of unary predicate symbols.

The extended algebra A⁺_L for LCLDS is a first-order theory written in Mon(L_P, L_L), which extends the labelling algebra A_L with a particular set of axiom schemas. A LCLDS system S can now be defined as S = ⟨⟨L_P, L_L⟩, A⁺_L, AlgMG⟩, where AlgMG is the program for processing the first-order theory A⁺_L. Similarly for RCLDS, but using A⁺_R, which includes the (contraction) property.

The axiom schemas are given in Table 1. There are the basic axioms, (Ax1) - (Ax3), and the clausal axioms, (Ax3a), (Ax3b), etc., derived from them by taking each half of the ↔ in turn. The first axiom (Ax1) characterises the property that increasing labels λ and λ′, such that λ ⪯ λ′, imply that the sets of wffs verified by those labels are also increasing. The second axiom (Ax2) characterises a special property that states that, if a wff α is verified by some label, then it is verified by a “smallest” label. Both these axioms relate declarative units to constraints. The axiom (Ax3) characterises the operator →.

Several of the axioms have been simplified by the use of parameters, (Ax1) and (Ax2) (effectively applying Skolemisation). In (Ax2) the variable y is Skolemised to the parameter c_α. The Skolem term c_α is a constant, not depending on x, and this is the feature that eventually yields decidability. A standard Skolemisation technique would result in a function symbol depending on x, but the simpler version suffices for the following reason. Using (Ax2), any two “normal” Skolem terms, c_α(x1) and c_α(x2), would satisfy c_α(x1) ⪯ c_α(x2) and c_α(x2) ⪯ c_α(x1). By (Ax1) this would allow the equivalence of [α]*(c_α(x)) and [α]*(c_α(y)) for any x and y. The single representative c_α is introduced in place of the “normal” Skolem terms c_α(x). It is not very difficult to show that, for any set S of instances of the axiom schema Skolemised in the “normal” way using Skolem symbols c_α(x), S is inconsistent iff the same set of instances of the axioms, together with a set of clause schema of the form ∀x([α]*(c_α) ↔ [α]*(c_α(x))), is inconsistent.
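As an illustration of how the clausal schemas (Ax3a), (Ax3b) and (Ax3c) of Table 1 are instantiated, the following Python sketch generates them for every implication occurring in a formula. It is a minimal sketch under stated assumptions: formulas are represented as nested tuples, the names show, subformulas and clausal_instances are hypothetical, clauses are produced as plain strings, and no pruning of instances is attempted.

```python
def show(f):
    """Render a formula; atoms are strings, implications are ('->', left, right)."""
    if isinstance(f, str):
        return f
    _, a, b = f
    left = a if isinstance(a, str) else f"({show(a)})"
    return f"{left} → {show(b)}"

def subformulas(f):
    """All distinct subformulas of f, including f itself."""
    if isinstance(f, str):
        return [f]
    _, a, b = f
    out = [f]
    for g in subformulas(a) + subformulas(b):
        if g not in out:
            out.append(g)
    return out

def clausal_instances(f):
    """Instances of (Ax3a), (Ax3b) and (Ax3c) for each implication inside f."""
    clauses = []
    for g in subformulas(f):
        if isinstance(g, str):
            continue                      # atoms contribute no (Ax3) instance
        _, a, b = g
        imp, ant, con = f"[{show(g)}]*", f"[{show(a)}]*", f"[{show(b)}]*"
        c = f"c_({show(a)})"              # the single Skolem parameter for the antecedent
        clauses.append(f"{imp}(x) ∧ {ant}(y) → {con}(x◦y)")   # Ax3a
        clauses.append(f"{con}(x◦{c}) → {imp}(x)")            # Ax3b
        clauses.append(f"{imp}(x) ∨ {ant}({c})")              # Ax3c
    return clauses
```

For example, clausal_instances(('->', 'α', 'β')) yields one instance of each of the three schemas for α → β.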


The Skolemised (Ax2) can also be simplified to the following equivalent version (also called (Ax2)):

∀x([α]*(x) → ([α]*(c_α) ∧ c_α ⪯ x))

from which (Ax2a) and (Ax2b) are derived. In the system of [4] for IL a further simplification was possible, in that (Ax3c) could be replaced by [α → β]*(1) ∨ [α]*(a). This is not the case for LCLDS or RCLDS, which consequently require a slightly more complicated algorithm AlgMG. The clausal axioms in Table 1, together with the appropriate properties of the Labelling Algebra, are also called the Extended Labelling Algebra, A⁺_L or A⁺_R. It is for finite sets of instances of these axioms that a refutation theorem prover is given in Sect. 3.

The notions of satisfiability and semantic entailment are common to any CLDS and are based on a translation method which associates syntactic expressions of the CLDS system with sentences of the first-order language Mon(L_P, L_L), and hence associates configurations with first-order theories in the language Mon(L_P, L_L). Each declarative unit α : λ is translated into the sentence [α]*(λ), and constraints are translated as themselves. A formal definition is given below.

Definition 5. Let C = ⟨D, F⟩ be a configuration. The first-order translation of C, FOT(C), is a theory in Mon(L_P, L_L) and is defined by the expression: FOT(C) = D ∪ DU, where DU = {[α]*(λ) | α ∈ F(λ), λ is a ground term of Func(L_P, L_L)}.

The notion of semantic entailment for LCLDS as a relation between configurations is given in terms of classical semantics using the above definition. In what follows, wherever A⁺_L and ⊨_L are used, A⁺_R and ⊨_R could also be used, assuming the additional property of (contraction) in the Labelling Algebra.²

Definition 6. Let S = ⟨⟨L_P, L_L⟩, A⁺_L, AlgMG⟩ be a LCLDS, C = ⟨D, F⟩ and C′ = ⟨D′, F′⟩ be two configurations of S, and FOT(C) = D ∪ DU and FOT(C′) = D′ ∪ DU′ be their respective first-order translations. The configuration C semantically entails C′, written C ⊨_L C′, iff A⁺_L ∪ FOT(C) ∪ ¬FOT(C′) ⊨_FOL ⊥.

If δ is a declarative unit or constraint belonging to C′ and FOT(δ) its first-order translation, then C ⊨_L C′ implies that A⁺_L ∪ FOT(C) ∪ ¬FOT(δ) ⊨_FOL ⊥, which will also be written as C ⊨_L δ. Declarative units of the form α : 1, such that T_∅ ⊨_L α : 1, where T_∅ is an empty configuration (i.e. D and F are both empty), are called theorems. In order to show that a theorem α : 1 holds in LCLDS (RCLDS), appropriate instances of the axioms in A⁺_L (A⁺_R) are first formed for each subformula of α, and then ¬[α]*(1) is added. This set of clauses is refuted by AlgMG. More generally, to show that α follows from the wffs β1, . . . , βn, the appropriate instances include those for each subformula of α, β1, . . . , βn, together with ¬[α]*(i), where i = c_β1 ◦ . . . ◦ c_βn, together with the set {[βj]*(c_βj)}. This derives from consideration of the deduction theorem, namely, that {βj} implies α iff β1 → . . . → βn → α is a theorem. Notice that, if a formula β occurs more than once, then [β]*(c_β) need only be included once in the translated data, but its label c_β is included in i as many times as it occurs.

² Recall ¬FOT(C) means the disjunction of the negation of the literals in FOT(C).
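The treatment of repeated premises can be made concrete with a small Python sketch. It is illustrative only: the function translate_goal and its string encoding of atoms are hypothetical, and the goal label i is modelled as a multiset so that each occurrence of a premise is counted, whereas the translated data is a set.

```python
from collections import Counter

def translate_goal(premises, goal):
    """Return (facts, negated_goal_predicate, goal_label) for showing goal from premises.

    Each premise β contributes the fact [β]*(c_β) once, while its parameter c_β
    is counted in the goal label i once per occurrence of β among the premises.
    """
    facts = {f"[{b}]*(c_{b})" for b in premises}     # a set: duplicates collapse
    label = Counter(f"c_{b}" for b in premises)      # a multiset: duplicates count
    return facts, f"¬[{goal}]*", label
```

With premises β, β and γ, the data contains two facts only, but the goal label contains c_β twice. With no premises the label is the empty multiset, corresponding to the label 1.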

3 A Theorem Prover for LCLDS and RCLDS Systems

The Extended Labelling Algebra A⁺_L enjoys a very simple clausal form. The theorem prover AlgMG, described below as a logic program, uses an adaptation of the Model Generation techniques [11]. The axioms of the Labelling Algebra A_L, or, including (contraction), A_R, together with Axioms (Ax1) and (Ax2a) are incorporated into the unification algorithm, called AlgU. Axioms (Ax1), (Ax2a) and (Ax2b) were otherwise accounted for in the derivation of the remaining axioms and are not explicitly needed any further. First, some definitions are given for this particular kind of first-order theory.

Note 1. In this section, a clause will either be denoted by C, or by L ∨ D, where L is a literal and D is a disjunction of zero or more literals. All variables are implicitly universally quantified. Literals are generally denoted by L or ¬L, but may also be denoted by: L(x) or L(y), when the argument is exactly the variable x or y; L(u), when the argument contains no variables; L(xu), when it contains a variable x and other ground terms u, in which case u is called the ground part; or L(w), when the argument may, or may not, contain a variable. The suffixes 1, 2, etc. are also used if necessary. For ease of reading and writing, label combinations such as a ◦ b ◦ c will be written as abc. It is convenient to introduce the multiset difference operator − on labels in which every occurrence counts. For example, aab − ab = a and ab − 1 = ab. In the sequel, by non-unit parameter will be meant any parameter c_α other than c_⊤ (= 1).

Definition 7. For a given set of clauses S, the set D_S, the Herbrand Domain of S, is the set {c_α | c_α is a non-unit parameter occurring in S} ∪ {1}. The Herbrand Universe of S is the set of terms formed using the operator ◦ applied to elements from the Herbrand Domain.
A ground instance of a clause C or literal L (written Cθ or Lθ) is the result of replacing each variable xi in C or L by a ground term ti from the Herbrand Universe, where the substitution θ = {xi := ti}.

Definition 8. u1 unifies with u2 (with respect to AlgU) iff u1 ⪯ u2.

Notice that unification is not symmetric. In AlgMG it is also necessary to unify non-ground terms, and details of the various cases (derived from the ground case), which are different for each of RL and LL, are given next. They are labelled (a), (b) etc. for reference.

(a) (ground, ground+var) u1 unifies with xu2, where u2 may implicitly be the label 1, iff there is a ground substitution θ for x such that u1 unifies with (xu2)θ. In the case of LL there is only one possible value for θ, viz. x := u1 − u2, but in the case of RL there may be several possible values, depending on the number of implicit contraction operations applied to u1. For example, aaa unifies with ax, with substitutions x := 1, x := a or x := aa.

(b) (ground+var, ground) xu1 unifies with u2, where u1 may implicitly be the label 1, iff there is a ground substitution θ such that (xu1)θ unifies with u2. The substitution θ is chosen so that (xu1)θ is the largest possible term that unifies with u2 (under ⪯). For example, in RL, ax unifies with ab with substitution x := b, even though other substitutions for x are possible, e.g. x := abb.³ If u1 = 1 this case reduces to x := u2.

(c) (var+ground, var+ground) x1u1 unifies with x2u2 iff there are substitutions θ1 and θ2 for variables x1 and x2 of the form x1 := u3x and x2 := u4x, such that u1u3 unifies with u2u4. Either or both of u1, u2 may implicitly be the label 1. The substitution for x1 is maximal (under ⪯), in the sense that any other possible substitution for x1 has the form x1 := u5x, where u5 ⪯ u3. In LL there is only one possible substitution for x2 of the right form, namely x2 := x ◦ (u1 − u2). In RL there may be several possible substitutions, depending on the number of implicit contraction steps. For example, in RL, aax1 unifies with bx2 with both the substitutions x1 := bx, x2 := ax or x1 := bx, x2 := aax. However, because of the presence of the variable x in the substitution for x2, it is only necessary to use the maximal substitution, which is the first one. The reader can check that the correct results are obtained if u1 = 1 or u2 = 1, namely, respectively, that x1 = x2u2 or x2 = u1x1.

Subsumption can also be applied between literals.

Definition 9. L(w) subsumes L(w′) iff w unifies with w′ with unifier θ and L(w′) is identical to L(w)θ. This definition leads to the following cases.
(d) (ground, ground) L(u1) subsumes L(u2) iff u1 ⪯ u2.

(e) (ground, ground+var) L(u1) does not subsume L(xu2).

(f) (ground+var, ground) L(xu1) subsumes L(u2) iff there is a ground substitution θ for x such that (xu1)θ unifies with u2.

(g) (ground+var, ground+var) L(x1u1) subsumes L(x2u2) iff there is a substitution θ for x1 of the form x1 := x2u3 such that u3u1 unifies with u2. For example, in RL, P(xaa) subsumes P(ay) and P(aby), but it does not subsume P(by).

Literal L subsumes clause C iff L subsumes a literal in C.

Definition 10. Unit clause L(w) resolves with D ∨ ¬L(w′) to give Dθ iff w unifies with w′ with unifier θ. If D is empty and L(w) and ¬L(w′) resolve, then they are called complements of each other. A hyper-resolvent is a clause with no negative literals formed by resolving a clause with one or more positive unit clauses.

³ Recall that in the presence of contraction bb ⪯ b.
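Unification case (a) and the multiset difference operator can be sketched in Python as follows. This is an illustration under stated assumptions and not the AlgU algorithm itself: ground labels are modelled as multisets (collections.Counter), the function names are hypothetical, and for RL the sketch simply enumerates every substitution for x permitted by the reading of ⪯ as "same parameters, with possibly more occurrences on the left".

```python
from collections import Counter
from itertools import product

def unify_a_linear(u1, u2):
    """Case (a) in LL: the unique x with u1 = x◦u2, namely x := u1 − u2, if any."""
    x = u1 - u2                      # multiset difference; every occurrence counts
    return [x] if x + u2 == u1 else []

def unify_a_relevant(u1, u2):
    """Case (a) in RL: every x such that u1 ⪯ x◦u2 when contraction is available."""
    if any(u2[p] > u1[p] for p in u2):
        return []                    # u2 needs more occurrences than u1 supplies
    params = sorted(u1)
    # x◦u2 must contain each parameter p of u1 between 1 and u1[p] times.
    ranges = [range(max(0, 1 - u2[p]), u1[p] - u2[p] + 1) for p in params]
    return [Counter({p: n for p, n in zip(params, counts) if n > 0})
            for counts in product(*ranges)]
```

For the example of case (a), unify_a_relevant(Counter("aaa"), Counter("a")) returns the three substitutions x := 1, x := a and x := aa, with the empty Counter standing for the label 1.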


Brief Overview of AlgMG. AlgMG for the implication fragment operates on sets of clauses, each of which may either be a Horn clause (including unit clauses), or a non-Horn clause of the form ∀x([α]*(c_α) ∨ [α → β]*(x)). There is just one kind of negative unit clause, ¬[α]*(i), derived from the initial goal, where α is the wff to be proved and i = i1 ◦ . . . ◦ in is the label consisting of the parameters i1, . . . , in that verify the formulas from which α is to be proved. AlgMG incorporates the special unification algorithm AlgU, which is used to unify two labels x and z, where x and/or z may contain a variable, implicitly taking into account the properties of A⁺_L or A⁺_R and the different definitions of unifier (cases (a) to (c) above). Notice that the order of parameters in a label does not matter because of the properties (associativity) and (commutativity), so abc would match with bca, for example. By (identity), the parameter 1 is only explicitly needed in the label 1 itself, which is treated as the empty multiset. There are, in fact, only a restricted number of kinds of unification which can arise using AlgMG and these are listed after the available rules have been described.

The initial set of clauses for refuting a formula α is derived from instances of the semantic axioms appropriate for the predicates occurring in the first-order translation of α (called the “appropriate set of clauses for showing α”). There are seven different rules in AlgMG, which can be applied to a finite list of clauses. Five are necessary for the operation of the algorithm and the other two, (Simplify) and (Purity), are useful for the sake of practicality; only (Simplify) is included here. The (Purity) rule serves to remove a clause if it can be detected that it cannot usefully contribute to the derivation. Unit clauses in a list, derived by the (Hyper) or (Split) rule, or given initially, are maintained as a partial model of the initial clauses.
The following rules are available in AlgMG:

End. A list containing an atom and its complement is marked as successfully finished. The only negative unit clause is derived from the initial goal.
Subsumption. Any clause subsumed by a unit clause L is removed.
Simplify. A unit clause [α]*(x) can be used to remove any literal ¬[α]*(w) in a clause since [α]*(x) complements ¬[α]*(w).
Fail. A list in which no more steps are possible is marked as failed and can be used to give a model of the initial clause set.
Hyper. A hyper-resolvent (with respect to AlgU) is formed from a non-unit clause in the list and (positive) unit clauses in the list. Only hyper-resolvents that cannot immediately be subsumed are generated.
Split. If ℒ is a list of clauses containing clause L′ ∨ L″, two new lists [L′ | ℒ⁻] and [L″ | ℒ⁻] are formed, where ℒ⁻ results from removing L′ ∨ L″ from ℒ. The algorithm is then applied to each list.

The possible opportunities for unification that arise in AlgMG are as follows:

1. Unification of a label of the form xu in a positive literal, where x may be missing, with y in a negative literal in a (Hyper) step; the unifier is given as in case (a) or case (c) as appropriate.
2. Unification of a label x in a positive literal with some label w in a (Simplify) step. This always succeeds and w is unchanged. (This is a special case of (b) or (c).)
3. Unification of a label of the form xu1 in a positive literal, where either of x or u1 may be missing, with u2 in the negative literal in an (End) step. This is either the ground case of unification, that is u1 ⪯ u2, or case (b).
4. Unification in a (Hyper) step between a label of the form xu, where either x or u may be missing, with c_α y. This is again either case (a) or (c).

If use of either the (Hyper) or (Simplify) rule yields a label in which there are two variables, they can be replaced by a new variable x.

The (Hyper) rule is the problem rule in AlgMG for the systems LCLDS and RCLDS. Its unrestricted use in a branch can lead to the generation of atoms with labels of increasing length. For example, the clause schema arising from α → α is [α → α]*(x) ∧ A(y) → A(xy), which, if there are atoms of the form [α → α]*(u1) and A(u2), will lead to A(u1u2), A(u1u1u2) and so on, possibly none of them subsumed by earlier atoms. Therefore, without some restriction on its use, development of a branch could continue forever. The LDS tableau system in [1] and the natural deduction system in [2] both exhibited a similar problem, but its solution was not addressed in those papers. In the application to IL, due to the additional property of monotonicity in the labelling algebra, that x ⪯ x ◦ y, labels could be regarded as sets of parameters. Together with the fact that the Herbrand Domain for any particular problem was finite, there was an upper bound on the size of labels generated (i.e. on the number of occurrences of parameters in a label) and hence the number of applications of (Hyper) was finite and termination of the algorithm was assured. In the two systems LCLDS and RCLDS this is not so any more and a more complex bound must be used to guarantee termination. Before introducing these restrictions, an outline logic program for AlgMG is given together with some examples of its operation.

Outline Program for Algorithm AlgMG. The program is given below.
A rudimentary version has been built in Prolog to check very simple examples similar to those in this paper.

    % 0 (start)
    dp(S,F,R) :- dp1([],S,F,R).
    % 1 (fail)
    dp1(M,S,M,false) :- noRulesApply(M,S).
    % 2 (end)
    dp1(M,S,[],true) :- endApplies(S,M).
    % 3 (subsume)
    dp1(M,S,F,R) :- subsumed(C,M,S), remove(C,S,NewS), dp1(M,NewS,F,R).
    % 4 (simplify)
    dp1(M,S,F,R) :- simplify(M,S,NewS), dp1(M,NewS,F,R).
    % 5 (hyper)
    dp1(M,S,F,R) :- hyper(M,S,New), add(New,S,M,NewS,NewM),
                    dp1(NewM,NewS,F,R).
    % 6 (split)
    dp1(M,S,F,R) :- split(M,S,NewS,S1,S2),
                    dp1([S1|M],NewS,F1,R1), dp1([S2|M],NewS,F2,R2),
                    join(F1,F2,F), and(R1,R2,R).

The initial call is the query dp(S, F, R), in which F and R will be variables, and S is a list of clauses appropriate for showing α and derived from a LCLDS or RCLDS. At termination, R will be bound either to true or to false and in the latter case F will be bound to a list of unit clauses. The list F can be used to find a finite model of S. Assume that any subsumed clauses in the initial set of clauses have been removed. This means that in the initial call to dp, S contains neither subsumed clauses nor tautologies, the latter because of the way the clauses are originally formed. This property will be maintained throughout. In dp1 the first argument is the current (recognised) set of positive unit clauses, which is assumed to be empty at the start.⁴ The predicates used in the Prolog version of AlgMG can be interpreted as follows ((S, M) represents the list of all clauses in S and M):

add(New,S,M,NewS,NewM) holds iff the units in New derived from the (Hyper) rule are added to M to form NewM and disjunctions in New are added to S to form NewS.
and(X,Y,Z) holds iff Z = X ∧ Y.
endApplies(S,M) holds iff (End) can be applied to (S, M).
hyper(M,S,New) holds iff New is a set of hyper-resolvents using unit clauses in M and a clause in S, that do not already occur in M. The labels of any new hyper-resolvents are subject to a size restriction (see later), in order that there are not an infinite number of hyper-resolvents.
join(F1,F2,F) holds iff F is the union of F1 and F2.
noRulesApply(M,S) holds iff there are no applicable rules to (M, S).
remove(P,S,NewS) holds iff clause P is removed from S to give NewS.
simplify(M,S,NewS) holds iff clauses in S can be simplified to NewS by units in M.
split(M,S,NewS,S1,S2) holds iff S1 ∨ S2 is removed from S to leave NewS.
subsumed(C,M,S) holds if clause C in S is subsumed by clauses from S or M.

⁴ In case the initial goal is to be shown from some data, in the start clause this initial data would be placed in the first argument of dp1.

Examples. Two examples of refutations appear in Figs. 2 and 3, in which the LL theorem (α → β) → ((β → γ) → (α → γ)) and the RL theorem (α → β) → ((β → γ) → ((α → γ) → ((α → γ) → δ)) → δ) are, respectively, proved. For ease of reading, the parameters used are called a, b, c, . . . instead of having the form c_(α→β), etc. and the predicates A, B and C are used in place of [α]*, [β]* and [γ]*. In Fig. 2, the (translation of the) data α → β is added as a fact and the goal is (the translation of) (β → γ) → (α → γ). In Fig. 3, the initial data α → β, β → γ and (α → γ) → ((α → γ) → δ) are added as facts. The goal in this case is δ. These arrangements simply make the refutations a little shorter than if the initial goal had been the immediate translation of the theorem to be proved.

The calls to dp1 can be arranged into a tree, bifurcation occurring when the (Split) rule is used. In the derivations each line after the list of initial clauses records a derived clause. Derived unit clauses would be added to an accumulating partial model M, which is returned in case of a branch ending in failure. In Fig. 2, for example, there are three branches in the tree of calls to dp1, which all contain lines (1) - (8) implicitly and terminate using the (End) rule. The first branch contains lines (9) - (15), the second contains lines (9), (16) - (18), and the third contains lines (19), (20). Deletions of clauses due to purity and subsumption, and of literals due to simplify are not made, for the sake of simplicity. However, line (17) could have been achieved by a (Simplify) step instead. A possible subsumption step after line (16) is the removal of clauses (7) and (8). Notice that, in Fig. 2 only some of the appropriate axioms have been included. It might be expected that clauses derived from both halves of the appropriate equivalence schemas would be included, resulting in the inclusion of, for instance, P3(x) ∧ A(y) → C(xy). However, it is only necessary to include a restricted number of clauses based on the polarity of the sub-formula occurrences.

Initial clauses:
(1) P0(a)
(2) ¬P1(a)
(3) P2(b) ∨ P1(x)
(4) P3(bx) → P1(x)
(5) P0(x) ∧ A(y) → B(xy)
(6) P2(x) ∧ B(y) → C(xy)
(7) A(c) ∨ P3(x)
(8) C(cx) → P3(x)

Initial translation:
P0(x): [α → β]*(x)
P1(x): [(β → γ) → (α → γ)]*(x)
P2(x): [β → γ]*(x)
P3(x): [α → γ]*(x)

Derivation:
(9) (Split (3)) P2(b)
(10) (Split (7)) A(c)
(11) (Hyper (5)) B(ac)
(12) (Hyper (6)) C(abc)
(13) (Hyper (8)) P3(ab)
(14) (Hyper (4)) P1(a)
(15) (End)
(16) (Split (7)) P3(x)
(17) (Hyper (4)) P1(x)
(18) (End)
(19) (Split (3)) P1(x)
(20) (End)

Fig. 2. Refutation of (α → β) → ((β → γ) → (α → γ)) in LCLDS using AlgMG

Initial clauses:
(1) P0(a)
(2) P1(b)
(3) P2(c)
(4) ¬D(abc)
(5) P0(x) ∧ A(y) → B(xy)
(6) P1(x) ∧ B(y) → C(xy)
(7) P2(x) ∧ P3(y) → P4(xy)
(8) P4(x) ∧ P3(y) → D(xy)
(9) A(d) ∨ P3(x)
(10) C(dx) → P3(x)

Initial translation:
P0(x): [α → β]*(x)
P1(x): [β → γ]*(x)
P2(x): [(α → γ) → ((α → γ) → δ)]*(x)
P3(x): [α → γ]*(x)
P4(x): [(α → γ) → δ]*(x)

Derivation:
(11) (Split (9)) A(d)
(12) (Hyper (5)) B(ad)
(13) (Hyper (6)) C(bad)
(14) (Hyper (10)) P3(ba)
(15) (Hyper (7)) P4(bac)
(16) (Hyper (8)) D(bacba)
(17) (End)
(18) (Split (9)) P3(x)
(19) (Hyper (7)) P4(cx)
(20) (Hyper (8)) D(cx)
(21) (End)

Fig. 3. Refutation in RCLDS using AlgMG
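The first branch of the LL refutation in Fig. 2 can be replayed with labels modelled as multisets. The following Python sketch is only an illustration: it is not the Prolog program outlined above, the names lab and match_linear are hypothetical, and each (Hyper) step is written out by hand instead of being selected by a rule engine.

```python
from collections import Counter

def lab(s):
    """A ground label as a multiset of parameters, e.g. lab('abc')."""
    return Counter(s)

def match_linear(ground_part, atom_label):
    """Bind x so that x◦ground_part equals atom_label (LL matching), else None."""
    x = atom_label - ground_part
    return x if x + ground_part == atom_label else None

# Unit clauses of the first branch, stored as predicate -> label.
model = {"P0": lab("a"), "P2": lab("b"), "A": lab("c")}   # lines (1), (9), (10)
model["B"] = model["P0"] + model["A"]                      # (11) by clause (5): B(ac)
model["C"] = model["P2"] + model["B"]                      # (12) by clause (6): C(abc)
model["P3"] = match_linear(lab("c"), model["C"])           # (13) by clause (8): x := ab
model["P1"] = match_linear(lab("b"), model["P3"])          # (14) by clause (4): x := a
closed = model["P1"] == lab("a")                           # (15) End against ¬P1(a)
```

Because labels are multisets, C(abc) and C(bac) denote the same atom, which is why the order of parameters in the derived labels is immaterial.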

4 Main Results

4.1 Termination of AlgMG

In this section suitable termination criteria are described for the (Hyper) rule of AlgMG for the two logics in question, Linear Logic and Relevance Logic. A different condition is imposed for each of LCLDS and RCLDS and in such a way that termination of a branch without the use of (End) will not cause loss of soundness. That is, AlgMG will terminate a branch without (End) only if the original goal to be checked is not a theorem of LL (or RL). It is assumed that the translation of the initial goal α is initially included in the list S in AlgMG in the form ¬[α]*(1). The termination conditions for the two logics are, at first sight, rather similar; however, the condition for LL uses a global restriction, whereas that for RL uses local restrictions, dependent on the particular development of the AlgMG tree.

When forming the translation of a configuration, clauses corresponding to axiom (Ax3c) for which the same wff α is involved all make use of the same parameter c_α. The number of occurrences of a non-unit parameter c_α for wff α in instances of axiom (Ax3c) is called the relevant index of c_α and is denoted by m_α. For example, in case an occurrence of axiom (Ax3c) is made for the two wffs α → β and α → γ, then the two occurrences would be [α]*(c_α) ∨ [α → β]*(x) and [α]*(c_α) ∨ [α → γ]*(x) and m_α = 2.

Definition 11. Let LCLDS be a propositional Linear LDS based on the languages L_P and L_L, and S be a set of clauses appropriate for showing the wff α. The finite subset of terms in Func(L_P, L_L) that mentions only parameters in S and does not include any non-unit parameter c_α more times than its relevant index m_α is called the restricted Linear Herbrand Universe H_L. The restricted set of ground instances S_{H_L} is the set of ground instances of clauses in S such that every argument is in H_L. The restricted atom set B_{H_L} is the set of atoms using predicates mentioned in S and terms in H_L.

Termination in LCLDS. The criterion to ensure termination in LCLDS is as follows: Let B be a branch of a tree generated by AlgMG; an atom L(w) may be added to B only if it is not subsumed by any other atom in B and has a ground instance L(u), where u ∈ H_L. (Notice that any atom of the form P(ux), where u contains every parameter exactly m_α times, has only one ground instance, P(w), such that w ∈ H_L. This instance occurs when x = 1 and w = u. This atom would therefore only be added to B if not already present.)

The above criterion places an upper bound on the potential size of u such that, at worst, there can be Π(m_αi + 1) atoms for each predicate in any branch, where m_αi are the relevant indices for non-unit parameters c_αi. There is one predicate for each subformula in α, the given formula to be tested. In fact, for LL, it is possibly simpler to use a more restrictive translation, in which a different parameter is introduced for each occurrence of α. Then the relevant index of any non-unit parameter is always 1, and the terms in H_L are restricted to containing any non-unit parameter at most once. The formula for the number of atoms then reduces to 2^n, where n is the number of non-unit parameters introduced by the translation. In practice there are fewer than this maximum due to subsumption.

If AlgMG is started with an initial set of sentences S appropriate for showing α and termination occurs with (End) in all branches, then, as is shown in Sect. 5, α is a theorem of LL. On the other hand, suppose termination of a branch B occurs without using (End), possibly because of the size restriction. Then a model of S_{H_L} can be constructed as follows: Assign true to each atom in B_{H_L} that occurs also in B or that is subsumed by an atom in B, and false to all other atoms in B_{H_L}. For illustration, if the example in Fig. 3 were redone using LCLDS, then the step at line (16) would not have been generated, nor could the branch be extended further; the atoms in it can be used to obtain a finite model of the initial clauses. The following atoms would be assigned true: P0(a), P1(b), P2(c), A(d), B(ad), C(bad), P3(ba), P4(bac) and all other atoms in B_{H_L} would be assigned false. It is easy to check that this is a model of the ground instances of clauses (1) - (10) whose terms all lie in H_L.

Suppose that each clause C in S is modified by the inclusion of a new condition of the form restricted(x), one condition for each variable x in C. The atom restricted(x) is to be interpreted as true exactly if x lies within H_L. It is easy to show that the set of modified clauses is unsatisfiable over D_S iff the set S_{H_L} is unsatisfiable. This property will be exploited when proving the correspondence of LCLDS with LL.

Termination in RCLDS.
In the case of RL, the termination is complicated by the presence of contraction, illustrated in the example in Fig. 3, where the atom D(bacba), derived at line (16), includes the parameter b more than mb times (mb = 1).5 The restriction dictating which atoms to generate by (Hyper) in RCLDS uses the notion of relevant set, which in turn uses the notion of full labels. Unlike the case for LL, there is no easily stated global restriction on labels (such as that indicated by restricted(x)). The criterion described below was inspired by the description given in [16] for the relevant logic LR. Definition 12. Let RCLDS be a propositional relevant LDS based on LP and LL and S be a set of clauses appropriate for showing α. A ground label in LL , that mentions only parameters in S and in which every non-unit parameter a occurs at least ma times, is called full. A ground label in LL , that mentions only parameters in S and is not full, is called small. A parameter a that occurs in a 5

The inclusion of P (b) in the data is due to an implied occurrence of axiom (Ax3c) and there is just one such implicit occurrence.


Krysia Broda

small label, but less than m_a times, belongs to its small part. A parameter a that occurs in a label (either full or small) at least m_a times belongs to its full part. A ground atom having a predicate occurring in S that has a full/small label is also called a full/small atom.

Definition 13. Let RCLDS be a propositional relevant LDS based on LP and LL and S be a set of clauses appropriate for showing α. Suppose that B is a branch derived from the application of AlgMG such that no subsumption steps can be made to B and let P(u1) be a ground atom occurring in B. The relevant set of P(u1) (relative to B) is the set of ground atoms P(u2) such that only parameters occurring in S occur in u2 and either (i) there is at least one non-unit parameter a in P(u1) occurring k times, 0 ≤ k < m_a, that also occurs in P(u2) more than k times, or (ii) there is at least one non-unit parameter a in P(u1) occurring k times, 1 ≤ k, that occurs in P(u2) zero times.

As an example, suppose there are two parameters a and b and that m_a = 2 and m_b = 3. Then the relevant set of P(aab) (= P(a^2 b)) is the set of atoms of one of the forms: P(a^r b^2), P(a^r b^3), P(a^r b^p), where r ≥ 1, p ≥ 4, or P(b^s), P(a^s), where s ≥ 0. The relevant set of the full atom P(a^2 b^3) is the set of atoms of the form P(a^s) or P(b^s), where s ≥ 0. If P(w) is not ground, then the relevant set is the intersection of the relevant set of each ground instance of P(w).

The criterion to ensure termination in RCLDS can now be stated. In RCLDS the (Hyper) rule is restricted so that a ground atom P(w) is only added to a branch B if (i) it is not subsumed by any literal in B and (ii) it belongs to the relevant set of every other P-atom in B.
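The two conditions of Definition 13 can be made concrete with a small membership test. The following is a minimal sketch, not the paper's implementation: it assumes labels are written as strings of single-letter parameters (so "aab" stands for a^2 b), that the order of parameters in a label is irrelevant, and that the relevant indices are supplied as a dictionary. The names `in_relevant_set` and `may_add` are hypothetical, and the subsumption test of part (i) of the termination criterion is not modelled.

```python
from collections import Counter

def in_relevant_set(u1, u2, m):
    """Return True if P(u2) lies in the relevant set of P(u1) (Definition 13).

    u1, u2: ground labels as strings of parameters, e.g. "aab" for a^2 b.
    m: dict mapping each non-unit parameter of S to its relevant index.
    """
    if not set(u2) <= set(m):        # only parameters occurring in S may occur in u2
        return False
    c1, c2 = Counter(u1), Counter(u2)
    for a, m_a in m.items():
        k = c1[a]
        if k < m_a and c2[a] > k:    # condition (i): a non-full parameter is increased
            return True
        if k >= 1 and c2[a] == 0:    # condition (ii): a parameter is reduced to zero
            return True
    return False

def may_add(w, branch_labels, m):
    """Part (ii) of the termination criterion: P(w) must lie in the relevant
    set of every P-atom already in the branch (vacuously true if there is none)."""
    return all(in_relevant_set(z, w, m) for z in branch_labels)
```

With m = {"a": 2, "b": 3}, the call `in_relevant_set("aab", "abb", m)` succeeds because the number of occurrences of b is increased, whereas `in_relevant_set("aab", "aaab", m)` fails because a is already full and b is neither increased nor removed.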
In other words, if P(w) is added to a branch, then for every atom P(z) in the branch, either the number of occurrences of at least one non-unit parameter a in z that occurs fewer than m_a times is increased in w, or some non-unit parameter in z is reduced to zero in w. Notice that, if there are no P-atoms in the branch, then P(w) can be added vacuously according to the criterion. In case the (Hyper) rule generates a non-ground atom, then as long as it is not subsumed and some ground instance of it satisfies property (ii) above it can be added to the branch.

Although relevant sets are (countably) infinite, the requirement that every relevant set include any new literal in a branch is quite strict and very quickly reduces the number of possibilities to a finite number. For instance, a literal P(u) in a branch with a small label u = u1 u2, where u1 is the small part of u, will prevent any other literal P(u′), where the small part of u′ is subsumed by u1, from being added to the branch. For instance, if P(a^4 b^2) belongs to a branch, and m_a = 2, m_b = 3, then no literal of the form P(a^s b^2) or P(a^s b), s ≥ 1, can be added to the branch. If m_a = m_b = 2, then no literal of the form P(a^s b^r) can be added, s ≥ 1, r ≥ 1. For any particular set of initial clauses there are only a finite number of labels that can occur as small parts of labels. This observation means that the maximum number of literals in a branch will be finite. It also

A Decidable CLDS for Some Propositional Resource Logics


allows for the following definition of measure for a branch that decreases with each new atom added to the branch.

Definition 14. Let ⟨LP, LL, A+_R, AlgMG⟩ be a RCLDS and S be a set of clauses appropriate for showing α. The relevant measure of the positive atoms in a branch B derived using AlgMG, with no pairwise subsumption, is defined as the sum, over each predicate P in S, of the number of possible small parts of labels that do not occur in any P-literal in B or in any P-literal subsumed by a literal in B.

It is easy to see that, when a new atom P(w) is added to a branch B by AlgMG, then the relevant measure will decrease. Eventually, either (i) (End) will be applied to B, or (ii) the measure of B will have been reduced to zero, or (iii) no further steps are possible using AlgMG. For example, suppose that branch B includes just the atom P(a^2 b), that there is one predicate P and two parameters a and b each with a relevant index of 2. The relevant measure is 7, since the small parts a^2 b and ab are, respectively, covered by P(a^2 b) and P(ab), subsumed by P(a^2 b). If P(a^2 b^2) is now added then the branch measure is reduced to 5. Also, the literal P(a^2 b) would be subsumed.

In summary, in applying AlgMG, an atom can be added to a branch as long as it respects the following (informal) criterion:

LCLDS: An atom is added to a branch B only if the ground part of its label belongs to HL and if it is not subsumed by any atom in B.

RCLDS: An atom P(w1) is added to a branch B only if it has a ground instance which belongs to some relevant set of every atom in B and if it is not subsumed by any atom in B. In practice, this means that P(w1) is not subsumed, and, for each atom P(w2), it must either increase the number of occurrences of at least one non-full parameter in w2, or it must reduce the number of occurrences of at least one non-unit parameter in w2 to zero.

4.2 Properties of AlgMG.

There are several properties that hold about the relationship between the Semantics given by the Axioms in the Extended Labelling Algebra A+_L and the procedure AlgMG, which are stated in Theorem 1. A proof of these properties can be made in a similar way to that given in [4] for IL. An outline is given here, including in detail the new cases for the two logics LL and RL.

Theorem 1 (Properties of AlgMG). Let S be a LCLDS, α be a propositional LL formula, A+_L(α) be the particular clauses and instances of the Semantic Axioms for showing α and Gα = A+_L(α) ∪ {¬[α]*(1)}. Let AlgMG be initiated by the call dp(Gα, F, R) for variables F and R, then the following properties hold:
1. If AlgMG returns R = true then Gα |=FOL .
2. If AlgMG returns R = false then F is a partial model of Gα, in a way to be explained.
3. AlgMG terminates.


4. If α is also a Hilbert theorem of propositional LL (i.e. α can be derived from the Hilbert Axioms for LL and Modus Ponens), then Gα |=FOL .
5. If Gα |=FOL then α is a theorem of LL.
Similar properties hold for RL.

In AlgMG every step (except (Hyper)) reduces the total number of literals in M ∪ S. However, the number of (Hyper) steps is restricted to a finite number in RL by the use of relevant sets and in LL by the restriction of terms to belong to HL. Exactly the same proof for termination of AlgMG as in [4] can then be used.

Properties (1) and (2) are soundness and completeness results for AlgMG, in the sense that they show that the algorithm is correct with respect to finding refutations. These properties can be proved as in [4], except for the case of clause 1, the case that covers extending the resulting value of F to become a model of the clauses S, which is detailed in the proof of Lemma 1.

Properties (4) and (5) show that AlgMG corresponds with LL, (4) showing it gives a refutation for any theorem of LL, and (5) showing that it only succeeds for theorems. Similarly for RL. Proofs of these properties can be made following the same proof structure as in [4], but with some changes to cope with the different logics. Lemmas 2 and 3 give the details for the two logics considered in this paper.

5 Proving the Properties of AlgMG

Proving Properties 1 and 2. Properties (1) and (2) of AlgMG are proved by showing that the following proposition, called (PROP1and2), holds for each clause of (dp1): if the dp1 conditions of the clause satisfy invariant (INV) and the other conditions are also true, then the dp1 conclusion of the clause satisfies (INV) also, where (INV) is

Either R = false, M ⊆ F and F can be extended to a model of S,
or R = true, F = [ ] and M ∪ S have no Herbrand models.

For the case of LCLDS, when R = false, F is extended to be a model of SHL, the ground instances of S taken over the domain of the initial set of clauses, which are called restricted ground instances in the Lemma below. Note that, for the (End) clause in LCLDS, when R = true, it is the set of restricted ground instances of M ∪ S that has no models. This implies that M ∪ S also has no Herbrand models, for any such model would also be a model of the restricted instances. (It suffices to deal with Herbrand models since non-existence of a Herbrand model of S implies the non-existence of any model of S (see, for example, [7]).)

Lemma 1. The fail clause of dp1 satisfies (PROP1and2).


Proof. The details of the proof are different for each of the two logics. For LL a model of restricted ground instances is found, whereas for RL a Herbrand model is given. R is false; all rules have been applied and F = M. Certainly, M ⊆ F. There are then two cases: for LL and for RL.

Case for Linear Logic. The set F is extended to be a model M0 of the restricted ground instances of the clauses remaining in S as follows: any ground atom with label in HL that is subsumed by a literal in M is true in M0. All other ground atoms with label in HL are false in M0. The clauses left in S can only generate subsumed clauses or disallowed atoms, or they are negative units. Assume that there is a restricted ground instance of a non-negative clause C in S that is false in M0. That is, for some instance C′ of C, its condition literals are true in M0 and its conclusion is false in M0. If the conclusion is a single literal then, as (Hyper) has been applied to C already, the conclusion is either true in M, and hence in M0, or it is subsumed by a clause in M, and again is true in M0. Both contradict the assumptions. If the conclusion is a disjunction, then (Split) must have eventually been applied and the conclusion will again be true in M, or the disjunction is subsumed by a literal in M, contradicting the assumption. In case C = ¬L is a false negative unit clause in S, then some instance C′ = ¬L′ is false, or L′ is true in M0. But in that case (End) would have been applied, a contradiction. The model obtained is a model of the clauses remaining when no more steps are possible in some chosen branch.

Case for Relevant logic. Let the set of atoms formed using predicates in the initial set of clauses S and labels drawn from the Herbrand Domain of S, DS, be called BS. A model M0 of the atoms in BS is assigned, using the atoms in M, by the following assignment conditions:
(i) Any ground atom in BS that is subsumed by an atom in M is true in M0.
(ii) Any ground atom in BS that subsumes an atom L in M by contraction of parameters in the full part of L only, is true in M0.
(iii) All other ground atoms in BS are false in M0.
Assume that there is a ground instance of a non-negative clause C in S that is false in M0. That is, for some instance C′ of C, its condition literals are true in M0 and its conclusion is false in M0. If the conclusion is a single literal then, as (Hyper) has been applied to C already, the conclusion L is either true in M, and hence in M0, or it is subsumed by a clause in M, and again is true in M0, or it is disallowed. Both the first two circumstances contradict the assumption. For the third circumstance, since L is disallowed, there is some literal L′, in M or subsumed by a literal in M, which is subsumed by L by contracting only parameters that occur in the full part of L′. But then by assignment condition (ii) both L and L′ are assigned true, again contradicting the assumption. The remainder of the proof is as given for LCLDS.

An example of a failed refutation in RL is given in Fig. 4, in which there is an attempt to show (α → α) → (α → (α → α)). For this problem there are two parameters a and b with respective relevant indices m_a = 1 and m_b = 2. In the


Initial translation:
  P0(x)   [(α → α) → (α → (α → α))]*(x)
  P1(x)   [α → α]*(x)
  P2(x)   [α → (α → α)]*(x)

Initial clauses:
  (1) ¬P0(1)
  (2) P2(ax) → P0(x)
  (3) P1(a) ∨ P0(x)
  (4) P1(x) ∧ A(y) → A(xy)
  (5) A(b) ∨ P1(x)
  (6) A(bx) → P1(x)
  (7) A(b) ∨ P2(x)
  (8) P1(bx) → P2(x)

Derivation:
  (9)  (Split (3))  P1(a)
  (10) (Split (5))  A(b)
  (11) (Hyper (4))  A(ab)
  (12) (Hyper (6))  P1(1)

Fig. 4. Failed refutation in AR using AlgMG
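The model extracted from the branch (9) - (12) of Fig. 4 can be checked mechanically on a bounded set of ground instances. The sketch below is an illustration under stated assumptions rather than part of AlgMG: a label is represented by the pair (number of a's, number of b's), which presumes that the order of parameters in a label does not matter; the true atoms are taken to be P1(a^k) and A(a^k b) for k ≥ 0, with every P0 and P2 atom false; and the function names are hypothetical. A bounded check of this kind is evidence, not a proof.

```python
from itertools import product

# A label is a pair (na, nb): occurrences of parameters a and b.
UNIT, a, b = (0, 0), (1, 0), (0, 1)

def mul(x, y):
    """Label composition: add up the parameter occurrences."""
    return (x[0] + y[0], x[1] + y[1])

# Assumed truth assignment read off the branch (9)-(12).
def P0(x): return False
def P2(x): return False
def P1(x): return x[1] == 0          # P1(a^k), k >= 0
def A(x):  return x[1] == 1          # A(a^k b), k >= 0

def clauses_hold(x, y):
    """Clauses (1)-(8) of Fig. 4 for one ground instance of x and y."""
    return all([
        not P0(UNIT),                                   # (1)
        (not P2(mul(a, x))) or P0(x),                   # (2)
        P1(a) or P0(x),                                 # (3)
        (not (P1(x) and A(y))) or A(mul(x, y)),         # (4)
        A(b) or P1(x),                                  # (5)
        (not A(mul(b, x))) or P1(x),                    # (6)
        A(b) or P2(x),                                  # (7)
        (not P1(mul(b, x))) or P2(x),                   # (8)
    ])

def check(bound):
    """Check all instances whose labels use at most `bound` copies of each parameter."""
    labels = list(product(range(bound + 1), repeat=2))
    return all(clauses_hold(x, y) for x in labels for y in labels)
```

Calling `check(4)` examines every pair of labels with at most four occurrences of each parameter; the bound is arbitrary and can be raised.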

branch (9) - (12) any further literals generated using (4), such as A(a^2 b), are not allowed as they are not a member of the relevant set of A(ab). The atoms P1(a), P1(1), A(b) and A(ab) are assigned true, as are P1(a^k) and A(a^k b), k ≥ 2. All others are assigned false. Note that atoms of the form A(b^k), k ≥ 2, are not assigned true by assignment condition (ii), based on atom A(b), because neither b nor a occur in the full part, which is just the parameter 1. The reader can check that this is a model for the clauses (1)-(8).

The number of atoms in a branch for each predicate depends on how soon atoms with full parts are derived for that predicate. If, for example, there are two parameters a and b, m_a = 2 and m_b = 3, then if P(aabbb) happened to be derived immediately, no other P atoms with both a and b in the label would be derived. Those with fewer occurrences (at least one of each parameter) would be prevented by subsumption, whereas those with more occurrences would be prevented by the termination restriction (ii). On the other hand, the worst case number of P atoms generated, with at least one occurrence of each parameter in the label, would be 6; for example, the following generation order would require all 6: ab, a^2 b, ab^2, a^2 b^2, ab^3, a^2 b^3.

5.1 Proving Correspondence of LCLDS/RCLDS with LL/RL

In order to show that the refutation system LCLDS presented here does indeed correspond to a standard Hilbert axiom presentation for Linear Logic it is necessary to show that theorems derived within the two systems are the same (Properties 4 and 5 of Theorem 1). Similarly for RCLDS and Relevant Logic. The complete set of axioms used in the implication fragments of LL and RL is shown in Table 2. Axioms (I2), (I3) and (I4) correspond, respectively, to contraction, distributivity and permutation. A useful axiom, (I5), is derivable also from (I3) and (I4) and is included for convenience. All axioms are appropriate for RL, whereas (I2) is omitted for LL. Respectively, Theorems 2 and 3 state that theorems in LL and RL derived from these axioms together with the rule of Modus Ponens (MP) are also theorems of AlgMG, and that theorems of LCLDS and RCLDS are also theorems in the Hilbert System(s).


Table 2. The Hilbert axioms for ICLDS

  (I1)  α → α
  (I2)  (α → (α → β)) → (α → β)
  (I3)  (α → β) → ((γ → α) → (γ → β))
  (I4)  (α → (β → γ)) → (β → (α → γ))
  (I5)  (α → β) → ((β → γ) → (α → γ))
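The derivability of (I5) from (I3) and (I4) can be spelled out in two steps. The following is one possible derivation, given as a sketch; the substitutions are simultaneous and are chosen here for illustration. Contraction (I2) is not used, so the derivation is available in both LL and RL.

```latex
\begin{align*}
&\text{(I3) with } \alpha := \beta,\ \beta := \gamma,\ \gamma := \alpha:
  && (\beta \to \gamma) \to ((\alpha \to \beta) \to (\alpha \to \gamma)) \\
&\text{(I4) with } \alpha := \beta \to \gamma,\ \beta := \alpha \to \beta,\ \gamma := \alpha \to \gamma:
  && \bigl((\beta \to \gamma) \to ((\alpha \to \beta) \to (\alpha \to \gamma))\bigr) \\
& && \quad \to \bigl((\alpha \to \beta) \to ((\beta \to \gamma) \to (\alpha \to \gamma))\bigr) \\
&\text{(MP) on the two lines above:}
  && (\alpha \to \beta) \to ((\beta \to \gamma) \to (\alpha \to \gamma)) \qquad \text{(I5)}
\end{align*}
```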

Correspondence Part I. Property (4) of AlgMG is shown in Theorem 2. An outline proof is given. For RL the appropriate Hilbert axioms are (I1) - (I5); (I2) is omitted for LL.

Theorem 2. Let P be a Hilbert theorem of LL then the union of {¬[P]*(1)} and the appropriate set of instances of the semantic axioms (equivalences) for ¬[P]*(1), PS, has no models in HL. (For RL, PS has no models.)

Proof. (Outline only.) The proof is essentially the same for both logics. Let PS be the set of defining equivalences for P and its subformulas, ∀x[[P]*(x) ↔ R(x)] be the defining equivalence for [P]* and ∀x[[P]*(x) ↔ TP(x)] be the resulting equivalence after replacing every occurrence in R(x) of an atom that has a defining equivalence in PS by the right-hand side of that equivalence. It is shown next that TP(1) is always true and hence that there are no models of PS and ¬[P]*(1). This property of TP(1) is shown by induction on the number of (MP) steps in the Hilbert proof of P.

In case P is an axiom and uses no applications of (MP) in its proof then the property can be seen to hold by construction. For instance, in the case of the contraction axiom (I2), T(I2)(1) is the sentence

  ∀y(∀zv([α]*(z) ∧ [α]*(v) → [β]*(zyv)) → ∀u([α]*(u) → [β]*(uy)))

In the case of LL, the equivalences include also the restricted predicate (shortened to r in the illustration below). For the permutation axiom (I4), T(I4)(1), after some simplification⁶, is the sentence

  ∀y([α]*(y) → ∀v(r(zyv) → ([β]*(v) → [γ]*(zyv)))) → ∀z ∀u([β]*(u) → ∀w(r(zuw) → ([α]*(u) → [γ]*(zuw))))

Let the property hold for all theorems that have Hilbert proofs using < n applications of (MP), and consider a theorem P such that its proof uses n (MP) steps, with the last step being a derivation from P′ and P′ → P. By hypothesis, TP′(1) is true, and TP′→P(1) is true. Hence, since ∀x[TP′→P(x) ↔ ∀u[TP′(u) → TP(ux)]], then TP(1) is also true.
The contrapositive of Theorem 2 allows the conclusion that P is not a theorem to be drawn from the existence of a model for {¬[P]*(1)} ∪ PS as found by a terminating AlgMG.

6 In particular, restricted(xy) implies also restricted(x) and restricted(y).


Correspondence Part II. To show that every formula classified as a theorem by AlgMG in RCLDS or LCLDS is also derivable using the appropriate Hilbert axioms and the rule of Modus Ponens, Theorem 3 is used.

Theorem 3. Let Gα be the set of instances of A+_L for showing α (not including ¬[α]*(1)), then if there exists an AlgMG refutation in LCLDS of Gα ∪ {¬[α]*(1)} then there is a Hilbert proof in LL of α, which is therefore a theorem of LL. That is, if Gα, ¬[α]*(1) |=FOL then ⊢HI α⁷. Similarly for RCLDS and RL.

Proof. Suppose Gα, ¬[α]*(1) |=FOL , hence any model of Gα is also a model of [α]*(1); it is required to show ⊢HI α. Lemma 2 below states there is a model M of A+_L (A+_R), and hence of Gα, with the property that [α]*(1) = true iff ⊢HI α. Therefore, since M is a model of A+_L (A+_R) it is a model of [α]*(1) and hence ⊢HI α is true, as required.

The desired model is based on the canonical interpretation introduced in [1].

Definition 15. The canonical interpretation for LCLDS is an interpretation from Mon(LP, LL) onto the power set of LP defined as follows:
– ||cα|| = {z : ⊢HI α → z}, for each parameter cα;
– ||λ ◦ λ′|| = {z : ⊢HI α ∧ β → z} = {z : ⊢HI α → (β → z)}, where α ∈ ||λ|| and β ∈ ||λ′||;
– ||1|| = {z : ⊢HI z};
– || || = {(||x||, ||y||) : ||x|| ⊆ ||y||};
– ||[α]*|| = {||x|| : α ∈ ||x||}.
Similarly for RCLDS.

For the case of LCLDS an interpretation of the restricted predicate is also needed. This depends on the particular theorem that is to be proven, as it makes use of the relevant indices of the parameters occurring in the translated clauses. The interpretation is given by:

  ||restricted|| = {||x|| : ∀z(z ∈ ||x|| → z is provable using ≤ m_αi occurrences of αi)}

In other words, restricted(x) = true iff x includes ≤ m_αi occurrences of parameter αi. (In case a new parameter is used for each instance of Axiom (Ax3c) then the definition does not depend on the particular theorem to be proven as m_αi = 1 for every c_αi.)

The canonical interpretation is used to give a Herbrand model for A+_L (A+_R), by setting [α]*(x) = true iff α ∈ ||x||. This means, in particular, that if [α]*(1) = true then α ∈ ||1|| and hence ⊢HI α. The following Lemma states that the canonical interpretation of Definition 15 is a model of A+_L (A+_R).

Lemma 2. The properties of the labelling algebra AL (AR) given in Definition 2 and the semantic axioms of A+_L (A+_R) are satisfied by the canonical interpretation for LCLDS (RCLDS).

7 The notation ⊢HI γ indicates that γ is provable using the appropriate Hilbert axioms.


Proof. Each of the properties of the labelling algebra is satisfied by the canonical interpretation. For RCLDS the case for contraction is given here. The other cases are as given in [4]. For LCLDS the case for Axiom (Ax3a) is given. The other cases are as given in [4] but modified to include the restricted predicate.

Contraction. Suppose that δ ∈ ||λ|| ◦ ||λ||. Then there is a Hilbert proof of α → (α → δ), where α ∈ ||λ||. By axiom (I2), ⊢HI α → δ and δ ∈ ||λ||.

(Ax3a). Let the maximum number of parameter occurrences allowed be fixed by the global relevant indices for the particular theorem to be proved. Suppose restricted(x), restricted(y) and restricted(xy) and that α ∈ ||x|| and α → β ∈ ||y||. Then there are Hilbert proofs of δ → α and γ → (α → β) for δ ∈ ||x|| and γ ∈ ||y|| such that no more than the allowed number of subformula occurrences, as given by the relevant indices for the problem, are used in the combined proofs of δ and γ. To show δ → (γ → β), and hence β ∈ ||x ◦ y||, use axioms (I4) and (I5).

6 Conclusions

In this paper the method of Compiled Labelled Deductive Systems, based on the principles in [9], is applied to the two resource logics, LL and RL. The method of CLDS provides logics with a uniform presentation of their derivability relations and semantic entailments and its semantics is given in terms of a translation approach into ﬁrst-order logic. The main features of a CLDS system and model theoretic semantics are described here. The notion of a conﬁguration in a CLDS system generalises the standard notion of a theory and the notion of semantic entailment is generalised to relations between structured theories. The method is used to give presentations of LCLDS and RCLDS , which are seen to be generalisations, respectively, of Linear and Relevance Logic through the correspondence results in Sect. 5, which shows that there is a one-way translation of standard theories into conﬁgurations, while preserving the theorems of LL and RL. The translation results in a compiled theory of a conﬁguration. A refutation system based on a Model Generation procedure is deﬁned for this theory, which, together with a particular uniﬁcation algorithm and an appropriate restriction on the size of terms, yields a decidability test for formulas of propositional Linear Logic or Relevance Logic. The main contribution of this paper is to show how the translation approach into ﬁrst order logic for Labelled Deductive Systems can still yield decidable theories. This meets one of the main criticisms levelled at LDS, and at CLDS in particular, that for decidable logics the CLDS representation is not decidable. The method used in this paper can be extended to include all operators of Linear Logic, including the additive and exponential operators. 
For instance, the axiom for the additive disjunction operator ∨ in LL is ∀x([α ∨ β]∗ (x) ↔ ∀y(([α → γ]∗ (y) ∧ [β → γ]∗ (y)) → [γ]∗ (x ◦ y))) From an applicative point of view, the CLDS approach provides a logic with reasoning which is closer to the needs of computing and A.I. These are in fact


application areas with an increasing demand for logical systems able to represent and to reason about structures of information (see [9]). For example, in [3] it is shown how a CLDS can provide a flexible framework for abduction. From the automated theorem proving point of view, the translation method described in Section 2.2 facilitates the use of first-order theorem provers for deriving theorems of the underlying logic. In fact, the first-order axioms of a CLDS extended algebra A+_S can be translated into clausal form, and so any clausal theorem proving method might be appropriate for using the axioms to automate the process of proving theorems. The clauses resulting from the translation of a particular configuration represent a partial coding of the data. A resolution refutation that simulates the application of natural deduction rules could be developed, but because of the simple structure of the clauses resulting from a substructural CLDS theory the extended Model Generation method used here is appropriate.

References

1. M. D'Agostino and D. Gabbay. A generalisation of analytic deduction via labelled deductive systems. Part I: Basic substructural Logics. Journal of Automated Reasoning, 13:243-281, 1994.
2. K. Broda, M. Finger and A. Russo. Labelled Natural Deduction for Substructural Logics. Logic Journal of the IGPL, Vol. 7, No. 3, May 1999.
3. K. Broda and D. Gabbay. An Abductive CLDS. In Labelled Deduction, Kluwer, Ed. D. Basin et al, 1999.
4. K. Broda and D. Gabbay. A CLDS for Propositional Intuitionistic Logic. TABLEAUX-99, USA, LNAI 1617, Ed. N. Murray, 1999.
5. K. Broda and A. Russo. A Unified Compilation Style Labelled Deductive System for Modal and Substructural Logic using Natural Deduction. Technical Report 10/97. Department of Computing, Imperial College, 1997.
6. K. Broda, A. Russo and D. Gabbay. A Unified Compilation Style Natural Deduction System for Modal, Substructural and Fuzzy logics. In Discovering World with Fuzzy logic: Perspectives and Approaches to Formalization of Human-consistent Logical Systems. Eds V. Novak and I. Perfileva, Springer-Verlag, 2000.
7. A. Bundy. The Computer Modelling of Mathematical Reasoning. Academic Press, 1983.
8. C. L. Chang and R. Lee. Symbolic Logic and Mechanical Theorem Proving. Academic Press, 1973.
9. D. Gabbay. Labelled Deductive Systems, Volume I - Foundations. OUP, 1996.
10. J. H. Gallier. Logic for Computer Science. Harper and Row, 1986.
11. R. Hasegawa, H. Fujita and M. Koshimura. MGTP: A Model Generation Theorem Prover - Its Advanced Features and Applications. In TABLEAUX-97, France, LNAI 1229, Ed. D. Galmiche, 1997.
12. W. McCune. Otter 3.0 Reference Manual and Guide. Argonne National Laboratory, Argonne, Illinois, 1994.
13. J. A. Robinson. Logic, Form and Function. Edinburgh Press, 1979.
14. A. Russo. Modal Logics as Labelled Deductive Systems. PhD Thesis, Department of Computing, Imperial College, 1996.
15. R. A. Schmidt. Resolution is a decision procedure for many propositional modal logics. Advances in Modal Logic, Vol. 1, CSLI, 1998.
16. P. B. Thistlethwaite, M. A. McRobbie and R. K. Meyer. Automated Theorem-Proving in Non-Classical Logics. Wiley, 1988.

A Critique of Proof Planning

Alan Bundy

Division of Informatics, University of Edinburgh

Abstract. Proof planning is an approach to the automation of theorem proving in which search is conducted, not at the object-level, but among a set of proof methods. This approach dramatically reduces the amount of search but at the cost of completeness. We critically examine proof planning, identifying both its strengths and weaknesses. We use this analysis to explore ways of enhancing proof planning to overcome its current weaknesses.

Preamble

This paper consists of two parts:
1. a brief 'bluffer's guide' to proof planning¹; and
2. a critique of proof planning organised as a 4x3 array.
Those already familiar with proof planning may want to skip straight to the critique which starts at §2, p164.

1 Background

Proof planning is a technique for guiding the search for a proof in automated theorem proving, [Bundy, 1988, Bundy, 1991, Kerber, 1998, Benzmüller et al, 1997]. The main idea is to identify common patterns of reasoning in families of similar proofs, to represent them in a computational fashion and to use them to guide the search for a proof of conjectures from the same family. For instance, proofs by mathematical induction share the common pattern depicted in figure 1. This common pattern has been represented in the proof planners Clam and λClam and used to guide a wide variety of inductive proofs [Bundy et al, 1990b, Bundy et al, 1991, Richardson et al, 1998].

1 The research reported in this paper was supported by EPSRC grant GR/M/45030. I would like to thank Andrew Ireland, Helen Lowe, Raul Monroy and two anonymous referees for helpful comments on this paper. I would also like to thank other members of the Mathematical Reasoning Group and the audiences at CIAO and Scottish Theorem Provers for helpful feedback on talks from which this paper arose. Pointers to more detail can be found at http://dream.dai.ed.ac.uk/projects/proof planning.html

A.C. Kakas, F. Sadri (Eds.): Computat. Logic (Kowalski Festschrift), LNAI 2408, pp. 160–177, 2002. © Springer-Verlag Berlin Heidelberg 2002

[Diagram: induction branches into a base case and a step case; the step case proceeds by ripple and then fertilize.]

Inductive proofs start with the application of an induction rule, which reduces the conjecture to some base and step cases. One of each is shown above. In the step case rippling reduces the diﬀerence between the induction conclusion and the induction hypothesis (see §1.2, p162 for more detail). Fertilization applies the induction hypothesis to simplify the rippled induction conclusion.

Fig. 1. ind strat: A Strategy for Inductive Proof

1.1 Proof Plans and Critics

The common patterns of reasoning are represented using tactics: computer programs which control proof search by applying rules of inference [Gordon et al, 1979]. These tactics are specified by methods. These methods give both the preconditions under which the tactics are applicable and the effects of their successful application. Meta-level reasoning is used to combine the tactics into a customised proof plan for the current conjecture. This meta-level reasoning matches the preconditions of later tactics to the effects of earlier ones. Examples of such customised proof plans are given in figure 2. Proof planning has been extended to capture common causes of proof failure and ways to patch them [Ireland, 1992, Ireland & Bundy, 1996b]. With each proof method are associated some proof critics. Critics have a similar format to methods, but their preconditions specify situations in which the method's associated tactic will fail and instead of tactics they have instructions on patching a failed proof. Each of the critics associated with a method has a different precondition. These are used to decide on an appropriate patch. Most of the critics built to date have been associated with the ripple method, or rather with its principal sub-method, wave, which applies one ripple step (see §1.2, p162). Among the

Associativity of +
  x + (y + z) = (x + y) + z
  ind strat( x + 1 ↑ , x)

Commutativity of +
  x + y = y + x
  ind strat( x + 1 ↑ , x) then
    [ ind strat( y + 1 ↑ , y),
      ind strat( y + 1 ↑ , y) ]

The associativity of + is an especially simple theorem, which can be proved with a single application of ind strat from ﬁgure 1, using a one step induction rule on induction variable x. The commutativity of + is a bit more complicated. ind strat is ﬁrst applied using induction variable x then in both the base and step cases there is a nested application of ind strat using y. The ﬁrst argument of ind strat indexes the induction rule using the rippling concept of wave-fronts (see §1.2, p162). The second argument speciﬁes the induction variable.

Fig. 2. Special-Purpose Proof Plans

patches these critics suggest are: a generalisation of the current conjecture, the use of an intermediate lemma, a case split and using an alternative induction rule. The use of a critic to generalise a conjecture is illustrated in figure 8.

Proof planning has been tested successfully on a wide range of inductive and other theorems. These include conjectures arising from formal methods, i.e. from the verification, synthesis and transformation of both software and hardware. They include, for instance: the transformation of naive into tail recursive programs [Hesketh et al, 1992], the verification of a microprocessor, [Cantu et al, 1996], the synthesis of logic programs [Kraan et al, 1996], decision procedures [Armando et al, 1996] and the rippling tactic [Gallagher, 1993], resolution completeness proofs [Kerber & Sehn, 1997], proofs of limit theorems [Melis, 1998] and diagonalization proofs [Huang et al, 1995, Gow, 1997]. Critics are especially useful at coming up with so-called 'eureka' steps, i.e. those proof steps that usually seem to require human intervention, for instance constructing appropriate induction rules, intermediate lemmas and generalisations [Lowe et al, 1998] and loop invariants [Ireland & Stark, 1997]. Proof planning has also been applied outwith mathematics to the computer games of bridge [Frank et al, 1992] and Go [Willmott et al, 1999] and also to problems of configuring systems from parts, [Lowe, 1991, Lowe et al, 1996].

1.2 Rippling

Rippling is the key method in proof plans for inductive proof. Not only does it guide the manipulation of the induction conclusion to prepare it for the application of the induction hypothesis, but preparation for rippling suggests an appropriate induction rule and variable, and different patterns of rippling failure suggest new lemmas and generalisations. Since it is also cited several times in the critique, we have included a brief introduction to rippling here.

Rippling is useful whenever there is a goal to be proved in the context of one or more ‘givens’. Givens may be axioms, previously proved theorems, assumptions or hypotheses. It works by calculating the difference between the goal and the given(s) and then systematically reducing it. The similarities and differences between the goal and given(s) are marked with meta-level annotations. These annotations are shown graphically in figure 5, where the notation of rippling is explained. An example of rippling is given in figure 6.

rev(nil) = nil
rev(H :: T) = rev(T) <> (H :: nil)
qrev(nil, L) = L
qrev(H :: T, L) = qrev(T, H :: L)

rev and qrev are alternative recursive functions for reversing a list. Each is defined by a one-step list recursion using a base and step case. :: is an infix list cons and <> an infix list append. rev is a naive reverse function and qrev a more efficient, tail-recursive function. The second argument of qrev is called an accumulator. This accumulator should be set to nil when qrev is first applied to reverse a list. Figure 4 states two theorems that relate these two functions.

Fig. 3. Recursive Deﬁnitions of Two Reverse Functions

∀k. rev(k) = qrev(k, nil)    (1)

∀k, l. rev(k) <> l = qrev(k, l)    (2)

Theorem (1) shows that rev and qrev output the same result from the same input when the accumulator of qrev is initialised to nil. Theorem (2) generalises theorem (1) for all values of this accumulator. Paradoxically, the more specialised theorem (1) is harder to prove. One way to prove it is ﬁrst to generalise it to theorem (2).

Fig. 4. Two Theorems about List Reversing Functions
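The definitions in figure 3 and the two theorems in figure 4 can be tried out directly. The following is a minimal sketch, assuming Python lists stand in for the object-level lists, `+` for list append and `[]` for nil; the helper `check_theorems` is a hypothetical name introduced here, and it only tests the theorems on sample inputs rather than proving them.

```python
def rev(k):
    """Naive reverse: rev(nil) = nil; rev(H :: T) = rev(T) <> (H :: nil)."""
    if not k:
        return []
    h, t = k[0], k[1:]
    return rev(t) + [h]


def qrev(k, acc):
    """Accumulator reverse: qrev(nil, L) = L; qrev(H :: T, L) = qrev(T, H :: L)."""
    if not k:
        return acc
    h, t = k[0], k[1:]
    return qrev(t, [h] + acc)


def check_theorems(samples):
    """Test theorems (1) and (2) on every pair of sample lists (hypothetical helper)."""
    for k in samples:
        assert rev(k) == qrev(k, [])            # theorem (1)
        for l in samples:
            assert rev(k) + l == qrev(k, l)     # theorem (2)
    return True
```

Calling `check_theorems([[], [1], [1, 2], [3, 1, 2]])` exercises both theorems; theorem (1) is simply the instance of theorem (2) in which the accumulator is the empty list.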


Given:
\[ \mathit{rev}(t) \mathbin{<>} L = \mathit{qrev}(t, L) \]

Goal:
\[ \mathit{rev}(\boxed{h :: \underline{t}}^{\uparrow}) \mathbin{<>} \lfloor l \rfloor = \mathit{qrev}(\boxed{h :: \underline{t}}^{\uparrow}, \lfloor l \rfloor) \]

Wave-Rules:
\[ \mathit{rev}(\boxed{H :: \underline{T}}^{\uparrow}) \Rightarrow \boxed{\underline{\mathit{rev}(T)} \mathbin{<>} H :: \mathit{nil}}^{\uparrow} \tag{3} \]
\[ \mathit{qrev}(\boxed{H :: \underline{T}}^{\uparrow}, L) \Rightarrow \mathit{qrev}(T, \boxed{H :: \underline{L}}^{\downarrow}) \tag{4} \]
\[ (\boxed{\underline{X} \mathbin{<>} Y}^{\uparrow}) \mathbin{<>} Z \Rightarrow X \mathbin{<>} (\boxed{Y \mathbin{<>} \underline{Z}}^{\downarrow}) \tag{5} \]

The example is drawn from the inductive proof of theorem (2) in figure 4. The given and the goal are the induction hypothesis and induction conclusion, respectively, of this theorem. Wave-rules (3) and (4) are annotated versions of the step cases of the recursive definitions of the two list reversing functions in figure 3. Wave-rule (5) is from the associativity of <>. The boxes are called wave-fronts and the underlined holes in them are called wave-holes. The wave-fronts in the goal indicate those places where the goal differs from the given. Those in the wave-rules indicate the differences between the left and right hand sides of the rules. The arrows on the wave-fronts indicate the direction in which rippling will move them: either outwards (↑) or inwards (↓). The corners, ⌊ . . . ⌋, around the l in the goal indicate a sink. A sink is one of rippling’s target locations for wave-fronts; the other target is to surround an instance of the whole given with a wave-front. The wave-rules are used to rewrite each side of the goal. The effect is to move the wave-fronts either to surround an instance of the given or to be absorbed into a sink. An example of this process is given in figure 6.

Fig. 5. The Notation of Rippling
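The way wave-rules rewrite the goal until it can be fertilized with the given can be sketched in code. This is only an illustration under stated assumptions: terms are nested tuples, upper-case atoms are meta-variables, the wave annotations are not represented at all, and the single check made by `ripple_in` (the wave-hole of a new inward wave-front must contain a sink) stands in for rippling's preconditions. All function and constant names are hypothetical and are not taken from Clam or any other proof planner.

```python
# Terms are nested tuples ("functor", arg, ...); atoms are strings.
# Upper-case atoms such as "L" are meta-variables; lower-case atoms are constants.

def cons(h, t):
    return ("cons", h, t)

def app(x, y):
    return ("app", x, y)

def rev(x):
    return ("rev", x)

def qrev(x, acc):
    return ("qrev", x, acc)

def match(pattern, term, binds):
    """First-order matching of pattern against term; returns bindings or None."""
    if isinstance(pattern, str) and pattern.isupper():      # meta-variable
        if pattern in binds:
            return binds if binds[pattern] == term else None
        return {**binds, pattern: term}
    if isinstance(pattern, str) or isinstance(term, str):
        return binds if pattern == term else None
    if pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        binds = match(p, t, binds)
        if binds is None:
            return None
    return binds

def subst(term, binds):
    """Replace bound meta-variables in term."""
    if isinstance(term, str):
        return binds.get(term, term)
    return (term[0],) + tuple(subst(a, binds) for a in term[1:])

def rewrite(term, rule):
    """Apply rule at the first matching subterm (outermost, leftmost); None if none."""
    lhs, rhs = rule
    binds = match(lhs, term, {})
    if binds is not None:
        return subst(rhs, binds)
    if isinstance(term, str):
        return None
    for i, arg in enumerate(term[1:], start=1):
        new = rewrite(arg, rule)
        if new is not None:
            return term[:i] + (new,) + term[i + 1:]
    return None

def mentions(term, names):
    """True if some atom of term is one of names."""
    if isinstance(term, str):
        return term in names
    return any(mentions(a, names) for a in term[1:])

def ripple_in(term, rule, hole, sinks):
    """Apply an inward rule at the root only if its wave-hole contains a sink."""
    lhs, rhs = rule
    binds = match(lhs, term, {})
    if binds is None or not mentions(binds[hole], sinks):
        return None                                          # ripple is blocked
    return subst(rhs, binds)

RULE3 = (rev(cons("H", "T")), app(rev("T"), cons("H", "nil")))
RULE4 = (qrev(cons("H", "T"), "L"), qrev("T", cons("H", "L")))
RULE5 = (app(app("X", "Y"), "Z"), app("X", app("Y", "Z")))
APP_STEP = (app(cons("H", "T"), "L"), cons("H", app("T", "L")))  # definition of <>
APP_BASE = (app("nil", "L"), "L")

SINKS = {"l"}
lhs = rewrite(app(rev(cons("h", "t")), "l"), RULE3)      # ripple out with (3)
lhs = ripple_in(lhs, RULE5, "Z", SINKS)                  # ripple sideways with (5)
lhs = rewrite(rewrite(lhs, APP_STEP), APP_BASE)          # simplify inside the sink
rhs = ripple_in(qrev(cons("h", "t"), "l"), RULE4, "L", SINKS)  # sideways with (4)

# Fertilization: match both sides of the given against the rippled goal.
binds = match(app(rev("t"), "L"), lhs, {})
binds = match(qrev("t", "L"), rhs, binds)

# With nil in place of the sink there is nowhere to absorb the wave-front.
blocked = ripple_in(qrev(cons("h", "t"), "nil"), RULE4, "L", SINKS)
```

Under these assumptions both sides of the goal end with h :: l in the sink position, so matching the given binds L to h :: l. The final call returns `None` because the accumulator nil contains no sink, which corresponds to the blocked ripple of figure 7.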

2 Critique

Our critique of proof planning is organised along two dimensions. On the first dimension we consider four different aspects of proof planning: (1) its potential for the advance formation of plans, (2) its theorem proving power, (3) its support for interaction and (4) its methodology. On the second dimension, for each aspect of the first dimension we present: (a) the original dream, (b) the reality of current implementations and (c) the options available for overcoming obstacles and realising part of that original dream.

2.1 The Advance Formation of Plans

Given:
\[ \mathit{rev}(t) \mathbin{<>} L = \mathit{qrev}(t, L) \]

Goal:
\[ \mathit{rev}(\boxed{h :: \underline{t}}^{\uparrow}) \mathbin{<>} \lfloor l \rfloor = \mathit{qrev}(\boxed{h :: \underline{t}}^{\uparrow}, \lfloor l \rfloor) \]
\[ (\boxed{\underline{\mathit{rev}(t)} \mathbin{<>} h :: \mathit{nil}}^{\uparrow}) \mathbin{<>} \lfloor l \rfloor = \mathit{qrev}(t, \lfloor h :: l \rfloor) \]
\[ \mathit{rev}(t) \mathbin{<>} \lfloor (h :: \mathit{nil}) \mathbin{<>} l \rfloor = \mathit{qrev}(t, \lfloor h :: l \rfloor) \]
\[ \mathit{rev}(t) \mathbin{<>} \lfloor h :: l \rfloor = \mathit{qrev}(t, \lfloor h :: l \rfloor) \]

The example comes from the step case of the inductive proof of theorem (2) from figure 4. Note that the induction variable k becomes the constant t in the given and the wave-front \(\boxed{h :: \underline{t}}^{\uparrow}\) in the goal. However, the other universal variable, l, becomes a first-order meta-variable, L, in the given, but a sink, ⌊l⌋, in the goal. We use uppercase to indicate meta-variables and lowercase for object-level variables and constants. The left-hand wave-front is rippled-out using wave-rule (3) from figure 5, but then rippled-sideways using wave-rule (5), where it is absorbed into the left-hand sink. The right-hand wave-front is rippled-sideways using wave-rule (4) and absorbed into the right-hand sink. After the left-hand sink is simplified, using the recursive definition of <>, the contents of the two sinks are identical and the goal can be fertilized with the given, completing the proof. Note that fertilization unifies the meta-variable L with the sink h :: l. Note that there is no point in rippling sideways unless this absorbs wave-fronts into sinks. Sinks mark the potential to unify wave-fronts with meta-variables during fertilization. Without sinks to absorb the wave-fronts, fertilization will fail. Such a failure is illustrated in figure 7.

Fig. 6. An Example of Rippling

The Dream: In the original proposal for proof planning [Bundy, 1988] it was envisaged that the formation of a proof plan for a conjecture would precede its use to guide the search for a proof. Meta-level reasoning would be used to join general proof plans together by matching the preconditions of later ones to the effects of earlier ones. A tactic would then be extracted from the customised proof plan thus constructed. A complete proof plan would be sent to a tactic-based theorem prover where it would be unpacked into a formal proof with little or no search.

The Reality: Unfortunately, in practice, this dream proved impossible to realise. The problem is due to the frequent impossibility of checking the preconditions of methods against purely abstract formulae. For instance, the preconditions of rippling include checking for the presence of wave-fronts in the current goal formula, that a wave-rule matches a sub-expression of this goal and that any new inwards wave-fronts have a wave-hole containing a sink. These preconditions cannot be checked unless the structure of the goal is known in some detail. To know this structure requires anticipating the effects of the previous methods in the current plan. The simplest way to implement this is to apply each of the tactics of the previous methods in order.


Similar arguments hold for most of the other proof methods used by proof planners. This is especially true in applications to game playing where the different counter-actions of the opposing players must be explored before a response can be planned [Willmott et al, 1999]. So the reality is an interleaving of proof planning and proof execution. Moreover, the proof is planned in a consecutive fashion, i.e. the proof steps are developed starting at one end of the proof then proceeding in order. At any stage of the planning process only an initial or final segment of the object-level proof is known.

The Options: One response to this reality is to admit defeat, abandon proof planning and instead recycle the preconditions of proof methods as preconditions for the application of tactics. Search can then be conducted in a space of condition/action production rules in which the conditions are the method preconditions and the actions are the corresponding tactics. Satisfaction of a precondition will cause the tactic to be applied, thus realising the preconditions of subsequent tactics. Essentially, this strategy was implemented by Horn in the Oyster2 system [Horn, 1992]. The experimental results were comparable to earlier versions of Clam, i.e. if tactics are applied as soon as they are found to be applicable then proof planning conveys no advantage over Horn’s production rule approach.

However, in subsequent developments some limited abstraction has been introduced into proof planning, in particular, the use of (usually second-order) meta-variables. In many cases the method preconditions can be checked on such partially abstract formulae. This allows choices in early stages of the proof to be delayed then made subsequently, e.g. as a side effect of unification of the meta-variables. We call this middle-out reasoning because it permits the non-consecutive development of a proof, i.e. instead of having to develop a proof from the top down or the bottom up we can start in the middle and work outwards. Middle-out reasoning can significantly reduce search by postponing a choice with a high branching factor until the correct branch can be determined. Figure 8 provides an example of middle-out reasoning.

Among the choices that can be successfully delayed in this way are: the witness of an existential variable, the induction rule [Bundy et al, 1990a], an intermediate lemma and generalisation of a goal [Ireland & Bundy, 1996b, Ireland & Bundy, 1996a]. Each of these has a high branching factor, infinite in some cases. A single abstract branch containing meta-variables can simultaneously represent all the alternative branches. Incremental instantiation of the meta-variables as a side effect of subsequent proof steps will implicitly exclude some of these branches until only one remains. Even though the higher-order² unification required to whittle down these choices is computationally expensive, the cost is far less than the separate exploration of each branch. Moreover, the wave annotation can be exploited to control higher-order unification by requiring wave-fronts to unify with wave-fronts and wave-holes to unify with wave-holes.

² Only second-order unification is required for the examples tackled so far, but higher-order unification is required in the general case.
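The production-rule regime described under The Options, in which a satisfied method precondition immediately fires the corresponding tactic, can be sketched as a simple control loop. This is a toy illustration and not a description of the Oyster2 implementation: the goal is an integer standing in for the amount of remaining work, and the `Rule` class, the `run_rules` function and the two sample rules are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Rule:
    name: str
    precondition: Callable[[int], bool]   # stands in for a method precondition
    tactic: Callable[[int], int]          # stands in for the corresponding tactic


def run_rules(rules: List[Rule], goal: int,
              proved: Callable[[int], bool],
              max_steps: int = 100) -> Optional[List[str]]:
    """Fire the first applicable rule until the goal is proved.

    Returns the names of the rules fired, or None if no rule applies
    or the step limit is reached before the goal is proved.
    """
    trace: List[str] = []
    for _ in range(max_steps):
        if proved(goal):
            return trace
        for rule in rules:
            if rule.precondition(goal):
                goal = rule.tactic(goal)   # applying the tactic changes the goal
                trace.append(rule.name)
                break
        else:
            return None                    # no precondition holds: stuck
    return trace if proved(goal) else None


# Toy rules over an integer goal (assumption: 0 means nothing is left to prove).
RULES = [
    Rule("halve", lambda g: g > 0 and g % 2 == 0, lambda g: g // 2),
    Rule("decrement", lambda g: g % 2 == 1, lambda g: g - 1),
]
```

For example, `run_rules(RULES, 6, lambda g: g == 0)` fires a rule as soon as its precondition holds and never looks ahead. This is the sense in which such a loop conveys no planning advantage: each choice is committed immediately, whereas middle-out reasoning postpones highly branching choices.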


Given:
\[ \mathit{rev}(t) = \mathit{qrev}(t, \mathit{nil}) \]

Goal:
\[ \mathit{rev}(\boxed{h :: \underline{t}}^{\uparrow}) = \mathit{qrev}(\boxed{h :: \underline{t}}^{\uparrow}, \mathit{nil}) \]
\[ \boxed{\underline{\mathit{rev}(t)} \mathbin{<>} h :: \mathit{nil}}^{\uparrow} = \mathit{qrev}(\boxed{h :: \underline{t}}^{\uparrow}, \mathit{nil}) \]
blocked

The example comes from the failed step case of the inductive proof of theorem (1) from figure 4. A particular kind of ripple failure is illustrated. The left-hand wave-front can be rippled-out using wave-rule (3) and is then completely rippled. However, the right-hand wave-front cannot be rippled-sideways even though wave-rule (4) matches it. This is because there is no sink to absorb the resulting inwards directed wave-front. If the wave-rule were nevertheless applied then any subsequent fertilization attempt would fail. Figure 8 shows how to patch the proof by a generalisation that aims to introduce a sink into the appropriate place in the theorem and thus allow the ripple to succeed.

Fig. 7. A Failed Ripple

We have exploited this middle-out technique to especially good effect in our use of critics [Ireland & Bundy, 1996b]. Constraints have also been used as a least commitment mechanism in the Ωmega proof planner [Benzmüller et al, 1997]. Suppose a proof requires an object with certain properties. The existence of such an object can be assumed and the properties posted as constraints. Such constraints can be propagated as the proof develops and their satisfaction interleaved with that proof in an opportunistic way [Melis et al, 2000b, Melis et al, 2000a]. Middle-out reasoning recovers a small part of the original dream of advance proof planning and provides some significant search control advantage over the mere use of method preconditions in tactic-based production rules.

2.2 The Theorem Proving Power of Proof Planning

The Dream: One of the main aims of proof planning was to enable automatic theorem provers to prove much harder theorems than conventional theorem provers were capable of. The argument was that the meta-level planning search space was considerably smaller than the object-level proof search space. This reduction was partly due to the fact that proof methods only capture common patterns of reasoning, excluding many unsuccessful parts of the space. It was also because the higher-level methods, e.g. ind_strat, each cover many object-level proof steps. Moreover, the use of abstraction devices, like meta-variables, enables more than one proof branch to be explored simultaneously. Such search space reductions should bring much harder proofs into the scope of exhaustive search techniques.


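Schematic Conjecture:
\[ \forall k, l.\ F(\mathit{rev}(k), l) = \mathit{qrev}(k, G(l)) \]

Given:
\[ F(\mathit{rev}(t), L) = \mathit{qrev}(t, G(L)) \]

Goal:
\[ F(\mathit{rev}(\boxed{h :: \underline{t}}^{\uparrow}), \lfloor l \rfloor) = \mathit{qrev}(\boxed{h :: \underline{t}}^{\uparrow}, G(\lfloor l \rfloor)) \]
\[ F(\boxed{\underline{\mathit{rev}(t)} \mathbin{<>} h :: \mathit{nil}}^{\uparrow}, \lfloor l \rfloor) = \mathit{qrev}(t, \boxed{h :: \underline{G(\lfloor l \rfloor)}}^{\downarrow}) \]
\[ \mathit{rev}(t) \mathbin{<>} (\boxed{h :: \mathit{nil} \mathbin{<>} \underline{F'(\boxed{\underline{\mathit{rev}(t)} \mathbin{<>} h :: \mathit{nil}}^{\uparrow}, \lfloor l \rfloor)}}^{\downarrow}) = \mathit{qrev}(t, \boxed{h :: \underline{G(\lfloor l \rfloor)}}^{\downarrow}) \]
\[ \mathit{rev}(t) \mathbin{<>} (\boxed{h :: \underline{F'(\boxed{\underline{\mathit{rev}(t)} \mathbin{<>} h :: \mathit{nil}}^{\uparrow}, \lfloor l \rfloor)}}^{\downarrow}) = \mathit{qrev}(t, \boxed{h :: \underline{G(\lfloor l \rfloor)}}^{\downarrow}) \]
\[ \mathit{rev}(t) \mathbin{<>} (h :: l) = \mathit{qrev}(t, h :: l) \]

Meta-Variable Bindings:
\[ \lambda u, v.\ u \mathbin{<>} F'(u, v) \,/\, F \qquad \lambda u, v.\ v \,/\, F' \qquad \lambda u.\ u \,/\, G \]

Generalised Conjecture:
\[ \forall k, l.\ \mathit{rev}(k) \mathbin{<>} l = \mathit{qrev}(k, l) \]

The example shows how the failed proof attempt in figure 7 can be analysed using a critic and patched in order to get a successful proof. The patch generalises the theorem to be proved by introducing an additional universal variable and hence a sink. Middle-out reasoning is used to delay determining the exact form of the generalisation. This form is determined later as a side effect of higher-order unification during rippling. First a schematic conjecture is introduced. A new universal variable l is introduced, in the right-hand side, at the point where a sink was required in the failed proof in figure 7. Since we are not sure exactly how l relates to the rest of the right-hand side, a second-order meta-variable G is wrapped around it. On the left-hand side a balancing occurrence of l is introduced using the meta-variable F. Note that l becomes a first-order meta-variable L in the given, but a sink ⌊l⌋ in the goal. Induction on k, rippling, simplification and fertilization are now applied, but higher-order unification is used to instantiate F and G. If the schematic conjecture is now instantiated we see that the generalised conjecture is, in fact, theorem (2) from figure 4.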

Fig. 8. Patching a Failed Proof using Middle-Out Reasoning

The Reality: This dream has been partially realised. The reduced search space does allow the discovery of proofs that would be beyond the reach of purely object-level, automatic provers: for instance, many of the proofs listed in §1.1, p161.


Unfortunately, these very search reduction measures can also exclude the proofs of hard theorems from the search space, making them impossible to find. The reduced plan space is incomplete. Hard theorems may require uncommon or even brand new patterns of reasoning, which have not been previously captured in proof methods. Or they may require existing tactics to be used in unusual ways that are excluded by their current heuristic preconditions. Indeed, it is often a characteristic of a breakthrough in mathematical proof that the proof incorporates some new kind of proof method, cf. Gödel’s Incompleteness Theorems. Such proofs will not be found by proof planning using only already known proof methods, but could potentially be stumbled upon by exhaustive search at the object-level.

The Options: Firstly, we consider ways of reducing the incompleteness of proof planning, then ways of removing it.

We should strive to ensure that the preconditions of methods are as general as possible, for instance, minimising the use of heuristic preconditions, as opposed to preconditions that are required for the legal application of the method’s tactic. This will help ensure that the tactic is applied whenever it is appropriate and not excluded due to a failure to anticipate an unusual usage. A balance is required here since the absence of all heuristic preconditions may increase the search space to an infeasible size. Rather, diligence is needed to design both tactics and their preconditions which generalise away from the particular examples that may have suggested the reasoning pattern in the first place.

The use of critics expands the search space by providing a proof patch when the preconditions of a method fail. In practice, critics have been shown to facilitate the proof of hard theorems by providing the ‘eureka’ steps, e.g. missing lemmas, goal generalisations, unusual induction rules, etc, that hard theorems often require [Ireland & Bundy, 1996b]. However, even with these additions, the plan space is still incomplete; so the problem is only postponed.

One way to restore completeness would be to allow arbitrary object-level proof steps, e.g. the application of an individual rule of inference such as rewriting, generalisation, induction, etc, with no heuristic limits on its application. Since such a facility is at odds with the philosophy of proof planning, its use would need to be carefully restricted. For instance, a proof method could be provided that made a single object-level proof step at random, but only when all other possibilities had been exhausted. Provided that the rest of the plan space was finite, i.e. all other proof methods were terminating, then this random method would occasionally be called and would have the same potential for stumbling upon new lines of proof that a purely object-level exhaustive prover does, i.e. we would not expect it to happen very often, if at all.

It is interesting to speculate about whether it would be possible to draw a more permanent benefit from such serendipity by learning a new proof method from the example proof. Note that this might require the invention of new meta-level concepts: consider, for instance, the learning of rippling from example object-level proofs, which would require the invention of the meta-level concepts of wave-front, wave-hole, etc.

Note that a first-order object-level proof step might be applied to a formula containing meta-variables. This would require the first-order step to be applied using higher-order unification, potentially creating a larger search space than would otherwise occur. Also, some object-level proof steps require the specification of an expression, e.g. the witness of an existential quantifier, an induction variable and term, the generalisation of an expression. If these expressions are not provided via user interaction then infinite branching could be avoided by the use of meta-variables. So object-level rule application can introduce meta-variables even if they are not already present. These considerations further underline the need to use such object-level steps only as a last resort.

2.3 The Support for Interaction of Proof Planning

The Dream: Proof planning is not just useful for the automation of proof, it can also assist its interactive development. The language of proof planning describes the high-level structure of a proof and, hence, provides a high-level channel of communication between machine and user. This can be especially useful in a very large proof whose description at the object-level is unwieldy. The different proof methods chunk the proof into manageable pieces at a hierarchy of levels. The method preconditions and effects describe the relationships between and within each chunk and at each level. For instance, the language of rippling enables a proof state to be described in terms of the differences between goals and givens, why it is important to reduce those differences, and ways of doing so.

The preconditions and effects of methods and critics support the automatic analysis and patching of failed proof attempts. Thus the user can be directed to the reasons for a failed proof and the kind of steps required to remedy the situation. This orients the user within a large and complex search space and gives useful hints as to how to proceed.

The Reality: The work of Lowe, Jackson and others in the XBarnacle system [Lowe & Duncan, 1997] shows that proof planning can be of considerable assistance in interactive proof. For instance, in Jackson’s PhD work [Jackson, 1999, Ireland et al, 1999], the user assists in the provision of goal generalisations, missing lemmas, etc. by instantiating meta-variables. However, each of the advantages listed in the previous section brings corresponding disadvantages.

Firstly, proof planning provides an enriched language of human/computer communication but at the price of introducing new jargon for the user to understand. The user of XBarnacle must learn the meaning of wave-fronts, flawed inductions, fertilization, etc.

Secondly, and more importantly, the new channel of communication assists users at the cost of restricting them to the proof planning search space; cf. the discussion of incompleteness in §2.2, p168. For instance, XBarnacle users can get an explanation of why a method or critic did or did not apply in terms of successful or failed preconditions. They can over-ride those preconditions to force or prevent a method or critic applying. But their actions are restricted to the search space of tactics and critics. If the proof lies outside that space then they are unable to direct XBarnacle to find it.

The Options: The first problem can be ameliorated in a number of ways. Jargon can be avoided, translated or explained according to the expertise and preferences of the user. For instance, “fertilization” can be avoided in favour of, or translated into, the “use of the induction hypothesis”. “Wave-front”, on the other hand, has no such ready translation into standard terminology and must be explained within the context of rippling. Thus, although this problem can be irritating, it can be mitigated with varying amounts of effort.

The second problem is more fundamental. Since it is essentially the same as the problem of the incompleteness of the plan space, discussed in §2.2, p168, one solution is essentially that discussed at the end of §2.2, p169. New methods can be provided which apply object-level proof steps under user control. As well as providing an escape mechanism for a frustrated user, this might also be a valuable device for system developers. It would enable them to concentrate on the parts of a proof they were interested in automating while using interaction to ‘fake’ the other parts. The challenge is to integrate such object-level steps into the rest of the proof planning account. For instance, what story can we now tell about how such object-level steps exploit the effects of previous methods and enable the preconditions of subsequent ones?

2.4 The Methodology of Proof Planning

The Dream: Proof planning aims to capture common patterns of reasoning and repair in methods and critics. In [Bundy, 1991] we provide a number of criteria by which these methods and critics are to be assessed. These include expectancy³, generality, prescriptiveness⁴, simplicity, efficiency and parsimony. In particular, each method and critic should apply successfully in a wide range of situations (generality) and a few methods and critics should generate a large number of proofs (parsimony). Moreover, the linking of effects of earlier methods and critics to the preconditions of later ones should enable a good ‘story’ to be told about how and why the proof plan works. This ‘story’ enables the expectancy criterion to be met.

³ Some degree of assurance that the proof plan will succeed.
⁴ The less search required the better.

The Reality: It is hard work to ensure that these criteria are met. A new method or critic may originally be inspired by only a handful of examples. There is a constant danger of producing methods and critics that are too finely tuned to these initial examples. This can arise both from a lack of imagination in generalising from the specific situation and from the temptation to get quick results in automation. Such over-specificity leads to a proliferation of methods and critics with limited applicability. Worse still, the declarative nature of methods may be lost as methods evolve into arbitrary code tuned to a particular problem set. The resulting proof planner will be brittle, i.e. will frequently fail when confronted with new problems. It will become increasingly hard to tell an intelligible story about its reasoning. Critical reviewers will view the empirical results with suspicion, suspecting that the system has been hand-tuned to reproduce impressive results on only a handful of hard problems.

As the consequences of over-specificity manifest themselves in failed proof attempts, so the methods and critics can be incrementally generalised to cope with the new situations. One can hope that this process of incremental generalisation will converge on a few robust methods and critics, so realising the original dream. However, a reviewer may suspect that this process is both infinite and non-deterministic, with each incremental improvement only increasing the range of the methods and critics by a small amount.

The opposite problem is caused by an over-general or missing precondition, permitting a method to apply in an inappropriate situation. This may occur, for instance, where a method is developed in a context in which a precondition is implicit, but then applied in a situation in which it is absent. This problem is analogous to feature interaction in telecomms or to predicting the behaviour of a society of agents.

The Options: The challenge is not only to adopt a development methodology that meets the criteria in [Bundy, 1991] but also to be seen to do so. This requires both diligence in the development of proof plans and the explicit demonstration of this diligence. Both aims can be achieved by experimental or theoretical investigations designed to test explicit hypotheses.

For instance, to test the criterion of generality, systematic and thorough application of proof planning systems should be conducted. This testing requires a large and diverse set of examples obtained from independent sources. The diversity should encompass the form, source and difficulty level of the examples. However, the generality of the whole system should not be obtained at the cost of parsimony, i.e. by providing lots of methods and critics ‘hand crafted’ to cope with each problematic example; so each of the methods and critics must be shown to be general-purpose. Unfortunately, it is not possible to test each one in isolation, since the methods and critics are designed to work as a family. However, it is possible to record how frequently each method and critic is used during the course of a large test run.

To meet the criterion of expectancy the specifications of the methods and critics should be declarative statements in a meta-logic. It should be demonstrated that the effects of earlier methods enable the preconditions of later ones and that the patches of critics invert the failed preconditions of the methods to which they are attached. Such demonstrations will deal both with the situation in which method preconditions/effects are too specific (they will not be strong enough hypotheses) and in which they are too general (they will not be provable). The work of Gallagher [Gallagher, 1993] already shows that this kind of reasoning about method preconditions and effects can be automated.

To meet the criterion of prescriptiveness the search spaces generated by rival methods need to be compared either theoretically or experimentally; the method with the smaller search space is to be preferred. However, reductions in search space should not be obtained at the cost of unacceptable reductions in success rate. So it might be shown experimentally and/or via expectancy arguments that acceptable success rates are maintained. Reduced search spaces will usually contribute to increased efficiency, but it is possible that precondition testing is computationally expensive and that this cost more than offsets the benefits of the increased prescriptiveness, so overall efficiency should also be addressed.

3 Conclusion

In this paper we have seen that some of the original dreams of proof planning have not been fully realised in practice. We have shown that in some cases it has not been possible to deliver the dream in the form in which it was originally envisaged, for instance, because of the impossibility of testing method preconditions on abstract formulae or the inherent incompleteness of the planning search space. In each case we have investigated whether and how a lesser version of the original dream can be realised. This investigation both identifies the important benefits of the proof planning approach and points to the most promising directions for future research. In particular, there seem to be three important lessons that have permeated the analysis.

Firstly, the main benefits of proof planning are in facilitating a non-consecutive exploration of the search space, e.g. by ‘middle-out’ reasoning. This allows the postponement of highly branching choice points using least commitment mechanisms, such as meta-variables or constraints. Parts of the search space with low branching rates are explored first and the results of this search determine the postponed choices by side-effect, e.g. using higher-order unification or constraint solving. This can result in dramatic search space reductions. In particular, ‘eureka’ steps can be made in which witnesses, generalisations, intermediate lemmas, customised induction rules, etc, are incrementally constructed. The main vehicle for such non-consecutive exploration is critics. Our analysis points to the further development of critics as the highest priority in proof planning research.

Secondly, in order to increase the coverage of proof planners in both automatic and interactive theorem proving it is necessary to combine proof planning with more brute force approaches. For instance, it may be necessary to have default methods in which arbitrary object-level proof steps are conducted either at random or under user control. One might draw an analogy with simulated annealing in which it is sometimes necessary to make a random move in order to escape from a local minimum.

Thirdly, frequent and systematic rational reconstruction is necessary to offset the tendency to develop over-specialised methods and critics. This tendency is a natural by-product of the experimental development of proof planning as specifications are tweaked and tuned to deal with challenging examples. It is necessary to clean up non-declarative specifications, merge and generalise methods and critics and to test proof planners in a systematic and thorough way. The assessment criteria of [Bundy, 1991] must be regularly restated and reapplied.

Despite the limitations exposed by the analysis of this paper, proof planning has been shown to have a real potential for efficient and powerful, automatic and interactive theorem proving. Much of this potential still lies untapped and our analysis has identified the priorities and directions for its more effective realisation.

Afterword

I first met Bob Kowalski in June 1971, when I joined Bernard Meltzer’s Metamathematics Unit as a research fellow. Bernard had assembled a world class centre in automatic theorem proving. In addition to Bob, the other research fellows in the Unit were: Pat Hayes, J Moore, Bob Boyer and Donald Kuehner; Donald was the co-author, with Bob, of SL-Resolution, which became the theoretical basis for Prolog. Bob’s first words to me were “Do you like computers? I don’t!”. This sentiment was understandable given the primitive computer facilities then available to us: one teletype with a 110 baud link to a shared ICL 4130 with 64k of memory. Bob went on to forsake the automation of mathematical reasoning as the main domain for theorem proving and instead pioneered logic programming: the application of theorem proving to programming. I stuck with mathematical reasoning and focussed on the problem of proof search control. However, I was one of the earliest adopters of Prolog and have been a major beneficiary of Bob’s work, using logic programming both as a practical programming methodology and as a domain for formal verification and synthesis. I am also delighted to say that Bob has remained a close family friend for 30 years. Happy 60th birthday Bob!

References

[Armando et al, 1996] Armando, A., Gallagher, J., Smaill, A. and Bundy, A. (3-5 January 1996). Automating the synthesis of decision procedures in a constructive metatheory. In Proceedings of the Fourth International Symposium on Artificial Intelligence and Mathematics, pages 5–8, Florida. Also in the Annals of Mathematics and Artificial Intelligence, 22, pp 259–79, 1998.

[Benzmüller et al, 1997] Benzmüller, C., Cheikhrouhou, L., Fehrer, D., Fiedler, A., Huang, X., Kerber, M., Kohlhase, K., Meier, A., Melis, E., Schaarschmidt, W., Siekmann, J. and Sorge, V. (1997). Ωmega: Towards a mathematical assistant. In McCune, W., (ed.), 14th International Conference on Automated Deduction, pages 252–255. Springer-Verlag.

[Bundy, 1988] Bundy, A. (1988). The use of explicit plans to guide inductive proofs. In Lusk, R. and Overbeek, R., (eds.), 9th International Conference on Automated Deduction, pages 111–120. Springer-Verlag. Longer version available from Edinburgh as DAI Research Paper No. 349.

[Bundy, 1991] Bundy, Alan. (1991). A science of reasoning. In Lassez, J.L. and Plotkin, G., (eds.), Computational Logic: Essays in Honor of Alan Robinson, pages 178–198. MIT Press. Also available from Edinburgh as DAI Research Paper 445.

[Bundy et al, 1990a] Bundy, A., Smaill, A. and Hesketh, J. (1990a). Turning eureka steps into calculations in automatic program synthesis. In Clarke, S. L.H., (ed.), Proceedings of UK IT 90, pages 221–6. IEE. Also available from Edinburgh as DAI Research Paper 448.

[Bundy et al, 1990b] Bundy, A., van Harmelen, F., Horn, C. and Smaill, A. (1990b). The Oyster-Clam system. In Stickel, M. E., (ed.), 10th International Conference on Automated Deduction, pages 647–648. Springer-Verlag. Lecture Notes in Artificial Intelligence No. 449. Also available from Edinburgh as DAI Research Paper 507.

[Bundy et al, 1991] Bundy, A., van Harmelen, F., Hesketh, J. and Smaill, A. (1991). Experiments with proof plans for induction. Journal of Automated Reasoning, 7:303–324. Earlier version available from Edinburgh as DAI Research Paper No 413.

[Cantu et al, 1996] Cantu, Francisco, Bundy, Alan, Smaill, Alan and Basin, David. (1996). Experiments in automating hardware verification using inductive proof planning. In Srivas, M. and Camilleri, A., (eds.), Proceedings of the Formal Methods for Computer-Aided Design Conference, number 1166 in Lecture Notes in Computer Science, pages 94–108. Springer-Verlag.

[Frank et al, 1992] Frank, I., Basin, D. and Bundy, A. (1992). An adaptation of proof-planning to declarer play in bridge. In Proceedings of ECAI-92, pages 72–76, Vienna, Austria. Longer Version available from Edinburgh as DAI Research Paper No. 575.

[Gallagher, 1993] Gallagher, J. K. (1993). The Use of Proof Plans in Tactic Synthesis. Unpublished Ph.D. thesis, University of Edinburgh.

[Gordon et al, 1979] Gordon, M. J., Milner, A. J. and Wadsworth, C. P. (1979). Edinburgh LCF - A mechanised logic of computation, volume 78 of Lecture Notes in Computer Science. Springer-Verlag.

[Gow, 1997] Gow, J. (1997). The Diagonalization Method in Automatic Proof. Undergraduate project dissertation, Dept of Artificial Intelligence, University of Edinburgh.

[Hesketh et al, 1992] Hesketh, J., Bundy, A. and Smaill, A. (June 1992). Using middle-out reasoning to control the synthesis of tail-recursive programs. In Kapur, Deepak, (ed.), 11th International Conference on Automated Deduction, volume 607 of Lecture Notes in Artificial Intelligence, pages 310–324, Saratoga Springs, NY, USA.

[Horn, 1992] Horn, Ch. (1992). Oyster-2: Bringing type theory into practice. Information Processing, 1:49–52.

[Huang et al, 1995] Huang, X., Kerber, M. and Cheikhrouhou, L. (1995). Adapting the diagonalization method by reformulations. In Levy, A. and Nayak, P., (eds.), Proc. of the Symposium on Abstraction, Reformulation and Approximation (SARA-95), pages 78–85. Ville d'Esterel, Canada.

[Ireland & Bundy, 1996a] Ireland, A. and Bundy, A. (1996a). Extensions to a Generalization Critic for Inductive Proof. In McRobbie, M. A. and Slaney, J. K., (eds.), 13th International Conference on Automated Deduction, pages 47–61. Springer-Verlag. Springer Lecture Notes in Artificial Intelligence No. 1104. Also available from Edinburgh as DAI Research Paper 786.

[Ireland & Bundy, 1996b] Ireland, A. and Bundy, A. (1996b). Productive use of failure in inductive proof. Journal of Automated Reasoning, 16(1–2):79–111. Also available from Edinburgh as DAI Research Paper No 716.

[Ireland & Stark, 1997] Ireland, A. and Stark, J. (1997). On the automatic discovery of loop invariants. In Proceedings of the Fourth NASA Langley Formal Methods Workshop. NASA Conference Publication 3356. Also available as Research Memo RM/97/1 from Dept of Computing and Electrical Engineering, Heriot-Watt University.

[Ireland, 1992] Ireland, A. (1992). The Use of Planning Critics in Mechanizing Inductive Proofs. In Voronkov, A., (ed.), International Conference on Logic Programming and Automated Reasoning – LPAR 92, St. Petersburg, Lecture Notes in Artificial Intelligence No. 624, pages 178–189. Springer-Verlag. Also available from Edinburgh as DAI Research Paper 592.

[Ireland et al, 1999] Ireland, A., Jackson, M. and Reid, G. (1999). Interactive Proof Critics. Formal Aspects of Computing: The International Journal of Formal Methods, 11(3):302–325. A longer version is available from Dept. of Computing and Electrical Engineering, Heriot-Watt University, Research Memo RM/98/15.

[Jackson, 1999] Jackson, M. (1999). Interacting with Semi-automated Theorem Provers via Interactive Proof Critics. Unpublished Ph.D. thesis, School of Computing, Napier University.

[Kerber & Sehn, 1997] Kerber, Manfred and Sehn, Arthur C. (1997). Proving ground completeness of resolution by proof planning. In Dankel II, Douglas D., (ed.), FLAIRS-97, Proceedings of the 10th International Florida Artificial Intelligence Research Symposium, pages 372–376, Daytona, Florida, USA. Florida AI Research Society, St. Petersburg, Florida, USA.

[Kerber, 1998] Kerber, Manfred. (1998). Proof planning: A practical approach to mechanized reasoning in mathematics. In Bibel, Wolfgang and Schmitt, Peter H., (eds.), Automated Deduction, a Basis for Application – Handbook of the German Focus Programme on Automated Deduction, chapter III.4, pages 77–95. Kluwer Academic Publishers, Dordrecht, The Netherlands.

[Kraan et al, 1996] Kraan, I., Basin, D. and Bundy, A. (1996). Middle-out reasoning for synthesis and induction. Journal of Automated Reasoning, 16(1–2):113–145. Also available from Edinburgh as DAI Research Paper 729.

[Lowe & Duncan, 1997] Lowe, H. and Duncan, D. (1997). XBarnacle: Making theorem provers more accessible. In McCune, William, (ed.), 14th International Conference on Automated Deduction, pages 404–408. Springer-Verlag.

[Lowe, 1991] Lowe, Helen. (1991). Extending the proof plan methodology to computer configuration problems. Artificial Intelligence Applications Journal, 5(3). Also available from Edinburgh as DAI Research Paper 537.

[Lowe et al, 1996] Lowe, H., Pechoucek, M. and Bundy, A. (October 1996). Proof planning and configuration. In Proceedings of the Ninth Exhibition and Symposium on Industrial Applications of Prolog. Also available from Edinburgh as DAI Research Paper 859.

[Lowe et al, 1998] Lowe, H., Pechoucek, M. and Bundy, A. (1998). Proof planning for maintainable configuration systems. Artificial Intelligence in Engineering Design, Analysis and Manufacturing, 12:345–356. Special issue on configuration.

[Melis, 1998] Melis, E. (1998). The "limit" domain. In Simmons, R., Veloso, M. and Smith, S., (eds.), Proceedings of the Fourth International Conference on Artificial Intelligence in Planning Systems, pages 199–206.

[Melis et al, 2000a] Melis, E., Zimmer, J. and Müller, T. (2000a). Extensions of constraint solving for proof planning. In Horn, W., (ed.), European Conference on Artificial Intelligence, pages 229–233.

[Melis et al, 2000b] Melis, E., Zimmer, J. and Müller, T. (2000b). Integrating constraint solving into proof planning. In Ringeissen, Ch., (ed.), Frontiers of Combining Systems, Third International Workshop, FroCoS'2000, number 1794 in Lecture Notes on Artificial Intelligence, pages 32–46. Springer.

[Richardson et al, 1998] Richardson, J. D. C, Smaill, A. and Green, I. (July 1998). System description: proof planning in higher-order logic with Lambda-Clam. In Kirchner, Claude and Kirchner, Hélène, (eds.), 15th International Conference on Automated Deduction, volume 1421 of Lecture Notes in Artificial Intelligence, pages 129–133, Lindau, Germany.

[Willmott et al, 1999] Willmott, S., Richardson, J., Bundy, A. and Levine, J. (1999). An adversarial planning approach to Go. In Jaap van den Herik, H. and Iida, H., (eds.), Computers and Games, pages 93–112. 1st Int. Conference, CG98, Springer. Lecture Notes in Computer Science No. 1558.

A Model Generation Based Theorem Prover MGTP for First-Order Logic

Ryuzo Hasegawa, Hiroshi Fujita, Miyuki Koshimura, and Yasuyuki Shirai

Graduate School of Information Science and Electrical Engineering, Kyushu University
6-1, Kasuga-koen, Kasuga, Fukuoka 816-8580, JAPAN
{hasegawa,fujita,koshi,shirai}@ar.is.kyushu-u.ac.jp

Abstract. This paper describes the major results on research and development of a model generation theorem prover MGTP. It exploits OR parallelism for non-Horn problems and AND parallelism for Horn problems achieving more than a 200-fold speedup on a parallel inference machine PIM with 256 processing elements. With MGTP, we succeeded in proving diﬃcult mathematical problems that cannot be proven on sequential systems, including several open problems in ﬁnite algebra. To enhance the pruning ability of MGTP, several new features are added to it. These include: CMGTP and IV-MGTP to deal with constraint satisfaction problems, enabling negative and interval constraint propagation, respectively, non-Horn magic set to suppress the generation of useless model candidates caused by irrelevant clauses, a proof simpliﬁcation method to eliminate duplicated subproofs, and MM-MGTP for minimal model generation. We studied several techniques necessary for the development of applications, such as negation as failure, abductive reasoning and modal logic systems, on MGTP. These techniques share a basic idea, which is to use MGTP as a meta-programming system for each application.

A.C. Kakas, F. Sadri (Eds.): Computat. Logic (Kowalski Festschrift), LNAI 2408, pp. 178–213, 2002. © Springer-Verlag Berlin Heidelberg 2002

1 Introduction

Theorem proving is an important basic technology that gave rise to logic programming, and is acquiring a greater importance not only for reasoning about mathematical theorems but also for developing knowledge processing systems. We started research on parallel theorem provers in 1989 in the Fifth Generation Computer Systems (FGCS) project, with the aim of integrating logic programming and theorem proving technologies. The immediate goal of this research was to develop a fast theorem proving system on the parallel inference machine PIM [42], by effectively utilizing the KL1 language [55] and logic programming techniques.

MGTP [11,12] basically follows the model generation method of SATCHMO [38], which has the useful property that one-way unification suffices. Indeed, the method is well suited to KL1 implementation because we can use fast built-in unification without occur-check. MGTP exploits OR parallelism from non-Horn problems by independently exploring each branch of a proof tree caused by case splitting, whereas it exploits AND parallelism from Horn problems that do not cause case splitting.

Although OR parallelization of MGTP is relatively easy, it is essential to reduce the amount of inter-processor communication. For this, we proposed a new method called the N-sequential method [22]. The basic idea is that we run in each processing element (PE) a sequential algorithm to traverse a proof tree depth-first and restrict the number of tasks being activated to at most the number N of available PEs. Almost linear speedup was achieved for both Horn and non-Horn problems on a PIM/m system consisting of 256 PEs.

With MGTP, we succeeded in solving some open quasigroup problems in finite algebra [13]. We also solved several hard condensed detachment problems that could not be solved by OTTER [39] with any strategy [25]. On the other hand, research on solving quasigroup problems with MGTP revealed that it lacks negative constraint propagation ability. We therefore developed CMGTP (Constraint-MGTP) [50], which can handle constraint propagation with negative atoms. As a result, CMGTP's search spaces became much smaller than the original MGTP's.

Recently, we have been developing Java versions of MGTP (JavaMGTP) aiming at better efficiency as well as wider usability. JavaMGTP achieves a speedup of several tens of times compared to KL1 based implementations on a sequential machine.

However, in order to further improve the efficiency of model generation, several problems remain to be solved that are common to model generation based provers: redundant inference caused by clauses that are irrelevant to the given goal, duplication of the same subproof after case splitting, and generation of nonminimal models. To solve the first problem, we developed a method called non-Horn magic sets (NHM) [24,45].
NHM is a natural extension of the magic sets developed in the deductive database field, and is applicable to non-Horn problems. We showed that NHM has the same power as relevancy testing in SATCHMORE [36], although they take completely different approaches. For the second problem, we came up with a method that combines the relevancy testing realized by NHM and SATCHMORE with folding-up proposed by Letz [34], within a single framework [32]. The method not only has an effect similar to relevancy testing, suppressing useless model extensions with irrelevant clauses, but also provides a folding-up function to eliminate duplicate subproofs. Both are achieved by computing the relevant literals that contribute to closing a branch. The third problem is how to avoid generating nonminimal models, which are redundant and thus cause inefficiency. To this end, we proposed an efficient method that employs branching assumptions and lemmas so as to prune branches that lead to nonminimal models, and to reduce minimality tests on obtained models [23]. We then implemented MM-MGTP based on this method. Experimental results with MM-MGTP show a remarkable speedup compared to the original MGTP.


Regarding applications, MGTP can be viewed as a meta-programming system. We can build various reasoning systems on MGTP by writing the inference rules used for each system as MGTP input clauses. Following this idea, we developed several techniques and reasoning systems necessary for AI applications. They include a method to incorporate negation as failure into MGTP [29], abductive reasoning systems [30], and modal logic systems [31]. In particular, MGTP has actually been used as a rule-based engine for an argumentation and negotiation support system in the legal area.

2 An Abstract MGTP Procedure

MGTP is a theorem proving system for first-order logic. An input for MGTP is given as a set of clauses of the implicational form:

    Π → Σ

where, normally, the antecedent Π is a conjunction of atoms and the consequent Σ is a disjunction of atoms.¹ A clause is said to be positive if its antecedent is empty or true, negative if its consequent is empty or false, and mixed otherwise. A clause is called a Horn clause if it has at most one atom in its consequent, otherwise it is called a non-Horn clause. A clause is said to be range-restricted if every variable in the consequent of the clause appears in the antecedent, and violated under a model candidate M if it holds that M ⊨ Πσ and M ⊭ Σσ with some substitution σ.

A generic algorithm of a standard MGTP procedure is sketched in Fig. 1. The task of MGTP is to try to construct a model for a given set of clauses, by extending the current model candidate M so as to satisfy violated clauses under M (model extension). The function MG takes as an initial input positive Horn clauses U0, positive non-Horn clauses D0, and an empty model candidate M, and returns true/false (SAT/UNSAT) as a proof result. MG also outputs a model every time one is found. It works as follows:

(1) As long as the unit buffer U is not empty, MG picks up an atom u from U, tests whether M ⊭ u (subsumption test), and if so extends the model candidate M with u (Horn extension). Then, the conjunctive matching procedure CJM(M, u) is invoked to search for clauses whose antecedents Π are satisfied by M ∪ {u} under some substitution σ. If such nonnegative clauses are found, their consequents Σσ are added to U or to the disjunction buffer D according to the form of the consequent. When the antecedent of a negative clause is satisfied by M ∪ {u} in CJM(M, u), MG rejects M and returns false (model rejection).

(2) When U becomes empty, and if D is not empty, MG picks up a disjunction d from D. If d is not satisfied by M, MG recursively calls itself to expand M with each disjunct Lj ∈ d (non-Horn extension).

(3) When both U and D become empty, MG outputs M and returns true.

¹ This is the primitive form of a clause in a standard MGTP, which will be extended in several ways in MGTP descendants.

procedure MGTP :
begin
    U0 ← positive Horn clauses;
    D0 ← positive non-Horn clauses;
    output MG(U0, D0, ∅);
end ;

boolean function MG(buffer U, buffer D, buffer M) :
begin
    while (U ≠ ∅) begin                                       · · · (1)
        U ← U \ {u ∈ U};
        if (M ⊭ u) then begin
            M ← M ∪ {u};
            CJM(M, u);
            if (M is rejected) then return false ;
        end
    end ;
    if (D ≠ ∅) then begin                                     · · · (2)
        D ← D \ {d ∈ D};   (where d = (L1 ∨ . . . ∨ Ln))
        if (M ⊭ d) then
            return ⋁(j=1..n) MG(U ∪ {Lj}, D, M);
    end else begin                                            · · · (3)
        output M ;
        return true ;
    end
end .

Fig. 1. A standard MGTP procedure

The standard MGTP procedure might be modified in several ways. For instance, each disjunct of Σ is allowed to be a conjunction of literals. This is especially useful, in fact, for implementing a negation as failure mechanism [29]. We can also extend the procedure to deal with negative literals by introducing two additional operations: unit refutation and unit simplification. This extension yields CMGTP [28,50], which is meant for solving constraint satisfaction problems more efficiently, and MM-MGTP [23] for minimal model generation. Although the procedure looks sequential, it can be parallelized by exploiting the parallelism inherent in it. These issues will be described in detail in subsequent sections.
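The three steps of Fig. 1 can be illustrated for the ground (propositional) case by the following minimal sketch. It is not the MGTP implementation: the representation of a clause as an (antecedent, consequent) pair of atom lists, the names `mg` and `mgtp`, the naive scan over all clauses in place of CJM, and the choice to skip a disjunction that is already satisfied are all assumptions made for illustration.

```python
def mg(clauses, units, disjunctions, model, found):
    """Ground model generation in the style of Fig. 1 (illustrative sketch).

    clauses: list of (antecedent, consequent) pairs of atom lists.
    Returns True iff at least one model is found below this node;
    every model found is appended to `found`.
    """
    units, disjunctions, model = list(units), list(disjunctions), set(model)

    # (1) Horn extension
    while units:
        u = units.pop()
        if u in model:                      # subsumption test: M already satisfies u
            continue
        model.add(u)
        # naive conjunctive matching: clauses that use u and whose antecedent holds in M
        for antecedent, consequent in clauses:
            if u in antecedent and set(antecedent) <= model:
                if not consequent:          # negative clause: model rejection
                    return False
                if len(consequent) == 1:
                    units.append(consequent[0])
                else:
                    disjunctions.append(tuple(consequent))

    # (2) non-Horn extension (assumption: satisfied disjunctions are skipped)
    while disjunctions:
        d = disjunctions.pop()
        if any(lit in model for lit in d):
            continue
        results = [mg(clauses, [lit], disjunctions, model, found) for lit in d]
        return any(results)

    # (3) both buffers are empty: M is a model
    found.append(frozenset(model))
    return True


def mgtp(clauses):
    """Seed the buffers with the positive clauses and run mg."""
    if any(not a and not c for a, c in clauses):   # empty clause: trivially UNSAT
        return False, []
    units = [c[0] for a, c in clauses if not a and len(c) == 1]
    disjunctions = [tuple(c) for a, c in clauses if not a and len(c) > 1]
    found = []
    return mg(clauses, units, disjunctions, set(), found), found


if __name__ == "__main__":
    example = [([], ["a", "b"]), (["a"], ["c", "d"]), (["c"], []),
               (["b"], ["d"]), (["d"], ["f"])]
    sat, models = mgtp(example)
    print(sat, [sorted(m) for m in models])
```

The list comprehension in step (2) deliberately explores every disjunct, so that all models are collected rather than stopping at the first successful branch.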

3 Parallel Implementation

There are several ways to parallelize the proving process in MGTP. These are to exploit parallelism in conjunctive matching, subsumption tests, and case splitting. For ground non-Horn cases, it is most promising to exploit OR parallelism induced by case splitting. Here we use OR parallelism to seek multiple models, which produce multiple solutions, in parallel. For Horn clauses, we have to exploit AND parallelism during the traversal of a single branch. The main source of AND parallelism is conjunctive matching and subsumption testing.

[Fig. 2. Task stack: created tasks are pushed on top; the PE itself pops from the top (newer tasks), while tasks for other PEs are popped from the bottom (older tasks).]

[Fig. 3. Process diagram for OR parallelization: a master PE mediates between task-offering (give task) and task-requesting (take task) PEs among PE 1, PE 2, ..., PE n.]

3.1 OR Parallelization

For ground non-Horn clauses, it is relatively easy for MGTP to exploit OR parallelism by exploring different branches (model candidates) in different processing elements (PEs) independently. However, inter-PE communication increases rapidly as the number of branches explodes combinatorially and a large amount of data, e.g. model candidates and model extending candidates, is copied from one PE to another. Conventional PE allocation methods, such as cyclic and probabilistic allocation, are based on the principle that whenever tasks are created in a PE, all of them but one are thrown to other PEs. Although this scheme is easy to implement, the amount of inter-PE communication is at least proportional to the number of tasks created in the whole computation.

To overcome this, we proposed a new method called the N-sequential method [22]. The basic idea is that we run in each PE a sequential algorithm to traverse a proof tree depth-first and restrict the number of activated tasks at any time to at most the number N of available PEs. In this method, a PE can move an unsolved task to another idle PE only when requested by it. When the number of created tasks exceeds the number of free PEs, the excess tasks are executed sequentially within their current PE. Each PE maintains a task stack as shown in Fig. 2 for use in the sequential traversal of multiple unsolved branches. Created tasks are pushed onto the stack, then popped from the top of the stack (pop top) when the current task has been completed. On receipt of a request from another PE, the task at the bottom is given to it (pop bottom). We provide a master process as shown in Fig. 3 which acts as a matchmaker between task-requesting (take task) and task-offering (give task) PEs. The task stack process and the master process are written in KL1 and incorporated into the MGTP program.

OR Parallel Performance. The performance of OR parallel MGTP was evaluated on a PIM/m with 128 PEs and a Sun Enterprise 10000 (SE10k) with 24 PEs. For the latter we used the Klic system, which compiles KL1 programs into C code and makes it possible to run them on a single machine or on parallel machines like SE10k. The experimental results show significant speedups on both systems.

[Fig. 4. Speedup ratio by OR parallelization on SE10k (1–24 PEs), plotted against the number of PEs for the ideal case and for GRP124-8.004, test2-445, PUZ010-1 and QG5-12.]

Figure 4 shows the speedup ratio obtained by OR parallel execution for non-Horn problems using the N-sequential method on SE10k. Here, GRP124-8.004 and PUZ010-1 are problems taken from the TPTP library [53], QG5-12 is a quasigroup problem to be explained in later sections, and test2-445 is an artificial benchmark spanning a ternary tree. A satisfactory speedup is attained for problems such as GRP124-8.004 and test2-445, in which the number of non-Horn extensions dominates that of Horn extensions. PUZ010-1 and QG5-12 show a lower speedup because they require a significant number of Horn extensions but only a small number of non-Horn extensions.
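The task stack used by the N-sequential method (Fig. 2) can be pictured with a small sketch. The class and method names below are hypothetical, and the sketch is single-threaded; the original task stack and master processes are written in KL1 and are not reproduced here.

```python
from collections import deque


class TaskStack:
    """Per-PE stack of unsolved branches (illustrative, single-threaded)."""

    def __init__(self):
        self._tasks = deque()

    def push(self, task):
        """A newly created task goes on top of the stack."""
        self._tasks.append(task)

    def pop_top(self):
        """The owning PE continues depth-first with its newest task."""
        return self._tasks.pop() if self._tasks else None

    def pop_bottom(self):
        """An idle PE that asked for work receives the oldest task."""
        return self._tasks.popleft() if self._tasks else None

    def __len__(self):
        return len(self._tasks)


if __name__ == "__main__":
    stack = TaskStack()
    for branch in ["branch-1", "branch-2", "branch-3"]:
        stack.push(branch)
    print("own PE takes:", stack.pop_top())
    print("other PE takes:", stack.pop_bottom())
```

Handing out the oldest task is a natural fit for depth-first traversal: a task near the bottom of the stack was created closer to the root of the proof tree, so it tends to represent a larger unexplored subtree, which keeps the number of transfers small.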

[Fig. 5. Snapshot of "xamonitor" on PIM/m: (a) cyclic allocation method; (b) N-sequential method.]

Figure 5 depicts snapshots of an "xamonitor" window that shows the CPU usage on PIM/m, sampled and displayed at one-second intervals. The figure shows a clear difference in characteristic behavior between the cyclic allocation and N-sequential methods. The lighter part of each bar in the graph indicates the percentage of the CPU time used for net computation during an interval (one second), and the darker part indicates the rate of inter-PE communication. Inter-PE communication consumed about 20 percent of the execution time for the cyclic allocation, whereas it took almost negligible time for the N-sequential method. Furthermore, for the cyclic allocation, the percentage of idling time increases as the computation progresses, whereas there is almost no idling time for the N-sequential method. As a result, execution with the N-sequential method terminates faster than with the cyclic allocation.

3.2 AND Parallelization

The computational mechanism of MGTP is essentially based on the "generate-and-test" scheme. However, this approach would cause over-generation of atoms, leading to a waste of time and memory space. In the AND parallelization of MGTP, we adopted the lazy model generation method [26], which induces a demand-driven style of computation. In this method, a generator process that performs model extension generates a specified number of atoms only when required by the tester process that performs rejection testing. The lazy mechanism can avoid over-generation of atoms in model extension, and provides flexible control to maintain a high running rate in a parallel environment.

Figure 6 shows a process diagram for AND parallel MGTP. It consists of generator (G), tester (T), and master (M) processes. In our implementation, several G and T processes are allocated to each PE. G processes perform conjunctive matching with mixed clauses, and T processes perform it with negative clauses. Atoms created by a G process are stored in a New buffer in that G process, and are sent via the Master to T processes for rejection testing. The M process prevents G processes from generating too many atoms by monitoring the number of atoms stored in New buffers and by keeping that number in a moderate range. This number indicates the difference between the number of atoms generated by G processes and the number of atoms tested by T processes. By simply controlling G and T processes with the buffering mechanism mentioned above, the idea of lazy model generation can be implemented. This also enables us to balance the computational load of G and T processes, thus keeping a high running rate.

AND Parallel Performance. Figure 7 shows AND parallel performance for solving condensed detachment problems [39] on PIM/m with 256 PEs. The proving time (sec) obtained with 32 PEs for each problem is as follows: #49: 18600, #44: 9700, #22: 8600, #79: 2500, and #82: 1900. The numbers of atoms kept in M and D are between 15100 and 36500.
More than a 230-fold speedup was attained for #49 and #44, and a 170 to 180-fold speedup for #22, #79 and #82.
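The demand-driven idea behind lazy model generation can be illustrated sequentially with Python generators. This is only an analogy under stated assumptions: `generator`, `lazy_test`, the `extend` and `rejects` callbacks, and the fixed `batch` size are hypothetical names and parameters, and the real system coordinates parallel G and T processes through a master process rather than through a single loop.

```python
from itertools import islice


def generator(seed_atoms, extend):
    """Yield atoms one at a time; nothing is derived until the consumer asks."""
    pending = list(seed_atoms)
    seen = set()
    while pending:
        atom = pending.pop(0)
        if atom in seen:
            continue
        seen.add(atom)
        yield atom
        pending.extend(extend(atom))    # runs only when the next atom is requested


def lazy_test(atoms, rejects, batch=4):
    """Pull at most `batch` atoms at a time and stop at the first rejected atom.

    Returns (rejected_atom_or_None, number_of_atoms_tested).
    """
    tested = 0
    while True:
        chunk = list(islice(atoms, batch))
        if not chunk:
            return None, tested
        for atom in chunk:
            tested += 1
            if rejects(atom):
                return atom, tested


if __name__ == "__main__":
    # Toy model extension: each number n derives n + 1, up to 1000.
    successor = lambda n: [n + 1] if n < 1000 else []
    print(lazy_test(generator([0], successor), lambda n: n == 10))
```

Because the tester requests atoms in small batches, the generator stops producing as soon as a rejection is found, instead of first deriving all of the roughly one thousand atoms reachable in the toy example.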

[Fig. 6. Process diagram for AND parallelization: generator processes G1, G2, ..., Gg with buffers new1, ..., newg; a Master process; and tester processes T1, T2, ..., Tt receiving ∆1, ..., ∆t.]

[Fig. 7. Speedup ratio: speedup versus number of PEs (up to 256) for problems #49, #44, #79, #22 and #82, together with the ideal line.]

To verify the effectiveness of AND parallel MGTP, we attempted 12 hard condensed detachment problems. These problems could not be solved by OTTER with any strategy proposed in [39]. Seven of the 12 problems were solved within an hour, except for problem #23, in which the maximum number of atoms being stored in M and D was 85100. The problems we failed to solve were those for which this size exceeds 100000 and more than 5 hours are required.

3.3 Java Implementation of MGTP

While MGTP was originally meant for parallel theorem proving based on parallel logic programming technology, Java implementations of it (JavaMGTP) [20,21] have been developed aiming at more pervasive use of MGTP through Java technology. Here, we briefly describe these recent implementations and results for interested readers.

The advantages of JavaMGTP over the previous implementations in logic languages include platform independence, friendly user interfaces, and ease of extension. Moreover, JavaMGTP achieved the best performance on conventional machines among a family of model generation based provers. This achievement is due to several implementation techniques, including a special mechanism called A-cells for handling multiple contexts and efficient term indexing. Another key to the achievement is that we effectively utilize Java language facilities such as sophisticated class hierarchies, method overriding, and automatic memory management (garbage collection), as well as destructive assignment.

A-cells. On finding a clause Γ → B1 ∨ . . . ∨ Bm violated under the current model candidate M, i.e., (M ⊨ Γ) ∧ (∀j (1 ≤ j ≤ m). M ⊭ Bj) holds, MGTP extends M to M ∪ {B1}, . . . , M ∪ {Bm}. Repeating such extension forms a tree of model candidates, called an MG tree. Thus, each model candidate Mi comprises a sequence ⟨Mi0, . . . , Mij, . . . , Miki⟩ of sets of literals, where j is a serial number given to a context, i.e., a branch extended with a disjunct literal B^j, and Mij contains B^j and the literals used for Horn extension that follow B^j.

S1 = { → a ∨ b.   a → c ∨ d.   c → ¬e.   b → d.   d → f. }

[Fig. 8. Clause set S1 and its MG-tree: panels (a), (b) and (c) show the tree and the A-cell states when the current model candidate is M1, M2 and M3, respectively.]

The most frequent operation in MGTP is to check whether a ground literal L belongs to the current model candidate M. For this, we introduce an Activation-cell (A-cell) [21]. For each Mij above, we allocate an A-cell Aj containing a boolean flag act. When Mij gets included in the current model candidate M, the act flag of the associated A-cell Aj is set to true (denoted by A◦j), indicating that Mij is active. When Mij is no longer a member of M, the act flag of Aj is set to false (denoted by A•j), indicating that Mij is inactive. In addition, we allocate for each atom P two variables called pac and nac, and assign a pointer to Aj to pac (respectively nac) when P (respectively ¬P) becomes a member of Mij. Note that all occurrences of P and its complement ¬P are represented with a unique object for P in the system. Thus, whether P (¬P) ∈ Mij can be checked simply by looking into Aj via pac (nac) of P. This A-cell mechanism reduces the complexity of the membership test to O(1) from the O(|M|) that a naive implementation would take.

Figure 8 (a) shows an MG tree when a model M1 is generated, in which pac of a refers to an A-cell A◦1, and both pac of c and nac of e refer to A◦2. In Fig. 8 (b), the current model candidate has moved from M1 to M2, so that the A-cell A◦2 is inactivated (changed to A•2), which means that neither c nor ¬e belongs to the current model candidate M2 = {a, d}. In Fig. 8 (c), the current model candidate is now M3 = {b, d, f}, and this fact is easily recognized by looking into the pac fields of b, d, and f. Note that d's pac field was updated from A•3 to A◦4. It is also easily seen that none of the other "old" literals a, c, and ¬e belongs to M3, since their pac or nac field refers to the inactivated A-cell A•1 or A•2.

Graphics. JavaMGTP provides users with a graphical interface called Proof Tree Visualizer (PTV) for visualizing and controlling the proof process, which is especially useful for debugging and educational purposes. Several kinds of graphical representation of a proof tree can be chosen in PTV, e.g., a standard tree and a radial tree (see Fig. 9). The available graphical functions on a proof tree include: zooming in/out, marking/unmarking nodes, and displaying statistical information on each node. All these graphical operations are performed concurrently with the proving process by using the multi-threading facility of Java.


Fig. 9. A snapshot of PTV window

Moreover, one can pause/resume a proving process via mouse control on the graphic objects of a proof tree.

Performance of JavaMGTP. We compared JavaMGTP written in JDK1.2 (+JIT) with a Klic version of MGTP (KlicMGTP) written in KLIC v3.002 and the fastest version [49] of SATCHMO [38] written in ECLiPSe v3.5.2, on a SUN UltraSPARC10 (333MHz, 128MB). 153 range-restricted problems were taken from TPTP v2.1.1 [53], of which 42 satisfiable problems were run in all-solution mode. In Fig. 10–13, the problems are arranged and numbered in ascending order of their execution times taken by JavaMGTP. In Fig. 12 and 13, a black bar shows the runtime ratio for a propositional problem, and a white bar that for a first-order problem. A gap between bars (ratio zero) indicates a problem for which the two systems gave different proofs.

JavaMGTP vs. KlicMGTP. Regarding the problems after #66, for which JavaMGTP takes more than one millisecond, JavaMGTP is 12 to 26 times (except #142) as fast as KlicMGTP for propositional cases, and 5 to 20 times as fast for first-order cases (Fig. 12). This difference in performance is explained as follows. In JavaMGTP, CJM of ground literals like p, q(a) is performed with A-cells, while CJM of a nonground literal like r(X, Y) is performed with a term memory (TM) [51], which is rather heavier than A-cells. On the other hand, KlicMGTP always utilizes a TM for CJM, which contains some portions to be linearly scanned. Moreover, since in KlicMGTP the TM has to be copied every time case splitting occurs, this overhead degrades the performance more significantly as the problem becomes harder.
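The constant-time membership test that A-cells provide for ground literals, on which the comparison above relies, can be sketched as follows. The sketch is in Python rather than Java, and the class and function names are hypothetical; only the fields act, pac and nac come from the description of the A-cell mechanism.

```python
class ACell:
    """Activation cell: one per context (branch segment) of the MG tree."""

    def __init__(self):
        self.act = True           # True while the context is part of the current candidate


class Atom:
    """Unique object per atom P; P and its complement share it."""

    def __init__(self, name):
        self.name = name
        self.pac = None           # A-cell of the context in which P was last added
        self.nac = None           # A-cell of the context in which not-P was last added


def assert_literal(atom, cell, positive=True):
    """Record that P (or not-P) was added in the context owning `cell`."""
    if positive:
        atom.pac = cell
    else:
        atom.nac = cell


def holds(atom, positive=True):
    """O(1) test: is P (or not-P) in the current model candidate?"""
    cell = atom.pac if positive else atom.nac
    return cell is not None and cell.act


if __name__ == "__main__":
    a, c, e = Atom("a"), Atom("c"), Atom("e")
    a1, a2 = ACell(), ACell()
    assert_literal(a, a1)
    assert_literal(c, a2)
    assert_literal(e, a2, positive=False)
    a2.act = False                # leave the branch that introduced c and not-e
    print(holds(a), holds(c), holds(e, positive=False))
```

Leaving a branch costs a single flag update, however many literals were added in that context, which is the point of sharing one A-cell among all literals of the same context.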


Ryuzo Hasegawa et al.

Fig. 12. JavaMGTP vs. KlicMGTP

Fig. 13. JavaMGTP vs. SATCHMO

JavaMGTP vs. SATCHMO. SATCHMO solved three problems faster than JavaMGTP, while it failed to solve some problems due to memory overflow. This is because the proofs given by the two systems differ for such problems. For the other problems, SATCHMO gives the same proofs as JavaMGTP. Observe the problems after #66 in Fig. 13. JavaMGTP is 8 to 23 times as fast as SATCHMO for propositional cases. For first-order cases, JavaMGTP achieves a 27- to 38-fold speedup over SATCHMO for some problems, although the speedup is about 3 to 5 for most problems. In SATCHMO, since a model candidate M is maintained by using assert/retract of Prolog, the complexity of CJM is always O(|M|). JavaMGTP, on the other hand, can perform CJM of ground literals in O(1) with A-cells. As in Fig. 12, the effect of this is most remarkable for propositional problems. The difference in runtime for first-order problems is mainly caused by the difference in speed between the match-TM operation employed in JavaMGTP and the linear-search-based findall operation employed in SATCHMO. To get an instance of a literal, the latter takes time proportional to the number N of asserted literals, while the former takes constant time w.r.t. N.
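The complexity argument above can be illustrated with a small Python sketch. It is not code from JavaMGTP or SATCHMO: `ListStore` stands in for a model candidate kept as a list of asserted facts, `CellStore` approximates A-cells by one hash-addressed flag per ground atom, and the class names and the `probes` counter are assumptions introduced only for illustration.

```python
class ListStore:
    """Model candidate kept as a plain list, scanned linearly (O(|M|) per lookup)."""

    def __init__(self):
        self.atoms = []
        self.probes = 0  # number of element comparisons performed so far

    def add(self, atom):
        self.atoms.append(atom)

    def contains(self, atom):
        for stored in self.atoms:
            self.probes += 1
            if stored == atom:
                return True
        return False


class CellStore:
    """One flag per ground atom, addressed by hashing (expected O(1) per lookup)."""

    def __init__(self):
        self.cells = {}
        self.probes = 0  # number of cell accesses performed so far

    def add(self, atom):
        self.cells[atom] = True

    def contains(self, atom):
        self.probes += 1
        return self.cells.get(atom, False)


def antecedent_holds(store, antecedent):
    """Conjunctive matching for a ground antecedent: every literal must be in M."""
    return all(store.contains(atom) for atom in antecedent)
```

With a thousand stored atoms, checking a two-literal antecedent costs on the order of |M| comparisons per literal in `ListStore` but only one cell access per literal in `CellStore`, which mirrors the O(|M|) versus O(1) contrast drawn in the text.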

4 Extensions of MGTP Features

4.1 Extension for Constraint Solving

In this section, we present two extensions of the MGTP system for constraint solving. These extensions aim at solving constraint satisfaction problems in MGTP efficiently. MGTP provides a general framework for representing and solving first-order clauses, but it sometimes lacks the ability to propagate constraints by exploiting the structure of the problem (or domain).

As a first example, we consider quasigroup (QG) existence problems in finite algebra [3]. These problems can be defined as finite-domain constraint satisfaction problems. In solving them, we found that negative information should be propagated explicitly to prune redundant branches. This ability has been realized in an extension of MGTP called CMGTP.

Another example we consider here is channel routing problems in VLSI design. For these problems, interval constraint information must be propagated as well as negative information. This additional propagation ability has been realized in another extension of MGTP, called IV-MGTP.

CMGTP. In 1992, MGTP succeeded in solving several open quasigroup (QG) problems on the parallel inference machine PIM/m consisting of 256 processors [13]. Later, other theorem provers and constraint solvers such as DDPP, FINDER, and Eclipse solved further open problems more efficiently than the original MGTP. That research revealed that the original MGTP lacked the ability to propagate negative constraints. This motivated us to develop CMGTP [28,50], which allows negated atoms in MGTP clauses so that negative constraints can be propagated explicitly.

Quasigroup Problems. A quasigroup is a pair ⟨Q, ◦⟩ where Q is a finite set, ◦ is a binary operation on Q, and for any a, b, c ∈ Q,

    a ◦ b = a ◦ c ⇒ b = c
    a ◦ c = b ◦ c ⇒ a = b.

The multiplication table of this binary operation ◦ forms a Latin square (shown in Fig. 14).

The QG problems we tried to solve are classified into 7 categories (called QG1, QG2, ..., QG7), each of which is defined by adding some constraints to the original quasigroup constraints. For example, the QG5 constraint is defined as ∀X, Y ∈ Q. ((Y ◦ X) ◦ Y) ◦ Y = X. This constraint is represented with an MGTP clause:

    p(Y, X, A) ∧ p(A, Y, B) ∧ p(B, Y, C), X ≠ C → .    (1)

From the viewpoint of constraint propagation, rule (1) can be rewritten as follows²:

    p(Y, X, A) ∧ p(A, Y, B) → p(B, Y, X).    (2)

² In addition, we assume functionality in the arguments of p.

    p(Y, X, A) ∧ p(B, Y, X) → p(A, Y, B).    (3)
    p(B, Y, X) ∧ p(A, Y, B) → p(Y, X, A).    (4)

    ◦ | 1 2 3 4 5
    --+----------
    1 | 1 3 2 5 4
    2 | 5 2 4 3 1
    3 | 4 5 3 1 2
    4 | 2 1 5 4 3
    5 | 3 4 1 2 5

Fig. 14. Latin square (order 5)

These rules are still in the MGTP representation. To generate negative constraints, we add to the original MGTP rules extra rules containing negative atoms, obtained by taking contrapositives. For example, rule (2) can be augmented by the following rules:

    p(Y, X, A) ∧ ¬p(B, Y, X) → ¬p(A, Y, B).    (5)
    p(A, Y, B) ∧ ¬p(B, Y, X) → ¬p(Y, X, A).    (6)
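As a concrete check of the definitions above, the following Python sketch tests the Latin-square property and the QG5 constraint on the table of Fig. 14, and applies rule (5) to a pair of literals. The encoding of p(Y, X, A) as a triple (y, x, a) meaning y ◦ x = a, and all function names, are assumptions made for this illustration; the sketch is not part of CMGTP.

```python
# Multiplication table of Fig. 14: TABLE[a - 1][b - 1] is a ◦ b.
TABLE = [
    [1, 3, 2, 5, 4],
    [5, 2, 4, 3, 1],
    [4, 5, 3, 1, 2],
    [2, 1, 5, 4, 3],
    [3, 4, 1, 2, 5],
]


def op(table, a, b):
    """Return a ◦ b for elements numbered from 1."""
    return table[a - 1][b - 1]


def is_latin_square(table):
    """Every row and every column must contain each element exactly once."""
    n = len(table)
    symbols = set(range(1, n + 1))
    rows_ok = all(set(row) == symbols for row in table)
    cols_ok = all({table[i][j] for i in range(n)} == symbols for j in range(n))
    return rows_ok and cols_ok


def satisfies_qg5(table):
    """QG5: ((Y ◦ X) ◦ Y) ◦ Y = X for all X, Y."""
    elems = range(1, len(table) + 1)
    return all(
        op(table, op(table, op(table, y, x), y), y) == x
        for x in elems
        for y in elems
    )


def apply_rule_5(positive, negative):
    """Rule (5): p(Y, X, A) ∧ ¬p(B, Y, X) → ¬p(A, Y, B); returns the derived negative atoms."""
    derived = set()
    for (y, x, a) in positive:
        for (b, y2, x2) in negative:
            if (y2, x2) == (y, x):
                derived.add((a, y, b))
    return derived
```

For instance, from p(1, 2, 3) and ¬p(4, 1, 2), `apply_rule_5` derives ¬p(3, 1, 4), which is the kind of negative information CMGTP uses to simplify disjunctions.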

Rules (5) and (6) are each logically equivalent to (2), but they have a different operational meaning: if a negative atom is derived, it can simplify the current disjunctive clauses in the disjunction buffer D. This simplification can reduce the number of redundant branches significantly.

CMGTP Procedure. The structure of the model generation processes in CMGTP is basically the same as in MGTP. The differences between CMGTP and MGTP lie in the unit refutation and unit simplification processes with negative atoms. In CMGTP, negative atoms can be used explicitly to represent constraints. If both P and ¬P exist in the current model candidate M, then false is derived by the unit refutation mechanism. If, for a unit clause ¬Pi ∈ M (Pi ∈ M), there exists a disjunction which includes Pi (¬Pi), then Pi (¬Pi) is removed from that disjunction by the unit simplification mechanism. The refutation and simplification processes added to MGTP guarantee that, for any atom P ∈ M, P and ¬P are not in the current M simultaneously, and that the disjunctions in the current D have already been simplified by all unit clauses in M.

Experimental Results. Table 1 compares the experimental results for QG problems on CP, CMGTP and other systems. CP is an experimental program written in SICStus Prolog that is dedicated to QG problems [50]. In CP, the domain variable and its candidates to be assigned are represented with shared variables. The numbers of failed branches generated by CP and CMGTP are almost equal to those of DDPP and smaller than those of FINDER and MGTP. In fact, we confirmed that CP and CMGTP have the same pruning ability as DDPP by comparing the proof trees generated by these systems. The slight differences in the number of failed branches were caused by the different selection functions used.

Table 1. Comparison of experimental results for QG5 (number of failed branches)

    Order   DDPP   FINDER      MGTP    CP   CMGTP   IV-MGTP
        9     15       40       239    15      15        15
       10     50      356      7026    38      38        52
       11    136     1845     51904   117     117       167
       12    443    13527   2749676   372     372       320

For general performance, CP was superior to the other systems in almost every case. In particular, we obtained a new result in October 1993 that no model exists for QG5.16 by running CP on a SPARCstation-10 for 21 days. On the other hand, CMGTP is about 10 times slower than CP. The difference in speed is mainly caused by the term memory manipulation necessary for CMGTP.

IV-MGTP. In MGTP (CMGTP), interpretations (called model candidates) are represented as finite sets of ground atoms (literals). In many situations this turns out to be too redundant. Take, for example, variables I, J ranging over the domain {1, ..., 4}, and interpret ≤, + naturally. A rule like "p(I) ∧ {I + J ≤ 4} → q(J)" splits into three model extensions, q(1), q(2), q(3), if p(1) is present in the current model candidate. Now assume we have the rule "q(I) ∧ q(J) ∧ {I ≠ J} → .", saying that q is functional in its argument, and that, say, q(4) is derived from another rule. Then all three branches must be refuted separately.

Totally ordered, finite domains occur naturally in many problems. In such problems, situations such as the one just sketched are common. Thus we developed the IV-MGTP system [19] to enhance MGTP with mechanisms to deal with them efficiently.

Introducing Constrained Atoms into MGTP. In order to enhance MGTP with totally ordered, finite domain constraints, we adopt the notation p(t1, ..., tr, S1, ..., Sm) for what we call a constrained atom. This notation is motivated from the viewpoint of signed formula logic programming (SFLP) [37] and constraint logic programming (CLP) over finite domains [41]. Constrained atoms explicitly stipulate subsets of domains and thus are in solved form.
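The redundancy described in the IV-MGTP example can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: representing the consequent as a single atom q({1, 2, 3}) is one plausible reading of the constrained-atom idea, and the function names are hypothetical rather than taken from IV-MGTP.

```python
DOMAIN = range(1, 5)  # the domain {1, ..., 4} of I and J


def ground_extensions(i):
    """MGTP-style: one ground atom q(j), hence one branch, per admissible j."""
    return [("q", j) for j in DOMAIN if i + j <= 4]


def constrained_extension(i):
    """Constrained-atom style: a single atom q(S), S being the set of admissible j."""
    return ("q", frozenset(j for j in DOMAIN if i + j <= 4))


def refutations_needed_ground(branches, derived_value):
    """With q functional, every branch q(j) with j != derived_value is refuted separately."""
    return sum(1 for (_, j) in branches if j != derived_value)


def refuted_constrained(atom, derived_value):
    """A single domain intersection decides the constrained case."""
    _, values = atom
    return not (values & {derived_value})
```

With p(1) in the model candidate, the ground version creates three branches that must each be closed once q(4) is derived, whereas the constrained version detects the conflict with one empty intersection.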
The language of IV-MGTP needs to admit other forms of atoms in order to be practically useful for solving problems with totally ordered domains. An IV-MGTP atom is an expression p(t1, ..., tr, κ1, ..., κm), where each κi has one of the following forms:

1. {i1, ..., ir}, where ij ∈ N for 1 ≤ j ≤ r (κi is in solved form);
2. ]ι1, ι2[, where ιj (j = 1, 2) ∈ N ∪ CVar; the intended meaning is ]ι1, ι2[ = {i ∈ N | i < ι1 or i > ι2};
3. [ι1, ι2], where ιj (j = 1, 2) ∈ N ∪ CVar; the intended meaning is [ι1, ι2] = {i ∈ N | ι1 ≤ i ≤ ι2};
4. U ∈ DVar,

where CVar is a set of constraint variables, which hold elements of a domain N, and DVar is a set of domain variables, which hold subsets of a domain N. Since intervals play a central role in this framework, we named this extension of MGTP IV-MGTP.

For each predicate p with constrained arguments, an IV-MGTP program contains a declaration line of the form "declare p(t, ..., t, j1, ..., jm)". If the i-th place of p is t, then the i-th argument of p is a standard term; if the i-th place of p is a positive integer j, then the i-th argument of p is a constraint over the domain {1, ..., j}. Each IV-MGTP atom p(t1, ..., tr, κ1, ..., κm) consists of two parts: the standard term part p(t1, ..., tr) and the constraint part κ1, ..., κm. Each of r and m can be 0. The latter, m = 0, is in particular the case for a predicate that has no declaration. By this convention, every MGTP program is an IV-MGTP program. If m = 1 and the domain of κ1 is {1, 2}, IV-MGTP programs are equivalent to CMGTP programs in which {1} is interpreted as positive and {2} as negative. Hence, every CMGTP program is also an IV-MGTP program.

Model Candidates in IV-MGTP. While the deduction procedure for IV-MGTP is almost the same as for CMGTP, model candidates are treated differently. In MGTP, a list of current model candidates that represent Herbrand interpretations is kept during the deduction process, and model candidates can simply be identified with sets of ground atoms. The same holds in IV-MGTP, except that some places of a predicate contain a ground constraint in solved form (that is, a subset of a domain) instead of a ground term. Note that, while in MGTP one model candidate containing ground atoms {L1, ..., Lr} trivially represents exactly one possible interpretation of the set of atoms {L1, ..., Lr}, in IV-MGTP one model candidate represents many IV-MGTP interpretations, which differ in the constraint parts. Thus, model candidates can be conceived as sets of constrained atoms of the form p(t1, ..., tr, S1, ..., Sm), where the Si are subsets of the appropriate domain.

If M is a model candidate, p(t1, ..., tr) the ground term part, and ⟨S1, ..., Sm⟩ the constraint part in M, then define M(p(t1, ..., tr)) = ⟨S1, ..., Sm⟩. We say that a ground constrained atom L = p(t1, ..., tr, i1, ..., im) is satisfied by M (M |= L) iff there are domain elements s1 ∈ i1, ..., sm ∈ im such that ⟨s1, ..., sm⟩ ∈ M(p(t1, ..., tr)). Formally, a model candidate M is a partial function that maps ground instances of the term part of constrained atoms declared as "p(t, ..., t, j1, ..., jm)" into (2^{1,...,j1} − {∅}) × ··· × (2^{1,...,jm} − {∅}). Note that M(p(t1, ..., tr)) can be undefined.

Besides rejection, subsumption, and extension of a model candidate, in IV-MGTP there is a fourth possibility not present in MGTP, namely model candidate update. Model candidate update is really a combination of subsumption and rejection. Consider the following example.

Example 1. Let C = p({1, 2}) be the consequent of an IV-MGTP rule and assume M(p) = {2, 3}. The single atom in C is neither inconsistent with M nor subsumed by M. Yet the information contained in C is not identical to that in M, and it can be used to refine M to M(p) = {2}.

Channel Routing Problems. Channel routing problems in VLSI design can be represented as constraint satisfaction problems, in which connection requirements (what we call nets) between terminals must be solved under the condition that each net has a path disjoint from all others. For these problems, many specialized solvers employing heuristics have been developed. Our experiments are not primarily intended to compare IV-MGTP with such solvers, but to show the effectiveness of the interval/extraval representation and its domain calculation in the IV-MGTP procedure.

We consider a multi-layer channel which consists of multiple layers, each of which has multiple tracks. To simplify the problem, we assume in addition that each routing path makes no detour and contains only one track. Under this assumption, the problem can be formalized as determining the layer and the track numbers for each net with the help of constraints that express two binary relations: not equal (neq) and above. neq(N1, N2) means that the nets N1 and N2 do not share the same track. above(N1, N2) means that if N1 and N2 share the same layer, the track number of N1 must be larger than that of N2. For example, not-equal constraints for nets N1 and N2 are represented in IV-MGTP as follows:

    p(N1, [L, L], [T1, T1]) ∧ p(N2, [L, L], [T21, T22]) ∧ neq(N1, N2) → p(N2, [L, L], ]T1, T1[)

where the predicate p has two constraint domains: layer number L and track number Ti.

Experimental Results.
We developed an IV-MGTP prototype system in Java and ran experiments on a Sun Ultra 5 under JDK 1.2. The results are compared with those for the same problems formulated and run with CMGTP [50] (also written in Java [21]). We experimented with problems consisting of 6, 8, 10, and 12 net patterns on a 2-layer channel, each layer of which has 3 tracks. The results are shown in Table 2.

Table 2. Experimental results for the channel routing problem

    Nets  System     Models  Branches  Runtime (msec)
       6  IV-MGTP       250       286             168
       6  CMGTP         840       882              95
       8  IV-MGTP      1560      1808             706
       8  CMGTP       10296     10302             470
      10  IV-MGTP      4998      6238            2311
      10  CMGTP       51922     52000            3882
      12  IV-MGTP     13482     20092            7498
      12  CMGTP      538056    539982           31681

IV-MGTP reduces the number of models considerably. For example, we found the following model in a 6-net problem: { p(1, [1, 1], [3, 3]), p(2, [1, 1], [1, 1]), p(3, [1, 1], [2, 2]), p(4, [2, 2], [2, 3]), p(5, [2, 2], [1, 2]), p(6, [1, 1], [2, 3]) }, which contains 8 (= 1 × 1 × 1 × 2 × 2 × 2) CMGTP models. The advantage of using IV-MGTP is that the different feasible track numbers can be represented as interval constraints. In CMGTP, the above model is split into 8 different models. Obviously, as the number of nets increases, the reduction ratio of the number of models becomes larger. We conclude that IV-MGTP can effectively suppress unnecessary case splitting by using interval constraints and hence reduce the total size of proofs.

Because a CMGTP program can be translated into an IV-MGTP program, QG problems can also be expressed as IV-MGTP programs. IV-MGTP, however, cannot solve QG problems more efficiently than CMGTP; that is, QG problems do not benefit from the IV-MGTP representation and deduction process. The advantage gained by using IV-MGTP depends on how beneficial interval/extraval constraints are in the problem domain. For problems where the ordering of the domain elements has no significance, such as QG problems (whose numeric elements are treated strictly as symbolic values, not arithmetic values), CMGTP and IV-MGTP have essentially the same pruning effect. However, where reasoning on the arithmetic ordering between the elements is important, such as in channel routing problems, IV-MGTP outperforms CMGTP.

Completeness. MGTP provides a sound and complete procedure in the sense of standard Herbrand interpretation. The extensions CMGTP and IV-MGTP described above, however, lose completeness [19]. The reason is essentially the same as for the incompleteness of resolution and hypertableaux with an unrestricted selection function [18]. It can be demonstrated with the simple example P = {→ p, ¬q → ¬p, q →}. The program P is unsatisfiable, yet deduction procedures based on selection of only antecedent (or only consequent) literals cannot detect this. Likewise, the incomplete treatment of negation in CMGTP comes up with the incorrect model {p} for P. The example can be transferred to IV-MGTP³. Assume p and q are defined by "declare p(2)" and "declare q(2)". The idea is to represent a positive literal p with p({2}) and a negative literal ¬p with p({1}). Consider

    P = {→ p({2}), q({1}) → p({1}), q({2}) →}    (7)

which is unsatisfiable (recall that p and q are functional), but has an IV-MGTP model, where M(p) = {2} and M(q) is undefined.

³ We discuss only IV-MGTP in the rest of this section, because CMGTP can be considered as a special case of IV-MGTP.

In order to handle such cases, we adopt a non-standard semantics called extended interpretations, which is suggested in SFLP [37]. The basic idea underlying extended interpretations (e-interpretations) is to introduce the disjunctive information inherent to constraints into the interpretations themselves. An e-interpretation of a predicate p is a partial function I mapping ground instances of the term part of p into its constraint part. This means that the concepts introduced for model candidates can be used for e-interpretations. An extended interpretation I e-satisfies an IV-MGTP ground atom L = p(t1, ..., tr, S1, ..., Sm) iff I(p(t1, ..., tr)) is defined, has the value ⟨S1′, ..., Sm′⟩, and Si′ ⊆ Si for all 1 ≤ i ≤ m. Using the above definition, we have proved the following completeness theorem [19].

Theorem 1 (Completeness). An IV-MGTP program P having an IV-MGTP model M is e-satisfiable by M (viewed as an e-interpretation).

The case of CMGTP follows by a straightforward adaptation of this theorem and its proof.
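To make the IV-MGTP notions of this section more tangible, the following Python sketch models model candidate update (Example 1) and e-satisfaction for ground atoms. It is a minimal illustration, not the IV-MGTP implementation: constraint parts are tuples of frozensets, an undefined M(p(...)) is represented by a missing dictionary key, the function names are hypothetical, and the clause-level test in `clause_e_satisfied` is an assumed classical reading (a clause holds unless its whole antecedent is e-satisfied and no consequent atom is).

```python
def combine(current, new):
    """Compare a derived constraint part with the one stored in a model candidate.

    Both arguments are tuples of frozensets, one per constrained argument;
    `current` is None when M is undefined for the term part.
    """
    if current is None:
        return "extend", new
    meet = tuple(c & n for c, n in zip(current, new))
    if any(not s for s in meet):
        return "reject", None          # some domain became empty
    if meet == current:
        return "subsumed", current     # the new atom adds no information
    return "update", meet              # M is refined to the intersection


def e_satisfies(interp, term_part, constraint):
    """I e-satisfies p(t, S1..Sm) iff I(p(t)) is defined and each Si' is a subset of Si."""
    value = interp.get(term_part)
    if value is None:
        return False
    return all(v <= s for v, s in zip(value, constraint))


def clause_e_satisfied(interp, antecedent, consequent):
    """Assumed reading: violated only if all antecedent atoms hold and no consequent atom does."""
    if not all(e_satisfies(interp, t, c) for t, c in antecedent):
        return True
    return any(e_satisfies(interp, t, c) for t, c in consequent)
```

Applied to Example 1, `combine` returns an update to {2}. Applied to program (7) with M(p) = {2} and M(q) undefined, every clause passes `clause_e_satisfied`, which agrees with the statement of Theorem 1 under the assumed clause-level reading.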

4.2 Non-Horn Magic Set

The basic behavior of model generation theorem provers, such as SATCHMO and MGTP, is to detect a clause violated under some interpretation, called a model candidate, and to extend the model candidate so that the clause is satisfied. However, when there are several violated clauses, the computational cost may differ greatly according to the order in which those clauses are evaluated. In particular, when a non-Horn clause irrelevant to the given goal is selected, many of the interpretations generated with that clause become useless. Thus, the model generation method needs a way to suppress the generation of useless interpretations. To this end, Loveland et al. proposed a method called relevancy testing [56,36], which restricts the selection of a violated clause to those whose consequent literals are all relevant to the given goal ("totally relevant"). They then implemented this idea in SATCHMORE (SATCHMO with RElevancy). Let HC be a set of Horn clauses and I be the current model candidate. A relevant literal is defined as a goal called in a failed search to prove ⊥ from HC ∪ I, or a goal called in a failed search to prove the antecedent of a non-Horn clause by Prolog execution.


Relevancy testing can avoid useless model extension with irrelevant violated clauses. However, it incurs some overhead, because it computes relevant literals dynamically by running Prolog over the Horn clauses whenever violated non-Horn clauses are detected.

On the other hand, compared to top-down provers, a model generation prover like SATCHMO or MGTP can avoid solving duplicate subgoals because it is based on bottom-up evaluation. However, it also has the disadvantage of generating atoms that are irrelevant to proving the given goal. Thus it is necessary to combine bottom-up with top-down proving in order to use the goal information contained in negative clauses and to avoid generating useless model candidates. For this purpose, several methods such as magic sets, Alexander templates, and bottom-up meta-interpretation have been proposed in the field of deductive databases [9]. All of these transform the given Horn intensional databases into efficient Horn intensional databases, which generate only ground atoms relevant to the given goal in extensional databases. However, these methods were restricted to Horn programs.

To further extend these methods, we developed a new transformation method applicable to non-Horn clauses. We call it the non-Horn magic set (NHM) [24]. NHM is a natural extension of the magic set, yet it works within the framework of the model generation method. Another extension for non-Horn clauses has been proposed, which simulates top-down execution based on the model elimination procedure within a forward chaining paradigm [52].

In the NHM method, each clause in a given clause set is transformed into two types of clauses. One is used to simulate backward reasoning and the other to control inferences in forward reasoning. The set of transformed clauses is proven by bottom-up theorem provers. There are two kinds of transformation methods: the breadth-first NHM and the depth-first NHM. The former simulates breadth-first backward reasoning, and the latter simulates depth-first backward reasoning.

Breadth-first NHM. For the breadth-first NHM method, a clause A1 ∧ ··· ∧ An → B1 ∨ ··· ∨ Bm in the given clause set S is transformed into the following (extended) clauses:

    T_B^1: goal(B1) ∧ ... ∧ goal(Bm) → goal(A1) ∧ ... ∧ goal(An).
    T_B^2: goal(B1) ∧ ... ∧ goal(Bm) ∧ A1 ∧ ... ∧ An → B1 ∨ ... ∨ Bm.

In this transformation, for n = 0 (a positive clause), the first transformed clause T_B^1 is omitted. For m = 0 (a negative clause), the conjunction of goal(B1), ..., goal(Bm) becomes true. For n ≠ 0, two clauses T_B^1 and T_B^2 are obtained by the transformation.

Here, the meta-predicate goal(A) represents that the atom A is relevant to the goal and must be solved. The clause T_B^1 simulates top-down evaluation. Intuitively, T_B^1 means that when it is necessary to solve the consequent B1, ..., Bm of the original clause, it is necessary to solve the antecedent A1, ..., An before doing so. The n antecedent literals are solved in parallel. On the other hand, the clause T_B^2 simulates relevancy testing. T_B^2 means that a model extension with the consequent is performed only when A1, ..., An are satisfied by the current model candidate and all the consequent atoms B1, ..., Bm are relevant to the given goal. That is, the original clause is not used for model extension if there exists any consequent literal Bj such that Bj is not a goal.

Depth-first NHM. For the depth-first NHM transformation, a clause A1 ∧ ··· ∧ An → B1 ∨ ··· ∨ Bm in S is transformed into n + 1 (extended) clauses:

    T_D^1:     goal(B1) ∧ ... ∧ goal(Bm) → goal(A1) ∧ cont_{k,1}(Vk).
    T_D^2:     cont_{k,1}(Vk) ∧ A1 → goal(A2) ∧ cont_{k,2}(Vk).
        ...
    T_D^n:     cont_{k,(n−1)}(Vk) ∧ An−1 → goal(An) ∧ cont_{k,n}(Vk).
    T_D^{n+1}: cont_{k,n}(Vk) ∧ An → B1 ∨ ... ∨ Bm.

where k is the clause identifier of the original clause and Vk is the tuple of all variables appearing in the original clause. The transformed clauses are interpreted as follows. If all consequent literals B1, ..., Bm are goals, we first attempt to solve the first atom A1. At that time, the variable bindings obtained in the satisfiability checking of the antecedent are propagated to the next clause T_D^2 by the continuation literal cont_{k,1}(Vk). If atom A1 is solved under cont_{k,1}(Vk), then we attempt to solve the second atom A2, and so on. Unlike in the breadth-first NHM transformation, the n antecedent atoms are solved sequentially from A1 to An. During this process, the variable binding information is propagated from A1 to An in this order.

Several experimental results obtained so far suggest that the NHM and relevancy testing methods have a similar or the same pruning ability. To clarify this, we defined the concept of weak relevancy testing, which relaxes the condition of relevancy testing, and then proved that the NHM method is equivalent to weak relevancy testing in terms of the ability to prune redundant branches [45].

However, there are significant differences between NHM and SATCHMORE. First, SATCHMORE performs relevancy testing dynamically during the proof, while NHM is based on a static analysis of the input clauses and transforms them as a preprocessing step. Second, relevancy testing in SATCHMORE repeatedly calls Prolog to compute relevant literals backward whenever a new violated clause is found. This process often results in re-computation of the same relevant literals. In contrast, for NHM, goal literals are computed forward and their re-computation is avoided.
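The two NHM transformations can be sketched as a purely syntactic rewriting over clauses. The Python code below is an illustrative sketch rather than the authors' implementation: atoms are plain strings, `Rule` is a hypothetical container in which `conj_head=True` marks an extended clause with a conjunctive consequent, continuation literals are written `cont_k_i(...)`, and the treatment of n = 0 in the depth-first case (which the text does not spell out) is an assumption.

```python
from typing import NamedTuple, Tuple


class Rule(NamedTuple):
    body: Tuple[str, ...]      # conjunction of atoms; empty means "true"
    head: Tuple[str, ...]      # consequent atoms; empty means "false"
    conj_head: bool = False    # True: head is a conjunction (extended clause)


def goal(atom):
    return f"goal({atom})"


def breadth_first_nhm(antecedent, consequent):
    """Return T_B^1 (omitted when n = 0) and T_B^2 for A1..An -> B1 v ... v Bm."""
    goals = tuple(goal(b) for b in consequent)
    rules = []
    if antecedent:
        rules.append(Rule(goals, tuple(goal(a) for a in antecedent), True))
    rules.append(Rule(goals + tuple(antecedent), tuple(consequent)))
    return rules


def depth_first_nhm(k, antecedent, consequent, variables):
    """Return T_D^1 .. T_D^{n+1} for clause number k with variable tuple Vk."""
    goals = tuple(goal(b) for b in consequent)
    n = len(antecedent)
    if n == 0:
        # Assumption: a positive clause is only guarded by its goal literals.
        return [Rule(goals, tuple(consequent))]

    def cont(i):
        return f"cont_{k}_{i}({', '.join(variables)})"

    rules = [Rule(goals, (goal(antecedent[0]), cont(1)), True)]
    for i in range(1, n):
        rules.append(
            Rule((cont(i), antecedent[i - 1]), (goal(antecedent[i]), cont(i + 1)), True)
        )
    rules.append(Rule((cont(n), antecedent[n - 1]), tuple(consequent)))
    return rules
```

For a clause p(X) ∧ q(X, Y) → r(Y), the depth-first variant yields three rules chained by `cont_k_1` and `cont_k_2`, while the breadth-first variant yields the two rules T_B^1 and T_B^2.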

4.3 Eliminating Redundant Searches by Dependency Analysis

There are two types of redundancy in model generation. One is that the same subproof tree may be generated at several descendants after case splitting occurs. The other is caused by unnecessary model candidate extensions.

Folding-up is a well-known technique for eliminating duplicate subproofs in a tableaux framework [34]. In order to embed folding-up into model generation, we have to analyze dependency in a proof to extract lemmas from proven subproofs. Lemmas are used for pruning the remaining subproofs. Dependency analysis makes unnecessary parts visible, because such parts are independent of the essential parts of the proof. In other words, we can separate unnecessary parts from the proof according to dependency analysis. Identifying unnecessary parts and eliminating them are considered as proof simplification. The computational mechanism for their elimination is essentially the same as that for proof condensation [46] and level cut [2]. Taking this into consideration, we implemented not only folding-up but also proof condensation by embedding a single mechanism, i.e. proof simplification, into model generation [32].

In the following, we consider the function MG in Fig. 1 to be a builder of proof trees in which each leaf is labeled with ⊥ (for a failed branch, that is, UNSAT) or ⊤ (for a success branch, that is, SAT), and each non-leaf node is labeled with an atom used for model extension.

[Figure: a model extension with branches B1σ, ..., Biσ, ..., Bmσ and a subproof tree under each branch]

Fig. 15. Model extension

Definition 1 (Relevant atom). Let P be a finite proof tree. A set Rel(P) of relevant atoms of P is defined as follows:

1. If P = ⊥ and A1σ ∧ ... ∧ Anσ → is the negative clause used for building P, then Rel(P) = {A1σ, ..., Anσ}.
2. If P = ⊤, then Rel(P) = ∅.
3. If P is in the form depicted in Fig. 15, A1σ ∧ ... ∧ Anσ → B1σ ∨ ... ∨ Bmσ is the mixed or positive clause used for forming the root of P, and
   (a) ∀i (1 ≤ i ≤ m) Biσ ∈ Rel(Pi), then Rel(P) = ∪_{i=1}^{m} (Rel(Pi) \ {Biσ}) ∪ {A1σ, ..., Anσ};
   (b) ∃i (1 ≤ i ≤ m) Biσ ∉ Rel(Pi), then Rel(P) = Rel(Pi0) (where i0 is the minimal index satisfying 1 ≤ i0 ≤ m and Bi0σ ∉ Rel(Pi0)).

Informally, relevant atoms of a proof tree P are atoms which contribute to building P and appear as ancestors of P if P does not contain ⊤. If P contains ⊤, the set of relevant atoms of P is ∅.

Definition 2 (Relevant model extension). A model extension with a clause A1σ ∧ ... ∧ Anσ → B1σ ∨ ... ∨ Bmσ is relevant to the proof if the model extension yields the proof tree in the form depicted in Fig. 15 and either ∀i (1 ≤ i ≤ m) Biσ ∈ Rel(Pi) or ∃i (1 ≤ i ≤ m) (Pi contains ⊤) holds.
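Definition 1 translates directly into a recursive function over proof trees. The following Python sketch is an illustration under stated assumptions: the classes `Fail`, `Success` and `Extension` are hypothetical encodings of ⊥ leaves, successful leaves and model extensions, atoms are strings, and the code follows the three cases of the definition literally rather than reproducing the MGTP implementation.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Fail:
    """Leaf ⊥: closed by a negative clause with the given ground antecedent."""
    antecedent: FrozenSet[str]


@dataclass(frozen=True)
class Success:
    """Leaf of a successful (SAT) branch."""


@dataclass(frozen=True)
class Extension:
    """Inner node: model extension with A1σ ∧ ... ∧ Anσ → B1σ ∨ ... ∨ Bmσ."""
    antecedent: FrozenSet[str]
    branches: Tuple[Tuple[str, "Proof"], ...]  # (Biσ, Pi) pairs, left to right


Proof = Union[Fail, Success, Extension]


def rel(proof):
    """Relevant atoms of a proof tree, following the three cases of Definition 1."""
    if isinstance(proof, Fail):
        return set(proof.antecedent)
    if isinstance(proof, Success):
        return set()
    collected = set(proof.antecedent)
    for atom, subproof in proof.branches:
        sub_rel = rel(subproof)
        if atom not in sub_rel:
            # Case 3(b): the leftmost branch whose atom is unused decides Rel(P).
            return sub_rel
        collected |= sub_rel - {atom}
    # Case 3(a): every branch atom was used in its own subproof.
    return collected
```

Because the loop returns as soon as it meets a branch whose atom is not relevant, the branches to its right are never inspected; this corresponds to the observation that an irrelevant model extension can be replaced by a single one of its subproofs.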


We can eliminate irrelevant model extensions as follows. Let P be a proof tree in the form depicted in Fig. 15. If there exists a subproof tree Pi (1 ≤ i ≤ m) such that Biσ ∉ Rel(Pi) and Pi does not contain ⊤, we can conclude that the model extension forming the root of P is unnecessary, because Biσ does not contribute to Pi. Therefore, we can delete the other subproof trees Pj (1 ≤ j ≤ m, j ≠ i) and take Pi to be a simplified proof tree of P. When P contains ⊤, the model extension forming the root of P is necessary from a model-finding point of view.

Performing proof simplification during the proof, instead of after the proof has been completed, makes the model generation procedure more efficient. Let us assume that we build a proof tree P (in the form depicted in Fig. 15) in a left-first manner and check whether Biσ ∈ Rel(Pi) after Pi is built. If Biσ ∉ Rel(Pi) holds, we can skip building the proofs Pj (i < j ≤ m), because the model extension does not contribute to the proof Pi. Thus m − i out of m branches are eliminated after i branches have been explored. This proof elimination mechanism is essentially the same as the proof condensation [46] and level cut [2] facilities.

We can make use of a set of relevant atoms not only for proof condensation but also for generating lemmas.

Theorem 2. Let S be a set of clauses, M a set of ground atoms and P = MG(U0, D0, ∅). Note that MG in Fig. 1 is modified to return a proof tree. If all leaves in P are labeled with ⊥, i.e. P does not contain ⊤, then S ∪ Rel(P) is unsatisfiable.

This theorem says that a set of relevant atoms can be considered as a lemma. Consider the model generation procedure shown in Fig. 1. Let M be a current model candidate and P be a subproof tree which was previously obtained and does not contain ⊤. If M ⊃ Rel(P) holds, we can reject M without further proving, because S ∪ M is unsatisfiable, where S is the clause set to be proven.
This rejection mechanism can reduce search spaces by orders of magnitude. However, it is expensive to test whether M ⊃ Rel(P). Thus, we restrict the usage of the rejection mechanism.

Definition 3 (Context unit lemma). Let S be a set of clauses and P a proof tree of S in the form depicted in Fig. 15. When Biσ ∈ Rel(Pi), Rel(Pi) \ {Biσ} |=S ¬Biσ is called a context unit lemma⁴ extracted from Pi. We call Rel(Pi) \ {Biσ} the context of the lemma.

⁴ Γ |=S L is an abbreviation of S ∪ Γ |= L, where Γ is a set of ground literals, S is a set of clauses, and L is a literal.

Note that Biσ ∈ Rel(Pi) implies that Rel(Pi) is not empty. Therefore, Pi does not contain ⊤. Thus, S ∪ Rel(Pi) is unsatisfiable according to Theorem 2.

The context of the context unit lemma extracted from Pi (1 ≤ i ≤ m) is satisfied in the model candidates of the sibling proofs Pj (j ≠ i, 1 ≤ j ≤ m), that is, the lemma is available in Pj. Furthermore, the lemma can be lifted to the nearest ancestor node which does not satisfy the context (in other words, which is labeled with an atom in the context) and is available in the proofs of its descendants. Lifting context unit lemmas to appropriate nodes and using them for pruning the proof tree is an implementation of folding-up [34] for model generation.

In this way, not only folding-up but also proof condensation can be achieved by calculating sets of relevant atoms of proofs. We have already implemented the model generation procedure with folding-up and proof condensation and observed their pruning effects on some typical examples. For all non-Horn problems (1984 problems) in the TPTP library [53] version 2.2.1, the overall success rate was about 19% (cf. pure model generation 16%, Otter (v3.0.5) 27%⁵) for a time limit of 10 minutes on a Sun Ultra1 (143 MHz, 256 MB, Solaris 2.5.1) workstation.

4.4 Minimal Model Generation

The notion of minimal models is important in a wide range of areas such as logic programming, deductive databases, software verification, and hypothetical reasoning. Some applications in such areas actually need to generate Herbrand minimal models of a given set of first-order clauses. A model generation algorithm can generate all minimal Herbrand models if they are finite, though it may also generate non-minimal models [10]. Bry and Yahya proposed a sound (in the sense that it generates only minimal models) and complete (in the sense that it generates all minimal models) minimal model generation prover MM-SATCHMO [10]. It uses complement splitting (or folding-down in [34]) for pruning some branches leading to non-minimal models and constrained search for eliminating non-minimal models. Niemelä also presented a propositional tableaux calculus for minimal model reasoning [43], where he introduced the groundedness test, which substitutes for constrained search.

The following theorem says that a model eliminated by factorization [34] in the model generation process is not minimal. This implies that model generation with factorization is complete for generating minimal models. It is also known that factorization is more flexible than complement splitting for pruning redundant search spaces [34].

Theorem 3. Let P be a proof tree of a set S of clauses. We assume that N1 and N2 are sibling nodes in P, Ni is labeled with a literal Li, and Pi is the subproof tree under Ni (i = 1, 2), as shown in Fig. 16(a). If there is a node N3, descended from N2, labeled with L1, then for each model M found in proof tree P3, there exists a model M′ found in P1 such that M′ ⊂ M, where P3 is the subproof tree under N3 (Fig. 16(b)).

To avoid a circular argument, the proof tree has to be supplied with an additional factorization dependency relation.

⁵ This measurement was obtained in our own experiment with Otter alone (not Otter+MACE).
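To make the notion of minimality concrete, the following is a minimal Python sketch, not part of MGTP or MM-SATCHMO. It assumes that models are given as finite sets of ground atoms and simply discards every model that properly contains another one; the function name `minimal_models` is hypothetical.

```python
from typing import FrozenSet, Iterable, List

Model = FrozenSet[str]  # a Herbrand model as a finite set of ground atoms


def minimal_models(models: Iterable[Model]) -> List[Model]:
    """Keep only the models that have no proper subset among the given models."""
    unique = set(models)
    kept = [m for m in unique if not any(other < m for other in unique)]
    # Sort for a deterministic result order.
    return sorted(kept, key=lambda m: sorted(m))


if __name__ == "__main__":
    found = [frozenset({"p"}), frozenset({"p", "q"}), frozenset({"q", "r"})]
    for model in minimal_models(found):
        print(sorted(model))
```

This post-filter compares every pair of models, which is exactly the cost that complement splitting, factorization and the tests described in this section try to avoid.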

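[Figure 16: four proof-tree diagrams, panels (a) to (d). Recoverable labels: (a) sibling nodes N1 and N2 labeled L1 and L2, with subproof trees P1 and P2; (b) a node N3 labeled L1 below N2, with subproof tree P3; (c) nodes labeled L11, ..., L1i, ..., L1m1 below N1; (d) sibling nodes N1 (labeled L1) and N2 (labeled L2), with N3 (labeled L1) below N2.]

Fig. 16. Proof trees explaining Theorems 3 and 4 and Definition 5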

Definition 4 (Factorization dependency relation). A factorization dependency relation on a proof tree is a strict partial ordering ≺ relating sibling nodes in the tree (N1 ≺ N2 means that searching minimal models under N2 is delegated to that under N1).

Definition 5 (Factorization). Let a proof tree P and a factorization dependency relation ≺ on P be given. First, select a node N3 labeled with literal L1 and another node N1 labeled with the same literal L1 such that (1) N3 is a descendant of N2, which is a sibling node of N1, and (2) N2 ⊀ N1. Then, mark N3 with N1 and modify ≺ by first adding the pair of nodes ⟨N1, N2⟩ and then forming the transitive closure of the relation. We say that N3 has been factorized with N1. Marking N3 with N1 indicates that finding models under N3 is delegated to that under N1. The situation is depicted in Fig. 16(d).

Corollary 1. Let S be a set of clauses. If a minimal model M of S is built by model generation, then M is also built by model generation with factorization.

We can replace L1 ∨ L2 ∨ … ∨ Ln used for non-Horn extension with an augmented one, (L1 ∧ ¬L2 ∧ … ∧ ¬Ln) ∨ (L2 ∧ ¬L3 ∧ … ∧ ¬Ln) ∨ … ∨ Ln, which corresponds to complement splitting. Here a negated literal is called a branching assumption. If none of the branching assumptions ¬Li+1, …, ¬Ln is used in a branch expanded below Li, we can use ¬Li as a unit lemma in the proof of Lj (i + 1 ≤ j ≤ n). The unit lemma is called a branching lemma.

We consider model generation with complement splitting as pre-determining the factorization dependency relation on sibling nodes N1, …, Nm as follows: Nj ≺ Ni if i < j, for all i and j (1 ≤ i, j ≤ m). According to this view, complement splitting is a restricted way of implementing factorization.

We have proposed a minimal model generation procedure [23] that employs branching assumptions and lemmas. We consider model generation with branching assumptions and lemmas as arranging the factorization dependency relation on sibling nodes N1, …, Nm as follows: for each i (1 ≤ i ≤ m), Nj ≺ Ni for all j (i < j ≤ m) if Nj0 ≺ Ni for some j0 (i < j0 ≤ m), and otherwise Ni ≺ Nj for all j (i < j ≤ m). Using branching assumptions and lemmas can still be regarded as a restricted implementation of factorization. Nevertheless, it provides an efficient way of applying factorization to minimal model generation, since it is unnecessary to compute the transitive closure of the factorization dependency relation.

Table 3. Results of MM-MGTP and other systems

Problem      Measure   MM-MGTP (Rcmp)   MM-MGTP (Mchk)   MM-SATCHMO     MGTP
ex1 (N=5)    time      0.271            0.520            8869.950       0.199
             models    100000           100000           100000         100000
             failed    0                0                0              0
ex1 (N=7)    time      34.150           OM (>144)        OM (>40523)    19.817
             models    10000000         −                −              10000000
             failed    0                −                −              0
ex2 (N=14)   time      0.001            0.001            1107.360       9.013
             models    1                1                1              1594323
             failed    26               26               1594323        0
ex3 (N=16)   time      19.816           5.076            OM (>2798)     589.651
             models    65536            65536            −              86093442
             failed    1                1                −              0
ex3 (N=18)   time      98.200           26.483           OM (>1629)     5596.270
             models    262144           262144           −              774840978
             failed    1                1                −              0
ex4          time      0.002            0.002            0.3            0.004
             models    341              341              341            501
             failed    96               96               284            0
ex5          time      0.001            0.001            0.25           0.001
             models    17               17               17             129
             failed    84               84               608            0

time: time in seconds; models: number of models; failed: number of failed branches; OM: out of memory. MM-MGTP and MGTP were run on Java (Solaris JDK 1.2.1 03); MM-SATCHMO was run on ECLiPSe Prolog Version 3.5.2. All programs were run on a Sun Ultra10 (333MHz, 128MB).

In order to make the procedure sound in the sense that it generates only minimal models, it is necessary to test whether a generated model is minimal or not. The following theorem gives a necessary condition for a generated model to be non-minimal.

Theorem 4. Let S be a set of clauses and P a proof tree of S obtained by model generation with factorization. We assume that N1 and N2 are sibling nodes in P, Pi is the subproof tree under Ni, and Mi is a model found in Pi (i = 1, 2). If N2 ⊀ N1, then M1 ⊄ M2.

Theorem 4 says that we have to test whether M1 ⊂ M2 only when Mi is found under a node Ni (i = 1, 2) such that N2 ≺ N1.
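The bookkeeping behind the factorization dependency relation can be sketched as follows. This is an illustrative Python sketch, not the MM-MGTP code: the class and method names are hypothetical, nodes are plain strings, and it assumes that factorizing a node under N2 with N1 is allowed only when N2 does not already precede N1, so that adding the pair N1 ≺ N2 keeps the relation a strict partial order.

```python
class DependencyRelation:
    """A strict partial order on sibling nodes, kept transitively closed.

    A stored pair (a, b) stands for a ≺ b: searching minimal models
    under b is delegated to the search under a.
    """

    def __init__(self):
        self.pairs = set()

    def precedes(self, a, b):
        return (a, b) in self.pairs

    def can_factorize(self, n1, n2):
        # Assumed side condition: n2 must not already precede n1.
        return n1 != n2 and not self.precedes(n2, n1)

    def factorize(self, n1, n2):
        """Record n1 ≺ n2 and re-close the relation; return False if refused."""
        if not self.can_factorize(n1, n2):
            return False
        self.pairs.add((n1, n2))
        self._close()
        return True

    def needs_subset_test(self, n1, n2):
        # M1 ⊂ M2 has to be tested only when n2 ≺ n1 holds.
        return self.precedes(n2, n1)

    def _close(self):
        # Naive transitive closure: repeat until no new pair is derived.
        changed = True
        while changed:
            changed = False
            for a, b in list(self.pairs):
                for c, d in list(self.pairs):
                    if b == c and (a, d) not in self.pairs:
                        self.pairs.add((a, d))
                        changed = True


if __name__ == "__main__":
    rel = DependencyRelation()
    print(rel.factorize("N1", "N2"))  # records N1 ≺ N2
    print(rel.factorize("N2", "N1"))  # refused: would create a cycle
```

The closure step is what branching assumptions and lemmas avoid, since there the relation on siblings is fixed by their left-to-right order.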


We implemented a minimal model generation prover called MM-MGTP with branching assumptions and lemmas in Java [23]. The implementation takes Theorem 4 into account. Like MM-SATCHMO, it is applicable to first-order clauses. Table 3 shows experimental results for MM-MGTP, MM-SATCHMO, and MGTP. There are two versions of MM-MGTP: model checking (Mchk) and model re-computing (Rcmp). The former is based on constrained search and the latter on the groundedness test. Although the model checking MM-MGTP is similar to MM-SATCHMO, the way of treating model constraints differs somewhat. Instead of dynamically adding model constraints (negative clauses) to the given clause set, MM-MGTP retains them in the form of a model tree consisting only of models. Thus, the constrained search for minimal models in MM-SATCHMO is replaced by a model tree traversal for minimality testing. In the model re-computing version, a re-computation procedure for minimality testing is invoked instead of a model tree traversal. The procedure is the same as MG except that some routines are modified for restarting the execution. It returns UNSAT if the current model is minimal, otherwise SAT. The experimental results show a remarkable speedup compared to MM-SATCHMO. See [23] for a detailed discussion of the experiment.
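The UNSAT/SAT convention of the re-computation test can be illustrated on ground clauses. The sketch below is a brute-force stand-in, not the MG-based procedure used in MM-MGTP: it assumes each clause is a pair (antecedent atoms, consequent atoms) and searches the proper subsets of the current model for a smaller model. All names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Clause = Tuple[FrozenSet[str], FrozenSet[str]]  # (antecedent, consequent)


def satisfies(model: FrozenSet[str], clauses: List[Clause]) -> bool:
    """A clause holds if its antecedent fails or some consequent atom is true."""
    return all((not ante <= model) or bool(cons & model) for ante, cons in clauses)


def recompute(model: FrozenSet[str], clauses: List[Clause]) -> str:
    """Return "UNSAT" if no proper subset of `model` is a model, else "SAT"."""
    atoms = sorted(model)
    for size in range(len(atoms)):  # proper subsets only
        for subset in combinations(atoms, size):
            if satisfies(frozenset(subset), clauses):
                return "SAT"  # a smaller model exists, so `model` is not minimal
    return "UNSAT"  # `model` is minimal


if __name__ == "__main__":
    # Clauses:  -> p v q   and   p -> r
    clauses = [
        (frozenset(), frozenset({"p", "q"})),
        (frozenset({"p"}), frozenset({"r"})),
    ]
    print(recompute(frozenset({"p", "q", "r"}), clauses))
    print(recompute(frozenset({"p", "r"}), clauses))
```

The exhaustive subset search is exponential in the size of the model; it only serves to show what the test decides, not how it is decided efficiently.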

5 Applications

A model generation theorem prover has general reasoning power that is useful in various AI applications. In particular, we first implemented a propositional modal tableaux system on MGTP by representing each tableaux rule with MGTP input clauses. This approach has led to research on logic programming with negation as failure [29], abductive reasoning [30], modal logic systems [31], mode analysis of FGHC programs [54], and legal reasoning [44,27]. In the following sections, we focus on the issue of implementing negation as failure within a framework of model generation, and describe how this feature is used to build a legal reasoning system.

5.1 Embedding Negation as Failure into MGTP

Negation as failure is one of the most important techniques developed in the logic programming field, and logic programming supporting this feature can be a powerful knowledge representation tool. Accordingly, declarative semantics such as the answer set semantics have been given to extensions of logic programs containing both negation as failure (not) and classical negation (¬), where the negation as failure operator is considered to be a non-monotonic operator [16]. However, for such extended classes of logic programs, the top-down approach cannot be used for computing the answer set semantics because there is no local property in evaluating programs. Thus, we need bottom-up computation for correct evaluation of negation as failure formulas. For this purpose, we use the framework of MGTP, which can find the answer sets as the fixpoint of model candidates. Here, we introduce a method to transform any logic program (with negation as failure) into a positive disjunctive program (without negation as failure) [40] for which MGTP can compute the minimal models [29].

Translation into MGTP Rules. A positive disjunctive program is a set of rules of the form:

    A_1 | … | A_l ← A_{l+1}, …, A_m        (8)

where m ≥ l ≥ 0 and each A_i is an atom. The meaning of a positive disjunctive program P can be given by the minimal models of P [40]. The minimal models of positive disjunctive programs can be computed using MGTP. We represent each rule of the form (8) in a positive disjunctive program with the following MGTP input clause:

    A_{l+1} ∧ … ∧ A_m → A_1 ∨ … ∨ A_l        (9)

General and Extended Logic Programs. MGTP can also compute the stable models of a general logic program [15] and the answer sets of an extended disjunctive program [16] by translation into positive disjunctive programs. An extended disjunctive program is a set of rules of the form:

    L_1 | … | L_l ← L_{l+1}, …, L_m, not L_{m+1}, …, not L_n        (10)

where n ≥ m ≥ l ≥ 0 and each L_i is a literal. This logic program is called a general logic program if l ≤ 1 and each L_i is an atom. While a general logic program contains negation as failure but does not contain classical negation, an extended disjunctive program contains both of them.

In evaluating not L in a bottom-up manner, it is necessary to interpret not L with respect to a fixpoint of the computation, because even if L is not currently proved, L might be proved in subsequent inferences. When we have to evaluate not L in a current model candidate, we split the model candidate into two: (1) the model candidate where L is assumed not to hold, and (2) the model candidate where it is necessary that L holds. Each negation-as-failure formula not L is thus translated into negative and positive literals with a modality expressing belief, i.e., “disbelieve L” (written as ¬KL) and “believe L” (written as KL). Based on the above discussion, we translate each rule of the form (10) to the following MGTP rule:

    L_{l+1} ∧ … ∧ L_m → H_1 ∨ … ∨ H_l ∨ KL_{m+1} ∨ … ∨ KL_n        (11)

where H_i ≡ ¬KL_{m+1} ∧ … ∧ ¬KL_n ∧ L_i (i = 1, …, l). For any MGTP rule of the form (11), if a model candidate M satisfies L_{l+1}, …, L_m, then M is split into n − m + l (n ≥ m ≥ 0, 0 ≤ l ≤ 1) model candidates.
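The translation from a rule of the form (10) to an MGTP rule of the form (11) is mechanical, as the following Python sketch suggests. It is an illustration only: the `Rule` record, the string encoding of `K(...)` and `¬K(...)`, and the function name `translate` are assumptions made for this example and are not the MGTP input syntax.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Rule:
    heads: Tuple[str, ...]  # L_1, ..., L_l
    pos: Tuple[str, ...]    # L_{l+1}, ..., L_m
    naf: Tuple[str, ...]    # L_{m+1}, ..., L_n (the literals under "not")


def translate(rule: Rule) -> Tuple[List[str], List[List[str]]]:
    """Return (antecedent, disjuncts); each disjunct is a conjunction of literals."""
    disbelief = [f"¬K({lit})" for lit in rule.naf]
    # H_i: disbelieve every NAF literal and assert head L_i.
    head_branches = [disbelief + [head] for head in rule.heads]
    # One extra branch per NAF literal, in which that literal is believed.
    belief_branches = [[f"K({lit})"] for lit in rule.naf]
    return list(rule.pos), head_branches + belief_branches


if __name__ == "__main__":
    antecedent, disjuncts = translate(Rule(("p",), ("q",), ("r", "s")))
    print(antecedent)
    for branch in disjuncts:
        print(branch)
```

The number of disjuncts is l + (n − m), matching the number of model candidates produced by one application of the rule.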


In order to reject model candidates when their guesses turn out to be wrong, the following two schemata (integrity constraints) are introduced:

    ¬KL ∧ L →        for every literal L ∈ ℒ.    (12)
    ¬KL ∧ KL →       for every literal L ∈ ℒ.    (13)

In addition to the schemata above, we need the following three schemata to deal with classical negation. Below, L̄ is the literal complementary to a literal L.

    L ∧ L̄ →          for every literal L ∈ ℒ.    (14)
    KL ∧ L̄ →         for every literal L ∈ ℒ.    (15)
    KL ∧ KL̄ →        for every literal L ∈ ℒ.    (16)
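Schemata (12) to (16), together with the requirement that every believed literal is eventually derived, can be read as a filter on model candidates. The following Python sketch illustrates that reading. The encoding is an assumption of this example, not of MGTP: plain literals are strings, classical negation is a leading "-", and K-literals are tuples ("K", L) and ("notK", L).

```python
from typing import Set, Tuple, Union

Item = Union[str, Tuple[str, str]]


def complement(lit: str) -> str:
    """Classical complement: p <-> -p."""
    return lit[1:] if lit.startswith("-") else "-" + lit


def split(candidate: Set[Item]):
    plain = {x for x in candidate if isinstance(x, str)}
    believed = {x[1] for x in candidate if isinstance(x, tuple) and x[0] == "K"}
    disbelieved = {x[1] for x in candidate if isinstance(x, tuple) and x[0] == "notK"}
    return plain, believed, disbelieved


def violates_constraints(candidate: Set[Item]) -> bool:
    plain, believed, disbelieved = split(candidate)
    return bool(
        disbelieved & plain                                      # (12)
        or disbelieved & believed                                # (13)
        or any(complement(l) in plain for l in plain)            # (14)
        or any(complement(l) in plain for l in believed)         # (15)
        or any(complement(l) in believed for l in believed)      # (16)
    )


def is_stable(candidate: Set[Item]) -> bool:
    """Every believed literal must actually be in the candidate."""
    plain, believed, _ = split(candidate)
    return believed <= plain


if __name__ == "__main__":
    # Candidates produced for the one-rule program  p <- not q.
    for cand in ({("notK", "q"), "p"}, {("K", "q")}):
        print(not violates_constraints(cand) and is_stable(cand))
```

For the program p ← not q, the first candidate passes both checks and yields the answer set {p}; the second believes q without ever deriving it and is discarded.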

Next is the condition that guarantees stability at a fixpoint, namely that all of the guesses made so far in a model candidate M are correct: for every ground literal L, if KL ∈ M, then L ∈ M. The above computation by MGTP is sound and complete with respect to the answer set semantics. This technique is simply based on a bottom-up model generation method together with integrity constraints over K-literals expressed by object-level schemata on MGTP. Compared with other approaches, the proposed method has several computational advantages: put simply, it can find all minimal models for every class of groundable logic program or disjunctive database, incrementally, without backtracking, and in parallel. This method has been applied to a legal reasoning system [44].

5.2 Legal Reasoning

As a real application, MGTP has been applied to a legal reasoning system [44,27]. Since legal rules involve uncertainty and inconsistency, we have to introduce a language other than the MGTP input language for users to represent laws and judicial precedents. In this section, we show an extended logic programming language and a method to translate it into MGTP input clauses in order to solve legal problems automatically using MGTP.

Extended Logic Programming Language. In our legal reasoning system, we adopted the extended logic programming language defined below to represent legal knowledge and judicial precedents. We consider rules of the form:

    R :: L_0 ← L_1 ∧ … ∧ L_m ∧ not L_{m+1} ∧ … ∧ not L_n.        (17)
    R :: ← L_1 ∧ L_2.                                              (18)
    R :: L_0 ⇐ L_1 ∧ … ∧ L_m ∧ not L_{m+1} ∧ … ∧ not L_n.        (19)


where L_i (0 ≤ i ≤ n) represents a literal, not represents negation as failure (NAF), and R is a rule identifier, which has all variables occurring in L_i (0 ≤ i ≤ n) as its arguments. (17) is called an exact rule: if all literals in the rule body are assigned true, then the rule head is assigned true without exception. (18) is called an integrity constraint: L_1 and L_2 must not be assigned true in the same context. (19) is called a default rule: if all literals in the rule body are assigned true, then the rule head is assigned true unless it causes a conflict or violates an integrity constraint.

Example:

    r1(X) :: fly(X) ⇐ bird(X) ∧ not baby(X).
    r2(X) :: ¬fly(X) ⇐ penguin(X).
    r3(X) :: bird(X) ← penguin(X).
    f1 :: bird(a).
    f2 :: penguin(a).
    f3 :: baby(b).

In this example, r1(X) can derive fly(a), which is inconsistent with ¬fly(a) derived from r2(X). Since r1(X) and r2(X) are both default rules, we cannot conclude whether a flies or a does not fly. If, however, r2(X) were defined as a more specific rule than r1(X), that is, if r2(X) were preferred to r1(X), ¬fly(a) could defeat fly(a). In order to realize such reasoning about rule preference, we introduce another form of literal, R1 < R2, which means “rule R2 is preferred to R1” (where R1 and R2 are rule identifiers with arguments). For example, the following rule represents that r2(X) is preferred to r1(X) when X is a bird:

    r4(X) :: r1(X) < r2(X) ← bird(X).

If we regard it as a default rule, we can replace ← with ⇐. The rule preference defined above is called dynamic in the sense that the preference is determined according to its arguments.

Semantics of the Rule Preference. Many semantics for rule preference structures have been proposed: introducing a predicate preference relation into circumscription [35,17], introducing a rule preference relation into default theory [4,8,1,5,6], using a literal preference relation [7,48], and defining the semantics by translation rules [33,47]. Among these, our system adopted the approach presented in [33], because it can be easily applied to legal reasoning and is easy to translate into MGTP input clauses.

Translation into the MGTP Input Clauses. Assume we have the following two default rules:

    R1 :: L^1_0 ⇐ L^1_1 ∧ … ∧ L^1_m ∧ not L^1_{m+1} ∧ … ∧ not L^1_n.
    R2 :: L^2_0 ⇐ L^2_1 ∧ … ∧ L^2_k ∧ not L^2_{k+1} ∧ … ∧ not L^2_q.

Then R1 is translated to:

    L^1_0 ← L^1_1 ∧ … ∧ L^1_m ∧ not L^1_{m+1} ∧ … ∧ not L^1_n ∧ not defeated(R1).

This translation shows the interpretation of our default rules, that is, the rule head can be derived if the rule body is satisfied and there is no proof that R1 can be defeated. The predicate defeated is newly introduced and defined by the following rule:

    defeated(R2θ) ← L^1_1θ ∧ … ∧ L^1_mθ ∧ not L^1_{m+1}θ ∧ … ∧ not L^1_nθ ∧ not defeated(R1θ) ∧
                    L^2_1θ ∧ … ∧ L^2_kθ ∧ not L^2_{k+1}θ ∧ … ∧ not L^2_qθ ∧ not R1θ < R2θ.

where θ is a most general unifier satisfying one of the following conditions: L^1_0θ = ¬L^2_0θ; or, for some integrity constraint ← L_1 ∧ L_2, either L_1θ = L^1_0θ and L_2θ = L^2_0θ, or L_2θ = L^1_0θ and L_1θ = L^2_0θ. In this way, default rules with rule preference relations are translated to rules with NAF. The deduction process in MGTP for such rule sets is based on [29].

Introducing Modal Operator. For each NAF literal in a rule, a modal operator K is introduced. If we have the following clause:

    A_l ← A_{l+1} ∧ … ∧ A_m ∧ not A_{m+1} ∧ … ∧ not A_n

then we translate it with modal operators into:

    A_{l+1} ∧ … ∧ A_m → (¬KA_{m+1} ∧ … ∧ ¬KA_n ∧ A_l) ∨ KA_{m+1} ∨ … ∨ KA_n

In addition, we provide integrity constraints for K such as P ∧ ¬KP →, which enable MGTP to derive the stable models for the given input clauses. These integrity constraints are built into the MGTP deduction process with slight modification.

Extracting Stable Models. The models derived by MGTP contain not only all possible stable models but also models that are constructed only from hypotheses. A stable model must satisfy the following condition, called the T-condition. The T-condition is a criterion for extracting the final stable models from the models derived by MGTP.

T-Condition. If KP ∈ M, then P ∈ M.

If the proof structure included in a stable model also occurs in all the other stable models, we call it a justified argument, otherwise a plausible argument. Justified arguments are sound against any attacks on them, while plausible arguments are not sound against some attacks, that is, they might be attacked by some arguments and cause a conflict.
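As a worked illustration, the translation of default rules can be instantiated on the rules r1(X) and r2(X) of the bird example, whose heads fly(X) and ¬fly(X) are complementary. The block below is a sketch under two assumptions: the translation schema is applied to the pair in both orders, and "not (R < R′)" abbreviates the NAF literal written "not R < R′" in the text.

```latex
\begin{align*}
\mathit{fly}(X) &\leftarrow \mathit{bird}(X) \wedge \mathit{not}\ \mathit{baby}(X)
    \wedge \mathit{not}\ \mathit{defeated}(r1(X)).\\
\neg\mathit{fly}(X) &\leftarrow \mathit{penguin}(X)
    \wedge \mathit{not}\ \mathit{defeated}(r2(X)).\\
\mathit{defeated}(r2(X)) &\leftarrow \mathit{bird}(X) \wedge \mathit{not}\ \mathit{baby}(X)
    \wedge \mathit{not}\ \mathit{defeated}(r1(X))\\
  &\qquad \wedge\ \mathit{penguin}(X) \wedge \mathit{not}\ (r1(X) < r2(X)).\\
\mathit{defeated}(r1(X)) &\leftarrow \mathit{penguin}(X)
    \wedge \mathit{not}\ \mathit{defeated}(r2(X))\\
  &\qquad \wedge\ \mathit{bird}(X) \wedge \mathit{not}\ \mathit{baby}(X)
    \wedge \mathit{not}\ (r2(X) < r1(X)).
\end{align*}
```

With the preference rule r4(X), the literal r1(a) < r2(a) holds for the bird a, so the rule for defeated(r2(a)) is blocked while defeated(r1(a)) can be derived. Under this reading, ¬fly(a) is concluded and fly(a) is not, which agrees with the informal discussion of the example.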


Fig. 17. The interface window in the argumentation support system

System and Experiments. We have developed an argumentation support system [27] that includes the legal reasoning system based on MGTP. The system is written in Java and runs on each client machine, which is connected with the other clients via a TCP/IP network. Each participant (including the parties concerned and, if needed, a judge) constructs argument diagrams according to his/her own assertions, by hand or sometimes automatically, and sends them to all the others. Figure 17 shows an example of argument diagrams in the user interface window. The system maintains the current status of each node, that is, agreed by all, disagreed by someone, attacked by some nodes, attacking some nodes, etc. Based on these statuses, the judge, if necessary, intervenes in the arguments and undertakes mediation. As an experiment, we implemented a part of Japanese civil law on the system. More than 10 legal experts used the system, investigated the arguments that were automatically derived by the legal reasoning system, and rated highly the expressiveness of the extended logic programming language, the negotiation protocol adopted, and the efficiency of reasoning.

6 Conclusion

We have reviewed research and development of the model generation theorem prover MGTP, including our recent activities around it. MGTP is one of the successful application systems developed in the FGCS project. MGTP achieved more than a 200-fold speedup on a PIM/m consisting of 256 PEs for many theorem proving benchmarks. By using parallel MGTP systems, we succeeded in solving some hard mathematical problems such as condensed detachment problems and quasigroup existence problems in finite algebra.

In the current parallel implementation, however, we have to use an AND-parallel MGTP for Horn problems and an OR-parallel MGTP for non-Horn problems separately. Thus, it is necessary to develop a parallel version of MGTP which can combine AND- and OR-parallelization for proving a set of general clauses. In addition, when running MGTP (written in Klic [14]) on other commercial parallel computers, it is difficult to attain as good a parallel performance as on PIM for problems that require fine-grain concurrency. At present, the N-sequential method, which exploits coarse-grain concurrency with low communication costs, would be a practical solution for this.

Recent results with Java versions of MGTP (JavaMGTP) show a speedup of several tens of times compared to the Klic versions. This achievement is largely due to the new A-cell mechanism for handling multiple contexts and several language facilities of Java, including destructive assignment to variables.

To enhance MGTP’s pruning ability, we extended the MGTP features in several ways. NHM is a key technology for making MGTP practical and applicable to several applications such as disjunctive databases and abductive reasoning. The essence of the NHM method is to simulate a top-down evaluation in a framework of bottom-up computation by static clause transformation to propagate goal (negative) information, thereby pruning search spaces. This propagation is closely related to the technique developed in CMGTP to manipulate (negative) constraints. Thus, further research is needed to clarify whether the NHM method can be incorporated into CMGTP or its extended version, IVMGTP. It is also important in real applications that MGTP avoids duplicating the same subproofs and generating non-minimal models.
The proof simplification based on dependency analysis is a technique to embed both folding-up and proof condensation in a model generation framework, and has a similar effect to NHM. Although proof simplification is weaker than NHM in the sense that relevancy testing is performed after a model extension occurs, this is compensated for by the embedded folding-up function. Incorporating this method into the minimal model generation prover MM-MGTP would further enhance its pruning ability.

Lastly, we have shown that negation as failure, one of the most important inventions in logic programming, can be easily implemented on MGTP, and have presented a legal reasoning system employing this feature. The basic idea behind this is to translate formulas with special properties, such as non-monotonicity and modality, into first-order clauses on which MGTP works as a meta-interpreter. The manipulation of these properties is thus reduced to generate-and-test problems for model candidates. These can then be handled by MGTP very efficiently through case-splitting of disjunctive consequences and rejection of inconsistent model candidates.

A family of MGTP systems is available at http://ss104.is.kyushu-u.ac.jp/software/.


Acknowledgment

We would like to thank Prof. Kazuhiro Fuchi of Keio University, the then director of ICOT, and Prof. Koichi Furukawa of Keio University, the then deputy director of ICOT, who have given us continuous support and helpful comments during the Fifth Generation Computer Systems Project. Thanks are also due to members of the MGTP research group including Associate Prof. Katsumi Inoue of Kobe University and Prof. Katsumi Nitta of Tokyo Institute of Technology for their fruitful discussions and cooperation.

References

1. Franz Baader and Bernhard Hollunder. How to prefer more specific defaults in terminological default logic. In Proc. International Joint Conference on Artificial Intelligence, pages 669–674, 1993.
2. Peter Baumgartner, Ulrich Furbach, and Ilkka Niemelä. Hyper Tableaux. In José Júlio Alferes, Luís Moniz Pereira, and Ewa Orłowska, editors, Proc. European Workshop: Logics in Artificial Intelligence, JELIA, volume 1126 of Lecture Notes in Artificial Intelligence, pages 1–17. Springer-Verlag, 1996.
3. Frank Bennett. Quasigroup Identities and Mendelsohn Designs. Canadian Journal of Mathematics, 41:341–368, 1989.
4. Gerhard Brewka. Preferred subtheories: An extended logical framework for default reasoning. In Proc. International Joint Conference on Artificial Intelligence, pages 1043–1048, Detroit, MI, USA, 1989.
5. Gerhard Brewka. Adding priorities and specificity to default logic. In Proc. JELIA 94, pages 247–260, 1994.
6. Gerhard Brewka. Reasoning about priorities in default logic. In Proc. AAAI 94, pages 940–945, 1994.
7. Gerhard Brewka. Well-founded semantics for extended logic programs with dynamic preference. Journal of Artificial Intelligence Research, 4:19–36, 1996.
8. Gerhard Brewka and Thomas F. Gordon. How to Buy a Porsche: An Approach to defeasible decision making. In Proc. AAAI-94 Workshop on Computational Dialectics, 1994.
9. François Bry. Query evaluation in recursive databases: bottom-up and top-down reconciled. Data & Knowledge Engineering, 5:289–312, 1990.
10. François Bry and Adnan Yahya. Minimal Model Generation with Positive Unit Hyper-Resolution Tableaux. In Proc. 5th International Workshop, TABLEAUX’96, volume 1071 of Lecture Notes in Artificial Intelligence, pages 143–159, Terrasini, Palermo, Italy, May 1996. Springer-Verlag.
11. Hiroshi Fujita and Ryuzo Hasegawa. A Model-Generation Theorem Prover in KL1 Using Ramified Stack Algorithm. In Proc. 8th International Conference on Logic Programming, pages 535–548. The MIT Press, 1991.
12. Masayuki Fujita, Ryuzo Hasegawa, Miyuki Koshimura, and Hiroshi Fujita. Model Generation Theorem Provers on a Parallel Inference Machine. In Proc. International Conference on Fifth Generation Computer Systems, volume 1, pages 357–375, Tokyo, Japan, June 1992.
13. Masayuki Fujita, John Slaney, and Frank Bennett. Automatic Generation of Some Results in Finite Algebra. In Proc. International Joint Conference on Artificial Intelligence, 1993.


14. Tetsuro Fujita, Takashi Chikayama, Kazuaki Rokuwasa, and Akihiko Nakase. KLIC: A Portable Implementation of KL1. In Proc. International Conference on Fifth Generation Computer Systems, pages 66–79, Tokyo, Japan, December 1994.
15. Michael Gelfond and Vladimir Lifschitz. The Stable Model Semantics for Logic Programming. In Proc. 5th International Conference and Symposium on Logic Programming, pages 1070–1080. MIT Press, 1988.
16. Michael Gelfond and Vladimir Lifschitz. Classical Negation in Logic Programs and Disjunctive Databases. New Generation Computing, 9:365–385, 1991.
17. Benjamin Grosof. Generalization Prioritization. In Proc. 2nd Conference on Knowledge Representation and Reasoning, pages 289–300, 1991.
18. Reiner Hähnle. Tableaux and related methods. In Alan Robinson and Andrei Voronkov, editors, Handbook of Automated Reasoning, volume I. North-Holland, 2001.
19. Reiner Hähnle, Ryuzo Hasegawa, and Yasuyuki Shirai. Model Generation Theorem Proving with Finite Interval Constraints. In Proc. First International Conference on Computational Logic (CL2000), 2000.
20. Ryuzo Hasegawa and Hiroshi Fujita. Implementing a Model-Generation Based Theorem Prover MGTP in Java. Research Reports on Information Science and Electrical Engineering, 3(1):63–68, 1998.
21. Ryuzo Hasegawa and Hiroshi Fujita. A new Implementation Technique for a Model-Generation Theorem Prover to Solve Constraint Satisfaction Problems. Research Reports on Information Science and Electrical Engineering, 4(1):57–62, 1999.
22. Ryuzo Hasegawa, Hiroshi Fujita, and Miyuki Koshimura. MGTP: A Parallel Theorem-Proving System Based on Model Generation. In Proc. 11th International Conference on Applications of Prolog, pages 34–41, Tokyo, Japan, September 1998.
23. Ryuzo Hasegawa, Hiroshi Fujita, and Miyuki Koshimura. Efficient Minimal Model Generation Using Branching Lemmas. In Proc. 17th International Conference on Automated Deduction, volume 1831 of Lecture Notes in Artificial Intelligence, pages 184–199, Pittsburgh, Pennsylvania, USA, June 2000. Springer-Verlag.
24. Ryuzo Hasegawa, Katsumi Inoue, Yoshihiko Ohta, and Miyuki Koshimura. Non-Horn Magic Sets to Incorporate Top-down Inference into Bottom-up Theorem Proving. In Proc. 14th International Conference on Automated Deduction, volume 1249 of Lecture Notes in Artificial Intelligence, pages 176–190, Townsville, North Queensland, Australia, July 1997. Springer-Verlag.
25. Ryuzo Hasegawa and Miyuki Koshimura. An AND Parallelization Method for MGTP and Its Evaluation. In Proc. First International Symposium on Parallel Symbolic Computation, Lecture Notes Series on Computing, pages 194–203. World Scientific, September 1994.
26. Ryuzo Hasegawa, Miyuki Koshimura, and Hiroshi Fujita. Lazy Model Generation for Improving the Efficiency of Forward Reasoning Theorem Provers. In Proc. International Workshop on Automated Reasoning, pages 221–238, Beijing, China, July 1992.
27. Ryuzo Hasegawa, Katsumi Nitta, and Yasuyuki Shirai. The Development of an Argumentation Support System Using Theorem Proving Technologies. In Research Report on Advanced Software Enrichment Program 1997, pages 59–66. Information Promotion Agency, Japan, 1999. (in Japanese).
28. Ryuzo Hasegawa and Yasuyuki Shirai. Constraint Propagation of CP and CMGTP: Experiments on Quasigroup Problems. In Proc. Workshop 1C (Automated Reasoning in Algebra), CADE-12, Nancy, France, 1994.


29. Katsumi Inoue, Miyuki Koshimura, and Ryuzo Hasegawa. Embedding Negation as Failure into a Model Generation Theorem Prover. In Proc. 11th International Conference on Automated Deduction, volume 607 of Lecture Notes in Artificial Intelligence, pages 400–415, Saratoga Springs, NY, USA, 1992. Springer-Verlag.
30. Katsumi Inoue, Yoshihiko Ohta, Ryuzo Hasegawa, and Makoto Nakashima. Bottom-Up Abduction by Model Generation. In Proc. International Joint Conference on Artificial Intelligence, pages 102–108, 1993.
31. Miyuki Koshimura and Ryuzo Hasegawa. Modal Propositional Tableaux in a Model Generation Theorem Prover. In Proc. 3rd Workshop on Theorem Proving with Analytic Tableaux and Related Methods, pages 145–151, May 1994.
32. Miyuki Koshimura and Ryuzo Hasegawa. Proof Simplification for Model Generation and Its Applications. In Proc. 7th International Conference, LPAR 2000, volume 1955 of Lecture Notes in Artificial Intelligence, pages 96–113. Springer-Verlag, November 2000.
33. Robert A. Kowalski and Francesca Toni. Abstract Argumentation. Artificial Intelligence and Law Journal, 4:275–296, 1996.
34. Reinhold Letz, Klaus Mayr, and Christoph Goller. Controlled Integration of the Cut Rule into Connection Tableau Calculi. Journal of Automated Reasoning, 13:297–337, 1994.
35. Vladimir Lifschitz. Computing Circumscription. In Proc. International Joint Conference on Artificial Intelligence, pages 121–127, Los Angeles, CA, USA, 1985.
36. Donald W. Loveland, David W. Reed, and Debra S. Wilson. Satchmore: Satchmo with RElevancy. Journal of Automated Reasoning, 14(2):325–351, April 1995.
37. James J. Lu. Logic Programming with Signs and Annotations. Journal of Logic and Computation, 6(6):755–778, 1996.
38. Rainer Manthey and François Bry. SATCHMO: a theorem prover implemented in Prolog. In Proc. 9th International Conference on Automated Deduction, volume 310 of Lecture Notes in Computer Science, pages 415–434, Argonne, Illinois, USA, May 1988. Springer-Verlag.
39. William McCune and Larry Wos. Experiments in Automated Deduction with Condensed Detachment. In Proc. 11th International Conference on Automated Deduction, volume 607 of Lecture Notes in Artificial Intelligence, pages 209–223, Saratoga Springs, NY, USA, 1992. Springer-Verlag.
40. Jack Minker. On indefinite databases and the closed world assumption. In Proc. 6th International Conference on Automated Deduction, volume 138 of Lecture Notes in Computer Science, pages 292–308, Courant Institute, USA, 1982. Springer-Verlag.
41. Ugo Montanari and Francesca Rossi. Finite Domain Constraint Solving and Constraint Logic Programming. In Constraint Logic Programming: Selected Research, pages 201–221. The MIT Press, 1993.
42. Hiroshi Nakashima, Katsuto Nakajima, Seiichi Kondo, Yasutaka Takeda, Yū Inamura, Satoshi Onishi, and Kanae Matsuda. Architecture and Implementation of PIM/m. In Proc. International Conference on Fifth Generation Computer Systems, volume 1, pages 425–435, Tokyo, Japan, June 1992.
43. Ilkka Niemelä. A Tableau Calculus for Minimal Model Reasoning. In Proc. 5th International Workshop, TABLEAUX’96, volume 1071 of Lecture Notes in Artificial Intelligence, pages 278–294, Terrasini, Palermo, Italy, May 1996. Springer-Verlag.
44. Katsumi Nitta, Yoshihisa Ohtake, Shigeru Maeda, Masayuki Ono, Hiroshi Ohsaki, and Kiyokazu Sakane. HELIC-II: A Legal Reasoning System on the Parallel Inference Machine. In Proc. International Conference on Fifth Generation Computer Systems, volume 2, pages 1115–1124, Tokyo, Japan, June 1992.


45. Yoshihiko Ohta, Katsumi Inoue, and Ryuzo Hasegawa. On the Relationship Between Non-Horn Magic Sets and Relevancy Testing. In Proc. 15th International Conference on Automated Deduction, volume 1421 of Lecture Notes in Artificial Intelligence, pages 333–349, Lindau, Germany, July 1998. Springer-Verlag.
46. Franz Oppacher and E. Suen. HARP: A Tableau-Based Theorem Prover. Journal of Automated Reasoning, 4:69–100, 1988.
47. Henry Prakken and Giovanni Sartor. Argument-based Extended Logic Programming with Defeasible Priorities. Journal of Applied Non-Classical Logics, 7:25–75, 1997.
48. Chiaki Sakama and Katsumi Inoue. Representing Priorities in Logic Programs. In Proc. International Conference and Symposium on Logic Programming, pages 82–96, 1996.
49. Heribert Schütz and Tim Geisler. Efficient Model Generation through Compilation. In Proc. 13th International Conference on Automated Deduction, volume 1104 of Lecture Notes in Artificial Intelligence, pages 433–447. Springer-Verlag, 1996.
50. Yasuyuki Shirai and Ryuzo Hasegawa. Two Approaches for Finite-domain Constraint Satisfaction Problem - CP and MGTP -. In Proc. 12th International Conference on Logic Programming, pages 249–263. MIT Press, 1995.
51. Mark Stickel. The Path-Indexing Method For Indexing Terms. Technical Note 473, AI Center, SRI, 1989.
52. Mark E. Stickel. Upside-Down Meta-Interpretation of the Model Elimination Theorem-Proving Procedure for Deduction and Abduction. Journal of Automated Reasoning, 13(2):189–210, October 1994.
53. Geoff Sutcliffe, Christian Suttner, and Theodor Yemenis. The TPTP Problem Library. In Proc. 12th International Conference on Automated Deduction, volume 814 of Lecture Notes in Artificial Intelligence, pages 252–266, Nancy, France, 1994. Springer-Verlag.
54. Evan Tick and Miyuki Koshimura. Static Mode Analyses of Concurrent Logic Programs. Journal of Programming Languages, 2:283–312, 1994.
55. Kazunori Ueda and Takashi Chikayama. Design of the Kernel Language for the Parallel Inference Machine. Computer Journal, 33:494–500, December 1990.
56. Debra S. Wilson and Donald W. Loveland. Incorporating Relevancy Testing in SATCHMO. Technical Reports CS-1989-24, Department of Computer Science, Duke University, Durham, North Carolina, USA, 1989.

A ‘Theory’ Mechanism for a Proof-Verifier Based on First-Order Set Theory

Eugenio G. Omodeo¹ and Jacob T. Schwartz²

¹ University of L’Aquila, Dipartimento di Informatica, [email protected]
² University of New York, Department of Computer Science, Courant Institute of Mathematical Sciences, [email protected]

We often need to associate some highly compound meaning with a symbol. Such a symbol serves us as a kind of container carrying this meaning, always with the understanding that it can be opened if we need its content. (Translated from [12, pp. 101–102])

Abstract. We propose classical set theory as the core of an automated proof-veriﬁer and outline a version of it, designed to assist in proof development, which is indeﬁnitely expansible with function symbols generated by Skolemization and embodies a modularization mechanism named ‘theory’. Through several examples, centered on the ﬁnite summation operation, we illustrate the potential utility in large-scale proof-development of the ‘theory’ mechanism: utility which stems in part from the power of the underlying set theory and in part from Skolemization.

Key words: Proof-veriﬁcation technology, set theory, proof modularization.

1

Introduction

Set theory is highly versatile and possesses great expressive power. One can readily find terse set-theoretic equivalents of established mathematical notions and express theorems in purely set-theoretic terms. Checking any deep fact (say the Cauchy integral theorem) using a proof-verifier requires a large number of logical statements to be fed into the system. These must formalize a line of reasoning that leads from bare set rudiments to the specialized topic of interest (say, functional analysis) and then to a target theorem. Such an enterprise can only be managed effectively if suitable modularization constructs are available.

E.G. Omodeo enjoyed a Short-term mobility grant of the Italian National Research Council (CNR) enabling him to stay at the University of New York during the preparation of this work.

A.C. Kakas, F. Sadri (Eds.): Computat. Logic (Kowalski Festschrift), LNAI 2408, pp. 214–230, 2002. © Springer-Verlag Berlin Heidelberg 2002


This paper outlines a version of the Zermelo-Fraenkel theory designed to assist in automated proof-veriﬁcation of mathematical theorems. This system incorporates a technical notion of “theory” designed, for large-scale proof-development, to play a role similar to the notion of object class in large-scale programming. Such a mechanism can be very useful for “proof-engineering”. The theories we propose, like procedures in a programming language, have lists of formal parameters. Each “theory” requires its parameters to meet a set of assumptions. When “applied” to a list of actual parameters that have been shown to meet the assumptions, a theory will instantiate several additional “output” set, predicate, and function symbols, and then supply a list of theorems initially proved explicitly by the user inside the theory itself. These theorems will generally involve the new symbols. Such use of “theories” and their application adds a touch of second-order logic capability to the ﬁrst-order system which we describe. Since set theory has full multi-tier power, this should be all the second-order capability that is needed. We illustrate the usefulness of the proposed theory notion via examples ranging from mere “utilities” (e.g. the speciﬁcation of ordered pairs and associated projections, and the thinning of a binary predicate into a global single-valued map) to an example which characterizes a very ﬂexible recursive deﬁnition scheme. As an application of this latter scheme, we outline a proof that a ﬁnite summation operation which is insensitive to operand rearrangement and grouping can be associated with any commutative-associative operation. This is an intuitively obvious fact (seldom, if ever, proved explicitly in algebra texts), but nevertheless it must be veriﬁed in a fully formalized context. 
Even this task can become unnecessarily challenging without an appropriate set-theoretic support, or without the ability to indeﬁnitely extend the formal language with new Skolem symbols such as those resulting from “theory” invocations. Our provisional assessment of the number of “proofware” lines necessary to reach the Cauchy integral theorem in a system like the one which we outline is 20–30 thousand statements.

2

Set Theory as the Core of a Proof-Verifier

A fully satisfactory formal logical system should be able to digest ‘the whole of mathematics’, as this develops by progressive extension of mathematics-like reasoning to new domains of thought. To avoid continual reworking of foundations, one wants the formal system taken as basic to remain unchanged, or at any rate to change only by extension as such efforts progress. In any fundamentally new area work and language will initially be controlled more by guiding intuitions than by entirely precise formal rules, as when Euclid and his predecessors first realized that the intuitive properties of geometric figures in 2 and 3 dimensions, and also some familiar properties of whole numbers, could be covered by modes of reasoning more precise than those used in everyday life. But mathematical developments during the last two centuries have reduced the intuitive content of geometry, arithmetic, and calculus (‘analysis’) to set-theoretic terms. The geometric notion of ‘space’ maps into ‘set of all pairs (or triples) of real numbers’, allowing consideration of the ‘set of all n-tuples of real numbers’ as ‘n-dimensional space’, and of more general related constructs as ‘infinite dimensional’ and ‘functional’ spaces. The ‘figures’ originally studied in geometry map, via the ‘locus’ concept, into sets of such pairs, triples, etc. Dedekind reduced ‘real number x’ to ‘set x of rational numbers, bounded above, such that every rational not in x is larger than every rational in x’. To eliminate everything but set theory from the formal foundations of mathematics, it only remained (since ‘fractions’ can be seen as pairs of numbers) to reduce the notion of ‘integer’ to set-theoretic terms. This was done by Cantor and Frege: an integer is the class of all finite sets in 1-1 correspondence with any one such set. Subsequently Kolmogorov modeled ‘random’ variables as functions defined on an implicit set-theoretic measure space, and Laurent Schwartz interpreted the initially puzzling ‘delta functions’ in terms of a broader notion of generalized function systematically defined in set-theoretic terms. So all of these concepts can be digested without forcing any adjustment of the set-theoretic foundation constructed for arithmetic, analysis, and geometry. This foundation also supports all the more abstract mathematical constructions elaborated in such 20th century fields as topology, abstract algebra, and category theory. Indeed, these were expressed set-theoretically from their inception. So (if we ignore a few ongoing explorations whose significance remains to be determined) set theory currently stands as a comfortable and universal basis for the whole of mathematics (cf. [5]). It can even be said that set theory captures a set of reality-derived intuitions more fundamental than such basic mathematical ideas as that of number.
Arithmetic would be very diﬀerent if the real-world process of counting did not return the same result each time a set of objects was counted, or if a subset of a ﬁnite set S of objects proved to have a larger count than S. So, even though Peano showed how to characterize the integers and derive many of their properties using axioms free of any explicit set-theoretic content, his approach robs the integers of much of their intuitive signiﬁcance, since in his reduced context they cannot be used to count anything. For this and the other reasons listed above, we prefer to work with a thoroughly set-theoretic formalism, contrived to mimic the language and procedures of standard mathematics closely.

3

Set Theory in a Nutshell

Set theory is based on the handful of very powerful ideas summarized below. All notions and notation are more or less standard (cf. [16]).¹

¹ As a notational convenience, we usually omit writing universal quantifiers at the beginning of a sentence, denoting the variables which are ruled by these understood quantifiers by single uppercase Italic letters.

– The dyadic Boolean operations ∩, \, ∪ are available, and there is a null set, ∅, devoid of elements. The membership relation ∈ is available, and set nesting is made possible via the singleton operation X ↦ {X}. Derived from this, we have single-element addition and removal, and useful increment/decrement operations:
    X with Y := X ∪ {Y} ,   X less Y := X \ {Y} ,   next(X) := X with X .
  Unordered lists {t₁, …, tₙ} and ordered tuples [t₁, …, tₙ] are definable too: in particular, {X₁, …, Xₙ} := {X₁} ∪ ··· ∪ {Xₙ}.
– ‘Sets whose elements are the same are identical’: Following a step ℓ ≠ r in a proof, one can introduce a new constant b subject to the condition b ∈ ℓ ↔ b ∉ r; no subsequent conclusions where b does not appear will depend on this condition. Negated set inclusion ⊈ can be treated similarly, since X ⊆ Y := X \ Y = ∅.
– Global choice: We use an operation arb which, from any non-null set X, deterministically extracts an element which does not intersect X. Assuming arb ∅ = ∅ for definiteness, this means that
    arb X ∈ next(X) & X ∩ arb X = ∅
  for all X.
– Set-formation: By (possibly transfinite) element- or subset-iteration over the sets represented by the terms t₀, t₁ ≡ t₁(x₀), …, tₙ ≡ tₙ(x₀, …, xₙ₋₁), we can form the set
    { e : x₀ C₀ t₀, x₁ C₁ t₁, …, xₙ Cₙ tₙ | ϕ } ,
  where each Cᵢ is either ∈ or ⊆, and where e ≡ e(x₀, …, xₙ) and ϕ ≡ ϕ(x₀, …, xₙ) are a set-term and a condition in which the pairwise distinct variables xᵢ can occur free (similarly, each tⱼ₊₁ may involve x₀, …, xⱼ). Many operations are readily definable using setformers, e.g.
    ⋃Y := { x₂ : x₁ ∈ Y, x₂ ∈ x₁ } ,
    Y × Z := { [x₁, x₂] : x₁ ∈ Y, x₂ ∈ Z } ,
    𝒫(Y) := { x : x ⊆ Y } ,
    pred(X) := arb { y ∈ X | next(y) = X } ,
  where if the condition ϕ is omitted it is understood to be true, and if the term e is omitted it is understood to be the same as the first variable inside the braces.
– ∈-recursion: (“Transfinite”) recursion over the elements of any set allows one to introduce global set operations; e.g.,
    Ult_membs(S) := S ∪ ⋃{ Ult_membs(x) : x ∈ S }   and   rank(S) := ⋃{ next( rank(x) ) : x ∈ S } ,
  which respectively give the set of all “ultimate members” (i.e. elements, elements of elements, etc.) of S and the maximum “depth of nesting” of sets inside S.
– ‘Infinite sets exist’: There is at least one s_inf satisfying
    s_inf ≠ ∅ & (∀ x ∈ s_inf)({x} ∈ s_inf) ,
  so that the pairwise distinct elements b, {b}, {{b}}, {{{b}}}, … belong to s_inf for each b in s_inf.
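The elementary operations and the two ∈-recursive definitions listed above can be tried out on hereditarily finite sets. The following is only a sketch under stated assumptions: Python's `frozenset` stands in for sets, and the names `with_`, `less`, `next_`, `big_union`, `ult_membs`, and `rank` are illustrative choices of ours, not part of the proof-verifier.

```python
# Hereditarily finite sets modelled as nested frozensets (an assumption of
# this sketch; the verifier itself works with arbitrary sets).

EMPTY = frozenset()

def with_(x, y):
    """X with Y := X ∪ {Y}."""
    return x | frozenset([y])

def less(x, y):
    """X less Y := X \\ {Y}."""
    return x - frozenset([y])

def next_(x):
    """next(X) := X with X."""
    return with_(x, x)

def big_union(y):
    """⋃Y := { x2 : x1 ∈ Y, x2 ∈ x1 }."""
    out = frozenset()
    for x1 in y:
        out |= x1
    return out

def ult_membs(s):
    """Ult_membs(S) := S ∪ ⋃{ Ult_membs(x) : x ∈ S }."""
    return s | big_union(frozenset(ult_membs(x) for x in s))

def rank(s):
    """rank(S) := ⋃{ next(rank(x)) : x ∈ S }, returned as a von Neumann ordinal."""
    return big_union(frozenset(next_(rank(x)) for x in s))

# The first von Neumann numerals, built with next.
ONE = next_(EMPTY)     # {∅}
TWO = next_(ONE)       # {∅, {∅}}
THREE = next_(TWO)     # {∅, {∅}, {∅, {∅}}}
```

In this model the rank of a numeral is the numeral itself, and the singleton {TWO} has rank THREE, which matches the reading of rank as depth of nesting.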


The historical controversies concerning the choice and replacement axioms of set theory are all hidden in our use of setformers and in our ability, after a statement of the form ∃ y ψ(X₁, …, Xₙ, y) has been proved, to introduce a Skolem function f(X₁, …, Xₙ) satisfying the condition ψ( X₁, …, Xₙ, f(X₁, …, Xₙ) ). In particular, combined use of arb and of the setformer construct lets us write the choice set of any set X of non-null pairwise disjoint sets simply as { arb y : y ∈ X }.² To appreciate the power of the above formal language, consider von Neumann’s elegant definition of the predicate ‘X is a (possibly transfinite) ordinal’, and the characterization of ℝ, the set of real numbers, as the set of Dedekind cuts (cf. [17]):

    Ord(X) := X ⊆ 𝒫(X) & (∀ y, z ∈ X)(y ∈ z ∨ y = z ∨ z ∈ y) ,
    ℝ := { c ⊆ ℚ | (∀ y ∈ c)(∃ z ∈ c)(y < z) & (∀ y ∈ c)(∀ z ∈ ℚ)(z < y → z ∈ c) } \ {∅, ℚ} ;

here the ordered field ⟨ℚ, <⟩ of rational numbers is assumed to have been defined before ℝ.³

4

Theories in Action: First Examples

Here is one of the most obvious theories one can think of:

    THEORY ordered_pair()
    ==>(opair, car, cdr)
        car( opair(X, Y) ) = X
        cdr( opair(X, Y) ) = Y
        opair(X, Y) = opair(U, V) → X = U & Y = V
    END ordered_pair.

This THEORY has no input parameters and no assumptions, and returns three global functions: a pairing function and its projections. To start its construction, the user simply has to

    SUPPOSE THEORY ordered_pair() ==> END ordered_pair,

then to ENTER THEORY ordered_pair, and next to define e.g.

    opair(X, Y) := { {X}, { {X}, {Y, {Y}} } } ,
    car(P) := arb arb P ,
    cdr(P) := car( arb (P \ {arb P}) \ {arb P} ) .

² Cf. [18, p. 177]. Even in the more basic framework of first-order predicate calculus, the availability of choice constructs can be highly desirable, cf. [1].
³ For an alternative definition of real numbers which works very well too, see E.A. Bishop’s adaptation of Cauchy’s construction of ℝ in [2, pp. 291–297].


This makes it possible to prove such intermediate lemmas as

    arb {U} = U ,
    V ∈ Z → arb {V, Z} = V ,
    car( { {X}, { {X}, W } } ) = X ,
    arb opair(X, Y) = {X} ,
    cdr( opair(X, Y) ) = car( { { Y, {Y} } } ) = Y .

Once these intermediate results have been used to prove the three theorems listed earlier, the user can indicate that they are the ones he wants to be externally visible, and that the return-parameter list consists of opair, car, cdr (the detailed definitions of these symbols, as well as the intermediate lemmas, have hardly any significance outside the THEORY itself⁴). Then, after re-entering the main THEORY, which is set theory, the user can

    APPLY(opair, head, tail) ordered_pair() ==>
        head( opair(X, Y) ) = X
        tail( opair(X, Y) ) = Y
        opair(X, Y) = opair(U, V) → X = U & Y = V ,

thus importing the three theorems into the main proof level. As written, this application also changes the designations ‘car’ and ‘cdr’ into ‘head’ and ‘tail’. Fig. 1 shows how to take advantage of the functions just introduced to define notions related to maps that will be needed later on.⁵

    is_map(F) := F = { [head(x), tail(x)] : x ∈ F }
    Svm(F) := is_map(F) & (∀ x, y ∈ F)( head(x) = head(y) → x = y )
    1_1_map(F) := Svm(F) & (∀ x, y ∈ F)( tail(x) = tail(y) → x = y )
    F⁻¹ := { [tail(x), head(x)] : x ∈ F }
    domain(F) := { head(x) : x ∈ F }
    range(F) := { tail(x) : x ∈ F }
    F{X} := { y ∈ range(F) | [X, y] ∈ F }
    F|S := F ∩ ( S × range(F) )
    Finite(S) := ¬ ∃ f ( 1_1_map(f) & S = domain(f) ≠ range(f) ⊆ S )

Fig. 1. Notions related to maps, single-valued maps, and 1-1 maps
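The pairing function and its projections can be checked on small examples. The sketch below is hedged: it models sets as Python `frozenset`s and realizes arb as "an element of least nesting depth, with a fixed tie-break", which is merely one admissible choice satisfying arb X ∈ X and X ∩ arb X = ∅ for non-null X. The helper names `S`, `depth`, and `canon` are ours.

```python
# Sketch of opair/car/cdr over hereditarily finite sets (nested frozensets).
# arb is modelled, by assumption, as a least-depth element of its argument.

def S(*xs):
    """Build the set {x1, ..., xn}."""
    return frozenset(xs)

def depth(s):
    """Nesting depth of a set, as a plain integer."""
    return max((depth(x) + 1 for x in s), default=0)

def canon(s):
    """Canonical nested-tuple form, used only as a deterministic tie-break."""
    return tuple(sorted(canon(x) for x in s))

def arb(s):
    """arb ∅ = ∅; otherwise an element of s that shares no member with s.

    A least-depth element works: all of its own members have smaller depth,
    so none of them can belong to s.
    """
    if not s:
        return s
    return min(s, key=lambda x: (depth(x), canon(x)))

def opair(x, y):
    """opair(X, Y) := { {X}, { {X}, {Y, {Y}} } }."""
    return S(S(x), S(S(x), S(y, S(y))))

def car(p):
    """car(P) := arb arb P."""
    return arb(arb(p))

def cdr(p):
    """cdr(P) := car( arb(P \\ {arb P}) \\ {arb P} )."""
    a = arb(p)
    return car(arb(p - S(a)) - S(a))

EMPTY = S()
ONE = S(EMPTY)
TWO = S(EMPTY, ONE)
SAMPLES = [EMPTY, ONE, TWO, S(TWO), opair(ONE, TWO)]
```

Running `car` and `cdr` over all pairs built from `SAMPLES` recovers the components, in line with the three theorems exported by the THEORY.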

⁴ A similar remark on Kuratowski’s encoding of an ordered pair as a set of the form {{x, y}, {x}} is made in [14, pp. 50–51].
⁵ We subsequently return to the notation [X, Y] for opair(X, Y).

For another simple example, suppose that the theory

    THEORY setformer0(e, s, p)
    ==>
        s = ∅ → { e(x) : x ∈ s } = ∅
        { x ∈ s | p(x) } = ∅ → { e(x) : x ∈ s | p(x) } = ∅
    END setformer0

has been proved, but that its user subsequently realizes that the reverse implications could be helpful too; and that the formulae

    s ⊆ T → { e(x) : x ∈ s | p(x) } ⊆ { e(x) : x ∈ T | p(x) } ,
    s ⊆ T & (∀ x ∈ T \ s)¬ p(x) → { e(x) : x ∈ s | p(x) } = { e(x) : x ∈ T | p(x) }

are also needed. He can then re-enter the THEORY setformer0, strengthen the implications already proved into bi-implications, and add the new results: of course he must then supply proofs of the new facts.

Our next sample THEORY receives as input a predicate P ≡ P(X, V) and an “exception” function xcp ≡ xcp(X); it returns a global function img ≡ img(X) which, when possible, associates with its argument X some Y such that P(X, Y) holds, and otherwise associates with X the “fictitious” image xcp(X). The THEORY has an assumption, intended to guarantee non-ambiguity of the fictitious value:

    THEORY fcn_from_pred(P, xcp)
        ¬ P( X, xcp(X) )                       -- convenient “guard”
    ==>(img)
        img(X) ≠ xcp(X) ↔ ∃ v P(X, v)
        P(X, V) → P( X, img(X) )
    END fcn_from_pred.

To construct this THEORY from its assumption, the user can simply define

    img(X) := if P( X, try(X) ) then try(X) else xcp(X) end if ,

where try results from Skolemization of the valid first-order formula

    ∃ y ∀ v ( P(X, v) → P(X, y) ) ,

after which the proofs of the theorems of fcn_from_pred pose no problems. As an easy example of the use of this THEORY, note that it can be invoked in the special form

    APPLY(img) fcn_from_pred( P(X, Y) ↦ Y ∈ X & Q(Y), xcp(X) ↦ X ) ==> ···

for any monadic predicate Q (because ∈ is acyclic); without the condition Y ∈ X such an invocation would instead result in an error indication, except in the uninteresting case in which one has proved that ∀ x ¬ Q(x).

Here is a slightly more elaborate example of a familiar THEORY:

    THEORY equivalence_classes(s, Eq)
        (∀ x ∈ s)( Eq(x, x) )
        (∀ x, y, z ∈ s)( Eq(x, y) → ( Eq(y, z) ↔ Eq(x, z) ) )
    ==>(quot, cl_of)       -- “quotient”-set and globalized “canonical embedding”
        (∀ x, y ∈ s)( Eq(x, y) ↔ Eq(y, x) )
        (∀ x ∈ s)( cl_of(x) ∈ quot )
        (∀ b ∈ quot)( arb b ∈ s & cl_of(arb b) = b )
        (∀ y ∈ s)( Eq(x, y) ↔ cl_of(x) = cl_of(y) )
    END equivalence_classes.

Suppose that this THEORY has been established, and that ℕ, ℤ, and the multiplication operation ∗ have been defined already, where ℕ is the set of natural numbers, and ℤ, intended to be the set of signed integers, is defined (somewhat arbitrarily) as

    ℤ := { [n, m] : n, m ∈ ℕ | n = 0 ∨ m = 0 } .

Here the position of 0 in a pair serves as a sign indication, and the restriction of ∗ to ℤ × ℤ is integer multiplication (but actually, x ∗ y is always defined, whether or not x, y ∈ ℤ). Then the set Fr of fractions and the set ℚ of rational numbers can be defined as follows:

    Fr := { [x, y] : x, y ∈ ℤ | y ≠ [0, 0] } ,
    Same_frac(F, G) := ( head(F) ∗ tail(G) = tail(F) ∗ head(G) ) ,
    APPLY(ℚ, Fr_to_ℚ) equivalence_classes( s ↦ Fr, Eq(F, G) ↦ Same_frac(F, G) ) ==> ···

Before APPLY can be invoked, one must prove that the restriction of Same_frac to Fr meets the THEORY assumptions, i.e. that it is an equivalence relation. Then the system will not simply return the two new symbols ℚ and Fr_to_ℚ, but will provide theorems ensuring that these represent the standard equivalence-class reduction Fr/Same_frac and the canonical embedding of Fr into this quotient. Note as a curiosity (which however hints at the type of hiding implicit in the THEORY mechanism) that a ℚ satisfying the conclusions of the THEORY is not actually forced to be the standard partition of Fr but can consist of singletons or even of supersets of the equivalence classes (which is harmless).
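To make the last APPLY concrete, here is a small finite sketch of what equivalence_classes delivers when instantiated with Same_frac. It rests on simplifying assumptions that are ours: Python integers replace the pair encoding of ℤ, Fr is cut down to a finite sample, tuples play the role of ordered pairs, and `min` stands in for arb on a class.

```python
# Finite illustration of equivalence_classes(s ↦ Fr, Eq ↦ Same_frac).
from itertools import product

def equivalence_classes(s, eq):
    """Return (quot, cl_of) for an equivalence relation eq on the finite set s."""
    def cl_of(x):
        return frozenset(y for y in s if eq(x, y))
    quot = frozenset(cl_of(x) for x in s)
    return quot, cl_of

def same_frac(f, g):
    """Same_frac(F, G) := head(F) * tail(G) = tail(F) * head(G)."""
    return f[0] * g[1] == f[1] * g[0]

# A finite sample of fractions [x, y] with non-zero denominator.
Z_SAMPLE = range(-3, 4)
FR = frozenset((x, y) for x, y in product(Z_SAMPLE, Z_SAMPLE) if y != 0)

# Q and fr_to_q correspond to the output symbols named in the APPLY above.
Q, fr_to_q = equivalence_classes(FR, same_frac)
```

On this sample the three output theorems can be verified exhaustively; for instance `fr_to_q((1, 2))` and `fr_to_q((-1, -2))` are the same class. The sketch returns the standard partition, although, as noted above, the THEORY's conclusions alone would not force that choice.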

5

A Final Case Study: Finite Summation

Consider the operation Σ(F) or, more explicitly,

    Σ_{x ∈ domain(F), [x,y] ∈ F} y ,

available for any finite map F (and in particular when domain(F) = d ∈ ℕ, so that x ∈ d amounts to saying that x = 0, 1, …, d − 1) such that range(F) ⊆ abel, where abel is a set on which a given operation + is associative and commutative and has a unit element u. Most of this is captured formally by the following THEORY:

    THEORY sigma_add(abel, +, u)
        (∀ x, y ∈ abel)(x+y ∈ abel &               -- closure w.r.t. ...
            x+y = y+x)                              -- ... commutative operation
        u ∈ abel & (∀ x ∈ abel)(x+u = x)            -- designated unit element
        (∀ x, y, z ∈ abel)( (x+y)+z = x+(y+z) )     -- associativity
    ==>(Σ)                                          -- summation operation
        Σ(∅) = u & (∀ x ∈ ℕ)(∀ y ∈ abel)( Σ({[x, y]}) = y )
        is_map(F) & Finite(F) & range(F) ⊆ abel & domain(F) ⊆ ℕ →
            Σ(F) = Σ(F ∩ G) + Σ(F \ G)              -- additivity
    END sigma_add.

We show below how to construct this THEORY from its assumptions, and how to generalize it into a THEORY gen_sigma_add in which the condition domain(F) ⊆ ℕ is dropped, allowing the condition (∀ x ∈ ℕ)(∀ y ∈ abel)( Σ({[x, y]}) = y ) to be simplified into (∀ y ∈ abel)( Σ({[X, y]}) = y ). After this, we will sketch the proof of a basic property (‘rearrangement of terms’) of this generalized summation operation.

5.1

Existence of a Finite Summation Operation

In order to tackle even the simple sigma_add, it is convenient to make use of recursions somewhat different from (and actually simpler than) the fully general transfinite ∈-recursion axiomatically available in our version of set theory. Specifically, we can write

    Σ(F) := if F = ∅ then u else tail(arb F) + Σ(F less arb F) end if ,

which is a sort of “tail recursion” based on set inclusion. To see why such constructions are allowed we can use the fact that strict inclusion is a well-founded relation between finite sets, and in particular that it is well-founded over { f ⊆ ℕ × abel | Finite(f) }: this makes the above form of recursive definition acceptable. In preparing to feed this definition (or something closely equivalent to it) into our proof-verifier, we can conveniently make a detour through the following THEORY (note that in the following formulae Ord(X) designates the predicate ‘X is an ordinal’; see end of Sec. 3):

    THEORY well_founded_set(s, Lt)
        (∀ t ⊆ s)( t ≠ ∅ → (∃ m ∈ t)(∀ u ∈ t)¬ Lt(u, m) )
            -- Lt is thereby assumed to be irreflexive and well-founded on s
    ==>(orden)
        (∀ x, y ∈ s)( ( Lt(x, y) → ¬ Lt(y, x) ) & ¬ Lt(x, x) )
        s ⊆ { orden(y) : y ∈ X } ↔ orden(X) = s
        orden(X) ≠ s ↔ orden(X) ∈ s
        Ord(U) & Ord(V) & orden(U) ≠ s ≠ orden(V) → ( Lt( orden(U), orden(V) ) → U ∈ V )
        { u ∈ s | Lt( u, orden(V) ) } ⊆ { orden(x) : x ∈ V }
        Ord(U) & Ord(V) & orden(U) ≠ s ≠ orden(V) & U ≠ V → orden(U) ≠ orden(V)
        ∃ o ( Ord(o) & s = { orden(x) : x ∈ o } & 1_1_map( { [x, orden(x)] : x ∈ o } ) )
    END well_founded_set.

Within this THEORY and in justification of it, orden can be defined in two steps:

    Minrel(T) := if ∅ ≠ T ⊆ s then arb { m ∈ T | (∀ x ∈ T)¬ Lt(x, m) } else s end if ,
    orden(X) := Minrel( s \ { orden(y) : y ∈ X } ) ,

after which the proof of the output theorems of the THEORY just described will take approximately one hundred lines. Next we introduce a THEORY of recursion on well-founded sets. Even though the definition of Σ requires much less, other kinds of recursive definition benefit if we provide a generous scheme like the following:

    THEORY recursive_fcn(dom, Lt, a, b, P)
        (∀ t ⊆ dom)( t ≠ ∅ → (∃ m ∈ t)(∀ u ∈ t)¬ Lt(u, m) )
            -- Lt is thereby assumed to be irreflexive and well-founded on dom
    ==>(rec)
        (∀ v ∈ dom)( rec(v) = a( v, { b( v, w, rec(w) ) : w ∈ dom | Lt(w, v) & P( v, w, rec(w) ) } ) )
    END recursive_fcn.

The output symbol rec of this THEORY is easily definable as follows:

    G(X) := a( orden(X), { b( orden(X), orden(y), G(y) ) : y ∈ X | Lt( orden(y), orden(X) ) & P( orden(X), orden(y), G(y) ) } ) ,
    rec(V) := G( index_of(V) ) ;

here orden results from an invocation of our previous THEORY well_founded_set, namely

    APPLY(orden) well_founded_set( s ↦ dom, Lt(X, Y) ↦ Lt(X, Y) ) ==> ··· ;

also, the restriction of index_of to dom is assumed to be the local inverse of the function orden. Note that the recursive characterization of rec in the theorem of recursive_fcn is thus ultimately justified in terms of the very general form of ∈-recursion built into our system, as appears from the definition of G. Since we cannot take it for granted that we have an inverse of orden, a second auxiliary THEORY, invokable as

    APPLY(index_of) bijection( f(X) ↦ orden(X), d ↦ o1, r ↦ dom ) ==> ··· ,

is useful. Here o1 results from Skolemization of the last theorem in well_founded_set. The new THEORY used here can be specified as follows:

    THEORY bijection(f, d, r)
        1_1_map( { [x, f(x)] : x ∈ d } ) & r = { f(x) : x ∈ d }
        f(X) ∈ r → X ∈ d                       -- convenient “guard”
    ==>(finv)
        Y ∈ r → f( finv(Y) ) = Y
        Y ∈ r → finv(Y) ∈ d
        X ∈ d ↔ f(X) ∈ r
        X ∈ d → finv( f(X) ) = X
        ( finv(Y) ∈ d & ∃ x ( f(x) = Y ) ) ↔ Y ∈ r
        d = { finv(y) : y ∈ r } & 1_1_map( { [y, finv(y)] : y ∈ r } )
    END bijection.

This little digression gives us one more opportunity to show the interplay between theories, because one way of defining finv inside bijection would be as follows:

    APPLY(finv) fcn_from_pred( P(Y, X) ↦ f(X) = Y & d ≠ ∅ ,
                               xcp(Y) ↦ if Y ∈ r then d else arb d end if ) ==> ··· ,

where fcn_from_pred is as shown in Sec. 4. We can now recast our first-attempt definition of Σ as

    APPLY(Σ) recursive_fcn( dom ↦ { f ⊆ ℕ × abel | is_map(f) & Finite(f) } ,
                            Lt(W, V) ↦ W ⊆ V & W ≠ V ,
                            a(V, Z) ↦ if V = ∅ then u else tail(arb V) + arb Z end if ,
                            b(V, W, Z) ↦ Z ,
                            P(V, W, Z) ↦ W = V less arb V ) ==> ··· ,

whose slight intricacy is the price paid for our earlier decision to keep the recursive definition scheme very general. We skip the proofs that Σ(∅) = u and (∀ x ∈ ℕ)(∀ y ∈ abel)( Σ({[x, y]}) = y ), which are straightforward. Concerning additivity, assume by absurd hypothesis that f is a finite map with domain(f) ⊆ ℕ and range(f) ⊆ abel such that Σ(f) ≠ Σ(f ∩ g) + Σ(f \ g) holds for some g, and then use the following tiny but extremely useful THEORY (of induction over the subsets of any finite set)

    THEORY finite_induction(n, P)
        Finite(n) & P(n)
    ==>(m)
        m ⊆ n & P(m) & (∀ k ⊆ m)( k ≠ m → ¬ P(k) )
    END finite_induction,

to get an inclusion-minimal such map, f0, by performing an

    APPLY(f0) finite_induction( n ↦ f, P(F) ↦ ∃ g ( Σ(F) ≠ Σ(F ∩ g) + Σ(F \ g) ) ) ==> ··· .

Reaching a contradiction from this is very easy.

5.2

Generalized Notion of Finite Summation

Our next goal is to generalize the finite summation operation Σ(F) to any finite map F with range(F) ⊆ abel. To do this we can use a few basic theorems on ordinals, which can be summarized as follows. Define

    min_el(T, S) := if S ⊆ T then S else arb (S \ T) end if ,
    enum(X, S) := min_el( { enum(y, S) : y ∈ X }, S ) ,

for all sets S, T (a use of ∈-recursion quite similar to the construction used inside the THEORY well_founded_set!⁶). Then the following enumeration theorem holds:

    ∃ o ( Ord(o) & S = { enum(x, S) : x ∈ o } & (∀ x, y ∈ o)( x ≠ y → enum(x, S) ≠ enum(y, S) ) ) .

From this one gets the function ordin by Skolemization. Using the predicate Finite of Fig. 1, and exploiting the infinite set s_inf axiomatically available in our version of set theory, we can give the following definition of natural numbers:

    ℕ := arb { x ∈ next( ordin(s_inf) ) | ¬ Finite(x) } .

These characterizations of Finite and ℕ yield

    X ∈ ℕ ↔ ordin(X) = X & Finite(X) ,
    Finite(X) ↔ ordin(X) ∈ ℕ ,
    Finite(F) → Finite( domain(F) ) & Finite( range(F) ) .

Using these results and working inside the THEORY gen_sigma_add, we can obtain the generalized operation Σ by first invoking

    APPLY(σ) sigma_add( abel ↦ abel, + ↦ +, u ↦ u ) ==> ···

and then defining:

    Σ(F) := σ( { [x, y] : x ∈ ordin( domain(F) ), y ∈ range(F) | [ enum( x, domain(F) ), y ] ∈ F } ) .

We omit the proofs that Σ(∅) = u, (∀ y ∈ abel)( Σ({[X, y]}) = y ), and Σ(F) = Σ(F ∩ G) + Σ(F \ G), which are straightforward.

⁶ This is more than just an analogy: we could exploit the well-foundedness of ∈ to hide the details of the construction of enum into an invocation of the THEORY well_founded_set.
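The two-stage construction (first σ for maps indexed by natural numbers, then Σ for arbitrary finite maps via an enumeration of the domain) can be imitated on finite data. This is a minimal sketch under assumptions of ours: a map is a `frozenset` of Python pairs, `min` replaces arb, and sorting by `repr` replaces the enumeration enum of the domain; none of these choices comes from the proof-verifier.

```python
# Sketch of the recursive summation and of its generalization by re-indexing.
from operator import add as plus

def sigma(f, op, u):
    """σ(F) := if F = ∅ then u else tail(arb F) + σ(F less arb F).

    f is a finite map with natural-number keys, given as a frozenset of pairs.
    """
    if not f:
        return u
    picked = min(f)                      # deterministic stand-in for arb F
    return op(picked[1], sigma(f - {picked}, op, u))

def gen_sigma(f, op, u):
    """Σ(F) for arbitrary keys: re-index the pairs by 0, 1, ..., then apply σ."""
    ordered = sorted(f, key=repr)        # stand-in for enum(x, domain(F))
    reindexed = frozenset((i, y) for i, (_, y) in enumerate(ordered))
    return sigma(reindexed, op, u)

# Example data: F has keys in ℕ, H has keys of mixed kinds.
F = frozenset({(0, 5), (1, 7), (2, 5), (3, 1)})
G = frozenset({(1, 7), (3, 1), (9, 9)})
H = frozenset({("a", 2), (frozenset(), 3), (7, 2)})
```

With integer addition and unit 0, additivity can be observed directly: `sigma(F, plus, 0)` equals `plus(sigma(F & G, plus, 0), sigma(F - G, plus, 0))`. Because the operation is commutative and associative, the particular order chosen by `min` or by `sorted` does not affect the result.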

5.3

Rearrangement of Terms in Finite Summations

To be most useful, the THEORY of Σ needs to encompass various strong statements of the additivity property. Writing

    Φ(F) ≡ is_map(F) & Finite( domain(F) ) & range(F) ⊆ abel ,
    Ψ(P, X) ≡ X = ⋃P & (∀ b, v ∈ P)(b ≠ v → b ∩ v = ∅)

for brevity, much of what is wanted can be specified e.g. as follows:

    THEORY gen_sigma_add(abel, +, u)
        (∀ x, y ∈ abel)(x+y ∈ abel &               -- closure w.r.t. ...
            x+y = y+x)                              -- ... commutative operation
        u ∈ abel & (∀ x ∈ abel)(x+u = x)            -- designated unit element
        (∀ x, y, z ∈ abel)( (x+y)+z = x+(y+z) )     -- associativity
    ==>(Σ)                                          -- summation operation
        Σ(∅) = u & (∀ y ∈ abel)( Σ({[X, y]}) = y )
        Φ(F) → Σ(F) ∈ abel
        Φ(F) → Σ(F) = Σ(F ∩ G) + Σ(F \ G)           -- additivity
        Φ(F) & Ψ(P, F) → Σ(F) = Σ( { [g, Σ(g)] : g ∈ P } )
        Φ(F) & Ψ(P, domain(F)) → Σ(F) = Σ( { [b, Σ( F|b )] : b ∈ P } )
        Φ(F) & Svm(G) & domain(F) = domain(G) → Σ(F) = Σ( { [x, Σ( F|G⁻¹{x} )] : x ∈ range(G) } )
    END gen_sigma_add.

A proof of the last of these theorems, which states that Σ is insensitive to operand rearrangement and grouping, is sketched below. Generalized additivity is proved first: starting with the absurd hypothesis that specific f, p exist for which

    Φ(f) & Ψ(p, f) & Σ(f) ≠ Σ( { [g, Σ(g)] : g ∈ p } )

holds, one can choose an inclusion-minimal such p referring to the same f and included in the p chosen at first, by an invocation

    APPLY(p0) finite_induction( n ↦ p, P(Q) ↦ Ψ(Q, f) & Σ(f) ≠ Σ( { [g, Σ(g)] : g ∈ Q } ) ) ==> ··· .

From this, a contradiction is easily reached. The next theorem, namely

    Φ(F) & Ψ(P, domain(F)) → Σ(F) = Σ( { [b, Σ( F|b )] : b ∈ P } )

follows since Ψ(P, domain(F)) implies Ψ( { F|b : b ∈ P }, F ). Proof of the summand rearrangement theorem seen above is now easy, because

    Svm(G) & D = domain(G) → Ψ( { G⁻¹{x} : x ∈ range(G) }, D )

holds for any D and hence in particular for D = domain(F).
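The grouping theorem Φ(F) & Ψ(P, domain(F)) → Σ(F) = Σ({ [b, Σ(F|b)] : b ∈ P }) lends itself to a quick finite sanity check. The sketch below is not a proof and rests on our own modelling assumptions: maps are `frozenset`s of Python pairs, Σ is computed with `functools.reduce` (legitimate only because the operation is assumed commutative and associative), and the names `restrict`, `psi`, and `regroup` are illustrative.

```python
# Finite sanity check of summation by blocks of a partition of the domain.
from functools import reduce
from operator import add

def sigma(f, op, u):
    """Σ(F): combine all values of the finite map f, starting from the unit u."""
    return reduce(op, (y for _, y in f), u)

def restrict(f, b):
    """F|b := F ∩ (b × range(F))."""
    return frozenset((x, y) for x, y in f if x in b)

def psi(p, x):
    """Ψ(P, X): X = ⋃P and distinct blocks of P are disjoint."""
    blocks = list(p)
    disjoint = all(b.isdisjoint(v)
                   for i, b in enumerate(blocks) for v in blocks[i + 1:])
    return frozenset().union(*blocks) == x and disjoint

def regroup(f, p, op, u):
    """The map { [b, Σ(F|b)] : b ∈ P }, keyed by the blocks themselves."""
    return frozenset((b, sigma(restrict(f, b), op, u)) for b in p)

def times_mod7(a, b):
    """A second commutative-associative operation, with unit 1 on {0, ..., 6}."""
    return (a * b) % 7

F = frozenset({("a", 1), ("b", 2), ("c", 3), ("d", 4)})
P = frozenset({frozenset({"a", "c"}), frozenset({"b"}), frozenset({"d"})})
DOMAIN = frozenset(x for x, _ in F)
```

Here `psi(P, DOMAIN)` holds, and summing `regroup(F, P, add, 0)` gives the same value as summing `F` directly; the same agreement is obtained with `times_mod7` and unit 1. Note that two blocks with equal partial sums still yield distinct pairs, since the pairs are keyed by the blocks.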


The above line of proof suggests that a useful preamble is to construct the following theory of Ψ:

    THEORY is_partition(p, s)
    ==>(flag)      -- this indicates whether or not s is partitioned by p
        flag ↔ s = ⋃p & (∀ b, v ∈ p)(b ≠ v → b ∩ v = ∅)
        flag & Finite(s) → Finite(p)
        flag & s = domain(F) & Q = { F|b : b ∈ p } → F = ⋃Q & (∀ f, g ∈ Q)(f ≠ g → f ∩ g = ∅)
        Svm(G) & s = domain(G) & p = { G⁻¹{y} : y ∈ range(G) } → flag
    END is_partition.
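The last theorem of is_partition says that the fibers of a single-valued map partition its domain. A small sketch may help to visualize it; it assumes, for illustration only, that a single-valued map G is given as a Python `dict` (which builds in Svm) and that `fibers` and `is_partition` are names of our choosing.

```python
# Fibers G⁻¹{y} of a single-valued map, and the partition test behind `flag`.

def fibers(g):
    """{ G⁻¹{y} : y ∈ range(G) } for a single-valued map g given as a dict."""
    by_value = {}
    for x, y in g.items():
        by_value.setdefault(y, set()).add(x)
    return frozenset(frozenset(block) for block in by_value.values())

def is_partition(p, s):
    """flag: s = ⋃p and distinct members of p are disjoint."""
    blocks = list(p)
    covers = frozenset().union(*blocks) == frozenset(s)
    disjoint = all(b.isdisjoint(v)
                   for i, b in enumerate(blocks) for v in blocks[i + 1:])
    return covers and disjoint

# Example: G sends a, c to 0; b, d to 1; e to 2.
G = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 2}
```

For this G, `fibers(G)` consists of the three blocks {a, c}, {b, d}, {e}, and `is_partition(fibers(G), G.keys())` holds, whereas two overlapping blocks such as {a, b} and {b, c} fail the test.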

6

Related Work

To support software design and speciﬁcation, rapid prototyping, theorem proving, user interface design, and hardware veriﬁcation, various authors have proposed systems embodying constructs for modularization which are, under one respect or another, akin to our THEORY construct. Among such proposals lies the OBJ family of languages [15], which integrates speciﬁcation, prototyping, and veriﬁcation into a system with a single underlying equational logic. In the implementation OBJ3 of OBJ, a module can either be an object or a theory: in either case it will have a set of equations as its body, but an object is executable and has a ﬁxed standard model whereas a theory describes nonexecutable properties and has loose semantics, namely a variety of admissible models. As early as in 1985, OBJ2 [13] was endowed with a generic module mechanism inspired by the mechanism for parameterized speciﬁcations of the Clear speciﬁcation language [3]; the interface declarations of OBJ2 generics were not purely syntactic but contained semantic requirements that actual modules had to satisfy before they could be meaningfully substituted. The use of OBJ for theorem-proving is aimed at providing mechanical assistance for proofs that are needed in the development of software and hardware, more than at mechanizing mathematical proofs in the broad sense. This partly explains the big emphasis which the design of OBJ places on equational reasoning and the privileged role assigned to universal algebra: equational logic is in fact suﬃciently powerful to describe any standard model within which one may want to carry out computations. We observe that an equational formulation of set theory can be designed [11], and may even oﬀer advantages w.r.t. 
a more traditional formulation of ZermeloFraenkel in limited applications where it is reasonable to expect that proofs can be found in fully automatic mode; nevertheless, overly insisting on equational reasoning in the realm of set theory would be preposterous in light of the highly interactive proof-veriﬁcation environment which we envision. We like to mention another ambitious project, closer in spirit to this paper although based on a sophisticated variant of Church’s typed lambda-calculus [6]: the Interactive Mathematical Proof System (IMPS) described in [10]. This

Eugenio G. Omodeo and Jacob T. Schwartz

system manages a database of mathematics, represented as a collection of interconnected axiomatic “little theories” which span graduate-level parts of analysis (about 25 theories: real numbers, partial orders, metric spaces, normed spaces, etc.), some algebra (monoids, groups, and fields), and also some theories more directly relevant to computer science (concerning state machines, domains for denotational semantics, and free recursive datatypes). The initial library caters for some fragments of set theory too: in particular, it contains theorems about cardinalities. Mathematical analysis is regarded as a significant arena for testing the adequacy of formalizations of mathematics, because analysis requires great expressive power for constructing proofs. The authors of [10] claim that IMPS supports a view of the axiomatic method based on “little theories” tailored to the diverse fields of mathematics as well as the “big theory” view in which all reasoning is performed within a single powerful and highly expressive set theory. Greater emphasis is placed on the former approach, however: with this approach, links (“conduits”, so to speak, for passing results from one theory to another) play a crucial role. To realize such links, a syntactic device named “theory interpretation” is used in a variety of ways to translate the language of a source theory to the language of a target theory so that the image of a theorem is always a theorem: this method enables reuse of mathematical results “transported” from relatively abstract theories to more specialized ones. One main difference of our approach w.r.t. that of IMPS is that we are willing to invest more in the “big theory” approach and, accordingly, do not feel urged to rely on a higher-order logic where functions are organized according to a type hierarchy. It may be contended that the typing discipline complies with everyday mathematical practice, and perhaps gives helpful clues to the automated reasoning mechanisms so as to ensure better performance; nevertheless, a well-thought-out type-free environment can be conceptually simpler.

Both OBJ and IMPS attach great importance to interconnections across theories: inheritance, to mention a most basic one, and “theory ensembles”, to mention a nice feature of IMPS which enables one to move, e.g., from the formal theory of a metric space to a family of interrelated replicas of it, which also caters for continuous mappings between metric spaces. As regards theory interconnections, the proposal we have made in this paper still awaits enrichment.

The literature on the OBJ family and on the IMPS system also stresses the kinship between the activity of proving theorems and computing in general; even more so does the literature on systems, such as Nuprl [8] or the Calculus of Constructions [9], which rely on a constructive foundation, more or less close to Martin-Löf’s intuitionistic type theory [19]. Important achievements, and in particular the conception of declarative programming languages such as Prolog, stem in fact from the view that proof-search can be taken as a general paradigm of computation. On the other hand, we feel that too little has been done, to date, in order to exploit a “proof-by-computation” paradigm aimed at enhancing theorem-proving by means of the ability to perform symbolic computations

A ‘Theory’ Mechanism for a Proof-Veriﬁer Based on First-Order Set Theory

efficiently in specialized contexts of algebra and analysis (a step in this direction was taken with [7]). This is an issue that we intend to explore in greater depth in a forthcoming paper.

7 Conclusions

We view the activity of setting up detailed formalized proofs of important theorems in analysis and number theory as an essential part of the feasibility study that must precede the development of any ambitious proof-checker. In mathematics, set theory has emerged as the standard framework for such an enterprise, and full computer-assisted certiﬁcation of a modernized version of Principia Mathematica should now be possible. To convince ourselves of a veriﬁer system’s ability to handle large-scale mathematical proofs —and such proofs cannot always be avoided in program-correctness veriﬁcation—, it is best to follow the royal road paved by the work of Cauchy, Dedekind, Frege, Cantor, Peano, Whitehead–Russell, Zermelo–Fraenkel–von Neumann, and many others. Only one facet of our work on large-scale proof scenarios is presented in this paper. Discussion on the nature of the basic inference steps a proof-veriﬁer should (and reasonably can) handle has been omitted to focus our discussion on the issue of proof modularization. The obvious goal of modularization is to avoid repeating similar steps when the proofs of two theorems are closely analogous. Modularization must also conceal the details of a proof once they have been fed into the system and successfully certiﬁed. When coupled to a powerful underlying set theory, indeﬁnitely expansible with new function symbols generated by Skolemization, the technical notion of “theory” proposed in this paper appears to meet such proof-modularization requirements. The examples provided, showing how often the THEORY construct can be exploited in proof scenarios, may convince the reader of the utility of this construct.

Acknowledgements

We thank Ernst-Erich Doberkat (Universität Dortmund, D), who brought to our attention the text by Frege cited in the epigraph of this paper. We are indebted to Patrick Cegielski (Université Paris XII, F) for helpful comments.

References

1. A. Blass and Y. Gurevich. The logic of choice. J. of Symbolic Logic, 65(3):1264–1310, 2000.
2. D. S. Bridges. Foundations of real and abstract analysis. Springer-Verlag, Graduate Texts in Mathematics vol. 174, 1997.
3. R. Burstall and J. Goguen. Putting theories together to make specifications. In R. Reddy, ed, Proc. 5th International Joint Conference on Artificial Intelligence. Cambridge, MA, pp. 1045–1058, 1977.
4. R. Caferra and G. Salzer, editors. Automated Deduction in Classical and Non-Classical Logics. LNCS 1761 (LNAI). Springer-Verlag, 2000.
5. P. Cegielski. Un fondement des mathématiques. In M. Barbut et al., eds, La recherche de la vérité. ACL – Les éditions du Kangourou, 1999.
6. A. Church. A formulation of the simple theory of types. J. of Symbolic Logic, 5:56–68, 1940.
7. E. Clarke and X. Zhao. Analytica - A theorem prover in Mathematica. In D. Kapur, ed, Automated Deduction - CADE-11. Springer-Verlag, LNCS vol. 607, pp. 761–765, 1992.
8. R. L. Constable, S. F. Allen, H. M. Bromley, W. R. Cleaveland, J. F. Cremer, R. W. Harper, D. J. Howe, T. B. Knoblock, N. P. Mendler, P. Panangaden, J. T. Sasaki, and S. F. Smith. Implementing mathematics with the Nuprl development system. Prentice-Hall, Englewood Cliffs, NJ, 1986.
9. Th. Coquand and G. Huet. The calculus of constructions. Information and Computation, 76(2/3):95–120, 1988.
10. W. M. Farmer, J. D. Guttman, F. J. Thayer. IMPS: An interactive mathematical proof system. J. of Automated Reasoning, 11:213–248, 1993.
11. A. Formisano and E. Omodeo. An equational re-engineering of set theories. In Caferra and Salzer [4, pp. 175–190].
12. G. Frege. Logik in der Mathematik. In G. Frege, Schriften zur Logik und Sprachphilosophie. Aus dem Nachlaß herausgegeben von G. Gabriel. Felix Meiner Verlag, Philosophische Bibliothek, Band 277, Hamburg, pp. 92–165, 1971.
13. K. Futatsugi, J. A. Goguen, J.-P. Jouannaud, J. Meseguer. Principles of OBJ2. Proc. 12th annual ACM Symp. on Principles of Programming Languages (POPL’85), pp. 55–66, 1985.
14. R. Godement. Cours d’algèbre. Hermann, Paris, Collection Enseignement des Sciences, 3rd edition, 1966.
15. J. A. Goguen and G. Malcolm. Algebraic semantics of imperative programs. MIT, 1996.
16. T. J. Jech. Set theory. Springer-Verlag, Perspectives in Mathematical Logic, 2nd edition, 1997.
17. E. Landau. Foundation of analysis. The arithmetic of whole, rational, irrational and complex numbers. Chelsea Publishing Co., New York, 2nd edition, 1960.
18. A. Levy. Basic set theory. Springer-Verlag, Perspectives in Mathematical Logic, 1979.
19. P. Martin-Löf. Intuitionistic type theory. Bibliopolis, Napoli, Studies in Proof Theory Series, 1984.

An Open Research Problem: Strong Completeness of R. Kowalski’s Connection Graph Proof Procedure

Jörg Siekmann¹ and Graham Wrightson²

¹ Universität des Saarlandes, Stuhlsatzenhausweg, D-66123 Saarbrücken, Germany. [email protected]
² Department of Computer Science and Software Engineering, The University of Newcastle, NSW 2308, Australia. [email protected]

Abstract. The connection graph proof procedure (or clause graph resolution as it is more commonly called today) is a theorem proving technique due to Robert Kowalski. It is a negative test calculus (a refutation procedure) based on resolution. Due to an intricate deletion mechanism that generalises the well-known purity principle, it substantially refines the usual notions of resolution-based systems and leads to a largely reduced search space. The dynamic nature of the clause graph upon which this refutation procedure is based poses novel meta-logical problems not previously encountered in logical deduction systems. Ever since its invention in 1975, the soundness, confluence and (strong) completeness of the procedure have been in doubt in spite of many partial results. This paper provides an introduction to the problem as well as an overview of the main results that have been obtained in the last twenty-five years.

1 Introduction to Clause Graph Resolution

We assume the reader to be familiar with the basic notions of resolution-based theorem proving (see, for example, Alan Robinson [1965], Chang, C.-L. and Lee, R.C.-T. [1973] or Don Loveland [1978]). Clause graphs introduced a new ingenious development into the field, the central idea of which is the following: in standard resolution two resolvable literals must first be found in the set of sets of literals before a resolution step can be performed, where a set of literals represents a clause (i.e. a disjunction of these literals) and a statement to be refuted is represented as a set of clauses. Various techniques were developed to carry out this search. However, Robert Kowalski [1975] proposed an enhancement to the basic data structure in order to make possible resolution steps explicit, which (as it turned out in subsequent years) not only simplified the search, but also introduced new and unexpected logical problems. This enhancement was gained by the use of so-called links between complementary literals, thus turning the set notation into a graph-like structure. The new approach allowed in particular for the removal of a link after the corresponding resolution step, and a clause that contains a literal which is no longer connected by a link may be removed also (generalised purity principle). An important side effect was that this link removal had the potential to cause the disappearance of even more clauses from the current set of clauses (avalanche effect). Although this effect could reduce the search space drastically, it also had a significant impact on the underlying logical foundations. To quote Norbert Eisinger from his monograph on Kowalski’s clause graphs [1991]:

“Let S and S′ be the current set of formulae before and after a deduction step S → S′. A step of a classical calculus and a resolution step both simply add a formula following from S. Thus, each interpreted as the conjunction of its members, S and S′ are always equivalent. For clause graph resolution, however, S may contain formulae missing in S′, and the removed formulae are not necessarily consequences of those still present in S′. While this does not affect the forward implication, S does in general no longer ensue from S′. In other words, it is possible for S′ to possess more models than S. But, when S is unsatisfiable, so must be S′, i.e. S′ must not have more models than S, if soundness, unsatisfiability and hence refutability, is to be preserved.”

This basic problem underlying all investigations of the properties of the clause graph procedure will be made more explicit in the following.

A.C. Kakas, F. Sadri (Eds.): Computat. Logic (Kowalski Festschrift), LNAI 2408, pp. 231–252, 2002. © Springer-Verlag Berlin Heidelberg 2002

2 Clause Graph Resolution: The Problem

The standard resolution principle, called set resolution in the following, assumes the axioms and the negated theorem to be represented as a set of clauses. In contrast, the clause graph proof procedure represents the initial set of clauses as a graph by drawing a link between pairs of literal occurrences to denote that some relation holds between these two literals. If this relation is “complementarity” (it may denote other relations as well, see e.g. Christoph Walter [1981], but this is the standard case and the basic point of interest in this paper) of the two literals, i.e. resolvability of the respective clauses, then an initial clause graph for the set S = {{ −P (z, c, z), −P (z, d, z)}, {P (a, x, a), −P (a, b, c)}, {P (a, w, c), P (w, y, w)}, {P (u, d, u), −P (b, u, d), P (u, b, b)}, {−P (a, b, b)}, {−P (c, b, c), P (v, a, d), P (a, v, b)}} is the graph in Figure 1. Here P is a ternary predicate symbol, letters from the beginning of the alphabet a, b, c, . . . denote constants, letters from the end of the alphabet x, y, z, v, . . . denote variables and −P (. . . ) denotes the negation of P (. . . ).

Example 1.

[Figure not reproduced: the initial clause graph for S, with each clause drawn as a row of adjacent literal boxes and the links numbered 1 to 10.]

Fig. 1.
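
The construction of such an initial graph can be sketched in a few lines of Python. This is only an illustrative sketch under our own encoding assumptions, not the representation used in any of the systems cited here: a literal is a pair of a sign and a three-letter argument string, the letters u to z are treated as variables (following the convention stated above), and the helper names `rename`, `walk`, `unify` and `init_graph` are hypothetical. Variables are renamed apart per clause, and a link is recorded for every pair of literals of opposite sign in different clauses whose arguments unify.

```python
from itertools import combinations

# Assumed encoding (not from the paper): a literal is (sign, args), where sign
# is True for P(...) and False for -P(...), and args is a 3-letter string.
VARIABLES = set("uvwxyz")  # letters from the end of the alphabet are variables

S = [
    [(False, "zcz"), (False, "zdz")],
    [(True, "axa"), (False, "abc")],
    [(True, "awc"), (True, "wyw")],
    [(True, "udu"), (False, "bud"), (True, "ubb")],
    [(False, "abb")],
    [(False, "cbc"), (True, "vad"), (True, "avb")],
]

def rename(args, tag):
    """Rename variables apart by tagging them with the clause index."""
    return [(a, tag) if a in VARIABLES else a for a in args]

def walk(term, subst):
    """Follow variable bindings until an unbound variable or a constant."""
    while term in subst:
        term = subst[term]
    return term

def unify(args1, args2):
    """Return a most general unifier as a dict, or None if none exists.

    Arguments are constants or variables only (no function symbols),
    so no occurs check is needed.
    """
    subst = {}
    for s, t in zip(args1, args2):
        s, t = walk(s, subst), walk(t, subst)
        if s == t:
            continue
        if isinstance(s, tuple):      # s is an unbound variable
            subst[s] = t
        elif isinstance(t, tuple):    # t is an unbound variable
            subst[t] = s
        else:                         # two distinct constants clash
            return None
    return subst

def init_graph(clauses):
    """Link every unifiable pair of opposite-sign literals in different clauses."""
    links = []
    for (i, ci), (j, cj) in combinations(enumerate(clauses), 2):
        for p, (sign1, a1) in enumerate(ci):
            for q, (sign2, a2) in enumerate(cj):
                if sign1 != sign2:
                    mgu = unify(rename(a1, i), rename(a2, j))
                    if mgu is not None:
                        links.append(((i, p), (j, q), mgu))
    return links

links = init_graph(S)
print(len(links), "links")
```

Under this encoding the sketch finds ten links, which agrees with the ten numbered links of Figure 1; the link between −P(a, b, c) and P(a, w, c) carries the unifier that binds w to b. Links between literals of the same clause are not considered in this sketch.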

An appropriate most general unifier is associated with each link (not shown in the example of Figure 1). We use the now standard notation that adjacent boxes denote a clause, i.e. the disjunction of the literals in the boxes. So far such a clause graph is just a data structure without commitment to a particular proof procedure, and in fact there have been many proposals to base an automated deduction procedure on some graph-like notion (e.g. Andrews [1976], Andrews [1981], Bibel [1981b], Bibel [1982], Chang and Slagle [1979], Kowalski [1975], Shostak [1976], Shostak [1979], Sickel [1976], Stickel [1982], Yates and Raphael and Hart [1970], Omodeo [1982], Yarmush [1976], Murray and Rosenthal [1993], Murray and Rosenthal [1985]). Kowalski’s procedure uses a graph-like data structure as well, but its impact is more fundamental since it operates as follows: suppose we want to perform the resolution step represented by link 6 in Figure 1, based on the unifier σ = {w → b}. Renaming the variables appropriately, we obtain the resolvent {P(a, x′, a), P(b, y′, b)}, which is inserted into the graph; if all additional links are now set, this yields the graph in Figure 2.

[Figure not reproduced: the graph of Figure 1 extended by the resolvent {P(a, x′, a), P(b, y′, b)} and the new links 11 to 14.]

Fig. 2.

Now there are three essential operations:

1. The new links don’t have to be recomputed by comparing every pair of literals again for complementarity; instead, this information can be inherited from the given link structure.
2. The link resolved upon is deleted to mark the fact that this resolution step has already been performed.
3. Clauses that contain a literal with no link connecting it to the rest of the graph may be deleted (generalised purity principle).

While the first point is the essential ingredient for the computational attractiveness of the clause graph procedure, the second and third points show the ambivalence between gross logical and computational advantages versus severe and novel theoretical problems. Let us turn to the above example again. After resolution upon link 6 we obtain the graph in Figure 2 above. Since link 6 has been resolved upon, we delete it according to rule (2). But now the two literals involved become pure, and hence the two clauses can be deleted as well, leading to the following graph:

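[Figure not reproduced: the remaining graph, consisting of the resolvent {P(a, x′, a), P(b, y′, b)} and the clauses {−P(z, c, z), −P(z, d, z)}, {P(u, d, u), −P(b, u, d), P(u, b, b)}, {−P(a, b, b)} and {−P(c, b, c), P(v, a, d), P(a, v, b)}, with links 7 to 14.]

Fig. 3.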

But now the literal −P (c, b, c) in the bottom clause becomes pure as well and hence we have the graph:

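[Figure not reproduced: the remaining graph, consisting of the resolvent and the clauses {−P(z, c, z), −P(z, d, z)}, {P(u, d, u), −P(b, u, d), P(u, b, b)} and {−P(a, b, b)}, with links 9 to 14.]

Fig. 4.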

This removal causes the only literal −P (a, b, b) in the bottom clause to become pure and hence, after a single resolution step followed by all these purity deletions, we arrive at the ﬁnal graph:

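[Figure not reproduced: the final graph, consisting of the resolvent {P(a, x′, a), P(b, y′, b)} and the clause {−P(z, c, z), −P(z, d, z)}, connected by links 11 to 14.]

Fig. 5.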

It is this strong feature, which reduces redundancy in the complementary set of clauses, that accounts for the fascination of this proof procedure (see Ohlbach [1985], Ohlbach [1983], Bläsius [1986] and [1987], Eisinger et al. [1989], Ohlbach and Siekmann [1991], Bläsius et al. [1981], Eisinger [1981], Eisinger and Siekmann and Unvericht [1979], Ohlbach [1987], Ramesh et al. [1997], Murray and Rosenthal [1993], Siekmann and Wrightson [1980]). It can sometimes even reduce the initial redundant set to its essential contradictory subset (subgraph). But this also marks its problematic theoretical status: how do we know that we have not deleted too many clauses? Skipping the details of an exact definition of the various inheritance mechanisms (see e.g. Eisinger [1991] for details), the following example demonstrates the problem.

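Suppose we have the refutable set S = {{P(a), P(a)}, {−P(a)}} and its initial graph as in Figure 6, where PUR means purity deletion, MER stands for merging two literals (Andrews [1968]), and RES stands for resolution.

Example 2.

[Figure not reproduced: two derivations from the initial graph, an upper one by two purity deletions (PUR, PUR) ending in the empty set ∅, and a lower one by merging and resolution (MER, RES) ending in the empty clause {□}.]

Fig. 6.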

Thus in two steps we would arrive either at the empty set ∅, which stands for satisfiability, or, in the lower derivation, at the empty clause {□}, which stands for unsatisfiability. This example would seem to show that the procedure (i) is not confluent, as defined below, (ii) is not sound (correct), and (iii) is not refutation complete (at least not in the strong sense as defined below), and hence would be useless for all practical purposes. But here we can spot the flaw immediately: the process did not start with the full initial graph, where all possible links are set. If, instead, all possible links are drawn in the initial graph, the example in Figure 6 fails to be a counterexample. On the other hand, after a few initial steps we always have a graph with some links deleted, for example because they have been resolved upon. So how can we be sure that the same disastrous phenomenon as in the above example will not occur again later on in the derivation? These problems have been called the confluence, the soundness and the (strong) completeness problem of the clause graph procedure, and it can be shown that for the original formulation of the procedure in Kowalski [1975] (with full subsumption and tautology removal) all three of these essential properties unfortunately do not hold in general. However, for suitable remedies (of subsumption and tautology removal) the first two properties hold, whereas the third property has been open ever since.
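
The role of the full initial graph in Example 2 can be illustrated with a short Python sketch. It rests on an assumption of ours, since the figure is not available here: the incomplete graph is taken to lack the link between the second occurrence of P(a) and −P(a). The function name `delete_pure_clauses` and the clause names are hypothetical.

```python
def delete_pure_clauses(clauses, links):
    """Repeatedly delete a clause that has a literal occurrence without a link."""
    clauses, links = dict(clauses), set(links)
    while True:
        linked = {end for l in links for end in l}
        pure = [c for c, lits in clauses.items()
                if any((c, i) not in linked for i in range(len(lits)))]
        if not pure:
            return clauses, links
        del clauses[pure[0]]
        links = {l for l in links if all(c != pure[0] for c, _ in l)}

# S = {{P(a), P(a)}, {-P(a)}} with its literal occurrences kept apart.
S = {"C1": ["Pa", "Pa"], "C2": ["-Pa"]}
full = {frozenset({("C1", 0), ("C2", 0)}), frozenset({("C1", 1), ("C2", 0)})}
partial = {frozenset({("C1", 0), ("C2", 0)})}  # assumed: one link is missing

print(delete_pure_clauses(S, partial))  # both clauses are deleted
print(delete_pure_clauses(S, full))     # nothing is deleted
```

With the incomplete link set, purity deletion empties the graph although S is unsatisfiable; with all links set, no literal is pure and no clause can be deleted.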

3 Properties and Results for the Clause Graph Proof Procedure

In order to capture the strange and novel properties of logical graphs let us fix the following notions: A clause graph of a set of clauses S consists of a set of nodes labelled by the literal occurrences in S and a set of links that connect complementary literals. There are various possibilities to make this notion precise (e.g. Siekmann and Stephan [1976] and [1980], Brown [1976], Eisinger [1986] and [1991], Bibel [1980], Smolka [1982a,b,c], Bibel and Eder [1997], Hähnle et al. [2001], Murray and Rosenthal [1985]). Let INIT(S) be the full initial clause graph for S with all possible links set. This is called a full connection graph in Bibel and Eder [1997], a total graph in Eisinger [1991] and in Siekmann and Stephan [1976], and a complete graph in Brown [1976].

Definition 1. Clause graph resolution is called
  refutation sound: if $\mathrm{INIT}(S) \xrightarrow{*} \{\Box\}$ then S is unsatisfiable;
  refutation complete: if S is unsatisfiable then there exists a derivation $\mathrm{INIT}(S) \xrightarrow{*} \{\Box\}$;
  refutation confluent: if S is unsatisfiable, and if $\mathrm{INIT}(S) \xrightarrow{*} G_1$ and $\mathrm{INIT}(S) \xrightarrow{*} G_2$, then there exist $G_1 \xrightarrow{*} G'$ and $G_2 \xrightarrow{*} G'$ for some $G'$;
  affirmation sound: if $\mathrm{INIT}(S) \xrightarrow{*} \emptyset$ then S is satisfiable;
  affirmation complete: if S is satisfiable then there exists a derivation $\mathrm{INIT}(S) \xrightarrow{*} \emptyset$;
  affirmation confluent: if S is satisfiable, and if $\mathrm{INIT}(S) \xrightarrow{*} G_1$ and $\mathrm{INIT}(S) \xrightarrow{*} G_2$, then there exist $G_1 \xrightarrow{*} G'$ and $G_2 \xrightarrow{*} G'$ for some $G'$.

The state of knowledge about the clause graph proof procedure at the end of the 1980s can be summarised by the following major theorems. There are some subtleties when subsumption and tautology removal are involved (see Eisinger [1991] for a thorough exposition; the discovery of the problems with subsumption and tautology removal and an appropriate remedy for these problems is due to Wolfgang Bibel).

Theorem 1 (Bibel, Brown, Eisinger, Siekmann, Stephan). Clause graph resolution is refutation sound.

Theorem 2 (Bibel). Clause graph resolution is refutation complete.

Theorem 3 (Eisinger, Smolka, Siekmann, Stephan). Clause graph resolution is refutation confluent.

Theorem 4 (Eisinger). Clause graph resolution is affirmation sound.

Theorem 5 (Eisinger). Clause graph resolution is not affirmation confluent.

Theorem 6 (Smolka). For the unit refutable class, clause graph resolution with an unrestricted tautology rule is refutation complete, refutation confluent, affirmation sound, (and strongly complete).

The important notion of strong completeness is introduced below.

Theorem 7 (Eisinger). Clause graph resolution with an unrestricted tautology rule is refutation complete, but neither refutation confluent nor affirmation sound.

As important and essential as the above-mentioned results may be, they are not enough for the practical usefulness of the clause graph procedure: the principal requirement for a proof procedure is not only to know that there exists a refutation, but even more importantly that the procedure can actually find it after a finite number of steps. These two notions, called refutation completeness and strong refutation completeness in the following, essentially coincide for set resolution, but unfortunately they do not do so for the clause graph procedure. This can be demonstrated by the example in Figure 7, where we start with the graph G0 and derive G1 from G0 by resolution upon the link marked ☞. The last graph G2 contains a subgraph that is isomorphic to the first, hence the corresponding inference steps can be repeated over and over again and the procedure will not terminate with the empty clause. Note that a refutation, i.e. the derivation of the empty clause, could have been obtained by resolving upon the leftmost link between P and −P.

Example 3 (adapted from Eisinger [1991]).

[Figure not reproduced: the graphs G0, G1 and G2 of the derivation G0 → G1 → G2, with the link resolved upon in each step marked ☞.]

Fig. 7.

Examples of this nature gave rise to the strong completeness conjecture, which in spite of numerous attacks has remained an open problem for over twenty years: how can we ensure for an unsatisfiable graph that the derivation stops after finitely many steps with a graph that contains the empty clause? If this crucial property cannot be ascertained, the whole procedure would be rendered useless for all practical purposes, as we would have to backtrack to some earlier state in the derivation, and hence would have to store all intermediate graphs. The theoretical problems and strange counterintuitive facts that arise from the (graphical) representation were first discovered by Jörg Siekmann and Werner Stephan and reported independently in Siekmann and Stephan [1976] and [1980] and by Frank Brown in Brown [1976]. They suggested a remedy to the problem: the obvious flaw in the above example can be attributed to the fact that the proof procedure never selects the essential link for the refutation (the link between −P and P). This, of course, is a property which a control strategy should have, i.e. it should be fair in the sense that every link is eventually selected. However, this is a subtle property in the dynamic context of the clause graph procedure, as we shall see in the following.

Control Strategies

In order to capture the strange metalogical properties of the clause graph procedure, and in particular the above-mentioned awkward phenomenon, Siekmann and Stephan [1976] and [1980] introduced two essential notions. These two notions have been the essence of all subsequent investigations: (i) the notion of a kernel, which is now sometimes called the minimal refutable subgraph of a graph, e.g. in Bibel and Eder [1997]; (ii) several notions of covering, called fairness in Bibel and Eder [1997], exhaustiveness in Brown [1976], fairness-one and fairness-two in Eisinger [1991] and covering-one, two and three in Siekmann and Stephan [1976]. Let us have a look at these notions in turn, using the more recent and advanced notation of Eisinger [1991]. Why is it not enough to simply prove refutation completeness as in the case of clause set resolution? Ordinary refutation completeness ensures that if the initial set of clauses is unsatisfiable, then there exists a refutation, i.e. a finite derivation of the empty clause. Of course, there is a control strategy for which this would be sufficient for clause graph resolution as well, namely an exhaustive enumeration of all possible graphs, as in Figure 8, where we assume that the initial graph G0 has n links. However, such a strategy is computationally infeasible and far too expensive and would make the whole approach useless.

[Figure not reproduced: the tree of all derivations from G0, with successors G01, G02, G03, ..., G0n and, below G01, the graphs G011, ..., G01m.]

Fig. 8.

We know by Theorem 2 that the clause graph procedure is refutation complete, i.e. that there exists a subgraph from which the derivation can be obtained. Could we not use this information from a potential derivation we know to exist in order to guide the procedure in general?


Many strategies for clause graphs are in fact based on this very idea (Andrews [1981], Antoniou and Ohlbach [1983], Bibel [1981a], Bibel [1982], Chang and Slagle [1979], Sickel [1976]). However, in general, ﬁnding the appropriate subgraph essentially amounts to ﬁnding a proof in the ﬁrst place and we might as well use a standard resolution-based proof procedure to ﬁnd the derivation and then use this information to guide the clause graph procedure. So let us just assume in the abstract that every full (i.e. a graph where every possible link is set) and unsatisﬁable graph contains a subgraph, called a kernel (the shaded area in Figure 9), from which an actual refutation can be found in a ﬁnite number of steps.

Fig. 9.

We know from Theorem 2 above and from the results in Siekmann and Stephan [1976] and [1980] that every resolution step upon a link within the kernel eventually leads to the empty clause and thus to the desired refutation. If we can ensure that:

1. resolution steps involving links outside of the kernel do not destroy the kernel, and 2. every link in the kernel is eventually selected,

then we are done. This has been the line of attack ever since. Unfortunately the second condition turned out to be more subtle and rather diﬃcult to establish. So far no satisfactory solution to this problem has been found. So let us look at these concepts a little closer.


Definition 2. A filter for an inference system is a unary predicate F on the set of finite sequences of states. The notation $S_0 \xrightarrow{*} S_n$ with F stands for a derivation $S_0 \xrightarrow{*} S_n$ where $F(S_0 \dots S_n)$ holds. For an infinite derivation, $S_0 \to \dots \to S_n \to \dots$ with F means that $F(S_0 \dots S_n \dots)$ holds for each n.

This notion is due to Gert Smolka [1982b] and Norbert Eisinger [1991] and it is now used in several monographs on deduction systems (see e.g. K. Bläsius and H. J. Bürckert [1992]). Typical examples for a filter are the usual restriction and ordering strategies in automated theorem proving, such as set-of-support by Wos and Robinson and Carson [1965], linear refutation by Loveland [1970], merge resolution by Andrews [1968], unit resolution by Wos [1964], or see Kowalski [1970].

Definition 3. A filter F for clause graph resolution is called
  refutation sound: if $\mathrm{INIT}(S) \xrightarrow{*} \{\Box\}$ with F, then S is unsatisfiable;
  refutation complete: if S is unsatisfiable, then there exists $\mathrm{INIT}(S) \xrightarrow{*} \{\Box\}$ with F;
  refutation confluent: let S be unsatisfiable; for $\mathrm{INIT}(S) \xrightarrow{*} G_1$ with F and $\mathrm{INIT}(S) \xrightarrow{*} G_2$ with F, there exist $G_1 \xrightarrow{*} G'$ with F and $G_2 \xrightarrow{*} G'$ with F, for some $G'$;
  strong refutation completeness: for an unsatisfiable S there does not exist an infinite derivation $\mathrm{INIT}(S) \to G_1 \to G_2 \to \dots \to G_n \to \dots$ with F.

Note that → with F need not be transitive, hence the special form of confluence; also note that the procedure terminates with {□} or with ∅. The most important and still open question is now: can we find a general property for a filter that turns the clause graph proof procedure into a strongly complete system? Obviously the filter has to make sure that every link (in particular every link in some fixed kernel) is eventually selected for resolution and not infinitely postponed.

Definition 4. A filter F for clause graph resolution is called covering, if the following holds: Let $G_0$ be an initial graph, let $G_0 \xrightarrow{*} G_n$ with F be a derivation, and let λ be a link in $G_n$. Then there is a finite number n(λ), such that for any derivation $G_0 \xrightarrow{*} G_n \xrightarrow{*} G'$ with F extending the given one by at least n(λ) steps, λ is not in $G'$.

This is the weakest notion, called “covering-three” in Siekmann and Stephan [1976], exhaustiveness in Brown [1976] and fairness in Bibel and Eder [1997]. It is well-known and was already observed in Siekmann and Stephan [1976] that the strong completeness conjecture is false for this notion of covering. The problem is that a link can disappear without being resolved upon, namely by purity deletion, as the examples from the beginning demonstrate. Even the original links in the kernel can be deleted without being resolved upon, but may reappear after the copying process.
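
One simple selection discipline that would be covering in the sense of Definition 4 is a first-in-first-out agenda of links. It is offered here only as an illustration, not as a strategy taken from the literature cited above, and it assumes that every derivation step resolves upon the link returned by `select`. Since each step removes the link at the head of the agenda and new links join at the tail, a link with k older links ahead of it has been selected or has vanished after at most k + 1 steps. Links are opaque labels, and the class name `FifoAgenda` and its methods are hypothetical.

```python
from collections import deque

class FifoAgenda:
    """Select links in order of creation (a hypothetical selection strategy)."""

    def __init__(self, links=()):
        self.queue = deque(links)
        self.present = set(links)

    def add(self, link):
        """Register a newly created link; it waits behind all older links."""
        if link not in self.present:
            self.present.add(link)
            self.queue.append(link)

    def remove(self, link):
        """Record that a link vanished without being selected (e.g. by purity)."""
        self.present.discard(link)

    def select(self):
        """Return the oldest link still present, or None if no link is left."""
        while self.queue:
            link = self.queue.popleft()
            if link in self.present:
                self.present.remove(link)
                return link
        return None

agenda = FifoAgenda(["l1", "l2", "l3"])
order = []
for step in range(3):
    chosen = agenda.select()
    order.append(chosen)
    # pretend that every resolution step creates two new links
    agenda.add(f"{chosen}.a")
    agenda.add(f"{chosen}.b")
print(order)
```

Here the three initial links are selected before any of the newly created ones. As the surrounding text points out, this weakest form of covering is not sufficient for strong completeness.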

An Open Research Problem

243

For this reason stronger notions of fairness are required: apparently even essential links can disappear without being resolved upon and reappear later due to the copying process. Hence we have to make absolutely sure that every link in the kernel is eventually resolved upon. To this end imagine that each initial link bears a distinct colour and that each descendant of a coloured link inherits the ancestor's colour:

Definition 5. An ordering filter F for clause graph resolution is called covering_two, if it is a covering and at least one link of each colour must have been resolved upon after at most finitely many steps.

At first sight this definition now seems to capture the essence, but how do we know that the "right" descendant (as there may be more than one) of the coloured ancestor has been operated upon? Hence the strongest definition of fairness for a filter:

Definition 6. A filter F for clause graph resolution is called covering_one, if each colour must have disappeared after at most finitely many steps.

While the strong completeness conjecture can be shown in the positive for the latter notion of covering (see Siekmann and Stephan [1980]), hardly any of the practical and standard filters actually fulfill this property (except for some obvious and exotic cases). So the strong completeness conjecture boils down to finding:

1. a proof or a refutation that a covering filter is strongly complete, for the appropriate notions of covering_one, covering_two, and covering_three, and
2. strong completeness results for subclasses of the full first-order predicate calculus, or
3. an alternative notion of covering for which strong completeness can be shown.

The first two problems were settled by Norbert Eisinger and Gerd Smolka.

Theorem 8 (Smolka). For the unit refutable class the strong completeness conjecture is true, i.e. the conjunction of a covering filter with any refutation complete and refutation confluent restriction filter is refutation complete, refutation confluent, and Noetherian, i.e. it terminates.

This theorem, whose essential contribution is due to Gerd Smolka [1982a], accounts for the optimism at the time. After all, the unit refutable class of clauses (Horn clauses) turned out to be very important for many practical purposes, including logic programming, and the theorem shows that all the essential properties of a useful proof procedure now hold for the clause graph procedure. Based on an ingenious construction, however, Norbert Eisinger showed the following devastating result, which we will look at again in more detail in Section 4.


Jörg Siekmann and Graham Wrightson

Theorem 9 (Eisinger). In general the strong completeness conjecture is false, even for a restriction filter based on the covering_two definition.

This theorem destroyed once and for all the hope of finding a solution to the problem based on the notion of fairness, as it shows that even for the strongest possible form of fairness, strong completeness cannot be obtained. So attention turned to the third of the above options, namely finding alternative notions of a filter for which strong completeness can be shown. Early results are in Wrightson [1989] and Eisinger [1991]; more recent results are in Hähnle et al. [2001] and Meagher and Hext [1998]. Let us now look at the proof of Theorem 9 in more detail.

4 The Eisinger Example

This example is taken from Eisinger [1991], p. 158, Example 7.4 7. It shows a cyclic covering_two derivation, i.e. it shows that the clause graph proof procedure does not terminate even for the strong notion of a covering_two filter, hence in particular not for the notion of covering_three either. Let S = {PQ, −PQ, −Q−R, RS, R−S} and INIT(S) = G0.

[Figure: the clause graphs G0 to G9 of the derivation, drawn as clause nodes with numbered links between complementary literals.]

G8 includes two copies of −Q−P, one of which might be removed by subsumption. To make sure that the phenomenon is not just a variation of the notorious subsumption problem described earlier in his monograph, Norbert Eisinger does not subsume, but performs the corresponding resolution steps for both clause nodes in succession.

[Figure: the clause graphs G10 and G11.]

G10 contains two tautologies and all links which are possible among its clause nodes. In other words, it is the initial clause graph of {S−Q, −S−Q, QP, −P−R, −PR, Q−Q, Q−Q}. So far only resolution steps and purity removals were performed; now apply two tautology removals to obtain G11. G11 has the same structure as G0, from which it can be obtained by applying the literal permutation π: ±Q → ∓Q, ±P → ±S → ∓R → ±P. Since π^6 = id, five more "rounds" with the analogous sequence of inference steps will reproduce G0 as G66; thus after sixty-six steps we arrive at a graph isomorphic to G0.

The only object of G0 still present in G11 is the clause node labelled PQ. In particular, all initial links disappeared during the derivation. Hence G0 and G66 have no object in common, which implies that the derivation is covering. The following classes of link numbers represent the "colours" introduced for the covering_two concept in Definition 5; the numbers of links resolved upon are asterisked: {1*}, {2, 8, 17, 18, 20*, 23, 24*}, {3*, 9*, 19}, {4*, 7*}, {5, 11, 13, 21, 25, 26, ..., 36}, {6, 10, 12, 14, 15*, 16*, 22*}. Only the colour {5, 11, ..., 36} was never selected for resolution during the first round, and it just so happens that the second round starts with a resolution on link 11, which bears the critical colour. Hence the derivation also belongs to the covering_two class.

This seminal example was discovered in the autumn of 1986 and has since been published and quoted many times. It has once and for all destroyed all hope of a positive result for the strong completeness conjecture based only on the notion of covering or fairness.

The consequence of this negative result has been compared to the most unfortunate fact that the halting problem of a Turing machine is unsolvable. The (weak) analogy is in the following sense: all the work on deduction systems rests upon the basic result that the predicate calculus is semidecidable, i.e. if the theorem to be shown is in fact valid then this can be shown after a finite number of steps, provided the uniform proof procedure carries out every possible inference step. Yet here we have a uniform proof procedure (clause graph resolution) which by any intuitive notion of fairness ("carries out every possible inference step eventually") runs forever even on a valid theorem, hence is not even semidecidable.

In summary: The open problem is to find a filter that captures the essence of fairness on the kernel and is practically useful¹, and then to show that the strong completeness property holds for this new notion of a filter. The open problem is not to invent an appropriate termination condition (even as brilliant as the decomposition criteria of Bibel and Eder [1997]²), as the proof procedure will not terminate even for the strongest known notion of covering (fairness), and this is exactly why the problem is still interesting even when the day is gone.
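The permutation argument in the Eisinger example can be checked mechanically. The following is a minimal sketch in Python, not part of the original construction: the encoding of literals as (sign, atom) pairs and all function names are assumptions made for illustration. It checks that π maps the clause set S onto the clause set of G11 (without the two tautologies), that π has order six, and that the initial graph of S has six links, one per colour class listed above.

```python
from itertools import combinations

# Illustrative encoding (an assumption): a literal is (sign, atom) with
# sign +1 or -1, and a clause is a frozenset of literals.
def lit(text):
    return (-1, text[1:]) if text.startswith("-") else (1, text)

def clause(*lits):
    return frozenset(lit(t) for t in lits)

# Clause sets as given in the text; G11 is G10 without the two tautologies.
S = {clause("P", "Q"), clause("-P", "Q"), clause("-Q", "-R"),
     clause("R", "S"), clause("R", "-S")}
G11 = {clause("S", "-Q"), clause("-S", "-Q"), clause("Q", "P"),
       clause("-P", "-R"), clause("-P", "R")}

def pi(literal):
    """The literal permutation +-Q -> -+Q, +-P -> +-S -> -+R -> +-P."""
    sign, atom = literal
    if atom == "Q":
        return (-sign, "Q")
    if atom == "P":
        return (sign, "S")
    if atom == "S":
        return (-sign, "R")
    if atom == "R":
        return (-sign, "P")
    raise ValueError(atom)

def apply_pi(clauses, times=1):
    """Apply pi to every literal of every clause, `times` times in a row."""
    for _ in range(times):
        clauses = {frozenset(pi(l) for l in c) for c in clauses}
    return clauses

def order_of_pi():
    """Smallest k >= 1 such that pi applied k times is the identity."""
    literals = [(s, a) for s in (1, -1) for a in "PQRS"]
    current, k = [pi(l) for l in literals], 1
    while current != literals:
        current, k = [pi(l) for l in current], k + 1
    return k

def count_links(clauses):
    """Number of complementary literal pairs between distinct clauses."""
    return sum(1 for c, d in combinations(list(clauses), 2)
               for (s, a) in c if (-s, a) in d)

print(apply_pi(S) == G11, order_of_pi(), count_links(S))
```

The sketch only inspects clause sets; it does not replay the sixty-six inference steps or track link colours.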

¹ This is important, as there are strategies which are known to be complete (for example to take a standard resolution theorem prover to find a proof and then use this information for clause-graph resolution). Hence these strategies are either based on some strange notion, or else on some too specific property.

² The weak notion of fairness as defined by W. Bibel and E. Eder [1997] can easily be refuted by much simpler examples (see e.g. Siekmann and Stephan [1976]) and Norbert Eisinger's construction above refutes a much stronger conjecture. The proof in the Bibel and Eder paper not only contains an excusable technical error, which we all are unfortunately prone to (the flaw is on page 336, line 29, where they assume that the fairness condition forces the procedure to resolve upon every link in the minimal complementary submatrix, here called the kernel), but unfortunately misses the very nature of the open problem (see also Siekmann and Wrightson [2001]).

5 Lifting

All of the previous results and counterexamples apply to the propositional case, or ground level as it is called in the literature on deduction systems. The question is if and how these ground results can be lifted to the general case of the predicate calculus. While lifting is not necessarily the wrong approach for the connection graph, the proof techniques known so far are too weak: the problem is more subtle and requires much stronger machinery for the actual lifting.

The standard argument is as follows: first the result is established for the ground case, and there is now a battery of proof techniques³ known in order to do so. After that the result is "lifted" to the general case in the following sense: Let S be an unsatisfiable set of clauses; then by Herbrand's theorem we know that there exists a finite truth-functionally contradictory set S′ of ground instances of S. Now since we have the completeness result for this propositional case we know there exists a (resolution style) derivation. Taking this derivation, we observe that all the clauses involved are just instances of the clauses at the general level and hence "lifting" this derivation amounts to exhibiting a mirror image of this derivation at the general level, as the following figure shows:

    S          {}
    ⇓          ⇑
    S′         {}        ground

This proof technique is due to Alan Robinson [1965]. Unfortunately this is not enough for the clause graph procedure, as we have the additional graph-like structure: not only has the ground proof to be lifted to the general level as usual, it has also to be shown that an isomorphic (or otherwise sufficient) graph structure can be mirrored from the ground level graph INIT(S′) to the graph at the general level INIT(S), such that the derivation can actually be carried out within this graph structure as well:

    INIT(S)        {G(□)}
    ⇓              ⇑
    INIT(S′)       {G′(□)}        ground

where G(□) is a clause graph that contains the empty clause □. This turned out to be more difficult than expected in the late 1970's, when most of this work got started. However, by the end of the 1980's it was well known that standard lifting techniques fail: the non-standard graph-oriented lifting results in Siekmann and Stephan [1980] turned out to be false. Similarly the lifting results in Bibel [1982] and in Bibel and Eder [1997], theorem 5.4, are also false.

³ Such as induction on the excess-literal-number, which is due to W. Bledsoe (see Loveland [1978]).

To quote from Norbert Eisinger's monograph ([1991], p. 125) on clause graphs: "Unfortunately the idea (of lifting a refutation) fails for an intricate difficulty which is the central problem in lifting graph theoretic properties. A resolution step on a link in G (the general case) requires elimination of all links in G′ (the ground refutation) that are mapped to the link in G. . . . Such a side effect can forestall the derivation of the successor."

This phenomenon seems to touch upon a new and fundamental problem, namely, the lifting technique has to take the topological structure of the two graphs (the ground graph and the general clause graph) into account as well, and several additional graph-theoretical arguments are asked for.

The ground case part essentially develops a strategy which from any ground initial state leads to a final state. In the clause graph resolution system any such strategy has to willy-nilly distinguish between "good" steps and "bad" steps from each ground state, because there are ground case examples where an inappropriate choice of inference steps leads to infinite derivations that do not reach a final state. Eliminating or reducing the number of links with a given atom are sample criteria for "good" steps in different strategies.

The lifting part then exploits the fact that it suffices to consider the conjunction of finitely many ground instances of a given first order formula, and shows how to lift the steps of a derivation existing for the ground formula to the first order level. Clause graph resolution faces the problem that a single resolution step on the general level couples different ground level steps together in a way that may be incompatible with a given ground case strategy, because "bad" steps have to be performed as a side effect of "good" steps.
That this is not always straightforward and may fail in general is shown by several (rather complex) examples (pp. 123–130 in Eisinger [1991]), which we shall omit here. The interested reader may consult the monograph itself, which still represents most of what is known about the theoretical properties of clause graphs today.

To be sure, there is a very simple way to solve this problem: just add to the inference system an unrestricted copy rule and use it to insert sufficiently many variants. However, to introduce an unrestricted copy rule, as, for example, implicitly assumed in the Bibel [1982] monograph, completely destroys the practical advantages of the clause graph procedure. It is precisely the advantage of the strong redundancy removal which motivated so many practical systems to employ this rather complicated machinery (see e.g. Ohlbach and Siekmann [1991]). Otherwise we may just use ordinary resolution instead.

We feel that maybe the lifting technique should be abandoned altogether for clause graph refutation systems: the burden of mapping the appropriate graph structure (and taking its dynamically changing nature into account) seems to outweigh its advantages and a direct proof at the most general level with an appropriate technique appears far more promising. But only the future will tell.
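The coupling of ground-level steps described in this section can be pictured with a deliberately small toy model. The sketch below is an assumption-laden illustration in Python, not Eisinger's construction: the clauses, the link names and the function are hypothetical, and resolvents and link inheritance are ignored. It only shows that removing one general-level link removes every ground link mapped to it, including links a ground strategy might have wanted to keep.

```python
# Hypothetical general-level link between the literals P(x) and -P(y).
general_links = {"L1": ("P(x)", "-P(y)")}

# Hypothetical ground links; each records the general link it is mapped to.
ground_links = {
    "g1": (("P(a)", "-P(a)"), "L1"),
    "g2": (("P(b)", "-P(b)"), "L1"),
}

def resolve_general(link_name, general, ground):
    """Resolve on a general link in this toy model.

    The general link is removed, and so is every ground link mapped to it,
    whether or not the ground-level strategy selected that ground link.
    Resolvents and inherited links are deliberately left out.
    """
    remaining_general = {k: v for k, v in general.items() if k != link_name}
    removed = sorted(k for k, (_, parent) in ground.items()
                     if parent == link_name)
    remaining_ground = {k: v for k, v in ground.items()
                        if v[1] != link_name}
    return remaining_general, remaining_ground, removed

# A ground strategy might only want to resolve on g1 ("good" step),
# but the general step on L1 also removes g2 as a side effect.
rest_general, rest_ground, removed = resolve_general(
    "L1", general_links, ground_links)
print(removed)
```

In this toy setting the side effect is harmless; the examples in Eisinger [1991] concern cases where it is not.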

6 Conclusion

The last twenty-five years have seen many attempts and partial results about so far unencountered theoretical problems that marred this new proof procedure, but it is probably no unfair generalisation to say that almost every paper (including ours) on the problems has had technical flaws or major errors, and the main problem, strong completeness, has been open ever since 1975, when clause graph resolution was first introduced to the scholarly community. Why is that so?

One reason may be methodological. Clause graph resolution is formulated within three different conceptual frameworks: the usual clausal logic, the graph-theoretic properties and finally the algorithmic aspects, which account for its nonmonotonic nature. So far most of the methodological effort has been spent on the graph-theoretical notions (see e.g. Eisinger [1991]) in order to obtain a firm theoretical basis, the hope being that once these graph-theoretical properties have a sound mathematical foundation, the rest will follow suit. But this may have been a misconception: it is, after all, the metalogical properties of the proof procedure we are after, and hence the time may have come to question the whole approach. In (Gabbay, Siekmann [2001]) we try to turn the situation back from its (graph-theoretical) head to standing on its (logical) feet, by showing a logical encoding of the proof procedure without explicit reference to graph-theoretical properties.

Mathematics, it is said, advances through conjectures and refutations and this is a social process often carried out over more than one generation. Theoretical computer science and artificial intelligence apparently are no exceptions to this general rule.

Acknowledgements

This paper has been considerably improved by critical comments and suggestions from the anonymous referees and from Norbert Eisinger, Christoph Walther and Dov Gabbay. The authors would like to thank Oxford University Press for their kind permission to reprint this paper, which is appearing in the Logic Journal of the IGPL.

References

Andrews, P. B.: Resolution with Merging. J. ACM 15 (1968) 367–381.
Andrews, P. B.: Refutations by Matings. IEEE Trans. Comp. C-25, (1976) 8, 801–807.
Andrews, P. B.: Theorem Proving via General Matings. J. ACM 28 (1981) 193–214.
Antoniuo, G., Ohlbach, H. J.: Terminator. Proceedings 8th IJCAI, Karlsruhe, (1983) 916–919.
Bibel, W.: A Strong Completeness Result for the Connection Graph Proof Procedure. Bericht ATP-3-IV-80, Institut für Informatik, Technische Universität, München (1980)
Bibel, W.: On the completeness of connection graph resolution. In German Workshop on Artificial Intelligence. J. Siekmann, ed. Informatik Fachberichte 47, Springer, Berlin, Germany (1981a) pp. 246–247
Bibel, W.: On matrices with connections. J. ACM 28 (1981b) 633–645
Bibel, W.: Automated Theorem Proving. (1982) Vieweg. Wiesbaden.
Bibel, W.: Matings in matrices. Commun. ACM 26 (1983) 844–852
Bibel, W., Eder, E.: Decomposition of tautologies into regular formula and strong completeness of connection-graph resolution. J. ACM 44 (1997) 320–344
Bläsius, K. H.: Construction of equality graphs. SEKI report SR-86-01 (1986) Univ. Karlsruhe, Germany
Bläsius, K. H.: Equality reasoning based on graphs. SEKI report SR-87-01 (1987) Univ. Karlsruhe, Germany
Bläsius, K. H., Bürckert, H. J.: Deduktions Systeme, (1992) Oldenbourg Verlag. Also in English: Ellis Horwood, 1989
Bläsius, K. H., Eisinger, N., Siekmann, J., Smolka, G., Herald A., Walter, C.: The Markgraf Karl refutation procedure. Proc 7th IJCAI, Vancouver (1981)
Brown, F.: Notes on Chains and Connection Graphs. Personal Notes, Dept. of Computation and Logic, University of Edinburgh (1976)
Chang, C.-L., Lee, R. C.-T.: Symbolic Logic and Mechanical Theorem Proving, Academic Press (1973)
Chang, C.-L., Slagle, J. R.: Using Rewriting Rules for Connection Graphs to Prove Theorems. Artificial Intelligence 12 (1979) 159–178.
Eisinger, N.: What you always wanted to know about clause graph resolution. In Proc of 8th Conf. on Automated Deduction, Oxford (1986) LNCS 230, Springer
Eisinger, N.: Subsumption for connection graphs. Proc 7th IJCAI, Vancouver (1981)
Eisinger, N.: Completeness, Confluence, and Related Properties of Clause Graph Resolution. Ph.D. dissertation, Universität Kaiserslautern (1988)
Eisinger, N.: Completeness, Confluence, and Related Properties of Clause Graph Resolution. Pitman, London, Morgan Kaufmann Publishers, Inc., San Mateo, California (1991)
Eisinger, N., Siekmann, J., Unvericht, E.: The Markgraf Karl refutation procedure. Proc of Conf on Automated Deduction, Austin, Texas (1979)
Eisinger, N., Ohlbach, H. J., Präcklein, A.: Elimination of redundancies in clause sets and clause graphs (1989) SEKI report SR-89-01, University of Karlsruhe
Gabbay, D., Siekmann, J.: Logical encoding of the clause graph proof procedure, 2002, forthcoming
Hähnle, R., Murray, N. V., Rosenthal, E.: Ordered resolution versus connection graph resolution. In: R. Goré, A. Leitsch, T. Nipkow: Automated Reasoning, Proc of IJCAR 2001 (2001) LNAI 2083, Springer
Kowalski, R.: Search Strategies for Theorem Proving. Machine Intelligence (B. Meltzer and D. Michie, eds.), 5, Edinburgh University Press, Edinburgh, (1970) 181–201
Kowalski, R.: A proof procedure using connection graphs. J. ACM 22 (1975) 572–595
Loveland, D. W.: A Linear Format for Resolution. Proc. of Symp. on Automatic Demonstration. Lecture Notes in Math 125, Springer Verlag, Berlin, (1970) 147–162. Also in Siekmann and Wrightson [1983b], 377–398
Loveland, D. W.: Automated Theorem Proving: A Logical Basis. North-Holland, New York (1978)
Meagher, D., Hext, J.: Link deletion in resolution theorem proving (1998) unpublished manuscript
Murray, N. V., Rosenthal, E.: Path resolution with link deletion. Proc. of 9th IJCAI, Los Angeles (1985)
Murray, N. V., Rosenthal, E.: Dissolution: making paths vanish. J. ACM 40 (1993)
Ohlbach, H. J.: Ein regelbasiertes Klauselgraph Beweisverfahren. Proc. of German Conference on AI, GWAI-83 (1983) Springer Verlag IFB vol 76
Ohlbach, H. J.: Theory unification in abstract clause graphs. Proc. of German Conf. on AI, GWAI-85 (1985) Springer Verlag IFB vol 118
Ohlbach, H. J.: Link inheritance in abstract clause graphs. J. Autom. Reasoning 3 (1987)
Ohlbach, H. J., Siekmann, J.: The Markgraf Karl refutation procedure. In: J. L. Lassez, G. Plotkin, Computational Logic (1991) MIT Press, Cambridge MA
Omodeo, E. G.: The linked conjunct method for automatic deduction and related search techniques. Computers and Mathematics with Applications 8 (1982) 185–203
Ramesh, A., Beckert, B., Hähnle, R., Murray, N. V.: Fast subsumption checks using anti-links. J. Autom. Reasoning 18 (1997) 47–83
Robinson, J. A.: A machine-oriented logic based on the resolution principle. J. ACM 12 (1965) 23–41
Shostak, R. E.: Refutation Graphs. J. Artificial Intelligence 7, (1976), 51–64
Shostak, R. E.: A Graph-Theoretic View of Resolution Theorem-Proving. Report SRI International, Menlo Park (1979)
Sickel, S.: A Search Technique for Clause Interconnectivity Graphs. IEEE Trans. Comp. C-25 (1976) 823–835
Siekmann, J. H., Stephan, W.: Completeness and Soundness of the Connection Graph Proof Procedure. Bericht 7/76, Fakultät Informatik, Universität Karlsruhe (1976). Also in Proceedings of AISB/GI Conference on Artificial Intelligence, Hamburg (1978)
Siekmann, J. H., Stephan, W.: Completeness and Consistency of the Connection Graph Proof Procedure. Interner Bericht Institut I, Fakultät Informatik, Universität Karlsruhe (1980)
Siekmann, J. H., Wrightson, G.: Paramodulated connection graphs. Acta Informatica 13 (1980)
Siekmann, J. H., Wrightson, G.: Automation of Reasoning. Springer-Verlag, Berlin, Heidelberg, New York. Vol 1 and vol 2 (1983)
Siekmann, J. H., Wrightson, G.: Erratum: A counterexample to W. Bibel's and E. Eder's strong completeness result for connection graph resolution. J. ACM 48 (2001) 145
Smolka, G.: Completeness of the connection graph proof procedure for unit refutable clause sets. In Proceedings of GWAI-82. Informatik Fachberichte, vol. 58. Springer-Verlag, Berlin, Germany (1982a) 191–204.
Smolka, G.: Einige Ergebnisse zur Vollständigkeit der Beweisprozedur von Kowalski. Diplomarbeit, Fakultät Informatik, Universität Karlsruhe (1982b)
Smolka, G.: Completeness and confluence properties of Kowalski's clause graph calculus (1982c) SEKI report SR-82-03, University of Karlsruhe, Germany
Stickel, M.: A Non-Clausal Connection-Graph Resolution Theorem-Proving Program. Proceedings AAAI-82, Pittsburgh (1982) 229–233
Walther, Chr.: Elimination of redundant links in extended connection graphs. Proc of German Workshop on AI, GWAI-81 (1981) Springer Verlag, Fachberichte vol 47
Wos, L. T., Carson, D. F., Robinson, G. A.: The Unit Preference Strategy in Theorem Proving. AFIPS Conf. Proc. 26, (1964) Spartan Books, Washington. Also in Siekmann and Wrightson [1983], 387–396.
Wos, L. T., Robinson, G. A., Carson, D. F.: Efficiency and Completeness of the Set of Support Strategy in Theorem Proving. J. ACM 12, (1965) 536–541. Also in Siekmann and Wrightson [1983], 484–492
Wos, L. T., et al.: Automated Reasoning: Introduction and Applications (1984) Englewood Cliffs, New Jersey, Prentice-Hall
Wrightson, G.: A pragmatic strategy for clause graphs or the strong completeness of connection graphs. Report 98-3, Dept Comp. Sci., Univ of Newcastle, Australia (1989)
Yarmush, D. L.: The linked conjunct and other algorithms for mechanical theorem-proving. Technical Report IMM 412, Courant Institute of Mathematical Sciences, New York University (1976)
Yates, R. A., Raphael, B., Hart, T. P.: Resolution Graphs. Artificial Intelligence 1 (1970) 257–289.

Meta-reasoning: A Survey

Stefania Costantini

Dipartimento di Informatica, Università degli Studi di L'Aquila, via Vetoio Loc. Coppito, I-67100 L'Aquila, Italy
[email protected]

Abstract. We present the basic principles and possible applications of systems capable of meta-reasoning and reflection. After a discussion of the seminal approaches, we outline our own perception of the state of the art, mainly but not only in computational logic and logic programming. We review relevant successful applications of meta-reasoning, and the basic underlying semantic principles.

1 Introduction

The meaning of the term "meta-reasoning" is "reasoning about reasoning". In a computer system, this means that the system is able to reason about its own operation. This is different from performing object-level reasoning, which refers in some way to entities external to the system. A system capable of meta-reasoning may be able to reflect, or introspect, i.e. to shift from meta-reasoning to object-level reasoning and vice versa.

We present the main principles and the possible applications of meta-reasoning and reflective systems. After a review of the relevant approaches, mainly in computational logic and logic programming, we discuss the state of the art and recent interesting applications of meta-reasoning. Finally, we briefly summarize the semantic foundations of meta-reasoning. We necessarily express our own partial point of view on the field and provide the references that we consider the most important.

There are previous good reviews on this subject, to which we are indebted and to which we refer the reader for a wider perspective and a careful discussion of problems, foundations, languages, approaches, and systems. We especially mention [1], [2], [3]. Also, the reader may refer, for the computational logic aspects, to the Proceedings of the Workshops on Meta-Programming in Logic [4], [5], [6], [7], [8]. Much significant work on Meta-Programming was carried out in the Esprit funded European projects Compulog I and II. Some of the results of this work are discussed in the following sections. For a wider report we refer the reader to [9]. More generally, about meta-reasoning in various kinds of paradigms, including object-oriented, functional and imperative languages, the reader may refer to [10], [11], [12].

A.C. Kakas, F. Sadri (Eds.): Computat. Logic (Kowalski Festschrift), LNAI 2408, pp. 253–288, 2002. © Springer-Verlag Berlin Heidelberg 2002


Research about meta-reasoning and reflection in computer science has its roots in principles and techniques developed in logic, since the fundamental work of Gödel and Tarski, for which it may be useful to refer to the surveys [13], [14]. In meta-level approaches, knowledge about knowledge is represented by admitting sentences to be arguments of other sentences, without abandoning the framework of first-order logic.

An alternative important approach to formalizing knowledge about knowledge is the modal approach, which was initially developed by logicians and philosophers and has since received a great deal of attention in the field of Artificial Intelligence. It aims at formalizing knowledge by a logic language augmented by a modal operator, interpreted as knowledge or belief. Thus, sentences can be expressed to represent properties of knowledge (or belief). The most common modal systems adopt a possible world semantics [15]. In this semantics, knowledge and belief are regarded as propositions specifying the relationship between knowledge expressed in the theory and the external world. For a review of modal and meta-languages, focused on their expressivity, on consistency problems and on the possibility of translating modal languages into a meta-level setting, the reader may refer to [16].

2 Meta-programming and Meta-reasoning

Whatever the underlying computational paradigm, every piece of software included in any system (in the following, we will say software component) manipulates some kind of data, organized in suitable data structures. Data can be used in various ways: for producing results, sending messages, performing actions, or just updating the component's internal state. Data are often assumed to denote entities which are external to the software component. Whenever the computation should produce effects that are visible in the external environment, it is necessary to assume that there exists a causal connection between the software system and the environment, in the sense that the intended effect is actually achieved, by means of suitable interface devices. This means that if the software component performs an action in order, for instance, either to print some text, or to send an e-mail message, or to switch a light on, causal connection should guarantee that this is what actually happens.

There are software components, however, that take other programs as data. An important well-known example is a compiler, which manipulates data structures representing the source program to be translated. A compiler can be written in the language it is intended to translate (for instance, a C compiler can be written in C), or in a different language as well. It is important to notice that in any case there is no mixture between the compiler and the source program. The compiler performs a computation whose outcome is some transformed form of the source program. The source program is just text, recorded in a suitable data structure, that is step by step transformed into other representations. In essence, a compiler accepts and manipulates a description of the source program.
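The idea of a program that treats another program purely as data can be sketched with Python's standard `ast` module. This is only an illustration under stated assumptions: the source string and the helper names are hypothetical, and the example inspects the program text without ever executing it.

```python
import ast

# Hypothetical object-level program, held as plain text (data).
source = "total = price * quantity + shipping"

# The meta-program works on a description of the source: its syntax tree.
tree = ast.parse(source)

def variable_names(node):
    """Names of all identifiers occurring in the described program."""
    return sorted({n.id for n in ast.walk(node) if isinstance(n, ast.Name)})

def count_binary_operations(node):
    """Number of binary operator applications in the described program."""
    return sum(1 for n in ast.walk(node) if isinstance(n, ast.BinOp))

print(variable_names(tree))
print(count_binary_operations(tree))
```

None of the names in the source string needs to be defined, since the object-level program is never run; only its description is examined.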


In logic, a language that takes sentences of another language as its objects of discourse is called a meta-language. The other language is called the object language. A clear separation between the object language and the meta-language is necessary: namely, it consists in the fact that sentences written in the meta-language can refer to sentences written in the object language only by means of some kind of description, or encoding, so that sentences written in the object language are treated as data. As is well known, Kurt Gödel developed a technique (gödelization) for coding the formulas of the theory of arithmetic by means of numbers (Gödel numbers). Thus, it became possible to write formulas for manipulating other formulas, the latter represented by the corresponding Gödel numbers.

In this view a compiler is a meta-program, and writing a compiler is more than just programming: it is meta-programming. The language in which the compiler is written acts as a meta-language. The language in which the source program is written acts as the object language. More generally, all tools for program analysis, debugging and transformation are meta-programs. They perform a kind of meta-programming that can be called syntactic meta-programming.

Syntactic meta-programming can be particularly useful for theorem proving. In fact, as first stressed in [17] and [18], many lemmas and theorems are actually meta-theorems, asserting the validity of a fact by simply looking at its syntactic structure. In this case a software component, namely the theorem prover, consists of two different parts: one, that we call the object level, where proofs are performed by repeatedly applying the inference rules; another one, that we call the meta-level, where meta-theorems are stated.

We may notice that a theorem prover is an "intelligent" system that performs deduction, which is a form of (mechanized) "reasoning". Then, we can say that the theorem prover at the object level performs "object-level reasoning". Meta-theorems take as arguments the description of object-level formulas and theorems, and meta-level proofs manipulate these descriptions. Then, at the meta-level the system performs reasoning about entities that are internal to the system, as opposed to object-level reasoning that concerns entities denoting elements of some external domain. This is why we say that at the meta-level the theorem prover performs "meta-level reasoning", or shortly meta-reasoning.

Meta-theorems are a particular kind of meta-knowledge, i.e. knowledge about properties of the object-level knowledge. The object and the meta-level can usefully interact: meta-theorems can be used in order to shorten object-level proofs, thus improving the efficiency of the theorem prover, which can derive proofs more easily. In this view, meta-theorems may constitute auxiliary inference rules that enhance (in a pragmatic view) the "deductive power" of the system [19], [20]. Notice that, at the meta-level, new meta-theorems can also be proved, by applying suitable inference rules.

As pointed out in [21], most software components implicitly incorporate some kind of meta-knowledge: there are pieces of object-level code that "do" something in accordance with what meta-knowledge states. For instance, an object-level planner program might "know" that near(b,a) holds whenever near(a,b) holds, while this is not the case for on(a,b). A planner with a meta-level could explicitly encode a meta-rule stating that whenever a relation R is symmetric, then R(a, b) is equivalent to R(b, a), and whenever instead a relation is antisymmetric this is never the case. So, at the meta-level, there could be statements that near is symmetric and on is antisymmetric. The same results could then be obtained by means of explicit meta-reasoning, instead of implicit "knowledge" hidden in the code. The advantage is that the meta-reasoning can be performed in the same way for any symmetric and antisymmetric relation that one may have. Other properties of relations might be encoded at the meta-level in a similar way, and such a meta-level specification (which is independent of the specific object-level knowledge or application domain) might be reused in future applications.

There are several possible architectures for meta-knowledge and meta-reasoning, and many applications. Some of them are reviewed later. For a wider perspective however, the reader may refer to [22], [23], [24], [25], [20], [26], [27], [28], [29], [30], [31], [32], [33], where various specific architectures, applications and systems are discussed.
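The planner example of this section, with near declared symmetric and on declared antisymmetric, can be sketched as follows. This is a minimal illustration in Python rather than in a logic meta-language; the representation of facts as tuples, the `properties` table and the function `holds` are assumptions made for the sketch.

```python
# Object-level facts, treated as data: (relation name, first arg, second arg).
facts = {("near", "a", "b"), ("on", "a", "b")}

# Meta-level statements about the relations themselves.
properties = {"near": "symmetric", "on": "antisymmetric"}

def holds(relation, x, y, facts, properties):
    """Answer a query using the facts plus one explicit meta-rule.

    Meta-rule: if a relation R is declared symmetric, then R(x, y) is
    equivalent to R(y, x). For any other relation only stored facts count.
    """
    if (relation, x, y) in facts:
        return True
    if properties.get(relation) == "symmetric":
        return (relation, y, x) in facts
    return False

print(holds("near", "b", "a", facts, properties))
print(holds("on", "b", "a", facts, properties))

# The same meta-rule is reused unchanged for a newly declared relation.
facts.add(("adjacent", "c", "d"))
properties["adjacent"] = "symmetric"
print(holds("adjacent", "d", "c", facts, properties))
```

The meta-rule is written once and mentions no particular relation, which is the reuse advantage described in the text; a full treatment of antisymmetry (for instance rejecting contradictory facts) is left out of the sketch.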

3 Reification

Meta-level rules manipulate a representation of object-level knowledge. Since knowledge is represented in some kind of language, meta-rules actually manipulate a representation of syntactic expressions of the object-level language. In analogy with natural language, such a representation is usually called a name of the syntactic expression. The difference between a word of the language, such as flower, and a name, like “flower”, is the following: the word is used to denote an entity of the domain/situation we are talking about; the name denotes the word, so that we can say that “flower” is composed of six characters, is expressed in English and translates into Italian as “fiore”. That is, a word can be used, while a name can be inspected (for instance to count the characters) and manipulated (for instance translated).

An expression in a formal language may have different kinds of names that allow different kinds of meta-reasoning to be performed on that expression. Names are expressions of the meta-language. Taking for instance an equation such as

    a = b − 2

we may have a very simple name, like in natural language, i.e.

    “a = b − 2”

This kind of name, called a quotation mark name, is usually intended as a constant of the meta-language.

A name may instead be a complex term, such as:

    equation(left_hand_side(variable(“a”)),
             right_hand_side(binop(minus, firstop(variable(“b”)), secondop(constant(“2”)))))

This term describes the equation in terms of its left-hand side and right-hand side, and then describes the right-hand side as the application of a binary operator (binop) to two operands (firstop and secondop), where the first operand is a variable and the second one a constant. “a”, “b” and “2” are constants of the meta-language: they are the names of the expressions a, b and 2 of the object language. This more complex name, called a structural description name, makes it easier to inspect the expression (for instance to see whether it contains variables) and to manipulate it (for instance, it is possible to transform this name into the name of another equation by modifying some of the component terms).

Of course, many variations are possible in how detailed names are, and in what kind of detail they express. Also, many choices can be made about what names should be: for instance, the name of a variable can be a meta-constant, but can also be a meta-variable. For a discussion of the different possibilities, with their advantages and disadvantages, see [34], [35], [36].

The definition of names, being a relation between object-level expressions and the meta-level expressions that play the role of names, is usually called the naming relation. Which naming relation should one choose? In general, it depends upon the kind of meta-reasoning one wants to perform. In fact, a meta-theory can only reason about those properties of object-level expressions that are made explicit by the naming relation.

We may provide names for any language expression, from the simplest to the most complex. In a logic meta-language, we may have names for variables, constants, function and predicate symbols, terms and atoms, and even for entire theories: the meta-level may in principle encode and reason about the description of several object-level theories.
In practice, there is a trade-off between expressivity and simplicity. In fact, names should be kept as simple as possible, to reduce the complexity (and improve the readability) of meta-level expressions. Starting from these considerations, [37] argues that the naming relation should be adapted to each particular case and therefore should be definable by the user. In [38] it is shown that two different naming relations can coexist in the same context, for different purposes, and operators are provided for transforming one representation into the other.

The definition of a naming relation implies the definition of two operations: the first computes the name of a given language expression; the second computes the expression a given name stands for. The operation of obtaining the name of an object-level expression is called reification or referentiation or quoting. The inverse operation is called dereification or dereferentiation or unquoting. These are built-in operations, whose operational semantics consists in applying the naming relation in the two directions.
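The two operations can be made concrete with a small Python sketch of one possible naming relation for the equation a = b − 2. It is only an illustration under stated assumptions: object-level expressions are modelled by the hypothetical classes Var, Const, BinOp and Equation, structural description names are modelled as nested tuples, and the tags "equation", "binop", "variable" and "constant" are a simplified variant of the term shown above, not the naming relation of any particular system.

```python
from dataclasses import dataclass


# Hypothetical object-level expressions.
@dataclass(frozen=True)
class Var:
    ident: str


@dataclass(frozen=True)
class Const:
    value: str


@dataclass(frozen=True)
class BinOp:
    op: str
    first: object
    second: object


@dataclass(frozen=True)
class Equation:
    lhs: object
    rhs: object


def reify(expr):
    """Reification: map an object-level expression to its structural name."""
    if isinstance(expr, Var):
        return ("variable", expr.ident)
    if isinstance(expr, Const):
        return ("constant", expr.value)
    if isinstance(expr, BinOp):
        return ("binop", expr.op, reify(expr.first), reify(expr.second))
    if isinstance(expr, Equation):
        return ("equation", reify(expr.lhs), reify(expr.rhs))
    raise TypeError(f"no name defined for {expr!r}")


def dereify(name):
    """Dereification: map a structural name back to the expression it stands for."""
    tag = name[0]
    if tag == "variable":
        return Var(name[1])
    if tag == "constant":
        return Const(name[1])
    if tag == "binop":
        return BinOp(name[1], dereify(name[2]), dereify(name[3]))
    if tag == "equation":
        return Equation(dereify(name[1]), dereify(name[2]))
    raise ValueError(f"not a name: {name!r}")


def variables_in(name):
    """Inspect a name: collect the identifiers of the variables it mentions."""
    if name[0] == "variable":
        return {name[1]}
    found = set()
    for part in name[1:]:
        if isinstance(part, tuple):
            found |= variables_in(part)
    return found


equation = Equation(Var("a"), BinOp("minus", Var("b"), Const("2")))
name = reify(equation)
```

In this sketch, variables_in works on the name rather than on the expression, which mirrors the remark that a meta-theory can only reason about what the naming relation makes explicit. With a quotation mark name (a plain string), the same inspection would require parsing.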

In [39] it is shown how the naming relation can be a sort of input parameter for a meta-language. That is, a meta-language may be, if carefully designed, to a large extent independent of the syntactic form of names, and of the class of expressions that are named. Along this line, in [36] and [33] a full theory of definable naming relations is developed, where a naming relation (with some basic properties) can be defined as a set of equations, with the associated rewrite system for applying referentiation/dereferentiation.

4 Introspection and Reflection

The idea that meta-knowledge and meta-reasoning could be useful for improving the reasoning performed at the object level (for instance by exploiting properties of relations, like symmetry) suggests that the object level and the meta-level should interact. In fact, the object level and the meta-level can be seen as different software components that interact by passing control to each other.

At the object level, the operation of referentiation allows an expression to be transformed into its name, and this name can be given as an input argument to a meta-level component. This means that object-level computation gives way to meta-level computation. This computational step is called upward reflection, or introspection, or shift up: upward, because the meta-level is considered to be a “higher level” with respect to the object level; reflection, or introspection, because the object-level component suspends its activity in order to initiate a meta-level one. This is meant to be in analogy with the process by which people become conscious (at the meta-level of mind) of mental states they are currently in (at the object level).

The inverse action, which consists in going back to the object-level activity, is called downward reflection, or shift down. The object-level activity can be resumed from where it had been suspended, or can be somehow restarted. Its state (if any) can be the same as before, or can be altered, according to the meta-level activity that has been performed. Downward reflection may imply that some name is dereferenced and the resulting expression (“extracted” from the name) is given as an input argument to the resumed or restarted object-level activity.

In logical languages, upward and downward reflection can be specified by means of special inference rules (reflection rules) or axioms (reflection axioms), which may also state what kind of knowledge is exchanged.
In functional and procedural languages, part of the run-time state of the ongoing object-level computation can be reified and passed to a meta-level function/procedure that can inspect and modify this state. When this function terminates, object-level computation resumes on a possibly modified state.

A reflection act, which shifts the level of the activity between the object level and the meta-level, may be: explicit, in the sense that it is either invoked by the user (in interactive systems) or determined by some kind of specification explicitly present in the text of the theory/program; implicit, in the sense that it is automatically performed upon occurrence of certain predefined conditions. Explicit and implicit reflection may co-exist.

Both forms of reflection rely on the requirement of causal connection or, equivalently, of introspective fidelity: that is, the recommendations of the meta-level must always be followed at the object level. For instance, in the procedural case, the modifications to the state performed at the meta-level are effective and have a corresponding impact on the object-level computation. The usefulness of reflection consists exactly in the fact that the overall system (object level + meta-levels) not only reasons about itself, but is also properly affected by the results of that reasoning.

In summary, a meta-level architecture for building software components has to provide the possibility of defining a meta-level that, by means of a naming relation, can manipulate the representation of object-level expressions. Notice that there may be several levels: beyond the meta-level there may be a meta-meta-level that uses a naming relation representing meta-level expressions. Similarly, we can have a meta-meta-meta-level, and so on. Also, we may have one object level and several independent meta-levels with which the object level may be associated from time to time, for performing different kinds of meta-reasoning.

The architecture may provide a reflection mechanism that allows the different levels to interact. If no reflection mechanism is provided, then the computation is performed at the meta-level, which simulates the object-level formulas through the naming relation and simulates the object-level inference rules by means of meta-level axioms. As discussed later, this is the case in many of the main approaches to meta-reasoning.

The languages in which the object level and the meta-level(s) are expressed may be different, or they may coincide.
For instance, we may have a meta-level based on a first-order logic language, where meta-reasoning is performed about an object level based on a functional or imperative language. Sometimes the languages coincide: the object language and the meta-language may in fact be the same. In this case, the language is expressive enough to explicitly represent (some of) its own syntactic expressions, i.e. the language is capable of self-reference. An interesting and deep discussion of languages with self-reference can be found in [40] and [41]. The role of introspection in reasoning is discussed in [42] and [43]. An interesting contribution on reflection and its applications is [44].
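For the procedural case described in this section, the following Python sketch shows one way reification of the state, implicit reflection and causal connection can fit together. It is a minimal sketch under stated assumptions, not the mechanism of any cited system: the state is a plain dictionary, and the names run, meta_level, trigger, double and clamp are hypothetical.

```python
def run(program, state, meta_level=None, trigger=None):
    """Run object-level steps over an explicit state, with optional reflection.

    program    -- list of functions, each taking and returning a state dict
    meta_level -- function receiving the reified state and returning a state
    trigger    -- predicate on the state; when it is true after a step,
                  reflection is performed implicitly
    """
    for step in program:
        state = step(state)
        if meta_level is not None and trigger is not None and trigger(state):
            # Upward reflection: the object level is suspended and a copy
            # of its state is handed to the meta-level.
            reified = dict(state)
            # Downward reflection: the object level resumes on the state
            # returned by the meta-level (causal connection).
            state = meta_level(reified)
    return state


def double(state):
    """Object-level step: double the value of x."""
    return {**state, "x": state["x"] * 2}


def clamp(reified):
    """Meta-level recommendation: x must not exceed 10."""
    reified["x"] = min(reified["x"], 10)
    reified["reflections"] = reified.get("reflections", 0) + 1
    return reified


final = run([double, double, double], {"x": 3},
            meta_level=clamp, trigger=lambda s: s["x"] > 10)
```

Here reflection is implicit, since it is triggered by a predefined condition on the state. An explicit reflection act could be modelled in the same sketch by placing a step in program that calls the meta-level function directly.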

5 Seminal Approaches

5.1 FOL

FOL [19], standing for First Order Logic, has been (to the best of our knowledge) the first reflective system to appear in the literature. It is a proof checker based on natural deduction, where knowledge and meta-knowledge are expressed in different contexts. The user can access these contexts both for expressing and for inferring new facts.

The FOL system consists of a set of theories, called contexts, based on a first-order language with sorts and conditional expressions. A special context named META describes the proof theory and some of the model theory of FOL contexts. Given a specific context C that we take as the object theory, the naming relation is defined by attachments, which are user-defined explicit definitions relating symbols and terms in META with their interpretation in C. The connection between C and META is provided by a special linking rule that is applicable in both directions:

$$\frac{\mathit{Theorem}(\text{“}W\text{”})}{W}$$

where W is any formula in the object theory C, “W” is its name, and Theorem(“W”) is a fact in the meta-theory. By means of a special primitive, called REFLECT, the linking rule can be explicitly applied by the user. Its effect is either that of reflecting up a formula W to the meta-theory, to derive meta-theorems involving “W”, or vice versa that of reflecting down a meta-theorem “W”, so that W becomes a theorem of the theory. Meta-theorems can therefore be used as subsidiary deduction rules. Interesting applications of the FOL system to mathematical problems can be found in [17], [45].

5.2 Amalgamating Language and Meta-language in Logic Programming

A seminal approach to reflection in the context of the Horn clause language is MetaProlog, proposed by Bowen and Kowalski [46]. The proposal is based on representing Horn clause syntax and provability in the logic itself, by means of a meta-interpreter, i.e. an interpreter of the Horn clause language written in the Horn clause language itself. Therefore, in this case too the object language and the meta-language coincide.

The concept (and the first implementation) of a meta-interpreter was introduced by John McCarthy for the LISP programming language [47]. McCarthy in particular defined a universal function, written in LISP, which represents the basic features of a LISP interpreter. In particular, the universal function is able to: (i) accept as input the definition of a LISP function, together with the list of its arguments; (ii) evaluate the given function on the given arguments. Bowen and Kowalski, with MetaProlog, have developed this powerful and important idea in the field of logic programming, where the inference process is based on building proofs from a given theory, rather than on evaluating functions.

The Bowen and Kowalski meta-interpreter is specified via a predicate Demo, which is defined by a set of meta-axioms Pr, where the relevant aspects of Horn clause provability are made explicit. The Demo predicate takes as arguments the representation (name) of an object-level theory T and the representation (name) of a goal A. Demo(“T”, “A”) means that the goal A is provable in the theory T.

With the above formulation, we might have an approach where inference is performed at the meta-level (via invocation of Demo) and the object level is simulated, by providing Demo with a suitable description “T” of an object theory T. The strength and originality of MetaProlog lie instead in the amalgamation between the object level and the meta-level. It consists in the introduction of the following linking rules for upward and downward reflection:

$$\frac{T \vdash_L A}{Pr \vdash_M \mathit{Demo}(\text{“}T\text{”}, \text{“}A\text{”})} \qquad\qquad \frac{Pr \vdash_M \mathit{Demo}(\text{“}T\text{”}, \text{“}A\text{”})}{T \vdash_L A}$$

where $\vdash_M$ means provability at the meta-level M and $\vdash_L$ means provability at the object level L. The application of the linking rules coincides, in practice, with the invocation of Demo, i.e., reflection is explicit. Amalgamation allows mixed sentences: there can be object-level sentences where the invocation of Demo determines a shift up to the meta-level, and meta-level sentences where the invocation of Demo determines a shift down to the object level. Since moreover the theory in which deduction is performed is an input argument of Demo, several object-level and meta-level theories can co-exist and can be used in the same inference process.

Although the extension is conservative, i.e. all theorems provable in L+M are provable either in L or in M alone, the gain of expressivity, in practical terms, is great. Many traditional problems in knowledge representation find here a natural formulation. The extension can be made non-conservative, whenever additional rules are added to Demo, to represent auxiliary inference rules and deduction strategies. Additional arguments can be added to Demo for integrating forms of control in the basic definition of provability. For instance it is possible to control the amount of resources consumed by the proof process, or to make the structure of the proof explicit.

The semantics of the Demo predicate is, however, not easy to define (see e.g. [35], [48], [49], [50]), and holds only if the meta-theory and the linking rules provide an extension to the basic Horn clause language which is conservative, i.e., only if Demo is a faithful representation of Horn clause provability. Although the amalgamated language is far more expressive than the object language alone, enhanced meta-interpreters are (semantically) ruled out, since in that case the extension is non-conservative.
In practice, the success of the approach has been great: enhanced meta-interpreters are used everywhere in logic programming and artificial intelligence (see for instance [51], or any other logic programming textbook). This seminal work has initiated the whole field of meta-programming in logic programming and computational logic. Problems and promises of this field are discussed by Kowalski himself in [52], [53]. The approach of meta-interpreters and other relevant applications of meta-programming are discussed in the next section.
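The remark that additional arguments can be added to Demo, to bound the resources used by a proof or to make its structure explicit, can be sketched in Python for the propositional case. This is an illustrative assumption-laden sketch and not MetaProlog: the theory is passed as a dictionary from each atom to its alternative clause bodies, the resource bound is a limit on proof depth, and the names demo, max_depth and t1 are hypothetical.

```python
def demo(theory, goal, max_depth):
    """Return a proof tree for the atom `goal`, or None if none is found.

    theory    -- dict mapping an atom to a list of clause bodies; each body
                 is a tuple of atoms, and the empty tuple marks a fact
    max_depth -- bound on the depth of the proof (a form of resource control)

    A proof tree is a pair (atom, list of proof trees for the body atoms).
    """
    if max_depth <= 0:
        return None                      # resources exhausted on this branch
    for body in theory.get(goal, []):
        subproofs = []
        for atom in body:
            sub = demo(theory, atom, max_depth - 1)
            if sub is None:
                break                    # this clause fails, try the next one
            subproofs.append(sub)
        else:
            return (goal, subproofs)     # every body atom was proved
    return None


# The theory is an input argument, so several theories can be used side by side.
t1 = {"q": [("p", "s")], "p": [()], "s": [()]}
looping = {"a": [("a",)]}
```

Because the depth bound decreases at every clause application, a call on the circular theory looping terminates with None instead of running forever, which is one practical motivation for such control arguments.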

5.3 3-LISP

3-Lisp [54] is another important example of a reflective architecture where the object language and meta-language coincide. 3-Lisp is a meta-interpreter for Lisp (and therefore it is an elaboration of McCarthy’s original proposal) where (the interesting aspects of) the state of the program being interpreted are not stored, but are passed as an argument to all the functions that are internal to the meta-interpreter. Each of these procedures takes the state as an argument, makes some modification and passes the modified state to another internal procedure. These procedures call each other tail-recursively (i.e. the next procedure call is the last action they make), so that the state always remains explicit. Such a meta-interpreter is called a meta-circular interpreter. If one assumes that the meta-circular interpreter is itself executed by another meta-circular interpreter and so on, one can imagine a potentially infinite tower of interpreters, the lowest one executing the object-level program (see the summary and formalization of this approach presented in [55]).

Here, the meta-level is accessible from the object level at run-time through a reflection act represented by a special kind of function invocation. Whenever the object-level program invokes any function f in this special way, f receives as an additional parameter a representation of the state of the program itself. Then, f can inspect and/or modify the state, before returning control to object-level execution. A reflective act therefore implies the reification of the state and the execution of f as if it were a procedure internal to the interpreter. Since f might in turn contain a reflection act, the meta-circular interpreter is able to reify its own state and start a brand-new copy of itself. In this approach one might in principle perform, via reflection, an infinite regress on the reflective tower of interpreters.
A program is thus able to interrupt its computation, to change something in its own state, and to continue with a modified interpretation process. This kind of mechanism is called computational reflection. The semantics of computational reflection is procedural, however, rather than declarative. A reflective architecture conceptually similar to 3-Lisp has been proposed for the Horn clause language and has been fully implemented [56].

Although very procedural in nature, and not easy to understand in practice, computational reflection has enjoyed great success in the last few years, especially in the context of imperative and object-oriented programming [11], [12]. Some authors even propose computational reflection as the basis of a new programming paradigm [57]. Since computational reflection can be perceived as the only way of performing meta-reasoning in non-logical paradigms, this success highlights once more how important meta-reasoning is, especially for complex applications.

5.4 Other Important Approaches

The amalgamated approach has been explored by Attardi and Simi in Omega [58]. Omega is an object-oriented formalism for knowledge representation which can deal with meta-theoretical notions by providing objects that describe Omega objects themselves and derivability in Omega.

A non-amalgamated approach in logic programming is that of the Gödel language, where object theory and meta-theory are distinct. Gödel provides a (conservative) provability predicate, and an explicit form of reflection. The language has been developed and experimented with in the context of the Compulog European project. It is described in the book [59]. In [60] a contribution to meta-programming in Gödel is proposed, on two aspects: on the one hand, a programming style for efficient meta-programming is outlined; on the other hand, modifications to the implementation are proposed, in order to improve the performance of meta-programs.

A project that extends and builds on both FOL and 3-Lisp is GETFOL [61], [62]. It is developed on top of a novel implementation of FOL (therefore the approach is not amalgamated: the object theory and meta-theory are distinct). GETFOL is able to introspect its own code (lifting), to reason deductively about it in a declarative meta-theory and, as a result, to produce new executable code that can be pushed back to the underlying interpretation (flattening). The architecture is based on a sharp distinction between deduction (FOL style) and computation (3-Lisp style). Reflection in GETFOL gives access to a meta-theory where many features of the system are made explicit, even the code that implements the system itself.

The main objective of GETFOL is that of implementing theorem provers, given its ability to implement flexible control strategies that can be adapted (via computational reflection) to the particular situation. Similarly to FOL, the kind of reasoning performed in GETFOL consists in: (i) performing some reasoning at the meta-level; (ii) using the results of this reasoning to assert facts at the object level.
An interesting extension is that of applying this concept to a system with multiple theories and multiple languages (each theory formulated in its own language) [63], where the two steps are reinterpreted as (i) doing some reasoning in one theory and (ii) jumping into another theory to do some more reasoning on the basis of what has been derived in the previous theory. These two deductions are concatenated by the application of bridge rules, which are inference rules where the premises belong to the language of the former theory, and the conclusion belongs to the language of the latter.

A different concept of reflection is embodied in Reflective Prolog [39] [64] [65], a self-referential Horn clause language with logical reflection. The objective of this approach is that of developing a more expressive and powerful language, while preserving the essential features of logic programming: Horn clause syntax, model-theoretic semantics, resolution via unification as procedural semantics, correctness and completeness properties.

In Reflective Prolog, Horn clauses are extended with self-reference and resolution is extended with logical reflection, in order to achieve greater expressive and inference power. The reflection mechanism is implicit, i.e., the interpreter of the language automatically reflects upwards and downwards by applying suitable linking rules called reflection principles. This allows reasoning and meta-reasoning to interleave without the user’s intervention, so as to exploit both knowledge and meta-knowledge in proofs: in most of the other approaches, instead, there is one level which is “first-class”, where deduction is actually performed, and the other level plays a secondary role.

Reflection principles are embedded in both the procedural and the declarative semantics of the language, that is, in the extended resolution procedure which is used by the interpreter and in the construction of the models which give meanings to programs. Procedurally, this implies that there is no need to axiomatize provability in the meta-theory. Object-level reasoning is not simulated by meta-interpreters, but directly executed by the language interpreter, thus avoiding unnecessary inefficiency. Semantically, a theory composed of an object level and (one or more) meta-levels is regarded as an enhanced theory, enriched by new axioms which are entailed by the given theory and by the reflection principles interpreted as axiom schemata. Therefore, in Reflective Prolog, language and meta-language are amalgamated in a non-conservative extension.

Reflection in Reflective Prolog gives access to a meta-theory where various kinds of meta-knowledge can be expressed, either about the application domain or about the behavior of the system. Deduction in Reflective Prolog means using at each step either meta-level or object-level knowledge, in a continuous interleaving between levels. Meta-reasoning in Reflective Prolog implies a declarative definition of meta-knowledge, which is automatically integrated into the inference process. The relation between meta-reasoning in Reflective Prolog and modal logic has been discussed in [66]. An interpreter of Reflective Prolog has been fully implemented [67]. It is interesting to notice that Reflective Prolog has been implemented by means of computational reflection.
This is another demonstration that computational reflection can be a good (although low-level) implementation tool.

An approach that has been successful in the context of object-oriented languages, including the most recent ones like Java, is the meta-object protocol. A meta-object protocol [68] [69] gives every object a corresponding meta-object that is an instance of a meta-class. Then, the behavior of an object becomes the behavior of the object/meta-object pair. At the meta-level, important aspects such as the operational semantics of inheritance, instantiation and method invocation can be defined. A meta-object protocol constitutes a flexible means of modifying and extending an object-oriented language.

This approach has been applied to logic programming, in the ObjVProlog language [70] [71]. In addition to the above-mentioned meta-class capabilities, this language preserves the Prolog capabilities of manipulating clauses in the language itself, and provides a provability predicate. As an example of a more recent application of this approach, a review of Java reflective implementations can be found in [72].

A limitation is that only aspects directly related to objects can be described in a meta-object. Properties of sets of objects, or of the overall system, cannot be directly expressed. Nevertheless, some authors [72] argue that non-functional requirements such as security, fault-tolerance and atomicity can be implemented by implicit reflection to the meta-object before and after the invocation of every object method.
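The idea of shifting to the meta-level before and after every method invocation can be approximated in Python with a metaclass. This is only a loose analogue offered as a sketch: Python metaclasses are not the meta-object protocols of [68], [69], and the names Traced, Account and the log attribute are hypothetical. The "non-functional requirement" implemented here is simply an audit trail.

```python
import functools


class Traced(type):
    """Hypothetical meta-class that wraps every public method of a class."""

    def __new__(mcls, name, bases, namespace):
        for attr, value in list(namespace.items()):
            if callable(value) and not attr.startswith("_"):
                namespace[attr] = mcls._wrap(attr, value)
        return super().__new__(mcls, name, bases, namespace)

    @staticmethod
    def _wrap(method_name, method):
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            # Meta-level activity before the object-level method runs.
            self.log.append(("before", method_name))
            try:
                return method(self, *args, **kwargs)
            finally:
                # Meta-level activity after the method, even on failure.
                self.log.append(("after", method_name))
        return wrapper


class Account(metaclass=Traced):
    """Object-level class; it contains no auditing code of its own."""

    def __init__(self, balance):
        self.balance = balance
        self.log = []

    def deposit(self, amount):
        self.balance += amount
        return self.balance


account = Account(10)
new_balance = account.deposit(5)
```

The class Account never mentions the audit trail: the behavior observed by a caller is that of the class together with its meta-class, which is the point made above about the object/meta-object pair.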

6 Applications of Meta-reasoning

Meta-reasoning has been widely used for a variety of purposes, and recently the interest in new potential applications of meta-reasoning and reflection has been very significant. In this section, we provide our (necessarily partial and limited) view of some of the more relevant applications in the field.

6.1 Meta-interpreters

After the seminal work of Bowen and Kowalski [46], the most common application of meta-logic in computational logic is to define and to implement meta-interpreters. This technique has been especially used in Prolog (which is probably the most popular logic programming language) for a variety of purposes. The basic version of a meta-interpreter for propositional Horn clause programs, reported in [53], is the following.

    demo(T, P) ← demo(T, P ← Q), demo(T, Q).
    demo(T, P ∧ Q) ← demo(T, P), demo(T, Q).

In the above definition, ’∧’ names conjunction and ’←’ names ’←’ itself. A theory can be named by a list containing the names of its sentences. In the propositional case, formulas and their names may coincide without the problems of ambiguity (discussed below) that arise in the presence of variables. If a theory is represented by a list, then the meta-interpreter must be augmented by the additional meta-axiom:

    demo(T, P) ← member(T, P).

For instance, query ?q to program

    q ← p, s.
    p.
    s.

can be simulated by query ?demo([q ← p ∧ s, p, s], q) to the above meta-interpreter. Alternatively, it is possible to use a constant symbol to name a theory. In this case, the theory, say t1, can be defined by the following meta-level axioms:

    demo(t1, q ← p ∧ s).
    demo(t1, p).
    demo(t1, s).

and the query becomes ?demo(t1, q).
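The three meta-axioms above, with the theory named by a list, can be mirrored by a short Python sketch for the propositional case. It is not the formulation of [53]: the encoding of a clause as a pair (head, body), of a conjunction as a tuple of goals, and the loop check through the hypothetical parameter _pending are all assumptions made here so that the function terminates on circular theories.

```python
def demo(theory, goal, _pending=frozenset()):
    """Return True if `goal` is derivable from `theory` (propositional case).

    theory -- list of sentences: a string is a fact, and a pair (head, body)
              names the clause head <- body, where body is a tuple of atoms
    goal   -- an atom (string), or a tuple of goals read as their conjunction
    """
    if isinstance(goal, tuple):
        # demo(T, P and Q) <- demo(T, P), demo(T, Q).
        return all(demo(theory, g, _pending) for g in goal)
    if goal in theory:
        # demo(T, P) <- member(T, P).
        return True
    if goal in _pending:
        # Assumption of this sketch: a goal already under proof on the
        # current branch is not retried, which avoids infinite recursion.
        return False
    # demo(T, P) <- demo(T, P <- Q), demo(T, Q).
    return any(
        demo(theory, sentence[1], _pending | {goal})
        for sentence in theory
        if isinstance(sentence, tuple) and sentence[0] == goal
    )


# The example program: q <- p, s.  p.  s.
t1 = [("q", ("p", "s")), "p", "s"]

# Naming theories by constants, here through a hypothetical lookup table.
THEORIES = {"t1": t1}
```

The query ?demo([q ← p ∧ s, p, s], q) then corresponds to the call demo(t1, "q"), and the variant with a named theory to demo(THEORIES["t1"], "q").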

The meta-axioms defining demo can themselves be regarded as a theory that can be named, by either a list or a constant (say d). Thus, it is possible to write queries like ?demo(d, demo(t1, q)), which asks whether we can derive, by the meta-interpreter d, that the goal q can be proved in theory t1.

In many Prolog applications, however, the theory argument is omitted, as in the so-called “Vanilla” meta-interpreter [35]. The standard declarative formulation of the Vanilla meta-interpreter in Prolog is the following (where ’:-’ is the Prolog counterpart of ’←’ and ’&’ indicates conjunction):

    demo(empty).
    demo(X) :- clause(X, Y), demo(Y).
    demo(X&Y) :- demo(X), demo(Y).

For the above object-level program, we should add to the meta-interpreter the unit clauses:

    clause(q, p&s).
    clause(p, empty).
    clause(s, empty).

and the query would be :- demo(q).

The Vanilla meta-interpreter can be used for propositional programs, as well as for programs containing variables. In the latter case, however, there is an important ambiguity concerning variables. In fact, variables in the object-level program are meant to range (as usual) over the domain of the program. These variables are instantiated to object-level terms. Instead, the variables occurring in the definition of the meta-interpreter are intended to range over object-level atoms. Then, in a correct approach these are meta-variables (for an accurate discussion of this problem see [34]).

In [35], a typed version of the Vanilla meta-interpreter is advocated and its correctness proved. In [46] and [65], suitable naming mechanisms are proposed to overcome the problem. Since, however, it is the untyped version that is generally used in Prolog practice, some researchers have tried to specify a formal account of the Vanilla meta-interpreter as it is. In particular, a first-order logic with ambivalent syntax has been proposed for this purpose [73], [74] and correctness results have been obtained [75].
The Vanilla meta-interpreter can be enhanced in various ways, often by making use of built-in Prolog meta-predicates that allow Prolog to act as a meta-language of itself. These predicates are in fact aimed at inspecting, building and modifying goals and at inspecting the instantiation status of variables.

First, more aspects of the proof process can be made explicit. In the above formalization, unification is implicitly delegated to the underlying Prolog interpreter and so is the order of execution of subgoals in conjunctions. Below is a formulation where these two aspects become explicit. Unification is performed by a unify procedure and reorder rearranges subgoals of the given conjunction.

    demo(empty).
    demo(X) :- clause(H, Y), unify(H, X, Y, Y1), demo(Y1).
    demo(X&Y) :- reorder(X&Y, X1&Y1), demo(X1), demo(Y1).

Second, extra arguments can be added to demo, to represent for instance: the maximum number of steps that demo is allowed to perform; the actual number of steps that demo has performed; the proof tree; an explanation to be returned to a user and so on. Clearly, the definition of the meta-interpreter will be suitably modified according to the use of the extra arguments.

Third, extra rules can enhance the behavior of the meta-interpreter, by specifying auxiliary deduction rules. For instance, the rule

    demo(X) :- ask(X, yes).

states that we consider X to be true if the user answers “yes” when explicitly asked about X. In this way, the meta-interpreter exhibits an interactive behavior. The auxiliary deduction rules may be several and may interact.

In Reflective Prolog [65], one specifies the additional rules only, while the definition of standard provability remains implicit. In the last example for instance, on failure of goal X, a goal demo(X) would be automatically generated (this is an example of implicit upward reflection), thus employing the additional rule to query the user about X.

An interesting approach to meta-interpreters is that of [76], [77], where a binary predicate demo may answer queries with uninstantiated variables, which represent arbitrary fragments of the program currently being executed. The reader may refer to [51] for an illustration of the meta-interpreter programming techniques and of their applications, including the specification of Expert Systems in Prolog.

6.2 Theory Composition and Theory Systems

Theory construction and combination is an important tool of software engineering, since it promotes modularity, software reuse and programming-in-the-large. In [53] it is observed that theory construction can be regarded as a metalinguistic operation. Within the Compulog European projects, two meta-logic approaches to working with theories have been proposed.

In the Algebra of Logic Programs, proposed in [78] and [79], a program expression defines a combination of object programs (that can be seen as theories, or modules) through a set of composition operators. The provability of a query with respect to a composition of programs can be defined by meta-axioms specifying the intended meaning of the various composition operations. Four basic operations for composing logic programs are introduced: encapsulation (denoted by ∗), union (∪), intersection (∩) and import (◁). Encapsulation copes with the requirement that a module can import from another one only its functionality, without regard to its implementation. This kind of behavior can be realized by encapsulation and union: if P is the “main program” and S is a module, the combined program is:

    P ∪ S∗
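The idea of defining provability over a program expression by meta-axioms can be sketched as a Vanilla-style meta-interpreter with an extra program-expression argument. The following is only an illustrative reading, not the definition given in [79]: the term constructors union/2, inter/2 and enc/1, the predicates rule/3 and clause_of/3, and the example modules p, s and q are all hypothetical names, and the import operation is omitted.

```prolog
% Hypothetical object programs, stored as rule(ProgramName, Head, Body).
rule(p, likes(mary, X), cheap(X)).
rule(s, cheap(tea), true).
rule(s, cheap(water), true).
rule(q, likes(mary, tea), true).

% demo(+ProgramExpression, +Goal): Goal is provable in the composed program.
demo(_, true).
demo(E, (G1, G2)) :- demo(E, G1), demo(E, G2).
demo(E, A) :-
    A \= true, A \= (_, _),
    clause_of(E, A, B),
    demo(E, B).

% clause_of(+Expr, ?Head, -Body): clauses of a program expression.
clause_of(Name, A, B) :- atom(Name), rule(Name, A, B).
% Union: a clause of either argument.
clause_of(union(P, _), A, B) :- clause_of(P, A, B).
clause_of(union(_, Q), A, B) :- clause_of(Q, A, B).
% Intersection: both arguments must offer a clause for the same head,
% and both bodies must then be proved in the intersection.
clause_of(inter(P, Q), A, (B1, B2)) :-
    clause_of(P, A, B1),
    clause_of(Q, A, B2).
% Encapsulation: only what P can prove is visible, as unit clauses.
clause_of(enc(P), A, true) :- demo(P, A).
```

Under these assumptions, `?- demo(union(p, enc(s)), likes(mary, tea)).` succeeds because p uses the functionality of s without seeing its clauses, whereas the same goal fails in p alone.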


Stefania Costantini

Intersection yields a combined theory where both the original theories are forced to agree, during deduction, on every single partial conclusion. The import operation ◁ builds a module P ◁ Q out of two modules P and Q, where P is the visible part and Q the hidden part of the resulting module. The usefulness of these operators for knowledge representation and reasoning is shown in [78]. The meta-logical definition of the operations is given in [79], by extending the Vanilla meta-interpreter. Two alternative implementations using the Gödel programming language are proposed and discussed in [80]. One extends the untyped Vanilla meta-interpreter. The other exploits the meta-programming facilities offered by the language, thus using names and typed variables. The second, cleaner version seems to the authors themselves more suitable than the first for implementing program composition operations that require a fine-grained manipulation of the object programs.

In the Alloy language, proposed in [81] and [82], a theory system is a collection of interdependent theories, some of which stand in a meta/object relationship, forming an arbitrary number of meta-levels. Theory systems are proposed for a meta-programming based software engineering methodology aimed at specifying, for instance, reasoning agents, programs to be manipulated, programs that manipulate them, etc. The meta/object relationship between theories provides the inspection and control facilities needed in these applications. The basic language of theory systems is a definite clause language, augmented with ground names for every well-formed expression of the language. Each theory is named by a ground theory term. A theory system can be defined out of a collection of theories by using the following tools.

1. The symbol ’⊢’ for relating theory terms and sentences. A theoremhood statement, like for instance t1 ⊢ u1 ⊢ Ψ, where t1 and u1 are theory terms, says that u1 ⊢ Ψ is a theorem of theory t1.
2. The distinguished function symbol ’’, where t1 t2 means that t1 is a meta-theory of t2.
3. The coincidence statement t1 ≡ t2, expressing that t1 and t2 have exactly the same theorems.

The behavior of the above operators is defined by reflection principles (in the form of meta-axioms) that are suitably integrated in the declarative and proof-theoretic semantics.

6.3 The Event Calculus

Representing and reasoning about actions and temporally-scoped relations has been for years one of the key research topics in knowledge representation [83]. The Event Calculus (EC) has been proposed by Kowalski and Sergot [84] as a system for reasoning about time and actions in the framework of Logic Programming. In particular, the Event Calculus adapts the ontology of McCarthy and Hayes’s Situation Calculus [85], i.e., actions and fluents¹, to a new task: assimilating a narrative, which is the description of a course of events. The essential idea is to have terms, called fluents, which are names of time-dependent relations. Kowalski and Sergot however write holds(r(x, y), t), which is understood as “fluent r(x, y) is true at time t”, instead of r(x, y, t) as in the Situation Calculus.

¹ It is interesting to notice that the fluent/fluxion terminology dates back to Newton.

It is worthwhile to discuss the connection between Kowalski’s work on meta-programming and the definition of the Event Calculus. In the logic programming framework it is natural to recognize the higher-order nature of time-dependent propositions and to try to represent them at the meta-level. Kowalski in fact [86] considers McCarthy’s Situation Calculus and comments:

    Thus we write Holds(possess(Bob, Book1), S0) instead of the weaker but also adequate Possess(Bob, Book1, S0). In the first formulation, possess(Bob, Book1) is a term which names a relationship. In the second, Possess(Bob, Book1, S0) is an atomic formula. Both representations are expressed within the formalism of first-order classical logic. However, the first allows variables to range over relationships whereas the second does not. If we identify relationships with atomic variable-free sentences, then we can regard a term such as possess(Bob, Book1) as the name of a sentence. In this case Holds is a meta-level predicate [ . . . ]

There is a clear advantage with reification from the computational point of view: by reifying, we need to write only one frame axiom, or inertia law, saying that the truth of any relation does not change in time unless otherwise specified. Negation-as-failure is a natural choice for implementing the default inertia law. In a simplified, time-points-oriented version, default inertia can be formulated as follows:

    Holds(f, t) ← Happens(e), initiates(e, f), Date(e, t_s), t_s < t, not Clipped(t_s, f, t)

where Clipped(t_s, f, t) is true when there is a record of an event happening between t_s and t that terminates the validity of f. In other words, Holds(f, t) is derivable whenever, in the interval between the initiation of the fluent and the time the query is about, no terminating event has happened. It is easy to see Holds as a specialization of Demo. Kowalski and Sadri [87], [88] discuss in depth how an Event Calculus program can be specified, and assumptions on the nature of the domain accommodated, by manipulating the usual Vanilla meta-interpreter definition.
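The default inertia rule above translates almost literally into Prolog, with negation as failure written `\+`. The sketch below is an illustration under stated assumptions rather than the formulation of [84]: predicate names are lower-cased to be legal Prolog, a terminates/2 relation is assumed for defining clipped/3, "between" is read as strictly between, and the lamp narrative is hypothetical.

```prolog
% Hypothetical narrative: which events happened, and when.
happens(switch_on).
happens(switch_off).

date(switch_on, 2).
date(switch_off, 7).

% Effects of events on fluents (assumed domain description).
initiates(switch_on, on(lamp)).
terminates(switch_off, on(lamp)).

% holds(?F, +T): fluent F holds at time T by default inertia.
holds(F, T) :-
    happens(E), initiates(E, F),
    date(E, Ts), Ts < T,
    \+ clipped(Ts, F, T).

% clipped(+Ts, +F, +T): some recorded event strictly between Ts and T
% terminates F (the strictness of both bounds is an assumption).
clipped(Ts, F, T) :-
    happens(E), terminates(E, F),
    date(E, Te), Ts < Te, Te < T.
```

With this narrative, `?- holds(on(lamp), 5).` succeeds and `?- holds(on(lamp), 9).` fails, since the switch_off event at time 7 clips the fluent. The time argument must be bound when holds/2 is called, because the comparisons are arithmetic.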


Since the first proposal, a number of improved formalizations have followed, in order to adapt the calculus to different tasks, such as abductive planning, diagnosis, temporal databases and models of legislation. Not all extensions and applications can be accounted for here, but the reader may for instance refer to [89], [90], and [91].

6.4 Logical Frameworks

A logical framework [92] is a formal system that provides tools for experimenting with deductive systems. Within a logical framework, a user can invent a new deductive system by defining its syntax, inference rules and proof-theoretic semantics. This specification is executable, so that the user can experiment with the new system. A logical framework however cannot reasonably provide tools for defining any possible deductive system, but will stay within a certain class.

Formalisms with powerful meta-level features and strong semantic foundations have the possibility of evolving towards becoming logical frameworks. The Maude system for instance [93] is a particular implementation of the meta-theory of rewriting logic. It provides the predefined functional module META-LEVEL, where Maude terms can be reified and where: the process of reducing a term to a normal form is represented by a function meta-reduce; the default interpreter is represented by a function meta-rewrite; the application of a rule to a term is represented by meta-apply.

Recently, a reflective version of Maude has been proposed [94], based on the formalization of computational reflection proposed in [95]. The META-LEVEL module has been made more flexible, so as to allow a user to define the syntax of her own logic language L by means of meta-rules. The new language must however consist of an addition/variation to the basic syntax of the Maude language. Reflection is the tool for integrating the user-defined syntax into the proof procedure of Maude. In particular, whenever a piece of user-defined syntax is found, a reflection act to the META-LEVEL module happens, so as to apply the corresponding syntactic meta-rules. Thus, the rewriting system Maude has evolved into a logical framework for logic languages based on rewriting.

The RCL (Reflective Computational Logic) logical framework [33] is an evolution of the Reflective Prolog metalogic language. The implicit reflection of Reflective Prolog has a semantic counterpart [39] in adding to the given theory a set of new axioms called reflection axioms, according to axiom schemata called reflection principles. Reflection principles can specify not only the shift between levels, but also many other meta-reasoning principles. For instance, reflection principles can define forms of analogical reasoning [96], and synchronous communication among logical agents [97].

RCL has originated from the idea that, more generally, reflection principles may be used to express the inference rules of user-defined deductive systems. The deductive systems that can be specified in RCL are however evolutions of the Horn clause language, based on a predefined enhanced syntax. A basic version of naming is provided in the enhanced Horn clause language, formalized through an equational theory. The specification of a new deductive system DS in RCL is accomplished through the following four steps.

Step I    Definition of the naming device (encoding) for DS. The user definition must extend the predefined one. RCL leaves significant freedom in the representation of names.
Step II   After defining the naming convention, the user of RCL has to provide a corresponding unification algorithm (again by suitable additions to the predefined one).
Step III  Representation of the axioms of DS, in the form of enhanced Horn clauses.
Step IV   Definition of the inference rules of DS as reflection principles. In particular, the user is required to express each inference rule R as a function ℛ, from clauses, which constitute the antecedent of the rule, to sets of clauses, which constitute the consequent.

Then, given a theory T of DS consisting of a set of axioms A and a reflection principle ℛ, a theory T′ containing T is obtained as the deductive closure of A ∪ A′, where A′ is the set of additional axioms generated by ℛ. Consequently, the model-theoretic and fixed point semantics of T under ℛ are obtained as the model-theoretic and fixed point semantics of T′. RCL does not actually generate T′. Rather, given a query for T, RCL dynamically generates the specific additional axioms usable to answer the query according to the reflection principle ℛ, i.e., according to the inference rule R of DS.

6.5 Logical Agents

In the area of intelligent software agents there are several issues that require the integration of some kind of meta-reasoning ability into the system. In fact, most existing formalisms, systems and frameworks for defining agents incorporate, in different forms, a meta-component.

An important challenge in this area is that of interconnecting several agents that are heterogeneous in the sense that they are not necessarily uniform in the implementation, in the knowledge they possess and in the behavior they exhibit. Any framework for developing multi-agent systems must provide a great deal of flexibility for integrating heterogeneous agents and assembling communities of independent service providers. Flexibility is required in structuring cooperative interactions among agents, and for creating more accessible and intuitive user interfaces.

Meta-reasoning is essential for obtaining such a degree of flexibility. Meta-reasoning can either be performed within the single agent, or special meta-agents can be designed, to act as meta-theories for sets of other agents. Meta-reasoning can help: (i) in the interaction among agents and with the user; (ii) in the implementation of suitable strategies and plans for responding to requests. These strategies can be either domain-independent, or rely on domain- and application-specific knowledge or reasoning (auxiliary inference rules, learning algorithms, planning, and so forth).

Meta-rules and meta-programming may be particularly useful for coping with some aspects of the ontology problem: meta-rules can switch between descriptions that are syntactically different though semantically equivalent, and can help fill the gap between descriptions that are not equivalent. Also, meta-reasoning can be used for managing incomplete descriptions or requests.

The following are relevant examples of approaches to developing agent systems that make use of some form of meta-reasoning.

In the Open Agent Architecture™ [98], which is meant for integrating a community of heterogeneous software agents, there are specialized server agents, called facilitators, that perform reasoning (and, more or less explicitly, meta-reasoning) about the agent interactions necessary for handling a complex expression. There are also meta-agents, that perform more complex meta-reasoning so as to assist the facilitator agent in coordinating the activities of the other agents.

In the constraint logic programming language CaseLP, there are logical agents, which show capabilities of complex reasoning, and interface agents, which provide an interface with external modules. There are no meta-agents, but an agent has meta-goals that trigger meta-reasoning to guide the planning process.

There are applications where agents may have objectives and may need to reason about their own as well as other agents’ beliefs and about the actions that agents may take. This is the perspective of the BDI formalization of multi-agent systems proposed in [99] and [100], where BDI stands for “Belief, Desire, Intentions”.

The approach of Meta-Agents [101] allows agents to reason about other agents’ state, beliefs, and potential actions by introducing powerful meta-reasoning capabilities.
Meta-Agents are a specification tool, since for efficient implementation they are translated into ordinary agent programs, plus some integrity constraints.

In logic programming, research on multi-agent systems starts, to the best of our knowledge, from the work by Kim and Kowalski in [102], [103]. The amalgamation of language and meta-language and the demo predicate with theories named by constants are used for formalizing reasoning capabilities in multi-agent domains. In this approach, the demo predicate is interpreted as a belief predicate and thus agents can reason, as in the BDI approach, about beliefs.

In an effort to obtain logical agents that are rational, but also reactive (i.e. logical reasoning agents capable of timely response to external events), a more general approach has been proposed in [82], by Kowalski, and in [104] and [105] by Kowalski and Sadri. A meta-logic program defines the “observe-think-act” cycle of an agent. Integrity constraints are used to generate actions in response to updates from the environment.

In the approach of [97], agents communicate via the two meta-level primitives tell/told. An agent is represented by a theory, i.e. by a set of clauses prefixed with the corresponding theory name. Communication between agents is formalized by the following reflection principle R_com:


    T : told(“S”, “A”)  ⇐_Rcom  S : tell(“T”, “A”).

The intuitive meaning is that every time an atom of the form tell(“T”, “A”) can be derived from a theory S (which means that agent S wants to communicate proposition A to agent T), the atom told(“S”, “A”) is consequently derived in theory T (which means that proposition A becomes available to agent T). The objective of this formalization is that each agent can specify, by means of clauses defining the predicate tell, the modalities of interaction with the other agents. These modalities can thus vary with respect to different agents or different conditions. For instance, let P be a program composed of three agents, a, b and c, defined as follows.

    a : tell(X, “ciao”) :- friend(X).
    a : friend(“b”).
    b : happy :- told(“a”, “ciao”).
    c : happy :- told(“a”, “ciao”).

Agent a says “ciao” to every other agent X that it considers to be its friend. In the above definition, the only friend is b. Agents b and c are happy if a says “ciao” to them. The conclusion happy can be derived in agent b, while it cannot be derived in agent c. In fact, we get a : tell(“b”, “ciao”) from a : friend(“b”); instead, a : tell(“c”, “ciao”) is not a conclusion of agent a.

In [106], Dell’Acqua, Sadri and Toni propose an approach to logic-based agents as a combination of the above approaches, i.e. the approach to agents by Kowalski and Sadri [105] and the approach to meta-reasoning by Costantini et al. [65], [97]. Similarly to Kowalski and Sadri’s agents, the agents in [106] are hybrid in that they exhibit both rational (or deliberative) and reactive behavior. The reasoning core of these agents is a proof procedure that combines forward and backward reasoning. Backward reasoning is used primarily for deliberative activities. Forward reasoning is used primarily for reactivity to the environment, possibly including other agents.
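Returning to the three-agent tell/told example of [97] given earlier in this section, the effect of the reflection principle R_com can be approximated in plain Prolog by a meta-interpreter with an explicit agent argument. This is a hedged sketch, not the mechanism of [97]: there reflection is implicit and names are quoted, whereas here the principle is written as an ordinary demo clause, names are plain atoms, and rule/3 and demo/2 are hypothetical predicates.

```prolog
% Agent theories, stored as rule(Agent, Head, Body).
rule(a, tell(X, ciao), friend(X)).
rule(a, friend(b), true).
rule(b, happy, told(a, ciao)).
rule(c, happy, told(a, ciao)).

% demo(+Agent, +Goal): Goal is derivable in the theory of Agent.
demo(_, true).
demo(T, (G1, G2)) :- demo(T, G1), demo(T, G2).
% Communication principle: told(S, A) is derived in T
% whenever tell(T, A) can be derived in S.
demo(T, told(S, A)) :- demo(S, tell(T, A)).
% Ordinary resolution step inside the theory of agent T.
demo(T, G) :- rule(T, G, B), demo(T, B).
```

Under these assumptions `?- demo(b, happy).` succeeds and `?- demo(c, happy).` fails, matching the conclusions stated for agents b and c.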
The proof procedure is executed within an “observe-think-act” cycle that allows the agent to be alert to the environment and react to it, as well as think and devise plans. The proof procedure (the IFF proof procedure proposed by Fung and Kowalski in [107]) treats both inputs from the environment and agents’ actions as abducibles (hypotheses). Moreover, by adapting the techniques proposed in [97], the agents are capable of reasoning about their own beliefs and the beliefs of other agents.

In [108], the same authors extend the approach by providing agents with proactive communication capabilities. Proactive agents are able to communicate on their own initiative, not only in response to stimuli. In the resulting framework reactive, rational or hybrid agents can reason about their own beliefs as well as the beliefs of other agents and can communicate proactively with each other. The agents’ behavior can be regulated by condition-action rules. In this approach, there are two primitives for communication, tell and ask, treated as abducibles within the “observe-think-act” cycle of the agent architecture. The predicate told is used to express both passive reception of messages from other agents and reception of information in response to an active request.

The following example is taken from [108] and is aimed at illustrating the basic features of the approach. Let Ag be represented by the abductive logic program ⟨P, A, I⟩ with:

    P = { told(A, X) ← ask(A, X) ∧ tell(A, X)
          told(A, X) ← tell(A, X)
          solve(X) ← told(A, X)
          desire(y) ← y = car
          good_price(p, x) ← p = 0 }

    A = { tell, ask, offer }

    I = { desire(x) ∧ told(B, good_price(p, x)) ⇒ tell(B, offer(p, x)) }

The first two clauses in P state that Ag may be told something, say X, by another agent A either because A has been explicitly asked about X (first clause) or because A tells X proactively (second clause). The third clause in P says that Ag believes anything it is told. The fourth and fifth clauses in P say, respectively, that the agent desires a car and that anything that is free is at a good price. The integrity constraint says that, if the agent desires something and it is told (by some other agent B) of a good price for it, then it makes an offer to B, by telling it.

The logic programming language DALI [109] is indebted to all previously mentioned approaches to logical agents. DALI introduces explicit reactive and proactive rules at the object level. Thus, reactivity and proactivity are modeled in the basic logic language of the agent. In fact, the declarative semantics is very close to that of the standard Horn clause language. The procedural semantics relies on an extended resolution. The language incorporates tell/told primitives, integrity constraints and solve rules. An “observe-think-act” cycle can of course be implemented in a DALI agent, but it is no longer necessary for modeling reactivity and proactivity.

Below is a simplified fragment of a DALI agent representing the waiter of a pub, who tries to serve a customer that enters. The customer wants some X. This request is an external event (indicated with ’E’) that arrives to the agent. The event triggers a reactive rule (indicated with ’:>’ instead of the usual ’:-’), and causes the body of the rule to be executed. This is very much like any other goal: only, computation is not initiated by a query, but starts on reception of the event. During the execution of the body of the reactive rule, the waiter first checks whether X is one of the available drinks.
If so, the waiter serves the drink: the predicate serve_drink is in fact an action (indicated with ’A’). Otherwise, the waiter checks whether the request is expressed in some foreign language, for which a translation is available (this is a simple example of coping with one aspect of the ontology problem). If this is not the case, the waiter asks the customer for an explanation about X: it expects to be told that X is actually a Y, in order to try to serve this Y.

Notice that the predicate translate is symmetric, where symmetry is managed by the solve rule. To understand the behavior, one can assume this rule to be an additional rule of a basic meta-interpreter that is not explicitly reported. A subgoal like translate(beer, V) is automatically transformed into a call to the meta-interpreter, of the form solve(“translate”(“beer”, “V”)) (formally, this is implicit upward reflection). Then, since symmetric(“translate”) succeeds, solve(“translate”(“V”, “beer”)) is attempted, and automatically reflected at the object level (formally, this is implicit downward reflection). Finally, the unquoted subgoal translate(V, beer) succeeds with V instantiated to birra.

    Waiter

    request(Customer, “X”)E :> serve(Customer, X).
    serve(C, X) :- drink(X), serve_drink(C, X)A.
    serve(C, X) :- translate(X, Y), drink(Y), serve_drink(C, Y)A.
    serve(C, X) :- ask(C, X, Y), serve(C, Y).
    ask(C, X, Y) :- ask_for_explanation(C, “X”), told(C, “Y”).
    drink(beer).
    drink(coke).
    translate(birra, beer).
    translate(cocacola, coke).
    symmetric(“translate”).
    solve(“P”(“X”, “Y”)) :- symmetric(“P”), solve(“P”(“Y”, “X”)).

Agents that interact with other agents and/or with an external environment may expand and modify their knowledge base by incorporating new information. In a dynamic setting, the knowledge base of an agent can be seen as the set of beliefs of the agent, which may change over time. An agent may reach a stage where its beliefs have become inconsistent, and actions must be taken to regain consistency. The theory of belief revision aims at modeling how an agent updates its state of belief as a result of receiving new information [110], [111]. Belief revision is, in our opinion, another important issue related to intelligent agents where meta-reasoning can be usefully applied.


In [32] a model-based diagnosis system is presented, capable of revising the description of the system to be diagnosed if inconsistencies arise from observations. Revision strategies are implemented by means of meta-programming and meta-reasoning methods.

In [112], a framework is proposed where rational, reactive agents can dynamically change their own knowledge bases as well as their own goals. In particular, an agent can make observations, learn new facts and new rules from the environment (even in contrast with its current knowledge) and then update its knowledge accordingly. To solve contradictions, techniques of contradiction removal and preferences among several sources can be adopted [113].

In [114] it is pointed out that most existing approaches to intelligent agents have difficulty modeling the way agents revise their beliefs, because new information always comes together with certain meta-information: for example, where does the new information come from? Is the source reliable? And so on. The agent then has to reason about this meta-information in order to revise its beliefs. This leads to the proposal of a new approach, where this meta-information can be explicitly represented and reasoned about, and revision strategies can be defined in a declarative way.

7 Semantic Issues

In computational logic, meta-programming and meta-reasoning capabilities are mainly based on self-reference, i.e. on the possibility of describing language expressions in the language itself. In fact, in most of the relevant approaches the object language and the meta-language coincide. The main tool for self-reference is a naming mechanism. An alternative form of self-reference has been proposed by McCarthy [115], who suggests that introducing function symbols denoting concepts (rather than quoted expressions) might be sufficient for most forms of meta-reasoning. But Perlis [40] observes:

    “The last word you just said” is an expression that although representable as a function still refers to a particular word, not to a concept. Thus quotation seems necessarily involved at some point if we are to have a self-describing language. It appears we must describe specific expressions as carriers of (the meaning of) concepts.

The issue of appropriate language facilities for naming is addressed by Hill and Lloyd in [35]. They point out the distinction between two possible representation schemes: the non-ground representation, in which an object-level variable is represented by a meta-level variable, and the ground representation, in which object-level expressions are represented by ground (i.e. variable free) terms at the meta-level. In the ground representation, an object level variable may be represented by a meta-level constant, or by any other ground term.

The problem with the non-ground representation is related to meta-level predicates such as the Prolog var(X), which is true if the variable X is not instantiated, and is false otherwise. As remarked in [35]:


    To see the difficulty, consider the goals:

        :- var(X) ∧ solve(p(X))

    and

        :- solve(p(X)) ∧ var(X)

    If the object program consists solely of the clause p(a), then (using the “leftmost literal” computation rule) the first goal succeeds, while the second goal fails.

Hill and Lloyd propose a ground representation of expressions of a first-order language L in another first-order language L′ with three types ω, µ and η.

Definition 1 (Hill and Lloyd ground representation). Given a constant a in L, there is a corresponding constant a′ of type ω in L′. Given a variable x in L, there is a corresponding constant x′ of type ω in L′. Given an n-ary function symbol f in L, there is a corresponding n-ary function symbol f′ of type ω × . . . × ω → ω in L′. Given an n-ary predicate symbol p in L, there is a corresponding n-ary function symbol p′ of type ω × . . . × ω → µ in L′. The language L′ has a constant empty of type µ. The mappings a → a′, x → x′, f → f′ and p → p′ are all injective.

Moreover, L′ contains some function and predicate symbols useful for declaratively redefining the “impure” features of Prolog and the Vanilla meta-interpreter. For instance we will have:

    constant(a′_1).
    ...
    constant(a′_n).
    ∀ω x  nonvar(x) ← constant(x).
    ∀ω x  var(x) ← ¬ nonvar(x).

The above naming mechanism is used in [35] for providing a declarative semantics to a meta-interpreter that implements SLDNF resolution [116] for normal programs and goals. This approach has then evolved into the metalogical facilities of the Gödel language [59]. Notice that, since names of predicate symbols are function symbols, properties of predicates (e.g. symmetry) cannot be explicitly stated. Since levels in Gödel are separated rather than amalgamated, this naming mechanism does not provide operators for referentiation/dereferentiation.
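The order dependence described in the quoted passage can be reproduced directly in an ordinary Prolog system, and contrasted with a declarative test on ground names. The sketch below is illustrative only: rule/2, goal1/0, goal2/0 and the obj_* predicates are hypothetical names, the ground name of the object variable x is assumed to be the term v(x), and the typing by ω is not modelled.

```prolog
% Object program consisting solely of the clause p(a), stored as rule/2.
rule(p(a), true).

% Vanilla-style solve/1 over the non-ground representation.
solve(true).
solve((A, B)) :- solve(A), solve(B).
solve(A) :- A \= true, A \= (_, _), rule(A, B), solve(B).

% The two goals of the quoted passage, under the leftmost computation rule.
goal1 :- var(X), solve(p(X)).     % X is still unbound when var/1 runs
goal2 :- solve(p(X)), var(X).     % X is already bound to a when var/1 runs

% Ground representation: var and nonvar are defined on names, so their
% truth cannot change as a computation proceeds.
obj_constant(a).
obj_nonvar(T) :- obj_constant(T).
obj_var(T) :- \+ obj_nonvar(T).
```

Here goal1 succeeds and goal2 fails, while obj_var(v(x)) succeeds wherever it appears in a conjunction, because the name v(x) is ground and is never instantiated.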
An important issue raised in [40] is the following: Now, it is essential to have also an un-naming device that would return a quoted sentence to its original (assertive) form, together with axioms stating that that is what naming and un-naming accomplish.


Along this line, the approach of [36], developed in detail in [117], proposes to name an atom of the form α0(α1, . . . , αn) as [β0, β1, . . . , βn], where each βi is the name of αi. The name of the name of α0(α1, . . . , αn) is the name term [γ0, γ1, . . . , γn], where each γi is the name of βi, etc. Requiring names of compound expressions to be compositional allows one to use unification for constructing name terms and accessing their components. In this approach, we are able to express properties of predicates by using their names. For instance, we can say that predicate p is binary and predicate q is symmetric, by asserting binary_pred(p¹) and symmetric(q¹).

Given a term t and a name term s, the expression ↑t indicates the result of quoting t and the expression ↓s indicates the result of unquoting s. The following axioms for the operators ↑ and ↓ formalize the relationship between terms and the corresponding name terms. They form an equality theory, called NT and first defined in [118], for the basic compositional encoding outlined above. Enhanced encodings can be obtained by adding axioms to this theory. NT states that there exist names of names (each term can be referenced n times, for any n ≥ 0) and that the name of a compound term is obtained from the names of its components.

Definition 2 (Basic encoding NT). Let NT be the following equality theory.

– For every constant or meta-constant cⁿ, n ≥ 0: ↑cⁿ = cⁿ⁺¹.
– For every function or predicate symbol f of arity k: ∀x1 . . . ∀xk ↑(f(x1, . . . , xk)) = [f¹, ↑x1, . . . , ↑xk].
– For every compound name term [x0, x1, . . . , xk]: ∀x0 . . . ∀xk ↑[x0, x1, . . . , xk] = [↑x0, ↑x1, . . . , ↑xk].
– For every term t: ↓↑t = t.

The above set of axioms admits an associated convergent rewrite system UN. Then, a corresponding extended unification algorithm (E-unification algorithm) UA(UN) can be defined, that deals with name terms in addition to usual terms.
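To make the four axioms of NT concrete, the quoting and unquoting operators can be sketched as two Prolog predicates over ground terms. This is an illustration under assumptions, not the encoding of [118]: the meta-constant cⁿ⁺¹ is represented by wrapping cⁿ in a reserved functor name/1, name terms are Prolog lists, numbers are not handled, and up/2 and down/2 are hypothetical names for ↑ and ↓.

```prolog
:- use_module(library(apply)).   % maplist/3

% up(+T, -N): N is the name of the ground term T.
up(T, name(T)) :- atom(T), T \== [].            % name of c^0 is c^1
up(name(C), name(name(C))).                     % name of c^n is c^(n+1)
up(T, [name(F)|Ns]) :-                          % f(x1,..,xk) -> [f^1, up(x1),..]
    compound(T), T \= name(_), \+ is_list(T),
    T =.. [F|Args],
    maplist(up, Args, Ns).
up(L, Ns) :-                                    % name of a compound name term
    is_list(L), L = [_|_],
    maplist(up, L, Ns).

% down(+N, -T): T is the term named by N, so that down(up(t)) = t.
down(name(T), T).
down([name(F)|Ns], T) :-                        % name term of a compound term
    atom(F),
    maplist(down, Ns, Args),
    T =.. [F|Args].
down([name(name(X))|Ns], [name(X)|Ms]) :-       % name of a name term
    maplist(down, Ns, Ms).
```

For instance, `?- up(f(a, b), N).` yields N = [name(f), name(a), name(b)], and because name terms are compositional, a query such as `?- up(p(a, b), [P|Args]).` uses plain unification to extract the name of the predicate symbol and the names of the arguments.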
In [118] it is shown that:

Proposition 1 (Unification Algorithm for NT). The E-unification algorithm UA(UN) is sound for NT, terminates and converges.

The standard semantics of the Horn clause language can be adapted, so as to include the naming device. Precisely, the technique of quotient universes by Jaffar et al. [119] can be used for this purpose.

Definition 3 (Quotient Universe). Let R be a congruence relation. The quotient universe of U with respect to R, indicated as U/R, is the set of the equivalence classes of U under R, i.e., the partition given by R in U.

By taking R as the finest congruence relation corresponding to UN (that always exists) we get the standard semantics of the Horn clause language [116], modulo the naming relation. The naming relation can be extended according to the application domain at hand, by adding new axioms to NT and by correspondingly extending UN and UA(UN), provided that their nice formal properties are preserved. What is important is that, as advocated in [37], the approach to meta-programming and the approach to naming become independent.

It is important to observe that, as shown in [36], any (ground or non-ground) encoding providing names for variables shows in an amalgamated language the same kind of problems emphasized in [35]. In fact, let P be the following definite program, x an object-level variable and Y a meta-variable:

    p(x) :- Y = ↑x, q(Y).
    q(a¹).

Goal :- p(a) succeeds by first instantiating Y to a¹ and then proving q(a¹). In contrast, the goal :- p(x) fails, as Y is instantiated to the name of x, say x¹, and subgoal q(x¹) fails, x¹ and a¹ being distinct. Therefore, if one chooses naming mechanisms providing names for variables, on the one hand terms can be inspected with respect to variable instantiation, but on the other hand important properties are lost.

A ground naming mechanism is used in [49] for providing a declarative semantics to the (conservative) amalgamation of language and meta-language in logic programming.

A naming mechanism where each well-formed expression can act as a name of itself is provided by the ambivalent logic AL of Jiang [73]. It is based on the assumption that each expression can be interpreted as a formula, as a term, as a function and as a predicate, where predicates and functions have free arity. Unification must be extended accordingly, with the following results:

Theorem 1 (Termination of AL Unification Algorithm). The unification algorithm for ambivalent logic terminates.

Theorem 2 (Correctness of AL Unification Algorithm). If the unification algorithm for ambivalent logic terminates successfully, then it provides an ambivalent unifier. If the algorithm halts with failure, then no ambivalent unifier exists.
The limitation is that ambivalent unifiers are less general than traditional unifiers.

Theorem 3 (Properties of Resolution for AL). Resolution is a sound and complete inference method for AL.

Ambivalent logic has been used in [75] for proving correctness of the Vanilla meta-interpreter, also with respect to the (conservative) amalgamation of object language and meta-language. Let $P$ be the object program, $L_P$ the language of $P$, $V_P$ the Vanilla meta-interpreter and $L_{V_P}$ the language of $V_P$. Let $M_P$ be the least Herbrand model of $P$, $M_{V_P}$ be the least Herbrand model of $V_P$, and $M_{V_P \cup P}$ be the least Herbrand model of $V_P \cup P$. We have:


Stefania Costantini

Theorem 4 (Properties of Vanilla Meta-Interpreter under AL). For all (ground) $A$ in $L_{V_P}$, $demo(A) \in M_{V_P}$ iff $demo(A) \in M_{V_P \cup P}$; for all (ground) $A$ in $L_P$, $demo(A) \in M_P$ iff $demo(A) \in M_{V_P \cup P}$.

A similar result is obtained by Martens and De Schreye in [120] and [50] for the class of language independent programs. They use a non-ground representation with overloading of symbols, so that the name of an atom is a term identical to the atom itself. Language independent programs can be characterized as follows:

Proposition 2 (Language Independence). Let $P$ be a definite program. Then $P$ is language independent iff for any definite goal $G$, all (SLD) computed answers for $P \cup G$ are ground.

In practice, however, the real interest lies in enhanced meta-interpreters. Martens and De Schreye extend their results to meta-interpreters without additional clauses, but with additional arguments. An additional argument can be, for instance, an explicit theory argument, or an argument denoting the proof tree. The amalgamation is still conservative, but more expressivity is achieved.

The approach to proving correctness of the Vanilla meta-interpreter proposed by Levi and Ramundo in [48] uses the S-semantics introduced by Falaschi et al. in [121]. In order to fill the gap between the procedural and declarative interpretations of definite programs, the S-least Herbrand model $M_P^S$ of a program $P$ contains not only ground atoms, but all atoms $Q(t)$ such that $t = x\theta$, where $\theta$ is a computed answer substitution for $P \cup \{\leftarrow Q(x)\}$. The S-semantics is obtained as a variation of the standard semantics of the Horn clause language. Levi and Ramundo [48] and Martens and De Schreye prove (independently) that $demo(p(t)) \in M_{V_P}^S$ iff $p(t) \in M_P^S$.

In the approach of Reflective Prolog, axiom schemata are defined at the meta-level, by means of a distinguished predicate solve and of a naming facility. Deduction is performed at any level where there are applicable axioms.
This means that conclusions drawn in the basic theory are available (by implicit reflection) at the meta-level, and vice versa. The following definition of RSLD-resolution [65] (SLD-resolution with reflection) is independent of the naming mechanism, provided that a suitable unification algorithm is supplied.

Definition 4 (RSLD-resolution). Let $G$ be a definite goal $\leftarrow A_1, \ldots, A_k$, let $A_m$ be the selected atom in $G$ and let $C$ be a definite clause. The goal $(\leftarrow A_1, \ldots, A_{m-1}, B_1, \ldots, B_q, A_{m+1}, \ldots, A_k)\theta$ is derived from $G$ and $C$ using mgu $\theta$ iff one of the following conditions holds:

i. $C$ is $A \leftarrow B_1, \ldots, B_q$ and $\theta$ is an mgu of $A_m$ and $A$;
ii. $C$ is $solve(\alpha) \leftarrow B_1, \ldots, B_q$, $A_m \neq solve(\delta)$, $\uparrow A_m = \alpha'$ and $\theta$ is an mgu of $\alpha'$ and $\alpha$;


iii. $A_m$ is $solve(\alpha)$, $C$ is $A \leftarrow B_1, \ldots, B_q$, $\downarrow\alpha = A'$ and $\theta$ is an mgu of $A'$ and $A$.

If the selected atom $A_m$ is an object-level atom (e.g., p(a, b)), it can be resolved in two ways. First, by using as usual the clauses defining the corresponding predicate (case (i)); for instance, if $A_m$ is p(a, b), by using the clauses defining the predicate p. Second, by using the clauses defining the predicate solve (case (ii), upward reflection) if the name $\uparrow A_m$ of $A_m$ and α unify with mgu θ; for instance, referring to the NT naming relation defined above, we have ↑p(a, b) = [p1, a1, b1] and then a clause with conclusion solve([p1, v, w]) can be used, with θ = {v/a1, w/b1}.

If the selected atom $A_m$ is solve(α) (e.g., solve([q1, c1, d1])), again it can be resolved in two ways. First, by using the clauses defining the predicate solve itself, similarly to any other goal (case (i)). Second, by using the clauses defining the predicate corresponding to the atom denoted by the argument α of solve (case (iii), downward reflection); for instance, if α is [q1, c1, d1] and thus ↓α = q(c, d), the clauses defining the predicate q can be used.

In the declarative semantics of Reflective Prolog, upward and downward reflection are modeled by means of axiom schemata called reflection principles. The Least Reflective Herbrand Model $RM_P$ of program $P$ is the Least Herbrand Model of the program itself, augmented by all possible instances of the reflection principles. $RM_P$ is the least fixed point of a suitably modified version of the operator $T_P$.

Theorem 5 (Properties of RSLD-Resolution). RSLD-resolution is correct and complete w.r.t. $RM_P$.
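The upward and downward reflection steps discussed above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: atoms are encoded as tuples of symbols, names as flat lists whose symbols carry a trailing "1" (a stand-in for the NT naming relation as it appears in the example), and the names Var, up, down and unify are hypothetical helpers, not part of Reflective Prolog. The unifier handles flat name lists only, which is enough to reproduce the substitution θ = {v/a1, w/b1} from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A meta-level variable, such as v or w in solve([p1, v, w])."""
    name: str


def up(atom):
    """Upward naming (assumed encoding): ("p", "a", "b") becomes ["p1", "a1", "b1"]."""
    return [symbol + "1" for symbol in atom]


def down(name):
    """Downward naming: inverse of up() on ground names, e.g. ["q1", "c1", "d1"]."""
    if not all(isinstance(s, str) and s.endswith("1") for s in name):
        raise ValueError("not a ground name")
    return tuple(s[:-1] for s in name)


def walk(term, theta):
    """Follow variable bindings in theta until an unbound term is reached."""
    while isinstance(term, Var) and term in theta:
        term = theta[term]
    return term


def unify(left, right):
    """Unify two flat name lists; return a substitution (dict) or None on failure."""
    if len(left) != len(right):
        return None
    theta = {}
    for l, r in zip(left, right):
        l, r = walk(l, theta), walk(r, theta)
        if l == r:
            continue
        if isinstance(l, Var):
            theta[l] = r
        elif isinstance(r, Var):
            theta[r] = l
        else:
            return None  # two distinct name constants clash
    return theta


# Case (ii), upward reflection: the selected atom p(a, b) against solve([p1, v, w]).
v, w = Var("v"), Var("w")
theta = unify(up(("p", "a", "b")), ["p1", v, w])

# Case (iii), downward reflection: solve([q1, c1, d1]) refers to the atom q(c, d).
atom = down(["q1", "c1", "d1"])
```

Under these assumptions, theta binds v to "a1" and w to "b1", and atom is the tuple ("q", "c", "d"). A full implementation would need nested terms, an occurs check and the extended unification algorithm for the chosen naming relation, none of which is attempted here.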

8 Conclusions

In this paper we have discussed the meta-level approach to knowledge representation and reasoning, which has its roots in the work of logicians and has played a fundamental role in computer science. We believe in fact that meta-programming and meta-reasoning are essential ingredients for building any complex application and system. We have tried to illustrate to a broad audience the main principles on which meta-reasoning is based and the ways in which these principles have been applied in a variety of languages and systems. We have illustrated how sentences can be arguments of other sentences, by means of naming devices. We have distinguished between amalgamated and separated approaches, depending on whether the meta-expressions are defined in (an extension of) a given language, or in a separate language. We have shown that the different levels of knowledge can interact by reflection. In our opinion, the choice of logic programming as a basis for meta-programming and meta-reasoning has several theoretical and practical advantages. From the theoretical point of view, all fundamental issues (including


reflection) can be coped with on a strong semantic basis. In fact, the usual framework of first-order logic can be suitably modified and extended, as demonstrated by the various existing meta-logic languages. From the practical point of view, in logic programming the meta-level mechanisms are understandable and easy to use, and this has given rise to several successful applications. We have in fact tried (although necessarily briefly) to review some of the important applications of meta-programming and meta-reasoning.

At the end of this survey, I wish to explicitly acknowledge the fundamental, deep and wide contribution that Robert A. Kowalski has made to this field. Robert A. Kowalski initiated meta-programming in logic programming, as well as many of its successful applications, including meta-interpreters, event calculus and logical agents. With his enthusiasm he has given constant encouragement to research in this field, and to researchers as well, including myself.

9 Acknowledgements

I wish to express my gratitude to Gaetano Aurelio Lanzarone, who has been the mentor of my research work on meta-reasoning and reflection. I gratefully acknowledge Pierangelo Dell'Acqua for his participation in this research and for his important contribution to the study of naming mechanisms and reflective resolution. I also wish to mention Jonas Barklund, for the many interesting discussions and the fruitful cooperation on these topics. Many thanks are due to Luigia Carlucci Aiello, for her careful review of the paper, constructive criticism and useful advice. Thanks to Alessandro Provetti for his help. Thanks also to the anonymous referees, for their useful comments and suggestions. Any remaining errors or misconceptions are of course entirely my responsibility.

References

1. Hill, P.M., Gallagher, J.: Meta-programming in logic programming. In Gabbay, D., Hogger, C.J., Robinson, J.A., eds.: Handbook of Logic in Artificial Intelligence and Logic Programming, Vol. 5, Oxford University Press (1995)
2. Barklund, J.: Metaprogramming in logic. In Kent, A., Williams, J.G., eds.: Encyclopedia of Computer Science and Technology. Volume 33. M. Dekker, New York (1995) 205–227
3. Lanzarone, G.A.: Metalogic programming. In Sessa, M.I., ed.: 1985–1995 Ten Years of Logic Programming in Italy. Palladio (1995) 29–70
4. Abramson, H., Rogers, M.H., eds.: Meta-Programming in Logic Programming, Cambridge, Mass., The MIT Press (1989)
5. Bruynooghe, M., ed.: Proc. of the Second Workshop on Meta-Programming in Logic, Leuven (Belgium), Dept. of Comp. Sci., Katholieke Univ. Leuven (1990)
6. Pettorossi, A., ed.: Meta-Programming in Logic. LNCS 649, Berlin, Springer-Verlag (1992)
7. Fribourg, L., Turini, F., eds.: Logic Program Synthesis and Transformation – Meta-Programming in Logic. LNCS 883, Springer-Verlag (1994)


8. Barklund, J., Costantini, S., van Harmelen, F., eds.: Proc. Workshop on Meta Programming and Metareasoning in Logic, post-JICSLP96 workshop, Bonn (Germany), UPMAIL Technical Report No. 127 (Sept. 2, 1996), Computing Science Dept., Uppsala Univ. (1996)
9. Apt, K., Turini, F., eds.: Meta-Logics and Logic Programming. The MIT Press, Cambridge, Mass. (1995)
10. Maes, P., Nardi, D., eds.: Meta-Level Architectures and Reflection, Amsterdam, North-Holland (1988)
11. Kiczales, G., ed.: Meta-Level Architectures and Reflection, Proc. of the First Intnl. Conf. Reflection 96, Xerox PARC (1996)
12. Cointe, P., ed.: Meta-Level Architectures and Reflection, Proc. of the Second Intnl. Conf. Reflection 99. LNCS 1616, Berlin, Springer-Verlag (1999)
13. Smorynski, C.: The incompleteness theorem. In Barwise, J., ed.: Handbook of Mathematical Logic. North-Holland (1977) 821–865
14. Smullyan, R.: Diagonalization and Self-Reference. Oxford University Press (1994)
15. Kripke, S.A.: Semantical considerations on modal logic. In: Acta Philosophica Fennica. Volume 16. (1963) 493–574
16. Carlucci Aiello, L., Cialdea, M., Nardi, D., Schaerf, M.: Modal and meta languages: Consistency and expressiveness. In Apt, K., Turini, F., eds.: Meta-Logics and Logic Programming. The MIT Press, Cambridge, Mass. (1995) 243–266
17. Aiello, M., Weyhrauch, L.W.: Checking proofs in the metamathematics of first order logic. In: Proc. Fourth Intl. Joint Conf. on Artificial Intelligence, Tbilisi, Georgia, Morgan Kaufmann Publishers (1975) 1–8
18. Bundy, A., Welham, B.: Using meta-level inference for selective application of multiple rewrite rules in algebraic manipulation. Artificial Intelligence 16 (1981) 189–212
19. Weyhrauch, R.W.: Prolegomena to a theory of mechanized formal reasoning. Artificial Intelligence (1980) 133–70
20. Carlucci Aiello, L., Cecchi, C., Sartini, D.: Representation and use of metaknowledge. Proc. of the IEEE 74 (1986) 1304–1321
21. Carlucci Aiello, L., Levi, G.: The uses of metaknowledge in AI systems. In: Proc. European Conf. on Artificial Intelligence. (1984) 705–717
22. Davis, R., Buchanan, B.: Meta-level knowledge: Overview and applications. In: Procs. Fifth Int. Joint Conf. on Artificial Intelligence, Los Altos, Calif., Morgan Kaufmann (1977) 920–927
23. Maes, P.: Computational Reflection. PhD thesis, Vrije Universiteit Brussel, Faculteit Wetenschappen, Dienst Artificiele Intelligentie, Brussel (1986)
24. Genesereth, M.R.: Metalevel reasoning. In: Logic-87-8, Logic Group, Stanford University (1987)
25. Carlucci Aiello, L., Levi, G.: The uses of metaknowledge in AI systems. In Maes, P., Nardi, D., eds.: Meta-Level Architectures and Reflection. North-Holland, Amsterdam (1988) 243–254
26. Carlucci Aiello, L., Nardi, D., Schaerf, M.: Yet Another Solution to the Three Wisemen Puzzle. In Ras, Z.W., Saitta, L., eds.: Methodologies for Intelligent Systems 3: ISMIS-88, Elsevier Science Publishing (1988) 398–407
27. Carlucci Aiello, L., Nardi, D., Schaerf, M.: Reasoning about Knowledge and Ignorance. In: Proceedings of the International Conference on Fifth Generation Computer Systems 1988: FGCS-88, ICOT Press (1988) 618–627
28. Genesereth, M.R., Nilsson, J.: Logical Foundations of Artificial Intelligence. Morgan Kaufmann, Los Altos, California (1987)


29. Russell, S.J., Wefald, E.: Do the right thing: studies in limited rationality (Chapter 2: Metareasoning Architectures). The MIT Press (1991)
30. Carlucci Aiello, L., Cialdea, M., Nardi, D.: A meta level abstract description of diagnosis in Intelligent Tutoring Systems. In: Proceedings of the Sixth International PEG Conference, PEG-91. (1991) 437–442
31. Carlucci Aiello, L., Cialdea, M., Nardi, D.: Reasoning about Student Knowledge and Reasoning. Journal of Artificial Intelligence and Education 4 (1993) 397–413
32. Damásio, C., Nejdl, W., Pereira, L.M., Schroeder, M.: Model-based diagnosis preferences and strategies representation with logic meta-programming. In Apt, K., Turini, F., eds.: Meta-Logics and Logic Programming. The MIT Press, Cambridge, Mass. (1995) 267–308
33. Barklund, J., Costantini, S., Dell'Acqua, P., Lanzarone, G.A.: Reflection Principles in Computational Logic. Journal of Logic and Computation 10 (2000)
34. Barklund, J.: What is a meta-variable in Prolog? In Abramson, H., Rogers, M.H., eds.: Meta-Programming in Logic Programming. The MIT Press, Cambridge, Mass. (1989) 383–98
35. Hill, P.M., Lloyd, J.W.: Analysis of metaprograms. In Abramson, H., Rogers, M.H., eds.: Meta-Programming in Logic Programming, Cambridge, Mass., The MIT Press (1988) 23–51
36. Barklund, J., Costantini, S., Dell'Acqua, P., Lanzarone, G.A.: Semantical properties of encodings in logic programming. In Lloyd, J.W., ed.: Logic Programming – Proc. 1995 Intl. Symp., Cambridge, Mass., MIT Press (1995) 288–302
37. van Harmelen, F.: Definable naming relations in meta-level systems. In Pettorossi, A., ed.: Meta-Programming in Logic. LNCS 649, Berlin, Springer-Verlag (1992) 89–104
38. Cervesato, I., Rossi, G.: Logic meta-programming facilities in Log. In Pettorossi, A., ed.: Meta-Programming in Logic. LNCS 649, Berlin, Springer-Verlag (1992) 148–161
39. Costantini, S.: Semantics of a metalogic programming language. Intl. Journal of Foundation of Computer Science 1 (1990)
40. Perlis, D.: Languages with self-reference I: foundations (or: we can have everything in first-order logic!). Artificial Intelligence 25 (1985) 301–322
41. Perlis, D.: Languages with self-reference II. Artificial Intelligence 34 (1988) 179–212
42. Konolige, K.: Reasoning by introspection. In Maes, P., Nardi, D., eds.: Meta-Level Architectures and Reflection. North-Holland, Amsterdam (1988) 61–74
43. Genesereth, M.R.: Introspective fidelity. In Maes, P., Nardi, D., eds.: Meta-Level Architectures and Reflection. North-Holland, Amsterdam (1988) 75–86
44. van Harmelen, F., Wielinga, B., Bredeweg, B., Schreiber, G., Karbach, W., Reinders, M., Voss, A., Akkermans, H., Bartsch-Spörl, B., Vinkhuyzen, E.: Knowledge-level reflection. In: Enhancing the Knowledge Engineering Process – Contributions from ESPRIT. Elsevier Science, Amsterdam, The Netherlands (1992) 175–204
45. Carlucci Aiello, L., Weyhrauch, R.W.: Using Meta-theoretic Reasoning to do Algebra. Volume 87 of Lecture Notes in Computer Science, Springer-Verlag (1980) 1–13
46. Bowen, K.A., Kowalski, R.A.: Amalgamating language and metalanguage in logic programming. In Clark, K.L., Tärnlund, S.-Å., eds.: Logic Programming. Academic Press, London (1982) 153–172
47. McCarthy, J., et al.: The LISP 1.5 Programmer's Manual


48. Levi, G., Ramundo, D.: A formalization of metaprogramming for real. In Warren, D.S., ed.: Logic Programming – Procs. of the Tenth International Conference, Cambridge, Mass., The MIT Press (1993) 354–373
49. Subrahmanian, V.S.: Foundations of metalogic programming. In Abramson, H., Rogers, M.H., eds.: Meta-Programming in Logic Programming, Cambridge, Mass., The MIT Press (1988) 1–14
50. Martens, B., De Schreye, D.: Why untyped nonground metaprogramming is not (much of) a problem. J. Logic Programming 22 (1995)
51. Sterling, L., Shapiro, E.Y., eds.: The Art of Prolog. The MIT Press, Cambridge, Mass. (1986)
52. Kowalski, R.A.: Meta matters. Invited presentation at Second Workshop on Meta-Programming in Logic META90 (1990)
53. Kowalski, R.A.: Problems and promises of computational logic. In Lloyd, J.W., ed.: Computational Logic. Springer-Verlag, Berlin (1990) 1–36
54. Smith, B.C.: Reflection and semantics in Lisp. Technical report, Xerox Parc ISL-5, Palo Alto (CA) (1984)
55. Lemmens, I., Braspenning, P.: A formal analysis of Smithsonian computational reflection. In Cointe, P., ed.: Proc. Reflection '99. 135–137
56. Casaschi, G., Costantini, S., Lanzarone, G.A.: Realizzazione di un interprete riflessivo per clausole di Horn. In Mello, P., ed.: Gulp89, Proc. 4th Italian National Symp. on Logic Programming, Bologna (1989, in Italian) 227–241
57. Friedman, D.P., Sobel, J.M.: An introduction to reflection-oriented programming. In Kiczales, G., ed.: Meta-Level Architectures and Reflection, Proc. of the First Intnl. Conf. Reflection 96, Xerox PARC (1996)
58. Attardi, G., Simi, M.: Meta-level reasoning across viewpoints. In O'Shea, T., ed.: Proc. European Conf. on Artificial Intelligence, Amsterdam, North-Holland (1984) 315–325
59. Hill, P.M., Lloyd, J.W.: The Gödel Programming Language. The MIT Press, Cambridge, Mass. (1994)
60. Bowers, A.F., Gurr, C.: Towards fast and declarative meta-programming. In Apt, K.R., Turini, F., eds.: Meta-Logics and Logic Programming. The MIT Press, Cambridge, Mass. (1995) 137–166
61. Giunchiglia, F., Cimatti, A.: Introspective metatheoretic reasoning. In Fribourg, L., Turini, F., eds.: Logic Program Synthesis and Transformation – Meta-Programming in Logic. LNCS 883 (1994) 425–439
62. Giunchiglia, F., Traverso, A.: A metatheory of a mechanized object theory. Artificial Intelligence 80 (1996) 197–241
63. Giunchiglia, F., Serafini, L.: Multilanguage hierarchical logics, or: how we can do without modal logics. Artificial Intelligence 65 (1994) 29–70
64. Costantini, S., Lanzarone, G.A.: A metalogic programming language. In Levi, G., Martelli, M., eds.: Proc. 6th Intl. Conf. on Logic Programming, Cambridge, Mass., The MIT Press (1989) 218–233
65. Costantini, S., Lanzarone, G.A.: A metalogic programming approach: language, semantics and applications. Int. J. of Experimental and Theoretical Artificial Intelligence 6 (1994) 239–287
66. Konolige, K.: An autoepistemic analysis of metalevel reasoning in logic programming. In Pettorossi, A., ed.: Meta-Programming in Logic. LNCS 649, Berlin, Springer-Verlag (1992)
67. Dell'Acqua, P.: Development of the interpreter for a metalogic programming language. Degree thesis, Univ. degli Studi di Milano, Milano (1989, in Italian)


68. Maes, P.: Concepts and experiments in computational reflection. In: Proc. of OOPSLA'87. ACM SIGPLAN Notices (1987) 147–155
69. Kiczales, G., des Rivieres, J., Bobrow, D.G.: The Art of Meta-Object Protocol. The MIT Press (1991)
70. Malenfant, J., Lapalme, G., Vaucher, G.: ObjVProlog: Metaclasses in logic. In: Proc. of ECOOP'89, Cambridge Univ. Press (1990) 257–269
71. Malenfant, J., Lapalme, G., Vaucher, G.: Metaclasses for metaprogramming in Prolog. In Bruynooghe, M., ed.: Proc. of the Second Workshop on Meta-Programming in Logic, Dept. of Comp. Sci., Katholieke Univ. Leuven (1990) 272–83
72. Stroud, R., Welch, I.: The evolution of a reflective Java extension. LNCS 1616, Berlin, Springer-Verlag (1999)
73. Jiang, Y.J.: Ambivalent logic as the semantic basis of metalogic programming: I. In Van Hentenryck, P., ed.: Proc. 11th Intl. Conf. on Logic Programming, Cambridge, Mass., The MIT Press (1994) 387–401
74. Kalsbeek, M., Jiang, Y.: A vademecum of ambivalent logic. In Apt, K., Turini, F., eds.: Meta-Logics and Logic Programming. The MIT Press, Cambridge, Mass. (1995) 27–56
75. Kalsbeek, M.: Correctness of the vanilla meta-interpreter and ambivalent syntax. In Apt, K., Turini, F., eds.: Meta-Logics and Logic Programming. The MIT Press, Cambridge, Mass. (1995) 3–26
76. Christiansen, H.: A complete resolution principle for logical meta-programming languages. In Pettorossi, A., ed.: Meta-Programming in Logic. LNCS 649, Berlin, Springer-Verlag (1992) 205–234
77. Christiansen, H.: Efficient and complete demo predicates for definite clause languages. Datalogiske Skrifter, Technical Report 51, Dept. of Computer Science, Roskilde University (1994)
78. Brogi, A., Mancarella, P., Pedreschi, D., Turini, F.: Composition operators for logic theories. In Lloyd, J.W., ed.: Computational Logic. Springer-Verlag, Berlin (1990) 117–134
79. Brogi, A., Contiero, S.: Composing logic programs by meta-programming in Gödel. In Apt, K., Turini, F., eds.: Meta-Logics and Logic Programming. The MIT Press, Cambridge, Mass. (1995) 167–194
80. Brogi, A., Turini, F.: Meta-logic for program composition: Semantic issues. In Apt, K., Turini, F., eds.: Meta-Logics and Logic Programming. The MIT Press, Cambridge, Mass. (1995) 83–110
81. Barklund, J., Boberg, K., Dell'Acqua, P.: A basis for a multilevel metalogic programming language. In Fribourg, L., Turini, F., eds.: Logic Program Synthesis and Transformation – Meta-Programming in Logic. LNCS 883, Berlin, Springer-Verlag (1994) 262–275
82. Barklund, J., Boberg, K., Dell'Acqua, P., Veanes, M.: Meta-programming with theory systems. In Apt, K., Turini, F., eds.: Meta-Logics and Logic Programming. The MIT Press, Cambridge, Mass. (1995) 195–224
83. Shoham, Y., McDermott, D.: Temporal reasoning. In Shapiro, S.C., ed.: Encyclopedia of Artificial Intelligence. (1987) 967–981
84. Kowalski, R.A., Sergot, M.: A logic-based calculus of events. New Generation Computing 4 (1986) 67–95
85. McCarthy, J., Hayes, P.: Some philosophical problems from the standpoint of artificial intelligence. Machine Intelligence 4 (1969) 463–502
86. Kowalski, R.A.: Database updates in the event calculus. J. Logic Programming (1992) 121–146


87. Kowalski, R.A., Sadri, F.: The situation calculus and event calculus compared. In: Proc. 1994 Intl. Logic Programming Symp. (1994) 539–553
88. Kowalski, R.A., Sadri, F.: Reconciling the event calculus with the situation calculus. J. Logic Programming 31 (1997) 39–58
89. Provetti, A.: Hypothetical reasoning: From situation calculus to event calculus. Computational Intelligence Journal 12 (1996) 478–498
90. Díaz, O., Paton, N.: Stimuli and business policies as modeling constructs: their definition and validation through the event calculus. In: Proc. of CAiSE'97. (1997) 33–46
91. Sripada, S.: Efficient implementation of the event calculus for temporal database applications. In Lloyd, J.W., ed.: Proc. 12th Intl. Conf. on Logic Programming, Cambridge, Mass., The MIT Press (1995) 99–113
92. Pfenning, F.: The practice of logical frameworks. In Kirchner, H., ed.: Trees in Algebra and Programming – CAAP '96. LNCS 1059, Linköping, Sweden, Springer-Verlag (1996) 119–134
93. Clavel, M.G., Eker, S., Lincoln, P., Meseguer, J.: Principles of Maude. In Meseguer, J., ed.: Proc. First Intl. Workshop on Rewriting Logic. Volume 4 of Electronic Notes in Th. Comp. Sc. (1996)
94. Clavel, M.G., Duran, F., Eker, S., Lincoln, P., Marti-Oliet, N., Meseguer, J., Quesada, J.: Maude as a metalanguage. In: Proc. Second Intl. Workshop on Rewriting Logic. Volume 15 of Electronic Notes in Th. Comp. Sc. (1998)
95. Clavel, M.G., Meseguer, J.: Axiomatizing reflective logics and languages. In Kiczales, G., ed.: Proc. Reflection '96, Xerox PARC (1996) 263–288
96. Costantini, S., Lanzarone, G.A., Sbarbaro, L.: A formal definition and a sound implementation of analogical reasoning in logic programming. Annals of Mathematics and Artificial Intelligence 14 (1995) 17–36
97. Costantini, S., Dell'Acqua, P., Lanzarone, G.A.: Reflective agents in metalogic programming. In Pettorossi, A., ed.: Meta-Programming in Logic. LNCS 649, Berlin, Springer-Verlag (1992) 135–147
98. Martin, D.L., Cheyer, A.J., Moran, D.B.: The open agent architecture: a framework for building distributed software systems. Applied Artificial Intelligence 13(1–2) (1999) 91–128
99. Rao, A.S., Georgeff, M.P.: Modeling rational agents within a BDI-architecture. In Fikes, R., Sandewall, E., eds.: Proceedings of Knowledge Representation and Reasoning (KR&R-91), Morgan Kaufmann Publishers: San Mateo, CA (1991) 473–484
100. Rao, A.S., Georgeff, M.: BDI Agents: from theory to practice. In: Proceedings of the First International Conference on Multi-Agent Systems (ICMAS-95), San Francisco, CA (1995) 312–319
101. Dix, J., Subrahmanian, V., Pick, G.: Meta-agent programs. J. Logic Programming 45 (2000)
102. Kim, J.S., Kowalski, R.A.: An application of amalgamated logic to multi-agent belief. In Bruynooghe, M., ed.: Proc. of the Second Workshop on Meta-Programming in Logic, Dept. of Comp. Sci., Katholieke Univ. Leuven (1990) 272–83
103. Kim, J.S., Kowalski, R.A.: A metalogic programming approach to multi-agent knowledge and belief. In Lifschitz, V., ed.: Artificial Intelligence and Mathematical Theory of Computation, Academic Press (1991)
104. Kowalski, R.A., Sadri, F.: Towards a unified agent architecture that combines rationality with reactivity. In: Proc. International Workshop on Logic in Databases. LNCS 1154, Berlin, Springer-Verlag (1996)


105. Kowalski, R.A., Sadri, F.: From logic programming towards multi-agent systems. Annals of Mathematics and Artificial Intelligence 25 (1999) 391–410
106. Dell'Acqua, P., Sadri, F., Toni, F.: Combining introspection and communication with rationality and reactivity in agents. In Dix, J., Cerro, F.D., Furbach, U., eds.: Logics in Artificial Intelligence. LNCS 1489, Berlin, Springer-Verlag (1998)
107. Fung, T.H., Kowalski, R.A.: The IFF proof procedure for abductive logic programming. J. Logic Programming 33 (1997) 151–165
108. Dell'Acqua, P., Sadri, F., Toni, F.: Communicating agents. In: Proc. International Workshop on Multi-Agent Systems in Logic Programming, in conjunction with ICLP'99, Las Cruces, New Mexico (1999)
109. Costantini, S.: Towards active logic programming. In Brogi, A., Hill, P., eds.: Proc. of 2nd International Workshop on Component-based Software Development in Computational Logic (COCL'99). PLI'99, Paris, France, http://www.di.unipi.it/ brogi/ ResearchActivity/COCL99/ proceedings/index.html (1999)
110. Gärdenfors, P.: Belief revision: a vademecum. In Pettorossi, A., ed.: Meta-Programming in Logic. LNCS 649, Berlin, Springer-Verlag (1992) 135–147
111. Gärdenfors, P., Roth, H.: Belief revision. In Gabbay, D., Hogger, C., Robinson, J., eds.: Handbook of Logic in Artificial Intelligence and Logic Programming. Volume 4. Clarendon Press (1995) 36–119
112. Dell'Acqua, P., Pereira, L.M.: Updating agents. (1999)
113. Lamma, E., Riguzzi, F., Pereira, L.M.: Agents learning in a three-valued logical setting. In Panayiotopoulos, A., ed.: Workshop on Machine Learning and Intelligent Agents, in conjunction with Machine Learning and Applications, Advanced Course on Artificial Intelligence (ACAI'99), Chania (Greece) (1999) (Also available at http://centria.di.fct.unl.pt/∼lmp/).
114. Brewka, G.: Declarative representation of revision strategies. In Baral, C., Truszczynski, M., eds.: NMR'2000, Proc. of the 8th Intl. Workshop on Non-Monotonic Reasoning. (2000)
115. McCarthy, J.: First order theories of individual concepts and propositions. Machine Intelligence 9 (1979) 129–147
116. Lloyd, J.W.: Foundations of Logic Programming, Second Edition. Springer-Verlag, Berlin (1987)
117. Dell'Acqua, P.: Reflection principles in computational logic. PhD Thesis, Uppsala University, Uppsala (1998)
118. Dell'Acqua, P.: SLD-Resolution with reflection. PhL Thesis, Uppsala University, Uppsala (1995)
119. Jaffar, J., Lassez, J.L., Maher, M.J.: A theory of complete logic programs with equality. J. Logic Programming 3 (1984) 211–223
120. Martens, B., De Schreye, D.: Two semantics for definite meta-programs, using the non-ground representation. In Apt, K., Turini, F., eds.: Meta-Logics and Logic Programming. The MIT Press, Cambridge, Mass. (1995) 57–82
121. Falaschi, M., Levi, G., Martelli, M., Palamidessi, C.: A new declarative semantics for logic languages. In Kowalski, R.A., Bowen, K.A., eds.: Proc. 5th Intl. Conf. Symp. on Logic Programming, Cambridge, Mass., MIT Press (1988) 993–1005

Argumentation-Based Proof Procedures for Credulous and Sceptical Non-monotonic Reasoning

Phan Minh Dung1, Paolo Mancarella2, and Francesca Toni3

1 Division of Computer Science, Asian Institute of Technology, GPO Box 2754, Bangkok 10501, Thailand [email protected]
2 Dipartimento di Informatica, Università di Pisa, Corso Italia 40, I-56125 Pisa, Italy [email protected]
3 Department of Computing, Imperial College of Science, Technology and Medicine, 180 Queen's Gate, London SW7 2BZ, U.K. [email protected]

Abstract. We define abstract proof procedures for performing credulous and sceptical non-monotonic reasoning, with respect to the argumentation-theoretic formulation of non-monotonic reasoning proposed in [1]. Appropriate instances of the proposed proof procedures provide concrete proof procedures for concrete formalisms for non-monotonic reasoning, for example logic programming with negation as failure and default logic. We propose (credulous and sceptical) proof procedures under different argumentation-theoretic semantics, namely the conventional stable model semantics and the more liberal partial stable model or preferred extension semantics. We study the relationships between proof procedures for different semantics, and argue that, in many meaningful cases, the (simpler) proof procedures for reasoning under the preferred extension semantics can be used as sound and complete procedures for reasoning under the stable model semantics. Moreover, in many meaningful cases, proof procedures for credulous reasoning under the preferred extension semantics can be used as (much simpler) sound and complete procedures for sceptical reasoning under the preferred extension semantics. We compare the proposed proof procedures with existing proof procedures in the literature.

1 Introduction

A.C. Kakas, F. Sadri (Eds.): Computat. Logic (Kowalski Festschrift), LNAI 2408, pp. 289–310, 2002. © Springer-Verlag Berlin Heidelberg 2002

In recent years argumentation [1,3,4,6,12,15,21,23,24,29,30,32] has played an important role in understanding many non-monotonic formalisms and their semantics, such as logic programming with negation as failure, default logic and autoepistemic logic. In particular, Eshghi and Kowalski [9] have given an interpretation of negation as failure in Logic Programming as a form of assumption based reasoning (abduction). Continuing this line of work, Dung [5] has given


a declarative understanding of this assumption based view, by formalizing the concept that an assumption can be safely accepted if "there is no evidence to the contrary". It has also been shown in [5] that the assumption based view provides a unifying framework for different semantics of logic programming. Later, this view has been further put forward [1,6,12] by the introduction of the notions of attack and counterattack between sets of assumptions, finally leading to an argumentation-theoretic understanding of the semantics of logic programming and non-monotonic reasoning. In particular, Dung [6] has introduced an abstract framework of argumentation, which consists of a set of arguments and an attack relation between them. However, this abstract framework leaves open the question of how the arguments and their attack relationship are defined. Addressing this issue, Bondarenko et al. [1] have defined an abstract, argumentation-theoretic assumption-based framework for non-monotonic reasoning that can be instantiated to capture many of the existing approaches to non-monotonic reasoning, namely logic programming with negation as failure, default logic [25], (many cases of) circumscription [16], theorist [22], autoepistemic logic [18] and non-monotonic modal logics [17].

The semantics of argumentation can be used to characterize a number of alternative semantics for non-monotonic reasoning, each of which can be the basis for credulous and sceptical reasoning. In particular, three semantics have been proposed in [1,6] generalizing, respectively, the semantics of admissible scenaria for logic programming [5], the semantics of preferred extensions [5] or partial stable models [26] for logic programming, and the conventional semantics of stable models [10] for logic programming as well as the standard semantics of theorist [22], circumscription [16], default logic [25], autoepistemic logic [18] and non-monotonic modal logic [17].

In more detail, Bondarenko et al. understand non-monotonic reasoning as extending theories in some monotonic language by means of sets of assumptions, provided they are "appropriate" with respect to some requirements. These are expressed in argumentation-theoretic terms, as follows. According to the semantics of admissible extensions, a set of assumptions is deemed "appropriate" iff it does not attack itself and it attacks all sets of assumptions which attack it. According to the semantics of preferred extensions, a set of assumptions is deemed "appropriate" iff it is maximally admissible, with respect to set inclusion. According to the semantics of stable extensions, a set of assumptions is deemed "appropriate" iff it does not attack itself and it attacks every assumption which does not belong to it.

Given any such semantics of extensions, credulous and sceptical non-monotonic reasoning can be defined, as follows. A given sentence in the underlying monotonic language is a credulous non-monotonic consequence of a theory iff it holds in some extension of the theory that is deemed "appropriate" by the chosen semantics. It is a sceptical non-monotonic consequence iff it holds in all extensions of the theory that are deemed "appropriate" by the chosen semantics.

In this paper we propose abstract proof procedures for performing credulous and sceptical reasoning under the three semantics of admissible, preferred and

Argumentation-Based Proof Procedures


stable extensions, concentrating on the special class of ﬂat frameworks. This class includes logic programming with negation as failure and default logic. We deﬁne all proof procedures parametrically with respect to a proof procedure computing the semantics of admissible extensions. A number of such procedures have been proposed in the literature, e.g. [9,5,7,8,15]. We argue that the proof procedures for reasoning under the preferred extension semantics are “simpler” than those for reasoning under the stable extension semantics. This is an interesting argument in that, in many meaningful cases (e.g. when the frameworks are order-consistent [1]), the proof procedures for reasoning under the preferred extension semantics can be used as sound and complete procedures for reasoning under the stable model semantics. The paper is organized as follows. Section 2 summarises the main features of the approach in [1]. Section 3 gives some preliminary deﬁnitions, used later on in the paper to deﬁne the proof procedures. Sections 4 and 5 describe the proof procedures for performing credulous reasoning under the preferred and stable extension semantics, respectively. Sections 6 and 7 describe the proof procedures for performing sceptical reasoning under the stable and preferred extension semantics, respectively. Section 8 compares the proposed proof procedures with existing proof procedures proposed in the literature. Section 9 concludes.

2 Argumentation-Based Semantics

In this section we briefly review the notion of assumption-based framework [1], showing how it can be used to extend any deductive system for a monotonic logic to a non-monotonic logic.

A deductive system is a pair $(L, R)$ where
– $L$ is a formal language consisting of countably many sentences, and
– $R$ is a set of inference rules of the form
$$\frac{\alpha_1, \ldots, \alpha_n}{\alpha}$$
where $\alpha, \alpha_1, \ldots, \alpha_n \in L$ and $n \geq 0$. If $n = 0$, then the inference rule is an axiom.

A set of sentences $T \subseteq L$ is called a theory. A deduction from a theory $T$ is a sequence $\beta_1, \ldots, \beta_m$, where $m > 0$, such that, for all $i = 1, \ldots, m$,
– $\beta_i \in T$, or
– there exists $\frac{\alpha_1, \ldots, \alpha_n}{\beta_i}$ in $R$ such that $\alpha_1, \ldots, \alpha_n \in \{\beta_1, \ldots, \beta_{i-1}\}$.

$T \vdash \alpha$ means that there is a deduction (of $\alpha$) from $T$ whose last element is $\alpha$. $Th(T)$ is the set $\{\alpha \in L \mid T \vdash \alpha\}$. Deductive systems are monotonic, in the sense that $T \subseteq T'$ implies $Th(T) \subseteq Th(T')$. They are also compact, in the sense that $T \vdash \alpha$ implies $T' \vdash \alpha$ for some finite subset $T'$ of $T$.

Given a deductive system $(L, R)$, an argumentation-theoretic framework with respect to $(L, R)$ is a tuple $\langle T, Ab, \overline{\,\cdot\,} \rangle$ where
– $T, Ab \subseteq L$, $Ab \neq \{\}$, and
– $\overline{\,\cdot\,}$ is a mapping from $Ab$ into $L$. $\overline{\alpha}$ is called the contrary of $\alpha$.

The theory $T$ can be viewed as a given set of beliefs, and $Ab$ as a set of candidate assumptions that can be used to extend $T$. An extension of a theory $T$ is a theory $Th(T \cup \Delta)$, for some $\Delta \subseteq Ab$. Sometimes, informally, we refer to the extension simply as $T \cup \Delta$ or $\Delta$.

Given a deductive system $(L, R)$ and an argumentation-theoretic framework $\langle T, Ab, \overline{\,\cdot\,} \rangle$ with respect to $(L, R)$, the problem of determining whether a given sentence $\sigma$ in $L$ is a non-monotonic consequence of the framework is understood as the problem of determining whether there exist “appropriate” extensions $\Delta \subseteq Ab$ of $T$ such that $T \cup \Delta \vdash \sigma$. In particular, $\sigma$ is a credulous non-monotonic consequence of $\langle T, Ab, \overline{\,\cdot\,} \rangle$ if it belongs to some “appropriate” extension of $T$. Many logics for default reasoning are credulous in this same sense, differing however in the way they understand what it means for an extension to be “appropriate”. Some logics, in contrast, are sceptical, in the sense that they require that $\sigma$ belong to all “appropriate” extensions. However, the semantics of any of these logics can be made sceptical or credulous, simply by varying whether a sentence is deemed to be a non-monotonic consequence of a theory if it belongs to all “appropriate” extensions or if it belongs to some “appropriate” extension.

A number of notions of “appropriate” extensions are given in [1], for any argumentation-theoretic framework $\langle T, Ab, \overline{\,\cdot\,} \rangle$ with respect to $(L, R)$. All these notions are formulated in argumentation-theoretic terms, with respect to a notion of “attack” defined as follows. Given a set of assumptions $\Delta \subseteq Ab$:
– $\Delta$ attacks an assumption $\alpha \in Ab$ iff $T \cup \Delta \vdash \overline{\alpha}$;
– $\Delta$ attacks a set of assumptions $\Delta' \subseteq Ab$ iff $\Delta$ attacks an assumption $\alpha$, for some $\alpha \in \Delta'$.

In this paper we will consider the notions of “stable”, “admissible” and “preferred” extensions, defined below. Let a set of assumptions $\Delta \subseteq Ab$ be closed iff $\Delta = \{\alpha \in Ab \mid T \cup \Delta \vdash \alpha\}$. Then, $\Delta \subseteq Ab$ is stable if and only if
1. $\Delta$ is closed,
2. $\Delta$ does not attack itself, and
3. $\Delta$ attacks $\alpha$, for every assumption $\alpha \notin \Delta$.

Furthermore, $\Delta \subseteq Ab$ is admissible if and only if
1. $\Delta$ is closed,
2. $\Delta$ does not attack itself, and
3. for each closed set of assumptions $\Delta' \subseteq Ab$, if $\Delta'$ attacks $\Delta$ then $\Delta$ attacks $\Delta'$.

Finally, $\Delta \subseteq Ab$ is preferred if and only if $\Delta$ is maximally admissible, with respect to set inclusion.


In general, every admissible extension is contained in some preferred extension. Moreover, every stable extension is preferred (and thus admissible) [1] but not vice versa. However, in many cases, e.g. for stratified and order-consistent argumentation-theoretic frameworks (see [1]), preferred extensions are always stable.¹

In this paper we concentrate on flat frameworks [1], namely frameworks in which every set of assumptions $\Delta \subseteq Ab$ is closed. For frameworks of this kind, the definitions of admissible and stable extensions can be simplified by dropping condition 1 and by dropping the requirement that $\Delta'$ be closed in condition 3 of the definition of admissible extension. In general, if the framework is flat, both admissible and preferred extensions are guaranteed to exist. By contrast, even for flat frameworks, stable extensions are not guaranteed to exist. However, in many cases, e.g. for stratified argumentation-theoretic frameworks [1], stable extensions are guaranteed to exist.

Different logics for default reasoning differ, not only in whether they are credulous or sceptical and how they interpret the notion of what it means to be an “appropriate” extension, but also in their underlying framework. Bondarenko et al. [1] show how the framework can be instantiated to obtain theorist [22], (some cases of) circumscription [16], autoepistemic logic [18], non-monotonic modal logics [17], default logic [25], and logic programming, with respect to, e.g., the semantics of stable models [10] and partial stable models [26], the latter being equivalent [13] to the semantics of preferred extensions [5]. They also prove that the instances of the framework for default logic and logic programming are flat.

Default logic is the instance of the abstract framework $\langle T, Ab, \overline{\,\cdot\,} \rangle$ where $\vdash$ is first-order logic augmented with domain-specific inference rules of the form
$$\frac{\alpha_1, \ldots, \alpha_m, M\beta_1, \ldots, M\beta_n}{\gamma}$$
where $\alpha_i, \beta_j, \gamma$ are sentences in classical logic. $T$ is a classical theory and $Ab$ consists of all expressions of the form $M\beta$ where $\beta$ is a sentence of classical logic. The contrary $\overline{M\beta}$ of an assumption $M\beta$ is $\neg\beta$. The conventional semantics of extensions of default logic [25] corresponds to the semantics of stable extensions of the instance of the abstract framework for default logic [1]. Moreover, default logic inherits the semantics of admissible and preferred extensions, simply by being an instance of the framework.

Logic programming is the instance of the abstract framework $\langle T, Ab, \overline{\,\cdot\,} \rangle$ where $T$ is a logic program, the assumptions in $Ab$ are all negations $not\ p$ of atomic sentences $p$, and the contrary $\overline{not\ p}$ of an assumption is $p$. $\vdash$ is Horn logic provability, with assumptions $not\ p$ understood as new atoms $p^*$ (see [9]). The logic programming semantics of stable models [10], admissible scenaria [5], and partial stable models [26]/preferred extensions [5] correspond to the semantics of stable, admissible and preferred extensions, respectively, of the instance of the abstract framework for logic programming [1].

¹ See the Appendix for the definition of stratified and order-consistent frameworks.


In the remainder of the paper we will concentrate on computing credulous and sceptical consequences under the semantics of preferred and stable extensions. We will rely upon a proof procedure for computing credulous consequences under the semantics of admissible extensions (see Sect. 8 for a review of such procedures). Note that we ignore the problem of computing sceptical consequences under the semantics of admissible extensions as, for ﬂat frameworks, this problem reduces to that of computing monotonic consequences in the underlying deductive system. Indeed, in ﬂat frameworks, the empty set of assumptions is always admissible. We will propose abstract proof procedures, but, for simplicity, we will illustrate their behaviour within the concrete instance of the abstract framework for logic programming.
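The definitions of this section, specialised to flat frameworks and to the logic programming instance, can be sketched in Python by brute force. This is only an illustration under stated assumptions, not one of the proof procedures of the paper: the clause encoding, the function names (closure, subsets, attacks, admissible, preferred, stable) and the representation of an assumption not a by the atom a are all choices made for this sketch. The program used is the one that appears later in Example 5.

```python
from itertools import combinations

# Clause format (an assumption of this sketch): (head, positive body, atoms under "not").
# The assumption "not a" is represented by the atom a; its contrary is a itself.
PROGRAM = [("p", [], ["p"]), ("p", ["q"], []), ("q", [], ["r"]), ("r", [], ["q"])]
ATOMS = {"p", "q", "r"}


def closure(program, delta):
    """Atoms in Th(T ∪ ∆): forward chaining with the assumptions in delta taken as true."""
    derived, changed = set(), True
    while changed:
        changed = False
        for head, pos, neg in program:
            if head not in derived and set(pos) <= derived and set(neg) <= delta:
                derived.add(head)
                changed = True
    return derived


def subsets(atoms):
    """All subsets of Ab, smallest first."""
    ordered = sorted(atoms)
    for size in range(len(ordered) + 1):
        for combo in combinations(ordered, size):
            yield frozenset(combo)


def attacks(program, attacker, target):
    """attacker attacks target iff it derives the contrary of some assumption in target."""
    return bool(closure(program, attacker) & target)


def admissible(program, atoms, delta):
    """delta does not attack itself and attacks every set of assumptions that attacks it."""
    if attacks(program, delta, delta):
        return False
    return all(attacks(program, delta, other)
               for other in subsets(atoms) if attacks(program, other, delta))


def preferred(program, atoms):
    """Maximal admissible sets with respect to set inclusion."""
    adm = [d for d in subsets(atoms) if admissible(program, atoms, d)]
    return [d for d in adm if not any(d < e for e in adm)]


def stable(program, atoms):
    """No self-attack, and every assumption outside delta is attacked by delta."""
    return [d for d in subsets(atoms) if closure(program, d) == set(atoms) - d]


print(preferred(PROGRAM, ATOMS))
print(stable(PROGRAM, ATOMS))
```

On this program the sketch reports the two preferred sets {not q} and {not r}, of which only {not r} is stable. The enumeration is exponential in the number of assumptions, so it is useful only for checking small examples by hand.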

3 Preliminaries

In the sequel we assume that a framework is given and we omit mentioning it explicitly when it is clear from the context. Let $S$ be a set of sets. A subset $B$ of $S$ is called a base of $S$ if for each element $s$ in $S$ there is an element $b$ in $B$ such that $b \subseteq s$. We assume that the following procedures are defined, where $\alpha$ is a sentence in $L$ and $\Delta \subseteq Ab$ is a set of assumptions:
– $\mathit{support}(\alpha, \Delta)$ computes a set of sets $\Delta' \subseteq Ab$ such that $\alpha \in Th(T \cup \Delta')$ and $\Delta' \supseteq \Delta$. $\mathit{support}(\alpha, \Delta)$ is said to be complete if it is a base of the set $\{\Delta' \subseteq Ab \mid \alpha \in Th(T \cup \Delta') \text{ and } \Delta' \supseteq \Delta\}$.
– $\mathit{adm\_expand}(\Delta)$ computes a set of sets $\Delta' \subseteq Ab$ such that $\Delta' \supseteq \Delta$ and $\Delta'$ is admissible. $\mathit{adm\_expand}(\Delta)$ is said to be complete if it is a base of the set of all admissible supersets of $\Delta$.

We will assume that the above procedures are nondeterministic. We will write, e.g.,

  A := support(α, ∆)

meaning that the variable A is assigned a result of the procedure support, if there is one. Such a statement represents a backtracking point, which may eventually fail if no further result can be produced by support. The following example illustrates the above procedures.

Example 1. Consider the following logic program

  p ← q, not r
  q ← not s
  t ← not h
  f

and the sentence $p$. Possible outcomes of the procedure $\mathit{support}(p, \{\})$ are $\Delta_1 = \{not\ s, not\ r\}$ and $\Delta_2 = \{not\ s, not\ r, not\ f\}$. Possible outcomes of the procedure $\mathit{adm\_expand}(\Delta_1)$ are $\Delta_1$ and $\Delta_1 \cup \{not\ h\}$. No possible outcomes exist for $\mathit{adm\_expand}(\Delta_2)$.

Note that different implementations of the above procedures are possible. In all examples in the remainder of the paper we will assume that support and adm_expand return minimal sets. In the above example, $\Delta_1$ is a minimal support whereas $\Delta_2$ is not, and $\Delta_1$ is a minimal admissible expansion of $\Delta_1$ whereas $\Delta_1 \cup \{not\ h\}$ is not.
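Example 1 can be replayed with a naive, exhaustive rendering of the two procedures. The sketch below is an assumption-laden illustration rather than an implementation proposed in the paper: support and adm_expand are computed by enumerating all subsets of Ab and keeping the minimal ones, queries are restricted to atoms, and the clause encoding and helper names (closure, subsets, admissible, minimal) are invented for the purpose.

```python
from itertools import combinations

# Example 1. Clause = (head, positive body, atoms under "not"); "not a" is stored as a.
PROGRAM = [("p", ["q"], ["r"]), ("q", [], ["s"]), ("t", [], ["h"]), ("f", [], [])]
ATOMS = {"p", "q", "r", "s", "t", "h", "f"}


def closure(program, delta):
    """Atoms derivable from the program when the assumptions in delta are taken as true."""
    derived, changed = set(), True
    while changed:
        changed = False
        for head, pos, neg in program:
            if head not in derived and set(pos) <= derived and set(neg) <= delta:
                derived.add(head)
                changed = True
    return derived


def subsets(atoms):
    ordered = sorted(atoms)
    for size in range(len(ordered) + 1):
        for combo in combinations(ordered, size):
            yield frozenset(combo)


def admissible(program, atoms, delta):
    """Flat-framework admissibility: no self-attack, and every attacker is counter-attacked."""
    own = closure(program, delta)
    if own & delta:
        return False
    return all(own & other for other in subsets(atoms) if closure(program, other) & delta)


def minimal(sets):
    """Keep only the sets that have no proper subset in the collection."""
    return [d for d in sets if not any(e < d for e in sets)]


def support(program, atoms, goal, delta=frozenset()):
    """Minimal supersets of delta from which the atom `goal` is derivable."""
    return minimal([d for d in subsets(atoms)
                    if delta <= d and goal in closure(program, d)])


def adm_expand(program, atoms, delta):
    """Minimal admissible supersets of delta; an empty list means failure."""
    return minimal([d for d in subsets(atoms)
                    if delta <= d and admissible(program, atoms, d)])


delta1 = frozenset({"s", "r"})
delta2 = frozenset({"s", "r", "f"})
print(support(PROGRAM, ATOMS, "p"))
print(adm_expand(PROGRAM, ATOMS, delta1))
print(adm_expand(PROGRAM, ATOMS, delta2))
```

Because only minimal sets are returned, the non-minimal outcomes mentioned in the example ($\Delta_2$ as a support, $\Delta_1 \cup \{not\ h\}$ as an admissible expansion) do not appear in the output, although the latter still passes the admissible check.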

4 Computing Credulous Consequences under Preferred Extensions

To show that a sentence is a credulous consequence under the preferred extension semantics, we simply need to check the existence of an admissible set of assumptions which entails the desired sentence. This can be done by:
– finding a support set for the sentence;
– showing that the support set can be extended into an admissible extension.

Proof procedure 4.1 (Credulous Preferred Extensions).

  CPE(α):
    S := support(α, {});
    ∆ := adm_expand(S);
    return ∆

Notice that the two assignments in the procedure are backtracking points, due to the nondeterministic nature of both support and adm_expand.

Example 2. Consider the following logic program

  p ← not s
  s ← q
  q ← not r
  r ← not q

and the sentence $p$. The procedure $\mathit{CPE}(p)$ will perform the following steps:
– first the set $S = \{not\ s\}$ is generated by $\mathit{support}(p, \{\})$;
– then the set $\Delta = \{not\ s, not\ q\}$ is generated by $\mathit{adm\_expand}(S)$;
– finally, $\Delta$ is the set returned by the procedure.

Consider now the conjunction $p, q$. The procedure $\mathit{CPE}((p, q))$² would fail, since
– $S = \{not\ s, not\ r\}$ is generated by $\mathit{support}((p, q), \{\})$;
– there exists no admissible set $\Delta \supseteq S$.

² Note that, in the instance of the framework of [1] for logic programming, conjunctions of atoms are not part of the underlying deductive system. However, conjunctions can be accommodated by additional program clauses. E.g., in the given example, the logic program can be extended by t ← p, q, and CPE can be called for t.
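Procedure 4.1 and Example 2 can be mimicked with Python generators, where iterating over the results of support and adm_expand plays the role of backtracking. This is a sketch under assumptions: both helper procedures are implemented by exhaustive enumeration of minimal sets, the clause encoding is invented for the illustration, and the conjunction p, q is handled through the extra clause t ← p, q suggested in the footnote.

```python
from itertools import combinations

# Example 2 extended with t ← p, q. Clause = (head, positive body, atoms under "not").
PROGRAM = [("p", [], ["s"]), ("s", ["q"], []), ("q", [], ["r"]),
           ("r", [], ["q"]), ("t", ["p", "q"], [])]
ATOMS = {"p", "q", "r", "s", "t"}


def closure(program, delta):
    """Atoms derivable when the assumptions 'not a', for a in delta, are taken as true."""
    derived, changed = set(), True
    while changed:
        changed = False
        for head, pos, neg in program:
            if head not in derived and set(pos) <= derived and set(neg) <= delta:
                derived.add(head)
                changed = True
    return derived


def subsets(atoms):
    ordered = sorted(atoms)
    for size in range(len(ordered) + 1):
        for combo in combinations(ordered, size):
            yield frozenset(combo)


def admissible(program, atoms, delta):
    own = closure(program, delta)
    if own & delta:                      # delta attacks itself
        return False
    return all(own & other for other in subsets(atoms) if closure(program, other) & delta)


def minimal(sets):
    return [d for d in sets if not any(e < d for e in sets)]


def support(program, atoms, goal, delta=frozenset()):
    return minimal([d for d in subsets(atoms)
                    if delta <= d and goal in closure(program, d)])


def adm_expand(program, atoms, delta):
    return minimal([d for d in subsets(atoms)
                    if delta <= d and admissible(program, atoms, d)])


def cpe(program, atoms, goal):
    """Procedure 4.1: each loop is a backtracking point; no result means failure."""
    for s in support(program, atoms, goal):
        for delta in adm_expand(program, atoms, s):
            yield delta


print(list(cpe(PROGRAM, ATOMS, "p")))
print(list(cpe(PROGRAM, ATOMS, "t")))
```

The first call yields the single set corresponding to {not s, not q}, and the second yields nothing, which corresponds to the failure of CPE on the conjunction.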


Theorem 1 (Soundness and Completeness of CPE).
1. If $\mathit{CPE}(\alpha)$ succeeds then there exists a preferred extension $\Delta$ such that $\alpha \in Th(T \cup \Delta)$.
2. If both support and adm_expand are complete then for each preferred extension $E$ there exist appropriate selections such that $\mathit{CPE}(\alpha)$ returns $\Delta \subseteq E$.

Proof.
1. It follows immediately from the fact that each admissible set of assumptions can be extended into a preferred extension.
2. Let $E$ be a preferred extension such that $\alpha \in Th(T \cup E)$. Since $\mathit{support}(\alpha, \{\})$ is complete, there is a set $S \subseteq E$ such that $S$ can be computed by $\mathit{support}(\alpha, \{\})$. From the completeness of adm_expand, it follows that there is $\Delta \subseteq E$ such that $\Delta$ is computed by $\mathit{adm\_expand}(S)$.

5 Computing Credulous Consequences under Stable Extensions

A stable model is nothing but a preferred extension which entails either $\alpha$ or its contrary, for each assumption $\alpha$ [1]. Hence, to show that a sentence is a credulous consequence under the stable model semantics, we simply need to find an admissible extension which entails the sentence and which can be extended into a stable model. We assume that the following procedures are defined:
– $\mathit{full\_cover}(\Gamma)$ returns true iff, for every assumption, the set of sentences $\Gamma$ entails either the assumption or its contrary, and false otherwise;
– $\mathit{uncovered}(\Gamma)$ nondeterministically returns, if any, an assumption which is undefined, given $\Gamma$, i.e. neither the assumption nor its contrary is entailed by $\Gamma$.

In the following procedure CSM, both full_cover and uncovered will be applied to sets of assumptions only.

Proof procedure 5.1 (Credulous Stable Models).

  CSM(α):
    ∆ := CPE(α);
    loop
      if full_cover(∆) then
        return ∆
      else
        β := uncovered(∆);
        ∆ := adm_expand(∆ ∪ {β});
      end if
    end loop


Note that CSM is a non-trivial extension of CPE: once an admissible extension is selected, as in CPE, CSM needs to further expand the selected admissible extension, if possible, to render it stable. This is achieved by the main loop in the procedure. Clearly, the above procedure may not terminate if the underlying framework $\langle T, Ab, \overline{\,\cdot\,} \rangle$ contains infinitely many assumptions, since in this case the main loop may go on forever. In the following theorem we assume that the set of assumptions $Ab$ is finite.

Theorem 2 (Soundness and Completeness of CSM). Let $\langle T, Ab, \overline{\,\cdot\,} \rangle$ be a framework such that $Ab$ is finite.
1. If $\mathit{CSM}(\alpha)$ succeeds then there exists a stable extension $\Delta$ such that $\alpha \in Th(T \cup \Delta)$.
2. If both support and adm_expand are complete then for each stable extension $E$ such that $\alpha \in Th(T \cup E)$ there exist appropriate selections such that $\mathit{CSM}(\alpha)$ returns $E$.

Proof. The theorem follows directly from theorem 3.

The CSM procedure is based on backward chaining, in contrast to the procedure of Niemelä et al. [19,20], which is based on forward chaining. We explain the difference between the two procedures in the following example.

Example 3.

  p ← not q
  q ← not r
  r ← not q

Assume that the given query is $p$. The CSM procedure would compute $\{not\ q\}$ as a support for $p$. The procedure $\mathit{adm\_expand}(\{not\ q\})$ will produce $\Delta = \{not\ q\}$ as its result. Since $\Delta$ covers all assumptions, $\Delta$ is the result produced by the procedure.

The procedure of Niemelä et al. would start by picking an arbitrary element from $\{not\ p, not\ q, not\ r\}$ and applying the Fitting operator to it to reach a fixpoint. For example, $not\ r$ may be selected. Then the set $B = \{q, not\ r\}$ is obtained. Since there is no conflict in $B$ and $B$ does not cover all the assumptions, $not\ p$ will be selected. Since $\{not\ p, q, not\ r\}$ covers all assumptions, a test to check whether $p$ is implied by it is performed, with false as the result. Therefore backtracking takes place and $not\ q$ is selected, leading to the expected result.

A drawback of the procedure of Niemelä et al. is that it may have to make many unnecessary choices, as the above example shows. However, forward chaining may help in getting closer to the solution more efficiently. The previous observations suggest a modification of the procedure which tries to combine both backward and forward chaining. This can be seen as an integration of our procedure and that of Niemelä et al. In the new procedure, CSM2, we make use of some additional procedures and notations:


– Given a set of sentences $\Gamma$, $\Gamma^-$ denotes the set of assumptions contained in $\Gamma$.
– A set of sentences $\Gamma$ is said to be coherent if $\Gamma^-$ is admissible and $\Gamma \subseteq Th(T \cup \Gamma^-)$.
– Given a set of sentences $\Gamma$, $\mathit{expand}(\Gamma)$ defines a forward expansion of $\Gamma$ satisfying the following conditions:
  1. $\Gamma \subseteq \mathit{expand}(\Gamma)$
  2. If $\Gamma$ is coherent then
    (a) $\mathit{expand}(\Gamma)$ is also coherent, and
    (b) for each stable extension $E$, if $\Gamma^- \subseteq E$ then $\mathit{expand}(\Gamma)^- \subseteq E$.

Proof procedure 5.2 (Credulous Stable Models).

  CSM2(α):
    ∆ := CPE(α);
    Γ := expand(∆);
    loop
      if full_cover(Γ) then
        return Γ⁻
      else
        β := uncovered(Γ);
        ∆ := adm_expand(Γ⁻ ∪ {β});
        Γ := expand(∆ ∪ Γ);
      end if
    end loop

As anticipated, the procedure expand can be defined in various ways. If expand is simply the identity function, i.e. $\mathit{expand}(\Delta) = \Delta$, the procedure CSM2 collapses down to CSM. In some other cases, expand could also effectively perform forward reasoning, and try to produce the deductive closure of the given set of sentences. This can be achieved by defining expand in such a way that $\mathit{expand}(\Delta) = Th(T \cup \Delta)$. In still other cases, $\mathit{expand}(\Delta)$ could extend $\Delta$ so that it is closed under Fitting’s operator.

As in the case of Theorem 2, we need to assume that the set of assumptions in the underlying framework is finite, in order to prevent non-termination of the main loop.

Theorem 3 (Soundness and Completeness of CSM2). Let $\langle T, Ab, \overline{\,\cdot\,} \rangle$ be a framework such that $Ab$ is finite.
1. If $\mathit{CSM2}(\alpha)$ succeeds then there exists a stable extension $\Delta$ such that $\alpha \in Th(T \cup \Delta)$.
2. If both CPE and adm_expand are complete then for each stable extension $E$ such that $\alpha \in Th(T \cup E)$ there exist appropriate selections such that $\mathit{CSM2}(\alpha)$ returns $E$.


Proof.
1. We first prove by induction that at the beginning of each iteration of the loop, $\Gamma$ is coherent. The base step is clear since $\Delta$ is admissible. Inductive step: let $\Gamma$ be coherent. From $\Delta := \mathit{adm\_expand}(\Gamma^- \cup \{\beta\})$, it follows that $\Delta$ is admissible. Because $\Gamma^- \subseteq \Delta$ and $\Gamma \subseteq Th(T \cup \Gamma^-)$, it follows that $\Gamma \subseteq Th(T \cup \Delta)$. From $(\Delta \cup \Gamma)^- = \Delta$, it follows that $\Delta \cup \Gamma$ is coherent. Therefore $\mathit{expand}(\Delta \cup \Gamma)$ is coherent. It is obvious that for any coherent set of sentences $\Gamma$ such that $\mathit{full\_cover}(\Gamma)$ holds, $\Gamma^-$ is stable.
2. Let $E$ be a stable model such that $\alpha \in Th(T \cup E)$. Because CPE is complete, there is a selection such that executing the command $\Delta := \mathit{CPE}(\alpha)$ yields an admissible $\Delta \subseteq E$. From the properties of expand, it follows that $\Gamma$ obtained from $\Gamma := \mathit{expand}(\Delta)$ is coherent and $\Gamma^- \subseteq E$. If $\mathit{full\_cover}(\Gamma)$ does not hold, then we can always select a $\beta \in E \setminus \Gamma^-$. Therefore, due to the completeness of adm_expand, we can get a $\Delta$ that is a subset of $E$. Hence $\Gamma$ obtained from $\Gamma := \mathit{expand}(\Delta \cup \Gamma)$ is coherent and $\Gamma^- \subseteq E$. Continuing this process until termination, which is guaranteed by the hypothesis that $Ab$ is finite, will return $E$ as the result of the procedure.

However, if in the underlying framework every preferred extension is also stable, then CSM can be greatly simplified by dropping the main loop, namely CSM coincides with CPE. As shown in [1], this is the case if the underlying framework is order-consistent (see Appendix).

Theorem 4 (Soundness and completeness of CPE wrt stable models and order consistency). Let the underlying framework be order-consistent.
1. If $\mathit{CPE}(\alpha)$ succeeds then there exists a stable extension $\Delta$ such that $\alpha \in Th(T \cup \Delta)$.
2. If both support and adm_expand are complete then for each stable extension $E$ there exist appropriate selections such that $\mathit{CPE}(\alpha)$ returns $\Delta \subseteq E$.
The use of CPE instead of CSM, whenever possible, greatly simplifies the task of performing credulous reasoning under the stable semantics, in that it allows the search for a stable extension to be kept “localised”, as illustrated by the following example.

Example 4. Consider the following order-consistent logic program

  p ← not s
  q ← not r
  r ← not q

which has two preferred (and stable) extensions containing $p$, corresponding to the sets of assumptions $\Delta_1 = \{not\ s, not\ r\}$ and $\Delta_2 = \{not\ s, not\ q\}$. The procedure $\mathit{CPE}(p)$ would compute the admissible extension $\{not\ s\}$ as a result, since $\{not\ s\}$ is a support for $p$ and it is admissible (there are no attacks against $not\ s$). On the other hand, the procedure $\mathit{CSM}(p)$ would produce either $\Delta_1$ or $\Delta_2$, which are both stable sets extending $\{not\ s\}$.
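The contrast drawn in Example 4 can be reproduced with a small sketch of procedure 5.1. The code below is illustrative only: support and adm_expand are brute-force enumerations returning minimal sets, the recursion in csm stands in for the main loop with its backtracking points, and the clause encoding and function names are assumptions of this sketch rather than part of the paper.

```python
from itertools import combinations

# Example 4. Clause = (head, positive body, atoms under "not"); "not a" is stored as a.
PROGRAM = [("p", [], ["s"]), ("q", [], ["r"]), ("r", [], ["q"])]
ATOMS = {"p", "q", "r", "s"}


def closure(program, delta):
    derived, changed = set(), True
    while changed:
        changed = False
        for head, pos, neg in program:
            if head not in derived and set(pos) <= derived and set(neg) <= delta:
                derived.add(head)
                changed = True
    return derived


def subsets(atoms):
    ordered = sorted(atoms)
    for size in range(len(ordered) + 1):
        for combo in combinations(ordered, size):
            yield frozenset(combo)


def admissible(program, atoms, delta):
    own = closure(program, delta)
    if own & delta:
        return False
    return all(own & other for other in subsets(atoms) if closure(program, other) & delta)


def minimal(sets):
    return [d for d in sets if not any(e < d for e in sets)]


def support(program, atoms, goal, delta=frozenset()):
    return minimal([d for d in subsets(atoms)
                    if delta <= d and goal in closure(program, d)])


def adm_expand(program, atoms, delta):
    return minimal([d for d in subsets(atoms)
                    if delta <= d and admissible(program, atoms, d)])


def cpe(program, atoms, goal):
    for s in support(program, atoms, goal):
        yield from adm_expand(program, atoms, s)


def csm(program, atoms, goal):
    """Procedure 5.1: grow a CPE result until every assumption is covered."""
    def grow(delta):
        uncovered = sorted(set(atoms) - delta - closure(program, delta))
        if not uncovered:                       # full_cover(delta) holds
            yield delta
            return
        for beta in uncovered:                  # backtracking point: choice of beta
            for bigger in adm_expand(program, atoms, delta | {beta}):
                yield from grow(bigger)

    seen = set()
    for start in cpe(program, atoms, goal):
        for result in grow(start):
            if result not in seen:              # the same stable set may be reached twice
                seen.add(result)
                yield result


print(list(cpe(PROGRAM, ATOMS, "p")))
print(list(csm(PROGRAM, ATOMS, "p")))
```

Here cpe stops at the set corresponding to {not s}, whereas csm has to commit to one side of the q/r loop and returns the sets corresponding to $\Delta_2$ and $\Delta_1$. Termination of grow relies on delta strictly growing inside a finite set of assumptions, in line with the finiteness hypothesis of Theorem 2.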

6 Computing Sceptical Consequences under Stable Extensions

First, we define the notion of “contrary of sentences”, by extending the notion of “contrary of assumptions”. In all concrete instances of the abstract framework, e.g. logic programming, default logic, autoepistemic logic and non-monotonic modal logic, for each non-assumption sentence $\beta$ there is a unique assumption $\alpha$ such that $\overline{\alpha} = \beta$, so the natural way of defining the “contrary of a sentence” $\beta$ which is not an assumption is $\overline{\beta} = \alpha$ such that $\overline{\alpha} = \beta$. But in general, it is possible that for some non-assumption sentence $\beta$ there may be no assumption $\alpha$ such that $\overline{\alpha} = \beta$, or there may be more than one assumption $\alpha$ such that $\overline{\alpha} = \beta$. Thus, for general frameworks, we define the concept of contrary of sentences which are not assumptions as follows. Let $\beta$ be a sentence such that $\beta \notin Ab$.
– If there exists $\alpha$ such that $\overline{\alpha} = \beta$ then $\overline{\beta} = \{\gamma \mid \overline{\gamma} = \beta\}$.
– If there exists no $\alpha$ such that $\overline{\alpha} = \beta$ then we introduce a new assumption $\kappa_\beta$, not already in the language, and we define
  • $\overline{\kappa_\beta} = \beta$
  • $\overline{\beta} = \{\kappa_\beta\}$

Note that, in this way, the contrary of a sentence $\beta \notin Ab$ is a set of assumptions. Let us denote by $Ab' \supseteq Ab$ the new set of assumptions. It is easy to see that the original framework $\langle T, Ab, \overline{\,\cdot\,} \rangle$ and the extended framework $\langle T, Ab', \overline{\,\cdot\,} \rangle$ are equivalent in the following sense:
– if $\Delta \subseteq Ab$ is admissible wrt the original framework then it is also admissible wrt the new framework;
– if $\Delta' \subseteq Ab'$ is admissible wrt the new framework then $\Delta' \cap Ab$ is admissible wrt the original framework.

Therefore, from now on, we will assume that for each sentence $\beta$ which is not an assumption there exists at least one assumption $\alpha$ such that $\overline{\alpha} = \beta$.

In order to show that a sentence $\beta$ is entailed by each stable model, we can proceed as follows:
– check that $\beta$ is a credulous consequence under the stable model semantics;
– check that the contrary of the sentence is not a credulous consequence under the stable model semantics.


Notice that if $\beta \notin Ab$ the second step amounts to checking that each $\alpha \in \overline{\beta}$ is not a credulous consequence under the stable model semantics. Moreover, notice that the first step of the computation cannot be omitted (as one could expect) since there may be cases in which neither $\beta$ nor its contrary hold in any stable model (e.g. in the framework corresponding to the logic program p ← not p).

Lemma 1. Let $E$ be a stable extension. Then for each non-assumption $\beta$ such that $\beta \notin Th(T \cup E)$, the following statements are equivalent:
1. $\overline{\beta} \cap E \neq \emptyset$
2. $\overline{\beta} \subseteq E$

Proof. It is clear that the second condition implies the first. We need only prove that the first condition implies the second one. Let $\overline{\beta} \cap E \neq \emptyset$. Suppose that $\overline{\beta} \setminus E \neq \emptyset$. Let $\alpha \in \overline{\beta} \setminus E$. Then it is clear that $\overline{\alpha} \in Th(T \cup E)$. This contradicts the condition that $\overline{\alpha} = \beta$ and $\beta \notin Th(T \cup E)$.

Proof procedure 6.1 (Sceptical Stable Models).

  SSM(α):
    if CSM(α) fails then fail;
    select β ∈ ᾱ;
    if CSM(β) succeeds then fail;

Here ᾱ denotes the contrary of α. Notice that the SSM procedure makes use of the CSM procedure. To prevent non-termination of CSM we need to assume that the set of assumptions $Ab$ of the underlying extended framework is finite. This guarantees the completeness of CSM (cf. Theorem 2).

Theorem 5 (Soundness and Completeness of SSM). Let CSM be complete.
1. If $\mathit{SSM}(\alpha)$ succeeds then $\alpha \in Th(T \cup \Delta)$, for every stable extension $\Delta$.
2. If $\alpha \in Th(T \cup \Delta)$, for every stable extension $\Delta$, and the set of stable extensions is not empty, then $\mathit{SSM}(\alpha)$ succeeds.

Proof.
1. Let $\mathit{SSM}(\alpha)$ succeed. Assume now that $\alpha$ is not a sceptical consequence wrt the stable semantics. There are two cases: $\alpha \in Ab$ and $\alpha \notin Ab$. Consider the first case, where $\alpha \in Ab$. It follows that there is a stable extension $E$ such that $\alpha \notin Th(T \cup E)$, and hence $\overline{\alpha} \in Th(T \cup E)$. Because of the completeness of CSM, it follows that $\mathit{CSM}(\overline{\alpha})$ succeeds. Hence $\mathit{SSM}(\alpha)$ fails, a contradiction. Let $\alpha \notin Ab$. From lemma 1, it follows that there is a stable extension $E$ such that $E \cap \overline{\alpha} \neq \emptyset$. That means $\mathit{CSM}(\beta)$ succeeds for some $\beta \in \overline{\alpha}$. Lemma 1 implies that $\mathit{CSM}(\beta)$ succeeds for each $\beta \in \overline{\alpha}$. Hence $\mathit{SSM}(\alpha)$ fails, a contradiction.
2. Because CSM is complete, it is clear that $\mathit{CSM}(\alpha)$ succeeds. Also, because of the soundness of CSM, $\mathit{CSM}(\beta)$ fails for each $\beta \in \overline{\alpha}$. Therefore it is obvious that SSM succeeds.

For a large class of argumentation frameworks, the preferred extension and stable model semantics coincide, e.g. if the frameworks are order-consistent [1]. In these frameworks, the procedure SSM can be simplified significantly as follows.

Proof procedure 6.2 (Sceptical Stable Models via CPE).

  SSMPE(α):
    if CPE(α) fails then fail;
    select β ∈ ᾱ;
    if CPE(β) succeeds then fail;

The procedure is structurally the same as the earlier SSM, but it relies upon CPE rather than CSM, and is therefore “simpler” in the same way that CPE is “simpler” than CSM, as discussed earlier in Sect. 5.

Theorem 6 (Soundness and completeness of SSMPE wrt sceptical stable semantics). Let the underlying framework be order-consistent and CPE be complete.
1. If $\mathit{SSMPE}(\alpha)$ succeeds then $\alpha \in Th(T \cup \Delta)$, for every stable extension $\Delta$.
2. If $\alpha \in Th(T \cup \Delta)$, for every stable extension $\Delta$, then $\mathit{SSMPE}(\alpha)$ succeeds.

Note that the second statement in the above theorem does not require the existence of stable extensions. This is because order-consistency guarantees that stable extensions exist.
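The two checks of procedure 6.1, and the reason why the first one cannot be dropped, can be illustrated with a short sketch. It is not the procedure of the paper: as a loudly stated simplification, CSM is replaced here by a brute-force enumeration of stable extensions (the hypothetical helper csm_succeeds), queries are atoms, and the contrary of an atom a is taken to be the single assumption not a, encoded as the tuple ("not", a). The second program is the one used later in Example 5.

```python
from itertools import combinations

# Clause = (head, positive body, atoms under "not"); "not a" is stored as the atom a.
LOOP = [("p", [], ["p"])]                                    # p ← not p
EX5 = [("p", [], ["p"]), ("p", ["q"], []), ("q", [], ["r"]), ("r", [], ["q"])]


def closure(program, delta):
    derived, changed = set(), True
    while changed:
        changed = False
        for head, pos, neg in program:
            if head not in derived and set(pos) <= derived and set(neg) <= delta:
                derived.add(head)
                changed = True
    return derived


def stable_extensions(program, atoms):
    """Sets of assumptions that attack exactly the assumptions they do not contain."""
    ordered = sorted(atoms)
    result = []
    for size in range(len(ordered) + 1):
        for combo in combinations(ordered, size):
            delta = frozenset(combo)
            if closure(program, delta) == set(atoms) - delta:
                result.append(delta)
    return result


def csm_succeeds(program, atoms, sentence):
    """Stand-in for CSM: sentence is an atom 'a' or an assumption ('not', 'a')."""
    for delta in stable_extensions(program, atoms):
        if isinstance(sentence, tuple):
            if sentence[1] in delta:            # the assumption belongs to the extension
                return True
        elif sentence in closure(program, delta):
            return True
    return False


def ssm(program, atoms, atom):
    """Procedure 6.1 for an atom query."""
    if not csm_succeeds(program, atoms, atom):  # first step: some stable extension entails it
        return False
    return not csm_succeeds(program, atoms, ("not", atom))   # second step


print(ssm(LOOP, {"p"}, "p"))
print(ssm(EX5, {"p", "q", "r"}, "p"))
```

For p ← not p there is no stable extension, so the second check alone would pass vacuously; it is the first check that makes ssm fail. For the second program the only stable extension corresponds to {not r}, and ssm succeeds for p.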

7 Computing Sceptical Consequences under Preferred Extensions

The naive way of showing that a sentence is a sceptical consequence under the preferred extensions semantics is to consider each preferred extension in turn and check that the sentence is entailed by it. The earlier procedure SSMPE can be used as a simplification of the naive method only if every preferred extension is guaranteed to be stable. In general, however, the procedure SSMPE is not sound under the preferred extensions semantics, since there might exist preferred extensions in which, for some assumption $\alpha$, neither $\alpha$ nor its contrary hold, as the following example shows.

Example 5.

  p ← not p
  p ← q
  q ← not r
  r ← not q

Notice that there are two preferred extensions, namely $E_1 = \{not\ q, r\}$ and $E_2 = \{not\ r, q, p\}$. $E_2$ is also a stable extension, whereas $E_1$ is not, since neither $p$ nor $not\ p$ hold in $E_1$. Notice that $\mathit{SSMPE}(p)$ would succeed, hence giving an unsound result.

Nonetheless, in the general case, the following theorem shows that it is possible to restrict the number of preferred extensions to consider. This theorem is a variant of theorem 16 in [30], as we will discuss in Sect. 8.

Theorem 7. Given an argumentation-theoretic framework $\langle T, Ab, \overline{\,\cdot\,} \rangle$ and a sentence $\alpha$ in its language, $\alpha$ is a sceptical non-monotonic consequence of $T$ with respect to the preferred extension semantics, i.e. $\alpha \in Th(T \cup \Delta)$ for all preferred $\Delta \subseteq Ab$, iff
1. $\alpha \in Th(T \cup \Delta_0)$, for some admissible set of assumptions $\Delta_0 \subseteq Ab$, and
2. for every set of assumptions $\Delta \subseteq Ab$, if $\Delta$ is admissible and $\Delta$ attacks $\Delta_0$, then $\alpha \in Th(T \cup \Delta')$ for some set of assumptions $\Delta' \subseteq Ab$ such that
  (a) $\Delta' \supseteq \Delta$, and
  (b) $\Delta'$ is admissible.

Proof. The only-if half is trivial. The if half is proved by contradiction. Suppose there exists a set of assumptions $\Delta^*$ such that $\Delta^*$ is preferred and $\alpha \notin Th(T \cup \Delta^*)$. Suppose $\Delta_0$ is the set of assumptions provided in part 1. If $\Delta_0 = \emptyset$ then $\alpha \in Th(T)$ and therefore $\alpha \in Th(T \cup \Delta^*)$, thus contradicting the hypothesis. Therefore, $\Delta_0 \neq \emptyset$. Consider the following two cases: (i) $\Delta^* \cup \Delta_0$ attacks itself, or (ii) $\Delta^* \cup \Delta_0$ does not attack itself. Case (ii) implies that $\Delta^* \cup \Delta_0$ is admissible, thus contradicting the hypothesis that $\Delta^*$ is preferred (and therefore maximally admissible). Case (i) implies that (i.1) $\Delta^* \cup \Delta_0$ attacks $\Delta^*$, or (i.2) $\Delta^* \cup \Delta_0$ attacks $\Delta_0$.

Assume that (i.1) holds.

  $\Delta^* \cup \Delta_0$ attacks $\Delta^*$
  ⇒ {by admissibility of $\Delta^*$}
  $\Delta^*$ attacks $\Delta^* \cup \Delta_0$
  ⇒ {by admissibility, $\Delta^*$ does not attack itself}
  $\Delta^*$ attacks $\Delta_0$
  ⇒ {by part 2}
  $\alpha \in Th(T \cup \Delta^*)$

thus contradicting the hypothesis.


Assume now that (i.2) holds.

  $\Delta^* \cup \Delta_0$ attacks $\Delta_0$
  ⇒ {by admissibility of $\Delta_0$}
  $\Delta_0$ attacks $\Delta^* \cup \Delta_0$
  ⇒ {by admissibility, $\Delta_0$ does not attack itself}
  $\Delta_0$ attacks $\Delta^*$
  ⇒ {by admissibility of $\Delta^*$}
  $\Delta^*$ attacks $\Delta_0$
  ⇒ {by part 2}
  $\alpha \in Th(T \cup \Delta^*)$

thus contradicting the hypothesis.

This result can be used to define the following procedure to check whether or not a given sentence is a sceptical consequence with respect to the preferred extension semantics. Let us assume the following procedure is defined:
– $\mathit{attacks}(\Delta)$ computes a base of the set of all attacks against the set of assumptions $\Delta$.

Proof procedure 7.1 (Sceptical Preferred Extensions).

  SPE(α):
    ∆ := CPE(α);
    for each A := attacks(∆)
      for each ∆′ := adm_expand(A)
        ∆″ := support(α, ∆′);
        if adm_expand(∆″) fails then fail end if
      end for
    end for

The following soundness theorem is a trivial corollary of theorem 7.

Theorem 8 (Soundness and Completeness of SPE). Let adm_expand be complete.
1. If $\mathit{SPE}(\alpha)$ succeeds, then $\alpha \in Th(T \cup \Delta)$, for every preferred extension $\Delta$.
2. If CPE is complete and $\alpha \in Th(T \cup \Delta)$, for every preferred extension $\Delta$, then $\mathit{SPE}(\alpha)$ succeeds.

In many cases where the framework has exactly one preferred extension that is also stable (for example when the framework is stratified), it is obvious that the CPE procedure could be used as a procedure for the sceptical preferred extension semantics.
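The characterisation of Theorem 7 can be compared, on small programs, with the naive method of inspecting every preferred extension. The following sketch does so by brute force for the programs of Examples 4 and 5. It checks the two conditions of the theorem literally over all admissible sets rather than following procedure 7.1 step by step, and it models the success of CPE by the existence of a suitable admissible set. The encoding and all function names are assumptions of this illustration.

```python
from itertools import combinations

# Clause = (head, positive body, atoms under "not"); "not a" is stored as the atom a.
EX4 = [("p", [], ["s"]), ("q", [], ["r"]), ("r", [], ["q"])]
EX5 = [("p", [], ["p"]), ("p", ["q"], []), ("q", [], ["r"]), ("r", [], ["q"])]


def closure(program, delta):
    derived, changed = set(), True
    while changed:
        changed = False
        for head, pos, neg in program:
            if head not in derived and set(pos) <= derived and set(neg) <= delta:
                derived.add(head)
                changed = True
    return derived


def subsets(atoms):
    ordered = sorted(atoms)
    for size in range(len(ordered) + 1):
        for combo in combinations(ordered, size):
            yield frozenset(combo)


def admissible_sets(program, atoms):
    def admissible(delta):
        own = closure(program, delta)
        if own & delta:
            return False
        return all(own & o for o in subsets(atoms) if closure(program, o) & delta)
    return [d for d in subsets(atoms) if admissible(d)]


def sceptical_by_definition(program, atoms, atom):
    """Naive method: the atom must hold in every preferred extension."""
    adm = admissible_sets(program, atoms)
    preferred = [d for d in adm if not any(d < e for e in adm)]
    return all(atom in closure(program, d) for d in preferred)


def sceptical_by_theorem7(program, atoms, atom):
    """Conditions 1 and 2 of Theorem 7, checked over all admissible sets."""
    adm = admissible_sets(program, atoms)

    def extendable(delta):      # some admissible superset of delta entails the atom
        return any(delta <= e and atom in closure(program, e) for e in adm)

    for delta0 in adm:
        if atom not in closure(program, delta0):
            continue
        attackers = [d for d in adm if closure(program, d) & delta0]
        if all(extendable(d) for d in attackers):
            return True
    return False


def ssmpe(program, atoms, atom):
    """SSMPE with CPE modelled as 'some admissible set entails (or contains) the query'."""
    adm = admissible_sets(program, atoms)
    cpe_atom = any(atom in closure(program, d) for d in adm)
    cpe_contrary = any(atom in d for d in adm)      # CPE(not atom)
    return cpe_atom and not cpe_contrary


print(sceptical_by_definition(EX5, {"p", "q", "r"}, "p"),
      sceptical_by_theorem7(EX5, {"p", "q", "r"}, "p"),
      ssmpe(EX5, {"p", "q", "r"}, "p"))
```

On Example 5 the naive method and the Theorem 7 check both reject p, while the SSMPE-style check accepts it, which is the unsoundness pointed out in this section. On the order-consistent program of Example 4 the two sceptical checks agree on every atom.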

8 Related Work

The proof procedures we propose in this paper rely upon proof procedures for computing credulous consequences under the semantics of admissible extensions. A number of such procedures have been proposed in the literature. Eshghi and Kowalski [9] (see also the revised version proposed by Dung in [5]) propose a proof procedure for logic programming based upon interleaving abductive derivations, for the generation of negative literals to “derive” goals, and consistency derivations, to check “consistency” of negative literals with atoms “derivable” from the program. The proof procedure can be understood in argumentation-theoretic terms [12], as interleaving the generation of assumptions supporting goals or counter-attacking assumptions (abductive derivations) and the generation of attacks against any admissible support (consistency derivations), while checking that the generated support does not attack itself. Dung, Kowalski and Toni [7] propose abstract proof procedures for computing credulous consequences under the semantics of admissible extensions, deﬁned via logic programs. Kakas and Toni [15] propose a number of proof procedures based on the construction of trees whose nodes are sets of assumptions, and such that nodes attack their parents, if any. The proof procedures are deﬁned in abstract terms and, similarly to the procedures we propose in this paper, can be adopted for any concrete framework that is an instance of the abstract one. The procedures allow to compute credulous consequences under the semantics of admissible extensions as well as under semantics that we have not considered in this paper, namely the semantics of weakly stable extensions, acceptable extensions, well-founded extensions. The concrete procedure for computing credulous consequences under the semantics of admissible extensions, in the case of logic programming, corresponds to the proof procedure of [9]. 
Dung, Kowalski and Toni [8] also propose abstract proof procedures for computing credulous consequences under the semantics of admissible extensions, that can be instantiated to any instance of the framework of [1]. These procedures are deﬁned in terms of trees whose nodes are assumptions, as well as via derivations as in [9]. Kakas and Dimopoulos [2] propose a proof procedure to compute credulous consequences under the semantics of admissible extensions for the argumentation framework of Logic Programming without Negation as Failure proposed in [14]. Here, negation as failure is replaced and extended by priorities over logic programs with no negation as failure but with explicit negation instead. Other proof procedures for computing credulous consequences under the stable extension semantics and sceptical consequences under the semantics of preferred and stable extensions have been proposed. Thielscher [30] proposes a proof procedure for computing sceptical consequences under the semantics of preferred extensions for the special case of logic programming [31]. This proof procedure is based upon a version of theorem 7 (theorem 16 in [30]). However, whereas [30] uses the notion of “conﬂict-free set


Phan Minh Dung, Paolo Mancarella, and Francesca Toni

of arguments” (which is an atomic, abstract notion), we use the notion of admissible set of assumptions. Moreover, theorem 16 in [30] replaces the condition in part 2 of theorem 7 “∆ attacks ∆0” by the (equivalent) condition corresponding to “∆ ∪ ∆0 attacks itself”. For a formal correspondence between the two approaches see [31]. Niemelä [19] and Niemelä and Simons [20] give proof procedures for computing credulous and sceptical consequences under stable extensions, for default logic and logic programming, respectively. As discussed in Sect. 5, their proof procedures for computing credulous consequences under stable extensions rely upon forward chaining, whereas the proof procedures we propose for the same task rely either on backward chaining (CSM) or on a combination of backward and forward chaining (CSM2). Satoh and Iwayama [28] define a proof procedure for logic programming, computing credulous consequences under the stable extension semantics for range-restricted logic programs that admit at least one stable extension. Satoh [27] adapts the proof procedure in [28] to default logic. The proof procedure applies to consistent and propositional default theories. Inoue et al. [11] apply the model generation theorem prover to logic programming to generate stable extensions, thus making it possible to perform credulous reasoning under the stable extension semantics by forward chaining.

9 Conclusions

We have presented abstract proof procedures for computing credulous and sceptical consequences under the semantics of preferred and stable extensions for non-monotonic reasoning, as proposed in [1], relying upon any proof procedure for computing credulous consequences under the semantics of admissible extensions. The proposed proof procedures are abstract in that they can be instantiated to any concrete framework for non-monotonic reasoning which is an instance of the abstract ﬂat framework of [1]. These include logic programming and default logic. They are abstract also in that they abstract away from implementation details. We have compared our proof procedures with existing, state of the art procedures deﬁned for logic programming and default logic. We have argued that the proof procedures for computing consequences under the semantics of preferred extensions are simpler than those for computing consequences under the semantics of stable extensions, and supported our arguments with examples. However, note that the (worst-case) computational complexity of the problem of computing consequences under the semantics of stable extensions is in general no worse than that of computing consequences under the semantics of preferred extensions, and in some cases it is considerably simpler [3,4]. In particular, in the case of autoepistemic logic, the problem of computing sceptical consequences under the semantics of preferred extensions is located at


the fourth level of the polynomial hierarchy, whereas the same problem under the semantics of stable extensions is located at the second level. Of course, these results do not contradict the expectation that in practice, in many cases, computing consequences under the semantics of preferred extensions is easier than under the semantics of stable extensions. Indeed, preferred extensions supporting a desired sentence can be constructed “locally”, by restricting attention to the sentences in the language that are directly relevant to the sentence. Instead, stable extensions need to be constructed “globally”, by considering all sentences in the language, whether they are directly relevant to the given sentence or not. This is due to the fact that stable extensions are not guaranteed to exist. However, note that in all cases where stable extensions are guaranteed to exist and coincide with preferred extensions, e.g. for stratiﬁed and order-consistent frameworks [1], any proof procedure for reasoning under the latter is a correct (and simpler) computational mechanism for reasoning under the former. Finally, the “locality” feature in the computation of consequences under the preferred extension semantics renders it a feasible alternative to the computation of consequences under the stable extension semantics in the non-propositional case, when the language is inﬁnite. Indeed, both CPE and SPE do not require that the given framework be propositional.

Acknowledgements This research has been partially supported by the EC KIT project “Computational Logic for Flexible Solutions to Applications”. The third author has been supported by the UK EPSRC project “Logic-based multi-agent systems”.

References

1. A. Bondarenko, P. M. Dung, R. A. Kowalski, F. Toni, An abstract, argumentation-theoretic framework for default reasoning. Artificial Intelligence, 93:63-101, 1997.
2. Y. Dimopoulos, A. C. Kakas, Logic Programming without Negation as Failure, Proceedings of the 1995 International Symposium on Logic Programming, pp. 369-383, 1995.
3. Y. Dimopoulos, B. Nebel, F. Toni, Preferred Arguments are Harder to Compute than Stable Extensions, Proc. of the Sixteenth International Joint Conference on Artificial Intelligence, IJCAI 99, (T. Dean ed.), pp. 36-43, 1999.
4. Y. Dimopoulos, B. Nebel, F. Toni, Finding Admissible and Preferred Arguments Can Be Very Hard, Proceedings of the Seventh International Conference on Principles of Knowledge Representation and Reasoning, KR 2000, (A. G. Cohn, F. Giunchiglia, B. Selman eds.), pp. 53-61, Morgan Kaufmann Publishers, 2000.
5. P. M. Dung, Negation as hypothesis: an abductive foundation for logic programming. Proceedings of the 8th International Conference on Logic Programming, Paris, France (K. Furukawa, ed.), MIT Press, pp. 3–17, 1991.
6. P. M. Dung, On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. Artificial Intelligence, 77:321-357, Elsevier, 1993.


7. P. M. Dung, R. A. Kowalski, F. Toni, Synthesis of proof procedures for default reasoning, Proc. LOPSTR’96, International Workshop on Logic Program Synthesis and Transformation, (J. Gallagher ed.), pp. 313–324, LNCS 1207, Springer Verlag, 1996.
8. P. M. Dung, R. A. Kowalski, F. Toni, Proof procedures for default reasoning. In preparation, 2002.
9. K. Eshghi, R. A. Kowalski, Abduction compared with negation as failure. Proceedings of the 6th International Conference on Logic Programming, Lisbon, Portugal (G. Levi and M. Martelli, eds), MIT Press, pp. 234–254, 1989.
10. M. Gelfond, V. Lifschitz, The stable model semantics for logic programming. Proceedings of the 5th International Conference on Logic Programming, Washington, Seattle (K. Bowen and R. A. Kowalski, eds), MIT Press, pp. 1070–1080, 1988.
11. K. Inoue, M. Koshimura, R. Hasegawa, Embedding negation as failure into a model generation theorem-prover. Proc. CADE’92, pp. 400-415, LNCS 607, Springer, 1992.
12. A. C. Kakas, R. A. Kowalski, F. Toni, The role of abduction in logic programming. Handbook of Logic in Artificial Intelligence and Logic Programming (D.M. Gabbay, C.J. Hogger and J.A. Robinson eds.), 5:235-324, Oxford University Press, 1998.
13. A. C. Kakas, P. Mancarella, Preferred extensions are partial stable models. Journal of Logic Programming 14(3,4), pp. 341–348, 1993.
14. A. C. Kakas, P. Mancarella, P. M. Dung, The Acceptability Semantics for Logic Programs, Proceedings of the Eleventh International Conference on Logic Programming, pp. 504-519, 1994.
15. A. C. Kakas, F. Toni, Computing Argumentation in Logic Programming. Journal of Logic and Computation 9:515-562, Oxford University Press, 1999.
16. J. McCarthy, Circumscription – a form of non-monotonic reasoning. Artificial Intelligence, 13:27–39, 1980.
17. D. McDermott, Nonmonotonic logic II: non-monotonic modal theories. Journal of ACM 29(1), pp. 33–57, 1982.
18. R. Moore, Semantical considerations on non-monotonic logic. Artificial Intelligence 25:75–94, 1985.
19. I. Niemelä, Towards efficient default reasoning. Proc. IJCAI’95, pp. 312–318, Morgan Kaufmann, 1995.
20. I. Niemelä, P. Simons, Efficient implementation of the well-founded and stable model semantics. Proc. JICSLP’96, pp. 289–303, MIT Press, 1996.
21. J. L. Pollock, Defeasible reasoning. Cognitive Science, 11(4):481–518, 1987.
22. D. Poole, A logical framework for default reasoning. Artificial Intelligence 36:27–47, 1988.
23. H. Prakken and G. Sartor, A system for defeasible argumentation, with defeasible priorities. Artificial Intelligence Today, (M. Wooldridge and M. M. Veloso, eds.), LNCS 1600, pp. 365–379, Springer, 1999.
24. H. Prakken and G. Vreeswijk, Logical systems for defeasible argumentation. Handbook of Philosophical Logic, 2nd edition, (D. Gabbay and F. Guenthner eds.), Vol. 4, Kluwer Academic Publishers, 2001.
25. R. Reiter, A logic for default reasoning. Artificial Intelligence 13:81–132, Elsevier, 1980.
26. D. Saccà, C. Zaniolo, Stable model semantics and non-determinism for logic programs with negation. Proceedings of the 9th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, ACM Press, pp. 205–217, 1990.


27. K. Satoh, A top-down proof procedure for default logic by using abduction. Proceedings of the Eleventh European Conference on Artificial Intelligence, pp. 65-69, John Wiley and Sons, 1994.
28. K. Satoh and N. Iwayama, A Query Evaluation Method for Abductive Logic Programming. Proceedings of the Joint International Conference and Symposium on Logic Programming, pp. 671–685, 1992.
29. G.R. Simari and R.P. Loui, A mathematical treatment of defeasible reasoning and its implementation. Artificial Intelligence, 52:125–257, 1992.
30. M. Thielscher, A nonmonotonic disputation-based semantics and proof procedure for logic programs. Proceedings of the 1996 Joint International Conference and Symposium on Logic Programming (M. Maher ed.), pp. 483–497, 1996.
31. F. Toni, Argumentation-theoretic proof procedures for logic programming. Technical Report, Department of Computing, Imperial College, 1997.
32. G. Vreeswijk, The feasibility of defeat in defeasible reasoning. Proceedings of the 2nd Int. Conf. on Principles of Knowledge Representation and Reasoning (KR’91), (J.F. Allen, R. Fikes, E. Sandewall, eds.), pp. 526–534, 1991.

A Stratified and Order Consistent Frameworks

We recall the definitions of stratified and order-consistent flat argumentation-theoretic frameworks, and their semantic properties, as given in [1]. Both classes are characterized in terms of their attack relationship graphs.

The attack relationship graph of a flat assumption-based framework ⟨T, Ab, ¯⟩ is a directed graph whose nodes are the assumptions in Ab and such that there exists an edge from an assumption δ to an assumption α if and only if δ belongs to a minimal (with respect to set inclusion) attack ∆ against α.

A flat assumption-based framework is stratified if and only if its attack relationship graph is well-founded, i.e. it contains no infinite path of the form α1, ..., αn, ..., where for every i ≥ 1 there is an edge from αi+1 to αi.

The notion of order-consistency requires some auxiliary definitions. Given a flat assumption-based framework ⟨T, Ab, ¯⟩, let δ, α ∈ Ab.
– δ is friendly (resp. hostile) to α if and only if the attack relationship graph for ⟨T, Ab, ¯⟩ contains a path from δ to α with an even (resp. odd) number of edges.
– δ is two-sided to α, written δ ≺ α, if δ is both friendly and hostile to α.

A flat assumption-based framework ⟨T, Ab, ¯⟩ is order-consistent if the relation ≺ is well-founded, i.e. there exists no infinite sequence of the form α1, ..., αn, ..., where for every i ≥ 1, αi+1 ≺ αi.

The following proposition summarizes some of the semantic results of [1] as far as stratified and order-consistent frameworks are concerned.

Proposition 1 (see [1]).
– For any stratified assumption-based framework there exists a unique stable set of assumptions, which coincides with the well-founded set of assumptions.
– For any order-consistent assumption-based framework, stable sets of assumptions are preferred sets of assumptions and vice versa.

It is worth recalling that the abstract notions of stratification and order-consistency generalize the notions of stratification and order-consistency for logic programming.
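For a finite attack relationship graph, the two definitions above reduce to cycle checks, which the following Python sketch illustrates. It is only an illustration under stated assumptions: the graph is taken as already computed and given as a dictionary of edges (computing minimal attacks from a concrete theory is not shown), "path" is read as a walk with at least one edge that may revisit nodes, and all function names are hypothetical rather than taken from [1].

```python
from collections import deque


def has_cycle(edges):
    """Return True if the finite directed graph `edges` contains a cycle."""
    nodes = set(edges) | {v for targets in edges.values() for v in targets}
    state = {}  # node -> 1 (on the current DFS path) or 2 (fully explored)

    def visit(node):
        state[node] = 1
        for nxt in edges.get(node, ()):
            if state.get(nxt) == 1:
                return True
            if nxt not in state and visit(nxt):
                return True
        state[node] = 2
        return False

    return any(node not in state and visit(node) for node in nodes)


def parity_reach(edges, source):
    """Nodes reachable from `source` by a walk with an even / odd number (>= 1) of edges."""
    even, odd = set(), set()
    seen = {(source, 0)}
    queue = deque([(source, 0)])
    while queue:
        node, parity = queue.popleft()
        for nxt in edges.get(node, ()):
            new_parity = 1 - parity
            (even if new_parity == 0 else odd).add(nxt)
            if (nxt, new_parity) not in seen:
                seen.add((nxt, new_parity))
                queue.append((nxt, new_parity))
    return even, odd


def is_stratified(edges):
    # Finite case (assumption): an infinite path exists exactly when there is a cycle.
    return not has_cycle(edges)


def is_order_consistent(edges):
    # Build the "two-sided" relation: delta -> alpha whenever delta is both
    # friendly and hostile to alpha, then require that it has no cycle.
    nodes = set(edges) | {v for targets in edges.values() for v in targets}
    two_sided = {}
    for delta in nodes:
        even, odd = parity_reach(edges, delta)
        two_sided[delta] = even & odd
    return not has_cycle(two_sided)


# Hypothetical graphs: keys attack the listed nodes.
chain = {"a": ["b"]}                 # a attacks b
mutual = {"a": ["b"], "b": ["a"]}    # even cycle
self_attack = {"a": ["a"]}           # odd cycle
print(is_stratified(chain), is_stratified(mutual))
print(is_order_consistent(mutual), is_order_consistent(self_attack))
```

In this sketch the mutual-attack graph is not stratified but is order-consistent, because every closed walk in it has even length, whereas the self-attacking assumption is two-sided to itself and so breaks order-consistency.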

Automated Abduction

Katsumi Inoue

Department of Electrical and Electronics Engineering, Kobe University
Rokkodai, Nada, Kobe 657-8501, Japan
[email protected]

Abstract. In this article, I review Peirce’s abduction in the context of Artiﬁcial Intelligence. First, I connect abduction from ﬁrst-order theories with nonmonotonic reasoning. In particular, I consider relationships between abduction, default logic, and circumscription. Then, based on a ﬁrst-order characterization of abduction, I show a design of abductive procedures that utilize automated deduction. With abductive procedures, proof procedures for nonmonotonic reasoning are also obtained from the relationship between abduction and nonmonotonic reasoning.

1 Introduction

Kowalski had a decisive impact on the research of abductive reasoning in AI. In 1979, Kowalski showed the role of abduction in information systems in his seminal book “Logic for Problem Solving” [58]. In the book, Kowalski also pointed out some similarity between abductive hypotheses and defaults in nonmonotonic reasoning. This article is devoted to investigating such a relation in detail and to giving a mechanism for automated abduction from first-order theories.

In this article, Peirce’s logic of abduction is first reviewed in Section 2, and is then related to a formalization of explanation within first-order logic. To know what formulas hold in the theory augmented by hypotheses, the notion of prediction is also introduced. There are two approaches to nonmonotonic prediction: credulous (or brave) and skeptical approaches, depending on how conflicting hypotheses are treated. In Section 3, it is shown that abduction is related to the brave approach, in particular to the simplest subclass of default logic [87] for which efficient theorem proving techniques may exist. On the other hand, circumscription [70] is a notable example of the skeptical approach. Interestingly, the skeptical approach is shown to be realized using the brave approach.

In Section 4, computational properties of abduction are discussed in the context of first-order logic. To make abduction and nonmonotonic reasoning computable, the consequence-finding problem in first-order logic is reviewed, which is an important and challenging problem in automated deduction [61,35,68]. The problem of consequence-finding is then modified so that only interesting clauses with a certain property (called characteristic clauses) are found. Then, abduction is formalized in terms of characteristic clauses. Two consequence-finding procedures are then introduced: one is SOL resolution [35], and the other is ATMS [14]. Compared with other resolution methods, SOL resolution generally generates fewer clauses to find characteristic clauses. Finally, this article is concluded in Section 5, where Peirce’s abduction is revisited together with future work.

It should be noted that this article does not cover all aspects of abductive reasoning in AI. General considerations on abduction in science and AI are found in some recent books [50,26,67] and survey papers [56,78]. Applications of abduction in AI are also excluded from this article. This article mostly focuses on first-order abduction, i.e., automated abduction from first-order theories, and its relationship with nonmonotonic reasoning with first-order theories. Often, however, abduction is used in the framework of logic programming, which is referred to as abductive logic programming [53,54,20]. This article omits details of abductive logic programming, but see [51] in this volume. Part of this article is excerpted from the author’s thesis [36] and a summary paper by the author [37].

A.C. Kakas, F. Sadri (Eds.): Computat. Logic (Kowalski Festschrift), LNAI 2408, pp. 311–341, 2002. © Springer-Verlag Berlin Heidelberg 2002

2 Logic of Abduction

Abduction is one of the three fundamental modes of reasoning characterized by Peirce [79], the others being deduction and induction. To see the differences between these three reasoning modes, let us look at the “beans” example used by Peirce [79, paragraph 623] in a syllogistic form. Abduction amounts to concluding the minor premise (Case) from the major premise (Rule) and the conclusion (Result):

(Rule) All the beans from this bag are white.
(Result) These beans are white.
(Case) These beans are from this bag.

By contrast, deduction amounts to concluding Result from Rule and Case, and induction amounts to concluding Rule from Case and Result. Later, Peirce wrote an inferential form of abduction as follows.

The (surprising) fact, C, is observed;
But if A were true, C would be a matter of course;
Hence, there is reason to suspect that A is true.

This corresponds to the following rule, called the fallacy of affirming the consequent:

$$\frac{C \qquad A \supset C}{A} \tag{1}$$

Sometimes A is called an explanans for an explanandum C. Both abduction and induction are non-deductive inference and generate hypotheses. However, hypothesis generation by abduction is distinguished from that by induction, in


the sense that while induction infers something to be true through generalization of a number of cases of which the same thing is true, abduction can infer something quite different from what is observed.¹ Therefore, according to Peirce [79, paragraph 777], abduction is “the only kind of reasoning which supplies new ideas, the only kind which is, in this sense, synthetic”. Since abduction can be regarded as a method to explain observations, Peirce considered it the basic method for scientific discovery. In the above sense, abduction is “ampliative” reasoning and may play a key role in the process of advanced inference. For example, analogical reasoning can be formalized by abduction plus deduction [79, paragraph 513].

Abduction is, however, only “probable” inference as it is non-deductive. That is, as Peirce argues, abduction is “a weak kind of inference, because we cannot say that we believe in the truth of the explanation, but only that it may be true”. This characteristic of abduction is welcome, since our commonsense reasoning also has a probable nature. In everyday life, we regularly form hypotheses, to explain how other people behave or to understand a situation, by filling in the gaps between what we know and what we observe. Thus, abduction is a very important form of reasoning in everyday life as well as in science and engineering.

Another important issue involved in abduction is the problem of hypothesis selection: what is the best explanation, and how can we select it from a number of possible explanations which satisfy the rule (1)? Peirce considered this problem philosophically, and suggested various preference criteria that are both qualitative and economical. One example of such criteria is the traditional maxim of Occam’s razor, which adopts the simplest hypotheses. In the following subsections, we give a logic of abduction studied in AI from two points of view, i.e., explanation and prediction.

2.1 Explanation

¹ The relation, difference, similarity, and interaction between abduction and induction are now extensively studied by many authors in [26].

Firstly, we connect Peirce’s logic of abduction with formalizations of abduction developed in AI within first-order logic. The most popular formalization of abduction in AI defines an explanation as a set of hypotheses which, together with the background theory, logically entails the given observations. This deductive-nomological view of explanation [33] has enabled us to have logical specifications of abduction and their proof procedures based on the resolution principle [89]. There are a number of proposals for resolution-based abductive systems [85,10,25,84,88,91,96,34,83,18,35,53,97,13,19,16].

According to the deductive-nomological view of explanation, we here connect Peirce’s logic of abduction (1) with research on abduction in AI. To this end, we make the following assumptions.

1. Knowledge about a domain of discourse, or background knowledge, can be represented in a set of first-order formulas as the proper axioms. In the following, we denote such an axiom set by Σ, and call it a set of facts.


2. An observation is also expressed as a first-order formula. Given an observation C, each explanation A of C satisfying the rule (1) can be constructed from a sub-vocabulary H of the representation language that contains Σ. We call each formula constructed from such a subset of the language a hypothesis. In general, a hypothesis constructed from H is a formula whose truth value is indefinite but may be assumed to be true. Sometimes H is the representation language itself.
3. The major premise A ⊃ C in the rule (1) can be obtained deductively from Σ, either as an axiom contained in Σ or as a logical consequence of Σ:

$$\Sigma \models A \supset C . \tag{2}$$

4. Σ contains all the information required to judge the acceptability of each hypothesis A as an explanation of C. That is, each formula A satisfying (2) can be tested for its appropriateness without using information not contained in Σ. One of these domain-independent, necessary conditions is that A should not be contradictory to Σ, or that Σ ∪ {A} is consistent.
5. We adopt Occam’s razor as a domain-independent criterion for hypothesis selection. Namely, the simplest explanation is preferred over any other.

These assumptions are useful particularly for domain-independent automated abduction. The first and second conditions above define a logical framework of abduction: the facts and the hypotheses are both first-order formulas. The third and fourth conditions give a logical specification of the link between observations and explanations: theories augmented with explanations should both entail observations and be consistent. Although these conditions are the most common in abductive theories proposed in AI, their correctness from the philosophical viewpoint is still being argued. The fifth condition, simplicity, is also one of the most widely accepted criteria for selecting explanations: a simpler explanation is preferred if the competing explanations are equal in every other respect. Note that these conditions are only for the definition of explanations. Criteria for good, better, or best explanations are usually given using meta-information and domain-dependent heuristics. A number of factors should be considered in selecting the most reasonable explanation. Since there has been no concrete consensus among AI researchers or philosophers about the preference criteria, we will not discuss them further in this article.

An example of the above abductive theory can be seen in the Theorist system by Poole, Goebel and Aleliunas [84], which consists of a first-order theorem prover that distinguishes facts from hypotheses.
Definition 2.1 (Theorist) Let Σ be a set of facts, and Γ a set of hypotheses. We call a pair (Σ, Γ) an abductive theory. Given a closed formula G, a set E of ground instances of elements of Γ is an explanation of G from (Σ, Γ)² if
1. Σ ∪ E |= G, and
2. Σ ∪ E is consistent.

² Some Theorist literature [81] gives a slightly different definition, where a set Σ ∪ E (called a scenario) satisfying the two conditions is called an explanation of G.


An explanation E of G is minimal if no proper subset E′ of E is an explanation of G.

The first condition in the above definition reflects the fact that Theorist has been devised for automated scientific theory formation, which is useful for prototyping AI problem solving systems by providing a simple “hypothesize-test” framework, i.e., hypothetical reasoning. When an explanation is a finite set of hypotheses, E = {H1, ..., Hn}, the first condition is equivalent to Σ |= H1 ∧ ... ∧ Hn ⊃ G by the deduction theorem, and thus can be written in the form of (2). The minimality criterion is a syntactical form of Occam’s razor. Since for an explanation E of G, any E′ ⊆ E is consistent with Σ, the condition can be written as: an explanation E of G is minimal if no E′ ⊂ E satisfies Σ ∪ E′ |= G. Note that in Theorist, explanations are defined as sets of ground instances. A more general definition of (minimal) explanations is given in [35], in which variables can be contained in explanations.

Example 2.2 Suppose that (Σ1, Γ1) is an abductive theory, where

Σ1 = { ∀x( Bird(x) ∧ ¬Ab(x) ⊃ Flies(x) ), ∀x( Penguin(x) ⊃ Ab(x) ), Bird(Tweety) },
Γ1 = { ¬Ab(x) }.

Here, the hypothesis ¬Ab(x) means that for any ground term t, ¬Ab(t) can be hypothesized. In other words, a hypothesis containing variables is shorthand for the set of its ground instances with respect to the elements from the universe of the language. Intuitively, ¬Ab(x) means that anything can be assumed to be not abnormal (i.e., normal). In this case, a minimal explanation of Flies(Tweety) is { ¬Ab(Tweety) }.

In Theorist, a set Γ of hypotheses can be any set of first-order formulas. Poole [81] shows a naming method which transforms each hypothesis in Γ into an atomic formula. The naming method converts an abductive theory (Σ, Γ) into a new abductive theory (Σ′, Γ′) in the following way. For every hypothesis F(x) in Γ, where x = x1, ..., xn is the tuple of the free variables appearing in F, we associate a predicate symbol δ_F not appearing anywhere in (Σ, Γ), and define the following sets of formulas:

Γ′ = { δ_F(x) | F(x) ∈ Γ },
Σ′ = Σ ∪ { ∀x( δ_F(x) ⊃ F(x) ) | F(x) ∈ Γ }.

Then, there is a 1-1 correspondence between the explanations of G from (Σ, Γ) and the explanations of G from (Σ′, Γ′) [81, Theorem 5.1].
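Definition 2.1 and the minimality condition can be tried out on the ground instance of Example 2.2 with a small brute-force sketch in Python. This is not how Theorist or any resolution-based system works: it assumes a propositional (ground) theory in clause form, checks entailment and consistency by enumerating truth assignments, and uses hypothetical names (`satisfiable`, `minimal_explanations`) and atom names such as `Bird` standing for Bird(Tweety).

```python
from itertools import combinations, product


def neg(lit):
    """Complement of a literal; a leading '-' marks negation."""
    return lit[1:] if lit.startswith("-") else "-" + lit


def satisfiable(clauses):
    """Brute-force satisfiability of a set of clauses (frozensets of literals)."""
    atoms = sorted({lit.lstrip("-") for clause in clauses for lit in clause})
    for values in product([False, True], repeat=len(atoms)):
        model = dict(zip(atoms, values))
        if all(any(model[lit.lstrip("-")] != lit.startswith("-") for lit in clause)
               for clause in clauses):
            return True
    return False


def minimal_explanations(sigma, hypotheses, goal):
    """Minimal sets E of hypotheses with sigma + E consistent and sigma + E |= goal."""
    found = []
    for size in range(len(hypotheses) + 1):
        for candidate in map(frozenset, combinations(hypotheses, size)):
            if any(smaller <= candidate for smaller in found):
                continue  # a subset already explains the goal, so not minimal
            theory = set(sigma) | {frozenset([h]) for h in candidate}
            entails = not satisfiable(theory | {frozenset([neg(goal)])})
            if satisfiable(theory) and entails:
                found.append(candidate)
    return found


# Ground version of Example 2.2, with atoms standing for facts about Tweety.
sigma1 = {
    frozenset({"-Bird", "Ab", "Flies"}),  # Bird and not Ab implies Flies
    frozenset({"-Penguin", "Ab"}),        # Penguin implies Ab
    frozenset({"Bird"}),
}
gamma1 = ["-Ab"]
print(minimal_explanations(sigma1, gamma1, "Flies"))
print(minimal_explanations(sigma1 | {frozenset({"Penguin"})}, gamma1, "Flies"))
```

The first call returns the single explanation consisting of the hypothesis `-Ab`, matching { ¬Ab(Tweety) } in the example. The second call adds the hypothetical fact that Tweety is a penguin, after which the hypothesis is inconsistent with the facts and no explanation remains.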


Example 2.2 (continued) The hypothesis ¬Ab(x) can be named Normal(x):

Σ1′ = Σ1 ∪ { ∀x( Normal(x) ⊃ ¬Ab(x) ) },
Γ1′ = { Normal(x) }.

In this case, a minimal explanation of Flies(Tweety) is { Normal(Tweety) }, which corresponds to the explanation { ¬Ab(Tweety) } from the original (Σ1, Γ1).

Naming hypotheses is a technique commonly used in most abductive systems because hypotheses in the form of atomic formulas can be processed very easily in their implementation. Restriction of hypotheses to atoms is thus used in many abductive systems such as [25,96,52,9]. Note that when we use a resolution procedure for non-Horn clauses, we can allow for negative as well as positive literals as names of hypotheses, since both positive and negative literals can be resolved upon in the procedure. For Example 2.2, we do not have to rename the negative literal ¬Ab(x) to the positive literal Normal(x). This kind of negative abnormal literal was originally used by McCarthy [71], and is convenient for computing circumscription through abduction. Abductive systems that allow literal hypotheses include [85,10,35].

It should be noted that there are many other formalizations of abduction. For example, abduction is defined by the set covering model [6], is discussed at the knowledge level [63], and is formalized in various ways [100,5,12,65,80,1]. Levesque’s [63] formulation suggests that abduction does not have to be formalized within first-order logic. There are some proposals for abductive theories based on other logical languages. In such cases, the background knowledge is often written in a nonmonotonic logic. For example, abductive logic programming (ALP) is an extension of logic programming, which is capable of abductive reasoning as well as nonmonotonic reasoning [52,53,38,44,13,28,54,19,20,51]. Abduction is also defined within a modal logic in [94], autoepistemic logic in [43], or default logic in [22].
Inoue and Sakama [43] point out that, in abduction from nonmonotonic theories, abductive explanations can be obtained not only by addition of new hypotheses, but also by removal of old hypotheses that become inappropriate.

2.2 Prediction

Theory formation frameworks like Theorist can be used for prediction as well as abduction. In [82], a distinction between explanation and prediction is discussed as follows. Let (Σ, Γ ) be an abductive theory, G a formula, and E an explanation of G from (Σ, Γ ) as deﬁned by Deﬁnition 2.1. 1. In abduction, G is an observation which is known to be true. We may assume E is true because G is true. 2. In prediction, G is a formula or a query whose truth value is unknown but is expected to be true. We may assume E is true to make G hold under E.


Both of the above ways of theory formation perform hypothetical reasoning, but in different ways. In abduction, hypotheses used to explain observations are called conjectures, whereas, in prediction, hypotheses are called defaults [81,82]. In Example 2.2, if we have observed that Tweety was flying and we want to know why this observation could have occurred, then obtaining the explanation E1 = { ¬Ab(Tweety) } is abduction; but if all we know is the facts Σ1 and we want to know whether Tweety can fly or not, then finding E1 is prediction, where we can expect by default reasoning that Tweety may fly. These two processes may occur successively: when an observation is made, we abduce possible hypotheses; from these hypotheses, we predict what else we can expect to be true. In such a case, hypotheses can be used as both conjectures and defaults. See also [91,50] for other discussions on the difference between explanation and prediction.

A hypothesis regarded as a default may be used unless there is evidence to the contrary. Therefore, as many defaults as possible may be applied, as long as the augmented theory remains consistent. This leads to the notion of extensions [81].

Definition 2.3 Given the facts Σ and the hypotheses (defaults) Γ, an extension of the abductive theory (Σ, Γ) is the set of logical consequences of Σ ∪ M where M is a maximal (with respect to set inclusion) set of ground instances of elements of Γ such that Σ ∪ M is consistent.

Using the notion of extensions, various alternative definitions of what should be predicted can be given [82]. They are related to the multiple extension problem: if G1 holds in an extension X1 and G2 holds in another extension X2, but there is no extension in which both G1 and G2 hold (i.e., X1 ∪ X2 is inconsistent), then what should we predict? Nothing? Both? Or just one of G1 and G2? The next two are the most well-known prediction policies:

1. Predict what holds in an extension of (Σ, Γ);
2. Predict what holds in all extensions of (Σ, Γ).

The first approach to default reasoning leads to multiple extensions and is called a credulous approach. On the other hand, the latter approach is called a skeptical approach. Credulous and skeptical reasoning are also called brave and cautious reasoning, respectively. In the next section, we see that credulous prediction can be directly characterized by explanation and that skeptical prediction can be represented by combining explanations.
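The two prediction policies can be illustrated with a brute-force Python sketch of Definition 2.3 for a finite propositional theory. Everything here is an assumption made for illustration: the clause encoding, the function names, and the toy theory with two conflicting defaults `p` and `q` are hypothetical and do not come from [81,82]. Each extension is represented by its maximal consistent set M of defaults rather than by its (infinite) set of consequences.

```python
from itertools import combinations, product


def neg(lit):
    """Complement of a literal; a leading '-' marks negation."""
    return lit[1:] if lit.startswith("-") else "-" + lit


def satisfiable(clauses):
    """Brute-force satisfiability of a set of clauses (frozensets of literals)."""
    atoms = sorted({lit.lstrip("-") for clause in clauses for lit in clause})
    for values in product([False, True], repeat=len(atoms)):
        model = dict(zip(atoms, values))
        if all(any(model[lit.lstrip("-")] != lit.startswith("-") for lit in clause)
               for clause in clauses):
            return True
    return False


def maximal_consistent_sets(sigma, defaults):
    """All maximal sets M of defaults such that sigma + M is consistent."""
    maximal = []
    for size in range(len(defaults), -1, -1):  # largest candidates first
        for candidate in map(frozenset, combinations(defaults, size)):
            if any(candidate <= bigger for bigger in maximal):
                continue  # contained in a consistent set found earlier
            if satisfiable(set(sigma) | {frozenset([d]) for d in candidate}):
                maximal.append(candidate)
    return maximal


def holds_in(sigma, chosen, goal):
    """Does goal belong to the extension generated by the defaults in `chosen`?"""
    theory = set(sigma) | {frozenset([d]) for d in chosen}
    return not satisfiable(theory | {frozenset([neg(goal)])})


def credulous(sigma, defaults, goal):
    return any(holds_in(sigma, m, goal) for m in maximal_consistent_sets(sigma, defaults))


def skeptical(sigma, defaults, goal):
    return all(holds_in(sigma, m, goal) for m in maximal_consistent_sets(sigma, defaults))


# Toy theory: p and q cannot both be assumed; both imply r, only p implies s.
sigma = {
    frozenset({"-p", "-q"}),
    frozenset({"-p", "r"}),
    frozenset({"-q", "r"}),
    frozenset({"-p", "s"}),
}
defaults = ["p", "q"]
print(maximal_consistent_sets(sigma, defaults))
print(credulous(sigma, defaults, "s"), skeptical(sigma, defaults, "s"))
print(skeptical(sigma, defaults, "r"))
```

In this toy theory there are two extensions, generated by {p} and by {q}. The atom `s` is predicted credulously but not skeptically, while `r` holds in both extensions and is therefore predicted under either policy.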

3 Relating Abduction to Nonmonotonic Reasoning

In this section, we relate the abductive theories introduced in Section 2 to formalisms of nonmonotonic reasoning. Since abduction is ampliative and plausible reasoning, conclusions of abductive reasoning may not be correct. Therefore, abduction is nonmonotonic. This can be easily veriﬁed for abductive theories. First, an explanation E is consistent with the facts Σ by deﬁnition, but E is not necessarily an explanation with


Katsumi Inoue

respect to the new facts Σ′ (⊃ Σ) because Σ′ ∪ E may not be consistent. Second, a minimal explanation E of G with respect to Σ may not be minimal with respect to Σ′ (⊃ Σ) because a proper subset E′ of E may satisfy Σ′ ∪ E′ ⊨ G. Poole [82] investigates other possibilities of nonmonotonicity that may arise according to changes of facts, hypotheses, and observations.

The above discussion can also be verified by considering relationships between abduction and nonmonotonic logics. In fact, this link is bidirectional [36,56]: abduction can be formalized by a credulous form of nonmonotonic logic (default logic), and a skeptical nonmonotonic formalism (circumscription) can be represented using an abductive theory. The former relationship verifies the nonmonotonicity of abduction, and the latter implies that abduction can be used for commonsense reasoning as well as scientific theory formation.

3.1 Nonmonotonic Reasoning

We here review two major formalisms for nonmonotonic reasoning: default logic [87] and circumscription [70]. Both default logic and circumscription extend the classical first-order predicate calculus, but in different ways. Default logic introduces inference rules referring to the consistency with a belief set, and uses them meta-theoretically to extend a first-order theory. Circumscription, on the other hand, augments a first-order theory with a second-order axiom expressing a kind of minimization principle, and restricts the objects satisfying a certain predicate to just those that the original theory says must satisfy that predicate.

Default Logic. Default logic, proposed by Reiter [87], is a logic for drawing plausible conclusions based on consistency. This is one of the most intuitive and natural logics for nonmonotonic reasoning. One of the most successful results derived from the studies on default logic is that logic programming with negation as failure can be interpreted as a class of default logic [2,29]. In this article, we also see that abduction can be characterized by one of the simplest classes of default logic (Section 3.2). A default is an inference rule of the form:

$$\frac{\alpha(x) : M\,\beta_1(x), \ldots, M\,\beta_m(x)}{\gamma(x)} \qquad (3)$$

where α(x), β1 (x), . . . , βm (x), and γ(x) are ﬁrst-order formulas whose free variables are contained in a tuple of variables x. α(x) is called the prerequisite, β1 (x), . . . , βm (x) the justiﬁcations, and γ(x) the consequent of the default. A default is closed if no formula in it contains a free variable; otherwise it is open. An open default is usually identiﬁed with the set of closed defaults obtained by replacing the free variables with ground terms. A default is normal if it contains only one justiﬁcation (m = 1) that is equivalent to the consequent (β1 ≡ γ). A default theory is a pair, (D, W ), where D is a set of defaults and W is a set of ﬁrst-order formulas which represents proper axioms. A default theory is normal if every default is normal.

Automated Abduction


The intended meaning of the default (3) is: for any tuple t of ground terms, “if α(t) is inferable and each of β1(t), ..., βm(t) can be consistently assumed, then infer γ(t)”. When a default is applied, it is necessary that each of its justifications is consistent with a “belief set”. In order to express this condition formally, an acceptable “belief set” induced by reasoning with defaults (called an extension) is precisely defined in default logic as follows.

Definition 3.1 [87] Let (D, W) be a default theory, and X a set of formulas. X is an extension of (D, W) if it coincides with the smallest set Y of formulas satisfying the following three conditions:

1. W ⊆ Y.
2. Y is deductively closed, that is, cl(Y) = Y, where cl(Y) is the logical closure of Y under classical first-order deduction.
3. For any ground instance of any default in D of the form (3), if α(t) ∈ Y and ¬β1(t), ..., ¬βm(t) ∉ X, then γ(t) ∈ Y.

A default theory may have multiple extensions, or even none. However, it is known that any normal default theory has at least one extension [87, Theorem 3.1]. It is also noted that in default logic each extension is interpreted as an acceptable set of beliefs in accordance with default reasoning. Such an approach to default reasoning leads to multiple extensions and is a credulous approach. With a credulous approach one obtains more conclusions, depending on the choice of the extension, so that conflicting beliefs can be supported by different extensions. This behavior is not necessarily intrinsic to a reasoner dealing with a default theory; we could define the theorems of a default theory to be the intersection of all its extensions so that we remain agnostic to conflicting information. This latter variant is a skeptical approach.

Circumscription. Circumscription, proposed by McCarthy [70], is one of the most “classical” and best-developed formalisms for nonmonotonic reasoning. An important property of circumscription that many other nonmonotonic formalisms lack is that it is based on classical predicate logic. Let T be a set of first-order formulas, and let P and Z denote disjoint tuples of distinct predicate symbols in the language of T. The predicates in P are said to be minimized and those in Z to be variables; Q denotes the rest of the predicates in the language of T, called the fixed predicates (or parameters). We denote a theory T by T(P; Z) when we want to indicate explicitly that T mentions the predicates P and Z. Adopting the formulation by Lifschitz [64], the circumscription of P in T with Z, written CIRC(T; P; Z), is the augmentation of T with a second-order axiom expressing the minimality condition:

$$T(P; Z) \wedge \neg\exists p\,z\,\bigl(T(p; z) \wedge p < P\bigr) \qquad (4)$$

Here, p and z are tuples of predicate variables each of which has the same arity as the corresponding predicate symbol in P and Z, and T (p; z) denotes a theory obtained from T (P; Z) by replacing each occurrence of P and Z with p and z.


Also, p < P stands for the conjunction of formulas each of which is defined, for every member Pi of P with a tuple x of object variables and the corresponding predicate variable pi in p, in the form: ∀x(pi(x) ⊃ Pi(x)) ∧ ¬∀x(Pi(x) ⊃ pi(x)). Thus, the second-order formula in the definition (4) states that the extension of the predicates from P is minimal in the sense that it is impossible to make it smaller without violating the constraint T. Intuitively, CIRC(T; P; Z) is intended to minimize the number of objects satisfying P, even at the expense of allowing more or different objects to satisfy Z.

The model-theoretic characterization of circumscription is based on the notion of minimal models.

Definition 3.2 [64] Let M1 and M2 be models of T with the same universe. We write M1 ≤P,Z M2 if M1 and M2 differ only in the way they interpret predicates from P and Z, and the extension of every predicate P from P in M1 is a subset of the extension of P in M2. Then, a model M of T is (P, Z)-minimal if, for no other model M′ of T, M′ ≤P,Z M but M ≰P,Z M′.

It is known that, for any formula F, CIRC(T; P; Z) ⊨ F if and only if F is satisfied by every (P, Z)-minimal model of T [64]. Since each theorem of a circumscription is satisfied by all minimal models, this property makes the behavior of circumscription skeptical.

3.2 Abduction and Default Logic

Suppose that Σ is a set of facts and Γ is a set of hypotheses. In order to avoid confusion in terminology, we here call an extension of the abductive theory (Σ, Γ) given by Definition 2.3 a Theorist extension, and call an extension of a default theory (D, W) given by Definition 3.1 a default extension. Let w(x) be a formula whose free variables are x. For Σ and Γ, we define a normal default theory (DΓ, Σ), where

$$D_\Gamma = \left\{ \frac{: M\,w(x)}{w(x)} \;\middle|\; w(x) \in \Gamma \right\}.$$

Notice that DΓ is a set of prerequisite-free normal defaults, that is, normal defaults whose prerequisites are true. We obtain the next theorem from results in [81, Theorems 2.6 and 4.1].

Theorem 3.3 Let (Σ, Γ) be an abductive theory, and G a formula. The following three are equivalent:

(a) There is an explanation of G from (Σ, Γ).
(b) There is a Theorist extension of (Σ, Γ) in which G holds.
(c) There is a default extension of the default theory (DΓ, Σ) in which G holds.


Theorem 3.3 is very important for the following reasons.

1. It verifies that each abductive explanation is contained in a possible set of beliefs. In particular, when the hypotheses Γ represent defaults for normal or typical properties, then in order to predict a formula G by default reasoning, it is sufficient to find an explanation of G from (Σ, Γ) [81].
2. All properties possessed by normal default theories are valid for abductive explanations and Theorist extensions. For instance, for any Σ and Γ, there is at least one Theorist extension of (Σ, Γ).
3. Computation of abduction can be given by top-down default proofs [87], which are an extension of linear resolution theorem proving procedures such as [59,7,66]. This fact holds for the following reasons. It is shown that G holds in some default extension of a normal default theory (D, W) if and only if there is a top-down default proof of G with respect to (D, W) [87, Theorem 7.3]. Also, every top-down default proof returns a set S of instances of consequents of defaults from D with which G can be proven from W, i.e., W ∪ S ⊨ G. Therefore, such an S is an explanation from the corresponding abductive theory whenever W ∪ S is consistent.

The last point above is also very useful for designing and implementing hypothetical reasoning systems. In fact, many first-order abductive procedures [85,10,84,96,83] can be regarded as variants of Reiter’s top-down default proof procedures: computation of explanations of G from (Σ, Γ) can be seen as an extension of proof-finding in linear resolution by introducing a set of hypotheses from Γ that, if they could be proven while preserving the consistency of the augmented theories, would complete the proofs of G. Alternatively, abduction can be characterized by a consequence-finding problem [35], in which some literals are allowed to be hypothesized (or skipped) instead of proven, so that new theorems consisting of only those skipped literals are derived at the end of deductions instead of just deriving the empty clause. In this sense, abduction can be implemented as an extension of deduction, in particular of a top-down, backward-chaining theorem-proving procedure. For example, Theorist [84,83] and SOL resolution [35] are extensions of the Model Elimination procedure [66].

Example 2.2 (continued) For the goal G = Flies(Tweety), a version of the Theorist implementation works as follows (written using a Prolog-like notation):

← Flies(Tweety),
← Bird(Tweety) ∧ ¬Ab(Tweety),
← ¬Ab(Tweety),
□ by defaults: {¬Ab(Tweety)}.

Then, the returned set of defaults S = {¬Ab(Tweety)} is checked for consistency with Σ1 by failing to prove the negation of S from Σ1. In this case, it holds that Σ1 ⊭ Ab(Tweety), thus showing that S is an explanation of G from (Σ1, Γ1).
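The two-stage check in this example (find hypotheses that prove the goal, then test their consistency with the facts) can be sketched in a few lines. This is a brute-force illustration, not the Theorist procedure itself: it enumerates subsets of the hypotheses and uses truth tables instead of a top-down proof. The ground clauses standing in for Σ1 and Γ1 are an assumed encoding of the Tweety example, and the function name `minimal_explanations` is hypothetical.

```python
from itertools import combinations, product

def holds(lit, model):
    """Truth value of a literal such as 'ab' or '-ab' in a model (dict atom -> bool)."""
    return model[lit.lstrip("-")] != lit.startswith("-")

def models(clauses, atoms):
    """Enumerate all truth assignments over `atoms` that satisfy every clause."""
    for bits in product([False, True], repeat=len(atoms)):
        model = dict(zip(atoms, bits))
        if all(any(holds(l, model) for l in c) for c in clauses):
            yield model

def minimal_explanations(facts, hyps, atoms, goal):
    """Subset-minimal E of hyps with facts + E consistent and facts + E |= goal."""
    found = []
    for size in range(len(hyps) + 1):                   # smallest candidates first
        for cand in combinations(sorted(hyps), size):
            if any(set(e) <= set(cand) for e in found):
                continue                                # a smaller explanation already exists
            ms = list(models(facts + [[h] for h in cand], atoms))
            if ms and all(holds(goal, m) for m in ms):  # consistent, and the goal follows
                found.append(cand)
    return found

# Assumed ground encoding: Bird(Tweety), and Bird(Tweety) & ~Ab(Tweety) -> Flies(Tweety).
atoms = ["bird", "ab", "flies"]
sigma1 = [["bird"], ["-bird", "ab", "flies"]]
gamma1 = {"-ab"}

print(minimal_explanations(sigma1, gamma1, atoms, "flies"))   # [('-ab',)]
```

Calling the same function with enlarged facts reruns the consistency check, so an explanation can disappear when new facts contradict the hypothesis.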


Next, suppose that Penguin(Tweety) is added to Σ1, and let Σ2 = Σ1 ∪ { Penguin(Tweety) }. We then get S again by the same top-down default proof as above, but the consistency check of S in this case results in a successful proof:

← Ab(Tweety),
← Penguin(Tweety),
□.

Therefore, S is no longer an explanation of G from (Σ2, Γ1).

3.3 Abduction and Circumscription

A significant difference between circumscription and default logic lies in the ways they handle variables and equality. We therefore assume that the only function symbols are constants and that the number of constants is finite. Furthermore, in this subsection, a theory T means a set of formulas over the language including the equality axioms, and both the domain-closure assumption (DCA) and the unique-names assumption (UNA) are assumed to be satisfied by T. In this setting, the UNA states that each pair of distinct constants denotes different individuals in the domain. The DCA implies that the theory has finite models and that every formula containing variables is equivalent to a propositional combination of ground atoms. Although these assumptions are strong, their importance is widely recognized in databases and logic programming. For circumscription, these assumptions make the universe fixed, so that the comparison with default logic becomes clear [24]. In particular, circumscription with these assumptions is essentially equivalent to the Extended Closed World Assumption (ECWA) [30].

Another big difference between circumscription and default logic is in their approaches to default prediction: skeptical versus credulous. The theorems of a circumscription are the formulas satisfied by every minimal model, while there are multiple default extensions in default logic. We, therefore, compare the theorems of a circumscription with the formulas contained in every default extension of a default theory. On the relationship between circumscription and default logic, Etherington [24] has shown that, under some conditions, a formula is entailed by a circumscription plus the DCA and the UNA if and only if the formula is contained in every default extension of the corresponding default theory.

Proposition 3.4 [24] Assume that T is a theory satisfying the above conditions. Let P be a tuple of predicates, and Z the tuple of all predicates other than those in P in the language. Then, the formulas true in every default extension of the default theory

$$\left( \left\{ \frac{: M\,\neg P_i(x)}{\neg P_i(x)} \;\middle|\; P_i \in P \right\},\; T \right) \qquad (5)$$

are precisely the theorems of CIRC(T; P; Z).
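Because the DCA and the UNA reduce a theory to a propositional combination of ground atoms, the (P, Z)-minimal models of Definition 3.2, on which the circumscriptive side of this comparison rests, can be enumerated directly for small theories. The sketch below does so by brute force. The ground atoms bird, ab and flies, the choice P = {ab} with all other atoms varying, and the function names are assumptions made for illustration only.

```python
from itertools import product

def holds(lit, model):
    """Truth value of a literal such as 'ab' or '-ab' in a model (dict atom -> bool)."""
    return model[lit.lstrip("-")] != lit.startswith("-")

def models(clauses, atoms):
    """Enumerate all truth assignments over `atoms` that satisfy every clause."""
    for bits in product([False, True], repeat=len(atoms)):
        model = dict(zip(atoms, bits))
        if all(any(holds(l, model) for l in c) for c in clauses):
            yield model

def leq(m1, m2, P, Z, atoms):
    """m1 <=_{P,Z} m2: equal on fixed atoms, and the P-atoms true in m1 are true in m2."""
    same_fixed = all(m1[a] == m2[a] for a in atoms if a not in P and a not in Z)
    return same_fixed and all(m2[a] for a in P if m1[a])

def minimal_models(clauses, atoms, P, Z):
    """Models with no strictly smaller model in the (P, Z) ordering."""
    ms = list(models(clauses, atoms))
    return [m for m in ms
            if not any(leq(n, m, P, Z, atoms) and not leq(m, n, P, Z, atoms) for n in ms)]

def circ_entails(clauses, atoms, P, Z, lit):
    """Skeptical reading: the literal holds in every (P, Z)-minimal model."""
    return all(holds(lit, m) for m in minimal_models(clauses, atoms, P, Z))

# Assumed ground theory: bird, and bird & ~ab -> flies; minimize ab, vary the rest.
atoms = ["bird", "ab", "flies"]
T = [["bird"], ["-bird", "ab", "flies"]]
P, Z = {"ab"}, {"bird", "flies"}

print(minimal_models(T, atoms, P, Z))
print(circ_entails(T, atoms, P, Z, "flies"))           # True
print(all(holds("flies", m) for m in models(T, atoms)))  # False: not classically entailed
```

Here flies is not a classical consequence of the theory, but it holds in the single minimal model, in which ab is false.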


Since the default theory (5) is a prerequisite-free normal default theory, we can connect each of its default extensions with a Theorist extension using Theorem 3.3. Therefore, in the abductive theory, we hypothesize the negative occurrences of the minimized predicates P. The following corollary can be obtained from Theorem 3.3 and the model theory of circumscription.

Corollary 3.5 Let T, P and Z be the same as in Proposition 3.4. Some (P, Z)-minimal model of T satisfies a formula F if and only if F has an explanation from the abductive theory (T, { ¬Pi(x) | Pi ∈ P }).

The above corollary does not deal with skeptical prediction but with credulous prediction. Moreover, Proposition 3.4 does not allow for the specification of fixed predicates. Gelfond et al. [30], on the other hand, show a more general result for the ECWA by allowing some predicates to be fixed. The idea of reducing circumscription to the ECWA is very important as it is helpful for designing resolution-based theorem provers for circumscription [86,31,41,32]. Earlier work on such a reduction of circumscription to a special case of the ECWA can be found in [73,3], where all predicates in the language are minimized.

To compute circumscription, we are particularly interested in two results on the ECWA obtained by Gelfond et al. [30, Theorems 5.2 and 5.3] with the notion of free for negation. These are also adopted as the basic characterizations for query answering in circumscriptive theories by Przymusinski [86, Theorems 2.5 and 2.6]. Inoue and Helft [41] express them using different terminology (characteristic clauses). Here, we relate these results on the ECWA to abduction. Let T be a theory as above, P the minimized predicates, Q the fixed predicates, and Z the variables. For a tuple R of predicates in the language, we denote by R+ (R−) the positive (negative) occurrences of predicates from R in the language.
Then, we deﬁne the abductive theory for circumscription, (T, Γcirc ), where the hypotheses are given as: Γcirc = P− ∪ Q+ ∪ Q− . Intuitively, both positive and negative occurrences of Q are hypothesized as defaults to prevent the abductive theory from altering the deﬁnition of each predicate from Q. The next theorem can be obtained from [30, Theorems 5.2 and 5.3]. Theorem 3.6 [41] (1) For any formula F not containing predicate symbols from Z, CIRC (T ; P; Z) |= F if and only if ¬F has no explanation from (T, Γcirc ). (2) For any formula F , CIRC (T ; P; Z) |= F if and only if there exist explanations E1 , . . . , En (n ≥ 1) of F from (T, Γcirc ) such that ¬(E1 ∨ . . . ∨ En ) has no explanation from (T, Γcirc ). Using Theorem 3.6, we can reduce query answering in a circumscriptive theory to the ﬁnding of a combination of explanations of a query such that the


negation of the disjunction cannot be explained. The basic intuition behind this theorem is as follows. In abduction, by Corollary 3.5, if a formula F is explained, then F holds in some default extension, that is, F is satisfied by some minimal model. In circumscription, on the other hand, F should be satisfied by every minimal model, or F should hold in all default extensions. This condition is checked by computing multiple explanations E1, ..., En of F corresponding to multiple default extensions such that those explanations cover all default extensions. Then, the disjunction E1 ∨ ... ∨ En is also an explanation of F, and is a skeptical but the weakest explanation of F [55]. Combining explanations is like an argument system [82,83,32], which consists of two processes where one tries to find explanations of the query and the other tries to find a counter-argument to refute them.

Example 3.7 Consider the theory T consisting of the two formulas:

¬Bird(x) ∨ Ab(x) ∨ Flies(x),
Bird(Tweety),

where P = {Ab}, Q = {Bird} and Z = {Flies}, so that the abductive hypotheses are set to Γcirc = {Ab}− ∪ {Bird}+ ∪ {Bird}−. Let us consider the query F = Flies(Tweety). Now, {¬Ab(Tweety)} is an explanation of F. The negation of this explanation, Ab(Tweety), has no explanation. F is thus a theorem of CIRC(T; Ab; Flies). Next, let T′ = T ∪ { Ab(Tweety) ∨ Ab(Sam) }. Then ¬Ab(Sam) is an explanation of Ab(Tweety) from (T′, Γcirc). Hence, F is not a theorem of the circumscription of Ab in T′.

Skeptical prediction other than circumscription can also be characterized by credulous prediction. Instead of giving the hypotheses Γcirc, any set Γ of hypotheses can be used in Theorem 3.6 as follows.

Corollary 3.8 Let (Σ, Γ) be an abductive theory. A formula F holds in every Theorist extension of (Σ, Γ) if and only if there exist explanations E1, ..., En (n ≥ 1) of F from (Σ, Γ) such that ¬(E1 ∨ ... ∨ En) has no explanation from (Σ, Γ).
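The test described by Theorem 3.6 (2) and Corollary 3.8 can be tried out on Example 3.7 with a brute-force sketch. It collects all minimal explanations of the query, negates their disjunction into a set of clauses, and checks that no consistent set of hypotheses explains that negation. Everything specific here is an assumption made for illustration: the ground atoms bird_t, ab_t, flies_t and ab_s, the ground version of Γcirc, and the function names `explains`, `minimal_explanations` and `skeptical`.

```python
from itertools import combinations, product

def neg(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def holds(lit, model):
    return model[lit.lstrip("-")] != lit.startswith("-")

def models(clauses, atoms):
    """Enumerate all truth assignments over `atoms` that satisfy every clause."""
    for bits in product([False, True], repeat=len(atoms)):
        model = dict(zip(atoms, bits))
        if all(any(holds(l, model) for l in c) for c in clauses):
            yield model

def subsets(hyps):
    for size in range(len(hyps) + 1):
        yield from combinations(sorted(hyps), size)

def explains(theory, atoms, E, cnf):
    """theory + E is consistent and entails every clause of cnf."""
    ms = list(models(theory + [[h] for h in E], atoms))
    return bool(ms) and all(any(holds(l, m) for l in c) for m in ms for c in cnf)

def minimal_explanations(theory, atoms, hyps, cnf):
    found = []
    for E in subsets(hyps):                       # increasing size, so minimal ones come first
        if not any(set(F) <= set(E) for F in found) and explains(theory, atoms, E, cnf):
            found.append(E)
    return found

def skeptical(theory, atoms, hyps, query):
    """The query has explanations E1..En and ~(E1 v ... v En) has no explanation."""
    exps = minimal_explanations(theory, atoms, hyps, [[query]])
    if not exps:
        return False
    negated = [[neg(h) for h in E] for E in exps]  # ~(E1 v ... v En) as a set of clauses
    return not any(explains(theory, atoms, E, negated) for E in subsets(hyps))

atoms = ["bird_t", "ab_t", "flies_t", "ab_s"]
T = [["bird_t"], ["-bird_t", "ab_t", "flies_t"]]
T_prime = T + [["ab_t", "ab_s"]]
gamma_circ = {"-ab_t", "-ab_s", "bird_t", "-bird_t"}

print(skeptical(T, atoms, gamma_circ, "flies_t"))        # True
print(skeptical(T_prime, atoms, gamma_circ, "flies_t"))  # False
```

Using all minimal explanations is a design choice of this sketch: their disjunction is the weakest combination, so if its negation can be explained, the negation of any smaller combination can be explained as well.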

3.4 Abduction and Other Nonmonotonic Formalizations

Although we focused on default logic and circumscription as two major nonmonotonic formalisms, abduction can also be used to represent other forms of nonmonotonic reasoning. Here we briefly cite such work for reference. One of the most important results in this area is a formalization of nonmonotonic reasoning by means of argumentation frameworks [60,4,54]. In [4], an assumption-based


framework (Σ, Γ, ∼) is deﬁned as a generalization of the Theorist framework. Here, like Theorist, Σ and Γ are deﬁned as facts and hypotheses respectively, but are not restricted to ﬁrst-order language. The mapping ∼ deﬁnes some notion of contrary of assumptions, and a defeated argument is deﬁned as an augmented theory whose contrary is proved. Varying the underlying language of Σ and Γ and the notion of ∼, this framework is powerful enough to deﬁne the semantics of most nonmonotonic logics, including Theorist, default logic, extended logic programs [29], autoepistemic logic [74], other non-monotonic modal logics, and certain instances of circumscription. This framework is applied to defeasible rules in legal reasoning [60] and is related to other methods in abductive logic programming [54]. In [45], abduction is also related to autoepistemic logic and negation as failure in extended disjunctive logic programs. In particular, an autoepistemic translation of a hypothesis γ is given as Bγ ⊃ γ . The set consisting of this autoepistemic formula produces two stable expansions, one containing γ and Bγ, the other containing ¬Bγ but neither γ nor ¬γ. With this property, we can deﬁne the world in which γ is assumed to be true, while another world not assuming γ is also kept.

4 Computing Abduction via Automated Deduction

This section presents computational methods for abduction. In Section 2.1, we have seen that abduction can be characterized within first-order logic. Using this characterization, here we show a realization of automated abduction based on the resolution principle.

4.1 Consequence-Finding

As explained in Section 3.2, many abductive systems based on the resolution principle can be viewed as procedures that perform a kind of Reiter’s top-down default proof. Now, we look at the underlying principle behind such abductive procedures from a different, purely deductive, viewpoint [35]. First, the definition of abduction given in Section 2.1 can be represented as a consequence-finding problem, which is the problem of finding theorems of the given axiom set Σ. The consequence-finding problem was first addressed by Lee in 1967 [61] in the context of Robinson’s resolution principle [89]. Lee proved the following completeness result: given a set of clauses Σ, if a clause C is a logical consequence of Σ, then the resolution principle can derive a clause D such that D implies C. In this sense, the resolution principle is said to be complete for consequence-finding. In Lee’s theorem, “D implies C” can be replaced with “D subsumes C”. Later,


Slagle, Chang and Lee [95] and Minicozzi and Reiter [72] showed that “the resolution principle” can also be replaced with “semantic resolution” and “linear resolution”, respectively. In practice, however, the set of theorems of an axiom set is generally infinite, and hence the complete deductive closure is neither obtainable nor desirable. Toward more practical automated consequence-finding, Inoue [35] reformulated the consequence-finding problem as follows: given a set of clauses Σ and some criteria of “interesting” clauses, derive each “interesting” clause that is a logical consequence of Σ and is minimal with respect to subsumption. Here, each interesting clause is called a characteristic clause. Criteria of interesting clauses are specified by a sub-vocabulary of the representation language called a production field. In the propositional case, each characteristic clause of Σ is a prime implicate of Σ.

The use of characteristic clauses enables us to characterize various reasoning problems of interest to AI, such as nonmonotonic reasoning [3,41,32,8], diagnosis [25,93], and knowledge compilation [69,15,90] as well as abduction. Moreover, for inductive logic programming (ILP), consequence-finding can be applied to generate hypothesis rules from examples and background knowledge [98,39], and is used as the theoretical background for discussing the completeness of ILP systems [76].³ An extensive survey of consequence-finding in propositional logic is given by Marquis [68].

Now, characteristic clauses are formally defined as follows [35]. Let C and D be two clauses. C subsumes D if there is a substitution θ such that Cθ ⊆ D and C has no more literals than D [66]. C properly subsumes D if C subsumes D but D does not subsume C. For a set of clauses Σ, µΣ denotes the set of clauses in Σ not properly subsumed by any clause in Σ. A production field P is a pair, ⟨L, Cond⟩, where L is a set of literals closed under instantiation, and Cond is a certain condition to be satisfied. When Cond is not specified, P is denoted as ⟨L⟩. A clause C belongs to P = ⟨L, Cond⟩ if every literal in C belongs to L and C satisfies Cond. When Σ is a set of clauses, the set of logical consequences of Σ belonging to P is denoted as ThP(Σ). Then, the characteristic clauses of Σ with respect to P are defined as:

Carc(Σ, P) = µ ThP(Σ).

Note that the empty clause □ is the unique clause in Carc(Σ, P) if and only if Σ is unsatisfiable. This means that proof-finding is a special case of consequence-finding. When a new clause F is added to the set Σ of clauses, some consequences are newly derived with this new information. Such a new and “interesting” clause is called a “new” characteristic clause. Formally, the new characteristic clauses of F with respect to Σ and P are defined as:

Newcarc(Σ, F, P) = µ [ ThP(Σ ∪ {F}) − Th(Σ) ].

³ In ILP, the completeness result of consequence-finding is often called the subsumption theorem [76], a term originally coined by Kowalski in 1970 [57].


The above definition is equivalent to the following [35]:

Newcarc(Σ, F, P) = Carc(Σ ∪ {F}, P) − Carc(Σ, P).

4.2 Abduction as Consequence-Finding

Now, we are ready to characterize abduction as consequence-finding. In the following, we denote the set of all literals in the representation language by L, and a set Γ of hypotheses is defined as a subset of L. Any subset E of Γ is identified with the conjunction of all elements in E. Also, for any set T of formulas, $\overline{T}$ represents the set of formulas obtained by negating every formula in T, i.e., $\overline{T}$ = { ¬C | C ∈ T }.

Let G1, ..., Gn be a finite number of observations, and suppose that they are all literals. We want to explain the observations G = G1 ∧ ... ∧ Gn from (Σ, Γ), where Σ is a set of clauses representing facts and Γ is a set of ground literals representing hypotheses. Let E = E1 ∧ ... ∧ Ek be any explanation of G from (Σ, Γ). Then, the following three conditions hold:

1. Σ ∪ { E1 ∧ ... ∧ Ek } ⊨ G1 ∧ ... ∧ Gn,
2. Σ ∪ { E1 ∧ ... ∧ Ek } is consistent,
3. Each Ei is an element of Γ.

These are equivalent to the next three conditions:

1′. Σ ∪ { ¬G1 ∨ ... ∨ ¬Gn } ⊨ ¬E1 ∨ ... ∨ ¬Ek,
2′. Σ ⊭ ¬E1 ∨ ... ∨ ¬Ek,
3′. Each ¬Ei is an element of $\overline{\Gamma}$.

By 1′, a clause derived from the clause set Σ by adding the clause ¬G is the negation of an explanation of G from (Σ, Γ), and this computation can be done as automated deduction over clauses.⁴ By 2′, such a derived clause must not be a consequence of Σ before adding ¬G. By 3′, every literal appearing in such a clause must belong to $\overline{\Gamma}$. Moreover, E is a minimal explanation from (Σ, Γ) if and only if ¬E is a minimal theorem from Σ ∪ {¬G}. Hence, the problem of abduction is reduced to the problem of seeking a clause such that (i) it is a minimal theorem of Σ ∪ {¬G}, but (ii) it is not a theorem of Σ alone, and (iii) it consists of literals only from $\overline{\Gamma}$. Therefore, we obtain the following result.

Theorem 4.1 [35] Let (Σ, Γ) be an abductive theory, where Γ ⊆ L. Put the production field as P = ⟨$\overline{\Gamma}$⟩. Then, the set of minimal explanations of an observation G from (Σ, Γ) is $\overline{\mathit{Newcarc}(\Sigma, \neg G, P)}$, that is, the set of negations of the new characteristic clauses.

⁴ This way of computing hypotheses is often referred to as “inverse entailment” in ILP [75,39]. Although there is some discussion against such a scheme of “abduction as deduction-in-reverse” [12], it is surely one of the most recognizable ways to construct possible hypotheses deductively.
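For ground (propositional) clauses, Theorem 4.1 can be tried out with a naive saturation procedure. The sketch below computes Carc by closing the clause set under resolution, keeping the clauses that belong to the production field, and removing subsumed ones; Newcarc is then obtained from the difference Carc(Σ ∪ {F}, P) − Carc(Σ, P) stated in Section 4.1. This is not SOL resolution and it does not scale: it is only meant to make the definitions executable. Tautologies are discarded, the production field is given as a plain set of literals (no extra condition Cond), and the ground Tweety clauses are an assumed encoding.

```python
def neg(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def resolvents(c, d):
    """All non-tautological binary resolvents of two ground clauses."""
    for lit in c:
        if neg(lit) in d:
            r = (c - {lit}) | (d - {neg(lit)})
            if not any(neg(x) in r for x in r):
                yield frozenset(r)

def saturate(clauses):
    """Close a finite set of ground clauses under resolution."""
    closed = {frozenset(c) for c in clauses}
    while True:
        new = {r for c in closed for d in closed for r in resolvents(c, d)} - closed
        if not new:
            return closed
        closed |= new

def mu(clauses):
    """Keep the clauses not properly subsumed (here: no proper subset present)."""
    return {c for c in clauses if not any(d < c for d in clauses)}

def carc(sigma, field):
    """Characteristic clauses: minimal derived clauses made only of field literals."""
    return mu({c for c in saturate(sigma) if c <= field})

def newcarc(sigma, new_clause, field):
    return carc(list(sigma) + [new_clause], field) - carc(sigma, field)

def minimal_explanations(sigma, hyps, goal):
    """Theorem 4.1 for a literal goal: negate each new characteristic clause."""
    field = {neg(h) for h in hyps}                 # production field = negated hypotheses
    return [{neg(l) for l in clause} for clause in newcarc(sigma, [neg(goal)], field)]

# Assumed ground facts: bird, bird & ~ab -> flies, penguin -> ab.
sigma1 = [["bird"], ["-bird", "ab", "flies"], ["-penguin", "ab"]]
gamma1 = {"-ab"}

print(minimal_explanations(sigma1, gamma1, "flies"))                  # [{'-ab'}]
print(minimal_explanations(sigma1 + [["penguin"]], gamma1, "flies"))  # []
```

In the second call the clause ab is already a consequence of the facts, so it is not new and condition 2′ rules it out.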


In the above setting, we assumed that G is a conjunction of literals. Extending the form of each observation Gi to a clause is possible. When G is any formula, suppose that by converting ¬G into conjunctive normal form we obtain a formula F = C1 ∧ · · · ∧ Cm, where each Ci is a clause. In this case, Newcarc(Σ, F, P) can be decomposed into m Newcarc operations, each of which adds a single clause [35]:

$$\mathit{Newcarc}(\Sigma, F, P) = \mu \Bigl[\, \bigcup_{i=1}^{m} \mathit{Newcarc}(\Sigma_i, C_i, P) \,\Bigr],$$

where Σ1 = Σ, and Σi+1 = Σi ∪ {Ci} for i = 1, ..., m − 1. This incremental computation can also be applied to get the characteristic clauses of Σ with respect to P as: Carc(Σ, P) = Newcarc(∅, Σ, P).

In Theorem 4.1, explanations obtained by a consequence-finding procedure are not necessarily ground and can contain variables. Note, however, that in implementing resolution-based abductive procedures, both the query G and its explanation E are usually considered as existentially quantified formulas. When G contains universally quantified variables, each of them is replaced with a new constant or function in ¬G through Skolemization. Then, to get a universally quantified explanation when negating each new characteristic clause containing Skolem functions, we need to apply the reverse Skolemization algorithm [10]. For example, if ¬P(x, ϕ(x), u, ψ(u)) is a new characteristic clause where ϕ and ψ are Skolem functions, we get two explanations, ∃x∀y∃u∀v P(x, y, u, v) and ∃u∀v∃x∀y P(x, y, u, v), by reverse Skolemization.

Using Theorems 3.6 and 4.1, skeptical prediction can also be realized by consequence-finding procedures as follows.

Corollary 4.2 [41] Let CIRC(Σ; P; Z) be the circumscription of P in Σ with variables Z. Put Pcirc = ⟨P+ ∪ Q+ ∪ Q−⟩, where Q is the tuple of fixed predicates.
(1) For any formula F not containing literals from Z, CIRC(Σ; P; Z) ⊨ F if and only if Newcarc(Σ, F, Pcirc) = ∅.
(2) For any formula F, CIRC(Σ; P; Z) ⊨ F if and only if there is a conjunction G of clauses from Newcarc(Σ, ¬F, Pcirc) such that Newcarc(Σ, ¬G, Pcirc) = ∅.

4.3 SOL Resolution

To compute new characteristic clauses, Inoue [35] deﬁned an extension of the Model Elimination (ME) calculus [59,7,66] by adding the Skip rule to ME. The extension is called SOL resolution, and can be viewed either as OL resolution [7] (or SL resolution [59]) augmented with the Skip rule, or as a ﬁrst-order generalization of Siegel’s propositional production algorithm [93]. Note here that, although ME is complete for proof-ﬁnding (i.e., refutation-complete) [66], it is not complete for consequence-ﬁnding [72]. SOL resolution is useful for computing the (new) characteristic clauses for the following reasons.


(1) In computing Newcarc(Σ, C, P), SOL resolution treats a newly added clause C as the top clause (or a start clause) input to ME. This is a desirable feature for consequence-finding since the procedure can directly derive the theorems relevant to the added information.
(2) It is easy to focus on producing only those theorems belonging to the production field. This is implemented by allowing an ME procedure to skip the selected literal belonging to P. In other words, SOL resolution is restricted to searching only characteristic clauses.

Here, we show a definition of SOL resolution based on [35]. An ordered clause is a sequence of literals possibly containing framed literals, which represent literals that have been resolved upon. A structured clause ⟨P, Q⟩ is a pair of a clause P and an ordered clause Q, whose clausal meaning is P ∪ Q.

Definition 4.3 (SOL Resolution) Given a set of clauses Σ, a clause C, and a production field P, an SOL-deduction of a clause S from Σ + C and P consists of a sequence of structured clauses, D0, D1, ..., Dn, such that:

1. D0 = ⟨□, C⟩.
2. Dn = ⟨S, □⟩.
3. For each Di = ⟨Pi, Qi⟩, Pi ∪ Qi is not a tautology.
4. For each Di = ⟨Pi, Qi⟩, Qi is not subsumed by any Qj with the empty substitution, where Dj = ⟨Pj, Qj⟩ is a previous structured clause, j < i.
5. For each Di = ⟨Pi, Qi⟩, Pi belongs to P.
6. Di+1 = ⟨Pi+1, Qi+1⟩ is generated from Di = ⟨Pi, Qi⟩ according to the following steps:
   (a) Let l be the selected literal in Qi. Pi+1 and Ri+1 are obtained by applying one of the rules:
       i. (Skip) If Pi ∪ {l} belongs to P, then Pi+1 = Pi ∪ {l} and Ri+1 is the ordered clause obtained by removing l from Qi.
       ii. (Resolve) If there is a clause Bi in Σ ∪ {C} such that ¬k ∈ Bi and l and k are unifiable with mgu θ, then Pi+1 = Pi θ and Ri+1 is an ordered clause obtained by concatenating Bi θ and Qi θ, framing lθ, and removing ¬kθ.
       iii. (Reduce) If either
            A. Pi or Qi contains an unframed literal k (factoring/merge), or
            B. Qi contains a framed literal ¬k (ancestry),
            and l and k are unifiable with mgu θ, then Pi+1 = Pi θ and Ri+1 is obtained from Qi θ by deleting lθ.
   (b) Qi+1 is obtained from Ri+1 by deleting every framed literal not preceded by an unframed literal in the remainder (truncation).

When the Skip rule is applied to the selected literal in an SOL deduction, it is never solved by applying any resolution. To apply this rule, the selected literal has to belong to the production ﬁeld. When a deduction with the top clause C is completed, that is, every literal is either solved or skipped, those skipped literals are collected and output. This output clause is a logical consequence of Σ ∪ {C}


and every literal in it belongs to the production field P. Note that when both Skip and resolution can be applied to the selected literal, these two rules are chosen non-deterministically. In [35], it is proved that SOL resolution is complete for both consequence-finding and finding (new) characteristic clauses. In [99], SOL resolution is implemented using the Weak Model Elimination method [66]. In [49], various pruning methods are introduced to enhance the efficiency of SOL resolution in a connection-tableau format [62]. In [16], del Val defines a variant of consequence-finding procedure for finding characteristic clauses, which is based on ordered resolution instead of Model Elimination.

Example 4.4 [35] Suppose that Σ consists of the two clauses:

(1) ¬P(x) ∨ Q(y, y) ∨ R(z, x),
(2) ¬Q(x, y) ∨ R(x, y).

Suppose also that the set of hypotheses is given as Γ = {P}⁺. Then the production field is P = ⟨Γ̄⟩ = ⟨{P}⁻⟩. Now, consider the query, G = R(A, x), where the variable x is interpreted as existentially quantified, and we want to compute its answer substitution. The first SOL-deduction from Σ + ¬G and P is as follows (framed literals are written in square brackets):

(3)  ⟨□, ¬R(A, x)⟩,                                  top clause
(4)  ⟨□, ¬P(x) ∨ Q(y, y) ∨ [¬R(A, x)]⟩,              resolution with (1)
(5)  ⟨¬P(x), Q(y, y) ∨ [¬R(A, x)]⟩,                  skip
(6)  ⟨¬P(x), R(y, y) ∨ [Q(y, y)] ∨ [¬R(A, x)]⟩,      resolution with (2)
(7a) ⟨¬P(A), [Q(A, A)] ∨ [¬R(A, A)]⟩,                ancestry
(7b) ⟨¬P(A), □⟩.                                     truncation

In the above SOL-deduction, P(A) is an explanation of the answer R(A, A) from (Σ, Γ). Namely,

Σ ⊨ P(A) ⊃ R(A, A).

The second SOL-deduction from Σ + ¬G and P takes the same four steps as the above (3)–(6), but instead of applying ancestry at (7), R(y, y) is resolved upon against the clause ¬R(A, x′), yielding

(7a′) ⟨¬P(x), [R(A, A)] ∨ [Q(A, A)] ∨ [¬R(A, x)]⟩,
(7b′) ⟨¬P(x), □⟩.

In this case, ¬G is used twice in the SOL-deduction. Note that P(x) is not an explanation of any definite answer. It represents that for any term t, P(t) is an explanation of the indefinite answer R(A, t) ∨ R(A, A). Namely,

Σ ⊨ ∀x( P(x) ⊃ R(A, x) ∨ R(A, A) ).
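The substitutions in Example 4.4 come from unifying the selected literal with a framed ancestor or with a literal of an input clause. The following Python sketch is illustrative only and is not taken from [35]: the term encoding (lowercase strings as variables, other strings as constants, tuples as atoms or compound terms) and the helper names is_var, walk, occurs, unify, and substitute are assumptions made for this example. It computes the mgu used in the ancestry step (7a) and in the alternative resolution step (7a′).

```python
def is_var(t):
    """Assumed encoding: lowercase strings are variables."""
    return isinstance(t, str) and t[0].islower()

def walk(t, s):
    """Follow variable bindings in substitution s."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    """Occurs check: does variable v appear in term t under s?"""
    t = walk(t, s)
    if t == v:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, a, s) for a in t[1:])
    return False

def unify(a, b, s=None):
    """Return an mgu (as a dict) extending s, or None if a and b do not unify."""
    s = {} if s is None else s
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return None if occurs(a, b, s) else {**s, a: b}
    if is_var(b):
        return None if occurs(b, a, s) else {**s, b: a}
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def substitute(t, s):
    """Apply substitution s to term t."""
    t = walk(t, s)
    if isinstance(t, tuple):
        return (t[0],) + tuple(substitute(a, s) for a in t[1:])
    return t

# Ancestry at (7a): R(y, y) against the framed ancestor ¬R(A, x).
theta1 = unify(("R", "y", "y"), ("R", "A", "x"))
skipped1 = substitute(("P", "x"), theta1)   # ("P", "A"), i.e. ¬P(A) is output

# Resolution at (7a′): R(y, y) against a renamed copy ¬R(A, x1) of the top clause.
theta2 = unify(("R", "y", "y"), ("R", "A", "x1"))
skipped2 = substitute(("P", "x"), theta2)   # ("P", "x"), i.e. ¬P(x) is unchanged
```

In the first call x is bound to A, so the skipped literal becomes ¬P(A). In the second, the renamed variable x1 absorbs the binding and ¬P(x) stays as it is, which matches the two deductions of the example.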


By Theorem 4.1 and the completeness result of SOL resolution, we can guarantee the completeness for finding explanations from first-order abductive theories. In contrast, the completeness does not hold for abductive procedures like [85,10], in which hypothesizing literals is allowed only when resolution cannot be applied for selected literals. The hypothesized, unresolved literals are “dead ends” of deductions, and explanations obtained in this way are most-specific [96]. This kind of abductive computation can also be implemented in a variant of SOL resolution, called SOL-R resolution [35], by preferring resolution to Skip whenever both can be applied. On the other hand, there is another variant of SOL resolution, called SOL-S resolution [35], in which only Skip is applied by ignoring the possibility of resolution when the selected literal belongs to P. Each explanation obtained by using SOL-S resolution is called a least-specific explanation [96]. While most-specific explanations are often useful for application to diagnosis [85,10], least-specific explanations are used in natural language understanding [96] and computing circumscription by Corollary 4.2 [41].

4.4 Bottom-Up Abduction

As shown by Reiter and de Kleer [88], an assumption-based truth maintenance system (ATMS) [14] is a propositional abductive system. In ATMS, facts are given as propositional Horn clauses and hypotheses are propositional atoms [63,34,92]. An extension of ATMS, which allows non-Horn propositional clauses for facts and propositional literals for hypotheses, is called a clause management system (CMS) [88]. The task of CMS is to compute the set of all minimal explanations of a literal G from (Σ, Γ), where Σ is a set of propositional clauses and Γ ⊆ L is a set of hypotheses. In ATMS, the set of minimal explanations of an atom G is called the label of G. The label updating algorithm of ATMS [14] computes the label of every propositional atom in a bottom-up manner. This algorithm can be logically understood as a fixpoint computation of the following semantic resolution.

Let Γ be a set of propositional atoms, and Σ be a set of propositional Horn clauses. Suppose that N is either false or any atom appearing in Σ, and that N_i (1 ≤ i ≤ m; m ≥ 0) is any atom and A_{i,j} (1 ≤ i ≤ m; 1 ≤ j ≤ n_i; n_i ≥ 0) is an element of Γ. Then, a clash in semantic resolution of the form:

    N_1 ∧ ... ∧ N_m ⊃ N
    A_{i,1} ∧ ... ∧ A_{i,n_i} ⊃ N_i,  for all i = 1, ..., m
    ------------------------------------------------------
    ⋀_{1≤i≤m, 1≤j≤n_i} A_{i,j} ⊃ N

represents multiple applications of resolution. The label updating algorithm of ATMS takes each clause in Σ as input one by one, applies the above clash as many times as possible, and incrementally computes every theorem of Σ that is not subsumed by any other theorem of Σ. Then, each resultant minimal theorem


obtained by this computation yields a prime implicate of Σ. Now, let PI(Σ, Γ) be the set of such prime implicates. The label of an atom N is obtained as

{ {A_1, ..., A_k} ⊆ Γ | ¬A_1 ∨ ... ∨ ¬A_k ∨ N ∈ PI(Σ, Γ) }.

In particular, each element in the label of false is called a nogood, which is obtained as the negation of each negative clause from PI(Σ, Γ). Nogoods are useful for recognizing forbidden combinations of hypotheses in many AI applications, and work as integrity constraints saying that those atoms cannot be assumed simultaneously. A typical implementation of the label updating algorithm performs the above clash computation for an atom N by: (i) generating the product of the labels of antecedent atoms of N, (ii) eliminating each element which is a superset of some nogood, and (iii) eliminating every non-minimal element from the rest. Although ATMS works for propositional abduction only, a similar clash rule that is complete for first-order abduction is also proposed in [18], and a method to simulate the above clash using hyperresolution is proposed for first-order abductive theories in [97].

Example 4.5 Let (Σ, Γ) be a propositional Horn abductive theory such that

Σ = { A ∧ B ⊃ P, C ⊃ P, B ∧ C ⊃ Q, D ⊃ Q, P ∧ Q ⊃ R, C ∧ D ⊃ false },
Γ = { A, B, C, D }.

We here presuppose the existence of tautology α ⊃ α in Σ for each assumption α ∈ Γ, i.e., A ⊃ A, B ⊃ B, C ⊃ C, D ⊃ D. Then, the label of each non-assumption atom is computed as:

P:     {{A, B}, {C}},
Q:     {{B, C}, {D}},
R:     {{B, C}, {A, B, D}},
false: {{C, D}}.

To compute the label of R in ATMS, we ﬁrstly construct the product of P and Q’s labels as {{A, B, C}, {B, C}, {A, B, D}, {C, D}}, then eliminate {C, D} as a nogood and {A, B, C} as a superset of {B, C}. The above label updating method from [14] cannot be directly used when Σ contains non-Horn clauses. This is because semantic resolution in the above form is not deductively complete for non-Horn clauses. For a full CMS, the level saturation method is proposed in [88], which involves computation of all prime implicates of Σ. In [34], it is shown that a sound and complete procedure of CMS/ATMS can be provided using SOL resolution, without computing all prime implicates of Σ, for both label generating and label updating.
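Steps (i)–(iii) of the label computation can be sketched in Python as a naive fixpoint loop. This is a minimal illustration under stated assumptions, not de Kleer's incremental algorithm from [14]: the function names minimize and atms_labels, the encoding of Horn clauses as (antecedents, consequent) pairs, and the use of the string "false" for negative clauses are all choices made for this example.

```python
from itertools import product

def minimize(envs):
    """Step (iii): keep only the subset-minimal environments."""
    envs = set(envs)
    return {e for e in envs if not any(o < e for o in envs)}

def atms_labels(justifications, assumptions):
    """Naive fixpoint computation of labels for a propositional Horn theory.

    justifications: list of (antecedent atoms, consequent atom) pairs;
                    the consequent "false" encodes a negative clause.
    assumptions:    the atoms of Γ, each starting with the label {{itself}}.
    """
    labels = {a: {frozenset([a])} for a in assumptions}
    changed = True
    while changed:
        changed = False
        for body, head in justifications:
            # (i) product of the labels of the antecedent atoms
            choices = [labels.get(b, set()) for b in body]
            combined = {frozenset().union(*combo) for combo in product(*choices)}
            new = labels.get(head, set()) | combined
            # (ii) drop every environment that is a superset of a known nogood
            if head != "false":
                nogoods = labels.get("false", set())
                new = {e for e in new if not any(n <= e for n in nogoods)}
            # (iii) drop non-minimal environments
            new = minimize(new)
            if new != labels.get(head, set()):
                labels[head] = new
                changed = True
    return labels

# The theory of Example 4.5.
sigma = [
    (("A", "B"), "P"), (("C",), "P"),
    (("B", "C"), "Q"), (("D",), "Q"),
    (("P", "Q"), "R"),
    (("C", "D"), "false"),
]
labels = atms_labels(sigma, ["A", "B", "C", "D"])
# labels["R"] is {{B, C}, {A, B, D}} and labels["false"] is {{C, D}}
```

Because the clause C ∧ D ⊃ false is processed after P ∧ Q ⊃ R in this ordering, the environment {C, D} first enters R's label and is removed on the next pass, once the nogood is known. The loop therefore reaches the labels of Example 4.5 regardless of the order of the clauses, at the cost of recomputing products that an incremental implementation would avoid.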


Example 4.6 Consider a propositional abductive theory (Σ, Γ), where

Σ = { P ∨ Q, ¬B ∨ P },
Γ = { A, B }.

Let N be the set of all atoms appearing in Σ. We set the production field as P* = ⟨Γ̄ ∪ N, the number of literals from N − Γ is at most one⟩. Then, Carc(Σ, P*) in this case is equivalent to Σ. While P has the label {{B}}, Q’s label is empty. Now, suppose that a new clause, ¬A ∨ ¬P, is added to Σ. Then, an updating algorithm based on SOL resolution finds Q’s new label {{A}}, as well as a new nogood {A, B}, by the following two SOL-deductions, where [¬P] denotes the framed literal:

⟨□, ¬A ∨ ¬P⟩, ⟨¬A, ¬P⟩, ⟨¬A, Q ∨ [¬P]⟩, ⟨¬A ∨ Q, [¬P]⟩, ⟨¬A ∨ Q, □⟩.

⟨□, ¬A ∨ ¬P⟩, ⟨¬A, ¬P⟩, ⟨¬A, ¬B ∨ [¬P]⟩, ⟨¬A ∨ ¬B, [¬P]⟩, ⟨¬A ∨ ¬B, □⟩.
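The two deductions of Example 4.6 can be reproduced with a small propositional sketch of the Skip, Resolve, and ancestry rules. It is a tableau-style simplification written for this example, not the procedure of [35] or [34]: it omits the tautology and subsumption conditions of SOL resolution, keeps a set of ancestors per goal instead of framed literals with explicit truncation, and all names (neg, sol_consequences, in_field) and the string encoding of literals ("P", "-P") are assumptions.

```python
def neg(lit):
    """Complement of a propositional literal written as 'P' or '-P'."""
    return lit[1:] if lit.startswith("-") else "-" + lit

def sol_consequences(sigma, top, in_field):
    """Collect the clauses of skipped literals derivable from the top clause."""
    clauses = [tuple(c) for c in sigma] + [tuple(top)]
    found = set()

    def solve(goals, skipped):
        if not goals:                       # every literal is solved or skipped
            found.add(frozenset(skipped))
            return
        (lit, ancestors), rest = goals[0], goals[1:]
        # Skip: move the selected literal to the output part if the field allows it.
        if in_field(skipped | {lit}):
            solve(rest, skipped | {lit})
        # Reduce (ancestry): the complement of the literal is one of its ancestors.
        if neg(lit) in ancestors:
            solve(rest, skipped)
        # Resolve: extend with an input clause containing the complement.
        if lit not in ancestors:            # avoid looping on the same literal
            for clause in clauses:
                if neg(lit) in clause:
                    children = [(m, ancestors | {lit})
                                for m in clause if m != neg(lit)]
                    solve(children + rest, skipped)

    solve([(l, frozenset()) for l in top], frozenset())
    return found

# Example 4.6: Σ = {P ∨ Q, ¬B ∨ P}, Γ = {A, B}, new clause ¬A ∨ ¬P.
assumptions = {"A", "B"}

def in_field(clause):
    """Negated assumptions, plus at most one positive non-assumption atom."""
    positives = 0
    for lit in clause:
        if lit.startswith("-"):
            if lit[1:] not in assumptions:
                return False
        else:
            if lit in assumptions:
                return False
            positives += 1
    return positives <= 1

new_clauses = sol_consequences([("P", "Q"), ("-B", "P")], ("-A", "-P"), in_field)
# new_clauses contains exactly ¬A ∨ Q and ¬A ∨ ¬B
```

Reading the results back as labels, ¬A ∨ Q gives the environment {A} for Q and the negative clause ¬A ∨ ¬B gives the nogood {A, B}, as stated in the example.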

Abductive procedures based on Clark completion [9,55,28,47] also perform computation of abduction in a deductive manner. This kind of abductive procedure is often used in implementing abductive logic programming. Inoue et al. [42] develop a model generation procedure for bottom-up abduction based on a translation in [44], which applies the Skip rule of SOL resolution [35] in model generation. Abductive procedures that combine top-down and bottom-up approaches are also proposed in two ways: one is to achieve the goal-directedness in bottom-up procedures [77,42,97], and the other is to utilize derived lemmas in top-down methods [49]. Other than these resolution-based procedures, Cialdea Mayer and Pirri [11] propose tableau and sequent calculi for first-order abduction.

4.5 Computational Complexity

The computational complexity of abduction has been extensively studied. First, in the case that the background knowledge is expressed in ﬁrst-order logic as in Section 2.1, the problem of ﬁnding an explanation that is consistent with Σ is not semi-decidable. That is, the problem of deciding the satisﬁability of an axiom set is undecidable for ﬁrst-order logic in general, hence computing an explanation is not decidable even if there exists an explanation. For the consequence-ﬁnding problem in Section 4.1, the set of characteristic clauses of Σ is not even recursively enumerable [48]. Similarly, the set of new characteristic clauses of F with respect to Σ, which is used to characterize explanations in


abduction (Theorem 4.1), involves checking whether a derived formula is not a logical consequence of Σ, which cannot necessarily be determined in a finite amount of time. Hence, to check if a set E of hypotheses obtained in a top-down default proof or SOL resolution is in fact consistent with Σ, we need some approximation, such as a procedure that regards a theory as consistent whenever a refutation-complete theorem prover fails to prove ¬E within a finite amount of time.

Next, in the propositional case, the computational complexity of abduction is studied in [6,92,21]. From the theory of enumerating prime implicates, it is known that the number of explanations grows exponentially as the number of clauses or propositions grows. Selman and Levesque [92] show that finding even one explanation of an atom from a Horn theory and a set of atomic hypotheses is NP-hard. Therefore, even if we abandon the completeness of explanations, it is still intractable. However, if we do not restrict a set Γ of hypotheses and can hypothesize any atom to construct explanations, an explanation can be found in polynomial time. Hence, the restriction of abducible atoms is a source of complexity. On the other hand, as analyses by [6,21] show, the intrinsic difficulty also lies in checking the consistency of explanations, and the inclusion of negative clauses in a theory increases the complexity. Another source of complexity lies in the requirement of minimality for abductive explanations [21]. However, some tractable classes of abductive theories have also been discovered [23,17].

Thus, in propositional abduction, it is unlikely that there exists a polynomial-time algorithm for abductive explanations in general. We can consider approximation of abduction, by discarding either the consistency or the soundness. However, we should notice that showing that a logical framework of abduction or default reasoning is undecidable or intractable does not mean that it is useless. Since they are intrinsically difficult problems (consider, for instance, scientific discovery as the process of abduction), what we would like to know is that representing a problem in such a framework does not increase the computational complexity of the original problem.
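For propositional Horn theories, the hardness lies in searching for a suitable set of hypotheses, not in checking a given candidate: whether a candidate entails the observation and is consistent with the theory can be decided by forward chaining in polynomial time. The sketch below illustrates only this verification step, reusing the theory of Example 4.5. The function names horn_closure and is_explanation and the (antecedents, consequent) encoding with "false" for negative clauses are assumptions of this example, and minimality of the candidate is not checked.

```python
def horn_closure(rules, facts):
    """Forward chaining: all atoms derivable from facts under the Horn rules."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and set(body) <= derived:
                derived.add(head)
                changed = True
    return derived

def is_explanation(rules, hypotheses, observation):
    """True if the hypotheses entail the observation without deriving 'false'."""
    closure = horn_closure(rules, hypotheses)
    return observation in closure and "false" not in closure

# The Horn theory of Example 4.5.
rules = [
    (("A", "B"), "P"), (("C",), "P"),
    (("B", "C"), "Q"), (("D",), "Q"),
    (("P", "Q"), "R"),
    (("C", "D"), "false"),
]

ok_bc = is_explanation(rules, {"B", "C"}, "R")    # True: {B, C} explains R
ok_cd = is_explanation(rules, {"C", "D"}, "R")    # False: {C, D} is a nogood
ok_ab = is_explanation(rules, {"A", "B"}, "R")    # False: Q is not derivable
```

An abductive procedure must additionally choose the hypotheses from Γ, and the number of candidate subsets of Γ is exponential in the size of Γ.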

5 Final Remark

5.1 Problems to Be Addressed

In this article, we observed that automated abduction involves automated deduction in some way. However, clarifying the relationship between abduction and deduction is just a first step towards a mechanization of Peirce’s abduction. There are many future research topics in automated abduction, which include fundamental problems of abduction, applications of abduction, and computational problems of abduction. Some of these problems are also listed in [51] in this volume, and some philosophical problems are discussed in [26,67].

As a fundamental problem of abduction, we have not yet fully understood the human mechanism of explanation and prediction. The formalization in this article only reflects a small part of the whole. Most importantly, there are non-logical aspects of abduction, which are hard to represent. The mechanization of


hypothesis selection is one of the most challenging topics. Research on acquiring meta-knowledge like preference among explanations [47] and inventing new abducible hypotheses [40] is related to increasing the quality of explanations in abduction.

For computational problems, this article showed a directly mechanized way to compute abduction. There is another approach to computation, which translates the abduction problem into other technologies developed in AI. For example, some classes of abductive theories can be transformed into propositional satisfiability and other nonmonotonic formalizations for which efficient solvers exist. Such indirect approaches are taken in recent applications involving assumption-based reasoning such as planning and diagnosis.

One might think that nonmonotonic logic programming such as the stable model semantics or default logic is enough for reasoning under incomplete information when they are as expressive as the class of abductive theories. The question as to why we need abductive theories should be answered by considering the role of abduction in application domains. One may often understand abductive theories more easily and intuitively than theories represented in other nonmonotonic logics. For example, in diagnostic domains, background knowledge contains cause-effect relations and hypotheses are written as a set of causes. In the process of theory formation, incomplete knowledge is naturally represented in the form of hypothetical rules. We thus can use an abductive framework as a high-level description language while computation of abduction can be compiled into other technologies.

5.2 Towards Mechanization of Scientific Reasoning

Let us recall Peirce’s theory of scientific reasoning. His theory of scientific discovery relies on the cycle of “experiment, observation, hypothesis generation, hypothesis verification, and hypothesis revision”. Peirce mentions that this process involves all modes of reasoning; abduction takes place at the first stage of scientific reasoning, deduction follows to derive the consequences of the hypotheses that were given by abduction, and finally, induction is used to verify that those hypotheses are true. According to this viewpoint, let us review the logic of abduction:

(1) Facts ∪ Explanation ⊨ Observation.
(2) Facts ∪ Explanation is consistent.

A possible interpretation of this form of hypothetical reasoning is now as follows. The formula (1) is the process of abduction, or the fallacy of affirming the consequent. The consistency check (2), on the other hand, is the place where deduction plays a role. Since our knowledge about the world may be incomplete, we should experiment with the consequences in an inductive manner in order to verify that the hypotheses are consistent with the knowledge base. At the same time, the process of inductive generalization or the synthesis from examples involves abduction too. This phenomenon of human reasoning is also discussed by Flach and Kakas [27] as the “cycle” of abductive and inductive knowledge development.


When we are given some examples, we first make hypotheses. While previous AI approaches for inductive generalization often enumerated all the possible forms of formulas, abduction would help to restrict the search space. Additional heuristics, once they are formalized, would also be helpful for constructing the hypotheses. Investigation on knowledge assimilation involving abduction, deduction and induction will become more and more important in AI research in the 21st century.

Acknowledgements. Discussions with many researchers were very helpful in preparing this article. In particular, Bob Kowalski gave me valuable comments on an earlier draft of this article. I would also like to thank Toshihide Ibaraki, Koji Iwanuma, Chiaki Sakama, Ken Satoh, and Hiromasa Haneda for their suggestions on this work.

References

1. Chitta Baral. Abductive reasoning through filtering. Artificial Intelligence, 120:1–28, 2000.
2. Nicole Bidoit and Christine Froidevaux. Minimalism subsumes default logic and circumscription. In: Proceedings of LICS-87, pages 89–97, 1987.
3. Genevieve Bossu and Pierre Siegel. Saturation, nonmonotonic reasoning, and the closed-world assumption. Artificial Intelligence, 25:13–63, 1985.
4. A. Bondarenko, P. M. Dung, R. A. Kowalski, and F. Toni. An abstract, argumentation-theoretic approach to default reasoning. Artificial Intelligence, 93:63–101, 1997.
5. Craig Boutilier and Verónica Becher. Abduction as belief revision. Artificial Intelligence, 77:43–94, 1995.
6. Tom Bylander, Dean Allemang, Michael C. Tanner, and John R. Josephson. The computational complexity of abduction. Artificial Intelligence, 49:25–60, 1991.
7. Chin-Liang Chang and Richard Char-Tung Lee. Symbolic Logic and Mechanical Theorem Proving. Academic Press, New York, 1973.
8. Viorica Ciorba. A query answering algorithm for Lukaszewicz’ general open default theory. In: Proceedings of JELIA ’96, Lecture Notes in Artificial Intelligence, 1126, pages 208–223, Springer, 1996.
9. Luca Console, Daniele Theseider Dupre, and Pietro Torasso. On the relationship between abduction and deduction. Journal of Logic and Computation, 1:661–690, 1991.
10. P.T. Cox and T. Pietrzykowski. Causes for events: their computation and applications. In: Proceedings of the 8th International Conference on Automated Deduction, Lecture Notes in Computer Science, 230, pages 608–621, Springer, 1986.
11. Marita Cialdea Mayer and Fiora Pirri. First order abduction via tableau and sequent calculi. Journal of the IGPL, 1(1):99–117, 1993.
12. Marita Cialdea Mayer and Fiora Pirri. Abduction is not deduction-in-reverse. Journal of the IGPL, 4(1):95–108, 1996.
13. Hendrik Decker. An extension of SLD by abduction and integrity maintenance for view updating in deductive databases. In: Proceedings of the 1996 Joint International Conference and Symposium on Logic Programming, pages 157–169, MIT Press, 1996.


14. Johan de Kleer. An assumption-based TMS. Artificial Intelligence, 28:127–162, 1986.
15. Alvaro del Val. Approximate knowledge compilation: the first order case. In: Proceedings of AAAI-96, pages 498–503, AAAI Press, 1996.
16. Alvaro del Val. A new method for consequence finding and compilation in restricted languages. In: Proceedings of AAAI-99, pages 259–264, AAAI Press, 1999.
17. Alvaro del Val. On some tractable classes in deduction and abduction. Artificial Intelligence, 116:297–313, 2000.
18. Robert Demolombe and Luis Fariñas del Cerro. An inference rule for hypothesis generation. In: Proceedings of IJCAI-91, pages 152–157, 1991.
19. Marc Denecker and Danny De Schreye. SLDNFA: an abductive procedure for abductive logic programs. Journal of Logic Programming, 34:111–167, 1998.
20. Marc Denecker and Antonis Kakas, editors. Special Issue: Abductive Logic Programming. Journal of Logic Programming, 44(1–3), 2000.
21. Thomas Eiter and George Gottlob. The complexity of logic-based abduction. Journal of the ACM, 42(1):3–42, 1995.
22. Thomas Eiter, George Gottlob, and Nicola Leone. Semantics and complexity of abduction from default theories. Artificial Intelligence, 90:177–223, 1997.
23. Kave Eshghi. A tractable class of abduction problems. In: Proceedings of IJCAI-93, pages 3–8, 1993.
24. David W. Etherington. Reasoning with Incomplete Information. Pitman, London, 1988.
25. Joseph J. Finger. Exploiting constraints in design synthesis. Ph.D. Dissertation, Technical Report STAN-CS-88-1204, Department of Computer Science, Stanford University, Stanford, CA, 1987.
26. Peter A. Flach and Antonis C. Kakas, editors. Abduction and Induction—Essays on their Relation and Integration. Kluwer Academic, 2000.
27. Peter A. Flach and Antonis C. Kakas. Abductive and inductive reasoning: background and issues. In: [26], pages 1–27, 2000.
28. T. H. Fung and R. Kowalski. The iff procedure for abductive logic programming. Journal of Logic Programming, 33:151–165, 1997.
29. Michael Gelfond and Vladimir Lifschitz. Classical negation in logic programs and disjunctive databases. New Generation Computing, 9:365–385, 1991.
30. Michael Gelfond, Halina Przymusinska, and Teodor Przymusinski. On the relationship between circumscription and negation as failure. Artificial Intelligence, 38:75–94, 1989.
31. Matthew L. Ginsberg. A circumscriptive theorem prover. Artificial Intelligence, 39:209–230, 1989.
32. Nicolas Helft, Katsumi Inoue, and David Poole. Query answering in circumscription. In: Proceedings of IJCAI-91, pages 426–431, 1991.
33. Carl Gustav Hempel. Philosophy of Natural Science. Prentice-Hall, New Jersey, 1966.
34. Katsumi Inoue. An abductive procedure for the CMS/ATMS. In: João P. Martins and Michael Reinfrank, editors, Truth Maintenance Systems, Lecture Notes in Artificial Intelligence, 515, pages 34–53, Springer, 1991.
35. Katsumi Inoue. Linear resolution for consequence finding. Artificial Intelligence, 56:301–353, 1992.
36. Katsumi Inoue. Studies on abductive and nonmonotonic reasoning. Doctoral Dissertation, Kyoto University, Kyoto, 1992.


37. Katsumi Inoue. Principles of abduction. Journal of Japanese Society for Artificial Intelligence, 7(1):48–59, 1992 (in Japanese).
38. Katsumi Inoue. Hypothetical reasoning in logic programs. Journal of Logic Programming, 18(3):191–227, 1994.
39. Katsumi Inoue. Induction, abduction, and consequence-finding. In: Céline Rouveirol and Michèle Sebag, editors, Proceedings of the 11th International Conference on Inductive Logic Programming, Lecture Notes in Artificial Intelligence, 2157, pages 65–79, Springer, 2001.
40. Katsumi Inoue and Hiromasa Haneda. Learning abductive and nonmonotonic logic programs. In: [26], pages 213–231, 2000.
41. Katsumi Inoue and Nicolas Helft. On theorem provers for circumscription. In: Peter F. Patel-Schneider, editor, Proceedings of the 8th Biennial Conference of the Canadian Society for Computational Studies of Intelligence, pages 212–219, Morgan Kaufmann, 1990.
42. Katsumi Inoue, Yoshihiko Ohta, Ryuzo Hasegawa, and Makoto Nakashima. Bottom-up abduction by model generation. In: Proceedings of IJCAI-93, pages 102–108, Morgan Kaufmann, 1993.
43. Katsumi Inoue and Chiaki Sakama. Abductive framework for nonmonotonic theory change. In: Proceedings of IJCAI-95, pages 204–210, Morgan Kaufmann, 1995.
44. Katsumi Inoue and Chiaki Sakama. A fixpoint characterization of abductive logic programs. Journal of Logic Programming, 27(2):107–136, 1996.
45. Katsumi Inoue and Chiaki Sakama. Negation as failure in the head. Journal of Logic Programming, 35(1):39–78, 1998.
46. Katsumi Inoue and Chiaki Sakama. Abducing priorities to derive intended conclusions. In: Proceedings of IJCAI-99, pages 44–49, Morgan Kaufmann, 1999.
47. Katsumi Inoue and Chiaki Sakama. Computing extended abduction through transaction programs. Annals of Mathematics and Artificial Intelligence, 25(3,4):339–367, 1999.
48. Koji Iwanuma and Katsumi Inoue. Minimal conditional answer computation and SOL. To appear, 2002.
49. Koji Iwanuma, Katsumi Inoue, and Ken Satoh. Completeness of pruning methods for consequence finding procedure SOL. In: Peter Baumgartner and Hantao Zhang, editors, Proceedings of the 3rd International Workshop on First-Order Theorem Proving, pages 89–100, Research Report 5-2000, Institute for Computer Science, University of Koblenz, Germany, 2000.
50. John R. Josephson and Susan G. Josephson. Abductive Inference: Computation, Philosophy, Technology. Cambridge University Press, 1994.
51. Antonis Kakas and Marc Denecker. Abductive logic programming. In this volume, 2002.
52. A.C. Kakas and P. Mancarella. Generalized stable models: a semantics for abduction. In: Proceedings of ECAI-90, pages 385–391, 1990.
53. A. C. Kakas, R. A. Kowalski, and F. Toni. Abductive logic programming. Journal of Logic and Computation, 2:719–770, 1992.
54. A. C. Kakas, R. A. Kowalski, and F. Toni. The role of abduction in logic programming. In: Dov M. Gabbay, C. J. Hogger, and J. A. Robinson, editors, Handbook of Logic in Artificial Intelligence and Logic Programming, Volume 5, pages 235–324, Oxford University Press, 1998.
55. Kurt Konolige. Abduction versus closure in causal theories. Artificial Intelligence, 53:255–272, 1992.


56. Kurt Konolige. Abductive theories in artificial intelligence. In: Gerhard Brewka, editor, Principles of Knowledge Representation, pages 129–152, CSLI Publications & FoLLI, 1996.
57. R. Kowalski. The case for using equality axioms in automated demonstration. In: Proceedings of the IRIA Symposium on Automatic Demonstration, Lecture Notes in Mathematics, 125, pages 112–127, Springer, 1970.
58. Robert A. Kowalski. Logic for Problem Solving. Elsevier, New York, 1979.
59. Robert Kowalski and Donald G. Kuehner. Linear resolution with selection function. Artificial Intelligence, 2:227–260, 1971.
60. Robert A. Kowalski and Francesca Toni. Abstract argumentation. Artificial Intelligence and Law, 4:275–296, 1996.
61. Char-Tung Lee. A completeness theorem and computer program for finding theorems derivable from given axioms. Ph.D. thesis, Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA, 1967.
62. R. Letz, K. Mayer, and C. Goller. Controlled integration of the cut rule into connection tableau calculi. Journal of Automated Reasoning, 13(3):297–337, 1994.
63. Hector J. Levesque. A knowledge-level account of abduction (preliminary version). In: Proceedings of IJCAI-89, pages 1061–1067, 1989.
64. Vladimir Lifschitz. Computing circumscription. In: Proceedings of IJCAI-85, pages 121–127, 1985.
65. Jorge Lobo and Carlos Uzcátegui. Abductive consequence relations. Artificial Intelligence, 89:149–171, 1997.
66. Donald W. Loveland. Automated Theorem Proving: A Logical Basis. North-Holland, Amsterdam, 1978.
67. Lorenzo Magnani. Abduction, Reason, and Science—Processes of Discovery and Explanation. Kluwer Academic, 2001.
68. Pierre Marquis. Consequence finding algorithms. In: Dov M. Gabbay and Philippe Smets, editors, Handbook for Defeasible Reasoning and Uncertain Management Systems, Volume 5, pages 41–145, Kluwer Academic, 2000.
69. Philippe Mathieu and Jean-Paul Delahaye. A kind of logical compilation for knowledge bases. Theoretical Computer Science, 131:197–218, 1994.
70. John McCarthy. Circumscription—a form of non-monotonic reasoning. Artificial Intelligence, 13:27–39, 1980.
71. John McCarthy. Applications of circumscription to formalizing common-sense knowledge. Artificial Intelligence, 28:89–116, 1986.
72. Eliana Minicozzi and Raymond Reiter. A note on linear resolution strategies in consequence-finding. Artificial Intelligence, 3:175–180, 1972.
73. Jack Minker. On indefinite databases and the closed world assumption. In: Proceedings of the 6th International Conference on Automated Deduction, Lecture Notes in Computer Science, 138, pages 292–308, Springer, 1982.
74. Robert C. Moore. Semantical considerations on nonmonotonic logic. Artificial Intelligence, 25:75–94, 1985.
75. Stephen Muggleton. Inverse entailment and Progol. New Generation Computing, 13:245–286, 1995.
76. Shan-Hwei Nienhuys-Cheng and Ronald de Wolf. Foundations of Inductive Logic Programming. Lecture Notes in Artificial Intelligence, 1228, Springer, 1997.
77. Yoshihiko Ohta and Katsumi Inoue. Incorporating top-down information into bottom-up hypothetical reasoning. New Generation Computing, 11:401–421, 1993.


78. Gabriele Paul. AI approaches to abduction. In: Dov M. Gabbay and Philippe Smets, editors, Handbook for Defeasible Reasoning and Uncertain Management Systems, Volume 4, pages 35–98, Kluwer Academic, 2000.
79. Charles Sanders Peirce. Elements of Logic. In: Charles Hartshorne and Paul Weiss, editors, Collected Papers of Charles Sanders Peirce, Volume II, Harvard University Press, Cambridge, MA, 1932.
80. Ramón Pino-Pérez and Carlos Uzcátegui. Jumping to explanations versus jumping to conclusions. Artificial Intelligence, 111:131–169, 1999.
81. David Poole. A logical framework for default reasoning. Artificial Intelligence, 36:27–47, 1988.
82. David Poole. Explanation and prediction: an architecture for default and abductive reasoning. Computational Intelligence, 5:97–110, 1989.
83. David Poole. Compiling a default reasoning system into Prolog. New Generation Computing, 9:3–38, 1991.
84. David Poole, Randy Goebel, and Romas Aleliunas. Theorist: a logical reasoning system for defaults and diagnosis. In: Nick Cercone and Gordon McCalla, editors, The Knowledge Frontier: Essays in the Representation of Knowledge, pages 331–352, Springer, New York, 1987.
85. Harry E. Pople, Jr. On the mechanization of abductive logic. In: Proceedings of IJCAI-73, pages 147–152, 1973.
86. Teodor C. Przymusinski. An algorithm to compute circumscription. Artificial Intelligence, 38:49–73, 1989.
87. Raymond Reiter. A logic for default reasoning. Artificial Intelligence, 13:81–132, 1980.
88. Raymond Reiter and Johan de Kleer. Foundations of assumption-based truth maintenance systems: preliminary report. In: Proceedings of AAAI-87, pages 183–187, 1987.
89. J.A. Robinson. A machine-oriented logic based on the resolution principle. Journal of the ACM, 12:23–41, 1965.
90. Olivier Roussel and Philippe Mathieu. Exact knowledge compilation in predicate calculus: the partial achievement case. In: Proceedings of the 14th International Conference on Automated Deduction, Lecture Notes in Artificial Intelligence, 1249, pages 161–175, Springer, 1997.
91. Murray Shanahan. Prediction is deduction but explanation is abduction. In: Proceedings of IJCAI-89, pages 1055–1060, Morgan Kaufmann, 1989.
92. Bart Selman and Hector J. Levesque. Support set selection for abductive and default reasoning. Artificial Intelligence, 82:259–272, 1996.
93. Pierre Siegel. Représentation et utilisation de la connaissance en calcul propositionnel. Thèse d’État, Université d’Aix-Marseille II, Luminy, France, 1987 (in French).
94. Pierre Siegel and Camilla Schwind. Hypothesis theory for nonmonotonic reasoning. In: Proceedings of the Workshop on Nonstandard Queries and Nonstandard Answers, pages 189–210, 1991.
95. J.R. Slagle, C.L. Chang, and R.C.T. Lee. Completeness theorems for semantic resolution in consequence-finding. In: Proceedings of IJCAI-69, pages 281–285, Morgan Kaufmann, 1969.
96. Mark E. Stickel. Rationale and methods for abductive reasoning in natural-language interpretation. In: R. Studer, editor, Natural Language and Logic, Proceedings of the International Scientific Symposium, Lecture Notes in Artificial Intelligence, 459, pages 233–252, Springer, 1990.


97. Mark E. Stickel. Upside-down meta-interpretation of the model elimination theorem-proving procedure for deduction and abduction. Journal of Automated Reasoning, 13(2):189–210, 1994.
98. Akihiro Yamamoto. Using abduction for induction based on bottom generalization. In: [26], pages 267–280, 2000.
99. Eiko Yamamoto and Katsumi Inoue. Implementation of SOL resolution based on model elimination. Transactions of Information Processing Society of Japan, 38(11):2112–2121, 1997 (in Japanese).
100. Wlodek Zadrozny. On rules of abduction. Annals of Mathematics and Artificial Intelligence, 9:387–419, 1993.

The Role of Logic in Computational Models of Legal Argument: A Critical Survey

Henry Prakken1 and Giovanni Sartor2

1 Institute of Information and Computing Sciences, Utrecht University, The Netherlands
http://www.cs.uu.nl/staff/henry.html
2 Faculty of Law, University of Bologna, Italy
[email protected]

Abstract. This article surveys the use of logic in computational models of legal reasoning, against the background of a four-layered view on legal argument. This view comprises a logical layer (constructing an argument); a dialectical layer (comparing and assessing conflicting arguments); a procedural layer (regulating the process of argumentation); and a strategic, or heuristic layer (arguing persuasively). Each further layer presupposes, and is built around, the previous layers. At the first two layers the information base is fixed, while at the third and fourth layers it is constructed dynamically, during a dialogue or dispute.

1 Introduction

1.1 AI & Law Research on Legal Argument

This article surveys a field that has been heavily influenced by Bob Kowalski: the logical analysis of legal reasoning and legal knowledge representation. Not only has he made important contributions to this field (witness the many times his name will be mentioned in this survey), but he has also influenced many others to undertake such a logical analysis at all. Our own research has been heavily influenced by his work, building on logic programming formalisms and on the well-known argumentation-theoretic account of nonmonotonic logic, of which Bob Kowalski was one of the originators [Kakas et al., 1992, Bondarenko et al., 1997]. We therefore feel very honoured to contribute to this volume in his honour.

The precise topic of this survey is the role of logic in computational models of legal argument. Argumentation is one of the central topics of current research in Artificial Intelligence and Law. It has attracted the attention of both logically inclined and design-oriented researchers. Two common themes prevail. The first is that legal reasoning is defeasible, i.e., an argument that is acceptable in itself can be overturned by counterarguments. The second is that legal reasoning is usually performed in a context of debate and disagreement. Accordingly, notions such as argument moves, attack, dialogue, and burden of proof are studied.

Historically, perhaps the first AI & Law attempt to address legal reasoning in an adversarial setting was McCarty’s (partly implemented) Taxman project, which aimed to reconstruct the lines of reasoning in the majority and dissenting opinions of a few leading American tax law cases (see e.g. [McCarty and Sridharan, 1981, McCarty, 1995]). Perhaps the first AI & Law system that explicitly defined notions like dispute and dialectical role was Rissland & Ashley’s (implemented) HYPO system [Rissland and Ashley, 1987], which modelled adversarial reasoning with legal precedents. It generated 3-ply disputes between plaintiff and defendant in a legal case, where each dispute is an alternating series of attacks by the defendant on the plaintiff’s claim, and of defences or counterattacks by the plaintiff against these attacks. This research was continued in Rissland & Skalak’s CABARET project [Rissland and Skalak, 1991], and Aleven & Ashley’s CATO project [Aleven and Ashley, 1997], both also in the ‘design’ strand. The main focus of all these projects is defining persuasive argument moves, i.e., moves which would be made by ‘good’ human lawyers.

By contrast, much logic-based research on legal argument has focused on defeasible inference, inspired by AI research on nonmonotonic reasoning and defeasible argumentation [Gordon, 1991, Kowalski and Toni, 1996, Prakken and Sartor, 1996, Prakken, 1997, Nitta and Shibasaki, 1997, Hage, 1997, Verheij, 1996]. Here the focus was first on reasoning with rules and exceptions and with conflicting rules. After a while, some turned their attention to logical accounts of case-based reasoning [Loui et al., 1993, Loui and Norman, 1995, Prakken and Sartor, 1998]. Another shift in focus occurred after it was realised that legal reasoning is bound not only by the rules of logic but also by those of fair and effective procedure. Accordingly, logical models of legal argument have been augmented with a dynamic component, capturing that the information with which a case is decided is not somehow ‘there’ to be applied, but is constructed dynamically, in the course of a legal procedure (e.g. [Hage et al., 1994, Gordon, 1994, Bench-Capon, 1998, Lodder, 1999, Prakken, 2001b]). In contrast to the above-mentioned work on dispute in the ‘design’ strand, here the focus is more on procedure and less on persuasive argument moves, i.e., more on the rules of the ‘debating game’ and less on how to play this game well.

In this survey we will discuss not only logical approaches but also some work from the ‘design’ strand. The reason is that, in our opinion, these approaches should not be regarded as alternatives but should complement and inspire each other. A purely logic-based approach runs the risk of becoming too abstract and of being ignored by the field for which it is intended, while a purely design-based approach is in danger of becoming too self-centred and ad hoc.

A.C. Kakas, F. Sadri (Eds.): Computat. Logic (Kowalski Festschrift), LNAI 2408, pp. 342–381, 2002. © Springer-Verlag Berlin Heidelberg 2002

1.2 A Four-Layered View on Legal Argument

How can all these research projects be compared and contrasted? We propose that models of legal argument can be described in terms of four layers.1 The first, logical layer defines what arguments are, i.e., how pieces of information can be combined to provide basic support for a claim. The second, dialectical layer focuses on conflicting arguments: it introduces such notions as ‘counterargument’, ‘attack’, ‘rebuttal’ and ‘defeat’, and it defines, given a set of arguments and evaluation criteria, which arguments prevail. The third, procedural layer regulates how an actual dispute can be conducted, i.e., how parties can introduce or challenge new information and state new arguments. In other words, this layer defines the possible speech acts and the discourse rules governing them. Thus the procedural layer differs from the first two in one crucial respect. While those layers assume a fixed set of premises, at the procedural layer the set of premises is constructed dynamically, during a debate. This also holds for the final layer, the strategic or heuristic one, which provides rational ways of conducting a dispute within the procedural bounds of the third layer.

1 The combination of the first three layers was first discussed by [Prakken, 1995]. The first and third layers were also discussed by [Brewka and Gordon, 1994]. The fourth layer was added by [Prakken, 1997] and also discussed in [Sartor, 1997].

All four layers are to be integrated into a comprehensive view of argumentation: the logical layer defines, by providing a notion of arguments, the objects to be evaluated at the dialectical layer; the dialectical layer offers to the procedural and heuristic layers a judgement of whether a new argument might be relevant in the dispute; the procedural layer constrains the ways in which new inputs, supplied by the heuristic layer, can be submitted to the dialectical one; the heuristic layer provides the matter which is to be processed in the system. Each layer can obviously be studied (and implemented) in abstraction from the other ones. However, a main premise of this article is that research at the individual layers would benefit if the connection with the other layers is always kept in mind. For instance, logical techniques (whether monotonic or not) have a better chance of being accepted by the AI & Law community when they can easily be embedded in procedural or heuristic layers of legal argument.

Let us illustrate the four layers with an example of a legal dispute.

P1: I claim that John is guilty of murder.
O1: I deny your claim.
P2: John’s fingerprints were on the knife. If someone stabs a person to death, his fingerprints must be on the knife, so John has stabbed Bill to death. If a person stabs someone to death, he is guilty of murder, so John is guilty of murder.
O2: I concede your premises, but I disagree that they imply your claim: Witness X says that John had pulled the knife out of the dead body. This explains why his fingerprints were on the knife.
P3: X’s testimony is inadmissible evidence, since she is anonymous. Therefore, my claim still stands.

P1 illustrates the procedural layer: the proponent of a claim starts a dispute by stating his claim. The procedure now says that the opponent can either accept or deny this claim. O does the latter with O1. The procedure now assigns the burden of proof to P. P attempts to fulfil this burden with an argument for his claim (P2). Note that this argument is not deductive since it includes an abductive inference step; whether it is constructible is determined at the logical layer. The same holds for O’s counterargument O2, but whether it is a counterargument and has sufficient attacking strength is determined at the dialectical layer, while O’s right to state a counterargument is defined by the procedure. The same remarks hold for P’s counterargument P3. In addition, P3 illustrates the heuristic layer: it uses the heuristic that evidence can be attacked by arguing that it is inadmissible.

This paper is organised as follows. First, in Section 2 we discuss the four layers in more detail. Then in Section 3, we use them in discussing the most influential computational models of legal argument. In Section 4, we do the same for the main logical analyses of legal argument, after which we conclude.

2 Four Layers in Legal Argument

Let us now look in more detail at the four layers of legal argument. It is important to note that the first two layers comprise the subject matter of nonmonotonic logics. One type of such logics explicitly separates the two layers, viz. logical systems for defeasible argumentation (cf. [Prakken and Vreeswijk, 2002]). For this reason we will largely base our discussions on the structure of these systems. However, since [Dung, 1995] and [Bondarenko et al., 1997] have shown that essentially all nonmonotonic logics can be recast as such argument-based systems, most of what we will say also applies to other nonmonotonic logics.
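To make the idea of an argument-based system concrete, the following is a minimal sketch in the spirit of [Dung, 1995]: arguments are treated as abstract names (the logical layer is ignored), a defeat relation is given, and the sceptically acceptable arguments are computed as a least fixed point. The three status labels anticipate the terminology of Section 2.2. The function names, the encoding of defeat as pairs, and the choice of this particular (grounded) semantics are assumptions made for illustration; the literature studies several alternative definitions.

```python
from typing import Dict, Set, Tuple

# Hypothetical encoding: (x, y) means "argument x defeats argument y".
Defeat = Tuple[str, str]


def grounded_extension(arguments: Set[str], defeats: Set[Defeat]) -> Set[str]:
    """Least fixed point of the 'acceptability' operator, reached from the empty set."""
    defeaters: Dict[str, Set[str]] = {a: set() for a in arguments}
    for attacker, target in defeats:
        defeaters[target].add(attacker)

    accepted: Set[str] = set()
    while True:
        # An argument is acceptable if every one of its defeaters is
        # itself defeated by some argument accepted so far.
        acceptable = {
            a for a in arguments
            if all(defeaters[b] & accepted for b in defeaters[a])
        }
        if acceptable == accepted:
            return accepted
        accepted = acceptable


def statuses(arguments: Set[str], defeats: Set[Defeat]) -> Dict[str, str]:
    """Label each argument as justified, overruled or defensible."""
    justified = grounded_extension(arguments, defeats)
    # Overruled: defeated by at least one justified argument.
    overruled = {target for attacker, target in defeats if attacker in justified}
    result: Dict[str, str] = {}
    for a in arguments:
        if a in justified:
            result[a] = "justified"
        elif a in overruled:
            result[a] = "overruled"
        else:
            result[a] = "defensible"
    return result


# C defeats B and B defeats A, so C reinstates A.
chain = statuses({"A", "B", "C"}, {("C", "B"), ("B", "A")})

# X and Y defeat each other and nothing breaks the tie.
tie = statuses({"X", "Y"}, {("X", "Y"), ("Y", "X")})
```

In the first example A and C come out as justified and B as overruled; in the second, neither X nor Y is justified, so both are merely defensible.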

2.1 The Logical Layer

The logical layer is concerned with the language in which information can be expressed, and with the rules for constructing arguments in this language.

The Logical Language

Deontic terms. One ongoing debate in AI & Law is whether normative terms such as ‘obligatory’, ‘permitted’ and ‘forbidden’ should be formalised in (modal) deontic logics or whether they can be expressed in first-order logic; cf. e.g. [Jones and Sergot, 1992]. From our perspective this issue is not very relevant, since logics for defeasible argumentation can cope with any underlying logic. Moreover, as for the defeasibility of deontic reasoning, we think that special deontic defeasible logics (see e.g. [Nute, 1997]) are not very suitable. It is better to embed one’s preferred deontic monotonic logic in one’s preferred general defeasible logic, since legal defeasibility is not restricted to deontic terms, but extends to all other kinds of legal knowledge, including definitions and evidential knowledge. Obviously, a unified treatment of defeasibility is to be preferred; cf. [Prakken, 1996].

Conceptual structures. Others have focused on the formalisation of recurring conceptual legal structures. Important work in this area is McCarty’s [1989] Language of Legal Discourse, which addresses the representation of such categories as space, time, mass, action, causation, intention, knowledge, and belief. This strand of work is, although very important for AI & Law, less relevant for our concerns, for the same reasons as in the deontic case: argument-based systems can deal with any underlying logic.

Conditional rules. A topic that is more relevant for our concerns is the representation of conditional legal rules. The main issue here is whether legal rules satisfy contrapositive properties or not. Some AI & Law formalisms, e.g. Gordon’s [1995] Pleadings Game, validate contraposition. However, [Prakken, 1997] has argued that contraposition makes counterarguments possible that would never be considered in actual reasoning practice. A possible explanation for why this is the case is Hage’s [1996, 1997] view on legal rules as being constitutive. In this view (based on insights of analytical philosophy) a legal rule does not describe but constitutes states of affairs: for instance, a legal rule makes someone a thief or something a contract, it does not describe that this is the case. According to Hage, a legal rule must be applied to make things the case, and lawyers never apply rules contrapositively. This view is related to AI interpretations of defaults as inference licences or inference policies [Loui, 1998, Nute, 1992], while the invalidity of contraposition has also been defended in the context of causal reasoning; see e.g. [Geffner, 1992]. Finally, contraposition is also invalid in extended logic programming, where programs can have both weak and strong negations; cf. [Gelfond and Lifschitz, 1990].
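The point about contraposition can be illustrated with a small sketch of rules that are only ever applied ‘forwards’, as inference licences. It is a toy forward-chaining loop, not any of the cited formalisms; the literal names, the use of a leading `-` for strong negation, and the function name `apply_rules` are assumptions made for this example.

```python
from typing import FrozenSet, List, Set, Tuple

# A rule is a pair (antecedents, consequent). Literals are plain strings;
# a leading "-" marks strong negation (notation assumed for this sketch).
Rule = Tuple[FrozenSet[str], str]


def apply_rules(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Apply rules from antecedents to consequent until nothing new is derived.

    Rules are used only as inference licences: there is no step that goes
    from a negated consequent back to a negated antecedent.
    """
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            if antecedents <= derived and consequent not in derived:
                derived.add(consequent)
                changed = True
    return derived


# Hypothetical constitutive rule: taking another's property makes one a thief.
RULES: List[Rule] = [(frozenset({"takes_property_of_another"}), "thief")]

# Forward application: the rule 'makes' the person a thief.
forward = apply_rules({"takes_property_of_another"}, RULES)

# No contraposition: knowing that someone is not a thief licenses
# no conclusion about whether they took another's property.
backward = apply_rules({"-thief"}, RULES)
```

Here `forward` contains `thief`, whereas `backward` contains nothing beyond the initial fact `-thief`. A formalism that validates contraposition would additionally derive `-takes_property_of_another` in the second case.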
Weak and strong negation. The desire to formalise reasoning with rules and exceptions sometimes motivates the use of a nonprovability, consistency or weak negation operator, such as negation as failure in logic programming. Whether such a device should be used depends on one’s particular convention for formalising rules and exceptions (see further Section 2.2 below).

Metalogic features. Much legal knowledge is metaknowledge, for instance, knowledge about the general validity of rules or their applicability to certain kinds of cases, priority principles for resolving conflicts between rules, or principles for interpreting legal rules. Clearly, for representing such knowledge metalogic tools are needed. Logic-based AI & Law research on legal argument has made ample use of such tools, as this survey will illustrate.

Non-logical languages. Finally, non-logical languages can be used. On the one hand, there are the well-known knowledge representation formalisms, such as frames and semantic networks. In AI, their logical interpretation has been thoroughly studied. On the other hand, in AI & Law various special-purpose schemes have been developed, such as HYPO’s factor-based representation of cases (see Section 3.3), ZENO’s issue-position-based language [Gordon and Karacapilidis, 1997], Room 5’s encapsulated text frames [Loui et al., 1997], ArguMed’s linked-boxes language [Verheij, 1999], or variants of Toulmin’s [1958] well-known argument scheme [Bench-Capon, 1998]. Simple non-logical languages are especially convenient in systems for intelligent tutoring (such as CATO) or argument mediation (such as Room 5, ZENO and ArguMed), since users of such systems cannot be expected to formalise their arguments in logic. In formally reconstructing such systems, one issue is whether their representation language should be taken as primitive or translated into some known logical language. Argument-based logics leave room for both options.

Argument Construction

As for argument construction, a minor issue is how to format arguments: as simple premises-conclusion pairs, as sequences of inferences (deductions) or as trees of inferences. The choice between these options seems a matter of convenience; for a discussion of the various options see e.g. [Prakken and Vreeswijk, 2002]. More crucial issues are whether incomplete arguments, i.e., arguments with hidden premises, should be allowed and whether non-deductive arguments should be allowed.

Incomplete arguments. In ordinary language people very often omit information that could make their arguments valid, such as in “John has killed Pete, so John is guilty of murder”. Here the hidden premise “Who kills another person is guilty of murder” is omitted. In some argument mediation applications, e.g. [Lodder, 1999], such incomplete arguments have been allowed, for instance, to give the listener the opportunity to agree with the argument, so that obvious things can be dealt with efficiently. In our opinion this makes sense, but only if a listener who does not agree with the argument has a way to challenge its validity.

Non-deductive argument types. Non-deductive reasoning forms, such as inductive, abductive and analogical reasoning, are clearly essential to any form of practical reasoning, so they must have a place in the four-layered view on argumentation.
In legal reasoning inductive and abductive arguments play an important role in evidential reasoning, while analogical arguments are especially important in the interpretation of legal concepts. The main issue is whether these reasoning forms should be regarded as argument construction principles (the logical layer) or as heuristics for finding new information (the heuristic layer). In [Prakken, 1995], one of us argued for the latter option. For instance, Prakken argued that an analogy is inherently unable to justify its conclusion since in the end it must always be decided whether the similarities outweigh the differences or not. However, others, e.g. [Loui et al., 1993, Loui, 1998], have included analogical arguments at the logical layer on the grounds that if they are untenable, this will show itself in a rational dispute. Clearly, the latter view presupposes that the dialectical layer is embedded in the procedural layer. For a legal-theoretical discussion of the issue see [Peczenik, 1996, pp. 310–313]. Outside AI & Law, a prominent argument-based system that admits non-deductive arguments is Pollock’s [1995] OSCAR system.

Our present opinion is that both approaches make sense. One important factor here is whether the dialectical layer is embedded in the procedural layer. Another important factor is whether a reasoning form is used to justify a conclusion or not. For instance, some uses of analogy concern learning [Winston, 1980], while other uses concern justification (as in much AI & Law work on case-based reasoning). One thing is especially important: if non-deductive arguments are admitted at the logical layer, then the dialectical layer should provide for ways to attack the link between their premises and conclusion; cf. Pollock’s [1995] undercutters of defeasible inference rules. For instance, if analogies are admitted, it should not only be possible to rebut them with counterexamples, i.e., with analogies for contradictory conclusions, but it should also be possible to undercut analogies by saying that the similarities are irrelevant, or that the differences are more important than the similarities.

2.2 The Dialectical Layer

The dialectical layer addresses three issues: when arguments are in conflict, how conflicting arguments can be compared, and which arguments survive the competition between all conflicting arguments.

Conflict

In the literature, three types of conflicts between arguments are discussed. The first is when arguments have contradictory conclusions, as in ‘A contract exists because there was an offer and an acceptance’ and ‘A contract does not exist because the offerer was insane when making the offer’. Clearly, this form of attack, often called rebutting an argument, is symmetric. The other two types of conflict are not symmetric. One is where one argument makes a nonprovability assumption (e.g. with logic programming’s negation as failure) and another argument proves what was assumed unprovable by the first. For example, an argument ‘A contract exists because there was an offer and an acceptance, and it is not provable that one of the parties was insane’ is attacked by any argument with conclusion ‘The offerer was insane’. In [Prakken and Vreeswijk, 2002] this is called assumption attack. The final type of conflict (identified by Pollock, e.g. [1995]) is when one argument challenges a rule of inference of another argument. After Pollock, this is usually called undercutting an inference. Obviously, a rule of inference can only be undercut if it is not deductive. For example, an analogy can be undercut by saying that the similarity is insufficient to warrant the same conclusion. Note, finally, that all these senses of attack have a direct and an indirect version; indirect attack is directed against a subconclusion or a substep of an argument. For instance, indirect rebuttals contradict an intermediate conclusion of an argument.

Comparing Arguments

The notion of conflicting, or attacking, arguments does not embody any form of evaluation; comparing conflicting pairs of arguments, or in other words, determining whether an attack is successful, is another element of argumentation. The terminology varies: some terms that have been used are ‘defeat’ [Prakken and Sartor, 1996], ‘attack’ [Dung, 1995, Bondarenko et al., 1997] and ‘interference’ [Loui, 1998]. In this article we shall use defeat for the weak notion and strict defeat for the strong, asymmetric notion.

How are conflicting arguments compared in the legal domain? Two main points must be stressed here. The first is that general, domain-independent standards are of little use. Lawyers use many domain-specific standards, ranging from general principles such as “the superior law overrides the inferior law” and “the later regulation overrides the earlier one” to case-specific and context-dependent criteria such as “preferring this rule promotes economic competition, which is good for society”, or “following this argument would lead to an enormous increase in litigation, which should be avoided”. The second main point is that these standards often conflict, so that the comparison of conflicting arguments is itself a subject of dispute. For instance, the standards of legal certainty and individual fairness often conflict in concrete situations. For logical models of legal argument this means that priority principles must be expressible in the logical language, and that their application must be modelled as defeasible reasoning.

Specificity. Some special remarks are in order about the specificity principle. In AI this principle is often regarded as very important. However, in legal reasoning it is just one of the many standards that might be used, and it is often overridden by other standards. Moreover, there are reasons to doubt whether specificity of regulations can be syntactically defined at all. Consider the following imaginary example (due to Marek Sergot, personal communication).

1. All cows must have earmarks
2. Calves need not have earmarks
3. All cows must have earmarks, whether calf or not
4. All calves are cows

Lawyers would regard (2) as an exception to (1) because of (4), but they would certainly not regard (2) as an exception to (3), since the formulation of (3) already takes the possible exception into account. Yet logically (3) is equivalent to (1), since the addition “whether calf or not” is a tautology. In conclusion, specificity may be suitable as a notational convention for exceptions, but it cannot serve as a domain-independent conflict resolution principle.

Assessing the Status of Arguments

The notion of defeat only tells us something about the relative strength of two individual conflicting arguments; it does not yet tell us with what arguments a dispute can be won. The ultimate status of an argument depends on the interaction between all available arguments. An important phenomenon here is reinstatement:2 it may very well be that argument B defeats argument A, but that B is itself defeated by a third argument C; in that case C ‘reinstates’ A. Suppose, for instance, that the argument A that a contract exists because there was an offer and acceptance is defeated by the argument B that a contract does not exist because the offerer was insane when making the offer. And suppose that B is in turn (strictly) defeated by an argument C, attacking B’s intermediate conclusion that the offerer was insane at the time of the offer. In that case C reinstates argument A.

2 But see [Horty, 2001] for a critical analysis of the notion of reinstatement.

The main distinction is that between justified, defensible and overruled arguments. The distinction between justified and defensible arguments corresponds to the well-known distinction between sceptical and credulous reasoning, while overruled arguments are those that are defeated by a justified argument. Several ways to define these notions have been studied, both in semantic and in proof-theoretic form, and both for justification and for defensibility. See [Prakken and Vreeswijk, 2002] for an overview and especially [Dung, 1995, Bondarenko et al., 1997] for semantical studies. For present purposes the differences in semantics do not matter much; what is more important is that argument-based proof theories can be stated in the dialectical form of an argument game, as a dispute between a proponent and opponent of a claim. The proponent starts with an argument for this claim, after which each player must attack the other player’s previous argument with a counterargument of sufficient strength. The initial argument provably has a certain status if the proponent has a winning strategy, i.e., if he can make the opponent run out of moves in whatever way she attacks. Clearly, this setup fits well with the adversarial nature of legal argument, which makes it easy to embed the dialectical layer in the procedural and heuristic ones. To give an example, consider the two dialogue trees in Figure 1.
Assume that they contain all constructible arguments and that the defeat relations are as shown by the arrows (single arrows denote strict defeat while double arrows stand for mutual defeat). In the tree on the left the proponent has a winning strategy, since in all dialogues the opponent eventually runs out of moves; so argument A is provable. The tree on the right extends the first tree with three arguments. Here the proponent does not have a winning strategy, since one dialogue ends with a move by the opponent; so A is not provable in the extended theory.

[Figure: two dialogue trees, both rooted at the proponent’s move P1: A, with opponent moves (O1, O2, ...) and proponent moves (P2, P3, ...) at alternating levels. The left tree is labelled “A is provable”; the right tree, which extends the left one with further moves, is labelled “A is not provable”.]

Fig. 1. Two trees of proof-theoretical dialogues.

Partial computation. Above we said that the status of an argument depends on its interaction with all available arguments. However, we did not specify what ‘available’ means. Clearly, the arguments processed by the dialectical proof theory are based on input from the procedural layer, viz. on what has been said and assumed in a dispute. However, should only the actually stated arguments be taken into account, or also additional arguments that can be computed from the theory constructed during the dispute? And if the latter option is chosen, should all constructible arguments be considered, or only those that can be computed within given resource bounds? In the literature, all three options have been explored. The methods with partial and no computation have been defended by pointing at the fact that computer algorithms cannot be guaranteed to find arguments in reasonable time, and sometimes not even in finite time (see especially [Pollock, 1995, Loui, 1998]). In our opinion, the choice essentially depends on the context and the intended use of the system.

Representing Exceptions

Finally, we discuss the representation of exceptions to legal rules, which concerns a very common phenomenon in the law. Some exceptions are stated by statutes themselves, while others are based, for instance, on the purpose of rules or on legal principles. Three different techniques have been used for dealing with exceptions. Two of them are well-known from nonmonotonic logic, while the third one is, to our knowledge, a contribution of AI & Law research.

The first general technique is the exception clause or explicit-exceptions approach, which corresponds to the use of ‘unless’ clauses in natural language. Logically, such clauses are captured by a nonprovability operator, which can be formalised with various well-known techniques from nonmonotonic logic or logic programming. In argument-based models the idea is that arguments concluding for the exception, thus establishing what the rule requires not to be proved, defeat arguments based upon the rule. In some formalisations, the not-to-be-proved exception is directly included in the antecedent of the rule to which it refers. So, the rule ‘B if A, unless C’ is (semiformally) represented as follows (where ∼ stands for nonprovability).

r1: A ∧ ∼C ⇒ B

A more abstract and modular representation is also possible within the exception clause approach. This is achieved when the rule is formulated as requiring that no exception is proved to the rule itself. The exception now becomes the antecedent of a separate conditional.

r1: A ∧ ∼Exc(r1) ⇒ B
r2: C ⇒ Exc(r1)

While in this approach rules themselves refer to their exceptions, a variant of this technique has been developed where instead the no-exception requirement is built into the logic of rule application [Routen and Bench-Capon, 1991, Hage, 1996, Prakken and Sartor, 1996]. Semiformally this looks as follows.

r1: A ⇒ B
r2: C ⇒ Exc(r1)

We shall call this the exclusion approach. In argument-based versions it takes the form of allowing arguments for the inapplicability of a rule to defeat the arguments using that rule. Exclusion resembles Pollock’s [1995] notion of undercutting defeaters.

Finally, a third technique for representing exceptions is provided by the choice or implicit-exceptions approach. As in the exclusion approach, rules do not explicitly refer to exceptions. However, unlike with exclusion, the exception is not explicitly stated as an exception. Rather it is stated as a rule with conflicting conclusion, and is turned into an exception by preference information that gives the exceptional rule priority over the general rule.

r1: A ⇒ B
r2: C ⇒ ¬B
r1 < r2

In argument-based models this approach is implemented by making arguments based on stronger rules defeat arguments based on weaker rules.

In the general study of nonmonotonic reasoning usually either only the exception-clause approach or only the choice approach is followed. However, AI & Law researchers have stressed that models of legal argument should support the combined use of all three techniques, since the law itself uses all three of them.

2.3 The Procedural Layer

There is a growing awareness that there are other grounds for the acceptability of arguments besides syntactic and semantic grounds. One class of such grounds lies in the way in which a conclusion was actually reached. This is partly inspired by a philosophical tradition that emphasises the procedural side of rationality and justice; see e.g. [Toulmin, 1958, Rawls, 1972, Rescher, 1977, Habermas, 1981]. Particularly relevant for present purposes is Toulmin’s [1958, pp. 7–8] advice that logicians who want to learn about reasoning in practice should turn away from mathematics and instead study jurisprudence, since outside mathematics the validity of arguments would not depend on their syntactic form but on the disputational process in which they have been defended. According to Toulmin an argument is valid if it can stand against criticism in a properly conducted dispute, and the task of logicians is to find criteria for when a dispute has been conducted properly; moreover, he thinks that the law, with its emphasis on procedures, is an excellent place to find such criteria.

Toulmin himself has not carried out his suggestion, but others have. For instance, Rescher [1977] has sketched a dialectical model of scientific reasoning which, so he claims, explains the bindingness of inductive arguments: they must be accepted if they cannot be successfully challenged in a properly conducted scientific dispute. A formal reconstruction of Rescher’s model has been given by Brewka [1994]. In legal philosophy Alexy’s [1978] discourse theory of legal argumentation addresses Toulmin’s concerns, based on the view that a legal decision is just if it is the outcome of a fair procedure.

Another source of the concern for procedure is AI research on resource-bounded reasoning; e.g. [Simon, 1982, Pollock, 1995, Loui, 1998]. When the available resources do not guarantee finding an optimal solution, rational reasoners have to rely on effective procedures. One kind of procedure that has been advocated as effective is dialectics [Rescher, 1977, Loui, 1998]. It is not necessary to accept the view that rationality is essentially procedural in order to see that it at least has a procedural side. Therefore, a study of procedure is of interest to anyone concerned with normative theories of reasoning.

How can formal models of legal procedure be developed? Fortunately, there already exists a formal framework that can be used. In argumentation theory, formal dialogue systems have been developed for so-called ‘persuasion’ or ‘critical discussion’; see e.g. [Hamblin, 1971, MacKenzie, 1990, Walton and Krabbe, 1995]. According to Walton and Krabbe [1995], dialogue systems regulate four aspects of dialogues:

– Locution rules (what moves are possible);
– Structural rules (when moves are legal);
– Commitment rules (the effects of moves on the players’ commitments);
– Termination rules (when dialogues terminate and with what outcome).
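The four kinds of rules can be made concrete with a small sketch. It does not reproduce any of the cited dialogue systems: the class name PersuasionDialogue, the set of speech acts and the particular rules are assumptions chosen only to show where each kind of rule would sit in a program.

```python
from dataclasses import dataclass, field

# Locution rules: the speech acts that exist at all (a hypothetical, simplified set).
LOCUTIONS = {"claim", "challenge", "concede", "withdraw"}


@dataclass
class PersuasionDialogue:
    """Toy two-player persuasion dialogue about a single main claim."""

    main_claim: str
    commitments: dict = field(
        default_factory=lambda: {"proponent": set(), "opponent": set()}
    )
    history: list = field(default_factory=list)

    def is_legal(self, player, act, content):
        """Structural rules: when a possible move is legal."""
        if act not in LOCUTIONS or self.outcome() is not None:
            return False
        if not self.history:
            # The dialogue must open with the proponent stating the main claim.
            return (player, act, content) == ("proponent", "claim", self.main_claim)
        other = "opponent" if player == "proponent" else "proponent"
        if act in ("challenge", "concede"):
            # Only claims the other party is committed to can be challenged or conceded.
            return content in self.commitments[other]
        if act == "withdraw":
            return content in self.commitments[player]
        return True  # any further claim is allowed in this toy version

    def move(self, player, act, content):
        if not self.is_legal(player, act, content):
            raise ValueError(f"illegal move: {player} {act} {content}")
        # Commitment rules: the effect of a move on the mover's commitments.
        if act in ("claim", "concede"):
            self.commitments[player].add(content)
        elif act == "withdraw":
            self.commitments[player].discard(content)
        self.history.append((player, act, content))

    def outcome(self):
        """Termination rules: when the dialogue ends and who has won."""
        if not self.history:
            return None
        if self.main_claim in self.commitments["opponent"]:
            return "proponent"  # the opponent conceded the main claim
        if self.main_claim not in self.commitments["proponent"]:
            return "opponent"  # the proponent withdrew the main claim
        return None
```

In this sketch a dialogue ends as soon as the main claim is conceded or withdrawn, which mirrors the aims of the two players described next; real systems add turn-taking and logical constraints on the grounds offered for a claim.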

In persuasion, the parties in a dispute try to solve a conflict of opinion by verbal means. The dialogue systems regulate the use of speech acts for such things as making, challenging, accepting, withdrawing, and arguing for a claim. The proponent of a claim aims at making the opponent concede his claim; the opponent instead aims at making the proponent withdraw his claim. A persuasion dialogue ends when one of the players has fulfilled their aim. Logic governs the dialogue in various ways. For instance, if a participant is asked to give grounds for a claim, then in most systems these grounds have to logically imply the claim. Or if a proponent’s claim is logically implied by the opponent’s concessions, the opponent is forced to accept the claim, or else withdraw some of her concessions.

Most computational models of legal procedure developed so far [Hage et al., 1994, Gordon, 1995, Bench-Capon, 1998, Lodder, 1999, Prakken, 2001b] have incorporated such formal dialogue systems. However, they have extended them with one interesting feature, viz. the possibility of counterargument. In argumentation-theoretic models of persuasion the only way to challenge an argument is by asking for an argument for its premises. In a legal dialogue, by contrast, a party can challenge an argument even if he accepts all premises, viz. by stating a counterargument. In other words, while in the argumentation-theoretic models the underlying logic is deductive, in the AI & Law systems it is defeasible: support for a claim may be defeasible (e.g. inductive or analogical) instead of watertight, and forced or implied concession of a claim is defined in terms of defeasible instead of deductive consequence. Or in terms of our four-layered view: while the argumentation theorists only have the logical and procedural layers, the AI & Law models have added the dialectical layer in between. In fact, clarifying the interplay between the dialectical and the procedural layer is not a trivial matter, and is the subject of ongoing logical research. See e.g. [Brewka, 2001, Prakken, 2000, Prakken, 2001c].

2.4 The Heuristic Layer

This layer (which addresses much of what is traditionally called ‘rhetoric’) is the most diverse one. In fact, heuristics play a role in every aspect of the other three levels: they say which premises to use, which arguments to construct, how to present them, which arguments to attack, which claims to make, concede or deny, etc. Heuristics can be divided into (at least) three kinds: inventional heuristics, which say how a theory can be formed (such as the classical interpretation schemes for legal rules), selection heuristics, which recommend a choice between various options (such as ‘choose an argument with as few premises as possible, to minimise its attacking points’), and presentation heuristics, which tell how to present an argument (e.g. ‘don’t draw the conclusion yourself but invite the listener to draw it’).

A keyword at the heuristic level is persuasion. For instance, which arguments are the most likely to make the opponent accept one’s claims? Persuasiveness of arguments is not a matter of logic, however broadly conceived. Persuasiveness is not a function of a given body of information: it involves an essential nondeterministic element, viz. what the other player(s) will do in response to a player’s dialectic acts. To model persuasiveness, models are needed that predict what other players (perhaps the normal, typical other player) will do. Analogous models have been studied in research on argument in negotiation [Kraus et al., 1998, Parsons et al., 1998].

An interesting issue is how to draw the dividing line between argument formation rules and inventional heuristics. Below we will discuss several reasoning schemes that can be reasonably regarded as of either type. We think that the criterion is whether the schemes are meant to justify a claim or not.

2.5 Intertwining of the Layers

The four layers can be intertwined in several ways. For instance, allocating the burden of proof is a procedural matter, usually done by the judge on the basis of procedural law. However, sometimes it becomes the subject of dispute, for instance, when the relevant procedural provisions are open-textured or ambiguous. In such a case, the judge will consider all relevant arguments for and against a certain allocation and decide which argument prevails. To this the dialectical layer applies. The result, a justified argument concerning a certain allocation, is then transferred to the procedural layer as a decision concerning the allocation.

Moreover, sometimes the question at which layer one finds oneself depends on the use that is made of a reasoning scheme instead of on the reasoning scheme itself. We already mentioned analogy, which can be used in learning (heuristic layer) but also in justification (dialectical layer). Or consider, for another example, the so-called teleological interpretation scheme, i.e., the idea that law texts should usually be understood in terms of their underlying purposes. This principle may be used by a party (when it provides him with a rule which is in his interest to state) as an inventional heuristic, i.e., as a device suggesting suitable contents to be stated in his argument: interpret a law text as a rule which achieves the legislator’s purposes, whenever this rule promotes your interest. If this is the use of the interpretation scheme, then a party would not introduce it into the dispute, but would just state the results it suggests. The principle, however, could also be viewed by a party as a justificatory premise, which the party explicitly uses to support the conclusion that a certain rule is valid, or that it prevails over alternative interpretations.

Not all inventional heuristics are equally translatable into justificatory meta-rules. Consider for example the heuristic: interpret a text as expressing the rule that best matches the political ideology (or the sexual or racial prejudices) of the judge of your case, if this rule promotes your interest. This suggestion, even though it may be a successful heuristic, usually could not be put forward in the argumentation as a justificatory meta-rule.

3 Computational Models of Legal Argument

In the introduction we said that logic-based and design-based methods in AI & Law should complement and influence each other. For this reason, we now discuss some of the most influential implemented architectures of legal argument. We do so in the light of our four-layered view.

3.1 McCarty’s Work

The TAXMAN II project of McCarty (e.g. McCarty and Sridharan, 1981; McCarty, 1995) aims to model how lawyers argue for or against the application of a legal concept to a problem situation. In McCarty and Sridharan [1981] only a theoretical model is presented, but in McCarty [1995] an implementation is described of most components of the model. However, their interaction in finding arguments is still controlled by the user.

Among other things, the project involves the design of a method for representing legal concepts, capturing their open-textured and dynamic nature. This method is based on the view that legal concepts have three components: firstly, a (possibly empty) set of necessary conditions for the concept’s applicability; secondly, a set of instances (“exemplars”) of the concept; and finally, a set of rules for transforming a case into another one, particularly for relating “prototypical” exemplars to “deformations”. According to McCarty, the way lawyers typically argue about application of a concept to a new case is by finding a plausible sequence of transformations which maps a prototype, possibly via other cases, onto the new case. In our opinion, these transformations might be regarded as inventional heuristics for argument construction.

3.2 Gardner

An early success of logic-based methods in AI & Law was their logical reconstruction of Gardner’s [1987] program for so-called “issue spotting”. Given an input case, the task of the program was to determine which legal questions involved were easy and which were hard, and to solve the easy ones. If all the questions were found easy, the program reported the case as clear, otherwise as hard. The system contained domain knowledge of three different types: legal rules, common-sense rules, and rules extracted from cases. The program considered a question as hard if either “the rules run out”, or different rules or cases point at different solutions, without there being any reason to prefer one over the other. Before a case was reported as hard, conflicting alternatives were compared to check whether one was preferred over the other. For example, case law sets aside legal rules or common-sense interpretations of legal concepts.

Clearly, Gardner’s program can be reconstructed as nonmonotonic reasoning with prioritised information, i.e., as addressing the dialectical layer. Reconstructions of this kind have been given by [Gordon, 1991], adapting [Poole, 1988]’s abductive model of default reasoning, and [Prakken, 1997], in terms of an argument-based logic.

3.3 HYPO

HYPO aims to model how lawyers make use of past decisions when arguing a case. The system generates 3-ply disputes between a plaintiff and a defendant of a legal claim concerning misuse of a trade secret. Each move conforms to certain rules for analogising and distinguishing precedents. These rules determine for each side which are the best cases to cite initially, or in response to the counterparty’s move, and how the counterparty’s cases can be distinguished.

A case is represented as a set of factors pushing the case towards (pro) or against (con) a certain decision, plus a decision which resolves the conflict between the competing factors. A case is citable for a side if it has the decision wished by that side and shares with the Current Fact Situation (CFS) at least one factor which favours that decision. A citation can be countered by a counterexample, that is, a case that is at least as much on point, but has the opposite outcome. A citation may also be countered by distinguishing, that is, by indicating a factor in the CFS which is absent in the cited precedent and which supports the opposite outcome, or a factor in the precedent which is missing in the CFS, and which supports the outcome of the cited case. Finally, HYPO can create hypothetical cases by using magnitudes of factors.

In evaluating the relative force of the moves, HYPO uses the set inclusion ordering on the factors that the precedents share with the CFS. However, unlike logic-based argumentation systems, HYPO does not compute an ‘outcome’ or ‘winner’ of a dispute; instead it outputs 3-ply disputes as they could take place between ‘good’ lawyers.

HYPO in Terms of the Four Layers. Interpreting HYPO in terms of the four layers, the main choice is whether to model HYPO’s analogising and distinguishing moves as argument formation rules (logical layer) or as inventional heuristics (heuristic layer). In the first interpretation, the representation language is simply as described above (a decision, and sets of factors pro and con a decision), analogising a precedent is a constructible argument, stating a counterexample is a rebutter, and distinguishing a precedent is an undercutter. Defeat is defined such that distinctions always defeat their targets, while counterarguments defeat their targets iff they are not less on point. In the second interpretation, proposed by [Prakken and Sartor, 1998], analogising and distinguishing a precedent are regarded as ‘theory constructors’, i.e., as ways of introducing new information into a dispute. We shall discuss this proposal below in Section 3.

Which interpretation of HYPO’s argument moves is the best one is not an easy question. Essentially it asks for the nature of analogical reasoning, which is a deep philosophical question. In both interpretations HYPO has some heuristic aspects, since it defines the “best cases to cite” for each party, selecting the most-on-point cases from those allowed by the dialectical protocol. This can be regarded as a selection heuristic.

3.4 CATO

The CATO system of Aleven and Ashley [1997] applies an extended HYPO architecture for teaching case-based argumentation skills to law students, also in the trade secrets domain. CATO’s main new component is a ‘factor hierarchy’, which expresses expert knowledge about the relations between the various factors: more concrete factors are classified according to whether they are a reason pro or con the more abstract factors they are linked to; links are given a strength (weak or strong), which can be used to solve certain conflicts. Essentially, this hierarchy fills the space between the factors and decision of a case. Thus it can be used to explain why a certain decision was taken, which in turn facilitates debates on the relevance of differences between cases.

For instance, the hierarchy positively links the factor Security measures taken to the more abstract concept Efforts to maintain secrecy. Now if a precedent contains the first factor but the CFS lacks it, then not only could a citation of the precedent be distinguished on the absence of Security measures taken, but this distinction could also be emphasised by saying that thus no efforts were made to maintain secrecy. However, if the CFS also contains a factor Agreed not to disclose information, then the factor hierarchy enables downplaying this distinction, since it also positively links this factor to Efforts to maintain secrecy: so the party that cited the precedent can say that in the current case, just as in the precedent, efforts were made to maintain secrecy.

The factor hierarchy is not meant to be an independent source of information from which arguments can be constructed. Rather it serves as a means to reinterpret precedents: initially cases are in CATO, as in HYPO, still represented as one-step decisions; the factor hierarchy can only be used to argue that the decision was in fact reached by one or more intermediate steps.

CATO in Terms of the Four Layers. At the logical layer CATO adds to HYPO the generation of multi-step arguments, exploiting the factor hierarchy. As for CATO’s ability to reinterpret precedents, we do not regard this as an inventional heuristic, since the main device used in this feature, the factor hierarchy, is given in advance; instead we think that this is just the logic-layer ability to build multi-step arguments from given information. However, CATO’s way of formatting the emphasising and downplaying moves in its output can be regarded as built-in presentation heuristics.

3.5 CABARET

The CABARET system of Rissland and Skalak [1991] attempts to combine rule-based and case-based reasoning. Its case-based component is the HYPO system. The focus is on statutory interpretation, in particular on using precedents to confirm or contest the application of a rule. In [Skalak and Rissland, 1992], CABARET’s model is described as a hierarchy of argument techniques including strategies, moves and primitives. A strategy is a broad characterisation of how one should argue, given one’s particular viewpoint and dialectical situation. A move is a way to carry out the strategy, while a primitive is a way to implement a move.

For example, when one wants to apply a rule, and not all of the rule’s conditions are satisfied, then a possible strategy is to broaden the rule. This strategy can be implemented with a move that argues with an analogised precedent that the missing condition is not really necessary. This move can in turn be implemented with HYPO’s ways to analogise cases. Similarly, CABARET also permits arguments that a rule which prima facie appears to cover the case should not be applied to it. Here the strategy is discrediting a rule and the move may consist in analogising a case in which the rule’s conditions were met but the rule was not applied. Again the move can be implemented with HYPO’s ways to analogise cases.

CABARET in Terms of the Four Layers. At the logical layer CABARET adds to HYPO the possibility to construct simple rule-based arguments, while at the dialectical layer, CABARET adds corresponding ways to attack arguments. CABARET’s main feature, its model of argument strategies, clearly addresses the heuristic layer. The strategies can be seen as selection heuristics: they choose between the available attacking points, and pick from the rule- and case-base the most relevant materials.

3.6 DART

Freeman & Farley [1996] have semi-formally described and implemented a dialectical model of argumentation. For legal applications it is especially relevant since it addresses the issue of burden of proof. Rules are divided into three epistemic categories: ‘sufficient’, ‘evidential’ and ‘default’, in decreasing order of priority. The rules for constructing arguments involve standard logic principles, such as modus ponens and modus tollens, but also nonstandard ones, such as for abductive reasoning (p ⇒ q and q imply p) and a contrario reasoning (p ⇒ q and ¬p imply ¬q). Taken by themselves these inferences clearly are the well-known fallacies of ‘affirming the consequent’ and ‘denying the antecedent’, but this is dealt with by defining undercutters for such arguments. For instance, the above abductive argument can be undercut by providing an alternative explanation for q, in the form of a rule r ⇒ q. The defeat relations between arguments depend both on the type of premise and on the type of inference rule.

The status of arguments is defined in terms of an argument game based on a static knowledge base. DART’s argument game has several variants, depending on which level of proof holds for the main claim. This is because Freeman and Farley maintain that different legal problem solving contexts require different levels of proof. For instance, for the question whether a case can be brought before court, only a ‘scintilla of evidence’ is required (in present terms a defensible argument), while for a decision in a case ‘dialectical validity’ is needed (in our terms a justified argument).

DART in Terms of the Four Layers. DART essentially addresses the logical and dialectical layers, while it assumes input from the procedural layer. At the logical layer, it allows both deductive and nondeductive arguments. Freeman and Farley are well aware that this requires the definition of undercutters for the nondeductive argument types.
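The abductive inference and its undercutter can be illustrated with a small sketch. It is not DART’s implementation: the encoding of rules as pairs, the function names and the example rules about wet grass are assumptions, and only the single undercutting condition mentioned above (an alternative explanation r ⇒ q) is modelled.

```python
# Rules are (antecedent, consequent) pairs, read as "antecedent ⇒ consequent".


def abductive_arguments(rules, observation):
    """Return every hypothesis p for which p ⇒ observation is a rule.

    Each result stands for the nonstandard argument 'p ⇒ q and q, so p'.
    """
    return [p for (p, q) in rules if q == observation]


def undercutters(rules, hypothesis, observation):
    """Return the alternative explanations r (other than the hypothesis) with r ⇒ observation."""
    return [r for (r, q) in rules if q == observation and r != hypothesis]


# Hypothetical knowledge base, used for illustration only.
rules = [("rain", "wet_grass"), ("sprinkler", "wet_grass")]

print(abductive_arguments(rules, "wet_grass"))   # ['rain', 'sprinkler']
print(undercutters(rules, "rain", "wet_grass"))  # ['sprinkler']
```

The abductive argument for rain is undercut because the knowledge base offers a second explanation for the same observation; with only one rule for wet_grass the list of undercutters would be empty.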
DART’s argument games are similar to dialectical proof theories for argument-based logics. However, they are not given a formal semantics. Finally, DART assumes procedural input in the form of an assignment of a level of proof to the main claim.

3.7 The Pleadings Game

Next we discuss Gordon’s [1994, 1995] Pleadings Game, which is an attempt to model the procedural view on justice discussed above in Section 2.3. The legal-procedural example domain is ‘civil pleading’, which is the phase in Anglo-American civil procedure where the parties exchange arguments and counterarguments to identify the issues that must be decided by the court. The system is not only implemented but also formally defined. Thus this work is an excellent illustration of how logic can be used as a tool in computational models of legal argument. For this reason, and also since it clearly illustrates the relation between the first three layers, we shall discuss it in some detail.

The implemented system mediates between parties in a legal procedure: it keeps track of the stated arguments and their dialectical relations, and it checks whether the procedure is obeyed. Gordon models civil pleading as a Hamblin-MacKenzie-style dialogue game, defining speech acts for stating, conceding and denying (= challenging) a claim, and stating an argument for a claim. In addition, Gordon allows for counterarguments, thus opting for a nonmonotonic logic as the underlying logical system. In fact, Gordon uses the argument-based proof theory of Geffner’s [1992] conditional entailment.

As for the structural rules of the game, a game starts when the plaintiff states his main claim. Then the game is governed by a general rule saying that at each turn a player must respond in some permissible way to every move of the opponent that is still relevant. A move is relevant iff it concerns an issue. An issue is, very roughly, defined as a claim that dialectically matters for the main claim and has not yet been replied to. The other structural rules define under which conditions a move is permissible. For instance, a claim of a player may be denied by the other player iff it is an issue and is not defeasibly implied by the denier’s own previous claims. And a denied claim may be defended with an argument as long as (roughly) the claim is an issue, and the argument’s premises are consistent with the mover’s previous claims, and (in case the other party had previously claimed them) they were conceded by the mover. If no such ‘permission rule’ applies, the other player is to move, except when this situation occurs at the beginning of a turn, in which case the game terminates.

The result of a terminated game is twofold: a list of issues identified during the game (i.e., the claims on which the players disagree), and a winner, if there is one. Winning is defined relative to the set of premises agreed upon during a game. If issues remain, there is no winner and the case must be decided by the court.
If no issues remain, then the plaintiff wins iff his main claim is defeasibly implied by the jointly constructed theory, while the defendant wins otherwise.

An Example. We now illustrate the Pleadings Game with an example. Besides illustrating this particular system, the example also illustrates the interplay between the logical, dialectical and procedural layers of legal argument. For the sake of illustration we simplify the Game on several points, and use a different (and semiformal) notation. The example, loosely based on Dutch law, concerns a dispute on offer and acceptance of contracts. The players are called plaintiff (π) and defendant (δ). Plaintiff, who had made an offer to defendant, starts the game by claiming that a contract exists. Defendant denies this claim, after which plaintiff supports it with the argument that defendant accepted his offer and that an accepted offer creates a contract.

π1: Claim[ (1) Contract ]
δ1: Deny(1)
π2: Argue[ (2) Offer, (3) Acceptance, (4) Offer ∧ Acceptance ⇒ Contract, so Contract ]

Now defendant attacks plaintiff’s supporting argument [2,3,4] by defeating its subargument that she accepted the offer. The counterargument says that defendant sent her accepting message after the offer had expired, for which reason there was no acceptance in a legal sense.

δ2: Concede(2,4), Deny(3), Argue[ (5) “Accept” late, (6) “Accept” late ⇒ ¬Acceptance, so ¬Acceptance ]

Plaintiff responds by strictly defeating δ2 with a more specific counterargument (conditional entailment compares arguments on specificity), saying that even though defendant’s accepting message was late, it still counts as an acceptance, since plaintiff had immediately sent a return message saying that he recognises defendant’s message as an acceptance.

π3: Concede(5), Deny(6), Argue[ (5) “Accept” late, (7) “Accept” recognised, (8) “Accept” late ∧ “Accept” recognised ⇒ Acceptance, so Acceptance ]

Defendant now attempts to leave the issues for trial by conceding π3’s argument (the only effect of this is giving up the right to state a counterargument) and its premise (8), and by denying one of the other premises, viz. (7) (she had already implicitly claimed premise (5) herself, in δ2). Plaintiff goes along with defendant’s aim by simply denying defendant’s denial of (7) and not stating a supporting argument for his claim, after which the game terminates since no relevant moves are left to answer for either party.

δ3: Concede(8,[5,7,8]), Deny(7)
π4: Deny(Deny(7))

This game has resulted in the following dialectical graph.

π1: [2,3,4] for Contract
δ1: [5,6] for ¬Acceptance
π2: [5,7,8] for Acceptance

The claims in this graph that have not been conceded are

(1) Contract
(3) Acceptance
(6) “Accept” late ⇒ ¬Acceptance
(7) “Accept” recognised

So these are the issues. Moreover, the set of premises constructed during the game, i.e. the set of conceded claims, is {2, 4, 5}. It is up to the judge whether to extend it with the issues (6) and (7). In each case conditional entailment’s proof theory must be used to verify whether the other two issues, in particular plaintiff’s main claim (1), are (defeasibly) implied by the resulting premises. In fact, it is easy to see that they are entailed only if (6) and (7) are added.
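How the issues follow from the record of the game can be sketched in a few lines. This is only an illustration under simplifying assumptions: the list of moves, the function name issues and the decision to record only concessions of numbered claims (not of whole arguments) are ours, and no defeasible entailment check is performed.

```python
# Claims of the example, numbered as in the text.
claims = {
    1: "Contract",
    2: "Offer",
    3: "Acceptance",
    4: "Offer ∧ Acceptance ⇒ Contract",
    5: '"Accept" late',
    6: '"Accept" late ⇒ ¬Acceptance',
    7: '"Accept" recognised',
    8: '"Accept" late ∧ "Accept" recognised ⇒ Acceptance',
}

# Concessions of claims made during the game (concessions of arguments are left out).
moves = [
    ("δ2", "concede", [2, 4]),
    ("π3", "concede", [5]),
    ("δ3", "concede", [8]),
]


def issues(claims, moves):
    """Return the numbers of the claims that no player has conceded."""
    conceded = {n for (_, act, ns) in moves if act == "concede" for n in ns}
    return sorted(set(claims) - conceded)


print(issues(claims, moves))  # [1, 3, 6, 7]
```

The result corresponds to the four claims listed above as not conceded; deciding which player wins would additionally require the proof theory of conditional entailment.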

The Pleadings Game in Terms of the Four Layers. Clearly, the Pleadings Game explicitly models the first three layers of our model. (In fact, the game was a source of inspiration of [Prakken, 1995]’s first formulation of these layers.) Its contribution to modelling the procedural layer should be apparent from the example. Gordon has also addressed the formalisation of the dialectical layer, adapting within conditional entailment well-known AI techniques concerning naming of rules in (ab)normality predicates. Firstly, he has shown how information about properties of rules (such as validity and backing) can be expressed and, secondly, he has defined a way to express priority rules as object level rules, thus formalising disputes about rule priorities. However, a limitation of his method is that it has to accept conditional entailment’s built-in specificity principle as the highest source of priorities.

4 Logical Models of Legal Argument

Having discussed several implemented models of legal argument, we now turn to logical models. Again we will discuss them in light of our four-layered model.

4.1 Applications of Logic (Meta-)Programming

The British Nationality Act First we discuss the idea of formalising law as logic programs, viz. as a set of formulas of a logical language for which automated theorem provers exist. The underlying ideas of this approach are set out in [Sergot, 1988] and [Kowalski and Sergot, 1990], and is most closely associated with Sergot and Kowalski. The best known application is the formalisation of the British Nationality Act [Sergot et al., 1986] (but see also [Bench-Capon et al., 1987]). For present purposes the main relevance of the work of Sergot et al. is its treatment of exceptions by using negation by failure (further explored by Kowalski, 1989, 1995). To our knowledge, this was the ﬁrst logical treatment of exceptions in a legal context. In this approach, which implements the explicit-exceptions approach of Section 2, negation by failure is considered to be an appropriate translation for such locutions as ‘unless the contrary is shown’ or ‘subject to section . . . ’, which usually introduce exception clauses in legislation. Consider, for example, the norm to the eﬀect that, under certain additional conditions, an abandoned child acquires British citizenship unless it can be shown that both parents have a diﬀerent citizenship. Since Kakas et al. have shown that negation as failure can be given an argument-based interpretation, where negation-as failure assumptions are defeated by proving their contrary, we can say that [Sergot et al., 1986] model reasoning with rules and exceptions at the logical and the dialectical layer. Allen & Saxon’s criticism An interesting criticism of Sergot et al.’s claim concerning exceptions was put forward by [Allen and Saxon, 1989]. They argued that the defeasible nature of legal reasoning is irreducibly procedural, so that it cannot be captured by current nonmonotonic logics, which deﬁne defeasible

The Role of Logic in Computational Models of Legal Argument

363

consequence only as a ‘declarative’ relation between premises and conclusion of an argument. In particular, they attacked the formalisation of ‘unless shown otherwise’ with negation as failure by arguing that ‘shown’ in this context does not mean ‘logically proven from the available premises’ but “shown by a process of argumentation and the presenting of evidence to an authorized decision-maker”. So ‘shown’ would not refer to logical but to legal-procedural nonprovability. In our opinion, Allen & Saxon are basically right, since such expressions address the allocation of the burden of proof, which in legal procedure is a matter of decision by the judge rather than of inference, and therefore primarily concerns the procedural layer rather than the dialectical one (as is Sergot et al.’s use of negation by failure). Note that these remarks apply not only to Sergot et al.’s work, but to any approach that stays within the dialectical layer. In Section 4.4 we will come back to this issue in more detail. Applications of Logic Metaprogramming In two later projects the legal application of logic-programming was enriched with techniques from logic metaprogramming. Hamfelt [1995] uses such techniques for (among other things) representing legal collision rules and interpretation schemes. His method uses logic programming’s DEMO predicate, which represents provability in the object language. Since much knowledge used in legal reasoning is metalevel knowledge, Hamfelt’s approach might be a useful component of models of legal argument. However, it is not immediately clear how it can be embedded in a dialectical context, so that more research is needed. The same holds for the work of Routen and Bench-Capon [1991], who have applied logic metaprogramming to, among other things, the representation of rules and exceptions. Their method provides a way to implement the exclusion approach of Section 2. 
They enrich the knowledge representation language with metalevel expressions Exception_to(rule1, rule2), and ensure that their metainterpreter applies a rule only if no exceptional rule can be applied. Although this is an elegant method, it also has some restrictions. Most importantly, it is not embedded in an argument-based model, so that it cannot easily be combined with other ways to compare conflicting arguments. Thus their method seems better suited for representing coherent legal texts than for modelling legal argument.

4.2 Applications of Argument-Based Logics

Next we discuss legal applications of logics for defeasible argumentation. Several of these applications in fact use argument-based versions of logic programming.

Prakken & Sartor

Prakken and Sartor [1996, 1997] have developed an argument-based logic similar to the one of [Simari and Loui, 1992], but that is expressive enough to deal with contradictory rules, rules with assumptions, inapplicability statements, and priority rules. Their system applies the well-known abstract approach to argumentation, logic programming and nonmonotonic reasoning developed by Dung [1995] and Bondarenko et al. [1997]. The


Henry Prakken and Giovanni Sartor

logical language of the system is that of extended logic programming, i.e., it has both negation as failure (∼) and classical, or strong, negation (¬). Moreover, each formula is preceded by a term, its name. (In [Prakken, 1997] the system is generalised to the language of default logic.) Rules are strict, represented with →, or else defeasible, represented with ⇒. Strict rules are beyond debate; only defeasible rules can make an argument subject to defeat. Accordingly, facts are represented as strict rules with empty antecedents (e.g. → gave-up-house). The input information of the system, i.e., the premises, is a set of strict and defeasible rules, which is called an ordered theory (‘ordered’ since an ordering on the defeasible rules is assumed). Arguments can be formed by chaining rules, ignoring weakly negated antecedents; each head of a rule in the argument is a conclusion of the argument. Conflicts between arguments are decided according to a binary relation of defeat among arguments, which is partly induced by rule priorities. An important feature of the system is that the information about these priorities is itself presented as premises in the logical language. Thus rule priorities are, like any other piece of legal information, established by arguments, and may be debated like any other legal issue. The results of such debates are then transported to and used by the metatheory of the system.

There are three ways in which an argument Arg2 can defeat an argument Arg1. The first is assumption defeat (in the above publications called “undercutting” defeat), which occurs if a rule in Arg1 contains ∼L in its body, while Arg2 has a conclusion L. For instance, the argument [r1: → p, r2: p ⇒ q] (strictly) defeats the argument [r3: ∼q ⇒ r] (note that ∼L reads as ‘there is no evidence that L’). This way of defeat can be used to formalise the explicit-exception approach of Section 2. The other two forms of defeat are only possible if Arg1 does not assumption-defeat Arg2.
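Assumption defeat can be made concrete with a small sketch. The representation below (a `Rule` record with a separate tuple of weakly negated literals, and arguments as lists of rules) is an assumption made for illustration only; it is not the data structure of Prakken and Sartor's system, and it checks nothing but the single condition stated above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Rule:
    name: str
    head: str
    body: Tuple[str, ...] = ()         # ordinary antecedents
    assumptions: Tuple[str, ...] = ()  # literals L that occur as ~L in the body
    strict: bool = False               # True for ->, False for =>


def conclusions(argument: List[Rule]) -> set:
    """Every rule head in an argument counts as a conclusion of that argument."""
    return {rule.head for rule in argument}


def assumption_defeats(attacker: List[Rule], target: List[Rule]) -> bool:
    """True if the attacker concludes some L that a rule of the target assumes absent."""
    concluded = conclusions(attacker)
    return any(lit in concluded for rule in target for lit in rule.assumptions)


# The example from the text: [r1: -> p, r2: p => q] versus [r3: ~q => r].
r1 = Rule("r1", head="p", strict=True)
r2 = Rule("r2", head="q", body=("p",))
r3 = Rule("r3", head="r", assumptions=("q",))

print(assumption_defeats([r1, r2], [r3]))  # True
print(assumption_defeats([r3], [r1, r2]))  # False
```

Keeping the weakly negated literals apart from the ordinary antecedents mirrors the remark that arguments are formed by chaining rules while ignoring weakly negated antecedents.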
One way is by excluding an argument, which happens when Arg2 concludes for some rule r in Arg1 that r is not applicable (formalised as ¬appl(r)). For instance, the argument [r1: → p, r2: p ⇒ ¬appl(r3)] (strictly) defeats the argument [r3: ⇒ r] by excluding it. This formalises the exclusion approach of Section 2. The final way in which Arg2 can defeat Arg1 is by rebutting it: this happens when Arg1 and Arg2 contain rules that are in a head-to-head conflict and Arg2’s rule is not worse than the conflicting rule in Arg1. This way of defeat supports the implicit-exception approach.

Argument status is defined with a dialectical proof theory. The proof theory is correct and complete with respect to [Dung, 1995]’s grounded semantics, as extended by Prakken and Sartor to the case with reasoning about priorities. The opponent in a game has just one type of move available, stating an argument that defeats proponent’s preceding argument (here defeat is determined as if no priorities were defined). The proponent has two types of moves: the first is an argument that combines an attack on opponent’s preceding argument with a priority argument that makes the attack strictly defeating opponent’s argument; the second is a priority argument that neutralises the defeating force of opponent’s last move. In both cases, if proponent uses a priority argument that is not justified


by the ordered theory, this will reflect itself in the possibility of successful attack of the argument by the opponent. We now present the central definition of the dialogue game (‘Arg-defeat’ means defeat on the basis of the priorities stated by Arg). The first condition says that the proponent begins and then the players take turns, while the second condition prevents the proponent from repeating a move. The last two conditions were just explained and form the heart of the definition.

A dialogue is a finite nonempty sequence of moves $\mathit{move}_i = (\mathit{Player}_i, \mathit{Arg}_i)$ ($i > 0$), such that
1. $\mathit{Player}_i = P$ iff $i$ is odd; and $\mathit{Player}_i = O$ iff $i$ is even;
2. If $\mathit{Player}_i = \mathit{Player}_j = P$ and $i \neq j$, then $\mathit{Arg}_i \neq \mathit{Arg}_j$;
3. If $\mathit{Player}_i = P$ then $\mathit{Arg}_i$ is a minimal (w.r.t. set inclusion) argument such that
   (a) $\mathit{Arg}_i$ strictly $\mathit{Arg}_i$-defeats $\mathit{Arg}_{i-1}$; or
   (b) $\mathit{Arg}_{i-1}$ does not $\mathit{Arg}_i$-defeat $\mathit{Arg}_{i-2}$;
4. If $\mathit{Player}_i = O$ then $\mathit{Arg}_i$ $\emptyset$-defeats $\mathit{Arg}_{i-1}$.

The following simple dialogue illustrates this definition. It concerns a tax dispute about whether a person temporarily working in another country has changed his fiscal domicile. All arguments are citations of precedents (see footnote 3).

P1: [f1: kept-house, r1: kept-house ⇒ ¬change]
(Keeping one’s old house is a reason against change of fiscal domicile.)

O1: [f2: ¬domestic-headquarters, r2: ¬domestic-headquarters ⇒ ¬domestic-company, r3: ¬domestic-company ⇒ change]
(If the employer’s headquarters are in the new country, it is a foreign company, in which case fiscal domicile has changed.)

P2: [f3: domestic-property, r4: domestic-property ⇒ domestic-company, f4: r4 is decided by higher court than r2, r5: r4 is decided by higher court than r2 ⇒ r2 ≺ r4]
(If the employer has property in the old country, it is a domestic company. The court which decided this is higher than the court deciding r2.)

The proponent starts the dialogue with an argument P1 for ¬change, after which the opponent attacks this argument with an argument O1 for the opposite conclusion.
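The four conditions of the definition can be checked mechanically once a defeat relation is available. The sketch below assumes that defeat is supplied from outside as a function `defeats(priority_source, attacker, target)`, that strict defeat means defeating without being defeated in return, and that the proponent's opening move is unconstrained; the minimality requirement of condition 3 is not checked. The labels in `toy_defeats`, including the pure priority argument `PRIO`, are hypothetical stand-ins for the arguments of the tax dispute.

```python
from typing import Callable, List, Optional, Tuple

Move = Tuple[str, str]  # (player, argument label); player is "P" or "O"
# defeats(priority_source, attacker, target): does attacker defeat target when
# priorities are read from priority_source? None plays the role of the empty set.
Defeats = Callable[[Optional[str], str, str], bool]


def is_dialogue(moves: List[Move], defeats: Defeats) -> bool:
    if not moves:
        return False
    proponent_args = set()
    for i, (player, arg) in enumerate(moves, start=1):
        # Condition 1: P moves at odd positions, O at even positions.
        if player != ("P" if i % 2 == 1 else "O"):
            return False
        if i == 1:  # assumption: the opening move is not further constrained
            proponent_args.add(arg)
            continue
        previous = moves[i - 2][1]
        if player == "O":
            # Condition 4: O's argument defeats P's last one, ignoring priorities.
            if not defeats(None, arg, previous):
                return False
            continue
        # Condition 2: the proponent may not repeat an argument.
        if arg in proponent_args:
            return False
        proponent_args.add(arg)
        # Condition 3(a): strict defeat under the priorities stated by arg.
        strict = defeats(arg, arg, previous) and not defeats(arg, previous, arg)
        # Condition 3(b): arg's priorities neutralise O's last attack.
        neutralised = not defeats(arg, previous, moves[i - 3][1])
        if not (strict or neutralised):
            return False
    return True


def toy_defeats(priority_source: Optional[str], attacker: str, target: str) -> bool:
    """Hand-coded defeat relation for the tax dispute (illustrative only)."""
    if (attacker, target) == ("P1", "O1"):
        return True
    if (attacker, target) == ("O1", "P1"):
        return priority_source != "PRIO"  # PRIO prefers P1's rule over O1's
    if (attacker, target) == ("P2", "O1"):
        return True
    if (attacker, target) == ("O1", "P2"):
        return priority_source != "P2"    # P2 states r2 < r4
    return False


print(is_dialogue([("P", "P1"), ("O", "O1"), ("P", "P2")], toy_defeats))
```

Passing the priority source explicitly reflects the fact that defeat is relative to the priorities an argument itself states, which is what separates the proponent's moves from the opponent's.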
O1 defeats P1 as required, since in our logical system two rebutting arguments defeat each other if no priorities are stated. P2 illustrates the first possible reply of the proponent to an opponent’s move: it combines an object level argument for the conclusion domestic-company with a priority argument that gives r4 precedence over r2 and thus makes P2 strictly defeat O1. The second possibility, just stating a priority argument that neutralises the opponent’s move, is illustrated by the following alternative move, which resolves the conflict between P1 and O1 in favour of P1:

P2: [f5: r1 is more recent than r3, p: r1 is more recent than r3 ⇒ r3 ≺ r1]

Footnote 3: Facts fi: → pi are abbreviated as fi: pi.

Kowalski & Toni

Like Prakken and Sartor, Kowalski and Toni [1996] also apply the abstract approach of [Dung, 1995, Bondarenko et al., 1997] to the legal domain, instantiating it with extended logic programming. Among other things, they show how priority principles can be encoded in the object language without having to refer to priorities in the metatheory of the system. We illustrate their method using the language of [Prakken and Sartor, 1996]. Kowalski and Toni split each rule r: P ⇒ Q into two rules

Applicable(r) ⇒ Q
P ∧ ∼Defeated(r) ⇒ Applicable(r)

The predicate Defeated is defined as follows:

r ≺ r′ ∧ Conflicting(r, r′) ∧ Applicable(r′) → Defeated(r)

Whether r ≺ r′ holds must be (defeasibly) derived from other information. Kowalski and Toni also define the Conflicting predicate in the object language.

Three Formal Reconstructions of HYPO-style Case-Based Reasoning

The dialectical nature of the HYPO system has inspired several logically inclined researchers to reconstruct HYPO-style reasoning in terms of argument-based defeasible logics. We briefly discuss three of them, and refer to [Hage, 1997] for a reconstruction in Reason-based Logic (cf. Section 4.3 below).

Loui et al. (1993)

Loui et al. [1993] proposed a reconstruction of HYPO in the context of the argument-based logic of [Simari and Loui, 1992].
They mixed the pro and con factors of a precedent in one rule

Pro-factors ∧ Con-factors ⇒ Decision

but then implicitly extended the case description with rules containing a superset of the pro factors and/or a subset of the con factors in this rule. Loui et al. also studied the combination of reasoning with rules and cases. This work was continued in [Loui and Norman, 1995] (discussed below in Section 4.5).


Prakken and Sartor (1998)

Prakken and Sartor [1998] have modelled HYPO-style reasoning in their [1996] system, while adding further expressiveness. Like Loui et al. [1993], they translate HYPO’s cases into a defeasible-logical theory. However, unlike Loui et al., Prakken and Sartor separate the pro and con factors into two conflicting rules, and capture the case decision with a priority rule. This method is an instance of a more general idea (taken from [Loui and Norman, 1995]) to represent precedents as a set of arguments pro and con the decision, and to capture the decision by a justified priority argument that in turn makes the argument for the decision justified. In its simplest form where, as in HYPO, there are just a decision and sets of factors pro and con the decision, this amounts to having a pair of rules

r1: Pro-factors ⇒ Decision
r2: Con-factors ⇒ ¬Decision

and an unconditional priority rule

p: ⇒ r1 ≻ r2

However, in general arguments can be multi-step (as suggested by [Branting, 1994]) and priorities can very well be the justified outcome of a competition between arguments. Analogy is now captured by a ‘rule broadening’ heuristic, which deletes the antecedents missing in the new case. And distinguishing is captured by a heuristic which introduces a conflicting rule ‘if these factors are absent, then the consequent of your broadened rule does not hold’. So if a case rule is r1: f1 ∧ f2 ⇒ d, and the CFS consists of f1 only, then r1 is analogised by b(r1): f1 ⇒ d, and b(r1) is distinguished by d(b(r1)): ¬f2 ⇒ ¬d. To capture the heuristic nature of these moves, Prakken and Sartor ‘dynamify’ their [1996] dialectical proof procedure, to let it cope with the introduction of new premises. Finally, [Prakken, 2002], inspired by [Bench-Capon and Sartor, 2001], shows how within this setup cases can be compared not on factual similarities but on the basis of underlying values.
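The broadening and distinguishing heuristics described above can be sketched as two small functions. The encoding is an assumption chosen for brevity: a rule is a pair of a factor set and a consequent, strong negation is written with a leading `-`, and several missing factors are read conjunctively. It is not the representation used in Prakken and Sartor's system.

```python
from typing import FrozenSet, Tuple

Rule = Tuple[FrozenSet[str], str]  # (antecedent factors, consequent)


def negate(literal: str) -> str:
    """Strong negation, written here with a leading '-' (an ASCII stand-in)."""
    return literal[1:] if literal.startswith("-") else "-" + literal


def broaden(rule: Rule, current_facts: FrozenSet[str]) -> Rule:
    """Analogise: drop the antecedents that are missing in the new case."""
    antecedents, consequent = rule
    return (frozenset(antecedents & current_facts), consequent)


def distinguish(rule: Rule, broadened: Rule) -> Rule:
    """Distinguish: the absence of the dropped factors supports the opposite outcome."""
    missing = rule[0] - broadened[0]
    return (frozenset(negate(f) for f in missing), negate(rule[1]))


# The example from the text: r1: f1 and f2 => d, with a CFS containing only f1.
r1: Rule = (frozenset({"f1", "f2"}), "d")
cfs = frozenset({"f1"})

b_r1 = broaden(r1, cfs)          # b(r1): f1 => d
d_b_r1 = distinguish(r1, b_r1)   # d(b(r1)): -f2 => -d
print(b_r1, d_b_r1)
```

Both functions only generate candidate premises; whether the broadened rule or its distinguishing counter-rule prevails is left to the dialectical proof procedure.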
Horty (1999)

Horty [1999] has reconstructed HYPO-style reasoning in terms of his own work on two other topics: defeasible inheritance and defeasible deontic logic. Since inheritance systems are a forerunner of logics for defeasible argumentation, Horty’s reconstruction can also be regarded as argument-based. It addresses the analogical citation of cases and the construction of multi-step arguments. To support the citation of precedents on their intermediate steps, cases are separated into ‘precedent constituents’, which contain a set of factors and a possibly intermediate outcome. Arguments are sequences of factor sets, starting with the current fact situation and further constructed by iteratively applying precedent constituents that share at least one factor with the set constructed thus far. Conflicting uses of precedent constituents are compared with a variant of HYPO’s more-on-point similarity criterion. The dialectical status of


the constructible arguments is then assessed by adapting notions from Horty’s inheritance systems, such as ‘preemption’.

Other Work on Argument-Based Logics

Legal applications of argument-based logic programming have also been studied by Nitta and his colleagues; see e.g. [Nitta and Shibasaki, 1997]. Besides rule application, their argument construction principles also include some simple forms of analogical reasoning. However, no undercutters for analogical arguments are defined. The system also has a rudimentary dialogue game component. Formal work on dialectical proof theory with an eye to legal reasoning has been done by Jakobovits and Vermeir [1999]. Their focus is more on technical development than on legal applications.

4.3 Reason-Based Logic

Hage [1996, 1997] and Verheij [1996] have developed a formalism for legal reasoning called ‘reason-based logic’ (RBL), centering around a deep philosophical account of the concept of a rule. It is meant to capture how legal (and other) principles, goals and rules give rise to reasons for and against a proposition and how these reasons can be used to draw conclusions. The underlying view on principles, rules and reasons is influenced by insights from analytical philosophy on the role of reasons in practical reasoning, especially [Raz, 1975]. Hage and Verheij stress that rule application is much more than simple modus ponens. It involves reasoning about the validity and applicability of a rule, and weighing reasons for and against the rule’s consequent.

RBL’s View on Legal Knowledge

RBL reflects a distinction between two levels of legal knowledge. The primary level includes principles and goals, while the secondary level includes rules. Principles and goals express reasons for or against a conclusion. Without the secondary level these reasons would in each case have to be weighed to obtain a conclusion, but according to Hage and Verheij rules express the outcome of a certain weighing process. Therefore, a rule does not only generate a reason for its consequent but also generates a so-called ‘exclusionary’ reason against applying the principles underlying the rule: the rule replaces the reasons on which it is based. This view is similar to Dworkin’s [1977] well-known view that while principles are weighed against each other, rules apply in an all-or-nothing fashion. However, according to Hage [1996] and Verheij [Verheij et al., 1998] this difference is just a matter of degree: if new reasons come up, which were not taken into account in formulating the rule, then these new reasons are not excluded by the rule; the reason based on the rule still has to be compared with the reasons based on the new principles.
Consequently, in RBL rules and principles are syntactically indistinguishable; their difference is only reflected in their degree of interaction with other rules and principles (but Hage [1997] deviates somewhat from this account).


A Sketch of the Formal System

To capture reasoning about rules, RBL provides the means to express properties of rules in the object language. To this end Hage and Verheij use a sophisticated naming technique, viz. reification, well-known from metalogic and AI [Genesereth and Nilsson, 1988, p. 13], in which every predicate constant and logical symbol is named by a function expression. For instance, the conjunction R(a) ∧ S(b) is denoted by the infix function expression r(a) ∧ s(b). Unlike the naming techniques used by [Gordon, 1995] and [Prakken and Sartor, 1996], RBL’s technique reflects the logical structure of the named formula. Rules are named with a function symbol rule, resulting in terms like

rule(r, p(x), q(x))

Here r is a ‘rule identifier’, p(x) is the rule’s condition, and q(x) is its consequent. RBL’s object language does not contain a conditional connective corresponding to the function symbol rule; rules can only be stated indirectly, by assertions that they are valid, as in

Valid(rule(r, condition_r, conclusion_r))

Hage and Verheij state RBL as extra inference rules added to standard first-order logic or, in some versions, as extra semantic constraints on models of a first-order theory. We first summarise the most important rules and then give some (simplified) formalisations.

1. If a rule is valid, its conditions are satisfied and there is no evidence that it is excluded, the rule is applicable.
2. If a rule is applicable, it gives rise to a reason for its application.
3. A rule applies if and only if the set of all derivable reasons for its application outweighs the set of all derivable reasons against its application.
4. If a rule applies, it gives rise to a reason for its consequent.
5. A formula is a conclusion of the premises if and only if the reasons for the formula outweigh the reasons against the formula.

Here is what a simplified formal version of inference rule (1) looks like.
Note that condition and consequent are variables, which can be instantiated with the name of any formula.

If Valid(rule(r, condition, consequent)) is derivable
and Obtains(condition) is derivable
and Excluded(r) is not derivable,
then Applicable(r, rule(condition, consequent)) is derivable.

Rule (4) has the following form.

If Applies(r, rule(condition, consequent)) is derivable,
then Proreason(consequent) is derivable.
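As a rough illustration of inference rules (1) and (4), the sketch below treats derivability as plain set membership and assumes that there are no reasons against applying a rule, so that every applicable rule also applies. Both simplifications, and the names `Rule`, `is_applicable` and `pro_reasons`, are assumptions of this sketch; RBL itself defines these notions through extensions in the style of default logic.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class Rule:
    ident: str       # the rule identifier r
    condition: str   # name of the condition formula
    consequent: str  # name of the consequent formula


def is_applicable(rule: Rule, valid: Set[Rule], obtains: Set[str],
                  excluded: Set[str]) -> bool:
    """Rule (1): valid, condition obtains, and exclusion is not derivable."""
    return (rule in valid
            and rule.condition in obtains
            and rule.ident not in excluded)


def pro_reasons(rules: Iterable[Rule], valid: Set[Rule], obtains: Set[str],
                excluded: Set[str]) -> Dict[str, Set[str]]:
    """Rule (4), under the assumption that every applicable rule applies:
    map each consequent to the identifiers of the rules supporting it."""
    reasons: Dict[str, Set[str]] = {}
    for rule in rules:
        if is_applicable(rule, valid, obtains, excluded):
            reasons.setdefault(rule.consequent, set()).add(rule.ident)
    return reasons


# Hypothetical rules, used only to exercise the two functions.
ra = Rule("ra", "thief(john)", "punishable(john)")
rb = Rule("rb", "minor(john)", "not_punishable(john)")
valid_rules = {ra, rb}
facts = {"thief(john)", "minor(john)"}

print(pro_reasons([ra, rb], valid_rules, facts, excluded=set()))
print(pro_reasons([ra, rb], valid_rules, facts, excluded={"ra"}))
```

In the first call both consequents receive a reason, which is exactly the situation in which RBL requires the conflicting reasons to be weighed; in the second call the exclusion of `ra` removes its reason before any weighing takes place.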


Finally, here is how rule (5) connects the object level and the metalevel.

If Outweighs(Proreasons(formula), Conreasons(formula)) is derivable,
then Formula is derivable.

Whether the pro-reasons outweigh the con-reasons must itself be derived from the premises. The only built-in constraint is that any nonempty set outweighs the empty set. Note that while formula is a variable for an object term, occurring in a well-formed formula of RBL, Formula is a metavariable which stands for the formula named by the term formula. This is how object level and metalevel are connected in RBL. In RBL the derivability of certain formulas is defined in terms of the nonderivability of other formulas. For instance, in (1) it may not be derivable that the rule is excluded. To deal with this, RBL adapts techniques of default logic, by restating the inference rules as conditions on membership of an extension.

Using RBL

In RBL exceptions can be represented both explicitly and implicitly. As for explicit exceptions, since RBL has the validity and applicability requirements for rules built into the logic, the exclusion method of Section 2 can be used. RBL also supports the choice approach: if two conflicting rules both apply and do not exclude each other, then their application gives rise to conflicting reasons, which have to be weighed. Finally, Hage and Verheij formalise legal priority principles in a similar way as [Kowalski and Toni, 1996], representing them as inapplicability rules. The following example illustrates their method with the three well-known legal principles Lex Superior (the higher regulation overrides the lower one), Lex Posterior (the later rule overrides the earlier one) and Lex Specialis (the specificity principle). It is formalised in the language of [Prakken and Sartor, 1996]; recall that with respect to applicability, this system follows, like RBL, the exclusion approach. The three principles can be expressed as follows.
H: x conflicts with y ∧ y is inferior to x ∧ ∼¬appl(x) ⇒ ¬appl(y)
T: x conflicts with y ∧ y is earlier than x ∧ ∼¬appl(x) ⇒ ¬appl(y)
S: x conflicts with y ∧ x is more specific than y ∧ ∼¬appl(x) ⇒ ¬appl(y)

Likewise for the ordering of these three principles:

HT: T conflicts with H ∧ ∼¬appl(H) ⇒ ¬appl(T)
TS: S conflicts with T ∧ ∼¬appl(T) ⇒ ¬appl(S)
HS: S conflicts with H ∧ ∼¬appl(H) ⇒ ¬appl(S)

Thus the metatheory of the logic does not have to refer to priorities. However, the method contains another metareasoning feature, viz. the ability to express metalevel statements of the kind x conflicts with y.
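The combined effect of the three principles and their ordering can be illustrated procedurally. The sketch below assumes that each norm carries three comparable attributes (`rank`, `year`, `specificity`, all hypothetical) and that the two norms are already known to conflict; it collapses the defeasible rules H, T and S and the ordering rules HT, TS and HS into a sequential test, and it ignores the ∼¬appl(x) condition, so it is a simplification rather than an implementation of the formalisation above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Norm:
    name: str
    rank: int         # higher value = issued by a higher authority
    year: int         # year of enactment
    specificity: int  # higher value = more specific


def inapplicable(x: Norm, y: Norm) -> Optional[Norm]:
    """For two conflicting norms, return the one that is made inapplicable.

    Lex Superior is consulted first, then Lex Posterior, then Lex Specialis,
    mirroring the ordering H over T over S. None means no principle decides.
    """
    if x.rank != y.rank:                # Lex Superior (H)
        return x if x.rank < y.rank else y
    if x.year != y.year:                # Lex Posterior (T)
        return x if x.year < y.year else y
    if x.specificity != y.specificity:  # Lex Specialis (S)
        return x if x.specificity < y.specificity else y
    return None


# Hypothetical norms for illustration.
statute = Norm("statute", rank=2, year=1990, specificity=1)
bylaw = Norm("bylaw", rank=1, year=2001, specificity=3)
old_statute = Norm("old_statute", rank=2, year=1975, specificity=2)

print(inapplicable(statute, bylaw).name)        # bylaw
print(inapplicable(statute, old_statute).name)  # old_statute
```

The first call shows Lex Superior overriding both Lex Posterior and Lex Specialis: the bylaw is later and more specific, yet it loses because the statute has higher rank.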


Evaluation

RBL clearly confines itself to the logical and dialectical layer of legal argument. At these layers, it is a philosophically well-motivated analysis of legal reasoning, while technically it is very expressive, supporting reasoning with rules and exceptions, with conflicting rules, and about rules and their priority relations. However, it remains to be investigated how RBL can, given its complicated technical nature and the lack of the notion of an argument, be embedded in procedural and heuristic accounts of legal argument.

4.4 Procedural Accounts of Legal Reasoning

The Pleadings Game is not the only procedural AI & Law model. We now briefly discuss some formal models of this kind.

Hage, Leenes, and Lodder

At the same time as Gordon designed his system, Hage et al. [1994] developed a procedural account of Hart’s distinction between clear and hard cases. They argued that whether a case is easy or hard depends on the stage of a procedure: a case that is easy at an earlier stage can be made hard by introducing new information. This is an instance of their purely procedural view on the law, which incorporates substantive law by the judge’s obligation to apply it. To formalise this account, a Hamblin-MacKenzie-style formal dialogue system with the possibility of counterargument was developed. This work was extended by [Lodder, 1999] in his DiaLaw system. The general setup of these systems is the same as that of the Pleadings Game. For the technical differences the reader is referred to the above publications. One difference at the dialectical layer is that instead of an argument-based logic, Hage and Verheij’s reason-based logic is used. Another difference in [Hage et al., 1994] is that it includes a third party, the referee, who is entitled to decide whether certain claims should be accepted by the parties or not. The dialogue systems also support disputes about the procedural legality of a move. Finally, arguments do not have to be logically valid; the only use of reason-based logic is to determine whether a claim of one’s opponent follows from one’s commitments and therefore must be accepted.

Bench-Capon

Bench-Capon [1998] has also developed a dialogue game for legal argument. Like the above-discussed games, it has the possibility of counterargument (although it does not incorporate a formalised account of the dialectical layer). The game also has a referee, with roughly the same role as in [Hage et al., 1994].
Bench-Capon’s game is especially motivated by the desire to generate more natural dialogues than the “stilted” ones of Hamblin-MacKenzie-style systems. To this end, arguments are defined as variants of Toulmin’s [1958] argument structures, containing a claim, data for this claim, a warrant connecting data and claim, a backing for the warrant, and possible rebuttals of the claim with an exception. The available speech acts refer to the use or attack of these items, which, according to Bench-Capon, induces natural dialogues.


Formalising Allocations of the Burden of Proof

Above we supported Allen and Saxon’s [1989] criticism of Sergot et al.’s [1986] purely logical- and dialectical-layer account of reasoning with exceptions. Additional support is provided by Prakken [2001a], who argues that allocations of burden of proof cannot be modelled by ‘traditional’ nonmonotonic means. Burden of proof is one of the central notions of legal procedure, and it is clearly connected with defeasibility [Loui, 1995, Sartor, 1995]. There are two aspects of having the burden of proving a claim: the task to come up with an argument for that claim, and the task to uphold this argument against challenge in a dispute. The first aspect can be formalised in Hamblin-MacKenzie-style dialogue systems (discussed above in Section 2.3). The second aspect requires a system that assesses arguments on the basis of the dialectical interactions between all available arguments. At first sight, it would seem that dialectical proof theories of nonmonotonic logics can be directly applied here. However, there is a problem, which we shall illustrate with an example from contract law. In legal systems it is generally the case that the one who argues that a valid contract exists has the burden of proving those facts that ordinarily give rise to the contract, while the party who denies the existence of the contract has the burden of proving why, despite these facts, exceptional circumstances prevent the contract from being valid. Now suppose that plaintiff claims that a contract between him and defendant exists because plaintiff offered to sell defendant his car, and defendant accepted. Then plaintiff has the burden of proving that there was such an offer and acceptance, while defendant has the burden of proving, for instance, that the car had a hidden defect.
Suppose we formalise this in [Prakken and Sartor, 1996] as follows:

r1: offer ∧ acceptance ∧ ∼exception(r1) ⇒ contract
r2: hidden defect ⇒ exception(r1)

Suppose further that in the dispute arguments for and against hidden defect are exchanged, and that the judge regards them as being of equal strength. What follows dialectically? If plaintiff starts with moving his argument for contract, then defendant can assumption-defeat this argument with her argument for exception(r1). Plaintiff cannot attack this with his argument against hidden defect since it is of equal strength to defendant’s argument, so it does not strictly defeat it. In conclusion, plaintiff’s argument is not justified (but merely defensible), so the outcome of our logical reconstruction is that plaintiff has not fulfilled his burden of proof. However, the problem with this reconstruction is that it ignores that neither has defendant fulfilled her burden of proof: she had to prove hidden defect, but her argument for this conclusion also is merely defensible. The problem with the (sceptical) dialectical proof theory is that plaintiff has the burden of proof with respect to all subissues of the dispute; there is no way to distribute the burden of proof over the parties, as is common in legal dispute. This problem is not confined to the particular system or knowledge representation method, but seems a fundamental problem of current ‘traditional’ nonmonotonic logics.
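The outcome of the contract example can be reproduced at the level of an abstract argumentation framework. The sketch below abstracts the dispute into three argument labels (an assumption: `A` for plaintiff's argument for contract, `B` for defendant's argument for exception(r1), `C` for plaintiff's argument against hidden defect) and computes the grounded extension by iterating Dung's characteristic function from the empty set. Priorities and the internal structure of arguments are not modelled.

```python
from typing import Set, Tuple


def grounded_extension(arguments: Set[str],
                       defeats: Set[Tuple[str, str]]) -> Set[str]:
    """Least fixed point of the characteristic function of a finite framework.

    An argument is acceptable with respect to a set S if every argument that
    defeats it is itself defeated by some member of S.
    """
    extension: Set[str] = set()
    while True:
        acceptable = {
            a for a in arguments
            if all(any((d, b) in defeats for d in extension)
                   for b in arguments if (b, a) in defeats)
        }
        if acceptable == extension:
            return extension
        extension = acceptable


args = {"A", "B", "C"}

# Equal strength: B and C defeat each other, and B assumption-defeats A.
equal_strength = {("B", "A"), ("B", "C"), ("C", "B")}
print(grounded_extension(args, equal_strength))  # set()

# If the judge had preferred plaintiff's evidence, C would strictly defeat B.
plaintiff_stronger = {("B", "A"), ("C", "B")}
print(sorted(grounded_extension(args, plaintiff_stronger)))  # ['A', 'C']
```

With arguments of equal strength the grounded extension is empty, so neither plaintiff's claim nor defendant's exception is justified, which is the asymmetry-free result that a sceptical semantics produces and that fails to reflect the legal division of the burden of proof.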


An additional problem for such logics is that in legal procedure the allocation of the burden of proof is ultimately a matter of decision by the judge, and therefore cannot be determined by logical form. Any full model of reasoning under burden of proof should leave room for such decisions. In [Prakken, 2001a] the dialectical proof theory for grounded semantics is adapted to enable distributions of the burden of proof over the parties, which in [Prakken, 2001b] is embedded in a dialogue game model for legal procedure. The basic idea of [Prakken, 2001a] is that the required strength of a move depends on who has the burden of proof concerning the issue under attack (as decided by the judge in the dialogue game). The resulting system has no clear link with argument-based semantics in the style of [Dung, 1995, Bondarenko et al., 1997]. For logicians this is perhaps disappointing, but for others this will count as support for the view that the semantics of (legal) defeasible reasoning is essentially procedural.

ZENO’s argumentation framework

Another account of distributions of the burden of proof in dialectical systems is given by Gordon and Karaçapilidis [1997]. In fact, [Prakken, 2001a]’s proposal can partly be seen as a generalisation and logical formalisation of this account. Gordon and Karaçapilidis incorporate variants of Freeman and Farley’s ‘levels of proof’ in their ‘ZENO argumentation framework’. This is the dialectical-layer part of the ZENO argument mediation system: it maintains a ‘dialectical graph’ of the issues, the positions with respect to these issues, and the arguments pro and con these positions that have been advanced in a discussion, including positions and arguments about the strength of other arguments. Arguments are links between positions.
Part of the framework is a status assignment to positions: each position is assigned in or out depending on two factors: the required level of proof for the position, and the relative strengths of the arguments pro and con the position that themselves have antecedents that are in. For instance, a position with level ‘scintilla of evidence’ is in iff at least one argument pro is in (here they deviate from Freeman and Farley). And a position with level ‘preponderance of evidence’ is in iff the joint pro arguments that are in outweigh the joint con arguments that are in. The burden of proof can be distributed over the parties since levels of proof can be assigned to arbitrary positions instead of (as in [Freeman and Farley, 1996]) only to the initial claim of a dispute.
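The two levels of proof mentioned above can be sketched as a status function. The sketch assumes that each argument whose antecedents are in has a numeric weight and that ‘outweigh’ means a strictly larger sum of weights; both are illustrative assumptions, since the text does not say how ZENO compares the joint strength of arguments.

```python
from typing import Iterable


def position_status(level: str, pro_in: Iterable[float],
                    con_in: Iterable[float]) -> str:
    """Assign 'in' or 'out' to a position.

    pro_in and con_in hold the (assumed numeric) weights of the pro and con
    arguments whose own antecedents are in.
    """
    pro, con = list(pro_in), list(con_in)
    if level == "scintilla of evidence":
        # in iff at least one pro argument is in
        return "in" if pro else "out"
    if level == "preponderance of evidence":
        # in iff the joint pro arguments outweigh the joint con arguments
        return "in" if sum(pro) > sum(con) else "out"
    raise ValueError("unsupported level of proof: " + level)


print(position_status("scintilla of evidence", [0.2], [0.9]))      # in
print(position_status("preponderance of evidence", [0.2], [0.9]))  # out
```

Because the level is a parameter of each call, different positions in the same dispute can be held to different standards, which is how the framework distributes the burden of proof over the parties.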

4.5 Formalisations of the Heuristic Layer

In logical models of legal argument the heuristic layer has so far received very little attention. Above we discussed Prakken and Sartor’s [1998] logical reconstruction of HYPO-style ana


Foreword

Alan Robinson

This set of essays pays tribute to Bob Kowalski on his 60th birthday, an anniversary which gives his friends and colleagues an excuse to celebrate his career as an original thinker, a charismatic communicator, and a forceful intellectual leader. The logic programming community hereby and herein conveys its respect and thanks to him for his pivotal role in creating and fostering the conceptual paradigm which is its raison d'être. The diversity of interests covered here reflects the variety of Bob's concerns. Read on. It is an intellectual feast. Before you begin, permit me to send him a brief personal, but public, message: Bob, how right you were, and how wrong I was.

I should explain. When Bob arrived in Edinburgh in 1967 resolution was as yet fairly new, having taken several years to become at all widely known. Research groups to investigate various aspects of resolution sprang up at several institutions, the one organized by Bernard Meltzer at Edinburgh University being among the first. For the half-dozen years that Bob was a leading member of Bernard's group, I was a frequent visitor to it, and I saw a lot of him. We had many discussions about logic, computation, and language. By 1970, the group had zeroed in on three ideas which were soon to help make logic programming possible: the specialized inference rule of linear resolution using a selection function, together with the plan of restricting it to Horn clauses ("LUSH resolution"); the adoption of an operational semantics for Horn clauses; and a marvellously fast implementation technique for linear resolution, based on structure-sharing of syntactic expressions. Bob believed that this work now made it possible to use the predicate calculus as a programming language. I was sceptical. My focus was still on the original motivation for resolution, to build better theorem provers. I worried that Bob had been sidetracked by an enticing illusion.
In particular because of my intellectual investment in the classical semantics of predicate logic I was quite put off by the proposed operational semantics for Horn clauses. This seemed to me nothing but an adoption of MIT's notorious "Planner" ideology of computational inference. I did try, briefly, to persuade Bob to see things my way, but there was no stopping him. Thank goodness I could not change his mind, for I soon had to change mine. In 1971, Bob and Alain Colmerauer first got together. They pooled their thinking. The rest is history. The idea of using predicate logic as a programming language then really boomed, propelled by the rush of creative energy generated by the ensuing Marseilles-Edinburgh synergy. The merger of Bob's and Alain's independent insights launched a new era. Bob's dream came true, confirmed by the spectacular practical success of Alain's Prolog. My own doubts were swept away. In the thirty years since then, logic programming has developed into a jewel of computer science, known all over the world. Happy 60th birthday, Bob, from all of us.

Preface Bob Kowalski together with Alain Colmerauer opened up the new field of Logic Programming back in the early 1970s. Since then the field has expanded in various directions and has contributed to the development of many other areas in Computer Science. Logic Programming has helped to place logic firmly as an integral part of the foundations of Computing and Artificial Intelligence. In particular, over the last two decades a new discipline has emerged under the name of Computational Logic which aims to promote logic as a unifying basis for problem solving. This broad role of logic was at the heart of Bob Kowalski’s work from the very beginning as expounded in his seminal book “Logic for Problem Solving.” He has been instrumental both in shaping this broader scientific field and in setting up the Computational Logic community. This volume commemorates the 60th birthday of Bob Kowalski as one of the founders of and contributors to Computational Logic. It aspires to provide a landmark of the main developments in the field and to chart out its possible future directions. The authors were encouraged to provide a critical view of the main developments of the field together with an outlook on the important emerging problems and the possible contribution of Computational Logic to the future development of its related areas. The articles in this volume span the whole field of Computational Logic seen from the point of view of Logic Programming. They range from papers addressing problems concerning the development of programming languages in logic and the application of Computational Logic to real-life problems, to philosophical studies of the field at the other end of the spectrum. Articles cover the contribution of CL to Databases and Artificial Intelligence with particular interest in Automated Reasoning, Reasoning about Actions and Change, Natural Language, and Learning. It has been a great pleasure to help to put this volume together. 
We were delighted (but not surprised) to find that everyone we asked to contribute responded positively and with great enthusiasm, expressing their desire to honour Bob Kowalski. This enthusiasm remained throughout the long process of reviewing (in some cases a third reviewing process was necessary) that the invited papers had to go through in order for the decision to be made, whether they could be accepted for the volume. We thank all the authors very much for their patience and we hope that we have done justice to their efforts. We also thank all the reviewers, many of whom were authors themselves, who exhibited the same kind of zeal towards the making of this book. A special thanks goes out to Bob himself for his tolerance with our continuous stream of questions and for his own contribution to the book – his personal statement on the future of Logic Programming. Bob has had a major impact on our lives, as he has had on many others. I, Fariba, first met Bob when I visited Imperial College for an interview as a PhD applicant. I had not even applied for logic programming, but, somehow, I ended up being interviewed by Bob. In that very first meeting his enormous enthusiasm and energy for his subject was fully evident, and soon afterwards I found myself registered to do a PhD in logic

programming under his supervision. Since then, throughout all the years, Bob has been a constant source of inspiration, guidance, friendship, and humour. For me, Antonis, Bob did not supervise my PhD as this was not in Computer Science. I met Bob well after my PhD and I became a student again. I was extremely fortunate to have Bob as a new teacher at this stage. I already had some background in research and thus I was better equipped to learn from his wonderful and quite unique way of thought and scientific endeavour. I was also very fortunate to find in Bob a new good friend. Finally, on a more personal note the first editor wishes to thank Kim for her patient understanding and support with all the rest of life’s necessities thus allowing him the selfish pleasure of concentrating on research and other academic matters such as putting this book together. Antonis Kakas and Fariba Sadri

Table of Contents, Part II

VI Logic in Databases and Information Integration

MuTACLP: A Language for Temporal Reasoning with Multiple Theories
  Paolo Baldan, Paolo Mancarella, Alessandra Raffaetà, Franco Turini
Description Logics for Information Integration
  Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini
Search and Optimization Problems in Datalog
  Sergio Greco, Domenico Saccà
The Declarative Side of Magic
  Paolo Mascellani, Dino Pedreschi
Key Constraints and Monotonic Aggregates in Deductive Databases
  Carlo Zaniolo

VII Automated Reasoning

A Decidable CLDS for Some Propositional Resource Logics
  Krysia Broda
A Critique of Proof Planning
  Alan Bundy
A Model Generation Based Theorem Prover MGTP for First-Order Logic
  Ryuzo Hasegawa, Hiroshi Fujita, Miyuki Koshimura, Yasuyuki Shirai
A 'Theory' Mechanism for a Proof-Verifier Based on First-Order Set Theory
  Eugenio G. Omodeo, Jacob T. Schwartz
An Open Research Problem: Strong Completeness of R. Kowalski's Connection Graph Proof Procedure
  Jörg Siekmann, Graham Wrightson

VIII Non-deductive Reasoning

Meta-reasoning: A Survey
  Stefania Costantini
Argumentation-Based Proof Procedures for Credulous and Sceptical Non-monotonic Reasoning
  Phan Minh Dung, Paolo Mancarella, Francesca Toni
Automated Abduction
  Katsumi Inoue
The Role of Logic in Computational Models of Legal Argument: A Critical Survey
  Henry Prakken, Giovanni Sartor

IX Logic for Action and Change

Logic Programming Updating - A Guided Approach
  José Júlio Alferes, Luís Moniz Pereira
Representing Knowledge in A-Prolog
  Michael Gelfond
Some Alternative Formulations of the Event Calculus
  Rob Miller, Murray Shanahan

X Logic, Language, and Learning

Issues in Learning Language in Logic
  James Cussens
On Implicit Meanings
  Veronica Dahl
Data Mining as Constraint Logic Programming
  Luc De Raedt
DCGs: Parsing as Deduction?
  Chris Mellish
Statistical Abduction with Tabulation
  Taisuke Sato, Yoshitaka Kameya

XI Computational Logic and Philosophy

Logicism and the Development of Computer Science
  Donald Gillies
Simply the Best: A Case for Abduction
  Stathis Psillos

Author Index

Table of Contents, Part I

A Portrait of a Scientist as a Computational Logician
  Maurice Bruynooghe, Luís Moniz Pereira, Jörg H. Siekmann, Maarten van Emden
Bob Kowalski: A Portrait
  Marek Sergot
Directions for Logic Programming
  Robert A. Kowalski

I Logic Programming Languages

Agents as Multi-threaded Logical Objects
  Keith Clark, Peter J. Robinson
Logic Programming Languages for the Internet
  Andrew Davison
Higher-Order Computational Logic
  John W. Lloyd
A Pure Meta-interpreter for Flat GHC, a Concurrent Constraint Language
  Kazunori Ueda

II Program Derivation and Properties

Transformation Systems and Nondeclarative Properties
  Annalisa Bossi, Nicoletta Cocco, Sandro Etalle
Acceptability with General Orderings
  Danny De Schreye, Alexander Serebrenik
Specification, Implementation, and Verification of Domain Specific Languages: A Logic Programming-Based Approach
  Gopal Gupta, Enrico Pontelli
Negation as Failure through Abduction: Reasoning about Termination
  Paolo Mancarella, Dino Pedreschi, Salvatore Ruggieri
Program Derivation = Rules + Strategies
  Alberto Pettorossi, Maurizio Proietti

III Software Development

Achievements and Prospects of Program Synthesis
  Pierre Flener
Logic for Component-Based Software Development
  Kung-Kiu Lau, Mario Ornaghi
Patterns for Prolog Programming
  Leon Sterling

IV Extensions of Logic Programming

Abduction in Logic Programming
  Mark Denecker, Antonis Kakas
Learning in Clausal Logic: A Perspective on Inductive Logic Programming
  Peter Flach, Nada Lavrač
Disjunctive Logic Programming: A Survey and Assessment
  Jack Minker, Dietmar Seipel
Constraint Logic Programming
  Mark Wallace

V Applications in Logic

Planning Attacks to Security Protocols: Case Studies in Logic Programming
  Luigia Carlucci Aiello, Fabio Massacci
Multiagent Compromises, Joint Fixpoints, and Stable Models
  Francesco Buccafurri, Georg Gottlob
Error-Tolerant Agents
  Thomas Eiter, Viviana Mascardi, V.S. Subrahmanian
Logic-Based Hybrid Agents
  Christoph G. Jung, Klaus Fischer
Heterogeneous Scheduling and Rotation
  Thomas Sjöland, Per Kreuger, Martin Aronsson

Author Index

MuTACLP: A Language for Temporal Reasoning with Multiple Theories
Paolo Baldan, Paolo Mancarella, Alessandra Raffaetà, and Franco Turini
Dipartimento di Informatica, Università di Pisa, Corso Italia, 40, I-56125 Pisa, Italy
{baldan,p.mancarella,raffaeta,turini}@di.unipi.it

Abstract. In this paper we introduce MuTACLP, a knowledge representation language which provides facilities for modeling and handling temporal information, together with some basic operators for combining different temporal knowledge bases. The proposed approach stems from two separate lines of research: the general studies on meta-level operators on logic programs introduced by Brogi et al. [7,9] and Temporal Annotated Constraint Logic Programming (TACLP) defined by Frühwirth [15]. In MuTACLP atoms are annotated with temporal information, which is managed via a constraint theory, as in TACLP. Mechanisms for structuring programs and combining separate knowledge bases are provided through meta-level operators. The language is given two different but equivalent semantics: a top-down semantics which exploits meta-logic, and a bottom-up semantics based on an immediate consequence operator.

1 Introduction

A.C. Kakas, F. Sadri (Eds.): Computat. Logic (Kowalski Festschrift), LNAI 2408, pp. 1–40, 2002. © Springer-Verlag Berlin Heidelberg 2002

Interest in research concerning the handling of temporal information has been growing steadily over the past two decades. On the one hand, much effort has been spent in developing extensions of logic languages capable of dealing with time (see, e.g., [14,36]). On the other hand, in the field of databases, many approaches have been proposed to extend existing data models, such as the relational, the object-oriented and the deductive models, to cope with temporal data (see, e.g., the books [46,13] and references therein). Clearly these two strands of research are closely related, since temporal logic languages can provide solid theoretical foundations for temporal databases, and powerful knowledge representation and query languages for them [11,17,35].

Another basic motivation for our work is the need for mechanisms for combining pieces of knowledge which may be separated into various knowledge bases (e.g., distributed over the web), and which therefore have to be merged in order to reason with them.

This paper aims at building a framework where temporal information can be naturally represented and handled, and, at the same time, knowledge can be separated and combined by means of meta-level composition operators. Concretely, we introduce a new language, called MuTACLP, which is based on Temporal Annotated Constraint Logic Programming (TACLP), a powerful framework defined by Frühwirth in [15], where temporal information and reasoning can be naturally formalized. Temporal information is represented by temporal annotations which say at what time(s) the formula to which they are attached is valid. Such annotations make time explicit but avoid the proliferation of temporal variables and quantifiers of the first-order approach. In this way, MuTACLP supports quantitative temporal reasoning and allows one to represent definite, indefinite and periodic temporal information, and to work both with time points and time periods (time intervals). Furthermore, as a mechanism for structuring programs and combining different knowledge sources, MuTACLP offers a set of program composition operators in the style of Brogi et al. [7,9].

Concerning the semantical aspects, the use of meta-logic allows us to provide MuTACLP with a formal and, at the same time, executable top-down semantics based on a meta-interpreter. Furthermore the language is given a bottom-up semantics by introducing an immediate consequence operator which generalizes the operator for ordinary constraint logic programs. The two semantics are equivalent in the sense that the meta-interpreter can be proved sound and complete with respect to the semantics based on the immediate consequence operator.

An interesting aspect of MuTACLP is the fact that it integrates modularity and temporal reasoning, a feature which is not common to logical temporal languages (e.g., it is lacking in [1,2,10,12,15,16,21,28]). Two exceptions are the language Temporal Datalog by Orgun [35] and the work on amalgamating knowledge bases by Subrahmanian [45]. Temporal Datalog introduces a concept of module, which, however, seems to be used as a means for defining new non-standard algebraic operators, rather than as a knowledge representation tool.
On the other hand, the work on amalgamating knowledge bases oﬀers a multitheory framework, based on annotated logics, where temporal information can be handled, but only a limited interaction among the diﬀerent knowledge sources is allowed: essentially a kind of message passing mechanism allows one to delegate the resolution of an atom to other databases. In the database ﬁeld, our approach is close to the paradigm of constraint databases [25,27]. In fact, in MuTACLP the use of constraints allows one to model temporal information and to enable eﬃcient implementations of the language. Moreover, from a deductive database perspective, each constraint logic program of our framework can be viewed as an enriched relational database where relations are represented partly intensionally and partly extensionally. The meta-level operators can then be considered as a means of constructing views by combining diﬀerent databases in various ways. The paper is organized as follows. Section 2 brieﬂy introduces the program composition operators for combining logic theories of [7,9] and their semantics. Section 3, after reviewing the basics of constraint logic programming, introduces the language TACLP. Section 4 deﬁnes the new language MuTACLP, which integrates the basic ideas of TACLP with the composition operators on theories. In Section 5 the language MuTACLP is given a top-down semantics by means of a meta-interpreter and a bottom-up semantics based on an immediate consequence operator, and the two semantics are shown to be equivalent. Section 6 presents

some examples to clarify the use of operators on theories and to show the expressive power and the knowledge representation capabilities of the language. Section 7 compares MuTACLP with some related approaches in the literature and, ﬁnally, Section 8 outlines our future research plans. Proofs of propositions and theorems are collected in the Appendix. Due to space limitations, the proofs of some technical lemmata are omitted and can be found in [4,38]. An extended abstract of this paper has been presented at the International Workshop on Spatio-Temporal Data Models and Languages [33].

2 Operators for Combining Theories

Composition operators for logic programs have been thoroughly investigated in [7,9], where both their meta-level and their bottom-up semantics are studied and compared. In order to illustrate the basic notions and ideas of such an approach, this section describes the meta-level definition of the operators, which is simply obtained by adding new clauses to the well-known vanilla meta-interpreter for logic programs. The resulting meta-interpreter combines separate programs without actually building a new program. Its meaning is straightforward and, most importantly, the meta-logical definition shows that the multi-theory framework can be expressed from inside logic programming itself. We consider two operators to combine programs: union ∪ and intersection ∩. Then the so-called program expressions are built by starting from a set of plain programs, consisting of collections of clauses, and by repeatedly applying the composition operators. Formally, the language of program expressions Exp is defined by the following abstract syntax:

    Exp ::= Pname | Exp ∪ Exp | Exp ∩ Exp

where Pname is the syntactic category of constant names for plain programs. Following [6], the two-argument predicate demo is used to represent provability. Namely, demo(E, G) means that the formula G is provable with respect to the program expression E.

    demo(E, empty).
    demo(E, (B1, B2)) ← demo(E, B1), demo(E, B2)
    demo(E, A) ← clause(E, A, B), demo(E, B)

The unit clause states that the empty goal, represented by the constant symbol empty, is solved in any program expression E. The second clause deals with conjunctive goals. It states that a conjunction (B1, B2) is solved in the program expression E if B1 is solved in E and B2 is solved in E. Finally, the third clause deals with the case of atomic goal reduction. To solve an atomic goal A, a clause with head A is chosen from the program expression E and the body of the clause is recursively solved in E. We adopt the simple naming convention used in [29].
Object programs are named by constant symbols, denoted by capital letters like P and Q. Object level expressions are represented at the meta-level by themselves. In particular, object level variables are denoted by meta-level variables, according to the so-called non-ground representation. An object level program P is represented, at the meta-level, by a set of axioms of the kind clause(P, A, B), one for each object level clause A ← B in the program P. Each program composition operator is represented at the meta-level by a functor, whose meaning is defined by adding new clauses to the above meta-interpreter.

    clause(E1 ∪ E2, A, B) ← clause(E1, A, B)
    clause(E1 ∪ E2, A, B) ← clause(E2, A, B)
    clause(E1 ∩ E2, A, (B1, B2)) ← clause(E1, A, B1), clause(E2, A, B2)

The added clauses have a straightforward interpretation. Informally, union and intersection mirror two forms of cooperation among program expressions. In the case of union E1 ∪ E2, whose meta-level implementation is defined by the first two clauses, either expression E1 or E2 may be used to perform a computation step. For instance, a clause A ← B belongs to the meta-level representation of P ∪ Q if it belongs either to the meta-level representation of P or to the meta-level representation of Q. In the case of intersection E1 ∩ E2, both expressions must agree to perform a computation step. This is expressed by the third clause, which exploits the basic unification mechanism of logic programming and the non-ground representation of object level programs. A program expression E can be queried by demo(E, G), where G is an object level goal.
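The meta-interpreter described in this section can be sketched in Python for the special case of ground (propositional) programs, where unification reduces to equality of atoms. This is only an illustrative approximation under that assumption, not the authors' implementation: the program table PROGRAMS, the helper names union and inter, the tuple encoding of program expressions, and the depth bound used to stop looping programs are all hypothetical choices made for the example.

```python
from itertools import product

# Hypothetical plain programs: name -> list of (head, body) ground clauses.
PROGRAMS = {
    "P": [("a", ("b",)), ("b", ())],
    "Q": [("a", ("c",)), ("c", ()), ("d", ())],
}

def union(e1, e2):
    """Program expression E1 ∪ E2."""
    return ("union", e1, e2)

def inter(e1, e2):
    """Program expression E1 ∩ E2."""
    return ("inter", e1, e2)

def clause(expr, head):
    """Yield every body B such that clause(expr, head, B) holds."""
    if isinstance(expr, str):                  # plain program name
        for h, body in PROGRAMS[expr]:
            if h == head:
                yield body
    elif expr[0] == "union":                   # either side may supply the clause
        yield from clause(expr[1], head)
        yield from clause(expr[2], head)
    else:                                      # intersection: both sides must agree
        for b1, b2 in product(clause(expr[1], head), clause(expr[2], head)):
            yield b1 + b2                      # conjunction (B1, B2)

def demo(expr, goal, depth=50):
    """True if the conjunction of atoms in goal is provable in expr."""
    if not goal:                               # demo(E, empty)
        return True
    if depth == 0:                             # crude guard against looping programs
        return False
    return all(
        any(demo(expr, body, depth - 1) for body in clause(expr, atom))
        for atom in goal
    )
```

With these sample programs, a is provable in P, in Q and in P ∪ Q, but not in P ∩ Q, because the combined body (b, c) requires b and c to be derivable in both programs.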

3 Temporal Annotated CLP

In this section we first briefly recall the basic concepts of Constraint Logic Programming (CLP). Then we give an overview of Temporal Annotated CLP (TACLP), an extension of CLP suited to deal with time, which will be used as a basic language for plain programs in our multi-theory framework. The reader is referred to the survey of Jaffar and Maher [22] for a comprehensive introduction to the motivations, foundations, and applications of CLP languages, and to the recent work of Jaffar et al. [23] for the formal presentation of the semantics. A good reference for TACLP is Frühwirth's paper [15].

3.1 Constraint Logic Programming

A CLP language is completely determined by its constraint domain. A constraint domain $C$ is a tuple $\langle S_C, L_C, \mathcal{D}_C, T_C, \mathit{solv}_C \rangle$, where

– $S_C = \langle \Sigma_C, \Pi_C \rangle$ is the constraint domain signature, comprising the function symbols $\Sigma_C$ and the predicate symbols $\Pi_C$.
– $L_C$ is the class of constraints, a set of first-order $S_C$-formulae, denoted by C, possibly subscripted.
– $\mathcal{D}_C$ is the domain of computation, an $S_C$-structure which provides the intended interpretation of the constraints. The domain (or support) of $\mathcal{D}_C$ is denoted by $D_C$.
– $T_C$ is the constraint theory, an $S_C$-theory describing the logical semantics of the constraints.
– $\mathit{solv}_C$ is the constraint solver, a (computable) function which maps each formula in $L_C$ to either true, or false, or unknown, indicating that the formula is satisfiable, unsatisfiable or it cannot be told, respectively.

We assume that $\Pi_C$ contains the predicate symbol "=", interpreted as identity in $D_C$. Furthermore we assume that $L_C$ contains all atoms constructed from "=", the always satisfiable constraint true and the unsatisfiable constraint false, and that $L_C$ is closed under variable renaming, existential quantification and conjunction. A primitive constraint is an atom of the form $p(t_1, \ldots, t_n)$ where p is a predicate in $\Pi_C$ and $t_1, \ldots, t_n$ are terms on $\Sigma_C$. We assume that the solver does not take variable names into account. Also, the domain, the theory and the solver agree in the sense that $\mathcal{D}_C$ is a model of $T_C$ and for every $C \in L_C$:

– $\mathit{solv}_C(C) = \mathit{true}$ implies $T_C \models \exists C$, and
– $\mathit{solv}_C(C) = \mathit{false}$ implies $T_C \models \neg \exists C$.

Example 1. (Real) The constraint domain Real has <, <=, =, >=, > as predicate symbols, +, -, *, / as function symbols and sequences of digits (possibly with a decimal point) as constant symbols. Examples of primitive constraints are X + 3 10. The domain of computation is the structure with reals as domain, and where the predicate symbols <, <=, =, >=, > and the function symbols +, -, *, / are interpreted as the usual relations and functions over reals. Finally, the theory $T_{Real}$ is the theory of real closed fields. A possible constraint solver is provided by the CLP(R) system [24], which relies on Gauss-Jordan elimination to handle linear constraints.
Non-linear constraints are not taken into account by the solver (i.e., their evaluation is delayed) until they become linear.

Example 2. (Logic Programming) The constraint domain Term has = as predicate symbol and strings of alphanumeric characters as function or constant symbols. The domain of computation of Term is the set Tree of finite trees (or, equivalently, of finite terms), while the theory $T_{Term}$ is Clark's equality theory. The interpretation of a constant is a tree with a single node labeled by the constant. The interpretation of an n-ary function symbol f is the function $f_{Tree} : Tree^n \to Tree$ mapping the trees $t_1, \ldots, t_n$ to a new tree with root labeled by f and with $t_1, \ldots, t_n$ as children. A constraint solver is given by the unification algorithm. Then CLP(Term) coincides with logic programming.
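As an illustration of the solver for the domain Term in Example 2, the following Python sketch implements unification with an occurs check over finite trees. The term encoding is an assumption made for the example only: variables are strings starting with an upper-case letter, constants are other strings, and a compound term f(t1, ..., tn) is the tuple ("f", t1, ..., tn). The function names are hypothetical.

```python
def is_var(term):
    """Variables are strings that start with an upper-case letter."""
    return isinstance(term, str) and term[:1].isupper()

def walk(term, subst):
    """Follow variable bindings until an unbound variable or a non-variable."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term

def occurs(var, term, subst):
    """True if var occurs in term under subst (keeps the trees finite)."""
    term = walk(term, subst)
    if term == var:
        return True
    return isinstance(term, tuple) and any(occurs(var, a, subst) for a in term[1:])

def unify(a, b, subst=None):
    """Return a substitution solving a = b, or None if unsatisfiable."""
    subst = {} if subst is None else dict(subst)
    stack = [(a, b)]
    while stack:
        x, y = stack.pop()
        x, y = walk(x, subst), walk(y, subst)
        if x == y:
            continue
        if is_var(x):
            if occurs(x, y, subst):
                return None
            subst[x] = y
        elif is_var(y):
            if occurs(y, x, subst):
                return None
            subst[y] = x
        elif (isinstance(x, tuple) and isinstance(y, tuple)
              and x[0] == y[0] and len(x) == len(y)):
            stack.extend(zip(x[1:], y[1:]))    # same functor and arity: unify arguments
        else:
            return None                        # clash of functors or constants
    return subst
```

A result of None plays the role of the solver answer false, while a substitution witnesses satisfiability; this particular solver never needs to answer unknown.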

For a given constraint domain $C$, we denote by CLP($C$) the CLP language based on $C$. Our results are parametric to a language $L$ in which all programs and queries under consideration are included. The set of function symbols in $L$, denoted by $\Sigma_L$, coincides with $\Sigma_C$, while the set of predicate symbols $\Pi_L$ includes $\Pi_C$. A constraint logic program, or simply a program, is a finite set of rules of the form:

$$A \leftarrow C_1, \ldots, C_n, B_1, \ldots, B_m$$

where $A$ and $B_1, \ldots, B_m$ ($m \ge 0$) are atoms (whose predicate symbols are in $\Pi_L$ but not in $\Pi_C$), and $C_1, \ldots, C_n$ ($n \ge 0$) are primitive constraints¹ ($A$ is called the head of the clause and $C_1, \ldots, C_n, B_1, \ldots, B_m$ the body of the clause). If $m = 0$ then the clause is called a fact. A query is a sequence of atoms and/or constraints.

Interpretations and Fixpoints. A $C$-interpretation for a CLP($C$) program is an interpretation which agrees with $\mathcal{D}_C$ on the interpretations of the symbols in $L_C$. Formally, a $C$-interpretation $I$ is a subset of $C$-$\mathit{base}_L$, i.e. of the set $\{p(d_1, \ldots, d_n) \mid p \text{ predicate in } \Pi_L \setminus \Pi_C,\ d_1, \ldots, d_n \in D_C\}$. Note that the meaning of primitive constraints is not specified, being fixed by $C$. The notions of $C$-model and least $C$-model are a natural extension of the corresponding logic programming concepts. A valuation $\sigma$ is a function that maps variables into $D_C$. A $C$-ground instance $A'$ of an atom $A$ is obtained by applying a valuation $\sigma$ to the atom, thus producing a construct of the form $p(a_1, \ldots, a_n)$ with $a_1, \ldots, a_n$ elements in $D_C$. $C$-ground instances of queries and clauses are defined in a similar way. We denote by $\mathit{ground}_C(P)$ the set of $C$-ground instances of clauses from $P$. Finally the immediate consequence operator for a CLP($C$) program $P$ is a function $T_P^C : \wp(C\text{-}\mathit{base}_L) \to \wp(C\text{-}\mathit{base}_L)$ defined as follows:

$$T_P^C(I) = \left\{ A \;\middle|\; \begin{array}{l} A \leftarrow C_1, \ldots, C_k, B_1, \ldots, B_n \in \mathit{ground}_C(P), \\ \{B_1, \ldots, B_n\} \subseteq I,\ \mathcal{D}_C \models C_1, \ldots, C_k \end{array} \right\}$$

The operator $T_P^C$ is continuous, and therefore it has a least fixpoint which can be computed as the least upper bound of the $\omega$-chain $\{(T_P^C)^i\}_{i \ge 0}$ of the iterated applications of $T_P^C$ starting from the empty set, i.e., $(T_P^C)^\omega = \bigcup_{i \in \mathbb{N}} (T_P^C)^i$.
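The iteration towards the least fixpoint can be sketched in Python for the simple case in which the set of ground clause instances is finite. The encoding is an assumption made for this example only: each ground clause is a triple (head, constraints_hold, body), where constraints_hold is the truth value that the ground constraints already have in the domain of computation, and atoms are plain strings. The names immediate_consequence, least_fixpoint and CLAUSES are hypothetical.

```python
def immediate_consequence(ground_clauses, interpretation):
    """One application of T_P: heads whose constraints hold and whose body atoms are in I."""
    return frozenset(
        head
        for head, constraints_hold, body in ground_clauses
        if constraints_hold and set(body) <= interpretation
    )

def least_fixpoint(ground_clauses):
    """Iterate T_P from the empty set until nothing changes (finite case only)."""
    current = frozenset()
    while True:
        following = immediate_consequence(ground_clauses, current)
        if following == current:
            return current
        current = following

# Hypothetical ground instances of a small path/edge program.
CLAUSES = [
    ("edge(1,2)", True, ()),
    ("edge(2,3)", True, ()),
    ("path(1,2)", True, ("edge(1,2)",)),
    ("path(2,3)", True, ("edge(2,3)",)),
    ("path(1,3)", True, ("edge(1,2)", "path(2,3)")),
    ("path(3,1)", 3 < 1, ("edge(1,2)",)),   # ground constraint 3 < 1 is false
]
```

On this data the chain stabilises after three productive steps: first the two edge facts, then the two one-step paths, then path(1,3). The clause for path(3,1) never fires because its constraint is false in the domain.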

¹ Constraints and atoms can be in any position inside the body of a clause, although, for the sake of simplicity, we will always assume that the sequence of constraints precedes the sequence of atoms.

3.2 Temporal Annotated Constraint Logic Programming

Temporal Annotated Constraint Logic Programming (TACLP), proposed by Frühwirth in [15,39], has been shown to be a natural and powerful framework for formalizing temporal information and reasoning. In [15] TACLP is presented as an instance of annotated constraint logic (ACL) suited for reasoning about time. ACL, which can be seen as an extension of generalized annotated programs [26,30], generalizes basic first-order languages with a distinguished class of predicates, called constraints, and a distinguished class of terms, called annotations, used to label formulae. Moreover ACL provides inference rules for annotated formulae and a constraint theory for handling annotations. An advantage of the languages in the ACL framework is that their clausal fragment can be efficiently implemented: given a logic in this framework, there is a systematic way to make a clausal fragment executable as a constraint logic program. Both an interpreter and a compiler can be generated and implemented in standard constraint logic programming languages. We next summarize the syntax and semantics of TACLP. As mentioned above, TACLP is a constraint logic programming language where formulae can be annotated with temporal labels and where relations between these labels can be expressed by using constraints. In TACLP the choice of the temporal ontology is free. In this paper, we will consider the instance of TACLP where time points are totally ordered and labels involve convex, non-empty sets of time points. Moreover we will assume that only atomic formulae can be annotated and that clauses are negation free. With an abuse of notation, in the rest of the paper such a subset of the language will be referred to simply as TACLP. Time can be discrete or dense. Time points are totally ordered by the relation ≤. We denote by D the set of time points and we assume a set of operations (such as the binary operations +, −) to manage such points. We assume that the time-line is left-bounded by the number 0 and open to the future, with the symbol ∞ used to denote a time point that is later than any other.
A time period is an interval [r, s] with r, s ∈ D and 0 ≤ r ≤ s ≤ ∞, which represents the convex, non-empty set of time points {t | r ≤ t ≤ s}². Thus the interval [0, ∞] denotes the whole time line. An annotated formula is of the form A α where A is an atomic formula and α an annotation. In TACLP, there are three kinds of annotations based on time points and on time periods. Let t be a time point and J = [r, s] be a time period.

(at) The annotated formula A at t means that A holds at time point t.
(th) The annotated formula A th J means that A holds throughout, i.e., at every time point in, the time period J. The definition of a th-annotated formula in terms of at is: A th J ⇔ ∀t (t ∈ J → A at t).
(in) The annotated formula A in J means that A holds at some time point(s), but we do not know exactly which, in the time period J. The definition of an in-annotated formula in terms of at is: A in J ⇔ ∃t (t ∈ J ∧ A at t). The in temporal annotation accounts for indefinite temporal information.
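The definitions of th and in in terms of at can be checked directly on a small example. The Python sketch below assumes discrete integer time points, finite periods, and a knowledge base given as a plain set of (atom, time point) pairs; these are simplifications for illustration, since TACLP itself handles annotations through constraints rather than by enumerating time points. The function names and the sample facts are hypothetical.

```python
def holds_at(facts, atom, t):
    """A at t: the atom is recorded as holding at time point t."""
    return (atom, t) in facts

def holds_th(facts, atom, r, s):
    """A th [r, s]: A at t for every integer time point t with r <= t <= s."""
    return all(holds_at(facts, atom, t) for t in range(r, s + 1))

def holds_in(facts, atom, r, s):
    """A in [r, s]: A at t for some integer time point t with r <= t <= s."""
    return any(holds_at(facts, atom, t) for t in range(r, s + 1))

# Hypothetical facts: rain holds at time points 3, 4 and 5.
FACTS = {("rain", 3), ("rain", 4), ("rain", 5)}
```

Here rain th [3, 5] holds but rain th [2, 5] does not, while rain in [0, 3] holds because one time point of the period suffices. On a single-point period [t, t] the three annotations coincide.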

The results we present naturally extend to time lines that are bounded or unbounded in other ways and to time periods that are open on one or both sides.
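The three kinds of annotation can be illustrated with a minimal Python sketch. It assumes discrete time with integer time points, and it represents the points at which an atom A is true by a hypothetical set `holds`; the function names are illustrative and not part of TACLP.

```python
# Sketch only: discrete time, time points are integers (an assumption).
# `holds` is a hypothetical set of time points at which an atom A is true.

def holds_at(holds, t):
    """A at t: A is true at the single time point t."""
    return t in holds

def holds_th(holds, r, s):
    """A th [r, s]: A is true at every time point of the period."""
    return all(t in holds for t in range(r, s + 1))

def holds_in(holds, r, s):
    """A in [r, s]: A is true at some (unspecified) time point of the period."""
    return any(t in holds for t in range(r, s + 1))

holds = {9, 10, 14}
print(holds_th(holds, 9, 10))   # True
print(holds_th(holds, 9, 14))   # False
print(holds_in(holds, 11, 14))  # True
```

On a singleton period the three readings coincide, which is what the axioms relating at to th and in express.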


Paolo Baldan et al.

The set of annotations is endowed with a partial order relation ⊑ which turns it into a lattice. Given two annotations α and β, the intuition is that α ⊑ β if α is “less informative” than β in the sense that for all formulae A, A β ⇒ A α. More precisely, being an instance of ACL, in addition to Modus Ponens, TACLP has two further inference rules: the rule (⊑) and the rule (⊔).

\[
\frac{A\,\alpha \qquad \gamma \sqsubseteq \alpha}{A\,\gamma}\ \ \text{rule } (\sqsubseteq)
\qquad\qquad
\frac{A\,\alpha \qquad A\,\beta \qquad \gamma = \alpha \sqcup \beta}{A\,\gamma}\ \ \text{rule } (\sqcup)
\]

The rule (⊑) states that if a formula holds with some annotation, then it also holds with all annotations that are smaller according to the lattice ordering. The rule (⊔) says that if a formula holds with some annotation α and the same formula holds with another annotation β then it holds with the least upper bound α ⊔ β of the two annotations.

Next, we introduce the constraint theory for temporal annotations. Recall that a constraint theory is a non-empty, consistent first order theory that axiomatizes the meaning of the constraints. Besides an axiomatization of the total order relation ≤ on the set of time points D, the constraint theory includes the following axioms defining the partial order on temporal annotations.

(at th)   at t = th [t, t]
(at in)   at t = in [t, t]
(th ⊑)    th [s1, s2] ⊑ th [r1, r2] ⇔ r1 ≤ s1, s1 ≤ s2, s2 ≤ r2
(in ⊑)    in [r1, r2] ⊑ in [s1, s2] ⇔ r1 ≤ s1, s1 ≤ s2, s2 ≤ r2

The first two axioms state that th I and in I are equivalent to at t when the time period I consists of a single time point t.³ Next, if a formula holds at every element of a time period, then it holds at every element in all sub-periods of that period ((th ⊑) axiom). On the other hand, if a formula holds at some points of a time period then it holds at some points in all periods that include this period ((in ⊑) axiom). A consequence of the above axioms is

(in th ⊑)   in [s1, s2] ⊑ th [r1, r2] ⇔ s1 ≤ r2, r1 ≤ s2, s1 ≤ s2, r1 ≤ r2

i.e., an atom annotated by in holds in any time period that overlaps with a time period where the atom holds throughout. To summarize the above explanation, the axioms defining the partial order relation on annotations can be arranged in the following chain, where it is assumed that r1 ≤ s1, s1 ≤ s2, s2 ≤ r2:

in [r1, r2] ⊑ in [s1, s2] ⊑ in [s1, s1] = at s1 = th [s1, s1] ⊑ th [s1, s2] ⊑ th [r1, r2]

³ Especially in dense time, one may disallow singleton periods and drop the two axioms. This restriction has no effect on the results we are presenting.

MuTACLP: A Language for Temporal Reasoning with Multiple Theories

Before giving an axiomatization of the least upper bound ⊔ on temporal annotations, let us recall that, as explained in [15], the least upper bound of two annotations always exists but sometimes it may be “too large”. In fact, rule (⊔) is correct only if the lattice order ensures A α ∧ A β ∧ (γ = α ⊔ β) ⟹ A γ whereas, in general, this is not true in our case. For instance, according to the lattice, th [1, 2] ⊔ th [4, 5] = th [1, 5], but according to the definition of th-annotated formulae in terms of at, the conjunction A th [1, 2] ∧ A th [4, 5] does not imply A th [1, 5], since it does not express that A at 3 holds. From a theoretical point of view, this problem can be overcome by enriching the lattice of annotations with expressions involving ⊔. In practice, it suffices to consider the least upper bound for time periods that produce another different meaningful time period. Concretely, one restricts to th annotations with overlapping time periods that do not include one another:

(th ⊔)   th [s1, s2] ⊔ th [r1, r2] = th [s1, r2] ⇔ s1 < r1, r1 ≤ s2, s2 < r2
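The ordering axioms and the restricted least upper bound can be read as executable checks. The following Python sketch assumes integer time points and encodes an annotation as a tuple `(kind, lo, hi)`; this encoding and the names `norm`, `leq` and `lub_th` are illustrative assumptions, not part of TACLP.

```python
# Sketch only: annotations are tuples (kind, lo, hi) with kind "th" or "in";
# "at t" is encoded as ("th", t, t). The tuple encoding is an assumption.

def norm(ann):
    """Map singleton periods to th, since at t = th [t, t] = in [t, t]."""
    kind, lo, hi = ann
    if lo > hi:
        raise ValueError("empty time period")
    return ("th", lo, hi) if lo == hi else (kind, lo, hi)

def leq(a, b):
    """a ⊑ b according to axioms (th ⊑), (in ⊑) and (in th ⊑)."""
    (ka, a1, a2), (kb, b1, b2) = norm(a), norm(b)
    if ka == "th" and kb == "th":
        return b1 <= a1 and a2 <= b2      # [a1, a2] is a sub-period of [b1, b2]
    if ka == "in" and kb == "in":
        return a1 <= b1 and b2 <= a2      # [a1, a2] includes [b1, b2]
    if ka == "in" and kb == "th":
        return a1 <= b2 and b1 <= a2      # the two periods overlap
    return False                          # a th is never below a non-singleton in

def lub_th(a, b):
    """Restricted least upper bound of axiom (th ⊔), or None when it does not apply."""
    (ka, s1, s2), (kb, r1, r2) = a, b
    if ka != "th" or kb != "th":
        return None
    if r1 < s1:                           # order the two periods by starting point
        s1, s2, r1, r2 = r1, r2, s1, s2
    if s1 < r1 <= s2 < r2:
        return ("th", s1, r2)
    return None

print(lub_th(("th", 1, 3), ("th", 2, 5)))  # ('th', 1, 5)
print(lub_th(("th", 1, 2), ("th", 4, 5)))  # None
```

Returning `None` for disjoint periods mirrors the restriction discussed above: th [1, 2] and th [4, 5] are not joined into th [1, 5].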

Summarizing, a constraint domain for time points is fixed where the signature includes suitable constants for time points, function symbols for operations on time points (e.g., +, −, . . .) and the predicate symbol ≤, modeling the total order relation on time points. Such a constraint domain is extended to a constraint domain A for handling annotations, by enriching the signature with function symbols [·, ·], at, th, in, and the predicate symbol ⊑, axiomatized as described above. Then, as for ordinary constraint logic programming, a TACLP language is determined by fixing a constraint domain C, which is required to contain the constraint domain A for annotations. We denote by TACLP(C) the TACLP language based on C. To lighten the notation, in the following, the “C” will be often omitted.

The next definition introduces the clausal fragment of TACLP that can be used as an efficient temporal programming language.

Definition 1. A TACLP clause is of the form:

    A α ← C1, . . . , Cn, B1 α1, . . . , Bm αm    (n, m ≥ 0)

where A is an atom (not a constraint), α and αi are (optional) temporal annotations, the Cj's are constraints and the Bi's are atomic formulae. Constraints Cj cannot be annotated. A TACLP program is a finite set of TACLP clauses.

4 Multi-theory Temporal Annotated Constraint Logic Programming

A first attempt to extend the multi-theory framework introduced in Section 2 to handle temporal information is presented in [32]. In that paper an object level program is a collection of annotated logic programming clauses, named by a constant symbol. An annotated clause is of the kind A ← B1, . . . , Bn □ [a, b] where the annotation [a, b] represents the period of time in which the clause holds. The handling of time is hidden at the object level and it is managed at the meta-level by intersecting or joining the intervals associated with clauses. However, this approach is not completely satisfactory, in that it does not offer mechanisms for modeling indefinite temporal information and for handling periodic data. Moreover, some problems arise when we want to extract temporal information from the intervals at the object level.

To obtain a more expressive language, where in particular the mentioned deficiencies are overcome, in this paper we consider a multi-theory framework where object level programs are taken from Temporal Annotated Constraint Logic Programming (TACLP) and the composition operators are generalized to deal with temporal annotated constraint logic programs. The resulting language, called Multi-theory Temporal Annotated Constraint Logic Programming (MuTACLP for short), thus arises as a synthesis of the work on composition operators for logic programs and of TACLP. It can be thought of both as a language which enriches TACLP with high-level mechanisms for structuring programs and for combining separate knowledge bases, and as an extension of the language of program expressions with constraints and with time-representation mechanisms based on temporal annotations for atoms.

The language of program expressions remains formally the same as the one in Section 2. However, plain programs, named by the constant symbols in Pname, are now TACLP programs as defined in Section 3.2.
Also the structure of the time domain remains unchanged, whereas, to deal with program composition, the constraint theory presented in Section 3.2 is enriched with the axiomatization of the greatest lower bound # of two annotations:

(th #)       th [s1, s2] # th [r1, r2] = th [t1, t2] ⇔ s1 ≤ s2, r1 ≤ r2, t1 = max{s1, r1}, t2 = min{s2, r2}, t1 ≤ t2
(th #′)      th [s1, s2] # th [r1, r2] = in [t2, t1] ⇔ s1 ≤ s2, r1 ≤ r2, t1 = max{s1, r1}, t2 = min{s2, r2}, t2 < t1
(th in #)    th [s1, s2] # in [r1, r2] = in [r1, r2] ⇔ s1 ≤ r2, r1 ≤ s2, s1 ≤ s2, r1 ≤ r2
(th in #′)   th [s1, s2] # in [r1, r2] = in [s2, r2] ⇔ s1 ≤ s2, s2 < r1, r1 ≤ r2
(th in #″)   th [s1, s2] # in [r1, r2] = in [r1, s1] ⇔ r1 ≤ r2, r2 < s1, s1 ≤ s2
(in #)       in [s1, s2] # in [r1, r2] = in [t1, t2] ⇔ s1 ≤ s2, r1 ≤ r2, t1 = min{s1, r1}, t2 = max{s2, r2}

Keeping in mind that annotations deal with time periods, i.e., convex, non-empty sets of time points, it is not difficult to verify that the axioms above indeed define the greatest lower bound with respect to the partial order relation ⊑. For instance the greatest lower bound of two th annotations, th [s1, s2] and th [r1, r2], can be:

– a th [t1, t2] annotation if [r1, r2] and [s1, s2] are overlapping intervals and [t1, t2] is their (not empty) intersection (axiom (th #));
– an in [t1, t2] annotation, otherwise, where interval [t1, t2] is the least convex set which intersects both [s1, s2] and [r1, r2] (axiom (th #′), see Fig. 1.(a)).

In all other cases the greatest lower bound is always an in annotation. For instance, as expressed by axiom (th in #′), the greatest lower bound of two annotations th [s1, s2] and in [r1, r2] with disjoint intervals is given by in [s2, r2], where interval [s2, r2] is the least convex set containing [r1, r2] and intersecting [s1, s2] (see Fig. 1.(b)). The greatest lower bound will play a basic role in the definition of the intersection operation over program expressions. Notice that in TACLP it is not needed since the problem of combining programs is not dealt with.

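[Figure 1: (a) two th annotations over disjoint intervals [s1, s2] and [r1, r2], with the in annotation over [t1, t2] as their greatest lower bound; (b) a th annotation over [s1, s2] and an in annotation over [r1, r2], with the in annotation over [s2, r2] as their greatest lower bound.]

Fig. 1. Greatest lower bound of annotations.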
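The case analysis in the greatest lower bound axioms can be sketched in Python as follows. The sketch assumes integer time points and a tuple encoding `(kind, lo, hi)` for annotations; `norm`, `leq`, `glb` and `is_glb_on` are illustrative names. The last function is a brute-force check, over a small finite time line, that the computed annotation is indeed the greatest common lower bound.

```python
from itertools import product

# Sketch only: annotations are tuples (kind, lo, hi) over integer time points.

def norm(ann):
    """Map singleton periods to th, since th [t, t] = in [t, t]."""
    kind, lo, hi = ann
    return ("th", lo, hi) if lo == hi else (kind, lo, hi)

def leq(a, b):
    """a ⊑ b, following the ordering axioms of Section 3.2."""
    (ka, a1, a2), (kb, b1, b2) = norm(a), norm(b)
    if ka == "th" and kb == "th":
        return b1 <= a1 and a2 <= b2
    if ka == "in" and kb == "in":
        return a1 <= b1 and b2 <= a2
    if ka == "in" and kb == "th":
        return a1 <= b2 and b1 <= a2
    return False

def glb(a, b):
    """Greatest lower bound, one branch per axiom."""
    (ka, a1, a2), (kb, b1, b2) = norm(a), norm(b)
    if ka == "in" and kb == "th":            # put the th annotation first
        (ka, a1, a2), (kb, b1, b2) = (kb, b1, b2), (ka, a1, a2)
    if ka == "th" and kb == "th":
        t1, t2 = max(a1, b1), min(a2, b2)
        # overlapping periods: th on the intersection; otherwise in on the gap
        return ("th", t1, t2) if t1 <= t2 else ("in", t2, t1)
    if ka == "th":                           # th against in
        if a2 < b1:                          # th period entirely before
            return ("in", a2, b2)
        if b2 < a1:                          # th period entirely after
            return ("in", b1, a1)
        return ("in", b1, b2)                # overlapping periods
    return ("in", min(a1, b1), max(a2, b2))  # in against in: convex hull

def is_glb_on(points):
    """Brute-force check that glb(a, b) is the greatest common lower bound."""
    anns = {norm((k, lo, hi)) for k in ("th", "in")
            for lo, hi in product(points, repeat=2) if lo <= hi}
    for a, b in product(anns, repeat=2):
        g = glb(a, b)
        if not (leq(g, a) and leq(g, b)):
            return False
        if not all(leq(c, g) for c in anns if leq(c, a) and leq(c, b)):
            return False
    return True

print(glb(("th", 1, 2), ("th", 4, 5)))  # ('in', 2, 4)
print(glb(("th", 1, 2), ("in", 4, 6)))  # ('in', 2, 6)
```

The two printed cases correspond to the situations depicted in Fig. 1 (a) and (b), respectively.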

Finally, as in TACLP we still have, in addition to Modus Ponens, the inference rules (⊑) and (⊔).

Example 3. In a company there are some managers and a secretary who has to manage their meetings and appointments. During the day a manager can be busy if she/he is in a meeting or if she/he is not present in the office. This situation is modeled by the theory Managers.

Managers:
    busy(M) th [T1, T2] ← in-meeting(M) th [T1, T2]
    busy(M) th [T1, T2] ← out-of-office(M) th [T1, T2]

This theory is parametric with respect to the predicates in-meeting and out-of-office since the schedule of managers varies daily. The schedules are collected in a separate theory Today-Schedule and, to know whether a manager is busy or not, such a theory is combined with Managers by using the union operator. For instance, suppose that the schedule for a given day is the following: Mr. Smith and Mr. Jones have a meeting at 9am lasting one hour. In the afternoon Mr. Smith goes out for lunch at 2pm and comes back at 3pm. The theory Today-Schedule below represents such information.

Today-Schedule:
    in-meeting(mrSmith) th [9am, 10am].
    in-meeting(mrJones) th [9am, 10am].
    out-of-office(mrSmith) th [2pm, 3pm].

To know whether Mr. Smith is busy between 9:30am and 10:30am the secretary can ask for busy(mrSmith) in [9:30am, 10:30am]. Since Mr. Smith is in a meeting from 9am till 10am, she will indeed obtain that Mr. Smith is busy. The considered query exploits indefinite information, because knowing that Mr. Smith is busy in one instant in [9:30am, 10:30am] the secretary cannot schedule an appointment for him for that period.

Example 4. At 10pm Tom was found dead in his house. The only hint is that the answering machine recorded some messages from 7pm up to 8pm. At first glance, the doctor said Tom died one to two hours before. The detective made a further assumption: Tom did not answer the telephone so he could be already dead. We collect all these hints and assumptions into three programs, Hints, Doctor and Detective, in order not to mix firm facts with simple hypotheses that might change during the investigations.

Hints:

    found at 10pm.
    ans-machine th [7pm, 8pm].

Doctor:
    dead in [T − 2:00, T − 1:00] ← found at T

Detective:
    dead in [T1, T2] ← ans-machine th [T1, T2]

If we combine the hypotheses of the doctor and those of the detective we can extend the period of time in which Tom possibly died. The program expression Doctor ∩ Detective behaves as

    dead in [S1, S2] ← in [T − 2:00, T − 1:00] # in [T1, T2] = in [S1, S2], found at T, ans-machine th [T1, T2]

The constraint in [T − 2:00, T − 1:00] # in [T1, T2] = in [S1, S2] determines the annotation in [S1, S2] in which Tom possibly died: according to axiom (in #) the resulting interval is S1 = min{T − 2:00, T1} and S2 = max{T − 1:00, T2}. In fact, according to the semantics defined in the next section, a consequence of the program expression Hints ∪ (Doctor ∩ Detective) is just dead in [7pm, 9pm] since the annotation in [7pm, 9pm] is the greatest lower bound of in [8pm, 9pm] and in [7pm, 8pm].
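The arithmetic of Example 4 can be replayed with a few lines of Python. The sketch assumes that clock times are encoded as whole hours on a 24-hour scale, so 7pm is 19 and 10pm is 22; the function name `glb_in` is illustrative.

```python
# Sketch only: clock times are whole hours on a 24-hour scale (an assumption).

def glb_in(a, b):
    """Axiom (in #): greatest lower bound of in [s1, s2] and in [r1, r2]."""
    (s1, s2), (r1, r2) = a, b
    return (min(s1, r1), max(s2, r2))

found = 22                         # Hints: found at 10pm
ans_machine = (19, 20)             # Hints: ans-machine th [7pm, 8pm]

doctor = (found - 2, found - 1)    # Doctor: dead in [T - 2:00, T - 1:00]
detective = ans_machine            # Detective: dead in [T1, T2]

print(glb_in(doctor, detective))   # (19, 21), i.e. dead in [7pm, 9pm]
```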

5 Semantics of MuTACLP

In this section we introduce an operational (top-down) semantics for the language MuTACLP by means of a meta-interpreter. Then we provide MuTACLP with a least ﬁxpoint (bottom-up) semantics, based on the deﬁnition of an immediate consequence operator. Finally, the meta-interpreter for MuTACLP is proved sound and complete with respect to the least ﬁxpoint semantics.


In the definition of the semantics, without loss of generality, we assume all atoms to be annotated with th or in labels. In fact at t annotations can be replaced with th [t, t] by exploiting the (at th) axiom. Moreover, each atom which is not annotated in the object level program is intended to be true throughout the whole temporal domain and thus it can be labelled with th [0, ∞]. Constraints remain unchanged.

5.1 Meta-interpreter

The extended meta-interpreter is defined by the following clauses.

    demo(E, empty).                                                   (1)

    demo(E, (B1, B2)) ← demo(E, B1), demo(E, B2)                      (2)

    demo(E, A th [T1, T2]) ← S1 ≤ T1, T1 ≤ T2, T2 ≤ S2,
                             clause(E, A th [S1, S2], B), demo(E, B)  (3)

    demo(E, A th [T1, T2]) ← S1 ≤ T1, T1 < S2, S2 < T2,
                             clause(E, A th [S1, S2], B), demo(E, B),
                             demo(E, A th [S2, T2])                   (4)

    demo(E, A in [T1, T2]) ← T1 ≤ S2, S1 ≤ T2, T1 ≤ T2,
                             clause(E, A th [S1, S2], B), demo(E, B)  (5)

    demo(E, A in [T1, T2]) ← T1 ≤ S1, S2 ≤ T2,
                             clause(E, A in [S1, S2], B), demo(E, B)  (6)

    demo(E, C) ← constraint(C), C                                     (7)

    clause(E1 ∪ E2, A α, B) ← clause(E1, A α, B)                      (8)

    clause(E1 ∪ E2, A α, B) ← clause(E2, A α, B)                      (9)

    clause(E1 ∩ E2, A γ, (B1, B2)) ← clause(E1, A α, B1),
                                     clause(E2, A β, B2),
                                     α # β = γ                        (10)

A clause A α ← B of a plain program P is represented at the meta-level by

    clause(P, A α, B) ← S1 ≤ S2                                       (11)

where α = th [S1, S2] or α = in [S1, S2].
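The way clauses (8)-(11) enumerate the clauses of a program expression can be sketched in Python. The sketch makes strong simplifying assumptions: clauses are ground, so unification of heads reduces to equality of atoms; annotations are tuples `(kind, lo, hi)` over integer time points; and plain programs are stored in a hypothetical dictionary `PROGRAMS`. None of these names come from the paper.

```python
# Sketch only. Assumptions: ground clauses (head unification becomes equality),
# annotations as tuples (kind, lo, hi), programs looked up by name in PROGRAMS.

def glb(a, b):
    """Greatest lower bound of two annotations (axioms of Section 4)."""
    (ka, a1, a2), (kb, b1, b2) = a, b
    if ka == "in" and kb == "th":
        (ka, a1, a2), (kb, b1, b2) = (kb, b1, b2), (ka, a1, a2)
    if ka == "th" and kb == "th":
        t1, t2 = max(a1, b1), min(a2, b2)
        return ("th", t1, t2) if t1 <= t2 else ("in", t2, t1)
    if ka == "th":                       # th against in
        if a2 < b1:
            return ("in", a2, b2)
        if b2 < a1:
            return ("in", b1, a1)
        return ("in", b1, b2)
    return ("in", min(a1, b1), max(a2, b2))

def clauses(expr):
    """Enumerate (atom, annotation, body) triples, mirroring clauses (8)-(11)."""
    op = expr[0]
    if op == "prog":                     # clause (11): plain program
        for atom, ann, body in PROGRAMS[expr[1]]:
            if ann[1] <= ann[2]:         # well-formed (non-empty) annotation
                yield atom, ann, list(body)
    elif op == "union":                  # clauses (8) and (9)
        yield from clauses(expr[1])
        yield from clauses(expr[2])
    elif op == "inter":                  # clause (10)
        for atom1, alpha, body1 in clauses(expr[1]):
            for atom2, beta, body2 in clauses(expr[2]):
                if atom1 == atom2:
                    yield atom1, glb(alpha, beta), body1 + body2

# Ground instances of the Doctor and Detective clauses of Example 4,
# with hours on a 24-hour scale.
PROGRAMS = {
    "Doctor": [("dead", ("in", 20, 21), [("found", ("th", 22, 22))])],
    "Detective": [("dead", ("in", 19, 20), [("ans-machine", ("th", 19, 20))])],
}
expr = ("inter", ("prog", "Doctor"), ("prog", "Detective"))
result = list(clauses(expr))
print(result[0][:2])  # ('dead', ('in', 19, 21))
```

The body of the resulting clause is the concatenation of the two original bodies, as in the conjunction (B1, B2) of clause (10).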


This meta-interpreter can be written in any CLP language that provides a suitable constraint solver for temporal annotations (see Section 3.2 for the corresponding constraint theory). A first difference with respect to the meta-interpreter in Section 2 is that our meta-interpreter handles constraints that can either occur explicitly in its clauses, e.g., the constraint s1 ≤ t1, t1 ≤ t2, t2 ≤ s2 in clause (3), or can be produced by resolution steps. Constraints of the latter kind are managed by clause (7) which passes each constraint C to be solved directly to the constraint solver.

The second difference is that our meta-interpreter implements not only Modus Ponens but also rule (⊑) and rule (⊔). This is the reason why the third clause for the predicate demo of the meta-interpreter in Section 2 is now split into four clauses. Clauses (3), (5) and (6) implement the inference rule (⊑): the atomic goal to be solved is required to be labelled with an annotation which is smaller than the one labelling the head of the clause used in the resolution step. For instance, clause (3) states that given a clause A th [s1, s2] ← B whose body B is solvable, we can derive the atom A annotated with any th [t1, t2] such that th [t1, t2] ⊑ th [s1, s2], i.e., according to axiom (th ⊑), [t1, t2] ⊆ [s1, s2], as expressed by the constraint s1 ≤ t1, t1 ≤ t2, t2 ≤ s2. Clauses (5) and (6) are built in an analogous way by exploiting axioms (in th ⊑) and (in ⊑), respectively.

Rule (⊔) is implemented by clause (4). According to the discussion in Section 3.2, it is applicable only to th annotations involving overlapping time periods which do not include one another. More precisely, clause (4) states that if we can find a clause A th [s1, s2] ← B such that the body B is solvable, and if moreover the atom A can be proved throughout the time period [s2, t2] (i.e., demo(E, A th [s2, t2]) is solvable) then we can derive the atom A labelled with any annotation th [t1, t2] ⊑ th [s1, t2].
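The roles of clauses (3)-(6) can be made concrete with a small Python sketch. It is not the meta-interpreter itself: it assumes a program made only of ground facts (empty bodies), ground goals and integer time points, and the function name `demo` is reused for readability only. The sample facts encode two overlapping th periods as day numbers within one year, an encoding chosen purely for illustration.

```python
# Sketch only. Assumptions: the program consists of ground facts (empty bodies),
# goals are ground, and time points are integers.

def demo(facts, atom, goal):
    """Decide a ground goal `atom goal`, following clauses (3)-(6)."""
    kind, t1, t2 = goal
    if t1 > t2:
        return False
    for fact_atom, (fk, s1, s2) in facts:
        if fact_atom != atom or s1 > s2:
            continue
        if kind == "th" and fk == "th":
            if s1 <= t1 and t2 <= s2:                       # clause (3)
                return True
            if s1 <= t1 < s2 < t2 and demo(facts, atom, ("th", s2, t2)):
                return True                                 # clause (4)
        if kind == "in" and fk == "th" and t1 <= s2 and s1 <= t2:
            return True                                     # clause (5)
        if kind == "in" and fk == "in" and t1 <= s1 and s2 <= t2:
            return True                                     # clause (6)
    return False

facts = [
    ("consultant(jim)", ("th", 1, 120)),    # days 1..120 of the year
    ("consultant(jim)", ("th", 91, 258)),   # days 91..258 of the year
]
print(demo(facts, "consultant(jim)", ("th", 1, 258)))    # True, via clause (4)
print(demo(facts, "consultant(jim)", ("th", 1, 300)))    # False
print(demo(facts, "consultant(jim)", ("in", 250, 400)))  # True, via clause (5)
```

The recursive call in the branch for clause (4) always starts at a strictly later time point, so on a finite set of facts the search terminates.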
The constraints on temporal variables ensure that the time period [t1, t2] is a new time period different from [s1, s2], [s2, t2] and their subintervals. Finally, in the meta-level representation of object clauses, as expressed by clause (11), the constraint s1 ≤ s2 is added to ensure that the head of the object clause has a well-formed, namely non-empty, annotation.

As far as the meta-level definition of the union and intersection operators is concerned, clauses implementing the union operation remain unchanged with respect to the original definition in Section 2, whereas in the clause implementing the intersection operation a constraint is added, which expresses the annotation for the derived atom. Informally, a clause A α ← B, belonging to the intersection of two program expressions E1 and E2, is built by taking one clause instance from each program expression E1 and E2, such that the head atoms of the two clauses are unifiable. Let such instances of clauses be cl1 and cl2. Then B is the conjunction of the bodies of cl1 and cl2 and A is the unified atom labelled with the greatest lower bound of the annotations of the heads of cl1 and cl2.

The following example shows the usefulness of clause (4) to derive new temporal information according to the inference rule (⊔).

Example 5. Consider the databases DB1 and DB2 containing information about people working in two companies. Jim is a consultant and he works for the first company from January 1, 1995 to April 30, 1995 and for the second company from April 1, 1995 to September 15, 1995.

DB1:
    consultant(jim) th [Jan 1 1995, Apr 30 1995].

DB2:
    consultant(jim) th [Apr 1 1995, Sep 15 1995].

The period of time in which Jim works as a consultant can be obtained by querying the union of the above theories as follows:

    demo(DB1 ∪ DB2, consultant(jim) th [T1, T2]).

By using clause (4), we can derive the interval [Jan 1 1995, Sep 15 1995] (more precisely, the constraints Jan 1 1995 ≤ T1, T1 < Apr 30 1995, Apr 30 1995 < T2, T2 ≤ Sep 15 1995 are derived) that otherwise would never be generated. In fact, by applying clause (3) alone, we can prove only that Jim is a consultant in the intervals [Jan 1 1995, Apr 30 1995] and [Apr 1 1995, Sep 15 1995] (or in subintervals of them) separately.

5.2 Bottom-Up Semantics

To give a declarative meaning to program expressions, we define a “higher-order” semantics for MuTACLP. In fact, the results in [7] show that the least Herbrand model semantics of logic programs does not scale smoothly to program expressions. Fundamental properties of semantics, like compositionality and full abstraction, are definitely lost. Intuitively, the least Herbrand model semantics is not compositional since it identifies programs which have different meanings when combined with others. Actually, all the programs whose least Herbrand model is empty are identified with the empty program. For example, the programs {r ← s} and {r ← q} are both denoted by the empty model, though they behave quite differently when composed with other programs (e.g., consider the union with {q.}).

Brogi et al. showed in [9] that defining as meaning of a program P the immediate consequence operator T_P itself (rather than the least fixpoint of T_P), one obtains a semantics which is compositional with respect to several interesting operations on programs, in particular ∪ and ∩. Along the same line, the semantics of a MuTACLP program expression is taken to be the immediate consequence operator associated with it, i.e., a function from interpretations to interpretations. The immediate consequence operator of constraint logic programming is generalized to deal with temporal annotations by considering a kind of extended interpretations, which are basically sets of annotated elements of C-base_L. More precisely, we first define a set of (semantical) annotations

    Ann = {th [t1, t2], in [t1, t2] | t1, t2 time points ∧ D_C ⊨ t1 ≤ t2}


where D_C is the S_C-structure providing the intended interpretation of the constraints. Then the lattice of interpretations is defined as (℘(C-base_L × Ann), ⊆) where ⊆ is the usual set-theoretic inclusion. Finally the immediate consequence operator T^C_E for a program expression E is compositionally defined in terms of the immediate consequence operator for its sub-expressions.

Definition 2 (Bottom-up semantics). Let E be a program expression, the function T^C_E : ℘(C-base_L × Ann) → ℘(C-base_L × Ann) is defined as follows.

– (E is a plain program P)

\[
\begin{aligned}
T^{C}_{P}(I) ={}& \left\{ (A, \alpha) \;\middle|\;
\begin{array}{l}
(\alpha = \mathit{th}\,[s_1, s_2] \,\lor\, \alpha = \mathit{in}\,[s_1, s_2]),\\
A\,\alpha \leftarrow \bar{C}, B_1\,\alpha_1, \ldots, B_n\,\alpha_n \in \mathit{ground}_C(P),\\
\{(B_1, \beta_1), \ldots, (B_n, \beta_n)\} \subseteq I,\\
D_C \models \bar{C},\ \alpha_1 \sqsubseteq \beta_1, \ldots, \alpha_n \sqsubseteq \beta_n,\ s_1 \le s_2
\end{array}
\right\} \\
{}\cup{}& \left\{ (A, \mathit{th}\,[s_1, r_2]) \;\middle|\;
\begin{array}{l}
A\,\mathit{th}\,[s_1, s_2] \leftarrow \bar{C}, B_1\,\alpha_1, \ldots, B_n\,\alpha_n \in \mathit{ground}_C(P),\\
\{(B_1, \beta_1), \ldots, (B_n, \beta_n)\} \subseteq I,\ (A, \mathit{th}\,[r_1, r_2]) \in I,\\
D_C \models \bar{C},\ \alpha_1 \sqsubseteq \beta_1, \ldots, \alpha_n \sqsubseteq \beta_n,\ s_1 < r_1,\ r_1 \le s_2,\ s_2 < r_2
\end{array}
\right\}
\end{aligned}
\]

where C̄ is a shortcut for C1, . . . , Ck.

– (E = E1 ∪ E2) T^C_{E1∪E2}(I) = T^C_{E1}(I) ∪ T^C_{E2}(I)

– (E = E1 ∩ E2) T^C_{E1∩E2}(I) = T^C_{E1}(I) ⋒ T^C_{E2}(I), where I1 ⋒ I2 = {(A, γ) | (A, α) ∈ I1, (A, β) ∈ I2, D_C ⊨ α # β = γ}.

Observe that the definition above properly extends the standard definition of the immediate consequence operator for constraint logic programs (see Section 3.1). In fact, besides the usual Modus Ponens rule, it captures rule (⊔) (as expressed by the second set in the definition of T^C_P). Furthermore, also rule (⊑) is taken into account to prove that an annotated atom holds in an interpretation: to derive the head A α of a clause it is not necessary to find in the interpretation exactly the atoms B1 α1, . . . , Bn αn occurring in the body of the clause, but it suffices to find atoms Bi βi which imply Bi αi, i.e., such that each βi is an annotation stronger than αi (D_C ⊨ αi ⊑ βi).
Notice that T^C_E(I) is not downward closed, namely, it is not true that if (A, α) ∈ T^C_E(I) then for all (A, γ) such that D_C ⊨ γ ⊑ α, we have (A, γ) ∈ T^C_E(I). The downward closure will be taken only after the fixpoint of T^C_E is computed. We will see that, nevertheless, no deductive capability is lost and rule (⊑) is completely modeled.

The set of immediate consequences of a union of program expressions is the set-theoretic union of the immediate consequences of each program expression. Instead, an atom A labelled by γ is an immediate consequence of the intersection of two program expressions if A is a consequence of both program expressions, possibly with different annotations α and β, and the label γ is the greatest lower bound of the annotations α and β.

Let us formally define the downward closure of an interpretation.

Definition 3 (Downward closure). The downward closure of an interpretation I ⊆ C-base_L × Ann is defined as: ↓I = {(A, α) | (A, β) ∈ I, D_C ⊨ α ⊑ β}.

The next proposition sheds some more light on the semantics of the intersection operator, by showing that, when we apply the downward closure, the image of an interpretation through the operator T^C_{E1∩E2} is the set-theoretic intersection of the images of the interpretation through the operators associated with E1 and E2, respectively. This property supports the intuition that the program expressions have to agree at each computation step (see [9]).

Proposition 1. Let I1 and I2 be two interpretations. Then ↓(I1 ⋒ I2) = (↓I1) ∩ (↓I2).

The next theorem shows the continuity of the T^C_E operator over the lattice of interpretations.

Theorem 1 (Continuity). For any program expression E, the function T^C_E is continuous (over (℘(C-base_L × Ann), ⊆)).

The fixpoint semantics for a program expression is now defined as the downward closure of the least fixpoint of T^C_E which, by continuity of T^C_E, is determined as ⋃_{i∈N} (T^C_E)^i.

Definition 4 (Fixpoint semantics). Let E be a program expression. The fixpoint semantics of E is defined as

    F^C(E) = ↓(T^C_E)^ω.

We remark that the downward closure is applied only once, after having computed the fixpoint of T^C_E. However, it is easy to see that the closure is a continuous operator on the lattice of interpretations ℘(C-base_L × Ann). Thus

\[
\downarrow \Bigl( \bigcup_{i \in \mathbb{N}} (T^{C}_{E})^{i} \Bigr) \;=\; \bigcup_{i \in \mathbb{N}} \downarrow (T^{C}_{E})^{i}
\]

showing that by taking the closure at each step we would have obtained the same set of consequences. Hence, as mentioned before, rule (⊑) is completely captured.
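The bottom-up construction can be sketched in Python for a very restricted setting. The sketch assumes ground clauses without constraints, integer time points, annotations encoded as tuples `(kind, lo, hi)`, and program expressions built from plain programs and union only; all function names are illustrative. It iterates the operator from the empty interpretation and applies the downward closure only when answering a query, as in the fixpoint semantics above.

```python
# Sketch only. Assumptions: ground clauses without constraints, integer time
# points, annotations as tuples (kind, lo, hi), expressions built from plain
# programs ("prog", clauses) and ("union", e1, e2) only.

def leq(a, b):
    """a ⊑ b for th/in annotations (singleton in periods are not special-cased)."""
    (ka, a1, a2), (kb, b1, b2) = a, b
    if ka == "th" and kb == "th":
        return b1 <= a1 and a2 <= b2
    if ka == "in" and kb == "in":
        return a1 <= b1 and b2 <= a2
    if ka == "in" and kb == "th":
        return a1 <= b2 and b1 <= a2
    return False

def t_plain(program, interp):
    """One application of the operator for a plain ground program."""
    out = set()
    for atom, ann, body in program:
        # each body atom must occur in interp with a stronger annotation
        if not all(any(b == a and leq(alpha, beta) for a, beta in interp)
                   for b, alpha in body):
            continue
        out.add((atom, ann))                      # first set of Definition 2
        if ann[0] == "th":
            _, s1, s2 = ann
            for a, (kind, r1, r2) in interp:      # second set: rule (⊔)
                if a == atom and kind == "th" and s1 < r1 <= s2 < r2:
                    out.add((atom, ("th", s1, r2)))
    return out

def t_expr(expr, interp):
    if expr[0] == "union":
        return t_expr(expr[1], interp) | t_expr(expr[2], interp)
    return t_plain(expr[1], interp)

def fixpoint(expr):
    """Iterate from the empty interpretation until nothing changes."""
    interp = set()
    while True:
        nxt = t_expr(expr, interp)
        if nxt == interp:
            return interp
        interp = nxt

def entails(expr, atom, ann):
    """Membership in the downward closure of the least fixpoint."""
    return any(a == atom and leq(ann, beta) for a, beta in fixpoint(expr))

db1 = ("prog", [("consultant(jim)", ("th", 1, 120), ())])
db2 = ("prog", [("consultant(jim)", ("th", 91, 258), ())])
print(entails(("union", db1, db2), "consultant(jim)", ("th", 30, 200)))  # True
print(entails(db1, "consultant(jim)", ("th", 30, 200)))                  # False
```

Termination of `fixpoint` relies on the program being finite and ground; for general MuTACLP programs the least fixpoint is reached only in the limit.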

5.3 Soundness and Completeness

In the spirit of [7,34] we define the semantics of the meta-interpreter by relating the semantics of an object program to the semantics of the corresponding vanilla meta-program (i.e., including the meta-level representation of the object program). When stating the correspondence between the object program and the meta-program we consider only formulae of interest, i.e., elements of C-base_L annotated with labels from Ann, which are the semantic counterpart of object level annotated atoms. We show that given a MuTACLP program expression E (object program) for any A ∈ C-base_L and any α ∈ Ann, demo(E, A α) is provable at the meta-level if and only if (A, α) is provable in the object program.

Theorem 2 (Soundness and completeness). Let E be a program expression and let V be the meta-program containing the meta-level representation of the object level programs occurring in E and clauses (1)-(10). For any A ∈ C-base_L and α ∈ Ann, the following statement holds:

    demo(E, A α) ∈ (T^M_V)^ω  ⟺  (A, α) ∈ F^C(E),

where T^M_V is the standard immediate consequence operator for CLP programs.

Note that V is a CLP(M) program where M is a multi-sorted constraint domain, including the constraint domain Term, presented in Example 2, and the constraint domain C. It is worth observing that if C is a C-ground instance of a constraint then D_M ⊨ C ⇔ D_C ⊨ C.

6 Some Examples

This section presents examples which illustrate the use of annotations in the representation of temporal information and the structuring possibilities offered by the operators. First we describe applications of our framework in the field of legal reasoning. Then we show how the intersection operator can be employed to define a kind of valid-timeslice operator.

6.1 Applications to Legal Reasoning

Laws and regulations are naturally represented in separate theories and they are usually combined in ways that are necessarily more complex than a plain merging. Time is another crucial ingredient in the deﬁnition of laws and regulations, since, quite often, they refer to instants of time and, furthermore, their validity is restricted to a ﬁxed period of time. This is especially true for laws and regulations which concern taxation and government budget related regulations in general.


British Nationality Act. We start with a classical example in the field of legal reasoning [41], i.e. a small piece of the British Nationality Act. Simply partitioning the knowledge into separate programs and using the basic union operation, one can exploit the temporal information in an orderly way. Assume that Jan 1 1955 is the commencement date of the law. Then the statement

    x obtains the British Nationality at time t
    if x is born in U.K. at time t and
       t is after commencement and
       y is parent of x and
       y is a British citizen at time t or y is a British resident at time t

is modeled by the following program.

BNA:
    get-citizenship(X) at T ← T ≥ Jan 1 1955, born(X,uk) at T, parent(Y,X) at T, british-citizen(Y) at T
    get-citizenship(X) at T ← T ≥ Jan 1 1955, born(X,uk) at T, parent(Y,X) at T, british-resident(Y) at T

Now, the data for a single person, say John, can be encoded in a separate program.

John:
    born(john,uk) at Aug 10 1969.
    parent(bob,john) th [T, ∞] ← born(john,_) at T
    british-citizen(bob) th [Sept 6 1940, ∞].

Then, by means of the union operator, one can inquire about the citizenship of John, as follows

    demo(BNA ∪ John, get-citizenship(john) at T)

obtaining as result T = Aug 10 1969.

Movie Tickets. Since 1997, an Italian regulation for encouraging people to go to the cinema states that on Wednesdays the ticket price is 8000 liras, whereas in the rest of the week it is 12000 liras. The situation can be modeled by the following theory BoxOff.

BoxOff:
    ticket(8000, X) at T ← T ≥ Jan 1 1997, wed at T
    ticket(12000, X) at T ← T ≥ Jan 1 1997, non_wed at T

The constraint T ≥ Jan 1 1997 represents the validity of the clause, which holds from January 1, 1997 onwards. The predicates wed and non_wed are defined in a separate theory Days, where w is assumed to be the last Wednesday of 1996.

Days:
    wed at w.
    wed at T + 7 ← wed at T
    non_wed th [w + 1, w + 6].
    non_wed at T + 7 ← non_wed at T
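The periodic information encoded by the recursive clauses of Days can be mimicked with modular arithmetic. The Python sketch below assumes that one time point is one day and takes w to be December 25, 1996, the last Wednesday of that year; the constant `W` and the function names are illustrative.

```python
from datetime import date

# Sketch only: W stands for w, taken here to be December 25, 1996
# (the last Wednesday of 1996); one time point corresponds to one day.
W = date(1996, 12, 25)

def wed(day):
    """wed at w.  wed at T + 7 ← wed at T."""
    return day >= W and (day - W).days % 7 == 0

def non_wed(day):
    """non_wed th [w + 1, w + 6].  non_wed at T + 7 ← non_wed at T."""
    return day > W and (day - W).days % 7 != 0

print(wed(date(1998, 5, 20)))      # True
print(non_wed(date(1998, 5, 21)))  # True
```

As in the logic program, nothing is derivable for days before w: both functions return False there.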

Notice that, by means of recursive predicates, one can easily express periodic temporal information. In the example, the definition of the predicate wed expresses the fact that a day is Wednesday if it is a date which is known to be Wednesday or it is a day coming seven days after a day proved to be Wednesday. The predicate non_wed is defined in an analogous way; in this case the unit clause states that all six consecutive days following a Wednesday are not Wednesdays.

Now, let us suppose that the owner of a cinema wants to increase the discount for young people on Wednesdays, establishing that the ticket price for people who are eighteen years old or younger is 6000 liras. By resorting to the intersection operation we can build a program expression that represents exactly the desired policy. We define three new programs Cons, Disc and Age.

Cons:

    ticket(8000, X) at T ← Y > 18, age(X, Y) at T
    ticket(12000, X) at T.

The above theory specifies how the predicate definitions in BoxOff must change according to the new policy. In fact, to get an 8000 liras ticket a further constraint must now be satisfied, namely the customer has to be older than eighteen. On the other hand, no further requirement is imposed to buy a 12000 liras ticket.

Disc:

    ticket(6000, X) at T ← Y ≤ 18, wed at T, age(X, Y) at T

The only clause in Disc states that a 6000 liras ticket can be bought on Wednesdays by a person who is eighteen years old or younger. The programs Cons and Disc are parametric with respect to the predicate age, which is deﬁned in a separate theory Age. Age:

    age(X, Y) at T ← born(X) at T1, year-diff(T1, T, Y)

At this point we can compose the above programs to obtain the program expression representing the new policy, namely (BoxOff ∩ Cons) ∪ Disc ∪ Days ∪ Age. Finally, in order to know how much a ticket costs for a given person, the above program expression must be joined with a separate program containing the date of birth of the person. For instance, such a program could be

Tom:

    born(tom) at May 7 1982.

Then the answer to the query

    demo((BoxOff ∩ Cons) ∪ Disc ∪ Days ∪ Age ∪ Tom, ticket(X, tom) at May 20 1998)

is X = 6000 since May 20 1998 is a Wednesday and Tom is sixteen years old.
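The outcome of this query can be cross-checked with a direct Python reading of the combined policy. The sketch is not a translation of the meta-interpreter: it hard-codes the clauses obtained from (BoxOff ∩ Cons) and Disc for a single person, takes w to be December 25, 1996, and reads year-diff as the number of completed years. These choices and all helper names are assumptions made for illustration.

```python
from datetime import date

# Sketch only: a direct reading of (BoxOff ∩ Cons) ∪ Disc ∪ Days ∪ Age for one
# person. Helper names and the whole-year reading of year-diff are assumptions.
W = date(1996, 12, 25)        # w: last Wednesday of 1996
START = date(1997, 1, 1)      # validity constraint T ≥ Jan 1 1997 in BoxOff

def is_wed(day):
    return day >= W and (day - W).days % 7 == 0

def age(born, day):
    """Completed years between `born` and `day`."""
    return day.year - born.year - ((day.month, day.day) < (born.month, born.day))

def ticket_prices(born, day):
    """All prices X such that ticket(X, person) at `day` is derivable."""
    prices = set()
    years = age(born, day)
    if day >= START and is_wed(day) and years > 18:     # BoxOff ∩ Cons, 8000 clauses
        prices.add(8000)
    if day >= START and day > W and not is_wed(day):    # BoxOff ∩ Cons, 12000 clauses
        prices.add(12000)
    if is_wed(day) and years <= 18:                     # Disc
        prices.add(6000)
    return prices

print(ticket_prices(date(1982, 5, 7), date(1998, 5, 20)))  # {6000}
```

Returning a set rather than a single value reflects that a logic program may in principle derive several prices for the same person and day; here exactly one is derivable.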


Invim. Invim was an Italian law dealing with paying taxes on real estate transactions. The original regulation, in force since January 1, 1950, requires time calculations, since the amount of taxes depends on the period of ownership of the real estate property. Furthermore, although the law was abolished in 1992, it still applies, but only for the period antecedent to 1992. To see how our framework allows us to model the described situation let us first consider the program Invim below, which contains a sketch of the original body of regulations.

Invim:
    due(Amount,X,Prop) th [T2, ∞] ← T2 ≥ Jan 1 1950, buys(X,Prop) at T1, sells(X,Prop) at T2, compute(Amount,X,Prop,T1,T2)
    compute(Amount,X,Prop,T1,T2) ← . . .

To update the regulations in order to consider the decisions taken in 1992, as in the previous example we introduce two new theories. The first one includes a set of constraints on the applicability of the original regulations, while the second one is designed to embody regulations capable of handling the new situation.

Constraints:
    due(Amount,X,Prop) th [Jan 1 1993, ∞] ← sells(X,Prop) in [Jan 1 1950, Dec 31 1992]
    compute(Amount,X,Prop,T1,T2).

The first rule specifies that the relation due is computed, provided that the selling date is antecedent to December 31, 1992. The second rule specifies that the rules for compute, however many they are and however complex, carry over unconstrained to the new version of the regulation. It is important to notice that the design of the constraining theory can be done without taking care of the details (which may be quite complicated) embodied in the original law. The theory which handles the case of a property bought before December 31, 1992 and sold after the first of January, 1993, is given below.
Additions: due(Amount,X,Prop) th [T2 , ∞] ← T2 ≥ Jan 1 1993 , buys(X,Prop) at T1 , sells(X,Prop) at T2 , compute(Amount,X,Prop,T1 ,Dec 31 1992 ) Now consider a separate theory representing the transactions regarding Mary, who bought an apartment on March 8, 1965 and sold it on July 2, 1997. Trans1: buys(mary,apt8) at Mar 8 1965 . sells(mary,apt8) at Jul 2 1997 .


Paolo Baldan et al.

The query

  demo(Invim ∪ Trans1, due(Amount,mary,apt8) th [_, _])

yields the amount, say 32.1, that Mary has to pay when selling the apartment according to the old regulations. On the other hand, the query

  demo(((Invim ∩ Constraints) ∪ Additions) ∪ Trans1, due(Amount,mary,apt8) th [_, _])

yields the amount, say 27.8, computed according to the new regulations. It is smaller than the previous one because taxes are computed only for the period from March 8, 1965 to December 31, 1992, by using the clause in the program Additions. The clause in Invim ∩ Constraints cannot be used since the condition regarding the selling date (sells(X,Prop) in [Jan 1 1950, Dec 31 1992]) does not hold.

In the transaction represented by the program below, Paul buys the flat on January 1, 1995 and sells it on September 12, 1998.

Trans2:
  buys(paul,apt9) at Jan 1 1995.
  sells(paul,apt9) at Sep 12 1998.

  demo(Invim ∪ Trans2, due(Amount,paul,apt9) th [_, _])
  Amount = 1.7

  demo(((Invim ∩ Constraints) ∪ Additions) ∪ Trans2, due(Amount,paul,apt9) th [_, _])
  no

If we query the theory Invim ∪ Trans2 we find that Paul must pay a certain amount of tax, say 1.7, while, according to the updated regulation, he does not have to pay the Invim tax because he bought and sold the flat after December 31, 1992. Indeed, the answer to the query computed with respect to the theory ((Invim ∩ Constraints) ∪ Additions) ∪ Trans2 is no, i.e., no tax is due.

Summing up, the union operation can be used to obtain a larger set of clauses. We can join a program with another one to provide it with definitions of its undefined predicates (e.g., Age provides a definition for the predicate age not defined in Disc and Cons) or alternatively to add new clauses for an existing predicate (e.g., Disc contains a new definition for the predicate ticket already defined in BoxOff). On the other hand, the intersection operator provides a natural way of imposing constraints on existing programs (e.g., the program Cons constrains the definition of ticket given in BoxOff).
Such constraints aﬀect not only the computation of a particular property, like the intersection operation deﬁned by Brogi et al. [9], but also the temporal information in which the property holds.
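The effect of the Invim update on the two transactions can be sketched outside the logic language. The following Python fragment is only an illustration under stated assumptions: the function names are hypothetical, the elided body of compute is not reproduced (only the taxable period is returned), and compute is assumed to fail on an empty period, which is what the answer no for Paul suggests.

```python
from datetime import date

START = date(1950, 1, 1)     # original regulation in force
CUTOFF = date(1992, 12, 31)  # last day covered by the Invim tax

def period_old(buy, sell):
    """Taxable period under Invim alone: the whole ownership period."""
    if sell >= START:
        return (buy, sell)
    return None

def period_new(buy, sell):
    """Taxable period under (Invim ∩ Constraints) ∪ Additions."""
    # Invim ∩ Constraints: the sale falls in [Jan 1 1950, Dec 31 1992].
    if START <= sell <= CUTOFF:
        return (buy, sell)
    # Additions: sale from Jan 1 1993 on, period capped at Dec 31 1992.
    # Assumption: an empty period (bought after the cutoff) yields no tax.
    if sell > CUTOFF and buy <= CUTOFF:
        return (buy, CUTOFF)
    return None

mary = (date(1965, 3, 8), date(1997, 7, 2))
paul = (date(1995, 1, 1), date(1998, 9, 12))
```

With these definitions Mary's taxable period shrinks under the new regulation, while Paul has none, mirroring the answers to the four queries.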


The use of TACLP programs allows us to represent and reason on temporal information in a natural way. Since time is explicit, at the object level we can directly access the temporal information associated with atoms. Periodic information can be easily expressed by recursive predicates (see the predicates wed and non-wed in the theory Days). Indefinite temporal information can be represented by using in annotations. E.g., in the program Constraints the in annotation is used to specify that a certain date is within a time period (sells(X,Prop) in [Jan 1 1950, Dec 31 1992]). This is a case in which it is not important to know the precise date; it is sufficient to have information which delimits the time period in which it can occur.

6.2 Valid-Timeslice Operator

By exploiting the features of the intersection operator we can define an operator which eases the selection of information holding in a certain interval.

Definition 5. Let P be a plain program. For a ground interval [t1, t2] we define

  P ⇓ [t1, t2] = P ∩ 1_P^{[t1,t2]}

where 1_P^{[t1,t2]} is a program which contains a fact "p(X1, . . . , Xn) th [t1, t2]." for all p defined in P with arity n.

Intuitively the operator ⇓ selects only the clauses belonging to P that hold in [t1, t2] or in a subinterval of [t1, t2], and it restricts their validity time to such an interval. Therefore ⇓ allows us to create temporal views of programs, for instance P ⇓ [t, t] is the program P at time point t. Hence it acts as a valid-timeslice operator in the field of databases (see the glossary in [13]).

Consider again the Invim example of the previous section. The whole history of the regulation concerning Invim can be represented by using the following program expression

  (Invim ⇓ [0, Dec 31 1992]) ∪ ((Invim ∩ Constraints) ∪ Additions)

By applying the operation ⇓, the validity of the clauses belonging to Invim is restricted to the period from January 1, 1950 up to December 31, 1992, thus modeling the law before January 1, 1993. On the other hand, the program expression (Invim ∩ Constraints) ∪ Additions expresses the regulation in force since January 1, 1993, as we previously explained.

This example suggests how the operation ⇓ can be useful to model updates. Suppose that we want to represent that Frank is a research assistant in mathematics, and that, later, he is promoted to assistant professor. In our formalism we can define a program Frank that records the information associated with Frank as a research assistant.

Frank:
  research_assistant(maths) th [Mar 8 1993, ∞].


In March 1996 Frank becomes an assistant professor. In order to modify the information contained in the program Frank, we build the following program expression:

  (Frank ⇓ [0, Feb 29 1996]) ∪ {assistant_prof(maths) th [Mar 1 1996, ∞].}

where the second expression is an unnamed theory. Unnamed theories, which have not been discussed so far, can be represented by the following meta-level clause:

  clause({X α ← Y}, X α, Y) ← T1 ≤ T2

where α = th [T1, T2] or α = in [T1, T2].

The described update resembles the addition and deletion of a ground atom. For instance in LDL++ [47] an analogous change can be implemented by solving the goal −research_assistant(maths), +assistant_prof(maths). The advantage of our approach is that we do not directly change the clauses of a program, e.g. Frank in the example, but we compose the old theory with a new one that represents the current situation. Therefore the state of the database before March 1, 1996 is preserved, thus maintaining the whole history. For instance, the first query below interrogates the updated database at a date before Frank's promotion, whereas the second one shows how the information in the database has been modified.

  demo((Frank ⇓ [0, Feb 29 1996]) ∪ {assistant_prof(maths) th [Mar 1 1996, ∞].}, research_assistant(X) at Feb 23 1994)
  X = maths

  demo((Frank ⇓ [0, Feb 29 1996]) ∪ {assistant_prof(maths) th [Mar 1 1996, ∞].}, research_assistant(X) at Mar 12 1996)
  no.
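The Frank update can be mimicked with a few lines of Python. This is a minimal sketch, not the meta-interpreter: it assumes a program consisting only of ground th-annotated facts, represents dates as day ordinals, and treats the greatest lower bound of two th annotations as interval intersection. All function names are hypothetical.

```python
import math
from datetime import date

INF = math.inf

def day(y, m, d):
    """Map a calendar date to an integer time point."""
    return date(y, m, d).toordinal()

def glb(a, b):
    """Intersection of two closed intervals, or None if it is empty."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

def timeslice(facts, interval):
    """Sketch of P ⇓ [t1, t2] for a list of (atom, th-interval) facts."""
    sliced = []
    for atom, ann in facts:
        cut = glb(ann, interval)
        if cut is not None:
            sliced.append((atom, cut))
    return sliced

def holds_at(facts, atom, t):
    """True if some fact for atom covers time point t."""
    return any(a == atom and s <= t <= e for a, (s, e) in facts)

frank = [("research_assistant(maths)", (day(1993, 3, 8), INF))]
updated = timeslice(frank, (0, day(1996, 2, 29))) + [
    ("assistant_prof(maths)", (day(1996, 3, 1), INF))
]
```

Querying `updated` at February 23, 1994 and at March 12, 1996 reproduces the two answers above, and the original list `frank` is left untouched, which is the point of composing theories rather than editing them.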

7 Related Work

Event Calculus by Kowalski and Sergot [28] was the first attempt to cast into logic programming the rules for reasoning about time. In more detail, Event Calculus is a treatment of time, based on the notion of event, in first-order classical logic augmented with negation as failure. It is closely related to Allen's interval temporal logic [3]. For example, let E1 be an event in which Bob gives the Book to John and let E2 be an event in which John gives Mary the Book. Assume that E2 occurs after E1. Given these event descriptions, we can deduce that there is a period started by the event E1 in which John possesses the book and that there is a period terminated by E1 in which Bob possesses the book. This situation is represented pictorially as follows:


  Bob has the Book  <----------------- E1 ----------------->  John has the Book
  John has the Book <----------------- E2 ----------------->  Mary has the Book
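The two events in the picture can also be written down as data. The sketch below is a toy encoding, not the Event Calculus axiomatization: the dictionary layout and the function name are assumptions, and it only derives the before and after periods from what each event terminates and initiates.

```python
# Each event terminates one relationship and initiates another (toy encoding).
EVENTS = {
    "E1": {"terminates": "has(bob, book)", "initiates": "has(john, book)"},
    "E2": {"terminates": "has(john, book)", "initiates": "has(mary, book)"},
}

def holds(events):
    """Return (period, relationship) pairs derived from the events."""
    facts = []
    for e, effect in events.items():
        # A relationship terminated by e holds in the period before(e, r).
        facts.append((f"before({e}, {effect['terminates']})", effect["terminates"]))
        # A relationship initiated by e holds in the period after(e, r).
        facts.append((f"after({e}, {effect['initiates']})", effect["initiates"]))
    return facts
```

Each event contributes one period ending at it and one period starting at it, which matches the two arrows drawn for E1 and E2.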

A series of axioms for deducing the existence of time periods and the Start and End of each time period are given by using the Holds predicate. Holds(before(e r)) if Terminates(e r) means that the relationship r holds in the time period before(e r), which denotes a time period terminated by the event e. Holds(after(e r)) is defined in an analogous way. Event Calculus provides a natural treatment of valid time in databases, and it was extended in [43,44] to include the concept of transaction time. Therefore Event Calculus exploits the deductive power of logic and the computational power of logic programming as in our approach, but the modeling of time is different: events are the granularity of time chosen in Event Calculus, whereas we use time points and time periods. Furthermore no provision for multiple theories is given in Event Calculus.

Kifer and Subrahmanian in [26] introduce generalized annotated logic programs (GAPs), and show how Templog [1] and an interval based temporal logic can be translated into GAPs. The annotations used there correspond to the th annotations of MuTACLP. To implement the annotated logic language, the paper proposes to use "reductants", additional clauses which are derived from existing clauses to express all possible least upper bounds. The problem is that a finite program may generate infinitely many such reductants. Then a new kind of resolution for annotated logic programs, called "ca-resolution", is proposed in [30]. The idea is to compute dynamically and incrementally the least upper bounds by collecting partial answers. Operationally this is similar to the meta-interpreter presented in Section 5.1, which relies on recursion to collect the partial answers. However, in [30] the intermediate stages of the computation may not be sound with respect to the standard CLP semantics.

The paper [26] also presents two fixpoint semantics for GAPs, defined in terms of two different operators. The first operator, called T_P, is based on interpretations which associate with each element of the Herbrand Base of a program P a set of annotations which is an ideal, i.e., a set downward closed and closed under finite least upper bounds. For each atom A, the computed ideal is the least one containing the annotations α of annotated atoms A α which are heads of (instances of) clauses whose body holds in the interpretation. The other operator, R_P, is based on interpretations which associate with each atom of the Herbrand Base a single annotation, obtained as the least upper bound of the set of annotations computed as in the previous case. Our fixpoint operator for MuTACLP works similarly to the T_P operator: at each step we take the closure with respect to (representable) finite least upper bounds, and, although we perform the downward closure only at the end of the computation, this does


not affect the set of derivable consequences. The main difference resides in the language: MuTACLP is an extension of CLP, which focuses on temporal aspects and provides mechanisms for combining programs, taking from GAP the basic ideas for handling annotations, whereas GAP is a general language with negation and arbitrary annotations but without constraints and multiple theories.

Our temporal annotations correspond to some of the predicates proposed by Galton in [19], which is a critical examination of Allen's classical work on a theory of action and time [3]. Galton accounts for both time points and time periods in dense linear time. Assuming that the intervals I are not singletons, Galton's predicate holds-in(A,I) can be mapped into MuTACLP's A in I, holds-on(A,I) into A th I, and holds-at(A,t) into A at t, where A is an atomic formula. From the described correspondence it becomes clear that MuTACLP can be seen as reified FOL where annotated formulae, for example born(john) at t, correspond to binary meta-relations between predicates and temporal information, for example at(born(john), t). MuTACLP can also be regarded as a modal logic, where the annotations are seen as parameterized modal operators, e.g., born(john) (at t).

Our temporal annotations also correspond to some temporal characteristics in the ChronoBase data model [42]. Such a model allows for the representation of a wide variety of temporal phenomena in a temporal database which cannot be expressed by using only th and in annotations. However, this model is an extension of the relational data model and, unlike our model, it is not rule-based. An interesting line of research could be to investigate the possibility of enriching the set of annotations in order to capture some other temporal characteristics, like a property that holds in an interval but not in its subintervals, while still maintaining a simple and clear semantics.

In [10], a powerful temporal logic named MTL (tense logic extended by parameterized temporal operators) is translated into first order constraint logic. The resulting language subsumes Templog. The parameterized temporal operators of MTL correspond to the temporal annotations of TACLP. The constraint theory of MTL is rather complex as it involves quantified variables and implication, whose treatment goes beyond standard CLP implementations. On the other hand, MuTACLP inherits an efficient standard constraint-based implementation of annotations from the TACLP framework.

As far as the multi-theory setting is concerned, i.e. the possibility offered by MuTACLP to structure and compose (temporal) knowledge, there are few logic-based approaches providing the user with these tools. One is Temporal Datalog [35], an extension of Datalog based on a simple temporal logic with two temporal operators, namely first and next. Temporal Datalog introduces a notion of module, which however does not seem to be used as a knowledge representation tool but rather to define new non-standard algebraic operators. In fact, to query a Temporal Datalog program, Orgun proposes a "point-wise extension" of the relational algebra upon the set of natural numbers, called TRA-algebra. Then he provides a mechanism for specifying generic modules, called temporal modules, which are parametric Temporal Datalog programs, with a


number of input predicates (parameters) and an output predicate. A module can then be regarded as an operator which, given a temporal relation, returns a temporal relation. Thus temporal modules are indeed used as operators of TRA, through which one has access to the use of recursion, arithmetic predicates and temporal operators.

A multi-theory framework in which temporal information can be handled, based on annotated logics, is proposed by Subrahmanian in [45]. This is a very general framework aimed at amalgamating multiple knowledge bases which can also contain temporal information. The knowledge bases are GAPs [26] and temporal information is modeled by using an appropriate lattice of annotations. In order to integrate these programs, a so-called Mediatory Database is given, which is a GAP having clauses of the form

  A0 : [m, µ] ← A1 : [D1, µ1], . . . , An : [Dn, µn]

where each Di is a set of database names. Intuitively, a ground instance of a clause in the mediator can be interpreted as follows: "If the databases in set Di, 1 ≤ i ≤ n, (jointly) imply that the truth value of Ai is at least µi, then the mediator will conclude that the truth value of A0 is at least µ". Essentially the fundamental mechanism provided to combine knowledge bases is a kind of message passing. Roughly speaking, the resolution of an atom Ai : [Di, µi] is delegated to different databases, specified by the set Di of database names, and the annotation µi is obtained by considering the least upper bounds of the annotations of each Ai computed in the distinct databases. Our approach is quite different because the meta-level composition operators allow us to access not only the relation defined by a predicate but also the definition of the predicate. For instance P ∪ Q is equivalent to a program whose clauses are the union of the clauses of P and Q, and thus the information which can be derived from P ∪ Q is greater than the union of what we can derive from P and Q separately.
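The last remark about P ∪ Q can be illustrated with ground definite clauses and no annotations. The following Python fragment is a minimal sketch under those simplifying assumptions: the clause encoding and the function name are hypothetical, and the least model is computed by naive forward chaining.

```python
def consequences(clauses):
    """Least model of ground definite clauses given as (head, body) pairs."""
    known = set()
    changed = True
    while changed:
        changed = False
        for head, body in clauses:
            if head not in known and all(b in known for b in body):
                known.add(head)
                changed = True
    return known

P = [("p", ["q"])]  # p holds if q holds
Q = [("q", [])]     # q is a fact

separate = consequences(P) | consequences(Q)  # P and Q queried in isolation
joint = consequences(P + Q)                   # clauses of P ∪ Q put together
```

Here `separate` contains only q, while `joint` also contains p, because the clause for p in P can use the definition of q supplied by Q. In general the joint set includes the separate one; this example shows the inclusion can be strict.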

8 Conclusion

In this paper we have introduced MuTACLP, a language which joins the advantages of TACLP in handling temporal information with the ability to structure and compose programs. The proposed framework allows one to deal with time points and time periods and to model definite, indefinite and periodic temporal information, which can be distributed among different theories. Representing knowledge in separate programs naturally leads to using knowledge from different sources; information can be stored at different sites and combined in a modular way by employing the meta-level operators. This modular approach also favors the reuse of the knowledge encoded in the programs for future applications. The language MuTACLP has been given a top-down semantics by means of a meta-interpreter and a bottom-up semantics based on an immediate consequence operator. Concerning the bottom-up semantics, it would be interesting to investigate different definitions of the immediate consequence operator,


for instance by considering an operator similar to the function R_P for generalized annotated programs [26]. The domain of interpretations considered in this paper is, in a certain sense, unstructured: interpretations are general sets of annotated atoms and the order, which is simply subset inclusion, does not take into account the order on annotations. Alternative solutions, based on different notions of interpretation, may consider more abstract domains. These domains can be obtained by endowing C-base_L × Ann with the product order (induced by the identity relation on C-base_L and the order on Ann) and then by taking as elements of the domain (i.e. as interpretations) only those subsets of annotated atoms that satisfy some closure properties with respect to such an order. For instance, one can require "downward-closedness", which amounts to including subsumption in the immediate consequence operator. Another possible property is "limit-closedness", namely the presence of the least upper bound of all directed sets, which, from a computational point of view, amounts to considering computations which possibly require more than ω steps.

In [15] the language TACLP is presented as an instance of annotated constraint logic (ACL) for reasoning about time. Similarly, we could have first introduced a Multi-theory Annotated Constraint Logic (MuACL in brief), viewing MuTACLP as an instance of MuACL. To define MuACL the constructions described in this paper should be generalized by using, as basic language for plain programs, the more general paradigm of ACL where atoms can be labelled by a general class of annotations. In defining MuACL we should require that the class of annotations forms a lattice, in order to have both upper bounds and lower bounds (the latter are necessary for the definition of the intersection operator).
Indeed, it is not difficult to see that, under the assumption that only atoms can be annotated and clauses are free of negation, both the meta-interpreter and the immediate consequence operator smoothly generalize to deal with general annotations.

Another interesting topic for future investigation is the treatment of negation. In the line of Frühwirth, a possible solution consists of embodying the "negation by default" of logic programming into MuTACLP by exploiting the logical equalities proved in [15]:

  ((¬A) th I) ⇔ ¬(A in I)
  ((¬A) in I) ⇔ ¬(A th I)
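The two equalities can be checked exhaustively on a small model. The sketch below rests on simplifying assumptions that are not part of the source: time is a finite set of integer points, an atom is identified with the set of points at which it holds, and negation is read as set complement rather than negation by default. The function names are hypothetical.

```python
from itertools import combinations

def holds_th(points, s, e):
    """A th [s, e]: A holds at every point of the interval."""
    return all(t in points for t in range(s, e + 1))

def holds_in(points, s, e):
    """A in [s, e]: A holds at some point of the interval."""
    return any(t in points for t in range(s, e + 1))

def check(horizon=5):
    """Test both equalities for every atom and interval over range(horizon)."""
    universe = set(range(horizon))
    for k in range(horizon + 1):
        for pts in combinations(sorted(universe), k):
            a = set(pts)
            not_a = universe - a
            for s in range(horizon):
                for e in range(s, horizon):
                    if holds_th(not_a, s, e) != (not holds_in(a, s, e)):
                        return False
                    if holds_in(not_a, s, e) != (not holds_th(a, s, e)):
                        return False
    return True
```

On this finite model the equalities reduce to the duality between "every point" and "some point" of an interval, so `check()` finds no counterexample.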

Consequently, the meta-interpreter is extended with two clauses which use such equalities:

  demo(E, (¬A) th I) ← ¬demo(E, A in I)
  demo(E, (¬A) in I) ← ¬demo(E, A th I)

However, the interaction between negation by default and program composition operations is still to be fully understood. Some results on the semantic interactions between operations and negation by default are presented in [8], where, nevertheless, the handling of time is not considered.

Furthermore, it is worth noticing that in this paper we have implicitly assumed that the same unit for time is used in different programs, i.e. we have not dealt with different time granularities. The ability to cope with different


granularities (e.g. seconds, days, etc.) is particularly relevant to support interoperability among systems. A simple way to handle this feature is to introduce in MuTACLP a notion of time unit and a set of conversion predicates which transform time points into the chosen time unit (see, e.g., [5]).

We finally observe that in MuTACLP spatial data can also be naturally modelled. In fact, in the style of the constraint database approaches (see, e.g., [25,37,20]) spatial data can be represented by using constraints. The facilities to handle time offered by MuTACLP allow one to easily establish spatio-temporal correlations, for instance time-varying areas, or, more generally, moving objects, supporting either discrete or continuous changes (see [38,31,40]).

Acknowledgments: This work has been partially supported by Esprit Working Group 28115 - DeduGIS.

References

1. M. Abadi and Z. Manna. Temporal logic programming. Journal of Symbolic Computation, 8:277–295, 1989.
2. J.F. Allen. Maintaining knowledge about temporal intervals. Communications of the ACM, 26(11):832–843, 1983.
3. J.F. Allen. Towards a general theory of action and time. Artificial Intelligence, 23:123–154, 1984.
4. P. Baldan, P. Mancarella, A. Raffaetà, and F. Turini. MuTACLP: A language for temporal reasoning with multiple theories. Technical report, Dipartimento di Informatica, Università di Pisa, 2001.
5. C. Bettini, X. S. Wang, and S. Jajodia. An architecture for supporting interoperability among temporal databases. In [13], pages 36–55.
6. K.A. Bowen and R.A. Kowalski. Amalgamating language and metalanguage in logic programming. In K. L. Clark and S.-A. Tarnlund, editors, Logic programming, volume 16 of APIC studies in data processing, pages 153–172. Academic Press, 1982.
7. A. Brogi. Program Construction in Computational Logic. PhD thesis, Dipartimento di Informatica, Università di Pisa, 1993.
8. A. Brogi, S. Contiero, and F. Turini. Programming by combining general logic programs. Journal of Logic and Computation, 9(1):7–24, 1999.
9. A. Brogi, P. Mancarella, D. Pedreschi, and F. Turini. Modular logic programming. ACM Transactions on Programming Languages and Systems, 16(4):1361–1398, 1994.
10. C. Brzoska. Temporal Logic Programming with Metric and Past Operators. In [14], pages 21–39.
11. J. Chomicki. Temporal Query Languages: A Survey. In Temporal Logic: Proceedings of the First International Conference, ICTL'94, volume 827 of Lecture Notes in Artificial Intelligence, pages 506–534. Springer, 1994.
12. J. Chomicki and T. Imielinski. Temporal Deductive Databases and Infinite Objects. In Proceedings of ACM SIGACT/SIGMOD Symposium on Principles of Database Systems, pages 61–73, 1988.
13. O. Etzion, S. Jajodia, and S. Sripada, editors. Temporal Databases: Research and Practice, volume 1399 of Lecture Notes in Computer Science. Springer, 1998.


14. M. Fisher and R. Owens, editors. Executable Modal and Temporal Logics, volume 897 of Lecture Notes in Artificial Intelligence. Springer, 1995.
15. T. Frühwirth. Temporal Annotated Constraint Logic Programming. Journal of Symbolic Computation, 22:555–583, 1996.
16. D. M. Gabbay. Modal and temporal logic programming. In [18], pages 197–237.
17. D.M. Gabbay and P. McBrien. Temporal Logic & Historical Databases. In Proceedings of the Seventeenth International Conference on Very Large Databases, pages 423–430, 1991.
18. A. Galton, editor. Temporal Logics and Their Applications. Academic Press, 1987.
19. A. Galton. A Critical Examination of Allen's Theory of Action and Time. Artificial Intelligence, 42:159–188, 1990.
20. S. Grumbach, P. Rigaux, and L. Segoufin. The DEDALE system for complex spatial queries. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD-98), pages 213–224, 1998.
21. T. Hrycej. A temporal extension of Prolog. Journal of Logic Programming, 15(1 & 2):113–145, 1993.
22. J. Jaffar and M.J. Maher. Constraint Logic Programming: A Survey. Journal of Logic Programming, 19 & 20:503–582, 1994.
23. J. Jaffar, M.J. Maher, K. Marriott, and P.J. Stuckey. The Semantics of Constraint Logic Programs. Journal of Logic Programming, 37(1-3):1–46, 1998.
24. J. Jaffar, S. Michaylov, P. Stuckey, and R. Yap. The CLP(R) Language and System. ACM Transactions on Programming Languages and Systems, 14(3):339–395, 1992.
25. P.C. Kanellakis, G.M. Kuper, and P.Z. Revesz. Constraint query languages. Journal of Computer and System Sciences, 51(1):26–52, 1995.
26. M. Kifer and V.S. Subrahmanian. Theory of Generalized Annotated Logic Programming and its Applications. Journal of Logic Programming, 12:335–367, 1992.
27. M. Koubarakis. Database models for infinite and indefinite temporal information. Information Systems, 19(2):141–173, 1994.
28. R. A. Kowalski and M.J. Sergot. A Logic-based Calculus of Events. New Generation Computing, 4(1):67–95, 1986.
29. R.A. Kowalski and J.S. Kim. A metalogic programming approach to multi-agent knowledge and belief. In Artificial Intelligence and Mathematical Theory of Computation. Academic Press, 1991.
30. S.M. Leach and J.J. Lu. Computing Annotated Logic Programs. In Proceedings of the eleventh International Conference on Logic Programming, pages 257–271, 1994.
31. P. Mancarella, G. Nerbini, A. Raffaetà, and F. Turini. MuTACLP: A language for declarative GIS analysis. In Proceedings of the Sixth International Conference on Rules and Objects in Databases (DOOD2000), volume 1861 of Lecture Notes in Artificial Intelligence, pages 1002–1016. Springer, 2000.
32. P. Mancarella, A. Raffaetà, and F. Turini. Knowledge Representation with Multiple Logical Theories and Time. Journal of Experimental and Theoretical Artificial Intelligence, 11:47–76, 1999.
33. P. Mancarella, A. Raffaetà, and F. Turini. Temporal Annotated Constraint Logic Programming with Multiple Theories. In Tenth International Workshop on Database and Expert Systems Applications, pages 501–508. IEEE Computer Society Press, 1999.
34. B. Martens and D. De Schreye. Why Untyped Nonground Metaprogramming Is Not (Much Of) A Problem. Journal of Logic Programming, 22(1):47–99, 1995.
35. M. A. Orgun. On temporal deductive databases. Computational Intelligence, 12(2):235–259, 1996.


36. M. A. Orgun and W. Ma. An Overview of Temporal and Modal Logic Programming. In Temporal Logic: Proceedings of the First International Conference, ICTL'94, volume 827 of Lecture Notes in Artificial Intelligence, pages 445–479. Springer, 1994.
37. J. Paredaens, J. Van den Bussche, and D. Van Gucht. Towards a theory of spatial database queries. In Proceedings of the 13th ACM Symposium on Principles of Database Systems, pages 279–288, 1994.
38. A. Raffaetà. Spatio-temporal knowledge bases in a constraint logic programming framework with multiple theories. PhD thesis, Dipartimento di Informatica, Università di Pisa, 2000.
39. A. Raffaetà and T. Frühwirth. Semantics for Temporal Annotated Constraint Logic Programming. In Labelled Deduction, volume 17 of Applied Logic Series, pages 215–243. Kluwer Academic, 2000.
40. A. Raffaetà and C. Renso. Temporal Reasoning in Geographical Information Systems. In International Workshop on Advanced Spatial Data Management (DEXA Workshop), pages 899–905. IEEE Computer Society Press, 2000.
41. M. J. Sergot, F. Sadri, R. A. Kowalski, F. Kriwaczek, P. Hammond, and H. T. Cory. The British Nationality Act as a logic program. Communications of the ACM, 29(5):370–386, 1986.
42. S. Sripada and P. Möller. The Generalized ChronoBase Temporal Data Model. In Meta-logics and Logic Programming, pages 310–335. MIT Press, 1995.
43. S.M. Sripada. A logical framework for temporal deductive databases. In Proceedings of the Very Large Databases Conference, pages 171–182, 1988.
44. S.M. Sripada. Temporal Reasoning in Deductive Databases. PhD thesis, Department of Computing, Imperial College of Science & Technology, 1991.
45. V. S. Subrahmanian. Amalgamating Knowledge Bases. ACM Transactions on Database Systems, 19(2):291–331, 1994.
46. A. Tansel, J. Clifford, S. Gadia, S. Jajodia, A. Segev, and R. Snodgrass, editors. Temporal Databases: Theory, Design, and Implementation. Benjamin/Cummings, 1993.
47. C. Zaniolo, N. Arni, and K. Ong. Negation and aggregates in recursive rules: The LDL++ Approach. In International Conference on Deductive and Object-Oriented Databases (DOOD'93), volume 760 of Lecture Notes in Computer Science. Springer, 1993.


Appendix: Proofs

Proposition 1 Let I1 and I2 be two interpretations. Then ↓(I1 ⋒ I2) = ↓I1 ∩ ↓I2.

Proof. Assume (A, α) ∈ ↓(I1 ⋒ I2). By definition of downward closure there exists γ such that (A, γ) ∈ I1 ⋒ I2 and D_C ⊨ α ⊑ γ. By definition of ⋒ there exist β and β′ such that (A, β) ∈ I1 and (A, β′) ∈ I2 and D_C ⊨ β ⊓ β′ = γ. Therefore D_C ⊨ α ⊑ β, α ⊑ β′, and by definition of downward closure we conclude (A, α) ∈ ↓I1 and (A, α) ∈ ↓I2, i.e., (A, α) ∈ ↓I1 ∩ ↓I2.

Vice versa assume (A, α) ∈ ↓I1 ∩ ↓I2. By definition of set-theoretic intersection and downward closure there exist β and β′ such that D_C ⊨ α ⊑ β, α ⊑ β′ and (A, β) ∈ I1 and (A, β′) ∈ I2. By definition of ⋒, (A, γ) ∈ I1 ⋒ I2 and D_C ⊨ β ⊓ β′ = γ. By property of the greatest lower bound D_C ⊨ α ⊑ β ⊓ β′, hence (A, α) ∈ ↓(I1 ⋒ I2).

Theorem 1 Let E be a program expression. The function T_E^C is continuous (on (℘(C-base_L × Ann), ⊆)).

Proof. Let {I_i}_{i∈N} be a chain in (℘(C-base_L × Ann), ⊆), i.e., I_0 ⊆ I_1 ⊆ . . . ⊆ I_i ⊆ . . .. Then we have to prove

  (A, α) ∈ T_E^C(⋃_{i∈N} I_i) ⟺ (A, α) ∈ ⋃_{i∈N} T_E^C(I_i).

The proof is by structural induction on E.

(E is a plain program P).

  (A, α) ∈ T_P^C(⋃_{i∈N} I_i)
⟺ {definition of T_P^C}
  ((α = th [s1, s2] ∨ α = in [s1, s2]) ∧ A α ← C1, . . . , Ck, B1 α1, . . . , Bn αn ∈ ground_C(P) ∧ {(B1, β1), . . . , (Bn, βn)} ⊆ ⋃_{i∈N} I_i ∧ D_C ⊨ C1, . . . , Ck, α1 ⊑ β1, . . . , αn ⊑ βn, s1 ≤ s2)
  ∨ (α = th [s1, r2] ∧ A th [s1, s2] ← C1, . . . , Ck, B1 α1, . . . , Bn αn ∈ ground_C(P) ∧ {(B1, β1), . . . , (Bn, βn)} ⊆ ⋃_{i∈N} I_i ∧ (A, th [r1, r2]) ∈ ⋃_{i∈N} I_i ∧ D_C ⊨ C1, . . . , Ck, α1 ⊑ β1, . . . , αn ⊑ βn, s1 < r1, r1 ≤ s2, s2 < r2)
⟺ {property of set-theoretic union and {I_i}_{i∈N} is a chain. Notice that for (⟹) j can be any element of the set {k | (B_i, β_i) ∈ I_k, i = 1, . . . , n}, which is clearly not empty}
  ((α = th [s1, s2] ∨ α = in [s1, s2]) ∧ A α ← C1, . . . , Ck, B1 α1, . . . , Bn αn ∈ ground_C(P) ∧ {(B1, β1), . . . , (Bn, βn)} ⊆ I_j ∧ D_C ⊨ C1, . . . , Ck, α1 ⊑ β1, . . . , αn ⊑ βn, s1 ≤ s2) ∨


  (α = th [s1, r2] ∧ A th [s1, s2] ← C1, . . . , Ck, B1 α1, . . . , Bn αn ∈ ground_C(P) ∧ {(B1, β1), . . . , (Bn, βn)} ⊆ I_j ∧ (A, th [r1, r2]) ∈ I_j ∧ D_C ⊨ C1, . . . , Ck, α1 ⊑ β1, . . . , αn ⊑ βn, s1 < r1, r1 ≤ s2, s2 < r2)
⟺ {definition of T_P^C}
  (A, α) ∈ T_P^C(I_j)
⟺ {set-theoretic union}
  (A, α) ∈ ⋃_{i∈N} T_P^C(I_i)

(E = Q ∪ R).

  (A, α) ∈ T_{Q∪R}^C(⋃_{i∈N} I_i)
⟺ {definition of T_{Q∪R}^C}
  (A, α) ∈ T_Q^C(⋃_{i∈N} I_i) ∪ T_R^C(⋃_{i∈N} I_i)
⟺ {inductive hypothesis}
  (A, α) ∈ (⋃_{i∈N} T_Q^C(I_i)) ∪ (⋃_{i∈N} T_R^C(I_i))
⟺ {properties of union}
  (A, α) ∈ ⋃_{i∈N} (T_Q^C(I_i) ∪ T_R^C(I_i))
⟺ {definition of T_{Q∪R}^C}
  (A, α) ∈ ⋃_{i∈N} T_{Q∪R}^C(I_i)

(E = Q ∩ R).

  (A, α) ∈ T_{Q∩R}^C(⋃_{i∈N} I_i)
⟺ {definition of T_{Q∩R}^C}
  (A, α) ∈ T_Q^C(⋃_{i∈N} I_i) ⋒ T_R^C(⋃_{i∈N} I_i)
⟺ {inductive hypothesis}
  (A, α) ∈ (⋃_{i∈N} T_Q^C(I_i)) ⋒ (⋃_{i∈N} T_R^C(I_i))
⟺ {definition of ⋒ and monotonicity of T^C}
  (A, α) ∈ ⋃_{i∈N} (T_Q^C(I_i) ⋒ T_R^C(I_i))
⟺ {definition of T_{Q∩R}^C}
  (A, α) ∈ ⋃_{i∈N} T_{Q∩R}^C(I_i)

Soundness and Completeness

This section presents the proofs of the soundness and completeness results for the MuTACLP meta-interpreter. Due to space limitations, the proofs of the technical lemmata are omitted and can be found in [4,38]. We first fix some notational conventions. In the following we will denote by E, N, R and Q generic program expressions, and by C the fixed constraint domain where the constraints of object programs are interpreted. Let M be the fixed constraint domain where the constraints of the meta-interpreter defined in Section 5.1 are interpreted. We denote by A, B elements of C-base_L, by α, β, γ annotations in Ann and by C a C-ground instance of a constraint. All symbols may have subscripts. In the following, for simplicity, we will drop the reference to C and M in the name of the immediate consequence operators. Moreover we refer to the program containing the meta-level representation of object level programs and clauses (1)-(10) as "the meta-program V corresponding to a program expression".


Paolo Baldan et al.

We will say that an interpretation $I \subseteq \mathcal{C}\text{-}base_L \times Ann$ satisfies the body of a $\mathcal{C}$-ground instance $A\,\alpha \leftarrow C_1, \ldots, C_k, B_1\,\alpha_1, \ldots, B_n\,\alpha_n$ of a clause, or in symbols $I \models C_1, \ldots, C_k, B_1\,\alpha_1, \ldots, B_n\,\alpha_n$, if

1. $\mathcal{D}_{\mathcal{C}} \models C_1, \ldots, C_k$, and
2. there are annotations $\beta_1, \ldots, \beta_n$ such that $\{(B_1, \beta_1), \ldots, (B_n, \beta_n)\} \subseteq I$ and $\mathcal{D}_{\mathcal{C}} \models \alpha_1 \sqsubseteq \beta_1, \ldots, \alpha_n \sqsubseteq \beta_n$.

Furthermore, we will often denote a sequence $C_1, \ldots, C_k$ of $\mathcal{C}$-ground instances of constraints by $\bar{C}$, while a sequence $B_1\,\alpha_1, \ldots, B_n\,\alpha_n$ of annotated atoms in $\mathcal{C}\text{-}base_L \times Ann$ will be denoted by $\bar{B}$. For example, with this convention a clause of the kind $A\,\alpha \leftarrow C_1, \ldots, C_k, B_1\,\alpha_1, \ldots, B_n\,\alpha_n$ will be written as $A\,\alpha \leftarrow \bar{C}, \bar{B}$ and, similarly, in the meta-level representation, we will write $\mathit{clause}(E, A\,\alpha, (\bar{C}, \bar{B}))$ in place of $\mathit{clause}(E, A\,\alpha, (C_1, \ldots, C_k, B_1\,\alpha_1, \ldots, B_n\,\alpha_n))$.

Soundness. In order to show the soundness of the meta-interpreter (restricted to the atoms of interest), we present the following easy lemma, stating that if a conjunctive goal is provable at the meta-level then its atomic conjuncts are also provable at the meta-level.

Lemma 1. Let $E$ be a program expression and let $V$ be the corresponding meta-interpreter. For any $B_1\,\alpha_1, \ldots, B_n\,\alpha_n$ with $B_i \in \mathcal{C}\text{-}base_L$ and $\alpha_i \in Ann$, and for any $C_1, \ldots, C_k$ with $C_i$ a $\mathcal{C}$-ground instance of a constraint, we have, for all $h$:

\[
\mathit{demo}(E, (C_1, \ldots, C_k, B_1\,\alpha_1, \ldots, B_n\,\alpha_n)) \in T_V^{h} \;\Longrightarrow\; \{\mathit{demo}(E, B_1\,\alpha_1), \ldots, \mathit{demo}(E, B_n\,\alpha_n)\} \subseteq T_V^{h} \;\wedge\; \mathcal{D}_{\mathcal{C}} \models C_1, \ldots, C_k.
\]

The next two lemmata relate the clauses computed from a program expression $E$ at the meta-level, called "virtual clauses", with the set of consequences of $E$. The first lemma states that whenever we can find a virtual clause computed from $E$ whose body is satisfied by $I$, the head $A\,\alpha$ of the clause is a consequence of the program expression $E$. The second one shows how the head of a virtual clause can be "joined" with an already existing annotated atom in order to obtain an atom with a larger th annotation.

Lemma 2 (Virtual Clauses Lemma 1). Let $E$ be a program expression and $V$ be the corresponding meta-interpreter. For any sequence $\bar{C}$ of $\mathcal{C}$-ground instances of constraints, for any $A\,\alpha$, $\bar{B}$ in $\mathcal{C}\text{-}base_L \times Ann$ and any interpretation $I \subseteq \mathcal{C}\text{-}base_L \times Ann$, we have:

\[
\mathit{clause}(E, A\,\alpha, (\bar{C}, \bar{B})) \in T_V^{\omega} \;\wedge\; I \models \bar{C}, \bar{B} \;\Longrightarrow\; (A, \alpha) \in T_E(I).
\]
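As an illustration of the satisfaction relation between an interpretation and a clause body used in Lemma 2, the following Python sketch checks its two conditions for th annotations only. It is a simplification under stated assumptions, not part of MuTACLP: intervals are pairs of integers, the constraints of the body are assumed to have been evaluated already in the constraint domain and are passed as booleans, in annotations are ignored, and the names `th_leq` and `satisfies_body` are hypothetical. The ordering test mirrors the way the order on th annotations is used in the proofs of this section, where th [t1, t2] is below th [s1, s2] whenever s1 ≤ t1, t2 ≤ s2 and t1 ≤ t2.

```python
def th_leq(a, b):
    """Return True if th[t1, t2] is below th[s1, s2] in the annotation order.

    Assumption: annotations are (start, end) pairs of integers and the order
    holds exactly when s1 <= t1, t2 <= s2 and t1 <= t2.
    """
    (t1, t2), (s1, s2) = a, b
    return s1 <= t1 and t2 <= s2 and t1 <= t2


def satisfies_body(interpretation, constraints, body):
    """Check whether an interpretation satisfies a ground clause body.

    interpretation: set of (atom, (start, end)) pairs, th annotations only.
    constraints:    truth values of the ground constraints C1, ..., Ck
                    (assumed to be evaluated beforehand).
    body:           list of annotated atoms (B_i, alpha_i).
    """
    # Condition 1: every constraint holds in the constraint domain.
    if not all(constraints):
        return False
    # Condition 2: each B_i alpha_i is matched by some (B_i, beta_i) in the
    # interpretation with alpha_i below beta_i.
    return all(
        any(atom == b and th_leq(alpha, beta) for atom, beta in interpretation)
        for b, alpha in body
    )


if __name__ == "__main__":
    I = {("p", (1, 10)), ("q", (3, 5))}
    print(satisfies_body(I, [True], [("p", (2, 4)), ("q", (3, 5))]))
    print(satisfies_body(I, [True], [("q", (2, 5))]))
```

The second call asks for q over an interval that is not contained in any interval recorded for q, so condition 2 fails even though all constraints hold.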

Lemma 3 (Virtual Clauses Lemma 2). Let $E$ be a program expression and $V$ be the corresponding meta-program. For any $A\,\mathrm{th}[s_1, s_2]$, $A\,\mathrm{th}[r_1, r_2]$, $\bar{B}$ in $\mathcal{C}\text{-}base_L \times Ann$, for any sequence $\bar{C}$ of $\mathcal{C}$-ground instances of constraints, and any interpretation $I \subseteq \mathcal{C}\text{-}base_L \times Ann$, the following statement holds:

\[
\begin{aligned}
& \mathit{clause}(E, A\,\mathrm{th}[s_1, s_2], (\bar{C}, \bar{B})) \in T_V^{\omega} \;\wedge\; I \models \bar{C}, \bar{B} \;\wedge \\
& (A, \mathrm{th}[r_1, r_2]) \in I \;\wedge\; \mathcal{D}_{\mathcal{C}} \models s_1 < r_1,\ r_1 \le s_2,\ s_2 < r_2 \\
\Longrightarrow{} & (A, \mathrm{th}[s_1, r_2]) \in T_E(I).
\end{aligned}
\]

Now, the soundness of the meta-interpreter can be proved by showing that if an annotated atom $A\,\alpha$ is provable at the meta-level from the program expression $E$, then $A\,\gamma$ is a consequence of $E$ for some $\gamma$ such that $A\,\gamma \Rightarrow A\,\alpha$, i.e., the annotation $\alpha$ is less than or equal to $\gamma$.

Theorem 3 (Soundness). Let $E$ be a program expression and let $V$ be the corresponding meta-program. For any $A\,\alpha$ with $A \in \mathcal{C}\text{-}base_L$ and $\alpha \in Ann$, the following statement holds:

\[
\mathit{demo}(E, A\,\alpha) \in T_V^{\omega} \;\Longrightarrow\; (A, \alpha) \in F^{\mathcal{C}}(E).
\]

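Proof. We first show that for all $h$

\[
\mathit{demo}(E, A\,\alpha) \in T_V^{h} \;\Longrightarrow\; \exists \gamma : (A, \gamma) \in T_E^{\omega} \;\wedge\; \mathcal{D}_{\mathcal{C}} \models \alpha \sqsubseteq \gamma. \tag{12}
\]

The proof is by induction on $h$.

(Base case). Trivial since $T_V^{0} = \emptyset$.

(Inductive case). Assume that

\[
\mathit{demo}(E, A\,\alpha) \in T_V^{h} \;\Longrightarrow\; \exists \gamma : (A, \gamma) \in T_E^{\omega} \;\wedge\; \mathcal{D}_{\mathcal{C}} \models \alpha \sqsubseteq \gamma.
\]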

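Then:

\[
\begin{aligned}
& \mathit{demo}(E, A\,\alpha) \in T_V^{h+1} \\
\iff{} & \{\text{definition of } T_V^{i}\} \\
& \mathit{demo}(E, A\,\alpha) \in T_V(T_V^{h})
\end{aligned}
\]

We have four cases corresponding to clauses (3), (4), (5) and (6). We only show the cases related to clauses (3) and (4), since the others are proved in an analogous way.

(clause (3))

\[
\begin{aligned}
& \{\alpha = \mathrm{th}[t_1, t_2],\ \text{definition of } T_V \text{ and clause (3)}\} \\
& \{\mathit{clause}(E, A\,\mathrm{th}[s_1, s_2], (\bar{C}, \bar{B})),\ \mathit{demo}(E, (\bar{C}, \bar{B}))\} \subseteq T_V^{h} \;\wedge\; \mathcal{D}_{\mathcal{C}} \models s_1 \le t_1,\ t_2 \le s_2,\ t_1 \le t_2 \\
\Longrightarrow{} & \{\text{Lemma 1 and } (\bar{C}, \bar{B}) = (C_1, \ldots, C_k, B_1\,\alpha_1, \ldots, B_n\,\alpha_n)\} \\
& \mathit{clause}(E, A\,\mathrm{th}[s_1, s_2], (C_1, \ldots, C_k, B_1\,\alpha_1, \ldots, B_n\,\alpha_n)) \in T_V^{h} \;\wedge \\
& \{\mathit{demo}(E, B_1\,\alpha_1), \ldots, \mathit{demo}(E, B_n\,\alpha_n)\} \subseteq T_V^{h} \;\wedge \\
& \mathcal{D}_{\mathcal{C}} \models C_1, \ldots, C_k \;\wedge\; \mathcal{D}_{\mathcal{C}} \models s_1 \le t_1,\ t_2 \le s_2,\ t_1 \le t_2 \\
\Longrightarrow{} & \{\text{inductive hypothesis}\} \\
& \exists \beta_1, \ldots, \beta_n : \mathit{clause}(E, A\,\mathrm{th}[s_1, s_2], (C_1, \ldots, C_k, B_1\,\alpha_1, \ldots, B_n\,\alpha_n)) \in T_V^{h} \;\wedge \\
& \{(B_1, \beta_1), \ldots, (B_n, \beta_n)\} \subseteq T_E^{\omega} \;\wedge\; \mathcal{D}_{\mathcal{C}} \models \alpha_1 \sqsubseteq \beta_1, \ldots, \alpha_n \sqsubseteq \beta_n \;\wedge \\
& \mathcal{D}_{\mathcal{C}} \models C_1, \ldots, C_k \;\wedge\; \mathcal{D}_{\mathcal{C}} \models s_1 \le t_1,\ t_2 \le s_2,\ t_1 \le t_2 \\
\Longrightarrow{} & \{T_V^{\omega} = \textstyle\bigcup_{i \in \mathbb{N}} T_V^{i}\} \\
& \mathit{clause}(E, A\,\mathrm{th}[s_1, s_2], (C_1, \ldots, C_k, B_1\,\alpha_1, \ldots, B_n\,\alpha_n)) \in T_V^{\omega} \;\wedge \\
& \{(B_1, \beta_1), \ldots, (B_n, \beta_n)\} \subseteq T_E^{\omega} \;\wedge\; \mathcal{D}_{\mathcal{C}} \models \alpha_1 \sqsubseteq \beta_1, \ldots, \alpha_n \sqsubseteq \beta_n \;\wedge \\
& \mathcal{D}_{\mathcal{C}} \models C_1, \ldots, C_k \;\wedge\; \mathcal{D}_{\mathcal{C}} \models s_1 \le t_1,\ t_2 \le s_2,\ t_1 \le t_2 \\
\Longrightarrow{} & \{\text{Lemma 2}\} \\
& (A, \mathrm{th}[s_1, s_2]) \in T_E(T_E^{\omega}) \;\wedge\; \mathcal{D}_{\mathcal{C}} \models s_1 \le t_1,\ t_2 \le s_2,\ t_1 \le t_2 \\
\Longrightarrow{} & \{T_E^{\omega} \text{ is a fixpoint of } T_E \text{ and } \mathcal{D}_{\mathcal{C}} \models s_1 \le t_1,\ t_2 \le s_2,\ t_1 \le t_2\} \\
& (A, \mathrm{th}[s_1, s_2]) \in T_E^{\omega} \;\wedge\; \mathcal{D}_{\mathcal{C}} \models \mathrm{th}[t_1, t_2] \sqsubseteq \mathrm{th}[s_1, s_2]
\end{aligned}
\]

(clause (4))

\[
\begin{aligned}
& \{\alpha = \mathrm{th}[t_1, t_2],\ \text{definition of } T_V \text{ and clause (4)}\} \\
& \{\mathit{clause}(E, A\,\mathrm{th}[s_1, s_2], (\bar{C}, \bar{B})),\ \mathit{demo}(E, (\bar{C}, \bar{B})),\ \mathit{demo}(E, A\,\mathrm{th}[s_2, t_2])\} \subseteq T_V^{h} \;\wedge \\
& \mathcal{D}_{\mathcal{C}} \models s_1 \le t_1,\ t_1 < s_2,\ s_2 < t_2 \\
\Longrightarrow{} & \{\text{Lemma 1 and } (\bar{C}, \bar{B}) = (C_1, \ldots, C_k, B_1\,\alpha_1, \ldots, B_n\,\alpha_n)\} \\
& \mathit{clause}(E, A\,\mathrm{th}[s_1, s_2], (C_1, \ldots, C_k, B_1\,\alpha_1, \ldots, B_n\,\alpha_n)) \in T_V^{h} \;\wedge \\
& \{\mathit{demo}(E, B_1\,\alpha_1), \ldots, \mathit{demo}(E, B_n\,\alpha_n),\ \mathit{demo}(E, A\,\mathrm{th}[s_2, t_2])\} \subseteq T_V^{h} \;\wedge \\
& \mathcal{D}_{\mathcal{C}} \models C_1, \ldots, C_k \;\wedge\; \mathcal{D}_{\mathcal{C}} \models s_1 \le t_1,\ t_1 < s_2,\ s_2 < t_2 \\
\Longrightarrow{} & \{\text{inductive hypothesis}\} \\
& \exists \beta, \beta_1, \ldots, \beta_n : \mathit{clause}(E, A\,\mathrm{th}[s_1, s_2], (C_1, \ldots, C_k, B_1\,\alpha_1, \ldots, B_n\,\alpha_n)) \in T_V^{h} \;\wedge \\
& \{(B_1, \beta_1), \ldots, (B_n, \beta_n), (A, \beta)\} \subseteq T_E^{\omega} \;\wedge \\
& \mathcal{D}_{\mathcal{C}} \models \alpha_1 \sqsubseteq \beta_1, \ldots, \alpha_n \sqsubseteq \beta_n,\ \mathrm{th}[s_2, t_2] \sqsubseteq \beta \;\wedge \\
& \mathcal{D}_{\mathcal{C}} \models C_1, \ldots, C_k \;\wedge\; \mathcal{D}_{\mathcal{C}} \models s_1 \le t_1,\ t_1 < s_2,\ s_2 < t_2.
\end{aligned}
\]

Since $\mathcal{D}_{\mathcal{C}} \models \mathrm{th}[s_2, t_2] \sqsubseteq \beta$, we have $\beta = \mathrm{th}[w_1, w_2]$ with $\mathcal{D}_{\mathcal{C}} \models w_1 \le s_2,\ t_2 \le w_2$. Hence we distinguish two cases according to the relation between $w_1$ and $s_1$.

– $\mathcal{D}_{\mathcal{C}} \models w_1 \le s_1$. In this case we immediately conclude because $\mathcal{D}_{\mathcal{C}} \models \mathrm{th}[t_1, t_2] \sqsubseteq \mathrm{th}[w_1, w_2]$, and thus $(A, \mathrm{th}[w_1, w_2]) \in T_E^{\omega} \wedge \mathcal{D}_{\mathcal{C}} \models \mathrm{th}[t_1, t_2] \sqsubseteq \mathrm{th}[w_1, w_2]$.
– $\mathcal{D}_{\mathcal{C}} \models s_1 < w_1$. In this case $\mathit{clause}(E, A\,\mathrm{th}[s_1, s_2], (C_1, \ldots, C_k, B_1\,\alpha_1, \ldots, B_n\,\alpha_n)) \in T_V^{\omega}$, since $T_V^{\omega} = \bigcup_{i \in \mathbb{N}} T_V^{i}$. Moreover, from $\mathcal{D}_{\mathcal{C}} \models s_1 < w_1,\ w_1 \le s_2,\ s_2 < t_2,\ t_2 \le w_2$, by Lemma 3 we obtain $(A, \mathrm{th}[s_1, w_2]) \in T_E(T_E^{\omega})$. Since $T_E^{\omega}$ is a fixpoint of $T_E$ and $\mathcal{D}_{\mathcal{C}} \models s_1 \le t_1,\ t_2 \le w_2$, we can conclude $(A, \mathrm{th}[s_1, w_2]) \in T_E^{\omega}$ and $\mathcal{D}_{\mathcal{C}} \models \mathrm{th}[t_1, t_2] \sqsubseteq \mathrm{th}[s_1, w_2]$.

We are finally able to prove the soundness of the meta-interpreter with respect to the least fixpoint semantics.

\[
\begin{aligned}
& \mathit{demo}(E, A\,\alpha) \in T_V^{\omega} \\
\Longrightarrow{} & \{T_V^{\omega} = \textstyle\bigcup_{i \in \mathbb{N}} T_V^{i}\} \\
& \exists h : \mathit{demo}(E, A\,\alpha) \in T_V^{h} \\
\Longrightarrow{} & \{\text{Statement (12)}\} \\
& \exists \beta : (A, \beta) \in T_E^{\omega} \;\wedge\; \mathcal{D}_{\mathcal{C}} \models \alpha \sqsubseteq \beta \\
\Longrightarrow{} & \{\text{definition of } F^{\mathcal{C}}\} \\
& (A, \alpha) \in F^{\mathcal{C}}(E).
\end{aligned}
\]

Completeness.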
We first need a lemma stating that if an annotated atom $A\,\alpha$ is provable at the meta-level in a program expression $E$ then we can prove at the meta-level the same atom $A$ with any other "weaker" annotation (namely $A\,\gamma$, with $\gamma \sqsubseteq \alpha$).

Lemma 4. Let $E$ be a program expression and $V$ be the corresponding meta-program. For any $A \in \mathcal{C}\text{-}base_L$ and $\alpha \in Ann$, the following statement holds:

\[
\mathit{demo}(E, A\,\alpha) \in T_V^{\omega} \;\Longrightarrow\; \{\mathit{demo}(E, A\,\gamma) \mid \gamma \in Ann,\ \mathcal{D}_{\mathcal{C}} \models \gamma \sqsubseteq \alpha\} \subseteq T_V^{\omega}.
\]

Now the completeness result for the MuTACLP meta-interpreter basically relies on two technical lemmata (Lemma 7 and Lemma 8). Roughly speaking, they assert that when th and in annotated atoms are derivable from an interpretation $I$ by using the $T_E$ operator, then we can find corresponding virtual clauses in the program expression $E$ which allow us to derive the same or greater information. Let us first introduce some preliminary notions and results.

Definition 6 (covering). A covering for a th-annotation $\mathrm{th}[t_1, t_2]$ is a sequence of annotations $\{\mathrm{th}[t_1^i, t_2^i]\}_{i \in \{1, \ldots, n\}}$ such that $\mathcal{D}_{\mathcal{C}} \models \mathrm{th}[t_1, t_2] \sqsubseteq \mathrm{th}[t_1^1, t_2^n]$ and, for any $i \in \{1, \ldots, n\}$,

\[
\mathcal{D}_{\mathcal{C}} \models t_1^i \le t_2^i,\quad t_1^{i+1} \le t_2^i,\quad t_1^i < t_1^{i+1}.
\]

In words, a covering of a th annotation $\mathrm{th}[t_1, t_2]$ is a sequence of annotations $\{\mathrm{th}[t_1^i, t_2^i]\}_{i \in \{1, \ldots, n\}}$ such that each of the intervals overlaps with its successor, and the union of such intervals includes $[t_1, t_2]$. The next simple lemma observes that, given two annotations and a covering for each of them, we can always build a covering for their greatest lower bound.

Lemma 5. Let $\mathrm{th}[t_1, t_2]$ and $\mathrm{th}[s_1, s_2]$ be annotations and $\mathrm{th}[w_1, w_2] = \mathrm{th}[t_1, t_2] \sqcap \mathrm{th}[s_1, s_2]$. Let $\{\mathrm{th}[t_1^i, t_2^i]\}_{i \in \{1, \ldots, n\}}$ and $\{\mathrm{th}[s_1^j, s_2^j]\}_{j \in \{1, \ldots, m\}}$ be coverings for $\mathrm{th}[t_1, t_2]$ and $\mathrm{th}[s_1, s_2]$, respectively. Then a covering for $\mathrm{th}[w_1, w_2]$ can be extracted from

\[
\{\mathrm{th}[t_1^i, t_2^i] \sqcap \mathrm{th}[s_1^j, s_2^j] \mid i \in \{1, \ldots, n\} \wedge j \in \{1, \ldots, m\}\}.
\]

In the hypothesis of the previous lemma $[w_1, w_2] = [t_1, t_2] \cap [s_1, s_2]$. Thus the result of the lemma is simply a consequence of the distributivity of set-theoretical intersection with respect to union.

Definition 7. Let $E$ be a program expression, let $V$ be the corresponding meta-program and let $I \subseteq \mathcal{C}\text{-}base_L \times Ann$ be an interpretation. Given an annotated atom $(A, \mathrm{th}[t_1, t_2]) \in \mathcal{C}\text{-}base_L \times Ann$, an $(E, I)$-set for $(A, \mathrm{th}[t_1, t_2])$ is a set

\[
\{\mathit{clause}(E, A\,\mathrm{th}[t_1^i, t_2^i], (\bar{C}^i, \bar{B}^i))\}_{i \in \{1, \ldots, n\}} \subseteq T_V^{\omega}
\]

such that

1. $\{\mathrm{th}[t_1^i, t_2^i]\}_{i \in \{1, \ldots, n\}}$ is a covering of $\mathrm{th}[t_1, t_2]$, and
2. for $i \in \{1, \ldots, n\}$, $I \models \bar{C}^i, \bar{B}^i$.

An interpretation $I \subseteq \mathcal{C}\text{-}base_L \times Ann$ is called th-closed with respect to $E$ (or $E$-closed, for short) if there is an $(E, I)$-set for every annotated atom $(A, \mathrm{th}[t_1, t_2]) \in I$.
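The notion of covering in Definition 6 can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: time points are integers, intervals are `(start, end)` pairs, and the function names `is_covering` and `fold_covering` are hypothetical. The conditions that mention the successor interval are checked only for consecutive pairs, which is one reading of the definition for the last element of the sequence.

```python
def is_covering(target, intervals):
    """Check whether `intervals` is a covering of the th annotation `target`.

    target:    (t1, t2), the interval of the annotation th[t1, t2].
    intervals: list of (start, end) pairs, in sequence order.
    """
    if not intervals:
        return False
    t1, t2 = target
    # Each interval must be well formed: start <= end.
    if any(lo > hi for lo, hi in intervals):
        return False
    # Each interval overlaps its successor and starts strictly earlier.
    for (lo, hi), (next_lo, _next_hi) in zip(intervals, intervals[1:]):
        if not (next_lo <= hi and lo < next_lo):
            return False
    # th[t1, t2] must be below th[first start, last end].
    first_lo, last_hi = intervals[0][0], intervals[-1][1]
    return first_lo <= t1 and t2 <= last_hi and t1 <= t2


def fold_covering(intervals):
    """Merge a covering from the last interval backwards.

    This loosely mirrors how the completeness proof joins one interval at a
    time; the result is (first start, last end).
    """
    lo, hi = intervals[-1]
    for prev_lo, prev_hi in reversed(intervals[:-1]):
        # The predecessor must reach into the interval built so far.
        assert lo <= prev_hi and prev_lo < lo
        lo = prev_lo
    return (lo, hi)


if __name__ == "__main__":
    cover = [(1, 4), (3, 7), (7, 10)]
    print(is_covering((2, 9), cover))
    print(fold_covering(cover))
```

A sequence with a gap, such as `[(1, 4), (5, 10)]`, is rejected because the second interval does not overlap the first.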


The next lemma presents some properties of the notion of $E$-closedness, which essentially state that the property of being $E$-closed is invariant with respect to some obvious algebraic transformations of the program expression $E$.

Lemma 6. Let $E$, $R$ and $N$ be program expressions and let $I$ be an interpretation. Then the following properties hold, where $op \in \{\cup, \cap\}$:

1. $I$ is $(E \; op \; E)$-closed iff $I$ is $E$-closed;
2. $I$ is $(E \; op \; R)$-closed iff $I$ is $(R \; op \; E)$-closed;
3. $I$ is $((E \; op \; R) \; op \; N)$-closed iff $I$ is $(E \; op \; (R \; op \; N))$-closed;
4. if $I$ is $E$-closed then $I$ is $(E \cup R)$-closed;
5. if $I$ is $(E \cap R)$-closed then $I$ is $E$-closed;
6. $I$ is $((E \cap R) \cup N)$-closed iff $I$ is $((E \cup N) \cap (R \cup N))$-closed.

We next show that if we apply the $T_E$ operator to an $E$-closed interpretation, then for any derived th-annotated atom there exists an $(E, I)$-set (see Definition 7). This result represents a basic step towards the completeness proof. In fact, it tells us that starting from the empty interpretation, which is obviously $E$-closed, and iterating $T_E$, we get, step after step, th-annotated atoms which can also be derived from the virtual clauses of the program expression at hand. For technical reasons, to make the induction work, we need a slightly stronger property.

Lemma 7. Let $E$ and $Q$ be program expressions, let $V$ be the corresponding meta-program⁴ and let $I \subseteq \mathcal{C}\text{-}base_L \times Ann$ be an $(E \cup Q)$-closed interpretation. Then for any atom $(A, \mathrm{th}[t_1, t_2]) \in T_E(I)$ there exists an $(E \cup Q, I)$-set.

⁴ The meta-program contains the meta-level representation of the plain programs in $E$ and $Q$.

Corollary 1. Let $E$ be any program expression and let $V$ be the corresponding meta-program. Then for any $h \in \mathbb{N}$ the interpretation $T_E^{h}$ is $E$-closed. Therefore $T_E^{\omega}$ is $E$-closed.

Another technical lemma, which comes in a pair with Lemma 7, is needed for dealing with the in annotations.

Lemma 8. Let $E$ be a program expression, let $V$ be the corresponding meta-program and let $I$ be any $E$-closed interpretation. For any atom $(A, \mathrm{in}[t_1, t_2]) \in T_E(I)$ we have

\[
\mathit{clause}(E, A\,\alpha, (\bar{C}, \bar{B})) \in T_V^{\omega} \;\wedge\; I \models \bar{C}, \bar{B} \;\wedge\; \mathcal{D}_{\mathcal{C}} \models \mathrm{in}[t_1, t_2] \sqsubseteq \alpha.
\]

Now we can prove the completeness of the meta-interpreter with respect to the least fixpoint semantics.

Theorem 4 (Completeness). Let $E$ be a program expression and $V$ be the corresponding meta-program. For any $A \in \mathcal{C}\text{-}base_L$ and $\alpha \in Ann$ the following statement holds:

\[
(A, \alpha) \in F^{\mathcal{C}}(E) \;\Longrightarrow\; \mathit{demo}(E, A\,\alpha) \in T_V^{\omega}.
\]


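Proof. We first show that for all $h$

\[
(A, \alpha) \in T_E^{h} \;\Longrightarrow\; \mathit{demo}(E, A\,\alpha) \in T_V^{\omega}. \tag{13}
\]

The proof is by induction on $h$.

(Base case). Trivial since $T_E^{0} = \emptyset$.

(Inductive case). Assume that

\[
(A, \alpha) \in T_E^{h} \;\Longrightarrow\; \mathit{demo}(E, A\,\alpha) \in T_V^{\omega}.
\]

Observe that, under the above assumption,

\[
T_E^{h} \models \bar{C}, \bar{B} \;\Rightarrow\; \mathit{demo}(E, (\bar{C}, \bar{B})) \in T_V^{\omega}. \tag{14}
\]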

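In fact, let $\bar{C} = C_1, \ldots, C_k$ and $\bar{B} = B_1\,\alpha_1, \ldots, B_n\,\alpha_n$. Then the notation $T_E^{h} \models \bar{C}$ amounts to saying that for each $i$, $\mathcal{D}_{\mathcal{C}} \models C_i$ and thus $\mathit{demo}(E, C_i) \in T_V^{\omega}$, by definition of $T_V$ and clause (7). Furthermore, $T_E^{h} \models \bar{B}$ means that for each $i$, $(B_i, \beta_i) \in T_E^{h}$ and $\mathcal{D}_{\mathcal{C}} \models \alpha_i \sqsubseteq \beta_i$. Hence by inductive hypothesis $\mathit{demo}(E, B_i\,\beta_i) \in T_V^{\omega}$ and thus, by Lemma 4, $\mathit{demo}(E, B_i\,\alpha_i) \in T_V^{\omega}$. By several applications of clause (2) in the meta-interpreter we finally deduce $\mathit{demo}(E, (\bar{C}, \bar{B})) \in T_V^{\omega}$.

It is convenient to treat separately the cases of th and in annotations. If we assume that $\alpha = \mathrm{th}[t_1, t_2]$, then

\[
\begin{aligned}
& (A, \mathrm{th}[t_1, t_2]) \in T_E^{h+1} \\
\iff{} & \{\text{definition of } T_E^{i}\} \\
& (A, \mathrm{th}[t_1, t_2]) \in T_E(T_E^{h}) \\
\Longrightarrow{} & \{\text{Lemma 7 and } T_E^{h} \text{ is } E\text{-closed by Corollary 1}\} \\
& \{\mathit{clause}(E, A\,\mathrm{th}[t_1^i, t_2^i], (\bar{C}^i, \bar{B}^i))\}_{i \in \{1, \ldots, n\}} \subseteq T_V^{\omega} \;\wedge \\
& T_E^{h} \models \bar{C}^i, \bar{B}^i \text{ for } i \in \{1, \ldots, n\} \;\wedge \\
& \{\mathrm{th}[t_1^i, t_2^i]\}_{i \in \{1, \ldots, n\}} \text{ covering of } \mathrm{th}[t_1, t_2] \\
\Longrightarrow{} & \{\text{previous remark (14)}\} \\
& \{\mathit{clause}(E, A\,\mathrm{th}[t_1^i, t_2^i], (\bar{C}^i, \bar{B}^i))\}_{i \in \{1, \ldots, n\}} \subseteq T_V^{\omega} \;\wedge \\
& \mathit{demo}(E, (\bar{C}^i, \bar{B}^i)) \in T_V^{\omega} \text{ for } i \in \{1, \ldots, n\} \;\wedge \\
& \{\mathrm{th}[t_1^i, t_2^i]\}_{i \in \{1, \ldots, n\}} \text{ covering of } \mathrm{th}[t_1, t_2] \\
\Longrightarrow{} & \{\text{definition of } T_V, \text{ clause (3) and } T_V^{\omega} \text{ is a fixpoint of } T_V\} \\
& \mathit{demo}(E, A\,\mathrm{th}[t_1^n, t_2^n]) \in T_V^{\omega} \;\wedge \\
& \{\mathit{clause}(E, A\,\mathrm{th}[t_1^i, t_2^i], (\bar{C}^i, \bar{B}^i))\}_{i \in \{1, \ldots, n-1\}} \subseteq T_V^{\omega} \;\wedge \\
& \mathit{demo}(E, (\bar{C}^i, \bar{B}^i)) \in T_V^{\omega} \text{ for } i \in \{1, \ldots, n-1\} \;\wedge \\
& \{\mathrm{th}[t_1^i, t_2^i]\}_{i \in \{1, \ldots, n\}} \text{ covering of } \mathrm{th}[t_1, t_2] \\
\Longrightarrow{} & \{\text{definition of } T_V, \text{ clause (4), Lemma 4 and } T_V^{\omega} \text{ is a fixpoint of } T_V\} \\
& \mathit{demo}(E, A\,\mathrm{th}[t_1^{n-1}, t_2^n]) \in T_V^{\omega} \;\wedge \\
& \{\mathit{clause}(E, A\,\mathrm{th}[t_1^i, t_2^i], (\bar{C}^i, \bar{B}^i))\}_{i \in \{1, \ldots, n-2\}} \subseteq T_V^{\omega} \;\wedge \\
& \mathit{demo}(E, (\bar{C}^i, \bar{B}^i)) \in T_V^{\omega} \text{ for } i \in \{1, \ldots, n-2\} \;\wedge \\
& \{\mathrm{th}[t_1^i, t_2^i]\}_{i \in \{1, \ldots, n\}} \text{ covering of } \mathrm{th}[t_1, t_2] \\
\Longrightarrow{} & \{\text{by exploiting several times clause (4) as above}\} \\
& \mathit{demo}(E, A\,\mathrm{th}[t_1^1, t_2^n]) \in T_V^{\omega} \;\wedge\; \{\mathrm{th}[t_1^i, t_2^i]\}_{i \in \{1, \ldots, n\}} \text{ covering of } \mathrm{th}[t_1, t_2] \\
\Longrightarrow{} & \{\text{by definition of covering } \mathcal{D}_{\mathcal{C}} \models \mathrm{th}[t_1, t_2] \sqsubseteq \mathrm{th}[t_1^1, t_2^n] \text{ and Lemma 4}\} \\
& \mathit{demo}(E, A\,\mathrm{th}[t_1, t_2]) \in T_V^{\omega}
\end{aligned}
\]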


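Instead, if $\alpha = \mathrm{in}[t_1, t_2]$, then

\[
\begin{aligned}
& (A, \mathrm{in}[t_1, t_2]) \in T_E^{h+1} \\
\iff{} & \{\text{definition of } T_E^{i}\} \\
& (A, \mathrm{in}[t_1, t_2]) \in T_E(T_E^{h}) \\
\Longrightarrow{} & \{\text{Lemma 8}\} \\
& \mathit{clause}(E, A\,\beta, (\bar{C}, \bar{B})) \in T_V^{\omega} \;\wedge\; T_E^{h} \models \bar{C}, \bar{B} \;\wedge\; \mathcal{D}_{\mathcal{C}} \models \mathrm{in}[t_1, t_2] \sqsubseteq \beta \\
\Longrightarrow{} & \{\text{previous remark (14)}\} \\
& \mathit{clause}(E, A\,\beta, (\bar{C}, \bar{B})) \in T_V^{\omega} \;\wedge\; \mathit{demo}(E, (\bar{C}, \bar{B})) \in T_V^{\omega} \;\wedge\; \mathcal{D}_{\mathcal{C}} \models \mathrm{in}[t_1, t_2] \sqsubseteq \beta \\
\Longrightarrow{} & \{\text{clause (3) or (6), and } T_V^{\omega} \text{ is a fixpoint of } T_V\} \\
& \mathit{demo}(E, A\,\beta) \in T_V^{\omega} \;\wedge\; \mathcal{D}_{\mathcal{C}} \models \mathrm{in}[t_1, t_2] \sqsubseteq \beta \\
\Longrightarrow{} & \{\text{Lemma 4}\} \\
& \mathit{demo}(E, A\,\mathrm{in}[t_1, t_2]) \in T_V^{\omega}
\end{aligned}
\]

We now prove the completeness of the meta-interpreter of the program expressions with respect to the least fixpoint semantics.

\[
\begin{aligned}
& (A, \alpha) \in F^{\mathcal{C}}(E) \\
\Longrightarrow{} & \{\text{definition of } F^{\mathcal{C}}(E)\} \\
& \exists \gamma \in Ann : (A, \gamma) \in T_E^{\omega} \;\wedge\; \mathcal{D}_{\mathcal{C}} \models \alpha \sqsubseteq \gamma \\
\Longrightarrow{} & \{T_E^{\omega} = \textstyle\bigcup_{i \in \mathbb{N}} T_E^{i}\} \\
& \exists h : (A, \gamma) \in T_E^{h} \;\wedge\; \mathcal{D}_{\mathcal{C}} \models \alpha \sqsubseteq \gamma \\
\Longrightarrow{} & \{\text{statement (13)}\} \\
& \mathit{demo}(E, A\,\gamma) \in T_V^{\omega} \;\wedge\; \mathcal{D}_{\mathcal{C}} \models \alpha \sqsubseteq \gamma \\
\Longrightarrow{} & \{\text{Lemma 4}\} \\
& \mathit{demo}(E, A\,\alpha) \in T_V^{\omega}
\end{aligned}
\]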

Description Logics for Information Integration

Diego Calvanese, Giuseppe De Giacomo, and Maurizio Lenzerini

Dipartimento di Informatica e Sistemistica, Università di Roma “La Sapienza”
Via Salaria 113, 00198 Roma, Italy
lastname@dis.uniroma1.it, http://www.dis.uniroma1.it/~lastname

Abstract. Information integration is the problem of combining the data residing at different, heterogeneous sources, and providing the user with a unified view of these data, called mediated schema. The mediated schema is therefore a reconciled view of the information, which can be queried by the user. It is the task of the system to free the user from the knowledge on where data are, and how data are structured at the sources. In this chapter, we discuss data integration in general, and describe a logic-based approach to data integration. A logic of the Description Logics family is used to model the information managed by the integration system, to formulate queries posed to the system, and to perform several types of automated reasoning supporting both the modeling, and the query answering process. We focus, in particular, on a specific Description Logic, called DLR, specifically designed for database applications. In the chapter, we illustrate how DLR is used to model a mediated schema of an integration system, to specify the semantics of the data sources, and finally to support the query answering process by means of the associated reasoning methods.

A.C. Kakas, F. Sadri (Eds.): Computat. Logic (Kowalski Festschrift), LNAI 2408, pp. 41–60, 2002. © Springer-Verlag Berlin Heidelberg 2002

1 Introduction

Information integration is the problem of combining the data residing at different sources, and providing the user with a unified view of these data, called mediated schema. The mediated schema is therefore a reconciled view of the information, which can be queried by the user. It is the task of the data integration system to free the user from having to know where the data are and how they are structured at the sources.

Interest in this kind of system has grown continuously in recent years. Many organizations face the problem of integrating data residing in several sources. Companies that build a Data Warehouse, a Data Mining, or an Enterprise Resource Planning system must address this problem. Also, integrating data in the World Wide Web is the subject of several current investigations and projects. Finally, applications that require accessing or re-engineering legacy systems must deal with the problem of integrating data stored in different sources.

The design of a data integration system is a very complex task, which comprises several different issues, including the following:

1. heterogeneity of the sources,
2. relation between the mediated schema and the sources,
3. limitations on the mechanisms for accessing the sources,
4. materialized vs. virtual integration,
5. data cleaning and reconciliation,
6. how to process queries expressed on the mediated schema.

Problem (1) arises because sources are typically heterogeneous, meaning that they adopt different models and systems for storing data. This poses challenging problems in specifying the mediated schema. The goal is to design such a schema so as to provide an appropriate abstraction of all the data residing at the sources. One aspect deserving special attention is the choice of the language used to express the mediated schema. Since such a schema should mediate among different representations of overlapping worlds, the language should provide flexible and powerful representation mechanisms. We refer to [34] for a more detailed discussion on this subject. Following the work in [32,16,40], in this paper we use a formalism of the family of Description Logics to specify mediated schemas.

With regard to Problem (2), two basic approaches have been used to specify the relation between the sources and the mediated schema. The first approach, called global-as-view (or query-based), requires that the mediated schema is expressed in terms of the data sources. More precisely, to every concept of the mediated schema, a view over the data sources is associated, so that its meaning is specified in terms of the data residing at the sources. The second approach, called local-as-view (or source-based), requires the mediated schema to be specified independently from the sources. The relationships between the mediated schema and the sources are established by defining every source as a view over the mediated schema. Thus, in the local-as-view approach, we specify the meaning of the sources in terms of the concepts in the mediated schema. It is clear that the latter approach favors the extensibility of the integration system, and provides a more appropriate setting for its maintenance. For example, adding a new source to the system requires only providing the definition of the source, and does not necessarily involve changes in the mediated schema. On the contrary, in the global-as-view approach, adding a new source typically requires changing the definition of the concepts in the mediated schema. For this reason, in the rest of the paper, we adopt the local-as-view approach. A comparison between the two approaches is reported in [51].

Problem (3) refers to the fact that, in both the local-as-view and the global-as-view approach, a source may present some limitations on the types of accesses it supports. A typical example is a web source accessible through a form where one of the fields must necessarily be filled in by the user. Such a situation can be modeled by specifying the source as a relation supporting only queries with a selection on a column. Suitable notations have been proposed for such situations [44], and the consequences of these access limitations on query processing in integration systems have been investigated in several papers [44,43,27,56,55,41,42].

Problem (4) deals with a further criterion that we should take into account in the design of a data integration system. In particular, with respect to the data explicitly managed by the system, we can follow two different approaches, called materialized and virtual. In the materialized approach, the system computes the extensions of the concepts in the mediated schema by replicating the data at the sources. In the virtual approach, data residing at the sources are accessed during query processing, but they are not replicated in the integration system. Obviously, in the materialized approach, the problem of refreshing the materialized views in order to keep them up-to-date is a major issue [34]. In the following, we only deal with the virtual approach.

Whereas the construction of the mediated schema concerns the intensional level of the data integration system, Problem (5) refers to a number of issues arising when considering the integration at the extensional/instance level. A first issue in this context is the interpretation and merging of the data provided by the sources. Interpreting data can be regarded as the task of casting them into a common representation. Moreover, the data returned by various sources need to be converted, reconciled, and combined to provide the data integration system with the requested information. The complexity of this reconciliation step is due to several problems, such as possible mismatches between data referring to the same real world object, possible errors in the data stored in the sources, or possible inconsistencies between values representing the properties of the real world objects in different sources [28]. The above task is known in the literature as Data Cleaning and Reconciliation, and the interested reader is referred to [28,10,4] for more details on this subject.

Finally, Problem (6) is concerned with one of the most important issues in a data integration system, i.e., the choice of the method for computing the answer to queries posed in terms of the mediated schema.
While query answering in the global-as-view approach typically reduces to unfolding, an integration system based on the local-as-view approach must resort to more sophisticated query processing techniques. The main issue is that the system should be able to reexpress the query in terms of a suitable set of queries posed to the sources. In this reformulation process, the crucial step is deciding how to decompose the query on the mediated schema into a set of subqueries on the sources, based on the meaning of the sources in terms of the concepts in the mediated schema. The computed subqueries are then shipped to the sources, and the results are assembled into the final answer.

In the rest of this paper, we concentrate on Problem (6), namely, query processing in a data integration system specified by means of the local-as-view approach, and we present the following contributions:

– We first provide a logical formalization of the problem. In particular, we illustrate a general architecture for a data integration system, comprising a mediated schema, a set of views, and a query. Query processing in this setting is formally defined as the problem of answering queries using views: compute the answer to a query only on the basis of the extension of a set of views [1,29]. We observe that, besides data integration, this problem is relevant in several fields, including data warehousing [54], query optimization [17], supporting physical data independence [50], etc.
– Then we instantiate the general framework to the case where schemas, views and queries are expressed by making use of a particular logical language. In particular:
  • The mediated schema is expressed in terms of a knowledge base constituted by general inclusion assertions and membership assertions, formulated in an expressive Description Logic [6].
  • Queries and views are expressed as non-recursive datalog programs, whose predicates in the body are concepts or relations that appear in the knowledge base.
  • For each view, it can be specified whether the provided extension is sound, complete, or exact with respect to the view definition [1,11]. Such assumptions are used in data integration with the following meaning. A sound view corresponds to an information source which is known to produce only, but not necessarily all, the answers to the associated query. A complete view models a source which is known to produce all answers to the associated query, and maybe more. Finally, an exact view is known to produce exactly the answers to the associated query.
– We then illustrate a technique for the problem of answering queries using views in our setting. We first describe how to formulate the problem in terms of logical implication, and then we present a technique to check logical implication in 2EXPTIME worst case complexity.

The paper is organized as follows. Section 2 presents the general framework. Section 3 illustrates the use of Description Logics for setting up a particular architecture for data integration, according to the general framework. Section 4 presents the method we use for query answering using views in our architecture. Section 5 describes other work on the problem of answering queries using views. Finally, Section 6 concludes the paper.

2 Framework

In this section we set up a logical framework for data integration. Since we assume to work with relational databases, in the following we refer to a relational alphabet A, i.e., an alphabet constituted by a set of predicate and constant symbols. Predicate symbols are used to denote the relations in the database, whereas constant symbols denote the objects stored in relations. We adopt the so-called unique name assumption, i.e., we assume that diﬀerent constants denote diﬀerent objects. A database (DB) DB is simply a set of relations, one for each predicate symbol in the alphabet A. The relation corresponding to the predicate symbol Ri is constituted by a set of tuples of constants, which specify the objects that satisfy the relation associated to Ri . The main components of a data integration system are the mediated schema, the sources, and the queries. Each component is expressed in a speciﬁc language over the alphabet A:


– the mediated schema is expressed in the schema language $L_S$,
– the sources are modeled as views over the mediated schema, expressed in the view language $L_V$,
– queries are issued over the mediated schema, and are expressed in the query language $L_Q$.

In what follows, we provide a specification of the three components of a data integration system.

– The mediated schema $S$ is a set of constraints, each one expressed in the language $L_S$ over the alphabet $A$. The language $L_S$ determines the expressiveness allowed for specifying the schema of our database, i.e., the constraints that the database must satisfy. If $S$ is constituted by the constraints $\{C_1, \ldots, C_n\}$, we say that a database $DB$ satisfies $S$ if all constraints $C_1, \ldots, C_n$ are satisfied by $DB$.
– The sources are modeled in terms of a set of views $V = \{V_1, \ldots, V_m\}$ over the mediated schema. Associated to each view $V_i$ we have:
  • A definition $def(V_i)$ in terms of a query $V_i(x) \leftarrow v_i(x, y)$ over $DB$, where $v_i(x, y)$ is expressed in the language $L_V$ over the alphabet $A$. The arity of $x$ determines the arity of the view $V_i$.
  • A set $ext(V_i)$ of tuples of constants, which provides the information about the extension of $V_i$, i.e., the content of the sources. The arity of each tuple is the same as that of $V_i$.
  • A specification $as(V_i)$ of which assumption to adopt for the view $V_i$, i.e., how to interpret the content of the source $ext(V_i)$ with respect to the actual set of tuples in $DB$ that satisfy $V_i$. We describe below the various possibilities that we consider for $as(V_i)$.
– A query is expressed in the language $L_Q$ over the alphabet $A$, and is intended to provide the specification of which data to extract from the virtual database represented in the integration system. In general, if $Q$ is a query and $DB$ is a database satisfying $S$, we denote with $ans(Q, DB)$ the set of tuples in $DB$ that satisfy $Q$.

The specification $as(V_i)$ determines how accurate the knowledge on the tuples satisfying the views is, i.e., how accurate the source is with respect to the specification $def(V_i)$.¹ As pointed out in several papers [1,29,37,11], the following three assumptions are relevant in a data integration system:

– Sound Views. When a view $V_i$ is sound (denoted with $as(V_i) = sound$), its extension provides any subset of the tuples satisfying the corresponding definition. In other words, from the fact that a tuple is in $ext(V_i)$ one can conclude that it satisfies the view, while from the fact that a tuple is not in $ext(V_i)$ one cannot conclude that it does not satisfy the view. Formally, a database $DB$ is coherent with the sound view $V_i$ if $ext(V_i) \subseteq ans(def(V_i), DB)$.
– Complete Views. When a view $V_i$ is complete (denoted with $as(V_i) = complete$), its extension provides any superset of the tuples satisfying the corresponding definition. In other words, from the fact that a tuple is in $ext(V_i)$ one cannot conclude that such a tuple satisfies the view. On the other hand, from the fact that a tuple is not in $ext(V_i)$ one can conclude that such a tuple does not satisfy the view. Formally, a database $DB$ is coherent with the complete view $V_i$ if $ext(V_i) \supseteq ans(def(V_i), DB)$.
– Exact Views. When a view $V_i$ is exact (denoted with $as(V_i) = exact$), its extension is exactly the set of tuples of objects satisfying the corresponding definition. Formally, a database $DB$ is coherent with the exact view $V_i$ if $ext(V_i) = ans(def(V_i), DB)$.

¹ In some papers, for example [11], different assumptions on the domain of the database are also taken into account.

The ultimate goal of a data integration system is to allow a client to extract information from the database, taking into account that the only knowledge s/he has on the database is the extension of the set of views, i.e., the content of the sources. More precisely, the problem of extracting information from the data integration system reduces to the problem of answering queries using views. Given

– a schema $S$,
– a set of views $V = \{V_1, \ldots, V_m\}$, with, for each $V_i$,
  • its definition $def(V_i)$,
  • its extension $ext(V_i)$, and
  • the specification $as(V_i)$ of whether it is sound, complete, or exact,
– a query $Q$ of arity $n$, and
– a tuple $d = (d_1, \ldots, d_n)$ of constants,

the problem consists in deciding whether $d \in ans(Q, S, V)$, i.e., deciding whether $(d_1, \ldots, d_n) \in ans(Q, DB)$, for each $DB$ such that:

– $DB$ satisfies the schema $S$,
– $DB$ is coherent with $V_1, \ldots, V_m$.

From the above definition, it is easy to see that answering queries using views is essentially an extended form of reasoning in the presence of incomplete information [53]. Indeed, when we answer the query on the basis of the views, we know only the extensions of the views, and this provides us with only partial information on the database.
Moreover, since the query language may admit various forms of incomplete information (due to union, for instance), there are in general several possible databases that are coherent with the views. The following example rephrases an example given in [1].

Example 1. Consider a relational alphabet containing (among other symbols) a binary predicate couple, and two constants Ann and Bill. Consider also two views female and male, respectively with definitions

  female(f) ← couple(f, m)
  male(m) ← couple(f, m)

Description Logics for Information Integration


and extensions ext(female) = {Ann} and ext(male) = {Bill}, and assume that there are no constraints imposed by a schema. If both views are sound, we only know that some couple has Ann as its female component and some couple has Bill as its male component. Therefore, the query Qc(x, y) ← couple(x, y) asking for all couples would return an empty answer, i.e., ans(Qc, S, V) = ∅. However, if both views are exact, we can conclude that all couples have Ann as their female component and Bill as their male component, and hence that (Ann, Bill) is the only couple, i.e., ans(Qc, S, V) = {(Ann, Bill)}.
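The reasoning in Example 1 can be replayed by brute force. The following is only a minimal sketch, not the technique developed later in the chapter: it assumes a tiny alphabet with a single additional constant (the hypothetical "Carl"), enumerates every possible extension of couple, keeps the databases coherent with both views, and intersects the answers to Qc. The function names are illustrative.

```python
from itertools import combinations, product

# Assumption: besides Ann and Bill the alphabet has one more constant, "Carl".
DOMAIN = ("Ann", "Bill", "Carl")
EXT_FEMALE = {"Ann"}
EXT_MALE = {"Bill"}


def all_databases(domain):
    """Yield every possible extension of the binary predicate couple."""
    pairs = list(product(domain, repeat=2))
    for size in range(len(pairs) + 1):
        for subset in combinations(pairs, size):
            yield frozenset(subset)


def coherent(extension, answer, assumption):
    """Check coherence of a view extension with the view's answer on a database."""
    if assumption == "sound":
        return extension <= answer
    if assumption == "complete":
        return extension >= answer
    if assumption == "exact":
        return extension == answer
    raise ValueError(f"unknown assumption: {assumption}")


def certain_couples(assumption):
    """Intersect couple over all databases coherent with both views."""
    certain = None
    for couple in all_databases(DOMAIN):
        female = {f for f, _ in couple}  # answer of female(f) <- couple(f, m)
        male = {m for _, m in couple}    # answer of male(m) <- couple(f, m)
        if coherent(EXT_FEMALE, female, assumption) and coherent(EXT_MALE, male, assumption):
            certain = set(couple) if certain is None else certain & couple
    return certain


print(certain_couples("sound"))  # set()
print(certain_couples("exact"))  # {('Ann', 'Bill')}
```

Under the sound assumption the databases {(Ann, Carl), (Carl, Bill)} and {(Ann, Bill)} are both coherent, so no pair survives the intersection; under the exact assumption only {(Ann, Bill)} is coherent.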

3 Specifying the Content of the Data Integration System

We propose here an architecture for data integration that is coherent with the framework described in Section 2, and is based on Description Logics [9,8]. In such an architecture, to specify mediated schemas, views, and queries we use the Description Logic DLR [6]. We first introduce DLR, and then we illustrate how we use the logic to specify the three components of a data integration system.

3.1 The Description Logic DLR

Description Logics (DLs; see http://dl.kr.org for the home page of Description Logics) were introduced in the early 1980s in an attempt to provide a formal ground to Semantic Networks and Frames. Since then they have evolved into knowledge representation languages that are able to capture virtually all class-based representation formalisms used in Artificial Intelligence, Software Engineering, and Databases. One of the distinguishing features of the work on these logics is the detailed computational complexity analysis both of the associated reasoning algorithms, and of the logical implication problem that the algorithms are supposed to solve. By virtue of this analysis, most of these logics have optimal reasoning algorithms, and practical systems implementing such algorithms are now used in several projects.

In DLs, the domain of interest is modeled by means of concepts and relations, which denote classes of objects and relationships, respectively. Here, we focus our attention on the DL DLR [5,6]. The basic elements of DLR are concepts (unary relations), and n-ary relations. We assume an alphabet A consisting of a finite set of atomic relations, atomic concepts, and constants, denoted by P, A, and a, respectively. We use R to denote arbitrary relations (of given arity between 2 and nmax), and C to denote arbitrary concepts, respectively built according to the following syntax:

  R ::= ⊤n | P | ($i/n : C) | ¬R | R1 ⊓ R2
  C ::= ⊤1 | A | ¬C | C1 ⊓ C2 | ∃[$i]R | (≤ k [$i]R)

where i denotes a component of a relation, i.e., an integer between 1 and nmax, n denotes the arity of a relation, i.e., an integer between 2 and nmax, and k denotes a nonnegative integer. We also use the following abbreviations:


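– ⊥ for ¬⊤1,
– C1 ⊔ C2 for ¬(¬C1 ⊓ ¬C2),
– C1 ⇒ C2 for ¬C1 ⊔ C2, and
– C1 ≡ C2 for (C1 ⇒ C2) ⊓ (C2 ⇒ C1).

\[
\begin{array}{rcl}
\top_n^{I} & \subseteq & (\Delta^{I})^n \\
P^{I} & \subseteq & \top_n^{I} \\
(\$i/n : C)^{I} & = & \{(d_1, \ldots, d_n) \in \top_n^{I} \mid d_i \in C^{I}\} \\
(\neg R)^{I} & = & \top_n^{I} \setminus R^{I} \\
(R_1 \sqcap R_2)^{I} & = & R_1^{I} \cap R_2^{I} \\[1ex]
\top_1^{I} & = & \Delta^{I} \\
A^{I} & \subseteq & \Delta^{I} \\
(\neg C)^{I} & = & \Delta^{I} \setminus C^{I} \\
(C_1 \sqcap C_2)^{I} & = & C_1^{I} \cap C_2^{I} \\
(\exists[\$i]R)^{I} & = & \{d \in \Delta^{I} \mid \exists (d_1, \ldots, d_n) \in R^{I}.\ d_i = d\} \\
(\le k\,[\$i]R)^{I} & = & \{d \in \Delta^{I} \mid \sharp\{(d_1, \ldots, d_n) \in R^{I} \mid d_i = d\} \le k\}
\end{array}
\]

Fig. 1. Semantic rules for DLR (P, R, R1, and R2 have arity n)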
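To make the semantic rules of Figure 1 concrete, they can be read as a recursive evaluator over a finite interpretation. The sketch below is illustrative only: the tuple-based encoding of expressions and the names eval_relation and eval_concept are assumptions, and ⊤n is simplified to the set of all n-tuples over the domain, whereas Figure 1 only requires it to be a subset of that set.

```python
from itertools import product


def eval_relation(expr, interp):
    """Return the set of tuples denoted by a DLR relation expression."""
    domain, atoms = interp["domain"], interp["atoms"]
    kind = expr[0]
    if kind == "top":        # ("top", n); simplification: all n-tuples
        return set(product(domain, repeat=expr[1]))
    if kind == "atom":       # ("atom", "P")
        return set(atoms[expr[1]])
    if kind == "select":     # ("select", i, n, C) encodes ($i/n : C)
        _, i, n, concept = expr
        members = eval_concept(concept, interp)
        return {t for t in product(domain, repeat=n) if t[i - 1] in members}
    if kind == "not":        # ("not", n, R): difference from top_n
        _, n, rel = expr
        return set(product(domain, repeat=n)) - eval_relation(rel, interp)
    if kind == "and":        # ("and", R1, R2)
        return eval_relation(expr[1], interp) & eval_relation(expr[2], interp)
    raise ValueError(f"unknown relation constructor: {kind}")


def eval_concept(expr, interp):
    """Return the set of domain elements denoted by a DLR concept expression."""
    domain, atoms = interp["domain"], interp["atoms"]
    kind = expr[0]
    if kind == "top":        # ("top",)
        return set(domain)
    if kind == "atom":       # ("atom", "A")
        return set(atoms[expr[1]])
    if kind == "not":        # ("not", C)
        return set(domain) - eval_concept(expr[1], interp)
    if kind == "and":        # ("and", C1, C2)
        return eval_concept(expr[1], interp) & eval_concept(expr[2], interp)
    if kind == "exists":     # ("exists", i, R) encodes ∃[$i]R
        _, i, rel = expr
        return {t[i - 1] for t in eval_relation(rel, interp)}
    if kind == "atmost":     # ("atmost", k, i, R) encodes (≤ k [$i]R)
        _, k, i, rel = expr
        tuples = eval_relation(rel, interp)
        return {d for d in domain if sum(1 for t in tuples if t[i - 1] == d) <= k}
    raise ValueError(f"unknown concept constructor: {kind}")


# A hypothetical three-element interpretation.
INTERP = {
    "domain": {"ann", "bob", "carl"},
    "atoms": {
        "RELATIVE": {("ann", "bob")},
        "Doctor": {"bob"},
        "American": {"ann", "carl"},
    },
}

# American ⊓ ∃[$1](RELATIVE ⊓ ($2/2 : Doctor))
EXPR = ("and", ("atom", "American"),
        ("exists", 1, ("and", ("atom", "RELATIVE"),
                       ("select", 2, 2, ("atom", "Doctor")))))

print(eval_concept(EXPR, INTERP))  # {'ann'}
```

Note that negation on relations takes the arity explicitly, mirroring the fact that ¬R denotes the difference from ⊤n rather than an absolute complement.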

We consider only concepts and relations that are well-typed, which means that
– only relations of the same arity n are combined to form expressions of type R1 ⊓ R2 (which inherit the arity n), and
– i ≤ n whenever i denotes a component of a relation of arity n.

The semantics of DLR is specified as follows. An interpretation I is constituted by an interpretation domain ∆^I, and an interpretation function ·^I that assigns to each constant an element of ∆^I under the unique name assumption, to each concept C a subset C^I of ∆^I, and to each relation R of arity n a subset R^I of (∆^I)^n, such that the conditions in Figure 1 are satisfied. Observe that the "¬" constructor on relations is used to express difference of relations, and not the complement [6].

3.2 Mediated Schema, Views, and Queries

We remind the reader that a mediated schema is constituted by a finite set of constraints expressed in a schema language LS. In our setting, the schema language LS is based on the DL DLR. In particular, each constraint is formulated as an assertion of one of the following forms:

  R1 ⊑ R2        C1 ⊑ C2

where R1 and R2 are DLR relations of the same arity, and C1 and C2 are DLR concepts. As we said before, a database DB is a set of relations, one for each predicate symbol in the alphabet A. We denote with R^DB the relation in DB corresponding to the predicate symbol R (either an atomic concept, or an atomic relation). Note that a database can be seen as an interpretation for DLR, whose domain coincides with the set of constants in the alphabet A. We say that a database DB satisfies an assertion R1 ⊑ R2 (resp., C1 ⊑ C2) if R1^DB ⊆ R2^DB (resp., C1^DB ⊆ C2^DB). Moreover, DB satisfies a schema S if DB satisfies all assertions in S.

In order to define views and queries, we now introduce the notion of query expression in our setting. We assume that the alphabet A is enriched with a finite set of variable symbols, simply called variables. A query expression Q is a non-recursive datalog query of the form

  Q(x) ← conj1(x, y1) ∨ · · · ∨ conjm(x, ym)

where each conji(x, yi) is a conjunction of atoms, and x, yi are all the variables appearing in the conjunct. Each atom has one of the forms R(t1, . . . , tn) or C(t), where t1, . . . , tn and t are variables in x and yi or constants in A, R is a relation, and C is a concept. The number of variables of x is called the arity of Q, and is the arity of the relation denoted by the query Q. We observe that the atoms in the query expressions are arbitrary DLR relations and concepts, freely used in the assertions of the knowledge base (KB). This distinguishes our approach from [22,39], where no constraints on the relations that appear in the queries can be expressed in the KB.

Given a database DB, a query expression Q of arity n is interpreted as the set Q^DB of n-tuples of constants (c1, . . . , cn), such that, when substituting each ci for xi, the formula

  ∃y1.conj1(x, y1) ∨ · · · ∨ ∃ym.conjm(x, ym)

evaluates to true in DB.

With the introduction of query expressions, we can now define views and queries. Indeed, in our setting, query expressions constitute both the view language LV and the query language LQ:
– Associated to each view Vi in the set V = {V1, . . . , Vm} we have:
  • a definition def(Vi) in terms of a query expression,
  • a set ext(Vi) of tuples of constants,
  • a specification as(Vi) of which assumption to adopt for the view Vi, where each as(Vi) is either sound, complete, or exact.
– A query is simply a query expression, as defined above.

Example 2. Consider for example the following DLR schema Sd, expressing that Americans who have a doctor as relative are wealthy, and that each surgeon is also a doctor

  American ⊓ ∃[$1](RELATIVE ⊓ ($2 : Doctor)) ⊑ Wealthy
  Surgeon ⊑ Doctor


Diego Calvanese, Giuseppe De Giacomo, and Maurizio Lenzerini

and two sound views V1 and V2, respectively with definitions

  V1(x) ← RELATIVE(x, y) ∧ Surgeon(y)
  V2(x) ← American(x)

and extensions

  ext(V1) = {Ann, Bill}
  ext(V2) = {Ann, Dan}

Given the query Qw(x) ← Wealthy(x), asking for those who are wealthy, we have that the only constant in ans(Qw, Sd, V) is Ann. Moreover, if we add an exact view V3 with definition V3(x) ← Wealthy(x), and an extension ext(V3) not containing Bill, then, from the constraints in Sd and the information we have on the views, we can conclude that Bill is not American.

3.3 Discussion

We observe that DLR is able to capture a great variety of data models with many forms of constraints [15,6]. For example, DLR can formally capture Conceptual Data Models typically used in databases [33,24], such as the Entity-Relationship Model [18]. Hence, in our setting, query answering using views is done under the constraints imposed by a conceptual data model.

The interest in DLR is not confined to the expressiveness it provides for specifying data schemas. It is also equipped with effective reasoning techniques that are sound and complete with respect to the semantics. In particular, checking whether a given assertion logically follows from a set of assertions is EXPTIME-complete in DLR (assuming that numbers are encoded in unary), and query containment, i.e., checking whether one query is contained in another one in every model of a set of assertions, is EXPTIME-hard and solvable in 2EXPTIME [6].
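As a small companion to the definitions of Section 3.2, the sketch below evaluates query expressions (disjunctions of conjunctions of atoms) over one concrete database and checks that this database satisfies the schema of Example 2 and is coherent with its two sound views. It is an illustration under assumptions: the constant "Eve" and the particular database are hypothetical, only atomic predicates are supported as atoms, and checking a single database does not compute ans(Qw, Sd, V), which quantifies over all coherent databases.

```python
from itertools import product


def eval_query(head, disjuncts, db, constants):
    """Evaluate Q(head) <- conj_1 v ... v conj_m over a database.

    Variables are strings starting with '?'; any other argument is a constant.
    Each conjunction is a list of atoms (predicate, args) over atomic predicates.
    """
    answers = set()
    for conj in disjuncts:
        variables = sorted({a for _, args in conj for a in args if a.startswith("?")})
        for values in product(constants, repeat=len(variables)):
            env = dict(zip(variables, values))
            if all(tuple(env.get(a, a) for a in args) in db[pred] for pred, args in conj):
                answers.add(tuple(env[v] for v in head))
    return answers


# One hypothetical database; "Eve" is an assumed extra constant.
CONSTANTS = ["Ann", "Bill", "Dan", "Eve"]
DB = {
    "RELATIVE": {("Ann", "Eve"), ("Bill", "Eve")},
    "Surgeon": {("Eve",)},
    "Doctor": {("Eve",)},
    "American": {("Ann",), ("Dan",)},
    "Wealthy": {("Ann",)},
}

# View definitions of Example 2.
v1 = eval_query(["?x"], [[("RELATIVE", ("?x", "?y")), ("Surgeon", ("?y",))]], DB, CONSTANTS)
v2 = eval_query(["?x"], [[("American", ("?x",))]], DB, CONSTANTS)

# Left-hand side of the first schema assertion, written as a query expression.
lhs = eval_query(
    ["?x"],
    [[("American", ("?x",)), ("RELATIVE", ("?x", "?y")), ("Doctor", ("?y",))]],
    DB,
    CONSTANTS,
)

schema_ok = lhs <= DB["Wealthy"] and DB["Surgeon"] <= DB["Doctor"]
views_ok = {("Ann",), ("Bill",)} <= v1 and {("Ann",), ("Dan",)} <= v2
qw = eval_query(["?x"], [[("Wealthy", ("?x",))]], DB, CONSTANTS)

print(schema_ok, views_ok, qw)  # True True {('Ann',)}
```

In this particular database Bill is not American and is not wealthy, which is compatible with the observation in Example 2 that only Ann is a certain answer to Qw.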

4 Query Answering

In this section we study the problem of query answering using views in the setting just defined: the schema is expressed as a DLR knowledge base, and queries and view definitions are expressed as DLR query expressions. We call the resulting problem answering queries using views in DLR. The technical results regarding answering queries using views in DLR illustrated in this section are taken from [7].

The first thing to observe is that, given a schema S expressed in DLR, a set of views V = {V1, . . . , Vm}, a query Q, and a tuple d = (d1, . . . , dn) of constants, verifying whether d is in ans(Q, S, V) is essentially a form of logical implication. This observation can be made even sharper if we introduce special assertions, expressed in first-order logic with equality, that encode as logical formulas the extension of the views. In particular, for each view V ∈ V, with def(V) = (V(x) ← v(x, y)) and ext(V) = {a1, . . . , ak}, we introduce the following assertions.


– If V is sound, then for each tuple ai, 1 ≤ i ≤ k, we introduce the existentially quantified assertion

  ∃y.v(ai, y)

– If V is complete, then we introduce the universally quantified assertion

  ∀x.∀y.((x ≠ a1 ∧ · · · ∧ x ≠ ak) → ¬v(x, y))

– If V is exact, then, according to the definition, we treat it as a view that is both sound and complete, and introduce both types of assertions above.

Let us call Ext(V) the set of assertions corresponding to the extension of the views V. Now, the problem of query answering using views in DLR, i.e., checking whether d ∈ ans(Q, S, V), can be reformulated as checking whether the following logical implication holds:

  S ∪ Ext(V) |= ∃y.q(d, y)

where q(x, y) is the right-hand part of Q. Checking such a logical implication can in turn be rephrased as checking the unsatisfiability of

  S ∪ Ext(V) ∪ {∀y.¬q(d, y)}

Observe that the assertion ∀y.¬q(d, y) has the same form as the universal assertion used for expressing extensions of complete views, except that the antecedent in the implication is empty.

The problem with the newly introduced assertions is that they are not yet expressed in a DL. The next step is to translate them into a DL. Instead of working directly with DLR, we are going to translate the problem of query answering using views in DLR to reasoning in a DL, called CIQ, that directly corresponds to a variant of Propositional Dynamic Logic [20,6].

4.1 The Description Logic CIQ

The DL CIQ is obtained from DLR by restricting relations to be binary (such relations are called roles and inverse roles) and allowing for complex roles corresponding to regular expressions [20]. Concepts of CIQ are formed according to the following abstract syntax:

  C ::= ⊤ | A | C1 ⊓ C2 | ¬C | ∃R.C | (≤ k Q. C)
  Q ::= P | P⁻
  R ::= Q | R1 ⊔ R2 | R1 ◦ R2 | R* | R⁻ | id(C)

where A denotes an atomic concept, C a generic concept, P an atomic role, Q a simple role, i.e., either an atomic role or the inverse of an atomic role, and R a generic role. We also use the following abbreviations:


– ∀R.C for ¬∃R.¬C,
– (≥ k Q. C) for ¬(≤ k−1 Q. C).

\[
\begin{array}{rcl}
A^{I} & \subseteq & \Delta^{I} \\
\top^{I} & = & \Delta^{I} \\
(\neg C)^{I} & = & \Delta^{I} \setminus C^{I} \\
(C_1 \sqcap C_2)^{I} & = & C_1^{I} \cap C_2^{I} \\
(\exists R.C)^{I} & = & \{d \in \Delta^{I} \mid \exists (d, d') \in R^{I}.\ d' \in C^{I}\} \\
(\le k\,Q.C)^{I} & = & \{d \in \Delta^{I} \mid \sharp\{(d, d') \in Q^{I} \mid d' \in C^{I}\} \le k\} \\[1ex]
P^{I} & \subseteq & \Delta^{I} \times \Delta^{I} \\
(R_1 \sqcup R_2)^{I} & = & R_1^{I} \cup R_2^{I} \\
(R_1 \circ R_2)^{I} & = & R_1^{I} \circ R_2^{I} \\
(R^{*})^{I} & = & (R^{I})^{*} = \bigcup_{i \ge 0} (R^{I})^{i} \\
(R^{-})^{I} & = & \{(d_1, d_2) \in \Delta^{I} \times \Delta^{I} \mid (d_2, d_1) \in R^{I}\} \\
id(C)^{I} & = & \{(d, d) \in \Delta^{I} \times \Delta^{I} \mid d \in C^{I}\}
\end{array}
\]

Fig. 2. Semantic rules for CIQ

The semantic conditions for CIQ are specified in Figure 2³. The use of CIQ allows us to exploit various results established recently for reasoning in such a logic. The basis of these results lies in the correspondence between CIQ and a variant of Propositional Dynamic Logic [26,35] that includes converse programs and "graded modalities" [25,52] on atomic programs and their converse [47]. CIQ inherits from Propositional Dynamic Logics the ability of internalizing assertions. Indeed, one can define a role U that essentially corresponds to a universal modality, as the reflexive-transitive closure of all roles and inverse roles in the language. Using such a universal modality we can re-express each assertion C1 ⊑ C2 as the concept ∀U.(C1 ⇒ C2). This allows us to re-express logical implication as concept satisfiability [47]. Concept satisfiability (and hence logical implication) in CIQ is EXPTIME-complete [20].

Although CIQ does not have constructs for n-ary relations as DLR does, it is possible to represent n-ary relations in a sound and complete way with respect to concept satisfiability (and hence logical implication) by means of reification [20]. An atomic relation P is reified by introducing a new atomic concept AP and n functional roles f1, . . . , fn, one for each component of P. In this way, a tuple of the relation is represented by an instance of the corresponding concept, which is linked through each of the associated roles to an object representing the component of the tuple. Performing the reification requires however some attention, since in a relation there may not be two equal tuples (i.e., constituted by the same components in the same positions) in its extension.
In the reified counterpart, on the other hand, one cannot explicitly rule out (e.g., by using specific assertions) that there are two objects o1 and o2 "representing" the same tuple, i.e., that are connected to exactly the same objects denoting the components of the tuple. However, due to the fundamental inability of CIQ to express that two role sequences meet in the same object, no CIQ concept can force such a situation. Therefore one does not need to take this constraint explicitly into account when reasoning.

Finally, we are going to make use of CIQ extended with object-names. An object-name is an atomic concept that, in each model, has as extension a single object. Object-names are not required to be disjoint, i.e., we do not make the unique name assumption on them. Disjointness can be explicitly enforced when needed through explicit assertions. In general, adding object-names to CIQ makes reasoning NEXPTIME-hard [49]. However our use of object-names in CIQ is restricted so as to keep reasoning in EXPTIME.

³ The notation (R^I)^i stands for i repetitions of R^I, i.e., (R^I)^1 = R^I, and (R^I)^i = R^I ◦ (R^I)^(i−1).

4.2 Reduction of Answering Queries Using Views in DLR to CIQ Unsatisfiability

We tackle answering queries using views in DLR by reducing the problem of checking whether d ∈ ans(Q, S, V) to the problem of checking the unsatisfiability of a CIQ concept in which object-names appear. Object-names are then eliminated, thus obtaining a CIQ concept.

We translate S ∪ Ext(V) into a CIQ concept as follows. First, we eliminate n-ary relations by means of reification. Then, we reformulate each assertion in S as a concept by internalizing assertions. Representing assertions in Ext(V), instead, requires the following ad-hoc techniques.

We translate each existentially quantified assertion ∃y.v(a, y) as follows. We represent every constant ai by an object-name Nai, enforcing disjointness between the object-names corresponding to different constants. We represent each existentially quantified variable y, treated as a Skolem constant, by a new object-name without disjointness constraints. We also use additional concept-names representing tuples of objects. Specifically:
– An atom C(t), where C is a concept and t is a term (either a constant or a variable), is translated to

  ∀U.(Nt ⇒ σ(C))

  where σ(C) is the reified counterpart of C, Nt is the object-name corresponding to t, and U is the reflexive-transitive closure of all roles and inverse roles introduced in the reification.
– An atom R(t), where R is a relation of arity n and t = (t1, . . . , tn) is a tuple of terms, is translated to the conjunction of the following concepts:

  ∀U.(Nt ⇒ σ(R))

  where σ(R) is the reified counterpart of R and Nt is an object-name corresponding to the tuple t,

  ∀U.(Nt ≡ (∃f1.Nt1 ⊓ · · · ⊓ ∃fn.Ntn))

  and for each i, 1 ≤ i ≤ n, a concept

  ∀U.(Nti ⇒ ((∃fi⁻.Nt) ⊓ (≤ 1 fi⁻. Nt)))

Then, the translations of the atoms are combined as in v(a, y).

To translate universally quantified assertions corresponding to the complete views and also to the query, it is sufficient to deal with assertions of the form:

  ∀x.∀y.((x ≠ a1 ∧ · · · ∧ x ≠ ak) → ¬conj(x, y))

Following [6], we construct for conj(x, y) a special graph, called tuple-graph, which reflects the dependencies between variables. Specifically, the tuple-graph is used to detect cyclic dependencies. In general, the tuple-graph is composed of ℓ ≥ 1 connected components. For the i-th connected component we build a CIQ concept δi(x, y) as in [6]. Such a concept contains newly introduced concepts Ax and Ay, one for each x in x and y in y. We have to treat variables in x and y that occur in a cycle in the tuple-graph differently from those outside of cycles. Let xc (resp., yc) denote the variables in x (resp., y) that occur in a cycle, and xl (resp., yl) those that do not occur in cycles. We first define the concept C[xc/s, yc/t] as the concept obtained from

  (∀U.¬δ1(x, y)) ⊓ · · · ⊓ (∀U.¬δℓ(x, y))

as follows:
– for each variable xi in xc (resp., yi in yc), the concept Axi (resp., Ayi) is replaced by Nsi (resp., Nti);
– for each variable yi in yl, the concept Ayi is replaced by ⊤.

Then the concept corresponding to the universally quantified assertion is constructed as the conjunction of:
– ∀U.Cxl, where Cxl is obtained from x ≠ a1 ∧ · · · ∧ x ≠ ak by replacing each (x ≠ a) with (Ax ≡ ¬Na). Observe that (x1, . . . , xn) ≠ (a1, . . . , an) is an abbreviation for (x1 ≠ a1 ∨ · · · ∨ xn ≠ an).
– One concept C[xc/s, yc/t] for each possible instantiation of s and t with the constants in Ext(V) ∪ {d}, with the proviso that s cannot coincide with any of the ai, for 1 ≤ i ≤ k (notice that the proviso applies only in the case where all variables in x occur in a cycle in the tuple-graph).
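The last item of the construction enumerates instantiations of the cyclic variables with constants. The following sketch only illustrates that enumeration step and how quickly it grows; it does not build the concepts δi or C[xc/s, yc/t]. The function name, its parameters, and the sample constants are assumptions made for the illustration.

```python
from itertools import product


def instantiations(constants, num_cyclic_x, num_cyclic_y, excluded=frozenset()):
    """Yield pairs (s, t) of constant tuples for the cyclic variables.

    `excluded` holds the tuples a_1, ..., a_k that s must avoid. Following the
    proviso in the text, pass it only when all variables of x occur in a cycle.
    """
    for s in product(constants, repeat=num_cyclic_x):
        if s in excluded:
            continue
        for t in product(constants, repeat=num_cyclic_y):
            yield s, t


# Three hypothetical constants, one cyclic x-variable, two cyclic y-variables,
# and a view extension {a} that s must avoid.
pairs = list(instantiations(["a", "b", "c"], 1, 2, excluded={("a",)}))
print(len(pairs))  # 18
```

In this enumeration the number of pairs is bounded by the number of constants raised to the number of cyclic variables, so each additional cyclic variable multiplies the number of concepts to be generated by the number of constants.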
The critical point in the above construction is how to express a universally quantified assertion

  ∀x.∀y.((x ≠ a1 ∧ · · · ∧ x ≠ ak) → ¬conj(x, y))

If there are no cycles in the corresponding tuple-graph, then we can directly translate the assertion into a CIQ concept. As shown in the construction above, dealing with a nonempty antecedent requires some special care to correctly encode the exceptions to the universal rule. Instead, if there is a cycle, due to the fundamental inability of CIQ to express that two role sequences meet in the same object, no CIQ concept can directly express the universal assertion. The same inability, however, is shared by DLR. Hence we can assume that the only cycles present in a model are those formed by the constants in the extension of the views or those in the tuple for which we are checking whether it is a certain answer of the query. And these are taken care of by the explicit instantiation.

As the last step to obtain a CIQ concept, we need to encode object-names in CIQ. To do so we can exploit the construction used in [21] to encode CIQ-ABoxes as concepts. Such a construction applies to the current case without any need of major adaptation. It is crucial to observe that the translation above uses object-names in order to form a sort of disjunction of ABoxes (cf. [31]).

In [7], the following basic fact is proved for the construction presented above. Let Cqa be the CIQ concept obtained by the construction above. Then d ∈ ans(Q, S, V) if and only if Cqa is unsatisfiable.

The size of Cqa is polynomial in the size of the query, of the view definitions, and of the inclusion assertions in S, and is at most exponential in the number of constants in ext(V) ∪ {d}. The exponential blow-up is due to the number of instantiations of C[xc/s, yc/t] with constants in ext(V) ∪ {d} that are needed to capture universally quantified assertions. Hence, considering EXPTIME-completeness of satisfiability in DLR and in CIQ, we get that query answering using views in DLR is EXPTIME-hard and can be done in 2EXPTIME.
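Recalling the first-order encoding of view extensions introduced at the beginning of this section, the set Ext(V) can be sketched as a small generator of formula strings. This is an illustration only, not part of the reduction to CIQ: the function name, the template convention (a single placeholder {x} for the distinguished variables) and the textual output format are assumptions.

```python
def ext_assertions(body, extension, assumption, exist_vars="y"):
    """Return first-order assertions (as strings) encoding one view extension.

    `body` is a format template for v(x, y) with one placeholder {x};
    `extension` is a list of tuples of constant names; `exist_vars` names the
    non-distinguished variables (empty if the definition has none).
    """
    if assumption not in ("sound", "complete", "exact"):
        raise ValueError(f"unknown assumption: {assumption}")
    exists_prefix = f"∃{exist_vars}." if exist_vars else ""
    forall_prefix = f"∀x.∀{exist_vars}." if exist_vars else "∀x."
    assertions = []
    if assumption in ("sound", "exact"):
        # One existential assertion per tuple in the extension.
        for a in extension:
            instance = body.format(x=", ".join(a))
            assertions.append(f"{exists_prefix}({instance})")
    if assumption in ("complete", "exact"):
        # One universal assertion excluding every tuple outside the extension.
        negated = "¬(" + body.format(x="x") + ")"
        guard = " ∧ ".join("x ≠ (" + ", ".join(a) + ")" for a in extension)
        matrix = f"({guard}) → {negated}" if guard else negated
        assertions.append(f"{forall_prefix}({matrix})")
    return assertions


# Views V1 (sound) and V3 (exact) in the style of Example 2; ext(V3) is assumed.
v1 = ext_assertions("RELATIVE({x}, y) ∧ Surgeon(y)", [("Ann",), ("Bill",)], "sound")
v3 = ext_assertions("Wealthy({x})", [("Ann",)], "exact", exist_vars="")

for line in v1 + v3:
    print(line)
```

An exact view contributes both kinds of assertions, which matches its treatment as a view that is both sound and complete.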

5 Related Work

We already observed that query answering using views can be seen as a form of reasoning with incomplete information. The interested reader is referred to [53] for a survey on this subject. We also observe that, to compute the whole set ans(Q, S, V), we need to run the algorithm presented above once for each possible tuple (of the arity of Q) of objects in the view extensions. Since we are dealing with incomplete information in a rich language, we should not expect to do much better than considering each tuple of objects separately. Indeed, in such a setting reasoning on objects, such as query answering, requires sophisticated forms of logical inference. In particular, verifying whether a certain tuple belongs to a query gives rise to a line of reasoning which may depend on the tuple under consideration, and which may vary substantially from one tuple to another. For simple languages we may indeed avoid considering tuples individually, as shown in [45] for query answering in the DL ALN without cyclic TBox assertions. Observe, however, that for such a DL, reasoning on objects is polynomial in both data and expression complexity [36,46], and does not require sophisticated forms of inference.

Query answering using views has been investigated in recent years in the context of simplified frameworks. In [38,44], the problem has been studied for the case of conjunctive queries (with or without arithmetic comparisons), in [2] for disjunctive views, in [48,19,30] for queries with aggregates, in [23] for recursive queries and nonrecursive views, and in [11,12] for several variants of regular path queries. Comprehensive frameworks for view-based query answering, as well as several interesting results for various query languages, are presented in [29,1].

Query answering using views is tightly related to query rewriting [38,23,51]. In particular, [3] studies rewriting of conjunctive queries using conjunctive views whose atoms are DL concepts or roles (the DL used is less expressive than DLR). In general, a rewriting of a query with respect to a set of views is a function that, given the extensions of the views, returns a set of tuples that is contained in the answer set of the query with respect to the views. Usually, one fixes a priori the language in which to express rewritings (e.g., unions of conjunctive queries), and then looks for the best possible rewriting expressible in such a language. On the other hand, we may call perfect a rewriting that returns exactly the answer set of the query with respect to the views, independently of the language in which it is expressed. Hence, if an algorithm for answering queries using views exists, it can be viewed as a perfect rewriting [13,14]. The results presented here show the existence of perfect, and hence maximal, rewritings in a setting where the mediated schema, the views, and the query are expressed in DLR.

6 Conclusions

We have illustrated a logic-based framework for data integration, and in particular for the problem of query answering using views in a data integration system. We have addressed the problem for the case of non-recursive datalog queries posed to a mediated schema expressed in DLR. We have considered different assumptions on the view extensions (sound, complete, and exact), and we have presented a technique that solves the problem in 2EXPTIME worst-case computational complexity.

We have seen in the previous section that an algorithm for answering queries using views is in fact a perfect rewriting. For the setting presented here, it remains open to find perfect rewritings expressed in a more declarative query language. Moreover, it is of interest to find maximal rewritings belonging to well-behaved query languages, in particular, languages with polynomial data complexity, even though we already know that such rewritings cannot be perfect [13].

Acknowledgments

The work presented here was partly supported by the ESPRIT LTR Project No. 22469 DWQ – Foundations of Data Warehouse Quality, and by MURST Cofin 2000 D2I – From Data to Integration. We wish to thank all members of the projects. Also, we thank Daniele Nardi, Riccardo Rosati, and Moshe Y. Vardi, who contributed to several ideas illustrated in the chapter.


References

1. Serge Abiteboul and Oliver Duschka. Complexity of answering queries using materialized views. In Proc. of the 17th ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems (PODS’98), pages 254–265, 1998.
2. Foto N. Afrati, Manolis Gergatsoulis, and Theodoros Kavalieros. Answering queries using materialized views with disjunction. In Proc. of the 7th Int. Conf. on Database Theory (ICDT’99), volume 1540 of Lecture Notes in Computer Science, pages 435–452. Springer-Verlag, 1999.
3. Catriel Beeri, Alon Y. Levy, and Marie-Christine Rousset. Rewriting queries using views in description logics. In Proc. of the 16th ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems (PODS’97), pages 99–108, 1997.
4. Mokrane Bouzeghoub and Maurizio Lenzerini. Special issue on data extraction, cleaning, and reconciliation. Information Systems, 26(8), pages 535–536, 2001.
5. Diego Calvanese, Giuseppe De Giacomo, and Maurizio Lenzerini. Conjunctive query containment in Description Logics with n-ary relations. In Proc. of the 1997 Description Logic Workshop (DL’97), pages 5–9, 1997.
6. Diego Calvanese, Giuseppe De Giacomo, and Maurizio Lenzerini. On the decidability of query containment under constraints. In Proc. of the 17th ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems (PODS’98), pages 149–158, 1998.
7. Diego Calvanese, Giuseppe De Giacomo, and Maurizio Lenzerini. Answering queries using views over description logics knowledge bases. In Proc. of the 17th Nat. Conf. on Artificial Intelligence (AAAI 2000), pages 386–391, 2000.
8. Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, Daniele Nardi, and Riccardo Rosati. Description logic framework for information integration. In Proc. of the 6th Int. Conf. on Principles of Knowledge Representation and Reasoning (KR’98), pages 2–13, 1998.
9. Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, Daniele Nardi, and Riccardo Rosati. Information integration: Conceptual modeling and reasoning support. In Proc. of the 6th Int. Conf. on Cooperative Information Systems (CoopIS’98), pages 280–291, 1998.
10. Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, Daniele Nardi, and Riccardo Rosati. Data integration in data warehousing. Int. J. of Cooperative Information Systems, 10(3), pages 237–271, 2001.
11. Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Moshe Y. Vardi. Answering regular path queries using views. In Proc. of the 16th IEEE Int. Conf. on Data Engineering (ICDE 2000), pages 389–398, 2000.
12. Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Moshe Y. Vardi. Query processing using views for regular path queries with inverse. In Proc. of the 19th ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems (PODS 2000), pages 58–66, 2000.
13. Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Moshe Y. Vardi. View-based query processing and constraint satisfaction. In Proc. of the 15th IEEE Symp. on Logic in Computer Science (LICS 2000), pages 361–371, 2000.
14. Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Moshe Y. Vardi. What is query rewriting? In Proc. of the 7th Int. Workshop on Knowledge Representation meets Databases (KRDB 2000), pages 17–27. CEUR Electronic Workshop Proceedings, http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-29/, 2000.


15. Diego Calvanese, Maurizio Lenzerini, and Daniele Nardi. Description logics for conceptual data modeling. In Jan Chomicki and Günter Saake, editors, Logics for Databases and Information Systems, pages 229–264. Kluwer Academic Publisher, 1998.
16. Tiziana Catarci and Maurizio Lenzerini. Representing and using interschema knowledge in cooperative information systems. J. of Intelligent and Cooperative Information Systems, 2(4):375–398, 1993.
17. S. Chaudhuri, S. Krishnamurthy, S. Potamianos, and K. Shim. Optimizing queries with materialized views. In Proc. of the 11th IEEE Int. Conf. on Data Engineering (ICDE’95), Taipei (Taiwan), 1995.
18. P. P. Chen. The Entity-Relationship model: Toward a unified view of data. ACM Trans. on Database Systems, 1(1):9–36, March 1976.
19. Sara Cohen, Werner Nutt, and Alexander Serebrenik. Rewriting aggregate queries using views. In Proc. of the 18th ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems (PODS’99), pages 155–166, 1999.
20. Giuseppe De Giacomo and Maurizio Lenzerini. What’s in an aggregate: Foundations for description logics with tuples and sets. In Proc. of the 14th Int. Joint Conf. on Artificial Intelligence (IJCAI’95), pages 801–807, 1995.
21. Giuseppe De Giacomo and Maurizio Lenzerini. TBox and ABox reasoning in expressive description logics. In Luigia C. Aiello, John Doyle, and Stuart C. Shapiro, editors, Proc. of the 5th Int. Conf. on the Principles of Knowledge Representation and Reasoning (KR’96), pages 316–327. Morgan Kaufmann, Los Altos, 1996.
22. Francesco M. Donini, Maurizio Lenzerini, Daniele Nardi, and Andrea Schaerf. AL-log: Integrating Datalog and description logics. J. of Intelligent Information Systems, 10(3):227–252, 1998.
23. Oliver M. Duschka and Michael R. Genesereth. Answering recursive queries using views. In Proc. of the 16th ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems (PODS’97), pages 109–116, 1997.
24. Ramez A. ElMasri and Shamkant B. Navathe. Fundamentals of Database Systems. Benjamin and Cummings Publ. Co., Menlo Park, California, 1988.
25. M. Fattorosi-Barnaba and F. De Caro. Graded modalities I. Studia Logica, 44:197–221, 1985.
26. Michael J. Fischer and Richard E. Ladner. Propositional dynamic logic of regular programs. J. of Computer and System Sciences, 18:194–211, 1979.
27. Daniela Florescu, Alon Y. Levy, Ioana Manolescu, and Dan Suciu. Query optimization in the presence of limited access patterns. In Proc. of the ACM SIGMOD Int. Conf. on Management of Data, pages 311–322, 1999.
28. Helena Galhardas, Daniela Florescu, Dennis Shasha, and Eric Simon. An extensible framework for data cleaning. Technical Report 3742, INRIA, Rocquencourt, 1999.
29. Gösta Grahne and Alberto O. Mendelzon. Tableau techniques for querying information sources through global schemas. In Proc. of the 7th Int. Conf. on Database Theory (ICDT’99), volume 1540 of Lecture Notes in Computer Science, pages 332–347. Springer-Verlag, 1999.
30. Stéphane Grumbach, Maurizio Rafanelli, and Leonardo Tininini. Querying aggregate data. In Proc. of the 18th ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems (PODS’99), pages 174–184, 1999.
31. Ian Horrocks, Ulrike Sattler, Sergio Tessaris, and Stephan Tobies. Query containment using a DLR ABox. Technical Report LTCS-Report 99-15, RWTH Aachen, 1999.


32. Michael N. Huhns, Nigel Jacobs, Tomasz Ksiezyk, Wei-Min Shen, Munindar P. Singh, and Philip E. Cannata. Integrating enterprise information models in Carnot. In Proc. of the Int. Conf. on Cooperative Information Systems (CoopIS’93), pages 32–42, 1993.
33. R. B. Hull and R. King. Semantic database modelling: Survey, applications and research issues. ACM Computing Surveys, 19(3):201–260, September 1987.
34. Matthias Jarke, Maurizio Lenzerini, Yannis Vassiliou, and Panos Vassiliadis, editors. Fundamentals of Data Warehouses. Springer-Verlag, 1999.
35. Dexter Kozen and Jerzy Tiuryn. Logics of programs. In Jan van Leeuwen, editor, Handbook of Theoretical Computer Science – Formal Models and Semantics, pages 789–840. Elsevier Science Publishers (North-Holland), Amsterdam, 1990.
36. Maurizio Lenzerini and Andrea Schaerf. Concept languages as query languages. In Proc. of the 9th Nat. Conf. on Artificial Intelligence (AAAI’91), pages 471–476, 1991.
37. Alon Y. Levy. Obtaining complete answers from incomplete databases. In Proc. of the 22nd Int. Conf. on Very Large Data Bases (VLDB’96), pages 402–412, 1996.
38. Alon Y. Levy, Alberto O. Mendelzon, Yehoshua Sagiv, and Divesh Srivastava. Answering queries using views. In Proc. of the 14th ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems (PODS’95), pages 95–104, 1995.
39. Alon Y. Levy and Marie-Christine Rousset. CARIN: A representation language combining Horn rules and description logics. In Proc. of the 12th Eur. Conf. on Artificial Intelligence (ECAI’96), pages 323–327, 1996.
40. Alon Y. Levy, Divesh Srivastava, and Thomas Kirk. Data model and query evaluation in global information systems. J. of Intelligent Information Systems, 5:121–143, 1995.
41. Chen Li and Edward Chang. Query planning with limited source capabilities. In Proc. of the 16th IEEE Int. Conf. on Data Engineering (ICDE 2000), pages 401–412, 2000.
42. Chen Li and Edward Chang. On answering queries in the presence of limited access patterns. In Proc. of the 8th Int. Conf. on Database Theory (ICDT 2001), 2001.
43. Chen Li, Ramana Yerneni, Vasilis Vassalos, Hector Garcia-Molina, Yannis Papakonstantinou, Jeffrey D. Ullman, and Murty Valiveti. Capability based mediation in TSIMMIS. In Proc. of the ACM SIGMOD Int. Conf. on Management of Data, pages 564–566, 1998.
44. Anand Rajaraman, Yehoshua Sagiv, and Jeffrey D. Ullman. Answering queries using templates with binding patterns. In Proc. of the 14th ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems (PODS’95), 1995.
45. Marie-Christine Rousset. Backward reasoning in ABoxes for query answering. In Proc. of the 1999 Description Logic Workshop (DL’99), pages 18–22. CEUR Electronic Workshop Proceedings, http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-22/, 1999.
46. Andrea Schaerf. Query Answering in Concept-Based Knowledge Representation Systems: Algorithms, Complexity, and Semantic Issues. PhD thesis, Dipartimento di Informatica e Sistemistica, Università di Roma “La Sapienza”, 1994.
47. Klaus Schild. A correspondence theory for terminological logics: Preliminary report. In Proc. of the 12th Int. Joint Conf. on Artificial Intelligence (IJCAI’91), pages 466–471, Sydney (Australia), 1991.
48. D. Srivastava, S. Dar, H. V. Jagadish, and A. Levy. Answering queries with aggregation using views. In Proc. of the 22nd Int. Conf. on Very Large Data Bases (VLDB’96), pages 318–329, 1996.


49. Stephan Tobies. The complexity of reasoning with cardinality restrictions and nominals in expressive description logics. J. of Artificial Intelligence Research, 12:199–217, 2000.
50. O. G. Tsatalos, M. H. Solomon, and Y. E. Ioannidis. The GMAP: A versatile tool for physical data independence. Very Large Database J., 5(2):101–118, 1996.
51. Jeffrey D. Ullman. Information integration using logical views. In Proc. of the 6th Int. Conf. on Database Theory (ICDT’97), volume 1186 of Lecture Notes in Computer Science, pages 19–40. Springer-Verlag, 1997.
52. Wiebe Van der Hoek and Maarten de Rijke. Counting objects. J. of Logic and Computation, 5(3):325–345, 1995.
53. Ron van der Meyden. Logical approaches to incomplete information. In Jan Chomicki and Günter Saake, editors, Logics for Databases and Information Systems, pages 307–356. Kluwer Academic Publisher, 1998.
54. Jennifer Widom. Special issue on materialized views and data warehousing. IEEE Bulletin on Data Engineering, 18(2), 1995.
55. Ramana Yerneni, Chen Li, Hector Garcia-Molina, and Jeffrey D. Ullman. Computing capabilities of mediators. In Proc. of the ACM SIGMOD Int. Conf. on Management of Data, pages 443–454, 1999.
56. Ramana Yerneni, Chen Li, Jeffrey D. Ullman, and Hector Garcia-Molina. Optimizing large join queries in mediation systems. In Proc. of the 7th Int. Conf. on Database Theory (ICDT’99), pages 348–364, 1999.

Search and Optimization Problems in Datalog

Sergio Greco¹,² and Domenico Saccà¹,²

¹ DEIS, Univ. della Calabria, 87030 Rende, Italy
² ISI-CNR, 87030 Rende, Italy
{greco,sacca}@deis.unical.it

Abstract. This paper analyzes the ability of DATALOG languages to express search and optimization problems. It is first shown that NP search problems can be formulated as unstratified DATALOG queries under nondeterministic stable model semantics so that each stable model corresponds to a possible solution. NP optimization problems are then formulated by adding a max (or min) construct to select the stable model (thus, the solution) which maximizes (resp., minimizes) the result of a polynomial function applied to the answer relation. In order to enable a simpler and more intuitive formulation for search and optimization problems, we introduce a DATALOG language in which the use of stable model semantics is disciplined to refrain from abstruse forms of unstratified negation. The core of our language is stratified negation extended with two constructs allowing nondeterministic selections and with query goals enforcing conditions to be satisfied by stable models. The language is modular, as the level of expressivity can be tuned and selected by means of a suitable use of the above constructs, thus capturing significant subclasses of search and optimization queries.

1 Introduction

DATALOG is a logic-programming language that was designed for database applications, mainly because of its declarative style and its ability to express recursive queries [3,32]. DATALOG has since been extended along many directions (e.g., various forms of negation, aggregate predicates and set constructs) to enhance its expressive power. In this paper we investigate the ability of DATALOG languages to express search and optimization problems. We recall that, given an alphabet Σ, a search problem is a partial multivalued function f, defined on some (not necessarily proper) subset of Σ*, say dom(f), which maps every string x of dom(f) into a number of strings y1, ..., yn (n > 0), thus f(x) = {y1, ..., yn}. The function f is therefore represented by the following relation on Σ* × Σ*: graph(f) = {(x, y) | x ∈ dom(f) and y ∈ f(x)}. We say that graph(f) is polynomially balanced if for each (x, y) in graph(f), the size of y is polynomially bounded in the size of x. NP search problems are those functions

Work partially supported by the Italian National Research Council (CNR) and by MURST (projects DATA-X and D2I).

A.C. Kakas, F. Sadri (Eds.): Computat. Logic (Kowalski Festschrift), LNAI 2408, pp. 61–82, 2002. © Springer-Verlag Berlin Heidelberg 2002


f for which both graph(f) is polynomially balanced and graph(f) is in NP, i.e., given x, y ∈ Σ*, deciding whether (x, y) ∈ graph(f) is in NP. In this paper we show that NP search problems can be formulated as DATALOG¬ (i.e., DATALOG with unstratified negation) queries under the nondeterministic version of total stable model semantics [11], thus the meaning of a DATALOG¬ program is given by any stable model. As an example of the language, take the Vertex Cover problem: given a graph G = (V, E), find a vertex cover, where a subset V' of V is a vertex cover of G if for each edge (x, y) in E either x or y is in V'. The problem can be formulated by the query ⟨P_vc, v'(X)⟩ where P_vc is the following DATALOG¬ program:

v'(X) ← v(X), ¬v''(X).
v''(X) ← v(X), ¬v'(X).
no_cover ← e(X, Y), ¬v'(X), ¬v'(Y).
refuse_no_cover ← no_cover, ¬refuse_no_cover.

The predicates v and e define the vertices and the edges of the graph by means of a suitable number of facts. The last rule enforces that every total stable model corresponds to some vertex cover (otherwise no_cover would be true and, then, the atom refuse_no_cover would be undefined). In order to enable a simpler and more intuitive formulation of search problems, we introduce a DATALOG language where the usage of stable model semantics is disciplined to avoid both undefinedness and unnecessary computational complexity, and to refrain from abstruse forms of unstratified negation. Thus the core of our language is stratified negation extended with two constructs (choice and subset) allowing nondeterministic selections and an additional ground goal (called constraint goal) in the query, enforcing conditions to be satisfied by stable models. For instance, the above query can be formulated as ⟨P_vc, !¬no_cover, v'(X)⟩ where P_vc is the following stratified DATALOG¬ program with a subset construct to nondeterministically select a subset of the vertices:

v'(X) ⊆ v(X).
no_cover ← e(X, Y), ¬v'(X), ¬v'(Y).
The constraint goal !¬no_cover specifies that only those stable models in which ¬no_cover is true are to be taken into consideration. The expressive power (and the complexity as well) of the language gradually increases by moving from the basic language (stratified DATALOG¬) up to the whole repertoire of additional constructs. Observe that, if we do not add any constraint goal in the query, the query reduces to a stratified program with additional constructs for nondeterministic selections, which are never retracted later, thus avoiding exponential explosion of the search space. For example, the query ⟨P_st, st(X, Y)⟩, where P_st is defined below, computes a spanning tree of the graph G in polynomial time:

st(nil, X) ← v(X), choice((), (X)).
st(X, Y) ← st(Z, X), e(X, Y), st(nil, Z), Y ≠ Z, Y ≠ X, choice((X), (Y)).
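The operational reading of the two choice rules can be sketched in Python. This is only an illustration under stated assumptions: the function name `spanning_tree_by_choice`, the use of `None` for nil, the treatment of edges as undirected, and the seeded random generator are all choices made here, not part of the language. The sketch picks a root once, then repeatedly attaches one not-yet-reached vertex to a vertex already in the tree, and never undoes a selection.

```python
import random

def spanning_tree_by_choice(vertices, edges, rng=None):
    """Greedy reading of P_st: pick a root, then attach one new vertex at a time."""
    rng = rng or random.Random(0)
    # Assumption: the graph is undirected, so store both orientations of each edge.
    adj = {v: set() for v in vertices}
    for x, y in edges:
        adj[x].add(y)
        adj[y].add(x)
    root = rng.choice(sorted(vertices))   # first choice: any vertex as the root
    st = [(None, root)]                   # None plays the role of nil
    in_tree = {root}
    while True:
        # Candidate pairs (x, y): x is already in the tree, y is not.
        frontier = sorted((x, y) for x in in_tree for y in adj[x] if y not in in_tree)
        if not frontier:
            break
        x, y = rng.choice(frontier)       # second choice: never retracted
        st.append((x, y))
        in_tree.add(y)
    return st
```

Each iteration adds exactly one vertex, so the loop runs at most |V| times; this is the sense in which selections that are never discarded keep the computation polynomial.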


The first choice selects any vertex of the graph as the root of the tree; the second choice selects one vertex y at a time to be added to the current spanning tree st so that y is connected to exactly one vertex x of st, thus preserving the tree structure. Polynomial-time computation is guaranteed since nondeterministic selections made by the choice constructs are never discarded later, because there is no constraint goal to satisfy as in the example of vertex cover. Observe that a vertex cover can also be computed in polynomial time; thus we may rewrite the above query using the choice construct without constraint goal so that polynomial-time computation is guaranteed. Obviously, this kind of rewriting is not feasible for all NP search queries as they can be NP-hard. In the paper we characterize various classes of search queries, including tractable classes (for which an answer can be computed in polynomial time), and we show how such classes can be captured by a suitably disciplined usage of our DATALOG¬ language. In the paper we also deal with the issue of formulating optimization problems. We recall that an optimization (min or max) problem, associated to a search problem f, is a function g such that dom(g) = dom(f) and for each x ∈ dom(g), g(x) = {y | y ∈ f(x) and for each other y' ∈ f(x), |y| ≤ |y'| (or |y| ≥ |y'| if it is a maximization problem)}. The optimization problems associated to NP search problems are called NP optimization problems. We show that NP optimization problems can be formulated as DATALOG¬ queries under the nondeterministic version of total stable model semantics by using a max (or min) construct to select the model which maximizes (resp., minimizes) the cardinality of the answer relation. As an example of the language, take the Min Vertex Cover problem: given a graph G = (V, E), find the vertex cover with minimal cardinality. The problem can be formulated by the query ⟨P_vc, !¬no_cover, min(v'(X))⟩ where P_vc is the above program.
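As a hedged illustration of how a constraint goal and a min goal act together, the following Python sketch enumerates every subset of vertices (the role of the subset construct), keeps those for which no_cover is false (the role of !¬no_cover), and then retains the answers of minimum cardinality (the role of min). The function names are hypothetical, and the exhaustive enumeration is exponential; it only mirrors the declarative reading and is not an evaluation strategy proposed by the paper.

```python
from itertools import combinations

def vertex_covers(vertices, edges):
    """Yield every subset of the vertices for which no_cover is false."""
    vs = sorted(vertices)
    for k in range(len(vs) + 1):
        for subset in combinations(vs, k):
            chosen = set(subset)
            # no_cover holds if some edge has neither endpoint selected.
            no_cover = any(x not in chosen and y not in chosen for x, y in edges)
            if not no_cover:
                yield chosen

def min_vertex_covers(vertices, edges):
    """Keep only the feasible answers of minimum cardinality."""
    covers = list(vertex_covers(vertices, edges))
    best = min(len(c) for c in covers)
    return [c for c in covers if len(c) == best]
```

On the path 1 - 2 - 3 the only minimum answer is {2}; on a triangle there are three minimum answers, which shows that the optimization query is still multivalued.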
The goal min(v'(X)) further restricts the set of suitable stable models to those for which the subset of nodes v' is minimum. The advantage of expressing NP search and NP optimization problems by using rules with built-in predicates rather than standard DATALOG¬ rules is that the use of built-in atoms preserves simplicity and intuition in expressing problems and permits query optimization. The language is ‘modular’ in the sense that the desired level of expressivity is achieved by enabling the constructs for non-stratified negation only when needed; in particular, if no constraint goal and no min/max goal are used, then polynomial-time computation is guaranteed. The paper is organized as follows. In Section 2 we introduce search and optimization queries and provide a formal ground for their classification using results from complexity theory on multivalued functions. In Section 3 we prove that NP search queries coincide with DATALOG¬ queries under nondeterministic total stable model semantics. We also introduce the min/max goal to capture NP optimization queries. In order to capture meaningful subclasses of NP search and optimization queries, in Section 4 we then present our language, called DATALOG¬s,c, and we show its ability to express tractable NP search problems. We also prove that optimization problems can be hard also when associated to


tractable search problems. This explains the renewed attention [26,25,19,20,6,7] towards optimization problems, mainly with the aim of characterizing classes of problems that are constant or log approximable (i.e., there is a polynomial-time algorithm that approximates the optimum value of the problem within a factor that is respectively constant or logarithmic in the size of the input). In Section 5 we introduce suitable restrictions to DATALOG¬s,c in order to capture NP optimization subclasses that are approximable and present meaningful examples. We draw conclusions and discuss further work in Section 6.

2 Search and Optimization Queries

We assume that the reader is familiar with the basic terminology and notation of relational databases and of database queries [3,18,32]. A relational database scheme DB over a fixed countable domain U is a set of relation symbols {r1, ..., rk} where each ri has a given arity, denoted by |ri|. A database D on DB is a finite structure (A, R1, ..., Rk) where A ⊆ U is the active domain and Ri ⊆ A^|ri| are the (finite) relations of the database, one for each relation scheme ri; we denote A by U(D) and Ri by D(ri). We assume that a database is suitably encoded by a string and that the recognition of whether a string represents a database on DB is done in polynomial time.

Definition 1. Given a database scheme DB and an additional relation symbol f (the query goal), a search query NQ = ⟨DB, f⟩ is a (possibly partial) multivalued recursive function which maps every database D on DB to a finite, non-empty set of finite (possibly empty) relations F ⊆ U(D)^|f| and is invariant under an isomorphism on U − W, where W is any finite subset of U (i.e., the function is W-generic). Thus NQ(D) yields a set of relations on the goal, which are the answers of the query; the query has no answer if this set is empty or the function is not defined on D. □

The class of all search queries is denoted by NQ. In classifying query classes, we shall refer to the following complexity classes of languages: the class P (languages that are recognized by deterministic Turing machines in polynomial time), the class NP (languages that are recognized by nondeterministic Turing machines in polynomial time), and the class coNP (the complement of NP); the reader can refer to [10,17,24] for excellent sources of information on this subject.
As search queries correspond to functions rather than to languages, as is instead the case for boolean queries, we next introduce, for their classification, some background on the complexity of functions (for a more comprehensive description of this topic we refer readers to [30,31,9]). Let a finite alphabet Σ with at least two elements be given. A partial multivalued (MV) function f : Σ* → Σ* associates zero, one or several outcomes (outputs) to each input string. Let f(x) stand for the set of possible results of f on an input string x; thus, we write y ∈ f(x) if y is a value of f on the input string x. Define dom(f) = {x | ∃y(y ∈ f(x))} and graph(f) = {⟨x, y⟩ | x ∈


dom(f), y ∈ f(x)}. If x ∉ dom(f), we will say that f is undefined at x. It is now clear that a search query is indeed a computable MV function: the input x is a suitable encoding of a database D and each output string y encodes an answer of the query. A computable (i.e., partial recursive) MV function f is computed by some Turing transducer, i.e., a (deterministic or not) Turing machine T which, in addition to accepting any string x ∈ dom(f), writes a string y ∈ f(x) on an output tape before entering the accepting state. So, if x ∈ dom(f), the set of all strings that are written in all accepting computations is f(x); on the other hand, if x ∉ dom(f), T never enters the accepting state. Given two MV functions f and g, define g to be a refinement of f if dom(g) = dom(f) and graph(g) ⊆ graph(f). Moreover, given a class G of MV functions, we say that f ∈_c G if G contains a refinement of f. For a class of MV functions F, define F ⊆_c G if, for all f ∈ F, f ∈_c G. Since we are in general interested in finding any output of a MV function, an important practical question is whether an output can be efficiently computed by means of a polynomial-time, single-valued function. In other terms, desirable MV function classes are those which are refined by PF, where PF is the class of all functions that are computed by deterministic polynomial-time transducers. Let us now recall some important classes of MV functions. A MV function f is polynomially balanced if, for each x, the size of each result in f(x) is polynomially bounded in the size of x. The class NPMV is defined as the set of all MV functions f such that (i) f is polynomially balanced, and (ii) graph(f) is in NP. By analogy, the classes NPMV_g and coNPMV are defined as the classes of all polynomially balanced multivalued functions f for which graph(f) is respectively in P and in coNP. Observe that NPMV consists of all MV functions that are computed by nondeterministic transducers in polynomial time [30].

Definition 2. 1.
NQPMV (resp., NQPMV_g and coNQPMV) is the class of all search queries which are in NPMV (resp., NPMV_g and coNPMV); we shall also call the queries in this class NP search queries;
2. NQPTIME is the class of all queries that are computed by a nondeterministic polynomial-time transducer for which every computation path ends in an accepting state;
3. NQPTIME_g is equal to NQPTIME ∩ NQPMV_g. □

Observe that a query NQ = ⟨DB, f⟩ is in NQPMV (resp., NQPMV_g and coNQPMV) if and only if for each database D on DB and for each relation F on f, deciding whether F is in NQ(D) is in NP (resp., in P and in coNP). We stress that NQPMV is different from the class NQPTIME first introduced in [1,2]: in fact, the latter class consists of all queries in NQPMV for which acceptance is guaranteed no matter which nondeterministic moves are guessed by the transducer.


We next present some results on whether the above query classes can be refined by PF, thus whether a query answer in these classes can be computed in deterministic polynomial time; the results have been proven in [21].

Fact 1 [21]
1. NQPMV_g ⊆ (NQPMV ∩ coNQPMV) and the inclusion is strict unless P = NP;
2. neither NQPMV ⊆ coNQPMV nor coNQPMV ⊆ NQPMV unless NP = coNP;
3. NQPTIME ⊂ NQPMV, NQPTIME ⊄ coNQPMV unless NP = coNP, and NQPTIME_g ⊆ NQPTIME and the inclusion is strict unless P = NP;
4. NQPTIME ⊆_c PF and NQPMV_g ⊄_c PF unless P = NP. □

It turns out that queries in NQPTIME and NQPTIME_g can be efficiently computed whereas queries in the other classes may not. Observe that queries in NQPTIME have a strange anomaly: computing an answer can be done in polynomial time, but testing whether a given relation is an answer cannot (unless P = NP). This anomaly does not occur in the class NQPTIME_g which, therefore, turns out to be very desirable.

Example 1. Let a database scheme DB_G = {v, e} represent a directed graph G = (V, E) such that v has arity 1 and defines the nodes while e has arity 2 and defines the edges. We recall that a kernel is a subset V' of V such that (i) no two nodes in V' are joined by an edge and (ii) for each node x not in V', there is a node y in V' for which (y, x) ∈ E.
– NQ_Kernel is the query which returns the kernels of the input graph G; if the graph has no kernel then the query is not defined. The query is in NQPMV_g, but an answer cannot be computed in polynomial time unless P = NP since deciding whether a graph has a kernel is NP-complete [10].
– NQ_SubKernel is the query that, given an input graph G, returns any subset of some kernel of G. This query is in NQPMV, but neither in NQPMV_g (unless P = NP) nor in coNQPMV (unless NP = coNP).
– NQ_NodeNoK is the query that, given an input graph G, returns a node not belonging to any kernel of G. This query is in coNQPMV, but not in NQPMV (unless NP = coNP).
– NQ_01K is the query that, given a graph G, returns the relation {0} if G has no kernel, {1} if every subset of nodes of G is a kernel, and both relations {0} and {1} otherwise. Clearly, the query is in NQPTIME: indeed it is easy to construct a nondeterministic polynomial-time transducer which first nondeterministically generates any subset of nodes of G and then outputs {1} or {0} according to whether this subset is a kernel or not. The query is not in NQPTIME_g, otherwise we could check in polynomial time whether a graph has a kernel (as the graph has a kernel iff {1} is a result of NQ_01K) and, therefore, P would coincide with NP.
– NQ_CUT is the query which returns a subset E' of the edges such that the graph G' = (V, E') is 2-colorable. The query is in NQPTIME_g. □
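The kernel conditions and the transducer described for NQ_01K can be made concrete with a small Python sketch. It is a minimal illustration, assuming that the graph is given as a collection of vertices and a set of directed edge pairs; the names `is_kernel`, `nq_kernel` and `nq_01k` are hypothetical. The nondeterministic guess of the transducer is simulated by iterating over all subsets, so the simulation itself is exponential even though each single branch runs in polynomial time.

```python
from itertools import combinations

def is_kernel(vertices, edges, subset):
    """Check conditions (i) and (ii) of the kernel definition on a directed graph."""
    s = set(subset)
    # (i) no two nodes of the subset are joined by an edge
    independent = not any(x in s and y in s for x, y in edges)
    # (ii) every node outside the subset has an incoming edge from the subset
    dominated = all(any((y, x) in edges for y in s) for x in vertices if x not in s)
    return independent and dominated

def all_subsets(vertices):
    vs = sorted(vertices)
    for k in range(len(vs) + 1):
        for c in combinations(vs, k):
            yield set(c)

def nq_kernel(vertices, edges):
    """All answers of the kernel query; an empty list means the query is undefined."""
    return [s for s in all_subsets(vertices) if is_kernel(vertices, edges, s)]

def nq_01k(vertices, edges):
    """Outputs collected over every branch: 1 if the guessed subset is a kernel, else 0."""
    return {1 if is_kernel(vertices, edges, s) else 0 for s in all_subsets(vertices)}
```

For the single edge 1 → 2 the only kernel is {1} and both outputs 0 and 1 occur; for the directed cycle 1 → 2 → 3 → 1 there is no kernel and only 0 is produced, which matches the observation that deciding whether 1 is an answer amounts to deciding whether a kernel exists.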


According to Fagin’s well-known result [8], a class of finite structures is NP-recognizable iff it is definable by a second order existential formula, thus queries in NQPMV may be expressed as follows.

Fact 2. Let NQ = ⟨DB, f⟩ be a search query in NQPMV; then there is a sequence S of relation symbols s1, ..., sk, distinct from those in DB ∪ {f}, and a closed first-order formula φ(DB, f, S) such that for each database D on DB, NQ(D) = { F : F ⊆ U(D)^|f|, Si ⊆ U(D)^|si| (1 ≤ i ≤ k), and φ(D, F, S) is true }. □

From now on, we shall formulate a query in NQPMV as NQ = { f : (DB, f, S) |= φ(DB, f, S) }.

Example 2. CUT. The query NQ_CUT of Example 1 can be defined as follows:

{ e' : (DB_G, e', s) |= (∀x, y)[ e'(x, y) → ( (e(x, y) ∧ s(x) ∧ ¬s(y)) ∨ (e(x, y) ∧ ¬s(x) ∧ s(y)) ) ] }. □
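The existential second-order reading of Example 2 can be checked by brute force on small inputs. In the following Python sketch (a hedged illustration; the function name `is_cut_answer` and the set-of-pairs encoding are assumptions made here), a candidate relation e' is accepted if some unary relation s makes the first-order formula true, that is, every tuple of e' is an edge whose endpoints are separated by s.

```python
from itertools import product

def is_cut_answer(vertices, edges, e_prime):
    """True iff some unary relation s satisfies the formula of Example 2 for e_prime."""
    vs = sorted(vertices)
    for bits in product([False, True], repeat=len(vs)):
        s = {v for v, b in zip(vs, bits) if b}      # guess of the relation s
        # Every (x, y) in e' must be an edge with exactly one endpoint in s.
        if all((x, y) in edges and ((x in s) != (y in s)) for x, y in e_prime):
            return True
    return False
```

On a triangle, any two edges form an answer while the full edge set does not, since an odd cycle is not 2-colorable.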

Example 3. KERNEL. The query NQ_Kernel of Example 1 can be defined as:

{ v' : (DB_G, v') |= (∀x)[ (v'(x) ∧ ∀y(¬v'(y) ∨ ¬e(x, y))) ∨ (¬v'(x) ∧ ∃y(v'(y) ∧ e(y, x))) ] } □

Definition 3. Given a search query NQ = ⟨DB, f⟩, an optimization query OQ = opt(NQ) = ⟨DB, opt(f)⟩, where opt is either max or min, is a search query refining NQ such that for each database D on DB for which NQ is defined, OQ(D) = opt_{|F|} {F : F ∈ NQ(D)}, i.e., OQ(D) consists of the answers in NQ(D) with the maximum or minimum (resp., if opt = max or min) cardinality. The query NQ is called the search query associated to OQ and the relations in NQ(D) are the feasible solutions of OQ. The class of all optimization queries is denoted by OPT NQ. Given a search class QC, the class of all queries whose search queries are in QC is denoted by OPT QC. The queries in the class OPT NQPMV are called NP optimization queries. □

Proposition 1. Let OQ = ⟨DB, opt_{|f|}⟩ be an optimization query; then the following statements are equivalent:
1. OQ is in OPT NQPMV.
2. There is a closed first-order formula φ(DB, f, S) over relation symbols DB ∪ {f} ∪ S such that OQ = opt_{|f|} {f : (DB, f, S) |= φ(DB, f, S)}.
3. There is a first-order formula φ(w, DB, S), where w is an a(f)-tuple of distinct variables, such that the relation symbols are those in DB ∪ S, the free variables are exactly those in w, and OQ = opt_{|w|} {w : (DB, S) |= φ(w, DB, S)}.


PROOF. The equivalence of statements (1) and (2) is obvious. Clearly, the optimization formulae defined in Item 2 (called feasible in [20]) are a special case of the first order optimization formulae defined in Item 3, which define the class OPT PB of all optimization problems that can be logically defined. Moreover, in [20] it has been shown that the class OPT PB can be expressed by means of feasible optimization first order formulae. □

The above results pinpoint that the class OPT NQPMV corresponds to the class OPT PB of all optimization problems that can be logically defined [19,20]. For simplicity, but without substantial loss of generality, we use as objective function the cardinality rather than a generic polynomial-time computable function. Moreover, we output the relation with the optimal cardinality rather than just the cardinality.

Example 4. MAX-CUT. The problem consists in finding the cardinality of the largest cut in the graph G = (V, E). The query coincides with max(NQ_cut) (see Example 2) and can also be defined as:

max({ (x, y) : (DB_G, s) |= [(e(x, y) ∧ s(x) ∧ ¬s(y)) ∨ (e(x, y) ∧ ¬s(x) ∧ s(y))] }).

The query is an NP maximization query. □
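The formulation of Example 4 can be read as: guess the relation s, collect the tuples (x, y) satisfying the formula, and keep a guess whose collection is largest. The Python sketch below follows that reading by exhaustive enumeration; the function name `max_cut` and the returned pair (s, cut) are assumptions made for illustration, and ties between equally large cuts are broken by enumeration order.

```python
from itertools import product

def max_cut(vertices, edges):
    """Return (s, cut), where cut is a largest set of edges separated by the relation s."""
    vs = sorted(vertices)
    best_s, best_cut = set(), set()
    for bits in product([False, True], repeat=len(vs)):
        s = {v for v, b in zip(vs, bits) if b}
        # Tuples (x, y) that satisfy the first-order formula for this choice of s.
        cut = {(x, y) for x, y in edges if (x in s) != (y in s)}
        if len(cut) > len(best_cut):
            best_s, best_cut = s, cut
    return best_s, best_cut
```

The enumeration visits 2^|V| candidate relations, which is consistent with the hardness of the optimization query even though a single feasible cut is easy to produce.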

Example 5. MIN-KERNEL. In this case we want to find the minimum cardinality of the kernels of a graph G = (V, E). The query is min(NQ_kernel) (see Example 3) and can be equivalently defined as:

min({ w : (DB_G, v') |= v'(w) ∨ ¬(∀x)[ (v'(x) ∧ ∀y(¬v'(y) ∨ ¬e(x, y))) ∨ (¬v'(x) ∧ ∃y(v'(y) ∧ e(y, x))) ] })

This query is an NP minimization query. □

Finally, note that the query max(NQ_Kernel) equals the query max(NQ_SubKernel) although their search queries are distinct. The following results show that in general optimization queries are much harder than search queries, e.g., they cannot be solved in polynomial time even when the associated query is in NQPTIME_g.

Proposition 2.
1. neither OPT NQPMV ⊆ coNQPMV nor OPT NQPMV ⊆ NQPMV unless NP = coNP;
2. neither OPT coNQPMV ⊆ coNQPMV nor OPT coNQPMV ⊆ NQPMV unless NP = coNP;
3. OPT NQPMV_g ⊂ coNQPMV and OPT NQPMV_g ⊄ NQPMV unless NP = coNP;
4. neither OPT NQPTIME ⊆ coNQPMV nor OPT NQPTIME ⊆ NQPMV unless NP = coNP;
5. OPT NQPTIME_g ⊂ coNQPMV and OPT NQPTIME_g ⊄ NQPMV_g unless P = NP.


PROOF.
1. Let max Q be a query in MAX NQPMV (the same argument would hold also for a minimization query). Then, given a database D, to decide whether a relation f is an answer of max Q(D), first we have to test whether f is an answer of Q(D) and, then, we must verify that there is no other answer of Q(D) with more tuples than f. As the former test is in NP and the latter test is in coNP, it is easy to see that deciding whether f is an answer of max Q(D) is neither in NP nor in coNP unless NP = coNP; indeed it is in the class DP [24].
2. Let us now assume that the query in the proof of part (1) is in MAX coNQPMV. Then testing whether f is an answer of Q(D) is in coNP whereas verifying that there is no other answer of Q(D) with more tuples than f is in coNP^NP, that is a class at the second level of the polynomial hierarchy [24].
3. Suppose now that the query in the proof of part (1) is in MAX NQPMV_g. Then testing whether f is an answer of Q(D) is in P whereas verifying that there is no other answer of Q(D) with more tuples than f is in coNP.
4. Take any query max Q in MAX NQPMV. We construct the query Q' by setting Q'(D) = Q(D) ∪ {∅} for each D. Then Q' is in NQPTIME as the transducer for Q can now accept on every branch by eventually returning the empty relation. It is now easy to see that the complexity of finding the maximum answer for Q' is in general the same as that of finding the maximum answer for Q. So the results follow from part (1).
5. OPT NQPTIME_g ⊂ coNQPMV follows from part (3) as NQPTIME_g ⊂ NQPMV_g by definition. Consider now the query Q returning a maximal clique (i.e., a clique which is not contained in another one) of an undirected graph. Q is obviously in NQPTIME_g as a maximal clique can be constructed by selecting any node and adding additional nodes as long as the clique property is preserved. We have that max Q is the query returning the maximum clique in a graph (i.e., the maximal clique with the maximum number of nodes), which is known to be NP-hard. □
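The contrast drawn in part (5) between a maximal clique and a maximum clique can be illustrated with a short Python sketch. The adjacency-dictionary encoding and the function names are assumptions made here; the greedy routine is one straightforward way to realize the construction described in the proof, and the exhaustive routine is a plain brute-force search given only for comparison on tiny graphs.

```python
from itertools import combinations

def is_clique(adj, nodes):
    return all(v in adj[u] for u, v in combinations(nodes, 2))

def maximal_clique(adj, order):
    """Greedy: scan nodes in the given order and keep each one that preserves the clique."""
    clique = []
    for v in order:
        if all(v in adj[u] for u in clique):
            clique.append(v)
    return set(clique)

def maximum_clique(adj):
    """Brute force over subsets, largest first (exponential time)."""
    nodes = sorted(adj)
    for k in range(len(nodes), 0, -1):
        for c in combinations(nodes, k):
            if is_clique(adj, c):
                return set(c)
    return set()
```

With the triangle 1, 2, 3 plus the pendant edge 3 - 4, scanning in the order 4, 3, 2, 1 yields the maximal clique {3, 4}, which cannot be extended but is smaller than the maximum clique {1, 2, 3}.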

3 Search and Optimization Queries in DATALOG

We assume that the reader is familiar with basic notions of logic programming and DATALOG¬ [3,22,32]. A program P is a finite set of rules r of the form H(r) ← B(r), where H(r) is an atom (head of the rule) and B(r) is a conjunction of literals (body of the rule). A rule with empty body is called a fact. The ground instantiation of P is denoted by ground(P); the Herbrand universe and the Herbrand base of P are denoted by U_P and B_P, respectively. An interpretation I ⊆ B_P is a T-stable (total stable) model [11] if I = T^∞_{pos(P,I)}(∅), where T is the classical immediate consequence transformation and pos(P, I) denotes the positive logic program that is obtained from ground(P)


by (i) removing all rules r such that there exists a negative literal ¬A in B(r) and A is in I, and (ii) removing all negative literals from the remaining rules. It is well known that a program may have n T-stable models with n ≥ 0. Given a program P and two predicate symbols p and q, we write p → q if there exists a rule where q occurs in the head and p in the body or there exists a predicate s such that p → s and s → q. A program is stratified if there exists no rule where a predicate p occurs in a negative literal in the body, q occurs in the head and q → p, i.e., there is no recursion through negation [5]. Stratified programs have a unique stable model which coincides with the stratified model, obtained by partitioning the program into an ordered number of suitable subprograms (called ‘strata’) and computing the fixpoints of every stratum in their order [5]. A DATALOG¬ program is a logic program with negation in the rule bodies, but without function symbols. Predicate symbols can be either extensional (i.e., defined by the facts of a database: EDB predicate symbols) or intensional (i.e., defined by the rules of the program: IDB predicate symbols). The class of all DATALOG¬ programs is simply called DATALOG¬; the subclass of all positive (resp. stratified) programs is called DATALOG (resp. DATALOG¬s). A DATALOG¬ program P has associated a relational database scheme DB_P, which consists of all EDB predicate symbols of P. We assume that possible constants in P are taken from the same domain U of DB_P. Given a database D on DB_P, the tuples of D are seen as facts added to P; so P on D yields the following logic program P_D = P ∪ {q(t). : q ∈ DB_P ∧ t ∈ D(q)}. Given a T-stable model M of P_D and a relation symbol r in P_D, M(r) denotes the relation {t : r(t) ∈ M}.

Definition 4.
A DATALOG¬ search query ⟨P, f⟩, where P is a DATALOG¬ program and f is an IDB predicate symbol of P, defines the query NQ = ⟨DB_P, f⟩ such that for each D on DB_P, NQ(D) = {M(f) : M is a T-stable model of P_D}. The sets of all DATALOG¬, DATALOG and DATALOG¬s search queries are denoted respectively by search(DATALOG¬), search(DATALOG) and search(DATALOG¬s). The DATALOG¬ optimization query ⟨P, opt(f)⟩ defines the optimization query opt(NQ). The sets of all DATALOG¬, DATALOG and DATALOG¬s optimization queries are denoted respectively by opt(DATALOG¬), opt(DATALOG) and opt(DATALOG¬s). □

Observe that, given a database D, if the program P_D has no stable models then both search and optimization queries are not defined on D.

Proposition 3.
1. search(DATALOG¬) = NQPMV and opt(DATALOG¬) = OPT NQPMV;
2. search(DATALOG) ⊂ search(DATALOG¬s) ⊂ NQPTIME_g.

PROOF. In [28] it has been shown that a database query NQ is defined by a query in search(DATALOG¬) if and only if, for each input database, the answers of NQ are NP-recognizable. Hence search(DATALOG¬) = NQPMV and opt(DATALOG¬) = OPT NQPMV. Concerning part (2), observe that queries in search(DATALOG¬s) are a proper subset of deterministic polynomial-time queries

Search and Optimization Problems in Datalog


[5] and then search(DATALOG¬s) ⊂ NQPTIME_g. Finally, the relationship search(DATALOG) ⊂ search(DATALOG¬s) is well known in the literature [3]. □

Note that search(DATALOG) = opt(DATALOG) and search(DATALOG¬s) = opt(DATALOG¬s) as the queries are deterministic.

Example 6. Take the queries NQ_cut and max(NQ_cut) of Examples 2 and 4, respectively. Consider the following DATALOG¬ program P_cut:

v'(X) ← v(X), ¬v̂'(X).
v̂'(X) ← v(X), ¬v'(X).
e'(X, Y) ← e(X, Y), v'(X), ¬v'(Y).
e'(X, Y) ← e(X, Y), ¬v'(X), v'(Y).

We have that NQ_cut = ⟨P_cut, e'⟩ and max(NQ_cut) = ⟨P_cut, max(e')⟩. □
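To make the stable-model reading of Example 6 concrete, the following is a minimal brute-force sketch, not an implementation used by the authors. It grounds the cut program over a small database, tests every candidate interpretation with the Gelfond-Lifschitz reduct, and collects the relation e' from each total stable model. The names `stable_models`, `ground_pcut` and `cut_answer`, and the spelling `v^` for the hatted predicate, are assumptions made only for this illustration; the enumeration is exponential and meant for tiny inputs.

```python
from itertools import chain, combinations

def least_model(positive_rules):
    """Least model of a ground negation-free program (naive fixpoint)."""
    model, changed = set(), True
    while changed:
        changed = False
        for head, pos in positive_rules:
            if head not in model and all(a in model for a in pos):
                model.add(head)
                changed = True
    return model

def is_stable(rules, candidate):
    """Reduct test: drop rules blocked by the candidate, drop negative
    literals from the rest, and compare the least model with the candidate."""
    reduct = [(h, pos) for h, pos, neg in rules
              if not any(a in candidate for a in neg)]
    return least_model(reduct) == candidate

def stable_models(rules):
    """Enumerate total stable models by brute force over sets of rule heads."""
    heads = sorted({h for h, _, _ in rules})
    subsets = chain.from_iterable(
        combinations(heads, k) for k in range(len(heads) + 1))
    return [set(s) for s in subsets if is_stable(rules, set(s))]

def ground_pcut(nodes, edges):
    """Ground instance of the cut program; rules are (head, positive, negative)."""
    rules = [(("v", x), [], []) for x in nodes]
    rules += [(("e", x, y), [], []) for x, y in edges]
    for x in nodes:
        rules.append((("v'", x), [("v", x)], [("v^", x)]))
        rules.append((("v^", x), [("v", x)], [("v'", x)]))
    for x, y in edges:
        rules.append((("e'", x, y), [("e", x, y), ("v'", x)], [("v'", y)]))
        rules.append((("e'", x, y), [("e", x, y), ("v'", y)], [("v'", x)]))
    return rules

def cut_answer(model):
    """The relation M(e') extracted from a stable model M."""
    return frozenset((a[1], a[2]) for a in model if a[0] == "e'")

if __name__ == "__main__":
    models = stable_models(ground_pcut(["a", "b", "c"], [("a", "b"), ("b", "c")]))
    answers = {cut_answer(m) for m in models}   # the search query
    print(max(len(a) for a in answers))         # the max query
```

Each stable model corresponds to one choice of v', so the search query returns one cut per subset of nodes and the optimization query keeps the largest.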

Example 7. Take the queries NQ_kernel and min(NQ_kernel) of Examples 3 and 5. Consider the following DATALOG¬ program P_kernel:

v'(X) ← v(X), ¬v̂'(X).
v̂'(X) ← v(X), ¬v'(X).
joined_to_v'(X) ← v'(Y), e(Y, X).
no_kernel ← v'(X), joined_to_v'(X).
no_kernel ← v̂'(X), ¬joined_to_v'(X).
constraint ← no_kernel, ¬constraint.

We have that NQ_kernel = ⟨P_kernel, v'⟩ and min(NQ_kernel) = ⟨P_kernel, min(v')⟩. Observe that P_kernel has no T-stable model iff NQ_kernel is not defined on D (i.e., there is no kernel). □

The problem in using DATALOG¬ to express search and optimization problems is that the usage of unrestricted negation in programs is often neither simple nor intuitive and, besides, it does not allow one to discipline the expressive power (e.g., the classes NQPTIME and NQPTIME_g are not captured). This situation might lead one to write queries that have no total stable models or whose computation is hard even though the problem is not. On the other hand, as pointed out in Proposition 3, if we just use DATALOG¬s the expressive power is too low, so that we cannot express simple polynomial-time problems. For instance, the query asking for a spanning tree of an undirected graph needs the use of a program with unstratified negation such as:

(1) reached(a).
(2) reached(Y) ← spanTree(X, Y).
(3) spanTree(X, Y) ← reached(X), e(X, Y), Y ≠ a, ¬diffChoice(X, Y).
(4) diffChoice(X, Y) ← spanTree(Z, Y), Z ≠ X.

But the freedom in the usage of negation may result in meaningless programs. For instance, in the above program, in an attempt to simplify it, one could decide to modify the third rule into


Sergio Greco and Domenico Saccà

(3') spanTree(X, Y) ← reached(X), e(X, Y), Y ≠ a, ¬reached(Y).

and remove the fourth rule. Then the resulting program will have no total stable models, thus losing its practical meaning. Of course the risk of writing meaningless programs is present in any language, but this risk is much higher in a language with non-intuitive semantics, such as that of unstratified negation.

In the next section we propose a language where the usage of stable model semantics is disciplined to avoid both undefinedness and unnecessary computational complexity, and to refrain from abstruse forms of unstratified negation. The core of the language is stratified DATALOG extended with only one type of non-stratified negation, hardwired into two ad-hoc constructs. The disciplined structure of negation in our language will enable us to capture interesting subclasses of NQPMV.
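The stratification condition recalled earlier in this section (no rule with head q and a negated body predicate p such that q → p) can be checked mechanically. The sketch below works at the level of predicate symbols only; the rule encoding as `(head, [(predicate, is_positive), ...])` and the function names are assumptions made for this illustration.

```python
def depends(rules):
    """Transitive closure of p -> q, where p occurs in a body and q in the head."""
    reach = {(p, head) for head, body in rules for p, _ in body}
    changed = True
    while changed:
        changed = False
        for p, s in list(reach):
            for s2, q in list(reach):
                if s == s2 and (p, q) not in reach:
                    reach.add((p, q))
                    changed = True
    return reach

def is_stratified(rules):
    """True iff there is no recursion through negation."""
    reach = depends(rules)
    for head, body in rules:
        for pred, positive in body:
            # A negated body predicate that depends on the head breaks stratification.
            if not positive and (head, pred) in reach:
                return False
    return True

# Predicate-level view of the spanning-tree program (rules 1 to 4).
span_tree = [
    ("reached", []),
    ("reached", [("spanTree", True)]),
    ("spanTree", [("reached", True), ("e", True), ("diffChoice", False)]),
    ("diffChoice", [("spanTree", True)]),
]

# Variant with the third rule replaced and the fourth removed.
span_tree_modified = [
    ("reached", []),
    ("reached", [("spanTree", True)]),
    ("spanTree", [("reached", True), ("e", True), ("reached", False)]),
]
```

Both programs are reported as unstratified, which is consistent with the text: the check separates stratified from unstratified programs, but it cannot tell a meaningful use of unstratified negation from a meaningless one.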

4 Datalog Languages for Search and Optimization Problems

In this section we analyze the expressive power of several languages derived from DATALOG¬ by restricting the use of negation. In particular, we consider the combination of stratified negation, a nondeterministic construct called choice, and subset rules, which compute subsets of tuples of a given relation. The choice construct is supported by several deductive database systems such as LDL++ [33] and Coral [27], and it is used to enforce functional constraints on rules of a logic program. Thus, a goal of the form choice((X), (Y)) in a rule r denotes that the set of all consequences derived from r must respect the FD X → Y. In general, X can be a vector of variables (possibly an empty one, denoted by "( )") and Y is a vector of one or more variables. As shown in [29], the formal semantics of the construct can be given in terms of stable model semantics. For instance, a rule r of the form

r : p(X, Y, W) ← q(X, Y, Z, W), choice((X), (Y)), choice((Y), (X)).

expressing that for any stable model M, the ground instantiation of r w.r.t. M must satisfy the FDs X → Y and Y → X, is rewritten into the following standard rules

r1 : p(X, Y, W) ← q(X, Y, Z, W), chosen(X, Y, Z).
r2 : chosen(X, Y, Z) ← q(X, Y, Z, W), ¬diffchoice(X, Y, Z).
r3 : diffchoice(X, Y, Z) ← chosen(X, Y', Z'), Y ≠ Y'.
r4 : diffchoice(X, Y, Z) ← chosen(X', Y, Z'), Z ≠ Z'.

where the choice predicates have been substituted by the chosen predicate and for each choice predicate there is a diffchoice rule. The rule r will be called the choice rule, the rule r1 the modified rule, the rule r2 the chosen rule, and the rules r3 and r4 the diffchoice rules. Given a DATALOG¬ program P with choice constructs, we denote by sv(P) the program obtained by rewriting the choice rules as above; sv(P) is called the standard version of P.
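As an illustration of the chosen/diffchoice rewriting, the sketch below handles the simplest case only: a single goal choice((X), (Y)) over a binary relation. It enumerates the sets of chosen tuples that pick exactly one Y per X, and separately checks the fixpoint condition imposed by the chosen and diffchoice rules. The function names and the restriction to one FD are assumptions made for this example.

```python
from itertools import product

def choice_models(pairs):
    """All sets of chosen tuples for choice((X), (Y)): one Y for every X."""
    by_x = {}
    for x, y in sorted(pairs):
        by_x.setdefault(x, []).append(y)
    xs = sorted(by_x)
    return [set(zip(xs, ys)) for ys in product(*(by_x[x] for x in xs))]

def satisfies_chosen_rules(pairs, chosen):
    """Check the condition encoded by the rewritten rules:
    diffchoice(X, Y) holds if some chosen(X, Y') has Y' != Y, and
    chosen(X, Y) holds exactly for the tuples without a diffchoice."""
    diffchoice = {(x, y) for (x, y) in pairs
                  if any(x2 == x and y2 != y for (x2, y2) in chosen)}
    return chosen == {t for t in pairs if t not in diffchoice}

if __name__ == "__main__":
    relation = {("s1", "a"), ("s1", "b"), ("s2", "a")}
    for model in choice_models(relation):
        print(sorted(model), satisfies_chosen_rules(relation, model))
```

The fixpoint check makes the role of negation visible: a set of chosen tuples is accepted only if it satisfies the FD and cannot be extended without violating it.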


In general, the program sv(P) generated by the transformation discussed above has the following properties [29,13]: 1) if P is in DATALOG or in DATALOG¬s then sv(P) has one or more total stable models, and 2) the chosen atoms in each stable model of sv(P) obey the FDs defined by the choice goals. The stable models of sv(P) are called choice models for P. The set of functional dependencies defined by choice atoms on the instances of a rule r (resp., program P) will be denoted FD_r (resp., FD_P).

A subset rule is of the form

s(X) ⊆ A1, ..., An.

where s is an IDB predicate symbol not defined elsewhere in the program (subset predicate symbol) and all literals A1, ..., An in the body are EDB. The rule selects any subset of the relation that is derived from the body. The formal semantics of the rule is given by rewriting it into the following set of normal DATALOG¬ rules

s(X) ← A1, ..., An, ¬ŝ(X).
ŝ(X) ← A1, ..., An, ¬s(X).

where ŝ is a new IDB predicate symbol with the same arity as s. Observe that the semantics of a subset rule can also be given in terms of choice as follows:

label(1).
label(2).
ŝ(L, X) ← A1, ..., An, label(L), choice((X), (L)).
s(X) ← ŝ(1, X).

It turns out that subset rules are not necessary in our language, but we keep them in order to simplify the formulation of optimization queries. In the following we shall denote by DATALOG¬s,c the language DATALOG¬s with choice and subset rules. More formally we say:

Definition 5. A DATALOG¬ program P with choice and subset rules is in DATALOG¬s,c if P' is stratified, where P' is obtained from sv(P'') by removing diffchoice rules and diffchoice atoms, and P'' is obtained from P by rewriting subset rules in terms of choice constructs. Search and optimization queries are denoted by search(DATALOG¬s,c) and opt(DATALOG¬s,c), respectively. Moreover, search(DATALOG¬s,c)_g denotes the class of queries NQ = ⟨P, f⟩ such that f is a relation defined by choice or subset rules and such rules are not defined in terms of other choice or subset rules; the corresponding optimization class is opt(DATALOG¬s,c)_g. □

Proposition 4.
1. search(DATALOG¬s,c) = NQPTIME and opt(DATALOG¬s,c) = OPT NQPTIME;
2. search(DATALOG¬s,c)_g = NQPTIME_g and opt(DATALOG¬s,c)_g = OPT NQPTIME_g.


PROOF. The fact that search(DATALOG¬s,c) = NQPTIME has been proven in many places, e.g., in [13,21,12]. Observe now that, given any query Q in search(DATALOG¬s,c)_g, Q ∈ NQPTIME as Q is also in search(DATALOG¬s,c). Moreover, for each D and for each answer of Q(D), the non-deterministic choices that are issued while executing the logic program are kept in the answer; thus every answer contains a certificate of its recognition and, then, recognition is in P. Hence also Q ∈ NQPMV_g and, then, Q ∈ NQPTIME_g. To show that every query Q in NQPTIME_g is also in search(DATALOG¬s,c)_g, we use the following characterization of NQPTIME_g [21]: every answer of Q can be constructed starting from the empty relation by adding one tuple at a time after a polynomial-time membership test. This construction can be easily implemented by defining a suitable query in search(DATALOG¬s,c)_g. □

Next we show how to increase the expressive power of the language. We stress that the additional power is added in a controlled fashion, so that a high level of expressivity is automatically enabled only if required by the complexity of the problem at hand.

Definition 6. Let search(DATALOG¬s,c)_! denote the class of queries NQ = ⟨P, !A, f⟩ such that ⟨P, f⟩ is in search(DATALOG¬s,c) and A is a ground literal (the constraint goal); for each D on DB_P, NQ(D) = {M(f) : M is a T-stable model of P_D and either A ∈ M if A is positive, or B ∉ M if A is the negative literal ¬B}. Accordingly, we define opt(DATALOG¬s,c)_!, search(DATALOG¬s,c)_{g,!} and opt(DATALOG¬s,c)_{g,!}. □

Proposition 5.
1. search(DATALOG¬s,c)_! = NQPMV and opt(DATALOG¬s,c)_! = OPT NQPMV;
2. search(DATALOG¬s,c)_{g,!} = NQPMV_g and opt(DATALOG¬s,c)_{g,!} = OPT NQPMV_g.

PROOF. Given any query Q = ⟨P, !A, f⟩ in search(DATALOG¬s,c)_!, Q ∈ NQPMV since for each database D and for each relation F, testing whether F ∈ Q(D) can be done in nondeterministic polynomial time as follows: we guess an interpretation M and, then, we check in deterministic polynomial time whether both M is a stable model and A is true in M. To prove that every query Q in NQPMV can be defined by some query in search(DATALOG¬s,c)_!, we observe that Q can be expressed by a closed first-order formula by Fact 2 and that this formula can be easily translated into a query in search(DATALOG¬s,c)_!. The proof of part (2) follows the lines of the proof of part (2) of Proposition 4. □

Example 8. The program P_cut of Example 6 can be replaced by the following program P'_cut:

v'(X) ⊆ v(X).
e'(X, Y) ← e(X, Y), v'(X), ¬v'(Y).
e'(X, Y) ← e(X, Y), ¬v'(X), v'(Y).

The query ⟨P'_cut, e'⟩ is in search(DATALOG¬s,c)_g and, therefore, the query ⟨P'_cut, max(e')⟩ is in max(DATALOG¬s,c)_g. □


The program of the above example has been derived from the program of Example 6 by replacing the two rules with unstratified negation, which define v', with a subset rule.

Example 9. The program P_kernel of Example 7 can be replaced by the following program P'_kernel:

v'(X) ⊆ v(X).
joined_to_v'(X) ← v'(Y), e(Y, X).
no_kernel ← v'(X), joined_to_v'(X).
no_kernel ← ¬v'(X), ¬joined_to_v'(X).

The query ⟨P'_kernel, !¬no_kernel, v'⟩ is in search(DATALOG¬s,c)_{g,!} and, therefore, the query ⟨P'_kernel, !¬no_kernel, min|v'|⟩ is in min(DATALOG¬s,c)_{g,!}. □

The advantage of using restricted languages is that programs with built-in predicates are more intuitive and it is possible to control the expressive power.
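The combination of a subset rule with a constraint goal in Example 9 can be read operationally as "guess a subset, derive the stratified part, keep the guess only if the constraint goal holds". The sketch below follows that reading by exhaustive enumeration; it is an illustration under that assumption, not an evaluation strategy proposed in the text, and the function names are hypothetical.

```python
from itertools import combinations

def subsets(items):
    """All subsets of a finite relation, as selected by a subset rule."""
    items = sorted(items)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)

def kernel_answers(nodes, edges):
    """Answers of the kernel search query: every guess v' for which
    the constraint goal (not no_kernel) is satisfied."""
    answers = []
    for chosen in subsets(nodes):
        # joined_to_v'(X) <- v'(Y), e(Y, X).
        joined = {x for (y, x) in edges if y in chosen}
        # no_kernel holds if a chosen node is joined to v', or if a node
        # outside v' is not joined to v'.
        no_kernel = any((x in chosen) == (x in joined) for x in nodes)
        if not no_kernel:
            answers.append(chosen)
    return answers

def min_kernel_size(nodes, edges):
    """The min|v'| optimization query; None when the query is undefined."""
    answers = kernel_answers(nodes, edges)
    return min(len(a) for a in answers) if answers else None
```

On a directed path a → b → c the only answer is {a, c}, while a directed cycle of length three has no kernel, so both the search and the optimization query are undefined on it.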

5 Capturing Desirable Subclasses of NP Optimization Problems

We have shown that optimization queries are much harder than associated search queries. Indeed it often happens that the optimization of polynomial-time computable search queries cannot be done in polynomial time. In this section we show how to capture optimization queries for which "approximate" answers can be found in polynomial time.

Let us first recall that, as said in Proposition 1, an NP optimization query opt|NQ| = ⟨DB, opt|f|⟩ corresponds to a problem in the class OPT PB that is defined as

opt|NQ| = opt_S |{w : (DB, S) |= φ(w, DB, S)}|.

In addition to the free variables w, the first order formula φ may also contain quantified variables, so that its general format is of two types:

(∃x1)(∀x2) ... (Qk xk) ψ(w, DB, S, x1, ..., xk), or
(∀x1)(∃x2) ... (Qk xk) ψ(w, DB, S, x1, ..., xk),

where k ≥ 0, Qk is either ∃ or ∀, and ψ is a non-quantified formula. In the first case φ is a Σk formula, while it is a Πk formula in the latter case. (If φ has no quantifiers then it is both a Σ0 and a Π0 formula.) Accordingly, the class of all NP optimization problems for which the formula φ is a Σk (resp., Πk) formula is called OPT Σk (resp., OPT Πk). Kolaitis and Thakur [20] have introduced two hierarchies for the polynomially bounded NP maximization problems and for the polynomially bounded NP minimization problems:

MAX Σ0 ⊂ MAX Σ1 ⊂ MAX Π1 = MAX Σ2 ⊂ MAX Π2 = MAX PB
MIN Σ0 = MIN Σ1 ⊂ MIN Π1 = MIN Σ2 = MIN PB


Observe that the classes MAX Σ0 and MAX Σ1 were first introduced in [26] with the names MAX SNP and MAX NP, respectively, whereas the class MAX Π1 was first introduced in [25]. A number of maximization problems have a desirable property: approximation. In particular, Papadimitriou and Yannakakis have shown that every problem in the class MAX Σ1 is constant-approximable [26]. This is not the case for the complementary class MIN Σ1 or other minimization subclasses: indeed the class MIN Σ0 contains problems which are not log-approximable (unless P = NP) [20]. To single out desirable subclasses for minimization problems, Kolaitis and Thakur introduced a refinement of the hierarchies of NP optimization problems by means of the notion of feasible NP optimization problem, based on the fact that, as pointed out in Proposition 1, an NP optimization query opt|NQ| = ⟨DB, opt|f|⟩ can also be defined as

opt_{f,S} {|f| : (DB, f, S) |= φ(DB, f, S)}.

Therefore, the class of all NP optimization problems for which the above formula φ is a Σk (resp., Πk) formula is called OPT F Σk (resp., OPT F Πk). The following containment relations hold:

MAX Σ0 ⊂ MAX Σ1 ⊂ MAX F Π1 = MAX F Σ2 = MAX Π1 = MAX Σ2 ⊂ MAX F Π2 = MAX Π2 = MAX PB
MAX F Σ1 ⊂ MAX Σ1

MIN Σ0 = MIN Σ1 = MIN F Π1 ⊂ MIN F Σ2 ⊂ MIN Π1 = MIN Σ2 = MIN F Π2 = MIN Π2 = MIN PB
MIN F Σ1 ⊂ MIN F Σ2

Observe that all problems in MAX F Σ1 are constant-approximable since MAX F Σ1 ⊂ MAX Σ1. A further refinement of feasible NP optimization classes can be obtained as follows. A first order formula φ(S) is positive w.r.t. the relation symbol S if all occurrences of S are within an even number of negations. The class of feasible NP minimization problems whose first order part is a positive Πk formula (1 ≤ k ≤ 2) is denoted by MIN F⁺Πk. Particularly relevant is MIN F⁺Π1, as all optimization problems contained in this class are constant-approximable [20].

We next show that it is possible to further discipline DATALOG¬s,c in order to capture most of the above mentioned optimization subclasses. First of all we point out that feasible NP optimization problems can be captured in DATALOG¬s,c,! by restricting to the class opt(DATALOG¬s,c)_g. For instance, the problem expressed by the query of Example 9 is feasible whereas the problem expressed by the query of Example 8 is not feasible.

Let P be a DATALOG¬s,c program, p(y) be an atom and X a set of variables. We say that p(y) is free w.r.t. X (in P) if
1. var(p(y)) ⊆ X, where var(p(y)) is the set of variables occurring in y, and
2. for every r ∈ P such that the head H(r) and p(y) unify, var(B(r)) ⊆ var(H(r)) (i.e., the variables in the body also appear in the head) and for each atom q(w) in B(r), either q is an EDB predicate or q(w) is free w.r.t. var(q(w)).


We denote by opt(DATALOG¬s,c)_∄ the class of all queries ⟨P, opt|f|⟩ in opt(DATALOG¬s,c) such that f(X) is free w.r.t. X, where X is a list of distinct variables. Thus, opt(DATALOG¬s,c)_∄ denotes the class of all queries ⟨P, opt|f|⟩ in opt(DATALOG¬s,c) where all rules used to define (transitively) the predicate f do not have additional variables w.r.t. the head variables. For instance, the query of Example 8 is in opt(DATALOG¬s,c)_∄.

Theorem 1. opt(DATALOG¬s,c)_∄ = OPT Σ0.

PROOF. Let ⟨P, opt|f|⟩ be a query in opt(DATALOG¬s,c)_∄. Consider the rules that define directly or indirectly the goal f and let X be a list of a(f) distinct variables. Since f(X) is free w.r.t. X by hypothesis, it is possible to rewrite the variables in the above rules such that they are a subset of X. It is now easy to show that the query can be written as a quantifier-free first-order formula with the free variables X, i.e., the query is in OPT Σ0. The proof that every query in OPT Σ0 can be formulated as a query in opt(DATALOG¬s,c)_∄ is straightforward. □

It turns out that all queries in max(DATALOG¬s,c)_∄ are constant-approximable.

Example 10. MAX CUT. Consider the program P'_cut of Example 8. The query ⟨P'_cut, max(e')⟩ is in MAX Σ0 since e'(X, Y) is free w.r.t. X, Y. □

Let P be a DATALOG¬s,c program and p(y) be an atom. We say that P is semipositive w.r.t. p(y) if
1. p is an EDB or a subset predicate symbol, or
2. for every r ∈ P defining p, P is semipositive w.r.t. every positive literal in the body B(r), while each negative literal is EDB or subset.

We now denote by opt(DATALOG¬s,c)_+ the class of all queries ⟨P, opt(f)⟩ in opt(DATALOG¬s,c) such that P is semipositive w.r.t. f(X). Thus, opt(DATALOG¬s,c)_+ denotes the class of all queries ⟨P, opt|f|⟩ in opt(DATALOG¬s,c) where negated predicates used to define (transitively) the predicate f are either EDB predicates or subset predicates. For instance, the query of Example 8 is in opt(DATALOG¬s,c)_+. Moreover, since the predicate appearing in the goal is a subset predicate, the query of Example 8 is in opt(DATALOG¬s,c)_{g,+}.

Theorem 2.
1. opt(DATALOG¬s,c)_+ = OPT Σ1,
2. opt(DATALOG¬s,c)_{g,+} = OPT F Σ1.

PROOF. Let ⟨P, opt|f|⟩ be a query in opt(DATALOG¬s,c)_+ and X be a list of a(f) distinct variables. Consider the rules that define directly or indirectly the goal f. Since P is semipositive w.r.t. f(X) by hypothesis, it is possible to rewrite the variables in the above rules such that each of them is either in X or existentially quantified. It is now easy to show that the query can be formulated in the OPT Σ1 format. The proof of part (2) is straightforward. □

Then all queries in both max(DATALOG¬s,c)_+ and max(DATALOG¬s,c)_{g,+} are constant-approximable.


Example 11. MAX SATISFIABILITY. We are given two unary relations c and a such that a fact c(x) denotes that x is a clause and a fact a(v) asserts that v is a variable occurring in some clause. We also have two binary relations p and n such that the facts p(x, v) and n(x, v) say that a variable v occurs in the clause x positively or negatively, respectively. A boolean formula in conjunctive normal form can be represented by means of the relations c, a, p, and n. The maximum number of clauses simultaneously satisfiable under some truth assignment can be expressed by the query ⟨P_sat, max(f)⟩ where P_sat is the following program:

s(X) ⊆ a(X).
f(X) ← c(X), p(X, V), s(V).
f(X) ← c(X), n(X, V), ¬s(V).

Observe that f(X) is not free w.r.t. X (indeed the query is not in MAX Σ0) but P_sat is semipositive w.r.t. f(X), so that the query is in MAX Σ1. Observe now that the query goal f is not a subset predicate: indeed the query is not in MAX F Σ1. □

Let !A be a goal in a query in opt(DATALOG¬s,c)_! on a program P (recall that A is a positive or negative ground literal). Then a (not necessarily ground) atom C has
1. a mark 0 w.r.t. A if C = A;
2. a mark 1 w.r.t. A if C = ¬A;
3. a mark k ≥ 0 w.r.t. A if there exists a rule r' in P and a substitution σ for the variables in C such that either (i) H(r') has mark (k − 1) w.r.t. A and Cσ occurs negated in the body of r', or (ii) H(r') has mark k w.r.t. A and Cσ is a positive literal in the body of r'.

Let us now define the class opt(DATALOG¬s,c)_{!,∄} of all queries ⟨P, !A, opt(f)⟩ in opt(DATALOG¬s,c)_! such that (i) f(X) is free w.r.t. X and (ii) for each atom C that has an even mark w.r.t. A and for every rule r' in P whose head unifies with C, the variables occurring in the body B(r') also occur in the head H(r'). We are finally able to define a subclass which captures OPT F⁺Π1, which is approximable when OPT = MIN. To this end, we define opt(DATALOG¬s,c)_{!,∄,g,+} as the subclass of opt(DATALOG¬s,c)_{!,∄,g} consisting of those queries ⟨P, !A, opt(f)⟩ such that there exists no subset atom s(x) having an odd mark w.r.t. A.

Theorem 3.
1. opt(DATALOG¬s,c)_{!,∄} = OPT Π1;
2. opt(DATALOG¬s,c)_{!,∄,g} = OPT F Π1;
3. opt(DATALOG¬s,c)_{!,∄,g,+} = OPT F⁺Π1.

PROOF. Let ⟨P, !A, opt(f)⟩ be a query in opt(DATALOG¬s,c)_{!,∄}. Consider the rules that define directly or indirectly the goal f and let X be a list of a(f) distinct variables. Since f(X) is free w.r.t. X by hypothesis, it is possible to


rewrite the variables in the above rules such that they are a subset of X. Consider now the rules that define directly or indirectly the goal !A. We can now rewrite the variables in the above rules such that they are universally quantified. It is now easy to show that the query can be written as an existential-free first-order formula with the free variables X and possibly additional variables universally quantified, i.e., the query is in OPT Π1. The proofs of the other relationships are simple. □

Example 12. MAX CLIQUE. In this example we want to find the cardinality of a maximum clique, i.e. a set of nodes V such that for each pair of nodes (x, y) in V there is an edge joining x to y. The maximum clique problem can be expressed by the query ⟨P_clique, !¬no_clique, max(v')⟩ where the program P_clique is as follows:

v'(X) ⊆ v(X).
no_clique ← v'(X), v'(Y), X ≠ Y, ¬e(X, Y).

The query is in the class max(DATALOG¬s,c)_{!,∄,g} and, therefore, the optimization query is in MAX F Π1 (= MAX Π1). On the other hand, both atoms v'(X) and v'(Y) in the body of the rule defining the predicate no_clique have mark 1 (i.e. odd) w.r.t. the "!" goal. Therefore, the query ⟨P_clique, !¬no_clique, max(v')⟩ is not in the class max(DATALOG¬s,c)_{!,∄,g,+}, thus it is not in MAX F⁺Π1. □

Example 13. MIN VERTEX COVER. As discussed in the introduction, the problem can be formulated by the query ⟨P_vc, !¬no_cover, min(v'(X))⟩ where P_vc is the following program:

v'(X) ⊆ v(X).
no_cover ← e(X, Y), ¬v'(X), ¬v'(Y).

Observe that both atoms v'(X) and v'(Y) in the rule defining no_cover have a mark 2 (i.e., even) w.r.t. the "!" goal. Therefore, the query is in min(DATALOG¬s,c)_{!,∄,g,+} and, then, in MIN F⁺Π1; so the problem is constant-approximable. □

Additional interesting subclasses could be captured in our framework, but they are not investigated here. We just give an example of a query which is in the class MIN F⁺Π2(1); this class is a subset of MIN Π2 where every subset predicate symbol occurs positively and at most once in every disjunction of the formula ψ. Problems in this class are log-approximable [20].

Example 14. MIN DOMINATING SET. Let G = (V, E) be a graph. A subset V' of V is a dominating set if every node is either in V' or has a neighbour in V'. The query ⟨P_ds, !¬no_ds, min(v'(X))⟩, where P_ds is the following program, computes the cardinality of a minimum dominating set:

v'(X) ⊆ v(X).
q(X) ← v'(X).
q(X) ← e(X, Y), v'(Y).
no_ds ← v(X), ¬q(X).

This problem belongs to MIN F⁺Π2(1). □
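The marks used in Examples 12 to 14 can be computed by a simple propagation from the constraint goal. The sketch below is a simplification that works on predicate symbols rather than atoms (unification and built-ins such as X ≠ Y are ignored) and tracks only the parity of the marks, which is all the class definitions need. The rule encoding and the name `mark_parities` are assumptions made for this illustration.

```python
def mark_parities(rules, goal_pred, goal_positive):
    """Parities of the marks that each predicate can receive w.r.t. the goal !A.

    rules: list of (head, [(predicate, is_positive), ...]).
    The goal literal A is goal_pred if goal_positive, else its negation.
    """
    # A itself has mark 0; if A is negative, its atom equals "not A" and has mark 1.
    start = (goal_pred, 0 if goal_positive else 1)
    seen, frontier = {start}, [start]
    while frontier:
        head, parity = frontier.pop()
        for rule_head, body in rules:
            if rule_head != head:
                continue
            for pred, positive in body:
                # A positive body literal keeps the mark, a negated one adds 1.
                item = (pred, parity if positive else 1 - parity)
                if item not in seen:
                    seen.add(item)
                    frontier.append(item)
    result = {}
    for pred, parity in seen:
        result.setdefault(pred, set()).add("even" if parity == 0 else "odd")
    return result

clique = [("no_clique", [("v'", True), ("v'", True), ("e", False)])]
vertex_cover = [("no_cover", [("e", True), ("v'", False), ("v'", False)])]
dominating_set = [
    ("no_ds", [("v", True), ("q", False)]),
    ("q", [("v'", True)]),
    ("q", [("e", True), ("v'", True)]),
]

if __name__ == "__main__":
    print(mark_parities(clique, "no_clique", False)["v'"])
    print(mark_parities(vertex_cover, "no_cover", False)["v'"])
    print(mark_parities(dominating_set, "no_ds", False)["v'"])
```

Under this reading the subset predicate v' receives an odd mark in the clique program and only even marks in the vertex cover and dominating set programs, in agreement with the classification given in the examples.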


Observe that the problem min kernel as defined in Example 9 is in the class MIN F Π2, but not in MIN F⁺Π2, as it contains occurrences of the subset predicate v' which have an odd mark w.r.t. the "!" goal.

6 Conclusion

In this paper we have shown that NP search and optimization problems can be formulated as DATALOG¬ queries under non-deterministic total stable model semantics. In order to enable a simpler and more intuitive formulation of such problems, we have also introduced an extension of stratified DATALOG¬ that is able to express all NP search and optimization queries using a disciplined style of programming in which only simple forms of unstratified negation are supported. The core of this language, denoted by DATALOG¬s,c,!, is stratified DATALOG¬ augmented with three types of non-stratified negation which are hardwired into ad-hoc constructs: choice predicate, subset rule and constraint goal. The former two constructs serve to issue non-deterministic selections while constructing one of the possible total stable models, whereas the latter one defines some constraint that must be respected by the stable model in order to be accepted as an intended meaning of the program.

The language DATALOG¬s,c,! has been further refined in order to capture interesting subclasses of NP search queries, some of them computable in polynomial time. As for optimization queries, since in general they are not tractable even when the associated search problems are, we introduced restrictions to our language to single out classes of approximable optimization problems which have been recently introduced in the literature.

Our on-going research follows two directions:
1. efficient implementation schemes for the language, particularly to perform effective subset selections by pushing down constraints and possibly adopting 'intelligent' search strategies; this is particularly useful if one wants to find approximate solutions;
2. further extensions of the language such as (i) adding the possibility to use IDB predicates whenever an EDB predicate is required (provided that IDB definitions are only given by stratified rules), (ii) freezing, on request, nondeterministic selections to enable a "don't care" non-determinism (thus, some selections cannot eventually be retracted because of the constraint goal), and (iii) introducing additional constructs, besides choice and subset rules, to enable nondeterministic selections satisfying predefined constraints that are tested on the fly.

References

1. Abiteboul, S., Simon, E., and Vianu, V., Non-deterministic languages to express deterministic transformations. In Proc. ACM Symp. on Principles of Database Systems, 1990, pp. 218-229.
2. Abiteboul, S., and Vianu, V., Non-determinism in logic-based languages. Annals of Mathematics and Artificial Intelligence 3, 1991, pp. 151-186.
3. Abiteboul, S., Hull, R., and Vianu, V., Foundations of Databases. Addison-Wesley, 1994.
4. Afrati, F., Cosmadakis, S. S., and Yannakakis, M., On Datalog vs. Polynomial Time. Proc. ACM Symp. on Principles of Database Systems, 1991, pp. 13-25.
5. Apt, K., Blair, H., and Walker, A., Towards a theory of declarative knowledge. In Foundations of Deductive Databases and Logic Programming, J. Minker (ed.), Morgan Kaufmann, Los Altos, USA, 1988, pp. 89-142.
6. Ausiello, G., Crescenzi, P., and Protasi, M., Approximate solution of NP optimization problems. Theoretical Computer Science, No. 150, 1995, pp. 1-55.
7. Ausiello, G., Crescenzi, P., Gambosi, G., Kann, V., Marchetti-Spaccamela, A., and Protasi, M., Complexity and Approximation: Combinatorial optimization problems and their approximability properties. Springer-Verlag, 1999.
8. Fagin, R., Generalized First-Order Spectra and Polynomial-Time Recognizable Sets. In Complexity of Computation (R. Karp, Ed.), SIAM-AMS Proc., Vol. 7, 1974, pp. 43-73.
9. Fenner, S., Green, F., Homer, S., Selman, A. L., Thierauf, T., and Vollmer, H., Complements of Multivalued Functions. Chicago Journal of Theoretical Computer Science, 1999.
10. Garey, M., and Johnson, D. S., Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman, New York, USA, 1979.
11. Gelfond, M., and Lifschitz, V., The Stable Model Semantics for Logic Programming. Proc. 5th Int. Conf. on Logic Programming, 1988, pp. 1070-1080.
12. Giannotti, F., Pedreschi, D., and Zaniolo, C., Semantics and Expressive Power of Non-Deterministic Constructs in Deductive Databases. Journal of Computer and System Sciences, 62, 1, 2001, pp. 15-42.
13. Giannotti, F., Pedreschi, D., Saccà, D., and Zaniolo, C., Nondeterminism in Deductive Databases. Proc. 2nd Int. Conf. on Deductive and Object-Oriented Databases, 1991, pp. 129-146.
14. Greco, S., Saccà, D., and Zaniolo, C., Datalog with Stratified Negation and Choice: from P to D^P. Proc. Int. Conf. on Database Theory, 1995, pp. 574-589.
15. Greco, S., and Saccà, D., NP-Optimization Problems in Datalog. Proc. Int. Logic Programming Symp., 1997, pp. 181-195.
16. Greco, S., and Zaniolo, C., Greedy Algorithms in Datalog. Proc. Int. Joint Conf. and Symp. on Logic Programming, 1998, pp. 294-309.
17. Johnson, D. S., A Catalog of Complexity Classes. In Handbook of Theoretical Computer Science, Vol. 1, J. van Leeuwen (ed.), North-Holland, 1990.
18. Kanellakis, P. C., Elements of Relational Database Theory. In Handbook of Theoretical Computer Science, Vol. 2, J. van Leeuwen (ed.), North-Holland, 1991.
19. Kolaitis, P. G., and Thakur, M. N., Logical Definability of NP Optimization Problems. Information and Computation, No. 115, 1994, pp. 321-353.
20. Kolaitis, P. G., and Thakur, M. N., Approximation Properties of NP Minimization Classes. Journal of Computer and System Sciences, No. 51, 1995, pp. 391-411.
21. Leone, N., Palopoli, L., and Saccà, D., On the Complexity of Search Queries. In Fundamentals of Information Systems (T. Polle, T. Ripke, K.-D. Schewe, eds), 1999, pp. 113-127.
22. Lloyd, J., Foundations of Logic Programming. Springer-Verlag, 1987.
23. Marek, W., and Truszczynski, M., Autoepistemic Logic. Journal of the ACM, Vol. 38, No. 3, 1991, pp. 588-619.
24. Papadimitriou, C. H., Computational Complexity. Addison-Wesley, Reading, MA, USA, 1994.
25. Panconesi, A., and Ranjan, D., Quantifiers and Approximation. Theoretical Computer Science, No. 107, 1992, pp. 145-163.
26. Papadimitriou, C. H., and Yannakakis, M., Optimization, Approximation, and Complexity Classes. Journal of Computer and System Sciences, No. 43, 1991, pp. 425-440.
27. Ramakrishnan, R., Srivastava, D., and Sudarshan, S., CORAL: Control, Relations and Logic. In Proc. of 18th Conf. on Very Large Data Bases, 1992, pp. 238-250.
28. Saccà, D., The Expressive Powers of Stable Models for Bound and Unbound Queries. Journal of Computer and System Sciences, Vol. 54, No. 3, 1997, pp. 441-464.
29. Saccà, D., and Zaniolo, C., Stable Models and Non-Determinism in Logic Programs with Negation. In Proc. ACM Symp. on Principles of Database Systems, 1990, pp. 205-218.
30. Selman, A., A taxonomy of complexity classes of functions. Journal of Computer and System Sciences, No. 48, 1994, pp. 357-381.
31. Selman, A., Much ado about functions. Proc. of the 11th Conf. on Computational Complexity, IEEE Computer Society Press, 1996, pp. 198-212.
32. Ullman, J. D., Principles of Data and Knowledge-Base Systems, volumes 1 and 2. Computer Science Press, New York, 1988.
33. Zaniolo, C., Arni, N., and Ong, K., Negation and Aggregates in Recursive Rules: the LDL++ Approach. Proc. 3rd Int. Conf. on Deductive and Object-Oriented Databases, 1993, pp. 204-221.

The Declarative Side of Magic

Paolo Mascellani (1) and Dino Pedreschi (2)

(1) Dipartimento di Matematica, Università di Siena, via del Capitano 15, Siena, Italy, [email protected]
(2) Dipartimento di Informatica, Università di Pisa, Corso Italia 40, Pisa, Italy, [email protected]

Abstract. In this paper, we combine a novel method for proving partial correctness of logic programs with a known method for proving termination, and apply them to the study of the magic-sets transformation. As a result, a declarative reconstruction of efficient bottom-up execution of goal-driven deduction is accomplished, in the sense that the obtained results of partial and total correctness of the transformation abstract away from procedural semantics.

1 Introduction

In recent years, various principles and methods for the verification of logic programs have been put forward, as witnessed for instance in [11,3,16,17,13]. The main aim of this line of research is to verify the crucial properties of logic programs, notably partial and total correctness, on the basis of the declarative semantics only, or, equivalently, by abstracting away from procedural semantics. The aim of this paper is to apply some new methods for partial correctness, combined with some known methods for total correctness, to a case study of clear relevance, namely bottom-up computing. More precisely, we:
– introduce a method for proving partial correctness by extending the ideas in [14],
– combine it with the approach in [6,7] for proving termination, and
– apply both to the study of the transformation techniques known as magic-sets, introduced for the efficient bottom-up execution of goal-driven deduction (see [9,20] for a survey).

We found the exercise stimulating, as all proofs of correctness of the magic-sets transformation(s) available in the literature are based on operational arguments, and are often quite laborious. The results of partial and total correctness presented in this paper, instead, are based on purely declarative reasoning, which clarifies the natural idea underlying the magic-sets transformation. Moreover, these results are applicable under rather general assumptions, which broadly encompass the programming paradigm of deductive databases.

A.C. Kakas, F. Sadri (Eds.): Computat. Logic (Kowalski Festschrift), LNAI 2408, pp. 83–108, 2002. © Springer-Verlag Berlin Heidelberg 2002


Paolo Mascellani and Dino Pedreschi

Preliminaries

Throughout this paper we use the standard notation of Lloyd [12] and Apt [1]. In particular, for a logic program $P$ we denote the Herbrand base of $P$ by $B_P$, the least Herbrand model of $P$ by $M_P$, and the immediate consequence operator by $T_P$. Also, we use Prolog's convention of identifying, in the context of a program, each string starting with a capital letter with a variable, reserving other strings for the names of constants, terms or relations. In the programs we use Prolog's list notation. Identifiers ending with "s", like xs, range over lists. Bold capital letters, like $\mathbf{A}$, identify a possibly empty sequence (conjunction) of atoms or a set of variables: the context should always make clear which.

Plan of the Paper

In Section 2 we introduce a declarative method for proving the partial correctness of a logic program w.r.t. a specification. In Section 3 we use this method to obtain a declarative proof of the correctness of a particular implementation of the magic-sets transformation technique. In Section 4 we recall the concept of acceptability, which allows us to conduct declarative termination proofs for logic programs. In Section 5 we apply this concept to prove the termination of the magic programs and some related properties. Finally, in Section 6, we provide a set of examples in order to clarify the use of the proposed proof methods and how the magic-sets transformation works.

2 Partial Correctness

Partial correctness aims at characterizing the input-output behavior of programs. The input to a logic program is a query, and the associated output is the set of computed instances of such a query. Therefore, partial correctness in logic programming deals with the problem of characterizing the computed instances of a query. In Apt [2,3], a notation recalling that of Hoare's triples (correctness formulas) is used. The triple

$$\{Q\}\ P\ \mathcal{Q}$$

denotes the fact that $\mathcal{Q}$ is the set of computed instances of the query $Q$ in the program $P$. A natural question is: can we establish a correctness formula by reasoning on the declarative semantics, i.e. by abstracting away from procedural semantics? The following simple result, which generalizes one from [3], tells us that this is possible in the case of ground output.

Theorem 1. Consider the set $\mathcal{Q}$ of the correct instances of a query $Q$ and a program $P$, and suppose that every query in $\mathcal{Q}$ is ground. Then $\{Q\}\ P\ \mathcal{Q}$.

The Declarative Side of Magic


Proof. Clearly, every computed instance of $Q$ is also a correct instance of $Q$, by the Soundness of SLD-resolution. Conversely, consider a correct instance $Q_1$ of $Q$. By the Strong Completeness of SLD-resolution, there exists a computed instance $Q_2$ of $Q$ such that $Q_1$ is an instance of $Q_2$. By the Soundness of SLD-resolution, $Q_2$ is a correct instance of $Q$, so it is ground. Consequently $Q_2 = Q_1$, hence $Q_1$ is a computed instance of $Q$. □

So, for programs with ground output, correct and computed instances of queries coincide, and therefore we can use the declarative semantics directly to check partial correctness. When considering one-atom queries only, the above result can be rephrased as follows: if the one-atom query $A$ to program $P$ admits only ground correct instances, then

$$\{A\}\ P\ M_P \cap [A], \qquad (1)$$

where $[A]$ denotes the set of ground instances of $A$.

A simple sufficient condition (taken from [3]) to check that all correct instances of a one-atom query $A$ are ground is to show that the set $[A] \cap M_P$ is finite, i.e. that $A$ admits a finite number of correct ground instances. So, in principle, it is possible to reason about partial correctness on the basis of the least Herbrand model only. As an example, consider the Append program:

    append([], Ys, Ys).
    append([X|Xs], Ys, [X|Zs]) ← append(Xs, Ys, Zs).

the interpretation:

$$I_{\mathit{Append}} = \{\mathit{append}(xs, ys, zs) \mid xs, ys, zs \text{ are lists and } xs * ys = zs\} \qquad (2)$$

where zs is some given list and "∗" denotes list concatenation, and the correctness formula:

$$\{\mathit{append}(Xs, Ys, zs)\}\ \mathit{Append}\ I_{\mathit{Append}}$$
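To make the postcondition of this formula concrete, the following minimal Python sketch enumerates the ground instances of append(Xs, Ys, zs) that belong to the interpretation (2) for a given list zs. It is only an illustration: the function name `append_instances` and the sample list are assumptions introduced here, and Python lists stand in for Prolog lists.

```python
def append_instances(zs):
    """Return every pair (xs, ys) of lists with xs * ys = zs, i.e. the
    ground instances of append(Xs, Ys, zs) that belong to I_Append."""
    return [(zs[:k], zs[k:]) for k in range(len(zs) + 1)]


# Hypothetical sample input: the given list zs = [a, b, c].
for xs, ys in append_instances(["a", "b", "c"]):
    print(f"append({xs}, {ys}, {xs + ys})")
```

For a list of length n the sketch returns exactly n + 1 pairs, one for each position at which zs can be split.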

We can establish such a triple provided we can show that the interpretation $I_{\mathit{Append}}$ is indeed the least Herbrand model of the Append program, since the number of pairs of lists whose concatenation yields zs is clearly finite. Unfortunately, despite the fact that the set $I_{\mathit{Append}}$ is the natural intended interpretation of the Append program, it is not a model of Append, because the first clause does not hold in it: for instance, the ground instance append([], a, a), where a is a constant and hence not a list, does not belong to $I_{\mathit{Append}}$. In fact, for many programs it is quite cumbersome to construct their least Herbrand model. Note for example that $M_{\mathit{Append}}$ contains elements of the form append(s,t,u) where neither t nor u is a list. A correct definition of $M_{\mathit{Append}}$ is rather intricate, and clearly it is quite clumsy to reason about programs when, even in such simple cases, their semantics is defined in such a laborious way.

Why is the least Herbrand model different from the specification, or intended interpretation, of a program? The reason is that we usually design programs with reference to a class of intended queries which describes the admissible input for the program. As a consequence, the specification of the program is relative to the set of intended queries, whereas $M_P$ is not. In the example, the intended queries for Append are described by the set:

$$\{\mathit{append}(s, t, u) \mid s, t \text{ are lists or } u \text{ is a list}\} \qquad (3)$$

and it is possible to show that the specification (2) is indeed the fragment of the least Herbrand model $M_{\mathit{Append}}$ restricted to the set (3) of the intended queries. A method for identifying the intended fragment of the least Herbrand model is proposed in [4]; such a fragment is then used to establish the desired correctness formulas. This method adopts a notion of well-typedness [8,3,18], which makes it applicable to Prolog programs only, in that it exploits the left-to-right ordering of atoms within clauses. In the next section we introduce a more general characterization of the intended fragment of the least Herbrand model, which remedies the asymmetry of the method in [4], and allows us to prove partial correctness of logic programs with no reference to control issues.

Bases

The key concept of this paper is introduced in the following:

Definition 1. An interpretation $I$ is called a base for a program $P$ w.r.t. some model $M$ of $P$ iff, for every ground instance $A \leftarrow \mathbf{A}$ of every clause of $P$: if $I \models A$ and $M \models \mathbf{A}$, then $I \models \mathbf{A}$. □

The notion of a base has been designed to formalize the idea of an admissible set of "intended (one-atom) queries". Definition 1 requires that all clauses which allow us to deduce an atom $A$ in a base $I$ have their bodies true in $I$ itself. The condition that the body is true in some model of the program (obviously a necessary condition to conclude $A$) is used to weaken the requirement. Roughly speaking, a base is required to include all atoms needed to deduce any atom in the base itself. The concept of a base was first introduced in [14], where it is referred to as a closed interpretation. As an example, it is readily checked that the set (3) is a base for Append. Since a base $I$ is assumed to describe the intended queries, the intended fragment of the least Herbrand model is $M_P \cap I$. The main motivation for introducing the notion of a base is to obtain a method to identify $M_P \cap I$ directly, without having to construct $M_P$ first. To this purpose, given a base $I$ for a program $P$, we define the reduced program, denoted $P_I$, as the set of ground instances of clauses from $P$ whose heads are in $I$. In other words,

$$P_I = \{A \leftarrow \mathbf{A} \in \mathit{ground}(P) \mid A \in I\}. \qquad (4)$$
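Definition 1 and equation (4) can be tried out on a finite ground program. The Python sketch below is only an illustration under simplifying assumptions: ground atoms are plain strings, a ground clause is a pair (head, body), and the small propositional program `P`, its model `M` and the function names are hypothetical, not taken from the paper.

```python
def is_base(ground_clauses, I, M):
    """Definition 1: for every ground clause head <- body, if head is in I
    and every body atom is in M, then every body atom must be in I."""
    return all(
        set(body) <= I
        for head, body in ground_clauses
        if head in I and set(body) <= M
    )


def reduced_program(ground_clauses, I):
    """Equation (4): keep the ground clauses whose head belongs to I."""
    return [(head, body) for head, body in ground_clauses if head in I]


# Hypothetical ground program: p <- q.  q <-.  r <- s.  s <- r.  t <-.
P = [("p", ["q"]), ("q", []), ("r", ["s"]), ("s", ["r"]), ("t", [])]
M = {"p", "q", "t"}  # a model of P (in fact its least Herbrand model)

print(is_base(P, {"p", "q"}, M))        # True: q, needed for p, is in the base
print(is_base(P, {"p"}, M))             # False: q is needed for p but missing
print(reduced_program(P, {"p", "q"}))   # the clauses for p and q only
```

The second call fails because the clause p ← q has its head in the candidate base and its body true in M, but its body is not true in the candidate base.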


The following observation is immediate:

$$T_{P_I}(X) = T_P(X) \cap I. \qquad (5)$$

The following crucial result, first shown in [14], states that the least Herbrand model of the reduced program coincides with the intended fragment (w.r.t. $I$) of the least Herbrand model of the program:

Theorem 2. Let $P$ be a program and $I$ a base for $P$. Then $M_P \cap I = M_{P_I}$.

Proof. First, we show that, for any interpretation $J \subseteq M_P$:

$$T_{P_I}(J) = T_{P_I}(J \cap I). \qquad (6)$$

The ($\supseteq$) part is a direct consequence of the fact that $T_{P_I}$ is monotonic. To establish the ($\subseteq$) part, consider $A \in T_{P_I}(J)$. Then, for some clause $A \leftarrow \mathbf{A}$ from $P_I$, we have $J \models \mathbf{A}$, hence $M_P \models \mathbf{A}$, and therefore $M \models \mathbf{A}$ for the model $M$ w.r.t. which $I$ is a base, since $M_P \subseteq M$. Together with the fact that $I$ is a base and $I \models A$, this implies that $I \models \mathbf{A}$. Therefore $J \cap I \models \mathbf{A}$, which implies $A \in T_{P_I}(J \cap I)$. We now show that, for all $n \geq 0$,

$$T_P^n(\emptyset) \cap I = T_{P_I}^n(\emptyset),$$

which implies the thesis. The proof is by induction on $n$. In the base case ($n = 0$), the claim is trivially true. In the induction case ($n > 0$), we calculate:

$$\begin{array}{rll}
T_P^n(\emptyset) \cap I & = T_P(T_P^{n-1}(\emptyset)) \cap I & \\
 & = T_{P_I}(T_P^{n-1}(\emptyset)) & \{(5)\} \\
 & = T_{P_I}(T_P^{n-1}(\emptyset) \cap I) & \{T_P^{n-1}(\emptyset) \subseteq M_P \text{ and } (6)\} \\
 & = T_{P_I}(T_{P_I}^{n-1}(\emptyset)) & \{\text{induction hypothesis}\} \\
 & = T_{P_I}^n(\emptyset). &
\end{array}$$

□

So, given a base $I$ for program $P$, $M_{P_I}$ is exactly the desired fragment of $M_P$. The reduced program $P_I$ is a tool to construct such a desired fragment of $M_P$ without constructing $M_P$ first. Therefore, $M_{P_I}$ can be used directly to prove correctness formulas for intended queries, i.e. queries whose ground instances are in $I$, as stated in the following:
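The stage-by-stage equality used in the proof of Theorem 2 can be observed on a small finite ground program. The Python sketch below is an illustration only: atoms are strings, clauses are (head, body) pairs, and the propositional program `P`, the base `I` and the function names are hypothetical examples chosen so that `I` is a base for `P` w.r.t. its least model.

```python
def tp(ground_clauses, X):
    """Immediate consequence operator T_P on a finite ground program."""
    return {head for head, body in ground_clauses if set(body) <= X}


def tp_power(ground_clauses, n):
    """Compute the n-th stage T_P^n(∅) of the bottom-up iteration."""
    X = set()
    for _ in range(n):
        X = tp(ground_clauses, X)
    return X


def least_model(ground_clauses):
    """Iterate T_P from the empty set until a fixpoint is reached."""
    X = set()
    while tp(ground_clauses, X) != X:
        X = tp(ground_clauses, X)
    return X


# Hypothetical program: q <-.  p <- q.  t <-.  u <- t, p.  r <- s.  s <- r.
P = [("q", []), ("p", ["q"]), ("t", []), ("u", ["t", "p"]),
     ("r", ["s"]), ("s", ["r"])]
I = {"p", "q"}                              # a base for P w.r.t. its least model
P_I = [(h, b) for h, b in P if h in I]      # reduced program, equation (4)

for n in range(5):
    # Intersecting with I at each stage agrees with iterating the reduced program.
    assert tp_power(P, n) & I == tp_power(P_I, n)

print(least_model(P) & I == least_model(P_I))  # True
```

In this example the full program derives four atoms, while the reduced program only ever derives the two atoms in the base.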


Theorem 3. Let $P$ be a program, $I$ a base for $P$, and $Q$ a one-atom query which admits ground correct instances only. Then:

$$\{Q\}\ P\ M_{P_I} \cap [Q].$$

Proof. By Theorem 2, $M_{P_I} = M_P \cap I$, and $[Q] \subseteq I$ (as $Q$ is an intended query) implies $M_P \cap [Q] = M_P \cap I \cap [Q]$. The result then follows immediately from (1) or, equivalently, Theorem 1. □

In the Append example, the intended specification (2) is indeed the least Herbrand model of the Append program reduced with respect to the base (3), so, using Theorem 3, we can establish the desired triple:

$$\{\mathit{append}(Xs, Ys, zs)\}\ \mathit{Append}\ \{\mathit{append}(xs, ys, zs) \mid zs = xs * ys\}$$

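Later, a simple, induction-less method for proving that a given interpretation is the least Herbrand model of certain programs is discussed.

Example 1. Consider the following program ListSum, computing the sum of a list of natural numbers:

    listsum([],0) ←
    listsum([X|Xs],Sum) ← listsum(Xs,PSum), sum(PSum,X,Sum)
    sum(X,0,X) ←
    sum(X,s(Y),s(Z)) ← sum(X,Y,Z)

and the Herbrand interpretations $I_{\mathit{ListSum}}$ and $M$, defined as follows:

$$\begin{array}{rcl}
I_{\mathit{ListSum}} & = & \{\mathit{listsum}(xs, sum) \mid \mathit{listnat}(xs)\} \cup \{\mathit{sum}(x, y, z) \mid \mathit{nat}(x) \wedge \mathit{nat}(y)\} \\
M & = & \{\mathit{listsum}(xs, sum) \mid \mathit{listnat}(xs) \Rightarrow \mathit{nat}(sum)\} \cup \{\mathit{sum}(x, y, z) \mid \mathit{nat}(x) \wedge \mathit{nat}(y) \Rightarrow \mathit{nat}(z)\}
\end{array}$$

where listnat(x) and nat(x) hold when x is, respectively, a list of natural numbers and a natural number. First, we check that $M$ is a model of ListSum:

$$\begin{array}{l}
M \models \mathit{listsum}([\,], 0) \\
M \models \mathit{listsum}([x|xs], sum) \Leftarrow M \models \mathit{listsum}(xs, psum), \mathit{sum}(psum, x, sum) \\
M \models \mathit{sum}(x, 0, x) \\
M \models \mathit{sum}(x, s(y), s(z)) \Leftarrow M \models \mathit{sum}(x, y, z)
\end{array}$$

Next, we check that $I_{\mathit{ListSum}}$ is a base for ListSum w.r.t. $M$:

$$\begin{array}{l}
I_{\mathit{ListSum}} \models \mathit{listsum}([x|xs], sum) \wedge M \models \mathit{listsum}(xs, psum), \mathit{sum}(psum, x, sum) \\
\quad \Rightarrow I_{\mathit{ListSum}} \models \mathit{listsum}(xs, psum), \mathit{sum}(psum, x, sum) \\
I_{\mathit{ListSum}} \models \mathit{sum}(x, s(y), s(z)) \wedge M \models \mathit{sum}(x, y, z) \Rightarrow I_{\mathit{ListSum}} \models \mathit{sum}(x, y, z)
\end{array}$$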

The following set is the intended interpretation of the ListSum program:

$$\{\mathit{listsum}(xs, sum) \mid \mathit{listnat}(xs) \wedge sum = \textstyle\sum_{x \in xs} x\} \cup \{\mathit{sum}(x, y, z) \mid \mathit{nat}(x) \wedge \mathit{nat}(y) \wedge x + y = z\} \qquad (7)$$

and, although it is not a model of the program (the unit clause of sum does not hold in it), it is possible to prove that it is the fragment of $M_{\mathit{ListSum}}$ restricted to the base $I_{\mathit{ListSum}}$. Therefore, by Theorem 3, provided xs is a list of natural numbers, we establish the following triple:

$$\{\mathit{listsum}(xs, Sum)\}\ \mathit{ListSum}\ \{\mathit{listsum}(xs, sum) \mid sum = \textstyle\sum_{x \in xs} x\}.$$

□

In many examples, like the one above, bases are constructed using type information. Typically, a base is constructed by specifying the types of the input positions of relations, and the model associated with a base is constructed by specifying how types propagate from input to output positions. If a decidable type definition language is adopted, such as the one proposed in [10], then checking that a given interpretation is a base can be fully automated. However, a full treatment of these aspects is outside the scope of this paper.

3 Partial Correctness and Bottom-Up Computing

Consider a naive bottom-up evaluation of the ListSum program. The sequence of approximating sets is hard to compute for several reasons.

1. The unit clause sum(X, 0, X) ← introduces infinitely many facts at the very first step. In fact, such a clause is not safe in the sense of [19], i.e. some variables occur in the head which do not occur in the body.
2. Even if a safe version of the ListSum program is used, with a relation which generates natural numbers, the approximating sets grow exponentially large.
3. In any case, the bottom-up computation diverges.

In a goal-driven execution starting from the query listsum(xs,X), where xs is the input list and X is a variable, however, only a linearly increasing subset of each approximation is relevant. A more efficient bottom-up computation can be achieved using the program ListSum reduced w.r.t. an appropriate base $I$ which includes all instances of the desired query. Indeed, Theorem 2 tells us that, in the bottom-up computation, it is equivalent to take the intersection with the base $I$ at the limit of the computation, or at each approximation. The second option is clearly more efficient, as it allows us to promptly discard all facts which are irrelevant to answering the desired query. Therefore, the base should be chosen as small as possible, in order to minimize the size of the approximations. However, computing with the reduced program is unrealistic for two reasons. First, constructing a suitable base before the actual computation takes place is often impossible. In the ListSum example, an appropriate base should be chosen as follows:

$$I_{xs} = \{\mathit{listsum}(ys, sum) \mid \mathit{listnat}(ys) \wedge ys \text{ is a suffix of } xs\} \cup \{\mathit{sum}(x, y, z) \mid \mathit{nat}(x) \wedge \mathit{nat}(y) \wedge z \geq n\}$$

where xs is the input list and n is the sum of the numbers in xs, that is, the expected result of the computation! Second, a reduced program is generally infinite or, at best, hopelessly large. Nevertheless, bases and reduced programs are useful abstractions to explain the idea behind optimization techniques like magic-sets, widely used in deductive database systems to support efficient bottom-up execution of goal-driven deduction. In fact, we shall see how the optimized magic program is designed to combine the construction of a base and its exploitation in an intertwined computation.

The Magic-Sets Transformation

In the literature, the problem of the efficient bottom-up execution of goal-driven computations has been tackled in a compilative way, i.e. by means of a repertoire of transformation techniques known under the name of magic-sets (see [9] or [20, Ch. 13] for a survey of this broad topic). Magic-sets is a nontrivial program transformation which, given a program $P$ and a query $Q$, yields a transformed program which, when executed bottom-up, mimics the top-down, Prolog-like execution of the original program $P$, activated on the query $Q$. Many variations of the basic magic-sets technique have been proposed, which however share the original idea. All available justifications of its correctness are given by means of procedural arguments, by relating the bottom-up computation of the transformed (magic) program with the top-down computation of the original program and query. As a consequence, all known proofs of correctness of the magic-sets transformation(s) are rather complicated, although informative about the relative efficiency of the top-down and bottom-up procedures (see for instance [20, pp. 836-841]). We show here how the core of the magic-sets transformation can be explained in rather natural declarative terms, by adopting the notion of a base and the related results discussed in the previous section. Actually, we show that the "magic" of the transformation lies in computing and exploiting a base of the original program. We provide an incremental version of the core magic-sets transformation, which allows us to compile each clause of the program separately. We need to introduce the concept of call pattern, or mode, which relates to that of binding pattern in [20]. Informally, modes indicate whether each argument of a relation should be used as an input or as an output, thus specifying the way a given program is intended to be queried.


Definition 2. Consider an $n$-ary relation symbol $p$. A mode for $p$ is a function:

$$m_p : [1, n] \rightarrow \{+, -\}.$$

If $m_p(i) = +$, we call $i$ an input position of $p$, otherwise we call $i$ an output position of $p$. By a moding we mean a collection of modes, one for each relation symbol in a program. □

We represent modes in a compact way, writing $m_p$ in the more suggestive form $p(m_p(1), \ldots, m_p(n))$. For instance, the mode sum(+,+,-) specifies the input/output behavior of the relation sum, which is therefore expected to be queried with the first two positions filled in with ground terms. From now on we assume that some fixed moding is given for any considered program. To simplify our notation, we assume, without loss of generality, that, in each relation, input positions precede output positions, so that any atom $A$ can be viewed as $p(\mathbf{u}, \mathbf{v})$, where $\mathbf{u}$ are the terms in the input positions of $p$ and $\mathbf{v}$ are the terms in the output positions of $p$. With reference to this notation, the magic version of an atom $A = p(\mathbf{u}, \mathbf{v})$, denoted $A'$, is the atom $p'(\mathbf{u})$, where $p'$ is a fresh predicate symbol (not occurring elsewhere in the program), whose arity is the number of input positions of $p$. Intuitively, the magic atom $p'(\mathbf{u})$ represents the fact that the relation $p$ is called with input arguments $\mathbf{u}$. We are now ready to introduce our version of the magic-sets transformation.

Definition 3. Consider a program $P$ and a one-atom query $Q$. The magic program $O$ is obtained from $P$ and $Q$ by the following transformation steps:

1. for every decomposition $A \leftarrow \mathbf{A}, B, \mathbf{B}$ of every clause from $P$, add a new clause $B' \leftarrow A', \mathbf{A}$;
2. add a new unit clause $Q' \leftarrow$;
3. replace each original clause $A \leftarrow \mathbf{A}$ from $P$ with the new clause $A \leftarrow A', \mathbf{A}$.

□

The magic program $O$ is the optimized version of the program $P$ w.r.t. the query $Q$. Observe that the transformation step (1) is performed in correspondence with every body atom of every clause in the program. Also, the only unit clause, or fact, is that introduced at step (2), also called a "seed". The collection of clauses generated at steps (1) and (2) allows us to deduce all the magic atoms corresponding to the calls generated in the top-down/left-to-right execution of the original program $P$ starting with the query $Q$. The declarative reading of the clause $B' \leftarrow A', \mathbf{A}$ introduced at step (1) is: "if the relation in the head of the original clause is called with input arguments as in $A'$, and the atoms $\mathbf{A}$ preceding $B$ in the original clause have been deduced, then the relation of $B$ is called with input arguments as in $B'$". Finally, the information about the calls represented by the magic atoms is exploited at step (3), where the premises of the original clauses are strengthened by an extra constraint, namely that the conclusion $A$ is taken only if it is pertinent to some needed call, represented by the fact that $A'$ has been deduced.

Example 2. Consider the program ListSum of Example 1 with the moding:

    listsum(+,-)
    sum(+,+,-)

and the query:

    listsum([2,1,5],Sum)

that is consistent with the moding. The corresponding magic program is:

    listsum([],0) ← listsum'([])
    listsum([X|Xs],Sum) ← listsum'([X|Xs]), listsum(Xs,PSum), sum(PSum,X,Sum)
    sum(X,0,X) ← sum'(X,0)
    sum(X,s(Y),s(Z)) ← sum'(X,s(Y)), sum(X,Y,Z)
    listsum'(Xs) ← listsum'([X|Xs])
    sum'(PSum,X) ← listsum'([X|Xs]), listsum(Xs,PSum)
    sum'(X,Y) ← sum'(X,s(Y))
    listsum'([2,1,5]) ←
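The three steps of Definition 3 are purely syntactic, so they can be sketched in a few lines of Python. The sketch below is an illustration, not the authors' implementation: atoms are pairs (predicate, tuple of argument strings), arguments are treated as opaque text, a moding is a dictionary from predicate names to strings of "+" and "-", and all function and variable names are assumptions introduced here. Applied to ListSum with the moding and query of Example 2, it prints the eight clauses of the magic program listed there.

```python
def magic(atom, moding):
    """Magic version A' of an atom A: primed predicate, input arguments only."""
    pred, args = atom
    inputs = tuple(a for a, m in zip(args, moding[pred]) if m == "+")
    return (pred + "'", inputs)


def magic_program(clauses, query, moding):
    out = []
    # Step 3: guard each original clause A <- body with the magic atom A'.
    for head, body in clauses:
        out.append((head, [magic(head, moding)] + list(body)))
    # Step 1: for each body atom B, add B' <- A', atoms preceding B.
    for head, body in clauses:
        for k, b in enumerate(body):
            out.append((magic(b, moding), [magic(head, moding)] + list(body[:k])))
    # Step 2: the seed Q' <-.
    out.append((magic(query, moding), []))
    return out


def show(atom):
    pred, args = atom
    return f"{pred}({','.join(args)})"


def show_clause(head, body):
    return f"{show(head)} <- {', '.join(show(b) for b in body)}".rstrip()


moding = {"listsum": "+-", "sum": "++-"}
clauses = [
    (("listsum", ("[]", "0")), []),
    (("listsum", ("[X|Xs]", "Sum")),
     [("listsum", ("Xs", "PSum")), ("sum", ("PSum", "X", "Sum"))]),
    (("sum", ("X", "0", "X")), []),
    (("sum", ("X", "s(Y)", "s(Z)")), [("sum", ("X", "Y", "Z"))]),
]
query = ("listsum", ("[2,1,5]", "Sum"))

rendered = [show_clause(h, b) for h, b in magic_program(clauses, query, moding)]
print("\n".join(rendered))
```

Because each clause is transformed on its own and the query contributes only the seed, the same function can be applied to the program once and to different queries later.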

□

Partial Correctness of the Magic-Sets Transformation

We now want to show that the magic-sets transformation is correct. The correctness of the transformation is stated in natural terms in the main result of this section, which essentially says that the original and the magic program share the same logical consequences, when both are restricted to the intended query.

Theorem 4. Let $P$ be a program, $Q$ a one-atom query, and consider the magic program $O$. Then $M_P \cap [Q] = M_O \cap [Q]$.

Proof. The proof is organized in the following three steps:

1. the interpretation $M = \{A \in B_P \mid M_O \models A' \Rightarrow M_O \models A\}$ is a model of $P$;
2. the interpretation $I = \{A \in B_P \mid M_O \models A'\}$ is a base for $P$ w.r.t. $M$;
3. $M_P \cap I = M_O \cap I$.


The thesis follows directly from (3), observing that $[Q] \subseteq I$ as a consequence of the fact that the magic program $O$ contains the seed fact $Q' \leftarrow$. We now prove the facts (1), (2) and (3).

Proof of 1. Consider a ground instance $A \leftarrow \mathbf{A}$ of a clause from $P$. To show that $M$ is a model of the clause, we assume:

$$M \models \mathbf{A} \qquad (8)$$

$$M_O \models A' \qquad (9)$$

and prove that $M_O \models A$. In turn, such a conclusion is implied by $M_O \models \mathbf{A}$, as a consequence of (9) and the fact that the magic program $O$ contains the clause $A \leftarrow A', \mathbf{A}$. To prove $M_O \models \mathbf{A}$ we proceed by induction on $\mathbf{A}$: in the base case ($\mathbf{A}$ is empty) the conclusion trivially holds. In the induction case ($\mathbf{A} = \mathbf{B}, B, \mathbf{C}$) the magic program contains the clause $B' \leftarrow A', \mathbf{B}$, and therefore $M_O \models B'$ as a consequence of (9) and the induction hypothesis. As $M \models B$ by (8), we have that $M_O \models B'$ implies $M_O \models B$, by the definition of $M$.

Proof of 2. Consider a ground instance $A \leftarrow \mathbf{A}$ of a clause from $P$, and assume:

$$I \models A \qquad (10)$$

$$M \models \mathbf{A} \qquad (11)$$

To obtain the desired conclusion, we prove that $I \models \mathbf{A}$ by induction on $\mathbf{A}$. In the base case ($\mathbf{A}$ is empty) the conclusion trivially holds. In the induction case ($\mathbf{A} = \mathbf{B}, B, \mathbf{C}$) the magic program $O$ contains the clause $c: B' \leftarrow A', \mathbf{B}$. By the induction hypothesis, $I \models \mathbf{B}$, which implies $M_O \models \mathbf{B}'$ by the definition of $I$, where $\mathbf{B}'$ denotes the sequence of the magic versions of the atoms in $\mathbf{B}$. This, together with (11), implies:

$$M_O \models \mathbf{B} \qquad (12)$$

by the definition of $M$. Next, by (10) and the definition of $I$, we obtain $M_O \models A'$, which, together with (12) and clause $c$, implies $M_O \models B'$. This directly implies $I \models B$.

Proof of 3. ($\subseteq$) First we show that $M_O$ is a model of $P_I$. In fact, consider a clause $A \leftarrow \mathbf{A}$ of $P_I$, and assume that $M_O \models \mathbf{A}$. By the definition of $P_I$, $I \models A$, which by the definition of $I$ implies $M_O \models A'$. Hence, considering that $A \leftarrow A', \mathbf{A}$ is a ground instance of a clause of $O$, $M_O \models A$. This implies that $M_O$ includes $M_{P_I}$, which, by Theorem 2, is equal to $M_P \cap I$, since $I$ is a base for $P$ by (1) and (2). ($\supseteq$) Clearly $M_O \cap B_P \subseteq M_P$, as the clauses from $P$ are strengthened in $O$ with extra premises in the body. Hence, observing that $I \subseteq B_P$, we obtain $M_O \cap I \subseteq M_P \cap I$. □

The crucial point in this proof is the fact that the set $I$ of atoms corresponding to the magic atoms derived in $O$ is a base, i.e. an admissible set of intended queries, which describes all possible calls to the program originating from the top level query $Q$. An immediate consequence of Theorem 4 is the following:

Corollary 1. Let $P$ be a program, $Q$ a one-atom query, and consider the magic program $O$. Then, $A$ is a ground instance of a computed answer of $Q$ in $P$ iff it is a ground instance of a computed answer of $Q$ in $O$. □

Observe that the above equivalence result is obtained with no requirement that the original program respect the specific moding, and with no need to perform the so-called bound/free analysis. In this sense, this result is more general than the equivalence results in the literature, which are based on procedural reasoning. However, those results, such as that in [20], tell us more from the point of view of the relative efficiency of bottom-up and top-down computing. As a consequence of Theorems 1 and 4, we can conclude that, for any one-atom query $A$ which admits only ground correct instances w.r.t. a program $P$, the following triple holds:

$$\{A\}\ P\ M_O \cap [A] \qquad (13)$$

i.e. the computed instances of $A$ in $P$ coincide with the correct instances of $A$ in the magic program $O$. However, we need a syntactic condition able to guarantee that every correct instance is ground.

Well-Moded Programs

In the practice of deductive databases, the magic-sets transformation is applied to so-called well-moded programs, as for these programs the computational benefits of the transformation are fully exploited, in a sense which shall be clarified in the sequel.

Definition 4. With reference to some specific, fixed moding:

– a one-atom query $p(\mathbf{i}, \mathbf{o})$ is called well-moded iff $\mathit{vars}(\mathbf{i}) = \emptyset$;
– a clause $p_0(\mathbf{o}_0, \mathbf{i}_{n+1}) \leftarrow p_1(\mathbf{i}_1, \mathbf{o}_1), \ldots, p_n(\mathbf{i}_n, \mathbf{o}_n)$ is called well-moded if, for $i \in [1, n+1]$: $\mathit{vars}(\mathbf{i}_i) \subseteq \mathit{vars}(\mathbf{o}_0) \cup \cdots \cup \mathit{vars}(\mathbf{o}_{i-1})$;
– a program is called well-moded if every clause of it is. □
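Definition 4 is a syntactic condition and can be checked mechanically. The following Python sketch is one possible checker, written under assumptions made here for illustration: variables are strings starting with an upper-case letter, other strings are constants, compound terms are tuples (functor, arg, ...), atoms are pairs (predicate, tuple of terms), and the function names are hypothetical. It is applied to the ListSum program of Example 1 with the moding listsum(+,-), sum(+,+,-).

```python
def term_vars(t):
    """Variables of a term: upper-case-initial strings are variables, other
    strings are constants, tuples are compound terms (functor, arg, ...)."""
    if isinstance(t, str):
        return {t} if t[:1].isupper() else set()
    return set().union(*(term_vars(a) for a in t[1:]))


def vars_of(terms):
    return set().union(*(term_vars(t) for t in terms))


def split(atom, moding):
    """Split the arguments of atom = (pred, args) into input and output terms."""
    pred, args = atom
    ins = [a for a, m in zip(args, moding[pred]) if m == "+"]
    outs = [a for a, m in zip(args, moding[pred]) if m == "-"]
    return ins, outs


def well_moded_query(atom, moding):
    ins, _ = split(atom, moding)
    return not vars_of(ins)               # input positions must be ground


def well_moded_clause(head, body, moding):
    head_in, head_out = split(head, moding)
    known = vars_of(head_in)              # vars(o_0) in Definition 4
    for atom in body:
        ins, outs = split(atom, moding)
        if not vars_of(ins) <= known:     # vars(i_i) must already be produced
            return False
        known |= vars_of(outs)
    return vars_of(head_out) <= known     # the case i = n + 1


moding = {"listsum": "+-", "sum": "++-"}
listsum_program = [
    (("listsum", ("[]", "0")), []),
    (("listsum", ((".", "X", "Xs"), "Sum")),
     [("listsum", ("Xs", "PSum")), ("sum", ("PSum", "X", "Sum"))]),
    (("sum", ("X", "0", "X")), []),
    (("sum", ("X", ("s", "Y"), ("s", "Z"))), [("sum", ("X", "Y", "Z"))]),
]

print(all(well_moded_clause(h, b, moding) for h, b in listsum_program))  # True
```

Swapping the two body atoms of the recursive listsum clause makes the check fail, since sum would then consume PSum before listsum has produced it.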


Thus, in well-moded clauses, all variables in the input positions of a body atom occur earlier in the clause, either in an output position of a preceding body atom, or in an input position of the head. Also, one-atom well-moded queries are ground at input positions. Well-modedness is a simple syntactic condition which guarantees that a given program satisfies a given moding. A well-known property of well-moded programs and queries is that they deliver ground output.

Theorem 5. Let $P$ be a well-moded program, and $A$ a one-atom well-moded query. Then every computed instance of $A$ in $P$ is ground.

Proof. See, for instance, [5]. The general idea of the proof is to show the following points:

1. at each step of the resolution process, the selected atom is well-moded;
2. all the output terms of a selected atom in a refutation appear in the input terms of some selected atom of the refutation.

This, together with the fact that the first selected atom (the query) is well-moded, implies the claim. □

So, well-modedness provides a (syntactic) sufficient condition to fulfill the proof obligation of triple (13).

Example 3. The program ListSum of Example 1 is well-moded w.r.t.:

    listsum(+,-)
    sum(+,+,-)

hence the following triple can be established:

$$\{\mathit{listsum}(xs, Sum)\}\ \mathit{ListSum}\ M_{\mathit{ListSum}} \cap [\mathit{listsum}(xs, Sum)]$$

Consider the magic program $O$ for ListSum and listsum(xs,Sum). As a consequence of (13), we can also establish that:

$$\{\mathit{listsum}(xs, Sum)\}\ \mathit{ListSum}\ M_O \cap [\mathit{listsum}(xs, Sum)]$$

So the computed instances of the desired query can be deduced using the magic program $O$. This is relevant because, as we shall see later, bottom-up computing with the magic program is much easier than with the original program. □

Moreover, well-modedness of the original program implies safety of the magic program, in the sense of [19]: every variable that occurs in the head of a clause of the magic program also occurs in its body.

Theorem 6. Let $P$ be a well-moded program and $Q$ a well-moded query. Then, the magic program $O$ is safe.


Proof. By Definition 3, there are three types of clauses in $O$.

Case $A \leftarrow A', \mathbf{A}$. The variables in the input positions of $A$ occur in $A'$, by Definition 3. By Definition 4, the variables in the output positions of $A$ appear either in the input positions of $A$, and hence in $A'$, or in the output positions of $\mathbf{A}$.

Case $Q' \leftarrow$. Since $Q$ is well-moded, $Q'$ is ground.

Case $B' \leftarrow A', \mathbf{A}$. By Definition 3, the original clause from $P$ is $A \leftarrow \mathbf{A}, B, \mathbf{B}$. The variables of $B'$ are those in the input positions of $B$, which, by Definition 4, occur either in the input terms of $A$, and hence in $A'$, or in the output terms of $\mathbf{A}$. □

Thus, despite the fact that a well-moded program, such as ListSum of Example 1, may not be suited for bottom-up computing, its magic version is, in the sense that the minimum requirement that finitely many new facts are inferred at each bottom-up iteration is fulfilled. We conclude this section with some remarks about the transformation. First, observe that the optimization algorithm is modular, in the sense that each clause can be optimized separately. In particular, we can obtain the optimized program by transforming the program at compile time and the query, which provides the seed for the computation, at run time. Second, non-atomic queries can be dealt with easily: given a query $\mathbf{A}$, it is sufficient to add to the program a new clause $\mathit{ans}(\mathbf{X}) \leftarrow \mathbf{A}$, where ans is a fresh predicate and $\mathbf{X}$ are the variables in $\mathbf{A}$, and optimize the extended program w.r.t. the one-atom query $\mathit{ans}(\mathbf{X})$. Finally, the traditional distinction between an extensional database (EDB) and an intensional one (IDB) is immaterial to the discussion presented in this paper.
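Safety is what makes a naive bottom-up evaluation of the magic program workable: every derived fact is ground, so one-way matching of body atoms against known facts suffices. The Python sketch below is a minimal illustration under assumptions made here, not an efficient or definitive evaluator: atoms and compound terms are tuples (name, arg, ...), variables are upper-case-initial strings, natural numbers are Peano terms built by the hypothetical helper `nat`, lists are built with the functor "." by the hypothetical helper `lst`, and the clause set is the magic program of Example 2.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()


def match(pattern, ground, subst):
    """One-way matching of a pattern against a ground term; returns an
    extended substitution, or None if the match fails."""
    if is_var(pattern):
        if pattern in subst:
            return subst if subst[pattern] == ground else None
        return {**subst, pattern: ground}
    if isinstance(pattern, str) or isinstance(ground, str):
        return subst if pattern == ground else None
    if len(pattern) != len(ground) or pattern[0] != ground[0]:
        return None
    for p, g in zip(pattern[1:], ground[1:]):
        subst = match(p, g, subst)
        if subst is None:
            return None
    return subst


def instantiate(t, subst):
    if is_var(t):
        return subst[t]          # safe clauses bind every head variable
    if isinstance(t, str):
        return t
    return (t[0],) + tuple(instantiate(a, subst) for a in t[1:])


def solve_body(body, facts, subst):
    """Yield every substitution under which all body atoms are known facts."""
    if not body:
        yield subst
        return
    for fact in facts:
        s = match(body[0], fact, subst)
        if s is not None:
            yield from solve_body(body[1:], facts, s)


def least_model(clauses):
    """Naive bottom-up iteration of the immediate consequence operator."""
    facts = set()
    while True:
        new = {instantiate(head, s)
               for head, body in clauses
               for s in solve_body(body, facts, {})}
        if new <= facts:
            return facts
        facts |= new


def nat(n):
    t = "0"
    for _ in range(n):
        t = ("s", t)
    return t


def lst(items):
    t = "[]"
    for x in reversed(items):
        t = (".", x, t)
    return t


CONS = (".", "X", "Xs")
query_list = lst([nat(2), nat(1), nat(5)])
magic_listsum = [
    (("listsum", "[]", "0"), [("listsum'", "[]")]),
    (("listsum", CONS, "Sum"),
     [("listsum'", CONS), ("listsum", "Xs", "PSum"), ("sum", "PSum", "X", "Sum")]),
    (("sum", "X", "0", "X"), [("sum'", "X", "0")]),
    (("sum", "X", ("s", "Y"), ("s", "Z")),
     [("sum'", "X", ("s", "Y")), ("sum", "X", "Y", "Z")]),
    (("listsum'", "Xs"), [("listsum'", CONS)]),
    (("sum'", "PSum", "X"), [("listsum'", CONS), ("listsum", "Xs", "PSum")]),
    (("sum'", "X", "Y"), [("sum'", "X", ("s", "Y"))]),
    (("listsum'", query_list), []),          # the seed
]

model = least_model(magic_listsum)
answers = [f for f in model if f[0] == "listsum" and f[1] == query_list]
print(answers == [("listsum", query_list, nat(8))])  # True
```

The iteration stops after finitely many steps, and the only listsum fact for the query list carries the Peano numeral for 8 = 2 + 1 + 5. Running the same evaluator on the original ListSum program would fail at the unsafe clause sum(X,0,X) ←, whose head variable is never bound.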

4 Total Correctness

What is the meaning of a triple $\{Q\}\ P\ \mathcal{Q}$ in the sense of total correctness? Several interpretations are possible, but the most common is to require partial correctness plus the fact that all derivations for $Q$ in $P$ are finite, a property which is referred to as universal termination. However, such a requirement would be unnecessarily restrictive if an arbitrary selection strategy is allowed in the top-down computation. For this reason, the termination analysis is usually tailored to some particular top-down strategy, such as Prolog's depth-first strategy combined with a leftmost selection rule, referred to as LD-resolution. A proof method for termination of Prolog programs is introduced in [6,7], based on the following notion of an acceptable program.

Definition 5. Let $A$ be an atom and $c$ be a clause, then:

– A level mapping is a function $|\ |$ from ground atoms to natural numbers.
– $A$ is bounded w.r.t. $|\ |$ if $|\ |$ is bounded on the set of all ground instances of $A$.
– $c$ is acceptable w.r.t. $|\ |$ and an interpretation $I$ if
  • $I$ is a model of $c$,
  • for all ground instances $A \leftarrow \mathbf{A}, B, \mathbf{B}$ of $c$ such that $I \models \mathbf{A}$: $|A| > |B|$.
– A program is acceptable w.r.t. $|\ |$ and $I$ if every clause of it is. □

The intuition behind this definition is the following. The level mapping plays the role of a termination function, and it is required to decrease from the head to the body of any (ground instance of a) clause. The model $I$ used in the notion of acceptability gives a declarative account of the leftmost selection rule of Prolog. The decrease of the level mapping from the head $A$ to a body atom $B$ is required only if the body atoms to the left of $B$ have already been refuted: in this case, by the Soundness of SLD-resolution, these atoms are true in any model of the program. In the proof method, the model $I$ is employed to propagate inter-argument relations from left to right. The following result about acceptable programs holds.

Theorem 7. Suppose that

– the program $P$ is acceptable w.r.t. $|\ |$ and $I$,
– the one-atom query $Q$ is bounded w.r.t. $|\ |$.

Then all Prolog computations of $Q$ in $P$ are finite.

Proof. See [6,7] for a detailed proof. The general idea is to associate a multiset of integers with each query of the resolution and to show that the multiset associated with a query is strictly greater than the one associated with its resolvent. □

Moreover, it is possible to show that each terminating Prolog program $P$ is acceptable w.r.t. the following level mapping:

$$|A| = \mathit{nodes}_P(A)$$

where $\mathit{nodes}_P(A)$ denotes the number of nodes in the S-tree for $P \cup \{\leftarrow A\}$.

Example 4. The program ListSum of Example 1 is acceptable w.r.t. any model and the level mapping $|\ |$ defined as follows:

$$|\mathit{listsum}(xs, sum)| = \mathit{size}(xs) \qquad |\mathit{sum}(x, y, z)| = \mathit{size}(y)$$

where size(t) counts the number of symbols in the (ground) term t. This can be easily checked by observing that the number of function symbols of every atom in the body of the clauses is strictly less than the number of function symbols in the corresponding head.


Also, for every ground term xs and variable Sum, the query listsum(xs,Sum) is bounded, so every Prolog computation for it terminates, as a consequence of Theorem 7. In many cases, a non-trivial model is needed in the proof of termination. In the ListSum example, if the two input arguments of the relation sum in the recursive clause of listsum are swapped, then a model $I$ is needed, such that $I \models \mathit{listsum}(xs, sum)$ iff $\mathit{size}(xs) \geq \mathit{size}(sum)$. Moreover, it is in general possible to use simpler level mappings, but this requires more complicated definitions: see [7,15] for details. □

Besides its use in proving termination, the notion of acceptability makes the task of constructing the least Herbrand model of a program much easier. Call an interpretation $I$ for a program $P$ supported if for any $A \in I$ there exists a ground instance $A \leftarrow \mathbf{B}$ of a clause from $P$ such that $I \models \mathbf{B}$. The following result from [6] holds.

Theorem 8. Any acceptable program $P$ has a unique supported model, which coincides with its least Herbrand model $M_P$.

Proof. See [6] for details. Consider a fixpoint $X$ of $T_P$ strictly greater than $M_P$, and an element $A \in X \setminus M_P$; then, there must be a ground atom $B \in X \setminus M_P$ such that $A \leftarrow \mathbf{A}, B, \mathbf{B} \in \mathit{ground}(P)$. But this leads to an infinite chain of resolvents starting from $A$. □

Usually, checking that an interpretation is a supported model of the program is straightforward, and does not require inductive reasoning. Also, this technique can be used with the reduced program, as reduced programs of acceptable programs are in turn acceptable. Summarizing, the problem of establishing a triple $\{A\}\ P\ \mathcal{A}$ in the sense of total correctness, for a well-moded program $P$ and query $A$, can be solved by the following steps:

1. find a base $I$ for $P$ such that $[A] \subseteq I$;
2. show that $P$ is acceptable and $A$ is bounded w.r.t. the same model and level mapping;
3. find a supported model $M$ of $P_I$;
4. check that $\mathcal{A} = M \cap [A]$.
In the Append example of Section 2, it is easy to show that the set (2), namely {append(xs,ys,zs) | xs,ys,zs are lists and xs * ys = zs}, is indeed a supported model of the program reduced by its base (3), so the desired triple can be established. In the ListSum example, it is readily checked that the set (7) from Example 2 is a supported model of the program reduced by its base I_ListSum.

5 Total Correctness and Bottom-Up Computing

Although a thorough study of the relative efficiency of bottom-up and top-down execution is outside the reach of our declarative methods, we are able to show the total correctness of the magic-sets transformation on the basis of the results of the previous section. In fact, we can show that if the original program is terminating in a top-down sense, then the magic program is terminating in a bottom-up sense, in a way which is made precise by the next result. Two assumptions on the original programs are necessary, namely acceptability, which implies termination, and well-modedness, which implies ground output.

Theorem 9. Let P be a well-moded, acceptable program, and Q a one-atom well-moded, bounded query. Then the least Herbrand model $M_O$ of the magic program O is finite.

Proof. Let I and | | be the model and level mapping used in the proof of acceptability. We define a mapping of magic atoms into $\omega \cup \{\infty\}$ as follows:

$|A'| = \max\{\, |B| \mid A' = B' \,\}$.

Next, we show that $M_O$ contains a finite number of magic atoms. First, we observe that, for the seed fact $Q' \in T_O(\emptyset)$, $|Q'| < \omega$, as the query Q is bounded. Consider now a magic atom $B'$ deduced at stage n > 1 in the bottom-up computation, i.e. $B' \in T_O^{n}(\emptyset) \setminus T_O^{n-1}(\emptyset)$. By the magic transformation, there is a clause $B' \leftarrow A', \mathbf{A}$ in O such that $T_O^{n-1}(\emptyset) \models A', \mathbf{A}$. Since $T_O^{n-1}(\emptyset) \models \mathbf{A}$ implies that $\mathbf{A}$ holds in any model of P by the partial correctness Theorem 4, we have by the acceptability of P that, for each clause $A \leftarrow \mathbf{A}, B, \mathbf{B}$ in P, $|A| > |B|$, which implies $|A'| > |B'|$. Therefore, the level of newly deduced magic atoms is smaller than that of some preexisting magic atoms, which implies that finitely many magic atoms are in $M_O$.

To conclude the proof, we have to show that there are finitely many non-magic atoms in $M_O$. Observe that every non-magic atom A of $M_O$ is a computed answer of a query B such that $M_O \models B'$. Given $A \in M_O$, consider a query B with its output positions filled with distinct variables, and $B' = A'$. By Theorems 7 and 5, B has a finite set of ground computed answers. The thesis then follows from the fact that finitely many magic atoms are in $M_O$.
□

As an immediate consequence of this theorem we have that, for some n ≥ 0:

$T_{P_I}^{n}(\emptyset) = M_O$

and therefore the bottom-up computation with O terminates. Notice that this result does not imply that the bottom-up computation with O and the top-down one with P are equally efficient, although both terminate. In [20], an extra condition on the original program is required, namely that it is subgoal rectified, in order to obtain that the cost of computing with the magic program is proportional to top-down evaluation. As a final example, consider again the ListSum program of Example 1 and the query listsum(xs,Sum). By the partial correctness results, we know that:

{listsum(xs, Sum)} ListSum $M_O$ ∩ [listsum(xs, Sum)]

By Theorem 9, $M_O$ is finite, so we can actually perform a bottom-up computation with O, thus obtaining $M_O$ first, and then extract the desired computed instances from it.

6 Examples

Length of a List

Consider the program ListLen, the call pattern listlen(+,-) and the query listlen([a,b,b,a],L). The optimized program is:

listlen([],0) ← listlen’([])
listlen([X|Xs],s(L)) ← listlen’([X|Xs]),base(X),listlen(Xs,L)
listlen’(Xs) ← base(X),listlen’([X|Xs])
listlen’([a,b,b,a]) ←

As we can see, there is only one clause which depends on the query, namely the optimized query w.r.t. C, and it can easily be produced at run time. The bottom-up evaluation of the optimized program is:

T_P^1(∅) = {listlen’([a, b, b, a])}
T_P^2(∅) = T_P^1(∅) ∪ {listlen’([b, b, a])}
T_P^3(∅) = T_P^2(∅) ∪ {listlen’([b, a])}
T_P^4(∅) = T_P^3(∅) ∪ {listlen’([a])}
T_P^5(∅) = T_P^4(∅) ∪ {listlen’([])}
T_P^6(∅) = T_P^5(∅) ∪ {listlen([], 0)}
T_P^7(∅) = T_P^6(∅) ∪ {listlen([a], s(0))}
T_P^8(∅) = T_P^7(∅) ∪ {listlen([b, a], s(s(0)))}
T_P^9(∅) = T_P^8(∅) ∪ {listlen([b, b, a], s(s(s(0))))}
T_P^10(∅) = T_P^9(∅) ∪ {listlen([a, b, b, a], s(s(s(s(0)))))}
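The stage-by-stage evaluation above can be reproduced with a few lines of code. The sketch below hard-codes the four clauses of the optimized ListLen program; the encoding is an assumption made for illustration (lists as Python tuples, the numeral s^n(0) as the integer n, base(X) as membership in a fixed set), and the names `step` and `bottom_up` are hypothetical.

```python
BASE = {"a", "b"}                      # assumed extension of the base relation
SEED = ("listlen'", ("a", "b", "b", "a"))

def step(facts):
    """One application of the immediate consequence operator to `facts`."""
    derived = {SEED}                                   # listlen'([a,b,b,a]) <-
    for fact in facts:
        if fact[0] != "listlen'":
            continue
        xs = fact[1]
        if xs == ():
            derived.add(("listlen", (), 0))            # listlen([],0) <- listlen'([])
        elif xs[0] in BASE:
            derived.add(("listlen'", xs[1:]))          # listlen'(Xs) <- base(X), listlen'([X|Xs])
            for other in facts:                        # listlen([X|Xs],s(L)) <- listlen'([X|Xs]), base(X), listlen(Xs,L)
                if other[0] == "listlen" and other[1] == xs[1:]:
                    derived.add(("listlen", xs, other[2] + 1))
    return derived

def bottom_up():
    """Iterate from the empty set; return the atoms that are new at each stage."""
    facts, stages = set(), []
    while True:
        nxt = facts | step(facts)
        if nxt == facts:
            return stages
        stages.append(nxt - facts)
        facts = nxt

for n, new_atoms in enumerate(bottom_up(), start=1):
    print(n, sorted(new_atoms, key=str))
```

Under this encoding each stage contributes exactly one atom: first the magic atoms for the successive suffixes of the list, then the listlen atoms in the opposite order.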

It can be noted that in the first part of the computation the optimized program computes the closed interpretation I_{ListLen,[a,b,b,a]}, and in the last part uses it in order to optimize the computation.

Sum of a List of Numbers

Consider the program ListSum, the call patterns:

listsum(+,-)
sum(+,+,-)

and the query listsum([s(0),s(s(0))], Sum). The optimized program is:

listsum([],0) ← listsum’([])
listsum([X|Xs],Sum) ← listsum’([X|Xs]),listsum(Xs,PSum),sum(PSum,X,Sum)
sum(X,0,X) ← sum’(X,0),nat(X)
sum(X,s(Y),s(Z)) ← sum’(X,s(Y)),sum(X,Y,Z)
listsum’(Xs) ← listsum’([X|Xs])
sum’(PSum,X) ← listsum’([X|Xs]),listsum(Xs,PSum)
sum’(X,Y) ← sum’(X,s(Y))
listsum’([s(0),s(s(0))]) ←

The bottom-up evaluation of the optimized program is:

T_P^1(∅) = {listsum’([s(0), s(s(0))])}
T_P^2(∅) = T_P^1(∅) ∪ {listsum’([s(s(0))])}
T_P^3(∅) = T_P^2(∅) ∪ {listsum’([])}
T_P^4(∅) = T_P^3(∅) ∪ {listsum([], 0)}
T_P^5(∅) = T_P^4(∅) ∪ {sum’(0, s(s(0)))}
T_P^6(∅) = T_P^5(∅) ∪ {sum’(0, s(0))}
T_P^7(∅) = T_P^6(∅) ∪ {sum’(0, 0)}
T_P^8(∅) = T_P^7(∅) ∪ {sum(0, 0, 0)}
T_P^9(∅) = T_P^8(∅) ∪ {sum(0, s(0), s(0))}
T_P^10(∅) = T_P^9(∅) ∪ {sum(0, s(s(0)), s(s(0)))}
T_P^11(∅) = T_P^10(∅) ∪ {listsum([s(s(0))], s(s(0)))}
T_P^12(∅) = T_P^11(∅) ∪ {sum’(s(s(0)), s(0))}
T_P^13(∅) = T_P^12(∅) ∪ {sum’(s(s(0)), 0)}
T_P^14(∅) = T_P^13(∅) ∪ {sum(s(s(0)), 0, s(s(0)))}
T_P^15(∅) = T_P^14(∅) ∪ {sum(s(s(0)), s(0), s(s(s(0))))}
T_P^16(∅) = T_P^15(∅) ∪ {listsum([s(0), s(s(0))], s(s(s(0))))}

In this case the computation of the closed interpretation is interlaced with the computation of the interesting part of the least Herbrand model.

Ancestors

Consider the following program Ancestor:

ancestor(X,Y) ← parent(X,Y)
ancestor(X,Y) ← parent(X,Z),ancestor(Z,Y)

where parent is a base relation. Consider the moding ancestor(+,-) and the query ancestor(f,Y). The optimized program is:

ancestor(X,Y) ← ancestor’(X),parent(X,Y)
ancestor(X,Y) ← ancestor’(X),parent(X,Z),ancestor(Z,Y)
ancestor’(Y) ← parent(X,Y),ancestor’(X)
ancestor’(f) ←

If we suppose the following definition for the base relation parent:

parent(a,b) ←
parent(a,c) ←
parent(a,d) ←
parent(e,b) ←
parent(e,c) ←
parent(e,d) ←
parent(f,a) ←
parent(f,g) ←
parent(h,e) ←
parent(h,i) ←

the computation is:

T_P^1(∅) = {ancestor’(f)}
T_P^2(∅) = T_P^1(∅) ∪ {ancestor’(a), ancestor’(g), ancestor(f, a), ancestor(f, g)}
T_P^3(∅) = T_P^2(∅) ∪ {ancestor’(b), ancestor’(c), ancestor’(d), ancestor(f, b), ancestor(f, c), ancestor(f, d)}
T_P^4(∅) = T_P^3(∅)
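Since the Ancestor program is function-free, its bottom-up evaluation can be run with a very small generic evaluator. The sketch below is an illustration under stated assumptions, not the authors' implementation: atoms are tuples, variables are strings starting with an upper-case letter, base facts are written as rules with an empty body, and the names `match`, `consequences` and `fixpoint` are hypothetical. Unlike the summary table above, it also records the intermediate atoms ancestor(a, ·) from which the answers ancestor(f, b), ancestor(f, c) and ancestor(f, d) are derived.

```python
def is_var(term):
    """Variables are strings that start with an upper-case letter (a convention assumed here)."""
    return isinstance(term, str) and term[:1].isupper()

def match(atom, fact, env):
    """Extend binding `env` so that `atom` equals the ground `fact`, or return None."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    env = dict(env)
    for a, f in zip(atom[1:], fact[1:]):
        if is_var(a):
            if env.setdefault(a, f) != f:
                return None
        elif a != f:
            return None
    return env

def consequences(rules, facts):
    """Heads of all rule instances whose body atoms are all in `facts`."""
    out = set()
    for head, body in rules:
        envs = [{}]
        for atom in body:
            envs = [e2 for e in envs for f in facts
                    if (e2 := match(atom, f, e)) is not None]
        for env in envs:
            out.add((head[0],) + tuple(env[t] if is_var(t) else t for t in head[1:]))
    return out

def fixpoint(rules):
    """Naive bottom-up iteration from the empty set."""
    facts = set()
    while True:
        nxt = facts | consequences(rules, facts)
        if nxt == facts:
            return facts
        facts = nxt

PARENT = [("a", "b"), ("a", "c"), ("a", "d"), ("e", "b"), ("e", "c"),
          ("e", "d"), ("f", "a"), ("f", "g"), ("h", "e"), ("h", "i")]

RULES = [(("parent", x, y), []) for x, y in PARENT] + [
    (("ancestor", "X", "Y"), [("ancestor'", "X"), ("parent", "X", "Y")]),
    (("ancestor", "X", "Y"), [("ancestor'", "X"), ("parent", "X", "Z"), ("ancestor", "Z", "Y")]),
    (("ancestor'", "Y"), [("parent", "X", "Y"), ("ancestor'", "X")]),
    (("ancestor'", "f"), []),                     # seed for the query ancestor(f,Y)
]

model = fixpoint(RULES)
answers = {f[2] for f in model if f[0] == "ancestor" and f[1] == "f"}
magic = {f[1] for f in model if f[0] == "ancestor'"}
print(sorted(answers), sorted(magic))
```

The nodes e, h and i never receive a magic atom, so no ancestor atom is computed for them: this is the goal-directed pruning that the transformation is meant to provide.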

However, we obtain a different optimized program if we consider the moding ancestor(-,+) and the query ancestor(X,b):

ancestor(X,Y) ← ancestor’(Y),parent(X,Y)
ancestor(X,Y) ← ancestor’(Y),parent(X,Z),ancestor(Z,Y)
ancestor’(X) ← parent(X,Y),ancestor’(Y)
ancestor’(b) ←

The computation is:

T_P^1(∅) = {ancestor’(b)}
T_P^2(∅) = T_P^1(∅) ∪ {ancestor’(a), ancestor’(e), ancestor(a, b), ancestor(e, b)}
T_P^3(∅) = T_P^2(∅) ∪ {ancestor’(f), ancestor’(h), ancestor(f, b), ancestor(h, b)}
T_P^4(∅) = T_P^3(∅)

As we can see, different call patterns generate different optimized programs. In general these programs are not equivalent.

Powers

Consider now the following program Power, which computes x^y, where x and y are natural numbers:

power(X,0,s(0)) ←
power(X,s(Y),Z) ← power(X,Y,W),times(X,W,Z)
times(X,0,0) ←
times(X,s(Y),Z) ← times(X,Y,W),sum(X,W,Z)
sum(X,0,X) ←
sum(X,s(Y),s(Z)) ← sum(X,Y,Z)

If we consider the call patterns:

power(+,+,-)
times(+,+,-)
sum(+,+,-)

and the query power(s(s(0)),s(s(0)),Z), the optimized program is:

power(X,0,s(0)) ← power’(X,0)
power(X,s(Y),Z) ← power’(X,s(Y)),power(X,Y,W),times(X,W,Z)
times(X,0,0) ← times’(X,0)
times(X,s(Y),Z) ← times’(X,s(Y)),times(X,Y,W),sum(X,W,Z)
sum(X,0,X) ← sum’(X,0)
sum(X,s(Y),s(Z)) ← sum’(X,s(Y)),sum(X,Y,Z)
power’(X,Y) ← power’(X,s(Y))
times’(X,W) ← power(X,Y,W),power’(X,s(Y))
times’(X,Y) ← times’(X,s(Y))
sum’(X,W) ← times(X,Y,W),times’(X,s(Y))
sum’(X,Y) ← sum’(X,s(Y))
power’(s(s(0)),s(s(0))) ←

The computation is:

T_P^1(∅) = {power’(s(s(0)), s(s(0)))}
T_P^2(∅) = T_P^1(∅) ∪ {power’(s(s(0)), s(0))}
T_P^3(∅) = T_P^2(∅) ∪ {power’(s(s(0)), 0)}
T_P^4(∅) = T_P^3(∅) ∪ {power(s(s(0)), 0, s(0))}
T_P^5(∅) = T_P^4(∅) ∪ {times’(s(s(0)), s(0))}
T_P^6(∅) = T_P^5(∅) ∪ {times’(s(s(0)), 0)}
T_P^7(∅) = T_P^6(∅) ∪ {times(s(s(0)), 0, 0)}
T_P^8(∅) = T_P^7(∅) ∪ {sum’(s(s(0)), 0)}
T_P^9(∅) = T_P^8(∅) ∪ {sum(s(s(0)), 0, s(s(0)))}
T_P^10(∅) = T_P^9(∅) ∪ {times(s(s(0)), s(0), s(s(0)))}
T_P^11(∅) = T_P^10(∅) ∪ {power(s(s(0)), s(0), s(s(0)))}
T_P^12(∅) = T_P^11(∅) ∪ {times’(s(s(0)), s(s(0)))}
T_P^13(∅) = T_P^12(∅) ∪ {sum’(s(s(0)), s(s(0)))}
T_P^14(∅) = T_P^13(∅) ∪ {sum’(s(s(0)), s(0))}
T_P^15(∅) = T_P^14(∅) ∪ {sum(s(s(0)), s(0), s(s(s(0))))}
T_P^16(∅) = T_P^15(∅) ∪ {sum(s(s(0)), s(s(0)), s(s(s(s(0)))))}
T_P^17(∅) = T_P^16(∅) ∪ {times(s(s(0)), s(s(0)), s(s(s(s(0)))))}
T_P^18(∅) = T_P^17(∅) ∪ {power(s(s(0)), s(s(0)), s(s(s(s(0)))))}
T_P^19(∅) = T_P^18(∅)

It is interesting to note that the computation is, in this case, really close to that generated by a functional program with lazy evaluation.

Binary Search

Consider the following program Search, implementing the dichotomic (or binary) search on a list of pairs (Key, Value) ordered with respect to Key:

search(N,Xs,M) ← divide(Xs,Xs1,X,Y,Xs2),switch(N,X,Y,Xs1,Xs2,M)
switch(N,N,M,Xs1,Xs2,M) ← key(N),value(M)
switch(N,X,Y,Xs1,Xs2,M) ← greater(N,X),search(N,Xs2,M)
switch(N,X,Y,Xs1,Xs2,M) ← greater(X,N),search(N,Xs1,M)

where key and value are base relations. Observe that the program is not completely specified, as the relations divide and greater have no definition. If we consider the following call patterns:

search(+,+,-)
switch(+,+,+,+,+,-)

and the query search(5,[(1,a),(3,b),(5,a),(10,c)],M), the optimized program is:

search(N,Xs,M) ← search’(N,Xs),divide(Xs,Xs1,X,Y,Xs2),switch(N,X,Y,Xs1,Xs2,M)
switch(N,N,M,Xs1,Xs2,M) ← switch’(N,N,M,Xs1,Xs2),key(N),value(M)
switch(N,X,Y,Xs1,Xs2,M) ← switch’(N,X,Y,Xs1,Xs2),greater(N,X),search(N,Xs2,M)
switch(N,X,Y,Xs1,Xs2,M) ← switch’(N,X,Y,Xs1,Xs2),greater(X,N),search(N,Xs1,M)
switch’(N,X,Y,Xs1,Xs2) ← divide(Xs,Xs1,X,Y,Xs2),search’(N,Xs)
search’(N,Xs2) ← N>X,switch’(N,X,Y,Xs1,Xs2)
search’(N,Xs1) ← N<X,switch’(N,X,Y,Xs1,Xs2)
search’(5,[(1,a),(3,b),(5,a),(10,c)]) ←

The computation is the following:

T_P^1(∅) = {search’(5, [(1, a), (3, b), (5, a), (10, c)])}
T_P^2(∅) = T_P^1(∅) ∪ {switch’(5, 3, b, [(1, a)], [(5, a), (10, c)])}
T_P^3(∅) = T_P^2(∅) ∪ {search’(5, [(5, a), (10, c)])}
T_P^4(∅) = T_P^3(∅) ∪ {switch’(5, 5, a, [], [(10, c)])}
T_P^5(∅) = T_P^4(∅) ∪ {switch(5, 5, a, [], [(10, c)], a)}
T_P^6(∅) = T_P^5(∅) ∪ {search(5, [(5, a), (10, c)], a)}
T_P^7(∅) = T_P^6(∅) ∪ {switch(5, 3, b, [(1, a)], [(5, a), (10, c)], a)}
T_P^8(∅) = T_P^7(∅) ∪ {search(5, [(1, a), (3, b), (5, a), (10, c)], a)}
T_P^9(∅) = T_P^8(∅)

Fibonacci Numbers

Consider the following program, which computes the Fibonacci numbers:

fib(0,0) ←
fib(s(0),s(0)) ←
fib(s(s(X)),Y) ← fib(s(X),Y1),fib(X,Y2),sum(Y1,Y2,Y)
sum(X,0,X) ←
sum(X,s(Y),s(Z)) ← sum(X,Y,Z)

with the moding:

fib(+,-)
sum(+,+,-)

and the query fib(s(s(s(0))),Y). The optimized program is:

fib’(s(s(s(0)))) ←
fib’(s(X)) ← fib’(s(s(X)))
fib’(X) ← fib’(s(s(X))),fib(s(X),Y1)
sum’(Y1,Y2) ← fib’(s(s(X))),fib(s(X),Y1),fib(X,Y2)
sum’(X,Y) ← sum’(X,s(Y))
fib(0,0) ← fib’(0)
fib(s(0),s(0)) ← fib’(s(0))
fib(s(s(X)),Y) ← fib’(s(s(X))),fib(s(X),Y1),fib(X,Y2),sum(Y1,Y2,Y)
sum(X,0,X) ← sum’(X,0)
sum(X,s(Y),s(Z)) ← sum’(X,s(Y)),sum(X,Y,Z)

The computation is the following:

T_P^1(∅) = {fib’(s(s(s(0))))}
T_P^2(∅) = T_P^1(∅) ∪ {fib’(s(s(0)))}
T_P^3(∅) = T_P^2(∅) ∪ {fib’(s(0))}
T_P^4(∅) = T_P^3(∅) ∪ {fib’(0), fib(s(0), s(0))}
T_P^5(∅) = T_P^4(∅) ∪ {fib(0, 0)}
T_P^6(∅) = T_P^5(∅) ∪ {sum’(s(0), 0)}
T_P^7(∅) = T_P^6(∅) ∪ {sum(s(0), 0, s(0))}
T_P^8(∅) = T_P^7(∅) ∪ {fib(s(s(0)), s(0))}
T_P^9(∅) = T_P^8(∅) ∪ {sum’(s(0), s(0))}
T_P^10(∅) = T_P^9(∅) ∪ {sum(s(0), s(0), s(s(0)))}
T_P^11(∅) = T_P^10(∅) ∪ {fib(s(s(s(0))), s(s(0)))}
T_P^12(∅) = T_P^11(∅)

Here we can observe that the magic-sets transformation is also suitable for non-linear recursive programs, i.e. programs with more than one mutually recursive body atom. Once again we can see that the computation is “lazy”.

7 Conclusions

In this paper, we introduced a method for proving partial correctness, revised another method for total correctness, and applied both to the case study of the magic-sets transformation for goal-driven bottom-up computing. The obtained results rely on purely declarative reasoning, abstracting away from procedural semantics, and are new from various points of view. First, partial correctness is obtained without any assumption that the program respects the given moding. Second, termination is obtained under the sole assumptions of well-modedness, which is natural in practical bottom-up computing, and acceptability, which is a necessary and sufficient condition for top-down termination. Moreover, both partial correctness and termination are established for logic programs in full generality, and not only for function-free Datalog programs.

Further research may be pursued on the topics of this paper. For instance, we are confident that the same kind of result can be established for other variants of the magic-sets transformation technique and also for extensions of it to general logic programs (i.e. logic programs with negation in the body of the clauses). Moreover, it is interesting to investigate whether other optimization techniques may be defined using the concept of base.

Acknowledgements

Thanks are owing to Yeoshua Sagiv for useful discussions.

References

1. K.R. Apt. Logic programming. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, volume B, pages 493–574. Elsevier, 1990.
2. K.R. Apt. Declarative programming in Prolog. In D. Miller, editor, Proc. International Symposium on Logic Programming, pages 11–35. MIT Press, 1993.
3. K.R. Apt. Program Verification and Prolog. In E. Börger, editor, Specification and Validation methods for Programming languages and systems. Oxford University Press, 1994.
4. K.R. Apt, M. Gabbrielli, and D. Pedreschi. A Closer Look at Declarative Interpretations. Technical Report CS-R9470, Centre for Mathematics and Computer Science, Amsterdam. Journal of Logic Programming, 28(2):147–180, 1996.
5. K.R. Apt and E. Marchiori. Reasoning about Prolog programs: from modes through types to assertions. Formal Aspects of Computing, 6A:743–764, 1994.
6. K.R. Apt and D. Pedreschi. Reasoning about termination of pure Prolog programs. Information and Computation, 106(1):109–157, 1993.
7. K.R. Apt and D. Pedreschi. Modular termination proofs for logic and pure Prolog programs. In G. Levi, editor, Advances in Logic Programming Theory, pages 183–229. Oxford University Press, 1994.
8. A. Bossi and N. Cocco. Verifying Correctness of Logic Programs. In J. Diaz and F. Orejas, editors, TAPSOFT ’89, volume 352 of Lecture Notes in Computer Science, pages 96–110. Springer-Verlag, Berlin, 1989.
9. C. Beeri and R. Ramakrishnan. The power of magic. In Proc. 6th ACM SIGMOD-SIGACT Symposium on Principles of Database Systems, pages 269–283. The Association for Computing Machinery, New York, 1987.
10. F. Bronsard, T.K. Lakshman, and U.S. Reddy. A framework of directionality for proving termination of logic programs. In K.R. Apt, editor, Proceedings of the Joint International Conference and Symposium on Logic Programming, pages 321–335. MIT Press, 1992.
11. P. Deransart. Proof methods of declarative properties of definite programs. Theoretical Computer Science, 118:99–166, 1993.
12. J.W. Lloyd. Foundations of Logic Programming. Springer-Verlag, Berlin, second edition, 1987.
13. P. Mascellani. Declarative Verification of General Logic Programs. In Proceedings of the Student Session, ESSLLI-2000. Birmingham UK, 2000.
14. P. Mascellani and D. Pedreschi. Proving termination of Prolog programs. In Proceedings 1994 Joint Conf. on Declarative Programming GULP-PRODE ’94, pages 46–61, 1994.
15. P. Mascellani and D. Pedreschi. Total correctness of Prolog programs. In F.S. de Boer and M. Gabbrielli, editors, Proceedings of the W2 Post-Conference Workshop ICLP’94. Vrije Universiteit Amsterdam, 1994.
16. D. Pedreschi. Verification of Logic Programs. In M.I. Sessa, editor, Ten Years of Logic Programming in Italy, pages 211–239. Palladio, 1995.
17. D. Pedreschi and S. Ruggieri. Verification of Logic Programs. Journal of Logic Programming, 39(1-3):125–176, April 1999.
18. S. Ruggieri. Proving (total) correctness of Prolog programs. In F.S. de Boer and M. Gabbrielli, editors, Proceedings of the W2 Post-Conference Workshop ICLP’94. Vrije Universiteit Amsterdam, 1994.
19. J.D. Ullman. Principles of Database and Knowledge-base Systems, Volume I. Principles of Computer Science Series. Computer Science Press, 1988.
20. J.D. Ullman. Principles of Database and Knowledge-base Systems, Volume II: The New Technologies. Principles of Computer Science Series. Computer Science Press, 1989.

Key Constraints and Monotonic Aggregates in Deductive Databases

Carlo Zaniolo

Computer Science Department, University of California at Los Angeles, Los Angeles, CA 90095
[email protected]
http://www.cs.ucla.edu/~zaniolo

Abstract. We extend the ﬁxpoint and model-theoretic semantics of logic programs to include unique key constraints in derived relations. This extension increases the expressive power of Datalog programs, while preserving their declarative semantics and eﬃcient implementation. The greater expressive power yields a simple characterization for the notion of set aggregates, including the identiﬁcation of aggregates that are monotonic with respect to set containment and can thus be used in recursive logic programs. These new constructs are critical in many applications, and produce simple logic-based formulations for complex algorithms that were previously believed to be beyond the realm of declarative logic.

1 Introduction

The basic relational data model consists of a set of tables (or base relations) and of a query language, such as SQL or Datalog, from which new relations can be derived. Unique keys can be declared to enforce functional dependency constraints on base relations, and their important role in database schema design has been recognized for a long time [1,28]. However, little attention has been paid so far to the use of unique keys, or functional dependencies, in derived relations. This paper shows that keys in derived relations increase signiﬁcantly the expressive power of the query languages used to deﬁne such relations and this additional power yields considerable beneﬁts. In particular, it produces a formal treatment of database aggregates, including user-deﬁned aggregates, and monotonic aggregates, which can be used without restrictions in recursive queries to express complex algorithms that were previously considered problematic for Datalog and SQL.

2 Keys on Derived Relations

A.C. Kakas, F. Sadri (Eds.): Computat. Logic (Kowalski Festschrift), LNAI 2408, pp. 109–134, 2002. © Springer-Verlag Berlin Heidelberg 2002

For example, consider a database containing relations student(Name, Major), and professor(Name, Major). In fact, let us consider the following microcollege example that only has three facts:

student(JimBlack, ee).
professor(ohm, ee).
professor(bell, ee).

Now, the rule is that the major of a student must match his/her advisor’s main area of specialization. Then, eligible advisors can be computed as follows:

elig_adv(S, P) ← student(S, Majr), professor(P, Majr).

Now the answer to a query ?elig_adv(S, P) is

{elig_adv(JimBlack, ohm), elig_adv(JimBlack, bell)}

But a student can only have one advisor. We can express this constraint by requiring that the first argument be a unique key for the advisor relation. We denote this constraint by the notation

unique_key(advisor, [1])!

Thus, the first argument of unique_key specifies the predicate restricted by the key, and the second argument gives the list of the argument positions that compose the key. An empty list denotes that the derived relation can only contain a single tuple. The exclamation mark is used as the punctuation mark for key constraints. We can now write the following program for our microcollege:

Example 1. For each student select one advisor from professors in the same area

unique_key(advisor, [1])!
advisor(S, P) ← student(S, Majr), professor(P, Majr).
student(JimBlack, ee).
professor(ohm, ee).
professor(bell, ee).

Since the key condition ensures that there is only one professor for JimBlack in the resulting advisor table, our query has two possible answers. One is the set {advisor(JimBlack, ohm)} and the other is the set {advisor(JimBlack, bell)}.

In the next section, we show that positive programs with keys can be characterized naturally by fixpoint semantics containing multiple canonical answers; in Section 4, we show that their meaning can also be modelled by programs with negated goals under stable model semantics. Let us consider now some examples that provide a first illustration of the expressive power brought to logic programming by keys in derived relations. The following program constructs a spanning tree rooted in node a, for a graph stored in a binary relation g as follows:
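For a non-recursive rule such as the advisor rule of Example 1, the alternative answers can be enumerated directly: group the unconstrained results by key value and pick one tuple per group. The sketch below illustrates only this special case under that assumption (it is not the general fixpoint construction of Section 3); the function name `canonical_answers` is hypothetical.

```python
from itertools import product

students = {("JimBlack", "ee")}
professors = {("ohm", "ee"), ("bell", "ee")}

# Result of the rule without the key constraint (the elig_adv relation).
eligible = {(s, p) for s, major in students
            for p, area in professors if major == area}

def canonical_answers(pairs):
    """All maximal subsets of `pairs` in which the first argument is a unique key."""
    by_key = {}
    for student, professor in sorted(pairs):
        by_key.setdefault(student, []).append(professor)
    keys = sorted(by_key)
    return [set(zip(keys, choice))
            for choice in product(*(by_key[k] for k in keys))]

for answer in canonical_answers(eligible):
    print(answer)
```

With the three facts of the microcollege this prints two one-tuple relations, one per eligible professor.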


Example 2. Computing spanning trees

unique_key(tree, [2])!
tree(root, a).
tree(Y, Z) ← tree(X, Y), g(Y, Z).
g(a, b). g(b, c). g(a, c).

Two different spanning trees can be derived, as follows:

{tree(root, a), tree(a, b), tree(b, c)}
{tree(root, a), tree(a, b), tree(a, c)}

More than one key can be declared for each derived relation. For instance, let us add a second key, unique_key(tree, [1]), to the previous graph example. Then, the result may no longer be a spanning tree; instead, it is a simple path, where for each source node there is only one sink node and vice versa:

Example 3. Computing simple paths

unique_key(spath, [1])!
unique_key(spath, [2])!
spath(root, X) ← g(X, Y).
spath(Y, Z) ← spath(X, Y), g(Y, Z).
freenode ← g(_, Y), ¬spath(_, Y).

The last rule in Example 3, above, detects whether any node remains free, i.e., whether there is a node not touched by the simple path. Now, a query on whether, for some simple path, there is no free node (i.e., is ¬freenode true?) can be used to decide the Hamiltonian path problem for our graph; this is an NP-complete problem. An equivalent way to pose the same question is asking whether freenode is true for all solutions. A system that generates all possible paths and returns a positive answer when freenode holds for all paths implements an all-answer semantics. This example illustrates how exponential problems can be expressed in Datalog with keys under this semantics [14]. Polynomial time problems, however, are best treated using single-answer semantics, since this can be supported in polynomial time for Datalog programs with key constraints and stratified negation, as discussed later in this paper; moreover, these programs can express all the queries that are polynomial in the size of the database, i.e., the queries in the class DB-PTIME [1].

Under single-answer semantics, a deductive system is only expected to compute one out of the many existing canonical models for a program, and return an answer based on this particular model. For certain programs, this approach results in different query answers being returned for different canonical models computed by the system: these are nondeterministic queries. For other programs, however, the query answer remains the same for all canonical models: these are deterministic queries. This is, for instance, the case of the parity query below, which determines whether a non-empty database relation b(X) has an even number of tuples:

Example 4. Counting mod 2

unique_key(chain, [1])!
unique_key(chain, [2])!
chain(nil, X) ← b(X).
chain(X, Y) ← chain(_, X), b(Y).
ca(Y, odd) ← chain(nil, Y).
ca(Y, even) ← ca(X, odd), chain(X, Y).
ca(Y, odd) ← ca(X, even), chain(X, Y).
mod2(Parity) ← ca(Y, Parity), ¬chain(Y, _).

Observe that this program consists of three parts. The first part is the chain rules that enumerate the elements of b(X) one by one. The second part is the ca rules that perform a specific aggregate-like computation on the elements of chain, i.e., the odd/even computation for the parity query. The third part is the mod2 rule that uses negation to detect the element of the chain without a successor, and to return the aggregate value ‘odd’ or ‘even’ from that of its final element. We will later generalize this pattern to express the computation of generic aggregates.

Observe that the query in Example 4 is deterministic, inasmuch as the answer to the parity question ?mod2(even) is independent of the particular chain being constructed, and only depends on the length of this chain, which is determined by the cardinality of b(X). The parity query is a well-known polynomial query that cannot be answered by Datalog with stratified negation under the genericity assumption [1]. Furthermore, the chain predicate illustrates how the elements of a domain can be arranged in a total order; we thus conclude that negation-stratified Datalog with key constraints can express all DB-PTIME queries [1]. In a nutshell, key constraints under single-answer semantics extend the expressive power of logic programs, and find important new applications. Of particular importance is the definition of set-aggregates.
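The determinism of the parity query can be illustrated by simulating every chain that the two key constraints allow. In the sketch below each permutation of b stands for one canonical model; this correspondence, the function name `mod2` and the sample relation are assumptions made for illustration, and the code mirrors the chain, ca and mod2 rules rather than evaluating them with a Datalog engine.

```python
from itertools import permutations

def mod2(order):
    """Parity computed along one chain that visits the elements of b in `order`."""
    order = tuple(order)
    # chain(nil,x1), chain(x1,x2), ...: one successor and one predecessor per element
    chain = list(zip(("nil",) + order, order))
    ca = {}
    for prev, cur in chain:
        if prev == "nil":
            ca[cur] = "odd"                                   # ca(Y,odd) <- chain(nil,Y)
        else:
            ca[cur] = "even" if ca[prev] == "odd" else "odd"  # the two alternating ca rules
    last = order[-1]            # the only element Y with no chain(Y,_) successor
    return ca[last]

b = ("p", "q", "r", "s")        # a sample non-empty relation b(X)
answers = {mod2(order) for order in permutations(b)}
print(answers)
```

Every one of the 24 chains over the four sample elements ends in the same parity, which is why a single-answer evaluation suffices for this query.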
While aggregates have been used extensively in database applications, particularly in decision support and data mining applications, a general treatment of this fundamental concept had, so far, been lacking and is presented in this paper.

2.1 Basic Definitions

We assume that the reader is familiar with the relational data model and Datalog [1,36]. A logic program P/K consists of a set of rules, P, and a set of key constraints K; each such constraint has the form unique_key(q, γ), where q is the name of a predicate in P and γ is a subset of the arguments of q. Let I be an interpretation of P; we say that I satisfies the constraint unique_key(q, γ) when no two distinct q-atoms in I are identical in all their γ arguments. The notation I |= K will be used to denote that I satisfies every key constraint in K.

The basic semantics of a positive Datalog program P consists of evaluating “in parallel” all applicable instantiations of P’s rules. This semantics is formalized by the Immediate Consequences Operator, $T_P$, that defines a mapping over the (Herbrand) interpretations of P, as follows:

$T_P(I) = \{\, A \mid A \leftarrow B_1, \ldots, B_n \in ground(P) \wedge B_1 \in I \wedge \ldots \wedge B_n \in I \,\}$.

A rule r ∈ ground(P) is said to be enabled by the interpretation I when all its goals are contained in I. Thus the operator $T_P(I)$ returns the set of the heads of rules enabled by I. The upward powers of $T_P$ starting from an interpretation I are defined as follows:

$T_P^{\uparrow 0}(I) = I$
$T_P^{\uparrow (i+1)}(I) = T_P(T_P^{\uparrow i}(I))$, for $i \ge 0$
$T_P^{\uparrow \omega}(I) = \bigcup_{i \ge 0} T_P^{\uparrow i}(I)$.
The semantics of a positive program is defined by the least fixpoint of $T_P$, denoted $lfp(T_P)$, which is also equal to the least model of P, denoted $M_P$ [29]. The least fixpoint of $T_P$ can be computed as the ω-power of $T_P$ applied to the empty set: i.e., $lfp(T_P) = T_P^{\uparrow \omega}(\emptyset)$. The inflationary version of the $T_P$ operator is denoted $\mathbf{T}_P$ and defined as follows:

$\mathbf{T}_P(I) = T_P(I) \cup I$

For positive programs, we have:

$T_P^{\uparrow \omega}(\emptyset) = \mathbf{T}_P^{\uparrow \omega}(\emptyset) = M_P = lfp(T_P) = lfp(\mathbf{T}_P)$

The equivalence of model-theoretic and fixpoint semantics no longer holds in Datalog¬ programs, which allow the use of negated goals in rules. Various semantics have therefore been proposed for Datalog¬ programs. For instance, the inflationary semantics, which adopts $\mathbf{T}_P^{\uparrow \omega}(\emptyset)$ as the meaning of a program P, can be implemented efficiently but lacks desirable logical properties [1]. On the other hand, stratified negation is widely used and combines desirable computational and logical properties [22]; however, stratified negation severely restricts the class of programs that one can write. Formal semantics for more general classes of programs are also available [10,30,2]. Because of its generality and support for nondeterminism, we will use here the stable model semantics, which is defined via a stability transformation [10], as discussed next.

Given an interpretation I and a Datalog¬ program P, the stability transformation derives the positive program $ground_I(P)$ by modifying the rules of ground(P) as follows:

– drop all clauses with a negative literal ¬A in the body with A ∈ I, and
– drop all negative literals in the body of the remaining clauses.

Next, an interpretation M is a stable model for a Datalog¬ program P iff M is the least model of the program $ground_M(P)$. In general, Datalog¬ programs may have zero, one, or many stable models. We shall see how the multiplicity of stable models can be exploited to give a declarative account of non-determinism.
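The two steps of the stability transformation translate directly into a brute-force test for stable models of a small ground program. The sketch below is only an illustration: rules are assumed to be triples (head, positive body, negative body), candidate interpretations are enumerated exhaustively (exponential in the number of atoms), and the two-rule program is a hypothetical example rather than one taken from the paper.

```python
from itertools import combinations

def least_model(positive_program):
    """Least model of a ground positive program given as (head, body) pairs."""
    model = set()
    while True:
        new = {head for head, body in positive_program if set(body) <= model}
        if new <= model:
            return model
        model |= new

def reduct(program, interp):
    """Stability transformation: drop rules blocked by `interp`, then drop negative literals."""
    return [(head, pos) for head, pos, neg in program if not set(neg) & interp]

def stable_models(program):
    """All interpretations M such that M is the least model of the reduct w.r.t. M."""
    atoms = sorted({head for head, _, _ in program}
                   | {a for _, pos, neg in program for a in list(pos) + list(neg)})
    models = []
    for size in range(len(atoms) + 1):
        for candidate in combinations(atoms, size):
            m = set(candidate)
            if least_model(reduct(program, m)) == m:
                models.append(m)
    return models

# p <- not q.   q <- not p.
PROGRAM = [("p", [], ["q"]), ("q", [], ["p"])]
print(stable_models(PROGRAM))
```

The example program has two stable models, one containing only p and one containing only q, which is the kind of multiplicity used later to model nondeterministic choice.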

3 Fixpoint Semantics

We use the notation P/K to denote a logic program P constrained by the set of unique keys K. We make no distinction between interpretations of P and interpretations of P/K; thus every $I \subseteq B_P$ is an interpretation for P/K. Since a program with key constraints can have multiple interpretations, we will now introduce the concept of family of interpretations. A family of interpretations for P is defined as a non-empty set of maximal interpretations for P. More formally:

Definition 1. Let $\Sigma$ be a nonempty set of interpretations for P where no element in $\Sigma$ is a subset of another. Then $\Sigma$ is called a family of interpretations for P. The set of families of interpretations for P will be denoted by fins(P).

For instance, let P be the program:

a.
b ← a.

Then fins(P) consists of the following families of interpretations:

1. {{}}
2. {{a}}
3. {{b}}
4. {{a}, {b}}
5. {{a, b}}

3.1 Lattice

The set fins(P) can be partially ordered as follows:

Definition 2. Let $\Sigma_1$ and $\Sigma_2$ be two elements of fins(P). If $\forall I_1 \in \Sigma_1, \exists I_2 \in \Sigma_2$ s.t. $I_1 \subseteq I_2$, then we say that $\Sigma_1$ is a subfamily of $\Sigma_2$ and write $\Sigma_1 \sqsubseteq \Sigma_2$.

Now, $(\sqsubseteq, fins(P))$ is a partial order, and also a complete lattice, with least upper bound (lub):

$\Sigma_1 \sqcup \Sigma_2 = \{ I \in \Sigma_1 \mid \neg\exists I_2 \in \Sigma_2 \text{ s.t. } I_2 \supset I \} \cup \{ I \in \Sigma_2 \mid \neg\exists I_1 \in \Sigma_1 \text{ s.t. } I_1 \supseteq I \}$

The greatest lower bound (glb) is:

$\Sigma_1 \sqcap \Sigma_2 = \{ I_1 \cap I_2 \mid I_1 \in \Sigma_1, I_2 \in \Sigma_2 \text{ and } \neg(\exists I' \in \Sigma_1, \exists I'' \in \Sigma_2 \text{ s.t. } I' \cap I'' \supset I_1 \cap I_2) \}$

These two operations are easily extended to families with infinitely many elements; thus we have a complete lattice, with $\{B_P\}$ as top and $\{\emptyset\}$ as bottom.

3.2 Fixpoint Semantics of Positive Programs with Keys

Let us consider first the case of positive programs P without key constraints, by revisiting the computation of the successive powers of $T_P$, where $T_P$ denotes the immediate consequence operator for P. We will also use the inflationary version of this operator, which was previously defined as $\mathbf{T}_P(I) = T_P(I) \cup I$. The computation $\mathbf{T}_P^{\uparrow\omega}(\emptyset) = T_P^{\uparrow\omega}(\emptyset)$ generates an ascending chain; if I is the result obtained at the last step, the application of $\mathbf{T}_P(I)$ adds to the old I the set of new tuples $T_P(I) - I$, all at once. We next define an operator where the new consequences are added one by one; this will be called the Atomic Consequence Operator (ACO), $\mathbb{T}_P$, which is a mapping on families of interpretations. For a singleton set {I}, $\mathbb{T}_P$ is defined as follows:

$$\mathbb{T}_P(\{I\}) = \{I' \mid \exists x \in [T_P(I) - I] \text{ s.t. } I' = I \cup \{x\}\} \sqcup \{I\}$$

Then, for a family of sets, $\Im$, we have

$$\mathbb{T}_P(\Im) = \bigsqcup_{I \in \Im} \mathbb{T}_P(\{I\})$$

Therefore, our new operator adds to I a single new consequence atom from $T_P(I) - I$, when this is not empty; thus, it produces a family of interpretations from a singleton family {I}. When $T_P(I) = I$, then, by the above definition, $\mathbb{T}_P(\{I\}) = \{I\}$. The following result follows immediately from the definitions:

Proposition 1. Let P be a positive logic program without keys. Then, $\mathbb{T}_P$ defines a mapping that is monotonic and also continuous.

Since we have a continuous mapping in a complete lattice, the well-known Knaster-Tarski theorem, and related fixpoint results, can be used to conclude that there always exist solutions of the fixpoint equation $\Im = \mathbb{T}_P(\Im)$, and that there also exists the least of such solutions, called the least fixpoint of $\mathbb{T}_P$. The least fixpoint of $\mathbb{T}_P$, denoted $lfp(\mathbb{T}_P)$, can be computed as the ω-power of $\mathbb{T}_P$ starting from the bottom element $\{\emptyset\}$.

Proposition 2. Let P be a positive logic program without key constraints. Then, $\Im = \mathbb{T}_P(\Im)$ has a least fixpoint solution denoted $lfp(\mathbb{T}_P)$, where:

$$lfp(\mathbb{T}_P) = \mathbb{T}_P^{\uparrow\omega}(\{\emptyset\}) = \bigsqcup_{0<j} \mathbb{T}_P^{\uparrow j}(\{\emptyset\}) = \{lfp(T_P)\}$$


Thus, for a positive program without keys, the least fixpoint of $\mathbb{T}_P$ provides an equivalent characterization of the semantics of positive logic programs, since the least fixpoint of $\mathbb{T}_P$ is the singleton set containing the least fixpoint of $T_P$. We now consider the situation of a positive program with keys P/K. The Immediate Consequence Operator (ICO) for this program is obtained by simply ignoring the keys: $T_{P/K}(I) = T_P(I)$. The ACO is defined as follows:

Definition 3. Let P/K be a logic program with key constraints, and let $\{I\} \in fins(P)$ and $\Im \in fins(P)$. Then, $\mathbb{T}_{P/K}(\{I\})$ and $\mathbb{T}_{P/K}(\Im)$ are defined as follows:

$$\mathbb{T}_{P/K}(\{I\}) = \{I' \mid \exists x \in [T_P(I) - I] \text{ s.t. } I' = I \cup \{x\} \text{ and } I' \models K\} \sqcup \{I\}$$

$$\mathbb{T}_{P/K}(\Im) = \bigsqcup_{I \in \Im} \mathbb{T}_{P/K}(\{I\})$$

For instance, if $\mathbb{T}$ denotes the ACO for our tiny college example, then $\mathbb{T}^{\uparrow 1}(\{\emptyset\})$ is simply a family with three singleton sets, one for each fact in the program:

$\mathbb{T}^{\uparrow 1}(\{\emptyset\})$ = {
  {professor(ohm, ee)},
  {professor(bell, ee)},
  {student('JimBlack', ee)} }

Thus, $\mathbb{T}^{\uparrow 2}(\{\emptyset\})$ consists of pairs taken from the three program facts:

$\mathbb{T}^{\uparrow 2}(\{\emptyset\})$ = {
  {professor(bell, ee), professor(ohm, ee)},
  {student('JimBlack', ee), professor(bell, ee)},
  {student('JimBlack', ee), professor(ohm, ee)} }

From the first pair above, we can only obtain a family containing the three original facts; but from the second and third pairs we obtain two different advisors. In fact, we obtain:

$\mathbb{T}^{\uparrow 3}(\{\emptyset\})$ = {
  {student('JimBlack', ee), professor(bell, ee), professor(ohm, ee)},
  {student('JimBlack', ee), professor(bell, ee), advisor('JimBlack', bell)},
  {student('JimBlack', ee), professor(ohm, ee), advisor('JimBlack', ohm)} }

In the next step, these three parallel derivations converge into the following two sets:

$\mathbb{T}^{\uparrow 4}(\{\emptyset\})$ = {
  {student('JimBlack', ee), professor(bell, ee), professor(ohm, ee), advisor('JimBlack', bell)},
  {student('JimBlack', ee), professor(bell, ee), professor(ohm, ee), advisor('JimBlack', ohm)} }
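The derivation above can be replayed mechanically. The Python sketch below is only an illustration under stated assumptions: the program is grounded by hand into string atoms, the key check is specialised to the single student of the example, and all names (`RULES`, `ico`, `keys_ok`, `maximal`, `aco`, `powers`) are hypothetical.

```python
# Hand-grounded tiny college program (assumption: atoms are plain strings).
S, PO, PB = "student(JimBlack,ee)", "professor(ohm,ee)", "professor(bell,ee)"
AO, AB = "advisor(JimBlack,ohm)", "advisor(JimBlack,bell)"
RULES = [(S, set()), (PO, set()), (PB, set()),
         (AO, {S, PO}), (AB, {S, PB})]          # (head, body) pairs

def ico(interp):
    """Immediate consequences: heads of rules whose body holds in interp."""
    return {head for head, body in RULES if body <= interp}

def keys_ok(interp):
    """Key on the first argument of advisor: JimBlack has at most one advisor."""
    return sum(atom.startswith("advisor(") for atom in interp) <= 1

def maximal(sets):
    """Drop every set that is strictly contained in another one."""
    return {s for s in sets if not any(s < t for t in sets)}

def aco(family):
    """One ACO step: extend each member by one new atom that respects the keys."""
    out = set(family)
    for interp in family:
        for atom in ico(interp) - interp:
            candidate = interp | {atom}
            if keys_ok(candidate):
                out.add(candidate)
    return maximal(out)

def powers(limit=10):
    """Successive powers of the ACO, starting from the bottom family {∅}."""
    steps = [{frozenset()}]
    while len(steps) <= limit:
        nxt = aco(steps[-1])
        if nxt == steps[-1]:
            break
        steps.append(nxt)
    return steps

steps = powers()
for n, family in enumerate(steps):
    print(n, sorted(sorted(interp) for interp in family))
```

Under these assumptions the run stops after the fourth power, whose two members differ only in the advisor chosen for the student.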


No set can be further enlarged at the next step, given that the addition of a new advisor would violate the key constraints. So we have $\mathbb{T}^{\uparrow 5}(\{\emptyset\}) = \mathbb{T}^{\uparrow 4}(\{\emptyset\})$, and we have reached the fixpoint. As illustrated by this example, although the operator $\mathbb{T}_{P/K}$ is not monotonic, the ω-power of $\mathbb{T}_{P/K}$ has desirable characteristics that make it the natural choice for the canonical semantics of positive programs with keys. In fact, we have the following property:

Proposition 3. Let P/K be a positive program with key constraints. Then, $\mathbb{T}_{P/K}^{\uparrow\omega}(\{\emptyset\})$ is a fixpoint for $\mathbb{T}_{P/K}$, and each $\{I\} \in \mathbb{T}_{P/K}^{\uparrow\omega}(\{\emptyset\})$ is a minimal fixpoint for $\mathbb{T}_{P/K}$.

Proof: The application of $\mathbb{T}_{P/K}$ to $\mathbb{T}_{P/K}^{\uparrow\omega}(\{\emptyset\})$ can only generate elements which were generated in the ω-derivation. Thus $\mathbb{T}_{P/K}^{\uparrow\omega}(\{\emptyset\})$ is a fixpoint. Now, let $\{I\} \in \mathbb{T}_{P/K}^{\uparrow\omega}(\{\emptyset\})$. Clearly, $\mathbb{T}_{P/K}(\{I\}) = \{I\}$, otherwise the previous property does not hold. Thus {I} is a fixpoint. To prove that it is minimal, let $J \subset I$. If we trace the derivation chain for {I}, we find a predecessor $\{I'\}$ of {I} where $I'$ is not a subset of J, but its immediate predecessor, $I''$, is. Now let $\{x\} = I' - I''$; then $J \cup \{x\}$ does not violate the key constraints (since its superset I does not), and x is in $T_P(J)$. Thus {J} cannot be a fixpoint. □

Therefore, under the all-answer semantics, we expect the whole family $\mathbb{T}_{P/K}^{\uparrow\omega}(\{\emptyset\})$ to be returned as the canonical answer, whereas under a single-answer semantics any of the interpretations in $\mathbb{T}_{P/K}^{\uparrow\omega}(\{\emptyset\})$ is accepted as a valid answer. In the next section, we introduce an equivalent semantics for our programs with keys using the notion of stable models.

4

Stable-Model Semantics

Programs with keys have an equivalent model-theoretic semantics. We will next show that $\mathbb{T}_{P/K}^{\uparrow\omega}(\{\emptyset\})$ corresponds to the family of stable models for the program foe(P/K) obtained from P/K by expressing the key constraints by negated goals. The stable model semantics also extends naturally to stratified programs with key constraints.

4.1

Positive Programs with Key Constraints

An equivalent characterization of a positive program P/K can be obtained by introducing negated goals in the rules of P to enforce the key constraints. The program obtained by this transformation will be denoted foe(P/K), and called the first order equivalent of P/K. The program foe(P/K) so obtained always has a formal meaning under stable model semantics [10]. Take, for instance, our advisor example; the rule in Example 1 can also be expressed as follows:


Example 5. The Advisor Example 1 Expressed Using Negation

advisor(S, P) ← student(S, Majr, Year), professor(P, Majr), ¬kviol_advisor(S, P).
kviol_advisor(S, P) ← advisor(S, P′), P ≠ P′.

Therefore, we allow a professor P to become the advisor of a student S provided that no other P′ ≠ P is already an advisor of S. In general, if q is the name of a predicate subject to a key constraint, we use a new predicate kviol_q to denote the violation of key constraints on q; then, we add a kviol_q rule for each key declared for q. Finally, a negated kviol_q goal is added to the original rules defining q. For instance, the simple path program of Example 3 can be re-expressed in the following way:

Example 6. The simple-path program of Example 3 Expressed Using Negation

spath(root, X) ← g(X, Y), ¬kviol_spath(root, X).
spath(Y, Z) ← spath(X, Y), g(Y, Z), ¬kviol_spath(Y, Z).
kviol_spath(X1, X2) ← spath(X1, Y2), X2 ≠ Y2.
kviol_spath(X1, X2) ← spath(Y1, X2), X1 ≠ Y1.

Derivation of foe(P/K). In general, given a program P/K constrained with keys, its first order equivalent foe(P/K) is computed as follows:

1. For each rule r, with head $q(Z_1, \ldots, Z_n)$, where q is constrained by some key, add the goal $\neg kviol\_q(Z_1, \ldots, Z_n)$ to r.
2. For each unique_key(q, ArgList)! in K, where n is the arity of q, add a new rule
   $kviol\_q(X_1, \ldots, X_n) \leftarrow q(Y_1, \ldots, Y_n),\ Y_1\,\theta_1\,X_1, \ldots, Y_n\,\theta_n\,X_n.$
   where $\theta_j$ denotes the equality symbol '=' for every j in ArgList, and the inequality symbol '≠' for every j not in ArgList.

For instance, the foe of our advisor example is:

advisor(S, P) ← student(S, Majr, Year), professor(P, Majr), ¬kviol_advisor(S, P).
kviol_advisor(X1, X2) ← advisor(Y1, Y2), X1 = Y1, X2 ≠ Y2.

This transformation does in fact produce the rules of Example 6, after we replace equals with equals and eliminate all equality goals. The newly introduced predicates with the prefix kviol will be called key-violation predicates. Stable models provide the formal semantics for our foe programs:

Proposition 4. Let P/K be a positive logic program with keys. Then foe(P/K) has one or more stable models.


A proof for this proposition can be easily derived from [25,13], where the same transformation is used to define the formal semantics of programs with the choice construct. With I an interpretation of foe(P/K), let pos(I) denote the interpretation obtained by removing all the key-violation atoms from I and leaving the others unchanged. Likewise, if $\Im$ is a family of interpretations of foe(P/K), then we define:

$$pos(\Im) = \bigcup_{I \in \Im} \{pos(I)\}$$

Then, the following theorem elucidates the equivalence between the two semantics:

Proposition 5. Let P/K be a positive program, and Σ be the set of stable models for foe(P/K). Then $pos(\Sigma) = \mathbb{T}_{P/K}^{\uparrow\omega}(\{\emptyset\})$.

Proof: Let $I \in \mathbb{T}_{P/K}^{\uparrow\omega}(\{\emptyset\})$, and let $P_I = ground_I(foe(P/K))$ be the program produced by the stability transformation on foe(P/K). It suffices to show that $T_{P_I}^{\uparrow\omega}(\emptyset) = I$, i.e., that $\{I\} = \mathbb{T}_{P_I}^{\uparrow\omega}(\{\emptyset\})$. Now, take a derivation in $\mathbb{T}_{P/K}^{\uparrow\omega}(\{\emptyset\})$ producing I; we can find an identical derivation in $\mathbb{T}_{P_I}^{\uparrow\omega}(\{\emptyset\})$. This concludes our proof. □
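The correspondence stated in Proposition 5 can be observed on the advisor example by enumerating stable models directly. The Python sketch below is a brute-force illustration, not an algorithm from this paper: it applies the standard stability (Gelfond-Lifschitz) transformation to a hand-grounded foe program, keeps only the ground kviol instances that concern the two professors, and uses hypothetical names (`RULES`, `least_model`, `is_stable`, `stable_models`, `pos`).

```python
from itertools import combinations

# Hand-grounded foe of the advisor example: (head, positive body, negated body).
RULES = [
    ("student", set(), set()),
    ("prof_ohm", set(), set()),
    ("prof_bell", set(), set()),
    ("adv_ohm", {"student", "prof_ohm"}, {"kviol_ohm"}),
    ("adv_bell", {"student", "prof_bell"}, {"kviol_bell"}),
    ("kviol_ohm", {"adv_bell"}, set()),    # ohm is blocked once bell advises
    ("kviol_bell", {"adv_ohm"}, set()),    # bell is blocked once ohm advises
]
ATOMS = sorted({head for head, _, _ in RULES})

def least_model(positive_rules):
    """Least model of a ground positive program, by naive iteration."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, body in positive_rules:
            if body <= model and head not in model:
                model.add(head)
                changed = True
    return model

def is_stable(candidate):
    """candidate is stable iff it equals the least model of its reduct."""
    reduct = [(h, body) for h, body, neg in RULES if not (neg & candidate)]
    return least_model(reduct) == candidate

def stable_models():
    """Test every subset of the ground atoms (exponential, for illustration)."""
    return [set(c) for r in range(len(ATOMS) + 1)
            for c in combinations(ATOMS, r) if is_stable(set(c))]

def pos(model):
    """Remove the key-violation atoms from an interpretation."""
    return frozenset(a for a in model if not a.startswith("kviol"))

models = stable_models()
print(sorted(sorted(pos(m)) for m in models))
```

With this grounding there are exactly two stable models, and removing their kviol atoms leaves the two interpretations obtained at the fixpoint of the ACO for the same example.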

4.2

Stratification

The notion of stratification significantly increases the expressive power of Datalog, while retaining the declarative fixpoint semantics of programs. Consider first the notion of stratification with respect to negation for programs without key constraints:

Definition 4. Let P be a program with negated goals, and $\sigma_1, \ldots, \sigma_n$ be a partition of the predicate names in P. Then, P is said to be stratified, when for each rule $r \in P$ (with head $h_r$) and each goal $g_r$ in r, the following property holds:

1. $stratum(h_r) > stratum(g_r)$ if $g_r$ is a negated goal
2. $stratum(h_r) \geq stratum(g_r)$ if $g_r$ is a positive goal.

Therefore, a stratified program P can be viewed as a stack of rule layers, where the higher layers do not influence the lower ones. Thus the correct semantics can be assigned to a program by starting from the bottom layer and proceeding upward, with the understanding that computation for the higher layers cannot affect lower ones. The computation can be implemented using the ICO $T_P$, which, in the presence of negated goals, is generalized as follows. A rule $r \in ground(P)$ is said to be enabled by an interpretation I when all of its positive goals are in I and none of its negated goals are in I. Then, $T_P(I)$ is defined as containing the heads of all rules in ground(P) that are enabled by I. (This change automatically adjusts the definitions of $\mathbf{T}$ and $\mathbb{T}$ that are based on $T_P$.)
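Definition 4 suggests a simple test for stratification. The Python sketch below uses the usual iterative stratum assignment at the predicate level; it is an illustration rather than part of this paper, the function name `stratify` and the rule encoding are hypothetical, and the `idle` predicate is an invented example.

```python
def stratify(rules):
    """Return a stratum number per predicate, or None if not stratified.

    rules: list of (head, positive_goals, negated_goals) over predicate names.
    """
    preds = set()
    for head, pos_goals, neg_goals in rules:
        preds |= {head, *pos_goals, *neg_goals}
    stratum = {p: 1 for p in preds}
    changed = True
    while changed:
        changed = False
        for head, pos_goals, neg_goals in rules:
            # Positive goals: head at least as high; negated goals: strictly higher.
            need = max([stratum[head]]
                       + [stratum[g] for g in pos_goals]
                       + [stratum[g] + 1 for g in neg_goals])
            if need > len(preds):
                return None          # negation occurs inside a recursive cycle
            if need > stratum[head]:
                stratum[head] = need
                changed = True
    return stratum

# A stratified program: idle professors are those advising nobody.
layered = [("advisor", ["student", "professor"], []),
           ("idle", ["professor"], ["advisor"])]
# The foe of the advisor rule: advisor and kviol_advisor depend on each other
# through negation, so no stratification exists.
foe_rules = [("advisor", ["student", "professor"], ["kviol_advisor"]),
             ("kviol_advisor", ["advisor"], [])]

print(stratify(layered))
print(stratify(foe_rules))
```

The second call returns `None`, which is consistent with giving foe programs their meaning through stable models rather than through a stratified computation.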


Therefore, let $I[\leq j]$ and $P[\leq j]$, respectively, denote the atoms in I and the rules in P whose head belongs to strata ≤ j. Also let P[j] denote the set of rules in P whose head belongs to stratum j. Then, we observe that for a stratified program P, the mapping defined by P[j] (i.e., $T_{P[j]}$) is monotonic with respect to I[j]. Thus, if $I_{j-1}$ is the meaning of $P[\leq j-1]$, then $\mathbf{T}_{P[j]}^{\uparrow\omega}(I_{j-1})$ is the meaning of $P[\leq j]$. Thus, let P be a program stratified with respect to negation and without key constraints; then the following algorithm inductively constructs the iterated fixpoint for $T_P$ (and $\mathbf{T}_P$):

Iterated Fixpoint computation for $T_P$, where P is stratified with strata $\sigma_1, \ldots, \sigma_n$.

1. Let $I_0 = \emptyset$;
2. For $j = 1, \ldots, n$, let $I_j = \mathbf{T}_{P[j]}^{\uparrow\omega}(I_{j-1})$

For every $1 \leq j \leq n$, $I_j = I_n[\leq j]$ is a minimal fixpoint of $P[\leq j]$. The interpretation $I_n$ obtained at the end of this computation is called the iterated fixpoint for $T_P$ and defines the meaning of the program P. It is well known that the iterated fixpoint for a stratified program P is equal to P's unique stable model [36].

These notions can now be naturally extended to programs with key constraints. A program P/K is stratified whenever its keyless counterpart P is stratified. Let P/K[j] denote the rules with head in the j-th stratum, along with the key constraints on their head predicates; also, let $P/K[\leq j]$ denote the rules with head in strata no higher than the j-th stratum, along with their applicable key constraints. Finally, let:

$$\Im[\leq j] = \bigcup_{I \in \Im} \{I[\leq j]\}$$

The notion of $\mathbb{T}$ can be extended in natural fashion to stratified programs. If $\Im_{j-1}$ is the meaning of $P/K[\leq j-1]$, then $\mathbb{T}_{P/K[j]}^{\uparrow\omega}(\Im_{j-1})$ is the natural meaning of $P/K[\leq j]$. Thus we have the following extension of the iterated fixpoint algorithm:

Iterated Fixpoint Computation for $\mathbb{T}_{P/K}$, where P/K is stratified with strata $\sigma_1, \ldots, \sigma_n$.

1. Let $\Im_0 = \{\emptyset\}$;
2. For $j = 1, \ldots, n$, let $\Im_j = \mathbb{T}_{P/K[j]}^{\uparrow\omega}(\Im_{j-1})$

The family of interpretations $\Im_n$ obtained from this computation will be called the iterated fixpoint for $\mathbb{T}_{P/K}$. The iterated fixpoint for $\mathbb{T}_{P/K}$ defines the meaning of P/K; it has the property that, for each $1 \leq j \leq n$, each member in $\Im_j = \Im_n[\leq j]$ is a minimal fixpoint for $\mathbb{T}_{P/K[\leq j]}$.
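The two steps of this algorithm can be sketched in Python for a small stratified program with keys. The example is an assumption made for illustration: stratum 1 is the hand-grounded advisor program, stratum 2 adds invented `idle_*` atoms for professors who advise nobody, and the names `ico`, `keys_ok`, `aco_step`, `aco_omega`, and `iterated_fixpoint` are hypothetical.

```python
def ico(rules, interp):
    """Heads of the ground rules enabled by interp (negation-aware ICO)."""
    return {h for h, pos_goals, neg_goals in rules
            if pos_goals <= interp and not (neg_goals & interp)}

def keys_ok(interp):
    """Key constraint of the example: at most one advisor for the student."""
    return not {"adv_ohm", "adv_bell"} <= interp

def aco_step(rules, family):
    """One ACO step for the rules of a single stratum."""
    out = set(family)
    for interp in family:
        for atom in ico(rules, interp) - interp:
            candidate = interp | {atom}
            if keys_ok(candidate):
                out.add(candidate)
    return {s for s in out if not any(s < t for t in out)}

def aco_omega(rules, family):
    """Iterate the ACO of one stratum until the family stops changing."""
    while True:
        nxt = aco_step(rules, family)
        if nxt == family:
            return family
        family = nxt

def iterated_fixpoint(strata):
    """Start from {∅} and close the family under each stratum in turn."""
    family = {frozenset()}
    for rules in strata:
        family = aco_omega(rules, family)
    return family

E = frozenset()
STRATUM_1 = [("student", E, E), ("prof_ohm", E, E), ("prof_bell", E, E),
             ("adv_ohm", frozenset({"student", "prof_ohm"}), E),
             ("adv_bell", frozenset({"student", "prof_bell"}), E)]
STRATUM_2 = [("idle_ohm", frozenset({"prof_ohm"}), frozenset({"adv_ohm"})),
             ("idle_bell", frozenset({"prof_bell"}), frozenset({"adv_bell"}))]

meaning = iterated_fixpoint([STRATUM_1, STRATUM_2])
print(sorted(sorted(i) for i in meaning))
```

Each member of the resulting family pairs one choice of advisor with the idle atom of the other professor; evaluating the negated goals only after stratum 1 is closed is what prevents both idle atoms from being derived prematurely.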


Stable Model Semantics for Stratified Programs. Every program P that is stratified with respect to negation has a unique stable model that can be computed by the iterated fixpoint computation for $T_P$ previously discussed. Likewise, every stratified program P/K can be expanded into its first order equivalent foe(P/K). Then, it can be shown that (i) foe(P/K) always has one or more stable models, and (ii) if Σ denotes the family of its stable models, then pos(Σ) coincides with the iterated fixpoint of $\mathbb{T}_{P/K}$.

5

Single-Answer Semantics and Nondeterminism

The derivation $\mathbb{T}_{P/K}^{\uparrow\omega}(\{\emptyset\})$ can be used to compute in parallel all the stable models for a positive program foe(P/K). In this computation, each application of $\mathbb{T}_{P/K}$ expands in parallel all interpretations in the current family, by the addition of a single new element to each interpretation. In [38], we discuss condensed derivations based on $\mathbb{T}_{P/K}$, which accelerate the derivation process by adding several new elements at each step of the computation. This ensures a faster convergence toward the final result, while still computing all stable models at once. Even with condensed derivations, the computation of all stable models requires exponential time, since the number of such models can be exponential in the size of the database. This computational complexity might be acceptable when dealing with NP-complete problems, such as deciding the existence of a Hamiltonian path. However, in many situations involving programs with multiple stable models, only one such model, not all of them, is required in practice. For instance, this is the case of Example 4, where we use choice to enumerate into a chain the elements of a set one by one, with the knowledge that the even/odd parity of the whole set only depends on its cardinality, and not on the particular chain used. Therefore, for Example 4, the computation of any stable model will suffice to answer the parity query correctly. Since this situation is common for many queries, we need efficient operators for computing a single stable model. Even with NP-complete problems, it is normally desirable to generate the stable models in a serial rather than parallel fashion. For instance, for the Hamiltonian circuit problem of Example 3, we can test if the last generated model satisfies the desired property (i.e., if there is any freenode), and only if this test fails, proceed with the generation of another model, normally calling on some heuristics to aid in the search for a good model.

On average, this search succeeds without having to produce an exponential number of stable models, since exponential complexity only represents the worst-case behavior of many algorithms for NP-complete problems. Now, the computation of a single stable model is in general NP-hard [26]; however, this computation for a program foe(P/K) derived from one with key constraints can be performed in polynomial time, and, as we describe next, with minimal overhead with respect to the standard fixpoint computation. Therefore, we next concentrate on the problem of generating a single element in $\mathbb{T}_{P/K}^{\uparrow\omega}(\{\emptyset\})$, and on expressing polynomial-time queries using this single-answer semantics.


We define next the notions of soundness and completeness for nondeterministic operators to be used to compute an element in $\mathbb{T}_{P/K}^{\uparrow\omega}(\{\emptyset\})$.

Definition 5. Let P/K be a logic program with keys, and C be a class of functions on interpretations of P. Then we define the following two properties:

1. Soundness. A function $\tau \in C$ will be said to be sound for a program P/K when $\tau^{\uparrow\omega}(\emptyset) \in \mathbb{T}_{P/K}^{\uparrow\omega}(\{\emptyset\})$. The function class C will be said to be sound when all its members are sound.
2. Completeness. The function class C will be said to be complete for a program P/K when for each $M \in \mathbb{T}_{P/K}^{\uparrow\omega}(\{\emptyset\})$ there exists some $\tau \in C$ such that $\tau^{\uparrow\omega}(\emptyset) = M$.

In situations where any answer will solve the problem at hand, there is no point in seeking completeness, and we can limit ourselves to classes of functions that are sound and efficient to compute, even if completeness is lost; the eager derivations discussed next represent an interesting class of such functions.

Definition 6. Let P/K be a program with key constraints, and let Γ(I) be a function on interpretations of P. Then, Γ(I) will be called an eager derivation operator for P/K if it satisfies the following three conditions:

1. $I \subseteq \Gamma(I) \subseteq \mathbf{T}_P(I)$
2. $\Gamma(I) \models K$
3. Every subset of $\mathbf{T}_P(I)$ that is a proper superset of Γ(I) violates some key constraint in K.

Let $C_\Gamma$ be the class of eager derivation operators for a given program P/K. Then it is immediate to see that $C_\Gamma$ is sound for all programs. Eager derivation operators can be implemented easily. Their implementation only requires tables to memorize atoms previously derived and to compare the new values against previous ones to avoid key violations. Inasmuch as table-based memorization is already part of the basic mechanism for the computation of fixpoints in deductive databases, key constraints are easy to implement. A limitation of eager derivation operators is that they do not form a complete class for all positive programs with key constraints.

This topic is discussed in [38], where classes of operators which are both sound and complete are also discussed. However, in the rest of this paper, we only use key constraints to define chain rules, such as those in Example 4; for these rules, the eager derivations are complete, in addition to being sound and efficiently computable.
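A minimal sketch of an eager derivation operator, in the sense of Definition 6, is given below in Python for the hand-grounded advisor program. It is an illustration under stated assumptions: the greedy scan and its `order` parameter are one possible way to satisfy the three conditions, and the names `RULES`, `ico`, `keys_ok`, `eager`, and `single_answer` are hypothetical.

```python
S, PO, PB = "student(JimBlack,ee)", "professor(ohm,ee)", "professor(bell,ee)"
AO, AB = "advisor(JimBlack,ohm)", "advisor(JimBlack,bell)"
RULES = [(S, set()), (PO, set()), (PB, set()),
         (AO, {S, PO}), (AB, {S, PB})]          # (head, body) pairs

def ico(interp):
    """Immediate consequences of interp for the ground positive program."""
    return {head for head, body in RULES if body <= interp}

def keys_ok(interp):
    """Key on the student argument of advisor (one student in this example)."""
    return sum(atom.startswith("advisor(") for atom in interp) <= 1

def eager(interp, order=sorted):
    """Add as many new consequences as the keys allow, scanning in `order`.

    An atom that is rejected conflicts with atoms already accepted, so no
    key-respecting proper superset of the result exists among the consequences.
    """
    result = set(interp)
    for atom in order(ico(interp) - result):
        if keys_ok(result | {atom}):     # table lookup against earlier atoms
            result.add(atom)
    return frozenset(result)

def single_answer(order=sorted):
    """Iterate the eager operator from the empty interpretation to a fixpoint."""
    interp = frozenset()
    while True:
        nxt = eager(interp, order)
        if nxt == interp:
            return interp
        interp = nxt

first = single_answer()
second = single_answer(lambda atoms: sorted(atoms, reverse=True))
print(sorted(first))
print(sorted(second))
```

Changing the scan order changes which advisor is selected, but each run returns one member of the family computed by the all-answer semantics for this program.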

6

Set Aggregates in Logic

The additional expressive power brought to Datalog by key constraints finds many uses; here we employ it to achieve a formal characterization of database aggregates, thus solving an important open problem in database theory and logic programming. In fact, the state-of-the-art characterization of aggregates relies on the assumption that the universe is totally ordered [36]. Using this assumption, the atoms satisfying a given predicate are chained together in ascending order, starting from the least value and ending with the largest value. Unfortunately, this solution has four serious drawbacks, since (i) it compromises data independence by violating the genericity property [1], (ii) it relies on negation, thus infecting aggregates with the nonmonotonic curse, (iii) it is often inefficient since it requires the data to be sorted before aggregation, and (iv) it cannot be applied to more advanced forms of aggregation, such as on-line aggregates and rollups, that are used in decision support and other advanced applications [33]. On-line aggregation [8], in particular, cannot be expressed under the current approach that relies on a totally ordered universe to sort the elements of the set being processed, starting from its least element. In fact, at the core of on-line aggregation, there is the idea of returning partial results after visiting a proper subset of the given dataset, while the rest is still unknown. Now, it is impossible to compute the least element of a set when only part of it is known. We next show that all these problems find a simple solution once key constraints are added to Datalog. For concreteness, we use the aggregate constructs of LDL++ [4], but very similar syntactic constructs are used by other systems (e.g., CORAL [23]), and the semantics proposed here is general and applicable to every logic-based language and database query language.

6.1

User Defined Aggregates

Consider the parity query of Example 4. To define an equivalent parity aggregate in LDL++ the user will write the following rules:

Example 7. Definition rules for the parity aggregate mod2

single(mod2, _, odd).
multi(mod2, X, odd, even).
multi(mod2, X, even, odd).
freturn(mod2, _, Parity, Parity).

These rules have the same function as the last four rules in Example 4. The single rule specifies how to initialize the computation of the mod2 aggregate by specifying its value on a singleton set (same as the first ca rule in the example). The two multi rules instead specify how the new aggregate value (the fourth argument) should be updated for each new input value (second argument), given its previous value (third argument). (Thus these rules perform the same function as the second and the third of the ca rules in Example 4.) The freturn rule specifies (as fourth argument) the value to be returned once the last element in the set is detected (same as the last rule in Example 4). For mod2, the value returned is simply taken from the third argument, where it was left by the multi rule executed on the last element of the set. Two important observations can therefore be made:


1. We have described a very general method for defining aggregates by specifying the computation to be performed upon (i) the initial value, (ii) each successive value, and (iii) the final value in the set. This paradigm is very general, and also describes the mechanism for introducing user defined aggregates (UDAs) used by SQL3 and in the AXL system [33].
2. The correspondence between the above rules and those of Example 4 outlines the possibility of providing a logic semantics to UDAs by simply expanding the single, multi, and freturn rules into an equivalent logic program (using the chain rules) such as that of Example 4.

The rules in Example 7 are generic, and can be applied to any set of facts. To reproduce the behavior of Example 4, they must be applied to b(X). In LDL++ this is specified by the aggregate-invocation rule:

p(mod2⟨X⟩) ← b(X).

This rule specifies that the result of the computation of mod2 on b(X) is returned as the argument of a predicate that our user has named p.

There has been much recent interest in online aggregates [8], which also find important applications in logic programming, as discussed later in this paper. For instance, when computing averages on non-skewed data, the aggregate often converges toward the final value long before all the elements in the set are visited. Thus, the system should support early returns to allow the user to check convergence and stop the computation as soon as the series of successive values has converged within the prescribed accuracy [8]. UDAs with early returns can be defined in LDL++ through the use of ereturn rules. Say, for instance, that we want to define a new aggregate myavg, apply it to the elements of d(Y), and view the results of this computation as a predicate q. Then, the LDL++ programmer must specify one aggregate-application rule, and several aggregate-definition rules. For instance, the following is an aggregate application rule:

r : q(myavg⟨Y⟩) ← d(Y).

The ⟨...⟩ notation in the head of r denotes an aggregate; this rule specifies that the definition rules for myavg must be applied to the stream of Y-values that satisfy the body of the rule. The aggregate definition rules include: (i) single rule(s), (ii) multi rule(s), (iii) freturn rule(s) for final returns, and/or (iv) ereturn rule(s) for early returns. All four kinds of rules are used in the following definition of myavg:

single(myavg, Y, cs(1, Y)).
multi(myavg, Y, cs(Cnt, Sum), cs(Cnt1, Sum1)) ← Cnt1 = Cnt + 1, Sum1 = Sum + Y.
freturn(myavg, Y, cs(Cnt, Sum), Val) ← Val = Sum/Cnt.


ereturn(myavg, X, cs(Count, Sum), Avg) ← Count mod 100 = 0, Avg = Sum/Count.

Observe that the first argument in the head of the single, multi, ereturn, and freturn rules contains the name of the aggregate: therefore, these aggregate definition rules can only be used by aggregate application rules that contain myavg⟨...⟩ in the head. The second argument in the head of a single or multi rule holds the 'new' value from the input stream, while the last argument holds the partial value returned by the previous computation. Thus, for averages, the last argument should hold the pair cs(Count, Sum). The single rule specifies the value of the aggregate for a singleton set (containing the first value in the stream); for myavg, the single rule must return cs(1, Y). The multi rules prescribe an inductive computation on a set with n + 1 elements, by specifying how the (n + 1)-th element in the stream is to be combined with the value returned (as third argument in multi) by the computation on the first n elements. For myavg, the count is increased by one and the sum is increased by the new value in the stream. The freturn rules specify how the final value(s) of the aggregate are to be returned. For myavg, we return the ratio of sum and count. The ereturn rules specify when early returns are to be produced and what their values are. In particular, for myavg, we produce early returns every 100 elements in the stream, and the value produced is the current ratio sum/count (online aggregation).

6.2

Semantics of Aggregates

In general, the semantics of an aggregate application rule r

r : q(myavg⟨Y⟩) ← d(Y).

can be defined by expanding it into its key-constrained equivalent logic program, denoted kce(r), which contains the following rules:

1. A main rule

   q(Y) ← results(myavg, Y).

   where results(myavg, Y) is derived from d(Y) by a program consisting of:

2. The chain rules that link the elements of d(Y) into an order-inducing chain (nil is a special value not in d(Y)),

   unique_key(chain_r, [1])!
   unique_key(chain_r, [2])!
   chain_r(nil, Y) ← d(Y).
   chain_r(Y, Z) ← chain_r(X, Y), d(Z).

3. The cag_r rules that perform the inductive computation:

   cag_r(AgName, Y, New) ← chain_r(nil, Y), Y ≠ nil, single(AgName, Y, New).
   cag_r(AgName, Y2, New) ← chain_r(Y1, Y2), cag_r(AgName, Y1, Old), multi(AgName, Y2, Old, New).


   Thus, the cag_r rules are used to memorize the previous results, and to apply (i) single to the first element of d(Y) (i.e., for the pattern chain_r(nil, Y)) and (ii) multi to the successive elements.

4. The two results rules, where the first rule produces early returns and the second rule produces final returns, as follows:

   results(AgName, Yield) ← chain_r(Y1, Y2), cag_r(AgName, Y1, Old), ereturn(AgName, Y2, Old, Yield).
   results(AgName, AgValue) ← chain_r(X, Y), ¬chain_r(Y, _), cag_r(AgName, Y, Old), freturn(AgName, Y, Old, AgValue).

Therefore, the first results rule produces the early returns by applying ereturn to every element in the chain, and the second rule produces the final returns by applying freturn on the last element in the chain (i.e., the element without a successor). In LDL++, an implicit group-by operation is performed on the head arguments not used to apply aggregates. Thus, to compute the average salary of employees grouped by Dno, the user can write:

avgsal(Dno, myavg⟨Sal⟩) ← emp(Eno, Sal, Dno).

As discussed in [34], the semantics of aggregates with group-by can simply be defined by including an additional argument in the predicates chain_r and results to hold the group-by attributes.

6.3

Applications of User Defined Aggregates

We will now discuss the use of UDAs to express polynomial algorithms in a natural and efficient way. These algorithms use aggregates in programs that yield the correct final results unaffected by the nondeterministic behavior of the aggregates. Therefore, aggregate computation here uses single-answer semantics, which assures polynomial complexity. Let us consider first uses of nonmonotonic aggregates. For instance, say that from a set of pairs such as (Name, YearOfBirth) as input, we want to return the Name of the youngest person (i.e., the person born in the latest year). This computation cannot be expressed directly as an aggregate in SQL, but can be expressed by the UDA youngest given below (in LDL++, a vector of n arguments $(X_1, \ldots, X_n)$ is basically treated as an n-argument function with a default name).

single(youngest, (N, Y), (N, Y)).
multi(youngest, (N, Y), (N1, Y1), (N, Y)) ← Y ≥ Y1.
multi(youngest, (N, Y), (N1, Y1), (N1, Y1)) ← Y ≤ Y1.
freturn(youngest, (N, Y), (N1, Y1), N1).

User-defined aggregates provide a simple solution to a number of complex problems in deductive databases; due to space limitations we will here consider only simple examples; a more complete set of examples can be found in [37].
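Read operationally, the single, multi, and freturn rules describe a fold over the input stream. The following Python sketch mirrors that reading for the youngest aggregate; it is not LDL++ code, the driver `run_uda` and the three callables are hypothetical names, and ties between equal birth years (where both multi rules apply) are resolved here in favour of the newer element.

```python
def run_uda(single, multi, freturn, stream):
    """Fold a non-empty stream through single/multi and finish with freturn."""
    it = iter(stream)
    last = next(it)
    state = single(last)                 # value of the aggregate on a singleton
    for value in it:
        state = multi(value, state)      # combine the new value with the old state
        last = value
    return freturn(last, state)          # applied on the last element of the chain

def youngest_single(pair):
    return pair

def youngest_multi(pair, old):
    # Keep the pair with the later year of birth (new element wins a tie).
    return pair if pair[1] >= old[1] else old

def youngest_freturn(_last, state):
    return state[0]                      # return only the name

people = [("ann", 1970), ("bob", 1985), ("cy", 1979)]
print(run_uda(youngest_single, youngest_multi, youngest_freturn, people))
```

The result does not depend on the order in which the stream is visited as long as birth years are distinct, which is the sense in which single-answer semantics suffices for such aggregates.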


We already discussed the definition and uses of online aggregates, such as myavg, which returns values every 100 samples. In a more general framework, the user would want to control how often new results are to be returned, on the basis of the estimated progress toward convergence in the computation [8]. UDAs provide a natural setting for this level of control. Applications of UDAs are too many to mention. But for an example, take the interval coalescing problem of temporal databases [35]. For instance, say that from a base relation emp(Eno, Sal, Dept, (From, To)), we project out the attributes Sal and Dept; then the same Eno appears in tuples with overlapping valid-time intervals that must be coalesced. Here we use closed intervals represented by the pair (From, To), where From is the start-time, and To is the end-time. Under the assumption that tuples are sorted by increasing start-time, we can use a special coales aggregate to perform the task in one pass through the data.

Example 8. Coalescing overlapping intervals sorted by start time.

empProj(Eno, coales⟨(From, To)⟩) ← emp(Eno, _, _, (From, To)).
single(coales, (Frm, To), (Frm, To)).
multi(coales, (Nfr, Nto), (Cfr, Cto), (Cfr, Lgr)) ← Nfr ≤ Cto, larger(Cto, Nto, Lgr).
multi(coales, (Nfr, Nto), (Cfr, Cto), (Nfr, Nto)) ← Nfr > Cto.
ereturn(coales, (Nfr, Nto), (Cfr, Cto), (Cfr, Cto)) ← Nfr > Cto.
freturn(coales, _, LastInt, LastInt).
larger(X, Y, X) ← X ≥ Y.
larger(X, Y, Y) ← X < Y.

Thus, the single rule starts the coalescing process by setting the current interval equal to the first interval. The multi rules operate as follows: when the new interval (Nfr, Nto) overlaps the current interval (Cfr, Cto) (i.e., when Nfr ≤ Cto), the two are coalesced into an interval that begins at Cfr and ends with the larger of Nto and Cto; otherwise, the current interval is returned and the new interval becomes the current one.
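The one-pass coalescing behaviour described above can be sketched as a Python generator. This is an illustration for the intervals of a single employee, assuming the input is already sorted by start time; the function name `coalesce` is hypothetical, and the comments indicate which kind of rule each step corresponds to.

```python
def coalesce(intervals):
    """Coalesce closed intervals (start, end) sorted by increasing start."""
    it = iter(intervals)
    try:
        cur_from, cur_to = next(it)          # single: first interval is current
    except StopIteration:
        return                               # empty input: nothing to return
    for new_from, new_to in it:
        if new_from <= cur_to:               # multi, overlapping case
            cur_to = max(cur_to, new_to)     # keep the larger end point
        else:
            yield (cur_from, cur_to)         # ereturn: current interval is done
            cur_from, cur_to = new_from, new_to   # multi, disjoint case
    yield (cur_from, cur_to)                 # freturn: last interval

periods = [(1, 3), (2, 5), (7, 8), (8, 10), (12, 13)]
print(list(coalesce(periods)))
```

Because results are yielded as soon as a gap is detected, a consumer can use the coalesced intervals before the input is exhausted, which corresponds to the role of the early returns.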

7

Monotonicity

Commercial database systems and most deductive database systems disallow the use of aggregates in recursion and require programs to be stratified with respect to aggregates. This restriction is also part of the SQL99 standards [7]. However, many important algorithms, particularly greedy algorithms, use aggregates such as count, sum, min and max in a monotonic fashion, inasmuch as previous results are never discarded. This observation has inspired a significant amount of previous work seeking efficient expression of these algorithms in logic [27,6,24,31,9,15]. At the core of this issue there is the characterization of programs where aggregates behave monotonically and can therefore be freely used in recursion. For many interesting programs, special lattices can be found in which aggregates are monotonic [24]. But the identification of such lattices cannot be automated [31], nor can the computation of fixpoints for such programs. Our newly introduced theory of aggregates provides a definitive solution to the monotonic aggregation problem, including a simple syntactic characterization to determine if an aggregate is monotonic and can thus be used freely in recursion.

7.1

Partial Monotonicity

For a program P/K, we will use the terms constrained predicates and free predicates to denote predicates that are constrained by keys and those that are not. With $I$ an interpretation, let $I_c$ and $I_f$ denote, respectively, the atoms in $I$ that are instances of constrained and free predicates; $I_c$ will be called the constrained component of $I$, and $I_f$ the free component of $I$. Then, let $I$ and $J$ be two interpretations such that $I \subseteq J$ and $I_c = J_c$ (thus $I_f \subseteq J_f$). Likewise, each family $\mathcal{F}$ can be partitioned into the family of its constrained components, $\mathcal{F}_c$, and the family of its free components, $\mathcal{F}_f$. Then, the following proposition shows that a program P/K defines a monotonic transformation with respect to the free components of families of interpretations:

Proposition 6. Partial Monotonicity: Let $\mathcal{F}$ and $\mathcal{F}'$ be two families of interpretations for a program P/K. If $\mathcal{F}_f \sqsubseteq \mathcal{F}'_f$, while $\mathcal{F}_c = \mathcal{F}'_c$, then $T_{P/K}(\mathcal{F}) \sqsubseteq T_{P/K}(\mathcal{F}')$.

Proof. It suffices to prove the property for two singleton sets $\{I\}$ and $\{J\}$ where $I_f \subseteq J_f$, while $I_c = J_c$. Take an arbitrary $I' \in T_{P/K}(\{I\})$: we need to show that there exists a $J' \in T_{P/K}(\{J\})$ where $I' \subseteq J'$. If $I' \subseteq J$ the conclusion is trivial; else, let $I' = I \cup \{x\}$, $x \in T_P(I) - I$, and $I' \models K$. Since $I$ is a subset of $J$ but $I'$ is not, $x$ is not in $J$, and $x \in T_P(J) - J$. Also, if $J' = J \cup \{x\}$, $J' \models K$ (since $J_c = I_c$). Thus, $J' \in T_{P/K}(\{J\})$. □

This partial monotonicity property (i.e., monotonicity w.r.t. free predicates only) extends to the successive powers of $T_{P/K}$, including its ω-power. Thus, if $\mathcal{F}_f \sqsubseteq \mathcal{F}'_f$, while $\mathcal{F}_c = \mathcal{F}'_c$, then $T^{\uparrow\omega}_{P/K}(\mathcal{F}) \sqsubseteq T^{\uparrow\omega}_{P/K}(\mathcal{F}')$. This result shows that the program P/K defines a monotonic mapping from unconstrained predicates to every other predicate in the program.

It is customary in deductive databases to draw a distinction between extensional information (base relations) and intensional information (derived relations). Accordingly, a program can be viewed as defining a mapping from base relations to derived relations. The partial monotonicity property then states that the mapping from database relations free of key constraints to derived relations is monotonic; i.e., the larger the base relations, the larger the derived relations. For a base relation $R$ that is constrained by keys, we can introduce an auxiliary input relation $R_I$ free of key constraints, along with a copy rule that derives $R$ from $R_I$. Then, we can view $R_I$ as the input relation and $R$ as the result of filtering $R_I$ with the key constraints. In this way, we have a monotonic mapping from the input relation $R_I$ to the derived relations in the program.
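The input-relation construction described above can be illustrated informally in Python. This is only a sketch under stated assumptions: the names `key_filtered_family` and `family_leq` are hypothetical, the first column is assumed to be the key, and the ordering on families is assumed to mean that every member of the smaller family is contained in some member of the larger one.

```python
from itertools import product


def key_filtered_family(input_rel):
    """Family of all maximal subsets of input_rel whose first column is a key.

    Models a copy rule R <- R_I filtered by a key constraint (assumed setup).
    """
    by_key = {}
    for tup in input_rel:
        by_key.setdefault(tup[0], []).append(tup)
    groups = [sorted(group) for _, group in sorted(by_key.items())]
    # Choosing one tuple per key value gives one interpretation of R.
    return {frozenset(choice) for choice in product(*groups)}


def family_leq(fam1, fam2):
    """Assumed ordering on families: each member of fam1 lies inside a member of fam2."""
    return all(any(i <= j for j in fam2) for i in fam1)


small = {("a", 1), ("a", 2)}
large = small | {("b", 7), ("b", 8)}
print(family_leq(key_filtered_family(small), key_filtered_family(large)))
```

Enlarging the unconstrained input relation can only enlarge the resulting family in this ordering, which is the behaviour the proposition describes for free predicates.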

Key Constraints and Monotonic Aggregates in Deductive Databases

7.2 Monotonic Aggregates

Users normally think of an aggregate application rule, such as r, as a direct mapping from r's body to r's head, a mapping which behaves according to the rules defining the aggregate. This view is also close to the actual implementation, since in a system such as LDL++ the execution of the rules in kce(r) is already built into the system. The encapsulated program for an aggregate application rule r will be denoted 0(r); it contains all the rules in kce(r) and the single, multi, ereturn and freturn rules defining the aggregates used in r. Then, the transitive mapping defined by 0(r) transforms families of interpretations of the body of r to families of interpretations of the heads of rules in 0(r). With $I$ an interpretation of the body of r (i.e., a set of atoms from predicates in the body of r), the mapping for 0(r) is equal to $T^{\uparrow\omega}_{0(r)}(\{I\})$ when there are no freturn rules, and is equal to the result of the iterated fixpoint of the stratified 0(r) program otherwise. For instance, consider the definition and application rules for an online sum aggregate msum:

    r′ : q(msum⟨X⟩) ← p(X).
    single(msum, Y, Y).
    multi(msum, Y, Old, New) ← New = Old + Y.
    ereturn(msum, Y, Old, New) ← New = Old + Y.

The transitive mapping established by 0(r′) can be summarized by the chain_r′ atoms, which describe a particular sequencing of the elements in $I$ and the aggregate values for the sequence so generated:

    Family of interpretations    Result of $T^{\uparrow\omega}_{0(r')}$
    {{p(3)}}                     {{chain_r′(nil, 3), q(3)}}
    {{p(1), p(3)}}               {{chain_r′(nil, 1), chain_r′(1, 3), q(1), q(4)},
                                  {chain_r′(nil, 3), chain_r′(3, 1), q(3), q(4)}}
    ...                          ...

Therefore, the mapping defined by the aggregate rules is multivalued, i.e., from families of interpretations to families of interpretations. The ICO for the set of non-aggregate rules P can also be seen as a mapping between families of interpretations by simply letting $T_P(\{I\}) = \{T_P(I)\}$. Then, the encapsulated consequence operator for a program with aggregates combines the immediate consequence operator for regular rules with the transitive consequences for the aggregate rules. Because of the partial monotonicity properties of programs with key constraints, we now derive the following property:

Proposition 7. Let P be a positive program with aggregates defined without final return rules. Then, the encapsulated consequence operator for P is monotonic in the lattice of families of interpretations.
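The multivalued mapping tabulated above can be reproduced with a small Python sketch. It is an illustration only: the function name `msum_family` and the tuple encoding of atoms are hypothetical, and the sketch assumes, as the table suggests, that the running sum is returned for every element of a sequence, including the first.

```python
from itertools import permutations


def msum_family(values):
    """Family of interpretations for q(msum<X>) <- p(X), one per ordering of the p-values."""
    family = set()
    for order in permutations(sorted(values)):
        atoms, prev, total = set(), None, 0
        for y in order:
            atoms.add(("chain", prev, y))  # chain(nil, y) is encoded with prev = None
            total += y                     # single: New = Y; multi: New = Old + Y
            atoms.add(("q", total))        # running sum, as listed in the table
            prev = y
        family.add(frozenset(atoms))
    return family


for interpretation in msum_family({1, 3}):
    print(sorted(interpretation, key=str))
```

Each ordering of the input atoms gives one interpretation, so a two-element input yields a family with two members, matching the second row of the table.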


Carlo Zaniolo

Therefore, aggregates defined without freturn rules will be called monotonic; thus, monotonic aggregates can be used freely in recursive programs. Aggregate computation in actual programs is very similar to the semi-naive computation used to implement deductive databases [5,35], which is based on combining old values with new values according to rules obtained by the symbolic differentiation of the original rules. For aggregates, we can use the same framework with the difference that the rules for storing the old values and those for producing the results are now given explicitly by the programmer through the single/multi and ereturn/freturn rules in the definition.

7.3 Aggregates in Recursion

Our newly introduced theory of aggregates provides a definitive solution to the monotonic aggregation problem, with a simple syntactic criterion to decide if an aggregate is monotonic and can thus be used freely in recursion. The rule is as follows: All aggregates which are defined without any freturn rule are monotonic and can be used freely in recursive rules. The ability to use aggregates with early returns freely in programs allows us to express complex algorithms concisely. For instance, we next define a continuous count that returns the current count after each new element but the first one (thus, it does not have a freturn since that would be redundant).

    single(mcount, Y, 1).
    multi(mcount, Y, Old, New) ← New = Old + 1.
    ereturn(mcount, Y, Old, New) ← New = Old + 1.

Using mcount we can now code the following applications, taken from [24].

Join the Party. Some people will come to the party no matter what, and their names are stored in a sure(Person) relation. But others will join only after they know that at least K = 3 of their friends will be there. Here, friend(P, F) denotes that F is P's friend.

    willcome(P) ← sure(P).
    willcome(P) ← c_friends(P, K), K ≥ 3.
    c_friends(P, mcount⟨F⟩) ← willcome(F), friend(P, F).

Consider now a computation of these rules on the following database.

    friend(jerry, mark).     friend(penny, mark).
    friend(jerry, jane).     friend(penny, jane).
    friend(jerry, penny).    friend(penny, tom).

    sure(mark).    sure(tom).    sure(jane).
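Before looking at the result, the evaluation of these rules on this database can be sketched in Python. This is a plain naive fixpoint iteration rather than the semi-naive procedure, and the function name `will_come` and the set-based encoding of the relations are assumptions made for the illustration.

```python
def will_come(sure, friend, k=3):
    """Least fixpoint of the willcome rules; friend holds (P, F) pairs, F being P's friend."""
    coming = set(sure)
    changed = True
    while changed:
        changed = False
        for person in {p for p, _ in friend}:
            if person in coming:
                continue
            # Plays the role of mcount: friends of this person already known to come.
            count = sum(1 for p, f in friend if p == person and f in coming)
            if count >= k:
                coming.add(person)
                changed = True
    return coming


friend = {
    ("jerry", "mark"), ("penny", "mark"),
    ("jerry", "jane"), ("penny", "jane"),
    ("jerry", "penny"), ("penny", "tom"),
}
sure = {"mark", "tom", "jane"}
print(sorted(will_come(sure, friend)))
```

Because the count for a person can only grow as more people are known to come, repeating the loop until nothing changes reaches the same set of willcome facts regardless of the order in which people are examined.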


On this database, the basic semi-naive computation yields: willcome(mark), willcome(tom), willcome(jane), c_friends(jerry, 1), c_friends(penny, 1), c_friends(jerry, 2), c_friends(penny, 2), c_friends(penny, 3), willcome(penny), c_friends(jerry, 3), willcome(jerry). This example illustrates how the standard semi-naive computation can be applied to queries containing monotonic user-defined aggregates. Another interesting example is transitive ownership and control of corporations [24].

Company Control. Say that owns(C1, C2, Per) denotes the percentage of shares that corporation C1 owns of corporation C2. Then, C1 controls C2 if it owns more than, say, 50% of its shares. In general, to decide whether C1 controls C3 we must also add the shares owned by corporations such as C2 that are controlled by C1. This yields the transitive control rules defined with the help of a continuous sum aggregate that returns the partial sum for each new element, but the first one.

    control(C, C) ← owns(C, _, _).
    control(Onr, C) ← towns(Onr, C, Per), Per > 50.
    towns(Onr, C2, msum⟨Per⟩) ← control(Onr, C1), owns(C1, C2, Per).

Thus, every company controls itself, and a company C1 that has transitive ownership of more than 50% of C2's shares controls C2. In the last rule, towns computes transitive ownership with the help of msum, which adds up the shares held by the companies that Onr controls. Observe that any pair (Onr, C2) is added at most once to control; thus the contribution of C1 to Onr's transitive ownership of C2 is accounted for only once.

Bill-of-Materials (BoM) Applications. BoM applications represent an important application area that requires aggregates in recursive rules. Say, for instance, that assembly(P1, P2, QT) denotes that P1 contains part P2 in quantity QT. We also have elementary parts described by the relation basic_part(Part, Price). Then, the following program computes the cost of a part as the sum of the cost of the basic parts it contains.

    part_cost(Part, 0, Cst) ← basic_part(Part, Cst).
    part_cost(Part, mcount⟨Sb⟩, msum⟨MCst⟩) ← part_cost(Sb, ChC, Cst), prolfc(Sb, ChC),
                                              assembly(Part, Sb, Mult), MCst = Cst ∗ Mult.

Thus, the key condition in the body of the second rule is that a subpart Sb is counted in part_cost only when all of Sb's children have been counted. This occurs when the number of Sb's children counted so far by mcount is equal to the out-degree of this node in the graph representing assembly. This number is kept in the prolificacy table, prolfc(Part, ChC), which can be computed as follows:

    prolfc(P1, count⟨P2⟩) ← assembly(P1, P2, _).
    prolfc(P1, 0) ← basic_part(P1, _).
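The counting condition used in the BoM program can be sketched procedurally in Python. The sketch is hypothetical in its details: the function name `part_costs`, the sample parts and prices, and the assumption that each (part, subpart) pair occurs once in assembly are all introduced here for illustration.

```python
from collections import defaultdict, deque


def part_costs(assembly, basic_part):
    """Cost of every part; a part is finalized once all of its subparts are counted."""
    children = defaultdict(set)
    parents = defaultdict(list)
    for part, sub, qty in assembly:
        children[part].add(sub)
        parents[sub].append((part, qty))
    prolfc = {p: len(subs) for p, subs in children.items()}  # out-degree per part
    cost = dict(basic_part)
    counted = defaultdict(int)   # role of mcount: subparts counted so far
    partial = defaultdict(int)   # role of msum: partial cost so far
    ready = deque(basic_part)    # basic parts are complete from the start
    while ready:
        sub = ready.popleft()
        for part, qty in parents[sub]:
            counted[part] += 1
            partial[part] += cost[sub] * qty
            if counted[part] == prolfc[part]:
                cost[part] = partial[part]
                ready.append(part)
    return cost


assembly = [("bike", "wheel", 2), ("bike", "frame", 1),
            ("wheel", "spoke", 30), ("wheel", "rim", 1)]
basic_part = {"frame": 100, "spoke": 1, "rim": 20}
print(part_costs(assembly, basic_part))
```

A subpart contributes to its parents only after its own count has reached its out-degree, which mirrors the prolfc(Sb, ChC) condition in the second rule.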

8 Conclusions

Keys in derived relations extend the expressive power of deductive databases while retaining their declarative semantics and efficient implementations. In this paper, we have presented equivalent fixpoint and model-theoretic semantics for programs with key constraints in derived relations. Database aggregates can be easily modelled under this extension, yielding a simple characterization of monotonic aggregates. Monotonic aggregates can be freely used in recursive programs, thus providing simple and efficient expressions for optimization and greedy algorithms that had been previously considered impervious to the logic programming paradigm.

There has been a significant amount of previous work that is relevant to the results presented in this paper. In particular, LDL++ provides the choice construct to declare functional dependency constraints in derived relations. The stable model characterization and several other results presented in this paper find a similar counterpart in properties of the LDL++ choice construct [13,37]; however, no fixpoint characterization and related results were known for LDL++ choice. An extension of this concept to temporal logic programming was proposed by Orgun and Wadge [21], who introduced the notion of choice predicates that ensure that a given predicate is single-valued. This notion finds applications in intensional logic programming [21].

The cardinality and weight constraints proposed by Niemelä and Simons provide a powerful generalization of the key constraints discussed here [20]. In fact, while key constraints restrict the cardinality of the results to be one, the mentioned work supports the constraint that such cardinality must lie within a user-specified interval (and different weights can also be attached to atoms). Thus Niemelä and Simons (i) provide a stable model characterization for logic programs containing such constraints, (ii) propose an implementation using Smodels [19], and (iii) show how to express NP-complete problems using these constraints. The implementation approach used for Smodels is quite different from that of LDL++; thus investigating the performance of different approaches in supporting cardinality constraints represents an interesting topic for future research. Also left for future research is the topic of SLD-resolution, which (along with the fixpoint and model-theoretic semantics treated here) would provide a third semantic characterization for logic programs with key constraints [29]. Memoing techniques could be used for this purpose, and for an efficient implementation of keys and aggregates [3].

Acknowledgements

The author would like to thank the reviewers for the many improvements they have suggested, and Frank Myers for his careful proofreading of the manuscript. The author would also like to express his gratitude to Dino Pedreschi, Domenico Saccà, Fosca Giannotti and Sergio Greco, who laid the seeds of these ideas during our past collaborations. This work was supported by NSF Grant IIS-007135.


References

1. S. Abiteboul, R. Hull, and V. Vianu: Foundations of Databases. Addison-Wesley, 1995.
2. N. Bidoit and C. Froidevaux: General logical Databases and Programs: Default Logic Semantics and Stratification. Information and Computation, 91, pp. 15-54, 1991.
3. W. Chen, D. S. Warren: Tabled Evaluation With Delaying for General Logic Programs. JACM, 43(1), pp. 20-74, 1996.
4. D. Chimenti, R. Gamboa, R. Krishnamurthy, S. Naqvi, S. Tsur and C. Zaniolo: The LDL System Prototype. IEEE Transactions on Knowledge and Data Engineering, 2(1), pp. 76-90, 1990.
5. S. Ceri, G. Gottlob and L. Tanca: Logic Programming and Databases. Springer, 1990.
6. S. W. Dietrich: Shortest Path by Approximation in Logic Programs. ACM Letters on Programming Languages and Systems, 1(2), pp. 119-137, 1992.
7. S. J. Finkelstein, N. Mattos, I. S. Mumick, and H. Pirahesh: Expressing Recursive Queries in SQL. ISO WG3 report X3H2-96-075, March 1996.
8. J. M. Hellerstein, P. J. Haas, H. J. Wang: Online Aggregation. SIGMOD 1997: Proc. ACM SIGMOD Int. Conference on Management of Data, pp. 171-182, ACM, 1997.
9. S. Ganguly, S. Greco, and C. Zaniolo: Extrema Predicates in Deductive Databases. JCSS 51(2), pp. 244-259, 1995.
10. M. Gelfond and V. Lifschitz: The Stable Model Semantics for Logic Programming. Proc. Joint International Conference and Symposium on Logic Programming, R. A. Kowalski and K. A. Bowen (eds.), pp. 1070-1080, MIT Press, 1988.
11. F. Giannotti, D. Pedreschi, D. Saccà, C. Zaniolo: Non-Determinism in Deductive Databases. In DOOD'91, C. Delobel, M. Kifer, Y. Masunaga (eds.), pp. 129-146, Springer, 1991.
12. F. Giannotti, G. Manco, M. Nanni, D. Pedreschi: On the Effective Semantics of Nondeterministic, Nonmonotonic, Temporal Logic Databases. Proceedings of 12th Int. Workshop, Computer Science Logic, pp. 58-72, LNCS Vol. 1584, Springer, 1999.
13. F. Giannotti, D. Pedreschi, and C. Zaniolo: Semantics and Expressive Power of Non-Deterministic Constructs in Deductive Databases. JCSS 62, pp. 15-42, 2001.
14. S. Greco, D. Saccà: NP Optimization Problems in Datalog. ILPS 1997: Proc. Int. Logic Programming Symposium, pp. 181-195, MIT Press, 1997.
15. S. Greco and C. Zaniolo: Greedy Algorithms in Datalog with Choice and Negation. Proc. 1998 Joint Int. Conference & Symposium on Logic Programming, JCSLP'98, pp. 294-309, MIT Press, 1998.
16. R. Krishnamurthy, S. Naqvi: Non-Deterministic Choice in Datalog. In Proc. 3rd Int. Conf. on Data and Knowledge Bases, pp. 416-424, Morgan Kaufmann, 1988.
17. V. W. Marek and M. Truszczynski: Nonmonotonic Logic. Springer-Verlag, New York, 1995.
18. J. Minker: Logic and Databases: A 20 Year Retrospective. In D. Pedreschi and C. Zaniolo (eds.), Proceedings International Workshop on Logic in Databases (LID'96), Springer-Verlag, pp. 5-52, 1996.
19. I. Niemelä, P. Simons and T. Syrjanen: Smodels: A System for Answer Set Programming. Proceedings of the 8th International Workshop on Non-Monotonic Reasoning, April 9-11, 2000, Breckenridge, Colorado, 4 pages. (Also see: http://www.tcs.hut.fi/Software/smodels/)
20. I. Niemelä and P. Simons: Extending the Smodels System with Cardinality and Weight Constraints. In Jack Minker (ed.): Logic-Based Artificial Intelligence, pp. 491-521. Kluwer Academic Publishers, 2001.
21. M. A. Orgun and W. W. Wadge: Towards an Unified Theory of Intensional Logic Programming. The Journal of Logic and Computation, 4(6), pp. 877-903, 1994.
22. T. C. Przymusinski: On the Declarative and Procedural Semantics of Stratified Deductive Databases. In J. Minker (ed.), Foundations of Deductive Databases and Logic Programming, pp. 193-216, Morgan Kaufmann, 1988.
23. R. Ramakrishnan, D. Srivastava, S. Sudarshan, and P. Seshadri: Implementation of the CORAL Deductive Database System. SIGMOD'93: Proc. Int. ACM SIGMOD Conference on Management of Data, pp. 167-176, ACM, 1993.
24. K. A. Ross and Y. Sagiv: Monotonic Aggregation in Deductive Databases. JCSS 54(1), pp. 79-97, 1997.
25. D. Saccà and C. Zaniolo: Deterministic and Non-deterministic Stable Models. Journal of Logic and Computation, 7(5), pp. 555-579, 1997.
26. J. S. Schlipf: Complexity and Undecidability Results in Logic Programming. Annals of Mathematics and Artificial Intelligence, 15, pp. 257-288, 1995.
27. S. Sudarshan and R. Ramakrishnan: Aggregation and relevance in deductive databases. VLDB'91: Proceedings of 17th Conference on Very Large Data Bases, pp. 501-511, Morgan Kaufmann, 1991.
28. J. D. Ullman: Principles of Data and Knowledge-Based Systems. Computer Science Press, New York, 1988.
29. M. H. Van Emden and R. Kowalski: The Semantics of Predicate Logic as a Programming Language. JACM 23(4), pp. 733-742, 1976.
30. A. Van Gelder, K. A. Ross, and J. S. Schlipf: The Well-Founded Semantics for General Logic Programs. JACM 38, pp. 620-650, 1991.
31. A. Van Gelder: Foundations of Aggregations in Deductive Databases. In DOOD'93, S. Ceri, K. Tanaka, S. Tsur (eds.), pp. 13-34, Springer, 1993.
32. H. Wang and C. Zaniolo: User-Defined Aggregates in Object-Relational Database Systems. ICDE 2000: International Conference on Data Engineering, pp. 111-121, IEEE Press, 2000.
33. H. Wang and C. Zaniolo: Using SQL to Build New Aggregates and Extenders for Object-Relational Systems. VLDB 2000: Proceedings of 26th Conference on Very Large Data Bases, pp. 166-175, Morgan Kaufmann, 2000.
34. C. Zaniolo and H. Wang: Logic-Based User-Defined Aggregates for the Next Generation of Database Systems. In K. R. Apt, V. Marek, M. Truszczynski, D. S. Warren (eds.): The Logic Programming Paradigm: Current Trends and Future Directions. Springer Verlag, pp. 121-140, 1999.
35. C. Zaniolo, S. Ceri, C. Faloutsos, R. Snodgrass, V. S. Subrahmanian, and R. Zicari: Advanced Database Systems. Morgan Kaufmann, 1997.
36. C. Zaniolo: The Nonmonotonic Semantics of Active Rules in Deductive Databases. In DOOD 1997, F. Bry, R. Ramakrishnan, K. Ramamohanarao (eds.), pp. 265-282, Springer, 1997.
37. C. Zaniolo et al.: LDL++ Documentation and Web Demo, 1988: http://www.cs.ucla.edu/ldl
38. C. Zaniolo: Key Constraints and Monotonic Aggregates in Deductive Databases. UCLA technical report, June 2001.

A Decidable CLDS for Some Propositional Resource Logics

Krysia Broda

Department of Computing, Imperial College
180 Queen's Gate, London SW7 2BZ

Abstract. The compilation approach for Labelled Deductive Systems (CLDS) is a general logical framework. Previously, it has been applied to various resource logics within natural deduction, tableaux and clausal systems, and in the latter case to yield a decidable (first order) CLDS for propositional Intuitionistic Logic (IL). In this paper the same clausal approach is used to obtain a decidable theorem prover for the implication fragments of propositional substructural Linear Logic (LL) and Relevance Logic (RL). The CLDS refutation method is based around a semantic approach using a translation technique utilising first-order logic together with a simple theorem prover for the translated theory using techniques drawn from Model Generation procedures. The resulting system is shown to correspond to a standard LL(RL) presentation as given by appropriate Hilbert axiom systems and to be decidable.

1 Introduction

Among the computational logic community no doubt there are very many people, like me, whose enthusiasm for logic and logic programming was fired by Bob Kowalski. In my case it led to an enduring interest in automated reasoning, and especially the connection graph procedure. In appreciation of what Bob taught me, this paper deals with some non-classical resource logics and uses a classical first order theory to give a clausal theorem prover for them.

The general methodology based on Gabbay's Labelled Deductive Systems (LDS) [9], called the Compiled Labelled Deductive Systems approach (CLDS), is described in [5], [6]. The method allows various logics to be formalised within a single framework and was first applied to modal logics in [14] and generally to the multiplicative part of substructural logics in [5], [6]. The CLDS refutation method is based around a semantic approach using a translation into first-order logic together with a simple theorem prover for the translated theory that employs techniques drawn from Model Generation procedures. However, one critical problem with the approach is that the resulting first order theory is often too expressive and therefore not decidable, even when the logic being modelled is known to be so. It was described in [4] how to construct a decidable refutation prover for the case of Intuitionistic Logic (IL); in this paper that prover is extended, in different ways, to deal with the implication fragments of the propositional resource logics Linear Logic (LL) and Relevance Logic (RL).

A.C. Kakas, F. Sadri (Eds.): Computat. Logic (Kowalski Festschrift), LNAI 2408, pp. 135-159, 2002. © Springer-Verlag Berlin Heidelberg 2002

The motivation for using LDS derives from the observation that many logics only differ from each other in small ways. In the family of modal logics, for example, the differences can be captured semantically through the properties of the accessibility relation, or syntactically within various side-conditions on the proof steps. In substructural logics, the differences can be captured in the syntax by means of the structural proof rules. In a CLDS, capturing differences between logics is achieved through the use of a combined language, incorporating a language for wffs and a language for terms (known as labels), called a labelling language. Elements of the two languages are combined to produce declarative units of the form $\alpha : \lambda$, where $\alpha$ is a wff and $\lambda$ is a label. The interpretation of a declarative unit depends on the particular family of logics being formalised. In the case of modal logics the label $\lambda$ names a possible world, whereas in substructural, or resource, logics it names a combination of resources. A theory built from declarative units is called a configuration and consists both of declarative units and literals stating the relationships between labels of the configuration (called R-literals).

In this LDS approach applied to resource logics the declarative unit $\alpha : \lambda$ represents the statement that the "resource $\lambda$ verifies the wff $\alpha$". This was first exploited in [9]. Resources can be combined using the operator $\circ$ and their power of verification related by $\preceq$, where $\lambda \preceq \lambda'$ is interpreted to mean that $\lambda'$ can verify everything that $\lambda$ can and is thus the more powerful of the two. Depending on the properties given to $\circ$ the power of combined resources can be controlled. In RL, for example, resources can be copied; that is, $\lambda \circ \lambda \preceq \lambda$, or $\lambda$ is just as powerful as multiple copies of itself. In both RL and LL the order in which resources are combined does not matter, so $\lambda \circ \lambda' \preceq \lambda' \circ \lambda$. These properties, contraction and commutativity, respectively, correspond to the structural rules of contraction and permutation of standard sequent calculi for RL and LL. In fact, in LDS, all substructural logics can be treated in a uniform way, simply by including different axioms in the labelling algebra [1].

The semantics of a CLDS is given by translating a configuration into first order logic in a particular way, the notion of semantic entailment being defined with respect to such translated configurations. An example of a configuration is the set of declarative units

    $\{p \to (p \to (q \to p)) : b,\ p : a,\ q : c,\ q \to p : b \circ a \circ a,\ p : b \circ a \circ a \circ c\}$

and R-literals $\{a \circ a \preceq a,\ a \preceq b \circ a \circ a \circ c\}$, called constraints in this paper. The translation of a configuration uses a language of special monadic predicates of the form $[\alpha]^*$, one predicate for each wff $\alpha$. For the above example of a configuration the translation is

    $\{[p \to (p \to (q \to p))]^*(b),\ [p]^*(a),\ [q]^*(c),\ [q \to p]^*(b \circ a \circ a),\ [p]^*(b \circ a \circ a \circ c),\ a \circ a \preceq a,\ a \preceq b \circ a \circ a \circ c\}$

A set of axioms to capture the meanings of the logical operators and a theory, called the labelling algebra, are used for manipulating labels and the relations

[Figure: diagram with three nodes, $C \models_S C'$, $A^+_S \cup FOT(C) \cup \neg FOT(C') \models_{FOL}$ and $A^+_S \cup FOT(C) \cup \neg FOT(C') \vdash_{AlgMG}$, connected by arrows labelled (1), (2) and (3)]

Fig. 1. Refutation CLDS

between them. The language, axiom theory and labelling algebra considered in this paper are together referred to as LCLDS and RCLDS, respectively, for LL and RL. An example of a semantic axiom, using the monadic predicates of the form $[\alpha]^*$, in this case one that captures the meaning of the $\to$ operator, is

    $\forall x([\alpha \to \beta]^*(x) \leftrightarrow \forall y([\alpha]^*(y) \to [\beta]^*(x \circ y)))$

For a given problem, the set of semantic axioms is implicitly instantiated for every wff that occurs in the problem; this set of instances together with a translation of the initial configuration, in which $\alpha : \lambda$ is translated as $[\alpha]^*(\lambda)$, can also be taken as a compiled form of the problem. Any standard first order theorem prover, for example Otter [12], could be used to find refutations, although not always very efficiently. In [4], a decidable refutation theorem prover based on the methods of Davis-Putnam [8], Hyper-resolution [13] and model generation [11] was taken as the proof system and shown to be sound and complete with respect to the semantics. A similar approach can be taken for LL and RL, here called AlgMG, but appropriate new restrictions to retain decidability for LL and RL are required, and definitions of these are the main contribution of this paper.

The CLDS approach is part of a systematic general framework that can be applied to any logic, either old or new. In case a CLDS corresponds to a known logic, the correspondence with a standard presentation of that logic must also be provided. That is, it must be shown (i) that every derivation in the chosen standard presentation of that logic can be simulated by the rules of the CLDS, in this case by the refutation theorem prover, and (ii) how to build an interpretation such that, if a formula $\alpha$ is not a theorem of the logic in question, then there is an appropriate model in which a suitable declarative unit constructed using $\alpha$ is false. It is this second part that needs care in order to obtain a decidable system for the two logics LL and RL.

The approach taken in a refutation CLDS is illustrated in Fig. 1, where $C$ and $C'$ are configurations and $\neg FOT(C)$ denotes the disjunction of the negations of the translated declarative units in $FOT(C)$. Arrow (2) represents the soundness and completeness of the refutation prover and arrow (1) is the definition of the semantics of a CLDS. The derived arrow (3) represents a soundness and completeness property of the refutation procedure with respect to configurations.

A fuller description of the language, labelling algebra and axioms modelling the derivation rules for the languages under consideration is given in Sect. 2, whilst Sect. 3 outlines the theorem prover and the results concerning soundness and completeness. The main result of the paper, dealing with decidability, is in Sect. 4, with proofs of other properties in Sect. 5, and the paper concludes with a brief discussion in Sect. 6.

2 Refutation CLDS for Substructural Logics

The CLDS approach for the implication fragment¹ of LL and RL is now described. Definitions of the language, syntax and semantics are given, and configurations are introduced.

2.1 Languages and Syntax

A CLDS propositional language is defined as an ordered pair $\langle L_P, L_L \rangle$, where $L_L$ is a labelling language and $L_P$ is a propositional language. For the implication fragment of LL and RL the language $L_P$ is composed of a countable set of proposition symbols, $\{p, q, r, \ldots\}$ and the binary connective $\to$. A special proposition symbol is $\bot$, where $\neg A$ is defined also as $A \to \bot$, so allowing negation to be represented. (The wff $\top$ is sometimes used in place of $\bot \to \bot$.) The labelling language $L_L$ is a fragment of a first-order language composed of a binary operator $\circ$, a countable set of variables $\{x, y, z, \ldots\}$, a binary predicate $\preceq$, the set of logical connectives $\{\neg, \wedge, \vee, \to, \leftrightarrow\}$, and the quantifiers $\forall$ and $\exists$. The first-order language $Func(L_P, L_L)$ is an extension of $L_L$ as follows.

Definition 1. Let the set of all wffs in $L_P$ be $\{\alpha_1, \alpha_2, \ldots\}$; then the semi-extended labelling language $Func(L_P, L_L)$ comprises $L_L$ extended with a set of Skolem constant symbols $\{c_{\alpha_1}, c_{\alpha_2}, \ldots\}$, also referred to as parameters.

Terms of the semi-extended labelling language $Func(L_P, L_L)$ are defined inductively, as consisting of parameters and variables, together with expressions of the form $\lambda \circ \lambda'$ for terms $\lambda$ and $\lambda'$, and are also called labels. Note that the parameter $c_\alpha$ represents the smallest label verifying $\alpha$ and that all parameters will have a special role in the semantics. There is the parameter 1 (shorthand for $c_\top$) that represents the empty resource, since $\top$ is always provable.

To capture different classes of logics within the CLDS framework an appropriate first-order theory written in the language $Func(L_P, L_L)$, called the labelling algebra, needs to be defined. The labelling algebra is a binary first-order theory which axiomatises (i) the binary predicate $\preceq$ as a pre-ordering relation and (ii) the properties identity and order preserving of the commutative and associative function symbol $\circ$. For RL, the structural property contraction is also included.

Definition 2. The labelling algebra $A_L$, written in $Func(L_P, L_L)$, is the first order theory given by the axioms (1) - (5), where x, y and z all belong to $Func(L_P, L_L)$. The algebra $A_R$ is the algebra $A_L$ enhanced by axiom (6).

1. (identity) $\forall x[1 \circ x \preceq x \wedge x \preceq 1 \circ x]$
2. (order-preserving) $\forall x, y, z[x \preceq y \to x \circ z \preceq y \circ z \wedge z \circ x \preceq z \circ y]$
3. (pre-ordering) $\forall x[x \preceq x]$ and $\forall x, y, z[x \preceq y \wedge y \preceq z \to x \preceq z]$
4. (commutativity) $\forall x, y[x \circ y \preceq y \circ x]$
5. (associativity) $\forall x, y, z[(x \circ y) \circ z \preceq x \circ (y \circ z)]$ and $\forall x, y, z[x \circ (y \circ z) \preceq (x \circ y) \circ z]$
6. (contraction) $\forall x[x \circ x \preceq x]$

¹ Restricted in order to keep the paper short.
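For ground labels built only from parameters, and with no additional constraints from a configuration, the axioms of the labelling algebra appear to reduce to a comparison of multisets of parameters. The Python sketch below illustrates that reading; it is an assumption of this illustration rather than a result stated in the paper, and the names `atoms`, `leq_linear` and `leq_relevant`, as well as the textual encoding of labels with `o` standing for the combination operator, are hypothetical.

```python
from collections import Counter


def atoms(label):
    """Multiset of parameters in a ground label such as 'b o a o a'.

    '1' is the empty resource and is dropped (identity axiom). Parameters are
    assumed not to be named 'o' or '1'.
    """
    return Counter(tok for tok in label.split() if tok not in ("o", "1"))


def leq_linear(l1, l2):
    """Assumed reading for the algebra without contraction: same parameters, same counts."""
    return atoms(l1) == atoms(l2)


def leq_relevant(l1, l2):
    """Assumed reading with contraction: same parameters, l1 may repeat them more often."""
    m1, m2 = atoms(l1), atoms(l2)
    return set(m1) == set(m2) and all(m1[a] >= m2[a] for a in m2)


print(leq_linear("a o b", "b o a"), leq_relevant("a o a", "a"))
```

Under this reading commutativity and associativity are absorbed by the multiset representation, and contraction only ever lets repeated occurrences on the left be merged, never introduced.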

The CLDS language facilitates the formalisation of two types of information, (i) what holds at particular points, given by the declarative units, and (ii) which points are in relation with each other and which are not, given by constraints (literals). A declarative unit is defined as a pair "formula:label" expressing that a formula "holds" at a point. The label component is a ground term of the language $Func(L_P, L_L)$ and the formula is a wff of the language $L_P$. A constraint is any ground literal in $Func(L_P, L_L)$ of the form $\lambda_1 \preceq \lambda_2$ or $\lambda_1 \not\preceq \lambda_2$, where $\lambda_1$ and $\lambda_2$ are labels, expressing that $\lambda_2$ is, or is not, related to $\lambda_1$. In the applications considered here, little use will be made of negated constraints. In Intuitionistic Logic "related to" was interpreted syntactically as "subset of", but for LCLDS it is interpreted as "has exactly the same elements as" and for RCLDS as "has the same elements as, but possibly with more occurrences".

This combined aspect of the CLDS syntax yields a definition of a CLDS theory, called a configuration, which is composed of a set of constraints and a set of declarative units. An example of a configuration was given in the introduction. The formal definition of a configuration is as follows.

Definition 3. Given a CLDS language, a configuration $C$ is a tuple $\langle D, F \rangle$, where $D$ is a finite set of constraints (referred to as a diagram) and $F$ is a function from the set of ground terms of $Func(L_P, L_L)$ to the set of sets of wffs of $L_P$. Statements of the form $\alpha \in F(\lambda)$ will be written as $\alpha : \lambda \in C$.

2.2 Semantics

The model-theoretic semantics of CLDS is defined in terms of a first-order semantics using a translation method. This enables the development of a model-theoretic approach which is equally applicable to any logic, including logics belonging to different families, whose operators have a semantics that can be expressed in a first-order theory. As mentioned before, a declarative unit $\alpha : \lambda$ represents that the formula $\alpha$ is verified (or holds) at the point $\lambda$, whose interpretation is strictly related to the type of underlying logic. These notions are expressed in terms of first-order statements of the form $[\alpha]^*(\lambda)$, where $[\alpha]^*$ is a predicate symbol. The relationships between these predicate symbols are constrained by a set of first-order axiom schemas which capture the satisfiability conditions of each type of formula $\alpha$. The extended labelling language $Mon(L_P, L_L)$ is an extension of the language $Func(L_P, L_L)$ given by adding a monadic predicate symbol $[\alpha]^*$ for each wff $\alpha$ of $L_P$. It is formally defined below.

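Table 1. Basic and clausal semantic axioms for LCLDS and RCLDS

    Ax1:   $\forall x \forall y(x \preceq y \wedge [\alpha]^*(x) \to [\alpha]^*(y))$
    Ax2:   $\forall x([\alpha]^*(x) \to \exists y([\alpha]^*(y) \wedge \forall z([\alpha]^*(z) \to y \preceq z)))$
    Ax3:   $\forall x([\alpha \to \beta]^*(x) \leftrightarrow \forall y([\alpha]^*(y) \to [\beta]^*(x \circ y)))$
    Ax2a:  $\forall x([\alpha]^*(x) \to [\alpha]^*(c_\alpha))$
    Ax2b:  $\forall x([\alpha]^*(x) \to c_\alpha \preceq x)$
    Ax3a:  $\forall x \forall y([\alpha \to \beta]^*(x) \wedge [\alpha]^*(y) \to [\beta]^*(x \circ y))$
    Ax3b:  $\forall x([\alpha \to \beta]^*(x) \leftarrow [\beta]^*(x \circ c_\alpha))$
    Ax3c:  $\forall x([\alpha \to \beta]^*(x) \vee [\alpha]^*(c_\alpha))$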

Definition 4. Let $Func(L_P, L_L)$ be a semi-extended labelling language. Let the ordered set of wffs of $L_P$ be $\alpha_1, \ldots, \alpha_n, \ldots$; then the extended labelling language, called $Mon(L_P, L_L)$, is defined as the language $Func(L_P, L_L)$ extended with the set $\{[\alpha_1]^*, \ldots, [\alpha_n]^*, \ldots\}$ of unary predicate symbols.

The extended algebra $A^+_L$ for LCLDS is a first-order theory written in $Mon(L_P, L_L)$, which extends the labelling algebra $A_L$ with a particular set of axiom schemas. A LCLDS system $S$ can now be defined as $S = \langle L_P, L_L, A^+_L, AlgMG \rangle$, where AlgMG is the program for processing the first order theory $A^+_L$. Similarly for RCLDS, but using $A^+_R$, which includes the (contraction) property.

The axiom schemas are given in Table 1. There are the basic axioms, (Ax1)-(Ax3), and the clausal axioms, (Ax3a), (Ax3b), etc., derived from them by taking each half of the $\leftrightarrow$ in turn. The first axiom (Ax1) characterises the property that increasing labels $\lambda$ and $\lambda'$, such that $\lambda \preceq \lambda'$, imply that the sets of wffs verified by those labels are also increasing. The second axiom (Ax2) characterises a special property that states that, if a wff $\alpha$ is verified by some label, then it is verified by a "smallest" label. Both these axioms relate declarative units to constraints. The axiom (Ax3) characterises the operator $\to$.

Several of the axioms have been simplified by the use of parameters, (Ax1) and (Ax2) (effectively applying Skolemisation). In (Ax2) the variable $y$ is Skolemised to the parameter $c_\alpha$. The Skolem term $c_\alpha$ is a constant, not depending on $x$, and this is the feature that eventually yields decidability. A standard Skolemisation technique would result in a function symbol depending on $x$, but the simpler version suffices for the following reason. Using (Ax2), any two "normal" Skolem terms, $c_\alpha(x_1)$ and $c_\alpha(x_2)$, would satisfy $c_\alpha(x_1) \preceq c_\alpha(x_2)$ and $c_\alpha(x_2) \preceq c_\alpha(x_1)$. By (Ax1) this would allow the equivalence of $[\alpha]^*(c_\alpha(x))$ and $[\alpha]^*(c_\alpha(y))$ for any $x$ and $y$. The single representative $c_\alpha$ is introduced in place of the "normal" Skolem terms $c_\alpha(x)$. It is not very difficult to show that, for any set $S$ of instances of the axiom schema Skolemised in the "normal" way using Skolem symbols $c_\alpha(x)$, $S$ is inconsistent iff the same set of instances of the axioms, together with a set of clause schema of the form $\forall x([\alpha]^*(c_\alpha) \leftrightarrow [\alpha]^*(c_\alpha(x)))$, is inconsistent.

A Decidable CLDS for Some Propositional Resource Logics

The Skolemised (Ax2) can also be simplified to the following equivalent version (also called (Ax2))

∀x([α]∗(x) → ([α]∗(cα) ∧ cα ⊑ x))

from which (Ax2a) and (Ax2b) are derived. In the system of [4] for IL a further simplification was possible, in that (Ax3c) could be replaced by [α → β]∗(1) ∨ [α]∗(a). This is not the case for LCLDS or RCLDS, which consequently require a slightly more complicated algorithm AlgMG. The clausal axioms in Table 1, together with the appropriate properties of the Labelling Algebra, are also called the Extended Labelling Algebra, A+_L or A+_R. It is for finite sets of instances of these axioms that a refutation theorem prover is given in Sect. 3.

The notions of satisfiability and semantic entailment are common to any CLDS and are based on a translation method which associates syntactic expressions of the CLDS system with sentences of the first-order language Mon(LP, LL), and hence associates configurations with first-order theories in the language Mon(LP, LL). Each declarative unit α : λ is translated into the sentence [α]∗(λ), and constraints are translated as themselves. A formal definition is given below.

Definition 5. Let C = ⟨D, F⟩ be a configuration. The first-order translation of C, FOT(C), is a theory in Mon(LP, LL) and is defined by the expression: FOT(C) = D ∪ DU, where DU = {[α]∗(λ) | α ∈ F(λ), λ is a ground term of Func(LP, LL)}.

The notion of semantic entailment for LCLDS as a relation between configurations is given in terms of classical semantics using the above definition. In what follows, wherever A+_L and |=L are used, A+_R and |=R could also be used, assuming the additional property of (contraction) in the Labelling Algebra.²
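The translation of Definition 5 can be illustrated with a minimal sketch. It assumes that a configuration is held as a Python set of constraint strings together with a dictionary from labels to sets of formulas; this encoding, the ASCII rendering of the symbols, and the name `fot` are illustrative assumptions and not part of the CLDS formalism.

```python
def fot(diagram, units):
    """First-order translation of a configuration (D, F), after Definition 5.

    diagram: set of constraints, kept as opaque strings (assumed encoding).
    units:   dict mapping a ground label to the set of wffs holding there.
    Returns D united with DU, where DU contains one atom [alpha]*(label)
    for every declarative unit alpha : label.
    """
    du = {f"[{alpha}]*({label})"
          for label, formulas in units.items()
          for alpha in formulas}
    return set(diagram) | du


# Hypothetical configuration: one constraint and two declarative units.
D = {"a.b <= b.a"}
F = {"a": {"A -> B"}, "b": {"A"}}
theory = fot(D, F)
```

Constraints are copied unchanged, so the result is simply a set of ground literals to which the axiom instances can later be added.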

Definition 6. Let S = ⟨LP, LL, A+_L, AlgMG⟩ be a LCLDS, C = ⟨D, F⟩ and C′ = ⟨D′, F′⟩ be two configurations of S, and FOT(C) = D ∪ DU and FOT(C′) = D′ ∪ DU′ be their respective first-order translations. The configuration C semantically entails C′, written C |=L C′, iff A+_L ∪ FOT(C) ∪ ¬FOT(C′) |=FOL ⊥.

If δ is a declarative unit or constraint belonging to C′ and FOT(δ) its first-order translation, then C |=L C′ implies that A+_L ∪ FOT(C) ∪ ¬FOT(δ) |=FOL ⊥, which will also be written as C |=L δ. Declarative units of the form α : 1, such that T∅ |=L α : 1, where T∅ is an empty configuration (i.e. D and F are both empty), are called theorems. In order to show that a theorem α : 1 holds in LCLDS (RCLDS), appropriate instances of the axioms in A+_L (A+_R) are first formed for each subformula of α, and then ¬[α]∗(1) is added. This set of clauses is refuted by AlgMG. More generally, to show that α follows from the wffs β1, . . . , βn, the appropriate instances include those for each subformula of α, β1, . . . , βn, together with ¬[α]∗(i), where i = cβ1 ◦ . . . ◦ cβn, together with the set {[βj]∗(cβj)}. This derives from consideration of the deduction theorem, namely, that {βj} implies α iff β1 → . . . → βn → α is a theorem. Notice that, if a formula β occurs more than once, then [β]∗(cβ) need only be included once in the translated data, but its label cβ is included in i as many times as it occurs.

² Recall that ¬FOT(C) means the disjunction of the negations of the literals in FOT(C).
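The formation of axiom instances for each subformula can be sketched as follows. This is a rough illustration and not the author's translation procedure. Formulas are assumed to be encoded as nested tuples `("->", A, B)` with atoms as strings, clauses are produced as plain strings, and `c[A]` stands for the parameter of the wff A. The sketch emits (Ax3a), (Ax3b) and (Ax3c) for every implication subformula and makes no attempt to prune instances.

```python
def show(f):
    """Render a formula: atoms are strings, implications are ('->', A, B)."""
    if isinstance(f, str):
        return f
    return f"({show(f[1])} -> {show(f[2])})"


def subformulas(f):
    """Yield every subformula occurrence of f, starting with f itself."""
    yield f
    if not isinstance(f, str):
        yield from subformulas(f[1])
        yield from subformulas(f[2])


def clauses_for(goal):
    """Clausal axiom instances for each distinct implication subformula of
    goal, followed by the negated goal at the unit label 1."""
    out, seen = [], set()
    for f in subformulas(goal):
        if isinstance(f, str) or f in seen:
            continue
        seen.add(f)
        imp, a, b = show(f), show(f[1]), show(f[2])
        out.append(f"[{imp}]*(x) & [{a}]*(y) -> [{b}]*(x.y)")  # (Ax3a)
        out.append(f"[{b}]*(x.c[{a}]) -> [{imp}]*(x)")         # (Ax3b)
        out.append(f"[{imp}]*(x) | [{a}]*(c[{a}])")            # (Ax3c)
    out.append(f"~[{show(goal)}]*(1)")                         # negated goal
    return out


# (A -> B) -> ((B -> C) -> (A -> C)), with A, B, C standing for alpha, beta, gamma
goal = ("->", ("->", "A", "B"), ("->", ("->", "B", "C"), ("->", "A", "C")))
clauses = clauses_for(goal)
```

The goal has five implication subformulas, so the sketch produces fifteen axiom instances plus the negated goal.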

3 A Theorem Prover for LCLDS and RCLDS Systems

The Extended Labelling Algebra A+_L enjoys a very simple clausal form. The theorem prover AlgMG, described below as a logic program, uses an adaptation of the Model Generation techniques [11]. The axioms of the Labelling Algebra AL, or, including (contraction), AR, together with Axioms (Ax1) and (Ax2a) are incorporated into the unification algorithm, called AlgU. Axioms (Ax1), (Ax2a) and (Ax2b) were otherwise accounted for in the derivation of the remaining axioms and are not explicitly needed any further. First, some definitions are given for this particular kind of first-order theory.

Note 1. In this section, a clause will either be denoted by C, or by L ∨ D, where L is a literal and D is a disjunction of zero or more literals. All variables are implicitly universally quantified. Literals are generally denoted by L or ¬L, but may also be denoted by: L(x) or L(y), when the argument is exactly the variable x or y; L(u), when the argument contains no variables; L(xu), when it contains a variable x and other ground terms u, in which case u is called the ground part; or L(w), when the argument may, or may not, contain a variable. The suffixes 1, 2, etc. are also used if necessary. For ease of reading and writing, label combinations such as a ◦ b ◦ c will be written as abc. It is convenient to introduce the multiset difference operator − on labels in which every occurrence counts. For example, aab − ab = a and ab − 1 = ab. In the sequel, by non-unit parameter will be meant any parameter cα other than c (=1).

Definition 7. For a given set of clauses S, the set DS, the Herbrand Domain of S, is the set {cα | cα is a non-unit parameter occurring in S} ∪ {1}. The Herbrand Universe of S is the set of terms formed using the operator ◦ applied to elements from the Herbrand Domain.
A ground instance of a clause C or literal L (written Cθ or Lθ) is the result of replacing each variable xi in C or L by a ground term ti from the Herbrand Universe, where the substitution θ = {xi := ti}.

Definition 8. u1 unifies with u2 (with respect to AlgU) iff u1 ⊑ u2.

Notice that unification is not symmetric. In AlgMG it is also necessary to unify non-ground terms and details of the various cases (derived from the ground case), which are different for each of RL and LL, are given next. They are labelled (a), (b) etc. for reference.

(a) (ground, ground+var) u1 unifies with xu2, where u2 may implicitly be the label 1, iff there is a ground substitution θ for x such that u1 unifies with (xu2)θ. In the case of LL there is only one possible value for θ, viz. x := u1 − u2, but in the case of RL there may be several possible values, depending on the number of implicit contraction operations applied to u1. For example, aaa unifies with ax, with substitutions x := 1, x := a or x := aa.
(b) (ground+var, ground) xu1 unifies with u2, where u1 may implicitly be the label 1, iff there is a ground substitution θ such that (xu1)θ unifies with u2. The substitution θ is chosen so that (xu1)θ is the largest possible term that unifies with u2 (under ⊑). For example, in RL, ax unifies with ab with substitution x := b, even though other substitutions for x are possible, e.g. x := abb.³ If u1 = 1 this case reduces to x := u2.
(c) (var+ground, var+ground) x1u1 unifies with x2u2 iff there are substitutions θ1 and θ2 for variables x1 and x2 of the form x1 := u3x and x2 := u4x, such that u1u3 unifies with u2u4. Either or both of u1, u2 may implicitly be the label 1. The substitution for x1 is maximal (under ⊑), in the sense that any other possible substitution for x1 has the form x1 := u5x, where u5 ⊑ u3. In LL there is only one possible substitution for x2 of the right form, namely x2 := x ◦ (u1 − u2). In RL there may be several possible substitutions, depending on the number of implicit contraction steps. For example, in RL, aax1 unifies with bx2 with both the substitutions x1 := bx, x2 := ax or x1 := bx, x2 := aax. However, because of the presence of the variable x in the substitution for x2, it is only necessary to use the maximal substitution, which is the first one. The reader can check that the correct results are obtained if u1 = 1 or u2 = 1, namely x1 = x2u2 or x2 = u1x1, respectively.

Subsumption can also be applied between literals.

Definition 9. L(w) subsumes L(w′) iff w unifies with w′ with unifier θ and L(w′) is identical to L(w)θ.

This definition leads to the following cases.
(d) (ground, ground) L(u1) subsumes L(u2) iff u1 ⊑ u2.
(e) (ground, ground+var) L(u1) does not subsume L(xu2).
(f) (ground+var, ground) L(xu1) subsumes L(u2) iff there is a ground substitution θ for x such that (xu1)θ unifies with u2.
(g) (ground+var, ground+var) L(x1u1) subsumes L(x2u2) iff there is a substitution θ for x1 of the form x1 := x2u3 such that u3u1 unifies with u2. For example, in RL, P(xaa) subsumes P(ay) and P(aby), but it does not subsume P(by).

Literal L subsumes clause C iff L subsumes a literal in C.

Definition 10. Unit clause L(w) resolves with D ∨ ¬L(w′) to give Dθ iff w unifies with w′ with unifier θ. If D is empty and L(w) and ¬L(w′) resolve, then they are called complements of each other. A Hyper-resolvent is a clause with no negative literals formed by resolving a clause with one or more positive unit clauses.

³ Recall that in the presence of contraction bb ⊑ b.
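The multiset operations on labels, the ground case of unification (Definition 8) and case (a) can be sketched as below. This is only an illustration under stated assumptions: labels are multisets of single-character parameters, and u1 ⊑ u2 is read for LL as equality of multisets and for RL as "same parameters, with u1 having at least as many occurrences of each", a reading inferred from the aaa/ax example and the remark that bb ⊑ b under contraction. The function names are hypothetical.

```python
from collections import Counter
from itertools import product


def label(s):
    """A ground label as a multiset of parameters; '1' is the unit label."""
    return Counter(ch for ch in s if ch != "1")


def minus(u, v):
    """Multiset difference u - v in which every occurrence counts."""
    return u - v  # Counter subtraction keeps only positive counts


def text(u):
    """Canonical string for a label (parameter order is irrelevant)."""
    return "".join(sorted(u.elements())) or "1"


def unifies(u1, u2, logic):
    """Ground case: u1 unifies with u2 iff u1 is below u2 (assumed reading)."""
    if logic == "LL":
        return u1 == u2
    return set(u1) == set(u2) and all(u1[p] >= u2[p] for p in u2)


def case_a(u1, u2, logic):
    """Values of x for which ground u1 unifies with x.u2 (case (a))."""
    if any(u2[p] > u1[p] for p in u2):
        return []                        # u2 is not contained in u1
    if logic == "LL":
        return [text(minus(u1, u2))]     # the single value x := u1 - u2
    params = sorted(u1)
    # In RL, x must supply any parameter that u2 lacks, and x.u2 may not
    # exceed the number of occurrences available in u1.
    ranges = [range(0 if u2[p] else 1, u1[p] - u2[p] + 1) for p in params]
    subs = []
    for counts in product(*ranges):
        x = Counter({p: n for p, n in zip(params, counts) if n})
        subs.append(text(x))
    return subs


rl_subs = case_a(label("aaa"), label("a"), "RL")
ll_subs = case_a(label("aaa"), label("a"), "LL")
```

Because a label is a multiset, associativity, commutativity and identity of ◦ are built into the representation, so `label("abc")` and `label("bca")` compare equal.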

Brief Overview of AlgMG. AlgMG for the implication fragment operates on sets of clauses, each of which may either be a Horn clause (including unit clauses), or a non-Horn clause of the form ∀x([α]∗(cα) ∨ [α → β]∗(x)). There is just one kind of negative unit clause, ¬[α]∗(i), derived from the initial goal, where α is the wff to be proved and i = i1 ◦ . . . ◦ in is the label consisting of the parameters i1, . . . , in that verify the formulas from which α is to be proved. AlgMG incorporates the special unification algorithm AlgU, which is used to unify two labels x and z, where x and/or z may contain a variable, implicitly taking into account the properties of A+_L or A+_R and the different definitions of unifier (cases (a) to (c) above). Notice that the order of parameters in a label does not matter because of the properties (associativity) and (commutativity), so abc would match with bca, for example. By (identity), the parameter 1 is only explicitly needed in the label 1 itself, which is treated as the empty multiset. There are, in fact, only a restricted number of kinds of unification which can arise using AlgMG and these are listed after the available rules have been described.

The initial set of clauses for refuting a formula α is derived from instances of the semantic axioms appropriate for the predicates occurring in the first-order translation of α (called the "appropriate set of clauses for showing α"). There are seven different rules in AlgMG, which can be applied to a finite list of clauses. Five are necessary for the operation of the algorithm and the other two, (Simplify) and (Purity), are useful for the sake of practicality; only (Simplify) is included here. The (Purity) rule serves to remove a clause if it can be detected that it cannot usefully contribute to the derivation. Unit clauses in a list, derived by the (Hyper) or (Split) rule, or given initially, are maintained as a partial model of the initial clauses.
The following rules are available in AlgMG:

End. A list containing an atom and its complement is marked as successfully finished. The only negative unit clause is derived from the initial goal.
Subsumption. Any clause subsumed by a unit clause L is removed.
Simplify. A unit clause [α]∗(x) can be used to remove any literal ¬[α]∗(w) in a clause since [α]∗(x) complements ¬[α]∗(w).
Fail. A list in which no more steps are possible is marked as failed and can be used to give a model of the initial clause set.
Hyper. A hyper-resolvent (with respect to AlgU) is formed from a non-unit clause in the list and (positive) unit clauses in the list. Only hyper-resolvents that cannot immediately be subsumed are generated.
Split. If L is a list of clauses containing clause L′ ∨ L″, two new lists [L′|L−] and [L″|L−] are formed, where L− results from removing L′ ∨ L″ from L. The algorithm is then applied to each list.

The possible opportunities for unification that arise in AlgMG are as follows:

1. Unification of a label of the form xu in a positive literal, where x may be missing, with y in a negative literal in a (Hyper) step; the unifier is given as in case (a) or case (c) as appropriate.
2. Unification of a label x in a positive literal with some label w in a (Simplify) step. This always succeeds and w is unchanged. (This is a special case of (b) or (c).)
3. Unification of a label of the form xu1 in a positive literal, where either of x or u1 may be missing, with u2 in the negative literal in an (End) step. This is either the ground case of unification, that is u1 ⊑ u2, or case (b).
4. Unification in a (Hyper) step between a label of the form xu, where either x or u may be missing, with cαy. This is again either case (a) or (c).

If use of either the (Hyper) or (Simplify) rule yields a label in which there are two variables, they can be replaced by a new variable x.

The (Hyper) rule is the problem rule in AlgMG for the systems LCLDS and RCLDS. Its unrestricted use in a branch can lead to the generation of atoms with labels of increasing length. For example, the clause schema arising from α → α is [α → α]∗(x) ∧ A(y) → A(xy), which, if there are atoms of the form [α → α]∗(u1) and A(u2), will lead to A(u1u2), A(u1u2u2) and so on, possibly none of them subsumed by earlier atoms. Therefore, without some restriction on its use, development of a branch could continue forever. The LDS tableau system in [1] and the natural deduction system in [2] both exhibited a similar problem, but its solution was not addressed in those papers. In the application to IL, due to the additional property of monotonicity in the labelling algebra, that x ⊑ x ◦ y, labels could be regarded as sets of parameters. Together with the fact that the Herbrand Domain for any particular problem was finite, there was an upper bound on the size of labels generated (i.e. on the number of occurrences of parameters in a label) and hence the number of applications of (Hyper) was finite and termination of the algorithm was assured. In the two systems LCLDS and RCLDS this is not so any more and a more complex bound must be used to guarantee termination. Before introducing these restrictions, an outline logic program for AlgMG is given together with some examples of its operation.

Outline Program for Algorithm AlgMG. The program is given below.
A rudimentary version has been built in Prolog to check very simple examples similar to those in this paper.

    % 0 (start)
    dp(S,F,R) :- dp1([],S,F,R).
    % 1 (fail)
    dp1(M,S,M,false) :- noRulesApply(M,S).
    % 2 (end)
    dp1(M,S,[],true) :- endApplies(S,M).
    % 3 (subsume)
    dp1(M,S,F,R) :- subsumed(C,M,S), remove(C,S,NewS), dp1(M,NewS,F,R).
    % 4 (simplify)
    dp1(M,S,F,R) :- simplify(M,S,NewS), dp1(M,NewS,F,R).
    % 5 (hyper)
    dp1(M,S,F,R) :- hyper(M,S,New), add(New,S,M,NewS,NewM),
                    dp1(NewM,NewS,F,R).
    % 6 (split)
    dp1(M,S,F,R) :- split(M,S,NewS,S1,S2),
                    dp1([S1|M],NewS,F1,R1), dp1([S2|M],NewS,F2,R2),
                    join(F1,F2,F), and(R1,R2,R).

Initial clauses:
(1) P0(a)
(2) ¬P1(a)
(3) P2(b) ∨ P1(x)
(4) P3(bx) → P1(x)
(5) P0(x) ∧ A(y) → B(xy)
(6) P2(x) ∧ B(y) → C(xy)
(7) A(c) ∨ P3(x)
(8) C(cx) → P3(x)

Initial translation:
P0(x)   [α → β]∗(x)
P1(x)   [(β → γ) → (α → γ)]∗(x)
P2(x)   [β → γ]∗(x)
P3(x)   [α → γ]∗(x)

Derivation:
(9)  (Split (3))  P2(b)
(10) (Split (7))  A(c)
(11) (Hyper (5))  B(ac)
(12) (Hyper (6))  C(abc)
(13) (Hyper (8))  P3(ab)
(14) (Hyper (4))  P1(a)
(15) (End)
(16) (Split (7))  P3(x)
(17) (Hyper (4))  P1(x)
(18) (End)
(19) (Split (3))  P1(x)
(20) (End)

Fig. 2. Refutation of (α → β) → ((β → γ) → (α → γ)) in LCLDS using AlgMG

The initial call is the query dp(S, F, R), in which F and R will be variables, and S is a list of clauses appropriate for showing α and derived from a LCLDS or RCLDS. At termination, R will be bound either to true or to false and in the latter case F will be bound to a list of unit clauses. The list F can be used to find a finite model of S. Assume that any subsumed clauses in the initial set of

clauses have been removed. This means that in the initial call to dp, S contains neither subsumed clauses nor tautologies, the latter because of the way the clauses are originally formed. This property will be maintained throughout. In dp1 the first argument is the current (recognised) set of positive unit clauses, which is assumed to be empty at the start.⁴ The predicates used in the Prolog version of AlgMG can be interpreted as follows ((S, M) represents the list of all clauses in S and M):

add(New,S,M,NewS,NewM) holds iff the units in New derived from the (Hyper) rule are added to M to form NewM and disjunctions in New are added to S to form NewS.
and(X,Y,Z) holds iff Z = X ∧ Y.
endApplies(S,M) holds iff (End) can be applied to (S, M).
hyper(M,S,New) holds iff New is a set of hyper-resolvents using unit clauses in M and a clause in S, that do not already occur in M. The labels of any new hyper-resolvents are subject to a size restriction (see later), in order that there are not an infinite number of hyper-resolvents.
join(F1,F2,F) holds iff F is the union of F1 and F2.
noRulesApply(M,S) holds iff there are no applicable rules to (M, S).
remove(P,S,NewS) holds iff clause P is removed from S to give NewS.
simplify(M,S,NewS) holds iff clauses in S can be simplified to NewS by units in M.
split(M,S,NewS,S1,S2) holds iff S1 ∨ S2 is removed from S to leave NewS.
subsumed(C,M,S) holds if clause C in S is subsumed by clauses from S or M.

Examples. Two examples of refutations appear in Figs. 2 and 3, in which the LL theorem (α → β) → ((β → γ) → (α → γ)) and the RL theorem (α → β) → ((β → γ) → ((α → γ) → ((α → γ) → δ)) → δ) are, respectively, proved. For ease of reading, the parameters used are called a, b, c, . . . instead of having the form cα→β, etc. and the predicates A, B and C are used in place of [α]∗, [β]∗ and [γ]∗. In Fig. 2, the (translation of the) data α → β is added as a fact and the goal is (the translation of) (β → γ) → (α → γ). In Fig. 3, the initial data α → β, β → γ and (α → γ) → ((α → γ) → δ) are added as facts. The goal in this case is δ. These arrangements simply make the refutations a little shorter than if the initial goal had been the immediate translation of the theorem to be proved.

The calls to dp1 can be arranged into a tree, bifurcation occurring when the (Split) rule is used. In the derivations each line after the list of initial clauses records a derived clause. Derived unit clauses would be added to an accumulating partial model M, which is returned in case of a branch ending in failure. In Fig. 2, for example, there are three branches in the tree of calls to dp1, which all contain lines (1) - (8) implicitly and terminate using the (End) rule. The first branch contains lines (9) - (15), the second contains lines (9), (16) - (18), and the third contains lines (19), (20). Deletions of clauses due to purity and subsumption, and of literals due to simplify, are not made, for the sake of simplicity. However, line (17) could have been achieved by a (Simplify) step instead. A possible subsumption step after line (16) is the removal of clauses (7) and (8).

Notice that, in Fig. 2, only some of the appropriate axioms have been included. It might be expected that clauses derived from both halves of the appropriate equivalence schemas would be included, resulting in the inclusion of, for instance, P3(x) ∧ A(y) → C(xy). However, it is only necessary to include a restricted number of clauses based on the polarity of the sub-formula occurrences.

⁴ In case the initial goal is to be shown from some data, in the start clause this initial data would be placed in the first argument of dp1.

Initial clauses:
(1)  P0(a)
(2)  P1(b)
(3)  P2(c)
(4)  ¬D(abc)
(5)  P0(x) ∧ A(y) → B(xy)
(6)  P1(x) ∧ B(y) → C(xy)
(7)  P2(x) ∧ P3(y) → P4(xy)
(8)  P4(x) ∧ P3(y) → D(xy)
(9)  A(d) ∨ P3(x)
(10) C(dx) → P3(x)

Initial translation:
P0(x)   [α → β]∗(x)
P1(x)   [β → γ]∗(x)
P2(x)   [(α → γ) → ((α → γ) → δ)]∗(x)
P3(x)   [α → γ]∗(x)
P4(x)   [(α → γ) → δ]∗(x)

Derivation:
(11) (Split (9))   A(d)
(12) (Hyper (5))   B(ad)
(13) (Hyper (6))   C(bad)
(14) (Hyper (10))  P3(ba)
(15) (Hyper (7))   P4(bac)
(16) (Hyper (8))   D(bacba)
(17) (End)
(18) (Split (9))   P3(x)
(19) (Hyper (7))   P4(cx)
(20) (Hyper (8))   D(cx)
(21) (End)

Fig. 3. Refutation in RCLDS using AlgMG
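The branching structure of the outline program can be reproduced with a purely propositional sketch. It is not AlgMG itself: labels, AlgU, (Subsumption) and (Simplify) are left out, clauses are assumed to be ground already, and the encoding of a clause as a (body, head) pair is an illustrative choice. Only (End), (Hyper), (Split) and (Fail) are modelled, with the ground instances of clauses (1) to (8) of Fig. 2 written out by hand.

```python
def dp1(model, clauses):
    """Ground analogue of dp1; returns (F, R) like the outline program.

    model:   frozenset of atoms accepted so far (the partial model M).
    clauses: list of (body, head) pairs; body is a frozenset of atoms that
             must all hold, head is a tuple of alternative atoms
             (an empty head is a negative clause, i.e. the goal).
    """
    # End: a negative clause whose body is satisfied closes the branch.
    for body, head in clauses:
        if not head and body <= model:
            return [], True
    # Hyper: fire a Horn clause whose body holds and whose head is new.
    for body, head in clauses:
        if len(head) == 1 and body <= model and head[0] not in model:
            return dp1(model | {head[0]}, clauses)
    # Split: branch on a disjunction that is not yet satisfied.
    for body, head in clauses:
        if len(head) > 1 and body <= model and not model.intersection(head):
            results = [dp1(model | {h}, clauses) for h in head]
            failed = sorted({atom for f, _ in results for atom in f})
            return failed, all(r for _, r in results)
    # Fail: no rule applies, so the branch gives a model of the clauses.
    return sorted(model), False


def dp(clauses):
    """Start with an empty partial model, as in the start clause."""
    return dp1(frozenset(), clauses)


fs = frozenset
fig2 = [
    (fs(), ("P0(a)",)),                     # (1)
    (fs({"P1(a)"}), ()),                    # (2) negated goal
    (fs(), ("P2(b)", "P1(a)")),             # (3) with x := a
    (fs({"P3(ab)"}), ("P1(a)",)),           # (4) with x := a
    (fs({"P0(a)", "A(c)"}), ("B(ac)",)),    # (5)
    (fs({"P2(b)", "B(ac)"}), ("C(abc)",)),  # (6)
    (fs(), ("A(c)", "P3(ab)")),             # (7) with x := ab
    (fs({"C(abc)"}), ("P3(ab)",)),          # (8) with x := ab
]
refuted = dp(fig2)
open_branch = dp(fig2[1:])  # drop the fact P0(a): one branch stays open
```

With all eight clauses every branch closes by (End), mirroring the three branches of Fig. 2. Without the fact P0(a), the branch containing P2(b) and A(c) cannot be extended and is returned as a partial model.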

4 Main Results

4.1 Termination of AlgMG

In this section suitable termination criteria are described for the (Hyper) rule of AlgMG for the two logics in question, Linear Logic and Relevance Logic. A different condition is imposed for each of LCLDS and RCLDS and in such a way that termination of a branch without the use of (End) will not cause loss of soundness. That is, AlgMG will terminate a branch without (End) only if the original goal to be checked is not a theorem of LL (or RL). It is assumed that the translation of the initial goal α is initially included in the list S in AlgMG in the form ¬[α]∗(1). The termination conditions for the two logics are, at first sight, rather similar; however, the condition for LL uses a global restriction, whereas that for RL uses local restrictions, dependent on the particular development of the AlgMG tree.

When forming the translation of a configuration, clauses corresponding to axiom (Ax3c) for which the same wff α is involved all make use of the same parameter cα. The number of occurrences of a non-unit parameter cα for wff α in instances of axiom (Ax3c) is called the relevant index of cα and is denoted by mα. For example, in case an occurrence of axiom (Ax3c) is made for the two wffs α → β and α → γ, then the two occurrences would be [α]∗(cα) ∨ [α → β]∗(x) and [α]∗(cα) ∨ [α → γ]∗(x) and mα = 2.

Definition 11. Let LCLDS be a propositional Linear LDS based on the languages LP and LL, and S be a set of clauses appropriate for showing the wff α. The finite subset of terms in Func(LP, LL) that mentions only parameters in S and does not include any non-unit parameter cα more times than its relevant index mα is called the restricted Linear Herbrand Universe HL. The restricted set of ground instances SHL is the set of ground instances of clauses in S such that every argument is in HL. The restricted atom set BHL is the set of atoms using predicates mentioned in S and terms in HL.

Termination in LCLDS.
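Definition 11 can be made concrete with a small sketch that enumerates the restricted Linear Herbrand Universe for given relevant indices. The dictionary encoding, the function names and the choice of index 1 for each of the four parameters of Fig. 3 are assumptions made for illustration (the text only states mb = 1 explicitly). Since each parameter may occur between 0 and m times independently, the number of labels is the product of the values m + 1.

```python
from itertools import product
from math import prod


def restricted_universe(index):
    """Labels of H_L: each parameter p occurs at most index[p] times.

    index maps each non-unit parameter to its relevant index.
    A label is written as a string of parameters, '1' for the unit label.
    """
    params = sorted(index)
    labels = []
    for counts in product(*(range(index[p] + 1) for p in params)):
        s = "".join(p * n for p, n in zip(params, counts))
        labels.append(s or "1")
    return labels


def in_universe(lbl, index):
    """True iff lbl uses only known parameters, none beyond its index."""
    body = lbl.replace("1", "")
    return all(p in index and body.count(p) <= index[p] for p in set(body))


m = {"a": 1, "b": 1, "c": 1, "d": 1}    # indices assumed for Fig. 3
H = restricted_universe(m)
bound = prod(v + 1 for v in m.values())  # number of labels in H_L
```

Under these assumed indices the label bac lies in HL, whereas bacba does not, because b occurs twice.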
The criterion to ensure termination in LCLDS is as follows: Let B be a branch of a tree generated by AlgMG; an atom L(w) may be added to B only if it is not subsumed by any other atom in B and has a ground instance L(u), where u ∈ HL. (Notice that any atom of the form P(ux), where u contains every parameter cα exactly mα times, has only one ground instance, P(w), such that w ∈ HL. This instance occurs when x = 1 and w = u. This atom would therefore only be added to B if not already present.) The above criterion places an upper bound on the potential size of u such that, at worst, there can be Π(mαi + 1) atoms for each predicate in any branch, where mαi are the relevant indices for non-unit parameters cαi. There is one predicate for each subformula in α, the given formula to be tested. In fact, for LL, it is possibly simpler to use a more restrictive translation, in which a different parameter is introduced for each occurrence of α. Then the relevant index of any non-unit parameter is always 1, and the terms in HL are restricted to containing any non-unit parameter at most once. The formula for the number of atoms then reduces to 2^n, where n is the number of non-unit parameters introduced by the translation. In practice there are fewer than this maximum due to subsumption.

If AlgMG is started with an initial set of sentences S appropriate for showing α and termination occurs with (End) in all branches, then, as is shown in Sect. 5, α is a theorem of LL. On the other hand, suppose termination of a branch B occurs without using (End), possibly because of the size restriction. Then a model of SHL can be constructed as follows: Assign true to each atom in BHL that occurs also in B or that is subsumed by an atom in B, and false to all other atoms in BHL. For illustration, if the example in Fig. 3 were redone using LCLDS, then the step at line (16) would not have been generated, nor could the branch be extended further; the atoms in it can be used to obtain a finite model of the initial clauses. The following atoms would be assigned true: P0(a), P1(b), P2(c), A(d), B(ad), C(bad), P3(ba), P4(bac) and all other atoms in BHL would be assigned false. It is easy to check that this is a model of the ground instances of clauses (1) - (10) whose terms all lie in HL.

Suppose that each clause C in S is modified by the inclusion of a new condition of the form restricted(x), one condition for each variable x in C. The atom restricted(x) is to be interpreted as true exactly if x lies within HL. It is easy to show that the set of modified clauses is unsatisfiable over DS iff the set SHL is unsatisfiable. This property will be exploited when proving the correspondence of LCLDS with LL.

Termination in RCLDS.
In the case of RL, the termination is complicated by the presence of contraction, illustrated in the example in Fig. 3, where the atom D(bacba), derived at line (16), includes the parameter b more than mb times (mb = 1).⁵ The restriction dictating which atoms to generate by (Hyper) in RCLDS uses the notion of relevant set, which in turn uses the notion of full labels. Unlike the case for LL, there is no easily stated global restriction on labels (such as that indicated by restricted(x)). The criterion described below was inspired by the description given in [16] for the relevant logic LR.

⁵ The inclusion of P1(b) in the data is due to an implied occurrence of axiom (Ax3c) and there is just one such implicit occurrence.

Definition 12. Let RCLDS be a propositional relevant LDS based on LP and LL and S be a set of clauses appropriate for showing α. A ground label in LL that mentions only parameters in S and in which every non-unit parameter a occurs at least ma times is called full. A ground label in LL that mentions only parameters in S and is not full is called small. A parameter a that occurs in a small label, but fewer than ma times, belongs to its small part. A parameter a that occurs in a label (either full or small) at least ma times belongs to its full part. A ground atom having a predicate occurring in S that has a full/small label is also called a full/small atom.

Definition 13. Let RCLDS be a propositional relevant LDS based on LP and LL and S be a set of clauses appropriate for showing α. Suppose that B is a branch derived from the application of AlgMG such that no subsumption steps can be made to B and let P(u1) be a ground atom occurring in B. The relevant set of P(u1) (relative to B) is the set of ground atoms P(u2) such that only parameters occurring in S occur in u2 and either (i) there is at least one non-unit parameter a in P(u1) occurring k times, 0 ≤ k < ma, that also occurs in P(u2) more than k times, or (ii) there is at least one non-unit parameter a in P(u1) occurring k times, 1 ≤ k, that occurs in P(u2) zero times.

As an example, suppose there are two parameters a and b and that ma = 2 and mb = 3, then the relevant set of P(aab) (= P(a^2b)) is the set of atoms of one of the forms: P(a^r b^2), P(a^r b^3), P(a^r b^p), where r ≥ 1, p ≥ 4, or P(b^s), P(a^s), where s ≥ 0. The relevant set of the full atom P(a^2b^3) is the set of atoms of the form P(a^s) or P(b^s), where s ≥ 0. If P(w) is not ground, then the relevant set is the intersection of the relevant set of each ground instance of P(w).

The criterion to ensure termination in RCLDS can now be stated. In RCLDS the (Hyper) rule is restricted so that a ground atom P(w) is only added to a branch B if (i) it is not subsumed by any literal in B and (ii) it belongs to the relevant set of every other P-atom in B.
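Membership of the relevant set in Definition 13 can be checked mechanically for ground labels. The following is a sketch under stated assumptions: labels are strings of single-character parameters (the empty string standing for the label 1), condition (i) is taken to include parameters of S that occur zero times in u1, and the function names are hypothetical. Only part (ii) of the termination criterion is modelled; the subsumption test is omitted.

```python
from collections import Counter


def in_relevant_set(u1, u2, index):
    """Is P(u2) in the relevant set of P(u1)?  (after Definition 13)

    u1, u2: ground labels as strings of parameters ('' is the label 1).
    index:  relevant index m for every non-unit parameter of S.
    """
    c1, c2 = Counter(u1), Counter(u2)
    if any(p not in index for p in c2):
        return False                  # u2 mentions an unknown parameter
    for p, m in index.items():
        k = c1[p]
        if k < m and c2[p] > k:       # condition (i): a non-full count grows
            return True
        if k >= 1 and c2[p] == 0:     # condition (ii): a parameter vanishes
            return True
    return False


def may_add(w, branch, index):
    """P(w) must lie in the relevant set of every P-atom in the branch."""
    return all(in_relevant_set(z, w, index) for z in branch)


m = {"a": 2, "b": 3}                  # the indices used in the example above
grows = in_relevant_set("aab", "abb", m)     # b rises from 1 to 2
blocked = in_relevant_set("aab", "ab", m)    # nothing grows or vanishes
```

With these indices, P(abb) and P(bbb) lie in the relevant set of P(aab), whereas P(ab) does not, which agrees with the forms listed in the example.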
In other words, if P (w) is added to a branch, then for every atom P (z) in the branch, either the number of occurrences of at least one non-unit parameter a in z that occurs fewer than ma times is increased in w, or some non-unit parameter in z is reduced to zero in w. Notice that, if there are no P -atoms in the branch, then P (w) can be added vacuously according to the criterion. In case the (Hyper) rule generates a non-ground atom, then as long as it is not subsumed and some ground instance of it satisﬁes property (ii) above it can be added to the branch. Although relevant sets are (countably) inﬁnite, the impact of all relevant sets having to include any new literal in a branch is quite strict and very quickly reduces the number of possibilities to a ﬁnite number. For instance, a literal P (u) in a branch with a small label u = u1 u2 , where u1 is the small part of u, will prevent any other literal P (u ), where the small part of u is subsumed by u1 , from being added to the branch. For instance, if P (a4 b2 ) belongs to a branch, and ma = 2, mb = 3, then no literal of the form P (as b2 ) or P (as b), s ≥ 1, can be added to the branch. If ma = mb = 2, then no literal of the form P (as br ) can be added, s ≥ 1, r ≥ 1. For any particular set of initial clauses there are only a ﬁnite number of labels that can occur as small parts of labels. This observation means that the maximum number of literals in a branch will be ﬁnite. It also

A Decidable CLDS for Some Propositional Resource Logics

151

allows for the following deﬁnition of measure for a branch that decreases with each new atom added to the branch. Definition 14. Let

LP , LL , A+ R , AlgMG be a RCLDS and S be a set of clauses appropriate for showing α. The relevant measure of the positive atoms in a branch B derived using AlgMG, with no pairwise subsumption, is deﬁned as the sum, over each predicate P in S, of the number of possible small parts of labels that do not occur in any P -literal in B or in any P -literal subsumed by a literal in B. It is easy to see that, when a new atom P (w) is added to a branch B by AlgMG, then the relevant measure will decrease. Eventually, either (i) (End) will be applied to B, or (ii) the measure of B will have been reduced to zero, or (iii) no further steps are possible using AlgMG. For example, suppose that branch B includes just the atom P (a2 b), that there is one predicate P and two parameters a and b each with a relevant index of 2. The relevant measure is 7, since the small parts a2 b and ab are, respectively, covered by P (a2 b) and P (ab), subsumed by P (a2 b). If P (a2 b2 ) is now added then the branch measure is reduced to 5. Also, the literal P (a2 b) would be subsumed. In summary, in applying AlgMG, an atom can be added to a branch as long as it respects the following (informal) criterion: LCLDS An atom is added to a branch B only if the ground part of its label belongs to HL and if it is not subsumed by any atom in B. RCLDS An atom P (w1 ) is added to a branch B only if it has a ground instance which belongs to some relevant set of every atom in B and if it is not subsumed by any atom in B. In practice, this means that P (w1 ) is not subsumed, and, for each atom P (w2 ), it must either increase the number of occurrences of at least one non-full parameter in w2 , or it must reduce the number of occurences of at least one non-unit parameter in w2 to zero. 4.2

4.2

Properties of AlgMG

There are several properties that hold about the relationship between the semantics given by the axioms in the Extended Labelling Algebra $A_L^+$ and the procedure AlgMG, which are stated in Theorem 1. A proof of these properties can be made in a similar way to that given in [4] for IL. An outline is given here, including in detail the new cases for the two logics LL and RL.

Theorem 1 (Properties of AlgMG). Let $S$ be a LCLDS, $\alpha$ be a propositional LL formula, $A_L^+(\alpha)$ be the particular clauses and instances of the Semantic Axioms for showing $\alpha$ and $G_\alpha = A_L^+(\alpha) \cup \{\neg[\alpha]^*(1)\}$. Let AlgMG be initiated by the call dp($G_\alpha$, F, R) for variables F and R, then the following properties hold:

1. If AlgMG returns R = true then $G_\alpha \models_{FOL}$.
2. If AlgMG returns R = false then F is a partial model of $G_\alpha$, in a way to be explained.
3. AlgMG terminates.


4. If $\alpha$ is also a Hilbert theorem of propositional LL (i.e. $\alpha$ can be derived from the Hilbert Axioms for LL and Modus Ponens), then $G_\alpha \models_{FOL}$.
5. If $G_\alpha \models_{FOL}$ then $\alpha$ is a theorem of LL.

Similar properties hold for RL.

In AlgMG every step (except (Hyper)) reduces the total number of literals in $M \cup S$. However, the number of (Hyper) steps is restricted to a finite number in RL by the use of relevant sets and in LL by the restriction of terms to belong to $H_L$. Exactly the same proof for termination of AlgMG as in [4] can then be used.

Properties (1) and (2) are soundness and completeness results for AlgMG, in the sense that they show that the algorithm is correct with respect to finding refutations. These properties can be proved as in [4], except for the case of clause 1, the case that covers extending the resulting value of F to become a model of the clauses S, which is detailed in the proof of Lemma 1.

Properties (4) and (5) show that AlgMG corresponds with LL, (4) showing it gives a refutation for any theorem of LL, and (5) showing that it only succeeds for theorems. Similarly for RL. Proofs of these properties can be made following the same proof structure as in [4], but with some changes to cope with the different logics. Lemmas 2 and 3 give the details for the two logics considered in this paper.

5

Proving the Properties of AlgMG

Proving Properties 1 and 2. Properties (1) and (2) of AlgMG are proved by showing that the following proposition, called (PROP1and2), holds for each clause of (dp1): if the dp1 conditions of the clause satisfy invariant (INV) and the other conditions are also true, then the dp1 conclusion of the clause satisfies (INV) also, where (INV) is

Either R = false, $M \subseteq F$ and F can be extended to a model of S,
or R = true, F = [ ] and $M \cup S$ have no Herbrand models.

For the case of LCLDS, when R = false F is extended to be a model of the ground instances of S, taken over the domain of the initial set of clauses S, $S_{H_L}$, which are called restricted ground instances in the Lemma below. Note that, for the (End) clause in LCLDS, when R = true, it is the set of restricted ground instances of $M \cup S$ that has no models. This implies that $M \cup S$ also has no Herbrand models, for any such model would also be a model of the restricted instances. (It suffices to deal with Herbrand models since non-existence of a Herbrand model of S implies the non-existence of any model of S (see, for example, [7]).)

Lemma 1. The fail clause of dp1 satisfies (PROP1and2).


Proof. The details of the proof are different for each of the two logics. For LL a model of restricted ground instances is found, whereas for RL a Herbrand model is given. R is false; all rules have been applied and F = M. Certainly, $M \subseteq F$. There are then two cases: for LL and for RL.

Case for Linear Logic. The set F is extended to be a model $M_0$ of the restricted ground instances of the clauses remaining in S as follows: any ground atom with label in $H_L$ that is subsumed by a literal in M is true in $M_0$; all other ground atoms with label in $H_L$ are false in $M_0$. The clauses left in S can only generate subsumed clauses or disallowed atoms, or they are negative units. Assume that there is a restricted ground instance of a non-negative clause $C$ in S that is false in $M_0$. That is, for some instance $C'$ of $C$, its condition literals are true in $M_0$ and its conclusion is false in $M_0$. If the conclusion is a single literal then, as (Hyper) has been applied to $C$ already, the conclusion is either true in M, and hence in $M_0$, or it is subsumed by a clause in M, and again is true in $M_0$. Both contradict the assumptions. If the conclusion is a disjunction, then (Split) must have eventually been applied and the conclusion will again be true in M, or the disjunction is subsumed by a literal in M, contradicting the assumption. In case $C = \neg L$ is a false negative unit clause in S, then some instance $C' = \neg L'$ is false, that is, $L'$ is true in $M_0$. But in that case (End) would have been applied, a contradiction. The model obtained is a model of the clauses remaining when no more steps are possible in some chosen branch.

Case for Relevant Logic. Let the set of atoms formed using predicates in the initial set of clauses S and labels drawn from the Herbrand Domain of S, $D_S$, be called $B_S$. A model $M_0$ of the atoms in $B_S$ is assigned, using the atoms in M, by the following assignment conditions:

(i) Any ground atom in $B_S$ that is subsumed by an atom in M is true in $M_0$.
(ii) Any ground atom in $B_S$ that subsumes an atom $L$ in M by contraction of parameters in the full part of $L$ only, is true in $M_0$.
(iii) All other ground atoms in $B_S$ are false in $M_0$.

Assume that there is a ground instance of a non-negative clause $C$ in S that is false in $M_0$. That is, for some instance $C'$ of $C$, its condition literals are true in $M_0$ and its conclusion is false in $M_0$. If the conclusion is a single literal then, as (Hyper) has been applied to $C$ already, the conclusion $L$ is either true in M, and hence in $M_0$, or it is subsumed by a clause in M, and again is true in $M_0$, or it is disallowed. Both the first two circumstances contradict the assumption. For the third circumstance, since $L$ is disallowed, there is some literal $L'$, in M or subsumed by a literal in M, which is subsumed by $L$ by contracting only parameters that occur in the full part of $L'$. But then by assignment condition (ii) both $L$ and $L'$ are assigned true, again contradicting the assumption. The remainder of the proof is as given for LCLDS.

An example of a failed refutation in RL is given in Fig. 4, in which there is an attempt to show $(\alpha \to \alpha) \to (\alpha \to (\alpha \to \alpha))$. For this problem there are two parameters $a$ and $b$ with respective relevant indices $m_a = 1$ and $m_b = 2$. In the


Initial translation:
$P_0(x)$: $[(\alpha \to \alpha) \to (\alpha \to (\alpha \to \alpha))]^*(x)$
$P_1(x)$: $[\alpha \to \alpha]^*(x)$
$P_2(x)$: $[\alpha \to (\alpha \to \alpha)]^*(x)$

Initial clauses:
(1) $\neg P_0(1)$
(2) $P_2(ax) \to P_0(x)$
(3) $P_1(a) \lor P_0(x)$
(4) $P_1(x) \land A(y) \to A(xy)$
(5) $A(b) \lor P_1(x)$
(6) $A(bx) \to P_1(x)$
(7) $A(b) \lor P_2(x)$
(8) $P_1(bx) \to P_2(x)$

Derivation:
(9) (Split (3)) $P_1(a)$
(10) (Split (5)) $A(b)$
(11) (Hyper (4)) $A(ab)$
(12) (Hyper (6)) $P_1(1)$

Fig. 4. Failed refutation in AR using AlgMG

branch (9)-(12) any further literals generated using (4), such as $A(a^2 b)$, are not allowed as they are not a member of the relevant set of $A(ab)$. The atoms $P_1(a)$, $P_1(1)$, $A(b)$ and $A(ab)$ are assigned true, as are $P_1(a^k)$ and $A(a^k b)$, $k \geq 2$. All others are assigned false. Note that atoms of the form $A(b^k)$, $k \geq 2$, are not assigned true by assignment condition (ii), based on atom $A(b)$, because neither $b$ nor $a$ occur in the full part, which is just the parameter 1. The reader can check that this is a model for the clauses (1)-(8).

The number of atoms in a branch for each predicate depends on how soon atoms with full parts are derived for that predicate. If, for example, there are two parameters $a$ and $b$, $m_a = 2$ and $m_b = 3$, then if $P(aabbb)$ happened to be derived immediately, no other $P$ atoms with both $a$ and $b$ in the label would be derived. Those with fewer occurrences (at least one of each parameter) would be prevented by subsumption, whereas those with more occurrences would be prevented by the termination restriction (ii). On the other hand, the worst case number of $P$ atoms generated, with at least one occurrence of each parameter in the label, would be 6; for example, the following generation order would require all 6: $ab$, $a^2 b$, $ab^2$, $a^2 b^2$, $ab^3$, $a^2 b^3$.

5.1

Proving Correspondence of LCLDS/RCLDS with LL/RL

In order to show that the refutation system LCLDS presented here does indeed correspond to a standard Hilbert axiom presentation for Linear Logic it is necessary to show that theorems derived within the two systems are the same (Properties 4 and 5 of Theorem 1). Similarly for RCLDS and Relevant Logic. The complete set of axioms used in the implication fragments of LL and RL is shown in Table 2. Axioms (I2), (I3) and (I4) correspond, respectively, to contraction, distributivity and permutation. A useful axiom, (I5), is derivable also from (I3) and (I4) and is included for convenience. All axioms are appropriate for RL, whereas (I2) is omitted for LL. Respectively, Theorems 2 and 3 state that theorems in LL and RL derived from these axioms together with the rule of Modus Ponens (MP) are also theorems of AlgMG, and that theorems of LCLDS and RCLDS are also theorems in the Hilbert System(s).


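Table 2. The Hilbert axioms for ICLDS

(I1) $\alpha \to \alpha$
(I2) $(\alpha \to (\alpha \to \beta)) \to (\alpha \to \beta)$
(I3) $(\alpha \to \beta) \to ((\gamma \to \alpha) \to (\gamma \to \beta))$
(I4) $(\alpha \to (\beta \to \gamma)) \to (\beta \to (\alpha \to \gamma))$
(I5) $(\alpha \to \beta) \to ((\beta \to \gamma) \to (\alpha \to \gamma))$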
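The remark that (I5) is derivable from (I3) and (I4) can be checked directly. The following is one possible derivation, given as a sketch that uses only the axiom schemas of Table 2 and Modus Ponens; the step numbering is local to this example. Since contraction (I2) is not used, the derivation applies to both LL and RL.

```latex
\begin{align*}
1.\;& (\beta \to \gamma) \to ((\alpha \to \beta) \to (\alpha \to \gamma))
   && \text{(I3) with } \alpha, \beta, \gamma \text{ replaced by } \beta, \gamma, \alpha \\
2.\;& \bigl((\beta \to \gamma) \to ((\alpha \to \beta) \to (\alpha \to \gamma))\bigr)
      \to \bigl((\alpha \to \beta) \to ((\beta \to \gamma) \to (\alpha \to \gamma))\bigr)
   && \text{(I4) with } \alpha := \beta \to \gamma,\ \beta := \alpha \to \beta,\ \gamma := \alpha \to \gamma \\
3.\;& (\alpha \to \beta) \to ((\beta \to \gamma) \to (\alpha \to \gamma))
   && \text{(MP) on 1 and 2, which is (I5)}
\end{align*}
```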

Correspondence Part I. Property (4) of AlgMG is shown in Theorem 2. An outline proof is given. For RL the appropriate Hilbert axioms are (I1)-(I5); (I2) is omitted for LL.

Theorem 2. Let $P$ be a Hilbert theorem of LL; then the union of $\{\neg[P]^*(1)\}$ and the appropriate set of instances of the semantic axioms (equivalences) for $\neg[P]^*(1)$, $P_S$, has no models in $H_L$. (For RL, $P_S$ has no models.)

Proof. (Outline only.) The proof is essentially the same for both logics. Let $P_S$ be the set of defining equivalences for $P$ and its subformulas, $\forall x[[P]^*(x) \leftrightarrow R(x)]$ be the defining equivalence for $[P]^*$ and $\forall x[[P]^*(x) \leftrightarrow T_P(x)]$ be the resulting equivalence after replacing every occurrence in $R(x)$ of an atom that has a defining equivalence in $P_S$ by the right-hand side of that equivalence. It is shown next that $T_P(1)$ is always true and hence that there are no models of $P_S$ and $\neg[P]^*(1)$. This property of $T_P(1)$ is shown by induction on the number of (MP) steps in the Hilbert proof of $P$.

In case $P$ is an axiom and uses no applications of (MP) in its proof then the property can be seen to hold by construction. For instance, in the case of the contraction axiom (I2), $T_{(I2)}(1)$ is the sentence

$\forall y(\forall z \forall v([\alpha]^*(z) \land [\alpha]^*(v) \to [\beta]^*(zyv)) \to \forall u([\alpha]^*(u) \to [\beta]^*(uy)))$

In the case of LL, the equivalences include also the restricted predicate (shortened to $r$ in the illustration below). For the permutation axiom (I4), $T_{(I4)}(1)$, after some simplification (see footnote 6), is the sentence

$\forall z(\forall y([\alpha]^*(y) \to \forall v(r(zyv) \to ([\beta]^*(v) \to [\gamma]^*(zyv)))) \to \forall u([\beta]^*(u) \to \forall w(r(zuw) \to ([\alpha]^*(w) \to [\gamma]^*(zuw)))))$

Let the property hold for all theorems that have Hilbert proofs using fewer than $n$ applications of (MP), and consider a theorem $P$ such that its proof uses $n$ (MP) steps, with the last step being a derivation from $P'$ and $P' \to P$. By hypothesis, $T_{P'}(1)$ is true, and $T_{P' \to P}(1)$ is true. Hence, since $\forall x[T_{P' \to P}(x) \leftrightarrow \forall u[T_{P'}(u) \to T_P(ux)]]$, then $T_P(1)$ is also true.
The contrapositive of Theorem 2 allows the conclusion that P is not a theorem to be drawn from the existence of a model for {¬[P ]∗ (1)} ∪ PS as found by a terminating AlgMG. 6

In particular, restricted(xy) implies also restricted(x) and restricted(y).


Correspondence Part II. To show that every formula classified as a theorem by AlgMG in RCLDS or LCLDS is also derivable using the appropriate Hilbert axioms and the rule of Modus Ponens, Theorem 3 is used.

Theorem 3. Let $G_\alpha$ be the set of instances of $A_L^+$ for showing $\alpha$ (not including $\neg[\alpha]^*(1)$), then if there exists an AlgMG refutation in LCLDS of $G_\alpha \cup \{\neg[\alpha]^*(1)\}$ then there is a Hilbert proof in LL of $\alpha$, which is therefore a theorem of LL. That is, if $G_\alpha, \neg[\alpha]^*(1) \models_{FOL}$ then $\vdash_{HI} \alpha$ (see footnote 7). Similarly for RCLDS and RL.

Proof. Suppose $G_\alpha, \neg[\alpha]^*(1) \models_{FOL}$, hence any model of $G_\alpha$ is also a model of $[\alpha]^*(1)$; it is required to show $\vdash_{HI} \alpha$. Lemma 2 below states there is a model $M$ of $A_L^+$ ($A_R^+$), and hence of $G_\alpha$, with the property that $[\alpha]^*(1) = true$ iff $\vdash_{HI} \alpha$. Therefore, since $M$ is a model of $A_L^+$ ($A_R^+$) it is a model of $[\alpha]^*(1)$ and hence $\vdash_{HI} \alpha$ is true, as required.

The desired model is based on the canonical interpretation introduced in [1].

Definition 15. The canonical interpretation for LCLDS is an interpretation from Mon($L_P$, $L_L$) onto the power set of $L_P$ defined as follows:
– $\|c_\alpha\| = \{z : \vdash_{HI} \alpha \to z\}$, for each parameter $c_\alpha$;
– $\|\lambda \circ \lambda'\| = \{z : \vdash_{HI} \alpha \land \beta \to z\} = \{z : \vdash_{HI} \alpha \to (\beta \to z)\}$, where $\alpha \in \|\lambda\|$ and $\beta \in \|\lambda'\|$;
– $\|1\| = \{z : \vdash_{HI} z\}$;
– $\|{\preceq}\| = \{(\|x\|, \|y\|) : \|x\| \subseteq \|y\|\}$;
– $\|[\alpha]^*\| = \{\|x\| : \alpha \in \|x\|\}$.
Similarly for RCLDS.

For the case of LCLDS an interpretation of the restricted predicate is also needed. This depends on the particular theorem that is to be proven, as it makes use of the relevant indices of the parameters occurring in the translated clauses. The interpretation is given by:

$\|restricted\| = \{\|x\| : \forall z(z \in \|x\| \to z \text{ is provable using} \leq m_{\alpha_i} \text{ occurrences of } \alpha_i)\}$

In other words, $restricted(x) = true$ iff $x$ includes $\leq m_{\alpha_i}$ occurrences of parameter $\alpha_i$. (In case a new parameter is used for each instance of Axiom (Ax3c) then the definition does not depend on the particular theorem to be proven as $m_{\alpha_i} = 1$ for every $c_{\alpha_i}$.)

The canonical interpretation is used to give a Herbrand model for $A_L^+$ ($A_R^+$), by setting $[\alpha]^*(x) = true$ iff $\alpha \in \|x\|$. This means, in particular, that if $[\alpha]^*(1) = true$ then $\alpha \in \|1\|$ and hence $\vdash_{HI} \alpha$. The following Lemma states that the canonical interpretation of Definition 15 is a model of $A_L^+$ ($A_R^+$).

Lemma 2. The properties of the labelling algebra $A_L$ ($A_R$) given in Definition 2 and the semantic axioms of $A_L^+$ ($A_R^+$) are satisfied by the canonical interpretation for LCLDS (RCLDS).

7

The notation $\vdash_{HI} \gamma$ indicates that $\gamma$ is provable using the appropriate Hilbert axioms.


Proof. Each of the properties of the labelling algebra is satisfied by the canonical interpretation. For RCLDS the case for contraction is given here. The other cases are as given in [4]. For LCLDS the case for Axiom (Ax3a) is given. The other cases are as given in [4] but modified to include the restricted predicate.

contraction: Suppose that $\delta \in \|\lambda\| \circ \|\lambda\|$. Then there is a Hilbert proof of $\alpha \to (\alpha \to \delta)$, where $\alpha \in \|\lambda\|$. By axiom (I2) $\vdash_{HI} \alpha \to \delta$ and $\delta \in \|\lambda\|$.

(Ax3a): Let the maximum number of parameter occurrences allowed be fixed by the global relevant indices for the particular theorem to be proved. Suppose restricted($x$), restricted($y$) and restricted($xy$) and that $\alpha \in \|x\|$ and $\alpha \to \beta \in \|y\|$. Then there are Hilbert proofs of $\delta \to \alpha$ and $\gamma \to \alpha \to \beta$ for $\delta \in \|x\|$ and $\gamma \in \|y\|$ such that no more than the allowed number of subformula occurrences, as given by the relevant indices for the problem, are used in the combined proofs of $\delta$ and $\gamma$. To show $\delta \to (\gamma \to \beta)$, and hence $\beta \in \|x \circ y\|$, use axioms (I4) and (I5).

6

Conclusions

In this paper the method of Compiled Labelled Deductive Systems, based on the principles in [9], is applied to the two resource logics, LL and RL. The method of CLDS provides logics with a uniform presentation of their derivability relations and semantic entailments and its semantics is given in terms of a translation approach into ﬁrst-order logic. The main features of a CLDS system and model theoretic semantics are described here. The notion of a conﬁguration in a CLDS system generalises the standard notion of a theory and the notion of semantic entailment is generalised to relations between structured theories. The method is used to give presentations of LCLDS and RCLDS , which are seen to be generalisations, respectively, of Linear and Relevance Logic through the correspondence results in Sect. 5, which shows that there is a one-way translation of standard theories into conﬁgurations, while preserving the theorems of LL and RL. The translation results in a compiled theory of a conﬁguration. A refutation system based on a Model Generation procedure is deﬁned for this theory, which, together with a particular uniﬁcation algorithm and an appropriate restriction on the size of terms, yields a decidability test for formulas of propositional Linear Logic or Relevance Logic. The main contribution of this paper is to show how the translation approach into ﬁrst order logic for Labelled Deductive Systems can still yield decidable theories. This meets one of the main criticisms levelled at LDS, and at CLDS in particular, that for decidable logics the CLDS representation is not decidable. The method used in this paper can be extended to include all operators of Linear Logic, including the additive and exponential operators. 
For instance, the axiom for the additive disjunction operator ∨ in LL is ∀x([α ∨ β]∗ (x) ↔ ∀y(([α → γ]∗ (y) ∧ [β → γ]∗ (y)) → [γ]∗ (x ◦ y))) From an applicative point of view, the CLDS approach provides a logic with reasoning which is closer to the needs of computing and A.I. These are in fact


application areas with an increasing demand for logical systems able to represent and to reason about structures of information (see [9]). For example in [3] it is shown how a CLDS can provide a flexible framework for abduction. From the automated theorem proving point of view, the translation method described in Section 2.2 facilitates the use of first-order theorem provers for deriving theorems of the underlying logic. In fact, the first order axioms of a CLDS extended algebra $A_S^+$ can be translated into clausal form, and so any clausal theorem proving method might be appropriate for using the axioms to automate the process of proving theorems. The clauses resulting from the translation of a particular configuration represent a partial coding of the data. A resolution refutation that simulates the application of natural deduction rules could be developed, but because of the simple structure of the clauses resulting from a substructural CLDS theory the extended Model Generation method used here is appropriate.

References

1. M. D'Agostino and D. Gabbay. A generalisation of analytic deduction via labelled deductive systems. Part I: Basic substructural Logics. Journal of Automated Reasoning, 13:243-281, 1994.
2. K. Broda, M. Finger and A. Russo. Labelled Natural Deduction for Substructural Logics. Logic Journal of the IGPL, Vol. 7, No. 3, May 1999.
3. K. Broda and D. Gabbay. An Abductive CLDS. In Labelled Deduction, Kluwer, Ed. D. Basin et al, 1999.
4. K. Broda and D. Gabbay. A CLDS for Propositional Intuitionistic Logic. TABLEAUX-99, USA, LNAI 1617, Ed. N. Murray, 1999.
5. K. Broda and A. Russo. A Unified Compilation Style Labelled Deductive System for Modal and Substructural Logic using Natural Deduction. Technical Report 10/97. Department of Computing, Imperial College, 1997.
6. K. Broda, A. Russo and D. Gabbay. A Unified Compilation Style Natural Deduction System for Modal, Substructural and Fuzzy logics. In Discovering World with Fuzzy logic: Perspectives and Approaches to Formalization of Human-consistent Logical Systems. Eds V. Novak and I. Perfileva, Springer-Verlag, 2000.
7. A. Bundy. The Computer Modelling of Mathematical Reasoning. Academic Press, 1983.
8. C. L. Chang and R. Lee. Symbolic Logic and Mechanical Theorem Proving. Academic Press, 1973.
9. D. Gabbay. Labelled Deductive Systems, Volume I - Foundations. OUP, 1996.
10. J. H. Gallier. Logic for Computer Science. Harper and Row, 1986.
11. R. Hasegawa, H. Fujita and M. Koshimura. MGTP: A Model Generation Theorem Prover - Its Advanced Features and Applications. In TABLEAUX-97, France, LNAI 1229, Ed. D. Galmiche, 1997.
12. W. McCune. Otter 3.0 Reference Manual and Guide. Argonne National Laboratory, Argonne, Illinois, 1994.
13. J. A. Robinson. Logic, Form and Function. Edinburgh Press, 1979.
14. A. Russo. Modal Logics as Labelled Deductive Systems. PhD Thesis, Department of Computing, Imperial College, 1996.


15. R. A. Schmidt. Resolution is a decision procedure for many propositional modal logics. Advances in Modal Logic, Vol. 1, CSLI, 1998.
16. P. B. Thistlethwaite, M. A. McRobbie and R. K. Meyer. Automated Theorem-Proving in Non-Classical Logics. Wiley, 1988.

A Critique of Proof Planning

Alan Bundy

Division of Informatics, University of Edinburgh

Abstract. Proof planning is an approach to the automation of theorem proving in which search is conducted, not at the object-level, but among a set of proof methods. This approach dramatically reduces the amount of search but at the cost of completeness. We critically examine proof planning, identifying both its strengths and weaknesses. We use this analysis to explore ways of enhancing proof planning to overcome its current weaknesses.

Preamble

This paper consists of two parts:
1. a brief ‘bluffer’s guide’ to proof planning (see footnote 1); and
2. a critique of proof planning organised as a 4x3 array.

Those already familiar with proof planning may want to skip straight to the critique which starts at §2, p164.

1

Background

Proof planning is a technique for guiding the search for a proof in automated theorem proving [Bundy, 1988, Bundy, 1991, Kerber, 1998, Benzmüller et al, 1997]. The main idea is to identify common patterns of reasoning in families of similar proofs, to represent them in a computational fashion and to use them to guide the search for a proof of conjectures from the same family. For instance, proofs by mathematical induction share the common pattern depicted in figure 1. This common pattern has been represented in the proof planners Clam and λClam and used to guide a wide variety of inductive proofs [Bundy et al, 1990b, Bundy et al, 1991, Richardson et al, 1998].

1

The research reported in this paper was supported by EPSRC grant GR/M/45030. I would like to thank Andrew Ireland, Helen Lowe, Raul Monroy and two anonymous referees for helpful comments on this paper. I would also like to thank other members of the Mathematical Reasoning Group and the audiences at CIAO and Scottish Theorem Provers for helpful feedback on talks from which this paper arose. Pointers to more detail can be found at http://dream.dai.ed.ac.uk/projects/proof planning.html

A.C. Kakas, F. Sadri (Eds.): Computat. Logic (Kowalski Festschrift), LNAI 2408, pp. 160–177, 2002. © Springer-Verlag Berlin Heidelberg 2002

[Figure 1 diagram: induction branches into a base case and a step case; within the step case, ripple is followed by fertilize.]

Inductive proofs start with the application of an induction rule, which reduces the conjecture to some base and step cases. One of each is shown above. In the step case rippling reduces the diﬀerence between the induction conclusion and the induction hypothesis (see §1.2, p162 for more detail). Fertilization applies the induction hypothesis to simplify the rippled induction conclusion.

Fig. 1. ind_strat: A Strategy for Inductive Proof

1.1

Proof Plans and Critics

The common patterns of reasoning are represented using tactics: computer programs which control proof search by applying rules of inference [Gordon et al, 1979]. These tactics are specified by methods. These methods give both the preconditions under which the tactics are applicable and the effects of their successful application. Meta-level reasoning is used to combine the tactics into a customised proof plan for the current conjecture. This meta-level reasoning matches the preconditions of later tactics to the effects of earlier ones. Examples of such customised proof plans are given in figure 2. Proof planning has been extended to capture common causes of proof failure and ways to patch them [Ireland, 1992, Ireland & Bundy, 1996b]. With each proof method are associated some proof critics. Critics have a similar format to methods, but their preconditions specify situations in which the method’s associated tactic will fail and instead of tactics they have instructions on patching a failed proof. Each of the critics associated with a method has a different precondition. These are used to decide on an appropriate patch. Most of the critics built to date have been associated with the ripple method, or rather with its principal sub-method, wave, which applies one ripple step (see §1.2, p162). Among the

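Associativity of +: $x + (y + z) = (x + y) + z$

$\mathit{ind\_strat}(\boxed{\underline{x} + 1}^{\uparrow}, x)$

Commutativity of +: $x + y = y + x$

$\mathit{ind\_strat}(\boxed{\underline{x} + 1}^{\uparrow}, x)$ then $[\ \mathit{ind\_strat}(\boxed{\underline{y} + 1}^{\uparrow}, y),\ \mathit{ind\_strat}(\boxed{\underline{y} + 1}^{\uparrow}, y)\ ]$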

The associativity of + is an especially simple theorem, which can be proved with a single application of ind_strat from figure 1, using a one step induction rule on induction variable x. The commutativity of + is a bit more complicated. ind_strat is first applied using induction variable x then in both the base and step cases there is a nested application of ind_strat using y. The first argument of ind_strat indexes the induction rule using the rippling concept of wave-fronts (see §1.2, p162). The second argument specifies the induction variable.

Fig. 2. Special-Purpose Proof Plans

patches these critics suggest are: a generalisation of the current conjecture, the use of an intermediate lemma, a case split and using an alternative induction rule. The use of a critic to generalise a conjecture is illustrated in figure 8.

Proof planning has been tested successfully on a wide range of inductive and other theorems. These include conjectures arising from formal methods, i.e. from the verification, synthesis and transformation of both software and hardware. They include, for instance: the transformation of naive into tail recursive programs [Hesketh et al, 1992], the verification of a microprocessor [Cantu et al, 1996], the synthesis of logic programs [Kraan et al, 1996], decision procedures [Armando et al, 1996] and the rippling tactic [Gallagher, 1993], resolution completeness proofs [Kerber & Sehn, 1997], proofs of limit theorems [Melis, 1998] and diagonalization proofs [Huang et al, 1995, Gow, 1997]. Critics are especially useful at coming up with so-called ‘eureka’ steps, i.e. those proof steps that usually seem to require human intervention, for instance constructing appropriate induction rules, intermediate lemmas and generalisations [Lowe et al, 1998] and loop invariants [Ireland & Stark, 1997]. Proof planning has also been applied outwith mathematics to the computer games of bridge [Frank et al, 1992] and Go [Willmott et al, 1999] and also to problems of configuring systems from parts [Lowe, 1991, Lowe et al, 1996].

1.2

Rippling

Rippling is the key method in proof plans for inductive proof. Not only does it guide the manipulation of the induction conclusion to prepare it for the application of the induction hypothesis, but preparation for rippling suggests an


appropriate induction rule and variable and diﬀerent patterns of rippling failure suggest new lemmas and generalisations. Since it is also cited several times in the critique, we have included a brief introduction to rippling here. Rippling is useful whenever there is a goal to be proved in the context of one or more ‘givens’. Givens may be axioms, previously proved theorems, assumptions or hypotheses. It works by calculating the diﬀerence between the goal and the given(s) and then systematically reducing it. The similarities and diﬀerences between the goal and given(s) are marked with meta-level annotations. These annotations are shown graphically in ﬁgure 5, where the notation of rippling is explained. An example of rippling is given in ﬁgure 6.

rev(nil) = nil
rev(H :: T) = rev(T) <> (H :: nil)
qrev(nil, L) = L
qrev(H :: T, L) = qrev(T, H :: L)

rev and qrev are alternative recursive functions for reversing a list. Each is defined by a one-step list recursion using a base and step case. :: is an infix list cons and <> an infix list append. rev is a naive reverse function and qrev a more efficient, tail-recursive function. The second argument of qrev is called an accumulator. This accumulator should be set to nil when qrev is first applied to reverse a list. Figure 4 states two theorems that relate these two functions.

Fig. 3. Recursive Deﬁnitions of Two Reverse Functions

∀k. rev(k) = qrev(k, nil)    (1)

∀k, l. rev(k) <> l = qrev(k, l)    (2)

Theorem (1) shows that rev and qrev output the same result from the same input when the accumulator of qrev is initialised to nil. Theorem (2) generalises theorem (1) for all values of this accumulator. Paradoxically, the more specialised theorem (1) is harder to prove. One way to prove it is ﬁrst to generalise it to theorem (2).

Fig. 4. Two Theorems about List Reversing Functions
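The definitions of figure 3 and the two theorems of figure 4 can be tried out on small inputs. The following is a minimal sketch, assuming Python lists stand in for the lists of the paper, with `[h] + t` playing the role of `h :: t` and `+` the role of list append. Testing on finitely many inputs is of course only a sanity check, not a proof.

```python
from itertools import product


def rev(xs):
    """Naive reverse: rev(nil) = nil, rev(H :: T) = rev(T) <> (H :: nil)."""
    if not xs:
        return []
    h, *t = xs
    return rev(t) + [h]


def qrev(xs, acc):
    """Tail-recursive reverse with an accumulator:
    qrev(nil, L) = L, qrev(H :: T, L) = qrev(T, H :: L)."""
    if not xs:
        return acc
    h, *t = xs
    return qrev(t, [h] + acc)


def check_theorems(max_len=4):
    """Check theorems (1) and (2) on all 0/1 lists up to max_len."""
    accumulators = [[], [9], [8, 9]]
    for n in range(max_len + 1):
        for k in map(list, product([0, 1], repeat=n)):
            if rev(k) != qrev(k, []):                 # theorem (1)
                return False
            for l in accumulators:
                if rev(k) + l != qrev(k, l):          # theorem (2)
                    return False
    return True


print(rev([1, 2, 3]), qrev([1, 2, 3], []), check_theorems())
```

Theorem (1) is the special case of theorem (2) with the accumulator set to the empty list, which is why the check for (2) also exercises (1).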


Given: $\mathit{rev}(t) \mathbin{<>} L = \mathit{qrev}(t, L)$

Goal: $\mathit{rev}(\boxed{h :: \underline{t}}^{\uparrow}) \mathbin{<>} \lfloor l \rfloor = \mathit{qrev}(\boxed{h :: \underline{t}}^{\uparrow}, \lfloor l \rfloor)$

Wave-Rules:

$\mathit{rev}(\boxed{H :: \underline{T}}^{\uparrow}) \Rightarrow \boxed{\underline{\mathit{rev}(T)} \mathbin{<>} H :: \mathit{nil}}^{\uparrow}$    (3)

$\mathit{qrev}(\boxed{H :: \underline{T}}^{\uparrow}, L) \Rightarrow \mathit{qrev}(T, \boxed{H :: \underline{L}}^{\downarrow})$    (4)

$(\boxed{\underline{X} \mathbin{<>} Y}^{\uparrow}) \mathbin{<>} Z \Rightarrow X \mathbin{<>} (\boxed{Y \mathbin{<>} \underline{Z}}^{\downarrow})$    (5)

The example is drawn from the inductive proof of theorem (2) in figure 4. The given and the goal are the induction hypothesis and induction conclusion, respectively, of this theorem. Wave-rules (3) and (4) are annotated versions of the step cases of the recursive definitions of the two list reversing functions in figure 3. Wave-rule (5) is from the associativity of <>. The boxes are called wave-fronts and the underlined holes in them are called wave-holes. The wave-fronts in the goal indicate those places where the goal differs from the given. Those in the wave-rules indicate the differences between the left and right hand sides of the rules. The arrows on the wave-fronts indicate the direction in which rippling will move them: either outwards (↑) or inwards (↓). The corners, $\lfloor \ldots \rfloor$, around the $l$ in the goal indicate a sink. A sink is one of rippling’s target locations for wave-fronts; the other target is to surround an instance of the whole given with a wave-front. The wave-rules are used to rewrite each side of the goal. The effect is to move the wave-fronts either to surround an instance of the given or to be absorbed into a sink. An example of this process is given in figure 6.

Fig. 5. The Notation of Rippling

2

Critique

Our critique of proof planning is organised along two dimensions. On the first dimension we consider four different aspects of proof planning: (1) its potential for advance formation, (2) its theorem proving power, (3) its support for interaction and (4) its methodology. On the second dimension, for each aspect of the first dimension we present: (a) the original dream, (b) the reality of current implementations and (c) the options available for overcoming obstacles and realising part of that original dream.

2.1

The Advance Formation of Plans

The Dream: In the original proposal for proof planning [Bundy, 1988] it was envisaged that the formation of a proof plan for a conjecture would precede its use to guide the search for a proof. Meta-level reasoning would be used to join general proof plans together by matching the preconditions of later ones to the


Given: $\mathit{rev}(t) \mathbin{<>} L = \mathit{qrev}(t, L)$

Goal:

$\mathit{rev}(\boxed{h :: \underline{t}}^{\uparrow}) \mathbin{<>} \lfloor l \rfloor = \mathit{qrev}(\boxed{h :: \underline{t}}^{\uparrow}, \lfloor l \rfloor)$

$(\boxed{\underline{\mathit{rev}(t)} \mathbin{<>} h :: \mathit{nil}}^{\uparrow}) \mathbin{<>} \lfloor l \rfloor = \mathit{qrev}(t, \lfloor h :: l \rfloor)$

$\mathit{rev}(t) \mathbin{<>} \lfloor (h :: \mathit{nil}) \mathbin{<>} l \rfloor = \mathit{qrev}(t, \lfloor h :: l \rfloor)$

$\mathit{rev}(t) \mathbin{<>} \lfloor h :: l \rfloor = \mathit{qrev}(t, \lfloor h :: l \rfloor)$

The example comes from the step case of the inductive proof of theorem (2) from figure 4. Note that the induction variable $k$ becomes the constant $t$ in the given and the wave-front $\boxed{h :: \underline{t}}^{\uparrow}$ in the goal. However, the other universal variable, $l$, becomes a first-order meta-variable, $L$, in the given, but a sink, $\lfloor l \rfloor$, in the goal. We use uppercase to indicate meta-variables and lowercase for object-level variables and constants.

The left-hand wave-front is rippled-out using wave-rule (3) from figure 5, but then rippled-sideways using wave-rule (5), where it is absorbed into the left-hand sink. The right-hand wave-front is rippled-sideways using wave-rule (4) and absorbed into the right-hand sink. After the left-hand sink is simplified, using the recursive definition of <>, the contents of the two sinks are identical and the goal can be fertilized with the given, completing the proof. Note that fertilization unifies the meta-variable $L$ with the sink $\lfloor h :: l \rfloor$.

Note that there is no point in rippling sideways unless this absorbs wave-fronts into sinks. Sinks mark the potential to unify wave-fronts with meta-variables during fertilization. Without sinks to absorb the wave-fronts, fertilization will fail. Such a failure is illustrated in figure 7.

Fig. 6. An Example of Rippling

eﬀects of earlier ones. A tactic would then be extracted from the customised proof plan thus constructed. A complete proof plan would be sent to a tacticbased theorem prover where it would be unpacked into a formal proof with little or no search. The Reality: Unfortunately, in practice, this dream proved impossible to realise. The problem is due to the frequent impossibility of checking the preconditions of methods against purely abstract formulae. For instance, the preconditions of rippling include checking for the presence of wave-fronts in the current goal formula, that a wave-rule matches a sub-expression of this goal and that any new inwards wave-fronts have a wave-hole containing a sink. These preconditions cannot be checked unless the structure of the goal is known in some detail. To know this structure requires anticipating the eﬀects of the previous methods in the current plan. The simplest way to implement this is to apply each of the tactics of the previous methods in order.
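The final fertilization step of figure 6 can be read as a first-order unification problem between the given and the fully rippled goal. The following is a minimal sketch under assumptions of our own: terms are encoded as nested Python tuples, `app` stands for the list append operator, and `unify` is an illustrative helper, not code from Clam or any other proof planner. As in the figure, names beginning with an uppercase letter are treated as meta-variables.

```python
from typing import Dict, Optional, Union

# A term is a string (constant or meta-variable) or a tuple
# (functor, arg1, ..., argn). Uppercase-initial strings are meta-variables.
Term = Union[str, tuple]
Subst = Dict[str, Term]


def is_meta(t: Term) -> bool:
    return isinstance(t, str) and t[:1].isupper()


def walk(t: Term, s: Subst) -> Term:
    """Follow bindings until t is unbound or not a meta-variable."""
    while is_meta(t) and t in s:
        t = s[t]
    return t


def occurs(v: str, t: Term, s: Subst) -> bool:
    """Occurs check: does meta-variable v appear inside t?"""
    t = walk(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:])


def unify(a: Term, b: Term, s: Optional[Subst] = None) -> Optional[Subst]:
    """Return a substitution unifying a and b, or None on failure."""
    s = {} if s is None else s
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_meta(a):
        return None if occurs(a, b, s) else {**s, a: b}
    if is_meta(b):
        return None if occurs(b, a, s) else {**s, b: a}
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None


# Figure 6: the sink h :: l is absorbed by the meta-variable L.
given = ("eq", ("app", ("rev", "t"), "L"), ("qrev", "t", "L"))
goal = ("eq", ("app", ("rev", "t"), ("cons", "h", "l")),
        ("qrev", "t", ("cons", "h", "l")))
bindings = unify(given, goal)

# Without a meta-variable in the given (the situation of figure 7),
# the accumulator nil cannot match h :: nil, so fertilization fails.
blocked = unify(("qrev", "t", "nil"), ("qrev", "t", ("cons", "h", "nil")))
```

Here `bindings` maps `L` to the term for h :: l, while `blocked` is `None`, which mirrors the remark that fertilization fails when there is no sink to absorb the wave-front.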

Similar arguments hold for most of the other proof methods used by proof planners. This is especially true in applications to game playing, where the different counter actions of the opposing players must be explored before a response can be planned [Willmott et al, 1999]. So the reality is an interleaving of proof planning and proof execution. Moreover, the proof is planned in a consecutive fashion, i.e. the proof steps are developed starting at one end of the proof then proceeding in order. At any stage of the planning process only an initial or final segment of the object-level proof is known.

The Options: One response to this reality is to admit defeat, abandon proof planning and instead recycle the preconditions of proof methods as preconditions for the application of tactics. Search can then be conducted in a space of condition/action production rules in which the conditions are the method preconditions and the actions are the corresponding tactics. Satisfaction of a precondition will cause the tactic to be applied, thus realising the preconditions of subsequent tactics. Essentially, this strategy was implemented by Horn in the Oyster2 system [Horn, 1992]. The experimental results were comparable to earlier versions of Clam, i.e. if tactics are applied as soon as they are found to be applicable then proof planning conveys no advantage over Horn's production rule approach.

However, in subsequent developments some limited abstraction has been introduced into proof planning, in particular, the use of (usually second-order) meta-variables. In many cases the method preconditions can be checked on such partially abstract formulae. This allows choices in early stages of the proof to be delayed then made subsequently, e.g. as a side effect of unification of the meta-variables. We call this middle-out reasoning because it permits the non-consecutive development of a proof, i.e. instead of having to develop a proof from the top down or the bottom up we can start in the middle and work outwards. Middle-out reasoning can significantly reduce search by postponing a choice with a high branching factor until the correct branch can be determined. Figure 8 provides an example of middle-out reasoning.

Among the choices that can be successfully delayed in this way are: the witness of an existential variable, the induction rule [Bundy et al, 1990a], an intermediate lemma and generalisation of a goal [Ireland & Bundy, 1996b, Ireland & Bundy, 1996a]. Each of these has a high branching factor, infinite in some cases. A single abstract branch containing meta-variables can simultaneously represent all the alternative branches. Incremental instantiation of the meta-variables as a side effect of subsequent proof steps will implicitly exclude some of these branches until only one remains. Even though the higher-order² unification required to whittle down these choices is computationally expensive, the cost is far less than the separate exploration of each branch. Moreover, the wave annotation can be exploited to control higher-order unification by requiring wave-fronts to unify with wave-fronts and wave-holes to unify with wave-holes.

² Only second-order unification is required for the examples tackled so far, but higher-order unification is required in the general case.
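The production-rule alternative described above, in which method preconditions act as conditions and tactics as actions, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the toy propositional sequents, the three rules and the `prove` loop are hypothetical and are not taken from Oyster2 or Clam. The point is only that a tactic fires as soon as its precondition is satisfied, with no plan formed in advance.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple

# A sequent is (hypotheses, goal). Formulae are atoms (strings) or tuples
# ("and", a, b) / ("imp", a, b). This toy logic is purely illustrative.
Sequent = Tuple[frozenset, object]


class Rule(NamedTuple):
    name: str
    precondition: Callable[[Sequent], bool]      # the "condition"
    tactic: Callable[[Sequent], List[Sequent]]   # the "action": new subgoals


def is_op(formula: object, op: str) -> bool:
    return isinstance(formula, tuple) and formula[0] == op


RULES = [
    # Goal already among the hypotheses: close the branch.
    Rule("assumption", lambda s: s[1] in s[0], lambda s: []),
    # Goal a -> b: assume a, prove b.
    Rule("imp_intro", lambda s: is_op(s[1], "imp"),
         lambda s: [(s[0] | {s[1][1]}, s[1][2])]),
    # Goal a & b: prove a and b separately.
    Rule("and_intro", lambda s: is_op(s[1], "and"),
         lambda s: [(s[0], s[1][1]), (s[0], s[1][2])]),
]


def prove(goal: object, rules: List[Rule] = RULES) -> Optional[List[str]]:
    """Fire the first applicable rule on each open subgoal.

    Returns the names of the rules fired, or None if some subgoal
    satisfies no precondition (there is no backtracking).
    """
    agenda: List[Sequent] = [(frozenset(), goal)]
    trace: List[str] = []
    while agenda:
        sequent = agenda.pop()
        rule = next((r for r in rules if r.precondition(sequent)), None)
        if rule is None:
            return None
        trace.append(rule.name)
        agenda.extend(rule.tactic(sequent))
    return trace


trace = prove(("imp", "p", ("imp", "q", ("and", "q", "p"))))
```

Because each tactic is applied to a concrete goal, every precondition is checked against a fully known formula; the price is that no choice can be postponed, which is the limitation that meta-variables are intended to relax.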

    Given: rev(t) = qrev(t, nil)
    Goal:  rev([h :: t]↑) = qrev([h :: t]↑, nil)
           [rev(t) <> h :: nil]↑ = qrev([h :: t]↑, nil)
           blocked

Wave-fronts are written [ ... ]↑. The example comes from the failed step case of the inductive proof of theorem (1) from figure 4. A particular kind of ripple failure is illustrated. The left-hand wave-front can be rippled-out using wave-rule (3) and is then completely rippled. However, the right-hand wave-front cannot be rippled-sideways even though wave-rule (4) matches it. This is because there is no sink to absorb the resulting inwards directed wave-front. If the wave-rule were nevertheless applied then any subsequent fertilization attempt would fail. Figure 8 shows how to patch the proof by a generalisation aimed to introduce a sink into the appropriate place in the theorem and thus allow the ripple to succeed.

Fig. 7. A Failed Ripple

We have exploited this middle-out technique to especially good effect in our use of critics [Ireland & Bundy, 1996b]. Constraints have also been used as a least commitment mechanism in the Ωmega proof planner [Benzmüller et al, 1997]. Suppose a proof requires an object with certain properties. The existence of such an object can be assumed and the properties posted as constraints. Such constraints can be propagated as the proof develops and their satisfaction interleaved with that proof in an opportunistic way [Melis et al, 2000b, Melis et al, 2000a]. Middle-out reasoning recovers a small part of the original dream of advance proof planning and provides some significant search control advantage over the mere use of method preconditions in tactic-based production rules.

2.2 The Theorem Proving Power of Proof Planning

The Dream: One of the main aims of proof planning was to enable automatic theorem provers to prove much harder theorems than conventional theorem provers were capable of. The argument was that the meta-level planning search space was considerably smaller than the object-level proof search space. This reduction was partly due to the fact that proof methods only capture common patterns of reasoning, excluding many unsuccessful parts of the space. It was also because the higher-level methods, e.g. ind_strat, each cover many object-level proof steps. Moreover, the use of abstraction devices, like meta-variables, enables more than one proof branch to be explored simultaneously. Such search space reductions should bring much harder proofs into the scope of exhaustive search techniques.

The Reality: This dream has been partially realised. The reduced search space does allow the discovery of proofs that would be beyond the reach of purely object-level, automatic provers: for instance, many of the proofs listed in §1.1.

    Schematic Conjecture: ∀k, l. F(rev(k), l) = qrev(k, G(l))
    Given: F(rev(t), L) = qrev(t, G(L))
    Goal:  F(rev([h :: t]↑), l) = qrev([h :: t]↑, G(l))
           F([rev(t) <> h :: nil]↑, l) = qrev(t, [h :: G(l)]↓)
           rev(t) <> [(h :: nil) <> F′([rev(t) <> h :: nil]↑, l)]↓ = qrev(t, [h :: G(l)]↓)
           rev(t) <> [h :: F′([rev(t) <> h :: nil]↑, l)]↓ = qrev(t, [h :: G(l)]↓)
           rev(t) <> (h :: l) = qrev(t, h :: l)
    Meta-Variable Bindings: λu, v. u <> F′(u, v) / F,  λu, v. v / F′,  λu. u / G
    Generalised Conjecture: ∀k, l. rev(k) <> l = qrev(k, l)

Wave-fronts are written [ ... ]↑ and [ ... ]↓. The example shows how the failed proof attempt in figure 7 can be analysed using a critic and patched in order to get a successful proof. The patch generalises the theorem to be proved by introducing an additional universal variable and hence a sink. Middle-out reasoning is used to delay determining the exact form of the generalisation. This form is determined later as a side effect of higher-order unification during rippling. First a schematic conjecture is introduced. A new universal variable l is introduced, in the right-hand side, at the point where a sink was required in the failed proof in figure 7. Since we are not sure exactly how l relates to the rest of the right-hand side a second-order meta-variable G is wrapped around it. On the left-hand side a balancing occurrence of l is introduced using the meta-variable F. Note that l becomes a first-order meta-variable L in the given, but a sink l in the goal. Induction on k, rippling, simplification and fertilization are now applied, but higher-order unification is used to instantiate F and G. If the schematic conjecture is now instantiated we see that the generalised conjecture is, in fact, theorem (2) from figure 4.

Fig. 8. Patching a Failed Proof using Middle-Out Reasoning
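The instantiation reported in figure 8 can be sanity-checked by evaluation on sample data. The sketch below rests on assumptions: figure 4 is not reproduced here, so `rev` and `qrev` are given the usual definitions of naive and accumulator-based list reversal, and the bindings are read as F = λu, v. u <> v (the composition of the two bindings for F and F′) and G = λu. u. Testing on finitely many lists is of course no substitute for the inductive proof; it only confirms that the instantiated schema and theorem (2) agree on these cases.

```python
from itertools import product
from typing import Callable, List


def rev(k: List[int]) -> List[int]:
    """Naive reverse (assumed definition): rev(h :: t) = rev(t) <> (h :: nil)."""
    return [] if not k else rev(k[1:]) + [k[0]]


def qrev(k: List[int], acc: List[int]) -> List[int]:
    """Accumulator reverse (assumed): qrev(h :: t, l) = qrev(t, h :: l)."""
    return acc if not k else qrev(k[1:], [k[0]] + acc)


# Meta-variable bindings as read from figure 8.
F: Callable[[List[int], List[int]], List[int]] = lambda u, v: u + v  # u <> v
G: Callable[[List[int]], List[int]] = lambda u: u                    # identity


def schematic_holds(k: List[int], l: List[int]) -> bool:
    """Instantiated schematic conjecture: F(rev(k), l) = qrev(k, G(l))."""
    return F(rev(k), l) == qrev(k, G(l))


# All lists over {0, 1} of length at most 3.
samples = [list(p) for n in range(4) for p in product([0, 1], repeat=n)]
all_ok = all(schematic_holds(k, l) for k in samples for l in samples)

# Theorem (1) is the special case l = nil of the generalised conjecture.
special_case_ok = all(rev(k) == qrev(k, []) for k in samples)
```

Both flags evaluate to `True`, consistent with the observation that the generalised conjecture subsumes the original theorem whose direct inductive proof was blocked.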

Unfortunately, these very search reduction measures can also exclude the proofs of hard theorems from the search space, making them impossible to find. The reduced plan space is incomplete. Hard theorems may require uncommon or even brand new patterns of reasoning, which have not been previously captured in proof methods. Or they may require existing tactics to be used in unusual ways that are excluded by their current heuristic preconditions. Indeed, it is often a characteristic of a breakthrough in mathematical proof that the proof incorporates some new kind of proof method, cf. Gödel's Incompleteness Theorems. Such proofs will not be found by proof planning using only already known proof methods, but could potentially be stumbled upon by exhaustive search at the object-level.

The Options: Firstly, we consider ways of reducing the incompleteness of proof planning, then ways of removing it.

We should strive to ensure that the preconditions of methods are as general as possible, for instance, minimising the use of heuristic preconditions, as opposed to preconditions that are required for the legal application of the method's tactic. This will help ensure that the tactic is applied whenever it is appropriate and not excluded due to a failure to anticipate an unusual usage. A balance is required here since the absence of all heuristic preconditions may increase the search space to an infeasible size. Rather, diligence is needed to design both tactics and their preconditions which generalise away from the particular examples that may have suggested the reasoning pattern in the first place.

The use of critics expands the search space by providing a proof patch when the preconditions of a method fail. In practice, critics have been shown to facilitate the proof of hard theorems by providing the 'eureka' steps, e.g. missing lemmas, goal generalisations, unusual induction rules, etc, that hard theorems often require [Ireland & Bundy, 1996b]. However, even with these additions, the plan space is still incomplete; so the problem is only postponed.

One way to restore completeness would be to allow arbitrary object-level proof steps, e.g. the application of an individual rule of inference such as rewriting, generalisation, induction, etc, with no heuristic limits on its application. Since such a facility is at odds with the philosophy of proof planning, its use would need to be carefully restricted. For instance, a proof method could be provided that made a single object-level proof step at random, but only when all other possibilities had been exhausted. Provided that the rest of the plan space was finite, i.e. all other proof methods were terminating, then this random method would occasionally be called and would have the same potential for stumbling upon new lines of proof that a purely object-level exhaustive prover does, i.e. we would not expect it to happen very often, if at all.

It is interesting to speculate about whether it would be possible to draw a more permanent benefit from such serendipity by learning a new proof method from the example proof. Note that this might require the invention of new meta-level concepts: consider, for instance, the learning of rippling from example object-level proofs, which would require the invention of the meta-level concepts of wave-front, wave-hole, etc.

Note that a first-order object-level proof step might be applied to a formula containing meta-variables. This would require the first-order step to be applied using higher-order unification, potentially creating a larger search space than would otherwise occur. Also, some object-level proof steps require the specification of an expression, e.g. the witness of an existential quantifier, an induction variable and term, the generalisation of an expression. If these expressions are not provided via user interaction then infinite branching could be avoided by the use of meta-variables. So object-level rule application can introduce meta-variables even if they are not already present. These considerations further underline the need to use such object-level steps only as a last resort.

2.3 The Support for Interaction of Proof Planning

The Dream: Proof planning is not just useful for the automation of proof, it can also assist its interactive development. The language of proof planning describes the high-level structure of a proof and, hence, provides a high-level channel of communication between machine and user. This can be especially useful in a very large proof whose description at the object-level is unwieldy. The different proof methods chunk the proof into manageable pieces at a hierarchy of levels. The method preconditions and effects describe the relationships between and within each chunk and at each level. For instance, the language of rippling enables a proof state to be described in terms of differences between goals and givens, why it is important to reduce those differences and ways to do so. The preconditions and effects of methods and critics support the automatic analysis and patching of failed proof attempts. Thus the user can be directed to the reasons for a failed proof and the kind of steps required to remedy the situation. This orients the user within a large and complex search space and gives useful hints as to how to proceed.

The Reality: The work of Lowe, Jackson and others in the XBarnacle system [Lowe & Duncan, 1997] shows that proof planning can be of considerable assistance in interactive proof. For instance, in Jackson's PhD work [Jackson, 1999, Ireland et al, 1999], the user assists in the provision of goal generalisations, missing lemmas, etc. by instantiating meta-variables. However, each of the advantages listed in the previous section brings corresponding disadvantages.

Firstly, proof planning provides an enriched language of human/computer communication but at the price of introducing new jargon for the user to understand. The user of XBarnacle must learn the meaning of wave-fronts, flawed inductions, fertilization, etc.

Secondly, and more importantly, the new channel of communication assists users at the cost of restricting them to the proof planning search space; cf. the discussion of incompleteness in §2.2. For instance, XBarnacle users can get an explanation of why a method or critic did or did not apply in terms of successful or failed preconditions. They can over-ride those preconditions to force or prevent a method or critic applying. But their actions are restricted to the search space of tactics and critics. If the proof lies outside that space then they are unable to direct XBarnacle to find it.

The Options: The first problem can be ameliorated in a number of ways. Jargon can be avoided, translated or explained according to the expertise and preferences of the user. For instance, "fertilization" can be avoided in favour of, or translated into, the "use of the induction hypothesis". "Wave-front", on the other hand, has no such ready translation into standard terminology and must be explained within the context of rippling. Thus, although this problem can be irritating, it can be mitigated with varying amounts of effort.

The second problem is more fundamental. Since it is essentially the same as the problem of the incompleteness of the plan space, discussed in §2.2, one solution is essentially that discussed at the end of §2.2. New methods can be provided which apply object-level proof steps under user control. As well as providing an escape mechanism for a frustrated user this might also be a valuable device for system developers. It would enable them to concentrate on the parts of a proof they were interested in automating while using interaction to 'fake' the other parts. The challenge is to integrate such object-level steps into the rest of the proof planning account. For instance, what story can we now tell about how such object-level steps exploit the effects of previous methods and enable the preconditions of subsequent ones?

2.4 The Methodology of Proof Planning

The Dream: Proof planning aims to capture common patterns of reasoning and repair in methods and critics. In [Bundy, 1991] we provide a number of criteria by which these methods and critics are to be assessed. These include expectancy³, generality, prescriptiveness⁴, simplicity, efficiency and parsimony. In particular, each method and critic should apply successfully in a wide range of situations (generality) and a few methods and critics should generate a large number of proofs (parsimony). Moreover, the linking of effects of earlier methods and critics to the preconditions of later ones should enable a good 'story' to be told about how and why the proof plan works. This 'story' enables the expectancy criterion to be met.

³ Some degree of assurance that the proof plan will succeed.
⁴ The less search required the better.

The Reality: It is hard work to ensure that these criteria are met. A new method or critic may originally be inspired by only a handful of examples. There is a constant danger of producing methods and critics that are too finely tuned to these initial examples. This can arise both from a lack of imagination in generalising from the specific situation and from the temptation to get quick results in automation. Such over-specificity leads to a proliferation of methods and critics with limited applicability. Worse still, the declarative nature of methods may be lost as methods evolve into arbitrary code tuned to a particular problem set. The resulting proof planner will be brittle, i.e. will frequently fail when confronted with new problems. It will become increasingly hard to tell an intelligible story about its reasoning. Critical reviewers will view the empirical results with suspicion, suspecting that the system has been hand-tuned to reproduce impressive results on only a handful of hard problems.

As the consequences of over-specificity manifest themselves in failed proof attempts so the methods and critics can be incrementally generalised to cope with the new situations. One can hope that this process of incremental generalisation will converge on a few robust methods and critics, so realising the original dream. However, a reviewer may suspect that this process is both infinite and non-deterministic, with each incremental improvement only increasing the range of the methods and critics by a small amount.

The opposite problem is caused by an over-general or missing precondition, permitting a method to apply in an inappropriate situation. This may occur, for instance, where a method is developed in a context in which a precondition is implicit, but then applied in a situation in which it is absent. This problem is analogous to feature interaction in telecommunications or to predicting the behaviour of a society of agents.

The Options: The challenge is not only to adopt a development methodology that meets the criteria in [Bundy, 1991] but also to be seen to do so. This requires both diligence in the development of proof plans and the explicit demonstration of this diligence. Both aims can be achieved by experimental or theoretical investigations designed to test explicit hypotheses.

For instance, to test the criterion of generality, systematic and thorough application of proof planning systems should be conducted. This testing requires a large and diverse set of examples obtained from independent sources. The diversity should encompass the form, source and difficulty level of the examples. However, the generality of the whole system should not be obtained at the cost of parsimony, i.e. by providing lots of methods and critics 'hand crafted' to cope with each problematic example; so each of the methods and critics must be shown to be general-purpose. Unfortunately, it is not possible to test each one in isolation, since the methods and critics are designed to work as a family. However, it is possible to record how frequently each method and critic is used during the course of a large test run.

To meet the criterion of expectancy the specifications of the methods and critics should be declarative statements in a meta-logic. It should be demonstrated that the effects of earlier methods enable the preconditions of later ones and that the patches of critics invert the failed preconditions of the methods to which they are attached. Such demonstrations will deal both with the situation in which method preconditions/effects are too specific (they will not be strong enough hypotheses) and in which they are too general (they will not be provable). The work of Gallagher [Gallagher, 1993] already shows that this kind of reasoning about method preconditions and effects can be automated.

To meet the criterion of prescriptiveness the search space generated by rival methods needs to be compared either theoretically or experimentally; the method with the smaller search space is to be preferred. However, reductions in search space should not be obtained at the cost of unacceptable reductions in success rate. So it might be shown experimentally and/or via expectancy arguments that acceptable success rates are maintained. Reduced search spaces will usually contribute to increased efficiency, but it is possible that precondition testing is computationally expensive and that this cost more than offsets the benefits of the increased prescriptiveness, so overall efficiency should also be addressed.
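The suggestion of recording how frequently each method and critic is used during a large test run can be sketched as a simple tally over proof traces. This is an illustration under assumptions only: the trace format (one list of method and critic names per proof attempt), the function `usage_report` and the names in the sample data are hypothetical and are not drawn from any reported experiment.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def usage_report(runs: Iterable[List[str]]) -> Dict[str, Tuple[int, float]]:
    """Summarise method/critic usage over a test run.

    For each name, return (total applications, fraction of proof attempts
    in which it was applied at least once).
    """
    total: Counter = Counter()      # every application counts
    coverage: Counter = Counter()   # at most one count per proof attempt
    attempts = 0
    for trace in runs:
        attempts += 1
        total.update(trace)
        coverage.update(set(trace))
    return {name: (total[name], coverage[name] / attempts) for name in total}


# Hypothetical traces, invented purely to exercise the function.
sample_runs = [
    ["ind_strat", "ripple", "ripple", "fertilize"],
    ["ind_strat", "ripple", "generalise_critic", "ripple", "fertilize"],
    ["simplify"],
    ["ind_strat", "ripple", "fertilize"],
]
report = usage_report(sample_runs)
```

A method that appears in only a tiny fraction of a large and diverse test set is a candidate for the over-specificity discussed above, whereas high coverage across independent sources is some evidence of generality and parsimony.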

3 Conclusion

In this paper we have seen that some of the original dreams of proof planning have not been fully realised in practice. We have shown that in some cases it has not been possible to deliver the dream in the form in which it was originally envisaged, for instance, because of the impossibility of testing method preconditions on abstract formulae or the inherent incompleteness of the planning search space. In each case we have investigated whether and how a lesser version of the original dream can be realised. This investigation both identifies the important benefits of the proof planning approach and points to the most promising directions for future research. In particular, there seem to be three important lessons that have permeated the analysis.

Firstly, the main benefits of proof planning are in facilitating a non-consecutive exploration of the search space, e.g. by 'middle-out' reasoning. This allows the postponement of highly branching choice points using least commitment mechanisms, such as meta-variables or constraints. Parts of the search space with low branching rates are explored first and the results of this search determine the postponed choices by side-effect, e.g. using higher-order unification or constraint solving. This can result in dramatic search space reductions. In particular, 'eureka' steps can be made in which witnesses, generalisations, intermediate lemmas, customised induction rules, etc, are incrementally constructed. The main vehicle for such non-consecutive exploration is critics. Our analysis points to the further development of critics as the highest priority in proof planning research.

Secondly, in order to increase the coverage of proof planners in both automatic and interactive theorem proving it is necessary to combine proof planning with more brute force approaches. For instance, it may be necessary to have default methods in which arbitrary object-level proof steps are conducted either at random or under user control. One might draw an analogy with simulated annealing in which it is sometimes necessary to make a random move in order to escape from a local minimum.
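The idea of a default method that makes a random object-level step only when no heuristic method applies can be sketched on a deliberately artificial search problem. Everything below is an assumption made for illustration: the states are positive integers, the target is 1, the single heuristic method `halve` plays the role of a proof method with a restrictive precondition, and `inc` and `dec` stand in for unrestricted object-level rules. It is not a description of any existing planner.

```python
import random
from typing import Callable, List, Optional, Tuple

Method = Tuple[str, Callable[[int], bool], Callable[[int], int]]
ObjectRule = Tuple[str, Callable[[int], int]]

# One heuristic method with a precondition, two unrestricted object-level rules.
METHODS: List[Method] = [("halve", lambda n: n % 2 == 0, lambda n: n // 2)]
OBJECT_RULES: List[ObjectRule] = [("inc", lambda n: n + 1), ("dec", lambda n: n - 1)]


def plan(start: int,
         rng: Optional[random.Random] = None,
         max_steps: int = 1000) -> Tuple[int, List[str]]:
    """Search from a positive integer towards 1.

    Heuristic methods are always preferred; a random object-level rule is
    applied only as a last resort, when no method precondition holds.
    """
    rng = rng or random.Random(0)
    state, trace = start, []
    while state != 1 and len(trace) < max_steps:
        chosen = next((m for m in METHODS if m[1](state)), None)
        if chosen is not None:
            name, _, tactic = chosen
            state = tactic(state)
        else:
            rule_name, rule = rng.choice(OBJECT_RULES)
            state = rule(state)
            name = "random:" + rule_name
        trace.append(name)
    return state, trace


final_even, trace_even = plan(8)   # heuristic methods suffice
final_odd, trace_odd = plan(7)     # stuck without a random step
```

Starting from 8 the heuristic method alone reaches the target, whereas from 7 the planner is blocked until a random step moves it to a state where the method's precondition holds again. The toy also shows the cost: nothing guides the random choice, so in a realistic search space such steps would rarely be productive.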

Thirdly, frequent and systematic rational reconstruction is necessary to offset the tendency to develop over-specialised methods and critics. This tendency is a natural by-product of the experimental development of proof planning as specifications are tweaked and tuned to deal with challenging examples. It is necessary to clean up non-declarative specifications, merge and generalise methods and critics and to test proof planners in a systematic and thorough way. The assessment criteria of [Bundy, 1991] must be regularly restated and reapplied.

Despite the limitations exposed by the analysis of this paper, proof planning has been shown to have a real potential for efficient and powerful, automatic and interactive theorem proving. Much of this potential still lies untapped and our analysis has identified the priorities and directions for its more effective realisation.

Afterword

I first met Bob Kowalski in June 1971, when I joined Bernard Meltzer's Metamathematics Unit as a research fellow. Bernard had assembled a world class centre in automatic theorem proving. In addition to Bob, the other research fellows in the Unit were: Pat Hayes, J Moore, Bob Boyer and Donald Kuehner; Donald was the co-author, with Bob, of SL-Resolution, which became the theoretical basis for Prolog. Bob's first words to me were "Do you like computers? I don't!". This sentiment was understandable given the primitive computer facilities then available to us: one teletype with a 110 baud link to a shared ICL 4130 with 64k of memory. Bob went on to forsake the automation of mathematical reasoning as the main domain for theorem proving and instead pioneered logic programming: the application of theorem proving to programming. I stuck with mathematical reasoning and focussed on the problem of proof search control. However, I was one of the earliest adopters of Prolog and have been a major beneficiary of Bob's work, using logic programming both as a practical programming methodology and as a domain for formal verification and synthesis. I am also delighted to say that Bob has remained a close family friend for 30 years. Happy 60th birthday Bob!

References

[Armando et al, 1996] Armando, A., Gallagher, J., Smaill, A. and Bundy, A. (3-5 January 1996). Automating the synthesis of decision procedures in a constructive metatheory. In Proceedings of the Fourth International Symposium on Artificial Intelligence and Mathematics, pages 5–8, Florida. Also in the Annals of Mathematics and Artificial Intelligence, 22, pp 259–79, 1998.

[Benzmüller et al, 1997] Benzmüller, C., Cheikhrouhou, L., Fehrer, D., Fiedler, A., Huang, X., Kerber, M., Kohlhase, K., Meier, A., Melis, E., Schaarschmidt, W., Siekmann, J. and Sorge, V. (1997). Ωmega: Towards a mathematical assistant. In McCune, W., (ed.), 14th International Conference on Automated Deduction, pages 252–255. Springer-Verlag.

[Bundy, 1988] Bundy, A. (1988). The use of explicit plans to guide inductive proofs. In Lusk, R. and Overbeek, R., (eds.), 9th International Conference on Automated Deduction, pages 111–120. Springer-Verlag. Longer version available from Edinburgh as DAI Research Paper No. 349.

[Bundy, 1991] Bundy, Alan. (1991). A science of reasoning. In Lassez, J.L. and Plotkin, G., (eds.), Computational Logic: Essays in Honor of Alan Robinson, pages 178–198. MIT Press. Also available from Edinburgh as DAI Research Paper 445.

[Bundy et al, 1990a] Bundy, A., Smaill, A. and Hesketh, J. (1990a). Turning eureka steps into calculations in automatic program synthesis. In Clarke, S. L.H., (ed.), Proceedings of UK IT 90, pages 221–6. IEE. Also available from Edinburgh as DAI Research Paper 448.

[Bundy et al, 1990b] Bundy, A., van Harmelen, F., Horn, C. and Smaill, A. (1990b). The Oyster-Clam system. In Stickel, M. E., (ed.), 10th International Conference on Automated Deduction, pages 647–648. Springer-Verlag. Lecture Notes in Artificial Intelligence No. 449. Also available from Edinburgh as DAI Research Paper 507.

[Bundy et al, 1991] Bundy, A., van Harmelen, F., Hesketh, J. and Smaill, A. (1991). Experiments with proof plans for induction. Journal of Automated Reasoning, 7:303–324. Earlier version available from Edinburgh as DAI Research Paper No 413.

[Cantu et al, 1996] Cantu, Francisco, Bundy, Alan, Smaill, Alan and Basin, David. (1996). Experiments in automating hardware verification using inductive proof planning. In Srivas, M. and Camilleri, A., (eds.), Proceedings of the Formal Methods for Computer-Aided Design Conference, number 1166 in Lecture Notes in Computer Science, pages 94–108. Springer-Verlag.

[Frank et al, 1992] Frank, I., Basin, D. and Bundy, A. (1992). An adaptation of proof-planning to declarer play in bridge. In Proceedings of ECAI-92, pages 72–76, Vienna, Austria. Longer version available from Edinburgh as DAI Research Paper No. 575.

[Gallagher, 1993] Gallagher, J. K. (1993). The Use of Proof Plans in Tactic Synthesis. Unpublished Ph.D. thesis, University of Edinburgh.

[Gordon et al, 1979] Gordon, M. J., Milner, A. J. and Wadsworth, C. P. (1979). Edinburgh LCF - A mechanised logic of computation, volume 78 of Lecture Notes in Computer Science. Springer-Verlag.

[Gow, 1997] Gow, J. (1997). The Diagonalization Method in Automatic Proof. Undergraduate project dissertation, Dept of Artificial Intelligence, University of Edinburgh.

[Hesketh et al, 1992] Hesketh, J., Bundy, A. and Smaill, A. (June 1992). Using middle-out reasoning to control the synthesis of tail-recursive programs. In Kapur, Deepak, (ed.), 11th International Conference on Automated Deduction, volume 607 of Lecture Notes in Artificial Intelligence, pages 310–324, Saratoga Springs, NY, USA.

[Horn, 1992] Horn, Ch. (1992). Oyster-2: Bringing type theory into practice. Information Processing, 1:49–52.

[Huang et al, 1995] Huang, X., Kerber, M. and Cheikhrouhou, L. (1995). Adapting the diagonalization method by reformulations. In Levy, A. and Nayak, P., (eds.), Proc. of the Symposium on Abstraction, Reformulation and Approximation (SARA-95), pages 78–85. Ville d'Esterel, Canada.

[Ireland & Bundy, 1996a] Ireland, A. and Bundy, A. (1996a). Extensions to a Generalization Critic for Inductive Proof. In McRobbie, M. A. and Slaney, J. K., (eds.), 13th International Conference on Automated Deduction, pages 47–61. Springer-Verlag. Springer Lecture Notes in Artificial Intelligence No. 1104. Also available from Edinburgh as DAI Research Paper 786.

[Ireland & Bundy, 1996b] Ireland, A. and Bundy, A. (1996b). Productive use of failure in inductive proof. Journal of Automated Reasoning, 16(1–2):79–111. Also available from Edinburgh as DAI Research Paper No 716.

[Ireland & Stark, 1997] Ireland, A. and Stark, J. (1997). On the automatic discovery of loop invariants. In Proceedings of the Fourth NASA Langley Formal Methods Workshop. NASA Conference Publication 3356. Also available as Research Memo RM/97/1 from Dept of Computing and Electrical Engineering, Heriot-Watt University.

[Ireland, 1992] Ireland, A. (1992). The Use of Planning Critics in Mechanizing Inductive Proofs. In Voronkov, A., (ed.), International Conference on Logic Programming and Automated Reasoning – LPAR 92, St. Petersburg, Lecture Notes in Artificial Intelligence No. 624, pages 178–189. Springer-Verlag. Also available from Edinburgh as DAI Research Paper 592.

[Ireland et al, 1999] Ireland, A., Jackson, M. and Reid, G. (1999). Interactive Proof Critics. Formal Aspects of Computing: The International Journal of Formal Methods, 11(3):302–325. A longer version is available from Dept. of Computing and Electrical Engineering, Heriot-Watt University, Research Memo RM/98/15.

[Jackson, 1999] Jackson, M. (1999). Interacting with Semi-automated Theorem Provers via Interactive Proof Critics. Unpublished Ph.D. thesis, School of Computing, Napier University.

[Kerber & Sehn, 1997] Kerber, Manfred and Sehn, Arthur C. (1997). Proving ground completeness of resolution by proof planning. In Dankel II, Douglas D., (ed.), FLAIRS-97, Proceedings of the 10th International Florida Artificial Intelligence Research Symposium, pages 372–376, Daytona, Florida, USA. Florida AI Research Society, St. Petersburg, Florida, USA.

[Kerber, 1998] Kerber, Manfred. (1998). Proof planning: A practical approach to mechanized reasoning in mathematics. In Bibel, Wolfgang and Schmitt, Peter H., (eds.), Automated Deduction, a Basis for Application – Handbook of the German Focus Programme on Automated Deduction, chapter III.4, pages 77–95. Kluwer Academic Publishers, Dordrecht, The Netherlands.

[Kraan et al, 1996] Kraan, I., Basin, D. and Bundy, A. (1996). Middle-out reasoning for synthesis and induction. Journal of Automated Reasoning, 16(1–2):113–145. Also available from Edinburgh as DAI Research Paper 729.

[Lowe & Duncan, 1997] Lowe, H. and Duncan, D. (1997). XBarnacle: Making theorem provers more accessible. In McCune, William, (ed.), 14th International Conference on Automated Deduction, pages 404–408. Springer-Verlag.

[Lowe, 1991] Lowe, Helen. (1991). Extending the proof plan methodology to computer configuration problems. Artificial Intelligence Applications Journal, 5(3). Also available from Edinburgh as DAI Research Paper 537.

[Lowe et al, 1996] Lowe, H., Pechoucek, M. and Bundy, A. (October 1996). Proof planning and configuration. In Proceedings of the Ninth Exhibition and Symposium on Industrial Applications of Prolog. Also available from Edinburgh as DAI Research Paper 859.

[Lowe et al, 1998] Lowe, H., Pechoucek, M. and Bundy, A. (1998). Proof planning for maintainable configuration systems. Artificial Intelligence in Engineering Design, Analysis and Manufacturing, 12:345–356. Special issue on configuration.

[Melis, 1998] Melis, E. (1998). The "limit" domain. In Simmons, R., Veloso, M. and Smith, S., (eds.), Proceedings of the Fourth International Conference on Artificial Intelligence in Planning Systems, pages 199–206.

[Melis et al, 2000a] Melis, E., Zimmer, J. and Müller, T. (2000a). Extensions of constraint solving for proof planning. In Horn, W., (ed.), European Conference on Artificial Intelligence, pages 229–233.

[Melis et al, 2000b] Melis, E., Zimmer, J. and Müller, T. (2000b). Integrating constraint solving into proof planning. In Ringeissen, Ch., (ed.), Frontiers of Combining Systems, Third International Workshop, FroCoS'2000, number 1794 in Lecture Notes on Artificial Intelligence, pages 32–46. Springer.

[Willmott et al, 1999]

[Richardson et al, 1998] Richardson, J. D. C, Smaill, A. and Green, I.
(July 1998). System description: proof planning in higher-order logic with Lambda-Clam. In Kirchner, Claude and Kirchner, H´el`ene, (eds.), 15th International Conference on Automated Deduction, volume 1421 of Lecture Notes in Artiﬁcial Intelligence, pages 129–133, Lindau, Germany. Willmott, S., Richardson, J., Bundy, A. and Levine, J. (1999). An adversarial planning approach to Go. In Jaap van den Herik, H. and Iida, H., (eds.), Computers and Games, pages 93–112. 1st Int. Conference, CG98, Springer. Lecture Notes in Computer Science No. 1558.

A Model Generation Based Theorem Prover MGTP for First-Order Logic

Ryuzo Hasegawa, Hiroshi Fujita, Miyuki Koshimura, and Yasuyuki Shirai

Graduate School of Information Science and Electrical Engineering, Kyushu University
6-1, Kasuga-koen, Kasuga, Fukuoka 816-8580, JAPAN
{hasegawa,fujita,koshi,shirai}@ar.is.kyushu-u.ac.jp

Abstract. This paper describes the major results on research and development of a model generation theorem prover MGTP. It exploits OR parallelism for non-Horn problems and AND parallelism for Horn problems achieving more than a 200-fold speedup on a parallel inference machine PIM with 256 processing elements. With MGTP, we succeeded in proving diﬃcult mathematical problems that cannot be proven on sequential systems, including several open problems in ﬁnite algebra. To enhance the pruning ability of MGTP, several new features are added to it. These include: CMGTP and IV-MGTP to deal with constraint satisfaction problems, enabling negative and interval constraint propagation, respectively, non-Horn magic set to suppress the generation of useless model candidates caused by irrelevant clauses, a proof simpliﬁcation method to eliminate duplicated subproofs, and MM-MGTP for minimal model generation. We studied several techniques necessary for the development of applications, such as negation as failure, abductive reasoning and modal logic systems, on MGTP. These techniques share a basic idea, which is to use MGTP as a meta-programming system for each application.

1 Introduction

A.C. Kakas, F. Sadri (Eds.): Computat. Logic (Kowalski Festschrift), LNAI 2408, pp. 178–213, 2002. © Springer-Verlag Berlin Heidelberg 2002

Theorem proving is an important basic technology that gave rise to logic programming, and it is becoming increasingly important not only for reasoning about mathematical theorems but also for developing knowledge processing systems. We started research on parallel theorem provers in 1989 in the Fifth Generation Computer Systems (FGCS) project, with the aim of integrating logic programming and theorem proving technologies. The immediate goal of this research was to develop a fast theorem proving system on the parallel inference machine PIM [42], by effectively utilizing the KL1 language [55] and logic programming techniques.

MGTP [11,12] basically follows the model generation method of SATCHMO [38], which has the useful property that one-way unification suffices. Indeed, the method is well suited to KL1 implementation because we can use fast built-in unification without occur-check. MGTP exploits OR parallelism from non-Horn problems by independently exploring each branch of a proof tree caused by case splitting, whereas it exploits AND parallelism from Horn problems that do not cause case splitting.

Although OR parallelization of MGTP is relatively easy, it is essential to reduce the amount of interprocessor communication. For this, we proposed a new method called the N-sequential method [22]. The basic idea is that we run in each processing element (PE) a sequential algorithm to traverse a proof tree depth-first and restrict the number of tasks being activated to at most the number N of available PEs. Almost linear speedup was achieved for both Horn and non-Horn problems on a PIM/m system consisting of 256 PEs.

With MGTP, we succeeded in solving some open quasigroup problems in finite algebra [13]. We also solved several hard condensed detachment problems that could not be solved by OTTER [39] with any strategy [25].

On the other hand, research on solving quasigroup problems with MGTP revealed that it lacked negative constraint propagation ability. We therefore developed CMGTP (Constraint-MGTP) [50], which can handle constraint propagation with negative atoms. As a result, CMGTP's search spaces became much smaller than the original MGTP's.

Recently, we have been developing Java versions of MGTP (JavaMGTP), aiming at better efficiency as well as wider usability. JavaMGTP achieves a speedup of several tens of times over the KL1 based implementations on a sequential machine.

However, in order to further improve the efficiency of model generation, several problems remain to be solved that are common to model generation based provers: redundant inference caused by clauses that are irrelevant to the given goal, duplication of the same subproof after case splitting, and generation of nonminimal models.

To solve the first problem, we developed a method called non-Horn magic sets (NHM) [24,45]. NHM is a natural extension of the magic sets developed in the deductive database field, and is applicable to non-Horn problems. We showed that NHM has the same power as relevancy testing in SATCHMORE [36], although they take completely different approaches.

For the second problem, we came up with a method that combines the relevancy testing realized by NHM and SATCHMORE with folding-up proposed by Letz [34], within a single framework [32]. The method not only has an effect similar to relevancy testing, which suppresses useless model extensions with irrelevant clauses, but also provides a folding-up function to eliminate duplicate subproofs. Both are achieved by computing the relevant literals that contribute to closing a branch.

The third problem is how to avoid generating nonminimal models, which are redundant and thus cause inefficiency. To this end, we proposed an efficient method that employs branching assumptions and lemmas so as to prune branches that lead to nonminimal models, and to reduce minimality tests on obtained models [23]. We then implemented MM-MGTP based on this method. Experimental results with MM-MGTP show a remarkable speedup compared to the original MGTP.


Regarding applications, MGTP can be viewed as a meta-programming system. We can build various reasoning systems on MGTP by writing the inference rules used for each system as MGTP input clauses. Following this idea, we developed several techniques and reasoning systems necessary for AI applications. They include a method to incorporate negation as failure into MGTP [29], abductive reasoning systems [30], and modal logic systems [31]. In particular, MGTP has actually been used as a rule-based engine for an argumentation and negotiation support system in the legal domain.

2 An Abstract MGTP Procedure

MGTP is a theorem proving system for first-order logic. An input for MGTP is given as a set of clauses of the implicational form:

    Π → Σ

where, normally, the antecedent Π is a conjunction of atoms and the consequent Σ is a disjunction of atoms.¹ A clause is said to be positive if its antecedent is empty or true, negative if its consequent is empty or false, and mixed otherwise. A clause is called a Horn clause if it has at most one atom in its consequent; otherwise it is called a non-Horn clause. A clause is said to be range-restricted if every variable in the consequent of the clause appears in the antecedent, and violated under a model candidate M if M ⊨ Πσ and M ⊭ Σσ for some substitution σ.

A generic algorithm of a standard MGTP procedure is sketched in Fig. 1. The task of MGTP is to try to construct a model for a given set of clauses, by extending the current model candidate M so as to satisfy the clauses violated under M (model extension). The function MG takes as initial input the positive Horn clauses U0, the positive non-Horn clauses D0, and an empty model candidate M, and returns true/false (SAT/UNSAT) as a proof result. MG also outputs a model every time one is found. It works as follows:

(1) As long as the unit buffer U is not empty, MG picks an atom u from U, tests whether M ⊨ u (subsumption test), and, if it does not, extends the model candidate M with u (Horn extension). Then, the conjunctive matching procedure CJM(M, u) is invoked to search for clauses whose antecedents Π are satisfied by M ∪ {u} under some substitution σ. If such nonnegative clauses are found, their consequents Σσ are added to U or to the disjunction buffer D, according to the form of the consequent. When the antecedent of a negative clause is satisfied by M ∪ {u} in CJM(M, u), MG rejects M and returns false (model rejection).

(2) When U becomes empty and D is not empty, MG picks a disjunction d from D. If d is not satisfied by M, MG recursively calls itself to expand M with each disjunct Lj ∈ d (non-Horn extension).

(3) When both U and D become empty, MG outputs M and returns true.

¹ This is the primitive form of a clause in a standard MGTP, which will be extended in several ways in MGTP descendants.

    procedure MGTP :
    begin
        U0 ← positive Horn clauses;
        D0 ← positive non-Horn clauses;
        output MG(U0, D0, ∅);
    end ;

    boolean function MG(buffer U, buffer D, buffer M) :
    begin
        while (U ≠ ∅) begin                                   · · · (1)
            U ← U \ {u ∈ U};
            if (M ⊭ u) then begin
                M ← M ∪ {u};
                CJM(M, u);
                if (M is rejected) then return false ;
            end
        end ;
        if (D ≠ ∅) then begin                                 · · · (2)
            D ← D \ {d ∈ D};   (where d = (L1 ∨ . . . ∨ Ln))
            if (M ⊭ d) then
                return MG(U ∪ {L1}, D, M) ∨ . . . ∨ MG(U ∪ {Ln}, D, M);
        end else begin                                        · · · (3)
            output M;
            return true ;
        end
    end .

Fig. 1. A standard MGTP procedure

The standard MGTP procedure might be modified in several ways. For instance, each disjunct of Σ is allowed to be a conjunction of literals. This is especially useful for implementing a negation as failure mechanism [29]. We can also extend the procedure to deal with negative literals by introducing two additional operations: unit refutation and unit simplification. This extension yields CMGTP [28,50], which is meant for solving constraint satisfaction problems more efficiently, and MM-MGTP [23] for minimal model generation. Although the procedure looks sequential, it can be parallelized by exploiting the parallelism inherent in it. These issues will be described in detail in subsequent sections.
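As an illustration only, the procedure of Fig. 1 can be sketched in Python for the ground (propositional) case. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `mg`, `mgtp`, and `S1`, the clause encoding as (antecedent set, consequent tuple), and the choice to skip a disjunction that M already satisfies are all assumptions of this sketch. Conjunctive matching is reduced to a subset test, which is only adequate for ground clauses. The example clause set is the S1 that appears later in Fig. 8, with the negative literal ¬e treated as an opaque atom "~e".

```python
def mg(clauses, units, disjs, model, found):
    """Ground sketch of MG (Fig. 1); True iff some branch yields a model."""
    units, disjs, model = list(units), list(disjs), set(model)
    while units:                                  # step (1): Horn extension
        u = units.pop()
        if u in model:                            # subsumption test: M |= u
            continue
        model.add(u)
        # Simplified CJM(M, u): clauses whose antecedent uses u and holds in M.
        for ante, cons in clauses:
            if u in ante and ante <= model:
                if not cons:                      # negative clause: model rejection
                    return False
                if len(cons) == 1:
                    units.append(cons[0])         # Horn consequent goes to U
                else:
                    disjs.append(cons)            # non-Horn consequent goes to D
    while disjs:                                  # step (2): non-Horn extension
        d = disjs.pop()
        if any(lit in model for lit in d):        # assumption: skip satisfied d
            continue
        # Explore every branch (no short-circuit) so that all models are output.
        return any([mg(clauses, [lit], disjs, model, found) for lit in d])
    found.append(frozenset(model))                # step (3): M is a model
    return True


def mgtp(clauses):
    """Return the list of models found; an empty list means UNSAT."""
    units = [cons[0] for ante, cons in clauses if not ante and len(cons) == 1]
    disjs = [cons for ante, cons in clauses if not ante and len(cons) > 1]
    found = []
    mg(clauses, units, disjs, set(), found)
    return found


# S1 = { -> a v b.  a -> c v d.  c -> ~e.  b -> d.  d -> f. }
S1 = [
    (frozenset(), ("a", "b")),
    (frozenset({"a"}), ("c", "d")),
    (frozenset({"c"}), ("~e",)),
    (frozenset({"b"}), ("d",)),
    (frozenset({"d"}), ("f",)),
]

if __name__ == "__main__":
    for m in mgtp(S1):
        print(sorted(m))
```

Each recursive call works on its own copies of U, D, and M, which mirrors the independence of branches that the parallel implementations rely on, at the cost of the copying overhead discussed in the following section.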

3 Parallel Implementation

There are several ways to parallelize the proving process in MGTP. These are to exploit parallelism in conjunctive matching, subsumption tests, and case splitting. For ground non-Horn cases, it is most promising to exploit OR parallelism induced by case splitting. Here we use OR parallelism to seek multiple models, which produce multiple solutions, in parallel. For Horn clauses, we have to exploit AND parallelism during the traversal of a single branch. The main source of AND parallelism is conjunctive matching and subsumption testing.

Fig. 2. Task stack (labels: push; pop top for self PE; pop bottom for other PEs; tasks ordered from newer at the top to older at the bottom)

Fig. 3. Process diagram for OR parallelization (a master process connected to PE 1, PE 2, ..., PE n by "give task" and "take task" messages)

3.1 OR Parallelization

For ground non-Horn clauses, it is relatively easy for MGTP to exploit OR parallelism by exploring different branches (model candidates) in different processing elements (PEs) independently. However, inter-PE communication increases rapidly as the number of branches explodes combinatorially and a large amount of data, e.g. model candidates and model extending candidates, is copied from one PE to another. Conventional PE allocation methods, such as cyclic and probabilistic allocation, are based on the principle that whenever tasks are created in a PE, all of them but one are thrown to other PEs. Although this scheme is easy to implement, the amount of inter-PE communication is at least proportional to the number of tasks created in the whole computation.

To overcome this, we proposed a new method called the N-sequential method [22]. The basic idea is that we run in each PE a sequential algorithm to traverse a proof tree depth-first and restrict the number of activated tasks at any time to at most the number N of available PEs. In this method, a PE can move an unsolved task to another, idle PE only when requested by it. When the number of created tasks exceeds the number of free PEs, the excess tasks are executed sequentially within their current PE. Each PE maintains a task stack, as shown in Fig. 2, for use in the sequential traversal of multiple unsolved branches. Created tasks are pushed onto the stack, then popped from the top of the stack (pop top) when the current task has been completed. On receipt of a request from another PE, the task at the bottom is given to it (pop bottom). We provide a master process, as shown in Fig. 3, which acts as a matchmaker between task-requesting (take task) and task-offering (give task) PEs. The task stack process and the master process are written in KL1 and incorporated into the MGTP program.

OR Parallel Performance.
The performance of OR parallel MGTP was evaluated on a PIM/m with 128 PEs and a Sun Enterprise 10000 (SE10k) with 24 PEs. For the latter we used the Klic system, which compiles KL1 programs into C code and makes it possible to run them on a single machine or on parallel machines like SE10k. The experimental results show significant speedups on both systems.

Fig. 4. Speedup ratio by OR parallelization on SE10k (1–24 PEs). Speedup is plotted against the number of PEs for Ideal, GRP124-8.004, test2-445, PUZ010-1, and QG5-12.

Figure 4 shows the speedup ratio of OR parallel execution for non-Horn problems using the N-sequential method on SE10k. Here, GRP124-8.004 and PUZ010-1 are problems taken from the TPTP library [53], QG5-12 is a quasigroup problem to be explained in later sections, and test2-445 is an artificial benchmark spanning a ternary tree. A satisfactory speedup is attained for problems such as GRP124-8.004 and test2-445, in which the number of non-Horn extensions dominates that of Horn extensions. The speedup degrades for PUZ010-1 and QG5-12 because they require a significant number of Horn extensions but only a small number of non-Horn extensions.
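The per-PE task stack of the N-sequential method (Fig. 2) can be sketched as a double-ended queue. This is a hedged illustration only: the class and method names are hypothetical, and the master process, inter-PE messaging, and the limit N on active tasks are omitted. It shows only the asymmetric access pattern, where the owning PE works on the newest task while a requesting PE receives the oldest one.

```python
from collections import deque


class TaskStack:
    """Hypothetical per-PE stack of unsolved branches (cf. Fig. 2)."""

    def __init__(self):
        self._tasks = deque()

    def push(self, task):
        # A newly created task goes on top of the stack.
        self._tasks.append(task)

    def pop_top(self):
        # The owning PE continues depth-first with the newest task.
        return self._tasks.pop() if self._tasks else None

    def pop_bottom(self):
        # An idle PE that asks for work is given the oldest task.
        return self._tasks.popleft() if self._tasks else None


if __name__ == "__main__":
    stack = TaskStack()
    for name in ("t1", "t2", "t3"):
        stack.push(name)
    print(stack.pop_bottom(), stack.pop_top())
```

A deque gives constant-time access at both ends, so serving a remote request does not disturb the owner's depth-first traversal.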

Fig. 5. Snapshot of “xamonitor” on PIM/m: (a) cyclic allocation method; (b) N-sequential method

Figure 5 depicts snapshots of an “xamonitor” window that shows the CPU usage on PIM/m, sampled and displayed at one-second intervals. The figure shows a clear difference in characteristic behavior between the cyclic allocation and N-sequential methods. The lighter part of each bar in the graph indicates the percentage of CPU time used for net computation during an interval (one second), and the darker part indicates the rate of inter-PE communication. Inter-PE communication consumed about 20 percent of the execution time for cyclic allocation, whereas it took almost negligible time for the N-sequential method. Furthermore, for cyclic allocation the percentage of idle time increases as the computation progresses, whereas there is almost no idle time for the N-sequential method. As a result, execution with the N-sequential method terminates faster than with cyclic allocation.

3.2 AND Parallelization

The computational mechanism of MGTP is essentially based on the “generate-and-test” scheme. However, this approach can cause over-generation of atoms, wasting time and memory. In the AND parallelization of MGTP, we adopted the lazy model generation method [26], which induces a demand-driven style of computation. In this method, a generator process that performs model extension generates a specified number of atoms only when required by the tester process that performs rejection testing. The lazy mechanism avoids over-generation of atoms in model extension, and provides flexible control to maintain a high running rate in a parallel environment.

Figure 6 shows a process diagram for AND parallel MGTP. It consists of generator (G), tester (T), and master (M) processes. In our implementation, several G and T processes are allocated to each PE. G processes perform conjunctive matching with mixed clauses, and T processes with negative clauses. Atoms created by a G process are stored in a New buffer in that G process, and are sent via the master to T processes for rejection testing. The M process prevents G processes from generating too many atoms by monitoring the number of atoms stored in the New buffers and keeping that number in a moderate range. This number is the difference between the number of atoms generated by G processes and the number of atoms tested by T processes. Simply controlling G and T processes with this buffering mechanism implements the idea of lazy model generation. It also enables us to balance the computational load of G and T processes, thus keeping a high running rate.

AND Parallel Performance. Figure 7 shows AND parallel performance for solving condensed detachment problems [39] on PIM/m with 256 PEs. Proving time (sec) obtained with 32 PEs for each problem is as follows: #49: 18600, #44: 9700, #22: 8600, #79: 2500, and #82: 1900. The numbers of atoms kept in M and D are between 15100 and 36500. More than a 230-fold speedup was attained for #49 and #44, and a 170- to 180-fold speedup for #22, #79 and #82.
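The demand-driven coupling between generation and rejection testing can be imitated sequentially with a Python generator. The sketch below is only an analogy under stated assumptions: `atom_stream`, `lazy_run`, and the `batch` parameter are hypothetical names, there is a single generator and a single tester, and no real parallelism or master process is modeled. It shows that when atoms are produced in bounded batches on demand, the number of atoms generated but never tested stays below the batch size.

```python
from itertools import islice


def atom_stream():
    """Hypothetical generator process: yields atoms p(0), p(1), ... on demand."""
    n = 0
    while True:
        yield f"p({n})"
        n += 1


def lazy_run(atoms, is_rejected, batch=4, limit=100):
    """Request `batch` atoms at a time and test them for rejection.

    Returns (rejecting atom or None, atoms generated, atoms tested).
    """
    generated = tested = 0
    while generated < limit:
        new_buffer = list(islice(atoms, batch))   # generator runs only here
        if not new_buffer:
            break                                 # generator is exhausted
        generated += len(new_buffer)
        for atom in new_buffer:                   # tester: rejection testing
            tested += 1
            if is_rejected(atom):
                return atom, generated, tested
    return None, generated, tested


if __name__ == "__main__":
    print(lazy_run(atom_stream(), lambda a: a == "p(5)"))
```

With an eager generator the whole stream up to `limit` would be produced before any test; here the gap between generated and tested atoms is bounded by `batch`, which plays the role of the moderate range maintained by the master process.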

Fig. 6. Process diagram for AND parallelization (generator processes G1, G2, ..., Gg with buffers new1, ..., newg; a Master process; tester processes T1, T2, ..., Tt receiving ∆1, ..., ∆t)

Fig. 7. Speedup ratio (speedup plotted against the number of PEs, up to 256, for problems #49, #44, #79, #22, and #82, together with the ideal line)

To verify the effectiveness of AND parallel MGTP, we attempted 12 hard condensed detachment problems. These problems could not be solved by OTTER with any strategy proposed in [39]. 7 of the 12 problems were solved within an hour except for problem #23, in which the maximum number of atoms stored in M and D was 85100. For the problems we failed to solve, this size exceeds 100000 and more than 5 hours are required to solve them.

3.3 Java Implementation of MGTP

While MGTP was originally meant for parallel theorem proving based on parallel logic programming technology, Java implementations of it (JavaMGTP) [20,21] have been developed, aiming at more pervasive use of MGTP through Java technology. Here, we briefly describe these recent implementations and results. The advantages of JavaMGTP over the previous implementations in logic languages include platform independence, friendly user interfaces, and ease of extension. Moreover, JavaMGTP achieved the best performance on conventional machines among a family of model generation based provers. This is due to several implementation techniques, including a special mechanism called A-cells for handling multiple contexts, and efficient term indexing. Another key factor is the effective use of Java language facilities such as sophisticated class hierarchies, method overriding, and automatic memory management (garbage collection), as well as destructive assignment.

A-cells. On finding a clause Γ → B1 ∨ . . . ∨ Bm violated under the current model candidate M, i.e., (M ⊨ Γ) ∧ (∀j (1 ≤ j ≤ m). M ⊭ Bj) holds, MGTP extends M to M ∪ {B1}, . . . , M ∪ {Bm}. Repeating such extensions forms a tree of model candidates, called an MG tree. Thus, each model candidate M_i comprises a sequence ⟨M_{i0}, . . . , M_{ij}, . . . , M_{ik_i}⟩ of sets of literals, where j is a serial number given to a context, i.e., a branch extended with a disjunct literal B^j, and M_{ij} contains B^j and the literals used for Horn extension that follow B^j.

S1 = { → a ∨ b.  a → c ∨ d.  c → ¬e.  b → d.  d → f. }

Fig. 8. Clause set S1 and its MG-tree: (a) after M1 is generated; (b) after moving from M1 to M2; (c) after moving to M3

The most frequent operation in MGTP is to check whether a ground literal L belongs to the current model candidate M. For this, we introduce an Activation-cell (A-cell) [21]. For each M_{ij} above, we allocate an A-cell A_j containing a boolean flag act. When M_{ij} becomes part of the current model candidate M, the act flag of the associated A-cell A_j is set to true (denoted by A_j^◦), indicating that M_{ij} is active. When M_{ij} is no longer part of M, the act flag of A_j is set to false (denoted by A_j^•), indicating that M_{ij} is inactive. In addition, we allocate for each atom P two variables called pac and nac, and assign a pointer to A_j to pac (nac) when P (¬P) becomes a member of M_{ij}. Note that all occurrences of P and its complement ¬P are represented by a unique object for P in the system. Thus, whether P (¬P) ∈ M_{ij} can be checked simply by looking into A_j via pac (nac) of P. This A-cell mechanism reduces the complexity of the membership test from O(|M|), which a naive implementation would take, to O(1).

Figure 8 (a) shows an MG tree when a model M1 is generated, in which pac of a refers to an A-cell A_1^◦, and both pac of c and nac of e refer to A_2^◦. In Fig. 8 (b), the current model candidate has moved from M1 to M2, so that the A-cell A_2^◦ is inactivated (changed to A_2^•), which means that neither c nor ¬e belongs to the current model candidate M2 = {a, d}. In Fig. 8 (c), the current model candidate is now M3 = {b, d, f}, a fact that is easily recognized by looking into the pac fields of b, d, and f. Note that d's pac field was updated from A_3^• to A_4^◦. It is also easily seen that none of the other “old” literals a, c, and ¬e belongs to M3, since their pac or nac field refers to the inactivated A-cell A_1^• or A_2^•.

Graphics.
JavaMGTP provides users with a graphical interface called Proof Tree Visualizer (PTV) for visualizing and controlling the proof process, which is especially useful for debugging and educational purposes. Several kinds of graphical representation for a proof tree can be chosen in PTV, e.g., a standard tree and a radial tree (see Fig. 9). The available graphical functions on a proof tree include zooming in/out, marking/unmarking nodes, and displaying statistical information on each node. All these graphical operations are performed concurrently with the proving process by using the multi-threading facility of Java. Moreover, one can pause/resume a proving process via mouse control on the graphic objects of a proof tree.

Fig. 9. A snapshot of PTV window

Performance of JavaMGTP. We compared JavaMGTP written in JDK1.2 (+JIT) with a Klic version of MGTP (KlicMGTP) written in KLIC v3.002 and the fastest version [49] of SATCHMO [38] written in ECLiPSe v3.5.2, on a SUN UltraSPARC10 (333MHz, 128MB). 153 range-restricted problems were taken from TPTP v2.1.1 [53], of which 42 satisfiable problems were run in all-solution mode. In Fig. 10–13, the problems are arranged and numbered in ascending order of their execution times taken by JavaMGTP. In Fig. 12 and 13, a black bar shows the runtime ratio for a propositional problem, and a white bar that for a first-order problem. A gap between bars (ratio zero) indicates a problem for which the two systems gave different proofs.

JavaMGTP vs. KlicMGTP. For the problems after #66, for which JavaMGTP takes more than one millisecond, JavaMGTP is 12 to 26 times (except #142) as fast as KlicMGTP for propositional cases, and 5 to 20 times as fast for first-order cases (Fig. 12). This difference in performance is explained as follows. In JavaMGTP, CJM of ground literals like p, q(a) is performed with A-cells, while CJM of a nonground literal like r(X, Y) is performed with a term memory (TM) [51], which is rather heavier than A-cells. KlicMGTP, on the other hand, always uses a TM for CJM, which contains some portions that must be scanned linearly. Moreover, since in KlicMGTP the TM has to be copied every time case splitting occurs, this overhead degrades the performance more significantly as the problem becomes harder.
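The A-cell membership test referred to above can be sketched as follows. This is a minimal illustration and not JavaMGTP's code (which is written in Java): the class names, the `add` and `holds` methods, and the `replay_fig8` helper are hypothetical, and the assignment of b, d, and f to the A-cell A4 is this sketch's reading of Fig. 8 (c). The sketch replays the three snapshots of Fig. 8 for the clause set S1.

```python
class ACell:
    """Activation cell: one per context of the MG tree, holding the act flag."""

    def __init__(self):
        self.act = True


class Atom:
    """Unique object for an atom P; pac/nac point to the A-cell of the
    context in which P / not-P was most recently added."""

    def __init__(self, name):
        self.name = name
        self.pac = None
        self.nac = None

    def add(self, cell, positive=True):
        if positive:
            self.pac = cell
        else:
            self.nac = cell

    def holds(self, positive=True):
        # O(1) membership test: follow one pointer and read one flag.
        cell = self.pac if positive else self.nac
        return cell is not None and cell.act


def replay_fig8():
    a, b, c, d, e, f = (Atom(n) for n in "abcdef")
    a1, a2 = ACell(), ACell()
    a.add(a1)                        # (a) M1 = {a, c, not-e}
    c.add(a2)
    e.add(a2, positive=False)
    a2.act = False                   # (b) leave the c-branch ...
    a3 = ACell()
    d.add(a3)                        #     ... M2 = {a, d}
    a1.act = False                   # (c) leave the a-branch ...
    a3.act = False
    a4 = ACell()
    for atom in (b, d, f):           #     ... M3 = {b, d, f}
        atom.add(a4)
    return a, b, c, d, e, f


if __name__ == "__main__":
    atoms = replay_fig8()
    print([(x.name, x.holds()) for x in atoms])
```

Backtracking out of a context costs a single flag update regardless of how many literals the context holds, which is the point of the indirection through the A-cell.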


Fig. 12. JavaMGTP vs. KlicMGTP

Fig. 13. JavaMGTP vs. SATCHMO

JavaMGTP vs. SATCHMO. SATCHMO solved three problems faster than JavaMGTP, while it failed to solve some problems due to memory overflow. This is because the proofs given by the two systems differ for such problems. For the other problems, SATCHMO gives the same proofs as JavaMGTP. Observe the problems after #66 in Fig. 13. JavaMGTP is 8 to 23 times as fast as SATCHMO for propositional cases. As for first-order cases, JavaMGTP achieves a 27- to 38-fold speedup over SATCHMO for some problems, although its speedup is about 3- to 5-fold for most problems. In SATCHMO, since a model candidate M is maintained by using assert/retract of Prolog, the complexity of CJM is always O(|M|). JavaMGTP, on the other hand, can perform CJM of ground literals in O(1) with A-cells. Consequently, a remarkable effect is seen for propositional problems, as in Fig. 12. The difference in runtime for first-order problems is mainly caused by the difference in speed between the match-TM and the linear-search based findall operations employed in JavaMGTP and SATCHMO, respectively. To get an instance of a literal, the latter takes time proportional to the number N of asserted literals, while the former takes constant time w.r.t. N.

4 Extensions of MGTP Features

4.1 Extension for Constraint Solving

In this section, we present two extensions of the MGTP system for constraint solving. These extensions aim at solving constraint satisfaction problems in MGTP efficiently. MGTP provides a general framework to represent and solve first-order clauses, but it sometimes lacks the ability to propagate constraints using the problem (or domain) structure. As an example, we consider quasigroup (QG) existence problems in finite algebra [3]. These problems can be defined as finite-domain constraint satisfaction problems. In solving them, we found that negative information should be propagated explicitly to prune redundant branches. This ability has been realized in an extension of MGTP called CMGTP. Another example we consider here is channel routing problems in VLSI design. For these problems, interval constraint information needs to be propagated as well as negative information. This additional propagation ability has been realized in the other extension of MGTP, called IV-MGTP.

CMGTP. In 1992, MGTP succeeded in solving several open quasigroup (QG) problems on a parallel inference machine PIM/m consisting of 256 processors [13]. Later, other theorem provers or constraint solvers such as DDPP, FINDER, and Eclipse solved other new open problems more efficiently than the original MGTP. That research revealed that the original MGTP lacked negative constraint propagation ability. This motivated us to develop CMGTP [28,50], which allows negated atoms in MGTP clauses so that negative constraints can be propagated explicitly.

Quasigroup Problems. A quasigroup is a pair ⟨Q, ◦⟩ where Q is a finite set, ◦ is a binary operation on Q, and for any a, b, c ∈ Q,

    a ◦ b = a ◦ c ⇒ b = c
    a ◦ c = b ◦ c ⇒ a = b.

The multiplication table of this binary operation ◦ forms a Latin square (shown in Fig. 14).
QG problems we tried to solve are classiﬁed to 7 categories (called QG1, QG2, ..., QG7), each of which is deﬁned by adding some constraints to original quasigroup constraints. For example, QG5 constraint is deﬁned as ∀X, Y ∈ Q. ((Y ◦ X) ◦ Y ) ◦ Y = X. This constraint is represented with an MGTP clause: p(Y, X, A) ∧ p(A, Y, B) ∧ p(B, Y, C), X = C → .

(1)

From the view point of constraint propagation, rule (1) can be rewritten as follows2 : p(Y, X, A) ∧ p(A, Y, B) → p(B, Y, X). 2

In addition, we assume functionality in the arguments of p.

(2)

190

Ryuzo Hasegawa et al. ◦

1 2 3 4 5

1

1 3 2 5 4

2

5 2 4 3 1

3

4 5 3 1 2

4

2 1 5 4 3

5

3 4 1 2 5

Fig. 14. Latin square (order 5) p(Y, X, A) ∧ p(B, Y, X) → p(A, Y, B). p(B, Y, X) ∧ p(A, Y, B) → p(Y, X, A).

(3) (4)

These rules are still in the MGTP representation. To generate negative constraints, we add extra rules containing negative atoms to the original MGTP rule, by making contrapositives of it. For example, rule (2) can be augmented by the following rules:

p(Y, X, A) ∧ ¬p(B, Y, X) → ¬p(A, Y, B).    (5)
p(A, Y, B) ∧ ¬p(B, Y, X) → ¬p(Y, X, A).    (6)
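Rules (5) and (6) follow mechanically from rule (2): each antecedent atom is swapped with the consequent, and both are negated. A minimal sketch in Python, assuming literals are plain strings and using the hypothetical helpers `negate`, `contrapositives`, and `show` (none of which belong to CMGTP itself):

```python
def negate(literal):
    """Return the complement of a literal written as 'p(...)' or '¬p(...)'."""
    return literal[1:] if literal.startswith("¬") else "¬" + literal


def contrapositives(antecedent, consequent):
    """For A1 ∧ ... ∧ An → B, build one rule per antecedent atom Ai:
    (antecedent without Ai) ∧ ¬B → ¬Ai."""
    rules = []
    for i, atom in enumerate(antecedent):
        rest = antecedent[:i] + antecedent[i + 1:]
        rules.append((rest + [negate(consequent)], negate(atom)))
    return rules


def show(body, head):
    """Render a rule in the notation used in the text."""
    return " ∧ ".join(body) + " → " + head


# Rule (2): p(Y, X, A) ∧ p(A, Y, B) → p(B, Y, X)
rule2 = (["p(Y, X, A)", "p(A, Y, B)"], "p(B, Y, X)")
for body, head in contrapositives(*rule2):
    print(show(body, head))
```

The two printed rules coincide with (6) and (5), in that order.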

Each of the above rules is logically equivalent to (2) but has a different operational meaning: if a negative atom is derived, it can simplify the current disjunctive clauses in the disjunction buffer D. This simplification can reduce the number of redundant branches significantly.

CMGTP Procedure. The structure of the model generation process in CMGTP is basically the same as in MGTP. The differences between CMGTP and MGTP lie in the unit refutation and unit simplification processes with negative atoms. In CMGTP, negative atoms can be used explicitly to represent constraints. If both P and ¬P exist in the current model candidate M, then false is derived by the unit refutation mechanism. If, for a unit clause ¬Pi ∈ M (Pi ∈ M), there exists a disjunction that includes Pi (¬Pi), then Pi (¬Pi) is removed from that disjunction by the unit simplification mechanism. The refutation and simplification processes added to MGTP guarantee that, for any atom P ∈ M, P and ¬P are never in the current M simultaneously, and that the disjunctions in the current D have already been simplified by all unit clauses in M.

Experimental Results. Table 1 compares the experimental results for QG problems on CP, CMGTP, and other systems. CP is an experimental program written in SICStus Prolog that is dedicated to QG problems [50]. In CP, a domain variable and the candidate values to be assigned to it are represented with shared variables. The numbers of failed branches generated by CP and CMGTP are almost equal to those of DDPP and smaller than those of FINDER and MGTP. In fact, we

A Model Generation Based Theorem Prover MGTP for First-Order Logic

Table 1. Comparison of experimental results for QG5 (number of failed branches)

Order   DDPP   FINDER   MGTP      CP    CMGTP   IV-MGTP
9       15     40       239       15    15      15
10      50     356      7026      38    38      52
11      136    1845     51904     117   117     167
12      443    13527    2749676   372   372     320

confirmed that CP and CMGTP have the same pruning ability as DDPP by comparing the proof trees generated by these systems. The slight differences in the number of failed branches are caused by the different selection functions used. In terms of general performance, CP was superior to the other systems in almost every case. In particular, in October 1993 we obtained the new result that no model exists for QG5.16, by running CP on a SPARCstation-10 for 21 days. CMGTP, on the other hand, is about 10 times slower than CP. The difference in speed is mainly caused by the term memory manipulation required by CMGTP.

IV-MGTP. In MGTP (CMGTP), interpretations (called model candidates) are represented as finite sets of ground atoms (literals). In many situations this turns out to be too redundant. Take, for example, variables I, J ranging over the domain {1, . . . , 4}, and interpret ≤ and + naturally. A rule like “p(I) ∧ {I + J ≤ 4} → q(J)” splits into three model extensions, q(1), q(2), and q(3), if p(1) is present in the current model candidate. Now assume we have the rule “q(I) ∧ q(J) ∧ {I ≠ J} → .”, saying that q is functional in its argument, and that, say, q(4) is derived from another rule. Then all three branches must be refuted separately. Totally ordered, finite domains occur naturally in many problems, and in such problems situations like the one just sketched are common. We therefore developed the IV-MGTP system [19], which enhances MGTP with mechanisms to deal with them efficiently.

Introducing Constrained Atoms into MGTP. In order to enhance MGTP with totally ordered, finite domain constraints, we adopt the notation p(t1, . . . , tr, S1, . . . , Sm) for what we call a constrained atom. This notation is motivated by signed formula logic programming (SFLP) [37] and constraint logic programming (CLP) over finite domains [41]. Constrained atoms explicitly stipulate subsets of domains and thus are in solved form.
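The difference between ground case splitting and a single constrained atom can be illustrated on the example above. The sketch below is written in Python with hypothetical names (`DOMAIN`, `allowed_values`, `ground_case_split`, `constrained_update`). It assumes that two constraints on the same argument are combined by set intersection and that an empty intersection means rejection. It is an illustration of the idea, not the IV-MGTP implementation.

```python
DOMAIN = range(1, 5)  # the domain {1, ..., 4}


def allowed_values(i):
    """Values J permitted by the constraint I + J <= 4 once p(i) holds."""
    return {j for j in DOMAIN if i + j <= 4}


def ground_case_split(i, known_q):
    """Ground representation: one branch q(j) per allowed value.
    Returns (number of branches, number of branches refuted by functionality
    of q, given that q(known_q) is already derived)."""
    branches = sorted(allowed_values(i))
    refuted = [j for j in branches if j != known_q]
    return len(branches), len(refuted)


def constrained_update(i, known_q):
    """Constrained representation: keep one subset for q and intersect it
    with the already derived value set (assumed update rule)."""
    return allowed_values(i) & {known_q}


print(ground_case_split(1, 4))   # three branches, each refuted separately
print(constrained_update(1, 4))  # empty set: rejected in a single step
```

With p(1) and q(4), the ground version opens three branches and refutes each of them, whereas the constrained version detects the conflict with one intersection.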
The language of IV-MGTP needs to admit other forms of atoms in order to be practically useful for solving problems with totally ordered domains. An IV-MGTP atom is an expression p(t1, . . . , tr, κ1, . . . , κm), where each κi has one of the following forms:

1. {i1, . . . , ir}, where ij ∈ N for 1 ≤ j ≤ r (κi is in solved form);
2. ]ι1, ι2[, where ιj (j = 1, 2) ∈ N ∪ CVar; the intended meaning is ]ι1, ι2[ = {i ∈ N | i < ι1 or i > ι2};
3. [ι1, ι2], where ιj (j = 1, 2) ∈ N ∪ CVar; the intended meaning is [ι1, ι2] = {i ∈ N | ι1 ≤ i ≤ ι2};
4. U ∈ DVar;

where CVar is a set of constraint variables, which hold elements of a domain N, and DVar is a set of domain variables, which hold subsets of a domain N. Since intervals play a central role in this framework, we named this extension of MGTP IV-MGTP.

For each predicate p with constrained arguments, an IV-MGTP program contains a declaration line of the form “declare p(t, . . . , t, j1, . . . , jm)”. If the i-th place of p is t, then the i-th argument of p is a standard term; if the i-th place of p is a positive integer j, then the i-th argument of p is a constraint over the domain {1, . . . , j}. Each IV-MGTP atom p(t1, . . . , tr, κ1, . . . , κm) consists of two parts: the standard term part p(t1, . . . , tr) and the constraint part κ1, . . . , κm. Each of r and m can be 0. The latter case, m = 0, holds in particular for a predicate that has no declaration. By this convention, every MGTP program is an IV-MGTP program. If m = 1 and the domain of κ1 is {1, 2}, IV-MGTP programs are equivalent to CMGTP programs in which {1} is interpreted as positive and {2} as negative. Hence, every CMGTP program is also an IV-MGTP program.

Model Candidates in IV-MGTP. While the deduction procedure for IV-MGTP is almost the same as for CMGTP, model candidates are treated differently. In MGTP, a list of current model candidates that represent Herbrand interpretations is kept during the deduction process, and model candidates can simply be identified with sets of ground atoms. The same holds in IV-MGTP, except that some places of a predicate contain a ground constraint in solved form (that is, a subset of a domain) instead of a ground term. Note that, while in MGTP one model candidate containing ground atoms {L1, . . . , Lr} trivially represents exactly one possible interpretation of the set of atoms {L1, . . .
, Lr}, in IV-MGTP one model candidate represents many IV-MGTP interpretations, which differ in the constraint parts. Thus, model candidates can be conceived as sets of constrained atoms of the form p(t1, . . . , tr, S1, . . . , Sm), where the Si are subsets of the appropriate domains. If M is a model candidate, p(t1, . . . , tr) the ground term part, and ⟨S1, . . . , Sm⟩ the constraint part in M, then define M(p(t1, . . . , tr)) = ⟨S1, . . . , Sm⟩. We say that a ground constrained atom L = p(t1, . . . , tr, i1, . . . , im) is satisfied by M (M |= L) iff there are domain elements s1 ∈ i1, . . . , sm ∈ im such that ⟨s1, . . . , sm⟩ ∈ M(p(t1, . . . , tr)). Formally, a model candidate M is a partial function that maps ground instances of the term part of constrained atoms declared as “p(t, . . . , t, j1, . . . , jm)” into (2^{1,...,j1} − {∅}) × · · · × (2^{1,...,jm} − {∅}). Note that M(p(t1, . . . , tr)) can be undefined.

Besides rejection, subsumption, and extension of a model candidate, IV-MGTP offers a fourth possibility not present in MGTP, namely model candidate update. Model candidate update is really a combination of subsumption and rejection. Consider the following example.

Example 1. Let C = p({1, 2}) be the consequent of an IV-MGTP rule and assume M(p) = {2, 3}. The single atom in C is neither inconsistent with M nor subsumed by M. Yet the information contained in C is not identical to that in M, and it can be used to refine M to M(p) = {2}.

Channel Routing Problems. Channel routing problems in VLSI design can be represented as constraint satisfaction problems in which connection requirements between terminals (which we call nets) must be satisfied under the condition that the path of each net is disjoint from all others. Many specialized solvers employing heuristics have been developed for these problems. Our experiments are not primarily intended to compare IV-MGTP with such solvers, but to show the effectiveness of the interval/extraval representation and its domain calculation in the IV-MGTP procedure. We consider a multi-layer channel, which consists of multiple layers, each of which has multiple tracks. To simplify the problem, we assume in addition that each routing path makes no detour and occupies only one track. Under this assumption, the problem can be formalized as determining the layer and track numbers for each net, with the help of constraints that express two binary relations: not equal (neq) and above. neq(N1, N2) means that the nets N1 and N2 do not share the same track. above(N1, N2) means that, if N1 and N2 share the same layer, the track number of N1 must be larger than that of N2. For example, the not-equal constraint for nets N1 and N2 is represented in IV-MGTP as follows:

p(N1, [L, L], [T1, T1]) ∧ p(N2, [L, L], [T21, T22]) ∧ neq(N1, N2) → p(N2, [L, L], ]T1, T1[)

where the predicate p has two constraint domains: the layer number L and the track number Ti.

Experimental Results.
We developed an IV-MGTP prototype system in Java and ran experiments on a Sun Ultra 5 under JDK 1.2. The results are compared with those for the same problems formulated for and run with CMGTP [50] (also written in Java [21]). We experimented with problems consisting of 6, 8, 10, and 12 net patterns on a 2-layer channel in which each layer has 3 tracks. The results are shown in Table 2. IV-MGTP reduces the number of models considerably. For example, we found the following model in a 6-net problem: { p(1, [1, 1], [3, 3]), p(2, [1, 1], [1, 1]), p(3, [1, 1], [2, 2]), p(4, [2, 2], [2, 3]), p(5, [2, 2], [1, 2]), p(6, [1, 1], [2, 3]) }, which contains 8 (= 1 × 1 × 1 × 2 × 2 × 2) CMGTP models. The advantage of using IV-MGTP is that the different feasible track numbers can be represented as

Table 2. Experimental results for the channel routing problem

Number of Nets = 6     IV-MGTP   CMGTP
models                 250       840
branches               286       882
runtime (msec)         168       95

Number of Nets = 8     IV-MGTP   CMGTP
models                 1560      10296
branches               1808      10302
runtime (msec)         706       470

Number of Nets = 10    IV-MGTP   CMGTP
models                 4998      51922
branches               6238      52000
runtime (msec)         2311      3882

Number of Nets = 12    IV-MGTP   CMGTP
models                 13482     538056
branches               20092     539982
runtime (msec)         7498      31681

interval constraints. In CMGTP, the above model is split into 8 different models. Obviously, as the number of nets increases, the reduction ratio of the number of models becomes larger. We conclude that IV-MGTP can effectively suppress unnecessary case splitting by using interval constraints and hence reduce the total size of proofs.

Because any CMGTP program can be translated into an IV-MGTP program, QG problems can also be expressed as IV-MGTP programs. IV-MGTP, however, cannot solve QG problems more efficiently than CMGTP; that is, QG problems do not benefit from the IV-MGTP representation and deduction process. The advantage gained by using IV-MGTP depends on how beneficial interval/extraval constraints are for the problem domain. For problems in which the ordering of the domain elements has no significance, such as QG problems (whose numeric elements are treated strictly as symbolic values, not arithmetic values), CMGTP and IV-MGTP have essentially the same pruning effect. However, where reasoning about the arithmetic ordering of the elements is important, as in channel routing problems, IV-MGTP outperforms CMGTP.

Completeness. MGTP provides a sound and complete procedure in the sense of standard Herbrand interpretations. The extensions CMGTP and IV-MGTP described above, however, lose completeness [19]. The reason is essentially the same as for the incompleteness of resolution and hypertableaux with an unrestricted selection function [18]. It can be demonstrated with the simple example P = {→ p, ¬q → ¬p, q →}. The program P is unsatisfiable, yet deduction procedures based on the selection of only antecedent (or only consequent) literals cannot detect this. Likewise, the incomplete treatment of negation in CMGTP comes up with the incorrect model {p} for P. The example can be transferred to IV-MGTP. (We discuss only IV-MGTP in the rest of this section; since CMGTP can be regarded as a special case of IV-MGTP, this is sufficient.) Assume p and q are

defined by “declare p(2)” and “declare q(2)”. The idea is to represent a positive literal p by p({2}) and a negative literal ¬p by p({1}). Consider

P = {→ p({2}), q({1}) → p({1}), q({2}) →}    (7)

which is unsatisfiable (recall that p and q are functional), but has an IV-MGTP model, where M(p) = {2} and M(q) is undefined. In order to handle such cases, we adopt a non-standard semantics called extended interpretations, which is suggested in SFLP [37]. The basic idea underlying extended interpretations (e-interpretations) is to introduce the disjunctive information inherent in constraints into the interpretations themselves. An e-interpretation of a predicate p is a partial function I mapping ground instances of the term part of p into its constraint part. This means that the concepts introduced for model candidates can be used for e-interpretations. An extended interpretation I e-satisfies an IV-MGTP ground atom L = p(t1, . . . , tr, S1, . . . , Sm) iff I(p(t1, . . . , tr)) is defined, has the value ⟨S′1, . . . , S′m⟩, and S′i ⊆ Si for all 1 ≤ i ≤ m. Using the above definition, we have proved the following completeness theorem [19].

Theorem 1 (Completeness). An IV-MGTP program P having an IV-MGTP model M is e-satisfiable by M (viewed as an e-interpretation).

A simple adaptation of this theorem and its proof covers the case of CMGTP.

4.2 Non-Horn Magic Set

The basic behavior of model generation theorem provers such as SATCHMO and MGTP is to detect a clause violated under some interpretation, called a model candidate, and to extend the model candidate so that the clause is satisfied. However, when there are several violated clauses, the computational cost may differ greatly depending on the order in which those clauses are evaluated. In particular, when a non-Horn clause irrelevant to the given goal is selected, many of the interpretations generated with the clause become useless. Thus, in the model generation method, it is necessary to suppress the generation of useless interpretations. To this end, Loveland et al. proposed a method called relevancy testing [56,36], which restricts the selection of violated clauses to those whose consequent literals are all relevant to the given goal (“totally relevant”). They implemented this idea in SATCHMORE (SATCHMO with RElevancy). Let HC be a set of Horn clauses and I be the current model candidate. A relevant literal is defined as a goal called in a failed search to prove ⊥ from HC ∪ I, or a goal called in a failed search to prove the antecedent of a non-Horn clause by Prolog execution.

Relevancy testing can avoid useless model extensions with irrelevant violated clauses. However, it incurs some overhead, because it computes relevant literals dynamically by running Prolog over the Horn clauses whenever violated non-Horn clauses are detected. On the other hand, compared to top-down provers, a model generation prover like SATCHMO or MGTP can avoid solving duplicate subgoals because it is based on bottom-up evaluation. However, it also has the disadvantage of generating atoms that are irrelevant to proving the given goal. Thus it is necessary to combine bottom-up with top-down proving in order to use the goal information contained in negative clauses and to avoid generating useless model candidates. For this purpose, several methods such as magic sets, Alexander templates, and bottom-up meta-interpretation have been proposed in the field of deductive databases [9]. All of these transform the given Horn intensional databases into efficient Horn intensional databases, which generate only ground atoms relevant to the given goal in extensional databases. However, these methods were restricted to Horn programs. To extend them further, we developed a new transformation method applicable to non-Horn clauses, which we call the non-Horn magic set (NHM) [24]. NHM is a natural extension of the magic set, yet it works within the framework of the model generation method. Another extension for non-Horn clauses has been proposed, which simulates top-down execution based on the model elimination procedure within a forward chaining paradigm [52]. In the NHM method, each clause in a given clause set is transformed into two types of clauses: one is used to simulate backward reasoning, and the other to control inferences in forward reasoning. The set of transformed clauses is proven by bottom-up theorem provers. There are two kinds of transformation methods: the breadth-first NHM and the depth-first NHM.
The former simulates breadth-first backward reasoning, and the latter simulates depth-first backward reasoning.

Breadth-first NHM. For the breadth-first NHM method, a clause A1 ∧ · · · ∧ An → B1 ∨ · · · ∨ Bm in the given clause set S is transformed into the following (extended) clauses:

TB1: goal(B1) ∧ . . . ∧ goal(Bm) → goal(A1) ∧ . . . ∧ goal(An).
TB2: goal(B1) ∧ . . . ∧ goal(Bm) ∧ A1 ∧ . . . ∧ An → B1 ∨ . . . ∨ Bm.

In this transformation, for n = 0 (a positive clause), the first transformed clause TB1 is omitted. For m = 0 (a negative clause), the conjunction of goal(B1), . . . , goal(Bm) becomes true. For n ≠ 0, the two clauses TB1 and TB2 are obtained by the transformation. Here, the meta-predicate goal(A) represents that the atom A is relevant to the goal and must be solved. The clause TB1 simulates top-down evaluation. Intuitively, TB1 means that when it is necessary to solve the consequent B1, . . . , Bm of the original clause, it is necessary to solve the antecedent A1, . . . , An first. The n antecedent literals are solved in parallel. The clause TB2, on the other hand, simulates relevancy testing. TB2 means that a model extension with the consequent is performed only when A1, . . . , An are satisfied by the current model candidate and all the consequent atoms B1, . . . , Bm are relevant to the given goal. That is, the original clause is not used for model extension if there exists any consequent literal Bj such that Bj is not a goal.

Depth-first NHM. For the depth-first NHM transformation, a clause A1 ∧ · · · ∧ An → B1 ∨ · · · ∨ Bm in S is transformed into n + 1 (extended) clauses:

TD1: goal(B1) ∧ . . . ∧ goal(Bm) → goal(A1) ∧ cont_{k,1}(Vk).
TD2: cont_{k,1}(Vk) ∧ A1 → goal(A2) ∧ cont_{k,2}(Vk).
...
TDn: cont_{k,n−1}(Vk) ∧ An−1 → goal(An) ∧ cont_{k,n}(Vk).
TDn+1: cont_{k,n}(Vk) ∧ An → B1 ∨ . . . ∨ Bm.
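Both transformations are purely syntactic, so they can be sketched as a small program. The following Python sketch is an illustration under stated assumptions rather than the authors' implementation: atoms are plain strings, a clause is a triple (body, connective, head), the names `goal`, `breadth_first_nhm`, `depth_first_nhm`, and `show` are hypothetical, and continuation literals are written `cont_k_i(...)`. The text above does not say how the depth-first variant treats n = 0, so the sketch rejects that case.

```python
def goal(atom):
    """Wrap an atom in the meta-predicate goal(...)."""
    return f"goal({atom})"


def breadth_first_nhm(antecedent, consequent):
    """Return TB1 (omitted when n = 0) and TB2 for A1 ∧...∧ An → B1 ∨...∨ Bm."""
    goals = [goal(b) for b in consequent]  # empty (i.e. true) when m = 0
    clauses = []
    if antecedent:
        clauses.append((goals, "and", [goal(a) for a in antecedent]))
    clauses.append((goals + list(antecedent), "or", list(consequent)))
    return clauses


def depth_first_nhm(k, variables, antecedent, consequent):
    """Return TD1 ... TD(n+1) for the clause with identifier k (n >= 1)."""
    if not antecedent:
        raise ValueError("this sketch only covers clauses with n >= 1")

    def cont(i):
        return f"cont_{k}_{i}({', '.join(variables)})"

    n = len(antecedent)
    goals = [goal(b) for b in consequent]
    clauses = [(goals, "and", [goal(antecedent[0]), cont(1)])]
    for i in range(1, n):
        # TD(i+1): cont_{k,i} ∧ A_i → goal(A_{i+1}) ∧ cont_{k,i+1}
        clauses.append(([cont(i), antecedent[i - 1]], "and",
                        [goal(antecedent[i]), cont(i + 1)]))
    clauses.append(([cont(n), antecedent[-1]], "or", list(consequent)))
    return clauses


def show(clause):
    """Render a (body, connective, head) triple in the notation of the text."""
    body, connective, head = clause
    lhs = " ∧ ".join(body) if body else "true"
    sep = " ∧ " if connective == "and" else " ∨ "
    rhs = sep.join(head) if head else "false"
    return f"{lhs} → {rhs}"


example = (["p(X)", "q(X, Y)"], ["r(Y)", "s(X)"])
for c in breadth_first_nhm(*example) + depth_first_nhm(1, ["X", "Y"], *example):
    print(show(c))
```

For the example clause p(X) ∧ q(X, Y) → r(Y) ∨ s(X), the breadth-first variant yields two clauses and the depth-first variant yields three, matching the n + 1 count stated above for n = 2.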

In the depth-first clauses, k is the clause identifier of the original clause and Vk is the tuple of all variables appearing in the original clause. The transformed clauses are interpreted as follows. If all consequent literals B1, · · · , Bm are goals, we first attempt to solve the first atom A1. At that time, the variable bindings obtained in the satisfiability checking of the antecedent are propagated to the next clause TD2 by the continuation literal cont_{k,1}(Vk). If atom A1 is solved under cont_{k,1}(Vk), then we attempt to solve the second atom A2, and so on. Unlike in the breadth-first NHM transformation, the n antecedent atoms are solved sequentially from A1 to An. During this process, the variable binding information is propagated from A1 to An in this order.

Several experimental results obtained so far suggest that the NHM and relevancy testing methods have similar or identical pruning ability. To clarify this, we defined the concept of weak relevancy testing, which relaxes the condition of relevancy testing, and then proved that the NHM method is equivalent to weak relevancy testing in terms of the ability to prune redundant branches [45]. However, there are significant differences between NHM and SATCHMORE. First, SATCHMORE performs relevancy testing dynamically during the proof, while NHM is based on a static analysis of the input clauses and transforms them in a preprocessing step. Second, relevancy testing in SATCHMORE repeatedly calls Prolog to compute relevant literals backward whenever a new violated clause is found. This process often results in re-computation of the same relevant literals. In NHM, in contrast, goal literals are computed forward and their re-computation is avoided.

4.3 Eliminating Redundant Searches by Dependency Analysis

There are two types of redundancy in model generation. One is that the same subproof tree may be generated at several descendants after a case split occurs. The other is caused by unnecessary model candidate extensions. Folding-up is a well-known technique for eliminating duplicate subproofs in a tableaux framework [34]. In order to embed folding-up into model generation, we have to analyze dependency in a proof to extract lemmas from proven subproofs. Lemmas are used for pruning the remaining subproofs. Dependency analysis makes unnecessary parts visible, because such parts are independent of the essential parts of the proof. In other words, we can separate unnecessary parts from the proof by dependency analysis. Identifying unnecessary parts and eliminating them is regarded as proof simplification. The computational mechanism for their elimination is essentially the same as that for proof condensation [46] and level cut [2]. Taking this into consideration, we implemented not only folding-up but also proof condensation by embedding a single mechanism, proof simplification, into model generation [32]. In the following, we consider the function MG in Fig. 1 to be a builder of proof trees in which each leaf is labeled with ⊥ (for a failed branch, that is, UNSAT) or ⊤ (for a success branch, that is, SAT), and each non-leaf node is labeled with an atom used for model extension.

Fig. 15. Model extension (the root is extended with the atoms B1σ, . . . , Biσ, . . . , Bmσ, one per branch, each followed by its subproof tree)

Definition 1 (Relevant atom). Let P be a finite proof tree. A set Rel(P) of relevant atoms of P is defined as follows:
1. If P = ⊥ and A1σ ∧ . . . ∧ Anσ → is the negative clause used for building P, then Rel(P) = {A1σ, . . . , Anσ}.
2. If P = ⊤, then Rel(P) = ∅.
3. If P is in the form depicted in Fig. 15, A1σ ∧ . . . ∧ Anσ → B1σ ∨ . . . ∨ Bmσ is the mixed or positive clause used for forming the root of P, and
   (a) ∀i (1 ≤ i ≤ m) Biσ ∈ Rel(Pi), then Rel(P) = ∪_{i=1}^{m} (Rel(Pi) \ {Biσ}) ∪ {A1σ, . . . , Anσ};
   (b) ∃i (1 ≤ i ≤ m) Biσ ∉ Rel(Pi), then Rel(P) = Rel(Pi0), where i0 is the minimal index satisfying 1 ≤ i0 ≤ m and Bi0σ ∉ Rel(Pi0).

Informally, the relevant atoms of a proof tree P are the atoms that contribute to building P and appear as ancestors of P if P does not contain ⊤. If P contains ⊤, the set of relevant atoms of P is ∅.

Definition 2 (Relevant model extension). A model extension with a clause A1σ ∧ . . . ∧ Anσ → B1σ ∨ . . . ∨ Bmσ is relevant to the proof if the model extension yields a proof tree in the form depicted in Fig. 15 and either ∀i (1 ≤ i ≤ m) Biσ ∈ Rel(Pi) or ∃i (1 ≤ i ≤ m) (Pi contains ⊤) holds.
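Definition 1 is a recursive function over proof trees and can be written down directly. The sketch below is a minimal Python rendering under assumptions of our own: the classes `Bottom`, `Top`, and `Extension` and the function `rel` are hypothetical names, ground atoms are strings, and case 3(b) is read as "some Biσ does not occur in Rel(Pi)". It is meant to illustrate the definition, not to reproduce the MGTP data structures.

```python
from dataclasses import dataclass


@dataclass
class Bottom:
    """Failed leaf (⊥); atoms is the antecedent of the negative clause used."""
    atoms: frozenset


@dataclass
class Top:
    """Success leaf (⊤)."""


@dataclass
class Extension:
    """Model extension with A1σ ∧ ... ∧ Anσ → B1σ ∨ ... ∨ Bmσ."""
    antecedent: frozenset  # {A1σ, ..., Anσ}
    branches: list         # [(Biσ, subproof Pi), ...] in left-to-right order


def rel(proof):
    """Compute the set Rel(P) of relevant atoms following Definition 1."""
    if isinstance(proof, Bottom):          # case 1
        return set(proof.atoms)
    if isinstance(proof, Top):             # case 2
        return set()
    rels = [(b, rel(sub)) for b, sub in proof.branches]
    for b, r in rels:                      # case 3(b): minimal such index
        if b not in r:
            return r
    result = set(proof.antecedent)         # case 3(a)
    for b, r in rels:
        result |= r - {b}
    return result


# Both branches need their own atom: the extension is relevant.
needed = Extension(frozenset({"c"}),
                   [("a", Bottom(frozenset({"a"}))),
                    ("b", Bottom(frozenset({"b"})))])
# The first branch is closed without using "a": the extension is unnecessary.
unneeded = Extension(frozenset({"d"}),
                     [("a", Bottom(frozenset({"c"}))),
                      ("b", Bottom(frozenset({"b"})))])
print(rel(needed), rel(unneeded))
```

In the second example the first subproof is closed by a negative clause over c alone, so Rel(P) is inherited from that subproof and the antecedent atom d of the extension does not appear in it.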

We can eliminate irrelevant model extensions as follows. Let P be a proof tree in the form depicted in Fig. 15. If there exists a subproof tree Pi (1 ≤ i ≤ m) such that Biσ ∉ Rel(Pi) and Pi does not contain ⊤, we can conclude that the model extension forming the root of P is unnecessary, because Biσ does not contribute to Pi. Therefore, we can delete the other subproof trees Pj (1 ≤ j ≤ m, j ≠ i) and take Pi to be a simplified proof tree of P. When P contains ⊤, the model extension forming the root of P is necessary from a model-finding point of view. Performing proof simplification during the proof, instead of after the proof has been completed, makes the model generation procedure more efficient. Let us assume that we build a proof tree P (in the form depicted in Fig. 15) in a left-first manner and check whether Biσ ∈ Rel(Pi) after Pi is built. If Biσ ∉ Rel(Pi) holds, we can skip building the proofs Pj (i < j ≤ m), because the model extension does not contribute to the proof Pi. Thus m − i out of m branches are eliminated after i branches have been explored. This proof elimination mechanism is essentially the same as the proof condensation [46] and level cut [2] facilities. We can make use of a set of relevant atoms not only for proof condensation but also for generating lemmas.

Theorem 2. Let S be a set of clauses, M a set of ground atoms, and P = MG(U0, D0, ∅). Note that MG in Fig. 1 is modified to return a proof tree. If all leaves in P are labeled with ⊥, i.e., P does not contain ⊤, then S ∪ Rel(P) is unsatisfiable.

This theorem says that a set of relevant atoms can be regarded as a lemma. Consider the model generation procedure shown in Fig. 1. Let M be the current model candidate and P be a subproof tree that was obtained previously and does not contain ⊤. If M ⊃ Rel(P) holds, we can reject M without further proving, because S ∪ M is unsatisfiable, where S is the clause set to be proven.
This rejection mechanism can reduce search spaces by orders of magnitude. However, it is expensive to test whether M ⊃ Rel(P). Thus, we restrict the usage of the rejection mechanism.

Definition 3 (Context unit lemma). Let S be a set of clauses and P a proof tree of S in the form depicted in Fig. 15. When Biσ ∈ Rel(Pi), Rel(Pi) \ {Biσ} |=S ¬Biσ is called a context unit lemma extracted from Pi, where Γ |=S L is an abbreviation of S ∪ Γ |= L for a set Γ of ground literals, a set S of clauses, and a literal L. We call Rel(Pi) \ {Biσ} the context of the lemma.

Note that Biσ ∈ Rel(Pi) implies that Rel(Pi) is not empty. Therefore, Pi does not contain ⊤. Thus, S ∪ Rel(Pi) is unsatisfiable according to Theorem 2. The context of the context unit lemma extracted from Pi (1 ≤ i ≤ m) is satisfied in the model candidates of the sibling proofs Pj (j ≠ i, 1 ≤ j ≤ m); that is, the lemma is available in Pj. Furthermore, the lemma can be lifted to the nearest ancestor node that does not satisfy the context (in other words, that is labeled with an atom in the context) and is available in the proofs of its descendants. Lifting context unit lemmas to appropriate nodes and using them for pruning proof trees is an implementation of folding-up [34] for model generation. In this way, not only folding-up but also proof condensation can be achieved by calculating sets of relevant atoms of proofs.

We have implemented the model generation procedure with folding-up and proof condensation and observed their pruning effects on some typical examples. For all non-Horn problems (1984 problems) in the TPTP library [53] version 2.2.1, the overall success rate was about 19% (cf. pure model generation 16%, Otter (v3.0.5) 27%; see footnote 5) for a time limit of 10 minutes on a Sun Ultra1 (143MHz, 256MB, Solaris 2.5.1) workstation.

4.4 Minimal Model Generation

The notion of minimal models is important in a wide range of areas such as logic programming, deductive databases, software verification, and hypothetical reasoning. Some applications in such areas actually need to generate the Herbrand minimal models of a given set of first-order clauses. A model generation algorithm can generate all minimal Herbrand models if they are finite, though it may also generate non-minimal models [10]. Bry and Yahya proposed MM-SATCHMO [10], a minimal model generation prover that is sound (in the sense that it generates only minimal models) and complete (in the sense that it generates all minimal models). It uses complement splitting (or folding-down in [34]) for pruning some branches leading to non-minimal models and constrained search for eliminating non-minimal models. Niemelä also presented a propositional tableaux calculus for minimal model reasoning [43], in which he introduced the groundedness test as a substitute for constrained search.

The following theorem says that a model eliminated by factorization [34] in the model generation process is not minimal. This implies that model generation with factorization is complete for generating minimal models. It is also known that factorization is more flexible than complement splitting for pruning redundant search spaces [34].

Theorem 3. Let P be a proof tree of a set S of clauses. We assume that N1 and N2 are sibling nodes in P, Ni is labeled with a literal Li, and Pi is a subproof tree under Ni (i = 1, 2), as shown in Fig. 16(a). If there is a node N3, descended from N2, labeled with L1, then for each model M found in the proof tree P3, there exists a model M′ found in P1 such that M′ ⊂ M, where P3 is a subproof tree under N3 (Fig. 16(b)).

To avoid a circular argument, the proof tree has to be supplied with an additional factorization dependency relation.

Footnote 5: This measurement was obtained in our experiment with Otter alone (not Otter+MACE).

Fig. 16. Proof trees explaining Theorems 3 and 4 and Definition 5 (panels (a)-(d): sibling nodes N1 and N2 labeled L1 and L2, with subproof trees P1 and P2; a node N3 labeled L1 below N2, with subproof tree P3)

Definition 4 (Factorization dependency relation). A factorization dependency relation on a proof tree is a strict partial ordering ≺ relating sibling nodes in the tree (N1 ≺ N2 means that searching for minimal models under N2 is delegated to that under N1).

Definition 5 (Factorization). Let a proof tree P and a factorization dependency relation ≺ on P be given. First, select a node N3 labeled with literal L1 and another node N1 labeled with the same literal L1 such that (1) N3 is a descendant of N2, which is the sibling node of N1, and (2) N2 ⊀ N1. Then, mark N3 with N1 and modify ≺ by first adding the pair of nodes ⟨N1, N2⟩ and then forming the transitive closure of the relation. We say that N3 has been factorized with N1. Marking N3 with N1 indicates that finding models under N3 is delegated to that under N1. The situation is depicted in Fig. 16(d).

Corollary 1. Let S be a set of clauses. If a minimal model M of S is built by model generation, then M is also built by model generation with factorization.

We can replace L1 ∨ L2 ∨ . . . ∨ Ln used for non-Horn extension with the augmented form (L1 ∧ ¬L2 ∧ . . . ∧ ¬Ln) ∨ (L2 ∧ ¬L3 ∧ . . . ∧ ¬Ln) ∨ . . . ∨ Ln, which corresponds to complement splitting. Here, a negated literal is called a branching assumption. If none of the branching assumptions ¬Li+1, . . . , ¬Ln is used in a branch expanded below Li, we can use ¬Li as a unit lemma in the proof of Lj (i + 1 ≤ j ≤ n). This unit lemma is called a branching lemma. We regard model generation with complement splitting as pre-determining the factorization dependency relation on sibling nodes N1, . . . , Nm as follows: Nj ≺ Ni if i < j, for all i and j (1 ≤ i, j ≤ m). From this point of view, complement splitting is a restricted way of implementing factorization. We have proposed a minimal model generation procedure [23] that employs branching assumptions and lemmas. We regard model generation with branching assumptions and lemmas as arranging the factorization dependency relation on sibling nodes N1, . . . , Nm as follows: for each i (1 ≤ i ≤ m), Nj ≺ Ni for all j (i < j ≤ m) if Nj0 ≺ Ni for some j0 (i < j0 ≤ m), and otherwise Ni ≺ Nj for all j (i < j ≤ m). Using branching assumptions and lemmas can still be regarded as a restricted implementation of factorization. Nevertheless, it provides

Ryuzo Hasegawa et al.

Table 3. Results of MM-MGTP and other systems. Each cell gives time (sec) / no. of models / no. of failed branches. OM: out of memory.

Problem      MM-MGTP (Rcmp)          MM-MGTP (Mchk)         MM-SATCHMO                MGTP
ex1 (N=5)    0.271 / 100000 / 0      0.520 / 100000 / 0     8869.950 / 100000 / 0     0.199 / 100000 / 0
ex1 (N=7)    34.150 / 10000000 / 0   OM (>144) / − / −      OM (>40523) / − / −       19.817 / 10000000 / 0
ex2 (N=14)   0.001 / 1 / 26          0.001 / 1 / 26         1107.360 / 1 / 1594323    9.013 / 1594323 / 0
ex3 (N=16)   19.816 / 65536 / 1      5.076 / 65536 / 1      OM (>2798) / − / −        589.651 / 86093442 / 0
ex3 (N=18)   98.200 / 262144 / 1     26.483 / 262144 / 1    OM (>1629) / − / −        5596.270 / 774840978 / 0
ex4          0.002 / 341 / 96        0.002 / 341 / 96       0.3 / 341 / 284           0.004 / 501 / 0
ex5          0.001 / 17 / 84         0.001 / 17 / 84        0.25 / 17 / 608           0.001 / 129 / 0

MM-MGTP and MGTP: run on Java (Solaris JDK 1.2.1_03). MM-SATCHMO: run on ECLiPSe Prolog Version 3.5.2. All programs were run on a Sun Ultra10 (333MHz, 128MB).

an efficient way of applying factorization to minimal model generation, since it is unnecessary to compute the transitive closure of the factorization dependency relation.

In order to make the procedure sound in the sense that it generates only minimal models, it is necessary to test whether a generated model is minimal or not. The following theorem gives a necessary condition for a generated model to be nonminimal.

Theorem 4. Let S be a set of clauses and P a proof tree of S obtained by model generation with factorization. Assume that N1 and N2 are sibling nodes in P, Pi is the subproof tree under Ni, and Mi is a model found in Pi (i = 1, 2). If N2 ⊀ N1, then M1 ⊄ M2.

Theorem 4 says that we have to test whether M1 ⊂ M2 only when Mi is found under a node Ni (i = 1, 2) such that N2 ≺ N1.
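As a rough illustration of complement splitting and of the one-directional minimality test that Theorem 4 permits, the following Python sketch may help. The function names and the representation of models as frozensets of ground atoms are assumptions made for this example; they are not taken from MM-MGTP.

```python
def complement_split(literals):
    """Return the branches of the augmented disjunction.

    Branch i asserts literals[i] together with the branching
    assumptions "not L" for every later literal L.
    """
    return [(lit, literals[i + 1:]) for i, lit in enumerate(literals)]


def keep_minimal(models_in_branch_order):
    """Filter models produced left to right under complement splitting.

    Assumption of this sketch: a model can only be a proper superset of a
    model found in an earlier branch, so each new model is compared with
    the models kept so far and never the other way round.
    """
    kept = []
    for model in models_in_branch_order:
        if not any(earlier < model for earlier in kept):
            kept.append(model)
    return kept


branches = complement_split(["a", "b", "c"])
models = keep_minimal([frozenset("a"), frozenset("ab"), frozenset("b")])
```

Here the branch for "a" carries the assumptions ¬b and ¬c, and the model {a, b} is discarded because the earlier model {a} is a proper subset of it.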

A Model Generation Based Theorem Prover MGTP for First-Order Logic

We implemented a minimal model generation prover called MM-MGTP with branching assumptions and lemmas in Java [23]. The implementation takes Theorem 4 into account. Like MM-SATCHMO, it is applicable to first-order clauses. Table 3 shows experimental results on MM-MGTP, MM-SATCHMO, and MGTP. There are two versions of MM-MGTP: model checking (Mchk) and model re-computing (Rcmp). The former is based on constrained search and the latter on the groundedness test.

Although the model checking MM-MGTP is similar to MM-SATCHMO, the way of treating model constraints differs somewhat. Instead of dynamically adding model constraints (negative clauses) to the given clause set, MM-MGTP retains them in the form of a model tree consisting only of models. Thus, the constrained search for minimal models in MM-SATCHMO is replaced by a model tree traversal for minimality testing. In the model re-computing version, a re-computation procedure for minimality testing is invoked instead of a model tree traversal. The procedure is the same as MG except that some routines are modified for restarting the execution. It returns UNSAT if the current model is minimal, otherwise SAT.

The experimental results show a remarkable speedup compared to MM-SATCHMO: on ex1 (N=5), for example, the model re-computing version takes 0.271 sec against 8869.950 sec for MM-SATCHMO, and on several problems MM-SATCHMO runs out of memory while MM-MGTP terminates. See [23] for a detailed discussion of the experiment.

5 Applications

A model generation theorem prover has general reasoning power that is useful in various AI applications. In particular, we first implemented a propositional modal tableaux system on MGTP, by representing each tableau rule with MGTP input clauses. This approach has led to research on logic programming with negation as failure [29], abductive reasoning [30], modal logic systems [31], mode analysis of FGHC programs [54], and legal reasoning [44,27], etc. In the following sections, we focus on the issue of implementing negation as failure within a framework of model generation, and describe how this feature is used to build a legal reasoning system.

5.1 Embedding Negation as Failure into MGTP

Negation as failure is one of the most important techniques developed in the logic programming field, and logic programming supporting this feature can be a powerful knowledge representation tool. Accordingly, declarative semantics such as the answer set semantics have been given to extensions of logic programs containing both negation as failure (not) and classical negation (¬), where the negation as failure operator is considered to be a non-monotonic operator [16]. However, for such extended classes of logic programs, the top-down approach cannot be used for computing the answer set semantics because there is no local property in evaluating programs. Thus, we need bottom-up computation for correct evaluation of negation as failure formulas. For this purpose, we use the framework of MGTP, which can find the answer sets as the fixpoint of model candidates. Here, we introduce a method to transform any logic program (with negation as failure) into a positive disjunctive program (without negation as failure) [40] for which MGTP can compute the minimal models [29].

Translation into MGTP Rules. A positive disjunctive program is a set of rules of the form:

    A1 | ... | Al ← Al+1, ..., Am        (8)

where m ≥ l ≥ 0 and each Ai is an atom. The meaning of a positive disjunctive program P can be given by the minimal models of P [40]. The minimal models of positive disjunctive programs can be computed using MGTP. We represent each rule of the form (8) in a positive disjunctive program with the following MGTP input clause:

    Al+1 ∧ ... ∧ Am → A1 ∨ ... ∨ Al        (9)

General and Extended Logic Programs. MGTP can also compute the stable models of a general logic program [15] and the answer sets of an extended disjunctive program [16] by translation into positive disjunctive programs. An extended logic program is a set of rules of the form:

    L1 | ... | Ll ← Ll+1, ..., Lm, not Lm+1, ..., not Ln        (10)

where n ≥ m ≥ l ≥ 0 and each Li is a literal. This logic program is called a general logic program if l ≤ 1 and each Li is an atom. While a general logic program contains negation as failure but does not contain classical negation, an extended disjunctive program contains both of them.

In evaluating not L in a bottom-up manner, it is necessary to interpret not L with respect to a fixpoint of the computation, because even if L is not currently proved, L might be proved in subsequent inferences. When we have to evaluate not L in a current model candidate, we split the model candidate into two: (1) the model candidate where L is assumed not to hold, and (2) the model candidate where it is necessary that L holds. Each negation-as-failure formula not L is thus translated into negative and positive literals with a modality expressing belief, i.e., "disbelieve L" (written as ¬KL) and "believe L" (written as KL). Based on the above discussion, we translate each rule of the form (10) to the following MGTP rule:

    Ll+1 ∧ ... ∧ Lm → H1 ∨ ... ∨ Hl ∨ KLm+1 ∨ ... ∨ KLn        (11)

where Hi ≡ ¬KLm+1 ∧ ... ∧ ¬KLn ∧ Li (i = 1, ..., l). For any MGTP rule of the form (11), if a model candidate M satisfies Ll+1, ..., Lm, then M is split into n − m + l (n ≥ m ≥ 0, 0 ≤ l ≤ 1) model candidates.
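As a small worked instance of translation (11), consider a hypothetical one-rule program (chosen for illustration, not taken from the source) with l = 1, m = 2 and n = 3:

```latex
\begin{align*}
\text{rule of form (10):}\quad & p \leftarrow q,\ \mathit{not}\ r
   && (L_1 = p,\ L_2 = q,\ L_3 = r)\\
\text{MGTP rule (11):}\quad & q \rightarrow (\neg \mathsf{K}r \wedge p) \vee \mathsf{K}r\\
\text{number of splits:}\quad & n - m + l = 3 - 2 + 1 = 2
\end{align*}
```

A model candidate containing q is thus split into one candidate extended with ¬Kr and p, and one extended with Kr.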

In order to reject model candidates when their guesses turn out to be wrong, the following two schemata (integrity constraints) are introduced:

    ¬KL ∧ L →       for every literal L ∈ ℒ.        (12)
    ¬KL ∧ KL →      for every literal L ∈ ℒ.        (13)

In addition to the schemata above, we need the following three schemata to deal with classical negation. Below, L̄ is the literal complementary to a literal L.

    L ∧ L̄ →         for every literal L ∈ ℒ.        (14)
    KL ∧ L̄ →        for every literal L ∈ ℒ.        (15)
    KL ∧ KL̄ →       for every literal L ∈ ℒ.        (16)

Next is the condition that guarantees stability at a fixpoint, namely that all of the guesses made so far in a model candidate M are correct: for every ground literal L, if KL ∈ M, then L ∈ M.

The above computation by MGTP is sound and complete with respect to the answer set semantics. This technique is simply based on a bottom-up model generation method together with integrity constraints over K-literals expressed by object-level schemata on MGTP. Compared with other approaches, the proposed method has several computational advantages: put simply, it can find all minimal models for every class of groundable logic program or disjunctive database, incrementally, without backtracking, and in parallel. This method has been applied to a legal reasoning system [44].

5.2 Legal Reasoning

As a real application, MGTP has been applied to a legal reasoning system [44,27]. Since legal rules involve uncertainty and inconsistency, we have to introduce a language other than the MGTP input language in which users can represent laws and judicial precedents. In this section, we show an extended logic programming language and a method to translate it into MGTP input clauses, so that legal problems can be solved automatically using MGTP.

Extended Logic Programming Language. In our legal reasoning system, we adopted the extended logic programming language defined below to represent legal knowledge and judicial precedents. We consider rules of the form:

    R :: L0 ← L1 ∧ ... ∧ Lm ∧ not Lm+1 ∧ ... ∧ not Ln.        (17)
    R :: ← L1 ∧ L2.                                           (18)
    R :: L0 ⇐ L1 ∧ ... ∧ Lm ∧ not Lm+1 ∧ ... ∧ not Ln.        (19)

where Li (0 ≤ i ≤ n) represents a literal, not represents negation as failure (NAF), and R is a rule identifier, which has all variables occurring in Li (0 ≤ i ≤ n) as its arguments. (17) is called an exact rule: if all literals in the rule body are assigned true, then the rule head is assigned true without exception. (18) is called an integrity constraint, which states that L1 and L2 must not be assigned true in the same context. (19) is called a default rule: if all literals in the rule body are assigned true, then the rule head is assigned true unless it causes a conflict or violates an integrity constraint.

Example:

    r1(X) :: fly(X) ⇐ bird(X) ∧ not baby(X).
    r2(X) :: ¬fly(X) ⇐ penguin(X).
    r3(X) :: bird(X) ← penguin(X).
    f1 :: bird(a).
    f2 :: penguin(a).
    f3 :: baby(b).

In this example, r1(X) can derive fly(a), which is inconsistent with ¬fly(a) derived from r2(X). Since r1(X) and r2(X) are both default rules, we cannot conclude whether a flies or does not fly. If, however, r2(X) were defined as a more specific rule than r1(X), that is, if r2(X) were preferred to r1(X), ¬fly(a) could defeat fly(a). In order to realize such reasoning about rule preference, we introduce another form of literal, R1 < R2, which means "rule R2 is preferred to R1" (where R1 and R2 are rule identifiers with arguments). For example, the following rule states that r2(X) is preferred to r1(X) when X is a bird:

    r4(X) :: r1(X) < r2(X) ← bird(X).

If we regard it as a default rule, we can replace ← with ⇐. The rule preference defined above is called dynamic in the sense that the preference is determined according to its arguments.

Semantics of the Rule Preference. Many semantics for rule preference structures have been proposed: introducing a predicate preference relation into circumscription [35,17], introducing a rule preference relation into default theory [4,8,1,5,6], using a literal preference relation [7,48], and defining the semantics by translation rules [33,47]. Among these, our system adopted the approach presented in [33], because it can be easily applied to legal reasoning and is easy to translate into MGTP input clauses.

Translation into the MGTP Input Clauses. In what follows, L^i_j denotes the j-th literal of rule Ri. Assume we have the default rule:

    R1 :: L^1_0 ⇐ L^1_1 ∧ ... ∧ L^1_m ∧ not L^1_{m+1} ∧ ... ∧ not L^1_n.

If we also have the following default rule:

    R2 :: L^2_0 ⇐ L^2_1 ∧ ... ∧ L^2_k ∧ not L^2_{k+1} ∧ ... ∧ not L^2_q.

then R1 is translated to:

    L^1_0 ← L^1_1 ∧ ... ∧ L^1_m ∧ not L^1_{m+1} ∧ ... ∧ not L^1_n ∧ not defeated(R1).

This translation shows the interpretation of our default rules: the rule head can be derived if the rule body is satisfied and there is no proof that R1 can be defeated. The predicate defeated is newly introduced and defined by the following rule:

    defeated(R2θ) ← L^1_1θ ∧ ... ∧ L^1_mθ ∧ not L^1_{m+1}θ ∧ ... ∧ not L^1_nθ ∧ not defeated(R1θ)
                    ∧ L^2_1θ ∧ ... ∧ L^2_kθ ∧ not L^2_{k+1}θ ∧ ... ∧ not L^2_qθ ∧ not R1θ < R2θ.

where θ is a most general unifier that satisfies the following condition: L^1_0θ = ¬L^2_0θ, or, for some integrity constraint ← L1 ∧ L2, either L1θ = L^1_0θ and L2θ = L^2_0θ, or L2θ = L^1_0θ and L1θ = L^2_0θ. In this way, default rules with rule preference relations are translated into rules with NAF. The deduction process in MGTP for such rule sets is based on [29].

Introducing Modal Operator. For each NAF literal in a rule, a modal operator K is introduced. If we have the following clause:

    Al ← Al+1 ∧ ... ∧ Am ∧ not Am+1 ∧ ... ∧ not An

then we translate it with modal operators into:

    Al+1 ∧ ... ∧ Am → (¬KAm+1 ∧ ... ∧ ¬KAn ∧ Al) ∨ KAm+1 ∨ ... ∨ KAn

In addition, we provide integrity constraints for K such as P ∧ ¬KP →, which enable MGTP to derive the stable models for the given input clauses. These integrity constraints are built into the MGTP deduction process with slight modification.

Extracting Stable Models. The models derived by MGTP contain not only all possible stable models but also models that are constructed only from hypotheses. A stable model must satisfy the following condition, called the T-condition. The T-condition is a criterion for extracting the final stable models from the models derived by MGTP.

T-Condition. If KP ∈ M, then P ∈ M.
If the proof structure included in a stable model also occurs in all the other stable models, we call it a justified argument, otherwise a plausible argument. Justified arguments withstand any attack against them, while plausible arguments do not: they might be attacked by some arguments and cause a conflict.
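The translation with the modal operator K, the integrity constraints on K, and the T-condition described above can be tried out on small ground programs with the following Python sketch. It is only an illustration under stated assumptions, not the MGTP implementation: rules are hypothetical triples (head, positive body, NAF body) with at most one head literal, classical negation is written with a leading "-", and the ground penguin program at the end (including the symmetric defeated rule and the literal names "r1<r2" and "r2<r1") was instantiated by hand from the example and from rule r4.

```python
def complement(lit):
    """Classical complement: 'p' <-> '-p'."""
    return lit[1:] if lit.startswith("-") else "-" + lit


def translate(rule):
    """Turn (head, pos, naf) into (antecedent, list of disjuncts).

    Items are tagged pairs ("lit", L), ("K", L) or ("notK", L);
    head may be None for an integrity constraint.
    """
    head, pos, naf = rule
    disjuncts = []
    if head is not None:
        disjuncts.append(frozenset([("notK", m) for m in naf] + [("lit", head)]))
    disjuncts += [frozenset([("K", m)]) for m in naf]
    return frozenset(("lit", p) for p in pos), disjuncts


def consistent(cand):
    """Reject candidates that violate the integrity constraints on K."""
    for kind, lit in cand:
        if kind == "notK" and (("lit", lit) in cand or ("K", lit) in cand):
            return False
        if kind == "lit" and ("lit", complement(lit)) in cand:
            return False
        if kind == "K" and (("lit", complement(lit)) in cand
                            or ("K", complement(lit)) in cand):
            return False
    return True


def stable_models(program):
    """Bottom-up model generation with case splitting and the T-condition."""
    clauses = [translate(r) for r in program]
    found, stack = set(), [frozenset()]
    while stack:
        cand = stack.pop()
        if not consistent(cand):
            continue
        for body, disjuncts in clauses:
            if body <= cand and not any(d <= cand for d in disjuncts):
                stack.extend(cand | d for d in disjuncts)  # case split
                break
        else:  # fixpoint reached: apply the T-condition
            if all(("lit", l) in cand for kind, l in cand if kind == "K"):
                found.add(frozenset(l for kind, l in cand if kind == "lit"))
    return found


# Hand-grounded penguin example for the constant a (an assumption of this sketch).
penguin = [
    ("bird(a)", [], []),
    ("penguin(a)", [], []),
    ("bird(a)", ["penguin(a)"], []),
    ("r1<r2", ["bird(a)"], []),
    ("fly(a)", ["bird(a)"], ["baby(a)", "defeated(r1)"]),
    ("-fly(a)", ["penguin(a)"], ["defeated(r2)"]),
    ("defeated(r2)", ["bird(a)", "penguin(a)"],
     ["baby(a)", "defeated(r1)", "r1<r2"]),
    ("defeated(r1)", ["penguin(a)", "bird(a)"],
     ["defeated(r2)", "baby(a)", "r2<r1"]),
]
result = stable_models(penguin)
```

With the preference r1 < r2 in place, the only model that survives the T-condition contains ¬fly(a) and defeated(r1), matching the informal reading that the more specific rule r2 defeats r1.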

Fig. 17. The interface window in the argumentation support system

System and Experiments. We have developed an argumentation support system [27] that includes the legal reasoning system built on MGTP. The system is written in Java and runs on each client machine, which is connected to the other clients via a TCP/IP network. Each participant (including the parties concerned and a judge if needed) builds argument diagrams according to his/her own assertions, by hand or sometimes automatically, and sends them to all the others. Figure 17 shows an example of argument diagrams in the user interface window. The system maintains the current status of each node, that is, whether it is agreed by all, disagreed by someone, attacked by some nodes, attacking some nodes, etc. Based on these statuses, the judge intervenes in the arguments if necessary and undertakes mediation. As an experiment, we implemented a part of the Japanese civil law on the system. More than 10 legal experts used the system, investigated the arguments that were automatically derived by the legal reasoning system, and rated it highly with respect to the expressiveness of the extended logic programming language, the negotiation protocol adopted, and the efficiency of reasoning.

6 Conclusion

We have reviewed research and development of the model generation theorem prover MGTP, including our recent activities around it. MGTP is one of the successful application systems developed in the FGCS project. MGTP achieved more than a 200-fold speedup on a PIM/m consisting of 256 PEs for many theorem proving benchmarks. By using parallel MGTP systems, we succeeded in solving some hard mathematical problems such as condensed detachment problems and quasigroup existence problems in finite algebra.

In the current parallel implementation, however, we have to use an AND-parallel MGTP for Horn problems and an OR-parallel MGTP for non-Horn problems separately. Thus, it is necessary to develop a parallel version of MGTP which can combine AND- and OR-parallelization for proving a set of general clauses. In addition, when running MGTP (written in Klic [14]) on other commercial parallel computers, it is difficult to attain as good a parallel performance as on PIM for problems that require fine-grain concurrency. At present, the N-sequential method, which exploits coarse-grain concurrency with low communication costs, would be a practical solution for this.

Recent results with Java versions of MGTP (JavaMGTP) show a speedup of several tens of times compared to the Klic versions. This achievement is largely due to the new A-cell mechanism for handling multiple contexts and to several language facilities of Java, including destructive assignment to variables.

To enhance MGTP's pruning ability, we extended the MGTP features in several ways. NHM is a key technology for making MGTP practical and applicable to several applications such as disjunctive databases and abductive reasoning. The essence of the NHM method is to simulate a top-down evaluation in a framework of bottom-up computation by static clause transformation to propagate goal (negative) information, thereby pruning search spaces. This propagation is closely related to the technique developed in CMGTP to manipulate (negative) constraints. Thus, further research is needed to clarify whether the NHM method can be incorporated into CMGTP or its extended version, IVMGTP.

It is also important in real applications that MGTP avoids duplicating the same subproofs and generating nonminimal models.
The proof simplification based on dependency analysis is a technique to embed both folding-up and proof condensation in a model generation framework, and has a similar effect to NHM. Although the proof simplification is weaker than NHM in the sense that relevancy testing is performed after a model extension occurs, this is compensated by the embedded folding-up function. Incorporating this method into the minimal model generation prover MM-MGTP would further enhance its pruning ability.

Lastly, we have shown that the feature of negation as failure, which is a most important invention in logic programming, can be easily implemented on MGTP, and have presented a legal reasoning system employing the feature. The basic idea behind this is to translate formulas with special properties, such as non-monotonicity and modality, into first-order clauses on which MGTP works as a meta-interpreter. The manipulation of these properties is thus reduced to generate-and-test problems for model candidates. These can then be handled by MGTP very efficiently through case-splitting of disjunctive consequences and rejection of inconsistent model candidates.

A family of MGTP systems is available at http://ss104.is.kyushu-u.ac.jp/software/.

Acknowledgment

We would like to thank Prof. Kazuhiro Fuchi of Keio University, the then director of ICOT, and Prof. Koichi Furukawa of Keio University, the then deputy director of ICOT, who have given us continuous support and helpful comments during the Fifth Generation Computer Systems Project. Thanks are also due to members of the MGTP research group including Associate Prof. Katsumi Inoue of Kobe University and Prof. Katsumi Nitta of Tokyo Institute of Technology for their fruitful discussions and cooperation.

References

1. Franz Baader and Bernhard Hollunder. How to prefer more specific defaults in terminological default logic. In Proc. International Joint Conference on Artificial Intelligence, pages 669–674, 1993.
2. Peter Baumgartner, Ulrich Furbach, and Ilkka Niemelä. Hyper Tableaux. In José Júlio Alferes, Luís Moniz Pereira, and Ewa Orłowska, editors, Proc. European Workshop: Logics in Artificial Intelligence, JELIA, volume 1126 of Lecture Notes in Artificial Intelligence, pages 1–17. Springer-Verlag, 1996.
3. Frank Bennett. Quasigroup Identities and Mendelsohn Designs. Canadian Journal of Mathematics, 41:341–368, 1989.
4. Gerhard Brewka. Preferred subtheories: An extended logical framework for default reasoning. In Proc. International Joint Conference on Artificial Intelligence, pages 1043–1048, Detroit, MI, USA, 1989.
5. Gerhard Brewka. Adding priorities and specificity to default logic. In Proc. JELIA 94, pages 247–260, 1994.
6. Gerhard Brewka. Reasoning about priorities in default logic. In Proc. AAAI 94, pages 940–945, 1994.
7. Gerhard Brewka. Well-founded semantics for extended logic programs with dynamic preference. Journal of Artificial Intelligence Research, 4:19–36, 1996.
8. Gerhard Brewka and Thomas F. Gordon. How to Buy a Porsche: An Approach to defeasible decision making. In Proc. AAA94 workshop on Computational Dialectics, 1994.
9. François Bry. Query evaluation in recursive databases: bottom-up and top-down reconciled. Data & Knowledge Engineering, 5:289–312, 1990.
10. François Bry and Adnan Yahya. Minimal Model Generation with Positive Unit Hyper-Resolution Tableaux. In Proc. 5th International Workshop, TABLEAUX’96, volume 1071 of Lecture Notes in Artificial Intelligence, pages 143–159, Terrasini, Palermo, Italy, May 1996. Springer-Verlag.
11. Hiroshi Fujita and Ryuzo Hasegawa. A Model-Generation Theorem Prover in KL1 Using Ramified Stack Algorithm. In Proc. 8th International Conference on Logic Programming, pages 535–548. The MIT Press, 1991.
12. Masayuki Fujita, Ryuzo Hasegawa, Miyuki Koshimura, and Hiroshi Fujita. Model Generation Theorem Provers on a Parallel Inference Machine. In Proc. International Conference on Fifth Generation Computer Systems, volume 1, pages 357–375, Tokyo, Japan, June 1992.
13. Masayuki Fujita, John Slaney, and Frank Bennett. Automatic Generation of Some Results in Finite Algebra. In Proc. International Joint Conference on Artificial Intelligence, 1993.

14. Tetsuro Fujita, Takashi Chikayama, Kazuaki Rokuwasa, and Akihiko Nakase. KLIC: A Portable Implementation of KL1. In Proc. International Conference on Fifth Generation Computer Systems, pages 66–79, Tokyo, Japan, December 1994.
15. Michael Gelfond and Vladimir Lifschitz. The Stable Model Semantics for Logic Programming. In Proc. 5th International Conference and Symposium on Logic Programming, pages 1070–1080. MIT Press, 1988.
16. Michael Gelfond and Vladimir Lifschitz. Classical Negation in Logic Programs and Disjunctive Databases. New Generation Computing, 9:365–385, 1991.
17. Benjamin Grosof. Generalization Prioritization. In Proc. 2nd Conference on Knowledge Representation and Reasoning, pages 289–300, 1991.
18. Reiner Hähnle. Tableaux and related methods. In Alan Robinson and Andrei Voronkov, editors, Handbook of Automated Reasoning, volume I. North-Holland, 2001.
19. Reiner Hähnle, Ryuzo Hasegawa, and Yasuyuki Shirai. Model Generation Theorem Proving with Finite Interval Constraints. In Proc. First International Conference on Computational Logic (CL2000), 2000.
20. Ryuzo Hasegawa and Hiroshi Fujita. Implementing a Model-Generation Based Theorem Prover MGTP in Java. Research Reports on Information Science and Electrical Engineering, 3(1):63–68, 1998.
21. Ryuzo Hasegawa and Hiroshi Fujita. A new Implementation Technique for a Model-Generation Theorem Prover to Solve Constraint Satisfaction Problems. Research Reports on Information Science and Electrical Engineering, 4(1):57–62, 1999.
22. Ryuzo Hasegawa, Hiroshi Fujita, and Miyuki Koshimura. MGTP: A Parallel Theorem-Proving System Based on Model Generation. In Proc. 11th International Conference on Applications of Prolog, pages 34–41, Tokyo, Japan, September 1998.
23. Ryuzo Hasegawa, Hiroshi Fujita, and Miyuki Koshimura. Efficient Minimal Model Generation Using Branching Lemmas. In Proc. 17th International Conference on Automated Deduction, volume 1831 of Lecture Notes in Artificial Intelligence, pages 184–199, Pittsburgh, Pennsylvania, USA, June 2000. Springer-Verlag.
24. Ryuzo Hasegawa, Katsumi Inoue, Yoshihiko Ohta, and Miyuki Koshimura. Non-Horn Magic Sets to Incorporate Top-down Inference into Bottom-up Theorem Proving. In Proc. 14th International Conference on Automated Deduction, volume 1249 of Lecture Notes in Artificial Intelligence, pages 176–190, Townsville, North Queensland, Australia, July 1997. Springer-Verlag.
25. Ryuzo Hasegawa and Miyuki Koshimura. An AND Parallelization Method for MGTP and Its Evaluation. In Proc. First International Symposium on Parallel Symbolic Computation, Lecture Notes Series on Computing, pages 194–203. World Scientific, September 1994.
26. Ryuzo Hasegawa, Miyuki Koshimura, and Hiroshi Fujita. Lazy Model Generation for Improving the Efficiency of Forward Reasoning Theorem Provers. In Proc. International Workshop on Automated Reasoning, pages 221–238, Beijing, China, July 1992.
27. Ryuzo Hasegawa, Katsumi Nitta, and Yasuyuki Shirai. The Development of an Argumentation Support System Using Theorem Proving Technologies. In Research Report on Advanced Software Enrichment Program 1997, pages 59–66. Information Promotion Agency, Japan, 1999. (in Japanese).
28. Ryuzo Hasegawa and Yasuyuki Shirai. Constraint Propagation of CP and CMGTP: Experiments on Quasigroup Problems. In Proc. Workshop 1C (Automated Reasoning in Algebra), CADE-12, Nancy, France, 1994.

29. Katsumi Inoue, Miyuki Koshimura, and Ryuzo Hasegawa. Embedding Negation as Failure into a Model Generation Theorem Prover. In Proc. 11th International Conference on Automated Deduction, volume 607 of Lecture Notes in Artificial Intelligence, pages 400–415, Saratoga Springs, NY, USA, 1992. Springer-Verlag.
30. Katsumi Inoue, Yoshihiko Ohta, Ryuzo Hasegawa, and Makoto Nakashima. Bottom-Up Abduction by Model Generation. In Proc. International Joint Conference on Artificial Intelligence, pages 102–108, 1993.
31. Miyuki Koshimura and Ryuzo Hasegawa. Modal Propositional Tableaux in a Model Generation Theorem Prover. In Proc. 3rd Workshop on Theorem Proving with Analytic Tableaux and Related Methods, pages 145–151, May 1994.
32. Miyuki Koshimura and Ryuzo Hasegawa. Proof Simplification for Model Generation and Its Applications. In Proc. 7th International Conference, LPAR 2000, volume 1955 of Lecture Notes in Artificial Intelligence, pages 96–113. Springer-Verlag, November 2000.
33. Robert A. Kowalski and Francesca Toni. Abstract Argumentation. Artificial Intelligence and Law Journal, 4:275–296, 1996.
34. Reinhold Letz, Klaus Mayr, and Christoph Goller. Controlled Integration of the Cut Rule into Connection Tableau Calculi. Journal of Automated Reasoning, 13:297–337, 1994.
35. Vladimir Lifschitz. Computing Circumscription. In Proc. International Joint Conference on Artificial Intelligence, pages 121–127, Los Angeles, CA, USA, 1985.
36. Donald W. Loveland, David W. Reed, and Debra S. Wilson. Satchmore: Satchmo with RElevancy. Journal of Automated Reasoning, 14(2):325–351, April 1995.
37. James J. Lu. Logic Programming with Signs and Annotations. Journal of Logic and Computation, 6(6):755–778, 1996.
38. Rainer Manthey and François Bry. SATCHMO: a theorem prover implemented in Prolog. In Proc. 9th International Conference on Automated Deduction, volume 310 of Lecture Notes in Computer Science, pages 415–434, Argonne, Illinois, USA, May 1988. Springer-Verlag.
39. William McCune and Larry Wos. Experiments in Automated Deduction with Condensed Detachment. In Proc. 11th International Conference on Automated Deduction, volume 607 of Lecture Notes in Artificial Intelligence, pages 209–223, Saratoga Springs, NY, USA, 1992. Springer-Verlag.
40. Jack Minker. On indefinite databases and the closed world assumption. In Proc. 6th International Conference on Automated Deduction, volume 138 of Lecture Notes in Computer Science, pages 292–308, Courant Institute, USA, 1982. Springer-Verlag.
41. Ugo Montanari and Francesca Rossi. Finite Domain Constraint Solving and Constraint Logic Programming. In Constraint Logic Programming: Selected Research, pages 201–221. The MIT Press, 1993.
42. Hiroshi Nakashima, Katsuto Nakajima, Seiichi Kondo, Yasutaka Takeda, Yū Inamura, Satoshi Onishi, and Kanae Matsuda. Architecture and Implementation of PIM/m. In Proc. International Conference on Fifth Generation Computer Systems, volume 1, pages 425–435, Tokyo, Japan, June 1992.
43. Ilkka Niemelä. A Tableau Calculus for Minimal Model Reasoning. In Proc. 5th International Workshop, TABLEAUX’96, volume 1071 of Lecture Notes in Artificial Intelligence, pages 278–294, Terrasini, Palermo, Italy, May 1996. Springer-Verlag.
44. Katsumi Nitta, Yoshihisa Ohtake, Shigeru Maeda, Masayuki Ono, Hiroshi Ohsaki, and Kiyokazu Sakane. HELIC-II: A Legal Reasoning System on the Parallel Inference Machine. In Proc. International Conference on Fifth Generation Computer Systems, volume 2, pages 1115–1124, Tokyo, Japan, June 1992.

45. Yoshihiko Ohta, Katsumi Inoue, and Ryuzo Hasegawa. On the Relationship Between Non-Horn Magic Sets and Relevancy Testing. In Proc. 15th International Conference on Automated Deduction, volume 1421 of Lecture Notes in Artificial Intelligence, pages 333–349, Lindau, Germany, July 1998. Springer-Verlag.
46. Franz Oppacher and E. Suen. HARP: A Tableau-Based Theorem Prover. Journal of Automated Reasoning, 4:69–100, 1988.
47. Henry Prakken and Giovanni Sartor. Argument-based Extended Logic Programming with Defeasible Priorities. Journal of Applied Non-Classical Logics, 7:25–75, 1997.
48. Chiaki Sakama and Katsumi Inoue. Representing Priorities in Logic Programs. In Proc. International Conference and Symposium on Logic Programming, pages 82–96, 1996.
49. Heribert Schütz and Tim Geisler. Efficient Model Generation through Compilation. In Proc. 13th International Conference on Automated Deduction, volume 1104 of Lecture Notes in Artificial Intelligence, pages 433–447. Springer-Verlag, 1996.
50. Yasuyuki Shirai and Ryuzo Hasegawa. Two Approaches for Finite-domain Constraint Satisfaction Problem - CP and MGTP -. In Proc. 12th International Conference on Logic Programming, pages 249–263. MIT Press, 1995.
51. Mark Stickel. The Path-Indexing Method For Indexing Terms. Technical Note 473, AI Center, SRI, 1989.
52. Mark E. Stickel. Upside-Down Meta-Interpretation of the Model Elimination Theorem-Proving Procedure for Deduction and Abduction. Journal of Automated Reasoning, 13(2):189–210, October 1994.
53. Geoff Sutcliffe, Christian Suttner, and Theodor Yemenis. The TPTP Problem Library. In Proc. 12th International Conference on Automated Deduction, volume 814 of Lecture Notes in Artificial Intelligence, pages 252–266, Nancy, France, 1994. Springer-Verlag.
54. Evan Tick and Miyuki Koshimura. Static Mode Analyses of Concurrent Logic Programs. Journal of Programming Languages, 2:283–312, 1994.
55. Kazunori Ueda and Takashi Chikayama. Design of the Kernel Language for the Parallel Inference Machine. Computer Journal, 33:494–500, December 1990.
56. Debra S. Wilson and Donald W. Loveland. Incorporating Relevancy Testing in SATCHMO. Technical Reports CS-1989-24, Department of Computer Science, Duke University, Durham, North Carolina, USA, 1989.

A ‘Theory’ Mechanism for a Proof-Verifier Based on First-Order Set Theory

Eugenio G. Omodeo (1) and Jacob T. Schwartz (2)

(1) University of L’Aquila, Dipartimento di Informatica, [email protected]
(2) University of New York, Department of Computer Science, Courant Institute of Mathematical Sciences, [email protected]

We often need to associate some highly compound meaning with a symbol. Such a symbol serves us as a kind of container carrying this meaning, always with the understanding that it can be opened if we need its content. (Translated from [12, pp. 101–102])

Abstract. We propose classical set theory as the core of an automated proof-verifier and outline a version of it, designed to assist in proof development, which is indefinitely expansible with function symbols generated by Skolemization and embodies a modularization mechanism named ‘theory’. Through several examples, centered on the finite summation operation, we illustrate the potential utility in large-scale proof-development of the ‘theory’ mechanism: utility which stems in part from the power of the underlying set theory and in part from Skolemization.

Key words: Proof-verification technology, set theory, proof modularization.

1 Introduction

Set theory is highly versatile and possesses great expressive power. One can readily find terse set-theoretic equivalents of established mathematical notions and express theorems in purely set-theoretic terms. Checking any deep fact (say the Cauchy integral theorem) using a proof-verifier requires a large number of logical statements to be fed into the system. These must formalize a line of reasoning that leads from bare set rudiments to the specialized topic of interest (say, functional analysis) and then to a target theorem. Such an enterprise can only be managed effectively if suitable modularization constructs are available.

E.G. Omodeo enjoyed a Short-term mobility grant of the Italian National Research Council (CNR) enabling him to stay at the University of New York during the preparation of this work.

A.C. Kakas, F. Sadri (Eds.): Computat. Logic (Kowalski Festschrift), LNAI 2408, pp. 214–230, 2002. © Springer-Verlag Berlin Heidelberg 2002

This paper outlines a version of the Zermelo-Fraenkel theory designed to assist in automated proof-veriﬁcation of mathematical theorems. This system incorporates a technical notion of “theory” designed, for large-scale proof-development, to play a role similar to the notion of object class in large-scale programming. Such a mechanism can be very useful for “proof-engineering”. The theories we propose, like procedures in a programming language, have lists of formal parameters. Each “theory” requires its parameters to meet a set of assumptions. When “applied” to a list of actual parameters that have been shown to meet the assumptions, a theory will instantiate several additional “output” set, predicate, and function symbols, and then supply a list of theorems initially proved explicitly by the user inside the theory itself. These theorems will generally involve the new symbols. Such use of “theories” and their application adds a touch of second-order logic capability to the ﬁrst-order system which we describe. Since set theory has full multi-tier power, this should be all the second-order capability that is needed. We illustrate the usefulness of the proposed theory notion via examples ranging from mere “utilities” (e.g. the speciﬁcation of ordered pairs and associated projections, and the thinning of a binary predicate into a global single-valued map) to an example which characterizes a very ﬂexible recursive deﬁnition scheme. As an application of this latter scheme, we outline a proof that a ﬁnite summation operation which is insensitive to operand rearrangement and grouping can be associated with any commutative-associative operation. This is an intuitively obvious fact (seldom, if ever, proved explicitly in algebra texts), but nevertheless it must be veriﬁed in a fully formalized context. 
Even this task can become unnecessarily challenging without an appropriate set-theoretic support, or without the ability to indeﬁnitely extend the formal language with new Skolem symbols such as those resulting from “theory” invocations. Our provisional assessment of the number of “proofware” lines necessary to reach the Cauchy integral theorem in a system like the one which we outline is 20–30 thousand statements.

2 Set Theory as the Core of a Proof-Verifier

A fully satisfactory formal logical system should be able to digest ‘the whole of mathematics’, as this develops by progressive extension of mathematics-like reasoning to new domains of thought. To avoid continual reworking of foundations, one wants the formal system taken as basic to remain unchanged, or at any rate to change only by extension as such efforts progress. In any fundamentally new area work and language will initially be controlled more by guiding intuitions than by entirely precise formal rules, as when Euclid and his predecessors first realized that the intuitive properties of geometric figures in 2 and 3 dimensions, and also some familiar properties of whole numbers, could be covered by modes of reasoning more precise than those used in everyday life. But mathematical developments during the last two centuries have reduced the intuitive content of geometry, arithmetic, and calculus (‘analysis’) to set-theoretic terms. The geometric notion of ‘space’ maps into ‘set of all pairs (or triples) of real numbers’, allowing consideration of the ‘set of all n-tuples of real numbers’ as ‘n-dimensional space’, and of more general related constructs as ‘infinite dimensional’ and ‘functional’ spaces. The ‘figures’ originally studied in geometry map, via the ‘locus’ concept, into sets of such pairs, triples, etc. Dedekind reduced ‘real number x’ to ‘set x of rational numbers, bounded above, such that every rational not in x is larger than every rational in x’. To eliminate everything but set theory from the formal foundations of mathematics, it only remained (since ‘fractions’ can be seen as pairs of numbers) to reduce the notion of ‘integer’ to set-theoretic terms. This was done by Cantor and Frege: an integer is the class of all finite sets in 1-1 correspondence with any one such set. Subsequently Kolmogorov modeled ‘random’ variables as functions defined on an implicit set-theoretic measure space, and Laurent Schwartz interpreted the initially puzzling ‘delta functions’ in terms of a broader notion of generalized function systematically defined in set-theoretic terms. So all of these concepts can be digested without forcing any adjustment of the set-theoretic foundation constructed for arithmetic, analysis, and geometry. This foundation also supports all the more abstract mathematical constructions elaborated in such 20th century fields as topology, abstract algebra, and category theory. Indeed, these were expressed set-theoretically from their inception. So (if we ignore a few ongoing explorations whose significance remains to be determined) set theory currently stands as a comfortable and universal basis for the whole of mathematics (cf. [5]). It can even be said that set theory captures a set of reality-derived intuitions more fundamental than such basic mathematical ideas as that of number.
Arithmetic would be very diﬀerent if the real-world process of counting did not return the same result each time a set of objects was counted, or if a subset of a ﬁnite set S of objects proved to have a larger count than S. So, even though Peano showed how to characterize the integers and derive many of their properties using axioms free of any explicit set-theoretic content, his approach robs the integers of much of their intuitive signiﬁcance, since in his reduced context they cannot be used to count anything. For this and the other reasons listed above, we prefer to work with a thoroughly set-theoretic formalism, contrived to mimic the language and procedures of standard mathematics closely.

3 Set Theory in a Nutshell

Set theory is based on the handful of very powerful ideas summarized below. All notions and notation are more or less standard (cf. [16]).¹

– The dyadic Boolean operations ∩, \, ∪ are available, and there is a null set, ∅, devoid of elements. The membership relation ∈ is available, and set nesting is made possible via the singleton operation X ↦ {X}. Derived from this, we have single-element addition and removal, and useful increment/decrement operations:

    X with Y := X ∪ {Y} ,   X less Y := X \ {Y} ,   next(X) := X with X .

  Unordered lists {t₁, ..., tₙ} and ordered tuples [t₁, ..., tₙ] are definable too: in particular, {X₁, ..., Xₙ} := {X₁} ∪ ··· ∪ {Xₙ}.

– ‘Sets whose elements are the same are identical’: Following a step ℓ ≠ r in a proof, one can introduce a new constant b subject to the condition b ∈ ℓ ↔ b ∉ r; no subsequent conclusions where b does not appear will depend on this condition. Negated set inclusion ⊈ can be treated similarly, since X ⊆ Y := X \ Y = ∅.

– Global choice: We use an operation arb which, from any non-null set X, deterministically extracts an element which does not intersect X. Assuming arb ∅ = ∅ for definiteness, this means that

    arb X ∈ next(X) & X ∩ arb X = ∅

  for all X.

– Set-formation: By (possibly transfinite) element- or subset-iteration over the sets represented by the terms t₀, t₁ ≡ t₁(x₀), ..., tₙ ≡ tₙ(x₀, ..., xₙ₋₁), we can form the set

    { e : x₀ C₀ t₀, x₁ C₁ t₁, ..., xₙ Cₙ tₙ | ϕ } ,

  where each Cᵢ is either ∈ or ⊆, and where e ≡ e(x₀, ..., xₙ) and ϕ ≡ ϕ(x₀, ..., xₙ) are a set-term and a condition in which the pairwise distinct variables xᵢ can occur free (similarly, each tⱼ₊₁ may involve x₀, ..., xⱼ). Many operations are readily definable using setformers, e.g.

    ⋃Y := { x₂ : x₁ ∈ Y, x₂ ∈ x₁ } ,
    Y × Z := { [x₁, x₂] : x₁ ∈ Y, x₂ ∈ Z } ,
    𝒫(Y) := { x : x ⊆ Y } ,
    pred(X) := arb { y ∈ X | next(y) = X } ,

  where if the condition ϕ is omitted it is understood to be true, and if the term e is omitted it is understood to be the same as the first variable inside the braces.

– ∈-recursion: (“Transfinite”) recursion over the elements of any set allows one to introduce global set operations; e.g.,

    Ult_membs(S) := S ∪ ⋃{ Ult_membs(x) : x ∈ S }   and   rank(S) := ⋃{ next( rank(x) ) : x ∈ S } ,

  which respectively give the set of all “ultimate members” (i.e. elements, elements of elements, etc.) of S and the maximum “depth of nesting” of sets inside S.

– ‘Infinite sets exist’: There is at least one s_inf satisfying

    s_inf ≠ ∅ & (∀ x ∈ s_inf)( {x} ∈ s_inf ) ,

  so that the pairwise distinct elements b, {b}, {{b}}, {{{b}}}, ... belong to s_inf for each b in s_inf.

¹ As a notational convenience, we usually omit writing universal quantifiers at the beginning of a sentence, denoting the variables which are ruled by these understood quantifiers by single uppercase Italic letters.
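The operations listed above can be tried out on hereditarily finite sets. The following is only a minimal sketch, assuming that sets are modelled as nested Python frozensets and that arb picks the member with the smallest canonical key; the names with_, next_, key and EMPTY are illustrative and do not come from the paper, and the paper's arb is not tied to any such ordering.

```python
# Hereditarily finite sets modelled as nested frozensets (an assumption made
# for illustration; the set theory of the paper also covers infinite sets).
EMPTY = frozenset()

def with_(x, y):
    """X with Y := X ∪ {Y}."""
    return x | {y}

def less(x, y):
    """X less Y := X \\ {Y}."""
    return x - {y}

def next_(x):
    """next(X) := X with X."""
    return with_(x, x)

def rank(s):
    """rank(S) := ⋃{ next(rank(x)) : x ∈ S }, computed by ∈-recursion."""
    out = EMPTY
    for x in s:
        out |= next_(rank(x))
    return out

def ult_membs(s):
    """Ult_membs(S) := S ∪ ⋃{ Ult_membs(x) : x ∈ S }."""
    out = frozenset(s)
    for x in s:
        out |= ult_membs(x)
    return out

def key(s):
    """Canonical sort key: nesting depth first, then the keys of the members."""
    return (len(rank(s)), tuple(sorted(key(x) for x in s)))

def arb(s):
    """Hypothetical choice operation: arb(∅) = ∅, otherwise a member of
    minimal rank, which therefore has no element in common with s."""
    return min(s, key=key) if s else EMPTY

def pred(x):
    """pred(X) := arb { y ∈ X | next(y) = X }."""
    return arb(frozenset(y for y in x if next_(y) == x))
```

Choosing a member of minimal rank is one simple way to obtain X ∩ arb X = ∅: any element of the chosen member has strictly smaller rank and so cannot belong to X.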

The historical controversies concerning the choice and replacement axioms of set theory are all hidden in our use of setformers and in our ability, after a statement of the form ∃ y ψ(X₁, ..., Xₙ, y) has been proved, to introduce a Skolem function f(X₁, ..., Xₙ) satisfying the condition ψ( X₁, ..., Xₙ, f(X₁, ..., Xₙ) ). In particular, combined use of arb and of the setformer construct lets us write the choice set of any set X of non-null pairwise disjoint sets simply as { arb y : y ∈ X }.²

To appreciate the power of the above formal language, consider von Neumann’s elegant definition of the predicate ‘X is a (possibly transfinite) ordinal’, and the characterization of ℝ, the set of real numbers, as the set of Dedekind cuts (cf. [17]):

    Ord(X) := X ⊆ 𝒫(X) & (∀ y, z ∈ X)(y ∈ z ∨ y = z ∨ z ∈ y) ,
    ℝ := { c ⊆ ℚ | (∀ y ∈ c)(∃ z ∈ c)(y < z) & (∀ y ∈ c)(∀ z ∈ ℚ)(z < y → z ∈ c) } \ {∅, ℚ} ;

here the ordered field ⟨ℚ, <⟩ of rational numbers is assumed to have been defined before ℝ.³

² Cf. [18, p. 177]. Even in the more basic framework of first-order predicate calculus, the availability of choice constructs can be highly desirable, cf. [1].
³ For an alternative definition of real numbers which works very well too, see E.A. Bishop’s adaptation of Cauchy’s construction of ℝ in [2, pp. 291–297].

4 Theories in Action: First Examples

Here is one of the most obvious theories one can think of:

    THEORY ordered_pair()
    ==>(opair, car, cdr)
        car( opair(X, Y) ) = X
        cdr( opair(X, Y) ) = Y
        opair(X, Y) = opair(U, V) → X = U & Y = V
    END ordered_pair.

This THEORY has no input parameters and no assumptions, and returns three global functions: a pairing function and its projections. To start its construction, the user simply has to

    SUPPOSE THEORY ordered_pair() ==> END ordered_pair,

then to ENTER THEORY ordered_pair, and next to define e.g.

    opair(X, Y) := { {X}, { {X}, {Y, {Y}} } } ,
    car(P) := arb arb P ,
    cdr(P) := car( arb (P \ {arb P}) \ {arb P} ) .
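The three definitions just given can be exercised on small hereditarily finite sets. The sketch below is illustrative only: it assumes sets are nested frozensets and that arb returns the member with the smallest canonical key (a member of minimal nesting depth), which is one admissible choice but not necessarily the one a real verifier would use. The helpers fs and key are hypothetical names.

```python
EMPTY = frozenset()

def fs(*xs):
    """Build the set {xs...} as a frozenset."""
    return frozenset(xs)

def key(s):
    """Canonical key: nesting depth, then the sorted keys of the members."""
    members = tuple(sorted(key(x) for x in s))
    depth = 1 + max((k[0] for k in members), default=-1)
    return (depth, members)

def arb(s):
    """Hypothetical arb: a member of minimal depth, so that s ∩ arb(s) = ∅."""
    return min(s, key=key) if s else EMPTY

def opair(x, y):
    """opair(X, Y) := { {X}, { {X}, {Y, {Y}} } }."""
    return fs(fs(x), fs(fs(x), fs(y, fs(y))))

def car(p):
    """car(P) := arb arb P."""
    return arb(arb(p))

def cdr(p):
    """cdr(P) := car( arb (P \\ {arb P}) \\ {arb P} )."""
    a = arb(p)
    return car(arb(p - {a}) - {a})
```

For instance, with x = fs() and y = fs(fs()), car(opair(x, y)) gives back x and cdr(opair(x, y)) gives back y, in line with the first two theorems of ordered_pair.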

These definitions make it possible to prove such intermediate lemmas as

    arb {U} = U ,
    V ∈ Z → arb {V, Z} = V ,
    car( { {X}, { {X}, W } } ) = X ,
    arb opair(X, Y) = {X} ,
    cdr( opair(X, Y) ) = car( { { Y, {Y} } } ) = Y .

Once these intermediate results have been used to prove the three theorems listed earlier, the user can indicate that they are the ones he wants to be externally visible, and that the return-parameter list consists of opair, car, cdr (the detailed definitions of these symbols, as well as the intermediate lemmas, have hardly any significance outside the THEORY itself⁴). Then, after re-entering the main THEORY, which is set theory, the user can

    APPLY(opair, head, tail) ordered_pair() ==>
        head( opair(X, Y) ) = X
        tail( opair(X, Y) ) = Y
        opair(X, Y) = opair(U, V) → X = U & Y = V ,

thus importing the three theorems into the main proof level. As written, this application also changes the designations ‘car’ and ‘cdr’ into ‘head’ and ‘tail’. Fig. 1 shows how to take advantage of the functions just introduced to define notions related to maps that will be needed later on.⁵

    is_map(F) := F = { [head(x), tail(x)] : x ∈ F }
    Svm(F) := is_map(F) & (∀ x, y ∈ F)( head(x) = head(y) → x = y )
    1_1_map(F) := Svm(F) & (∀ x, y ∈ F)( tail(x) = tail(y) → x = y )
    F⁻¹ := { [tail(x), head(x)] : x ∈ F }
    domain(F) := { head(x) : x ∈ F }
    range(F) := { tail(x) : x ∈ F }
    F{X} := { y ∈ range(F) | [X, y] ∈ F }
    F|S := F ∩ ( S × range(F) )
    Finite(S) := ¬ ∃ f ( 1_1_map(f) & S = domain(f) ≠ range(f) ⊆ S )

Fig. 1. Notions related to maps, single-valued maps, and 1-1 maps

⁴ A similar remark on Kuratowski’s encoding of an ordered pair as a set of the form {{x, y}, {x}} is made in [14, pp. 50–51].
⁵ We subsequently return to the notation [X, Y] for opair(X, Y).

For another simple example, suppose that the theory

    THEORY setformer0(e, s, p)
    ==>
        s = ∅ → { e(x) : x ∈ s } = ∅
        { x ∈ s | p(x) } = ∅ → { e(x) : x ∈ s | p(x) } = ∅
    END setformer0

has been proved, but that its user subsequently realizes that the reverse implications could be helpful too; and that the formulae

    s ⊆ T → { e(x) : x ∈ s | p(x) } ⊆ { e(x) : x ∈ T | p(x) } ,
    s ⊆ T & (∀ x ∈ T \ s)¬ p(x) → { e(x) : x ∈ s | p(x) } = { e(x) : x ∈ T | p(x) }

are also needed. He can then re-enter the THEORY setformer0, strengthen the implications already proved into bi-implications, and add the new results: of course he must then supply proofs of the new facts.

Our next sample THEORY receives as input a predicate P ≡ P(X, V) and an “exception” function xcp ≡ xcp(X); it returns a global function img ≡ img(X) which, when possible, associates with its argument X some Y such that P(X, Y) holds, and otherwise associates with X the “fictitious” image xcp(X). The THEORY has an assumption, intended to guarantee non-ambiguity of the fictitious value:

    THEORY fcn_from_pred(P, xcp)
        ¬ P( X, xcp(X) )    -- convenient “guard”
    ==>(img)
        img(X) ≠ xcp(X) ↔ ∃ v P(X, v)
        P(X, V) → P( X, img(X) )
    END fcn_from_pred.

To construct this THEORY from its assumption, the user can simply define

    img(X) := if P( X, try(X) ) then try(X) else xcp(X) end if ,

where try results from Skolemization of the valid first-order formula

    ∃ y ∀ v ( P(X, v) → P(X, y) ) ,

after which the proofs of the theorems of fcn_from_pred pose no problems. As an easy example of the use of this THEORY, note that it can be invoked in the special form

    APPLY(img) fcn_from_pred( P(X, Y) ↦ Y ∈ X & Q(Y), xcp(X) ↦ X ) ==> ···

for any monadic predicate Q (because ∈ is acyclic); without the condition Y ∈ X such an invocation would instead result in an error indication, except in the uninteresting case in which one has proved that ∀ x ¬ Q(x).

Here is a slightly more elaborate example of a familiar THEORY:

    THEORY equivalence_classes(s, Eq)
        (∀ x ∈ s)( Eq(x, x) )
        (∀ x, y, z ∈ s)( Eq(x, y) → ( Eq(y, z) ↔ Eq(x, z) ) )
    ==>(quot, cl_of)    -- “quotient”-set and globalized “canonical embedding”
        (∀ x, y ∈ s)( Eq(x, y) ↔ Eq(y, x) )
        (∀ x ∈ s)( cl_of(x) ∈ quot )
        (∀ b ∈ quot)( arb b ∈ s & cl_of(arb b) = b )
        (∀ x, y ∈ s)( Eq(x, y) ↔ cl_of(x) = cl_of(y) )
    END equivalence_classes.

Suppose that this THEORY has been established, and that ℕ, ℤ, and the multiplication operation ∗ have been defined already, where ℕ is the set of natural numbers, and ℤ, intended to be the set of signed integers, is defined (somewhat arbitrarily) as

    ℤ := { [n, m] : n, m ∈ ℕ | n = 0 ∨ m = 0 } .

Here the position of 0 in a pair serves as a sign indication, and the restriction of ∗ to ℤ × ℤ is integer multiplication (but actually, x ∗ y is always defined, whether or not x, y ∈ ℤ). Then the set Fr of fractions and the set ℚ of rational numbers can be defined as follows:

    Fr := { [x, y] : x, y ∈ ℤ | y ≠ [0, 0] } ,
    Same_frac(F, G) := ( head(F) ∗ tail(G) = tail(F) ∗ head(G) ) ,
    APPLY(ℚ, Fr_to_ℚ) equivalence_classes( s ↦ Fr, Eq(F, G) ↦ Same_frac(F, G) ) ==> ···

Before APPLY can be invoked, one must prove that the restriction of Same_frac to Fr meets the THEORY assumptions, i.e. it is an equivalence relation. Then the system will not simply return the two new symbols ℚ and Fr_to_ℚ, but will provide theorems ensuring that these represent the standard equivalence-class reduction Fr/Same_frac and the canonical embedding of Fr into this quotient. Note as a curiosity (which however hints at the type of hiding implicit in the THEORY mechanism) that a ℚ satisfying the conclusions of the THEORY is not actually forced to be the standard partition of Fr but can consist of singletons or even of supersets of the equivalence classes (which is harmless).
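The conclusions of equivalence_classes can be checked on a small finite instance. The sketch below makes several simplifying assumptions that are not in the paper: signed integers are plain Python ints rather than pairs of naturals, only a finite sample of fractions is considered, and arb is replaced by min on tuples. It builds the standard partition, which is just one of the quotients allowed by the THEORY.

```python
from itertools import product

def equivalence_classes(s, eq):
    """Finite sketch of THEORY equivalence_classes: returns (quot, cl_of)."""
    def cl_of(x):
        # class of x: all members of s related to x
        return frozenset(y for y in s if eq(x, y))
    quot = frozenset(cl_of(x) for x in s)
    return quot, cl_of

# Assumption: a finite sample of integers stands in for the set Z.
Z_SAMPLE = range(-3, 4)
FR = frozenset((x, y) for x, y in product(Z_SAMPLE, repeat=2) if y != 0)

def same_frac(f, g):
    """Same_frac(F, G): head(F) * tail(G) = tail(F) * head(G)."""
    return f[0] * g[1] == f[1] * g[0]

# Counterpart of APPLY(Q, Fr_to_Q) equivalence_classes(s -> Fr, Eq -> Same_frac)
Q_SAMPLE, fr_to_q = equivalence_classes(FR, same_frac)
```

Here fr_to_q((1, 2)) and fr_to_q((-1, -2)) denote the same class, while fr_to_q((1, 2)) and fr_to_q((1, 3)) differ.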

5 A Final Case Study: Finite Summation

Consider the operation Σ(F) or, more explicitly,

    Σ_{x ∈ domain(F), [x,y] ∈ F} y ,

available for any finite map F (and in particular when domain(F) = d ∈ ℕ, so that x ∈ d amounts to saying that x = 0, 1, ..., d − 1) such that range(F) ⊆ abel, where abel is a set on which a given operation + is associative and commutative and has a unit element u. Most of this is captured formally by the following THEORY:

    THEORY sigma_add(abel, +, u)
        (∀ x, y ∈ abel)( x+y ∈ abel &              -- closure w.r.t. ...
            x+y = y+x )                              -- ... commutative operation
        u ∈ abel & (∀ x ∈ abel)( x+u = x )          -- designated unit element
        (∀ x, y, z ∈ abel)( (x+y)+z = x+(y+z) )     -- associativity
    ==>(Σ)                                           -- summation operation
        Σ(∅) = u & (∀ x ∈ ℕ)(∀ y ∈ abel)( Σ({[x, y]}) = y )
        is_map(F) & Finite(F) & range(F) ⊆ abel & domain(F) ⊆ ℕ →
            Σ(F) = Σ(F ∩ G) + Σ(F \ G)               -- additivity
    END sigma_add.

We show below how to construct this THEORY from its assumptions, and how to generalize it into a THEORY gen_sigma_add in which the condition domain(F) ⊆ ℕ is dropped, allowing the condition (∀ x ∈ ℕ)(∀ y ∈ abel)( Σ({[x, y]}) = y ) to be simplified into (∀ y ∈ abel)( Σ({[X, y]}) = y ). After this, we will sketch the proof of a basic property (‘rearrangement of terms’) of this generalized summation operation.

5.1 Existence of a Finite Summation Operation

In order to tackle even the simple sigma_add, it is convenient to make use of recursions somewhat different from (and actually simpler than) the fully general transfinite ∈-recursion axiomatically available in our version of set theory. Specifically, we can write

    Σ(F) := if F = ∅ then u else tail(arb F) + Σ(F less arb F) end if ,

which is a sort of “tail recursion” based on set inclusion. To see why such constructions are allowed we can use the fact that strict inclusion is a well-founded relation between finite sets, and in particular that it is well-founded over { f ⊆ ℕ × abel | Finite(f) }: this makes the above form of recursive definition acceptable. In preparing to feed this definition, or something closely equivalent to it, into our proof-verifier, we can conveniently make a detour through the following THEORY (note that in the following formulae Ord(X) designates the predicate ‘X is an ordinal’; see the end of Sec. 3):

    THEORY well_founded_set(s, Lt)
        (∀ t ⊆ s)( t ≠ ∅ → (∃ m ∈ t)(∀ u ∈ t)¬ Lt(u, m) )
            -- Lt is thereby assumed to be irreflexive and well-founded on s
    ==>(orden)
        (∀ x, y ∈ s)( ( Lt(x, y) → ¬ Lt(y, x) ) & ¬ Lt(x, x) )
        s ⊆ { orden(y) : y ∈ X } ↔ orden(X) = s
        orden(X) ≠ s ↔ orden(X) ∈ s
        Ord(U) & Ord(V) & orden(U) ≠ s ≠ orden(V) →
            ( Lt( orden(U), orden(V) ) → U ∈ V )
        { u ∈ s | Lt( u, orden(V) ) } ⊆ { orden(x) : x ∈ V }
        Ord(U) & Ord(V) & orden(U) ≠ s ≠ orden(V) & U ≠ V → orden(U) ≠ orden(V)
        ∃ o( Ord(o) & s = { orden(x) : x ∈ o } & 1_1_map( {[x, orden(x)] : x ∈ o} ) )
    END well_founded_set.

Within this THEORY and in justification of it, orden can be defined in two steps:

    Minrel(T) := if ∅ ≠ T ⊆ s then arb { m ∈ T | (∀ x ∈ T)¬ Lt(x, m) } else s end if ,
    orden(X) := Minrel( s \ { orden(y) : y ∈ X } ) ,

after which the proof of the output theorems of the THEORY just described will take approximately one hundred lines.

Next we introduce a THEORY of recursion on well-founded sets. Even though the definition of Σ requires much less, other kinds of recursive definition benefit if we provide a generous scheme like the following:

    THEORY recursive_fcn(dom, Lt, a, b, P)
        (∀ t ⊆ dom)( t ≠ ∅ → (∃ m ∈ t)(∀ u ∈ t)¬ Lt(u, m) )
            -- Lt is thereby assumed to be irreflexive and well-founded on dom
    ==>(rec)
        (∀ v ∈ dom)( rec(v) = a( v, { b( v, w, rec(w) ) : w ∈ dom | Lt(w, v) & P( v, w, rec(w) ) } ) )
    END recursive_fcn.

The output symbol rec of this THEORY is easily definable as follows:

    G(X) := a( orden(X), { b( orden(X), orden(y), G(y) ) : y ∈ X |
                           Lt( orden(y), orden(X) ) & P( orden(X), orden(y), G(y) ) } ) ,
    rec(V) := G( index_of(V) ) ;

here orden results from an invocation of our previous THEORY well_founded_set, namely

    APPLY(orden) well_founded_set( s ↦ dom, Lt(X, Y) ↦ Lt(X, Y) ) ==> ··· ;

also, the restriction of index_of to dom is assumed to be the local inverse of the function orden. Note that the recursive characterization of rec in the theorem of recursive_fcn is thus ultimately justified in terms of the very general form of ∈-recursion built into our system, as appears from the definition of G. Since we cannot take it for granted that we have an inverse of orden, a second auxiliary THEORY, invokable as

    APPLY(index_of) bijection( f(X) ↦ orden(X), d ↦ o1, r ↦ dom ) ==> ··· ,

is useful. Here o1 results from Skolemization of the last theorem in well_founded_set. The new THEORY used here can be specified as follows:

    THEORY bijection(f, d, r)
        1_1_map( {[x, f(x)] : x ∈ d} ) & r = { f(x) : x ∈ d }
        f(X) ∈ r → X ∈ d    -- convenient “guard”
    ==>(finv)
        Y ∈ r → f( finv(Y) ) = Y
        Y ∈ r → finv(Y) ∈ d
        X ∈ d ↔ f(X) ∈ r
        X ∈ d → finv( f(X) ) = X
        ( finv(Y) ∈ d & ∃ x( f(x) = Y ) ) ↔ Y ∈ r
        d = { finv(y) : y ∈ r } & 1_1_map( {[y, finv(y)] : y ∈ r} )
    END bijection.

This little digression gives us one more opportunity to show the interplay between theories, because one way of defining finv inside bijection would be as follows:

    APPLY(finv) fcn_from_pred(
        P(Y, X) ↦ f(X) = Y & d ≠ ∅ ,
        xcp(Y) ↦ if Y ∈ r then d else arb d end if ) ==> ··· ,

where fcn_from_pred is as shown in Sec. 4. We can now recast our first-attempt definition of Σ as

    APPLY(Σ) recursive_fcn(
        dom ↦ { f ⊆ ℕ × abel | is_map(f) & Finite(f) } ,
        Lt(W, V) ↦ W ⊆ V & W ≠ V ,
        a(V, Z) ↦ if V = ∅ then u else tail(arb V) + arb Z end if ,
        b(V, W, Z) ↦ Z ,
        P(V, W, Z) ↦ W = V less arb V ) ==> ··· ,

whose slight intricacy is the price paid for our earlier decision to keep the recursive definition scheme very general.

We skip the proofs that Σ(∅) = u and (∀ x ∈ ℕ)(∀ y ∈ abel)( Σ({[x, y]}) = y ), which are straightforward. Concerning additivity, assume, for the sake of contradiction, that f is a finite map with domain(f) ⊆ ℕ and range(f) ⊆ abel such that Σ(f) ≠ Σ(f ∩ g) + Σ(f \ g) holds for some g, and then use the following tiny but extremely useful THEORY (of induction over the subsets of any finite set)

    THEORY finite_induction(n, P)
        Finite(n) & P(n)
    ==>(m)
        m ⊆ n & P(m) & (∀ k ⊆ m)( k ≠ m → ¬ P(k) )
    END finite_induction,

to get an inclusion-minimal such map, f0, by performing an

    APPLY(f0) finite_induction( n ↦ f, P(F) ↦ ∃ g ( Σ(F) ≠ Σ(F ∩ g) + Σ(F \ g) ) ) ==> ··· .

Reaching a contradiction from this is very easy.
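The way Σ is obtained from the general recursion scheme can be imitated on a small finite domain. The following is a sketch under explicit assumptions, not the verifier's mechanism: dom is the finite set of all sub-maps of one sample map, abel is the integers under ordinary addition with u = 0, arb is replaced by min, and recursion is done directly by memoized Python calls instead of going through orden and index_of. The names BASE, DOM and U are hypothetical.

```python
from itertools import combinations

def recursive_fcn(dom, lt, a, b, p):
    """Return rec with
    rec(v) = a(v, { b(v, w, rec(w)) : w in dom | lt(w, v) and p(v, w, rec(w)) }),
    assuming lt is well-founded on the finite collection dom."""
    cache = {}
    def rec(v):
        if v not in cache:
            values = set()
            for w in dom:
                if lt(w, v):
                    r = rec(w)
                    if p(v, w, r):
                        values.add(b(v, w, r))
            cache[v] = a(v, frozenset(values))
        return cache[v]
    return rec

# Assumptions for the instance: integers under +, unit 0, arb := min.
U = 0
arb = min
BASE = [(0, 5), (1, 7), (2, 5)]          # a sample finite map as a list of pairs
DOM = [frozenset(c) for r in range(len(BASE) + 1) for c in combinations(BASE, r)]

sigma = recursive_fcn(
    DOM,
    lt=lambda w, v: w < v,                                   # strict inclusion
    a=lambda v, z: U if not v else arb(v)[1] + arb(z),       # tail(arb V) + arb Z
    b=lambda v, w, z: z,
    p=lambda v, w, z: w == v - {arb(v)},                     # W = V less arb V
)
```

With these choices sigma(frozenset(BASE)) evaluates the sample map's sum by peeling off one pair at a time, exactly as in the first-attempt definition of Σ.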

5.2 Generalized Notion of Finite Summation

Our next goal is to generalize the finite summation operation Σ(F) to any finite map F with range(F) ⊆ abel. To do this we can use a few basic theorems on ordinals, which can be summarized as follows. Define

    min_el(T, S) := if S ⊆ T then S else arb (S \ T) end if ,
    enum(X, S) := min_el( { enum(y, S) : y ∈ X }, S ) ,

for all sets S, T (a use of ∈-recursion quite similar to the construction used inside the THEORY well_founded_set!⁶). Then the following enumeration theorem holds:

    ∃ o ( Ord(o) & S = { enum(x, S) : x ∈ o } & (∀ x, y ∈ o)( x ≠ y → enum(x, S) ≠ enum(y, S) ) ) .

From this one gets the function ordin by Skolemization. Using the predicate Finite of Fig. 1, and exploiting the infinite set s_inf axiomatically available in our version of set theory, we can give the following definition of natural numbers:

    ℕ := arb { x ∈ next( ordin(s_inf) ) | ¬ Finite(x) } .

These characterizations of Finite and ℕ yield

    X ∈ ℕ ↔ ordin(X) = X & Finite(X) ,
    Finite(X) ↔ ordin(X) ∈ ℕ ,
    Finite(F) → Finite( domain(F) ) & Finite( range(F) ) .

Using these results and working inside the THEORY gen_sigma_add, we can obtain the generalized operation Σ by first invoking

    APPLY(σ) sigma_add( abel ↦ abel, + ↦ +, u ↦ u ) ==> ···

and then defining:

    Σ(F) := σ( { [x, y] : x ∈ ordin( domain(F) ), y ∈ range(F) | [ enum( x, domain(F) ), y ] ∈ F } ) .

We omit the proofs that Σ(∅) = u, (∀ y ∈ abel)( Σ({[X, y]}) = y ), and Σ(F) = Σ(F ∩ G) + Σ(F \ G), which are straightforward.

⁶ This is more than just an analogy: we could exploit the well-foundedness of ∈ to hide the details of the construction of enum into an invocation of the THEORY well_founded_set.
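The re-indexing step behind the generalized Σ can be illustrated on finite maps whose domain elements are not numbers. This is only a sketch under stated assumptions: finite ordinals are represented by Python ints (so ordin of a finite set is its length), arb is replaced by min on mutually comparable elements, and the names sigma_nat and gen_sigma are hypothetical stand-ins for σ and the generalized Σ.

```python
def arb(s):
    """Stand-in for arb (assumption: the elements are mutually comparable)."""
    return min(s)

def enum(k, s):
    """enum(k, S): arb of what is left of S after the earlier indices;
    returns S itself once S is exhausted, mirroring min_el."""
    rest = set(s) - {enum(j, s) for j in range(k)}
    return arb(rest) if rest else s

def ordin(s):
    """For a finite set, the enumerating ordinal is just its size (as an int)."""
    return len(s)

def sigma_nat(f, plus, u):
    """σ on maps with natural-number domain: tail(arb F) + σ(F less arb F)."""
    if not f:
        return u
    e = arb(f)
    return plus(e[1], sigma_nat(f - {e}, plus, u))

def gen_sigma(f, plus=lambda x, y: x + y, u=0):
    """Generalized Σ: re-index domain(F) by 0, 1, ..., then apply σ."""
    dom = frozenset(x for x, _ in f)
    rng = frozenset(y for _, y in f)
    reindexed = frozenset(
        (k, y) for k in range(ordin(dom)) for y in rng if (enum(k, dom), y) in f
    )
    return sigma_nat(reindexed, plus, u)
```

For example, gen_sigma(frozenset({("a", 3), ("b", 4), ("b", 10)})) sums a multi-valued map indexed by strings. The naive enum recomputes earlier values and is exponential in k, which is acceptable only for very small sets.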

5.3 Rearrangement of Terms in Finite Summations

To be most useful, the THEORY of Σ needs to encompass various strong statements of the additivity property. Writing

    Φ(F) ≡ is_map(F) & Finite( domain(F) ) & range(F) ⊆ abel ,
    Ψ(P, X) ≡ X = ⋃P & (∀ b, v ∈ P)( b ≠ v → b ∩ v = ∅ )

for brevity, much of what is wanted can be specified e.g. as follows:

    THEORY gen_sigma_add(abel, +, u)
        (∀ x, y ∈ abel)( x+y ∈ abel &              -- closure w.r.t. ...
            x+y = y+x )                              -- ... commutative operation
        u ∈ abel & (∀ x ∈ abel)( x+u = x )          -- designated unit element
        (∀ x, y, z ∈ abel)( (x+y)+z = x+(y+z) )     -- associativity
    ==>(Σ)                                           -- summation operation
        Σ(∅) = u & (∀ y ∈ abel)( Σ({[X, y]}) = y )
        Φ(F) → Σ(F) ∈ abel
        Φ(F) → Σ(F) = Σ(F ∩ G) + Σ(F \ G)            -- additivity
        Φ(F) & Ψ( P, F ) → Σ(F) = Σ( { [g, Σ(g)] : g ∈ P } )
        Φ(F) & Ψ( P, domain(F) ) → Σ(F) = Σ( { [b, Σ( F|b )] : b ∈ P } )
        Φ(F) & Svm(G) & domain(F) = domain(G) →
            Σ(F) = Σ( { [x, Σ( F|G⁻¹{x} )] : x ∈ range(G) } )
    END gen_sigma_add.

A proof of the last of these theorems, which states that Σ is insensitive to operand rearrangement and grouping, is sketched below. Generalized additivity is proved first: starting with the absurd hypothesis that specific f, p exist for which

    Φ(f) & Ψ( p, f ) & Σ(f) ≠ Σ( { [g, Σ(g)] : g ∈ p } )

holds, one can choose an inclusion-minimal such p referring to the same f and included in the p chosen at first, by an invocation

    APPLY(p0) finite_induction( n ↦ p, P(Q) ↦ Ψ( Q, f ) & Σ(f) ≠ Σ( { [g, Σ(g)] : g ∈ Q } ) ) ==> ··· .

From this, a contradiction is easily reached. The next theorem, namely

    Φ(F) & Ψ( P, domain(F) ) → Σ(F) = Σ( { [b, Σ( F|b )] : b ∈ P } ) ,

follows since Ψ( P, domain(F) ) implies Ψ( { F|b : b ∈ P }, F ). Proof of the summand rearrangement theorem seen above is now easy, because

    Svm(G) & D = domain(G) → Ψ( { G⁻¹{x} : x ∈ range(G) }, D )

holds for any D and hence in particular for D = domain(F).
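The grouping and rearrangement statements can be sanity-checked numerically on a small map. The sketch below is not a proof and rests on assumptions of its own: abel is the integers under ordinary addition with u = 0, arb is replaced by min, blocks of a partition are written as tuples so that pairs stay comparable, and G is given as a Python dict. The function names are hypothetical.

```python
from operator import add

def sigma(f, plus=add, u=0):
    """Σ(F) := if F = ∅ then u else tail(arb F) + Σ(F less arb F)."""
    if not f:
        return u
    e = min(f)                      # stand-in for arb F
    return plus(e[1], sigma(f - {e}, plus, u))

def restrict(f, b):
    """F|b: the pairs of F whose first component lies in b."""
    return frozenset(pair for pair in f if pair[0] in b)

def grouped_sum(f, partition):
    """Σ({ [b, Σ(F|b)] : b ∈ P }) for a partition P of domain(F)."""
    return sigma(frozenset((b, sigma(restrict(f, b))) for b in partition))

def regrouped_sum(f, g):
    """Σ({ [x, Σ(F|G⁻¹{x})] : x ∈ range(G) }) for a single-valued G (a dict)."""
    labels = set(g.values())
    return sigma(frozenset(
        (x, sigma(restrict(f, {d for d in g if g[d] == x}))) for x in labels
    ))
```

Taking F = {[0, 5], [1, 7], [2, 5], [3, −2], [4, 1]}, the partition {{0, 1}, {2}, {3, 4}} of its domain, and a G that labels each index as 'even' or 'odd', all three of sigma, grouped_sum and regrouped_sum return the same total.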

The above line of proof suggests that a useful preamble is to construct the following theory of Ψ:

    THEORY is_partition(p, s)
    ==>(flag)    -- this indicates whether or not s is partitioned by p
        flag ↔ s = ⋃p & (∀ b, v ∈ p)( b ≠ v → b ∩ v = ∅ )
        flag & Finite(s) → Finite(p)
        flag & s = domain(F) & Q = { F|b : b ∈ p } → F = ⋃Q & (∀ f, g ∈ Q)( f ≠ g → f ∩ g = ∅ )
        Svm(G) & s = domain(G) & p = { G⁻¹{y} : y ∈ range(G) } → flag
    END is_partition.

6 Related Work

To support software design and speciﬁcation, rapid prototyping, theorem proving, user interface design, and hardware veriﬁcation, various authors have proposed systems embodying constructs for modularization which are, under one respect or another, akin to our THEORY construct. Among such proposals lies the OBJ family of languages [15], which integrates speciﬁcation, prototyping, and veriﬁcation into a system with a single underlying equational logic. In the implementation OBJ3 of OBJ, a module can either be an object or a theory: in either case it will have a set of equations as its body, but an object is executable and has a ﬁxed standard model whereas a theory describes nonexecutable properties and has loose semantics, namely a variety of admissible models. As early as in 1985, OBJ2 [13] was endowed with a generic module mechanism inspired by the mechanism for parameterized speciﬁcations of the Clear speciﬁcation language [3]; the interface declarations of OBJ2 generics were not purely syntactic but contained semantic requirements that actual modules had to satisfy before they could be meaningfully substituted. The use of OBJ for theorem-proving is aimed at providing mechanical assistance for proofs that are needed in the development of software and hardware, more than at mechanizing mathematical proofs in the broad sense. This partly explains the big emphasis which the design of OBJ places on equational reasoning and the privileged role assigned to universal algebra: equational logic is in fact suﬃciently powerful to describe any standard model within which one may want to carry out computations. We observe that an equational formulation of set theory can be designed [11], and may even oﬀer advantages w.r.t. 
a more traditional formulation of Zermelo-Fraenkel in limited applications where it is reasonable to expect that proofs can be found in fully automatic mode; nevertheless, overly insisting on equational reasoning in the realm of set theory would be preposterous in light of the highly interactive proof-verification environment which we envision. We would also like to mention another ambitious project, closer in spirit to this paper although based on a sophisticated variant of Church’s typed lambda-calculus [6]: the Interactive Mathematical Proof System (IMPS) described in [10]. This

system manages a database of mathematics, represented as a collection of interconnected axiomatic “little theories” which span graduate-level parts of analysis (about 25 theories: real numbers, partial orders, metric spaces, normed spaces, etc.), some algebra (monoids, groups, and ﬁelds), and also some theories more directly relevant to computer science (concerning state machines, domains for denotational semantics, and free recursive datatypes). The initial library caters for some fragments of set theory too: in particular, it contains theorems about cardinalities. Mathematical analysis is regarded as a signiﬁcant arena for testing the adequacy of formalizations of mathematics, because analysis requires great expressive power for constructing proofs. The authors of [10] claim that IMPS supports a view of the axiomatic method based on “little theories” tailored to the diverse ﬁelds of mathematics as well as the “big theory” view in which all reasoning is performed within a single powerful and highly expressive set theory. Greater emphasis is placed on the former approach, anyhow: with this approach, links —“conduits”, so to speak, to pass results from one theory to another— play a crucial role. To realize such links, a syntactic device named “theory interpretation” is used in a variety of ways to translate the language of a source theory to the language of a target theory so that the image of a theorem is always a theorem: this method enables reuse of mathematical results “transported” from relatively abstract theories to more specialized ones. One main diﬀerence of our approach w.r.t. that of IMPS is that we are willing to invest more on the “big theory” approach and, accordingly, do not feel urged to rely on a higher-order logic where functions are organized according to a type hierarchy. 
It may be contended that the typing discipline complies with everyday mathematical practice, and perhaps gives helpful clues to the automated reasoning mechanisms so as to ensure better performance; nevertheless, a well-thought-out type-free environment can be conceptually simpler. Both OBJ and IMPS attach great importance to interconnections across theories: inheritance, to mention the most basic one, and “theory ensembles”, to mention a nice feature of IMPS which enables one to move, e.g., from the formal theory of a metric space to a family of interrelated replicas of it, which also caters for continuous mappings between metric spaces. As regards theory interconnections, the proposal we have made in this paper still awaits enrichment. The literature on the OBJ family and on the IMPS system also stresses the kinship between the activity of proving theorems and computing in general; even more so does the literature on systems, such as Nuprl [8] or the Calculus of Constructions [9], which rely on a constructive foundation, more or less close to Martin-Löf’s intuitionistic type theory [19]. Important achievements, and in particular the conception of declarative programming languages such as Prolog, stem in fact from the view that proof-search can be taken as a general paradigm of computation. On the other hand, we feel that too little has been done, to date, in order to exploit a “proof-by-computation” paradigm aimed at enhancing theorem-proving by means of the ability to perform symbolic computations

A ‘Theory’ Mechanism for a Proof-Veriﬁer Based on First-Order Set Theory


efficiently in specialized contexts of algebra and analysis (a step in this direction was taken in [7]). This is an issue that we intend to explore further in a forthcoming paper.

7 Conclusions

We view the activity of setting up detailed formalized proofs of important theorems in analysis and number theory as an essential part of the feasibility study that must precede the development of any ambitious proof-checker. In mathematics, set theory has emerged as the standard framework for such an enterprise, and full computer-assisted certiﬁcation of a modernized version of Principia Mathematica should now be possible. To convince ourselves of a veriﬁer system’s ability to handle large-scale mathematical proofs —and such proofs cannot always be avoided in program-correctness veriﬁcation—, it is best to follow the royal road paved by the work of Cauchy, Dedekind, Frege, Cantor, Peano, Whitehead–Russell, Zermelo–Fraenkel–von Neumann, and many others. Only one facet of our work on large-scale proof scenarios is presented in this paper. Discussion on the nature of the basic inference steps a proof-veriﬁer should (and reasonably can) handle has been omitted to focus our discussion on the issue of proof modularization. The obvious goal of modularization is to avoid repeating similar steps when the proofs of two theorems are closely analogous. Modularization must also conceal the details of a proof once they have been fed into the system and successfully certiﬁed. When coupled to a powerful underlying set theory, indeﬁnitely expansible with new function symbols generated by Skolemization, the technical notion of “theory” proposed in this paper appears to meet such proof-modularization requirements. The examples provided, showing how often the THEORY construct can be exploited in proof scenarios, may convince the reader of the utility of this construct.

Acknowledgements

We thank Ernst-Erich Doberkat (Universität Dortmund, D), who brought to our attention the text by Frege cited in the epigraph of this paper. We are indebted to Patrick Cegielski (Université Paris XII, F) for helpful comments.

References

1. A. Blass and Y. Gurevich. The logic of choice. J. of Symbolic Logic, 65(3):1264–1310, 2000.
2. D. S. Bridges. Foundations of real and abstract analysis. Springer-Verlag, Graduate Texts in Mathematics vol. 174, 1997.
3. R. Burstall and J. Goguen. Putting theories together to make specifications. In R. Reddy, ed, Proc. 5th International Joint Conference on Artificial Intelligence. Cambridge, MA, pp. 1045–1058, 1977.
4. R. Caferra and G. Salzer, editors. Automated Deduction in Classical and Non-Classical Logics. LNCS 1761 (LNAI). Springer-Verlag, 2000.
5. P. Cegielski. Un fondement des mathématiques. In M. Barbut et al., eds, La recherche de la vérité. ACL – Les éditions du Kangourou, 1999.
6. A. Church. A formulation of the simple theory of types. J. of Symbolic Logic, 5:56–68, 1940.
7. E. Clarke and X. Zhao. Analytica: A theorem prover in Mathematica. In D. Kapur, ed, Automated Deduction - CADE-11. Springer-Verlag, LNCS vol. 607, pp. 761–765, 1992.
8. R. L. Constable, S. F. Allen, H. M. Bromley, W. R. Cleaveland, J. F. Cremer, R. W. Harper, D. J. Howe, T. B. Knoblock, N. P. Mendler, P. Panangaden, J. T. Sasaki, and S. F. Smith. Implementing mathematics with the Nuprl development system. Prentice-Hall, Englewood Cliffs, NJ, 1986.
9. Th. Coquand and G. Huet. The calculus of constructions. Information and Computation, 76(2/3):95–120, 1988.
10. W. M. Farmer, J. D. Guttman, F. J. Thayer. IMPS: An interactive mathematical proof system. J. of Automated Reasoning, 11:213–248, 1993.
11. A. Formisano and E. Omodeo. An equational re-engineering of set theories. In Caferra and Salzer [4, pp. 175–190].
12. G. Frege. Logik in der Mathematik. In G. Frege, Schriften zur Logik und Sprachphilosophie. Aus dem Nachlaß herausgegeben von G. Gabriel. Felix Meiner Verlag, Philosophische Bibliothek, Band 277, Hamburg, pp. 92–165, 1971.
13. K. Futatsugi, J. A. Goguen, J.-P. Jouannaud, J. Meseguer. Principles of OBJ2. Proc. 12th annual ACM Symp. on Principles of Programming Languages (POPL’85), pp. 55–66, 1985.
14. R. Godement. Cours d’algèbre. Hermann, Paris, Collection Enseignement des Sciences, 3rd edition, 1966.
15. J. A. Goguen and G. Malcolm. Algebraic semantics of imperative programs. MIT, 1996.
16. T. J. Jech. Set theory. Springer-Verlag, Perspectives in Mathematical Logic, 2nd edition, 1997.
17. E. Landau. Foundation of analysis. The arithmetic of whole, rational, irrational and complex numbers. Chelsea Publishing Co., New York, 2nd edition, 1960.
18. A. Levy. Basic set theory. Springer-Verlag, Perspectives in Mathematical Logic, 1979.
19. P. Martin-Löf. Intuitionistic type theory. Bibliopolis, Napoli, Studies in Proof Theory Series, 1984.

An Open Research Problem: Strong Completeness of R. Kowalski’s Connection Graph Proof Procedure

Jörg Siekmann¹ and Graham Wrightson²

¹ Universität des Saarlandes, Stuhlsatzenhausweg, D-66123 Saarbrücken, Germany. [email protected]
² Department of Computer Science and Software Engineering, The University of Newcastle, NSW 2308, Australia. [email protected]

Abstract. The connection graph proof procedure (or clause graph resolution as it is more commonly called today) is a theorem proving technique due to Robert Kowalski. It is a negative test calculus (a refutation procedure) based on resolution. Due to an intricate deletion mechanism that generalises the well-known purity principle, it substantially refines the usual notions of resolution-based systems and leads to a largely reduced search space. The dynamic nature of the clause graph upon which this refutation procedure is based poses novel meta-logical problems not previously encountered in logical deduction systems. Ever since its invention in 1975 the soundness, confluence and (strong) completeness of the procedure have been in doubt in spite of many partial results. This paper provides an introduction to the problem as well as an overview of the main results that have been obtained in the last twenty-five years.

1 Introduction to Clause Graph Resolution

We assume the reader to be familiar with the basic notions of resolution-based theorem proving (see, for example, Alan Robinson [1965], Chang, C.-L. and Lee, R.C.-T. [1973] or Don Loveland [1978]). Clause graphs introduced a new ingenious development into the field, the central idea of which is the following: In standard resolution two resolvable literals must first be found in the set of sets of literals before a resolution step can be performed, where a set of literals represents a clause (i.e. a disjunction of these literals) and a statement to be refuted is represented as a set of clauses. Various techniques were developed to carry out this search. However, Robert Kowalski [1975] proposed an enhancement to the basic data structure in order to make possible resolution steps explicit, which, as it turned out in subsequent years, not only simplified the search, but also introduced new and unexpected logical problems. This enhancement was gained by the use of so-called links between complementary literals, thus turning the set notation into a graph-like structure. The new approach allowed in particular for the removal of a link after the corresponding resolution step, and a clause that contains a literal which is no longer connected by a link may be removed also (generalised purity principle). An important side effect was that this link removal had the potential to cause the disappearance of even more clauses from the current set of clauses (avalanche effect). Although this effect could reduce the search space drastically, it also had a significant impact on the underlying logical foundations. To quote Norbert Eisinger from his monograph on Kowalski’s clause graphs [1991]:

“Let S and S′ be the current set of formulae before and after a deduction step S → S′. A step of a classical calculus and a resolution step both simply add a formula following from S. Thus, each interpreted as the conjunction of its members, S and S′ are always equivalent. For clause graph resolution, however, S may contain formulae missing in S′, and the removed formulae are not necessarily consequences of those still present in S′. While this does not affect the forward implication, S does in general no longer ensue from S′. In other words, it is possible for S′ to possess more models than S. But, when S is unsatisfiable, so must be S′, i.e. S′ must not have more models than S, if soundness, unsatisfiability and hence refutability, is to be preserved.”

This basic problem underlying all investigations of the properties of the clause graph procedure will be made more explicit in the following.

A.C. Kakas, F. Sadri (Eds.): Computat. Logic (Kowalski Festschrift), LNAI 2408, pp. 231–252, 2002. © Springer-Verlag Berlin Heidelberg 2002

2 Clause Graph Resolution: The Problem

The standard resolution principle, called set resolution in the following, assumes the axioms and the negated theorem to be represented as a set of clauses. In contrast, the clause graph proof procedure represents the initial set of clauses as a graph by drawing a link between pairs of literal occurrences to denote that some relation holds between these two literals. The relation may denote other things as well (see e.g. Christoph Walter [1981]), but the standard case, and the basic point of interest in this paper, is “complementarity” of the two literals, i.e. resolvability of the respective clauses. In that case an initial clause graph for the set

S = { {−P(z, c, z), −P(z, d, z)},
      {P(a, x, a), −P(a, b, c)},
      {P(a, w, c), P(w, y, w)},
      {P(u, d, u), −P(b, u, d), P(u, b, b)},
      {−P(a, b, b)},
      {−P(c, b, c), P(v, a, d), P(a, v, b)} }

is the graph in Figure 1. Here P is a ternary predicate symbol, letters from the beginning of the alphabet a, b, c, . . . denote constants, letters from the end of the alphabet x, y, z, v, . . . denote variables and −P(. . . ) denotes the negation of P(. . . ).


Example 1.

[Figure: the initial clause graph for S, with links numbered 1 to 10 connecting complementary literals.]

Fig. 1.
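The links of Figure 1 can be recomputed mechanically. The following Python sketch is an illustration only, not the authors' implementation: it encodes each literal of S as a sign and a three-letter argument string (a hypothetical encoding chosen here), treats the letters u to z as variables, renames variables apart per clause, and links every pair of unifiable literals of opposite sign that occur in different clauses. All function names are assumptions made for this example.

```python
from itertools import combinations

# Hypothetical encoding: (sign, arguments); letters u..z are variables,
# a..d are constants, as in the example set S of the text.
S = [
    [(-1, "zcz"), (-1, "zdz")],
    [(+1, "axa"), (-1, "abc")],
    [(+1, "awc"), (+1, "wyw")],
    [(+1, "udu"), (-1, "bud"), (+1, "ubb")],
    [(-1, "abb")],
    [(-1, "cbc"), (+1, "vad"), (+1, "avb")],
]

def is_var(term):
    return term[0] in "uvwxyz"

def rename(args, clause_index):
    """Rename variables apart by tagging them with the clause index."""
    return tuple(t + str(clause_index) if is_var(t) else t for t in args)

def walk(term, subst):
    """Follow variable bindings until an unbound term is reached."""
    while term in subst:
        term = subst[term]
    return term

def unify(args1, args2):
    """Most general unifier of two flat argument tuples, or None."""
    subst = {}
    for s, t in zip(args1, args2):
        s, t = walk(s, subst), walk(t, subst)
        if s == t:
            continue
        if is_var(s):
            subst[s] = t
        elif is_var(t):
            subst[t] = s
        else:
            return None  # two distinct constants clash
    return subst

def initial_links(clauses):
    """Links between complementary, unifiable literals of different clauses."""
    links = []
    for (i, c1), (j, c2) in combinations(enumerate(clauses), 2):
        for k, (sign1, args1) in enumerate(c1):
            for l, (sign2, args2) in enumerate(c2):
                if sign1 == -sign2:
                    mgu = unify(rename(args1, i), rename(args2, j))
                    if mgu is not None:
                        links.append(((i, k), (j, l), mgu))
    return links

links = initial_links(S)
print(len(links))  # 10, the number of links drawn in Figure 1
```

Because the terms here contain no function symbols, no occurs check is needed; a general implementation would require full first-order unification. The sketch also ignores links between two literals of the same clause, which is a simplifying assumption.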

An appropriate most general unifier is associated with each link (not shown in the example of Figure 1). We use the now standard notation that adjacent boxes denote a clause, i.e. the disjunction of the literals in the boxes. So far such a clause graph is just a data structure without commitment to a particular proof procedure and in fact there have been many proposals to base an automated deduction procedure on some graph-like notion (e.g. Andrews [1976], Andrews [1981], Bibel [1981b], Bibel [1982], Chang and Slagle [1979], Kowalski [1975], Shostak [1976], Shostak [1979], Sickel [1976], Stickel [1982], Yates and Raphael and Hart [1970], Omodeo [1982], Yarmush [1976], Murray and Rosenthal [1993], Murray and Rosenthal [1985]). Kowalski’s procedure uses a graph-like data structure as well, but its impact is more fundamental since it operates as follows: suppose we want to perform the resolution step represented by link 6 in Figure 1 based on the unifier σ = {w → b}. Renaming the variables appropriately we obtain the resolvent {P(a, x′, a), P(b, y′, b)}, which is inserted into the graph; if all additional links are now set, this yields the graph:

[Figure: the graph of Figure 1 with the resolvent {P(a, x′, a), P(b, y′, b)} inserted and its new links numbered 11 to 14.]

Fig. 2.

Now there are three essential operations:

1. The new links don’t have to be recomputed by comparing every pair of literals again for complementarity; instead this information can be inherited from the given link structure.
2. The link resolved upon is deleted to mark the fact that this resolution step has already been performed.
3. Clauses that contain a literal with no link connecting it to the rest of the graph may be deleted (generalised purity principle).

While the first point is the essential ingredient for the computational attractiveness of the clause graph procedure, the second and third points show the tension between great logical and computational advantages on the one hand and severe, novel theoretical problems on the other. Let us turn to the above example again. After resolution upon link 6 we obtain the graph in Figure 2 above. Now since link 6 has been resolved upon, we delete it according to rule (2). But now the two literals involved become pure and hence the two clauses can be deleted as well, leading to the following graph:

[Figure: the graph of Figure 2 after deletion of link 6 and of the two parent clauses {P(a, x, a), −P(a, b, c)} and {P(a, w, c), P(w, y, w)}.]

Fig. 3.

But now the literal −P (c, b, c) in the bottom clause becomes pure as well and hence we have the graph:

[Figure: the graph of Figure 3 after deletion of the clause {−P(c, b, c), P(v, a, d), P(a, v, b)}.]

Fig. 4.

This removal causes the only literal −P (a, b, b) in the bottom clause to become pure and hence, after a single resolution step followed by all these purity deletions, we arrive at the ﬁnal graph:

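[Figure: the final graph, in which links 11 to 14 connect the resolvent {P(a, x′, a), P(b, y′, b)} with the clause {−P(z, c, z), −P(z, d, z)}.]

Fig. 5.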
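The avalanche of purity deletions can be replayed with a few lines of Python. The sketch below is an illustration under stated assumptions: the graph is hand-encoded (clause names C1 to C6 follow the order in which the clauses of S are listed, R is the resolvent, and a literal is addressed by its position in its clause), and the link pairs were obtained by checking unifiability of the literals of S rather than read off the figure. The function `purity_cascade` is a hypothetical helper, not part of the original procedure description.

```python
def link(a, b):
    """A link is an unordered pair of literal occurrences (clause, index)."""
    return frozenset({a, b})

def purity_cascade(clauses, links):
    """Repeatedly delete clauses that contain an unlinked (pure) literal.

    clauses: dict mapping a clause name to its number of literals
    links:   set of links as produced by link()
    """
    clauses, links = dict(clauses), set(links)
    changed = True
    while changed:
        changed = False
        linked = {end for ln in links for end in ln}
        for name, size in list(clauses.items()):
            if any((name, k) not in linked for k in range(size)):
                del clauses[name]
                # every link attached to the deleted clause disappears too
                links = {ln for ln in links if all(c != name for c, _ in ln)}
                changed = True
                break
    return clauses, links

# The graph of Figure 2 after link 6 has been deleted (hand-encoded assumption).
clauses = {"C1": 2, "C2": 2, "C3": 2, "C4": 3, "C5": 1, "C6": 3, "R": 2}
links = {
    link(("C1", 0), ("C2", 0)), link(("C1", 1), ("C2", 0)),
    link(("C1", 0), ("C3", 1)), link(("C1", 1), ("C3", 1)),
    link(("C1", 1), ("C4", 0)), link(("C4", 1), ("C6", 1)),
    link(("C5", 0), ("C4", 2)), link(("C5", 0), ("C6", 2)),
    link(("C6", 0), ("C3", 1)),
    # links 11 to 14 of the resolvent R = {P(a,x',a), P(b,y',b)}
    link(("R", 0), ("C1", 0)), link(("R", 0), ("C1", 1)),
    link(("R", 1), ("C1", 0)), link(("R", 1), ("C1", 1)),
}

remaining, remaining_links = purity_cascade(clauses, links)
print(sorted(remaining), len(remaining_links))  # ['C1', 'R'] 4
```

Only the clause {−P(z, c, z), −P(z, d, z)} and the resolvent survive, joined by four links, which agrees with the final graph described in the text.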

It is this strong feature, which reduces redundancy in the complementary set of clauses, that accounts for the fascination of this proof procedure (see Ohlbach [1985], Ohlbach [1983], Bläsius [1986] and [1987], Eisinger et al. [1989], Ohlbach and Siekmann [1991], Bläsius et al. [1981], Eisinger [1981], Eisinger and Siekmann and Unvericht [1979], Ohlbach [1987], Ramesh et al. [1997], Murray and Rosenthal [1993], Siekmann and Wrightson [1980]). It can sometimes even reduce the initial redundant set to its essential contradictory subset (subgraph). But this also marks its problematic theoretical status: how do we know that we have not deleted too many clauses? Skipping the details of an exact definition of the various inheritance mechanisms (see e.g. Eisinger [1991] for details), the following example demonstrates the problem.


Suppose we have the refutable set S = {{P(a), P(a)}, {−P(a)}} and its initial graph as in Figure 6, where PUR means purity deletion and MER stands for merging two literals (Andrews [1968]), whilst RES stands for resolution.

Example 2.

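[Figure: two derivations from the initial graph of S. The upper one applies purity deletion (PUR) twice and ends in the empty graph ∅; the lower one applies merging (MER) followed by resolution (RES) and ends in the empty clause.]

Fig. 6.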

Thus in two steps we would arrive either at the empty set ∅, which stands for satisfiability, or, in the lower derivation, at the empty clause {}, which stands for unsatisfiability. This example would seem to show that the procedure (i) is not confluent, as defined below, (ii) is not sound (correct), and (iii) is not refutation complete (at least not in the strong sense as defined below), and hence would be useless for all practical purposes. But here we can spot the flaw immediately: the process did not start with the full initial graph, where all possible links are set. If, instead, all possible links are drawn in the initial graph, the example in Figure 6 fails to be a counterexample. On the other hand, after a few initial steps we always have a graph with some links deleted, for example because they have been resolved upon. So how can we be sure that the same disastrous phenomenon, as in the above example, will not occur again later on in the derivation? These problems have been called the confluence, the soundness and the (strong) completeness problem of the clause graph procedure, and it can be shown that for the original formulation of the procedure in Kowalski [1975] (with full subsumption and tautology removal) all three of these essential properties unfortunately do not hold in general. However, for suitable remedies (of subsumption and tautology removal) the first two properties hold, whereas the third property has been open ever since.

3 Properties and Results for the Clause Graph Proof Procedure

In order to capture the strange and novel properties of logical graphs let us fix the following notions: A clause graph of a set of clauses S consists of a set of nodes labelled by the literal occurrences in S and a set of links that connect complementary literals. There are various possibilities to make this notion precise (e.g. Siekmann and Stephan [1976] and [1980], Brown [1976], Eisinger [1986] and [1991], Bibel [1980], Smolka [1982a,b,c], Bibel and Eder [1997], Hähnle et al. [2001], Murray and Rosenthal [1985]). Let INIT(S) be the full initial clause graph for S with all possible links set. This is called a full connection graph in Bibel and Eder [1997], a total graph in Eisinger [1991] and in Siekmann, Stephan [1976] and a complete graph in Brown [1976].

Definition 1. Clause graph resolution is called
  refutation sound: if INIT(S) →* {} then S is unsatisfiable;
  refutation complete: if S is unsatisfiable then there exists a derivation INIT(S) →* {};
  refutation confluent: if S is unsatisfiable, and if INIT(S) →* G1 and INIT(S) →* G2, then there exist G1 →* G′ and G2 →* G′ for some G′;
  affirmation sound: if INIT(S) →* ∅ then S is satisfiable;
  affirmation complete: if S is satisfiable then there exists a derivation INIT(S) →* ∅;
  affirmation confluent: if S is satisfiable, and if INIT(S) →* G1 and INIT(S) →* G2, then there exist G1 →* G′ and G2 →* G′ for some G′.

The state of knowledge about the clause graph proof procedure at the end of the 1980’s can be summarised by the following major theorems. There are some subtleties when subsumption and tautology removal are involved (see Eisinger [1991] for a thorough exposition; the discovery of the problems with subsumption and tautology removal and an appropriate remedy for these problems is due to Wolfgang Bibel).

Theorem 1 (Bibel, Brown, Eisinger, Siekmann, Stephan). Clause graph resolution is refutation sound.

Theorem 2 (Bibel). Clause graph resolution is refutation complete.


Theorem 3 (Eisinger, Smolka, Siekmann, Stephan). Clause graph resolution is refutation confluent.

Theorem 4 (Eisinger). Clause graph resolution is affirmation sound.

Theorem 5 (Eisinger). Clause graph resolution is not affirmation confluent.

Theorem 6 (Smolka). For the unit refutable class, clause graph resolution with an unrestricted tautology rule is refutation complete, refutation confluent, affirmation sound, (and strongly complete).

The important notion of strong completeness is introduced below.

Theorem 7 (Eisinger). Clause graph resolution with an unrestricted tautology rule is refutation complete, but neither refutation confluent nor affirmation sound.

As important and essential as the above-mentioned results may be, they are not enough for the practical usefulness of the clause graph procedure: the principal requirement for a proof procedure is not only to know that there exists a refutation, but even more importantly that the procedure can actually find it after a finite number of steps. These two notions, called refutation completeness and strong refutation completeness in the following, essentially coincide for set resolution but unfortunately they do not do so for the clause graph procedure. This can be demonstrated by the example in Figure 7, where we start with the graph G0 and derive G1 from G0 by resolution upon the link marked ☞. The last graph G2 contains a subgraph that is isomorphic to the first, hence the corresponding inference steps can be repeated over and over again and the procedure will not terminate with the empty clause. Note that a refutation, i.e. the derivation of the empty clause, could have been obtained by resolving upon the leftmost link between P and −P.

Example 3 (adapted from Eisinger [1991]).

[Figure: the clause graphs G0, G1 and G2 over the literals ±P, ±Q and ±R, with the derivation steps G0 → G1 and G1 → G2; the links resolved upon are marked ☞.]

Fig. 7.

Examples of this nature gave rise to the strong completeness conjecture, which in spite of numerous attacks has remained an open problem now for over twenty years: How can we ensure for an unsatisfiable graph that the derivation stops after finitely many steps with a graph that contains the empty clause? If this crucial property cannot be ascertained, the whole procedure would be rendered useless for all practical purposes, as we would have to backtrack to some earlier state in the derivation, and hence would have to store all intermediate graphs. The theoretical problems and strange counterintuitive facts that arise from the (graphical) representation were first discovered by Jörg Siekmann and Werner Stephan and reported independently in Siekmann and Stephan [1976] and [1980] and by Frank Brown in Brown [1976]. They suggested a remedy to the problem: the obvious flaw in the above example can be attributed to the fact that the proof procedure never selects the essential link for the refutation (the link between −P and P). This, of course, is a property which a control strategy should have, i.e. it should be fair in the sense that every link is eventually selected. However, this is a subtle property in the dynamic context of the clause graph procedure, as we shall see in the following.

Control Strategies

In order to capture the strange metalogical properties of the clause graph procedure, and in particular the awkward phenomenon mentioned above, Siekmann and Stephan [1976] and [1980] introduced two essential notions. These two notions have been the essence of all subsequent investigations: (i) the notion of a kernel. This is now sometimes called the minimal refutable subgraph of a graph, e.g. in Bibel and Eder [1997]; (ii) several notions of covering, called fairness in Bibel and Eder [1997], exhaustiveness in Brown [1976], fairness-one and fairness-two in Eisinger [1991] and covering-one, two and three in Siekmann and Stephan [1976]. Let us have a look at these notions in turn, using the more recent and advanced notation of Eisinger [1991]. Why is it not enough to simply prove refutation completeness as in the case of clause set resolution? Ordinary refutation completeness ensures that if the initial set of clauses is unsatisfiable, then there exists a refutation, i.e. a finite derivation of the empty clause. Of course, there is a control strategy for which this would be sufficient for clause graph resolution as well, namely an exhaustive enumeration of all possible graphs, as in Figure 8, where we assume that the initial graph G0 has n links. However, such a strategy is computationally far too expensive and would make the whole approach useless.

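[Figure: the tree of all derivations from G0, with successors G01, G02, G03, . . . , G0n, each of which is expanded in turn (G011, . . . , G01m, . . . ).]

Fig. 8.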
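The exhaustive strategy of Figure 8 amounts to a breadth-first enumeration of all derivations. The following Python sketch illustrates the idea on a deliberately simplified state space: the states are propositional clause sets and each step adds one resolvent, i.e. plain set resolution stands in for clause graph resolution. This substitution, the toy clause set and all names are assumptions made for the example.

```python
from collections import deque

def complement(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def successors(state):
    """All states reachable by adding one new propositional resolvent."""
    for c1 in state:
        for c2 in state:
            for lit in c1:
                if complement(lit) in c2:
                    resolvent = (c1 - {lit}) | (c2 - {complement(lit)})
                    if resolvent not in state:
                        yield state | {resolvent}

def exhaustive_search(initial, max_states=10_000):
    """Breadth-first enumeration of derivations; returns the first refutation."""
    queue = deque([[initial]])
    seen = {initial}
    explored = 0
    while queue and explored < max_states:
        path = queue.popleft()
        explored += 1
        if frozenset() in path[-1]:  # the empty clause has been derived
            return path
        for nxt in successors(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Toy unsatisfiable set {P}, {-P, Q}, {-Q} (hypothetical example)
G0 = frozenset({frozenset({"P"}), frozenset({"-P", "Q"}), frozenset({"-Q"})})
path = exhaustive_search(G0)
print(len(path) - 1)  # 2 resolution steps reach the empty clause
```

Every intermediate state is kept in memory, which is exactly the cost the text warns about: the number of stored states grows with the branching factor at every level of the tree.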

We know by Theorem 2 that the clause graph procedure is refutation complete, i.e. that there exists a subgraph from which the derivation can be obtained. Could we not use this information from a potential derivation we know to exist in order to guide the procedure in general?


Many strategies for clause graphs are in fact based on this very idea (Andrews [1981], Antoniou and Ohlbach [1983], Bibel [1981a], Bibel [1982], Chang and Slagle [1979], Sickel [1976]). However, in general, ﬁnding the appropriate subgraph essentially amounts to ﬁnding a proof in the ﬁrst place and we might as well use a standard resolution-based proof procedure to ﬁnd the derivation and then use this information to guide the clause graph procedure. So let us just assume in the abstract that every full (i.e. a graph where every possible link is set) and unsatisﬁable graph contains a subgraph, called a kernel (the shaded area in Figure 9), from which an actual refutation can be found in a ﬁnite number of steps.

Fig. 9.

We know from Theorem 2 above and from the results in Siekmann and Stephan [1976] and [1980] that every resolution step upon a link within the kernel eventually leads to the empty clause and thus to the desired refutation. If we can ensure that:

1. resolution steps involving links outside of the kernel do not destroy the kernel, and 2. every link in the kernel is eventually selected,

then we are done. This has been the line of attack ever since. Unfortunately the second condition turned out to be more subtle and rather diﬃcult to establish. So far no satisfactory solution to this problem has been found. So let us look at these concepts a little closer.


Definition 2. A filter for an inference system is a unary predicate F on the set of finite sequences of states. The notation S0 →* Sn with F stands for a derivation S0 →* Sn where F(S0 . . . Sn) holds. For an infinite derivation, S0 → . . . → Sn → . . . with F means that F(S0 . . . Sn) holds for each n.

This notion is due to Gert Smolka in [1982b] and Norbert Eisinger in [1991] and it is now used in several monographs on deduction systems (see e.g. K. Bläsius and H. J. Bürckert [1992]). Typical examples for a filter are the usual restriction and ordering strategies in automated theorem proving, such as set-of-support by Wos and Robinson and Carson [1965], linear refutation by Loveland [1970], merge resolution by Andrews [1968], unit resolution by Wos [1964], or see Kowalski [1970].

Definition 3. A filter F for clause graph resolution is called
  refutation sound: if INIT(S) →* {} with F then S is unsatisfiable;
  refutation complete: if S is unsatisfiable then there exists INIT(S) →* {} with F;
  refutation confluent: let S be unsatisfiable; for INIT(S) →* G1 with F and INIT(S) →* G2 with F there exist G1 →* G′ with F and G2 →* G′ with F, for some G′;
  strongly refutation complete: for an unsatisfiable S there does not exist an infinite derivation INIT(S) → G1 → G2 → . . . → Gn → . . . with F.

Note that → with F need not be transitive, hence the special form of confluence; also note that the procedure terminates with {} or with ∅. The most important and still open question is now: can we find a general property for a filter that turns the clause graph proof procedure into a strongly complete system? Obviously the filter has to make sure that every link (in particular every link in some fixed kernel) is eventually selected for resolution and not infinitely postponed.

Definition 4. A filter F for clause graph resolution is called covering, if the following holds: Let G0 be an initial graph, let G0 →* Gn with F be a derivation, and let λ be a link in Gn. Then there is a finite number n(λ), such that for any derivation G0 →* Gn →* G with F extending the given one by at least n(λ) steps, λ is not in G.

This is the weakest notion, called “covering-three” in Siekmann and Stephan [1976], exhaustiveness in Brown [1976] and fairness in Bibel and Eder [1997]. It is well-known and was already observed in Siekmann and Stephan [1976] that the strong completeness conjecture is false for this notion of covering. The problem is that a link can disappear without being resolved upon, namely by purity deletion, as the examples from the beginning demonstrate. Even the original links in the kernel can be deleted without being resolved upon, but may reappear after the copying process.
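Definition 2 treats a filter simply as a predicate on finite sequences of states. A minimal Python sketch of this view follows; the filter `no_repeated_state` is a hypothetical toy example and not one of the strategies cited in the text, and states are represented by plain labels. The helper checks the condition on every prefix, which is the requirement Definition 2 imposes on an infinite derivation, applied here to a finite piece of it.

```python
from typing import Callable, Hashable, Sequence

Filter = Callable[[Sequence[Hashable]], bool]

def no_repeated_state(states: Sequence[Hashable]) -> bool:
    """Toy filter: F(S0 ... Sn) holds iff all states are pairwise distinct."""
    return len(set(states)) == len(states)

def with_filter(filter_: Filter, states: Sequence[Hashable]) -> bool:
    """Check that F(S0 ... Sk) holds for every prefix of the given derivation."""
    return all(filter_(states[: k + 1]) for k in range(len(states)))

print(with_filter(no_repeated_state, ["G0", "G1", "G2"]))        # True
print(with_filter(no_repeated_state, ["G0", "G1", "G2", "G0"]))  # False
```

A derivation that literally returns to an earlier state is rejected by this toy filter. It is only meant to show the shape of the definition: the cyclic derivations discussed in the text reproduce an isomorphic graph rather than an identical one, so this filter makes no claim about them.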


For this reason stronger notions of fairness are required: apparently even essential links can disappear without being resolved upon and reappear later due to the copying process. Hence we have to make absolutely sure that every link in the kernel is eventually resolved upon. To this end imagine that each initial link bears a distinct colour and that each descendant of a coloured link inherits the ancestor’s colour:

Definition 5. An ordering filter F for clause graph resolution is called covering-two, if it is covering and at least one link of each colour must have been resolved upon after at most finitely many steps.

At first sight this definition now seems to capture the essence, but how do we know that the “right” descendant (as there may be more than one) of the coloured ancestor has been operated upon? Hence the strongest definition of fairness for a filter:

Definition 6. A filter F for clause graph resolution is called covering-one, if each colour must have disappeared after at most finitely many steps.

While the strong completeness conjecture can be shown in the positive for the latter notion of covering (see Siekmann and Stephan [1980]), hardly any of the practical and standard filters actually fulfil this property (except for some obvious and exotic cases). So the strong completeness conjecture boils down to finding:

1. a proof or a refutation that a covering filter is strongly complete, for the appropriate notions of covering-one, -two, and -three, and
2. strong completeness results for subclasses of the full first-order predicate calculus, or
3. an alternative notion of covering for which strong completeness can be shown.

The first two problems were settled by Norbert Eisinger and Gert Smolka.

Theorem 8 (Smolka). For the unit refutable class the strong completeness conjecture is true, i.e. the conjunction of a covering filter with any refutation complete and refutation confluent restriction filter is refutation complete, refutation confluent, and Noetherian, i.e. it terminates.

This theorem, whose essential contribution is due to Gert Smolka [1982a], accounts for the optimism at the time. After all, the unit refutable class of clauses (Horn clauses) turned out to be very important for many practical purposes, including logic programming, and the theorem shows that all the essential properties of a useful proof procedure now hold for the clause graph procedure. Based on an ingenious construction, however, Norbert Eisinger showed the following devastating result, which we will look at again in more detail in Section 4.
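The difference between covering-two and covering-one can be made concrete on a finite piece of a derivation. The Python sketch below is a hedged illustration: the trace format (one pair per step, giving the link resolved upon and the links present afterwards), the colour table and the function name are all assumptions, and a finite trace can only show whether the conditions are met so far, not whether a filter guarantees them.

```python
def colour_status(initial_links, colour, trace):
    """Report, for a finite trace, which colour conditions are met so far.

    initial_links: links of the initial graph
    colour:        dict mapping every link (initial or descendant) to its colour
    trace:         list of (resolved_link, links_after_step) pairs
    """
    initial_colours = {colour[l] for l in initial_links}
    resolved = {colour[r] for r, _ in trace}
    final_links = trace[-1][1] if trace else set(initial_links)
    alive = {colour[l] for l in final_links}
    return {
        # Definition 5: at least one link of each colour was resolved upon
        "covering_two_so_far": initial_colours <= resolved,
        # Definition 6: every initial colour has disappeared from the graph
        "covering_one_so_far": not (initial_colours & alive),
    }

# Hypothetical trace: link 3 is a descendant of link 2 and inherits "blue".
colour = {1: "red", 2: "blue", 3: "blue"}
trace = [(1, {2, 3}), (3, {2})]
status = colour_status({1, 2}, colour, trace)
print(status["covering_two_so_far"], status["covering_one_so_far"])  # True False
```

In this toy trace a blue link has been resolved upon, so the covering-two condition is met, yet the original blue link 2 is still present. This is the situation the text describes: one descendant of a colour has been operated upon, but not necessarily the “right” one.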


Theorem 9 (Eisinger). In general the strong completeness conjecture is false, even for a restriction filter based on the covering-two definition.

This theorem destroyed once and for all the hope of finding a solution to the problem based on the notion of fairness, as it shows that even for the strongest possible form of fairness, strong completeness cannot be obtained. So attention turned to the third of the above options, namely of finding alternative notions of a filter for which strong completeness can be shown. Early results are in Wrightson [1989] and Eisinger [1991]; more recent results are in Hähnle et al. [2001] and Meagher and Hext [1998]. Let us now look at the proof of Theorem 9 in more detail.

4 The Eisinger Example

This example is taken from Eisinger [1991], p. 158, Example 7.4 7. It shows a cyclic coveringtwo derivation, i.e. it shows that the clause graph proof procedure does not terminate even for the strong notion of a coveringtwo ﬁlter, hence in particular not for the notion of coveringthree either. Let S = {P Q, −P Q, −Q − R, RS, R − S} and INIT(S) = G0 .
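The clause set S above is small enough to check by brute force. The following is a minimal sketch, not part of Eisinger's construction: it assumes a literal is encoded as an (atom, polarity) pair, and the helper names `is_satisfiable` and `initial_links` are hypothetical. It confirms that S is unsatisfiable and counts the links of the initial graph INIT(S), one per pair of complementary literals in different clauses.

```python
from itertools import product

# Clause set S from the text; a literal is (atom, polarity).
S = [
    frozenset({("P", True), ("Q", True)}),    # P Q
    frozenset({("P", False), ("Q", True)}),   # -P Q
    frozenset({("Q", False), ("R", False)}),  # -Q -R
    frozenset({("R", True), ("S", True)}),    # R S
    frozenset({("R", True), ("S", False)}),   # R -S
]


def is_satisfiable(clauses):
    """Try every truth assignment (fine for a handful of atoms)."""
    atoms = sorted({atom for clause in clauses for atom, _ in clause})
    for values in product([False, True], repeat=len(atoms)):
        model = dict(zip(atoms, values))
        if all(any(model[a] == pol for a, pol in c) for c in clauses):
            return True
    return False


def initial_links(clauses):
    """One link per complementary literal pair in two different clauses."""
    links = []
    for i, first in enumerate(clauses):
        for j in range(i + 1, len(clauses)):
            for atom, pol in sorted(first):
                if (atom, not pol) in clauses[j]:
                    links.append((i, j, atom))
    return links


print(is_satisfiable(S))      # False: S is unsatisfiable
print(len(initial_links(S)))  # 6 initial links
```

The six initial links correspond to the six colours listed for the coveringtwo argument later in this section.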

[Figure: clause graphs G0 to G9 of the cyclic derivation]

G8 includes two copies of −Q − P, one of which might be removed by subsumption. To make sure that the phenomenon is not just a variation of the notorious subsumption problem described earlier in his monograph, Norbert Eisinger does not subsume, but performs the corresponding resolution steps for both clause nodes in succession.

[Figure: clause graphs G10 and G11]

G10 contains two tautologies and all links which are possible among its clause nodes. In other words, it is the initial clause graph of {S − Q, −S − Q, QP, −P − R, −P R, Q − Q, Q − Q}. So far only resolution steps and purity removals were performed; now apply two tautology removals to obtain G11. G11 has the same structure as G0, from which it can be obtained by applying the literal permutation π : ±Q → ∓Q, ±P → ±S → ∓R → ±P. Since π^6 = id, five more “rounds” with the analogous sequence of inference steps will reproduce G0 as G66; thus after sixty-six steps we arrive at a graph isomorphic to G0. The only object of G0 still present in G11 is the clause node labelled P Q. In particular, all initial links disappeared during the derivation. Hence G0 and G66 have no object in common, which implies that the derivation is covering. The following classes of link numbers represent the “colours” introduced for the coveringtwo concept in Definition 5; the numbers of links resolved upon are asterisked: {1∗}, {2, 8, 17, 18, 20∗, 23, 24∗}, {3∗, 9∗, 19}, {4∗, 7∗}, {5, 11, 13, 21, 25, 26, . . . , 36}, {6, 10, 12, 14, 15∗, 16∗, 22∗}. Only the colour {5, 11, . . . , 36} was never selected for resolution during the first round, and it just so happens that the second round starts with a resolution on link 11, which bears the critical colour. Hence the derivation also belongs to the coveringtwo class.

This seminal example was discovered in the autumn of 1986 and has since been published and quoted many times. It has once and for all destroyed all hope of a positive result for the strong completeness conjecture based only on the notion of covering or fairness. The consequence of this negative result has been compared to the most unfortunate fact that the halting problem of a Turing machine is unsolvable. The (weak) analogy is in the following sense: all the work on deduction systems rests upon the basic result that the predicate calculus is semidecidable, i.e. if the theorem to be shown is in fact valid then this can be shown after a finite number of steps, provided the uniform proof procedure carries out every possible inference step. Yet here we have a uniform proof procedure, clause graph resolution, which by any intuitive notion of fairness (“carries out every possible inference step eventually”) runs forever even on a valid theorem, and hence does not even provide a semidecision procedure.

In summary: The open problem is to find a filter that captures the essence of fairness on the kernel and which is practically useful¹, and then to show that the strong completeness property holds for this new notion of a filter. The open problem is not to invent an appropriate termination condition (even as brilliant as the decomposition criteria of Bibel and Eder [1997]²), as the proof procedure will not terminate even for the strongest known notion of covering (fairness), and this is exactly why the problem is still interesting even when the day is gone.
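The permutation argument and the colour bookkeeping of this section can be checked mechanically. The sketch below is only an illustration under stated assumptions: literals are encoded as (atom, polarity) pairs, the clause sets of G0 and G11 are typed in from the text, and the names `parse`, `pi` and `order` are hypothetical helpers, not notation from Eisinger's monograph.

```python
def parse(text):
    """Turn 'P Q, -P Q' into a set of clauses of signed literals."""
    return {
        frozenset((tok.lstrip("-"), not tok.startswith("-")) for tok in part.split())
        for part in text.split(",")
    }


# Images of the positive literals under pi: Q -> -Q, P -> S, S -> -R, R -> -P.
POSITIVE_IMAGE = {"Q": ("Q", False), "P": ("S", True), "S": ("R", False), "R": ("P", False)}


def pi(literal):
    atom, positive = literal
    image_atom, image_positive = POSITIVE_IMAGE[atom]
    # A negative literal is mapped to the complement of the positive image.
    return (image_atom, image_positive == positive)


def order(perm, literals):
    """Smallest k >= 1 such that applying perm k times is the identity."""
    k, current = 1, {lit: perm(lit) for lit in literals}
    while any(current[lit] != lit for lit in literals):
        current = {lit: perm(current[lit]) for lit in literals}
        k += 1
    return k


G0 = parse("P Q, -P Q, -Q -R, R S, R -S")
G11 = parse("S -Q, -S -Q, Q P, -P -R, -P R")  # G10 without its two tautologies
LITERALS = [(atom, sign) for atom in "PQRS" for sign in (True, False)]

print({frozenset(pi(l) for l in c) for c in G0} == G11)  # True
print(order(pi, LITERALS))                               # 6

# Colour classes and the links resolved upon (asterisked) in the first round.
COLOURS = [
    {1}, {2, 8, 17, 18, 20, 23, 24}, {3, 9, 19}, {4, 7},
    {5, 11, 13, 21, 25} | set(range(26, 37)), {6, 10, 12, 14, 15, 16, 22},
]
RESOLVED = {1, 20, 24, 3, 9, 4, 7, 15, 16, 22}
unresolved = [c for c in COLOURS if not c & RESOLVED]
print(len(unresolved), 11 in unresolved[0])              # 1 True
```

The colour classes partition the link numbers 1 to 36, and the single colour without a resolved link is the one containing link 11, in line with the argument that the derivation is coveringtwo once the second round begins.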

¹ This is important, as there are strategies which are known to be complete (for example, to take a standard resolution theorem prover to find a proof and then use this information for clause-graph resolution). Hence these strategies are either based on some strange notion, or else on some too specific property.

² The weak notion of fairness as defined by W. Bibel and E. Eder [1997] can easily be refuted by much simpler examples (see e.g. Siekmann and Stephan [1976]), and Norbert Eisinger’s construction above refutes a much stronger conjecture. The proof in the Bibel and Eder paper not only contains an excusable technical error, to which we all are unfortunately prone (the flaw is on page 336, line 29, where they assume that the fairness condition forces the procedure to resolve upon every link in the minimal complementary submatrix, here called the kernel), but unfortunately misses the very nature of the open problem (see also Siekmann and Wrightson [2001]).

5 Lifting

All of the previous results and counterexamples apply to the propositional case, or ground level as it is called in the literature on deduction systems. The question is whether and how these ground results can be lifted to the general case of the predicate calculus. While lifting is not necessarily the wrong approach for the connection graph, the proof techniques known so far are too weak: the problem is more subtle and requires much stronger machinery for the actual lifting. The standard argument is as follows: first the result is established for the ground case, and there is now a battery of proof techniques³ known in order to do so. After that the result is “lifted” to the general case in the following sense: Let S be an unsatisfiable set of clauses; then by Herbrand’s theorem we know that there exists a finite truth-functionally contradictory set S′ of ground instances of S. Now since we have the completeness result for this propositional case, we know there exists a (resolution style) derivation. Taking this derivation, we observe that all the clauses involved are just instances of the clauses at the general level, and hence “lifting” this derivation amounts to exhibiting a mirror image of this derivation at the general level, as the following figure shows:

S ⟶ {□}   (general level)
⇓      ⇑
S′ ⟶ {□}   (ground level)

This proof technique is due to Alan Robinson [1965]. Unfortunately this is not enough for the clause graph procedure, as we have the additional graph-like structure: not only has the ground proof to be lifted to the general level as usual, it has also to be shown that an isomorphic (or otherwise sufficient) graph structure can be mirrored from the ground level graph INIT(S′) to the graph at the general level INIT(S), such that the derivation can actually be carried out within this graph structure as well:

INIT(S) ⟶ {G(□)}   (general level)
⇓            ⇑
INIT(S′) ⟶ {G′(□)}   (ground level)

where G(□) is a clause graph that contains the empty clause □.

³ Such as induction on the excess-literal-number, which is due to W. Bledsoe (see Loveland [1978]).

This turned out to be more difficult than expected in the late 1970s, when most of this work got started. However, by the end of the 1980s it was well known that standard lifting techniques fail: the non-standard graph-oriented lifting results in Siekmann and Stephan [1980] turned out to be false. Similarly, the lifting results in Bibel [1982] and in Bibel and Eder [1997], Theorem 5.4, are also false. To quote from Norbert Eisinger’s monograph ([1991], p. 125) on clause graphs: “Unfortunately the idea (of lifting a refutation) fails for an intricate difficulty which is the central problem in lifting graph theoretic properties. A resolution step on a link in G (the general case) requires elimination of all links in G′ (the ground refutation) that are mapped to the link in G. . . . Such a side effect can forestall the derivation of the successor.” This phenomenon seems to touch upon a new and fundamental problem, namely, the lifting technique has to take the topological structure of the two graphs (the ground graph and the general clause graph) into account as well, and several additional graph-theoretical arguments are asked for. The ground case part essentially develops a strategy which from any ground initial state leads to a final state. In the clause graph resolution system any such strategy has to willy-nilly distinguish between “good” steps and “bad” steps from each ground state, because there are ground case examples where an inappropriate choice of inference steps leads to infinite derivations that do not reach a final state. Eliminating or reducing the number of links with a given atom are sample criteria for “good” steps in different strategies. The lifting part then exploits the fact that it suffices to consider the conjunction of finitely many ground instances of a given first order formula, and shows how to lift the steps of a derivation existing for the ground formula to the first order level. Clause graph resolution faces the problem that a single resolution step on the general level couples different ground level steps together in a way that may be incompatible with a given ground case strategy, because “bad” steps have to be performed as a side effect of “good” steps.
That this is not always straightforward and may fail in general is shown by several (rather complex) examples (pp.123–130 in Eisinger [1991]), which we shall omit here. The interested reader may consult the monograph itself, which still represents most of what is known about the theoretical properties of clause graphs today. To be sure, there is a very simple way to solve this problem: just add to the inference system an unrestricted copy rule and use it to insert suﬃciently many variants. However to introduce an unrestricted copy rule, as, for example, implicitly assumed in the Bibel [1982] monograph, completely destroys the practical advantages of the clause graph procedure. It is precisely the advantage of the strong redundancy removal which motivated so many practical systems to employ this rather complicated machinery (see e.g. Ohlbach and Siekmann [1991]). Otherwise we may just use ordinary resolution instead. We feel that maybe the lifting technique should be abandoned altogether for clause graph refutation systems: the burden of mapping the appropriate graph structure (and taking its dynamically changing nature into account) seems to


outweigh its advantages and a direct proof at the most general level with an appropriate technique appears far more promising. But only the future will tell.
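The coupling between ground links and general links described in the quotation from Eisinger can be seen on a toy example. The sketch below is an illustration only and makes several assumptions: the clause set {P(x), −P(y) Q(y), −Q(z)}, the constants a and b, and the function names are all hypothetical, each literal has a single argument, and a ground link is mapped to a general link simply by remembering which clause and literal position it was instantiated from.

```python
from collections import defaultdict
from itertools import product

# Hypothetical general clauses; a literal is (positive?, predicate, argument).
GENERAL = [
    [(True, "P", "x")],                      # P(x)
    [(False, "P", "y"), (True, "Q", "y")],   # -P(y) Q(y)
    [(False, "Q", "z")],                     # -Q(z)
]
VARIABLES = {"x", "y", "z"}
CONSTANTS = ["a", "b"]


def ground_instances(clauses):
    """Instantiate every clause with every constant; remember its origin."""
    nodes = []
    for origin, clause in enumerate(clauses):
        variables = sorted({arg for _, _, arg in clause if arg in VARIABLES})
        for values in product(CONSTANTS, repeat=len(variables)):
            subst = dict(zip(variables, values))
            nodes.append((origin, [(s, p, subst.get(a, a)) for s, p, a in clause]))
    return nodes


def ground_links_by_general_link(nodes):
    """Group ground links by the general link (pair of literal positions) they map to."""
    grouped = defaultdict(list)
    for i, (origin_i, lits_i) in enumerate(nodes):
        for j in range(i + 1, len(nodes)):
            origin_j, lits_j = nodes[j]
            for k, (s1, p1, a1) in enumerate(lits_i):
                for l, (s2, p2, a2) in enumerate(lits_j):
                    if s1 != s2 and p1 == p2 and a1 == a2:
                        grouped[((origin_i, k), (origin_j, l))].append(((i, k), (j, l)))
    return grouped


grouped = ground_links_by_general_link(ground_instances(GENERAL))
for general_link, ground_links in sorted(grouped.items()):
    print(general_link, len(ground_links))
```

In this toy setting each of the two general links has two ground links mapped to it, so resolving on one general link removes both ground links at once, including one that a ground refutation using only the constant a would never have touched.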

6 Conclusion

The last twenty-five years have seen many attempts and partial results about hitherto unencountered theoretical problems that marred this new proof procedure, but it is probably no unfair generalisation to say that almost every paper (including ours) on the problems has had technical flaws or major errors, and the main problem, strong completeness, has been open ever since 1975, when clause graph resolution was first introduced to the scholarly community. Why is that so? One reason may be methodological. Clause graph resolution is formulated within three different conceptual frameworks: the usual clausal logic, the graph-theoretic properties and finally the algorithmic aspects, which account for its nonmonotonic nature. So far most of the methodological effort has been spent on the graph-theoretical notions (see e.g. Eisinger [1991]) in order to obtain a firm theoretical basis, the hope being that once these graph-theoretical properties have a sound mathematical foundation, the rest will follow suit. But this may have been a misconception: it is, after all, the metalogical properties of the proof procedure we are after, and hence the time may have come to question the whole approach. In Gabbay and Siekmann [2001] we try to turn the situation back from its (graph-theoretical) head to standing on its (logical) feet, by showing a logical encoding of the proof procedure without explicit reference to graph-theoretical properties. Mathematics, it is said, advances through conjectures and refutations, and this is a social process often carried out over more than one generation. Theoretical computer science and artificial intelligence apparently are no exceptions to this general rule.

Acknowledgements

This paper has been considerably improved by critical comments and suggestions from the anonymous referees and from Norbert Eisinger, Christoph Walther and Dov Gabbay. The authors would like to thank Oxford University Press for their kind permission to reprint this paper, which is appearing in the Logic Journal of the IGPL.

References

Andrews, P. B.: Resolution with Merging. J. ACM 15 (1968) 367–381.
Andrews, P. B.: Refutations by Matings. IEEE Trans. Comp. C-25, (1976) 8, 801–807.


Andrews, P. B.: Theorem Proving via General Matings. J. ACM 28 (1981) 193–214.
Antoniou, G., Ohlbach, H. J.: Terminator. Proceedings 8th IJCAI, Karlsruhe (1983) 916–919.
Bibel, W.: A Strong Completeness Result for the Connection Graph Proof Procedure. Bericht ATP-3-IV-80, Institut für Informatik, Technische Universität, München (1980)
Bibel, W.: On the completeness of connection graph resolution. In German Workshop on Artificial Intelligence. J. Siekmann, ed. Informatik Fachberichte 47, Springer, Berlin, Germany (1981a) pp. 246–247
Bibel, W.: On matrices with connections. J. ACM 28 (1981b) 633–645
Bibel, W.: Automated Theorem Proving. (1982) Vieweg, Wiesbaden.
Bibel, W.: Matings in matrices. Commun. ACM 26 (1983) 844–852
Bibel, W., Eder, E.: Decomposition of tautologies into regular formula and strong completeness of connection-graph resolution. J. ACM 44 (1997) 320–344
Bläsius, K. H.: Construction of equality graphs. SEKI report SR-86-01 (1986) Univ. Karlsruhe, Germany
Bläsius, K. H.: Equality reasoning based on graphs. SEKI report SR-87-01 (1987) Univ. Karlsruhe, Germany
Bläsius, K. H., Bürckert, H. J.: Deduktions Systeme (1992) Oldenbourg Verlag. Also in English: Ellis Horwood, 1989
Bläsius, K. H., Eisinger, N., Siekmann, J., Smolka, G., Herold, A., Walther, C.: The Markgraf Karl refutation procedure. Proc 7th IJCAI, Vancouver (1981)
Brown, F.: Notes on Chains and Connection Graphs. Personal Notes, Dept. of Computation and Logic, University of Edinburgh (1976)
Chang, C.-L., Lee, R. C.-T.: Symbolic Logic and Mechanical Theorem Proving. Academic Press (1973)
Chang, C.-L., Slagle, J. R.: Using Rewriting Rules for Connection Graphs to Prove Theorems. Artificial Intelligence 12 (1979) 159–178.
Eisinger, N.: What you always wanted to know about clause graph resolution. In Proc of 8th Conf. on Automated Deduction, Oxford (1986) LNCS 230, Springer
Eisinger, N.: Subsumption for connection graphs. Proc 7th IJCAI, Vancouver (1981)
Eisinger, N.: Completeness, Confluence, and Related Properties of Clause Graph Resolution. Ph.D. dissertation, Universität Kaiserslautern (1988)
Eisinger, N.: Completeness, Confluence, and Related Properties of Clause Graph Resolution. Pitman, London, Morgan Kaufmann Publishers, Inc., San Mateo, California (1991)
Eisinger, N., Siekmann, J., Unvericht, E.: The Markgraf Karl refutation procedure. Proc of Conf on Automated Deduction, Austin, Texas (1979)
Eisinger, N., Ohlbach, H. J., Präcklein, A.: Elimination of redundancies in clause sets and clause graphs (1989) SEKI report SR-89-01, University of Karlsruhe
Gabbay, D., Siekmann, J.: Logical encoding of the clause graph proof procedure, 2002, forthcoming
Hähnle, R., Murray, N. V., Rosenthal, E.: Ordered resolution versus connection graph resolution. In: R. Goré, A. Leitsch, T. Nipkow (eds.) Automated Reasoning, Proc of IJCAR 2001 (2001) LNAI 2083, Springer
Kowalski, R.: Search Strategies for Theorem Proving. Machine Intelligence (B. Meltzer and D. Michie, eds.), 5, Edinburgh University Press, Edinburgh (1970) 181–201
Kowalski, R.: A proof procedure using connection graphs. J. ACM 22 (1975) 572–595
Loveland, D. W.: A Linear Format for Resolution. Proc. of Symp. on Automatic Demonstration. Lecture Notes in Math 125, Springer Verlag, Berlin (1970) 147–162. Also in Siekmann and Wrightson [1983b], 377–398


Loveland, D. W.: Automated Theorem Proving: A Logical Basis. North-Holland, New York (1978)
Meagher, D., Hext, J.: Link deletion in resolution theorem proving (1998) unpublished manuscript
Murray, N. V., Rosenthal, E.: Path resolution with link deletion. Proc. of 9th IJCAI, Los Angeles (1985)
Murray, N. V., Rosenthal, E.: Dissolution: making paths vanish. J. ACM 40 (1993)
Ohlbach, H. J.: Ein regelbasiertes Klauselgraph Beweisverfahren. Proc. of German Conference on AI, GWAI-83 (1983) Springer Verlag IFB vol 76
Ohlbach, H. J.: Theory unification in abstract clause graphs. Proc. of German Conf. on AI, GWAI-85 (1985) Springer Verlag IFB vol 118
Ohlbach, H. J.: Link inheritance in abstract clause graphs. J. Autom. Reasoning 3 (1987)
Ohlbach, H. J., Siekmann, J.: The Markgraf Karl refutation procedure. In: J. L. Lassez, G. Plotkin, Computational Logic (1991) MIT Press, Cambridge MA
Omodeo, E. G.: The linked conjunct method for automatic deduction and related search techniques. Computers and Mathematics with Applications 8 (1982) 185–203
Ramesh, A., Beckert, B., Hähnle, R., Murray, N. V.: Fast subsumption checks using anti-links. J. Autom. Reasoning 18 (1997) 47–83
Robinson, J. A.: A machine-oriented logic based on the resolution principle. J. ACM 12 (1965) 23–41
Shostak, R. E.: Refutation Graphs. J. Artificial Intelligence 7 (1976) 51–64
Shostak, R. E.: A Graph-Theoretic View of Resolution Theorem-Proving. Report SRI International, Menlo Park (1979)
Sickel, S.: A Search Technique for Clause Interconnectivity Graphs. IEEE Trans. Comp. C-25 (1976) 823–835
Siekmann, J. H., Stephan, W.: Completeness and Soundness of the Connection Graph Proof Procedure. Bericht 7/76, Fakultät Informatik, Universität Karlsruhe (1976). Also in Proceedings of AISB/GI Conference on Artificial Intelligence, Hamburg (1978)
Siekmann, J. H., Stephan, W.: Completeness and Consistency of the Connection Graph Proof Procedure. Interner Bericht Institut I, Fakultät Informatik, Universität Karlsruhe (1980)
Siekmann, J. H., Wrightson, G.: Paramodulated connection graphs. Acta Informatica 13 (1980)
Siekmann, J. H., Wrightson, G.: Automation of Reasoning. Springer-Verlag, Berlin, Heidelberg, New York. Vol 1 and vol 2 (1983)
Siekmann, J. H., Wrightson, G.: Erratum: A counterexample to W. Bibel’s and E. Eder’s strong completeness result for connection graph resolution. J. ACM 48 (2001) 145
Smolka, G.: Completeness of the connection graph proof procedure for unit refutable clause sets. In Proceedings of GWAI-82. Informatik Fachberichte, vol. 58. Springer-Verlag, Berlin, Germany (1982a) 191–204.
Smolka, G.: Einige Ergebnisse zur Vollständigkeit der Beweisprozedur von Kowalski. Diplomarbeit, Fakultät Informatik, Universität Karlsruhe (1982b)
Smolka, G.: Completeness and confluence properties of Kowalski’s clause graph calculus (1982c) SEKI report SR-82-03, University of Karlsruhe, Germany
Stickel, M.: A Non-Clausal Connection-Graph Resolution Theorem-Proving Program. Proceedings AAAI-82, Pittsburgh (1982) 229–233
Walther, Chr.: Elimination of redundant links in extended connection graphs. Proc of German Workshop on AI, GWAI-81 (1981) Springer Verlag, Fachberichte vol 47


Wos, L. T., Carson, D. F., Robinson, G. A.: The Unit Preference Strategy in Theorem Proving. AFIPS Conf. Proc. 26 (1964) Spartan Books, Washington. Also in Siekmann and Wrightson [1983], 387–396.
Wos, L. T., Robinson, G. A., Carson, D. F.: Efficiency and Completeness of the Set of Support Strategy in Theorem Proving. J. ACM 12 (1965) 536–541. Also in Siekmann and Wrightson [1983], 484–492
Wos, L. T., et al.: Automated Reasoning: Introduction and Applications (1984) Englewood Cliffs, New Jersey, Prentice-Hall
Wrightson, G.: A pragmatic strategy for clause graphs or the strong completeness of connection graphs. Report 98-3, Dept Comp. Sci., Univ of Newcastle, Australia (1989)
Yarmush, D. L.: The linked conjunct and other algorithms for mechanical theorem-proving. Technical Report IMM 412, Courant Institute of Mathematical Sciences, New York University (1976)
Yates, R. A., Raphael, B., Hart, T. P.: Resolution Graphs. Artificial Intelligence 1 (1970) 257–289.

Meta-reasoning: A Survey

Stefania Costantini

Dipartimento di Informatica, Università degli Studi di L’Aquila, via Vetoio Loc. Coppito, I-67100 L’Aquila, Italy
[email protected]

Abstract. We present the basic principles and possible applications of systems capable of meta-reasoning and reflection. After a discussion of the seminal approaches, we outline our own perception of the state of the art, mainly but not only in computational logic and logic programming. We review relevant successful applications of meta-reasoning, and the basic underlying semantic principles.

1 Introduction

The meaning of the term “meta-reasoning” is “reasoning about reasoning”. In a computer system, this means that the system is able to reason about its own operation. This is different from performing object-level reasoning, which refers in some way to entities external to the system. A system capable of meta-reasoning may be able to reflect, or introspect, i.e. to shift from meta-reasoning to object-level reasoning and vice versa.

We present the main principles and the possible applications of meta-reasoning and reflective systems. After a review of the relevant approaches, mainly in computational logic and logic programming, we discuss the state of the art and recent interesting applications of meta-reasoning. Finally, we briefly summarize the semantic foundations of meta-reasoning.

We necessarily express our own partial point of view on the field and provide the references that we consider the most important. There are previous good reviews on this subject, to which we are indebted and to which we refer the reader for a wider perspective and a careful discussion of problems, foundations, languages, approaches, and systems. We especially mention [1], [2], [3]. Also, the reader may refer, for the computational logic aspects, to the Proceedings of the Workshops on Meta-Programming in Logic [4], [5], [6], [7], [8]. Much significant work on Meta-Programming was carried out in the Esprit funded European projects Compulog I and II. Some of the results of this work are discussed in the following sections. For a wider report we refer the reader to [9]. More generally, about meta-reasoning in various kinds of paradigms, including object-oriented, functional and imperative languages, the reader may refer to [10], [11], [12].

A.C. Kakas, F. Sadri (Eds.): Computat. Logic (Kowalski Festschrift), LNAI 2408, pp. 253–288, 2002. © Springer-Verlag Berlin Heidelberg 2002


Research about meta-reasoning and reflection in computer science has its roots in principles and techniques developed in logic, since the fundamental work of Gödel and Tarski, for which it may be useful to refer to the surveys [13], [14]. In meta-level approaches, knowledge about knowledge is represented by admitting sentences to be arguments of other sentences, without abandoning the framework of first-order logic. An alternative important approach to formalize knowledge about knowledge is the modal approach that has initially been developed by logicians and philosophers and then has received a great deal of attention in the field of Artificial Intelligence. It aims at formalizing knowledge by a logic language augmented by a modal operator, interpreted as knowledge or belief. Thus, sentences can be expressed to represent properties of knowledge (or belief). The most common modal systems adopt a possible world semantics [15]. In this semantics, knowledge and belief are regarded as propositions specifying the relationship between knowledge expressed in the theory and the external world. For a review of modal and meta-languages, focused on their expressivity, on consistency problems and on the possibility of translating modal languages into a meta-level setting, the reader may refer to [16].

2 Meta-programming and Meta-reasoning

Whatever the underlying computational paradigm, every piece of software included in any system (in the following, we will say software component ) manipulates some kind of data, organized in suitable data structures. Data can be used in various ways: for producing results, sending messages, performing actions, or just updating the component’s internal state. Data are often assumed to denote entities which are external to the software component. Whenever the computation should produce eﬀects that are visible in the external environment, it is necessary to assume that there exists a causal connection between the software system and the environment, in the sense that the intended eﬀect is actually achieved, by means of suitable interface devices. This means, if the software component performs an action in order, for instance, either to print some text, or to send an e-mail message, or to switch a light on, causal connection should guarantee that this is what actually happens. There are software components however that take other programs as data. An important well-known example is a compiler, which manipulates data structures representing the source program to be translated. A compiler can be written in the language it is intended to translate (for instance, a C compiler can be written in C), or in a diﬀerent language as well. It is important to notice that in any case there is no mixture between the compiler and the source program. The compiler performs a computation whose outcome is some transformed form of the source program. The source program is just text, recorded in a suitable data structure, that is step by step transformed into other representations. In essence, a compiler accepts and manipulates a description of the source program.
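The idea that a program can take another program as data can be made concrete with a few lines of code. The following sketch is an illustration, not something taken from the survey: it uses Python's standard `ast` module (and `ast.unparse`, which assumes Python 3.9 or later), and the sample source text and the class name `ReplaceTwoByThree` are made up for the example. The object program is held as plain text, turned into a description (a syntax tree), inspected, and transformed, without ever being executed.

```python
import ast

# The "source program": just text, as far as the manipulating program is concerned.
SOURCE = "a = b - 2\nprint(a)\n"

# A description of the source program as a data structure (a syntax tree).
tree = ast.parse(SOURCE)

# Inspection: collect the identifiers that occur in the described program.
names = sorted({node.id for node in ast.walk(tree) if isinstance(node, ast.Name)})
print(names)  # ['a', 'b', 'print']


class ReplaceTwoByThree(ast.NodeTransformer):
    """Transformation: rewrite every literal 2 in the description into 3."""

    def visit_Constant(self, node):
        if node.value == 2:
            return ast.copy_location(ast.Constant(value=3), node)
        return node


transformed = ReplaceTwoByThree().visit(ast.parse(SOURCE))
print(ast.unparse(transformed))
```

As with a compiler, there is no mixture between the manipulating program and the manipulated one: `SOURCE` itself is left untouched, and only its description is rewritten.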


In logic, a language that takes sentences of another language as its objects of discourse is called a meta-language. The other language is called the object language. A clear separation between the object language and the meta-language is necessary: namely, it consists in the fact that sentences written in the meta-language can refer to sentences written in the object language only by means of some kind of description, or encoding, so that sentences written in the object language are treated as data. As is well known, Kurt Gödel developed a technique (gödelization) for coding the formulas of the theory of arithmetic by means of numbers (Gödel numbers). Thus, it became possible to write formulas for manipulating other formulas, the latter represented by the corresponding Gödel numbers. In this view a compiler is a meta-program, and writing a compiler is more than just programming: it is meta-programming. The language in which the compiler is written acts as a meta-language. The language in which the source program is written acts as the object language. More generally, all tools for program analysis, debugging and transformation are meta-programs. They perform a kind of meta-programming that can be called syntactic meta-programming. Syntactic meta-programming can be particularly useful for theorem proving. In fact, as first stressed in [17] and [18], many lemmas and theorems are actually meta-theorems, asserting the validity of a fact by simply looking at its syntactic structure. In this case a software component, namely the theorem prover, consists of two different parts: one, that we call the object level, where proofs are performed by repeatedly applying the inference rules; another one, that we call the meta-level, where meta-theorems are stated. We may notice that a theorem prover is an “intelligent” system that performs deduction, which is a form of (mechanized) “reasoning”.
Then, we can say that the theorem prover at the object level performs “object-level reasoning”. Meta-theorems take as arguments the description of object-level formulas and theorems, and meta-level proofs manipulate these descriptions. Then, at the meta-level the system performs reasoning about entities that are internal to the system, as opposed to object-level reasoning that concerns entities denoting elements of some external domain. This is why we say that at the meta-level the theorem prover performs “meta-level reasoning”, or shortly meta-reasoning. Meta-theorems are a particular kind of meta-knowledge, i.e. knowledge about properties of the object-level knowledge. The object and the meta-level can usefully interact: meta-theorems can be used in order to shorten object-level proofs, thus improving the eﬃciency of the theorem prover, which can derive proofs more easily. In this view, meta-theorems may constitute auxiliary inference rules that enhance (in a pragmatic view) the “deductive power” of the system [19] [20]. Notice that, at the meta-level, new meta-theorems can also be proved, by applying suitable inference rules. As pointed out in [21], most software components implicitly incorporate some kind of meta-knowledge: there are pieces of object-level code that “do” something in accordance to what meta-knowledge states. For instance, an object-level planner program might “know” that near(b,a) holds whenever near(a,b) holds,


while this is not the case for on(a,b). A planner with a meta-level could explicitly encode a meta-rule stating that whenever a relation R is symmetric, then R(a, b) is equivalent to R(b, a) and whenever instead a relation is antisymmetric this is never the case. So, at the meta-level, there could be statements that near is symmetric and on is antisymmetric. The same results could then be obtained by means of explicit meta-reasoning, instead of implicit “knowledge” hidden in the code. The advantage is that the meta-reasoning can be performed in the same way for any symmetric and antisymmetric relation that one may have. Other properties of relations might be encoded at the meta-level in a similar way, and such a meta-level specification (which is independent of the specific object-level knowledge or application domain) might be reused in future applications.

There are several possible architectures for meta-knowledge and meta-reasoning, and many applications. Some of them are reviewed later. For a wider perspective however, the reader may refer to [22], [23], [24], [25], [20], [26], [27], [28], [29], [30], [31], [32], [33] where various specific architectures, applications and systems are discussed.
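The planner example can be sketched in a few lines. This is a minimal illustration under assumptions of our own: the fact base, the table `RELATION_PROPERTIES` and the function `holds` are hypothetical names, and the meta-rule is implemented directly as a lookup on the name of the relation rather than in a logic meta-language.

```python
# Object-level knowledge: facts of the form (relation, first, second).
FACTS = {("near", "a", "b"), ("on", "c", "d")}

# Meta-level knowledge: properties stated about the relations themselves.
RELATION_PROPERTIES = {"near": "symmetric", "on": "antisymmetric"}


def holds(relation, x, y):
    """Object-level query answered with the help of one meta-rule."""
    if (relation, x, y) in FACTS:
        return True
    # Meta-rule: if R is symmetric, then R(x, y) is equivalent to R(y, x).
    if RELATION_PROPERTIES.get(relation) == "symmetric":
        return (relation, y, x) in FACTS
    return False


print(holds("near", "b", "a"))  # True, derived through the meta-rule
print(holds("on", "d", "c"))    # False, on is not symmetric

# Reuse: a new symmetric relation needs one meta-level statement, no new code.
RELATION_PROPERTIES["adjacent"] = "symmetric"
FACTS.add(("adjacent", "room1", "room2"))
print(holds("adjacent", "room2", "room1"))  # True
```

The point of the design is the last three lines: the meta-rule is written once and applies to any relation declared symmetric, instead of being hidden in code specific to near.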

3 Reification

Meta-level rules manipulate a representation of object-level knowledge. Since knowledge is represented in some kind of language, meta-rules actually manipulate a representation of syntactic expressions of the object-level language. In analogy with natural language, such a representation is usually called a name of the syntactic expression. The difference between a word of the language, such as for instance flower, and a name, like “flower”, is the following: the word is used to denote an entity of the domain/situation we are talking about; the name denotes the word, so that we can say that “flower” is composed of six characters, is expressed in English and its translation into Italian is “fiore”. That is, a word can be used, while a name can be inspected (for instance to count the characters) and manipulated (for instance translated). An expression in a formal language may have different kinds of names that allow different kinds of meta-reasoning to be made on that expression. Names are expressions of the meta-language. Taking for instance an equation such as

a = b − 2

we may have a very simple name, like in natural language, i.e.

“a = b − 2”

This kind of name, called quotation mark name, is usually intended as a constant of the meta-language.

Meta-reasoning: A Survey

A name may instead be a complex term, such as:

    equation(left_hand_side(variable(“a”)),
             right_hand_side(binop(minus, firstop(variable(“b”)), secondop(constant(“2”)))))

This term describes the equation in terms of its left-hand side and right-hand side and then describes the right-hand side as the application of a binary operator (binop) on two operands (firstop and secondop) where the first operand is a variable and the second one a constant. “a”, “b” and “2” are constants of the meta-language; they are the names of the expressions a, b and 2 of the object language. This more complex name, called a structural description name, makes it easier to inspect the expression (for instance to see whether it contains variables) and to manipulate it (for instance it is possible to transform this name into the name of another equation, by modifying some of the composing terms).

Of course, many variations are possible in how detailed names are, and what kind of detail they express. Also, many choices can be made about what names should be: for instance, the name of a variable can be a meta-constant, but can also be a meta-variable. For a discussion of different possibilities, with their advantages and disadvantages, see [34], [35], [36].

The definition of names, being a relation between object-level expressions and meta-level expressions that play the role of names, is usually called naming relation. Which naming relation to choose? In general, it depends upon the kind of meta-reasoning one wants to perform. In fact, a meta-theory can only reason about the properties of object-level expressions made explicit by the naming relation. We may provide names to any language expression, from the simplest to the more complex ones. In a logic meta-language, we may have names for variables, constants, function and predicate symbols, terms and atoms and even for entire theories: the meta-level may in principle encode and reason about the description of several object-level theories.
In practice, there is a trade-off between expressivity and simplicity. In fact, names should be kept as simple as possible, to reduce the complexity (and improve the readability) of meta-level expressions. Starting from these considerations, [37] argues that the naming relation should be adapted to each particular case and therefore should be definable by the user. In [38] it is shown that two different naming relations can coexist in the same context, for different purposes, also providing operators for transforming one representation into the other.

The definition of a naming relation implies the definition of two operations: the first computes the name of a given language expression; the second computes the expression a given name stands for. The operation of obtaining the name of an object-level expression is called reification or referentiation or quoting. The inverse operation is called dereification or dereferentiation or unquoting. These are built-in operations, whose operational semantics consists in applying the naming relation in the two directions.

Stefania Costantini

In [39] it is shown how the naming relation can be a sort of input parameter for a meta-language. That is, a meta-language may be, if carefully designed, to a large extent independent of the syntactic form of names, and of the class of expressions that are named. Along this line, in [36] and [33] a full theory of deﬁnable naming relations is developed, where a naming relation (with some basic properties) can be deﬁned as a set of equations, with the associated rewrite system for applying referentiation/dereferentiation.
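
The two kinds of names and the two directions of the naming relation can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: object-level expressions are encoded as nested tuples, and the function names show, quotation_name, reify, dereify and contains_variables are hypothetical choices, not the operators of any of the cited systems.

```python
# Object-level expressions (hypothetical encoding): nested tuples
#   ("var", "a"), ("const", 2), ("minus", lhs, rhs), ("eq", lhs, rhs)
EQUATION = ("eq", ("var", "a"), ("minus", ("var", "b"), ("const", 2)))


def show(expr):
    """Printed form of an object-level expression."""
    tag = expr[0]
    if tag in ("var", "const"):
        return str(expr[1])
    op = {"eq": "=", "minus": "-"}[tag]
    return f"{show(expr[1])} {op} {show(expr[2])}"


def quotation_name(expr):
    """Quotation mark name: an opaque meta-level constant."""
    return '"' + show(expr) + '"'


def reify(expr):
    """Structural description name: a meta-level term exposing the structure."""
    tag = expr[0]
    if tag == "var":
        return ("variable", quotation_name(expr))
    if tag == "const":
        return ("constant", quotation_name(expr))
    if tag == "minus":
        return ("binop", "minus", ("firstop", reify(expr[1])),
                ("secondop", reify(expr[2])))
    if tag == "eq":
        return ("equation", ("left_hand_side", reify(expr[1])),
                ("right_hand_side", reify(expr[2])))
    raise ValueError(f"unknown expression: {expr!r}")


def dereify(name):
    """Inverse direction: recover the expression a structural name stands for."""
    tag = name[0]
    if tag == "variable":
        return ("var", name[1].strip('"'))
    if tag == "constant":
        return ("const", int(name[1].strip('"')))
    if tag == "binop":
        return (name[1], dereify(name[2][1]), dereify(name[3][1]))
    if tag == "equation":
        return ("eq", dereify(name[1][1]), dereify(name[2][1]))
    raise ValueError(f"unknown name: {name!r}")


def contains_variables(name):
    """Meta-level inspection that the structural name makes straightforward."""
    if name[0] == "variable":
        return True
    return any(contains_variables(part)
               for part in name[1:] if isinstance(part, tuple))


print(quotation_name(EQUATION))
print(reify(EQUATION))
```

The quotation mark name can only be compared or printed, whereas the structural name supports inspection such as contains_variables. Applying dereify to the result of reify gives back the original expression, which mirrors the idea that the naming relation is applied in the two directions.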

4 Introspection and Reflection

The idea that meta-knowledge and meta-reasoning could be useful for improving the reasoning performed at the object level (for instance by exploiting properties of relations, like symmetry), suggests that the object and the meta-level should interact. In fact, the object and the meta-level can be seen as diﬀerent software components that interact by passing the control to each other. At the object level, the operation of referentiation allows an expression to be transformed into its name and this name can be given as input argument to a meta-level component. This means that object-level computation gives place to meta-level computation. This computational step is called upward reﬂection, or introspection, or shift up. Upward because the meta-level is considered to be a “higher level” with respect to the object level. Reﬂection, or introspection, because the object level component suspends its activity, in order to initiate a meta-level one. This is meant to be in analogy with the process by which people become conscious (at the meta-level of mind) of mental states they are currently in (at the object level). The inverse action, that consists in going back to the object-level activity, is called downward reﬂection, or shift down. The object-level activity can be resumed from where it had been suspended, or can be somehow restarted. Its state (if any) can be the same as before, or can be altered, according to the meta-level activity that has been performed. Downward reﬂection may imply that some name is dereferenced and the resulting expression (“extracted” from the name) given as input argument to the resumed or restarted object-level activity. In logical languages, upward and downward reﬂection can be speciﬁed by means of special inference rules (reﬂection rules) or axioms (reﬂection axioms), that may also state what kind of knowledge is exchanged. 
In functional and procedural languages, part of the run-time state of the ongoing object-level computation can be reified and passed to a meta-level function/procedure that can inspect and modify this state. When this function terminates, object-level computation resumes on a possibly modified state.

A reflection act, which shifts the level of the activity between the object and the meta-level, may be: explicit, in the sense that it is either invoked by the user (in interactive systems) or determined by some kind of specification explicitly present in the text of the theory/program; implicit, in the sense that it is automatically performed upon occurrence of certain predefined conditions. Explicit and implicit reflection may co-exist.

Both forms of reflection rely on the requirement of causal connection or, equivalently, of introspective fidelity: that is, the recommendations of the meta-level must always be followed at the object level. For instance, in the procedural case, the modifications to the state performed at the meta-level are effective and have a corresponding impact on the object-level computation. The usefulness of reflection consists exactly in the fact that the overall system (object + meta-levels) not only reasons about itself, but is also properly affected by the results of that reasoning.

In summary, a meta-level architecture for building software components has to provide the possibility of defining a meta-level that, by means of a naming relation, can manipulate the representation of object-level expressions. Notice that the levels may be several: beyond the meta-level there may be a meta-meta-level that uses a naming relation representing meta-level expressions. Similarly, we can have a meta-meta-meta-level, and so on. Also, we may have one object level and several independent meta-levels with which the object level may be from time to time associated, for performing different kinds of meta-reasoning.

The architecture may provide a reflection mechanism that allows the different levels to interact. If the reflection mechanism is not provided, then the computation is performed at the meta-level, which simulates the object-level formulas through the naming relation and simulates the object-level inference rules by means of meta-level axioms. As discussed later, this is the case in many of the main approaches to meta-reasoning. The languages in which the object level and the meta-level(s) are expressed may be different, or they may coincide.
For instance, we may have a meta-level based on a first-order logic language, where meta-reasoning is performed about an object level based on a functional or imperative language. Sometimes the languages coincide: the object language and the meta-language may in fact be the same one. In this case, this language is expressive enough to explicitly represent (some of) its own syntactic expressions, i.e. the language is capable of self-reference. An interesting and deep discussion of languages with self-reference can be found in [40] and [41]. The role of introspection in reasoning is discussed in [42] and [43]. An interesting contribution about reflection and its applications is [44].
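
The procedural form of reflection described in this section (reify part of the run-time state, hand it to a meta-level procedure, resume on the possibly modified state) can be sketched in Python as follows. The step format, the function run and the meta-level procedure cap_counter are hypothetical and serve only to illustrate an explicit reflection act with causal connection.

```python
def run(program, meta_procedures):
    """Run a list of object-level steps over an explicit state.

    Each step is ("set", name, value), ("add", name, amount) or
    ("reflect", meta_name). The last form is an explicit reflection act.
    """
    state = {}
    for step in program:
        if step[0] == "set":
            state[step[1]] = step[2]
        elif step[0] == "add":
            state[step[1]] += step[2]
        elif step[0] == "reflect":
            # Upward reflection: the object level is suspended and a
            # representation (here, a copy) of its state is handed over.
            reified = dict(state)
            revised = meta_procedures[step[1]](reified)
            # Downward reflection with causal connection: the state returned
            # by the meta-level replaces the object-level state.
            state = dict(revised)
    return state


def cap_counter(reified_state):
    """Meta-level procedure: inspect the state and revise it if needed."""
    if reified_state.get("counter", 0) > 10:
        reified_state["counter"] = 10
        reified_state["capped"] = True
    return reified_state


program = [("set", "counter", 8), ("add", "counter", 5),
           ("reflect", "cap"), ("add", "counter", 1)]
final_state = run(program, {"cap": cap_counter})
print(final_state)
```

Because the object-level loop continues from the state returned by cap_counter, the change made at the meta-level affects the rest of the computation, which is the property the text calls causal connection.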

5 Seminal Approaches

5.1 FOL

FOL [19], standing for First Order Logic, was (to the best of our knowledge) the first reflective system to appear in the literature. It is a proof checker based on natural deduction, where knowledge and meta-knowledge are expressed in different contexts. The user can access these contexts both for expressing and for inferring new facts.

The FOL system consists of a set of theories, called contexts, based on a first-order language with sorts and conditional expressions. A special context named META describes the proof theory and some of the model theory of FOL contexts. Given a specific context C that we take as the object theory, the naming relation is defined by attachments, which are user-defined explicit definitions relating symbols and terms in META with their interpretation in C. The connection between C and META is provided by a special linking rule that is applicable in both directions:

$$\frac{\mathit{Theorem}(\text{“}W\text{”})}{W}$$

where W is any formula in the object theory C, “W” is its name, and Theorem(“W”) is a fact in the meta-theory. By means of a special primitive, called REFLECT, the linking rule can be explicitly applied by the user. Its effect is either that of reflecting up a formula W to the meta-theory, to derive meta-theorems involving “W”, or vice versa that of reflecting down a meta-theorem “W”, so that W becomes a theorem of the theory. Meta-theorems can therefore be used as subsidiary deduction rules. Interesting applications of the FOL system to mathematical problems can be found in [17], [45].

5.2 Amalgamating Language and Meta-language in Logic Programming

A seminal approach to reflection in the context of the Horn clause language is MetaProlog, proposed by Bowen and Kowalski [46]. The proposal is based on representing Horn clause syntax and provability in the logic itself, by means of a meta-interpreter, i.e. an interpreter of the Horn clause language written in the Horn clause language itself. Therefore, also in this case the object language and the meta-language coincide.

The concept (and the first implementation) of a meta-interpreter was introduced by John McCarthy for the LISP programming language [47]. McCarthy in particular defined a universal function, written in LISP, which represents the basic features of a LISP interpreter. In particular, the universal function is able to: (i) accept as input the definition of a LISP function, together with the list of its arguments; (ii) evaluate the given function on the given arguments.

Bowen and Kowalski, with MetaProlog, have developed this powerful and important idea in the field of logic programming, where the inference process is based on building proofs from a given theory, rather than on evaluating functions. The Bowen and Kowalski meta-interpreter is specified via a predicate Demo, which is defined by a set of meta-axioms Pr, where the relevant aspects of Horn-clause provability are made explicit. The Demo predicate takes as first argument the representation (name) of an object-level theory T and as second argument the representation (name) of a goal A. Demo(“T”, “A”) means that the goal A is provable in the theory T.

With the above formulation, we might have an approach where inference is performed at the meta-level (via invocation of Demo) and the object level is simulated, by providing Demo with a suitable description “T” of an object theory T. The strength and originality of MetaProlog lie instead in the amalgamation between the object level and the meta-level. It consists in the introduction of the following linking rules for upward and downward reflection:

$$\frac{T \vdash_L A}{Pr \vdash_M \mathit{Demo}(\text{“}T\text{”}, \text{“}A\text{”})} \qquad \frac{Pr \vdash_M \mathit{Demo}(\text{“}T\text{”}, \text{“}A\text{”})}{T \vdash_L A}$$

where $\vdash_M$ means provability at the meta-level M and $\vdash_L$ means provability at the object level L. The application of the linking rules coincides, in practice, with the invocation of Demo, i.e., reflection is explicit.

Amalgamation allows mixed sentences: there can be object-level sentences where the invocation of Demo determines a shift up to the meta-level, and meta-level sentences where the invocation of Demo determines a shift down to the object level. Since moreover the theory in which deduction is performed is an input argument of Demo, several object-level and meta-level theories can co-exist and can be used in the same inference process.

Although the extension is conservative, i.e. all theorems provable in L+M are provable either in L or in M alone, the gain of expressivity, in practical terms, is great. Many traditional problems in knowledge representation find here a natural formulation. The extension can be made non-conservative, whenever additional rules are added to Demo, to represent auxiliary inference rules and deduction strategies. Additional arguments can be added to Demo for integrating forms of control in the basic definition of provability. For instance it is possible to control the amount of resources consumed by the proof process, or to make the structure of the proof explicit.

The semantics of the Demo predicate is, however, not easy to define (see e.g. [35], [48], [49], [50]), and holds only if the meta-theory and the linking rules provide an extension to the basic Horn clause language which is conservative, i.e., only if Demo is a faithful representation of Horn clause provability. Although the amalgamated language is far more expressive than the object language alone, enhanced meta-interpreters are (semantically) ruled out, since in that case the extension is non-conservative.
In practice, the success of the approach has been great: enhanced meta-interpreters are used everywhere in logic programming and artificial intelligence (see for instance [51], or any other logic programming textbook). This seminal work has initiated the whole field of meta-programming in logic programming and computational logic. Problems and promises of this field are discussed by Kowalski himself in [52], [53]. The approach of meta-interpreters and other relevant applications of meta-programming are discussed in the next section.
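
The following Python sketch illustrates three points made in this subsection: the theory is an input argument of Demo, a sentence of one theory may itself invoke Demo on another theory, and extra arguments can bound resources and make the proof structure explicit. It is a propositional simulation under assumptions of our own: the THEORIES table, the ("demo", theory, goal) literal and the max_depth parameter are hypothetical, and the sketch is not MetaProlog syntax nor a faithful account of its linking rules.

```python
THEORIES = {
    "t1": [("q", ["p", "s"]), ("p", []), ("s", [])],
    # A mixed sentence: r holds in t2 if q is provable in theory t1.
    "t2": [("r", [("demo", "t1", "q")])],
}


def demo(theory, goal, max_depth=20):
    """Return a proof tree for goal in the named theory, or None.

    A proof tree is (theory, goal, [subproofs]). max_depth bounds the
    resources that the proof may consume.
    """
    if max_depth <= 0:
        return None
    if isinstance(goal, tuple) and goal[0] == "demo":
        # The goal itself invokes Demo: shift to the theory it names.
        _, other_theory, other_goal = goal
        inner = demo(other_theory, other_goal, max_depth - 1)
        return None if inner is None else (theory, goal, [inner])
    for head, body in THEORIES[theory]:
        if head != goal:
            continue
        subproofs = []
        for literal in body:
            proof = demo(theory, literal, max_depth - 1)
            if proof is None:
                break
            subproofs.append(proof)
        else:
            # Every body literal was proved: the clause yields a proof.
            return (theory, goal, subproofs)
    return None


print(demo("t1", "q"))
print(demo("t2", "r"))
print(demo("t1", "q", max_depth=1))   # the resource bound is too small
```

With the default bound both goals succeed and the returned trees record which theory each step was proved in. With max_depth=1 the same goal fails, which shows how an extra argument turns plain provability into resource-bounded provability.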

5.3 3-LISP

3-Lisp [54] is another important example of a reflective architecture where the object language and meta-language coincide. 3-Lisp is a meta-interpreter for Lisp (and therefore it is an elaboration of McCarthy’s original proposal) where (the interesting aspects of) the state of the program that is being interpreted are not stored, but are passed as an argument to all the functions that are internal to the meta-interpreter. Then, each of these procedures takes the state as argument, makes some modification and passes the modified state to another internal procedure. These procedures call each other tail-recursively (i.e. the next procedure call is the last action they make) so that the state always remains explicit. Such a meta-interpreter is called a meta-circular interpreter. If one assumes that the meta-circular interpreter is itself executed by another meta-circular interpreter and so on, one can imagine a potentially infinite tower of interpreters, the lowest one executing the object-level program (see the summary and formalization of this approach presented in [55]).

Here, the meta-level is accessible from the object level at run-time through a reflection act represented by a special kind of function invocation. Whenever the object-level program invokes any function f in this special way, f receives as an additional parameter a representation of the state of the program itself. Then, f can inspect and/or modify the state, before returning control to object-level execution. A reflective act therefore implies the reification of the state and the execution of f as if it were a procedure internal to the interpreter. Since f might in turn contain a reflection act, the meta-circular interpreter is able to reify its own state and start a brand-new copy of itself. In this approach one might in principle perform, via reflection, an infinite regress on the reflective tower of interpreters.
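
The idea of an interpreter that threads its state through every internal call, and of a reflective invocation that hands this state to a user function, can be sketched in Python as below. This is a deliberately small illustration under our own assumptions: the tuple-based expression format and the names evaluate and double_x are hypothetical, the code is not tail-recursive, and it does not model the tower of interpreters.

```python
def evaluate(expr, state):
    """Evaluate expr in state and return (value, new_state).

    The state (here just a dictionary of variable bindings) is never stored
    globally: it is threaded through every call of the interpreter.
    """
    if isinstance(expr, (int, float)):
        return expr, state
    if isinstance(expr, str):                      # variable reference
        return state[expr], state
    op = expr[0]
    if op == "set":                                # ("set", name, expr)
        value, state = evaluate(expr[2], state)
        return value, {**state, expr[1]: value}
    if op == "add":                                # ("add", expr, expr)
        left, state = evaluate(expr[1], state)
        right, state = evaluate(expr[2], state)
        return left + right, state
    if op == "seq":                                # ("seq", expr, ...)
        value = None
        for sub in expr[1:]:
            value, state = evaluate(sub, state)
        return value, state
    if op == "reflect":                            # ("reflect", function)
        # Reflective act: the function receives the reified state and runs
        # as if it were internal to the interpreter.
        return expr[1](state)
    raise ValueError(f"unknown expression: {expr!r}")


def double_x(state):
    """Reflective procedure: inspects and modifies the interpreter state."""
    new_state = {**state, "x": state["x"] * 2}
    return new_state["x"], new_state


program = ("seq",
           ("set", "x", 3),
           ("reflect", double_x),
           ("add", "x", 1))
value, final_state = evaluate(program, {})
print(value, final_state)
```

The function double_x has the same signature as the interpreter's own cases (state in, value and state out), so interpretation continues on whatever state it returns.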
A program is thus able to interrupt its computation, to change something in its own state, and to continue with a modified interpretation process. This kind of mechanism is called computational reflection. The semantics of computational reflection is procedural, however, rather than declarative. A reflective architecture conceptually similar to 3-Lisp has been proposed for the Horn clause language and has been fully implemented [56].

Although very procedural in nature, and not easy to understand in practice, computational reflection has enjoyed great success in the last few years, especially in the context of imperative and object-oriented programming [11], [12]. Some authors even propose computational reflection as the basis of a new programming paradigm [57]. Since computational reflection can be perceived as the only way of performing meta-reasoning in non-logical paradigms, this success highlights once more how important meta-reasoning is, especially for complex applications.

5.4 Other Important Approaches

The amalgamated approach has been experimented with by Attardi and Simi in Omega [58]. Omega is an object-oriented formalism for knowledge representation which can deal with meta-theoretical notions by providing objects that describe Omega objects themselves and derivability in Omega.

A non-amalgamated approach in logic programming is that of the Gödel language, where object theory and meta-theory are distinct. Gödel provides a (conservative) provability predicate, and an explicit form of reflection. The language has been developed and experimented with in the context of the Compulog European project. It is described in the book [59]. In [60] a contribution to meta-programming in Gödel is proposed, on two aspects: on the one hand, a programming style for efficient meta-programming is outlined; on the other hand, modifications to the implementation are proposed, in order to improve the performance of meta-programs.

A project that extends and builds on both FOL and 3-Lisp is GETFOL [61], [62]. It is developed on top of a novel implementation of FOL (therefore the approach is not amalgamated: the object theory and meta-theory are distinct). GETFOL is able to introspect its own code (lifting), to reason deductively about it in a declarative meta-theory and, as a result, to produce new executable code that can be pushed back to the underlying interpretation (flattening). The architecture is based on a sharp distinction between deduction (FOL style) and computation (3-Lisp style). Reflection in GETFOL gives access to a meta-theory where many features of the system are made explicit, even the code that implements the system itself. The main objective of GETFOL is that of implementing theorem-provers, given its ability to implement flexible control strategies to be adapted (via computational reflection) to the particular situation. Similarly to FOL, the kind of reasoning performed in GETFOL consists in: (i) performing some reasoning at the meta-level; (ii) using the results of this reasoning to assert facts at the object level.
An interesting extension is that of applying this concept to a system with multiple theories and multiple languages (each theory formulated in its own language) [63], where the two steps are reinterpreted as (i) doing some reasoning in one theory and (ii) jumping into another theory to do some more reasoning on the basis of what has been derived in the previous theory. These two deductions are concatenated by the application of bridge rules, which are inference rules where the premises belong to the language of the former theory, and the conclusion belongs to the language of the latter.

A different concept of reflection is embodied in Reflective Prolog [39], [64], [65], a self-referential Horn clause language with logical reflection. The objective of this approach is that of developing a more expressive and powerful language, while preserving the essential features of logic programming: Horn clause syntax, model-theoretic semantics, resolution via unification as procedural semantics, correctness and completeness properties. In Reflective Prolog, Horn clauses are extended with self-reference and resolution is extended with logical reflection, in order to achieve greater expressive and inference power. The reflection mechanism is implicit, i.e., the interpreter of the language automatically reflects upwards and downwards by applying suitable linking rules called reflection principles. This allows reasoning and meta-reasoning to interleave without the user’s intervention, so as to exploit both knowledge and meta-knowledge in proofs: in most of the other approaches instead, there is one level which is “first-class”, where deduction is actually performed, and the other level plays a secondary role.

Reflection principles are embedded in both the procedural and the declarative semantics of the language, that is, in the extended resolution procedure which is used by the interpreter and in the construction of the models which give meanings to programs. Procedurally, this implies that there is no need to axiomatize provability in the meta-theory. Object-level reasoning is not simulated by meta-interpreters, but directly executed by the language interpreter, thus avoiding unnecessary inefficiency. Semantically, a theory composed of an object level and (one or more) meta-levels is regarded as an enhanced theory, enriched by new axioms which are entailed by the given theory and by the reflection principles interpreted as axiom schemata. Therefore, in Reflective Prolog, language and meta-language are amalgamated in a non-conservative extension.

Reflection in Reflective Prolog gives access to a meta-theory where various kinds of meta-knowledge can be expressed, either about the application domain or about the behavior of the system. Deduction in Reflective Prolog means using at each step either meta-level or object-level knowledge, in a continuous interleaving between levels. Meta-reasoning in Reflective Prolog implies a declarative definition of meta-knowledge, which is automatically integrated into the inference process. The relation between meta-reasoning in Reflective Prolog and modal logic has been discussed in [66]. An interpreter of Reflective Prolog has been fully implemented [67]. It is interesting to notice that Reflective Prolog has been implemented by means of computational reflection.
This is another demonstration that computational reflection can be a good (although low-level) implementation tool.

An approach that has been successful in the context of object-oriented languages, including the most recent ones like Java, is the meta-object protocol. A meta-object protocol [68], [69] gives every object a corresponding meta-object that is an instance of a meta-class. Then, the behavior of an object becomes the behavior of the object/meta-object pair. At the meta-level, important aspects such as the operational semantics of inheritance, instantiation and method invocation can be defined. A meta-object protocol constitutes a flexible means of modifying and extending an object-oriented language.

This approach has been applied to logic programming, in the ObjVProlog language [70], [71]. In addition to the above-mentioned meta-class capabilities, this language preserves the Prolog capabilities of manipulating clauses in the language itself, and provides a provability predicate. As an example of a more recent application of this approach, a review of Java reflective implementations can be found in [72].

A limitation is that only aspects directly related to objects can be described in a meta-object. Properties of sets of objects, or of the overall system, cannot be directly expressed. Nevertheless, some authors [72] argue that non-functional requirements such as security, fault-tolerance and atomicity can be implemented by implicit reflection to the meta-object before and after the invocation of every object method.
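
The last point, implicit reflection to a meta-object before and after every method invocation, can be sketched with a Python meta-class. This only illustrates the general idea using Python's own meta-class mechanism; the classes AuditMetaObject, Reflective and Account are hypothetical and do not reproduce the protocols of [68], [69] or the Java implementations reviewed in [72].

```python
import functools


class AuditMetaObject:
    """Hypothetical meta-object: records every method invocation."""

    def __init__(self):
        self.log = []

    def before(self, obj, method_name, args):
        self.log.append(("before", method_name, args))

    def after(self, obj, method_name, result):
        self.log.append(("after", method_name, result))


class Reflective(type):
    """Meta-class that routes every public method call through a meta-object."""

    def __new__(mcls, name, bases, namespace):
        for attr, value in list(namespace.items()):
            if callable(value) and not attr.startswith("_"):
                namespace[attr] = mcls._wrap(attr, value)
        return super().__new__(mcls, name, bases, namespace)

    @staticmethod
    def _wrap(method_name, method):
        @functools.wraps(method)
        def wrapper(self, *args):
            # Implicit upward reflection before the call ...
            self.meta.before(self, method_name, args)
            result = method(self, *args)
            # ... and again after it, before control returns to the caller.
            self.meta.after(self, method_name, result)
            return result
        return wrapper


class Account(metaclass=Reflective):
    def __init__(self, balance):
        self.balance = balance
        self.meta = AuditMetaObject()   # the object/meta-object pair

    def deposit(self, amount):
        self.balance += amount
        return self.balance


account = Account(100)
account.deposit(50)
print(account.meta.log)
```

The class Account contains no auditing code of its own. A different meta-object (for access control or retry on failure, say) could be attached in the same way, which is the kind of non-functional requirement the text mentions.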

6 Applications of Meta-reasoning

Meta-reasoning has been widely used for a variety of purposes, and recently the interest in new potential applications of meta-reasoning and reflection has been very significant. In this section, we provide our (necessarily partial and limited) view of some of the more relevant applications in the field.

6.1 Meta-interpreters

After the seminal work of Bowen and Kowalski [46], the most common application of meta-logic in computational logic is to define and to implement meta-interpreters. This technique has been especially used in Prolog (which is probably the most popular logic programming language) for a variety of purposes. The basic version of a meta-interpreter for propositional Horn clause programs, reported in [53], is the following.

    demo(T, P) ← demo(T, P ← Q), demo(T, Q).
    demo(T, P ∧ Q) ← demo(T, P), demo(T, Q).

In the above definition, ’∧’ names conjunction and ’←’ names ’←’ itself. A theory can be named by a list containing the names of its sentences. In the propositional case, formulas and their names may coincide without the problems of ambiguity (discussed below) that arise in the presence of variables. If a theory is represented by a list, then the meta-interpreter must be augmented by the additional meta-axiom:

    demo(T, P) ← member(T, P).

For instance, query ?q to program

    q ← p, s.
    p.
    s.

can be simulated by query ?demo([q ← p ∧ s, p, s], q) to the above meta-interpreter. Alternatively, it is possible to use a constant symbol to name a theory. In this case, the theory, say t1, can be defined by the following meta-level axioms:

    demo(t1, q ← p ∧ s).
    demo(t1, p).
    demo(t1, s).

and the query becomes ?demo(t1, q).
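
The propositional meta-interpreter above can be mirrored in a few lines of Python. This is a minimal sketch under our own encoding assumptions: atoms are strings, the name of an implication is the tuple ("<-", head, body), a conjunction is ("&", left, right), and a theory is a list of such names. The sketch looks up implications directly among the members of the theory and performs no loop checking, so it may not terminate on cyclic programs such as p ← p.

```python
def demo(theory, goal):
    """Propositional provability of goal from a theory given as a list of names."""
    # demo(T, P & Q) <- demo(T, P), demo(T, Q).
    if isinstance(goal, tuple) and goal[0] == "&":
        return demo(theory, goal[1]) and demo(theory, goal[2])
    # demo(T, P) <- member(T, P).
    if goal in theory:
        return True
    # demo(T, P) <- demo(T, P <- Q), demo(T, Q).
    for sentence in theory:
        if (isinstance(sentence, tuple) and sentence[0] == "<-"
                and sentence[1] == goal and demo(theory, sentence[2])):
            return True
    return False


# The program  q <- p, s.  p.  s.  named by a list of sentence names.
program = [("<-", "q", ("&", "p", "s")), "p", "s"]
print(demo(program, "q"))   # the query ?demo([q <- p & s, p, s], q)

# Naming the theory by a constant instead (hypothetical lookup table).
NAMED_THEORIES = {"t1": program}
print(demo(NAMED_THEORIES["t1"], "q"))
```

Representing the theory as data is what makes it possible to pass different theories to the same demo function, or to build a theory at run-time before querying it.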

The meta-axioms defining demo can themselves be regarded as a theory that can be named, by either a list or a constant (say d). Thus, it is possible to write queries like ?demo(d, demo(t1, q)), which asks whether we can derive, by the meta-interpreter d, that the goal q can be proved in theory t1. In many Prolog applications, however, the theory argument is omitted, as in the so-called “Vanilla” meta-interpreter [35]. The standard declarative formulation of the Vanilla meta-interpreter in Prolog is the following (where ’:-’ is the Prolog counterpart of ’←’ and ’&’ indicates conjunction):

    demo(empty).
    demo(X) :- clause(X, Y), demo(Y).
    demo(X & Y) :- demo(X), demo(Y).

For the above object-level program, we should add to the meta-interpreter the unit clauses:

    clause(q, p & s).
    clause(p, empty).
    clause(s, empty).

and the query would be :- demo(q).

The Vanilla meta-interpreter can be used for propositional programs, as well as for programs containing variables. In the latter case, however, there is an important ambiguity concerning variables. In fact, variables in the object-level program are meant to range (as usual) over the domain of the program. These variables are instantiated to object-level terms. Instead, the variables occurring in the definition of the meta-interpreter are intended to range over object-level atoms. Then, in a correct approach these are meta-variables (for an accurate discussion of this problem see [34]). In [35], a typed version of the Vanilla meta-interpreter is advocated and its correctness proved. In [46] and [65], suitable naming mechanisms are proposed to overcome the problem. Since, however, it is the untyped version that is generally used in Prolog practice, some researchers have tried to specify a formal account of the Vanilla meta-interpreter as it is. In particular, a first-order logic with ambivalent syntax has been proposed for this purpose [73], [74] and correctness results have been obtained [75].
The Vanilla meta-interpreter can be enhanced in various ways, often by making use of built-in Prolog meta-predicates that allow Prolog to act as a meta-language of itself. These predicates are in fact aimed at inspecting, building and modifying goals and at inspecting the instantiation status of variables. First, more aspects of the proof process can be made explicit. In the above formalization, unification is implicitly delegated to the underlying Prolog interpreter and so is the order of execution of subgoals in conjunctions. Below is a formulation where these two aspects become explicit. Unification is performed by a unify procedure and reorder rearranges subgoals of the given conjunction.

    demo(empty).
    demo(X) :- clause(H, Y), unify(H, X, Y, Y1), demo(Y1).
    demo(X & Y) :- reorder(X & Y, X1 & Y1), demo(X1), demo(Y1).

Second, extra arguments can be added to demo, to represent for instance: the maximum number of steps that demo is allowed to perform; the actual number of steps that demo has performed; the proof tree; an explanation to be returned to a user and so on. Clearly, the definition of the meta-interpreter will be suitably modified according to the use of the extra arguments. Third, extra rules can enhance the behavior of the meta-interpreter, by specifying auxiliary deduction rules. For instance, the rule

    demo(X) :- ask(X, yes).

states that we consider X to be true, if the user answers “yes” when explicitly asked about X. In this way, the meta-interpreter exhibits an interactive behavior. The auxiliary deduction rules may be several and may interact.

In Reflective Prolog [65], one specifies the additional rules only, while the definition of standard provability remains implicit. In the last example for instance, on failure of goal X, a goal demo(X) would be automatically generated (this is an example of implicit upward reflection), thus employing the additional rule to query the user about X.

An interesting approach to meta-interpreters is that of [76], [77], where a binary predicate demo may answer queries with uninstantiated variables, which represent arbitrary fragments of the program currently being executed. The reader may refer to [51] for an illustration of the meta-interpreter programming techniques and of their applications, including the specification of Expert Systems in Prolog.

6.2 Theory Composition and Theory Systems

Theory construction and combination is an important tool of software engineering, since it promotes modularity, software reuse and programming-in-the-large. In [53] it is observed that theory construction can be regarded as a meta-linguistic operation. Within the Compulog European projects, two meta-logic approaches to working with theories have been proposed.

In the Algebra of Logic Programs, proposed in [78] and [79], a program expression defines a combination of object programs (that can be seen as theories, or modules) through a set of composition operators. The provability of a query with respect to a composition of programs can be defined by meta-axioms specifying the intended meaning of the various composition operations. Four basic operations for composing logic programs are introduced: encapsulation (denoted by ∗), union (∪), intersection (∩) and import (◁). Encapsulation copes with the requirement that a module can import from another one only its functionality, without regard to its implementation. This kind of behavior can be realized by encapsulation and union: if P is the “main program” and S is a module, the combined program is:

    P ∪ S∗
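
The combination P ∪ S∗ can be illustrated with a propositional Python sketch in the style of a meta-interpreter over program expressions. The encoding is hypothetical, and the reading of encapsulation used here (S∗ offers as bare facts exactly the atoms that S proves on its own, hiding the clauses of S) is an assumption made for this illustration rather than a definition quoted from [78] or [79]. Intersection and import are not covered.

```python
def clauses_for(expr, atom):
    """Yield the bodies of the clauses for atom in a program expression."""
    kind = expr[0]
    if kind == "program":            # ("program", [(head, [body atoms]), ...])
        for head, body in expr[1]:
            if head == atom:
                yield body
    elif kind == "union":            # ("union", P, Q): clauses of P or of Q
        yield from clauses_for(expr[1], atom)
        yield from clauses_for(expr[2], atom)
    elif kind == "encapsulate":      # ("encapsulate", S): S*
        # Assumed reading: S* offers atom as a bare fact whenever S alone
        # proves it, so the clauses of S are not visible from outside.
        if demo(expr[1], atom):
            yield []
    else:
        raise ValueError(f"unknown program expression: {expr!r}")


def demo(expr, atom, depth=25):
    """Provability of a propositional atom in a program expression."""
    if depth <= 0:
        return False
    return any(all(demo(expr, b, depth - 1) for b in body)
               for body in clauses_for(expr, atom))


S = ("program", [("sorted", ["helper"]), ("helper", []), ("ok", ["check"])])
P = ("program", [("report", ["sorted"]), ("check", [])])

combined = ("union", P, ("encapsulate", S))
print(demo(combined, "report"))          # P uses what S proves
print(demo(combined, "ok"))              # S's clause for ok stays hidden
print(demo(("union", P, S), "ok"))       # plain union exposes that clause
```

In the plain union, the fact check supplied by P completes the clause for ok in S. With encapsulation this interaction is blocked, since only the conclusions that S reaches on its own are exported.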

Intersection yields a combined theory where both the original theories are forced to agree during deduction on every single partial conclusion. The import operation builds a module P ◁ Q out of two modules P and Q, where P is the visible part and Q the hidden part of the resulting module. The usefulness of these operators for knowledge representation and reasoning is shown in [78]. The meta-logical definition of the operations is given in [79], by extending the Vanilla meta-interpreter. Two alternative implementations using the Gödel programming language are proposed and discussed in [80]. One extends the untyped Vanilla meta-interpreter. The other exploits the meta-programming facilities offered by the language, thus using names and typed variables. The authors themselves consider the second, cleaner version more suitable than the first for implementing program composition operations that require a fine-grained manipulation of the object programs.

In the Alloy language, proposed in [81] and [82], a theory system is a collection of interdependent theories, some of which stand in a meta/object relationship, forming an arbitrary number of meta-levels. Theory systems are proposed for a meta-programming based software engineering methodology aimed at specifying, for instance, reasoning agents, programs to be manipulated, programs that manipulate them, etc. The meta/object relationship between theories provides the inspection and control facilities needed in these applications. The basic language of theory systems is a definite clause language, augmented with ground names for every well-formed expression of the language. Each theory is named by a ground theory term. A theory system can be defined out of a collection of theories by using the following tools.

1. The symbol ’⊢’ for relating theory terms and sentences. A theoremhood statement, for instance t1 ⊢ u1 ⊢ Ψ, where t1 and u1 are theory terms, says that u1 ⊢ Ψ is a theorem of theory t1.
2. The distinguished function symbol ’’, where t1 t2 means that t1 is a metatheory of t2.
3. The coincidence statement t1 ≡ t2, expressing that t1 and t2 have exactly the same theorems.

The behavior of the above operators is defined by reflection principles (in the form of meta-axioms) that are suitably integrated in the declarative and proof-theoretic semantics.

6.3 The Event Calculus

Representing and reasoning about actions and temporally-scoped relations has been for years one of the key research topics in knowledge representation [83]. The Event Calculus (EC) has been proposed by Kowalski and Sergot [84] as a system for reasoning about time and actions in the framework of Logic Programming. In particular, the Event Calculus adapts the ontology of McCarthy and Hayes’s Situation Calculus [85], i.e., actions and fluents¹, to a new task: assimilating a narrative, which is the description of a course of events. The essential idea is to have terms, called fluents, which are names of time-dependent relations. Kowalski and Sergot however write holds(r(x, y), t), which is understood as “fluent r(x, y) is true at time t”, instead of r(x, y, t) as in the Situation Calculus.

¹ It is interesting to notice that the fluent/fluxion terminology dates back to Newton.

It is worthwhile to discuss the connection between Kowalski’s work on meta-programming and the definition of the Event Calculus. In the logic programming framework it comes natural to recognize the higher-order nature of time-dependent propositions and to try to represent them at the meta-level. Kowalski in fact [86] considers McCarthy’s Situation Calculus and comments:

  Thus we write Holds(possess(Bob, Book1), S0) instead of the weaker but also adequate Possess(Bob, Book1, S0). In the first formulation, possess(Bob, Book1) is a term which names a relationship. In the second, Possess(Bob, Book1, S0) is an atomic formula. Both representations are expressed within the formalism of first-order classical logic. However, the first allows variables to range over relationships whereas the second does not. If we identify relationships with atomic variable-free sentences, then we can regard a term such as possess(Bob, Book1) as the name of a sentence. In this case Holds is a meta-level predicate [ . . . ]

There is a clear advantage with reification from the computational point of view: by reifying, we need to write only one frame axiom, or inertia law, saying that the truth of any relation does not change in time unless otherwise specified. Negation-as-failure is a natural choice for implementing the default inertia law. In a simplified, time points-oriented version, default inertia can be formulated as follows:

Holds(f, t) ← Happens(e), initiates(e, f), Date(e, ts), ts < t, not Clipped(ts, f, t)

where Clipped(ts, f, t) is true when there is a record of an event happening between ts and t that terminates the validity of f. In other words, Holds(f, t) is derivable whenever, in the interval between the initiation of the fluent and the time the query is about, no terminating event has happened.

It is easy to see Holds as a specialization of Demo. Kowalski and Sadri [87], [88] discuss in depth how an Event Calculus program can be specified and assumptions on the nature of the domain accommodated, by manipulating the usual Vanilla meta-interpreter definition.
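The default inertia rule can be tried out on a toy narrative. The following Python sketch is an illustration under stated assumptions, not the formulation of [84]: the narrative (events switch_on and switch_off, fluent light_on) is invented, “between ts and t” is read as strictly between, and negation-as-failure is approximated by a plain “not” over a finite, fully known set of events.

```python
# Hypothetical narrative: each recorded event with the date at which it happens.
DATE = {"switch_on": 1, "switch_off": 5}
INITIATES = {("switch_on", "light_on")}
TERMINATES = {("switch_off", "light_on")}


def clipped(ts, f, t):
    """True if a recorded event strictly between ts and t terminates fluent f."""
    return any(ts < d < t and (e, f) in TERMINATES for e, d in DATE.items())


def holds(f, t):
    """Holds(f, t) <- Happens(e), initiates(e, f), Date(e, ts), ts < t,
    not Clipped(ts, f, t)."""
    return any(
        (e, f) in INITIATES and ts < t and not clipped(ts, f, t)
        for e, ts in DATE.items()
    )
```

With this narrative, holds("light_on", 3) is true and holds("light_on", 6) is false. A single inertia rule covers every fluent, which is the computational advantage of reification noted above.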

Since the first proposal, a number of improved formalizations have stemmed from it, in order to adapt the calculus to different tasks, such as abductive planning, diagnosis, temporal databases and models of legislation. Not all extensions and applications can be accounted for here, but the reader may for instance refer to [89], [90], and [91].

6.4 Logical Frameworks

A logical framework [92] is a formal system that provides tools for experimenting with deductive systems. Within a logical framework, a user can invent a new deductive system by defining its syntax, inference rules and proof-theoretic semantics. This specification is executable, so that the user can experiment with the new system. A logical framework however cannot reasonably provide tools for defining any possible deductive system, but will stay within a certain class. Formalisms with powerful meta-level features and strong semantic foundations have the possibility of evolving towards becoming logical frameworks.

The Maude system [93], for instance, is a particular implementation of the meta-theory of rewriting logic. It provides the predefined functional module META-LEVEL, where Maude terms can be reified and where: the process of reducing a term to a normal form is represented by a function meta-reduce; the default interpreter is represented by a function meta-rewrite; the application of a rule to a term is represented by meta-apply. Recently, a reflective version of Maude has been proposed [94], based on the formalization of computational reflection proposed in [95]. The META-LEVEL module has been made more flexible, so as to allow a user to define the syntax of her own logic language L by means of meta-rules. The new language must however consist of an addition or variation to the basic syntax of the Maude language. Reflection is the tool for integrating the user-defined syntax into the proof procedure of Maude. In particular, whenever a piece of user-defined syntax is found, a reflection act to the META-LEVEL module happens, so as to apply the corresponding syntactic meta-rules. Thus, the rewriting system Maude has evolved into a logical framework for logic languages based on rewriting.

The RCL (Reflective Computational Logic) logical framework [33] is an evolution of the Reflective Prolog metalogic language. The implicit reflection of Reflective Prolog has a semantic counterpart [39] in adding to the given theory a set of new axioms called reflection axioms, according to axiom schemata called reflection principles. Reflection principles can specify not only the shift between levels, but also many other meta-reasoning principles. For instance, reflection principles can define forms of analogical reasoning [96], and synchronous communication among logical agents [97]. RCL has originated from the idea that, more generally, reflection principles may be used to express the inference rules of user-defined deductive systems. The deductive systems that can be specified in RCL are however evolutions of the Horn clause language, based on a predefined enhanced syntax. A basic version of naming is provided in the enhanced Horn clause language, formalized through an equational theory. The specification of a new deductive system DS in RCL is accomplished through the following four steps.

Step I. Definition of the naming device (encoding) for DS. The user definition must extend the predefined one. RCL leaves significant freedom in the representation of names.
Step II. After defining the naming convention, the user of RCL has to provide a corresponding unification algorithm (again by suitable additions to the predefined one).
Step III. Representation of the axioms of DS, in the form of enhanced Horn clauses.
Step IV. Definition of the inference rules of DS as reflection principles. In particular, the user is required to express each inference rule R as a function R, from clauses, which constitute the antecedent of the rule, to sets of clauses, which constitute the consequent.

Then, given a theory T of DS consisting of a set of axioms A and a reflection principle R, a theory T′ containing T is obtained as the deductive closure of A ∪ A′, where A′ is the set of additional axioms generated by R. Consequently, the model-theoretic and fixed point semantics of T under R are obtained as the model-theoretic and fixed point semantics of T′. RCL does not actually generate T′. Rather, given a query for T, RCL dynamically generates the specific additional axioms usable to answer the query according to the reflection principle R, i.e., according to the inference rule R of DS.

6.5 Logical Agents

In the area of intelligent software agents there are several issues that require the integration of some kind of meta-reasoning ability into the system. In fact, most existing formalisms, systems and frameworks for defining agents incorporate, in different forms, a meta-component. An important challenge in this area is that of interconnecting several agents that are heterogeneous in the sense that they are not necessarily uniform in the implementation, in the knowledge they possess and in the behavior they exhibit. Any framework for developing multi-agent systems must provide a great deal of flexibility for integrating heterogeneous agents and assembling communities of independent service providers. Flexibility is required in structuring cooperative interactions among agents, and for creating more accessible and intuitive user interfaces.

Meta-reasoning is essential for obtaining such a degree of flexibility. Meta-reasoning can either be performed within the single agent, or special meta-agents can be designed, to act as meta-theories for sets of other agents. Meta-reasoning can help: (i) in the interaction among agents and with the user; (ii) in the implementation of suitable strategies and plans for responding to requests. These strategies can be either domain-independent, or rely on domain- and application-specific knowledge or reasoning (auxiliary inference rules, learning algorithms, planning, and so forth). Meta-rules and meta-programming may be particularly useful for coping with some aspects of the ontology problem: meta-rules can switch between descriptions that are syntactically different though semantically equivalent, and can help fill the gap between descriptions that are not equivalent. Also, meta-reasoning can be used for managing incomplete descriptions or requests.

The following are relevant examples of approaches to developing agent systems that make use of some form of meta-reasoning.

In the Open Agent Architecture™ [98], which is meant for integrating a community of heterogeneous software agents, there are specialized server agents, called facilitators, that perform reasoning (and, more or less explicitly, meta-reasoning) about the agent interactions necessary for handling a complex expression. There are also meta-agents, that perform more complex meta-reasoning so as to assist the facilitator agent in coordinating the activities of the other agents.

In the constraint logic programming language CaseLP, there are logical agents, which show capabilities of complex reasoning, and interface agents, which provide an interface with external modules. There are no meta-agents, but an agent has meta-goals that trigger meta-reasoning to guide the planning process.

There are applications where agents may have objectives and may need to reason about their own as well as other agents’ beliefs and about the actions that agents may take. This is the perspective of the BDI formalization of multi-agent systems proposed in [99] and [100], where BDI stands for “Belief, Desire, Intentions”. The approach of Meta-Agents [101] allows agents to reason about other agents’ state, beliefs, and potential actions by introducing powerful meta-reasoning capabilities.
Meta-Agents are a specification tool, since for efficient implementation they are translated into ordinary agent programs, plus some integrity constraints.

In logic programming, research on multi-agent systems starts, to the best of our knowledge, from the work by Kim and Kowalski in [102], [103]. The amalgamation of language and meta-language and the demo predicate with theories named by constants are used for formalizing reasoning capabilities in multi-agent domains. In this approach, the demo predicate is interpreted as a belief predicate and thus agents can reason, as in the BDI approach, about beliefs.

In the effort to obtain logical agents that are rational, but also reactive (i.e. logical reasoning agents capable of timely response to external events), a more general approach has been proposed in [82] by Kowalski, and in [104] and [105] by Kowalski and Sadri. A meta-logic program defines the “observe-think-act” cycle of an agent. Integrity constraints are used to generate actions in response to updates from the environment.

In the approach of [97], agents communicate via the two meta-level primitives tell/told. An agent is represented by a theory, i.e. by a set of clauses prefixed with the corresponding theory name. Communication between agents is formalized by the following reflection principle Rcom:

T : told(“S”, “A”) ⇐Rcom S : tell(“T”, “A”).

The intuitive meaning is that every time an atom of the form tell(“T”, “A”) can be derived from a theory S (which means that agent S wants to communicate proposition A to agent T), the atom told(“S”, “A”) is consequently derived in theory T (which means that proposition A becomes available to agent T). The objective of this formalization is that each agent can specify, by means of clauses defining the predicate tell, the modalities of interaction with the other agents. These modalities can thus vary with respect to different agents or different conditions. For instance, let P be a program composed of three agents, a, b and c, defined as follows.

a : tell(X, “ciao”) :- friend(X).
a : friend(“b”).
b : happy :- told(“a”, “ciao”).
c : happy :- told(“a”, “ciao”).

Agent a says “ciao” to every other agent X that it considers to be its friend. In the above definition, the only friend is b. Agents b and c are happy if a says “ciao” to them. The conclusion happy can be derived in agent b, while it cannot be derived in agent c. In fact, we get a : tell(“b”, “ciao”) from a : friend(“b”); instead, a : tell(“c”, “ciao”) is not a conclusion of agent a.

In [106], Dell’Acqua, Sadri and Toni propose an approach to logic-based agents as a combination of the above approaches, i.e. the approach to agents by Kowalski and Sadri [105] and the approach to meta-reasoning by Costantini et al. [65], [97]. Similarly to Kowalski and Sadri’s agents, the agents in [106] are hybrid in that they exhibit both rational (or deliberative) and reactive behavior. The reasoning core of these agents is a proof procedure that combines forward and backward reasoning. Backward reasoning is used primarily for deliberative activities. Forward reasoning is used primarily for reactivity to the environment, possibly including other agents.
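As a concrete reading of the reflection principle Rcom and of the three-agent program P given earlier in this section, the following Python sketch hard-codes the four clauses of P and derives told from tell. It is only an illustration under simplifying assumptions: theories are encoded as ordinary functions rather than as named sets of clauses, and the names AGENTS, FRIENDS_OF_A, tell, told and happy are hypothetical.

```python
AGENTS = ["a", "b", "c"]
FRIENDS_OF_A = {"b"}              # a : friend("b").


def tell(sender):
    """Atoms tell(T, A) derivable in theory `sender`, as (T, A) pairs."""
    if sender == "a":             # a : tell(X, "ciao") :- friend(X).
        return {(x, "ciao") for x in FRIENDS_OF_A}
    return set()                  # b and c have no clause for tell


def told(receiver):
    """Rcom: T : told(S, A) is derived whenever S : tell(T, A) is derivable."""
    return {(s, a) for s in AGENTS for (t, a) in tell(s) if t == receiver}


def happy(agent):
    """b : happy :- told("a", "ciao").  Same clause for c; a has none."""
    return agent in {"b", "c"} and ("a", "ciao") in told(agent)
```

Here happy("b") holds and happy("c") does not, matching the conclusions stated for P. Changing only the clauses for tell (here, the set FRIENDS_OF_A) changes who is told what, which reflects the intent that each agent controls its own modalities of interaction.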
The proof procedure is executed within an “observe-think-act” cycle that allows the agent to be alert to the environment and react to it, as well as think and devise plans. The proof procedure (the IFF proof procedure proposed by Fung and Kowalski in [107]) treats both inputs from the environment and agents’ actions as abducibles (hypotheses). Moreover, by adapting the techniques proposed in [97], the agents are capable of reasoning about their own beliefs and the beliefs of other agents.

In [108], the same authors extend the approach by providing agents with proactive communication capabilities. Proactive agents are able to communicate on their own initiative, not only in response to stimuli. In the resulting framework reactive, rational or hybrid agents can reason about their own beliefs as well as the beliefs of other agents and can communicate proactively with each other. The agents’ behavior can be regulated by condition-action rules. In this approach, there are two primitives for communication, tell and ask, treated as abducibles within the “observe-think-act” cycle of the agent architecture. The predicate told is used to express both passive reception of messages from other agents and reception of information in response to an active request.

The following example is taken from [108] and is aimed at illustrating the basic features of the approach. Let Ag be represented by the abductive logic program ⟨P, A, I⟩ with:

P = { told(A, X) ← ask(A, X) ∧ tell(A, X)
      told(A, X) ← tell(A, X)
      solve(X) ← told(A, X)
      desire(y) ← y = car
      good_price(p, x) ← p = 0 }

A = { tell, ask, offer }

I = { desire(x) ∧ told(B, good_price(p, x)) ⇒ tell(B, offer(p, x)) }

The first two clauses in P state that Ag may be told something, say X, by another agent A either because A has been explicitly asked about X (first clause) or because A tells X proactively (second clause). The third clause in P says that Ag believes anything it is told. The fourth and fifth clauses in P say, respectively, that the agent desires a car and that anything that is free is at a good price. The integrity constraint says that, if the agent desires something and it is told (by some other agent B) of a good price for it, then it makes an offer to B, by telling it.

The logic programming language DALI [109] is indebted to all previously mentioned approaches to logical agents. DALI introduces explicit reactive and proactive rules at the object level. Thus, reactivity and proactivity are modeled in the basic logic language of the agent. In fact, the declarative semantics is very close to that of the standard Horn clause language. The procedural semantics relies on an extended resolution. The language incorporates tell/told primitives, integrity constraints and solve rules. An “observe-think-act” cycle can of course be implemented in a DALI agent, but it is no longer necessary for modeling reactivity and proactivity.

Below is a simplified fragment of a DALI agent representing the waiter of a pub, who tries to serve a customer that enters. The customer wants some X. This request is an external event (indicated with ’E’) that arrives at the agent. The event triggers a reactive rule (indicated with ’:>’ instead of the usual ’:-’), and causes the body of the rule to be executed. This is very much like any other goal: only, computation is not initiated by a query, but starts on reception of the event. During the execution of the body of the reactive rule, the waiter first checks whether X is one of the available drinks.
If so, the waiter serves the drink: the predicate serve_drink is in fact an action (indicated with ’A’). Otherwise, the waiter checks whether the request is expressed in some foreign language, for which a translation is available (this is a simple example of coping with one aspect of the ontology problem). If this is not the case, the waiter asks the customer for an explanation about X: it expects to be told that X is actually a Y, in order to try to serve this Y.

Notice that the predicate translate is symmetric, where symmetry is managed by the solve rule. To understand the behavior, one can assume this rule to be an additional rule of a basic meta-interpreter that is not explicitly reported. A subgoal like translate(beer, V) is automatically transformed into a call to the meta-interpreter, of the form solve(“translate”(“beer”, “V”)) (formally, this is implicit upward reflection). Then, since symmetric(“translate”) succeeds, solve(“translate”(“V”, “beer”)) is attempted, and automatically reflected at the object level (formally, this is implicit downward reflection). Finally, the unquoted subgoal translate(V, beer) succeeds with V instantiated to birra.

Waiter

request(Customer, “X”)E :> serve(Customer, X).
serve(C, X) :- drink(X), serve_drink(C, X)A.
serve(C, X) :- translate(X, Y), drink(Y), serve_drink(C, Y)A.
serve(C, X) :- ask(C, X, Y), serve(C, Y).
ask(C, X, Y) :- ask_for_explanation(C, “X”), told(C, “Y”).
drink(beer).
drink(coke).
translate(birra, beer).
translate(cocacola, coke).
symmetric(“translate”).
solve(“P”(“X”, “Y”)) :- symmetric(“P”), solve(“P”(“Y”, “X”)).

Agents that interact with other agents and/or with an external environment may expand and modify their knowledge base by incorporating new information. In a dynamic setting, the knowledge base of an agent can be seen as the set of beliefs of the agent, which may change over time. An agent may reach a stage where its beliefs have become inconsistent, and actions must be taken to regain consistency. The theory of belief revision aims at modeling how an agent updates its state of belief as a result of receiving new information [110], [111]. Belief revision is, in our opinion, another important issue related to intelligent agents where meta-reasoning can be usefully applied.

In [32] a model-based diagnosis system is presented, capable of revising the description of the system to be diagnosed if inconsistencies arise from observations. Revision strategies are implemented by means of meta-programming and meta-reasoning methods.

In [112], a framework is proposed where rational, reactive agents can dynamically change their own knowledge bases as well as their own goals. In particular, an agent can make observations, learn new facts and new rules from the environment (even in contrast with its current knowledge) and then update its knowledge accordingly. To solve contradictions, techniques of contradiction removal and preferences among several sources can be adopted [113].

In [114] it is pointed out that most existing approaches to intelligent agents have difficulty modeling the way agents revise their beliefs, because new information always comes together with certain meta-information: e.g., where does the new information come from? Is the source reliable? And so on. Then, the agent has to reason about this meta-information, in order to revise its beliefs. This leads to the proposal of a new approach, where this meta-information can be explicitly represented and reasoned about, and revision strategies can be defined in a declarative way.

7 Semantic Issues

In computational logic, meta-programming and meta-reasoning capabilities are mainly based on self-reference, i.e. on the possibility of describing language expressions in the language itself. In fact, in most of the relevant approaches the object language and the meta-language coincide. The main tool for self-reference is a naming mechanism. An alternative form of self-reference has been proposed by McCarthy [115], who suggests that introducing function symbols denoting concepts (rather than quoted expressions) might be sufficient for most forms of meta-reasoning. But Perlis [40] observes:

  “The last word you just said” is an expression that although representable as a function still refers to a particular word, not to a concept. Thus quotation seems necessarily involved at some point if we are to have a self-describing language. It appears we must describe specific expressions as carriers of (the meaning of) concepts.

The issue of appropriate language facilities for naming is addressed by Hill and Lloyd in [35]. They point out the distinction between two possible representation schemes: the non-ground representation, in which an object-level variable is represented by a meta-level variable, and the ground representation, in which object-level expressions are represented by ground (i.e. variable free) terms at the meta-level. In the ground representation, an object level variable may be represented by a meta-level constant, or by any other ground term. The problem with the non-ground representation is related to meta-level predicates such as the Prolog var(X), which is true if the variable X is not instantiated, and is false otherwise. As remarked in [35]:

  To see the difficulty, consider the goals:

  :- var(X) ∧ solve(p(X))

  and

  :- solve(p(X)) ∧ var(X)

  If the object program consists solely of the clause p(a), then (using the “leftmost literal” computation rule) the first goal succeeds, while the second goal fails.

Hill and Lloyd propose a ground representation of expressions of a first-order language L in another first-order language L′ with three types ω, µ and η.

Definition 1 (Hill and Lloyd ground representation). Given a constant a in L, there is a corresponding constant a′ of type ω in L′. Given a variable x in L, there is a corresponding constant x′ of type ω in L′. Given an n-ary function symbol f in L, there is a corresponding n-ary function symbol f′ of type ω × . . . × ω −→ ω in L′. Given an n-ary predicate symbol p in L, there is a corresponding n-ary function symbol p′ of type ω × . . . × ω −→ µ in L′. The language L′ has a constant empty of type µ. The mappings a −→ a′, x −→ x′, f −→ f′ and p −→ p′ are all injective.

Moreover, L′ contains some function and predicate symbols useful for declaratively redefining the “impure” features of Prolog and the Vanilla meta-interpreter. For instance we will have:

constant(a1′).
...
constant(an′).
∀ω x nonvar(x) ← constant(x).
∀ω x var(x) ← ¬ nonvar(x).

The above naming mechanism is used in [35] for providing a declarative semantics to a meta-interpreter that implements SLDNF resolution [116] for normal programs and goals. This approach has then evolved into the metalogical facilities of the Gödel language [59]. Notice that, since names of predicate symbols are function symbols, properties of predicates (e.g. symmetry) cannot be explicitly stated. Since levels in Gödel are separated rather than amalgamated, this naming mechanism does not provide operators for referentiation/dereferentiation.
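The effect of a ground representation on the var/solve example can be illustrated with a small sketch. The Python fragment below is not the typed language of [35]: it assumes that names are tagged tuples, that bindings computed by solve live in an explicit substitution, and that the object program contains only ground facts. The helper names const, var, is_var, walk, solve and OBJECT_FACTS are hypothetical.

```python
def const(name):
    """Ground meta-level name of an object-level constant."""
    return ("const", name)


def var(name):
    """Ground meta-level name of an object-level variable."""
    return ("var", name)


def is_var(rep):
    """Declarative var test: a fixed property of the name, not of any binding."""
    return rep[0] == "var"


OBJECT_FACTS = [("p", (const("a"),))]   # object program: p(a).


def walk(rep, subst):
    """Follow explicit bindings of variable names in the substitution."""
    while is_var(rep) and rep in subst:
        rep = subst[rep]
    return rep


def solve(pred, args, subst):
    """Return a substitution extending `subst` that matches a fact, else None."""
    for fact_pred, fact_args in OBJECT_FACTS:
        if fact_pred != pred or len(fact_args) != len(args):
            continue
        new = dict(subst)
        matched = True
        for arg, fact_arg in zip(args, fact_args):
            arg = walk(arg, new)
            if is_var(arg):
                new[arg] = fact_arg      # bind the variable name explicitly
            elif arg != fact_arg:
                matched = False
                break
        if matched:
            return new
    return None
```

In this sketch, is_var(var("x")) gives the same answer before and after solve("p", (var("x"),), {}) is called, because the binding of x to a is recorded in the returned substitution instead of being applied to the name itself. The two goal orderings discussed above therefore agree.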
An important issue raised in [40] is the following: Now, it is essential to have also an un-naming device that would return a quoted sentence to its original (assertive) form, together with axioms stating that that is what naming and un-naming accomplish.

Along this line, the approach of [36], developed in detail in [117], proposes to name an atom of the form α0(α1, . . . , αn) as [β0, β1, . . . , βn], where each βi is the name of αi. The name of the name of α0(α1, . . . , αn) is the name term [γ0, γ1, . . . , γn], where each γi is the name of βi, etc. Requiring names of compound expressions to be compositional allows one to use unification for constructing name terms and accessing their components. In this approach, we are able to express properties of predicates by using their names. For instance, we can say that predicate p is binary and predicate q is symmetric, by asserting binary_pred(p¹) and symmetric(q¹).

Given a term t and a name term s, the expression ↑t indicates the result of quoting t and the expression ↓s indicates the result of unquoting s. The following axioms for the operators ↑ and ↓ formalize the relationship between terms and the corresponding name terms. They form an equality theory, called NT and first defined in [118], for the basic compositional encoding outlined above. Enhanced encodings can be obtained by adding axioms to this theory. NT states that there exist names of names (each term can be referenced n times, for any n ≥ 0) and that the name of a compound term is obtained from the names of its components.

Definition 2 (Basic encoding NT). Let NT be the following equality theory.
– For every constant or meta-constant cⁿ, n ≥ 0, ↑cⁿ = cⁿ⁺¹.
– For every function or predicate symbol f of arity k, ∀x1 . . . ∀xk ↑(f(x1, . . . , xk)) = [f¹, ↑x1, . . . , ↑xk].
– For every compound name term [x0, x1, . . . , xk], ∀x0 . . . ∀xk ↑[x0, x1, . . . , xk] = [↑x0, ↑x1, . . . , ↑xk].
– For every term t, ↓↑t = t.

The above set of axioms admits an associated convergent rewrite system UN. Then, a corresponding extended unification algorithm (E-unification algorithm) UA(UN) can be defined, that deals with name terms in addition to usual terms.
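The four axioms of NT can be exercised on ground terms with a short sketch. The Python fragment below is an illustration only, not the rewrite system UN or the E-unification algorithm of [118]: it handles ground terms without variables, represents a constant cⁿ as Const(symbol, level), and defines ↓ only on terms that are names, treating a name term whose first component is a first-level name f¹ as the name of a compound term. The class and function names (Const, App, Name, up, down) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    symbol: str
    level: int = 0      # level n stands for the n-th name of the constant


@dataclass(frozen=True)
class App:              # compound term f(t1, ..., tk)
    functor: str
    args: tuple


@dataclass(frozen=True)
class Name:             # compound name term [x0, x1, ..., xk]
    items: tuple


def up(t):
    """Quote: compute the name of t following the first three axioms."""
    if isinstance(t, Const):
        return Const(t.symbol, t.level + 1)
    if isinstance(t, App):
        return Name((Const(t.functor, 1),) + tuple(up(a) for a in t.args))
    if isinstance(t, Name):
        return Name(tuple(up(x) for x in t.items))
    raise TypeError("not a term")


def down(s):
    """Unquote: inverse of up on name terms, so that down(up(t)) == t."""
    if isinstance(s, Const):
        if s.level == 0:
            raise ValueError("an object-level constant is not a name")
        return Const(s.symbol, s.level - 1)
    if isinstance(s, Name):
        head = s.items[0]
        if isinstance(head, Const) and head.level == 1:
            # [f^1, ...] names the compound term f(...)
            return App(head.symbol, tuple(down(x) for x in s.items[1:]))
        return Name(tuple(down(x) for x in s.items))
    raise ValueError("not a name term")
```

For example, quoting p(a, f(b)) yields the name term built from p¹, a¹ and the name of f(b), and unquoting it gives back the original atom. Quoting twice and unquoting once returns the first-level name, in line with the axiom ↓↑t = t.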
In [118] it is shown that:

Proposition 1 (Unification Algorithm for NT). The E-unification algorithm UA(UN) is sound for NT, terminates and converges.

The standard semantics of the Horn clause language can be adapted, so as to include the naming device. Precisely, the technique of quotient universes by Jaffar et al. [119] can be used to this purpose.

Definition 3 (Quotient Universe). Let R be a congruence relation. The quotient universe of U with respect to R, indicated as U/R, is the set of the equivalence classes of U under R, i.e., the partition given by R in U.

By taking R as the finest congruence relation corresponding to UN (that always exists) we get the standard semantics of the Horn clause language [116], modulo the naming relation. The naming relation can be extended according to the application domain at hand, by adding new axioms to NT and by correspondingly extending UN and UA(UN), provided that their nice formal properties are preserved. What is important is that, as advocated in [37], the approach to meta-programming and the approach to naming become independent.

It is important to observe that, as shown in [36], any (ground or non-ground) encoding providing names for variables shows in an amalgamated language the same kind of problems emphasized in [35]. In fact, let P be the following definite program, x an object-level variable and Y a meta-variable:

p(x) :- Y = ↑x, q(Y).
q(a¹).

Goal :- p(a) succeeds by first instantiating Y to a¹ and then proving q(a¹). In contrast, the goal :- p(x) fails, as Y is instantiated to the name of x, say x¹, and subgoal q(x¹) fails, x¹ and a¹ being distinct. Therefore, if one chooses naming mechanisms that provide names for variables, on the one hand terms can be inspected with respect to variable instantiation, but on the other hand important properties are lost.

A ground naming mechanism is used in [49] for providing a declarative semantics to the (conservative) amalgamation of language and meta-language in logic programming.

A naming mechanism where each well-formed expression can act as a name of itself is provided by the ambivalent logic AL of Jiang [73]. It is based on the assumption that each expression can be interpreted as a formula, as a term, as a function and as a predicate, where predicates and functions have free arity. Unification must be extended accordingly, with the following results:

Theorem 1 (Termination of AL Unification Algorithm). The unification algorithm for ambivalent logic terminates.

Theorem 2 (Correctness of AL Unification Algorithm). If the unification algorithm for ambivalent logic terminates successfully, then it provides an ambivalent unifier. If the algorithm halts with failure, then no ambivalent unifier exists.
The limitation is that ambivalent unifiers are less general than traditional unifiers.

Theorem 3 (Properties of Resolution for AL). Resolution is a sound and complete inference method for AL.

Ambivalent logic has been used in [75] for proving correctness of the Vanilla meta-interpreter, also with respect to the (conservative) amalgamation of object language and meta-language. Let P be the object program, L_P the language of P, V_P the Vanilla meta-interpreter and L_{V_P} the language of V_P. Let M_P be the least Herbrand model of P, M_{V_P} the least Herbrand model of V_P, and M_{V_P ∪ P} the least Herbrand model of V_P ∪ P. We have:


Theorem 4 (Properties of Vanilla Meta-Interpreter under AL). For all (ground) A in L_{V_P}, demo(A) ∈ M_{V_P} iff demo(A) ∈ M_{V_P ∪ P}; for all (ground) A in L_P, demo(A) ∈ M_P iff demo(A) ∈ M_{V_P ∪ P}.

A similar result is obtained by Martens and De Schreye in [120] and [50] for the class of language independent programs. They use a non-ground representation with overloading of symbols, so that the name of an atom is a term identical to the atom itself. Language independent programs can be characterized as follows:

Proposition 2 (Language Independence). Let P be a definite program. Then P is language independent iff for any definite goal G, all (SLD) computed answers for P ∪ {G} are ground.

The real practical interest, however, lies in enhanced meta-interpreters. Martens and De Schreye extend their results to meta-interpreters without additional clauses, but with additional arguments. An additional argument can be for instance an explicit theory argument, or an argument denoting the proof tree. The amalgamation is still conservative, but more expressivity is achieved.

The approach to proving correctness of the Vanilla meta-interpreter proposed by Levi and Ramundo in [48] uses the S-semantics introduced by Falaschi et al. in [121]. In order to fill the gap between the procedural and declarative interpretations of definite programs, the least S-Herbrand model M^S_P of a program P contains not only ground atoms, but all atoms Q(t) such that t = xθ, where θ is a computed answer substitution for P ∪ {← Q(x)}. The S-semantics is obtained as a variation of the standard semantics of the Horn clause language. Levi and Ramundo [48] and Martens and De Schreye prove (independently) that demo(p(t)) ∈ M^S_{V_P} iff p(t) ∈ M^S_P.

In the approach of Reflective Prolog, axiom schemata are defined at the meta-level, by means of a distinguished predicate solve and of a naming facility. Deduction is performed at any level where there are applicable axioms.
This means that conclusions drawn in the basic theory are available (by implicit reflection) at the meta-level, and vice versa. The following definition of RSLD-resolution [65] (SLD-resolution with reflection) is independent of the naming mechanism, provided that a suitable unification algorithm is supplied.

Definition 4 (RSLD-resolution). Let G be a definite goal ← A_1, ..., A_k, let A_m be the selected atom in G and let C be a definite clause. The goal (← A_1, ..., A_{m−1}, B_1, ..., B_q, A_{m+1}, ..., A_k)θ is derived from G and C using mgu θ iff one of the following conditions holds:

i. C is A ← B_1, ..., B_q and θ is a mgu of A_m and A;
ii. C is solve(α) ← B_1, ..., B_q, A_m ≠ solve(δ), ↑A_m = α′ and θ is a mgu of α′ and α;
iii. A_m is solve(α), C is A ← B_1, ..., B_q, ↓α = A′ and θ is a mgu of A′ and A.

If the selected atom A_m is an object-level atom (e.g., p(a, b)), it can be resolved in two ways. First, by using as usual the clauses defining the corresponding predicate (case (i)); for instance, if A_m is p(a, b), by using the clauses defining the predicate p. Second, by using the clauses defining the predicate solve (case (ii), upward reflection) if the name ↑A_m of A_m and α unify with mgu θ; for instance, referring to the NT naming relation defined above, we have ↑p(a, b) = [p1, a1, b1] and then a clause with conclusion solve([p1, v, w]) can be used, with θ = {v/a1, w/b1}.

If the selected atom A_m is solve(α) (e.g., solve([q1, c1, d1])), again it can be resolved in two ways. First, by using the clauses defining the predicate solve itself, similarly to any other goal (case (i)). Second, by using the clauses defining the predicate corresponding to the atom denoted by the argument α of solve (case (iii), downward reflection); for instance, if α is [q1, c1, d1] and thus ↓α = q(c, d), the clauses defining the predicate q can be used.

In the declarative semantics of Reflective Prolog, upward and downward reflection are modeled by means of axiom schemata called reflection principles. The Least Reflective Herbrand Model RM_P of program P is the Least Herbrand Model of the program itself, augmented by all possible instances of the reflection principles. RM_P is the least fixed point of a suitably modified version of operator T_P.

Theorem 5 (Properties of RSLD-Resolution). RSLD-resolution is correct and complete w.r.t. RM_P.
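The two reflection steps in the worked examples can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the Reflective Prolog implementation: names are formed by appending 1 to each symbol (so p is named p1), meta-variables are written with a leading ?, and one-way matching against a ground name stands in for the full unification algorithm. The function names up, down and match are hypothetical.

```python
# Illustrative sketch only: a toy version of the naming used in the examples,
# where the atom p(a, b) is named by the list [p1, a1, b1].

def up(atom):
    """Upward naming: ('p', 'a', 'b') stands for p(a, b) -> ['p1', 'a1', 'b1']."""
    return [symbol + "1" for symbol in atom]

def down(name):
    """Downward naming: ['q1', 'c1', 'd1'] -> ('q', 'c', 'd'), i.e. q(c, d)."""
    if not all(len(s) > 1 and s.endswith("1") for s in name):
        raise ValueError("not a name term")
    return tuple(s[:-1] for s in name)

def match(pattern, name):
    """One-way matching of the argument of a solve clause head against a
    ground name. Returns a substitution (dict) or None if matching fails.
    This is a simplification of unification: only the pattern has variables."""
    if len(pattern) != len(name):
        return None
    theta = {}
    for p, n in zip(pattern, name):
        if p.startswith("?"):              # meta-variable
            if theta.setdefault(p, n) != n:
                return None                # same variable bound to two names
        elif p != n:
            return None                    # distinct name constants
    return theta

# Case (ii), upward reflection: the selected atom p(a, b) is resolved with a
# clause whose head is solve([p1, v, w]).
theta = match(["p1", "?v", "?w"], up(("p", "a", "b")))
print(theta)

# Case (iii), downward reflection: solve([q1, c1, d1]) is resolved with the
# clauses defining q, because the named atom is q(c, d).
print(down(["q1", "c1", "d1"]))
```

The first call yields the substitution binding v to a1 and w to b1, as in the example for case (ii); the second recovers the atom q(c, d) used in case (iii).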

8 Conclusions

In this paper we have discussed the meta-level approach to knowledge representation and reasoning that has its roots in the work of logicians and has played a fundamental role in computer science. We believe in fact that meta-programming and meta-reasoning are essential ingredients for building any complex application and system. We have tried to illustrate to a broad audience the main principles on which meta-reasoning is based and the ways in which these principles have been applied in a variety of languages and systems. We have illustrated how sentences can be arguments of other sentences, by means of naming devices. We have distinguished between amalgamated and separated approaches, depending on whether the meta-expressions are defined in (an extension of) a given language, or in a separate language. We have shown that the different levels of knowledge can interact by reflection. In our opinion, the choice of logic programming as a basis for meta-programming and meta-reasoning has several theoretical and practical advantages. From the theoretical point of view, all fundamental issues (including


reflection) can be coped with on a strong semantic basis. In fact, the usual framework of first-order logic can be suitably modified and extended, as demonstrated by the various existing meta-logic languages. From the practical point of view, in logic programming the meta-level mechanisms are understandable and easy to use, and this has given rise to several successful applications. We have in fact tried (although necessarily briefly) to review some of the important applications of meta-programming and meta-reasoning. At the end of this survey, I wish to explicitly acknowledge the fundamental, deep and wide contribution that Robert A. Kowalski has made to this field. Robert A. Kowalski initiated meta-programming in logic programming, as well as many of its successful applications, including meta-interpreters, event calculus and logical agents. With his enthusiasm he has given constant encouragement to research in this field, and to researchers as well, including myself.

9 Acknowledgements

I wish to express my gratitude to Gaetano Aurelio Lanzarone, who has been the mentor of my research work on meta-reasoning and reflection. I gratefully acknowledge Pierangelo Dell'Acqua for his participation in this research and for his important contribution to the study of naming mechanisms and reflective resolution. I also wish to mention Jonas Barklund, for the many interesting discussions and the fruitful cooperation on these topics. Many thanks are due to Luigia Carlucci Aiello, for her careful review of the paper, constructive criticism and useful advice. Thanks to Alessandro Provetti for his help. Thanks also to the anonymous referees, for their useful comments and suggestions. Any remaining errors or misconceptions are of course entirely my responsibility.

References

1. Hill, P.M., Gallagher, J.: Meta-programming in logic programming. In Gabbay, D., Hogger, C.J., Robinson, J.A., eds.: Handbook of Logic in Artificial Intelligence and Logic Programming, Vol. 5, Oxford University Press (1995)
2. Barklund, J.: Metaprogramming in logic. In Kent, A., Williams, J.G., eds.: Encyclopedia of Computer Science and Technology. Volume 33. M. Dekker, New York (1995) 205–227
3. Lanzarone, G.A.: Metalogic programming. In Sessa, M.I., ed.: 1985–1995 Ten Years of Logic Programming in Italy. Palladio (1995) 29–70
4. Abramson, H., Rogers, M.H., eds.: Meta-Programming in Logic Programming, Cambridge, Mass., The MIT Press (1989)
5. Bruynooghe, M., ed.: Proc. of the Second Workshop on Meta-Programming in Logic, Leuven (Belgium), Dept. of Comp. Sci., Katholieke Univ. Leuven (1990)
6. Pettorossi, A., ed.: Meta-Programming in Logic. LNCS 649, Berlin, Springer-Verlag (1992)
7. Fribourg, L., Turini, F., eds.: Logic Program Synthesis and Transformation – Meta-Programming in Logic. LNCS 883, Springer-Verlag (1994)


8. Barklund, J., Costantini, S., van Harmelen, F., eds.: Proc. Workshop on Meta Programming and Metareasoning in Logic, post-JICSLP96 workshop, Bonn (Germany), UPMAIL Technical Report No. 127 (Sept. 2, 1996), Computing Science Dept., Uppsala Univ. (1996)
9. Apt, K., Turini, F., eds.: Meta-Logics and Logic Programming. The MIT Press, Cambridge, Mass. (1995)
10. Maes, P., Nardi, D., eds.: Meta-Level Architectures and Reflection, Amsterdam, North-Holland (1988)
11. Kiczales, G., ed.: Meta-Level Architectures and Reflection, Proc. of the First Intnl. Conf. Reflection 96, Xerox PARC (1996)
12. Cointe, P., ed.: Meta-Level Architectures and Reflection, Proc. of the Second Intnl. Conf. Reflection 99. LNCS 1616, Berlin, Springer-Verlag (1999)
13. Smorynski, C.: The incompleteness theorem. In Barwise, J., ed.: Handbook of Mathematical Logic. North-Holland (1977) 821–865
14. Smullyan, R.: Diagonalization and Self-Reference. Oxford University Press (1994)
15. Kripke, S.A.: Semantical considerations on modal logic. In: Acta Philosophica Fennica. Volume 16. (1963) 493–574
16. Carlucci Aiello, L., Cialdea, M., Nardi, D., Schaerf, M.: Modal and meta languages: Consistency and expressiveness. In Apt, K., Turini, F., eds.: Meta-Logics and Logic Programming. The MIT Press, Cambridge, Mass. (1995) 243–266
17. Aiello, M., Weyhrauch, R.W.: Checking proofs in the metamathematics of first order logic. In: Proc. Fourth Intl. Joint Conf. on Artificial Intelligence, Tbilisi, Georgia, Morgan Kaufmann Publishers (1975) 1–8
18. Bundy, A., Welham, B.: Using meta-level inference for selective application of multiple rewrite rules in algebraic manipulation. Artificial Intelligence 16 (1981) 189–212
19. Weyhrauch, R.W.: Prolegomena to a theory of mechanized formal reasoning. Artificial Intelligence (1980) 133–170
20. Carlucci Aiello, L., Cecchi, C., Sartini, D.: Representation and use of metaknowledge. Proc. of the IEEE 74 (1986) 1304–1321
21. Carlucci Aiello, L., Levi, G.: The uses of metaknowledge in AI systems. In: Proc. European Conf. on Artificial Intelligence. (1984) 705–717
22. Davis, R., Buchanan, B.: Meta-level knowledge: Overview and applications. In: Procs. Fifth Int. Joint Conf. on Artificial Intelligence, Los Altos, Calif., Morgan Kaufmann (1977) 920–927
23. Maes, P.: Computational Reflection. PhD thesis, Vrije Universiteit Brussel, Faculteit Wetenschappen, Dienst Artificiele Intelligentie, Brussel (1986)
24. Genesereth, M.R.: Metalevel reasoning. In: Logic-87-8, Logic Group, Stanford University (1987)
25. Carlucci Aiello, L., Levi, G.: The uses of metaknowledge in AI systems. In Maes, P., Nardi, D., eds.: Meta-Level Architectures and Reflection. North-Holland, Amsterdam (1988) 243–254
26. Carlucci Aiello, L., Nardi, D., Schaerf, M.: Yet Another Solution to the Three Wisemen Puzzle. In Ras, Z.W., Saitta, L., eds.: Methodologies for Intelligent Systems 3: ISMIS-88, Elsevier Science Publishing (1988) 398–407
27. Carlucci Aiello, L., Nardi, D., Schaerf, M.: Reasoning about Knowledge and Ignorance. In: Proceedings of the International Conference on Fifth Generation Computer Systems 1988: FGCS-88, ICOT Press (1988) 618–627
28. Genesereth, M.R., Nilsson, N.J.: Logical Foundations of Artificial Intelligence. Morgan Kaufmann, Los Altos, California (1987)


29. Russell, S.J., Wefald, E.: Do the right thing: studies in limited rationality (Chapter 2: Metareasoning Architectures). The MIT Press (1991)
30. Carlucci Aiello, L., Cialdea, M., Nardi, D.: A meta level abstract description of diagnosis in Intelligent Tutoring Systems. In: Proceedings of the Sixth International PEG Conference, PEG-91. (1991) 437–442
31. Carlucci Aiello, L., Cialdea, M., Nardi, D.: Reasoning about Student Knowledge and Reasoning. Journal of Artificial Intelligence and Education 4 (1993) 397–413
32. Damásio, C., Nejdl, W., Pereira, L.M., Schroeder, M.: Model-based diagnosis preferences and strategies representation with logic meta-programming. In Apt, K., Turini, F., eds.: Meta-Logics and Logic Programming. The MIT Press, Cambridge, Mass. (1995) 267–308
33. Barklund, J., Costantini, S., Dell'Acqua, P., Lanzarone, G.A.: Reflection Principles in Computational Logic. Journal of Logic and Computation 10 (2000)
34. Barklund, J.: What is a meta-variable in Prolog? In Abramson, H., Rogers, M.H., eds.: Meta-Programming in Logic Programming. The MIT Press, Cambridge, Mass. (1989) 383–398
35. Hill, P.M., Lloyd, J.W.: Analysis of metaprograms. In Abramson, H., Rogers, M.H., eds.: Meta-Programming in Logic Programming, Cambridge, Mass., The MIT Press (1988) 23–51
36. Barklund, J., Costantini, S., Dell'Acqua, P., Lanzarone, G.A.: Semantical properties of encodings in logic programming. In Lloyd, J.W., ed.: Logic Programming – Proc. 1995 Intl. Symp., Cambridge, Mass., MIT Press (1995) 288–302
37. van Harmelen, F.: Definable naming relations in meta-level systems. In Pettorossi, A., ed.: Meta-Programming in Logic. LNCS 649, Berlin, Springer-Verlag (1992) 89–104
38. Cervesato, I., Rossi, G.: Logic meta-programming facilities in Log. In Pettorossi, A., ed.: Meta-Programming in Logic. LNCS 649, Berlin, Springer-Verlag (1992) 148–161
39. Costantini, S.: Semantics of a metalogic programming language. Intl. Journal of Foundation of Computer Science 1 (1990)
40. Perlis, D.: Languages with self-reference I: foundations (or: we can have everything in first-order logic!). Artificial Intelligence 25 (1985) 301–322
41. Perlis, D.: Languages with self-reference II. Artificial Intelligence 34 (1988) 179–212
42. Konolige, K.: Reasoning by introspection. In Maes, P., Nardi, D., eds.: Meta-Level Architectures and Reflection. North-Holland, Amsterdam (1988) 61–74
43. Genesereth, M.R.: Introspective fidelity. In Maes, P., Nardi, D., eds.: Meta-Level Architectures and Reflection. North-Holland, Amsterdam (1988) 75–86
44. van Harmelen, F., Wielinga, B., Bredeweg, B., Schreiber, G., Karbach, W., Reinders, M., Voss, A., Akkermans, H., Bartsch-Spörl, B., Vinkhuyzen, E.: Knowledge-level reflection. In: Enhancing the Knowledge Engineering Process – Contributions from ESPRIT. Elsevier Science, Amsterdam, The Netherlands (1992) 175–204
45. Carlucci Aiello, L., Weyhrauch, R.W.: Using Meta-theoretic Reasoning to do Algebra. Volume 87 of Lecture Notes in Computer Science, Springer-Verlag (1980) 1–13
46. Bowen, K.A., Kowalski, R.A.: Amalgamating language and metalanguage in logic programming. In Clark, K.L., Tärnlund, S.-Å., eds.: Logic Programming. Academic Press, London (1982) 153–172
47. McCarthy, J., et al.: The LISP 1.5 Programmer's Manual


48. Levi, G., Ramundo, D.: A formalization of metaprogramming for real. In Warren, D.S., ed.: Logic Programming - Procs. of the Tenth International Conference, Cambridge, Mass., The MIT Press (1993) 354–373
49. Subrahmanian, V.S.: Foundations of metalogic programming. In Abramson, H., Rogers, M.H., eds.: Meta-Programming in Logic Programming, Cambridge, Mass., The MIT Press (1988) 1–14
50. Martens, B., De Schreye, D.: Why untyped nonground metaprogramming is not (much of) a problem. J. Logic Programming 22 (1995)
51. Sterling, L., Shapiro, E.Y., eds.: The Art of Prolog. The MIT Press, Cambridge, Mass. (1986)
52. Kowalski, R.A.: Meta matters. Invited presentation at Second Workshop on Meta-Programming in Logic META90 (1990)
53. Kowalski, R.A.: Problems and promises of computational logic. In Lloyd, J.W., ed.: Computational Logic. Springer-Verlag, Berlin (1990) 1–36
54. Smith, B.C.: Reflection and semantics in Lisp. Technical report, Xerox PARC ISL-5, Palo Alto (CA) (1984)
55. Lemmens, I., Braspenning, P.: A formal analysis of Smithsonian computational reflection. In Cointe, P., ed.: Proc. Reflection '99, 135–137
56. Casaschi, G., Costantini, S., Lanzarone, G.A.: Realizzazione di un interprete riflessivo per clausole di Horn. In Mello, P., ed.: Gulp89, Proc. 4th Italian National Symp. on Logic Programming, Bologna (1989) 227–241 (in Italian)
57. Friedman, D.P., Sobel, J.M.: An introduction to reflection-oriented programming. In Kiczales, G., ed.: Meta-Level Architectures and Reflection, Proc. of the First Intnl. Conf. Reflection 96, Xerox PARC (1996)
58. Attardi, G., Simi, M.: Meta-level reasoning across viewpoints. In O'Shea, T., ed.: Proc. European Conf. on Artificial Intelligence, Amsterdam, North-Holland (1984) 315–325
59. Hill, P.M., Lloyd, J.W.: The Gödel Programming Language. The MIT Press, Cambridge, Mass. (1994)
60. Bowers, A.F., Gurr, C.: Towards fast and declarative meta-programming. In Apt, K.R., Turini, F., eds.: Meta-Logics and Logic Programming. The MIT Press, Cambridge, Mass. (1995) 137–166
61. Giunchiglia, F., Cimatti, A.: Introspective metatheoretic reasoning. In Fribourg, L., Turini, F., eds.: Logic Program Synthesis and Transformation – Meta-Programming in Logic. LNCS 883 (1994) 425–439
62. Giunchiglia, F., Traverso, A.: A metatheory of a mechanized object theory. Artificial Intelligence 80 (1996) 197–241
63. Giunchiglia, F., Serafini, L.: Multilanguage hierarchical logics, or: how we can do without modal logics. Artificial Intelligence 65 (1994) 29–70
64. Costantini, S., Lanzarone, G.A.: A metalogic programming language. In Levi, G., Martelli, M., eds.: Proc. 6th Intl. Conf. on Logic Programming, Cambridge, Mass., The MIT Press (1989) 218–233
65. Costantini, S., Lanzarone, G.A.: A metalogic programming approach: language, semantics and applications. Int. J. of Experimental and Theoretical Artificial Intelligence 6 (1994) 239–287
66. Konolige, K.: An autoepistemic analysis of metalevel reasoning in logic programming. In Pettorossi, A., ed.: Meta-Programming in Logic. LNCS 649, Berlin, Springer-Verlag (1992)
67. Dell'Acqua, P.: Development of the interpreter for a metalogic programming language. Degree thesis, Univ. degli Studi di Milano, Milano (1989) (in Italian)


68. Maes, P.: Concepts and experiments in computational reflection. In: Proc. of OOPSLA'87. ACM SIGPLAN Notices (1987) 147–155
69. Kiczales, G., des Rivieres, J., Bobrow, D.G.: The Art of the Metaobject Protocol. The MIT Press (1991)
70. Malenfant, J., Lapalme, G., Vaucher, G.: ObjVProlog: Metaclasses in logic. In: Proc. of ECOOP'89, Cambridge Univ. Press (1990) 257–269
71. Malenfant, J., Lapalme, G., Vaucher, G.: Metaclasses for metaprogramming in Prolog. In Bruynooghe, M., ed.: Proc. of the Second Workshop on Meta-Programming in Logic, Dept. of Comp. Sci., Katholieke Univ. Leuven (1990) 272–283
72. Stroud, R., Welch, I.: The evolution of a reflective Java extension. LNCS 1616, Berlin, Springer-Verlag (1999)
73. Jiang, Y.J.: Ambivalent logic as the semantic basis of metalogic programming: I. In Van Hentenryck, P., ed.: Proc. 11th Intl. Conf. on Logic Programming, Cambridge, Mass., The MIT Press (1994) 387–401
74. Kalsbeek, M., Jiang, Y.: A vademecum of ambivalent logic. In Apt, K., Turini, F., eds.: Meta-Logics and Logic Programming. The MIT Press, Cambridge, Mass. (1995) 27–56
75. Kalsbeek, M.: Correctness of the vanilla meta-interpreter and ambivalent syntax. In Apt, K., Turini, F., eds.: Meta-Logics and Logic Programming. The MIT Press, Cambridge, Mass. (1995) 3–26
76. Christiansen, H.: A complete resolution principle for logical meta-programming languages. In Pettorossi, A., ed.: Meta-Programming in Logic. LNCS 649, Berlin, Springer-Verlag (1992) 205–234
77. Christiansen, H.: Efficient and complete demo predicates for definite clause languages. Datalogiske Skrifter, Technical Report 51, Dept. of Computer Science, Roskilde University (1994)
78. Brogi, A., Mancarella, P., Pedreschi, D., Turini, F.: Composition operators for logic theories. In Lloyd, J.W., ed.: Computational Logic. Springer-Verlag, Berlin (1990) 117–134
79. Brogi, A., Contiero, S.: Composing logic programs by meta-programming in Gödel. In Apt, K., Turini, F., eds.: Meta-Logics and Logic Programming. The MIT Press, Cambridge, Mass. (1995) 167–194
80. Brogi, A., Turini, F.: Meta-logic for program composition: Semantic issues. In Apt, K., Turini, F., eds.: Meta-Logics and Logic Programming. The MIT Press, Cambridge, Mass. (1995) 83–110
81. Barklund, J., Boberg, K., Dell'Acqua, P.: A basis for a multilevel metalogic programming language. In Fribourg, L., Turini, F., eds.: Logic Program Synthesis and Transformation – Meta-Programming in Logic. LNCS 883, Berlin, Springer-Verlag (1994) 262–275
82. Barklund, J., Boberg, K., Dell'Acqua, P., Veanes, M.: Meta-programming with theory systems. In Apt, K., Turini, F., eds.: Meta-Logics and Logic Programming. The MIT Press, Cambridge, Mass. (1995) 195–224
83. Shoham, Y., McDermott, D.: Temporal reasoning. In Shapiro, S.C., ed.: Encyclopedia of Artificial Intelligence (1987) 967–981
84. Kowalski, R.A., Sergot, M.: A logic-based calculus of events. New Generation Computing 4 (1986) 67–95
85. McCarthy, J., Hayes, P.: Some philosophical problems from the standpoint of artificial intelligence. Machine Intelligence 4 (1969) 463–502
86. Kowalski, R.A.: Database updates in the event calculus. J. Logic Programming (1992) 121–146


87. Kowalski, R.A., Sadri, F.: The situation calculus and event calculus compared. In: Proc. 1994 Intl. Logic Programming Symp. (1994) 539–553
88. Kowalski, R.A., Sadri, F.: Reconciling the event calculus with the situation calculus. J. Logic Programming 31 (1997) 39–58
89. Provetti, A.: Hypothetical reasoning: From situation calculus to event calculus. Computational Intelligence Journal 12 (1996) 478–498
90. Díaz, O., Paton, N.: Stimuli and business policies as modeling constructs: their definition and validation through the event calculus. In: Proc. of CAiSE'97. (1997) 33–46
91. Sripada, S.: Efficient implementation of the event calculus for temporal database applications. In Lloyd, J.W., ed.: Proc. 12th Intl. Conf. on Logic Programming, Cambridge, Mass., The MIT Press (1995) 99–113
92. Pfenning, F.: The practice of logical frameworks. In Kirchner, H., ed.: Trees in Algebra and Programming - CAAP '96. LNCS 1059, Linköping, Sweden, Springer-Verlag (1996) 119–134
93. Clavel, M.G., Eker, S., Lincoln, P., Meseguer, J.: Principles of Maude. In Meseguer, J., ed.: Proc. First Intl. Workshop on Rewriting Logic. Volume 4 of Electronic Notes in Th. Comp. Sc. (1996)
94. Clavel, M.G., Duran, F., Eker, S., Lincoln, P., Marti-Oliet, N., Meseguer, J., Quesada, J.: Maude as a metalanguage. In: Proc. Second Intl. Workshop on Rewriting Logic. Volume 15 of Electronic Notes in Th. Comp. Sc. (1998)
95. Clavel, M.G., Meseguer, J.: Axiomatizing reflective logics and languages. In Kiczales, G., ed.: Proc. Reflection '96, Xerox PARC (1996) 263–288
96. Costantini, S., Lanzarone, G.A., Sbarbaro, L.: A formal definition and a sound implementation of analogical reasoning in logic programming. Annals of Mathematics and Artificial Intelligence 14 (1995) 17–36
97. Costantini, S., Dell'Acqua, P., Lanzarone, G.A.: Reflective agents in metalogic programming. In Pettorossi, A., ed.: Meta-Programming in Logic. LNCS 649, Berlin, Springer-Verlag (1992) 135–147
98. Martin, D.L., Cheyer, A.J., Moran, D.B.: The open agent architecture: a framework for building distributed software systems. Applied Artificial Intelligence 13(1–2) (1999) 91–128
99. Rao, A.S., Georgeff, M.P.: Modeling rational agents within a BDI-architecture. In Fikes, R., Sandewall, E., eds.: Proceedings of Knowledge Representation and Reasoning (KR&R-91), Morgan Kaufmann Publishers, San Mateo, CA (1991) 473–484
100. Rao, A.S., Georgeff, M.: BDI Agents: from theory to practice. In: Proceedings of the First International Conference on Multi-Agent Systems (ICMAS-95), San Francisco, CA (1995) 312–319
101. Dix, J., Subrahmanian, V., Pick, G.: Meta-agent programs. J. Logic Programming 45 (2000)
102. Kim, J.S., Kowalski, R.A.: An application of amalgamated logic to multi-agent belief. In Bruynooghe, M., ed.: Proc. of the Second Workshop on Meta-Programming in Logic, Dept. of Comp. Sci., Katholieke Univ. Leuven (1990) 272–283
103. Kim, J.S., Kowalski, R.A.: A metalogic programming approach to multi-agent knowledge and belief. In Lifschitz, V., ed.: Artificial Intelligence and Mathematical Theory of Computation, Academic Press (1991)
104. Kowalski, R.A., Sadri, F.: Towards a unified agent architecture that combines rationality with reactivity. In: Proc. International Workshop on Logic in Databases. LNCS 1154, Berlin, Springer-Verlag (1996)


105. Kowalski, R.A., Sadri, F.: From logic programming towards multi-agent systems. Annals of Mathematics and Artificial Intelligence 25 (1999) 391–410
106. Dell'Acqua, P., Sadri, F., Toni, F.: Combining introspection and communication with rationality and reactivity in agents. In Dix, J., Cerro, F.D., Furbach, U., eds.: Logics in Artificial Intelligence. LNCS 1489, Berlin, Springer-Verlag (1998)
107. Fung, T.H., Kowalski, R.A.: The IFF proof procedure for abductive logic programming. J. Logic Programming 33 (1997) 151–165
108. Dell'Acqua, P., Sadri, F., Toni, F.: Communicating agents. In: Proc. International Workshop on Multi-Agent Systems in Logic Programming, in conjunction with ICLP'99, Las Cruces, New Mexico (1999)
109. Costantini, S.: Towards active logic programming. In Brogi, A., Hill, P., eds.: Proc. of 2nd International Workshop on Component-based Software Development in Computational Logic (COCL'99). PLI'99, Paris, France, http://www.di.unipi.it/ brogi/ ResearchActivity/COCL99/ proceedings/index.html (1999)
110. Gärdenfors, P.: Belief revision: a vademecum. In Pettorossi, A., ed.: Meta-Programming in Logic. LNCS 649, Berlin, Springer-Verlag (1992) 135–147
111. Gärdenfors, P., Roth, H.: Belief revision. In Gabbay, D., Hogger, C., Robinson, J., eds.: Handbook of Logic in Artificial Intelligence and Logic Programming. Volume 4. Clarendon Press (1995) 36–119
112. Dell'Acqua, P., Pereira, L.M.: Updating agents. (1999)
113. Lamma, E., Riguzzi, F., Pereira, L.M.: Agents learning in a three-valued logical setting. In Panayiotopoulos, A., ed.: Workshop on Machine Learning and Intelligent Agents, in conjunction with Machine Learning and Applications, Advanced Course on Artificial Intelligence (ACAI'99), Chania (Greece) (1999) (Also available at http://centria.di.fct.unl.pt/~lmp/)
114. Brewka, G.: Declarative representation of revision strategies. In Baral, C., Truszczynski, M., eds.: NMR'2000, Proc. of the 8th Intl. Workshop on Non-Monotonic Reasoning. (2000)
115. McCarthy, J.: First order theories of individual concepts and propositions. Machine Intelligence 9 (1979) 129–147
116. Lloyd, J.W.: Foundations of Logic Programming, Second Edition. Springer-Verlag, Berlin (1987)
117. Dell'Acqua, P.: Reflection principles in computational logic. PhD Thesis, Uppsala University, Uppsala (1998)
118. Dell'Acqua, P.: SLD-Resolution with reflection. PhL Thesis, Uppsala University, Uppsala (1995)
119. Jaffar, J., Lassez, J.L., Maher, M.J.: A theory of complete logic programs with equality. J. Logic Programming 3 (1984) 211–223
120. Martens, B., De Schreye, D.: Two semantics for definite meta-programs, using the non-ground representation. In Apt, K., Turini, F., eds.: Meta-Logics and Logic Programming. The MIT Press, Cambridge, Mass. (1995) 57–82
121. Falaschi, M., Levi, G., Martelli, M., Palamidessi, C.: A new declarative semantics for logic languages. In Kowalski, R.A., Bowen, K.A., eds.: Proc. 5th Intl. Conf. Symp. on Logic Programming, Cambridge, Mass., MIT Press (1988) 993–1005

Argumentation-Based Proof Procedures for Credulous and Sceptical Non-monotonic Reasoning

Phan Minh Dung1, Paolo Mancarella2, and Francesca Toni3

1 Division of Computer Science, Asian Institute of Technology, GPO Box 2754, Bangkok 10501, Thailand
2 Dipartimento di Informatica, Università di Pisa, Corso Italia 40, I-56125 Pisa, Italy
3 Department of Computing, Imperial College of Science, Technology and Medicine, 180 Queen's Gate, London SW7 2BZ, U.K.

Abstract. We deﬁne abstract proof procedures for performing credulous and sceptical non-monotonic reasoning, with respect to the argumentation-theoretic formulation of non-monotonic reasoning proposed in [1]. Appropriate instances of the proposed proof procedures provide concrete proof procedures for concrete formalisms for non-monotonic reasoning, for example logic programming with negation as failure and default logic. We propose (credulous and sceptical) proof procedures under diﬀerent argumentation-theoretic semantics, namely the conventional stable model semantics and the more liberal partial stable model or preferred extension semantics. We study the relationships between proof procedures for diﬀerent semantics, and argue that, in many meaningful cases, the (simpler) proof procedures for reasoning under the preferred extension semantics can be used as sound and complete procedures for reasoning under the stable model semantics. In many meaningful cases still, proof procedures for credulous reasoning under the preferred extension semantics can be used as (much simpler) sound and complete procedures for sceptical reasoning under the preferred extension semantics. We compare the proposed proof procedures with existing proof procedures in the literature.

1 Introduction

A.C. Kakas, F. Sadri (Eds.): Computat. Logic (Kowalski Festschrift), LNAI 2408, pp. 289–310, 2002. © Springer-Verlag Berlin Heidelberg 2002

In recent years argumentation [1,3,4,6,12,15,21,23,24,29,30,32] has played an important role in understanding many non-monotonic formalisms and their semantics, such as logic programming with negation as failure, default logic and autoepistemic logic. In particular, Eshghi and Kowalski [9] have given an interpretation of negation as failure in Logic Programming as a form of assumption based reasoning (abduction). Continuing this line of work, Dung [5] has given a declarative understanding of this assumption based view, by formalizing the concept that an assumption can be safely accepted if "there is no evidence to the contrary". It has also been shown in [5] that the assumption based view provides a unifying framework for different semantics of logic programming. Later, this view has been further put forward [1,6,12] by the introduction of the notions of attack and counterattack between sets of assumptions, finally leading to an argumentation-theoretic understanding of the semantics of logic programming and non-monotonic reasoning. In particular, Dung [6] has introduced an abstract framework of argumentation, that consists of a set of arguments and an attack relation between them. However, this abstract framework leaves open the question of how the arguments and their attack relationship are defined. Addressing this issue, Bondarenko et al. [1] have defined an abstract, argumentation-theoretic assumption-based framework for non-monotonic reasoning that can be instantiated to capture many of the existing approaches to non-monotonic reasoning, namely logic programming with negation as failure, default logic [25], (many cases of) circumscription [16], Theorist [22], autoepistemic logic [18] and non-monotonic modal logics [17].

The semantics of argumentation can be used to characterize a number of alternative semantics for non-monotonic reasoning, each of which can be the basis for credulous and sceptical reasoning. In particular, three semantics have been proposed in [1,6] generalizing, respectively, the semantics of admissible scenaria for logic programming [5], the semantics of preferred extensions [5] or partial stable models [26] for logic programming, and the conventional semantics of stable models [10] for logic programming as well as the standard semantics of Theorist [22], circumscription [16], default logic [25], autoepistemic logic [18] and non-monotonic modal logic [17].

In more detail, Bondarenko et al.
understand non-monotonic reasoning as extending theories in some monotonic language by means of sets of assumptions, provided they are “appropriate” with respect to some requirements. These are expressed in argumentation-theoretic terms, as follows. According to the semantics of admissible extensions, a set of assumptions is deemed “appropriate” iff it does not attack itself and it attacks all sets of assumptions which attack it. According to the semantics of preferred extensions, a set of assumptions is deemed “appropriate” iff it is maximally admissible, with respect to set inclusion. According to the semantics of stable extensions, a set of assumptions is deemed “appropriate” iff it does not attack itself and it attacks every assumption which does not belong to it. Given any such semantics of extensions, credulous and sceptical non-monotonic reasoning can be defined, as follows. A given sentence in the underlying monotonic language is a credulous non-monotonic consequence of a theory iff it holds in some extension of the theory that is deemed “appropriate” by the chosen semantics. It is a sceptical non-monotonic consequence iff it holds in all extensions of the theory that are deemed “appropriate” by the chosen semantics. In this paper we propose abstract proof procedures for performing credulous and sceptical reasoning under the three semantics of admissible, preferred and

Argumentation-Based Proof Procedures

stable extensions, concentrating on the special class of ﬂat frameworks. This class includes logic programming with negation as failure and default logic. We deﬁne all proof procedures parametrically with respect to a proof procedure computing the semantics of admissible extensions. A number of such procedures have been proposed in the literature, e.g. [9,5,7,8,15]. We argue that the proof procedures for reasoning under the preferred extension semantics are “simpler” than those for reasoning under the stable extension semantics. This is an interesting argument in that, in many meaningful cases (e.g. when the frameworks are order-consistent [1]), the proof procedures for reasoning under the preferred extension semantics can be used as sound and complete procedures for reasoning under the stable model semantics. The paper is organized as follows. Section 2 summarises the main features of the approach in [1]. Section 3 gives some preliminary deﬁnitions, used later on in the paper to deﬁne the proof procedures. Sections 4 and 5 describe the proof procedures for performing credulous reasoning under the preferred and stable extension semantics, respectively. Sections 6 and 7 describe the proof procedures for performing sceptical reasoning under the stable and preferred extension semantics, respectively. Section 8 compares the proposed proof procedures with existing proof procedures proposed in the literature. Section 9 concludes.

2 Argumentation-Based Semantics

In this section we briefly review the notion of assumption-based framework [1], showing how it can be used to extend any deductive system for a monotonic logic to a non-monotonic logic. A deductive system is a pair (L, R) where
– L is a formal language consisting of countably many sentences, and
– R is a set of inference rules of the form α1, ..., αn / α (premises α1, ..., αn, conclusion α), where α, α1, ..., αn ∈ L and n ≥ 0. If n = 0, then the inference rule is an axiom.
A set of sentences T ⊆ L is called a theory. A deduction from a theory T is a sequence β1, ..., βm, where m > 0, such that, for all i = 1, ..., m,
– βi ∈ T, or
– there exists an inference rule α1, ..., αn / βi in R such that α1, ..., αn ∈ {β1, ..., βi−1}.
T ⊢ α means that there is a deduction (of α) from T whose last element is α. Th(T) is the set {α ∈ L | T ⊢ α}. Deductive systems are monotonic, in the sense that T ⊆ T′ implies Th(T) ⊆ Th(T′). They are also compact, in the sense that T ⊢ α implies T′ ⊢ α for some finite subset T′ of T. Given a deductive system (L, R), an argumentation-theoretic framework with respect to (L, R) is a tuple ⟨T, Ab, ¯⟩ where

– T, Ab ⊆ L, Ab ≠ {}
– ¯ is a mapping from Ab into L. ᾱ is called the contrary of α.
The theory T can be viewed as a given set of beliefs, and Ab as a set of candidate assumptions that can be used to extend T. An extension of a theory T is a theory Th(T ∪ ∆), for some ∆ ⊆ Ab. Sometimes, informally, we refer to the extension simply as T ∪ ∆ or ∆. Given a deductive system (L, R) and an argumentation-theoretic framework

⟨T, Ab, ¯⟩ with respect to (L, R), the problem of determining whether a given sentence σ in L is a non-monotonic consequence of the framework is understood as the problem of determining whether there exist “appropriate” extensions ∆ ⊆ Ab of T such that T ∪ ∆ ⊢ σ. In particular, σ is a credulous non-monotonic consequence of ⟨T, Ab, ¯⟩ if it holds in some “appropriate” extension of T. Many logics for default reasoning are credulous in this same sense, differing however in the way they understand what it means for an extension to be “appropriate”. Some logics, in contrast, are sceptical, in the sense that they require that σ belong to all “appropriate” extensions. However, the semantics of any of these logics can be made sceptical or credulous, simply by varying whether a sentence is deemed to be a non-monotonic consequence of a theory if it belongs to all “appropriate” extensions or if it belongs to some “appropriate” extension.

A number of notions of “appropriate” extensions are given in [1], for any argumentation-theoretic framework ⟨T, Ab, ¯⟩ with respect to (L, R). All these notions are formulated in argumentation-theoretic terms, with respect to a notion of “attack” defined as follows. Given a set of assumptions ∆ ⊆ Ab:
– ∆ attacks an assumption α ∈ Ab iff T ∪ ∆ ⊢ ᾱ
– ∆ attacks a set of assumptions ∆′ ⊆ Ab iff ∆ attacks an assumption α, for some α ∈ ∆′.
In this paper we will consider the notions of “stable”, “admissible” and “preferred” extensions, defined below. Let a set of assumptions ∆ ⊆ Ab be closed iff ∆ = {α ∈ Ab | T ∪ ∆ ⊢ α}. Then, ∆ ⊆ Ab is stable if and only if
1. ∆ is closed,
2. ∆ does not attack itself, and
3. ∆ attacks α, for every assumption α ∉ ∆.
Furthermore, ∆ ⊆ Ab is admissible if and only if
1. ∆ is closed,
2. ∆ does not attack itself, and
3. for each closed set of assumptions ∆′ ⊆ Ab, if ∆′ attacks ∆ then ∆ attacks ∆′.
Finally, ∆ ⊆ Ab is preferred if and only if ∆ is maximally admissible, with respect to set inclusion.
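The definitions of attack, admissible, preferred and stable sets can be checked by brute force on a small example. The following Python sketch is only an illustration under assumptions that are not part of [1]: the framework is a logic program with negation as failure in which every set of assumptions is closed, rules are encoded as hypothetical (head, positive body, negated body) triples, and an assumption not x is represented by the atom x inside a frozenset. All function names are made up for this sketch.

```python
from itertools import combinations

# Hypothetical encoding of an illustrative program:
#   p <- not s      s <- q      q <- not r      r <- not q
RULES = [("p", [], ["s"]), ("s", ["q"], []), ("q", [], ["r"]), ("r", [], ["q"])]
ATOMS = {"p", "q", "r", "s"}

def derive(delta):
    """Atoms provable from RULES when each `not x`, x in delta, is assumed."""
    facts, changed = set(), True
    while changed:
        changed = False
        for head, pos, neg in RULES:
            if head not in facts and set(pos) <= facts and set(neg) <= delta:
                facts.add(head)
                changed = True
    return facts

def subsets():
    """All sets of assumptions, smallest first."""
    items = sorted(ATOMS)
    return [frozenset(c) for n in range(len(items) + 1) for c in combinations(items, n)]

def attacks(delta, target):
    """delta attacks target iff it derives the contrary x of some `not x` in target."""
    return bool(derive(delta) & target)

def admissible(delta):
    """delta does not attack itself and attacks every set that attacks it."""
    if attacks(delta, delta):
        return False
    return all(attacks(delta, a) for a in subsets() if attacks(a, delta))

def stable(delta):
    """delta does not attack itself and attacks every assumption outside it."""
    return not attacks(delta, delta) and ATOMS - delta <= derive(delta)

adm = [d for d in subsets() if admissible(d)]
preferred = [d for d in adm if not any(d < e for e in adm)]  # maximal admissible sets
stable_sets = [d for d in subsets() if stable(d)]

print("admissible:", [sorted(d) for d in adm])
print("preferred: ", [sorted(d) for d in preferred])
print("stable:    ", [sorted(d) for d in stable_sets])
```

On this program the preferred and the stable sets coincide; the enumeration is exponential in the number of assumptions and is meant only to mirror the definitions, not to serve as a proof procedure.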

In general, every admissible extension is contained in some preferred extension. Moreover, every stable extension is preferred (and thus admissible) [1] but not vice versa. However, in many cases, e.g. for stratified and order-consistent argumentation-theoretic frameworks (see [1]), preferred extensions are always stable¹. In this paper we concentrate on flat frameworks [1], namely frameworks in which every set of assumptions ∆ ⊆ Ab is closed. For frameworks of this kind, the definitions of admissible and stable extensions can be simplified by dropping condition 1 and by dropping the requirement that ∆′ be closed in condition 3 of the definition of admissible extension. In general, if the framework is flat, both admissible and preferred extensions are guaranteed to exist. By contrast, even for flat frameworks, stable extensions are not guaranteed to exist. However, in many cases, e.g. for stratified argumentation-theoretic frameworks [1], stable extensions are always guaranteed to exist.

Different logics for default reasoning differ, not only in whether they are credulous or sceptical and how they interpret the notion of what it means to be an “appropriate” extension, but also in their underlying framework. Bondarenko et al. [1] show how the framework can be instantiated to obtain theorist [22], (some cases of) circumscription [16], autoepistemic logic [18], non-monotonic modal logics [17], default logic [25], and logic programming, with respect to, e.g., the semantics of stable models [10] and partial stable models [26], the latter being equivalent [13] to the semantics of preferred extensions [5]. They also prove that the instances of the framework for default logic and logic programming are flat.

Default logic is the instance of the abstract framework ⟨T, Ab, ¯⟩ where the monotonic logic is first-order logic augmented with domain-specific inference rules of the form α1, ..., αm, Mβ1, ..., Mβn / γ where αi, βj, γ are sentences in classical logic.
T is a classical theory and Ab consists of all expressions of the form Mβ where β is a sentence of classical logic. The contrary of an assumption Mβ is ¬β. The conventional semantics of extensions of default logic [25] corresponds to the semantics of stable extensions of the instance of the abstract framework for default logic [1]. Moreover, default logic inherits the semantics of admissible and preferred extensions, simply by being an instance of the framework.

Logic programming is the instance of the abstract framework ⟨T, Ab, ¯⟩ where T is a logic program, the assumptions in Ab are all negations not p of atomic sentences p, and the contrary of an assumption not p is p. ⊢ is Horn logic provability, with assumptions, not p, understood as new atoms, p∗ (see [9]). The logic programming semantics of stable models [10], admissible scenaria [5], and partial stable models [26]/preferred extensions [5] correspond to the semantics of stable, admissible and preferred extensions, respectively, of the instance of the abstract framework for logic programming [1].

¹ See the Appendix for the definition of stratified and order-consistent frameworks.

In the remainder of the paper we will concentrate on computing credulous and sceptical consequences under the semantics of preferred and stable extensions. We will rely upon a proof procedure for computing credulous consequences under the semantics of admissible extensions (see Sect. 8 for a review of such procedures). Note that we ignore the problem of computing sceptical consequences under the semantics of admissible extensions as, for ﬂat frameworks, this problem reduces to that of computing monotonic consequences in the underlying deductive system. Indeed, in ﬂat frameworks, the empty set of assumptions is always admissible. We will propose abstract proof procedures, but, for simplicity, we will illustrate their behaviour within the concrete instance of the abstract framework for logic programming.

3 Preliminaries

In the sequel we assume that a framework is given and we omit mentioning it explicitly if clear from the context. Let S be a set of sets. A subset B of S is called a base of S if for each element s in S there is an element b in B such that b ⊆ s. We assume that the following procedures are defined, where α is a sentence in L and ∆ ⊆ Ab is a set of assumptions:
– support(α, ∆) computes a set of sets ∆′ ⊆ Ab such that α ∈ Th(T ∪ ∆′) and ∆′ ⊇ ∆. support(α, ∆) is said to be complete if it is a base of the set {∆′ ⊆ Ab | α ∈ Th(T ∪ ∆′) and ∆′ ⊇ ∆}.
– adm_expand(∆) computes a set of sets ∆′ ⊆ Ab such that ∆′ ⊇ ∆ and ∆′ is admissible. adm_expand(∆) is said to be complete if it is a base of the set of all admissible supersets of ∆.
We will assume that the above procedures are nondeterministic. We will write, e.g., A := support(α, ∆) meaning that the variable A is assigned, if any, a result of the procedure support. Such a statement represents a backtracking point, which may eventually fail if no further result can be produced by support. The following example illustrates the above procedures.
Example 1. Consider the following logic program
p ← q, not r
q ← not s
t ← not h
f

and the sentence p. Possible outcomes of the procedure support(p, {}) are ∆1 = {not s, not r} and ∆2 = {not s, not r, not f}. Possible outcomes of the procedure adm_expand(∆1) are ∆1 and ∆1 ∪ {not h}. No possible outcomes exist for adm_expand(∆2). Note that different implementations for the above procedures are possible. In all examples in the remainder of the paper we will assume that support and adm_expand return minimal sets. In the above example, ∆1 is a minimal support whereas ∆2 is not, and ∆1 is a minimal admissible expansion of ∆1 whereas ∆1 ∪ {not h} is not.
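The outcomes listed in Example 1 can be reproduced with a naive, exhaustive reading of support and adm_expand. The sketch below is an assumption-laden illustration rather than one of the procedures cited in the paper: it enumerates all sets of assumptions, represents not x by the atom x, encodes rules as hypothetical (head, positive body, negated body) triples, and returns minimal sets only.

```python
from itertools import combinations

# Example 1, hypothetical encoding: p <- q, not r   q <- not s   t <- not h   f
RULES = [("p", ["q"], ["r"]), ("q", [], ["s"]), ("t", [], ["h"]), ("f", [], [])]
ATOMS = {"p", "q", "r", "s", "t", "h", "f"}

def derive(delta):
    """Atoms provable from RULES when each `not x`, x in delta, is assumed."""
    facts, changed = set(), True
    while changed:
        changed = False
        for head, pos, neg in RULES:
            if head not in facts and set(pos) <= facts and set(neg) <= delta:
                facts.add(head)
                changed = True
    return facts

def subsets():
    """All sets of assumptions, smallest first."""
    items = sorted(ATOMS)
    return [frozenset(c) for n in range(len(items) + 1) for c in combinations(items, n)]

def admissible(delta):
    """No self-attack, and every attacking set is attacked back."""
    derived = derive(delta)
    if derived & delta:
        return False
    return all(derived & a for a in subsets() if derive(a) & delta)

def minimal(sets):
    """Subset-minimal members of a list that is ordered by size."""
    kept = []
    for s in sets:
        if not any(k <= s for k in kept):
            kept.append(s)
    return kept

def support(atom, delta):
    """Minimal supersets of delta from which atom is derivable."""
    return minimal([d for d in subsets() if delta <= d and atom in derive(d)])

def admissible_supersets(delta):
    return [d for d in subsets() if delta <= d and admissible(d)]

def adm_expand(delta):
    """Minimal admissible supersets of delta."""
    return minimal(admissible_supersets(delta))

d1 = frozenset({"s", "r"})        # Delta1 = {not s, not r}
d2 = frozenset({"s", "r", "f"})   # Delta2 = {not s, not r, not f}
print(support("p", frozenset()))  # only the minimal support Delta1
print(admissible_supersets(d1))   # Delta1 and Delta1 with not h added
print(adm_expand(d2))             # empty: Delta2 derives f and so attacks itself
```

Because only minimal sets are returned, ∆2 does not appear among the supports even though p is derivable from it.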

4 Computing Credulous Consequences under Preferred Extensions

To show that a sentence is a credulous consequence under the preferred extension semantics, we simply need to check the existence of an admissible set of assumptions which entails the desired sentence. This can be done by:
– finding a support set for the sentence
– showing that the support set can be extended into an admissible extension.
Proof procedure 4.1 (Credulous Preferred Extensions).
CPE(α):
  S := support(α, {});
  ∆ := adm_expand(S);
  return ∆
Notice that the two assignments in the procedure are backtracking points, due to the nondeterministic nature of both support and adm_expand.
Example 2. Consider the following logic program
p ← not s
s ← q
q ← not r
r ← not q
and the sentence p. The procedure CPE(p) will perform the following steps:
– first the set S = {not s} is generated by support(p, {})
– then the set ∆ = {not s, not q} is generated by adm_expand(S)
– finally, ∆ is the set returned by the procedure
Consider now the conjunction p, q. The procedure CPE((p, q))² would fail, since
– S = {not s, not r} is generated by support((p, q), {})
– there exists no admissible set ∆ ⊇ S.

² Note that, in the instance of the framework of [1] for logic programming, conjunctions of atoms are not part of the underlying deductive system. However, conjunctions can be accommodated by additional program clauses. E.g., in the given example, the logic program can be extended by t ← p, q, and CPE can be called for t.

Theorem 1 (Soundness and Completeness of CPE).
1. If CPE(α) succeeds then there exists a preferred extension ∆ such that α ∈ Th(T ∪ ∆).
2. If both support and adm_expand are complete then for each preferred extension E such that α ∈ Th(T ∪ E) there exist appropriate selections such that CPE(α) returns ∆ ⊆ E.
Proof.
1. It follows immediately from the fact that each admissible set of assumptions can be extended into a preferred extension.
2. Let E be a preferred extension such that α ∈ Th(T ∪ E). Since support(α, {}) is complete, there is a set S ⊆ E such that S could be computed by support(α, {}). From the completeness of adm_expand, it follows that there is ∆ ⊆ E such that ∆ is computed by adm_expand(S).
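Procedure 4.1 can be sketched on top of exhaustive versions of support and adm_expand. The Python code below is a minimal illustration under the assumptions used for this sketch only: not x is represented by the atom x, rules are hypothetical (head, positive body, negated body) triples, backtracking is simulated by nested loops over minimal sets, and the conjunction p, q is handled through the extra clause t ← p, q as suggested in the footnote.

```python
from itertools import combinations

# Example 2 plus the clause t <- p, q (hypothetical encoding)
RULES = [("p", [], ["s"]), ("s", ["q"], []), ("q", [], ["r"]),
         ("r", [], ["q"]), ("t", ["p", "q"], [])]
ATOMS = {"p", "q", "r", "s", "t"}

def derive(delta):
    """Atoms provable from RULES when each `not x`, x in delta, is assumed."""
    facts, changed = set(), True
    while changed:
        changed = False
        for head, pos, neg in RULES:
            if head not in facts and set(pos) <= facts and set(neg) <= delta:
                facts.add(head)
                changed = True
    return facts

def subsets():
    """All sets of assumptions, smallest first."""
    items = sorted(ATOMS)
    return [frozenset(c) for n in range(len(items) + 1) for c in combinations(items, n)]

def admissible(delta):
    """No self-attack, and every attacking set is attacked back."""
    derived = derive(delta)
    if derived & delta:
        return False
    return all(derived & a for a in subsets() if derive(a) & delta)

def minimal(sets):
    """Subset-minimal members of a list that is ordered by size."""
    kept = []
    for s in sets:
        if not any(k <= s for k in kept):
            kept.append(s)
    return kept

def support(atom, delta):
    return minimal([d for d in subsets() if delta <= d and atom in derive(d)])

def adm_expand(delta):
    return minimal([d for d in subsets() if delta <= d and admissible(d)])

def cpe(atom):
    """First admissible expansion of some support of atom, or None on failure."""
    for s in support(atom, frozenset()):   # backtracking point 1
        for delta in adm_expand(s):        # backtracking point 2
            return delta
    return None

result_p = cpe("p")   # corresponds to {not s, not q}
result_t = cpe("t")   # the conjunction p, q has no admissible support
print(result_p, result_t)
```

The first call returns the set written {not s, not q} in Example 2, and the second fails because every superset of {not s, not r} derives s and therefore attacks itself.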

5 Computing Credulous Consequences under Stable Extensions

A stable model is nothing but a preferred extension which entails either α or its contrary, for each assumption α [1]. Hence, to show that a sentence is a credulous consequence under the stable model semantics, we simply need to find an admissible extension which entails the sentence and which can be extended into a stable model. We assume that the following procedures are defined:
– full_cover(Γ) returns true iff the set of sentences Γ entails, for every assumption, either the assumption or its contrary, and false otherwise;
– uncovered(Γ) nondeterministically returns, if any, an assumption which is undefined, given Γ, i.e. neither the assumption nor its contrary is entailed by Γ.
In the following procedure CSM, both full_cover and uncovered will be applied to sets of assumptions only.
Proof procedure 5.1 (Credulous Stable Models).
CSM(α):
  ∆ := CPE(α);
  loop
    if full_cover(∆) then return ∆
    else
      β := uncovered(∆);
      ∆ := adm_expand(∆ ∪ {β});
    end if
  end loop

Note that CSM is a non-trivial extension of CPE: once an admissible extension is selected, as in CPE, CSM needs to further expand the selected admissible extension, if possible, to render it stable. This is achieved by the main loop in the procedure. Clearly, the above procedure may not terminate if the underlying framework ⟨T, Ab, ¯⟩ contains infinitely many assumptions, since in this case the main loop may go on forever. In the following theorem we assume that the set of assumptions Ab is finite.
Theorem 2 (Soundness and Completeness of CSM). Let ⟨T, Ab, ¯⟩ be a framework such that Ab is finite.
1. If CSM(α) succeeds then there exists a stable extension ∆ such that α ∈ Th(T ∪ ∆).
2. If both support and adm_expand are complete then for each stable extension E such that α ∈ Th(T ∪ E) there exist appropriate selections such that CSM(α) returns E.
Proof. The theorem follows directly from Theorem 3.
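The main loop of procedure 5.1 can be sketched with Python generators standing in for backtracking points. As before, this is an illustration under assumptions of the sketch, not the authors' implementation: not x is represented by the atom x, rules are hypothetical (head, positive body, negated body) triples, support and adm_expand are exhaustive and return minimal sets, and the small program used is illustrative.

```python
from itertools import combinations

# Illustrative program (hypothetical encoding): p <- not q   q <- not r   r <- not q
RULES = [("p", [], ["q"]), ("q", [], ["r"]), ("r", [], ["q"])]
ATOMS = {"p", "q", "r"}

def derive(delta):
    """Atoms provable from RULES when each `not x`, x in delta, is assumed."""
    facts, changed = set(), True
    while changed:
        changed = False
        for head, pos, neg in RULES:
            if head not in facts and set(pos) <= facts and set(neg) <= delta:
                facts.add(head)
                changed = True
    return facts

def subsets():
    items = sorted(ATOMS)
    return [frozenset(c) for n in range(len(items) + 1) for c in combinations(items, n)]

def admissible(delta):
    derived = derive(delta)
    if derived & delta:
        return False
    return all(derived & a for a in subsets() if derive(a) & delta)

def minimal(sets):
    kept = []
    for s in sets:
        if not any(k <= s for k in kept):
            kept.append(s)
    return kept

def support(atom, delta):
    return minimal([d for d in subsets() if delta <= d and atom in derive(d)])

def adm_expand(delta):
    return minimal([d for d in subsets() if delta <= d and admissible(d)])

def cpe(atom):
    """All results of CPE, produced lazily (one per backtracking choice)."""
    for s in support(atom, frozenset()):
        yield from adm_expand(s)

def csm(atom):
    """All results of CSM: admissible sets grown until they cover every assumption."""
    def complete(delta):
        derived = derive(delta)
        uncovered = [x for x in sorted(ATOMS) if x not in delta and x not in derived]
        if not uncovered:                      # full_cover(delta) holds
            yield delta
            return
        for beta in uncovered:                 # choice of an uncovered assumption
            for bigger in adm_expand(delta | {beta}):
                yield from complete(bigger)
    for delta in cpe(atom):
        yield from complete(delta)

answer_p = next(csm("p"), None)   # {not q} already covers all assumptions
answer_q = next(csm("q"), None)   # {not r} must be grown with not p
print(answer_p, answer_q)
```

For the query q the loop is actually needed: CPE returns {not r}, which leaves not p uncovered, and one expansion step yields the stable set {not p, not r}. Termination relies on the set of assumptions being finite, since each expansion strictly enlarges the current set.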

The CSM procedure is based on backward chaining, in contrast to the procedure of Niemelä et al. [19,20], which is based on forward chaining. We explain the difference between the two procedures in the following example.
Example 3.
p ← not q
q ← not r
r ← not q
Assume that the given query is p. The CSM procedure would compute {not q} as a support for p. The procedure adm_expand({not q}) will produce ∆ = {not q} as its result. Since ∆ covers all assumptions, ∆ is the result produced by the procedure. The procedure of Niemelä et al. would start by picking an arbitrary element from {not p, not q, not r} and apply the Fitting operator to it to get a fixpoint. For example, not r may be selected. Then the set B = {q, not r} is obtained. Since there is no conflict in B and B does not cover all the assumptions, not p will be selected. Since {not p, q, not r} covers all assumptions, a test to check whether p is implied by it is performed, with false as the result. Therefore the procedure backtracks and not q is selected, leading to the expected result. A drawback of the procedure of Niemelä et al. is that it may have to make many unnecessary choices, as the above example shows. However, forward chaining may help in getting closer to the solution more efficiently. The previous observations suggest a modification of the procedure which tries to combine both backward and forward chaining. This can be seen as an integration of our procedure and that of Niemelä et al. In the new procedure, CSM2, we make use of some additional procedures and notations:

– Given a set of sentences Γ, Γ⁻ denotes the set of assumptions contained in Γ.
– A set of sentences Γ is said to be coherent if Γ⁻ is admissible and Γ ⊆ Th(T ∪ Γ⁻).
– Given a set of sentences Γ, expand(Γ) defines a forward expansion of Γ satisfying the following conditions:
  1. Γ ⊆ expand(Γ)
  2. If Γ is coherent then (a) expand(Γ) is also coherent, and (b) for each stable extension E, if Γ⁻ ⊆ E then expand(Γ)⁻ ⊆ E.
Proof procedure 5.2 (Credulous Stable Models).
CSM2(α):
  ∆ := CPE(α);
  Γ := expand(∆);
  loop
    if full_cover(Γ) then return Γ⁻
    else
      β := uncovered(Γ);
      ∆ := adm_expand(Γ⁻ ∪ {β});
      Γ := expand(∆ ∪ Γ);
    end if
  end loop
As anticipated, the procedure expand can be defined in various ways. If expand is simply the identity function, i.e. expand(∆) = ∆, the procedure CSM2 collapses down to CSM. In some other cases, expand could also effectively perform forward reasoning, and try to produce the deductive closure of the given set of sentences. This can be achieved by defining expand in such a way that expand(∆) = Th(T ∪ ∆). In still other cases, expand(∆) could be extended to be closed under Fitting’s operator. As in the case of Theorem 2, we need to assume that the set of assumptions in the underlying framework is finite, in order to prevent non-termination of the main loop.
Theorem 3 (Soundness and Completeness of CSM2). Let ⟨T, Ab, ¯⟩ be a framework such that Ab is finite.
1. If CSM2(α) succeeds then there exists a stable extension ∆ such that α ∈ Th(T ∪ ∆).
2. If both CPE and adm_expand are complete then for each stable extension E such that α ∈ Th(T ∪ E) there exist appropriate selections such that CSM2(α) returns E.

Proof.
1. We first prove by induction that at the beginning of each iteration of the loop, Γ is coherent. The base step is clear since ∆ is admissible. Inductive step: let Γ be coherent. From ∆ := adm_expand(Γ⁻ ∪ {β}), it follows that ∆ is admissible. Because Γ⁻ ⊆ ∆ and Γ ⊆ Th(T ∪ Γ⁻), it follows that Γ ⊆ Th(T ∪ ∆). From (∆ ∪ Γ)⁻ = ∆, it follows that ∆ ∪ Γ is coherent. Therefore expand(∆ ∪ Γ) is coherent. It is obvious that for any coherent set of sentences Γ such that full_cover(Γ) holds, Γ⁻ is stable.
2. Let E be a stable model such that α ∈ Th(T ∪ E). Because CPE is complete, there is a selection such that executing the command ∆ := CPE(α) yields an admissible ∆ ⊆ E. From the properties of expand, it follows that Γ obtained from Γ := expand(∆) is coherent and Γ⁻ ⊆ E. If full_cover(Γ) does not hold, then we can always select a β ∈ E − Γ⁻. Therefore, due to the completeness of adm_expand, we can get a ∆ that is a subset of E. Hence Γ obtained from Γ := expand(∆ ∪ Γ) is coherent and Γ⁻ ⊆ E. Continuing this process until termination, which is guaranteed by the hypothesis that Ab is finite, will return E as the result of the procedure.

However, if in the underlying framework every preferred extension is also stable, then CSM can be greatly simplified by dropping the main loop, namely CSM coincides with CPE. As shown in [1], this is the case if the underlying framework is order-consistent (see Appendix).
Theorem 4 (Soundness and completeness of CPE wrt stable models and order consistency). Let the underlying framework be order-consistent.
1. If CPE(α) succeeds then there exists a stable extension ∆ such that α ∈ Th(T ∪ ∆).
2. If both support and adm_expand are complete then for each stable extension E such that α ∈ Th(T ∪ E) there exist appropriate selections such that CPE(α) returns ∆ ⊆ E.
The use of CPE instead of CSM, whenever possible, greatly simplifies the task of performing credulous reasoning under the stable semantics, in that it allows the search for a stable extension to be kept “localised”, as illustrated by the following example.
Example 4. Consider the following order-consistent logic program
p ← not s
q ← not r
r ← not q
which has two preferred (and stable) extensions containing p, corresponding to the sets of assumptions ∆1 = {not s, not r} and ∆2 = {not s, not q}. The

procedure CPE(p) would compute the admissible extension {not s} as a result, since {not s} is a support for p and it is admissible (there are no attacks against not s). On the other hand, the procedure CSM(p) would produce either ∆1 or ∆2, which are both stable sets extending {not s}.
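The “localised” behaviour described in Example 4 can be observed directly by comparing a naive CPE with a brute-force enumeration of stable sets. The Python sketch below is illustrative only and rests on assumptions of the sketch: not x is represented by the atom x, rules are hypothetical (head, positive body, negated body) triples, and CPE simply returns the first admissible superset of the first support found, searching smallest sets first.

```python
from itertools import combinations

# Example 4 (hypothetical encoding): p <- not s   q <- not r   r <- not q
RULES = [("p", [], ["s"]), ("q", [], ["r"]), ("r", [], ["q"])]
ATOMS = {"p", "q", "r", "s"}

def derive(delta):
    """Atoms provable from RULES when each `not x`, x in delta, is assumed."""
    facts, changed = set(), True
    while changed:
        changed = False
        for head, pos, neg in RULES:
            if head not in facts and set(pos) <= facts and set(neg) <= delta:
                facts.add(head)
                changed = True
    return facts

def subsets():
    items = sorted(ATOMS)
    return [frozenset(c) for n in range(len(items) + 1) for c in combinations(items, n)]

def admissible(delta):
    derived = derive(delta)
    if derived & delta:
        return False
    return all(derived & a for a in subsets() if derive(a) & delta)

def stable(delta):
    derived = derive(delta)
    return not (derived & delta) and ATOMS - delta <= derived

def cpe(atom):
    """Smallest-first search for a support of atom and an admissible superset of it."""
    for s in subsets():
        if atom in derive(s):
            for d in subsets():
                if s <= d and admissible(d):
                    return d
    return None

local = cpe("p")                                        # {not s}
stable_with_p = [d for d in subsets() if stable(d) and "p" in derive(d)]
print(local, stable_with_p)
```

CPE stops at {not s} without deciding between not q and not r, while both stable sets containing p extend that answer.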

6 Computing Sceptical Consequences under Stable Extensions

First, we define the notion of “contrary of sentences”, by extending the notion of “contrary of assumptions”. In all concrete instances of the abstract framework, e.g. logic programming, default logic, autoepistemic logic and non-monotonic modal logic, for each non-assumption sentence β there is a unique assumption α such that ᾱ = β, so the natural way of defining the “contrary of a sentence” β which is not an assumption is β̄ = α such that ᾱ = β. But in general, it is possible that for some non-assumption sentence β there may be no assumption α such that ᾱ = β, or there may be more than one assumption α such that ᾱ = β. Thus, for general frameworks, we define the concept of contrary of sentences which are not assumptions as follows. Let β be a sentence such that β ∉ Ab.
– if there exists α such that ᾱ = β then β̄ = {γ | γ̄ = β}
– if there exists no α such that ᾱ = β then we introduce a new assumption κβ, not already in the language, and we define
  • the contrary of κβ to be β
  • β̄ = {κβ}
Note that, in this way, the contrary of a sentence β ∉ Ab is a set of assumptions. Let us denote by Ab′ ⊇ Ab the new set of assumptions. It is easy to see that the original framework, ⟨T, Ab, ¯⟩, and the extended framework, ⟨T, Ab′, ¯⟩, are equivalent in the following sense:
– if ∆ ⊆ Ab is admissible wrt the original framework then it is also admissible wrt the new framework;
– if ∆′ ⊆ Ab′ is admissible wrt the new framework then ∆′ ∩ Ab is admissible wrt the original framework.
Therefore from now on, we will assume that for each sentence β which is not an assumption there exists at least one assumption α such that ᾱ = β. In order to show that a sentence β is entailed by each stable model, we can proceed as follows:
– check that β is a credulous consequence under the stable model semantics
– check that the contrary of the sentence is not a credulous consequence under the stable model semantics.

Notice that if β ∉ Ab the second step amounts to checking that each α ∈ β̄ is not a credulous consequence under the stable model semantics. Moreover, notice that the first step of the computation cannot be omitted (as one could expect) since there may be cases in which neither β nor its contrary hold in any stable model (e.g. in the framework corresponding to the logic program p ← not p).
Lemma 1. Let E be a stable extension. Then for each non-assumption β such that β ∉ Th(T ∪ E), the following statements are equivalent:
1. β̄ ∩ E ≠ ∅
2. β̄ ⊆ E
Proof. It is clear that the second condition implies the first. We need only prove that the first condition implies the second one. Let β̄ ∩ E ≠ ∅. Suppose that β̄ − E ≠ ∅. Let α ∈ β̄ − E. Since E is stable and α ∉ E, E attacks α, i.e. ᾱ ∈ Th(T ∪ E). This contradicts the conditions that ᾱ = β and β ∉ Th(T ∪ E).
Proof procedure 6.1 (Sceptical Stable Models).
SSM(α):
  if CSM(α) fails then fail;
  select β ∈ ᾱ;
  if CSM(β) succeeds then fail;
Notice that the SSM procedure makes use of the CSM procedure. To prevent non-termination of CSM we need to assume that the set of assumptions Ab of the underlying extended framework is finite. This guarantees the completeness of CSM (cf. Theorem 2).
Theorem 5 (Soundness and Completeness of SSM). Let CSM be complete.
1. If SSM(α) succeeds then α ∈ Th(T ∪ ∆), for every stable extension ∆.
2. If α ∈ Th(T ∪ ∆), for every stable extension ∆, and the set of stable extensions is not empty, then SSM(α) succeeds.
Proof.
1. Let SSM(α) succeed. Assume now that α is not a sceptical consequence wrt stable semantics. There are two cases: α ∈ Ab and α ∉ Ab. Consider the first case where α ∈ Ab. It follows that there is a stable extension E such that ᾱ ∈ Th(T ∪ E). Because of the completeness of CSM, it follows that CSM(ᾱ) succeeds. Hence SSM(α) fails, contradiction. Let α ∉ Ab. From Lemma 1, it follows that there is a stable extension E such that E ∩ ᾱ ≠ ∅. That means CSM(β) succeeds for some β ∈ ᾱ. Lemma 1 implies CSM(β) succeeds for each β ∈ ᾱ. Hence SSM(α) fails. Contradiction.

2. Because CSM is complete, it is clear that CSM(α) succeeds. Also, because of the soundness of CSM, CSM(β) fails for each β ∈ ᾱ. Therefore it is obvious that SSM succeeds.

For a large class of argumentation frameworks, the preferred extension and stable model semantics coincide, e.g. if the frameworks are order-consistent [1]. In these frameworks, the procedure SSM can be simplified significantly as follows.
Proof procedure 6.2 (Sceptical Stable Models via CPE).
SSMPE(α):
  if CPE(α) fails then fail;
  select β ∈ ᾱ;
  if CPE(β) succeeds then fail;
The procedure is structurally the same as the earlier SSM, but it relies upon CPE rather than CSM, and is therefore “simpler” in the same way that CPE is “simpler” than CSM, as discussed earlier in Sect. 5.
Theorem 6 (Soundness and completeness of SSMPE wrt sceptical stable semantics). Let the underlying framework be order-consistent and CPE be complete.
1. If SSMPE(α) succeeds then α ∈ Th(T ∪ ∆), for every stable extension ∆.
2. If α ∈ Th(T ∪ ∆), for every stable extension ∆, then SSMPE(α) succeeds.
Note that the second statement in the above theorem does not require the existence of stable extensions. This is because order-consistency always guarantees this condition.
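The two-step structure of SSM, and the reason why its first step cannot be omitted, can be illustrated as follows. In this Python sketch the call to CSM is replaced by a brute-force search for a stable set entailing the query, which is an assumption of the sketch and not the backward-chaining procedure 5.1; not x is represented by the atom x, sentences are the strings "x" and "not x", and rules are hypothetical (head, positive body, negated body) triples.

```python
from itertools import combinations

def derive(rules, delta):
    """Atoms provable from rules when each `not x`, x in delta, is assumed."""
    facts, changed = set(), True
    while changed:
        changed = False
        for head, pos, neg in rules:
            if head not in facts and set(pos) <= facts and set(neg) <= delta:
                facts.add(head)
                changed = True
    return facts

def subsets(atoms):
    items = sorted(atoms)
    return [frozenset(c) for n in range(len(items) + 1) for c in combinations(items, n)]

def is_stable(rules, atoms, delta):
    derived = derive(rules, delta)
    return not (derived & delta) and atoms - delta <= derived

def holds(rules, sentence, delta):
    """Truth of an atom "x" or an assumption "not x" in the extension of delta."""
    if sentence.startswith("not "):
        return sentence[4:] in delta
    return sentence in derive(rules, delta)

def csm(rules, atoms, sentence):
    """Stand-in for CSM: some stable set entailing the sentence, else None."""
    for delta in subsets(atoms):
        if is_stable(rules, atoms, delta) and holds(rules, sentence, delta):
            return delta
    return None

def ssm(rules, atoms, atom):
    """Sceptical test: atom is credulously derivable and `not atom` is not."""
    if csm(rules, atoms, atom) is None:
        return False
    return csm(rules, atoms, "not " + atom) is None

ODD = [("p", [], ["p"])]                                   # p <- not p
EX4 = [("p", [], ["s"]), ("q", [], ["r"]), ("r", [], ["q"])]
ATOMS4 = {"p", "q", "r", "s"}

print(ssm(ODD, {"p"}, "p"))    # no stable set exists, so the first step fails
print(ssm(EX4, ATOMS4, "p"))   # p holds in both stable sets
print(ssm(EX4, ATOMS4, "q"))   # q holds in only one of them
```

For p ← not p neither p nor not p holds in any stable set, so a test based only on the second step would wrongly accept p.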

7 Computing Sceptical Consequences under Preferred Extensions

The naive way of showing that a sentence is a sceptical consequence under the preferred extension semantics is to consider each preferred extension in turn and check that the sentence is entailed by it. The earlier procedure SSMPE can be used as a simplification of the naive method only if every preferred extension is guaranteed to be stable. In general, however, the procedure SSMPE is not sound under the preferred extension semantics, since there might exist preferred extensions in which, for some assumption α, neither α nor its contrary hold, as the following example shows.
Example 5.
p ← not p
p ← q
q ← not r
r ← not q

Notice that there are two preferred extensions, namely E1 = {not q, r} and E2 = {not r, q, p}. E2 is also a stable extension, whereas E1 is not since neither p nor not p hold in E1. Notice that SSMPE(p) would succeed, hence giving an unsound result. Nonetheless, in the general case, the following theorem shows that it is possible to restrict the number of preferred extensions to consider. This theorem is a variant of theorem 16 in [30], as we will discuss in Sect. 8.
Theorem 7. Given an argumentation-theoretic framework ⟨T, Ab, ¯⟩ and a sentence α in its language, α is a sceptical non-monotonic consequence of T with respect to the preferred extension semantics, i.e. α ∈ Th(T ∪ ∆) for all preferred ∆ ⊆ Ab, iff
1. α ∈ Th(T ∪ ∆0), for some admissible set of assumptions ∆0 ⊆ Ab, and
2. for every set of assumptions ∆ ⊆ Ab, if ∆ is admissible and ∆ attacks ∆0, then α ∈ Th(T ∪ ∆′) for some set of assumptions ∆′ ⊆ Ab such that (a) ∆′ ⊇ ∆, and (b) ∆′ is admissible.
Proof. The only if half is trivial. The if half is proved by contradiction. Suppose there exists a set of assumptions ∆∗ such that ∆∗ is preferred and α ∉ Th(T ∪ ∆∗). Suppose ∆0 is the set of assumptions provided in part 1. If ∆0 = ∅ then α ∈ Th(T) and therefore α ∈ Th(T ∪ ∆∗), thus contradicting the hypothesis. Therefore, ∆0 ≠ ∅. Consider the following two cases: (i) ∆∗ ∪ ∆0 attacks itself, or (ii) ∆∗ ∪ ∆0 does not attack itself. Case (ii) implies that ∆∗ ∪ ∆0 is admissible, thus contradicting the hypothesis that ∆∗ is preferred (and therefore maximally admissible). Case (i) implies that (i.1) ∆∗ ∪ ∆0 attacks ∆∗, or (i.2) ∆∗ ∪ ∆0 attacks ∆0. Assume that (i.1) holds.
∆∗ ∪ ∆0 attacks ∆∗
⇒ {by admissibility of ∆∗}
∆∗ attacks ∆∗ ∪ ∆0
⇒ {by admissibility, ∆∗ does not attack itself}
∆∗ attacks ∆0
⇒ {by part 2}
α ∈ Th(T ∪ ∆∗)
thus contradicting the hypothesis.

Assume now that (i.2) holds.
∆∗ ∪ ∆0 attacks ∆0
⇒ {by admissibility of ∆0}
∆0 attacks ∆∗ ∪ ∆0
⇒ {by admissibility, ∆0 does not attack itself}
∆0 attacks ∆∗
⇒ {by admissibility of ∆∗}
∆∗ attacks ∆0
⇒ {by part 2}
α ∈ Th(T ∪ ∆∗)
thus contradicting the hypothesis.

This result can be used to define the following procedure to check whether or not a given sentence is a sceptical consequence with respect to the preferred extension semantics. Let us assume the following procedure is defined:
– attacks(∆) computes a base of the set of all attacks against the set of assumptions ∆.
Proof procedure 7.1 (Sceptical Preferred Extensions).
SPE(α):
  ∆ := CPE(α);
  for each A := attacks(∆)
    for each ∆′ := adm_expand(A)
      ∆″ := support(α, ∆′);
      if adm_expand(∆″) fails then fail end if
    end for
  end for

The following soundness theorem is a trivial corollary of Theorem 7.
Theorem 8 (Soundness and Completeness of SPE). Let adm_expand be complete.
1. If SPE(α) succeeds, then α ∈ Th(T ∪ ∆), for every preferred extension ∆.
2. If CPE is complete and α ∈ Th(T ∪ ∆), for every preferred extension ∆, then SPE(α) succeeds.
In many cases where the framework has exactly one preferred extension that is also stable (for example when the framework is stratified), it is obvious that the CPE procedure could be used as a procedure for the sceptical preferred extension semantics.
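The control structure of procedure 7.1 can be sketched as follows and contrasted with SSMPE on Example 5. The code is a literal, exhaustive reading of the procedure under assumptions made for this sketch only: not x is represented by the atom x, sentences are the strings "x" and "not x", rules are hypothetical (head, positive body, negated body) triples, and support, adm_expand and attacks all return subset-minimal sets. It is meant to show the nesting of the choice points, not to stand as a verified implementation.

```python
from itertools import combinations

def derive(rules, delta):
    """Atoms provable from rules when each `not x`, x in delta, is assumed."""
    facts, changed = set(), True
    while changed:
        changed = False
        for head, pos, neg in rules:
            if head not in facts and set(pos) <= facts and set(neg) <= delta:
                facts.add(head)
                changed = True
    return facts

def subsets(atoms):
    items = sorted(atoms)
    return [frozenset(c) for n in range(len(items) + 1) for c in combinations(items, n)]

def holds(rules, sentence, delta):
    if sentence.startswith("not "):
        return sentence[4:] in delta
    return sentence in derive(rules, delta)

def admissible(rules, atoms, delta):
    derived = derive(rules, delta)
    if derived & delta:
        return False
    return all(derived & a for a in subsets(atoms) if derive(rules, a) & delta)

def minimal(sets):
    kept = []
    for s in sets:
        if not any(k <= s for k in kept):
            kept.append(s)
    return kept

def support(rules, atoms, sentence, delta):
    return minimal([d for d in subsets(atoms) if delta <= d and holds(rules, sentence, d)])

def adm_expand(rules, atoms, delta):
    return minimal([d for d in subsets(atoms) if delta <= d and admissible(rules, atoms, d)])

def attack_base(rules, atoms, delta):
    """Minimal sets of assumptions that attack delta."""
    return minimal([a for a in subsets(atoms) if derive(rules, a) & delta])

def cpe_all(rules, atoms, sentence):
    for s in support(rules, atoms, sentence, frozenset()):
        yield from adm_expand(rules, atoms, s)

def spe(rules, atoms, atom):
    """Some CPE answer such that every admissible attacker can be grown to entail atom."""
    for delta0 in cpe_all(rules, atoms, atom):
        if all(any(adm_expand(rules, atoms, d2) for d2 in support(rules, atoms, atom, d1))
               for a in attack_base(rules, atoms, delta0)
               for d1 in adm_expand(rules, atoms, a)):
            return True
    return False

def ssm_pe(rules, atoms, atom):
    found = next(cpe_all(rules, atoms, atom), None) is not None
    refuted = next(cpe_all(rules, atoms, "not " + atom), None) is not None
    return found and not refuted

EX5 = [("p", [], ["p"]), ("p", ["q"], []), ("q", [], ["r"]), ("r", [], ["q"])]
EX4 = [("p", [], ["s"]), ("q", [], ["r"]), ("r", [], ["q"])]

print(ssm_pe(EX5, {"p", "q", "r"}, "p"))   # succeeds, although p fails in E1
print(spe(EX5, {"p", "q", "r"}, "p"))      # fails: the attacker {not q} cannot entail p
print(spe(EX4, {"p", "q", "r", "s"}, "p")) # succeeds: {not s} has no attackers
```

On Example 5 the attacker {not q} of ∆0 = {not r} has no admissible superset entailing p, so SPE(p) fails where SSMPE(p) succeeds.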

8 Related Work

The proof procedures we propose in this paper rely upon proof procedures for computing credulous consequences under the semantics of admissible extensions. A number of such procedures have been proposed in the literature. Eshghi and Kowalski [9] (see also the revised version proposed by Dung in [5]) propose a proof procedure for logic programming based upon interleaving abductive derivations, for the generation of negative literals to “derive” goals, and consistency derivations, to check “consistency” of negative literals with atoms “derivable” from the program. The proof procedure can be understood in argumentation-theoretic terms [12], as interleaving the generation of assumptions supporting goals or counter-attacking assumptions (abductive derivations) and the generation of attacks against any admissible support (consistency derivations), while checking that the generated support does not attack itself. Dung, Kowalski and Toni [7] propose abstract proof procedures for computing credulous consequences under the semantics of admissible extensions, defined via logic programs. Kakas and Toni [15] propose a number of proof procedures based on the construction of trees whose nodes are sets of assumptions, and such that nodes attack their parents, if any. The proof procedures are defined in abstract terms and, similarly to the procedures we propose in this paper, can be adopted for any concrete framework that is an instance of the abstract one. The procedures allow credulous consequences to be computed under the semantics of admissible extensions as well as under semantics that we have not considered in this paper, namely the semantics of weakly stable extensions, acceptable extensions and well-founded extensions. The concrete procedure for computing credulous consequences under the semantics of admissible extensions, in the case of logic programming, corresponds to the proof procedure of [9].
Dung, Kowalski and Toni [8] also propose abstract proof procedures for computing credulous consequences under the semantics of admissible extensions, that can be instantiated to any instance of the framework of [1]. These procedures are deﬁned in terms of trees whose nodes are assumptions, as well as via derivations as in [9]. Kakas and Dimopoulos [2] propose a proof procedure to compute credulous consequences under the semantics of admissible extensions for the argumentation framework of Logic Programming without Negation as Failure proposed in [14]. Here, negation as failure is replaced and extended by priorities over logic programs with no negation as failure but with explicit negation instead. Other proof procedures for computing credulous consequences under the stable extension semantics and sceptical consequences under the semantics of preferred and stable extensions have been proposed. Thielscher [30] proposes a proof procedure for computing sceptical consequences under the semantics of preferred extensions for the special case of logic programming [31]. This proof procedure is based upon a version of theorem 7 (theorem 16 in [30]). However, whereas [30] uses the notion of “conﬂict-free set


Phan Minh Dung, Paolo Mancarella, and Francesca Toni

of arguments” (which is an atomic, abstract notion), we use the notion of admissible set of assumptions. Moreover, theorem 16 in [30] replaces the condition in part 2 of theorem 7, “∆ attacks ∆′”, by the (equivalent) condition corresponding to “∆ ∪ ∆′ attacks itself”. For a formal correspondence between the two approaches see [31]. Niemelä [19] and Niemelä and Simons [20] give proof procedures for computing credulous and sceptical consequences under stable extensions, for default logic and logic programming, respectively. As discussed in Sect. 5, their proof procedures for computing credulous consequences under stable extensions rely upon forward chaining, whereas the proof procedures we propose for the same task rely either on backward chaining (CSM) or on a combination of backward and forward chaining (CSM2). Satoh and Iwayama [28] define a proof procedure for logic programming that computes credulous consequences under the stable extension semantics for range-restricted logic programs that admit at least one stable extension. Satoh [27] adapts the proof procedure in [28] to default logic. The adapted procedure applies to consistent, propositional default theories. Inoue et al. [11] apply the model generation theorem prover to logic programming to generate stable extensions, so that credulous reasoning under the stable extension semantics can be performed by forward chaining.

9 Conclusions

We have presented abstract proof procedures for computing credulous and sceptical consequences under the semantics of preferred and stable extensions for non-monotonic reasoning, as proposed in [1], relying upon any proof procedure for computing credulous consequences under the semantics of admissible extensions. The proposed proof procedures are abstract in that they can be instantiated to any concrete framework for non-monotonic reasoning which is an instance of the abstract flat framework of [1]. These include logic programming and default logic. They are abstract also in that they abstract away from implementation details.

We have compared our proof procedures with existing, state-of-the-art procedures defined for logic programming and default logic. We have argued that the proof procedures for computing consequences under the semantics of preferred extensions are simpler than those for computing consequences under the semantics of stable extensions, and supported our arguments with examples. However, note that the (worst-case) computational complexity of the problem of computing consequences under the semantics of stable extensions is in general no worse than that of computing consequences under the semantics of preferred extensions, and in some cases it is considerably simpler [3,4]. In particular, in the case of autoepistemic logic, the problem of computing sceptical consequences under the semantics of preferred extensions is located at the fourth level of the polynomial hierarchy, whereas the same problem under the semantics of stable extensions is located at the second level.

Of course, these results do not contradict the expectation that in practice, in many cases, computing consequences under the semantics of preferred extensions is easier than under the semantics of stable extensions. Indeed, preferred extensions supporting a desired sentence can be constructed “locally”, by restricting attention to the sentences in the language that are directly relevant to the sentence. Stable extensions, instead, need to be constructed “globally”, by considering all sentences in the language, whether they are directly relevant to the given sentence or not. This is due to the fact that stable extensions are not guaranteed to exist. However, note that in all cases where stable extensions are guaranteed to exist and coincide with preferred extensions, e.g. for stratified and order-consistent frameworks [1], any proof procedure for reasoning under the latter is a correct (and simpler) computational mechanism for reasoning under the former.

Finally, the “locality” feature in the computation of consequences under the preferred extension semantics renders it a feasible alternative to the computation of consequences under the stable extension semantics in the non-propositional case, when the language is infinite. Indeed, neither CPE nor SPE requires that the given framework be propositional.

Acknowledgements

This research has been partially supported by the EC KIT project “Computational Logic for Flexible Solutions to Applications”. The third author has been supported by the UK EPSRC project “Logic-based multi-agent systems”.

References

1. A. Bondarenko, P. M. Dung, R. A. Kowalski, F. Toni, An abstract, argumentation-theoretic framework for default reasoning. Artificial Intelligence 93:63–101, 1997.
2. Y. Dimopoulos, A. C. Kakas, Logic Programming without Negation as Failure. Proceedings of the 1995 International Symposium on Logic Programming, pp. 369–383, 1995.
3. Y. Dimopoulos, B. Nebel, F. Toni, Preferred Arguments are Harder to Compute than Stable Extensions. Proc. of the Sixteenth International Joint Conference on Artificial Intelligence, IJCAI 99 (T. Dean, ed.), pp. 36–43, 1999.
4. Y. Dimopoulos, B. Nebel, F. Toni, Finding Admissible and Preferred Arguments Can Be Very Hard. Proceedings of the Seventh International Conference on Principles of Knowledge Representation and Reasoning, KR 2000 (A. G. Cohn, F. Giunchiglia, B. Selman, eds.), pp. 53–61, Morgan Kaufmann Publishers, 2000.
5. P. M. Dung, Negation as hypothesis: an abductive foundation for logic programming. Proceedings of the 8th International Conference on Logic Programming, Paris, France (K. Furukawa, ed.), MIT Press, pp. 3–17, 1991.
6. P. M. Dung, On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. Artificial Intelligence 77:321–357, Elsevier, 1995.
7. P. M. Dung, R. A. Kowalski, F. Toni, Synthesis of proof procedures for default reasoning. Proc. LOPSTR’96, International Workshop on Logic Program Synthesis and Transformation (J. Gallagher, ed.), pp. 313–324, LNCS 1207, Springer Verlag, 1996.
8. P. M. Dung, R. A. Kowalski, F. Toni, Proof procedures for default reasoning. In preparation, 2002.
9. K. Eshghi, R. A. Kowalski, Abduction compared with negation as failure. Proceedings of the 6th International Conference on Logic Programming, Lisbon, Portugal (G. Levi and M. Martelli, eds.), MIT Press, pp. 234–254, 1989.
10. M. Gelfond, V. Lifschitz, The stable model semantics for logic programming. Proceedings of the 5th International Conference on Logic Programming, Seattle, Washington (K. Bowen and R. A. Kowalski, eds.), MIT Press, pp. 1070–1080, 1988.
11. K. Inoue, M. Koshimura, R. Hasegawa, Embedding negation as failure into a model generation theorem-prover. Proc. CADE’92, pp. 400–415, LNCS 607, Springer, 1992.
12. A. C. Kakas, R. A. Kowalski, F. Toni, The role of abduction in logic programming. Handbook of Logic in Artificial Intelligence and Logic Programming (D. M. Gabbay, C. J. Hogger and J. A. Robinson, eds.), 5:235–324, Oxford University Press, 1998.
13. A. C. Kakas, P. Mancarella, Preferred extensions are partial stable models. Journal of Logic Programming 14(3,4), pp. 341–348, 1993.
14. A. C. Kakas, P. Mancarella, P. M. Dung, The Acceptability Semantics for Logic Programs. Proceedings of the Eleventh International Conference on Logic Programming, pp. 504–519, 1994.
15. A. C. Kakas, F. Toni, Computing Argumentation in Logic Programming. Journal of Logic and Computation 9:515–562, Oxford University Press, 1999.
16. J. McCarthy, Circumscription – a form of non-monotonic reasoning. Artificial Intelligence 13:27–39, 1980.
17. D. McDermott, Nonmonotonic logic II: non-monotonic modal theories. Journal of ACM 29(1), pp. 33–57, 1982.
18. R. Moore, Semantical considerations on non-monotonic logic. Artificial Intelligence 25:75–94, 1985.
19. I. Niemelä, Towards efficient default reasoning. Proc. IJCAI’95, pp. 312–318, Morgan Kaufmann, 1995.
20. I. Niemelä, P. Simons, Efficient implementation of the well-founded and stable model semantics. Proc. JICSLP’96, pp. 289–303, MIT Press, 1996.
21. J. L. Pollock, Defeasible reasoning. Cognitive Science 11(4):481–518, 1987.
22. D. Poole, A logical framework for default reasoning. Artificial Intelligence 36:27–47, 1988.
23. H. Prakken and G. Sartor, A system for defeasible argumentation, with defeasible priorities. Artificial Intelligence Today (M. Wooldridge and M. M. Veloso, eds.), LNCS 1600, pp. 365–379, Springer, 1999.
24. H. Prakken and G. Vreeswijk, Logical systems for defeasible argumentation. Handbook of Philosophical Logic, 2nd edition (D. Gabbay and F. Guenthner, eds.), Vol. 4, Kluwer Academic Publishers, 2001.
25. R. Reiter, A logic for default reasoning. Artificial Intelligence 13:81–132, Elsevier, 1980.
26. D. Saccà, C. Zaniolo, Stable model semantics and non-determinism for logic programs with negation. Proceedings of the 9th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, ACM Press, pp. 205–217, 1990.
27. K. Satoh, A top-down proof procedure for default logic by using abduction. Proceedings of the Eleventh European Conference on Artificial Intelligence, pp. 65–69, John Wiley and Sons, 1994.
28. K. Satoh and N. Iwayama, A Query Evaluation Method for Abductive Logic Programming. Proceedings of the Joint International Conference and Symposium on Logic Programming, pp. 671–685, 1992.
29. G. R. Simari and R. P. Loui, A mathematical treatment of defeasible reasoning and its implementation. Artificial Intelligence 52:125–257, 1992.
30. M. Thielscher, A nonmonotonic disputation-based semantics and proof procedure for logic programs. Proceedings of the 1996 Joint International Conference and Symposium on Logic Programming (M. Maher, ed.), pp. 483–497, 1996.
31. F. Toni, Argumentation-theoretic proof procedures for logic programming. Technical Report, Department of Computing, Imperial College, 1997.
32. G. Vreeswijk, The feasibility of defeat in defeasible reasoning. Proceedings of the 2nd Int. Conf. on Principles of Knowledge Representation and Reasoning (KR’91) (J. F. Allen, R. Fikes, E. Sandewall, eds.), pp. 526–534, 1991.

A Stratified and Order Consistent Frameworks

We recall the definitions of stratified and order-consistent flat argumentation-theoretic frameworks, and their semantic properties, as given in [1]. Both classes are characterized in terms of their attack relationship graphs.

The attack relationship graph of a flat assumption-based framework ⟨T, Ab, ¯⟩ is a directed graph whose nodes are the assumptions in Ab and such that there exists an edge from an assumption δ to an assumption α if and only if δ belongs to a minimal (with respect to set inclusion) attack ∆ against α.

A flat assumption-based framework is stratified if and only if its attack relationship graph is well-founded, i.e. it contains no infinite path of the form α1, . . . , αn, . . . , where for every i ≥ 1 there is an edge from αi+1 to αi.

The notion of order-consistency requires some more auxiliary definitions. Given a flat assumption-based framework ⟨T, Ab, ¯⟩, let δ, α ∈ Ab.
– δ is friendly (resp. hostile) to α if and only if the attack relationship graph for ⟨T, Ab, ¯⟩ contains a path from δ to α with an even (resp. odd) number of edges.
– δ is two-sided to α, written δ ≺ α, if δ is both friendly and hostile to α.

A flat assumption-based framework ⟨T, Ab, ¯⟩ is order-consistent if the relation ≺ is well-founded, i.e. there exists no infinite sequence of the form α1, . . . , αn, . . . , where for every i ≥ 1, αi+1 ≺ αi.

The following proposition summarizes some of the semantic results of [1] as far as stratified and order-consistent frameworks are concerned.

Proposition 1 (see [1]).
– For any stratified assumption-based framework there exists a unique stable set of assumptions, which coincides with the well-founded set of assumptions.
– For any order-consistent assumption-based framework, stable sets of assumptions are preferred sets of assumptions and vice versa.

It is worth recalling that the abstract notions of stratification and order-consistency generalize the notions of stratification and order-consistency for logic programming.
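For a finite set of assumptions, both conditions reduce to cycle checks, which the following Python sketch illustrates. It is only an illustration under stated assumptions, not a procedure from [1]: the attack relationship graph is assumed to be given already as a dictionary `edges` mapping each assumption δ to the set of assumptions α with an edge from δ to α, paths are allowed to repeat nodes and must contain at least one edge, and all function names are hypothetical.

```python
from collections import deque


def has_cycle(edges):
    """Kahn-style check: True iff the finite directed graph has a cycle."""
    nodes = set(edges) | {v for vs in edges.values() for v in vs}
    indegree = {n: 0 for n in nodes}
    for targets in edges.values():
        for v in targets:
            indegree[v] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for v in edges.get(n, ()):
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    return removed < len(nodes)


def parity_reach(edges, start):
    """Pairs (node, parity) reachable from start by a path with >= 1 edge.

    Parity 0 means an even number of edges, parity 1 an odd number.
    """
    seen = set()
    queue = deque([(start, 0)])
    while queue:
        node, parity = queue.popleft()
        for nxt in edges.get(node, ()):
            state = (nxt, 1 - parity)
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return seen


def is_stratified(edges):
    # In a finite graph, an infinite backward path exists iff there is a cycle.
    return not has_cycle(edges)


def is_order_consistent(edges):
    nodes = set(edges) | {v for vs in edges.values() for v in vs}
    two_sided = {n: set() for n in nodes}
    for delta in nodes:
        reached = parity_reach(edges, delta)
        for alpha in nodes:
            if (alpha, 0) in reached and (alpha, 1) in reached:
                two_sided[delta].add(alpha)  # delta is two-sided to alpha
    # On a finite set, the two-sided relation is well-founded iff acyclic.
    return not has_cycle(two_sided)


# Two assumptions attacking each other (an even cycle).
even_cycle = {"a": {"b"}, "b": {"a"}}
print(is_stratified(even_cycle), is_order_consistent(even_cycle))
```

On the even cycle the sketch reports a framework that is not stratified but is order-consistent, whereas a self-attacking assumption (`{"a": {"a"}}`) is neither. For infinite (non-propositional) frameworks these finite checks do not apply.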

Automated Abduction

Katsumi Inoue

Department of Electrical and Electronics Engineering, Kobe University
Rokkodai, Nada, Kobe 657-8501, Japan
[email protected]

Abstract. In this article, I review Peirce’s abduction in the context of Artiﬁcial Intelligence. First, I connect abduction from ﬁrst-order theories with nonmonotonic reasoning. In particular, I consider relationships between abduction, default logic, and circumscription. Then, based on a ﬁrst-order characterization of abduction, I show a design of abductive procedures that utilize automated deduction. With abductive procedures, proof procedures for nonmonotonic reasoning are also obtained from the relationship between abduction and nonmonotonic reasoning.

1 Introduction

Kowalski had a decisive impact on the research of abductive reasoning in AI. In 1979, Kowalski showed the role of abduction in information systems in his seminal book “Logic for Problem Solving” [58]. In the book, Kowalski also pointed out some similarity between abductive hypotheses and defaults in nonmonotonic reasoning. This article is devoted to investigating this relation in detail and to giving a mechanism for automated abduction from first-order theories.

In this article, Peirce’s logic of abduction is first reviewed in Section 2, and is then related to a formalization of explanation within first-order logic. To know what formulas hold in the theory augmented by hypotheses, the notion of prediction is also introduced. There are two approaches to nonmonotonic prediction, credulous and skeptical, depending on how conflicting hypotheses are treated. In Section 3, it is shown that abduction is related to the credulous (brave) approach, in particular to the simplest subclass of default logic [87], for which efficient theorem proving techniques may exist. On the other hand, circumscription [70] is a notable example of the skeptical approach. Interestingly, the skeptical approach is shown to be realizable using the brave approach.

In Section 4, computational properties of abduction are discussed in the context of first-order logic. To make abduction and nonmonotonic reasoning computable, the consequence-finding problem in first-order logic is reviewed, which is an important and challenging problem in automated deduction [61,35,68]. The problem of consequence-finding is then modified so that only interesting clauses with a certain property (called characteristic clauses) are found. Then, abduction is formalized in terms of characteristic clauses. Two consequence-finding procedures are then introduced: one is SOL resolution [35], and the other is ATMS [14]. Compared with other resolution methods, SOL resolution in general generates fewer clauses to find characteristic clauses. Finally, this article is concluded in Section 5, where Peirce’s abduction is revisited together with future work.

It should be noted that this article does not cover all aspects of abductive reasoning in AI. General considerations on abduction in science and AI are found in some recent books [50,26,67] and survey papers [56,78]. Applications of abduction in AI are also excluded from this article. This article mostly focuses on first-order abduction, i.e., automated abduction from first-order theories, and its relationship with nonmonotonic reasoning with first-order theories. Often, however, abduction is used in the framework of logic programming, which is referred to as abductive logic programming [53,54,20]. This article omits details of abductive logic programming, but see [51] in this volume. Part of this article is excerpted from the author’s thesis [36] and a summary paper by the author [37].

A.C. Kakas, F. Sadri (Eds.): Computat. Logic (Kowalski Festschrift), LNAI 2408, pp. 311–341, 2002. © Springer-Verlag Berlin Heidelberg 2002

2 Logic of Abduction

Abduction is one of the three fundamental modes of reasoning characterized by Peirce [79], the others being deduction and induction. To see the differences between these three reasoning modes, let us look at the “beans” example used by Peirce [79, paragraph 623] in a syllogistic form. Abduction amounts to concluding the minor premise (Case) from the major premise (Rule) and the conclusion (Result):

(Rule) All the beans from this bag are white.
(Result) These beans are white.
(Case) These beans are from this bag.

In contrast, deduction amounts to concluding Result from Rule and Case, and induction amounts to concluding Rule from Case and Result. Later, Peirce wrote an inferential form of abduction as follows.

The (surprising) fact, C, is observed;
But if A were true, C would be a matter of course;
Hence, there is reason to suspect that A is true.

This corresponds to the following rule, called the fallacy of affirming the consequent:

$$\frac{C \qquad A \supset C}{A} \tag{1}$$

Sometimes A is called an explanans for an explanandum C. Both abduction and induction are non-deductive inference and generate hypotheses. However, hypothesis generation by abduction is distinguished from that by induction, in the sense that while induction infers something to be true through generalization of a number of cases of which the same thing is true, abduction can infer something quite different from what is observed.¹ Therefore, according to Peirce [79, paragraph 777], abduction is “the only kind of reasoning which supplies new ideas, the only kind which is, in this sense, synthetic”. Since abduction can be regarded as a method to explain observations, Peirce considered it as the basic method for scientific discovery.

In the above sense, abduction is “ampliative” reasoning and may play a key role in the process of advanced inference. For example, analogical reasoning can be formalized by abduction plus deduction [79, paragraph 513]. Abduction is, however, only “probable” inference as it is non-deductive. That is, as Peirce argues, abduction is “a weak kind of inference, because we cannot say that we believe in the truth of the explanation, but only that it may be true”. This feature of abduction is desirable, since our commonsense reasoning also has a probable nature. In everyday life, we regularly form hypotheses, to explain how other people behave or to understand a situation, by filling in the gaps between what we know and what we observe. Thus, abduction is a very important form of reasoning in everyday life as well as in science and engineering.

Another important issue involved in abduction is the problem of hypothesis selection: what is the best explanation, and how can we select it from a number of possible explanations which satisfy the rule (1)? Peirce considered this problem philosophically, and suggested various preference criteria that are both qualitative and economical. One example of such criteria is the traditional maxim of Occam’s razor, which adopts the simplest hypotheses.

In the following subsections, we give a logic of abduction studied in AI from two points of view, i.e., explanation and prediction.

¹ The relation, difference, similarity, and interaction between abduction and induction are now extensively studied by many authors in [26].

2.1 Explanation

Firstly, we connect Peirce’s logic of abduction with formalizations of abduction developed in AI within first-order logic. The most popular formalization of abduction in AI defines an explanation as a set of hypotheses which, together with the background theory, logically entails the given observations. This deductive-nomological view of explanation [33] has enabled us to have logical specifications of abduction and their proof procedures based on the resolution principle [89]. There are a number of proposals for resolution-based abductive systems [85,10,25,84,88,91,96,34,83,18,35,53,97,13,19,16].

According to the deductive-nomological view of explanation, we here connect Peirce’s logic of abduction (1) with research on abduction in AI. To this end, we make the following assumptions.

1. Knowledge about a domain of discourse, or background knowledge, can be represented in a set of first-order formulas as the proper axioms. In the following, we denote such an axiom set by Σ, and call it a set of facts.
2. An observation is also expressed as a first-order formula. Given an observation C, each explanation A of C satisfying the rule (1) can be constructed from a sub-vocabulary H of the representation language that contains Σ. We call each formula constructed from such a subset of the language a hypothesis. In general, a hypothesis constructed from H is a formula whose truth value is indefinite but may be assumed to be true. Sometimes H is the representation language itself.
3. The major premise A ⊃ C in the rule (1) can be obtained deductively from Σ, either as an axiom contained in Σ or as a logical consequence of Σ:

$$\Sigma \models A \supset C . \tag{2}$$

4. Σ contains all the information required to judge the acceptability of each hypothesis A as an explanation of C. That is, each formula A satisfying (2) can be tested for its appropriateness without using information not contained in Σ. One of these domain-independent, necessary conditions is that A should not be contradictory to Σ, i.e., that Σ ∪ {A} is consistent.
5. We adopt Occam’s razor as a domain-independent criterion for hypothesis selection. Namely, the simplest explanation is preferred over any other.

These assumptions are useful particularly for domain-independent automated abduction. The first and second conditions above define a logical framework of abduction: the facts and the hypotheses are both first-order formulas. The third and fourth conditions give a logical specification of the link between observations and explanations: theories augmented with explanations should both entail observations and be consistent. Although these conditions are the most common in abductive theories proposed in AI, their correctness from the philosophical viewpoint is still being debated. The fifth condition, simplicity, is also one of the most widely accepted criteria for selecting explanations: a simpler explanation is preferred if all other conditions are equal among multiple explanations.

Note that these conditions only define explanations. Criteria for good, better, or best explanations are usually given using meta-information and domain-dependent heuristics. A number of factors should be considered in selecting the most reasonable explanation. Since there has been no concrete consensus among AI researchers or philosophers about the preference criteria, we will not discuss them further in this article.

An example of the above abductive theory can be seen in the Theorist system by Poole, Goebel and Aleliunas [84], which consists of a first-order theorem prover that distinguishes facts from hypotheses.
Definition 2.1 (Theorist) Let Σ be a set of facts, and Γ a set of hypotheses. We call a pair (Σ, Γ) an abductive theory. Given a closed formula G, a set E of ground instances of elements of Γ is an explanation of G from (Σ, Γ)² if
1. Σ ∪ E |= G, and
2. Σ ∪ E is consistent.
An explanation E of G is minimal if no proper subset E′ of E is an explanation of G.

² Some Theorist literature [81] gives a slightly different definition, where a set Σ ∪ E (called a scenario) satisfying the two conditions is called an explanation of G.

The first condition in the above definition reflects the fact that Theorist has been devised for automated scientific theory formation, which is useful for prototyping AI problem solving systems by providing a simple “hypothesize-test” framework, i.e., hypothetical reasoning. When an explanation is a finite set of hypotheses, E = {H1, . . . , Hn}, the first condition is equivalent to Σ |= H1 ∧ . . . ∧ Hn ⊃ G by the deduction theorem, and thus can be written in the form of (2). The minimality criterion is a syntactical form of Occam’s razor. Since for an explanation E of G, any E′ ⊆ E is consistent with Σ, the condition can be written as: an explanation E of G is minimal if no E′ ⊂ E satisfies Σ ∪ E′ |= G. Note that in Theorist, explanations are defined as sets of ground instances. A more general definition of (minimal) explanations is given in [35], in which variables can be contained in explanations.

Example 2.2 Suppose that (Σ1, Γ1) is an abductive theory, where

Σ1 = { ∀x( Bird(x) ∧ ¬Ab(x) ⊃ Flies(x) ), ∀x( Penguin(x) ⊃ Ab(x) ), Bird(Tweety) },
Γ1 = { ¬Ab(x) }.

Here, the hypothesis ¬Ab(x) means that for any ground term t, ¬Ab(t) can be hypothesized. In other words, a hypothesis containing variables is shorthand for the set of its ground instances with respect to the elements from the universe of the language. Intuitively, ¬Ab(x) means that anything can be assumed to be not abnormal (i.e., normal). In this case, a minimal explanation of Flies(Tweety) is { ¬Ab(Tweety) }.

In Theorist, a set Γ of hypotheses can be any set of first-order formulas. Poole [81] shows a naming method which transforms each hypothesis in Γ into an atomic formula. The naming method converts an abductive theory (Σ, Γ) into a new abductive theory (Σ′, Γ′) in the following way. For every hypothesis F(x) in Γ, where x = x1, . . . , xn is the tuple of the free variables appearing in F, we associate a predicate symbol δF not appearing anywhere in (Σ, Γ), and define the following sets of formulas:

Γ′ = { δF(x) | F(x) ∈ Γ },
Σ′ = Σ ∪ { ∀x( δF(x) ⊃ F(x) ) | F(x) ∈ Γ }.

Then, there is a 1-1 correspondence between the explanations of G from (Σ, Γ) and the explanations of G from (Σ′, Γ′) [81, Theorem 5.1].
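Definition 2.1 can be made concrete on the ground instance of Example 2.2 with a brute-force propositional check. The Python sketch below is only an illustration and not how Theorist works: it assumes the theory has been instantiated for Tweety alone, encodes each ground formula as a clause (a set of literal strings, with a leading "-" for negation), and decides entailment and consistency by enumerating truth assignments. The names `holds`, `models` and `minimal_explanations` are hypothetical.

```python
from itertools import combinations, product


def holds(literal, valuation):
    """Truth value of a literal such as 'Ab' or '-Ab' under a valuation."""
    if literal.startswith("-"):
        return not valuation[literal[1:]]
    return valuation[literal]


def models(clauses, atoms):
    """Yield every truth assignment over atoms that satisfies all clauses."""
    for values in product([False, True], repeat=len(atoms)):
        valuation = dict(zip(atoms, values))
        if all(any(holds(l, valuation) for l in c) for c in clauses):
            yield valuation


def minimal_explanations(facts, hypotheses, goal, atoms):
    """Minimal sets E of hypotheses with facts + E consistent and entailing goal."""
    found = []
    for size in range(len(hypotheses) + 1):  # smallest candidates first
        for candidate in combinations(hypotheses, size):
            if any(set(prev) <= set(candidate) for prev in found):
                continue  # a subset already explains the goal
            theory = facts + [{h} for h in candidate]
            satisfying = list(models(theory, atoms))
            if satisfying and all(holds(goal, m) for m in satisfying):
                found.append(candidate)
    return found


# Ground instance of Example 2.2 for Tweety (literal encoding is an assumption).
ATOMS = ["Bird", "Penguin", "Ab", "Flies"]
SIGMA1 = [
    {"-Bird", "Ab", "Flies"},  # Bird and not Ab implies Flies
    {"-Penguin", "Ab"},        # Penguin implies Ab
    {"Bird"},                  # Bird(Tweety)
]
GAMMA1 = ["-Ab"]               # ground instance of the hypothesis not Ab(x)

print(minimal_explanations(SIGMA1, GAMMA1, "Flies", ATOMS))  # [('-Ab',)]
```

The enumeration is exponential in the number of atoms and hypotheses, so it only serves to check small ground examples against the definition; adding the clause `{"Penguin"}` to the facts makes the single candidate inconsistent and leaves no explanation.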

Example 2.2 (continued) The hypothesis ¬Ab(x) can be named Normal(x):

Σ1′ = Σ1 ∪ { ∀x( Normal(x) ⊃ ¬Ab(x) ) },
Γ1′ = { Normal(x) }.

In this case, a minimal explanation of Flies(Tweety) is { Normal(Tweety) }, which corresponds to the explanation { ¬Ab(Tweety) } from the original (Σ1, Γ1).

Naming hypotheses is a technique commonly used in most abductive systems because hypotheses in the form of atomic formulas can be processed very easily in their implementation. Restriction of hypotheses to atoms is thus used in many abductive systems such as [25,96,52,9]. Note that when we use a resolution procedure for non-Horn clauses, we can allow negative as well as positive literals as names of hypotheses, since both positive and negative literals can be resolved upon in the procedure. For Example 2.2, we do not have to rename the negative literal ¬Ab(x) to the positive literal Normal(x). This kind of negative abnormal literal was originally used by McCarthy [71], and is convenient for computing circumscription through abduction. Abductive systems that allow literal hypotheses include [85,10,35].

It should be noted that there are many other formalizations of abduction. For example, abduction is defined by the set covering model [6], is discussed at the knowledge level [63], and is formalized in various ways [100,5,12,65,80,1]. Levesque’s [63] formulation suggests that abduction does not have to be formalized within first-order logic. There are some proposals for abductive theories based on other logical languages. In such cases, the background knowledge is often written in a nonmonotonic logic. For example, abductive logic programming (ALP) is an extension of logic programming, which is capable of abductive reasoning as well as nonmonotonic reasoning [52,53,38,44,13,28,54,19,20,51]. Abduction is also defined within a modal logic in [94], autoepistemic logic in [43], or default logic in [22]. Inoue and Sakama [43] point out that, in abduction from nonmonotonic theories, abductive explanations can be obtained not only by addition of new hypotheses, but also by removal of old hypotheses that become inappropriate.

2.2 Prediction

Theory formation frameworks like Theorist can be used for prediction as well as abduction. In [82], a distinction between explanation and prediction is discussed as follows. Let (Σ, Γ ) be an abductive theory, G a formula, and E an explanation of G from (Σ, Γ ) as deﬁned by Deﬁnition 2.1. 1. In abduction, G is an observation which is known to be true. We may assume E is true because G is true. 2. In prediction, G is a formula or a query whose truth value is unknown but is expected to be true. We may assume E is true to make G hold under E.


Both of the above ways of theory formation perform hypothetical reasoning, but in different ways. In abduction, hypotheses used to explain observations are called conjectures, whereas, in prediction, hypotheses are called defaults [81,82]. In Example 2.2, if we have observed that Tweety was flying and we want to know why this observation could have occurred, then obtaining the explanation E1 = { ¬Ab(Tweety) } is abduction; but if all we know is the facts Σ1 and we want to know whether Tweety can fly or not, then finding E1 is prediction, where we can expect that Tweety may fly by default reasoning. These two processes may occur successively: when an observation is made, we abduce possible hypotheses; from these hypotheses, we predict what else we can expect to be true. In such a case, hypotheses can be used as both conjectures and defaults. See also [91,50] for other discussions on the difference between explanation and prediction.

A hypothesis regarded as a default may be used unless there is evidence to the contrary. Therefore, as many defaults as possible may be applied, as long as the augmented theory remains consistent. This leads to the notion of extensions [81].

Definition 2.3 Given the facts Σ and the hypotheses (defaults) Γ, an extension of the abductive theory (Σ, Γ) is the set of logical consequences of Σ ∪ M, where M is a maximal (with respect to set inclusion) set of ground instances of elements of Γ such that Σ ∪ M is consistent.

Using the notion of extensions, various alternative definitions of what should be predicted can be given [82]. They are related to the multiple extension problem: if G1 holds in an extension X1 and G2 holds in another extension X2, but there is no extension in which both G1 and G2 hold (i.e., X1 ∪ X2 is inconsistent), then what should we predict: nothing, both, or just one of G1 and G2? The next two are the most well-known prediction policies:

1. Predict what holds in an extension of (Σ, Γ);
2. Predict what holds in all extensions of (Σ, Γ).

The first approach to default reasoning leads to multiple extensions and is called a credulous approach. On the other hand, the latter approach is called a skeptical approach. Credulous and skeptical reasoning are also called brave and cautious reasoning, respectively. In the next section, we see that credulous prediction can be directly characterized by explanation and that skeptical prediction can be represented by combining explanations.
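The two prediction policies can be illustrated on a small propositional theory with two conflicting defaults. The Python sketch below is an assumption-laden illustration: it uses the well-known “Nixon diamond”, which is not an example from this article, names the two defaults with the hypothetical atoms `dq` and `dr`, assumes the facts are consistent, and decides consistency and entailment by enumerating truth assignments. Extensions are represented by their generating sets M from Definition 2.3 rather than by their (infinite) sets of consequences.

```python
from itertools import combinations, product


def holds(literal, valuation):
    """Truth value of a literal such as 'p' or '-p' under a valuation."""
    if literal.startswith("-"):
        return not valuation[literal[1:]]
    return valuation[literal]


def models(clauses, atoms):
    """Yield every truth assignment over atoms that satisfies all clauses."""
    for values in product([False, True], repeat=len(atoms)):
        valuation = dict(zip(atoms, values))
        if all(any(holds(l, valuation) for l in c) for c in clauses):
            yield valuation


def maximal_consistent_sets(facts, defaults, atoms):
    """Sets M of defaults, maximal under inclusion, with facts + M consistent."""
    maximal = []
    for size in range(len(defaults), -1, -1):  # largest candidates first
        for m in combinations(defaults, size):
            if any(set(m) < set(bigger) for bigger in maximal):
                continue  # strictly contained in a consistent set
            if next(models(facts + [{d} for d in m], atoms), None) is not None:
                maximal.append(m)
    return maximal


def entails(clauses, goal, atoms):
    return all(holds(goal, m) for m in models(clauses, atoms))


def credulous(facts, defaults, goal, atoms):
    """Policy 1: goal holds in at least one extension."""
    return any(entails(facts + [{d} for d in m], goal, atoms)
               for m in maximal_consistent_sets(facts, defaults, atoms))


def skeptical(facts, defaults, goal, atoms):
    """Policy 2: goal holds in every extension."""
    return all(entails(facts + [{d} for d in m], goal, atoms)
               for m in maximal_consistent_sets(facts, defaults, atoms))


# Hypothetical illustration (Nixon diamond), with named defaults dq and dr.
ATOMS = ["Quaker", "Republican", "Pacifist", "dq", "dr"]
FACTS = [
    {"Quaker"},
    {"Republican"},
    {"-Quaker", "-dq", "Pacifist"},       # Quaker and dq implies Pacifist
    {"-Republican", "-dr", "-Pacifist"},  # Republican and dr implies not Pacifist
]
DEFAULTS = ["dq", "dr"]

print(maximal_consistent_sets(FACTS, DEFAULTS, ATOMS))  # [('dq',), ('dr',)]
print(credulous(FACTS, DEFAULTS, "Pacifist", ATOMS))    # True
print(skeptical(FACTS, DEFAULTS, "Pacifist", ATOMS))    # False
```

Here the two defaults cannot be assumed together, so there are two extensions: Pacifist is predicted credulously but not skeptically, while Quaker is predicted under both policies.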

3

Relating Abduction to Nonmonotonic Reasoning

In this section, we relate the abductive theories introduced in Section 2 to formalisms of nonmonotonic reasoning. Since abduction is ampliative and plausible reasoning, conclusions of abductive reasoning may not be correct. Therefore, abduction is nonmonotonic. This can be easily veriﬁed for abductive theories. First, an explanation E is consistent with the facts Σ by deﬁnition, but E is not necessarily an explanation with


Katsumi Inoue

respect to the new facts Σ′ (⊃ Σ) because Σ′ ∪ E may not be consistent. Second, a minimal explanation E of G with respect to Σ may not be minimal with respect to Σ′ (⊃ Σ) because a subset E′ of E may satisfy Σ′ ∪ E′ |= G. Poole [82] investigates other possibilities of nonmonotonicity that may arise according to changes of facts, hypotheses, and observations.

The above discussion can also be verified by considering relationships between abduction and nonmonotonic logics. In fact, this link is bidirectional [36,56]: abduction can be formalized by a credulous form of nonmonotonic logic (default logic), and a skeptical nonmonotonic formalism (circumscription) can be represented using an abductive theory. The former relationship verifies the nonmonotonicity of abduction, and the latter implies that abduction can be used for commonsense reasoning as well as scientific theory formation.

3.1

Nonmonotonic Reasoning

We here review two major formalisms for nonmonotonic reasoning: default logic [87] and circumscription [70]. Both default logic and circumscription extend the classical first-order predicate calculus, but in different ways. Default logic introduces inference rules referring to the consistency with a belief set, and uses them meta-theoretically to extend a first-order theory. Circumscription, on the other hand, augments a first-order theory with a second-order axiom expressing a kind of minimization principle, and restricts the objects satisfying a certain predicate to just those that the original theory says must satisfy that predicate.

Default Logic. Default logic, proposed by Reiter [87], is a logic for drawing plausible conclusions based on consistency. This is one of the most intuitive and natural logics for nonmonotonic reasoning. One of the most successful results derived from the studies on default logic can be seen in the fact that logic programming with negation as failure can be interpreted as a class of default logic [2,29]. In this article, we also see that abduction can be characterized by one of the simplest classes of default logic (Section 3.2). A default is an inference rule of the form:

\[ \frac{\alpha(x) : M\,\beta_1(x), \ldots, M\,\beta_m(x)}{\gamma(x)}, \tag{3} \]

where α(x), β1 (x), . . . , βm (x), and γ(x) are ﬁrst-order formulas whose free variables are contained in a tuple of variables x. α(x) is called the prerequisite, β1 (x), . . . , βm (x) the justiﬁcations, and γ(x) the consequent of the default. A default is closed if no formula in it contains a free variable; otherwise it is open. An open default is usually identiﬁed with the set of closed defaults obtained by replacing the free variables with ground terms. A default is normal if it contains only one justiﬁcation (m = 1) that is equivalent to the consequent (β1 ≡ γ). A default theory is a pair, (D, W ), where D is a set of defaults and W is a set of ﬁrst-order formulas which represents proper axioms. A default theory is normal if every default is normal.

Automated Abduction


The intended meaning of the default (3) is: for any tuple t of ground terms, “if α(t) is inferable and each of β1(t), . . . , βm(t) is consistently assumed, then infer γ(t)”. When a default is applied, it is necessary that each of its justifications is consistent with a “belief set”. In order to express this condition formally, an acceptable “belief set” induced by reasoning with defaults (called an extension) is precisely defined in default logic as follows.

Definition 3.1 [87] Let (D, W) be a default theory, and X a set of formulas. X is an extension of (D, W) if it coincides with the smallest set Y of formulas satisfying the following three conditions:

1. W ⊆ Y.
2. Y is deductively closed, that is, it holds that cl(Y) = Y, where cl(Y) is the logical closure of Y under classical first-order deduction.
3. For any ground instance of any default in D of the form (3), if α(t) ∈ Y and ¬β1(t), . . . , ¬βm(t) ∉ X, then γ(t) ∈ Y.

A default theory may have multiple extensions, or even none. However, it is known that for any normal default theory, there is at least one extension [87, Theorem 3.1]. Note also that in default logic each extension is interpreted as an acceptable set of beliefs in accordance with default reasoning. Such an approach to default reasoning leads to multiple extensions and is a credulous approach. With a credulous approach one can obtain more conclusions, depending on the choice of the extension, so that conflicting beliefs can be supported by different extensions. This behavior is not necessarily intrinsic to a reasoner dealing with a default theory; we could define the theorems of a default theory to be the intersection of all its extensions so that we remain agnostic about conflicting information. This latter variant is a skeptical approach.

Circumscription. Circumscription, proposed by McCarthy [70], is one of the most “classical” and best-developed formalisms for nonmonotonic reasoning. An important property of circumscription, which many other nonmonotonic formalisms lack, is that it is based on classical predicate logic. Let T be a set of first-order formulas, and let P and Z denote disjoint tuples of distinct predicate symbols in the language of T. The predicates in P are said to be minimized and those in Z to be variables; Q denotes the rest of the predicates in the language of T, called the fixed predicates (or parameters). We denote a theory T by T(P; Z) when we want to indicate explicitly that T mentions the predicates P and Z. Adopting the formulation by Lifschitz [64], the circumscription of P in T with Z, written CIRC(T; P; Z), is the augmentation of T with a second-order axiom expressing the minimality condition:

\[ T(P; Z) \land \neg\exists p z\, \bigl( T(p; z) \land p < P \bigr). \tag{4} \]

Here, p and z are tuples of predicate variables each of which has the same arity as the corresponding predicate symbol in P and Z, and T (p; z) denotes a theory obtained from T (P; Z) by replacing each occurrence of P and Z with p and z.


Also, p < P stands for the conjunction of formulas each of which is defined, for every member Pi of P with a tuple x of object variables and the corresponding predicate variable pi in p, in the form:

\[ \forall x\,(p_i(x) \supset P_i(x)) \land \neg\forall x\,(P_i(x) \supset p_i(x)). \]

Thus, the second-order formula in the definition (4) expresses that the extension of the predicates from P is minimal in the sense that it is impossible to make it smaller without violating the constraint T. Intuitively, CIRC(T; P; Z) is intended to minimize the number of objects satisfying P, even at the expense of allowing more or different objects to satisfy Z. The model-theoretic characterization of circumscription is based on the notion of minimal models.

Definition 3.2 [64] Let M1 and M2 be models of T with the same universe. We write M1 ≤P,Z M2 if M1 and M2 differ only in the way they interpret predicates from P and Z, and the extension of every predicate P from P in M1 is a subset of the extension of P in M2. Then, a model M of T is (P, Z)-minimal if, for no other model M′ of T, M′ ≤P,Z M but M ≰P,Z M′.

It is known that, for any formula F, CIRC(T; P; Z) |= F if and only if F is satisfied by every (P, Z)-minimal model of T [64]. Since each theorem of a circumscription is satisfied by all minimal models, this property makes the behavior of circumscription skeptical.

3.2

Abduction and Default Logic

Suppose that Σ is a set of facts and Γ is a set of hypotheses. In order to avoid confusion in terminology, we here call an extension of the abductive theory (Σ, Γ) given by Definition 2.3 a Theorist extension, and call an extension of a default theory (D, W) given by Definition 3.1 a default extension. Let w(x) be a formula whose free variables are x. For Σ and Γ, we define a normal default theory (DΓ, Σ), where

\[ D_\Gamma = \left\{ \frac{: M\,w(x)}{w(x)} \;\middle|\; w(x) \in \Gamma \right\}. \]

Notice that DΓ is a set of prerequisite-free normal defaults, that is, normal defaults whose prerequisites are true. We obtain the next theorem from results in [81, Theorems 2.6 and 4.1].

Theorem 3.3 Let (Σ, Γ) be an abductive theory, and G a formula. The following three are equivalent:

(a) There is an explanation of G from (Σ, Γ).
(b) There is a Theorist extension of (Σ, Γ) in which G holds.
(c) There is a default extension of the default theory (DΓ, Σ) in which G holds.
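The equivalence of (a) and (b) can be checked by brute force on a small ground theory. The following is a minimal sketch, not an implementation of Theorist: the clause encoding, the function names, and the propositional stand-in for the Tweety theory (including the clause stating that penguins are abnormal) are assumptions made for illustration.

```python
from itertools import combinations, product

def holds(lit, world):
    """Truth value of a literal ('a' or '~a') in a truth assignment."""
    return world[lit.lstrip("~")] != lit.startswith("~")

def models(clauses, atoms):
    """All truth assignments over `atoms` satisfying every clause."""
    result = []
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if all(any(holds(l, world) for l in c) for c in clauses):
            result.append(world)
    return result

def subsets(items):
    """All subsets of a list, as lists, smallest first."""
    for k in range(len(items) + 1):
        for s in combinations(items, k):
            yield list(s)

def has_explanation(facts, hyps, goal, atoms):
    """(a): some E within the hypotheses is consistent with the facts and yields the goal."""
    for e in subsets(hyps):
        ms = models(facts + e, atoms)
        if ms and all(holds(goal, w) for w in ms):
            return True
    return False

def holds_in_some_extension(facts, hyps, goal, atoms):
    """(b): the goal follows from the facts plus some maximal consistent set of hypotheses."""
    cons = [set(s) for s in subsets(hyps) if models(facts + s, atoms)]
    maximal = [m for m in cons if not any(m < other for other in cons)]
    return any(all(holds(goal, w) for w in models(facts + list(m), atoms))
               for m in maximal)

# Assumed ground stand-in for the Tweety theory.
ATOMS = ["bird", "ab", "flies", "penguin"]
SIGMA1 = [("~bird", "ab", "flies"),   # Bird and not Ab implies Flies
          ("bird",),                  # Bird(Tweety)
          ("~penguin", "ab")]         # Penguin implies Ab (assumption)
SIGMA2 = SIGMA1 + [("penguin",)]      # Penguin(Tweety) is learned
GAMMA = [("~ab",)]                    # the default: not Ab(Tweety)

if __name__ == "__main__":
    for sigma in (SIGMA1, SIGMA2):
        print(has_explanation(sigma, GAMMA, "flies", ATOMS),
              holds_in_some_extension(sigma, GAMMA, "flies", ATOMS))
```

Under these assumptions the two tests agree on both theories: flying is explainable, and holds in an extension, only before the penguin fact is added.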


Theorem 3.3 is very important for the following reasons.

1. It verifies that each abductive explanation is contained in a possible set of beliefs. In particular, when the hypotheses Γ represent defaults for normal or typical properties, then in order to predict a formula G by default reasoning, it is sufficient to find an explanation of G from (Σ, Γ) [81].
2. All properties possessed by normal default theories are valid for abductive explanations and Theorist extensions. For instance, for any Σ and Γ, there is at least one Theorist extension of (Σ, Γ).
3. Computation of abduction can be given by top-down default proofs [87], which are an extension of linear resolution theorem proving procedures such as [59,7,66]. This fact holds for the following reasons. It is shown that G holds in some default extension of a normal default theory (D, W) if and only if there is a top-down default proof of G with respect to (D, W) [87, Theorem 7.3]. Also, every top-down default proof returns a set S of instances of consequents of defaults from D with which G can be proven from W, i.e., W ∪ S |= G. Therefore, such an S is an explanation from the corresponding abductive theory whenever W ∪ S is consistent.

The last point above is also very useful for designing and implementing hypothetical reasoning systems. In fact, many first-order abductive procedures [85,10,84,96,83] can be regarded as variants of Reiter’s top-down default proof procedures: computation of explanations of G from (Σ, Γ) can be seen as an extension of proof-finding in linear resolution by introducing a set of hypotheses from Γ that, if they could be proven by preserving the consistency of the augmented theories, would complete the proofs of G. Alternatively, abduction can be characterized by a consequence-finding problem [35], in which some literals are allowed to be hypothesized (or skipped) instead of proven, so that new theorems consisting of only those skipped literals are derived at the end of deductions instead of just deriving the empty clause. In this sense, abduction can be implemented as an extension of deduction, in particular of a top-down, backward-chaining theorem-proving procedure. For example, Theorist [84,83] and SOL resolution [35] are extensions of the Model Elimination procedure [66].

Example 2.2 (continued) For the goal G = Flies(Tweety), a version of the Theorist implementation works as follows (written using a Prolog-like notation):

    ← Flies(Tweety),
    ← Bird(Tweety) ∧ ¬Ab(Tweety),
    ← ¬Ab(Tweety),
    □ by defaults: {¬Ab(Tweety)}.

Then, the returned set of defaults S = {¬Ab(Tweety)} is checked for consistency with Σ1 by failing to prove the negation of S from Σ1. In this case, it holds that Σ1 ⊭ Ab(Tweety), thus showing that S is an explanation of G from (Σ1, Γ1).


Next, suppose that Penguin(Tweety) is added to Σ1, and let Σ2 = Σ1 ∪ { Penguin(Tweety) }. We then get S again by the same top-down default proof as above, but the consistency check of S in this case results in a successful proof:

    ← Ab(Tweety),
    ← Penguin(Tweety),
    □.

Therefore, S is no longer an explanation of G from (Σ2, Γ1).

3.3

Abduction and Circumscription

A significant difference between circumscription and default logic lies in the way they handle variables and equality. We therefore assume that the only function symbols are constants and that the number of constants is finite. Furthermore, in this subsection, a theory T means a set of formulas over the language including the equality axioms, and both the domain-closure assumption (DCA) and the unique-names assumption (UNA) are assumed to be satisfied by T. In this setting, the UNA states that each pair of distinct constants denotes different individuals in the domain. The DCA implies that the theory has finite models and that every formula containing variables is equivalent to a propositional combination of ground atoms. Although these assumptions are strong, their importance is widely recognized in databases and logic programming. For circumscription, these assumptions make the universe fixed, so that the comparison with default logic becomes clear [24]. In particular, circumscription with these assumptions is essentially equivalent to the Extended Closed World Assumption (ECWA) [30].

Another big difference between circumscription and default logic is in their approaches to default prediction: skeptical versus credulous. The theorems of a circumscription are the formulas satisfied by every minimal model, while there are multiple default extensions in default logic. We, therefore, compare the theorems of a circumscription with the formulas contained in every default extension of a default theory. On the relationship between circumscription and default logic, Etherington [24] has shown that, under some conditions, a formula is entailed by a circumscription plus the DCA and the UNA if and only if the formula is contained in every default extension of the corresponding default theory.

Proposition 3.4 [24] Assume that T is a theory satisfying the above conditions. Let P be a tuple of predicates, and Z the tuple of all predicates other than those in P in the language. Then, the formulas true in every default extension of the default theory

\[ \left( \left\{ \frac{: M\,\neg P_i(x)}{\neg P_i(x)} \;\middle|\; P_i \in P \right\},\; T \right) \tag{5} \]

are precisely the theorems of CIRC(T; P; Z).
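For a finite ground theory, the correspondence stated in Proposition 3.4 can be checked by enumeration. The sketch below is illustrative only: it treats ground atoms as propositional atoms (which is what the DCA and the UNA permit), assumes that every non-minimized predicate is variable, and uses hypothetical names (circ_entails, skeptical_default_entails) and a toy bird theory that are not taken from the cited work.

```python
from itertools import combinations, product

def holds(lit, world):
    """Truth value of a literal ('a' or '~a') in a truth assignment."""
    return world[lit.lstrip("~")] != lit.startswith("~")

def models(clauses, atoms):
    """All truth assignments over `atoms` satisfying every clause."""
    result = []
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if all(any(holds(l, world) for l in c) for c in clauses):
            result.append(world)
    return result

def circ_entails(clauses, atoms, minimized, goal):
    """Goal holds in every model whose set of true minimized atoms is inclusion-minimal."""
    ms = models(clauses, atoms)
    ext = lambda w: {a for a in minimized if w[a]}
    minimal = [w for w in ms if not any(ext(v) < ext(w) for v in ms)]
    return all(holds(goal, w) for w in minimal)

def skeptical_default_entails(clauses, atoms, minimized, goal):
    """Goal holds in every extension obtained from the defaults 'assume ~a' for minimized a."""
    hyps = [("~" + a,) for a in minimized]
    cons = [set(s) for k in range(len(hyps) + 1)
            for s in combinations(hyps, k) if models(clauses + list(s), atoms)]
    maximal = [m for m in cons if not any(m < other for other in cons)]
    return all(all(holds(goal, w) for w in models(clauses + list(m), atoms))
               for m in maximal)

# Assumed ground toy theory: suffix _t is Tweety, _s is Sam.
ATOMS = ["bird_t", "ab_t", "flies_t", "ab_s"]
MINIMIZED = ["ab_t", "ab_s"]
T1 = [("~bird_t", "ab_t", "flies_t"), ("bird_t",)]
T2 = T1 + [("ab_t", "ab_s")]          # one of the two is abnormal

if __name__ == "__main__":
    for theory in (T1, T2):
        print(circ_entails(theory, ATOMS, MINIMIZED, "flies_t"),
              skeptical_default_entails(theory, ATOMS, MINIMIZED, "flies_t"))
```

In this toy setting both tests accept flies_t for T1 and reject it for T2, where one minimal model makes Tweety abnormal.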


Since the default theory (5) is a prerequisite-free normal default theory, we can connect each of its default extensions with a Theorist extension using Theorem 3.3. Therefore, in the abductive theory, we hypothesize the negative occurrences of the minimized predicates P. The following corollary can be obtained by Theorem 3.3 and the model theory of circumscription.

Corollary 3.5 Let T, P and Z be the same as in Proposition 3.4. A (P, Z)-minimal model of T satisfies a formula F if and only if F has an explanation from the abductive theory (T, { ¬Pi(x) | Pi ∈ P }).

The above corollary deals with credulous rather than skeptical prediction. Moreover, Proposition 3.4 does not allow for the specification of fixed predicates. Gelfond et al. [30], on the other hand, show a more general result for the ECWA by allowing some predicates to be fixed. The idea of reducing circumscription to the ECWA is very important as it is helpful for designing resolution-based theorem provers for circumscription [86,31,41,32]. Earlier work on such a reduction of circumscription to a special case of the ECWA can be found in [73,3], where all predicates in the language are minimized.

To compute circumscription, we are particularly interested in two results on the ECWA obtained by Gelfond et al. [30, Theorems 5.2 and 5.3] with the notion of free for negation. These are also adopted as the basic characterizations for query answering in circumscriptive theories by Przymusinski [86, Theorems 2.5 and 2.6]. Inoue and Helft [41] express them using different terminology (characteristic clauses). Here, we relate these results on the ECWA to abduction.

Let T be a theory as above, P the minimized predicates, Q the fixed predicates, and Z the variables. For a tuple R of predicates in the language, we denote by R⁺ (R⁻) the positive (negative) occurrences of predicates from R in the language. Then, we define the abductive theory for circumscription, (T, Γcirc), where the hypotheses are given as:

Γcirc = P⁻ ∪ Q⁺ ∪ Q⁻.

Intuitively, both positive and negative occurrences of Q are hypothesized as defaults to prevent the abductive theory from altering the definition of each predicate from Q. The next theorem can be obtained from [30, Theorems 5.2 and 5.3].

Theorem 3.6 [41] (1) For any formula F not containing predicate symbols from Z, CIRC(T; P; Z) |= F if and only if ¬F has no explanation from (T, Γcirc). (2) For any formula F, CIRC(T; P; Z) |= F if and only if there exist explanations E1, . . . , En (n ≥ 1) of F from (T, Γcirc) such that ¬(E1 ∨ . . . ∨ En) has no explanation from (T, Γcirc).

Using Theorem 3.6, we can reduce query answering in a circumscriptive theory to the finding of a combination of explanations of a query such that the


negation of the disjunction cannot be explained. The basic intuition behind this theorem is as follows. In abduction, by Corollary 3.5, if a formula F is explained, then F holds in some default extension, that is, F is satisfied by some minimal model. In circumscription, on the other hand, F should be satisfied by every minimal model, or F should hold in all default extensions. This condition is checked by computing multiple explanations E1, . . . , En of F corresponding to multiple default extensions such that those explanations cover all default extensions. Then, the disjunction E1 ∨ . . . ∨ En is also an explanation of F, and is a skeptical explanation of F, indeed the weakest one [55]. Combining explanations is like an argument system [82,83,32], which consists of two processes where one tries to find explanations of the query and the other tries to find a counter argument to refute them.

Example 3.7 Consider the theory T consisting of the two formulas:

    ¬Bird(x) ∨ Ab(x) ∨ Flies(x),
    Bird(Tweety),

where P = {Ab}, Q = {Bird} and Z = {Flies}, so that the abductive hypotheses are set to Γcirc = {Ab}⁻ ∪ {Bird}⁺ ∪ {Bird}⁻. Let us consider the query F = Flies(Tweety). Now, {¬Ab(Tweety)} is an explanation of F. The negation of this explanation has no explanation. F is thus a theorem of CIRC(T; Ab; Flies). Next, let T′ = T ∪ { Ab(Tweety) ∨ Ab(Sam) }. Then ¬Ab(Sam) is an explanation of Ab(Tweety), the negation of the above explanation, from (T′, Γcirc). Hence, F is not a theorem of the circumscription of Ab in T′.

Skeptical prediction other than circumscription can also be characterized by credulous prediction. Instead of giving the hypotheses Γcirc, any set Γ of hypotheses can be used in Theorem 3.6 as follows.

Corollary 3.8 Let (Σ, Γ) be an abductive theory. A formula F holds in every Theorist extension of (Σ, Γ) if and only if there exist explanations E1, . . . , En (n ≥ 1) of F from (Σ, Γ) such that ¬(E1 ∨ . . . ∨ En) has no explanation from (Σ, Γ).

3.4

Abduction and Other Nonmonotonic Formalizations

Although we focused on default logic and circumscription as two major nonmonotonic formalizations, abduction can also be used to represent other forms of nonmonotonic reasoning. Here we briefly cite such work for reference. One of the most important results in this area is a formalization of nonmonotonic reasoning by means of an argumentation framework [60,4,54]. In [4], an assumption-based


framework (Σ, Γ, ∼) is deﬁned as a generalization of the Theorist framework. Here, like Theorist, Σ and Γ are deﬁned as facts and hypotheses respectively, but are not restricted to ﬁrst-order language. The mapping ∼ deﬁnes some notion of contrary of assumptions, and a defeated argument is deﬁned as an augmented theory whose contrary is proved. Varying the underlying language of Σ and Γ and the notion of ∼, this framework is powerful enough to deﬁne the semantics of most nonmonotonic logics, including Theorist, default logic, extended logic programs [29], autoepistemic logic [74], other non-monotonic modal logics, and certain instances of circumscription. This framework is applied to defeasible rules in legal reasoning [60] and is related to other methods in abductive logic programming [54]. In [45], abduction is also related to autoepistemic logic and negation as failure in extended disjunctive logic programs. In particular, an autoepistemic translation of a hypothesis γ is given as Bγ ⊃ γ . The set consisting of this autoepistemic formula produces two stable expansions, one containing γ and Bγ, the other containing ¬Bγ but neither γ nor ¬γ. With this property, we can deﬁne the world in which γ is assumed to be true, while another world not assuming γ is also kept.

4

Computing Abduction via Automated Deduction

This section presents computational methods for abduction. In Section 2.1, we have seen that abduction can be characterized within first-order logic. Using this characterization, here we show a realization of automated abduction based on the resolution principle.

4.1

Consequence-Finding

As explained in Section 3.2, many abductive systems based on the resolution principle can be viewed as procedures that perform a kind of Reiter’s top-down default proof. Now, we look at the underlying principle behind such abductive procedures from a different, purely deductive, viewpoint [35]. First, the definition of abduction given in Section 2.1 can be represented as a consequence-finding problem, which is the problem of finding theorems of the given axiom set Σ. The consequence-finding problem was first addressed by Lee in 1967 [61] in the context of Robinson’s resolution principle [89]. Lee proved the following completeness result: given a set of clauses Σ, if a clause C is a logical consequence of Σ, then the resolution principle can derive a clause D such that D implies C. In this sense, the resolution principle is said to be complete for consequence-finding. In Lee’s theorem, “D implies C” can be replaced with “D subsumes C”. Later,


Slagle, Chang and Lee [95] and Minicozzi and Reiter [72] showed that “the resolution principle” can also be replaced with “semantic resolution” and “linear resolution”, respectively. In practice, however, the set of theorems of an axiom set is generally infinite, and hence the complete deductive closure is neither obtainable nor desirable. Toward more practical automated consequence-finding, Inoue [35] reformulated the consequence-finding problem as follows: given a set of clauses Σ and some criteria of “interesting” clauses, derive each “interesting” clause that is a logical consequence of Σ and is minimal with respect to subsumption. Here, each interesting clause is called a characteristic clause. Criteria of interesting clauses are specified by a sub-vocabulary of the representation language called a production field. In the propositional case, each characteristic clause of Σ is a prime implicate of Σ. The use of characteristic clauses enables us to characterize various reasoning problems of interest to AI, such as nonmonotonic reasoning [3,41,32,8], diagnosis [25,93], and knowledge compilation [69,15,90] as well as abduction. Moreover, for inductive logic programming (ILP), consequence-finding can be applied to generate hypothesis rules from examples and background knowledge [98,39], and is used as the theoretical background for discussing the completeness of ILP systems [76].³ An extensive survey of consequence-finding in propositional logic is given by Marquis [68].

Now, characteristic clauses are formally defined as follows [35]. Let C and D be two clauses. C subsumes D if there is a substitution θ such that Cθ ⊆ D and C has no more literals than D [66]. C properly subsumes D if C subsumes D but D does not subsume C. For a set of clauses Σ, µΣ denotes the set of clauses in Σ not properly subsumed by any clause in Σ. A production field P is a pair, ⟨L, Cond⟩, where L is a set of literals closed under instantiation, and Cond is a certain condition to be satisfied. When Cond is not specified, P is denoted as ⟨L⟩. A clause C belongs to P = ⟨L, Cond⟩ if every literal in C belongs to L and C satisfies Cond. When Σ is a set of clauses, the set of logical consequences of Σ belonging to P is denoted as Th_P(Σ). Then, the characteristic clauses of Σ with respect to P are defined as:

\[ Carc(\Sigma, P) = \mu\, Th_P(\Sigma). \]

Note that the empty clause □ is the unique clause in Carc(Σ, P) if and only if Σ is unsatisfiable. This means that proof-finding is a special case of consequence-finding. When a new clause F is added to the set Σ of clauses, some consequences are newly derived with this new information. Such a new and “interesting” clause is called a “new” characteristic clause. Formally, the new characteristic clauses of F with respect to Σ and P are defined as:

\[ Newcarc(\Sigma, F, P) = \mu\, [\, Th_P(\Sigma \cup \{F\}) - Th(\Sigma) \,]. \]

3

In ILP, the completeness result of consequence-ﬁnding is often called the subsumption theorem [76], which was originally coined by Kowalski in 1970 [57].


The above definition is equivalent to the following [35]:

\[ Newcarc(\Sigma, F, P) = Carc(\Sigma \cup \{F\}, P) - Carc(\Sigma, P). \]

4.2

Abduction as Consequence-Finding

Now, we are ready to characterize abduction as consequence-finding. In the following, we denote the set of all literals in the representation language by L, and a set Γ of hypotheses is defined as a subset of L. Any subset E of Γ is identified with the conjunction of all elements in E. Also, for any set T of formulas, $\overline{T}$ represents the set of formulas obtained by negating every formula in T, i.e., $\overline{T}$ = { ¬C | C ∈ T }.

Let G1, . . . , Gn be a finite number of observations, and suppose that they are all literals. We want to explain the observations G = G1 ∧ . . . ∧ Gn from (Σ, Γ), where Σ is a set of clauses representing facts and Γ is a set of ground literals representing hypotheses. Let E = E1 ∧ . . . ∧ Ek be any explanation of G from (Σ, Γ). Then, the following three hold:

1. Σ ∪ { E1 ∧ . . . ∧ Ek } |= G1 ∧ . . . ∧ Gn,
2. Σ ∪ { E1 ∧ . . . ∧ Ek } is consistent,
3. Each Ei is an element of Γ.

These are equivalent to the next three conditions:

1′. Σ ∪ { ¬G1 ∨ . . . ∨ ¬Gn } |= ¬E1 ∨ . . . ∨ ¬Ek,
2′. Σ ⊭ ¬E1 ∨ . . . ∨ ¬Ek,
3′. Each ¬Ei is an element of $\overline{\Gamma}$.

By 1′, a clause derived from the clause set Σ by adding the clause ¬G is the negation of an explanation of G from (Σ, Γ), and this computation can be done as automated deduction over clauses.⁴ By 2′, such a derived clause must not be a consequence of Σ before adding ¬G. By 3′, every literal appearing in such a clause must belong to $\overline{\Gamma}$. Moreover, E is a minimal explanation from (Σ, Γ) if and only if ¬E is a minimal theorem from Σ ∪ {¬G}. Hence, the problem of abduction is reduced to the problem of seeking a clause such that (i) it is a minimal theorem of Σ ∪ {¬G}, but (ii) it is not a theorem of Σ alone, and (iii) it consists of literals only from $\overline{\Gamma}$. Therefore, we obtain the following result.

Theorem 4.1 [35] Let (Σ, Γ) be an abductive theory, where Γ ⊆ L. Put the production field as P = ⟨$\overline{\Gamma}$⟩. Then, the set of minimal explanations of an observation G from (Σ, Γ) is:

\[ \overline{Newcarc(\Sigma, \neg G, P)}. \]

4

This way of computing hypotheses is often referred to as “inverse entailment” in ILP [75,39]. Although there is some discussion against such a scheme of “abduction as deduction-in-reverse” [12], it is surely one of the most recognizable ways to construct possible hypotheses deductively.
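The reduction of abduction to consequence-finding stated in Theorem 4.1 can be sketched for the propositional case, where subsumption between clauses is set inclusion. The code below is a brute-force illustration under stated assumptions, not a practical procedure: it enumerates every non-tautological clause over the production field and tests entailment by truth tables, and the names (newcarc, minimal_explanations) and the ground Tweety stand-in are hypothetical.

```python
from itertools import combinations, product

def neg(lit):
    """Complement of a propositional literal written as 'a' or '~a'."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def holds(lit, world):
    """Truth value of a literal in a truth assignment."""
    return world[lit.lstrip("~")] != lit.startswith("~")

def entails_clause(clauses, clause, atoms):
    """clauses |= clause, checked over all truth assignments to `atoms`."""
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if all(any(holds(l, world) for l in c) for c in clauses):
            if not any(holds(l, world) for l in clause):
                return False
    return True

def mu(clauses):
    """Keep the clauses not properly subsumed (here: no proper subset present)."""
    return [c for c in clauses if not any(d < c for d in clauses)]

def newcarc(sigma, new_clause, field, atoms):
    """Minimal clauses over `field` that follow from sigma + new_clause but not from sigma."""
    candidates = [frozenset(s) for k in range(len(field) + 1)
                  for s in combinations(field, k)
                  if not any(neg(l) in s for l in s)]      # skip tautologies
    new = [c for c in candidates
           if entails_clause(sigma + [new_clause], c, atoms)
           and not entails_clause(sigma, c, atoms)]
    return mu(new)

def minimal_explanations(sigma, gamma, goal_literals, atoms):
    """Negate each new characteristic clause of the negated goal (Theorem 4.1)."""
    field = [neg(h) for h in gamma]                        # complements of the hypotheses
    neg_goal = tuple(neg(g) for g in goal_literals)        # the clause for not-G
    return [{neg(l) for l in c} for c in newcarc(sigma, neg_goal, field, atoms)]

# Assumed ground stand-in for the Tweety theory.
ATOMS = ["bird", "ab", "flies", "penguin"]
SIGMA1 = [("~bird", "ab", "flies"), ("bird",), ("~penguin", "ab")]
GAMMA = ["~ab"]

if __name__ == "__main__":
    print(minimal_explanations(SIGMA1, GAMMA, ["flies"], ATOMS))
    print(minimal_explanations(SIGMA1 + [("penguin",)], GAMMA, ["flies"], ATOMS))
```

Here the single new characteristic clause of ¬flies is the unit clause ab, whose negation ¬ab is the explanation. Once the penguin fact is added, ab already follows from the facts, so it is no longer new and no explanation remains.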


In the above setting, we assumed that G is a conjunction of literals. Extending the form of each observation Gi to a clause is possible. When G is any formula, suppose that by converting ¬G into conjunctive normal form we obtain a formula F = C1 ∧ · · · ∧ Cm, where each Ci is a clause. In this case, Newcarc(Σ, F, P) can be decomposed into m Newcarc operations, each of which adds a single clause [35]:

\[ Newcarc(\Sigma, F, P) = \mu \Bigl[\, \bigcup_{i=1}^{m} Newcarc(\Sigma_i, C_i, P) \Bigr], \]

where Σ1 = Σ, and Σi+1 = Σi ∪ {Ci} for i = 1, . . . , m − 1. This incremental computation can also be applied to get the characteristic clauses of Σ with respect to P as:

\[ Carc(\Sigma, P) = Newcarc(\emptyset, \Sigma, P). \]

In Theorem 4.1, explanations obtained by a consequence-finding procedure are not necessarily ground and can contain variables. Note, however, that in implementing resolution-based abductive procedures, both the query G and its explanation E are usually considered as existentially quantified formulas. When G contains universally quantified variables, each of them is replaced with a new constant or function in ¬G through Skolemization. Then, to get a universally quantified explanation when negating each new characteristic clause containing Skolem functions, we need to apply the reverse Skolemization algorithm [10]. For example, if ¬P(x, ϕ(x), u, ψ(u)) is a new characteristic clause where ϕ and ψ are Skolem functions, we get two explanations, ∃x∀y∃u∀v P(x, y, u, v) and ∃u∀v∃x∀y P(x, y, u, v), by reverse Skolemization.

Using Theorems 3.6 and 4.1, skeptical prediction can also be realized by consequence-finding procedures as follows.

Corollary 4.2 [41] Let CIRC(Σ; P; Z) be the circumscription of P in Σ with variables Z. Put Pcirc = ⟨P⁺ ∪ Q⁺ ∪ Q⁻⟩, where Q is the fixed predicates. (1) For any formula F not containing literals from Z, CIRC(Σ; P; Z) |= F if and only if Newcarc(Σ, F, Pcirc) = ∅. (2) For any formula F, CIRC(Σ; P; Z) |= F if and only if there is a conjunction G of clauses from Newcarc(Σ, ¬F, Pcirc) such that Newcarc(Σ, ¬G, Pcirc) = ∅.

4.3

SOL Resolution

To compute new characteristic clauses, Inoue [35] deﬁned an extension of the Model Elimination (ME) calculus [59,7,66] by adding the Skip rule to ME. The extension is called SOL resolution, and can be viewed either as OL resolution [7] (or SL resolution [59]) augmented with the Skip rule, or as a ﬁrst-order generalization of Siegel’s propositional production algorithm [93]. Note here that, although ME is complete for proof-ﬁnding (i.e., refutation-complete) [66], it is not complete for consequence-ﬁnding [72]. SOL resolution is useful for computing the (new) characteristic clauses for the following reasons.


(1) In computing Newcarc(Σ, C, P), SOL resolution treats a newly added clause C as the top clause (or a start clause) input to ME. This is a desirable feature for consequence-finding since the procedure can directly derive the theorems relevant to the added information.
(2) It is easy to focus on producing only those theorems belonging to the production field. This is implemented by allowing an ME procedure to skip the selected literal belonging to P. In other words, SOL resolution is restricted to searching only characteristic clauses.

Here, we show a definition of SOL resolution based on [35]. An ordered clause is a sequence of literals possibly containing framed literals, which represent literals that have been resolved upon. A structured clause ⟨P, Q⟩ is a pair of a clause P and an ordered clause Q, whose clausal meaning is P ∪ Q.

Definition 4.3 (SOL Resolution) Given a set of clauses Σ, a clause C, and a production field P, an SOL-deduction of a clause S from Σ + C and P consists of a sequence of structured clauses, D0, D1, . . . , Dn, such that:

1. D0 = ⟨□, C⟩.
2. Dn = ⟨S, □⟩.
3. For each Di = ⟨Pi, Qi⟩, Pi ∪ Qi is not a tautology.
4. For each Di = ⟨Pi, Qi⟩, Qi is not subsumed by any Qj with the empty substitution, where Dj = ⟨Pj, Qj⟩ is a previous structured clause, j < i.
5. For each Di = ⟨Pi, Qi⟩, Pi belongs to P.
6. Di+1 = ⟨Pi+1, Qi+1⟩ is generated from Di = ⟨Pi, Qi⟩ according to the following steps:
   (a) Let l be the selected literal in Qi. Pi+1 and Ri+1 are obtained by applying one of the rules:
       i. (Skip) If Pi ∪ {l} belongs to P, then Pi+1 = Pi ∪ {l} and Ri+1 is the ordered clause obtained by removing l from Qi.
       ii. (Resolve) If there is a clause Bi in Σ ∪ {C} such that ¬k ∈ Bi and l and k are unifiable with mgu θ, then Pi+1 = Pi θ and Ri+1 is an ordered clause obtained by concatenating Bi θ and Qi θ, framing lθ, and removing ¬kθ.
       iii. (Reduce) If either
            A. Pi or Qi contains an unframed literal k (factoring/merge), or
            B. Qi contains a framed literal ¬k (ancestry),
            and l and k are unifiable with mgu θ, then Pi+1 = Pi θ and Ri+1 is obtained from Qi θ by deleting lθ.
   (b) Qi+1 is obtained from Ri+1 by deleting every framed literal not preceded by an unframed literal in the remainder (truncation).

When the Skip rule is applied to the selected literal in an SOL deduction, it is never solved by applying any resolution. To apply this rule, the selected literal has to belong to the production ﬁeld. When a deduction with the top clause C is completed, that is, every literal is either solved or skipped, those skipped literals are collected and output. This output clause is a logical consequence of Σ ∪ {C}


Katsumi Inoue

and every literal in it belongs to the production field 𝒫. Note that when both Skip and resolution can be applied to the selected literal, these two rules are chosen non-deterministically. In [35], it is proved that SOL resolution is complete for both consequence-finding and finding (new) characteristic clauses. In [99], SOL resolution is implemented using the Weak Model Elimination method [66]. In [49], various pruning methods are introduced to enhance the efficiency of SOL resolution in a connection-tableau format [62]. In [16], del Val defines a variant consequence-finding procedure for finding characteristic clauses, which is based on ordered resolution instead of Model Elimination.

Example 4.4 [35] Suppose that Σ consists of the two clauses:

(1) ¬P(x) ∨ Q(y, y) ∨ R(z, x),
(2) ¬Q(x, y) ∨ R(x, y).

Suppose also that the set of hypotheses is given as Γ = {P}⁺. Then the production field is 𝒫 = Γ̄ = {P}⁻. Now, consider the query, G = R(A, x), where the variable x is interpreted as existentially quantified, and we want to compute its answer substitution. The first SOL-deduction from Σ + ¬G and 𝒫 is as follows (framed literals are written in square brackets):

(3) ⟨□, ¬R(A, x)⟩,    top clause
(4) ⟨□, ¬P(x) ∨ Q(y, y) ∨ [¬R(A, x)]⟩,    resolution with (1)
(5) ⟨¬P(x), Q(y, y) ∨ [¬R(A, x)]⟩,    skip
(6) ⟨¬P(x), R(y, y) ∨ [Q(y, y)] ∨ [¬R(A, x)]⟩,    resolution with (2)
(7a) ⟨¬P(A), [Q(A, A)] ∨ [¬R(A, A)]⟩,    ancestry
(7b) ⟨¬P(A), □⟩.    truncation

In the above SOL-deduction, P(A) is an explanation of the answer R(A, A) from (Σ, Γ). Namely,

Σ ⊨ P(A) ⊃ R(A, A).

The second SOL-deduction from Σ + ¬G and 𝒫 takes the same four steps as the above (3)–(6), but instead of applying ancestry at (7), R(y, y) is resolved upon against the clause ¬R(A, x′), yielding

(7a′) ⟨¬P(x), [R(A, A)] ∨ [Q(A, A)] ∨ [¬R(A, x)]⟩,
(7b′) ⟨¬P(x), □⟩.

In this case, ¬G is used twice in the SOL-deduction. Note that P(x) is not an explanation of any definite answer. It represents that for any term t, P(t) is an explanation of the indefinite answer R(A, t) ∨ R(A, A). Namely,

Σ ⊨ ∀x (P(x) ⊃ R(A, x) ∨ R(A, A)).


By Theorem 4.1 and the completeness result of SOL resolution, we can guarantee the completeness for finding explanations from first-order abductive theories. In contrast, the completeness does not hold for abductive procedures like [85,10], in which hypothesizing literals is allowed only when resolution cannot be applied for selected literals. The hypothesized, unresolved literals are “dead ends” of deductions, and explanations obtained in this way are most-specific [96]. This kind of abductive computation can also be implemented in a variant of SOL resolution, called SOL-R resolution [35], by preferring resolution to Skip whenever both can be applied. On the other hand, there is another variant of SOL resolution, called SOL-S resolution [35], in which only Skip is applied by ignoring the possibility of resolution when the selected literal belongs to 𝒫. Each explanation obtained by using SOL-S resolution is called a least-specific explanation [96]. While most-specific explanations are often useful for application to diagnosis [85,10], least-specific explanations are used in natural language understanding [96] and computing circumscription by Corollary 4.2 [41].

4.4 Bottom-Up Abduction

As shown by Reiter and de Kleer [88], an assumption-based truth maintenance system (ATMS) [14] is a propositional abductive system. In ATMS, facts are given as propositional Horn clauses and hypotheses are propositional atoms [63,34,92]. An extension of ATMS, which allows non-Horn propositional clauses for facts and propositional literals for hypotheses, is called a clause management system (CMS) [88]. The task of CMS is to compute the set of all minimal explanations of a literal G from (Σ, Γ), where Σ is a set of propositional clauses and Γ ⊆ L is a set of hypotheses. In ATMS, the set of minimal explanations of an atom G is called the label of G. The label updating algorithm of ATMS [14] computes the label of every propositional atom in a bottom-up manner. This algorithm can be logically understood as a fixpoint computation of the following semantic resolution. Let Γ be a set of propositional atoms, and Σ be a set of propositional Horn clauses. Suppose that N is either false or any atom appearing in Σ, and that Ni (1 ≤ i ≤ m; m ≥ 0) is any atom and Ai,j (1 ≤ i ≤ m; 1 ≤ j ≤ ni; ni ≥ 0) is an element of Γ. Then, a clash in semantic resolution of the form:

N1 ∧ … ∧ Nm ⊃ N
Ai,1 ∧ … ∧ Ai,ni ⊃ Ni, for all i = 1, …, m
------------------------------------------------
⋀ (1 ≤ i ≤ m, 1 ≤ j ≤ ni) Ai,j ⊃ N

represents multiple applications of resolution. The label updating algorithm of ATMS takes each clause in Σ as input one by one, applies the above clash as many times as possible, and incrementally computes every theorem of Σ that is not subsumed by any other theorem of Σ. Then, each resultant minimal theorem


obtained by this computation yields a prime implicate of Σ. Now, let PI(Σ, Γ) be the set of such prime implicates. The label of an atom N is obtained as

{ {A1, …, Ak} ⊆ Γ | ¬A1 ∨ … ∨ ¬Ak ∨ N ∈ PI(Σ, Γ) }.

In particular, each element in the label of false is called a nogood, which is obtained as the negation of each negative clause from PI(Σ, Γ). Nogoods are useful for recognizing forbidden combinations of hypotheses in many AI applications, and work as integrity constraints saying that those atoms cannot be assumed simultaneously. A typical implementation of the label updating algorithm performs the above clash computation for an atom N by: (i) generating the product of the labels of antecedent atoms of N, (ii) eliminating each element which is a superset of some nogood, and (iii) eliminating every non-minimal element from the rest. Although ATMS works for propositional abduction only, a similar clash rule that is complete for first-order abduction is also proposed in [18], and a method to simulate the above clash using hyperresolution is proposed for first-order abductive theories in [97].

Example 4.5 Let (Σ, Γ) be a propositional Horn abductive theory such that

Σ = { A ∧ B ⊃ P, C ⊃ P, B ∧ C ⊃ Q, D ⊃ Q, P ∧ Q ⊃ R, C ∧ D ⊃ false },
Γ = { A, B, C, D }.

We here presuppose the existence of tautology α ⊃ α in Σ for each assumption α ∈ Γ, i.e., A ⊃ A, B ⊃ B, C ⊃ C, D ⊃ D. Then, the label of each non-assumption atom is computed as:

P : {{A, B}, {C}},
Q : {{B, C}, {D}},
R : {{B, C}, {A, B, D}},
false : {{C, D}}.

To compute the label of R in ATMS, we ﬁrstly construct the product of P and Q’s labels as {{A, B, C}, {B, C}, {A, B, D}, {C, D}}, then eliminate {C, D} as a nogood and {A, B, C} as a superset of {B, C}. The above label updating method from [14] cannot be directly used when Σ contains non-Horn clauses. This is because semantic resolution in the above form is not deductively complete for non-Horn clauses. For a full CMS, the level saturation method is proposed in [88], which involves computation of all prime implicates of Σ. In [34], it is shown that a sound and complete procedure of CMS/ATMS can be provided using SOL resolution, without computing all prime implicates of Σ, for both label generating and label updating.
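The three implementation steps (i)–(iii) and the labels of Example 4.5 can be reproduced with a small script. The following is only a minimal sketch under stated assumptions, not de Kleer's actual algorithm: Horn clauses are encoded as (body, head) pairs, the atom name "false" stands for the empty head, and the function names minimize, combine and atms_labels are hypothetical. The fixpoint loop is deliberately naive and re-examines every clause on each pass.

```python
from itertools import product


def minimize(envs):
    """Keep only the subset-minimal environments."""
    envs = set(envs)
    return {e for e in envs if not any(other < e for other in envs)}


def combine(antecedent_labels, nogoods):
    """Steps (i)-(iii) for a single clause."""
    # (i) pick one environment from each antecedent label and take the union
    candidates = {frozenset().union(*choice)
                  for choice in product(*antecedent_labels)}
    # (ii) eliminate every candidate that is a superset of some nogood
    consistent = {e for e in candidates
                  if not any(ng <= e for ng in nogoods)}
    # (iii) eliminate every non-minimal candidate
    return minimize(consistent)


def atms_labels(horn_clauses, assumptions):
    """Naive fixpoint over Horn clauses given as (body_atoms, head_atom)."""
    # each assumption a is labelled {{a}}, i.e. the tautology a ⊃ a
    label = {a: {frozenset([a])} for a in assumptions}
    changed = True
    while changed:
        changed = False
        nogoods = label.get("false", set())
        for body, head in horn_clauses:
            # nogoods themselves are never filtered by nogoods
            active = set() if head == "false" else nogoods
            derived = combine([label.get(atom, set()) for atom in body], active)
            merged = minimize(label.get(head, set()) | derived)
            merged = {e for e in merged if not any(ng <= e for ng in active)}
            if merged != label.get(head, set()):
                label[head] = merged
                changed = True
    return label


# The theory of Example 4.5.
sigma = [
    (("A", "B"), "P"), (("C",), "P"),
    (("B", "C"), "Q"), (("D",), "Q"),
    (("P", "Q"), "R"),
    (("C", "D"), "false"),
]
labels = atms_labels(sigma, ["A", "B", "C", "D"])
for atom in ("P", "Q", "R", "false"):
    print(atom, sorted(sorted(env) for env in labels[atom]))
```

On this input the script yields the labels listed in Example 4.5; in particular, the nogood {C, D} is removed from the label of R on the second pass, once the label of false has been computed.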


Example 4.6 Consider a propositional abductive theory (Σ, Γ), where

Σ = { P ∨ Q, ¬B ∨ P },
Γ = { A, B }.

Let N be the set of all atoms appearing in Σ. We set the production field as 𝒫* = ⟨Γ̄ ∪ G, the number of literals from N − Γ is at most one⟩. Then, Carc(Σ, 𝒫*) in this case is equivalent to Σ. While P has the label {{B}}, Q’s label is empty. Now, suppose that a new clause, ¬A ∨ ¬P, is added to Σ. Then, an updating algorithm based on SOL resolution finds Q’s new label {{A}}, as well as a new nogood {A, B} (framed literals are written in square brackets):

⟨□, ¬A ∨ ¬P⟩, ⟨¬A, ¬P⟩,

and then either

⟨¬A, Q ∨ [¬P]⟩, ⟨¬A ∨ Q, [¬P]⟩, ⟨¬A ∨ Q, □⟩,

or

⟨¬A, ¬B ∨ [¬P]⟩, ⟨¬A ∨ ¬B, [¬P]⟩, ⟨¬A ∨ ¬B, □⟩.
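The deductions of Example 4.6 can be replayed with a propositional sketch of the Skip, Resolve, Reduce and truncation steps of Definition 4.3. This is an illustration under several assumptions rather than the procedure of [34,35]: literals are strings such as "P" and "-P", the selected literal is always the leftmost one, the subsumption test of condition 4 is omitted, a step budget is used to force termination, and the names neg, truncate, sol_deductions and in_field are hypothetical. The production field is encoded as "negated hypotheses plus at most one literal over P or Q", which is one reading of 𝒫* above.

```python
def neg(lit):
    """Complement of a literal written as 'P' or '-P'."""
    return lit[1:] if lit.startswith("-") else "-" + lit


def truncate(ordered):
    """Drop framed literals at the front until an unframed one is reached."""
    i = 0
    while i < len(ordered) and ordered[i][1]:
        i += 1
    return ordered[i:]


def sol_deductions(sigma, top, in_field, max_steps=12):
    """Collect every clause S reachable as <S, []> from <[], top>."""
    clauses = [frozenset(c) for c in sigma] + [frozenset(top)]
    found = set()

    def tautology(p, ordered):
        lits = set(p) | {l for l, framed in ordered if not framed}
        return any(neg(l) in lits for l in lits)

    def step(p, ordered, budget):
        # conditions 3 and 5 of the definition
        if tautology(p, ordered) or not in_field(p):
            return
        if not ordered:
            found.add(p)
            return
        if budget == 0:
            return
        lit, rest = ordered[0][0], ordered[1:]
        # Skip: move the selected literal into the skipped part
        if in_field(p | {lit}):
            step(p | {lit}, truncate(rest), budget - 1)
        # Reduce: merge with an identical unframed literal, or ancestry
        merge = lit in p or any(k == lit and not f for k, f in rest)
        ancestry = any(k == neg(lit) and f for k, f in rest)
        if merge or ancestry:
            step(p, truncate(rest), budget - 1)
        # Resolve: extend with a side clause and frame the selected literal
        for b in clauses:
            if neg(lit) in b:
                side = tuple((k, False) for k in sorted(b - {neg(lit)}))
                step(p, truncate(side + ((lit, True),) + rest), budget - 1)

    step(frozenset(), tuple((l, False) for l in top), max_steps)
    return found


# Example 4.6: Sigma = {P v Q, -B v P}, new clause -A v -P.
negated_hypotheses = {"-A", "-B"}


def in_field(clause):
    others = [l for l in clause if l not in negated_hypotheses]
    return len(others) <= 1 and all(l.lstrip("-") in {"P", "Q"} for l in others)


derived = sol_deductions([{"P", "Q"}, {"-B", "P"}], ("-A", "-P"), in_field)
for clause in sorted(sorted(c) for c in derived):
    print(clause)
```

Besides ¬A ∨ Q and ¬A ∨ ¬B, from which the new label {{A}} of Q and the new nogood {A, B} are read off, the sketch also returns the added clause ¬A ∨ ¬P itself, obtained by skipping both of its literals.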

Abductive procedures based on Clark completion [9,55,28,47] also perform computation of abduction in a deductive manner. This kind of abductive procedure is often used in implementing abductive logic programming. Inoue et al. [42] develop a model generation procedure for bottom-up abduction based on a translation in [44], which applies the Skip rule of SOL resolution [35] in model generation. Abductive procedures that combine top-down and bottom-up approaches are also proposed in two ways: one is to achieve the goal-directedness in bottom-up procedures [77,42,97], and the other is to utilize derived lemmas in top-down methods [49]. Other than these resolution-based procedures, Cialdea Mayer and Pirri [11] propose tableau and sequent calculi for first-order abduction.

4.5 Computational Complexity

The computational complexity of abduction has been extensively studied. First, in the case that the background knowledge is expressed in ﬁrst-order logic as in Section 2.1, the problem of ﬁnding an explanation that is consistent with Σ is not semi-decidable. That is, the problem of deciding the satisﬁability of an axiom set is undecidable for ﬁrst-order logic in general, hence computing an explanation is not decidable even if there exists an explanation. For the consequence-ﬁnding problem in Section 4.1, the set of characteristic clauses of Σ is not even recursively enumerable [48]. Similarly, the set of new characteristic clauses of F with respect to Σ, which is used to characterize explanations in


abduction (Theorem 4.1), involves checking whether a derived formula is not a logical consequence of Σ, which cannot necessarily be determined in a finite amount of time. Hence, to check if a set E of hypotheses obtained in a top-down default proof or SOL resolution is in fact consistent with Σ, we need some approximation, such as a procedure that regards the theory as consistent whenever a refutation-complete theorem prover fails to prove ¬E within a finite amount of time.

Next, in the propositional case, the computational complexity of abduction is studied in [6,92,21]. From the theory of enumerating prime implicates, it is known that the number of explanations grows exponentially as the number of clauses or propositions grows. Selman and Levesque [92] show that finding even one explanation of an atom from a Horn theory and a set of atomic hypotheses is NP-hard. Therefore, even if we abandon the completeness of explanations, it is still intractable. However, if we do not restrict a set Γ of hypotheses and can hypothesize any atom to construct explanations, an explanation can be found in polynomial time. Hence, the restriction of abducible atoms is a source of complexity. On the other hand, as analyses by [6,21] show, the intrinsic difficulty also lies in checking the consistency of explanations, and the inclusion of negative clauses in a theory increases the complexity. Another source of complexity lies in the requirement of minimality for abductive explanations [21]. However, some tractable classes of abductive theories have also been discovered [23,17]. Thus, in propositional abduction, it is unlikely that there exists a polynomial-time algorithm for abductive explanations in general. We can consider approximation of abduction, by discarding either the consistency or the soundness. However, we should notice that showing that a logical framework of abduction or default reasoning is undecidable or intractable does not mean that it is useless. Since they are intrinsically difficult problems (consider, for instance, scientific discovery as the process of abduction), what we would like to know is that representing a problem in such a framework does not increase the computational complexity of the original problem.
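The propositional Horn case can be made concrete with a brute-force sketch. Checking one candidate set of hypotheses by forward chaining takes polynomial time, whereas the naive search below enumerates subsets of Γ and is therefore exponential in the number of hypotheses. The encoding of clauses as (body, head) pairs, the atom name "false", and the function names closure and minimal_explanations are assumptions made for illustration only; this is not one of the algorithms analysed in [6,92,21].

```python
from itertools import combinations


def closure(horn_clauses, facts):
    """Forward chaining: every atom derivable from `facts`."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in horn_clauses:
            if head not in known and all(atom in known for atom in body):
                known.add(head)
                changed = True
    return known


def minimal_explanations(horn_clauses, hypotheses, goal):
    """Enumerate subsets of the hypotheses by increasing size."""
    found = []
    for size in range(len(hypotheses) + 1):
        for candidate in combinations(sorted(hypotheses), size):
            env = frozenset(candidate)
            # a proper subset already explains the goal: not minimal
            if any(smaller <= env for smaller in found):
                continue
            derived = closure(horn_clauses, env)
            # the goal must be entailed and the theory must stay consistent
            if goal in derived and "false" not in derived:
                found.append(env)
    return found


# The theory of Example 4.5 again.
sigma = [
    (("A", "B"), "P"), (("C",), "P"),
    (("B", "C"), "Q"), (("D",), "Q"),
    (("P", "Q"), "R"),
    (("C", "D"), "false"),
]
explanations = minimal_explanations(sigma, {"A", "B", "C", "D"}, "R")
print([sorted(env) for env in explanations])
```

For the goal R the search returns {B, C} and {A, B, D}, which agrees with the label of R in Example 4.5; the candidate {C, D} is rejected by the consistency test because it derives false.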

5 Final Remark

5.1 Problems to Be Addressed

In this article, we observed that automated abduction involves automated deduction in some way. However, clarifying the relationship between abduction and deduction is just a first step towards a mechanization of Peirce’s abduction. There are many future research topics in automated abduction, which include fundamental problems of abduction, applications of abduction, and computational problems of abduction. Some of these problems are also listed in [51] in this volume, and some philosophical problems are discussed in [26,67].

As a fundamental problem of abduction, we have not yet fully understood the human mechanism of explanation and prediction. The formalization in this article only reflects a small part of the whole. Most importantly, there are non-logical aspects of abduction, which are hard to represent. The mechanization of


hypothesis selection is one of the most challenging topics. Research on acquiring meta-knowledge like preference among explanations [47] and inventing new abducible hypotheses [40] is related to improving the quality of explanations in abduction.

For computational problems, this article showed a directly mechanized way to compute abduction. There is another approach to computation, which translates the abduction problem into other technologies developed in AI. For example, some classes of abductive theories can be transformed into propositional satisfiability and other nonmonotonic formalizations for which efficient solvers exist. Such indirect approaches are taken in recent applications involving assumption-based reasoning such as planning and diagnosis.

One might think that nonmonotonic logic programming such as the stable model semantics or default logic is enough for reasoning under incomplete information when they are as expressive as the class of abductive theories. The question as to why we need abductive theories should be answered by considering the role of abduction in application domains. One may often understand abductive theories more easily and intuitively than theories represented in other nonmonotonic logics. For example, in diagnostic domains, background knowledge contains cause-effect relations and hypotheses are written as a set of causes. In the process of theory formation, incomplete knowledge is naturally represented in the form of hypothetical rules. We thus can use an abductive framework as a high-level description language while computation of abduction can be compiled into other technologies.

5.2 Towards Mechanization of Scientific Reasoning

Let us recall Peirce’s theory of scientific reasoning. His theory of scientific discovery relies on the cycle of “experiment, observation, hypothesis generation, hypothesis verification, and hypothesis revision”. Peirce mentions that this process involves all modes of reasoning; abduction takes place at the first stage of scientific reasoning, deduction follows to derive the consequences of the hypotheses that were given by abduction, and finally, induction is used to verify that those hypotheses are true. According to this viewpoint, let us review the logic of abduction:

(1) Facts ∪ Explanation ⊨ Observation.
(2) Facts ∪ Explanation is consistent.

A possible interpretation of this form of hypothetical reasoning is now as follows. The formula (1) is the process of abduction, or the fallacy of affirming the consequent. The consistency check (2), on the other hand, is the place where deduction plays a role. Since our knowledge about the world may be incomplete, we should experiment with the consequences in an inductive manner in order to verify that the hypotheses are consistent with the knowledge base. At the same time, the process of inductive generalization or the synthesis from examples involves abduction too. This phenomenon of human reasoning is also discussed by Flach and Kakas [27] as the “cycle” of abductive and inductive knowledge development.


When we are given some examples, we first make hypotheses. While previous AI approaches for inductive generalization often enumerated all the possible forms of formulas, abduction would help to restrict the search space. Additional heuristics, once they are formalized, would also be helpful for constructing the hypotheses. Investigation on knowledge assimilation involving abduction, deduction and induction will become more and more important in AI research in the 21st century.

Acknowledgements. Discussions with many researchers were very helpful in preparing this article. In particular, Bob Kowalski gave me valuable comments on an earlier draft of this article. I would also like to thank Toshihide Ibaraki, Koji Iwanuma, Chiaki Sakama, Ken Satoh, and Hiromasa Haneda for their suggestions on this work.

References

1. Chitta Baral. Abductive reasoning through filtering. Artificial Intelligence, 120:1–28, 2000.
2. Nicole Bidoit and Christine Froidevaux. Minimalism subsumes default logic and circumscription. In: Proceedings of LICS-87, pages 89–97, 1987.
3. Genevieve Bossu and Pierre Siegel. Saturation, nonmonotonic reasoning, and the closed-world assumption. Artificial Intelligence, 25:13–63, 1985.
4. A. Bondarenko, P. M. Dung, R. A. Kowalski, and F. Toni. An abstract, argumentation-theoretic approach to default reasoning. Artificial Intelligence, 93:63–101, 1997.
5. Craig Boutilier and Verónica Becher. Abduction as belief revision. Artificial Intelligence, 77:43–94, 1995.
6. Tom Bylander, Dean Allemang, Michael C. Tanner, and John R. Josephson. The computational complexity of abduction. Artificial Intelligence, 49:25–60, 1991.
7. Chin-Liang Chang and Richard Char-Tung Lee. Symbolic Logic and Mechanical Theorem Proving. Academic Press, New York, 1973.
8. Viorica Ciorba. A query answering algorithm for Lukaszewicz’ general open default theory. In: Proceedings of JELIA ’96, Lecture Notes in Artificial Intelligence, 1126, pages 208–223, Springer, 1996.
9. Luca Console, Daniele Theseider Dupre, and Pietro Torasso. On the relationship between abduction and deduction. Journal of Logic and Computation, 1:661–690, 1991.
10. P.T. Cox and T. Pietrzykowski. Causes for events: their computation and applications. In: Proceedings of the 8th International Conference on Automated Deduction, Lecture Notes in Computer Science, 230, pages 608–621, Springer, 1986.
11. Marita Cialdea Mayer and Fiora Pirri. First order abduction via tableau and sequent calculi. Journal of the IGPL, 1(1):99–117, 1993.
12. Marita Cialdea Mayer and Fiora Pirri. Abduction is not deduction-in-reverse. Journal of the IGPL, 4(1):95–108, 1996.
13. Hendrik Decker. An extension of SLD by abduction and integrity maintenance for view updating in deductive databases. In: Proceedings of the 1996 Joint International Conference and Symposium on Logic Programming, pages 157–169, MIT Press, 1996.


14. Johan de Kleer. An assumption-based TMS. Artificial Intelligence, 28:127–162, 1986.
15. Alvaro del Val. Approximate knowledge compilation: the first order case. In: Proceedings of AAAI-96, pages 498–503, AAAI Press, 1996.
16. Alvaro del Val. A new method for consequence finding and compilation in restricted languages. In: Proceedings of AAAI-99, pages 259–264, AAAI Press, 1999.
17. Alvaro del Val. On some tractable classes in deduction and abduction. Artificial Intelligence, 116:297–313, 2000.
18. Robert Demolombe and Luis Fariñas del Cerro. An inference rule for hypothesis generation. In: Proceedings of IJCAI-91, pages 152–157, 1991.
19. Marc Denecker and Danny De Schreye. SLDNFA: an abductive procedure for abductive logic programs. Journal of Logic Programming, 34:111–167, 1998.
20. Marc Denecker and Antonis Kakas, editors. Special Issue: Abductive Logic Programming. Journal of Logic Programming, 44(1–3), 2000.
21. Thomas Eiter and George Gottlob. The complexity of logic-based abduction. Journal of the ACM, 42(1):3–42, 1995.
22. Thomas Eiter, George Gottlob, and Nicola Leone. Semantics and complexity of abduction from default theories. Artificial Intelligence, 90:177–223, 1997.
23. Kave Eshghi. A tractable class of abduction problems. In: Proceedings of IJCAI-93, pages 3–8, 1993.
24. David W. Etherington. Reasoning with Incomplete Information. Pitman, London, 1988.
25. Joseph J. Finger. Exploiting constraints in design synthesis. Ph.D. Dissertation, Technical Report STAN-CS-88-1204, Department of Computer Science, Stanford University, Stanford, CA, 1987.
26. Peter A. Flach and Antonis C. Kakas, editors. Abduction and Induction—Essays on their Relation and Integration. Kluwer Academic, 2000.
27. Peter A. Flach and Antonis C. Kakas. Abductive and inductive reasoning: background and issues. In: [26], pages 1–27, 2000.
28. T. H. Fung and R. Kowalski. The iff procedure for abductive logic programming. Journal of Logic Programming, 33:151–165, 1997.
29. Michael Gelfond and Vladimir Lifschitz. Classical negation in logic programs and disjunctive databases. New Generation Computing, 9:365–385, 1991.
30. Michael Gelfond, Halina Przymusinska, and Teodor Przymusinski. On the relationship between circumscription and negation as failure. Artificial Intelligence, 38:75–94, 1989.
31. Matthew L. Ginsberg. A circumscriptive theorem prover. Artificial Intelligence, 39:209–230, 1989.
32. Nicolas Helft, Katsumi Inoue, and David Poole. Query answering in circumscription. In: Proceedings of IJCAI-91, pages 426–431, 1991.
33. Carl Gustav Hempel. Philosophy of Natural Science. Prentice-Hall, New Jersey, 1966.
34. Katsumi Inoue. An abductive procedure for the CMS/ATMS. In: João P. Martins and Michael Reinfrank, editors, Truth Maintenance Systems, Lecture Notes in Artificial Intelligence, 515, pages 34–53, Springer, 1991.
35. Katsumi Inoue. Linear resolution for consequence finding. Artificial Intelligence, 56:301–353, 1992.
36. Katsumi Inoue. Studies on abductive and nonmonotonic reasoning. Doctoral Dissertation, Kyoto University, Kyoto, 1992.


37. Katsumi Inoue. Principles of abduction. Journal of Japanese Society for Artificial Intelligence, 7(1):48–59, 1992 (in Japanese).
38. Katsumi Inoue. Hypothetical reasoning in logic programs. Journal of Logic Programming, 18(3):191–227, 1994.
39. Katsumi Inoue. Induction, abduction, and consequence-finding. In: Céline Rouveirol and Michèle Sebag, editors, Proceedings of the 11th International Conference on Inductive Logic Programming, Lecture Notes in Artificial Intelligence, 2157, pages 65–79, Springer, 2001.
40. Katsumi Inoue and Hiromasa Haneda. Learning abductive and nonmonotonic logic programs. In: [26], pages 213–231, 2000.
41. Katsumi Inoue and Nicolas Helft. On theorem provers for circumscription. In: Peter F. Patel-Schneider, editor, Proceedings of the 8th Biennial Conference of the Canadian Society for Computational Studies of Intelligence, pages 212–219, Morgan Kaufmann, 1990.
42. Katsumi Inoue, Yoshihiko Ohta, Ryuzo Hasegawa, and Makoto Nakashima. Bottom-up abduction by model generation. In: Proceedings of IJCAI-93, pages 102–108, Morgan Kaufmann, 1993.
43. Katsumi Inoue and Chiaki Sakama. Abductive framework for nonmonotonic theory change. In: Proceedings of IJCAI-95, pages 204–210, Morgan Kaufmann, 1995.
44. Katsumi Inoue and Chiaki Sakama. A fixpoint characterization of abductive logic programs. Journal of Logic Programming, 27(2):107–136, 1996.
45. Katsumi Inoue and Chiaki Sakama. Negation as failure in the head. Journal of Logic Programming, 35(1):39–78, 1998.
46. Katsumi Inoue and Chiaki Sakama. Abducing priorities to derive intended conclusions. In: Proceedings of IJCAI-99, pages 44–49, Morgan Kaufmann, 1999.
47. Katsumi Inoue and Chiaki Sakama. Computing extended abduction through transaction programs. Annals of Mathematics and Artificial Intelligence, 25(3,4):339–367, 1999.
48. Koji Iwanuma and Katsumi Inoue. Minimal conditional answer computation and SOL. To appear, 2002.
49. Koji Iwanuma, Katsumi Inoue, and Ken Satoh. Completeness of pruning methods for consequence finding procedure SOL. In: Peter Baumgartner and Hantao Zhang, editors, Proceedings of the 3rd International Workshop on First-Order Theorem Proving, pages 89–100, Research Report 5-2000, Institute for Computer Science, University of Koblenz, Germany, 2000.
50. John R. Josephson and Susan G. Josephson. Abductive Inference: Computation, Philosophy, Technology. Cambridge University Press, 1994.
51. Antonis Kakas and Marc Denecker. Abductive logic programming. In this volume, 2002.
52. A.C. Kakas and P. Mancarella. Generalized stable models: a semantics for abduction. In: Proceedings of ECAI-90, pages 385–391, 1990.
53. A. C. Kakas, R. A. Kowalski, and F. Toni. Abductive logic programming. Journal of Logic and Computation, 2:719–770, 1992.
54. A. C. Kakas, R. A. Kowalski, and F. Toni. The role of abduction in logic programming. In: Dov M. Gabbay, C. J. Hogger, and J. A. Robinson, editors, Handbook of Logic in Artificial Intelligence and Logic Programming, Volume 5, pages 235–324, Oxford University Press, 1998.
55. Kurt Konolige. Abduction versus closure in causal theories. Artificial Intelligence, 53:255–272, 1992.


56. Kurt Konolige. Abductive theories in artificial intelligence. In: Gerhard Brewka, editor, Principles of Knowledge Representation, pages 129–152, CSLI Publications & FoLLI, 1996.
57. R. Kowalski. The case for using equality axioms in automated demonstration. In: Proceedings of the IRIA Symposium on Automatic Demonstration, Lecture Notes in Mathematics, 125, pages 112–127, Springer, 1970.
58. Robert A. Kowalski. Logic for Problem Solving. Elsevier, New York, 1979.
59. Robert Kowalski and Donald G. Kuehner. Linear resolution with selection function. Artificial Intelligence, 2:227–260, 1971.
60. Robert A. Kowalski and Francesca Toni. Abstract argumentation. Artificial Intelligence and Law, 4:275–296, 1996.
61. Char-Tung Lee. A completeness theorem and computer program for finding theorems derivable from given axioms. Ph.D. thesis, Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA, 1967.
62. R. Letz, K. Mayer, and C. Goller. Controlled integration of the cut rule into connection tableau calculi. Journal of Automated Reasoning, 13(3):297–337, 1994.
63. Hector J. Levesque. A knowledge-level account of abduction (preliminary version). In: Proceedings of IJCAI-89, pages 1061–1067, 1989.
64. Vladimir Lifschitz. Computing circumscription. In: Proceedings of IJCAI-85, pages 121–127, 1985.
65. Jorge Lobo and Carlos Uzcátegui. Abductive consequence relations. Artificial Intelligence, 89:149–171, 1997.
66. Donald W. Loveland. Automated Theorem Proving: A Logical Basis. North-Holland, Amsterdam, 1978.
67. Lorenzo Magnani. Abduction, Reason, and Science—Processes of Discovery and Explanation. Kluwer Academic, 2001.
68. Pierre Marquis. Consequence finding algorithms. In: Dov M. Gabbay and Philippe Smets, editors, Handbook for Defeasible Reasoning and Uncertain Management Systems, Volume 5, pages 41–145, Kluwer Academic, 2000.
69. Philippe Mathieu and Jean-Paul Delahaye. A kind of logical compilation for knowledge bases. Theoretical Computer Science, 131:197–218, 1994.
70. John McCarthy. Circumscription—a form of non-monotonic reasoning. Artificial Intelligence, 13:27–39, 1980.
71. John McCarthy. Applications of circumscription to formalizing common-sense knowledge. Artificial Intelligence, 28:89–116, 1986.
72. Eliana Minicozzi and Raymond Reiter. A note on linear resolution strategies in consequence-finding. Artificial Intelligence, 3:175–180, 1972.
73. Jack Minker. On indefinite databases and the closed world assumption. In: Proceedings of the 6th International Conference on Automated Deduction, Lecture Notes in Computer Science, 138, pages 292–308, Springer, 1982.
74. Robert C. Moore. Semantical considerations on nonmonotonic logic. Artificial Intelligence, 25:75–94, 1985.
75. Stephen Muggleton. Inverse entailment and Progol. New Generation Computing, 13:245–286, 1995.
76. Shan-Hwei Nienhuys-Cheng and Ronald de Wolf. Foundations of Inductive Logic Programming. Lecture Notes in Artificial Intelligence, 1228, Springer, 1997.
77. Yoshihiko Ohta and Katsumi Inoue. Incorporating top-down information into bottom-up hypothetical reasoning. New Generation Computing, 11:401–421, 1993.


78. Gabriele Paul. AI approaches to abduction. In: Dov M. Gabbay and Philippe Smets, editors, Handbook for Defeasible Reasoning and Uncertain Management Systems, Volume 4, pages 35–98, Kluwer Academic, 2000.
79. Charles Sanders Peirce. Elements of Logic. In: Charles Hartshorne and Paul Weiss, editors, Collected Papers of Charles Sanders Peirce, Volume II, Harvard University Press, Cambridge, MA, 1932.
80. Ramón Pino-Pérez and Carlos Uzcátegui. Jumping to explanations versus jumping to conclusions. Artificial Intelligence, 111:131–169, 1999.
81. David Poole. A logical framework for default reasoning. Artificial Intelligence, 36:27–47, 1988.
82. David Poole. Explanation and prediction: an architecture for default and abductive reasoning. Computational Intelligence, 5:97–110, 1989.
83. David Poole. Compiling a default reasoning system into Prolog. New Generation Computing, 9:3–38, 1991.
84. David Poole, Randy Goebel, and Romas Aleliunas. Theorist: a logical reasoning system for defaults and diagnosis. In: Nick Cercone and Gordon McCalla, editors, The Knowledge Frontier: Essays in the Representation of Knowledge, pages 331–352, Springer, New York, 1987.
85. Harry E. Pople, Jr. On the mechanization of abductive logic. In: Proceedings of IJCAI-73, pages 147–152, 1973.
86. Teodor C. Przymusinski. An algorithm to compute circumscription. Artificial Intelligence, 38:49–73, 1989.
87. Raymond Reiter. A logic for default reasoning. Artificial Intelligence, 13:81–132, 1980.
88. Raymond Reiter and Johan de Kleer. Foundations of assumption-based truth maintenance systems: preliminary report. In: Proceedings of AAAI-87, pages 183–187, 1987.
89. J.A. Robinson. A machine-oriented logic based on the resolution principle. Journal of the ACM, 12:23–41, 1965.
90. Olivier Roussel and Philippe Mathieu. Exact knowledge compilation in predicate calculus: the partial achievement case. In: Proceedings of the 14th International Conference on Automated Deduction, Lecture Notes in Artificial Intelligence, 1249, pages 161–175, Springer, 1997.
91. Murray Shanahan. Prediction is deduction but explanation is abduction. In: Proceedings of IJCAI-89, pages 1055–1060, Morgan Kaufmann, 1989.
92. Bart Selman and Hector J. Levesque. Support set selection for abductive and default reasoning. Artificial Intelligence, 82:259–272, 1996.
93. Pierre Siegel. Représentation et utilization de la connaissance en calcul propositionnel. Thèse d’État, Université d’Aix-Marseille II, Luminy, France, 1987 (in French).
94. Pierre Siegel and Camilla Schwind. Hypothesis theory for nonmonotonic reasoning. In: Proceedings of the Workshop on Nonstandard Queries and Nonstandard Answers, pages 189–210, 1991.
95. J.R. Slagle, C.L. Chang, and R.C.T. Lee. Completeness theorems for semantic resolution in consequence-finding. In: Proceedings of IJCAI-69, pages 281–285, Morgan Kaufmann, 1969.
96. Mark E. Stickel. Rationale and methods for abductive reasoning in natural-language interpretation. In: R. Studer, editor, Natural Language and Logic, Proceedings of the International Scientific Symposium, Lecture Notes in Artificial Intelligence, 459, pages 233–252, Springer, 1990.


97. Mark E. Stickel. Upside-down meta-interpretation of the model elimination theorem-proving procedure for deduction and abduction. Journal of Automated Reasoning, 13(2):189–210, 1994.
98. Akihiro Yamamoto. Using abduction for induction based on bottom generalization. In: [26], pages 267–280, 2000.
99. Eiko Yamamoto and Katsumi Inoue. Implementation of SOL resolution based on model elimination. Transactions of Information Processing Society of Japan, 38(11):2112–2121, 1997 (in Japanese).
100. Wlodek Zadrozny. On rules of abduction. Annals of Mathematics and Artificial Intelligence, 9:387–419, 1993.

The Role of Logic in Computational Models of Legal Argument: A Critical Survey

Henry Prakken¹ and Giovanni Sartor²

¹ Institute of Information and Computing Sciences, Utrecht University, The Netherlands
http://www.cs.uu.nl/staff/henry.html
² Faculty of Law, University of Bologna, Italy
[email protected]

Abstract. This article surveys the use of logic in computational models of legal reasoning, against the background of a four-layered view on legal argument. This view comprises a logical layer (constructing an argument); a dialectical layer (comparing and assessing conflicting arguments); a procedural layer (regulating the process of argumentation); and a strategic, or heuristic, layer (arguing persuasively). Each further layer presupposes, and is built around, the previous layers. At the first two layers the information base is fixed, while at the third and fourth layers it is constructed dynamically, during a dialogue or dispute.

1 Introduction

1.1 AI & Law Research on Legal Argument

This article surveys a field that has been heavily influenced by Bob Kowalski: the logical analysis of legal reasoning and legal knowledge representation. Not only has he made important contributions to this field (witness the many times his name will be mentioned in this survey), but he has also influenced many others to undertake such a logical analysis at all. Our own research has been heavily influenced by his work, building on logic programming formalisms and on the well-known argumentation-theoretic account of nonmonotonic logic, of which Bob Kowalski was one of the originators [Kakas et al., 1992, Bondarenko et al., 1997]. We therefore feel very honoured to contribute to this volume in his honour.

The precise topic of this survey is the role of logic in computational models of legal argument. Argumentation is one of the central topics of current research in Artificial Intelligence and Law. It has attracted the attention of both logically inclined and design-oriented researchers. Two common themes prevail. The first is that legal reasoning is defeasible, i.e., an argument that is acceptable in itself can be overturned by counterarguments. The second is that legal reasoning is usually performed in a context of debate and disagreement. Accordingly, notions such as argument moves, attack, dialogue, and burden of proof are studied.

Historically, perhaps the first AI & Law attempt to address legal reasoning in an adversarial setting was McCarty’s (partly implemented) Taxman project, which aimed to reconstruct the lines of reasoning in the majority and dissenting opinions of a few leading American tax law cases (see e.g. [McCarty and Sridharan, 1981, McCarty, 1995]). Perhaps the first AI & Law system that explicitly defined notions like dispute and dialectical role was Rissland & Ashley’s (implemented) HYPO system [Rissland and Ashley, 1987], which modelled adversarial reasoning with legal precedents. It generated 3-ply disputes between plaintiff and defendant in a legal case, where each dispute is an alternating series of attacks by the defendant on the plaintiff’s claim, and of defences or counterattacks by the plaintiff against these attacks. This research was continued in Rissland & Skalak’s CABARET project [Rissland and Skalak, 1991], and Aleven & Ashley’s CATO project [Aleven and Ashley, 1997], both also in the ‘design’ strand. The main focus of all these projects is defining persuasive argument moves, i.e., moves which would be made by ‘good’ human lawyers.

A.C. Kakas, F. Sadri (Eds.): Computat. Logic (Kowalski Festschrift), LNAI 2408, pp. 342–381, 2002. © Springer-Verlag Berlin Heidelberg 2002

By contrast, much logic-based research on legal argument has focused on defeasible inference, inspired by AI research on nonmonotonic reasoning and defeasible argumentation [Gordon, 1991, Kowalski and Toni, 1996, Prakken and Sartor, 1996, Prakken, 1997, Nitta and Shibasaki, 1997, Hage, 1997, Verheij, 1996]. Here the focus was first on reasoning with rules and exceptions and with conflicting rules. After a while, some turned their attention to logical accounts of case-based reasoning [Loui et al., 1993, Loui and Norman, 1995, Prakken and Sartor, 1998]. Another shift in focus occurred after it was realised that legal reasoning is bound not only by the rules of logic but also by those of fair and effective procedure. Accordingly, logical models of legal argument have been augmented with a dynamic component, capturing that the information with which a case is decided is not somehow ‘there’ to be applied, but is constructed dynamically, in the course of a legal procedure (e.g. [Hage et al., 1994, Gordon, 1994, Bench-Capon, 1998, Lodder, 1999, Prakken, 2001b]). In contrast to the above-mentioned work on dispute in the ‘design’ strand, here the focus is more on procedure and less on persuasive argument moves, i.e., more on the rules of the ‘debating game’ and less on how to play this game well.

In this survey we will discuss not only logical approaches but also some work from the ‘design’ strand. This is because, in our opinion, these approaches should not be regarded as alternatives but should complement and inspire each other. A purely logic-based approach runs the risk of becoming too abstract and of being ignored by the field for which it is intended, while a purely design-based approach is in danger of becoming too self-centred and ad hoc.

1.2 A Four-Layered View on Legal Argument

How can all these research projects be compared and contrasted? We propose that models of legal argument can be described in terms of four layers.¹ The first, logical layer defines what arguments are, i.e., how pieces of information can be combined to provide basic support for a claim. The second, dialectical layer focuses on conflicting arguments: it introduces such notions as ‘counterargument’, ‘attack’, ‘rebuttal’ and ‘defeat’, and it defines, given a set of arguments and evaluation criteria, which arguments prevail. The third, procedural layer regulates how an actual dispute can be conducted, i.e., how parties can introduce or challenge new information and state new arguments. In other words, this layer defines the possible speech acts and the discourse rules governing them. Thus the procedural layer differs from the first two in one crucial respect. While those layers assume a fixed set of premises, at the procedural layer the set of premises is constructed dynamically, during a debate. This also holds for the final layer, the strategic or heuristic one, which provides rational ways of conducting a dispute within the procedural bounds of the third layer.

¹ The combination of the first three layers was first discussed by [Prakken, 1995]. The first and third layer were also discussed by [Brewka and Gordon, 1994]. The fourth layer was added by [Prakken, 1997] and also discussed in [Sartor, 1997].

All four layers are to be integrated into a comprehensive view of argumentation: the logical layer defines, by providing a notion of arguments, the objects to be evaluated at the dialectical layer; the dialectical layer offers to the procedural and heuristic layers a judgement of whether a new argument might be relevant in the dispute; the procedural layer constrains the ways in which new inputs, supplied by the heuristic layer, can be submitted to the dialectical one; the heuristic layer provides the matter which is to be processed in the system. Each layer can obviously be studied (and implemented) in abstraction from the other ones. However, a main premise of this article is that research at the individual layers benefits when the connection with the other layers is always kept in mind. For instance, logical techniques (whether monotonic or not) have a better chance of being accepted by the AI & Law community when they can easily be embedded in procedural or heuristic layers of legal argument.

Let us illustrate the four layers with an example of a legal dispute.

P1: I claim that John is guilty of murder.
O1: I deny your claim.
P2: John’s fingerprints were on the knife. If someone stabs a person to death, his fingerprints must be on the knife, so John has stabbed Bill to death. If a person stabs someone to death, he is guilty of murder, so John is guilty of murder.
O2: I concede your premises, but I disagree that they imply your claim: Witness X says that John had pulled the knife out of the dead body. This explains why his fingerprints were on the knife.
P3: X’s testimony is inadmissible evidence, since she is anonymous. Therefore, my claim still stands.

P1 illustrates the procedural layer: the proponent of a claim starts a dispute by stating his claim. The procedure now says that the opponent can either accept or deny this claim. O does the latter with O1. The procedure now assigns the burden of proof to P. P attempts to fulfil this burden with an argument for his claim (P2). Note that this argument is not deductive since it includes an abductive inference step; whether it is constructible is determined at the logical layer. The same holds for O’s counterargument O2, but whether it is a counterargument and has sufficient attacking strength is determined at the dialectical layer, while O’s right to state a counterargument is defined by the procedure. The same remarks hold for P’s counterargument P3. In addition, P3 illustrates the heuristic layer: it uses the heuristic that evidence can be attacked by arguing that it is inadmissible.

This paper is organised as follows. First, in Section 2 we discuss the four layers in more detail. Then in Section 3, we use them in discussing the most influential computational models of legal argument. In Section 4, we do the same for the main logical analyses of legal argument, after which we conclude.

2 Four Layers in Legal Argument

Let us now look in more detail at the four layers of legal argument. It is important to note that the first two layers comprise the subject matter of nonmonotonic logics. One type of such logics explicitly separates the two layers, viz. logical systems for defeasible argumentation (cf. [Prakken and Vreeswijk, 2002]). For this reason we will largely base our discussions on the structure of these systems. However, since [Dung, 1995] and [Bondarenko et al., 1997] have shown that essentially all nonmonotonic logics can be recast as such argument-based systems, most of what we will say also applies to other nonmonotonic logics.

2.1 The Logical Layer

The logical layer is concerned with the language in which information can be expressed, and with the rules for constructing arguments in this language.

The Logical Language

Deontic terms. One ongoing debate in AI & Law is whether normative terms such as ‘obligatory’, ‘permitted’ and ‘forbidden’ should be formalised in (modal) deontic logics or whether they can be expressed in first-order logic; cf. e.g. [Jones and Sergot, 1992]. From our perspective this issue is not very relevant, since logics for defeasible argumentation can cope with any underlying logic. Moreover, as for the defeasibility of deontic reasoning, we think that special deontic defeasible logics (see e.g. [Nute, 1997]) are not very suitable. It is better to embed one’s preferred deontic monotonic logic in one’s preferred general defeasible logic, since legal defeasibility is not restricted to deontic terms, but extends to all other kinds of legal knowledge, including definitions and evidential knowledge. Obviously, a unified treatment of defeasibility is to be preferred; cf. [Prakken, 1996].


Conceptual structures. Others have focused on the formalisation of recurring conceptual legal structures. Important work in this area is McCarty’s [1989] Language of Legal Discourse, which addresses the representation of such categories as space, time, mass, action, causation, intention, knowledge, and belief. This strand of work is, although very important for AI & Law, less relevant for our concerns, for the same reasons as in the deontic case: argument-based systems can deal with any underlying logic.

Conditional rules. A topic that is more relevant for our concerns is the representation of conditional legal rules. The main issue here is whether legal rules satisfy contrapositive properties or not. Some AI & Law formalisms, e.g. Gordon’s [1995] Pleadings Game, validate contraposition. However, [Prakken, 1997] has argued that contraposition makes counterarguments possible that would never be considered in actual reasoning practice. A possible explanation for why this is the case is Hage’s [1996, 1997] view on legal rules as being constitutive. In this view (based on insights of analytical philosophy) a legal rule does not describe but constitutes states of affairs: for instance, a legal rule makes someone a thief or something a contract; it does not describe that this is the case. According to Hage, a legal rule must be applied to make things the case, and lawyers never apply rules contrapositively. This view is related to AI interpretations of defaults as inference licences or inference policies [Loui, 1998, Nute, 1992], while the invalidity of contraposition has also been defended in the context of causal reasoning; see e.g. [Geffner, 1992]. Finally, contraposition is also invalid in extended logic programming, where programs can have both weak and strong negations; cf. [Gelfond and Lifschitz, 1990].
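The difference between applying a rule as an inference licence and reading it as a material implication can be made concrete with a small sketch. The Python code below is only an illustration under assumptions of its own: the literal names (`takes`, `thief`), the encoding of strong negation as a leading `-`, and both helper functions are hypothetical and are not taken from any of the cited systems.

```python
from itertools import product


def forward_chain(facts, rules):
    """Close a set of literals under rules applied only from antecedent to consequent."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            if set(antecedents) <= derived and consequent not in derived:
                derived.add(consequent)
                changed = True
    return derived


def classically_entails(atoms, premises, goal):
    """Truth-table test: the goal holds in every valuation satisfying all premises."""
    for values in product([False, True], repeat=len(atoms)):
        valuation = dict(zip(atoms, values))
        if all(p(valuation) for p in premises) and not goal(valuation):
            return False
    return True


# Hypothetical rule: whoever takes another's property is a thief.
rules = [(("takes",), "thief")]

# Rule used as an inference licence: from "-thief" nothing follows about "takes".
licence_result = forward_chain({"-thief"}, rules)

# The same rule read as a material implication: contraposition is valid.
contraposes = classically_entails(
    ["takes", "thief"],
    [lambda v: (not v["takes"]) or v["thief"], lambda v: not v["thief"]],
    lambda v: not v["takes"],
)
```

In the first reading the rule can only be used to make someone a thief; the literal `-takes` is never derived. In the second reading the contrapositive conclusion follows classically, which is the kind of counterargument that, on Hage’s view, lawyers would not consider.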
Weak and strong negation. The desire to formalise reasoning with rules and exceptions sometimes motivates the use of a nonprovability, consistency or weak negation operator, such as negation as failure in logic programming. Whether such a device should be used depends on one’s particular convention for formalising rules and exceptions (see further Section 2 below).

Metalogic features. Much legal knowledge is metaknowledge, for instance, knowledge about the general validity of rules or their applicability to certain kinds of cases, priority principles for resolving conflicts between rules, or principles for interpreting legal rules. Clearly, for representing such knowledge metalogic tools are needed. Logic-based AI & Law research on legal argument has made ample use of such tools, as this survey will illustrate.

Non-logical languages. Finally, non-logical languages can be used. On the one hand, there are the well-known knowledge representation formalisms, such as frames and semantic networks. In AI, their logical interpretation has been thoroughly studied. On the other hand, in AI & Law various special-purpose schemes have been developed, such as HYPO’s factor-based representation of cases (see Section 3.3), ZENO’s issue-position-based language [Gordon and Karaçapilidis, 1997], Room 5’s encapsulated text frames [Loui et al., 1997], ArguMed’s linked-boxes language [Verheij, 1999], or variants of Toulmin’s [1958] well-known argument scheme [Bench-Capon, 1998]. Simple non-logical languages are especially convenient in systems for intelligent tutoring (such as CATO) or argument mediation (such as Room 5, ZENO and ArguMed), since users of such systems cannot be expected to formalise their arguments in logic. In formally reconstructing such systems, one issue is whether their representation language should be taken as primitive or translated into some known logical language. Argument-based logics leave room for both options.

Argument Construction

As for argument construction, a minor issue is how to format arguments: as simple premises-conclusion pairs, as sequences of inferences (deductions) or as trees of inferences. The choice between these options seems a matter of convenience; for a discussion of the various options see e.g. [Prakken and Vreeswijk, 2002]. More crucial issues are whether incomplete arguments, i.e., arguments with hidden premises, should be allowed and whether non-deductive arguments should be allowed.

Incomplete arguments. In ordinary language people very often omit information that could make their arguments valid, such as in “John has killed Pete, so John is guilty of murder”. Here the hidden premise “Who kills another person is guilty of murder” is omitted. In some argument mediation applications, e.g. [Lodder, 1999], such incomplete arguments have been allowed, for instance, to give the listener the opportunity to agree with the argument, so that obvious things can be dealt with efficiently. In our opinion this makes sense, but only if a listener who does not agree with the argument has a way to challenge its validity.

Non-deductive argument types. Non-deductive reasoning forms, such as inductive, abductive and analogical reasoning, are clearly essential to any form of practical reasoning, so they must have a place in the four-layered view on argumentation.
In legal reasoning inductive and abductive arguments play an important role in evidential reasoning, while analogical arguments are especially important in the interpretation of legal concepts. The main issue is whether these reasoning forms should be regarded as argument construction principles (the logical layer) or as heuristics for finding new information (the heuristic layer). In [Prakken, 1995], one of us argued for the latter option. For instance, Prakken argued that an analogy is inherently unable to justify its conclusion since in the end it must always be decided whether the similarities outweigh the differences or not. However, others, e.g. [Loui et al., 1993, Loui, 1998], have included analogical arguments at the logical layer on the grounds that if they are untenable, this will show itself in a rational dispute. Clearly, the latter view presupposes that the dialectical layer is embedded in the procedural layer. For a legal-theoretical discussion of the issue see [Peczenik, 1996, pp. 310–313]. Outside AI & Law, a prominent argument-based system that admits non-deductive arguments is Pollock’s [1995] OSCAR system.

Our present opinion is that both approaches make sense. One important factor here is whether the dialectical layer is embedded in the procedural layer. Another important factor is whether a reasoning form is used to justify a conclusion or not. For instance, some uses of analogy concern learning [Winston, 1980], while other uses concern justification (as in much AI & Law work on case-based reasoning). One thing is especially important: if non-deductive arguments are admitted at the logical layer, then the dialectical layer should provide for ways to attack the link between their premises and conclusion; cf. Pollock’s [1995] undercutters of defeasible inference rules. For instance, if analogies are admitted, it should not only be possible to rebut them with counterexamples, i.e., with analogies for contradictory conclusions, but it should also be possible to undercut analogies by saying that the similarities are irrelevant, or that the differences are more important than the similarities.

2.2 The Dialectical Layer

The dialectical layer addresses three issues: when arguments are in conflict, how conflicting arguments can be compared, and which arguments survive the competition between all conflicting arguments.

Conflict

In the literature, three types of conflicts between arguments are discussed. The first is when arguments have contradictory conclusions, as in ‘A contract exists because there was an offer and an acceptance’ and ‘A contract does not exist because the offerer was insane when making the offer’. Clearly, this form of attack, often called rebutting an argument, is symmetric. The other two types of conflict are not symmetric. One is where one argument makes a nonprovability assumption (e.g. with logic programming’s negation as failure) and another argument proves what was assumed unprovable by the first. For example, an argument ‘A contract exists because there was an offer and an acceptance, and it is not provable that one of the parties was insane’ is attacked by any argument with conclusion ‘The offerer was insane’. In [Prakken and Vreeswijk, 2002] this is called assumption attack. The final type of conflict (identified by Pollock, e.g. [1995]) is when one argument challenges a rule of inference of another argument. After Pollock, this is usually called undercutting an inference. Obviously, a rule of inference can only be undercut if it is not deductive. For example, an analogy can be undercut by saying that the similarity is insufficient to warrant the same conclusion. Note, finally, that all these senses of attack have a direct and an indirect version; indirect attack is directed against a subconclusion or a substep of an argument. For instance, indirect rebuttals contradict an intermediate conclusion of an argument.

Comparing Arguments

The notion of conflicting, or attacking, arguments does not embody any form of evaluation; comparing conflicting pairs of arguments, or in other words, determining whether an attack is successful, is another element of argumentation. The terminology varies: some terms that have been used are ‘defeat’ [Prakken and Sartor, 1996], ‘attack’ [Dung, 1995, Bondarenko et al., 1997] and ‘interference’ [Loui, 1998]. In this article we shall use defeat for the weak notion, which allows two arguments to defeat each other, and strict defeat for the strong, asymmetric notion.

How are conflicting arguments compared in the legal domain? Two main points must be stressed here. The first is that general, domain-independent standards are of little use. Lawyers use many domain-specific standards, ranging from general principles such as “the superior law overrides the inferior law” and “the later regulation overrides the earlier one” to case-specific and context-dependent criteria such as “preferring this rule promotes economic competition, which is good for society”, or “following this argument would lead to an enormous increase in litigation, which should be avoided”. The second main point is that these standards often conflict, so that the comparison of conflicting arguments is itself a subject of dispute. For instance, the standards of legal certainty and individual fairness often conflict in concrete situations. For logical models of legal argument this means that priority principles must be expressible in the logical language, and that their application must be modelled as defeasible reasoning.

Specificity. Some special remarks are in order about the specificity principle. In AI this principle is often regarded as very important. However, in legal reasoning it is just one of the many standards that might be used, and it is often overridden by other standards. Moreover, there are reasons to doubt whether specificity of regulations can be syntactically defined at all. Consider the following imaginary example (due to Marek Sergot, personal communication).

1. All cows must have earmarks
2. Calves need not have earmarks
3. All cows must have earmarks, whether calf or not
4. All calves are cows

Lawyers would regard (2) as an exception to (1) because of (4), but they would certainly not regard (2) as an exception to (3), since the formulation of (3) already takes the possible exception into account. Yet logically (3) is equivalent to (1), since the addition “whether calf or not” is a tautology. In conclusion, specificity may be suitable as a notational convention for exceptions, but it cannot serve as a domain-independent conflict resolution principle.

Assessing the Status of Arguments

The notion of defeat only tells us something about the relative strength of two individual conflicting arguments; it does not yet tell us with what arguments a dispute can be won. The ultimate status of an argument depends on the interaction between all available arguments. An important phenomenon here is reinstatement:² it may very well be that argument B defeats argument A, but that B is itself defeated by a third argument C; in that case C ‘reinstates’ A. Suppose, for instance, that the argument A that a contract exists because there was an offer and acceptance is defeated by the argument B that a contract does not exist because the offerer was insane when making the offer. And suppose that B is in turn (strictly) defeated by an argument C, attacking B’s intermediate conclusion that the offerer was insane at the time of the offer. In that case C reinstates argument A.

² But see [Horty, 2001] for a critical analysis of the notion of reinstatement.

The main distinction is that between justified, defensible and overruled arguments. The distinction between justified and defensible arguments corresponds to the well-known distinction between sceptical and credulous reasoning, while overruled arguments are those that are defeated by a justified argument. Several ways to define these notions have been studied, both in semantic and in proof-theoretic form, and both for justification and for defensibility. See [Prakken and Vreeswijk, 2002] for an overview and especially [Dung, 1995, Bondarenko et al., 1997] for semantical studies. For present purposes the differences in semantics do not matter much; what is more important is that argument-based proof theories can be stated in the dialectical form of an argument game, as a dispute between a proponent and opponent of a claim. The proponent starts with an argument for this claim, after which each player must attack the other player’s previous argument with a counterargument of sufficient strength. The initial argument provably has a certain status if the proponent has a winning strategy, i.e., if he can make the opponent run out of moves in whatever way she attacks. Clearly, this setup fits well with the adversarial nature of legal argument, which makes it easy to embed the dialectical layer in the procedural and heuristic ones.

To give an example, consider the two dialogue trees in Figure 1.
Assume that they contain all constructible arguments and that the defeat relations are as shown by the arrows (single arrows denote strict defeat while double arrows stand for mutual defeat). In the tree on the left the proponent has a winning strategy, since in all dialogues the opponent eventually runs out of moves; so argument A is provable. The tree on the right extends the first tree with three arguments. Here the proponent does not have a winning strategy, since one dialogue ends with a move by the opponent; so A is not provable in the extended theory.

[Figure: two dialogue trees, listed by level.
Left tree (A is provable): P1: A; O1: B, O1′: C; P2: D, P2′: E; O2: F, O2′: C; P3: G, P3′: E.
Right tree (A is not provable): P1: A; O1: B, O1′: C, O1″: H; P2: D, P2′: E, P2″: I; O2: F, O2′: C, O2″: C, O2‴: J; P3: G, P3′: E, P3″: E.]

Fig. 1. Two trees of proof-theoretical dialogues.

Partial computation. Above we said that the status of an argument depends on its interaction with all available arguments. However, we did not specify what ‘available’ means. Clearly, the arguments processed by the dialectical proof theory are based on input from the procedural layer, viz. on what has been said and assumed in a dispute. However, should only the actually stated arguments be taken into account, or also additional arguments that can be computed from the theory constructed during the dispute? And if the latter option is chosen, should all constructible arguments be considered, or only those that can be computed within given resource bounds? In the literature, all three options have been explored. The methods with partial and no computation have been defended by pointing at the fact that computer algorithms cannot be guaranteed to find arguments in reasonable time, and sometimes not even in finite time (see especially Pollock 1995; Loui 1998). In our opinion, the choice essentially depends on the context and the intended use of the system.

Representing Exceptions

Finally, we discuss the representation of exceptions to legal rules, which concerns a very common phenomenon in the law. Some exceptions are stated by statutes themselves, while others are based, for instance, on the purpose of rules or on legal principles. Three different techniques have been used for dealing with exceptions. Two of them are well-known from nonmonotonic logic, while the third one is, to our knowledge, a contribution of AI & Law research.

The first general technique is the exception clause or explicit-exceptions approach, which corresponds to the use of ‘unless’ clauses in natural language. Logically, such clauses are captured by a nonprovability operator, which can be formalised with various well-known techniques from nonmonotonic logic or logic programming. In argument-based models the idea is that arguments concluding for the exception, thus establishing what the rule requires not to be proved, defeat arguments based upon the rule. In some formalisations, the not-to-be-proved exception is directly included in the antecedent of the rule to which it refers. So, the rule ‘if A then B, unless C’ is (semiformally) represented as follows (where ∼ stands for nonprovability).

  r1: A ∧ ∼C ⇒ B

A more abstract and modular representation is also possible within the exception clause approach. This is achieved when the rule is formulated as requiring that no exception is proved to the rule itself. The exception now becomes the antecedent of a separate conditional.


  r1: A ∧ ∼Exc(r1) ⇒ B
  r2: C ⇒ Exc(r1)

While in this approach rules themselves refer to their exceptions, a variant of this technique has been developed where instead the no-exception requirement is built into the logic of rule application [Routen and Bench-Capon, 1991, Hage, 1996, Prakken and Sartor, 1996]. Semiformally this looks as follows.

  r1: A ⇒ B
  r2: C ⇒ Exc(r1)

We shall call this the exclusion approach. In argument-based versions it takes the form of allowing arguments for the inapplicability of a rule to defeat the arguments using that rule. Exclusion resembles Pollock’s [1995] notion of undercutting defeaters.

Finally, a third technique for representing exceptions is provided by the choice or implicit-exceptions approach. As in the exclusion approach, rules do not explicitly refer to exceptions. However, unlike with exclusion, the exception is not explicitly stated as an exception. Rather it is stated as a rule with a conflicting conclusion, and is turned into an exception by preference information that gives the exceptional rule priority over the general rule.

  r1: A ⇒ B
  r2: C ⇒ ¬B
  r1 < r2

In argument-based models this approach is implemented by making arguments based on stronger rules defeat arguments based on weaker rules.

In the general study of nonmonotonic reasoning usually either only the exception-clause approach or only the choice approach is followed. However, AI & Law researchers have stressed that models of legal argument should support the combined use of all three techniques, since the law itself uses all three of them.
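The notions of defeat, reinstatement and argument status discussed in this section can be illustrated with a small sketch. It assumes that the defeat relation is already given as a set of (defeater, defeated) pairs, and it computes statuses by a simple fixpoint iteration in the style of the grounded semantics of [Dung, 1995]; this is only one of the definitions studied in the literature, chosen here for brevity. The function name `status`, the argument names, and the labelling of every argument that is neither justified nor overruled as ‘defensible’ are assumptions of the sketch.

```python
def status(arguments, defeats):
    """Label arguments as justified, overruled or defensible.

    `defeats` is a set of (defeater, defeated) pairs. An argument becomes
    justified once all its defeaters are overruled, and overruled once it
    is defeated by a justified argument; whatever remains is defensible.
    """
    justified, overruled = set(), set()
    changed = True
    while changed:
        changed = False
        for arg in arguments:
            if arg in justified or arg in overruled:
                continue
            defeaters = {x for (x, y) in defeats if y == arg}
            if defeaters <= overruled:
                justified.add(arg)
                changed = True
            elif defeaters & justified:
                overruled.add(arg)
                changed = True
    return {
        arg: "justified" if arg in justified
        else "overruled" if arg in overruled
        else "defensible"
        for arg in arguments
    }


# Reinstatement: B defeats A, and C strictly defeats B.
reinstatement = status(["A", "B", "C"], {("B", "A"), ("C", "B")})

# Mutual defeat without further information: neither argument prevails.
mutual = status(["A", "B"], {("A", "B"), ("B", "A")})

# Choice approach: the priority r1 < r2 is assumed to make the argument
# based on r2 strictly defeat the argument based on r1.
choice = status(["uses_r1", "uses_r2"], {("uses_r2", "uses_r1")})
```

In the first case C is undefeated, so B is overruled and A is reinstated as justified. In the second case both arguments remain merely defensible, and in the third the argument based on the exceptional rule prevails.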

2.3 The Procedural Layer

There is a growing awareness that there are other grounds for the acceptability of arguments besides syntactic and semantic grounds. One class of such grounds lies in the way in which a conclusion was actually reached. This is partly inspired by a philosophical tradition that emphasises the procedural side of rationality and justice; see e.g. [Toulmin, 1958, Rawls, 1972, Rescher, 1977, Habermas, 1981]. Particularly relevant for present purposes is Toulmin’s [1958, pp. 7–8] advice that logicians who want to learn about reasoning in practice, should turn away from mathematics and instead study jurisprudence, since outside mathematics the validity of arguments would not depend on their syntactic form but on the disputational process in which they have been defended. According to Toulmin an argument is valid if it can stand against criticism in a properly conducted

dispute, and the task of logicians is to find criteria for when a dispute has been conducted properly; moreover, he thinks that the law, with its emphasis on procedures, is an excellent place to find such criteria.

Toulmin himself has not carried out his suggestion, but others have. For instance, Rescher [1977] has sketched a dialectical model of scientific reasoning which, so he claims, explains the bindingness of inductive arguments: they must be accepted if they cannot be successfully challenged in a properly conducted scientific dispute. A formal reconstruction of Rescher’s model has been given by Brewka [1994]. In legal philosophy Alexy’s [1978] discourse theory of legal argumentation addresses Toulmin’s concerns, based on the view that a legal decision is just if it is the outcome of a fair procedure.

Another source of the concern for procedure is AI research on resource-bounded reasoning; e.g. [Simon, 1982, Pollock, 1995, Loui, 1998]. When the available resources do not guarantee finding an optimal solution, rational reasoners have to rely on effective procedures. One kind of procedure that has been advocated as effective is dialectics [Rescher, 1977, Loui, 1998].

It is not necessary to accept the view that rationality is essentially procedural in order to see that it at least has a procedural side. Therefore, a study of procedure is of interest to anyone concerned with normative theories of reasoning.

How can formal models of legal procedure be developed? Fortunately, there already exists a formal framework that can be used. In argumentation theory, formal dialogue systems have been developed for so-called ‘persuasion’ or ‘critical discussion’; see e.g. [Hamblin, 1971, MacKenzie, 1990, Walton and Krabbe, 1995]. According to Walton and Krabbe [1995], dialogue systems regulate four aspects of dialogues:

– Locution rules (what moves are possible);
– Structural rules (when moves are legal);
– Commitment rules (the effects of moves on the players’ commitments);
– Termination rules (when dialogues terminate and with what outcome).
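To make the four kinds of rules concrete, the following Python fragment is a minimal sketch of a toy persuasion dialogue. It is not taken from any of the cited systems: the class name `PersuasionDialogue`, the particular set of locutions, the strict turn-taking rule and the win conditions are all simplifying assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Locution rules: which speech acts exist at all (an illustrative subset).
LOCUTIONS = {"claim", "challenge", "concede", "withdraw"}


@dataclass
class PersuasionDialogue:
    """Toy dialogue between a 'proponent' and an 'opponent' (hypothetical)."""
    commitments: dict = field(
        default_factory=lambda: {"proponent": set(), "opponent": set()})
    history: list = field(default_factory=list)
    main_claim: Optional[str] = None
    outcome: Optional[str] = None

    def is_legal(self, player: str, act: str, prop: str) -> bool:
        """Structural rules: when a move is legal (assumes strict turn-taking)."""
        if self.outcome is not None or act not in LOCUTIONS:
            return False
        if player not in self.commitments:
            return False
        if not self.history:
            # The dialogue must open with the proponent's main claim.
            return player == "proponent" and act == "claim"
        if self.history[-1][0] == player:
            return False
        if act == "withdraw":
            # Only a proposition one is committed to can be withdrawn.
            return prop in self.commitments[player]
        return True

    def move(self, player: str, act: str, prop: str) -> None:
        if not self.is_legal(player, act, prop):
            raise ValueError(f"illegal move: {player} {act} {prop}")
        self.history.append((player, act, prop))
        if self.main_claim is None:
            self.main_claim = prop
        # Commitment rules: the effect of the move on the mover's commitments.
        if act in ("claim", "concede"):
            self.commitments[player].add(prop)
        elif act == "withdraw":
            self.commitments[player].discard(prop)
        # Termination rules: when the dialogue ends and with what outcome.
        if prop == self.main_claim:
            if player == "opponent" and act == "concede":
                self.outcome = "proponent wins"
            elif player == "proponent" and act == "withdraw":
                self.outcome = "opponent wins"
```

In this sketch a dialogue in which the proponent claims a proposition, the opponent challenges it, the proponent makes a further claim and the opponent then concedes the main claim ends with the outcome "proponent wins"; after that no further move is legal.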

In persuasion, the parties in a dispute try to solve a conflict of opinion by verbal means. The dialogue systems regulate the use of speech acts for such things as making, challenging, accepting, withdrawing, and arguing for a claim. The proponent of a claim aims at making the opponent concede his claim; the opponent instead aims at making the proponent withdraw his claim. A persuasion dialogue ends when one of the players has fulfilled their aim. Logic governs the dialogue in various ways. For instance, if a participant is asked to give grounds for a claim, then in most systems these grounds have to logically imply the claim. Or if a proponent’s claim is logically implied by the opponent’s concessions, the opponent is forced to accept the claim, or else withdraw some of her concessions.

Most computational models of legal procedure developed so far [Hage et al., 1994, Gordon, 1995, Bench-Capon, 1998, Lodder, 1999, Prakken, 2001b] have incorporated such formal dialogue systems. However, they have extended them with one interesting feature, viz. the possibility of counterargument. In argumentation-theoretic models of persuasion the only way to challenge an argument is by asking for an argument for its premises. In

a legal dialogue, by contrast, a party can challenge an argument even if he accepts all premises, viz. by stating a counterargument. In other words, while in the argumentation-theoretic models the underlying logic is deductive, in the AI & Law systems it is defeasible: support for a claim may be defeasible (e.g. inductive or analogical) instead of watertight, and forced or implied concession of a claim is defined in terms of defeasible instead of deductive consequence. Or in terms of our four-layered view: while the argumentation theorists only have the logical and procedural layer, the AI & Law models have added the dialectical layer in between. In fact, clarifying the interplay between the dialectical and the procedural layer is not a trivial matter, and is the subject of ongoing logical research. See e.g. [Brewka, 2001, Prakken, 2000, Prakken, 2001c].

2.4 The Heuristic Layer

This layer (which addresses much of what is traditionally called ‘rhetoric’) is the most diverse one. In fact, heuristics play a role in any aspect of the other three levels: they say which premises to use, which arguments to construct, how to present them, which arguments to attack, which claims to make, concede or deny, etc. Heuristics can be divided into (at least) three kinds: inventional heuristics, which say how a theory can be formed (such as the classical interpretation schemes for legal rules), selection heuristics, which recommend a choice between various options (such as ‘choose an argument with as few premises as possible, to minimise its attacking points’), and presentation heuristics, which tell how to present an argument (e.g. ‘don’t draw the conclusion yourself but invite the listener to draw it’).

A keyword at the heuristic level is persuasion. For instance, which arguments are the most likely to make the opponent accept one’s claims? Persuasiveness of arguments is not a matter of logic, however broadly conceived. Persuasiveness is not a function from a given body of information: it involves an essential nondeterministic element, viz. what the other player(s) will do in response to a player’s dialectic acts. To model persuasiveness, models are needed predicting what other players (perhaps the normal, typical other player) will do. Analogous models have been studied in research on argument in negotiation [Kraus et al., 1998, Parsons et al., 1998].

An interesting issue is how to draw the dividing line between argument formation rules and inventional heuristics. Below we will discuss several reasoning schemes that can be reasonably regarded as of either type. We think that the criterion is whether the schemes are meant to justify a claim or not.

2.5 Intertwining of the Layers

The four layers can be intertwined in several ways. For instance, allocating the burden of proof is a procedural matter, usually done by the judge on the basis of procedural law. However, sometimes it becomes the subject of dispute, for

instance, when the relevant procedural provisions are open-textured or ambiguous. In such a case, the judge will consider all relevant arguments for and against a certain allocation and decide which argument prevails. To this the dialectical layer applies. The result, a justified argument concerning a certain allocation, is then transferred to the procedural layer as a decision concerning the allocation.

Moreover, sometimes the question at which layer one finds oneself depends on the use that is made of a reasoning scheme instead of on the reasoning scheme itself. We already mentioned analogy, which can be used in learning (heuristic layer) but also in justification (dialectical layer). Or consider, for another example, the so-called teleological interpretation scheme, i.e., the idea that law texts should usually be understood in terms of their underlying purposes. This principle may be used by a party (when it provides him with a rule which is in his interest to state) as an inventional heuristic, i.e., as a device suggesting suitable contents to be stated in his argument:

interpret a law text as a rule which achieves the legislator’s purposes, whenever this rule promotes your interest.

If this is the use of the interpretation scheme, then a party would not input it in the dispute, but would just state the results it suggests. The principle, however, could also be viewed by a party as a justificatory premise, which the party explicitly uses to support the conclusion that a certain rule is valid, or that it prevails over alternative interpretations. Not all inventional heuristics are equally translatable into justificatory meta-rules. Consider for example the heuristic:

interpret a text as expressing the rule that best matches the political ideology (or the sexual or racial prejudices) of the judge of your case, if this rule promotes your interest.

This suggestion, even though it may be a successful heuristic, usually could not be inputted in the argumentation as a justificatory meta-rule.

3 Computational Models of Legal Argument

In the introduction we said that logic-based and design-based methods in AI & Law should complement and influence each other. For this reason, we now discuss some of the most influential implemented architectures of legal argument. We do so in the light of our four-layered view.

3.1 McCarty’s Work

The TAXMAN II project of McCarty (e.g. McCarty and Sridharan, 1981; McCarty, 1995) aims to model how lawyers argue for or against the application of a legal concept to a problem situation. In McCarty and Sridharan [1981] only a theoretical model is presented but in McCarty [1995] an implementation is described of most components of the model. However, their interaction in ﬁnding arguments is still controlled by the user. Among other things, the project involves the design of a method for representing legal concepts, capturing their open-textured and dynamic nature. This method is based on the view that legal concepts have three components: ﬁrstly, a

(possibly empty) set of necessary conditions for the concept’s applicability; secondly, a set of instances (“exemplars”) of the concept; and finally, a set of rules for transforming a case into another one, particularly for relating “prototypical” exemplars to “deformations”. According to McCarty, the way lawyers typically argue about application of a concept to a new case is by finding a plausible sequence of transformations which maps a prototype, possibly via other cases, onto the new case. In our opinion, these transformations might be regarded as inventional heuristics for argument construction.

3.2 Gardner

An early success of logic-based methods in AI & Law was their logical reconstruction of Gardner’s [1987] program for so-called “issue spotting”. Given an input case, the task of the program was to determine which legal questions involved were easy and which were hard, and to solve the easy ones. If all the questions were found easy, the program reported the case as clear, otherwise as hard. The system contained domain knowledge of three different types: legal rules, common-sense rules, and rules extracted from cases. The program considered a question as hard if either “the rules run out”, or different rules or cases point at different solutions, without there being any reason to prefer one over the other. Before a case was reported as hard, conflicting alternatives were compared to check whether one was preferred over the other. For example, case law sets aside legal rules or common-sense interpretations of legal concepts.

Clearly, Gardner’s program can be reconstructed as nonmonotonic reasoning with prioritised information, i.e., as addressing the dialectical layer. Reconstructions of this kind have been given by Gordon [1991], adapting Poole’s [1988] abductive model of default reasoning, and by Prakken [1997], in terms of an argument-based logic.

3.3 HYPO

HYPO aims to model how lawyers make use of past decisions when arguing a case. The system generates 3-ply disputes between a plaintiﬀ and a defendant of a legal claim concerning misuse of a trade secret. Each move conforms to certain rules for analogising and distinguishing precedents. These rules determine for each side which are the best cases to cite initially, or in response to the counterparty’s move, and how the counterparty’s cases can be distinguished. A case is represented as a set of factors pushing the case towards (pro) or against (con) a certain decision, plus a decision which resolves the conﬂict between the competing factors. A case is citable for a side if it has the decision wished by that side and shares with the Current Fact Situation (CFS) at least one factor which favours that decision. A citation can be countered by a counterexample, that is, a case that is at least as much on point, but has the opposite outcome. A citation may also be countered by distinguishing, that is, by indicating a factor in the CFS which is absent in the cited precedent and which supports the opposite outcome, or a factor in the precedent which is missing in the CFS,

and which supports the outcome of the cited case. Finally, HYPO can create hypothetical cases by using magnitudes of factors. In evaluating the relative force of the moves, HYPO uses the set inclusion ordering on the factors that the precedents share with the CFS. However, unlike logic-based argumentation systems, HYPO does not compute an ‘outcome’ or ‘winner’ of a dispute; instead it outputs 3-ply disputes as they could take place between ‘good’ lawyers.

HYPO in Terms of the Four Layers

Interpreting HYPO in terms of the four layers, the main choice is whether to model HYPO’s analogising and distinguishing moves as argument formation rules (logical layer) or as inventional heuristics (heuristic layer). In the first interpretation, the representation language is simply as described above (a decision, and sets of factors pro and con a decision), analogising a precedent is a constructible argument, stating a counterexample is a rebutter, and distinguishing a precedent is an undercutter. Defeat is defined such that distinctions always defeat their targets, while counterarguments defeat their targets iff they are not less on point. In the second interpretation, proposed by [Prakken and Sartor, 1998], analogising and distinguishing a precedent are regarded as ‘theory constructors’, i.e., as ways of introducing new information into a dispute. We shall discuss this proposal below in Section 3.

Which interpretation of HYPO’s argument moves is the best one is not an easy question. Essentially it asks for the nature of analogical reasoning, which is a deep philosophical question. In both interpretations HYPO has some heuristic aspects, since it defines the “best cases to cite” for each party, selecting the most-on-point cases from those allowed by the dialectical protocol. This can be regarded as a selection heuristic.

3.4 CATO

The CATO system of Aleven and Ashley [1997] applies an extended HYPO architecture for teaching case-based argumentation skills to law students, also in the trade secrets domain. CATO’s main new component is a ‘factor hierarchy’, which expresses expert knowledge about the relations between the various factors: more concrete factors are classified according to whether they are a reason pro or con the more abstract factors they are linked to; links are given a strength (weak or strong), which can be used to solve certain conflicts. Essentially, this hierarchy fills the space between the factors and decision of a case. Thus it can be used to explain why a certain decision was taken, which in turn facilitates debates on the relevance of differences between cases. For instance, the hierarchy positively links the factor Security measures taken to the more abstract concept Efforts to maintain secrecy. Now if a precedent contains the first factor but the CFS lacks it, then not only can a citation of the precedent be distinguished on the absence of Security measures taken, but this distinction can also be emphasised by saying that thus no efforts were made to maintain secrecy. However, if the CFS also contains a factor Agreed not to disclose information, then the factor hierarchy enables downplaying this

distinction, since it also positively links this factor to Efforts to maintain secrecy: so the party that cited the precedent can say that in the current case, just as in the precedent, efforts were made to maintain secrecy. The factor hierarchy is not meant to be an independent source of information from which arguments can be constructed. Rather it serves as a means to reinterpret precedents: initially cases are in CATO, as in HYPO, still represented as one-step decisions; the factor hierarchy can only be used to argue that the decision was in fact reached by one or more intermediate steps.

CATO in Terms of the Four Layers

At the logical layer CATO adds to HYPO the generation of multi-step arguments, exploiting the factor hierarchy. As for CATO’s ability to reinterpret precedents, we do not regard this as an inventional heuristic, since the main device used in this feature, the factor hierarchy, is given in advance; instead we think that this is just the logic-layer ability to build multi-step arguments from given information. However, CATO’s way of formatting the emphasising and downplaying moves in its output can be regarded as built-in presentation heuristics.

3.5 CABARET

The CABARET system of Rissland and Skalak [1991] attempts to combine rule-based and case-based reasoning. Its case-based component is the HYPO system. The focus is on statutory interpretation, in particular on using precedents to confirm or contest the application of a rule. In [Skalak and Rissland, 1992], CABARET’s model is described as a hierarchy of argument techniques including strategies, moves and primitives. A strategy is a broad characterisation of how one should argue, given one’s particular viewpoint and dialectical situation. A move is a way to carry out the strategy, while a primitive is a way to implement a move. For example, when one wants to apply a rule, and not all of the rule’s conditions are satisfied, then a possible strategy is to broaden the rule. This strategy can be implemented with a move that argues with an analogised precedent that the missing condition is not really necessary. This move can in turn be implemented with HYPO’s ways to analogise cases. Similarly, CABARET also permits arguments that a rule which prima facie appears to cover the case should not be applied to it. Here the strategy is discrediting a rule and the move may consist in analogising a case in which the rule’s conditions were met but the rule was not applied. Again the move can be implemented with HYPO’s ways to analogise cases.

CABARET in Terms of the Four Layers

At the logical layer CABARET adds to HYPO the possibility to construct simple rule-based arguments, while at the dialectical layer, CABARET adds corresponding ways to attack arguments. CABARET’s main feature, its model of argument strategies, clearly addresses the heuristic layer. The strategies can be seen as selection heuristics: they choose between the available attacking points, and pick up from the rule- and case-base the most relevant materials.

3.6 DART

Freeman & Farley [1996] have semi-formally described and implemented a dialectical model of argumentation. For legal applications it is especially relevant since it addresses the issue of burden of proof. Rules are divided into three epistemic categories: ‘sufficient’, ‘evidential’ and ‘default’, in decreasing order of priority. The rules for constructing arguments involve standard logic principles, such as modus ponens and modus tollens, but also nonstandard ones, such as for abductive reasoning (p ⇒ q and q imply p) and a contrario reasoning (p ⇒ q and ¬p imply ¬q). Taken by themselves these inferences clearly are the well-known fallacies of ‘affirming the consequent’ and ‘denying the antecedent’, but this is dealt with by defining undercutters for such arguments. For instance, the above abductive argument can be undercut by providing an alternative explanation for q, in the form of a rule r ⇒ q. The defeat relations between arguments depend both on the type of premise and on the type of inference rule.

The status of arguments is defined in terms of an argument game based on a static knowledge base. DART’s argument game has several variants, depending on which level of proof holds for the main claim. This is because Freeman and Farley maintain that different legal problem solving contexts require different levels of proof. For instance, for the question whether a case can be brought before court, only a ‘scintilla of evidence’ is required (in present terms a defensible argument), while for a decision in a case ‘dialectical validity’ is needed (in our terms a justified argument).

DART in Terms of the Four Layers

DART essentially addresses the logical and dialectical layers, while it assumes input from the procedural layer. At the logical layer, it allows both deductive and nondeductive arguments. Freeman and Farley are well aware that this requires the definition of undercutters for the nondeductive argument types.
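The two nondeductive inference patterns and the undercutter for abduction mentioned above can be illustrated with a small Python sketch. It is only meant to show the shape of the inferences; it is not DART’s implementation, and the names `Rule`, `abductive_arguments`, `a_contrario_arguments` and `undercutters_of_abduction`, as well as the encoding of negation as a `¬` prefix on strings, are assumptions made for this example.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """A rule p ⇒ q over string literals (hypothetical encoding)."""
    antecedent: str
    consequent: str


def negate(literal: str) -> str:
    """Strong negation, written as a '¬' prefix."""
    return literal[1:] if literal.startswith("¬") else "¬" + literal


def abductive_arguments(rules, facts):
    """Abduction: from p ⇒ q and q, tentatively conclude p."""
    return [(r, r.antecedent) for r in rules if r.consequent in facts]


def a_contrario_arguments(rules, facts):
    """A contrario: from p ⇒ q and ¬p, tentatively conclude ¬q."""
    return [(r, negate(r.consequent))
            for r in rules if negate(r.antecedent) in facts]


def undercutters_of_abduction(rule, rules):
    """Rules r ⇒ q that offer an alternative explanation for the same q."""
    return [r for r in rules
            if r.consequent == rule.consequent
            and r.antecedent != rule.antecedent]


rules = [Rule("rain", "wet_grass"), Rule("sprinkler", "wet_grass")]
# Abducing "rain" from "wet_grass" is undercut by the sprinkler rule.
print(undercutters_of_abduction(rules[0], rules))
```

The sketch deliberately says nothing about the defeat relations between the resulting arguments, which in DART also depend on the epistemic category of the rules involved.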
DART’s argument games are similar to dialectical proof theories for argument-based logics. However, they are not given a formal semantics. Finally, DART assumes procedural input in the form of an assignment of a level of proof to the main claim.

3.7 The Pleadings Game

Next we discuss Gordon’s [1994, 1995] Pleadings Game, which is an attempt to model the procedural view on justice discussed above in Section 2.3. The legal-procedural example domain is ‘civil pleading’, which is the phase in Anglo-American civil procedure where the parties exchange arguments and counterarguments to identify the issues that must be decided by the court. The system is not only implemented but also formally defined. Thus this work is an excellent illustration of how logic can be used as a tool in computational models of legal argument. For this reason, and also since it clearly illustrates the relation between the first three layers, we shall discuss it in some detail.

The implemented system mediates between parties in a legal procedure: it keeps track of the stated arguments and their dialectical relations, and it checks

whether the procedure is obeyed. Gordon models civil pleading as a Hamblin-MacKenzie-style dialogue game, defining speech acts for stating, conceding and denying (= challenging) a claim, and stating an argument for a claim. In addition, Gordon allows for counterarguments, thus opting for a nonmonotonic logic as the underlying logical system. In fact, Gordon uses the argument-based proof theory of Geffner’s [1992] conditional entailment.

As for the structural rules of the game, a game starts when the plaintiff states his main claim. Then the game is governed by a general rule saying that at each turn a player must respond in some permissible way to every move of the opponent that is still relevant. A move is relevant iff it concerns an issue. An issue is, very roughly, defined as a claim that dialectically matters for the main claim and has not yet been replied to. The other structural rules define under which conditions a move is permissible. For instance, a claim of a player may be denied by the other player iff it is an issue and is not defeasibly implied by the denier’s own previous claims. And a denied claim may be defended with an argument as long as (roughly) the claim is an issue, and the argument’s premises are consistent with the mover’s previous claims, and (in case the other party had previously claimed them) they were conceded by the mover. If no such ‘permission rule’ applies, the other player is to move, except when this situation occurs at the beginning of a turn, in which case the game terminates.

The result of a terminated game is twofold: a list of issues identified during the game (i.e., the claims on which the players disagree), and a winner, if there is one. Winning is defined relative to the set of premises agreed upon during a game. If issues remain, there is no winner and the case must be decided by the court.
If no issues remain, then the plaintiff wins iff its main claim is defeasibly implied by the jointly constructed theory, while the defendant wins otherwise.

An Example

We now illustrate the Pleadings Game with an example. Besides illustrating this particular system, the example also illustrates the interplay between the logical, dialectical and procedural layers of legal argument. For the sake of illustration we simplify the Game on several points, and use a different (and semiformal) notation. The example, loosely based on Dutch law, concerns a dispute on offer and acceptance of contracts. The players are called plaintiff (π) and defendant (δ). Plaintiff, who had made an offer to defendant, starts the game by claiming that a contract exists. Defendant denies this claim, after which plaintiff supports it with the argument that defendant accepted his offer and that an accepted offer creates a contract.

π1 : Claim[ (1) Contract ]
δ1 : Deny(1)
π2 : Argue[ (2) Offer, (3) Acceptance, (4) Offer ∧ Acceptance ⇒ Contract, so Contract ]

Now defendant attacks plaintiff’s supporting argument [2,3,4] by defeating its subargument that she accepted the offer. The counterargument says that defendant sent her accepting message after the offer had expired, for which reason there was no acceptance in a legal sense.

δ2 : Concede(2,4), Deny(3), Argue[ (5) “Accept” late, (6) “Accept” late ⇒ ¬ Acceptance, so ¬ Acceptance ]

Plaintiff responds by strictly defeating δ2 with a more specific counterargument (conditional entailment compares arguments on specificity), saying that even though defendant’s accepting message was late, it still counts as an acceptance, since plaintiff had immediately sent a return message saying that he recognises defendant’s message as an acceptance.

π3 : Concede(5), Deny(6), Argue[ (5) “Accept” late, (7) “Accept” recognised, (8) “Accept” late ∧ “Accept” recognised ⇒ Acceptance, so Acceptance ]

Defendant now attempts to leave the issues for trial by conceding π3’s argument (the only effect of this is giving up the right to state a counterargument) and its premise (8), and by denying one of the other premises, viz. (7) (she had already implicitly claimed premise (5) herself, in δ2). Plaintiff goes along with defendant’s aim by simply denying defendant’s denial of (7) and not stating a supporting argument for his claim, after which the game terminates since no relevant moves are left to answer for either party.

δ3 : Concede(8,[5,7,8]), Deny(7)
π4 : Deny(Deny(7))

This game has resulted in the following dialectical graph.

π1 : [2,3,4] for Contract
δ1 : [5,6] for ¬ Acceptance
π2 : [5,7,8] for Acceptance

The claims in this graph that have not been conceded are

(1) Contract
(3) Acceptance
(6) “Accept” late ⇒ ¬ Acceptance
(7) “Accept” recognised

So these are the issues. Moreover, the set of premises constructed during the game, i.e. the set of conceded claims, is {2, 4, 5}. It is up to the judge whether to extend it with the issues (6) and (7). In each case conditional-entailment’s proof theory must be used to verify whether the other two issues, in particular plaintiff’s main claim (1), are (defeasibly) implied by the resulting premises. In fact, it is easy to see that they are entailed only if (6) and (7) are added.
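The bookkeeping behind the list of issues can be replayed with a few lines of Python. This is a rough sketch under strong simplifying assumptions, not Gordon’s system: every numbered statement introduced by a Claim or Argue move is treated as a claim, a concession by either player removes it from the issues, and neither relevance nor conditional entailment is modelled. The names `MOVES` and `issues` are invented for the example.

```python
# The moves of the example, in order: (player, speech act, claim numbers).
# The concession of the argument [5,7,8] itself in δ3 is left out, since it
# is not a claim; π4 only denies a denial and introduces no new claim.
MOVES = [
    ("plaintiff", "claim",       [1]),
    ("defendant", "deny",        [1]),
    ("plaintiff", "argue",       [2, 3, 4]),
    ("defendant", "concede",     [2, 4]),
    ("defendant", "deny",        [3]),
    ("defendant", "argue",       [5, 6]),
    ("plaintiff", "concede",     [5]),
    ("plaintiff", "deny",        [6]),
    ("plaintiff", "argue",       [5, 7, 8]),
    ("defendant", "concede",     [8]),
    ("defendant", "deny",        [7]),
    ("plaintiff", "deny-denial", [7]),
]


def issues(moves):
    """Return the stated claims that have not been conceded, in order."""
    stated, conceded = set(), set()
    for _player, act, claims in moves:
        if act in ("claim", "argue"):
            stated.update(claims)
        elif act == "concede":
            conceded.update(claims)
        # Denials keep a claim open, so they need no bookkeeping here.
    return sorted(stated - conceded)


print(issues(MOVES))  # [1, 3, 6, 7]
```

Under these assumptions the function recovers the four issues listed above. Deciding whether the main claim is defeasibly implied once the judge has ruled on the open issues would require an implementation of the underlying logic, which this sketch does not attempt.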

The Pleadings Game in Terms of the Four Layers

Clearly, the Pleadings Game explicitly models the first three layers of our model. (In fact, the game was a source of inspiration of [Prakken, 1995]’s first formulation of these layers.) Its contribution to modelling the procedural layer should be apparent from the example. Gordon has also addressed the formalisation of the dialectical layer, adapting within conditional entailment well-known AI techniques concerning naming of rules in (ab)normality predicates. Firstly, he has shown how information about properties of rules (such as validity and backing) can be expressed and, secondly, he has defined a way to express priority rules as object level rules, thus formalising disputes about rule priorities. However, a limitation of his method is that it has to accept conditional-entailment’s built-in specificity principle as the highest source of priorities.

4 Logical Models of Legal Argument

Having discussed several implemented models of legal argument, we now turn to logical models. Again we will discuss them in light of our four-layered model.

4.1 Applications of Logic (Meta-)Programming

The British Nationality Act

First we discuss the idea of formalising law as logic programs, viz. as a set of formulas of a logical language for which automated theorem provers exist. The underlying ideas of this approach are set out in [Sergot, 1988] and [Kowalski and Sergot, 1990], and the approach is most closely associated with Sergot and Kowalski. The best known application is the formalisation of the British Nationality Act [Sergot et al., 1986] (but see also [Bench-Capon et al., 1987]). For present purposes the main relevance of the work of Sergot et al. is its treatment of exceptions by using negation by failure (further explored by Kowalski, 1989, 1995). To our knowledge, this was the first logical treatment of exceptions in a legal context. In this approach, which implements the explicit-exceptions approach of Section 2, negation by failure is considered to be an appropriate translation for such locutions as ‘unless the contrary is shown’ or ‘subject to section . . . ’, which usually introduce exception clauses in legislation. Consider, for example, the norm to the effect that, under certain additional conditions, an abandoned child acquires British citizenship unless it can be shown that both parents have a different citizenship. Since Kakas et al. have shown that negation as failure can be given an argument-based interpretation, where negation-as-failure assumptions are defeated by proving their contrary, we can say that [Sergot et al., 1986] model reasoning with rules and exceptions at the logical and the dialectical layer.

Allen & Saxon’s Criticism

An interesting criticism of Sergot et al.’s claim concerning exceptions was put forward by [Allen and Saxon, 1989]. They argued that the defeasible nature of legal reasoning is irreducibly procedural, so that it cannot be captured by current nonmonotonic logics, which define defeasible

consequence only as a ‘declarative’ relation between premises and conclusion of an argument. In particular, they attacked the formalisation of ‘unless shown otherwise’ with negation as failure by arguing that ‘shown’ in this context does not mean ‘logically proven from the available premises’ but “shown by a process of argumentation and the presenting of evidence to an authorized decision-maker”. So ‘shown’ would not refer to logical but to legal-procedural nonprovability. In our opinion, Allen & Saxon are basically right, since such expressions address the allocation of the burden of proof, which in legal procedure is a matter of decision by the judge rather than of inference, and therefore primarily concerns the procedural layer rather than the dialectical one (which is where Sergot et al.’s use of negation by failure is located). Note that these remarks apply not only to Sergot et al.’s work, but to any approach that stays within the dialectical layer. In Section 4.4 we will come back to this issue in more detail.

Applications of Logic Metaprogramming

In two later projects the legal application of logic programming was enriched with techniques from logic metaprogramming. Hamfelt [1995] uses such techniques for (among other things) representing legal collision rules and interpretation schemes. His method uses logic programming’s DEMO predicate, which represents provability in the object language. Since much knowledge used in legal reasoning is metalevel knowledge, Hamfelt’s approach might be a useful component of models of legal argument. However, it is not immediately clear how it can be embedded in a dialectical context, so that more research is needed.

The same holds for the work of Routen and Bench-Capon [1991], who have applied logic metaprogramming to, among other things, the representation of rules and exceptions. Their method provides a way to implement the exclusion approach of Section 2. They enrich the knowledge representation language with metalevel expressions Exception_to(rule1, rule2), and ensure that their metainterpreter applies a rule only if no exceptional rule can be applied. Although this is an elegant method, it also has some restrictions. Most importantly, it is not embedded in an argument-based model, so that it cannot easily be combined with other ways to compare conflicting arguments. Thus their method seems better suited for representing coherent legal texts than for modelling legal argument.

4.2 Applications of Argument-Based Logics

Next we discuss legal applications of logics for defeasible argumentation. Several of these applications in fact use argument-based versions of logic programming.

Prakken & Sartor

Prakken and Sartor [1996, 1997] have developed an argument-based logic similar to the one of [Simari and Loui, 1992], but that is expressive enough to deal with contradictory rules, rules with assumptions, inapplicability statements, and priority rules. Their system applies the well-known abstract approach to argumentation, logic programming and nonmonotonic reasoning developed by Dung [1995] and Bondarenko et al. [1997]. The
logical language of the system is that of extended logic programming, i.e., it has both negation as failure (∼) and classical, or strong, negation (¬). Moreover, each formula is preceded by a term, its name. (In [Prakken, 1997] the system is generalised to the language of default logic.) Rules are strict, represented with →, or else defeasible, represented with ⇒. Strict rules are beyond debate; only defeasible rules can make an argument subject to defeat. Accordingly, facts are represented as strict rules with empty antecedents (e.g. → gave-up-house). The input information of the system, i.e., the premises, is a set of strict and defeasible rules, which is called an ordered theory (‘ordered’ since an ordering on the defeasible rules is assumed). Arguments can be formed by chaining rules, ignoring weakly negated antecedents; each head of a rule in the argument is a conclusion of the argument. Conflicts between arguments are decided according to a binary relation of defeat among arguments, which is partly induced by rule priorities. An important feature of the system is that the information about these priorities is itself presented as premises in the logical language. Thus rule priorities are, like any other piece of legal information, established by arguments, and may be debated like any other legal issue. The results of such debates are then transported to and used by the metatheory of the system.

There are three ways in which an argument Arg2 can defeat an argument Arg1. The first is assumption defeat (in the above publications called “undercutting” defeat), which occurs if a rule in Arg1 contains ∼L in its body, while Arg2 has a conclusion L. For instance, the argument [r1: → p, r2: p ⇒ q] (strictly) defeats the argument [r3: ∼q ⇒ r] (note that ∼L reads as ‘there is no evidence that L’). This way of defeat can be used to formalise the explicit-exception approach of Section 2. The other two forms of defeat are only possible if Arg1 does not assumption-defeat Arg2.
One way is by excluding an argument, which happens when Arg2 concludes for some rule r in Arg1 that r is not applicable (formalised as ¬appl(r)). For instance, the argument [r1: → p, r2: p ⇒ ¬appl(r3)] (strictly) defeats the argument [r3: ⇒ r] by excluding it. This formalises the exclusion approach of Section 2. The final way in which Arg2 can defeat Arg1 is by rebutting it: this happens when Arg1 and Arg2 contain rules that are in a head-to-head conflict and Arg2’s rule is not worse than the conflicting rule in Arg1. This way of defeat supports the implicit-exception approach.

Argument status is defined with a dialectical proof theory. The proof theory is correct and complete with respect to [Dung, 1995]’s grounded semantics, as extended by Prakken and Sartor to the case with reasoning about priorities. The opponent in a game has just one type of move available: stating an argument that defeats the proponent’s preceding argument (here defeat is determined as if no priorities were defined). The proponent has two types of moves: the first is an argument that combines an attack on the opponent’s preceding argument with a priority argument that makes the attack strictly defeat the opponent’s argument; the second is a priority argument that neutralises the defeating force of the opponent’s last move. In both cases, if the proponent uses a priority argument that is not justified
by the ordered theory, this will show up as the possibility of a successful attack on the argument by the opponent. We now present the central definition of the dialogue game (‘Arg-defeat’ means defeat on the basis of the priorities stated by Arg). The first condition says that the proponent begins and then the players take turns, while the second condition prevents the proponent from repeating a move. The last two conditions were just explained and form the heart of the definition.

A dialogue is a finite nonempty sequence of moves move_i = (Player_i, Arg_i) (i > 0), such that
1. Player_i = P iff i is odd; and Player_i = O iff i is even;
2. If Player_i = Player_j = P and i ≠ j, then Arg_i ≠ Arg_j;
3. If Player_i = P then Arg_i is a minimal (w.r.t. set inclusion) argument such that
   (a) Arg_i strictly Arg_i-defeats Arg_{i−1}; or
   (b) Arg_{i−1} does not Arg_i-defeat Arg_{i−2};
4. If Player_i = O then Arg_i ∅-defeats Arg_{i−1}.

The following simple dialogue illustrates this definition. It concerns a tax dispute about whether a person temporarily working in another country has changed his fiscal domicile. All arguments are citations of precedents. (Facts f_i: → p_i are abbreviated as f_i: p_i.)

P1: [f1: kept-house, r1: kept-house ⇒ ¬change]
(Keeping one’s old house is a reason against change of fiscal domicile.)

O1: [f2: ¬domestic-headquarters, r2: ¬domestic-headquarters ⇒ ¬domestic-company, r3: ¬domestic-company ⇒ change]
(If the employer’s headquarters are in the new country, it is a foreign company, in which case fiscal domicile has changed.)

P2: [f3: domestic-property, r4: domestic-property ⇒ domestic-company, f4: r4 is decided by higher court than r2, r5: r4 is decided by higher court than r2 ⇒ r2 ≺ r4]
(If the employer has property in the old country, it is a domestic company. The court which decided this is higher than the court deciding r2.)

The proponent starts the dialogue with an argument P1 for ¬change, after which the opponent attacks this argument with an argument O1 for the opposite conclusion. O1 defeats P1 as required, since in our logical system two rebutting
arguments defeat each other if no priorities are stated. P2 illustrates the first possible reply of the proponent to an opponent’s move: it combines an object-level argument for the conclusion domestic-company with a priority argument that gives r4 precedence over r2 and thus makes P2 strictly defeat O1. The second possibility, just stating a priority argument that neutralises the opponent’s move, is illustrated by the following alternative move, which resolves the conflict between P1 and O1 in favour of P1:

P2′: [f5: r1 is more recent than r3, p: r1 is more recent than r3 ⇒ r3 ≺ r1]

Kowalski & Toni

Like Prakken and Sartor, Kowalski and Toni [1996] also apply the abstract approach of [Dung, 1995, Bondarenko et al., 1997] to the legal domain, instantiating it with extended logic programming. Among other things, they show how priority principles can be encoded in the object language without having to refer to priorities in the metatheory of the system. We illustrate their method using the language of [Prakken and Sartor, 1996]. Kowalski and Toni split each rule r: P ⇒ Q into two rules

   Applicable(r) ⇒ Q
   P ∧ ∼Defeated(r) ⇒ Applicable(r)

The predicate Defeated is defined as follows:

   r ≺ r′ ∧ Conflicting(r, r′) ∧ Applicable(r′) → Defeated(r)

Whether r ≺ r′ holds must be (defeasibly) derived from other information. Kowalski and Toni also define the Conflicting predicate in the object language.

Three Formal Reconstructions of HYPO-style Case-Based Reasoning

The dialectical nature of the HYPO system has inspired several logically inclined researchers to reconstruct HYPO-style reasoning in terms of argument-based defeasible logics. We briefly discuss three of them, and refer to [Hage, 1997] for a reconstruction in Reason-based Logic (cf. Section 4.3 below).

Loui et al. (1993)

Loui et al. [1993] proposed a reconstruction of HYPO in the context of the argument-based logic of [Simari and Loui, 1992].
They mixed the pro and con factors of a precedent in one rule

   Pro-factors ∧ Con-factors ⇒ Decision

but then implicitly extended the case description with rules containing a superset of the pro factors and/or a subset of the con factors in this rule. Loui et al. also studied the combination of reasoning with rules and cases. This work was continued in [Loui and Norman, 1995] (discussed below in Section 4.5).
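The implicit extension of a precedent rule can be illustrated with a small Python sketch. It reads the extension in the usual a fortiori way (more pro factors, fewer con factors, same decision). The class name CaseRule, the function names and the factor labels are hypothetical and are not taken from Loui et al.; the sketch covers only the syntactic relation between rules, not their defeasible evaluation.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class CaseRule:
    pro: frozenset       # pro factors in the antecedent
    con: frozenset       # con factors in the antecedent
    decision: str


def is_implied_variant(precedent: CaseRule, candidate: CaseRule) -> bool:
    """True if candidate has a superset of the precedent's pro factors,
    a subset of its con factors, and the same decision."""
    return (candidate.decision == precedent.decision
            and candidate.pro >= precedent.pro
            and candidate.con <= precedent.con)


def con_subset_variants(precedent: CaseRule) -> list:
    """Enumerate the variants obtained by dropping con factors only."""
    cons = sorted(precedent.con)
    return [CaseRule(precedent.pro, frozenset(subset), precedent.decision)
            for k in range(len(cons) + 1)
            for subset in combinations(cons, k)]


# Hypothetical precedent with two pro and two con factors.
precedent = CaseRule(frozenset({"f1", "f2"}), frozenset({"c1", "c2"}), "plaintiff")
stronger = CaseRule(frozenset({"f1", "f2", "f3"}), frozenset({"c1"}), "plaintiff")
weaker = CaseRule(frozenset({"f1"}), frozenset({"c1", "c2"}), "plaintiff")
```

Here stronger counts as an implied variant of the precedent, whereas weaker does not, since it lacks one of the pro factors.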

Prakken and Sartor (1998)

Prakken and Sartor [1998] have modelled HYPO-style reasoning in their [1996] system, also adding further expressiveness. Like Loui et al. [1993], they translate HYPO’s cases into a defeasible-logical theory. However, unlike Loui et al., Prakken and Sartor separate the pro and con factors into two conflicting rules, and capture the case decision with a priority rule. This method is an instance of a more general idea (taken from [Loui and Norman, 1995]) to represent precedents as a set of arguments pro and con the decision, and to capture the decision by a justified priority argument that in turn makes the argument for the decision justified. In its simplest form where, as in HYPO, there are just a decision and sets of factors pro and con the decision, this amounts to having a pair of rules

   r1: Pro-factors ⇒ Decision
   r2: Con-factors ⇒ ¬Decision

and an unconditional priority rule

   p: ⇒ r1 ≻ r2

However, in general arguments can be multi-step (as suggested by [Branting, 1994]) and priorities can very well be the justified outcome of a competition between arguments.

Analogy is now captured by a ‘rule broadening’ heuristic, which deletes the antecedents missing in the new case. And distinguishing is captured by a heuristic which introduces a conflicting rule ‘if these factors are absent, then the consequent of your broadened rule does not hold’. So if a case rule is r1: f1 ∧ f2 ⇒ d, and the CFS consists of f1 only, then r1 is analogised by b(r1): f1 ⇒ d, and b(r1) is distinguished by d(b(r1)): ¬f2 ⇒ ¬d. To capture the heuristic nature of these moves, Prakken and Sartor ‘dynamify’ their [1996] dialectical proof procedure, to let it cope with the introduction of new premises.

Finally, [Prakken, 2002] shows, inspired by [Bench-Capon and Sartor, 2001], how within this setup cases can be compared not on factual similarities but on the basis of underlying values.
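The two heuristics can be traced on the example r1: f1 ∧ f2 ⇒ d with a short Python sketch. It is only an illustration of the rule transformations described above: the Rule class, the string encoding of literals and the helper names are assumptions, and the dialectical proof procedure itself is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    antecedents: frozenset
    consequent: str


def neg(literal: str) -> str:
    """Strong negation on string literals: f2 <-> ¬f2."""
    return literal[1:] if literal.startswith("¬") else "¬" + literal


def broaden(rule: Rule, cfs: frozenset) -> Rule:
    """Analogy: delete the antecedents that are missing in the new case."""
    kept = rule.antecedents & cfs
    return Rule(f"b({rule.name})", kept, rule.consequent)


def distinguish(original: Rule, broadened: Rule) -> Rule:
    """Distinguishing: if the deleted factors are absent, the consequent
    of the broadened rule does not hold."""
    missing = original.antecedents - broadened.antecedents
    return Rule(f"d({broadened.name})",
                frozenset(neg(f) for f in missing),
                neg(original.consequent))


r1 = Rule("r1", frozenset({"f1", "f2"}), "d")
b_r1 = broaden(r1, frozenset({"f1"}))     # b(r1): f1 ⇒ d
d_b_r1 = distinguish(r1, b_r1)            # d(b(r1)): ¬f2 ⇒ ¬d
```

Both derived rules are new premises introduced during the dispute, which is why the text speaks of a proof procedure that must cope with a changing premise set.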
Horty (1999)

Horty [1999] has reconstructed HYPO-style reasoning in terms of his own work on two other topics: defeasible inheritance and defeasible deontic logic. Since inheritance systems are a forerunner of logics for defeasible argumentation, Horty’s reconstruction can also be regarded as argument-based. It addresses the analogical citation of cases and the construction of multi-step arguments. To support the citation of precedents on their intermediate steps, cases are separated into ‘precedent constituents’, which contain a set of factors and a possibly intermediate outcome. Arguments are sequences of factor sets, starting with the current fact situation and further constructed by iteratively applying precedent constituents that share at least one factor with the set constructed thus far. Conflicting uses of precedent constituents are compared with a variant of HYPO’s more-on-point similarity criterion. The dialectical status of
the constructible arguments is then assessed by adapting notions from Horty’s inheritance systems, such as ‘preemption’.

Other Work on Argument-Based Logics

Legal applications of argument-based logic programming have also been studied by Nitta and his colleagues; see e.g. [Nitta and Shibasaki, 1997]. Besides rule application, their argument construction principles also include some simple forms of analogical reasoning. However, no undercutters for analogical arguments are defined. The system also has a rudimentary dialogue game component. Formal work on dialectical proof theory with an eye to legal reasoning has been done by Jakobovits and Vermeir [1999]. Their focus is more on technical development than on legal applications.

4.3 Reason-Based Logic

Hage [1996, 1997] and Verheij [1996] have developed a formalism for legal reasoning called ‘reason-based logic’ (RBL), centering around a deep philosophical account of the concept of a rule. It is meant to capture how legal (and other) principles, goals and rules give rise to reasons for and against a proposition and how these reasons can be used to draw conclusions. The underlying view on principles, rules and reasons is influenced by insights from analytical philosophy on the role of reasons in practical reasoning, especially [Raz, 1975]. Hage and Verheij stress that rule application is much more than simple modus ponens. It involves reasoning about the validity and applicability of a rule, and weighing reasons for and against the rule’s consequent.

RBL’s View on Legal Knowledge

RBL reflects a distinction between two levels of legal knowledge. The primary level includes principles and goals, while the secondary level includes rules. Principles and goals express reasons for or against a conclusion. Without the secondary level these reasons would in each case have to be weighed to obtain a conclusion, but according to Hage and Verheij rules express the outcome of a certain weighing process. Therefore, a rule does not only generate a reason for its consequent but also generates a so-called ‘exclusionary’ reason against applying the principles underlying the rule: the rule replaces the reasons on which it is based. This view is similar to Dworkin’s [1977] well-known view that while principles are weighed against each other, rules apply in an all-or-nothing fashion. However, according to Hage [1996] and Verheij [Verheij et al., 1998] this difference is just a matter of degree: if new reasons come up, which were not taken into account in formulating the rule, then these new reasons are not excluded by the rule; the reason based on the rule still has to be compared with the reasons based on the new principles.
Consequently, in RBL rules and principles are syntactically indistinguishable; their difference is only reflected in their degree of interaction with other rules and principles (but Hage [1997] deviates somewhat from this account).

A Sketch of the Formal System

To capture reasoning about rules, RBL provides the means to express properties of rules in the object language. To this end Hage and Verheij use a sophisticated naming technique, viz. reification, well-known from metalogic and AI [Genesereth and Nilsson, 1988, p. 13], in which every predicate constant and logical symbol is named by a function expression. For instance, the conjunction R(a) ∧ S(b) is denoted by the infix function expression r(a) ∧ s(b). Unlike the naming techniques used by [Gordon, 1995] and [Prakken and Sartor, 1996], RBL’s technique reflects the logical structure of the named formula. Rules are named with a function symbol rule, resulting in terms like

   rule(r, p(x), q(x))

Here r is a ‘rule identifier’, p(x) is the rule’s condition, and q(x) is its consequent. RBL’s object language does not contain a conditional connective corresponding to the function symbol rule; rules can only be stated indirectly, by assertions that they are valid, as in

   Valid(rule(r, condition_r, conclusion_r))

Hage and Verheij state RBL as extra inference rules added to standard first-order logic or, in some versions, as extra semantic constraints on models of a first-order theory. We first summarise the most important rules and then give some (simplified) formalisations.

1. If a rule is valid, its conditions are satisfied and there is no evidence that it is excluded, the rule is applicable.
2. If a rule is applicable, it gives rise to a reason for its application.
3. A rule applies if and only if the set of all derivable reasons for its application outweighs the set of all derivable reasons against its application.
4. If a rule applies, it gives rise to a reason for its consequent.
5. A formula is a conclusion of the premises if and only if the reasons for the formula outweigh the reasons against the formula.

Here is what a simplified formal version of inference rule (1) looks like.
Note that condition and consequent are variables, which can be instantiated with the name of any formula.

   If Valid(rule(r, condition, consequent)) is derivable
   and Obtains(condition) is derivable
   and Excluded(r) is not derivable,
   then Applicable(r, rule(condition, consequent)) is derivable.

Condition (4) has the following form.

   If Applies(r, rule(condition, consequent)) is derivable,
   then Proreason(consequent) is derivable.
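As a rough illustration, rules (1) and (4) can be mimicked in Python over fixed sets of derivable facts. This is a sketch under strong simplifying assumptions: derivability is replaced by set membership, so ‘not derivable’ is simply absence from a set, and the class and function names as well as the example rule are hypothetical. RBL itself needs default-logic-style extensions to handle nonderivability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    ident: str        # rule identifier r
    condition: str    # name of the condition formula
    consequent: str   # name of the consequent formula


def applicable(rule: Rule, valid: set, obtains: set, excluded: set) -> bool:
    """Inference rule (1): the rule is valid, its condition obtains,
    and its exclusion is not among the derivable facts."""
    return (rule in valid
            and rule.condition in obtains
            and rule.ident not in excluded)


def pro_reasons(applying_rules) -> set:
    """Condition (4): every rule that applies yields a pro-reason
    for its consequent."""
    return {rule.consequent for rule in applying_rules}


# Hypothetical example rule, not taken from the source.
theft = Rule("r1", "thief(x)", "punishable(x)")
is_applicable = applicable(theft, {theft}, {"thief(x)"}, set())
is_blocked = applicable(theft, {theft}, {"thief(x)"}, {"r1"})
```

The step from applicable to applies (rules 2 and 3) is deliberately left out, since it requires weighing the reasons for and against application.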

Finally, here is how in condition (5) the connection between object- and metalevel is made.

   If Outweighs(Proreasons(formula), Conreasons(formula)) is derivable,
   then Formula is derivable.

Whether the pro-reasons outweigh the con-reasons must itself be derived from the premises. The only built-in constraint is that any nonempty set outweighs the empty set. Note that while formula is a variable for an object term, occurring in a well-formed formula of RBL, Formula is a metavariable which stands for the formula named by the term formula. This is how object and metalevel are connected in RBL.

In RBL the derivability of certain formulas is defined in terms of the nonderivability of other formulas. For instance, in (1) it may not be derivable that the rule is excluded. To deal with this, RBL adapts techniques of default logic, by restating the inference rules as conditions on membership of an extension.

Using RBL

In RBL exceptions can be represented both explicitly and implicitly. As for explicit exceptions, since RBL has the validity and applicability requirements for rules built into the logic, the exclusion method of Section 2 can be used. RBL also supports the choice approach: if two conflicting rules both apply and do not exclude each other, then their application gives rise to conflicting reasons, which have to be weighed.

Finally, Hage and Verheij formalise legal priority principles in a way similar to [Kowalski and Toni, 1996], representing them as inapplicability rules. The following example illustrates their method with the three well-known legal principles Lex Superior (the higher regulation overrides the lower one), Lex Posterior (the later rule overrides the earlier one) and Lex Specialis (the specificity principle). It is formalised in the language of [Prakken and Sartor, 1996]; recall that with respect to applicability, this system follows, like RBL, the exclusion approach. The three principles can be expressed as follows.
   H: x conflicts with y ∧ y is inferior to x ∧ ∼¬appl(x) ⇒ ¬appl(y)
   T: x conflicts with y ∧ y is earlier than x ∧ ∼¬appl(x) ⇒ ¬appl(y)
   S: x conflicts with y ∧ x is more specific than y ∧ ∼¬appl(x) ⇒ ¬appl(y)

Likewise for the ordering of these three principles:

   HT: T conflicts with H ∧ ∼¬appl(H) ⇒ ¬appl(T)
   TS: S conflicts with T ∧ ∼¬appl(T) ⇒ ¬appl(S)
   HS: S conflicts with H ∧ ∼¬appl(H) ⇒ ¬appl(S)

Thus the metatheory of the logic does not have to refer to priorities. However, the method contains another metareasoning feature, viz. the ability to express metalevel statements of the kind x conflicts with y.
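The combined effect of the three principles and their ordering can be approximated by a small Python sketch. It replaces the defeasible inapplicability rules by a fixed lookup order (H before T before S), which is only a flattened approximation of rules HT, TS and HS. The Norm class and its numeric fields are assumptions made for illustration; in particular, ‘more specific than’ is a relation in the text, not a number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Norm:
    name: str
    level: int         # assumed encoding: higher means superior regulation
    year: int          # year of enactment
    specificity: int   # assumed encoding: higher means more specific


def superior(x: Norm, y: Norm) -> bool:      # Lex Superior (H)
    return x.level > y.level


def posterior(x: Norm, y: Norm) -> bool:     # Lex Posterior (T)
    return x.year > y.year


def specialis(x: Norm, y: Norm) -> bool:     # Lex Specialis (S)
    return x.specificity > y.specificity


# H overrides T and S, and T overrides S, mirroring HT, HS and TS.
PRINCIPLES = [("H", superior), ("T", posterior), ("S", specialis)]


def inapplicable(a: Norm, b: Norm):
    """For two conflicting norms, return (name of the inapplicable norm,
    deciding principle), or None if no principle discriminates."""
    for label, prefers in PRINCIPLES:
        if prefers(a, b):
            return (b.name, label)
        if prefers(b, a):
            return (a.name, label)
    return None


statute = Norm("statute", level=2, year=1990, specificity=0)
bylaw = Norm("bylaw", level=1, year=2000, specificity=1)
```

In this example the bylaw is later and more specific, but the statute is superior, so Lex Superior decides and the bylaw is inapplicable.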

Evaluation

RBL clearly confines itself to the logical and dialectical layer of legal argument. At these layers, it is a philosophically well-motivated analysis of legal reasoning, while technically it is very expressive, supporting reasoning with rules and exceptions, with conflicting rules, and about rules and their priority relations. However, it remains to be investigated how RBL can, given its complicated technical nature and the lack of the notion of an argument, be embedded in procedural and heuristic accounts of legal argument.

4.4 Procedural Accounts of Legal Reasoning

The Pleadings Game is not the only procedural AI & Law model. We now briefly discuss some other formal models of this kind.

Hage, Leenes, and Lodder

At the same time as Gordon designed his system, Hage et al. [1994] developed a procedural account of Hart’s distinction between clear and hard cases. They argued that whether a case is easy or hard depends on the stage of a procedure: a case that is easy at an earlier stage can be made hard by introducing new information. This is an instance of their purely procedural view on the law, which incorporates substantive law via the judge’s obligation to apply it. To formalise this account, a Hamblin-MacKenzie-style formal dialogue system with the possibility of counterargument was developed. This work was extended by [Lodder, 1999] in his DiaLaw system. The general setup of these systems is the same as that of the Pleadings Game. For the technical differences the reader is referred to the above publications. One difference at the dialectical layer is that instead of an argument-based logic, Hage and Verheij’s reason-based logic is used. Another difference in [Hage et al., 1994] is that it includes a third party, the referee, who is entitled to decide whether certain claims should be accepted by the parties or not. The dialogue systems also support disputes about the procedural legality of a move. Finally, arguments do not have to be logically valid; the only use of reason-based logic is to determine whether a claim of one’s opponent follows from one’s commitments and therefore must be accepted.

Bench-Capon

Bench-Capon [1998] has also developed a dialogue game for legal argument. Like the games discussed above, it has the possibility of counterargument (although it does not incorporate a formalised account of the dialectical layer). The game also has a referee, with roughly the same role as in [Hage et al., 1994].
Bench-Capon’s game is especially motivated by the desire to generate more natural dialogues than the “stilted” ones of Hamblin-MacKenzie-style systems. To this end, arguments are defined as variants of Toulmin’s [1958] argument structures, containing a claim, data for this claim, a warrant connecting data and claim, a backing for the warrant, and possible rebuttals of the claim with an exception. The available speech acts refer to the use or attack of these items, which, according to Bench-Capon, induces natural dialogues.
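The five components of such an argument structure can be written down as a simple record type. The following Python sketch only shows the data structure; the class name, the items method and the example content are hypothetical, and the speech acts of Bench-Capon’s game are not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToulminArgument:
    claim: str
    data: List[str]
    warrant: str
    backing: Optional[str] = None
    rebuttals: List[str] = field(default_factory=list)

    def items(self) -> Dict[str, object]:
        """Return the components that are present, keyed by Toulmin role.
        These are the items a dialogue move could use or attack."""
        parts = {"claim": self.claim, "data": self.data,
                 "warrant": self.warrant, "backing": self.backing,
                 "rebuttals": self.rebuttals}
        return {role: value for role, value in parts.items() if value}


# Hypothetical example; no backing has been supplied yet.
arg = ToulminArgument(
    claim="contract",
    data=["offer", "acceptance"],
    warrant="offer and acceptance create a contract",
    rebuttals=["hidden defect"],
)
```

Keeping the components separate is what lets a dialogue move target, say, the warrant rather than the claim as a whole.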

Formalising Allocations of the Burden of Proof

Above we supported Allen and Saxon’s [1989] criticism of Sergot et al.’s [1986] purely logical- and dialectical-layer account of reasoning with exceptions. Additional support is provided by Prakken [2001a], who argues that allocations of the burden of proof cannot be modelled by ‘traditional’ nonmonotonic means.

Burden of proof is one of the central notions of legal procedure, and it is clearly connected with defeasibility [Loui, 1995, Sartor, 1995]. There are two aspects of having the burden of proving a claim: the task of coming up with an argument for that claim, and the task of upholding this argument against challenge in a dispute. The first aspect can be formalised in Hamblin-MacKenzie-style dialogue systems (discussed above in Section 2.3). The second aspect requires a system that assesses arguments on the basis of the dialectical interactions between all available arguments. At first sight, it would seem that dialectical proof theories of nonmonotonic logics can be directly applied here. However, there is a problem, which we shall illustrate with an example from contract law. In legal systems it is generally the case that the one who argues that a valid contract exists has the burden of proving those facts that ordinarily give rise to the contract, while the party who denies the existence of the contract has the burden of proving why, despite these facts, exceptional circumstances prevent the contract from being valid. Now suppose that plaintiff claims that a contract between him and defendant exists because plaintiff offered to sell defendant his car, and defendant accepted. Then plaintiff has the burden of proving that there was such an offer and acceptance, while defendant has the burden of proving, for instance, that the car had a hidden defect.
Suppose we formalise this in [Prakken and Sartor, 1996] as follows:

   r1: offer ∧ acceptance ∧ ∼exception(r1) ⇒ contract
   r2: hidden defect ⇒ exception(r1)

Suppose further that in the dispute arguments for and against hidden defect are exchanged, and that the judge regards them as being of equal strength. What follows dialectically? If plaintiff starts by moving his argument for contract, then defendant can assumption-defeat this argument with her argument for exception(r1). Plaintiff cannot attack this with his argument against hidden defect since it is of the same strength as defendant’s argument, so it does not strictly defeat it. In conclusion, plaintiff’s argument is not justified (but merely defensible), so the outcome of our logical reconstruction is that plaintiff has not fulfilled his burden of proof. However, the problem with this reconstruction is that it ignores that defendant has not fulfilled her burden of proof either: she had to prove hidden defect, but her argument for this conclusion also is merely defensible. The problem with the (sceptical) dialectical proof theory is that plaintiff has the burden of proof with respect to all subissues of the dispute; there is no way to distribute the burden of proof over the parties, as is common in legal dispute. This problem is not confined to the particular system or knowledge representation method, but seems a fundamental problem of current ‘traditional’ nonmonotonic logics.
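The difference between the two allocations in the contract example can be made concrete with a toy Python sketch. It is not the proof theory of any of the cited systems: argument strength is reduced to a number, ‘justified’ is reduced to winning strictly, and all function and parameter names are hypothetical.

```python
def established(strength_for: int, strength_against: int) -> bool:
    """The party bearing the burden on an issue must win strictly."""
    return strength_for > strength_against


def contract_holds(offer: bool, acceptance: bool,
                   defect_for: int, defect_against: int,
                   burden_on_defendant: bool) -> bool:
    """Decide the contract claim under one of two burden allocations."""
    if not (offer and acceptance):
        return False
    if burden_on_defendant:
        # Distributed burden: defendant must strictly prove the hidden defect.
        exception = established(defect_for, defect_against)
    else:
        # Plaintiff bears every burden: he must strictly refute the defect claim.
        exception = not established(defect_against, defect_for)
    return not exception


# The judge regards the arguments on the hidden defect as equally strong.
sceptical = contract_holds(True, True, 1, 1, burden_on_defendant=False)
distributed = contract_holds(True, True, 1, 1, burden_on_defendant=True)
```

With equally strong arguments on the hidden defect, plaintiff loses when he carries all burdens, but wins once the burden on the exception rests with defendant.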

An additional problem for such logics is that in legal procedure the allocation of the burden of proof is ultimately a matter of decision by the judge, and therefore cannot be determined by logical form. Any full model of reasoning under burden of proof should leave room for such decisions. In [Prakken, 2001a] the dialectical proof theory for grounded semantics is adapted to enable distributions of the burden of proof over the parties, which in [Prakken, 2001b] is embedded in a dialogue game model for legal procedure. The basic idea of [Prakken, 2001a] is that the required strength of a move depends on who has the burden of proof concerning the issue under attack (as decided by the judge in the dialogue game). The resulting system has no clear link with argument-based semantics in the style of [Dung, 1995, Bondarenko et al., 1997]. For logicians this is perhaps disappointing, but for others this will count as support for the view that the semantics of (legal) defeasible reasoning is essentially procedural.

ZENO’s argumentation framework

Another account of distributions of the burden of proof in dialectical systems is given by Gordon and Karaçapilidis [1997]. In fact, [Prakken, 2001a]’s proposal can partly be seen as a generalisation and logical formalisation of this account. Gordon and Karaçapilidis incorporate variants of Freeman and Farley’s ‘levels of proof’ in their ‘ZENO argumentation framework’. This is the dialectical-layer part of the ZENO argument mediation system: it maintains a ‘dialectical graph’ of the issues, the positions with respect to these issues, and the arguments pro and con these positions that have been advanced in a discussion, including positions and arguments about the strength of other arguments. Arguments are links between positions.
Part of the framework is a status assignment to positions: each position is assigned in or out depending on two factors: the required level of proof for the position, and the relative strengths of the arguments pro and con the position that themselves have antecedents that are in. For instance, a position with level ‘scintilla of evidence’ is in iff at least one argument pro is in (here they deviate from Freeman and Farley). And a position with level ‘preponderance of evidence’ is in iff the joint pro arguments that are in outweigh the joint con arguments that are in. The burden of proof can be distributed over the parties since levels of proof can be assigned to arbitrary positions instead of (as in [Freeman and Farley, 1996]) only to the initial claim of a dispute.
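The two levels of proof mentioned above can be sketched as a status function in Python. This is an illustration only: representing an argument as a pair (is_in, weight) and reading ‘jointly outweigh’ as a comparison of summed weights are assumptions of this sketch, not part of the description of ZENO given here.

```python
from typing import List, Tuple

Argument = Tuple[bool, float]   # (antecedent is in, assumed numeric weight)


def position_is_in(level: str, pro: List[Argument], con: List[Argument]) -> bool:
    """Assign in (True) or out (False) to a position from its level of
    proof and the pro and con arguments whose antecedents are in."""
    pro_in = [weight for is_in, weight in pro if is_in]
    con_in = [weight for is_in, weight in con if is_in]
    if level == "scintilla of evidence":
        # At least one pro argument is in.
        return len(pro_in) >= 1
    if level == "preponderance of evidence":
        # Joint pro arguments outweigh joint con arguments (summed here).
        return sum(pro_in) > sum(con_in)
    raise ValueError(f"unsupported level of proof: {level}")


pro_args = [(True, 1.0), (False, 5.0)]
con_args = [(True, 2.0)]
scintilla = position_is_in("scintilla of evidence", pro_args, con_args)
preponderance = position_is_in("preponderance of evidence", pro_args, con_args)
```

The same arguments thus make the position in under the weaker level and out under the stronger one, which is how assigning levels to individual positions distributes the burden of proof.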

4.5 Formalisations of the Heuristic Layer

In logical models of legal argument the heuristic layer has so far received very little attention. Above we discussed Prakken and Sartor’s [1998] logical reconstruction of HYPO-style ana