Center for Electronic Texts in the Humanities
Subject: Russian corpora available by ftp/gopher/www
Recently I have been spending a lot of time distributing Russian text corpora that I have collected; I have about 14 MB of various literary and non-literary texts, and word has gotten out. I'm happy to do this, but one-by-one distribution is not very efficient! I have now found a home for them on the ftp/gopher/www server at infomeister.osc.edu. Via ftp, they are in the directory pub/central_eastern_europe/russian/corpora; explicit directions for retrieval by all three methods are given below. Along with the texts I have posted an inventory of files (which will be updated periodically as I acquire and post more texts), an ASCII character map of the Cyrillic coding used, and a set of bitmapped Mac fonts that I use to display these files. Questions about the texts and their preparation should be addressed to me; technical questions about the server and file retrieval can be addressed to the person handling the /russian directory, Dr. Jan Labanowski, at jkl@osc.edu.
I would be delighted to receive any and all additional Russian corpora, or news of where more can be found.
George Fowler
Dept. of Slavic Languages
Ballantine 502
Indiana University
Bloomington, IN 47405 USA
GFowler@Indiana.Edu [email]
(812) 855-2829 [office]
(317) 726-1482 [home]
(812) 855-2624/-2608/-9906 [dept.]
(812) 855-2107 [dept. fax]
Alex Catalogue of Electronic Texts on the World-Wide Web!
The Alex Catalogue of Electronic Texts on the Internet is now available on the World-Wide Web at
http://www.lib.ncsu.edu/stacks/alex-index.html
or gopher://rsl.ox.ac.uk:70/11/lib-corn/hunter
or gopher://gopher.lib.ncsu.edu/11/library/stacks/Alex
Alex helps users to find and retrieve the full text of documents on the Internet. It currently indexes over 1800 books and shorter texts by author and title, incorporating texts from Project Gutenberg, Wiretap, the On-line Book Initiative, the Eris system at Virginia Tech, the English Server at Carnegie Mellon University, Project Bartleby, CCAT, the on-line portion of the Oxford Text Archive, and many others. Alex includes no serials.
Project Gutenberg's own FTP site is at mrcnext.cso.uiuc.edu. You can log in as anonymous; use your email address as the password. All the etexts are located in pub/etext, in subdirectories named for the year in which each etext was released. An index of available etexts is in the files INDEX100.GUT and INDEX200.GUT (mrcnext is a UNIX machine, so file names are case-sensitive; you must use the capital letters).
Please note!! mrcnext is no longer the default server for the Project Gutenberg Etexts!! Please use uiarchive.cso.uiuc.edu, as explained in detail below:
ftp uiarchive.cso.uiuc.edu   (or ftp 128.174.5.14)
login: anonymous
password: yourname@your.machine
cd pub
cd etext
cd gutenberg
cd etext95   [or 94, 93, 92, 91 or 90; the 70's and 80's are in /etext90]
get filename   (be sure to set bin if you get the .zip files)
get more files
quit

To subscribe to the Project Gutenberg Newsletter:
sub gutnberg Firstname Lastname
send to: listserv@uiucvmd (BITNET) or listserv@vmd.cso.uiuc.edu (Internet)
Or, to volunteer, send this message:
sub gutvol-l Firstname Lastname
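For readers who would rather script the retrieval than type it, the anonymous-FTP session above for uiarchive.cso.uiuc.edu (log in, cd to pub/etext/gutenberg/etextNN, set binary, get, quit) can be sketched with Python's standard ftplib module. This is a minimal sketch, assuming the host and directory layout given in the directions, which describe the archive at the time of writing; the helper names etext_dir and fetch and the password placeholder are illustrative, not part of Project Gutenberg's instructions.

```python
import ftplib
from pathlib import Path

# Server and directory layout as given in the directions above; they
# describe the archive at the time of writing and may not resolve today.
HOST = "uiarchive.cso.uiuc.edu"
BASE_DIR = "pub/etext/gutenberg"


def etext_dir(year: int) -> str:
    """Map a release year (two- or four-digit) to its remote directory.

    Per the directions, the 70's and 80's are kept in etext90.
    """
    yy = year - 1900 if year >= 1900 else year
    if not 70 <= yy <= 99:
        raise ValueError("expected a release year between 1970 and 1999")
    return f"{BASE_DIR}/etext{max(yy, 90)}"


def fetch(filename: str, year: int, dest: Path = Path(".")) -> Path:
    """Anonymous login, cd to the year's directory, get one file, quit."""
    local = dest / filename
    # The with-block sends QUIT when it exits, like the final "quit" step.
    with ftplib.FTP(HOST) as ftp:
        ftp.login(user="anonymous", passwd="yourname@your.machine")
        ftp.cwd(etext_dir(year))
        # retrbinary switches to binary (TYPE I) mode, which the .zip
        # files require; plain-text files also arrive byte-for-byte.
        with open(local, "wb") as out:
            ftp.retrbinary(f"RETR {filename}", out.write)
    return local
```

A call such as fetch("example.zip", 93), with a hypothetical file name, would retrieve that file from the etext93 directory into the current directory.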
A new WWW edition of Laurence Sterne's _A Sentimental Journey through France and Italy_ is now available from Stony Run, Richard Bear's home page. It makes heavy use of entities, nested font commands, and
Stony Run: http://www-vms.uoregon.edu/~rbear/
Sterne: http://www-vms.uoregon.edu/~rbear/sterne.html
Incidentally, while we're on the subject, there's a project to catalog e-text projects: CPET, the Catalog of Projects in Electronic Text, run by Georgetown University's Center for Text and Technology (CTT -- I just discovered that PCMCIA "really" stands for People Can't reMember Computer Industry Acronyms, and I'm starting to get their point). It includes information on several hundred projects, all in the humanities, arranged by subject. Ftp to guvax.georgetown.edu, directory cpet_projects_in_electronic_text. (Yes, this is a VAX, and those are underscore characters.) You can also try gophering to Georgetown's gopher; CPET is supposed to be available from there too.
*********** BRITISH NATIONAL CORPUS DISTRIBUTION BEGINS **************
On behalf of the BNC Consortium, OUCS is very happy to announce that we expect to start distributing copies of the long-awaited British National Corpus to licence holders during the week beginning 22 May.
This corpus contains 100 million words from over 4000 different texts, carefully selected to give maximal coverage of the varieties of modern British English, both spoken and written. The corpus has been automatically tagged for part of speech using the CLAWS stochastic tagger developed at UCREL, and is marked up in SGML following the TEI Guidelines for corpus encoding.
The corpus is currently available under academic licence within the European Union only. The first release, comprising three CDs and a detailed technical manual, currently costs under 200 pounds. For full details, including ordering and licensing information, please see our web pages at http://info.ox.ac.uk/bnc or write to the address below.
INSTITUUT VOOR NEDERLANDSE LEXICOLOGIE (Institute for Dutch Lexicology)
On-line access to the 27 Million Words Dutch Newspaper Corpus for non-commercial purposes.
The Institute for Dutch Lexicology (INL) offers you the possibility of consulting a corpus of over 27 million words of Dutch newspaper text over the international computer network. In 1994, a 5 Million Words Corpus of more varied composition was made accessible in the same way.
The retrieval system is essentially the same as that for the 5 Million Words Corpus 1994. It allows you to search for single words or for word patterns, including some predefined syntactic patterns that the user can modify. Searches can operate on word form, part of speech (POS), and headword, either separately or in combination, using Boolean operators and proximity searches. During the search, data on frequency and distribution across the texts are provided at several levels. The output is most often a list of items or a series of concordances (words in context) with a variable, user-defined amount of surrounding text. Sorting facilities can support your analysis of the output. Within some copyright limitations, the output of your searches can be transferred to your own computer by e-mail; transferring complete texts or substantial parts of texts is not allowed.
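The concordance output described above (words in context, with a user-defined amount of surrounding text) is the classic keyword-in-context (KWIC) display. As a minimal sketch of that idea only, and not of INL's own retrieval software, the Python function below matches word forms case-insensitively and shows `width` words on either side of each hit; the function name and parameters are hypothetical, and POS or headword searches are not modelled. It assumes Python 3.9 or later.

```python
import re


def kwic(text: str, keyword: str, width: int = 3) -> list[str]:
    """Return one keyword-in-context line per occurrence of `keyword`.

    Tokens are runs of word characters (punctuation is dropped), matching is
    case-insensitive on the word form, and `width` tokens of context are kept
    on each side, fewer at the start or end of the text.
    """
    tokens = re.findall(r"\w+", text)
    target = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Bracket the hit; strip() removes the gap left by empty context.
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


if __name__ == "__main__":
    sample = "De kat zit op de mat en de hond slaapt."
    for line in kwic(sample, "de", width=1):
        print(line)
```

Widening `width` corresponds to the variable context the announcement mentions; sorting the returned lines (for example by right-hand context) would be a separate step.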
Most of the data has not been corrected, either at the level of the text or at the level of POS and headword. POS and headword were assigned to the word forms in the electronic text automatically, by lingware developed at the INL.
The provider of the texts has given permission for use of the materials for non-commercial, research purposes only.
Please note that for optimal use of the retrieval system, a VT220 (or higher) terminal or an appropriate terminal emulator (e.g. Kermit) is recommended. To get access to this corpus, you must sign an individual user agreement. An electronic user agreement form can be obtained from our mail server, Mailserv@Rulxho.Leidenuniv.NL: put the line SEND [27MLN95]AGREEMNT.USE in the body of your e-mail message. Access to the 5 Million Words Corpus 1994 requires a separate user agreement, which can be obtained from the same mail server with the message SEND [5MLN94]AGREEMNT.USE.
Please make a hard copy of the agreement form, sign it, keep a copy yourself, and return a signed copy to: Institute for Dutch Lexicology INL, P.O. Box 9515, 2300 RA Leiden. Fax: 31 71 27 2115.
After we receive the signed user agreement, you will be sent your username and password.
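The agreement-form request described above is an ordinary e-mail whose body is the mail-server command. A minimal sketch with Python's standard email package, assuming nothing beyond the address and the two commands quoted in the announcement; the helper name agreement_request and the Subject line are hypothetical (the announcement specifies only the body).

```python
from email.message import EmailMessage

MAILSERVER = "Mailserv@Rulxho.Leidenuniv.NL"

# Commands quoted in the announcement, one per corpus.
REQUEST_27MLN95 = "SEND [27MLN95]AGREEMNT.USE"
REQUEST_5MLN94 = "SEND [5MLN94]AGREEMNT.USE"


def agreement_request(sender: str, command: str = REQUEST_27MLN95) -> EmailMessage:
    """Build (but do not send) a request whose body is the mail-server command."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = MAILSERVER
    # Hypothetical subject; the announcement only specifies the body.
    msg["Subject"] = "user agreement form"
    msg.set_content(command)
    return msg


if __name__ == "__main__":
    print(agreement_request("yourname@your.machine").as_string())
```

To send it, one would hand the message to smtplib.SMTP(host).send_message(msg) using an SMTP host of one's own; the signed agreement itself still goes back on paper, by post or fax, as described above.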
If you need additional information, please send an e-mail message to Helpdesk@Rulxho.Leidenuniv.NL, or send a fax to Mrs. dr. J.G. Kruyt.
We have the following CD-ROMs, which may be of interest to the list:

TITLE                               PRICE (UK sterling)
20,000 leagues under the sea        24
American poetry                     31
Bookshelf (dict., thes., quotes)    27
Bronte Sisters                      19
Christmas carol                     19
Classic Library                     19
Collins electronic dictionary       59
Complete bookshop                   19
Concise Oxford dictionary           43
Crucible                            99
Dickens                             25
Don Quixote                         19
Electronic Home Library             19
Fall of the house of Usher          19
Famous novels                       39
Ken Gourlay
EDINBURGH MULTIMEDIA
3 Hayfield
Edinburgh EH12 8UJ
SCOTLAND
Tel & fax +44 (0)131 339 5374 (24 hours)
Internet: k.gourlay@bbcnc.org.uk
World Wide Web:
Home page: http://www.worldserver.pipex.com/nc/edinmedia/
http://www.echo.lu/impact/projects/imm/en/ecfolk1.html
http://www.scotborders.co.uk/mmf/directory/smgs/smg8.html
http://www.phy.hw.ac.uk/~phyjgc/
There are two new texts on the Edmund Spenser Home Page:
Edmund Spenser's doleful dirge Daphnaida [1591,1596] is now
available on the Edmund Spenser Home Page.
URL of home page: http://darkwing.uoregon.edu/~rbear/
URL of Daphnaida: http://darkwing.uoregon.edu/~rbear/daphna.html
Prothalamion
Colin Clout comes home againe
URL: http://darkwing.uoregon.edu/~rbear/
Richard Bear http://www-vms.uoregon.edu/~rbear/
New publication at the CETEDOC: the Thesaurus Pseudo-Dionysii Areopagitae, versiones latinae cum textu graeco (Latin versions with the Greek text)
See: http://juppiter.fltr.ucl.ac.be/FLTR/TEDM/pseudo-dionysii/f_pseudo-dionysii.html Thanks in advance.
Jean Schumacher
The Victorian Women Writers Project is an electronic collection
of texts by British women writers of the late Victorian period.
Currently, the collection includes works by Louisa Bevington,
Amy Levy, Eliza Keary, Maud Keary and Dollie Radford, with works
by Mathilde Blind, Dinah Maria Mulock Craik and Louise Guiney in
preparation. At present, the collection includes volumes of poetry
and verse drama, with plans to include other literary and
critical texts in the future. Considerable attention will be
given to the accuracy and completeness of the texts, and to
accurate bibliographical descriptions of them. The Victorian
Women Writers Project is supported by Indiana University's
Library Electronic Text Resource Service (LETRS) and is available
for use through the World Wide Web at
.
Perry Willett
Coleridge and Wordsworth's landmark 1798 Lyrical Ballads
has been updated from ASCII to html and is now accessible from the URL:
http://darkwing.uoregon.edu/~rbear/ballads.html
Richard Bear
Leibniz in WWW critical edition
[Reply-To: epasini@znort.it]
First-ever critical edition made expressly for the Net. It's Leibniz.
URL: http://www.znort.it/suiseth/drole/drole.html
The etext of Gay's Beggar's Opera has been updated to html with linked
notes.
http://darkwing.uoregon.edu/~rbear/beggar.html
The TLG's new web page can be accessed at http://www.tlg.uci.edu/~tlg. We
invite suggestions for further information which we might provide in order
to assist TLG users.
The Consortium for Latin Lexicography would like to announce the Home Page
for the Electronic Thesaurus Linguae Latinae, located at:
http://www.cs.usask.ca/grads/devito/e-TLL/
These web pages describe the planned development of a TLL in electronic
form. We hope to continue to publish progress reports on the Electronic
TLL at this site as work proceeds.
For more information on these web pages, the Electronic TLL project, or
the Consortium for Latin Lexicography, please contact CLL Director Patrick
Sinclair at CLL@uci.edu or CLL Systems Analyst Ann DeVito at
devito@cs.usask.ca.
The Consortium for Latin Lexicography would like to announce that the Home
Page for the Electronic Thesaurus Linguae Latinae has moved. The new URL
is:
http://www.cs.usask.ca/faculty/devito/e-TLL/
The Duke Papyrus Archive on the World Wide Web at
http://scriptorium.lib.duke.edu/papyrus/
has now virtually completed the
task, which began in September of 1992, of making the Duke papyri more
accessible. Available are records and images of all 1373 inventory
numbers of papyri in the Duke University Collection. (About 200 images
remain to be added.) The approximately 2000 images of these texts are
presented in three ways: a "thumbnail," a 72 dpi image and a 150 dpi
version. All images are linked to catalogue records.
on-line hypertext edition of the Diderot-d'Alembert
_Encyclopedie_ in the ARTFL database at the Univ. of Chicago.
http://tuna.uchicago.edu/ARTFL.html
I have mounted a small sample of the Encyclopedie with a couple
of experimental images. Additional images for the Encyclopedie
can be found at:
http://tuna.uchicago.edu/images/ENC/ENC.image_test.html
While I'm at it, let me plug our exhibition, Renaissance
Dante in Print, 1472-1629, which contains some 450 images of
every Italian edition of Dante printed during the Renaissance:
http://tuna.uchicago.edu/Dante/Dante_Ex1.html
In its current form, the site contains an English-Latin HTML edition of
DesCartes' "Meditations on First Philosophy".
The URL is:
http://philos.wright.edu/Descartes/Meditations.html
The texts contain only navigational links. We encourage anyone
interested to download these texts and to create annotated editions and
then to share their new editions by loading them on their Web servers.
To facilitate this collaboration we have added a page "DesCartes'
Myriogon" where we will provide links to editions based upon our primary
sources.
Dictionnaire de l'Academie francaise: Base Academie Echantillon en ligne.
Composante du Projet d'informatisation des huit editions completes du
Dictionnaire de l'Academie francaise, la Base Academie Echantillon vient
d'etre mise en ligne sur l'internet a l'adresse suivante:
http://www.epas.utoronto.ca:8080/~wulfric/academie/
La Base Echantillon comprend un choix d'articles indexes, les memes pour
chaque edition, un index des mots-clefs metalinguistiques, un index des
occurrences cachees, les pages de titre en images GIF et des notices
explicatives. Cette base est concue a la fois comme modele propose a la
critique et comme outil de travail didactique, linguistique et
metalexicographique.
Le soussigne invite tout commentaire et toute correction.
Russon Wooldridge
ENGLISH VERSION
On-line Sample Database of the Dictionnaire de l'Academie francaise.
A component of the Dictionnaire de l'Academie francaise Database Project
(computerization of the eight complete editions, 1694-1935), the Sample
Academie Database has been put on the internet at the following address:
http://www.epas.utoronto.ca:8080/~wulfric/academie/
The Sample Database includes a selection of articles, the same for each
edition, an index of metalinguistic keywords, an index of hidden occurrences,
GIF images of the title pages and explanatory notes. It is conceived both as a
model for criticism and as a didactic, linguistic and metalexicographical
resource.
The undersigned invites comments and corrections.
Russon Wooldridge
In late 1995, the UM Humanities Text Initiative mounted the most
recent and now complete version of the Patrologia Latina Database.
While the PLD is restricted to access by UM faculty, staff, and students,
the web-based support resources, such as the list of authors by volume
and (for other implementors) the editorial policy, are unrestricted.
These resources and search screens can be found at
http://www.hti.umich.edu/latin/pld/
For more information on the HTI and access to publicly available
collections, please use
http://www.hti.umich.edu/
John Price-Wilkin
UM HTI American Verse: http://www.hti.umich.edu/english/amverse/
The University of Michigan Humanities Text Initiative, along with the
University of Michigan Press, is proud to announce the release of a
new textual resource, the American Verse Project. American Verse
is a growing collection of texts encoded in SGML using the TEI
Guidelines. The collection is made accessible in SGML, dynamically
rendered HTML, and as a searchable database. As with all of the other
Humanities Text Initiative resources, simple word and phrase searches
are supported, as well as proximity searches, and searches for verses
or paragraphs containing two or three words/phrases. The project uses
an unusual rights model for a project involving a university press:
no restrictions or costs are placed on individual and research use of the
materials practical restrictions and cost; the texts are available
for sale to other publishers and agencies who wish to provide access to
the texts from their own system. We will continue to expand the
collection as time and resources allow and hope to add ten more volumes
in the next month.
The following (10) texts have been added to the American Verse Project
collection, bringing the total collection to 35. As before, all are
part of a searchable collection; also, each can be browsed in HTML or
can be retrieved in its entirety in SGML (TEI encoding). In this
release we include two more works by African-American women, and will
soon release three more works (noted at the end of this announcement).
We are also, with this release, including a list of nearly 400 American
poets who have published material before 1920.
http://www.hti.umich.edu/english/amverse/hyperbib.html
We will continue to add names to the list and hope to gradually expand the
list to include bibliographies for the poets and to link to other
materials on the 'net which are not a part of the American Verse Project.
The trilingual HTML edition of Rene Descartes' "Meditations on First
Philosophy" is now available at:
http://philos.wright.edu/DesCartes/Meditations.html
The texts are:
1) The 1641 Latin
2) The 1647 Duc de Luynes French Translation [corrected by Descartes]
3) The 1901 John Veitch English Translation
Paragraph by paragraph cross-navigational links are provided. The
paragraphs have also been numbered -- using the Latin edition for
"paragraph authority" -- to facilitate references to these texts.
Further information about the Linguistic Data Consortium and its available
corpora can be found on the Linguistic Data Consortium WWW Home Page at
http://www.cis.upenn.edu/~ldc.
Information is also available via ftp
at ftp.cis.upenn.edu under pub/ldc; for ftp access, please use
"anonymous" as your login name, and give your email address when asked
for a password.
Edmund Spenser's 1591 Complaints
has been completed in html and is now accessible from the URL:
http://darkwing.uoregon.edu/~rbear/complaints.html
The Center for Electronic Texts in the Humanities (CETH) is pleased to announce
the availability on the World-Wide Web of three pilot projects in SGML markup
according to the guidelines of the TEI (Text Encoding Initiative). These are
the first in what we hope will be a continuing series of projects to
demonstrate various ways of using TEI encoding to create Humanities resources.
The projects' front page is at URL:
http://www.ceth.rutgers.edu/projects/hercproj/front.htm
A new WWW edition of Sir Philip Sidney's pageant The Lady of May
is now available at the URL:
http://darkwing.uoregon.edu/~rbear/may.html
It includes an introduction, which can be skipped over with a click :-),
and clickable notes.
Date: Sun, 26 May 1996
Humanist Discussion Group, Vol. 10, No. 56.
This is to announce a new project launched by the "Centro
Linceo Interdisciplinare" of the "Accademia Nazionale dei
Lincei" [via della Lungara, 10 - 00165 Roma]
The project, named "Archivio Testuale Multimediale" (ARTEM),
will pursue three main goals:
1) To build a repository of electronic texts in Italian,
selected on the basis of the best editorial reliability, and
fully encoded according to the best standards available.
The repository will be freely accessible on the World Wide Web.
2) To link the repository to other similar ones offering
the same scientific reliability.
3) To build a catalogue of existing electronic texts in
Italian, providing a statement of their editorial
reliability and encoding methodology, and stating whether and how
they are available.
Special attention is devoted to the problems of encoding,
which follows SGML procedures according to the standards
proposed by the TEI. The preliminary analysis of textual features
used to obtain the full list of elements to encode will be
documented and discussed.
Collaboration is envisaged with the Oxford Text Archive,
Princeton's CETH, the Tresor de la Langue Francaise,
the Institut fur deutsche Sprache of Mannheim, and all
academic Institutions dealing with electronic texts and
interested in this project.
All those interested in the project, and especially those
who can provide information on existing e-texts in Italian, may
contact the following e-address:
lincei@axcasp.caspur.it
Tito Orlandi,
General Editor, Victorian Women Writers Project
Main Library
Indiana University
Theodore F. Brunner, Director
Thesaurus Linguae Graecae
We have begun full data entry and hope to have the first volumes
ready sometime in the summer.
University of Toronto
Department of French, Trinity College
University of Toronto, Toronto M5S 1H8, Canada
Tel: 1-416-978-2885 -- Fax: 1-416-978-4949
E-mail: wulfric@epas.utoronto.ca
Internet: http://www.epas.utoronto.ca:8080/~wulfric/
A sample GIF of a result screen is at:
http://www.hti.umich.edu/latin/pld/pld-samp.gif
Center for Electronic Texts in the Humanities (Princeton/Rutgers)
Information at http://www.princeton.edu/~mccarty/humanist/
[1] From: orlandi@rmcisadu.let.uniroma1.it
Subject: new project
Accademia dei Lincei,
and Universita di Roma La Sapienza