English-Homer

Data

What: Complete English translations of the Odyssey in txt format
Scope: 15+ major translations spanning from 1615 (Chapman) to 2018 (Green)
Format: UTF-8 encoded plain text files with consistent formatting
Processing:
- Cleaned of paratextual elements (introductions, notes, appendices)
- Structurally normalized (book divisions, line breaks, paragraphs)
- Semi-tagged using lightweight XML for character identifications, speeches, and epithets
- Annotated with book/chapter markers for cross-translation alignment

What: Comprehensive dataset of Homeric reception in English literature and translations
Structure: CSV with metadata on each work
Data fields:
- title: string - Full title of the work
- author: string - Translator or author name
- year: integer - Publication year
- period: string - Literary period (Renaissance, Victorian, Modern, etc.)
- span: string - Complete/Partial translation or adaptation
- verse: boolean - Whether in verse (true) or prose (false)
- publisher: string - Publishing house
- edition: string - Edition information
- source: string - Greek source text used (if specified)
- language: string - Original language of composition
- country: string - Country of publication
- reception: string - Brief classification of reception type (scholarly, popular, experimental)
- gender: string - Translator/author gender
- academic: boolean - Academic or non-academic background
- influence_score: float - Metric of cultural influence (citations, reprints)

What: Comparative metrics and key textual features across translations
Format: CSV/JSON for structured comparison
Features:
- title: string - Translation title
- author: string - Translator name
- year: integer - Publication year
- first_lines_book_I … first_lines_book_XXIV: string - Opening lines of each book
- Epithets by character (epithet_Odysseus, epithet_Penelope, etc.): dictionary - All epithets with frequencies
- num_lines: integer - Total line count
- num_words: integer - Total word count
- avg_sentence_length: float - Average sentence length
- lexical_diversity: float - Type-token ratio
- named_entities: dictionary - Character name variants with frequencies
- speech_ratio: float - Percentage of text in direct speech
- modernization_index: float - Computed metric of contemporary language use
- network_density: float - Density measure of character interaction network
- character_centrality: dictionary - Eigenvector centrality scores for main characters

What: Extracted interaction data between characters
Format: Edge list and node attributes in CSV/GEXF format
Features:
- nodes: Character identifiers with attributes (gender, mortality, divinity)
- edges: Interactions between characters
- weight: Frequency/intensity of interactions
- type: Nature of relationship (familial, antagonistic, supportive)
- book: Book number where interaction occurs
- initiator: Which character initiates interaction

What: Stylometric and linguistic analysis data
Format: JSON with nested metrics
Features:
- Readability scores (Flesch-Kincaid, SMOG)
- POS distribution
- Sentiment analysis by book and character
- Archaic language markers
- Topic modeling results
- Word embeddings for key concepts (nostos, xenia, etc.)

This site is open source. Improve this page.