Configuration

All settings can be overridden with DOKIMOS_-prefixed environment variables.
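To see which overrides are actually in effect before a run, filter the shell environment. This is a minimal sketch. `dokimos_overrides` is a hypothetical helper, not part of dokimos. It assumes dokimos reads its settings from the environment it inherits, so only exported variables count. Any setting the helper does not list falls back to the default shown in the tables below.

```shell
# Hypothetical helper (not part of dokimos): list the DOKIMOS_ overrides that a
# child process such as dokimos would inherit. Only exported variables appear;
# a plain `DOKIMOS_X=...` assignment without `export` stays in the current shell.
dokimos_overrides() {
  env | grep '^DOKIMOS_' | sort
}
```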

Core Settings

Variable | Default | Description
DOKIMOS_LOG_LEVEL | INFO | Default application log level
DOKIMOS_CORPUS_PATH | corpus | Directory used for the internal source corpus
DOKIMOS_INDEX_FILE | corpus/index.json | JSON index file path
DOKIMOS_OUTPUT_DIR | output | Default output directory for reports

Chunking Settings

Variable | Default | Description
DOKIMOS_CHUNK_STRATEGY | paragraph | Chunking strategy: paragraph, sentence, or fixed
DOKIMOS_CHUNK_SIZE | 500 | Approximate number of words per chunk when the strategy is fixed
DOKIMOS_CHUNK_OVERLAP | 50 | Overlap between consecutive fixed chunks
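The fixed strategy can be pictured as a sliding window over the word sequence. This sketch assumes that each chunk spans exactly DOKIMOS_CHUNK_SIZE words. It also assumes that the next chunk starts DOKIMOS_CHUNK_SIZE minus DOKIMOS_CHUNK_OVERLAP words later, with the overlap counted in words. The table only promises an approximate size, so dokimos may place boundaries differently. `fixed_chunk_ranges` is a hypothetical helper.

```shell
# Hypothetical sketch: 1-based word ranges produced by fixed chunking, assuming
# each chunk spans SIZE words and the next starts SIZE - OVERLAP words later.
# Defaults come from the documented variables, then from the table defaults.
# Usage: fixed_chunk_ranges TOTAL_WORDS [SIZE] [OVERLAP]
fixed_chunk_ranges() {
  total=$1
  size=${2:-${DOKIMOS_CHUNK_SIZE:-500}}
  overlap=${3:-${DOKIMOS_CHUNK_OVERLAP:-50}}
  step=$((size - overlap))
  if [ "$step" -le 0 ]; then
    echo "overlap must be smaller than chunk size" >&2
    return 1
  fi
  start=1
  while [ "$start" -le "$total" ]; do
    end=$((start + size - 1))
    if [ "$end" -gt "$total" ]; then end=$total; fi   # last chunk may be shorter
    printf '%s-%s\n' "$start" "$end"
    if [ "$end" -eq "$total" ]; then break; fi        # text fully covered
    start=$((start + step))
  done
}
```

With the defaults, `fixed_chunk_ranges 1200` prints `1-500`, `451-950` and `901-1200`. That gives three chunks, and each one shares 50 words with its predecessor.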

Plagiarism Settings

Variable | Default | Description
DOKIMOS_PLAGIARISM_BACKEND | hybrid | Matching backend: local, remote, or hybrid
DOKIMOS_SHINGLE_SIZE | 5 | Number of words per shingle
DOKIMOS_PLAGIARISM_JACCARD_THRESHOLD | 0.10 | Minimum Jaccard similarity for a candidate to be retained
DOKIMOS_PLAGIARISM_FUZZ_THRESHOLD | 60.0 | Minimum RapidFuzz score (0 to 100) for a reranked match to be kept
DOKIMOS_PLAGIARISM_REMOTE_QUERY_MAX_CHUNKS | 5 | Maximum number of input chunks used as remote search queries
DOKIMOS_PLAGIARISM_REMOTE_QUERY_MAX_CHARS | 180 | Maximum characters sent in each remote query
DOKIMOS_PLAGIARISM_REMOTE_PER_PROVIDER_RESULTS | 3 | Maximum candidate sources requested from each provider per query
DOKIMOS_PLAGIARISM_REMOTE_TIMEOUT_SECONDS | 10.0 | Remote request timeout, in seconds
DOKIMOS_PLAGIARISM_REMOTE_MAX_SOURCE_CHARS | 20000 | Maximum characters of text retained per fetched remote source
DOKIMOS_PLAGIARISM_REMOTE_FETCH_FULL_TEXT | true | Attempt to fetch full source pages or PDFs instead of relying on metadata alone
DOKIMOS_PLAGIARISM_REMOTE_CONTACT_EMAIL | (unset) | Optional contact email, included in requests to providers whose etiquette recommends one
DOKIMOS_PLAGIARISM_REMOTE_ENABLE_OPENALEX | true | Enable the OpenAlex provider
DOKIMOS_PLAGIARISM_REMOTE_ENABLE_CROSSREF | true | Enable the Crossref provider
DOKIMOS_PLAGIARISM_REMOTE_ENABLE_ARXIV | true | Enable the arXiv provider
DOKIMOS_PLAGIARISM_REMOTE_ENABLE_DUCKDUCKGO | true | Enable the DuckDuckGo HTML web-search provider
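The shingle and Jaccard settings describe a set-overlap comparison. Each text is cut into runs of DOKIMOS_SHINGLE_SIZE consecutive words. Similarity is the size of the intersection of the two shingle sets divided by the size of their union. The awk sketch below computes that score under stated assumptions: it lowercases and splits on whitespace, which may not match dokimos's own normalization. `jaccard_shingles` is a hypothetical helper, and the RapidFuzz reranking step is not modelled.

```shell
# Hypothetical sketch: Jaccard similarity of word k-shingle sets.
# Tokenization is lowercase + whitespace split (an assumption; dokimos's
# normalization is not documented here).
# Usage: jaccard_shingles TEXT_A TEXT_B [K]   (K defaults to 5)
jaccard_shingles() {
  {
    printf '%s' "$1" | tr '\n' ' '; echo   # flatten text A onto line 1
    printf '%s' "$2" | tr '\n' ' '; echo   # flatten text B onto line 2
  } | awk -v k="${3:-5}" '
    {
      n = split(tolower($0), w)            # whitespace-separated words
      for (i = 1; i + k - 1 <= n; i++) {   # every window of k consecutive words
        s = w[i]
        for (j = i + 1; j < i + k; j++) s = s " " w[j]
        if (NR == 1) a[s] = 1; else b[s] = 1
      }
    }
    END {
      inter = 0; uni = 0
      for (s in a) { uni++; if (s in b) inter++ }
      for (s in b) if (!(s in a)) uni++
      printf "%.2f\n", (uni > 0 ? inter / uni : 0)
    }'
}
```

For example, `jaccard_shingles 'a b c d e f' 'a b c d e g'` finds one shared shingle out of three distinct ones and prints `0.33`. That score would clear the default 0.10 cutoff.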

AI-Likeness Settings

Variable | Default | Description
DOKIMOS_AI_SHORT_SENTENCE_THRESHOLD | 8 | Sentences with this many words or fewer are considered short
DOKIMOS_AI_LONG_SENTENCE_THRESHOLD | 40 | Sentences with this many words or more are considered long
DOKIMOS_AI_SIGNAL_TRIGGER_THRESHOLD | 0.5 | Per-signal threshold for marking a signal as triggered
DOKIMOS_AI_RISK_HIGH_THRESHOLD | 0.6 | Aggregate score threshold for high risk
DOKIMOS_AI_RISK_MEDIUM_THRESHOLD | 0.3 | Aggregate score threshold for medium risk
DOKIMOS_AI_SHORT_DOCUMENT_WORDS | 80 | Documents with fewer words than this receive the short-document caveat
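One plausible reading of the two risk thresholds is a banding of the aggregate score. This sketch assumes inclusive comparisons, so a score equal to a threshold lands in the higher band. It also labels anything under the medium cutoff `low`. Both choices are assumptions, as is the helper name `ai_risk_band`. When no explicit thresholds are passed, it reads the documented environment variables.

```shell
# Hypothetical sketch: map an aggregate AI-likeness score to a risk band.
# Assumes ">=" comparisons and a "low" label below the medium threshold.
# Usage: ai_risk_band SCORE [HIGH_THRESHOLD] [MEDIUM_THRESHOLD]
ai_risk_band() {
  awk -v s="$1" \
      -v hi="${2:-${DOKIMOS_AI_RISK_HIGH_THRESHOLD:-0.6}}" \
      -v med="${3:-${DOKIMOS_AI_RISK_MEDIUM_THRESHOLD:-0.3}}" 'BEGIN {
    if (s + 0 >= hi + 0)       print "high"     # at or above the high cutoff
    else if (s + 0 >= med + 0) print "medium"   # between the two cutoffs
    else                       print "low"      # below the medium cutoff
  }'
}
```

With the defaults, scores of 0.72, 0.45 and 0.1 map to `high`, `medium` and `low` respectively.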

Typical Workflows

Analyze a single paper against free remote sources

dokimos analyze submissions/paper-01.pdf --format json

Combine remote discovery with a local course corpus

export DOKIMOS_INDEX_FILE=/tmp/course-a-index.json
dokimos index-sources ./course-a-corpus
dokimos analyze submissions/paper-01.pdf --format json

Force remote-only analysis

export DOKIMOS_PLAGIARISM_BACKEND=remote
dokimos analyze essay.txt --format json
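A remote-only run can be narrowed with the provider switches from the plagiarism settings. This fragment keeps the scholarly APIs and drops the web-search provider. It assumes the boolean switches accept `false`, since the table only shows the default `true`. `you@example.edu` is a placeholder address.

```shell
# Remote matching against scholarly providers only (OpenAlex, Crossref, arXiv stay enabled).
export DOKIMOS_PLAGIARISM_BACKEND=remote
export DOKIMOS_PLAGIARISM_REMOTE_ENABLE_DUCKDUCKGO=false          # assumed boolean syntax
export DOKIMOS_PLAGIARISM_REMOTE_CONTACT_EMAIL=you@example.edu    # placeholder address
```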

Force local-only offline analysis

export DOKIMOS_PLAGIARISM_BACKEND=local
dokimos index-sources ./course-a-corpus
dokimos analyze essay.txt --format json
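These workflows use `export`, so a backend chosen in one workflow stays in force for every later command in the same shell session. The sketch below shows this without dokimos installed. `sh -c "$probe"` is a stand-in for a dokimos invocation: it prints the backend a child process would see, falling back to the documented default `hybrid`.

```shell
# Stand-in for a dokimos invocation: print the backend a child process sees.
probe='printf "%s\n" "${DOKIMOS_PLAGIARISM_BACKEND:-hybrid}"'

unset DOKIMOS_PLAGIARISM_BACKEND
sh -c "$probe"                                        # hybrid
export DOKIMOS_PLAGIARISM_BACKEND=remote
sh -c "$probe"                                        # remote, and for every later command too
env DOKIMOS_PLAGIARISM_BACKEND=local sh -c "$probe"   # local, for this one command only
sh -c "$probe"                                        # remote again
unset DOKIMOS_PLAGIARISM_BACKEND                      # back to the default
```

To run a single local-only analysis without changing the session, prefix the command instead: `DOKIMOS_PLAGIARISM_BACKEND=local dokimos analyze essay.txt --format json`.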

Use sentence chunking for a short-form writing set

export DOKIMOS_CHUNK_STRATEGY=sentence
dokimos analyze essay.txt --format json

Generate a saved report for later inspection

dokimos analyze essay.txt --json-out reports/essay.json