Wikidata:Lexicographical data/Documentation

This documentation page is currently being reworked. Some important changes may occur.

This is the main documentation page for lexicographical data on Wikidata. Since the new data system is not deployed yet, this documentation is incomplete and mostly based on the test system.

See also the technical documentation for the WikibaseLexeme extension.

Introduction

Data Model

Visualization of the Lexeme data model

The data model of WikibaseLexeme describes the structure of the data that is handled as "Lexemes" in Wikibase. The text below is a summary; for more detailed information, see Extension:WikibaseLexeme/Data Model.

A Lexeme is a lexical element of a language, such as a word, a phrase, or a prefix (see Lexeme on Wikipedia). Lexemes are Entities in the sense of the Wikibase data model. A Lexeme is described using the following information:

An ID. Lexemes have IDs starting with an "L" followed by a natural number in decimal notation, e.g. L3746552. These IDs are unique within the repository that manages the Lexeme. The ID can be combined with a repository's concept base URI to form a unique URI for the Lexeme.
A Lemma for use as a human-readable representation of the lexeme, e.g. "run".
The Language to which the lexeme belongs. This is a reference to a concrete Item, e.g. English (Q1860).
The Lexical category to which the lexeme belongs. This is given as a reference to a concrete Item, e.g. adjective (Q34698).
A list of Lexeme Statements to describe properties of the lexeme that are not specific to a Form or Sense (e.g. derived from, grammatical gender, or syntactic function).
A list of Forms, typically one for each relevant combination of grammatical features, such as 2nd person / singular / past tense. A Form is described using the following information:
- An ID. Forms have IDs starting with the ID of the Lexeme they belong to, followed by a hyphen ("-") and an "F", followed by a natural number in decimal notation, e.g. L3746552-F7.
- A representation, spelling out the Form as a string.
- A list of grammatical features that define the syntactic role to which the given Form applies. These are given as references to concrete Items, e.g. participle (Q814722).
- A list of Form Statements further describing the Form or its relations to other Forms or Items (e.g. IPA transcription (P898), pronunciation audio, rhymes with, used until, used in region)
A list of Senses, describing the different meanings of the lexeme (e.g. "financial institution" and "edge of a body of water" for the English noun bank). A sense is described using the following information:
- An ID. Senses have IDs starting with the ID of the Lexeme they belong to, followed by a hyphen ("-") and an "S", followed by a natural number in decimal notation: e.g. L3746552-S4. These IDs are unique within the repository that manages the Lexeme. The ID can be combined with a repository's concept base URI to form a unique URI for the Sense.
- A Gloss, defining the meaning of the Sense using natural language.
- A list of Sense Statements further describing the Sense and its relations to Senses and Items (e.g. translation, synonym, antonym, connotation, register, denotes, evokes).
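The ID formats described in the list above can be checked mechanically. The following is a minimal sketch, not part of WikibaseLexeme itself: the function names are hypothetical, the pattern assumes that numbers are positive and written without leading zeros, and the default concept base URI is assumed to be the one used by Wikidata.

```python
import re
from typing import NamedTuple, Optional

# Assumption: Wikidata's concept base URI. Other repositories use their own.
WIKIDATA_CONCEPT_BASE_URI = "http://www.wikidata.org/entity/"

# "L" + decimal number, optionally followed by "-F<number>" (Form)
# or "-S<number>" (Sense). Leading zeros are rejected (an assumption).
_ID_PATTERN = re.compile(r"(L[1-9][0-9]*)(?:-([FS])([1-9][0-9]*))?")


class ParsedId(NamedTuple):
    kind: str        # "lexeme", "form" or "sense"
    lexeme_id: str   # ID of the Lexeme the entity belongs to
    full_id: str     # the ID exactly as given


def parse_entity_id(entity_id: str) -> Optional[ParsedId]:
    """Return the parsed ID, or None if it is not a Lexeme, Form or Sense ID."""
    match = _ID_PATTERN.fullmatch(entity_id)
    if match is None:
        return None
    kind = {None: "lexeme", "F": "form", "S": "sense"}[match.group(2)]
    return ParsedId(kind, match.group(1), entity_id)


def entity_uri(entity_id: str, concept_base_uri: str = WIKIDATA_CONCEPT_BASE_URI) -> str:
    """Combine an ID with a repository's concept base URI."""
    if parse_entity_id(entity_id) is None:
        raise ValueError(f"not a Lexeme, Form or Sense ID: {entity_id!r}")
    return concept_base_uri + entity_id
```

For example, `parse_entity_id("L3746552-F7")` reports a Form belonging to Lexeme `L3746552`, while an Item ID such as `Q1860` is rejected.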

This data model is further extended by the set of properties typically used for Lexeme statements, Form statements, and Sense statements. See Wikidata:Lexicographical data/Properties for an overview of these properties and Wikidata:Property proposal/Lexemes for current proposals of additional properties.
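To make the structure of a Lexeme more concrete, here is a simplified sketch of the data model as Python dataclasses. It is an illustration only, not the WikibaseLexeme implementation: all class and method names are hypothetical, statements are reduced to a property and a value (without qualifiers or references), the lemma is a single string, and new Form and Sense IDs are numbered by simple counting. The Lexeme ID reuses the placeholder from the text above, and the Item IDs for noun, singular and plural are given for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Statement:
    property_id: str  # e.g. "P898" (IPA transcription)
    value: str        # simplified: no qualifiers or references


@dataclass
class Form:
    id: str
    representation: str
    grammatical_features: List[str] = field(default_factory=list)  # Item IDs
    statements: List[Statement] = field(default_factory=list)


@dataclass
class Sense:
    id: str
    glosses: Dict[str, str]  # language code -> gloss
    statements: List[Statement] = field(default_factory=list)


@dataclass
class Lexeme:
    id: str
    lemma: str
    language: str          # Item ID, e.g. "Q1860" (English)
    lexical_category: str  # Item ID
    statements: List[Statement] = field(default_factory=list)
    forms: List[Form] = field(default_factory=list)
    senses: List[Sense] = field(default_factory=list)

    def add_form(self, representation: str, features: List[str]) -> Form:
        # Simplified numbering: a real repository assigns the IDs itself.
        form = Form(f"{self.id}-F{len(self.forms) + 1}", representation, list(features))
        self.forms.append(form)
        return form

    def add_sense(self, glosses: Dict[str, str]) -> Sense:
        sense = Sense(f"{self.id}-S{len(self.senses) + 1}", dict(glosses))
        self.senses.append(sense)
        return sense


def example_lexeme() -> Lexeme:
    """Build the English noun "bank" with two Forms and two Senses."""
    bank = Lexeme(id="L3746552", lemma="bank", language="Q1860", lexical_category="Q1084")
    bank.add_form("bank", ["Q110786"])   # singular (illustrative Item ID)
    bank.add_form("banks", ["Q146786"])  # plural (illustrative Item ID)
    bank.add_sense({"en": "financial institution"})
    bank.add_sense({"en": "edge of a body of water"})
    return bank
```

Keeping Forms and Senses as lists owned by the Lexeme mirrors the way their IDs are derived from the Lexeme ID.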

Sample Lexeme by Language and Lexical Category
	verb	noun	pronoun	adjective	adverb	preposition	postposition	conjunction	interjection	numeral	determiner	grammatical particle
Arabic	ذهب	كتاب	انا	جميل	عادةً	في		لكن‎ (بس)	يعني	واحد	هذا
English	go	book	I	beautiful	usually	in		but	oh	one	this
German	wissen	Zukunft	ich	ausgezeichnet	querbeet	in		aber	ach	eins	dieser
Korean	먹다	사람	나	괴롭다	함께				가만	극	고전적
French	aller	livre	je	beau	toujours	dans		mais	merci	un	ce
Pashto	تلل	کتاب	زه	ښکلی		په		خو		یو
Persian	رفتن	کتاب	من	زیبا		در	را	اما	آخ	یک	این
Russian	быть	вода	я	хороший	хорошо	в	-	и	всё	три	-	не

In some cases or languages, there may be multiple entities for related words; in others, just one. The table below gives an overview of how they may be linked:

One or several lexemes for nouns?
difference in	1 lexeme		2+ lexemes
sense	add several senses		add applicable sense to lexeme	link other(s) with homograph lexeme	duplicate forms on each
etym.	add etym. to each sense		add etym. to lexeme base	link other(s) with homograph lexeme	duplicate forms on each
gender	add gender to each sense		add gender to lexeme base	link other(s) with homograph lexeme	duplicate forms on each
common/proper	add several senses	use lexical category "noun"	add applicable sense to lexeme	link other(s) with homograph lexeme	duplicate forms on each
caps/lowercase	add several forms	qualify forms to applicable senses	add applicable sense to lexeme	link other(s) with homograph lexeme	add only applicable forms
singular/plural	add several forms	qualify forms to applicable senses	add applicable sense	if possible link other(s) with homograph lexeme	add only applicable forms
pronunciation	add the same form twice	qualify forms to applicable senses, add pronunciation	add applicable sense	if possible link other(s) with homograph lexeme	add form and applicable pronunciation
forms/spelling	add several forms or alternate forms	qualify forms to applicable senses	add applicable sense	if possible link other(s) with homograph lexeme	add only applicable forms

For a given language and criterion (first column), only one of the two approaches may apply.

Interface

Lexeme

Screenshot of the Lexeme creation page

Create a new Lexeme

Go to Special:NewLexeme
Enter a lemma (dictionary form of a word)
Enter the language of the lexeme by typing the name of the language or Q-ID
In the field that appears above, enter the language code of the lemma
Enter the lexical category by typing its name or the Q-ID (example: verb, noun, adjective...)
Click on "Create"
The Lexeme is now created with this basic information; you can continue editing it

Screenshot of the top of a Lexeme page

Edit a Lexeme

Click on the edit button, next to the lemma
Edit the content of the different fields
- Lemma
- Language code of the lemma
- Language of the Lexeme
- Lexical category
Click on "publish"

Screenshot of the interface to edit a statement

Add, edit or delete statements of a Lexeme

To add a statement of a Lexeme, click on "add statement"
Enter a property: start typing its name in the property field (example: derived-from) and select it in the suggester
Enter a value
Just like on Items, you can add qualifiers and references
Save by clicking "publish"
To edit a statement, click on "edit"
To delete a statement, click on "edit", then "remove"

Delete a Lexeme

To request the deletion of a Lexeme, go to WD:RFD (Wikidata:Requests for deletions)

Search for a Lexeme

Here's how you can look for Lexemes, Lemmas, Forms or Senses, via Special:Search or the search box on any page:

Look for a Lexeme by its L-number
- by typing "Lexeme:L123"
- by typing "L123" and selecting the Lexeme namespace
Look for a Lexeme by the name of its lemma
- by typing "Lexeme:sandbox"
- by typing "sandbox" and selecting the Lexeme namespace
Use the L shortcut: "L:L123" or "L:sandbox"
Look for a Form (e.g. "Lexeme:mangeant") with any of the methods described above

Note that the selector (the drop-down menu that suggests results) is not working yet. If you press Enter or click Search after typing your keyword, you will still reach the results.
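The search strings described above can also be assembled programmatically, for instance to build links to result pages. This is a small sketch with hypothetical helper names; the base URL and the `title`/`search` parameters follow the usual MediaWiki search URL layout and are an assumption, not something this page specifies.

```python
from urllib.parse import urlencode

# Assumption: standard MediaWiki entry point on Wikidata.
WIKIDATA_INDEX = "https://www.wikidata.org/w/index.php"


def lexeme_search_term(keyword: str, use_shortcut: bool = False) -> str:
    """Prefix a keyword (L-number, lemma or Form) with the Lexeme namespace."""
    prefix = "L:" if use_shortcut else "Lexeme:"
    return prefix + keyword.strip()


def lexeme_search_url(keyword: str, use_shortcut: bool = False) -> str:
    """Build a Special:Search URL for the given keyword."""
    params = {
        "title": "Special:Search",
        "search": lexeme_search_term(keyword, use_shortcut),
    }
    return WIKIDATA_INDEX + "?" + urlencode(params)
```

`lexeme_search_term("L123")` gives "Lexeme:L123", and `lexeme_search_term("sandbox", use_shortcut=True)` gives "L:sandbox".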

Form

add a Form

Create a new Form

In the Forms section, click on "add Form"
Fill in the representation (mandatory)
Fill in the language code of the representation (mandatory)
Enter one or several grammatical features, by typing their name and selecting them in the list of items

Edit a Form

Click on the edit button next to the representation
Modify the content in the fields
Click on "publish"

Delete a Form

Click on the edit button next to the representation
Click on "remove"

Sense

Create a new Sense

In the Senses section of a Lexeme, click on "add Sense"
Enter a language code (for example: en, fr, zh)
Enter a gloss (very short phrase defining the meaning)
You can add new glosses by clicking on "add"
Click on "publish"
Now that the Sense is created, you can add statements to it

Edit a Sense

Click on the edit button, next to the Sense ID
Edit the content of the different fields
Click on "publish"

Remove a Sense

Click on the edit button, next to the Sense ID
Click on "remove"

Features

What is included in the first version

New datatypes: Lexeme, Form
Add, edit, delete Lexemes
Add, edit, delete Forms
Add, edit, delete statements
Add, edit, delete qualifiers
Add, edit, delete references
Linking to an Item from a Lexeme or a Form
Linking to another Lexeme from a Lexeme, a Form or an Item
Search and suggestions when entering a value
Basic internal APIs (used by the UI; you should not rely on them)

What will be added in the future

Ordered from near-term to long-term plans

Search for content with Special:Search (done)
Display the lemma in the history pages, recent changes and watchlist (done)
Add, edit, delete Senses (done)
RDF support and ability to query the data on query.wikidata.org (done)
Better API support
Automatic generation of Forms
Data access on clients (other Wikimedia projects)
Editing data directly from Wiktionary
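As an illustration of the query support listed above, the sketch below builds a SPARQL query for Lexemes of a given language and the matching request URL for query.wikidata.org. The helper names are hypothetical, and the vocabulary (`ontolex:LexicalEntry`, `dct:language`, `wikibase:lemma`) reflects the Lexeme RDF mapping as an assumption that this page does not spell out; check the RDF documentation before relying on it. The code only constructs strings and sends no request.

```python
import re
from urllib.parse import urlencode

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def lexemes_by_language_query(language_item: str, limit: int = 10) -> str:
    """Return a SPARQL query listing Lexemes and lemmas for one language Item."""
    if re.fullmatch(r"Q[1-9][0-9]*", language_item) is None:
        raise ValueError(f"not an Item ID: {language_item!r}")
    if limit < 1:
        raise ValueError("limit must be positive")
    # Prefixes are declared explicitly so the query is self-contained.
    return f"""PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?lexeme ?lemma WHERE {{
  ?lexeme a ontolex:LexicalEntry ;
          dct:language wd:{language_item} ;
          wikibase:lemma ?lemma .
}}
LIMIT {limit}"""


def query_url(query: str) -> str:
    """Build a GET URL that asks the endpoint for JSON results."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
```

For example, `query_url(lexemes_by_language_query("Q1860"))` yields a URL that requests ten English Lexemes with their lemmas.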
