Contents¶
FCS SRU Server¶
Based on Java implementation git commit: 0091fca0a4add134c478beed422dd1399a5364e3
Differences:
a bit more pythonic (naming, interfaces, enums etc.)
no auth stuff yet
output buffering is WIP: the server framework might not allow it, so there is no streaming and everything is held in memory until sent
server framework choice (WSGI, ASGI): werkzeug for now
TODO: refactor to allow async variants for streaming responses (large resources), e.g. with starlette
Summary¶
This package implements the server-side part of the SRU/CQL protocol (SRU/S) and conforms to SRU version 1.1 and 1.2. SRU version 2.0 is mostly implemented but might be missing some more obscure features. The library will handle most of the protocol-related tasks for you and you’ll only need to implement a few classes to connect your search engine. However, the library will not save you from doing your SRU/CQL homework (i.e. you’ll need to have at least some understanding of the protocol and adhere to the protocol semantics). Furthermore, you need to have at least some basic understanding of Python web application development (WSGI in particular) to use this library.
More Information about SRU/CQL: http://www.loc.gov/standards/sru/
The implementation is designed to make very minimal assumptions about the environment it’s deployed in. For interfacing with your search engine, you need to implement the SRUSearchEngine interface. At minimum, you’ll need to implement the search() method. Please check the Python API documentation for further details about this interface.
The SRUServer implements the SRU protocol and uses your supplied search engine implementation to talk to your search engine. The SRUServer is configured using a SRUServerConfig instance. The SRUServerConfig reads an XML document, which contains the (static) server configuration. It must conform to the sru-server-config.xsd schema in the src/clarin/sru/xml/ directory.
Installation¶
# from github/source
python3 -m pip install 'fcs-sru-server @ git+https://github.com/Querela/fcs-sru-server-python.git'
# (locally) built package
python3 -m pip install dist/fcs_sru_server-<version>-py2.py3-none-any.whl
# or
python3 -m pip install dist/fcs-sru-server-<version>.tar.gz
# for local development
python3 -m pip install -e .
In setup.cfg:
[options]
install_requires =
fcs-sru-server @ git+https://github.com/Querela/fcs-sru-server-python.git
Build source/binary distribution¶
python3 -m pip install build
python3 -m build
Development¶
Uses pytest (with coverage, clarity and randomly plugins).
python3 -m pip install -e .[test]
pytest
Run style checks:
# general style checks
python3 -m pip install -e .[style]
black --check .
flake8 . --show-source --statistics
isort --check --diff .
mypy .
# build the package and check metadata
python3 -m pip install -e .[build]
python3 -m build
twine check --strict dist/*
# build documentation and check links ...
python3 -m pip install -e .[docs]
sphinx-build -b html docs dist/docs
sphinx-build -b linkcheck docs dist/docs
Build documentation¶
python3 -m pip install -r ./docs/requirements.txt
# or
python3 -m pip install -e .[docs]
sphinx-build -b html docs dist/docs
sphinx-build -b linkcheck docs dist/docs
See also¶
Reference¶
clarin.sru.constants¶
- class clarin.sru.constants.SRUOperation(value)[source]¶
-
SRU operation
- EXPLAIN = 'explain'¶
An explain operation
- SEARCH_RETRIEVE = 'searchRetrieve'¶
A searchRetrieve operation
- SCAN = 'scan'¶
A scan operation
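The operation parameter of a request arrives as a plain string, so it has to be mapped onto the enumeration. The following is a minimal sketch of that mapping. It uses a local stand-in enum that copies the three documented members instead of importing the package, and parse_operation is a hypothetical helper, not part of the library.

```python
from enum import Enum
from typing import Optional


class Operation(Enum):
    """Local stand-in that mirrors the documented SRUOperation members."""

    EXPLAIN = "explain"
    SEARCH_RETRIEVE = "searchRetrieve"
    SCAN = "scan"


def parse_operation(raw: Optional[str]) -> Optional[Operation]:
    """Hypothetical helper: map a raw parameter value to a member, or None."""
    if raw is None:
        return None
    try:
        # Enum lookup by value raises ValueError for unknown values.
        return Operation(raw)
    except ValueError:
        return None
```

Returning None for an unknown value leaves it to the caller to decide how to report the problem, for example with an unsupported-operation diagnostic.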
- class clarin.sru.constants.SRUQueryType(value)[source]¶
-
An enumeration.
- CQL = 'cql'¶
shorthand queryType identifier for CQL
- SEARCH_TERMS = 'searchTerms'¶
- class clarin.sru.constants.SRURecordPacking(value)[source]¶
-
SRU 2.0 record packing.
- PACKED = 'packed'¶
The client requests that the server should supply records strictly according to the requested schema.
- UNPACKED = 'unpacked'¶
The server is free to allow the location of application data to vary within the record.
- class clarin.sru.constants.SRURecordXmlEscaping(value)[source]¶
-
SRU Record XML escaping (or record packing in SRU 1.2).
- XML = 'xml'¶
XML record packing
- STRING = 'string'¶
String record packing
- class clarin.sru.constants.SRURenderBy(value)[source]¶
-
Specifies whether the client or the server renders the response with the stylesheet (renderedBy parameter).
- CLIENT = 'client'¶
The client requests that the server simply return this URL in the response, in the href attribute of the xml-stylesheet processing instruction before the response xml.
- SERVER = 'server'¶
The client requests that the server format the response according to the specified stylesheet, assuming the default SRU response schema as input to the stylesheet.
- class clarin.sru.constants.SRUResultCountPrecision(value)[source]¶
-
(SRU 2.0) Indicates the accuracy of the reported result count, i.e. the total number of records that matched the query.
- EXACT = 'exact'¶
The server guarantees that the reported number of records is accurate.
- UNKNOWN = 'unknown'¶
The server has no idea what the result count is, and does not want to venture an estimate.
- ESTIMATE = 'estimate'¶
The server does not know the result set count, but offers an estimate.
- MAXIMUM = 'maximum'¶
The value supplied is an estimate of the maximum possible count that the result set will attain.
- MINIMUM = 'minimum'¶
The server does not know the result count but guarantees that it is at least this large.
- CURRENT = 'current'¶
The value supplied is an estimate of the count at the time the response was sent, however the result set may continue to grow.
- class clarin.sru.constants.SRUVersion(value)[source]¶
-
SRU version
- VERSION_1_1 = '1.1'¶
- VERSION_1_2 = '1.2'¶
- VERSION_2_0 = '2.0'¶
- class clarin.sru.constants.SRUDiagnostics(value)[source]¶
-
Constants for SRU diagnostics
See also
SRU Diagnostics: http://www.loc.gov/standards/sru/diagnostics/
SRU Diagnostics List: http://www.loc.gov/standards/sru/diagnostics/diagnosticsList.html
- GENERAL_SYSTEM_ERROR = 'info:srw/diagnostic/1/1'¶
- SYSTEM_TEMPORARILY_UNAVAILABLE = 'info:srw/diagnostic/1/2'¶
- AUTHENTICATION_ERROR = 'info:srw/diagnostic/1/3'¶
- UNSUPPORTED_OPERATION = 'info:srw/diagnostic/1/4'¶
- UNSUPPORTED_VERSION = 'info:srw/diagnostic/1/5'¶
- UNSUPPORTED_PARAMETER_VALUE = 'info:srw/diagnostic/1/6'¶
- MANDATORY_PARAMETER_NOT_SUPPLIED = 'info:srw/diagnostic/1/7'¶
- UNSUPPORTED_PARAMETER = 'info:srw/diagnostic/1/8'¶
- DATABASE_DOES_NOT_EXIST = 'info:srw/diagnostic/1/235'¶
- QUERY_SYNTAX_ERROR = 'info:srw/diagnostic/1/10'¶
- TOO_MANY_CHARACTERS_IN_QUERY = 'info:srw/diagnostic/1/12'¶
- INVALID_OR_UNSUPPORTED_USE_OF_PARENTHESES = 'info:srw/diagnostic/1/13'¶
- INVALID_OR_UNSUPPORTED_USE_OF_QUOTES = 'info:srw/diagnostic/1/14'¶
- UNSUPPORTED_CONTEXT_SET = 'info:srw/diagnostic/1/15'¶
- UNSUPPORTED_INDEX = 'info:srw/diagnostic/1/16'¶
- UNSUPPORTED_COMBINATION_OF_INDEXES = 'info:srw/diagnostic/1/18'¶
- UNSUPPORTED_RELATION = 'info:srw/diagnostic/1/19'¶
- UNSUPPORTED_RELATION_MODIFIER = 'info:srw/diagnostic/1/20'¶
- UNSUPPORTED_COMBINATION_OF_RELATION_MODIFERS = 'info:srw/diagnostic/1/21'¶
- UNSUPPORTED_COMBINATION_OF_RELATION_AND_INDEX = 'info:srw/diagnostic/1/22'¶
- TOO_MANY_CHARACTERS_IN_TERM = 'info:srw/diagnostic/1/23'¶
- UNSUPPORTED_COMBINATION_OF_RELATION_AND_TERM = 'info:srw/diagnostic/1/24'¶
- NON_SPECIAL_CHARACTER_ESCAPED_IN_TERM = 'info:srw/diagnostic/1/26'¶
- EMPTY_TERM_UNSUPPORTED = 'info:srw/diagnostic/1/27'¶
- MASKING_CHARACTER_NOT_SUPPORTED = 'info:srw/diagnostic/1/28'¶
- MASKED_WORDS_TOO_SHORT = 'info:srw/diagnostic/1/29'¶
- TOO_MANY_MASKING_CHARACTERS_IN_TERM = 'info:srw/diagnostic/1/30'¶
- ANCHORING_CHARACTER_NOT_SUPPORTED = 'info:srw/diagnostic/1/31'¶
- ANCHORING_CHARACTER_IN_UNSUPPORTED_POSITION = 'info:srw/diagnostic/1/32'¶
- COMBINATION_OF_PROXIMITY_ADJACENCY_AND_MASKING_CHARACTERS_NOT_SUPPORTED = 'info:srw/diagnostic/1/33'¶
- COMBINATION_OF_PROXIMITY_ADJACENCY_AND_ANCHORING_CHARACTERS_NOT_SUPPORTED = 'info:srw/diagnostic/1/34'¶
- TERM_CONTAINS_ONLY_STOPWORDS = 'info:srw/diagnostic/1/35'¶
- TERM_IN_INVALID_FORMAT_FOR_INDEX_OR_RELATION = 'info:srw/diagnostic/1/36'¶
- UNSUPPORTED_BOOLEAN_OPERATOR = 'info:srw/diagnostic/1/37'¶
- TOO_MANY_BOOLEAN_OPERATORS_IN_QUERY = 'info:srw/diagnostic/1/38'¶
- PROXIMITY_NOT_SUPPORTED = 'info:srw/diagnostic/1/39'¶
- UNSUPPORTED_PROXIMITY_RELATION = 'info:srw/diagnostic/1/40'¶
- UNSUPPORTED_PROXIMITY_DISTANCE = 'info:srw/diagnostic/1/41'¶
- UNSUPPORTED_PROXIMITY_UNIT = 'info:srw/diagnostic/1/42'¶
- UNSUPPORTED_PROXIMITY_ORDERING = 'info:srw/diagnostic/1/43'¶
- UNSUPPORTED_COMBINATION_OF_PROXIMITY_MODIFIERS = 'info:srw/diagnostic/1/44'¶
- UNSUPPORTED_BOOLEAN_MODIFIER = 'info:srw/diagnostic/1/46'¶
- CANNOT_PROCESS_QUERY_REASON_UNKNOWN = 'info:srw/diagnostic/1/47'¶
- QUERY_FEATURE_UNSUPPORTED = 'info:srw/diagnostic/1/48'¶
- MASKING_CHARACTER_IN_UNSUPPORTED_POSITION = 'info:srw/diagnostic/1/49'¶
- RESULT_SETS_NOT_SUPPORTED = 'info:srw/diagnostic/1/50'¶
- RESULT_SET_DOES_NOT_EXIST = 'info:srw/diagnostic/1/51'¶
- RESULT_SET_TEMPORARILY_UNAVAILABLE = 'info:srw/diagnostic/1/52'¶
- RESULT_SETS_ONLY_SUPPORTED_FOR_RETRIEVAL = 'info:srw/diagnostic/1/53'¶
- COMBINATION_OF_RESULT_SETS_WITH_SEARCH_TERMS_NOT_SUPPORTED = 'info:srw/diagnostic/1/55'¶
- RESULT_SET_CREATED_WITH_UNPREDICTABLE_PARTIAL_RESULTS_AVAILABLE = 'info:srw/diagnostic/1/58'¶
- RESULT_SET_CREATED_WITH_VALID_PARTIAL_RESULTS_AVAILABLE = 'info:srw/diagnostic/1/59'¶
- RESULT_SET_NOT_CREATED_TOO_MANY_MATCHING_RECORDS = 'info:srw/diagnostic/1/60'¶
- FIRST_RECORD_POSITION_OUT_OF_RANGE = 'info:srw/diagnostic/1/61'¶
- RECORD_TEMPORARILY_UNAVAILABLE = 'info:srw/diagnostic/1/64'¶
- RECORD_DOES_NOT_EXIST = 'info:srw/diagnostic/1/65'¶
- UNKNOWN_SCHEMA_FOR_RETRIEVAL = 'info:srw/diagnostic/1/66'¶
- RECORD_NOT_AVAILABLE_IN_THIS_SCHEMA = 'info:srw/diagnostic/1/67'¶
- NOT_AUTHORISED_TO_SEND_RECORD = 'info:srw/diagnostic/1/68'¶
- NOT_AUTHORISED_TO_SEND_RECORD_IN_THIS_SCHEMA = 'info:srw/diagnostic/1/69'¶
- RECORD_TOO_LARGE_TO_SEND = 'info:srw/diagnostic/1/70'¶
- UNSUPPORTED_XML_ESCAPING_VALUE = 'info:srw/diagnostic/1/71'¶
- XPATH_RETRIEVAL_UNSUPPORTED = 'info:srw/diagnostic/1/72'¶
- XPATH_EXPRESSION_CONTAINS_UNSUPPORTED_FEATURE = 'info:srw/diagnostic/1/73'¶
- UNABLE_TO_EVALUATE_XPATH_EXPRESSION = 'info:srw/diagnostic/1/74'¶
- SORT_NOT_SUPPORTED = 'info:srw/diagnostic/1/80'¶
- UNSUPPORTED_SORT_SEQUENCE = 'info:srw/diagnostic/1/82'¶
- TOO_MANY_RECORDS_TO_SORT = 'info:srw/diagnostic/1/83'¶
- TOO_MANY_SORT_KEYS_TO_SORT = 'info:srw/diagnostic/1/84'¶
- CANNOT_SORT_INCOMPATIBLE_RECORD_FORMATS = 'info:srw/diagnostic/1/86'¶
- UNSUPPORTED_SCHEMA_FOR_SORT = 'info:srw/diagnostic/1/87'¶
- UNSUPPORTED_PATH_FOR_SORT = 'info:srw/diagnostic/1/88'¶
- PATH_UNSUPPORTED_FOR_SCHEMA = 'info:srw/diagnostic/1/89'¶
- UNSUPPORTED_DIRECTION = 'info:srw/diagnostic/1/90'¶
- UNSUPPORTED_CASE = 'info:srw/diagnostic/1/91'¶
- UNSUPPORTED_MISSING_VALUE_ACTION = 'info:srw/diagnostic/1/92'¶
- SORT_ENDED_DUE_TO_MISSING_VALUE = 'info:srw/diagnostic/1/93'¶
- SORT_SPEC_INCLUDED_BOTH_IN_QUERY_AND_PROTOCOL_QUERY_PREVAILS = 'info:srw/diagnostic/1/94'¶
- SORT_SPEC_INCLUDED_BOTH_IN_QUERY_AND_PROTOCOL_PROTOCOL_PREVAILS = 'info:srw/diagnostic/1/95'¶
- SORT_SPEC_INCLUDED_BOTH_IN_QUERY_AND_PROTOCOL_ERROR = 'info:srw/diagnostic/1/96'¶
- STYLESHEETS_NOT_SUPPORTED = 'info:srw/diagnostic/1/110'¶
- UNSUPPORTED_STYLESHEET = 'info:srw/diagnostic/1/111'¶
- RESPONSE_POSITION_OUT_OF_RANGE = 'info:srw/diagnostic/1/120'¶
- TOO_MANY_TERMS_REQUESTED = 'info:srw/diagnostic/1/121'¶
- classmethod get_by_uri(uri: str) SRUDiagnostics | None [source]¶
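A lookup such as get_by_uri can be pictured as a search over the enum members by value. The sketch below is an assumption about the behaviour suggested by the signature (a member or None), written against a local stand-in enum with only three of the documented URIs. It is not the library's implementation.

```python
from enum import Enum
from typing import Optional


class Diagnostics(Enum):
    """Local stand-in with a small subset of the documented diagnostic URIs."""

    GENERAL_SYSTEM_ERROR = "info:srw/diagnostic/1/1"
    QUERY_SYNTAX_ERROR = "info:srw/diagnostic/1/10"
    UNSUPPORTED_INDEX = "info:srw/diagnostic/1/16"

    @classmethod
    def get_by_uri(cls, uri: str) -> Optional["Diagnostics"]:
        # Linear scan over the members; None signals an unknown URI.
        for member in cls:
            if member.value == uri:
                return member
        return None
```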
- class clarin.sru.constants.SRUParam(value)[source]¶
-
An enumeration.
- OPERATION = 'operation'¶
- VERSION = 'version'¶
- STYLESHEET = 'stylesheet'¶
- RENDER_BY = 'renderedBy'¶
- HTTP_ACCEPT = 'httpAccept'¶
- RESPONSE_TYPE = 'responseType'¶
- QUERY = 'query'¶
- QUERY_TYPE = 'queryType'¶
- START_RECORD = 'startRecord'¶
- MAXIMUM_RECORDS = 'maximumRecords'¶
- RECORD_XML_ESCAPING = 'recordXMLEscaping'¶
- RECORD_PACKING = 'recordPacking'¶
- RECORD_SCHEMA = 'recordSchema'¶
- RECORD_XPATH = 'recordXPath'¶
- RESULT_SET_TTL = 'resultSetTTL'¶
- SORT_KEYS = 'sortKeys'¶
- SCAN_CLAUSE = 'scanClause'¶
- RESPONSE_POSITION = 'responsePosition'¶
- MAXIMUM_TERMS = 'maximumTerms'¶
- X_UNLIMITED_RESULTSET = 'x-unlimited-resultset'¶
- X_UNLIMITED_TERMLIST = 'x-unlimited-termlist'¶
- X_INDENT_RESPONSE = 'x-indent-response'¶
- class clarin.sru.constants.SRUParamValue(value)[source]¶
-
An enumeration.
- OP_EXPLAIN = 'explain'¶
- OP_SCAN = 'scan'¶
- OP_SEARCH_RETRIEVE = 'searchRetrieve'¶
- VERSION_1_1 = '1.1'¶
- VERSION_1_2 = '1.2'¶
- RECORD_XML_ESCAPING_XML = 'xml'¶
- RECORD_XML_ESCAPING_STRING = 'string'¶
- RECORD_PACKING_PACKED = 'packed'¶
- RECORD_PACKING_UNPACKED = 'unpacked'¶
- RENDER_BY_CLIENT = 'client'¶
- RENDER_BY_SERVER = 'server'¶
clarin.sru.diagnostic¶
- class clarin.sru.diagnostic.SRUDiagnostic(uri: str, details: str | None = None, message: str | None = None)[source]¶
Bases:
object
Class to hold a SRU diagnostic.
See also
SRU Diagnostics: http://www.loc.gov/standards/sru/diagnostics/
SRU Diagnostics List: http://www.loc.gov/standards/sru/diagnostics/diagnosticsList.html
- class clarin.sru.diagnostic.SRUDiagnosticList[source]¶
Bases:
object
Container for non-surrogate diagnostics for the request. They will be put in the diagnostics part of the response.
- abstract add_diagnostic(uri: str, details: str | None = None, message: str | None = None) None [source]¶
Add a non-surrogate diagnostic to the response.
- Parameters:
uri – the diagnostic’s identifying URI
details – supplementary information available, often in a format specified by the diagnostic, or None
message – human-readable message to display to the end user, or None
clarin.sru.exception¶
- exception clarin.sru.exception.SRUException(uri: str, details: str | None = None, message: str | None = None, *args)[source]¶
Bases:
Exception
An exception raised if something went wrong while processing the request. For diagnostic codes, see the constants in SRUDiagnostics.
See also
SRUDiagnostics
- get_diagnostic() SRUDiagnostic [source]¶
Create a SRU diagnostic from this exception.
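The relationship between the exception and the diagnostic it produces can be sketched as follows. Both classes are local stand-ins that only reuse the documented constructor arguments (uri, details, message). run_search and its empty-query check are hypothetical and exist only to trigger the exception.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Diagnostic:
    """Stand-in for a diagnostic: a URI plus optional details and message."""

    uri: str
    details: Optional[str] = None
    message: Optional[str] = None


class DiagnosticError(Exception):
    """Stand-in exception that carries the data needed to build a diagnostic."""

    def __init__(self, uri: str, details: Optional[str] = None,
                 message: Optional[str] = None) -> None:
        super().__init__(message or uri)
        self.uri = uri
        self.details = details
        self.message = message

    def get_diagnostic(self) -> Diagnostic:
        return Diagnostic(self.uri, self.details, self.message)


def run_search(query: str) -> str:
    """Hypothetical search step that rejects an empty query."""
    if not query.strip():
        # 'info:srw/diagnostic/1/7' is MANDATORY_PARAMETER_NOT_SUPPLIED.
        raise DiagnosticError("info:srw/diagnostic/1/7", details="query",
                              message="Mandatory parameter not supplied")
    return query


try:
    run_search("   ")
except DiagnosticError as exc:
    diagnostic = exc.get_diagnostic()
```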
clarin.sru.queryparser¶
- class clarin.sru.queryparser.SRUQuery(raw_query: str, parsed_query: _T)[source]¶
-
Holder class for a parsed query to be returned from a SRUQueryParser.
- property parsed_query: _T¶
Get the parsed query as an abstract syntax tree.
- class clarin.sru.queryparser.SRUQueryParser(*args, **kwds)[source]¶
-
Interface for implementing pluggable query parsers.
Parameterized by the type of the abstract syntax tree (object) for parsed queries.
- abstract supports_version(version: SRUVersion | None) bool [source]¶
Check if query is supported by a specific version of SRU/CQL.
- abstract parse_query(version: SRUVersion, parameters: Dict[str, str], diagnostics: SRUDiagnosticList) SRUQuery[_T] | None [source]¶
Parse a query into an abstract syntax tree.
- Parameters:
version – the SRU version the request was made with
parameters – the request parameters containing the query
diagnostics – a SRUDiagnosticList for storing fatal and non-fatal diagnostics
- Returns:
the parsed query or None if the query could not be parsed
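The contract of parse_query (return the parsed query, or record a diagnostic and return None) can be illustrated with a small stand-alone sketch. CollectingDiagnostics and parse_terms are hypothetical stand-ins, and splitting on whitespace is an assumption chosen for brevity. None of this is how the bundled parsers are implemented.

```python
from typing import Dict, List, Optional, Tuple


class CollectingDiagnostics:
    """Stand-in diagnostic list that records (uri, details, message) tuples."""

    def __init__(self) -> None:
        self.items: List[Tuple[str, Optional[str], Optional[str]]] = []

    def add_diagnostic(self, uri: str, details: Optional[str] = None,
                       message: Optional[str] = None) -> None:
        self.items.append((uri, details, message))


def parse_terms(parameters: Dict[str, str],
                diagnostics: CollectingDiagnostics) -> Optional[List[str]]:
    """Hypothetical parser: split the 'query' parameter into terms."""
    raw = parameters.get("query")
    if raw is None or not raw.strip():
        # Report the problem instead of raising, then signal failure with None.
        diagnostics.add_diagnostic("info:srw/diagnostic/1/7", "query",
                                   "no query supplied")
        return None
    return raw.split()
```

Reporting through the diagnostic list rather than raising lets the caller decide whether the failure is fatal for the response.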
- class clarin.sru.queryparser.SRUQueryParserRegistry(parsers: List[SRUQueryParser[Any]])[source]¶
Bases:
object
A registry to keep track of registered SRUQueryParser instances to be used by the SRUServer.
See also
SRUQueryParser
- property query_parsers: List[SRUQueryParser[Any]]¶
Get a list of all registered query parsers.
- Returns:
List[SRUQueryParser[Any]] – a list of registered query parsers
- find_query_parser(query_type: str) SRUQueryParser[Any] | None [source]¶
Find a query parser by query type.
- Parameters:
query_type – the query type to search for
- Returns:
SRUQueryParser[Any] – the matching SRUQueryParser instance or None if no matching parser was found.
- class Builder(register_defaults: bool = True)[source]¶
Bases:
object
Builder for creating SRUQueryParserRegistry instances.
[Constructor]
- Parameters:
register_defaults – if True, register SRU/CQL standard query parsers (queryType cql and searchTerms), otherwise do nothing. Defaults to True.
- register_defaults() Builder [source]¶
Registers the SRU/CQL standard query parsers (queryType cql and searchTerms).
- register(parser: SRUQueryParser[Any]) Builder [source]¶
Register a new query parser
- Parameters:
parser (SRUQueryParser[Any]) – the query parser instance to be registered
- Raises:
SRUConfigException – if a query parser for the same query type was already registered
- build() SRUQueryParserRegistry [source]¶
Create a configured SRUQueryParserRegistry instance from this builder.
- Returns:
SRUQueryParserRegistry – a SRUQueryParserRegistry instance
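The builder workflow described above (optionally register the defaults, register further parsers, reject duplicates, then build) can be sketched with local stand-ins. Parser, Registry and the "fcs" query type are hypothetical, and ValueError stands in for the documented SRUConfigException. The real classes carry more state than shown here.

```python
from typing import Dict, List, Optional


class Parser:
    """Hypothetical minimal parser: it only knows the queryType it handles."""

    def __init__(self, query_type: str) -> None:
        self.query_type = query_type


class Registry:
    """Stand-in registry that looks parsers up by query type."""

    def __init__(self, parsers: List[Parser]) -> None:
        self._parsers = list(parsers)

    def find_query_parser(self, query_type: str) -> Optional[Parser]:
        for parser in self._parsers:
            if parser.query_type == query_type:
                return parser
        return None

    class Builder:
        def __init__(self, register_defaults: bool = True) -> None:
            self._parsers: Dict[str, Parser] = {}
            if register_defaults:
                self.register_defaults()

        def register_defaults(self) -> "Registry.Builder":
            # Assumption: the defaults are the two standard query types.
            for query_type in ("cql", "searchTerms"):
                self._parsers.setdefault(query_type, Parser(query_type))
            return self

        def register(self, parser: Parser) -> "Registry.Builder":
            if parser.query_type in self._parsers:
                # Stand-in for the documented SRUConfigException.
                raise ValueError(
                    f"parser for {parser.query_type!r} already registered")
            self._parsers[parser.query_type] = parser
            return self

        def build(self) -> "Registry":
            return Registry(list(self._parsers.values()))


# Each builder method returns the builder, so calls can be chained.
registry = Registry.Builder().register(Parser("fcs")).build()
```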
- class clarin.sru.queryparser.SearchTermsQueryParser(*args, **kwds)[source]¶
Bases:
SRUQueryParser[List[str]]
- supports_version(version: SRUVersion | None) bool [source]¶
Check if query is supported by a specific version of SRU/CQL.
- parse_query(version: SRUVersion, parameters: Dict[str, str], diagnostics: SRUDiagnosticList) SRUQuery[List[str]] | None [source]¶
Parse a query into an abstract syntax tree.
- Parameters:
version – the SRU version the request was made with
parameters – the request parameters containing the query
diagnostics – a SRUDiagnosticList for storing fatal and non-fatal diagnostics
- Returns:
the parsed query or None if the query could not be parsed
- class clarin.sru.queryparser.CQLQuery(raw_query: str, parsed_query: _T)[source]¶
Bases:
SRUQuery[CQLQuery]
- class clarin.sru.queryparser.CQLQueryParser(*args, **kwds)[source]¶
Bases:
SRUQueryParser[CQLQuery]
Default query parser to parse CQL.
- supports_version(version: SRUVersion | None) bool [source]¶
Check if query is supported by a specific version of SRU/CQL.
- parse_query(version: SRUVersion, parameters: Dict[str, str], diagnostics: SRUDiagnosticList) SRUQuery[CQLQuery] | None [source]¶
Parse a query into an abstract syntax tree.
- Parameters:
version – the SRU version the request was made with
parameters – the request parameters containing the query
diagnostics – a SRUDiagnosticList for storing fatal and non-fatal diagnostics
- Returns:
the parsed query or None if the query could not be parsed
clarin.sru.server.auth¶
clarin.sru.server.config¶
- class clarin.sru.server.config.LegacyNamespaceMode(value)[source]¶
-
An enumeration.
- LOC = 'loc'¶
- OASIS = 'oasis'¶
- class clarin.sru.server.config.LocalizedString(value: str, lang: str, primary: bool = False)[source]¶
Bases:
object
- class clarin.sru.server.config.SRUServerConfigKey(value)[source]¶
-
An enumeration.
- SRU_SUPPORTED_VERSION_MIN = 'eu.clarin.sru.server.sruSupportedVersionMin'¶
Parameter constant for setting the minimum supported SRU version for this SRU server. Must be less than or equal to SRU_SUPPORTED_VERSION_MAX.
Valid values: “1.1”, “1.2” or “2.0” (without quotation marks)
- SRU_SUPPORTED_VERSION_MAX = 'eu.clarin.sru.server.sruSupportedVersionMax'¶
Parameter constant for setting the maximum supported SRU version for this SRU server. Must be greater than or equal to SRU_SUPPORTED_VERSION_MIN.
Valid values: “1.1”, “1.2” or “2.0” (without quotation marks)
- SRU_SUPPORTED_VERSION_DEFAULT = 'eu.clarin.sru.server.sruSupportedVersionDefault'¶
Parameter constant for setting the default SRU version for this SRU server, e.g. for an Explain request without explicit version.
Must not be less than SRU_SUPPORTED_VERSION_MIN or greater than SRU_SUPPORTED_VERSION_MAX. Defaults to SRU_SUPPORTED_VERSION_MAX.
Valid values: “1.1”, “1.2” or “2.0” (without quotation marks)
- SRU_LEGACY_NAMESPACE_MODE = 'eu.clarin.sru.server.legacyNamespaceMode'¶
Parameter constant for setting the namespace URIs for SRU 1.1 and SRU 1.2.
Valid values: “loc” for Library of Congress URIs or “oasis” for OASIS URIs (without quotation marks).
- SRU_TRANSPORT = 'eu.clarin.sru.server.transport'¶
Parameter constant for configuring the transports for this SRU server.
Valid values: “http”, “https” or “http https” (without quotation marks)
Used as part of the Explain response.
- SRU_HOST = 'eu.clarin.sru.server.host'¶
Parameter constant for configuring the host of this SRU server.
Valid values: any fully qualified hostname, e.g. sru.example.org.
Used as part of the Explain response.
- SRU_PORT = 'eu.clarin.sru.server.port'¶
Parameter constant for configuring the port number of this SRU server.
Valid values: number between 1 and 65535 (typically 80 or 8080)
Used as part of the Explain response.
- SRU_DATABASE = 'eu.clarin.sru.server.database'¶
Parameter constant for configuring the database of this SRU server. This is usually the path component of the SRU server’s URI.
Valid values: typically the path component of the SRU server URI.
Used as part of the Explain response.
- SRU_NUMBER_OF_RECORDS = 'eu.clarin.sru.server.numberOfRecords'¶
Parameter constant for configuring the default number of records the SRU server will provide in the response to a searchRetrieve request if the client does not provide this value.
Valid values: an integer greater than 0 (default value is 100)
- SRU_MAXIMUM_RECORDS = 'eu.clarin.sru.server.maximumRecords'¶
Parameter constant for configuring the maximum number of records the SRU server will support in the response to a searchRetrieve request. If a client requests more records, the number will be limited to this value.
Valid values: an integer greater than 0 (default value is 250)
- SRU_NUMBER_OF_TERMS = 'eu.clarin.sru.server.numberOfTerms'¶
Parameter constant for configuring the default number of terms the SRU server will provide in the response to a scan request if the client does not provide this value.
Valid values: an integer greater than 0 (default value is 250)
- SRU_MAXIMUM_TERMS = 'eu.clarin.sru.server.maximumTerms'¶
Parameter constant for configuring the maximum number of terms the SRU server will support in the response to a scan request. If a client requests more terms, the number will be limited to this value.
Valid values: an integer greater than 0 (default value is 500)
- SRU_ECHO_REQUESTS = 'eu.clarin.sru.server.echoRequests'¶
Parameter constant for configuring whether the SRU server will echo the request.
Valid values: true or false
- SRU_INDENT_RESPONSE = 'eu.clarin.sru.server.indentResponse'¶
Parameter constant for configuring whether the SRU server will pretty-print the XML response. Setting this parameter can be useful for manual debugging of the XML response; however, it is not recommended for production setups.
Valid values: any integer greater than or equal to -1 (default) and less than or equal to 8
- SRU_ALLOW_OVERRIDE_MAXIMUM_RECORDS = 'eu.clarin.sru.server.allowOverrideMaximumRecords'¶
Parameter constant for configuring whether the SRU server will allow the client to override the maximum number of records the server supports. This parameter is solely intended for debugging and setting it to true is strongly discouraged for production setups.
Valid values: true or false (default)
- SRU_ALLOW_OVERRIDE_MAXIMUM_TERMS = 'eu.clarin.sru.server.allowOverrideMaximumTerms'¶
Parameter constant for configuring whether the SRU server will allow the client to override the maximum number of terms the server supports. This parameter is solely intended for debugging and setting it to true is strongly discouraged for production setups.
Valid values: true or false (default)
- SRU_ALLOW_OVERRIDE_INDENT_RESPONSE = 'eu.clarin.sru.server.allowOverrideIndentResponse'¶
Parameter constant for configuring whether the SRU server will allow the client to override the pretty-printing setting of the server. This parameter is solely intended for debugging and setting it to true is strongly discouraged for production setups.
Valid values: true or false (default)
- SRU_RESPONSE_BUFFER_SIZE = 'eu.clarin.sru.server.responseBufferSize'¶
Parameter constant for configuring the size of the response buffer. The server will buffer up to this amount of data before sending a response to the client. This value specifies the size of the buffer in bytes.
Valid values: any positive integer (default 65536)
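The constraints stated for these keys (minimum version not above maximum, default version between the two and falling back to the maximum, record counts capped at the configured maximum) can be checked on a plain parameter dictionary. The sketch below builds the key strings from the documented values. effective_default_version and limit_records are hypothetical helpers that do not call the library, and the fallback of 250 is the documented default for maximumRecords.

```python
from typing import Dict

PREFIX = "eu.clarin.sru.server."
VERSION_ORDER = ["1.1", "1.2", "2.0"]  # ascending order of the valid values

params: Dict[str, str] = {
    PREFIX + "sruSupportedVersionMin": "1.1",
    PREFIX + "sruSupportedVersionMax": "2.0",
    PREFIX + "host": "sru.example.org",
    PREFIX + "port": "8080",
    PREFIX + "maximumRecords": "250",
}


def effective_default_version(settings: Dict[str, str]) -> str:
    """Hypothetical check: return the default version or raise ValueError."""
    minimum = settings[PREFIX + "sruSupportedVersionMin"]
    maximum = settings[PREFIX + "sruSupportedVersionMax"]
    # The default version falls back to the maximum version.
    default = settings.get(PREFIX + "sruSupportedVersionDefault", maximum)
    # list.index raises ValueError for a value that is not a valid version.
    low, mid, high = (VERSION_ORDER.index(v) for v in (minimum, default, maximum))
    if not low <= mid <= high:
        raise ValueError("expected min <= default <= max version")
    return default


def limit_records(requested: int, settings: Dict[str, str]) -> int:
    """Hypothetical clamp: never return more than the configured maximum."""
    limit = int(settings.get(PREFIX + "maximumRecords", "250"))
    return min(requested, limit)
```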
- class clarin.sru.server.config.DatabaseInfo(title: List[clarin.sru.server.config.LocalizedString] | NoneType = None, description: List[clarin.sru.server.config.LocalizedString] | NoneType = None, author: List[clarin.sru.server.config.LocalizedString] | NoneType = None, extent: List[clarin.sru.server.config.LocalizedString] | NoneType = None, history: List[clarin.sru.server.config.LocalizedString] | NoneType = None, langUsage: List[clarin.sru.server.config.LocalizedString] | NoneType = None, restrictions: List[clarin.sru.server.config.LocalizedString] | NoneType = None, subjects: List[clarin.sru.server.config.LocalizedString] | NoneType = None, links: List[clarin.sru.server.config.LocalizedString] | NoneType = None, implementation: List[clarin.sru.server.config.LocalizedString] | NoneType = None)[source]¶
Bases:
object
- title: List[LocalizedString] | None = None¶
- description: List[LocalizedString] | None = None¶
- author: List[LocalizedString] | None = None¶
- extent: List[LocalizedString] | None = None¶
- history: List[LocalizedString] | None = None¶
- langUsage: List[LocalizedString] | None = None¶
- restrictions: List[LocalizedString] | None = None¶
- subjects: List[LocalizedString] | None = None¶
- links: List[LocalizedString] | None = None¶
- implementation: List[LocalizedString] | None = None¶
- class clarin.sru.server.config.SchemaInfo(identifier: str, name: str, location: str, sort: bool, retrieve: bool, title: List[clarin.sru.server.config.LocalizedString] | NoneType = None)[source]¶
Bases:
object
- title: List[LocalizedString] | None = None¶
- class clarin.sru.server.config.IndexInfo(sets: List[clarin.sru.server.config.IndexInfo.Set] | NoneType = None, indexes: List[clarin.sru.server.config.IndexInfo.Index] | NoneType = None)[source]¶
Bases:
object
- class Set(identifier: str, name: str, title: List[clarin.sru.server.config.LocalizedString] | NoneType = None)[source]¶
Bases:
object
- title: List[LocalizedString] | None = None¶
- class Index(can_search: bool, can_scan: bool, can_sort: bool, maps: List[clarin.sru.server.config.IndexInfo.Index.Map] | NoneType = None, title: List[clarin.sru.server.config.LocalizedString] | NoneType = None)[source]¶
Bases:
object
- title: List[LocalizedString] | None = None¶
- class clarin.sru.server.config.SRUServerConfig(min_version: SRUVersion, max_version: SRUVersion, default_version: SRUVersion, legacy_namespace_mode: LegacyNamespaceMode, transport: str, host: str, port: int, database: str, number_of_records: int, maximum_records: int, number_of_terms: int, maximum_terms: int, echo_requests: bool, indent_response: int, response_buffer_size: int, allow_override_maximum_records: bool, allow_override_maximum_terms: bool, allow_override_indent_response: bool, database_info: DatabaseInfo, index_info: IndexInfo, schema_info: List[SchemaInfo] | None = None)[source]¶
Bases:
object
SRU server configuration.
The XML configuration file must validate against the sru-server-config.xsd W3C schema bundled with the package and needs to have the http://www.clarin.eu/sru-server/1.0/ XML namespace.
- min_version: SRUVersion¶
- max_version: SRUVersion¶
- default_version: SRUVersion¶
- legacy_namespace_mode: LegacyNamespaceMode¶
- database_info: DatabaseInfo¶
- schema_info: List[SchemaInfo] | None = None¶
- property default_record_xml_escaping: SRURecordXmlEscaping¶
- property default_record_packing: SRURecordPacking¶
- find_schema_info(value: str) SchemaInfo | None [source]¶
- static fromparams(params: Dict[str, str], database_info: DatabaseInfo, index_info: IndexInfo | None = None, schema_info: List[SchemaInfo] | None = None) SRUServerConfig [source]¶
Creates an SRU configuration object with default values and overrides from params.
- Parameters:
params – additional settings
database_info – DatabaseInfo
index_info – optional IndexInfo
schema_info – optional list of SchemaInfo
- Returns:
SRUServerConfig – an initialized SRUServerConfig instance
- Raises:
TypeError – if params is None
SRUConfigException – if an error occurred
- static parse(params: Dict[str, str], config_file: BytesIO | PathLike | str) SRUServerConfig [source]¶
Parse a SRU server XML configuration file and create a configuration object from it.
- Parameters:
params – additional settings
config_file – path to the XML configuration file, or an in-memory bytes buffer holding it
- Returns:
SRUServerConfig – an initialized SRUServerConfig instance
- Raises:
TypeError – if params or config_file is None
SRUConfigException – if an error occurred
- static parse_version(params: Dict[str, str], name: str, mandatory: bool, default: SRUVersion) SRUVersion [source]¶
clarin.sru.server.request¶
- class clarin.sru.server.request.SRURequest[source]¶
Bases:
object
Provides information about a SRU request.
- abstract get_operation() SRUOperation [source]¶
Get the operation parameter of this request. Available for explain, searchRetrieve and scan requests.
- abstract get_version() SRUVersion [source]¶
Get the version parameter of this request. Available for explain, searchRetrieve and scan requests.
- is_version(version: SRUVersion) bool [source]¶
Check if this request is of a specific version.
- Parameters:
version – the version to check
- Returns:
bool – True if this request is in the requested version, False otherwise
- is_version_between(min: SRUVersion, max: SRUVersion) bool [source]¶
Check if version of this request is at least min and at most max.
- Parameters:
min – the minimum version
max – the maximum version
- Returns:
bool – True if the version of this request is at least min and at most max, False otherwise
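An inclusive range check like the one described for is_version_between can be sketched by comparing versions as integer tuples. Version is a local stand-in that copies the documented SRUVersion values. The tuple conversion is an assumption made for this sketch and says nothing about how the library orders versions internally.

```python
from enum import Enum
from typing import Tuple


class Version(Enum):
    """Local stand-in that mirrors the documented SRUVersion values."""

    VERSION_1_1 = "1.1"
    VERSION_1_2 = "1.2"
    VERSION_2_0 = "2.0"


def as_tuple(version: Version) -> Tuple[int, ...]:
    # "1.2" -> (1, 2), so versions compare numerically rather than as text.
    return tuple(int(part) for part in version.value.split("."))


def is_version_between(current: Version, minimum: Version,
                       maximum: Version) -> bool:
    """True if minimum <= current <= maximum (both bounds inclusive)."""
    return as_tuple(minimum) <= as_tuple(current) <= as_tuple(maximum)
```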
- abstract get_record_xml_escaping() SRURecordXmlEscaping [source]¶
Get the recordXMLEscaping (SRU 2.0) or recordPacking (SRU 1.1 and SRU 1.2) parameter of this request. Only available for explain and searchRetrieve requests.
- Returns:
SRURecordXmlEscaping – the record XML escaping method
- abstract get_record_packing() SRURecordPacking [source]¶
Get the recordPacking (SRU 2.0) parameter of this request. Only available for searchRetrieve requests.
- Returns:
SRURecordPacking – the record packing method
- abstract get_query() SRUQuery[Any] | None [source]¶
Get the query parameter of this request. Only available for searchRetrieve requests.
- Returns:
SRUQuery[Any] – an SRUQuery instance tailored for the used queryType or None if not a searchRetrieve request
- get_query_type() str | None [source]¶
Get the queryType parameter of this request. Only available for searchRetrieve requests.
- Returns:
str – the queryType of the parsed query or None if not a searchRetrieve request
- is_query_type(query_type: str) bool [source]¶
Check if the request was made with the given queryType. Only available for searchRetrieve requests.
- Parameters:
query_type – the queryType to compare with
- Returns:
bool – True if the queryType matches, False otherwise
- abstract get_start_record() int [source]¶
Get the startRecord parameter of this request. Only available for searchRetrieve requests. If the client did not provide a value for the request, it is set to 1.
- Returns:
int – the number of the start record
- abstract get_maximum_records() int [source]¶
Get the maximumRecords parameter of this request. Only available for searchRetrieve requests. If no value was supplied with the request, the server will automatically set a default value.
- Returns:
int – the maximum number of records
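Because startRecord is 1-based in SRU, a search engine usually has to translate it together with maximumRecords into a 0-based window over its result list. The sketch below shows that translation on a plain Python sequence. page is a hypothetical helper, and the default of 100 is only the documented default for numberOfRecords used as an illustrative value.

```python
from typing import List, Sequence


def page(records: Sequence[str], start_record: int = 1,
         maximum_records: int = 100) -> List[str]:
    """Hypothetical helper: return the requested window of records."""
    if start_record < 1:
        raise ValueError("startRecord is 1-based and must be at least 1")
    if maximum_records < 0:
        raise ValueError("maximumRecords must not be negative")
    begin = start_record - 1  # convert to a 0-based index
    return list(records[begin:begin + maximum_records])
```

In this sketch a start position beyond the end of the result list yields an empty page. A server may instead prefer to report FIRST_RECORD_POSITION_OUT_OF_RANGE in that case.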
- abstract get_record_schema_identifier() str | None [source]¶
Get the record schema identifier derived from the recordSchema parameter of this request. Only available for searchRetrieve requests. If the request was sent with the short record schema name, it will automatically be expanded to the record schema identifier.
- Returns:
str – the record schema identifier or None if no recordSchema parameter was supplied for this request
- abstract get_record_xpath() str | None [source]¶
Get the recordXPath parameter of this request. Only available for searchRetrieve requests and version 1.1 requests.
- Returns:
str – the record XPath or None if no value was supplied for this request
- abstract get_resultSet_TTL() int [source]¶
Get the resultSetTTL parameter of this request. Only available for searchRetrieve requests.
- Returns:
int –
- the result set TTL or -1 if no value was supplied for this request
- abstract get_sortKeys() str | None [source]¶
Get the sortKeys parameter of this request. Only available for searchRetrieve requests and version 1.1 requests.
- Returns:
str –
- the sort keys or None if no value was supplied
for this request
- abstract get_scan_clause() CQLQuery | None [source]¶
Get the scanClause parameter of this request. Only available for scan requests.
- Returns:
cql.CQLQuery –
- the parsed scan clause or None if not a
scan request
- abstract get_response_position() int [source]¶
Get the responsePosition parameter of this request. Only available for scan requests. If the client did not provide a value for the request, it is set to 1.
- Returns:
int – the response position
- abstract get_maximum_terms() int [source]¶
Get the maximumTerms parameter of this request. Available for any type of request.
- Returns:
int –
- the maximum number of terms or -1 if no value was supplied for this request
- abstract get_stylesheet() str | None [source]¶
Get the stylesheet parameter of this request. Available for explain, searchRetrieve and scan requests.
- Returns:
str –
- the stylesheet or None if no value was supplied
for this request
- abstract get_renderBy() SRURenderBy | None [source]¶
Get the renderBy parameter of this request.
- Returns:
SRURenderBy –
- the renderBy parameter or None if no value
was supplied for this request
- abstract get_response_type() str | None [source]¶
(SRU 2.0) The request parameter responseType, paired with the Internet media type specified for the response (via either the httpAccept parameter or http accept header) determines the schema for the response.
- Returns:
str –
- the value of the responseType request parameter or
None if no value was supplied for this request
- abstract get_http_accept() str | None [source]¶
(SRU 2.0) The request parameter httpAccept may be supplied to indicate the preferred format of the response. The value is an Internet media type.
- Returns:
str –
- the value of the httpAccept request parameter or
None if no value was supplied for this request
- abstract get_protocol_schema() str [source]¶
Get the protocol schema which was used for this request. Available for explain, searchRetrieve and scan requests.
- Returns:
str – the protocol scheme
- abstract get_extra_request_data_names() List[str] [source]¶
Get the names of extra parameters of this request. Available for explain, searchRetrieve and scan requests.
- Returns:
List[str] – a possibly empty list of parameter names
- abstract get_extra_request_data(name: str) str | None [source]¶
Get the value of an extra parameter of this request. Available for explain, searchRetrieve and scan requests.
- Parameters:
name – name of the extra parameter. Must be prefixed with
x-
- Returns:
str –
- the value of the parameter or None if no extra
parameter with that name exists
- class clarin.sru.server.request.ParameterInfo(parameter: clarin.sru.server.request.ParameterInfo.Parameter, mandatory: bool, min: clarin.sru.constants.SRUVersion, max: clarin.sru.constants.SRUVersion)[source]¶
Bases:
object
- class Parameter(value)[source]¶
-
An enumeration.
- STYLESHEET = 'stylesheet'¶
- RENDER_BY = 'render_by'¶
- HTTP_ACCEPT = 'http_accept'¶
- RESPONSE_TYPE = 'response_type'¶
- START_RECORD = 'start_record'¶
- MAXIMUM_RECORDS = 'maximum_records'¶
- RECORD_XML_ESCAPING = 'record_xml_escaping'¶
- RECORD_PACKING = 'record_packing'¶
- RECORD_SCHEMA = 'record_schema'¶
- RECORD_XPATH = 'record_xpath'¶
- RESULT_SET_TTL = 'result_set_ttl'¶
- SORT_KEYS = 'sort_keys'¶
- SCAN_CLAUSE = 'scan_clause'¶
- RESPONSE_POSITION = 'response_position'¶
- MAXIMUM_TERMS = 'maximum_terms'¶
- min: SRUVersion¶
- max: SRUVersion¶
- name(version: SRUVersion) str | None [source]¶
- is_for_version(version: SRUVersion) bool [source]¶
- class clarin.sru.server.request.ParameterInfoSets(value)[source]¶
Bases:
Enum
An enumeration.
- EXPLAIN = [ParameterInfo(parameter=<Parameter.STYLESHEET: 'stylesheet'>, mandatory=False, min=<SRUVersion.VERSION_1_1: '1.1'>, max=<SRUVersion.VERSION_1_2: '1.2'>), ParameterInfo(parameter=<Parameter.RECORD_XML_ESCAPING: 'record_xml_escaping'>, mandatory=False, min=<SRUVersion.VERSION_1_1: '1.1'>, max=<SRUVersion.VERSION_1_2: '1.2'>)]¶
- SCAN = [ParameterInfo(parameter=<Parameter.STYLESHEET: 'stylesheet'>, mandatory=False, min=<SRUVersion.VERSION_1_1: '1.1'>, max=<SRUVersion.VERSION_2_0: '2.0'>), ParameterInfo(parameter=<Parameter.HTTP_ACCEPT: 'http_accept'>, mandatory=False, min=<SRUVersion.VERSION_2_0: '2.0'>, max=<SRUVersion.VERSION_2_0: '2.0'>), ParameterInfo(parameter=<Parameter.SCAN_CLAUSE: 'scan_clause'>, mandatory=True, min=<SRUVersion.VERSION_1_1: '1.1'>, max=<SRUVersion.VERSION_2_0: '2.0'>), ParameterInfo(parameter=<Parameter.RESPONSE_POSITION: 'response_position'>, mandatory=False, min=<SRUVersion.VERSION_1_1: '1.1'>, max=<SRUVersion.VERSION_2_0: '2.0'>), ParameterInfo(parameter=<Parameter.MAXIMUM_TERMS: 'maximum_terms'>, mandatory=False, min=<SRUVersion.VERSION_1_1: '1.1'>, max=<SRUVersion.VERSION_2_0: '2.0'>)]¶
- SEARCH_RETRIEVE = [ParameterInfo(parameter=<Parameter.STYLESHEET: 'stylesheet'>, mandatory=False, min=<SRUVersion.VERSION_1_1: '1.1'>, max=<SRUVersion.VERSION_1_2: '1.2'>), ParameterInfo(parameter=<Parameter.HTTP_ACCEPT: 'http_accept'>, mandatory=False, min=<SRUVersion.VERSION_2_0: '2.0'>, max=<SRUVersion.VERSION_2_0: '2.0'>), ParameterInfo(parameter=<Parameter.RENDER_BY: 'render_by'>, mandatory=False, min=<SRUVersion.VERSION_2_0: '2.0'>, max=<SRUVersion.VERSION_2_0: '2.0'>), ParameterInfo(parameter=<Parameter.RESPONSE_TYPE: 'response_type'>, mandatory=False, min=<SRUVersion.VERSION_2_0: '2.0'>, max=<SRUVersion.VERSION_2_0: '2.0'>), ParameterInfo(parameter=<Parameter.START_RECORD: 'start_record'>, mandatory=False, min=<SRUVersion.VERSION_1_1: '1.1'>, max=<SRUVersion.VERSION_2_0: '2.0'>), ParameterInfo(parameter=<Parameter.MAXIMUM_RECORDS: 'maximum_records'>, mandatory=False, min=<SRUVersion.VERSION_1_1: '1.1'>, max=<SRUVersion.VERSION_2_0: '2.0'>), ParameterInfo(parameter=<Parameter.RECORD_XML_ESCAPING: 'record_xml_escaping'>, mandatory=False, min=<SRUVersion.VERSION_1_1: '1.1'>, max=<SRUVersion.VERSION_2_0: '2.0'>), ParameterInfo(parameter=<Parameter.RECORD_PACKING: 'record_packing'>, mandatory=False, min=<SRUVersion.VERSION_2_0: '2.0'>, max=<SRUVersion.VERSION_2_0: '2.0'>), ParameterInfo(parameter=<Parameter.RECORD_SCHEMA: 'record_schema'>, mandatory=False, min=<SRUVersion.VERSION_1_1: '1.1'>, max=<SRUVersion.VERSION_2_0: '2.0'>), ParameterInfo(parameter=<Parameter.RESULT_SET_TTL: 'result_set_ttl'>, mandatory=False, min=<SRUVersion.VERSION_1_1: '1.1'>, max=<SRUVersion.VERSION_2_0: '2.0'>), ParameterInfo(parameter=<Parameter.RECORD_XPATH: 'record_xpath'>, mandatory=False, min=<SRUVersion.VERSION_1_1: '1.1'>, max=<SRUVersion.VERSION_1_2: '1.2'>), ParameterInfo(parameter=<Parameter.SORT_KEYS: 'sort_keys'>, mandatory=False, min=<SRUVersion.VERSION_1_1: '1.1'>, max=<SRUVersion.VERSION_2_0: '2.0'>)]¶
- classmethod for_operation(operation: SRUOperation | None) List[ParameterInfo] | None [source]¶
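The min and max fields above suggest that each parameter is valid for a range of SRU versions. The following is a minimal sketch of such a range check, not the library's implementation: Version, ParamRange and allowed_parameters are hypothetical stand-ins, and the three entries are copied from the SEARCH_RETRIEVE listing above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Version(Enum):
    """Stand-in for SRUVersion; the values mirror the ones listed above."""

    VERSION_1_1 = "1.1"
    VERSION_1_2 = "1.2"
    VERSION_2_0 = "2.0"

    @property
    def number(self) -> Tuple[int, ...]:
        # "1.2" -> (1, 2), so that versions compare numerically
        return tuple(int(part) for part in self.value.split("."))


@dataclass(frozen=True)
class ParamRange:
    """Hypothetical, simplified counterpart of ParameterInfo."""

    parameter: str
    mandatory: bool
    min: Version
    max: Version

    def is_for_version(self, version: Version) -> bool:
        # Assumption: a parameter applies if min <= version <= max
        return self.min.number <= version.number <= self.max.number


# Three entries taken from the SEARCH_RETRIEVE listing above
SEARCH_RETRIEVE_SUBSET = [
    ParamRange("start_record", False, Version.VERSION_1_1, Version.VERSION_2_0),
    ParamRange("record_packing", False, Version.VERSION_2_0, Version.VERSION_2_0),
    ParamRange("record_xpath", False, Version.VERSION_1_1, Version.VERSION_1_2),
]


def allowed_parameters(infos: List[ParamRange], version: Version) -> List[str]:
    """Return the names of all parameters that apply to the given version."""
    return [info.parameter for info in infos if info.is_for_version(version)]
```

With these entries, record_xpath is accepted for version 1.2 but not for 2.0, while record_packing is accepted only for 2.0.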
- class clarin.sru.server.request.SRURequestImpl(config: SRUServerConfig, query_parsers: SRUQueryParserRegistry, request: Request, authentication_info_provider: SRUAuthenticationInfoProvider | None = None)[source]¶
Bases:
SRUDiagnosticList
,SRURequest
- get_operation() SRUOperation [source]¶
Get the
operation
parameter of this request. Available for explain, searchRetrieve and scan requests.
- get_version() SRUVersion [source]¶
Get the version parameter of this request. Available for explain, searchRetrieve and scan requests.
- get_authentication() SRUAuthenticationInfo | None [source]¶
- get_query() SRUQuery[Any] | None [source]¶
Get the query parameter of this request. Only available for searchRetrieve requests.
- Returns:
SRUQuery[Any] –
- an SRUQuery instance tailored for the
used queryType or None if not a searchRetrieve request
- get_record_xml_escaping() SRURecordXmlEscaping [source]¶
Get the recordXmlEscaping (SRU 2.0) or recordPacking (SRU 1.1 and SRU 1.2) parameter of this request. Only available for explain and searchRetrieve requests.
- Returns:
SRURecordXmlEscaping – the record XML escaping method
- get_record_packing() SRURecordPacking [source]¶
Get the recordPacking (SRU 2.0) parameter of this request. Only available for searchRetrieve requests.
- Returns:
SRURecordPacking – the record packing method
- get_start_record() int [source]¶
Get the startRecord parameter of this request. Only available for searchRetrieve requests. If the client did not provide a value for the request, it is set to 1.
- Returns:
int – the number of the start record
- get_maximum_records() int [source]¶
Get the maximumRecords parameter of this request. Only available for searchRetrieve requests. If no value was supplied with the request, the server will automatically set a default value.
- Returns:
int – the maximum number of records
- get_record_schema_identifier() str | None [source]¶
Get the record schema identifier derived from the recordSchema parameter of this request. Only available for searchRetrieve requests. If the request was sent with the short record schema name, it is automatically expanded to the record schema identifier.
- Returns:
str –
- the record schema identifier or None if no
recordSchema parameter was supplied for this request
- get_record_xpath() str | None [source]¶
Get the recordXPath parameter of this request. Only available for searchRetrieve requests and version 1.1 requests.
- Returns:
str –
- the record XPath or None if no value was supplied
for this request
- get_resultSet_TTL() int [source]¶
Get the resultSetTTL parameter of this request. Only available for searchRetrieve requests.
- Returns:
int –
- the result set TTL or -1 if no value was supplied for this request
- get_sortKeys() str | None [source]¶
Get the sortKeys parameter of this request. Only available for searchRetrieve requests and version 1.1 requests.
- Returns:
str –
- the sort keys or None if no value was supplied
for this request
- get_scan_clause() CQLQuery | None [source]¶
Get the scanClause parameter of this request. Only available for scan requests.
- Returns:
cql.CQLQuery –
- the parsed scan clause or None if not a
scan request
- get_response_position() int [source]¶
Get the responsePosition parameter of this request. Only available for scan requests. If the client did not provide a value for the request, it is set to 1.
- Returns:
int – the response position
- get_maximum_terms() int [source]¶
Get the maximumTerms parameter of this request. Available for any type of request.
- Returns:
int –
- the maximum number of terms or -1 if no value was supplied for this request
- get_stylesheet() str | None [source]¶
Get the stylesheet parameter of this request. Available for explain, searchRetrieve and scan requests.
- Returns:
str –
- the stylesheet or None if no value was supplied
for this request
- get_renderBy() SRURenderBy | None [source]¶
Get the renderBy parameter of this request.
- Returns:
SRURenderBy –
- the renderBy parameter or None if no value
was supplied for this request
- get_response_type() str | None [source]¶
(SRU 2.0) The request parameter responseType, paired with the Internet media type specified for the response (via either the httpAccept parameter or http accept header) determines the schema for the response.
- Returns:
str –
- the value of the responseType request parameter or
None if no value was supplied for this request
- get_version_raw() SRUVersion | None [source]¶
- get_http_accept() str | None [source]¶
(SRU 2.0) The request parameter httpAccept may be supplied to indicate the preferred format of the response. The value is an Internet media type.
- Returns:
str –
- the value of the httpAccept request parameter or
None if no value was supplied for this request
- get_protocol_schema() str [source]¶
Get the protocol schema which was used for this request. Available for explain, searchRetrieve and scan requests.
- Returns:
str – the protocol scheme
- add_diagnostic(uri: str, details: str | None = None, message: str | None = None) None [source]¶
Add a non surrogate diagnostic to the response.
- Parameters:
uri – the diagnostic’s identifying URI
details – supplementary information available, often in a format specified by the diagnostic or
None
message – human readable message to display to the end user or
None
- add_diagnostic_obj(diagnostic: SRUDiagnostic)[source]¶
- check_parameters() bool [source]¶
Validate incoming request parameters
- Returns:
bool –
True if successful, False if something went wrong
- check_parameters_version_operation() bool [source]¶
Validate incoming request parameters version and operation.
- Returns:
bool –
True if successful, False if something went wrong
- get_extra_request_data_names() List[str] [source]¶
Get the names of extra parameters of this request. Available for explain, searchRetrieve and scan requests.
- Returns:
List[str] – a possibly empty list of parameter names
- get_extra_request_data(name: str) str | None [source]¶
Get the value of an extra parameter of this request. Available for explain, searchRetrieve and scan requests.
- Parameters:
name – name of the extra parameter. Must be prefixed with
x-
- Returns:
str –
- the value of the parameter or None if no extra
parameter with that name exists
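Extra request parameters are distinguished by their x- prefix. The following sketch shows how such a lookup could behave on a plain dictionary of query parameters. It is an illustration only: the two free functions, the parameter name x-demo-context, and the choice to raise ValueError for a name without the prefix are assumptions of this sketch, not documented library behaviour.

```python
from typing import Dict, List, Optional

EXTRA_PREFIX = "x-"


def get_extra_request_data_names(params: Dict[str, str]) -> List[str]:
    """Return the names of all extra parameters (possibly an empty list)."""
    return [name for name in params if name.startswith(EXTRA_PREFIX)]


def get_extra_request_data(params: Dict[str, str], name: str) -> Optional[str]:
    """Return the value of an extra parameter or None if it is absent."""
    if not name.startswith(EXTRA_PREFIX):
        # Assumption of this sketch: reject names without the required prefix
        raise ValueError(f"extra parameter names must start with {EXTRA_PREFIX!r}")
    return params.get(name)


# Hypothetical request parameters; "x-demo-context" is a made-up name
params = {"operation": "searchRetrieve", "query": "cat", "x-demo-context": "corpus-a"}
names = get_extra_request_data_names(params)
value = get_extra_request_data(params, "x-demo-context")
missing = get_extra_request_data(params, "x-missing")
```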
clarin.sru.server.result¶
- class clarin.sru.server.result.SRUAbstractResult(diagnostics: SRUDiagnosticList)[source]¶
Bases:
object
Base class for SRU responses.
- add_diagnostic(uri: str, details: str | None = None, message: str | None = None) None [source]¶
Add a non surrogate diagnostic to the response.
- Parameters:
uri – the diagnostic’s identifying URI
details – supplementary information available, often in a format specified by the diagnostic or
None
message – human readable message to display to the end user or
None
- property has_extra_response_data: bool¶
Check if extra response data should be serialized for this request. A default implementation is provided for convenience and always returns False.
- Returns:
bool – True if extra response data should be serialized
- write_extra_response_data(writer: SRUXMLStreamWriter) None [source]¶
Serialize extra response data for this request. A no-op default implementation is provided for convenience.
- Parameters:
writer – Writer to serialize extra response data
- class clarin.sru.server.result.SRUExplainResult(diagnostics: SRUDiagnosticList)[source]¶
Bases:
ABC
,SRUAbstractResult
A result set of an explain operation. A database implementation may use it to implement extensions to the SRU protocol, i.e. providing extraResponseData.
This class needs to be implemented for the target data source.
See also
SRU Explain Operation: http://www.loc.gov/standards/sru/explain/
- class clarin.sru.server.result.SRUScanResultSet(diagnostics: SRUDiagnosticList)[source]¶
Bases:
ABC
,SRUAbstractResult
A result set of a scan operation. It is used to iterate over the term set and provides a method to serialize the terms.
A SRUScanResultSet object maintains a cursor pointing to its current term. Initially the cursor is positioned before the first term. The next_term method moves the cursor to the next term, and because it returns False when there are no more terms in the SRUScanResultSet object, it can be used in a while loop to iterate through the term set.
This class needs to be implemented for the target search engine.
See also
SRU Scan Operation: http://www.loc.gov/standards/sru/companionSpecs/scan.html
- class WhereInList(value)[source]¶
-
A flag to indicate the position of the term within the complete term list.
- FIRST = 'first'¶
The first term (first)
- LAST = 'last'¶
The last term (last)
- ONLY = 'only'¶
The only term (only)
- INNER = 'inner'¶
Any other term (inner)
- abstract next_term() bool [source]¶
Moves the cursor forward one term from its current position. A result set cursor is initially positioned before the first term; the first call to next_term makes the first term the current term; the second call makes the second term the current term, and so on.
When a call to next_term returns False, the cursor is positioned after the last term.
- Returns:
bool – True if the new current term is valid; False if there are no more terms
- Raises:
SRUException – if an error occurred while fetching the next term
- abstract get_value() str [source]¶
Get the current term exactly as it appears in the index.
- Returns:
str – current term
- abstract get_number_of_records() int [source]¶
Get the number of records for the current term which would be matched if the index in the request’s scanClause was searched with the term in the value field.
- Returns:
int –
- a non-negative number of records or -1 if the number is unknown
- abstract get_display_term() str | None [source]¶
Get the string for the current term to display to the end user in place of the term itself.
- Returns:
str – display string or
None
- abstract get_WhereInList() WhereInList | None [source]¶
Get the flag to indicate the position of the term within the complete term list.
- Returns:
WhereInList – position within term list or
None
- has_extra_term_data() bool [source]¶
Check if extra term data should be serialized for the current term. A default implementation is provided for convenience and always returns False.
- Returns:
bool – True if the term has extra term data
- Raises:
StopIteration – term set is already advanced past all terms
See also
write_extra_term_data
- abstract write_extra_term_data(writer: SRUXMLStreamWriter)[source]¶
Serialize extra term data for the current term. A no-op default implementation is provided for convenience.
- Parameters:
writer – Writer to serialize extra term data for current term
- Raises:
StopIteration – term set already advanced past all terms
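The cursor protocol described for SRUScanResultSet can be sketched over an in-memory term list as follows. The sketch does not subclass the real class and omits the diagnostics argument and the writer methods; InMemoryScanResultSet and collect are hypothetical names, and only the method names follow this reference.

```python
from enum import Enum
from typing import List, Optional, Tuple


class WhereInList(Enum):
    """Stand-in mirroring SRUScanResultSet.WhereInList."""

    FIRST = "first"
    LAST = "last"
    ONLY = "only"
    INNER = "inner"


class InMemoryScanResultSet:
    """Hypothetical scan result set over (term, record count) pairs."""

    def __init__(self, terms: List[Tuple[str, int]]) -> None:
        self._terms = terms
        self._cursor = -1  # positioned before the first term

    def next_term(self) -> bool:
        # Advance at most one step past the last term
        if self._cursor < len(self._terms):
            self._cursor += 1
        return self._cursor < len(self._terms)

    def _current(self) -> Tuple[str, int]:
        if not 0 <= self._cursor < len(self._terms):
            raise StopIteration("cursor is not positioned on a term")
        return self._terms[self._cursor]

    def get_value(self) -> str:
        return self._current()[0]

    def get_number_of_records(self) -> int:
        return self._current()[1]  # -1 means "unknown"

    def get_WhereInList(self) -> Optional[WhereInList]:
        self._current()  # raises if there is no current term
        if len(self._terms) == 1:
            return WhereInList.ONLY
        if self._cursor == 0:
            return WhereInList.FIRST
        if self._cursor == len(self._terms) - 1:
            return WhereInList.LAST
        return WhereInList.INNER


def collect(result: InMemoryScanResultSet) -> List[Tuple[str, int, str]]:
    """Drive the cursor with a while loop, as a server would."""
    rows = []
    while result.next_term():
        position = result.get_WhereInList()
        rows.append((result.get_value(), result.get_number_of_records(), position.value))
    return rows
```

Raising StopIteration when no term is current mirrors the Raises entries above; whether the real implementation checks this in the same way is not stated in this reference.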
- class clarin.sru.server.result.SRUSearchResultSet(diagnostics: SRUDiagnosticList)[source]¶
Bases:
ABC
,SRUAbstractResult
A result set of a searchRetrieve operation. It is used to iterate over the result set and provides a method to serialize the record in the requested format.
A SRUSearchResultSet object maintains a cursor pointing to its current record. Initially the cursor is positioned before the first record. The next_record method moves the cursor to the next record, and because it returns False when there are no more records in the SRUSearchResultSet object, it can be used in a while loop to iterate through the result set.
This class needs to be implemented for the target search engine.
See also
SRU Search Retrieve Operation: http://www.loc.gov/standards/sru/
SRU 1.1 SR: http://www.loc.gov/standards/sru/sru-1-1.html
SRU 1.2 SR: http://www.loc.gov/standards/sru/sru-1-2.html
SRU 2.0 SR: http://www.loc.gov/standards/sru/sru-2-0.html
Differences SRU 2.0 to SRU 1.2: http://www.loc.gov/standards/sru/differences.html
- abstract get_total_record_count() int [source]¶
The number of records matched by the query. If the query fails this must be 0. If the search engine cannot determine the total number of records matched by a query, it must return -1.
- Returns:
int – the total number of results, 0 if the query failed, or -1 if the search engine cannot determine the total number of results
- abstract get_record_count() int [source]¶
The number of records matched by the query, but at most the number of records requested to be returned (maximumRecords parameter). If the query fails this must be 0.
- Returns:
int – the number of results or 0 if the query failed
- get_resultSet_id() str | None [source]¶
The result set id of this result. The default implementation returns None.
- Returns:
str – the result set id or None if not applicable for this result
- get_resultSet_TTL() int [source]¶
The result set time to live. In SRU 2.0 it will be serialized as a <resultSetTTL> element; in SRU 1.2 as a <resultSetIdleTime> element. The default implementation returns -1.
- Returns:
int – the result set time to live or -1 if not applicable for this result
- get_result_count_precision() SRUResultCountPrecision | None [source]¶
(SRU 2.0) Indicate the accuracy of the result count reported as the total number of records that matched the query. The default implementation returns None.
- Returns:
Optional[SRUResultCountPrecision] – the result count precision or None if not applicable for this result
See also
SRUResultCountPrecision
- abstract get_record_schema_identifier() str [source]¶
The record schema identifier in which the records are returned (recordSchema parameter).
- Returns:
str – the record schema identifier
- abstract next_record() bool [source]¶
Moves the cursor forward one record from its current position. A SRUSearchResultSet cursor is initially positioned before the first record; the first call to next_record makes the first record the current record; the second call makes the second record the current record, and so on.
When a call to next_record returns False, the cursor is positioned after the last record.
- Returns:
bool – True if the new current record is valid; False if there are no more records
- Raises:
SRUException – if an error occurred while fetching the next record
- abstract get_record_identifier() str | None [source]¶
An identifier for the current record by which it can unambiguously be retrieved in a subsequent operation.
- Returns:
str –
- identifier for the record or None if none is available
- Raises:
StopIteration – result set is past all records
- get_surrogate_diagnostic() SRUDiagnostic | None [source]¶
Get the surrogate diagnostic for the current record. If this method returns a diagnostic, the write_record method will not be called. The default implementation returns None.
- Returns:
Optional[SRUDiagnostic] –
- a surrogate diagnostic or
None
- abstract write_record(writer: SRUXMLStreamWriter) None [source]¶
Serialize the current record in the requested format.
- Parameters:
writer – Writer to serialize current record
- Raises:
StopIteration – result set is past all records
- property has_extra_record_data: bool¶
Check if extra record data should be serialized for the current record. The default implementation returns False.
- Returns:
bool – True if the record has extra record data
- Raises:
StopIteration – result set is past all records
See also
write_extra_record_data
- write_extra_record_data(writer: SRUXMLStreamWriter) None [source]¶
Serialize extra record data for the current record. A no-op default implementation is provided for convenience.
- Parameters:
writer – Writer to serialize extra record data for current record
- Raises:
StopIteration – result set already advanced past all records
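A search result set follows the same cursor pattern and additionally reports two counts: the total number of matches and the number of records on the current page. The sketch below is a simplified illustration that does not subclass the real SRUSearchResultSet. It uses xml.sax.saxutils.XMLGenerator from the standard library as a stand-in for SRUXMLStreamWriter (both are SAX ContentHandler classes), and the hit element name is made up.

```python
import io
from typing import List, Tuple
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLGenerator


class InMemorySearchResultSet:
    """Hypothetical result set serving one page out of a list of hits."""

    def __init__(self, hits: List[str], start_record: int = 1, maximum_records: int = 10) -> None:
        self._total = len(hits)
        first = max(start_record, 1) - 1  # startRecord is 1-based
        self._page = hits[first:first + maximum_records]
        self._cursor = -1  # positioned before the first record

    def get_total_record_count(self) -> int:
        return self._total

    def get_record_count(self) -> int:
        return len(self._page)

    def next_record(self) -> bool:
        if self._cursor < len(self._page):
            self._cursor += 1
        return self._cursor < len(self._page)

    def write_record(self, writer: ContentHandler) -> None:
        if not 0 <= self._cursor < len(self._page):
            raise StopIteration("result set is past all records")
        writer.startElement("hit", {})  # "hit" is an illustrative element name
        writer.characters(self._page[self._cursor])
        writer.endElement("hit")


def serialize(hits: List[str], start_record: int, maximum_records: int) -> Tuple[int, int, str]:
    """Iterate the result set with a while loop and collect the XML output."""
    out = io.StringIO()
    writer = XMLGenerator(out, encoding="utf-8")
    result = InMemorySearchResultSet(hits, start_record, maximum_records)
    while result.next_record():
        result.write_record(writer)
    return result.get_total_record_count(), result.get_record_count(), out.getvalue()
```

Slicing the page in the constructor keeps get_record_count cheap; a real search engine would more likely push startRecord and maximumRecords down into its own query.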
clarin.sru.server.server¶
- class clarin.sru.server.server.SRUNamespaces(response_NS: str, response_prefix: str, scan_NS: str, scan_prefix: str, diagnostic_NS: str, XCQL_NS: str, diagnostic_prefix: str = 'diag', explain_NS: str = 'http://explain.z3950.org/dtd/2.0/', explain_prefix: str = 'zr')[source]¶
Bases:
object
Interface for decoupling SRU namespaces from implementation to allow to support SRU 1.1/1.2 and SRU 2.0.
- explain_NS: str = 'http://explain.z3950.org/dtd/2.0/'¶
The namespace URI for encoding explain record data fragments.
- static for_legacy_LOC() SRUNamespaces [source]¶
- static for_1_2_OASIS() SRUNamespaces [source]¶
- static for_2_0() SRUNamespaces [source]¶
- static get_namespaces(version: SRUVersion, legacy_ns_mode: LegacyNamespaceMode) SRUNamespaces [source]¶
- class clarin.sru.server.server.SRUSearchEngine[source]¶
Bases:
object
Interface for connecting the SRU protocol implementation to an actual search engine. Base class required for an SRUSearchEngine implementation to be used with the SRUServerApp.
Implementing explain and scan is optional, but implementing search is mandatory.
The implementation of these methods must be thread-safe.
- abstract explain(config: SRUServerConfig, request: SRURequest, diagnostics: SRUDiagnosticList) SRUExplainResult | None [source]¶
Handle an explain operation. Implementing this method is optional, but it is required if the extraResponseData block of the SRU response needs to be filled. The arguments for this operation are provided by the SRURequest object.
The implementation of this method must be thread-safe.
- Parameters:
config – the SRUServerConfig object that contains the endpoint configuration
request – the SRURequest object that contains the request made to the endpoint
diagnostics – the SRUDiagnosticList object for storing non-fatal diagnostics
- Returns:
SRUExplainResult –
- a SRUExplainResult object or None if the search engine does not want to provide extra response data
- Raises:
SRUException – if a fatal error occurred
- abstract search(config: SRUServerConfig, request: SRURequest, diagnostics: SRUDiagnosticList) SRUSearchResultSet [source]¶
Handle a searchRetrieve operation. Implementing this method is mandatory. The arguments for this operation are provided by the SRURequest object.
The implementation of this method must be thread-safe.
- Parameters:
config – the SRUServerConfig object that contains the endpoint configuration
request – the SRURequest object that contains the request made to the endpoint
diagnostics – the SRUDiagnosticList object for storing non-fatal diagnostics
- Returns:
SRUSearchResultSet – a SRUSearchResultSet object
- Raises:
SRUException – if a fatal error occurred
- abstract scan(config: SRUServerConfig, request: SRURequest, diagnostics: SRUDiagnosticList) SRUScanResultSet | None [source]¶
Handle a scan operation. Implementing this method is optional. If you don’t need to handle the scan operation, just return None and the SRU server will return the appropriate diagnostic to the client. The arguments for this operation are provided by the SRURequest object.
The implementation of this method must be thread-safe.
- Parameters:
config – the SRUServerConfig object that contains the endpoint configuration
request – the SRURequest object that contains the request made to the endpoint
diagnostics – the SRUDiagnosticList object for storing non-fatal diagnostics
- Returns:
SRUScanResultSet –
- a SRUScanResultSet object or None if this operation is not supported by this search engine
- Raises:
SRUException – if a fatal error occurred
- init(config: SRUServerConfig, query_parser_registry_builder: Builder, params: Dict[str, str]) None [source]¶
Initialize the search engine.
- Parameters:
config – the SRUServerConfig object for this search engine
query_parser_registry_builder – the SRUQueryParserRegistry.Builder object to be used for this search engine. Use to register additional query parsers with the SRUServer
params – additional parameters from the server
- Raises:
SRUConfigException – an error occurred during initialization of the search engine
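The thread-safety requirement on the search engine methods can be illustrated with a stand-in class. This is a sketch under simplifying assumptions: DemoSearchEngine does not subclass SRUSearchEngine, and its search and scan take plain strings instead of the documented (config, request, diagnostics) arguments. It keeps its data read-only after construction and guards its only mutable state with a lock.

```python
import threading
from typing import Iterable, List, Optional


class DemoSearchEngine:
    """Hypothetical engine: immutable index plus a lock-guarded counter."""

    def __init__(self, documents: Iterable[str]) -> None:
        self._documents = tuple(documents)  # never modified after construction
        self._lock = threading.Lock()
        self._searches_served = 0

    def search(self, term: str) -> List[str]:
        # Only the shared counter needs the lock; reading the tuple is safe
        with self._lock:
            self._searches_served += 1
        needle = term.lower()
        return [doc for doc in self._documents if needle in doc.lower()]

    def scan(self, clause: str) -> Optional[List[str]]:
        # Returning None signals that scan is not supported
        return None

    @property
    def searches_served(self) -> int:
        with self._lock:
            return self._searches_served


engine = DemoSearchEngine(["Apple pie", "Banana bread", "apple tart"])


def worker() -> None:
    for _ in range(100):
        engine.search("apple")


threads = [threading.Thread(target=worker) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Keeping per-request state in local variables or in the returned result set, rather than on the engine instance, avoids most locking in the first place.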
- class clarin.sru.server.server.SRUServer(config: SRUServerConfig, query_parsers: SRUQueryParserRegistry, search_engine: SRUSearchEngine, authentication_info_provider: SRUAuthenticationInfoProvider | None = None)[source]¶
Bases:
object
SRU/CQL protocol implementation for the server-side (SRU/S). This class implements SRU/CQL version 1.1 and 1.2.
See also
SRU/CQL protocol 1.2: http://www.loc.gov/standards/sru/
- TEMP_OUTPUT_BUFFERING = False¶
- explain(request: SRURequestImpl, response: Response)[source]¶
- scan(request: SRURequestImpl, response: Response)[source]¶
- search(request: SRURequestImpl, response: Response)[source]¶
clarin.sru.server.wsgi¶
- class clarin.sru.server.wsgi.SRUServerApp(SRUSearchEngine_clazz: Type[SRUSearchEngine] | SRUSearchEngine, config_file: str, params: Dict[SRUServerConfigKey | str, str], develop: bool = False)[source]¶
Bases:
object
clarin.sru.xml.writer¶
- class clarin.sru.xml.writer.SRUXMLStreamWriter(output_stream: TextIOBase, record_escaping: SRURecordXmlEscaping, indent: int = -1, encoding: str = 'utf-8', short_empty_elements: bool = False)[source]¶
Bases:
ContentHandler
- class IndentingState(value)[source]¶
Bases:
Enum
An enumeration.
- SEEN_NOTHING = 1¶
- SEEN_ELEMENT = 2¶
- SEEN_DATA = 3¶
- setDocumentLocator(locator)[source]¶
Called by the parser to give the application a locator for locating the origin of document events.
SAX parsers are strongly encouraged (though not absolutely required) to supply a locator: if it does so, it must supply the locator to the application by invoking this method before invoking any of the other methods in the DocumentHandler interface.
The locator allows the application to determine the end position of any document-related event, even if the parser is not reporting an error. Typically, the application will use this information for reporting its own errors (such as character content that does not match an application’s business rules). The information returned by the locator is probably not sufficient for use with a search engine.
Note that the locator will return correct information only during the invocation of the events in this interface. The application should not attempt to use it at any other time.
- startPrefixMapping(prefix, uri)[source]¶
Begin the scope of a prefix-URI Namespace mapping.
The information from this event is not necessary for normal Namespace processing: the SAX XML reader will automatically replace prefixes for element and attribute names when the http://xml.org/sax/features/namespaces feature is true (the default).
There are cases, however, when applications need to use prefixes in character data or in attribute values, where they cannot safely be expanded automatically; the start/endPrefixMapping event supplies the information to the application to expand prefixes in those contexts itself, if necessary.
Note that start/endPrefixMapping events are not guaranteed to be properly nested relative to each-other: all startPrefixMapping events will occur before the corresponding startElement event, and all endPrefixMapping events will occur after the corresponding endElement event, but their order is not guaranteed.
- endPrefixMapping(prefix)[source]¶
End the scope of a prefix-URI mapping.
See startPrefixMapping for details. This event will always occur after the corresponding endElement event, but the order of endPrefixMapping events is not otherwise guaranteed.
- processingInstruction(target, data)[source]¶
Receive notification of a processing instruction.
The Parser will invoke this method once for each processing instruction found: note that processing instructions may occur before or after the main document element.
A SAX parser should never report an XML declaration (XML 1.0, section 2.8) or a text declaration (XML 1.0, section 4.3.1) using this method.
- startDocument()[source]¶
Receive notification of the beginning of a document.
The SAX parser will invoke this method only once, before any other methods in this interface or in DTDHandler (except for setDocumentLocator).
- endDocument()[source]¶
Receive notification of the end of a document.
The SAX parser will invoke this method only once, and it will be the last method invoked during the parse. The parser shall not invoke this method until it has either abandoned parsing (because of an unrecoverable error) or reached the end of input.
- startElement(name, attrs=None)[source]¶
Signals the start of an element in non-namespace mode.
The name parameter contains the raw XML 1.0 name of the element type as a string and the attrs parameter holds an instance of the Attributes class containing the attributes of the element.
- endElement(name)[source]¶
Signals the end of an element in non-namespace mode.
The name parameter contains the name of the element type, just as with the startElement event.
- startElementNS(name, qname=None, attrs=None)[source]¶
Signals the start of an element in namespace mode.
The name parameter contains the name of the element type as a (uri, localname) tuple, the qname parameter the raw XML 1.0 name used in the source document, and the attrs parameter holds an instance of the Attributes class containing the attributes of the element.
The uri part of the name tuple is None for elements which have no namespace.
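As a minimal sketch of namespace mode, using only the standard library `xml.sax` module: the handler class `NameCollector`, the helper `collect_names` and the sample document are illustrative assumptions and not part of this package. The handler records the name tuple passed to each startElementNS call.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class NameCollector(ContentHandler):
    """Hypothetical handler that records the name tuple of every element."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElementNS(self, name, qname, attrs):
        # name is a (uri, localname) tuple; uri is None without a namespace
        self.names.append(name)


def collect_names(xml_bytes):
    parser = xml.sax.make_parser()
    # Switch the reader to namespace mode so the *NS callbacks are used
    parser.setFeature(feature_namespaces, True)
    handler = NameCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.names


names = collect_names(b'<root xmlns:a="urn:example:a"><a:item/><plain/></root>')
```

Elements bound to a namespace arrive with the namespace URI as the first tuple item, while `root` and `plain` arrive with `None`.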
- endElementNS(name, qname=None)[source]¶
Signals the end of an element in namespace mode.
The name parameter contains the name of the element type, just as with the startElementNS event.
- characters(content)[source]¶
Receive notification of character data.
The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity so that the Locator provides useful information.
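Because character data may arrive in several chunks, a handler should usually buffer it and join the pieces when the element ends. The following is a minimal sketch with the standard library parser; `TextCollector` and the sample document are assumptions made for illustration, and the simple reset logic only suits leaf elements.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Hypothetical handler that joins character chunks per leaf element."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self.texts = {}

    def startElement(self, name, attrs):
        # Start a fresh buffer for the element that just opened
        self._chunks = []

    def characters(self, content):
        # May be called several times for one run of text
        self._chunks.append(content)

    def endElement(self, name):
        # Only now is the text of the element known to be complete
        self.texts[name] = "".join(self._chunks)
        self._chunks = []


handler = TextCollector()
xml.sax.parseString(b"<title>Tom &amp; Jerry</title>", handler)
title = handler.texts["title"]
```

Joining in endElement gives the same result whether the parser reports the text in one chunk or splits it around the entity reference.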
- ignorableWhitespace(whitespace)[source]¶
Receive notification of ignorable whitespace in element content.
Validating Parsers must use this method to report each chunk of ignorable whitespace (see the W3C XML 1.0 recommendation, section 2.10): non-validating parsers may also use this method if they are capable of parsing and using content models.
SAX parsers may return all contiguous whitespace in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity, so that the Locator provides useful information.
- skippedEntity(name)[source]¶
Receive notification of a skipped entity.
The Parser will invoke this method once for each entity skipped. Non-validating processors may skip entities if they have not seen the declarations (because, for example, the entity was declared in an external DTD subset). All processors may skip external entities, depending on the values of the http://xml.org/sax/features/external-general-entities and the http://xml.org/sax/features/external-parameter-entities features.
- clarin.sru.xml.writer.copy_XML_into_writer(writer: ContentHandler, xml: bytes | str)[source]¶
- class clarin.sru.xml.writer.XMLStreamWriterHelper(xmlwriter: ContentHandler)[source]¶
Bases: ContentHandler
- setDocumentLocator(locator)[source]¶
Called by the parser to give the application a locator for locating the origin of document events.
SAX parsers are strongly encouraged (though not absolutely required) to supply a locator: if a parser does so, it must supply the locator to the application by invoking this method before invoking any of the other methods in the ContentHandler interface.
The locator allows the application to determine the end position of any document-related event, even if the parser is not reporting an error. Typically, the application will use this information for reporting its own errors (such as character content that does not match an application’s business rules). The information returned by the locator is probably not sufficient for use with a search engine.
Note that the locator will return correct information only during the invocation of the events in this interface. The application should not attempt to use it at any other time.
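A minimal sketch of using the locator for error reporting, assuming the standard library expat-based reader: the handler keeps the locator it is given and queries it only while an event is being delivered. `LineTracker` and the sample document are hypothetical names chosen for illustration.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class LineTracker(ContentHandler):
    """Hypothetical handler that notes the line of each start tag."""

    def __init__(self):
        super().__init__()
        self._locator = None
        self.lines = []

    def setDocumentLocator(self, locator):
        # Called before any other event, if the parser supplies a locator
        self._locator = locator

    def startElement(self, name, attrs):
        # Query the locator only inside an event callback
        line = self._locator.getLineNumber() if self._locator else None
        self.lines.append((name, line))


handler = LineTracker()
xml.sax.parseString(b"<a>\n<b/>\n</a>", handler)
lines = handler.lines
```

The recorded line numbers can then be attached to application-level error messages.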
- startPrefixMapping(prefix, uri)[source]¶
Begin the scope of a prefix-URI Namespace mapping.
The information from this event is not necessary for normal Namespace processing: the SAX XML reader will automatically replace prefixes for element and attribute names when the http://xml.org/sax/features/namespaces feature is true (Python's default expat-based reader leaves this feature off unless it is enabled explicitly).
There are cases, however, when applications need to use prefixes in character data or in attribute values, where they cannot safely be expanded automatically; the start/endPrefixMapping event supplies the information to the application to expand prefixes in those contexts itself, if necessary.
Note that start/endPrefixMapping events are not guaranteed to be properly nested relative to each other: all startPrefixMapping events will occur before the corresponding startElement event, and all endPrefixMapping events will occur after the corresponding endElement event, but their order is not guaranteed.
- endPrefixMapping(prefix)[source]¶
End the scope of a prefix-URI mapping.
See startPrefixMapping for details. This event will always occur after the corresponding endElement event, but the order of endPrefixMapping events is not otherwise guaranteed.
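The case of prefixes inside attribute values can be sketched as follows, using the standard library parser in namespace mode. `QNameResolver`, the `type` attribute and the sample document are assumptions for illustration only; the handler tracks the active prefix mappings and expands a prefixed attribute value itself.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class QNameResolver(ContentHandler):
    """Hypothetical handler that expands prefixed values of a 'type' attribute."""

    def __init__(self):
        super().__init__()
        self._prefixes = {}  # prefix -> stack of namespace URIs
        self.resolved = []

    def startPrefixMapping(self, prefix, uri):
        self._prefixes.setdefault(prefix, []).append(uri)

    def endPrefixMapping(self, prefix):
        self._prefixes[prefix].pop()

    def startElementNS(self, name, qname, attrs):
        # Attributes without a namespace are keyed as (None, localname)
        value = attrs.get((None, "type"))
        if value and ":" in value:
            prefix, local = value.split(":", 1)
            stack = self._prefixes.get(prefix)
            if stack:
                # The parser cannot expand this value; the application does
                self.resolved.append((stack[-1], local))


parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)
resolver = QNameResolver()
parser.setContentHandler(resolver)
parser.parse(io.BytesIO(b'<root xmlns:a="urn:example:a"><item type="a:Thing"/></root>'))
resolved = resolver.resolved
```

Keeping a stack per prefix lets an inner element rebind a prefix without losing the outer binding once the inner scope ends.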
- processingInstruction(target, data)[source]¶
Receive notification of a processing instruction.
The Parser will invoke this method once for each processing instruction found: note that processing instructions may occur before or after the main document element.
A SAX parser should never report an XML declaration (XML 1.0, section 2.8) or a text declaration (XML 1.0, section 4.3.1) using this method.
- startDocument()[source]¶
Receive notification of the beginning of a document.
The SAX parser will invoke this method only once, before any other methods in this interface or in DTDHandler (except for setDocumentLocator).
- endDocument()[source]¶
Receive notification of the end of a document.
The SAX parser will invoke this method only once, and it will be the last method invoked during the parse. The parser shall not invoke this method until it has either abandoned parsing (because of an unrecoverable error) or reached the end of input.
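The ordering of the document events can be observed with a small recording handler. This is a sketch using the standard library parser in non-namespace mode; `EventRecorder` and the sample document are illustrative names, not part of this package.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class EventRecorder(ContentHandler):
    """Hypothetical handler that logs the order of document and element events."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def endDocument(self):
        self.events.append("endDocument")

    def startElement(self, name, attrs):
        self.events.append(f"start:{name}")

    def endElement(self, name):
        self.events.append(f"end:{name}")


recorder = EventRecorder()
xml.sax.parseString(b"<a><b/></a>", recorder)
events = recorder.events
```

All element events fall between the single startDocument and endDocument notifications.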
- startElement(name, attrs=None)[source]¶
Signals the start of an element in non-namespace mode.
The name parameter contains the raw XML 1.0 name of the element type as a string and the attrs parameter holds an instance of the Attributes class containing the attributes of the element.
- endElement(name)[source]¶
Signals the end of an element in non-namespace mode.
The name parameter contains the name of the element type, just as with the startElement event.
- startElementNS(name, qname=None, attrs=None)[source]¶
Signals the start of an element in namespace mode.
The name parameter contains the name of the element type as a (uri, localname) tuple, the qname parameter the raw XML 1.0 name used in the source document, and the attrs parameter holds an instance of the Attributes class containing the attributes of the element.
The uri part of the name tuple is None for elements which have no namespace.
- endElementNS(name, qname=None)[source]¶
Signals the end of an element in namespace mode.
The name parameter contains the name of the element type, just as with the startElementNS event.
- characters(content)[source]¶
Receive notification of character data.
The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity so that the Locator provides useful information.
- ignorableWhitespace(whitespace)[source]¶
Receive notification of ignorable whitespace in element content.
Validating Parsers must use this method to report each chunk of ignorable whitespace (see the W3C XML 1.0 recommendation, section 2.10): non-validating parsers may also use this method if they are capable of parsing and using content models.
SAX parsers may return all contiguous whitespace in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity, so that the Locator provides useful information.
- skippedEntity(name)[source]¶
Receive notification of a skipped entity.
The Parser will invoke this method once for each entity skipped. Non-validating processors may skip entities if they have not seen the declarations (because, for example, the entity was declared in an external DTD subset). All processors may skip external entities, depending on the values of the http://xml.org/sax/features/external-general-entities and the http://xml.org/sax/features/external-parameter-entities features.