core.bones.text

The text module contains the TextBone and a custom HTML parser to validate and extract client data for the TextBone.

Module Contents

Classes

CollectBlobKeys

A custom HTML parser that extends the HTMLParser class to collect blob keys found in the "src" attribute of <a> and <img> tags.

HtmlSerializer

A custom HTML parser that extends the HTMLParser class to sanitize and serialize HTML content

TextBone

A bone for storing and validating HTML or plain text content. Can be configured to allow only specific HTML tags and attributes, and enforce a maximum length.

Attributes

_defaultTags

A dictionary containing default configurations for handling HTML content in TextBone instances.

core.bones.text._defaultTags

A dictionary containing default configurations for handling HTML content in TextBone instances.

  • validTags (list[str]):

    A list of valid HTML tags allowed in TextBone instances.

  • validAttrs (dict[str, list[str]]):

    A dictionary mapping valid attributes for each tag. If a tag is not listed, no attributes are allowed for that tag.

  • validStyles (list[str]):

    A list of allowed CSS directives for the TextBone instances.

  • validClasses (list[str]):

    A list of valid CSS class names allowed in TextBone instances.

  • singleTags (list[str]):

    A list of self-closing HTML tags that don’t have corresponding end tags.
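
As an illustration of this structure, a dictionary with the same five keys might look like the sketch below. The concrete tags, attributes, styles, and class names are assumptions chosen for the example, not the actual contents of _defaultTags.

```python
# Hypothetical configuration with the same five keys as _defaultTags.
# The concrete values are examples only, not the library defaults.
custom_valid_html = {
    "validTags": ["b", "i", "p", "a", "img", "br"],
    "validAttrs": {
        "a": ["href", "target", "title"],
        "img": ["src", "alt", "title"],
    },
    "validStyles": ["color"],
    "validClasses": ["highlight"],
    "singleTags": ["br", "img"],
}

# Tags without an entry in validAttrs keep no attributes at all.
allowed_for_p = custom_valid_html["validAttrs"].get("p", [])
```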

class core.bones.text.CollectBlobKeys

Bases: html.parser.HTMLParser

A custom HTML parser that extends the HTMLParser class to collect blob keys found in the “src” attribute of <a> and <img> tags.

Initialize and reset this instance.

If convert_charrefs is True (the default), all character references are automatically converted to the corresponding Unicode characters.

handle_starttag(tag, attrs)

Handles the start tag in the HTML content being parsed. If the tag is an <a> or <img> element, the method extracts the blob key from the “src” attribute and adds it to the “blobs” set.

Parameters:
  • tag (str) – The current start tag encountered by the parser.

  • attrs (List[Tuple[str, str]]) – A list of tuples containing the attribute name and value of the current tag.
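
The collection step described above can be sketched with the standard library alone. The class below is a hypothetical stand-in, not CollectBlobKeys itself: it stores the raw "src" values, whereas the real parser resolves each value to a blob key.

```python
from html.parser import HTMLParser


class CollectSrcValues(HTMLParser):
    """Hypothetical stand-in for CollectBlobKeys: gathers raw src values."""

    def __init__(self):
        super().__init__()
        self.sources = set()

    def handle_starttag(self, tag, attrs):
        # Only <a> and <img> tags are inspected, as in the description above.
        if tag in ("a", "img"):
            for key, value in attrs:
                if key == "src" and value:
                    self.sources.add(value)


collector = CollectSrcValues()
collector.feed('<p><img src="/file/download/abc"><a href="/x">link</a></p>')
collector.close()
```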

class core.bones.text.HtmlSerializer(validHtml=None, srcSet=None)

Bases: html.parser.HTMLParser

A custom HTML parser that extends the HTMLParser class to sanitize and serialize HTML content by removing invalid tags and attributes while retaining the valid ones.

Parameters:
  • validHtml (dict) – A dictionary containing valid HTML tags, attributes, styles, and classes.

  • srcSet (dict) – A dictionary containing width and height for srcset attribute processing.

Initialize and reset this instance.

If convert_charrefs is True (the default), all character references are automatically converted to the corresponding Unicode characters.

__html_serializer_trans

handle_data(data)

Handles the data encountered in the HTML content being parsed. Escapes special characters and appends the data to the result if it is not only whitespace characters.

Parameters:

data (str) – The data encountered by the parser.

handle_charref(name)

Handles character references in the HTML content being parsed and appends the character reference to the result.

Parameters:

name (str) – The name of the character reference.

handle_entityref(name)

Handles entity references in the HTML content being parsed and appends the entity reference to the result.

Parameters:

name (str) – The name of the entity reference.

flushCache()

Flushes pending tags into the result and pushes their corresponding end tags onto the stack.

handle_starttag(tag, attrs)

Handles start tags in the HTML content being parsed. Filters out invalid tags and attributes and processes valid ones.

Parameters:
  • tag (str) – The current start tag encountered by the parser.

  • attrs (List[Tuple[str, str]]) – A list of tuples containing the attribute name and value of the current tag.

handle_endtag(tag)

Handles end tags in the HTML content being parsed. Closes open tags and discards invalid ones.

Parameters:

tag (str) – The current end tag encountered by the parser.

cleanup()

Append missing closing tags to the result.

sanitize(instr)

Sanitizes the input HTML string by removing invalid tags and attributes while retaining valid ones.

Parameters:

instr (str) – The input HTML string to be sanitized.

Returns:

The sanitized HTML string.

Return type:

str
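
The whitelist approach behind sanitize can be sketched as follows. This is a much-reduced, hypothetical sanitizer written for illustration (TinySanitizer and its constructor arguments are assumptions); it ignores styles, classes, and srcset handling and does not reproduce HtmlSerializer's caching of pending tags.

```python
from html import escape
from html.parser import HTMLParser


class TinySanitizer(HTMLParser):
    """Hypothetical, much-reduced whitelist sanitizer (not HtmlSerializer)."""

    def __init__(self, valid_tags, valid_attrs, single_tags):
        super().__init__(convert_charrefs=True)
        self.valid_tags = valid_tags
        self.valid_attrs = valid_attrs
        self.single_tags = single_tags
        self.result = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.valid_tags:
            return  # drop the tag itself, keep its text content
        allowed = self.valid_attrs.get(tag, [])
        kept = "".join(
            f' {key}="{escape(value or "")}"'
            for key, value in attrs
            if key in allowed
        )
        self.result.append(f"<{tag}{kept}>")
        if tag not in self.single_tags:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Close everything opened after this tag, then the tag itself.
            while self.open_tags:
                current = self.open_tags.pop()
                self.result.append(f"</{current}>")
                if current == tag:
                    break

    def handle_data(self, data):
        self.result.append(escape(data))

    def sanitize(self, instr):
        self.result, self.open_tags = [], []
        self.reset()
        self.feed(instr)
        self.close()
        # Append closing tags that the input left out.
        while self.open_tags:
            self.result.append(f"</{self.open_tags.pop()}>")
        return "".join(self.result)


sanitizer = TinySanitizer(["b", "a", "br"], {"a": ["href"]}, ["br"])
clean = sanitizer.sanitize('<b onclick="x()">Hi <span>there</span> <a href="/x" style="y">link</b>')
```

In this sketch, disallowed tags are removed while their text is kept, and unclosed tags are closed in reverse order of opening.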

class core.bones.text.TextBone(*, validHtml=__undefinedC__, max_length=200000, srcSet=None, indexed=False, **kwargs)

Bases: viur.core.bones.base.BaseBone

A bone for storing and validating HTML or plain text content. Can be configured to allow only specific HTML tags and attributes, and enforce a maximum length. Supports the use of srcset for embedded images.

Parameters:
  • validHtml (None | dict) – A dictionary containing allowed HTML tags and their attributes. Defaults to _defaultTags. If set, it must be structured like _defaultTags.

  • max_length (int) – The maximum allowed length for the content, in bytes. Defaults to 200000.

  • languages – If set, this bone can store different content for each language.

  • srcSet (Optional[dict[str, list]]) – An optional dictionary containing width and height for srcset generation. If set, srcset attributes are injected into embedded images. Must be a dict of “width”: [List of Ints], “height”: [List of Ints], e.g. {“height”: [720, 1080]}.

  • indexed (bool) – Whether the content should be indexed for searching. Defaults to False. Must not be set to True unless max_length is limited accordingly.

  • kwargs – Additional keyword arguments to be passed to the base class constructor.
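
To make the expected shape of srcSet concrete, the sketch below checks a candidate value against the description above. The helper check_src_set is hypothetical and not part of the module; whether TextBone performs such a check itself is not stated here.

```python
def check_src_set(src_set):
    """Hypothetical validator for the documented srcSet shape."""
    if src_set is None:
        return True  # srcSet is optional
    if not isinstance(src_set, dict):
        return False
    for key, sizes in src_set.items():
        # Only "width" and "height" are described as valid keys.
        if key not in ("width", "height"):
            return False
        # Each key must map to a list of integers.
        if not isinstance(sizes, list):
            return False
        if not all(isinstance(size, int) for size in sizes):
            return False
    return True
```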

class __undefinedC__

type = 'text'

singleValueSerialize(value, skel, name, parentIndexed)

Serializes a single value of the TextBone instance for storage.

This method takes the value as-is without any additional processing, since it’s already stored in a format suitable for serialization.

singleValueFromClient(value, skel, bone_name, client_data)

Loads a single value from a client.

Parameters:
  • value – The single value which should be loaded.

  • skel – The SkeletonInstance where the value should be loaded into.

  • bone_name – The bone name of this bone in the SkeletonInstance.

  • client_data – The data taken from the client, usually a dictionary with bone names as keys.

Returns:

A tuple. If the value is valid, the first element is the parsed value and the second is None. If the value is invalid or not parseable, the first element is an empty value and the second is a list of ReadFromClientError.
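
The return convention can be illustrated with a small standalone sketch. ClientError is a hypothetical stand-in for ReadFromClientError, and load_single_value is not the real method: it only shows the two tuple shapes and skips HTML sanitizing.

```python
from dataclasses import dataclass


@dataclass
class ClientError:
    """Hypothetical stand-in for ReadFromClientError."""
    message: str


def load_single_value(value, max_length=200000):
    """Return (parsed_value, None) on success or (empty_value, [errors])."""
    if not isinstance(value, str):
        return "", [ClientError("Value is not a string")]
    if len(value) > max_length:
        return "", [ClientError("Maximum length exceeded")]
    return value, None


parsed, errors = load_single_value("<p>Hello</p>")
```

Callers can therefore test the second element against None to decide whether the first element is usable.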

getEmptyValue()

Returns an empty value for the TextBone instance.

This method is used to represent an empty or unset value for the TextBone.

Returns:

An empty string.

Return type:

str

isInvalid(value)

Checks if the given value is valid for this TextBone instance.

This method checks whether the given value is valid according to the TextBone’s constraints (e.g., not None and within the maximum length).

Parameters:

value – The value to be checked for validity.

Returns:

Returns None if the value is valid, or an error message string otherwise.

Return type:

Optional[str]
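
The contract "None means valid, a string means invalid" can be sketched as below. The function name, the error messages, and the use of len() on the string (characters rather than bytes) are assumptions made for the example.

```python
from typing import Optional


def is_invalid(value: Optional[str], max_length: int = 200000) -> Optional[str]:
    """Hypothetical check mirroring the documented contract of isInvalid."""
    if value is None:
        return "No value entered"
    # Assumption: length is measured in characters here, not bytes.
    if len(value) > max_length:
        return "Maximum length exceeded"
    return None  # None signals a valid value
```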

getReferencedBlobs(skel, name)

Extracts and returns the blob keys of referenced files in the HTML content of the TextBone instance.

This method parses the HTML content of the TextBone to identify embedded images or file hrefs, collects their blob keys, and ensures that they are not deleted even if removed from the file browser, preventing broken links or images in the TextBone content.

Parameters:
  • skel (SkeletonInstance) – A SkeletonInstance object containing the data of an entry.

  • name (str) – The name of the TextBone for which to find referenced blobs.

Returns:

A set containing the blob keys of the referenced files in the TextBone’s HTML content.

Return type:

Set[str]

refresh(skel, boneName)

Re-parses the text content of the TextBone instance to rebuild the src-set if necessary.

This method is useful when the src-set configuration has changed and needs to be applied to the existing HTML content. It re-parses the content and updates the src-set attributes accordingly.

Parameters:
  • skel (SkeletonInstance) – A SkeletonInstance object containing the data of an entry.

  • boneName (str) – The name of the TextBone for which to refresh the src-set.

Return type:

None

getSearchTags(skel, name)

Extracts search tags from the text content of a TextBone.

This method iterates over the values of the TextBone in the given skeleton, and for each non-empty value, it tokenizes the text by lines and words. Then, it adds the lowercase version of each word to a set of search tags, which is returned at the end.

Parameters:
  • skel (viur.core.skeleton.SkeletonInstance) – A SkeletonInstance containing the TextBone.

  • name (str) – The name of the TextBone in the skeleton.

Returns:

A set of unique search tags (lowercase words) extracted from the text content of the TextBone.

Return type:

set[str]
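
The tokenization described above can be sketched as a plain function. collect_search_tags is a hypothetical helper that takes the bone's values directly; splitting on whitespace is an assumption, and the real method may filter words or handle markup differently.

```python
def collect_search_tags(values):
    """Hypothetical sketch: lowercase words from all non-empty values."""
    tags = set()
    for value in values:
        if not value:
            continue  # skip empty or unset values
        for line in str(value).splitlines():
            for word in line.split():
                tags.add(word.lower())
    return tags


tags = collect_search_tags(["Hello World\nhello again", "", None])
```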

getUniquePropertyIndexValues(valuesCache, name)

Retrieves the unique property index values for the TextBone.

If the TextBone supports multiple languages, this method raises a NotImplementedError, as it’s unclear whether each language should be kept distinct or not. Otherwise, it calls the superclass’s getUniquePropertyIndexValues method to retrieve the unique property index values.

Parameters:
  • valuesCache (dict) – A dictionary containing the cached values for the TextBone.

  • name (str) – The name of the TextBone.

Returns:

A list of unique property index values for the TextBone.

Raises:

NotImplementedError – If the TextBone supports multiple languages.

Return type:

list[str]

structure()

Describes the bone and its settings as a JSON-serializable dict. This function has to be implemented for subsequent, specialized bone types.

Return type:

dict