core.bones.text
===============
.. py:module:: core.bones.text
.. autoapi-nested-parse::
The `text` module contains the `TextBone` and a custom HTML parser
to validate and extract client data for the `TextBone`.
Classes
-------
.. autoapisummary::
core.bones.text.HtmlBoneConfiguration
core.bones.text.CollectBlobKeys
core.bones.text.HtmlSerializer
core.bones.text.TextBone
Module Contents
---------------
.. py:class:: HtmlBoneConfiguration
Bases: :py:obj:`TypedDict`
A dictionary containing configurations for handling HTML content in TextBone instances.
.. py:attribute:: validTags
:type: list[str]
A list of valid HTML tags allowed in TextBone instances.
.. py:attribute:: validAttrs
:type: dict[str, list[str]]
A dictionary mapping each tag to the list of attributes allowed on it. A tag that is not listed accepts no attributes.
.. py:attribute:: validStyles
:type: list[str]
A list of CSS directives allowed in TextBone instances.
.. py:attribute:: validClasses
:type: list[str]
A list of valid CSS class names allowed in TextBone instances.
.. py:attribute:: singleTags
:type: list[str]
A list of self-closing HTML tags that don't have corresponding end tags.
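
The keys described above can be pictured as a plain dictionary. The following is a minimal sketch rather than the module's own definition: ``ExampleHtmlConfig`` is a local stand-in that mirrors the documented keys, and every tag, attribute, style, and class value is an illustrative assumption, not the default shipped in ``conf.bone_html_default_allow``.

.. code-block:: python

   from typing import TypedDict


   class ExampleHtmlConfig(TypedDict):
       """Local stand-in mirroring the documented HtmlBoneConfiguration keys."""
       validTags: list[str]
       validAttrs: dict[str, list[str]]
       validStyles: list[str]
       validClasses: list[str]
       singleTags: list[str]


   # Illustrative values only (assumed, not the library defaults).
   example_config: ExampleHtmlConfig = {
       "validTags": ["p", "b", "i", "a", "img", "br"],
       "validAttrs": {"a": ["href", "target"], "img": ["src", "alt"]},
       "validStyles": ["color"],
       "validClasses": ["highlight"],
       "singleTags": ["br", "img"],
   }

   # A tag missing from validAttrs accepts no attributes at all.
   allowed_for_p = example_config["validAttrs"].get("p", [])

Looking up a tag with ``.get(tag, [])`` keeps the "not listed means no attributes" rule in one place.
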
.. py:class:: CollectBlobKeys
Bases: :py:obj:`html.parser.HTMLParser`
A custom HTML parser that extends the HTMLParser class to collect blob keys found in the "src" attribute
of ``<a>`` and ``<img>`` tags.
Initialize and reset this instance.
If convert_charrefs is True (the default), all character references
are automatically converted to the corresponding Unicode characters.
.. py:attribute:: blobs
.. py:method:: handle_starttag(tag, attrs)
Handles the start tag in the HTML content being parsed. If the tag is an ``<a>`` or ``<img>`` element,
the method extracts the blob key from the "src" attribute and adds it to the "blobs" set.
:param str tag: The current start tag encountered by the parser.
:param List[Tuple[str, str]] attrs: A list of tuples containing the attribute name and value of the current tag.
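
The collection step can be sketched with the standard library alone. This is a simplified stand-in, not the class documented here: ``SrcCollector`` and its ``sources`` attribute are hypothetical names, the choice of ``a`` and ``img`` tags is inferred from the description, and the sketch stores the raw ``src`` values because resolving a URL to a blob key is library-specific and omitted.

.. code-block:: python

   from html.parser import HTMLParser


   class SrcCollector(HTMLParser):
       """Simplified stand-in: collects raw "src" values instead of blob keys."""

       def __init__(self):
           super().__init__()
           self.sources = set()

       def handle_starttag(self, tag, attrs):
           # Only <a> and <img> elements are inspected; other tags are ignored.
           if tag in ("a", "img"):
               for name, value in attrs:
                   if name == "src" and value:
                       self.sources.add(value)


   collector = SrcCollector()
   collector.feed(
       '<p><img src="/one.png"><a src="/two.pdf">x</a>'
       '<video src="/no.mp4"></video></p>'
   )
   collector.close()

Using a set means a file referenced several times in the same text is recorded once.
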
.. py:class:: HtmlSerializer(validHtml = None, srcSet=None, convert_charrefs = True)
Bases: :py:obj:`html.parser.HTMLParser`
A custom HTML parser that extends the HTMLParser class to sanitize and serialize HTML content
by removing invalid tags and attributes while retaining the valid ones.
:param dict validHtml: A dictionary containing valid HTML tags, attributes, styles, and classes.
:param dict srcSet: A dictionary containing width and height for srcset attribute processing.
Initialize and reset this instance.
If convert_charrefs is True (the default), all character references
are automatically converted to the corresponding Unicode characters.
.. py:attribute:: __html_serializer_trans
.. py:attribute:: result
:value: ''
.. py:attribute:: openTagsList
:value: []
.. py:attribute:: tagCache
:value: []
.. py:attribute:: validHtml
:value: None
.. py:attribute:: srcSet
:value: None
.. py:method:: handle_data(data)
Handles the data encountered in the HTML content being parsed. Escapes special characters
and appends the data to the result if it is not only whitespace characters.
:param str data: The data encountered by the parser.
.. py:method:: handle_charref(name)
Handles character references in the HTML content being parsed and appends the character reference to the
result.
:param str name: The name of the character reference.
.. py:method:: handle_entityref(name)
Handles entity references in the HTML content being parsed and appends the entity reference to the result.
:param str name: The name of the entity reference.
.. py:method:: flushCache()
Flush pending tags into the result and push their corresponding end tags onto the stack.
.. py:method:: handle_starttag(tag, attrs)
Handles start tags in the HTML content being parsed. Filters out invalid tags and attributes and
processes valid ones.
:param str tag: The current start tag encountered by the parser.
:param List[Tuple[str, str]] attrs: A list of tuples containing the attribute name and value of the current tag.
.. py:method:: handle_endtag(tag)
Handles end tags in the HTML content being parsed. Closes open tags and discards invalid ones.
:param str tag: The current end tag encountered by the parser.
.. py:method:: cleanup()
Append missing closing tags to the result.
.. py:method:: sanitize(instr)
Sanitizes the input HTML string by removing invalid tags and attributes while retaining valid ones.
:param str instr: The input HTML string to be sanitized.
:return: The sanitized HTML string.
:rtype: str
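
The general allow-list technique behind such a sanitizer can be sketched as follows. This is a deliberately reduced illustration under stated assumptions, not the implementation of ``HtmlSerializer``: ``MiniSanitizer`` and its constructor arguments are hypothetical, and style filtering, class filtering, srcset handling, and the tag cache are all left out.

.. code-block:: python

   from html import escape
   from html.parser import HTMLParser


   class MiniSanitizer(HTMLParser):
       """Simplified stand-in for an allow-list based HTML sanitizer."""

       def __init__(self, valid_tags, valid_attrs, single_tags):
           super().__init__(convert_charrefs=True)
           self.valid_tags = valid_tags
           self.valid_attrs = valid_attrs
           self.single_tags = single_tags
           self.result = ""
           self.open_tags = []

       def handle_starttag(self, tag, attrs):
           if tag not in self.valid_tags:
               return  # drop the tag itself, keep its text content
           allowed = self.valid_attrs.get(tag, [])
           kept = "".join(
               f' {name}="{escape(value or "", quote=True)}"'
               for name, value in attrs
               if name in allowed
           )
           self.result += f"<{tag}{kept}>"
           if tag not in self.single_tags:
               self.open_tags.append(tag)

       def handle_endtag(self, tag):
           # Close open tags up to and including the matching one.
           if tag in self.open_tags:
               while self.open_tags:
                   current = self.open_tags.pop()
                   self.result += f"</{current}>"
                   if current == tag:
                       break

       def handle_data(self, data):
           self.result += escape(data, quote=False)

       def sanitize(self, text):
           self.reset()
           self.result = ""
           self.open_tags = []
           self.feed(text)
           self.close()
           # Append any closing tags the input left out.
           while self.open_tags:
               self.result += f"</{self.open_tags.pop()}>"
           return self.result


   sanitizer = MiniSanitizer(
       valid_tags=["p", "b", "br"],
       valid_attrs={"p": ["class"]},
       single_tags=["br"],
   )
   clean = sanitizer.sanitize(
       '<p class="x" onclick="evil()">Hi <span>you</span> <b>there</p>'
   )

In this sketch the ``onclick`` attribute and the ``span`` tags are dropped, the text inside ``span`` is kept, and the unclosed ``b`` element is closed before its parent.
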
.. py:class:: TextBone(*, validHtml = __undefinedC__, max_length = 200000, srcSet = None, indexed = False, **kwargs)
Bases: :py:obj:`core.bones.raw.RawBone`
A bone for storing and validating HTML or plain text content. Can be configured to allow
only specific HTML tags and attributes, and enforce a maximum length. Supports the use of
srcset for embedded images.
:param validHtml: A dictionary containing allowed HTML tags and their attributes. If set, it must be
a structure like `conf.bone_html_default_allow`, which is also the default.
:param max_length: The maximum allowed length for the content, in bytes. Defaults to 200000.
:param languages: If set, this bone can store different content for each language.
:param srcSet: An optional dictionary containing width and height for srcset generation. If set,
srcset attributes are injected into embedded images. Must be a dict of
"width": [List of Ints], "height": [List of Ints], e.g. {"height": [720, 1080]}
:param indexed: Whether the content should be indexed for searching. Defaults to False and must not be
set to True unless max_length is limited accordingly.
:param kwargs: Additional keyword arguments to be passed to the base class constructor.
.. py:class:: __undefinedC__
.. py:attribute:: type
:value: 'text'
.. py:attribute:: validHtml
.. py:attribute:: max_length
:value: 200000
.. py:attribute:: srcSet
:value: None
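
The documented shape of ``srcSet`` (a dict with optional ``"width"`` and ``"height"`` keys, each holding a list of integers) can be checked with a few lines. ``check_src_set`` below is a hypothetical helper written for illustration; it is not part of the module, and the bone itself may validate its argument differently or not at all.

.. code-block:: python

   def check_src_set(src_set):
       """Return True if src_set matches the documented width/height shape."""
       if src_set is None:
           return True  # srcSet is optional
       if not isinstance(src_set, dict):
           return False
       for key, sizes in src_set.items():
           if key not in ("width", "height"):
               return False
           if not isinstance(sizes, list):
               return False
           if not all(isinstance(size, int) for size in sizes):
               return False
       return True


   ok = check_src_set({"height": [720, 1080]})
   bad_key = check_src_set({"depth": [1]})
   bad_value = check_src_set({"width": "720"})
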
.. py:method:: singleValueSerialize(value, skel, name, parentIndexed)
Serializes a single value of the TextBone instance for storage.
This method takes the value as-is without any additional processing, since it's already stored in a format
suitable for serialization.
.. py:method:: singleValueFromClient(value, skel, bone_name, client_data)
.. py:method:: getEmptyValue()
Returns an empty value for the TextBone instance.
This method is used to represent an empty or unset value for the TextBone.
:return: An empty string.
:rtype: str
.. py:method:: isInvalid(value)
Checks if the given value is valid for this TextBone instance.
This method checks whether the given value is valid according to the TextBone's constraints (e.g., not
None and within the maximum length).
:param value: The value to be checked for validity.
:return: Returns None if the value is valid, or an error message string otherwise.
:rtype: Optional[str]
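
The return convention (``None`` for a valid value, an error message otherwise) can be sketched as a standalone function. ``is_invalid`` and its message strings are illustrative assumptions, not the method's actual code or wording.

.. code-block:: python

   from typing import Optional


   def is_invalid(value: Optional[str], max_length: int = 200000) -> Optional[str]:
       """Return None for a valid value, otherwise an error message."""
       if value is None:
           return "No value entered"  # assumed wording
       # len() counts characters; a byte-based limit would instead
       # measure len(value.encode("utf-8")).
       if len(value) > max_length:
           return "Maximum length exceeded"  # assumed wording
       return None


   error = is_invalid("x" * 10, max_length=5)
   no_error = is_invalid("short", max_length=5)

Because a valid value yields ``None``, callers can write ``if (err := is_invalid(value)): ...`` to handle only the failing case.
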
.. py:method:: getReferencedBlobs(skel, name)
Extracts and returns the blob keys of referenced files in the HTML content of the TextBone instance.
This method parses the HTML content of the TextBone to identify embedded images or file hrefs,
collects their blob keys, and ensures that they are not deleted even if removed from the file browser,
preventing broken links or images in the TextBone content.
:param SkeletonInstance skel: A SkeletonInstance object containing the data of an entry.
:param str name: The name of the TextBone for which to find referenced blobs.
:return: A set containing the blob keys of the referenced files in the TextBone's HTML content.
:rtype: Set[str]
.. py:method:: refresh(skel, boneName)
Re-parses the text content of the TextBone instance to rebuild the src-set if necessary.
This method is useful when the src-set configuration has changed and needs to be applied
to the existing HTML content. It re-parses the content and updates the src-set attributes
accordingly.
:param SkeletonInstance skel: A SkeletonInstance object containing the data of an entry.
:param str boneName: The name of the TextBone for which to refresh the src-set.
.. py:method:: getUniquePropertyIndexValues(valuesCache, name)
Retrieves the unique property index values for the TextBone.
If the TextBone supports multiple languages, this method raises a NotImplementedError, as it's unclear
whether each language should be kept distinct or not. Otherwise, it calls the superclass's
getUniquePropertyIndexValues method to retrieve the unique property index values.
:param valuesCache: A dictionary containing the cached values for the TextBone.
:param name: The name of the TextBone.
:return: A list of unique property index values for the TextBone.
:raises NotImplementedError: If the TextBone supports multiple languages.
.. py:method:: structure()