XSL analyze-string difficulty with tokenized strings


I need to tokenize a string and then run analyze-string on each of the tokens. This, however, seems impossible, because the transformation fails with:

XPTY0020: Required item type of the context item for the child axis is node(); supplied value has item type xs:string

As far as I can tell, this happens because analyze-string requires a node context.

This is driving me insane. analyze-string should, well, analyze strings, so I don't understand how to get around this problem.

My (simplified) XML looks like this:

<?xml version="1.0" encoding="UTF-8"?>
        <field name="def">1) ἀλλά sed, vero 2) καί et 3) а cum condicionali iunctum aequiparat
            аште: 4) ἵνα ut chron.</field>
        <field name="def">ἡλοῦν clavo figere</field>

and my stylesheet looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" version="2.0">

    <xsl:strip-space elements="*"/>
    <xsl:output omit-xml-declaration="no" indent="yes"/>

    <xsl:template match="field[@name = 'def']">
        <entry>
            <xsl:call-template name="sense">
                <xsl:with-param name="def" select="."/>
            </xsl:call-template>
        </entry>
    </xsl:template>

    <xsl:template name="sense">
        <xsl:param name="def"/>
        <xsl:param name="separator" select="'\d{1,2}\)\s'"/>

        <xsl:for-each select="tokenize(normalize-space($def), $separator)">
            <xsl:if test="string-length(.) > 0">
                <xsl:element name="sense">
                    <xsl:attribute name="n">
                        <xsl:value-of select="position() - 1"/>
                    </xsl:attribute>
                    <!--this is the problematic bit, because current() is
                    a string here, and, paradoxically, analyze-string
                    cannot deal with it-->
                    <xsl:analyze-string select="current()"
                        regex="^([\p{IsGreek}\p{IsGreekExtended}]+[\s]*[\p{IsGreek}\p{IsGreekExtended}]*)(.*$)">
                        <xsl:matching-substring>
                            <greek>
                                <xsl:value-of select="regex-group(1)"/>
                            </greek>
                            <xsl:value-of select="regex-group(2)"/>
                        </xsl:matching-substring>
                        <xsl:non-matching-substring>
                            <xsl:value-of select="current()"/>
                        </xsl:non-matching-substring>
                    </xsl:analyze-string>
                </xsl:element>
            </xsl:if>
        </xsl:for-each>
    </xsl:template>

</xsl:stylesheet>

Without the problematic analyze-string instruction, the stylesheet above correctly produces the following output:

<?xml version="1.0" encoding="UTF-8"?>
<entry xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <sense n="1">ἀλλά sed, vero </sense>
   <sense n="2">καί et </sense>
   <sense n="3">а cum condicionali iunctum aequiparat аште: </sense>
   <sense n="4">ἵνα ut chron.</sense>
</entry>
<entry xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <sense n="0">ἡλοῦν clavo figere</sense>
</entry>

The stylesheet uses the tokenize() function to separate multiple senses. Then, for each identified sense, I want to use analyze-string to wrap the first Greek word in <greek></greek>.
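The output numbering (n="1" for the first numbered sense, n="0" when there is only one sense) follows from what tokenize() returns. Below is a minimal XSLT 2.0 sketch that prints each token with its position. The input string is copied from the first field. The match="/" template is only a convenient entry point, so the stylesheet can be run against any XML document with any XSLT 2.0 processor:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output method="text"/>
    <xsl:template match="/">
        <!-- The separator matches at the very start of the string,
             so tokenize() returns a zero-length first token. -->
        <xsl:for-each select="tokenize('1) ἀλλά sed, vero 2) καί et', '\d{1,2}\)\s')">
            <!-- Print "position: [token]" so empty tokens and trailing spaces are visible. -->
            <xsl:value-of select="concat(position(), ': [', ., ']&#10;')"/>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

This should print `1: []`, `2: [ἀλλά sed, vero ]` and `3: [καί et]`. Because of the leading empty token, the first real sense sits at position 2 and gets n="1" from position() - 1. A definition without numbered senses yields a single token at position 1, hence n="0". Each token also keeps the space that preceded the next number.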

What workaround can I use to make analyze-string work on tokens, i.e. strings, rather than nodes?

Many thanks in advance!

Tags: xml, regex, xslt, xslt-2.0, tokenize (asked 2016-12-02 20:12)

Answers (1)

  1. 2016-12-02 21:12

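    I think the problem is that the regex attribute of xsl:analyze-string is an attribute value template. The curly braces in \p{IsGreek} are therefore evaluated as XPath expressions: {IsGreek} is read as the path child::IsGreek, which raises XPTY0020 when the context item is a string. Your curly braces need to be doubled, i.e.

    regex="^([\p{{IsGreek}}\p{{IsGreekExtended}}]+[\s]*[\p{{IsGreek}}\p{{IsGreekExtended}}]*)(.*$)"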

    Alternatively, define the pattern outside the attribute, in a variable, e.g.

    <xsl:variable name="pattern">^([\p{IsGreek}\p{IsGreekExtended}]+[\s]*[\p{IsGreek}\p{IsGreekExtended}]*)(.*$)</xsl:variable>

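    and then use regex="{$pattern}". Here the braces are meant as an attribute value template, so the variable's string value is passed to the regex engine unchanged.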
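The following is a sketch of the complete, corrected stylesheet. It assumes an XSLT 2.0 processor. It also assumes the field elements sit inside some root element, which the question does not show. For brevity it inlines the asker's named template, uses the answer's variable approach with the answer's regex unchanged, and wraps the first Greek word in <greek> as the question asks. It shows that analyze-string accepts a string as its select value. Only the unescaped braces in the regex attribute caused the error.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

    <xsl:strip-space elements="*"/>
    <xsl:output indent="yes"/>

    <!-- Element content is not an attribute value template, so these braces
         reach the regex engine unchanged. -->
    <xsl:variable name="pattern">^([\p{IsGreek}\p{IsGreekExtended}]+[\s]*[\p{IsGreek}\p{IsGreekExtended}]*)(.*$)</xsl:variable>

    <xsl:template match="field[@name = 'def']">
        <entry>
            <xsl:for-each select="tokenize(normalize-space(.), '\d{1,2}\)\s')">
                <!-- Skip the empty token produced when the text starts with "1) ". -->
                <xsl:if test="string-length(.) > 0">
                    <sense n="{position() - 1}">
                        <!-- The context item is a string; analyze-string accepts it. -->
                        <xsl:analyze-string select="." regex="{$pattern}">
                            <xsl:matching-substring>
                                <!-- Group 1: leading Greek word(s); group 2: the rest. -->
                                <greek><xsl:value-of select="regex-group(1)"/></greek>
                                <xsl:value-of select="regex-group(2)"/>
                            </xsl:matching-substring>
                            <xsl:non-matching-substring>
                                <!-- Senses that do not start with Greek are copied as-is. -->
                                <xsl:value-of select="."/>
                            </xsl:non-matching-substring>
                        </xsl:analyze-string>
                    </sense>
                </xsl:if>
            </xsl:for-each>
        </entry>
    </xsl:template>
</xsl:stylesheet>
```

On the question's input this should yield senses such as `<sense n="1"><greek>ἀλλά </greek>sed, vero </sense>`. The third sense does not start with a Greek letter, so it falls through to xsl:non-matching-substring and is copied unchanged. Group 1 keeps the trailing space because `[\s]*` is inside the first group. Move it outside the group if the space should stay outside <greek>.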
