SQL Server -replace repetitive words in a column with single word

Question

I have a table with 4 columns and data looks like

`cust_id firstname lastname  value`

`1       a            b      ct;ct;ct;dir`
`2       c            a      ct;ct;ct;ct;ct;ct`
`3       d            e      ct;ct;ct;dir;st`

i want output like

`cust_id firstname lastname value`

`1       a            b      ct;dir`
`2       c            a      ct`
`3       d            e      ct;dir;st`

Have lots of rows with different number of repetitive words in each.

Any help much appreciated.


Show source
| sql-server   | sql-server-2008   | sql   | sql-server-2012   2017-01-04 14:01 4 Answers

Answers ( 4 )

  1. 2017-01-04 14:01

    If you want repeated, adjacent instances of ct; to be turned into a single one, you can do:

    select replace(replace(replace(col, 'ct;', '><'), '<>', ''), '><', 'ct;')
    

    This assumes that "<" and ">" do not appear in the column. Any two characters can be used for this purpose.

    You can also readily put this into an update.

    Note: If these are codes of some sort, then you should normalize the data. This problem does appear in other contexts where normalization is not appropriate (for instance, removing consecutive spaces in a string).

  2. 2017-01-04 14:01

    With the help of a Parse/Split Function and a CROSS apply. I should add that the logic of the PARSE can easily be migrated into the CROSS APPLY if you can't use a UDF

    Also, the sequence will be preserved if that matters.

    Declare @YourTable table (cust_id int,value varchar(50))
    Insert Into @YourTable values
    (1,'ct;ct;ct;dir'),
    (2,'ct;ct;ct;ct;ct;ct'),
    (3,'ct;ct;ct;dir;st')
    
    Select A.*
          ,B.*
     From  @YourTable A
     Cross Apply (
        Select String=Stuff((Select Distinct ';' +RetVal From (
            Select RetVal,RetSeq=min(RetSeq)
              From [dbo].[udf-Str-Parse](A.Value,';') C
              Group By RetVal
        ) X For XML Path ('')),1,1,'') 
      ) B
    

    Returns

    cust_id value                String
    1       ct;ct;ct;dir         ct;dir
    2       ct;ct;ct;ct;ct;ct    ct
    3       ct;ct;ct;dir;st      ct;dir;st
    

    The UDF if Needed

    CREATE FUNCTION [dbo].[udf-Str-Parse] (@String varchar(max),@Delimiter varchar(10))
    Returns Table 
    As
    Return (  
        Select RetSeq = Row_Number() over (Order By (Select null))
              ,RetVal = LTrim(RTrim(B.i.value('(./text())[1]', 'varchar(max)')))
        From  (Select x = Cast('<x>'+ replace((Select @String as [*] For XML Path('')),@Delimiter,'</x><x>')+'</x>' as xml).query('.')) as A 
        Cross Apply x.nodes('x') AS B(i)
    );
    --Select * from [dbo].[udf-Str-Parse]('Dog,Cat,House,Car',',')
    --Select * from [dbo].[udf-Str-Parse]('John Cappelletti was here',' ')
    --Select * from [dbo].[udf-Str-Parse]('this,is,<test>,for,< & >',',')
    

    Another Parse/Spit function (Returns same results as XML version)

    CREATE FUNCTION [dbo].[udf-Str-Parse-8K] (@String varchar(max),@Delimiter varchar(10))
    Returns Table 
    As
    Return (  
        with   cte1(N)   As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
               cte2(N)   As (Select Top (IsNull(DataLength(@String),0)) Row_Number() over (Order By (Select NULL)) From (Select N=1 From cte1 a,cte1 b,cte1 c,cte1 d) A ),
               cte3(N)   As (Select 1 Union All Select t.N+DataLength(@Delimiter) From cte2 t Where Substring(@String,t.N,DataLength(@Delimiter)) = @Delimiter),
               cte4(N,L) As (Select S.N,IsNull(NullIf(CharIndex(@Delimiter,@String,s.N),0)-S.N,8000) From cte3 S)
    
        Select RetSeq = Row_Number() over (Order By A.N)
              ,RetVal = Substring(@String, A.N, A.L) 
        From   cte4 A
    );
    --Orginal Source http://www.sqlservercentral.com/articles/Tally+Table/72993/
    --Much faster than str-Parse, but limited to 8K
    --Select * from [dbo].[udf-Str-Parse-8K]('Dog,Cat,House,Car',',')
    --Select * from [dbo].[udf-Str-Parse-8K]('John||Cappelletti||was||here','||')
    
  3. 2017-01-04 14:01

    The first thing to point out is that if you stored your data in a normalised fashion then you would not have as much of an issue, the best way would be a separate one to many table, e.g.

    CustomerValues

    Cust_ID     Value
    -------------------
    1           ct
    1           ct
    1           ct
    1           dir
    2           ct
    2           ct
    .....
    

    Your query would then become something like:

    --SAMPLE DATA
    WITH Customers AS
    (   SELECT  *
        FROM    (VALUES
                    (1, 'a', 'b'),
                    (2, 'c', 'a'),
                    (3, 'd', 'e')
                ) AS t (cust_id, firstname, lastname)
    ), CustomerValues AS
    (   SELECT  *
        FROM    (VALUES
                    (1, 'ct'), (1, 'ct'), (1, 'ct'), (1, 'dir'),
                    (2, 'ct'), (2, 'ct'), (2, 'ct'), (2, 'ct'), (2, 'ct'), (2, 'ct'),
                    (3, 'ct'), (3, 'ct'), (3, 'ct'), (3, 'dir'), (3, 'st')
                ) AS t (cust_id, value)
    )
    -- SAMPLE DATA END
    SELECT  c.cust_id,
            c.firstname,
            c.lastname,
            value = STUFF(CustomerValues.value('.', 'NVARCHAR(MAX)'), 1, 1, '')
    FROM    Customers AS c
            CROSS APPLY
            (   SELECT  DISTINCT ';' + value
                FROM    CustomerValues AS cv
                WHERE   cv.cust_id = c.cust_id
                FOR XML PATH(''), TYPE
            ) AS cv (CustomerValues);
    

    For more reading on how the rows are concatenated see Grouped Concatenation in SQL Server

    Without your data in this format, you would need to perform some kind of split. For more see Split strings the right way – or the next best way

    WITH Customers AS
    (   SELECT  *
        FROM    (VALUES
                    (1, 'a', 'b', 'ct;ct;ct;dir'),
                    (2, 'c', 'a', 'ct;ct;ct;ct;ct;ct'),
                    (3, 'd', 'e', 'ct;ct;ct;dir;st')
                ) AS t (cust_id, firstname, lastname, value)
    ), Numbers (Number) AS
    (   SELECT  ROW_NUMBER() OVER(ORDER BY N1.N)
        FROM    (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS n1 (N)
        CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS n2 (N)
        CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS n3 (N)
    ), CustomerValues AS
    (   SELECT  c.cust_id,
                value = SUBSTRING(c.value, Number,  CHARINDEX(';', c.value + ';', n.Number) - n.Number)
        FROM    Customers AS c
                INNER JOIN Numbers AS n
                    ON N.Number <= CONVERT(INT, LEN(c.value))
                    AND SUBSTRING(';' + c.value, n.Number, 1) = ';'
    )
    SELECT  c.cust_id,
            c.firstname,
            c.lastname,
            value = STUFF(CustomerValues.value('.', 'NVARCHAR(MAX)'), 1, 1, '')
    FROM    Customers AS c
            CROSS APPLY
            (   SELECT  DISTINCT ';' + value
                FROM    CustomerValues AS cv
                WHERE   cv.cust_id = c.cust_id
                FOR XML PATH(''), TYPE
            ) AS cv (CustomerValues);
    
  4. 2017-01-04 14:01

    Here is how you can do:

    WITH 
    CTE_Sample AS
    (
        SELECT 1 AS cust_id, 'a' AS firstname, 'b' AS lastname, 'ct;ct;ct;dir'      AS YourValue UNION ALL
        SELECT 2 AS cust_id, 'c' AS firstname, 'a' AS lastname, 'ct;ct;ct;ct;ct;ct' AS YourValue UNION ALL
        SELECT 3 AS cust_id, 'd' AS firstname, 'e' AS lastname, 'ct;ct;ct;dir;st'   AS YourValue 
    ),
    
    --
    -- Split your values into lines (Distinct values)
    CTE_Split AS
    (
        SELECT DISTINCT
                 YourValue  
                ,value AS Val
              FROM CTE_Sample SS
              CROSS APPLY STRING_SPLIT(YourValue, ';')
    )
    
    
    SELECT 
         cust_id
        ,firstname
        ,lastname
        -- Aggregate your different value into one column
        ,STUFF((
            SELECT ';'+ Val
              FROM CTE_Split SP       
              WHERE SP.YourValue = SA.YourValue
            FOR XML PATH('')
                            ), 1, 1, '' 
        ) AS Val
     FROM CTE_Sample SA
    
◀ Go back