Data.Text
Remarks
Text is a more efficient alternative to Haskell’s standard String type. String is defined as a linked list of characters in the standard Prelude, per the Haskell Report:
type String = [Char]
Text is represented as a packed array of Unicode characters. This is similar to how most other high-level languages represent strings, and gives much better time and space efficiency than the list version.
Text should be preferred over String for all production usage. A notable exception is when you depend on a library that exposes a String API, but even then it can pay to use Text internally and convert to String only at the boundary with that library.
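The boundary conversion described above can be sketched as follows. This is only an illustration: legacyShout is a hypothetical stand-in for a third-party function with a String API, and shout is a made-up name. The conversions themselves use T.pack and T.unpack from Data.Text.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T

-- Hypothetical stand-in for a library function that only accepts String.
legacyShout :: String -> String
legacyShout s = s ++ "!"

-- Work on Text internally; convert to String only to call the library,
-- then convert the result straight back to Text.
shout :: T.Text -> T.Text
shout = T.pack . legacyShout . T.unpack . T.toUpper . T.strip
```

Both T.pack and T.unpack traverse the whole value, so it is usually worth converting once at the boundary rather than repeatedly inside a loop.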
All of the examples in this topic use the OverloadedStrings language extension.
Text Literals
The OverloadedStrings language extension allows the use of normal string literals to stand for Text values.
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
myText :: T.Text
myText = "overloaded"
Stripping whitespace
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
myText :: T.Text
myText = "\n\r\t leading and trailing whitespace \t\r\n"
strip removes whitespace from the start and end of a Text value.
ghci> T.strip myText
"leading and trailing whitespace"
stripStart removes whitespace only from the start.
ghci> T.stripStart myText
"leading and trailing whitespace \t\r\n"
stripEnd removes whitespace only from the end.
ghci> T.stripEnd myText
"\n\r\t leading and trailing whitespace"
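filter keeps only the characters that satisfy a predicate, so it can be used to remove whitespace, or other characters, from the middle. The operator section must be parenthesized.
ghci> T.filter (/= ' ') "spaces in the middle of a text string"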
"spacesinthemiddleofatextstring"
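The filter example above removes only the space character. A minimal sketch of two common variations follows, assuming Data.Char.isSpace is an acceptable definition of whitespace. The names removeWhitespace and collapseWhitespace are illustrative and not part of Data.Text.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Char (isSpace)
import qualified Data.Text as T

-- Drop every whitespace character (spaces, tabs, newlines, ...),
-- wherever it appears in the input.
removeWhitespace :: T.Text -> T.Text
removeWhitespace = T.filter (not . isSpace)

-- Collapse each run of whitespace into a single space and trim both ends:
-- T.words splits on whitespace, T.unwords rejoins with single spaces.
collapseWhitespace :: T.Text -> T.Text
collapseWhitespace = T.unwords . T.words
```

collapseWhitespace is often the better choice for user-facing text, since it keeps word boundaries intact.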
Splitting Text Values
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
myText :: T.Text
myText = "mississippi"
splitOn breaks a Text into a list of Text values at each occurrence of a separator substring.
ghci> T.splitOn "ss" myText
["mi","i","ippi"]
splitOn is the inverse of intercalate.
ghci> T.intercalate "ss" (T.splitOn "ss" myText)
"mississippi"
split breaks a Text value into chunks on characters that satisfy a Boolean predicate.
ghci> T.split (== 'i') myText
["m","ss","ss","pp",""]
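As a small applied sketch of splitOn, the function below splits a comma-separated line into trimmed fields. The name fields and the choice of "," as separator are assumptions made for illustration. This is not a CSV parser, since it does not handle quoting or escaped commas.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T

-- Split on commas, then strip surrounding whitespace from each field.
-- Empty fields are preserved, mirroring how splitOn keeps empty chunks.
fields :: T.Text -> [T.Text]
fields = map T.strip . T.splitOn ","
```

Like the trailing "" in the split example above, adjacent separators produce empty Text values rather than being merged.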
Encoding and Decoding Text
Encoding and decoding functions for a variety of Unicode encodings can be found in the Data.Text.Encoding module.
ghci> import Data.Text.Encoding
ghci> decodeUtf8 (encodeUtf8 "my text")
"my text"
Note that decodeUtf8 will throw an exception on invalid input. If you want to handle invalid UTF-8 yourself, use decodeUtf8With.
ghci> decodeUtf8With (\errorDescription input -> Nothing) messyOutsideData
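Two other ways of handling invalid input without an exception are sketched below, using decodeUtf8' from Data.Text.Encoding and lenientDecode from Data.Text.Encoding.Error. The sample bytes and the names messyBytes, decodeStrict and decodeLenient are hypothetical and chosen only for this example.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8', decodeUtf8With)
import Data.Text.Encoding.Error (lenientDecode)

-- Hypothetical sample input: the bytes for "hi" followed by 0xFF,
-- a byte that can never appear in well-formed UTF-8.
messyBytes :: B.ByteString
messyBytes = B.pack [0x68, 0x69, 0xFF]

-- Strict decoding: failure is reported as a value instead of an exception.
decodeStrict :: B.ByteString -> Maybe T.Text
decodeStrict = either (const Nothing) Just . decodeUtf8'

-- Lenient decoding: invalid input is replaced with U+FFFD.
decodeLenient :: B.ByteString -> T.Text
decodeLenient = decodeUtf8With lenientDecode
```

Strict decoding suits input that must be rejected when malformed. Lenient decoding suits logs and other data where a best-effort result is acceptable.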
Checking if a Text is a substring of another Text
ghci> :set -XOverloadedStrings
ghci> import qualified Data.Text as T
isInfixOf :: Text -> Text -> Bool checks whether a Text is contained anywhere within another Text.
ghci> "rum" `T.isInfixOf` "crumble"
True
isPrefixOf :: Text -> Text -> Bool checks whether a Text appears at the beginning of another Text.
ghci> "crumb" `T.isPrefixOf` "crumble"
True
isSuffixOf :: Text -> Text -> Bool checks whether a Text appears at the end of another Text.
ghci> "rumble" `T.isSuffixOf` "crumble"
True
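The three predicates can be combined in guards. The sketch below classifies where a needle occurs in a haystack. The Match type and the classify function are hypothetical names introduced only for this illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T

-- Illustrative result type; not part of Data.Text.
data Match = Prefix | Suffix | Infix | NoMatch
  deriving (Eq, Show)

-- Guards are tried top to bottom, so a needle that is both a prefix
-- and a suffix (for example the whole haystack) is reported as Prefix.
classify :: T.Text -> T.Text -> Match
classify needle haystack
  | needle `T.isPrefixOf` haystack = Prefix
  | needle `T.isSuffixOf` haystack = Suffix
  | needle `T.isInfixOf` haystack = Infix
  | otherwise = NoMatch
```

Every prefix or suffix is also an infix, which is why the isInfixOf guard comes last.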
Indexing Text
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
myText :: T.Text
myText = "mississippi"
The index function returns the character at a given zero-based position.
ghci> T.index myText 2
's'
The findIndex function takes a predicate of type (Char -> Bool) and a Text, and returns the index of the first character that satisfies the predicate, or Nothing if no character does.
ghci> T.findIndex ('s'==) myText
Just 2
ghci> T.findIndex ('c'==) myText
Nothing
The count function returns the number of times a query Text occurs within another Text.
ghci> T.count "miss" myText
1
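Unlike findIndex, index is partial: it raises an error when the position is out of range. A minimal sketch of a total variant follows. The name safeIndex is hypothetical and is not provided by Data.Text.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T

-- Return the character at a zero-based position, or Nothing when the
-- position is negative or past the end. Dropping i characters and then
-- taking the head avoids a separate length check.
safeIndex :: T.Text -> Int -> Maybe Char
safeIndex t i
  | i < 0 = Nothing
  | otherwise = fst <$> T.uncons (T.drop i t)
```

Text does not offer constant-time random access, so this lookup still walks the value from the start, just as index does.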