Data Parsing

When you import data into Engage+, the platform runs the data through a parsing and validation step to ensure that the values are clean, valid, and usable. This topic describes the parsing rules for each Data Type.

Data (General)

The platform will validate input values based on the Data Type, such as making sure Integer, Date / Time, Money, and Big Integer fields contain only valid numeric characters.

All Data Type errors are reported as warnings during the import process. You can create an Exception Export (see Exports for more details) that contains details of the records that encountered these errors. This Export includes the original values and line numbers, so you can correct any issues and retry the import if desired.

Additionally, fields can be set up with Field Restrictions to allow only certain, specified values into the database. For example, you could set up Field Restrictions on a "State" field to allow only the valid U.S. Post Office two-letter state abbreviations. If an incoming record contains a value that does not match this set of valid values, that value is not loaded into the database. See Set Field Restrictions for more details.

Integer

For Integer fields, the value must be a whole number, which can be negative. Any leading zeros are stripped. Any record whose value contains non-numeric characters is rejected.
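The Integer rule can be illustrated with a minimal Python sketch. The function name `parse_integer` is hypothetical, and the handling of surrounding whitespace and of a leading plus sign is an assumption, since neither is described here.

```python
import re
from typing import Optional

# An optional minus sign followed by one or more ASCII digits.
_INTEGER_RE = re.compile(r"-?[0-9]+")


def parse_integer(raw: str) -> Optional[int]:
    """Return the whole number, or None if the value would be rejected."""
    value = raw.strip()  # assumption: surrounding whitespace is ignored
    if not _INTEGER_RE.fullmatch(value):
        return None  # contains non-numeric characters, so reject
    return int(value)  # int() discards leading zeros, e.g. "007" becomes 7
```

For instance, `parse_integer("007")` yields `7`, while `parse_integer("12a")` yields `None`.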

Date / Time

For Date / Time fields, the platform accepts any valid SQL Server date / time format. Values in other formats are rejected. If no time is specified, the system defaults the time to midnight (00:00:00).
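The midnight default can be sketched in Python as follows. The two formats in `_ASSUMED_FORMATS` are chosen purely for illustration and are not the platform's list of accepted formats; `parse_date_time` is a hypothetical name.

```python
from datetime import datetime
from typing import Optional

# Assumed formats for illustration only; the platform's accepted list may differ.
_ASSUMED_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_date_time(raw: str) -> Optional[datetime]:
    """Return the parsed value, or None if no assumed format matches."""
    value = raw.strip()
    for fmt in _ASSUMED_FORMATS:
        try:
            # A date-only format leaves the time at 00:00:00 (midnight).
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    return None  # value is in some other format, so reject
```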

The acceptable Date / Time formats are as follows:

Phone Numbers

Additional data cleansing is performed during the import process for fields with a Data Type of "Phone." This cleansing is intended to turn the phone number into a valid destination for SMS text messages. All non-numeric characters are removed; if the resulting number is exactly ten digits long, the system adds a leading "1" for texting in the U.S. This leading "1" is added only to U.S. phone numbers; international numbers are left unchanged.
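A minimal Python sketch of this phone cleansing follows. The function name `cleanse_phone` is hypothetical, and the sketch assumes that a ten-digit result is the only signal used to recognize a U.S. number; how the platform actually distinguishes U.S. from international numbers is not described here.

```python
import re


def cleanse_phone(raw: str) -> str:
    """Strip non-numeric characters and prefix ten-digit numbers with "1"."""
    digits = re.sub(r"[^0-9]", "", raw)
    if len(digits) == 10:
        # Assumption: exactly ten digits means a U.S. number without country code.
        digits = "1" + digits
    return digits
```

With this sketch, "(415) 555-0123" becomes "14155550123", and a number that already has eleven or more digits is returned with only its punctuation removed.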

Email

Messaging can perform simple validation and cleansing for fields with a Data Type of "Email." This feature is not enabled by default and must be enabled for each client account. When it is enabled, you can select different levels of cleansing and validation. These options are configurable because different regions of the world have different requirements.

Note: The recommended configuration setting is to enable the "Cleansing - General" and "Domain Cleansing - General" cleansing rules and the "Validation - General" validation rules. All of the available configuration settings are described below in more detail.

Cleansing

Email cleansing checks the incoming email address, and modifies the data based on the cleansing rules in an attempt to convert it into a syntax-correct email address. If successful, the corrected email address is then loaded to the database. The available email cleansing options are as follows:

Rule

Description

Usage Options

Cleansing - General

General cleansing to:

  • Extract email embedded in <>

    Input: <Joe.Smith@cheetahdigital.com>

    Becomes: Joe.Smith@cheetahdigital.com

  • Coalesce dots (including commas)

    Input: Joe.,.Smith@cheetahdigital.com

    Becomes: Joe.Smith@cheetahdigital.com

  • Coalesce @ (including ! and #)

    Input: Joe.Smith@!@#cheetahdigital.com

    Becomes: Joe.Smith@cheetahdigital.com

  • Remove HTML tags (including "mailto:" and "smtp:")

    Input: mailto:joe.smith@cheetahdigital.com

    Becomes: joe.smith@cheetahdigital.com

  • Purge invalid characters (such as leading / trailing periods, spaces, and other characters such as "():;<>[]\ ,)

    Input: joe.smith@cheetahdigital.com.

    Input: joe. smith@cheetahdigital.com

    Input: joe;smith@cheetahdigital.com

    Becomes: joe.smith@cheetahdigital.com

You can select to use either the "General" or the "Strict" Cleansing rules.
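The general cleansing steps listed above can be approximated with the following Python sketch. It is an illustration under stated assumptions, not the platform's implementation: the function name `cleanse_general`, the order of the steps, and the exact set of purged characters are all assumptions.

```python
import re


def cleanse_general(raw: str) -> str:
    """Apply an approximation of the general email cleansing steps."""
    email = raw.strip()
    # Extract an address embedded in angle brackets.
    embedded = re.search(r"<([^<>]*@[^<>]*)>", email)
    if embedded:
        email = embedded.group(1)
    # Remove HTML tags and "mailto:" / "smtp:" prefixes.
    email = re.sub(r"<[^>]*>", "", email)
    email = re.sub(r"\b(?:mailto|smtp):", "", email, flags=re.IGNORECASE)
    # Coalesce runs of dots and commas into a single dot.
    email = re.sub(r"[.,]+", ".", email)
    # Coalesce runs of @, ! and # that contain an @ into a single @.
    email = re.sub(r"[!#]*@[@!#]*", "@", email)
    # Purge whitespace and other invalid characters (assumed set).
    email = re.sub(r'[\s"():;<>\[\]\\]', "", email)
    # Drop leading and trailing periods.
    return email.strip(".")
```

For example, `cleanse_general("Joe.,.Smith@cheetahdigital.com")` returns "Joe.Smith@cheetahdigital.com".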

Cleansing - Strict

Same as "Cleansing - General" rules, plus the email address must begin with any of the following: -+_$ or 'A-Z' or 'a-z' or '0-9' and then can include .@!&'*/=?^|~#+

Input: joe.smith!@cheetahdigital.com

Becomes: joe.smith@cheetahdigital.com

Domain Cleansing - General

Domains must not contain $&*/=?^|

Input: joe.smith@cheetahdigital$.com

Becomes: joe.smith@cheetahdigital.com

You can select to use either the "General" or the "Strict" Domain Cleansing rules.

Domain Cleansing - Strict

Same as "Domain Cleansing - General" rules, plus domains must not begin with any of the following: -$%&+/=?^{}|`* and then can include hyphen (-)

Input: joe.smith@&cheetahdigital.com

Becomes: joe.smith@cheetahdigital.com

Input: joe.smith@cheetahdigital-us.com

Becomes: No change; valid email

Gmail

Valid Gmail handles must start with a letter or number and then contain only characters within 'A-Z', 'a-z', '0-9', hyphen (-), underscore (_), period (.), and plus-sign (+).

Note: Gmail ignores periods (“.”) in user names. You can add or remove periods in a Gmail address without changing the actual destination address; all variations go to the same inbox. For example, "j.smith" and "jsmith" are the same Gmail address.

You can either enable or disable the Gmail cleansing rules.
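The Gmail handle rule and the note about periods can be sketched in Python as follows. Both function names are hypothetical, and the sketch only checks the handle (the part before the @); it does not show what the platform's Gmail cleansing rule actually changes.

```python
import re

# First character: letter or number; the rest: letters, numbers, - _ . +
_GMAIL_HANDLE_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_.+-]*")


def is_valid_gmail_handle(handle: str) -> bool:
    """Check a handle against the Gmail character rules described above."""
    return _GMAIL_HANDLE_RE.fullmatch(handle) is not None


def same_gmail_inbox(handle_a: str, handle_b: str) -> bool:
    """Compare two handles while ignoring periods, as Gmail does."""
    return handle_a.replace(".", "") == handle_b.replace(".", "")
```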

Validation

Email validation checks the incoming email address to confirm that it is valid against specific rules, including syntax, domain, role accounts and some ISP rules. If the email address does not pass validation, the email address is rejected and it will not be loaded to the database. If the email address is the Unique Identifier for the table to which it is being imported, the entire record will be rejected and not loaded. If the email address is not the Unique Identifier, the email address will be rejected, but the other attributes will still be loaded.
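The two rejection outcomes described above can be sketched in Python. The function `apply_email_validation`, its parameters, and the dictionary-based record are hypothetical names used only for illustration; the `is_valid` callback stands in for whichever validation rules are enabled.

```python
from typing import Callable, Dict, Optional


def apply_email_validation(
    record: Dict[str, str],
    email_field: str,
    is_unique_identifier: bool,
    is_valid: Callable[[str], bool],
) -> Optional[Dict[str, str]]:
    """Return the data to load, or None if the whole record is rejected."""
    if is_valid(record.get(email_field, "")):
        return dict(record)  # email passed validation; load everything
    if is_unique_identifier:
        return None  # invalid Unique Identifier: reject the entire record
    # Invalid non-identifier email: drop it but keep the other attributes.
    return {name: value for name, value in record.items() if name != email_field}
```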

Rule

Description

Usage Options

Validation - General

Handles must begin with any of -+$&*^=/`{}%|?~'_ or 'A-Z' or 'a-z' or '0-9' and then can include !.# (but no contiguous .)

You can select to use either the "General" or the "Strict" Validation rules.
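One possible reading of the "Validation - General" handle rule is sketched below in Python. The interpretation is an assumption: the first character must come from the listed starting set, and later characters may come from that set plus "!", "." and "#", with no two periods in a row. The function name is hypothetical.

```python
import string

_ALNUM = set(string.ascii_letters + string.digits)
# Characters allowed in the first position of the handle.
_GENERAL_FIRST = _ALNUM | set("-+$&*^=/`{}%|?~'_")
# Characters allowed after the first position (assumed to extend the first set).
_GENERAL_REST = _GENERAL_FIRST | set("!.#")


def is_valid_general_handle(handle: str) -> bool:
    """Check a handle against one reading of the General validation rule."""
    if not handle or handle[0] not in _GENERAL_FIRST:
        return False
    if ".." in handle:
        return False  # contiguous periods are not allowed
    return all(ch in _GENERAL_REST for ch in handle)
```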

Validation - Strict

Handles must begin with -+$_ or 'A-Z' or 'a-z' or '0-9' and then can include !&'*/=?^|~#. (but no contiguous .); domain parts must begin and end with 'A-Z' or 'a-z' or '0-9' and in between can include dash (-) or underscore (_)

Top Level Domain

Perform a top level domain (TLD) database look-up during load. A list of valid top level domain names can be found at:

  • http://data.iana.org/TLD/tlds-alpha-by-domain.txt

You can either enable or disable the Top Level Domain Validation rules.
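A TLD look-up of this kind can be sketched in Python. The sketch assumes the layout of the IANA list (a comment line starting with "#", then one upper-case TLD per line) and takes the list as text rather than downloading it; the function names are hypothetical.

```python
from typing import Set


def load_tlds(listing: str) -> Set[str]:
    """Parse an IANA-style TLD listing into a set of lower-case TLDs."""
    tlds = set()
    for line in listing.splitlines():
        entry = line.strip()
        if entry and not entry.startswith("#"):  # skip blanks and comments
            tlds.add(entry.lower())
    return tlds


def has_valid_tld(email: str, valid_tlds: Set[str]) -> bool:
    """Return True if the address ends in a TLD found in valid_tlds."""
    _, at, domain = email.rpartition("@")
    _, dot, tld = domain.rpartition(".")
    if not at or not dot:
        return False  # no @ or no dot in the domain part
    return tld.lower() in valid_tlds
```

In practice the set would be built once per load from a cached copy of the list, so that each address costs only a set membership test.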

Spam Trap

Spam Traps are fictitious email addresses used to entice and entrap spammers. These addresses should be blocked from being loaded and from receiving deployments.

You can either enable or disable the Spam Trap Validation rules.

Role Address

Role email addresses are blacklisted at the request of several ISPs, which also define what counts as a role address. A global list of role addresses should be maintained, but it can be overridden with a client or sub-account setting if the client’s business model requires mailing to these addresses.

You can either enable or disable the Role Address Validation rules.

Defunct Domain

Rejects email addresses whose domains no longer exist.

You can either enable or disable the Defunct Domain Validation rules.

Yahoo

Valid Yahoo! handles:

  • Must start with a letter and then contain only characters within 'A-Z', 'a-z', '0-9', hyphen (-), underscore (_), and period (.).

  • Can be a maximum of 32 characters long.

  • Can be extended by a hyphen; the extension can contain only 'A-Z', 'a-z', '0-9', or underscore (_), and must be between 1 and 31 characters long.

You can either enable or disable the Yahoo Validation rules.

America Online (AOL)

Valid AOL handles must:

  • Start with a letter and then contain only characters within 'A-Z', 'a-z', '0-9', underscore (_), and period (.).

  • Be between 2 and 32 characters long.

You can either enable or disable the AOL Validation rules.
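Provider rules such as the AOL rule above map naturally onto a single pattern. The following Python sketch is an illustration with a hypothetical function name; it checks only the handle, not the domain.

```python
import re

# One leading letter, then 1 to 31 letters, digits, underscores or periods,
# which gives a total length of 2 to 32 characters.
_AOL_HANDLE_RE = re.compile(r"[A-Za-z][A-Za-z0-9_.]{1,31}")


def is_valid_aol_handle(handle: str) -> bool:
    """Check a handle against the AOL rules described above."""
    return _AOL_HANDLE_RE.fullmatch(handle) is not None
```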

Hotmail

Valid Hotmail handles must:

  • Start with a letter and then contain only characters within 'A-Z', 'a-z', '0-9', hyphen (-), underscore (_), period (.), or plus-sign (+).

You can either enable or disable the Hotmail Validation rules.

MSLIVE

Valid MSLIVE handles must:

  • Contain only characters within 'A-Z', 'a-z', '0-9', hyphen (-), underscore (_), period (.), or plus-sign (+).

You can either enable or disable the MSLIVE Validation rules.

MSN

Valid MSN handles must:

  • Contain only characters within 'A-Z', 'a-z', '0-9', hyphen (-), underscore (_), or period (.).

You can either enable or disable the MSN Validation rules.

Japan

For Japan (disney,disney.mobile,docomo,ezweb,ido,jp-(c,d,h,k,n,q,r,s,t),sky.(cdp,dtg,kdp,tdp,tkc,tkk,tuka),softbank,vodafone.ne.jp) addresses:

  • Handles must start with 'A-Z', 'a-z', '0-9', dollar sign ($), or plus-sign (+) and then can include -_!&'*/=?^|~#`%{}();

  • Domains must not begin with hyphen (-), underscore (_) or period (.), and their parts can contain only hyphen (-), underscore (_), 'A-Z', 'a-z', or '0-9'.

  • Neither the handle nor the domain can exceed 64 characters.

You can either enable or disable the Japan Validation rules.

Netease

China - Netease (163.com, 126.com, yeah.net) handles must:

  • Start with a letter or the number 1 and then can include underscore (_) or '0-9'

  • Be from 6 to 18 characters.

You can either enable or disable the Netease Validation rules.

Netease_VIP

China - Netease VIP (vip.163.com, etc) handles must:

  • Start with a letter and then can include hyphen (-), underscore (_) or period (.), or '0-9'

  • Be from 3 to 20 characters.

You can either enable or disable the Netease_VIP Validation rules.

TENCENT

China - Tencent (qq.com,foxmail.com) handles must:

  • Start with a letter or number and then can include hyphen (-), underscore (_) or period (.)

  • Be from 3 to 18 characters.

You can either enable or disable the TENCENT Validation rules.

SOHU

China - Sohu handles must:

  • Start with a letter or the number 1 and then can include hyphen (-), underscore (_) or period (.), or '0-9'

  • Be from 4 to 16 characters.

You can either enable or disable the SOHU Validation rules.

SINA

China - Sina handles must:

  • Start and end with a letter or number and can include underscore (_) in between

  • Be from 4 to 16 characters.

You can either enable or disable the SINA Validation rules.

SINA_VIP

China - Sina (VIP) handles must:

  • Start and end with a letter or number and can include underscore (_) in between.

  • Be from 4 to 16 characters.

  • Not be all numeric.

You can either enable or disable the SINA_VIP Validation rules.
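Rules that combine a pattern, a length range, and an extra condition, such as the SINA_VIP rules above, can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes that letters and numbers (as well as underscores) are allowed between the first and last characters.

```python
import re

# First and last characters: letter or number; 2 to 14 characters in between,
# which gives a total length of 4 to 16 characters.
_SINA_VIP_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_]{2,14}[A-Za-z0-9]")


def is_valid_sina_vip_handle(handle: str) -> bool:
    """Check a handle against one reading of the SINA_VIP rules."""
    if _SINA_VIP_RE.fullmatch(handle) is None:
        return False
    return not handle.isdigit()  # an all-numeric handle is rejected
```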

TOM

China - Tom handles must:

  • Start with 'A-Z', 'a-z', hyphen (-), underscore (_), or period (.), and then can include numbers.

  • Be from 6 to 18 characters.

You can either enable or disable the TOM Validation rules.

21CN

China - 21cn handles must:

  • Start with a letter or the number 1 and then can include hyphen (-), or underscore (_), or period (.), or '0-9'.

  • End in a letter or a number.

  • Be from 4 to 16 characters.

You can either enable or disable the 21CN Validation rules.

21CN_VIP

China - 21cn (VIP) handles can contain spaces. They cannot be all numeric; they must be from 4 to 20 characters.

You can either enable or disable the 21CN_VIP Validation rules.