Dream. Dare. Do – that is Suyati’s work principle in a nutshell.
Duplicate data is an annoying, time-consuming problem that clogs your organization's work systems. Now you can weed out duplicate leads, accounts, and contacts in Salesforce and make the most of your CRM with 'Duplicate Management', a native, out-of-the-box feature.
Join our webinar on how to use 'Duplicate Management' in Salesforce to eliminate duplicates and keep your organization's systems clean, with only good, consistent data. Come, let's say NO to duplicate data devils!
Solutions as of today
[i] AppExchange Solutions
[ii] Custom Solutions
Welcome - Duplicate Management [WI'15] in Salesforce
Creating, Editing and Deleting Rules
Let us try it live!
Points / Caveats to Remember
This article covers the topic "Eliminate Duplicates with 'Duplicate Management' in Salesforce". It describes the Duplicate Management feature introduced by Salesforce and related concepts such as creating and editing rules, matching rules, limitations, and resources.
Duplicates are a major problem that you should mitigate to make sure you have the right set of data. Imagine a lot of duplicate leads coming into your Salesforce org from your sales and marketing users. Solving the problem of duplicates is not easy, and there are certain points you should keep in mind to solve it completely.
For example, imagine two leads that look different but actually belong to the same company: the first has the company name Intel Corporation and the second just Intel. A duplicate check must recognize variations like this, where the additional words do not change the fact that both records belong to the same company.
Normalization and complexity issues arise, especially across business domains. It is not just sales leads: other objects such as opportunities also come into the picture. You need to implement duplicate management across different types of objects, not just leads and contacts, and it may also span custom objects. You must also consider cost. If you build an Apex trigger or some other custom Apex solution, you need a developer, and if you install a third-party app, you need to pay for licenses.
The following AppExchange packages can manage duplicates within your Salesforce org:
These apps can handle both custom and standard objects. If you do not use an app, the alternative is a custom solution, in which a developer sets up Apex triggers to manage or mitigate the problem.
Duplicate Management is an out-of-the-box Salesforce feature that removes duplicates without the cost and maintenance of a third-party or custom solution. It works in real time and is intuitive and easy to set up and manage. It is available in Professional Edition and above.
You do not need to buy Data.com licenses to use Duplicate Management. Among the standard Sales Cloud objects, it supports Lead, Account, and Contact; other standard objects such as opportunities and person accounts are not supported. Users are alerted to duplicates and can resolve them before proceeding.
Rules of Duplicate Management
Duplicate Management is built on two integral rules: matching rules and duplicate rules.
When a lead is flagged as a duplicate and the duplicate rule's action is Block, the user cannot save the record without changing it. With the Allow option, users can dismiss the warning and save the record anyway.
A matching rule is where you specify which fields to compare, and a duplicate rule, in simple words, is the action (block or allow the duplicate) to perform.
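To make the split between the two rule types concrete, here is a minimal Python sketch. It is not Salesforce's API: the class names, the field names FirstName and LastName, and the exact, case-insensitive comparison are all assumptions for illustration. The matching rule only decides whether two records match; the duplicate rule decides what happens on save.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    BLOCK = "Block"
    ALLOW = "Allow"


def _norm(value) -> str:
    """Trim and lowercase a field value; None becomes an empty string."""
    return str(value or "").strip().lower()


@dataclass
class MatchingRule:
    """Says WHICH fields to compare (exact, case-insensitive in this sketch)."""
    fields: tuple

    def matches(self, a: dict, b: dict) -> bool:
        # Blank values never count as a match in this sketch.
        return all(
            _norm(a.get(f)) != "" and _norm(a.get(f)) == _norm(b.get(f))
            for f in self.fields
        )


@dataclass
class DuplicateRule:
    """Says WHAT to do when the matching rule finds a duplicate."""
    matching_rule: MatchingRule
    action: Action


@dataclass
class SaveResult:
    saved: bool
    possible_duplicates: list = field(default_factory=list)


def save_record(record: dict, existing: list, rule: DuplicateRule) -> SaveResult:
    dups = [r for r in existing if rule.matching_rule.matches(record, r)]
    if dups and rule.action is Action.BLOCK:
        # Blocked: nothing is saved, and the user sees the possible duplicates.
        return SaveResult(saved=False, possible_duplicates=dups)
    existing.append(record)  # Allowed (or no duplicates): the record is saved.
    return SaveResult(saved=True, possible_duplicates=dups)


employees = [{"FirstName": "John", "LastName": "Doe", "Email": "email@example.com"}]
rule = DuplicateRule(MatchingRule(("FirstName", "LastName")), Action.BLOCK)
result = save_record(
    {"FirstName": "John", "LastName": "Doe", "Email": "jdoe@Salesforce.com"},
    employees,
    rule,
)
print(result.saved, len(result.possible_duplicates))  # False 1
```

With Action.ALLOW and the same data, save_record would save the record and still return the matching employee, which is exactly the situation that the duplicate record objects are meant to capture.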
Concepts in Duplicate Management
Duplicate Record Item
Duplicate Record Sets and Duplicate Record Items are two new objects that are added to your org when you turn on Duplicate Management. Together they capture the duplicates that users created by dismissing the warning. For example, if the duplicate rule is set to Allow and a user saves a flagged lead, the lead is saved to the Lead object as usual, and a new Duplicate Record Item is also created for it.
Duplicate record set
A Duplicate Record Set is the container, or bucket, that groups related Duplicate Record Items and stores a reference to the rule responsible for flagging them. Each Duplicate Record Item in the set stores a reference to an individual lead, contact, account, or custom object record that was flagged as a duplicate.
The best part is that you can build your own reports, workflow rules, triggers, or validation rules on top of these two objects, which lets you track the duplicate records that were created.
Error logs record errors that occur during duplicate checking. For example, the fuzzy matching engine might be unavailable, so the duplicate check could not be performed, or the duplicate record sets or items could not be saved. Error logs alert you to such failures and help you identify what went wrong.
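As an illustration of the kind of summary report you might build on these objects, the sketch below counts flagged records per rule. It is a plain-Python model, not the Salesforce schema: the class and field names (set_id, rule_name, record_id) and the sample record IDs are hypothetical. In a real org you would build this as a report rather than in code.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DuplicateRecordSet:
    set_id: str
    rule_name: str  # the rule that flagged the records in this set


@dataclass(frozen=True)
class DuplicateRecordItem:
    set_id: str     # the set (bucket) this item belongs to
    record_id: str  # the lead, contact, account or custom record that was flagged


def duplicates_per_rule(sets, items) -> Counter:
    """Count flagged records per rule, like a simple summary report."""
    rule_by_set = {s.set_id: s.rule_name for s in sets}
    return Counter(rule_by_set[item.set_id] for item in items)


sets = [DuplicateRecordSet("S1", "Lead Rule"), DuplicateRecordSet("S2", "Employee Rule")]
items = [
    DuplicateRecordItem("S1", "LEAD-001"),
    DuplicateRecordItem("S1", "LEAD-002"),
    DuplicateRecordItem("S2", "EMP-001"),
]
print(duplicates_per_rule(sets, items))  # Counter({'Lead Rule': 2, 'Employee Rule': 1})
```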
Here is an example of how to create a Duplicate rule and a Matching rule for a custom object called “Employees”.
Setting up a Matching Rule:
At this point we have not yet specified what to do when a duplicate employee is found. That is where duplicate rules come in.
Setting up a Duplicate Rule:
Record Level Security
There are two record-level security options in the Duplicate Management System (DMS): Enforce Sharing Rules and Bypass Sharing Rules.
Enforce Sharing Rules- The duplicate check only considers records the user is allowed to see. For example, if a standard user can only see their own records and the rule is set to enforce sharing rules, the duplicate management system compares a new lead only against leads owned by that user.
Bypass Sharing Rules- The entire database is scanned for duplicates, irrespective of the user's sharing settings and permissions. Bypassing sharing rules is preferable, because every lead record being created is then checked against the entire database.
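The difference between the two options comes down to which records form the pool of candidates. The sketch below assumes a deliberately simplified visibility model, where a user can see only the leads they own (real sharing also involves roles, sharing rules, and more), and matches on last name alone to keep it short. The Lead class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Lead:
    lead_id: str
    owner: str
    last_name: str


def candidate_pool(all_leads, running_user: str, bypass_sharing: bool):
    if bypass_sharing:
        return list(all_leads)  # Bypass: scan every record in the database.
    # Enforce: simplified visibility model, the user sees only their own leads.
    return [lead for lead in all_leads if lead.owner == running_user]


def find_duplicates(new_last_name, all_leads, running_user, bypass_sharing):
    pool = candidate_pool(all_leads, running_user, bypass_sharing)
    return [l for l in pool if l.last_name.lower() == new_last_name.lower()]


leads = [Lead("L1", "alice", "Doe"), Lead("L2", "bob", "Smith")]
print(len(find_duplicates("Doe", leads, "bob", bypass_sharing=False)))  # 0
print(len(find_duplicates("Doe", leads, "bob", bypass_sharing=True)))   # 1
```

Under Enforce, Bob's new "Doe" lead slips through because Alice's existing Doe lead is invisible to him; under Bypass it is caught.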
When a user creates or edits a record, the duplicate rule applies one of two actions: Block or Allow. You can set the action, and whether an alert is shown, separately for create and for edit. Regardless of the action you choose, you can provide a custom alert message for your end users.
Check Points in setting Rules
Capacity to compare
The out-of-the-box Duplicate Management System can compare a record not only with records of the same object but also with records of another object, for example with existing contacts. When you compare against another object, you must specify a field mapping that tells the system which fields on the other object the First Name and Last Name fields should be compared with. This is called cross-object comparison.
In the earlier Employee example, the summary of the matching rule appears as soon as you select it, and the field mapping is set automatically because the matching rule is on the same object. You need to specify the mapping only for a cross-object comparison.
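A field mapping is essentially a dictionary from fields on the object being saved to fields on the object being searched. The sketch below compares the custom Employee object against contacts; the field API names First_Name__c and Last_Name__c are hypothetical, and exact, case-insensitive matching stands in for whatever matching methods the rule actually uses.

```python
# Hypothetical field API names for the custom Employee object and for Contact.
EMPLOYEE_TO_CONTACT = {"First_Name__c": "FirstName", "Last_Name__c": "LastName"}


def _norm(value) -> str:
    return str(value or "").strip().lower()


def cross_object_matches(source: dict, targets: list, mapping: dict) -> list:
    """Return target records whose mapped fields all equal the source's fields."""
    return [
        t for t in targets
        if all(
            _norm(source.get(src)) != "" and _norm(source.get(src)) == _norm(t.get(dst))
            for src, dst in mapping.items()
        )
    ]


employee = {"First_Name__c": "John", "Last_Name__c": "Doe"}
contacts = [
    {"FirstName": "John", "LastName": "Doe"},
    {"FirstName": "Jane", "LastName": "Doe"},
]
print(len(cross_object_matches(employee, contacts, EMPLOYEE_TO_CONTACT)))  # 1
```

Without the mapping, the comparison would have no way of knowing that First_Name__c on one object corresponds to FirstName on the other; for a same-object rule the mapping is simply the identity, which is why it can be filled in automatically.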
Imagine a requirement where duplicate management should run only when the running user's name or profile meets certain conditions. You can specify such conditions on the running user, or other additional conditions, in this section of the duplicate rule.
Once you select the relevant check box and activate the matching rule, you receive a message that activation was successful. When you refresh the page, a green check mark appears to indicate that the matching rule is active and the mapping is correct. You can activate the duplicate rule the same way, but no such message is sent for duplicate rules.
Let us create a new employee named John Doe, with the email email@example.com. As the first record, it saves successfully. Next, create another employee with the same first name (John) and last name (Doe), this time with the email jdoe@Salesforce.com. When you click Save, you are blocked right away, and a list of possible duplicate employees is shown. The employee ID is a hyperlink, so you can click it to view further details of the employee.
When setting up a matching rule, the most important thing to specify is the field or field equation, that is, the combination of fields to compare. It helps to know what happens behind the scenes so that you can build an efficient matching rule. Each row of the matching criteria has three parts: the field, the matching method, and the Match Blank Fields check box at the end.
Field- Know which field types are supported. A matching rule can include only one lookup field. Number, phone, and picklist fields are supported; multi-select picklists are not.
Method- Either you use the exact matching method, which requires the field values in the two records to be exactly the same, or the fuzzy method, which flags a partial match (for example, a 70% or 80% match).
If you set the method to fuzzy, the duplicate management system automatically sets a threshold depending on the fuzzy algorithm and method used. The threshold is the minimum score a comparison must reach for the record to be flagged as a duplicate. So if the platform sets the threshold at 80, the match score must be 80 or more for the record to be flagged.
Concepts in Scoring Method:
Match key- A match key is a code attached to every record, which the platform uses to retrieve the set of possible duplicates faster. It works like the index of a book: instead of reading every page, you look up the topic in the index and go straight to the right page number. In the same way, Salesforce uses match keys to retrieve the list of possible duplicates quickly.
For example, suppose your matching rule compares first name, last name, and company. The moment you activate the matching rule, Salesforce scans the entire database and attaches a match key to each record, so that it can use these keys to fetch the list of possible duplicates when a record is saved. Because of this indexing process, it takes a while for the platform to finish activating a matching rule.
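The indexing idea can be sketched in a few lines of Python. The key format below (first letter of the first name, first three letters of the last name, first four letters of the company) is invented for illustration; this article does not describe Salesforce's actual match key format. The point is the two phases: build the index once at activation, then look up only the matching bucket at save time.

```python
from collections import defaultdict


def match_key(record: dict) -> str:
    """HYPOTHETICAL key: 1st letter of first name, 3 of last name, 4 of company."""
    first = record.get("FirstName", "").strip().lower()[:1]
    last = record.get("LastName", "").strip().lower()[:3]
    company = record.get("Company", "").strip().lower()[:4]
    return f"{first}{last}|{company}"


def build_index(records: list) -> dict:
    """Done once, at activation: attach a key to every existing record."""
    index = defaultdict(list)
    for record in records:
        index[match_key(record)].append(record)
    return index


def possible_duplicates(new_record: dict, index: dict) -> list:
    """At save time, fetch only the records that share the new record's key."""
    return index.get(match_key(new_record), [])


leads = [
    {"FirstName": "John", "LastName": "Doe", "Company": "Intel"},
    {"FirstName": "Jane", "LastName": "Smith", "Company": "Acme"},
]
index = build_index(leads)
new_lead = {"FirstName": "Jon", "LastName": "Doe", "Company": "Intel Corp"}
print(len(possible_duplicates(new_lead, index)))  # 1
```

The candidates returned by the lookup are only possible duplicates; they would still be scored by the matching methods before anything is flagged.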
Threshold- If you set the matching method to fuzzy for the First Name field, the threshold is set at 85. If you set the fuzzy method for the Last Name field, the threshold is 90. These numbers, 85 and 90, are derived automatically and are not configured by administrators.
Normalization- Suppose you create two records, one with the company name Intel Corp and the second with just Intel. When comparing them, Salesforce ignores common words such as 'Corp', 'Corporation', and 'Limited', so both records normalize to just Intel. This lets the duplicate management system flag records even when they differ only in such minor words. The actual lead records are not changed; the normalization happens only within the duplicate management system's own processing, so that duplicates are caught despite slight differences.
Acronyms- Imagine that you write Advanced Micro Devices in one lead record and only AMD in another. Even though the two look different, they are duplicates. Salesforce handles such expansions and abbreviations by normalizing them to a single form, either the expanded form or the short form.
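Both kinds of normalization described above can be sketched as a comparison on cleaned-up copies of the values, leaving the stored values untouched. The list of ignored words is an assumption (the article names only a few examples), and the acronym check is a simple first-letter heuristic, not Salesforce's acronym algorithm.

```python
import re

# ASSUMPTION: an illustrative list of words ignored when comparing company names.
IGNORED_WORDS = {"corp", "corporation", "inc", "incorporated", "limited", "ltd", "co"}


def words(name: str) -> list:
    return re.findall(r"[a-z0-9]+", name.lower())


def normalize_company(name: str) -> str:
    """Drop ignorable words; the stored value itself is never modified."""
    return " ".join(w for w in words(name) if w not in IGNORED_WORDS)


def acronym(name: str) -> str:
    """First letter of each word, only for multi-word names."""
    ws = words(name)
    return "".join(w[0] for w in ws) if len(ws) >= 2 else ""


def same_company(a: str, b: str) -> bool:
    na, nb = normalize_company(a), normalize_company(b)
    if not na or not nb:
        return False
    if na == nb:
        return True
    # Compare a compacted short form against the other name's acronym, both ways.
    ca, cb = na.replace(" ", ""), nb.replace(" ", "")
    return ca == acronym(nb) or cb == acronym(na)


print(same_company("Intel Corp", "Intel"))                # True
print(same_company("AMD", "Advanced Micro Devices Inc"))  # True
print(same_company("Intel", "AMD"))                       # False
```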
Fuzzy Phone- The phone number is split into four parts: the international code, the area code, the next three digits, and the last four digits. Each part carries its own score. If the international codes of two records match, they get a score of 10 right away; if the area codes match, they get 50 more. The scores are added as a weighted sum: 50 + 10 = 60, which is still below the threshold. The system also checks whether the next three digits match, which is worth 30 more. That brings the total to 90, above the threshold of 80, so the record is flagged right away. This means your lead records or custom object records do not need identical phone numbers to be flagged: if parts of the phone number match (for example, the area code and the next three digits, or the international code, area code, and next three digits) and the total score reaches 80 or more, the record is flagged as a duplicate. This is very important to know.
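The scoring walk-through above translates directly into a weighted sum. The weights 10, 50, and 30 and the threshold of 80 come from the text; the weight of 10 for the last four digits is an assumption (chosen so the weights total 100), as is the phone layout of an optional international code followed by a 10-digit number.

```python
# Section weights from the article: international code 10, area code 50,
# next three digits 30. ASSUMPTION: the last four digits are worth 10,
# so that the four weights add up to 100.
PHONE_WEIGHTS = {"international": 10, "area": 50, "next_three": 30, "last_four": 10}
PHONE_THRESHOLD = 80


def split_phone(phone: str) -> dict:
    """ASSUMPTION: an optional international code followed by a 10-digit number."""
    d = "".join(ch for ch in phone if ch.isdigit())
    return {
        "international": d[:-10],
        "area": d[-10:-7],
        "next_three": d[-7:-4],
        "last_four": d[-4:],
    }


def fuzzy_phone_score(a: str, b: str) -> int:
    pa, pb = split_phone(a), split_phone(b)
    # Add the weight of every non-empty section that matches exactly.
    return sum(w for part, w in PHONE_WEIGHTS.items() if pa[part] and pa[part] == pb[part])


def is_phone_duplicate(a: str, b: str) -> bool:
    return fuzzy_phone_score(a, b) >= PHONE_THRESHOLD


print(fuzzy_phone_score("+1 (415) 555-0123", "+1 415 555 9876"))  # 90
print(fuzzy_phone_score("+1 415 555 0123", "+1 415 777 0123"))    # 70
```

The first pair is flagged even though the last four digits differ; the second pair is not, because a mismatch in the heavily weighted next-three-digits section keeps it below 80.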
Fuzzy Street- This works like fuzzy phone. A score is allotted for the street name, a score of 20 for the street number, and a score of 15 for the street suffix. If any combination of these adds up to a score of 80 or more, the record is flagged as a duplicate. The same applies to fuzzy zip.
The duplicate management system manages these things by itself.
Different Algorithms in Duplicate Management
Acronym- Used to find shorthand and expanded notations (such as AMD and Advanced Micro Devices) and bring them to a single form.
Edit Distance algorithm- The score is calculated from the number of insertions, deletions, and character replacements needed to turn one value into the other. For example, V.P. of Sales becomes V.P. Sales when you delete the characters 'o' and 'f' and one space, which is three edits.
Initials- Compares names using the initials of the first name and the last name, so that an initial such as J can match a full name such as John.
Jaro Winkler distance- Measures similarity from the number of characters two strings have in common and the number of transpositions (matching characters that appear in a different order), and gives extra weight to strings that start with the same prefix. For example, Jonny and Jony score highly because they share the prefix Jon and almost all of their characters.
Remember that when you select fuzzy first name, more than one algorithm is used; the platform identifies the best algorithm to apply when calculating the score. It first applies the exact matching method, and if the values match exactly, it does not try the other algorithms. If they do not match exactly, it tries the different algorithms one after another to reach the expected threshold.
Keyboard distance- Similar to edit distance, but the score also takes into account the positions of the keys on the keyboard.
Kullback-Leibler distance- Here the score is determined by the words the two values have in common.
Metaphone 3- Here the score is based on how the values sound when pronounced.
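The "exact first, then other algorithms" behaviour for fuzzy first names can be sketched as follows. This uses the textbook Jaro-Winkler formula (prefix bonus of 0.1 per shared character, up to four characters), not Salesforce's tuned implementation; scaling scores to 0-100, giving an initial a full score, and the order in which methods are tried are all assumptions. The threshold of 85 is the fuzzy First Name threshold mentioned earlier.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: shared characters (within a window) and transpositions."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    used1, used2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order, k = 0, 0
    for i in range(len1):
        if used1[i]:
            while not used2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Boost the Jaro score for a shared prefix of up to four characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1 - j)


def is_initial_of(a: str, b: str) -> bool:
    """True when one value is a single initial (e.g. 'J.') of the other."""
    a, b = a.strip(" .").lower(), b.strip(" .").lower()
    if not a or not b:
        return False
    return (len(a) == 1 and b.startswith(a)) or (len(b) == 1 and a.startswith(b))


FIRST_NAME_THRESHOLD = 85  # the fuzzy First Name threshold quoted in the article


def fuzzy_first_name(a: str, b: str):
    """Try exact first, then the other methods in turn; return (score, method)."""
    if a.strip().lower() == b.strip().lower():
        return 100, "exact"
    if is_initial_of(a, b):
        return 100, "initials"  # ASSUMPTION: an initial counts as a full match
    return round(jaro_winkler(a.lower(), b.lower()) * 100), "jaro_winkler"


for pair in [("John", "john"), ("J.", "Jonathan"), ("Jonny", "Jony"), ("John", "Mary")]:
    score, method = fuzzy_first_name(*pair)
    print(pair, score, method, score >= FIRST_NAME_THRESHOLD)
```

Jonny and Jony reach 95 here, comfortably above 85, while John and Mary share no characters and score 0.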
Some of these algorithms are proprietary; Metaphone 3, for example, was written by Lawrence Philips. Salesforce combines many algorithms, methods, and steps to make sure it flags the right set of records.
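Two of the measures in the list above, edit distance and keyboard distance, differ only in how a character substitution is priced, so one dynamic-programming routine can serve both. The sketch below is a standard Levenshtein distance with a pluggable substitution cost. The QWERTY grid (ignoring row stagger), the half-cost for neighbouring keys, case-insensitive comparison, and the distance-to-score conversion are all assumptions, not Salesforce's formulas.

```python
# Keyboard layout used for the keyboard-distance variant (ASSUMPTION: QWERTY,
# with row stagger ignored, so neighbours are simply adjacent grid cells).
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
KEY_POS = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}


def plain_cost(a: str, b: str) -> float:
    return 0.0 if a == b else 1.0


def keyboard_cost(a: str, b: str) -> float:
    if a == b:
        return 0.0
    pa, pb = KEY_POS.get(a), KEY_POS.get(b)
    if pa and pb and abs(pa[0] - pb[0]) <= 1 and abs(pa[1] - pb[1]) <= 1:
        return 0.5  # ASSUMPTION: hitting a neighbouring key costs half an edit
    return 1.0


def edit_distance(s: str, t: str, sub_cost=plain_cost) -> float:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    s, t = s.lower(), t.lower()  # case-insensitive in this sketch
    prev = [float(j) for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [float(i)] + [0.0] * len(t)
        for j in range(1, len(t) + 1):
            cur[j] = min(
                prev[j] + 1,                                 # delete s[i-1]
                cur[j - 1] + 1,                              # insert t[j-1]
                prev[j - 1] + sub_cost(s[i - 1], t[j - 1]),  # substitute
            )
        prev = cur
    return prev[-1]


def similarity(s: str, t: str, sub_cost=plain_cost) -> float:
    """ASSUMPTION: score = 100 * (1 - distance / length of the longer value)."""
    longest = max(len(s), len(t))
    return 100.0 if longest == 0 else 100.0 * (1 - edit_distance(s, t, sub_cost) / longest)


print(edit_distance("V.P. of Sales", "V.P. Sales"))           # 3.0
print(edit_distance("Jonn", "Jonb"))                          # 1.0
print(edit_distance("Jonn", "Jonb", sub_cost=keyboard_cost))  # 0.5
```

Because n and b sit next to each other on the keyboard, the keyboard variant treats "Jonb" as a likelier typo of "Jonn" than plain edit distance does.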
Let us look into some of the limitations of the duplicate management system.
Q: Can you use Duplicate Management with the Data Loader? Does it generate duplicate record sets or items? What about batch uploads?
Ans: Duplicate Management does block duplicate records when you import them via the Data Loader. The one thing to remember is that you will not see the list of duplicates: the record is simply stopped and flagged as a duplicate.
Remember that duplicate record items and duplicate record sets come into the picture only when the action is set to Allow. With Allow, if you import data via the Data Loader, the records are created, and at the same time records are created under duplicate record items to mark them as duplicates.
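The two import behaviours in this answer can be modelled as a loop over rows. This is a hypothetical sketch, not the Data Loader: exact matching on a tuple of key fields stands in for the matching rule, the error message text is invented, and rows are compared only against records that existed before the import started.

```python
from dataclasses import dataclass, field


@dataclass
class ImportResult:
    inserted: list = field(default_factory=list)
    errors: list = field(default_factory=list)           # (row number, message) only
    duplicate_items: list = field(default_factory=list)  # rows saved but logged as duplicates


def _key(record: dict, key_fields: tuple) -> tuple:
    return tuple(str(record.get(f, "")).strip().lower() for f in key_fields)


def import_rows(rows: list, existing: list, key_fields: tuple, action: str) -> ImportResult:
    result = ImportResult()
    # Compare only against records that were saved before this import started.
    known = {_key(r, key_fields) for r in existing}
    for row_number, row in enumerate(rows, start=1):
        is_duplicate = _key(row, key_fields) in known
        if is_duplicate and action == "Block":
            # Blocked rows fail with an error; no list of matching records is returned.
            result.errors.append((row_number, "Duplicate record"))  # hypothetical message
            continue
        existing.append(row)
        result.inserted.append(row)
        if is_duplicate:
            result.duplicate_items.append(row)
    return result


rows = [
    {"LastName": "Doe", "Email": "email@example.com"},
    {"LastName": "Roe", "Email": "roe@example.com"},
]
keys = ("LastName", "Email")
blocked = import_rows(rows, [{"LastName": "Doe", "Email": "email@example.com"}], keys, "Block")
allowed = import_rows(rows, [{"LastName": "Doe", "Email": "email@example.com"}], keys, "Allow")
print(len(blocked.inserted), len(blocked.errors))           # 1 1
print(len(allowed.inserted), len(allowed.duplicate_items))  # 2 1
```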
Ans: If two users create records at the same time, the record the other user is creating is not considered; only records that are already saved in the database are considered when you create a record.
For any further questions, reach out to us on social media or simply drop us a mail at firstname.lastname@example.org.