I write data mining software professionally, and one weakness that comes to mind is the deduplication process. To combine data from different sources, the software has to determine which entries correspond to the same person. It does this by looking for shared identifiers with a low false positive rate. If two records share a phone number, email address, account name on the same site, social security number, or name-address pair, they almost certainly belong to the same person, so they are combined. This relation is transitive, so if A has the same phone number as B and B has the same email address as C, then A, B, and C will...
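A minimal sketch of transitive merging on shared identifiers, assuming a union-find (disjoint-set) structure. This is an illustration, not the commenter's actual software. The field names (`phone`, `email`, `site`, `account`, `ssn`, `name`, `address`) and the helpers `MATCH_KEYS` and `cluster_records` are hypothetical. A real system would also normalize values (phone formats, email case, address spelling) before matching, and this sketch skips that step.

```python
from collections import defaultdict


class DisjointSet:
    """Union-find with path halving and union by size."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the smaller tree under the larger one
        self.size[ra] += self.size[rb]


# Hypothetical match keys: each returns a hashable value, or None when the
# record lacks the field(s). Empty strings are treated as missing.
MATCH_KEYS = {
    "phone": lambda r: r.get("phone") or None,
    "email": lambda r: r.get("email") or None,
    "site_account": lambda r: (r["site"], r["account"])
    if r.get("site") and r.get("account") else None,
    "ssn": lambda r: r.get("ssn") or None,
    "name_address": lambda r: (r["name"], r["address"])
    if r.get("name") and r.get("address") else None,
}


def cluster_records(records: list[dict]) -> list[list[int]]:
    """Group record indices that are linked, directly or transitively, by any match key."""
    ds = DisjointSet(len(records))
    first_seen: dict[tuple, int] = {}  # (key name, value) -> first record index with it
    for i, rec in enumerate(records):
        for key_name, extract in MATCH_KEYS.items():
            value = extract(rec)
            if value is None:
                continue
            slot = (key_name, value)
            if slot in first_seen:
                ds.union(first_seen[slot], i)  # shared identifier: merge clusters
            else:
                first_seen[slot] = i
    groups: dict[int, list[int]] = defaultdict(list)
    for i in range(len(records)):
        groups[ds.find(i)].append(i)
    return sorted(groups.values())


# A shares a phone with B, B shares an email with C; D is unrelated.
records = [
    {"phone": "555-0100"},
    {"phone": "555-0100", "email": "b@example.com"},
    {"email": "b@example.com"},
    {"phone": "555-0199"},
]
print(cluster_records(records))  # [[0, 1, 2], [3]]
```

Union-find makes the transitivity cheap to compute, but it also means that one coincidental shared value joins two entire clusters, not just two records.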
