Ever wondered which of your many Access and Excel files contain credit card numbers or email addresses?

 

Have you spent hours writing scripts to find which table records contain social security numbers or other personal identification records?

    • Overview

      DB Best Data Discovery Suite is a product that helps analysts, data stewards, DBAs, and marketing researchers identify data of specific types across heterogeneous data sources (databases and spreadsheets).


      Using Data Discovery Suite, you can look for columns containing emails, phone numbers, credit card information, addresses, and more, and list the valid and non-valid values in these columns. The goal is to improve data quality and make data consistent. As part of a compliance effort, Data Discovery Suite also helps identify where personal information is stored in corporate databases and in personal files.
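To illustrate what listing the valid and non-valid values in a column means, here is a minimal Python sketch. It is not the product's implementation: `EMAIL_PATTERN` and `split_valid_invalid` are hypothetical names chosen for this example, and the email pattern is deliberately simplistic.

```python
import re

# Hypothetical, deliberately simple email pattern (an assumption for this
# sketch; production-grade email validation is more involved).
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def split_valid_invalid(values, pattern=EMAIL_PATTERN):
    """Split column values into those that fully match the pattern and the rest."""
    valid, invalid = [], []
    for value in values:
        # Treat missing values as empty strings so they land in the invalid list.
        text = "" if value is None else str(value).strip()
        (valid if pattern.fullmatch(text) else invalid).append(value)
    return valid, invalid


# Example column as it might be read from a spreadsheet or table.
column = ["ann@example.com", "bob(at)example.com", None, "carol@example.org"]
valid, invalid = split_valid_invalid(column)
```

Here `valid` holds the two well-formed addresses, while `invalid` holds the malformed entry and the missing value, which is the kind of list a data steward would review.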

    • Features

      • Supported data sources include Microsoft SQL Server, SQL Azure, Oracle, Microsoft Excel and Microsoft Access.

      • Allows scanning multiple heterogeneous data sources in parallel. For example, you can look for a specific credit card number across Excel and Oracle sources at the same time.

      • Discovery can run on a sample of rows from each table to speed up the process, or on all rows for an in-depth analysis of data quality.

      • The supported types include:

        • Built-in types such as integer, date, and currency.

        • Lists and ranges of values, such as Male/Female for gender, or {0-9,11,12} for a mix of lists and ranges.

        • Complex regular expressions (e.g. 1-[1-9][0-9]{2}-[1-9][0-9]{6} for validation of phone numbers).

        • Lists of values obtained by querying an external data source.

      • To avoid false identification of data, you can specify a cutoff ratio for noise (non-matching values). For example, some middle names can match the list of values for gender (M, F). With a cutoff ratio of 20%, a middle name column in which 95% of the values fail to match will not be identified as a gender column.

      • Discovery results can be exported to a CSV file.

      • Includes a library of types, a regex builder, and the ability to export and import external types.
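The list-and-range types, row sampling, and cutoff ratio from the feature list can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's algorithm: `parse_value_spec`, `error_ratio`, and `column_matches` are hypothetical helpers, the spec and phone pattern are written with ASCII hyphens, and the cutoff is interpreted as the largest share of non-matching values a column may have and still be reported.

```python
import random
import re

# Phone pattern from the feature list, written with ASCII hyphens.
PHONE_PATTERN = re.compile(r"1-[1-9][0-9]{2}-[1-9][0-9]{6}")


def parse_value_spec(spec):
    """Expand a spec such as '{0-9,11,12}' into a set of allowed strings.

    Assumption: any part containing a hyphen is an inclusive integer range.
    """
    allowed = set()
    for part in spec.strip("{}").split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-", 1)
            allowed.update(str(n) for n in range(int(low), int(high) + 1))
        elif part:
            allowed.add(part)
    return allowed


def error_ratio(values, is_valid):
    """Return the share of values that fail the is_valid check."""
    if not values:
        return 1.0  # an empty column gives no evidence of a match
    failures = sum(1 for v in values if not is_valid(str(v).strip()))
    return failures / len(values)


def column_matches(values, is_valid, cutoff=0.20, sample_size=None, seed=0):
    """Report a column only if its error ratio does not exceed the cutoff.

    If sample_size is given, only a random sample of rows is checked.
    """
    if sample_size is not None and sample_size < len(values):
        values = random.Random(seed).sample(list(values), sample_size)
    return error_ratio(values, is_valid) <= cutoff


gender = {"M", "F"}
is_gender = lambda v: v in gender

# 20 middle names, only one of which happens to look like a gender code.
middle_names = ["M"] + ["Ann"] * 19
# 10 gender values with a single bad entry.
gender_column = ["M", "F"] * 4 + ["F", "unknown"]

middle_is_gender = column_matches(middle_names, is_gender)   # 95% errors
gender_is_gender = column_matches(gender_column, is_gender)  # 10% errors
```

With the 20% cutoff, the middle name column is rejected and the gender column is accepted. Passing `sample_size` trades accuracy for speed, in the spirit of the sampled discovery mode.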


    Customer quotes

    «After running Discovery Suite we were totally amazed to identify a whole lot of proprietary customer information in Access and Excel files that shouldn't have made it there…»

    Bob M, CSO, Large Healthcare provider

     

    «Really impressive! It's like a fast "Search for Databases"!»

    Suraj V, Lead DBA, Financial Services company

          
          

    System requirements

    Data Discovery Suite requires the following software to be installed on your computer.

    Operating system

    Windows Server 2003; Windows Server 2008; Windows XP; Windows Vista

    100 MB of available hard disk space

    Additional software

    Microsoft .NET Framework 3.5 SP1

    Oracle Client Software with Oracle Data Provider for .NET version 8.1.7 or higher (only for Oracle databases)