call us toll-free +1 855 855 3600
 
DB Best Chronicles
 
Talks on Big Data, Mobile Apps, Web and Software Development

Correlation, Causation and Big Data

Posted by | On December 2nd, 2014 | In Big Data | Tags: , , , , , ,
 

Decision making based on BI and Analytics is a well-established industry practice. There are numerous BI implementations around the world that are used on a regular basis to run operational reports, KPI based dashboards and analytics. Lot of decisions are made based on those BI systems.

However, have you ever paused and wondered, if the decision is based on ‘correlation’ or ‘causation’. Let me explain. Two events could be correlated, but that does not necessarily mean causation. In other words, just because the events occur concurrently, does not necessarily mean one caused the other or was directly responsible for the other. For instance, hiring of a new sales person and subsequent increase in revenue, or a new coach and a team’s victory in a game. How do we know, for sure, that the new sales person caused the increase in sales and not some other factor that was purely coincidental, such as a favorable market trend?

Read the rest of this entry »

TOP-10 Database Management and Migration Articles

Posted by | On March 11th, 2014 | In Big Data, Data Managament, Database Migration | Tags: , , , , , , ,
 

In less than 2 years DB Best professionals have written and posted over 100 technical articles on database migration and management related topics. These articles have gathered more than 20K views, thousands of likes and dozens of comments from various database management professionals from all over the world. In this post we would like to present a concise and informative review of our most rated articles:

 

http://www.dreamstime.com/royalty-free-stock-image-word-cloud-database-image26686606

Read the rest of this entry »

Extract-Transform-Load (ETL) Technologies – Part 2

Posted by | On June 10th, 2013 | In Big Data, Database Migration | Tags: , ,
 

Big Data – Extract-Transform-Load (ETL) 002 (click to download)

My last blog (Extract-Transform-Load (ETL) Technologies – Part 1) discussed the purpose, function, and some of the inherent benefits of ETL technologies for moving data from source applications into target reporting and analytic environments.  Hopefully from that discussion one can gain some general understanding of ETL; what it is, how it can be used, and why we would want to use it.  Now let’s dive a bit deeper and discover who some of the key vendors are in the ETL Tools marketplace.

ETL tools have been around for some time, have evolved, matured, and now present us with productive environments for Big Data, Data Warehouse, Business Intelligence, and analytics processing.  The ETL data processing cycle itself can be simple or highly complex.  Most vendor offerings address the diverse requirements well enough although some do better than others.  The challenges with ETL tools come from the other important considerations, like:

Read the rest of this entry »

So You Think You Can Search – Comparing Microsoft SQL Server FTS and Apache Lucene

Posted by | On February 8th, 2013 | In Big Data, Web & Software Development
 

Prologue

In 1999, when I was in my first year of university, if I wanted to check my e-mail I had to come into a lab. I used to come into the class, open browser, type address and then go out to grab some coffee. Usually when I came back I was really happy to see that at least half of the page had been loaded.

Today people are not so patient. They used to get response from the web-sites at least in a few seconds and if your application is not that fast, you have a problem. In fact, “few seconds” is a very optimistic term. Nowadays we measure response time with milliseconds.

If you want to create something really popular in the Web (and your application operates big amount of data), sooner or later you will face the need to implement full text search engine over your system.

My Goals

In this post I will try to describe and compare two technologies – Microsoft SQL Server Full Text Search and Apache Lucene. The first one is an embedded SQL Server feature and the second one is a third-party software library, originally developed in Java and ported to many other platforms, including .NET.

lucene vs sql server fts

How to build a super fast search for your web application.

Read the rest of this entry »

Extract-Transform-Load (ETL) Technologies – Part 1

Posted by | On December 31st, 2012 | In Big Data, Database Migration | Tags: , ,
 

Big Data – Extract-Transform-Load (ETL) 001 (click to download)

*~:~* Happy New Year *~:~*

My last blog (Column Oriented Database Technologies) discussed the differences between Row and Column oriented databases and some key players in this space.  Concepts and technologies on Big Data have been discussed in previous blogs (Big Data & NoSQL Technologies & NoSQL .vs. Row .vs. Column).  From these blogs one should surmise that deciding upon the best database technology (or DBMS vendor) really depends on schema complexities, how you intend to retrieve your data, and how to get it there in the first place.  We’re going to dive into this next, but before we do it is imperative that we briefly examine the differences between OLTP and OLAP database designs.  Then let’s leave OLTP details for a future blog as I expect most readers already know plenty about transactional database systems.  Instead we’ll focus here on OLAP details and how we process Big Data using ETL technologies for data warehouse applications.

Generally the differences between OLTP and OLAP database applications center upon how frequently data must be stored and retrieved, the integrity of that data, and how much of there is and its growth.  OLTP database schemas are optimized for processing transactions by an increasing number of users while OLAP database schemas are optimized for aggregations against an increasing amount of data and exponential query permutations.  Design considerations involve normalization, indexing, datatypes, user load, storage requirements, performance, and scalability.  We will need to defer the many interesting details on these considerations for a future blog.

      OLTP-DW

Read the rest of this entry »

SAP HANA – a New Database Management System

Posted by | On October 8th, 2012 | In Big Data, Database Migration
 

In 2007 I worked for a company which provided data warehousing solutions. Our biggest client database was about 2-3 TB which was not so much even five years ago, but because of extremely complicated ETL and complex reports we had a lot of troubles.

We spent hundreds of hours monthly on code optimization, but every 30 minutes saved on ETL and processing were compensated with monthly data growth and we had to start over and over.

Finally it comes the time when we couldn’t improve anything else in programming code and company decided to drastically upgrade the hardware. But extremely expensive servers and disk storages gives us only several percent performance improvement. We thought about migration to another platform but preliminary estimations showed that every traditional RDBMS has the same bottleneck – disk storage. Maybe, if it was happening today, we probably would turn to fundamentally new technologies.

The whole software development industry is highly innovation-oriented, but RDMBSs always are perceived as something very conservative. I still have record about ANSI-92 SQL knowledge in my CV and almost all fundamental works from 70s are up to date, but in spite of that data management software industry is very fast developing and great new products comes to market each year.

Read the rest of this entry »

How to Emulate Oracle Sequences in SQL Server

Posted by | On August 14th, 2012 | In Big Data, Database Migration
 

When migrating Oracle tables to SQL Server, the problem of sequence conversion is one of the most common. SQL Server 2008 does not have any object even distantly resembling an Oracle sequence, and some tradeoffs are inevitable when you make decisions about the mechanisms to implement its behavior in the target system. In SQL Server 2012 Microsoft introduced sequences, but several incompatibilities with Oracle implementation still exist.

Oracle Sequences in SQL Server

First, let us discuss what you can do if the target system is SQL Server 2008.
Read the rest of this entry »

How to Implement a SQL Server View Based on Temporary Table

Posted by | On August 13th, 2012 | In Big Data, Database Migration
 

If you ask any experienced SQL Server developer: “Can I build a view on temporary table?” the answer will be “Of course not. And why on Earth you need it?” Well, Oracle developers can tell you why. Temporary tables are very useful when you have to separate data between users/sessions and creating view on them is just convenient way to store some logic. SQL Server developers used to deal without it but when they are trying to move database from Oracle they will have to find workaround because it’s pretty common pattern in Oracle.

Let’s take a look at how it works.

SQL Server View

First thing that comes to mind is to use regular table instead of temporary. To emulate data separation between sessions we can add “spid” column with default value @@spid (system function that returns the session ID of the current user process). Than we need to create view named exactly like original temporary table (of course we have to name our regular table differently) and add where condition “spid = @@spid”. So select from the view will return only that part of data, which was populated in current connection. Next challenge here is that in Oracle data from temporary table is automatically deleted on rollback/commit.
Read the rest of this entry »

How to Create Autonomous Transactions in SQL Server

Posted by | On August 13th, 2012 | In Big Data, Database Migration
 

You probably heard about autonomous transactions in Oracle. In a few words, using just one pragma directive in a PL/SQL block you can isolate it from the callers’ context, so it becomes independent transaction. It’s quite useful if you want organize data audit and is pretty widely used in production databases. As you might guess these autonomous transactions do not have direct equivalent in Microsoft SQL Server. The only way to isolate a Transact-SQL block from a transaction context is to open a new connection. There are several options here, so let’s start with SSMA approach.

Autonomous Transactions in SQL Server

Assume that we have some logging logic in a stored procedure named “dbo.LogError” and we want to commit it even in case when main transaction will be roll backed.

As in previous case with UDF exceptions SSMA use extended procedure (now – to open a new connection). SSMA will wrap our routine like that:
Read the rest of this entry »

Column Oriented Database Technologies

Posted by | On July 24th, 2012 | In Big Data, Web & Software Development | Tags:
 

Column Oriented Database Technologies (click to download)

My recent blog (Big Data & NoSQL Technologies) discussed various NoSQL technologies and market vendors.  Today let’s dive into column-oriented databases and why they should play an important role in any data warehouse whose focus is on aggregations or metrics (and whose isn’t?).

So you are all probably familiar with row-oriented databases.  Tables of data where rows of fields (also called columns) represent the structural storage and the corresponding SQL queries that select, insert, update, and delete that data.  Most database vendors like Oracle, Microsoft, Sybase, Informix, and many others all base their technology on this ANSI standard.  Column-oriented databases are indeed what you might surmise; tables of data where columns of data values represent the structural storage.  What you might not expect is that on the surface many column-oriented databases look and feel like row oriented databases also using SQL queries in much the same way.  Creating tables, storing data, querying them are all pretty much identical.  They may appear similar, but two principal things to understand is that the significant differences under the hood, in particular, physical storage and query optimization.

As noted in my previous blogs on NoSQL, there is also a column-store technology out there.  Let’s not confuse that with column oriented databases.  They are different.  Since several NoSQL column-store vendors were highlighted before, we will focus instead on the column oriented database vendors here.

First, some key benefits to column oriented databases:

Read the rest of this entry »