Friday, February 24, 2012

How I do to eliminate duplicate rows?

Hello,
I need to eliminate the duplicated rows in sql server 2000, but the duplicate is only for some fields of the row. However, I need all the fields of the row. For example, I have the next structure:
Id_type, number_type, date, diagnosis, sex, age, city

After many analysis I get many rows where the tree first field are repeated, so I need to leave only one but with the all another fields. This is because I need only the first time when the diagnosis appear.

How I can do it?

Thank you very much.

Regards,
Angela

As I understand you issue, when there are rows that have the same values for (ID_Type, Number_Type, Date), you wish to keep ONLY one (1) row, and it doens't matter which one of the duplicated rows is kept.

What if the non-duplicated fields is different, i.e., different diagnosis, or different sex, or different age, or different city (if that could happen)?

There are several methods to accomplish this task. First, a little more information is useful:

Version of SQL Server?

Are there other tables that have foreign key relationships to this table?

Approximately how much data is in the table (rows)?

Are there periods of time when no one is using the table?

Send this information and we can better assist you.

|||Hi Arnie, thanks you for your response.

Well, the problem is the information is the very bad quality .... so, I suppose that I get one row to the first time that some diagnosis appear to the pacient, but with data this not happen. So

I have found that to the same ID_Type, Number_Type, Date and same diagnosis exists rows that they have different sex or age or any other field, so I need to select only one, because I need the first time that this diagnosis appears...

Let me to response the questions:

Version of SQL Server?

R: Sql server 2000

Are there other tables that have foreign key relationships to this table?

R: yes, because some fields are only codes

Approximately how much data is in the table (rows)?

R: this table have 25 millions of rows... so much...

Are there periods of time when no one is using the table?

R: yes, this table is to datamining exercise.

I appreciate so much your help.

|||This kb should help:
http://support.microsoft.com/kb/139444|||

Here is an article that provides a bit more detailed instructions that the kb article.

http://www.sql-server-performance.com/rd_delete_duplicates.asp

One issue that neither article touches on is the size of your table. If there are many duplicates, attempting to work on the entire table could be a major struggle for your server due to the amount of Transaction Log activity and locks that will be required. You may find it more efficient to work with batches of, say 50,000 rows at a time. If there is a large amount of delete activity, there may be some Transaction Log issues that would have to be addressed.

|||

Thaks a lot, both articles are very nice....

Regards,

Angela

No comments:

Post a Comment