A few years ago, I needed to insert about 20 million rows into a table. The data to be inserted is a combination of two tables. And it’s important to note that I want it to be smooth and not lock tables for too long. Since the data does not come from files(usually CSV
) it means I cannot use the BULK INSERT
statement. Thus let me share how I accomplish this task. The main idea of this method is to split data insertion into small SQL portions . To demonstrate the method I prepared a simple Database. Imagine we have a Library project. The table dbo.Person
stores clients, the dbo.Book
stores books and dbo.RentHistory
stores books that were taken.
click to open in a new tab
The tables contain the next amount of rows:
- dbo.Person – 1 000 000 Rows
- dbo.Book – 20 Rows
Let’s assume that each library visitor rent each book. And we need to add this information to the dbo.RentHistory table. It is not hard to calculate that we need to insert 20 million rows.
Direct Way – Not Recommended
Than our SQL
statement will be the next:
INSERT INTO [dbo].[RentHistory]
([PersonId]
,[BookId]
,[RentStartDate]
,[RentEndDate])
SELECT
person.Id AS PersonId
,book.Id AS BookId
,GETDATE()
,DATEADD(month, 1, GETDATE())
FROM dbo.Person AS person
CROSS JOIN dbo.Book AS book
But this way does not guarantee table locks. As I mentioned above I suggest splitting one SQL statement into multiple.
Use Small Portions of Data – Recommended
Using the ‘print’ command you need to generate chunks of the insert statements . And each statement has the ‘GO’ operator that allows us to commit these small implicit transactions. You can always control the size of the chunk and stop or pause the migration.DECLARE @personCount INT, @bookCount INT SELECT @personCount = Count(1) FROM dbo.Person SELECT @bookCount = Count(1) FROM dbo.Book DECLARE @i INT, @pagesize INT, @totalRentRows INT = @personCount * @bookCount SET @i=0 SET @pagesize=100 WHILE @i <= @totalRentRows BEGIN PRINT ' INSERT INTO dbo.RentHistory (PersonId, BookId, RentStartDate, RentEndDate) SELECT PersonId, BookId, getDate(), DATEADD(month,1,getDate()) FROM ( SELECT ROW_NUMBER() OVER (ORDER BY person.Id, book.id) as TempId ,person.Id AS PersonId ,book.Id AS BookId FROM dbo.Person AS person CROSS JOIN dbo.Book AS book ) SUB1 where TempId between '+cast(@i as varchar(20)) +' and '+cast(@i+@pagesize as varchar(20)) +' PRINT ''Page ' + cast((@i / @pageSize) as varchar(20)) + ' of ' + cast(@totalRentRows/@pageSize as varchar(20))+''''+ ' GO ' SET @i=@i+@pagesize END
Here is an example of the first two INSERT
statements:
INSERT INTO dbo.RentHistory (PersonId, BookId, RentStartDate, RentEndDate) SELECT PersonId, BookId, getDate(), DATEADD(month,1,GETDATE()) FROM ( SELECT ROW_NUMBER() OVER (ORDER BY person.Id, book.id) as TempId ,person.Id AS PersonId ,book.Id AS BookId FROM dbo.Person AS person CROSS JOIN dbo.Book AS book ) SUB1 where TempId between 0 and 100 PRINT 'Page 0 of 200000' GO INSERT INTO dbo.RentHistory (PersonId, BookId, RentStartDate, RentEndDate) SELECT PersonId, BookId, getDate(), DATEADD(month,1,getDate()) FROM ( SELECT ROW_NUMBER() OVER (ORDER BY person.Id, book.id) as TempId ,person.Id AS PersonId ,book.Id AS BookId FROM dbo.Person AS person CROSS JOIN dbo.Book AS book ) SUB1 where TempId between 100 and 200 PRINT 'Page 1 of 200000' GO
click to open in a new tab
Now you need to copy generated SQL statements from the ‘Message’ tab and start your migration. I am pretty sure there is a ton of various methods to solve such kind of SQL migrations and this one can be considered in your daily routine.