Module 2 - Creating Data Types and Tables

 

Data Types

-system supplied data types

-alias data types (aka user defined data types)

-third “user defined types” (managed code addressed later)

 

System Supplied datatypes:

-All described here we use these every day

 

Notes:
-a global identifier id is generated with the “new id” function.

-Special datatype timestamp is not really a date/time is a row identifier and there for row identity purposes. Sql_variant can contain different sorts of data…

 

Alias Data Types

Based on system types

Used for data elements with a specific format

Created with the CREATE TYPE statement

 

Syntax:
CREATE Type dbo.zip

FROM char(5)

NOT NULL

 

Can also be created in SSMS

DBàProgrammabilityàTypesàUser-Defined Data Typesàrcànew type.

 

Tables

How does SQL Server organize rows in a table

Rows can only be 8060k

I’m still a little weary of this diagram, its on page 2-13

Here is what mark says:
SQL server starts each row with a 4 byte

**header

Contains housekeeping information like:

Number of fixed data length rows
a pointer to the null block (talk about this later)

And things like that

Following the header there are the data rows that hold

**fixed data

            -in sql server fixed data rows will always be the same length no matter how much data they actually hold.

           

Following the fixed data section is whats called a

**null block

the null block has house keeping information about the fixed data section.

·        how many fixed data columns are in this row?

·        nullability section that has a bitmap that describes that for each one of those columns, is that column nullable. 1 if nullable, 0 if not

·        number of columns in the fixed data section with pointers to those columns

Following the null Block (if there are any variable data length rows) is the

**Variable Block

·        how many variable data columns are in this row?

·        Pointer to the first byte of the each variable data column.

 

Large Data Value Organization

Large datatypes are going away

-there is now a “max” identifier for our text/ntext/binary data

- with the text/ntext/binary data using max identifier the default is that

  • data values that will fit in a row (smaller than 8k) will stay in the row
  • data values that will not fit in a row, >8k, will be stored in separate page (B-Tree Structure) with only a pointer to that data in the row.

The above is true UNLESS you toggle the LARGE DATA TYPES OUT OF ROW ON/OFF setting in the sp_tableoption system sproc. The default is OFF.

 

 

 

Considerations

-Collation

            -sort order of characters in a column

            -characters that are available

            -case sensitive or not

-Nullability

            -default can be set by admin

            -suggests you specify this so admin can not override

 

-Special column types

 

-Computed type a virtual column that is not physically stored in the table.

 

-Identity type  with integer like data system generated sequential numbering goes here based on your specification.

 

-timestamp type is  there for backward compatibility (was rowidentifier) it appears to create a unique identifier based on the time that is guaranteed to be unique within the database.

 

-uniqueidentifier  populated with the new ID function (GUID) unlike identity

 

           

Creating the table:

SSMS à Right click Tablesà New Table

TSQL Syntax:

CREATE TABLE schema.tablename

(field int,

Filed varchar)

 

Partitioning Tables

-A partition table takes data rows of the table and places them in different file groups on the basis of the partition scheme and function.

 

 

Partitioned horizontally.

 

Partition function specifies 2 things

-data type (i.e. data)

-ranges of values (boundries)

 

-the partition function defines the partition boundries, the boundary values can be assigned LEFT or RIGHT

 

Syntax:

CREATE PARTITION FUNCTION pf_DataOfOrder (datetime)

AS RANGE RIGHT

FOR VALUES (‘01/01/2003’, ‘01/01/2004’, ‘01/01/2005’)

 

Above Boundaries Description:

Partition 1 will have all records before 1/1/03

Partition 2 will have all records from 03-04

Partition 3 will have all records from 04-05

Partition 4 will have all current data???

 

 

A partition SCHEME is what is going to actually partition the table.

The SCHEME is what references the partition function.

 

Number of file groups needs to match the number of partitions defined in the partition function.

 

Syntax:

 

-- Create partition function

USE AdventureWorks

GO

 

CREATE PARTITION FUNCTION pf_OrderDate (datetime)

AS RANGE RIGHT

FOR VALUES ('01/01/2004', '01/01/2005')

/*

Partition 1 (in fg1) contains all 2003 data i.e. <1/1/2004

Partition 2 (in fg2) contains all 2004 data i.e. <1/1/2005

Partition 3 (in fg3) will fill up with new data

Partition 4 (in fg4) is marked as the next used filegroup in partition scheme 'ps_OrderDate' ??? no idea what this means

*/

-- Add filegroups and create partition scheme

ALTER DATABASE AdventureWorks ADD FILEGROUP fg1

ALTER DATABASE AdventureWorks ADD FILEGROUP fg2

ALTER DATABASE AdventureWorks ADD FILEGROUP fg3

ALTER DATABASE AdventureWorks ADD FILEGROUP fg4

GO

 

ALTER DATABASE AdventureWorks

ADD FILE

( NAME = data1,

  FILENAME = 'c:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\AWd1.ndf',

  SIZE = 1MB,

  MAXSIZE = 100MB,

  FILEGROWTH = 1MB)

TO FILEGROUP fg1

GO

 

ALTER DATABASE AdventureWorks

ADD FILE

( NAME = data2,

  FILENAME = 'c:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\AWd2.ndf',

  SIZE = 1MB,

  MAXSIZE = 100MB,

  FILEGROWTH = 1MB)

TO FILEGROUP fg2

GO

 

ALTER DATABASE AdventureWorks

ADD FILE

( NAME = data3,

  FILENAME = 'c:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\AWd3.ndf',

  SIZE = 1MB,

  MAXSIZE = 100MB,

  FILEGROWTH = 1MB)

TO FILEGROUP fg3

GO

 

ALTER DATABASE AdventureWorks

ADD FILE

( NAME = data4,

  FILENAME = 'c:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\AWd4.ndf',

  SIZE = 1MB,

  MAXSIZE = 100MB,

  FILEGROWTH = 1MB)

TO FILEGROUP fg4

GO

 

CREATE PARTITION SCHEME ps_OrderDate

AS PARTITION pf_OrderDate

TO (fg1, fg2, fg3, fg4)

GO

--Partition scheme 'ps_OrderDate' has been created successfully. 'fg4' is marked as the next used filegroup in partition scheme 'ps_OrderDate'.

--If a partition scheme does not have a defined “NEXT” filegroup an error will occur.

 

 

 

-- Create partitioned table

CREATE TABLE dbo.PartitionedTransactions

(

                TransactionID int IDENTITY(1,1) NOT NULL,

                ProductID int NOT NULL,

                TransactionDate datetime NOT NULL DEFAULT (getdate()),

                TransactionType nchar(1) NOT NULL

)

ON ps_OrderDate(TransactionDate)

GO

 

-- Insert data

INSERT INTO dbo.PartitionedTransactions

SELECT  ProductID, TransactionDate, TransactionType

FROM Production.TransactionHistory

GO

 

INSERT INTO dbo.PartitionedTransactions

VALUES

(1, '01/01/2005', 'S')

GO

 

-- View partition metadata

SELECT * FROM sys.Partitions

WHERE [object_id] = OBJECT_ID('dbo.PartitionedTransactions')

 

/*

partition_id         object_id   index_id    partition_number hobt_id              rows

-------------------- ----------- ----------- ---------------- -------------------- --------------------

72057594054246400    871674153   0           1                72057594054246400    42844

72057594054311936    871674153   0           2                72057594054311936    70599

72057594054377472    871674153   0           3                72057594054377472    1

*/

 

 

-- View data with partition number

select $Partition.pf_OrderDate(TransactionDate) as PartitionNo, max(TransactionDate), min(TransactionDate)

FROM dbo.PartitionedTransactions

group by $Partition.pf_OrderDate(TransactionDate)

order by $Partition.pf_OrderDate(TransactionDate)

/*

PartitionNo                        

----------- ----------------------- -----------------------

1           2003-12-31 00:00:00.000 2003-09-01 00:00:00.000

2           2004-09-03 00:00:00.000 2004-01-01 00:00:00.000

3           2005-01-01 00:00:00.000 2005-01-01 00:00:00.000

*/

 

 

-- Verify lowest value in each partition

SELECT MIN(TransactionDate) FirstTran, $Partition.pf_OrderDate(TransactionDate) PartitionNo

FROM dbo.PartitionedTransactions

GROUP BY $Partition.pf_OrderDate(TransactionDate)

ORDER BY PartitionNo

 

/*

FirstTran               PartitionNo

----------------------- -----------

2003-09-01 00:00:00.000 1

2004-01-01 00:00:00.000 2

2005-01-01 00:00:00.000 3

*/

 

/* left this for reference (not used here)

-- Reset database

DROP TABLE dbo.PartitionedTransactions

DROP PARTITION SCHEME ps_OrderDate

DROP PARTITION FUNCTION pf_OrderDate

ALTER DATABASE AdventureWorks REMOVE FILE data1

ALTER DATABASE AdventureWorks REMOVE FILE data2

ALTER DATABASE AdventureWorks REMOVE FILE data3

ALTER DATABASE AdventureWorks REMOVE FILE data4

ALTER DATABASE AdventureWorks REMOVE FILEGROUP fg1

ALTER DATABASE AdventureWorks REMOVE FILEGROUP fg2

ALTER DATABASE AdventureWorks REMOVE FILEGROUP fg3

ALTER DATABASE AdventureWorks REMOVE FILEGROUP fg4

8?

 

 

 

What can you do with partitioned Data? (page 2-32)

SWITCH – Allows us to exchange a populated table or partition with an empty table or partition.

Note: The way I understand it is this. You take the chunk of data that is in partition one and plop it into an Archive table thus making your table smaller.

Like Prp to PrpArchive.

SELECT top 0 *

into PartitionedTransactionsArchive

from PartitionedTransactions

--have to move this to fg1 I manually did it but I dont know the syntax to do it.

 

ALTER TABLE dbo.PartitionedTransactions

SWITCH PARTITION 1

TO PartitionedTransactionsArchive

 

 

select top 10 * from PartitionedTransactionsArchive

 

select max(TransactionDate),min(TransactionDate) from PartitionedTransactionsArchive

/*

----------------------- -----------------------

2003-12-31 00:00:00.000 2003-09-01 00:00:00.000

*/

 

 

select max(TransactionDate),min(TransactionDate) from PartitionedTransactions

/*                       

----------------------- -----------------------

2004-09-03 00:00:00.000 2004-01-01 00:00:00.000

*/

MERGETakes two partitions and squishes them into one partition.

Uses the ALTER PARTITION FUNCTION

When you perform a merge operation the partition for the boundary value specified in the ALTER PARTITION FUNCTION statement is remoe, the data is merged into the adjacent partition.

Here is the create partition function from above:

CREATE PARTITION FUNCTION [pf_OrderDate](datetime)

AS RANGE RIGHT FOR VALUES (N'2004-01-01 00:00:00', N'2005-01-01 00:00:00')

/*

Partition 1 (in fg1) contains all 2003 data i.e. <1/1/2004

Partition 2 (in fg2) contains all 2004 data i.e. <1/1/2005

Partition 3 (in fg3) will fill up with new data

Partition 4 (in fg4) is marked as the next used filegroup in partition scheme 'ps_OrderDate' ??? no idea what this means

--If a partition scheme does not have a defined “NEXT” filegroup an error will occur.

 

*/

 

Merge code:

ALTER PARTITION FUNCTION pf_OrderDate()

MERGE RANGE (‘01/01/2004’)

 

This is what the function now looks like this (SSMSàDBàStorageàPartition Functionsàpf_orderdate)

CREATE PARTITION FUNCTION [pf_OrderDate](datetime)

AS RANGE RIGHT FOR VALUES (N'2005-01-01 00:00:00')

/*

Partition 1 (in fg1) contains all 2003 data i.e. <1/1/2004

Partition 2 (in fg2) contains all 2004 data i.e. <1/1/2005

Partition 3 (in fg3) will fill up with new data

Partition 4 (in fg4) is marked as the next used filegroup in partition scheme 'ps_OrderDate' ??? no idea what this means

--If a partition scheme does not have a defined “NEXT” filegroup an error will occur.

 

*/

 

Here is all of the code needed to duplicate the above:

/* MERGE functionality*/

/*********************************************************/

--a create script of our current [pf_OrderDate] function

USE [AdventureWorks]

GO

/****** Object:  PartitionFunction [pf_OrderDate]   

Script Date: 03/04/2008 16:05:08 ******/

CREATE PARTITION FUNCTION [pf_OrderDate](datetime)

AS RANGE RIGHT FOR VALUES (N'2004-01-01 00:00:00', N'2005-01-01 00:00:00')

 

/*********************************************************/

--running the merge operation:

ALTER PARTITION FUNCTION pf_OrderDate()

MERGE RANGE ('01/01/2004')

 

/*********************************************************/

--a create script of our new [pf_OrderDate] function

USE [AdventureWorks]

GO

/****** Object:  PartitionFunction [pf_OrderDate]   

Script Date: 03/04/2008 16:07:09 ******/

CREATE PARTITION FUNCTION [pf_OrderDate](datetime)

AS RANGE RIGHT FOR VALUES (N'2005-01-01 00:00:00')

 

 

/**********************************************************/

 

-- View partition metadata

SELECT * FROM sys.Partitions

WHERE [object_id] = OBJECT_ID('dbo.PartitionedTransactions')

 

/*

BEFORE:

partition_id         object_id   index_id    partition_number hobt_id              rows

-------------------- ----------- ----------- ---------------- -------------------- --------------------

72057594054246400    871674153   0           1                72057594054246400    42844

72057594054311936    871674153   0           2                72057594054311936    70599

72057594054377472    871674153   0           3                72057594054377472    1

*/

 

 

/*

AFTER:

72057594054246400              871674153              0              1              72057594054246400              113443

72057594054377472              871674153              0              2              72057594054377472              1

*/

 

/****************************************************************************

Partition 1 and 2 were merged and the partitions were renumbered.

*****************************************************************************/

 

 

-- View data with partition number

select $Partition.pf_OrderDate(TransactionDate) as PartitionNo, max(TransactionDate), min(TransactionDate)

FROM dbo.PartitionedTransactions

group by $Partition.pf_OrderDate(TransactionDate)

order by $Partition.pf_OrderDate(TransactionDate)

 

/* BEFORE

PartitionNo                        

----------- ----------------------- -----------------------

1           2003-12-31 00:00:00.000 2003-09-01 00:00:00.000

2           2004-09-03 00:00:00.000 2004-01-01 00:00:00.000

3           2005-01-01 00:00:00.000 2005-01-01 00:00:00.000

*/

 

                /*AFTER

PartitionNo                        

----------- ----------------------- -----------------------

1           2004-09-03 00:00:00.000 2003-09-01 00:00:00.000

2           2005-01-01 00:00:00.000 2005-01-01 00:00:00.000

                */

 

 

SPLIT – The opposite of merge this will split one partition into two.

Now I want to split partition 1 back into 2003 data and 2004 data

 

 

--old func

USE [AdventureWorks]

GO

/****** Object:  PartitionFunction [pf_OrderDate]    Script Date: 03/04/2008 16:23:14 ******/

CREATE PARTITION FUNCTION [pf_OrderDate](datetime) AS RANGE RIGHT FOR VALUES (N'2005-01-01 00:00:00')

 

/*begin split operation*/

ALTER PARTITION FUNCTION pf_OrderDate()

                SPLIT RANGE ('01/01/2004')

 

--new func

USE [AdventureWorks]

GO

/****** Object:  PartitionFunction [pf_OrderDate]    Script Date: 03/04/2008 16:23:32 ******/

CREATE PARTITION FUNCTION [pf_OrderDate](datetime) AS RANGE RIGHT FOR VALUES (N'2004-01-01 00:00:00', N'2005-01-01 00:00:00')

 

 

                /*BEFORE

PartitionNo                        

----------- ----------------------- -----------------------

1           2004-09-03 00:00:00.000 2003-09-01 00:00:00.000

2           2005-01-01 00:00:00.000 2005-01-01 00:00:00.000

                */

 

                /*AFTER

PartitionNo                        

----------- ----------------------- -----------------------

1           2003-12-31 00:00:00.000 2003-09-01 00:00:00.000

2           2004-09-03 00:00:00.000 2004-01-01 00:00:00.000

3           2005-01-01 00:00:00.000 2005-01-01 00:00:00.000

*/

 

YAY!!! Weee

 

End of Module 2!!

 

QUIZ Info:

The smallest allocable unit of storage is an extent, eight contiguous pages.

Collation can be set for a server, database, and a column

XML data has methods

A real datatype is an approximate numeric datatype