Wednesday, March 27, 2013

REST Checklist

I have recently been reading some books about the REST architectural style. Since there are a lot of little details I can easily forget, I compiled the most generic rules and principles into a relatively short checklist.

Books:
RESTful Web Services Cookbook by Subbu Allamaraju
REST in Practice by Jim Webber, Savas Parastatidis, and Ian Robinson
REST API Design Rulebook by Mark Massé


REST Checklist

CRUD

GET


  • Is GET used to read a resource state?
  • Is GET idempotent? (Can I repeat the same GET request?)
  • Is GET safe? (Is GET free from side effects?)
  • Does GET return "200 OK" when successful?
  • Does GET return "404 Not Found" if the resource does not exist?
  • Are the resources that are often requested together combined into coarse-grained composites?
  • Does resource granularity ensure that more cacheable, less frequently changing, or immutable data is separated from less cacheable, more frequently changing, or mutable data?
  • Are similar resources organized into collections?

POST


  • Is POST used to create a new resource with a server-generated URI?
  • Is a Slug header used to let clients suggest a name for the server to use as part of the URI of the new resource?
  • Does POST return "201 Created" when successful?
  • Is a Location header used in the response to return a link to the new resource when successful?
  • Are one-time URIs with generated tokens that are valid just for one usage used to prevent repeatable POSTs?
  • Does POST return "403 Forbidden" if the token was already used?
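As an illustration of the items above, a successful creation exchange might look like this (the host, URIs, Slug value, and payload are all made up):

```http
POST /orders HTTP/1.1
Host: example.org
Slug: summer-order
Content-Type: application/xml

<order>...</order>

HTTP/1.1 201 Created
Location: http://example.org/orders/summer-order-9876
```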

PUT


  • Is PUT used to update the resource state?
  • Is PUT used to create a new resource with a client-generated URI?
  • Is PUT idempotent? (Can I repeat the same PUT request?)
  • Is PUT used to update the whole resource and not just a part of it? (When you need to update part of a resource, you should use PATCH or POST. PATCH is not widely adopted yet, but it does exactly that.)
  • Does PUT return "200 OK" or "204 No Content" when successful?
  • Does PUT return "404 Not Found" if the resource does not exist?
  • Optional: does PUT create a new resource if it does not exist?
  • Are entity tags (ETag) and conditional request headers (If-Unmodified-Since, If-Match) used to prevent concurrent update conflicts?
  • Does PUT return "403 Forbidden" if the client does not include conditional request headers?
  • Does PUT return "412 Precondition Failed" when conditional request headers do not match the current state?
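A conditional update, sketched here with made-up URIs and entity tag values, could look like this; if the ETag no longer matches the current state of the resource, the server responds with "412 Precondition Failed" instead of "200 OK" or "204 No Content":

```http
PUT /orders/9876 HTTP/1.1
Host: example.org
If-Match: "etag-received-from-the-last-GET"
Content-Type: application/xml

<order>...</order>

HTTP/1.1 412 Precondition Failed
```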

DELETE


  • Is DELETE idempotent? (Can I repeat the same DELETE request?)
  • Does DELETE return "204 No Content" when successful?
  • Does DELETE return "404 Not Found" if the resource does not exist?
  • Does DELETE return "405 Method Not Allowed" if resource exists but cannot be deleted?
  • Are entity tags (ETag) and conditional request headers (If-Unmodified-Since, If-Match) used to prevent deleting resources based on stale information?
  • Does DELETE return "403 Forbidden" if the client does not include conditional request headers?
  • Does DELETE return "412 Precondition Failed" when conditional request headers do not match the current state?

Beyond CRUD


  • Is GET used to support safe and idempotent computing/processing functions?
  • Is POST used to perform any unsafe or nonidempotent operation that involves modifying more than one resource atomically, or whose mapping to PUT or DELETE is not obvious?
  • Is POST used to start long-running requests asynchronously?
  • Is DELETE used to start long-running resource deletion requests asynchronously?
  • Are the custom HTTP methods used only when it’s absolutely impossible to avoid them?

Representations


  • Does the server use the Content-Type header to describe the type of the representation?
  • Is the charset parameter included in the Content-Type header?
  • Is "application/xml" media type (default UTF-8) used instead of "text/xml" (default us-ascii)?
  • Optional: Do the Content-Type headers use application-specific media types (e.g. application/vnd.example+xml) to take advantage of the hypermedia format?
  • Does the representation of the resource include a self link to the resource?
  • Does the representation of the resource include an id for each of the application domain entities that make up the resource?
  • If a collection is paginated, does the representation contain links to the previous and the next pages?
  • Are the collections returned in the response homogeneous?
  • Is XmlConvert class used to properly encode data (dates and times, numbers etc.) in portable data formats?
  • Is binary data retrieved using multipart media types or as a separate resource instead of encoding within textual formats using Base64 encoding?
  • Is status code 4xx used for errors due to client inputs?
  • Is status code 5xx used for errors due to server implementation or its current state?
  • Optional: Does error response include an identifier or a link that can be used to refer to the error logged on the server side for later tracking or analysis?
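To sketch the self link and pagination items above (the URIs are made up; the Atom link relations are just one common convention):

```xml
<orders xmlns:atom="http://www.w3.org/2005/Atom">
  <atom:link rel="self" href="http://example.org/orders?page=2"/>
  <atom:link rel="prev" href="http://example.org/orders?page=1"/>
  <atom:link rel="next" href="http://example.org/orders?page=3"/>
  <order id="9876">
    <atom:link rel="self" href="http://example.org/orders/9876"/>
    <!-- ... -->
  </order>
</orders>
```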

URIs


  • Are singular nouns used for document names?
  • Are plural nouns used for collection names?
  • Are verbs or verb phrases used for controller names?
  • Is media type used to indicate the representation format instead of file extension in URI?
  • Is every URI dedicated to a unique resource, instead of tunnelling requests to different resources through the same URI?
  • Are the clients spared from constructing URIs by exposing only stable URIs and making all the other URIs discoverable from hypermedia links?
  • Are URIs cool? (Cool URIs don’t change.)

Sunday, February 19, 2012

I want it customizable, how do I achieve this?

I have spent some time recently working on a product design that would make it customizable, to the point that users would be allowed to add new entity attributes, or even define new entity types.

A good example (thanks, Kirill, for this idea) would be TFS. The default configuration allows you to create work items of types Bug, Issue, Task, etc.:


Each type has its own set of fields, like Title or State:


But you are free to add your own fields, or even define new item types:


One of the ways to achieve this degree of extensibility is to use a Generic Attribute Table, which is a well-recognized antipattern. In such a design, all field values are stored in a single table with fields like ObjectType, ObjectID, FieldID, and FieldValue.


Of course, you lose most of the good stuff a relational database provides, including referential integrity, mandatory fields, and SQL data types.
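For illustration, a minimal sketch of such a table (the column names follow the ones mentioned above; the data types are my assumption):

```sql
CREATE TABLE dbo.AttributeValues (
    ObjectType INT NOT NULL,        -- which entity type the value belongs to
    ObjectID   INT NOT NULL,        -- which entity instance
    FieldID    INT NOT NULL,        -- which logical field
    FieldValue NVARCHAR(MAX) NULL,  -- every value is stored as text
    PRIMARY KEY (ObjectType, ObjectID, FieldID)
);
```

Note that nothing here can reference the actual entity tables, enforce that a field is mandatory, or give FieldValue a proper SQL type.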

So how did the guys from the TFS team solve this task?
Let’s first look at how the default work items are stored in the database. A quick examination of tables and fields leads to the dbo.WorkItemTypes table.


The work item data itself is stored in the dbo.WorkItemsAre table:


And here’s something interesting: fields with weird names like Fld10013. We can guess that these fields are, in fact, custom fields.
Let's follow the steps from MSDN and create a new work item type.
Export the bug definition:

Modify it in notepad:

Import it back:

Now if I go to VS, I can create the work item of my custom type:

And fill my custom field:

Now let’s look into the database.
As expected, we discover that the new record appears for the new custom item type:

What is most interesting is that we see a new field added to the dbo.WorkItemsAre table:

We can see that it has the value that we provided.
Field metadata can be found in dbo.Fields table:

So we can see that TFS uses Single-Table Inheritance (http://www.martinfowler.com/eaaCatalog/singleTableInheritance.html) and allows you to add fields for the custom attributes.
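In other words, adding a custom field boils down to something like the following sketch. The column name mimics the Fld10013 pattern seen above, the data type is my guess, and the actual DDL TFS generates may well differ:

```sql
-- A new typed column is added directly to the single
-- work items table (a sketch, not the real TFS DDL):
ALTER TABLE dbo.WorkItemsAre ADD Fld10013 NVARCHAR(256) NULL;
```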

Monday, September 5, 2011

MS SQL Server vs Oracle: Row Versioning

I sometimes hear that SQL Server, now that it supports row versioning, is just the same as Oracle. Is that so? I am going to compare and see.

Indeed, the basic principle is the same: when one transaction is modifying the data in the table, the original row is stored in some temporary place, so other transactions can read the unchanged data.

SQL Server uses the tempdb:

Oracle uses Undo Segments:
Which conceptually looks pretty much the same. Let's dig deeper now.



STEP 1: Create and setup databases.

A. SQL Server TEST_01, with Read Committed Snapshot isolation enabled:

ALTER DATABASE TEST_01 SET READ_COMMITTED_SNAPSHOT ON;

B. SQL Server TEST_02, with Snapshot isolation enabled:

ALTER DATABASE TEST_02 SET ALLOW_SNAPSHOT_ISOLATION ON;

C. Any Oracle database.


STEP 2: Create a table and fill with the demo data.

1. SQL Server (Read Committed Snapshot, Snapshot)


CREATE TABLE dbo.LockTest (I INT PRIMARY KEY, C VARCHAR(50));
INSERT INTO dbo.LockTest (I, C) VALUES (1, 'A'), (2, 'B'), (3, 'C');

2. Oracle

CREATE TABLE LockTest (I INT PRIMARY KEY, C VARCHAR(50));
INSERT INTO LockTest (I, C) VALUES (1, 'A');
INSERT INTO LockTest (I, C) VALUES (2, 'B');
INSERT INTO LockTest (I, C) VALUES (3, 'C');


STEP 3: Start the first transaction which modifies the table row (DO NOT commit).

1. SQL Server (Read Committed Snapshot, Snapshot)


SET TRANSACTION ISOLATION LEVEL READ COMMITTED
BEGIN TRAN

UPDATE dbo.LockTest SET C = 'B Modified' WHERE I = 2

Exclusive locks will be acquired:


We can check that the original version of the row is now stored in tempdb:

SELECT * FROM sys.dm_tran_version_store;




2. Oracle


SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

UPDATE LockTest SET C = 'B Modified' where I = 2;

Exclusive locks will be acquired:



STEP 4: Start another transaction and try to read

STEP 4.a READ UNCOMMITTED

1. SQL Server (Read Committed Snapshot)


SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
BEGIN TRAN

SELECT * FROM dbo.LockTest




Even though the original row is stored in tempdb, the transaction reads the dirty data.

2. SQL Server (Snapshot)

The same result: the transaction reads the dirty data.

3. Oracle

In fact, Oracle doesn't support the READ UNCOMMITTED isolation level. Oracle always reads committed data. It can simply afford to, because, as we are going to see, reading committed data carries no penalties there.



STEP 4.b READ COMMITTED


1. SQL Server (Read Committed Snapshot)



SET TRANSACTION ISOLATION LEVEL READ COMMITTED
BEGIN TRAN

SELECT * FROM dbo.LockTest



So we can see that SQL Server could read the data, including the row changed in another transaction, obtaining the original version from tempdb.

2. SQL Server (Snapshot)

SET TRANSACTION ISOLATION LEVEL READ COMMITTED
BEGIN TRAN

SELECT * FROM dbo.LockTest



Now SQL Server cannot get any results, being blocked by the first uncommitted transaction:


So we can see that SQL Server is trying to get a Shared lock on the row in question (a KEY lock, technically).

3. Oracle

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

SELECT * FROM LockTest;


Pretty much the same as with SQL Server in Read Committed Snapshot: we can read the rows, including the one being updated by the first transaction. Oracle reads the original row from the undo segments.


STEP 4.c REPEATABLE READ

1. SQL Server (Read Committed Snapshot)

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRAN

SELECT * FROM dbo.LockTest


Once again with SQL Server, we are blocked. And the reason is that the transaction is trying to obtain a Shared lock:


2. SQL Server (Snapshot)

The transaction will be blocked, for the same reason.

3. Oracle

Oracle does not support the REPEATABLE READ isolation level; it doesn't have to, and we'll see why.


STEP 4.d SERIALIZABLE

1. SQL Server (Read Committed Snapshot)

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN

SELECT * FROM dbo.LockTest


And we are blocked again. This time it is because the transaction wants to get a Shared lock on the whole range of keys that satisfies the WHERE condition (which is empty, so the whole table).


2. SQL Server (Snapshot)

The same reason, the same result.

3. Oracle

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

SELECT * FROM LockTest;



STEP 4.e SQL SERVER ONLY: SNAPSHOT

Because of unimpressive results with the SERIALIZABLE isolation level, SQL Server offers something different: SNAPSHOT isolation.

SET TRANSACTION ISOLATION LEVEL SNAPSHOT
BEGIN TRAN

SELECT * FROM dbo.LockTest


And now we finally got it: the same result as Oracle gives in SERIALIZABLE.


Wrap-up:

1. Oracle never reads uncommitted data.
SQL Server always reads uncommitted data when in READ UNCOMMITTED.

2. In READ COMMITTED, an Oracle transaction reads the committed data and never gets blocked by a parallel transaction which modifies the same data. When the data is modified by another uncommitted transaction, Oracle reads the original data from the undo segments.
A SQL Server (READ COMMITTED SNAPSHOT) transaction, the same as in Oracle, simply reads the original versions of the rows modified in another uncommitted transaction from tempdb.

3. Oracle does not support REPEATABLE READ, because writers do not block readers in SERIALIZABLE.
SQL Server (READ COMMITTED SNAPSHOT) does not handle REPEATABLE READ well enough (readers get blocked), but SNAPSHOT can be used as an alternative.

4. Oracle does fine in SERIALIZABLE, writers do not block readers.
SQL Server does fine in SNAPSHOT, writers do not block readers.

Well, are they the same? When used correctly, they pretty much are. The only remaining question is performance (tempdb vs undo segments).

Saturday, September 3, 2011

SQL Server: Querying Locks



SELECT
    lock.request_session_id AS "Session Id"
    , task.blocking_session_id AS "Blocked By"
    , DB_NAME(lock.resource_database_id) AS "DB Name"
    , obj.[type] AS "Object Type"
    , SCHEMA_NAME(obj.schema_id) + '.' + obj.Name AS "Object Name"
    , lock.resource_type AS "Resource Type"
    , lock.resource_description AS "Resource Id"
    , lock.request_mode AS "Lock Type"
    , lock.request_status AS "Lock Status"
    , lock.request_reference_count AS "Ref Count"
    , stmt.text AS "Sql Statement"
    , ssntran.is_user_transaction as "User Transaction"
    , acttran.name as "Transaction Name"
    , '***' AS "***"
    , ssn.login_name AS "Login"
    , ssn.host_name AS "Host"
    , ssn.program_name AS "Program"
    , ssn.login_time AS "Login Time"
    , conn.net_transport AS "Transport"
    , conn.auth_scheme as "Authentication"
FROM
    sys.dm_tran_locks lock
    LEFT JOIN sys.dm_os_waiting_tasks task ON lock.lock_owner_address = task.resource_address
    JOIN sys.partitions prtn ON prtn.hobt_id = lock.resource_associated_entity_id
    JOIN sys.objects obj ON obj.object_id = prtn.object_id
    JOIN sys.dm_exec_sessions ssn ON ssn.session_id = lock.request_session_id
    JOIN sys.dm_tran_session_transactions ssntran ON ssn.session_id = ssntran.session_id
    JOIN sys.dm_tran_active_transactions acttran ON ssntran.transaction_id = acttran.transaction_id
    JOIN sys.dm_exec_connections conn ON conn.session_id = ssn.session_id
    CROSS APPLY sys.dm_exec_sql_text(conn.most_recent_sql_handle) AS stmt
WHERE resource_database_id = db_id()
ORDER BY lock.request_session_id

SQL Server vs Oracle: Memory Management

There are many things that MS SQL Server and Oracle do the same way. Let's look at the memory management.

Whatever happens to the data, it has to be read from permanent storage first and written back afterwards. Things might change in the next ten years, but so far the reality is that this permanent storage is a hard disk.

The hard disk performs read/write operations in minimal units called sectors. No surprise that databases also read and write data not in single bytes, but in reasonably small pieces. In SQL Server this piece is called a page, and in Oracle it is called a block. The SQL Server page has a fixed size of 8 KB; the Oracle block size may vary, but 8 KB could safely be called "typical".


The optimal performance can be achieved when the data pieces are contiguous. Thus both SQL Server and Oracle group their minimal units, pages and blocks, respectively, into larger chunks of contiguous pieces, called extents.

In SQL Server, an extent is 8 logically contiguous pages.


In Oracle, an extent is 1 to N logically contiguous blocks, where N = 2 GB / Block Size.


"Logically contiguous" should be understood as "contiguous in a file". Files themselves might not be contiguous on disk.

With some exceptions, extents are the units in which the storage is consumed by tables and indexes, both in SQL Server and Oracle.

The difference is in how the storage is optimized while objects are still too small to occupy a whole extent effectively.

SQL Server will not allocate a whole extent for an object until it grows beyond 64 KB, instead allocating pages from an extent shared between several objects (a mixed extent).

In Oracle, this problem is solved by simply using different extent sizes: one for newly created objects (INITIAL), and another for existing objects which grow further (NEXT).
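For illustration, in Oracle the extent sizes can be requested explicitly in the STORAGE clause of a CREATE TABLE statement. The table itself and the values are made up, and locally managed tablespaces may round or override the requested sizes:

```sql
-- Ask for a 64 KB initial extent and 1 MB subsequent extents:
CREATE TABLE Invoices (I INT PRIMARY KEY, C VARCHAR(50))
STORAGE (INITIAL 64K NEXT 1M);
```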

Friday, September 2, 2011

Validation rules

Validation, as any other constraint, is a source of assumptions in the application code.

Not only does it provide a data quality check at a certain point; the whole logic of processing that piece of data afterwards assumes the validation rule was enforced.

For example, if you cannot create a customer account without specifying the postal code, you should not be surprised to see SQL statements that break as soon as we remove this check. A primitive example would be the following SQL query:

SELECT c.FirstName + ' ' + c.LastName + ', ' + c.PostalCode FROM dbo.Customer c WHERE...

So relaxing the validation rule is not as simple as removing the condition check and an error message.
You have to trace the usage of the data and make sure none of the functionality is broken, neither from the technical, nor from the logical point of view.

Say, if you now allow the postal code to be NULL, then firstly, the previous SELECT will now yield NULL whenever we have a customer without a postal code.

Secondly, the reason for relaxing the rule might not be that we are not going to send customers their invoices anymore. It might be that we now have a default postal code somewhere else, which should be used when the customer's postal code is not set. I am making this up, but you get the idea.
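A sketch of how the earlier query would then have to change (the @DefaultPostalCode variable is made up for illustration):

```sql
SELECT c.FirstName + ' ' + c.LastName + ', '
     + ISNULL(c.PostalCode, @DefaultPostalCode)  -- fall back to the default
FROM dbo.Customer c WHERE ...
```

Without the ISNULL, concatenating a NULL postal code would turn the whole string into NULL.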

Wednesday, August 31, 2011

From static pages to the server-side applications

1991 The first version of the HTTP protocol has only one method, GET, and the response from the server is always an HTML page. All the server has to do is retrieve the HTML file content from the file system and send it back to the client as a stream of bytes.

1993 The Common Gateway Interface (CGI) standard defines how web server software can delegate the generation of web pages to a stand-alone application or an executable file. CGI scripts run outside the web server, and spawn a separate operating system process per request.

1997? The Internet Server Application Program Interface (ISAPI) standard is created, which defines how DLLs can run as part of the web server's process and address space.

1998 Active Server Pages (ASP) is Microsoft's first server-side script engine for dynamically generated web pages, implemented as an ISAPI extension.

2002 ASP.NET is released, still an ISAPI extension.

2009 ASP.NET MVC source code released, based on ASP.NET.

In other words, all the frameworks from Microsoft that we have used over the last decade to provide dynamic responses to web requests are no more than an abstraction layer on top of the plain old ISAPI standard from the late 90s.