Portable Inline Assembly for C Compilers!

	NOTE: marking blocks as asm might not be necessary. Using the compiler provided
		header <stdassembly.h> might be the only requirement with this API.

	A new keyword, __asm__ or asm, is placed before a block {} to mark it as an asm
	block.

	An asm block allows the user to handle input and output registers across an
	asminstr() call. Without using anything in <stdassembly.h>, asm blocks
	behave as normal blocks.

	Register values are fixed at entry into the asm block. The value read through a
	register macro stays unchanged (unless the block modifies it), or the compiler
	emulates it as unchanged.
	All registers hold indeterminate values on entry into the asm block unless the
	block modifies them. Their values can still be saved, which is useful when saving
	execution state.
	The last value assigned to a register is the value it holds during the call to
	asminstr(). Example:
		asm {
			EAX = 0;
			RDI = 1;
			EAX = 2;
			asminstr(syscall);
		}

	During the asminstr() call, EAX is 2 and RDI is 1. The compiler could optimize out
	the line EAX = 0.

		int dest1, dest2, src = 8;
		asm {
			RAX = src;
			RBX = 0x2;
			asminstr(mul, RBX);
			dest1 = RDX;
			dest2 = RAX;
		}

	During the asminstr() call, RAX holds the value src had on entry to the asm block.
	The compiler will ensure that dest1 and dest2 are set to the values RDX and RAX
	hold right after the asminstr() call (the high and low halves of the product),
	upon leaving the asm block.
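
	For comparison, here is a minimal sketch of how the same multiplication can be
	written today with GNU C extended asm, where the register bindings and clobbers
	that the proposal infers have to be spelled out as constraints. The helper name
	mul_wide and the portable fallback are assumptions made for illustration and are
	not part of the proposal; the asm path is only compiled by GCC-compatible
	compilers targeting x86-64.

```c
#include <stdint.h>

/*
 * Hypothetical helper: 64x64 -> 128-bit unsigned multiply.
 * *hi receives what RDX holds after mul, *lo what RAX holds.
 */
static void mul_wide(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
#if defined(__GNUC__) && defined(__x86_64__)
	uint64_t rax = a, rdx;

	/* "+a" binds rax to RAX (in/out), "=d" binds rdx to RDX (out). */
	__asm__("mulq %[src]"
	        : "+a"(rax), "=d"(rdx)
	        : [src] "r"(b)
	        : "cc");
	*hi = rdx;
	*lo = rax;
#else
	/* Portable fallback: schoolbook multiply on 32-bit halves. */
	uint64_t a_lo = a & 0xffffffffu, a_hi = a >> 32;
	uint64_t b_lo = b & 0xffffffffu, b_hi = b >> 32;
	uint64_t p0 = a_lo * b_lo;
	uint64_t p1 = a_lo * b_hi;
	uint64_t p2 = a_hi * b_lo;
	uint64_t p3 = a_hi * b_hi;
	uint64_t mid = (p0 >> 32) + (p1 & 0xffffffffu) + (p2 & 0xffffffffu);

	*lo = (mid << 32) | (p0 & 0xffffffffu);
	*hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
#endif
}
```

	Calling mul_wide(8, 2, &hi, &lo) corresponds to the asm block above with src = 8.
	The constraint strings carry the information that the register macros express
	through plain assignments.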

	Within an asm block users can set values to registers from C variables using the
	register macros for the underlying architecture, defined in <stdassembly.h>.

	Using a register macro as an lvalue (assigning to it) means the register is
	clobbered; using it as an rvalue means the value in that register is read, for
	example to store it in a variable. The compiler can therefore detect
	automatically which registers are being clobbered.

	Some specifiers can be used on asm blocks to allow/disallow compiler optimizations:
		volatile
			- Instructions cannot be reordered.
		_Noreturn
			- means the asm block will not return to caller; compiler does not
				need to worry about clobbered registers.
	
	A function body {} can be defined as an asm block by writing asm {}. The function
		specifiers then apply to the asm block.

	Label identifiers in the scope of an asm block can have their address assigned to
	a variable. The resulting pointer is a reg_pc_t, as labels are only relevant in
	the context of setting the program counter.
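
	The closest existing facility is probably the GNU C "labels as values" extension,
	sketched below under that assumption. There the label address has type void *
	rather than the proposed reg_pc_t, and the function name dispatch is made up
	for the example. Compilers without the extension take the plain branch.

```c
/*
 * Hypothetical dispatcher: returns 10 or 20 depending on `pick`.
 */
static int dispatch(int pick)
{
#if defined(__GNUC__)
	/* GNU C extension: &&label yields the label's address as void *. */
	void *target = pick ? &&second : &&first;

	goto *target;
first:
	return 10;
second:
	return 20;
#else
	/* ISO C has no label addresses; a plain branch does the same job. */
	return pick ? 20 : 10;
#endif
}
```

	dispatch(0) takes the first label and dispatch(1) the second. Under the proposal
	the same jump would be expressed by assigning the label address to PC.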

	Memory fences can be handled with <stdatomic.h>, but similar functionality could
	also be provided in <stdassembly.h>.
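
	As a reminder of what <stdatomic.h> already offers, the following sketch
	publishes a value behind a flag using C11 fences. The names payload, ready,
	publish and consume are chosen for the example only; nothing here depends on
	<stdassembly.h>.

```c
#include <stdatomic.h>

static int payload;        /* ordinary data published by the writer */
static atomic_int ready;   /* flag guarding payload, starts at 0 */

static void publish(int value)
{
	payload = value;
	/* Release fence: the payload write is ordered before the flag store. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Returns 1 and fills *out once the flag is set, otherwise returns 0. */
static int consume(int *out)
{
	if (!atomic_load_explicit(&ready, memory_order_relaxed))
		return 0;
	/* Acquire fence: the payload read is ordered after the flag load. */
	atomic_thread_fence(memory_order_acquire);
	*out = payload;
	return 1;
}
```

	The compiler picks the fence instruction for the target, which is the same
	division of labour the proposal aims for with registers.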
	

	#include <stdassembly.h>

	Implementations that define the macro __STDC_NO_ASSEMBLY__ need not provide this
		header nor support any of its facilities.
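
	A minimal sketch of how portable code might test for the facility. The macro
	HAVE_STDASSEMBLY and the function have_stdassembly are hypothetical names, and
	the __has_include probe is an extra safeguard assumed here for compilers that
	predate the proposal; only __STDC_NO_ASSEMBLY__ comes from the text above.

```c
#if defined(__STDC_NO_ASSEMBLY__)
#  define HAVE_STDASSEMBLY 0
#elif defined(__has_include)
#  if __has_include(<stdassembly.h>)
#    include <stdassembly.h>
#    define HAVE_STDASSEMBLY 1
#  else
#    define HAVE_STDASSEMBLY 0
#  endif
#else
/* Compiler predates the proposal: assume the header is absent. */
#  define HAVE_STDASSEMBLY 0
#endif

/* Returns 1 when the asm facilities may be used, 0 for the fallback. */
static int have_stdassembly(void)
{
	return HAVE_STDASSEMBLY;
}
```

	Code that needs a fallback can then branch on HAVE_STDASSEMBLY in #if
	directives, in the same way <stdatomic.h> users test __STDC_NO_ATOMICS__.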

	The <stdassembly.h> header shall define the following types reg_t, reg_sp_t,
		and reg_pc_t that represent general purpose registers, stack pointer,
		and program counter types.

	The <stdassembly.h> header shall declare the following register macros that
	represent the stack pointer (SP) and program counter (PC):
		SP
		PC

	These are high-level concepts. The compiler handles the underlying mechanism. For
	example, PC = env->pc; would set the program counter, but the underlying
	mechanism would depend on the underlying architecture (Harvard, von Neumann, etc).
	env->sp = SP; would save the stack pointer, but on some architectures this might
	include saving the link register and frame pointer.

	Architecture-specific registers can be used if portability is not a concern.


	The <stdassembly.h> header shall declare the following register macros that
	represent generic register types.

		CALLEESAV1
		...
		CALLEESAV*

	Each has a corresponding macro named ASM_HAS_CALLEESAV*, suitable for use in #if
	preprocessing directives.

	Also provided is the CALLEESAV_COUNT macro, which is the total number of
	callee-saved registers for the ABI.
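
	One possible use of CALLEESAV_COUNT is sizing an ABI-independent save area.
	The sketch below assumes stand-in definitions so that it compiles without
	<stdassembly.h>: the typedefs and the value 6 model x86-64 SysV (rbx, rbp,
	r12-r15) and would come from the header in a real implementation. The names
	saved_context and saved_slots are hypothetical.

```c
#include <stdint.h>

/* Stand-ins for <stdassembly.h>; assumed values for x86-64 SysV. */
#ifndef CALLEESAV_COUNT
typedef uintptr_t reg_t;
typedef uintptr_t reg_sp_t;
typedef uintptr_t reg_pc_t;
#define CALLEESAV_COUNT 6
#endif

/* Hypothetical ABI-independent layout for a jmp_buf-like save area. */
struct saved_context {
	reg_t    callee_saved[CALLEESAV_COUNT];
	reg_sp_t sp;
	reg_pc_t pc;
};

/* Number of callee-saved slots the layout reserves. */
static unsigned saved_slots(void)
{
	return (unsigned)(sizeof(((struct saved_context *)0)->callee_saved)
	                  / sizeof(reg_t));
}
```

	With such a layout the same structure definition would serve every ABI, and
	only the code that fills callee_saved[] needs the CALLEESAV* macros.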

	The header must also define architecture-defined register macros that expand to
	modifiable lvalues. Internally the compiler must store and load these registers
	as requested.

	x86-64/amd64 example:
		RAX
		EAX
		AX
		AL
		AH
		RBX
		...

	The <stdassembly.h> header shall declare the following function, where
		opcode is the architecture defined assembly opcode.

	The compiler determines the underlying addressing mode from the kind of each
	argument:
		Literals are immediates.
		Scalar variables are register operands.
		Pointers are register indirect operands.
		Function pointers are used for branch instructions.

	void asminstr(opcode, ...);

	Generic example operand usage:
		void asminstr(opcode);
		void asminstr(opcode, dest, src);
		void asminstr(opcode, dest, src, src);
		void asminstr(opcode, dest, dest, src);

	Example:
		asminstr(mov, &a, b);

	Operands are allocated according to underlying opcode.
	Implicit clobbers depend on the instruction opcode.
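
	To relate this to existing practice, here is a sketch of what
	asminstr(mov, &a, b) roughly corresponds to in GNU C extended asm, where the
	operand kind is stated through a constraint ("=m" for memory, "r" for a
	register) instead of being derived from the argument. The helper name
	store_word is an assumption, AT&T operand order is assumed, and other targets
	fall back to a plain assignment.

```c
#include <stdint.h>

/*
 * Hypothetical helper: store `value` through `dest` with one mov.
 */
static void store_word(int64_t *dest, int64_t value)
{
#if defined(__GNUC__) && defined(__x86_64__)
	/* "=m" selects a memory operand, "r" a register operand. */
	__asm__("movq %[src], %[dst]"
	        : [dst] "=m"(*dest)
	        : [src] "r"(value));
#else
	/* Other targets: let the compiler pick the instruction. */
	*dest = value;
#endif
}
```

	Under the proposal the pointer argument &a alone would tell the compiler to use
	a register indirect destination.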

	Example implementation of _Noreturn void _exit(int) and
		size_t read(int, char *, size_t):

	#include <stddef.h>
	#include <stdassembly.h>

	/* For brevity we shall assume x86-64 SysV ABI Linux. */

	#define LINUX_SYS_EXIT 60

	static inline _Noreturn void
	_exit(int status) asm
	{
		RAX = LINUX_SYS_EXIT;
		RDI = status;
		asminstr(syscall);
	}

	#define LINUX_SYS_READ 0

	extern int errno;

	static inline size_t
	read(int fildes, char *buf, size_t nbytes)
	{
		size_t ret;

		asm {
			RAX = LINUX_SYS_READ;
			RDI = fildes;
			RSI = buf;
			RDX = nbytes;
			asminstr(syscall);
			ret = RAX;
		}

		/* ret is unsigned: Linux reports errors as values in [-4095, -1]. */
		if (ret > (size_t)-4096) {
			errno = (int)-ret;
			ret = (size_t)-1;
		}

		return ret;
	}
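
	A size_t can never compare below zero, so the error test on a raw system call
	result is easy to get wrong. The sketch below isolates the Linux convention
	(failure is reported as -errno in the result register) in a helper that can be
	checked on its own. The name decode_syscall and the err out-parameter, used in
	place of errno to keep the example self-contained, are assumptions.

```c
#include <stddef.h>

/*
 * Hypothetical helper: returns the result, or -1 after storing the
 * error code in *err when raw lies in the range [-4095, -1].
 */
static long long decode_syscall(size_t raw, int *err)
{
	if (raw > (size_t)-4096) {
		*err = (int)(0 - raw);   /* unsigned negation recovers errno */
		return -1;
	}
	return (long long)raw;
}
```

	A wrapper such as read() above would pass the value taken from RAX to this
	helper and store the reported code in errno.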

	Example implementation of int setjmp(jmp_buf *env) and
		_Noreturn void longjmp(jmp_buf *, int):

	#include <stdint.h>
	#include <stdassembly.h>

	/* For brevity we shall assume x86-64 SysV ABI. */

	typedef struct {
			reg_t rbx;
			reg_t rbp;
			reg_t r12;
			reg_t r13;
			reg_t r14;
			reg_t r15;
			reg_sp_t rsp;
			reg_pc_t rip;
	} jmp_buf;

	static inline int
	setjmp(jmp_buf *env) asm
	{
		/*
		Compiler ensures register values are the same as they
		were on asm block entry (function call).
		Compiler knows about memory clobbers.
		*/
		env->rbx = RBX;
		env->rbp = RBP;
		env->r12 = R12;
		env->r13 = R13;
		env->r14 = R14;
		env->r15 = R15;
		/*
		When the stack pointer is read, the compiler ensures it has the same
		value it had on asm block entry (function call).
		*/

		/*
		Compiler handles the underlying mechanism to set those values.
		On some ABIs the compiler would have to set the stack pointer above
		the function arguments.
		On x86-64, env->rip is set to the return address, which is stored at
		the address held in rsp on entry into the function. Stack-passed
		arguments, if any, start at rsp + 8.
		*/

		env->rsp = SP;
		env->rip = PC;

		/* Synonym for: xor rax, rax */
		return 0;
	}

	/* _Noreturn means compiler need not worry about clobbered registers. */
	static inline _Noreturn void
	longjmp(jmp_buf *env, int val) asm
	{
		/*
		Compiler ensures all these registers hold the assigned values when
		the jump is performed.
		*/
		RBX = env->rbx;
		RBP = env->rbp;
		R12 = env->r12;
		R13 = env->r13;
		R14 = env->r14;
		R15 = env->r15;
		SP = env->rsp;
		RAX = val ? val : 1;  /* Return value; setjmp must not return 0 here. */
		PC = env->rip;        /* Perform jump. */

		/*
		We could perform an x86-64 jump instruction with asminstr(jmp, env->rip);
		instead of PC = env->rip; (if you don't care about portability).
		*/
	}
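
	For reference, the following sketch uses the standard <setjmp.h> to show the
	observable behaviour that any such implementation has to reproduce: setjmp()
	yields 0 on the direct call and the longjmp() value afterwards, with 0 replaced
	by 1. The names checkpoint, jump_back and second_return are made up for the
	example.

```c
#include <setjmp.h>

static jmp_buf checkpoint;

/* Never returns: control resumes at the setjmp() call below. */
static void jump_back(int val)
{
	longjmp(checkpoint, val);
}

/*
 * Hypothetical probe: reports what setjmp() yields on its second
 * return when longjmp() is called with `val` (0, 1 or 7 here).
 */
static int second_return(int val)
{
	switch (setjmp(checkpoint)) {
	case 0:
		jump_back(val);   /* first return: take the jump */
		return -1;        /* not reached */
	case 1:
		return 1;
	case 7:
		return 7;
	default:
		return -1;
	}
}
```

	second_return(7) yields 7, while second_return(0) yields 1 because longjmp()
	never lets setjmp() return 0 a second time.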