
@kaushikcfd
Last active January 24, 2018 04:43
---------------------------------------------------------------------------
KERNEL: loopy_kernel_and_loopy_kernel_and_tsfc_kernel_and_loopy_kernel
---------------------------------------------------------------------------
ARGUMENTS:
A0_global: GlobalArg, type: np_atomic:dtype('float64'), shape: (A0_size), dim_tags: (N0:stride:1)
A0_size: ValueArg, type: np:dtype('int32')
coords_global: GlobalArg, type: np:dtype('float64'), shape: (coords_global_len, 2), dim_tags: (N1:stride:2, N0:stride:1)
coords_global_len: ValueArg, type: np:dtype('int32')
ltg_0: GlobalArg, type: np:dtype('int32'), shape: (nelements, 3), dim_tags: (N1:stride:3, N0:stride:1)
ltg_1: GlobalArg, type: np:dtype('int32'), shape: (nelements, 3), dim_tags: (N1:stride:3, N0:stride:1)
nelements: ValueArg, type: np:dtype('int32')
w_0_global: GlobalArg, type: np:dtype('float64'), shape: (w_0_global_len), dim_tags: (N0:stride:1)
w_0_global_len: ValueArg, type: np:dtype('int32')
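The dim_tags on each argument give the stride, in elements, of each axis. If N0 is read as the fastest-varying axis, coords_global, ltg_0 and ltg_1 are ordinary row-major (C-order) arrays. The following minimal NumPy check of that reading uses hypothetical sizes and a hypothetical helper name (element_strides):

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
coords_global_len = 4
nelements = 2

# C-ordered (row-major) arrays with the shapes and dtypes listed above.
coords_global = np.zeros((coords_global_len, 2), dtype=np.float64)
ltg_0 = np.zeros((nelements, 3), dtype=np.int32)


def element_strides(a):
    """Convert NumPy byte strides into element strides, as printed in dim_tags."""
    return tuple(s // a.itemsize for s in a.strides)


print(element_strides(coords_global))  # (2, 1): N1:stride:2, N0:stride:1
print(element_strides(ltg_0))          # (3, 1): N1:stride:3, N0:stride:1
```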
---------------------------------------------------------------------------
DOMAINS:
[A0_size, nelements] -> { [ibf_gather_0, ibf_scat_0, ibf_dim_scat_0, ibf_scat_1, ibf_dim_scat_1, i_init_0_outer, i_init_0_inner, iel_outer, iel_inner] : 0 <= ibf_gather_0 <= 2 and 0 <= ibf_scat_0 <= 2 and 2ibf_scat_0 <= ibf_dim_scat_0 <= 5 and 0 <= ibf_scat_1 <= 2 and ibf_scat_1 <= ibf_dim_scat_1 <= 2 and i_init_0_inner >= 0 and -32i_init_0_outer <= i_init_0_inner <= 31 and i_init_0_inner < A0_size - 32i_init_0_outer and iel_inner >= 0 and -32iel_outer <= iel_inner <= 31 and iel_inner < nelements - 32iel_outer }
{ [i10, i1, i1_0] : 0 <= i10 <= 2 and 0 <= i1 <= 2 and 0 <= i1_0 <= 2 }
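The first domain comes from splitting iel (and likewise i_init_0) by 32: iel = iel_inner + 32*iel_outer. The last constraint masks off indices past nelements in the final block. The Python sketch below enumerates the split domain and checks that it visits each element exactly once. The helper names and the element count are hypothetical.

```python
def split_points(n, block=32):
    """Yield (outer, inner) pairs satisfying the printed constraints:
    0 <= inner <= block - 1, inner >= -block*outer, inner < n - block*outer.
    Together these force 0 <= outer < ceil(n / block)."""
    num_outer = -(-n // block)  # ceil(n / block)
    for outer in range(num_outer):
        for inner in range(block):
            if inner < n - block * outer:  # mask the tail of the last block
                yield outer, inner


def covered_indices(n, block=32):
    # Recombine each split pair into the original index iel = inner + block*outer.
    return [inner + block * outer for outer, inner in split_points(n, block)]


nelements = 70  # hypothetical
idx = covered_indices(nelements)
print(len(idx), idx == list(range(nelements)))
```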
---------------------------------------------------------------------------
INAME IMPLEMENTATION TAGS:
i1: None
i1_0: None
i10: None
i_init_0_inner: l.0
i_init_0_outer: g.0
ibf_dim_scat_0: None
ibf_dim_scat_1: None
ibf_gather_0: None
ibf_scat_0: None
ibf_scat_1: None
iel_inner: l.0
iel_outer: g.0
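With these tags, each inner iname becomes local axis 0 of a work-group (l.0) and each outer iname becomes group axis 0 (g.0). One work-item therefore handles one element, or one entry of A0_global in the init loop. The hypothetical helpers below compute the resulting launch shape for a group size of 32 and the element index a work-item owns:

```python
def launch_config(n, local_size=32):
    """Hypothetical helper: (group count,) and (local size,) for an iname of
    extent n split by local_size, outer part on g.0 and inner part on l.0."""
    num_groups = -(-n // local_size)  # ceil(n / local_size)
    return (num_groups,), (local_size,)


def owner(group_id, local_id, local_size=32):
    # Index handled by one work-item: iel = iel_inner + 32*iel_outer.
    # Work-items whose index reaches n are masked off by the domain bound.
    return local_id + local_size * group_id


groups, local = launch_config(100)
print(groups, local)
print(owner(3, 3))
```

The init loop would be sized by A0_size and the element loop by nelements. The gbarrier gb1 between them ensures that all of A0_global is zeroed before any element scatters into it.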
---------------------------------------------------------------------------
TEMPORARIES:
acc_i10: type: np:dtype('float64'), shape: () scope:private
cnst: type: np:dtype('float64'), shape: (3, 3), dim_tags: (N1:stride:3, N0:stride:1) scope:global
cnst_0: type: np:dtype('float64'), shape: (3), dim_tags: (N0:stride:1) scope:global
cse: type: np:dtype('float64'), shape: () scope:private
cse_0: type: np:dtype('float64'), shape: () scope:private
sum_tmp_0: type: np:dtype('float64'), shape: (i1_0:3), dim_tags: (N0:stride:1) scope:private
---------------------------------------------------------------------------
INSTRUCTIONS:
for i_init_0_outer, i_init_0_inner
↱↱ A0_global[i_init_0_inner + i_init_0_outer*32] = 0.0 {id=init_0, tags=init}
││ end i_init_0_outer, i_init_0_inner
└│↱↱↱↱↱↱ ... gbarrier {id=gb1}
│││││││ for iel_inner, iel_outer, i1_0
↱│└│││││ acc_i10 = 0 {id=sum_tmp_i10_init}
││ │││││ end i1_0
││↱└││││↱↱ cse = (-1.0)*coords_global[ltg_0[iel_inner + iel_outer*32, ibf_scat_0], (-1)*2*ibf_scat_0] {id=insn_0, tags=cse:formknl}
│││↱└│││││↱↱ cse_0 = (-1.0)*coords_global[ltg_0[iel_inner + iel_outer*32, ibf_scat_0], 1 + (-1)*2*ibf_scat_0] {id=insn_0_0, tags=cse:formknl}
││││ │││││││ for i1_0, i10
└│└└↱└││││││ acc_i10 = acc_i10 + cnst[i10, i1_0]*(cnst[i10, 2]*w_0_global[ltg_1[iel_inner + iel_outer*32, 2]] + cnst[i10, 0]*w_0_global[ltg_1[iel_inner + iel_outer*32, 0]] + cnst[i10, 1]*w_0_global[ltg_1[iel_inner + iel_outer*32, 1]])*cnst_0[i10]*abs((cse + coords_global[ltg_0[iel_inner + iel_outer*32, ibf_scat_0], 2 + (-1)*2*ibf_scat_0])*(cse_0 + coords_global[ltg_0[iel_inner + iel_outer*32, ibf_scat_0], 5 + (-1)*2*ibf_scat_0]) + (-1.0)*(cse + coords_global[ltg_0[iel_inner + iel_outer*32, ibf_scat_0], 4 + (-1)*2*ibf_scat_0])*(cse_0 + coords_global[ltg_0[iel_inner + iel_outer*32, ibf_scat_0], 3 + (-1)*2*ibf_scat_0])) {id=sum_tmp_i10_update}
│ │ ││││││ end i10
↱│ └ └│└│└│ sum_tmp_0[i1_0] = acc_i10 {id=sum_tmp_0, tags=formknl}
││ │ │ │ end i1_0
││ │ │ │ for ibf_gather_0
└└ └ └ └ A0_global[ltg_1[iel_inner + iel_outer*32, ibf_gather_0]] = A0_global[ltg_1[iel_inner + iel_outer*32, ibf_gather_0]] + sum_tmp_0[ibf_gather_0] {id=insn, tags=formknl, atomic=update[A0_global]seq_cst/auto}
end iel_inner, iel_outer, ibf_gather_0
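The following sequential NumPy sketch reproduces what the element loop computes and can be used to check results on small inputs. It makes two assumptions:

- cnst[q, i] tabulates basis function i at quadrature point q, and cnst_0 holds the quadrature weights. The dump shows only their shapes.
- Coordinates are read through the flat index 2*vertex + dim, which appears to be the intent of the coords_global subscripts in insn_0, insn_0_0 and sum_tmp_i10_update.

The function name assemble_reference and the two-triangle mesh are hypothetical.

```python
import numpy as np


def assemble_reference(coords, ltg_0, ltg_1, w, cnst, cnst_0, a_size):
    """Sequential sketch of the kernel (assumed semantics, see lead-in)."""
    A0 = np.zeros(a_size, dtype=np.float64)              # init_0
    for e in range(ltg_0.shape[0]):
        # Vertex coordinates of element e, flattened as [x0, y0, x1, y1, x2, y2].
        c = coords[ltg_0[e]].reshape(-1)
        cse, cse_0 = -c[0], -c[1]                          # insn_0, insn_0_0
        # Jacobian determinant, mirroring the abs(...) term in sum_tmp_i10_update.
        det = (cse + c[2]) * (cse_0 + c[5]) - (cse + c[4]) * (cse_0 + c[3])
        # Coefficient value at each quadrature point: sum_j cnst[q, j] * w[ltg_1[e, j]].
        w_q = cnst @ w[ltg_1[e]]
        # sum_tmp_0[i] = sum_q cnst[q, i] * w_q[q] * cnst_0[q] * |det|
        sum_tmp_0 = (cnst.T @ (w_q * cnst_0)) * abs(det)
        # np.add.at is unbuffered, so repeated indices accumulate; this stands in
        # for the atomic update that guards concurrent work-items on the device.
        np.add.at(A0, ltg_1[e], sum_tmp_0)             # insn
    return A0


# Unit square split into two triangles (hypothetical test mesh).
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
ltg = np.array([[0, 1, 2], [1, 3, 2]], dtype=np.int32)
w = np.ones(4)
cnst = np.eye(3)              # vertex rule: basis i is 1 at point i
cnst_0 = np.full(3, 1.0 / 6)  # vertex-rule weights on the reference triangle
A0 = assemble_reference(coords, ltg, ltg, w, cnst, cnst_0, 4)
print(A0, A0.sum())
```

With w identically 1, the entries of A0 should sum to the area of the mesh, which gives a quick sanity check on the determinant and weights.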
---------------------------------------------------------------------------