See Community post: https://redcap.vumc.org/community/post.php?id=259667&comment=259668
I was working on a dynamic sql field, but ran into a snag where one of the data elements I wanted to include in the label was a dropdown. So, instead of piping the label of the field into the visible part of the dynamic sql, I put the numerical key.
As a challenge, I spent the next few hours trying to hack a sql solution for replacing the value with the label and got something that seems to work. The following is the story:
So, the intended sql label was to look something like this:
[[record_id]] [project_title] - [name] - [basic_science_department]
or
[1232] Ride the Lightning - Jane Doe - Internal Medicine
Because of a bug? in how REDCap saves dynamic sql, I had to replace all instances of \\n
with concat("\\","\n")
so that on save and reload, the dynamic sql didn't get mangled. That explains the now, 'somewhat odd', syntax in some of these queries. Note that I didn't change the representation in the procedure as this would never go directly into REDCap anyhow.
So, I started with a basic multi-field dynamic sql query:
select
concat(
"[",rd1.record,"] ",
rd1.value,
" - ", rd2.value,
" - ", rd3.value
)
from [data-table:30] rd1
left join [data-table:30] rd2 on rd1.project_id=rd2.project_id and rd1.event_id=rd2.event_id and rd1.record = rd2.record and rd2.field_name='name'
left join [data-table:30] rd3 on rd1.project_id=rd3.project_id and rd1.event_id=rd3.event_id and rd1.record = rd3.record and rd3.field_name='basic_science_department'
left join redcap_metadata rm on rm.project_id = rd1.project_id and rm.field_name = rd3.field_name
where
rd1.project_id = 30
and rd1.field_name = 'project_title'
ORDER by record desc
As you might have guessed, there is a problem here because rd3's field_name, basic_science_department, is dropdown field! So, this is what we get:
[123] Title - Person - 15
[124] Title2 - Person2 - 21
...
So, how do you take a number like '21' and convert it into 'Bioengineering'?
Well, basic_science_department is really a number and the enum value for it comes from the metadata table. Where the enum is a long string with a rather unusual delimiter.
select element_enum from redcap_metadata where project_id = 30 and field_name = 'basic_science_department';
Reveals:
1, Anesthesia\n20, Biochemistry\n21, Bioengineering\n22, Biomedical Data Sciences\n2, Cancer Institute\n3, Cardiothoracic Surgery\n23, Chemical and Systems Biology\n4, Comparative Medicine\n5, Dermatology\n24, Developmental Biology\n31, Emergency Medicine\n32, Epidemiology and Population Health\n25, Genetics\n19, Health Research and Policy\n6, Medicine\n26, Microbiology and Immunology\n27, Molecular & Cellular Physiology\n28, Neurobiology\n7, Neurology\n8, Neurosurgery\n9, Obstetrics and Gynecology\n10, Ophthalmology\n11, Orthopaedic Surgery\n12, Otolaryngology\n13, Pathology\n14, Pediatrics\n15, Psychiatry & Behavioral Sciences\n16, Radiation Oncology\n30, Radiology\n29, Structural Biology\n17, Surgery\n18, Urology
How do we convert they enum key into the label of Bioengineering? The idea is to first find out where in the long string the enum key exists (e.g. the characters "21, "). But, like any hack, it is complicated and filled with gotchas.
Let's instead consider 7, Neurology. If I search for the first instance of "7, " I will actually end up matching the tail of "27, Molecular & Cellular Physiology". Ugh! So, it seems better if I search for the leading newline character too, e.g. "\n7, ". But, what about Anesthesia, the first one in the list? It doesn't start with "\n".
The end result is a clause like this to find the 'start index' given the key:
if (
instr(rm.element_enum,concat("\\","n",rd3.value,", ")) = 0, --This checks to see if we are on the first key e.g. (1, Anesthesia)
instr(rm.element_enum,concat(rd3.value,", ")), -- yep, so don't include the "\ n"
instr(rm.element_enum,concat("\\","n",rd3.value,", ")) + 2. -- otherwise, all other keys should be in the format of "\ n123, "
) + char_length(rd3.value) + 2, -- add key length (e.g. 1 for '5' and 2 for '21') plus 2 for the comma,space.
I'm going to call this the 'START_INDEX'. Unfortunately, we are going to have to repeat this whole thing every time we need to refer to the start position.
Next, we need to find the 'end index'. We find this by starting at the 'START_INDEX' and looking for the next delimiter, "\n", but... what if you are on the last one (i.e. Urology). It doesn't have another "\ n". Hence, another edge case:
-- LENGTH_OF_LABEL = NEXT_INDEX - START_INDEX
-- LENGTH_OF_LABEL =
-- deal with the edge case where the label is at the end of the enum and doesn't have a \\n after it
if (
locate(concat("\\","n"), rm.element_enum, START_INDEX <> 0,
-- (TRUE) we found the next \ \ n so lets get in the index for it.
locate(concat("\\","n"), rm.element_enum, START_INDEX),
-- (FALSE) the next \ \ n wasnt found so we must be at the end of the enum
char_length(rm.element_enum)+1
) - START_INDEX
IF WE FILL IN THE FULL 'START_INDEX' EXPRESSION FROM ABOVE, THE FINAL SQL FOR FINDING THE LENGTH_OF_LABEL IS:
if (
locate(concat("\\","n"), rm.element_enum,
-- starting search position
if (
instr(rm.element_enum,concat("\\","n",rd3.value,", ")) = 0,
instr(rm.element_enum,concat(rd3.value,", ")),
instr(rm.element_enum,concat("\\","n",rd3.value,", ")) + 2
) + char_length(rd3.value) + 2
) <> 0,
-- we found the next \ \ n so lets use it
locate(concat("\\","n"), rm.element_enum,
-- starting search position
if (
instr(rm.element_enum,concat("\\","n",rd3.value,", ")) = 0,
instr(rm.element_enum,concat(rd3.value,", ")),
instr(rm.element_enum,concat("\\","n",rd3.value,", ")) + 2
) + char_length(rd3.value) + 2
),
-- wasnt found so use the end of the enum
char_length(rm.element_enum)+1
) -
-- now we need to subtract the start index
(if (
instr(rm.element_enum,concat("\\","n",rd3.value,", ")) = 0,
instr(rm.element_enum,concat(rd3.value,", ")),
instr(rm.element_enum,concat("\\","n",rd3.value,", ")) + 2
) + char_length(rd3.value) + 2
)
Now that we have the start and end positions, we can use a simple substring function to get the actual text label.
substring(rm.element_enum, START_INDEX, LENGTH_OF_LABEL)
When you put it all together, it is a bit long - the whole query for my sql field came out as:
select
rd1.record,
concat_ws("","[",rd1.record,"] ",
concat(
" ",
IF(
LENGTH(rd1.value) > 42,
CONCAT(SUBSTRING(rd1.value, 1, 40), '...'),
rd1.value
)
),
concat(" : ",rd2.value),
concat(" : ",
substring(rm.element_enum,
-- ARG1: starting position
if (
instr(rm.element_enum,concat("\\","n",rd3.value,", ")) = 0,
instr(rm.element_enum,concat(rd3.value,", ")),
instr(rm.element_enum,concat("\\","n",rd3.value,", ")) + 2
) + char_length(rd3.value) + 2,
-- ARG2: LENGTH
-- Determine length subtracting end from start
-- It is complicated as both start and end have edge cases
-- determine if there is another \ \ n after the key
if (locate(concat("\\","n"), rm.element_enum,
-- starting search position
if (
instr(rm.element_enum,concat("\\","n",rd3.value,", ")) = 0,
instr(rm.element_enum,concat(rd3.value,", ")),
instr(rm.element_enum,concat("\\","n",rd3.value,", ")) + 2
) + char_length(rd3.value) + 2
) <> 0,
-- we found the next \ \ n so lets use it
locate(concat("\\","n"), rm.element_enum,
-- starting search position
if (
instr(rm.element_enum,concat("\\","n",rd3.value,", ")) = 0,
instr(rm.element_enum,concat(rd3.value,", ")),
instr(rm.element_enum,concat("\\","n",rd3.value,", ")) + 2
) + char_length(rd3.value) + 2
),
-- wasnt found so use the end of the enum
char_length(rm.element_enum)+1
) -
-- now we need to subtract the start index
(if (instr(rm.element_enum,concat("\\","n",rd3.value,", ")) = 0,
instr(rm.element_enum,concat(rd3.value,", ")),
instr(rm.element_enum,concat("\\","n",rd3.value,", ")) + 2
) + char_length(rd3.value) + 2
)
)
)
)
from redcap_data3 rd1
left join redcap_data3 rd2 on
rd1.project_id=rd2.project_id and
rd1.event_id=rd2.event_id and
rd1.record = rd2.record and
rd2.field_name='name'
left join redcap_data3 rd3 on
rd1.project_id=rd3.project_id and
rd1.event_id=rd3.event_id and
rd1.record = rd3.record and
rd3.field_name='basic_science_department'
left join redcap_metadata rm on
rm.project_id = rd1.project_id and
rm.field_name = rd3.field_name
where
rd1.project_id = 30
and rd1.field_name = 'project_title'
ORDER by record desc
NOTE1: So, you might ask, "What's up with the concat
and concat_ws
? If I only used concat and if we were missing the 'name' or 'basic_science_department' fields in the database, I could end up with nothing as a single null value will make the entire second column null.
For example,
concat("[", rd1.record, "] ", rd1.value, " - ", rd2.value, " - ", rd3.value)
would return empty if ANY of values were null.
concat_ws("", "[", rd1.record, "] ", rd1.value, " - ", rd2.value, " - ", rd3.value)
would still return non-null portions if any of the values were null, but if r3 was null, you would still get a final -, as in: "[123] Title Here - Jane Doe -"
by mixing concat w/ concat_ws you can make it so that the separator will also disappear if the value isn't there. So, in the same example above, you would end up with: "[123] Title Here - Jane Doe" in the event the department (rd3) value were Null.
NOTE2: Why left join and not just join?
In the event that some of the data fields hadn't been written to REDCap data, doing a strict join would return a null and the row would be omitted. To ensure you get a dropdown value even if some of the join fields are missing, you should use a left-join. Using the left-join on the redcap_metadata probably wasn't necessary, but I left it there.
PS - I think it took me as much time to write this post as to do the query :-) I wish the code editor worked better...
POST 2 - Let's go one step further... Can we create a MYSQL function to make this more managable? Sure!
Let's open up a new database query window and create the following function:
DROP FUNCTION IF EXISTS GET_REDCAP_ENUM_LABEL;
DELIMITER $$
CREATE FUNCTION GET_REDCAP_ENUM_LABEL(ENUM_STRING TEXT, ENUM_KEY VARCHAR(255))
RETURNS TEXT
DETERMINISTIC
BEGIN
RETURN
substring(ENUM_STRING,
-- ARG1: starting position
if (
instr(ENUM_STRING,concat("\\n",ENUM_KEY,", ")) = 0,
instr(ENUM_STRING,concat(ENUM_KEY,", ")),
instr(ENUM_STRING,concat("\\n",ENUM_KEY,", ")) 2
) char_length(ENUM_KEY) 2,
-- ARG2: LENGTH
-- Determine length subtracting end from start
-- It is complicated as both start and end have edge cases
-- determine if there is another \\n after the key
if (locate("\\n", ENUM_STRING,
-- starting search position
if (
instr(ENUM_STRING,concat("\\n",ENUM_KEY,", ")) = 0,
instr(ENUM_STRING,concat(ENUM_KEY,", ")),
instr(ENUM_STRING,concat("\\n",ENUM_KEY,", ")) 2
) char_length(ENUM_KEY) 2
) <> 0,
-- we found the next \\n so lets use it
locate("\\n", ENUM_STRING,
-- starting search position
if (
instr(ENUM_STRING,concat("\\n",ENUM_KEY,", ")) = 0,
instr(ENUM_STRING,concat(ENUM_KEY,", ")),
instr(ENUM_STRING,concat("\\n",ENUM_KEY,", ")) 2
) char_length(ENUM_KEY) 2
),
-- wasnt found so use the end of the enum
char_length(ENUM_STRING) 1
) -
-- now we need to subtract the start index
(if (instr(ENUM_STRING,concat("\\n",ENUM_KEY,", ")) = 0,
instr(ENUM_STRING,concat(ENUM_KEY,", ")),
instr(ENUM_STRING,concat("\\n",ENUM_KEY,", ")) 2
) char_length(ENUM_KEY) 2
)
);
END$$
DELIMITER ;
Now, you can replace the long query above with something like this:
select
rd1.record,
concat_ws("","[",rd1.record,"] ",
concat(
" ",
IF(
LENGTH(rd1.value) > 42,
CONCAT(SUBSTRING(rd1.value, 1, 40), '...'),
rd1.value
)
),
concat(" : ",rd2.value),
concat(" : ",GET_REDCAP_ENUM_LABEL(rm.element_enum, rd3.value))
)
from redcap_data3 rd1
left join redcap_data3 rd2 on
rd1.project_id=rd2.project_id and
rd1.event_id=rd2.event_id and
rd1.record = rd2.record and rd2.field_name='name'
left join redcap_data3 rd3 on
rd1.project_id=rd3.project_id and
rd1.event_id=rd3.event_id and
rd1.record = rd3.record and
rd3.field_name='basic_science_department'
left join redcap_metadata rm on
rm.project_id = rd1.project_id and
rm.field_name = rd3.field_name
where
rd1.project_id = 30
and rd1.field_name = 'project_title'
ORDER by record desc
That's a lot better :-)
Note the update where
and