This page lists all the problems with the data that were encountered during the data conversion process, and how the issue was resolved.

The problems are numbered, in the order in which they were encountered during the conversion.

Given a choice, earlier problems should be solved before later problems.
This allows the later steps in the conversion to receive "correct" data, which help eliminate spurious problems, and aids the discovery of problems hidden by bad data.

Unsolved problems are marked with an <code>*</code>.

== (#1) FOLLOW_MAP_TIME duplicate keys ==

=== Problem ===
The data dump says that the <code>FOLLOW_MAP_TIME</code> table has a primary key consisting of, in order, the columns: <code>FMT_FOL_date</code>, <code>FMT_FOL_B_focal_AnimID</code>, <code>FMT_time</code>.
But these columns contain duplicate values.

=== Bad Data ===
The duplicate values can be listed (from the <code>raw</code> schema) with:

'''This query no longer reports results because the data was changed in the original MS Access data.'''

<pre>SELECT *
FROM "FOLLOW_MAP_TIME"
JOIN (SELECT "FMT_FOL_date" AS the_date
, "FMT_FOL_B_focal_AnimID" AS the_animid
, "FMT_time" AS the_time
FROM "FOLLOW_MAP_TIME"
GROUP BY "FMT_FOL_date", "FMT_FOL_B_focal_AnimID", "FMT_time"
HAVING count(*) > 1
) AS fmt
ON ("FOLLOW_MAP_TIME"."FMT_FOL_date" = fmt.the_date
AND "FOLLOW_MAP_TIME"."FMT_FOL_B_focal_AnimID" = fmt.the_animid
AND "FOLLOW_MAP_TIME"."FMT_time" = fmt.the_time);</pre>

=== Solution ===

Fixed in MS Access 11/17/23 by ICG. Checked all against Tikis.
When run in SokweDB, table does not exist

== (#2) SUBADULT_ARRIVALS_LOG has a textual SA_first_tiki_date column ==

=== Problem ===

The SUBADULT_ARRIVALS_LOG table has a column that is supposed to contain a date, but instead contains the string "TEXTY".

Likely, the entire row is bad. The row contains:

<pre>
SA_B_AnimID | SA_first_tiki_date | SA_notes
-------------+--------------------+----------
TEXTY | TEXTY | TEXTY</pre>

=== Bad Data ===

In the <code>raw</code> schema run:

'''This query no longer reports results because the data was changed in the original MS Access data.'''

<pre>
select * from "SUBADULT_ARRIVALS_LOG" where "SA_first_tiki_date" = 'TEXTY';</pre>

=== Solution ===

Fixed in MS Access, 11/16/2023, ICG.
Deleted row

== (#3) BRECORD_NOTES contains rows where BREC_FOL_date has values that are not just a date ==

=== Problem ===

There are 4 rows in BRECORD_NOTES where the BREC_FOL_date column, supposedly a date, contains time values that are not '00:00:00'.

=== Bad Data ===

In the <code>raw</code> schema run:

'''This query no longer reports results because the data was changed in the original MS Access data.'''

<pre>
select * from "BRECORD_NOTES" where "BREC_FOL_date"::TIME <> '00:00:00';</pre>

=== Solution ===

Fixed in MS Access, 11/16/2023, ICG
Deleted time stamp

== (#4) MATING_EVENT contains rows where M_FOL_date has values that are not just a date ==

=== Problem ===

The MATING_EVENT table contains 1 row where the time portion of M_FOL_date is not '00:00:00'.

=== Bad Data ===

In the <code>raw</code> schema run:

'''This query no longer reports results because the data was changed in the original MS Access data.'''

<pre>
select * from "MATING_EVENT" where "M_FOL_date"::TIME WITHOUT TIME ZONE <> '00:00:00';</pre>

=== Solution ===

Fixed in MS Access, 11/16/2023, ICG
Deleted time stamp

== (#5) BIOGRAPHY.DepartdateError data discarded ==

=== Problem ===

The BIOGRAPHY.DepartdateError column contains data that was, after discussion, determined to be unusable.

=== Bad Data ===

In the <code>raw</code> schema run:

'''This query no longer reports results because the data was changed in the original MS Access data.'''

<pre>
select * from raw."BIOGRAPHY" where "B_DepartdateError" <> 0;
</pre>

=== Solution ===
Do not convert the data. There is no corresponding column in the new db design.

== (#6) BIOGRAPHY.B_AnimID_num column contains the empty string ==

=== Problem ===
The BIOGRAPHY.B_AnimID_num column contains the empty string, instead of NULL.

=== Bad Data ===

In the <code>raw</code> schema run:

'''This query no longer reports results because the data was changed in the original MS Access data.'''

<pre>
select * from raw."BIOGRAPHY" where "B_AnimID_num" = '';
</pre>

=== Solution ===

Changed 11 empty string values to NULL in MS Access. ICG 12/6/2023

== (#7) BIOGRAPHY.B_AnimID_num column is textual ==

=== Problem ===
The BIOGRAPHY.B_AnimID_num column contains has a data type of TEXT.
The data values all begin with "CH".

=== Bad Data ===

In the <code>raw</code> schema run:

'''This query no longer reports results because the data was changed in the original MS Access data.'''

<pre>
select "B_AnimID", "B_AnimID_num"
from raw."BIOGRAPHY"
where "B_AnimID_num" <> ''
AND "B_AnimID_num" IS DISTINCT FROM NULL
order by "B_AnimID";
</pre>

=== Solution ===

Removed the "CH" prefix and made the column an integer in MS Access. ICG 12/6/2023

== (#8) BIOGRAPHY.DadID_publication_info column contains NULL values ==

No longer a problem as NULLs are now allowed.

=== (Not) Problem ===
The BIOGRAPHY.DadID_publication_info column has a data type that allows <code>NULL</code> values.
SokweDB wants only text

=== Bad Data ===

In the <code>raw</code> schema run:

'''This query no longer reports results because the data was changed in the original MS Access data.'''

<pre>
select *
from raw."BIOGRAPHY"
where "B_DadID" IS NULL
order by "B_AnimID";
</pre>

=== Solution ===

Changed the <code>NULL</code> to the empty string ("") in MS Access. ICG 12/6/2023

== (#9) BIOGRAPHY.DadID column contains non-AnimID values ==

=== Problem ===
The BIOGRAPHY.DadID column has a non-AnimID values.

=== Bad Data ===

In the <code>raw</code> schema run:
<pre>
select "B_AnimID", "B_DadID"
from raw."BIOGRAPHY"
where "B_DadID" is not null
and "B_DadID" <> ''
and not exists (select 1
from raw."BIOGRAPHY" as search
where search."B_AnimID" = raw."BIOGRAPHY"."B_DadID");
</pre>

=== Solution ===

Create the DadIDPrelim column in a BIOGRAPHY_DATA table, and make a BIOGRAPHY view that combines Dad_ID and DadIDPrelim into DadID -- adding the '_prelim' suffix as expected.

== (#10) BRECORD_NOTES contains rows where BREC_time has values that are not just a time ==

=== Problem ===

There are 56 rows in BRECORD_NOTES where the BREC_time column, supposedly a time, contains date values that are not '1899-12-30'.

=== Bad Data ===

In the <code>raw</code> schema run:
<pre>
select * from raw."BRECORD_NOTES" where "BREC_time"::DATE <> '1899-12-30';</pre>

=== Solution ===
The data is fixed in the conversion process.

== (#11) KAZ has b_dadid_publication_info, but b_dadid is NULL ==

=== Problem ===

<code>KAZ</code> has dad id publication info of 'Rudicell et al. 2010', but a <code>NULL</code> dadid.

=== Bad Data ===

In the <code>clean</code> schema run:
<pre>
select *
from clean.biography
where b_dadid is NULL
and (b_dadid_publication_info <> ''
or b_dadid_publication_info is null);</pre>

=== Solution ===

<code>KAZ</code> has 2 potential dads. Introduce a DadIDStatus column, to replace the DadIDPrelim column and have a code that describes what's going on with <code>KAZ</code>.

Fixed in the MS Access data; <code>KAZ</code> was assigned the <code>UNK</code> individual as the dad. Problem will be marked resolved with the upload of the next MS Access database dump.

== (#12) 9 BIOGRAPHY rows have BirthComm values that are not COMM_IDS.CommID values ==

=== Problem ===

There are 9 BIOGRAPHY rows with BirthComm values that are not valid COMM_IDS, their communities do not exist.

=== Bad Data ===

In the <code>clean</code> schema run:
<pre>
select *
from easy.biography
where b_birthgroup is not NULL
and not exists (select 1
from easy.community_lookup
where community_lookup.cl_community_id
= biography.b_birthgroup);</pre>

=== Solution ===
Added KL and KL_KK to CommIds. KL = Kalande, KL_KK= Kasekela/kalande
codes are created during conversion

Fixed in commit 5d85b83bfe1a.

== * (#13) COMM_MEMBS rows place individuals in a community, that is not their birth community, before their BIOGRAPHY.EntryDate ==

=== Problem ===

13 COMM_MEMBS rows place individuals into a community, that is not their birth community, before BIOGRAPHY says they
entered the community.

=== Bad Data ===

In the <code>clean</code> schema run:
<pre>
select b.b_animid, b.b_birthgroup, b.b_entrydate, cm.cm_start_date
from clean.community_membership as cm
join clean.biography as b
on (b.b_animid = cm.cm_b_animid)
where b.b_birthgroup is distinct from cm.cm_cl_community_id
and cm.cm_start_date < b.b_entrydate
order by b.b_animid, cm.cm_start_date;
</pre>

=== Solution ===

== * (#14) COMM_MEMBS rows place individuals in a community after their BIOGRAPHY.EndDate ==

=== Problem ===

3 COMM_MEMBS rows place individuals into a community after BIOGRAPHY says they
left the community.

=== Bad Data ===

In the <code>clean</code> schema run:
<pre>
select b.b_animid, cm.cm_end_date
from clean.community_membership as cm
join clean.biography as b
on (b.b_animid = cm.cm_b_animid)
where cm.cm_end_date > b.b_departdate
order by b.b_animid, cm.cm_end_date;
</pre>

=== Solution ===
ICG EVL fixed BH, HAI, TZB2 in Access. Need to talk to Karl about MG, RO, WD. Treat like KL chimps in Biography?

== (#15) TT is placed in a community twice on the same day ==

=== Problem ===

An individual may not be in more than one community (or even twice in the same community) on any given day.

=== Bad Data ===

In the <code>clean</code> schema run:
<pre>
select first.cm_b_animid as anim_id
, first.cm_start_date as first_start_date
, first.cm_end_date as first_end_date
, first.cm_cl_community_id as first_community_id
, first.cm_start_source as first_start_source
, first.cm_end_source as first_end_source
, second.cm_start_date as second_start_date
, second.cm_end_date as second_end_date
, second.cm_cl_community_id as second_community_id
, second.cm_start_source as second_start_source
, second.cm_end_source as second_end_source
from clean.community_membership as first
join clean.community_membership as second
on (first.cm_b_animid = second.cm_b_animid
and first.cm_start_date < second.cm_start_date)
where first.cm_end_date >= second.cm_start_date;
</pre>

=== Solution ===
Fixed in MS Access 2/1/2024
Changed end date of KK_P1 membership to 8/14/2022

ICG

== (#16) There are 31 rows in COMM_MEMB_LOG where MadeBy is NULL ==

=== Problem ===

There are 31 rows in COMM_MEMB_LOG where MadeBy is NULL, and the column does not allow NULL values.

=== Bad Data ===

In the <code>clean</code> schema run:
<pre>
select * from clean.community_membership_update_log where made_by is null;
</pre>

=== Solution ===

See problem #18. Fixed in commit 7a48fb2c4aad3b5bd4

== (#17) There are 64 rows in COMM_MEMB_LOG where the chimp_id is not on BIOGRAPHY ==

=== Problem ===

There are 64 rows in COMM_MEMB_LOG where the chimp_id is not a BIOGRAPHY.AnimID.

=== Bad Data ===

In the <code>clean</code> schema run:
<pre>
select date_of_update, chimp_id
from clean.community_membership_update_log as log
where not exists (select 1
from clean.biography
where biography.b_animid = log.chimp_id);
</pre>

=== Solution ===
Make this a "soft" error. Fixed in commit 4c9626b304.

== (#18) There are 31 rows in COMM_MEMB_LOG where the MadeBy is NULL ==

=== Problem ===

There are 31 rows in COMM_MEMB_LOG where MadeBy is NULL, and the column does not allow
NULLs.

=== Bad Data ===

In the <code>clean</code> schema run:
<pre>
select * from clean.community_membership_update_log where made_by is null;
</pre>

=== Solution ===

Create an unknown person (UNK), and when the person is NULL, use the unkonwn person.

This allows us to make the unknown person "inactive", preventing them from being used
in newly entered data. The alternative, allowing NULL MadeBy values, would allow
new "bad data", and NULL values make querying harder.

Fixed in commit 7a48fb2c4aad3b5b.

== (#19) There are 18 rows in BIOGRAPHY_LOG where the MadeBy is NULL ==

=== Problem ===

There are 18 rows in BIOGRAPHY_LOG where MadeBy is NULL, and the column does not allow
NULLs.

=== Bad Data ===

In the <code>clean</code> schema run:
<pre>
select * from clean.biography_update_log where made_by is null;
</pre>

=== Solution ===

Create an unknown person (UNK), and when the person is NULL, use the unkonwn person.
(See Problem #18)== * (#19) There are 18 rows in BIOGRAPHY_LOG where the MadeBy is NULL ==

Fixed in commit 8a528cc7d1a9fb.

== (#20) There are 3 rows in BIOGRAPHY_LOG where the Rationale is NULL ==

=== Problem ===

There are 3 rows in BIOGRAPHY_LOG where Rationale is NULL. The column does not allow
NULLs; normally the conversion program would convert NULL to the empty string.
But the Rationale column requires there be (non-empty) textual data.

=== Bad Data ===

In the <code>clean</code> schema run:
<pre>
select * from clean.biography_update_log where update_rationale is null;
</pre>

=== Solution ===
ICG fixed in MS Access 2/8/2024. Updated rationale to 'routine update'.

== (#21) There are 6 rows in BIOGRAPHY_LOG where the update_escription is NULL ==

=== Problem ===

There are 6 rows in BIOGRAPHY_LOG where Description is NULL.
It appears that the description was put into the Rationale column, sometimes along with some rationale.

=== Bad Data ===

In the <code>clean</code> schema run:
<pre>
select * from clean.biography_update_log where update_description is null;
</pre>

=== Solution ===
ICG fixed in MS Access 2/8/2024, using information in update_rationale

== (#22) There are 85 rows in BIOGRAPHY_LOG where the chimp_id is not on BIOGRAPHY ==

=== Problem ===

There are 85 rows in BIOGRAPHY_LOG where the chimp_id is not a BIOGRAPHY.AnimID.

=== Bad Data ===

In the <code>clean</code> schema run:
<pre>
select *
from clean.biography_update_log as log
where not exists (select 1
from clean.biography
where biography.b_animid = log.chimp_id);
</pre>

=== Solution ===
Make this a "soft" error. Fixed in commit c96555f9f326.

== (#23) There are 65 rows in BIOGRAPHY_LOG where the MadeBy is not on PEOPLE ==

=== Problem ===

There are 65 rows in BIOGRAPHY_LOG where the MadeBy is not on PEOPLE, but there is MadeBy data.

These are "combination" ID errors, where multiple people are entered instead
of a single people code.

=== Bad Data ===

In the <code>clean</code> schema run:
<pre>
select *
from clean.biography_update_log as log
where not exists (select 1
from clean.people

</pre>
where people.person = log.made_by)
and made_by is not null;
</pre>

=== Solution ===
Fixed in MS Access by ICG 2/8/2024. Changed multiple IDs to the one who made the final change.

== (#24) Follow starts are not first arrivals ==

=== Problem ===
There are 6,397 cases where the old FOLLOW table's columns
"fol_time_begin" is not the first arrival time, fa_time_start,
on FOLLOW_ARRIVAL.

The The Gombe Chimpanzee Database Handbook says that this
should never happen, because the follow start is supposed
to be the first arrival. Is there additional data
in the old fol_time_begin, like the actual start
the observers started working? If not, which data is correct,
the one in FOLLOW or the one in FOLLOW_ARRIVAL?

=== Bad Data ===
<pre>
WITH spans AS
(SELECT fa_fol_date, fa_fol_b_focal_animid
, MIN(fa_time_start) AS min_start
FROM clean.follow_arrival
WHERE fa_b_arr_animid = fa_fol_b_focal_animid
GROUP BY fa_fol_date, fa_fol_b_focal_animid)
SELECT spans.*, follow.fol_time_begin
FROM spans
JOIN clean.follow
ON (follow.fol_date = spans.fa_fol_date
AND follow.fol_b_animid = spans.fa_fol_b_focal_animid)
WHERE NOT EXISTS
(SELECT 1
FROM clean.follow
WHERE follow.fol_date = spans.fa_fol_date
AND follow.fol_b_animid = spans.fa_fol_b_focal_animid
AND spans.min_start = follow.fol_time_begin)
ORDER BY spans.fa_fol_date, spans.fa_fol_b_focal_animid;
</pre>

=== Solution ===

This is data entry error, ignore the problem and ignore the data in the follow.fol_time_begin.

==(#25) Follow ends are not last arrivals ==

=== Problem ===
There are 6,570 cases where the old FOLLOW table's columns
"fol_time_end" is not the last arrival
time, fa_time_end, on FOLLOW_ARRIVAL.

The The Gombe Chimpanzee Database Handbook says that this
should never happen, because the follow end is supposed
to be the first arrival/last departure. Is there additional data
in the old fol_time_end, like the actual end-time
the observers started working? If not, which data is correct,
the one in FOLLOW or the one in FOLLOW_ARRIVAL?

=== Bad Data ===
<pre>
WITH spans AS
(SELECT fa_fol_date, fa_fol_b_focal_animid
, MAX(fa_time_end) AS max_end
FROM clean.follow_arrival
WHERE fa_b_arr_animid = fa_fol_b_focal_animid
GROUP BY fa_fol_date, fa_fol_b_focal_animid)
SELECT spans.*, follow.fol_time_end
FROM spans
JOIN clean.follow
ON (follow.fol_date = spans.fa_fol_date
AND follow.fol_b_animid = spans.fa_fol_b_focal_animid)
WHERE NOT EXISTS
(SELECT 1
FROM clean.follow
WHERE follow.fol_date = spans.fa_fol_date
AND follow.fol_b_animid = spans.fa_fol_b_focal_animid
AND spans.max_end = follow.fol_time_end)
ORDER BY spans.fa_fol_date, spans.fa_fol_b_focal_animid;
</pre>

=== Solution ===

This is data entry error, ignore the problem and ignore the data in the follow.fol_time_end.

== (#26) Mismatch of start-in-nest on follow and follow_arrival ==

=== Problem ===

There are <s>1,848</s> 3,369 follows where the follow arrival says the focal started in the nest but the follow does not, or vice-versa.

=== Bad Data ===

<pre>
WITH spans AS
(SELECT fa_fol_date, fa_fol_b_focal_animid
, MIN(fa_time_start) AS min_start
FROM easy.follow_arrival
WHERE fa_b_arr_animid = fa_fol_b_focal_animid
GROUP BY fa_fol_date, fa_fol_b_focal_animid)
SELECT spans.*, follow.fol_time_begin, follow.fol_flag_begin_in_nest
, first_arrivals.fa_type_of_nesting
FROM easy.follow
JOIN spans
ON (follow.fol_date = spans.fa_fol_date
AND follow.fol_b_animid = spans.fa_fol_b_focal_animid)
JOIN easy.follow_arrival
AS first_arrivals
ON (first_arrivals.fa_fol_date = follow.fol_date
AND first_arrivals.fa_fol_b_focal_animid = follow.fol_b_animid
AND first_arrivals.fa_time_start = spans.min_start)
WHERE (((first_arrivals.fa_type_of_nesting = 1
OR first_arrivals.fa_type_of_nesting = 3)
AND follow.fol_flag_begin_in_nest = 0)
OR (first_arrivals.fa_type_of_nesting <> 1
AND first_arrivals.fa_type_of_nesting <> 3
AND follow.fol_flag_begin_in_nest = 1))
ORDER BY spans.fa_fol_date, spans.fa_fol_b_focal_animid;
</pre>

=== Solution ===

Ignore. Default to data from Follow_Arrival

==== Remarks ====

When applying the solution, solving both problem #26 and #27,
and updating follow_arrival so that it is the one source of truth
used by the conversion,
there are 297 follow_arrival rows updated.

This implies that it is primarily the follow table
that does not mark the individual as being in a
nest when they should be in a nest. ("Should be",
according to the accepted solution.)

==(#27) Mismatch of end-in-nest on follow and follow_arrival ==

=== Problem ===

There are 3,804 follows where the follow arrival says the focal ended in the nest but the follow does not, or vice-versa.

=== Bad Data ===

<pre>
WITH spans AS
(SELECT fa_fol_date, fa_fol_b_focal_animid
, MAX(fa_time_end) AS max_end
FROM easy.follow_arrival
WHERE fa_b_arr_animid = fa_fol_b_focal_animid
GROUP BY fa_fol_date, fa_fol_b_focal_animid)
SELECT spans.*
, follow.fol_time_end, follow.fol_flag_end_in_nest
, last_arrivals.fa_type_of_nesting
FROM easy.follow
JOIN spans
ON (follow.fol_date = spans.fa_fol_date
AND follow.fol_b_animid = spans.fa_fol_b_focal_animid)
JOIN easy.follow_arrival
AS last_arrivals
ON (last_arrivals.fa_fol_date = follow.fol_date
AND last_arrivals.fa_fol_b_focal_animid = follow.fol_b_animid
AND last_arrivals.fa_time_end = spans.max_end)
WHERE (((last_arrivals.fa_type_of_nesting = 2
OR last_arrivals.fa_type_of_nesting = 3)
AND follow.fol_flag_end_in_nest = 0)
OR (last_arrivals.fa_type_of_nesting <> 2
AND last_arrivals.fa_type_of_nesting <> 3
AND follow.fol_flag_end_in_nest = 1))
ORDER BY spans.fa_fol_date, spans.fa_fol_b_focal_animid;
</pre>

=== Solution ===

Ignore the problem. Default to Follow_Arrival

==== Remarks ====

See the remarks for [[Conversion Data Issues#(#27) Mismatch of end-in-nest on follow and follow_arrival|problem #26]].

== (#28) The FOLLOW.FOL_distance_traveled column is not converted ==

=== Problem ===

The FOLLOW.FOL_distance_traveled column has no corresponding column in the new
database design. The conversion process does not check that the value
of this column is consistent with the other data in the database from which it is computed.

=== Solution ===

This is a computed column and does not need to be converted. The desired value is computed in the design of the new database.

The assumption is that the "raw" data from which this value is computed in the MS Access database is correct.

== (#29) The FOLLOW.Brecord_notes column is not converted ==

=== Problem ===

The FOLLOW.Brecord_notes column has no corresponding column in the new
database design.

=== Solution ===

This column is used for administrative purposes and does not need to be in the new database design.

== (#30) Some follows have no community ==

=== Problem ===

There are 13 follows with no community.

=== Bad Data ===

<pre>
select *
from clean.follow
where fol_cl_community_id is null
order by fol_date, fol_b_animid;
</pre>

=== solution ===

Solved

== (#31) Some follows have an animid with trailing spaces ==

=== Problem ===

There are <s>9</s> 39 follows with animids that don't exist, because they have trailing spaces. These are comprised of <s>9</s> 23 distinct animids.

=== Bad Data ===

Cleaned up in <code>clean</code> schema, so query <code>easy</code> schema.

<pre>
select rtrim(fol_b_animid)
from easy.follow
where rtrim(fol_b_animid) <> fol_b_animid
order by fol_b_animid;
</pre>

=== Solution ===

Remove the trailing spaces in the conversion process.

== (#32) Some follows have an animid that is lower-case ==

=== Problem ===

There is 1 follows with an animids that don't exist, because it is lower-case.

=== Bad Data ===

Cleaned up in <code>clean</code> schema, so query <code>easy</code> schema.
<pre>
select *
from easy.follow
where upper(fol_b_animid) <> fol_b_animid
order by fol_b_animid;
</pre>

=== Solution ===

Convert the animid to upper-case in the conversion process.

==(#33) Some follows have an animid that does not exist, even after animid cleanup ==

=== Problem ===

There are 14 follows with animids that don't exist, even after cleanup that removes trailing spaces and forces upper-case. (Some of these may be due to prior errors.) These are comprised of 3 distinct animids.

=== Bad Data ===

<pre>
-- The 14 follows with bad animids
select *
from clean.follow
where not exists
(select 1
from clean.biography
where biography.b_animid = upper(rtrim(follow.fol_b_animid)))
order by fol_date, fol_b_animid;
</pre>

<pre>
-- The 3 animids involved
select distinct follow.fol_b_animid
from clean.follow
where not exists
(select 1
from clean.biography
where biography.b_animid = upper(rtrim(follow.fol_b_animid)))
order by follow.fol_b_animid;
</pre>

=== Solution ===
Solved

== (#34) There are duplicate animid, date combinations on FOLLOW ==

=== Problem ===
There are 2 sets of duplicate animid, date combinations on the FOLLOW table, for a total of 4 rows.

=== Bad Data ===

<pre>
-- The duplicate animid, date combinations
select follow.fol_b_animid, follow.fol_date, count(*)
from clean.follow
group by follow.fol_b_animid, follow.fol_date
having count(*) > 1
order by fol_b_animid, fol_date;
</pre>

<pre>
-- The duplicate rows
with dups as (
select follow.fol_b_animid, follow.fol_date
from clean.follow
group by follow.fol_b_animid, follow.fol_date
having count(*) > 1)
select *
from clean.follow
join dups on (follow.fol_b_animid = dups.fol_b_animid
and follow.fol_date = dups.fol_date)
order by follow.fol_b_animid, follow.fol_date;
</pre>
=== Solution ===
Solved

== (#35) There are follows done before a focal was under study ==

=== Problem ===
There are 5 follows that are done before their focal was under study, before the focal's EntryDate.

=== Bad Data ===

<pre>
select follow.*, biography.b_entrydate
from clean.follow
join clean.biography on (biography.b_animid = follow.fol_b_animid)
where biography.b_entrydate > follow.fol_date
order by follow.fol_date, follow.fol_b_animid;
</pre>

=== Solution ===

Solved

== (#36) FOLLOW_OBSERVERS.Period is not checked against follow start or stop times ==

=== Problem ===
The conversion process uses the clean.follow columns of fol_am_observer_1, fol_am_observer_2, fol_pm_observer_1, and fol_pm_observer_2 to populate FOLLOW_OBSERVERS.
(The *_observer_1 column going into FOLLOW_OBSERVERS.OBS_BRec and the *_observer_2 column going in OBS_Tiki.)

If both the *_observer_1 and the *_observer_2 columns are either NULL or the empty string (after space trimming), the no row is created for the respective time period.

There are no checks done to ensure that the time periods of the follow have any relation to the FOLLOW_OBSERVERS.Period value.

=== Solution ===

Ignore the problem. (If someone cares, add a query to the warning system.)

== (#37) Some follows have no recorded observers ==

=== Problem ===

There are 56 follows with no recorded observers, but the system requires there be a FOLLOW_OBSERVERS record related to the follow.

See problem #36 for a description of the follow observer conversion process.

=== Bad data ===

<pre>
select *
from clean.follow
where coalesce(btrim(fol_am_observer_1), '') = ''
and coalesce(btrim(fol_am_observer_2), '') = ''
and coalesce(btrim(fol_pm_observer_1), '') = ''
and coalesce(btrim(fol_pm_observer_2), '') = ''
order by fol_date, fol_b_animid;
</pre>

=== Solution ===

Make the "NONE" (no observer) person the observers.

Create a "UNK" (unknown) time period in PERIODS, and make that the time period.

== (#38) Some follows have only one observer, but the system wants two ==

=== Problem ===

There are follows with only one observer, but the system requires a FOLLOW_OBSERVERS record have a value in both OBS_BRec and OBS_Tiki.

See problem #36 for a description of the follow observer conversion process.

=== Solution ===

Create a "NONE" person, and make that person the observer when there's otherwise not a value.

== (#39) In follow, there are observers that are 2 people ==

=== Problem ===

In the follow table, in columns fol_am_observer_1, fol_am_observer_2, fol_pm_observer_1, fol_pm_observer_2, there are 30 rows (28, really, because of n/a) that appear to represent 2 people.

These look like names, separated by the "/" character.

=== Bad data ===
<pre>
WITH new_people AS (
SELECT BTRIM(follow.fol_am_observer_1) AS person
FROM clean.follow
WHERE follow.fol_am_observer_1 IS NOT NULL
GROUP BY BTRIM(follow.fol_am_observer_1)
UNION
SELECT BTRIM(follow.fol_am_observer_2) AS person
FROM clean.follow
WHERE follow.fol_am_observer_2 IS NOT NULL
GROUP BY BTRIM(follow.fol_am_observer_2)
UNION
SELECT BTRIM(follow.fol_pm_observer_1) AS person
FROM clean.follow
WHERE follow.fol_pm_observer_1 IS NOT NULL
GROUP BY BTRIM(follow.fol_pm_observer_1)
UNION
SELECT BTRIM(follow.fol_pm_observer_2) AS person
FROM clean.follow
WHERE follow.fol_pm_observer_2 IS NOT NULL
GROUP BY BTRIM(follow.fol_pm_observer_2)
)
, uniq_people AS (
SELECT new_people.person
FROM new_people
WHERE new_people.person <> ''
GROUP BY new_people.person
)
SELECT uniq_people.person
FROM uniq_people
JOIN uniq_people AS up
ON (LOWER(uniq_people.person) = LOWER(up.person))
WHERE uniq_people.person <> up.person
AND STRPOS(uniq_people.person, '/') <> 0
GROUP BY uniq_people.person
ORDER BY LOWER(uniq_people.person), uniq_people.person;
</pre>

=== Solution ===

Ignore the problem. Mark all people containing a "/" character as not active, so they can't be used in the future.

== (#40) In follow, there are observers that differ only by character case ==

=== Problem ===

In the follow table, in columns fol_am_observer_1, fol_am_observer_2, fol_pm_observer_1, fol_pm_observer_2, there are 511 names that differ only by character case. So, about half that in terms of unique names.

This is complicated to account for and exclude duplicates in the conversion process. So resolution of this is holding back additional conversion work.

=== Bad data ===

Note the use of the <code>easy</code> schema, instead of the
<code>clean</code> schema. This is because the observer data
in the <code>clean</code> schema has been case-normalized.

<pre>
WITH new_people AS (
SELECT BTRIM(follow.fol_am_observer_1) AS person
FROM easy.follow
WHERE follow.fol_am_observer_1 IS NOT NULL
GROUP BY BTRIM(follow.fol_am_observer_1)
UNION
SELECT BTRIM(follow.fol_am_observer_2) AS person
FROM easy.follow
WHERE follow.fol_am_observer_2 IS NOT NULL
GROUP BY BTRIM(follow.fol_am_observer_2)
UNION
SELECT BTRIM(follow.fol_pm_observer_1) AS person
FROM easy.follow
WHERE follow.fol_pm_observer_1 IS NOT NULL
GROUP BY BTRIM(follow.fol_pm_observer_1)
UNION
SELECT BTRIM(follow.fol_pm_observer_2) AS person
FROM easy.follow
WHERE follow.fol_pm_observer_2 IS NOT NULL
GROUP BY BTRIM(follow.fol_pm_observer_2)
)
, uniq_people AS (
SELECT new_people.person
FROM new_people
WHERE new_people.person <> ''
GROUP BY new_people.person
)
SELECT uniq_people.person
FROM uniq_people
JOIN uniq_people AS up
ON (LOWER(uniq_people.person) = LOWER(up.person))
WHERE uniq_people.person <> up.person
GROUP BY uniq_people.person
ORDER BY LOWER(uniq_people.person), uniq_people.person;
</pre>

=== Solution ===

Use the mixed case code when such exists. This involves changing the observer columns in the <code>follow</code> table. See [[The_Old_Database#The_observer_columns_of_the_follow_table|the notes]].

==(#41)There are follow arrivals with NULL nesting information ==

=== Problem ===

In the follow_arrival table, there are 11 rows with a NULL fa_type_of_nesting.

=== Bad data ===

<pre>
select * from clean.follow_arrival where fa_type_of_nesting IS NULL;
</pre>

=== Solution ===

Solved

== (#42) There are follow arrivals with NULL sexual cycle information ==

=== Problem ===

In the follow_arrival table, there are 222 rows with a NULL fa_type_of_cycle.

=== Bad data ===

<pre>
select * from clean.follow_arrival where fa_type_of_cycle is null;
</pre>

=== Solution ===

Make a <code>MISS</code> <code>CYCLE_STATES</code> value and use that for the arrivals with no cycle information.

== * (#43) There are follow arrivals with no related follow ==

=== Problem ===

In the follow_arrival table, there are 732 rows with a focal and a date that have no matching information on the follow table.

=== Bad data ===

<pre>
SELECT *
FROM clean.follow_arrival
WHERE NOT EXISTS
(SELECT 1
FROM clean.follow
WHERE follow.fol_b_animid = follow_arrival.fa_fol_b_focal_animid
AND follow.fol_date = follow_arrival.fa_fol_date)
ORDER BY follow_arrival.fa_fol_date
, follow_arrival.fa_fol_b_focal_animid
, follow_arrival.fa_b_arr_animid;
</pre>
=== Solution ===
Ignore for now. Eventually fix by hand.

== * (#44) There are follow arrivals where the arriving chimp arrives before being under study ==

=== Problem ===

In the follow_arrival table, there are <s>217</s> 49 rows where the fa_b_arr_animid, the arriving individual, has an entry date after the date of the follow.

=== Bad data ===

<pre>
select follow_arrival.*, biography.b_entrydate, biography.b_sex
from clean.follow_arrival
join clean.biography
on (biography.b_animid = follow_arrival.fa_b_arr_animid)
where biography.b_entrydate > follow_arrival.fa_fol_date
order by biography.b_animid, follow_arrival.fa_fol_date;
</pre>

== * (#45) The follow_arrival.fa_update column is not preserved ==

=== Problem ===

There is nowhere in the current design to store the values in the follow_arrival.fa_update column.

=== Solution ===

Add a temporal extension to Postgres to make the db into a temporal database to track change history and be able to see data as it existed at any point in time.

Will re-review this should a temporal extension be out of budget, etc.

== * (#46) There are follow_arrivals where non-females have a cycle code that is other than <code>n/a</code> ==

=== Problem ===

There are <s>169</s> 168 follow_arrival rows, for non-female arriving individuals, that have a sexual cycle code (fa_type_of_cycle) that is not <code>n/a</code>.

=== Bad data ===

<pre>
select follow_arrival.*, biography.b_sex
from clean.follow_arrival
join clean.biography
on (biography.b_animid = follow_arrival.fa_b_arr_animid)
where biography.b_sex <> 'F'
and follow_arrival.fa_type_of_cycle <> 'n/a'
order by follow_arrival.fa_fol_date
, follow_arrival.fa_b_arr_animid;
</pre>

To summarize by sex and cycle code:

<pre>
select biography.b_sex, follow_arrival.fa_type_of_cycle, count(*)
from clean.follow_arrival
join clean.biography
on (biography.b_animid = follow_arrival.fa_b_arr_animid)
where biography.b_sex <> 'F'
and follow_arrival.fa_type_of_cycle <> 'n/a'
group by biography.b_sex, follow_arrival.fa_type_of_cycle order by biography.b_sex
, follow_arrival.fa_type_of_cycle;
</pre>

=== Solution ===

Allow males to have a cycle code of <code>MISS</code>.

Change non-female's who have a cycle code of <code>0</code> to a cycle code of <code>n/a</code>.

== (#47) There are follow_arrivals where females that are too young have a cycle code of <code>U</code> ==

=== Problem ===

There are 16 follow_arrival rows, for female arriving individuals, that have a sexual cycle code (fa_type_of_cycle) of <code>U</code> but are less than 5 years of age.

Naturally, this number will change if the age limit is changed, but this is here as a placeholder.

Note: The test is against the birthdate, not the minimum possible birthdate.

=== Bad data ===

<pre>
select follow_arrival.*, biography.b_sex, biography.b_birthdate
from clean.follow_arrival
join clean.biography
on (biography.b_animid = follow_arrival.fa_b_arr_animid)
where biography.b_sex = 'F'
and follow_arrival.fa_type_of_cycle = 'U'
and biography.b_birthdate
> (follow_arrival.fa_fol_date
- '5 years'::INTERVAL
)
order by follow_arrival.fa_fol_date
, follow_arrival.fa_fol_b_focal_animid
, follow_arrival.fa_b_arr_animid;
</pre>

Note: The original query used the wrong inequality symbol.

=== Solution ===

Changing the limit from 6 years of age to 5 reduced the number of outstanding problems to 16.

The rest will have to be fixed in the data. Alternately, we can adjust the hard limit, and set a soft limit of 5 years in the warning system with a note to change the hard limit back once the errors are resolved.

Resolution: Fixed in the data.

== * (#48) There are follow arrivals where the arriving chimp arrives after finishing being under study ==

=== Problem ===

In the follow_arrival table, there are <s>239</s> 324 rows where the fa_b_arr_animid, the arriving individual, has a departure date before the date of the follow.

=== Bad data ===

<pre>
select follow_arrival.*, biography.b_departdate, biography.b_sex
from clean.follow_arrival
join clean.biography
on (biography.b_animid = follow_arrival.fa_b_arr_animid)
where biography.b_departdate < follow_arrival.fa_fol_date
order by biography.b_animid, follow_arrival.fa_fol_date;
</pre>

== * (#49) There are follow_arrivals where females that are too old have a cycle code of <code>U</code> ==

=== Problem ===

There are <s>127</s> 98 follow_arrival rows, for female arriving individuals, that have a sexual cycle code (fa_type_of_cycle) of <code>U</code> but are more than 14 years of age.
(Actually, because the endpoint takes up the whole 14th year, this means at least 15 years of age.)

Naturally, this number will change if the age limit is changed, but this is here as a placeholder.

Note: The test is against the birthdate, not the maximum possible birthdate.

=== Bad data ===

Note: Original query used the wrong inequality.

<pre>
select follow_arrival.*
, biography.b_sex
, biography.b_birthdate
from clean.follow_arrival
join clean.biography
on (biography.b_animid = follow_arrival.fa_b_arr_animid)
where biography.b_sex = 'F'
and follow_arrival.fa_type_of_cycle = 'U'
and biography.b_birthdate
<= (follow_arrival.fa_fol_date
- '1 year'::INTERVAL
- '14 years'::INTERVAL
)
order by follow_arrival.fa_fol_date
, follow_arrival.fa_fol_b_focal_animid
, follow_arrival.fa_b_arr_animid;
</pre>

=== Solution ===

The number of problem rows was reduced after changing the limit to 14 years from 9 years.

The remaining problems will have to be adjusted in the data. Alternately, we can adjust the hard limit, and set a soft limit of 14 years in the warning system with a note to change the hard limit back once the errors are resolved.

== (#50) There are follow_arrivals where females have a cycle code of <code>n/a</code> ==

=== Problem ===

There are 4,496 follow_arrival rows, for female arriving individuals, that have a sexual cycle code (fa_type_of_cycle) of <code>n/a</code>.

=== Bad data ===

<pre>
select follow_arrival.*
, biography.b_sex
, biography.b_birthdate
from clean.follow_arrival
join clean.biography
on (biography.b_animid = follow_arrival.fa_b_arr_animid)
where biography.b_sex = 'F'
and follow_arrival.fa_type_of_cycle = 'n/a'
order by follow_arrival.fa_fol_date
, follow_arrival.fa_fol_b_focal_animid
, follow_arrival.fa_b_arr_animid;
</pre>

=== Solution ===

Change the cycle code to <code>MISS</code>, for these rows. This is done in the <code>clean</code> schema.

This is an indication that future cleanup is required.

== (#51) There are follow_arrivals with invalid fa_data_source values ==

=== Problem ===

There are 546 follow_arrival rows that have fa_data_source values that are not one of:
<code>Tiki</code>
<code>Tiki_Mom</code>
<code>Tiki_ID</code>
<code>Brec</code>.

=== Bad data ===

Query the <code>easy</code> schema, because the data has been cleaned in the <code>clean</code> schema.
<pre>
select follow_arrival.*
from easy.follow_arrival
where follow_arrival.fa_data_source <> 'Tiki'
and follow_arrival.fa_data_source <> 'Tiki_Mom'
and follow_arrival.fa_data_source <> 'Tiki_ID'
and follow_arrival.fa_data_source <> 'Brec'
order by follow_arrival.fa_fol_date
, follow_arrival.fa_fol_b_focal_animid
, follow_arrival.fa_b_arr_animid;

-- Summarize with:
select follow_arrival.fa_data_source, count(*)
from easy.follow_arrival
where follow_arrival.fa_data_source <> 'Tiki'
and follow_arrival.fa_data_source <> 'Tiki_Mom'
and follow_arrival.fa_data_source <> 'Tiki_ID'
and follow_arrival.fa_data_source <> 'Brec'
group by follow_arrival.fa_data_source
order by follow_arrival.fa_data_source;
</pre>

=== Solution ===

Change <code>BREC</code> and <code>brec</code> to <code>Brec</code>.

Change <code>TikI</code> to <code>Tiki</code>.

Add the other codes:

fa_data_source count
Tiki_GM 202
Tiki_PM 165
Tiki_SS 22

Also: Tiki_Mom

== * (#52) There are follow_arrivals that are almost duplicates ==

=== Problem ===

This entry may end up being multiple problems.

There are follow_arrivals that are near duplicates.
When checking for duplicates on <code>fa_fol_date</code>, <code>fa_fol_b_focal_animid</code>, <code>fa_b_arr_animid</code>
and <code>fa_seq_num</code>, the rows are always unique.
But checking the combination of <code>fa_fol_date</code>, <code>fa_fol_b_focal_animid</code>, <code>fa_b_arr_animid</code>
and <code>fa_time_start</code>,
and <code>fa_time_end</code> yields 184 rows.
Leaving off <code>fa_time_end</code> and just checking the combination of <code>fa_fol_date</code>, <code>fa_fol_b_focal_animid</code>, <code>fa_b_arr_animid</code>
and <code>fa_time_start</code> yields 430 rows.

Why the duplicates?

This entry is a call for a definition of an <code>ARRIVALS</code> row, what does it mean to be a duplicate?

=== Bad data ===

<pre>
-- no duplicates when looking at just sequence number
select fa_fol_date
, fa_fol_b_focal_animid
, fa_b_arr_animid
, fa_seq_num
from clean.follow_arrival
group by fa_fol_date
, fa_fol_b_focal_animid
, fa_b_arr_animid
, fa_seq_num
having count(*) > 1;
</pre>

<pre>
-- Checking against start and end time
with dups as
(select fa_fol_date
, fa_fol_b_focal_animid
, fa_b_arr_animid
, fa_time_start
, fa_time_end
from clean.follow_arrival
group by fa_fol_date
, fa_fol_b_focal_animid
, fa_b_arr_animid
, fa_time_start
, fa_time_end
having count(*) > 1)
select *
from clean.follow_arrival
where exists
(select 1
from dups
where follow_arrival.fa_fol_date = dups.fa_fol_date
and follow_arrival.fa_fol_b_focal_animid
= dups.fa_fol_b_focal_animid
and follow_arrival.fa_b_arr_animid = dups.fa_b_arr_animid
and follow_arrival.fa_time_start = dups.fa_time_start
and follow_arrival.fa_time_end = dups.fa_time_end)
order by follow_arrival.fa_fol_date
, follow_arrival.fa_fol_b_focal_animid
, follow_arrival.fa_b_arr_animid
, follow_arrival.fa_time_start
, follow_arrival.fa_time_end
, follow_arrival.fa_seq_num;
</pre>

=== Solution ===

Where both start and end-times are duplicated, along with other data like cycle state, I can add additional observers or data sources. But I don't know what to do with "almost the same" data.

== * (#53) There are follow_arrivals where the arriving chimp does not exist ==

=== Problem ===

There are <s>688</s> 84 rows, having <s>3</s> 6 different <code>fa_b_arr_animid</code> values, where the <code>fa_b_arr_animid</code> value is not a <code>biography.b_animid</code> value.

=== Bad Data ===

<pre>
-- The rows
select *
from clean.follow_arrival
where not exists
(select 1
from clean.biography
where biography.b_animid = follow_arrival.fa_b_arr_animid)
order by follow_arrival.fa_b_arr_animid
, follow_arrival.fa_fol_date
, follow_arrival.fa_fol_b_focal_animid;

-- A summary of the bad animal ids
select follow_arrival.fa_b_arr_animid, count(*)
from clean.follow_arrival
where not exists
(select 1
from clean.biography
where biography.b_animid = follow_arrival.fa_b_arr_animid)
group by follow_arrival.fa_b_arr_animid
order by follow_arrival.fa_b_arr_animid;
</pre>

=== Solution ===

ICG fixed in MS Access 10/2025. AMA-->AME, OBE-->POR

== * (#54) There are follow_arrivals where females that are too young have a cycle state code that is not <code>0</code>, <code>U</code>, or <code>MISS</code> ==

=== Problem ===

There are <s>26</s> 42 follow_arrival rows, for female arriving individuals, that have a sexual cycle code indicating sexual swelling but are less than 8 years of age.

Note: The test is against the birthdate, not the minimum possible birthdate.

=== Bad data ===

<pre>
select follow_arrival.*, biography.b_sex, biography.b_birthdate
from clean.follow_arrival
join clean.biography
on (biography.b_animid = follow_arrival.fa_b_arr_animid)
where biography.b_sex = 'F'
and follow_arrival.fa_type_of_cycle <> '0'
and follow_arrival.fa_type_of_cycle <> 'U'
and follow_arrival.fa_type_of_cycle <> 'MISS'
and biography.b_birthdate
> (follow_arrival.fa_fol_date
- '8 years'::INTERVAL
)
and follow_arrival.fa_b_arr_animid <> 'MGF'
and follow_arrival.fa_b_arr_animid <> 'MGF2'
and follow_arrival.fa_b_arr_animid <> 'MGF3'
order by follow_arrival.fa_fol_date
, follow_arrival.fa_fol_b_focal_animid
, follow_arrival.fa_b_arr_animid;
</pre>

=== Solution ===

== * (#55) There are community_membership rows that place an individual in a community before birth ==

=== Problem ===

There is one individual who is placed in a community, once, before birth.

Note: The test is against the birthdate, not the minimum possible birthdate.

=== Bad data ===

<pre>
select b.b_animid, b.b_birthdate, cm.cm_start_date
from clean.community_membership as cm
join clean.biography as b
on (b.b_animid = cm.cm_b_animid)
where cm.cm_start_date < b.b_birthdate
order by b.b_animid, cm.cm_start_date;
</pre>

=== Bad solution ===

ICG fixed FN community starte date in MS Access - oct 2025

== (#56) There are follow_arrival focal animids with trailing spaces ==

=== Problem ===

There are 144 follow arrivals where the focal id has trailing spaces.

=== Bad data ===

<pre>
select * from follow_arrival where fa_fol_b_focal_animid <> rtrim(fa_fol_b_focal_animid);
</pre>

=== Solution ===

Remove the trailing spaces in table in the <code>clean</code> schema.

== * (#57) GROOM_BOUT duplicate keys ==

=== Problem ===
The data dump says that the <code>GROOM_BOUT</code> table has a primary key consisting of, in order, the columns: <code>GRM_FOL_date</code>, <code>GRM_FOL_B_focal_AnimId</code>, <code>GRM_time_begin</code>, <code>GRM_B_partner_AnimId</code>
But these columns contain duplicate values.

=== Bad Data ===
The duplicate values can be listed (from the <code>raw</code> schema) with:

<pre>SELECT *
FROM raw."GROOM_BOUT"
JOIN (SELECT "GRM_FOL_date" AS the_date
, "GRM_FOL_B_focal_AnimId" AS the_animid
, "GRM_time_begin" AS the_time
, "GRM_B_partner_AnimId" AS the_partner
FROM raw."GROOM_BOUT"
GROUP BY "GRM_FOL_date"
, "GRM_FOL_B_focal_AnimId"
, "GRM_time_begin"
, "GRM_B_partner_AnimId"
HAVING count(*) > 1
) AS gb
ON ("GROOM_BOUT"."GRM_FOL_date" = gb.the_date
AND "GROOM_BOUT"."GRM_FOL_B_focal_AnimId" = gb.the_animid
AND "GROOM_BOUT"."GRM_time_begin" = gb.the_time
AND "GROOM_BOUT"."GRM_B_partner_AnimId" = gb.the_partner);</pre>

== * (#58) OTHER_SPECIES duplicate keys ==

=== Problem ===
The data dump says that the <code>OTHER_SPECIES</code> table has a primary key consisting of, in order, the columns: <code>OS_FOL_date</code>, <code>OS_FOL_B_focal_AnimID</code>, <code>OS_time_begin</code>
But these columns contain duplicate values.

=== Bad Data ===
The duplicate values can be listed (from the <code>raw</code> schema) with:

<pre>SELECT *
FROM raw."OTHER_SPECIES"
JOIN (SELECT "OS_FOL_date" AS the_date
, "OS_FOL_B_focal_AnimId" AS the_animid
, "OS_time_begin" AS the_time
FROM raw."OTHER_SPECIES"
GROUP BY "OS_FOL_date"
, "OS_FOL_B_focal_AnimId"
, "OS_time_begin"
HAVING count(*) > 1
) AS os
ON ("OTHER_SPECIES"."OS_FOL_date" = os.the_date
AND "OTHER_SPECIES"."OS_FOL_B_focal_AnimId" = os.the_animid
AND "OTHER_SPECIES"."OS_time_begin" = os.the_time);</pre>

=== Note ===
The reason the MS Access database seems to allow this condition is likely due to difference in character case between column names and primary key designations.

== * (#59) Zero BIOGRAPHY.b_animid_num values ==

=== Problem ===
SokewDB requires that the animal ID number be greater than <code>0</code>, or <code>NULL</code>, but some (what seem to be rows for babys) have a <code>0</code> value.

=== Bad Data ===
The <code>0</code> values can be listed (from the <code>clean</code> schema) with:

<pre>select * from clean.biography where b_animid_num = 0;</pre>

=== Solution ===
Change the <code>0</code> values to <code>NULL</code>.

This is a brute-force, but adequate, solution because it does not validate anything concerning the rows affected.

== * (#60) Invalid biography_update_log.made_by values ==

=== Problem ===
There is a <code>biography_update_log.made_by</code> value (<code>SF/EVL</code>) that is not a person. (Not on the <code>PEOPLE</code> table.)

=== Bad Data ===
The invalid values can be listed (from the <code>clean</code> schema) with:

<pre>SELECT *
FROM clean.biography_update_log
WHERE made_by IS NOT NULL
AND NOT EXISTS (SELECT 1
FROM clean.people
WHERE people.person = biography_update_log.made_by);</pre>

== * (#61) Invalid follow date/focals pairs in follow_arrival ==

=== Problem ===
There are <code>follow_arrival.fa_fol_b_animid</code> and <code>follow_arrival.fa_fol_date</code> value combinations that do not exist in <code>follow</code>.

=== Bad Data ===
The invalid values can be listed (from the <code>clean</code> schema) with:

<pre>SELECT *
FROM clean.follow_arrival
WHERE NOT EXISTS
(SELECT 1
FROM clean.follow
WHERE follow.fol_date = follow_arrival.fa_fol_date
AND follow.fol_b_animid
= follow_arrival.fa_fol_b_focal_animid)
ORDER BY follow_arrival.fa_fol_date
, follow_arrival.fa_fol_b_focal_animid
, follow_arrival.fa_b_arr_animid
, follow_arrival.fa_time_start
, follow_arrival.fa_time_end;</pre>

== (#62) There are follow_arrival rows with arriving animids with trailing spaces ==

=== Problem ===

There are 39 follow arrivals where the arriving animid has trailing spaces.

=== Bad data ===

<pre>
select * from easy.follow_arrival where fa_b_arr_animid <> rtrim(fa_b_arr_animid);
</pre>

=== Solution ===

Remove the trailing spaces in table in the <code>clean</code> schema.

== * (#63) There are follow_arrival rows with <code>NULL</code> fa_data_source values ==

=== Problem ===

There are 5,020 follow arrivals where the fa_data_source is <code>NULL</code>.

=== Bad data ===

<pre>SELECT *
FROM easy.follow_arrival
WHERE fa_data_source IS NULL
ORDER BY follow_arrival.fa_fol_date
, follow_arrival.fa_fol_b_focal_animid
, follow_arrival.fa_b_arr_animid
, follow_arrival.fa_time_start
, follow_arrival.fa_time_end;</pre>

=== Solution ===

Create a <code>none</code> value in ARRIVAL_SOURCES, and use that value instead of <code>NULL</code> in follow_arrival table in the <code>clean</code> schema.

== * (#64) There are follow_arrival rows with <code>NULL</code> fa_type_of_certainty values ==

=== Problem ===

There is 1 follow arrivals where the fa_type_of_certainty is <code>NULL</code>.

=== Bad data ===

<pre>SELECT *
FROM easy.follow_arrival
WHERE fa_type_of_certainty IS NULL
ORDER BY follow_arrival.fa_fol_date
, follow_arrival.fa_fol_b_focal_animid
, follow_arrival.fa_b_arr_animid
, follow_arrival.fa_time_start
, follow_arrival.fa_time_end;</pre>

=== Solution ===

Conversion Data Issues

2026-02-16T16:25:53Z

IanGilby: /* * (#41)There are follow arrivals with NULL nesting information */

This page lists all the problems with the data that were encountered during the data conversion process, and how the issue was resolved.

The problems are numbered, in the order in which they were encountered during the conversion.

Given a choice, earlier problems should be solved before later problems.
This allows the later steps in the conversion to receive "correct" data, which help eliminate spurious problems, and aids the discovery of problems hidden by bad data.

Unsolved problems are marked with an <code>*</code>.

== (#1) FOLLOW_MAP_TIME duplicate keys ==

=== Problem ===
The data dump says that the <code>FOLLOW_MAP_TIME</code> table has a primary key consisting of, in order, the columns: <code>FMT_FOL_date</code>, <code>FMT_FOL_B_focal_AnimID</code>, <code>FMT_time</code>.
But these columns contain duplicate values.

=== Bad Data ===
The duplicate values can be listed (from the <code>raw</code> schema) with:

'''This query no longer reports results because the data was changed in the original MS Access data.'''

<pre>SELECT *
FROM "FOLLOW_MAP_TIME"
JOIN (SELECT "FMT_FOL_date" AS the_date
, "FMT_FOL_B_focal_AnimID" AS the_animid
, "FMT_time" AS the_time
FROM "FOLLOW_MAP_TIME"
GROUP BY "FMT_FOL_date", "FMT_FOL_B_focal_AnimID", "FMT_time"
HAVING count(*) > 1
) AS fmt
ON ("FOLLOW_MAP_TIME"."FMT_FOL_date" = fmt.the_date
AND "FOLLOW_MAP_TIME"."FMT_FOL_B_focal_AnimID" = fmt.the_animid
AND "FOLLOW_MAP_TIME"."FMT_time" = fmt.the_time);</pre>

=== Solution ===

Fixed in MS Access 11/17/23 by ICG. Checked all against Tikis.
When run in SokweDB, table does not exist

== (#2) SUBADULT_ARRIVALS_LOG has a textual SA_first_tiki_date column ==

=== Problem ===

The SUBADULT_ARRIVALS_LOG table has a column that is supposed to contain a date, but instead contains the string "TEXTY".

Likely, the entire row is bad. The row contains:

<pre>
SA_B_AnimID | SA_first_tiki_date | SA_notes
-------------+--------------------+----------
TEXTY | TEXTY | TEXTY</pre>

=== Bad Data ===

In the <code>raw</code> schema run:

'''This query no longer reports results because the data was changed in the original MS Access data.'''

<pre>
select * from "SUBADULT_ARRIVALS_LOG" where "SA_first_tiki_date" = 'TEXTY';</pre>

=== Solution ===

Fixed in MS Access, 11/16/2023, ICG.
Deleted row

== (#3) BRECORD_NOTES contains rows where BREC_FOL_date has values that are not just a date ==

=== Problem ===

There are 4 rows in BRECORD_NOTES where the BREC_FOL_date column, supposedly a date, contains time values that are not '00:00:00'.

=== Bad Data ===

In the <code>raw</code> schema run:

'''This query no longer reports results because the data was changed in the original MS Access data.'''

<pre>
select * from "BRECORD_NOTES" where "BREC_FOL_date"::TIME <> '00:00:00';</pre>

=== Solution ===

Fixed in MS Access, 11/16/2023, ICG
Deleted time stamp

== (#4) MATING_EVENT contains rows where M_FOL_date has values that are not just a date ==

=== Problem ===

The MATING_EVENT table contains 1 row where the time portion of M_FOL_date is not '00:00:00'.

=== Bad Data ===

In the <code>raw</code> schema run:

'''This query no longer reports results because the data was changed in the original MS Access data.'''

<pre>
select * from "MATING_EVENT" where "M_FOL_date"::TIME WITHOUT TIME ZONE <> '00:00:00';</pre>

=== Solution ===

Fixed in MS Access, 11/16/2023, ICG
Deleted time stamp

== (#5) BIOGRAPHY.DepartdateError data discarded ==

=== Problem ===

The BIOGRAPHY.DepartdateError column contains data that was, after discussion, determined to be unusable.

=== Bad Data ===

In the <code>raw</code> schema run:

'''This query no longer reports results because the data was changed in the original MS Access data.'''

<pre>
select * from raw."BIOGRAPHY" where "B_DepartdateError" <> 0;
</pre>

=== Solution ===
Do not convert the data. There is no corresponding column in the new db design.

== (#6) BIOGRAPHY.B_AnimID_num column contains the empty string ==

=== Problem ===
The BIOGRAPHY.B_AnimID_num column contains the empty string, instead of NULL.

=== Bad Data ===

In the <code>raw</code> schema run:

'''This query no longer reports results because the data was changed in the original MS Access data.'''

<pre>
select * from raw."BIOGRAPHY" where "B_AnimID_num" = '';
</pre>

=== Solution ===

Changed 11 empty string values to NULL in MS Access. ICG 12/6/2023

== (#7) BIOGRAPHY.B_AnimID_num column is textual ==

=== Problem ===
The BIOGRAPHY.B_AnimID_num column contains has a data type of TEXT.
The data values all begin with "CH".

=== Bad Data ===

In the <code>raw</code> schema run:

'''This query no longer reports results because the data was changed in the original MS Access data.'''

<pre>
select "B_AnimID", "B_AnimID_num"
from raw."BIOGRAPHY"
where "B_AnimID_num" <> ''
AND "B_AnimID_num" IS DISTINCT FROM NULL
order by "B_AnimID";
</pre>

=== Solution ===

Removed the "CH" prefix and made the column an integer in MS Access. ICG 12/6/2023

== (#8) BIOGRAPHY.DadID_publication_info column contains NULL values ==

No longer a problem as NULLs are now allowed.

=== (Not) Problem ===
The BIOGRAPHY.DadID_publication_info column has a data type that allows <code>NULL</code> values.
SokweDB wants only text

=== Bad Data ===

In the <code>raw</code> schema run:

'''This query no longer reports results because the data was changed in the original MS Access data.'''

<pre>
select *
from raw."BIOGRAPHY"
where "B_DadID" IS NULL
order by "B_AnimID";
</pre>

=== Solution ===

Changed the <code>NULL</code> to the empty string ("") in MS Access. ICG 12/6/2023

== (#9) BIOGRAPHY.DadID column contains non-AnimID values ==

=== Problem ===
The BIOGRAPHY.DadID column has a non-AnimID values.

=== Bad Data ===

In the <code>raw</code> schema run:
<pre>
select "B_AnimID", "B_DadID"
from raw."BIOGRAPHY"
where "B_DadID" is not null
and "B_DadID" <> ''
and not exists (select 1
from raw."BIOGRAPHY" as search
where search."B_AnimID" = raw."BIOGRAPHY"."B_DadID");
</pre>

=== Solution ===

Create the DadIDPrelim column in a BIOGRAPHY_DATA table, and make a BIOGRAPHY view that combines Dad_ID and DadIDPrelim into DadID -- adding the '_prelim' suffix as expected.

== (#10) BRECORD_NOTES contains rows where BREC_time has values that are not just a time ==

=== Problem ===

There are 56 rows in BRECORD_NOTES where the BREC_time column, supposedly a time, contains date values that are not '1899-12-30'.

=== Bad Data ===

In the <code>raw</code> schema run:
<pre>
select * from raw."BRECORD_NOTES" where "BREC_time"::DATE <> '1899-12-30';</pre>

=== Solution ===
The data is fixed in the conversion process.

== (#11) KAZ has b_dadid_publication_info, but b_dadid is NULL ==

=== Problem ===

<code>KAZ</code> has dad id publication info of 'Rudicell et al. 2010', but a <code>NULL</code> dadid.

=== Bad Data ===

In the <code>clean</code> schema run:
<pre>
select *
from clean.biography
where b_dadid is NULL
and (b_dadid_publication_info <> ''
or b_dadid_publication_info is null);</pre>

=== Solution ===

<code>KAZ</code> has 2 potential dads. Introduce a DadIDStatus column, to replace the DadIDPrelim column and have a code that describes what's going on with <code>KAZ</code>.

Fixed in the MS Access data; <code>KAZ</code> was assigned the <code>UNK</code> individual as the dad. Problem will be marked resolved with the upload of the next MS Access database dump.

== (#12) 9 BIOGRAPHY rows have BirthComm values that are not COMM_IDS.CommID values ==

=== Problem ===

There are 9 BIOGRAPHY rows with BirthComm values that are not valid COMM_IDS, their communities do not exist.

=== Bad Data ===

In the <code>clean</code> schema run:
<pre>
select *
from easy.biography
where b_birthgroup is not NULL
and not exists (select 1
from easy.community_lookup
where community_lookup.cl_community_id
= biography.b_birthgroup);</pre>

=== Solution ===
Added KL and KL_KK to CommIds. KL = Kalande, KL_KK= Kasekela/kalande
codes are created during conversion

Fixed in commit 5d85b83bfe1a.

== * (#13) COMM_MEMBS rows place individuals in a community, that is not their birth community, before their BIOGRAPHY.EntryDate ==

=== Problem ===

13 COMM_MEMBS rows place individuals into a community, that is not their birth community, before BIOGRAPHY says they
entered the community.

=== Bad Data ===

In the <code>clean</code> schema run:
<pre>
select b.b_animid, b.b_birthgroup, b.b_entrydate, cm.cm_start_date
from clean.community_membership as cm
join clean.biography as b
on (b.b_animid = cm.cm_b_animid)
where b.b_birthgroup is distinct from cm.cm_cl_community_id
and cm.cm_start_date < b.b_entrydate
order by b.b_animid, cm.cm_start_date;
</pre>

=== Solution ===

== * (#14) COMM_MEMBS rows place individuals in a community after their BIOGRAPHY.EndDate ==

=== Problem ===

3 COMM_MEMBS rows place individuals into a community after BIOGRAPHY says they
left the community.

=== Bad Data ===

In the <code>clean</code> schema run:
<pre>
select b.b_animid, cm.cm_end_date
from clean.community_membership as cm
join clean.biography as b
on (b.b_animid = cm.cm_b_animid)
where cm.cm_end_date > b.b_departdate
order by b.b_animid, cm.cm_end_date;
</pre>

=== Solution ===
ICG EVL fixed BH, HAI, TZB2 in Access. Need to talk to Karl about MG, RO, WD. Treat like KL chimps in Biography?

== (#15) TT is placed in a community twice on the same day ==

=== Problem ===

An individual may not be in more than one community (or even twice in the same community) on any given day.

=== Bad Data ===

In the <code>clean</code> schema run:
<pre>
select first.cm_b_animid as anim_id
, first.cm_start_date as first_start_date
, first.cm_end_date as first_end_date
, first.cm_cl_community_id as first_community_id
, first.cm_start_source as first_start_source
, first.cm_end_source as first_end_source
, second.cm_start_date as second_start_date
, second.cm_end_date as second_end_date
, second.cm_cl_community_id as second_community_id
, second.cm_start_source as second_start_source
, second.cm_end_source as second_end_source
from clean.community_membership as first
join clean.community_membership as second
on (first.cm_b_animid = second.cm_b_animid
and first.cm_start_date < second.cm_start_date)
where first.cm_end_date >= second.cm_start_date;
</pre>

=== Solution ===
Fixed in MS Access 2/1/2024
Changed end date of KK_P1 membership to 8/14/2022

ICG

== (#16) There are 31 rows in COMM_MEMB_LOG where MadeBy is NULL ==

=== Problem ===

There are 31 rows in COMM_MEMB_LOG where MadeBy is NULL, and the column does not allow NULL values.

=== Bad Data ===

In the <code>clean</code> schema run:
<pre>
select * from clean.community_membership_update_log where made_by is null;
</pre>

=== Solution ===

See problem #18. Fixed in commit 7a48fb2c4aad3b5bd4

== (#17) There are 64 rows in COMM_MEMB_LOG where the chimp_id is not on BIOGRAPHY ==

=== Problem ===

There are 64 rows in COMM_MEMB_LOG where the chimp_id is not a BIOGRAPHY.AnimID.

=== Bad Data ===

In the <code>clean</code> schema run:
<pre>
select date_of_update, chimp_id
from clean.community_membership_update_log as log
where not exists (select 1
from clean.biography
where biography.b_animid = log.chimp_id);
</pre>

=== Solution ===
Make this a "soft" error. Fixed in commit 4c9626b304.

== (#18) There are 31 rows in COMM_MEMB_LOG where the MadeBy is NULL ==

=== Problem ===

There are 31 rows in COMM_MEMB_LOG where MadeBy is NULL, and the column does not allow
NULLs.

=== Bad Data ===

In the <code>clean</code> schema run:
<pre>
select * from clean.community_membership_update_log where made_by is null;
</pre>

=== Solution ===

Create an unknown person (UNK), and when the person is NULL, use the unkonwn person.

This allows us to make the unknown person "inactive", preventing them from being used
in newly entered data. The alternative, allowing NULL MadeBy values, would allow
new "bad data", and NULL values make querying harder.

Fixed in commit 7a48fb2c4aad3b5b.

== (#19) There are 18 rows in BIOGRAPHY_LOG where the MadeBy is NULL ==

=== Problem ===

There are 18 rows in BIOGRAPHY_LOG where MadeBy is NULL, and the column does not allow
NULLs.

=== Bad Data ===

In the <code>clean</code> schema run:
<pre>
select * from clean.biography_update_log where made_by is null;
</pre>

=== Solution ===

Create an unknown person (UNK), and when the person is NULL, use the unkonwn person.
(See Problem #18)== * (#19) There are 18 rows in BIOGRAPHY_LOG where the MadeBy is NULL ==

Fixed in commit 8a528cc7d1a9fb.

== (#20) There are 3 rows in BIOGRAPHY_LOG where the Rationale is NULL ==

=== Problem ===

There are 3 rows in BIOGRAPHY_LOG where Rationale is NULL. The column does not allow
NULLs; normally the conversion program would convert NULL to the empty string.
But the Rationale column requires there be (non-empty) textual data.

=== Bad Data ===

In the <code>clean</code> schema run:
<pre>
select * from clean.biography_update_log where update_rationale is null;
</pre>

=== Solution ===
ICG fixed in MS Access 2/8/2024. Updated rationale to 'routine update'.

== (#21) There are 6 rows in BIOGRAPHY_LOG where the update_escription is NULL ==

=== Problem ===

There are 6 rows in BIOGRAPHY_LOG where Description is NULL.
It appears that the description was put into the Rationale column, sometimes along with some rationale.

=== Bad Data ===

In the <code>clean</code> schema run:
<pre>
select * from clean.biography_update_log where update_description is null;
</pre>

=== Solution ===
ICG fixed in MS Access 2/8/2024, using information in update_rationale

== (#22) There are 85 rows in BIOGRAPHY_LOG where the chimp_id is not on BIOGRAPHY ==

=== Problem ===

There are 85 rows in BIOGRAPHY_LOG where the chimp_id is not a BIOGRAPHY.AnimID.

=== Bad Data ===

In the <code>clean</code> schema run:
<pre>
select *
from clean.biography_update_log as log
where not exists (select 1
from clean.biography
where biography.b_animid = log.chimp_id);
</pre>

=== Solution ===
Make this a "soft" error. Fixed in commit c96555f9f326.

== (#23) There are 65 rows in BIOGRAPHY_LOG where the MadeBy is not on PEOPLE ==

=== Problem ===

There are 65 rows in BIOGRAPHY_LOG where the MadeBy is not on PEOPLE, but there is MadeBy data.

These are "combination" ID errors, where multiple people are entered instead
of a single people code.

=== Bad Data ===

In the <code>clean</code> schema run:
<pre>
select *
from clean.biography_update_log as log
where not exists (select 1
from clean.people

</pre>
where people.person = log.made_by)
and made_by is not null;
</pre>

=== Solution ===
Fixed in MS Access by ICG 2/8/2024. Changed multiple IDs to the one who made the final change.

== (#24) Follow starts are not first arrivals ==

=== Problem ===
There are 6,397 cases where the old FOLLOW table's columns
"fol_time_begin" is not the first arrival time, fa_time_start,
on FOLLOW_ARRIVAL.

The The Gombe Chimpanzee Database Handbook says that this
should never happen, because the follow start is supposed
to be the first arrival. Is there additional data
in the old fol_time_begin, like the actual start
the observers started working? If not, which data is correct,
the one in FOLLOW or the one in FOLLOW_ARRIVAL?

=== Bad Data ===
<pre>
WITH spans AS
(SELECT fa_fol_date, fa_fol_b_focal_animid
, MIN(fa_time_start) AS min_start
FROM clean.follow_arrival
WHERE fa_b_arr_animid = fa_fol_b_focal_animid
GROUP BY fa_fol_date, fa_fol_b_focal_animid)
SELECT spans.*, follow.fol_time_begin
FROM spans
JOIN clean.follow
ON (follow.fol_date = spans.fa_fol_date
AND follow.fol_b_animid = spans.fa_fol_b_focal_animid)
WHERE NOT EXISTS
(SELECT 1
FROM clean.follow
WHERE follow.fol_date = spans.fa_fol_date
AND follow.fol_b_animid = spans.fa_fol_b_focal_animid
AND spans.min_start = follow.fol_time_begin)
ORDER BY spans.fa_fol_date, spans.fa_fol_b_focal_animid;
</pre>

=== Solution ===

This is data entry error, ignore the problem and ignore the data in the follow.fol_time_begin.

==(#25) Follow ends are not last arrivals ==

=== Problem ===
There are 6,570 cases where the old FOLLOW table's columns
"fol_time_end" is not the last arrival
time, fa_time_end, on FOLLOW_ARRIVAL.

The The Gombe Chimpanzee Database Handbook says that this
should never happen, because the follow end is supposed
to be the first arrival/last departure. Is there additional data
in the old fol_time_end, like the actual end-time
the observers started working? If not, which data is correct,
the one in FOLLOW or the one in FOLLOW_ARRIVAL?

=== Bad Data ===
<pre>
WITH spans AS
(SELECT fa_fol_date, fa_fol_b_focal_animid
, MAX(fa_time_end) AS max_end
FROM clean.follow_arrival
WHERE fa_b_arr_animid = fa_fol_b_focal_animid
GROUP BY fa_fol_date, fa_fol_b_focal_animid)
SELECT spans.*, follow.fol_time_end
FROM spans
JOIN clean.follow
ON (follow.fol_date = spans.fa_fol_date
AND follow.fol_b_animid = spans.fa_fol_b_focal_animid)
WHERE NOT EXISTS
(SELECT 1
FROM clean.follow
WHERE follow.fol_date = spans.fa_fol_date
AND follow.fol_b_animid = spans.fa_fol_b_focal_animid
AND spans.max_end = follow.fol_time_end)
ORDER BY spans.fa_fol_date, spans.fa_fol_b_focal_animid;
</pre>

=== Solution ===

This is data entry error, ignore the problem and ignore the data in the follow.fol_time_end.

== (#26) Mismatch of start-in-nest on follow and follow_arrival ==

=== Problem ===

There are <s>1,848</s> 3,369 follows where the follow arrival says the focal started in the nest but the follow does not, or vice-versa.

=== Bad Data ===

<pre>
WITH spans AS
(SELECT fa_fol_date, fa_fol_b_focal_animid
, MIN(fa_time_start) AS min_start
FROM easy.follow_arrival
WHERE fa_b_arr_animid = fa_fol_b_focal_animid
GROUP BY fa_fol_date, fa_fol_b_focal_animid)
SELECT spans.*, follow.fol_time_begin, follow.fol_flag_begin_in_nest
, first_arrivals.fa_type_of_nesting
FROM easy.follow
JOIN spans
ON (follow.fol_date = spans.fa_fol_date
AND follow.fol_b_animid = spans.fa_fol_b_focal_animid)
JOIN easy.follow_arrival
AS first_arrivals
ON (first_arrivals.fa_fol_date = follow.fol_date
AND first_arrivals.fa_fol_b_focal_animid = follow.fol_b_animid
AND first_arrivals.fa_time_start = spans.min_start)
WHERE (((first_arrivals.fa_type_of_nesting = 1
OR first_arrivals.fa_type_of_nesting = 3)
AND follow.fol_flag_begin_in_nest = 0)
OR (first_arrivals.fa_type_of_nesting <> 1
AND first_arrivals.fa_type_of_nesting <> 3
AND follow.fol_flag_begin_in_nest = 1))
ORDER BY spans.fa_fol_date, spans.fa_fol_b_focal_animid;
</pre>

=== Solution ===

Ignore. Default to data from Follow_Arrival

==== Remarks ====

When applying the solution, solving both problem #26 and #27,
and updating follow_arrival so that it is the one source of truth
used by the conversion,
there are 297 follow_arrival rows updated.

This implies that it is primarily the follow table
that does not mark the individual as being in a
nest when they should be in a nest. ("Should be",
according to the accepted solution.)

==(#27) Mismatch of end-in-nest on follow and follow_arrival ==

=== Problem ===

There are 3,804 follows where the follow arrival says the focal ended in the nest but the follow does not, or vice-versa.

=== Bad Data ===

<pre>
WITH spans AS
(SELECT fa_fol_date, fa_fol_b_focal_animid
, MAX(fa_time_end) AS max_end
FROM easy.follow_arrival
WHERE fa_b_arr_animid = fa_fol_b_focal_animid
GROUP BY fa_fol_date, fa_fol_b_focal_animid)
SELECT spans.*
, follow.fol_time_end, follow.fol_flag_end_in_nest
, last_arrivals.fa_type_of_nesting
FROM easy.follow
JOIN spans
ON (follow.fol_date = spans.fa_fol_date
AND follow.fol_b_animid = spans.fa_fol_b_focal_animid)
JOIN easy.follow_arrival
AS last_arrivals
ON (last_arrivals.fa_fol_date = follow.fol_date
AND last_arrivals.fa_fol_b_focal_animid = follow.fol_b_animid
AND last_arrivals.fa_time_end = spans.max_end)
WHERE (((last_arrivals.fa_type_of_nesting = 2
OR last_arrivals.fa_type_of_nesting = 3)
AND follow.fol_flag_end_in_nest = 0)
OR (last_arrivals.fa_type_of_nesting <> 2
AND last_arrivals.fa_type_of_nesting <> 3
AND follow.fol_flag_end_in_nest = 1))
ORDER BY spans.fa_fol_date, spans.fa_fol_b_focal_animid;
</pre>

=== Solution ===

Ignore the problem. Default to Follow_Arrival

==== Remarks ====

See the remarks for [[Conversion Data Issues#(#27) Mismatch of end-in-nest on follow and follow_arrival|problem #26]].

== (#28) The FOLLOW.FOL_distance_traveled column is not converted ==

=== Problem ===

The FOLLOW.FOL_distance_traveled column has no corresponding column in the new
database design. The conversion process does not check that the value
of this column is consistent with the other data in the database from which it is computed.

=== Solution ===

This is a computed column and does not need to be converted. The desired value is computed in the design of the new database.

The assumption is that the "raw" data from which this value is computed in the MS Access database is correct.

== (#29) The FOLLOW.Brecord_notes column is not converted ==

=== Problem ===

The FOLLOW.Brecord_notes column has no corresponding column in the new
database design.

=== Solution ===

This column is used for administrative purposes and does not need to be in the new database design.

== (#30) Some follows have no community ==

=== Problem ===

There are 13 follows with no community.

=== Bad Data ===

<pre>
select *
from clean.follow
where fol_cl_community_id is null
order by fol_date, fol_b_animid;
</pre>

=== solution ===

Solved

== (#31) Some follows have an animid with trailing spaces ==

=== Problem ===

There are <s>9</s> 39 follows with animids that don't exist, because they have trailing spaces. These are comprised of <s>9</s> 23 distinct animids.

=== Bad Data ===

Cleaned up in <code>clean</code> schema, so query <code>easy</code> schema.

<pre>
select rtrim(fol_b_animid)
from easy.follow
where rtrim(fol_b_animid) <> fol_b_animid
order by fol_b_animid;
</pre>

=== Solution ===

Remove the trailing spaces in the conversion process.

== (#32) Some follows have an animid that is lower-case ==

=== Problem ===

There is 1 follows with an animids that don't exist, because it is lower-case.

=== Bad Data ===

Cleaned up in <code>clean</code> schema, so query <code>easy</code> schema.
<pre>
select *
from easy.follow
where upper(fol_b_animid) <> fol_b_animid
order by fol_b_animid;
</pre>

=== Solution ===

Convert the animid to upper-case in the conversion process.

==(#33) Some follows have an animid that does not exist, even after animid cleanup ==

=== Problem ===

There are 14 follows with animids that don't exist, even after cleanup that removes trailing spaces and forces upper-case. (Some of these may be due to prior errors.) These are comprised of 3 distinct animids.

=== Bad Data ===

<pre>
-- The 14 follows with bad animids
select *
from clean.follow
where not exists
(select 1
from clean.biography
where biography.b_animid = upper(rtrim(follow.fol_b_animid)))
order by fol_date, fol_b_animid;
</pre>

<pre>
-- The 3 animids involved
select distinct follow.fol_b_animid
from clean.follow
where not exists
(select 1
from clean.biography
where biography.b_animid = upper(rtrim(follow.fol_b_animid)))
order by follow.fol_b_animid;
</pre>

=== Solution ===
Solved

== (#34) There are duplicate animid, date combinations on FOLLOW ==

=== Problem ===
There are 2 sets of duplicate animid, date combinations on the FOLLOW table, for a total of 4 rows.

=== Bad Data ===

<pre>
-- The duplicate animid, date combinations
select follow.fol_b_animid, follow.fol_date, count(*)
from clean.follow
group by follow.fol_b_animid, follow.fol_date
having count(*) > 1
order by fol_b_animid, fol_date;
</pre>

<pre>
-- The duplicate rows
with dups as (
select follow.fol_b_animid, follow.fol_date
from clean.follow
group by follow.fol_b_animid, follow.fol_date
having count(*) > 1)
select *
from clean.follow
join dups on (follow.fol_b_animid = dups.fol_b_animid
and follow.fol_date = dups.fol_date)
order by follow.fol_b_animid, follow.fol_date;
</pre>
=== Solution ===
Solved

== (#35) There are follows done before a focal was under study ==

=== Problem ===
There are 5 follows that are done before their focal was under study, before the focal's EntryDate.

=== Bad Data ===

<pre>
select follow.*, biography.b_entrydate
from clean.follow
join clean.biography on (biography.b_animid = follow.fol_b_animid)
where biography.b_entrydate > follow.fol_date
order by follow.fol_date, follow.fol_b_animid;
</pre>

=== Solution ===

Solved

== (#36) FOLLOW_OBSERVERS.Period is not checked against follow start or stop times ==

=== Problem ===
The conversion process uses the clean.follow columns of fol_am_observer_1, fol_am_observer_2, fol_pm_observer_1, and fol_pm_observer_2 to populate FOLLOW_OBSERVERS.
(The *_observer_1 column going into FOLLOW_OBSERVERS.OBS_BRec and the *_observer_2 column going in OBS_Tiki.)

If both the *_observer_1 and the *_observer_2 columns are either NULL or the empty string (after space trimming), the no row is created for the respective time period.

There are no checks done to ensure that the time periods of the follow have any relation to the FOLLOW_OBSERVERS.Period value.

=== Solution ===

Ignore the problem. (If someone cares, add a query to the warning system.)

== (#37) Some follows have no recorded observers ==

=== Problem ===

There are 56 follows with no recorded observers, but the system requires there be a FOLLOW_OBSERVERS record related to the follow.

See problem #36 for a description of the follow observer conversion process.

=== Bad data ===

<pre>
select *
from clean.follow
where coalesce(btrim(fol_am_observer_1), '') = ''
and coalesce(btrim(fol_am_observer_2), '') = ''
and coalesce(btrim(fol_pm_observer_1), '') = ''
and coalesce(btrim(fol_pm_observer_2), '') = ''
order by fol_date, fol_b_animid;
</pre>

=== Solution ===

Make the "NONE" (no observer) person the observers.

Create a "UNK" (unknown) time period in PERIODS, and make that the time period.

== (#38) Some follows have only one observer, but the system wants two ==

=== Problem ===

There are follows with only one observer, but the system requires a FOLLOW_OBSERVERS record have a value in both OBS_BRec and OBS_Tiki.

See problem #36 for a description of the follow observer conversion process.

=== Solution ===

Create a "NONE" person, and make that person the observer when there's otherwise not a value.

== (#39) In follow, there are observers that are 2 people ==

=== Problem ===

In the follow table, in columns fol_am_observer_1, fol_am_observer_2, fol_pm_observer_1, fol_pm_observer_2, there are 30 rows (28, really, because of n/a) that appear to represent 2 people.

These look like names, separated by the "/" character.

=== Bad data ===
<pre>
WITH new_people AS (
SELECT BTRIM(follow.fol_am_observer_1) AS person
FROM clean.follow
WHERE follow.fol_am_observer_1 IS NOT NULL
GROUP BY BTRIM(follow.fol_am_observer_1)
UNION
SELECT BTRIM(follow.fol_am_observer_2) AS person
FROM clean.follow
WHERE follow.fol_am_observer_2 IS NOT NULL
GROUP BY BTRIM(follow.fol_am_observer_2)
UNION
SELECT BTRIM(follow.fol_pm_observer_1) AS person
FROM clean.follow
WHERE follow.fol_pm_observer_1 IS NOT NULL
GROUP BY BTRIM(follow.fol_pm_observer_1)
UNION
SELECT BTRIM(follow.fol_pm_observer_2) AS person
FROM clean.follow
WHERE follow.fol_pm_observer_2 IS NOT NULL
GROUP BY BTRIM(follow.fol_pm_observer_2)
)
, uniq_people AS (
SELECT new_people.person
FROM new_people
WHERE new_people.person <> ''
GROUP BY new_people.person
)
SELECT uniq_people.person
FROM uniq_people
JOIN uniq_people AS up
ON (LOWER(uniq_people.person) = LOWER(up.person))
WHERE uniq_people.person <> up.person
AND STRPOS(uniq_people.person, '/') <> 0
GROUP BY uniq_people.person
ORDER BY LOWER(uniq_people.person), uniq_people.person;
</pre>

=== Solution ===

Ignore the problem. Mark all people containing a "/" character as not active, so they can't be used in the future.

== (#40) In follow, there are observers that differ only by character case ==

=== Problem ===

In the follow table, in columns fol_am_observer_1, fol_am_observer_2, fol_pm_observer_1, fol_pm_observer_2, there are 511 names that differ only by character case. So, about half that in terms of unique names.

This is complicated to account for and exclude duplicates in the conversion process. So resolution of this is holding back additional conversion work.

=== Bad data ===

Note the use of the <code>easy</code> schema, instead of the
<code>clean</code> schema. This is because the observer data
in the <code>clean</code> schema has been case-normalized.

<pre>
WITH new_people AS (
SELECT BTRIM(follow.fol_am_observer_1) AS person
FROM easy.follow
WHERE follow.fol_am_observer_1 IS NOT NULL
GROUP BY BTRIM(follow.fol_am_observer_1)
UNION
SELECT BTRIM(follow.fol_am_observer_2) AS person
FROM easy.follow
WHERE follow.fol_am_observer_2 IS NOT NULL
GROUP BY BTRIM(follow.fol_am_observer_2)
UNION
SELECT BTRIM(follow.fol_pm_observer_1) AS person
FROM easy.follow
WHERE follow.fol_pm_observer_1 IS NOT NULL
GROUP BY BTRIM(follow.fol_pm_observer_1)
UNION
SELECT BTRIM(follow.fol_pm_observer_2) AS person
FROM easy.follow
WHERE follow.fol_pm_observer_2 IS NOT NULL
GROUP BY BTRIM(follow.fol_pm_observer_2)
)
, uniq_people AS (
SELECT new_people.person
FROM new_people
WHERE new_people.person <> ''
GROUP BY new_people.person
)
SELECT uniq_people.person
FROM uniq_people
JOIN uniq_people AS up
ON (LOWER(uniq_people.person) = LOWER(up.person))
WHERE uniq_people.person <> up.person
GROUP BY uniq_people.person
ORDER BY LOWER(uniq_people.person), uniq_people.person;
</pre>

=== Solution ===

Use the mixed case code when such exists. This involves changing the observer columns in the <code>follow</code> table. See [[The_Old_Database#The_observer_columns_of_the_follow_table|the notes]].

==(#41)There are follow arrivals with NULL nesting information ==

=== Problem ===

In the follow_arrival table, there are 11 rows with a NULL fa_type_of_nesting.

=== Bad data ===

<pre>
select * from clean.follow_arrival where fa_type_of_nesting IS NULL;
</pre>

=== Solution ===

Solved

== (#42) There are follow arrivals with NULL sexual cycle information ==

=== Problem ===

In the follow_arrival table, there are 222 rows with a NULL fa_type_of_cycle.

=== Bad data ===

<pre>
select * from clean.follow_arrival where fa_type_of_cycle is null;
</pre>

=== Solution ===

Make a <code>MISS</code> <code>CYCLE_STATES</code> value and use that for the arrivals with no cycle information.

== * (#43) There are follow arrivals with no related follow ==

=== Problem ===

In the follow_arrival table, there are 732 rows with a focal and a date that have no matching information on the follow table.

=== Bad data ===

<pre>
SELECT *
FROM clean.follow_arrival
WHERE NOT EXISTS
(SELECT 1
FROM clean.follow
WHERE follow.fol_b_animid = follow_arrival.fa_fol_b_focal_animid
AND follow.fol_date = follow_arrival.fa_fol_date)
ORDER BY follow_arrival.fa_fol_date
, follow_arrival.fa_fol_b_focal_animid
, follow_arrival.fa_b_arr_animid;
</pre>

== * (#44) There are follow arrivals where the arriving chimp arrives before being under study ==

=== Problem ===

In the follow_arrival table, there are <s>217</s> 49 rows where the fa_b_arr_animid, the arriving individual, has an entry date after the date of the follow.

=== Bad data ===

<pre>
select follow_arrival.*, biography.b_entrydate, biography.b_sex
from clean.follow_arrival
join clean.biography
on (biography.b_animid = follow_arrival.fa_b_arr_animid)
where biography.b_entrydate > follow_arrival.fa_fol_date
order by biography.b_animid, follow_arrival.fa_fol_date;
</pre>

== * (#45) The follow_arrival.fa_update column is not preserved ==

=== Problem ===

There is nowhere in the current design to store the values in the follow_arrival.fa_update column.

=== Solution ===

Add a temporal extension to Postgres to make the db into a temporal database to track change history and be able to see data as it existed at any point in time.

Will re-review this should a temporal extension be out of budget, etc.

== * (#46) There are follow_arrivals where non-females have a cycle code that is other than <code>n/a</code> ==

=== Problem ===

There are <s>169</s> 168 follow_arrival rows, for non-female arriving individuals, that have a sexual cycle code (fa_type_of_cycle) that is not <code>n/a</code>.

=== Bad data ===

<pre>
select follow_arrival.*, biography.b_sex
from clean.follow_arrival
join clean.biography
on (biography.b_animid = follow_arrival.fa_b_arr_animid)
where biography.b_sex <> 'F'
and follow_arrival.fa_type_of_cycle <> 'n/a'
order by follow_arrival.fa_fol_date
, follow_arrival.fa_b_arr_animid;
</pre>

To summarize by sex and cycle code:

<pre>
select biography.b_sex, follow_arrival.fa_type_of_cycle, count(*)
from clean.follow_arrival
join clean.biography
on (biography.b_animid = follow_arrival.fa_b_arr_animid)
where biography.b_sex <> 'F'
and follow_arrival.fa_type_of_cycle <> 'n/a'
group by biography.b_sex, follow_arrival.fa_type_of_cycle order by biography.b_sex
, follow_arrival.fa_type_of_cycle;
</pre>

=== Solution ===

Allow males to have a cycle code of <code>MISS</code>.

Change non-female's who have a cycle code of <code>0</code> to a cycle code of <code>n/a</code>.

== (#47) There are follow_arrivals where females that are too young have a cycle code of <code>U</code> ==

=== Problem ===

There are 16 follow_arrival rows, for female arriving individuals, that have a sexual cycle code (fa_type_of_cycle) of <code>U</code> but are less than 5 years of age.

Naturally, this number will change if the age limit is changed, but this is here as a placeholder.

Note: The test is against the birthdate, not the minimum possible birthdate.

=== Bad data ===

<pre>
select follow_arrival.*, biography.b_sex, biography.b_birthdate
from clean.follow_arrival
join clean.biography
on (biography.b_animid = follow_arrival.fa_b_arr_animid)
where biography.b_sex = 'F'
and follow_arrival.fa_type_of_cycle = 'U'
and biography.b_birthdate
> (follow_arrival.fa_fol_date
- '5 years'::INTERVAL
)
order by follow_arrival.fa_fol_date
, follow_arrival.fa_fol_b_focal_animid
, follow_arrival.fa_b_arr_animid;
</pre>

Note: The original query used the wrong inequality symbol.

=== Solution ===

Changing the limit from 6 years of age to 5 reduced the number of outstanding problems to 16.

The rest will have to be fixed in the data. Alternately, we can adjust the hard limit, and set a soft limit of 5 years in the warning system with a note to change the hard limit back once the errors are resolved.

Resolution: Fixed in the data.

== * (#48) There are follow arrivals where the arriving chimp arrives after finishing being under study ==

=== Problem ===

In the follow_arrival table, there are <s>239</s> 324 rows where the fa_b_arr_animid, the arriving individual, has a departure date before the date of the follow.

=== Bad data ===

<pre>
select follow_arrival.*, biography.b_departdate, biography.b_sex
from clean.follow_arrival
join clean.biography
on (biography.b_animid = follow_arrival.fa_b_arr_animid)
where biography.b_departdate < follow_arrival.fa_fol_date
order by biography.b_animid, follow_arrival.fa_fol_date;
</pre>

== * (#49) There are follow_arrivals where females that are too old have a cycle code of <code>U</code> ==

=== Problem ===

There are <s>127</s> 98 follow_arrival rows, for female arriving individuals, that have a sexual cycle code (fa_type_of_cycle) of <code>U</code> but are more than 14 years of age.
(Actually, because the endpoint takes up the whole 14th year, this means at least 15 years of age.)

Naturally, this number will change if the age limit is changed, but this is here as a placeholder.

Note: The test is against the birthdate, not the maximum possible birthdate.

=== Bad data ===

Note: Original query used the wrong inequality.

<pre>
select follow_arrival.*
, biography.b_sex
, biography.b_birthdate
from clean.follow_arrival
join clean.biography
on (biography.b_animid = follow_arrival.fa_b_arr_animid)
where biography.b_sex = 'F'
and follow_arrival.fa_type_of_cycle = 'U'
and biography.b_birthdate
<= (follow_arrival.fa_fol_date
- '1 year'::INTERVAL
- '14 years'::INTERVAL
)
order by follow_arrival.fa_fol_date
, follow_arrival.fa_fol_b_focal_animid
, follow_arrival.fa_b_arr_animid;
</pre>

=== Solution ===

The number of problem rows was reduced after changing the limit to 14 years from 9 years.

The remaining problems will have to be adjusted in the data. Alternately, we can adjust the hard limit, and set a soft limit of 14 years in the warning system with a note to change the hard limit back once the errors are resolved.

== (#50) There are follow_arrivals where females have a cycle code of <code>n/a</code> ==

=== Problem ===

There are 4,496 follow_arrival rows, for female arriving individuals, that have a sexual cycle code (fa_type_of_cycle) of <code>n/a</code>.

=== Bad data ===

<pre>
select follow_arrival.*
, biography.b_sex
, biography.b_birthdate
from clean.follow_arrival
join clean.biography
on (biography.b_animid = follow_arrival.fa_b_arr_animid)
where biography.b_sex = 'F'
and follow_arrival.fa_type_of_cycle = 'n/a'
order by follow_arrival.fa_fol_date
, follow_arrival.fa_fol_b_focal_animid
, follow_arrival.fa_b_arr_animid;
</pre>

=== Solution ===

Change the cycle code to <code>MISS</code>, for these rows. This is done in the <code>clean</code> schema.

This is an indication that future cleanup is required.

== (#51) There are follow_arrivals with invalid fa_data_source values ==

=== Problem ===

There are 546 follow_arrival rows that have fa_data_source values that are not one of:
<code>Tiki</code>
<code>Tiki_Mom</code>
<code>Tiki_ID</code>
<code>Brec</code>.

=== Bad data ===

Query the <code>easy</code> schema, because the data has been cleaned in the <code>clean</code> schema.
<pre>
select follow_arrival.*
from easy.follow_arrival
where follow_arrival.fa_data_source <> 'Tiki'
and follow_arrival.fa_data_source <> 'Tiki_Mom'
and follow_arrival.fa_data_source <> 'Tiki_ID'
and follow_arrival.fa_data_source <> 'Brec'
order by follow_arrival.fa_fol_date
, follow_arrival.fa_fol_b_focal_animid
, follow_arrival.fa_b_arr_animid;

-- Summarize with:
select follow_arrival.fa_data_source, count(*)
from easy.follow_arrival
where follow_arrival.fa_data_source <> 'Tiki'
and follow_arrival.fa_data_source <> 'Tiki_Mom'
and follow_arrival.fa_data_source <> 'Tiki_ID'
and follow_arrival.fa_data_source <> 'Brec'
group by follow_arrival.fa_data_source
order by follow_arrival.fa_data_source;
</pre>

=== Solution ===

Change <code>BREC</code> and <code>brec</code> to <code>Brec</code>.

Change <code>TikI</code> to <code>Tiki</code>.

Add the other codes:

fa_data_source count
Tiki_GM 202
Tiki_PM 165
Tiki_SS 22

Also: Tiki_Mom

== * (#52) There are follow_arrivals that are almost duplicates ==

=== Problem ===

This entry may end up being multiple problems.

There are follow_arrivals that are near duplicates.
When checking for duplicates on <code>fa_fol_date</code>, <code>fa_fol_b_focal_animid</code>, <code>fa_b_arr_animid</code>
and <code>fa_seq_num</code>, the rows are always unique.
But checking the combination of <code>fa_fol_date</code>, <code>fa_fol_b_focal_animid</code>, <code>fa_b_arr_animid</code>
and <code>fa_time_start</code>,
and <code>fa_time_end</code> yields 184 rows.
Leaving off <code>fa_time_end</code> and just checking the combination of <code>fa_fol_date</code>, <code>fa_fol_b_focal_animid</code>, <code>fa_b_arr_animid</code>
and <code>fa_time_start</code> yields 430 rows.

Why the duplicates?

This entry is a call for a definition of an <code>ARRIVALS</code> row, what does it mean to be a duplicate?

=== Bad data ===

<pre>
-- no duplicates when looking at just sequence number
select fa_fol_date
, fa_fol_b_focal_animid
, fa_b_arr_animid
, fa_seq_num
from clean.follow_arrival
group by fa_fol_date
, fa_fol_b_focal_animid
, fa_b_arr_animid
, fa_seq_num
having count(*) > 1;
</pre>

<pre>
-- Checking against start and end time
with dups as
(select fa_fol_date
, fa_fol_b_focal_animid
, fa_b_arr_animid
, fa_time_start
, fa_time_end
from clean.follow_arrival
group by fa_fol_date
, fa_fol_b_focal_animid
, fa_b_arr_animid
, fa_time_start
, fa_time_end
having count(*) > 1)
select *
from clean.follow_arrival
where exists
(select 1
from dups
where follow_arrival.fa_fol_date = dups.fa_fol_date
and follow_arrival.fa_fol_b_focal_animid
= dups.fa_fol_b_focal_animid
and follow_arrival.fa_b_arr_animid = dups.fa_b_arr_animid
and follow_arrival.fa_time_start = dups.fa_time_start
and follow_arrival.fa_time_end = dups.fa_time_end)
order by follow_arrival.fa_fol_date
, follow_arrival.fa_fol_b_focal_animid
, follow_arrival.fa_b_arr_animid
, follow_arrival.fa_time_start
, follow_arrival.fa_time_end
, follow_arrival.fa_seq_num;
</pre>

=== Solution ===

Where both start and end-times are duplicated, along with other data like cycle state, I can add additional observers or data sources. But I don't know what to do with "almost the same" data.

== * (#53) There are follow_arrivals where the arriving chimp does not exist ==

=== Problem ===

There are <s>688</s> 84 rows, having <s>3</s> 6 different <code>fa_b_arr_animid</code> values, where the <code>fa_b_arr_animid</code> value is not a <code>biography.b_animid</code> value.

=== Bad Data ===

<pre>
-- The rows
select *
from clean.follow_arrival
where not exists
(select 1
from clean.biography
where biography.b_animid = follow_arrival.fa_b_arr_animid)
order by follow_arrival.fa_b_arr_animid
, follow_arrival.fa_fol_date
, follow_arrival.fa_fol_b_focal_animid;

-- A summary of the bad animal ids
select follow_arrival.fa_b_arr_animid, count(*)
from clean.follow_arrival
where not exists
(select 1
from clean.biography
where biography.b_animid = follow_arrival.fa_b_arr_animid)
group by follow_arrival.fa_b_arr_animid
order by follow_arrival.fa_b_arr_animid;
</pre>

=== Solution ===

ICG fixed in MS Access 10/2025. AMA-->AME, OBE-->POR

== * (#54) There are follow_arrivals where females that are too young have a cycle state code that is not <code>0</code>, <code>U</code>, or <code>MISS</code> ==

=== Problem ===

There are <s>26</s> 42 follow_arrival rows, for female arriving individuals, that have a sexual cycle code indicating sexual swelling but are less than 8 years of age.

Note: The test is against the birthdate, not the minimum possible birthdate.

=== Bad data ===

<pre>
select follow_arrival.*, biography.b_sex, biography.b_birthdate
from clean.follow_arrival
join clean.biography
on (biography.b_animid = follow_arrival.fa_b_arr_animid)
where biography.b_sex = 'F'
and follow_arrival.fa_type_of_cycle <> '0'
and follow_arrival.fa_type_of_cycle <> 'U'
and follow_arrival.fa_type_of_cycle <> 'MISS'
and biography.b_birthdate
> (follow_arrival.fa_fol_date
- '8 years'::INTERVAL
)
and follow_arrival.fa_b_arr_animid <> 'MGF'
and follow_arrival.fa_b_arr_animid <> 'MGF2'
and follow_arrival.fa_b_arr_animid <> 'MGF3'
order by follow_arrival.fa_fol_date
, follow_arrival.fa_fol_b_focal_animid
, follow_arrival.fa_b_arr_animid;
</pre>

=== Solution ===

== * (#55) There are community_membership rows that place an individual in a community before birth ==

=== Problem ===

There is one individual who is placed in a community, once, before birth.

Note: The test is against the birthdate, not the minimum possible birthdate.

=== Bad data ===

<pre>
select b.b_animid, b.b_birthdate, cm.cm_start_date
from clean.community_membership as cm
join clean.biography as b
on (b.b_animid = cm.cm_b_animid)
where cm.cm_start_date < b.b_birthdate
order by b.b_animid, cm.cm_start_date;
</pre>

=== Bad solution ===

ICG fixed FN community starte date in MS Access - oct 2025

== (#56) There are follow_arrival focal animids with trailing spaces ==

=== Problem ===

There are 144 follow arrivals where the focal id has trailing spaces.

=== Bad data ===

<pre>
select * from follow_arrival where fa_fol_b_focal_animid <> rtrim(fa_fol_b_focal_animid);
</pre>

=== Solution ===

Remove the trailing spaces in table in the <code>clean</code> schema.

== * (#57) GROOM_BOUT duplicate keys ==

=== Problem ===
The data dump says that the <code>GROOM_BOUT</code> table has a primary key consisting of, in order, the columns: <code>GRM_FOL_date</code>, <code>GRM_FOL_B_focal_AnimId</code>, <code>GRM_time_begin</code>, <code>GRM_B_partner_AnimId</code>
But these columns contain duplicate values.

=== Bad Data ===
The duplicate values can be listed (from the <code>raw</code> schema) with:

<pre>SELECT *
FROM raw."GROOM_BOUT"
JOIN (SELECT "GRM_FOL_date" AS the_date
, "GRM_FOL_B_focal_AnimId" AS the_animid
, "GRM_time_begin" AS the_time
, "GRM_B_partner_AnimId" AS the_partner
FROM raw."GROOM_BOUT"
GROUP BY "GRM_FOL_date"
, "GRM_FOL_B_focal_AnimId"
, "GRM_time_begin"
, "GRM_B_partner_AnimId"
HAVING count(*) > 1
) AS gb
ON ("GROOM_BOUT"."GRM_FOL_date" = gb.the_date
AND "GROOM_BOUT"."GRM_FOL_B_focal_AnimId" = gb.the_animid
AND "GROOM_BOUT"."GRM_time_begin" = gb.the_time
AND "GROOM_BOUT"."GRM_B_partner_AnimId" = gb.the_partner);</pre>

== * (#58) OTHER_SPECIES duplicate keys ==

=== Problem ===
The data dump says that the <code>OTHER_SPECIES</code> table has a primary key consisting of, in order, the columns: <code>OS_FOL_date</code>, <code>OS_FOL_B_focal_AnimID</code>, <code>OS_time_begin</code>
But these columns contain duplicate values.

=== Bad Data ===
The duplicate values can be listed (from the <code>raw</code> schema) with:

<pre>SELECT *
FROM raw."OTHER_SPECIES"
JOIN (SELECT "OS_FOL_date" AS the_date
, "OS_FOL_B_focal_AnimId" AS the_animid
, "OS_time_begin" AS the_time
FROM raw."OTHER_SPECIES"
GROUP BY "OS_FOL_date"
, "OS_FOL_B_focal_AnimId"
, "OS_time_begin"
HAVING count(*) > 1
) AS os
ON ("OTHER_SPECIES"."OS_FOL_date" = os.the_date
AND "OTHER_SPECIES"."OS_FOL_B_focal_AnimId" = os.the_animid
AND "OTHER_SPECIES"."OS_time_begin" = os.the_time);</pre>

=== Note ===
The reason the MS Access database seems to allow this condition is likely due to difference in character case between column names and primary key designations.

== * (#59) Zero BIOGRAPHY.b_animid_num values ==

=== Problem ===
SokewDB requires that the animal ID number be greater than <code>0</code>, or <code>NULL</code>, but some (what seem to be rows for babys) have a <code>0</code> value.

=== Bad Data ===
The <code>0</code> values can be listed (from the <code>clean</code> schema) with:

<pre>select * from clean.biography where b_animid_num = 0;</pre>

=== Solution ===
Change the <code>0</code> values to <code>NULL</code>.

This is a brute-force, but adequate, solution because it does not validate anything concerning the rows affected.

== * (#60) Invalid biography_update_log.made_by values ==

=== Problem ===
There is a <code>biography_update_log.made_by</code> value (<code>SF/EVL</code>) that is not a person. (Not on the <code>PEOPLE</code> table.)

=== Bad Data ===
The invalid values can be listed (from the <code>clean</code> schema) with:

<pre>SELECT *
FROM clean.biography_update_log
WHERE made_by IS NOT NULL
AND NOT EXISTS (SELECT 1
FROM clean.people
WHERE people.person = biography_update_log.made_by);</pre>

== * (#61) Invalid follow date/focals pairs in follow_arrival ==

=== Problem ===
There are <code>follow_arrival.fa_fol_b_animid</code> and <code>follow_arrival.fa_fol_date</code> value combinations that do not exist in <code>follow</code>.

=== Bad Data ===
The invalid values can be listed (from the <code>clean</code> schema) with:

<pre>SELECT *
FROM clean.follow_arrival
WHERE NOT EXISTS
(SELECT 1
FROM clean.follow
WHERE follow.fol_date = follow_arrival.fa_fol_date
AND follow.fol_b_animid
= follow_arrival.fa_fol_b_focal_animid)
ORDER BY follow_arrival.fa_fol_date
, follow_arrival.fa_fol_b_focal_animid
, follow_arrival.fa_b_arr_animid
, follow_arrival.fa_time_start
, follow_arrival.fa_time_end;</pre>

== (#62) There are follow_arrival rows with arriving animids with trailing spaces ==

=== Problem ===

There are 39 follow arrivals where the arriving animid has trailing spaces.

=== Bad data ===

<pre>
select * from easy.follow_arrival where fa_b_arr_animid <> rtrim(fa_b_arr_animid);
</pre>

=== Solution ===

Remove the trailing spaces in table in the <code>clean</code> schema.

== * (#63) There are follow_arrival rows with <code>NULL</code> fa_data_source values ==

=== Problem ===

There are 5,020 follow arrivals where the fa_data_source is <code>NULL</code>.

=== Bad data ===

<pre>SELECT *
FROM easy.follow_arrival
WHERE fa_data_source IS NULL
ORDER BY follow_arrival.fa_fol_date
, follow_arrival.fa_fol_b_focal_animid
, follow_arrival.fa_b_arr_animid
, follow_arrival.fa_time_start
, follow_arrival.fa_time_end;</pre>

=== Solution ===

Create a <code>none</code> value in ARRIVAL_SOURCES, and use that value instead of <code>NULL</code> in follow_arrival table in the <code>clean</code> schema.

== * (#64) There are follow_arrival rows with <code>NULL</code> fa_type_of_certainty values ==

=== Problem ===

There is 1 follow arrivals where the fa_type_of_certainty is <code>NULL</code>.

=== Bad data ===

<pre>SELECT *
FROM easy.follow_arrival
WHERE fa_type_of_certainty IS NULL
ORDER BY follow_arrival.fa_fol_date
, follow_arrival.fa_fol_b_focal_animid
, follow_arrival.fa_b_arr_animid
, follow_arrival.fa_time_start
, follow_arrival.fa_time_end;</pre>

=== Solution ===