Alex,
Here is the final biweekly update. You are the only person on the
distribution, so feel free to forward it more broadly. Holly will receive a
copy with the invoice, as requested.
Feel free to give me a call next week so we can discuss.
Thanks,
Brian
--
Brian Mayer
bmayer(a)cs.vt.edu
540-231-5907
Sanghani Center for Artificial Intelligence and Data Analytics
<https://sanghani.cs.vt.edu/> - Virginia Tech <https://vt.edu/>
Alex,
Here is our Final Report and Lessons Learned as discussed during our call
on Tuesday. Let me know if there is anything else you'd like us to add.
Also, as mentioned, over the next two weeks we will be preparing the
sample product output and all the elements that go into it.
Happy to set up another call if you'd like. Please also let us know if/when
you have dates for an in-person meeting.
Thanks,
Brian and team
--
Alex,
I've attached an outline of what we think the output could include.
I've discussed these items with the team, and we are confident most of them
will come to fruition in the next two weeks (or sooner). A few are still
being tweaked, tested, or completed, so we may not be able to include them
in the final product for this effort.
On page 3 I have also listed a few items that could further enhance the
report. We definitely won't be able to get to them in the next two weeks,
but we could certainly work on them if we have more time. One of those is a
list of the fastest-growing terms, which I know you seemed to be interested
in.
I'll be working tonight and tomorrow on converting our slides into a
document and attaching this sample output as an Appendix for the final
report (which I will send to you tomorrow).
Thanks,
Brian
--
In addition to the quantitative predictions we provide in the current
dashboard (total, by sector, and by country), Alex would like us to indicate
which demographics we expect to see an uptick from. (*Bursts*?)
- We could look for bursts of keywords by demographic
    - country
    - family type
    - sector
    - conveyance (????)
- We know that this is pretty reliable as a total and by sector, but
  we may need to validate others. Might be wise to develop a more objective
  validation methodology
    - e.g., What defines an uptick/burst in the encounter data
      (increase by X%?)
    - we tried using burst analysis but for some reason this didn't
      work
    - Then maybe we determine Accuracy/Precision/Recall
    - We will only be able to evaluate a limited amount of
      observations (20-30)
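As a starting point for the validation discussion, here is a minimal sketch of the "increase by X%" idea together with Accuracy/Precision/Recall. Everything in it is an assumption for illustration: the function names, the 50% default threshold, and the choice to compare each period only with the one before it. It is not the burst analysis we tried earlier.

```python
def flag_bursts(counts, threshold_pct=50.0):
    """Return the period indices whose count rose by at least
    threshold_pct percent over the previous period (hypothetical rule)."""
    bursts = set()
    for t in range(1, len(counts)):
        prev, curr = counts[t - 1], counts[t]
        # Skip periods with a zero baseline; percent change is undefined.
        if prev > 0 and (curr - prev) / prev * 100.0 >= threshold_pct:
            bursts.add(t)
    return bursts


def evaluate(predicted, observed, n_periods):
    """Compare predicted burst periods with observed ones.

    predicted, observed: sets of period indices.
    n_periods: number of periods that were evaluated.
    """
    tp = len(predicted & observed)
    fp = len(predicted - observed)
    fn = len(observed - predicted)
    tn = n_periods - tp - fp - fn
    return {
        "accuracy": (tp + tn) / n_periods if n_periods else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Illustrative numbers only, not real encounter data.
encounters = [10, 12, 20, 21, 40]
observed_bursts = flag_bursts(encounters)
scores = evaluate({2, 3}, observed_bursts, n_periods=4)
```

With only 20-30 observations, these scores will be coarse, so the threshold should probably be fixed before looking at the results.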
- SHOULD WE DO BURST ANALYSIS? HOW MUCH? HOW SHOULD WE VALIDATE?
Alex liked the fastest-growing keywords from the EMBERS visualization (CAN
WE PROVIDE THIS? CAN WE INCLUDE NON-KEYWORDS? RUN DQE?)
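One possible way to produce a fastest-growing list, sketched under assumptions: terms are ranked by relative growth in count between two time windows, with add-one smoothing so that brand-new terms do not divide by zero. The function name, the `min_count` cutoff, and the smoothing are illustrative choices, not how the EMBERS visualization computes it.

```python
from collections import Counter


def fastest_growing(prev_tokens, curr_tokens, top_n=5, min_count=2):
    """Rank terms by relative growth from the previous window to the
    current one. Terms seen fewer than min_count times in the current
    window are ignored to reduce noise (hypothetical cutoff)."""
    prev, curr = Counter(prev_tokens), Counter(curr_tokens)
    growth = {}
    for term, count in curr.items():
        if count < min_count:
            continue
        # Add-one smoothing in the denominator handles unseen terms.
        growth[term] = (count - prev[term]) / (prev[term] + 1)
    # Highest growth first; ties broken alphabetically for stable output.
    return sorted(growth.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Illustrative tokens only.
last_week = ["border", "border", "visa"]
this_week = ["border", "border", "border", "caravan", "caravan", "visa"]
ranking = fastest_growing(last_week, this_week)
```

The same function would work for non-keywords as long as they are tokenized the same way in both windows.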
Alex liked the word cloud in the EMBERS visualization (WHAT COULD/SHOULD WE
USE TO CREATE A WORD CLOUD?)
Alex liked the idea of "hot" keywords
Other aspects that could be included with "Hot" Keywords
- Group hot keywords by spaCy entity type, e.g., PERSON, NORP, FAC, ORG,
GPE, etc.
- Can we include non-keywords?
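A rough sketch of the grouping step, assuming the hot keywords have already been tagged: the input is a list of (text, label) pairs, which with spaCy could be built from `ent.text` and `ent.label_` for each entity in `doc.ents`. The function name and sample pairs are made up for illustration.

```python
from collections import defaultdict


def group_by_entity_type(tagged_terms):
    """Group (text, label) pairs into {label: [texts]}, keeping the
    first-seen order and dropping duplicates within each label."""
    groups = defaultdict(list)
    for text, label in tagged_terms:
        if text not in groups[label]:
            groups[label].append(text)
    return dict(groups)


# Illustrative pairs only; real ones would come from the NER step.
tagged = [
    ("Virginia Tech", "ORG"),
    ("Mexico", "GPE"),
    ("Guatemala", "GPE"),
    ("Mexico", "GPE"),
]
grouped = group_by_entity_type(tagged)
```

Terms without an entity label (the non-keywords question) could be passed with a placeholder label of our choosing so they still show up in the output.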
*Modeling work still to do in addition to anything determined above*
Take another look at the country-specific predictions and try to improve
nMIL results
Confidence Interval
CBP Conveyance/Transportation analysis (*push this to future work*)
*Items Due to ODNI*
Provide Alex an outline of the product
Turn slides into document report
Add a description/example of the output product to the final report