Notes from my discussion with Alex
In addition to any quantitative predictions we provide (total, by sector, and by country) in the current dashboard... Alex would like us to provide what demographics we expect to see an uptick from. (*Bursts*?)
- We could look for bursts of keywords by demographic
  - country
  - family type
  - sector
  - conveyance (????)
- We know that this is pretty reliable as a total and by sector, but we may need to validate others. Might be wise to develop a more objective validation methodology
  - e.g., What defines an uptick/burst in the encounter data (increase by X%?)
    - we tried using burst analysis but for some reason this didn't work
  - Then maybe we determine Accuracy/Precision/Recall
  - We will only be able to evaluate a limited number of observations (20-30)
- SHOULD WE DO BURST ANALYSIS? HOW MUCH? HOW SHOULD WE VALIDATE?

Alex liked the fastest-growing keywords from the EMBERS visualization (CAN WE PROVIDE THIS? CAN WE INCLUDE NON-KEYWORDS? RUN DQE?)

Alex liked the word cloud in the EMBERS visualization (WHAT COULD/SHOULD WE USE TO CREATE A WORD CLOUD?)

Alex liked the idea of "hot" keywords

Other aspects that could be included with "Hot" Keywords
- Group hot keywords by spaCy entity type, e.g., PERSON, NORP, FAC, ORG, GPE, etc.
- Can we include non-keywords?

*Modeling work still to do in addition to anything determined above*
- Take another look at the country-specific predictions and try to improve nMIL results
- Confidence Interval
- CBP Conveyance/Transportation analysis (*push this to future work*)

*Items Due to ODNI*
- Provide Alex an outline of the product
- Turn slides into document report
- Add a description/example of the output product to the final report

--
Brian Mayer
bmayer@cs.vt.edu
540-231-5907
Sanghani Center for Artificial Intelligence and Data Analytics https://sanghani.cs.vt.edu/ - Virginia Tech https://vt.edu/
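
One possible way to make the "increase by X%" uptick definition and the Precision/Recall check from the notes concrete is sketched below. This is only an illustration: the 25% threshold, the function names, the group labels, and all counts are hypothetical placeholders, not values from the encounter data or from the existing models.

```python
def is_uptick(previous, current, threshold_pct=25.0):
    """Return True if current exceeds previous by at least threshold_pct percent.

    threshold_pct stands in for the "X%" in the notes; 25 is an arbitrary placeholder.
    """
    if previous <= 0:
        # Percent change is undefined from a zero baseline, so this sketch
        # treats any positive count as an uptick (an assumption).
        return current > 0
    return (current - previous) / previous * 100.0 >= threshold_pct


def precision_recall(predicted, observed):
    """Compare predicted and observed uptick flags (parallel lists of booleans)."""
    pairs = list(zip(predicted, observed))
    tp = sum(p and o for p, o in pairs)        # predicted uptick, uptick observed
    fp = sum(p and not o for p, o in pairs)    # predicted uptick, none observed
    fn = sum(o and not p for p, o in pairs)    # missed an observed uptick
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical (previous, current) encounter counts per demographic group.
counts = {"group_a": (100, 140), "group_b": (200, 210), "group_c": (50, 80)}
observed = [is_uptick(prev, cur) for prev, cur in counts.values()]

# Hypothetical model output, in the same order as counts.
predicted = [True, True, False]
precision, recall = precision_recall(predicted, observed)
```

With only 20-30 observations to evaluate, each misclassified case would move these scores by several points, so they are probably best read as rough indicators.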