# AAAI2020

These are my compiled notes from AAAI 2020. I mainly attended sessions related to my research field and several neighboring fields:

• NLP and neural network, machine learning
• Explainable AI
• AI applied to societies

# Friday 0207

## Tutorial FA1: AI in Precision Medicine

• Instead of interpreting models in a black-box manner, we should build explainable models in medicine.

## Tutorial XAI

Their slides are at the website xaitutorial2020.github.io

## Workshop PPAI

Privately computing influence in regression models

• Compute an influence score that encourages participants to give higher quality data points.
• The influence score is not computable: approximate it.

Private learning for high dimensional targets with PATE

• PATE is for classification. How about image segmentation?
• Approach: PCA to lower dimensions, then you can do PATE.
• Details: The PCA dimension needs to be determined empirically. The activations also need to be bounded, so they used an activation function that projects the logits onto the unit ball.
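
The notes do not record which activation was used. As a minimal sketch of what "projecting onto the unit ball" can mean, assuming the L2 ball and a simple norm-based rescaling (the function name `project_to_unit_ball` is hypothetical):

```python
import math

def project_to_unit_ball(logits):
    """Rescale a vector onto the L2 unit ball if its norm exceeds 1."""
    norm = math.sqrt(sum(v * v for v in logits))
    if norm <= 1.0:
        # Already inside the ball: leave it unchanged.
        return list(logits)
    return [v / norm for v in logits]

inside = project_to_unit_ball([0.3, 0.4])   # norm 0.5, unchanged
outside = project_to_unit_ball([3.0, 4.0])  # norm 5.0, rescaled to norm 1
print(inside, outside)
```

Bounding the norm this way caps how much any single output can contribute, which is the kind of bound a privacy analysis typically needs.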

Plans that remain private, even in hindsight

• Want to protect robot information from the maintainer (adversary).
• Problem formulation: described a hierarchy of the maintainer’s knowledge (weakest: knows nothing; strongest: knows the exact plan)
• Method:
• Use bipartite graph (world state -> plan state) to represent the problems.
• Use rule-based methods (formal systems) to protect information.

A survey of differentially private GANs

• Decision Tree has a proof that the accuracy in each split operation does not become worse. Try to give a similar guarantee for differential privacy.
• Also gave a comparison of three approaches empirically.
• NoisyCounts
• LocalRNM

## Tutorial FP1: Differential Deep Learning on Graphs

Chengxi Zang. Tutorial website

1. Introduction: differential deep learning and its graph applications

Graph applications example:

• Network dynamics of social media.
• Traffic data.

Examples of DNN and differential equations:

• Residual Net -> Differential equations -> Neural ODE

• RNN -> Residual RNN -> differential equation RNN
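
To make the ResNet-to-ODE link concrete, here is a small sketch under my own assumptions: a residual update `h <- h + step * f(h)` is exactly one explicit Euler step of `dh/dt = f(h)`. The function `f` below is a hypothetical stand-in for a learned layer, chosen so the exact solution is known.

```python
import math

def f(h):
    # Hypothetical "layer": dh/dt = -h, with exact solution h(t) = h(0) * exp(-t).
    return -h

def residual_stack(h, num_blocks, step):
    # Each block is a residual connection h <- h + step * f(h),
    # which is also one explicit Euler step of dh/dt = f(h).
    for _ in range(num_blocks):
        h = h + step * f(h)
    return h

coarse = residual_stack(1.0, num_blocks=10, step=0.1)      # 10 "layers"
fine = residual_stack(1.0, num_blocks=1000, step=0.001)    # 1000 "layers"
exact = math.exp(-1.0)
print(coarse, fine, exact)
```

As the number of blocks grows and the step shrinks, the stack approaches the continuous solution, which is the intuition behind treating depth as time.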

Example of normalizing flow -> diff equations

• A normalizing flow is an invertible generative model.
• P(X) -> P(Z): inference. P(Z)->P(X): generation
• Flow -> residual flow -> differential equation flow.

NICE (Non-linear Independent Components Estimation, an invertible flow)

• NICE comes from Hamiltonian Systems
• NICE vs RealNVP
• NICE vs differential equations.
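
A minimal sketch of the additive coupling layer that NICE is built from, to show why it is invertible. The coupling function `m` here is a hypothetical fixed function; in NICE it would be a neural network.

```python
def m(x1):
    # Hypothetical coupling function; any function works, it is never inverted.
    return [2.0 * v + 1.0 for v in x1]

def forward(x1, x2):
    # y1 = x1, y2 = x2 + m(x1)
    y2 = [b + s for b, s in zip(x2, m(x1))]
    return x1, y2

def inverse(y1, y2):
    # x1 = y1, x2 = y2 - m(y1)
    x2 = [b - s for b, s in zip(y2, m(y1))]
    return y1, x2

x1, x2 = [1.0, -2.0], [0.5, 3.0]
y1, y2 = forward(x1, x2)
r1, r2 = inverse(y1, y2)
print(y2, r2)
```

Because one half passes through unchanged, the inverse only needs to recompute `m` and subtract it. The Jacobian of this map is triangular with ones on the diagonal, so its determinant is 1.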

We could also go the other way: convert differential equations into DNNs via numerical methods.

2. Molecular graph generation in drug discovery

Encoding a graph is hard. Decoding a graph is much harder (e.g., because of chemical constraints).

3. Learning dynamics on graphs

Goal: to predict temporal change or final states of complex systems

• Use differential equations (parameterized by neural networks) to model the system dynamics.

• Continuous-time dynamics prediction
• Structured sequence prediction
• Node classification / regression

Propose to use NDCN (neural dynamics on complex networks) as a unified framework to solve all three of them.
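
This is not NDCN itself. As a toy illustration of "dynamics on a graph as a differential equation", the sketch below integrates plain heat diffusion, dx/dt = -L x, with explicit Euler steps. The graph, step size, and function names are my own assumptions; in NDCN the right-hand side would be parameterized by a neural network.

```python
def laplacian_times(x, edges):
    # (L x)_i = sum over neighbours j of (x_i - x_j), for an undirected graph.
    out = [0.0] * len(x)
    for i, j in edges:
        out[i] += x[i] - x[j]
        out[j] += x[j] - x[i]
    return out

def simulate(x, edges, step, num_steps):
    # Explicit Euler integration of dx/dt = -L x.
    for _ in range(num_steps):
        lx = laplacian_times(x, edges)
        x = [xi - step * li for xi, li in zip(x, lx)]
    return x

edges = [(0, 1), (1, 2), (2, 3)]   # a 4-node path graph
x0 = [1.0, 0.0, 0.0, 0.0]          # all "heat" starts on node 0
x_final = simulate(x0, edges, step=0.1, num_steps=500)
print(x_final)
```

The total amount of heat is conserved while the node states converge to their common average, so the same integrator answers both "temporal change" and "final state" questions.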

4. Mechanism discovery in graphs

Goal: to find dynamical laws of complex systems

5. Conclusion: discussions and Future Directions

# Saturday 0208

## W11S: Evaluating Evaluation of AI systems

930-1030 Anna Rogers: The questions that the current AI can’t answer

• Models could still give good performances when inputting random embeddings.
• BERT etc., model performances drop when we paraphrase.
• Human evaluation results largely depend on the questions.

11-12 Paper session 1

Jose Hernandez-Orallo: AI Evaluation: On Broken Yardsticks and measurement scales

• AI measurement suffers from a moving-target phenomenon.

• A “challenge-solve-and-replace” dynamic of AI benchmarks.
• The moving target: five possible causes

• AI effect: whenever something is automated, it is not intelligence any more
• Superhuman abyss: What does it mean by saying “it goes beyond human performance”? There are many unfair extensions.
• Resource neglect: breakthroughs are obtained with huge resources in terms of data, compute, supervision, and others.
• Specialisation drift: tendency of AI research to specialise to a particular task, or to overfit to a benchmark.
• Cognitive judge problem: manual or automatic cognitive effort is needed to produce and verify instances, frequently relying on human labellers or crowdsourcing.
• How to improve the evaluation metrics? Divide tasks into different categories:
• Ceiling. Ground truth is human intelligence. E.g., Turing Test, human voice generation.
• Projectional. Once AI reaches human performance, the projected score is meaningful.
• Transitional. Add dimensions once human performance is reached. E.g., Add noise to ImageNet.
• Universal extrapolation. E.g., Predict brain tumor 5 years before.
• For each task, the measurement metrics (Table 1).
• Could also measure from multiple dimensions.

Christopher Pereyda and Lawrence Holder: Measuring the relative similarity and difficulty between AI benchmark problems

• A similarity framework to describe task similarity.
• Evaluate this framework, by DNN performances on foundational datasets (Classification: MNIST, Fashion-MNIST, C10, C100, CP. RL environments: misc.).

Azamat Kamzin, Prajwal Paudyal, Ayan Banerjee and Sandeep K.S.Gupta: Evaluating the gap between hype and performance of AI systems

• First look at the task, and then come up with evaluation metrics.

• Five best practices for evaluating AI systems that can potentially address and explain the gap between hype and practically observed performance of AI systems.

• BP1: Transparency on the effects of evaluation methodologies on participant bias
• BP2: Disclosure of priorities of evaluation criteria
• BP3: Choosing robust metrics
• BP4: Evaluation in terms of explainable concepts
• BP5: Iterative evaluation on adaptive human AI interaction

Chris Welty, Praveen Paritosh, Lora Aroyo: Metrology for AI: From benchmarks to instruments

• Would you use a yardstick to measure the thickness of a piece of paper?
• 35% of empirical paper authors don’t believe error bars to be important.
• Learn from the metrology community.
• Reproducibility is very important. Reproducibility -> Repeatability.

12-13 Odd Erik Gunderson: How can we know it is shoulders we stand on? Reproducibility and evaluation

14-1430 Paper session 2

Botty Dimanov, Umang Bhatt, Mateja Jamnik and Adrian Weller: You shouldn’t Trust Me: Learning models which conceal unfairness from multiple explanation methods

• Message: feature importance contains little information about model fairness.
• Motivating example: while the importance of the features is unchanged, a modified model could give maximal violation of demographic parity.
• Propose a method to modify the classifiers.
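
For reference, demographic parity compares the rate of positive predictions across groups. A minimal sketch of the gap between groups, with hypothetical toy data and function names of my own:

```python
def positive_rate(predictions, groups, group):
    # Fraction of positive (1) predictions within one group.
    selected = [p for p, g in zip(predictions, groups) if g == group]
    return sum(selected) / len(selected)

def demographic_parity_gap(predictions, groups):
    # Largest difference in positive-prediction rate between any two groups.
    rates = [positive_rate(predictions, groups, g) for g in sorted(set(groups))]
    return max(rates) - min(rates)

preds = [1, 1, 1, 0, 0, 0, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = demographic_parity_gap(preds, groups)
print(gap)
```

A gap of 0 means every group receives positive predictions at the same rate. The point of the paper is that this quantity can be large even when feature-importance explanations look harmless.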

Deborah Raji and Genevieve Fried: About Face: Tracking shifting trends of facial recognition evaluation

• Facial recognition + fairness

• Different eras have different data sources, technologies, and demographic information.

1430-1530 Sam Henry. Amazon Augmented AI: People + AI = Magic

• Background
• Humans continuously review for ML auditing
• Combining humans and ML is hard
• Product: Amazon Augmented AI (A2I)
• Easily implement human review of machine learning predictions
• BTW: \$4000 funding for collecting datasets.

1600-1645 Paper Session 3

Stefano Alletto, Shenyang Huang, Vincent Francois-Lavet, Yohei Nakata and Guillaume Rabusseau: RandomNet: Towards fully automatic neural architecture design for multimodal learning

Julian Niedermeier, Goncalo Mordido and Christoph Meinel: Improving the evaluation of generative models with fuzzy logic

Huiyuan Xie, Tom Sherborne, Alexander Kuhnie and Ann Copestake: Going beneath the surface: Evaluating image captioning for grammaticality, truthfulness, and diversity

1645-1730: Panel: perspectives on self-evaluation. Should researchers, groups, companies evaluate their own work, or is a more adversarial approach necessary to achieve evaluation quality?

# Sunday 0209

## Document Analysis and Understanding

• Approach: Differentiable Binarization. Compute segmentation map and threshold map, then convert to binarization map to get detection results.
• Experiment on five benchmark datasets of scene text.
• Also compared speed of experiments.
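
As I understand it, the "convert to binarization map" step replaces a hard threshold with a steep sigmoid of the difference between the segmentation (probability) map and the threshold map, so that gradients can flow through it. A per-pixel sketch, where the amplification factor `k=50` is an assumption on my part:

```python
import math

def differentiable_binarization(prob, thresh, k=50.0):
    # Approximate step function: close to 1 when prob > thresh, close to 0 otherwise.
    return 1.0 / (1.0 + math.exp(-k * (prob - thresh)))

text_pixel = differentiable_binarization(prob=0.9, thresh=0.3)
background_pixel = differentiable_binarization(prob=0.1, thresh=0.3)
boundary_pixel = differentiable_binarization(prob=0.5, thresh=0.5)
print(text_pixel, background_pixel, boundary_pixel)
```

A large `k` makes the output nearly binary while keeping it differentiable with respect to both maps.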

• Proposed a Shape Transformation Module

• Scene text detection
• Approach:
• Guided training: GCN+CTC Decoder
• Inference: Only keep the GCN stream.

## Interpretable AI

Delleiger and Vreeken, Explainable Data Decompositions

• Define the pattern composition problem with a regularized max likelihood.

• D is a set of data points. S is the set of patterns.

• Fast method for discovering pattern components & compositions.

• Propose the nCor: neighbor correlation coefficient
• Measure the local continuity of the reordered data points to quantify the strength of the global association between variables.
• Propose I-GOS: optimizes for a heatmap, so that the classification scores on the masked image would maximally decrease.
• Train on integrated gradients instead of individual gradients, so it’s less likely to get trapped in local minima.
• Network design:
• Question Net: select questions from a given set
• Predict Net: produce judgment results based on the answers.
• Design reward functions to minimize the number of questions asked.

Tomsett et al., Sanity checks for saliency metrics

• Investigate the properties of different approaches for measuring the fidelity of saliency map explanations for image classifiers.
• Propose a corresponding set of sanity checks for saliency metrics based on measures of reliability from psychometric testing literature.
• Psychometric test reliability is usually estimated in four separate ways (Peter 1979). Neural network as the agent, saliency methods as a battery of psychometric tests.
• Inter-rater reliability: For each image, check the agreement of saliency metric scores.
• Inter-method reliability: Assesses the degree to which test scores are consistent when there is a variation in the methods / instruments used.
• Internal consistency reliability: Degree to which different methods intending to measure the same concept produce similar scores.
• Test-retest reliability. (not relevant for saliency methods)

Virani et al., Justification-based reliability in ML

• Extend the Justified True Belief in epistemology, to characterizing the validity and limits of knowledge in supervised classifiers.
• An epistemic classifier characterizes the region where it is confident and not. The goal is to train an epistemic classifier.
• JTB theory: knowledge = “justified + true” belief.
• “justified”: gathers evidence using the training set for each individual test input x in the layers of NN classifier, where:
• an unambiguous truth state in the neighborhood of x in embedded spaces provide support to the belief
• allowing the model to declare “I know it”, “I might know”, or “I don’t know” (output classification results from either “IK”, “IMK” or “IDK”).

## NLP session

• Identify the important words
• Change the words to the most semantically similar and grammatically correct words until the prediction is altered.

Sahin and Gurevych, Investigating invertible NNs for inverse problems in morphology

• Task: morphological inflection and lemmatization
• Use Invertible Neural Networks to optimize both direction problems simultaneously

Aguilar et al., Knowledge Distillation from internal representations

• Distill the BERT representations into a simplified BERT version (with fewer layers).
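
The paper distills internal representations, which I did not write down in detail. For background only, here is a sketch of the generic soft-target distillation loss (KL divergence between temperature-softened teacher and student distributions). The temperature value and function names are assumptions, and this is not the paper's objective.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [v / temperature for v in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL(teacher || student) on temperature-softened distributions.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

teacher = [4.0, 1.0, 0.5]
matching_student = [4.0, 1.0, 0.5]
poor_student = [0.5, 1.0, 4.0]
print(distillation_loss(matching_student, teacher), distillation_loss(poor_student, teacher))
```

The loss is zero when the student reproduces the teacher's distribution and grows as they diverge.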

## ML: Neural Nets etc.

Tian et al., Network as regularization for training DNN

• Propose Network as Regularization (NaR) - use an auxiliary network to dynamically incorporate guided semantic disturbance to the labels.
• Target-based regularization: instead of focusing only on the primary class, the model also pays some attention to the other classes. Many existing regularization methods adopt prior knowledge as a regularization term in the loss function.
• How is the auxiliary network trained? Independently, on the ground-truth label y.
• Take a convex combination of the two networks’ outputs as noise, then add this noise to the labels used to train the target network.
• Good results on some datasets in comparison to baselines.

## ML: Causal Learning and Bayes Nets

• A unified framework for counterfactual risk minimization based on the DRO (distributionally robust optimization) of policies.
• Dose Response Network (DRNet) architecture. Lower layers are shared, but higher layers branch into different heads. Assign heads corresponding to the dosage strata.
• Problem setting (motivating example). E.g., Dataset (X,Y) and (Y,Z) are separately collected. We know X->Y, and Y<->Z. Want to find if there are common causes between X and Z. But these two datasets are not measured at the same conditions. Need some bivariate causal discovery methods.
• Propose an approach that outperforms previous bivariate causal discovery algorithms.
• Propose Sequential Score Test (SST)
• Can control type I error under continuous monitoring and detect multi-dimensional heterogeneous treatment effects.
• Compare to SOTA online test (mSPRT).

## ML: Fairness and Privacy, Learning Theory

Davidson and Ravi., Making existing clusterings fairer: algos, complexity results, and insights

• Formulate the minimal cluster modification for fairness (MCMF) problem.
• Input: a given partitional clustering
• Goal: minimally change it, so that the clustering is still of good quality and fairer.
• Goal: define non-exempt discrimination, while exempting discrimination due to critical features.
• Tool: partial information decomposition (PID) from information theory
• Counterfactual causal influence & counterfactual fairness
• Decomposition of total discrimination into four non-negative components (exempt and non-exempt visible discrimination, exempt and non-exempt masked discrimination)
• An impossibility result (no purely observational measure of non-exempt discrimination can satisfy all our desirable properties)
• Observational relaxations

Harder et al., Interpretable and differentially private predictions

• Trade-off between interpretability, privacy, and accuracy.
• Propose a family of interpretable models (“LLM, locally linear maps”), accounting for privacy and interpretability.
• Provide DP “local” and “global” explanations on classification.
• Use random projections (Johnson-Lindenstrauss transform) to better deal with privacy and accuracy trade-off.
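
How the paper plugs the projection into its privacy analysis is not recorded here. As a reminder of what a Johnson-Lindenstrauss style random projection does, this sketch (dimensions, seeds, and names are my own choices) maps 500-dimensional points to 200 dimensions with a scaled Gaussian matrix and checks that a pairwise distance is roughly preserved.

```python
import math
import random

def random_projection_matrix(in_dim, out_dim, seed=0):
    # Entries are N(0, 1) scaled by 1/sqrt(out_dim), so squared norms are preserved in expectation.
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)] for _ in range(out_dim)]

def project(matrix, x):
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]

def distance(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

rng = random.Random(1)
a = [rng.gauss(0.0, 1.0) for _ in range(500)]
b = [rng.gauss(0.0, 1.0) for _ in range(500)]
R = random_projection_matrix(in_dim=500, out_dim=200)
ratio = distance(project(R, a), project(R, b)) / distance(a, b)
print(ratio)  # expected to be close to 1
```

Working in fewer dimensions means fewer quantities need noise added, which is one plausible reason it helps the privacy/accuracy trade-off.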

Wang and Zhou, Differentially private learning with small public data

• Use public database to adjust privacy budget & gradient clipping of SGD training of private training database. Also use public database to fine-tune the model.
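
The two knobs mentioned above (gradient clipping and privacy budget) both appear in the standard DP-SGD step, sketched below. This is the generic recipe, not the paper's public-data-driven adaptation, and the clipping norm and noise multiplier values are hypothetical.

```python
import math
import random

def clip(grad, max_norm):
    # Scale the gradient down so its L2 norm is at most max_norm.
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, max_norm / norm) if norm > 0.0 else 1.0
    return [g * factor for g in grad]

def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    # Clip each example's gradient, sum, add Gaussian noise, then average.
    clipped = [clip(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[d] for g in clipped) for d in range(dim)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]

grads = [[3.0, 4.0], [0.3, 0.4]]
rng = random.Random(0)
update = private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng)
print(update)
```

A smaller clipping norm reduces the noise needed but distorts large gradients, so tuning it on public data avoids spending privacy budget on the search.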

# Monday 0210

## Games: Description Languages and NLP

Goldwaser and Thielscher, Deep RL for general game playing

• General Game Playing: game rules are provided only at runtime. Agents are allowed some start-up time to analyze the game strategies.
• Game Description Language (GDL) logical programming
• Previous approaches: Construct propositional networks
• Proposed approach: GCP with reinforcement learning
• Limitations for AlphaZero: Needs a handcrafted neural network, zero sum, turn-based, two-player, play-specific. How to remove these limitations?
• GDL input -> Propositional network->DNN->RL, computing expected reward for move probabilities of multiple players’ outputs.
• The DNN is shared (for common feature extraction)
• Each player has their own move space. Move simultaneously.
• Automatically generated NN
• Expected reward allows cooperative strategies.
• Evaluation method:
• Compared to UCT: run both with equal time per move, and run with equal number of simulations
• Background: baseline narrative planning.
• Aim of this work: approach to assist authors / reduce authoring burden via a semi-automated route to baseline narrative planning model development.
• Planning model acquisition from text summaries
• From input natural language sentences to baseline domain model
• Segmentation -> Object identification -> Reference co-resolution -> Narrative domain model acquisition (target PDDL representation)
• Domain model acquisition: the output is a PDDL model (Planning domain definition language. McDermott et al., 1998)

Hausknecht et al., Interactive Fiction Games: A Colossal Adventure

• Interactive Fiction games are fully text-based simulation environments where a player issues text commands to effect change in the environment and progress through the story.
• Challenges IF games provide for agents:
• Combinatorial Action Space
• Commonsense Reasoning
• Knowledge Representation
• Introduce Jericho: learning environment for 56 man-made IF games across the spectrum of difficulty.
• Reward / game over / victory detection
• Save / load game states
• Walkthroughs, vocabulary identification, action templates, valid action detection…
• Formulate a “game character auto-creation” framework: predict a large set of physically meaningful facial parameters under a self-supervised learning paradigm, according to the input face photo.
• Previous methods:
• 3D reconstruction (morphable face models of 3D reconstructions vs. bone-driven face models in RPG games)
• F2P (face to parameters): iteratively adjust params at the input end of the renderer by gradient descent. Slow.
• Propose network:
• Translator T and Imitator G. Optimize using a cycle-consistency loss.
• Also has segmentation networks, etc. See orig paper for their pipeline.
• Evaluation:
• Two games: Justice and Revelation.
• Subjective evaluation from 15 volunteers.
• Ablation studies on several loss components.
• Previous models (VAE, GAN, Conv-Seq2Seq) didn’t incorporate attributes and prior knowledge of the story genre.
• Proposed method: combine deep generation networks with character modeling (allocate a consistent character to a story, via encoding it into the distributed embedding)
• Decompose the story generation into two steps:
• First, determine the character’s reaction to the current situation at each time step
• Second, generate a complete sentence by incorporating the character embedding, predicted action, and the situation information.
• Experiment:
• Dataset: corpus of movie plot summaries extracted from Wikipedia
• Baseline models: Conditional LM, S2S with attention, incremental S2S with attention, plan-and-write, event representation, hierarchical convolution sequence model.
• Evaluation metrics: BLEU, Perplexity, Human Evaluation.

Fan et al., Generating Interactive Worlds with Text

• Task: generate cohesive and interesting game environments.
• Investigate an ML approach for world creation, using the LIGHT game environment.
• Use NN-based models to compositionally arrange locations, characters, and objects into a coherent whole.
• Main challenge of story generation:
• Consistency
• Current SOTA solution: exploit intermediate representations (keywords, events, skeleton) to provide better guidance for the story generation process. Risky + Complex
• Diversity
• VAE with noise (also has drawbacks and randomness)
• Proposed model
• Consistency: use hierarchical structure
• Diversity: CVAE model + Multi-pass generation.

## Application: Human Modeling

(Beekman, 1115-1230)

• Introduced Variational Pathway Reasoning (VPR) method to EEG emotion recognition.
• A salient pathway reasoning method is proposed:
• EEG -> {Random walk -> pathway candidates -> Sequence modeling -> Pathway codes} (pathway sampling and coding) -> Salient pathway reasoning
• How to do salient pathway reasoning? Sparse variation scaling, pseudo salient pathways, etc…. -> Full conv layer
• Experiments:
• Accuracy vs benchmarks
• Visualization of salient pathways.
• Input: image, speech, etc. Output: perceived emotion.
• Previous approaches:
• Unimodal emotion recognition. Either of facial, speech, body gestures, or physiological features.
• Multimodal emotion recognition. Early fusion, dynamic fusion graphs, tensor fusion networks, etc.
• Approach: M3ER
• Modality check step with CCA. Discard some ineffective modalities with some heuristics.
• Proxy feature generator. Generate proxy feature vectors for the ineffectual modalities using a linear transformation.
• Multiplicative combination step.
• Dataset:
• IEMOCAP Dataset (Busso et al., 2008). 3 modalities of 10 actors.
• CMU MOSEI dataset (Zadeh et al., 2018)
• Challenges in emotion recognition:
• Large intra-class variation: affective gap
• Low structured consistency: resolutions and blurring noises
• Sparse keyframe expression. Only limited keyframes directly convey and determine emotions.
• Propose visual-audio attention network (VAANet), to study the emotion recognition task in user-generated video in an end-to-end manner.
• Polarity-consistent cross entropy loss (increase the penalty coefficient if the polarity of prediction is different from the ground truth label).
• (See Figure 2) Use instance-adaptive branch to achieve the dynamic graphic connections.
• Then process EEG by multi-level and multi-graph convolution, graph coarsening, region dependency modeling, then FC and softmax.
• Use real+generated data and gait-based affective features.
• Generate with conditional VAE.
• Use novel emotion-gait dataset.
• Problem: want to generalize from one security game to another against the same adversary
• Adversary has a fixed but unobserved attractiveness function
• Goal: learn the attractiveness function to maximize defender utility.
• Using a predict then optimize approach may not maximize utility
• This paper: learn a predictive model using a game-focused loss function
• Maximize a surrogate for decision quality.
• Result: better performance with less data.
• Need more sensors to recognize human activities, but how to balance sensor numbers and the cost?
• Alternately minimize classification loss and sensor number
• MDP-based sensor model, sensor-adaptive activity model, Mutual DAgger imitation learning.

## ML Unsupervised and Semi-supervised Learning, clustering 2

Chen et al., Multi-view clustering in latent embedding space

• Background: global similarity learning
• Learn a similarity matrix S, where s_ij is the similarity between sample i and j
• Propose MCLES approach: optimize Eq (9).
• Evaluate on ACC, NMI, and PUR against various baselines.
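
Of the three metrics, purity (PUR) is the simplest to state: assign each cluster its majority class and count the fraction of samples that agree. A minimal sketch with hypothetical toy labels:

```python
from collections import Counter

def purity(cluster_ids, class_labels):
    # Group the true labels by predicted cluster.
    by_cluster = {}
    for c, y in zip(cluster_ids, class_labels):
        by_cluster.setdefault(c, []).append(y)
    # Count the majority class in each cluster and normalize by the sample count.
    majority_total = sum(Counter(ys).most_common(1)[0][1] for ys in by_cluster.values())
    return majority_total / len(class_labels)

clusters = [0, 0, 0, 1, 1, 1]
labels = ["x", "x", "y", "y", "y", "y"]
score = purity(clusters, labels)
print(score)
```

Purity rewards many small clusters (one cluster per sample scores 1.0), which is why it is usually reported alongside ACC and NMI.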

## ML Unsupervised and semi-supervised learning, clustering 3

(Concourse A, 1400-1515)

• The current linear combination-based multi-view spectral clustering framework could: (1) limit the representation capacity of the learned Laplacian matrix, and (2) insufficiently explore the high-order neighborhood information among data
• Provide a flexible optimal Laplacian matrix construction mechanism to solve the aforementioned issues.
• Seeks an optimal Laplacian matrix L* in the neighborhood of the linear combination of both first-order and high-order base Laplacian matrices.
• Experimental study on eight benchmark datasets.

Sun et al., Lifelong spectral clustering

• Spectral clustering in lifelong learning setting
• Two common knowledge libraries: orthogonal basis library B and feature embedding library F.
• Propose RSwMPC.
• Subspace learning is performed by adding a projection matrix. Feature selection and noise suppression are achieved by introducing the l_2,1-norm penalty term of the projection matrix.
• Parameter-free self-weighted strategy.
• Experiments on synthetic and real-world datasets.
• Can second order method (K-FAC) improve the critical batch size problems?
• K-FAC also significantly deviates from ideal strong scaling behaviour when training beyond a critical batch size.
• Experiments on CIFAR-10 and SVHN.
• Motivation: outlier detection. feature selection. Usually these two steps are done separately or iteratively.
• Solution: ODEFS: Integration of feature selection, outlier candidates selection, and outlier detection.

Deep Embedded Non-redundant clustering

• What is non-redundant clustering?
• Deep clustering techniques cluster in the latent space.
• Jointly learn the class-specific clusters of representations in latent space using a non-redundant clustering layer.
• IVFS is inspired by random subset method.
• Proposed algorithm can provide satisfactory performance under a sharp sub-sampling rate.
• Motivation:
• exploit one-to-one correspondence
• exploit feature-level ordinal information
• Method: self-paced joint learning model

(Murray Hill, 1545-1715)

Lee et al., Residual Continual Learning

• What is continual learning? Learn multiple tasks sequentially, without forgetting.
• Linear combination of source network and target network, to form a combined network.

Chandak et al., Lifelong Learning with a changing action set

• (Outstanding student paper award, honorable mention)
• Lifelong MDP setup:
• How do we capture the notion of underlying structure in action space?
• How does the action set change?
• What are the characteristic properties of a changing action set?
• Proposed solution:
• A policy parameterization that is invariant to the cardinality of the action set.
• A new objective function which can be used to update the agent when new actions are introduced.
• Proposed method:
• What if we could infer the underlying structure in the action space and whenever a new action is introduced we associate its behavior using the inferred underlying structure?
• Policy parameterization is critical.
• Longitudinally ensemble the ghost networks.
• Dropout erosion; skip connection erosion
• Evaluate on the NeurIPS 2017 adversarial challenge.
• Aim to optimize the attribute space in the context of ZSL, and leverage the inter-class relations to generate more powerful attribute vector per class.
• APNet: propagate the attributes of every class on the category graph to its neighbors in order to generate attribute vectors.
• Inspired by belief propagation, message passing and label propagation. Also related to GNN.
• Problem: the data are collected in the stream. New class hides in unlabeled data.
• Solution: semi-supervised learning framework “SEEN” method:
• Add the data point to a buffer if it likely belongs to a new class (as determined by a detector).
• How to add the data points from the buffer into corresponding category, if a new label is introduced?
• SeenLP for known class classification, using label propagation.

# Tuesday 0211

## Fairness and Equality in Economic Paradigms

(930-1045, Bryant)

• Focus: personalized recommendations on two-sided platforms
• Terminology:
• Exposure of producers
• Relevance score
• Utility of recommendation
• Tested types of recommendation updates
• Change in relevance scoring model (Amazon-M. Collaborative filtering)
• Addition of new data (Amazon-D). Factorization model
• 69-98% of producers face 100+% change in their exposure / visibility immediately due to the updates. Sudden change in exposure is undesirable.
• An update is incrementally fair if the difference between the exposure distribution is smaller than epsilon.
• Fairness guarantee: need to guarantee a min utility for customer.
• Method: formulate into an integer programming problem.
• Baseline methods:
• Canary deployment (phased roll out) (some customers will get less utility than others; unfair to customers.)
• Intermediate Relevance Function (does not consider producer or customer)
• Resource allocation of items. Some items are indivisible. So that no agent envies another one.
• (An agent envies another one if the other one has more value than itself)
• Envy-freeness in social networks:
• Mostly unreasonable to assume agents know complex assignment.
• Local envy-freeness. Goal: allocate so that no agent directly envies another one.
• Global envy-freeness: a stronger assumption.
• Envy-free allocation is hard:
• In a complete social network, allocating resources in a locally envy-free way is NP-hard.
• For edgeless social network, doing it in envy-free way is also NP-hard
• Parameterized complexity theory:
• Instead of measuring asymptotic complexity in terms of instance size n, additionally consider a parameter k.
• Resource and agent types
• Social network types
• Results:
• On their own, restricting the social graph structure or the number of types does not yield polynomial-time solvability.
• Together, restricting the social graph structure and the number of types yields polynomial-time solvability.
• Future work: Parameterized complexity wrt the number of resource types for complete social networks?
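
While finding a locally envy-free allocation is hard, checking a given one is easy. A sketch of the check, assuming additive valuations and the usual reading of envy (an agent values a neighbour's bundle more than its own); the data structures and names are hypothetical.

```python
def locally_envy_free(valuations, allocation, friends):
    """valuations[a][item] -> value, allocation[a] -> set of items, friends[a] -> neighbours of a."""
    def value(agent, bundle):
        return sum(valuations[agent][item] for item in bundle)

    for agent, neighbours in friends.items():
        own = value(agent, allocation[agent])
        for other in neighbours:
            # The agent only compares itself with agents it can see in the social network.
            if value(agent, allocation[other]) > own:
                return False
    return True

valuations = {
    "a": {"x": 3, "y": 1, "z": 2},
    "b": {"x": 3, "y": 2, "z": 1},
    "c": {"x": 1, "y": 1, "z": 3},
}
allocation = {"a": {"x"}, "b": {"y"}, "c": {"z"}}
sparse_network = {"a": ["c"], "b": [], "c": ["a"]}
complete_network = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
print(locally_envy_free(valuations, allocation, sparse_network))
print(locally_envy_free(valuations, allocation, complete_network))
```

In this toy instance agent `b` prefers `a`'s item, but that only counts as envy when `b` can observe `a`, which illustrates why the local notion is weaker than the global one.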

Abebe et al., Subsidy Allocations in the presence of income shocks

• Background: poverty, income, shock
• Min-max via binary search.
• An optimal solution: min-sum objective, using knapsack.
• There is a gap between our alg and the income subsidy method which only focuses on income.
• How different are min-sum and min-max objectives? Entirely different solutions.
• Income vs wealth subsidy?

Tziavelis et al., Fair procedure for fair stable marriage outcomes

• Problem setting: two-sided market where each agent ranks those on the other side by preference.
• Stable marriage: find a perfect matching, such that no pair of agents prefer each other to their matches.
• Gale-Shapley alg is not good enough (optimal match for one side, but pessimal one to the other side)
• Gale-Shapley extension: two-sided proposals.
• Fair procedures that reach an equitable stable marriage in cubic time.
• Key idea:
• Sort users and tasks together.
• Maintain a sub-graph that always has a right-perfect matching
• Incentive compatible, budget feasible, individually rational, computationally efficient, 2-approximate.
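
For reference, a sketch of the classical one-sided Gale-Shapley procedure mentioned above (not the paper's fair variant). Running it with each side proposing on a small hypothetical instance shows the asymmetry: the proposing side gets its best stable partner.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    # rank[r][p] = position of proposer p in receiver r's preference list (lower is better).
    rank = {r: {p: i for i, p in enumerate(prefs)} for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}
    engaged_to = {}  # receiver -> proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(r)
        if current is None:
            engaged_to[r] = p
        elif rank[r][p] < rank[r][current]:
            # The receiver trades up; the previous partner becomes free again.
            engaged_to[r] = p
            free.append(current)
        else:
            free.append(p)
    return {p: r for r, p in engaged_to.items()}

side_one = {"A": ["x", "y"], "B": ["y", "x"]}
side_two = {"x": ["B", "A"], "y": ["A", "B"]}
one_proposes = gale_shapley(side_one, side_two)
two_proposes = gale_shapley(side_two, side_one)
print(one_proposes, two_proposes)
```

Both results are stable matchings, yet each gives every proposer its first choice and every receiver its last, which is the unfairness the paper sets out to fix.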

## Human and AI 3

(Clinton, 1115-1230)

Evertsz and Thangarajah, A framework for engineering human/agent teaming system

• How to engineer MAS (multi-agent system) to support human team members?
• Identify key human/agent teaming parameters
• Foster team transparency
• Methodology for engineering human/agent teams
• What artefacts to represent
• How to represent these artefacts
• TDF: Tactics development framework
• Runtime
• Retrieval based interactive systems
• Common pipeline:
• Environment + Action of RL agents.
• Drawback: most existing retrieval functions are optimized over precision at top ranks. -> The RL agent does not have a global view of the problem.
• Non-differentiability of retrieval functions.
• Dynamic Search:
• Background: Multi-turn information seeking
• Key idea:
• The interactions with a human user.
• To represent segments:
• Option 1: Doc2vec (unable to handle the crowding problem: lower dimensional space may not be able to accommodate all the datapoints from higher dimensional space. Documents may collapse.)
• Option 2: t-SNE.
• Differentiable retrieval function (so we can optimize a value network)
• Train the policy network using proximal policy optimization.
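
As a reminder of what proximal policy optimization maximizes, here is a sketch of its clipped surrogate objective on a batch of probability ratios and advantages. The clipping range `epsilon=0.2` is a commonly used default and an assumption here, not something noted from the talk.

```python
def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    # ratios[i] = new_policy_prob / old_policy_prob for the action actually taken.
    terms = []
    for r, a in zip(ratios, advantages):
        clipped_r = max(1.0 - epsilon, min(1.0 + epsilon, r))
        # Take the more pessimistic of the unclipped and clipped estimates.
        terms.append(min(r * a, clipped_r * a))
    return sum(terms) / len(terms)

objective = ppo_clip_objective(ratios=[1.5, 0.5], advantages=[1.0, -1.0])
print(objective)
```

The clipping removes the incentive to move the policy far from the one that collected the data, which keeps each update step small.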

De et al., Regression under human assistance

• Motivation:
• Societies rely on human experts for important decisions
• Timeliness and quality of the human decisions are often compromised by the large number of decisions and the shortage of human experts.
• Let machines automate this procedure.
• ML is still worse than humans in many cases.
• Goal: develop ML models that are optimized to operate under different automation levels.
• Take decisions for a given fraction of the instances, and defer to humans for the rest.
• During training, optimize the machine for this scenario.
• Method:
• Ridge regression. See equation (1).
• Ridge regression becomes a combinatorial problem.
• This is NP hard
• Express log l(S) as a difference of submodular functions. Use a heuristic iterative algorithm for submodular optimization.
• Evaluation: This greedy alg consistently outperforms the baselines across almost the entire range of automation levels on all datasets.
• Explain decisions made by a CNN
• The features that humans zoom in on when they imagine an alternative to a model prediction: “fault-lines”.
• Identify the “fault-line”, the minimal semantic-level features that need to be added / deleted from input image, in order to alter the classification category of image to another specified class.
• First select highly influential superpixels, and then apply K-means clustering with outlier removal, to form clusters (“explaining concept”).
• Select the fault-line explaining concepts.
• Task: event detection on microposts. Given positive labels, unlabeled data. Goal: a binary classifier
• Extract informative keywords and estimate their expectation.

## Cognitive Modeling

• Motivation
• Human number sense: the cognitive processing of numbers and mathematics. Humans are able to perform induction over number symbols, are competent in problem solving, and have vision-based cognitive capacity.
• Dataset
• The MNS dataset: test number sense directly from pixel input.
• Requires adaptive hierarchical representation based on context
• Focus on reasoning and understanding, rather than recognition.
• Investigate number sense from a cognitive perspective instead of a clinical perspective.
• Dataset generation
• Generate MNS by parsing and sampling an And-Or Graph (see figure 2)
• Problem types: combination, composition, partition.
• Layout component serves as problem context.
• Experiments and analysis
• Neural Network models
• Symbolic search based models.
• Related Work:
• educational psychology, machine IQ and analogy
• Imagine going on a trip
• Partial planning via soft planning
• Control planning with state-specific inverse temperatures (e.g., attention) over represented states.
• Partial planning + information theoretic costs

Rostami et al., Generative continual concept learning

• Inspired by the Parallel Distributed Processing learning and Complementary Learning Systems theories, develop a computational model that is able to expand its previously learned concepts efficiently to new domains using a few labeled samples.