|Welcome and Opening Remarks
Data Platforms for Designing, Acquiring, and Integrating Data for Valuable Knowledge Discovery
|Process and Technologies for Data Exchange
Takehiro Tezuka, Lihua Wang, Takuya Hayashi, Seiichi Ozawa
A Fast Privacy-Preserving Multi-Layer Perceptron Using Ring-LWE-Based Homomorphic Encryption
Concerns about privacy leakage have kept so-called big data from being used to full advantage. Nevertheless, privacy-preserving data analysis remains a promising direction in big data analysis. In this paper, we propose a Privacy-Preserving Multi-Layer Perceptron (PP-MLP) that computes predictions in real time using Ring-LWE-based homomorphic encryption. We implement the proposed PP-MLP as a two-party model consisting of a client and a server. The client encrypts the input data and receives the classification result from the server, while the server performs prediction over the encrypted data. This scheme lets the client obtain prediction results while hiding the actual data contents, including private information, from the server. The proposed PP-MLP makes fast predictions, requiring at most 80 ms per input, without a significant drop in classification accuracy compared with a conventional multi-layer perceptron operating on plaintexts.
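The two-party flow described above (client encrypts, server computes on ciphertexts, client decrypts) can be illustrated with a toy sketch of one linear layer. This is not the authors' PP-MLP: it uses plain LWE instead of Ring-LWE, tiny and insecure parameters, integer weights, and hypothetical names (`keygen`, `encrypt`, `encrypted_dot`, and so on). Nonlinear activations are outside the scope of this sketch.

```python
import random

# Toy parameters, chosen only for illustration. NOT secure.
N = 16          # secret-key dimension
Q = 2 ** 32     # ciphertext modulus
T = 1024        # plaintext modulus; decoded values are centred in [-T/2, T/2)
DELTA = Q // T  # scaling factor that puts the message in the high-order bits


def keygen(rng):
    """Client side: sample a secret key vector s in Z_Q^N."""
    return [rng.randrange(Q) for _ in range(N)]


def encrypt(sk, m, rng):
    """Client side: Enc(m) = (a, <a, s> + e + DELTA * m mod Q) with small noise e."""
    a = [rng.randrange(Q) for _ in range(N)]
    e = rng.randint(-4, 4)
    b = (sum(ai * si for ai, si in zip(a, sk)) + e + DELTA * (m % T)) % Q
    return a, b


def decrypt(sk, ct):
    """Client side: remove <a, s>, round away the noise, and map to a signed value."""
    a, b = ct
    noisy = (b - sum(ai * si for ai, si in zip(a, sk))) % Q
    m = ((noisy + DELTA // 2) // DELTA) % T
    return m - T if m >= T // 2 else m


def add(ct1, ct2):
    """Server side: Enc(m1) + Enc(m2) decrypts to m1 + m2 (mod T)."""
    (a1, b1), (a2, b2) = ct1, ct2
    return [(x + y) % Q for x, y in zip(a1, a2)], (b1 + b2) % Q


def scale(ct, w):
    """Server side: multiply a ciphertext by a plaintext integer weight w."""
    a, b = ct
    return [(w * x) % Q for x in a], (w * b) % Q


def encrypted_dot(cts, weights, bias):
    """Server side: compute Enc(sum_i w_i * x_i + bias) without seeing any x_i."""
    acc = ([0] * N, (DELTA * (bias % T)) % Q)  # noiseless "trivial" encryption of bias
    for ct, w in zip(cts, weights):
        acc = add(acc, scale(ct, w))
    return acc


if __name__ == "__main__":
    rng = random.Random(0)
    sk = keygen(rng)                                   # stays with the client
    cts = [encrypt(sk, x, rng) for x in [3, 1, 4, 1]]  # sent to the server
    out = encrypted_dot(cts, [2, -1, 0, 5], bias=-3)   # server knows only weights
    print(decrypt(sk, out))                            # 7
```

Each scaling multiplies the noise by the weight, so decryption stays correct only while the accumulated noise is below `DELTA / 2`; real schemes choose parameters (and, in the Ring-LWE setting, polynomial rings) so that a whole network's worth of operations fits within that budget.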
Yuta Niki, Hiroki Sakaji, Kiyoshi Izumi, Hiroyasu Matsushima
Causality Existence Classification from Multilingual Texts Using End-to-End LSTM Models
In this study, we propose a neural model for extracting causal sentences from both English and Japanese documents. Causal knowledge extraction is an important topic in natural language processing. However, many studies on extracting causal knowledge target only a single language. Therefore, we propose a multilingual model for extracting causal knowledge. Our model employs an end-to-end architecture so that a single model can handle documents in multiple languages. Experimental results demonstrate the effectiveness of our model.
|Applications of Knowledge Discovery 1
Zhibo Yang, Luyun Li
An Online Retrieval Question Answering System for Featured Snippets Triggering
In current search engines, there are two ways to display the results of Question-style User Queries (QUQs): natural results and Featured Snippets (FS). A search engine that triggers an FS can better satisfy the user's information-seeking need, and FS data are usually formulated as Question Answer Pairs (QAPs) in a dictionary. In this setting, an answer can be retrieved as an FS if and only if the QUQ matches the question of a QAP. Traditional retrieval methods are basically keyword-based and fail to bridge the semantic gap. On the other hand, neural matching methods may not be deployable online directly because of the high flexibility required in complex real-world scenarios. To this end, this paper combines a retrieval model and a matching model in a unified system for FS triggering. The system consists of two stages: a recall stage and a ranking stage. In the recall stage, for a given QUQ, we use a vector-based retrieval model rather than a bag-of-words (BOW) model to recall candidate QAPs accurately and quickly. In the ranking stage, we use an ensemble of multiple models, including the pre-trained BERT network, to boost matching performance. To improve the flexibility and adaptability of the system, two query analysis techniques, query term weighting and a query term fuzzy method, are also incorporated into the matching network. We conduct extensive experiments on real-world data, and the experimental results demonstrate the superiority of our system.
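The recall-then-rank pipeline described above can be sketched as follows, assuming precomputed question embeddings and a fixed trigger threshold. Everything here is hypothetical: the toy 3-dimensional vectors stand in for a learned vector retrieval model, and two cheap lexical scorers (`token_jaccard`, `bigram_dice`) stand in for the paper's ensemble of matching models such as BERT. Query term weighting and the fuzzy method are not modeled.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]
Scorer = Callable[[str, str], float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recall(query_vec: Vector, qap_vecs: Dict[str, Vector], k: int) -> List[str]:
    """Recall stage: ids of the k QAP questions whose embeddings are closest to the query."""
    return sorted(qap_vecs, key=lambda qid: cosine(query_vec, qap_vecs[qid]), reverse=True)[:k]


def token_jaccard(a: str, b: str) -> float:
    """Word-overlap score (a cheap stand-in for a learned matching model)."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def bigram_dice(a: str, b: str) -> float:
    """Character-bigram Dice coefficient (another cheap stand-in)."""
    ga = {a[i:i + 2] for i in range(len(a) - 1)}
    gb = {b[i:i + 2] for i in range(len(b) - 1)}
    return 2 * len(ga & gb) / (len(ga) + len(gb)) if ga or gb else 0.0


def rank(query: str, candidates: List[str], questions: Dict[str, str],
         scorers: List[Scorer], weights: List[float]) -> List[Tuple[str, float]]:
    """Ranking stage: weighted-average ensemble of several matching scores."""
    total = sum(weights)
    scored = [(qid, sum(w * f(query, questions[qid]) for f, w in zip(scorers, weights)) / total)
              for qid in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def trigger_featured_snippet(query: str, query_vec: Vector, questions: Dict[str, str],
                             qap_vecs: Dict[str, Vector], answers: Dict[str, str],
                             k: int = 2, threshold: float = 0.8) -> Optional[str]:
    """Return the answer to show as a featured snippet, or None if no candidate is confident enough."""
    candidates = recall(query_vec, qap_vecs, k)
    ranked = rank(query, candidates, questions, [token_jaccard, bigram_dice], [0.5, 0.5])
    if ranked and ranked[0][1] >= threshold:
        return answers[ranked[0][0]]
    return None


if __name__ == "__main__":
    questions = {"q1": "how tall is mount fuji", "q2": "how old is mount fuji",
                 "q3": "what is the capital of japan"}
    answers = {"q1": "<answer 1>", "q2": "<answer 2>", "q3": "<answer 3>"}
    qap_vecs = {"q1": [1.0, 0.0, 0.0], "q2": [0.7, 0.7, 0.0], "q3": [0.0, 0.0, 1.0]}  # toy embeddings
    print(trigger_featured_snippet("how tall is mount fuji", [0.9, 0.1, 0.0],
                                   questions, qap_vecs, answers))  # <answer 1>
    print(trigger_featured_snippet("best pizza recipe", [0.0, 0.0, 1.0],
                                   questions, qap_vecs, answers))  # None
```

Splitting the work this way keeps the expensive matchers off the full dictionary: only the `k` recalled candidates are scored, and the threshold decides whether a featured snippet is shown at all.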
|Invited Talk 1
Data Exchange: The Next Big Thing For The Future?
In June 2019, at the G20 summit, world leaders adopted the Osaka Track to formulate rules on digital governance under the concept of "Data Free Flow with Trust". This follows the European Commission's initiative and guidance on the free flow of non-personal data. Today, 70% of data exchanges occur between organizations in different sectors and 50% are cross-border; as the data economy accelerates, frameworks are being designed to unlock the full value of data exchange. What are the latest data exchange trends? What are the benefits of cross-domain data exchange, and how can the public and private sectors collaborate? What is needed from centralized or decentralized data exchange platforms? Fabrice Tocco, co-founder and co-CEO of Dawex, will share his view of the future of the data economy.
Fabrice Tocco, serial entrepreneur and co-founder and co-CEO of Dawex, is a recognized expert in the data economy and is regularly invited to speak at European and international institutions. He strongly believes that data is the mirror of the economy, and that boosting tomorrow's economy requires organizations to place data exchange at the core of their business strategy. In 2015, Fabrice embarked on his second entrepreneurial venture, co-founding Dawex with Laurent Lafaye. The company's mission is to create the conditions for the smooth development of the data economy by facilitating data exchange between companies and organizations. Dawex operates the largest data marketplace to date and develops cutting-edge technologies for data trading, with the ambition of becoming the world's leading Data Exchange. Fabrice started his career at a world leader in the tire industry, where he held responsibilities in the group's marketing and innovation divisions. He graduated from Reims Management School (Neoma Business School) in France.
|Invited Talk 2
Introduction to Social Data Platform, D-Ocean
The Social Data Platform D-Ocean is a data-focused social networking service, available globally, that helps people find and share data with others. D-Ocean is not just a collection of data sets; it also focuses on the people who have contributed to the data economy. Users such as data scientists and data engineers can be assessed fairly by others. In this session, I will explain how D-Ocean works both as a data exchange and as a social networking service. I will also introduce use cases in which users gain new insights that can only be found by combining their data with data from other users, and show how D-Ocean's unique features as an online platform help those users.
Teppei Yagihashi is CTO and co-founder of D-Ocean, Inc. D-Ocean provides the world's first data platform that combines data exchange and a social networking service. He established D-Ocean in 2017 and has been developing the D-Ocean platform since then. He has applied for a few patents related to data exchange platforms in Japan and other countries. Before D-Ocean, he worked as a solutions architect in the Google Cloud division and published papers related to mobile, IoT, and data management products. He also worked as a solutions architect at Amazon Web Services, helping customers in various industries implement cloud-native services. He received the Best Solutions Architect of the Year award in 2014. As a personal project, he contributed to the NATS open-source project and developed and maintained Java/Scala client libraries for several years.
|Applications of Knowledge Discovery 2
Hiroshi Nagaya, Kazuko Uno, Hiroyuki A. Torii
Tracking Topics of Influential Tweets on Fukushima Disaster over Long Periods of Time
Social media has been extensively and effectively used to share information and communicate during emergencies, such as the 2011 Fukushima Daiichi nuclear power plant accident in Japan. It is important to provide information on social media during crises and to find the most effective way to transmit information in such situations. Twitter data must be carefully preprocessed because they contain a considerable amount of noise. However, compared with other sources, such as government statistics and newspapers, Twitter provides varied information and is distinguished by its immediacy. Owing to the characteristics of the platform, Twitter data can also be regarded as reflecting human behaviors, thoughts, and intentions across different domains. We propose an extension of the Topic Dynamics model for tracking trends and detecting the moments at which influential tweets on the Fukushima disaster occur. Using this method, we obtained lists of bursting words for different periods over a long span of time following the Fukushima disaster.
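The abstract does not spell out the extended Topic Dynamics model, so the following is not the authors' method. As a rough illustration of what a per-period list of bursting words looks like, here is a minimal sketch of a simpler, swapped-in technique: a word bursts in a period when it occurs at least `min_count` times and its relative frequency is at least `ratio` times its add-`alpha`-smoothed frequency over all earlier periods. The function name, thresholds, and whitespace tokenization are assumptions; Japanese tweets would need a morphological analyzer instead of `str.split`, and noise filtering is not shown.

```python
from collections import Counter
from typing import List, Sequence


def bursting_words(periods: Sequence[Sequence[str]], min_count: int = 2,
                   ratio: float = 3.0, alpha: float = 1.0) -> List[List[str]]:
    """Return, for each period (a list of tweets), the sorted words that burst in it.

    The baseline is the add-alpha-smoothed word frequency over all *earlier*
    periods, so the first period never has bursts (there is no history yet).
    """
    history: Counter = Counter()
    history_total = 0
    result: List[List[str]] = []
    for tweets in periods:
        counts = Counter(tok for tweet in tweets for tok in tweet.lower().split())
        total = sum(counts.values())
        bursts = []
        if history_total:
            vocab = len(set(history) | set(counts))
            for word, c in counts.items():
                rate_now = c / total
                rate_before = (history[word] + alpha) / (history_total + alpha * vocab)
                if c >= min_count and rate_now / rate_before >= ratio:
                    bursts.append(word)
        result.append(sorted(bursts))
        history.update(counts)
        history_total += total
    return result


if __name__ == "__main__":
    periods = [
        ["power plant news", "weather today", "power outage"],
        ["radiation level rising", "radiation check", "radiation map", "weather today"],
    ]
    print(bursting_words(periods))  # [[], ['radiation']]
```

Comparing each period only against earlier ones keeps the procedure online, which matches the goal of tracking bursts as they happen over a long time span.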
|Case Studies on Cross-disciplinary Data Analysis 1
Hiroki Sakaji, Yasutomo Kimura, Kiyoshi Izumi, Hiroyasu Matsushima
Extraction of Volitional Utterances from Japanese Local Political Corpus
In this paper, we propose a new method for extracting utterances expressing opinions and will from a Japanese political corpus. Many local governments in Japan now publish various political documents on their websites, such as basic plans for urban development, local assembly minutes, and ordinances. These documents include utterances expressing opinions and will. Extracting such utterances would let voters know who holds which opinions, which would help them choose candidates at election time. In addition, extracting such utterances from past local assembly minutes would make it possible to investigate whether any countermeasures were taken in response to a given opinion. Therefore, we develop a method for extracting utterances expressing opinions and will using a Japanese political corpus.
Masahiro Suzuki, Toshiya Katagi, Hiroki Sakaji, Kiyoshi Izumi, Yasushi Ishikawa
Stock Price Analysis Using Combination of Analyst Reports and Several Documents
In this paper, we propose a methodology for forecasting the direction and extent of volatility in mid- to long-term excess returns of stock prices by applying natural language processing and neural networks to analyst reports. Analyst reports are prepared by analysts in the research departments of stock brokerage firms. We examine the contents of the reports for information useful in forecasting stock price movements. First, our method extracts opinion sentences from the reports and classifies the remaining sentences as non-opinion sentences. Second, our method predicts stock price movements by feeding the opinion and non-opinion sentences into separate neural networks.
|Case Studies on Cross-disciplinary Data Analysis 2
How will sense of values change during art appreciation?
In museums, information about an artwork is usually provided as a caption, and visitors usually read the description to aid their understanding. Museums typically prepare such descriptions for general visitors (viewers). The problem is that, after reading a caption to understand an artwork, visitors tend to stop looking at the artwork itself. Recently, however, several museums have removed or hidden such captions. For instance, the exhibition "Bacon and Caravaggio" held at the Museo e Galleria Borghese, Rome, provided no captions, so visitors could look at and think about the artworks deeply. This may be a positive aspect of that display strategy. Experts show greater flexibility and differentiation in art appreciation, whereas non-experts do not, and this is a problem in art appreciation. We conducted several experiments in which various information-offering strategies were used, and observed an interesting phenomenon: participants seemed to understand an artwork gradually as information about it was offered. As expected, for abstract art, information about the artwork aids understanding; but even for representational paintings, the level of understanding changed gradually. Information about an artwork can thus influence art appreciation. Price, by contrast, is unrelated to aesthetics; it is a rather vulgar matter. Nevertheless, price can be an influential factor in art appreciation and in the sense of value. In this paper, we discuss how the value of art changes according to the information offered. To that end, we conducted an experiment in which information about the artwork was offered at random. The information comprised the title, painting materials, year of production, name of the artist, price, and theme of the artwork.
Standardization for innovation with data exchange
This paper analyzes and proposes hypotheses regarding the process through which the World Wide Web Consortium (W3C) achieved the convergence of standards among fiercely competitive players. New modes of innovation work only with mashed-up data generated by diverse sensor devices via the Internet, and they vary with the processing results. Such innovation is realized only when more than two modules work together in real time, and it can be realized only through standardization. However, unlike participants in ordinary joint ventures, stakeholders in standardization sometimes have conflicting interests. We hypothesize that the following changes in standardization process management at the W3C are key factors of innovation through standardization among stakeholders with conflicting interests: 1) defining the scope of the specifications to be developed by function rather than by technical structure; 2) a development management policy based on feedback from implementations, referred to as an "implementation-oriented policy"; and 3) the inclusion of diverse stakeholders in open standardization processes, which facilitates consensus formation and the diffusion of the developed standards.
These awards celebrate the most inspiring and effective presentations and papers, delivered by impactful, confident, and engaging speakers. Details will be announced by e-mail or on this web page.
Dr. Teruaki Hayashi (co-chair)
Email: hayashi -at- sys.t.u-tokyo.ac.jp