DIPr Lab at PSU

Joining the lab

Fri, 01 Nov 2024 00:00:00 +0000

Graduate Students

If you are interested in applying to the PSU Computer Science graduate program, please check the information on our Graduate Program. In your application, mention Dr. Primal Pappachan as a potential advisor and your application will be routed to me for consideration. Graduate admissions are handled by the graduate admissions committee, which processes all applications and decides on admissions for the entire department. Individual faculty members may not accept students on their own, but you can reach out to Dr. Primal by email if you are interested in applying and potentially being part of the DIPr lab. Please see details below on what to include in this email.

Funding: PhD students typically obtain funding. All of your tuition and fees will be paid and you will be paid a monthly stipend.

Master’s Students

Master’s students who are interested in conducting research in the lab are welcome to apply, provided they are in their second quarter of study. Applicants should be able to commit to dedicating a minimum of 10 hours per week to doing research. This opportunity is ideal for students looking to gain hands-on research experience and contribute to ongoing projects.

Funding: M.S. students are not typically funded. In rare instances, it may be possible to pay an M.S. student an hourly wage. These opportunities are reserved for students who have been in the lab for one or more quarters.

Undergraduate Students

Students majoring in Computer Science are encouraged to apply. Applicants should be able to commit to dedicating a minimum of 10 hours per week to doing research. This opportunity is ideal for students looking to gain hands-on research experience and contribute to ongoing projects.

Previous research experience is not required.
Knowledge of programming languages (e.g., Java, Python), web development (e.g., HTML, JavaScript, React), and databases (e.g., MySQL, PostgreSQL) is a plus. Demonstrated coding skills are a plus.
Strong communication skills (written and oral) are a plus.

Funding: Similar to M.S. students, undergraduate students generally do not receive funding. Limited opportunities may be available, typically reserved for those who have been in the lab for more than one quarter.

You may also apply through the Maseeh College Undergraduate Research & Mentoring Program (URMP) listing Dr. Primal Pappachan as a faculty mentor. This 10-week program includes a stipend.

High School Students

If you are a high school student in the Portland metropolitan area, please apply through programs such as the Institute for Computing in Research or Saturday Academy and mention your interest in working with Dr. Primal Pappachan in the DIPr lab.

How do I apply?

Ph.D. applicants: When reaching out to Dr. Primal after completing the graduate application, please include: (1) your CV (including a link to your GitHub profile and website), (2) a description of your previous research experience and interests, and (3) specific information about which research projects in the lab interest you and why. Emails without this information may be ignored.

M.S. applicants: This only applies to currently enrolled M.S. students. After going through projects in the lab, if you are interested in applying to be part of the lab, send an email including: (1) your resume (PDF), (2) an unofficial copy of your first-year transcript (PDF), and (3) a few paragraphs explaining why you’d like to work in our lab. To write this well, I suggest you look at some of our previous publications to orient yourself to our current projects. Make the subject of your email “Masters Application” and send it to Dr. Primal. You are welcome to join the weekly group meetings (see Expectations) to learn more about the ongoing projects and connect with lab members.

Undergrad applicants: This only applies to currently enrolled undergrad students. After going through projects in the lab, if you are interested in applying to be part of the lab, send an email including: (1) your resume (PDF), (2) an unofficial copy of your transcript (PDF); freshmen can send a high school transcript (PDF), and (3) a few paragraphs explaining why you’d like to work in our lab. To write this well, I suggest you look at some of our previous publications to orient yourself to our current projects. Make the subject of your email “Undergrad Application” and send it to Dr. Primal. You are welcome to join the weekly group meetings (see Expectations) to learn more about the ongoing projects and connect with lab members.

Getting started

Fri, 01 Nov 2024 00:00:00 +0000

Welcome to DIPr Lab! We are excited that you have decided to join our team! We hope that these onboarding resources, guidelines, and tips will make the first few steps easier.

Set up the first meeting

Check Primal’s Google Calendar (see “See someone’s calendar availability”) and propose a time to meet that works for both of you. Primal’s office is room FAB 115-08.

Join the GitHub organization

Our group has a GitHub organization account to host public and private repos for the software we create for each of the research projects. You can learn more about how to use GitHub in this tutorial. This is also the location for our group website, which is hosted through GitHub Pages.

Add yourself to the group’s website

After you have joined the GitHub organization, you can add yourself as a member to the lab website by following the instructions on the README of the website repo.

Join our Zulip instance

We use Zulip for communication around research projects, general updates, and sharing random news of interest. You can join our Zulip instance by signing up here.

Request to be added to the shared drive

We use a shared drive on Google Drive to manage common resources that are relevant to all members. You can join this drive by emailing Primal to request access.

Mailing list

Request to join the DIPr lab Google Group (only visible if you are logged into your PSU email), which is used for broadcasting information that is relevant to all group members.

Office Space

If you prefer to work in the department, please request a desk in the first meeting with Primal. Lab members sit either in the DIPr lab space (FAB 135-04) or in the shared graduate student cubicle space. Graduate students are expected to make use of their allocated desk in the DIPr lab space. Group meetings are held in one of the conference rooms.

Individual Development/Mentoring plans

Within the first few weeks of joining the DIPr Lab, you should work with Primal to develop a plan outlining your short-, medium-, and long-term goals. More on individual development plans can be found in Expectations.

Key access

If you require regular use of the lab space, you should discuss your need with Primal before applying for key access. Once you’ve received permission to receive a lab key, you should fill out a Key Authorization & Request form and email your request to keys@pdx.edu. Key requests require an active ODIN Account.

Expectations

Fri, 01 Nov 2024 00:00:00 +0000

Working hours

You should create a working schedule that is the right fit for you, with the understanding that your ideal schedule may evolve over time. Depending on the nature of your appointment, there may be a specific minimum number of hours that you are expected to work, and you should check with Primal about this in the first meeting. Graduate students are generally expected to work an average of at least 40 hours per week. For M.S., undergrad, and high school students this will vary depending on the project. You are not expected to work on weekends and holidays. Consult with Primal and notify fellow lab members in advance of any planned absences during the week. As a student, you should feel free to create a work schedule that works for you while meeting the expectations of your role.

If you prefer to work remotely, this should be discussed with and approved by Primal and arranged in accordance with Portland State University Policies on Remote Work. All lab members are expected to attend certain in-person events.

Note: All female researchers are entitled to two days of leave when they are on their periods.

Individual Meetings

Following the first meeting, we will set up a time for regular individual meetings. Together, we’ll decide upon the timing and frequency of these meetings, which will involve you, Primal, and any others working on the research project (e.g., PhD students, collaborators, etc.). To make these meetings productive, please prepare an agenda in the form of a document or slides that covers the following:

Past: Key points from our previous meeting.
Present: Updates on what you have worked on since our last discussion.
Future: Your plan for upcoming tasks or areas where you need feedback.

Primal will help with problem solving and provide constructive feedback and general support during these meetings.

After the meeting, please share meeting notes with Primal summarizing the main discussion points within a day of the meeting. This helps keep discussions fresh in everyone’s mind. Use the provided template to organize the meeting notes. Primal will review these notes and may reach out with clarifying questions or additional guidance on the tasks.

Do not cancel meetings with Primal if you feel that you have not made adequate progress on your research; these might be the most critical times to meet with a mentor.

Group Meetings

The schedule and venue for meetings will be determined at the beginning of the quarter and announced on the mailing list. At the beginning of the meeting, Primal will make group announcements, followed by a presentation from one of the students. All DIPr lab students are expected to present at least once during the quarter. This presentation can be one of the following:

Research presentation: A detailed talk about your research project covering aspects such as motivation, approach, and evaluation.
Project workshopping: A brief presentation of the problem that you are working on followed by collaborative brainstorming for the remainder of the meeting.
Tutorial: A hands-on presentation on a tool or a new approach that could benefit the entire group.

If you are the presenter, send a detailed draft of the presentation to Primal at least 3 days prior to the meeting. Primal will review it and give you feedback on the presentation outline and individual slides.

We will use a shared Google Sheet to organize the presentation. Active participation is encouraged, so please ask questions and engage in discussions. Non-members are welcome to attend to learn about our work and connect with the lab members.

Celebrations 🎉

Once every month (typically the first or the last meeting of the month) we will celebrate with sweet treats (donuts, cupcakes, etc.) to recognize any achievements or life milestones of the lab members. This includes, but is not limited to, paper acceptances, submissions, rejections (yes, we celebrate those too!), birthdays, and more.

Database Reading Group (DBRG) meetings

The Database Reading Group meets weekly to discuss papers related (broadly speaking) to database technology. Meeting times are decided at the beginning of the quarter. The regular meeting place is room 130 in PSU’s Fourth Avenue Building. The group welcomes anyone, inside or outside of PSU, with an interest in the subject matter. Designated group members will lead the discussion each week. More on the Database Reading Group can be found in DBRG.

Attendance

In-person attendance is expected for the following.

Weekly group meetings
Individual meetings with Primal to discuss your research
Database Reading Group meetings
Student Research Symposiums
Outreach events (such as Summer Research Academy, CyberPDX)

Communication

Email and Zulip are the primary communication channels. Both should be viewed primarily as asynchronous: if you receive a message, you are not obligated to read or respond to it immediately (and you shouldn’t expect an immediate response when you send one, either). The expectation is that you will respond within 24-48 hours.

Authorship

Authorship is earned by someone who significantly contributes to the project (e.g., conceives of the project, designs the solution, performs simulations or experiments and analyzes results, writes the paper). All authors must read, proofread, and sign off on the final version of the manuscript before submission. Barring unusual circumstances, the lab policy is that students are first author on all work that they lead.

Guidelines for PhD and MS Defense

The university has guidelines for remote participation. Please note:

All committee members must agree to remote participation in advance
Remote connections for committee members are expected to be both audio and video
Visual aids must be distributed in advance
All committee members must participate in the entire meeting
A draft of the thesis must be delivered to the committee at least two weeks in advance of defense

Students must send their abstract and other required information at least two weeks in advance of the defense to the CS Graduate Advisor (gccs@pdx.edu). Students should not book their room for the defense. The Graduate Advisor will do so once she has received the student’s abstract.

The expectations for student milestones are outlined in greater detail in this document.

Code Management

Code and data are important, and it is your responsibility to make sure that nothing is lost. Have a plan to make sure that your code and data are safe and accessible to other members of the group. We all work on related topics in the group, so we all benefit from using each other’s code and data (with appropriate acknowledgement and after requesting access to it).

GitHub

The DIPr lab has a dedicated GitHub account for hosting public repositories, including the code we create and our group website (hosted on GitHub Pages). Unless specified otherwise, all files required to reproduce research results will be made publicly available on the DIPr lab’s GitHub repository. For double-blind review cases, public access may occur post-acceptance, but every published paper will link to a public GitHub repository with the corresponding code and data.

Backups

Backing up data is crucial; nothing is more disheartening than losing months of work, simulations, or code updates. Each lab member is responsible for regularly backing up their own work, and Google Drive is our recommended cloud storage solution.

Ethical management of data

If your research involves data related to individuals, it is essential to follow ethical guidelines. Always consult with Primal if you are uncertain about the responsible handling of such data. Ethical standards are particularly important in these situations, so reach out if you need clarification.

Quarterly evaluation form

At the end of each quarter, you will receive an evaluation form asking you to reflect on your progress and your experience as a mentee. We will set aside time during an individual meeting at the beginning of the following quarter to review your answers together. This is an opportunity for you to share any concerns you may have about your experience as a graduate student, whether they involve other students, faculty, or staff.

This discussion is also a chance to address any concerns you may have about my role as your advisor. If you need more guidance, would like more independence, or would prefer more frequent meetings, please let me know. Likewise, I’ll provide feedback on your progress, noting any areas where improvement is needed so we can address them proactively. This session is our time to address any concerns early, ensuring that we’re aligned on your goals and progress.

Individual Development Plans

Primal will work with each of you to develop an individual mentoring plan that ensures your time in the lab advances your short-, medium-, and long-term goals. This is a useful planning document that assists in aligning expectations. Graduate students will revisit the mentoring plan in individual meetings with Primal multiple times a year. Other lab members will revisit their plans at appropriate intervals (e.g., every 6 months). For this purpose, we will either modify a template (for all students) or use an online tool (geared towards PhD students and postdocs).

Take care of yourself!

As a student, you may experience a wide variety of challenges to your physical and mental health that can interfere with learning or doing research. Help is available on campus, and an important aspect of taking care of yourself is learning how to ask for help. Talk to Primal or any of the lab members if you are struggling. Ask for help early. We cannot change the past, but we can influence the future. Confidential counseling services are available at PSU. Please refer to the Student Crisis Resource Card for a list of phone numbers, contacts, and support resources.

Lab Policies

Fri, 01 Nov 2024 00:00:00 +0000

Diversity and Inclusivity Policy

At DIPr lab, we aim to build and sustain a community in which everyone feels welcomed, respected, and intellectually stimulated. It is my intent to ensure that members from diverse backgrounds, including but not limited to race, color, national origin, language, sex, disability, age, sexual orientation, gender identity, and religion, feel welcome and included in this group. If you notice that any of the interactions in this group are not respectful of this diversity, please bring it to my attention. Any suggestions on how to improve the inclusivity of the lab policies are also much appreciated. If you have experienced or observed any discrimination, please report it and/or reach out to support groups listed on PSU’s Equity and Compliance website.

Reporting

DIPr lab aims to create a safe space for everyone. If you or someone you know has been harassed by a lab member or if you have concerns, please contact Primal. If you do not wish to contact Primal, please contact the Department Chair, Dr. Wu-chi Feng, or the Dean of the College, Dr. Joseph Bull.

Please remember that by way of his position at the university, Primal is a mandated reporter under Title IX. This means that he is not allowed to keep matters falling under Title IX confidential, and is required to disclose these incidents to the administration. You are welcome to discuss matters with Primal, but please keep this in mind when doing so. Primal will do his best to remind you of his responsibilities at the start of conversations anticipated to relate to these topics. If you would rather share information about these matters with a PSU staff member who does not have these reporting responsibilities and can keep the information confidential, please use these campus resources:

Confidential Advocates: 503-894-7982 or schedule online (for matters regarding sexual harassment and sexual and relationship violence)
Center for Student Health and Counseling: 1880 SW 6th Avenue #200; 503-725-2800

You can also find additional resources on PSU’s Sexual Misconduct Response website.

Discrimination and Bias Incidents

The Office of Equity and Compliance (OEC) addresses complaints of discrimination, discriminatory harassment, and sexual harassment against employees (faculty and staff). If you or someone you know believes they have been discriminated against, you may file a complaint. Someone from the OEC will contact you to discuss how to best address your complaint.

The Bias Review Team (BRT) gathers information on bias incidents that happen on and around campus, and gives resources and support to individuals who experience them. You can report a bias incident you experienced or learned about. A member of the BRT will contact you if you indicate you would like to be contacted.

Confidentiality Policy

All communications within DIPr lab, including emails, discussions, and meetings involving research data or methodologies, should be treated with care and kept confidential. Information should only be shared using secure methods, and with third parties only after obtaining permission from the principal investigator or project leader. Public sharing of research findings should be coordinated with the principal investigator. This policy is intended to ensure the integrity of our work and applies during and after involvement with the research group.

Privacy Policy

The privacy of all members and collaborators of DIPr lab is respected and protected. Personal information, such as contact details and other identifying data, will be used solely for professional and administrative purposes and will not be shared with third parties without consent. Data collected as part of research will be handled in compliance with ethical standards and legal regulations, ensuring that participant identities are safeguarded, and that personal information remains confidential. Members of the research group are expected to handle any personal data they encounter in a manner that respects the privacy of all individuals involved in the research.

Offboarding

Fri, 01 Nov 2024 00:00:00 +0000

Everyone will eventually move on from the lab, whether to complete a degree, start a job, or pursue new opportunities, and this is an exciting time! A clear offboarding process ensures that your work can seamlessly continue, that future collaborators have what they need, and that any remaining steps (e.g., publications, future projects) are clearly outlined.

(Credits to Fay lab)

Exit Interview

Set up a dedicated time to meet with Primal to talk about your time in the lab and to go through the checklist below to make sure each item has been completed. Besides the checklist, things to talk about include the best part of being in our team, whether you got the support you needed, and what we could improve for mentoring and training someone in your role in the future.

Project Documentation

Project work should be hosted in a repository under the organizational GitHub account.

Each project should have an easily found README text file that provides information so others can navigate and use your work, and that gives contact information for authors (and any data creators or use restrictions if the data is proprietary). Ideally, the README should also include links to publications and presentations from the work.

Publications

Science is not finished until it has been communicated. Ideally, you’ll have the chance to publish your results in a conference or journal. In your exit interview, coordinate with Primal on any remaining publications and set a submission timeline. Ensure that all publications and presentations from your projects are archived in the appropriate folder on the lab’s Google Drive and are listed on the lab website where appropriate.

Equipment

Ensure any lab equipment (e.g., computer and peripherals) you have been using has been returned to the lab and that office furniture is present. Make sure any problems with equipment are documented and that Primal and relevant department staff are notified so that they can be addressed.

References

Fri, 01 Nov 2024 00:00:00 +0000

Some of the content in this wiki is inspired by similar wikis from other labs.

Winter 2026 Week 9

Fri, 06 Mar 2026 00:00:00 +0000

Title	BridgeScope: A Universal Toolkit for Bridging Large Language Models and Databases
Authors	Lianggui Weng, Dandan Liu, Rong Zhu, Bolin Ding, Jingren Zhou
Abstract	As large language models (LLMs) demonstrate increasingly powerful reasoning and orchestration capabilities, LLM-based agents are rapidly adopted for complex data-related tasks. Despite this progress, the current design of how LLMs interact with databases exhibits critical limitations in usability, security, privilege management, and data transmission efficiency. To address these challenges, we introduce BridgeScope, a universal toolkit that bridges LLMs and databases through three key innovations. First, it modularizes SQL operations into fine-grained tools for context retrieval, CRUD execution, and ACID-compliant transaction management. This design enables more precise, LLM-friendly controls over database functionality. Second, it aligns tool implementations with database privileges and user-defined security policies to steer LLMs away from unsafe or unauthorized operations, which not only safeguards database security but also enhances task execution efficiency by enabling early identification and termination of infeasible tasks. Third, it introduces a proxy mechanism that supports seamless data transfer between tools, thereby bypassing the transmission bottlenecks via LLMs. All of these designs are database-agnostic and can be transparently integrated with existing agent architectures. We also release an open-source implementation of BridgeScope for PostgreSQL. Evaluations on two novel benchmarks demonstrate that BridgeScope enables LLM agents to interact with databases more effectively. It reduces token usage by up to 80% through improved security awareness and uniquely supports data-intensive workflows beyond existing toolkits. These results establish BridgeScope as a robust foundation for next-generation intelligent data automation.

BL(u)E CRAB: Bluetooth Low Energy Connection Risk Assessment Benchmarking

Fri, 27 Feb 2026 00:00:00 +0000

Winter 2026 Week 8

Fri, 27 Feb 2026 00:00:00 +0000

Title	Algorithmic Data Minimization for Machine Learning over Internet-of-Things Data Streams
Authors	Ted Shaowang, Shinan Liu, Jonatas Marques, Nick Feamster, Sanjay Krishnan
Abstract	Machine learning can analyze vast amounts of data generated by IoT devices to identify patterns, make predictions, and enable real-time decision-making. This raises significant privacy concerns, necessitating the application of data minimization – a foundational principle in emerging data regulations, which mandates that service providers only collect data that is directly relevant and necessary for a specified purpose. Despite its importance, data minimization lacks a precise technical definition in the context of sensor data, where collections of weak signals make it challenging to apply a binary “relevant and necessary” rule. This paper provides a technical interpretation of data minimization in the context of sensor streams, explores practical methods for implementation, and addresses the challenges involved. Through our approach, we demonstrate that our framework can reduce user identifiability by up to 16.7% while maintaining accuracy loss below 1%, offering a viable path toward privacy-preserving IoT data processing.

UR2PhD program acceptance

Mon, 16 Feb 2026 00:00:00 +0000

About UR2PhD:

UR2PhD (Undergraduate Research to PhD) is a three-month, national virtual program run by the Computing Research Association that pairs undergraduate researchers with graduate student mentors in computing. The undergraduate research training course is a virtual, synchronous opportunity where undergraduate students receive support and training during their research with a faculty and graduate student mentor. It is designed for first-time researchers who want to strengthen their technical and communication skills in the context of research.

Ambika Vyas and Nico Wood were accepted into the undergraduate research training course for Spring 2026.

“As an aspiring researcher, I’m grateful for this opportunity and the mentorship I will receive. I have been with the DIPr lab for a few months and have enjoyed participating in the research environment and working with Primal Pappachan & Anadi Shakya. Being able to continue this with additional support from computer science professionals at different institutions is something I’m looking forward to.” – Ambika Vyas

“As an undergraduate with an interest in research, I’m thankful for the opportunity provided by UR2PhD for a formal environment to build these skills and make connections with other researchers. Being new to the DIPr lab, I’m also very excited for the chance to work under the care of Anadi and Primal while Ambika and I progress through this course. I look forward to growing alongside each other and continuing to build community together going forward.” – Nico Wood

DIPr Lab at Northwest Database Society (2026) Meeting

Fri, 13 Feb 2026 00:00:00 +0000

The Northwest Database Society Annual Meeting brings together researchers and practitioners from the greater Pacific Northwest for a day of technical talks and networking on the broad topic of data management systems.

Orobosa Ekhator attended the full-day event at the University of Washington, Seattle. She presented a poster about the Clustering-Based Local Outlier Factor (CBLOF) classifier she developed for BL(u)E CRAB.

Winter 2026 Week 5

Fri, 06 Feb 2026 00:00:00 +0000

Title	I Can’t Believe It’s Not Yannakakis: Pragmatic Bitmap Filters in Microsoft SQL Server
Authors	Hangdong Zhao et al.
Abstract	The quest for optimal join processing has reignited interest in the Yannakakis algorithm, as researchers seek to realize its theoretical ideal in practice via bitmap filters instead of expensive semijoins. While this academic pursuit may seem distant from industrial practice, our investigation into production databases led to a startling discovery: over the last decade, Microsoft SQL Server has built an infrastructure for bitmap pre-filtering that subsumes the very spirit of Yannakakis! This is not a story of academia leading industry; but rather of industry practice, guided by pragmatic optimization, outpacing academic endeavors. This paper dissects this discovery. As a crucial contribution, we prove how SQL Server’s bitmap filters, pull-based execution, and Cascades optimizer conspire to not only consider, but often generate, instance-optimal plans, when it truly minimizes the estimated cost! Moreover, its rich plan search space reveals novel, largely overlooked pre-filtering opportunities on intermediate results, which approach strong semi-robust runtime for arbitrary join graphs. Instead of a verdict, this paper is an invitation: by exposing a system design that is long-hidden, we point our community towards a challenging yet promising research terrain.

Winter 2026 Week 4

Fri, 30 Jan 2026 00:00:00 +0000

Title	LOCATER: Cleaning WiFi Connectivity Datasets for Semantic Localization
Authors	Yiming Lin, Daokun Jiang, Roberto Yus, Georgios Bouloukakis, Andrew Chio, Sharad Mehrotra, Nalini Venkatasubramanian
Abstract	This paper explores the data cleaning challenges that arise in using WiFi connectivity data to locate users to semantic indoor locations such as buildings, regions, rooms. WiFi connectivity data consists of sporadic connections between devices and nearby WiFi access points (APs), each of which may cover a relatively large area within a building. Our system, entitled semantic LOCATion cleanER (LOCATER), postulates semantic localization as a series of data cleaning tasks - first, it treats the problem of determining the AP to which a device is connected between any two of its connection events as a missing value detection and repair problem. It then associates the device with the semantic subregion (e.g., a conference room in the region) by postulating it as a location disambiguation problem. LOCATER uses a bootstrapping semi-supervised learning method for coarse localization and a probabilistic method to achieve finer localization. The paper shows that LOCATER can achieve significantly high accuracy at both the coarse and fine levels.

Data Privacy Day 2026

Wed, 28 Jan 2026 00:00:00 +0000

Winter 2026 Week 2

Fri, 16 Jan 2026 00:00:00 +0000

Title	LLM-Driven Auto Configuration for Transient IoT Device Collaboration
Authors	Hetvi Shastri, Walid A. Hanafy, Li Wu, David Irwin, Mani Srivastava, Prashant Shenoy
Abstract	Today's Internet of Things (IoT) has evolved from simple sensing and actuation devices to those with embedded processing and intelligent services, enabling rich collaborations between users and their devices. However, enabling such collaboration becomes challenging when transient devices need to interact with host devices in temporarily visited environments. In such cases, fine-grained access control policies are necessary to ensure secure interactions; however, manually implementing them is often impractical for non-expert users. Moreover, at run-time, the system must automatically configure the devices and enforce such fine-grained access control rules. Additionally, the system must address the heterogeneity of devices. In this paper, we present CollabIoT, a system that enables secure and seamless device collaboration in transient IoT environments. CollabIoT employs a Large language Model (LLM)-driven approach to convert users' high-level intents to fine-grained access control policies. To support secure and seamless device collaboration, CollabIoT adopts capability-based access control for authorization and uses lightweight proxies for policy enforcement, providing hardware-independent abstractions. We implement a prototype of CollabIoT's policy generation and auto configuration pipelines and evaluate its efficacy on an IoT testbed and in large-scale emulated environments. We show that our LLM-based policy generation pipeline is able to generate functional and correct policies with 100% accuracy. At runtime, our evaluation shows that our system configures new devices in ~150 ms, and our proxy-based data plane incurs network overheads of up to 2 ms and access control overheads up to 0.3 ms.

Thesis Defense - Dylan Conklin

Tue, 09 Dec 2025 00:00:00 +0000

Dylan Conklin successfully defended his M.S. thesis on BL(u)E CRAB.

Committee: Primal Pappachan, Roberto Yus, Bart Massey, Nirupama Bulusu, Wu-chang Feng

Abstract: The usage of Bluetooth Low Energy (BLE)-based tracker devices for stalking has become a salient privacy concern. Detecting unwanted or suspicious trackers is challenging due to their cross-platform compatibility issues, inconsistent detection methods, and lack of an industry-wide standard for detecting malicious devices. BL(u)E CRAB, Bluetooth Low Energy Connection Risk Assessment Benchmarking, scans and collects risk factors about nearby devices to classify them as suspicious or not. These risk factors include the number of encounters the user had with a device, the duration of time a device has been near the user, the distance a device has travelled with the user, the number of areas each device appeared in, the device’s proximity to the user, and the stability of the device’s signal strength. After collecting this information, BL(u)E CRAB uses one of several classifiers adapted to these risk metrics to determine whether a device is suspicious or not. We have integrated a multitude of new device classifier methods, including single- and multi-dimensional clustering methods. We evaluated these classifiers against existing methods using a diverse dataset of BLE tracker data in various real-world scenarios. The benchmark results show the efficacy of different classifiers in identifying suspicious BLE trackers. We also developed a full working prototype of BL(u)E CRAB that is an end-to-end solution that is easy to use, customizable, and can easily integrate other classifiers.

The thesis paper can be read here.

Fall 2025 Week 9

Wed, 26 Nov 2025 00:00:00 +0000

Title	SIEVE: Effective Filtered Vector Search with Collection of Indexes
Authors	Zhaoheng Li, et al.
Abstract	Real-world tasks such as recommending videos tagged kids can be reduced to finding similar vectors associated with hard predicates. This task, filtered vector search, is challenging as prior state-of-the-art graph-based (unfiltered) similarity search techniques degenerate when hard constraints are considered: effective graph-based filtered similarity search relies on sufficient connectivity for reaching similar items within a few hops. To consider predicates, recent works propose modifying graph traversal to visit only items that satisfy predicates. However, they fail to offer the just-a-few-hops property for a wide range of predicates: they must restrict predicates significantly or lose efficiency if only few items satisfy predicates. We propose an opposite approach: instead of constraining traversal, we build many indexes each serving different predicate forms. For effective construction, we devise a three-dimensional analytical model capturing relationships among index size, search time, and recall, with which we follow a workload-aware approach to pack as many useful indexes as possible into a collection. At query time, the analytical model is employed yet again to discern the one that offers the fastest search at a given recall. We show superior performance and support on datasets with varying selectivities and forms: our approach achieves up to 8.06 x speedup while having as low as 1% build time versus other indexes, with less than 2.15 x memory of a standard HNSW graph and modest knowledge of past workloads.

PaPrica-PS: Fine-Grained, Dynamic Access Control Policy Enforcement for Pub/Sub Systems

Wed, 26 Nov 2025 00:00:00 +0000

High-volume publish/subscribe (pub/sub) systems include collections of hardware and software components such as IoT sensors and the protocols that connect them. Many of these have so far lacked robust security and privacy controls by default, even though the security, safety, and privacy implications of the data they generate and manage make access control necessary.

Such pub/sub-based systems power critical infrastructure ranging from smart buildings and factories to city-wide device networks. In this project, we are developing a fine-grained access control (FGAC) model and enforcement mechanism to address this gap. Our proposed FGAC model builds upon Attribute-Based Access Control (ABAC), defining access rules based on MQTT message “topics”, the attributes of the subscribers and publishers to those topics, and ephemeral, per-message context information.

Our framework is platform-agnostic. We implement the prototype for our experiments on an off-the-shelf, open-source MQTT pub/sub system without altering the base code of the server itself.
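
As a rough illustration of the kind of rule described above, the following Python sketch checks an ABAC-style rule against an MQTT topic, a client's attributes, and per-message context. It is not the PaPrica-PS implementation: the `Rule` class, the `is_allowed` function, the attribute names (`role`, `hour`), and the example topic are all hypothetical. Only the `+` and `#` wildcard semantics follow the MQTT topic filter conventions, and the special handling of topics beginning with `$` is omitted.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Attrs = Dict[str, Any]


def topic_matches(topic_filter: str, topic: str) -> bool:
    """MQTT-style match: '+' spans one level, a trailing '#' spans the rest."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, f in enumerate(f_levels):
        if f == "#":
            return True  # matches the parent level and everything below it
        if i >= len(t_levels):
            return False
        if f != "+" and f != t_levels[i]:
            return False
    return len(f_levels) == len(t_levels)


@dataclass
class Rule:
    """Hypothetical rule shape: topic filter, action, and an attribute condition."""
    topic_filter: str
    action: str  # "publish" or "subscribe"
    condition: Callable[[Attrs, Attrs], bool]  # (client attributes, message context)


def is_allowed(rules: List[Rule], action: str, topic: str,
               client: Attrs, context: Attrs) -> bool:
    """Default deny: allow only if some rule matches action, topic, and condition."""
    return any(
        r.action == action
        and topic_matches(r.topic_filter, topic)
        and r.condition(client, context)
        for r in rules
    )


# Illustrative policy: facilities staff may subscribe to HVAC topics in any
# building, but only during working hours (the hour is per-message context).
rules = [
    Rule(
        topic_filter="building/+/hvac/#",
        action="subscribe",
        condition=lambda client, ctx: client.get("role") == "facilities"
        and 8 <= ctx.get("hour", -1) < 18,
    )
]

print(is_allowed(rules, "subscribe", "building/fab/hvac/temp",
                 {"role": "facilities"}, {"hour": 9}))
```

Because the decision is a pure function of the topic, attributes, and context, a check like this could in principle sit in a plugin or proxy in front of a broker rather than inside the broker code, which is in the spirit of the platform-agnostic goal stated above.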

Fall 2025 Week 8

Wed, 19 Nov 2025 00:00:00 +0000

Title	Adaptive Differentially Private Structural Entropy Minimization for Unsupervised Social Event Detection
Authors	Zhiwei Yang, et al.
Abstract	Social event detection refers to extracting relevant message clusters from social media data streams to represent specific events in the real world. Social event detection is important in numerous areas, such as opinion analysis, social safety, and decision-making. Most current methods are supervised and require access to large amounts of data. These methods need prior knowledge of the events and carry a high risk of leaking sensitive information in the messages, making them less applicable in open-world settings. Therefore, conducting unsupervised detection while fully utilizing the rich information in the messages and protecting data privacy remains a significant challenge. To this end, we propose a novel social event detection framework, ADP-SEMEvent, an unsupervised social event detection method that prioritizes privacy. Specifically, ADP-SEMEvent is divided into two stages, i.e., the construction stage of the private message graph and the clustering stage of the private message graph. In the first stage, an adaptive differential privacy approach is used to construct a private message graph. In this process, our method can adaptively apply differential privacy based on the events occurring each day in an open environment to maximize the use of the privacy budget. In the second stage, to address the reduction in data utility caused by noise, a novel 2-dimensional structural entropy minimization algorithm based on optimal subgraphs is used to detect events in the message graph. The highlight of this process is unsupervised and does not compromise differential privacy. Extensive experiments on two public datasets demonstrate that ADP-SEMEvent can achieve detection performance comparable to state-of-the-art methods while maintaining reasonable privacy budget parameters.

Fall 2025 Week 7

Wed, 12 Nov 2025 00:00:00 +0000

Title	Scribe: How Meta transports terabytes per second in real time
Authors	Manos Karpathiotakis, et al.
Abstract	Millions of web servers and a multitude of applications are producing ever-increasing amounts of data in real time at Meta. Regardless of how data is generated and how it is processed, there is a need for infrastructure that can accommodate the transport of arbitrarily large data streams from their generation location to their processing location with low latency. This paper presents Scribe, a multi-tenant message queue service that natively supports the requirements of Meta’s data-intensive applications, ingesting > 15 TB/s and serving > 110 TB/s to its consumers. Scribe relies on a multi-hop write path and opportunistic data placement to maximise write availability, whereas its read path adapts replica placement and representation based on the incoming workload as a means to minimise resource consumption for both Scribe and its downstreams. The wide range of Scribe use cases can pick from a range of offered guarantees, based on the trade-offs favourable for each one.

Fall 2025 Week 6

Wed, 05 Nov 2025 00:00:00 +0000

Title	Delta Sharing: An Open Protocol for Cross-Platform Data Sharing
Authors	Krishna Puttaswamy, et al.
Abstract	Organizations across industries increasingly rely on sharing data to drive collaboration, innovation, and business performance. However, securely and efficiently sharing live data across diverse platforms and adhering to varying governance requirements remains a significant challenge. Traditional approaches, such as FTP and proprietary in-data-warehouse solutions, often fail to meet the demands of interoperability, cost, scalability, and low overhead. This paper introduces Delta Sharing, an open protocol we developed in collaboration with industry partners, to overcome these limitations. Delta Sharing leverages open formats like Delta Lake and Apache Parquet alongside simple HTTP APIs to enable seamless, secure, and live data sharing across heterogeneous systems. Since its launch in 2021, Delta Sharing has been adopted by over 4000 enterprises and supported by hundreds of major software and data vendors. We discuss the key challenges in developing Delta Sharing and how our design addresses them. We also present, to our knowledge, the first large-scale study of production data sharing workloads offering insights into this emerging data platform capability.

DIPr Lab at URMP Poster Competition

Thu, 25 Sep 2025 00:00:00 +0000

Portland State University’s Undergraduate Research & Mentor Program (URMP) hosted a poster event that brings together student researchers and allows them to demonstrate their work. Spencer Henwood presented a poster about securing Publish-Subscribe (pub/sub) systems with fine-grained access control (FGAC).

UR2PhD program acceptance

Thu, 04 Sep 2025 00:00:00 +0000

About UR2PhD:

UR2PhD (Undergraduate Research to PhD) is a three-month, national virtual program run by the Computing Research Association that pairs undergraduate researchers with graduate student mentors in computing. As part of this, there is a structured mentorship course for graduate students. The course helps graduate mentors learn how to support undergraduate researchers effectively, with a focus on inclusive, research-based mentoring practices. Topics include how people learn in research settings, aligning expectations, articulating a mentoring philosophy, giving effective feedback, and helping mentees feel that they belong in the computing research community.

Orobosa Ekhator and Anadi Shakya were accepted into the Graduate Student Mentor Training Course for Fall 2025. The goal of the course is to provide mentors of undergraduate researchers with the essential skills necessary to build robust research settings.

“As a PhD student, this is helping me build skills I’ll need to eventually lead my own research projects and create a lab culture where undergraduates feel welcomed, supported, and able to see themselves as future researchers.” – Anadi

Summer 2025 Week 4

Wed, 20 Aug 2025 00:00:00 +0000

Title	TSB-UAD: An End-to-End Benchmark Suite for Univariate Time-Series Anomaly Detection
Authors	John Paparrizos, Yuhao Kang, Paul Boniol, Ruey S. Tsay, Themis Palpanas, Michael J. Franklin
Abstract	The detection of anomalies in time series has gained ample academic and industrial attention. However, no comprehensive benchmark exists to evaluate time-series anomaly detection methods. It is common to use (i) proprietary or synthetic data, often biased to support particular claims; or (ii) a limited collection of publicly available datasets. Consequently, we often observe methods performing exceptionally well in one dataset but surprisingly poorly in another, creating an illusion of progress. To address the issues above, we thoroughly studied over one hundred papers to identify, collect, process, and systematically format datasets proposed in the past decades. We summarize our effort in TSB-UAD, a new benchmark to ease the evaluation of univariate time-series anomaly detection methods. Overall, TSB-UAD contains 13766 time series with labeled anomalies spanning different domains with high variability of anomaly types, ratios, and sizes. TSB-UAD includes 18 previously proposed datasets containing 1980 time series and we contribute two collections of datasets. Specifically, we generate 958 time series using a principled methodology for transforming 126 time-series classification datasets into time series with labeled anomalies. In addition, we present data transformations with which we introduce new anomalies, resulting in 10828 time series with varying complexity for anomaly detection. Finally, we evaluate 12 representative methods demonstrating that TSB-UAD is a robust resource for assessing anomaly detection methods. We make our data and code available at www.timeseries.org/TSB-UAD. TSB-UAD provides a valuable, reproducible, and frequently updated resource to establish a leaderboard of univariate time-series anomaly detection methods.

Summer 2025 Week 3

Wed, 06 Aug 2025 00:00:00 +0000

Title	HoneyBee: Efficient Role-based Access Control for Vector Databases via Dynamic Partitioning
Authors	Hongbin Zhong, Matthew Lentz, Nina Narodytska, Adriana Szekeres, Kexin Rong
Abstract	As vector databases gain traction in enterprise applications, robust access control has become critical to safeguard sensitive data. Access control in these systems is often implemented through hybrid vector queries, which combine nearest neighbor search on vector data with relational predicates based on user permissions. However, existing approaches face significant trade-offs: creating dedicated indexes for each user minimizes query latency but introduces excessive storage redundancy, while building a single index and applying access control after vector search reduces storage overhead but suffers from poor recall and increased query latency. This paper introduces HoneyBee, a dynamic partitioning framework that bridges the gap between these approaches by leveraging the structure of Role-Based Access Control (RBAC) policies. RBAC, widely adopted in enterprise settings, groups users into roles and assigns permissions to those roles, creating a natural "thin waist" in the permission structure that is ideal for partitioning decisions. Specifically, HoneyBee produces overlapping partitions where vectors can be strategically replicated across different partitions to reduce query latency while controlling storage overhead. By introducing analytical models for the performance and recall of the vector search, HoneyBee formulates the partitioning strategy as a constrained optimization problem to dynamically balance storage, query efficiency, and recall. Evaluations on RBAC workloads demonstrate that HoneyBee reduces storage redundancy compared to role partitioning and achieves up to 6x faster query speeds than row-level security (RLS) with only 1.4x storage increase, offering a practical middle ground for secure and efficient vector search.

Summer 2025 Week 2

Wed, 23 Jul 2025 00:00:00 +0000

Title	An Elephant Under the Microscope: Analyzing the Interaction of Optimizer Components in PostgreSQL
Authors	Rico Bergmann, Claudio Hartmann, Dirk Habich, Wolfgang Lehner
Abstract	Despite an ever-growing corpus of novel query optimization strategies, the interaction of the core components of query optimizers is still not well understood. This situation can be problematic for two main reasons: On the one hand, this may cause surprising results when two components influence each other in an unexpected way. On the other hand, this can lead to wasted effort in regard to both engineering and research, e.g., when an improvement for one component is dwarfed or entirely canceled out by problems of another component. Therefore, we argue that making improvements to a single optimization component requires a thorough understanding of how these changes might affect the other components. To achieve this understanding, we present results of a comprehensive experimental analysis of the interplay in the traditional optimizer architecture using the widely-used PostgreSQL system as prime representative. Our evaluation and analysis revisit the core building blocks of such an optimizer, i.e. per-column statistics, cardinality estimation, cost model, and plan generation. In particular, we analyze how these building blocks influence each other and how they react when faced with faulty input, such as imprecise cardinality estimates. Based on our results, we draw novel conclusions and make recommendations on how these should be taken into account.

Summer 2025 Week 1

Wed, 09 Jul 2025 00:00:00 +0000

Title	Streaming Democratized: Ease Across the Latency Spectrum with Delayed View Semantics and Snowflake Dynamic Tables
Authors	Daniel Sotolongo, Daniel Mills, Tyler Akidau, Anirudh Santhiar, Attila-Péter Tóth, Botong Huang, Boyuan Zhang, Igor Belianski, Ling Geng, Matt Uhlar, Nikhil Shah, Olivia Zhou, Saras Nowak, Sasha Lionheart, Vlad Lifliand, Wendy Grus, Yiwen Zhu, Ankur Sharma, Dzmitry Pauliukevich, Enrico Sartorello, Ilaria Battiston, Ivan Kalev, Lawrence Benson, Leon Papke, Niklas Semmler, Till Merker, Yi Huang
Abstract	Streaming data pipelines remain challenging and expensive to build and maintain, despite significant advancements in stronger consistency, event time semantics, and SQL support over the last decade. Persistent obstacles continue to hinder usability, such as the need for manual incrementalization, semantic discrepancies across SQL implementations, and the lack of enterprise-grade operational features (e.g. granular access control, disaster recovery). While the rise of incremental view maintenance (IVM) as a way to integrate streaming with databases has been a huge step forward, transaction isolation in the presence of IVM remains underspecified, which leaves the maintenance of application-level invariants as a painful exercise for the user. Meanwhile, most streaming systems optimize for latencies of 100 milliseconds to 3 seconds, whereas many practical use cases are well-served by latencies ranging from seconds to tens of minutes. In this paper, we present delayed view semantics (DVS), a conceptual foundation that bridges the semantic gap between streaming and databases, and introduce Dynamic Tables, Snowflake’s declarative streaming transformation primitive designed to democratize analytical stream processing. DVS formalizes the intuition that stream processing is primarily a technique to eagerly compute derived results asynchronously, while also addressing the need to reason about the resulting system end to end. Dynamic Tables then offer two key advantages: ease of use through DVS, enterprise-grade features, and simplicity; as well as scalable cost efficiency via IVM with an architecture designed for diverse latency requirements. We first develop extensions to transaction isolation that permit the preservation of invariants in streaming applications. We then detail the implementation challenges of Dynamic Tables and our experience operating it at scale. Finally, we share insights into user adoption and discuss our vision for the future of stream processing.

If You Give a Website a Cookie: Educating Children About Online Privacy

Fri, 06 Jun 2025 00:00:00 +0000

Spring 2025 Week 9

Fri, 30 May 2025 00:00:00 +0000

Title	In-Database Time Series Clustering
Authors	Yunxiang Su, Kenny Ye Liang, Shaoxu Song
Abstract	Time series data are often clustered repeatedly across various time ranges to mine frequent subsequence patterns from different periods, which could further support downstream applications. Existing state-of-the-art (SOTA) time series clustering method, such as K-Shape, can proficiently cluster time series data referring to their shapes. However, in-database time series clustering problem has been neglected, especially in IoT scenarios with large-volume data and high efficiency demands. Most time series databases employ LSM-Tree based storage to support intensive writings, yet causing underlying data points out-of-order in timestamps. Therefore, to apply existing out-of-database methods, all data points must be fully loaded into memory and chronologically sorted. Additionally, out-of-database methods must cluster from scratch each time, making them inefficient when handling queries across different time ranges. In this work, we propose an in-database adaptation of SOTA time series clustering method K-Shape. Moreover, to solve the problem that K-Shape cannot efficiently handle long time series, we propose Medoid-Shape, as well as its in-database adaptation for further acceleration. Extensive experiments are conducted to demonstrate the higher efficiency of our proposals, with comparable effectiveness. Remarkably, all proposals have already been implemented in an open-source commodity time series database, Apache IoTDB.

Spring 2025 Week 8

Fri, 23 May 2025 00:00:00 +0000

Title	Highly Efficient and Scalable Access Control Mechanism for IoT Devices in Pervasive Environments
Authors	Alian Yu, Jian Kang, Wei Jiang and Dan Lin
Abstract	With the continuous advancement of sensing, networking, controlling, and computing technologies, there is a growing number of IoT (Internet of Things) devices emerging that are expected to integrate into public infrastructure in the near future. However, the deployment of these smart devices in public venues presents new challenges for existing access control mechanisms, particularly in terms of efficiency. To address these challenges, we have developed a highly efficient and scalable access control mechanism that enables automatic and fine-grained access control management while incurring low overhead in large-scale settings. Our mechanism includes a dual-hierarchy access control structure and associated information retrieval algorithms, which we have used to develop a large-scale IoT device access control system called FACT+. FACT+ overcomes the efficiency issues of granting and inquiring access control status over millions of devices in pervasive environments. Additionally, our system offers a pay-and-consume scheme and plug-and-play device management for convenient adoption by service providers. We have conducted extensive experiments to demonstrate the practicality, effectiveness, and efficiency of our access control mechanism.

Spring 2025 Week 6

Fri, 09 May 2025 00:00:00 +0000

Title	Grouping, Subsumption, and Disjunctive Join Optimizations in Oracle
Authors	Rafi Ahmed, Krishna Kantikiran Pasupuleti, Sriram Tirupattur, Lei Sheng, Hong Su, Mohamed Ziauddin
Abstract	Query optimization must evolve with new workloads. As analytic and data warehouse workloads become more ubiquitous, optimization techniques that reduce the amount of data processed during query execution, enable shared computation and avoid expensive data access and joins must be rigorously explored. In this paper, we present aggregate-decomposition techniques as enhancements to an existing query transformation that performs grouping before joins. Consequently, the transformation generates more query rewrite candidates and can also be applied to a larger set of queries. Further, we introduce two new query transformations, i) subsumption of views and subqueries that explores opportunities for sharing computation and ii) union-all duplicator transformation for queries with disjunctive join predicates that removes the need for multiple data access and joins. These techniques are applicable to commonly noticed query patterns in customer workloads and provide significant performance benefit as indicated in our performance study. They have been implemented in Oracle RDBMS.

DIPr Lab at URMP Poster Competition

Tue, 06 May 2025 00:00:00 +0000

Portland State University’s Undergraduate Research & Mentor Program (URMP) hosted a poster competition that brings together student researchers and allows them to demonstrate their work. Satvik Mudgal presented a poster about Sieve, and placed as the second runner-up.

The poster can be found here.

Spring 2025 Week 4

Fri, 25 Apr 2025 00:00:00 +0000

Title	How good are query optimizers, really?
Authors	Viktor Leis, Andrey Gubichev, Atanas Mirchev, Peter Boncz, Alfons Kemper, Thomas Neumann
Abstract	Finding a good join order is crucial for query performance. In this paper, we introduce the Join Order Benchmark (JOB) and experimentally revisit the main components in the classic query optimizer architecture using a complex, real-world data set and realistic multi-join queries. We investigate the quality of industrial-strength cardinality estimators and find that all estimators routinely produce large errors. We further show that while estimates are essential for finding a good join order, query performance is unsatisfactory if the query engine relies too heavily on these estimates. Using another set of experiments that measure the impact of the cost model, we find that it has much less influence on query performance than the cardinality estimates. Finally, we investigate plan enumeration techniques comparing exhaustive dynamic programming with heuristic algorithms and find that exhaustive enumeration improves performance despite the sub-optimal cardinality estimates.

Spring 2025 Week 3

Fri, 18 Apr 2025 00:00:00 +0000

Title	PDX: A Data Layout for Vector Similarity Search
Authors	Leonardo Kuffo, Elena Krippner, and Peter Boncz from CWI Amsterdam, The Netherlands
Abstract	We propose Partition Dimensions Across (PDX), a data layout for vectors (e.g., embeddings) that, similar to PAX, stores multiple vectors in one block, using a vertical layout for the dimensions (Figure 1). PDX accelerates exact and approximate similarity search thanks to its dimension-by-dimension search strategy that operates on multiple-vectors-at-a-time in tight loops. It beats SIMD-optimized distance kernels on standard horizontal vector storage (avg 40% faster), only relying on scalar code that gets auto-vectorized. We combined the PDX layout with recent dimension-pruning algorithms ADSampling and BSA that accelerate approximate vector search. We found that these algorithms on the horizontal vector layout can lose to SIMD-optimized linear scans, even if they are SIMD-optimized. However, when used on PDX, their benefit is restored to 2-7x. We find that search on PDX is especially fast if a limited number of dimensions has to be scanned fully, which is what the dimension-pruning approaches do. We finally introduce PDX-BOND, an even more flexible dimension-pruning strategy, with good performance on exact search and reasonable performance on approximate search. Unlike previous pruning algorithms, it can work on vector data "as-is" without preprocessing; making it attractive for vector databases with frequent updates.

Spring 2025 Week 1

Fri, 04 Apr 2025 00:00:00 +0000

Title	Navigating Labels and Vectors: A Unified Approach to Filtered Approximate Nearest Neighbor Search
Authors	Yuzheng Cai, Jiayang Shi, Yizhuo Chen, Weiguo Zheng
Abstract	Given a query vector, approximate nearest neighbor search (ANNS) aims to retrieve similar vectors from a set of high-dimensional base vectors. However, many real-world applications jointly query both vector data and structured data, imposing label constraints such as attributes and keywords on the search, known as filtered ANNS. Effectively incorporating filtering conditions with vector similarity presents significant challenges, including index for dynamically filtered search space, agnostic query labels, computational overhead for label-irrelevant vectors, and potential inadequacy in returning results. To tackle these challenges, we introduce a novel approach called the Label Navigating Graph, which encodes the containment relationships of label sets for all vectors. Built upon graph-based ANNS methods, we develop a general framework termed Unified Navigating Graph (UNG) to bridge the gap between label set containment and vector proximity relations. UNG offers several advantages, including versatility in supporting any query label size and specificity, fidelity in exclusively searching filtered vectors, completeness in providing sufficient answers, and adaptability in integration with most graph-based ANNS algorithms. Extensive experiments on real datasets demonstrate that the proposed framework outperforms all baselines, achieving 10x speedups at the same accuracy.

DIPr Lab at PerCom 2025

Tue, 18 Mar 2025 00:00:00 +0000

IEEE PerCom is an annual meeting that brings together researchers and practitioners of pervasive computing and communications for technical talks and networking on the broad topic of pervasive computing. Dylan Conklin presented the demo paper and a live demonstration of BL(u)E CRAB.

We thank the National Science Foundation (NSF) for awarding Dylan Conklin a travel grant for this conference.

BL(u)E CRAB: A User-Centric Framework for Identifying Suspicious Bluetooth Trackers

Mon, 17 Mar 2025 00:00:00 +0000

My Privacy Awareness Learning Games (MyPAL Games)

Mon, 03 Mar 2025 00:00:00 +0000

My Privacy Awareness Learning Games (MyPAL Games) is an educational website designed to help children learn about different aspects of online privacy. It presents lessons in the format of comics and then quizzes the children on what they learned after each lesson.

Fine Grained Access Control in Vector Databases

Thu, 20 Feb 2025 00:00:00 +0000

Vector databases are well suited to similarity search with algorithms such as approximate nearest neighbor (ANN) search, and they are used in building Retrieval-Augmented Generation (RAG) systems to reduce hallucinations in the responses of AI systems. One significant challenge in using vector databases, especially in applications like RAG, is ensuring data privacy and security. For example, a clothing company that builds an AI chatbot that interacts with a vector database containing customer orders and product data could expose sensitive customer information without proper access restrictions. Incorporating Fine-Grained Access Control in vector databases is important for enforcing user preferences on data sharing and complying with privacy regulations. This project explores how to embed fine-grained access control within vector databases to ensure secure and privacy-compliant query answering.
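
To make the problem concrete, here is a minimal Python sketch of access-controlled similarity search using the clothing-company example. It is an illustration under stated assumptions and not the project's design: it uses exact brute-force cosine similarity instead of an ANN index, and the `Record` shape, the role names, and the document ids are hypothetical. The point it shows is that permissions are applied before ranking, so a record the user may not see can never appear in the top-k results.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Tuple


@dataclass(frozen=True)
class Record:
    doc_id: str
    vector: Tuple[float, ...]
    allowed_roles: FrozenSet[str]  # hypothetical per-record access policy


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(records: List[Record], query: Sequence[float],
           user_roles: FrozenSet[str], k: int) -> List[str]:
    """Filter by permission first, then rank the visible records by similarity."""
    visible = [r for r in records if r.allowed_roles & user_roles]
    ranked = sorted(visible, key=lambda r: cosine(r.vector, query), reverse=True)
    return [r.doc_id for r in ranked[:k]]


# Toy data: product pages are visible to everyone, the order only to support staff.
records = [
    Record("catalog-1", (1.0, 0.0), frozenset({"customer", "support"})),
    Record("order-42", (0.9, 0.1), frozenset({"support"})),
    Record("catalog-2", (0.0, 1.0), frozenset({"customer", "support"})),
]

query = (1.0, 0.0)
print(search(records, query, frozenset({"support"}), k=2))
print(search(records, query, frozenset({"customer"}), k=2))
```

Filtering before the search keeps results complete but requires scanning or indexing per permission set. Filtering after an ANN search is cheaper to build but can return fewer than k permitted results. That trade-off is part of what makes the problem hard at scale.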

DIPr Lab at NorthWest Database Society (2025) Meeting

Fri, 07 Feb 2025 00:00:00 +0000

DIPr lab members (from left to right) Orobosa Ekhator, Anadi Shakya, and Primal Pappachan attended the full-day event at the University of Washington, Seattle.

CyberPDX 2024

Mon, 29 Jul 2024 00:00:00 +0000

About CyberPDX:

CyberPDX is a no-cost 5-day residential summer STEAM camp hosted at the Portland State University Maseeh College of Engineering & Computer Science. This year’s camp is designed for Native and Indigenous high school students to introduce them to cybersecurity skills, principles, policies, and careers through an interactive STEAM curriculum and career exposure. All are welcome. The camp activities will be led by many PSU faculty and local experts, which includes local Indigenous leaders. This is an incredible opportunity for Native and Indigenous students to get an immersive, interdisciplinary, and fun experience on a college campus. No prior experience is necessary, and beginners are welcome!

This year’s CyberPDX had threads on Cryptography, Programming, Privacy, Policy, and Film. Primal Pappachan was the instructor for the Privacy thread and Dylan Conklin was the Teaching Assistant. The topics covered in this thread include:

Importance of Privacy
Difference between Privacy and Security
Debating risk versus convenience of app features
Reading Privacy Policies of Social Media Applications
Programming in Python to use Generative AI as a Privacy Assistant

To learn more, please check out the CyberPDX website.

RPE presentation - Anadi Shakya

Fri, 21 Jun 2024 00:00:00 +0000

Anadi Shakya successfully completed the oral part of her Research Proficiency Examination (RPE) on ‘Sieve and Cache: Scalable Access Control for Dynamic IoT Applications’.

Committee Members: Primal Pappachan (Portland State University); Dave Maier (Portland State University); Wu-chang Feng (Portland State University)

Abstract: The proliferation of smart technologies and newer privacy regulations necessitates effective management of user preferences about data sharing using access control policies. Current Database Management Systems (DBMS) struggle to efficiently answer queries while enforcing a large number of Fine-Grained Access Control (FGAC) policies. Sieve is a middleware for relational DBMSs that generates guarded expressions, which are efficient rewritings of queries with a given set of policies. In this paper, we extend Sieve with caching of guarded expressions to better handle dynamic workloads involving a series of policy insertions and queries at different frequencies. Our novel caching approach includes a replacement policy based on the clock algorithm and a refresh strategy based on cost analysis to select between regeneration and update of guarded expressions. We also develop a workload generator that creates workloads with different distributions of policies and queries that are relevant to a chosen scenario. These workloads simulate realistic IoT scenarios, such as smart campus applications. Experimental results show that Sieve, enhanced with caching, is effective in terms of cache hits and efficient in terms of system load and query-answering latency in dynamic environments.
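For readers unfamiliar with the clock algorithm mentioned in the abstract, here is a minimal sketch of the generic textbook version (a second-chance replacement policy). It is not Sieve's implementation: the `ClockCache` class, the string keys, and the stored values are all hypothetical, and the cost-based refresh strategy is not modeled. One assumption worth noting is that new entries start with their reference bit cleared and earn a second chance only after a hit.

```python
class ClockCache:
    """Fixed-capacity cache with clock (second-chance) replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = []    # circular buffer of keys
        self.values = {}   # key -> cached value
        self.ref = {}      # key -> reference bit
        self.hand = 0      # position of the clock hand in `slots`

    def get(self, key):
        """Return the cached value, or None on a miss; a hit sets the reference bit."""
        if key in self.values:
            self.ref[key] = True
            return self.values[key]
        return None

    def put(self, key, value):
        """Insert or update an entry, evicting a victim if the cache is full."""
        if key in self.values:
            self.values[key] = value
            self.ref[key] = True
            return
        if len(self.slots) < self.capacity:
            self.slots.append(key)
        else:
            # Sweep the hand: entries with the bit set get a second chance
            # (bit cleared); the first entry with a cleared bit is evicted.
            while self.ref[self.slots[self.hand]]:
                self.ref[self.slots[self.hand]] = False
                self.hand = (self.hand + 1) % self.capacity
            victim = self.slots[self.hand]
            del self.values[victim]
            del self.ref[victim]
            self.slots[self.hand] = key
            self.hand = (self.hand + 1) % self.capacity
        self.values[key] = value
        self.ref[key] = False  # assumption: new entries start unreferenced


if __name__ == "__main__":
    cache = ClockCache(capacity=2)
    cache.put("querier-1", "guarded-expr-1")
    cache.put("querier-2", "guarded-expr-2")
    cache.get("querier-1")                    # hit: querier-1 gets a second chance
    cache.put("querier-3", "guarded-expr-3")  # evicts querier-2, not querier-1
    print(sorted(cache.values))
```

The appeal of the clock policy is that it approximates least-recently-used behavior with a single bit per entry and no reordering on every hit, which keeps the bookkeeping cheap under frequent lookups.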

DIPr Lab at 2024 Summer Research Academy (SRA)

Thu, 20 Jun 2024 00:00:00 +0000

Summer Research Academy (SRA) is designed to engage undergraduate students interested in making a difference in their communities by pursuing research opportunities in STEM, biomedical, behavioral, clinical, health, and social sciences. Students do not have to have prior research experience to participate!

DIPr lab had a table at SRA 2024 with the following members attending (from left to right): Nicholas G.E. Morales, Michael Howard, Satvik Mudgal, Steve Willoughby, Primal Pappachan, and Dylan Conklin. We had many interested undergraduates stop by our table to learn about our various research projects.

To learn more, please check out the 2024 SRA website.

Accord

Sat, 01 Jun 2024 00:00:00 +0000

Users are increasingly adopting collaborative cloud services like Google Drive. The lack of fine-grained access controls on many cloud services makes actions that violate the expectations of other users likely, resulting in multiuser conflicts. For example, a user with editor permissions may add a user outside the organization and revoke the permissions of another user, all without consent from the original resource owner. These multiuser conflicts may compromise a resource’s confidentiality, integrity, or availability, leading to a lack of trust in cloud services.

ACCORD is a web application built on top of Google Drive which prevents and detects multiuser conflicts. It employs a simulator to help users preemptively identify potential conflicts and assist them in defining action constraints. Then, using these action constraints, ACCORD can automatically detect future conflicts and suggest resolutions.

Currently, we are testing the scalability and practicality of ACCORD with larger numbers of users and resources.

BL(u)E CRAB

Sat, 01 Jun 2024 00:00:00 +0000

Detecting unwanted or suspicious Bluetooth Low Energy (BLE)-based trackers is challenging, due in part to cross-platform compatibility issues and inconsistent detection methods. BL(u)E CRAB identifies suspicious BLE trackers based on various risk factors, including the number of encounters, time with the user, distance traveled with the user, number of areas each device appeared in, and device proximity to the user. BL(u)E CRAB presents this information in an intuitive way to help users determine which devices pose the biggest threat to them based on their context.

Sieve

Sat, 01 Jun 2024 00:00:00 +0000

SIEVE is a versatile middleware that enhances access control in DBMS, enabling efficient query processing even with a large number of access control policies. We’re currently integrating caching to further improve query performance. Additionally, we’ve developed a workload generator that simulates various scenarios to test policy models and ensure access control compliance, reflecting real-world conditions.

Tattletale

Sat, 01 Jun 2024 00:00:00 +0000

Tattletale uses denial constraints to discover data inferences inside a database relative to sensitive cells. The cells that make up the denial constraints are then checked to see which cells infer information on them. In the end, all the cells that infer data on the sensitive cells, and the cells that could be used to reconstruct those inferences, are placed into a list, which is used to generate a view that does not contain those cells. Since an inference can only be reconstructed when exactly one predicate is missing, we can use that to minimize how many cells we have to hide. The benefit of Tattletale is that it provides protection against inference, which access control lists don’t provide. The current challenge is to improve the run-time performance and decrease the number of cells that have to be hidden while also guaranteeing a certain level of protection against reconstruction.
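The single-missing-predicate observation can be illustrated with a small sketch. This is one possible reading of the description above, not Tattletale's actual algorithm: a denial constraint is modeled as a list of predicates, each predicate as the set of cells it reads, and the cell names, the `protect` function, and the greedy "hide the alphabetically first candidate" choice are all hypothetical.

```python
def can_infer(constraint, hidden):
    """A constraint leaks when exactly one of its predicates touches a hidden cell:
    every other predicate can be evaluated, so the missing one can be deduced."""
    unknown = [pred for pred in constraint if pred & hidden]
    return len(unknown) == 1

def protect(constraints, sensitive):
    """Return a set of cells to hide so that no constraint leaks.

    Starts from the sensitive cells and, for each leaking constraint, hides
    one extra cell from another predicate so that two predicates are unknown.
    Repeats until stable, because newly hidden cells can open new inferences.
    """
    hidden = set(sensitive)
    changed = True
    while changed:
        changed = False
        for constraint in constraints:
            if not can_infer(constraint, hidden):
                continue
            # Cells of predicates that are still fully visible.
            candidates = sorted(
                cell for pred in constraint if not (pred & hidden) for cell in pred
            )
            if candidates:  # a single-predicate constraint has nothing left to hide
                hidden.add(candidates[0])
                changed = True
    return hidden

# Hypothetical constraints over cells of one tuple t1.
CONSTRAINTS = [
    [frozenset({"t1.state"}), frozenset({"t1.salary"}), frozenset({"t1.tax"})],
    [frozenset({"t1.state"}), frozenset({"t1.zip"})],
]

if __name__ == "__main__":
    print(sorted(protect(CONSTRAINTS, {"t1.salary"})))
```

In this toy run, hiding `t1.state` to protect `t1.salary` makes the second constraint leak, so `t1.zip` is hidden as well. A greedy choice like this is simple but not minimal; reducing the number of hidden cells is exactly the challenge the paragraph above describes.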

ACCORD: Constraint-driven mediation of multi-user conflicts in cloud services

Mon, 13 May 2024 00:00:00 +0000

Welcome to DIPr lab at PSU

Tue, 30 Jan 2024 00:00:00 +0000

The Database & Internet Privacy Lab officially starts at Portland State University. We are a diverse group of researchers studying problems in user privacy, data management, access control, differential privacy, systems, algorithms, and privacy regulations.

GenAIPABench: A Benchmark for Generative AI-based Privacy Assistants

Tue, 19 Dec 2023 00:00:00 +0000

About

Sat, 04 Nov 2023 00:00:00 +0000

People

Mon, 24 Oct 2022 00:00:00 +0000

Don't Be a Tattle-Tale: Preventing Leakages through Data Dependencies on Access Control Protected Data

Fri, 01 Jul 2022 00:00:00 +0000
